Data Collections

Data Collections. Chapter 11 (skip 11.3, 11.4, 11.5 & 11.6.3). Adapted from the online slides provided by John Zelle ( http://mcsp.wartburg.edu/zelle/python/ppics2/index.html ). Objectives. To understand the use of lists (arrays) to represent a collection of related data.

  1. Data Collections Chapter 11 (skip 11.3, 11.4, 11.5 & 11.6.3) Adapted from the online slides provided by John Zelle (http://mcsp.wartburg.edu/zelle/python/ppics2/index.html)

  2. Objectives • To understand the use of lists (arrays) to represent a collection of related data. • To be familiar with the functions and methods available for manipulating Python lists. • To be able to write programs that use lists to manage a collection of information. • To understand the use of Python dictionaries for storing nonsequential collections.

  3. Example Problem: Simple Statistics • Many programs deal with large collections of similar information. • Words in a document • Students in a course • Data from an experiment • Customers of a business • Graphics objects drawn on the screen • Cards in a deck

  4. Sample Problem: Simple Statistics • Let’s review some code we wrote in chapter 8:

    # average4.py
    #   A program to average a set of numbers
    #   Illustrates sentinel loop using empty string as sentinel

    def main():
        total = 0.0   # running total ("total" avoids shadowing the built-in sum)
        count = 0
        xStr = input("Enter a number (<Enter> to quit) >> ")
        while xStr != "":
            x = float(xStr)   # float() is safer than eval() for user input
            total = total + x
            count = count + 1
            xStr = input("Enter a number (<Enter> to quit) >> ")
        print("\nThe average of the numbers is", total / count)

    main()

  5. Sample Problem: Simple Statistics • This program allows the user to enter a sequence of numbers, but the program itself doesn’t keep track of the numbers that were entered; it only keeps a running total. • Suppose we want to extend the program to compute not only the mean, but also the median and standard deviation.

  6. Sample Problem: Simple Statistics • The median is the data value that splits the data into two equal-sized parts. • For the data 2, 4, 6, 9, 13, the median is 6, since there are two values greater than 6 and two values that are smaller. • One way to determine the median is to store all the numbers, sort them, and identify the middle value.

  7. Sample Problem: Simple Statistics • The standard deviation is a measure of how spread out the data is relative to the mean. • If the data is tightly clustered around the mean, then the standard deviation is small. If the data is more spread out, the standard deviation is larger. • The standard deviation provides a yardstick for measuring how exceptional an individual data value is, by expressing its distance from the mean in standard deviations.
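One way to see what the standard deviation measures is to compare two small, made-up data sets that have the same mean but different spreads. This sketch uses Python's standard `statistics` module. Its `stdev` function computes the sample standard deviation, which divides by n - 1 just as the `stdDev` function later in these slides does. The data values are invented for illustration.

```python
import statistics

# Two made-up data sets with the same mean (6) but different spread
tight = [5, 6, 7]
wide = [1, 6, 11]

print(statistics.mean(tight), statistics.mean(wide))    # both means are 6
print(statistics.stdev(tight), statistics.stdev(wide))  # 1.0 versus 5.0
```

The tightly clustered set has the smaller standard deviation. A value of 11 would be unremarkable in `wide` but highly exceptional in `tight`.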

  8. Sample Problem: Simple Statistics • The standard deviation is $s = \sqrt{\frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n - 1}}$ • Here $\bar{x}$ is the mean, $x_i$ represents the ith data value, and $n$ is the number of data values. • The expression $(\bar{x} - x_i)^2$ is the square of the “deviation” of an individual item from the mean.

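  9. Sample Problem: Simple Statistics • The numerator is the sum of these squared “deviations” across all the data. • Suppose our data was 2, 4, 6, 9, and 13. • The mean is 6.8. • The numerator of the standard deviation is $(6.8 - 2)^2 + (6.8 - 4)^2 + (6.8 - 6)^2 + (6.8 - 9)^2 + (6.8 - 13)^2 = 23.04 + 7.84 + 0.64 + 4.84 + 38.44 = 74.8$, so $s = \sqrt{74.8 / 4} = \sqrt{18.7} \approx 4.32$.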
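The arithmetic for the data 2, 4, 6, 9, 13 can be checked with a few lines of Python. This minimal sketch computes the mean, the sum of squared deviations, and the standard deviation directly. It then compares the result with `statistics.stdev` from the standard library.

```python
import statistics

data = [2, 4, 6, 9, 13]
xbar = sum(data) / len(data)                     # 34 / 5 = 6.8

# Numerator: sum of squared deviations from the mean
numerator = sum((xbar - x) ** 2 for x in data)   # approximately 74.8

# Divide by n - 1, then take the square root
s = (numerator / (len(data) - 1)) ** 0.5         # sqrt(18.7), about 4.32
print(xbar, round(numerator, 2), round(s, 2))    # 6.8 74.8 4.32

# Cross-check against the standard library's sample standard deviation
print(round(statistics.stdev(data), 2))          # 4.32
```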

  10. Sample Problem: Simple Statistics • As you can see, calculating the standard deviation not only requires the mean (which can’t be calculated until all the data is entered), but also each individual data element! • We need some way to remember these values as they are entered.

  11. Applying Lists • We need a way to store and manipulate an entire collection of numbers. • We can’t just use a bunch of variables, because we don’t know how many numbers there will be. • What do we need? Some way of combining an entire collection of values into one object.

  12. Lists and Arrays • Suppose a sequence of n numbers is stored in a variable s. We could write a loop to calculate the sum of the items in the sequence like this:

    total = 0
    for i in range(n):
        total = total + s[i]

  • Almost all computer languages have a sequence structure like this, sometimes called an array.

  13. Lists and Arrays • A list or array is a sequence of items where the entire sequence is referred to by a single name (here, s) and individual items can be selected by indexing (e.g., s[i]). • In other programming languages, arrays are generally a fixed size, meaning that when you create the array, you have to specify how many items it can hold. • Arrays are generally also homogeneous, meaning they can hold only one data type.

  14. Lists and Arrays • Python lists are dynamic: they can grow and shrink on demand. • Python lists are also heterogeneous: a single list can hold items of arbitrary data types. • Python lists are mutable sequences of arbitrary objects.
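A short interactive-style sketch can illustrate all three properties (dynamic, heterogeneous, mutable) on one list. The values are arbitrary examples.

```python
# One list holding values of different types (heterogeneous)
items = [42, "spam", 3.14, [1, 2]]

items.append(True)     # grows on demand
items.pop(0)           # shrinks on demand (removes 42)
items[1] = "changed"   # mutable: an item is replaced in place (3.14 -> "changed")

print(items)           # ['spam', 'changed', [1, 2], True]
print(len(items))      # 4
```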

  15. List Operations

  16. List Operations • Except for the membership check, we’ve used these operations before on strings. • The membership operation can be used to see if a certain value appears anywhere in a sequence:

    >>> mylist = [1, 2, 3, 4]
    >>> 3 in mylist
    True
    >>> if 4 in mylist:
    ...     print("Yes")
    ...
    Yes
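As a reminder, here are the familiar sequence operations (concatenation, repetition, indexing, slicing, length, iteration) applied to lists rather than strings, together with the membership check. This is an illustrative sketch using arbitrary small lists.

```python
a = [1, 2, 3]
b = [4, 5]

print(a + b)            # concatenation: [1, 2, 3, 4, 5]
print(a * 2)            # repetition:    [1, 2, 3, 1, 2, 3]
print(a[0], a[-1])      # indexing:      1 3
print(a[1:])            # slicing:       [2, 3]
print(len(a))           # length:        3
for x in b:             # iteration:     prints 4, then 5
    print(x)
print(2 in a, 7 in a)   # membership:    True False
```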

  17. List Operations • The summing example from earlier can be written like this:

    total = 0
    for x in s:
        total = total + x

  • Unlike strings, lists are mutable:

    >>> mylist = [1, 2, 3, 4]
    >>> mylist[3]
    4
    >>> mylist[3] = "Hello"
    >>> mylist
    [1, 2, 3, 'Hello']
    >>> mylist[2] = 7
    >>> mylist
    [1, 2, 7, 'Hello']

  18. List Operations • A list of identical items can be created using the repetition operator. This command produces a list containing 50 zeroes:

    zeroes = [0] * 50
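One caveat about the repetition operator is worth a small sketch. Repetition copies references to the same object, not copies of it. For numbers this is harmless, but repeating a list of lists makes every "row" the same inner list. The variable names here are illustrative.

```python
zeroes = [0] * 5
zeroes[0] = 1
print(zeroes)    # [1, 0, 0, 0, 0]: assigning one slot leaves the others alone

# Pitfall: both rows are the SAME inner list
shared = [[0] * 3] * 2
shared[0][0] = 1
print(shared)    # [[1, 0, 0], [1, 0, 0]]: both rows changed

# Build independent rows with an append loop instead
grid = []
for _ in range(2):
    grid.append([0] * 3)
grid[0][0] = 1
print(grid)      # [[1, 0, 0], [0, 0, 0]]
```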

  19. List Operations • Lists are often built up one piece at a time using append:

    nums = []
    x = float(input('Enter a number: '))
    while x >= 0:
        nums.append(x)
        x = float(input('Enter a number: '))

  • Here, nums is being used as an accumulator, starting out empty, and each time through the loop a new value is tacked on. Entering a negative number ends the loop.

  20. List Operations

  21. List Operations

    >>> lst = [3, 1, 4, 1, 5, 9]
    >>> lst.append(2)
    >>> lst
    [3, 1, 4, 1, 5, 9, 2]
    >>> lst.sort()
    >>> lst
    [1, 1, 2, 3, 4, 5, 9]
    >>> lst.reverse()
    >>> lst
    [9, 5, 4, 3, 2, 1, 1]
    >>> lst.index(4)
    2
    >>> lst.insert(4, "Hello")
    >>> lst
    [9, 5, 4, 3, 'Hello', 2, 1, 1]
    >>> lst.count(1)
    2
    >>> lst.remove(1)
    >>> lst
    [9, 5, 4, 3, 'Hello', 2, 1]
    >>> lst.pop(3)
    3
    >>> lst
    [9, 5, 4, 'Hello', 2, 1]

  22. List Operations • Most of these methods don’t return a value; they change the contents of the list in place. • Lists can grow by appending new items, and shrink when items are deleted. Individual items or entire slices can be removed from a list using the del statement.

  23. List Operations

    >>> myList = [34, 26, 0, 10]
    >>> del myList[1]
    >>> myList
    [34, 0, 10]
    >>> del myList[1:3]
    >>> myList
    [34]

  • del isn’t a list method, but a built-in statement that can be used on list items and slices.

  24. List Operations • Basic list principles • A list is a sequence of items stored as a single object. • Items in a list can be accessed by indexing, and sublists can be accessed by slicing. • Lists are mutable; individual items or entire slices can be replaced through assignment statements.
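The last principle above (replacing individual items or entire slices through assignment) can be sketched as follows. Note that a slice can be replaced by a list of a different length, and that assigning an empty list removes the slice. The list contents are arbitrary.

```python
nums = [10, 20, 30, 40, 50]

nums[0] = 99            # replace a single item
nums[1:3] = [7, 8, 9]   # replace a slice (20, 30) with three new items
print(nums)             # [99, 7, 8, 9, 40, 50]

nums[2:4] = []          # assigning an empty list removes that slice (8, 9)
print(nums)             # [99, 7, 40, 50]
```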

  25. List Operations • Lists support a number of convenient and frequently used methods. • Lists will grow and shrink as needed.

  26. Statistics with Lists • One way we can solve our statistics problem is to store the data in lists. • We could then write a series of functions that take a list of numbers and calculate the mean, standard deviation, and median. • Let’s rewrite our earlier program to use lists to find the mean.

  27. Statistics with Lists • Let’s write a function called getNumbers that gets numbers from the user. • We’ll implement the sentinel loop to get the numbers. • An initially empty list is used as an accumulator to collect the numbers. • The list is returned once all values have been entered.

  28. Statistics with Lists

    def getNumbers():
        nums = []   # start with an empty list

        # sentinel loop to get numbers
        xStr = input("Enter a number (<Enter> to quit) >> ")
        while xStr != "":
            x = float(xStr)
            nums.append(x)   # add this value to the list
            xStr = input("Enter a number (<Enter> to quit) >> ")
        return nums

  • Using this code, we can get a list of numbers from the user with a single line of code:

    data = getNumbers()

  29. Statistics with Lists • Now we need a function that will calculate the mean of the numbers in a list. • Input: a list of numbers • Output: the mean of the input list

    def mean(nums):
        total = 0.0
        for num in nums:
            total = total + num
        return total / len(nums)

  30. Statistics with Lists • The next function to tackle is the standard deviation. • In order to determine the standard deviation, we need to know the mean. • Should we recalculate the mean inside of stdDev? • Should the mean be passed as a parameter to stdDev?

  31. Statistics with Lists • Recalculating the mean inside of stdDev is inefficient if the data set is large. • Since our program is outputting both the mean and the standard deviation, let’s compute the mean and pass it to stdDev as a parameter.

  32. Statistics with Lists

    from math import sqrt

    def stdDev(nums, xbar):
        sumDevSq = 0.0
        for num in nums:
            dev = xbar - num
            sumDevSq = sumDevSq + dev * dev
        return sqrt(sumDevSq / (len(nums) - 1))

  • The summation from the formula is accomplished with a loop and accumulator. • sumDevSq stores the running sum of the squares of the deviations.

  33. Statistics with Lists • We don’t have a formula to calculate the median. We’ll need to come up with an algorithm to pick out the middle value. • First, we need to arrange the numbers in ascending order. • Second, the middle value in the list is the median. • If the list has an even length, the median is the average of the middle two values.

  34. Statistics with Lists • Pseudocode:

    sort the numbers into ascending order
    if the size of the data is odd:
        median = the middle value
    else:
        median = the average of the two middle values
    return median

  35. Statistics with Lists

    def median(nums):
        nums = sorted(nums)   # sort a copy so the caller's list is not rearranged
        size = len(nums)
        midPos = size // 2
        if size % 2 == 0:
            med = (nums[midPos] + nums[midPos - 1]) / 2
        else:
            med = nums[midPos]
        return med

  36. Statistics with Lists • With these functions, the main program is pretty simple!

    def main():
        print("This program computes mean, median and standard deviation.")

        data = getNumbers()
        xbar = mean(data)
        std = stdDev(data, xbar)
        med = median(data)

        print("\nThe mean is", xbar)
        print("The standard deviation is", std)
        print("The median is", med)

    main()
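Because main() reads its input interactively, a quick way to test the three functions is a small non-interactive harness. This sketch restates `mean`, `stdDev`, and `median` from the slides, with `median` sorting a copy of its input. It feeds them a fixed sample list in place of `getNumbers()` and cross-checks the results against Python's `statistics` module. The harness itself is illustrative, not part of the original program.

```python
import statistics
from math import sqrt

def mean(nums):
    total = 0.0
    for num in nums:
        total = total + num
    return total / len(nums)

def stdDev(nums, xbar):
    sumDevSq = 0.0
    for num in nums:
        dev = xbar - num
        sumDevSq = sumDevSq + dev * dev
    return sqrt(sumDevSq / (len(nums) - 1))

def median(nums):
    nums = sorted(nums)   # sort a copy; the caller's list is unchanged
    size = len(nums)
    midPos = size // 2
    if size % 2 == 0:
        return (nums[midPos] + nums[midPos - 1]) / 2
    return nums[midPos]

# Fixed sample data instead of getNumbers(), so no typing is needed
data = [2, 4, 6, 9, 13]
xbar = mean(data)
print(xbar, stdDev(data, xbar), median(data))

# Cross-check against the standard library
print(statistics.mean(data), statistics.stdev(data), statistics.median(data))
```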

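  37. Tuples • Tuples are similar to lists but are immutable (their contents can’t be changed). • Tuples are usually written with parentheses instead of square brackets. Strictly, it is the commas that make a tuple, which is why a one-item tuple is written (2,). • When you know the contents won’t change, use a tuple: immutability protects the data from accidental modification, and tuples are somewhat lighter-weight than lists. Otherwise, use a list.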

  38. Tuples examples

    >>> a = (1, 2, 3)
    >>> a
    (1, 2, 3)
    >>> type(a)
    <class 'tuple'>
    >>> a[1]
    2
    >>> a[1:2]
    (2,)
    >>> a[0:2]
    (1, 2)
    >>> for x in a:
    ...     print(x)
    ...
    1
    2
    3
    >>> a[1] = 4
    Traceback (most recent call last):
      File "<pyshell#22>", line 1, in <module>
        a[1] = 4
    TypeError: 'tuple' object does not support item assignment
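The `(2,)` output above hints at how tuples are written. A sketch of the details: a one-item tuple needs a trailing comma, parentheses alone do not make a tuple, and a tuple can be unpacked into variables or converted to a list when the items must change. Names are illustrative.

```python
single = (5,)      # one-item tuple: the trailing comma is required
not_tuple = (5)    # just the integer 5 in parentheses
packed = 1, 2, 3   # the commas create the tuple; parentheses are optional here

print(type(single), type(not_tuple), type(packed))
# <class 'tuple'> <class 'int'> <class 'tuple'>

x, y, z = packed         # unpack a tuple into separate variables
print(x, y, z)           # 1 2 3

as_list = list(packed)   # convert to a list when the items must change
as_list[0] = 99
print(as_list, packed)   # [99, 2, 3] (1, 2, 3)
```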

  39. Non-Sequential Collections • Python provides another built-in data type for collections, called a dictionary. • Not all programming languages have dictionaries, while almost all have arrays or lists.

  40. Dictionary Basics • Typically, when we retrieve information from a sequential collection, we look it up by its position, or index, in the collection. • Say you want to retrieve data about students or employees based on social security numbers and not by the index of the student or the employee.

  41. Dictionary Basics • The combination of a social security number with other data is known as a key-value pair. • We access the value (the student information) associated with a particular key (the social security number). • It is easy to think of many key-value pairs: usernames & passwords, names & phone numbers, etc.

  42. Dictionary Basics • A collection that allows us to look up data with arbitrary keys is called a mapping. • Python dictionaries are mappings. • Some other languages call them hashes or associative arrays.

  43. Dictionary Basics • A dictionary can be created in Python by listing key-value pairs inside curly braces:

    >>> passwd = {"guido": "superprogrammer", "turing": "genius", "bill": "monopoly"}

  • Keys and values are joined with ‘:’, and commas are used to separate pairs.

  44. Dictionary Basics • The main use of a dictionary is to look up a value associated with a particular key, using indexing notation:

    >>> passwd["guido"]
    'superprogrammer'
    >>> passwd["bill"]
    'monopoly'

  • <dictionary>[<key>] returns the object associated with the given key.

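  45. Dictionary Basics • Dictionaries are mutable. The value associated with a key can be changed with assignment:

    >>> passwd["bill"] = "bluescreen"
    >>> passwd
    {'guido': 'superprogrammer', 'turing': 'genius', 'bill': 'bluescreen'}

  • In older versions of Python, a dictionary did not necessarily print out in the same order it was entered. Since Python 3.7, dictionaries preserve insertion order, and changing a value does not move its key. • Even so, a mapping is organized for lookup by key, not by position: you cannot retrieve "the second item" with an index.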

  46. Dictionary Basics • Python stores dictionaries in a way that makes key lookup very efficient. • Special algorithms (hashing) are used for this, so Python can find a key without searching through every entry. • If you want to keep a collection of items in a certain order, use a list! • But lists won’t let you access an item through a key; you can only access it through its index.

  47. Dictionary Basics • Dictionaries are mutable collections that implement a mapping from keys to values. • Keys must be immutable (hashable) values, such as strings, ints, floats, and tuples of these; a list cannot be used as a key. • Values can be of any type, including lists and objects of programmer-defined classes.
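A small sketch of which keys and values are allowed: strings, ints, and tuples work as keys, and any object can be a value. Trying to use a (mutable) list as a key raises a `TypeError`. The dictionary contents are made up for illustration.

```python
scores = {}
scores["alice"] = 90        # string key
scores[42] = "answer"       # int key
scores[(3, 4)] = "point"    # tuple key: immutable, so allowed
scores["bob"] = [85, 92]    # values can be any type, including lists

try:
    scores[[1, 2]] = "nope"   # a list is mutable, so it cannot be a key
except TypeError as err:
    print("TypeError:", err)  # TypeError: unhashable type: 'list'

print(scores[(3, 4)])         # point
```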

  48. Dictionary Operations • Python dictionaries support several built-in operations. • Dictionaries can be extended (data added after creation) by adding new entries:

    >>> passwd["newuser"] = "ImANewbie"
    >>> passwd
    {'guido': 'superprogrammer', 'turing': 'genius', 'bill': 'bluescreen', 'newuser': 'ImANewbie'}
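This sketch shows several other commonly used built-in dictionary operations on the password example: the `in` membership test (on keys), the `keys()`, `values()`, and `items()` views, `get()` with a default for a missing key, and `del` to remove an entry. Looking up a missing key with plain indexing (for example `passwd["alice"]`) would raise a `KeyError`. The printed key order assumes Python 3.7 or later.

```python
passwd = {"guido": "superprogrammer", "turing": "genius", "bill": "monopoly"}

print("bill" in passwd)             # membership test on keys: True
print(list(passwd.keys()))          # ['guido', 'turing', 'bill']
print(list(passwd.values()))        # ['superprogrammer', 'genius', 'monopoly']
print(passwd.get("alice", "none"))  # missing key, so the default is returned: none

for user, pw in passwd.items():     # iterate over key-value pairs
    print(user, pw)

del passwd["turing"]                # remove one entry
print(passwd)                       # {'guido': 'superprogrammer', 'bill': 'monopoly'}
```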

  49. Dictionary Operations • A common way to build a dictionary is to start with an empty collection and add the key-value pairs one at a time. • Suppose usernames and passwords were stored in a file called “passwords”, where each line of the file contains a username and password separated by a space.

  50. Dictionary Operations

    passwd = {}
    with open("passwords", "r") as infile:   # the with block closes the file automatically
        for line in infile:
            user, pw = line.split()   # "pass" is a Python keyword, so it can't be a variable name
            passwd[user] = pw
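To try this without creating a real "passwords" file, the loop can be wrapped in a function that accepts any iterable of lines. It can then be tested with an in-memory `io.StringIO` object. The function name `readPasswords` and the sample lines are hypothetical. The blank-line check is an extra assumption added so that an empty line does not break `split()` unpacking.

```python
import io

def readPasswords(infile):
    """Build a username -> password dictionary from 'user password' lines."""
    passwd = {}
    for line in infile:
        if not line.strip():       # skip blank lines (an added assumption)
            continue
        user, pw = line.split()    # expects exactly two fields per line
        passwd[user] = pw
    return passwd

# Stand-in for open("passwords"): an in-memory file with sample lines
sample = io.StringIO("guido superprogrammer\nturing genius\n\nbill monopoly\n")
result = readPasswords(sample)
print(result)
# {'guido': 'superprogrammer', 'turing': 'genius', 'bill': 'monopoly'}
```

The same function works unchanged on a real file opened with `open("passwords")`, because both objects yield one line at a time when iterated.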
