3.1.1 Lists Are Mutable
3.1 LISTS
A list is an ordered sequence of values and is one of Python's built-in data structures. The
values inside a list can be of any type (integers, floats, strings, lists, tuples, dictionaries,
etc.) and are called elements or items. The elements of a list are enclosed within square
brackets. For example,
ls1=[10,-4, 25, 13]
ls2=["Tiger", "Lion", "Cheetah"]
Here, ls1 is a list containing four integers, and ls2 is a list containing three strings. A list
need not contain data of the same type. We can have elements of mixed types in a list. For
example,
ls3=[3.5, 'Tiger', 10, [3,4]]
Here, ls3 contains a float, a string, an integer and a list. This illustrates that a list can be
nested as well.
In fact, list is the name of a class, and calling list() invokes its constructor (a special type
of method which will be discussed in Module 4). Hence, a new list can be created by passing
an argument to list() as shown below –
>>> ls2=list([3,4,1])
>>> print(ls2)
[3, 4, 1]
>>> print(ls[2][0])
2
>>> print(ls[2][1])
3
Note that the indexing for the inner list again starts from 0. Thus, when we use double
indexing, the first index indicates the position of the inner list inside the outer list, and the
second index indicates the position of a particular value within the inner list.
Unlike strings, lists are mutable. That is, using indexing, we can modify any value within a list.
In the following example, the 3rd element (i.e. the element at index 2) is being modified –
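A minimal sketch of such a modification is given here. The list values are an assumption, chosen to be consistent with the outputs shown elsewhere in this section; the string s is purely illustrative.

```python
# Lists are mutable: assigning to an index replaces that element in place.
ls = [34, 'hi', [2, 3], -5]   # assumed sample values
ls[2] = 'Hello'               # modify the 3rd element (index 2)
print(ls)                     # [34, 'hi', 'Hello', -5]

# Strings are immutable, so the same kind of assignment raises TypeError.
s = 'hello'
try:
    s[0] = 'H'
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)
```

The list object itself is changed; no new list is created by the assignment.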
A list can be thought of as a relationship between indices and elements. This relationship
is called a mapping. That is, each index maps to one of the elements in the list.
34 hi
Hello
-5
List elements can be accessed with the combination of range() and len() functions as well –
ls=[1,2,3,4]
for i in range(len(ls)):
    ls[i]=ls[i]**2
Here, we want to modify the elements of the list. Hence, referring to indices is more suitable
than referring to elements directly. The len() function returns the total number of elements in
the list (here it is 4). Then the range() function makes the loop run over the indices 0 to 3
(i.e. 4-1). For every index, we update the list element (replacing the original value by its square).
>>> ls1=[1,2,3]
>>> ls2=[5,6,7]
>>> print(ls1+ls2) #concatenation using +
[1, 2, 3, 5, 6, 7]
>>> ls1=[1,2,3]
>>> print(ls1*3) #repetition using *
[1, 2, 3, 1, 2, 3, 1, 2, 3]
t=['a','b','c','d','e']
Extracting full list without using any index, but only a slicing operator –
>>> print(t[:])
['a', 'b', 'c', 'd', 'e']
append(): This method is used to add a new element at the end of a list.
>>> ls=[1,2,3]
>>> ls.append('hi')
>>> ls.append(10)
>>> print(ls)
[1, 2, 3, 'hi', 10]
extend(): This method takes a list as an argument, and all the elements of that list
are added at the end of the list on which the method is invoked.
>>> ls1=[1,2,3]
>>> ls2=[5,6]
>>> ls2.extend(ls1)
>>> print(ls2)
[5, 6, 1, 2, 3]
sort(): This method is used to sort the contents of the list. By default, the function
will sort the items in ascending order.
When we want a list to be sorted in descending order, we need to set the argument
as shown –
>>> ls.sort(reverse=True)
>>> print(ls)
[16, 10, 5, 3, -2]
clear(): This method removes all the elements in the list and makes the list empty.
>>> ls=[1,2,3]
>>> ls.clear()
>>> print(ls)
[]
insert(): Used to insert a value before a specified index of the list.
>>> ls=[3,5,10]
>>> ls.insert(1,"hi")
>>> print(ls)
[3, 'hi', 5, 10]
index(): This method is used to get the index position of a particular value in the list.
>>> ls=[4, 2, 10, 5, 3, 2, 6]
>>> ls.index(2)
1
Here, the number 2 is found at the index position 1. Note that, this function will give
index of only the first occurrence of a specified value. The same function can be
used with two more arguments start and end to specify a range within which the
search should take place.
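A short sketch of index() with the optional start and end arguments, reusing the list from the example above. The variable names first, later, ranged and found_in_range are illustrative only.

```python
ls = [4, 2, 10, 5, 3, 2, 6]

first = ls.index(2)          # search the whole list
later = ls.index(2, 2)       # search from index 2 to the end
ranged = ls.index(2, 3, 7)   # search indices 3 up to (not including) 7
print(first, later, ranged)  # 1 5 5

# If the value is absent from the given range, ValueError is raised.
try:
    ls.index(2, 2, 5)        # indices 2, 3, 4 hold 10, 5, 3
except ValueError:
    found_in_range = False
else:
    found_in_range = True
print(found_in_range)        # False
```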
Here, the argument ls1 for the append() function is treated as one item, and made as
an inner list to ls2. On the other hand, if we replace append() by extend() then the
result would be –
>>> ls1=[1,2,3]
>>> ls2=[5,6]
>>> ls2.extend(ls1)
>>> print(ls2)
[5, 6, 1, 2, 3]
2. The sort() function can be applied only when the list contains elements of compatible
types. If a list is a mix of non-compatible types, like integers and strings, the comparison
cannot be done. Hence, Python will raise TypeError. For example,
>>> ls=[34,[2,3],5]
>>> ls.sort()
TypeError: '<' not supported between instances of 'list' and 'int'
Integers and floats are compatible and relational operations can be performed on them.
Hence, we can sort a list containing such items.
3. The sort() function accepts one important keyword argument, key. It is useful when a list
contains tuples. We will discuss tuples later in this Module.
4. Most of the list methods like append(), extend(), sort(), reverse() etc. modify the list
object internally and return None.
>>> ls=[2,3]
>>> ls1=ls.append(5)
>>> print(ls)
[2, 3, 5]
>>> print(ls1)
None
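Points 3 and 4 above can be sketched together as follows. The helper by_number and the sample data are assumptions made for illustration; key is the standard keyword argument of list.sort().

```python
def by_number(pair):
    # Return the second item of a (name, number) tuple as the sort key.
    return pair[1]

entries = [('Tom', 3491), ('Jerry', 8135), ('Donald', 4793)]
entries.sort(key=by_number)
print(entries)   # [('Tom', 3491), ('Donald', 4793), ('Jerry', 8135)]

words = ['banana', 'fig', 'cherry']
result = words.sort(key=len)   # sort() works in place and returns None
print(words)     # ['fig', 'banana', 'cherry']
print(result)    # None
```

Because sort() is stable, 'banana' stays ahead of 'cherry' when both have the same length.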
remove(): When we don’t know the index, but know the value to be removed, then
this function can be used.
Note that, this function will remove only the first occurrence of the specified value,
but not all occurrences.
>>> ls=[5,8, -12, 34, 2, 6, 34]
>>> ls.remove(34)
>>> print(ls)
[5, 8, -12, 2, 6, 34]
Unlike pop() function, the remove() function will not return the value that has been
deleted.
del: This is a statement (not a method) that deletes items by index or by slice, so it can
be used when more than one item is to be deleted at a time. Here also, the deleted items
are not returned.
>>> ls=[3,6,-2,8,1]
>>> del ls[2] #item at index 2 is deleted
>>> print(ls)
[3, 6, 8, 1]
>>> ls=[3,6,-2,8,1]
>>> del ls[1:4] #deleting all elements from index 1 to 3
>>> print(ls)
[3, 1]
>>> avg=sum(ls)/len(ls)
>>> print(avg)
11.857142857142858
When we need to read the data from the user and to compute sum and average of those
numbers, we can write the code as below –
ls= list()
while (True):
    x= input('Enter a number: ')
    if x== 'done':
        break
    x= float(x)
    ls.append(x)
average= sum(ls)/len(ls)
print('Average:', average)
In the above program, we initially create an empty list. Then, we run an infinite while loop.
As every input from the keyboard is in the form of a string, we need to convert x into float
type and then append it to the list. When the keyboard input is the string 'done', the loop
terminates. After the loop, we find the average of those numbers with the help of the
built-in functions sum() and len().
The list() constructor breaks a string into individual letters and constructs a list. If we want
a list of words from a sentence, we can use the following code –
>>> s="Hello how are you?"
>>> ls=s.split()
>>> print(ls)
['Hello', 'how', 'are', 'you?']
Note that, when no argument is provided, the split() function takes the delimiter as white
space. If we need a specific delimiter for splitting the lines, we can use as shown in
following example –
>>> dt="20/03/2018"
>>> ls=dt.split('/')
>>> print(ls)
['20', '03', '2018']
There is a method join() which behaves opposite to split() function. It takes a list of strings
as argument, and joins all the strings into a single string based on the delimiter provided.
For example –
>>> ls=["Hello", "how", "are", "you"]
>>> d=' '
>>> d.join(ls)
'Hello how are you'
Here, we have taken delimiter d as white space. Apart from space, anything can be taken
as delimiter. When we don’t need any delimiter, use empty string as delimiter.
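A brief sketch of join() with other delimiters, including the empty string. The variable names are illustrative; the date parts are taken from the split() example above.

```python
parts = ['20', '03', '2018']

with_dash = '-'.join(parts)        # any string can act as the delimiter
no_delim = ''.join(parts)          # empty string: no delimiter at all
round_trip = with_dash.split('-')  # split() undoes the join()

print(with_dash)    # 20-03-2018
print(no_delim)     # 20032018
print(round_trip)   # ['20', '03', '2018']
```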
fhand = open('logFile.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From '):
        continue
    words = line.split()
    print(words[2])
In this log file, the record of every received mail starts with the word From. Hence, we
search for only such lines and then split them into words. Observe that the first word in the
line is From, the second word is the email-ID and the 3rd word is the day of the week. Hence,
we extract words[2], which is the 3rd word.
Consider a situation –
a= "hi"
b= "hi"
Now, the question is whether both a and b refer to the same string. There are two
possible states –
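[Figure: two possible states. In the first, a and b each refer to a separate 'hi' object;
in the second, a and b both refer to the same 'hi' object.]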
In the first situation, a and b are two different objects containing the same value. A
modification of one object has nothing to do with the other. In the second case,
both a and b refer to the same object. That is, a is an alias name for b and vice versa.
In other words, these two refer to the same memory location.
To check whether two variables are referring to same object or not, we can use is operator.
>>> a= "hi"
>>> b= "hi"
>>> a is b #result is True
>>> a==b #result is True
When two variables refer to the same object, they are called identical objects. When
two variables refer to different objects that contain the same value, they are known as
equivalent objects. For example,
>>> s1=input("Enter a string:") #assume you entered hello
>>> s2= input("Enter a string:") #assume you entered hello
In CPython, string literals with the same value are usually interned. That is, when two such
string literals appear in a program, they typically refer to the same object. This is an
implementation detail and should not be relied upon. Strings read from the keyboard do not
have this behavior, because their values are created at run time from the user's input.
>>> ls1=[1,2,3]
>>> ls2=[1,2,3]
>>> ls1 is ls2 #output is False
>>> ls1 == ls2 #output is True
3.1.11 Aliasing
When an object is assigned to another variable using the assignment operator, both variables
refer to the same object in the memory. The association of a variable with an object is called
a reference.
>>> ls1=[1,2,3]
>>> ls2= ls1
>>> ls1 is ls2 #output is True
Now, ls2 is said to be a reference to the same list as ls1. In other words, there are two
references to the same object in the memory.
An object with more than one reference has more than one name, hence we say that the object
is aliased. If the aliased object is mutable, changes made through one alias are visible
through the other.
>>> ls2[1]= 34
>>> print(ls1) #output is [1, 34, 3]
def del_front(t):
    del t[0]
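A minimal sketch of how del_front() behaves when called: the parameter t becomes an alias of the caller's list, so the deletion is visible to the caller. The names letters and alias are assumptions for illustration.

```python
def del_front(t):
    del t[0]              # modifies the very list object passed in

letters = ['a', 'b', 'c']
alias = letters           # second reference to the same list
del_front(letters)

print(letters)            # ['b', 'c']
print(alias is letters)   # True: both names still refer to one object
```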
One should understand the operations that will modify the list and the operations that
create a new list. For example, the append() function modifies the list, whereas the +
operator creates a new list.
>>> t1 = [1, 2]
>>> t2 = t1.append(3)
>>> print(t1) #output is [1, 2, 3]
>>> print(t2) #prints None
>>> t3 = t1 + [5]
>>> print(t3) #output is [1, 2, 3, 5]
>>> t2 is t3 #output is False
Here, after applying append() on t1 object, the t1 itself has been modified and t2 is not
going to get anything. But, when + operator is applied, t1 remains same but t3 will get the
updated result.
The programmer should understand such differences when he/she creates a function
intending to modify a list. For example, the following function has no effect on the original
list –
def test(t):
    t=t[1:]
ls=[1,2,3]
test(ls)
print(ls) #prints [1, 2, 3]
def test(t):
    return t[1:]
ls=[1,2,3]
ls1=test(ls)
print(ls1) #prints [2, 3]
print(ls) #prints [1, 2, 3]
In the above example also, the original list is not modified, because the slice t[1:] creates a
new list, which is returned and assigned to the LHS variable at the position of the function call.
3.2 DICTIONARIES
A dictionary is a collection of key:value pairs, with the requirement that keys are unique
within one dictionary. Unlike lists and strings, where elements are accessed using index
values (which are integers), the values in a dictionary are accessed using keys. A key in a
dictionary can be of any immutable type, like strings, numbers and tuples. (A tuple can be
used as a key for a dictionary only if that tuple consists of strings, numbers or sub-tuples.)
As lists are mutable – that is, they can be modified using index assignments, slicing, or
methods like append(), extend() etc. – they cannot be keys of a dictionary.
One can think of a dictionary as a mapping between set of indices (which are actually keys)
and a set of values. Each key maps to a value.
To initialize a dictionary at the time of creation itself, one can use the code like –
>>> tel_dir={'Tom': 3491, 'Jerry':8135}
>>> print(tel_dir)
{'Tom': 3491, 'Jerry': 8135}
>>> tel_dir['Donald']=4793
>>> print(tel_dir)
{'Tom': 3491, 'Jerry': 8135, 'Donald': 4793}
NOTE that dictionary members are not indexed by integer positions. From Python 3.7
onwards, a dictionary remembers the order in which its items were inserted, but in older
versions the order was unpredictable. Hence, in the above example, do not write code that
depends on 'Tom': 3491 being the first item, 'Jerry': 8135 being the second item, etc.
Instead, using a key, we can extract its associated value as shown below –
>>> print(tel_dir['Jerry'])
8135
Here, the key 'Jerry' maps with the value 8135, hence it doesn’t matter where exactly it
is inside the dictionary.
If a particular key is not there in the dictionary and if we try to access such key, then the
KeyError is generated.
>>> print(tel_dir['Mickey'])
KeyError: 'Mickey'
The len() function on dictionary object gives the number of key-value pairs in that object.
>>> print(tel_dir)
{'Tom': 3491, 'Jerry': 8135, 'Donald': 4793}
>>> len(tel_dir)
3
The in operator can be used to check whether any key (not value) appears in the dictionary
object.
>>> 'Mickey' in tel_dir #output is False
>>> 'Jerry' in tel_dir #output is True
>>> 3491 in tel_dir #output is False
We observe from above example that the value 3491 is associated with the key 'Tom' in
tel_dir. But, the in operator returns False.
The dictionary object has a method values() which returns a view of all the values
associated with the keys within a dictionary (the view supports iteration and the in operator).
If we would like to check whether a particular value exists in a dictionary, we can make use
of it as shown below –
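A minimal sketch of checking for a value with values(), using the tel_dir dictionary from the earlier examples. The names has_value and missing_value are illustrative.

```python
tel_dir = {'Tom': 3491, 'Jerry': 8135, 'Donald': 4793}

has_value = 3491 in tel_dir.values()       # membership test on the values
missing_value = 1111 in tel_dir.values()
as_key = 3491 in tel_dir                   # plain `in` checks keys only

print(has_value)       # True
print(missing_value)   # False
print(as_key)          # False
```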
Each of the above methods will perform same task, but the logic of implementation will be
different. Here, we will see the implementation using dictionary.
It can be observed from the output that, a dictionary is created here with characters as keys
and frequencies as values. Note that, here we have computed histogram of counters.
Dictionary in Python has a method called as get(), which takes key and a default value as
two arguments. If key is found in the dictionary, then the get() function returns
corresponding value, otherwise it returns default value. For example,
s=input("Enter a string:")
d=dict()
for ch in s:
    d[ch]=d.get(ch,0)+1
print(d)
In the above program, for every character ch in a given string, we try to retrieve a
value. When ch is found in d, its value is retrieved, 1 is added to it, and the result is stored
back. If ch is not found, 0 is taken as the default and then 1 is added to it.
Output would be –
Tom 3491
Jerry 8135
Mickey 1253
Note that, while accessing items from dictionary, the keys may not be in order. If we want to
print the keys in alphabetical order, then we need to make a list of the keys, and then sort
that list. We can do so using keys() method of dictionary and sort() method of lists. Consider
the following code –
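A minimal sketch of printing the keys in alphabetical order with keys() and sort(). The dictionary contents are assumed from the output shown above.

```python
tel_dir = {'Tom': 3491, 'Jerry': 8135, 'Mickey': 1253}

ls = list(tel_dir.keys())   # make a list of the keys
ls.sort()                   # sort the keys alphabetically
for k in ls:
    print(k, tel_dir[k])
```

With these values the loop prints Jerry, Mickey and Tom, in that order, each followed by its number.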
The usage of comma-separated list k,v here is internally a tuple (another data structure in
Python, which will be discussed later).
Now, we need to count the frequency of each word in this file. So, we need an outer loop
for iterating over the lines of the file, and an inner loop for traversing the words of each
line. Then, for every word, we count its occurrences, as we did before for a character.
The program is given as below –
d=dict()
print(d)
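The nested-loop idea described above can be sketched as follows. The function name count_words and the in-memory sample lines are assumptions made so that the example runs without a file; with a real file, the open file handle can be passed in place of the list.

```python
def count_words(lines):
    d = dict()
    for line in lines:               # outer loop: every line
        for word in line.split():    # inner loop: every word in the line
            d[word] = d.get(word, 0) + 1
    return d

sample_lines = ["Hello how are you?", "hello, I am fine"]
counts = count_words(sample_lines)
print(counts)
```

Note that 'Hello' and 'hello,' are counted as different words here, since case and punctuation are not yet handled.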
The output of this program when the input file is myfile.txt would be –
While solving problems on text analysis, machine learning, data analysis etc. such kinds of
treatment of words lead to unexpected results. So, we need to be careful in parsing the text
and we should try to eliminate punctuation marks, ignoring the case etc. The procedure is
discussed in the next section.
The str class has a method maketrans() which returns a translation table usable by another
method, translate(). Consider the following syntax to understand it more clearly –
line.translate(str.maketrans(fromstr, tostr, deletestr))
The above statement replaces the characters in fromstr with the character in the same
position in tostr and deletes all characters that are in deletestr. The fromstr and
tostr can be empty strings, and the deletestr parameter can be omitted.
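A small sketch of maketrans() and translate() on sample strings. The strings themselves are illustrative; string.punctuation is a constant from the standard library.

```python
import string

# Map a->x, b->y, c->z and delete every '!'.
table = str.maketrans('abc', 'xyz', '!')
result = 'aabbcc!!'.translate(table)
print(result)    # xxyyzz

# Empty fromstr and tostr: only delete the punctuation characters.
strip_punct = str.maketrans('', '', string.punctuation)
cleaned = 'Hello, World!'.translate(strip_punct)
print(cleaned)   # Hello World
```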
Using these functions, we will re-write the program for finding frequency of words in a file.
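import string
fname=input("Enter file name: ")
try:
    fhand=open(fname)
except:
    print("File cannot be opened")
    exit()
d=dict()
for line in fhand:
    line=line.rstrip()
    line=line.translate(line.maketrans('','',string.punctuation))
    line=line.lower()
    for word in line.split():
        d[word]=d.get(word,0)+1
print(d)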
Comparing the output of this modified program with the previous one, we can see that
punctuation marks are no longer considered while parsing and that the case of the letters
is ignored.
3.3 TUPLES
A tuple is a sequence of items, similar to lists. The values stored in the tuple can be of any
type and they are indexed using integers. Unlike lists, tuples are immutable. That is, values
within tuples cannot be modified/reassigned. Tuples are comparable and hashable objects.
Hence, they can be made as keys in dictionaries.
A tuple can be created in Python as a comma separated list of items – may or may not be
enclosed within parentheses.
If we would like to create a tuple with single value, then just a parenthesis will not suffice.
For example,
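A minimal sketch of the single-value case. The variable names are illustrative; the trailing comma is what makes a tuple.

```python
not_a_tuple = (3)     # parentheses alone: this is just the integer 3
single = (3,)         # trailing comma makes a one-element tuple
also_single = 3,      # the parentheses are optional

print(type(not_a_tuple).__name__)   # int
print(type(single).__name__)        # tuple
print(single == also_single)        # True
```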
>>> t2=tuple()
>>> type(t2)
<class 'tuple'>
If we provide an argument of type sequence (a list, a string or tuple) to the method tuple(),
then a tuple with the elements in a given sequence will be created –
Create tuple using string:
>>> t=tuple('Hello')
>>> print(t)
('H', 'e', 'l', 'l', 'o')
>>> t=tuple([3,[12,5],'Hi'])
>>> print(t)
(3, [12, 5], 'Hi')
Note that, in the above example, both t and t1 objects are referring to same memory
location. That is, t1 is a reference to t.
Elements in the tuple can be extracted using square-brackets with the help of indices.
Similarly, slicing also can be applied to extract required number of items from tuple.
Modifying the value in a tuple generates error, because tuples are immutable –
>>> t[0]='Kiwi'
TypeError: 'tuple' object does not support item assignment
We wanted to replace ‘Mango’ by ‘Kiwi’, which did not work using assignment. But, a tuple
can be replaced with another tuple involving required modifications –
>>> t=('Kiwi',)+t[1:]
>>> print(t)
('Kiwi', 'Banana', 'Apple')
3.3.1 Comparing Tuples
Tuples can be compared using operators like >, <, >=, == etc. The comparison happens
lexicographically. For example, when we need to check equality among two tuple objects,
the first item in first tuple is compared with first item in second tuple. If they are same, 2nd
items are compared. The check continues till either a mismatch is found or items get over.
Consider few examples –
>>> (1,2,3)==(1,2,5)
False
>>> (3,4)==(3,4)
True
The meaning of < and > in tuples is not exactly less than and greater than, instead, it
means comes before and comes after. Hence in such cases, we will get results different
from checking equality (==).
>>> (1,2,3)<(1,2,5)
True
>>> (3,4)<(5,2)
True
When we use relational operator on tuples containing non-comparable types, then
TypeError will be thrown.
>>> (1,'hi')<('hello','world')
TypeError: '<' not supported between instances of 'int' and 'str'
The sort() function works on a similar pattern – it sorts primarily by the first element and,
in case of a tie, by the second element, and so on. This behavior is used in a pattern known as DSU –
Decorate a sequence by building a list of tuples with one or more sort keys
preceding the elements from the sequence,
Sort the list of tuples using the Python built-in sort(), and
Undecorate by extracting the sorted elements of the sequence.
Consider a program that sorts the words in a sentence from longest to shortest, which
illustrates the DSU pattern.
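A sketch of such a DSU program is given here. The sentence is an assumption, inferred from the list of tuples shown in the output; the variable names t and res are illustrative.

```python
txt = 'Ram and Seeta went to forest with Lakshman'
words = txt.split()

# Decorate: pair each word with its length as the sort key.
t = list()
for word in words:
    t.append((len(word), word))
print('The list is:', t)

# Sort: longest first; ties are broken by comparing the words themselves.
t.sort(reverse=True)

# Undecorate: keep only the words.
res = list()
for length, word in t:
    res.append(word)
print('The sorted list:', res)
```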
The list is: [(3, 'Ram'), (3, 'and'), (5, 'Seeta'), (4, 'went'),
(2, 'to'), (6, 'forest'), (4, 'with'), (8, 'Lakshman')]
In the above program, we have split the sentence into a list of words. Then, a tuple
containing length of the word and the word itself are created and are appended to a list.
Observe the output of this list – it is a list of tuples. Then we are sorting this list in descending
order. Now for sorting, length of the word is considered, because it is a first element in the
tuple. At the end, we extract length and word in the list, and create another list containing
only the words and print it.
>>> x,y=10,20
>>> print(x) #prints 10
>>> print(y) #prints 20
When we have list of items, they can be extracted and stored into multiple variables as
below –
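A minimal sketch of unpacking a list into variables. The sample values and the flag mismatch_raises are illustrative.

```python
ls = ['hello', 'world']
x, y = ls            # one variable per list element
print(x)             # hello
print(y)             # world

# The number of variables must match the number of items.
try:
    a, b, c = ls
except ValueError:
    mismatch_raises = True
else:
    mismatch_raises = False
print(mismatch_raises)   # True
```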
The best known example of assignment of tuples is swapping two values as below –
>>> a=10
>>> b=20
>>> a, b = b, a
>>> print(a, b) #prints 20 10
In the above example, the statement a, b = b, a is treated by Python as – LHS is a
set of variables, and RHS is set of expressions. The expressions in RHS are evaluated and
assigned to respective variables at LHS.
While doing assignment of multiple variables, the RHS can be any type of sequence like
list, string or tuple. Following example extracts user name and domain from an email ID.
>>> email='chetanahegde@ieee.org'
>>> usrName, domain = email.split('@')
>>> print(usrName) #prints chetanahegde
>>> print(domain) #prints ieee.org
A dictionary keeps its items in insertion order (Python 3.7 and later) rather than in sorted
order. To print them in sorted order, we can convert the items to a list of tuples and use
sort() as below –
>>> d = {'b':1, 'a':10, 'c':22}
>>> t = list(d.items())
>>> print(t)
[('b', 1), ('a', 10), ('c', 22)]
>>> t.sort()
>>> print(t)
[('a', 10), ('b', 1), ('c', 22)]
3.3.4 Multiple Assignment with Dictionaries
We can combine the method items(), tuple assignment and a for-loop to get a pattern for
traversing dictionary:
d={'Tom': 1292, 'Jerry': 3501, 'Donald': 8913}
for key, val in list(d.items()):
    print(val,key)
Once we get a key-value pair, we can create a list of tuples and sort them –
ls=list()
for key, val in d.items():
    ls.append((val,key))
print("List of tuples:",ls)
ls.sort(reverse=True)
print("List of sorted tuples:",ls)
In the above program, we are extracting key, val pair from the dictionary and appending
it to the list ls. While appending, we are putting inner parentheses to make sure that each
pair is treated as a tuple. Then, we are sorting the list in the descending order. The sorting
would happen based on the telephone number (val), but not on name (key), as first element
in tuple is telephone number (val).
lst = list()
for key, val in list(counts.items()):
    lst.append((val, key))
lst.sort(reverse=True)
for val, key in lst[:10]:
    print(val, key)
Run the above program on any text file of your choice and observe the output.
3.3.6 Using Tuples as Keys in Dictionaries
As tuples are hashable (provided all their elements are hashable), we can use tuples when
we want a dictionary containing composite keys. For example, we may need to create a
telephone directory where the name of a person is a first name-last name pair and the value
is the telephone number. Our job is to assign telephone numbers to these keys. Consider the
program to do this task –
telDir={}
for i in range(len(number)):
    telDir[names[i]]=number[i]
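A self-contained sketch of the same idea. The lists names and number are not shown in the text, so the values used here are assumptions made for illustration.

```python
# Assumed sample data: (first name, last name) tuples and phone numbers.
names = [('Tom', 'Cat'), ('Jerry', 'Mouse'), ('Donald', 'Duck')]
number = [3561, 4014, 9813]

telDir = {}
for i in range(len(number)):
    telDir[names[i]] = number[i]

# Each key is a tuple, so it can be unpacked while traversing.
for first, last in telDir:
    print(first, last, telDir[first, last])

# A list cannot be used as a key, because lists are not hashable.
try:
    telDir[['Mickey', 'Mouse']] = 1111
except TypeError:
    list_key_allowed = False
else:
    list_key_allowed = True
print(list_key_allowed)   # False
```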
1. Strings are more limited compared to other sequences like lists and tuples,
because the elements of a string must be characters only. Moreover, strings are
immutable. Hence, if we need to modify the characters in a sequence, it is better to
go for a list of characters than a string.
2. As lists are mutable, they are more commonly used than tuples. But, in some
situations as given below, tuples are preferable.
a. When we have a return statement from a function, it is better to use tuples
rather than lists.
b. When a dictionary key must be a sequence of elements, then we must use
immutable type like strings and tuples
c. When a sequence of elements is being passed to a function as arguments,
usage of tuples reduces unexpected behavior due to aliasing.
3. As tuples are immutable, methods like sort() and reverse() cannot be applied on
them. But, Python provides the built-in functions sorted() and reversed(), which take
a sequence as an argument. sorted() returns a new sorted list, and reversed() returns
an iterator over the elements in reverse order.
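Point 3 above can be sketched as follows. The tuple values and variable names are illustrative.

```python
t = (3, 1, 2)

sorted_list = sorted(t)            # a new list; t is unchanged
sorted_tuple = tuple(sorted(t))    # convert back if a tuple is needed
rev = tuple(reversed(t))           # reversed() gives an iterator

print(sorted_list)    # [1, 2, 3]
print(sorted_tuple)   # (1, 2, 3)
print(rev)            # (2, 1, 3)
print(t)              # (3, 1, 2)
```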