Python Unit-2 Notes
STRINGS:
A string is a sequence of characters enclosed in a pair of
single quotes ' ', double quotes " ", or triple quotes ''' '''.
About Unicode
chr() function
The chr() function returns the character (a one-character string)
whose Unicode code point is the given integer.
chr(i)
chr() Parameters
chr() takes a single argument: an integer i in the range 0 to
1,114,111 (0x10FFFF).
chr() returns:
a character (a string) whose Unicode code point is the integer i
If the integer i is outside this range, a ValueError is raised.
print(chr(97))
print(chr(65))
print(chr(1200))
Output
a
A
Ұ
print(chr(-1))
Output
ValueError: chr() arg not in range(0x110000)
ord() function
The ord() function returns the integer Unicode code point of a
single character.
ord(ch)
ord() Parameters
ord() takes a single argument: ch, a string of length 1.
print(ord('5')) # 53
print(ord('A')) # 65
print(ord('$')) # 36
Output
53
65
36
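Since ord() and chr() work in opposite directions, they can be combined. The following is only a small sketch; the variable names are illustrative:

```python
# ord() maps a character to its code point, chr() maps it back.
code = ord('A')              # 65
next_letter = chr(code + 1)  # the character one code point after 'A'
round_trip = chr(ord('x'))   # converting there and back gives the same character

print(code, next_letter, round_trip)
```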
my_string = "Hello"
print(my_string)
my_string = '''Hello'''
print(my_string)
# triple quotes string can extend multiple lines
my_string = '''Hello, welcome to
the world of Python'''
print(my_string)
In the above examples we have not declared any data type. Python
infers the type from the value itself, so any value written inside
quotes is a string (type str). Data entered from the keyboard with
input() is also always returned as a string.
Example:
Let us take the string as
str1='REVA' # Creating a string
Representation of string (each character with its index):
R E V A
0 1 2 3
str1='REVA'
print(len(str1))
Output:
4
Slicing a string
Slicing in Python is a feature that enables accessing parts of
sequences such as strings, tuples, and lists. Slices can also be used
to modify or delete items of mutable sequences such as lists. Strings
are immutable, so slicing a string always produces a new string.
Program:
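A minimal sketch of the slicing syntax sequence[start:stop:step], using the string 'REVA' from the earlier example. The variable names are only illustrative:

```python
str1 = 'REVA'

first_two = str1[0:2]      # characters at index 0 and 1
from_two = str1[2:]        # index 2 up to the end
last_char = str1[-1]       # negative indices count from the end
reversed_str = str1[::-1]  # a step of -1 walks the string backwards

print(first_two, from_two, last_char, reversed_str)
```

The stop index is excluded, which is why str1[0:2] contains two characters and not three.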
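Strings are immutable: a character cannot be changed or deleted in place.
Example:
my_string = 'REVA'
my_string[3] = 'a'
Output:
TypeError: 'str' object does not support item assignment
Example:
my_string = 'REVA'
del my_string[1]
Output:
TypeError: 'str' object doesn't support item deletion
The whole string can be deleted with del, after which the name no longer exists.
Example:
del my_string
my_string
Output:
NameError: name 'my_string' is not defined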
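Because a string cannot be modified in place, a changed version has to be built as a new string. A small sketch, assuming the goal is to lowercase the last letter or drop the second one:

```python
my_string = 'REVA'

# Build a new string from slices instead of assigning to an index.
changed = my_string[:3] + 'a'

# Drop the character at index 1 by joining the parts around it.
without_e = my_string[:1] + my_string[2:]

print(changed)      # REVa
print(without_e)    # RVA
print(my_string)    # the original is untouched: REVA
```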
Adjacent string literals placed inside parentheses are joined into one string.
Example:
s=('Hi '
'REVA'
' UNIVERSITY')
print(s)
Output:
Hi REVA UNIVERSITY
Example:
str1='SHIVA KUMAR'
for i in str1:
    print(i)
Using in as an operator
Example:
str1='REVA'
print('E' in str1) # True
print('e' in str1) # False: the test is case-sensitive
print('REV' in str1) # True
str2='ECE'
if 'EC' in str2:
    print('Found the String')
else:
    print('Not Found')
String Comparison
To compare two strings, Python compares them character by
character from left to right. Two strings are equal only if
they have the same length and every pair of corresponding
characters is the same. At the first mismatching position, the
Unicode values of the two characters are compared, and the
string with the larger value at that position is the greater
string. If one string ends before any mismatch is found, the
shorter string is the smaller one.
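The comparison rule can be checked with the ordinary comparison operators. This is only an illustrative sketch; the sample strings are made up:

```python
same = 'REVA' == 'REVA'          # True: every character matches
case_differs = 'REVA' == 'reva'  # False: 'R' (82) is not 'r' (114)
greater = 'REVA' > 'REDA'        # True: first mismatch is 'V' (86) vs 'D' (68)
upper_first = 'Z' < 'a'          # True: 90 < 97
prefix_smaller = 'REV' < 'REVA'  # True: a prefix sorts before the longer string

print(same, case_differs, greater, upper_first, prefix_smaller)
print(ord('V') - ord('D'))       # difference of the mismatching code points: 18
```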
capitalize()
The function converts the first character of the string to
uppercase and all remaining characters to lowercase.
Example:
str1='reva University'
str1.capitalize()
Output:
Reva university
lower()
The function is used to convert the entire string into
lowercase.
Example:
str1='ELECTRONICS'
str1.lower()
Output:
electronics
upper()
The function is used to convert the entire string into
uppercase.
Example:
str1='electronics'
str1.upper()
Output:
ELECTRONICS
center()
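The center() method takes two arguments: the total width of the
resulting string and an optional fill character (a space by default).
It returns the original string centered within that width.
Example:
str1='REVA'
str1.center(10)
Output:
'   REVA   '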
Example:
str1='REVA'
str1.center(10,'*')
Output:
'***REVA***'
count()
The function is used to count the number of occurrences of a
character or a substring in the string.
Example:
str1='REVA UNIVERSITY'
str1.count('E')
Output:
2
Example:
str1='REVA UNIVERSITY'
str1.count('REVA')
Output:
1
find()
The function returns an integer value:
- If the substring exists inside the string, it returns the index of
  the first occurrence of the substring.
- If the substring doesn't exist inside the string, it returns -1.
Example:
str1='REVA UNIVERSITY'
str1.find('A')
Output:
3
Example:
str1='REVA UNIVERSITY'
str1.find('UNI')
Output:
5
rfind()
The function returns the index of the last occurrence of the
substring, or -1 if the substring is not found.
Example:
str1='REVA UNIVERSITY'
str1.rfind('E')
Output:
9
strip()
The function returns a copy of the string with both the leading
and the trailing whitespace removed.
Example:
str1=' REVA UNIVERSITY '
str1.strip()
Output:
'REVA UNIVERSITY'
lstrip()
The function returns a copy of the string with leading
whitespace removed.
Example:
str1=' REVA UNIVERSITY '
str1.lstrip()
Output:
'REVA UNIVERSITY '
rstrip()
The function returns a copy of the string with trailing
whitespace removed.
Example:
str1=' REVA UNIVERSITY '
str1.rstrip()
Output:
' REVA UNIVERSITY'
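All three methods also accept an optional string of characters to remove in place of whitespace. A short sketch with a made-up input string:

```python
str1 = '***REVA UNIVERSITY***'

both = str1.strip('*')     # removes '*' from both ends
left = str1.lstrip('*')    # removes '*' from the left end only
right = str1.rstrip('*')   # removes '*' from the right end only

print(both)
print(left)
print(right)
```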
replace()
The replace() method returns a copy of the string where all
occurrences of a substring are replaced with another substring.
Example:
str1='reva UNIVERSITY'
str1.replace('reva','REVA')
Output:
'REVA UNIVERSITY'
title()
The title() method returns a string with first letter of each
word capitalized; a title cased string.
Example:
str1='reva university'
str1.title()
Output:
'Reva University'
split()
The split() method breaks up a string at the specified
separator and returns a list of strings.
If the separator is not specified, any run of whitespace (space,
newline etc.) is treated as a separator.
Example:
str1='REVA UNIVERSITY'
str1.split()
Output:
['REVA', 'UNIVERSITY']
Example:
str1='COMPUTER SCIENCE'
str1.split('E')
Output:
['COMPUT', 'R SCI', 'NC', '']
isalpha()
The isalpha() method returns True if all characters in the
string are alphabetic letters. If not, it returns False.
Example:
str1='REVA UNIVERSITY'
str1.isalpha()
Output:
False
The result is False because the space is not a letter; 'REVA'.isalpha() returns True.
isalnum()
The isalnum() method returns True if all characters in the
string are alphanumeric (either letters or numbers). If not,
it returns False.
Example:
str1='CS 1 ECE 2'
str1.isalnum()
Output:
False
The result is False because spaces are not alphanumeric; 'CS1ECE2'.isalnum() returns True.
islower()
The islower() method returns True if all letters in a string
are lowercase and there is at least one letter. If the string
contains at least one uppercase letter, it returns False.
Example:
str1='cse'
str1.islower()
Output:
True
isupper()
The isupper() method returns True if all letters in a string
are uppercase and there is at least one letter. If the string
contains at least one lowercase letter, it returns False.
Example:
str1='CSE'
str1.isupper()
Output:
True
isdigit()
The isdigit() method returns True if all characters in a string
are digits. If not, it returns False.
Example:
str1='2021'
str1.isdigit()
Output:
True
startswith()
The startswith() method returns True if a string starts with
the specified prefix(string). If not, it returns False.
Example:
str1='REVA UNIVERSITY'
str1.startswith('R')
Output:
True
endswith()
The endswith() method returns True if a string ends with the
specified suffix (string). If not, it returns False.
Example:
str1='REVA UNIVERSITY'
str1.endswith('Y')
Output:
True
casefold()
The casefold() method is an aggressive lower() method which
converts strings to case folded strings for caseless matching.
The casefold() method removes all case distinctions present
in a string. It is used for caseless matching, i.e. ignores cases
when comparing.
Example:
str1='REVA'
str1.casefold()
Output:
reva
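For plain English letters casefold() and lower() give the same result. The difference shows up with characters such as the German 'ß'. A small sketch, with a sample word chosen only for illustration:

```python
word = 'Straße'   # German for "street"; contains the letter 'ß'

lowered = word.lower()       # 'ß' is already lowercase, so it stays as it is
folded = word.casefold()     # casefold() expands 'ß' to 'ss'

print(lowered)
print(folded)
print(folded == 'STRASSE'.casefold())
```

Comparing the casefolded forms treats 'Straße' and 'STRASSE' as equal, which a comparison of the lower() forms would not.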
index()
The index() method returns the index of a substring inside
the string (if found). If the substring is not found, it raises a
ValueError exception.
The index() method is similar to the find() method for strings. The
only difference is that find() returns -1 if the
substring is not found, whereas index() raises an exception.
Example:
str1='REVA'
str1.index('R')
Output:
0
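The difference between find() and index() for a missing substring can be seen in a short sketch. The letter 'Z' is just a value known to be absent from 'REVA':

```python
str1 = 'REVA'

found_at = str1.find('Z')      # -1, no exception is raised

try:
    str1.index('Z')
    index_error = None
except ValueError as exc:
    index_error = exc          # index() raises instead of returning -1

print(found_at)
print(type(index_error).__name__)
```

find() suits a quick presence check, while index() suits code where a missing substring should be treated as an error.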
Regular Expressions
A Regular Expression (RegEx) is a sequence of characters that defines
a search pattern. For example,
^a...s$
The above code defines a RegEx pattern. The pattern is: any five
letter string starting with a and ending with s.
A pattern defined using RegEx can be used to match against a string.
Expression   String      Matched?
^a...s$      abs         No match
             alias       Match
             Alias       No match
             An abacus   No match
MetaCharacters
Metacharacters are characters that are interpreted in a special way
by a RegEx engine. Here's a list of metacharacters:
[] . ^ $ * + ? {} () \ |
[] - Square brackets
Square brackets specify a set of characters you wish to match.
Expression   String      Matched?
[abc]        a           1 match
             ac          2 matches
             Hey Jude    No match
             abc de ca   5 matches
Here, [abc] will match if the string you are trying to match contains
any of the a, b or c.
You can also specify a range of characters using - inside square
brackets.
[a-e] is the same as [abcde].
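The match counts for [abc] can be reproduced with Python's re module, which these notes introduce in a later section. This is a sketch that uses re.findall() to count matches:

```python
import re

pattern = '[abc]'

# findall() returns every non-overlapping match, so its length is the match count.
for text in ['a', 'ac', 'Hey Jude', 'abc de ca']:
    matches = re.findall(pattern, text)
    print(repr(text), len(matches), matches)

# A range inside the brackets is shorthand for listing each character.
range_matches = re.findall('[a-e]', 'abcdefg')
print(range_matches)
```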
. - Period
A period matches any single character (except newline '\n').
Expression   String   Matched?
..           a        No match
             ac       1 match
             acd      1 match
^ - Caret
The caret symbol ^ is used to check if a string starts with a certain
character.
Expression   String   Matched?
^a           a        1 match
             abc      1 match
             bac      No match
$ - Dollar
The dollar symbol $ is used to check if a string ends with a certain
character.
Expression   String    Matched?
a$           a         1 match
             formula   1 match
             cab       No match
* - Star
The star symbol * matches zero or more occurrences of the pattern
left to it.
Expression   String   Matched?
ma*n         mn       1 match
             man      1 match
             woman    1 match
+ - Plus
The plus symbol + matches one or more occurrences of the pattern
left to it.
Expression   String   Matched?
ma+n         man      1 match
             maaan    1 match
             woman    1 match
? - Question Mark
The question mark symbol ? matches zero or one occurrence of the
pattern left to it.
Expression   String   Matched?
ma?n         mn       1 match
             man      1 match
             woman    1 match
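The three quantifiers *, + and ? differ only in how many repetitions they allow. The sketch below compares them side by side; the patterns ma*n, ma+n and ma?n and the word list are illustrative choices:

```python
import re

# Sample strings: 'mn' has zero a's, 'man' has one, 'maaan' has three.
words = ['mn', 'man', 'maaan']

star = [w for w in words if re.search('ma*n', w)]      # zero or more 'a'
plus = [w for w in words if re.search('ma+n', w)]      # one or more 'a'
optional = [w for w in words if re.search('ma?n', w)]  # zero or one 'a'

print(star)
print(plus)
print(optional)
```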
{} - Braces
Consider this code: {n,m}. This means at least n, and at
most m repetitions of the pattern left to it.
Expression String Matched?
Let's try one more example. The RegEx [0-9]{2,4} matches at least 2
digits but not more than 4 digits. Do not put a space after the comma:
in Python, {2, 4} is not treated as a repetition count.
Expression   String          Matched?
[0-9]{2,4}   ab123csde       1 match (at 123)
             12 and 345673   3 matches (12, 3456, 73)
             1 and 2         No match
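The rows for [0-9]{2,4} can be checked with re.findall(). This is a minimal sketch using Python's re module, which is covered later in these notes:

```python
import re

pattern = '[0-9]{2,4}'   # two to four consecutive digits

for text in ['ab123csde', '12 and 345673', '1 and 2']:
    print(repr(text), re.findall(pattern, text))

# With a space after the comma the braces are read as literal characters,
# so this pattern finds nothing in the same text.
spaced = re.findall('[0-9]{2, 4}', '12 and 345673')
print(spaced)
```

The second string yields 3456 and then 73 because the quantifier is greedy and takes as many digits as it is allowed before the next match starts.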
| - Alternation
Vertical bar | is used for alternation (or operator).
Expression String Matched?
cde No match
() - Group
Parentheses () are used to group sub-patterns. For example,
(a|b|c)xz matches any string that contains either a or b or c
followed by xz.
Expression   String   Matched?
(a|b|c)xz    ab xz    No match
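A short sketch of how the group (a|b|c)xz behaves in Python. The second test string, 'abxz', is a made-up example of an input that does match:

```python
import re

pattern = '(a|b|c)xz'

no_match = re.search(pattern, 'ab xz')    # the space separates 'b' from 'xz'
match = re.search(pattern, 'abxz')        # 'b' is directly followed by 'xz'

print(no_match)
print(match.group())     # the whole match
print(match.group(1))    # only the part captured by the parentheses
```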
Special Sequences
Special sequences make commonly used patterns easier to write.
Here's a list of special sequences:
football Match
afootball No match
football No match
afootball Match
ab 2 matches (at a b)
\S
No match
Python RegEx
Python has a module named re to work with regular expressions. To
use it, we need to import the module.
import re
s = re.search(pattern, str)
import re
string = 'Python is fun' # sample text; assumed, since the notes do not define it
s = re.search('Python', string)
if s:
    print("pattern found inside the string")
else:
    print("pattern not found")
>>> s.start()
0
>>> s.end()
6
>>> s.span()
(0, 6)
>>> s.group()
'Python'
re.match()
import re
pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)
if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")
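re.match() only looks for the pattern at the beginning of the string, whereas re.search() scans the whole string. A small sketch of the difference, with a sample string chosen for illustration:

```python
import re

text = 'An abacus'   # sample string

at_start = re.match('abacus', text)     # None: the string does not begin with 'abacus'
anywhere = re.search('abacus', text)    # finds 'abacus' starting at index 3
full = re.fullmatch('An abacus', text)  # the pattern must cover the entire string

print(at_start)
print(anywhere.span())
print(full is not None)
```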
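re.sub()
The re.sub() method returns a copy of the string in which the matches
of the pattern are replaced by the given replacement. An optional count
limits the number of replacements.
Syntax:
re.sub(pattern, replacement, string)
Example1:
re.sub('^a','b','aaa')
Output:
'baa'
Example2:
s=re.sub('a','b','aaa')
print(s)
Output:
bbb
Example3:
s=re.sub('a','b','aaa',count=2)
print(s)
Output:
bba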
re.subn()
The re.subn() method is similar to re.sub() except that it returns a tuple
of 2 items containing the new string and the number of substitutions made.
Example1:
s=re.subn('a','b','aaa')
print(s)
Output:
('bbb', 3)
re.findall()
The re.findall() method returns a list of strings containing all
matches.
Syntax:
re.findall(pattern, string)
Example1:
s=re.findall('a','abab')
print(s)
Output:
['a', 'a']
re.split()
The re.split method splits the string where there is a match and
returns a list of strings where the splits have occurred.
You can pass maxsplit argument to the re.split() method. It's the
maximum number of splits that will occur.
Syntax:
re.split(pattern, string)
Example1:
s=re.split('a','abab')
print(s)
Output:
['', 'b', 'b']
Example2:
s=re.split('a','aababa',maxsplit=3)
print(s)
Output:
['', '', 'b', 'ba']
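import re
pattern=r'\w+' # one or more letters, digits or underscores
s1='shiva'
s2='sachin1'
s3='virat2'
a=re.search(pattern,s1)
b=re.search(pattern,s2)
c=re.search(pattern,s3)
print(a) # <re.Match object; span=(0, 5), match='shiva'>
print(b) # <re.Match object; span=(0, 7), match='sachin1'>
print(c) # <re.Match object; span=(0, 6), match='virat2'>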
# optional prefix 0 or 91, then a digit from 6 to 9, then nine more digits
pattern='(0|91)?[6-9][0-9]{9}'
p1='9731822325'
a=re.search(pattern,p1)
if a:
    print("Search is successful")
else:
    print("Search is unsuccessful")
import re
pattern=r'(\w)+@(\w)+\.(com)'
email='john_123@gmail.com'
s1=re.search(pattern,email)
if s1:
    print("Search is successful")
else:
    print("Unsuccessful")
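re.search() reports success as soon as any part of the input fits the pattern, so an over-long phone number would still pass. One possible way to validate the whole input is re.fullmatch(). The sketch below reuses the phone pattern from above; the email pattern is a simplified form, and the helper is_valid is a hypothetical name:

```python
import re

phone_pattern = r'(0|91)?[6-9][0-9]{9}'
email_pattern = r'\w+@\w+\.com'   # simplified form of the email pattern

def is_valid(pattern, text):
    """Return True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(is_valid(phone_pattern, '9731822325'))              # True
print(is_valid(phone_pattern, '97318223259999'))          # False: too many digits
print(bool(re.search(phone_pattern, '97318223259999')))   # True: search accepts a partial match
print(is_valid(email_pattern, 'john_123@gmail.com'))      # True
```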
CASE STUDY
Street Addresses: In this case study, we will take one street address
as input and try to perform some operations on the input by making
use of library functions.
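Example:
import re
str1='100 NORTH MAIN ROAD' # sample street address
str1.replace('ROAD','RD')
Output:
'100 NORTH MAIN RD'
str1.replace('NORTH','NRTH')
Output:
'100 NRTH MAIN ROAD'
re.sub('ROAD','RD',str1)
Output:
'100 NORTH MAIN RD'
re.sub('NORTH','NRTH',str1)
Output:
'100 NRTH MAIN ROAD'
re.split('A',str1)
Output:
['100 NORTH M', 'IN RO', 'D']
re.findall('O',str1)
Output:
['O', 'O']
re.sub('^1','2',str1)
Output:
'200 NORTH MAIN ROAD'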
Roman Numerals
I=1
V=5
X = 10
L = 50
C = 100
D = 500
M = 1000
Ex1:
1940
MCMXL
Ex2:
1946
MCMXLVI
Ex3:
1940
MCMXL
Ex4:
1888
MDCCCLXXXVIII
1000=M
2000=MM
3000=MMM
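Example:
pattern = '^M?M?M?$'
re.search(pattern, 'M')
Output:
<re.Match object; span=(0, 1), match='M'>
re.search(pattern, 'MM')
Output:
<re.Match object; span=(0, 2), match='MM'>
re.search(pattern, 'MMM')
Output:
<re.Match object; span=(0, 3), match='MMM'>
The following calls find no match and return None:
re.search(pattern, 'ML')
re.search(pattern, 'MX')
re.search(pattern, 'MI')
re.search(pattern, 'MMMM')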
100=C
200=CC
300=CCC
400=CD
500=D
600=DC
700=DCC
800=DCCC
900=CM
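Example:
pattern = '^M?M?M?(CM|CD|D?C?C?C?)$'
re.search(pattern,'MCM')
Output:
<re.Match object; span=(0, 3), match='MCM'>
re.search(pattern,'MD')
Output:
<re.Match object; span=(0, 2), match='MD'>
re.search(pattern,'MMMCCC')
Output:
<re.Match object; span=(0, 6), match='MMMCCC'>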
Example:
pattern='^M{0,3}$'
re.search(pattern,'MM')
Output:
<re.Match object; span=(0, 2), match='MM'>
re.search(pattern,'MMM')
Output:
<re.Match object; span=(0, 3), match='MMM'>
1=I
2=II
3=III
4=IV
5=V
6=VI
7=VII
8=VIII
9=IX
10=X
20=XX
30=XXX
40=XL
50=L
60=LX
70=LXX
80=LXXX
90=XC
Example:
pattern='^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'
re.search(pattern,'MDLVI')
Output:
<re.Match object; span=(0, 5), match='MDLVI'>
re.search(pattern,'MCMXLVI')
Output:
<re.Match object; span=(0, 7), match='MCMXLVI'>
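The full pattern can be wrapped in a small checking function. This is only a sketch: the function name is_roman and the constant ROMAN_PATTERN are hypothetical, and the pattern is written with {0,3} in place of the repeated ? items, following the '^M{0,3}$' example:

```python
import re

# Thousands, hundreds, tens and ones, in that order.
ROMAN_PATTERN = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'

def is_roman(numeral):
    """Return True if numeral is a well-formed Roman numeral up to 3999."""
    # The pattern alone also accepts the empty string, so reject that first.
    return bool(numeral) and re.search(ROMAN_PATTERN, numeral) is not None

for candidate in ['MCMXL', 'MCMXLVI', 'MDCCCLXXXVIII', 'MMMM', 'IIII']:
    print(candidate, is_roman(candidate))
```

The first three candidates are the worked examples from the notes and are accepted. 'MMMM' and 'IIII' are rejected because each symbol may repeat at most three times.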