Python Unit 3
Dr. Anand M
Assistant Professor
Python - Modules
• A module allows you to logically organize your Python code. Grouping related code into a module makes the
code easier to understand and use.
• A module is a Python object with arbitrarily named attributes that you can bind and reference.
• Simply, a module is a file consisting of Python code. A module can define functions, classes, and variables. A
module can also include runnable code.
• Example:
• The Python code for a module named aname normally resides in a file named aname.py. Here's an example of a
simple module, hello.py
def print_func(par):
    print("Hello :", par)
    return
The import Statement:
• You can use any Python source file as a module by executing an import statement in some other Python source
file. import has the following syntax:
import module1[, module2[, ... moduleN]]
• When the interpreter encounters an import statement, it imports the module if the module is present in the
search path. A search path is a list of directories that the interpreter searches before importing a module.
• Example:
import hello
hello.print_func("Zara")
This produces the following result:
Hello : Zara
• A module is loaded only once, regardless of the number of times it is imported. This prevents the module
execution from happening over and over again if multiple imports occur.
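To see the "loaded only once" behaviour, the sketch below writes a throwaway module to a temporary directory, imports it twice, and captures what gets printed. The module name hello_once and the temporary-directory setup are assumptions made only so the example is self-contained; with a real hello.py on the search path you would simply write import hello.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Write a small module into a fresh temporary directory.
# "hello_once" is a hypothetical module name used only for this demo.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "hello_once.py"), "w") as f:
    f.write('print("top-level code of hello_once is running")\n'
            "def print_func(par):\n"
            '    print("Hello :", par)\n')

sys.path.insert(0, tmpdir)      # make the directory part of the module search path
importlib.invalidate_caches()   # the file was created while this program is running

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import hello_once               # first import: the file is found and its top-level code runs
    import hello_once as again      # second import: the cached module object is reused
    hello_once.print_func("Zara")

print(captured.getvalue())            # the top-level message appears only once
print(hello_once is again)            # True: both names refer to the same module object
print("hello_once" in sys.modules)    # True: loaded modules are cached in sys.modules
```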
The from...import * Statement:
It is also possible to import all names from a module into the current namespace by using the following import
statement:
from modname import *
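• This provides an easy way to import all the items from a module into the current namespace; however, this
statement should be used sparingly, because the imported names can silently overwrite names that already exist in the current namespace (including built-in functions).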
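One concrete reason for caution: the math module defines its own pow(), so a star import quietly replaces the built-in pow() for the rest of the module. The short script below (run at module level, where from ... import * is allowed) shows the result type changing from int to float.

```python
# The built-in pow() works on integers and returns an int.
before = pow(2, 3)
print(before, type(before).__name__)   # 8 int

from math import *   # also brings in math.pow, which now hides the built-in pow

after = pow(2, 3)
print(after, type(after).__name__)     # 8.0 float
```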
Locating Modules:
When you import a module, the Python interpreter searches for the module in the following sequence:
• The directory containing the input script (or the current directory when Python is run interactively).
• If the module isn't found, Python then searches each directory in the environment variable PYTHONPATH.
• If all else fails, Python checks the installation-dependent default path. On UNIX, this default path is typically something like /usr/local/lib/pythonX.Y/, where X.Y is the Python version.
The module search path is stored in the system module sys as the sys.path variable. The sys.path variable contains
the script's directory (or the current directory), PYTHONPATH, and the installation-dependent default.
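Because sys.path is an ordinary Python list, it can be inspected and extended at run time. A minimal sketch; the directory name /path/to/my_modules is a made-up placeholder, not a real location.

```python
import sys

# sys.path is a plain list of strings, searched from first to last.
for position, directory in enumerate(sys.path):
    # An empty string stands for the current working directory.
    print(position, directory or "(current directory)")

# The list can be changed at run time; later imports will also search here.
extra_dir = "/path/to/my_modules"   # hypothetical directory, for illustration only
if extra_dir not in sys.path:
    sys.path.append(extra_dir)
```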
The PYTHONPATH Variable:
• The PYTHONPATH is an environment variable, consisting of a list of directories. The syntax of
PYTHONPATH is the same as that of the shell variable PATH.
• Here is a typical PYTHONPATH from a Windows system:
set PYTHONPATH=c:\python20\lib;
• And here is a typical PYTHONPATH from a UNIX system (bash syntax):
export PYTHONPATH=/usr/local/lib/python
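To check that PYTHONPATH really feeds into sys.path, the sketch below starts a fresh interpreter with PYTHONPATH pointing at a temporary directory and asks it for its search path. The temporary directory merely stands in for a directory of your own modules.

```python
import json
import os
import subprocess
import sys
import tempfile

extra_dir = tempfile.mkdtemp()      # stands in for a directory of your own modules

env = dict(os.environ)
# Several directories would be joined with os.pathsep (":" on UNIX, ";" on Windows).
env["PYTHONPATH"] = extra_dir

# Run a new interpreter and have it report its sys.path as JSON.
result = subprocess.run(
    [sys.executable, "-c", "import sys, json; print(json.dumps(sys.path))"],
    env=env, capture_output=True, text=True, check=True,
)
child_path = json.loads(result.stdout)
print(extra_dir in child_path)      # True: the PYTHONPATH entry is on the child's search path
```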
Namespaces and Scoping:
• Variables are names (identifiers) that map to objects. A namespace is a dictionary of variable names (keys) and their
corresponding objects (values).
• A Python statement can access variables in a local namespace and in the global namespace. If a local and a global variable have
the same name, the local variable shadows the global variable.
• Each function has its own local namespace. Class methods follow the same scoping rule as ordinary functions.
• Python makes educated guesses on whether variables are local or global. It assumes that any variable assigned a value in a
function is local.
• Therefore, in order to assign a value to a global variable within a function, you must first use the global statement.
• The statement global VarName tells Python that VarName is a global variable. Python stops searching the local namespace for
the variable.
• For example, we define a variable Money in the global namespace. Within the function AddMoney, we assign Money a value,
therefore Python assumes Money is a local variable. However, we access the value of the local variable Money before setting it,
so an UnboundLocalError is the result. Uncommenting the global statement fixes the problem.
Example:
Money = 2000
def AddMoney():
    # Uncomment the following line to fix the code:
    # global Money
    Money = Money + 1

print(Money)
AddMoney()
print(Money)
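The broken and fixed versions can be compared side by side. This sketch uses two hypothetical function names (add_money_broken and add_money) so that both variants live in one script; the error is caught so the script keeps running.

```python
Money = 2000

def add_money_broken():
    # Money is assigned below, so Python treats it as local to this function;
    # reading it on the right-hand side happens before any local value exists.
    Money = Money + 1
    return Money

def add_money():
    global Money          # refer to the module-level Money instead of creating a local one
    Money = Money + 1
    return Money

error_name = None
try:
    add_money_broken()
except UnboundLocalError as exc:
    error_name = type(exc).__name__
    print("UnboundLocalError:", exc)

print(Money)   # 2000: the failed call changed nothing
add_money()
print(Money)   # 2001
```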
The dir( ) Function:
• The dir() built-in function returns a sorted list of strings containing the names defined by a module.
• The list contains the names of all the modules, variables, and functions that are defined in a module.
• Example:
import math
content = dir(math)
print(content)
• This produces output like the following (the exact list depends on the Python version):
['__doc__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh',
'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees',
'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod',
'frexp', 'fsum', 'gamma', 'hypot', 'isinf', 'isnan', 'ldexp', 'lgamma',
'log', 'log10', 'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh',
'sqrt', 'tan', 'tanh', 'trunc']
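Since dir() returns a plain sorted list, it is easy to filter. The sketch below keeps only the public names of math and also shows that dir() called without an argument lists the names in the current local scope.

```python
import math

names = dir(math)                 # a sorted list of strings
print(names == sorted(names))     # True

# Names starting with an underscore (such as __name__ and __doc__) are module internals;
# filtering them out leaves the public functions and constants.
public = [name for name in names if not name.startswith("_")]
print(public[:5])

def demo():
    x = 1
    y = 2
    return dir()                  # no argument: names in the current local scope

print(demo())                     # ['x', 'y']
```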
The globals() and locals() Functions:
• The globals() and locals() functions can be used to return the names in the global and local
namespaces depending on the location from where they are called.
• If locals() is called from within a function, it will return all the names that can be accessed locally
from that function.
• If globals() is called from within a function, it will return all the names that can be accessed
globally from that function.
• Both functions return a dictionary. Therefore, names can be extracted using the dictionary's
keys() method.
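A small sketch contrasting the two namespaces from inside a function. The names counter and show_namespaces are invented for the example; the locals() result is copied with dict() so the snapshot stays fixed regardless of Python version.

```python
counter = 10   # a module-level (global) name

def show_namespaces(a, b=2):
    total = a + b
    local_names = dict(locals())    # snapshot of this call's local namespace
    global_names = globals()        # the module's global namespace (a dict)
    print(sorted(local_names.keys()))     # ['a', 'b', 'total']
    print("counter" in global_names)      # True
    print("total" in global_names)        # False: total only exists locally
    return sorted(local_names.keys())

keys = show_namespaces(1)
```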
The reload() Function:
• When the module is imported into a script, the code in the top-level portion of a module is executed
only once.
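• Therefore, if you want to re-execute the top-level code in a module, you can use the reload() function.
The reload() function imports a previously imported module again. In Python 2, reload() is a built-in function;
in Python 3 it is provided by the importlib module as importlib.reload().
• Syntax:
The syntax of the reload() function is this:
reload(module_name)              # Python 2
importlib.reload(module_name)    # Python 3 (after import importlib)
Here module_name is the module object you want to reload, not a string containing the
module name. For example, to reload the hello module, do the following:
reload(hello)                    # Python 2
importlib.reload(hello)          # Python 3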
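The following Python 3 sketch edits a module's source file after it has been imported, shows that a second import statement does not pick up the change, and then reloads it. The module name greeting_demo and its MESSAGE variable are hypothetical; bytecode writing is switched off so the quickly rewritten source is always re-read.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True   # no .pyc cache, so the edited source is always re-read

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "greeting_demo.py")   # hypothetical module name
with open(path, "w") as f:
    f.write('MESSAGE = "first version"\n')

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
import greeting_demo
first_seen = greeting_demo.MESSAGE          # 'first version'

# Edit the module's source file on disk.
with open(path, "w") as f:
    f.write('MESSAGE = "second, edited version"\n')

import greeting_demo                        # no effect: the module is already loaded
after_import = greeting_demo.MESSAGE        # still 'first version'

importlib.reload(greeting_demo)             # re-executes the module's top-level code
after_reload = greeting_demo.MESSAGE        # 'second, edited version'
print(first_seen, "|", after_import, "|", after_reload)
```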
Packages in Python:
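A package is a directory of Python modules (and possibly sub-packages) that can be imported as a single unit; a regular package directory contains an __init__.py file.
Example:
Consider a file Pots.py in the Phone directory. This file contains the following source code:
def Pots():
    print("I'm Pots Phone")
In a similar way, there are two more files, each defining a function with the same name as the file: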
Phone/Isdn.py file having function Isdn()
Phone/G3.py file having function G3()
Now create one more file, __init__.py, in the Phone directory:
Phone/__init__.py
To make all of your functions available when you've imported Phone, you need to put explicit import
statements in __init__.py as follows:
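from .Pots import Pots
from .Isdn import Isdn
from .G3 import G3
(The leading dot makes these explicit relative imports within the package. Python 2 also accepted the implicit form from Pots import Pots, which Python 3 no longer supports.)
Note: The application file (e.g. package.py) that runs import Phone should be saved in the parent directory of the Phone sub-directory. After import Phone, the functions can be called as Phone.Pots(), Phone.Isdn() and Phone.G3().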
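The whole Phone layout can be built and imported in one self-contained script. This is a sketch under stated assumptions: the package is created in a temporary directory (playing the role of the parent directory), and the messages printed by Isdn() and G3() are made up, since the slides only show the body of Pots().

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

root = tempfile.mkdtemp()            # plays the role of the parent directory
pkg = os.path.join(root, "Phone")
os.mkdir(pkg)

# One module per phone type, each defining a function named after its file.
# The Isdn and G3 messages are hypothetical.
for name, text in [("Pots", "I'm Pots Phone"), ("Isdn", "I'm ISDN Phone"), ("G3", "I'm 3G Phone")]:
    with open(os.path.join(pkg, name + ".py"), "w") as f:
        f.write(f"def {name}():\n    print({text!r})\n")

# __init__.py re-exports the functions using explicit relative imports.
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from .Pots import Pots\nfrom .Isdn import Isdn\nfrom .G3 import G3\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
import Phone

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    Phone.Pots()
    Phone.Isdn()
    Phone.G3()
print(buffer.getvalue())
```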
The Python Libraries for data processing, data mining and visualization
• A library is a collection of pre-written code that can be reused to reduce the time required to write
programs. Libraries are particularly useful for accessing frequently used code instead of
writing it from scratch every single time. Like physical libraries, they are collections of
reusable resources. This is the idea behind the numerous open-source libraries available in Python.
• A Python library is a collection of modules containing functions and
classes that other programs can use to perform various tasks.
What is NumPy?
• NumPy, SciPy, and Matplotlib provide MATLAB-like functionality in Python.
• NumPy features:
• Typed multidimensional arrays (matrices)
• Fast numerical computations (matrix math)
• High-level math functions
• NumPy is one of the most essential Python Libraries for scientific
computing and it is used heavily for the applications of Machine Learning
and Deep Learning. NumPy stands for NUMerical PYthon. Machine learning
algorithms are computationally complex and require multidimensional
array operations. NumPy provides support for large multidimensional array
objects and various tools to work with them.
Why do we need NumPy
• Python does numerical computations slowly.
• 1000 x 1000 matrix multiply
• Python triple loop takes > 10 min.
• Numpy takes ~0.03 seconds
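The gap can be reproduced on a smaller scale. The sketch below multiplies two 100 x 100 matrices with a textbook triple loop and with NumPy, and checks that the results agree; the size is an assumption chosen so the pure-Python version finishes quickly, and the printed timings will vary by machine.

```python
import time

import numpy as np

n = 100  # kept small so the pure-Python loop finishes in well under a minute
np.random.seed(0)
A = np.random.random((n, n))
B = np.random.random((n, n))

def matmul_loops(X, Y):
    """Textbook triple-loop matrix multiply on Python lists."""
    X, Y = X.tolist(), Y.tolist()
    rows, inner, cols = len(X), len(Y), len(Y[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s = 0.0
            for k in range(inner):
                s += X[i][k] * Y[k][j]
            C[i][j] = s
    return C

t0 = time.perf_counter()
C_loops = matmul_loops(A, B)
t1 = time.perf_counter()
C_numpy = A @ B                     # NumPy matrix multiplication
t2 = time.perf_counter()

print(f"triple loop: {t1 - t0:.3f} s, NumPy: {t2 - t1:.5f} s")
print(np.allclose(C_loops, C_numpy))  # True: same result, very different speed
```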
Logistics: Versioning
• In this class, your code will be tested with:
• Python 2.7.6
• Numpy version: 1.8.2
• Scipy version: 0.13.3
• OpenCV version: 2.4.8
NumPy Overview
1. Arrays
2. Shaping and transposition
3. Mathematical Operations
4. Indexing and slicing
5. Broadcasting
Arrays
Structured lists of numbers.
• Vectors
• Matrices
• Images
• Tensors
• ConvNets
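Each item in the list above is just an array with a different number of dimensions. A quick sketch; the image size and the batch layout (batch x channels x height x width) are illustrative assumptions, since frameworks differ in the axis order they expect.

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])      # 1-D, shape (3,)
matrix = np.array([[1, 2], [3, 4]])     # 2-D, shape (2, 2)
image = np.zeros((480, 640, 3))         # 3-D: height x width x channels (illustrative size)
batch = np.zeros((32, 3, 64, 64))       # 4-D tensor, e.g. a batch of images for a ConvNet

for name, arr in [("vector", vector), ("matrix", matrix), ("image", image), ("batch", batch)]:
    print(name, arr.ndim, arr.shape)
```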
Arrays, Basic Properties
import numpy as np
a = np.array([[1,2,3],[4,5,6]], dtype=np.float32)
print(a.ndim, a.shape, a.dtype)   # 2 (2, 3) float32
Arrays, creation
• np.ones, np.zeros
• np.arange
• np.concatenate
• a.astype (ndarray method that converts to another dtype)
• np.zeros_like, np.ones_like
• np.random.random
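One line per creation routine from the list above, with the resulting shape or dtype noted in the comments. The particular shapes and dtypes are arbitrary choices for illustration.

```python
import numpy as np

z = np.zeros((2, 3))                     # 2x3 array of 0.0 (float64 by default)
o = np.ones((2, 3), dtype=np.int32)      # 2x3 array of 1, with an explicit dtype
r = np.arange(6)                         # [0, 1, 2, 3, 4, 5]
c = np.concatenate([r, r])               # join along an existing axis: 12 elements
f = r.astype(np.float32)                 # a converted copy with a different dtype
zl = np.zeros_like(f)                    # zeros with the same shape and dtype as f
ol = np.ones_like(o)                     # ones with the same shape and dtype as o
u = np.random.random((2, 2))             # uniform random samples in [0.0, 1.0)

print(z.shape, o.dtype, r.tolist(), c.shape, f.dtype, zl.dtype, ol.shape, u.shape)
```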
Arrays, danger zone
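• Must be dense, no holes.
• Must be one type (a single dtype for all elements).
• Cannot combine arrays of incompatible shapes (element-wise operations need equal or broadcast-compatible shapes).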
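Two of these rules in action: mixed inputs are converted to one common dtype, and an element-wise operation on incompatible shapes raises an error instead of guessing.

```python
import numpy as np

# One type: mixed inputs are converted to a single common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                 # float64: the integers were converted to floats
as_text = np.array([1, "a"])
print(as_text.dtype.kind)          # 'U': everything became a (Unicode) string

# Shapes must be compatible: lengths 3 and 4 cannot be combined element-wise.
shape_error = None
try:
    np.ones(3) + np.ones(4)
except ValueError as exc:
    shape_error = str(exc)
    print("ValueError:", exc)
```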
Shaping
a = np.array([1,2,3,4,5,6])
a = a.reshape(3,2)
a = a.reshape(2,-1)
a = a.ravel()
1. Total number of elements cannot change.
2. Use -1 to infer axis shape
3. Row-major by default (MATLAB is column-major)
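The same reshape steps with their results spelled out, plus an order="F" reshape for comparison with MATLAB's column-major filling and a reshape that fails because the element count would change.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape(3, 2)             # filled row by row: [[1, 2], [3, 4], [5, 6]]
c = b.reshape(2, -1)            # -1 is inferred as 6 // 2 = 3, so c has shape (2, 3)
d = c.ravel()                   # flattened back to 1-D: [1, 2, 3, 4, 5, 6]
f = a.reshape(3, 2, order="F")  # column-major (MATLAB-style) fill: [[1, 4], [2, 5], [3, 6]]

print(b.tolist(), c.shape, d.tolist(), f.tolist())

try:
    a.reshape(4, 2)             # 8 elements requested, but a only has 6
except ValueError as exc:
    print("ValueError:", exc)
```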
Return values
• NumPy functions return either views or copies.
• Views share data with the original array, like references in Java/C++.
Altering entries of a view changes the same entries in the original.
• The NumPy documentation says which functions return views or
copies.
• np.copy (or a.copy()) makes an explicit copy; a.view() makes an explicit view.
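A short demonstration of the difference: basic slicing returns a view, so writes show up in the original, while copy() produces independent data. np.shares_memory and the .base attribute confirm which is which.

```python
import numpy as np

a = np.arange(6)
v = a[1:4]            # basic slicing returns a view
v[0] = 100            # ...so writing through the view changes a as well
print(a.tolist())     # [0, 100, 2, 3, 4, 5]

c = a.copy()          # an explicit, independent copy
c[0] = -1
print(a[0], c[0])     # 0 -1

print(np.shares_memory(a, v), np.shares_memory(a, c))  # True False
print(v.base is a)    # True: a view keeps a reference to the array that owns the data
```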
Transposition
a = np.arange(10).reshape(5,2)
a = a.T
a = a.transpose((1,0))
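For 2-D arrays, .T and transpose((1, 0)) are the same; for higher-dimensional arrays transpose takes the new axis order, which is handy for moving an image's channel axis to the front. The image size here is arbitrary.

```python
import numpy as np

a = np.arange(10).reshape(5, 2)
print(a.T.shape)                                   # (2, 5)
print(np.array_equal(a.T, a.transpose((1, 0))))    # True for 2-D arrays

img = np.zeros((480, 640, 3))        # height x width x channels (illustrative size)
chw = img.transpose((2, 0, 1))       # new axis order: channels, height, width
print(chw.shape)                     # (3, 480, 640)
```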
Saving and loading arrays
np.savez('data.npz', a=a)
data = np.load('data.npz')
a = data['a']
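A round trip with two arrays, written to a temporary directory so the example leaves nothing behind. The keyword names passed to np.savez become the keys inside the .npz archive, and the loaded file is used in a with block so it is closed afterwards.

```python
import os
import tempfile

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0.0, 1.0, 5)

path = os.path.join(tempfile.mkdtemp(), "data.npz")
np.savez(path, a=a, b=b)        # keyword names become the keys inside the archive

with np.load(path) as data:     # the loaded NpzFile closes its file at the end of the block
    keys = sorted(data.files)   # ['a', 'b']
    a2 = data["a"]
    b2 = data["b"]

print(keys, np.array_equal(a, a2), np.array_equal(b, b2))  # ['a', 'b'] True True
```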
Image arrays
Images are 3D arrays: height, width, and channels
Common image formats:
height x width x RGB (band-interleaved)
height x width (band-sequential)
Gotchas:
Channels may also be BGR (OpenCV does this)
May be [width x height], not [height x width]
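Converting between BGR and RGB only requires reversing the channel axis. The tiny all-blue image below is made-up data; the reversed array is a view, so no pixel data is copied.

```python
import numpy as np

h, w = 2, 3
bgr = np.zeros((h, w, 3), dtype=np.uint8)   # height x width x channels
bgr[..., 0] = 255                           # channel 0 is blue in BGR order: a pure blue image

rgb = bgr[:, :, ::-1]                       # reverse the channel axis: BGR -> RGB (a view)
print(bgr.shape, rgb.shape)                 # (2, 3, 3) (2, 3, 3)
print(rgb[0, 0].tolist())                   # [0, 0, 255]: blue is now the last channel
```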
Saving and Loading Images
scikit-image: skimage.io.imread, skimage.io.imsave
height x width x RGB
PIL / Pillow: PIL.Image.open, Image.save
sizes are reported as (width, height); np.asarray(image) gives height x width x RGB
OpenCV: cv2.imread, cv2.imwrite
height x width x BGR
Recap
We just saw how to create arrays, reshape them, and permute axes
Questions so far?
Mathematical operators
• Arithmetic operations are element-wise
• Comparison and logical operators return a bool array
• In-place operations (such as +=) modify the array itself
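Each of the three rules above on two small arrays. Note that * is element-wise here, not a matrix product, and that boolean masks are combined with &, | and ~ rather than and/or/not.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

total = a + b                  # element-wise: [11, 22, 33]
product = a * b                # element-wise, not a matrix product: [10, 40, 90]
mask = a > 1                   # comparisons give bool arrays: [False, True, True]
both = (a > 1) & (b < 30)      # combine masks with &, | and ~: [False, True, False]

before = a
a += 5                         # in-place: the existing array object is modified
print(total.tolist(), product.tolist(), mask.tolist(), both.tolist())
print(before is a, before.tolist())   # True [6, 7, 8]
```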
Math, upcasting
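Just as in Python and Java, the operands of a math operator are cast to a common, more general or
more precise datatype, and the result has that type.
uint64 + uint16 => uint64
float32 / int16 => float32
float32 / int32 => float64 (float32 cannot represent every int32 value exactly)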
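np.promote_types reports the common type NumPy chooses for two dtypes, which makes the rules above easy to check. The arrays below are arbitrary test data.

```python
import numpy as np

print(np.promote_types(np.uint64, np.uint16))   # uint64
print(np.promote_types(np.float32, np.int16))   # float32
print(np.promote_types(np.float32, np.int32))   # float64

x = np.ones(3, dtype=np.float32)
y = np.full(3, 2, dtype=np.int32)
ratio = x / y                                   # array-array operation uses the promoted type
print(ratio.dtype, ratio.tolist())              # float64 [0.5, 0.5, 0.5]
```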
Math, universal functions
Also called ufuncs
Element-wise
Examples:
• np.exp
• np.sqrt
• np.sin
• np.cos
• np.isnan
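Each listed ufunc applied to a small array; the function is evaluated element by element and returns an array of the same shape. Results of sin and cos are compared approximately because pi/2 is not exact in floating point.

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0])
roots = np.sqrt(x)                              # [0.0, 1.0, 2.0]
exps = np.exp(np.array([0.0, 1.0]))             # [1.0, 2.718...]
angles = np.array([0.0, np.pi / 2])
sines = np.sin(angles)                          # approximately [0.0, 1.0]
cosines = np.cos(angles)                        # approximately [1.0, 0.0]
nan_mask = np.isnan(np.array([1.0, np.nan]))    # [False, True]
print(roots, exps, sines, cosines, nan_mask)
```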
Indexing
x[0,0] # top-left element
x[0,-1] # first row, last column
x[0,:] # first row (many entries)
x[:,0] # first column (many entries)
Notes:
• Zero-indexing
• Multi-dimensional indices are comma-separated (i.e., a tuple)
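The indexing expressions above need an array x to run against; here x is assumed to be a 3 x 4 array holding 0 to 11, so each result can be read off directly.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

top_left = x[0, 0]                # 0
first_row_last = x[0, -1]         # 3
first_row = x[0, :]               # [0, 1, 2, 3]
first_col = x[:, 0]               # [0, 4, 8]

idx = (1, 2)                      # a comma-separated index is really a tuple
print(top_left, first_row_last, first_row.tolist(), first_col.tolist(), x[idx])
```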
Indexing, slices and arrays
I[1:-1,1:-1] # select all but a one-pixel border
I = I[:,:,::-1] # swap channel order
I[I<10] = 0 # set dark pixels to black
I[[1,3], :] # select 2nd and 4th row
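The same four operations on a small random 5 x 6 "image" with 3 channels (made-up data). One detail worth knowing: slices such as I[1:-1, 1:-1] are views, while integer-array indexing such as I[[1, 3], :] returns a copy.

```python
import numpy as np

np.random.seed(0)
# A small random 5 x 6 image with 3 channels, used only for illustration.
I = np.random.randint(0, 256, size=(5, 6, 3)).astype(np.uint8)

inner = I[1:-1, 1:-1]      # drop the one-pixel border (a view): shape (3, 4, 3)
swapped = I[:, :, ::-1]    # reverse the channel axis (RGB <-> BGR)
rows = I[[1, 3], :]        # integer-array indexing (a copy): 2nd and 4th rows, shape (2, 6, 3)

J = I.copy()
J[J < 10] = 0              # boolean mask: every value below 10 becomes 0
print(inner.shape, swapped.shape, rows.shape)
```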
Python Slicing
Syntax: start:stop:step
a = list(range(10))
a[:3] # indices 0, 1, 2
a[-3:] # indices 7, 8, 9
a[3:8:2] # indices 3, 5, 7
a[4:1:-1] # indices 4, 3, 2 (this one is tricky)
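The same slices on a plain list, together with two common idioms (a full reversal and every second element). The last line shows that start:stop:step is shorthand for a slice object.

```python
a = list(range(10))

print(a[:3])      # [0, 1, 2]: the stop index is excluded
print(a[-3:])     # [7, 8, 9]: negative indices count from the end
print(a[3:8:2])   # [3, 5, 7]
print(a[4:1:-1])  # [4, 3, 2]: with a negative step, start must be larger than stop
print(a[::-1])    # the whole list reversed
print(a[::2])     # [0, 2, 4, 6, 8]: every second element
print(a[1:4] == a[slice(1, 4)])  # True: start:stop:step builds a slice object
```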
Python Libraries for Data Science
Pandas:
adds data structures and tools designed to work with table-like data (similar
to Series and Data Frames in R)
Link: http://pandas.pydata.org/
Pandas
• From Data Exploration to visualization to analysis – Pandas is the
almighty library you must master!
Pandas Series
It is like a one-dimensional array capable of holding data of any type
(integer, string, float, Python objects, etc.). A Series can be created using its
constructor.
Syntax: pandas.Series(data, index, dtype, copy)
Creation of a Series is also possible from an ndarray, a dictionary, or a scalar value.
Series can be created using
1. Array
2. Dict
3. Scalar value or constant
Data Handling using Pandas
Pandas Series
e.g.
import pandas as pd
s = pd.Series()
print(s)
Output
Series([], dtype: object)
(pandas versions before 2.0 printed dtype: float64 here and issued a warning)
Create a Series from ndarray
Without index
e.g.
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)
Output
0    a
1    b
2    c
3    d
dtype: object
Note: the default index starts from 0
With index positions
e.g.
s = pd.Series(data, index=[100, 101, 102, 103])
print(s)
Output
100    a
101    b
102    c
103    d
dtype: object
Note: here the index starts from 100
Create a Series from dict
Eg.1 (without index)
import pandas as pd
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
print(s)
Output
a    0.0
b    1.0
c    2.0
dtype: float64
Eg.2 (with index)
import pandas as pd
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s)
Output
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
Note: the label 'd' has no matching key in the dict, so its value is NaN
Create a Series from a scalar
e.g.
import pandas as pd
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
Output
0    5
1    5
2    5
3    5
dtype: int64
Note: here 5 is repeated 4 times (once for each index label)
Maths operations with Series
e.g.
import pandas as pd
s = pd.Series([1, 2, 3])
t = pd.Series([1, 2, 4])
u = s + t   # addition operation
print(u)
Output
0    2
1    4
2    7
dtype: int64
u = s * t   # multiplication operation
print(u)
Output
0     1
1     4
2    12
dtype: int64
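In the examples above both Series share the default index 0, 1, 2, so the values line up by position. In general, arithmetic between Series matches values by index label; the sketch below uses made-up labels to show what happens when the labels only partly overlap.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
t = pd.Series([10, 20, 30], index=["b", "c", "d"])

u = s + t                       # matched by label: a and d have no partner, so they become NaN
v = s.add(t, fill_value=0)      # treat a missing partner as 0 instead
print(u)
print(v)
```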
Pandas Series head function
e.g.
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s.head(3))
Output
a    1
b    2
c    3
dtype: int64
Returns the first 3 elements (head() returns the first 5 by default)
Pandas Series tail function
e.g.
print(s.tail(3))
Output
c    3
d    4
e    5
dtype: int64
Returns the last 3 elements (tail() returns the last 5 by default)
Retrieve data using labels (index)
e.g.
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s[['c', 'd']])
Output
c    3
d    4
dtype: int64
Retrieve data from selection
There are three methods for data selection:
loc gets rows (or columns) with particular labels from the index.
iloc gets rows (or columns) at particular positions in the index (so it
only takes integers).
ix usually tried to behave like loc but fell back to behaving like iloc if a
label was not present in the index.
ix was deprecated in pandas 0.20 and removed in pandas 1.0; use loc and iloc instead.
Retrieve data from selection
e.g.
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])
>>> s.iloc[:3]   # slice the first three rows
49   NaN
48   NaN
47   NaN
dtype: float64
>>> s.loc[:3]    # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
dtype: float64
In older pandas versions, s.ix[:3] returned the same result as s.loc[:3]: the integer 3 is a label in the index, so ix behaved like loc.
Aggregation Functions in Pandas
Aggregation - computing a summary statistic about each group, i.e.
• compute group sums or means
• compute group sizes/counts
min, max
count, sum, prod
mean, median, mode, mad (mad was removed in pandas 2.0)
std, var
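Per-group summaries come from combining groupby() with one of the functions above. The small flights-style table is invented for the example; only the pattern groupby(...)[column].function() is the point.

```python
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame({
    "carrier": ["AA", "AA", "UA", "UA", "UA"],
    "dep_delay": [5, 15, -2, 30, 7],
})

by_carrier = df.groupby("carrier")["dep_delay"]
print(by_carrier.sum())      # group sums: AA 20, UA 35
print(by_carrier.mean())     # group means: AA 10.0, UA about 11.67
print(by_carrier.size())     # group sizes: AA 2, UA 3
print(df["dep_delay"].std()) # a statistic over the whole column
```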
Aggregation Functions in Pandas
The agg() method is useful when multiple statistics are computed per column:
In [ ]: flights[['dep_delay','arr_delay']].agg(['min','mean','max'])
Out[ ]:
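The flights table used above is not defined in these notes, so the sketch below substitutes a tiny DataFrame with the same two column names and made-up values. agg() with a list of function names returns one row per statistic and one column per selected column.

```python
import pandas as pd

# A tiny stand-in for the flights table (values are made up).
flights = pd.DataFrame({
    "dep_delay": [5, 15, -2, 30],
    "arr_delay": [0, 20, -10, 25],
})

summary = flights[["dep_delay", "arr_delay"]].agg(["min", "mean", "max"])
print(summary)   # rows: min, mean, max; columns: dep_delay, arr_delay
```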
Basic Descriptive Statistics
df.method() description
describe Basic statistics (count, mean, std, min, quantiles, max)
kurt kurtosis
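Both methods from the table on a one-column DataFrame with made-up values. describe() returns its statistics as a small DataFrame, so individual numbers can be looked up with loc.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 10.0]})  # made-up values

stats = df.describe()   # count, mean, std, min, 25%, 50%, 75%, max
print(stats)

kurt = df.kurt()        # sample excess kurtosis per column (needs at least 4 values)
print(kurt)
```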
Matplotlib
• Matplotlib is the most popular library for data exploration and
visualization in the Python ecosystem. Many other plotting libraries, such as
Seaborn, are built on top of it.
• Matplotlib offers a wide range of charts, from histograms to scatterplots,
along with colors, themes, palettes, and other options to customize and
personalize our plots. Whether you're performing data exploration for a
machine learning project or building a report for stakeholders, it is
surely one of the handiest libraries!
Python Library – Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. It can be used to:
1. Develop publication-quality plots with just a few lines of code
2. Use interactive figures that can zoom, pan, update...
We can customize and take full control of line styles, font properties, axes properties, etc., as
well as export and embed plots in a number of file formats and interactive environments.
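A first figure covering the points above: two line styles, axis labels, a title, a legend, and export to a PNG file. The sketch assumes Matplotlib is installed and selects the non-interactive Agg backend so it also runs without a display; the output file goes to a temporary directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")              # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x), label="sin(x)", linestyle="-")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("A first Matplotlib figure")
ax.legend()

out = os.path.join(tempfile.mkdtemp(), "waves.png")
fig.savefig(out, dpi=100)          # export the figure to a file (PNG here)
plt.close(fig)
print(os.path.getsize(out) > 0)    # True: the image file was written
```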
Graphics to explore the data
The Seaborn package is built on Matplotlib but provides a high-level
interface for drawing attractive statistical graphics, similar to the ggplot2
library in R. It specifically targets statistical data visualization.
In [ ]: %matplotlib inline
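Graphics
function      description
distplot      histogram (deprecated in newer Seaborn versions in favor of histplot/displot)
barplot       estimate of central tendency for a numeric variable
violinplot    similar to boxplot, also shows the probability density of the data
jointplot     scatterplot of two variables with their marginal distributions
regplot       regression plot
pairplot      grid of pairwise plots between the variables of a dataset
boxplot       box-and-whisker plot
swarmplot     categorical scatterplot with non-overlapping points
factorplot    general categorical plot (renamed catplot in Seaborn 0.9)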
Basic statistical Analysis
statsmodels and scikit-learn both have a number of functions for statistical analysis.
The first is mostly used for regular analysis using R-style formulas, while scikit-learn is
more tailored for machine learning.
statsmodels:
• linear regressions
• ANOVA tests
• hypothesis testing
• many more ...
scikit-learn:
• k-means
• support vector machines
• random forests
• many more ...
Plotly
• Plotly is a free and open-source data visualization library. It produces high-quality,
publication-ready, interactive charts. Box plots, heatmaps, and
bubble charts are a few examples of the available chart types.
• It is one of the finest data visualization tools available, built on top of the
visualization library D3.js, HTML, and CSS. It was created using Python
and the Django framework.