Data Science Handwritten Notes - 3

MODERN INSTITUTE OF TECHNOLOGY AND RESEARCH CENTRE, ALWAR
NAME OF FACULTY: DR. AWANIT KUMAR
SUB : Data Science Using Python 3ADS-07
SEMESTER : V SESSION : 2021-22 (ODD SEM.)
BRANCH : AI&DS BATCH : A
UNIT : III
Topics Covered
1. Arrays and Vectorized Computation
2. The NumPy ndarray: Creating ndarrays
3. Data Types for ndarrays
4. Arithmetic with NumPy Arrays & Basic Indexing and Slicing
5. Boolean Indexing, Transposing Arrays and Swapping Axes
6. Universal Functions: Fast Element-Wise Array Functions
7. Mathematical and Statistical Methods, Sorting, Unique and Other Set Logic
UNIT- 3 LECTURE-1
NumPy Basics: Arrays & Vectorization
NumPy, short for Numerical Python, is one of the most important foundational packages
for numerical computing in Python.
Consider a NumPy array of one million integers, and the equivalent Python list:
In [7]: import numpy as np
In [8]: my_arr = np.arange(1000000)
In [9]: my_list = list(range(1000000))
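The speed difference between the two containers can be checked with a rough timing sketch such as the one below. The helper name time_call and the choice of doubling every element are illustrative assumptions, not part of the original notes. Timings depend on the machine, so no figures are quoted.

```python
import time
import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

def time_call(func):
    """Hypothetical helper: return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start

# Vectorized: one multiplication applied to the whole array
arr_doubled, arr_seconds = time_call(lambda: my_arr * 2)

# Pure Python: a list comprehension visiting each element in turn
list_doubled, list_seconds = time_call(lambda: [x * 2 for x in my_list])

print(f"ndarray: {arr_seconds:.4f} s, list: {list_seconds:.4f} s")
```

On typical hardware the ndarray version is expected to be noticeably faster, but the exact ratio varies.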
A Multidimensional Array
One of the key features of NumPy is its N-dimensional array object, or ndarray, which is
a fast, flexible container for large datasets in Python.
In [12]: import numpy as np
# Generate some random data

In [13]: data = np.random.randn(2, 3)
In [14]: data
Out[14]:
array([[-0.2047, 0.4789, -0.5194],
[-0.5557, 1.9658, 1.3934]])
I then write mathematical operations with data:
In [15]: data * 10
Out[15]:
array([[-2.0471, 4.7894, -5.1944],
[-5.5573, 19.6578, 13.9341]])
In [16]: data + data

Out[16]:
array([[-0.4094, 0.9579, -1.0389],
[-1.1115, 3.9316, 2.7868]])
An ndarray is a generic multidimensional container for homogeneous data; that is, all of
the elements must be the same type. Every array has a shape, a tuple indicating the size of
each dimension, and a dtype, an object describing the data type of the array:
In [17]: data.shape
Out[17]: (2, 3)
In [18]: data.dtype
Out[18]: dtype('float64')
Creating nd-arrays
The easiest way to create an array is to use the array function. This accepts any sequence-
like object (including other arrays) and produces a new NumPy array containing the
passed data.
In [19]: data1 = [6, 7.5, 8, 0, 1]

In [20]: arr1 = np.array(data1)
In [21]: arr1
Out[21]: array([6. , 7.5, 8. , 0. , 1. ])
Nested sequences, like a list of equal-length lists, will be converted into a
multidimensional array:
In [22]: data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
In [23]: arr2 = np.array(data2)
In [24]: arr2
Out[24]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape
inferred from the data. We can confirm this by inspecting the ndim and shape attributes:
In [25]: arr2.ndim
Out[25]: 2
In [26]: arr2.shape
Out[26]: (2, 4)
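In addition to np.array, there are a number of other functions for creating new arrays. As
examples, zeros and ones create arrays of 0s or 1s, respectively, with a given length or
shape. empty creates an array without initializing its values, so its contents are
arbitrary and the zeros shown for np.empty below are not guaranteed: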
In [29]: np.zeros(10)
Out[29]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [30]: np.zeros((3, 6))
Out[30]:
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
In [31]: np.empty((2, 3, 2))

Out[31]:
array([[[0., 0.],
[0., 0.],
[0., 0.]],
[[0., 0.],
[0., 0.],
[0., 0.]]])
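As a small sketch of the creation functions mentioned above, the following shows np.ones alongside np.zeros_like and np.empty. The variable names are illustrative only.

```python
import numpy as np

ones_2d = np.ones((2, 3))             # 2 x 3 array filled with 1.0
zeros_like = np.zeros_like(ones_2d)   # same shape and dtype as ones_2d, filled with 0.0
scratch = np.empty((2, 3))            # memory is allocated but NOT initialized

print(ones_2d)
print(zeros_like.shape, zeros_like.dtype)
print(scratch.shape)                  # only the shape is predictable, not the contents
```

Because np.empty skips initialization, it should only be used when every element will be overwritten before it is read.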
arange is an array-valued version of the built-in Python range function:
In [32]: np.arange(15)
Out[32]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
Vectorization
Arrays are important because they enable you to express batch operations on data without
writing any for loops. NumPy users call this vectorization.
In [51]: arr = np.array([[1., 2., 3.], [4., 5., 6.]])
In [52]: arr
Out[52]:
array([[1., 2., 3.],
[4., 5., 6.]])
In [53]: arr * arr

Out[53]:
array([[ 1., 4., 9.],
[16., 25., 36.]])
In [54]: arr - arr

Out[54]:
array([[0., 0., 0.],
[0., 0., 0.]])
Arithmetic operations with scalars propagate the scalar argument to each element in the
array:
In [55]: 1 / arr
Out[55]:
array([[1. , 0.5 , 0.3333],
[0.25 , 0.2 , 0.1667]])
In [56]: arr ** 0.5

Out[56]:
array([[1. , 1.4142, 1.7321],
[2. , 2.2361, 2.4495]])
Comparisons between arrays of the same size yield boolean arrays:
In [57]: arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
In [58]: arr2
Out[58]:
array([[ 0., 4., 1.],
[ 7., 2., 12.]])
In [59]: arr2 > arr

Out[59]:
array([[False, True, False],
[ True, False, True]])
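To make the idea of vectorization concrete, the sketch below computes the element-wise square of the same array twice: once with explicit for loops and once with a single array expression. The names looped and vectorized are illustrative.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit loops: square each element one at a time
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# Vectorized: the same computation as a single expression
vectorized = arr * arr

print(np.array_equal(looped, vectorized))  # True
```

Both approaches give the same result; the vectorized form is shorter and lets NumPy run the loop in compiled code.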
Basic Indexing and Slicing
NumPy array indexing is a rich topic, as there are many ways you may want to select a
subset of your data or individual elements. One-dimensional arrays are simple; on the
surface they act similarly to Python lists:
In [60]: arr = np.arange(10)
In [61]: arr
Out[61]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [62]: arr[5]
Out[62]: 5
In [63]: arr[5:8]
Out[63]: array([5, 6, 7])
In [64]: arr[5:8] = 12
In [65]: arr
Out[65]: array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])
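One important difference from Python lists is that an array slice is a view on the original data, not a copy. The following sketch, with illustrative variable names, shows that writing to a slice changes the source array, and that .copy() is needed for an independent array.

```python
import numpy as np

arr = np.arange(10)

arr_slice = arr[5:8]       # a view onto arr, not a copy
arr_slice[1] = 12345       # writes through to arr[6]
print(arr[6])              # 12345

independent = arr[5:8].copy()  # an explicit copy owns its own data
independent[:] = 0             # does not touch arr
print(arr[5:8].tolist())       # [5, 12345, 7]
```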
UNIT- 3 LECTURE-2
Data Types in NumPy
NumPy has some extra data types beyond those built into Python, and refers to each data
type with a one-character code, such as i for integers and u for unsigned integers. Below
is a list of the data types in NumPy and the characters used to represent them.
• i - integer
• b - boolean
• u - unsigned integer
• f - float
• c - complex float
• m - timedelta
• M - datetime
• O - object
• S - string
• U - unicode string
• V - fixed chunk of memory for other type ( void )
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr.dtype)
Create an array with data type string:
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)
For i, u, f, S and U we can define the size as well; for example, 'i4' is a 4-byte
(32-bit) integer:
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='i4')
print(arr)
print(arr.dtype)
Change data type from float to integer by using 'i' as parameter value:
import numpy as np
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype('i')
print(newarr)
print(newarr.dtype)
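A few further conversions are sketched below to show what astype does to the values, not just to the dtype. The sample values and variable names are illustrative assumptions.

```python
import numpy as np

floats = np.array([3.7, -1.2, -2.6, 0.5])
ints = floats.astype(np.int32)      # the fractional part is discarded, not rounded
print(ints.tolist())                # [3, -1, -2, 0]

numeric_strings = np.array(['1.25', '-9.6', '42'])
as_floats = numeric_strings.astype(float)   # each string is parsed as a number
print(as_floats.dtype)              # float64

small = np.array([1, 2, 3, 4], dtype='i2')  # 2-byte (16-bit) integers
print(small.dtype, small.itemsize)  # int16 2
```

Note that astype always returns a new array; the original array is left unchanged.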
UNIT- 3 LECTURE-3
NumPy Arithmetic Operations
Arithmetic operations on two arrays are performed element-wise, so the arrays must have
the same shape, or shapes that NumPy can broadcast to a common shape. NumPy provides
both functions and operators to perform these operations.
NumPy Add function

This function adds two arrays element-wise. If the shapes differ and cannot be broadcast
together, NumPy raises a ValueError.
import numpy as np
a = np.array([10,20,100,200,500])
b = np.array([3,4,5,6,7])
np.add(a, b)
Output
array([ 13, 24, 105, 206, 507])
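The shape rule can be illustrated with a short sketch: a scalar or a compatible shape is broadcast, while incompatible shapes raise ValueError. The arrays column and failed are illustrative names introduced here.

```python
import numpy as np

a = np.array([10, 20, 100, 200, 500])

# A scalar is broadcast to every element
print(np.add(a, 1).tolist())        # [11, 21, 101, 201, 501]

# A (2, 1) column and a (5,) row broadcast to a (2, 5) result
column = np.array([[0], [1000]])
print(np.add(column, a).shape)      # (2, 5)

# Shapes (5,) and (3,) cannot be broadcast together
try:
    np.add(a, np.array([1, 2, 3]))
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```

The same rule applies to subtract, multiply, divide and the corresponding operators.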

NumPy Add Operator
We can also use the add operator “+” to perform addition of two arrays.
import numpy as np
a = np.array([10,20,100,200,500])
b = np.array([3,4,5,6,7])
print(a+b)
Output
[ 13 24 105 206 507]
NumPy Subtract function

We use this function to compute the element-wise difference of two arrays. If the shapes
differ and cannot be broadcast together, NumPy raises a ValueError.
import numpy as np
a = np.array([10,20,100,200,500])
b = np.array([3,4,5,6,7])
np.subtract(a, b)
Output
array([ 7, 16, 95, 194, 493])
NumPy Subtract Operator
We can also use the subtract operator “-” to produce the difference of two arrays.
import numpy as np
a = np.array([10,20,100,200,500])
b = np.array([3,4,5,6,7])
print(a-b)
Output
[ 7 16 95 194 493]
NumPy Multiply function

We use this function to compute the element-wise product of two arrays. As with addition,
the shapes must match or be broadcast-compatible.
import numpy as np
a = np.array([7,3,4,5,1])
b = np.array([3,4,5,6,7])
np.multiply(a, b)
Output
array([21, 12, 20, 30, 7])

NumPy Multiply Operator
We can also use the multiplication operator “*” to get the product of two arrays.
import numpy as np
a = np.array([7,3,4,5,1])
b = np.array([3,4,5,6,7])
print(a*b)
Output
[21 12 20 30 7]
NumPy Divide Function
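We use this function to compute the element-wise (true) division of two arrays. The
shapes must match or be broadcast-compatible, and the result is a float array even when
both inputs are integers.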
import numpy as np
a = np.array([7,3,4,5,1])
b = np.array([3,4,5,6,7])
np.divide(a,b)
Output
array([2.33333333, 0.75 , 0.8 , 0.83333333, 0.14285714])
NumPy Divide Operator
We can also use the divide operator “/” to divide two arrays.
import numpy as np
a = np.array([7,3,4,5,1])
b = np.array([3,4,5,6,7])
print(a/b)
Output
[2.33333333 0.75 0.8 0.83333333 0.14285714]
NumPy Mod and Remainder function
Both functions compute the element-wise remainder of dividing one array by another;
np.mod is an alias for np.remainder.
NumPy Remainder Function
import numpy as np
a = np.array([7,3,4,5,1])
b = np.array([3,4,5,6,7])
np.remainder(a,b)
Output
array([1, 3, 4, 5, 1])
NumPy Mod Function
import numpy as np
a = np.array([7,3,4,5,1])
b = np.array([3,4,5,6,7])
np.mod(a,b)
Output
array([1, 3, 4, 5, 1])
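With positive operands the two functions above are indistinguishable, so the sketch below uses negative values to show the sign convention. It also contrasts them with np.fmod, which is a separate NumPy function not covered in the original notes and is included here only for comparison.

```python
import numpy as np

a = np.array([7, -7, 7, -7])
b = np.array([3, 3, -3, -3])

# np.remainder / np.mod: the result takes the sign of the divisor (like Python's %)
print(np.remainder(a, b).tolist())  # [1, 2, -2, -1]
print(np.mod(a, b).tolist())        # [1, 2, -2, -1]

# np.fmod: the result takes the sign of the dividend (like C's fmod)
print(np.fmod(a, b).tolist())       # [1, -1, 1, -1]
```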
NumPy Power Function
This function treats the elements of the first array as bases and raises each one to the
power of the corresponding element of the second array.
import numpy as np
a = np.array([7,3,4,5,1])
b = np.array([3,4,5,6,7])
np.power(a,b)
Output
array([ 343, 81, 1024, 15625, 1])
NumPy Reciprocal Function
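This function returns the element-wise reciprocal (1/x) of the array. Note that the array
below has an integer dtype, so each result is truncated to an integer, which is why every
element except 1 gives 0.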
import numpy as np
a = np.array([7,3,4,5,1])
np.reciprocal(a)
Output
array([0, 0, 0, 0, 1])
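To obtain fractional reciprocals, the array can first be converted to a floating-point dtype. The following is a minimal sketch; the names as_float and by_division are illustrative.

```python
import numpy as np

a = np.array([7, 3, 4, 5, 1])

# Convert to float first so the reciprocals are not truncated
as_float = np.reciprocal(a.astype(float))
print(as_float)

# True division of a scalar by the array gives the same values
by_division = 1 / a
print(np.allclose(as_float, by_division))  # True
```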
UNIT- 3 LECTURE-4
NumPy - Indexing & Slicing
Contents of ndarray object can be accessed and modified by indexing or slicing, just like
Python's in-built container objects.
Items in an ndarray object follow zero-based indexing. Three types of indexing methods
are available: field access, basic slicing and advanced indexing.
Basic slicing is an extension of Python's basic concept of slicing to n dimensions. A
Python slice object is constructed by giving start, stop, and step parameters to the
built-in slice function. This slice object is passed to the array to extract a part of
the array.
Example 1
import numpy as np
a = np.arange(10)
s = slice(2,7,2)
print(a[s])
Its output is as follows −
[2 4 6]
In the above example, an ndarray object is prepared by arange() function. Then a slice
object is defined with start, stop, and step values 2, 7, and 2 respectively. When this slice
object is passed to the ndarray, a part of it starting with index 2 up to 7 with a step of 2 is
sliced.
The same result can also be obtained by giving the slicing parameters separated by a
colon : (start:stop:step) directly to the ndarray object.
Example 2
import numpy as np
a = np.arange(10)
b = a[2:7:2]
print(b)
Here, we will get the same output −
[2 4 6]
If only one parameter is given, a single item corresponding to that index is returned. If
a : is placed after it, all items from that index onwards are extracted. If two
parameters (with : between them) are used, the items between the two indexes (not
including the stop index) are sliced with a default step of one.
Example 3
# slice single item
import numpy as np
a = np.arange(10)
b = a[5]
print(b)
Its output is as follows −
5
Example 4
# slice items starting from index
import numpy as np
a = np.arange(10)
print(a[2:])
Now, the output would be −
[2 3 4 5 6 7 8 9]
Example 5
# slice items between indexes
import numpy as np
a = np.arange(10)
print(a[2:5])
Here, the output would be −
[2 3 4]
The above description applies to multi-dimensional ndarray too.
Example 6
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
# slice rows starting from index 1
print('Now we will slice the array from the index a[1:]')
print(a[1:])
The output is as follows −
[[1 2 3]
[3 4 5]
[4 5 6]]
Now we will slice the array from the index a[1:]

[[3 4 5]
[4 5 6]]
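For a two-dimensional array, a slice can be given for each axis, separated by a comma. The sketch below reuses the same array and selects along rows and columns at once; the particular index ranges are chosen only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

# First two rows, columns 1 onwards
print(a[:2, 1:].tolist())   # [[2, 3], [4, 5]]

# Row 1, first two columns
print(a[1, :2].tolist())    # [3, 4]

# Every row, column 0
print(a[:, 0].tolist())     # [1, 3, 4]
```

Using an integer for one axis drops that dimension, so the last two results are one-dimensional.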
Slicing can also include an ellipsis (...), which stands for as many full slices (:) as
are needed to make the selection tuple the same length as the number of dimensions of the
array. For a two-dimensional array, a[..., 1] selects column 1 from every row, and
a[1, ...] selects every column of row 1.
Example 7
# array to begin with
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print('Our array is:')
print(a)
print('\n')
# this returns an array of the items in the second column
print('The items in the second column are:')
print(a[...,1])
print('\n')
# now we will slice all items from the second row
print('The items in the second row are:')
print(a[1,...])
print('\n')
# now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[...,1:])
The output of this program is as follows −
Our array is:

[[1 2 3]
[3 4 5]
[4 5 6]]
The items in the second column are:

[2 4 5]
The items in the second row are:

[3 4 5]
The items column 1 onwards are:

[[2 3]
[4 5]
[5 6]]
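The ellipsis is most useful with three or more dimensions, where it saves writing several colons. The following sketch uses a hypothetical 2 x 3 x 4 array named cube to show the equivalence with explicit full slices.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

first_of_last_axis = cube[..., 0]   # same as cube[:, :, 0]
first_block = cube[0, ...]          # same as cube[0, :, :]

print(first_of_last_axis.shape)     # (2, 3)
print(first_block.shape)            # (3, 4)
print(np.array_equal(cube[..., 0], cube[:, :, 0]))  # True
```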
UNIT- 3 LECTURE-5
Boolean Indexing
Let’s consider an example where we have some data in an array and an array of names
with duplicates. Here the randn function in numpy.random is used to generate some
normally distributed random data:
In [98]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
In [99]: data = np.random.randn(7, 4)
In [100]: names
Out[100]: array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'],
dtype='<U4')
In [101]: data
Out[101]:
array([[ 0.0929, 0.2817, 0.769 , 1.2464],
[ 1.0072, -1.2962, 0.275 , 0.2289],
[ 1.3529, 0.8864, -2.0016, -0.3718],
[ 1.669 , -0.4386, -0.5397, 0.477 ],
[ 3.2489, -1.0212, -0.5771, 0.1241],
[ 0.3026, 0.5238, 0.0009, 1.3438],
[-0.7135, -0.8312, -2.3702, -1.8608]])
Suppose each name corresponds to a row in the data array and we wanted to select all the
rows with corresponding name 'Bob'. Like arithmetic operations, comparisons (such
as ==) with arrays are also vectorized. Thus, comparing names with the
string 'Bob' yields a boolean array:
In [102]: names == 'Bob'

Out[102]: array([ True, False, False, True, False, False, False])
This boolean array can be passed when indexing the array:
In [103]: data[names == 'Bob']

Out[103]:
array([[ 0.0929, 0.2817, 0.769 , 1.2464],
[ 1.669 , -0.4386, -0.5397, 0.477 ]])
The boolean array must be of the same length as the array axis it’s indexing. You can
even mix and match boolean arrays with slices or integers (or sequences of integers; more
on this later).
In these examples, I select from the rows where names == 'Bob' and index the columns,
too:
In [104]: data[names == 'Bob', 2:]

Out[104]:
array([[ 0.769 , 1.2464],
[-0.5397, 0.477 ]])
In [105]: data[names == 'Bob', 3]

Out[105]: array([1.2464, 0.477 ])
To select everything but 'Bob', you can either use != or negate the condition using ~:
In [106]: names != 'Bob'

Out[106]: array([False, True, True, False, True, True, True])
In [107]: data[~(names == 'Bob')]

Out[107]:
array([[ 1.0072, -1.2962, 0.275 , 0.2289],
[ 1.3529, 0.8864, -2.0016, -0.3718],
[ 3.2489, -1.0212, -0.5771, 0.1241],
[ 0.3026, 0.5238, 0.0009, 1.3438],
[-0.7135, -0.8312, -2.3702, -1.8608]])
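Several boolean conditions can be combined with the operators & (and) and | (or), and a boolean mask can also be used on the left-hand side of an assignment. The sketch below uses a deterministic integer array as a stand-in for the random data above, which is an assumption made so the results are reproducible.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4) - 10   # stand-in for the random data

# Combine conditions with | and &; the Python keywords 'or' / 'and' do not work here
mask = (names == 'Bob') | (names == 'Will')
print(mask.tolist())    # [True, False, True, True, True, False, False]

selected = data[mask]   # rows belonging to Bob or Will
print(selected.shape)   # (4, 4)

data[data < 0] = 0      # set every negative entry to 0
print(data.min())       # 0
```

Selecting with a boolean array always creates a copy of the data, whereas assigning through a boolean mask modifies the original array in place.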
UNIT- 3 LECTURE-6
Universal Functions: Fast Element-Wise Array Functions
A universal function, or ufunc, is a function that performs element-wise operations on
data in ndarrays. You can think of them as fast vectorized wrappers for simple functions
that take one or more scalar values and produce one or more scalar results.
Many ufuncs are simple element-wise transformations, like sqrt or exp:
In [137]: arr = np.arange(10)
In [138]: arr
Out[138]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [139]: np.sqrt(arr)
Out[139]:
array([0. , 1. , 1.4142, 1.7321, 2. , 2.2361, 2.4495, 2.6458,
2.8284, 3. ])
In [140]: np.exp(arr)
Out[140]:
array([ 1. , 2.7183, 7.3891, 20.0855, 54.5982, 148.4132,
403.4288, 1096.6332, 2980.958 , 8103.0839])
These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays
(thus, binary ufuncs) and return a single array as the result:
In [141]: x = np.random.randn(8)
In [142]: y = np.random.randn(8)
In [143]: x
Out[143]:
array([-0.0119, 1.0048, 1.3272, -0.9193, -1.5491, 0.0222, 0.7584,
-0.6605])
In [144]: y
Out[144]:
array([ 0.8626, -0.01 , 0.05 , 0.6702, 0.853 , -0.9559, -0.0235,
-2.3042])
In [145]: np.maximum(x, y)
Out[145]:
array([ 0.8626, 1.0048, 1.3272, 0.6702, 0.853 , 0.0222, 0.7584,
-0.6605])
Here, numpy.maximum computed the element-wise maximum of the elements in x and y.

While not common, a ufunc can return multiple arrays. modf is one example, a vectorized
counterpart of Python's math.modf; it returns the fractional and integral parts of a
floating-point array:
In [146]: arr = np.random.randn(7) * 5
In [147]: arr
Out[147]: array([-3.2623, -6.0915, -6.663 , 5.3731, 3.6182, 3.45 , 5.0077])
In [148]: remainder, whole_part = np.modf(arr)
In [149]: remainder
Out[149]: array([-0.2623, -0.0915, -0.663 , 0.3731, 0.6182, 0.45 , 0.0077])
In [150]: whole_part
Out[150]: array([-3., -6., -6., 5., 3., 3., 5.])
Ufuncs accept an optional out argument that allows them to operate in-place on arrays:
In [151]: arr
Out[151]: array([-3.2623, -6.0915, -6.663 , 5.3731, 3.6182, 3.45 , 5.0077])
In [152]: np.sqrt(arr)
Out[152]: array([ nan, nan, nan, 2.318 , 1.9022, 1.8574, 2.2378])
In [153]: np.sqrt(arr, arr)
Out[153]: array([ nan, nan, nan, 2.318 , 1.9022, 1.8574, 2.2378])
In [154]: arr
Out[154]: array([ nan, nan, nan, 2.318 , 1.9022, 1.8574, 2.2378])
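The output array can also be passed by keyword as out=, which makes the intent easier to read. The sketch below uses non-negative sample values (an assumption, to avoid nan results) and illustrative variable names.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])
result = np.empty_like(values)

# Write the square roots into a preallocated buffer
returned = np.sqrt(values, out=result)
print(result.tolist())       # [1.0, 2.0, 3.0, 4.0]
print(returned is result)    # True: the ufunc returns the out array itself

# Passing the input as out overwrites it in place
np.sqrt(values, out=values)
print(values.tolist())       # [1.0, 2.0, 3.0, 4.0]
```

Reusing an existing array this way avoids allocating a new array for the result.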
UNIT- 3 LECTURE-7
Mathematical and Statistical Methods
A set of mathematical functions that compute statistics about an entire array or about the
data along an axis are accessible as methods of the array class. You can use aggregations
(often called reductions) like sum, mean, and std (standard deviation) either by calling
the array instance method or using the top-level NumPy function.
Here I generate some normally distributed random data and compute some aggregate
statistics:
In [177]: arr = np.random.randn(5, 4)
In [178]: arr
Out[178]:
array([[ 2.1695, -0.1149, 2.0037, 0.0296],
[ 0.7953, 0.1181, -0.7485, 0.585 ],
[ 0.1527, -1.5657, -0.5625, -0.0327],
[-0.929 , -0.4826, -0.0363, 1.0954],
[ 0.9809, -0.5895, 1.5817, -0.5287]])
In [179]: arr.mean()
Out[179]: 0.19607051119998253
In [180]: np.mean(arr)
Out[180]: 0.19607051119998253
In [181]: arr.sum()
Out[181]: 3.9214102239996507
Functions like mean and sum take an optional axis argument that computes the statistic
over the given axis, resulting in an array with one fewer dimension:
In [182]: arr.mean(axis=1)
Out[182]: array([ 1.022 , 0.1875, -0.502 , -0.0881, 0.3611])
In [183]: arr.sum(axis=0)
Out[183]: array([ 3.1693, -2.6345, 2.2381, 1.1486])
Here, arr.mean(axis=1) means “compute the mean across the columns” (one value per row),
whereas arr.sum(axis=0) means “compute the sum down the rows” (one value per column).
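Because the random values above are hard to check by hand, the same axis behaviour is sketched here on a small fixed array whose results can be verified mentally. The array values and names are illustrative.

```python
import numpy as np

arr = np.array([[1., 2., 3., 4.],
                [5., 6., 7., 8.]])   # shape (2, 4)

row_means = arr.mean(axis=1)   # collapses the columns: one value per row
col_sums = arr.sum(axis=0)     # collapses the rows: one value per column

print(row_means.tolist())      # [2.5, 6.5]
print(col_sums.tolist())       # [6.0, 8.0, 10.0, 12.0]
print(arr.sum())               # 36.0 (no axis: aggregate over all elements)
```

The axis argument names the dimension that disappears from the result's shape.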
Other methods like cumsum and cumprod do not aggregate, instead producing an array of
the intermediate results:
In [184]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])
In [185]: arr.cumsum()
Out[185]: array([ 0, 1, 3, 6, 10, 15, 21, 28])
In multidimensional arrays, accumulation functions like cumsum return an array of the
same size, but with the partial aggregates computed along the indicated axis according to
each lower dimensional slice:
In [186]: arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
In [187]: arr
Out[187]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [188]: arr.cumsum(axis=0)
Out[188]:
array([[ 0, 1, 2],
[ 3, 5, 7],
[ 9, 12, 15]])
In [189]: arr.cumprod(axis=1)
Out[189]:
array([[ 0, 0, 0],
[ 3, 12, 60],
[ 6, 42, 336]])
