Unit5 NumPy Pandas Notes
Library in Python:
•Python has a large collection of open-source libraries, each maintained from
its own source code.
•A library is a pre-written, packaged collection of code that can be reused
to save time. As the name implies, it is similar to a physical library in that
it holds reusable resources.
import numpy
arr = numpy.array([10,20,30,40,50])
print(arr)
import numpy as np
arr = np.array([10,20,30,40,50])
print(arr)
Output
[10 20 30 40 50]
import numpy as np
arr = np.array([[10,20,30], [40,50,60]])
print(arr)
Output
[[10 20 30]
 [40 50 60]]
Array Creation:          Array Manipulation:
numpy.array              numpy.reshape
numpy.zeros              numpy.ravel
numpy.ones               numpy.transpose
numpy.empty              numpy.swapaxes
numpy.arange             numpy.concatenate
numpy.linspace           numpy.vstack
numpy.random.rand        numpy.hstack
numpy.random.randn       numpy.split
numpy.random.randint     numpy.resize
Array Creation: numpy.array
import numpy as np
my_list = [1, 2, 3, 4]
arr = np.array(my_list)
print(arr)
Array Creation: numpy.zeros
import numpy as np
arr = np.zeros(5)
print(arr)
Output
[0. 0. 0. 0. 0.]
import numpy as np
arr = np.zeros((2, 3))
print(arr)
Output
[[0. 0. 0.]
 [0. 0. 0.]]
Array Creation: numpy.ones
import numpy as np
arr = np.ones(4)
print(arr)
Output
[1. 1. 1. 1.]
import numpy as np
arr = np.ones((2, 3))
print(arr)
Output
[[1. 1. 1.]
 [1. 1. 1.]]
Array Creation: numpy.empty
import numpy as np
arr = np.empty(5)  # allocated without initializing, so the printed values are arbitrary
print(arr)
import numpy as np
arr = np.empty((2, 2))
print(arr)
Array Creation: numpy.arange
import numpy as np
arr = np.arange(0, 10, 2)
print(arr)
Output
[0 2 4 6 8]
Create an array of evenly spaced values within a specified
interval: np.arange(start, stop, step). The stop value is excluded.
import numpy as np
arr = np.arange(0, 20, 2)  # the stop value 20 is excluded
print(arr)
Output
[ 0  2  4  6  8 10 12 14 16 18]
import numpy as np
arr = np.array(range(10))
print(arr)
Output
[0 1 2 3 4 5 6 7 8 9]
Array Creation: numpy.linspace
import numpy as np
arr = np.linspace(0, 1, 5)
print(arr)
Output
[0.   0.25 0.5  0.75 1.  ]
Create an array of evenly spaced numbers over a specified interval:
numpy.linspace(start, stop, num, endpoint=True, retstep=False)
Here num is the number of elements to generate.
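The endpoint and retstep parameters named in the signature are not exercised by the example in these notes. A minimal sketch of what each one changes, using the same 0 to 1 interval (the variable names are illustrative only):

```python
import numpy as np

# Default endpoint=True: the stop value 1.0 is the last sample.
with_end = np.linspace(0, 1, 5)

# endpoint=False leaves the stop value out, so the spacing becomes 0.2.
without_end = np.linspace(0, 1, 5, endpoint=False)

# retstep=True returns a (samples, step) tuple instead of just the samples.
samples, step = np.linspace(0, 1, 5, retstep=True)

print(with_end)
print(without_end)
print(step)  # 0.25
```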
numpy.add
numpy.subtract
numpy.multiply
numpy.divide
numpy.power
numpy.exp
numpy.log
numpy.sin
numpy.cos
numpy.tan
numpy.dot
numpy.inner
numpy.outer
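Several functions in this list (numpy.exp, numpy.log, numpy.dot, numpy.inner, numpy.outer) have no worked example in these notes. A short sketch, reusing the x and y vectors from the arithmetic examples as an assumption:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

dot = np.dot(x, y)      # 1*4 + 2*5 + 3*6 = 32
inner = np.inner(x, y)  # for 1-D arrays this equals the dot product: 32
outer = np.outer(x, y)  # 3x3 matrix where outer[i, j] = x[i] * y[j]

# exp and log are inverses, so log(exp(x)) recovers x up to rounding.
round_trip = np.log(np.exp(x))

print(dot, inner)
print(outer)
print(round_trip)
```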
Addition: np.add(x, y)
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
result = np.add(x, y)
print(result)
# Output: [5 7 9]
Subtraction: np.subtract(x, y)
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
result = np.subtract(x, y)
print(result)
# Output: [-3 -3 -3]
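numpy.multiply is named in the function list but has no worked example in these notes. A minimal sketch in the same style as the addition example, assuming the same x and y:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
result = np.multiply(x, y)  # element-wise product
print(result)
# Output: [ 4 10 18]
```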
Division: np.divide(x, y)
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
result = np.divide(x, y)
print(result)
# Output: [0.25 0.4  0.5 ]
Power: np.power(x, y)
import numpy as np
x = np.array([1, 2, 3])
y = np.array([2, 3, 4])
result = np.power(x, y)
print(result)
# Output: [ 1  8 81]
Sine: np.sin(x)
import numpy as np
x = np.array([0, np.pi/2, np.pi])
result = np.sin(x)
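print(result)
# Output: approximately [0. 1. 0.]; sin(pi) prints as about 1.22e-16 because of floating-point rounding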
numpy.mean
numpy.median
numpy.var
numpy.std
numpy.min
numpy.max
import numpy
a = numpy.array([13, 24, 22, 13, 11, 28, 16, 24, 18])
print(a)                                      # [13 24 22 13 11 28 16 24 18]
print('mean:', numpy.mean(a))                 # mean: 18.77777777777778
print('median:', numpy.median(a))             # median: 18.0
print('minimum:', numpy.min(a))               # minimum: 11
print('maximum:', numpy.max(a))               # maximum: 28
print('sum of array:', numpy.sum(a))          # sum of array: 169
print('product of array:', numpy.prod(a))     # product of array: 189965647872 (wraps to 987086848 where the default integer is 32-bit)
print('covariance:', numpy.cov(a))            # covariance: 35.69444444444444 (for a 1-D array this is the sample variance, ddof=1)
print('variance:', numpy.var(a))              # variance: 31.728395061728392
print('standard deviation:', numpy.std(a))    # standard deviation: 5.632796380282922
print('sort an array:', numpy.sort(a))        # sort an array: [11 13 13 16 18 22 24 24 28]
print('power:', numpy.power(a, 3))            # power: [ 2197 13824 10648  2197  1331 21952  4096 13824  5832]
Array Manipulation: numpy.flip
Reverse the array elements
import numpy as np
arr = np.array([1, 8, 3, 9, 5, -6])
arr_r = np.flip(arr)
print(arr_r)
Output
[-6  5  9  3  8  1]
Array Manipulation: numpy.reshape
import numpy as np
arr = np.array([[1, 8, 3], [9, 5, -6]])
arr_r = np.reshape(arr, (3, 2))
print(arr_r)
Output
[[ 1  8]
 [ 3  9]
 [ 5 -6]]
The total number of elements must be the same before and after reshaping.
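The element-count rule for reshape can be checked directly. The sketch below uses the same 2 x 3 array; the -1 placeholder and the deliberately wrong (4, 2) shape are additions for illustration, not part of the original slide:

```python
import numpy as np

arr = np.array([[1, 8, 3], [9, 5, -6]])   # 2 x 3 = 6 elements

# -1 asks NumPy to infer that dimension from the element count.
inferred = np.reshape(arr, (3, -1))
print(inferred.shape)   # (3, 2)

# 6 elements cannot fill a 4 x 2 array, so reshape raises ValueError.
try:
    np.reshape(arr, (4, 2))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```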
Array Manipulation: numpy.ravel
Flatten an array, i.e. convert it to a 1-D array
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr_flattened = np.ravel(arr)
print(arr_flattened)
Output
[1 2 3 4 5 6]
Array Manipulation: numpy.transpose
Transpose of an array
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr_transposed = np.transpose(arr)
print(arr_transposed)
Output
[[1 4]
 [2 5]
 [3 6]]
Array Manipulation: numpy.concatenate
import numpy as np
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr_concatenated = np.concatenate((arr1, arr2))
print(arr_concatenated)
Output
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
Array Manipulation: numpy.vstack and numpy.hstack
import numpy as np
# concatenate two arrays vertically, then horizontally
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr_concatenated_v = np.vstack((arr1, arr2))
print("Vertically concatenated array:")
print(arr_concatenated_v)
arr_concatenated_h = np.hstack((arr1, arr2))
print("Horizontally concatenated array:")
print(arr_concatenated_h)
Output
Vertically concatenated array:
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
Horizontally concatenated array:
[[1 2 5 6]
 [3 4 7 8]]
Array Manipulation: numpy.resize
import numpy as np
arr = np.array([[1, 3], [8, 5], [9, 2]])
print(arr)
arr_resized = np.resize(arr, (3, 8))  # elements repeat to fill the larger shape
print("Resized array:")
print(arr_resized)
Output
[[1 3]
 [8 5]
 [9 2]]
Resized array:
[[1 3 8 5 9 2 1 3]
 [8 5 9 2 1 3 8 5]
 [9 2 1 3 8 5 9 2]]
Relational Operators in NumPy arrays
import numpy as np
a = np.array([1, 2, 3, 8, -2])
b = np.array([2, 6, 1, 4, 7])
print(a > b)    # [False False  True  True False]
print(a < b)    # [ True  True False False  True]
print(a == b)   # [False False False False False]
print(a != b)   # [ True  True  True  True  True]
print(a >= b)   # [False False  True  True False]
print(a <= b)   # [ True  True False False  True]
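The boolean arrays produced by relational operators are commonly used as masks to select or count elements. A small sketch, assuming the same a and b as in the comparison example (mask, selected, and count are illustrative names):

```python
import numpy as np

a = np.array([1, 2, 3, 8, -2])
b = np.array([2, 6, 1, 4, 7])

mask = a > b           # boolean array, one entry per element
selected = a[mask]     # keeps the elements of a where mask is True
count = np.sum(mask)   # True counts as 1, False as 0

print(selected)  # [3 8]
print(count)     # 2
```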
Exercise
import numpy
a = numpy.array([[100, 200, 300], [400, 500, 600], [700, 800, 900]])
print(a.ndim)               # 2
print(a.shape)              # (3, 3)
print(a.flatten())          # [100 200 300 400 500 600 700 800 900]
print(numpy.diagonal(a))    # [100 500 900]
print(numpy.max(a))         # 900
print(a[0:1])               # [[100 200 300]]
print(a[0:1, 0:2])          # [[100 200]]
print(a[1][2])              # 600
print(numpy.sum(a))         # 4500
print(numpy.mean(a))        # 500.0
print(numpy.flip(a))        # [[900 800 700]
                            #  [600 500 400]
                            #  [300 200 100]]
Pandas Library
Pandas is a powerful data manipulation and analysis
library for Python that provides a variety of data
structures for working with tabular and labeled data.
The main data structures provided by Pandas are:
Series: A one-dimensional labeled array capable of
holding any data type.
DataFrame: A two-dimensional labeled data structure
with columns of potentially different types. It is similar to
a spreadsheet or SQL table.
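Panel: A three-dimensional labeled data structure, historically used
for heterogeneous data. Panel was deprecated in pandas 0.20 and removed
in pandas 0.25, so it is not available in current versions; a DataFrame
with a MultiIndex is the usual replacement.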
Differences between Series and DataFrame in Pandas:
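A minimal sketch of the most visible difference, dimensionality, is given here. The column names and values are borrowed from the DataFrame examples in these notes as an assumption; this is an illustration, not a complete comparison:

```python
import pandas as pd

# A Series is one-dimensional: a single column of values with an index.
s = pd.Series([100, 4, 51], index=['x', 'y', 'z'], name='pen')

# A DataFrame is two-dimensional: several columns sharing one index.
df = pd.DataFrame({'pen': [100, 4, 51], 'books': [200, 5, 62]},
                  index=['x', 'y', 'z'])

print(s.ndim, s.shape)     # 1 (3,)
print(df.ndim, df.shape)   # 2 (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
col = df['pen']
print(type(col).__name__)  # Series
```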
Add columns in DataFrame
import pandas as pd
a = pd.DataFrame([[100, 200, 300, 400], [4, 5, 3, 8], [51, 62, 41, 36]],
                 columns=['pen', 'books', 'tab', 'lapi'], index=['x', 'y', 'z'])
b = pd.DataFrame([[50, 20, 30, 40], [41, 25, 23, 48], [5, 6, 4, 3]],
                 columns=['pen', 'books', 'tab', 'lapi'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
c = pd.concat([a, b])
print(c)
c['mobile'] = [52, 3, 6, 41, 4, 8]
print('new DataFrame\n', c)
Output
   pen  books  tab  lapi
x  100    200  300   400
y    4      5    3     8
z   51     62   41    36
0   50     20   30    40
1   41     25   23    48
2    5      6    4     3
new DataFrame
   pen  books  tab  lapi  mobile
x  100    200  300   400      52
y    4      5    3     8       3
z   51     62   41    36       6
0   50     20   30    40      41
1   41     25   23    48       4
2    5      6    4     3       8
Delete rows/columns in a DataFrame
Use drop() to delete rows or columns
c = c.drop(index=[0])
c = c.drop(columns=["pen", "lapi"])
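The reassignment in c = c.drop(...) matters because drop returns a new DataFrame and leaves the original unchanged by default. A small sketch, assuming a cut-down version of the stationery table (the frame and variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[100, 200, 300], [4, 5, 3], [51, 62, 41]],
                  columns=['pen', 'books', 'tab'], index=['x', 'y', 'z'])

# drop returns a new DataFrame; df itself is unchanged unless reassigned.
without_row = df.drop(index=['y'])
without_col = df.drop(columns=['tab'])

print(df.shape)            # (3, 3)
print(without_row.shape)   # (2, 3)
print(without_col.shape)   # (3, 2)
```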
Basic information functions
info(), describe(), head(), tail()
import pandas as pd
data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'], ['Jay', 25, 'Bangalore'], ['Anjali',
27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
print(df.info())
Output
     Name  Age       City
0   Rahul   28     Mumbai
1   Priya   30      Delhi
2     Jay   25  Bangalore
3  Anjali   27    Chennai
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    4 non-null      object
 1   Age     4 non-null      int64
 2   City    4 non-null      object
dtypes: int64(1), object(2)
memory usage: 224.0+ bytes
None
import pandas as pd
data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'], ['Jay', 25, 'Bangalore'],
        ['Anjali', 27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
print(df.describe())
Output
     Name  Age       City
0   Rahul   28     Mumbai
1   Priya   30      Delhi
2     Jay   25  Bangalore
3  Anjali   27    Chennai
             Age
count   4.000000
mean   27.500000
std     2.081666
min    25.000000
25%    26.500000
50%    27.500000
75%    28.500000
max    30.000000
import pandas as pd
data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'], ['Jay', 25, 'Bangalore'],
        ['Anjali', 27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
print(df.head(2))
Output
     Name  Age       City
0   Rahul   28     Mumbai
1   Priya   30      Delhi
2     Jay   25  Bangalore
3  Anjali   27    Chennai
    Name  Age    City
0  Rahul   28  Mumbai
1  Priya   30   Delhi
import pandas as pd
data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'], ['Jay', 25, 'Bangalore'],
        ['Anjali', 27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
print(df.tail(1))
Output
     Name  Age       City
0   Rahul   28     Mumbai
1   Priya   30      Delhi
2     Jay   25  Bangalore
3  Anjali   27    Chennai
     Name  Age     City
3  Anjali   27  Chennai
Sort DataFrame
Index Hierarchy
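No example survives in these notes for sorting or for a hierarchical index. A minimal sketch of both, assuming the same Name/Age/City data used in the earlier examples; the choice of sort column and index levels is an assumption made for illustration:

```python
import pandas as pd

data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'],
        ['Jay', 25, 'Bangalore'], ['Anjali', 27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])

# Sort rows by the values in one column (ascending by default).
by_age = df.sort_values('Age')
print(by_age['Name'].tolist())   # ['Jay', 'Anjali', 'Rahul', 'Priya']

# Index hierarchy: build a two-level index from two columns.
hier = df.set_index(['City', 'Name']).sort_index()
print(hier.index.nlevels)                    # 2
print(hier.loc[('Delhi', 'Priya'), 'Age'])   # 30
```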
Introduction to Matplotlib:
• A Powerful Visualization Library
• Data visualization plays a crucial role in understanding
patterns, trends, and relationships in data, making it easier to
communicate insights effectively.
• Widely used Python library for creating high-quality plots
and charts
• Matplotlib is a popular open-source library that provides a
wide range of tools for creating visually appealing plots and
charts.
• With Matplotlib, you can create various types of plots, including
line plots, bar plots, scatter plots, histograms, heatmaps, and more.
This versatility allows you to choose the most appropriate plot type
for your data and effectively communicate insights.
• Matplotlib also provides functionalities for adding titles, labels, and
legends to your plots.
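The title, label, and legend features mentioned above can be sketched as follows. The data values and label strings are made up for illustration, and the object-oriented fig, ax style is one of several ways to write this:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # illustrative data only

fig, ax = plt.subplots()               # one figure containing one axes
ax.plot(x, y, label='y = x squared')   # label is the text shown in the legend
ax.set_title('Line Plot')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
# In an interactive session, plt.show() would display the figure.
```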
matplotlib.pyplot: This module contains the primary plotting
functions that are commonly used for creating and customizing plots. It
provides an interface similar to MATLAB's plotting functions.
matplotlib.figure: This module defines the Figure class, which
represents the entire figure or window that contains one or more axes. It
provides methods for managing and customizing the figure properties.
matplotlib.axes: This module defines the Axes class, which
represents an individual plot or subplot within a figure. It provides
methods for creating and manipulating various types of plots.
matplotlib.collections: This module provides classes for efficient
handling of collections of objects that can be plotted, such as
LineCollection, PatchCollection, etc. It is often used for creating more
complex plots with efficient rendering.
matplotlib.cm: This module contains various color maps
that can be used for mapping numerical values to colors in plots.
matplotlib.colors: This module provides classes and
functions for manipulating and defining colors in plots,
including color maps, color conversions, and color
specifications.
matplotlib.colorbar: This module provides functionality
for creating colorbars, which are used to display the mapping
between numerical values and colors in a plot.
matplotlib.legend: This module provides classes and
functions for creating legends, which are used to label and
explain the elements of a plot.
matplotlib.ticker: This module provides classes and
functions for controlling the formatting and placement of tick
marks on the axes, as well as formatting the tick labels.
matplotlib.gridspec: This module provides classes for
creating more complex grid layouts for subplots within a figure.
matplotlib.image: This module provides functions for
reading, displaying, and manipulating images in plots.
matplotlib.text: This module provides classes for adding
text elements, such as titles, labels, and annotations, to plots.
import matplotlib.pyplot as plt
x = [2, 3, 1, 7, 4]
y = [1, 2, 3, 4, 5]
plt.plot(x, y)  # assumed: the notes do not show which plot type was drawn
plt.show()
Histogram
import matplotlib.pyplot as plt
categories = ['A', 'B', 'C', 'D', 'E']
values = [10, 15, 7, 12, 9]
fig, ax = plt.subplots() # Create a figure and axis
ax.bar(categories, values) # Plot the bar graph
# Customize the plot
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.set_title('Bar Graph')
plt.show()
Dataset Loading:
A collection of data is called a dataset. It has the following two components −
Features − The variables of the data are called its features. They are also known as predictors,
inputs, or attributes.
Feature matrix − The collection of features, in case there is more than one.
Feature names − The list of all the names of the features.
Response − The output variable that depends on the feature variables. It is
also known as the target, label, or output.
Response vector − Used to represent the response column. Generally, there is just one
response column.
Target names − The possible values taken by a response vector.
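These terms map naturally onto NumPy arrays. The sketch below uses a made-up toy dataset; the feature names, target names, and values are hypothetical and serve only to show the shapes involved:

```python
import numpy as np

# Hypothetical toy dataset: 4 samples, 2 features each.
feature_names = ['height_cm', 'weight_kg']
X = np.array([[170, 65], [160, 55], [180, 80], [175, 72]])  # feature matrix

target_names = ['small', 'large']
y = np.array([0, 0, 1, 1])  # response vector: each entry indexes target_names

print(X.shape)   # (4, 2): rows are samples, columns are features
print(y.shape)   # (4,): one response per sample
labels = [target_names[i] for i in y]
print(labels)    # ['small', 'small', 'large', 'large']
```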
•test_size − This represents the ratio of test data to the total given data.
For example, setting test_size = 0.3 for 150 rows of X produces test data
of 150*0.3 = 45 rows.
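The row arithmetic can be checked with a NumPy-only sketch. test_size is the parameter name used by scikit-learn's train_test_split, but this sketch does not call that function; it shuffles row indices by hand, and the data, seed, and variable names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the shuffle is repeatable
X = np.arange(150 * 4).reshape(150, 4)   # hypothetical feature matrix with 150 rows
y = np.arange(150)                       # hypothetical response vector

test_size = 0.3
n_test = int(round(len(X) * test_size))  # 150 * 0.3 = 45
indices = rng.permutation(len(X))        # shuffled row indices
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(X_train.shape, X_test.shape)   # (105, 4) (45, 4)
```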
Supervised Learning: Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it predicts the output.
Supervised learning can be grouped further in two categories of algorithms:
Classification
Regression
Unsupervised Learning: Unsupervised learning is a learning method in which a machine learns
without any supervision.
The training is provided to the machine with a set of data that has not been labeled, classified,
or categorized, and the algorithm needs to act on that data without any supervision. The goal of
unsupervised learning is to restructure the input data into new features or a group of objects
with similar patterns.
It can be further classified into two categories of algorithms:
Clustering
Association
Reinforcement Learning:
Reinforcement learning is a feedback-based learning method in which a learning agent gets a
reward for each right action and a penalty for each wrong action. The agent learns
automatically from this feedback and improves its performance. In reinforcement learning,
the agent interacts with the environment and explores it. The goal of the agent is to collect the most
reward points, and it improves its performance to that end.