Unit5 NumPy Pandas Notes

Programming in Python
Library in Python:
•The Python community has created many open-source libraries, each with its own
source code.
•A library is a pre-written collection of code that can be reused to save time.
As the name implies, it is similar to a physical library in that it holds
reusable resources.
•A Python library is also a group of interconnected modules. It contains
bundles of code that can be reused in a variety of programs, which makes
Python programming simpler and more convenient.
• NumPy - A library for numerical computing in Python.
• Pandas - A library for data manipulation and analysis.
• Matplotlib - A library for data visualization.
• SciPy - A library for scientific computing and
optimization.
• Scikit-learn - A library for machine learning, including
classification, regression, clustering, and more.
• TensorFlow - A library for deep learning and neural
networks.
• Keras - A high-level neural networks API, running on top
of TensorFlow.
• PyTorch - A library for deep learning and neural
networks.
• Django - A popular web development framework for
building web applications.
• Flask - A lightweight web framework for building web
applications.
• BeautifulSoup - A library for web scraping and parsing
HTML and XML documents.
• OpenCV - A library for computer vision and image
processing.
• Pillow - A library for image processing and manipulation.
NumPy Library
• NumPy is a powerful library for numerical
computing in Python.
• It provides an array object that is faster and
more efficient than traditional Python lists for
handling large amounts of numerical data.
Features of NumPy
Efficient numerical computations: NumPy is designed to
handle large amounts of numerical data efficiently. It provides
optimized routines for mathematical operations such as linear
algebra, Fourier transforms, and random number generation,
making it faster than traditional Python lists.
Multidimensional arrays: NumPy provides an n-dimensional
array object that allows you to store and manipulate large
amounts of data in a more compact and efficient way than
traditional Python lists. This makes it easy to perform operations
on large datasets, such as matrix multiplication or statistical
calculations.
• NumPy arrays are faster and more compact than
Python lists.
• An array consumes less memory and is convenient to
use.
• NumPy uses much less memory to store data and it
provides a mechanism of specifying the data types.
• This allows the code to be optimized even further.
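The memory claim above can be checked with a short sketch. The list size and the two dtypes below are chosen only for illustration, and the exact size of the Python list depends on the interpreter build.

```python
import sys
import numpy as np

values = list(range(1000))

# A Python list stores pointers to separate int objects.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A NumPy array stores the raw values in one contiguous buffer.
arr64 = np.array(values, dtype=np.int64)   # 8 bytes per element
arr16 = np.array(values, dtype=np.int16)   # 2 bytes per element (values fit in int16)

print(list_bytes)     # size depends on the Python build
print(arr64.nbytes)   # 8000
print(arr16.nbytes)   # 2000
```

Choosing a smaller dtype is only safe when every value fits in it; int16, for example, holds values from -32768 to 32767.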
Examples
import numpy
arr = numpy.array([10,20,30,40,50])
print(arr)
import numpy as np
arr = np.array([10,20,30,40,50])
print(arr)
Output
[10 20 30 40 50]
import numpy as np
arr = np.array([[10,20,30], [40,50,60]])
print(arr)
Output
[[10 20 30]
 [40 50 60]]
Array Creation:
numpy.array
numpy.zeros
numpy.ones
numpy.empty
numpy.arange
numpy.linspace
numpy.random.rand
numpy.random.randn
numpy.random.randint
Array Manipulation:
numpy.reshape
numpy.ravel
numpy.transpose
numpy.swapaxes
numpy.concatenate
numpy.vstack
numpy.hstack
numpy.split
numpy.resize
numpy.array
import numpy as np
my_list = [1, 2, 3, 4]
arr = np.array(my_list)
print(arr)
numpy.zeros
import numpy as np
arr = np.zeros(5)
print(arr)
Output
[0. 0. 0. 0. 0.]
import numpy as np
arr = np.zeros((2, 3))
print(arr)
Output
[[0. 0. 0.]
 [0. 0. 0.]]
numpy.ones
import numpy as np
arr = np.ones(4)
print(arr)
Output
[1. 1. 1. 1.]
import numpy as np
arr = np.ones((2, 3))
print(arr)
Output
[[1. 1. 1.]
 [1. 1. 1.]]
numpy.empty
import numpy as np
arr = np.empty(5)
print(arr)
import numpy as np
arr = np.empty((2, 2))
print(arr)
np.empty allocates the array without initializing it, so the printed values are arbitrary.
numpy.arange
import numpy as np
arr = np.arange(0, 10, 2)
print(arr)
Output
[0 2 4 6 8]
Create an array of evenly spaced values within a specified
interval: np.arange(start, stop, step size).
import numpy as np
arr = np.arange(0, 20, 2)  # the stop value (20) is excluded
print(arr)
Output
[ 0  2  4  6  8 10 12 14 16 18]
import numpy as np
arr = np.array(range(10))
print(arr)
Output
[0 1 2 3 4 5 6 7 8 9]
numpy.linspace
import numpy as np
arr = np.linspace(0, 1, 5)
print(arr)
Output
[0.   0.25 0.5  0.75 1.  ]
Create an array of evenly spaced numbers in a
specified interval:
numpy.linspace(start, stop, number of elements,
endpoint=True, retstep=False)
import numpy as np
arr = np.linspace(0, 10, 5)
print(arr)
Output
[ 0.   2.5  5.   7.5 10. ]
import numpy as np
arr1 = np.arange(0, 20, 4)
arr2, step = np.linspace(0, 100, 5, endpoint=False, retstep=True)
print(arr1)
print(arr2)
print(step)
Output
[ 0  4  8 12 16]
[ 0. 20. 40. 60. 80.]
20.0
import numpy as np
arr1 = np.arange(0,30,5)
arr2 = np.linspace(0,30,5)
print(arr1)
print(arr2)
OUTPUT
[ 0 5 10 15 20 25]
[ 0. 7.5 15. 22.5 30. ]
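The difference between the two outputs above comes from what each function fixes: np.arange fixes the step, while np.linspace fixes the number of points. The sketch below reuses the same numbers; the variable names are illustrative only.

```python
import numpy as np

start, stop, num = 0, 30, 5

# np.arange: you choose the step; the stop value is excluded.
by_step = np.arange(start, stop, 5)

# np.linspace: you choose the number of points; the stop value is included
# by default, so the spacing is (stop - start) / (num - 1).
by_count, step = np.linspace(start, stop, num, retstep=True)
manual = start + step * np.arange(num)

print(by_step)    # [ 0  5 10 15 20 25]
print(by_count)   # [ 0.   7.5 15.  22.5 30. ]
print(step)       # 7.5
```

As a rule of thumb, np.arange suits integer steps, and np.linspace suits cases where the endpoints and the point count matter more than the step.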
numpy.random.rand
import numpy as np
arr = np.random.rand(2, 2)  # uniform random values in [0, 1)
print(arr)
Output (values differ on every run)
[[0.83938699 0.3221221 ]
 [0.10969336 0.24568426]]
numpy.random.randn
import numpy as np
arr = np.random.randn(2, 2)  # samples from the standard normal distribution
print(arr)
Output (values differ on every run)
[[ 0.83938699 -0.7221221 ]
 [-0.90969336  0.24568426]]
numpy.random.randint
import numpy as np
arr = np.random.randint(0, 20, (2, 3))  # random integers from 0 to 19
print(arr)
Output (values differ on every run)
[[ 9  5 15]
 [11  1  1]]
Mathematical Operations:
numpy.add
numpy.subtract
numpy.multiply
numpy.divide
numpy.power
numpy.exp
numpy.log
numpy.sin
numpy.cos
numpy.tan
numpy.dot
numpy.inner
numpy.outer
Addition: np.add(x, y)
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
result = np.add(x, y)
print(result)
# Output: [5 7 9]
Subtraction: np.subtract(x, y)
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
result = np.subtract(x, y)
print(result)
# Output: [-3 -3 -3]

Multiplication: np.multiply(x, y)
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
result = np.multiply(x, y)
print(result)
# Output: [ 4 10 18]
Division: np.divide(x, y)
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
result = np.divide(x, y)
print(result)
# Output: [0.25 0.4 0.5 ]

Exponentiation: np.power(x, y)
import numpy as np
x = np.array([1, 2, 3])
y = np.array([2, 3, 4])
result = np.power(x, y)
print(result)
# Output: [ 1 8 81]
Sine: np.sin(x)
import numpy as np
x = np.array([0, np.pi/2, np.pi])
result = np.sin(x)
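print(result)
# Output: approximately [0. 1. 0.]
# sin(pi) prints as about 1.22e-16 rather than exactly 0 because of floating-point rounding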
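The list above also names numpy.dot, numpy.inner, and numpy.outer, which have no example in these notes. The following sketch reuses the x and y arrays from the addition example and also shows that the arithmetic operators are shorthand for the element-wise functions.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# The arithmetic operators call the same element-wise functions.
print(x + y)            # same as np.add(x, y)
print(x * y)            # same as np.multiply(x, y)

# For 1-D arrays, dot and inner both give the sum of element-wise products.
print(np.dot(x, y))     # 1*4 + 2*5 + 3*6 = 32
print(np.inner(x, y))   # 32

# outer builds a matrix whose (i, j) entry is x[i] * y[j].
print(np.outer(x, y))
```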

Statistical Functions:
numpy.mean
numpy.median
numpy.var
numpy.std
numpy.min
numpy.max
import numpy
a = numpy.array([13, 24, 22, 13, 11, 28, 16, 24, 18])
print(a)
print('mean:', numpy.mean(a))
print('median:', numpy.median(a))
print('minimum:', numpy.min(a))
print('maximum:', numpy.max(a))
print('sum of array:', numpy.sum(a))
print('product of array:', numpy.prod(a))
print('covariance:', numpy.cov(a))
print('variance:', numpy.var(a))
print('standard deviation:', numpy.std(a))
print('sort an array:', numpy.sort(a))
print('power:', numpy.power(a, 3))
Output
[13 24 22 13 11 28 16 24 18]
mean: 18.77777777777778
median: 18.0
minimum: 11
maximum: 28
sum of array: 169
product of array: 189965647872
covariance: 35.69444444444444
variance: 31.728395061728392
standard deviation: 5.632796380282922
sort an array: [11 13 13 16 18 22 24 24 28]
power: [ 2197 13824 10648  2197  1331 21952  4096 13824  5832]
Note: the product is 189965647872 when the default integer type is 64-bit. Where the default is 32-bit, the product overflows and prints 987086848.
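The covariance and variance lines above differ even though both describe the spread of the same array. A minimal sketch of the reason, using the same data: np.var divides by n by default, while np.cov divides by n - 1.

```python
import numpy as np

a = np.array([13, 24, 22, 13, 11, 28, 16, 24, 18])

population_var = np.var(a)          # divides by n (ddof=0, the default)
sample_var = np.var(a, ddof=1)      # divides by n - 1
cov_value = float(np.cov(a))        # np.cov also divides by n - 1 by default

print(population_var)
print(sample_var)
print(cov_value)
print(np.isclose(sample_var, cov_value))   # True
```

Passing ddof=1 to np.std gives the matching sample standard deviation.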
Array Manipulation:
numpy.flip: reverse the array elements
import numpy as np
arr = np.array([1, 8, 3, 9, 5, -6])
arr_r = np.flip(arr)
print(arr_r)
Output
[-6  5  9  3  8  1]
numpy.reshape
import numpy as np
arr = np.array([[1, 8, 3], [9, 5, -6]])
arr_r = np.reshape(arr, (3, 2))
print(arr_r)
Output
[[ 1  8]
 [ 3  9]
 [ 5 -6]]
The total number of elements must be the same before and after reshaping (here 2 x 3 = 3 x 2 = 6).
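The element-count rule for reshape can be seen in a short sketch that reuses the array from the reshape example. The -1 shorthand and the failing 4 x 2 shape are additions for illustration.

```python
import numpy as np

arr = np.array([[1, 8, 3], [9, 5, -6]])   # 6 elements

# -1 asks NumPy to work out that dimension from the element count.
auto = arr.reshape(-1, 2)
print(auto.shape)   # (3, 2)

# 6 elements cannot fill a 4 x 2 array, so reshape raises ValueError.
try:
    arr.reshape(4, 2)
    failed = False
except ValueError as exc:
    failed = True
    print("reshape failed:", exc)
```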
numpy.ravel: flatten an array into a 1D array
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr_flattened = np.ravel(arr)
print(arr_flattened)
Output
[1 2 3 4 5 6]
numpy.transpose: transpose of an array
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr_transposed = np.transpose(arr)
print(arr_transposed)
Output
[[1 4]
 [2 5]
 [3 6]]
numpy.concatenate
import numpy as np
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr_concatenated = np.concatenate((arr1, arr2))
print(arr_concatenated)
Output
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
numpy.vstack and numpy.hstack
import numpy as np
# concatenate two arrays vertically and horizontally
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr_concatenated_v = np.vstack((arr1, arr2))
print("Vertically concatenated array:")
print(arr_concatenated_v)
arr_concatenated_h = np.hstack((arr1, arr2))
print("Horizontally concatenated array:")
print(arr_concatenated_h)
Output
Vertically concatenated array:
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
Horizontally concatenated array:
[[1 2 5 6]
 [3 4 7 8]]
numpy.resize
import numpy as np
arr = np.array([[1, 3], [8, 5], [9, 2]])
print(arr)
arr_resized = np.resize(arr, (3, 8))
print("Resized array:")
print(arr_resized)
Output
[[1 3]
 [8 5]
 [9 2]]
Resized array:
[[1 3 8 5 9 2 1 3]
 [8 5 9 2 1 3 8 5]
 [9 2 1 3 8 5 9 2]]
Relational Operators in numpy array
import numpy as np
a = np.array([1, 2, 3, 8,-2])
b = np.array([2, 6, 1, 4, 7])
print(a > b)    # [False False  True  True False]
print(a < b)    # [ True  True False False  True]
print(a == b)   # [False False False False False]
print(a != b)   # [ True  True  True  True  True]
print(a >= b)   # [False False  True  True False]
print(a <= b)   # [ True  True False False  True]
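The Boolean arrays produced by these comparisons are often used as masks. A small sketch with the same a and b; the choice of np.count_nonzero and np.where is illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 8, -2])
b = np.array([2, 6, 1, 4, 7])

mask = a > b                     # Boolean array, one entry per element
print(a[mask])                   # elements of a that exceed b: [3 8]
print(np.count_nonzero(mask))    # how many comparisons are True: 2
print(np.where(mask, a, b))      # element-wise larger value: [2 6 3 8 7]
```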
Exercise
import numpy
a = numpy.array([[100,200,300],[400,500,600],[700,800,900]])
print(a.ndim)               # 2
print(a.shape)              # (3, 3)
print(a.flatten())          # [100 200 300 400 500 600 700 800 900]
print(numpy.diagonal(a))    # [100 500 900]
print(numpy.max(a))         # 900
print(a[0:1])               # [[100 200 300]]
print(a[0:1, 0:2])          # [[100 200]]
print(a[1][2])              # 600
print(numpy.sum(a))         # 4500
print(numpy.mean(a))        # 500.0
print(numpy.flip(a))
Output of the last statement
[[900 800 700]
 [600 500 400]
 [300 200 100]]
Pandas Library
Pandas is a powerful data manipulation and analysis
library for Python that provides a variety of data
structures for working with tabular and labeled data.
The main data structures provided by Pandas are:
Series: A one-dimensional labeled array capable of
holding any data type.
DataFrame: A two-dimensional labeled data structure
with columns of potentially different types. It is similar to
a spreadsheet or SQL table.
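Panel: A three-dimensional labeled data structure, used
for heterogeneous data. Panel was removed in pandas 0.25;
current versions represent such data with a MultiIndex DataFrame.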
Differences between Series and DataFrame in Pandas:
Dimensionality: A Series is a one-dimensional data structure, while a
DataFrame is a two-dimensional data structure.
Data Structure: A Series can hold a single column of data, while a
DataFrame can hold multiple columns of data.
Index: A Series has only one index, while a DataFrame has both a row
index and a column index.
Size: A Series can have any length, while a DataFrame must have the
same length for all its columns.
Accessing Data: In a Series, data can be accessed using only the index.
In a DataFrame, data can be accessed using both the row index and the
column index.
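The differences listed above can be checked with a small sketch. The column names are borrowed from the DataFrame examples later in these notes, and the values are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"pen": [100, 4, 51], "books": [200, 5, 62]},
    index=["x", "y", "z"],
)

column = df["pen"]            # one column of a DataFrame is a Series
print(type(column).__name__)  # Series
print(column.ndim, df.ndim)   # 1 2

print(column["y"])            # Series: the index label alone is enough -> 4
print(df.loc["y", "books"])   # DataFrame: row label and column label -> 5
```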
How to Create Series using pandas
Create Series from list/array
import pandas as pd
a = [10, 20, 30, 40, 50]
b = pd.Series(a)
print(b)
Output
0    10
1    20
2    30
3    40
4    50
dtype: int64
Create Series using index
import pandas as pd
s = pd.Series([3,4,-5,8], index=['a','b','c','d'])
print(s)
a 3
b 4
c -5
d 8
dtype: int64
With specified index:
import pandas as pd
my_list = [10, 20, 30, 40, 50]
my_index = ['a', 'b', 'c', 'd', 'e']
my_series = pd.Series(my_list, index=my_index)
print(my_series)
Output
a    10
b    20
c    30
d    40
e    50
dtype: int64
Create Series using dictionary
import pandas as pd
my_dict = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
my_series = pd.Series(my_dict)
print(my_series)
Output
a    10
b    20
c    30
d    40
e    50
dtype: int64
import pandas as p
s = p.Series([3,4,-5,8], [5,6,4,"a"])
print(s)
print(s.shape)
print(s.size)
print(s.keys())
Output
5    3
6    4
4   -5
a    8
dtype: int64
(4,)
4
Index([5, 6, 4, 'a'], dtype='object')
Creating DataFrame
DataFrame(data, columns=list of column names)
data: multi-dimensional data of any data type
columns: the list of column names
Example
import pandas as pd
a = pd.DataFrame([[100,200,300,400],[4,5,3,8],[51,62,41,36]],
                 columns=['pen', 'books', 'tab', 'lapi'])
print(a)
print(a.shape)
print(a.size)
print(a.keys())
Output
   pen  books  tab  lapi
0  100    200  300   400
1    4      5    3     8
2   51     62   41    36
(3, 4)
12
Index(['pen', 'books', 'tab', 'lapi'], dtype='object')
Add Rows / new Data frame
import pandas as pd
a = pd.DataFrame([[100,200,300,400],[4,5,3,8],[51,62,41,36]],
                 columns=['pen', 'books', 'tab', 'lapi'], index=['x','y','z'])
b = pd.DataFrame([[50,20,30,40],[41,25,23,48],[5,6,4,3]],
                 columns=['pen', 'books', 'tab', 'lapi'])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
c = pd.concat([a, b])
print(c)
Output
   pen  books  tab  lapi
x  100    200  300   400
y    4      5    3     8
z   51     62   41    36
0   50     20   30    40
1   41     25   23    48
2    5      6    4     3
Add columns in Data frame
import pandas as pd
a = pd.DataFrame([[100,200,300,400],[4,5,3,8],[51,62,41,36]],
                 columns=['pen', 'books', 'tab', 'lapi'], index=['x','y','z'])
b = pd.DataFrame([[50,20,30,40],[41,25,23,48],[5,6,4,3]],
                 columns=['pen', 'books', 'tab', 'lapi'])
c = pd.concat([a, b])  # replaces a.append(b), which was removed in pandas 2.0
c['mobile'] = [52, 3, 6, 41, 4, 8]  # one value per row
print('new DataFrame\n', c)
Output
new DataFrame
    pen  books  tab  lapi  mobile
x  100    200  300   400      52
y    4      5    3     8       3
z   51     62   41    36       6
0   50     20   30    40      41
1   41     25   23    48       4
2    5      6    4     3       8
Delete Row/ columns in Data frame
Use drop() to delete rows or columns. It returns a new DataFrame, so assign the result.
c = c.drop(index=[0])
c = c.drop(columns=["pen", "lapi"])
Basic information functions
info(), describe(), head(), tail()
import pandas as pd
data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'], ['Jay', 25, 'Bangalore'],
        ['Anjali', 27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
print(df.info())
Output
     Name  Age       City
0   Rahul   28     Mumbai
1   Priya   30      Delhi
2     Jay   25  Bangalore
3  Anjali   27    Chennai
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    4 non-null      object
 1   Age     4 non-null      int64
 2   City    4 non-null      object
dtypes: int64(1), object(2)
memory usage: 224.0+ bytes
None
df.info() prints its report itself and returns None, which is why None appears as the last line. Calling df.info() without print() avoids it.
import pandas as pd
data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'], ['Jay', 25, 'Bangalore'],
        ['Anjali', 27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
print(df.describe())
Output
     Name  Age       City
0   Rahul   28     Mumbai
1   Priya   30      Delhi
2     Jay   25  Bangalore
3  Anjali   27    Chennai
             Age
count   4.000000
mean   27.500000
std     2.081666
min    25.000000
25%    26.500000
50%    27.500000
75%    28.500000
max    30.000000
import pandas as pd
data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'], ['Jay', 25, 'Bangalore'],
        ['Anjali', 27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
print(df.head(2))
Output
     Name  Age       City
0   Rahul   28     Mumbai
1   Priya   30      Delhi
2     Jay   25  Bangalore
3  Anjali   27    Chennai
    Name  Age    City
0  Rahul   28  Mumbai
1  Priya   30   Delhi
import pandas as pd
data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'], ['Jay', 25, 'Bangalore'],
        ['Anjali', 27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df.tail(1))
Output
     Name  Age     City
3  Anjali   27  Chennai
Sort DataFrame
import pandas as pd
data = [['Rahul', 28, 'Mumbai'], ['Priya', 30, 'Delhi'], ['Jay', 25, 'Bangalore'],
        ['Anjali', 27, 'Chennai']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df.sort_values('Name', ascending=True))
Output
     Name  Age       City
3  Anjali   27    Chennai
2     Jay   25  Bangalore
1   Priya   30      Delhi
0   Rahul   28     Mumbai
Index Hierarchy
A MultiIndex gives a DataFrame more than one index level. Here the rows are indexed by State and then by Gender.
import pandas as pd
names = ['Raj', 'Sita', 'Amit', 'Neha', 'Vijay', 'Priya']
states = ['Karnataka', 'Maharashtra', 'Karnataka', 'Delhi', 'Maharashtra', 'Delhi']
genders = ['Male', 'Female', 'Male', 'Female', 'Male', 'Female']
ages = [28, 32, 25, 29, 35, 27]
index = pd.MultiIndex.from_arrays([states, genders], names=['State', 'Gender'])
df = pd.DataFrame({'Name': names, 'Age': ages}, index=index)
print(df)
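Once a hierarchical index exists, rows can be selected or aggregated by level. The sketch below rebuilds the same DataFrame and shows a few common operations. Sorting the index first is an added step, commonly recommended before label-based selection on a MultiIndex.

```python
import pandas as pd

names = ['Raj', 'Sita', 'Amit', 'Neha', 'Vijay', 'Priya']
states = ['Karnataka', 'Maharashtra', 'Karnataka', 'Delhi', 'Maharashtra', 'Delhi']
genders = ['Male', 'Female', 'Male', 'Female', 'Male', 'Female']
ages = [28, 32, 25, 29, 35, 27]

index = pd.MultiIndex.from_arrays([states, genders], names=['State', 'Gender'])
df = pd.DataFrame({'Name': names, 'Age': ages}, index=index).sort_index()

# Select every row for one value of the outer level.
karnataka = df.loc['Karnataka']
print(karnataka)

# Select by both levels with a tuple (the trailing ':' means "all columns").
delhi_female = df.loc[('Delhi', 'Female'), :]
print(delhi_female)

# Aggregate over one level of the index.
mean_age = df.groupby(level='State')['Age'].mean()
print(mean_age)
```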
Introduction to Matplotlib:
• A Powerful Visualization Library
• Data visualization plays a crucial role in understanding
patterns, trends, and relationships in data, making it easier to
communicate insights effectively.
• Widely used Python library for creating high-quality plots
and charts
• Matplotlib is a popular open-source library that provides a
wide range of tools for creating visually appealing plots and
charts.
• With Matplotlib, you can create various types of plots, including
line plots, bar plots, scatter plots, histograms, heatmaps, and more.
This versatility allows you to choose the most appropriate plot type
for your data and effectively communicate insights.
• Matplotlib also provides functionalities for adding titles, labels, and
legends to your plots.
matplotlib.pyplot: This subpackage contains the primary plotting
functions that are commonly used for creating and customizing plots. It
provides an interface similar to MATLAB's plotting functions.
matplotlib.figure: This subpackage defines the Figure class, which
represents the entire figure or window that contains one or more axes. It
provides methods for managing and customizing the figure properties.
matplotlib.axes: This subpackage defines the Axes class, which
represents an individual plot or subplot within a figure. It provides
methods for creating and manipulating various types of plots.
matplotlib.collections: This subpackage provides classes for efficient
handling of collections of objects that can be plotted, such as
LineCollection, PatchCollection, etc. It is often used for creating more
complex plots with efficient rendering.
matplotlib.cm: This subpackage contains various color maps
that can be used for mapping numerical values to colors in plots.
matplotlib.colors: This subpackage provides classes and
functions for manipulating and defining colors in plots,
including color maps, color conversions, and color
specifications.
matplotlib.colorbar: This subpackage provides functionality
for creating colorbars, which are used to display the mapping
between numerical values and colors in a plot.
matplotlib.legend: This subpackage provides classes and
functions for creating legends, which are used to label and
explain the elements of a plot.
matplotlib.ticker: This subpackage provides classes and
functions for controlling the formatting and placement of tick
marks on the axes, as well as formatting the tick labels.
matplotlib.gridspec: This subpackage provides classes for
creating more complex grid layouts for subplots within a figure.
matplotlib.image: This subpackage provides functions for
reading, displaying, and manipulating images in plots.
matplotlib.text: This subpackage provides classes for adding
text elements, such as titles, labels, and annotations, to plots.
import matplotlib.pyplot as p
x = [2, 3, 1, 7, 4]
y = [1, 2, 3, 4, 5]
fig,f= p.subplots() # Create a figure and axis
f.plot(x, y) # Plot the data

f.set_xlabel('X-axis') # Customize the plot
f.set_ylabel('Y-axis')
f.set_title('Simple Line Plot')
p.show() # Show the plot

import matplotlib.pyplot as plt
# Create a figure and axis
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [1, 4, 9, 16, 25], 'ro-')
ax.set_xlabel('Temperature')
ax.set_ylabel('Resistance')
ax.set_title('Thermal effect on Resistance')
# Show the plot
plt.show()
import matplotlib.pyplot as plt
y = [100, 200, 300, 400, 500]
x = [20, 44, 36, 58, 100]
color = ['red', 'green', 'blue', 'yellow', 'purple']
sizes = [30, 60, 90, 120, 150]
fig, ax = plt.subplots() # Create a figure and axis
ax.scatter(x, y, c=color, s=sizes) # Plot the data
ax.set_xlabel('X-axis') # Customize the plot
ax.set_ylabel('Y-axis')
ax.set_title('Scatter Plot')
plt.show() # Show the plot
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
symbols = ['o', 's', '^', 'D', 'v']
fig, ax = plt.subplots() # Create a figure and axis
for i in range(len(x)): # Plot each point with its own marker
    ax.scatter(x[i], y[i], marker=symbols[i], s=100)
ax.set_xlabel('X-axis') # Customize the plot
ax.set_ylabel('Y-axis')
ax.set_title('Scatter Plot with Different Symbols')
# Show the plot
plt.show()
Histogram
import matplotlib.pyplot as plt
data = [1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 9,
        9, 10, 12, 10, 14, 12, 14, 10, 9, 8, 12]
fig, ax = plt.subplots() # Create a figure and axis
ax.hist(data, bins=6, edgecolor='black') # Plot the histogram
ax.set_xlabel('Value') # Customize the plot
ax.set_ylabel('Frequency')
ax.set_title('Histogram')
plt.show() # Show the plot
Bar Graph
import matplotlib.pyplot as plt
categories = ['A', 'B', 'C', 'D', 'E']
values = [10, 15, 7, 12, 9]
fig, ax = plt.subplots() # Create a figure and axis
ax.bar(categories, values) # Plot the bar graph
# Customize the plot
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.set_title('Bar Graph')
plt.show()
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 50) # Generate some data
y1 = np.sin(x)
y2 = np.cos(x)
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(8, 6)) # Create subplots
axes[0].plot(x, y1, label='sin(x)') # Line plot on the first subplot
axes[0].set_xlabel('x')
axes[0].set_ylabel('sin(x)')
axes[0].set_title('Line Plot 1')
axes[0].legend()
axes[1].scatter(x, y2, label='cos(x)', color='red', marker='o') # Scatter plot on the second subplot
axes[1].set_xlabel('x')
axes[1].set_ylabel('cos(x)')
axes[1].set_title('Scatter Plot 2')
axes[1].legend()
plt.tight_layout() # Adjust spacing between subplots
plt.show() # Display the figure

Scikit-learn
Scikit-learn is an open-source Python library that implements a range of
machine learning, pre-processing, cross-validation, and visualization algorithms
using a unified interface.
Important features of scikit-learn:
Simple and efficient tools for data mining and data analysis. It features various
classification, regression, and clustering algorithms including support vector
machines, random forests, gradient boosting, k-means, etc.
Accessible to everybody and reusable in various contexts.
Built on top of NumPy, SciPy, and matplotlib.
Open source and commercially usable under the BSD license.
Scikit Learn - Modelling Process:
DataSet Loading:
A collection of data is called a dataset. It has the following two components:
Features: The variables of the data are called its features. They are also known as predictors,
inputs, or attributes.
Feature matrix: The collection of features, in case there is more than one.
Feature names: The list of all the names of the features.
Response: The output variable, which depends on the feature variables. It is
also known as the target, label, or output.
Response vector: Represents the response column. Generally, we have just one
response column.
Target names: The possible values taken by a response vector.
Splitting the dataset:
To check the accuracy of our model, we can split the dataset into two pieces: a training set and a
testing set. Scikit-learn provides the train_test_split() function for this. It has the following
arguments:
•X, Y: X is the feature matrix and Y is the response vector,
which need to be split.
•test_size: The ratio of test data to the total given data.
For example, setting test_size = 0.3 for 150 rows of X
produces test data of 150*0.3 = 45 rows.
•random_state: Used to guarantee that the split will always be the
same. This is useful in situations where you want reproducible
results.
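To make the arithmetic concrete, here is a hand-written sketch of a seeded 70/30 split using only NumPy. It illustrates the idea behind train_test_split() and is not scikit-learn's implementation. The data, the seed, and the variable names are assumptions chosen to match the 150-row example.

```python
import numpy as np

# Hypothetical data: 150 rows, 4 features, 3 classes (sizes chosen to match the text).
X = np.arange(150 * 4).reshape(150, 4)
y = np.repeat([0, 1, 2], 50)

test_size = 0.3
rng = np.random.default_rng(seed=1)      # fixed seed -> same split every run

indices = rng.permutation(len(X))        # shuffle the row numbers
n_test = int(round(len(X) * test_size))  # 150 * 0.3 = 45

test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(X_train.shape, X_test.shape)   # (105, 4) (45, 4)
```

The fixed seed plays the role that random_state plays in train_test_split(): rerunning the code produces the same rows in each set.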
Introduction to Machine Learning
Machine learning is programming computers to optimize a performance criterion
using example data or past experience. We have a model defined up to some
parameters, and learning is the execution of a computer program to optimize the
parameters of the model using the training data or past experience. The model
may be predictive to make predictions in the future, or descriptive to gain
knowledge from data.
The field of study known as machine learning is concerned with the question of
how to construct computer programs that automatically improve with experience.
How does Machine Learning work:
A Machine Learning system learns from historical data, builds prediction
models, and whenever it receives new data, predicts the output for it. The accuracy
of the predicted output depends on the amount of data, since a larger amount of data
helps to build a better model that predicts the output more accurately.
Features of Machine Learning:
Machine learning uses data to detect various patterns in a given dataset.
It can learn from past data and improve automatically.
It is a data-driven technology.
Machine learning is similar to data mining, as it also deals with huge amounts of data.
Following are some key points that show the importance of Machine Learning:
Rapid growth in the production of data
Solving complex problems that are difficult for a human
Decision-making in various sectors, including finance
Finding hidden patterns and extracting useful information from data.
Classification of Machine Learning:
At a broad level, machine learning can be classified into three types:
Supervised learning
Unsupervised learning
Reinforcement learning
Supervised Learning: Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it predicts the output.
Supervised learning can be grouped further in two categories of algorithms:
Classification
Regression
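As a minimal illustration of supervised learning in the regression sense, the sketch below fits a straight line to labeled examples and then predicts an unseen input. The data (y = 2x + 1) and the use of np.polyfit are assumptions made for illustration; real projects would typically use a library such as scikit-learn.

```python
import numpy as np

# Hypothetical labeled training data: inputs x with known outputs y = 2x + 1.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = 2.0 * x_train + 1.0

# "Training": fit a straight line (degree-1 polynomial) by least squares.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# "Prediction": apply the fitted line to an input the model has not seen.
x_new = 10.0
y_pred = slope * x_new + intercept

print(round(float(slope), 3), round(float(intercept), 3))   # 2.0 1.0
print(round(float(y_pred), 3))                              # 21.0
```

Classification follows the same train-then-predict pattern, except that the labels are categories rather than numbers.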
Unsupervised Learning: Unsupervised learning is a learning method in which a machine learns
without any supervision.
The training is provided to the machine with a set of data that has not been labeled, classified,
or categorized, and the algorithm needs to act on that data without any supervision. The goal of
unsupervised learning is to restructure the input data into new features or a group of objects
with similar patterns.
It can be further classified into two categories of algorithms:
Clustering
Association
Reinforcement Learning:
Reinforcement learning is a feedback-based learning method, in which a learning agent gets a
reward for each right action and gets a penalty for each wrong action. The agent learns
automatically from this feedback and improves its performance. In reinforcement learning,
the agent interacts with the environment and explores it. The goal of an agent is to get the most
reward points, and hence, it improves its performance.
End of the Topic
