
Python Libraries for Machine Learning

Rohit Gupta

All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in
any form or by any means, including photocopying, recording, or other electronic or mechanical
methods, without the prior written permission of the publisher, except in the case of brief
quotations embodied in critical reviews and certain other noncommercial uses permitted by
copyright law. Although the author/co-author and publisher have made every effort to ensure
that the information in this book was correct at press time, the author/co-author and publisher do
not assume and hereby disclaim any liability to any party for any loss, damage, or disruption
caused by errors or omissions, whether such errors or omissions result from negligence,
accident, or any other cause. The resources in this book are provided for informational purposes
only and should not be used to replace the specialized training and professional judgment of a
health care or mental health care professional. Neither the author/co-author nor the publisher
can be held responsible for the use of the information provided within this book. Please always
consult a trained professional before making any decision regarding the treatment of yourself or
others.

Author – Rohit Gupta


Publisher – C# Corner
Editorial Team – Deepak Tewatia, Baibhav Kumar
Publishing Team – Praveen Kumar
Promotional & Media – Rohit Tomar

Background and Expertise

As a Technical Lead at MCN Solutions, I work with a team of engineers and developers to
design, develop, and implement blockchain and IoT solutions for various clients across different
domains and industries. I am responsible for leading the technical architecture, coding
standards, testing, and deployment of the projects, as well as mentoring and guiding the junior
members of the team.

I have a Master of Science degree in Informatics from the Institute of Informatics and
Communication, University of Delhi. I also hold an Intel Edge AI for IoT Nanodegree from
Udacity, where I learned how to deploy AI models on edge devices. I am passionate about
exploring the intersection of blockchain and IoT, and how they can enhance the security,
efficiency, and scalability of various applications. I am also a Technical Writer and a Program
Director at C# Corner, where I share my knowledge and experience with the community and
help them with their queries and issues. I have been a speaker at TED Circles, where I
discussed the potential and challenges of blockchain and IoT in the future.

— Rohit Gupta

Table of Contents:
Introduction
ML Python Libraries
Python NumPy
Python Pandas
Python Scikit-Learn
Python Matplotlib
Python Seaborn
Python TensorFlow

1
Introduction

Overview

In this chapter, we outline the essential pre-requisites for this book,
including a basic understanding of Python 3.8, familiarity with Google
Colab, and prior knowledge of object-oriented programming. We'll
cover key Python libraries like NumPy, Pandas, Scikit-Learn,
Matplotlib, Seaborn, and TensorFlow. We also provide installation
steps for Python 3 on Windows, Ubuntu, and macOS.

Pre-Requisites
• To get the best out of this book, it is recommended to have prior knowledge of object-
oriented programming using Python.
• At the time of writing, the current version of Python is 3.8. Since Python 2 is now obsolete
and TensorFlow 2.1 is the last stable release to support Python 2, we will be using
Python 3 throughout the book.
You can check your version using
python3 --version

To learn about programming in Python 3, you can read my book on Python 3.


• It is recommended to have prior experience with Google Colab, as we will be using it as
our Python environment. To get hands-on practice, feel free to visit Google Colab at
https://colab.research.google.com.

What is in this Book?


In this book, we will learn to use the following Python libraries:
1. NumPy
2. Pandas
3. SciKit-Learn
4. Matplotlib
5. Seaborn
6. TensorFlow
In this book, we will discuss the various functionalities of these Python libraries and implement
each functionality using Python 3.
After completing the book, you should be able to use and implement each of the listed Python
libraries to your benefit.

Steps to Install Python 3


Windows
• Python's latest version can be downloaded from https://www.python.org/downloads/.
• At the time of writing this, the latest version was 3.8.1.
• Run the installer file as Administrator.
• Make sure to tick "Add Python 3.x to PATH"; adding Python to PATH makes Python
available to you over the whole system.
• Click on Install Now
• Wait till the process is over.

Ubuntu
• For Ubuntu 17.10+, python3 comes installed already
• For Ubuntu 16.10 and 17.04
sudo apt-get update && sudo apt-get install python3

• For Ubuntu 14.04 and 16.04
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3

macOS / Mac OS X
• Install Homebrew: type in the following command in Terminal and hit Enter
/usr/bin/ruby -e "$(curl -fsSL
https://raw.githubusercontent.com/Homebrew/install/master/install)"

• Open Terminal and type in the following command and hit enter
brew install python3

2
ML Python Libraries

Overview

In this chapter, we introduce key Python libraries for data analysis
and machine learning: NumPy for numerical operations, Pandas for
data manipulation, Scikit-Learn for machine learning algorithms,
Matplotlib and Seaborn for data visualization, and TensorFlow for
neural networks. Each library's purpose and official resources are
highlighted, setting the foundation for deeper exploration in the
following chapters.

Numpy
According to Wikipedia, Numeric, the ancestor of NumPy, was developed by Jim Hugunin.
Another package, Numarray was also developed, having some additional functionalities. In
2005, Travis Oliphant created the NumPy package by incorporating the features of Numarray
into the Numeric package. There are many contributors to this open-source project.
According to the NumPy Official Documentation, Numpy or Numerical Python is a Python library
that provides the following:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier Transform, and random number capabilities.
It can also provide an efficient multi-dimensional container of generic data. Arbitrary data types
can be defined. The official website is www.numpy.org.

Pandas
According to Wikipedia, Pandas is a software library written in Python for data manipulation and
analysis. In particular, it offers data structures and operations for manipulating numerical tables
and time series. It is free software released under the three-clause BSD license. The name is
derived from the term "panel data", an econometrics term for data sets that include observations
over multiple periods for the same individuals.
The original author is Wes McKinney. Pandas was first released on 11 January 2008. The
official website is www.pandas.pydata.org. Key features of the library include:
• DataFrame object for data manipulation with integrated indexing.
• Tools for reading and writing data between in-memory data structures and different file
formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, fancy indexing, and sub-setting of large data sets.
• Data structure column insertion and deletion.
• Group by engine allowing split-apply-combine operations on data sets.
• Data set merging and joining.
• Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional
data structure.
• Time-series functionality: Date range generation and frequency conversion, moving
window statistics, moving window linear regressions, date shifting and lagging.
• It provides data filtration.

Scikit-Learn
According to Wikipedia, Scikit-learn (formerly scikits.learn) is a free software machine learning
library for the Python programming language.
It contains various pre-implemented classification, regression and clustering algorithms including
support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is
designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
The scikit-learn project started as scikits.learn, a Google Summer of Code project by David
Cournapeau. Its name stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately-
developed and distributed third-party extension to SciPy.
It was first released in June 2012. The official website is www.scikit-learn.org.

Matplotlib
According to Wikipedia, Matplotlib is a plotting library for the Python programming language and
its numerical mathematics extension NumPy. It provides an object-oriented API for embedding
plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
There is also a procedural "pylab" interface based on a state machine (like OpenGL), designed
to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of
matplotlib.
Matplotlib was originally written by John D. Hunter, has an active development community, and
is distributed under a BSD-style license. It was first released in 2003. The official website is
www.matplotlib.org.

Seaborn
According to the Seaborn Official Documentation, Seaborn is a library for making statistical
graphics in Python. It is built on top of matplotlib and closely integrated with pandas data
structures.
Here are some of the functionalities that seaborn offers:
• A dataset-oriented API for examining relationships between multiple variables
• Specialized support for using categorical variables to show observations or aggregate
statistics
• Options for visualizing univariate or bivariate distributions and for comparing them
between subsets of data
• Automatic estimation and plotting of linear regression models for different kinds of
dependent variables
• Convenient views onto the overall structure of complex datasets
• High-level abstractions for structuring multi-plot grids that let you easily build complex
visualizations
• Concise control over matplotlib figure styling with several built-in themes
• Tools for choosing colour palettes that faithfully reveal patterns in your data
Seaborn aims to make visualization a central part of exploring and understanding data. Its
dataset-oriented plotting functions operate on dataframes and arrays containing whole datasets
and internally perform the necessary semantic mapping and statistical aggregation to produce
informative plots.
The official website is www.seaborn.pydata.org.

TensorFlow
According to Wikipedia, TensorFlow is a free and open-source software library for dataflow and
differentiable programming across a range of tasks. It is a symbolic math library and is also used
for machine learning applications such as neural networks. It is used for both research and
production at Google.
TensorFlow was developed by the Google Brain team for internal Google use. It was released
under the Apache License 2.0 on November 9, 2015. The official website is www.tensorflow.org.

Conclusion
In this chapter, I introduced you to NumPy, Pandas, Scikit-Learn (sklearn), Seaborn, Matplotlib
and TensorFlow. In the next chapters, we will study each of these libraries in detail.

3
Python NumPy

Overview

In this chapter, we delve into Python NumPy, covering its history,
installation, and key features. You'll learn about creating and
manipulating N-dimensional arrays, essential attributes, and functions
like numpy.zeros(), numpy.arange(), and numpy.random.random().
This overview will equip you with the knowledge to harness NumPy's
power for numerical and scientific computing in Python.

What is Python NumPy?
According to Wikipedia, Numeric, the ancestor of NumPy, was developed by Jim Hugunin.
Another package, Numarray, was also developed, having some additional functionalities. In 2005,
Travis Oliphant created the NumPy package by incorporating the features of Numarray into the
Numeric package. There are many contributors to this open-source project.
NumPy or Numerical Python is a Python library that provides the following:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier Transform and random number capabilities.
It can also provide an efficient multi-dimensional container of generic data. Arbitrary data-types
can be defined. The official website is http://www.numpy.org.

Installing NumPy in Python


Ubuntu/ Linux
sudo apt update -y
sudo apt upgrade -y
sudo apt install python3-tk python3-pip -y
sudo pip3 install numpy

Anaconda
conda install -c anaconda numpy

NumPy Array
It is a powerful N-dimensional array whose data is arranged in the form of rows and columns. We
can initialize NumPy arrays from nested Python lists and access their elements.
A NumPy array is not the same as the standard Python library class array.array, which only
handles one-dimensional arrays.
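To make the distinction concrete, here is a minimal sketch contrasting the standard library
array.array with numpy.ndarray (the printed values are noted in the comments):
import array
import numpy as np

std = array.array('i', [1, 2, 3])        # 1D only, fixed typecode
npa = np.array([[1, 2, 3], [4, 5, 6]])   # N-dimensional

print(type(std))   # <class 'array.array'>
print(type(npa))   # <class 'numpy.ndarray'>
print(std * 2)     # repetition: array('i', [1, 2, 3, 1, 2, 3])
print(npa * 2)     # elementwise: [[ 2  4  6] [ 8 10 12]]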

Single Dimensional NumPy Array


Single-dimensional arrays are NumPy arrays where the data is stored linearly in contiguous
memory locations.
import numpy as np
a = np.array([1,2,3])
print(a)
The above code will result in [1 2 3]

Multi-Dimensional arrays
Multidimensional arrays are NumPy arrays where the data is arranged in rows and columns (and
higher dimensions); the underlying memory is still contiguous.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a)
The above code will result in [[1 2 3] [4 5 6]]

NumPy Array Attributes
ndarray.ndim
It returns the number of axes (dimensions) of the array.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.ndim)
The output of the above code will be 2, since 'a' is a 2D array

ndarray.shape
It returns a tuple with the dimensions of the array, i.e. (n,m), where n is the number of rows and
m is the number of columns.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.shape)
The output of the above code will be (2,3), i.e. 2 rows and 3 columns

ndarray.size
It returns the total number of elements of the array.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.size)
The output of the above code will be 6 i.e. 2 x 3

ndarray.dtype
It returns an object describing the type of elements in the array.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.dtype)
The output of the above code will be "int32" i.e. 32-bit integer
we can explicitly define the data type of a NumPy array
import numpy as np
a = np.array([[1,2,3],[4,5,6]], dtype = float)
print(a.dtype)
The above code will return "float64" i.e. 64-bit float

ndarray.itemsize
It returns the size in bytes of each element of the array.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.itemsize)
The output of the above code will be 4 for int32 (or 8 for int64), i.e. the dtype's bit width divided by 8.

ndarray.data
It returns the buffer containing the actual elements of the array. This is an alternative to
accessing the elements through indexing.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.data)
The above code will print the memory buffer object (e.g. <memory at 0x...>); in practice,
elements are usually accessed through indexing instead.

ndarray.sum()
The function will return the sum of all the elements of the ndarray
import numpy as np
a = np.random.random( (2,3) )
print(a)
print(a.sum())

The matrix generated for me is
[[0.46541517 0.66668157 0.36277909]
 [0.7115755  0.57306008 0.64267163]]
hence, for me, the above code returns 3.422183052180838. Since random numbers are used
here, you may not get the same output.

ndarray.min()
The function will return the minimum element value from the ndarray
import numpy as np
a = np.random.random( (2,3) )
print(a.min())
The matrix generated for me is
[[0.46541517 0.66668157 0.36277909]
 [0.7115755  0.57306008 0.64267163]]
hence, for me, the above code returns 0.36277909. Since random numbers are used here, you
may not get the same output.

ndarray.max()
The function will return the maximum element value from the ndarray
import numpy as np
a = np.random.random( (2,3) )
print(a.max())
The matrix generated for me is
[[0.46541517 0.66668157 0.36277909]
 [0.7115755  0.57306008 0.64267163]]
hence, for me, the above code returns 0.7115755. Since random numbers are used here, you
may not get the same output.

NumPy Functions
type()
Syntax
type(object)

It is a built-in Python function that returns the type of the parameter passed. In the case of a
NumPy array, it will return numpy.ndarray.
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(type(a))
The above code will print <class 'numpy.ndarray'>

numpy.zeros()
Syntax
numpy.zeros((rows,columns), dtype)
The above function will create a numpy array of the given dimensions with each element being
zero. If no dtype is defined, the default dtype (float64) is used.
import numpy as np
a = np.zeros((3,3))
print(a)
The above code will result in a 3x3 numpy array with each element being zero.

numpy.ones()
Syntax
numpy.ones((rows,columns), dtype)
The above function will create a numpy array of the given dimensions with each element being
one. If no dtype is defined, the default dtype is used.
import numpy as np
a = np.ones((3,3))
print(a)
The above code will result in a 3x3 numpy array with each element being one.

numpy.empty()
Syntax
numpy.empty((rows,columns))
The above function creates an array whose initial content is random and depends on the state of
the memory.
import numpy as np
a = np.empty((3,3))
print(a)
The above code will result in a 3x3 numpy array with arbitrary (uninitialized) values.

numpy.arange()
Syntax
numpy.arange(start, stop, step)
The above function is used to make a numpy array with elements ranging from the start value
(inclusive) to the stop value (exclusive), separated by the step value.

import numpy as np
a=np.arange(5,25,4)
print(a)
The output of the above code will be [ 5 9 13 17 21 ]

numpy.linspace()
Syntax
numpy.linspace(start, stop, num_of_elements)
The above function is used to make a numpy array with elements in the range between
the start and stop value and num_of_elements as the size of the numpy array. The default dtype
of numpy array is float64
import numpy as np
a=np.linspace(5,25,5)
print(a)
The output of the above code will be [ 5. 10. 15. 20. 25.]

numpy.logspace()
Syntax
numpy.logspace(start, stop, num_of_elements)
The above function is used to make a numpy array of num_of_elements elements whose
exponents are evenly spaced between the start and stop values. The default dtype of the
numpy array is float64. The elements are spaced on a logarithmic scale, i.e. each element is
10 raised to the corresponding linspace value.
import numpy as np
a=np.logspace(5,25,5)
print(a)
The output of the above code will be [1.e+05 1.e+10 1.e+15 1.e+20 1.e+25]

numpy.sin()
Syntax
numpy.sin(numpy.ndarray)
The above function returns the sine of every element of the given array.
import numpy as np
a=np.logspace(5,25,2)
print(np.sin(a))
The output of the above code will be [ 0.0357488 -0.3052578]
Similarly, there are cos(), tan(), etc.

numpy.reshape()
Syntax
numpy.reshape(dimensions)
The above function is used to change the dimensions of a numpy array. The number of
arguments in the reshape decides the dimensions of the numpy array.

import numpy as np
a=np.arange(9).reshape(3,3)
print(a)
The output of the above code will be a 2D array with 3x3 dimensions

numpy.random.random()
Syntax
numpy.random.random( (rows, column) )
The above function is used to return a numpy ndarray with the given dimensions and each
element of ndarray being randomly generated.
import numpy as np
a = np.random.random((2,2))
The above code will return a 2x2 ndarray
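Since np.random.random() produces different values on every run, you can seed the generator to
make the output reproducible. A minimal sketch (the exact numbers you see may still differ
across NumPy versions):
import numpy as np

np.random.seed(42)            # fix the seed so results repeat across runs
a = np.random.random((2, 2))  # same 2x2 matrix on every run with this seed
print(a)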

numpy.exp()
Syntax
numpy.exp(numpy.ndarray)
The above function returns a ndarray with exponential of every element
import numpy as np
b = np.exp([10])
The above code returns array([22026.46579481]).

numpy.sqrt()
Syntax
numpy.sqrt(numpy.ndarray)
The above function returns a ndarray with the square root of every element
import numpy as np
b = np.sqrt([16])
The above code returns array([4.]).

NumPy Basic Operations


a = np.array( [ 5, 10, 15, 20, 25] )
b = np.array( [ 0, 1, 2, 3, 4 ] )
1. The below code will return the difference between the two arrays
c = a - b
2. The below code will return the arrays containing the square of each element
b**2
3. The below code will return the value according to the given expression
10* np.sin(a)
4. The below code will return "true" at every element position which satisfies the given condition
a<15
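Putting the four operations above together, a minimal runnable sketch (outputs shown as
comments, with the trigonometric values rounded):
import numpy as np

a = np.array([5, 10, 15, 20, 25])
b = np.array([0, 1, 2, 3, 4])

print(a - b)            # [ 5  9 13 17 21]
print(b**2)             # [ 0  1  4  9 16]
print(10 * np.sin(a))   # [-9.589 -5.440  6.503  9.129 -1.324] (approx.)
print(a < 15)           # [ True  True False False False]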

NumPy Array Basic Operations


a = np.array( [[1,1], [0,1]])
b = np.array( [[2,0],[3,4]])

https://www.c-sharpcorner.com/ebooks/ 17
1. The below code will return the elementwise product of both the arrays
a * b
2. The below code will return the matrix product of both the arrays
a @ b
or
a.dot(b)
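For the two arrays defined above, both products can be verified with a quick sketch:
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])

print(a * b)   # elementwise product: [[2 0] [0 4]]
print(a @ b)   # matrix product:      [[5 4] [3 4]]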

Conclusion
In this chapter, we studied numpy, installing numpy, numpy array, numpy array attributes,
numpy functions, numpy basic operations and numpy array basic operations. I hope you were
able to follow everything; if you have any doubts, please leave a comment with your query.

4
Python Pandas

Overview

In this chapter, we explore Python Pandas, covering its key features
and installation process. Dive into the practical uses of Pandas for
data manipulation, including data structures, reading and writing
data, handling missing values, and iterating over rows. Equip
yourself with the skills to efficiently manage and analyze data using
Pandas.

What is Pandas in Python?
According to Wikipedia, Pandas is a software library written for the Python programming
language for data manipulation and analysis. In particular, it offers data structures and
operations for manipulating numerical tables and time series. It is free software released under
the three-clause BSD license. The name is derived from the term "panel data", an econometrics
term for data sets that include observations over multiple periods for the same individuals.
The original author is Wes McKinney. Pandas was first released on 11 January 2008. The
official website is www.pandas.pydata.org

Uses of Pandas in Python


• DataFrame object for data manipulation with integrated indexing.
• Tools for reading and writing data between in-memory data structures and different file
formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, fancy indexing, and subsetting of large data sets.
• Data structure column insertion and deletion.
• Group by engine allowing split-apply-combine operations on data sets.
• Data set merging and joining.
• Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional
data structure.
• Time-series functionality: Date range generation and frequency conversion, moving
window statistics, moving window linear regressions, date shifting and lagging.
• Provides data filtration.

Installing Pandas in Python


Ubuntu/Linux
sudo apt update -y
sudo apt upgrade -y
sudo apt install python3-tk python3-pip -y
sudo pip3 install pandas

Anaconda Prompt
conda install -c anaconda pandas

Anaconda Navigator
https://docs.anaconda.com/anaconda/navigator/tutorials/pandas/

Input and Output using Python Pandas


In this section, I will explain how to load data into our Python environment from various
file formats.

Reading and Writing CSV


This section concentrates on reading from and writing to CSV files. According to Wikipedia, a
comma-separated values file is a delimited text file that uses a comma to separate values. Each
line of the file is a data record. Each record consists of one or more fields, separated by
commas.

pandas.read_csv()
This function is used to read CSV or comma-separated values files
Syntax
pandas.read_csv(filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]], sep=',',
delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False,
prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None,
true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0,
nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False,
skip_blank_lines=True, parse_dates=False, infer_datetime_format=False,
keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False,
chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None,
quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None,
encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True,
delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
import pandas as pd
df = pd.read_csv('titanic.csv', header=None, nrows=10)
print(df)
In the above code, we are reading the 'titanic.csv' file and converting it to a DataFrame object.

DataFrame.to_csv()
This function is used to write to CSV or comma-separated values files
Syntax

DataFrame.to_csv(self, path_or_buf=None, sep=', ', na_rep='', float_format=None,


columns=None, header=True, index=True, index_label=None, mode='w', encoding=None,
compression='infer', quoting=None, quotechar='"', line_terminator=None, chunksize=None,
date_format=None, doublequote=True, escapechar=None, decimal='.')
import pandas as pd
data = {'Name':['C','Sharp','Corner'], 'Age':[20,21,22],
        'Address':['Delhi','Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
df.to_csv('new.csv')
The above code will create a new file named new.csv containing the data.
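As a quick round-trip check, the file written above can be read straight back. Passing
index_col=0 (a standard read_csv parameter) tells pandas to treat the first column, which
holds the saved index, as row labels rather than data:
import pandas as pd

df = pd.read_csv('new.csv', index_col=0)  # re-read the file written above
print(df)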

Reading and Writing to Excel


This section concentrates on reading from and writing to excel files. According to Wikipedia,
Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android and
iOS. It features calculation, graphing tools, pivot tables, and a macro programming language
called Visual Basic for Applications.

pandas.read_excel()
This function is used to read excel files
Syntax
pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None,
usecols=None, squeeze=False, dtype=None, engine=None, converters=None,

true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None,
keep_default_na=True, verbose=False, parse_dates=False, date_parser=None,
thousands=None, comment=None, skip_footer=0, skipfooter=0, convert_float=True,
mangle_dupe_cols=True, **kwds)
import pandas as pd
df = pd.read_excel('titanic.xlsx', header=None, nrows=10)
print(df)
The above code will read 10 rows from titanic.xlsx and will write them to the DataFrame df.

DataFrame.to_excel()
This function is used to write to excel files
Syntax
DataFrame.to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None,
columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0,
engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True,
freeze_panes=None)
import pandas as pd
data = {'Name':['C','Sharp','Corner'], 'Age':[20,21,22],
        'Address':['Delhi','Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
df.to_excel('new.xlsx')
The above code will write pandas.DataFrame df to new.xlsx

Reading and Writing from JSON


This section concentrates on reading from and writing to JSON files. According to Wikipedia,
JavaScript Object Notation is an open-standard file format or data interchange format that uses
human-readable text to transmit data objects consisting of attribute–value pairs and array data
types.

pandas.read_json()
This function is used to read JSON or JavaScript Object Notation file
Syntax
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=None,
convert_axes=None, convert_dates=True, keep_default_dates=True, numpy=False,
precise_float=False, date_unit=None, encoding=None, lines=False, chunksize=None,
compression='infer')
import pandas as pd
df = pd.read_json('titanic.json')
print(df)
The above code will read the titanic.json file and convert the data to pandas.DataFrame object

DataFrame.to_json()
This function is used to write to JSON or JavaScript Object Notation file
Syntax

DataFrame.to_json(self, path_or_buf=None, orient=None, date_format=None,
double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False,
compression='infer', index=True)
import pandas as pd
data = {'Name':['C','Sharp','Corner'], 'Age':[20,21,22],
        'Address':['Delhi','Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
df.to_json('new.json')
The above code will convert the pandas.DataFrame to JSON and write it to new.json.

Pandas Data Structure


In computer science, a data structure is a data organization, management, and storage format
that enables efficient access and modification. More precisely, a data structure is a collection of
data values, the relationships among them, and the functions or operations that can be applied
to the data.

Pandas Series
Syntax
pandas.Series(data=None,index=None, dtype=None, name=None, copy=False,
fastpath=False)
It is a 1D ndarray with labels. The labels do not have to be unique; there is no compulsion on
them being unique. It supports both integer- and label-based indexing. The pandas Series has a
bunch of methods for performing various operations on it.
import pandas as pd
s = pd.Series([1, 2, 3, 4], index = ['A', 'B', 'C', 'D'])
The above code will give the following result:
A    1
B    2
C    3
D    4
dtype: int64

In the following, we are converting a numpy ndarray to pandas series


import pandas as pd
import numpy as np
data = np.array(['c','s','h','a','r','p'])
s = pd.Series(data)
print (s)
The above will result in the following:
0    c
1    s
2    h
3    a
4    r
5    p
dtype: object

Note: the default index of pandas series is 0, 1, 2 ....

Pandas Series Slicing


Slicing means to extract only a part of the given data structure.
import pandas as pd
import numpy as np
data = np.array(['c','s','h','a','r','p','c','o','r','n','e','r'])
s = pd.Series(data)
print(s[:4])
The code will output the following:
0    c
1    s
2    h
3    a
dtype: object

import pandas as pd
import numpy as np
data = np.array(['c','s','h','a','r','p','c','o','r','n','e','r'])
s = pd.Series(data)
print(s[5:])
The output of the above code will be:
5     p
6     c
7     o
8     r
9     n
10    e
11    r
dtype: object

import pandas as pd
import numpy as np
data = np.array(['c','s','h','a','r','p','c','o','r','n','e','r'])
s = pd.Series(data)
print(s[1:6])
The output of the above code will be:
1    s
2    h
3    a
4    r
5    p
dtype: object

import pandas as pd
import numpy as np
data = np.array(['c','s','h','a','r','p','c','o','r','n','e','r'])

s = pd.Series(data)
print(s[6])

The output of the above code will be c
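Slicing also accepts negative positions, which count from the end of the Series. For the same
data as above:
import pandas as pd
import numpy as np

data = np.array(['c','s','h','a','r','p','c','o','r','n','e','r'])
s = pd.Series(data)
print(s[-3:])   # last three elements: n, e, r (positions 9, 10, 11)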

Python Pandas Series Functions


The following is a list of functions provided by the pandas Series class:
add(): It is used to add series or list-like objects with the same length to the caller series.
sub(): It is used to subtract series or list-like objects with the same length from the caller series.
mul(): It is used to multiply series or list-like objects with the same length with the caller series.
div(): It is used to divide series or list-like objects with the same length by the caller series.
sum(): It returns the sum of the values for the requested axis.
prod(): It returns the product of the values for the requested axis.
mean(): It returns the mean of the values for the requested axis.
pow(): It is used to put each element of the passed series as the exponential power of the caller series and returns the results.
abs(): It is used to get the absolute numeric value of each element in the Series/DataFrame.
cov(): It is used to return the covariance of two series.
combine_first(): It is used to combine two series into one.
count(): It returns the number of non-NA/null observations in the series.
size(): It returns the number of elements in the underlying data.
name(): It is used to give a name to a series object, i.e. to the column.
is_unique(): It returns a boolean indicating whether the values in the object are unique.
idxmax(): It is used to extract the index positions of the highest values in the Series.
idxmin(): It is used to extract the index positions of the lowest values in the Series.
sort_values(): It is used to sort the values of a series in ascending or descending order.
sort_index(): It is used to sort the indexes of a series in ascending or descending order.
head(): It is used to return a specified number of rows from the beginning of a Series.
tail(): It is used to return a specified number of rows from the end of a Series.
le(): It compares every element of the caller series with the passed series and returns true for every element which is less than or equal to the element in the passed series.
ne(): It compares every element of the caller series with the passed series and returns true for every element which is not equal to the element in the passed series.
ge(): It compares every element of the caller series with the passed series and returns true for every element which is greater than or equal to the element in the passed series.
eq(): It compares every element of the caller series with the passed series and returns true for every element which is equal to the element in the passed series.
gt(): It compares every element of the caller series with the passed series and returns true for every element which is greater than the element in the passed series.
lt(): It compares every element of the caller series with the passed series and returns true for every element which is less than the element in the passed series.

clip(): It is used to clip values below and above the passed least and max values.
clip_lower(): It is used to clip values below a passed least value.
clip_upper(): It is used to clip values above a passed maximum value.
astype(): It is used to change the type of a series.
tolist(): It is used to convert a series to a list.
get(): It is used to extract values from a series.
unique(): It is used to see the unique values in a particular column.
nunique(): It is used to count the unique values.
value_counts(): It is used to count the number of times each unique value occurs in a series.
factorize(): It is used to get the numeric representation of an array (which is then converted to a series) by identifying distinct values.
map(): It is used to tie together the values from one object to another.
between(): It is used to check which values lie between the 1st and 2nd arguments.
apply(): It is used for executing custom operations that are not included in pandas or numpy.
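To make a few of these concrete, here is a minimal sketch exercising some of the Series
functions from the table above:
import pandas as pd

s = pd.Series([3, 1, 2, 3])
print(s.sum())           # 9
print(s.mean())          # 2.25
print(s.unique())        # [3 1 2]
print(s.value_counts())  # 3 appears twice, 1 and 2 once each
print(s.sort_values())   # values sorted in ascending order
print(s.head(2))         # first two rows of the Series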

Pandas DataFrame
It is a 2D size-mutable, potentially heterogeneous tabular labelled data structure with columns of
potentially different types.
Pandas DataFrame consists of 3 principal components, the data, rows and columns.
Pandas DataFrame output automatically inserts the index; the default index is 0, 1, 2, ...
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data,columns=['Country', 'Capital', 'Population'])
The above code will output the following:
   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasilia   207847528

import pandas as pd
data = {'Name':['C','Sharp','Corner'], 'Age':[20,21,22]}
df = pd.DataFrame(data)
The above code will output the following:
     Name  Age
0       C   20
1   Sharp   21
2  Corner   22

Pandas DataFrame Column Selection


The following code demonstrates how we can select only the desired columns from a DataFrame.
import pandas as pd
data = {'Name':['C','Sharp','Corner'], 'Age':[20,21,22],
        'Address':['Delhi','Kanpur','Tamil Nadu']}

df = pd.DataFrame(data)
print(df[['Name','Address']])
The above code will output the following:
     Name     Address
0       C       Delhi
1   Sharp      Kanpur
2  Corner  Tamil Nadu

Pandas DataFrame Rows Selection


The following code demonstrates how we can select only the desired rows from a DataFrame.
import pandas as pd
data = {'Name':['C','Sharp','Corner'], 'Age':[20,21,22],
        'Address':['Delhi','Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
data1= df.loc[0]
print(data1)
The output of the above code will be:
Name           C
Age           20
Address    Delhi
Name: 0, dtype: object

Pandas DataFrame Checking Missing Data Value(s)


Missing values are a big problem when it comes to preparing data for generating a model. The
magnitude of the problem can be judged from the fact that even one missing data point can
hamper the efficiency of the model and can result in a wrongly trained model.
To handle missing values, we use two functions i.e. isnull() and notnull()
1. isnull(): this function checks if the DataFrame element is empty. It returns true if the data is
missing, else it will return false
2. notnull(): this function checks if the DataFrame element is not empty. It returns false if the
data is missing, else it will return true
import pandas as pd
data = {'Name':['C','Sharp','Corner'], 'Age':[20,21,22],
        'Address':['Delhi','Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
df.isnull()
The above code will output the following:
    Name    Age  Address
0  False  False    False
1  False  False    False
2  False  False    False

Pandas DataFrame Filling Missing Value
Earlier we checked if the values are empty or not, and if any value is missing then we can fill the
values using fillna(), replace() and interpolate()

DataFrame.fillna()
Syntax
fillna(self, value=None, method=None, axis=None, inplace=False, limit=None, downcast=None,
**kwargs)
This function will replace the NaN value with the passed values
import pandas as pd
import numpy as np
data = {'Name':[np.nan,'Sharp','Corner'], 'Age':[20,np.nan,22],
        'Address':[np.nan,'Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
df.fillna(0)
The output of the above code will be:
     Name   Age     Address
0       0  20.0           0
1   Sharp   0.0      Kanpur
2  Corner  22.0  Tamil Nadu

In the above code, we are replacing all "NaN" values with "0"

DataFrame.replace()
Syntax
replace(self, to_replace=None, value=None, inplace=False, limit=None, regex=False,
method='pad')
Values of DataFrame are replaced with other values dynamically.
import pandas as pd
import numpy as np
data = {'Name':[np.nan,'Sharp','Corner'], 'Age':[20,np.nan,22],
        'Address':[np.nan,'Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
df.replace(np.nan, method='pad')
The output of the above code will be:
     Name   Age     Address
0     NaN  20.0         NaN
1   Sharp  20.0      Kanpur
2  Corner  22.0  Tamil Nadu
In the above output, each NaN value was replaced with the previous value in its column; the
NaN values in row 0 remain because there is no previous value to copy.

DataFrame.interpolate()
Syntax
interpolate(self, method='linear', axis=0, limit=None, inplace=False, limit_direction='forward',
limit_area=None, downcast=None, **kwargs)
This function is used to fill NA values based upon different interpolation techniques
import pandas as pd
import numpy as np
data = {'Name':[np.nan,'Sharp','Corner'], 'Age':[20,np.nan,22],
        'Address':[np.nan,'Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
df.interpolate()
The output of the above code will be:
     Name   Age     Address
0     NaN  20.0         NaN
1   Sharp  21.0      Kanpur
2  Corner  22.0  Tamil Nadu

In the above output, linear interpolation filled the missing Age value with 21.0, the midpoint of
20 and 22. The NaN values in the Name and Address columns remain because interpolation only
applies to numeric columns, and row zero has no previous value to interpolate from.

Pandas DataFrame Dropping Missing Values


It is often said that having incomplete knowledge is more dangerous than having no knowledge.
So, to safeguard against such a situation, we delete the incomplete rows and keep only those
data rows that are complete in themselves. For this, we use dropna().
import pandas as pd
import numpy as np
data = {'Name':[np.nan,'Sharp','Corner'], 'Age':[20,np.nan,22],
        'Address':[np.nan,'Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
df.dropna()
The output of the above code will be:
     Name   Age     Address
2  Corner  22.0  Tamil Nadu

In the above output, you can see that only row 2 is in the output, this is because rows 0 & 1 had
NaN values.
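dropna() also accepts parameters that control exactly what gets dropped. A small sketch of the
common ones (axis, how and subset are standard pandas arguments):
import pandas as pd
import numpy as np

data = {'Name':[np.nan,'Sharp','Corner'], 'Age':[20,np.nan,22],
        'Address':[np.nan,'Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)

print(df.dropna())                # drop rows containing any NaN
print(df.dropna(axis=1))          # drop columns containing any NaN
print(df.dropna(how='all'))       # drop only rows where every value is NaN
print(df.dropna(subset=['Age']))  # drop rows where the 'Age' column is NaN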

Iterating Over Pandas DataFrame Rows


DataFrame.iterrows()
It is used to get each element of each row

import pandas as pd
import numpy as np
data = {'Name':[np.nan,'Sharp','Corner'], 'Age':[20,np.nan,22],
        'Address':[np.nan,'Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
for i, j in df.iterrows():
    print(i, j)
    print()
The output of the above code will be each row index followed by that row's contents, printed as
a Series.

Iterating Over Pandas DataFrame Columns


list(df)
It returns the list of column names, which we can then use to iterate over the columns one by one
import pandas as pd
import numpy as np
data = {'Name':[np.nan,'Sharp','Corner'], 'Age':[20,np.nan,22],
        'Address':[np.nan,'Kanpur','Tamil Nadu']}
df = pd.DataFrame(data)
col = list(df)
for i in col:
    print(df[i])
The output of the above code will be each column printed as a Series.

Python Pandas DataFrame Functions


The following is a list of functions provided by the pandas DataFrame class:

index(): It returns the index (row labels) of the DataFrame.
insert(): It inserts a column into a DataFrame.
add(): It returns the addition of DataFrame and other, element-wise; it is equivalent to binary add.
sub(): It returns the subtraction of DataFrame and other, element-wise; it is equivalent to binary sub.
mul(): It returns the multiplication of DataFrame and other, element-wise; it is equivalent to binary mul.
div(): It returns the floating division of DataFrame and other, element-wise; it is equivalent to binary truediv.
unique(): It extracts the unique values in the DataFrame.
nunique(): It returns the count of the unique values in the DataFrame.
value_counts(): It counts the number of times each unique value occurs within the Series.
columns(): It returns the column labels of the DataFrame.
axes(): It returns a list representing the axes of the DataFrame.
isnull(): It creates a Boolean Series for extracting rows with null values.
notnull(): It creates a Boolean Series for extracting rows with non-null values.
between(): It extracts rows where a column value falls within a predefined range.
isin(): It extracts rows from a DataFrame where a column value exists in a predefined collection.
dtypes(): It returns a Series with the data type of each column. The result's index is the original DataFrame's columns.
astype(): It converts the data types in a Series.
values(): It returns a NumPy representation of the DataFrame, i.e. the axes labels are removed.
sort_values(): It sorts a DataFrame in ascending or descending order of the passed column.
sort_index(): It sorts the values in a DataFrame based on their index positions or labels instead of their values; this is useful when a DataFrame is made out of two or more DataFrames and the index needs to be rebuilt.
loc(): It retrieves rows based on an index label.
iloc(): It retrieves rows based on an index position.
ix(): It retrieves DataFrame rows based on either index label or index position, combining loc() and iloc() (deprecated in recent pandas versions).
rename(): It is used to change the names of the index labels or column names.
columns(): It is used to change the column names.
drop(): It is used to delete rows or columns from a DataFrame.
pop(): It is used to remove a column from a DataFrame and return it.
sample(): It pulls out a random sample of rows or columns from a DataFrame.
nsmallest(): It pulls out the rows with the smallest values in a column.
nlargest(): It pulls out the rows with the largest values in a column.
shape(): It returns a tuple representing the dimensionality of the DataFrame.
ndim(): It returns an int representing the number of axes/array dimensions.
rank(): It ranks the values in a Series in order.
query(): It is an alternative string-based syntax for extracting a subset from a DataFrame.
copy(): It creates an independent copy of a pandas object.
duplicated(): It creates a Boolean Series and uses it to extract rows that have a duplicate value.
drop_duplicates(): It is an alternative to duplicated() with the capability of removing the duplicates through filtering.
set_index(): It sets the DataFrame index (row labels) using one or more existing columns.
reset_index(): It resets the index of a DataFrame.
where(): It is used to check a DataFrame for one or more conditions and return the result accordingly.
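As a quick illustration, a minimal sketch using a few of the DataFrame functions listed above:
import pandas as pd

data = {'Name':['C','Sharp','Corner'], 'Age':[20,21,22]}
df = pd.DataFrame(data)

print(df.shape)                                # (3, 2)
print(df.loc[0])                               # row with index label 0
print(df.iloc[-1])                             # last row, by position
print(df.sort_values('Age', ascending=False))  # sort by Age, descending
print(df.drop('Age', axis=1))                  # drop the Age column
print(df.sample(2))                            # two random rows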

Conclusion
In this chapter, we studied python pandas, uses of pandas in python, installing pandas, input
and output using python pandas, pandas series and pandas dataframe. I hope you were able to
follow everything; if you have any doubts, please leave a comment with your query.

5
Python Scikit-Learn

Overview

In this chapter, we introduce Scikit-Learn, covering its features and
installation steps. We explain classification with a practical example
using the MNIST Digits Database. Key functions like loading datasets,
splitting data, training models, and performance analysis techniques
are discussed. This overview equips you to effectively use Scikit-
Learn for machine learning projects.

What is Scikit-Learn?
According to Wikipedia, SciKit-learn (formerly scikits.learn) is a free software machine learning
library for the Python programming language.
It features various classification, regression and clustering algorithms including support vector
machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to
interoperate with the Python numerical and scientific libraries NumPy and SciPy.
The scikit-learn project started as scikits.learn, a Google Summer of Code project by David
Cournapeau. Its name stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately-
developed and distributed third-party extension to SciPy.
It was first released in June 2012. The official website is www.scikit-learn.org.

Are Scikit-Learn and Sklearn the Same?


Yes, both are the same: "scikit-learn" is the official name of the package and is mostly used only
for installing the package, whereas "sklearn" is the abbreviated name, used when importing the
package in Python code.

Features of Scikit-Learn
• Clustering: SciKit-learn provides functions for grouping unlabeled data such as KMeans.
• Cross-Validation: SciKit-learn provides functions for estimating the performance of
supervised models on unseen data.
• Datasets: SciKit-learn provides functions for test datasets and for generating datasets
with specific properties for investigating model behavior.
• Dimensionality Reduction: SciKit-learn provides functions for reducing the number of
attributes in data for summarization, visualization and feature selection such as Principal
component analysis.
• Ensemble methods: SciKit-learn provides functions for combining the predictions of
multiple supervised models.
• Feature extraction: SciKit-learn provides functions for defining attributes in image and
text data.
• Feature selection: SciKit-learn provides functions for identifying meaningful attributes
from which to create supervised models.
• Parameter Tuning: SciKit-learn provides functions for getting the most out of supervised
models.
• Manifold Learning: SciKit-learn provides functions for summarizing and depicting
complex multi-dimensional data.
• Supervised Models: SciKit-learn provides functions for a vast array not limited to
generalized linear models, discriminant analysis, naive Bayes, lazy methods, neural
networks, support vector machines and decision trees.

Installing Scikit-Learn in Python


Ubuntu/Linux
sudo apt update -y
sudo apt upgrade -y
sudo apt install python3-tk python3-pip -y
sudo pip3 install scikit-learn

Anaconda Prompt
conda install scikit-learn

Classification
Classification is the process of predicting the class of given data points. Classes are sometimes
called targets, labels or categories.
For example, spam detection in email service providers can be identified as a classification
problem. This is a binary classification problem, since there are only 2 classes: spam and not spam.
A classifier utilizes some training data to understand how given input variables relate to the
class. In this case, known spam and non-spam emails have to be used as the training data.
When the classifier is trained accurately, it can be used to detect an unknown email.
Classification belongs to the category of supervised learning, where the targets are also provided
with the input data. Classification has applications in many domains, such as credit approval,
medical diagnosis, and target marketing.
from sklearn import datasets
from sklearn.linear_model import LinearRegression
digits = datasets.load_digits() #loading the MNIST Digits Database

clf=LinearRegression() #creating a LinearRegression estimator (a regressor; a true classifier such as LogisticRegression is more typical here)

Python Scikit-Learn Functions


In this section, we will be studying some commonly used SciKit-Learn functions.

Loading Dataset
Scikit-learn comes with a few standard datasets, for instance, the iris and digits datasets for
classification and the Boston house prices dataset for regression.
from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
The above code will load iris dataset into "iris" and MNIST Digits dataset into "digits"

1. ndarray.data
We use the data attribute to show the data that was loaded.
print(digits.data)
The above will print the digits data matrix (1797 samples, each with 64 pixel values).

Similarly, we can print iris.data

2. ndarray.target
We use the target attribute to show the labels of the dataset that we loaded.
print(digits.target)
The above will result in [0 1 2 ... 8 9 8]. Similarly, we can print iris.target.

Breaking data into Training and Test Set


A dataset can be divided into 3 parts:
• Test Set: It is the part of the dataset, which is used for Black box testing, i.e. here the
motive is to test the model on data that the model has never seen before. Here we use
metrics like the Confusion Matrix, accuracy score and F1 score.
• Training Set: It is the part of the dataset which is used to train the model and make the
hypotheses more accurate
• Validation set: In machine learning, when we generate a model, we need to validate the
model before sending it for the final testing. So, to validate the working of the model we
use the validation set
For Example:
Let us take the Fashion-MNIST clothing dataset.
Here we would like to generate a model that can predict the type and genre of the cloth.
So, to do so, we would divide the whole dataset into a ratio of 7:3 where 70% of the data is the
training + Validation Data and 30% is the Test data
In Python, to segregate the training and test data, we use
sklearn.model_selection.train_test_split()
from sklearn import datasets
from sklearn.model_selection import train_test_split
digits = datasets.load_digits() #loading the MNIST Digits Database

X, y = digits.data, digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
The above code demonstrates how to divide the dataset in the 7:3 ratio
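Although train_test_split() produces only two parts at a time, the validation set described above
can be carved out of the training portion with a second split. A minimal sketch (the inner 80/20
ratio is just an illustrative choice):
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X, y = digits.data, digits.target

# first split: 70% training + validation data, 30% test data
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# second split: carve 20% of the remainder out as a validation set
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.2, random_state=0)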

Learning and Predicting


Learning
According to Wikipedia, Learning is the process of acquiring new or modifying existing,
knowledge, behaviors, skills, values, or preferences. The ability to learn is possessed by
humans, animals, and some machines; there is also evidence for some kind of learning in
certain plants.
In the case of Machine Learning, learning is a process that improves the knowledge of an AI
program by making observations about its environment.

Prediction
According to Wikipedia, A prediction, or forecast, is a statement about a future event. A
prediction is often, but not always, based upon experience or knowledge. There is no universal

agreement about the exact difference between the two terms; different authors and disciplines
ascribe different connotations.
“Prediction” refers to the output of an algorithm after it has been trained on a historical dataset
and applied to new data when you’re trying to forecast the likelihood of a particular outcome,
such as whether a customer will buy yogurt in 30 days. The algorithm will generate probable
values for an unknown variable for each record in the new data, allowing the model builder to
identify what that value will most likely be.

classifier.fit(features, targets)
The fit method is provided on every estimator. It usually takes some samples X, targets y if the
model is supervised, and potentially other sample properties such as sample_weight.
It should:
• clear any prior attributes stored on the estimator, unless warm_start is used.
• validate and interpret any parameters, ideally raising an error if invalid
• validate the input data.
• estimate and store model attributes from the estimated parameters and provided data,
and
• return the now fitted estimator to facilitate method chaining.
from sklearn import datasets
from sklearn.linear_model import LinearRegression
digits = datasets.load_digits() #loading the MNIST Digits Database

clf=LinearRegression() #creating LinearRegression Classifier

X, y = digits.data[:-10], digits.target[:-10]  # selecting all but the last 10 samples

clf.fit(X,y) #we fitted the given data and targets to form a model

classifier.predict()
It predicts each sample, usually only taking X as input. In a classifier or regressor, this prediction
is in the same target space used in fitting (e.g. one of {‘red’, ‘amber’, ‘green’} if the y in fitting
consisted of these strings). Despite this, even when y is passed to fit, the output
of predict should always be an array or sparse matrix. In a clusterer or outlier detector, the
prediction is an integer.
If the estimator was not already fitted, calling this method should raise an
exceptions.NotFittedError.
from sklearn import datasets
from sklearn.linear_model import LinearRegression
digits = datasets.load_digits() #loading the MNIST Digits Database

clf=LinearRegression() #creating LinearRegression Classifier

X, y = digits.data[:-10], digits.target[:-10]  # selecting all but the last 10 samples

clf.fit(X,y) #we fitted the given data and targets to form a model

clf.predict(X[0].reshape(1, -1))  # predicting the output based on the trained model

Performance Analysis
Performance Analysis is the process of studying or evaluating the performance of a particular
scenario in comparison of the objective which was to be achieved.
SciKit-Learn provides various functions to study the performance of a model or algorithm.

sklearn.metrics.confusion_matrix()
According to Wikipedia, a confusion matrix is a specific table layout that allows visualization of
the performance of an algorithm, typically a supervised learning one (in unsupervised learning it
is usually called a matching matrix). Each row of the matrix represents the instances in a
predicted class while each column represents the instances in an actual class (or vice versa).
The name stems from the fact that it makes it easy to see if the system is confusing two classes
(i.e. commonly mislabeling one as another).
It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and
identical sets of "classes" in both dimensions (each combination of dimension and class is a
variable in the contingency table).
A confusion matrix shows the number of "correct" predictions made by the algorithm; it is often
used for calculating the accuracy score.
                    Predicted Positive     Predicted Negative
Actual Positive     TP (True Positive)     FN (False Negative)
Actual Negative     FP (False Positive)    TN (True Negative)
# Python script for confusion matrix creation.
from sklearn.metrics import confusion_matrix
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]
results = confusion_matrix(actual, predicted)
print ('Confusion Matrix :')
print(results)
The output of the above code will be:
Confusion Matrix :
[[4 2]
 [1 3]]

metrics.accuracy_score()
A method on an estimator, usually a predictor, which evaluates its predictions on a given
dataset, and returns a single numerical score. A greater return value should indicate better
predictions; accuracy is used for classifiers and R^2 for regressors by default.
If the estimator was not already fitted, calling this method should raise an
exceptions.NotFittedError.
Some estimators implement a custom, estimator-specific score function, often the likelihood of
the data under the model.
Accuracy = (TP + TN) / (TP + TN + FN + FP)
where, TP- True Positive
TN- True Negative
FN- False Negative
FP- False Positive
from sklearn.metrics import accuracy_score

https://www.c-sharpcorner.com/ebooks/ 38
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)
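This returns 0.5, since two of the four predictions match the true labels. To connect the accuracy formula above with the earlier confusion-matrix example, here is a small sketch reusing those same arrays (the TN/FP/FN/TP unpacking order assumes the labels are ordered [0, 1]):

from sklearn.metrics import confusion_matrix

actual    = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()

accuracy = (tp + tn) / (tp + tn + fn + fp)
print(accuracy)  # 0.7 for this example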
A list of scikit-learn functionalities is provided in the table below:
Sno. Function Name Description
1 sklearn.base Base class for all the estimators
2 sklearn.calibration Calibration of predicted probabilities
3 sklearn.cluster It provides various unsupervised learning algorithms
4 sklearn.cluster.k_means Provides all the functionalities of the K-Means clustering algorithm
5 sklearn.cluster.bicluster It provides the spectral biclustering algorithms
6 sklearn.compose Meta-estimators for building composite models with transformers
7 sklearn.covariance This module includes methods and algorithms to robustly estimate the covariance of features given a set of points. The precision matrix, defined as the inverse of the covariance, is also estimated
8 sklearn.cross_decomposition It provides methods and algorithms to support cross decomposition
9 sklearn.datasets This module includes utilities to load datasets, including methods to load and fetch popular reference datasets. It also provides artificial data generators
10 sklearn.decomposition This module includes matrix decomposition algorithms, including among others PCA, NMF or ICA
11 sklearn.discriminant_analysis It provides Linear Discriminant Analysis and Quadratic Discriminant Analysis
12 sklearn.dummy It provides dummy estimators, which are helpful for getting a baseline value of metrics for random predictions
13 sklearn.ensemble This module includes ensemble-based methods for classification, regression and anomaly detection
14 sklearn.exceptions This module contains all custom warnings and error classes used across scikit-learn
15 sklearn.experimental This module provides importable modules that enable the use of experimental features or estimators
16 sklearn.feature_extraction This module deals with feature extraction from raw data. It can currently extract features from text and images
17 sklearn.feature_selection This module implements feature selection algorithms. It currently provides univariate filter selection methods and the recursive feature elimination algorithm
18 sklearn.gaussian_process This module implements Gaussian Process-based regression and classification
19 sklearn.isotonic This module provides us with capabilities to implement isotonic regression
20 sklearn.impute It provides transformers for missing value imputation
21 sklearn.kernel_approximation This module implements several approximate kernel feature maps based on Fourier transforms
22 sklearn.kernel_ridge It provides capabilities to help us implement kernel ridge regression
23 sklearn.linear_model This module implements generalized linear models. It includes Ridge regression, Bayesian Regression, Lasso and Elastic Net estimators computed with Least Angle Regression and coordinate descent. It also implements Stochastic Gradient Descent related algorithms
24 sklearn.linear_model.LinearRegression It provides functionalities to implement Linear Regression
25 sklearn.linear_model.LogisticRegression It provides functionalities to implement Logistic Regression
26 sklearn.manifold This module implements data embedding techniques
27 sklearn.metrics It includes score functions, performance metrics, pairwise metrics and distance computations
28 sklearn.metrics.accuracy_score It gives the accuracy classification score
29 sklearn.metrics.confusion_matrix It gives the confusion matrix
30 sklearn.metrics.f1_score It gives the F1 score, also known as the balanced F-score or F-measure
31 sklearn.metrics.classification_report It builds a text report showing the main classification metrics
32 sklearn.metrics.precision_score It gives the precision of the classification
33 sklearn.metrics.mean_absolute_error It gives the mean absolute error regression loss
34 sklearn.metrics.mean_squared_error It gives the mean squared error regression loss
35 sklearn.mixture This module implements mixture modelling algorithms
36 sklearn.model_selection This module contains model selection functions
37 sklearn.multiclass This module provides functionalities for implementation of multiclass and multilabel classification
38 sklearn.multioutput This module implements multioutput regression and classification. The estimators provided in this module are meta-estimators: they require a base estimator to be provided in their constructor. The meta-estimator extends single output estimators to multi-output estimators
39 sklearn.naive_bayes This module implements Naive Bayes algorithms. These are supervised learning methods based on applying Bayes' theorem with strong (naive) feature independence assumptions
40 sklearn.neighbors This module implements the k-nearest neighbors algorithm
41 sklearn.neural_network This module includes models based on neural networks
42 sklearn.pipeline This module implements utilities to build a composite estimator, as a chain of transforms and estimators
43 sklearn.inspection This module includes tools for model inspection
44 sklearn.preprocessing This module includes scaling, centering, normalization, binarization and imputation methods
45 sklearn.random_projection It provides random projection: a simple and computationally efficient way to reduce the dimensionality of the data by trading a controlled amount of accuracy (as additional variance) for faster processing times and smaller model sizes
46 sklearn.semi_supervised This module implements semi-supervised learning algorithms. These algorithms utilize small amounts of labelled data and large amounts of unlabelled data for classification tasks. This module includes Label Propagation
47 sklearn.svm This module includes Support Vector Machine algorithms
48 sklearn.tree This module includes decision tree-based models for classification and regression
49 sklearn.utils It includes various utilities

Conclusion
In this chapter, we studied python scikit-learn, features of scikit-learn in python, installing scikit-
learn, classification, how to load datasets, breaking dataset into test and training sets, learning
and predicting, performance analysis, and the various functionalities provided by scikit-learn.
We hope you were able to understand everything. For any doubts, please leave a comment with
your query.

6
Python Matplotlib
Overview

In this chapter, we delve into the basics of Matplotlib, a plotting library for Python. We'll cover
its key components, including the object-oriented API, various plotting functions, and installation
methods. Get ready to harness the power of Matplotlib to create a wide array of plots, from
simple 2D line charts to complex 3D surface plots, enhancing your data visualization skills in
Python.

What is MatPlotLib?
According to Official Documentation, Matplotlib is a plotting library for the Python programming
language and its numerical mathematics extension NumPy. It provides an object-oriented API
for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython,
Qt, or GTK+. There is also a procedural "pylab" interface based on a state machine (like
OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy
makes use of Matplotlib.
Matplotlib was originally written by John D. Hunter, has an active development community, and
is distributed under a BSD-style license. It was first released in 2003. The official website
is www.matplotlib.org.

What is Matplotlib PyPlot?


matplotlib.pyplot is a collection of command-style functions that make matplotlib work like
MATLAB. Each pyplot function makes some change to a figure: for example, it creates a figure,
creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with
labels, etc.
In matplotlib.pyplot various states are preserved across function calls, so that it keeps track of
things like the current figure and plotting area, and the plotting functions are directed to the
current axes (please note that "axes" here and in most places in the documentation refers to
the axes part of a figure and not the strict mathematical term for more than one axis).
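As a minimal sketch of this stateful interface (each call acts on the current figure and axes):

import matplotlib.pyplot as plt

plt.figure()                    # becomes the current figure
plt.plot([1, 2, 3], [2, 4, 1])  # drawn on the current axes, created implicitly
plt.title('Stateful pyplot')    # also applied to the current axes
plt.show()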
Following is a list of all the Matplotlib PyPlot functions
Function Description
acorr Plot the autocorrelation of x.
angle_spectrum Plot the angle spectrum.
annotate Annotate the point xy with text text.
arrow Add an arrow to the axes.
autoscale Auto scale the axis view to the data (toggle).
axes Add axes to the current figure and make it the current axes.
axhline Add a horizontal line across the axis.
axhspan Add a horizontal span (rectangle) across the axis.
axis Convenience method to get or set some axis properties.
axvline Add a vertical line across the axes.
axvspan Add a vertical span (rectangle) across the axes.
bar Make a bar plot.
barbs Plot a 2D field of barbs.
barh Make a horizontal bar plot.
box Turn the axes box on or off on the current axes.
boxplot Make a box and whisker plot.
broken_barh Plot a horizontal sequence of rectangles.
cla Clear the current axes.
clabel Label a contour plot.
clf Clear the current figure.
clim Set the color limits of the current image.
close Close a figure window.

cohere Plot the coherence between x and y.
colorbar Add a colorbar to a plot.
contour Plot contours.
contourf Plot contours.
csd Plot the cross-spectral density.
delaxes Remove the Axes ax (defaulting to the current axes) from its figure.
draw Redraw the current figure.
errorbar Plot y versus x as lines and/or markers with attached errorbars.
eventplot Plot identical parallel lines at the given positions.
figimage Add a non-resampled image to the figure.
figlegend Place a legend on the figure.
fignum_exists Return whether the figure with the given id exists.
figtext Add text to figure.
figure Create a new figure.
fill Plot filled polygons.
fill_between Fill the area between two horizontal curves.
fill_betweenx Fill the area between two vertical curves.
findobj Find artist objects.
gca Get the current Axes instance on the current figure matching the given keyword args, or create one.
gcf Get the current figure.
gci Get the current colorable artist.
get_figlabels Return a list of existing figure labels.
get_fignums Return a list of existing figure numbers.
grid Configure the grid lines.
hexbin Make a hexagonal binning plot.
hist Plot a histogram.
hist2d Make a 2D histogram plot.
hlines Plot horizontal lines at each y from xmin to xmax.
imread Read an image from a file into an array.
imsave Save an array as an image file.
imshow Display data as an image, i.e. on a 2D regular raster.
install_repl_displayhook Install a repl display hook so that any stale figure is automatically redrawn when control is returned to the repl.
ioff Turn the interactive mode off.
ion Turn the interactive mode on.
isinteractive Return the status of interactive mode.
legend Place a legend on the axes.
locator_params Control behavior of major tick locators.
loglog Make a plot with log scaling on both the x and y-axis.
magnitude_spectrum Plot the magnitude spectrum.
margins Set or retrieve autoscaling margins.

matshow Display an array as a matrix in a new figure window.
minorticks_off Remove minor ticks from the axes.
minorticks_on Display minor ticks on the axes.
pause Pause for interval seconds.
pcolor Create a pseudocolor plot with a non-regular rectangular grid.
pcolormesh Create a pseudocolor plot with a non-regular rectangular grid.
phase_spectrum Plot the phase spectrum.
pie Plot a pie chart.
plot Plot y versus x as lines and/or markers.
plot_date Plot data that contains dates.
plotfile Plot the data in a file.
polar Make a polar plot.
psd Plot the power spectral density.
quiver Plot a 2D field of arrows.
quiverkey Add a key to a quiver plot.
rc Set the current rc params.
rc_context Return a context manager for managing rc settings.
rcdefaults Restore the rc params from Matplotlib's internal default style.
rgrids Get or set the radial gridlines on the current polar plot.
savefig Save the current figure.
sca Set the current Axes instance to ax.
scatter A scatter plot of y vs x with varying marker size and/or color.
sci Set the current image.
semilogx Make a plot with a log scale on the x-axis.
semilogy Make a plot with log scaling on the y-axis.
set_cmap Set the default colormap.
setp Set a property on an artist object.
show Display a figure.
specgram Plot a spectrogram.
spy Plot the sparsity pattern of a 2D array.
stackplot Draw a stacked area plot.
stem Create a stem plot.
step Make a step plot.
streamplot Draw streamlines of a vector flow.
subplot Add a subplot to the current figure.
subplot2grid Create an axis at a specific location inside a regular grid.
subplot_tool Launch a subplot tool window for a figure.
subplots Create a figure and a set of subplots.
subplots_adjust Tune the subplot layout.
suptitle Add a centered title to the figure.
switch_backend Close all open figures and set the Matplotlib backend.

table Add a table to an Axes.
text Add text to the axes.
thetagrids Get or set the theta gridlines on the current polar plot.
tick_params Change the appearance of ticks, tick labels, and gridlines.
ticklabel_format Change the ScalarFormatter used by default for linear axes.

tight_layout Automatically adjust subplot parameters to give specified padding.
title Set a title for the axes.
tricontour Draw contours on an unstructured triangular grid.
tricontourf Draw contours on an unstructured triangular grid.
tripcolor Create a pseudocolor plot of an unstructured triangular grid.
triplot Draw an unstructured triangular grid as lines and/or markers.
twinx Make and return a second axes that shares the x-axis.
twiny Make and return a second axes that shares the y-axis.
uninstall_repl_displayhook Uninstall the matplotlib display hook.
violinplot Make a violin plot.
vlines Plot vertical lines.
xcorr Plot the cross-correlation between x and y.
xkcd Turn on xkcd sketch-style drawing mode.
xlabel Set the label for the x-axis.
xlim Get or set the x limits of the current axes.
xscale Set the x-axis scale.
xticks Get or set the current tick locations and labels of the x-axis.
ylabel Set the label for the y-axis.
ylim Get or set the y-limits of the current axes.
yscale Set the y-axis scale.
yticks Get or set the current tick locations and labels of the y-axis.

What is Matplotlib Inline?


%matplotlib inline is how we enable the inline backend in a Jupyter document. With this
backend, the output of plotting commands is displayed inline within frontends like the Jupyter
notebook, directly below the code cell that produced it. The resulting plots will then also be
stored in the notebook document. In short, when using the 'inline' backend, your matplotlib
graphs are included in your notebook, next to the code.
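For example, a notebook cell using the inline backend might look like the following sketch (the magic should come before the plotting commands):

# Inside a Jupyter notebook cell
%matplotlib inline
import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [4, 2, 5])  # the figure is rendered directly below this cell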
Note:
A complete list of matplotlib functions can be found here.

Installing Matplotlib
Ubuntu/ Linux
sudo apt update -y
sudo apt upgrade -y
sudo apt install python3-tk python3-pip -y
sudo pip3 install matplotlib

Anaconda Prompt
conda install -c conda-forge matplotlib

Anatomy of a figure
Figure
The figure keeps track of all the child Axes, a smattering of 'special' artists (titles, figure legends,
etc.), and the canvas. A figure can have any number of Axes, but to be useful should have at
least one.

Axes
This is what you think of as 'a plot'; it is the region of the image with the data space. A given
figure can contain many Axes, but a given Axes object can only be in one Figure. The Axes
contains two (or three in the case of 3D) Axis objects (be aware of the difference
between Axes and Axis) which take care of the data limits (the data limits can also be controlled
via the set_xlim() and set_ylim() Axes methods). Each Axes has a title (set via set_title()),
an x-label (set via set_xlabel()), and a y-label (set via set_ylabel()).
The Axes class and its member functions are the primary entry point to working with the OO
interface.

Axis
These are number-line-like objects. They take care of setting the graph limits and generating the
ticks (the marks on the axis) and ticklabels (strings labelling the ticks). The location of the ticks is
determined by a Locator object and the ticklabel strings are formatted by a Formatter. The
combination of the correct Locator and Formatter gives very fine control over the tick locations
and labels.

Artist
Basically, everything you can see on the figure is an artist (even the Figure, Axes,
and Axis objects).
This includes Text objects, Line2D objects, collection objects, Patch objects ... (you get the
idea). When the figure is rendered, all of the artists are drawn to the canvas.
Most Artists are tied to an Axes; such an Artist cannot be shared by multiple Axes or moved
from one to another.
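These pieces map directly onto the object-oriented interface; a minimal sketch:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()       # a Figure containing a single Axes
ax.plot([1, 2, 3], [1, 4, 9])  # adds a Line2D artist to the Axes
ax.set_title('Anatomy demo')   # title artist of the Axes
ax.set_xlabel('x')             # label of the x Axis
ax.set_ylabel('y')             # label of the y Axis
ax.set_xlim(0, 4)              # data limits handled by the x Axis
plt.show()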
Note:
There are several toolkits available that extend Python matplotlib functionality. Some of them
are separate downloads; others ship with the matplotlib source code but have external
dependencies.

• Basemap: It is a map plotting toolkit with various map projections, coastlines and
political boundaries.
• Cartopy: It is a mapping library featuring object-oriented map projection definitions, and
arbitrary point, line, polygon and image transformation capabilities.
• Excel tools: Matplotlib provides utilities for exchanging data with Microsoft Excel.
• Mplot3d: It is used for 3-D plots.
• Natgrid: It is an interface to the natgrid library for irregular gridding of the spaced data.

What is Backend in Matplotlib?


Matplotlib targets many different use cases and output formats. Some people use matplotlib
interactively from the python shell and have plotting windows pop up when they type commands.
Some people run Jupyter notebooks and draw inline plots for quick data analysis. Some people
use matplotlib in batch scripts to generate postscript images from numerical simulations, and
still, others run web application servers to dynamically serve up graphs.
To support all of these use cases, matplotlib can target different outputs, and each of these
capabilities is called a backend; the "frontend" is the user-facing code, i.e., the plotting code,
whereas the "backend" does all the hard work behind-the-scenes to make the figure.
There are two types of backends: user interface backends (for use in pygtk, wxpython, tkinter,
qt4, or macosx; also referred to as "interactive backends") and hardcopy backends to make
image files (PNG, SVG, PDF, PS; also referred to as "non-interactive backends").
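As a small illustration, a hardcopy backend can be selected programmatically; this is a sketch, and in practice the backend can also be set via the matplotlibrc file or the MPLBACKEND environment variable:

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, no window

import matplotlib.pyplot as plt

plt.plot([0, 1], [0, 1])
plt.savefig('line.png')  # writes a PNG instead of opening a window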
Following is the list of Backend renderers
Backend Description
Qt5Agg Agg rendering in a Qt5 canvas (requires PyQt5). This backend can be activated in IPython with %matplotlib qt5.
ipympl Agg rendering embedded in a Jupyter widget (requires ipympl). This backend can be enabled in a Jupyter notebook with %matplotlib ipympl.
GTK3Agg Agg rendering to a GTK 3.x canvas (requires PyGObject, and pycairo or cairocffi). This backend can be activated in IPython with %matplotlib gtk3.
macosx Agg rendering into a Cocoa canvas in OSX. This backend can be activated in IPython with %matplotlib osx.
TkAgg Agg rendering to a Tk canvas (requires TkInter). This backend can be activated in IPython with %matplotlib tk.
nbAgg Embed an interactive figure in a Jupyter classic notebook. This backend can be enabled in Jupyter notebooks via %matplotlib notebook.
WebAgg On show() will start a tornado server with an interactive figure.
GTK3Cairo Cairo rendering to a GTK 3.x canvas (requires PyGObject, and pycairo or cairocffi).
Qt4Agg Agg rendering to a Qt4 canvas (requires PyQt4 or pyside). This backend can be activated in IPython with %matplotlib qt4.
WXAgg Agg rendering to a wxWidgets canvas (requires wxPython 4). This backend can be activated in IPython with %matplotlib wx.

Plotting 2D Graphs
2D or two-dimensional graphs are those which have only two axes, x and y; they are planar
graphs.

Line 2D Plot
A line chart or line plot or line graph or curve chart is a type of chart which displays information
as a series of data points called 'markers' connected by straight line segments. It is a basic type
of chart common in many fields.
import matplotlib.pyplot as plt

# x axis values
x = [1,2,3]
# corresponding y axis values
y = [3,1,2]

# plotting the points


plt.plot(x, y)

# naming the x axis


plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph


plt.title('2D Line Plot!')

# function to show the plot


plt.show()

Output

Custom 2D Plot
A custom 2D plot is a plot whose appearance has been customized, for example with colors, line styles, markers, and axis limits.
import matplotlib.pyplot as plt

# x axis values
x = [1,2,3,4,5,6]
# corresponding y axis values
y = [2,4,1,5,2,6]

# plotting the points


plt.plot(x, y, color='green', linestyle='dashed', linewidth = 3,
marker='o', markerfacecolor='blue', markersize=12)

# setting x and y axis range


plt.ylim(1,8)
plt.xlim(1,8)

plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

plt.title('Custom Plots!')

# function to show the plot


plt.show()
Output

Bar 2D Chart
According to Wikipedia, a bar chart or bar graph is a chart or graph that presents categorical
data with rectangular bars with heights or lengths proportional to the values that they represent.
The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a
column chart.
import matplotlib.pyplot as plt

# x-coordinates of left sides of bars


left = [1, 2, 3, 4, 5]

# heights of bars
height = [10, 20, 30, 40, 50]

# labels for bars


tick_label = ['one', 'two', 'three', 'four', 'five']

# plotting a bar chart


plt.bar(left, height, tick_label = tick_label,
width = 0.8, color = ['blue', 'green', 'red','yellow','black'])

# naming the x-axis


plt.xlabel('x - axis')
# naming the y-axis
plt.ylabel('y - axis')
# plot title
plt.title('bar chart!')

# function to show the plot


plt.show()
Output

Histogram 2D Plot
According to Wikipedia, a histogram is an accurate representation of the distribution of
numerical data. It is an estimate of the probability distribution of a continuous variable and was
first introduced by Karl Pearson. It differs from a bar graph, in the sense that a bar graph relates
two variables, but a histogram relates only one.
import matplotlib.pyplot as plt

# frequencies
ages = [2,5,70,40,30,45,50,45,43,40,44,
60,7,13,57,18,90,77,32,21,20,40]

# setting the ranges and no. of intervals


range = (0, 100)
bins = 5

# plotting a histogram
plt.hist(ages, bins, range, color ='red',
histtype = 'bar', rwidth = 0.8)

# x-axis label
plt.xlabel('age')
# frequency label
plt.ylabel('No. of people')
# plot title
plt.title('My histogram')

# function to show the plot


plt.show()
Output

Scatter 2D Plot
According to Wikipedia, a scatter plot is a type of plot or mathematical diagram using Cartesian
coordinates to display values for typically two variables for a set of data. If the points are coded,
one additional variable can be displayed
import matplotlib.pyplot as plt

# x-axis values
x = [2,4,5,7,6,8,9,11,12,12]
# y-axis values
y = [1,2,3,4,5,6,7,8,9,10]

# plotting points as a scatter plot


plt.scatter(x, y, label= "stars", color= "green",
marker= "x", s=30)

# x-axis label
plt.xlabel('x - axis')
# frequency label
plt.ylabel('y - axis')
# plot title
plt.title('Scatter plot!')
# showing legend
plt.legend()

# function to show the plot


plt.show()
Output

Pie-Chart 2D Plot
According to Wikipedia, a pie chart is a circular statistical graphic, which is divided into slices to
illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the
quantity it represents.
import matplotlib.pyplot as plt

# defining labels
activities = ['eat', 'sleep', 'work', 'play']

# portion covered by each label


slices = [3, 7, 8, 6]

# color for each label


colors = ['r', 'y', 'g', 'b']

# plotting the pie chart


plt.pie(slices, labels = activities, colors=colors,
startangle=90, shadow = True, explode = (0.3, 0, 0.1, 0),
radius = 1.2, autopct = '%1.1f%%')

# plotting legend
plt.legend()

# showing the plot


plt.show()
Output

Plotting 3D Graphs
Toolkits are collections of application-specific functions that extend Matplotlib.

mplot3d
The mplot3d toolkit adds simple 3D plotting capabilities to matplotlib by supplying an axes object
that can create a 2D projection of a 3D scene. The resulting graph will have the same look and
feel as regular 2D plots.

Line 3D Plot
According to Wikipedia, a line chart or line plot or line graph or curve chart is a type of chart
which displays information as a series of data points called 'markers' connected by straight line
segments. It is a basic type of chart common in many fields.
# This import registers the 3D projection but is otherwise unused.
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 unused import

import numpy as np
import matplotlib.pyplot as plt

plt.rcParams['legend.fontsize'] = 10
fig = plt.figure()
ax = fig.gca(projection='3d')

# Prepare arrays x, y, z
theta = np.linspace(-4 * np.pi, 4 * np.pi, 100)
z = np.linspace(-2, 2, 100)
r = z**2 + 1
x = r * np.sin(theta)
y = r * np.cos(theta)

ax.plot(x, y, z, label='parametric curve')


ax.legend()
plt.show()
Output

Scatter 3D Plot
According to Wikipedia, a scatter plot is a type of plot or mathematical diagram using Cartesian
coordinates to display values for typically two variables for a set of data. If the points are coded,
one additional variable can be displayed
# This import registers the 3D projection, but is otherwise unused.
from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import

import matplotlib.pyplot as plt


import numpy as np

# Fixing random state for reproducibility


np.random.seed(19680801)

def randrange(n, vmin, vmax):
    '''
    Helper function to make an array of random numbers having shape (n,)
    with each number distributed Uniform(vmin, vmax).
    '''
    return (vmax - vmin) * np.random.rand(n) + vmin

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

n = 100

# For each set of style and range settings, plot n random points in the box
# defined by x in [23, 32], y in [0, 100], z in [zlow, zhigh].
for m, zlow, zhigh in [('o', -50, -25), ('^', -30, -5)]:
    xs = randrange(n, 23, 32)
    ys = randrange(n, 0, 100)
    zs = randrange(n, zlow, zhigh)
    ax.scatter(xs, ys, zs, marker=m)

ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')

ax.set_zlabel('Z Label')

plt.show()
Output

WireFrame 3D Plot
According to Wikipedia, wireframe plot takes a grid of values and projects it onto the specified
three-dimensional surface and can make the resulting three-dimensional forms quite easy to
visualize.
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Grab some test data.


X, Y, Z = axes3d.get_test_data(0.05)

# Plot a basic wireframe.


ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)

plt.show()
Output

Surface 3D Plot
According to Wikipedia, surface plots are diagrams of three-dimensional data. Rather than
showing the individual data points, surface plots show a functional relationship between a
designated dependent variable (Y), and two independent variables (X and Z). The plot is a
companion plot to the contour plot.

# This import registers the 3D projection but is otherwise unused.
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 unused import

import matplotlib.pyplot as plt


from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import numpy as np

fig = plt.figure()
ax = fig.gca(projection='3d')

# Make data.
X = np.arange(-5, 5, 0.5)
Y = np.arange(-5, 5, 0.5)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)

# Plot the surface.


surf = ax.plot_surface(X, Y, Z, cmap=cm.coolwarm,
linewidth=0, antialiased=False)

# Customize the z axis.


ax.set_zlim(-1.01, 1.01)
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

# Add a color bar which maps values to colors.


fig.colorbar(surf, shrink=0.5, aspect=5)

plt.show()
Output

Tri-Surface 3D Plot
According to Wikipedia, a triangulation of a compact surface is a finite collection of triangles that
cover the surface in such a way that every point on the surface is in a triangle, and the
intersection of any two triangles is either void, a common edge or a common vertex. A
triangulated surface is called a tri-surface.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

n_radii = 16
n_angles = 16

# Make radii and angles spaces (radius r=0 omitted to eliminate duplication).
radii = np.linspace(0.125, 1.0, n_radii)
angles = np.linspace(0, 2*np.pi, n_angles, endpoint=False)

# Repeat all angles for each radius.


angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1)

# Convert polar (radii, angles) coords to cartesian (x, y) coords.


# (0, 0) is manually added at this stage, so there will be no duplicate
# points in the (x, y) plane.
x = np.append(0, (radii*np.cos(angles)).flatten())
y = np.append(0, (radii*np.sin(angles)).flatten())

# Compute z to make the pringle surface.


z = np.sin(-x*y)

fig = plt.figure()
ax = fig.gca(projection='3d')

ax.plot_trisurf(x, y, z, linewidth=0.2, antialiased=True)

plt.show()
Output

Contour 3D Plot

According to Wikipedia, contour line of a function of two variables is a curve along which the
function has a constant value so that the curve joins points of equal value. It is a plane section of
the three-dimensional graph of the function f parallel to the plane. A contour plot is a graphical
technique for representing a 3-dimensional surface by plotting constant z slices, called contours,
on a 2-dimensional format. That is, given a value for z, lines are drawn for connecting the (x,y)
coordinates where that z value occurs.
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import cm

fig = plt.figure()
ax = fig.gca(projection='3d')
X, Y, Z = axes3d.get_test_data(0.005)
cset = ax.contour(X, Y, Z, extend3d=True, cmap=cm.coolwarm)
ax.clabel(cset, fontsize=9, inline=1)

plt.show()
Output

Filled Contour 3D Plot


This is a type of contour plot in which the regions between the contour levels are filled.
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import cm

fig = plt.figure()
ax = fig.gca(projection='3d')
X, Y, Z = axes3d.get_test_data(0.005)
cset = ax.contourf(X, Y, Z, cmap=cm.coolwarm)
ax.clabel(cset, fontsize=9, inline=1)

plt.show()
Output

Polygon 3D Plot
According to Wikipedia, a polygon plot is a type of 3D plot where the artists can be of different
shapes and sizes, with no restriction on the shape or size of any artist.
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import numpy as np

fig = plt.figure()
ax = fig.gca(projection='3d')

def cc(arg):
    return mcolors.to_rgba(arg, alpha=0.9)

xs = np.arange(0, 10, 0.2)


verts = []
zs = [0.0, 1.0, 2.0, 3.0]
for z in zs:
    ys = np.random.rand(len(xs))
    ys[0], ys[-1] = 0, 0
    verts.append(list(zip(xs, ys)))

poly = PolyCollection(verts, facecolors=[cc('r'), cc('g'), cc('b'), cc('y')])
poly.set_alpha(0.9)
ax.add_collection3d(poly, zs=zs, zdir='y')

ax.set_xlabel('X')
ax.set_xlim3d(0, 10)
ax.set_ylabel('Y')
ax.set_ylim3d(-1, 4)
ax.set_zlabel('Z')
ax.set_zlim3d(0, 1)

plt.show()
Output

Bar 3D Plot
According to Wikipedia, a bar chart or bar graph is a chart or graph that presents categorical
data with rectangular bars with heights or lengths proportional to the values that they represent.
The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a
column chart.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for c, z in zip(['r', 'g', 'b', 'y'], [30, 20, 10, 0]):
    xs = np.arange(100)
    ys = np.random.rand(100)

    # You can provide either a single color or an array. To demonstrate this,
    # the first bar of each set will be colored cyan.
    cs = [c] * len(xs)
    cs[0] = 'c'
    ax.bar(xs, ys, zs=z, zdir='y', color=cs, alpha=0.8)

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

plt.show()
Output

Quiver 3D
A quiver plot displays velocity vectors as arrows with components (u, v, w) at the points (x, y, z).
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.gca(projection='3d')

# Make the grid


x, y, z = np.meshgrid(np.arange(-0.8, 1, 0.4),
np.arange(-0.8, 1, 0.3),
np.arange(-0.8, 1, 0.3))

# Make the direction data for the arrows


u = np.sin(np.pi * x) * np.cos(np.pi * y) * np.cos(np.pi * z)
v = -np.cos(np.pi * x) * np.sin(np.pi * y) * np.cos(np.pi * z)
w = (np.sqrt(2.0 / 3.0) * np.cos(np.pi * x) * np.cos(np.pi * y) *
np.sin(np.pi * z))

ax.quiver(x, y, z, u, v, w, length=0.2, normalize=True)

plt.show()
Output

Conclusion
In this chapter, we studied python matplotlib, matplotlib pyplot, matplotlib inline, installing
matplotlib, the anatomy of a figure, backend in matplotlib, list of backend renderers, list of pyplot
matplotlib functions, plotting 2D and 3D graphs, types of 2D and 3D graphs, and the Python
implementation of these functionalities. We hope you were able to understand everything.
For any doubts, please leave a comment with your query.

7
Python Seaborn
Overview

In this chapter, we delve into Seaborn, a powerful library for creating statistical graphics in
Python. Discover its integration with pandas and matplotlib, and explore its capabilities for
visualizing relationships between variables, univariate and bivariate distributions, and complex
dataset structures. Learn how to install Seaborn, understand its differences from Matplotlib, and
master key functions such as relplot(), scatterplot(), lineplot(), and more to enhance your data
analysis and presentation skills.

What is Python Seaborn?
Seaborn is a library for making statistical graphics in Python. It is built on top of matplotlib and
closely integrated with pandas data structures.
Here is some of the functionality that seaborn offers:
• A dataset-oriented API for examining relationships between multiple variables
• Specialized support for using categorical variables to show observations or aggregate
statistics
• Options for visualizing univariate or bivariate distributions and for comparing them
between subsets of data
• Automatic estimation and plotting of linear regression models for different kinds of
dependent variables
• Convenient views onto the overall structure of complex datasets
• High-level abstractions for structuring multi-plot grids that let you easily build complex
visualizations
• Concise control over matplotlib figure styling with several built-in themes
• Tools for choosing color palettes that faithfully reveal patterns in your data
Seaborn aims to make visualization a central part of exploring and understanding data. Its
dataset-oriented plotting functions operate on dataframes and arrays containing whole datasets
and internally perform the necessary semantic mapping and statistical aggregation to produce
informative plots
The official website is seaborn.pydata.org
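As a small taste of this dataset-oriented API, here is a minimal sketch using the bundled tips example dataset:

import seaborn as sns

sns.set()                        # apply the default seaborn theme
tips = sns.load_dataset("tips")  # bundled example dataset

# one call handles the semantic mapping (hue) and the drawing
sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips)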

Installing Seaborn
Ubuntu/Linux
sudo apt update -y
sudo apt upgrade -y
sudo apt install python3-tk python3-pip -y
sudo pip3 install seaborn

Anaconda Prompt
conda install -c anaconda seaborn

Difference Between Matplotlib and Seaborn


Functionality
Matplotlib: Matplotlib is mainly deployed for basic plotting. Visualization using Matplotlib generally consists of bars, pies, lines, scatter plots and so on.
Seaborn: Seaborn, on the other hand, provides a variety of visualization patterns. It uses less syntax and has interesting default themes. It specializes in statistical visualization and is used if one has to summarize data in visualizations and also show the distribution of the data.

Handling Multiple Figures
Matplotlib: Multiple figures can be opened, but they need to be closed explicitly; plt.close() only closes the current figure, while plt.close('all') closes all of them.
Seaborn: Seaborn automates the creation of multiple figures. This sometimes leads to OOM (out of memory) issues.

Visualization
Matplotlib: Matplotlib is a graphics package for data visualization in Python. It is well integrated with NumPy and Pandas. The pyplot module mirrors the MATLAB plotting commands closely, so MATLAB users can easily transition to plotting with Python.
Seaborn: Seaborn is more integrated for working with Pandas data frames. It extends the Matplotlib library for creating beautiful graphics with Python using a more straightforward set of methods.

Data Frames and Arrays
Matplotlib: Matplotlib works with data frames and arrays. It has different stateful APIs for plotting. Figures and axes are represented by objects, so plot() like calls without parameters suffice, without having to manage parameters.
Seaborn: Seaborn works with the dataset as a whole and is much more intuitive than Matplotlib. For Seaborn, relplot() is the entry API, with a 'kind' parameter to specify the type of plot, which could be line, bar, or any of the other types. Seaborn is not stateful, so plot() would require passing the object.

Flexibility
Matplotlib: Matplotlib is highly customizable and powerful.
Seaborn: Seaborn avoids a ton of boilerplate by providing default themes which are commonly used.

Use Cases
Matplotlib: Pandas uses Matplotlib. It is a neat wrapper around Matplotlib.
Seaborn: Seaborn is for more specific use cases. Also, it is Matplotlib under the hood. It is specially meant for statistical plotting.

Seaborn Functions
seaborn.relplot()
Syntax
seaborn.relplot(x=None, y=None, hue=None, size=None, style=None, data=None, row=None,
col=None, col_wrap=None, row_order=None, col_order=None, palette=None, hue_order=None,
hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=None,
dashes=None, style_order=None, legend='brief', kind='scatter', height=5, aspect=1,
facet_kws=None, **kwargs)
It is a function that is a figure-level interface for drawing relational plots onto a FacetGrid.
import seaborn as sns
sns.set(style="white")

# Load the example mpg dataset


mpg = sns.load_dataset("mpg")

# Plot miles per gallon against horsepower with other semantics


sns.relplot(x="horsepower", y="mpg", hue="origin", size="weight",
sizes=(400, 40), alpha=.5, palette="muted",
height=6, data=mpg)
Output

seaborn.scatterplot()
Syntax
seaborn.scatterplot(x=None, y=None, hue=None, style=None, size=None, data=None,
palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None,
size_norm=None, markers=True, style_order=None, x_bins=None, y_bins=None, units=None,
estimator=None, ci=95, n_boot=1000, alpha='auto', x_jitter=None, y_jitter=None, legend='brief',
ax=None, **kwargs)
Draws a scatter plot with the possibility of several semantic groupings.
import seaborn as sns
sns.set()

# Load the example planets dataset
planets = sns.load_dataset("planets")

cmap = sns.cubehelix_palette(rot=-.5, as_cmap=True)


ax = sns.scatterplot(x="distance", y="orbital_period",
                     hue="year", size="mass",
                     palette=cmap, sizes=(10, 200),
                     data=planets)
Output

seaborn.lineplot()
Syntax
seaborn.lineplot(x=None, y=None, hue=None, size=None, style=None, data=None,
palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None,
size_norm=None, dashes=True, markers=None, style_order=None, units=None,
estimator='mean', ci=95, n_boot=1000, sort=True, err_style='band', err_kws=None,
legend='brief', ax=None, **kwargs)
Draws a line plot with the possibility of several semantic groupings.
import numpy as np
import pandas as pd
import seaborn as sns
sns.set(style="whitegrid")

rs = np.random.RandomState(365)
values = rs.randn(365, 4).cumsum(axis=0)
dates = pd.date_range("1 1 2016", periods=365, freq="D")
data = pd.DataFrame(values, dates, columns=["A", "B", "C", "D"])
data = data.rolling(10).mean()

sns.lineplot(data=data, palette="tab10", linewidth=2.5)


Output

seaborn.catplot()
Syntax
seaborn.catplot(x=None, y=None, hue=None, data=None, row=None, col=None,
col_wrap=None, estimator=<function mean>, ci=95, n_boot=1000, units=None, order=None,
hue_order=None, row_order=None, col_order=None, kind='strip', height=5, aspect=1,
orient=None, color=None, palette=None, legend=True, legend_out=True, sharex=True,
sharey=True, margin_titles=False, facet_kws=None, **kwargs)
Figure-level interface for drawing categorical plots onto a FacetGrid.
import seaborn as sns
sns.set(style="whitegrid")

# Load the example exercise dataset

df = sns.load_dataset("exercise")

# Draw a pointplot to show pulse as a function of three categorical factors
g = sns.catplot(x="time", y="pulse", hue="diet", col="kind",
                capsize=.6, palette="YlGnBu_d", height=6, aspect=.75,
                kind="point", data=df)
g.despine(left=True)
Output

seaborn.stripplot()
Syntax
seaborn.stripplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
jitter=True, dodge=False, orient=None, color=None, palette=None, size=5, edgecolor='gray',
linewidth=0, ax=None, **kwargs)
Draws a scatterplot where one variable is categorical.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="whitegrid")
iris = sns.load_dataset("iris")

# "Melt" the dataset to "long-form" or "tidy" representation


iris = pd.melt(iris, "species", var_name="measurement")

# Initialize the figure


f, ax = plt.subplots()
sns.despine(bottom=True, left=True)

# Show each observation with a scatterplot

sns.stripplot(x="measurement", y="value", hue="species",
data=iris, dodge=True, jitter=True,
alpha=.25, zorder=1)

# Show the conditional means


sns.pointplot(x="measurement", y="value", hue="species",
data=iris, dodge=.532, join=False, palette="dark",
markers="d", scale=.75, ci=None)

# Improve the legend


handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[3:], labels[3:], title="species",
handletextpad=0, columnspacing=1,
loc="lower right", ncol=3, frameon=True)
Output

seaborn.swarmplot()
Syntax
seaborn.swarmplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
dodge=False, orient=None, color=None, palette=None, size=5, edgecolor='gray', linewidth=0,
ax=None, **kwargs)
Draws a categorical scatterplot with non-overlapping points.
import pandas as pd
import seaborn as sns
sns.set(style="whitegrid", palette="muted")

# Load the example iris dataset


iris = sns.load_dataset("iris")

# "Melt" the dataset to "long-form" or "tidy" representation


iris = pd.melt(iris, "species", var_name="measurement")

# Draw a categorical scatterplot to show each observation

sns.swarmplot(x="value", y="measurement", hue="species",
palette=["r", "c", "y"], data=iris)
Output

seaborn.boxplot()
Syntax
seaborn.boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5,
linewidth=None, whis=1.5, notch=False, ax=None, **kwargs)
Draws a box plot to show distributions with respect to categories.
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="ticks")

# Initialize the figure with a logarithmic x axis


f, ax = plt.subplots(figsize=(7, 6))
ax.set_xscale("log")

# Load the example planets dataset


planets = sns.load_dataset("planets")

# Plot the orbital period with horizontal boxes


sns.boxplot(x="distance", y="method", data=planets,
whis="range", palette="vlag")

# Add in points to show each observation


sns.swarmplot(x="distance", y="method", data=planets,
size=2, color=".6", linewidth=0)

# Tweak the visual presentation


ax.xaxis.grid(True)
ax.set(ylabel="")

sns.despine(trim=True, left=True)
Output

seaborn.violinplot()
Syntax
seaborn.violinplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
bw='scott', cut=2, scale='area', scale_hue=True, gridsize=100, width=0.8, inner='box',
split=False, dodge=True, orient=None, linewidth=None, color=None, palette=None,
saturation=0.75, ax=None, **kwargs)
Draws a combination of boxplot and kernel density estimate.
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid")

# Load the example dataset of brain network correlations


df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)

# Pull out a specific subset of networks


used_networks = [1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 16, 17]
used_columns = (df.columns.get_level_values("network")
.astype(float)
.isin(used_networks))
df = df.loc[:, used_columns]

# Compute the correlation matrix and average over networks


corr_df = df.corr().groupby(level="network").mean()
corr_df.index = corr_df.index.astype(int)
corr_df = corr_df.sort_index().T

# Set up the matplotlib figure

f, ax = plt.subplots(figsize=(11, 6))

# Draw a violinplot with a narrower bandwidth than the default


sns.violinplot(data=corr_df, palette="Set3", bw=.2, cut=1, linewidth=1)

# Finalize the figure


ax.set(ylim=(-.7, 1.05))
sns.despine(left=True, bottom=True)
Output

seaborn.boxenplot()
Syntax
seaborn.boxenplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True,
k_depth='proportion', linewidth=None, scale='exponential', outlier_prop=None, ax=None,
**kwargs)
Draws an enhanced box plot for larger datasets.
import seaborn as sns
sns.set(style="whitegrid")

diamonds = sns.load_dataset("diamonds")
clarity_ranking = ["I1", "SI2", "SI1", "VVS2", "VVS1", "IF", "VS2", "VS1"]

sns.boxenplot(x="clarity", y="carat",
color="g", order=clarity_ranking,
scale="linear", data=diamonds)
Output

seaborn.pointplot()
Syntax
seaborn.pointplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
estimator=<function mean>, ci=95, n_boot=1000, units=None, markers='o', linestyles='-',
dodge=False, join=True, scale=1, orient=None, color=None, palette=None, errwidth=None,
capsize=None, ax=None, **kwargs)
Shows point estimates and confidence intervals using scatter plot graphs.
import seaborn as sns
sns.set(style="whitegrid")

# Load the example Titanic dataset


titanic = sns.load_dataset("titanic")

# Set up a grid to plot survival probability against several variables

g = sns.PairGrid(titanic, y_vars="survived",
x_vars=["class", "sex"],
height=5, aspect=.5)

# Draw a seaborn pointplot onto each Axes


g.map(sns.pointplot, scale=1.3, errwidth=4, color="xkcd:plum")
g.set(ylim=(0, 1))
sns.despine(fig=g.fig, left=True)
Output

seaborn.barplot()
Syntax
seaborn.barplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
estimator=<function mean>, ci=95, n_boot=1000, units=None, orient=None, color=None,
palette=None, saturation=0.75, errcolor='.26', errwidth=None, capsize=None, dodge=True,
ax=None, **kwargs)
Shows point estimates and confidence intervals as rectangular bars.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", context="talk")
rs = np.random.RandomState(8)

# Set up the matplotlib figure


f, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(7, 5), sharex=True)

# Generate some sequential data


x = np.array(list("ABCDEFGHIJ"))
y1 = np.arange(1, 11)
sns.barplot(x=x, y=y1, palette="rocket", ax=ax1)
ax1.axhline(0, color="k", clip_on=False)
ax1.set_ylabel("Sequential")

# Center the data to make it diverging


y2 = y1 - 5.5
sns.barplot(x=x, y=y2, palette="vlag", ax=ax2)
ax2.axhline(0, color="k", clip_on=False)
ax2.set_ylabel("Diverging")

# Randomly reorder the data to make it qualitative


y3 = rs.choice(y1, len(y1), replace=False)
sns.barplot(x=x, y=y3, palette="deep", ax=ax3)
ax3.axhline(0, color="k", clip_on=False)
ax3.set_ylabel("Qualitative")

# Finalize the plot


sns.despine(bottom=True)
plt.setp(f.axes, yticks=[])
plt.tight_layout(h_pad=2)
Output

seaborn.countplot()
Syntax
seaborn.countplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
orient=None, color=None, palette=None, saturation=0.75, dodge=True, ax=None, **kwargs)
Shows the counts of observations in each categorical bin using bars.
import seaborn as sns
sns.set(style="darkgrid")
titanic = sns.load_dataset("titanic")
g = sns.catplot(x="class", hue="who", col="survived", data=titanic,
                kind="count", height=4, aspect=.7)
Output

seaborn.jointplot()
Syntax
seaborn.jointplot(x, y, data=None, kind='scatter', stat_func=None, color=None, height=6,
ratio=5, space=0.2, dropna=True, xlim=None, ylim=None, joint_kws=None,
marginal_kws=None, annot_kws=None, **kwargs)
Draw a plot of two variables with bivariate and univariate graphs.
import numpy as np
import seaborn as sns
sns.set(style="ticks")

rs = np.random.RandomState(11)

x = rs.gamma(1, size=500)
y = -.5 * x + rs.normal(size=500)

sns.jointplot(x, y, kind="hex", color="#4CB391")


Output

seaborn.distplot()
Syntax
seaborn.distplot(a, bins=None, hist=True, kde=True, rug=False, fit=None, hist_kws=None,
kde_kws=None, rug_kws=None, fit_kws=None, color=None, vertical=False, norm_hist=False,
axlabel=None, label=None, ax=None)
Flexibly plots a univariate distribution of observations.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="white", palette="muted", color_codes=True)


rs = np.random.RandomState(10)

# Set up the matplotlib figure


f, axes = plt.subplots(2, 2, figsize=(7, 7), sharex=True)
sns.despine(left=True)

d = rs.normal(size=100)

# Plot a simple histogram with binsize determined automatically


sns.distplot(d, kde=False, color="b", ax=axes[0, 0])

sns.distplot(d, hist=False, rug=True, color="r", ax=axes[0, 1])

sns.distplot(d, hist=False, color="g", kde_kws={"shade": True}, ax=axes[1, 0])

# Plot a histogram and kernel density estimate
sns.distplot(d, color="m", ax=axes[1, 1])

plt.setp(axes, yticks=[])
plt.tight_layout()
Output

seaborn.pairplot()
Syntax
seaborn.pairplot(data, hue=None, hue_order=None, palette=None, vars=None, x_vars=None,
y_vars=None, kind='scatter', diag_kind='auto', markers=None, height=2.5, aspect=1,
dropna=True, plot_kws=None, diag_kws=None, grid_kws=None, size=None)
Plots pairwise relationships in a dataset.
import seaborn as sns
sns.set(style="ticks")

df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")
Output

seaborn.rugplot()
Syntax
seaborn.rugplot(a, height=0.05, axis='x', ax=None, **kwargs)
Plots data points in an array as sticks on an axis.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sample = np.hstack((np.random.randn(300), np.random.randn(200)+5))
fig, ax = plt.subplots(figsize=(8,4))
sns.distplot(sample, rug=True, hist=False, rug_kws={"color": "g"},
kde_kws={"color": "k", "lw": 3})
plt.show()
Output

seaborn.kdeplot()
Syntax
seaborn.kdeplot(data, data2=None, shade=False, vertical=False, kernel='gau', bw='scott',
gridsize=100, cut=3, clip=None, legend=True, cumulative=False, shade_lowest=True,
cbar=False, cbar_ax=None, cbar_kws=None, ax=None, **kwargs)
Fits and plots a univariate or bivariate kernel density estimate.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="dark")
rs = np.random.RandomState(500)

# Set up the matplotlib figure


f, axes = plt.subplots(3, 3, figsize=(9, 9), sharex=True, sharey=True)

# Rotate the starting point around the cubehelix hue circle


for ax, s in zip(axes.flat, np.linspace(0, 3, 10)):

    # Create a cubehelix colormap to use with kdeplot
    cmap = sns.cubehelix_palette(start=s, light=1, as_cmap=True)

    # Generate and plot a random bivariate dataset
    x, y = rs.randn(2, 50)
    sns.kdeplot(x, y, cmap=cmap, shade=True, cut=5, ax=ax)
    ax.set(xlim=(-3, 3), ylim=(-3, 3))

f.tight_layout()
Output

seaborn.lmplot()
Syntax
seaborn.lmplot(x, y, data, hue=None, col=None, row=None, palette=None, col_wrap=None,
height=5, aspect=1, markers='o', sharex=True, sharey=True, hue_order=None, col_order=None,
row_order=None, legend=True, legend_out=True, x_estimator=None, x_bins=None, x_ci='ci',
scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False,
lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False,
x_jitter=None, y_jitter=None, scatter_kws=None, line_kws=None, size=None)
Plots data and regression model fits across a FacetGrid
import seaborn as sns
sns.set()

iris = sns.load_dataset("iris")

# Plot sepal width as a function of sepal length across species


g = sns.lmplot(x="sepal_length", y="sepal_width", hue="species",
truncate=True, height=5, data=iris)

# Use more informative axis labels than are provided by default

g.set_axis_labels("Sepal length (mm)", "Sepal width (mm)")
Output

seaborn.regplot()
Syntax
seaborn.regplot(x, y, data=None, x_estimator=None, x_bins=None, x_ci='ci', scatter=True,
fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False,
robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, dropna=True,
x_jitter=None, y_jitter=None, label=None, color=None, marker='o', scatter_kws=None,
line_kws=None, ax=None)
Plots data and a linear regression model fit.
import seaborn as sns; sns.set(color_codes=True)
tips = sns.load_dataset("tips")
ax = sns.regplot(x="total_bill", y="tip", data=tips, marker="+")
Output

seaborn.residplot()
Syntax
seaborn.residplot(x, y, data=None, lowess=False, x_partial=None, y_partial=None, order=1,
robust=False, dropna=True, label=None, color=None, scatter_kws=None, line_kws=None,
ax=None)

Plots the residuals of linear regression.
import numpy as np
import seaborn as sns
sns.set(style="whitegrid")

# Make an example dataset with y ~ x


rs = np.random.RandomState(10)
x = rs.normal(2, 1, 75)
y = 2 + 1.5 * x + rs.normal(1, 2, 75)

# Plot the residuals after fitting a linear model


sns.residplot(x, y, lowess=True, color="g")
Output

seaborn.heatmap()
Syntax
seaborn.heatmap(data, vmin=None, vmax=None, cmap=None, center=None, robust=False,
annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True,
cbar_kws=None, cbar_ax=None, square=False, xticklabels='auto', yticklabels='auto',
mask=None, ax=None, **kwargs)
Plots rectangular data as a color-encoded matrix.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# Load the example flights dataset and convert to long-form


flights_long = sns.load_dataset("flights")
flights = flights_long.pivot("month", "year", "passengers")

# Draw a heatmap with the numeric values in each cell


f, ax = plt.subplots(figsize=(9, 7))
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax)
Output

seaborn.clustermap()
Syntax
seaborn.clustermap(data, pivot_kws=None, method='average', metric='euclidean',
z_score=None, standard_scale=None, figsize=None, cbar_kws=None, row_cluster=True,
col_cluster=True, row_linkage=None, col_linkage=None, row_colors=None, col_colors=None,
mask=None, **kwargs)
Plots a matrix dataset as a hierarchically-clustered heatmap.
import pandas as pd
import seaborn as sns
sns.set()

# Load the brain networks example dataset


df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)

# Select a subset of the networks


used_networks = [1, 5, 6, 7, 8, 12, 13, 17]
used_columns = (df.columns.get_level_values("network")
.astype(int)
.isin(used_networks))
df = df.loc[:, used_columns]

# Create a categorical palette to identify the networks


network_pal = sns.husl_palette(8, s=.45)
network_lut = dict(zip(map(str, used_networks), network_pal))

# Convert the palette to vectors that will be drawn on the side of the matrix
networks = df.columns.get_level_values("network")
network_colors = pd.Series(networks, index=df.columns).map(network_lut)

# Draw the full plot

sns.clustermap(df.corr(), center=0, cmap="vlag",
row_colors=network_colors, col_colors=network_colors,
linewidths=.75, figsize=(8, 8))
Output

Facet Grids
Function - Description
FacetGrid(data[, row, col, hue, col_wrap, …]) - Multi-plot grid for plotting conditional relationships.
FacetGrid.map(func, *args, **kwargs) - Apply a plotting function to each facet's subset of the data.
FacetGrid.map_dataframe(func, *args, **kwargs) - Like .map but passes args as strings and inserts data in kwargs.
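For instance, a minimal FacetGrid sketch (using seaborn's built-in tips dataset; the column choices are our own illustrative ones) looks like this:
import matplotlib.pyplot as plt
import seaborn as sns

# Facet the tips dataset by meal time, coloring points by smoker status
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time", hue="smoker")
g.map(plt.scatter, "total_bill", "tip")
g.add_legend()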

Pair Grids
Function - Description
PairGrid(data[, hue, hue_order, palette, …]) - Subplot grid for plotting pairwise relationships in a dataset.
PairGrid.map(func, **kwargs) - Plot with the same function in every subplot.
PairGrid.map_diag(func, **kwargs) - Plot with a univariate function on each diagonal subplot.
PairGrid.map_offdiag(func, **kwargs) - Plot with a bivariate function on the off-diagonal subplots.
PairGrid.map_lower(func, **kwargs) - Plot with a bivariate function on the lower diagonal subplots.
PairGrid.map_upper(func, **kwargs) - Plot with a bivariate function on the upper diagonal subplots.
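As a quick illustration, here is a minimal PairGrid sketch on the iris dataset, combining map_diag and map_offdiag from the table above:
import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise relationships in iris: histograms on the diagonal,
# scatter plots everywhere else, colored by species
iris = sns.load_dataset("iris")
g = sns.PairGrid(iris, hue="species")
g.map_diag(plt.hist)
g.map_offdiag(plt.scatter)
g.add_legend()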

https://www.c-sharpcorner.com/ebooks/ 84
Joint Grids
Function - Description
JointGrid(x, y[, data, height, ratio, …]) - Grid for drawing a bivariate plot with marginal univariate plots.
JointGrid.plot(joint_func, marginal_func[, …]) - Shortcut to draw the full plot.
JointGrid.plot_joint(func, **kwargs) - Draw a bivariate plot of x and y.
JointGrid.plot_marginals(func, **kwargs) - Draw univariate plots for x and y separately.
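For example, a minimal JointGrid sketch that draws a regression plot in the joint axes and distributions in the margins (dataset and columns are our own illustrative choices):
import seaborn as sns

# Bivariate plot of total_bill vs. tip with marginal distributions
tips = sns.load_dataset("tips")
g = sns.JointGrid(x="total_bill", y="tip", data=tips)
g.plot(sns.regplot, sns.distplot)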

Style Control
Function - Description
set([context, style, palette, font, …]) - Set aesthetic parameters in one step.
axes_style([style, rc]) - Return a parameter dict for the aesthetic style of the plots.
set_style([style, rc]) - Set the aesthetic style of the plots.
plotting_context([context, font_scale, rc]) - Return a parameter dict to scale elements of the figure.
set_context([context, font_scale, rc]) - Set the plotting context parameters.
set_color_codes([palette]) - Change how matplotlib color shorthands are interpreted.
reset_defaults() - Restore all RC params to default settings.
reset_orig() - Restore all RC params to original settings (respects custom rc).
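A short sketch combining these style-control helpers (the style and context names are seaborn's built-in presets):
import matplotlib.pyplot as plt
import seaborn as sns

# Pick a base style, scale elements for a presentation, then plot
sns.set_style("whitegrid")
sns.set_context("talk", font_scale=0.8)
plt.plot([0, 1, 2], [0, 1, 4])
sns.despine()  # remove the top and right spines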

Color Palettes
Function - Description
set_palette(palette[, n_colors, desat, …]) - Set the matplotlib color cycle using a seaborn palette.
color_palette([palette, n_colors, desat]) - Return a list of colors defining a color palette.
husl_palette([n_colors, h, s, l]) - Get a set of evenly spaced colors in HUSL hue space.
hls_palette([n_colors, h, l, s]) - Get a set of evenly spaced colors in HLS hue space.
cubehelix_palette([n_colors, start, rot, …]) - Make a sequential palette from the cubehelix system.
dark_palette(color[, n_colors, reverse, …]) - Make a sequential palette that blends from dark to color.
light_palette(color[, n_colors, reverse, …]) - Make a sequential palette that blends from light to color.
diverging_palette(h_neg, h_pos[, s, l, sep, …]) - Make a diverging palette between two HUSL colors.
blend_palette(colors[, n_colors, as_cmap, input]) - Make a palette that blends between a list of colors.
xkcd_palette(colors) - Make a palette with color names from the xkcd color survey.
crayon_palette(colors) - Make a palette with color names from Crayola crayons.
mpl_palette(name[, n_colors]) - Return discrete colors from a matplotlib palette.
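A few of these palette helpers in action (the palette names used here are seaborn's built-in ones):
import seaborn as sns

# A named palette as a list of RGB tuples
print(sns.color_palette("muted", 5))
# Use an evenly spaced HUSL palette for the matplotlib color cycle
sns.set_palette(sns.husl_palette(8, s=.7))
# A sequential cubehelix colormap, e.g. for heatmaps
cmap = sns.cubehelix_palette(as_cmap=True)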

Palette Widgets
Function - Description
choose_colorbrewer_palette(data_type[, as_cmap]) - Select a palette from the ColorBrewer set.
choose_cubehelix_palette([as_cmap]) - Launch an interactive widget to create a sequential cubehelix palette.
choose_light_palette([input, as_cmap]) - Launch an interactive widget to create a light sequential palette.
choose_dark_palette([input, as_cmap]) - Launch an interactive widget to create a dark sequential palette.
choose_diverging_palette([as_cmap]) - Launch an interactive widget to choose a diverging color palette.

Utility Functions
Function - Description
load_dataset(name[, cache, data_home]) - Load a dataset from the online repository (requires internet).
despine([fig, ax, top, right, left, bottom, …]) - Remove the top and right spines from plot(s).
desaturate(color, prop) - Decrease the saturation channel of a color by some percent.
saturate(color) - Return a fully saturated color with the same hue.
set_hls_values(color[, h, l, s]) - Independently manipulate the h, l, or s channels of a color.

Conclusion
In this chapter, we studied Python Seaborn: installing it, the difference between Matplotlib and Seaborn, Seaborn functions, the types of graphs available in Seaborn, and the Python implementation of these functionalities. We hope everything was clear; for any doubts, please leave a comment with your query.

8
Python TensorFlow
Overview

In this chapter, we explore Python TensorFlow, covering its key concepts and architecture. Prepare yourself for a comprehensive journey that will empower you to effectively utilize the flexibility and capabilities of TensorFlow for machine learning and deep learning applications.

What is Python TensorFlow?
According to Wikipedia, TensorFlow is a free and open-source software library for dataflow and
differentiable programming across a range of tasks. It is a symbolic math library and is also used
for machine learning applications such as neural networks. It is used for both research and
production at Google.
Its flexible architecture allows for the easy deployment of computation across a variety of
platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge
devices.
TensorFlow computations are expressed as stateful dataflow graphs. The name TensorFlow
derives from the operations that such neural networks perform on multidimensional data arrays,
which are referred to as tensors. During the Google I/O Conference in June 2016, Jeff Dean
stated that 1,500 repositories on GitHub mentioned TensorFlow, of which only 5 were from
Google.
In January 2019, Google announced TensorFlow 2.0, which became officially available in September 2019. In March 2019, Google announced TensorFlow.js version 1.0 for machine learning in JavaScript, and in May 2019, TensorFlow Graphics for deep
learning in computer graphics.
TensorFlow was developed by the Google Brain team for internal Google use. It was released
under the Apache License 2.0 on November 9, 2015. The official website is www.tensorflow.org.

Key Terms
Tensor
TensorFlow’s name is directly derived from its core framework: Tensor. In TensorFlow, all the
computations involve tensors. A tensor is a vector or matrix of n-dimensions that represents all
types of data. All values in a tensor hold identical data type with a known (or partially
known) shape. The shape of the data is the dimensionality of the matrix or array.
A tensor can originate from the input data or from the result of a computation. In TensorFlow, all
the operations are conducted inside a graph. The graph is a set of computations that take place
successively. Each operation is called an op node, and op nodes are connected to each other.
The graph outlines the ops and the connections between the nodes; however, it does not display
the values. The edges of the nodes are the tensors, i.e., a way to populate the operations with data.
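As a quick illustration (a minimal sketch using tf.constant, which is covered in detail later in this chapter), shape and dtype describe a tensor's dimensionality and element type:
import tensorflow as tf

# Every tensor has a shape (dimensionality) and a single dtype
scalar = tf.constant(3)                 # shape ()     -> rank 0
vector = tf.constant([1.0, 2.0, 3.0])   # shape (3,)   -> rank 1
matrix = tf.constant([[1, 2], [3, 4]])  # shape (2, 2) -> rank 2
print(scalar.shape, vector.shape, matrix.shape)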

Graphs
TensorFlow makes use of a graph framework. The graph gathers and describes all the series of
computations done during the training.
The graph has lots of advantages:
• It can run on multiple CPUs or GPUs, and even on mobile operating systems.
• The portability of the graph allows you to preserve the computations for immediate or later
use; the graph can be saved and executed in the future.
• All the computations in the graph are done by connecting tensors together.
• A graph has nodes and edges. A node carries a mathematical operation and produces
endpoint outputs. The edges explain the input/output relationships between nodes.
Reading about graph theory will help you understand TensorFlow better and give you a deeper
grounding in the concept.
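To make this concrete, here is a minimal sketch of building and running a dataflow graph with the TF 1.x API used throughout this chapter; the names a, b, and total are our own illustrative choices:
import tensorflow as tf

# Build a small graph: nodes are ops, edges carry tensors
a = tf.constant(3.0, name="a")
b = tf.constant(4.0, name="b")
total = tf.add(a, b, name="total")

# Nothing is computed yet; we have only described the graph
print(total)  # Tensor("total:0", shape=(), dtype=float32)

# Running the graph inside a session executes the ops
with tf.Session() as sess:
    print(sess.run(total))  # 7.0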

DistBelief
Starting in 2011, Google Brain built DistBelief as a proprietary machine-learning system based
on deep learning neural networks. Its use grew rapidly across diverse Alphabet companies in
both research and commercial applications. Google assigned multiple computer scientists,
including Jeff Dean, to simplify and refactor the codebase of DistBelief into a faster, more robust
application-grade library, which became TensorFlow. In 2009, the team, led by Geoffrey Hinton,
had implemented generalized backpropagation and other improvements which allowed
generation of neural networks with substantially higher accuracy, for instance, a 25% reduction
in errors in speech recognition.

CPU
The Central Processing Unit (CPU) is the electronic circuitry that acts as the brain of the
computer, performing the basic arithmetic, logical, control, and input/output operations
specified by the instructions of a computer program.

GPU
The Graphics Processing Unit (GPU) is a specialized electronic circuit designed to render 2D and
3D graphics together with a CPU; in gaming culture it is commonly known as the graphics card.
GPUs are now being harnessed more broadly to accelerate computational workloads in areas
such as financial modelling, cutting-edge scientific research, deep learning, analytics, and oil and
gas exploration.

TPU
In May 2016, Google announced its Tensor Processing Unit (TPU), an application-specific
integrated circuit (a hardware chip) built specifically for machine learning and tailored for
TensorFlow. TPU is a programmable AI accelerator designed to provide high throughput of low-
precision arithmetic (e.g., 8-bit), and oriented toward using or running models rather than
training them. Google announced they had been running TPUs inside their data centers for
more than a year and had found them to deliver an order of magnitude better-optimized
performance per watt for machine learning.
In May 2017, Google announced the second generation of TPUs, as well as their availability on
Google Compute Engine. Second-generation TPUs deliver up to 180 teraflops of
performance, and when organized into clusters of 64 TPUs, provide up to 11.5 petaflops.
In May 2018, Google announced the third-generation TPUs delivering up to 420 teraflops of
performance and 128 GB HBM. Cloud TPU v3 Pods offer 100+ petaflops of performance and 32
TB HBM.

Edge TPU
In July 2018, the Edge TPU was announced. The Edge TPU is Google's purpose-built ASIC chip
designed to run TensorFlow Lite machine learning (ML) models on small client computing
devices such as smartphones, a setting known as edge computing.

TensorFlow Lite
In May 2017, Google announced a software stack specifically for mobile development,
TensorFlow Lite. In January 2019, the TensorFlow team released a developer preview of the mobile
GPU inference engine with OpenGL ES 3.1 Compute Shaders on Android devices and Metal
Compute Shaders on iOS devices. In May 2019, Google announced that their TensorFlow Lite
Micro (also known as TensorFlow Lite for Microcontrollers) and ARM's uTensor would be
merging.

Pixel Visual Core (PVC)
In October 2017, Google released the Google Pixel 2 which featured their Pixel Visual Core
(PVC), a fully programmable image, vision and AI processor for mobile devices. The PVC
supports TensorFlow for machine learning (and Halide for image processing).

TensorFlow Session
A session executes the operations in the graph. To feed the graph with the values of a
tensor, you need to open a session; inside a session, you must run an operator to create an
output.

Features/Advantages of TensorFlow
• TensorFlow has a responsive construct: you can easily visualize each and every part
of the graph.
• It has platform flexibility, meaning it is modular, and some parts of it can be standalone
while others are coalesced.
• It is easily trainable on CPU as well as GPU for distributed computing.
• TensorFlow has auto-differentiation capabilities, which benefit gradient-based machine
learning algorithms: you can compute derivatives of values with respect to other
values, which results in a graph extension (see the sketch after this list).
• It has advanced support for threads, asynchronous computation, and queues.
• It is customizable and open source.
• TensorFlow lets you execute subparts of a graph, which gives it an upper hand: you
can introduce and retrieve discrete data onto an edge, and it therefore offers a great
debugging method.
• The libraries can be deployed on a gamut of hardware machines, from cellular
devices to computers with complex setups.
• TensorFlow is highly parallel and designed to use various backends (GPU, ASIC, etc.).
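Here is a minimal sketch of the auto-differentiation capability mentioned above (TF 1.x API; the function y = x^2 + 3x is our own illustrative choice):
import tensorflow as tf

# Differentiate y = x^2 + 3x with respect to x
x = tf.placeholder(tf.float32)
y = x ** 2 + 3 * x
grad = tf.gradients(y, x)  # graph nodes computing dy/dx = 2x + 3

with tf.Session() as sess:
    print(sess.run(grad, feed_dict={x: 2.0}))  # [7.0]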

Shortcomings/Disadvantages of TensorFlow
• The feature most required for variable-length sequences is symbolic loops;
unfortunately, TensorFlow does not offer this feature.
• TensorFlow has GPU memory conflicts with Theano if both are imported in the same scope.
• No support for OpenCL.
• It requires prior knowledge of advanced calculus and linear algebra, along with a pretty
good understanding of machine learning.
• TensorFlow lags behind in both speed and usage when compared to its competitors.
• No GPU support other than NVIDIA, and full language support only for Python.

TensorFlow Architecture Components


TensorFlow Servables
These are the central rudimentary units in TensorFlow Serving. TensorFlow Servables are the
objects that clients use to perform the computation.
The size of a servable is flexible. A single servable might include anything from a lookup table to
a single model to a tuple of inference models. Servables can be of any type and interface,
enabling flexibility and future improvements such as:
• Streaming results

• Experimental APIs
• Asynchronous modes of operation

TensorFlow Servable Versions


TensorFlow Serving can handle one or more versions of a servable, over the lifetime of a single
server instance. This opens the door for fresh algorithm configurations, weights, and other data
to be loaded over time. They also enable more than one version of a servable to be loaded
concurrently, supporting gradual roll-out and experimentation. At serving time, clients may
request either the latest version or a specific version id, for a particular model.
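As an illustration, a client of TensorFlow Serving's REST API can either take the latest version or pin a specific one in the request URL. This sketch assumes a server is already running locally on its default REST port 8501 and serving a hypothetical model named my_model:
import json
import requests

payload = json.dumps({"instances": [[1.0, 2.0]]})

# Request the latest loaded version of the model
latest = requests.post(
    "http://localhost:8501/v1/models/my_model:predict", data=payload)

# Pin a specific version id instead
pinned = requests.post(
    "http://localhost:8501/v1/models/my_model/versions/2:predict", data=payload)

print(latest.json())
print(pinned.json())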

TensorFlow Servable Streams


A sequence of versions of a servable sorted by increasing version numbers.

TensorFlow Models
TensorFlow Serving represents a model as one or more servables. A machine-learned model may include
one or more algorithms (including learned weights) and lookup or embedding tables. A servable
can also serve a fraction of a model; for example, a large lookup table can be split across
many servable instances.

TensorFlow Loaders
Loaders manage a servable’s life cycle. The Loader API enables common infrastructure
independent from specific learning algorithms, data or product use-cases involved. Specifically,
Loaders standardize the APIs for loading and unloading a servable.

Sources in TensorFlow Architecture


Sources are in simple terms, modules that find and provide servables. Each Source provides
zero or more servable streams. For each servable stream, a Source supplies one Loader
instance for each version it makes available to be loaded.

TensorFlow Managers
TensorFlow Managers handle the full lifecycle of Servables, including:
• Loading Servables
• Serving Servables
• Unloading Servables
Managers listen to Sources and track all versions. The Manager tries to fulfil Sources' requests,
but may refuse to load an aspired version. Managers may also postpone an "unload"; for
example, a Manager may wait to unload until a newer version finishes loading, based on a policy
to guarantee that at least one version is loaded at all times. Managers also expose a simple
interface, GetServableHandle(), for clients to access loaded servable instances.

TensorFlow Core
Using the standard TensorFlow Serving APIs, TensorFlow Serving Core manages the following
aspects of servables:
• lifecycle
• metrics
TensorFlow Serving Core treats servables and loaders as opaque objects.

TensorFlow Batcher
Batching multiple requests into a single request can significantly reduce the cost of
performing inference, especially in the presence of hardware accelerators such as GPUs.
TensorFlow Serving includes a request-batching widget that lets clients easily batch their type-
specific inferences across requests into batch requests that algorithm systems can process
more efficiently.
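The payoff of batching is easy to see even outside of TensorFlow Serving. The following sketch is our own illustration, not Serving's batching API: three single-row requests are stacked into one batched sess.run call:
import numpy as np
import tensorflow as tf

# A toy "model": y = x . w for a batch of 2-feature rows
x = tf.placeholder(tf.float32, shape=[None, 2], name="x")
w = tf.constant([[1.0], [2.0]])
y = tf.matmul(x, w)

# Three separate requests, stacked into one (3, 2) batch
requests_in = [np.array([[1.0, 2.0]]), np.array([[3.0, 4.0]]), np.array([[5.0, 6.0]])]
batch = np.vstack(requests_in)

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: batch}))  # one run instead of three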

Life Cycle of TensorFlow Servable


• Sources create Loaders for Servable Versions, then Loaders are sent as Aspired
Versions to the Manager, which loads and serves them to client requests.
• The Loader contains whatever metadata it needs to load the Servable.
• The Source uses a callback to notify the manager of the Aspired Version.
• The manager applies the configured Version Policy to determine the next action to take.
• If the manager determines that it’s safe, it gives the Loader the required resources and
tells the Loader to load the new version.
• Clients ask the manager for the Servable, either specifying a version explicitly or just
requesting the latest version. The manager returns a handle for the Servable. The
dynamic Manager applies the Version Policy and decides to load the new version.
• The Dynamic Manager tells the Loader that there is enough memory. The Loader
instantiates the TensorFlow graph with the new weights.
• A client requests a handle to the latest version of the model, and the Dynamic Manager
returns a handle to the new version of the Servable.

Installing Python TensorFlow


Ubuntu/Linux
sudo apt update -y
sudo apt upgrade -y
sudo apt install python3-tk python3-pip -y

pip install tensorflow # Python 2.7; CPU support (no GPU support)

pip3 install tensorflow # Python 3.n; CPU support (no GPU support)

pip install tensorflow-gpu # Python 2.7; GPU support


pip3 install tensorflow-gpu # Python 3.n; GPU support
If the above commands fail, it may be that you are using an old binary; in that case, run the
following commands, where tfBinaryURL identifies the URL of the TensorFlow Python package:
sudo pip install --upgrade tfBinaryURL # Python 2.7
sudo pip3 install --upgrade tfBinaryURL # Python 3.n

Using Docker
• Install Docker on your system if it is not installed already.
• For GPU support on Linux, install nvidia-docker. The latest version of Docker includes
native support for GPUs and nvidia-docker is not necessary.
• The official TensorFlow Docker images are located in the tensorflow/tensorflow Docker
Hub repository
• The following downloads TensorFlow release images to your machine:

docker pull tensorflow/tensorflow                      # latest stable release
docker pull tensorflow/tensorflow:devel-gpu            # nightly dev release w/ GPU support
docker pull tensorflow/tensorflow:latest-gpu-jupyter   # latest release w/ GPU support and Jupyter
To run a TensorFlow Docker image, execute the following command
docker run [-it] [--rm] [-p hostPort:containerPort] tensorflow/tensorflow[:tag] [command]
For details on executing docker images, see the docker run reference.

Anaconda Prompt
conda create -n tensorflow_env tensorflow
conda activate tensorflow_env #for CPU
Use the above command when installing on CPUs
conda create -n tensorflow_gpuenv tensorflow-gpu
conda activate tensorflow_gpuenv
Use the above command when installing on GPUs

Commonly Implemented Algorithms in TensorFlow


The following is a list of algorithms and their corresponding TensorFlow functions (a short sketch follows the list):
• Linear regression: tf.estimator.LinearRegressor
• Classification: tf.estimator.LinearClassifier
• Deep learning classification: tf.estimator.DNNClassifier
• Deep learning wide and deep: tf.estimator.DNNLinearCombinedClassifier
• Boosted tree regression: tf.estimator.BoostedTreesRegressor
• Boosted tree classification: tf.estimator.BoostedTreesClassifier
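A minimal sketch of the Estimator workflow for the first of these, tf.estimator.LinearRegressor (TF 1.x API; the toy data fitting y = 2x is our own illustrative choice):
import numpy as np
import tensorflow as tf

# One numeric feature named "x"
feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)

# Toy training data following y = 2x
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.0, 4.0, 6.0, 8.0])

input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)

estimator.train(input_fn=input_fn, steps=1000)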

Creating a Tensor
Following is the procedure of creating a Tensor
Syntax
tf.constant(value, dtype, name = "")
Arguments
• value: Value of n dimensions to define the tensor.
• dtype (optional): Defines the type of data:
• tf.string: String variable
• tf.float32: Float variable
• tf.int16: Integer variable
• name (optional): Name of the tensor. By default, `Const_1:0`.

To create a tensor of dimension 0:
## rank 0
# Default name
import tensorflow as tf
r1 = tf.constant(1, tf.int16)
print(r1)
r2 = tf.constant(1, tf.int16, name = "my_scalar")
print(r2)
The output of the above code will be:
Tensor("Const_1:0", shape=(), dtype=int16)
Tensor("my_scalar:0", shape=(), dtype=int16)

To create a tensor with decimal or string values:


import tensorflow as tf
# Decimal
r1_decimal = tf.constant(1.12345, tf.float32)
print(r1_decimal)
# String
r1_string = tf.constant("Guru99", tf.string)
print(r1_string)
The output of the above code will be:
Tensor("Const_2:0", shape=(), dtype=float32)
Tensor("Const_3:0", shape=(), dtype=string)

To create tensors of dimension 1 and higher:


import tensorflow as tf

r2_boolean = tf.constant([True, True, False], tf.bool)


print(r2_boolean)
## Rank 2
r2_matrix = tf.constant([ [1, 2],
[3, 4] ],tf.int16)
print(r2_matrix)
The output of the above code will be:
Tensor("Const_4:0", shape=(3,), dtype=bool)
Tensor("Const_5:0", shape=(2, 2), dtype=int16)

Tensor Attributes
Given below is a list of commonly used tensor attributes

tensorflow.shape
It is used for returning the shape of the tensor
import tensorflow as tf

# Shape of tensor
m_shape = tf.constant([ [10, 11],
[12, 13],
[14, 15] ]
)

m_shape.shape
The output of the above code will be TensorShape([Dimension(3), Dimension(2)])

tensorflow.zeros
It is used for creating a tensor of the given dimension with all elements being zero
import tensorflow as tf
# Create a vector of 0
print(tf.zeros(10))
The output of the above code will be Tensor("zeros:0", shape=(10,), dtype=float32)

tensorflow.ones
It is used for creating a tensor of the given dimension with all elements being one
import tensorflow as tf
m_shape = tf.constant([ [10, 11],
                        [12, 13],
                        [14, 15] ]
                      )
# Create a 10x10 matrix of ones
print(tf.ones([10, 10]))
# Create a vector of ones with the same number of rows as m_shape
print(tf.ones(m_shape.shape[0]))
# Create a vector of ones with the same number of columns as m_shape
print(tf.ones(m_shape.shape[1]))
print(tf.ones(m_shape.shape))
The output of the above code will be:
Tensor("ones_1:0", shape=(10, 10), dtype=float32)
Tensor("ones_2:0", shape=(3,), dtype=float32)
Tensor("ones_3:0", shape=(2,), dtype=float32)
Tensor("ones_4:0", shape=(3, 2), dtype=float32)

tensorflow.dtype
It is used to find the data type of the elements of the tensor
import tensorflow as tf
m_shape = tf.constant([ [10, 11],
[12, 13],
[14, 15] ]
)
print(m_shape.dtype)
The output of the above code will be <dtype: 'int32'>
import tensorflow as tf

# Change type of data


type_float = tf.constant(3.123456789, tf.float32)
type_int = tf.cast(type_float, dtype=tf.int32)
print(type_float.dtype)
print(type_int.dtype)
The output of the above code will be:
<dtype: 'float32'>
<dtype: 'int32'>

TensorFlow Useful Functions


Following are some mathematical functions that are useful for manipulating the tensor

• tensorflow.add(a, b)
• tensorflow.subtract(a, b)
• tensorflow.multiply(a, b)
• tensorflow.div(a, b)
• tensorflow.pow(a, b)
• tensorflow.exp(a)
• tensorflow.sqrt(a)
import tensorflow as tf

x = tf.constant([2.0], dtype = tf.float32)


tensor_a = tf.constant([[1,2]], dtype = tf.int32)
tensor_b = tf.constant([[3, 4]], dtype = tf.int32)

# Square Root
print(tf.sqrt(x))
# Exponential
print(tf.exp(x))
# Power
print(tf.pow(x, x))
# Add
tensor_add = tf.add(tensor_a, tensor_b)
print(tensor_add)
# Subtract
tensor_sub = tf.subtract(tensor_a, tensor_b)
print(tensor_sub)
# Multiply
tensor_mul = tf.multiply(tensor_a, tensor_b)
print(tensor_mul)
# Divide
tensor_div = tf.div(tensor_a, tensor_b)
print(tensor_div)
The above code demonstrates the use of all the above-mentioned TensorFlow functions.

TensorFlow Variables
To create variables in TensorFlow we use tensorflow.get_variable()
Syntax
tf.get_variable(name = "", values, dtype, initializer)
Arguments
• name : Name of the variable
• values: Dimension of the tensor
• dtype: Type of data. Optional
• initializer: How to initialize the tensor. Optional
If initializer is specified, there is no need to include the `values` as the shape of `initializer` is
used.

import tensorflow as tf

# Create a Variable
var = tf.get_variable("var", [1, 2])
print(var)

# The following initializes the variable with an initial/default value
var_init_1 = tf.get_variable("var_init_1", [1, 2], dtype=tf.int32,
                             initializer=tf.zeros_initializer)
print(var_init_1)

# Initializes the first value of the tensor to equal tensor_const
tensor_const = tf.constant([[10, 20], [30, 40]])
var_init_2 = tf.get_variable("var_init_2", dtype=tf.int32,
                             initializer=tensor_const)
print(var_init_2)
The output of the above code will be:
<tf.Variable 'var:0' shape=(1, 2) dtype=float32_ref>
<tf.Variable 'var_init_1:0' shape=(1, 2) dtype=int32_ref>
<tf.Variable 'var_init_2:0' shape=(2, 2) dtype=int32_ref>

TensorFlow Placeholder
A placeholder has the purpose of feeding the tensor; it is used to initialize the data that will flow
inside the tensors. To supply data to a placeholder, you use the feed_dict argument; the
placeholder is fed only within a session.
Syntax
tf.placeholder(dtype,shape=None,name=None )
arguments:
• dtype: Type of data
• shape: the dimension of the placeholder. Optional. By default, the shape of the data
• name: Name of the placeholder. Optional
import tensorflow as tf
data_placeholder_a = tf.placeholder(tf.float32, name="data_placeholder_a")
print(data_placeholder_a)
The output of the above code will be Tensor("data_placeholder_a:0", dtype=float32)

TensorFlow Session
In the following, we demonstrate using a TensorFlow session.
import tensorflow as tf

## Create, run and evaluate a session


x = tf.constant([2])
y = tf.constant([4])
## Create operator
multiply = tf.multiply(x, y)

## Create a session to run the code
sess = tf.Session()
result_1 = sess.run(multiply)
print(result_1)
sess.close()
The output of the above code will be [8]

Simple Python TensorFlow Program


import numpy as np
import tensorflow as tf
In the above code, we are importing numpy and TensorFlow and also renaming them as np and
tf respectively.
X_1 = tf.placeholder(tf.float32, name = "X_1")
X_2 = tf.placeholder(tf.float32, name = "X_2")
In the above code, we define the two placeholders X_1 and X_2. When we create a
placeholder node, we have to pass in the data type; we will be working with floating-point
numbers here, so we use tf.float32. We also need to give the node a name; this name will
show up when we look at graphical visualizations of our model.
multiply = tf.multiply(X_1, X_2, name = "multiply")
In the above code, we define the node that does the multiplication operation. In TensorFlow we
can do that by creating a tf.multiply node. This node will produce the product of X_1 and X_2.
with tf.Session() as session:
    result = session.run(multiply, feed_dict={X_1: [1, 2, 3], X_2: [4, 5, 6]})
    print(result)
To execute operations in the graph, we have to create a session; in TensorFlow, this is done
with tf.Session(). Now that we have a session, we can ask it to run operations on our
computational graph by calling session.run.
When the multiplication operation runs, it will see that it needs to grab the values of the X_1
and X_2 nodes, so we need to feed in values for X_1 and X_2. We do that by supplying the
feed_dict parameter, passing the values 1, 2, 3 for X_1 and 4, 5, 6 for X_2.

Simple_TensorFlow.py
import numpy as np
import tensorflow as tf

X_1 = tf.placeholder(tf.float32, name = "X_1")


X_2 = tf.placeholder(tf.float32, name = "X_2")

multiply = tf.multiply(X_1, X_2, name = "multiply")


with tf.Session() as session:
    result = session.run(multiply, feed_dict={X_1: [1, 2, 3], X_2: [4, 5, 6]})
    print(result)
The above consolidated program will give the following result: [ 4. 10. 18.]

Methods to Load Data using Python TensorFlow
There are two ways to load data, they are as follows:

Load Data using NumPy Array


We can hard-code data into a NumPy array, or load data from an xls, xlsx, or CSV file into a
Pandas DataFrame, which can then be converted into a NumPy array. You can use this method
if your dataset is not too big, i.e., less than 10 gigabytes, so that the data can fit into memory.
## Numpy to pandas
import numpy as np
import pandas as pd

h = [[1,2],[3,4]]
df_h = pd.DataFrame(h)
print('Data Frame:', df_h)

## Pandas to numpy
df_h_n = np.array(df_h)
print('Numpy array:', df_h_n)
The output of the above code will be:
Data Frame:    0  1
0  1  2
1  3  4
Numpy array: [[1 2]
 [3 4]]

Load Data using TensorFlow Data Pipeline


TensorFlow has a built-in API that helps you load the data, perform operations, and feed the
machine learning algorithm easily. This method works very well when you have a large dataset.
For instance, image records are known to be enormous and do not fit into memory; the data
pipeline manages the memory by itself.
This method works best if you have a large dataset. For instance, if you have a dataset of 50
gigabytes and your computer has only 16 gigabytes of memory, then the machine will crash.
In this situation, you need to build a TensorFlow pipeline. The pipeline will load the data in
batches, or small chunks. Each batch will be pushed to the pipeline and be ready for training.
Building a pipeline is an excellent solution because it allows you to use parallel computing:
TensorFlow can train the model across multiple CPUs, which speeds up the computation and
permits training powerful neural networks.

Methods to create TensorFlow Data Pipeline


Create the Data
import numpy as np
import tensorflow as tf
x_input = np.random.sample((1,2))
print(x_input)
In the above code, we generate two random numbers using NumPy's random number
generator.

Create the Placeholder


x = tf.placeholder(tf.float32, shape=[1,2], name = 'X')
We are creating a placeholder using the tf.placeholder()

Define the Dataset Method
dataset = tf.data.Dataset.from_tensor_slices(x)
We define the dataset method as tf.data.Dataset.from_tensor_slices()

Create the Pipeline


iterator = dataset.make_initializable_iterator()
get_next = iterator.get_next()
In the above code, we initialize the pipeline through which the data will flow. We create an
iterator with make_initializable_iterator and name it iterator. Then we call iterator.get_next
to feed the next batch of data, and name this step get_next. Note that in our example,
there is only one batch of data, with only two values.

Execute the Operation


with tf.Session() as sess:
# feed the placeholder with data
sess.run(iterator.initializer, feed_dict={ x: x_input })
print(sess.run(get_next)) # output [ 0.52374458 0.71968478]
In the above code, we start a session and run the iterator's initializer, feeding feed_dict with the
values generated by NumPy. These two values populate the placeholder x. Then we run
get_next to print the result.

TensorFlow_Pipeline.py
import numpy as np
import tensorflow as tf
x_input = np.random.sample((1,2))
print(x_input)
# using a placeholder
x = tf.placeholder(tf.float32, shape=[1,2], name = 'X')
dataset = tf.data.Dataset.from_tensor_slices(x)
iterator = dataset.make_initializable_iterator()
get_next = iterator.get_next()
with tf.Session() as sess:
# feed the placeholder with data
sess.run(iterator.initializer, feed_dict={ x: x_input })
print(sess.run(get_next))
The output of the above code will be [[0.87908525 0.80727791]] [0.87908524 0.8072779]

Conclusion
In this chapter, we studied Python TensorFlow: key terms, the advantages and disadvantages of
TensorFlow, the TensorFlow architecture, the life cycle of a TensorFlow servable, installing
TensorFlow, commonly implemented algorithms in TensorFlow, creating a tensor, tensor
attributes, useful TensorFlow functions, TensorFlow variables, placeholders, and sessions, a
simple Python TensorFlow program, methods to load data, methods to create a TensorFlow
data pipeline, and the Python implementation of these functionalities. We hope everything was
clear; for any doubts, please leave a comment with your query.

OUR MISSION
Free Education is Our Basic Need! Our mission is to empower millions of developers worldwide by
providing the latest unbiased news, advice, and tools for learning, sharing, and career growth. We're
passionate about nurturing the next young generation, helping them not only to become great
programmers, but also exceptional human beings.

ABOUT US
CSharp Inc, headquartered in Philadelphia, PA, is an online global community of software
developers. C# Corner served 29.4 million visitors in year 2022. We publish the latest news and articles
on cutting-edge software development topics. Developers share their knowledge and connect via
content, forums, and chapters. Thousands of members benefit from our monthly events, webinars,
and conferences. All conferences are managed under Global Tech Conferences, a CSharp
Inc sister company. We also provide tools for career growth such as career advice, resume writing,
training, certifications, books and white-papers, and videos. We also connect developers with their poten-
tial employers via our Job board. Visit C# Corner
