Python Machine Learning Workbook For Beginners
In this chapter, you will see how to set up the Python environment needed to run
various data science and machine learning libraries. The chapter also contains a
Python crash course for absolute beginners. Finally, the different data
science and machine learning libraries that we are going to study in this book
have been discussed. The chapter ends with a simple exercise.
Before you delve deep into developing data science and machine learning
applications, you have to know what the field of data science and machine
learning is, what you can do with it, and what some of the best tools and
libraries are that you can use.
If you wish to be a data science and machine learning expert, you have to learn
programming. There is no working around this fact. Although there are several
cloud-based machine learning platforms like Amazon SageMaker and Azure
ML Studio where you can create data science applications without writing a
single line of code, to get fine-grained control over your applications,
you will need to learn programming.
I would recommend that you do not start developing full-fledged data science
applications right away. Instead, start with
basic mathematical and numerical operations like computing dot products and
matrix multiplication, etc.
Once you are familiar with basic machine learning and deep learning
algorithms, you are good to go for developing data science applications. Data
science applications can be of different types, i.e., predicting house prices,
recognizing images, classifying text, etc. Being a beginner, you should try to
develop versatile data science applications, and later, when you find your area
of interest, e.g., natural language processing or image recognition, delve deep
into that. It is important to mention that this book provides a very generic
introduction to data science, and you will see applications of data science to
structured data, textual data, and image data. However, this book is not
dedicated to any specific data science field.
The time has come to install Python using an IDE. In fact, we will
use Anaconda throughout this book, right from installing Python to writing and
running the code in the coming chapters. Now, let us get going with the
installation.
1.3.1. Windows Setup
This section explains how you can download and install Anaconda on
Windows.
2. The browser will take you to the following webpage. Select the latest version of
Python (3.7 at the time of writing this book). Now, click the Download button to
download the executable file. Depending upon the speed of your internet, the
file will download within 2–3 minutes.
3. Run the executable file after the download is complete. You will most likely
find the downloaded file in your download folder. The name of the file should be
similar to "Anaconda3-5.1.0-Windows-x86_64." The installation wizard will
open when you run the file, as shown in the following figure. Click the Next
button.
4. Now click I Agree on the License Agreement dialog as shown in the following
screenshot.
5. Check the Just Me radio button from the Select Installation Type dialog box.
Click the Next button to continue.
6. Now, the Choose Install Location dialog will be displayed. Change the directory
if you want, but the default is preferred. The installation folder should at least
have 3 GB of free space for Anaconda. Click the Next button.
7. Go for the second option, Register Anaconda as my default Python 3.7, in the
Advanced Installation Options dialog box. Click the Install button to start the
installation, which can take some time to complete.
8. Click Next once the installation is complete.
9. Click Skip on the Microsoft Visual Studio Code Installation dialog box.
10. You have successfully installed Anaconda on your Windows. Excellent job. The
next step is to uncheck both checkboxes on the dialog box. Now, click on the
Finish button.
1.3.2. Mac Setup
Anaconda’s installation process is almost the same for Mac. It may differ
graphically, but you will follow the same steps you followed for Windows. The
only difference is that you have to download the executable file, which is
compatible with the Mac operating system.
This section explains how you can download and install Anaconda on Mac.
2. The browser will take you to the following webpage. Select the latest version of
Python for Mac (3.7 at the time of writing this book). Now, click the Download
button to download the executable file. Depending upon the speed of your
internet, the file will download within 2–3 minutes.
3. Run the executable file after the download is complete. You will most likely
find the downloaded file in your download folder. The name of the file should be
similar to "Anaconda3-5.1.0-MacOSX-x86_64.pkg." The installation wizard will
open when you run the file, as shown in the following figure. Click the
Continue button.
1.3.3. Linux Setup
This section explains how you can download and install Anaconda on Linux.
$ cd /tmp
$ curl -O https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
3. You should also use the cryptographic hash verification through the SHA-256
checksum to verify the integrity of the installer.
$ sha256sum Anaconda3-5.2.0-Linux-x86_64.sh
09f53738b0cd3bb96f5b1bac488e5528df9906be2480fe61df40e0e0d19e3d48 Anaconda3-5.2.0-Linux-
x86_64.sh
4. The fourth step is to run the Anaconda script, as shown in the following figure:
$ bash Anaconda3-5.2.0-Linux-x86_64.sh
The command line will generate the following output. You will be asked to
review the license agreement. Keep on pressing Enter until you reach the
end.
Output
Welcome to Anaconda3 5.2.0
In order to continue the installation process, please review the license agreement.
Please, press Enter to continue
>>>
…
Do you approve the license terms? [yes|No]
Type yes when you get to the bottom of the License Agreement.
5. The installer will ask you to choose the installation location after you agree to
the license agreement. Simply press Enter to choose the default location. You
can also specify a different location if you want.
Output
[/home/tola/anaconda3] >>>
The installation will proceed once you press Enter. Once again, you have
to be patient as the installation process takes some time to complete.
6. You will receive the following result when the installation is complete. If you
wish to use the conda command, type Yes.
Output
…
Installation finished.
Do you wish the installer to prepend Anaconda3 install location to path in your /home/tola/.bashrc?
[yes|no]
[no]>>>
At this point, you will also have the option to download Visual Studio
Code. Type yes or no to install or decline, respectively.
7. Use the following command to activate your brand new installation of
Anaconda3.
$ source ~/.bashrc
8. You can also test the installation using the conda command.
$ conda list
In addition to local Python environments such as Anaconda, you can run deep
learning applications on Google Colab, as well, which is Google’s platform for
deep learning with GPU support. All the code in this book has been run using
Google Colab. Therefore, I would suggest that you use Google Colab, too.
To run deep learning applications via Google Colab, all you need is a
Google/Gmail account. Once you have a Google/ Gmail account, you can
simply go to:
https://colab.research.google.com/
Next, click on File -> New notebook, as shown in the following screenshot.
Next, to run your code using GPU, from the top menu, select Runtime ->
Change runtime type, as shown in the following screenshot:
You should see the following window. Here from the dropdown list, select
GPU, and click the Save button.
To make sure you are running the latest version of TensorFlow, execute the
following script in the Google Colab notebook cell. The following script will
update your TensorFlow version.
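The upgrade script itself is not reproduced in this extract. In a Colab notebook cell, a command of the following form would perform the update (a minimal sketch, assuming you simply want the latest stable release from PyPI):
!pip install --upgrade tensorflow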
To check if you are really running TensorFlow version > 2.0, execute the
following script.
1. import tensorflow as tf
2. print (tf.__version__)
With Google Colab, you can import datasets from your Google Drive.
Execute the following script and click on the link that appears, as shown
below:
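The mounting script is not reproduced in this extract. A minimal sketch that mounts your Google Drive in Colab and produces the authorization link described below (the mount point /content/drive is an assumption):
from google.colab import drive
drive.mount('/content/drive')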
You will be prompted to allow Google Colab to access your Google drive. Click
the Allow button, as shown below:
You will see a link appear, as shown in the following image (the link has been
blinded here).
Copy the link and paste it in the empty field in the Google Colab cell, as shown
below:
This way, you can import datasets from your Google drive to your Google
Colab environment.
In the next chapter, you will see how to write your first program in Python,
along with other Python programming concepts.
CHAPTER
If you are familiar with the elementary concepts of Python programming language,
you can skip this chapter. For those who are absolute beginners to Python, this
section provides a very brief overview of some of the most basic concepts of
Python. Python is a very vast programming language, and this section is by no
means a substitute for a complete Python Book. However, if you want to see how
various operations and commands are executed in Python, you are welcome to
follow along the rest of this section.
Jupyter Notebook consists of cells, as evident from the above image, making its
layout very simple and straightforward. You will write your code inside these cells.
Let us write our first ever Python program in Jupyter Notebook.
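The listing itself is not reproduced in this extract; based on the output shown below, a minimal sketch is a single call to the print() method:
print("Welcome to Data Visualization with Python")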
The above script basically prints a string value in the output using the print()
method. The print() method prints any string passed to it to the console.
If you see the following output, you have successfully run your first Python
program.
Output:
Welcome to Data Visualization with Python
Let’s now explore some of the other important Python concepts starting with
Variables and Data Types.
Each script in this book has been executed via Jupyter Notebook. So, install
Jupyter Notebook on your system.
The Numpy and Pandas libraries should also be installed before this chapter.
In this section, you will see the following basic Python data types:
a. Strings
b. Integers
c. Floats
d. Booleans
e. Lists
f. Tuples
g. Dictionaries
A variable is an alias for the memory address where actual data is stored. The data
or the values stored at a memory address can be accessed and updated via the
variable name. Unlike other programming languages like C++, Java, and C#,
Python is loosely typed, which means that you don’t have to specify the data type
while creating a variable. Rather, the type of data is evaluated at runtime.
The following example shows how to create different data types and how to store
them in their corresponding variables. The script also prints the type of the
variables via the type() function.
Script 2:
1. # A string Variable
2. first_name = “Joseph”
3. print (type(first_name))
4.
5. # An Integer Variable
6. age = 20
7. print (type(age))
8.
9. # A floating point variable
10. weight = 70.35
11. print (type(weight))
12.
13. # A boolean variable
14. married = False
15. print (type(married))
16.
17. #List
18. cars = [“Honda” , “Toyota” , “Suzuki” ]
19. print (type(cars))
20.
21. #Tuples
22. days = (“Sunday” , “Monday” , “Tuesday” , “Wednesday” , “Thursday” , “Friday” , “Saturday” )
23. print (type(days))
24.
25. #Dictionaries
26. days2 = {1:”Sunday” , 2:”Monday” , 3:”Tuesday” , 4:”Wednesday” , 5:”Thursday” , 6:”Friday” ,
7:”Saturday” }
27. print (type(days2))
Output:
<class ‘str’>
<class ‘int’>
<class ‘float’>
<class ‘bool’>
<class ‘list’>
<class ‘tuple’>
<class ‘dict’>
In this section, you will study the following Python operators:
a. Arithmetic Operators
b. Logical Operators
c. Comparison Operators
d. Assignment Operators
e. Membership Operators
Arithmetic Operators
Arithmetic operators are used to perform common mathematical operations such as addition, subtraction, multiplication, division, and exponentiation. Script 3 shows them in action. Here, X is 20, and Y is 10.
Script 3:
1. X = 20
2. Y = 10
3. print (X + Y)
4. print (X - Y)
5. print (X * Y)
6. print (X / Y)
7. print (X ** Y)
Output:
30
10
200
2.0
10240000000000
Logical Operators
Logical operators are used to perform logical AND, OR, and NOT operations in
Python. The following table summarizes the logical operators. Here, X is True, and
Y is False.
Here is an example that explains the usage of the Python logical operators.
Script 4:
1. X = True
2. Y = False
3. print (X and Y)
4. print (X or Y)
5. print (not (X and Y))
Output:
1. False
2. True
3. True
Comparison Operators
Comparison operators, as the name suggests, are used to compare two or more than
two operands. Depending upon the relation between the operands, comparison
operators return Boolean values. The following table summarizes comparison
operators in Python. Here, X is 20, and Y is 35.
Script 5:
1. X = 20
2. Y = 35
3.
4. print (X == Y)
5. print (X != Y)
6. print (X > Y)
7. print (X < Y)
8. print (X >= Y)
9. print (X <= Y)
Output:
False
True
False
True
False
True
Assignment Operators
Assignment operators are commonly used to assign values to variables. The
following table summarizes the assignment operators. Here, X is 20, and Y is equal
to 10.
Take a look at script 6 to see Python assignment operators in action.
Script 6:
1. X = 20; Y = 10
2. R = X + Y
3. print (R)
4.
5. X = 20;
6. Y = 10
7. X += Y
8. print (X)
9.
10. X = 20;
11. Y = 10
12. X -= Y
13. print (X)
14.
15. X = 20;
16. Y = 10
17. X *= Y
18. print (X)
19.
20. X = 20;
21. Y = 10
22. X /= Y
23. print (X)
24.
25. X = 20;
26. Y = 10
27. X %= Y
28. print (X)
29.
30. X = 20;
31. Y = 10
32. X **= Y
33. print (X)
Output:
30
30
10
200
2.0
0
10240000000000
Membership Operators
Membership operators, i.e., in and not in, are used to check whether an item exists in a collection such as a list or a tuple. For example, the following script checks whether the string 'Sunday' exists in the days tuple.
Script 7:
1. days = (“Sunday” , “Monday” , “Tuesday” , “Wednesday” , “Thursday” , “Friday” , “Saturday” )
2. print (‘Sunday’ in days)
Output:
True
Script 8:
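# Note: the original listing for Script 8 is not reproduced in this extract.
# A plausible sketch that demonstrates the not in membership operator and
# produces the output shown below (the item being checked is an assumption):
days = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
print("Someday" not in days)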
Output:
True
b. If-else statement
c. If-elif statement
IF Statement
If you have to check for a single condition and you are not concerned about the
alternate condition, you can use the if statement. For instance, if you want to check
if 10 is greater than 5 and, based on that, you want to print a statement, you can use
the if statement. The condition evaluated by the if statement returns a Boolean
value. If the condition evaluated by the if statement is true, the code block that
follows the if statement executes. It is important to mention that in Python, a new
code block starts at a new line indented one tab to the right of the outer block.
Here, in the following example, the condition 10 > 5 is evaluated, which returns
true. Hence, the code block that follows the if statement executes, and a message is
printed on the console.
Script 9:
1. # The if statement
2.
3. if 10 > 5:
4. print ("10 is greater than 5" )
Output:
10 is greater than 5
IF-Else Statement
The If-else statement comes in handy when you want to execute an alternate piece
of code in case the condition for the if statement returns false. For instance, in the
following example, the condition 5 < 10 will return false. Hence, the code block
that follows the else statement will execute.
Script 10:
1. # if-else statement
2.
3. if 5 > 10:
4. print (“5 is greater than 10” )
5. else:
6. print (“10 is greater than 5” )
Output:
10 is greater than 5
IF-Elif Statement
The if-elif statement comes in handy when you have to evaluate multiple
conditions. For instance, in the following example, we first check if 5 > 10, which
evaluates to false. Next, an elif statement evaluates the condition 8 < 4, which also
returns false. Hence, the code block that follows the last else statement executes.
Script 11:
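# Note: the original listing for Script 11 is not reproduced in this extract.
# A minimal sketch consistent with the description above: 5 > 10 is false,
# the elif condition 8 < 4 is also false, so the else block executes.
if 5 > 10:
    print("5 is greater than 10")
elif 8 < 4:
    print("8 is smaller than 4")
else:
    print("5 is not greater than 10, and 8 is not smaller than 4")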
Output:
Python provides the following two types of loops:
a. For Loop
b. While Loop
For Loop
The for loop is used to iteratively execute a piece of code a certain number of
times. You should use a for loop when you know exactly how many iterations or
repetitions you want to run your code for. A for loop iterates over a
collection of items. In the following example, we create a collection of five integers
using the range() method. Next, a for loop iterates five times and prints each
integer in the collection.
Script 12:
1. items = range(5)
2. for item in items:
3. print (item)
Output:
0
1
2
3
4
While Loop
The while loop keeps executing a certain piece of code unless the evaluation
condition becomes false. For instance, the while loop in the following script keeps
executing unless variable c becomes greater than 10.
Script 13:
1. c = 0
2. while c < 10:
3. print (c)
4. c = c +1
Output:
0
1
2
3
4
5
6
7
8
9
2.6. Functions
Functions in any programming language are typically used to implement the piece
of code that is required to be executed multiple times at different locations in the
code. In such cases, instead of writing long pieces of code again and again, you can
simply define a function that contains the piece of code, and then, you can call the
function wherever you want in the code.
The def keyword is used to create a function in Python, followed by the name of the
function and opening and closing parenthesis.
Once a function is defined, you have to call it in order to execute the code inside a
function body. To call a function, you simply have to specify the name of the
function, followed by opening and closing parenthesis. In the following script, we
create a function named myfunc, which prints a simple statement on the console
using the print() method.
Script 14:
1. def myfunc():
2. print (“This is a simple function” )
3.
4. ### function call
5. myfunc()
Output:
This is a simple function
You can also pass values to a function. The values are passed inside the parenthesis
of the function call. However, you must specify the parameter name in the function
definition, too. In the following script, we define a function named
myfuncparam(). The function accepts one parameter, i.e., num. The value passed
in the parenthesis of the function call will be stored in this num variable and will be
printed by the print() method inside the myfuncparam() method.
Script 15:
1. def myfuncparam(num):
2. print (“This is a function with parameter value: “+num)
3.
4. ### function call
5. myfuncparam(“Parameter 1” )
Output:
This is a function with parameter value: Parameter 1
Finally, a function can also return values to the function call. To do so, you simply
have to use the return keyword, followed by the value that you want to return. In
the following script, the myreturnfunc() function returns a string value to the
calling function.
Script 16:
1. def myreturnfunc():
2. return “This function returns a value”
3.
4. val = myreturnfunc()
5. print (val)
Output:
This function returns a value
For instance, a car can be implemented as an object since a car has some attributes
such as price, color, model, and can perform some functions such as drive car,
change gear, stop car, etc.
Similarly, a fruit can also be implemented as an object since a fruit has a price,
name, and you can eat a fruit, grow a fruit, and perform functions with a fruit.
To create an object, you first have to define a class. For instance, in the following
example, a class Fruit has been defined. The class has two attributes, name and
price, and one method, eat_fruit(). Next, we create an object f of class Fruit, and
then, we call the eat_fruit() method from the f object. We also access the name and
price attributes of the f object and print them on the console.
Script 17:
1. class Fruit:
2.
3. name = “apple”
4. price = 10
5.
6. def eat_fruit(self):
7. print (“Fruit has been eaten” )
8.
9.
10. f = Fruit()
11. f.eat_fruit()
12. print (f.name)
13. print (f.price)
Output:
Fruit has been eaten
apple
10
A class in Python can have a special method called the constructor. The name of
the constructor method in Python is __init__() . The constructor is called whenever
an object of a class is created. Look at the following example to see the constructor
in action.
Script 18:
1. class Fruit:
2.
3. name = “apple”
4. price = 10
5.
6. def __init__(self, fruit_name, fruit_price):
7. Fruit.name = fruit_name
8. Fruit.price = fruit_price
9.
10. def eat_fruit(self):
11. print (“Fruit has been eaten” )
12.
13.
14. f = Fruit(“Orange” , 15)
15. f.eat_fruit()
16. print (f.name)
17. print (f.price)
Output:
Fruit has been eaten
Orange
15
2.8.1. NumPy
NumPy is one of the most commonly used libraries for numeric and scientific
computing. NumPy is extremely fast and contains support for multiple
mathematical domains, such as linear algebra, geometry, etc. It is extremely
important to learn NumPy if you plan to make a career in data science and
data preparation.
2.8.2. Matplotlib
Matplotlib is the de facto standard for static data visualization in Python, which is
the first step in data science and machine learning. Being the oldest data
visualization library in Python, Matplotlib is the most widely used data
visualization library.
Matplotlib was developed to resemble MATLAB, which is one of the most widely
used programming languages in academia. While Matplotlib graphs are easy to
plot, their look and feel has a distinct 1990s flavor.
Many wrapper libraries like Pandas and Seaborn have been developed on top of
Matplotlib. These libraries allow users to plot much cleaner and sophisticated
graphs.
2.8.3. Seaborn
Seaborn library is built on top of the Matplotlib library and contains all the plotting
capabilities of Matplotlib. However, with Seaborn, you can plot much more
pleasing and aesthetic graphs with the help of Seaborn default styles and color
palettes.
2.8.4. Pandas
The Pandas plotting functionality, like Seaborn, is built on top of the Matplotlib library and offers
utilities that can be used to plot different types of static plots in a single line of code. With
Pandas, you can import data in various formats such as CSV (Comma Separated
Values) and TSV (Tab Separated Values) and can plot a variety of data visualizations
via these data sources.
2.8.5. Scikit-Learn
Scikit-Learn, also called sklearn, is an extremely useful library for data science and
machine learning in Python. Sklearn contains many built-in modules that can be
used to perform data preparation tasks, such as feature engineering, feature scaling,
outlier detection, discretization, etc. You will be using Sklearn a lot in this book.
Therefore, it can be a good idea to study sklearn before you start coding using this
book.
2.8.6. TensorFlow
TensorFlow is one of the most frequently used libraries for deep learning. TensorFlow
has been developed by Google and offers an easy to use API for the development of
various deep learning models. TensorFlow is consistently being updated, and at the
time of writing of this book, TensorFlow 2 is the latest major release of
TensorFlow. With TensorFlow, you can not only easily develop deep learning
applications, but you can also deploy them with ease, owing to the deployment
functionalities of TensorFlow.
2.8.7. Keras
Keras is a high-level neural network API that now ships with TensorFlow as tf.keras. It offers a simple, layer-based interface for building deep learning models and is the API you will use for the neural network projects later in this book.
You are now familiar with basic Python concepts. In the next section, you will start
working on your first machine learning project, where you will predict house prices
using linear regression in Scikit learn.
Exercise: Chapter 2.1
Question 1
Which iteration should be used when you want to repeatedly execute a code a
specific number of times?
A. For Loop
B. While Loop
C. Both A & B
D. None of the above
Question 2
What is the maximum number of values that a function can return in Python?
A. Single Value
B. Double Value
Question 3
B. Out
C. Not In
D. Both A and C
PROJECT
Machine learning algorithms can be, on the whole, categorized into two types:
Supervised learning and unsupervised learning algorithms.
Supervised machine learning algorithms are those algorithms where the input
dataset and the corresponding output or true prediction is available, and the
algorithms try to find the relationship between inputs and outputs.
In this section, you will see how to predict the median value of house prices in
different towns around Boston, a city in the US state of Massachusetts, using a linear regression
algorithm implemented in Python's Scikit-Learn library. So, let's begin without
much ado.
Before you can go on and train a linear regression algorithm for house price
prediction, you need to install a few libraries. On your command terminal, execute
the following commands to install the required libraries. You will see the
functionalities of these libraries later in this project.
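The exact commands are not reproduced in this extract. Based on the libraries imported in Script 1, a minimal set of install commands would look like the following:
$ pip install numpy pandas matplotlib seaborn scikit-learn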
Script 1:
1. import numpy as np
2. import pandas as pd
3. import matplotlib.pyplot as plt
4. import seaborn as sns
5. from sklearn.model_selection import train_test_split
6. from sklearn.linear_model import LinearRegression
7. from sklearn import metrics
8.
9. %matplotlib inline
The dataset is also available by the name: BostonHousing.csv in the Datasets folder
in GitHub and SharePoint repositories. Download the dataset to your local file
system, and use the read_csv() method of the Pandas library to read the dataset into
a Pandas dataframe, as shown in the following script. The script also prints the first
five rows of the dataset using the head() method.
Script 2:
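# Note: the original listing for Script 2 is not reproduced in this extract.
# A plausible sketch, assuming the file has been downloaded to the working
# directory; the variable name housing_dataset is taken from the later scripts.
housing_dataset = pd.read_csv("BostonHousing.csv")
housing_dataset.head()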
Output:
Column Name: Description
CRIM: Per capita crime rate by town
ZN: Proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS: Proportion of non-retail business acres per town
CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX: Nitric oxide concentration (parts per 10 million)
RM: Average number of rooms per dwelling
AGE: Proportion of owner-occupied units built prior to 1940
DIS: Weighted distances to five Boston employment centers
RAD: Index of accessibility to radial highways
TAX: Full-value property-tax rate per $10,000
PTRATIO: Pupil-teacher ratio by town
B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
LSTAT: Percentage of the lower status of the population
MEDV: Median value of owner-occupied homes in $1000s (target)
The MEDV column contains the median value of owner-occupied houses in $1000s,
and this is the value that we will be predicting based on the values in the other
columns using linear regression.
We can also check the number of records in our dataset using the shape attribute.
Script 3:
1. housing_dataset.shape
The output shows that we have 506 records and 14 columns in our dataset.
Output:
(506, 14)
Let’s first plot the correlation between all the columns in our dataset. You can do so
using the corr() function of a dataframe, as shown below:
Script 4:
1. plt.rcParams[“figure.figsize” ] = [8,6]
2. corr = housing_dataset.corr()
3. corr
The output below shows that the MEDV column has the highest positive
correlation of 0.695 with the RM column (average number of rooms per dwelling),
which makes sense since houses with more rooms tend to have higher prices. On
the other hand, the MEDV column has the highest negative correlation with the
LSTAT column, which corresponds to the percentage of the lower-status population, which
again makes sense since towns with a large ratio of lower-status population should
have cheaper houses.
Output:
In addition to plotting a table, you can also plot a heatmap that shows the
correlation between two columns in the form of boxes. To plot a heatmap, you need
to pass the output of the corr() function of the Pandas dataframe to the heatmap()
function of the Seaborn library, as shown below:
Script 5:
1. sns.heatmap(corr)
Output:
In the above heatmap, the lighter the box, the higher will be the positive correlation,
and the darker the box, the higher will be the negative correlation between columns.
Next, we divide the data into the feature set (X) and the labels (y), i.e., the medv column that we want to predict.
Script 6:
1. X = housing_dataset.drop([“medv” ], axis = 1)
2. y = housing_dataset.filter([“medv” ], axis = 1)
Now, if you plot the X dataframe, you will see the feature set, as shown below:
Script 7:
1. X.head()
Output:
Script 8:
1. y.head()
Output:
The model is trained on the train set and evaluated on the test set. To split the data
into training and test sets, you can use the train_test_split() function from the
Sklearn library, as shown below. The following script divides the data into 80
percent train set and 20 percent test set since the value for the test_size variable is
set to 0.2.
Script 9:
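# Note: the original listing for Script 9 is not reproduced in this extract.
# A plausible sketch of the 80/20 split described above; the random_state
# value is an assumption added for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)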
To implement linear regression with Sklearn, you can use the LinearRegression
class from the sklearn.linear_model module. To train the algorithm, the training
features and training labels, i.e., X_train and y_train in our case, are passed to the fit() method
of the object of the LinearRegression class. The test set is passed to the predict() method
of the class to make predictions. The process of training and making predictions
with the linear regression algorithm is as follows:
Script 10:
1. house_predictor = LinearRegression()
2. house_predictor.fit(X_train, y_train)
3. y_pred = house_predictor.predict(X_test)
Mean absolute error (MAE) is calculated by taking the average of absolute error
obtained by subtracting the real values from predicted values. The equation for
calculating MAE is given below:
Mean squared error (MSE) is similar to MAE. However, error for each record is
squared in the case of MSE in order to punish the data record with a huge
difference between the predicted and actual values. The equation to calculate the
mean squared error is as follows:
Root mean squared error (RMSE) is simply the square root of the mean squared error and can
be calculated as follows:
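The equation images from the original are not reproduced in this extract. In standard notation, with $y_i$ the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of test records, the three metrics are:

MAE = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|

MSE = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}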
The functions used to find the values for these metrics are available in the
sklearn.metrics module. The predicted and actual values have to be passed to these
functions, as shown in the following script.
Script 11:
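# Note: the original listing for Script 11 is not reproduced in this extract.
# A plausible sketch that computes the three regression metrics using the
# sklearn.metrics module imported in Script 1.
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))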
Output:
The actual and predicted values for the test set can be plotted side by side using the
following script:
Script 12:
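# Note: the original listing for Script 12 is not reproduced in this extract.
# One plausible way to show actual and predicted values side by side is a
# small comparison dataframe (the variable name comparison_df is an assumption).
comparison_df = pd.DataFrame({'Actual': y_test.values.ravel(), 'Predicted': y_pred.ravel()})
comparison_df.head(10)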
Output:
You can also print the coefficients of the trained linear regression
model to actually see how the linear regression algorithm is making predictions
on the test set. To print the linear regression coefficients, you can use the coef_
attribute of the LinearRegression object.
Script 13:
1. print (house_predictor.coef_)
Output:
Script 14:
1. X_test.values[1].shape
From the output below, you can see that this single record has one dimension.
Output:
(13,)
To make predictions on a single record, the feature vector for the record should be
in the form of a row vector. You can convert the feature vector for a single record
into a row vector using the reshape(1,-1) method, as shown below:
Script 15:
1. single_point = X_test.values[1].reshape(1,-1)
2. single_point.shape
The output shows that the shape of the feature has now been updated to a row
vector.
Output:
(1, 13)
To make predictions, you simply have to pass the row feature vector to the predict()
method of the trained linear regressor, as shown below:
Script 16:
1. house_predictor.predict(X_test.values[1].reshape(1,-1))
Output:
array([[36.02556534]])
Let’s now print the actual median value for house price for the feature index 1 of
the test set.
Script 17:
y_test.values[1]
The actual value is 32.4 thousand dollars, which means that our prediction is off by
roughly 3.6 thousand dollars.
Output:
array([32.4])
You can try other regression algorithms from the Sklearn library located at this link
(https://scikit-learn.org/stable/supervised_learning.html ) and see if you can get a
lesser error.
Question 1:
Which attribute of the LinearRegression class is used to print the linear regression
coefficients of a trained algorithm?
A. reg_coef
B. coefficients
C. coef_
D. None of the above
Question 2:
To make the prediction on a single data point, the data features should be in the
form of a__:
A. column vector
B. row vector
Question 3:
If you have used Gmail, Yahoo, or any other email service, you would have noticed
that some emails are automatically marked as spam by email engines. These spam
email detectors are based on rule-based and statistical machine learning approaches.
Spam email filtering is a text classification task, where based on the text of the
email, we have to classify whether or not an email is a spam email. Supervised
machine learning is commonly used for classification, particularly if the true
outputs are available in the dataset.
The Naïve Bayes Algorithm is one of the supervised machine learning algorithms
that have been proven to be effective for spam email detection. In this project, you
will see how to detect spam emails using the Naïve Bayes algorithm implemented
via Python’s Sklearn library.
To install the libraries required for this project, execute the following pip command
on your command terminals.
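The exact command is not reproduced in this extract. Based on the libraries imported in Script 1, a minimal install command would look like the following:
$ pip install numpy pandas nltk matplotlib seaborn scikit-learn wordcloud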
Script 1:
1. import numpy as np
2. import pandas as pd
3. import re
4. import nltk
5. import matplotlib.pyplot as plt
6. import seaborn as sns
7. from sklearn.naive_bayes import MultinomialNB
8. from wordcloud import WordCloud
9. %matplotlib inline
The dataset is also available by the name: emails.csv in the Datasets folder in
GitHub and SharePoint repositories. Download the dataset to your local file system
and use the read_csv() method of the Pandas library to read the dataset into a
Pandas dataframe, as shown in the following script. The script also prints the first
five rows of the dataset using the head() method.
Script 2:
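# Note: the original listing for Script 2 is not reproduced in this extract.
# A plausible sketch; the variable name message_dataset is taken from the
# later scripts.
message_dataset = pd.read_csv("emails.csv")
message_dataset.head()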
Output:
The above output shows that our dataset contains two columns: text and spam. The
text column contains the text of the emails, and the spam column contains the label 1 or 0,
where 1 corresponds to spam emails and 0 corresponds to non-spam or ham emails.
Script 3:
1. message_dataset.shape
Output:
(5728, 2)
2.4. Data Visualization
Data visualization is always a good step before training a machine learning model.
We will also do that.
Let’s plot a pie chart that shows the distribution of spam and non-spam emails in
our dataset.
Script 4:
1. plt.rcParams[“figure.figsize” ] = [8,10]
2. message_dataset.spam.value_counts().plot(kind=’pie’, autopct=’%1.0f%%’ )
Output:
From the above pie chart, you can see that 24 percent of the emails in our dataset
are spam emails.
Next, we will plot word clouds for the spam and non-spam emails in our dataset.
Word cloud is basically a kind of graph, which shows the most frequently occurring
words in the text. The higher the frequency of occurrence, the larger will be the size
of the word.
But first, we will remove all the stop words, such as “a, is, you, I, are, etc.,” from
our dataset because these words occur quite a lot, and they do not have any
classification ability. The following script removes all the stop words from the
dataset.
Script 5:
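# Note: the original listing for Script 5 is not reproduced in this extract.
# A plausible sketch that removes NLTK's English stop words and stores the
# result in a new column named text_without_sw (the column name is taken
# from Scripts 6 and 7).
nltk.download('stopwords')
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

def remove_stop_words(text):
    return ' '.join(word for word in text.split() if word.lower() not in stop_words)

message_dataset['text_without_sw'] = message_dataset['text'].apply(remove_stop_words)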
The following script filters spam messages from the dataset and then plots word
cloud using spam emails only.
Script 6:
1. message_dataset_spam = message_dataset[message_dataset[“spam” ] == 1]
2.
3. plt.rcParams[“figure.figsize” ] = [8,10]
4. text = ‘ ‘.join(message_dataset_spam[‘text_without_sw’ ])
5. wordcloud2 = WordCloud().generate(text)
6.
7. plt.imshow(wordcloud2)
8. plt.axis(“off” )
9. plt.show()
The output below shows that spam emails mostly contain a subject, and they also
contain terms like money, free, thank, account, program, service, etc.
Output:
Similarly, the following script plots a word cloud for non-spam emails.
Script 7:
1. message_dataset_ham = message_dataset[message_dataset[“spam” ] == 0]
2.
3. plt.rcParams[“figure.figsize” ] = [8,10]
4. text = ‘ ‘.join(message_dataset_ham[‘text_without_sw’ ])
5. wordcloud2 = WordCloud().generate(text)
6.
7. plt.imshow(wordcloud2)
8. plt.axis(“off” )
9. plt.show()
You can see that non-spam emails mostly contain informal words such as thank,
work, time, and need.
Output:
2.5. Cleaning the Data
Before training our machine learning model on the training data, we need to
remove the special characters and numbers from our text. Removing special
characters and numbers creates empty spaces in the text, which also need to be
removed.
Before cleaning the data, let’s first divide the data into the email text, which forms
the feature set (X), and the email labels (y), which contains information about
whether or not an email is a spam email.
Script 8:
1. X = message_dataset[“text” ]
2.
3. y = message_dataset[“spam” ]
The following script defines a clean_text() method, which accepts a text string and
returns a string that is cleaned of digits, special characters, and multiple empty
spaces.
Script 9:
1. def clean_text(doc):
2.
3.     # remove digits and special characters
4.     document = re.sub('[^a-zA-Z]', ' ', doc)
5.     # remove single characters left surrounded by spaces
6.     document = re.sub(r'\s+[a-zA-Z]\s+', ' ', document)
7.     # collapse multiple spaces into a single space
8.     document = re.sub(r'\s+', ' ', document)
9.
10.     return document
The following script calls the clean_text() method and preprocesses all the emails
in the dataset.
Script 10:
1. X_sentences = []
2. reviews = list(X)
3. for rev in reviews:
4. X_sentences.append(clean_text(rev))
Script 11:
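# Note: the original listing for Script 11 is not reproduced in this extract.
# Before training, the cleaned email texts must be converted into numeric
# feature vectors. A plausible sketch using CountVectorizer (the choice of
# vectorizer and the min_df value are assumptions; the object name vectorizer
# is taken from Script 17):
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=2)
X_vectors = vectorizer.fit_transform(X_sentences)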
Once the naive Bayes model is trained on the training set, the test set containing
only email texts is passed as inputs to the model. The model then predicts which of
the emails in the test set are spam. Predicted outputs for the test set are then
compared with the actual label in the test data in order to determine the
performance of the spam email detector naive Bayes model.
The following script divides the data into training and test sets.
Script 12:
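# Note: the original listing for Script 12 is not reproduced in this extract.
# A plausible sketch; the 80/20 ratio and random_state are assumptions.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_vectors, y, test_size=0.2, random_state=42)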
To train the machine learning model, you will be using the MultinomialNB() class
from sklearn.naive_bayes module, which is one of the most frequently used
machine learning models for classification. The fit() method of the
MultinomialNB() class is used to train the model.
Script 13:
1. spam_detector = MultinomialNB()
2. spam_detector.fit(X_train, y_train)
Script 14:
1. y_pred = spam_detector.predict(X_test)
Once you have trained a model and have made predictions on the test set, the next
step is to know how well your model has performed for making predictions on the
unknown test set. There are various metrics to evaluate a classification method.
Some of the most commonly used classification metrics are F1, recall, precision,
accuracy, and the confusion matrix. Before you see the equations for these terms, you
need to understand the concept of true positive, true negative, and false positive and
false negative outputs:
True Negative (TN): True negatives are those output labels that are actually
false, and the model also predicted them as false.
True Positive: True positives are those labels that are actually true and also
predicted as true by the model.
False Negative: False negatives are labels that are actually true but predicted as
false by machine learning models.
False Positive: Labels that are actually false but predicted as true by the model are
called false positive.
Confusion Matrix
A confusion matrix summarizes the counts of true positives, true negatives, false positives, and false negatives that a classifier produces on the test set.
Precision
Precision is calculated by dividing true positives by the sum of true positives and false positives.
Recall
Recall is calculated by dividing true positives by the sum of true positive and false
negative, as shown below:
F1 Measure
F1 measure is simply the harmonic mean of precision and recall and is calculated as
follows:
Accuracy
Accuracy refers to the number of correctly predicted labels divided by the total
number of observations in a dataset.
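The equation images for these metrics are not reproduced in this extract. In terms of the TP, TN, FP, and FN counts defined above, the standard definitions are:

Precision = \frac{TP}{TP + FP}

Recall = \frac{TP}{TP + FN}

F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}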
The choice of using a metric for classification problems depends totally upon you.
However, as a rule of thumb, in the case of balanced datasets, i.e., where the
number of labels for each class is balanced, accuracy can be used as an evaluation
metric. For imbalanced datasets, you can use the F1 measure as the classification
metric.
The functions used to find the values for these metrics are available in the
sklearn.metrics module. The predicted and actual values have to be passed to these
functions, as shown in the following script.
Script 15:
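# Note: the original listing for Script 15 is not reproduced in this extract.
# A plausible sketch that prints the confusion matrix, the classification
# report, and the accuracy score for the test set predictions.
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(accuracy_score(y_test, y_pred))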
Output:
The output shows that our model is 97.81 percent accurate while predicting whether
a message is spam or ham, which is pretty impressive. Let's now look at an individual email from the dataset.
Script 16:
1. print (X_sentences[56])
2. print (y[56])
The text of the email is as follows.
Output:
Subject localized software all languages available hello we would like to offer localized software versions
german french spanish uk and many others all listed software is available for immediate download no need
to wait week for cd delivery just few examples norton internet security pro windows xp professional with sp
full version corel draw graphics suite dreamweaver mx homesite includinq macromedia studio mx just
browse our site and find any software you need in your native language best regards kayleen
1
The actual label, i.e., 1, shows that sentence number 56 in the dataset is
spam. The text of the sentence is also shown in the output.
Let’s pass this sentence into our spam detector classifier and see what it thinks:
Script 17:
1. print (spam_detector.predict(vectorizer.transform([X_sentences[56]])))
Output:
[1]
B. min_count
C. min_df
D. None of the above
Question 2:
Which method of the Multinomial NB object is used to train the algorithm on the
input data?
A. train()
B. fit()
C. predict()
D. train_data()
Question 3:
Spam email filtering with naive Bayes algorithm is a type of ___learning problem.
A. Supervised
B. Unsupervised
C. Reinforcement
D. Lazy
PROJECT
In project 1 of this book, you saw how we can predict the sale prices of houses
using linear regression. In this project, you will see how we can use a feedforward
artificial neural network to predict the prices of used cars. The car sale price
prediction problem is a regression problem like house price prediction since the
price of a car is a continuous value.
In this project, you will see how to predict car sale prices using a densely connected
neural network (DNN), which is a type of feedforward neural network. Though you
can implement a densely connected neural network from scratch in Python, in this
project, you will be using the TensorFlow Keras library to implement a feedforward
neural network.
In a neural network, we have an input layer, one or multiple hidden layers, and an
output layer. An example of a neural network is shown below:
In our neural network, we have two nodes in the input layer (since there are two
features in the input), one hidden layer with four nodes, and one output layer with
one node since we are doing binary classification. The number of hidden layers and
the number of neurons per hidden layer depend upon you.
In the above neural network, x1 and x2 are the input features, and ao is the output
of the network. Here, the only thing we can control is the weights w1, w2, w3, …
w12. The idea is to find the values of the weights for which the difference between the
predicted output (ao, in this case) and the actual output (the labels) is minimized.
Training a neural network consists of two steps:
1. FeedForward
2. BackPropagation
I will explain both these steps in the context of our neural network.
FeedForward
In the feedforward step, the final output of a neural network is created. Let’s try to
find the final output of our neural network.
In our neural network, we will first find the value of zh1, which can be calculated
as follows:
In the same way, you find the values of ah2, ah3, and ah4.
To find the value of zo, you can use the following formula:
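The equation images are not reproduced in this extract. Assuming, as in many similar figures, that w1 and w2 connect x1 and x2 to the first hidden node, that bh1 and bo are the biases, and that a sigmoid activation is used, the feedforward equations take the standard form:

z_{h1} = x_1 w_1 + x_2 w_2 + b_{h1}, \qquad a_{h1} = \sigma(z_{h1}) = \frac{1}{1 + e^{-z_{h1}}}

z_{o} = a_{h1} w_9 + a_{h2} w_{10} + a_{h3} w_{11} + a_{h4} w_{12} + b_o, \qquad a_o = \sigma(z_o)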
Backpropagation
Our weights are divided into two parts. We have weights that connect input features
to the hidden layer and the hidden layer to the output node. We call the weights that
connect the input to the hidden layer collectively as wh (w1, w2, w3 … w8), and
the weights connecting the hidden layer to the output as wo (w9, w10, w11, and
w12).
The backpropagation will consist of two phases. In the first phase, we will find
dcost/dwo (which refers to the derivative of the total cost with respect to wo
(weights in the output layer)). By the chain rule, dcost/dwo can be represented as
the product of dcost/dao * dao/dzo * dzo/dwo (d here refers to derivative).
Mathematically:
In the same way, you find the derivative of cost with respect to bias in the output
layer, i.e., dcost/dbo, which is given as:
Putting 6, 7, and 8 in equation 5, we can get the derivative of cost with respect to
the output weights.
The next step is to find the derivative of cost with respect to hidden layer weights,
wh, and bias, bh. Let’s first find the derivative of cost with respect to hidden layer
weights:
The values of dcost/dao and dao/dzo can be calculated from equations 6 and 7,
respectively. The value of dzo/dah is given as:
Putting the values of equations 6, 7, and 8 in equation 11, you can get the value of
equation 10.
Using equation 10, 12, and 13 in equation 9, you can find the value of dcost/dwh.
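The numbered equation images referenced above are not reproduced in this extract. In standard chain-rule form, assuming a mean squared error cost and sigmoid activations, the relationships the text describes are:

\frac{dcost}{dw_o} = \frac{dcost}{da_o} \cdot \frac{da_o}{dz_o} \cdot \frac{dz_o}{dw_o},
\qquad \frac{dcost}{da_o} = a_o - y, \quad \frac{da_o}{dz_o} = \sigma(z_o)\,(1 - \sigma(z_o)), \quad \frac{dz_o}{dw_o} = a_h

\frac{dcost}{dw_h} = \frac{dcost}{da_o} \cdot \frac{da_o}{dz_o} \cdot \frac{dz_o}{da_h} \cdot \frac{da_h}{dz_h} \cdot \frac{dz_h}{dw_h},
\qquad \frac{dz_o}{da_h} = w_o, \quad \frac{da_h}{dz_h} = \sigma(z_h)\,(1 - \sigma(z_h)), \quad \frac{dz_h}{dw_h} = x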
2. Neural networks are capable of finding hidden features in data that are otherwise
not visible to the human eye.
2. Training can be slow if you have a large number of layers and nodes
in your neural network.
In the next steps, you will see how we can create a feedforward densely connected
neural network with the TensorFlow Keras library.
You also need to install TensorFlow 2.0 to run the scripts. The instructions to
download TensorFlow 2.0 are available on their official blog.
Script 1:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
print (tf.__version__)
The dataset is also available by the name: car_data.csv in the Datasets folder in the
GitHub and SharePoint repositories. Download the dataset to your local file system,
and use the read_csv() method of the Pandas library to read the dataset into a
Pandas dataframe, as shown in the following script. The following script also prints
the first five rows of the dataset using the head() method.
Script 2:
1. data_path = r”/content/car_data.csv”
2. car_dataset = pd.read_csv(data_path, engine=’python’ )
3. car_dataset.head()
Output:
3.4. Data Visualization and Preprocessing
Let’s first see the percentage of the missing data in all the columns. The following
script does that.
Script 3:
1. car_dataset.isnull().mean()
Output:
Unnamed: 0 0.000000
Name 0.000000
Location 0.000000
Year 0.000000
Kilometers_Driven 0.000000
Fuel_Type 0.000000
Transmission 0.000000
Owner_Type 0.000000
Mileage 0.000332
Engine 0.005981
Power 0.005981
Seats 0.006978
New_Price 0.863100
Price 0.000000
dtype: float64
The output above shows that the Mileage, Engine, Power, Seats, and New_Price
column contains missing values. The highest percentage of missing values is 86.31
percent, which belongs to the New_Price column. We will remove the New_Price
column. Also, the first column, i.e., “Unnamed: 0” doesn’t convey any useful
information. Therefore, we will delete that column, too. The following script
deletes these two columns.
Script 4:
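# Note: the original listing for Script 4 is not reproduced in this extract.
# A plausible sketch that removes the two columns discussed above.
car_dataset = car_dataset.drop(["Unnamed: 0", "New_Price"], axis=1)
Next, let's plot the correlation between the remaining columns as a heatmap.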
Script 5:
1. plt.rcParams[“figure.figsize” ] = [8, 6]
2. sns.heatmap(car_dataset.corr())
Output:
The output shows that there is a very slight positive correlation between the Year
and the Price columns, which makes sense as newer cars are normally expensive
compared to older cars.
Let’s now plot a histogram for the Price to see the price distribution.
Script 6:
1. sns.distplot(car_dataset[‘Price’ ])
Output:
The output shows that most of the cars are priced between 2.5 and 7.5 hundred
thousand. Remember, the unit of the price mentioned in the Price column is one
hundred thousand.
Let’s first see the number of unique values in different columns of the dataset.
Script 7:
1. car_dataset.nunique()
Output:
Name 1876
Location 11
Year 22
Kilometers_Driven 3093
Fuel_Type 5
Transmission 2
Owner_Type 4
Mileage 442
Engine 146
Power 372
Seats 9
Price 1373
dtype: int64
Script 8:
1. print (car_dataset.dtypes)
Output:
Name object
Location object
Year int64
Kilometers_Driven int64
Fuel_Type object
Transmission object
Owner_Type object
Mileage object
Engine object
Power object
Seats float64
Price float64
dtype: object
From the above output, the columns with object type are the categorical columns.
We need to convert these columns into a numeric type.
Also, the number of unique values in the Name column is too large. Therefore, it
is unlikely to convey any useful information for predicting the price. Hence, we will remove the
Name column from our dataset.
We will follow a step by step approach. First, we will separate numerical columns
from categorical columns. Then, we will convert categorical columns into one-hot
categorical columns, and, finally, we will merge the one-hot encoded columns with
the original numerical columns. The process of one-hot encoding is explained in a
later section.
Script 9:
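# Note: the original listing for Script 9 is not reproduced in this extract.
# A plausible sketch that keeps only the numerical columns; the variable
# name car_numerical is an assumption used again in Script 12 below.
car_numerical = car_dataset.select_dtypes(include=['int64', 'float64'])
car_numerical.head()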
In the following output, you can see only the numerical columns in our dataset.
Output
Next, we will create a dataframe of categorical columns only by filtering all the
categorical columns (except Name, since we want to drop it) from the dataset. Look
at the following script for reference.
Script 10:
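# Note: the original listing for Script 10 is not reproduced in this extract.
# A plausible sketch that keeps the categorical (object type) columns except
# Name; the variable name car_categorical is an assumption.
car_categorical = car_dataset.select_dtypes(include=['object']).drop(["Name"], axis=1)
car_categorical.head()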
Output:
As an example, one-hot encoding the Transmission column, which contains the two unique
values Manual and Automatic, produces two new columns. However, it can be noted that we do not really need both
columns. A single column, i.e., Transmission_Manual, is enough since when the Transmission is Manual, we
can add 1 in the Transmission_Manual column; otherwise, 0 can be added in that column.
Hence, we actually need N-1 one-hot encoded columns for all the N unique values
in the original column.
The following script converts categorical columns into one-hot encoded columns
using the pd.get_dummies() method.
Script 11:
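# Note: the original listing for Script 11 is not reproduced in this extract.
# A plausible sketch; drop_first=True keeps N-1 one-hot columns per
# categorical variable, as explained above.
car_categorical_one_hot = pd.get_dummies(car_categorical, drop_first=True)
car_categorical_one_hot.head()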
Output:
Finally, the following script concatenates the numerical columns with one-hot
encoded columns to create a final dataset.
Script 12:
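# Note: the original listing for Script 12 is not reproduced in this extract.
# A plausible sketch that joins the numerical and one-hot encoded columns;
# the variable name complete_dataset is taken from Script 13.
complete_dataset = pd.concat([car_numerical, car_categorical_one_hot], axis=1)
complete_dataset.head()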
Output:
Before dividing the data into training and test sets, we will again check if our data
contains null values.
Script 13:
1. complete_dataset.isnull().mean()
Output:
Year 0.000000
Kilometers_Driven 0.000000
Seats 0.006978
Price 0.000000
Location_Bangalore 0.000000
…
Power_98.82 bhp 0.000000
Power_98.96 bhp 0.000000
Power_99 bhp 0.000000
Power_99.6 bhp 0.000000
Power_null bhp 0.000000
Length: 979, dtype: float64
Now, instead of removing columns, we can remove the rows that contain any null
values. To do so, execute the following script:
Script 14:
1. complete_dataset.dropna(inplace = True)
Before we train our neural network, we need to divide the data into training and test
sets, as we did for project 1 and project 2.
Script 15:
1. X = complete_dataset.drop([‘Price’ ], axis=1)
2. y = complete_dataset[‘Price’ ]
Like traditional machine learning algorithms, neural networks are trained on the
training set and are evaluated on the test set. Therefore, we need to divide our
dataset into the training and test sets, as shown below:
Script 16:
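# Note: the original listing for Script 16 is not reproduced in this extract.
# A plausible sketch; the 80/20 ratio and random_state are assumptions.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)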
To train neural networks, it is always a good approach to scale your feature set. The
following script can be used for feature scaling of training and test features.
Script 17:
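# Note: the original listing for Script 17 is not reproduced in this extract.
# A plausible sketch using StandardScaler; the choice of scaler is an
# assumption. The scaler is fitted on the training features only and then
# applied to both sets.
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)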
3.7. Creating and Training Neural Network Model with Tensor Flow
Keras
Now, we are ready to create our neural network in TensorFlow Keras. First, import
the following modules and classes.
Script 18:
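# Note: the original listing for Script 18 is not reproduced in this extract.
# A plausible sketch of the imports needed for the model defined below.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model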
The following script describes our neural network. To train a feedforward neural
network on tabular data using Keras, you have to first define the input layer using
the Input class. The shape of the input, in the case of tabular data such as ours,
should be (number_of_features,). The shape is specified by the shape attribute
of the Input class.
Next, you can add as many dense layers as you want. In the following script, we
add six dense layers with 100, 50, 25, 10, 5, and 2 nodes. Each dense layer uses the
relu activation function. The input to the first dense layer is the output from the
input layer. The input to each layer is specified in a round bracket that follows the
layer name. The output layer in the following script also consists of a dense layer
but with 1 node since we are predicting a single value.
Script 19:
1. input_layer = Input(shape=(X.shape[1],))
2. dense_layer0 = Dense(100, activation=’relu’ )(input_layer)
3. dense_layer1 = Dense(50, activation=’relu’ )(dense_layer0)
4. dense_layer2 = Dense(25, activation=’relu’ )(dense_layer1)
5. dense_layer3 = Dense(10, activation=’relu’ )(dense_layer2)
6. dense_layer4 = Dense(5, activation=’relu’ )(dense_layer3)
7. dense_layer5 = Dense(2, activation=’relu’ )(dense_layer4)
8. output = Dense(1)(dense_layer5)
The previous script described the layers. Now is the time to develop the model. To
create a neural network model, you can use the Model class from
tensorflow.keras.models module, as shown in the following script. The input layer
is passed to the inputs attribute, while the output layer is passed to the outputs
attribute.
To compile the model, you need to call the compile() method of the model and then
specify the loss function, the optimizer, and the metrics. Our loss function is
“mean_absolute_error,” “optimizer” is adam, and metrics is also the mean absolute
error since we are evaluating a regression problem. To study more about Keras
optimizers, check this link: https://keras.io/api/optimizers/ . And to study more
about loss functions, check this link: https://keras.io/api/losses/ .
Script 20:
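# Note: the original listing for Script 20 is not reproduced in this extract.
# A plausible sketch that builds and compiles the model described above.
model = Model(inputs=input_layer, outputs=output)
model.compile(loss="mean_absolute_error", optimizer="adam", metrics=["mean_absolute_error"])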
You can also plot and see how your model looks using the following script:
Script 21:
1. from keras.utils import plot_model
2. plot_model(model, to_file=’model_plot.png’, show_shapes=True, show_layer_names=True)
You can see all the layers and the number of inputs and outputs from the layers, as
shown below:
Output:
Finally, to train the model, you need to call the fit() method of the model and pass
it your training features and training labels. Twenty percent of the data from the
training set will be used as validation data, while the algorithm will be trained five
times on the complete dataset, as shown by the epochs attribute. The batch size
will also be 5.
Script 22:
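# Note: the original listing for Script 22 is not reproduced in this extract.
# A plausible sketch matching the description above: 5 epochs, batch size 5,
# and 20 percent of the training data held out for validation. The returned
# History object is stored in history, which Script 23 uses for plotting.
history = model.fit(X_train, y_train, batch_size=5, epochs=5, validation_split=0.2, verbose=1)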
Output:
3.8. Evaluating the Performance of a Neural Network Model
After the model is trained, the next step is to evaluate model performance. There
are several ways to do that. One of the ways is to plot the training and test loss, as
shown below:
Script 23:
1. plt.plot(history.history[‘loss’ ])
2. plt.plot(history.history[‘val_loss’ ])
3.
4. plt.title(‘loss’ )
5. plt.ylabel(‘loss’ )
6. plt.xlabel(‘epoch’ )
7. plt.legend([‘train’ ,’test’ ], loc=’upper left’ )
8. plt.show()
Output:
The above output shows that while the training loss keeps decreasing till the fifth
epoch, the test or validation loss shows fluctuation after the second epoch, which
shows that our model is slightly overfitting.
Another way to evaluate is to make predictions on the test set and then use
regression metrics such as MAE, MSE, and RMSE to evaluate model performance.
To make predictions, you can use the predict() method of the model class and pass
it the test set, as shown below:
Script 24:
1. y_pred = model.predict(X_test)
The following script calculates the values for MAE, MSE, and RMSE on the test
set.
Script 25:
1. from sklearn import metrics
2.
3. print (‘Mean Absolute Error:’ , metrics.mean_absolute_error(y_test, y_pred))
4. print (‘Mean Squared Error:’ , metrics.mean_squared_error(y_test, y_pred))
5. print (‘Root Mean Squared Error:’ , np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Output:
The above output shows that we have a mean error of 1.86. The mean of the Price
column can be calculated as follows:
Script 26:
1. car_dataset[‘Price’ ].mean()
Output:
9.479468350224273
We can find the mean percentage error by dividing MAE by the average of the
Price column, i.e., 1.86/9.47 = 0.196. The value shows that, on average, for all the
cars in the test set, the prices predicted by our feedforward neural network and the
actual prices differ by 19.6 percent.
You can plot the actual and predicted prices side by side, as follows:
Script 27:
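# Note: the original listing for Script 27 is not reproduced in this extract.
# One plausible way to place actual and predicted prices side by side is a
# small comparison dataframe (the variable name comparison_df is an assumption).
comparison_df = pd.DataFrame({'Actual': y_test.values.ravel(), 'Predicted': y_pred.ravel()})
comparison_df.head(10)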
Output:
3.9. Making Predictions on a Single Data Point
In this section, you will see how to make predictions for a single car price. Let’s
print the shape of the feature vector or record at the first index in the test set.
Script 28:
1. X_test[1].shape
From the output below, you can see that this single record has one dimension.
Output:
(978,)
As we did in project 1, to make predictions on a single record, the feature vector for
the record should be in the form of a row vector. You can convert the feature vector
for a single record into a row vector using the reshape(1,-1) method, as shown
below:
Script 29:
1. single_point = X_test[1].reshape(1,-1)
2. single_point.shape
Output:
(1, 978)
The output shows that the shape of the feature has now been updated to a row
vector.
To make predictions, you simply have to pass the row feature vector to the predict()
method of the trained neural network model, as shown below:
Script 30:
1. model.predict(X_test[1].reshape(1,-1))
Output:
array([[5.0670004]], dtype=float32)
Let's compare this with the actual price for the record at index 1 of the test set:
y_test.values[1]
Output:
5.08
The actual output is 5.08 hundred thousand, which is very close to the 5.06 hundred
thousand predicted by our model. You can take any other record from the test set,
make a prediction on that using the trained neural network, and see how close you
get.
Question 1:
In a neural network with three input features, one hidden layer of 5 nodes, and an
output layer with three possible values, what will be the dimensions of weight that
connects the input to the hidden layer? Remember, the dimensions of the input data
are (m,3), where m is the number of records.
A. [5,3]
B. [3,5]
C. [4,5]
D. [5,4]
Question 2:
Which of the following loss functions can you use in case of regression problems?
A. Sigmoid
Question 3:
B. Non-linear boundaries
This is where a Recurrent Neural Network and LSTM come into play.
These neural networks are capable of making future predictions based on
previous records.
In this project, you will see how to predict one-month future stock prices
for Facebook, based on the previous five years of data. But before that, a
brief description of Recurrent Neural Networks and LSTM is presented in
the next section.
What Is an RNN?
A problem with the recurrent neural network is that while it can capture a
shorter sequence, it tends to forget longer sequences.
RNN can easily guess that the missing word is “Clouds” here.
To solve this problem, a special type of recurrent neural network, i.e., Long
Short-Term Memory (LSTM), has been developed.
What Is an LSTM?
In LSTM, instead of a single unit in the recurrent cell, there are four
interacting units, i.e., a forget gate, an input gate, an update gate, and an
output gate. The overall architecture of an LSTM cell is shown in the
following figure:
The cell state contains data from all the previous cells in the sequence. The
LSTM is capable of adding or removing information to a cell state. In other
words, LSTM tells the cell state which part of the previous information to
remember and which information to forget.
Forget Gate
The forget gate basically tells the cell state which information to retain from
the information in the previous step and which information to forget. The
working and calculation formula for the forget gate is as follows:
Input Gate
Update Gate
The forget gate tells us what to forget, and the input gate tells us what to
add to the cell state. The next step is to actually perform these two
operations. The update gate is basically used to perform these two
operations. The functioning and the equations for the update gate are as
follows:
Output Gate
Finally, you have the output gate, which outputs the hidden state and the
output just like a common recurrent neural network. The additional output
from an LSTM node is the cell state, which runs between all the nodes in a
sequence. The equations and the functioning of the output gate are depicted
by the following figure:
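For reference, the standard LSTM gate equations, which the figures referenced above typically depict, are given below. Here, σ is the sigmoid function, x_t is the current input, h_{t-1} is the previous hidden state, and C_{t-1} is the previous cell state:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) && \text{(candidate cell state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state / output)}
\end{aligned}
$$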
In the following sections, you will see how to use an LSTM for solving
different types of sequence problems.
The test data will consist of the opening stock prices of the Facebook
company for the month of January 2020. The training file fb_train.csv and
the test file fb_test.csv are also available in the Datasets folder in the
GitHub and SharePoint repositories. Let’s begin with the coding now.
In this section, we will train our stock prediction model on the training set.
Before you train the stock market prediction model, update the TensorFlow
version by executing the following command on Google Colaboratory
(https://colab.research.google.com/ ).
If your files are placed on Google Drive and you want to access them in
Google Colaboratory, you first have to mount the Google Drive
inside your Google Colaboratory environment via the following script:
Script 1:
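The body of Script 1 is not shown above; mounting Google Drive in Colaboratory is typically done along these lines (a minimal sketch; the /gdrive mount point matches the file paths used below):

# Mount Google Drive so that files under /gdrive/My Drive/ become accessible
from google.colab import drive
drive.mount('/gdrive')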
Script 2:
1. # importing libraries
2. import pandas as pd
3. import numpy as np
4.
5. #importing dataset
6. fb_complete_data = pd.read_csv("/gdrive/My Drive/datasets/fb_train.csv")
Running the following script will print the first five rows of the dataset.
Script 3:
Output:
The output shows that our dataset consists of seven columns. However, in
this section, we are only interested in the Open column. Therefore, we will
select the Open column from the dataset. Run the following script to do so.
Script 4:
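Script 4's body is not reproduced above. A minimal sketch of selecting the Open column, producing the fb_training_processed array that Script 5 scales, could be:

# Keep only the Open column as a NumPy array of shape (num_days, 1)
fb_training_processed = fb_complete_data[['Open']].values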
Script 5:
1. #scaling features
2. from sklearn.preprocessing import MinMaxScaler
3. scaler = MinMaxScaler(feature_range = (0, 1))
4.
5. fb_training_scaled = scaler.fit_transform(fb_training_processed)
If you check the total length of the dataset, you will see it has 1,257 records,
as shown below:
Script 6:
1. len(fb_training_scaled)
Output:
1257
Before we move forward, we need to divide our data into features and
labels. Our feature set will consist of 60 timesteps of 1 feature. The feature
set basically consists of the opening stock price of the past 60 days, while
the label set will consist of the opening stock price of the 61st day. Based on
the opening stock prices of the previous days, we will be able to predict the
opening stock price for the next day.
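Scripts 7 and 8 below build these features and labels. A sketch of the windowing step, assuming the scaled array produced by Script 5 (the intermediate list names are assumptions; the resulting shapes match those printed by Script 9), might look like this:

fb_training_features = []
fb_training_labels = []

# Each feature row holds the previous 60 scaled opening prices;
# the label is the opening price of the 61st day
for i in range(60, len(fb_training_scaled)):
    fb_training_features.append(fb_training_scaled[i-60:i, 0])
    fb_training_labels.append(fb_training_scaled[i, 0])

X_train = np.array(fb_training_features)
y_train = np.array(fb_training_labels)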
Script 7:
Script 8:
Script 9:
1. print (X_train.shape)
2. print (y_train.shape)
Output:
(1197, 60)
(1197,)
Script 10:
The following script creates our LSTM model. We have four LSTM layers
with 100 nodes each. Each LSTM layer is followed by a dropout layer to
avoid overfitting. The final dense layer has one node since the output is a
single value.
Script 11:
1. #importing libraries
2. import numpy as np
3. import matplotlib.pyplot as plt
4. from tensorflow.keras.layers import Input, Activation, Dense, Flatten, Dropout, LSTM
5. from tensorflow.keras.models import Model
Script 12:
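The bodies of Scripts 10 and 12 are not shown above. Based on the description (four LSTM layers of 100 nodes, each followed by a dropout layer, and a single-node dense output), a sketch using the layers imported in Script 11 could look like this; the 0.2 dropout rate is an assumption:

# Reshape the features to (samples, timesteps, features) as LSTM layers expect
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)

input_layer = Input(shape=(X_train.shape[1], 1))

x = LSTM(100, return_sequences=True)(input_layer)
x = Dropout(0.2)(x)
x = LSTM(100, return_sequences=True)(x)
x = Dropout(0.2)(x)
x = LSTM(100, return_sequences=True)(x)
x = Dropout(0.2)(x)
x = LSTM(100)(x)
x = Dropout(0.2)(x)

# Single output node: the predicted (scaled) opening price
output_layer = Dense(1)(x)

model = Model(input_layer, output_layer)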
Script 13:
1. print (X_train.shape)
2. print (y_train.shape)
3. y_train= y_train.reshape(-1,1)
4. print (y_train.shape)
Output:
(1197, 60, 1)
(1197,)
(1197, 1)
The following script trains our stock price prediction model on the training
set.
Script 14:
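Script 14's body is not shown. A sketch of compiling and training the model follows; the optimizer and loss are assumptions, although a mean squared error loss is consistent with the loss values in the output, and a batch size of 32 matches the 38 steps per epoch:

model.compile(optimizer='adam', loss='mse')

history = model.fit(X_train, y_train, epochs=100, batch_size=32)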
You can see the results for the last five epochs in the output.
Output:
Epoch 96/100
38/38 [==============================] - 11s 299ms/step - loss: 0.0018
Epoch 97/100
38/38 [==============================] - 11s 294ms/step - loss: 0.0019
Epoch 98/100
38/38 [==============================] - 11s 299ms/step - loss: 0.0018
Epoch 99/100
38/38 [==============================] - 12s 304ms/step - loss: 0.0018
Epoch 100/100
38/38 [==============================] - 11s 299ms/step - loss: 0.0021
Our model has been trained; next, we will test our stock prediction model
on the test data.
The test data should also be converted into the right shape before we can test our
stock prediction model. We will do that shortly. Let's first import the data and then
remove all the columns from the test data except the Open column.
Script 15:
Script 16:
Script 17:
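Scripts 15 through 17 are not reproduced above. A sketch of importing the test file, keeping only its Open column, and assembling the 80-value input series follows; the file path mirrors the training file, and the variable names other than test_inputs (used in Script 18) are assumptions:

# Load the test file and keep only the Open column
fb_testing_complete_data = pd.read_csv("/gdrive/My Drive/datasets/fb_test.csv")
fb_testing_processed = fb_testing_complete_data[['Open']].values

# Every prediction needs the previous 60 days, so prepend the last
# 60 training values to the 20 test values
test_inputs = np.concatenate(
    (fb_training_processed[-60:], fb_testing_processed), axis=0).reshape(-1)
print(test_inputs.shape)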
You can see that the length of the input data is 80. Here, the first 60 records
are the last 60 records from the training data, and the last 20 records are the
20 records from the test file.
Output:
(80,)
Script 18:
1. test_inputs = test_inputs.reshape(-1,1)
2. test_inputs = scaler.transform(test_inputs)
3. print (test_inputs.shape)
Output:
(80, 1)
As we did with the training data, we need to divide our input data into
features and labels. Here is the script that does that.
Script 19:
1. fb_test_features = []
2. for i in range(60, 80):
3.     fb_test_features.append(test_inputs[i-60:i, 0])
Script 20:
1. X_test = np.array(fb_test_features)
2. print (X_test.shape)
Output:
(20, 60)
Script 21:
Output:
(20, 60, 1)
Now is the time to make predictions on the test set. The following script
does that:
Script 22:
Script 23:
Finally, to compare the predicted output with the actual stock price values,
you can plot the two values via the following script:
Script 24:
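Scripts 22 through 24 are not reproduced above. A sketch covering the prediction, the inverse scaling, and the comparison plot is shown below; it reuses the fb_testing_processed array sketched earlier for the actual prices and the matplotlib module imported in Script 11:

# Predict scaled prices for the 20 test days and map them back to the original scale
y_pred = model.predict(X_test)
y_pred = scaler.inverse_transform(y_pred)

plt.figure(figsize=(8, 6))
plt.plot(fb_testing_processed, color='red', label='Actual Opening Price')
plt.plot(y_pred, color='green', label='Predicted Opening Price')
plt.title('Facebook Opening Price Prediction (January 2020)')
plt.xlabel('Day')
plt.ylabel('Opening Price')
plt.legend()
plt.show()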
Output:
The output shows that our algorithm has been able to partially capture the
trend of the future opening stock prices for Facebook data.
Exercise 4.1
Question 1:
The shape of the feature set passed to the LSTM’s input layer should be:
A. Number of Records, Features, Timesteps
Question 2:
B. Diminishing Gradient
C. Low Gradient
D. None of the above
Question 3:
In project 4 of this book, you saw how an LSTM can be used for predicting
stock prices. In this project, you will see how a combination of two LSTM
networks can be used to create models capable of translating sentences from
one language to another.
In this project, you will see an application of the Seq2Seq model for text
translation. So, let's begin without much ado.
For instance, if you look at the decoder input, in the first step, the input is
always <s>. The decoder output at the first timestep is the ground truth
translated output word. For instance, the first output word is “Je” in the above
example. In the second step, the input to the decoder is the hidden and cell
states from the previous step plus the first actual word in the output sentence,
i.e., “Je.” This process where the ground truth value of the previous output is
fed as input to the next timestep is called teacher forcing. All the sentences are
ended with an end of sentence token to stop the decoder from making
predictions when an end of sentence tag is encountered, which is </s> in the
above diagram.
Let’s code the above training model. The first step, as always, is to import the
libraries.
Script 1:
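Script 1's body is not shown; based on the classes used throughout this project, the imports would be along these lines (a sketch):

import numpy as np

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model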
Next, we need to define a few configurations for our LSTM based encoder and
decoder models, as well as for the word2vec based embedding layers.
Script 2:
1. BATCH_SIZE = 64
2. NUM_EPOCHS = 20
3. LSTM_NODES =512
4. TOTAL_SENTENCES = 20000
5. MAX_SEN_LENGTH = 50
6. MAX_NUM_WORDS = 20000
7. EMBEDDING_SIZE = 100
Since the scripts in this project are run using Google Colaboratory, the datasets
are uploaded to Google Drive and then imported into the application. To
import datasets from Google Drive to Google Colaboratory, run the following
script:
Script 3:
Go to the link and then download the fra-eng.zip file. Unzip the file, and you
should see the fra.txt file. This file contains our dataset. The dataset is also
available by the name: fra.txt in the Datasets folder in the GitHub and
SharePoint repositories. The first 10 lines of the file look like this:
1. Go. Va ! CC-BY 2.0 (France) Attribution: tatoeba.org #2877272 (CM) & #1158250 (Wittydev)
2. Hi. Salut ! CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #509819 (Aiji)
3. Hi. Salut. CC-BY 2.0 (France) Attribution: tatoeba.org #538123 (CM) & #4320462 (gillux)
4. Run! Cours ! CC-BY 2.0 (France) Attribution: tatoeba.org #906328 (papabear) & #906331
(sacredceltic)
5. Run! Courez ! CC-BY 2.0 (France) Attribution: tatoeba.org #906328 (papabear) & #906332
(sacredceltic)
6. Who? Qui ? CC-BY 2.0 (France) Attribution: tatoeba.org #2083030 (CK) & #4366796 (gillux)
7. Wow! Ça alors ! CC-BY 2.0 (France) Attribution: tatoeba.org #52027 (Zifre) & #374631 (zmoo)
8. Fire! Au feu ! CC-BY 2.0 (France) Attribution: tatoeba.org #1829639 (Spamster) & #4627939
(sacredceltic)
9. Help! À l'aide ! CC-BY 2.0 (France) Attribution: tatoeba.org #435084 (lukaszpp) & #128430 (sysk
10. Jump. Saute. CC-BY 2.0 (France) Attribution: tatoeba.org #631038 (Shishir) & #2416938 (Phoeni
Each line in the fra.txt file contains a sentence in English, followed by a tab
and then the translation of the English sentence in French, again a tab, and
then the attribute.
We are only interested in the English and French sentences. The following
script creates three lists. The first list contains all the English sentences, which
serve as encoder input.
The second list contains the decoder input sentences in French, where the
start-of-sentence token <sos> is prefixed to all the sentences.
Finally, the third list contains decoder outputs where <eos> is appended at the
end of each sentence in French.
Script 4:
1. input_english_sentences = []
2. output_french_sentences = []
3. output_french_sentences_inputs = []
4.
5. count = 0
6. for line in open(r'/gdrive/My Drive/datasets/fra.txt', encoding="utf-8"):
7.     count += 1
8.
9.     if count > TOTAL_SENTENCES:
10.         break
11.
12.     if '\t' not in line:
13.         continue
14.
15.     input_sentence = line.rstrip().split('\t')[0]
16.
17.     output = line.rstrip().split('\t')[1]
18.
19.
20.     output_sentence = output + ' <eos>'
21.     output_sentence_input = '<sos> ' + output
22.
23.     input_english_sentences.append(input_sentence)
24.     output_french_sentences.append(output_sentence)
25.     output_french_sentences_inputs.append(output_sentence_input)
Let’s see how many total English and French sentences we have in our dataset:
Script 5:
Output:
Let’s randomly print a sentence in English and its French translation (both the
decoder input and the decoder output).
Script 6:
1. print (input_english_sentences[175])
2. print (output_french_sentences[175])
3. print (output_french_sentences_inputs[175])
Output:
I’m shy.
Je suis timide. <eos>
<sos> Je suis timide.
You can see that the sentence at index 175 is “I’m shy.” In the decoder input,
the translated sentence contains <sos> tag at the beginning, while the output
contains an <eos> tag.
Script 7:
1. input_eng_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
2. input_eng_tokenizer.fit_on_texts(input_english_sentences)
3. input_eng_integer_seq = input_eng_tokenizer.texts_to_sequences(input_english_sentences)
4.
5. word2idx_eng_inputs = input_eng_tokenizer.word_index
6. print('Total unique words in English sentences: %s' % len(word2idx_eng_inputs))
7.
8. max_input_len = max(len(sen) for sen in input_eng_integer_seq)
9. print("Length of longest sentence in English sentences: %g" % max_input_len)
Output:
Output:
Next, we need to pad our input and output sequences so that they can have the
same length. The following script applies padding to the input sequences for
the encoder.
Script 9:
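Script 9's body is not reproduced above. Padding the encoder inputs could be sketched as follows; the variable names match those printed in the output below, while the exact dictionary keys used for the final two prints are assumptions:

# Pad the English sequences (padding is applied at the front by default, which
# is why the zeros appear before the word indexes in the output below)
encoder_input_eng_sequences = pad_sequences(input_eng_integer_seq, maxlen=max_input_len)
print("encoder_input_eng_sequences.shape:", encoder_input_eng_sequences.shape)
print("encoder_input_eng_sequences[175]:", encoder_input_eng_sequences[175])

print(word2idx_eng_inputs["i'm"])
print(word2idx_eng_inputs["shy"])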
Since the maximum length of an English sentence is 6, you can see that the
shape of the encoder input sentence is (20000, 6), which means that all
sentences have now become of equal length of 6. For instance, if you print the
padded version for the sentence at index 175, you see [0, 0, 0, 0, 6, 307]. Since
the actual sentence is “I’m shy,” we can print the index for these words and see
the indexes (6, 307) match the indexes in the padded sequence for the sentence
at index 175.
Output:
encoder_input_eng_sequences.shape: (20000, 6)
encoder_input_eng_sequences[175]: [ 0 0 0 0 6 307]
6
307
Similarly, the following script applies padding to the decoder input French
sentences.
Script 10:
1. decoder_input_french_sequences = pad_sequences(output_input_french_integer_seq, maxlen=max_out_len, padding='post')
2. print("decoder_input_french_sequences.shape:", decoder_input_french_sequences.shape)
3. print("decoder_input_french_sequences[175]:", decoder_input_french_sequences[175])
4.
5. print(word2idx_french_outputs["<sos>"])
6. print(word2idx_french_outputs["je"])
7. print(word2idx_french_outputs["suis"])
8. print(word2idx_french_outputs["timide."])
Output:
And the following script applies padding to the decoder output French
sentences.
Script 11:
decoder_output_french_sequences = pad_sequences(output_french_integer_seq, maxlen=max_out_len, padding='post')
The next step is to create word embeddings for the input and output sentences.
Word embeddings are used to convert a word into numerical vectors since
deep learning algorithms work with numbers only. For the input sentences, we
can use the Glove word embeddings since the sentences are English. You can
download the Glove word embeddings from Stanford Glove
(https://stanford.io/2MJW98X ).
The following scripts create the embedding dictionary for the Glove word
vectors.
Script 12:
And the following script creates an embedding matrix that will be used in the
embedding layer to the encoder LSTM.
Script 13:
The following script creates an embedding layer for the encoder LSTM.
Script 14:
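Scripts 12 through 14 are not reproduced above. A sketch of loading the GloVe vectors, building the embedding matrix, and defining the encoder embedding layer follows; the GloVe file name and its location on Google Drive are assumptions, while EMBEDDING_SIZE, word2idx_eng_inputs, and embedding_layer come from the scripts above and below:

embeddings_dictionary = dict()

# Each line of the GloVe file holds a word followed by its 100 vector components
glove_file = open(r'/gdrive/My Drive/datasets/glove.6B.100d.txt', encoding='utf8')
for line in glove_file:
    records = line.split()
    word = records[0]
    embeddings_dictionary[word] = np.asarray(records[1:], dtype='float32')
glove_file.close()

# One row per English word index; rows stay zero for words missing from GloVe
num_words = min(MAX_NUM_WORDS, len(word2idx_eng_inputs) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_SIZE))
for word, index in word2idx_eng_inputs.items():
    if index < num_words:
        vector = embeddings_dictionary.get(word)
        if vector is not None:
            embedding_matrix[index] = vector

# Encoder embedding layer initialized with the pre-trained GloVe weights
embedding_layer = Embedding(num_words, EMBEDDING_SIZE,
                            weights=[embedding_matrix],
                            input_length=max_input_len)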
The next step is to create the one-hot encoded output labels for the decoder. The
first step is to create an empty matrix of zeros of the shape (number of output
sentences, length of the longest sentence in the output, total number of unique
words in the output). The following script does that.
Script 15:
1. decoder_one_hot_targets = np.zeros((
2.         len(input_english_sentences),
3.         max_out_len,
4.         num_words_output
5.     ),
6.     dtype='float32'
7. )
Script 16:
decoder_one_hot_targets.shape
Output:
(20000, 13, 9533)
The next step is to add a 1 at those indexes in the one-hot target matrix
where a word exists in the decoder output sequences.
Script 17:
1. for i, d in enumerate(decoder_output_french_sequences):
2.     for t, word in enumerate(d):
3.         decoder_one_hot_targets[i, t, word] = 1
Script 18:
1. encoder_inputs_eng_placeholder = Input(shape=(max_input_len,))
2. x = embedding_layer(encoder_inputs_eng_placeholder)
3. encoder = LSTM(LSTM_NODES, return_state=True)
4.
5. encoder_outputs, h, c = encoder(x)
6. encoder_states = [h, c]
And the following script creates the decoder model. You can see that in the
decoder model, a custom embedding layer is being used.
Script 19:
1. decoder_inputs_french_placeholder = Input(shape=(max_out_len,))
2.
3. decoder_embedding = Embedding(num_words_output, LSTM_NODES)
4. decoder_inputs_x = decoder_embedding(decoder_inputs_french_placeholder)
5.
6. decoder_lstm = LSTM(LSTM_NODES, return_sequences=True, return_state=True)
7. decoder_outputs, _, _ = decoder_lstm(decoder_inputs_x, initial_state=encoder_states)
8.
9. ###
10.
11. decoder_dense = Dense(num_words_output, activation='softmax')
12. decoder_outputs = decoder_dense(decoder_outputs)
The following script creates the complete training model for our seq2seq
model.
Script 20:
1. model = Model([encoder_inputs_eng_placeholder,
2. decoder_inputs_french_placeholder], decoder_outputs)
3. model.compile(
4.     optimizer='rmsprop',
5.     loss='categorical_crossentropy',
6.     metrics=['accuracy']
7. )
Script 21:
Output:
Finally, the following script trains the model.
Script 22:
1. r = model.fit(
2. [encoder_input_eng_sequences, decoder_input_french_sequences],
3. decoder_one_hot_targets,
4. batch_size=BATCH_SIZE,
5. epochs=NUM_EPOCHS,
6. validation_split=0.1,
7. )
Output:
Epoch 16/20
18000/18000 [==============================] - 23s 1ms/step - loss: 0.4830 - accuracy:
0.9182 - val_loss: 1.4919 - val_accuracy: 0.7976
Epoch 17/20
18000/18000 [==============================] - 23s 1ms/step - loss: 0.4730 - accuracy:
0.9202 - val_loss: 1.5083 - val_accuracy: 0.7962
Epoch 18/20
18000/18000 [==============================] - 23s 1ms/step - loss: 0.4616 - accuracy:
0.9219 - val_loss: 1.5127 - val_accuracy: 0.7963
Epoch 19/20
18000/18000 [==============================] - 22s 1ms/step - loss: 0.4515 - accuracy:
0.9235 - val_loss: 1.5249 - val_accuracy: 0.7963
Epoch 20/20
18000/18000 [==============================] - 23s 1ms/step - loss: 0.4407 - accuracy:
0.9250 - val_loss: 1.5303 - val_accuracy: 0.7967
At the second timestep, the input to the decoder is the hidden state and cell
state from the first decoder timestep, and the output from the first decoder
timestep, which is “Je.” The process continues until the decoder predicts
<eos>, which corresponds to the end of the sentence.
The following script implements the model for making predictions for
translating text from English to French using the seq2seq model.
Script 23:
Script 24:
Output:
The prediction model makes predictions in the form of integers. You will need
to convert the integers back to text. The following script creates an index to
word dictionaries for both the input and output sentences.
Script 25:
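Script 25's body is not shown; reversing the word_index dictionaries produced by the two tokenizers can be sketched as follows (the dictionary names are assumptions):

# Reverse the word -> index mappings produced by the two tokenizers
idx2word_eng_input = {index: word for word, index in word2idx_eng_inputs.items()}
idx2word_french_target = {index: word for word, index in word2idx_french_outputs.items()}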
Script 26:
Now is the time to make predictions. The following script randomly chooses
an input sentence from the list of input sentence sequences. The sentence
sequence is passed to the “perform_translation()” method, which returns the
translated sentence in French.
Script 27:
1. random_sentence_index = np.random.choice(len(input_english_sentences))
2. input_eng_seq = encoder_input_eng_sequences[random_sentence_index:random_sentence_index+1]
3. translation = perform_translation(input_eng_seq)
4. print('-')
5. print('Input Sentence:', input_english_sentences[random_sentence_index])
6. print('Translated Sentence:', translation)
The output shows that the sentence chosen randomly by our script is “You
need sleep,” which has been successfully translated into “vous avez besoin de
sommeil” in French.
Output:
Question 1:
This process where the ground truth value of the previous output is fed as input to
the next timestep is called:
A. Truth Labeling
B. Input Labeling
C. Input Forcing
D. Teacher Forcing
Question 2:
In the seq2seq model, the input to the node in the decoder layer is:
A. Hidden state from the encoder
Question 3:
C. Both A and B
D. None of the above
PROJECT
In the previous three projects, you studied different feedforward densely connected
neural networks and recurrent neural networks. In this project, you will study the
Convolutional Neural Network (CNN).
You will see how you can use convolutional neural networks (CNN) to classify
cats’ and dogs’ images. Before you see the actual code, let’s first briefly discuss
what convolutional neural networks are.
A combination of these images then forms the complete image, which can then be
classified using a densely connected neural network. The steps involved in a
Convolutional Neural Network have been explained in the next section.
Here, the box on the leftmost is what humans see. They see a smiling face.
However, a computer sees it in the form of pixel values of 0s and 1s, as shown on
the right-hand side. Here, 0 indicates a white pixel, whereas 1 indicates a black
pixel. (In real grayscale images, the convention is the opposite: 0 indicates a black
pixel, and higher values indicate brighter pixels.)
Now that we know how a computer sees images, the next step is to explain the steps
involved in image classification using a convolutional neural network.
The following are the steps involved in image classification with CNN:
1. The Convolution Operation
The convolution operation is the first step involved in the image classification with
a convolutional neural network.
In convolution operation, you have an image and a feature detector. The values of
the feature detector are initialized randomly. The feature detector is moved over the
image from left to right. The values in the feature detector are multiplied by the
corresponding values in the image, and then all the values in the feature detector are
added. The resultant value is added to the feature map.
In the above figure, we have an input image of 7 x 7. The feature detector is of size
3 x 3. The feature detector is placed over the image at the top left of the input
image, and then the pixel values in the feature detector are multiplied by the pixel
values in the input image. The result is then added. The feature detector then moves
N steps toward the right, where N refers to the stride. The stride is basically the
number of pixels the feature detector moves from left to right (and then from top to
bottom) each time a new value for the feature map is computed.
In reality, there are multiple feature detectors, as shown in the following image:
Each feature detector is responsible for detecting a particular feature in the image.
In the ReLu operation, you simply apply the ReLu activation function on the
feature map generated as a result of the convolution operation. Convolution
operation gives us linear values. The ReLu operation is performed to introduce non-
linearity in the image.
In the ReLu operation, all the negative values in a feature map are replaced by 0.
All the positive values are left untouched.
When the ReLu function is applied on the feature map, the resultant feature map
looks like this:
The Pooling Operation
Let’s first understand what spatial invariance is. If you look at the following three
images, you can easily identify that these images contain cheetahs.
The second image is disoriented, and the third image is distorted. However, we are
still able to identify that all three images contain cheetahs based on certain features.
Pooling does exactly that. In pooling, we have a feature map and then a pooling
filter, which can be of any size. Next, we move the pooling filter over the feature
map and apply the pooling operation. There can be many pooling operations, such
as max pooling, min pooling, and average pooling. In max pooling, we choose the
maximum value from the pooling filter. Pooling not only introduces spatial
invariance but also reduces the size of an image.
Look at the following image. Here, in the 3rd and 4th rows and 1st and 2nd columns,
we have four values 1, 0, 1, and 4. When we apply max pooling on these four
pixels, the maximum value will be chosen, i.e., you can see 4 in the pooled feature
map.
To pass the detected features to a densely connected neural network, the pooled
feature maps are flattened to form a one-dimensional vector, as shown in the
following figure:
The one-dimensional vector is then used as input to the densely or fully connected
neural network layer that you saw in project 3. This is shown in the following
image:
6.2. Cats and Dogs Image Classification with a CNN
In this section, we will move forward with the implementation of the convolutional
neural network in Python. We know that a convolutional neural network can learn
to identify the related features on a 2D map, such as images. In this project, we will
solve the image classification task with CNN. Given a set of images, the task is to
predict whether an image contains a cat or a dog.
The dataset for this project consists of images of cats and dogs. The dataset can be
downloaded directly from this Kaggle Link (https://www.kaggle.com/c/dogs-vs-
cats ).
The dataset is also available inside the Animal Datasets, which is located inside the
Datasets folder in the GitHub and SharePoint repositories. The original dataset
consists of 25,000 images. However, the dataset that we are going to use is smaller
and consists of 10,000 images. Out of 10,000 images, 8,000 images are used for
training, while 2,000 images are used for testing. The training set consists of 4,000
images of cats and 4,000 images of dogs. The test set also contains an equal number
of images of cats and dogs.
Script 1:
The image dataset in this project is uploaded to Google Drive so that it can be
accessed easily by the Google Colaboratory environment. The following script will
mount your Google Drive in your Google Colaboratory environment.
Script 2:
Script 3:
1. from tensorflow.keras.models import Sequential
2. from tensorflow.keras.layers import Conv2D
3. from tensorflow.keras.layers import MaxPooling2D
4. from tensorflow.keras.layers import Flatten
5. from tensorflow.keras.layers import Dense
In the previous two projects, we used the Keras Functional API to create the
TensorFlow Keras model. The Functional API is good when you have to develop
complex deep learning models. For simpler deep learning models, you can use
Sequential API, as well. In this project, we will build our CNN model using
sequential API.
To create a sequential model, you have to first create an object of the Sequential
class from the tensorflow.keras.models module.
Script 4:
1. cnn_model = Sequential()
Next, you can create layers and add them to the Sequential model object that you
just created.
The following script adds a convolution layer with 32 filters of shape 3 x 3 to the
sequential model. Notice that the input shape size here is 64, 64, 3. This is because
we will resize our images to a pixel size of 64 x 64 before training. The dimension
3 is added because a color image has three channels, i.e., red, green, and blue
(RGB).
Script 5:
1. conv_layer1 = Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu')
2. cnn_model.add(conv_layer1)
Next, we will create a pooling layer of size 2, 2, and add it to our sequential CNN
model, as shown below.
Script 6:
1. pool_layer1 = MaxPooling2D(pool_size = (2, 2))
2. cnn_model.add(pool_layer1)
Let’s add one more convolution and one more pooling layer to our sequential
model. Look at scripts 7 and 8 for reference.
Script 7:
1. conv_layer2 = Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu')
2. cnn_model.add(conv_layer2)
Script 8:
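Script 8's body is not shown; following the pattern of Script 6, the second pooling layer would be added like this:

# Second max pooling layer, again with a 2 x 2 pool size
pool_layer2 = MaxPooling2D(pool_size = (2, 2))
cnn_model.add(pool_layer2)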
You can add more convolutional and pooling layers if you want.
As you studied in the theory section, the convolutional and pooling layers are
followed by dense layers. To connect the output of convolutional and pooling layers
to dense layers, you need to flatten the output first using the Flatten layer, as shown
below.
Script 9:
1. flatten_layer = Flatten()
2. cnn_model.add(flatten_layer)
We add two dense layers to our model. The first layer will have 128 neurons, and
the second dense layer, which will also be the output layer, will consist of 1 neuron
since we are predicting a single value. Scripts 10 and 11, shown below, add the
final two dense layers to our model.
Script 10:
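The bodies of Scripts 10 and 11 are not shown; based on the description above (128 neurons in the hidden dense layer and a single output neuron), they would be along these lines, where the activation functions are assumptions commonly used for this setup:

# Hidden dense layer with 128 neurons
dense_layer1 = Dense(units = 128, activation = 'relu')
cnn_model.add(dense_layer1)

# Output layer: one neuron with a sigmoid activation for the binary cat/dog output
dense_layer2 = Dense(units = 1, activation = 'sigmoid')
cnn_model.add(dense_layer2)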
As we did in the previous project, before training a model, we need to compile it.
To do so, you can use the compile() method, as shown below. The optimizer we use
is adam, whereas the loss function is binary_crossentropy since we have only two
possible outputs, i.e., an image can contain either a cat or a dog.
And since this is a classification problem, the performance metric has been set to
'accuracy'.
Script 12:
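Script 12's body is not shown; compiling with the settings described above looks like this:

cnn_model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])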
Script 13:
Output:
6.2.2. Image Augmentation
To increase the diversity of the training images and to help the model generalize
better, you can apply several preprocessing (augmentation) steps to the images. To
do so, you can use the ImageDataGenerator class from the
tensorflow.keras.preprocessing.image module. The following script applies feature
scaling to the training images by dividing each pixel value by 255. Next, a shear
range and a zoom range of 0.2 are also applied. Finally, the images are randomly
flipped horizontally.
Script 14:
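Script 14's body is not reproduced; an augmentation generator matching the description above (rescaling by 1/255, shear and zoom ranges of 0.2, and horizontal flips) can be defined as follows:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)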
And the following script defines the generator for the test set. Note that we
only apply feature scaling to the test set; no other preprocessing step is applied
to it.
Script 15:
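Script 15's body is not shown; with only rescaling applied, the test generator is a one-liner using the ImageDataGenerator class imported above:

test_datagen = ImageDataGenerator(rescale = 1./255)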
Next, we need to divide the data into training and test sets. Since the images are in a
local directory, you can use the flow_from_directory() method of the
ImageDataGenerator object for the training and test sets.
You need to specify the target size (image size), which is 64, 64 in our case. The
batch size defines the number of images that will be processed in a batch. And
finally, since we have two output classes for our dataset, the class_mode attribute is
set to binary.
The following script creates the final training and test sets.
Script 16:
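Script 16's body is not reproduced. A sketch of loading the images from directories follows; the directory paths are assumptions based on the Google Drive layout used earlier, while the variable name training_data matches the one used later when class_indices is printed:

training_data = train_datagen.flow_from_directory(
    '/gdrive/My Drive/datasets/Animal Dataset/training_set',
    target_size = (64, 64),
    batch_size = 32,
    class_mode = 'binary')

test_data = test_datagen.flow_from_directory(
    '/gdrive/My Drive/datasets/Animal Dataset/test_set',
    target_size = (64, 64),
    batch_size = 32,
    class_mode = 'binary')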
6.2.4. Training a CNN Model
Training the model is easy. You just need to pass the training and test sets to the
fit() method of your CNN model. You need to specify the steps per epoch. Steps per
epoch refers to the number of times you want to update the weights of your neural
network in one epoch. Since we have 8,000 records in the training set where 32
images are processed in a bath, the steps per epoch will be 8000/32 = 250.
Similarly, in the test set, we process 32 images at a time. The validation step is also
set to 2000/32, which means that the model will be validated on the test set after a
batch of 32 images.
Script 17:
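Script 17's body is not reproduced; training with the step counts worked out above could be sketched as follows. The number of epochs is an assumption, and in TensorFlow 2.x the fit() method accepts generators directly:

history = cnn_model.fit(training_data,
                        steps_per_epoch = 250,
                        epochs = 25,
                        validation_data = test_data,
                        validation_steps = 62)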
Output:
6.2.5. Making Prediction on a Single Image
Let's now see how you can make predictions on a single image. If you look at the
single_prediction folder in your dataset, it contains two images: cat_or_dog_1.jpg
and cat_or_dog_2.jpg. We will be predicting what is in the first image, i.e.,
cat_or_dog_1.jpg.
Execute the following script to load the cat_or_dog_1.jpg image and convert it into
an image of 64 x 64 pixels.
Script 18:
1. import numpy as np
2. from tensorflow.keras.preprocessing import image
3.
4. single_image = image.load_img("/gdrive/My Drive/datasets/Animal Dataset/single_prediction/cat_or_dog_1.jpg", target_size = (64, 64))
Script 19:
1. type(single_image)
Output:
PIL.Image.Image
The image type is PIL. We need to convert it into array type so that our trained
CNN model can make predictions on it. To do so, you can use the img_to_array()
function of the image from the tensorflow.keras.preprocessing module, as shown
below.
Script 20:
1. single_image = image.img_to_array(single_image)
2. single_image = np.expand_dims(single_image, axis = 0)
The above script also adds one extra dimension to the image array because the
trained model is trained using an extra dimension, i.e., batch. Therefore, while
making a prediction, you also need to add the dimension for the batch. Though the
batch size for a single image will always be 1, you still need to add the dimension
in order to make a prediction.
Finally, to make predictions, you need to pass the array for the image to the
predict() method of the CNN model, as shown below:
Script 21:
1. image_result = cnn_model.predict(single_image)
Script 22:
1. training_data.class_indices
Output:
{'cats': 0, 'dogs': 1}
Let’s print the value of the predicted result. To print the value, you need to first
specify the batch number and image number. Since you have only one batch and
only one image within that batch, you can specify 0 and 0 for both.
Script 23:
1. print (image_result[0][0])
Output:
1.0
The output depicts a value of 1.0, which shows that the predicted image contains a
dog. To verify, open the image Animal Dataset/single_prediction/cat_or_dog_1.jpg,
and you should see that it actually contains an image of a dog, as shown below.
This means our prediction is correct!
Further Readings – Image Classification with CNN
To study more about image classification with TensorFlow Keras, take a look at
these links:
https://bit.ly/3ed8PCg
https://bit.ly/2TFijwU
Exercise 6.1
Question 1
What should be the input shape of the input image to the convolutional neural
network?
A. Width, Height
B. Height, Width
Question 2
B. Image is distorted
C. Image is compressed
D. All of the above
Question 3
B. Non-linearity
C. Quadraticity
D. None of the above
PROJECT
In this project, you will see how to create a simple movie recommendation
system, which recommends movies to a user using item-based collaborative
filtering. Before you see the actual code for the recommendation system,
let’s first understand what collaborative filtering is.
Script 1:
1. import numpy as np
2. import pandas as pd
3. import matplotlib.pyplot as plt
4. import seaborn as sns
Let's first import the movies.csv file. The following script uses the read_csv()
method from the Pandas library to read the CSV file into a Pandas
dataframe. Next, the head() method of the Pandas dataframe is used
to display the first five rows of the dataset.
Script 2:
1. movie_ids_titles = pd.read_csv(r"E:/Datasets/ml-latest-small/movies.csv")
2. movie_ids_titles.head()
Output:
From the above output, you can see that the movies.csv file contains three
columns, i.e., movieId, title, and genres. This dataframe basically maps the
movieId with the movie title.
Similarly, the following script imports the ratings.csv file into a Pandas
dataframe and displays its first five rows.
Script 3:
1. movie_ids_ratings = pd.read_csv(r"E:/Datasets/ml-latest-small/ratings.csv")
2. movie_ids_ratings.head()
Output:
The ratings.csv file contains the userId column, which contains the ID of
the user who rated a movie. The movieId column consists of the id of the
movie; the rating column consists of ratings, while the timestamp column
consists of the timestamp (in seconds) when the review was left.
Script 4:
1. movie_ids_ratings.shape
Output:
(100836, 4)
The output shows that we have 100,836 records, and each record has four
columns.
Script 5:
Output:
Similarly, the following script removes the timestamp column from the
movie_ids_ratings dataframe.
Script 6:
1. movie_ids_ratings.drop("timestamp", inplace = True, axis = 1)
2. movie_ids_ratings.head()
Output:
Script 7:
Output:
Let’s first group the dataset by title and see what information we can get
regarding the ratings of movies. Execute the following script.
Script 8:
1. merged_movie_df.groupby('title').describe()
Output:
The output above shows the userId, movieId, and rating columns grouped
together with respect to the title column. The describe() method further
shows the information as mean, min, max, and standard deviation values for
userId, movieId, and rating columns. We are only interested in the ratings
column. To extract the mean of ratings grouped by title, you can use the
following script.
Script 9:
The output below shows that the first two movies got an average rating of
4.0 each, while the third and fourth movies have average ratings of 3 and 5,
respectively.
Output:
title
‘71 (2014) 4.0
‘Hellboy’: The Seeds of Creation (2004) 4.0
‘Round Midnight (1986) 3.5
‘Salem’s Lot (2004) 5.0
‘Til There Was You (1997) 4.0
Name: rating, dtype: float64
Let’s sort the movie titles by the descending order of the average user
ratings. Execute the following.
Script 10:
The output below shows the names of some not so famous movies. This is
possible because some unknown movies might have got high ratings but
only by a few users. Hence, we can say that average rating alone is not a
good criterion to judge a movie. The number of times a movie has been
rated is also important.
Output:
title
Karlson Returns (1970) 5.0
Winter in Prostokvashino (1984) 5.0
My Love (2006) 5.0
Sorority House Massacre II (1990) 5.0
Winnie the Pooh and the Day of Concern (1972) 5.0
Name: rating, dtype: float64
Let’s now print the movies in the descending order of their rating counts.
Script 11:
Here is the output. You can now see some really famous movies, which
shows that a movie that is rated by a large number of people is usually a
good movie.
Output:
title
Forrest Gump (1994) 329
Shawshank Redemption, The (1994) 317
Pulp Fiction (1994) 307
Silence of the Lambs, The (1991) 279
Matrix, The (1999) 278
Name: rating, dtype: int64
Let’s create a new dataframe that shows the title, mean rating, and the rating
counts. Execute the following two scripts.
Script 12:
Script 13:
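Scripts 12 and 13 are not reproduced. A sketch of building the dataframe with the two columns named in the output description (rating_mean and rating_count) could be:

# Average rating and number of ratings per movie title
movie_rating_mean_count = pd.DataFrame({
    'rating_mean': merged_movie_df.groupby('title')['rating'].mean(),
    'rating_count': merged_movie_df.groupby('title')['rating'].count()
})

movie_rating_mean_count.head()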
The following output shows the final dataframe. The dataframe now
contains the movie title, average ratings (rating_mean), and the number of
rating counts (rating_count).
Output:
First, we will plot a histogram to see how the average ratings are
distributed.
Script 14:
1. plt.figure(figsize=(10,8))
2. sns.set_style("darkgrid")
3. movie_rating_mean_count['rating_mean'].hist(bins=30, color = "purple")
The output below shows that most of the movies have an average rating
between 3 and 4.
Output:
Script 15:
1. plt.figure(figsize=(10,8))
2. sns.set_style("darkgrid")
3. movie_rating_mean_count['rating_count'].hist(bins=33, color = "green")
The output below shows that there are around 7,000 movies with less than
10 rating counts. The number of movies decreases with an increase in rating
counts. Movies with more than 50 ratings are very few.
Output:
Finally, it is also interesting to see the relationship between the mean ratings
and rating counts of a movie. You can plot a scatter plot for that, as shown
in the following script:
Script 16:
1. plt.figure(figsize=(10,8))
2. sns.set_style("darkgrid")
3. sns.regplot(x="rating_mean", y="rating_count", data=movie_rating_mean_count, color = "brown")
If you look at the top right portion of the following output, you can see that
the movies with a higher number of rating counts tend to have higher mean
ratings as well.
Output:
Let’s sort our dataset by rating counts and see the average ratings of the
movies with the top 5 highest number of ratings.
Script 17:
Output:
Script 18:
1. user_movie_rating_matrix = merged_movie_df.pivot_table(index='userId', columns='title', values='rating')
2. user_movie_rating_matrix
Look at the output below. Here, the user Ids represent the dataframe index,
whereas columns represent movie titles. A single cell contains the rating
left by a particular user for a particular movie. You can see many null
values in the following dataframe because every user didn’t rate every
movie.
Output:
Script 19:
1. user_movie_rating_matrix.shape
The output shows that our dataset contains 610 rows and 9,719 columns.
This is because our dataset contains 610 unique users and 9,719 unique
movies.
Output:
(610, 9719)
Script 20:
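Script 20's body is not shown; extracting the Pulp Fiction ratings column described below is a single line:

# User ratings for Pulp Fiction (1994); NaN where a user did not rate it
pulp_fiction_ratings = user_movie_rating_matrix['Pulp Fiction (1994)']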
Next, we will find the correlation between the user ratings of all the movies
and the user ratings for the movie Pulp Fiction (1994). We know that the
user_movie_rating_matrix that we created earlier contains user ratings of all
the movies in columns. Therefore, we need to find the correlation between
the dataframe that contains user ratings for Pulp Fiction (1994), which is
pulp_fiction_ratings, and the dataframe that contains user ratings for all the
movies, i.e., user_movie_rating_matrix. To do so, you can use the
corrwith() function, as shown in the following script. The newly created
pf_corr column will contain the correlation between the ratings for the
movie Pulp Fiction (1994) and all the other movies.
Script 21:
1. pulp_fiction_correlations = pd.DataFrame(user_movie_rating_matrix.corrwith(pulp_fiction_ratings), columns = ["pf_corr"])
Let’s print the first five movies with the highest correlation with the movie
Pulp Fiction (1994). Execute the following script.
Script 22:
1. pulp_fiction_correlations.sort_values("pf_corr", ascending=False).head(5)
Here is the output. The names of the movies in the output below are not
very well known. This shows that correlation itself is not a very good
criterion for item-based collaborative filtering. For example, there can be a
movie in the dataset that is rated 5 stars by only one user who also rated the
movie Pulp Fiction (1994) as 5 stars. In such a case, that movie will have
the highest correlation with Pulp Fiction (1994) since both the movies will
have 5-star ratings.
Output:
Script 23:
1. pulp_fiction_correlations = pulp_fiction_correlations.join(movie_rating_mean_count["rating_count"])
Next, let’s plot the first five rows of the pulp_fiction_correlations
dataframe.
Script 24:
1. pulp_fiction_correlations.head()
From the output, you can see both the pf_corr and rating_count columns.
The pf_corr column contains some NaN values. This is because there can
be movies that are rated by users who did not rate Pulp Fiction (1994). In
such cases, the correlation will be null.
Output:
We will remove all the movies with null correlation with Pulp Fiction
(1994). Execute the following script to do so.
Script 25:
1. pulp_fiction_correlations.dropna(inplace = True)
Next, plot the movies with the highest correlation with Pulp Fiction (1994).
Script 26:
1. pulp_fiction_correlations.sort_values("pf_corr", ascending=False).head(5)
You can see from the output below that, as expected, the movies with the
highest correlation have very low rating counts, and, hence, the correlation
doesn’t give a true picture of the similarities between movies.
Output:
A better way is to find the movies with rating counts of at least 50 and
having the highest correlation with Pulp Fiction (1994). The following
script finds and prints those movies.
Script 27:
1. pulp_fiction_correlations_50 = pulp_fiction_correlations[pulp_fiction_correlations['rating_count'] > 50]
2. pulp_fiction_correlations_50.sort_values('pf_corr', ascending=False).head()
From the output below, you can see that the movie Pulp Fiction has the
highest correlation with itself, which makes sense. Next, the highest
correlation is found for the movies The Wolf of Wall Street (2013) and Fight
Club (1999). These are the two movies recommended by our recommender
system to a user who likes Pulp Fiction (1994).
Output:
7.6.2. Finding Recommendations Based on Multiple Movies
In this section, you will see how to recommend movies to a user based on
his ratings of multiple movies. The first step is to create a dataframe, which
contains a correlation between all the movies in our dataset in the form of a
matrix. To do so, you can use the corr() method of the Pandas dataframe.
The correlation type, which is Pearson, in this case, is passed to the method
parameter. The min_periods attribute value specifies the minimum number
of observations required per pair of columns to have a valid result. A
min_periods value of 50 specifies calculating correlation for only those pair
of movies that have been rated by at least 50 same users. For the rest of the
movie pairs, the correlation will be null.
Script 28:
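Script 28's body is not reproduced; computing the full movie-to-movie correlation matrix as described above can be done as follows:

# Pearson correlation between every pair of movies rated by at least 50 common users
all_movie_correlations = user_movie_rating_matrix.corr(method='pearson', min_periods=50)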
Script 29:
1. all_movie_correlations.head()
Output:
Now suppose a new user logs into your website. The user has already
watched three movies and has given a rating to those movies. Let’s create a
new dataframe that contains fictional ratings given by a user to three
movies.
Script 30:
1. movie_data = [['Forrest Gump (1994)', 4.0], ['Fight Club (1999)', 3.5], ['Interstellar (2014)', 4.0]]
2.
3.
4. test_movies = pd.DataFrame(movie_data, columns = ['Movie_Name', 'Movie_Rating'])
5. test_movies.head()
Our input dataframe looks like this. We will be recommending movies from
our dataset based on the ratings given by a new user for these three movies.
Output:
To get the name and ratings of a movie from the test_movie dataframe, you
can use the following script.
Script 31:
Output:
Script 32:
Output:
Now, you know how to obtain names and ratings of movies from the
test_movie dataframe and how to obtain correlations of all the movies with
a single movie using the movie title.
Next, we will iterate through the three movies in the test_movie dataframe,
find the correlated movies, and then multiply the correlation of all the
correlated movies with the ratings of the input movie. The correlated
movies, along with the weighted correlation (calculated by multiplying the
actual correlation with the ratings of the movies in the test_movie
dataframe), are appended to an empty series named recommended_movies.
Script 33:
1. recommended_movies = pd.Series()
2. for i in range(0, 3):
3.     movie = all_movie_correlations[test_movies['Movie_Name'][i]].dropna()
4.     movie = movie.map(lambda movie_corr: movie_corr * test_movies['Movie_Rating'][i])
5.     recommended_movies = recommended_movies.append(movie)
Script 34:
1. recommended_movies
Output:
To get a final recommendation, you can sort the movies in the descending
order of the weighted correlation, as shown below.
Script 35:
The output shows the list of recommended movies based on the movies
Forrest Gump (1994), Fight Club (1999), and Interstellar (2014).
Output:
You can see from the above output that Forrest Gump (1994) and Fight
Club (1999) have the highest correlation with themselves. Hence, they are
recommended. The movie Interstellar (2014) doesn’t appear on the list
because it might not have passed the minimum 50 ratings threshold. The
remaining movies are the movies recommended by our recommender
system to a user who watched Forrest Gump (1994), Fight Club (1999), and
Interstellar (2014).
Question 2:
Which method is used to find the correlation between columns of two different
Pandas dataframes?
A. get_corr()
B. corr()
C. corrwith()
D. None of the above()
Question 3:
Which method is used to find the correlation between the columns of a single
dataframe?
A. get_corr()
B. corr()
C. corrwith()
D. corrself()
PROJECT
Face detection, as the name suggests, refers to detecting faces from images
or videos and is one of the most common computer vision tasks. Face
detection is a precursor to many advanced tasks such as emotion detection,
interest detection, surprise detection, etc. Face detection is also the first step
in developing face recognition systems.
Script 1:
For detecting face, eyes, and lips, we will be using two images. One image
contains a single person, and the other image contains multiple persons.
Both the images are available in the face_images folder inside the Datasets
directory in the GitHub and SharePoint repositories.
Let’s import both the images first. To do so, you can use the imread()
function of the OpenCV library and pass it the image path.
Script 2:
1. image1 = cv2.imread(r"E:/Datasets/face_images/image1.jpg", 0)
2. image2 = cv2.imread(r"E:/Datasets/face_images/image2.jpg", 0)
Script 3:
Script 4:
1. cv2.data.haarcascades
In the output, you will see a path to the haarcascade files for the Viola-Jones
algorithm.
Output:
C:\ProgramData\Anaconda3\Lib\site-packages\cv2\data
If you go to the path that contains your haarcascade files, you should see the
following files and directories:
Script 5:
1. face_detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
Next, you need to define a method that accepts an image. To detect a
face inside that image, you need to call the detectMultiScale() method of the
face detector object that you initialized in Script 5. Once a face is
detected, you need to draw a rectangle around it. To do so, you need
the x and y coordinates of the face area and the width and height of the
face. Using that information, you can draw a rectangle by calling the
rectangle() method of the OpenCV module. Finally, the image with a rectangle
around the detected face is returned by the function. The detect_face()
method in the following script performs these tasks.
Script 6:
To detect the face, simply pass the face object to the detect_face() method
that you defined in Script 6. The following script passes image1 to the
detect_face() method.
Script 7:
1. detection_result = detect_face(image1)
Finally, to plot the image with face detection, pass the image returned by
the detect_face() method to the imshow() method of the OpenCV module,
as shown below.
Script 8:
1. plt.imshow(detection_result, cmap = "gray")
In the following output, you can see that the face has been detected
successfully in the image.
Output:
Let’s now try to detect faces from image2 , which contains faces of nine
persons. Execute the following script:
Script 9:
1. detection_result = detect_face(image2)
2. plt.imshow(detection_result, cmap = "gray")
The output below shows that out of nine persons in the image, the faces of
six persons are detected successfully.
Output:
OpenCV contains other classifiers as well for face detection. For instance,
in the following script, we define a detect_face() method, which uses the
“haarcascade_frontalface_alt.xml” classifier for face detection. The
following script tries to detect faces in image2.
Script 10:
1. face_detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_alt.xml')
2.
3. def detect_face(image):
4.
5.     face_image = image.copy()
6.
7.     face_rectangle = face_detector.detectMultiScale(face_image)
8.
9.     for (x, y, width, height) in face_rectangle:
10.         cv2.rectangle(face_image, (x, y), (x + width, y + height), (255, 255, 255), 8)
11.
12.     return face_image
13.
14. detection_result = detect_face(image2)
15. plt.imshow(detection_result, cmap = "gray")
The output below shows that now 7 out of the 9 faces are detected, which
means that the "haarcascade_frontalface_alt" classifier performed better than
the "haarcascade_frontalface_default" classifier on this image.
Output:
Script 11:
1. face_detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_alt_tree.xml')
2.
3. def detect_face(image):
4.
5.     face_image = image.copy()
6.
7.     face_rectangle = face_detector.detectMultiScale(face_image)
8.
9.     for (x, y, width, height) in face_rectangle:
10.         cv2.rectangle(face_image, (x, y), (x + width, y + height), (255, 255, 255), 8)
11.
12.     return face_image
13.
14. detection_result = detect_face(image2)
15. plt.imshow(detection_result, cmap = "gray")
Output:
8.4. Detecting Eyes
In addition to detecting faces, you can detect eyes in a face as well. To do
so, you need the haarcascade_eye classifier. The following script creates an
object of haarcascade_eye classifier.
Script 12:
And the following script defines the detect_eye() method, which detects
eyes from a face and then plots rectangles around eyes.
Script 13:
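Scripts 12 and 13 are not reproduced; following the same pattern as the face detector, a sketch would be as follows (the rectangle color and thickness mirror the face-detection scripts):

eye_detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')

def detect_eye(image):

    eye_image = image.copy()

    eye_rectangle = eye_detector.detectMultiScale(eye_image)

    for (x, y, width, height) in eye_rectangle:
        cv2.rectangle(eye_image, (x, y), (x + width, y + height), (255, 255, 255), 8)

    return eye_image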
Script 14:
1. detection_result = detect_eye(image1)
Script 15:
1. plt.imshow(detection_result, cmap = "gray")
From the output below, you can see that the eyes have been successfully
detected from the image1.
Output:
The following script tries to detect eyes inside the faces in image2.
Script 16:
1. detection_result = detect_eye(image2)
2. plt.imshow(detection_result, cmap = "gray")
The output below shows that in addition to detecting eyes, some other
portions of the face have also been wrongly detected as eyes.
Output:
To avoid detecting extra objects in addition to the desired objects, you need
to update the values of the scaleFactor and minNeighbors attributes of the
detectMultiScale() method of the various haarcascade classifier objects. For
instance, to avoid detecting extra eyes in image2, you can update the
detectMultiScale() call of the eye_detector object of the
haarcascade_eye classifier, as follows. Here, we set the value of
scaleFactor to 1.2 and the value of minNeighbors to 4.
Script 17:
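Script 17 is not reproduced; updating the call with the two attributes discussed above would look like this:

def detect_eye(image):

    eye_image = image.copy()

    # Larger scaleFactor and minNeighbors values reduce false detections
    eye_rectangle = eye_detector.detectMultiScale(eye_image, scaleFactor=1.2, minNeighbors=4)

    for (x, y, width, height) in eye_rectangle:
        cv2.rectangle(eye_image, (x, y), (x + width, y + height), (255, 255, 255), 8)

    return eye_image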
Basically, the scaleFactor is used to create your scale pyramid. Your model
has a fixed size specified during training, which is visible in the xml.
Hence, if this size of the face is present in the image, it is detected. By
rescaling the input image, however, a larger face can be resized to a smaller
one, making it detectable by the algorithm.
There are no hard and fast rules for setting the values of the scaleFactor and
minNeighbors attributes. You can play around with different values and
select the ones that give you the best object detection results.
Let's now again try to detect eyes in image2 using the modified values of the
scaleFactor and minNeighbors attributes.
Script 18:
1. detection_result = detect_eye(image2)
2. plt.imshow(detection_result, cmap = "gray")
The output shows that although there are still a few extra detections, the
results are better than before.
Output:
8.5. Detecting Smile
You can also detect a smile within an image using OpenCV implementation
of the Viola-Jones algorithm for smile detection. To do so, you can use the
haarcascade_smile classifier, as shown in the following script.
Script 19:
Script 20:
Script 21:
1. detection_result = detect_smile(image1)
2. plt.imshow(detection_result, cmap = "gray")
The output below shows that we have plenty of extra detections. Hence, we
need to adjust the values of the scaleFactor and minNeighbors attributes.
Output:
Modify the detect_smile() method as follows:
Script 22:
Now, try to detect the smile in image1 using the following script:
Script 23:
1. detection_result = detect_smile(image1)
2. plt.imshow(detection_result, cmap = "gray")
You will get this output. You can see that all the extra detections have now
been removed, and only the lips are detected for a smile.
Output:
Finally, let’s try to detect the lips in image2. Execute the following script:
Script 24:
1. detection_result = detect_smile(image2)
2. plt.imshow(detection_result, cmap = "gray")
The output shows that the lips of most of the people are detected.
Output:
8.6. Face Detection from Live Videos
Since videos are essentially multiple frames of images, you can use the
Viola-Jones Classifier to detect faces in videos. Let’s first define the
detect_face() method, which uses the “haarcascade_frontalface_default”
face detection classifier to detect faces and draw a rectangle around the
face.
Script 25:
1. face_detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
2.
3. def detect_face(image):
4.
5.     face_image = image.copy()
6.
7.     face_rectangle = face_detector.detectMultiScale(face_image)
8.
9.     for (x, y, width, height) in face_rectangle:
10.         cv2.rectangle(face_image, (x, y), (x + width, y + height), (255, 255, 255), 8)
11.
12.     return face_image
Next, to capture a video from your system camera, you can create a
VideoCapture object of OpenCV and pass it 0 as a parameter. Then, to read
the current frame, call the read() method of the VideoCapture object. Each
captured frame is passed to the detect_face() method, and the detected face,
bounded by a rectangle, is displayed in the output. This process continues
until you press the key "q."
Script 26:
1. live_cam = cv2.VideoCapture(0)
2.
3. while True:
4.     ret, current_frame = live_cam.read()
5.
6.     current_frame = detect_face(current_frame)
7.
8.     cv2.imshow("Face detected", current_frame)
9.
10.     key = cv2.waitKey(50)
11.     if key == ord("q"):
12.         break
13.
14. live_cam.release()
15. cv2.destroyAllWindows()
Here is the screenshot of the output for detecting faces from videos.
Output:
Further Readings – Open CV for Face Detection
To study more about face detection with OpenCV, check out these links:
https://opencv.org/
https://bit.ly/2IgitZo
Exercise 8.1
Question 1:
B. Decreased
C. Kept constant
D. All of the Above
Question 2:
Which of the following is not a cascade classifier for face detection in Open CV?
A. haarcascade_frontalface_alt_tree.xml
B. haarcascade_frontalface_alt.xml
C. haarcascade_frontalface_default_tree.xml
D. haarcascade_frontalface_default.xml
Question 3:
To capture a live video from the camera, which of the following values should be
passed as an argument to the cv2.VideoCapture() method?
A. 0
B. 1
C. 2
D. 3
PROJECT
Script 1:
1. import numpy as np
2. import matplotlib.pyplot as plt
3.
4. from tensorflow.keras.layers import Input,Conv2D, Dense, Flatten, Dropout, MaxPool2D
5.
6. from tensorflow.keras.models import Model
9.2. Importing the Dataset
You will be using the EMNIST (Extended MNIST) dataset for this project.
The EMNIST dataset contains various corpora containing images of
handwritten digits and English alphabets. The details of the dataset are
available at this link (https://bit.ly/38KhOKI ).
Script 2:
Let’s see the list of available datasets in EMNIST. Run the following
command.
Script 3:
1. list_datasets()
The EMNIST dataset has the following sub-datasets. The details of these
datasets are available on the official link of the EMNIST website.
Output:
Script 4:
And the following script imports the test images and labels.
Script 5:
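Scripts 2, 4, and 5 are not reproduced. The shapes printed below (124,800 training and 20,800 test images) correspond to the letters subset of EMNIST, so a sketch using the emnist package would be:

# pip install emnist
from emnist import list_datasets, extract_training_samples, extract_test_samples

training_images, training_labels = extract_training_samples('letters')
test_images, test_labels = extract_test_samples('letters')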
Let’s perform some data analysis and preprocessing before we train our
CNN model for English alphabet recognition.
Script 6:
1. print (training_images.shape)
2. print (test_images.shape)
The output below shows that the training set contains 124,800 images of 28
x 28 pixels. Similarly, the test set contains 20,800 images of 28 x 28 pixels.
Output:
In the same way, you can plot the shape of the training and test labels.
Script 7:
1. print (training_labels.shape)
2. print (test_labels.shape)
Output:
(124800,)
(20800,)
Let’s randomly plot the image number 3000 from the test set.
Script 8:
1. plt.figure()
2. plt.imshow(test_images[3000])
3. plt.colorbar()
4. plt.grid(False)
5. plt.show()
The output shows that the image at the 3,000th index of the test set contains
the English letter D.
Output:
Let’s plot the label for the image and see what we get.
Script 9:
1. print (test_labels[3000])
Output:
The label for the image D is 4. This is because the output labels are integers
from 1 to 26, one for each letter. For instance, the label for the letter A is 1,
and the label for the letter Z is 26. Since D is the 4th letter of the English
alphabet, the output label for the image at index 3,000 is 4.
Script 10:
1. np.unique(test_labels)
Output:
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26], dtype=uint8)
The output shows that there are 26 unique labels, from 1 to 26.
The next step is to change the dimensions of our input images. CNNs in
Keras expect data in the format Height-Width-Channels. Our images have a
height and width but no channel dimension. Since the images are grayscale,
we set the number of channels to 1, as shown in the following script:
Script 11:
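A sketch of this reshaping step, assuming NumPy's expand_dims() function is used to append the channel axis:

# Add a single channel dimension: (num_images, 28, 28) -> (num_images, 28, 28, 1)
training_images = np.expand_dims(training_images, -1)
test_images = np.expand_dims(test_images, -1)
print(training_images.shape)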
Output:
Script 12:
1. output_classes = len(set(training_labels))
9.4. Training and Fitting CNN Model
We are now ready to create our CNN model. In project 6, you used the
Keras sequential API for developing a CNN model. Though you can use the
sequential API for this project as well, you will be training your CNN
model using the functional API. The functional API is more flexible and
powerful than the sequential API.
In the sequential API, you define the model first and then use the add()
method to add layers to it. With the functional API, you do not need the
add() method.
With Keras functional API, to connect the previous layer with the next
layer, the name of the previous layer is passed inside the parenthesis at the
end of the next layer. You first define all the layers in a sequence and then
simply pass the input and output layers to your CNN model.
Script 13:
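A sketch of such a model built with the functional API; the dropout rates and the size of the hidden dense layer are assumptions, while the filter counts, kernel sizes, strides, and pooling follow the description below:

# Each layer is called on the output of the previous layer.
i = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), strides=2, activation='relu')(i)
x = MaxPool2D((2, 2), strides=2)(x)
x = Conv2D(64, (3, 3), strides=2, activation='relu')(x)
x = Flatten()(x)
x = Dropout(0.2)(x)                     # assumed dropout rate
x = Dense(512, activation='relu')(x)    # assumed hidden layer size
x = Dropout(0.2)(x)                     # assumed dropout rate
# Labels run from 1 to 26, so one extra output index (0) is allowed for.
x = Dense(output_classes + 1, activation='softmax')(x)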
The CNN model defined in the above script contains one input layer, two
convolutional layers, one flattening layer, one hidden dense layer, and one
output layer. The number of filters in the first convolutional layer is 32,
while the number of filters in the second convolutional layer is 64. The
kernel size for both convolutional layers is 3 x 3, with a stride of 2. After
the first convolutional layer, a max-pooling layer with a size 2 x 2 and stride
2 has also been defined. Dropout layers are also added after the flattening
layer and the first dense layer. The dropout layers are used to reduce
overfitting, which occurs when the model performs well on the training set
but poorly on the test set.
The following script creates our CNN model. You can see that the input and
output layers are passed as parameters to the model.
Script 14:
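Continuing the sketch, the Model class imported in Script 1 ties the input and output layers together:

model = Model(i, x)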
The rest of the process is similar to what you did in project 6. Once the
model is defined, you have to compile it, as shown below:
Script 15:
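A sketch of the compilation step; the choice of the adam optimizer is an assumption, while sparse categorical cross-entropy matches the integer labels used here:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])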
Finally, to train the model, you can use the fit() method, as shown below.
Script 16:
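A sketch of the training call; the batch size is an assumption, while the 20 epochs and the validation data follow the description below:

history = model.fit(training_images, training_labels,
                    batch_size=32,      # assumed batch size
                    epochs=20,
                    validation_data=(test_images, test_labels))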
In the script above, the batch_size attribute specifies the number of records
processed together for training, and epochs define the number of times the
model is trained on the whole dataset. The validation_data attribute is used
to specify the test set for evaluation. After 20 epochs, an accuracy of 87.06
percent is obtained on the test set.
Output:
Script 17:
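A sketch of the accuracy plot, assuming the metric names recorded by the fit() call sketched above:

plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='test accuracy')
plt.legend()
plt.show()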
The output below shows that our model performance increased on both
training and test sets till the 5th epoch, and after that, the model
performance remained stable. Since the accuracy on the test set is better
than the accuracy on the training set, our model is not overfitting.
Output:
In addition to accuracy, you can also plot loss, as shown in the following
script.
Script 18:
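Similarly, a sketch of the loss plot:

plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='test loss')
plt.legend()
plt.show()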
The output below shows that the loss decreased till the 5th epoch, and the
value of the testing loss is less than the training loss, which again shows
that the model is not overfitting.
Output:
9.6. Making Predictions on a Single Image
Let’s make a prediction on a single image. The following script imports the
test set.
Script 19:
Let’s select the image at the 2,000th index of the test set and plot it.
Script 20:
1. plt.figure()
2. plt.imshow(test_images[2000])
3. plt.colorbar()
4. plt.grid(False)
5. plt.show()
The output below shows that the image contains the English letter C.
Output:
Let's plot the label for the image at index 2,000 of the test set.
Script 21:
1. print (test_labels[2000])
Output:
3
In the next step, we will make a prediction on the image at index 2,000
using our trained CNN and see what we get. To make predictions on the test
set, pass it to the predict() method, as shown below:
Script 22:
1. output = model.predict(test_images)
2. prediction = np.argmax(output[2000])
3. print (prediction)
Output:
Our model predicted 3 as the label for the image at index 2,000 of the test
set, which is a correct prediction.
B. Reduce Overfitting
C. Reduce Loss
D. Increase Overfitting
Question 2:
In Keras Functional API, which of the following functions is used to add layers to a
neural network model?
A. add()
B. append()
C. insert()
D. None of the above
Question 3:
Which of the following functions can be used to add a new dimension to a numpy
array?
A. add_dims()
B. append_dims()
C. expand_dims()
D. insert_dims()
PROJECT
Clustering algorithms are unsupervised algorithms where the training data is not
labeled. Rather, the algorithms cluster or group the data points based on common
characteristics. There are two main techniques for clustering data: K-Means
clustering and Hierarchical clustering. In this project, you will use K-Means
clustering for customer segmentation. Before you implement the actual code, let’s
first briefly review what K-Means clustering is.
3. Assign the data point to the cluster of the centroid with the shortest distance.
4. Calculate and update centroid values based on the mean values of the coordinates
of all the data points of the corresponding cluster.
5. Repeat steps 2-4 until the new centroid values for all the clusters no longer change
from the previous centroid values. A short code sketch of these steps is given below.
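A minimal NumPy sketch of the assignment and update steps described above (the random initialization is an assumption about the first step, and empty clusters are not handled):

import numpy as np

def simple_kmeans(X, k, iterations=100):
    # Pick k random data points as the initial centroids.
    rng = np.random.default_rng(0)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iterations):
        # Compute the distance of every point to every centroid and
        # assign each point to its nearest centroid (step 3).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its assigned points (step 4).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop once the centroids no longer change (step 5).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels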
The following are some of the disadvantages of the K-Means clustering algorithm.
1. The value of K has to be chosen manually.
Enough of theory. Let’s see how to use K-Means clustering for customer
segmentation.
1. import numpy as np
2. import pandas as pd
3. from sklearn.datasets import make_blobs
4. from sklearn.cluster import KMeans
5. from matplotlib import pyplot as plt
6. import seaborn as sns
7. %matplotlib inline
Script 2:
1. dataset = pd.read_csv(r'E:\Datasets\Mall_Customers.csv')
The following script prints the first five rows of the dataset.
Script 3:
1. dataset.head()
The below output shows that the dataset has five columns: CustomerID, Genre,
Age, Annual Income (K$), and Spending Score (1-100). The spending score is the
score assigned to customers based on their previous spending habits. Customers
with higher spending in the past have higher scores.
Output:
Let’s see the shape of the dataset.
Script 4:
1. dataset.shape
The output below shows that the dataset contains 200 records and 5 columns.
Output
(200, 5)
Script 5:
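A sketch of such a histogram, assuming seaborn is used and assuming the column name as it appears in the CSV file:

sns.histplot(dataset['Annual Income (k$)'], bins=10)
plt.show()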
The output shows that most of the customers have incomes between 60 and 90K
per year.
Output:
Similarly, we can plot a histogram for the spending scores of the customers, as well.
Script 6:
The output shows that most of the customers have a spending score between 40 and
60.
Output:
We can also plot a regression line between annual income and spending score to see
if there is any linear relationship between the two or not.
Script 7:
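A sketch of the regression plot, assuming seaborn's regplot() function and the column names from the CSV file:

sns.regplot(x='Annual Income (k$)', y='Spending Score (1-100)', data=dataset)
plt.show()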
From the regression line in the output below, you can infer that there is no linear
relationship between annual income and spending score.
Output:
Finally, you can also plot a linear regression line between the Age column and the
spending score.
Script 8:
The output confirms an inverse linear relationship between age and spending score.
It can be inferred from the output that young people have higher spending
compared to older people.
Output:
Enough of the data analysis. We are now ready to perform customer segmentation
on our data using the K-Means algorithm.
Script 9:
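A sketch of this filtering step; selecting the last two columns by position avoids depending on the exact column names:

# Keep only the annual income and spending score columns.
dataset = dataset.iloc[:, 3:5]
dataset.head()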
The output shows that we now have only the annual income and spending score
columns in our dataset.
Output:
To implement K-Means clustering, you can use the KMeans class from the
sklearn.cluster module of the Sklearn library. You have to pass the number of
clusters as the n_clusters parameter to the KMeans class constructor. To train the
K-Means model, simply pass the dataset to the fit() method of the KMeans class
object, as shown below.
Script 10:
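A sketch of this step, using the km_model variable name that the later scripts refer to:

km_model = KMeans(n_clusters=4)
km_model.fit(dataset)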
Output
KMeans(n_clusters=4)
Once the model is trained, you can print the cluster centers using the
cluster_centers_ attribute of the KMeans class object.
Script 11:
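As a sketch:

print(km_model.cluster_centers_)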
Output
[[48.26 56.48 ]
[86.53846154 82.12820513]
[87. 18.63157895]
[26.30434783 20.91304348]]
In addition to finding cluster centers, the KMeans class also assigns a cluster label
to each data point. The cluster labels are numbers that basically serve as cluster IDs.
For instance, in the case of four clusters, the cluster IDs are 0, 1, 2, and 3.
To print the cluster IDs for all the records, you can use the labels_ attribute of the
KMeans class object, as shown below.
Script 12:
Output
[3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3 0 3
0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2
1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1
2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1]
The following script prints the clusters in different colors along with the cluster
centers as black data points, as shown below.
Script 13:
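A sketch of such a plot; it assumes the two columns kept above are converted to a NumPy array and the points are colored by their predicted cluster labels:

features = dataset.values
plt.scatter(features[:, 0], features[:, 1], c=km_model.labels_, cmap='rainbow')
plt.scatter(km_model.cluster_centers_[:, 0],
            km_model.cluster_centers_[:, 1],
            color='black')
plt.show()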
Output:
So far in this project, we have been arbitrarily choosing the value of K, i.e., the
number of clusters. However, we do not know exactly how many segments of
customers there are in our dataset. To find the optimal number of customer
segments, we need to find the optimal value of K because K defines the number
of clusters.
There is a way to find the ideal number of clusters. The method is known as the
elbow method.
The inertia is the sum of the squared distances between the data points and their
closest cluster center. Smaller inertia means that the data points within each cluster
are close to the cluster center, i.e., the clusters are compact.
To calculate the inertia value, you can use the inertia_ attribute of the KMeans class
object. The following script computes inertia values for K = 1 to 10 and plots them
in the form of a line plot.
Script 14:
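A sketch of the elbow method loop, assuming K values from 1 to 10:

inertia_values = []
for k in range(1, 11):
    km = KMeans(n_clusters=k)
    km.fit(dataset)
    inertia_values.append(km.inertia_)

plt.plot(range(1, 11), inertia_values, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Inertia')
plt.show()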
From the output below, it can be seen that the value of inertia didn’t decrease much
after five clusters.
Output:
Let’s now segment our customer data into five groups by creating five clusters.
Script 15:
Output
KMeans(n_clusters=5)
Script 16:
Output:
From the above output, you can see that the customers are divided into five
segments. The customers in the middle of the plot (in purple) are the customers
with an average income and average spending. The customers belonging to the red
cluster are the ones with a low income and low spending. You need to target the
customers who belong to the top right cluster (sky blue). These are the customers
with high incomes and high spending in the past, and they are more likely to spend
in the future, as well. So any new marketing campaigns or advertisements should be
directed at these customers.
Script 17:
Here is the output. From the output, you can see that the coordinates of the centroid
for the top right cluster are 86.53 and 82.12. These centroid values are located at
index 1, which is also the ID of the cluster.
Output
[[55.2962963 49.51851852]
[86.53846154 82.12820513]
[25.72727273 79.36363636]
[88.2 17.11428571]
[26.30434783 20.91304348]]
To fetch all the records from the cluster with id 1, we will first create a dataframe
containing index values of all the records in the dataset and their corresponding
cluster labels, as shown below.
Script 18:
1. cluster_map = pd.DataFrame()
2. cluster_map['data_index'] = dataset.index.values
3. cluster_map['cluster'] = km_model.labels_
4. cluster_map
Output:
Next, we can simply filter all the records from the cluster_map dataframe, where
the value of the cluster column is 1. Execute the following script to do so.
Script 19:
1. cluster_map = cluster_map[cluster_map.cluster==1]
2. cluster_map.head()
Here are the first five records that belong to cluster 1. These are the customers that
have high incomes and high spending.
Output:
Further Readings – Customer Segmentation via Clustering
To study more about clustering for customer segmentation, look at these links:
https://bit.ly/3nqe9FI
https://bit.ly/36EApVw
https://bit.ly/3nqhiW4
Exercise 10.1
Question 1
B. Hierarchical Clustering
Question 2
Question 3
Which loop should be used when you want to repeatedly execute a block of code a
specific number of times?
A. For Loop
B. While Loop
C. Both A & B
D. None of the above
Answer: A
Question 2:
What is the maximum number of values that a function can return in Python?
A. Single Value
B. Double Value
Answer: C
Question 3:
B. Out
C. Not In
D. Both A and C
Answer: D
Exercise 1.1
Question 1:
Which attribute of the LinearRegression class is used to print the linear regression
coefficients of a trained algorithm:
A. reg_coef
B. coefficients
C. coef_
D. None of the Above
Answer: C
Question 2:
To make a prediction on a single data point, the data features should be in the form
of a_________:
A. column vector
B. row vector
Answer: B
Question 3:
Answer: A
Exercise 2.1
Question 1:
Which attribute of the TfidfVectorizer is used to define the minimum word count:
A. min_word
B. min_count
C. min_df
D. None of the Above
Answer: C
Question 2:
Which method of the MultinomialNB object is used to train the algorithm on the
input data:
A. train()
B. fit()
C. predict()
D. train_data()
Answer: B
Question 3:
B. Unsupervised
C. Reinforcement
D. Lazy
Answer: A
Exercise 3.1
Question 1 :
In a neural network with three input features, one hidden layer of five nodes, and an
output layer with three possible values, what will be the dimensions of the weight
matrix that connects the input to the hidden layer? Remember, the dimensions of the
input data are (m,3), where m is the number of records.
A. [5,3]
B. [3,5]
C. [4,5]
D. [5,4]
Answer: B
Question 2:
Which of the following loss functions can you use in case of a regression problem:
A. Sigmoid
Answer: C
Question 3:
Neural networks with hidden layers are capable of finding:
A. Linear Boundaries
B. Non-linear Boundaries
Answer: C
Exercise 4.1
Question 1:
The shape of the feature set passed to the LSTM’s input layer should be:
A. Number of Records, Features, Timesteps
Answer: D
Question 2:
B. Diminishing Gradient
C. Low Gradient
D. None of the Above
Answer: B
Question 3:
An RNN is useful when the data is in the form of:
A. A table with unrelated records
Answer: C
Exercise 5.1
Question 1:
The process where the ground truth value of the previous output is fed as input to
the next timestep is called:
A. Truth Labeling
B. Input Labeling
C. Input Forcing
D. Teacher Forcing
Answer: D
Question 2:
In the seq2seq model, the input to the node in the decoder layer is:
A. Hidden state from the encoder
Answer: D
Question 3:
To end predictions using decoder LSTM in seq2seq, what strategy is adopted?
A. End sentence if maximum sentence length is achieved
C. Both A and B
D. None of the Above
Answer: C
Exercise 6.1
Question 1:
What should be the input shape of the input image to the convolutional neural
network?
A. Width, Height
B. Height, Width
Answer: D
Question 2:
B. Image is distorted
C. Image is compressed
D. All of the above
Answer: D
Question 3:
The ReLu activation function is used to introduce:
A. Linearity
B. Non-linearity
C. Quadraticity
D. None of the above
Answer: B
Exercise 7.1
Question 1:
Answer: D
Question 2:
Which method is used to find the correlation between the columns of two different
Pandas dataframes?
A. get_corr()
B. corr()
C. corrwith()
D. None of the above
Answer: C
Question 3:
Which method is used to find the correlation between the columns of a single
dataframe?
A. get_corr()
B. corr()
C. corrwith()
D. corrself()
Answer: B
Exercise 8.1
Question 1:
B. Decreased
C. Kept constant
D. All of the Above
Answer: A
Question 2:
Which of the following is not a cascade classifier for face detection in OpenCV?
A. haarcascade_frontalface_alt_tree.xml
B. haarcascade_frontalface_alt.xml
C. haarcascade_frontalface_default_tree.xml
D. haarcascade_frontalface_default.xml
Answer: C
Question 3:
To capture live video from the camera, which of the following values should be
passed as an argument to cv2.VideoCapture() method?
A. 0
B. 1
C. 2
D. 3
Answer: A
Exercise 9.1
Question 1:
B. Reduce Overfitting
C. Reduce Loss
D. Increase Overfitting
Answer: B
Question 2:
In Keras Functional API, which of the following functions is used to add layers to a
neural network model?
A. add()
B. append()
C. insert()
D. None of the above
Answer: D
Question 3:
Which of the following functions can be used to add a new dimension to a numpy
array?
A. add_dims()
B. append_dims()
C. expand_dims()
D. insert_dims()
Answer: C
Exercise 10.1
Question 1:
B. Hierarchical Clustering
Answer: D
Question 2:
Question 3:
Answer: D