Rajaram Reghuram

FACE MASK DETECTION USING
MACHINE LEARNING
A MINI PROJECT REPORT

SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE AWARD OF DEGREE OF
BACHELOR OF TECHNOLOGY
in
Computer Science and Engineering
of
APJ Abdul Kalam Technological University
by
Rajaram Reghuram (VAS19CS083)
Safna R M (VAS19CS088)
Vydharsh K R (VAS19CS122)
(AN ISO 9001:2015 CERTIFIED INSTITUTION )
Department of Computer Science and Engineering

Vidya Academy of Science and Technology
Thalakottukara, Thrissur - 680 501
( http://www.vidyaacademy.ac.in)
July 2022
Vidya Academy of Science and Technology
Thalakottukara, Thrissur - 680 501
(http://www.vidyaacademy.ac.in)
(AN ISO 9001:2015 CERTIFIED INSTITUTION )
Certificate
This is to certify that the Mini Project Report titled “FACE MASK DETECTION USING
MACHINE LEARNING” is a bonafide record of the work carried out by
Rajaram Reghuram (Univ. Reg. No. VAS19CS083) of Vidya Academy of Science
and Technology, Thalakottukara, Thrissur - 680 501 in partial fulfillment of the
requirements for the award of Degree of Bachelor of Technology in Computer Science and
Engineering of APJ Abdul Kalam Technological University, during the academic year
2021-2022. The Mini Project Report has been approved as it satisfies the academic
requirements in respect of mini project work prescribed for the said degree.
Project Guide/Supervisor: Dr Beena M V, Assistant Professor, Dept. of CSE
Head of Department: Dr Ramani Bai V, Professor, Dept. of CSE
Undertaking
I, Rajaram Reghuram (Univ. Reg. No. VAS19CS083), hereby undertake that the mini
project work entitled “FACE MASK DETECTION USING MACHINE LEARNING”
is carried out by me independently under the valuable guidance of Dr Beena M V,
Assistant Professor, Department of Computer Science and Engineering, Vidya Academy
of Science and Technology, Thalakottukara, Thrissur in partial fulfillment of the
requirements for the award of degree of Bachelor of Technology in Computer Science and
Engineering of APJ Abdul Kalam Technological University, during the academic year
2021-2022.
Thrissur Rajaram Reghuram

July 2022
Acknowledgement
During the course of my mini project work, several persons collaborated directly
and indirectly with me. Without their support it would have been impossible for me to
finish my work. That is why I wish to dedicate this section to recognizing their support.
I want to start by expressing my thanks to my project guide Dr Beena M V, Assistant
Professor, Dept. of Computer Science and Engineering, for her valuable advice
and guidance towards this work. I received motivation, encouragement and support from
her during the course of the work.
I am grateful to all the faculty members of our department for
their support. I also express my gratitude to all my friends for their support and help
with this work.
I am thankful to Dr Ramani Bai V, Head of Computer Science and Engineering
Department, and our Principal Dr Saji C B, for their kind co-operation.
Last but not least, I wish to express my gratitude to God Almighty for his abundant
blessings, without which this effort would not have been successful.
Rajaram Reghuram
Univ. Reg. No. VAS19CS083
Sixth Semester B.Tech (2019 Admission)
Vidya Academy of Science & Technology
July 2022 Thrissur - 680 501.
Computer Science and Engineering i VAST, Thalakottukara

Abstract
The COVID-19 pandemic has caused a worldwide revolution in healthcare. People all over
the world have adopted sanitary and preventive measures to stop contagious viral
infections from passing from one person to another. Most viruses and other
disease-causing organisms spread mainly through droplets which emerge from
an infected person and pose a risk to others. The risk of transmission is highest
in public places.
According to the WHO, one of the best ways to avoid infection is to wear a face mask in
public spaces. Checking manually whether people are wearing masks is resource-intensive
and prone to human error. There is therefore an immediate need for a solution that helps
curb the spread of the virus caused by people not wearing masks in public. In this
project, we propose a method to detect in real time whether people are wearing masks.
This project aims to detect face masks using machine learning. It focuses
on a solution to help enforce mask wearing in public using object detection on images.
TensorFlow and Keras are used to build a CNN model that is trained on a data set to
detect face masks. The experimental results presented in this report indicate that the
detection of masked faces is robust and fast.

Contents
CERTIFICATE
UNDERTAKING
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF SYMBOLS AND ABBREVIATIONS
1 INTRODUCTION
1.1 General
1.2 Objectives of the Work
1.3 Motivation for this work
1.4 Methodologies Adopted
2 LITERATURE REVIEW
2.1 LITERATURE SURVEY
2.1.1 Covid-19 Face Mask Detection Using TensorFlow, Keras & OpenCV
2.1.2 Face Mask Detection Using OpenCV
2.1.3 Face Mask Detection Using Machine Learning
3 SYSTEM DESIGN
3.1 System Architecture
3.1.1 Data Pre Processing
3.1.2 Training
3.1.3 Testing
3.2 Data Flow Diagram
4 METHODOLOGY
4.1 Deep Learning
4.1.1 How Does Deep Learning Work?
4.2 Convolutional Neural Network (CNN)
4.2.1 Convolutional Layer
4.2.2 Pooling Layer
4.2.3 Fully Connected Layer
4.3 ReLU layer

4.3.1 Image Flattening
4.3.2 Dropout Layer
4.3.3 Dense Layer
4.3.4 Softmax Layer
4.4 CNN Architecture Design
4.4.1 MobileNetV2
5 IMPLEMENTATION
5.1 Requirements
5.1.1 Tensorflow
5.1.2 Keras
5.1.3 Python 3.7
5.1.4 OS
5.1.5 OpenCV
5.1.6 PIL
5.1.7 NumPy
5.1.8 Scikit-learn
5.1.9 Matplotlib
5.2 Data Set
5.3 Implementation Steps
5.4 Hyper Parameters for Convolutional Neural Network
6 EVALUATION AND RESULT
6.1 Evaluation of Model
6.2 Precision
6.3 Result
7 CONCLUSION AND SCOPE OF FUTURE WORK
7.1 Conclusion
7.2 Future Scope
BIBLIOGRAPHY

List of Figures
3.1 System Architecture
3.2 Data Flow Diagram
4.1 Deep Learning Process
4.2 Basic CNN architecture
4.3 Convolutional Layer
4.4 MobileNetV2 Architecture
5.1 Non Masked Data Set
5.2 Masked Data Set
5.3 Hyper-Parameters
6.1 Training Loss and Accuracy
6.2 Result - Partially Masked 1
6.3 Result - Partially Masked 2
6.4 Result - Without Mask
6.5 Result - With Mask

List of Symbols and Abbreviations
WHO World Health Organization

CNN Convolutional Neural Network
DFD Data Flow Diagram
IDE Integrated Development Environment
ANN Artificial Neural Network
RCNN Region-Based Convolutional Neural Network
PCA Principal Component Analysis
CV Computer Vision
ROI Region Of Interest
FCLayer Fully Connected Layer

Chapter 1
INTRODUCTION
1.1 General
COVID-19 has had a massive impact on human lives. The pandemic led to the loss of
millions of lives and affected billions of people. Its negative impact was felt in
almost every sphere: commercial establishments, education, the economy, religion,
transport, tourism, employment, entertainment, food security and other industries.
According to the WHO, 55.6 million people had been infected with the coronavirus and
1.34 million had died of it as of November 2020. It is extremely hard to keep track of
the spread of COVID-19.
COVID-19 and other viral diseases spread mainly through droplets present in the air.
Because of this, the virus spreads rapidly among the masses. With nationwide lockdowns
being lifted, it has become even harder to track and control the virus. Face masks are
an effective way to control its spread. It has been reported that wearing face
masks is 96% effective in stopping the spread of the virus. Governments all over the
world have imposed strict rules that everyone should wear a mask when going out. Still,
some people may not wear masks, and it is hard to check whether everyone is wearing
one. In such cases, computer vision can be of great help.
This situation increases the demand for an efficient system for detecting face masks on
people in transport facilities, densely populated areas, residential districts,
large-scale manufacturing units and other enterprises to ensure safety. This project
uses machine learning classification with OpenCV and TensorFlow to detect face masks
on people.
1.2 Objectives of the Work

The primary objective of the project is to detect faces with and without face masks.
First, we collect a data set of images of people with and without face masks. A model
is then trained on this data set to detect faces and to classify each detected face as
masked or unmasked. Finally, we use the trained model to detect people with and without
masks in real time.

1.3 Motivation for this work

Many infections and diseases spread through the air, mainly through droplets which
emerge from an infected person and pose a risk to others.
The risk of transmission is highest in public places. According to the WHO, one of the
best ways to avoid infection is to wear a face mask in public spaces.
This gave us the idea of providing a detection system to ensure that people wear masks
in public places.
1.4 Methodologies Adopted

A Convolutional Neural Network (CNN) is a type of neural network used in signal
and image processing. A CNN is chosen as the classifier because it gives high accuracy.
Our proposed system has three phases.
The first phase is data pre-processing, where the data set is collected and categorised
into two categories, with-mask and without-mask. The data set consists of images of
masked and unmasked people. The categorised data is further divided into a training set
and a testing set. Augmentation is then applied to the data set.
The next phase is the training phase, where we train a model on the data set to
detect masked and unmasked people. This model is developed with the help of TensorFlow,
Keras and other deep learning libraries in order to provide a highly accurate model.
The last phase is the testing phase, where we use the trained model to detect people in
real time. This is done with the help of the OpenCV library, which has functions
supporting real-time face detection. Faces are detected in the video stream and
extracted to check whether each face is masked or not. The result of the detection is
then displayed along with its precision.

Chapter 2
LITERATURE REVIEW
2.1 LITERATURE SURVEY

2.1.1 Covid-19 Face Mask Detection Using TensorFlow, Keras & OpenCV
• Authors : Arjya Das, Mohammad Wasif Ansari, Rohini Basak
• Source : IEEE
• Year : 2020
Face mask detection involves detecting the location of the face and then determining
whether it has a mask on it or not. The problem is closely related to general object
detection, which detects classes of objects. Face identification deals specifically
with distinguishing one particular class of entities, namely faces. It has numerous
applications, such as autonomous driving, education, surveillance, and so on. This paper
presents a simplified approach to serve the above purpose using basic Machine Learning
(ML) packages such as TensorFlow, Keras, OpenCV and Scikit-Learn.
The proposed method detects the face in the image correctly and then identifies
whether it has a mask on it or not. As a surveillance task performer, it can also detect
a face along with a mask in motion. The method attains an accuracy of up to 95.7% on the
data sets used. The authors explore optimized values of parameters using a Sequential
Convolutional Neural Network model to detect the presence of masks correctly without
causing over-fitting.
2.1.2 Face Mask Detection Using OpenCV

• Authors : Harish A, D. Kalyani, R . Krishna Sri, M . Pratapteja, P V R D P Rao
• Source : IEEE
• Year : 2021
The COVID-19 pandemic is causing a worldwide emergency in healthcare. The virus
spreads mainly through droplets which emerge from a person infected with coronavirus
and pose a risk to others. The risk of transmission is highest in public places. One of
the best ways to avoid infection is to wear a face mask in public spaces,
as indicated by the World Health Organization (WHO).
The authors propose a method which employs TensorFlow and OpenCV to
detect face masks on people. A bounding box drawn over the face of the person indicates
whether the person is wearing a mask or not. If the person's face is stored in the
database, the system identifies the name of the person who is not wearing a face mask
and sends that person an email warning that they are not wearing a mask, so that they
can take precautions.
2.1.3 Face Mask Detection Using Machine Learning

• Authors : H.A.P. Nithya Priya, V. Mavitha Sri, S. Monisha, P. Prathibha, K. Rekha
• Source : IRJET
• Year : 2021
The COVID-19 pandemic has caused a major health crisis for people in most parts of
the world. Maintaining social distancing and compulsorily wearing a mask are among the
most effective preventive measures. Wearing a mask mainly reduces the
risk of transmission of the disease.
The authors present a hybrid model that uses classical machine learning algorithms
and deep learning for the detection. The data set includes images with and without
masks, and OpenCV is used for real-time detection using a webcam. As it
is compulsory to wear face masks in public areas, such systems can be implemented for
safety and security reasons. The same model
can also be used in workplaces to ensure that all employees wear their masks throughout
the day.
The authors use the data set to build a COVID-19 face mask detector using computer
vision, TensorFlow, Python and Keras. The main goal is to identify whether a person in
an image or video stream is wearing a face mask with the help of deep learning.
Face recognition is also one of the biometric modalities.

Chapter 3
SYSTEM DESIGN
System design is the process of designing the elements of a system such as the architec-
ture, modules and components, the different interfaces of those components and the data
that goes through that system.
3.1 System Architecture
Figure 3.1: System Architecture

3.1.1 Data Pre Processing

In any deep learning project, it is critical to set up a trustworthy validation scheme
in order to properly evaluate and compare models. This is especially true if the data
set is of small to medium size, or if the evaluation metric is unstable. In the
pre-processing stage, image resizing and contrast enhancement take place. Label '0' is
assigned to images without a mask and label '1' to images with a mask.
3.1.2 Training
The training data is used to fit the model, the cross-validation data is used to tune
the training algorithm for better accuracy and efficiency, and the test data is used to
see how well the trained model predicts on new inputs.
The data set is divided in an 80:20 ratio: 80 percent
for training and 20 percent for testing.
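The 80:20 division can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual code: the function name `split_dataset`, the file names and the fixed seed are assumptions made for the example, and a real pipeline would more likely use a library helper for the split.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=42):
    """Shuffle the samples and split them into training and testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical (file name, label) pairs: 0 = without mask, 1 = with mask.
samples = [(f"img_{i:03d}.jpg", i % 2) for i in range(100)]
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))  # 80 20
```

Shuffling before cutting matters here: if the images were stored class by class, an unshuffled cut would put only one class in the test set.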
3.1.3 Testing
Evaluating how well the machine learning method works is called testing. The test set is
a set of observations used to evaluate the performance of the model using some perfor-
mance metric. It is important that no observations from the training set are included in
the test set. If the test set does contain examples from the training set, it will be difficult to
assess whether the algorithm has learned to generalize from the training set or has simply
memorized it.
3.2 Data Flow Diagram

A data flow diagram (DFD) maps out the flow of information for any process or system.
It uses defined symbols like rectangles, circles and arrows, plus short text labels,
to show data inputs, outputs, storage points and the routes between each destination.
Data flow diagrams can range from simple, even hand-drawn process overviews, to in-depth,
multi-level DFDs that dig progressively deeper into how the data is handled. They can
be used to analyze an existing system or model a new one.
Like all the best diagrams and charts, a DFD can often visually “say” things that
would be hard to explain in words, and it works for both technical and non-technical
audiences, from developer to CEO. That is why DFDs remain so popular after all these
years. While they work well for data flow software and systems, they are less applicable
nowadays to visualizing interactive, real-time or database-oriented software or systems.

Figure 3.2: Data Flow Diagram
The above diagram maps out the flow of information in the proposed system, using
defined symbols like circles, rectangles, and arrows. The
data set initially collected is pre-processed. The pre-processed data set is then split
into two sections, training data and testing data. Training is performed on the training
data to obtain the trained model. The test data is then passed to the model to measure
its success rate and efficiency. The output of the testing stage is the classification
of the test data into with-mask and without-mask categories.

Chapter 4
METHODOLOGY
4.1 Deep Learning

Deep learning models can learn the relevant features themselves with little guidance
from the programmer and are very helpful in tackling the problem of high
dimensionality. Deep learning algorithms are used especially when there is
a huge number of inputs and outputs. Deep learning evolved from
machine learning, which is itself a subset of artificial intelligence. Just as the idea
behind artificial intelligence is to mimic human behaviour, the idea of deep
learning is to build algorithms that can mimic the brain. Deep learning is implemented
with the help of neural networks, whose design is motivated by
biological neurons, that is, brain cells.
4.1.1 How Does Deep Learning Work?

In deep learning, a computer model learns to perform classification tasks directly from
images, text, or sound. It performs a task repeatedly, making small adjustments each
time to improve the outcome. On some tasks, deep learning models can exceed human-level
performance. Models are trained using a large set of labeled data and neural network
architectures that contain many layers. The most important part of a deep learning
neural network is a layer of computational nodes called “neurons”.
In a fully connected network, every neuron connects to all of the neurons in the
adjacent layer. A network is called “deep” when it uses at least two hidden layers. The
additional hidden layers enable more in-depth calculations. How does the
algorithm work then? Each connection has a weight that expresses its importance, and
during training the deep neural network automatically finds the features that matter
most for classification. Each neuron applies an activation function that
determines the signal it passes on, much as in the human
brain.

Figure 4.1: Deep Learning Process
4.2 Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a deep learning algorithm which can take in
an input image, assign importance (learnable weights and biases) to various aspects or
objects in the image and differentiate one from the other. The pre-processing required
by a CNN is much lower than for other classification algorithms. While in primitive
methods filters are hand-engineered, with enough training CNNs have the ability to
learn these filters/characteristics.
Figure 4.2: Basic CNN architecture
A convolutional neural network is a multi-layered feed-forward
neural network, built by stacking many hidden layers on top of each other in a
particular order. This sequential design allows a CNN to learn hierarchical
features. A CNN consists mainly of three types of layers: the convolutional
layer, the pooling layer and the fully connected layer. Each additional layer increases
the complexity of the CNN, which allows it to capture more abstract features. Feature
extraction is done by the input, convolutional and pooling layers. Classification is
done by the fully connected layer and the output layer.

4.2.1 Convolutional Layer

The convolutional layer is the first layer and operates on the input image, which is
represented as a matrix of numerical pixel values. In this layer, a series of
mathematical operations is performed to extract the feature
map of the input image. A convolutional layer requires a few components: input
data, a filter, and a feature map. The filter acts as a feature detector. It moves
across the input image and checks whether the feature is present.
Figure 4.3: Convolutional Layer
Filters are 2D arrays of weights that can vary in size. A filter is applied to one part
of the image at a time. The filter is shifted step by step, starting from the top-left
corner of the image matrix; at each position the overlapping values are multiplied
element-wise and summed to give one entry of the feature map. The output of the
convolutional layer is called the feature map.
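The sliding-filter operation can be illustrated with a small pure-Python sketch. The function name `convolve2d`, the 4x4 image and the 2x2 filter are made up for the example; no padding is applied, and, as in most deep learning libraries, the filter is not flipped (strictly speaking this is cross-correlation).

```python
def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the overlapping values element-wise and sum them.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Illustrative 4x4 "image" and 2x2 filter (values chosen arbitrarily).
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, -1, -1], [-1, 1, 3], [2, 0, -2]]
```

A 4x4 input with a 2x2 filter and stride 1 gives a 3x3 feature map, which shows why the output of a convolution without padding is slightly smaller than its input.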
4.2.2 Pooling Layer

The output of the convolutional layer, after the activation function has been applied,
is passed to the next layer, called the pooling layer. This layer reduces the size of
the output matrix obtained from the convolutional layer, a process called
down-sampling. Filters of different sizes can be used in the pooling layer, but unlike
convolutional filters they do not have any weights. Pooling involves selecting a
pooling operation, much like a filter, to be applied to the feature maps. The size of
the pooling filter is smaller than the size of the feature map. Two common pooling
operations are:
Average Pooling: The filter is shifted step by step, starting from the top-left corner
of the feature map, and the average value of each patch is calculated. This
average value is passed to the new, reduced matrix.
Maximum Pooling: The filter is shifted step by step, starting from the top-left corner
of the feature map, and the maximum value of each patch is calculated. This
maximum value is passed to the new, reduced matrix.
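Both pooling operations can be sketched in a few lines of plain Python. The function name `pool2d`, the 2x2 window with stride 2 and the sample values are assumptions chosen for illustration only.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Down-sample a 2D feature map with max or average pooling."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values under the pooling window.
            patch = [feature_map[r + i][c + j]
                     for i in range(size) for j in range(size)]
            row.append(max(patch) if mode == "max" else sum(patch) / len(patch))
        pooled.append(row)
    return pooled

# Illustrative 4x4 feature map (values chosen arbitrarily).
feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[7, 8], [9, 6]]
print(avg_pooled)  # [[4.0, 5.0], [4.5, 3.0]]
```

The pooling window has no weights: it only selects or averages existing values, so this layer adds no trainable parameters.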
4.2.3 Fully Connected Layer

The fully connected layer is the final main layer in a CNN. It is a simple feed-forward
neural network. The input to the fully connected layer is the output of the final
pooling or convolutional layer, which is flattened and then fed into the fully
connected layer. The output of any pooling or convolutional layer is a 3-dimensional
matrix; to flatten it is to unroll all its values into a vector. Recognition and
classification are performed in this layer.
Several convolutional and pooling layers are usually stacked on top of each other
to extract more abstract feature representations in moving through the network. The
fully connected layers that follow these layers interpret these feature representations
and perform the function of high-level reasoning.
4.3 ReLU layer

ReLU (rectified linear unit) is an activation function applied to increase the
non-linearity of the network without affecting the receptive fields of the
convolutional layers. Its output is f(x) = max(0, x). Images are
made of different objects that are not linearly related to each other. Without this
function, image classification would be treated as a linear problem although it is
actually a non-linear one. In a neural network, the activation function is responsible
for transforming the summed weighted input of a node into the activation, or output, of
that node.
The rectified linear activation function, or ReLU for short, is a piece-wise linear
function that outputs the input directly if it is positive and outputs zero otherwise.
It has become the default activation function for many types of neural networks because
a model that uses it is easier to train and often achieves better performance. In order
to use stochastic gradient descent with back-propagation of errors to train deep neural
networks, an activation function is needed that looks and acts like a linear function
but is, in fact, a non-linear function allowing complex relationships in the data to be
learned. The function must also provide more sensitivity to the activation sum input
and avoid easy saturation.
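As a minimal sketch, the rectifier f(x) = max(0, x) can be written directly in Python and applied element-wise; the input values below are arbitrary examples.

```python
def relu(x):
    """Rectified linear unit: pass positive inputs through, clamp the rest to zero."""
    return max(0.0, x)

# Apply ReLU element-wise to a few illustrative pre-activation values.
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
activations = [relu(v) for v in pre_activations]
print(activations)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

For positive inputs the function is the identity, which is the "looks and acts like a linear function" property; the kink at zero is what makes it non-linear.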
4.3.1 Image Flattening

Once pooling is done, the output needs to be converted into a one-dimensional structure
that an artificial neural network can use to perform the classification. Flattening
transforms the entire pooled feature map matrix into a single column vector, which is
then passed to the input layer of the neural network for processing. Note that the
number of dense layers as well as the number of neurons can vary depending on the
problem statement. Often a dropout layer is also added to prevent over-fitting.
Dropout ignores a few of the activation maps during training but uses all activation
maps during the testing phase. It prevents over-fitting by reducing the correlation
between neurons.
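Flattening can be illustrated with a short pure-Python sketch. The function name `flatten` and the two 2x2 pooled maps are assumptions made for the example; in a framework this step is normally a single built-in layer.

```python
def flatten(pooled_maps):
    """Unroll a stack of 2D pooled feature maps into one flat vector."""
    return [value for fmap in pooled_maps for row in fmap for value in row]

# Two illustrative 2x2 pooled feature maps (one per filter).
pooled_maps = [
    [[7, 8], [9, 6]],
    [[1, 0], [2, 3]],
]
vector = flatten(pooled_maps)
print(vector)  # [7, 8, 9, 6, 1, 0, 2, 3]
```

The vector length equals the product of the map dimensions (here 2 x 2 x 2 = 8), which fixes the number of inputs of the first dense layer.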
4.3.2 Dropout Layer

Deep learning neural networks are likely to quickly over-fit a training data set with
few examples. Ensembles of neural networks with different model configurations are known
to reduce over-fitting, but require the additional computational expense of training and
maintaining multiple models. A single model can be used to simulate having a large
number of different network architectures by randomly dropping out nodes during training.
This is called dropout and offers a very computationally cheap and remarkably effective
regularization method to reduce over-fitting and improve generalization error in deep
neural networks of all kinds.
Usually, when all the features are connected to the FC layer, it can cause over-fitting
on the training data set. Over-fitting occurs when a model fits the training data so
closely that its performance on new data suffers. To overcome this problem, a dropout
layer is used, in which a few neurons are dropped from the neural network during the
training process, so that a smaller network is effectively trained at each step.
With a dropout rate of 0.50, 50% of the nodes are dropped out randomly
from the neural network. Dropout is implemented per layer in a neural network. It can
be used with most types of layers, such as dense fully connected layers, convolutional
layers, and recurrent layers such as the long short-term memory network layer. Dropout
may be implemented on any or all hidden layers in the network.
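The behaviour of a dropout layer can be sketched as below. This is an illustrative pure-Python version of "inverted" dropout, the variant in which surviving activations are scaled up by 1 / (1 - rate) during training so that nothing needs to change at test time; the function name `dropout` and the sample values are assumptions made for the example.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=None):
    """Randomly zero a fraction `rate` of activations during training only."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # all activations are used at test time
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Survivors are scaled by 1/keep so the expected sum stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, rate=0.5, rng=random.Random(0))
test_out = dropout(activations, rate=0.5, training=False)
print(train_out)  # a random mix of zeros and doubled values
print(test_out)   # [1.0, 2.0, 3.0, 4.0]
```

Because a different random subset is dropped at every training step, no neuron can rely on a specific partner being present, which is how dropout reduces the correlation between neurons.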
4.3.3 Dense Layer

The dense layer is the regular, densely connected neural network layer and is the most
commonly used layer in models. It is densely connected in the sense that each neuron in
the dense layer receives input from all neurons of the previous layer. The dense layer
computes output = activation(dot(input, kernel) + bias), where the kernel is the weight
matrix of the layer.
In the background, the dense layer performs a matrix-vector multiplication. The values
in the matrix are parameters that are trained and updated with the
help of back-propagation. The output generated by the dense layer is an ‘m’-dimensional
vector, so the dense layer is basically used for changing the dimension of the vector.
Dense layers can also apply operations such as rotation, scaling and translation to the
vector. The general rule of matrix-vector multiplication is that the matrix must have
as many columns as the vector has entries; here, the number of columns of the weight
matrix equals the number of outputs of the preceding layer.
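The matrix-vector multiplication performed by a dense layer can be sketched as follows. This is a pure-Python illustration with made-up weights, biases and inputs; the function name `dense` is an assumption, and the activation function is omitted so that only the linear part is shown.

```python
def dense(inputs, weights, biases):
    """Compute weights @ inputs + biases for one dense layer (no activation).

    `weights` holds one row per output neuron; every row must have as many
    columns as there are inputs.
    """
    if any(len(row) != len(inputs) for row in weights):
        raise ValueError("each weight row needs one column per input value")
    if len(weights) != len(biases):
        raise ValueError("one bias is needed per output neuron")
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Illustrative layer mapping a 3-dimensional input to an m = 2 dimensional output.
inputs = [1.0, 2.0, 3.0]
weights = [
    [0.5, -1.0, 0.0],
    [1.0, 1.0, 1.0],
]
biases = [0.5, -1.0]
outputs = dense(inputs, weights, biases)
print(outputs)  # [-1.0, 5.0]
```

The shape check mirrors the rule stated above: a 2x3 weight matrix can only be multiplied with a 3-entry input vector, and it produces a 2-entry output.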
4.3.4 Softmax Layer

Softmax assigns decimal probabilities to each class in a multi-class problem, and these
probabilities add up to 1.0. Softmax is implemented through a neural network layer just
before the output layer. The Softmax layer must have the same number of nodes as the
output layer. Variants of Softmax are:
• Full Softmax, in which Softmax calculates a probability for every possible class.
• Candidate sampling, in which Softmax calculates a probability for all the positive
labels but only for a random sample of negative labels.

Full Softmax is fairly cheap when the number of classes is small but becomes
prohibitively expensive when the number of classes climbs. Candidate sampling can improve
efficiency in problems having a large number of classes.
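As a minimal sketch of full Softmax, the function below converts a list of raw scores (logits) into probabilities. Subtracting the maximum score before exponentiating is a standard numerical-stability step; the two example scores are illustrative values for a two-class (mask / no mask) output and are not taken from the trained model.

```python
import math

def softmax(logits):
    """Full Softmax: one probability per class, summing to 1."""
    m = max(logits)                       # shift scores for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.0])               # roughly [0.88, 0.12]
```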
4.4 CNN Architecture Design

The project aims to classify images into two classes, with mask and without mask. To
achieve high accuracy on this classification task, we have used the MobileNetV2 model as
the base model upon which the training is done.
4.4.1 MobileNetV2
MobileNetV2 is an architecture built from bottleneck blocks that use depth-wise separable
convolutions with residual connections. It has two types of blocks, and both have three
layers. The first layer is a 1x1 convolution with “ReLU6”, the second layer is a
depth-wise convolution, and the third layer is another 1x1 convolution with no
non-linearity. The first type of block is a residual block with stride 1; the second type
is a block with stride 2 that is used for shrinking the feature map. In object detection,
classification is used to determine the class of the input and regression is used to
adjust the bounding box. With the exception of the last fully connected layers, most
backbone networks for detection are networks designed for classification tasks. The
backbone network serves as a simple feature extractor for object detection tasks, taking
images as input and producing feature maps for each input image.
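Two of the building blocks named above can be illustrated with simple arithmetic. The sketch below defines ReLU6 and compares the number of weights in a standard convolution with the number in a depth-wise separable one (a depth-wise convolution followed by a 1x1 convolution). Biases are ignored, and the kernel size and channel counts are illustrative assumptions rather than the actual MobileNetV2 configuration.

```python
def relu6(x):
    """ReLU6 activation: clip the value to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)

def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depth-wise convolution followed by a 1x1 convolution."""
    return k * k * c_in + c_in * c_out

# Illustrative sizes: 3x3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)          # 18432
separable = depthwise_separable_params(3, 32, 64)   # 2336
```

For these sizes the separable form needs roughly one eighth of the weights, which is why this kind of block suits lightweight models.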
Figure 4.4: MobileNetV2 Architecture

Pre-trained networks are usually used to extract high-quality feature maps for
classification problems. This part of the model is called the base model. The base model
here is the MobileNetV2 network initialised with the “imagenet” weights. ImageNet is a
large database of labelled images; a network pre-trained on it has already learned
general visual features and, as a result, is extremely useful for image categorization.
In detection models, the predicted “bounding boxes” are compared to the “ground truth
boxes” during training, and the trainable parameters are changed as needed during
back-propagation. A kernel is applied to each feature map to produce scores for each
position, indicating whether or not an item exists, as well as the appropriate bounding
box dimensions.

Chapter 5
IMPLEMENTATION
5.1 Requirements
5.1.1 Tensorflow
TensorFlow is an open-source end-to-end platform for creating Machine Learning
applications. It is basically a software library for numerical computation using data
flow graphs, where nodes in the graph represent mathematical operations and edges
represent the multidimensional data arrays (called tensors) communicated between them.
TensorFlow uses dataflow graphs to represent computation, shared state, and the
operations that mutate that state. It maps the nodes of a dataflow graph across many
machines in a cluster, and within a machine across multiple computational devices,
including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor
Processing Units (TPUs). TensorFlow supports a variety of applications, with a focus on
training and inference on deep neural networks, and scales to large data sets. Keras in
TensorFlow is used to implement various components such as optimizers.
5.1.2 Keras
Keras is an open-source neural network library written in Python. It is a high-level API
wrapper that can run on top of TensorFlow, CNTK, or Theano, and it is designed to be
modular, fast and easy to use. It supports both convolutional networks and recurrent
networks, as well as combinations of the two. The core data structure of Keras is a
model, a way to organize layers. The simplest type of model is the Sequential model, a
linear stack of layers. For more complex architectures, the Keras functional API allows
arbitrary graphs of layers to be built. The Keras high-level API handles the way we make
models, define layers, or set up multiple input-output models. At this level, Keras also
compiles the model with loss and optimizer functions and runs the training process with
the fit function. Keras does not handle low-level operations such as making the
computational graph, tensors or other variables, as these are handled by the “backend”
engine.

5.1.3 Python 3.7

Python is a general-purpose programming language used for many different applications.
It is used in some high schools and colleges as an introductory programming language
because it is easy to learn, but it is also used by professional software developers at
places such as Google, NASA, and Lucasfilm Ltd. Python is dynamically typed and garbage
collected. It supports multiple programming paradigms, including procedural,
object-oriented, and functional programming. Python features a comprehensive standard
library and is often described as a “batteries included” language.
5.1.4 OS
The os module is one of Python’s standard utility modules. It provides a portable way of
using operating system dependent functionality, so that a program can interact with the
underlying operating system, perform OS-based tasks and get related information about
the operating system. The os and os.path modules include many functions to interact with
the file system.
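As an illustration of how os and os.path can be used in a project like this one, the sketch below collects image file paths from one sub-folder per class. The folder names with_mask and without_mask, the helper list_images and the temporary directory are assumptions made for the example; the actual layout of the data set is not specified here.

```python
import os
import tempfile

def list_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_name: [image paths]} for each sub-folder of root."""
    result = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue
        # Keep only files whose extension marks them as images.
        result[name] = sorted(
            os.path.join(folder, f)
            for f in os.listdir(folder)
            if os.path.splitext(f)[1].lower() in extensions
        )
    return result

# Build a tiny throwaway directory tree so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    for cls in ("with_mask", "without_mask"):
        os.makedirs(os.path.join(root, cls))
        open(os.path.join(root, cls, "sample.jpg"), "w").close()
    open(os.path.join(root, "with_mask", "notes.txt"), "w").close()
    counts = {cls: len(paths) for cls, paths in list_images(root).items()}
```

Here counts records one image per class, since the text file is skipped.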
5.1.5 OpenCV
CV is the abbreviation of computer vision. Computer vision is a field of artificial
intelligence (AI) that enables computers and systems to derive meaningful information
from digital images, videos and other visual inputs, and to take actions or make
recommendations based on that information. OpenCV is a huge open-source library for
computer vision, machine learning, and image processing. OpenCV supports a wide variety
of programming languages like Python, C++, Java, etc. It can process images and videos
to identify objects, faces, or even the handwriting of a human. When it is integrated
with other libraries, such as NumPy, which is a highly optimized library for numerical
operations, its capabilities increase further, i.e. whatever operations one can do in
NumPy can be combined with OpenCV.
5.1.6 PIL
Python Imaging Library (PIL) is the image processing package for the Python language. It
adds image processing capabilities to the Python interpreter and incorporates lightweight
image processing tools that aid in editing, creating and saving images. This library
provides extensive file format support, including BMP, PNG, JPEG, and TIFF, an efficient
internal representation, and fairly powerful image processing capabilities. The core
image library is designed for fast access to data stored in a few basic pixel formats.
It provides a solid foundation for a general image processing tool.
5.1.7 NumPy
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays. It is the
fundamental package for scientific computing with Python. NumPy can also be used as an
efficient multidimensional container of generic data. It is a Python extension that
provides functions to transform arrays. A NumPy array stores its data in a contiguous
memory location, making it easier to iterate through the data. An array in NumPy is a
table of elements, all of the same type, indexed by a tuple of non-negative integers. In
NumPy, the number of dimensions of the array is called the rank of the array. A tuple of
integers giving the size of the array along each dimension is known as the shape of the
array. The array class in NumPy is called ndarray. Elements in NumPy arrays are accessed
by using square brackets, and arrays can be initialized from nested Python lists. Using
NumPy, mathematical and logical operations on arrays, Fourier transforms, routines for
shape manipulation, and operations related to linear algebra can be performed.
5.1.8 Scikit-learn
Scikit-learn is a powerful machine learning library that provides a wide variety of mod-
ules for data access, data preparation and statistical model building. It has a good selec-
tion of clean toy data sets that are great for people just getting started with data analysis
and machine learning. Even better, easy access to these data sets removes the hassle
of searching for and downloading files from an external data source. The library also
enables data processing tasks such as imputation, data standardization and data normal-
ization. These tasks can often lead to significant improvements in model performance.
Scikit-learn also provides a variety of packages for building linear models, tree-based
models, clustering models and much more. It features an easy-to-use interface for each
model object type, which facilitates fast prototyping and experimentation with models.
5.1.9 Matplotlib
Matplotlib is a plotting library for creating static, animated, and interactive
visualizations in Python. Matplotlib can be used in Python scripts, the Python and
IPython shell, web application servers, and various graphical user interface toolkits
like Tkinter, wxPython, etc. Pyplot is a Matplotlib module that provides a MATLAB-like
interface. Matplotlib is designed to be as usable as MATLAB, with the ability to use
Python and the advantage of being free and open-source. Each pyplot function makes some
change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots
some lines in a plotting area, decorates the plot with labels, etc. The various plots we
can create using Pyplot are line plot, histogram, scatter, 3D plot, image, contour, and
polar.
5.2 Data Set

A data set is a collection of data. In the case of tabular data, a data set corresponds
to one or more database tables, where every column of a table represents a particular
variable, and each row corresponds to a given record of the data set in question. The
data set lists values for each of the variables, such as the height and weight of an
object, for each member of the data set. Data sets can also consist of a collection of
documents or files. In the open data discipline, a data set is the unit used to measure
the information released in a public open data repository. The European Open Data portal
aggregates more than half a million data sets. Several characteristics define a data
set’s structure and properties. These include the number and types of the attributes or
variables, and various statistical measures applicable to them, such as standard
deviation.
The face mask images have been collected from Kaggle, which has a standard set of images
with and without masks. The data set contains 3,830 images; 80% of the images are used
for training and 20% for testing. All the images are of size 224 × 224.
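The 80/20 split of the 3,830 images can be sketched as follows. This is a minimal illustration that shuffles image indices with a fixed seed; the helper name train_test_split_indices and the seed value are assumptions, and the project itself may have used a library routine for the split.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(3830)
n_train, n_test = len(train_idx), len(test_idx)   # 3064 and 766
```

The resulting sizes, 3,064 training images and 766 test images, match the 80% and 20% shares stated above.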
Figure 5.1: Non Masked Data Set

Figure 5.2: Masked Data Set
5.3 Implementation Steps

1. The data set is split into two parts, a training set and a test set, with 80 percent
and 20 percent of the images respectively.
2. Data augmentation such as shearing, zooming, flipping and brightness change is used
to increase the data set size to almost double the original data set size.
3. The pre-trained MobileNetV2 model is used as the base model for creating our training
model.
4. A dropout of 50% is used for effectively training the model.
5. The last layer is used for the classification, with softmax as the activation
function.
6. Binary cross entropy is chosen as the loss function.
7. The model is trained for 20 epochs with a batch size of 32, tuning hyper parameters
like learning rate, batch size, optimizer and pre-trained weights.
8. The trained model is tested with the test set to check whether the model is able to
detect masked and unmasked people.
9. The trained model is used to detect face masks in real time, for which a frame is
designed to show the results.
10. The frame shows whether the person is wearing a mask or not and shows the
precision.
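The loss function named in step 6 can be illustrated with a small sketch. It computes the mean binary cross entropy between one-hot labels and the two-class probabilities produced by a softmax output; the function name, the clipping constant eps and the example values are assumptions made for illustration, not values from the trained model.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy over all label/probability pairs."""
    total, count = 0.0, 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count

# One-hot labels: [mask, no mask]; predictions are softmax outputs.
labels = [[1, 0], [0, 1]]
good = binary_cross_entropy(labels, [[0.9, 0.1], [0.2, 0.8]])
bad = binary_cross_entropy(labels, [[0.4, 0.6], [0.7, 0.3]])
```

Confident, correct predictions give a lower loss (good) than predictions that lean towards the wrong class (bad), which is the signal that training minimises.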
5.4 Hyper Parameters for Convolutional Neural Network
Figure 5.3: Hyper-Parameters

Chapter 6
EVALUATION AND RESULT
6.1 Evaluation of Model
Figure 6.1: Training Loss and Accuracy
The model is trained for 20 epochs with a batch size of 32 using the Adam optimizer with
a learning rate of 0.0001. Using 3,064 training images and 766 testing images, the model
achieves more than 95% accuracy with minimum training loss.
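The figures above imply a certain number of weight updates, which the short calculation below works out. It assumes that the last, partially filled batch of each epoch is kept; if the training loop drops it, there is one step fewer per epoch.

```python
import math

train_images = 3064
batch_size = 32
epochs = 20

# Number of batches needed to see every training image once.
steps_per_epoch = math.ceil(train_images / batch_size)   # 96
total_updates = steps_per_epoch * epochs                  # 1920
```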
While training the model in Keras, the accuracy and loss on the validation data can vary
from case to case. Usually, with every epoch, the loss should go lower and the accuracy
should go higher. But with val_loss (Keras validation loss) and val_acc (Keras
validation accuracy), several cases are possible:
• val_loss starts increasing and val_acc starts decreasing. This means the model is
cramming values, not learning.
• val_loss starts increasing and val_acc also increases. This could be a case of
overfitting, or of diverse probability values in cases where Softmax is used in the
output layer.
• val_loss starts decreasing and val_acc starts increasing. This is fine, as it means
the model built is learning and working well.
6.2 Precision
The precision is calculated as the ratio of the number of Positive samples correctly
classified to the total number of samples classified as Positive (either correctly or
incorrectly). It measures the model’s accuracy in classifying a sample as positive, that
is, how many of the detected items are truly relevant. If there are no false positives
(FPs), the model has 100% precision; the more FPs there are, the lower the precision
becomes. Precision is calculated by dividing the true positives by all predicted
positives, and is defined as follows:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{6.1}$$
Here,
True Positive (TP): A test result that correctly indicates the presence of a condition
or characteristic. That is, positive tuples that are correctly labeled by the model.
False Positive (FP): A test result which wrongly indicates that a particular condition
or attribute is present. That is, negative tuples that were incorrectly labeled as
positive by the model.
When the model makes many incorrect Positive classifications, the denominator grows
relative to the numerator and the precision becomes small; the same happens when it
makes few correct Positive classifications. On the other
hand, the precision is high when:
• The model makes many correct Positive classifications (maximize True Positive).
• The model makes fewer incorrect Positive classifications (minimize False Positive).
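The definition of precision can be checked with a short worked example. The labels below are made-up values in which 1 stands for the positive class; returning 0.0 when nothing is predicted as positive is a convention chosen for this sketch, since the ratio is undefined in that case.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0        # no positive predictions: undefined, reported as 0.0 here
    return tp / (tp + fp)

y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = precision(y_true, y_pred)   # TP = 3, FP = 1, so 3 / 4 = 0.75
```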
6.3 Result
This is a model developed for face mask detection based on deep learning, implemented
using open-source Python libraries. The model can detect whether people are wearing a
mask or not from real time video, as shown in the figures below. The result shown will
be “Mask” or “No Mask” depending on whether the person is wearing a mask or not while
video streaming. The image first undergoes a pre-processing stage.
Pre-processing and feature extraction are done at the convolution and pooling layers,
and classification is done at a fully connected layer. The model is then trained and
tested, and the project model is developed for real time use. The figures in this
section show the results obtained after running the project model: the model shows
whether the person in the real time video is wearing a mask or not, together with its
precision. In this project, we have used the concepts of deep learning and have
developed a system to detect whether people are wearing a mask or not from real time
video, and the project model achieved an accuracy of 95% after training.
Figure 6.2: Result - Partially Masked 1
Figure 6.3: Result - Partially Masked 2

Figure 6.4: Result - Without Mask
Figure 6.5: Result - With Mask

Chapter 7
CONCLUSION AND SCOPE OF
FUTURE WORK
7.1 Conclusion
In this project, we first briefly explained the motivation of the work. Then, we
illustrated the learning and performance task of the model. Using basic ML tools and
simplified techniques, the method has achieved reasonably high accuracy. It can be used
for a variety of applications. Wearing a mask will become a necessary safety precaution
in public places, as the chances of getting infected by various viruses are high
nowadays. The deployed model can contribute to the public health care system.
7.2 Future Scope

In the future, physical distance monitoring could be introduced as a feature, or
coughing and sneezing detection could be added. Apart from detecting the face mask, the
system would then also compute the distances between individuals and detect any
coughing or sneezing. Also, if the mask is not worn properly, a third class can be
introduced that labels the image as ‘improper mask’.
The model can be further improved to detect the type of the mask, i.e. whether it is
surgical, N95 or another type, and hence how well it protects against the virus. This
project can also be integrated with temperature sensors and blood pressure sensors:
when a person stands in front of a screen, the camera would detect whether he/she is
wearing a mask, the infrared sensor would check the person’s temperature, and the
readings of each sensor would be combined into a report shown on the screen.

Bibliography
[1] M. Jiang, X. Fan and H. Yan, “RetinaMask: A Face Mask detector,” arXiv.org,
2020. [Online]. Available: https://arxiv.org/abs/2005.03950.
[2] M. Loey, G. Manogaran, M. H. N. Taha, and N. E. M. Khalifa, “A hybrid deep
transfer learning model with machine learning methods for face mask detection in
the era of the COVID-19 pandemic,” Measurement, vol. 167, Article ID 108288,
2021.
[3] S. K. Dey, A. Howlader, and C. Deb, “MobileNet mask: a multi-phase face mask
detection model to prevent person-to-person transmission of SARS-CoV-2,” in
Proceedings of International Conference on Trends in Computational and Cognitive
Engineering, 2021.
[4] H. Jiang and E. Learned-Miller, “Face detection with the faster R-CNN,” 12th IEEE
Int. Conf. Automatic Face & Gesture Recogn. (FG 2017), IEEE, 2017.
[5] M. S. Ejaz, M. R. Islam, M. Sifatullah and A. Sarker, “Implementation of principle
component analysis on masked and non-masked face recognition,” IEEE 1st
International Conference for Advanced Scientific Engineering and Robotics Technology,
2019.
[6] “Coronavirus Disease 2019 (COVID-19) – Symptoms,” Centers for Disease Control
and Prevention, 2020.
https://www.cdc.gov/coronavirus/2019-ncov/symptomstesting/symptoms.html.
[7] Bosheng Qin and Dongxiao Li, “Identifying Facemask-wearing Condition Using
Image Super-Resolution with Classification Network to Prevent COVID-19,” 13
May 2020. https://doi.org/10.21203/rs.3.rs-28668/v1
[8] “Face Mask Detection,” Kaggle.com, 2020. [Online]. Available:
https://www.kaggle.com/andrewmvd/face-mask-detection.
[9] S. A. Hussain and A. S. A. A. Balushi, “A real time face emotion classification and
recognition using deep learning model,” J. Phys.: Conf. Ser. 1432 (2020) 012087,
doi: 10.1088/1742-6596/1432/1/012087.
[10] Z. Wang, G. Wang, B. Huang, Z. Xiong, Q. Hong, H. Wu, P. Yi, K. Jiang, N. Wang,
Y. Pei et al., “Masked face recognition data set and application,” arXiv preprint
arXiv:2003.09093, 2020.

