
Face Mask Detector

A Project Report submitted in partial fulfillment


of the requirement for the award of the degree of

TECHNICAL UNIVERSITY DEGREE

Submitted by

ACHRAF CHTITEH ZAKARIA ELASRI

ADNANE ELMOURABITI

Under supervision of

Mr. MRANI NABIL


ACKNOWLEDGEMENT

First of all, we would like to express our deepest gratitude to the almighty Allah for giving us the ability to work hard and to succeed. Words will never be enough to express our gratefulness.

Then, we would like to give special thanks and respect to our honorable teacher and supervisor Mr. MRANI NABIL for his constant guidance, advice, encouragement, and every possible help in the overall preparation of this report.

Finally, we would also like to thank our colleagues and everyone who has contributed directly or indirectly to completing this report.
TABLE OF CONTENTS

PROJECT INTRODUCTION
WHAT IS DEEP LEARNING
  DEFINITION OF FUNDAMENTAL CONCEPTS
  INTRODUCTION TO KERAS & TENSORFLOW & MOBILENETV2
PROJECT STRUCTURE
  OBJECTIVE
  APPROACH
THE PROPOSED METHOD
  DATA PROCESSING
BUILDING BLOCKS OF CNN ARCHITECTURE
IMPLEMENTATION OF THE MASK DETECTOR
SCREENSHOTS FROM EXECUTION
CONCLUSION
Abstract
The coronavirus COVID-19 pandemic is causing a global health crisis, and according to the World Health Organization (WHO), one of the most effective protection methods is wearing a face mask in public areas. After the outbreak of the worldwide COVID-19 pandemic, a severe need arose for protection mechanisms, the face mask being the primary one. The basic aim of the project is to detect the presence of a face mask on human faces in live streaming video as well as in images. A hybrid model using deep and classical machine learning for face mask detection will be presented. The face mask detection dataset consists of images with and without masks; we use TensorFlow to perform real-time face detection from a live stream via our webcam. We will use the dataset to build a COVID-19 face mask detector with computer vision using Python, TensorFlow, and Keras. Our goal is to identify whether the person in an image or video stream is wearing a face mask or not, with the help of computer vision and deep learning.

1 Project Introduction

The year 2020 has shown mankind a mind-boggling series of events, amongst which the COVID-19 pandemic is the most life-changing one, and it has startled the world since the year began. Affecting the health and lives of masses, COVID-19 has called for strict measures to be followed in order to prevent the spread of the disease. From the very basic hygiene standards to the treatments in hospitals, people are doing all they can for their own and society's safety; face masks are one such piece of personal protective equipment. People wear face masks once they step out of their homes, and authorities strictly ensure that people are wearing face masks while they are in groups and public places.
To monitor that people are following this basic safety principle, a strategy should be developed. A face mask detector system can be implemented to check this. Face mask detection means identifying whether a person is wearing a mask or not. The first step in recognizing the presence of a mask on the face is to detect the face, which divides the strategy into two parts: detecting faces and detecting masks on those faces. Face detection is one of the applications of object detection and can be used in many areas like security, biometrics, law enforcement, and more. There are many detector systems developed around the world and being implemented. However, they still need optimization: a better, more precise detector is required, because the world cannot afford any further increase in corona cases.
What is deep learning?

This chapter covers


◾ High-level definitions of fundamental concepts
◾ Introduction to Keras, TensorFlow, and MobileNetV2

In the past few years, artificial intelligence (AI) has been a subject of intense media hype. Machine learning, deep learning, and AI come up in countless articles, often outside of technology-minded publications. We're promised a future of intelligent chatbots, self-driving cars, and virtual assistants: a future sometimes painted in a grim light and other times as utopian, where human jobs will be scarce and most economic activity will be handled by robots or AI agents. For a future or current practitioner of machine learning, it is important to be able to recognize the signal in this noise.
1.1 Artificial intelligence, machine learning, and deep learning
First, we need to define clearly what we’re talking about when we
mention AI. What are artificial intelligence, machine learning, and deep
learning (see figure 1.1)? How do they relate to each other?

Figure 1.1 Artificial intelligence, machine learning, and deep learning: deep learning is a subfield of machine learning, which is itself a subfield of artificial intelligence.

1. Artificial intelligence
Artificial intelligence was born in the 1950s, when a handful of pioneers from the nascent field of computer science started asking whether computers could be made to "think": a question whose ramifications we're still exploring today. For decades, the dominant approach was symbolic AI, in which programmers handcraft explicit rules for manipulating knowledge.
Although symbolic AI proved suitable for solving well-defined, logical problems, such as playing chess, it turned out to be intractable to figure out explicit rules for solving more complex, fuzzy problems, such as image classification, speech recognition, and language translation. A new approach arose to take symbolic AI's place: machine learning.

2. Machine learning
Machine learning arises from this question: could a computer go
beyond “what we know how to order it to perform” and learn on its own
how to perform a specified task? Could a computer surprise us? Rather
than programmers crafting data-processing rules by hand, could a
computer automatically learn these rules by looking at data?
This question opens the door to a new programming paradigm. In
classical programming, the paradigm of symbolic AI, humans input rules
(a program) and data to be processed according to these rules, and out
come answers (see figure 1.2). With machine learning, humans input data
as well as the answers expected from the data,
and out come the rules. These rules can then be applied to new data to
produce original answers.
Figure 1.2 Machine learning: a new programming paradigm. Classical programming takes rules and data in and produces answers; machine learning takes data and answers in and produces rules.

A machine-learning system is trained rather than explicitly programmed. It’s


presented with many examples relevant to a task, and it finds statistical
structure in these examples that eventually allows the system to come up with
rules for automating the task.

3. Deep learning

Deep learning is a class of machine learning algorithms that uses multiple layers to
progressively extract higher-level features from the raw input. For example,
in image processing, lower layers may identify edges, while higher layers may
identify the concepts relevant to a human such as digits or letters or
faces. Learning can be supervised, semi-supervised or unsupervised.

1. Supervised learning
Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. Common supervised-learning problems include:
◾ Syntax tree prediction: given a sentence, predict its decomposition into a syntax tree.
◾ Object detection: given a picture, draw a bounding box around certain objects inside the picture. This can also be expressed as a classification problem (given many candidate bounding boxes, classify the contents of each one) or as a joint classification and regression problem, where the bounding-box coordinates are predicted via vector regression.
◾ Image segmentation: given a picture, draw a pixel-level mask on a specific object.

2. Unsupervised learning
This branch of machine learning consists of finding interesting transformations of the input data without the help of any targets. It allows the model to work on its own to discover patterns and information that were previously undetected, and it mainly deals with unlabelled data.

3. Self-supervised learning

This is a specific instance of supervised learning, but it's different enough that it deserves its own category: self-supervised learning is supervised learning without human-annotated labels.
1. Introduction to TensorFlow
TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and it is also used for machine learning applications such as neural networks. In the proposed model, the whole sequential CNN architecture (consisting of several layers) uses TensorFlow as its backend. It is also used to reshape the data (images) during data processing.

2. Introduction to Keras
Keras is a deep-learning framework for Python that provides a
convenient way to define and train almost any kind of deep-learning
model. Keras was initially developed for researchers, with the aim of
enabling fast experimentation.
Keras has the following key features:
◾ It allows the same code to run seamlessly on CPU or GPU.
◾ It has a user-friendly API that makes it easy to quickly prototype
deep-learning models.
◾ It has built-in support for convolutional networks (for computer vision), recurrent networks (for sequence processing), and any combination of both.
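As a quick sketch of how little code this API requires, the following minimal model is defined and compiled in a few lines (the layer sizes are chosen only for illustration and are not those of our final detector):

from tensorflow.keras import layers, models

# A minimal illustrative Keras model; the sizes here are arbitrary.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(100,)),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()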

Keras has well over 200,000 users, ranging from academic researchers and engineers at both startups and large companies to graduate students and hobbyists. Keras is used at Google, Netflix, Uber, CERN, Yelp, Square, and hundreds of startups working on a wide range of problems.

Figure 2.1 Google web search interest for different deep-learning frameworks over time
3. Introduction to MobileNetV2
MobileNetV2 is a significant improvement over MobileNetV1 and pushes the state of the art for mobile visual recognition, including classification, object detection, and semantic segmentation. MobileNetV2 is released as part of the TensorFlow-Slim Image Classification Library, or you can start exploring MobileNetV2 right away in Colaboratory. Alternately, you can download the notebook and explore it locally using Jupyter. MobileNetV2 is also available as modules on TF-Hub, and pretrained checkpoints can be found on GitHub.

MobileNetV2 builds upon the ideas from MobileNetV1 [1], using depthwise separable convolutions as efficient building blocks. However, V2 introduces two new features to the architecture: 1) linear bottlenecks between the layers, and 2) shortcut connections between the bottlenecks. The basic structure is shown below.

Overview of MobileNetV2 Architecture. Blue blocks represent composite convolutional building blocks as shown above.
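For reference, the pretrained network can also be loaded directly from tf.keras.applications; this small sketch simply instantiates the ImageNet-pretrained classifier and prints its structure:

from tensorflow.keras.applications import MobileNetV2

# Load the full ImageNet-pretrained MobileNetV2 classifier
# (roughly 3.5M parameters, designed for mobile and embedded use).
model = MobileNetV2(weights="imagenet")
model.summary()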
Project structure

a) Objective
To identify whether a person in an image or video stream is wearing a face mask, with the help of computer vision and a deep learning algorithm, using the TensorFlow/Keras library.
b) Approach
• Process the data
• Train the deep learning model (MobileNetV2)
• Apply the mask detector over images / live video streams

The dataset/ directory contains the data described in the "Our COVID-19 face mask detection dataset" section. This dataset consists of 1,376 images belonging to two classes:
• with_mask: 690 images
• without_mask: 686 images

We'll be reviewing three Python scripts:

• train_mask_detector.py : Accepts our input dataset and fine-tunes MobileNetV2 upon it to create our mask_detector.model
• detect_mask_image.py : Performs face mask detection in static images
• detect_mask_video.py : Using your webcam, this script applies face mask detection to every frame in the stream
The Proposed Method

The proposed method consists of a cascade classifier and a pre-trained CNN which contains two 2D convolution layers connected to layers of dense neurons. The algorithm for face mask detection proceeds in two stages: first detect the faces in the frame, then classify each detected face as masked or unmasked.

The proposed method can locate the face in real time and assess how the mask is being worn, to aid the control of the pandemic in public areas.
A. Data Processing
Data preprocessing involves the conversion of data from a given format to a much more user-friendly, desired, and meaningful format. The data can be in any form: tables, images, videos, graphs, etc. The organized information fits an information model or schema and captures the relationships between different entities.

• Conversion of RGB image to gray image

Modern descriptor-based image recognition systems regularly work on grayscale images, without elaborating the method used to convert from color to grayscale. This is because the color-to-grayscale method is of little consequence when using robust descriptors.
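As a concrete sketch (assuming OpenCV is installed and a local file named image.jpg exists), the conversion is a single call:

import cv2

# Load a color image (OpenCV reads images in BGR channel order)
# and convert it to a single-channel grayscale image.
image = cv2.imread("image.jpg")       # hypothetical local file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
print(image.shape, "->", gray.shape)  # (h, w, 3) -> (h, w)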
Building blocks of CNN architecture

A Convolutional Neural Network (ConvNet/CNN) is a Deep


Learning algorithm which can take in an input image, assign importance
(learnable weights and biases) to various aspects/objects in the image and be
able to differentiate one from the other. The pre-processing required in a
ConvNet is much lower as compared to other classification algorithms. While
in primitive methods filters are hand-engineered, with enough training,
ConvNets have the ability to learn these filters/characteristics.

What is CNN's greatest advantage?


A CNN has little dependence on preprocessing, decreasing the human effort needed to develop its functionalities. It is easy to understand and fast to implement, and it achieves among the highest accuracies of all image classification algorithms.
Anatomy of a convolutional neural network
Among neural networks, convolutional neural networks (ConvNets or CNNs) are one of the main categories used for image recognition and image classification. Object detection, face recognition, etc., are some of the areas where CNNs are widely used.

A computer sees an input image as an array of pixels, depending on the image resolution. Based on the image resolution, it will see h x w x d (h = height, w = width, d = dimension), e.g., a 6 x 6 x 3 array of RGB values for a color image (3 refers to the RGB channels) or a 4 x 4 x 1 array for a grayscale image.

Figure 3: Every image is a matrix of pixel values.

Technically, to train and test deep learning CNN models, each input image is passed through a series of convolution layers with filters (kernels), pooling, and fully connected (FC) layers, and a softmax function is applied to classify an object with probabilistic values between 0 and 1.
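A minimal sketch of that pipeline in Keras (the layer counts and sizes below are chosen only for illustration):

from tensorflow.keras import layers, models

# Convolution -> pooling -> flatten -> fully connected -> softmax,
# mirroring the pipeline described above (sizes are illustrative).
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),  # class probabilities in [0, 1]
])
model.summary()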
1. Convolution Layer

Convolution is the first layer used to extract features from an input image. Convolution preserves the relationship between pixels by learning image features using small squares of input data. It is a mathematical operation that takes two inputs: an image matrix and a filter (or kernel).

Consider a 5 x 5 image whose pixel values are 0 or 1, and a 3 x 3 filter matrix, as shown below.

The convolution of the 5 x 5 image matrix with the 3 x 3 filter matrix produces an output called the "feature map", as shown below.
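Since the original figures are not reproduced here, the following sketch computes such a feature map directly with NumPy, using an arbitrary 5 x 5 binary image and 3 x 3 filter:

import numpy as np

# An arbitrary 5x5 binary image and 3x3 filter (values are illustrative).
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

# Slide the filter over every valid 3x3 window; each output value is
# the sum of element-wise products (a "valid" convolution, no padding).
feature_map = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(feature_map)  # the resulting 3x3 feature map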
2. Padding
Sometimes the filter does not fit the input image perfectly. We have two options:
• Pad the picture with zeros (zero-padding) so that it fits.
• Drop the part of the image where the filter does not fit. This is called valid padding, which keeps only the valid part of the image.
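In Keras these two options correspond to padding="same" and padding="valid"; a small sketch of the resulting output shapes (the 5 x 5 input is arbitrary):

import numpy as np
from tensorflow.keras import layers

x = np.random.rand(1, 5, 5, 1).astype("float32")  # one 5x5 single-channel image

same = layers.Conv2D(1, (3, 3), padding="same")(x)    # zero-padding
valid = layers.Conv2D(1, (3, 3), padding="valid")(x)  # no padding

print(same.shape)   # (1, 5, 5, 1): spatial size preserved
print(valid.shape)  # (1, 3, 3, 1): borders dropped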

Non-linearity (ReLU)


ReLU stands for Rectified Linear Unit, a non-linear operation. Its output is f(x) = max(0, x).
Why ReLU is important: ReLU's purpose is to introduce non-linearity into our ConvNet, since most of the real-world data we would want our ConvNet to learn is non-linear, while convolution itself is a linear operation.

The ReLU operation can be understood clearly from Figure 4 below. It shows the ReLU operation applied to one of the feature maps obtained above. The output feature map here is also referred to as the "rectified" feature map.

Figure 4: ReLU operation
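The operation itself is one line; this sketch applies it element-wise to a small, arbitrary feature map:

import numpy as np

# ReLU: f(x) = max(0, x), applied element-wise.
feature_map = np.array([[-3, 2, -1],
                        [ 5, -4, 6],
                        [-2, 1, -7]])
rectified = np.maximum(0, feature_map)
print(rectified)  # all negative values replaced with 0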


3. Pooling layers
Pooling layers reduce the number of parameters when the images are too large. Spatial pooling (also called subsampling or downsampling) reduces the dimensionality of each map while retaining important information. Spatial pooling can be of different types:
• Max pooling
• Average pooling
• Sum pooling
Max pooling takes the largest element from each window of the rectified feature map; average pooling instead takes the average of each window, and sum pooling takes the sum of all elements in each window.

Just like in the convolution step, the creation of the pooled feature map also makes us dispose of unnecessary information or features. In this case, we have lost roughly 75% of the original information found in the feature map, since for every 4 pixels in the feature map we kept only the maximum value and got rid of the other 3. These details are unnecessary, and without them the network can do its job more efficiently.

The reason we extract the maximum value, which is actually the point of the whole pooling step, is to account for distortions. Say we have three cheetah images, and in each image the cheetah's tear lines are at a different angle; max pooling lets the network recognize the feature regardless of such variations.
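A sketch of 2 x 2 max pooling with stride 2 on an arbitrary 4 x 4 feature map, keeping one value out of every four:

import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 2, 8]])

# 2x2 max pooling, stride 2: keep the maximum of each 2x2 window.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 4]
               #  [7 9]]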
4. Flattening

After finishing the previous two steps, we're supposed to have a pooled feature map by now. As the name of this step implies, we literally flatten our pooled feature map into a single column.

The reason we do this is that we're going to need to insert this data into an artificial neural network later on.

Note that we typically have multiple pooled feature maps from the previous step. What happens after the flattening step is that you end up with a long vector of input data that you then pass through the artificial neural network to have it processed further.
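A sketch of the flattening step, stacking several pooled maps into one vector:

import numpy as np

# Three hypothetical 2x2 pooled feature maps from the previous step.
pooled_maps = np.array([[[6, 4], [7, 9]],
                        [[1, 2], [3, 4]],
                        [[5, 0], [2, 8]]])

flattened = pooled_maps.flatten()  # one long vector for the FC layers
print(flattened.shape)  # (12,)
print(flattened)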
5. Fully Connected Layer
In the layer we call the FC layer, we flatten our matrix into a vector and feed it into a fully connected layer, like a regular neural network.

Figure 9: After the pooling layer, the flattened output is fed into the FC layer.

The Full Connection Process


As we said, the input layer contains the vector of data that was created in the flattening step. The features that we distilled throughout the previous steps are encoded in this vector.

At this point, they are already sufficient for a fair degree of accuracy in recognizing
classes. We now want to take it to the next level in terms of complexity and precision.

What is the aim of this step?


The role of the artificial neural network is to take this data and combine the features into a wider variety of attributes that make the convolutional network more capable of classifying images, which is the whole purpose of creating a convolutional neural network.
The Convolution Process: A Quick Recap

Since we're now done with this section, let's make a quick recap of what we
learned about convolutional neural networks. In the diagram below, you can see
the entire process of creating and optimizing a convolutional neural network that
we covered throughout the section.

As you see and should probably remember, the process goes as follows:

• We start off with an input image.

• We apply filters (feature detectors) to the image, which gives us a convolutional layer of feature maps.

• We then break up the linearity of that output using the rectifier function.

• The output becomes ready for the pooling step, the purpose of which is to provide our convolutional neural network with the faculty of "spatial invariance", explained in more detail in the pooling step.

• After we're done with pooling, we end up with a pooled feature map.

• We then flatten our pooled feature map before inserting it into an artificial neural network.
Implementing our COVID-19 face mask detector

training script with Keras and TensorFlow

Now that we've reviewed our face mask dataset, let's see how we can use Keras and TensorFlow to train a classifier to automatically detect whether a person is wearing a mask or not.

To accomplish this task, we’ll be fine-tuning the MobileNet V2


architecture, a highly efficient architecture that can be applied to
embedded devices with limited computational capacity (e.g., Raspberry Pi, Google Coral, NVIDIA Jetson Nano, etc.).
Note: If your interest is embedded computer vision, be sure to check out
my Raspberry Pi for Computer Vision book which covers working with
computationally limited devices for computer vision and deep learning.
Deploying our face mask detector to embedded devices could reduce the cost of manufacturing such face mask detection systems, which is why we chose this architecture.

The imports for our training script may look numerous and intimidating, but our tensorflow.keras imports allow for the following (a sketch of a typical import block follows the list):


• Data augmentation
• Loading the MobileNetV2 classifier (we will fine-tune this model with pretrained ImageNet weights)
• Building a new fully-connected (FC) head
• Pre-processing
• Loading image data
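These are the standard tensorflow.keras module paths; the exact set in the report's script may differ:

# Hypothetical import block for the training script.
from tensorflow.keras.preprocessing.image import ImageDataGenerator   # data augmentation
from tensorflow.keras.applications import MobileNetV2                 # base classifier
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model                             # new FC head
from tensorflow.keras.preprocessing.image import img_to_array, load_img  # image loading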

Deep learning hyperparameters:

Here, we've specified hyperparameter constants including our initial learning rate, number of training epochs, and batch size. Later, we will be applying a learning rate decay schedule, which is why we've named the learning rate variable INIT_LR.
At this point, we're ready to load and pre-process our training data. In this block, we are:
• Grabbing all of the imagePaths in the dataset
• Initializing data and labels lists (Lines 36 and 37)
• Looping over the imagePaths and loading + pre-processing images (Lines 39-48). Pre-processing steps include resizing to 224×224 pixels, conversion to array format, and scaling the pixel intensities in the input image to the range [-1, 1] (via the preprocess_input convenience function)
• Lines 51-53 one-hot encode our class labels, meaning that our data will be in the following format:

As you can see, each element of our labels array consists of an array in which only one index is "hot" (i.e., 1).

• Appending the pre-processed image and associated label to the data and labels lists, respectively (Lines 47 and 48)
• Ensuring our training data is in NumPy array format (Lines 55 and 56)

Using scikit-learn's convenience method, Lines 58 and 59 segment our data into 80% for training and the remaining 20% for testing.
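Under the assumption that the dataset lives in a local dataset/ directory with one subdirectory per class, a condensed sketch of this whole block might look like:

import os
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras.utils import to_categorical

data, labels = [], []
for label in ("with_mask", "without_mask"):
    class_dir = os.path.join("dataset", label)           # assumed layout
    for name in os.listdir(class_dir):
        # Resize to 224x224, convert to array, scale pixels to [-1, 1].
        image = load_img(os.path.join(class_dir, name), target_size=(224, 224))
        data.append(preprocess_input(img_to_array(image)))
        labels.append(label)

data = np.array(data, dtype="float32")
lb = LabelBinarizer()                                    # one-hot encode labels
labels = to_categorical(lb.fit_transform(labels))

# 80% training / 20% testing split.
(trainX, testX, trainY, testY) = train_test_split(
    data, labels, test_size=0.20, stratify=labels, random_state=42)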
During training, we'll be applying on-the-fly mutations to our images in an effort to improve generalization. This is known as data augmentation, where the random rotation, zoom, shear, shift, and flip parameters are established on Lines 62-69. We'll use the aug object at training time.
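A sketch of that augmentation object (the parameter values here are illustrative assumptions):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# On-the-fly augmentation: random rotation, zoom, shift, shear, and flip.
aug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")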

Fine-tuning setup is a three-step process:


1. Load MobileNetV2 with pre-trained ImageNet weights, leaving off the head of the network (Lines 73 and 74)
2. Construct a new FC head, and append it to the base in place of the old head (Lines 78-87)
3. Freeze the base layers of the network (Lines 91 and 92). The weights of these base layers will not be updated during backpropagation, whereas the head layer weights will be tuned.
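Those three steps might look like the following sketch (the head layer sizes are assumptions):

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model

# 1. Load MobileNetV2 with ImageNet weights, leaving off the head.
baseModel = MobileNetV2(weights="imagenet", include_top=False,
                        input_tensor=Input(shape=(224, 224, 3)))

# 2. Construct a new FC head and place it on top of the base model.
headModel = AveragePooling2D(pool_size=(7, 7))(baseModel.output)
headModel = Flatten()(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
model = Model(inputs=baseModel.input, outputs=headModel)

# 3. Freeze the base layers so only the head is updated during training.
for layer in baseModel.layers:
    layer.trainable = False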
With our data prepared and the model architecture in place for fine-tuning, we're now ready to compile and train our face mask detector network:

Lines 96-98 compile our model with the Adam optimizer, a learning rate decay schedule, and binary cross-entropy. If you're building from this training script with more than 2 classes, be sure to use categorical cross-entropy.
Face mask training is launched via Lines 102-107. Notice how our data augmentation object (aug) provides batches of mutated image data.
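Continuing the names from the sketches above, the compile-and-train step might look like this (the decay argument to Adam reflects older tf.keras versions and is an assumption):

from tensorflow.keras.optimizers import Adam

# Compile with Adam, a simple learning-rate decay, and binary cross-entropy.
opt = Adam(learning_rate=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])

# Train, drawing batches of augmented images from the aug object.
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),
    epochs=EPOCHS)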

Here, Lines 111-115 make predictions on the test set, grabbing the highest probability class
label indices. Then, we print a classification report in the terminal for inspection.
Line 123 serializes our face mask classification model to disk.
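A sketch of the evaluation and serialization steps, again continuing the names defined above (the output filename is the one named earlier in the report):

import numpy as np
from sklearn.metrics import classification_report

# Predict on the test set and take the index of the highest probability.
predIdxs = model.predict(testX, batch_size=BS)
predIdxs = np.argmax(predIdxs, axis=1)

# Print a per-class precision/recall report in the terminal.
print(classification_report(testY.argmax(axis=1), predIdxs,
                            target_names=lb.classes_))

# Serialize the trained face mask classifier to disk.
model.save("mask_detector.model", save_format="h5")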
Screenshots from Execution

Here, both persons wear masks correctly.

When persons don't wear a mask or try to deceive the program.

Even if there is a group of people, the program will distinguish those who wear masks from those who do not.

Here we showed the program a monkey face, whose features are close to a human's, but the program does not consider it a human face.
CONCLUSION

To mitigate the spread of the COVID-19 pandemic, measures must be taken. We have modeled a face mask detector using deep learning methods in neural networks. To train, validate, and test the model, we used a dataset that consisted of 690 images of masked faces and 686 images of unmasked faces. These images were taken from various resources like Kaggle datasets.
The model was inferred on images and live video streams. To select a base model, we evaluated metrics like accuracy, precision, and recall, and selected the MobileNetV2 architecture for its best performance, achieving 100% precision and 99% recall.

Using MobileNetV2 also makes the model computationally efficient, which makes it easier to deploy to embedded systems. This face mask detector can be deployed in many areas, like shopping malls, airports, and other high-traffic places, to monitor the public and help avoid the spread of the disease by checking who is following basic rules and who is not.
