Face Mask Detector: A Project Report Submitted in Partial Fulfillment of The Requirement For The Award of The Degree of
Submitted by
ADNANE ELMOURABITI
Under supervision of
PROJECT INTRODUCTION
WHAT IS DEEP LEARNING
DEFINITION OF FUNDAMENTAL CONCEPTS
INTRODUCTION TO KERAS & TENSORFLOW & MOBILENETV2
PROJECT STRUCTURE
OBJECTIVE
APPROACH
CONCLUSION
Abstract
The coronavirus COVID-19 pandemic is causing a global health crisis,
and according to the World Health Organization (WHO), wearing a face
mask in public areas is an effective protection method.
After the outbreak of the worldwide COVID-19 pandemic, there arose
a severe need for protection mechanisms, face masks being the primary
one. The basic aim of the project is to detect the presence of a face
mask on human faces in live streaming video as well as in images.
A hybrid model using deep and classical machine learning for face
mask detection will be presented. The face mask detection dataset
consists of images with masks and without masks. We are going to use
TensorFlow to do real-time face detection from a live stream via our
webcam, and we will use the dataset to build a COVID-19 face mask
detector with computer vision using Python, TensorFlow, and Keras.
Our goal is to identify whether the person in an image or video stream
is wearing a face mask or not, with the help of computer vision and
deep learning.
1 Project Introduction
The year 2020 has shown mankind a mind-boggling series of events,
amongst which the COVID-19 pandemic is the most life-changing event
that has startled the world since the year began. Affecting the health
and lives of the masses, COVID-19 has called for strict measures to be
followed in order to prevent the spread of the disease. From the most basic
hygiene standards to the treatments in hospitals, people are doing all
they can for their own and society’s safety, and face masks are one form of
personal protective equipment. People wear face masks once they step
out of their homes, and authorities strictly ensure that people are wearing
face masks while they are in groups and in public places.
To monitor that people are following this basic safety principle, a
strategy should be developed. A face mask detector system can be
implemented to check this. Face mask detection means identifying
whether a person is wearing a mask or not. The first step to recognize
the presence of a mask on the face is to detect the face, which divides the
strategy into two parts: detecting faces and detecting masks on
those faces. Face detection is one of the applications of object detection
and can be used in many areas such as security, biometrics, law enforcement
and more. Many detector systems have been developed around the
world and are being implemented. However, they still need
optimization: a better, more precise detector is needed, because the world
cannot afford any further increase in corona cases.
What is deep learning?
In the past few years, artificial intelligence (AI) has been a subject of intense media
hype. Machine learning, deep learning, and AI come up in countless articles, often
outside of technology-minded publications. We’re promised a future of intelligent
chatbots, self-driving cars, and virtual assistants: a future sometimes painted in a
grim light and other times as utopian, where human jobs will be scarce and most
economic activity will be handled by robots or AI agents. For a future or current
practitioner of machine learning, it is important to be able to separate real progress
from this hype.
1.1 Artificial intelligence, machine learning, and deep learning
First, we need to define clearly what we’re talking about when we
mention AI. What are artificial intelligence, machine learning, and deep
learning (see figure 1.1)? How do they relate to each other?
Figure 1.1 Artificial intelligence, machine learning, and deep learning
1. Artificial intelligence
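Artificial intelligence was born in the 1950s, when a handful of
pioneers from the nascent field of computer science started asking
whether computers could be made to “think”, a question whose
ramifications we’re still exploring today. For a long time, many experts
believed that human-level AI could be achieved by having programmers
handcraft a sufficiently large set of explicit rules for manipulating
knowledge; this approach is known as symbolic AI.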
Although symbolic AI proved suitable to solve well-defined, logical
problems, such as playing chess, it turned out to be intractable to figure
out explicit rules for solving more complex, fuzzy problems, such as
image classification, speech recognition, and language translation. A
new approach arose to take symbolic AI’s place: machine learning.
2. Machine learning
Machine learning arises from this question: could a computer go
beyond “what we know how to order it to perform” and learn on its own
how to perform a specified task? Could a computer surprise us? Rather
than programmers crafting data-processing rules by hand, could a
computer automatically learn these rules by looking at data?
This question opens the door to a new programming paradigm. In
classical programming, the paradigm of symbolic AI, humans input rules
(a program) and data to be processed according to these rules, and out
come answers (see figure 1.2). With machine learning, humans input data
as well as the answers expected from the data,
and out come the rules. These rules can then be applied to new data to
produce original answers.
Figure 1.2 Classical programming (rules and data in, answers out) versus machine learning (data and answers in, rules out)
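As a toy illustration of the two paradigms (a sketch, not part of the project), the first function below hard-codes a rule written by a human, while the second "learns" a threshold rule from example data and expected answers by trying each observed value as a threshold and keeping the one with the fewest mistakes. The threshold 5.0 and all data values are made up.

```python
def classical_program(data):
    """Classical programming: a human writes the rule (x > 5.0) by hand."""
    return [x > 5.0 for x in data]


def learn_threshold(data, answers):
    """Toy machine learning: infer a rule "x > t" from data and expected answers.

    Each observed value is tried as a candidate threshold t; the threshold
    that reproduces the most answers is kept.
    """
    best_t, best_errors = None, len(data) + 1
    for t in sorted(data):
        errors = sum((x > t) != y for x, y in zip(data, answers))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Data and answers go in, a rule (the threshold) comes out...
threshold = learn_threshold([1, 2, 3, 7, 8, 9], [False, False, False, True, True, True])
# ...and the learned rule can then be applied to new data.
predictions = [x > threshold for x in [2.5, 10]]
print(threshold, predictions)
```

The learned threshold comes out as 3 here; any value between 3 and 7 would separate these examples equally well, which is why real learning algorithms also need a way to choose among equally good rules.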
3. Deep learning
Deep learning is a class of machine learning algorithms that uses multiple layers to
progressively extract higher-level features from the raw input. For example,
in image processing, lower layers may identify edges, while higher layers may
identify the concepts relevant to a human such as digits or letters or
faces. Learning can be supervised, semi-supervised or unsupervised.
1. Supervised learning
Supervised learning (SL) is the machine learning task of learning a function
that maps an input to an output based on example input-output pairs. It infers a
function from labeled training data consisting of a set of training examples.
Examples of supervised learning tasks include:
◾ Syntax tree prediction: given a sentence, predict its decomposition into a syntax tree.
◾ Object detection: given a picture, draw a bounding box around certain objects inside
the picture. This can also be expressed as a classification problem (given many
candidate bounding boxes, classify the contents of each one) or as a joint
classification and regression problem, where the bounding-box coordinates are
predicted via vector regression.
◾ Image segmentation: given a picture, draw a pixel-level mask on a specific object.
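A minimal supervised-learning sketch, assuming NumPy: labeled input-output pairs generated from the made-up function y = 2x + 1 are used to infer that function's parameters by least squares. Least-squares linear regression is just one simple instance of learning from labeled pairs; it is not the method used later in this report.

```python
import numpy as np

# Labeled training examples: inputs x and their target outputs y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the model can also learn an intercept.
X = np.column_stack([x, np.ones_like(x)])

# Infer the parameters (slope, intercept) that best map inputs to outputs.
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)


def predict(new_x):
    """Apply the learned function to unseen inputs."""
    return slope * np.asarray(new_x) + intercept


print(slope, intercept, predict([10.0]))
```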
2. Unsupervised learning
This branch of machine learning consists of finding interesting
transformations of the input data without the help of any targets. It allows the
model to work on its own to discover patterns and information that were
previously undetected, and it mainly deals with unlabelled data.
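One classic unsupervised technique is k-means clustering, which groups unlabelled points by repeatedly assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The NumPy sketch below is illustrative only (the points and the choice k = 2 are assumptions): no labels are given, yet the two groups are recovered.

```python
import numpy as np


def kmeans(points, k, n_iters=10, seed=0):
    """Cluster unlabelled points into k groups with plain k-means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every point to every centroid: shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if none were assigned).
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids


points = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, centroids = kmeans(points, k=2)
print(labels)
```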
3. Self-supervised learning
2. Introduction to Keras
Keras is a deep-learning framework for Python that provides a
convenient way to define and train almost any kind of deep-learning
model. Keras was initially developed for researchers, with the aim of
enabling fast experimentation.
Keras has the following key features:
◾ It allows the same code to run seamlessly on CPU or GPU.
◾ It has a user-friendly API that makes it easy to quickly prototype
deep-learning models.
◾ It has built-in support for convolutional networks (for computer
vision), recurrent networks (for sequence processing), and any
combination of both.
Figure 2.1 Google web search interest for different deep-learning frameworks over time
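A minimal sketch of the Keras API, assuming TensorFlow 2.x with its bundled Keras is installed. The same model-building code runs unchanged whether `tf.config.list_physical_devices("GPU")` reports a GPU or not, since TensorFlow places the computation on the available device. The layer sizes are arbitrary.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Report which accelerators TensorFlow can see; the code below is identical either way.
gpus = tf.config.list_physical_devices("GPU")
print(f"{len(gpus)} GPU(s) visible" if gpus else "running on CPU")

# A tiny fully connected classifier defined with the Sequential API.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Forward pass on a dummy batch of 3 samples with 4 features each.
probs = model.predict(np.zeros((3, 4), dtype="float32"), verbose=0)
```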
3. Introduction to MobileNetV2
MobileNetV2 is a significant improvement over MobileNetV1 and pushes
the state of the art for mobile visual recognition, including classification, object
detection and semantic segmentation. MobileNetV2 is released as part
of the TensorFlow-Slim Image Classification Library, and it can also be explored
directly in Colaboratory or locally in a Jupyter notebook. MobileNetV2 is also available
as modules on TF-Hub, and pretrained checkpoints can be found on GitHub.
MobileNetV2 builds upon the ideas from MobileNetV1 [1], using depthwise
separable convolution as efficient building blocks. However, V2 introduces
two new features to the architecture: 1) linear bottlenecks between the layers,
and 2) shortcut connections between the bottlenecks. The basic structure
is shown below.
Overview of MobileNetV2 Architecture. Blue blocks represent composite convolutional building blocks as shown above.
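The two V2 features can be sketched as a Keras "inverted residual" block: a 1×1 expansion convolution with ReLU6, a 3×3 depthwise convolution, and a 1×1 projection back to a narrow bottleneck with no activation (the linear bottleneck), plus a shortcut connection when input and output shapes match. This is a simplified illustration assuming TensorFlow 2.x. The expansion factor 6 follows the MobileNetV2 paper, but the function name `inverted_residual_block` and the remaining details are our own, and this is not code from the project.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def inverted_residual_block(x, out_channels, expansion=6, stride=1):
    """Hypothetical helper: one simplified MobileNetV2-style bottleneck block."""
    in_channels = x.shape[-1]
    # 1x1 expansion to a wider representation, followed by ReLU6.
    h = layers.Conv2D(in_channels * expansion, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)
    # 3x3 depthwise convolution (one filter per channel), followed by ReLU6.
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)
    # 1x1 projection to the bottleneck with NO activation: the linear bottleneck.
    h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    # Shortcut connection between bottlenecks when the shapes allow it.
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])
    return h


inputs = keras.Input(shape=(32, 32, 16))
same_shape = inverted_residual_block(inputs, out_channels=16)                # uses the shortcut
downsampled = inverted_residual_block(same_shape, out_channels=24, stride=2)  # no shortcut
model = keras.Model(inputs, downsampled)
out = model.predict(np.zeros((1, 32, 32, 16), dtype="float32"), verbose=0)
```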
Project structure
a) Objective
To identify whether the person in an image/video stream is wearing a face
mask, with the help of computer vision and deep learning
algorithms, using the TensorFlow/Keras library.
b) Approach
• Process data
• Train deep learning model (MobileNetV2)
• Apply mask detector over images / live video stream
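Applying the detector implies the two-stage strategy from the introduction: first find faces, then classify each face crop. The sketch below shows only that control flow, assuming NumPy; `face_detector` and `mask_classifier` are hypothetical callables standing in for a real face detector and the trained MobileNetV2 classifier, and the label names are placeholders.

```python
import numpy as np


def detect_masks(frame, face_detector, mask_classifier, labels=("mask", "no mask")):
    """Run the two-stage face mask pipeline on one image or video frame.

    face_detector(frame) is assumed to return (x1, y1, x2, y2) pixel boxes;
    mask_classifier(face) is assumed to return one probability per label.
    """
    results = []
    for (x1, y1, x2, y2) in face_detector(frame):
        face = frame[y1:y2, x1:x2]  # crop the face region of interest
        probs = np.asarray(mask_classifier(face))
        best = int(np.argmax(probs))
        results.append(((x1, y1, x2, y2), labels[best], float(probs[best])))
    return results


# Usage with stand-in components on a blank 100x100 RGB frame.
def fake_detector(img):
    return [(10, 10, 50, 50)]


def fake_classifier(face):
    return [0.9, 0.1]


frame = np.zeros((100, 100, 3), dtype=np.uint8)
detections = detect_masks(frame, fake_detector, fake_classifier)
print(detections)
```

For a live stream, the same function would simply be called once per captured frame.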
The dataset/ directory contains the face mask detection dataset, made up of
images of people with masks and without masks.
A computer sees an input image as an array of pixels whose size depends on the image
resolution. Based on the image resolution, it will see h x w x d (h = height, w =
width, d = depth, i.e. the number of channels). For example, a 6 x 6 RGB image is a
6 x 6 x 3 array (3 refers to the R, G, B values), and a 4 x 4 grayscale image is a
4 x 4 x 1 array.
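In NumPy terms (a small illustration, not project code), the two example images are simply arrays with shape h x w x d:

```python
import numpy as np

# A 6 x 6 RGB image: height 6, width 6, depth 3 (one channel each for R, G, B).
rgb_image = np.zeros((6, 6, 3), dtype=np.uint8)
# A 4 x 4 grayscale image: a single intensity channel.
gray_image = np.zeros((4, 4, 1), dtype=np.uint8)

h, w, d = rgb_image.shape
num_values = rgb_image.size  # h * w * d individual pixel values
print(h, w, d, num_values, gray_image.shape)
```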
Technically, to train and test deep learning CNN models, each input image is passed
through a series of convolution layers with filters (kernels), pooling layers, and fully
connected (FC) layers, and a softmax function is then applied to classify an object with
probabilistic values between 0 and 1.
1. Convolution Layer
Convolution is the first layer to extract features from an input image. Convolution preserves the
relationship between pixels by learning image features using small squares of input data. It is a
mathematical operation that takes two inputs: an image matrix and a filter (or kernel).
Consider a 5 x 5 image whose pixel values are 0 or 1 and a 3 x 3 filter matrix, as shown
below.
Then the convolution of the 5 x 5 image matrix with the 3 x 3 filter matrix produces an
output called the “Feature Map”, as shown below.
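The sliding-window computation can be sketched in NumPy. The 5 x 5 binary image and 3 x 3 filter below are illustrative values, not necessarily the ones in the figure; each output cell is the sum of elementwise products between the filter and the image patch under it.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum elementwise products.

    Like deep-learning libraries, this does not flip the kernel
    (strictly speaking, it computes a cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map


image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # 3 x 3 feature map: [[4 3 4] [2 4 3] [2 3 4]]
```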
2. Padding
Sometimes the filter does not fit the input image perfectly. We have two options:
• Pad the picture with zeros (zero-padding) so that it fits
• Drop the part of the image where the filter did not fit. This is called valid padding, which keeps only
the valid part of the image.
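The effect of the two options on output size can be sketched with NumPy and the standard output-size formula (n + 2p - f) / s + 1, where n is the input size, f the filter size, p the padding, and s the stride (the step between filter positions, assumed to be 1 here). The helper name `conv_output_size` is our own.

```python
import numpy as np


def conv_output_size(n, f, p=0, s=1):
    """Spatial output size for an n x n input, f x f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1


image = np.ones((5, 5), dtype=int)

# Zero-padding: surround the image with a 1-pixel border of zeros.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

valid_size = conv_output_size(5, 3, p=0)  # valid padding: only positions where the filter fits
same_size = conv_output_size(5, 3, p=1)   # zero-padding by 1 keeps the 5 x 5 size
print(padded.shape, valid_size, same_size)
```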
The ReLU (rectified linear unit) operation replaces every negative value in a feature map with
zero, f(x) = max(0, x). It can be understood clearly from Figure 9 below, which shows the ReLU
operation applied to one of the feature maps obtained in the figure above. The output feature
map here is also referred to as the ‘Rectified’ feature map.
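Elementwise, ReLU is a one-liner in NumPy; the small feature map here is made up for illustration.

```python
import numpy as np


def relu(feature_map):
    """Rectified linear unit: keep positive values, replace negatives with 0."""
    return np.maximum(feature_map, 0)


feature_map = np.array([[-3, 5],
                        [2, -1]])
rectified = relu(feature_map)
print(rectified)  # [[0 5] [2 0]]
```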
Just like in the convolution step, creating the pooled feature map also lets us dispose of
unnecessary information or features. With max pooling over 2 x 2 windows, we have lost roughly 75% of
the original information found in the feature map, since for each 4 pixels in the feature map we kept
only the maximum value and got rid of the other 3. These are details that are unnecessary, and without
them the network can do its job more efficiently.
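The reason we extract the maximum value, which is actually the point of the whole pooling step,
is to account for distortions. Let's say we have three cheetah images, and in each image the
cheetah's tear lines are at a different angle. Because max pooling keeps only the strongest
response in each small region, the pooled features stay similar despite these small distortions.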
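A NumPy sketch of 2 x 2 max pooling with stride 2 (illustrative values). It keeps 4 of the 16 values, matching the "roughly 75% lost" figure above, and a strong feature shifted by one pixel inside the same window pools to the same result, which is the small-distortion tolerance described above.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2: keep the largest value in each window."""
    h, w = feature_map.shape
    fm = feature_map[: h - h % 2, : w - w % 2]  # drop an odd last row/column
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


fm = np.array([[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]])
pooled = max_pool_2x2(fm)
print(pooled)  # [[6 8] [3 4]]

# The same strong feature at two slightly different positions in one window.
a = np.zeros((4, 4))
a[0, 0] = 9
b = np.zeros((4, 4))
b[1, 1] = 9
same_after_pooling = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
```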
4. Flattening
After finishing the previous two steps, we're supposed to have a pooled feature map by
now. As the name of this step implies, we are literally going to flatten our pooled feature
map into a column like in the image below.
The reason we do this is that we're going to need to insert this data into an
artificial neural network later on.
As you see in the image above, we have multiple pooled feature maps
from the previous step.
What happens after the flattening step is that you end up with a long
vector of input data that you then pass through the artificial neural
network to have it processed further.
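Flattening can be sketched with NumPy, assuming the pooled maps are stored channels-last (height x width x number of maps), which is the default layout in Keras; the three 2 x 2 maps are made up.

```python
import numpy as np

# Three pooled 2x2 feature maps stacked along the last (channel) axis.
pooled = np.stack([
    np.array([[1, 2], [3, 4]]),
    np.array([[5, 6], [7, 8]]),
    np.array([[9, 10], [11, 12]]),
], axis=-1)  # shape (2, 2, 3)

# Flatten into one long input vector for the fully connected layers.
vector = pooled.reshape(-1)
print(pooled.shape, vector)  # row-major order: all maps at pixel (0, 0) first, then (0, 1), ...
```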
5. Fully Connected Layer
In the layer we call the FC layer, we feed the flattened vector into a fully
connected layer, like in a regular neural network.
At this point, the features extracted by the previous steps are already sufficient for a fair
degree of accuracy in recognizing classes. We now want to take it to the next level in terms
of complexity and precision, which is what the fully connected layers do.
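A NumPy sketch of the final stage: a fully connected (dense) layer computes W·x + b from the flattened vector, and softmax turns the resulting scores into probabilities between 0 and 1. The random weights and the two-class output are assumptions for illustration; in a trained network the weights are learned.

```python
import numpy as np


def softmax(z):
    """Turn raw scores into probabilities between 0 and 1 that sum to 1."""
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def fully_connected(x, weights, bias):
    """Dense layer: every input value is connected to every output unit."""
    return weights @ x + bias


rng = np.random.default_rng(0)
x = rng.normal(size=12)                   # flattened feature vector
weights = rng.normal(size=(2, 12)) * 0.1  # 2 output classes (untrained, random)
bias = np.zeros(2)
probs = softmax(fully_connected(x, weights, bias))
print(probs, probs.argmax())
```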
Since we're now done with this section, let's make a quick recap of what we
learned about convolutional neural networks. In the diagram below, you can see
the entire process of creating and optimizing a convolutional neural network that
we covered throughout the section.
As you see and should probably remember, the process goes as follows:
• We apply filters or feature maps to the image, which gives us a convolutional layer.
• We then break up the linearity of that image using the rectifier function.
• The image becomes ready for the pooling step, the purpose of which is providing our convolutional
neural network with the faculty of “spatial invariance”, as explained in more detail in the
pooling step.
• After we're done with pooling, we end up with a pooled feature map.
• We then flatten our pooled feature map before inserting it into an artificial neural network.
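The recap maps directly onto Keras layers. A minimal sketch assuming TensorFlow 2.x; the input size, filter count, and dense width are arbitrary choices, not the project's model.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_small_cnn(input_shape=(64, 64, 3), num_classes=2):
    """Each recap step as one Keras layer (sizes are illustrative)."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), padding="same"),        # apply filters -> feature maps
        layers.ReLU(),                                     # break linearity with the rectifier
        layers.MaxPooling2D((2, 2)),                       # pooling -> pooled feature maps
        layers.Flatten(),                                  # flatten into one long vector
        layers.Dense(32, activation="relu"),               # fully connected layer
        layers.Dense(num_classes, activation="softmax"),   # class probabilities
    ])


model = build_small_cnn()
probs = model.predict(np.zeros((1, 64, 64, 3), dtype="float32"), verbose=0)
```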
Implementing our COVID-19 face mask detector
The imports for our training script may look intimidating because there are so many of them.
Here, we’ve specified hyperparameter constants, including our initial learning rate,
number of training epochs, and batch size. Later, we will be applying a learning rate decay
schedule, which is why we’ve named the learning rate variable INIT_LR.
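A sketch of such constants together with one common decay rule, time-based decay lr = INIT_LR / (1 + decay · step), which is what the older Keras optimizer `decay` argument applied per training step. The numeric values (1e-4, 20 epochs, batch size 32) and the choice decay = INIT_LR / EPOCHS are assumptions for illustration; this report does not list them.

```python
# Hyperparameter constants (illustrative values, assumed for this sketch).
INIT_LR = 1e-4  # initial learning rate, decayed during training
EPOCHS = 20     # number of passes over the training data
BS = 32         # batch size


def time_based_decay(step, init_lr=INIT_LR, decay=INIT_LR / EPOCHS):
    """Learning rate after `step` optimizer updates: init_lr / (1 + decay * step)."""
    return init_lr / (1.0 + decay * step)


print(time_based_decay(0), time_based_decay(1000))
```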
At this point, we’re ready to load and pre-process our training data:
In this block, we are:
• Grabbing all of the imagePaths in the dataset
• Initializing data and labels lists (Lines 36 and 37)
• Looping over the imagePaths and loading + pre-processing images (Lines 39-48). Pre-processing
steps include resizing to 224×224 pixels, conversion to array format, and scaling the
pixel intensities in the input image to the range [-1, 1] (via the preprocess_input convenience
function)
• Appending the pre-processed image and associated label to the data and labels lists,
respectively (Lines 47 and 48)
• One-hot encoding our class labels (Lines 51-53), meaning that each label becomes a binary
vector with a 1 at the index of its class and 0 elsewhere (for example, [1, 0] or [0, 1] for two classes)
• Ensuring our training data is in NumPy array format (Lines 55 and 56)
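The scaling and one-hot steps can be sketched in plain NumPy. In Keras, `load_img(path, target_size=(224, 224))`, `img_to_array`, and `tf.keras.applications.mobilenet_v2.preprocess_input` handle loading, resizing, and scaling; for MobileNetV2 that scaling is x / 127.5 - 1, reproduced below. The class names `with_mask` and `without_mask` are hypothetical.

```python
import numpy as np


def scale_pixels(image):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1] (x / 127.5 - 1)."""
    return image.astype("float32") / 127.5 - 1.0


def one_hot(labels, classes):
    """Encode each label as a vector with 1 at its class index and 0 elsewhere."""
    index = {name: i for i, name in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype="float32")
    for row, name in enumerate(labels):
        encoded[row, index[name]] = 1.0
    return encoded


classes = ["with_mask", "without_mask"]  # hypothetical class / folder names
pixels = scale_pixels(np.array([0, 255], dtype=np.uint8))
labels = one_hot(["with_mask", "without_mask", "with_mask"], classes)
print(pixels, labels)
```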
Using scikit-learn’s convenience method, Lines 58 and 59 segment our data into 80% for
training and the remaining 20% for testing.
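A sketch of the 80/20 split with scikit-learn's `train_test_split`, assuming scikit-learn is installed. The random stand-in data, the `random_state` value, and stratifying on the class index (so both splits keep the class balance) are our choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 10 tiny "images" and their one-hot labels (5 per class).
data = np.random.default_rng(42).random((10, 8, 8, 3)).astype("float32")
labels = np.array([[1, 0]] * 5 + [[0, 1]] * 5, dtype="float32")

# 80% training / 20% testing; stratifying keeps the class ratio in both splits.
trainX, testX, trainY, testY = train_test_split(
    data, labels, test_size=0.20, stratify=labels.argmax(axis=1), random_state=42
)
print(trainX.shape, testX.shape)
```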
During training, we’ll be applying on-the-fly mutations to our images in an effort to
improve generalization. This is known as data augmentation, where the random
rotation, zoom, shear, shift, and flip parameters are established on Lines 62-69. We’ll
use the aug object at training time.
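The script itself sets these parameters on an augmentation object; as an alternative sketch, assuming TensorFlow 2.6 or later, the same kinds of random rotation, zoom, shift, and horizontal flip can be expressed with Keras preprocessing layers. Shear is left out because not every Keras version ships a shear layer, and the factor values are assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomRotation(20 / 360),     # up to about +/-20 degrees (factor = fraction of a full turn)
    layers.RandomZoom(0.15),             # zoom in or out by up to 15%
    layers.RandomTranslation(0.2, 0.2),  # shift by up to 20% vertically and horizontally
    layers.RandomFlip("horizontal"),     # mirror left-right
])

batch = np.random.default_rng(0).random((4, 224, 224, 3)).astype("float32")
augmented = augment(batch, training=True)  # new random mutations on every call
```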
Lines 96-98 compile our model with the Adam optimizer, a learning rate decay schedule, and
binary cross-entropy. If you’re building from this training script with more than 2 classes,
be sure to use categorical cross-entropy.
Face mask training is launched via Lines 102-107. Notice how our data augmentation object
(aug) will be providing batches of mutated image data.
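With Keras preprocessing layers, one way to feed batches of freshly mutated images is a `tf.data` pipeline. A sketch assuming TensorFlow 2.x; the stand-in data, batch size, and reduced set of augmentations are illustrative.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Simplified augmentation pipeline (assumed settings).
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(20 / 360),
])


def make_train_dataset(images, labels, batch_size=32):
    """Yield shuffled batches whose images are freshly augmented on every epoch."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(len(images)).batch(batch_size)
    ds = ds.map(lambda x, y: (augment(x, training=True), y))
    return ds.prefetch(tf.data.AUTOTUNE)


# Stand-in training data: 10 RGB images of 64x64 with one-hot labels.
trainX = np.random.default_rng(0).random((10, 64, 64, 3)).astype("float32")
trainY = np.eye(2, dtype="float32")[[0, 1] * 5]
train_ds = make_train_dataset(trainX, trainY, batch_size=4)
first_images, first_labels = next(iter(train_ds))
```

A compiled model would then be trained with something like `model.fit(train_ds, validation_data=(testX, testY), epochs=EPOCHS)`.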
Here, Lines 111-115 make predictions on the test set, grabbing the highest probability class
label indices. Then, we print a classification report in the terminal for inspection.
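The evaluation step can be sketched with NumPy's `argmax` and scikit-learn's `classification_report`. The predicted probabilities and ground truth below are made up, and the class names are hypothetical.

```python
import numpy as np
from sklearn.metrics import classification_report

# Stand-in predicted probabilities for 4 test images and their one-hot ground truth.
pred_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
testY = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

# For each image, keep the index of the class with the highest probability.
pred_idxs = np.argmax(pred_probs, axis=1)

report = classification_report(
    testY.argmax(axis=1), pred_idxs, target_names=["with_mask", "without_mask"]
)
print(report)
```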
Line 123 serializes our face mask classification model to disk.
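Serializing and reloading a Keras model can be sketched as follows, assuming TensorFlow 2.x (recent versions expect the `.keras` extension). The tiny stand-in model and the filename `mask_detector.keras` are hypothetical.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras


def save_and_reload(model, path):
    """Serialize a Keras model to disk, then load it back."""
    model.save(path)
    return keras.models.load_model(path)


# Usage with a tiny stand-in model (the real script saves the trained detector).
model = keras.Sequential([keras.Input(shape=(3,)), keras.layers.Dense(2, activation="softmax")])
path = os.path.join(tempfile.mkdtemp(), "mask_detector.keras")  # hypothetical filename
restored = save_and_reload(model, path)

x = np.ones((1, 3), dtype="float32")
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
```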
Screenshots from Execution
Here we showed the program a monkey face, whose features are close to those of a human
face, but it does not consider it a human face.
CONCLUSION