ABSTRACT
Deep Learning (DL) and Machine Learning (ML) approaches have emerged as an
innovative paradigm in facial analysis for age and gender identification. This
methodology uses a classical machine learning model, an SVM, and a deep learning
model, a CNN trained from scratch, to make accurate predictions by decoding
facial patterns.
The photos in the dataset display a range of lighting conditions, partial occlusions,
various angles, facial expressions, and resolutions. This dataset can be used for a
variety of applications, including face detection, age estimation, ageing process
prediction, facial landmark identification, and more.
We have used this dataset for age and gender prediction using deep learning and
machine learning techniques. The dataset originally contains 23,706 face images
annotated with age, gender, and ethnicity. Only the age and gender annotations are
needed to train the proposed model, so the ethnicity column has been removed from
the dataset. The age column has been grouped into distinct categories (age ranges)
so that age prediction can be treated as a classification problem.
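The label preparation described above can be sketched as follows. This is a minimal illustration, not the code used in this work: it assumes the common UTKFace file-name layout `[age]_[gender]_[race]_[date].jpg`, and the five age ranges in `AGE_BINS` are hypothetical, since the exact ranges are not listed here.

```python
from typing import NamedTuple

# Hypothetical age ranges (inclusive); the actual ranges are an assumption.
AGE_BINS = [
    (0, 12, "0-12"),
    (13, 19, "13-19"),
    (20, 39, "20-39"),
    (40, 59, "40-59"),
    (60, 116, "60-116"),
]


class FaceLabel(NamedTuple):
    age: int
    gender: int  # 0 = male, 1 = female
    age_group: str


def age_to_group(age: int) -> str:
    """Map an age in years to the name of its age range."""
    for low, high, name in AGE_BINS:
        if low <= age <= high:
            return name
    raise ValueError(f"age {age} is outside all configured ranges")


def parse_utkface_name(filename: str) -> FaceLabel:
    """Read age and gender from a file name and drop the ethnicity field."""
    age_str, gender_str, _ethnicity, _rest = filename.split("_", 3)
    age = int(age_str)
    return FaceLabel(age, int(gender_str), age_to_group(age))
```

Files whose names do not contain all of the expected fields raise a `ValueError` and would need to be skipped or fixed by hand.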
The proposed CNN model contains 15 layers, of which 4 are convolutional layers, and
the activation function used in the final layer is SoftMax for classification of
age groups. After training, the model is tested on unseen data to check whether its
predictions are correct.
The cornerstone of our software is the use of CNNs for age and gender
prediction. These neural networks are developed from scratch, which allows
the system architecture to be customized for extracting distinctive features
from facial images of individuals for age and gender prediction. Using CNNs
allows us to capture the intricate details and variations present in facial
images, offering high accuracy even in real-time scenarios.
Furthermore, the real-time processing capabilities are achieved through robust
data processing pipelines, optimized model prediction strategies, and
integrated interfaces between system components. The architecture design
incorporates CNNs built from scratch within a flexible software design that is
tailored for the accurate prediction of real-time facial images. The aim is to
deliver precise, reliable, and effective predictions that are suited to a wide
range of real-world applications by harnessing the benefits of neural networks.
5.1.1 Architecture Design
Following is the system architecture design of the software application:
This module is the core of the subsystem architecture and contains the dataset
that is going to be used for model training and validation. It consists of data
acquisition, annotation, and splitting to ensure the quality and integrity of the
dataset. The UTKFace dataset is a large set of facial images that covers
people ranging in age from newborns to 116 years old and both genders (female
and male). It has 23,706 photos with accompanying information on age and
gender. This module works closely with data preprocessing as it involves the
resizing of the images before training the models. Our system ensures that the
training data is properly formatted and annotated for reliable age and gender
predictions. It streamlines the data processing pipeline ensuring the overall
effectiveness of the model.
Prediction Module
After the models are trained, they are used to predict age and gender from
unseen facial images of the testing dataset. This module runs on the backend
server of the application and returns results for the real-time images input
by the users. Face detection is performed first, using pretrained MediaPipe or
YOLO models, to accurately locate the facial regions of the individuals. After
receiving a new image from the user, either captured or imported, this module
uses the face detection algorithm to detect the face and then uses the CNN
model and its weights to predict the age and gender of the individual. The
predicted age and gender labels are then displayed to the user and the
information is saved into the database at the backend for further analysis.
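The flow of the prediction module can be sketched as below. This is an illustrative outline only: the face detector, the two models, and the database writer are passed in as plain callables, and all names (`predict_age_gender`, `detect_face`, `save_result`, the bounding-box layout) are hypothetical rather than taken from the application code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width), an assumed layout
Image = List[List[float]]        # a 2-D grid of pixel values, for illustration


def crop(image: Image, box: Box) -> Image:
    """Cut the face region out of the image."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def predict_age_gender(
    image: Image,
    detect_face: Callable[[Image], Optional[Box]],
    age_model: Callable[[Image], str],
    gender_model: Callable[[Image], str],
    save_result: Callable[[Dict[str, str]], None],
) -> Optional[Dict[str, str]]:
    """Detect a face, predict age group and gender, then store the result."""
    box = detect_face(image)
    if box is None:
        return None  # no face found, nothing to predict or save
    face = crop(image, box)
    result = {"age_group": age_model(face), "gender": gender_model(face)}
    save_result(result)
    return result
```

Passing the detector and models as arguments keeps the sketch independent of whether MediaPipe or YOLO is used for detection.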
When a user starts the application and uploads an image after signing in, this
component captures the image data and passes it to the prediction module. After
prediction, the results are displayed on the user interface and saved in the
database. By integrating the CNN model with the software interface, the
subsystem architecture allows users to interact easily with the prediction
module to get the results, and it also improves the application's usability and
efficiency. The software interface is user friendly: it makes it easy to input
facial images, receive accurate predictions, and save the findings. This
improves the entire user experience and facilitates intuitive usage.
The following functional criteria apply to an app that uses facial analysis to
identify age and gender:
Use-Case 1: Sign Up
Table 1 Use-Case 1
Identifier: UC1
Purpose: Sign-up of users with a username and password.
Priority: High
Pre-conditions: The user must have an email address to sign up.
Post-conditions: The user must access their email and get the 6-character verification code to verify the email address.
Typical Course of Action
S# | Actor Action | System Response
1 | Select the registration button. | Open the webpage for completing the sign-up process.
2 | Fill in the registration form and press the sign-up button. | Register the data entered by the user in the database. Send a message with the email verification code to the user's registered email address.
3 | Log in to the system with username and password after verification. | Display the Home page with a welcome message to the registered user.
Use-Case 2: Sign In
Table 2 Use-Case 2
Identifier: UC2
Purpose: Sign-in of users with a username and password.
Priority: High
Pre-conditions: The user must have been registered earlier on our system.
Post-conditions: The user has verified their email address.
Typical Course of Action
S# | Actor Action | System Response
1 | Sign in using username and password by pressing the Log in button. | The system successfully redirects the user to the Home Page with a welcome message.
2 | Press Sign In. |
Use-Case 3: Forgot Password and Reset Password
Table 3 Use-Case 3
Identifier: UC3
Priority: High
Pre-conditions: User registered in the application
Post-conditions: Fill the input requirements
S# | Actor Action | System Response
Table 4 Use-Case 4
Identifier: UC4
Priority: High
Pre-conditions: User on the image upload face
Post-conditions: Face detection of imported image
S# | Actor Action | System Response
Table 5 Use-Case 5
Identifier: UC5
Priority: High
Pre-conditions: User on the image captured face
Post-conditions: Face detection of captured image
S# | Actor Action | System Response
Table 6 Use-Case 6
Identifier: UC6
Priority: High
Pre-conditions: Selection of imported/captured images
S# | Actor Action | System Response
Table 7 Use-Case 7
Identifier: UC7
Priority: High
Pre-conditions: Face detection of imported/captured images
Post-conditions: Results saved in a database
S# | Actor Action | System Response
Table 8 Use-Case 8
Identifier: UC8
Priority: Medium
Post-conditions: Analysis results shared or exported
S# | Actor Action | System Response
3 | Indicates successful sharing/exporting of analysis results | Confirmation message or shareable link generated
5.1.6 Use-case diagram
5.1.7 Constraints
1) The application will accept only one image at a time; users cannot
upload multiple facial images of different individuals at once.
5.1.8 Composition
The application is composed of the following modules:
Prediction Module
After face detection of the uploaded/captured image, the image will be passed
to prediction module where predictions are made using the trained CNN
model and its weights.
Model Training
The model used at the backend is trained on the popular UTKFace dataset
and evaluated using the testing dataset. After the evaluation, it is used to
estimate the age and gender of images uploaded by the user.
Dataset module
The dataset module contains the dataset that is used for the training, validation
and testing of the model and to improve the model’s performance.
Performance
Security
We will put in place strong safeguards to protect user information and make
sure that data protection laws (GDPR, CCPA, etc.) are followed. To limit access
to important functions and data, authentication and authorization methods will
be used.
Usability
We will create an intuitive user interface that ensures a smooth experience for
users with different levels of technical expertise. During picture uploads and
result presentations, users will be given concise, helpful feedback, along with
error messages for unsuccessful analyses.
Maintainability
We will create the application with a modular framework that makes upgrades
and future improvements simpler. To keep track of modifications and oversee
the development of the program, we will use version control systems.
Compliance
We will make sure that all legal and industry standards for image processing,
facial recognition, and data protection are followed. User consent and privacy
will be respected by incorporating ethical principles and considerations into
the app's design and usage.
Software Requirements
2) Programming Languages:
5.1.12 Interface/Exports
1) The software interface has been developed with React Native (Expo).
2) REST APIs built with Django are used to integrate the model with the frontend.
During model training, the Adam optimizer is employed with binary cross-entropy as
the loss function and accuracy as the evaluation metric. This architecture has proven
effective for gender classification in grayscale facial images. Input images are provided
in RGB format at 100 x 100 pixels.
In the training phase, different batch sizes (256, 128, 64) were explored to find the
best configuration. Extensive experimentation identified a batch size of 256 as
providing the best results, balancing convergence speed and reliable gradient
estimation. The model's performance was assessed using both the Adam and stochastic
gradient descent (SGD) optimizers across various learning rates. Adam consistently
gave better results, highlighting the importance of optimizer and learning rate
selection.
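The search over batch sizes, optimizers, and learning rates can be organised as a simple grid, sketched below. The batch sizes and optimizer names come from the text; the learning-rate values are assumed to be those listed in Table 11, and `train_and_evaluate` is a hypothetical callable that trains one configuration and returns its validation accuracy.

```python
from itertools import product
from typing import Callable, List, Tuple

BATCH_SIZES = (256, 128, 64)
OPTIMIZERS = ("adam", "sgd")
LEARNING_RATES = (0.0001, 0.001, 0.01, 0.1, 1.0)  # assumed search values

Result = Tuple[float, int, str, float]  # (accuracy, batch size, optimizer, lr)


def grid_search(train_and_evaluate: Callable[..., float]) -> Tuple[Result, List[Result]]:
    """Evaluate every combination and return the best one plus all results."""
    results: List[Result] = []
    for batch_size, optimizer, lr in product(BATCH_SIZES, OPTIMIZERS, LEARNING_RATES):
        accuracy = train_and_evaluate(
            batch_size=batch_size, optimizer=optimizer, learning_rate=lr
        )
        results.append((accuracy, batch_size, optimizer, lr))
    best = max(results, key=lambda r: r[0])  # highest accuracy wins
    return best, results
```

Keeping all results, not only the best one, makes it possible to tabulate accuracy against each hyperparameter afterwards.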
Finally, the model's practical use was shown by correctly predicting the gender category
('male' or 'female') of selected images from the test set, demonstrating its
effectiveness in real-world gender identification. This comprehensive method
emphasizes the model's robustness and reliability in handling grayscale facial image
classification for gender identification.
The developed Convolutional Neural Network (CNN) framework for age prediction
categorizes facial photos into distinct age groups. The model contains 4
two-dimensional convolutional layers whose number of filters gradually increases from
16 to 128. Each convolutional layer is followed by batch normalization to stabilize and
accelerate training, and uses activation functions such as the rectified linear unit
(ReLU) and hyperbolic tangent (tanh) to extract complex patterns hierarchically from
the input images. Max-pooling layers with a pool size of (2,2) are placed after each
convolutional layer to downsample the feature maps and improve computational
performance.
The extracted facial features are then flattened so that they can be processed by the
fully connected (dense) layers that follow the convolutional layers. The model contains
2 dense layers with 256 and 128 neurons respectively, each followed by dropout with a
rate of 0.1. Dropout helps avoid overfitting by randomly deactivating a subset of
neurons during training. The output layer of this model is composed of 5 neurons with
a SoftMax activation function, which yields the probability of each age group.
The Adam optimizer is used with categorical cross-entropy as the loss function during
training, which is suitable for multi-class classification tasks such as age group
prediction. Accuracy is used as the performance metric; it measures the percentage of
correctly predicted age groups.
This CNN framework gives the model the capability to learn visual attributes and to
distinguish efficiently between different age categories based on input facial images.
The careful combination of convolutional, batch normalization, dropout, and dense
layers, along with proper activation functions and optimization techniques,
underscores the model's effectiveness and potential in age prediction using facial
analysis.
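A network of this shape could be written in Keras roughly as follows. This is a sketch under stated assumptions rather than the exact model of this work: the 3 x 3 kernels, "same" padding, the intermediate filter counts (32 and 64), the use of ReLU in every hidden layer, and the learning rate of 0.0001 are assumptions, and the resulting layer and parameter counts need not match the figures reported here.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_age_cnn(input_shape=(100, 100, 3), num_classes=5):
    """Four Conv2D blocks, two dense layers with dropout, softmax output."""
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    # Filters grow from 16 to 128; the intermediate values are assumed.
    for filters in (16, 32, 64, 128):
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    for units in (256, 128):
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(0.1))
    # One probability per age group.
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_age_cnn().summary()` prints the layer-by-layer output shapes and parameter counts for this particular configuration.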
Figure 13 Block diagram of CNN for age identification
The output layer applies a 'softmax' activation function across its neurons. This
layer provides age predictions and is optimized using the categorical cross-entropy
loss function. The model is trained with the Adam optimizer, which adapts the learning
rate for each parameter as training proceeds. Performance metrics are also used to
determine how precisely the model categorizes age groups from facial images. The
framework predicts age categories from input facial images, applying regularization
techniques to prevent overfitting and using suitable activation functions and
optimizers to increase the model's performance.
70% of the dataset is used for training the model for a set number of epochs with a
given batch size. Callbacks are applied, such as EarlyStopping to reduce overfitting
and ReduceLROnPlateau to dynamically change the learning rate. 20% of the dataset is
used for validation to guide the fine-tuning of the model. Lastly, the trained model
is evaluated on the testing subset, the remaining 10% of the dataset, to check its
performance and robustness on unseen data.
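The 70/20/10 partition can be illustrated with a small index-based split. The function name, the fixed seed, and the use of a single random shuffle are assumptions made for this sketch; the split is not stratified.

```python
import random
from typing import List, Tuple


def split_indices(
    n_samples: int, train: float = 0.7, val: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices once and cut them into train/val/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(n_samples * train)
    n_val = round(n_samples * val)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # whatever remains, about 10%
    return train_idx, val_idx, test_idx
```

For the 23,706 images in the dataset this gives 16,594 training, 4,741 validation, and 2,371 test samples.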
A separate testing subset, constituting 10% of the dataset, is dedicated to evaluating the
model's performance on previously unseen data. The model is applied to this subset,
and predictions are compared with the actual gender labels. This testing process
provides insights into the generalization capabilities of the model and its effectiveness
in real-world scenarios.
6.2.1 PREPROCESSING
We have used the UTKFace dataset, and each image is read from this dataset.
Pre-processing is necessary for all the photos in the image dataset, including
normalization against brightness variations, scaling, and noise removal. The facial
area is identified by applying the Viola-Jones technique to each image, and the
detected faces are then scaled to a predetermined size of 48x48 pixels.
Extraction of usable features from face photos is a crucial step in successful gender
classification. Histogram of Oriented Gradients (HOG) features and Gabor filter
features are combined as input for gender classification.
HISTOGRAM OF ORIENTED GRADIENTS (HOG)
HOG divides an image into smaller cells and computes a histogram of gradient
orientations for each cell. The histograms are then combined into a single feature
vector that describes the overall structure of the image.
The HOG features are extracted using "skimage".
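To make the per-cell step concrete, the sketch below computes a magnitude-weighted orientation histogram for a single cell in plain Python. It is a simplified illustration, not the skimage implementation: it uses central differences, 9 unsigned orientation bins (an assumed but common choice), and omits the bin interpolation and block normalization that full HOG applies.

```python
import math
from typing import List


def cell_orientation_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Histogram of unsigned gradient orientations (0-180 degrees) for one cell."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Skip the border so that central differences are defined everywhere.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

A cell whose intensity increases only from left to right puts all of its gradient magnitude into the first bin, while a top-to-bottom ramp fills the bin around 90 degrees.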
GABOR FILTERS
Gabor filters, named after physicist Dennis Gabor, are linear filters commonly used in
image processing and computer vision. They are particularly effective for texture
analysis and edge detection. These features are extracted using "OpenCV".
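For reference, the real-valued Gabor kernel in its commonly used form is given below. The symbols are not defined elsewhere in this report and are introduced here only for illustration: \lambda is the wavelength of the sinusoid, \theta the orientation, \psi the phase offset, \sigma the standard deviation of the Gaussian envelope, and \gamma the spatial aspect ratio. The parameter values used in this work are not stated.

```latex
\[
g(x, y; \lambda, \theta, \psi, \sigma, \gamma)
  = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)
    \cos\!\left(2\pi \frac{x'}{\lambda} + \psi\right),
\qquad
x' = x\cos\theta + y\sin\theta, \quad
y' = -x\sin\theta + y\cos\theta
\]
```

Filtering a face image with kernels at several orientations \theta and wavelengths \lambda and collecting the responses gives a texture descriptor.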
Combining features refers to merging different attributes of a data set into one
feature representation. We have combined the features by concatenation, that is, by
joining the individual feature vectors into a single vector, which results in a
higher-dimensional feature representation.
Next, we split the dataset into training and testing sets to evaluate the model's
performance. 80% of the data is used for training the model and 20% is used to
evaluate the model's performance.
Then we reduced the dimensions of the feature vector by using PCA. PCA is a
statistical technique used to decrease the dimensionality of data while retaining as
much of the information as possible. The primary goal of PCA is to identify a new set
of orthogonal axes, known as principal components, that capture the most significant
variation in the data.
Gender recognition has been achieved in this study by using an SVM classifier. This
step involves training the SVM classifier so that it is able to differentiate
between the two classes, i.e., female and male.
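The feature combination, 80/20 split, PCA, and SVM steps can be chained with scikit-learn as sketched below. The feature arrays here are random stand-ins rather than real HOG and Gabor features, and the feature lengths, `n_components=50`, and the RBF kernel choice at this point are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic placeholders for the per-image feature vectors.
hog_features = rng.normal(size=(n_samples, 900))
gabor_features = rng.normal(size=(n_samples, 64))
labels = rng.integers(0, 2, size=n_samples)  # 0 = male, 1 = female

# Concatenate the two descriptors into one higher-dimensional vector per image.
features = np.hstack([hog_features, gabor_features])

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0, stratify=labels
)

# PCA is fitted on the training split only, then the SVM is trained on top.
classifier = make_pipeline(PCA(n_components=50), SVC(kernel="rbf"))
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)
```

Wrapping PCA and the classifier in one pipeline ensures that the projection learned from the training data is reused unchanged on the test data.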
Figure 14 Flowchart of gender identification using SVM
Chapter 7
Images are converted to grayscale and resized (100 x 100 x 3). The output dense
layer consists of one unit with a 'sigmoid' activation function, whose output
indicates either the 'male' or the 'female' class. As this is a binary classification
problem, the loss function used here is binary cross-entropy. Optimization is done
using the Adam optimizer with a learning rate of 0.0001. The final model consists of
688,261 trainable parameters.
Similarly, the images (100 x 100 x 3) are first converted to RGB channels. The images
are then passed to the model for classifying the age category. As this is a
multi-class classification model, the loss function used is categorical cross-entropy,
and the optimizer and learning rate are the same as for gender identification. This
model also contains 688,261 trainable parameters.
The CNN model described here is specifically designed for gender identification
using RGB images of size 100 x 100 pixels. The decision to convert images to
RGB is strategic, focusing on essential image features for gender classification.
determined that a batch size of 256 achieved the best balance of performance and
resource efficiency for the CNN model.
An example using a specific image from the test set illustrates the model's
effectiveness in predicting the gender category ('male' represented as 0 and
'female' represented as 1). To prepare the test image for model input, image
preprocessing techniques were applied, and the gender was predicted from the
class with the highest output probability.
hyperparameter tuning and thorough evaluation, validates the CNN model's
efficacy in gender identification tasks. The study's results present key insights into
optimizing CNN architectures for similar image classification tasks and highlight
the model's effectiveness in real-world applications requiring precise gender
prediction from RGB images.
Table 11 Comparison of the effect of different learning rates on the training dataset
Learning Rate | Accuracy
0.0001 | 89.8%
0.001 | 91.46%
0.01 | 91.8%
0.1 | 93.19%
1.0 | 92.72%
Table 12 Comparison of the effect of different batch sizes on the testing dataset
Figure 15 Test Accuracy of gender identification CNN model
A systematic approach is used in this study to optimize the hyperparameters of the
age identification model comprehensively. The model's robustness was evaluated
across different batch sizes () while exploring many combinations of learning
rates, patience values for early stopping, and minimum delta thresholds for
stopping the training. Each configuration was trained on a designated training set
and validated on a separate validation set to gauge its effect on performance
metrics.
Accuracy and loss metrics were closely observed to understand how changes
in hyperparameters affected the model's learning dynamics and convergence
throughout this iterative process. Systematically changing these parameters gave
valuable insights into the model's behavior under different training
circumstances.
Additionally, the generalizability of the trained model was assessed by applying the
model to an unseen dataset that was not included in the training and validation
data. Test loss and accuracy were the metrics used to assess the model's
performance on this unseen dataset. The model's predictive capabilities in
real-world conditions were shown by the predictions it generated on photos from
the unseen data.
Image preprocessing techniques were applied to appropriately format the test data
to ensure compatibility with the model's input requirements. This preprocessing
step assisted precise age category predictions by the model, identifying the age
group with the highest probability among its output categories.
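Selecting the age group with the highest probability amounts to an argmax over the model's output vector, as in the sketch below. The group names in `AGE_GROUPS` are hypothetical placeholders; only the number of groups (five) is taken from the model description.

```python
from typing import List, Sequence, Tuple

# Hypothetical names for the five output classes, in output order.
AGE_GROUPS: List[str] = ["0-12", "13-19", "20-39", "40-59", "60-116"]


def decode_age(probabilities: Sequence[float]) -> Tuple[str, float]:
    """Return the most probable age group and its probability."""
    if len(probabilities) != len(AGE_GROUPS):
        raise ValueError("expected one probability per age group")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return AGE_GROUPS[best], probabilities[best]
```

Returning the probability alongside the label lets the interface show how confident the prediction is.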
Table 13 Comparison of the effect of different learning rate on the training dataset
Figure 17 Accuracy vs Learning Rate
Table 14 Comparison of the effect of different batch sizes on the testing dataset
Figure 18 Test Accuracy of age identification CNN model
ACCURACY
The accuracy of gender prediction using the Convolutional Neural Network (CNN)
architecture is 89.49%, and for age prediction it is 81.81%. Accuracy is the
proportion of correctly classified gender labels among all the predictions made
and, for age, the proportion of correctly categorized instances in the test set.
The high accuracy shows that the model performs well in discriminating between
male and female subjects and across different age categories based on their
facial features.
F1 SCORE
The F1 score is computed as 0.886, which gives a balanced view of the model's
precision and recall. It is a robust metric for binary classification tasks like
gender prediction because it accounts for both false positives and false negatives.
A high F1 score indicates a good balance between minimizing false positives and
false negatives, meaning the model captures the underlying features for gender
prediction in the dataset well. The macro F1 score across all age categories is
approximately 0.7607. It gives a balanced view of the model's precision and recall
over all age categories.
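The macro F1 score is the unweighted mean of the per-class F1 scores. The sketch below computes it from a confusion matrix whose rows are true classes and whose columns are predicted classes; the matrix in the usage note is made up for illustration and is not the confusion matrix of this study.

```python
from typing import List


def per_class_f1(confusion: List[List[int]]) -> List[float]:
    """F1 score of each class from a square confusion matrix."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(confusion[k]) - tp                        # truly k, predicted other
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return scores


def macro_f1(confusion: List[List[int]]) -> float:
    """Unweighted average of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)
```

For the illustrative matrix `[[8, 2], [1, 9]]`, the per-class scores are 16/19 and 18/21, so the macro F1 is about 0.850.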
ROC
The Receiver Operating Characteristic (ROC) curve, along with its corresponding
Area Under the Curve (AUC), measures the model's ability to distinguish between
male and female subjects across different threshold values. The ROC AUC is
approximately 0.9658, which indicates that the model is very good at
distinguishing between male and female faces, with a comparatively high true
positive rate and a low false positive rate.
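The AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; it is quadratic in the number of samples and intended only as an illustration, with the function name and the tiny example being assumptions.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75.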
SPECIFICITY
accurately identify gender based on facial features. For age, specificity measures the
true negative cases within each age category. It is calculated as 0.7392, and it
fluctuates across the different age categories. Classes with higher specificity,
such as class 0, are better at correctly rejecting individuals who fall outside
that age category.
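Per-class specificity in a multi-class setting is usually computed one-vs-rest: for each class, true negatives are all samples that neither belong to the class nor were predicted as it. The sketch below assumes a confusion matrix with true classes as rows and predicted classes as columns; the example matrix is invented for illustration.

```python
from typing import List


def per_class_specificity(confusion: List[List[int]]) -> List[float]:
    """One-vs-rest specificity TN / (TN + FP) for every class."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    result = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp
        fn = sum(confusion[k]) - tp
        tn = total - tp - fp - fn
        result.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return result
```

For the illustrative matrix `[[5, 1, 0], [2, 6, 2], [0, 1, 7]]`, the specificities are 16/18, 12/14, and 14/16, and averaging them gives a single macro figure.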
SENSITIVITY
Table 15 Performance Metrics for age and gender prediction through CNN
[Figure: CNN Model Performance for age prediction. Bar chart of Accuracy, F1 Score, Sensitivity, and Specificity; vertical axis from 68.00% to 82.00%.]
Kernel | Accuracy
Linear | 82%
Poly | 85%
RBF | 86%
Accuracy
The overall accuracy of the model stands at 86%, meaning that the model predicts
the gender correctly 86% of the time. We achieved this accuracy using the 'rbf'
kernel of the support vector machine.
Precision
Precision measures the correctness of positive predictions. The model achieves a
precision of 85% for predicting males and 86% for predicting females. This means
that when the model predicts male or female, it is right 85% and 86% of the time,
respectively.
Recall
Recall, or sensitivity, measures the model's capacity to identify every relevant
instance. For this model, the recall is 84% for males and 87% for females. This
means that the model correctly identifies 84% of all actual males and 87% of all
actual females.
F1 score
The F1 score, which is the harmonic mean of precision and recall, stands at 85% for
males and 86% for females. These F1 scores indicate a good balance between
precision and recall, giving a more complete view of the model's effectiveness in
predicting the two genders.
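For reference, the three metrics above are related as follows, where TP, FP, and FN are the true positive, false positive, and false negative counts for the class in question (notation introduced here for illustration):

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
\[
F_1^{\text{male}} \approx \frac{2 (0.85)(0.84)}{0.85 + 0.84} \approx 0.845, \qquad
F_1^{\text{female}} \approx \frac{2 (0.86)(0.87)}{0.86 + 0.87} \approx 0.865
\]
```

The second line plugs in the rounded precision and recall values reported above, so the results are approximate; they are close to the reported F1 scores of 85% and 86%.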