ABSTRACT


METHODOLOGY

Deep Learning (DL) and Machine Learning (ML) approaches have emerged as an
innovative paradigm in facial analysis for age and gender identification. This
methodology uses a simple machine learning model, a Support Vector Machine (SVM),
and a powerful deep learning model, a Convolutional Neural Network (CNN) built
from scratch, to make accurate predictions by decoding facial patterns.

Figure 1 Methodology Framework

4.1 DATA ANALYSIS


The UTKFace dataset is a large set of RGB facial images of people ranging in age
from newborns to 116 years old. It has more than 20k photos with accompanying
information on age, gender, and ethnicity.
Figure 2 UTK Face Dataset

These photos cover a range of lighting conditions, partial occlusions, viewing
angles, facial expressions, and resolutions. The dataset can be used for a
variety of applications, including face detection, age estimation, ageing process
prediction, facial landmark identification, and more.

We used this dataset for age and gender prediction with deep learning and
machine learning techniques. The dataset originally contains 23,706 face images
annotated with age, gender, and ethnicity. Only the age and gender annotations are
needed to train the proposed models, so the ethnicity column was removed. The age
column was then grouped into distinct ranges to make age prediction more
reliable.

4.2 AGE IDENTIFICATION USING CNN


Predicting an individual's exact age by regression is difficult, because facial
characteristics differ from person to person, even among people of the same age.
Predicting an age range is therefore more suitable. We treated the problem as a
classification task over age categories derived from the UTKFace dataset, with
the ages grouped as follows: (0-3), (3-18), (18-35), (35-60), (60-116).
Figure 3 Visualization of age in UTK Face dataset

Figure 4 Age Categories
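The grouping above can be sketched as a small lookup. This is a minimal illustration, not the project's code. The listed ranges share their end points, so the sketch assumes that each upper bound is inclusive (age 3 falls in 0-3, age 18 in 3-18). The `[age]_[gender]_[race]_[date&time].jpg` file naming is the convention published with UTKFace; the function and constant names are hypothetical.

```python
import bisect

# Upper bounds of the first four groups; the last group runs up to 116.
# Assumption: upper bounds are inclusive, since the listed ranges share end points.
AGE_UPPER_BOUNDS = [3, 18, 35, 60]
AGE_GROUPS = ["0-3", "3-18", "18-35", "35-60", "60-116"]


def age_to_group_index(age: int) -> int:
    """Return the index (0-4) of the age group that contains `age`."""
    return bisect.bisect_left(AGE_UPPER_BOUNDS, age)


def parse_utkface_name(filename: str) -> tuple[int, int]:
    """Extract (age, gender) from a UTKFace file name; ethnicity is ignored."""
    age, gender = filename.split("_")[:2]
    return int(age), int(gender)


age, gender = parse_utkface_name("25_0_2_20170116174525125.jpg")
print(age, gender, AGE_GROUPS[age_to_group_index(age)])
```

Turning the labels into five class indices in this way is what allows the age model to end in a five-way SoftMax layer.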

The proposed CNN model contains 15 layers, of which 4 are convolutional, and uses
a SoftMax activation function in the final layer to classify the age groups.
After training, the model is tested on unseen data to check whether its
predictions are correct.

4.3 GENDER IDENTIFICATION USING CNN


Gender identification was treated as a classification task and tackled with a
CNN built from scratch. The network has 15 layers in total, of which 4 are
convolutional, and the final output layer uses the SoftMax activation function
with 2 neurons to represent the male and female classes. After training on the
training set, the model predicts the gender of the images in the test set.
Figure 5 Visualization of gender in UTK Face dataset

4.4 GENDER IDENTIFICATION USING SVM


The proposed method aims to determine the most effective machine learning
technique for classifying gender from facial images in the UTKFace dataset. The
process has several key stages. Pre-processing normalizes the images against
brightness variations, scales them to the same size, and detects faces using the
Viola-Jones technique. The Histogram of Oriented Gradients (HOG) feature
extraction technique is then applied to extract descriptive features from the
preprocessed images. These features are processed using Principal Component
Analysis (PCA) to reduce their dimensionality, improving computational efficiency
and classification performance. Finally, a Support Vector Machine (SVM)
classifier is trained and tested (80% training, 20% test split) to classify
gender (male or female). This method combines established feature extraction and
machine learning techniques to classify gender accurately from facial features.
Chapter 5

DETAILED DESIGN AND ARCHITECTURE

5.1 SYSTEM ARCHITECTURE


The system architecture combines deep learning and machine learning models
trained on the UTKFace dataset to predict the age and gender of individuals
from their facial attributes. These models are Convolutional Neural Networks
(CNNs) built from scratch for age and gender identification and a Support
Vector Machine (SVM) for gender identification. The deep learning models are
then integrated into an application that estimates age and gender in real time
from facial photos, within a framework designed for efficiency, scalability,
and flexibility.

At the center of our approach is an adaptive framework designed so that the
different components and modules of the software integrate and interact well.
Because each component has a distinct function, the prediction procedure runs
smoothly from start to finish.

The cornerstone of our software is the use of CNNs for age and gender
prediction. These neural networks are developed from scratch, which allows the
architecture to be customized for extracting distinctive features from facial
images. Using CNNs lets us capture the intricate details and variations present
in facial images, supporting high accuracy even in real-time scenarios.

Furthermore, real-time processing is achieved through robust data processing
pipelines, optimized model prediction strategies, and an integrated interface
between system components. The architecture places the from-scratch CNNs within
a flexible software design tailored to accurate prediction on real-time facial
images. The aim is to deliver precise, reliable, and effective predictions
suited to a wide range of real-world applications.
5.1.1 Architecture Design
Following is the system architecture design of the software application:

Figure 6 System Architecture

5.1.2 Subsystem Architecture

The subsystem architecture of our software application consists of four
modules: dataset preparation, model training and evaluation, prediction
processing, and integration of the model with the software interface. Together
these modules support accurate prediction on real-time facial images.

 Dataset Preparation Module

This module is the core of the subsystem architecture and contains the dataset
used for model training and validation. It covers data acquisition, annotation,
and splitting to ensure the quality and integrity of the dataset. The UTKFace
dataset is a large set of facial images of people of both genders (female and
male), ranging in age from newborns to 116 years old. It has 23,706 photos with
accompanying information on age and gender. This module works closely with data
preprocessing, as the images are resized before the models are trained. It
ensures that the training data is properly formatted and annotated for reliable
age and gender predictions, and it streamlines the data processing pipeline.

 Model Training Module

This module is responsible for model training and validation using
Convolutional Neural Networks (CNNs). The software uses a customized CNN model
to extract the facial features that are essential for estimating age and
gender. Training involves iteratively refining the model's design choices and
hyperparameters, such as the number of convolutional layers, kernel filters,
activation functions, batch normalization, regularization, and dropout. These
are tuned to maximize the performance and generalizability of the model.

 Prediction Module

Once the models are trained, they are used to predict age and gender from
unseen facial images of the testing dataset. This module runs on the backend
server of the application and returns results for the real-time images
submitted by users. When a new image is received from the user, either captured
or imported, the module first uses a pretrained face detection model (MediaPipe
or YOLO) to locate the facial region, and then uses the CNN model and its
weights to predict the age and gender of the individual. The predicted age and
gender labels are displayed to the user, and the information is saved to the
backend database for further analysis.
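
The last step of this flow, turning raw model outputs into the labels shown to the user, can be sketched as follows. It assumes the age model returns five SoftMax probabilities and the gender model a single sigmoid value, with 'male' as 0 and 'female' as 1 as stated in Section 6.1.1. The 0.5 threshold and all names are illustrative assumptions, not taken from the project.

```python
AGE_GROUPS = ["0-3", "3-18", "18-35", "35-60", "60-116"]
GENDER_LABELS = {0: "male", 1: "female"}  # encoding stated in Section 6.1.1


def decode_prediction(age_probs, gender_prob, threshold=0.5):
    """Map raw model outputs to the labels displayed to the user.

    age_probs   -- five SoftMax probabilities, one per age group
    gender_prob -- sigmoid output, interpreted as P(female)
    threshold   -- assumed decision threshold (not given in the source)
    """
    if len(age_probs) != len(AGE_GROUPS):
        raise ValueError("expected one probability per age group")
    # Index of the most probable age group (argmax).
    age_index = max(range(len(age_probs)), key=lambda i: age_probs[i])
    gender_index = 1 if gender_prob >= threshold else 0
    return AGE_GROUPS[age_index], GENDER_LABELS[gender_index]


print(decode_prediction([0.05, 0.10, 0.60, 0.20, 0.05], 0.83))
```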

 Software Interface Module

It is an integration module that handles communication between the prediction
module and the user interface, allowing users to interact with the software and
get predictions in real time. It consists of the different pages of the
application, such as the sign-up page, login page, welcome page, home page, and
results page.

When a user starts the application and uploads an image after signing in, this
component captures the image data and passes it to the prediction module. After
prediction, the results are displayed on the user interface and saved to the
database. By integrating the CNN model with the software interface, the
subsystem architecture lets users interact easily with the prediction module,
which improves the application's usability and efficiency. The interface is
user friendly: it makes it easy to input facial images, receive predictions,
and save the findings, which improves the overall user experience.

Figure 7 Modules of Subsystem Architecture

5.2 DETAILED SYSTEM DESIGN


The functional requirements, non-functional requirements, system
architecture, user interface design, and technical requirements are all covered
in the detailed system design, which is based on the software specifications
that have been provided. Here's a thorough explanation:
5.1.3 Sequence diagram

Figure 8 Sequence diagram


5.1.4 Activity diagram

Figure 9 Activity diagram


5.1.5 Functional requirements

The following functional criteria apply to an app that uses facial analysis to
identify age and gender:

Use-Case 1: Sign Up

Table 1 Use-Case 1

Identifier       UC1
Purpose          Signup of users depending on Username and Password.
Priority         High
Pre-conditions   User must have email to sign up
Post-conditions  User must access his email and get 6 chars verification code
                 to verify his email.

Typical Course of Action

S#  Actor Action                            System Response
1   Select the button of the                Open the webpage specialized for
    registration.                           completing sign up process.
2   Fill in the registration form and       Register user entered data into
    press sign up button.                   database. Send message via user's
                                            registered email with email
                                            verification code.
3   Log in to the system with username      Display Home page with welcome
    and password after verification.        message to the registered user.

Use-Case 2: Sign In

Table 2 Use-Case 2

Identifier       UC2
Purpose          Signing users depending on Username and Password.
Priority         High
Pre-conditions   User must have been registered earlier on our system.
Post-conditions  User had verified his email.

Typical Course of Action

S#  Actor Action                            System Response
1   Sign In users using Username and        The system successfully redirects
    Password by pressing on Log in          the user to the Home Page with a
    Button.                                 welcome message.
2   Press Sign In

Use-Case 3: Forgot Password and reset password

Table 3 Use-Case 3

Identifier       UC3
Purpose          To configure old password and reset a new password
Priority         High
Pre-conditions   User registered in the application
Post-conditions  Fill the input requirements

Typical Course of Action

S#  Actor Action                            System Response
1   Click on Forgot password button         Sends email verification and pin
                                            code
2   Click on reset password                 Navigate to reset password
                                            configuration

Use-Case 4: Import Image

Table 4 Use-Case 4

Identifier       UC4
Purpose          Allow user to import image for age and gender prediction
Priority         High
Pre-conditions   User on the image upload face
Post-conditions  Face detection of imported image

Typical Course of Action

S#  Actor Action                            System Response
1   Select the option to import image       Opens gallery for image selection
2   Select a single face image              Displays the chosen image for
                                            analysis
3   System processes the image              Detects the face and proceeds with
                                            age and gender prediction

Use-Case 5: Capture Image

Table 5 Use-Case 5

Identifier       UC5
Purpose          Allow user to capture image for age and gender prediction
Priority         High
Pre-conditions   User on the image captured face
Post-conditions  Face detection of captured image

Typical Course of Action

S#  Actor Action                            System Response
1   Select the option to capture image      Opens camera for image capture
2   Capture a single face image             Displays the captured image for
                                            analysis
3   System processes the image              Detects the face and proceeds with
                                            age and gender prediction

Use-Case 6: Face Detection

Table 6 Use-Case 6

Identifier       UC6
Purpose          Face detection of imported/captured images for age and
                 gender prediction
Priority         High
Pre-conditions   Selection of imported/captured images
Post-conditions  Face detection of uploaded image for age and gender
                 prediction

Typical Course of Action

S#  Actor Action                            System Response
1   Image selected or captured              Analyze face detection of images
2   Face detected                           Proceed with age and gender
                                            prediction

Use-Case 7: Age and gender prediction

Table 7 Use-Case 7

Identifier       UC7
Purpose          Age and gender prediction after face detection of images
Priority         High
Pre-conditions   Face detection of imported/captured images
Post-conditions  Results saved in a database

Typical Course of Action

S#  Actor Action                            System Response
1   Prediction of age and gender            Saves result in database
    completed
2   Results saved in database               Display confirmation message

Use-Case 8: Share analysis results

Table 8 Use-Case 8

Identifier       UC8
Purpose          Allow user to share and export analysis results
Priority         Medium
Pre-conditions   Analysis completed and results displayed and saved
Post-conditions  Analysis results shared or exported

Typical Course of Action

S#  Actor Action                            System Response
1   Click on share/export button            Provides options for sharing
                                            (social media, email) or exporting
                                            (PDF, CSV)
2   Select desired sharing/exporting        Executes the chosen action
    method
3   Confirmation message or shareable       Indicates successful
    link generated                          sharing/exporting of analysis
                                            results

5.1.6 Use-case diagram

Figure 10 Use-case diagram

5.1.7 Constraints
1) The application accepts only one image at a time; users cannot upload
multiple facial images of different individuals at once.

2) Batch processing of images to get age and gender predictions in a single
request is not supported; only single-image analysis is allowed.

3) The granularity of age prediction is limited, since the prediction module
predicts the age category of an individual, not the exact age.

4) The model may be less accurate for individuals from Asian countries such as
China and Korea, because the dataset used to train the model is not diverse
enough for it to learn the finer distinctions between their facial features.

5.1.8 Composition
The application is composed of the following modules:

 Software interface module

Users interact with the application through its user interface, developed in
React Native. Users sign up to register their account and then upload or
capture an image for further processing.

 Prediction Module

After face detection of the uploaded/captured image, the image will be passed
to prediction module where predictions are made using the trained CNN
model and its weights.

 Model Training

The model that will be used at the backend is trained on the popular UTK
Face Dataset and evaluated using the testing dataset. After the evaluation, it is
used for the estimation of age and gender of uploaded images by the user.

 Dataset module

The dataset module contains the dataset that is used for the training, validation
and testing of the model and to improve the model’s performance.

5.1.9 Non-functional requirements

Non-functional requirements go beyond the software's immediate functionality
and concentrate on qualities such as performance, security, usability, and
maintainability. The non-functional requirements for the application that uses
facial features to identify age and gender are as follows:

 Performance

When a picture is uploaded, the software will respond almost instantly,
completing the analysis in a matter of seconds. It will be able to handle
several simultaneous requests, accommodating higher user loads without
sacrificing performance. Memory and computational resources will be optimized
for smooth operation across a range of device configurations.

 Security

Strong safeguards will be put in place to protect user information and to
ensure compliance with data protection laws (GDPR, CCPA, etc.). Authentication
and authorization methods will be used to limit access to important functions
and data.

 Usability

We will create an intuitive user interface that ensures a smooth experience for
users with different levels of technical expertise. During picture uploads and
result presentations, users will be given concise, helpful feedback, along with
error messages for unsuccessful analyses.

 Maintainability

We will create the application with a modular framework that makes upgrades
and future improvements simpler. To keep track of modifications and oversee
the development of the program, we will use version control systems.

 Compliance

We will make sure that all legal and industry standards for image processing,
facial recognition, and data protection are followed. User consent and privacy
will be respected by incorporating ethical principles and considerations into
the app's design and usage.

5.1.10 User interface design


5.1.11 Technical requirements

 Software Requirements

1) Operating System: The system should be compatible with major operating
systems such as Windows, Linux, or macOS.

2) Programming Languages:

3) Python: Required for implementing machine learning and deep learning
algorithms.

4) React Native: Required for developing the frontend of application.

5) Django: Required for integrating the models with frontend of application
using APIs.

6) Libraries: TensorFlow, Keras, scikit-learn, skimage, OpenCV for machine
learning and computer vision tasks, MediaPipe for face detection.

 App Development Framework


If a user interface is required, frameworks like React or Next.js can be used.

1) React Native for front-end development

2) Django for backend development

3) Data Preprocessing Tools:

4) Pandas and NumPy for data cleaning, manipulation, and transformation.

5) scikit-image for image processing tasks.

6) MediaPipe for face detection tasks.

 Machine Learning and Deep Learning Frameworks

1) TensorFlow and Keras: Used for building, training, and evaluating CNN models
for age and gender prediction.

2) scikit-learn: Utilized for implementing SVM classifiers and additional
machine learning functionalities.

 Image Processing Libraries

1) OpenCV: Essential for image preprocessing, facial detection, and feature
extraction.

2) scikit-image: Used for additional image processing tasks.


 Development tools

1) Popular IDEs such as Visual Studio Code, PyCharm, or Jupyter Notebooks for
code development.

5.1.12 Interface/Exports
1) The software interface has been developed with React Native Expo.

2) The Django REST APIs are used to integrate the model with the frontend.

3) Necessary libraries such as TensorFlow, Keras, MediaPipe, and react-native
have been imported.
5.3 CLASS DIAGRAM

Figure 11 Class Diagram


Chapter 6

IMPLEMENTATION AND TESTING

6.1 DEEP LEARNING MODELS

6.1.1 GENDER IDENTIFICATION

The customized Convolutional Neural Network (CNN) is designed for gender
classification using RGB facial images with a resolution of 100 x 100 pixels. It
starts with a 2D convolutional layer with 16 kernel filters and the hyperbolic
tangent (tanh) activation function, combined with L2 regularization to reduce
overfitting. Batch normalization follows to stabilize and speed up training, along
with a max-pooling layer that downsizes the spatial dimensions. The subsequent 2D
convolutional layers further refine feature extraction: the second layer uses 32
filters with Rectified Linear Unit (ReLU) activation, followed by batch
normalization and max pooling, and this block is repeated in a third convolutional
layer. A fourth convolutional layer introduces 64 filters, followed by max pooling
to reduce dimensionality. The remaining layers are a fully connected layer with 128
units and ReLU activation, followed by dropout regularization at a rate of 20% to
mitigate overfitting. The final layer uses sigmoid activation for binary gender
classification ('male' as 0, 'female' as 1).
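
The layer sequence described above can be written down as a Keras sketch. This is an illustration under assumptions, not the authors' code: the 3 x 3 kernel size, default ('valid') padding, the L2 factor of 0.01, and ReLU on the fourth convolutional layer are not stated for this model and are guesses. The dropout rate is left as a parameter because the text gives 20% while Table 9 lists 0.1.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers


def build_gender_cnn(dropout_rate=0.2, l2_factor=0.01, kernel_size=(3, 3)):
    """Sketch of the 15-layer gender CNN.

    Assumptions: kernel size, padding, L2 factor and the activation of the
    fourth convolutional layer are not given in the text.
    """
    return tf.keras.Sequential([
        tf.keras.Input(shape=(100, 100, 3)),            # 100 x 100 RGB image
        layers.Conv2D(16, kernel_size, activation="tanh",
                      kernel_regularizer=regularizers.l2(l2_factor)),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, kernel_size, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, kernel_size, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, kernel_size, activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation="sigmoid"),          # P(female)
    ])


model = build_gender_cnn()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
print(len(model.layers), model.output_shape)
```

Counting the layers in this sketch (four convolutions, three batch normalizations, four poolings, flatten, two dense layers, and dropout) gives the 15 layers mentioned in the methodology.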

During model training, the Adam optimizer is employed with binary cross-entropy as
the loss function and accuracy as the evaluation metric. This architecture has proven
effective for gender classification on facial images, which are supplied in RGB
format at 100 x 100 pixels.

In the training phase, different batch sizes (256, 128, 64) were explored to find the
best configuration. Extensive experimentation identified a batch size of 256 as
giving the best results, balancing convergence speed and reliable gradient
estimation. The model's performance was assessed using both the Adam and stochastic
gradient descent (SGD) optimizers across various learning rates. Adam consistently
gave better results, highlighting the importance of optimizer and learning rate
selection.

Iterative hyperparameter adjustments played a vital role in optimizing the model.
Parameters such as the learning rate, early stopping patience, and minimum delta
were fine-tuned to increase accuracy over successive training runs. An independent
test set was used to measure the model's generalization capabilities, with test loss
and accuracy serving as the key metrics to validate performance on unseen data.
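
To make the roles of patience and minimum delta concrete, the sketch below mimics the usual early-stopping rule on a list of validation losses. It is an illustration only: the loss values, the patience of 3, and the minimum delta of 0.001 are hypothetical, since the tuned values are not given in the text.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.001):
    """Return the 0-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if the validation loss drops by
    more than `min_delta` below the best value seen so far. Training stops
    after `patience` consecutive epochs without such an improvement.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses: improvement stalls after the third epoch.
losses = [0.70, 0.55, 0.50, 0.4995, 0.51, 0.52, 0.50]
print(early_stopping_epoch(losses, patience=3, min_delta=0.001))
```

A larger patience tolerates longer plateaus before stopping, while a larger minimum delta treats small fluctuations in the loss as no improvement.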

Finally, the model's practical use was shown by predicting the gender categories
('male' or 'female') of selected images from the test set, demonstrating its
effectiveness in real-world gender identification. This comprehensive method
emphasizes the model's robustness and reliability in classifying facial images for
gender identification.

Figure 12 Block diagram of CNN for gender identification


A single neuron with a 'sigmoid' activation function acts as the output layer, as
suits binary classification tasks like gender prediction. This architecture is
designed to learn hierarchical features from input images efficiently and to predict
gender from the learned representations. The Adam optimizer is used for training, and
additional methods such as early stopping and learning rate reduction are employed to
improve model performance and avoid overfitting.

Table 9 Parameters used in CNN for gender identification

Parameters Used Value


Batch size 256
Convolutional layers 4
Activation function Tanh, Relu, Sigmoid
Loss function Binary cross entropy
Optimizers Adam
Total number of parameters 160289
Number of trainable parameters 160129
Learning rate 0.001
Dropout 0.1
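
As a consistency check on Table 9, the gap between total and trainable parameters is 160289 - 160129 = 160. In Keras, a batch normalization layer keeps two non-trainable values per channel (moving mean and moving variance), so the three batch normalization layers that follow the 16-, 32- and 32-filter convolutions account for exactly this gap. The sketch below only redoes that arithmetic; it does not try to reproduce the full parameter count, which depends on kernel sizes that the text does not state.

```python
# Values copied from Table 9.
TOTAL_PARAMS = 160_289
TRAINABLE_PARAMS = 160_129

# Filter counts of the convolutional layers that are followed by batch
# normalization in the text (the 64-filter layer has none).
BATCH_NORM_CHANNELS = [16, 32, 32]


def batch_norm_non_trainable(channels: int) -> int:
    """Moving mean + moving variance: two non-trainable values per channel."""
    return 2 * channels


non_trainable = sum(batch_norm_non_trainable(c) for c in BATCH_NORM_CHANNELS)
print(non_trainable, TOTAL_PARAMS - TRAINABLE_PARAMS)
```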

6.1.2 AGE IDENTIFICATION

The Convolutional Neural Network (CNN) developed for age prediction categorizes
facial photos into distinct age groups. The model contains four 2D convolutional
layers whose kernel filters gradually increase from 16 to 128. Each convolutional
layer is followed by batch normalization to stabilize and speed up training, and uses
activation functions such as the rectified linear unit (ReLU) and hyperbolic tangent
(tanh) to learn complex patterns hierarchically from the input images. Max-pooling
layers with a pool size of (2,2) are placed after each convolutional layer to scale
down the feature maps and improve computational performance.

The extracted facial features are then flattened so that they can be processed by the
fully connected (dense) layers that follow the convolutional layers. The model has 2
dense layers with 256 and 128 neurons respectively, each followed by dropout at a rate
of 0.1. Dropout helps to avoid overfitting by randomly deactivating a subset of
neurons during training. The output layer consists of 5 neurons with the SoftMax
activation function, which yields a probability for each age group.

The Adam optimizer is used during training with categorical cross-entropy as the
loss function, which is suited to multi-class classification tasks such as age group
prediction. Accuracy, the percentage of correctly predicted age groups, is used as
the performance metric.
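
For readers unfamiliar with these quantities, the sketch below computes SoftMax probabilities, the categorical cross-entropy of one sample, and accuracy over a few samples in plain Python. The scores and labels are hypothetical and serve only to show how the loss and the metric are defined.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_index):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(probs[true_index])


def accuracy(predicted, actual):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical scores for the five age groups; the true class is index 2 (18-35).
probs = softmax([0.2, 1.0, 3.0, 0.5, -1.0])
print(round(categorical_cross_entropy(probs, 2), 3))
print(accuracy([2, 1, 4, 0], [2, 1, 3, 0]))
```

The loss shrinks towards zero as the probability assigned to the true age group approaches 1, which is what the optimizer drives during training.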

Moreover, L2 regularization with a coefficient of 0.01 is applied to the 2D
convolutional layers to penalize large weight values and avoid overfitting.

This CNN framework trains the model to pick out visual attributes and distinguish
between the different age categories from input facial images. The careful
combination of convolutional, batch normalization, dropout, and dense layers, along
with suitable activation functions and optimization techniques, underpins the model's
effectiveness and potential in age prediction using facial analysis.
Figure 13 Block diagram of CNN for age identification

The output layer consists of five neurons with the SoftMax activation function and
provides the age group predictions, optimized with the categorical cross-entropy loss
function. The model is trained with the Adam optimizer, which adapts the learning
rate for each parameter as training proceeds. Performance metrics are used to
determine how well the model categorizes age groups from facial images.
Regularization techniques prevent overfitting and, together with suitable activation
functions and optimizers, improve the model's performance in predicting age
categories from input facial images.

Table 10 Parameters used in CNN for age identification

Parameters Used Value


Batch size 256
Convolutional layers 4
Activation function Tanh, Relu
Loss function Categorical cross entropy
Optimizers Adam
Total number of parameters 688485
Number of trainable parameters 688261
Learning rate 0.001
Dropout 0.1
6.1.3 TESTING, TRAINING AND VALIDATION

The dataset is divided so that 70% is used to train the model for a set number of
epochs with a given batch size. Several callbacks are applied during training, such
as EarlyStopping to reduce overfitting and ReduceLROnPlateau to adjust the learning
rate dynamically. A further 20% of the dataset is used for validation to guide
fine-tuning of the trained model.

The remaining 10% of the dataset forms a separate testing subset, dedicated to
evaluating the model's performance on previously unseen data. The trained model is
applied to this subset, and its predictions are compared with the actual labels. This
testing process provides insight into the generalization capabilities and robustness
of the model and its effectiveness in real-world scenarios.
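
A 70/20/10 split of this kind can be sketched with the standard library alone. The shuffling seed and the function name are illustrative assumptions; the source does not say how the split was drawn (for example, whether it was stratified by class).

```python
import random


def split_indices(n_samples, train_frac=0.7, val_frac=0.2, seed=42):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)    # fixed seed keeps the split reproducible
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]        # remainder, roughly 10%
    return train, val, test


train_idx, val_idx, test_idx = split_indices(23706)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the test indices disjoint from the other two subsets is what makes the reported test accuracy a measure of performance on unseen data.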

6.2 MACHINE LEARNING MODEL


The goal is to find the most effective technique for identifying a person's gender
from facial images using machine learning. The standard procedure is to read the
images from the image database and then extract the facial region in accordance with
the type of dataset.

6.2.1 PREPROCESSING

We used the UTKFace dataset, and each image is read from this dataset.
Pre-processing is necessary for all the photos in the image dataset, including
normalization against brightness variations, scaling, and noise removal. The facial
area is identified by applying the Viola-Jones technique to each image, and the
detected faces are then scaled to a fixed size of 48x48 pixels.

6.2.2 FEATURE EXTRACTION

Extracting usable features from face photos is a crucial step in successful gender
classification. Histogram of Oriented Gradients (HOG) features and Gabor filter
features are combined as input for gender classification.
 HISTOGRAM ORIENTED GRADIENT (HOG)
Histogram of Oriented Gradients divides an image into smaller cells and computes a
histogram of gradient orientations for each cell. The histograms are then combined
into a single feature vector that describes the overall structure of the image.
The HOG features are extracted using “skimage”.
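
The idea of per-cell orientation histograms can be illustrated with a simplified NumPy sketch. This is not the skimage implementation used in the project: it omits block normalization, and the cell size of 8 pixels and 9 orientation bins are assumed values chosen for illustration.

```python
import numpy as np


def cell_orientation_histograms(image, cell_size=8, n_bins=9):
    """Simplified HOG: one unsigned-gradient orientation histogram per cell.

    Block normalization, which full HOG implementations apply, is omitted.
    """
    image = image.astype(float)
    gy, gx = np.gradient(image)                 # gradients along rows and columns
    magnitude = np.hypot(gx, gy)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned angle
    bin_index = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)

    n_rows, n_cols = image.shape[0] // cell_size, image.shape[1] // cell_size
    histograms = np.zeros((n_rows, n_cols, n_bins))
    for r in range(n_rows):
        for c in range(n_cols):
            rows = slice(r * cell_size, (r + 1) * cell_size)
            cols = slice(c * cell_size, (c + 1) * cell_size)
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            histograms[r, c] = np.bincount(
                bin_index[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=n_bins,
            )
    return histograms.ravel()                   # one feature vector per image


face = np.random.default_rng(0).random((48, 48))  # stand-in for a 48x48 grayscale face
print(cell_orientation_histograms(face).shape)
```

With these assumed settings a 48x48 image yields 6 x 6 cells of 9 bins each, so the feature vector has 324 entries.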
 GABOR FILTERS
Gabor filters, named after physicist Dennis Gabor, are linear filters commonly used in
image processing and computer vision. They are particularly effective for texture
analysis and edge detection. These features are extracted using “OpenCV”.
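
A Gabor kernel is a Gaussian envelope multiplied by a sinusoidal wave at a chosen orientation. The sketch below builds the real part of such a kernel in NumPy rather than OpenCV (which offers `cv2.getGaborKernel` for the same purpose), so that it runs without extra dependencies. All parameter values are illustrative; the source does not list the ones used.

```python
import numpy as np


def gabor_kernel(size=9, sigma=2.0, theta=0.0, wavelength=4.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter: Gaussian envelope times a cosine wave.

    All default parameter values are assumptions made for illustration.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + psi)
    return envelope * carrier


# A small bank of four orientations, a common choice for texture features.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
print(len(bank), bank[0].shape)
```

Convolving a face image with each kernel in the bank and summarizing the responses gives orientation-specific texture features that can be placed alongside the HOG vector.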

Combining features means merging different attributes of a data set into one feature
representation. We combined the features by concatenation, that is, by joining the
separate feature vectors into a single vector, which results in a higher-dimensional
feature representation.

We then split the dataset into training and testing sets to evaluate the model's
performance: 80% of the data is used for training the model and 20% for evaluating
it.

6.2.3 PRINCIPAL COMPONENT ANALYSIS (PCA)

We then reduced the dimensions of the feature vector using PCA. PCA is a statistical
technique used to decrease the dimensionality of data while retaining most of its
information. Its primary goal is to identify a new set of orthogonal axes, known as
principal components, that capture the most significant variation in the data.

6.2.4 SUPPORT VECTOR MACHINE (SVM)

Gender recognition is achieved in this study using an SVM classifier. In this step
the SVM classifier is trained so that it can differentiate between the two classes,
female and male.
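
The steps of this section (feature concatenation, the 80/20 split, PCA, and SVM training) can be chained in a short scikit-learn sketch. Random arrays stand in for the real HOG and Gabor descriptors, so the printed accuracy says nothing about the project's results. The feature sizes, the 20 principal components, and the RBF kernel are assumptions, since the source does not state them.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic stand-ins for the real descriptors (shapes are hypothetical).
labels = rng.integers(0, 2, size=n_samples)             # 0 = male, 1 = female
hog_features = rng.normal(size=(n_samples, 324)) + labels[:, None] * 0.5
gabor_features = rng.normal(size=(n_samples, 64)) + labels[:, None] * 0.5

# Combine the two descriptors by concatenation: one feature vector per image.
features = np.hstack([hog_features, gabor_features])

# 80% training, 20% testing, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0, stratify=labels)

# PCA is fitted on the training split only, then the SVM is trained on the
# reduced features. n_components and the RBF kernel are assumptions.
classifier = make_pipeline(PCA(n_components=20), SVC(kernel="rbf"))
classifier.fit(X_train, y_train)
print(f"test accuracy: {classifier.score(X_test, y_test):.2f}")
```

Wrapping PCA and the SVM in one pipeline keeps the projection learned from the training data separate from the test data, which avoids leaking test information into the model.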
Figure 14 Flowchart of gender identification using SVM
Chapter 7

RESULTS AND DISCUSSION

7.1 PREPROCESSING AND PARAMETERS SETTING

7.1.1 GENDER IDENTIFICATION

Images are scaled to 100 x 100 x 3 (RGB). The output dense layer has one unit
with a ‘sigmoid’ activation function, whose output indicates either the ‘male’ or
the ‘female’ class. As this is a binary classification problem, the loss function
used here is binary cross entropy. Optimization is done using the Adam optimizer
with a learning rate of 0.0001. The final model consists of 688261 trainable
parameters.

7.1.2 AGE IDENTIFICATION

Similarly, the images are first resized to 100 x 100 x 3 in RGB format. The images
are then passed to the model, which classifies the age category. As this is a
multi-class classification model, the loss function used is categorical
cross-entropy; the optimizer and learning rate are the same as for gender
identification. This model also contains 688,261 trainable parameters.
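
For the multi-class case, the softmax output and the categorical cross-entropy
can be sketched as follows. The five categories and the logit values are arbitrary
choices for illustration and do not reflect the actual number of age groups or any
real model output.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

logits = [2.0, 0.5, -1.0, 0.0, 0.1]  # hypothetical scores for 5 age categories
probs = softmax(logits)
target = [1, 0, 0, 0, 0]             # true category is the first one
loss = categorical_cross_entropy(target, probs)
print(probs, loss)
```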

7.2 EXPERIMENTAL RESULTS

7.2.1 GENDER IDENTIFICATION USING CNN

The CNN model described here is designed for gender identification from RGB
images of size 100 x 100 pixels. Converting the images to RGB is a deliberate
choice intended to retain the image features essential for gender classification.

To optimize the model's performance, a comprehensive experimental approach was
employed. The model was evaluated systematically across a range of batch sizes
(512, 256, 128, 64) to identify the batch size that best balances training
efficiency and computational resources. These experiments showed that a batch
size of 256 achieved the best balance of performance and resource efficiency for
the CNN model.

In addition to the batch size exploration, many combinations of hyperparameters
were evaluated to improve the model. This involved applying different optimizers,
such as Adam and SGD, with several learning rates, as well as adjusting the
patience value for early stopping and the minimum delta threshold for stopping the
training process. These hyperparameters play vital roles in controlling the
model's convergence speed, stability, and generalization.
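
The interaction between patience and minimum delta can be sketched in plain
Python. The function name early_stopping_epoch and the loss values are
hypothetical; the logic is similar in spirit to the early-stopping callbacks found
in common deep learning libraries, but it is not the study's actual code.

```python
def early_stopping_epoch(val_losses, patience, min_delta):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best - loss > min_delta:
            # Improvement larger than min_delta: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, one per epoch.
losses = [0.70, 0.55, 0.50, 0.499, 0.498, 0.497, 0.45]
stop = early_stopping_epoch(losses, patience=3, min_delta=0.01)
print(stop)  # 6
```

With a patience of 3, training stops at epoch 6 and misses the later improvement
at epoch 7, while a patience of 5 would let training continue. This illustrates
why the patience value affects both training time and the final model.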

The hyperparameter values were refined iteratively to improve the model's
performance and convergence. This process provided important insights into the
model's behavior under different training conditions for gender prediction.

Additionally, the generalizability of the trained model was assessed by applying
it to data that was not included in the training or validation sets. Test loss and
accuracy were computed on this independent test set to measure how well the model
generalized, providing insight into its real-world performance and usability.

As an example, a specific image from the test set was used to demonstrate the
model's gender prediction ('male' represented as 0 and 'female' represented as 1).
Image preprocessing was applied to prepare the test image for model input, and the
predicted gender was taken as the class with the highest output probability.
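
Selecting the class with the highest probability can be sketched as below. The
label encoding follows the text (0 for male, 1 for female); the helper name
decode_gender and the probability values are hypothetical, and the sketch assumes
the model returns one probability per class.

```python
LABELS = {0: "male", 1: "female"}  # encoding stated in the text

def decode_gender(probabilities):
    """Map per-class probabilities to the most likely label and its probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABELS[best_index], probabilities[best_index]

label, confidence = decode_gender([0.27, 0.73])  # hypothetical model output
print(label, confidence)  # female 0.73
```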

Overall, this comprehensive experimental approach, combined with iterative
hyperparameter tuning and thorough evaluation, validates the CNN model's efficacy
in gender identification. The results provide insights into optimizing CNN
architectures for similar image classification tasks and highlight the model's
effectiveness in real-world applications that require precise gender prediction
from RGB images.

Table 11 Comparison of the effect of different learning rates on the training dataset

Learning rate    Gender Accuracy
0.0001           89.8%
0.001            91.46%
0.01             91.8%
0.1              93.19%
1.0              92.72%

Table 12 Comparison of the effect of different batch sizes on the testing dataset

Batch size    Gender Accuracy
512           88.16%
256           88.2%
128           87.19%
64            86.64%

Figure 15 Test Accuracy of gender identification CNN model

Figure 16 Test Loss of gender identification CNN model

7.2.2 AGE IDENTIFICATION USING CNN

This study takes a systematic approach to optimizing the hyperparameters of the
age identification model. The model's robustness was assessed across the batch
sizes listed in Table 14 while exploring many combinations of learning rates,
patience values for early stopping, and minimum delta thresholds for stopping the
training. Each configuration was trained on a designated training set and
validated on a separate validation set to gauge its effect on the performance
metrics.

Accuracy and loss were monitored closely throughout this iterative process to
understand how changes in the hyperparameters affected the model's learning
dynamics and convergence. Systematically varying these parameters provided
valuable insights into the model's behavior under different training conditions.

Additionally, the generalizability of the trained model was assessed by applying
it to unseen data that was not part of the training or validation sets. Test loss
and accuracy were used to assess the model's performance on this unseen data. The
predictions generated for these unseen photos demonstrate the model's predictive
capability under real-world conditions.

Image preprocessing was applied to format the test data so that it was compatible
with the model's input requirements. This preprocessing step supports precise age
category predictions: the model outputs the age group with the highest probability
among its output categories.

Collectively, these comprehensive experiments provide a robust foundation for
understanding the model's behavior across different hyperparameter
configurations, evaluating its generalization capability, and validating its
effectiveness in age identification.

Table 13 Comparison of the effect of different learning rates on the training dataset

Learning rate    Age Accuracy
0.00001          92.75%
0.0001           94.11%
0.001            94.51%
0.01             94.7%
0.1              95.13%

Figure 17 Accuracy vs Learning Rate

Table 14 Comparison of the effect of different batch sizes on the testing dataset

Batch size    Age Accuracy
512           69.77%
256           81.81%
128           72.5%
64            73.57%

Figure 18 Test Accuracy of age identification CNN model

Figure 19 Test loss of age identification CNN model

7.2.3 PERFORMANCE METRICS

• ACCURACY

The accuracy of the Convolutional Neural Network (CNN) is 89.49% for gender
prediction and 81.81% for age prediction. Accuracy is the proportion of correctly
classified instances among all predictions made on the test set, i.e., correctly
classified gender labels and correctly categorized age instances. The high
accuracy shows that the model discriminates well between male and female subjects
and across the different age categories based on facial features.
• F1 SCORE

The F1 score for gender prediction is 0.886, which gives a balanced view of the
model's precision and recall. It is a robust metric for binary classification
tasks such as gender prediction because it accounts for both false positives and
false negatives. A high F1 score indicates a good balance between minimizing false
positives and false negatives, suggesting that the model captures the features
relevant to gender prediction in the dataset. For age prediction, the macro F1
score across all age categories is approximately 0.7607, which gives a balanced
view of the model's precision and recall over all age categories.

• ROC

The Receiver Operating Characteristic (ROC) curve, together with its Area Under
the Curve (AUC), measures the model's ability to distinguish between male and
female subjects across different threshold values. The ROC AUC is approximately
0.9658, which indicates that the model distinguishes between male and female faces
well, with a comparatively high true positive rate and a low false positive rate.
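
One way to read the AUC is as the probability that a randomly chosen positive
example receives a higher score than a randomly chosen negative one. The sketch
below computes it directly from that definition; the function name roc_auc, the
labels and the scores are hypothetical and unrelated to the study's predictions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1, 0, 1]                # 1 = female, 0 = male
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]   # hypothetical predicted probabilities
auc = roc_auc(labels, scores)
print(round(auc, 4))  # 0.8889
```

Because the value depends only on how the scores are ranked, it does not change
with the decision threshold, unlike accuracy, sensitivity, or specificity.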

• SPECIFICITY

For gender prediction, specificity is 0.907 and measures the proportion of male
subjects that are correctly identified. Specificity (the true negative rate) gives
insight into the model's performance in identifying male faces specifically. A
high specificity indicates a low rate of false positives among male subjects,
which further supports the model's capability to identify gender accurately from
facial features. For age prediction, specificity measures the true negative cases
within each age category. It is 0.7392, reflecting variation in the model's
performance across age categories. Classes with higher specificity, such as
class 0, are better at recognizing individuals who fall outside that age category.

• SENSITIVITY

For gender prediction, sensitivity is 0.8797 and measures the proportion of female
subjects that are correctly identified. Sensitivity (the true positive rate) gives
insight into the model's performance in identifying female faces specifically. A
high sensitivity indicates a low rate of false negatives among female subjects,
which further supports the model's capability to identify gender accurately from
facial features. For age prediction, sensitivity measures the true positive cases
within each age category. It is 0.7878, reflecting variation in the model's
performance across age categories.
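
The relationship between these metrics can be sketched from the four counts of a
binary confusion matrix, treating female (1) as the positive class and male (0) as
the negative class as in the text. The counts below are illustrative only and are
not the study's confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common metrics from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for 100 female and 100 male test images.
m = binary_metrics(tp=88, fp=9, tn=91, fn=12)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class problem such as age prediction, the same quantities can be
computed per class in a one-vs-rest fashion and then averaged, which is what a
macro average does.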

Table 15 Performance Metrics for age and gender prediction through CNN

Model         Accuracy    Specificity    ROC/AUC    Sensitivity    F1 Score
Age CNN       81.69%      73.92%         -          78.78%         76.07%
Gender CNN    89.49%      90.77%         96.58%     87.97%         88.69%

[Bar chart: CNN model performance for age prediction, showing Accuracy, F1 Score,
Sensitivity, and Specificity]

Figure 20 Graph of performance metrics for age identification

[Bar chart: CNN model performance for gender prediction, showing Accuracy, F1
Score, Sensitivity, Specificity, and ROC]

Figure 21 Graph of performance metrics for gender identification

7.2.4 GENDER IDENTIFICATION USING SVM

Table 16 Comparison of effects of different kernels on the testing dataset

Kernel    Accuracy
Linear    82%
Poly      85%
RBF       86%
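
A kernel comparison of this kind can be sketched with scikit-learn as below. The
synthetic data from make_classification, the random seeds and the default SVC
settings are assumptions for illustration; the resulting accuracies are unrelated
to the figures in Table 16.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for the reduced facial feature vectors.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train one classifier per kernel and record its accuracy on the test split.
results = {}
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    results[kernel] = clf.score(X_test, y_test)

for kernel, acc in results.items():
    print(f"{kernel}: {acc:.2%}")
```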

7.2.5 PERFORMANCE MEASURES FOR GENDER PREDICTION USING SVM

• Accuracy
The overall accuracy of the model is 86%, meaning that the model predicts the
gender correctly 86% of the time. This accuracy was achieved with the 'rbf' kernel
of the support vector machine.
• Precision
Precision measures the correctness of positive predictions. The model achieves a
precision of 85% for males and 86% for females. This means that when the model
predicts male or female, it is correct 85% and 86% of the time, respectively.
• Recall
Recall, or sensitivity, measures the model's ability to identify every relevant
instance. For this model, the recall is 84% for males and 87% for females, meaning
that the model correctly identifies 84% of all actual males and 87% of all actual
females.

• F1 score
The F1 score, which is the harmonic mean of precision and recall, is 85% for males
and 86% for females. These F1 scores indicate a good balance between precision and
recall and give a more complete view of the model's effectiveness in predicting
both genders.

Class         Accuracy    Precision    Recall    F1 Score
Female (0)    90%         90%          91%       90%
Male (1)      90%         89%          89%       89%
