JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015

Comparative Analysis of Deep Learning Algorithms for Clinical Datasets
Shrihari Tiwari, Aneesh Tufchi, Manshu Khajuria, and Kshiteej Pawar

Abstract: This paper presents a comparative analysis of deep learning models, focusing on image classification for medical datasets such as brain MRI images. Using state-of-the-art models like Convolutional Neural Networks (CNN), Visual Geometry Group (VGG) networks, Residual Networks (ResNet), and Capsule Networks (CapsNet), the research aims to classify different stages of diseases such as Alzheimer’s based on MRI scans. Preprocessing techniques, such as image normalization and augmentation, are utilized to improve model performance. The study evaluates the models on accuracy, robustness, and generalization capabilities, with the goal of advancing automatic diagnosis in medical imaging.

Index Terms: IEEE, IEEEtran, journal, LaTeX, paper, template.

I. INTRODUCTION

In modern medical diagnostics, accurate and efficient image classification has emerged as a critical tool for the early detection and staging of diseases. Among various medical imaging modalities, brain MRI (Magnetic Resonance Imaging) plays a vital role in diagnosing neurodegenerative disorders such as Alzheimer’s disease, brain tumors, and multiple sclerosis. These diseases often present complex patterns of structural changes in the brain, making manual interpretation challenging and time-consuming, especially when attempting to detect early-stage abnormalities that may be subtle and difficult to identify. Therefore, the need for automated and precise image classification systems has become increasingly pressing, as these systems can significantly improve diagnostic accuracy and reduce the workload on radiologists.

The application of deep learning techniques to medical image analysis, particularly for brain MRI scans, has gained substantial momentum in recent years. Deep learning, a subfield of machine learning, has revolutionized image recognition tasks by enabling models to automatically learn hierarchical features from raw image data, eliminating the need for manual feature extraction. These models, when applied to medical images, can detect patterns that may not be immediately apparent to the human eye, leading to improved diagnosis, staging, and prognosis of various brain-related diseases.

Among the deep learning models, Convolutional Neural Networks (CNNs) are particularly well-suited for image classification due to their ability to capture spatial hierarchies in data. CNNs have been widely used for various medical imaging tasks, including lesion detection, tumor segmentation, and classification of brain diseases. However, as medical images, especially brain MRIs, present unique challenges such as the presence of noise, anatomical variations, and limited datasets, more advanced CNN architectures are necessary to improve classification accuracy and generalization.

VGG (Visual Geometry Group) networks, a deep convolutional network architecture, have proven effective in various image classification tasks. Their use of small filters (e.g., 3x3 kernels) and deep structures allows for more detailed feature extraction, capturing minute variations in medical images that could indicate early-stage diseases. VGG’s depth enables it to detect fine-grained patterns, making it particularly useful for classifying complex medical datasets where small differences in image features can have significant diagnostic implications.

Residual Networks (ResNet) take deep learning a step further by addressing the problem of vanishing gradients, which commonly occurs in very deep networks. The introduction of residual connections allows the model to bypass certain layers, ensuring that the gradient flows effectively through the network. This feature is particularly important in medical image classification, where learning intricate, high-level features across many layers can be essential for accurate disease staging. ResNet’s ability to train deep networks without performance degradation has made it one of the most successful architectures in medical image classification, especially for tasks that require distinguishing between early and advanced stages of diseases like Alzheimer’s.

Capsule Networks (CapsNet) represent a more recent development in deep learning, offering an alternative approach to CNNs by preserving spatial hierarchies between features. While CNNs are effective at detecting patterns, they sometimes struggle to maintain the relationships between different parts of an image, which can be critical in medical imaging where the orientation and relative positioning of anatomical structures carry diagnostic significance. CapsNet addresses this limitation by using capsules, groups of neurons that encode both the probability of a feature’s presence and its pose (position, orientation, scale). This makes CapsNet particularly advantageous for medical image classification tasks, where variations in patient anatomy or scanning conditions can affect the orientation and scale of important features.

The goal of this research is to evaluate and compare the performance of these four prominent deep learning architectures (CNN, VGG, ResNet, and CapsNet) on the task of classifying brain MRI images. We focus on key performance metrics such as accuracy, precision, recall, F1-score, and AUC (Area Under the Curve), as well as the models’ ability to generalize across different datasets. In addition to these metrics, computational efficiency is considered, as real-time diagnostic applications require not only accurate but also fast and resource-efficient models.

The importance of this research lies in its potential to advance automated medical diagnostics, particularly in the early detection and staging of neurodegenerative diseases. By providing a thorough comparison of different deep learning models, this study aims to highlight the strengths and limitations of each model and provide insights into which architectures are best suited for various types of medical image classification tasks. Furthermore, the findings from this research could pave the way for the integration of deep learning-based image classification systems in clinical practice, ultimately improving patient outcomes by enabling more accurate and timely diagnoses.

In summary, the advancement of deep learning models has the potential to transform medical diagnostics, particularly in the classification of complex medical images like brain MRIs. The focus of this research is to explore and analyze the effectiveness of CNN, VGG, ResNet, and CapsNet in classifying brain MRI images and to determine their applicability in real-world medical settings. By leveraging the capabilities of these models, the medical community can improve the accuracy and speed of disease detection, ultimately leading to better patient care and more informed treatment decisions.

II. SYSTEM ARCHITECTURE

The system architecture for classifying medical images, specifically brain MRI scans, using deep learning models consists of several interconnected layers that work together to preprocess the data, extract features, apply machine learning models, and present results. Each of these layers has a distinct role in ensuring the accuracy, efficiency, and scalability of the image classification system. Below is a detailed breakdown of the system architecture.

A. Data Collection Layer

The data collection layer forms the foundation of the image classification system. Medical images, particularly brain MRI scans, are sourced from a variety of repositories and databases, which could include:

1) Public datasets: Examples include the Alzheimer’s Disease Neuroimaging Initiative (ADNI) or Medical Image Computing and Computer-Assisted Intervention (MICCAI) challenges, which provide labeled MRI images of healthy individuals and patients in various stages of neurodegenerative diseases.

2) Hospital databases: Clinically gathered data from hospitals, which may contain anonymized MRI scans of patients along with corresponding medical history and disease staging.

These datasets often contain labeled images representing different stages of diseases such as Alzheimer’s (e.g., Mild Cognitive Impairment (MCI), Early Alzheimer’s Disease (AD), or Healthy Control (HC)). The labeled data is crucial for supervised learning models like CNN, VGG, ResNet, and CapsNet. Challenges at this stage:

3) Data diversity: Datasets must cover various demographic groups, imaging techniques, and disease stages to ensure the models generalize well across different patient populations.

4) Data size: Medical datasets can be relatively small due to privacy concerns and the cost of collecting and annotating medical images, leading to issues such as overfitting. Therefore, augmentation and transfer learning are often necessary.

B. Data Preprocessing Layer

Once the MRI images are collected, they undergo several preprocessing steps to prepare them for input into the deep learning models. The data preprocessing layer is critical because the quality of the input data directly influences the performance of the models. Key tasks in this layer include:

1) Resizing: MRI images come in various sizes depending on the scanning equipment used. For consistency, all images are resized to a fixed resolution (e.g., 224x224 or 256x256 pixels). This standardization ensures uniform input dimensions for the models.

2) Normalization: The pixel values of MRI images can vary widely, depending on the scanning intensity. Normalization transforms these values into a standard range, such as [0,1] or [-1,1], which helps the neural network converge more quickly and improves numerical stability during training.

3) Data Augmentation: To combat the relatively small size of medical image datasets, data augmentation techniques are applied to artificially expand the dataset. Common augmentations include:
• Rotation: Rotating the images by random angles to help the model learn rotational invariance.
• Flipping: Horizontally or vertically flipping the images to create new training samples.
• Zooming: Randomly zooming in and out to teach the model to handle images at different scales.
• Shifting: Translating the images slightly to simulate variability in patient positioning during the MRI scan.

4) Image Slicing: MRI scans are typically 3D images composed of multiple 2D slices taken at different depths. Depending on the approach, these 2D slices can be used individually, or the 3D volume can be processed as a whole. For some deep learning models, individual slices are classified independently, while for others (such as 3D CNNs), the entire volume is processed to capture spatial relationships.

These preprocessing techniques ensure that the images are clean, standardized, and augmented, improving the models’ ability to generalize and perform well on unseen data.
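The resizing, normalization, and augmentation steps of the preprocessing layer can be sketched in a few lines. The following is a minimal illustration in pure Python on a tiny 2D slice stored as nested lists; the function names, the 4x4 target size, and the one-pixel shift are assumptions made for the example and are not taken from the study, which does not describe its implementation. Rotation and zooming are omitted for brevity.

```python
import random


def min_max_normalize(image):
    """Scale pixel intensities linearly into the range [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


def resize_nearest(image, out_h, out_w):
    """Resize a 2D image to (out_h, out_w) with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0.0):
    """Translate the image `pixels` columns to the right, padding with `fill`."""
    return [[fill] * pixels + row[: len(row) - pixels] for row in image]


def augment(image, rng):
    """Apply a random flip and a random shift of 0 or 1 pixel (hypothetical policy)."""
    out = horizontal_flip(image) if rng.random() < 0.5 else image
    return shift_right(out, rng.randint(0, 1))


# Toy 3x3 "slice" with arbitrary intensities (illustrative values only).
slice_2d = [[0, 50, 100], [150, 200, 250], [25, 75, 125]]
prepared = min_max_normalize(resize_nearest(slice_2d, 4, 4))
augmented = augment(prepared, random.Random(0))
```

In practice these steps would usually be delegated to an imaging library, and the augmentation would be applied only to the training split so that validation data stays untouched.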

C. Feature Engineering

In traditional machine learning, feature engineering involves manually selecting and transforming features to improve model performance. However, in deep learning, particularly with CNNs and their variants, feature extraction is handled automatically by the convolutional layers. These layers learn to identify low-level features (e.g., edges, textures) in the early stages and high-level features (e.g., anatomical structures, patterns related to disease) in the deeper layers.

For brain MRI classification, feature extraction focuses on:

1) Identifying anatomical regions of interest (ROIs): The network learns to focus on key brain regions like the hippocampus, which is known to exhibit atrophy in Alzheimer’s patients.

2) Capturing spatial relationships: Especially important for CapsNet, which preserves the pose and orientation of features, ensuring that spatial hierarchies are maintained.

For models like CNN, VGG, ResNet, and CapsNet, the convolutional layers and pooling layers work together to create hierarchical feature maps, which capture increasing levels of abstraction. In this process:
• Convolutional filters slide over the input image to detect features, producing feature maps.
• Pooling layers downsample the feature maps, reducing their size while retaining the most important information. This helps in reducing computational complexity and preventing overfitting.

In medical image classification, feature extraction is particularly challenging because the features that differentiate disease stages may be subtle, requiring deeper architectures (such as VGG or ResNet) to capture these variations effectively.

D. Modeling Layer (Machine Learning)

The modeling layer is where the deep learning architectures are applied to the preprocessed and feature-extracted data to perform classification. The models used in this study are:

1) Convolutional Neural Networks (CNNs): A standard architecture for image classification, CNNs use convolutional layers to detect features and pooling layers to reduce spatial dimensions. CNNs are widely used in medical imaging because of their ability to handle large amounts of spatial data efficiently. The network learns to extract relevant features from MRI scans and classify them into disease stages.

2) VGG: The VGG architecture extends CNN by using deeper networks with small (3x3) convolutional filters. The depth of VGG networks allows them to capture more complex and finer details in MRI images. However, the increased depth also leads to higher computational costs and longer training times.

3) Residual Networks (ResNet): ResNet addresses the problem of vanishing gradients in deep networks by introducing skip connections, which allow gradients to flow directly through the network without degradation. ResNet is particularly effective in medical image classification tasks because it enables the model to learn deep, complex representations of brain structures that differentiate healthy brains from diseased ones.

4) Capsule Networks (CapsNet): Unlike CNNs, which are prone to losing spatial information as features are pooled, CapsNet preserves the spatial hierarchy of features using capsules, small groups of neurons that learn not just the presence of a feature but also its orientation, position, and scale. CapsNet’s dynamic routing between capsules enables it to handle variations in the input image better, making it especially useful for medical images where subtle differences in orientation and structure are diagnostically significant.

E. Optimization Algorithms

To improve model performance and convergence speed, optimization algorithms such as Gannet, SBO (Sequential Bayesian Optimization), and ESBO (Enhanced Sequential Bayesian Optimization) are employed. These algorithms help fine-tune hyperparameters (like learning rates, batch sizes, and network architecture choices) to maximize model performance on validation datasets.

1) Gannet: Designed to explore hyperparameter spaces efficiently, adapting the search process based on past performance to identify optimal configurations.

2) SBO: Leverages Bayesian principles to update beliefs about hyperparameter effects dynamically, allowing for a more informed search strategy that improves convergence speed.

3) ESBO: Extends SBO by enhancing the search process with advanced sampling techniques, allowing for better exploration of hyperparameter space and yielding improved performance outcomes.

Each model is trained and validated using a portion of the dataset. Cross-validation techniques are employed to assess how well the models generalize to unseen data. Additionally, hyperparameter tuning is performed using the optimization algorithms to optimize learning rates, batch sizes, and other model-specific parameters.

F. Classification Layer

Once the models have been trained, the classification layer outputs predictions for each MRI scan, categorizing them into predefined classes (e.g., Healthy, Mild Cognitive Impairment, Alzheimer’s). The classification layer typically consists of fully connected (dense) layers, where the learned features from the convolutional layers are combined and passed through activation functions (e.g., softmax for multi-class classification) to generate probability scores for each class. For medical applications, confidence scores are particularly important, as they help clinicians understand how confident the model is in its predictions. For example, if the model predicts early-stage Alzheimer’s with high confidence, it could prompt more aggressive clinical follow-up.
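The way a classification layer turns raw network outputs into class probabilities and a confidence score can be illustrated with a short sketch. The example below assumes three output classes matching those named in the text, made-up logit values, and a hypothetical confidence threshold of 0.8; none of these values come from the study.

```python
import math

# Class labels as named in the text; their order here is an assumption.
CLASS_NAMES = ["Healthy", "Mild Cognitive Impairment", "Alzheimer's"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.8):
    """Return (label, confidence, is_confident) for one scan.

    `threshold` is a hypothetical cut-off for flagging confident predictions.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best], probs[best] >= threshold


# Made-up logits for a single scan, used only to exercise the functions.
label, confidence, is_confident = classify([0.2, 3.1, 0.4])
```

A threshold of this kind is one simple way to separate predictions that could trigger follow-up from those that should be reviewed manually; softmax probabilities are not guaranteed to be well calibrated, so any real cut-off would need to be validated clinically.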

III. RESULTS

In machine learning and statistical classification, Type 1 and Type 2 errors can be measured using metrics derived from the confusion matrix, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives. Measuring both error types through these metrics gives a clearer picture of model performance. Depending on the specific use case, different metrics may be prioritized to achieve the desired balance between accuracy and error types. The metrics related to these errors are as follows.

TYPE 1 ERROR (FALSE POSITIVE)

Definition

A Type 1 error occurs when the model incorrectly identifies a negative instance as positive. This means that the model predicts the presence of a condition (e.g., a disease) when it is not actually present.

Metrics Associated

1) Precision:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Precision indicates the quality of positive predictions. It answers the question: “Of all instances classified as positive, how many were actually positive?” A high precision means fewer false positives.

2) False Positive Rate (FPR):

$$\text{FPR} = \frac{FP}{FP + TN}$$

FPR shows the proportion of actual negatives that were incorrectly identified as positive. It is also known as the fall-out rate. A low FPR is desirable as it indicates fewer incorrect positive predictions.

3) False Discovery Rate (FDR):

$$\text{FDR} = \frac{FP}{TP + FP}$$

FDR measures the proportion of false positives among all positive predictions. It tells us how many of the predicted positives were actually negative. Lowering the FDR improves the reliability of positive predictions.

TYPE 2 ERROR (FALSE NEGATIVE)

Definition

A Type 2 error occurs when the model fails to identify a positive instance, incorrectly classifying it as negative. This means that the model predicts the absence of a condition when it is actually present.

Metrics Associated

4) Recall (Sensitivity or True Positive Rate):

$$\text{Recall} = \frac{TP}{TP + FN}$$

Recall measures the model’s ability to identify all relevant instances. It answers the question: “Of all actual positives, how many were correctly identified?” A high recall means fewer false negatives.

5) False Negative Rate (FNR):

$$\text{FNR} = \frac{FN}{TP + FN}$$

FNR shows the proportion of actual positives that were incorrectly identified as negative. It is the complement of recall (FNR = 1 − Recall). A low FNR indicates that most positive instances are correctly identified.

GENERAL METRICS (NOT CATEGORIZED AS TYPE 1 OR TYPE 2)

6) Accuracy:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Accuracy measures the overall correctness of the model across all instances. It answers the question: “What proportion of total instances were correctly classified?” While useful, accuracy can be misleading, especially in imbalanced datasets where one class significantly outweighs the other.

7) F1 Score:

$$\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

The F1 score combines precision and recall into a single metric, providing a balance between the two. It is particularly useful when both false positives and false negatives need to be accounted for. A high F1 score indicates a model that has both good precision and good recall.

| Measure | CNN | VGG16 | ResNet |
|---|---|---|---|
| Accuracy | 0.901 | 0.666 | 0.529 |
| Precision | 0.843 | 0.625 | 0.529 |
| Recall | 1.0 | 0.925 | 1.0 |
| F1 Score | 0.915 | 0.746 | 0.692 |
| False Negative Rate (FNR) | 0.0 | 0.074 | 0.0 |
| False Positive Rate (FPR) | 0.208 | 0.625 | 1.0 |
| False Discovery Rate (FDR) | 0.156 | 0.375 | 0.470 |

TABLE II
COMPARISON OF PERFORMANCE METRICS FOR DIFFERENT MODELS
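All seven metrics follow directly from the four confusion-matrix counts, which the sketch below makes explicit. The function name is hypothetical, and the counts passed to it are an assumption: they are one set of integers that happens to reproduce the CNN column of Table II to within about 0.001, since the confusion matrices themselves are not reported.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics defined above from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, and one positive prediction).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fnr": fn / (tp + fn),  # complement of recall
        "fpr": fp / (fp + tn),
        "fdr": fp / (tp + fp),  # complement of precision
    }


# Assumed counts (not reported in the paper), chosen to be consistent
# with the CNN column of Table II.
m = classification_metrics(tp=27, fp=5, tn=19, fn=0)
```

The identities FNR = 1 − Recall and FDR = 1 − Precision hold for any counts, which is a quick consistency check on a reported results table.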

| Criteria | CNN | VGG | ResNet | CapsNet |
|---|---|---|---|---|
| Architecture | Multiple convolutional and pooling layers, followed by fully connected layers | Deep, sequential architecture with multiple convolutional layers (16 or 19 layers) | Uses residual blocks that allow the network to skip layers | Uses capsules, groups of neurons that preserve spatial hierarchies |
| Depth | Typically shallow to moderately deep | Deep (16 or 19 layers in VGG-16 or VGG-19) | Very deep (up to 100+ layers) | Shallow to moderate depth, but complex internal connections |
| Feature Learning | Learns basic features like edges, textures, and patterns | Focuses on extracting detailed features, larger depth | Residual connections allow better feature propagation | Encodes spatial relationships between features, not just the features |
| Training Complexity | Easier to train, simpler architecture | High computational cost due to deep layers | Efficient training with residual connections; mitigates vanishing gradient | Difficult to train due to dynamic routing and more complex architecture |
| Parameter Size | Moderate, depends on the depth of layers | Large, due to depth and fully connected layers | Fewer parameters due to residual connections | Small compared to VGG, but capsules increase the parameter complexity |
| Training Time | Relatively short, especially for shallow architectures | Long due to the large number of layers and parameters | Efficient due to residual connections despite large depth | Longer due to complex routing mechanisms between capsules |
| Generalization | Performs well, prone to overfitting if too deep | Good generalization, but prone to overfitting | Excellent generalization due to residual learning | Better at preserving information, subtle to data distribution |

TABLE I
COMPARISON OF DIFFERENT NEURAL NETWORK ARCHITECTURES
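The parameter-size comparison in Table I can be made more concrete with a little arithmetic on convolutional layers. The sketch below counts weights only (biases ignored) and assumes 64 input and output channels purely for illustration. It compares two stacked 3x3 layers, which together cover a 5x5 receptive field, against a single 5x5 layer, suggesting that VGG’s small filters are economical per layer and that its large overall size comes from depth and fully connected layers, as the table states.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one 2D convolutional layer, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


channels = 64  # illustrative assumption, not a value from the study

# Two stacked 3x3 layers see a 5x5 region of the input.
stacked_3x3 = 2 * conv_weights(3, channels, channels)

# One 5x5 layer covers the same region with more weights.
single_5x5 = conv_weights(5, channels, channels)

savings = 1 - stacked_3x3 / single_5x5  # fraction of weights saved
```

With equal channel counts the ratio is always 18/25, so the stacked design uses 28% fewer weights for the same receptive field regardless of the channel width chosen.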
