Comparative Analysis of ImageNet Pre-Trained Deep Learning Models and DINOv2 in Medical Imaging Classification
Abstract—Medical image analysis frequently encounters data scarcity challenges. Transfer learning has been effective in addressing this issue while conserving computational resources. The recent advent of foundation models like DINOv2, which uses the vision transformer architecture, has opened new opportunities in the field and garnered significant interest. However, DINOv2's performance on clinical data still needs to be verified. In this paper, we performed a glioma grading task using three clinical modalities of brain MRI data. We compared the performance of various pre-trained deep learning models, including those based on ImageNet and DINOv2, in a transfer learning context. Our focus was on understanding the impact of the freezing mechanism on performance. We also validated our findings on three other types of public datasets: chest radiography, fundus radiography, and dermoscopy. Our findings indicate that on our clinical dataset, DINOv2's performance was not as strong as that of ImageNet-based pre-trained models, whereas on public datasets, DINOv2 generally outperformed other models, especially when using the freezing mechanism. Similar performance was observed across various sizes of DINOv2 models on different tasks. In summary, DINOv2 is viable for medical image classification tasks, particularly with data resembling natural images; however, its effectiveness may vary with data that differs significantly from natural images, such as MRI. In addition, employing smaller versions of the model can be adequate for medical tasks, offering resource-saving benefits. Our code is available at https://github.com/GuanghuiFU/medical_dino_eval.

Index Terms—Foundation model, Classification, Brain MRI, Glioma, Pretrained, Transfer learning.

Jingchen Zou, Yuning Huang, Xin Yue, Qing Zhao, Jianqiang Li, and Changwei Song are with the School of Software Engineering, Beijing University of Technology, Beijing, China.
Lanxi Meng and Shaowu Li are with the Department of Neuroimaging, Beijing Neurosurgical Institute, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
Guanghui Fu and Gabriel Jimenez are with Sorbonne Université, Institut du Cerveau – Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, France (e-mail: [email protected]).
Corresponding author: Guanghui Fu ([email protected])
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

I. INTRODUCTION

Deep learning has made significant strides, attracting the attention of researchers and finding applications across a broad range of fields [1]. The emergence of large-scale datasets, such as ImageNet [2], has been a significant driver of this growth, fueling the development of more sophisticated models [3]. However, in medical image analysis, the lack of large datasets poses a significant challenge [4]. This shortage is primarily due to the high costs of data annotation and strict patient privacy laws. In response, researchers are investigating pre-trained feature learning methods for medical image analysis tasks. Among these methods, transfer learning is particularly prominent, as it uses extensive datasets to initially train models, which are subsequently fine-tuned for specific medical imaging tasks [5], [6]. Simultaneously, self-supervised learning, which derives meaningful features from unlabeled data, is emerging as a valuable approach in medical image analysis [7]. Though transfer and self-supervised learning both focus on feature learning, they apply different strategies in data use and learning techniques, underscoring the adaptability of deep learning in overcoming the distinct challenges of medical image analysis.

Recently, foundation models have attracted widespread attention. Models like OpenAI's ChatGPT [8] have excelled in natural language processing tasks [9], [10], highlighting the potential for general artificial intelligence [11], [12]. In image analysis, the rising prominence of foundation models in computer vision has introduced novel approaches to the field [13], [14]. Notable examples include SAM [15], focused on image segmentation, and DINOv2 [16], a versatile backbone model trained on extensive datasets for a range of applications, such as image classification, retrieval, depth estimation, and semantic segmentation. The advancement of foundation models in computer vision has led medical researchers to consider their application in medical image analysis. For instance, Yue et al. [17] used SAM and heat maps generated by a classification model to segment breast cancer images. Huix et al. [18] explored the use of DINOv2 in medical image classification tasks, finding that it outperforms ImageNet pre-trained models in effectiveness. However, the limited evaluation on clinical data points to the need for further investigation into these models' internal mechanisms. Additionally, the nature and effectiveness of transfer learning and foundation models raise the question of whether foundation models consistently surpass pre-trained deep learning models, which necessitates extensive research across various tasks to ascertain their comparative advantages.

Glioma, the most common primary intracranial tumor, originates from neuroglial stem or progenitor cells [19]. The World Health Organization's 2021 classification of central nervous system tumors categorizes brain gliomas into grades 1 to 4 [20]. In the United States, the annual incidence rate of brain glioma is 6 per 100,000 people, with a 5-year survival rate of approximately 5% [21].
Developing a computer model capable of swiftly classifying brain MRIs could significantly aid physicians in quickly assessing the condition and planning further treatment [22]. Consequently, developing relevant deep learning models is vital [23], and more studies applying these models to clinical data are necessary to determine their suitability for clinical use [24].

In this study, we utilized clinical data to compare the performance of foundation models and ImageNet pre-trained deep learning models on three-modality brain MRI for glioma grading. We specifically focused on the impact of the freezing mechanism in training, where specific layers of the model are kept fixed while others are fine-tuned. Similar experiments were conducted on three public datasets, involving tasks such as chest X-ray classification [25], glaucoma classification on eye fundus images [26], and pigmented skin lesion detection on dermatoscopic images [27]. We observed that the freezing approach with DINOv2 consistently yielded superior results compared to training without freezing. However, in our clinical experiments, DINOv2 did not surpass the pre-trained deep learning models in performance, although it excelled on public datasets. This study also extends to clinical applications, exploring MRI's potential in tumor grading.
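To make the freezing mechanism concrete, the sketch below contrasts the two regimes compared throughout this paper: a backbone whose pre-trained weights are kept fixed versus one that is fully fine-tuned. It is a minimal illustration assuming a torchvision ResNet50 backbone and a three-class glioma-grading head, not the exact training code of this study.

```python
import torch.nn as nn
from torchvision import models

def build_model(freeze_backbone: bool, num_classes: int = 3) -> nn.Module:
    """Minimal sketch of the two transfer learning regimes compared in this paper."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    if freeze_backbone:
        # Freezing mechanism: the pre-trained layers stay fixed during training.
        for param in model.parameters():
            param.requires_grad = False
    # The replacement head is freshly initialized and always trainable.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# Only trainable parameters are handed to the optimizer.
model = build_model(freeze_backbone=True)
trainable_params = [p for p in model.parameters() if p.requires_grad]
```

With freezing enabled, only the new linear head is updated; without it, the entire network is fine-tuned end to end.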
II. RELATED WORK

A. Brain image classification and transfer learning

Deep learning techniques have been widely used in medical image classification tasks [28]. Fu et al. [29] developed an interpretable model that utilizes an attention mechanism specifically designed to identify cerebral hemorrhage in brain CT scans. This model highlights suspicious areas using an attention heatmap, providing a basis for its predictions and assisting medical professionals. Following this, Wang et al. [30] expanded on this approach by focusing the attention mechanism more intensely on informative image regions to enhance feature learning; this refined approach, which fuses these methods, resulted in improved performance. In human-computer interaction experiments in which doctors diagnosed the same images, each set of images could be processed 10 seconds faster, coupled with a significant increase of 16 points in the F1 score. These results underscore the efficacy of artificial intelligence algorithms in medical diagnostics. Soleymanifard et al. [31] proposed a multi-stage method covering the complete tumor classification process. Their study used 12-scale multiscale fuzzy C-means (MsFCM) clustering to detect enhancing tumors; after segmenting the tumors and normal tissues, the input tumors were classified into low-grade gliomas (LGG) and high-grade gliomas (HGG) using a fully connected neural network. However, this study was not tested on clinical data and only validated the method on public datasets. Kaur et al. [32] investigated applying several pre-trained deep learning models to classify pathological brain images. This research primarily concentrated on assessing the performance of diverse deep convolutional neural network (DCNN) architectures, including AlexNet, ResNet, GoogLeNet, and VGG, using datasets from the Harvard Repository, a clinical dataset, and Figshare's multi-class tumor dataset. Moreover, the research enhanced classification efficiency by optimizing the final three layers of these models.

These studies highlight the potential and applicability of deep learning technology in brain image analysis research. However, it is important to note that most of these studies rely on public datasets. In contrast, clinical data often presents more challenges due to its complexity and scarcity. Transfer learning emerges as a valuable solution to these issues: it effectively addresses the challenges of data scarcity while also conserving time and computational resources, offering the potential for enhanced performance in these contexts [33], [34]. Deepak et al. [35] extracted features of brain tumor images using GoogLeNet and then performed five-fold cross-validation experiments on the Figshare dataset using an SVM as the classifier to distinguish brain tumor types. The performance was evaluated using the area under the curve (AUC), precision, recall, F-score, and specificity, and the results demonstrate that transfer learning is a valuable technique in scenarios with limited availability of medical images. Chelghoum et al. [36] employed deep transfer learning to classify three types of brain tumors using a CE-MRI dataset. They utilized nine deep CNN architectures as feature extractors, modifying the networks' last three layers for fine-tuning, and additionally investigated the impact of epoch variations on classification performance. With its training on large-scale datasets, the foundation model offers flexible adaptability for various downstream tasks, presenting new opportunities in the context of transfer learning; the specific performance and underlying principles of the two approaches are still worth exploring.

B. Foundation model for image classification

The deep learning field is increasingly adopting foundation models that can be adapted for diverse tasks. A notable example in computer vision is the Segment Anything Model (SAM) [15], a foundation model that has garnered significant interest, particularly in medical image segmentation tasks [37], [38]. Another representative model is DINO [39], a label-free, self-distillation approach leveraging the vision transformer architecture, notable for outperforming other self-supervised and some supervised learning methods. Building on DINO's success, Oquab et al. [16] developed DINOv2, an enhanced version incorporating elements from iBOT [40] and pre-trained on the LVD-142M dataset of 142 million natural images. DINOv2, maintaining DINO's strengths and adding further improvements, is adaptable for tasks such as classification, segmentation, depth estimation, and image retrieval in both image and video formats. Based on these benefits of the foundation model, researchers are working on applying DINOv2 to medical image analysis. Baharoon et al. [41] used the DINOv2 model for radiological image analysis and conducted over 100 experiments using DINOv2 on different modalities (X-ray, CT, and MRI) with tasks including disease classification and organ segmentation. The results of this study show that DINOv2 has good cross-task generalization capabilities relative to supervised, self-supervised, and weakly supervised models. However, the researchers maintained a frozen backbone throughout their experiments, a strategy of considerable importance in transfer learning, whose underlying mechanism warrants further investigation.
Prokop et al. [42] extensively evaluated models such as DINOv1 and DINOv2, among others, using various pre-trained CNNs and vision transformers (ViTs) as feature extractors with combinations of multiple classifiers in a small-sample X-ray image classification scenario. The research tasks included COVID-19 recognition and tuberculosis recognition. In both tasks, ViTs as feature extractors outperformed CNN-based models in almost all scenarios, suggesting that DINOv2 and MAE vision transformers may be a good choice as feature extractors in metric learning models. However, the mean AUROC of DINOv2-ViT-B/14 lagged behind that of DINOv1-ViT-B/8 in the COVID-19 recognition task. Huix et al. [18] conducted a validation of five foundation models across four medical image classification datasets. Their research revealed that using DINOv2 as a backbone for transfer learning exhibits strong transferability, potentially supplanting the role of ImageNet pre-training in medical classification tasks. Additionally, their study delves into the freezing mechanism, observing that freezing the foundation model frequently resulted in a decline in performance on several datasets.

These studies show the potential and effectiveness of the DINOv2 model, highlighting its significant promise for applications in medical image analysis. However, it is important to note that most of these studies rely on experiments using public data and lack validation with clinical data, which is crucial for comprehensive assessment and practical applicability in medical settings.
III. METHODS

In this research, we conducted a comparative analysis of ImageNet pre-trained deep learning models and foundation models using a private clinical dataset and three additional public datasets. The study emphasized examining the influence of the model's freezing mechanism on performance outcomes. We carried out six targeted experiments on two distinct datasets. Moreover, the study specifically explored the models' ability to grade gliomas within clinical datasets, with comprehensive results detailed in the subsequent results section.
A. Transfer learning on ImageNet pre-trained models

We selected several representative deep learning models pre-trained on ImageNet for comparison: VGG16, ResNet50, and DenseNet121. These models were chosen for their excellent performance in large-scale image recognition tasks.
1) VGG16: The VGG network [43] marked a significant advancement in deep learning for image classification. VGG16, in particular, represented a major step forward in computational vision with its 16-layer architecture, consisting of 13 convolutional and 3 fully connected layers. This depth, combined with small (3×3) convolution filters, significantly enhanced the model's capability for intricate feature extraction. The design of VGG16 was instrumental in enabling the identification of complex and abstract visual patterns, setting a new standard for deep network architectures in image processing. The application of VGG16 in medical imaging, mainly through transfer learning, has shown significant efficacy [44]–[47].
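As an illustration of how such an ImageNet pre-trained VGG16 is adapted to a new task, the sketch below swaps the final fully connected layer for a task-specific output; the three-class glioma-grading head is an assumption for illustration, not the paper's exact configuration.

```python
import torch.nn as nn
from torchvision import models

# VGG16 with ImageNet weights: 13 convolutional layers followed by 3 fully connected layers.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# classifier[6] is the last of the 3 fully connected layers (originally 1000 ImageNet
# classes); replace it with a new output layer for the downstream task.
vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, 3)
```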
VGG16, in particular, represented a major step forward in A. Clinical datasets: glioma grading in brain MRI
computational vision with its 16-layer architecture, consisting In this study, we studied glioma images from 101 patients.
of 13 convolutional and 3 fully connected layers. This depth, The study was approved by the Ethics Committee of Beijing
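The skip-connection idea can be written down in a few lines. The block below is a simplified sketch of the principle (ResNet50 itself uses 1×1–3×3–1×1 bottleneck blocks), intended only to show how the input bypasses the stacked layers and merges with their output.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual block: output = F(x) + x, where x takes the skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x                      # the input bypasses the two conv layers
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + residual              # merge with the output of the stacked layers
        return self.relu(out)
```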
3) DenseNet121: DenseNet [53] represents a breakthrough in the architecture of convolutional networks, particularly with its densely connected design. Unlike traditional architectures, each layer in DenseNet receives inputs from all preceding layers, ensuring comprehensive feature integration. This structure significantly enhances the flow of information and gradients throughout the network, leading to more effective training and reduced information loss. A standout aspect of DenseNet is its exceptional efficiency in feature reuse, which allows for a reduced number of parameters without sacrificing performance. Among its variants, DenseNet121 is particularly notable for its robust performance in various image classification tasks, especially in processing images with complex textures and patterns. Its application in medical image analysis, particularly in disease diagnosis and tumor detection, has been extensively recognized and adopted [54]–[57].
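The dense connectivity pattern can be summarized the same way: each layer consumes the concatenation of all preceding feature maps. The block below is a schematic sketch of this idea, not DenseNet121's full block with its bottleneck and transition layers.

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    """Schematic dense block: layer i receives the concatenation of all earlier outputs."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_channels + i * growth_rate, growth_rate, kernel_size=3, padding=1)
            for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # reuse every preceding feature map
            features.append(out)
        return torch.cat(features, dim=1)
```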
B. DINOv2: vision transformer-based computer vision foundation model

In the realm of image data processing, DINOv2 [16] outperforms traditional CNNs in handling large-scale, complex datasets. Distinguishing itself from other transformer-based models, DINOv2's unique strength lies in its capacity for self-supervised learning, enabling it to train effectively on unlabeled datasets. The model demonstrates proficiency in a variety of image processing applications, such as image classification, object detection, and segmentation, attributed to its robust feature extraction capabilities. Researchers are increasingly investigating the use of DINOv2 for medical image analysis tasks [18], [41], [42].
cation of central nervous system (CNS) tumors [58]: diffuse
gliomas include WHO grade II and III astrocytomas, grade II
and III oligodendrocytes Cytomas, grade IV glioblastoma and
childhood-related diffuse gliomas. This study divided clinical T2 Modality
MRI glioma data into 3 categories based on histological and
pathological diagnosis: grade II, grade III and grade IV.
The acquisition equipment and scanning protocols are de-
scribed as follows: Imaging was conducted using a Siemens
3T Prisma MRI, complemented by a 20-channel head and
neck coil. Initial localization was achieved via a preliminary
imaging scan. After axial level determination on the sagittal T2 FLAIR Modality
plane for the trans anterior union, a series of structural image
scans were executed. These included: Fig. 1. The figure presents MRI scans showcasing different modalities (T1,
T2, and T2 FLAIR) across various glioma grades.
• T1-Weighted 3D Structural Imaging: Parameters were
set at TR=2300ms, TE=2.32ms, flip angle = 8°, FOV
= 240×240mm2 , matrix = 240×240, slice thickness = in our clinical experimental dataset.
0.9mm, and voxel size = 0.9×0.9×0.9mm3 .
• T2-Weighted Structural Imaging: Parameters were set B. Public datasets
at TR=5000ms, TE=105ms, flip angle = 150°, field of
To support our research, we also used three public datasets,
view (FOV) = 220×200mm2 , matrix = 448×358, slice
including the classification of chest X-rays, eye funds, and skin
thickness = 3.0mm, and voxel size = 0.5×0.5×3.0mm3 .
dermoscopic images. The example images for these datasets
• T2 FLAIR 3D Structural Imaging: Parameters were set
can be seen in Figure 2 Detailed data description is shown in
at TR = 5000ms, TE = 387ms, FOV = 230×230mm2 ,
Table I and Table II.
matrix = 256×256, slice thickness = 0.9mm, and voxel
a) Chest X-Ray dataset [25]: consists of patients aged 1
size = 0.9×0.9×0.9mm3 .
to 5 years at the Guangzhou Women and Children’s Medical
We gathered three distinct datasets from our patient cohort, Center (GWCMC), and the data were evaluated and examined
each representing a different imaging modality. These include by three experts. The dataset is divided into 3 folders (training
100 MRIs in the T1 modality, 101 MRIs in T2, and 36 MRIs in set: train, test set: test, validation set: val) and contains sub-
T2 FLAIR. While we could not obtain all three modalities for folders for each image category (pneumonia/normal). The data
each patient, the data we collected was meticulously organized set has 5840 X-ray images (4265 pneumonia images and 1575
by case number, and a radiologist with 8 years of experience normal images). We used the same train/test division, which
(L.M) provided voxel-level tumor annotations for each image. contains 5216 images for training and 624 images for testing.
To address potential discrepancies in image labels and voxel sizes across the datasets, we resampled both the data and the labels collectively to ensure uniformity and alignment for all patients. Our study focused on 2D axial slices extracted from the 3D data, specifically those where the tumor's region of interest (ROI), indicated by a binary mask, exceeded 30 pixels in size. These slices formed the basis of our three datasets, which were suitable for the 2D input requirements of our models. We split the datasets into training and test sets at the patient level to prevent data leakage [59], maintaining a 7:3 ratio. Within the training set, we further divided the data into training and validation subsets using an 8:2 ratio. The example images in Figure 1 illustrate the various modalities and tumor grades in our clinical experimental dataset.

Fig. 1. MRI scans showcasing the different modalities (T1, T2, and T2 FLAIR) across various glioma grades.
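A patient-level split of this kind can be sketched with scikit-learn's GroupShuffleSplit, grouping slices by case number so that no patient contributes slices to both sides of the split; the toy variables below are illustrative, not the study's actual data handling.

```python
from sklearn.model_selection import GroupShuffleSplit

# One entry per retained 2D axial slice (ROI > 30 pixels), with its grade and case number.
slice_ids = ["case001_z10", "case001_z11", "case002_z08", "case003_z15"]
grades = [2, 2, 3, 4]
patients = ["case001", "case001", "case002", "case003"]

# 7:3 train/test split at the patient level: grouping by case number keeps all slices
# of one patient on the same side of the split, preventing data leakage.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_idx, test_idx = next(splitter.split(slice_ids, grades, groups=patients))
# The same grouped strategy can then carve an 8:2 train/validation split out of train_idx.
```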
B. Public datasets

To support our research, we also used three public datasets, covering the classification of chest X-rays, eye fundus images, and skin dermoscopic images. Example images for these datasets are shown in Figure 2, and a detailed data description is given in Table I and Table II.

a) Chest X-Ray dataset [25]: This dataset consists of images of patients aged 1 to 5 years at the Guangzhou Women and Children's Medical Center (GWCMC), and the data were evaluated and examined by three experts. The dataset is divided into 3 folders (training set: train, test set: test, validation set: val) and contains sub-folders for each image category (pneumonia/normal). The dataset has 5840 X-ray images (4265 pneumonia images and 1575 normal images). We used the same train/test division, which contains 5216 images for training and 624 images for testing; this folder layout maps directly onto standard image-folder loaders, as sketched below.
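Because the dataset ships as train/test folders with one sub-folder per class, it can be read with torchvision's ImageFolder; the folder names and the ImageNet-style preprocessing below are conventional assumptions for illustration, not the paper's exact pipeline.

```python
from torchvision import datasets, transforms

# Conventional ImageNet-style preprocessing (assumed, not the paper's exact settings).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Expected layout: chest_xray/train/{NORMAL,PNEUMONIA}/... and chest_xray/test/{NORMAL,PNEUMONIA}/...
train_set = datasets.ImageFolder("chest_xray/train", transform=preprocess)
test_set = datasets.ImageFolder("chest_xray/test", transform=preprocess)
```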
b) iChallenge-AMD dataset [26]: This dataset consists of retinal fundus images of age-related macular degeneration (AMD) from Chinese patients, with 77% non-AMD subjects (311 non-AMD images) and 23% AMD subjects (89 AMD images). Each image carries an AMD/non-AMD label assigned based on comprehensive clinical evaluation results. The images were collected with a Zeiss Visucam 500 fundus camera (2124 × 2056 pixels) and a Canon CR-2 unit (1634 × 1634 pixels). In this study, the training set has 245 images, and the test set has 155 images.

c) HAM10000 dataset [27]: This dataset contains 11,513 dermoscopic images from the Department of Dermatology, Medical

Fig. 2. Examples from the three public datasets: (a) the Chest X-Ray dataset used for binary classification of pneumonia and normal cases, (b) the iChallenge-AMD dataset aimed at detecting age-related macular degeneration in eye fundus images, and (c) the HAM10000 dataset for classifying skin cancer in dermoscopy images.
to the preservation of learned representations, enhancing their ability to generalize across different tasks and datasets without extensive retraining.

When considering performance on the selected public datasets, DINOv2 models outperformed the traditional ImageNet pre-trained models. These results suggest that the DINOv2 models' architecture is robust enough to handle different types of public medical data independently of the freezing mechanism. In addition, an interesting finding was the high performance of DINOv2 models (a maximum F1 score of 86.73%) on the eye fundus dataset (i.e., iChallenge-AMD), which had relatively fewer images than the other publicly available datasets. This increased performance occurs when applying the freezing mechanism, demonstrating DINOv2's capability to transfer the learned features to limited data effectively, an essential quality for tasks with constrained data availability.

On the other hand, on our private datasets, DINOv2 models exhibited slightly weaker, and sometimes comparable, performance relative to the ImageNet pre-trained models. All models, however, achieved quantitatively low results, with a maximum F1 score of 49.74% on the T2 dataset using ResNet50. These results suggest that while DINOv2 models are robust in more generalized settings, they may require further tuning or adaptation to excel on specialized, private datasets, especially when the classes in the dataset do not present apparent visual differences, as is the case between grades 3 and 4 for the T1, T2, and T2 FLAIR modalities (as observed in Figure 1). These observations highlight the necessity of considering dataset-specific features (such as diversity, volume, and label distribution) in the model selection process.

Interestingly, we also found that larger versions of DINOv2 did not necessarily outperform their smaller counterparts. This observation suggests that increased model complexity in DINOv2 does not always correlate with enhanced performance, emphasizing the need to balance model size and task complexity in addition to considering the data distribution and characteristics.

Our study, centered on pre-trained models, was designed to facilitate an equitable comparison across different architectures. However, this choice inherently limited our exploration of each model's intrinsic capabilities. Future research should consider a more holistic approach, potentially incorporating models trained from scratch, to uncover alternative performance characteristics and applicability across different tasks and image modalities.

The observed performance differences between versions of DINOv2 prompt an essential question regarding the optimal balance between model complexity and computational efficiency. Future studies could also focus on identifying the most effective model size for specific tasks, potentially leading to more computationally economical yet practical solutions.

VII. CONCLUSION

This study conducted a comprehensive analysis of DINOv2 and ImageNet pre-trained models across various datasets, including public and private collections, focusing specifically on brain glioma grading tasks. This investigation yielded several key insights that contribute to the existing knowledge in machine learning and set a direction for future research.

Firstly, the study highlighted the significant influence of dataset characteristics on model performance. DINOv2 models excelled on public datasets, demonstrating their robustness and adaptability. However, in the more specialized context of private brain glioma datasets, they were outperformed by ImageNet pre-trained models. This finding emphasizes the need to consider dataset specificity when choosing models for particular applications.
TABLE III
The classification task performance across the three modalities of the private dataset is presented as the mean with a 95% bootstrap confidence interval. Metrics such as precision, recall, and F1-score are reported as weighted averages.

TABLE IV
The classification task performance across the three public datasets is presented as the mean with a 95% bootstrap confidence interval. Metrics such as precision, recall, and F1-score are reported as weighted averages.
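The reporting convention in Tables III and IV (weighted metrics with a 95% bootstrap confidence interval) can be reproduced with a resampling loop of the following form; this is a generic sketch, not the paper's evaluation script.

```python
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_weighted_f1(y_true, y_pred, n_boot: int = 1000, seed: int = 0):
    """Mean weighted F1-score with a 95% percentile bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
        scores.append(f1_score(y_true[idx], y_pred[idx], average="weighted"))
    return float(np.mean(scores)), (float(np.percentile(scores, 2.5)),
                                    float(np.percentile(scores, 97.5)))
```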
The research also examined the impact of the freezing mechanism on model performance. DINOv2 models consistently showed improvement with freezing, while the response of ImageNet pre-trained models varied. This observation suggests that tailored training strategies could improve model performance, especially in transfer learning contexts. Nevertheless, the study's focus on pre-trained models highlights a limitation: the lack of an examination of models trained from scratch. Investigating such models could provide additional insights into their capabilities and adaptability. This study sets the basis for comparing classic deep learning approaches and foundation models and opens several paths for future research. There is a compelling need to test a broader range of tasks and datasets to corroborate and expand upon these findings. Further exploration of the optimal balance between model size and complexity, the effects of various training strategies, and the investigation of transfer learning and fine-tuning methods are vital for the progression of the field.

In summary, this study offers valuable perspectives on the performance of advanced deep learning models across diverse datasets. It highlights the crucial role of dataset characteristics and training strategies in achieving optimal model performance. These insights are pivotal for the continuous evolution and refinement of machine learning models, particularly in their application to specialized and varied datasets.

REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[3] M. Huh, P. Agrawal, and A. A. Efros, “What makes ImageNet good for transfer learning?” arXiv preprint arXiv:1608.08614, 2016.
[4] N. Tajbakhsh, L. Jeyaseelan, Q. Li, J. N. Chiang, Z. Wu, and X. Ding, “Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation,” Medical Image Analysis, vol. 63, p. 101693, 2020.
[5] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive survey on transfer learning,” Proceedings of the IEEE, vol. 109, no. 1, pp. 43–76, 2020.
[6] M. A. Morid, A. Borjali, and G. Del Fiol, “A scoping review of transfer learning research on medical image analysis using ImageNet,” Computers in Biology and Medicine, vol. 128, p. 104115, 2021.
[7] R. Krishnan, P. Rajpurkar, and E. J. Topol, “Self-supervised learning in medicine and healthcare,” Nature Biomedical Engineering, vol. 6, no. 12, pp. 1346–1352, 2022.
[8] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[9] A. J. Thirunavukarasu, D. S. J. Ting, K. Elangovan, L. Gutierrez, T. F. Tan, and D. S. W. Ting, “Large language models in medicine,” Nature Medicine, vol. 29, no. 8, pp. 1930–1940, 2023.
[10] T. He, G. Fu, Y. Yu, F. Wang, J. Li, Q. Zhao, C. Song, H. Qi, D. Luo, H. Zou et al., “Towards a psychological generalist AI: A survey of current applications of large language models and future prospects,” arXiv preprint arXiv:2312.04578, 2023.
[11] M. Moor, O. Banerjee, Z. S. H. Abad, H. M. Krumholz, J. Leskovec, E. J. Topol, and P. Rajpurkar, “Foundation models for generalist medical artificial intelligence,” Nature, vol. 616, no. 7956, pp. 259–265, 2023.
[12] R. Bommasani, D. A. Hudson, E. Adeli et al., “On the opportunities and risks of foundation models,” CoRR, vol. abs/2108.07258, 2021. [Online]. Available: https://arxiv.org/abs/2108.07258
[13] Z. Huang, F. Bianchi, M. Yuksekgonul, T. J. Montine, and J. Zou, “A visual–language foundation model for pathology image analysis using medical Twitter,” Nature Medicine, vol. 29, no. 9, pp. 2307–2316, 2023.
[14] M. A. Mazurowski, H. Dong, H. Gu, J. Yang, N. Konz, and Y. Zhang, “Segment anything model for medical image analysis: An experimental study,” Medical Image Analysis, vol. 89, p. 102918, 2023.
[15] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in Proc. ICCV 2023, 2023, pp. 4015–4026.
[16] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “DINOv2: Learning robust visual features without supervision,” arXiv preprint arXiv:2304.07193, 2023.
[17] X. Yue, Q. Zhao, J. Li, X. Liu, C. Song, S. Liu, and G. Fu, “Morphology-enhanced CAM-guided SAM for weakly supervised breast lesion segmentation,” arXiv preprint arXiv:2311.11176, 2023.
[18] J. P. Huix, A. R. Ganeshan, J. F. Haslum, M. Söderberg, C. Matsoukas, and K. Smith, “Are natural domain foundation models useful for medical image classification?” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 7634–7643.
[19] M. Weller, W. Wick, K. Aldape, M. Brada, M. Berger, S. M. Pfister, R. Nishikawa, M. Rosenthal, P. Y. Wen, R. Stupp et al., “Glioma,” Nature Reviews Disease Primers, vol. 1, no. 1, pp. 1–18, 2015.
[20] T. Komori, “Grading of adult diffuse gliomas according to the 2021 WHO classification of tumors of the central nervous system,” Laboratory Investigation, vol. 102, no. 2, pp. 126–133, 2022.
[21] Q. T. Ostrom, D. J. Cote, M. Ascha, C. Kruchko, and J. S. Barnholtz-Sloan, “Adult glioma incidence and survival by race or ethnicity in the United States from 2000 to 2014,” JAMA Oncology, vol. 4, no. 9, pp. 1254–1262, 2018.
[22] A. M. KV, V. Rajendran, and P. J. K, “Glioma tumor grade identification using artificial intelligent techniques,” Journal of Medical Systems, vol. 43, pp. 1–12, 2019.
[23] Q. D. Buchlak, N. Esmaili, J.-C. Leveque, C. Bennett, F. Farrokhi, and M. Piccardi, “Machine learning applications to neuroimaging for glioma detection and classification: An artificial intelligence augmented systematic review,” Journal of Clinical Neuroscience, vol. 89, pp. 177–198, 2021.
[24] W. Jin, M. Fatehi, R. Guo, and G. Hamarneh, “Evaluating the clinical utility of artificial intelligence assistance and its explanation on the glioma grading task,” Artificial Intelligence in Medicine, p. 102751, 2024.
[25] D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell, vol. 172, no. 5, pp. 1122–1131, 2018.
[26] J. I. Orlando, H. Fu, J. B. Breda, K. Van Keer, D. R. Bathula, A. Diaz-Pinto, R. Fang, P.-A. Heng, J. Kim, J. Lee et al., “REFUGE challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs,” Medical Image Analysis, vol. 59, p. 101570, 2020.
[27] P. Tschandl, C. Rosendahl, and H. Kittler, “The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,” Scientific Data, vol. 5, no. 1, pp. 1–9, 2018.
[28] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Medical Image Analysis, vol. 42, pp. 60–88, 2017.
[29] G. Fu, J. Li, R. Wang, Y. Ma, and Y. Chen, “Attention-based full slice brain CT image diagnosis with explanations,” Neurocomputing, vol. 452, pp. 263–274, 2021.
[30] R. Wang, G. Fu, J. Li, and Y. Pei, “Diagnosis after zooming in: A multilabel classification model by imitating doctor reading habits to diagnose brain diseases,” Medical Physics, vol. 49, no. 11, pp. 7054–7070, 2022.
[31] M. Soleymanifard and M. Hamghalam, “Multi-stage glioma segmentation for tumour grade classification based on multiscale fuzzy C-means,” Multimedia Tools and Applications, vol. 81, no. 6, pp. 8451–8470, 2022.
[32] T. Kaur and T. K. Gandhi, “Deep convolutional neural networks with transfer learning for automated brain image classification,” Machine Vision and Applications, vol. 31, no. 3, p. 20, 2020.
[33] H. E. Kim, A. Cosa-Linan, N. Santhanam, M. Jannesari, M. E. Maros, and T. Ganslandt, “Transfer learning for medical image classification: A literature review,” BMC Medical Imaging, vol. 22, no. 1, p. 69, 2022.
[34] X. Yu, J. Wang, Q.-Q. Hong, R. Teku, S.-H. Wang, and Y.-D. Zhang, “Transfer learning for medical images analyses: A survey,” Neurocomputing, vol. 489, pp. 230–254, 2022.
[35] S. Deepak and P. Ameer, “Brain tumor classification using deep CNN features via transfer learning,” Computers in Biology and Medicine, vol. 111, p. 103345, 2019.
[36] R. Chelghoum, A. Ikhlef, A. Hameurlaine, and S. Jacquir, “Transfer learning using convolutional neural network architectures for brain tumor classification from MRI images,” in IFIP International Conference on Artificial Intelligence Applications and Innovations. Springer, 2020, pp. 189–200.
[37] Y. Huang, X. Yang, L. Liu, H. Zhou, A. Chang, X. Zhou, R. Chen, J. Yu, J. Chen, C. Chen et al., “Segment anything model for medical images?” Medical Image Analysis, p. 103061, 2023.
[38] M. A. Mazurowski, H. Dong, H. Gu, J. Yang, N. Konz, and Y. Zhang, “Segment anything model for medical image analysis: An experimental study,” Medical Image Analysis, vol. 89, p. 102918, 2023.
[39] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9650–9660.
[40] J. Zhou, C. Wei, H. Wang, W. Shen, C. Xie, A. Yuille, and T. Kong, “iBOT: Image BERT pre-training with online tokenizer,” arXiv preprint arXiv:2111.07832, 2021.
[41] M. Baharoon, W. Qureshi, J. Ouyang, Y. Xu, K. Pohl, A. Aljouie, and W. Peng, “Towards general purpose vision foundation models for medical image analysis: An experimental study of DINOv2 on radiology benchmarks,” arXiv preprint arXiv:2312.02366, 2023.
[42] J. Prokop, J. M. Tordera, J. Jaworek-Korjakowska, and S. Mohammadi, “Deep metric learning for few-shot X-ray image classification,” medRxiv, 2023.
[43] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[44] A. A. Hossain, J. K. Nisha, and F. Johora, “Breast cancer classification from ultrasound images using VGG16 model based transfer learning,” International Journal of Image, Graphics and Signal Processing, vol. 13, 2023.
[45] P. Gayathri, A. Dhavileswarapu, S. Ibrahim, R. Paul, and R. Gupta, “Exploring the potential of VGG-16 architecture for accurate brain tumor detection using deep learning,” Journal of Computers, Mechanical and Management, vol. 2, 2023.
[46] N. Veni and J. Manjula, “VGG-16 architecture for MRI brain tumor image classification,” in Futuristic Communication and Network Technologies: Select Proceedings of VICFCNT 2021, Volume 1. Springer, 2023, pp. 319–328.
[47] R. Islam, A. B. Akhi, and F. Akter, “A fine tune robust transfer learning based approach for brain tumor detection using VGG-16,” Bulletin of Electrical Engineering and Informatics, vol. 12, no. 6, pp. 3861–3868, 2023.
[48] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” CoRR, vol. abs/1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[49] A. K. Sharma, A. Nandal, A. Dhaka, L. Zhou, A. Alhudhaif, F. Alenezi, and K. Polat, “Brain tumor classification using the modified ResNet50 model based on transfer learning,” Biomedical Signal Processing and Control, vol. 86, p. 105299, 2023.
[50] S. Athisayamani, R. S. Antonyswamy, V. Sarveshwaran, M. Almeshari, Y. Alzamil, and V. Ravi, “Feature extraction using a residual deep convolutional neural network (ResNet-152) and optimized feature dimension reduction for MRI brain tumor classification,” Diagnostics, vol. 13, no. 4, p. 668, 2023.
[51] N. Gouda and J. Amudha, “Skin cancer classification using ResNet,” in 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA). IEEE, 2020, pp. 536–541.
[52] H. Mehnatkesh, S. M. J. Jalali, A. Khosravi, and S. Nahavandi, “An intelligent driven deep residual learning framework for brain tumor classification using MRI images,” Expert Systems with Applications, vol. 213, p. 119087, 2023.
[53] G. Huang, Z. Liu, and K. Q. Weinberger, “Densely connected convolutional networks,” CoRR, vol. abs/1608.06993, 2016. [Online]. Available: http://arxiv.org/abs/1608.06993
[54] N. Girdhar, A. Sinha, and S. Gupta, “DenseNet-II: An improved deep convolutional neural network for melanoma cancer detection,” Soft Computing, vol. 27, no. 18, pp. 13285–13304, 2023.
[55] M. G. Lanjewar, K. G. Panchbhai, and P. Charanarur, “Lung cancer detection from CT scans using modified DenseNet with feature selection methods and ML classifiers,” Expert Systems with Applications, vol. 224, p. 119961, 2023.
[56] N. N. Prakash, V. Rajesh, D. L. Namakhwa, S. D. Pande, and S. H. Ahammad, “A DenseNet CNN-based liver lesion prediction and classification for future medical diagnosis,” Scientific African, vol. 20, p. e01629, 2023.
[57] R. S. Sonawane, “Skin-cancer classification using deep learning with DenseNet and VGG with Streamlit-framework implementation,” Ph.D. dissertation, National College of Ireland, Dublin, 2023.
[58] T. Komori, “The 2016 WHO classification of tumours of the central nervous system: The major points of revision,” Neurologia Medico-Chirurgica, vol. 57, no. 7, pp. 301–311, 2017.
[59] E. Thibeau-Sutre, M. Diaz et al., “ClinicaDL: An open-source deep learning software for reproducible neuroimaging processing,” Computer Methods and Programs in Biomedicine, vol. 220, p. 106818, 2022.
[60] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[61] A. Paszke, S. Gross, F. Massa et al., “PyTorch: An imperative style, high-performance deep learning library,” Proc. NeurIPS, vol. 32, 2019.
[62] S. Marcel and Y. Rodriguez, “Torchvision: The machine-vision package of Torch,” in Proceedings of the 18th ACM International Conference on Multimedia, 2010, pp. 1485–1488.
[63] O. Colliot, E. Thibeau-Sutre, and N. Burgos, “Reproducibility in machine learning for medical imaging,” Machine Learning for Brain Disorders, pp. 631–653, 2023.