1. Introduction
A brain tumor (often abbreviated as BT) is an abnormal growth of brain cells that may manifest symptoms of cancer [
1]. It forms an abnormal segment with varying features of normal cells [
2,
3]. Benign BTs are composed uniformly of inactive cells. In contrast, malignant tumors are composed of cancerous, active cells with a non-uniform structure. These tumors can be divided into two categories: primary and metastatic BT. In the case of primary tumors, the cancerous cells are contained within the brain; however, in the case of metastatic tumors, the cancerous cells have spread to other parts of the body, not just the brain [
4,
5]. Because it affects people of all ages, BT is one of the deadliest illnesses in the world [
6]. It is the most prevalent malignancy in older persons and the third most prevalent among adolescents and young adults [
4]. Brain tumors include Glioma, Meningioma, and Pituitary [
7]. Gliomas are found in certain brain regions, such as the cerebral pedicle and spinal cord. They induce symptoms such as vomiting, headaches, and discomfort. They represent one-third of all brain tumors and 80% of primary malignant brain tumors [
8]. Glioma cases are growing at an alarming rate with a serious impact on human mortality. Similarly, a Meningioma is a tumor that forms in the meninges, layers that surround the brain and spinal cord. A Pituitary tumor is an irregular enlargement that forms in the Pituitary gland. In most cases, these tumors are benign [
9].
For tumor classification purposes, the term No-tumor is used in the literature to refer to the healthy brain category. According to the 2021 World Health Organization (WHO) Classification of Tumors of the Central Nervous System, BTs are divided into four grades (I-IV) with progressively higher malignancy and worse prognosis [
7]. In clinical practice, the kind, size, location, and grade of a tumor impact the selection of treatment [
10,
11,
12]. In addition, medical imaging encompasses noninvasive procedures for viewing the body’s interior and is now an essential source of information for illness diagnosis. Its principal uses are diagnosis and therapy, so it contributes significantly to the improvement of human health [
1,
13,
14]. Imaging techniques commonly used by specialists to diagnose BT include computed tomography (CT), X-ray, ultrasound imaging (UI), single photon emission computed tomography (SPECT), positron emission tomography (PET), positron emission tomography-magnetic resonance imaging (PET-MRI), magnetic resonance spectroscopy (MRS), and magnetic resonance imaging (MRI) [
15,
16,
17].
According to previous research, MRI is the most effective and extensively used method for identifying and classifying BTs. T1-weighted MRI (T1), T2-weighted MRI (T2), T1-weighted contrast-enhanced MRI (T1-CE), and fluid-attenuated inversion recovery (FLAIR) are the four diagnostic MRI modalities [
16,
18,
19]. Brain tumor diagnosis using MRI employing software-based tools includes segmentation, identification, and categorization of brain tumors [
5,
20], which results in a quicker response to therapy and increases patient survival [
16,
21]. As a result, software specialists have been tasked with developing tumor detection systems, particularly using image processing [
2]. Images also make up a large share of both digital and physical data stores, so image datasets tend to be large. The proliferation of digital cameras and other imaging technology has sharply increased the number of digital images taken and stored [
14].
The manual identification and categorization of brain tumors in large databases of medical images during routine clinical work is costly in both effort and time. As a result, machine learning (ML) and deep learning (DL) approaches have been adopted for brain tumor segmentation, detection, and classification [
16]. DL approaches with CNN structures are now employed to analyze medical images of various forms of malignancy [
22]. Similarly, transfer learning (TL) has been utilized, in which a model previously trained on one problem is reused for a similar problem; because the model has already been trained, the training period is shorter. Both strategies yielded great results [
23]. Sultan et al. [
24], for instance, proposed a convolutional neural network (CNN)-based deep learning model for classifying three kinds of brain tumors using two publicly accessible datasets. The overall accuracy of the suggested network structure was 96.13% and 98.7% for the two datasets, respectively. Similarly, in Aamir et al., 2022 [
17], the authors proposed two DL models for feature extraction from a dataset and a third model for the classification process. Their strategy achieved a 98.95% success rate in classification. Additionally, Chattopadhyay and Maitra, 2022 [
13] created an algorithm for segmenting BT from MRI images by first employing a CNN and then utilizing conventional classifiers. It was found that the proposed model was 99.74% accurate.
On the other hand, Nayak et al., 2022 [
25] showed a CNN using min-max normalization to classify 3260 MRI images into four categories (Glioma, Meningioma, Pituitary, and No-tumor). The developed network is a variant of EfficientNet. The results indicated that the model was 99.97% accurate during training and 98.78% during testing. Wahlang et al., 2022 [
26] determined whether an MRI image is normal or pathological using a DL architecture based on LeNet, with age and gender also considered as criteria. Compared to other models, the LeNet-inspired model achieved an overall accuracy of 88%, whereas CNN-DNN architectures achieved just 80%, support vector machine (SVM) 82%, and AlexNet 64%. Another work that used the TL AlexNet model for classification tasks was Badjie et al., 2022 [
27]. They proposed a binary classifier to distinguish between an unhealthy brain and a healthy brain and obtained an accuracy of 99.62%.
For Raza et al., 2022 [
28], the categorization of three kinds of brain tumors (Glioma, Meningioma, and Pituitary) was accomplished using a hybrid deep-learning model named DeepTumorNet. They took the GoogLeNet design as a foundation, removed its last five layers, and added 15 new layers. They achieved 99.67% accuracy, 99.6% precision, 100% recall, and a 99.66% F1 score. They compared their outcomes to pre-trained models such as AlexNet, ResNet50, DarkNet-53, ShuffleNet, GoogLeNet, SqueezeNet, ResNet101, Xception, and MobileNetV2.
Another line of research combines proposed and pre-trained TL models with other feature extraction or classification methods. An example of this is Maqsood et al. [
29], who divided their method into five steps, including the design of a 17-layer deep neural network architecture for brain tumor segmentation, the use of a modified MobileNetV2 CNN for feature extraction, and the application of M-SVM for the classification of Meningioma, Glioma, and Pituitary tumors. They obtained a 97.47% accuracy rate for the BraTS 2018 dataset and a 98.92% accuracy rate for Figshare. Likewise, Amran et al., 2022 [
30] established a deep hybrid learning model for binary brain tumor classification. This technique combines the GoogLeNet architecture with a CNN model by removing five layers of the GoogLeNet architecture and adding fourteen layers of the CNN model, which automatically extracts features. The proposed model was compared to ResNet, VGG-16, SqueezeNet, AlexNet, MobileNetV2, and several ML/DL models. Its classification scores were 99.51% accuracy, 99% precision, 98.90% recall, and 98.50% F1-score. Samme et al., 2022 [
31] provided a deep hybrid transfer learning (GN-AlexNet) model for Pituitary, Meningioma, and Glioma BT classification. The suggested model integrated the GoogLeNet architecture with the AlexNet model by removing five GoogLeNet layers and adding ten AlexNet layers, which automatically extract and classify features. The proposed model was compared against pre-trained models (VGG-16, AlexNet, SqueezeNet, ResNet, and MobileNetV2). The model’s accuracy was 99.51%, and its sensitivity was 98.90%.
In other efforts, such as Ghazanfar et al., 2022 [
32], a strategy for classifying Glioma tumors was created employing CNN for feature extraction and SVM for classification. They attained an accuracy of 96.19% for the HGG Glioma type and 95.46% for the LGG Glioma type when classifying the four Glioma types (edema, necrosis, enhancement, and non-enhancement) using the suggested method. Jibon et al., 2022 [
33] proposed a classification method to distinguish cancerous and non-cancerous tumors from MRIs using log-polar transform (LPT) and CNN. The LPT has been applied for the extraction of rotation and scaling features from distorted images, while the integration of CNN introduced an ML approach to the classification of tumors from distorted images. The results showed that the ML approach provides better classification, with a success rate of about 96%, on both single MRI images and brain MRI images with rotation and scale invariance. The model proposed by Yazdan et al., 2022 [
34] is a multiclass classification solution for magnetic resonance imaging (MRI) covering Glioma, Meningioma, Pituitary, and No-tumor. The experimental findings demonstrated that the suggested multi-scale CNN model outperforms AlexNet and ResNet in terms of accuracy and efficiency while incurring less computational expense. The proposal has an accuracy of 91.2% and an F1 score of 91%. Ullah et al., 2022 [
35] introduced TumorResNet, a binary classification DL model for brain tumor detection. The suggested model obtained 99.33% accuracy. These experimental results, including the cross-dataset configuration, indicate that the TumorResNet model outperforms some contemporary frameworks.
Another approach to TL was that of Alanazi et al., 2022 [
36], who developed a TL-based model for classifying BT into its subtypes, including Pituitary, Meningioma, and Glioma. Using TL, they repurposed an isolated 22-layer CNN model built for binary classification (tumor or No-tumor), re-adjusting the neuron weights in order to classify MRIs into subtypes. Consequently, the accuracy of the constructed TL model for the employed images was 95.75%. Other examples of the use of TL include Ahmed et al., 2022 [
37], who created an evolutionary method to classify MRI images into three brain tumor classes. The Xception model was used for feature extraction. In this study, the model accuracy was 99.06%. Secondly, Ullah et al., 2022 [
38] utilized TL to compare nine classifiers on the same dataset: InceptionResNetV2, InceptionV3, Xception, ResNet18, ResNet50, ResNet101, ShuffleNet, DenseNet201, and MobileNetV2. The best model was InceptionResNetV2, with an accuracy of 98.91%. Similarly, Deepak and Ameer, 2019 [
11] classified brain tumors into three BT classes. They utilized a pre-trained GoogLeNet to extract features from MRI, and SVM and K-nearest neighbors (KNN) were used as classifiers. Their findings demonstrated an accuracy of 98%.
In contrast, there are some works that only use TL models and assemble them for the classification task. For example, Kumar et al. [
15] submitted a work that used the TL method. They focused on identifying malignant tumors, benign tumors, and healthy brain tissue. They made use of ResNet152. Using the CoV-19 OA optimization technique, the weight parameters were modified. They compared their findings to those of previously suggested models. The suggested approach achieved the
values of 99.57%, 97.28%, 94.31%, 95.48%, 96.38%, 98.41%, and 96.34%. With a different approach, the work by Tandel et al., 2021 [
39] developed four clinically relevant datasets. Five-fold cross-validation was used to evaluate the four sets with five DL-based models (AlexNet, VGG16, ResNet18, GoogLeNet, and ResNet50) and five ML-based models (support vector machine, K-nearest neighbors, Naive Bayes, decision tree, and linear discrimination). They presented the MajVot method to maximize the classification performance of the five DL and ML-based models. As a consequence, they achieved an increase in average accuracy. Subsequently, using a majority voting technique, Tandel et al., 2022 [
40] worked with five pre-trained CNNs, including AlexNet, VGG16, ResNet18, GoogLeNet, and ResNet50, on three distinct datasets. Their maximum accuracy was 99.06%. In a similar way, Wu et al., 2022 [
41], using the TL models InceptionV3, ResNet101, and DenseNet201, offered a similar categorization of three kinds of cancers. The average accuracy for each model was 96.21%, 97%, and 96.5%, respectively.
The approach developed by Kazemi et al., 2022 [
14] is a parallel CNN model composed of AlexNet and VGGNet networks. The layer structure differs between the two networks. Their features are merged at a single point, followed by classification using the softmax function. The accuracy of the model was 99.14% for binary classification and 98.78% for multiclass classification.
Aurna et al., 2022 [23] evaluated five pre-trained models and one proposed CNN model and picked the best models to combine. VGG19, EfficientNetB0, InceptionV3, ResNet50, Xception, and the authors’ proposed model were the CNNs employed in this work. They used three distinct datasets and reached accuracies of 99.67%, 98.16%, and 99.76%, respectively, with the best ensemble.
This paper proposes a CNN-based approach for classifying magnetic resonance images into four classes: Meningioma, Glioma, Pituitary tumor, and No-tumor, using a generic CNN model and several pre-trained models. The dataset used, Msoud (2021), is a combination of three datasets, Figshare, SARTAJ, and BR35H, and includes 7023 images divided into 80% for the training and 20% for the test stages. The proposed DL models are trained using k-fold cross-validation. The task is made more challenging by applying zoom and brightness changes in the preprocessing stage. Moreover, WandB [42], a tool created by Weights & Biases for graphing and visualizing machine learning results, is used. This study was motivated by the success of previous research in the TL field. To classify BT in the target dataset, we applied different TL models previously described in the literature: InceptionResNetV2, InceptionV3, Xception, ResNet50, MobileNetV2, and EfficientNetB0. In addition, the proposed CNN models are compared using various techniques to establish their effectiveness in MRI classification, one of them being a Model Size versus Model Accuracy graph, among other performance metrics. The goal is to find the best classifier for this four-class brain tumor problem, which, being multiclass, is more complex than binary classification.
The remaining sections of the article are structured as follows:
Section 2 provides the proposed approach to the CNN models. In addition, the six pre-trained models, the generic CNN model, and the dataset are described.
Section 3 presents the findings and the limitations of the work. Finally, Section 4 presents the conclusions of this research.
3. Results and Discussion
As stated in earlier sections, the neural network models were trained using 5-fold cross-validation. The quantitative results are presented in the figures and tables that follow. This section also evaluates the performance of the different pre-trained TL classifiers used to classify MRI images from the multiclass dataset described earlier. The primary benefit of TL classifiers and hyperparameter tuning is the reduction of overfitting, which is common in DL algorithms when training on a relatively small sample [
38].
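As an illustration of the 5-fold protocol mentioned above, the following minimal sketch splits sample indices into k folds so that each sample is used for validation exactly once. It is not the authors' implementation: the function name `k_fold_indices`, the seed, and the sample count of 20 are assumptions chosen for the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Hypothetical sample count, used only to exercise the function.
folds = list(k_fold_indices(n_samples=20, k=5))
for fold_number, (train_idx, val_idx) in enumerate(folds, start=1):
    print(fold_number, len(train_idx), len(val_idx))
```

In such a scheme, a model would be trained once per fold on `train_idx` and evaluated on `val_idx`, and the per-fold scores would then be averaged.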
After preprocessing, the image examples shown in Figure 8 were obtained. As can be observed, zoom and brightness changes were applied to the image samples, as in the Glioma and Meningioma examples, and the position was altered, as in the No-tumor example. In certain images, such as the Pituitary example, only the position and brightness were altered. The primary objective of these preprocessing steps was to enable the models to learn independently of such image characteristics.
Table 5 shows the classification results of the TL algorithms and the proposed model. Each TL classifier produced acceptable and competitive results, considering the preprocessing used and the composition of the dataset. In contrast, the generic CNN model does not achieve acceptable results compared to the pre-trained models, with an accuracy of 81.05%. The algorithms were evaluated using several metrics, including the area under the curve (AUC). The loss for each model is also reported; InceptionV3 obtained the lowest loss during the training stage. According to the results, the InceptionV3 model had the highest average accuracy, 97.12%, averaged over the cross-validation performed with k = 5. It is also worth noting that the ResNet-based variants, ResNet50 and InceptionResNetV2, produced different but very similar results, taking second and third place in the table with 96.97% and 96.78%, respectively.
Table 6 shows in more detail the results obtained in each of the k folds of the cross-validation process for the model with the best average performance, InceptionV3. As can be observed, the standard deviation and the confidence limit show only minor differences across the k folds, which is a positive result, since the standard deviation measures how far the data lie from their mean. This demonstrates the algorithm’s robustness to different data partitions.
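To make the fold-level statistics concrete, the sketch below computes the mean, the sample standard deviation, and a 95% confidence half-width for five fold scores. The source does not state how its confidence limit was computed, so the Student-t interval used here is an assumption, and the per-fold accuracies are hypothetical placeholders rather than the values of Table 6.

```python
import math
import statistics


def summarize_folds(scores, t_critical=2.776):
    """Return mean, sample standard deviation, and CI half-width.

    t_critical defaults to the two-sided 95% Student-t value for
    4 degrees of freedom, which corresponds to k = 5 folds.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    half_width = t_critical * std / math.sqrt(len(scores))
    return mean, std, half_width


# Hypothetical per-fold accuracies (%), NOT the values reported in Table 6.
fold_accuracy = [96.0, 97.0, 98.0, 97.0, 97.0]
mean, std, half_width = summarize_folds(fold_accuracy)
print(f"mean={mean:.2f}  std={std:.2f}  95% CI=+/-{half_width:.2f}")
```

A small standard deviation relative to the mean indicates that the model behaves similarly on every data partition, which is the robustness argument made in the text.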
The plots were generated using WandB, a tool for tracking machine learning experiments that makes it easy for practitioners to keep track of experiments and share their results with collaborators [
42].
The training and validation process of the best classification model for this study is shown in Figure 9 and Figure 10. The metrics shown are the average training and validation accuracy and the average of a second training and validation metric. The number of k-folds used in this study can be seen in the graphs. The average training values of these two metrics are 97.12% and 97.97%, respectively, while the average validation values are 97.82% and 98.64%, respectively.
Moreover, Figure 11 shows the loss obtained in every fold during training and validation of the InceptionV3 model. The average loss observed during training and validation of this model is 7.9% and 6.3%, respectively.
Different performance metrics, listed in Table 5, were used to compare the suggested model’s performance. These metrics are computed from the confusion matrix.
Figure 12 shows the confusion matrices used to study the specifics of each k-fold. As a result of overfitting using 30% of the test data extracted from the data set during the Data Augmentation phase, these confusion matrices include some misclassifications in each k-fold. In the first fold, the misclassified samples of the InceptionV3 model include 13 of label 0 (Glioma), 12 of label 1 (Meningioma), 26 of the No-tumor label, and 4 of the Pituitary label, as shown in Figure 12. In the second fold, 5 Glioma, 34 Meningioma, 19 No-tumor, and 1 Pituitary samples are misclassified. In the third fold, 7 Glioma, 16 Meningioma, 12 No-tumor, and 1 Pituitary samples are misclassified. In the fourth fold, the misclassified samples consist of 1 Glioma, 17 Meningioma, 21 No-tumor, and 7 Pituitary. In the fifth and final fold, they consist of 5 Glioma, 21 Meningioma, 34 No-tumor, and 8 Pituitary. Because it produces fewer misclassifications, the InceptionV3 model is more accurate than the alternatives. Glioma and Pituitary tumors are classified very effectively by every CNN model, whereas the Meningioma and No-tumor classes are not learned as well as the other two.
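The per-fold counts quoted above can be tallied with a few lines of code. The sketch below sums them per fold and per class, and includes a small helper that extracts misclassification counts from a confusion matrix. The helper assumes rows hold the true class, which the source does not specify, and the 4x4 matrix is a hypothetical example rather than one of the matrices in Figure 12.

```python
def misclassified_per_class(confusion):
    """Row i holds true class i; its off-diagonal entries are errors."""
    return [sum(row) - row[i] for i, row in enumerate(confusion)]


classes = ("Glioma", "Meningioma", "No-tumor", "Pituitary")

# Misclassification counts quoted in the text for InceptionV3,
# per fold, in the same order as `classes`.
reported = {
    1: (13, 12, 26, 4),
    2: (5, 34, 19, 1),
    3: (7, 16, 12, 1),
    4: (1, 17, 21, 7),
    5: (5, 21, 34, 8),
}

per_fold_total = {fold: sum(counts) for fold, counts in reported.items()}
per_class_total = {
    name: sum(counts[i] for counts in reported.values())
    for i, name in enumerate(classes)
}

# Hypothetical confusion matrix, used only to exercise the helper.
example = [
    [50, 2, 1, 0],
    [3, 45, 4, 1],
    [0, 5, 60, 0],
    [1, 0, 0, 55],
]

print(per_fold_total)
print(per_class_total)
print(misclassified_per_class(example))
```

Summed over the five folds, the quoted counts give 31 Glioma, 100 Meningioma, 112 No-tumor, and 21 Pituitary errors, which is consistent with the observation that Meningioma and No-tumor are the harder classes.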
Another method that helped us determine the most efficient model in this study was the Model Size versus Model Accuracy diagram, shown in Figure 13. This type of plot is often used to evaluate the performance of a model relative to its size, since the latter determines the likely computing cost (this issue is addressed in the next section). It is used in studies such as Tan and Le (2019) [48], where different pre-trained models and models proposed by the authors were compared. The plot is constructed by placing the number of parameters (model size) on the x-axis and the accuracy percentage on the y-axis. The models are listed in increasing order of size, beginning with the smallest model regardless of its accuracy and progressing to the largest. The accuracy of each model is then plotted, and the points are connected by lines to define the graph’s trajectory. Finally, the shape of the graph is evaluated to determine which model offers the best trade-off between parameters and accuracy.
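The construction described above can be sketched as follows. The first function orders models by size, as on the x-axis of such a plot. The second applies a simple Pareto filter, which is one possible way to read the trade-off and is not claimed to be the authors' procedure. The model names, parameter counts, and accuracies are hypothetical placeholders.

```python
def size_accuracy_curve(models):
    """Order (name, params_millions, accuracy_pct) tuples by size.

    Ties in size are broken by placing the more accurate model first.
    """
    return sorted(models, key=lambda m: (m[1], -m[2]))


def pareto_front(models):
    """Keep models that no smaller-or-equal model matches in accuracy."""
    front, best_accuracy = [], float("-inf")
    for name, _params, accuracy in size_accuracy_curve(models):
        if accuracy > best_accuracy:
            front.append(name)
            best_accuracy = accuracy
    return front


# Hypothetical values for illustration only (not taken from Table 5 or 7).
models = [
    ("model_c", 26.0, 96.5),
    ("model_a", 3.5, 95.0),
    ("model_d", 56.0, 96.8),
    ("model_b", 24.0, 97.0),
]

for name, params, accuracy in size_accuracy_curve(models):
    print(f"{name}: {params} M parameters, {accuracy}% accuracy")
print("Pareto-efficient models:", pareto_front(models))
```

A model stays on the front only if every smaller model is less accurate, so larger models that do not improve accuracy are dropped from consideration.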
In this case study, InceptionV3 remains the most remarkable model. However, our view of ResNet50, which ranked second in Table 5, changed: it has more parameters than InceptionV3, which suggests a greater computing cost. Considering its accuracy and small number of parameters, MobileNetV2 might be considered the second-best model for the dataset used in this investigation.
3.1. Computational Complexity
This section discusses the computational complexity necessary for this research.
Table 7 has seven columns, beginning with the name of the model, followed by the accuracy and number of parameters of each model. The computational cost is then shown as percentages, beginning with CPU and GPU usage, followed by the percentage of CPU memory allocated, and concluding with the training time, which is also related to the computational cost. The training time is presented in a bar chart and explained subsequently.
According to Table 7, the InceptionV3 model distinguishes itself from other high-accuracy models because of its low GPU consumption and high accuracy. Regarding the proportion of GPU memory allocated, the InceptionV3 model reached an average value of 60%. In terms of CPU utilization, the value for InceptionV3 is near the mean. As indicated in Table 7, the runtime of this model offers the greatest room for optimization.
For this study, the Google Colab platform was used. The platform provided an A100-SXM4-40GB, a professional GPU manufactured by NVIDIA.
For all the models considered in this study, the following computational cost plots were generated using the WandB tool mentioned previously [42]. Figure 14 shows the percentage of GPU and CPU utilization in each training fold for the InceptionV3 model. For each fold, the highest utilization value was taken, and these maxima were averaged, giving the CPU (73.7%) and GPU (58.73%) utilization percentages reported for InceptionV3 in Table 7.
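The aggregation described above (take the peak per fold, then average the peaks) can be sketched as below. The function name `average_peak` and the utilization samples are assumptions for illustration and are not the traces recorded in Figure 14.

```python
from statistics import mean


def average_peak(traces):
    """Average of each fold's maximum utilization sample (percent)."""
    return mean(max(trace) for trace in traces)


# Hypothetical per-fold GPU utilization samples (percent).
gpu_traces = [
    [40.0, 55.0, 60.0],
    [35.0, 58.0, 50.0],
    [42.0, 57.0, 59.0],
]
avg_gpu_peak = average_peak(gpu_traces)
print(f"average peak GPU utilization: {avg_gpu_peak:.2f}%")
```

Averaging peaks rather than all samples emphasizes the worst-case load per fold, so the resulting figure is an upper-leaning summary of resource use.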
Figure 15 additionally shows that 60% of GPU memory was allocated while training the k-folds.
As a result of the Data Augmentation approach, which greatly increased the size of the dataset, all classification models took a substantial amount of time to train. This time also depends on the model’s complexity and architectural design. The duration is measured in hours and minutes. For brain tumor classification, the InceptionV3 TL model delivered the best classification results but also consumed the most time (5 h and 23 min). ResNet50 required 3 h and 43 min to classify MRI images of brain tumors into distinct kinds; it was the TL model with the quickest execution time and the second-best in the results table. Due to its architecture, the MobileNetV2 model was also one of the models with the shortest run time, producing results in 3 h and 51 min, less than the EfficientNetB0 model, which took 3 h and 59 min. The generic CNN model took 3 h and 20 min. It should be noted that the training time of the ResNet-based TL classifiers grows as the number of layers increases; for instance, InceptionResNetV2 required 4 h and 54 min. Training the Xception model took 4 h and 34 min. The training time of every CNN model is plotted in Figure 16.
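For a quick side-by-side view, the training times quoted above can be converted to minutes and ranked. This is only a convenience sketch built from the durations stated in the text; the variable names are illustrative.

```python
# Training times quoted in the text, as (hours, minutes).
training_time = {
    "InceptionV3": (5, 23),
    "ResNet50": (3, 43),
    "MobileNetV2": (3, 51),
    "EfficientNetB0": (3, 59),
    "Generic CNN": (3, 20),
    "InceptionResNetV2": (4, 54),
    "Xception": (4, 34),
}

minutes = {name: h * 60 + m for name, (h, m) in training_time.items()}
ranking = sorted(minutes, key=minutes.get)  # fastest model first

for name in ranking:
    print(f"{name}: {minutes[name]} min")
```

Ranked this way, the generic CNN is the fastest overall, ResNet50 is the fastest TL model, and InceptionV3 is the slowest, in line with the discussion above.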
Table 5 demonstrates that, for the dataset used in this investigation, the accuracy rose somewhat when a shallower ResNet model was employed. However, as the network’s depth increases, so does its computational complexity and, consequently, its training time, which eventually impacts the network’s efficiency; this explains why the accuracy of the ResNet variations varies. In addition, we may deduce that the InceptionV3 model is the best approach for classifying brain tumors in this study.
3.2. Comparison to Contemporary Related Work
We compared the classification performance of the best deep neural network, namely InceptionV3, with existing approaches for categorizing brain tumors into four categories: Meningioma, Pituitary, Glioma, and No-tumor. Specifically, we compared the proposed work to existing DL methods. In some of them, TL is the primary method, while in others, author-proposed models are the primary method.
Table 8 provides a detailed comparison of the pre-trained classification model techniques for brain tumor classification.
Table 8 only includes accuracy as the primary performance statistic since it is the metric most often reported in the relevant research. The first column of Table 8 gives the citation of the reviewed work, and the second column gives the technique employed in each study, which may be an author-proposed model or pre-trained models. The third column gives the dataset used in each study, since the dataset is a key factor in interpreting the results; it also highlights that the current study combines three distinct datasets containing a substantial number of images. In some studies, the classification task does not include the healthy brain as a class, whereas this study does. To compare and assess the best current approaches, the best method or result attained in each study is then included.
According to Table 8, the InceptionV3 TL model outperforms some existing state-of-the-art methods. Due to its capacity to extract stronger and more distinguishable deep features for classification, the method yields the best outcomes in this study. Moreover, although the dataset we employ (the brain tumor classification (MRI) dataset) is not balanced, the number of images is adequate for the network to train. In contrast, the datasets used in previous approaches, such as [30, 35] described in the preceding table, have only two classes and do not contain a substantial number of images. Therefore, the dataset used in this study exceeds those of these approaches in the number of images and classes. In some instances, the fifth column of Table 8 specifies the best model. The TL models did not outperform the proposed models in every study. However, InceptionV3 and InceptionResNetV2 models were among the best in certain studies. Similarly, several proposed models were shown to be superior to the TL models.
3.3. Limitations of the Study
Finding an acceptable dataset for the classification problem is one of the constraints of this study. This kind of up-to-date medical information is difficult to collect, but fortunately, there are repositories such as Kaggle that host research data. A second limitation is computational complexity. Cloud-based technologies, such as the free version of Google Colab, may be useful for completing such tasks. Although it is possible to obtain a pro version and optimize time and resources, a high-performance computer would be required to do this research.