Paper 127-A Comprehensive Analysis of Network Security Attack Classification
Abstract—As internet usage and connected devices continue to proliferate, the concern for network security among individuals, businesses, and governments has intensified. Cybercriminals exploit these opportunities through various attacks, including phishing emails, malware, and DDoS attacks, leading to disruptions, data exposure, and financial losses. In response, this study investigates the effectiveness of machine learning algorithms for enhancing intrusion detection systems in network security. Our findings reveal that Random Forest demonstrates superior performance, achieving 90% accuracy and balanced precision-recall scores. KNN exhibits robust predictive capabilities, while Logistic Regression delivers commendable accuracy, precision, and recall. However, Naive Bayes exhibits slightly lower performance compared to other algorithms. The study underscores the significance of leveraging advanced machine learning techniques for accurate intrusion detection, with Random Forest emerging as a promising choice. Future research directions include refining models and exploring novel approaches to further enhance network security.
Keywords—Machine learning; cyber security; intrusion detection; network security
I. INTRODUCTION
In recent years, cyber-attacks have become more sophisticated and frequent, posing significant challenges to cybersecurity efforts. As organizations increasingly rely on interconnected networks for their operations, they are exposed to a greater risk of malicious activities. Traditional security methods, such as firewalls and antivirus software, while still valuable, are struggling to keep pace with the evolving tactics of cybercriminals [1]. These attacks can take various forms, from relatively simple phishing emails to complex malware and DDoS attacks, resulting in operational disruptions, data breaches, and financial losses [2]. To effectively combat these threats, security professionals need to adopt more advanced techniques for threat detection and mitigation [3]. Machine learning algorithms offer a promising solution by leveraging data analysis to identify patterns and anomalies indicative of malicious activity [4]. By automating threat detection and response processes, ML can help organizations bolster their network security defenses in the face of evolving cyber threats.
A. Research Objectives and Motivation
The main objective of this paper is to conduct a comprehensive examination of network security attack classification using ML algorithms. By exploring various ML techniques and evaluating their applicability to network security, the research aims to enhance precision and efficiency in identifying and categorizing network attacks [4]. The motivation behind this research lies in the critical need for adaptive and intelligent security measures to counter the dynamic tactics employed by cybercriminals [5].
B. Consequences of Cyber-Attacks
The introduction also underscores the significant consequences of successful cyber-attacks, ranging from financial losses to reputational damage and legal ramifications [6]. This highlights the importance of enhancing security measures to safeguard sensitive data, ensure uninterrupted operations, and maintain trust in digital systems [7].
C. Transition to Proactive Security Strategies
Furthermore, the integration of ML into network security protocols facilitates a transition from reactive to proactive security strategies [8]. By preemptively addressing potential threats, organizations can enhance overall resilience and security posture.
This paper will include a detailed comparative analysis with state-of-the-art methods, including recent advancements in deep learning applied to intrusion detection. Additionally, recent research in deep learning for intrusion detection will be reviewed to identify advancements and opportunities for improvement. This comprehensive comparison will enhance the credibility and relevance of the research findings.
This study is structured to first explore the existing landscape of network security and the challenges posed by cyber-attacks. It will then delve into the application of ML algorithms in enhancing threat detection and response processes. Following this, the paper will evaluate the strengths and limitations of existing network intrusion detection systems, proposing innovative ML solutions to address emerging challenges. Finally, it will provide recommendations for developing stronger, more flexible, and smarter security systems to combat cyber threats effectively in today's digital age.
*Corresponding Author
1269 | P a g e
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 15, No. 4, 2024
II. RELATED WORKS
This review of the existing literature offers an in-depth examination of the present state of research in the classification of network security attacks through the application of machine learning algorithms.
A. Network Security Attack Classification
Traditional cybersecurity methods rely on predefined rules and signatures to detect and mitigate threats, but they struggle to keep up with the rapidly evolving tactics of cybercriminals. This limitation [9] has prompted a shift towards more adaptive and intelligent systems, leading to the exploration of machine learning techniques. The work in [10] examines machine learning algorithms, focusing on their crucial role in intelligent data analysis and automation within the cybersecurity field. The authors in [11] highlight the ability of these algorithms to extract valuable insights from diverse cyber data sources, demonstrating their relevance in real-world scenarios and illustrating how data-driven intelligence contributes to proactive cybersecurity measures [12]. Furthermore, the analysis in [13] explores current methodologies, their practical implications, and emerging research directions, aiming to provide a comprehensive understanding of the current state of machine learning in cybersecurity and its potential for transformative advancements in line with the goals of our research.
B. Machine Learning in Network Security
Machine learning's role in network security extends far beyond just threat detection. It encompasses prevention, response, and recovery aspects as well. By leveraging machine learning, organizations can build systems that continuously adapt to emerging threats, effectively fortifying their defenses against evolving attack patterns [14]. This adaptability is particularly crucial in an environment where cyber threats are constantly evolving in sophistication and evasiveness. Furthermore, a recent study introduces a comprehensive taxonomy of security threats, evaluating the potential of artificial intelligence (AI), including machine learning, to address a wide range of challenges. The study in [15] represents the first exhaustive examination of AI solutions across various security types and threats. It covers lessons learned, current contributions, future directions, open issues, and strategies for effectively countering advanced security threats [16]. This holistic approach underscores the significance of integrating machine learning techniques into network security frameworks to combat the diverse and evolving landscape of cyber threats effectively.
C. Existing Machine Learning Approaches
In addition to supervised learning methods like Support Vector Machines (SVM) and Random Forests, unsupervised learning approaches, particularly anomaly detection, have gained prominence in the realm of network security. Unlike supervised methods that rely on labeled datasets to classify attacks, anomaly detection techniques can identify deviations from normal network behavior without predefined attack signatures. This makes them particularly useful for detecting novel and previously unseen threats that may not be captured by traditional rule-based systems.
For instance, research conducted by [17] on intrusion detection exemplifies the application of machine learning in enhancing security measures. By leveraging machine learning algorithms, researchers have demonstrated the effectiveness of these techniques in discerning malicious activities within network traffic. This study showcases the potential of machine learning to augment traditional security measures by providing a more adaptive and proactive approach to threat detection and mitigation [18].
Furthermore, the exploration of machine learning approaches in network security continues to evolve, with researchers investigating new algorithms and methodologies to address emerging challenges. As cyber threats become increasingly sophisticated and diverse, the integration of machine learning techniques holds promise for enhancing the resilience of network defenses and mitigating the impact of cyber-attacks.
D. Feature Extraction
The success of machine learning models in network security heavily relies on the selection and extraction of relevant features. Features can include traffic patterns, packet content, and behavioral analysis [19]. The process of feature selection is critical in optimizing the performance of the machine learning model, as irrelevant or redundant features can lead to decreased accuracy and increased computational overhead. Researchers in [20] have explored various feature selection techniques to identify the most informative features for attack classification. The study in [21] employs machine learning models and feature selection techniques to detect DDoS attacks in SDN, achieving its best accuracy (98.3%) with KNN.
Feature engineering is a critical step in the data preprocessing pipeline, aimed at transforming raw data into a format that enhances the performance of machine learning models. It encompasses various techniques, including feature extraction and feature selection, to optimize the dataset for analysis. Given our dataset's high dimensionality with 49 features, effective dimensionality reduction was essential to streamline the analysis and mitigate computational complexity. To achieve this, we opted for PCA as a feature extraction technique. PCA transforms the original features into a reduced set of principal components, capturing the dataset's essential variance while preserving valuable information. Unlike feature selection techniques, which may exclude potentially informative features, PCA retains underlying patterns and structures in the data. This approach not only enhances computational efficiency but also maintains the integrity of the dataset. Explained variance analysis revealed that 10 principal components accounted for 90% of the dataset's variance, striking a balance between variance coverage and computational complexity in our study.
E. Related Articles and Cybersecurity Majors
Table I summarizes the reviewed literature, listing the major drawback of each previous work together with its reported accuracy.
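The explained-variance analysis described for PCA in Section II-D can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `components_for_variance` and the eigenvalue list are hypothetical, and the sketch assumes the eigenvalues of the feature covariance matrix have already been computed by a PCA routine.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold=0.90):
    """Return the smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold` (hypothetical helper)."""
    if not eigenvalues:
        raise ValueError("eigenvalues must be non-empty")
    # Principal components are ranked by decreasing variance (eigenvalue).
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        # Small tolerance guards against floating-point round-off.
        if cumulative / total >= threshold - 1e-12:
            return k
    return len(ordered)


# Made-up eigenvalues for illustration only; they are not taken from the paper.
example_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
print(components_for_variance(example_eigenvalues, threshold=0.90))
```

Applied to the eigenvalues of a real dataset, the same rule would return the component count at which the chosen variance threshold (90% in the text) is first reached.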
metrics such as accuracy, precision, recall, and F1 score. These measures provide a thorough understanding of how well the classifiers are performing in terms of correctly identifying and classifying network security attacks. The combination of these performance metrics ensures a comprehensive evaluation, taking into account various aspects of the model's effectiveness. In summary, the proposed methodology begins with meticulous data preprocessing, addressing issues of standardization, normalization, and feature selection. It then tackles the challenge of class imbalance before training classifiers to detect diverse attack categories. The evaluation phase employs a set of performance metrics to gauge the overall effectiveness of the framework in accurately identifying and classifying network security threats. This methodological approach provides a systematic and robust foundation for analyzing network security attack data.
Fig. 1. System design.
B. Data Collection
1) Data sources: Network traffic and attack data will be sourced from Kaggle. This includes both real-world and simulated datasets to ensure a comprehensive evaluation.
The UNSW-NB15 dataset, crafted by researchers in 2015, stands as a comprehensive resource specifically tailored to address advanced network intrusion techniques. Comprising an extensive collection of 2.5 million records, this dataset provides a rich and diverse landscape for the study of network security threats. The dataset encapsulates the complexity of modern cyber threats by encompassing 49 distinct features, facilitating a nuanced analysis of network activity. The 49 features in the dataset encapsulate various aspects of network traffic, creating a multidimensional representation of cyber activities. These features serve as essential variables for understanding and classifying different types of network security attacks. Researchers and practitioners benefit from the detailed and granular information embedded in each record, enabling a thorough exploration of advanced intrusion techniques. One noteworthy characteristic is the nine different classes of attack families, each representing a unique category of network security threat. These classes encompass a wide spectrum of attack methodologies, providing a holistic view of the diverse challenges faced in contemporary cybersecurity. The dataset employs two label values for classification, normal and attack, enabling the categorization of network activities into either benign or malicious classes. The dataset's utility extends ... shown in Fig. 2. The UNSW-NB15 dataset serves as a vital resource in the field of cybersecurity research, offering a rich and diverse collection of network activity records that enable in-depth investigations into advanced intrusion techniques and the development of effective security solutions. Its comprehensive nature and well-defined class structure make it an invaluable tool for researchers, practitioners, and educators alike in advancing the understanding and mitigation of network security threats.
2) Data preprocessing: Raw data will undergo preprocessing to handle missing values, normalize features, and address any anomalies. This step is crucial for the effective application of machine learning algorithms.
a) Data standardization: Data standardization, also known as z-score normalization, is a crucial preprocessing step in data analysis, particularly when working with machine learning algorithms sensitive to input feature scales. This process transforms the values of different variables to a common scale, ensuring that no particular feature dominates the learning process due to differences in their original scales. By rescaling the variables to have a mean of 0 and a standard deviation of 1, standardizing the data aids in maintaining consistency and improving algorithm performance. The formula is:
$$Z = \frac{X - M}{\sigma}$$
Here:
Z is the standardized value
X is the original value of the variable
M is the mean of the variable
σ is the standard deviation of the variable.
b) Data normalization: Data normalization is a preprocessing method employed to adjust numerical variables to a standardized range, usually between 0 and 1. This practice aims to ensure that all variables equally contribute to the analysis, preventing any single feature with larger magnitudes from dominating. One frequently used technique for normalization is min-max scaling, which rescales a variable as follows:
$$X_{\text{normalized}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
C. Machine Learning (ML) Classification Algorithms
Machine learning classification algorithms are computational tools created to classify input data into predefined categories or labels by analyzing their underlying patterns and characteristics. These algorithms learn from labeled training data, identifying patterns and relationships to predict the class labels of new instances. Various classification
algorithms, each with unique methodologies such as rule-based decision-making or probabilistic modeling, are utilized to effectively categorize data points into different classes. These algorithms are versatile tools used for tasks like detecting spam, recognizing images, and diagnosing medical conditions. Their performance is typically assessed using metrics such as accuracy, precision, recall, and F1 score, ensuring their efficacy across diverse applications.
This is a classification problem in which we want to detect whether there is an attack or not.
1) KNN: K-Nearest Neighbors (KNN) is an ML algorithm proficient in both classification and regression tasks. Unlike traditional methods, KNN doesn't undergo a conventional training phase but rather memorizes the entire training dataset. During prediction, it relies on the proximity of data points within the feature space [31]. To classify a new data point, KNN computes distances, often employing Euclidean distance, from the point to all other instances in the training set. The k nearest neighbors, identified by the smallest distances, then engage in a majority voting mechanism to allocate the class to the new data point. Alternatively, a weighted voting system can be utilized, granting closer neighbors greater influence. In regression tasks, KNN forecasts the target value through averaging (or weighted averaging) the target values of the k nearest neighbors. The selection of the hyper-parameter 'k' is pivotal, as it shapes the algorithm's sensitivity and generalization capability. KNN showcases its adaptability across various domains like image recognition and recommendation systems. Nonetheless, its performance hinges on meticulous 'k' selection, the choice of a distance metric, and understanding the dataset's traits [32]. Employing efficient data structures such as KD-trees can enhance scalability, while thoughtful parameter tuning ensures its efficacy across diverse contexts.
2) Random forest: Random forest is an ensemble learning method widely used for classification and regression tasks, particularly in intrusion detection. The algorithm operates through bootstrapped sampling, creating diverse subsets of the dataset by randomly selecting instances with replacement and training individual decision trees on these subsets. Key to its robustness is the random selection of features at each node split during tree construction, preventing overemphasis on specific features. In classification, random forest employs a majority voting mechanism, aggregating predictions from multiple trees to make the final decision [33]. This approach not only yields high accuracy but also enhances the model's resilience to noise and variability. The algorithm's adaptability and effectiveness make it a valuable tool in cybersecurity and various other domains.
3) Naïve Bayes: Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem with the naive assumption of feature independence. It is particularly effective for text classification and spam filtering. The algorithm calculates the probability of a given instance belonging to a specific class by considering the conditional probabilities of each feature given the class. Despite its simplicity, Naive Bayes often performs well and is computationally efficient. The naive assumption simplifies calculations, making it suitable for high-dimensional datasets. Despite its success, Naive Bayes might struggle with correlated features, which violate the independence assumption. Nevertheless, its speed, simplicity, and respectable performance in various applications make it a popular choice for tasks involving categorical or text-based data.
4) Logistic regression: Despite its name, logistic regression is a linear model used for binary and multiclass classification problems. The logistic (sigmoid) function transforms the output into a range between 0 and 1, which is interpreted as the probability of the positive class. The algorithm optimizes its parameters through maximum likelihood estimation. Regularization techniques like L1 or L2 regularization can be applied to prevent overfitting. Logistic regression is interpretable, and its coefficients provide insights into feature importance. It is suitable for linearly separable problems but may struggle with complex relationships. Ensemble methods like random forest often outperform logistic regression on more intricate datasets, but its simplicity, interpretability, and efficiency make it a valuable tool in various classification tasks.
D. Evaluation Metrics
Evaluation metrics serve as the compass for navigating the landscape of machine learning model performance. Accuracy, the bedrock metric, quantifies the model's overall correctness. Precision zooms in on the model's ability to avoid false positives, while recall encapsulates its prowess in capturing all actual positive instances. The F1 score harmonizes precision and recall into a single metric, striking a balance between precision-oriented and recall-oriented scenarios. The confusion matrix, a comprehensive tableau, breaks down a model's predictions into true positives, true negatives, false positives, and false negatives. These metrics collectively illuminate the many facets of a model's effectiveness, providing practitioners with a versatile toolkit to gauge and enhance performance across diverse applications, as shown in Fig. 2.
Fig. 2. Performance evaluation.
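For reference, the evaluation metrics of Section III-D are conventionally defined from the confusion-matrix counts as written below, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives. These are the standard textbook definitions, stated here as an assumption about what the study computes; the paper does not print them.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```

Because F1 is the harmonic mean of precision and recall, it is pulled towards the lower of the two, which is why it penalizes a classifier that is strong on only one of them.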
IV. IMPLEMENTATION
The experimental setup encompasses the selection and preparation of datasets, the configuration of machine learning algorithms, and the establishment of a controlled environment for rigorous testing. The UNSW-NB15 dataset, consisting of 2.5 million records with 49 features, was chosen for its relevance to advanced network intrusion techniques. To ensure a diverse representation of attacks, the dataset was partitioned into training and testing sets.
A. Tools and Techniques
This experimentation involved the implementation of various machine learning algorithms to evaluate their performance in network security attack classification. Python, leveraging popular libraries such as scikit-learn and TensorFlow, served as the primary programming language for algorithm implementation. The choice of algorithms includes decision trees, support vector machines, neural networks, and ensemble methods, each configured with appropriate hyperparameters.
B. Implementation
The machine learning algorithms were implemented using a modular and scalable approach, allowing for easy integration of new algorithms and flexibility in experimenting with different configurations. The Jupyter notebook was version-controlled using Git to track changes and ensure reproducibility.
1) Import dataset: In the dataset preparation phase, the UNSW-NB15 dataset was employed, consisting of both a training set (the UNSW-NB15 training-set CSV file) and a testing set (the UNSW-NB15 testing-set CSV file). The dataset was loaded into a Python environment using the pandas library. The training set, as read from the training-set CSV file, comprised 82,332 records, while the testing set, obtained from the testing-set CSV file, included 175,341 records. To verify the integrity of the dataset and ensure the appropriate division between training and testing data, the lengths of the training and testing sets were checked: the training set exhibited a length of 82,332 records and the testing set comprised 175,341 records.
2) Data visualization: The data visualization code utilizes the seaborn library to create informative plots depicting the distribution of attacks and normal traffic in both the training and testing sets. The first two count plots in the top row display the overall distribution of labels (attack or normal) in the training and testing datasets. Meanwhile, the bottom row illustrates the distribution of attack categories in both sets, with the order specified based on the frequency of attack categories. These visualizations provide a clear overview of the class distribution and the prevalence of different attack categories within the datasets. Such insights are crucial for understanding the imbalance between attack and normal instances and guide subsequent steps in the analysis, such as addressing class imbalances and selecting appropriate evaluation metrics for machine learning models, as shown in Fig. 3.
Fig. 3. Data visualization.
Next, the distribution of classes in the target variable, shown in Fig. 4, showcases the balance or imbalance between normal and attack instances. This is crucial for assessing the dataset's class distribution and potential class imbalance, which can impact machine learning model training.
Fig. 4. Distribution of classes.
The visualization in Fig. 5 presents a correlation heatmap, offering a comprehensive overview of the numerical features' relationships. This heatmap aids in identifying potential multicollinearity and understanding feature interdependencies.
Fig. 5. Correlation heat map.
In Fig. 6, a boxplot of the sttl feature is depicted, showcasing its distribution across different classes. This graphical representation allows for a quick assessment of the feature's potential discriminative power in distinguishing between normal and attack instances. Together, these visualizations contribute to a holistic understanding of the dataset's characteristics, guiding subsequent steps in the analysis and model development.
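The class-balance check behind Fig. 4 can be approximated in a few lines. The sketch below is an illustration under stated assumptions rather than the authors' notebook code: it uses only the Python standard library instead of pandas and seaborn, and the `label` column name, the `class_distribution` helper, and the inline sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def class_distribution(csv_file, label_column="label"):
    """Count rows per class label and return (counts, proportions).

    `csv_file` is any open text file object with a header row; the default
    column name `label` is an assumption about the input file.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter(row[label_column] for row in reader)
    total = sum(counts.values())
    proportions = {label: n / total for label, n in counts.items()}
    return counts, proportions


# Tiny made-up sample standing in for a real traffic file
# (0 = normal, 1 = attack); the values are not from the paper.
sample = io.StringIO("sttl,label\n31,0\n254,1\n254,1\n62,0\n254,1\n")
counts, proportions = class_distribution(sample)
print(counts)
print(proportions)
```

A strongly skewed proportion in such a count is the signal that would motivate the class-imbalance handling and the choice of metrics beyond plain accuracy mentioned in the text.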
leakage. The training data (x_train) was transformed using the fitted scaler, and the testing data (x_test) was scaled accordingly. The final dataset dimensions were confirmed, showcasing 175,341 samples for training, each comprising 196 features, and 82,332 samples for testing. Additionally, categorical columns such as proto, state, and service underwent one-hot encoding for inclusion in the analysis. These preprocessing steps ensure that the machine learning models are trained and tested on standardized and appropriately formatted data.
C. ML Model Classifications
1) Random forest: As shown in Fig. 9, the RF classification algorithm was implemented on a network security attack dataset, achieving an accuracy of 90%. The classification report details the model's performance in distinguishing normal and attack instances, with precision, recall, and F1 score metrics providing insights. For normal instances (label 0), precision and recall are 0.77 and 0.98, resulting in an F1 score of 0.86. For attack instances (label 1), precision, recall, and F1 score are higher at 0.99, 0.86, and 0.92, respectively. The weighted average F1 score is 0.90, indicating balanced performance. The confusion matrix shows the model correctly identifying 54,699 normal instances and 102,950 attack instances while misclassifying 1,301 normal instances as attacks and 16,391 attack instances as normal. Despite these misclassifications, the 90% accuracy highlights the random forest model's robustness in network security attack classification, contributing valuable insights to the research.
For normal instances (label 0), the precision and recall are 0.72 and 0.96, respectively, resulting in an F1-score of 0.82. For attack instances (label 1), the precision, recall, and F1-score are 0.98, 0.83, and 0.90, respectively. The weighted average F1-score is reported as 0.87, indicating a balanced performance across both classes.
The confusion matrix further shows the classifier's effectiveness, correctly identifying 53,638 instances of normal traffic and 98,636 instances of attacks. However, the model misclassified 2,362 normal instances as attacks and 20,705 attack instances as normal. Despite these misclassifications, the overall accuracy of 87% underscores the robustness of the KNN model in distinguishing between benign and malicious network activities.
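The figures reported in Section IV-C for the random forest and KNN classifiers can be cross-checked directly from their confusion-matrix counts. The following is a minimal sketch, assuming the standard definitions of accuracy, precision, recall, and F1 with the attack class (label 1) treated as positive; the helper name `binary_metrics` is hypothetical, while the counts are the ones quoted in the text.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute accuracy and per-class (precision, recall, F1) from
    confusion-matrix counts. The attack class (label 1) is the positive class."""
    total = tn + fp + fn + tp

    def prf(hits, false_alarms, misses):
        precision = hits / (hits + false_alarms)
        recall = hits / (hits + misses)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    return {
        "accuracy": (tn + tp) / total,
        # For the attack class: hits = TP, false alarms = FP, misses = FN.
        "attack": prf(tp, fp, fn),
        # For the normal class the roles swap: hits = TN, false alarms = FN, misses = FP.
        "normal": prf(tn, fn, fp),
    }


# Counts as reported in the text for each classifier.
rf = binary_metrics(tn=54_699, fp=1_301, fn=16_391, tp=102_950)
knn = binary_metrics(tn=53_638, fp=2_362, fn=20_705, tp=98_636)

for name, m in (("RF", rf), ("KNN", knn)):
    print(name,
          "accuracy:", round(m["accuracy"], 2),
          "normal (P, R, F1):", [round(v, 2) for v in m["normal"]],
          "attack (P, R, F1):", [round(v, 2) for v in m["attack"]])
```

Rounded to two decimals, the recomputed values agree with the per-class precision, recall, and F1 scores and the 90% and 87% accuracies stated above; both matrices sum to 175,341 evaluated records.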
V. RESULTS
The outcomes of the machine learning models, including accuracy, precision, recall, and F1 score, will be systematically analyzed. A comparative study will be conducted to identify the algorithm that best suits the requirements of network security attack detection. Additionally, insights gained from the analysis will be used to draw meaningful conclusions about the performance of each algorithm in handling diverse patterns present in the network traffic data.
also address potential challenges and limitations observed during the analysis, providing a comprehensive perspective on the feasibility of deploying these algorithms in real-world scenarios. Furthermore, comparisons with existing literature and benchmarks will be made to contextualize the significance of the results.
Deep learning has revolutionized intrusion detection, offering unparalleled accuracy and efficiency. In one study, [12] introduced the Principal Component-based Convolution Neural Network (PCCNN) approach for IDS, specifically targeting DoS and DDoS attacks on IoT devices. This approach boasts impressive accuracies of 99.34% for binary and 99.13% for multiclass classification on the NSL-KDD dataset. Utilizing a sophisticated architecture of 13 layers of Sequential 1-D CNN and feature reduction through Principal Component Analysis (PCA), it showcases exceptional promise for cutting-edge IoT intrusion detection.
Furthermore, the IDSGT-DNN framework, presented by [37], elevates cloud security by seamlessly integrating an attacker-defender mechanism using game theory and deep neural networks. This framework outperforms traditional methods in accuracy, detection rate, and various metrics on the CICIDS-2017 dataset. Remarkably, the defender's detection rate spans from 0 to 0.99, with gains strategically set at -5, 0, and 5. While the present study may not achieve the accuracies of the PCCNN approach (99.34% for binary and 99.13% for multiclass) and the IDSGT-DNN framework presented in previous works, it excels in computational efficiency. Our machine learning classifiers demonstrate competitive performance: Random Forest (RF) with an accuracy of 0.90, K-Nearest Neighbors (KNN) at 0.87, Naive Bayes with 0.79, and Logistic Regression (LR) also at 0.87. Importantly, these classifiers deliver these results in significantly less time, underscoring the trade-off between accuracy and computational speed in intrusion detection systems.
Additionally, promising results in the random forest model showcased notable improvements, achieving a commendable balance between precision and recall. K-nearest neighbors demonstrated strong predictive capabilities, aligning with its suitability for identifying patterns in network traffic. Although Naive Bayes presented a lower accuracy, its performance remains consistent with the algorithm's inherent assumptions. Logistic regression emerged as a reliable choice, showcasing a balanced precision-recall trade-off. Collectively, our findings contribute to the existing body of research by highlighting the effectiveness of these classifiers in the specific context of
TABLE III. COMPARISON WITH EXISTING APPROACHES
Paper        Classifier       Accuracy   Precision   Recall
[34]         SGD              80%        82.1%       82.1%
This study   Random forest    90%        90.2%       90%
[35]         Neural network   87%        87.2%       87.8%
[3]          XGBoost          88%        88.3%       88.8%
[8]          SVM              76%        77%         77%
[23]         Random forest    80%        81%         8.9%
[14]         KNN              82%        82%         82%
B. Limitations
Though our study presented promising results, it is crucial to recognize the limitations. The effectiveness of machine learning models heavily relies on the dataset's quality and representativeness. The utilization of the UNSW-NB15 dataset in our research may not adequately cover all real-world network traffic scenarios and variations. The chosen features and preprocessing techniques could impact model performance, suggesting further exploration of feature engineering methods to improve classifier efficacy. The selection of classifiers was based on established algorithms, but future research could investigate new approaches or DL methods for better outcomes. Evaluation metrics mainly focused on accuracy, precision, recall, and F1 score, potentially overlooking variations in performance among different attack types. These restrictions highlight the importance of continuous refinement and exploration in intrusion detection to combat evolving cyber threats effectively.
The ML models used in the present investigation have been practically implemented and tested using the real intrusion detection dataset, which is recognized for its relevance to real-world network intrusion scenarios. This approach leverages the dataset to demonstrate the models' practical applicability in a real-world network environment. By conducting experiments on the dataset, the effectiveness of the models in detecting a variety of attacks, including novel and sophisticated ones, was evaluated. This hands-on validation allows for the identification of operational challenges and fine-tuning of the models for improved performance in real-world scenarios. The practical testing provides valuable insights into the models' robustness, scalability, and applicability, thereby reinforcing their effectiveness and reliability in real-world network intrusion detection applications. Future research could consider novel approaches or DL methods for better results [38]. Evaluation
intrusion detection offering valuable insights for the metrics focused on overall accuracy, precision, recall, and f1
development of robust and accurate network security systems. score, neglecting performance variations across different
The performance of the proposed methodology will be attack types. These limitations highlight the importance of
compared with existing approaches, highlighting the continuous refinement and exploration in intrusion detection
advancements achieved in Table III. to address evolving cyber threats.
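The concern that overall metrics can hide differences between attack types can be illustrated with a minimal sketch. The labels below are hypothetical toy data, not results from this study, and the helper `per_class_report` is an illustrative name rather than part of the implementation evaluated here. It computes one-vs-rest precision, recall, and F1 score for each class so that a weakly detected attack type becomes visible next to a high overall accuracy.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)}, computed one-vs-rest per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted label gains a false positive
            fn[truth] += 1  # the true label gains a false negative
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f_den = precision + recall
        f1 = 2 * precision * recall / f_den if f_den else 0.0
        report[label] = (precision, recall, f1)
    return report


# Hypothetical toy labels: 8 normal flows and 2 DoS flows (assumed for illustration).
y_true = ["Normal"] * 8 + ["DoS"] * 2
y_pred = ["Normal"] * 8 + ["Normal", "DoS"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_report(y_true, y_pred)

print(f"overall accuracy: {accuracy:.2f}")
for label, (p, r, f1) in report.items():
    print(f"{label:>7}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy case the overall accuracy is 0.90, yet only half of the DoS samples are detected, which is the kind of per-attack variation that aggregate scores do not reveal.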
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 15, No. 4, 2024
[32] A. R. X. Y. C. a. M. G. Huang, "Research on multi-label user classification of social media based on ML-KNN algorithm," Technological Forecasting and Social Change, vol. 188, 2023.
[33] B. D. J. A. a. S. H. L. He, "Assessment of tunnel blasting-induced overbreak: A novel metaheuristic-based random forest approach," Tunnelling and Underground Space Technology, vol. 133, 2023.
[34] G. a. G. K. Kocher, "Analysis of Machine Learning Algorithms with Feature Selection for Intrusion Detection Using UNSW-NB15 Dataset," Available at SSRN 3784406, 2021.
[35] M. S. L. a. K. G. K. Beechey, "Evidential classification for defending against adversarial attacks on network traffic," Information Fusion, vol. 92, pp. 115-126, 2023.
[36] G. MeeraGandhi, "Machine Learning Approach for Attack Prediction and Classification using supervised learning algorithms," Int. J. Comput. Sci. Commun., vol. 1, no. 2, 2010.
[37] E. Balamurugan, A. Mehbodniya, E. Kariri, K. Yadav, A. Kumar, and M. Anul Haq, "Network optimization using defender system in cloud computing security based intrusion detection system with game theory deep neural network (IDSGT-DNN)," Pattern Recognit. Lett., vol. 156, pp. 142–151, 2022, doi: https://doi.org/10.1016/j.patrec.2022.02.013.
[38] H. Mohd Anul, "DBoTPM : A Deep Neural Network-Based Botnet," Electronics, vol. 12, no. 1159, pp. 1–14, 2023.