Project Report

IoT Network Intrusion Detection and Classification using Explainable (XAI) Machine Learning Algorithms
ISE 5194 – Human-Centered Machine Learning, Spring 2021
Harshil Patel

Abstract
The continuing growth of Internet of Things (IoT) based networks has increased the need for computer network intrusion detection systems (IDSs). Over the last few years, IDSs for IoT networks have become increasingly reliant on machine learning (ML) techniques, algorithms, and models, as traditional cybersecurity approaches become less viable for IoT. IDSs developed and implemented using machine learning approaches are effective and accurate in detecting network attacks, with high-performance capabilities. However, the acceptability of and trust in these systems may be hindered because many of the ML implementations are ‘black boxes’ in which human interpretability, transparency, explainability, and the logic behind prediction outputs are largely unavailable. The UNSW-NB15 is an IoT-based network traffic dataset classifying normal activities and malicious attack behaviors. Using this dataset, three ML classifiers were trained: Decision Trees, Multi-Layer Perceptrons, and XGBoost. The classifiers, and the corresponding algorithm for developing a network forensic system based on network flow identifiers and features that can track the suspicious activities of botnets, proved to be very high-performing based on model accuracies. Thereafter, established Explainable AI (XAI) techniques using the Scikit-Learn, LIME, ELI5, and SHAP libraries allowed for visualizations of the decision-making frameworks of the three classifiers, increasing the explainability of classification predictions. The results indicate that XAI is both feasible and viable, as cybersecurity experts and professionals have much to gain from pairing traditional ML systems with Explainable AI (XAI) techniques.
1. Introduction
IoT networks have become an increasingly valuable target of malicious attacks due to the growing amount of valuable user data they contain. They are also highly significant to critical services, where cyber-attacks attempt to compromise the security principles of confidentiality, integrity, and availability. In response, network intrusion detection systems (IDSs), systems that monitor and detect cyber-attack patterns over networking environments using machine learning algorithms/models, have been developed to detect suspicious network activity: they monitor network traffic for suspicious activities and issue alerts when attacks are detected. Machine learning (ML) algorithms are being investigated as potential IDS frameworks, as current or traditional IDS capabilities raise concerns including:
- Traditional network security solutions may not be directly applicable due to differences in IoT structure and behavior.
- Low operating energy and minimal computational capabilities.
- Traditional security mechanisms such as encryption protocols and authentication cannot be directly applied.
- Lack of a single standard for IoT architecture, policies, and connectivity domains.
Existing ML algorithms and models are able to learn IoT network inputs associated with target features concerning Normal or Attack behaviors to detect malicious activity or botnets. However, current research indicates that these ML-based IDSs are mainly “black boxes”: users of the systems in cybersecurity services lack the ability to explain why the system arrived at a given prediction or classification of attack, which is important for optimal initial evaluation in cybersecurity and information assurance planning and for resource allocation in IoT networks. The rapid integration of IoT networks in a variety of applications and settings has increased the demand for robust cybersecurity practices and systems, and relevant researchers, security administrators, information security professionals, and others would benefit greatly from an increased ability to understand the ML-based IDSs they employ, so that they can effectively conduct operations and develop computer or network protection strategies to protect assets. The objective of this work is to apply a survey of ML algorithms, modified with Explainable AI (XAI) methods through existing Python libraries, to explain model classification decisions, the logic behind predictions, and feature importance, increasing the transparency of ML-based IDSs beyond current levels for human analysts in the IoT cybersecurity domain. These tools have been utilized in varying applications and show promise for extending explainability features to IoT network security IDS problems. Additionally, accuracy metrics will be measured to complement explainability.
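The workflow just described (train a classifier on labeled network flows, measure its accuracy, then inspect which features drive its predictions) can be sketched end-to-end with scikit-learn alone. The study itself uses the UNSW-NB15 data and the SHAP, LIME, and ELI5 libraries; everything below (synthetic data, built-in impurity importances) is an illustrative stand-in, not the actual implementation:

```python
# Illustrative sketch of the proposed pipeline: train a classifier,
# score it, then inspect which features drive its predictions.
# Synthetic data stands in for UNSW-NB15; scikit-learn's built-in
# importances stand in for the SHAP/LIME/ELI5 explanations.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)

# Performance metric (accuracy) plus a first explainability signal:
# a global ranking of feature contributions to the model's decisions.
print(f"accuracy: {clf.score(X_te, y_te):.3f}")
for i, imp in enumerate(clf.feature_importances_):
    print(f"feature_{i}: {imp:.3f}")
```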
2. Literature Review
Protecting Internet of Things (IoT) networks has been a critical area of research for cybersecurity experts as IoT continues to be integrated into applications including home automation, smart cities, modern health systems, and advanced manufacturing. Intrusion detection systems based on machine learning techniques and other statistical feature-learning algorithms have been researched, though only minimally put into application. A study [8] conducted by Nour Moustafa developed an ensemble-based machine learning intrusion detection technique based on proposed statistical flow features for protecting the network traffic of the Internet of Things. In the proposed system, statistical flow features were generated from an initial analysis of network features. Thereafter, an AdaBoost ensemble learning method was applied to three machine learning algorithms: decision tree, Naive Bayes (NB), and artificial neural network. The developed models were evaluated on their ability to detect malicious events effectively, based on the UNSW-NB15 and NIMS botnet datasets with simulated IoT sensor data. The experimental results conveyed high performance in detecting normal and malicious activity. Overall, the proposed ensemble technique achieved a higher detection rate and a lower false positive rate than three other traditional IoT cybersecurity techniques.

Many other machine learning-based intrusion detection systems have been researched and developed, as outlined in a survey by Kelton da Costa [3], which investigates the range of machine learning techniques applied to the Internet of Things and intrusion detection for computer network security. Over 95 works on the subject were analyzed, ranging across different sub-disciplines in machine learning and security issues in IoT environments.

Concerning literature on Explainable Artificial Intelligence (XAI), an extensive publication [4] summarizing the key concepts, taxonomies, opportunities, and challenges with respect to responsible AI can be used to review overall research into the explainability of ML methods. Recently, XAI has gained notable momentum, as lack of explainability has become an inherent problem of the latest ML techniques such as ensembles and deep neural networks. Hence, XAI has become crucial for deployments of ML models, where researchers keep transparency, fairness, model explainability, and accountability at the core. More specifically, an early investigation into developing a network intrusion detection system using explainable AI frameworks [1] was conducted by Shraddha Mane and Dattaraj Rao. As ML models offer increased accuracy, their complexity increases and hence their interpretability decreases. In their paper, they developed a deep neural network for network intrusion detection and proposed an explainable AI framework to demonstrate model transparency throughout the machine learning pipeline. Utilizing existing XAI algorithms from SHAP, LIME, the Contrastive Explanations Method (CEM), ProtoDash, and Boolean Decision Rules via Column Generation (BRCG), which provide explanations on individual predictions, they applied the approaches to the NSL-KDD dataset, demonstrating a successful increase in model transparency.
3. Overview & Benefits of Explainable (XAI) Machine Learning
Advanced, state-of-the-art ML algorithms and models offer valuable applications in establishing better IoT network security. These ML techniques learn from input features generated from network traffic and support cybersecurity personnel in making critical threat detection decisions. However, they are based on models that are often too complex to be interpreted by human analysts; hence, analysts may turn to traditional tools that are less viable but offer more explainability or inherent trust. In many cases, it is nearly impossible to get a feel for the inner workings of an ML system for intrusion detection. This can decrease trust that a given prediction from the model is correct, even when performance results indicate otherwise. An intuitive explanation of the rationale behind individual predictions or the model's decision-making framework will better position cybersecurity experts to trust individual predictions or the classifier itself, especially in understanding how it behaves in particular cases. Explainable AI (XAI) offers a variety of explanation and feature-importance tools for generating explanations about the knowledge captured by trained ML models, to aid in increasing overall trust.
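For tree-based classifiers, even scikit-learn alone can surface both the global rule set a model learned and the specific path it followed for one prediction; SHAP and LIME generalize this idea to arbitrary models. The sketch below uses synthetic data and illustrative feature names, not the UNSW-NB15 features:

```python
# Sketch: expose the rationale behind a tree classifier's decisions.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Global view: the full if/then rule set the model learned.
rules = export_text(clf, feature_names=[f"f{i}" for i in range(4)])
print(rules)

# Local view: the tree nodes visited while classifying one sample,
# i.e. the individual rules applied to this particular "flow".
node_path = clf.decision_path(X[:1])
print("nodes visited:", node_path.indices.tolist())
```

Per-prediction views like this are exactly what LIME and SHAP approximate for black-box models such as the MLP and XGBoost classifiers used later in this report.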
4. Methodology
4.1 UNSW-NB15 Dataset
UNSW-NB15 is an IoT-based network traffic dataset with different categories for normal activities and malicious attack behaviors from botnets (classified by attack type: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms). The raw network packets of the UNSW-NB15 dataset were created with the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviors on IoT-based networks. Figure 1 shows the testbed configuration of the dataset and the method of feature creation for UNSW-NB15.
Figure 1: IXIA Traffic Generator Overview

The UNSW-NB15 dataset is pre-partitioned by its creators into a training set for model training and a testing set for model evaluation, namely UNSW_NB15_training-set.csv and UNSW_NB15_testing-set.csv respectively. The training set contains 175,341 records and the testing set 82,332 records, with a target response giving the traffic behavior of each record: attack or normal. The dataset consists of 39 numeric features, listed with descriptions in the UNSW-NB15_features.csv file. For our experimental processes, the target feature will be a binary classification of Normal or Attack behavior. Figure 2 provides the value distribution of each class within the data subsets, where 0 represents Normal and 1 represents Attack behavior. The dataset is adequately balanced for the binary response variable of activity behavior.

Figure 2: Training Dataset Distribution and Counts
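The class-balance check summarized in Figure 2 can be reproduced with pandas. The CSV file names below follow the dataset's published partitions; since the real files are not bundled here, a tiny stand-in frame (with an assumed 0/1 "label" column) demonstrates the computation:

```python
import pandas as pd

# With the real files on disk, one would load the partitions:
#   train = pd.read_csv("UNSW_NB15_training-set.csv")
#   test  = pd.read_csv("UNSW_NB15_testing-set.csv")
# A tiny stand-in frame is used here; "label" is assumed to hold
# the binary target (0 = Normal, 1 = Attack).
train = pd.DataFrame({"label": [0, 1, 1, 0, 1, 1, 0, 1]})

counts = train["label"].value_counts()
print(counts)                  # records per class
print(counts / len(train))     # class proportions (balance check)
```

On the real training partition these proportions correspond to the 175,341-record distribution shown in Figure 2.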