InfoSec Project Report


Federated Learning for Secure Intrusion Detection Systems with Explainable AI


Maaz Ali Nadeem Shehryar Sohail Abdulwadood Waseem
Department of DS & AI Department of DS & AI Department of DS & AI
FAST NUCES FAST NUCES FAST NUCES
Islamabad, Pakistan Islamabad, Pakistan Islamabad, Pakistan

Abstract: Federated Learning is an emerging concept that promises an information-secure implementation of Machine Learning models over heterogeneous and distributed systems. In our work, we discuss implementing an end-to-end Federated Learning approach on a military Network Intrusion Dataset. We implement an Artificial Neural Network and generate an aggregated model, with XAI techniques, that achieves over 96% accuracy. We ensure the security of our local data and model weights through a three-step cryptographic application.

Index Terms: Cryptography, Neural Networks

I. INTRODUCTION

Intrusion Detection Systems (IDS) are commonly seen as digital security guards that provide a security layer for computer networks. An IDS continuously monitors and identifies traffic on the network, aiming to prevent unauthorized access attempts, injection and execution of malicious code, and similar security breaches. By detecting suspicious activity or traffic on the network, the IDS plays a vital role in safeguarding sensitive information and maintaining data integrity.

Traditional approaches to IDS relied on rule-based or centralized learning, in which one model was trained on a large collected dataset of historic examples stored in a central location. However, this approach has several limitations:

1) Concerns of Privacy: Storing and accessing proprietary data and sensitive information centrally raises privacy issues. A single data breach is enough to cause serious damage.
2) Challenges of Scalability: Network sizes have kept growing, especially in the era of Big Data. Centralizing is a computationally expensive and inefficient approach, as larger data volumes lead to higher latency when keeping up with real-time demands.

Federated Learning (FL) has emerged as a promising alternative that addresses these limitations. FL allows the system to learn collaboratively by training on distributed datasets and distributed compute resources without compromising the privacy of data. Only the model weight updates are shared with the central server, and even those are encrypted. The server aggregates the updates and creates a global model built upon the knowledge of the individual participating clients, while the raw data is never shared with the central server.

A. Importance of Federated Learning

Federated Learning has several advantages over traditional approaches that use centralized learning. The most relevant advantages are as follows:

1) Preservation of Privacy: In Federated Learning, raw data remains on the client machines. The only information shared between clients and server is the model's weights, which are also encrypted. This ensures that sensitive data, such as the network traffic logs used for intrusion detection, remains private to the client machine and is exposed neither to the central server nor to other clients on the network.
2) Enhancement of Security: By decentralizing the data, Federated Learning reduces the possibility of data breaches and unauthorized access. In our project's context, this reduces the risk of exposing sensitive information to hackers or other malicious actors.
3) Efficiency of Collaboration: Federated Learning lets multiple parties collaborate on model training without having to share their data. Learning is therefore not limited to a single party's data, while privacy constraints are still respected.
4) Efficiency of Resources: Only model updates, rather than raw data, are sent to the central server for aggregation, so Federated Learning reduces the amount of data sent over the network. This saves bandwidth and compute resources.

This project discusses the complete implementation of Federated Learning and Explainable AI for Intrusion Detection Systems, where two client machines train local (and smaller) models on separate and unique splits of the same dataset, using the same architecture. The server is in charge of securely aggregating model updates and generating a single global model that is broadcast back to the clients, enhancing their intrusion detection capabilities while keeping the data secure.

The remaining sections are structured as follows:
1) RELATED WORK: Previous examples and literature review in Federated Learning and XAI
2) METHODOLOGY: How we contribute to this subject matter and build the project
3) EVALUATION: How the model is evaluated after aggregation

II. RELATED WORK

This study draws on several key concepts. We begin with a review of the subtopics that we build upon as the models are built and tested.

A. Intrusion Detection with Machine Learning

Intrusion Detection Systems (IDS) are a supporting layer for information security, and the rapid growth of technology has brought a new wave of Machine Learning approaches to IDS. The authors in [5] utilize a series of ML algorithms such as Linear Discriminant Analysis (LDA), Classification & Regression Trees (CART) and Random Forest (RF), with RF being the best of the ML classifiers at an accuracy of over 99.80%. Because Machine Learning techniques such as Support Vector Machines (SVM) treat each feature equally, the authors in [2] propose incorporating Information Gain Ratio (IGR) and the K-means algorithm into SVM-based intrusion detection: the dataset is first ranked using IGR, and the feature subset is then built with a K-means selection algorithm. The authors in [7] propose a modelling process using the ANOVA F-Test and RFE, in which relevant features are obtained first and various ML models such as SVM, Decision Trees and Random Forest are then applied. Random Forest outperforms all other models on all benchmarks.

B. Explainable AI with various ML Techniques

To incorporate an understanding of how ML models deal with adversarial attacks, the authors in [9] propose using a global explanation model based on SHAP (SHapley Additive exPlanations) while training the Random Forest classifier. The authors break the model down into a series of local explanations by assigning a numeric weight of impact to each individual feature; SHAP then uses these local explanations to generate a global understanding while keeping local faithfulness intact. This methodology helps to explain tree-based classifiers [3].

Another interesting technique is the post-hoc, model-agnostic approach discussed in [8]. A tree-based classifier, which is inherently explainable, is trained on labels produced by the SVM (called secondary training data) to provide explanations, and the permutation importance method is compared with more commonly used measures such as accuracy.

C. Federated Learning in Network Intrusion Detection

The authors in [6] present a comparative study showing that federated learning models achieve accuracy nearly equal to that of centralized deep learning models. Their experimental results show that the workers participating in FL perform well. The method allows multiple ISPs to conduct joint deep learning training while retaining their local data. They built this model on the CICIDS2017 network intrusion detection dataset.

The authors in [1] discuss how federated learning introduces data privacy in heterogeneous and complex infrastructures. IoT devices are orchestrated and the data communication is centralized through the server. They present a solution regulated through a central server, with Vehicular Networks, Smart Cities, Cyber Physical Systems and Health Care Centers as clients.

In [4], the authors suggest enhancing detection models based on the characteristics of attack types, which includes selecting the most effective features when conducting data analysis. They propose an FL-based ML approach with a feature selection strategy. A greedy algorithm
selects feature combinations to yield the best accuracy across different attack categories. Multiple global models are then generated by the central server according to the features chosen by the edge devices. To evaluate, they conduct simulation experiments with recent on-device Neural Networks for Anomaly Detection on the NSL-KDD dataset. The experimental results favor this approach.

Fig. 1: Local explanations based on TreeExplainer enable a wide variety of new ways to understand global model structure. (A) A local explanation based on assigning a numeric credit to each feature. (B) Combining local explanations, we represent global structure to retain local faithfulness. To demonstrate this, three illustrative medical datasets are used to train gradient boosted decision trees, and compute local explanations based on SHAP. This enables many new tools for understanding global model structure.

III. METHODOLOGY

A. Network Intrusion Detection Dataset

The dataset used in this study is taken from Kaggle, where it was contributed as open source to the Data Science community by Sampada Bhosale. It consists of a wide variety of intrusions simulated in a military network environment. The environment captured raw TCP/IP dump data for a network by simulating a typical LAN of the US Air Force.

A connection is a sequence of TCP packets starting and ending at well-defined times, between which data flows from a source IP address to a target IP address under a well-defined protocol. Every connection is labelled as normal, or as an attack with a specific attack type. Each connection record is about a hundred bytes.

For each TCP/IP connection there are 41 features in total, of which 3 are qualitative and 38 are quantitative. The dataset allows us to conduct a binary classification, with the class variable having two categories:
• Normal
• Anomalous

B. Security Algorithm

For the security algorithm of our project, we implement a three-step cryptographic scheme for secure communication, combining Secure Sockets Layer (SSL), Advanced Encryption Standard (AES) and Rivest-Shamir-Adleman (RSA) encryption to secure the individual model weights of each client during the federated learning process.

• Secure Sockets Layer (SSL): SSL is applied for secure communication between the client and the server. Its encryption protects against eavesdropping and tampering during data transmission over the network.
• Advanced Encryption Standard (AES): AES is a symmetric-key encryption algorithm that uses the same key for encryption and decryption. In our implementation, we use AES to encrypt the model weights (as an additional security layer) before using the RSA technique. (1) A random AES key is generated for every client. (2) The model weights are encrypted with that key. (3) The AES key itself is then encrypted using the server's public key (RSA).
• Rivest–Shamir–Adleman (RSA): RSA is an asymmetric public-key encryption algorithm. Each of the participants (both clients and server) generates a key pair that consists of a public key and a private key. The public key is like an email address: others use it to encrypt messages sent to you. The private key is like a password: it is kept secret and decrypts those messages. In our approach, (1) the client encrypts the AES key that protects its model weights with the server's public key before sending, and (2) the server recovers it using its private key.

C. Implementation

Key Generation: RSA key pairs are generated by script for the clients and the server. The public keys
are shared, and the private keys are kept secure on each device.

Encryption: A script defines the encrypt and decrypt functions.
1) Encrypt: Takes the model weights dictionary and the server's public key as input. It generates a random AES key and encrypts the weights with AES. It then encrypts the AES key with RSA using the server's public key and combines everything for transmission.
2) Decrypt: Performs the same operations in reverse order on the received data, using the recipient's private key.

Fig. 2: A basic Federated Learning connection in our project

D. Performance

1) Security Level: The three-layer approach offers a high level of communication security. RSA provides strong asymmetric encryption, while AES offers efficient symmetric encryption of the data. AES with a 256-bit key is considered secure for many applications, and RSA with a 2048-bit key is commonly used for secure communications.
2) Computation Time: There are several computation costs, including RSA key generation, random AES key generation, AES encryption/decryption, and RSA encryption/decryption. These can affect performance on resource-constrained devices.

E. Deep Learning Model

We implement a simple Artificial Neural Network architecture and train it on the given dataset.

1) Model Architecture & Training: We have defined our model's architecture as follows:
• An input layer of 118 nodes
• Two hidden layers of 20 and 10 nodes respectively
• ReLU activation for non-linearity
• A single node on the output layer
• Sigmoid activation for binary classification

We have defined our model's training process as follows:
• Preprocessing steps such as handling categorical features and data standardization.
• Splitting the dataset into train and test sets.
• Using the Adam optimizer and the BCELoss function for training on the client side.
• A learning rate of 0.0001.

2) Suitability: The architecture is practical and suitable because it can model the complex non-linear patterns in the dataset and make accurate predictions from them.

Neural Networks are often data hungry, but we implement a fairly simple and shallow architecture that can work in the Federated Learning scenario, where there are fewer computational resources and less training data.

F. Integrated Gradients: Understanding Model Predictions

To make our Federated Learning approach trustworthy, we implement an Explainable AI technique called Integrated Gradients (IG).

1) Integrated Gradients: IG is an approach that explains the predictions of ML models, primarily complex ones like Neural Networks. The aim is to understand which features contribute most to the model's predictions.
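
To make the attribution idea concrete, the following is a minimal sketch of IG for a hypothetical one-layer sigmoid model, not the network used in this project. The weights `w`, bias `b`, input `x`, all-zero baseline, and the midpoint-rule approximation of the path integral are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Hypothetical stand-in model: a single sigmoid unit."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the sigmoid unit with respect to each input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG by averaging gradients along the straight path
    from the baseline to x (midpoint rule), then scaling by (x - baseline)."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    return [(xi - bi) * gi for xi, bi, gi in zip(x, baseline, avg_grad)]

# Illustrative values only (assumed, not taken from the dataset).
w = [1.5, -2.0, 0.5]
b = 0.1
x = [1.0, 0.5, 1.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
print(attributions)
```

A useful sanity check is that the attributions sum (approximately) to the difference between the model output at `x` and at the baseline, which is the completeness property of IG.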

• IG starts by defining a baseline input: a reference point with minimal impact on the model's prediction.
• For a specific data point, IG calculates the gradient of the model's prediction with respect to each input feature. The gradient indicates how much the output changes for a small change in that input feature.
• The gradients are integrated along the path from the baseline to the data point to accumulate each feature's contribution.
• The resulting IG values provide an attribution score for each feature that influences the model's prediction for that particular data point.

Integrated Gradients is an important technique that can benefit us in the following ways:
1) Interpretability
2) Debugging
3) Feature Selection

G. Client-to-Server and Back Process with Encryption/Decryption

Here we break down the entire communication process in detail.

1) Client-Side:
• Model Training: The clients train their local models on their own splits of the dataset.
• Weights Extraction: After training, the client extracts the model weights, which hold what the model has learned.
• Encryption: (1) A random 32-byte AES key is generated and used to encrypt the model weights dictionary, with the data padded to match the AES block size. (2) To ensure that only the server can recover the AES key, the client encrypts it using the server's public key, which is obtained beforehand.

2) Data Transmission:
• The client constructs a message containing the encrypted model weights (using AES) and the encrypted AES key (using the server's public key with RSA).
• The message is sent to the server through a secure SSL channel.

3) Server-Side:
• The server receives the encrypted message from the client.
• The server uses its RSA private key to decrypt the AES key.
• Using that key, the server decrypts the model weights with the AES algorithm to recover the original weights.
• The server waits until all clients have shared their weights.
• The server averages the decrypted model weights received from all clients to form the global model, which is then broadcast back to the clients.

Fig. 3: A flow of how the communication is done

IV. EVALUATION

A. Performance Metrics

We analyze the performance of the federated learning model for intrusion detection on two clients. The aggregated model is evaluated on the following metrics (Accuracy, Precision and F1-Score) and compared with the individual models saved after each client's local training.

Metrics      Before Aggregation   After Aggregation
Accuracy     0.98810              0.96349
Precision    0.98413              0.97851
F1-Score     0.98805              0.96260

TABLE I: Comparison of metrics before and after aggregation for Client 1

Metrics      Before Aggregation   After Aggregation
Accuracy     0.98968              0.96270
Precision    0.99038              0.96916
F1-Score     0.98959              0.96213

TABLE II: Comparison of metrics before and after aggregation for Client 2
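
The metrics in Tables I and II follow the standard definitions for binary classification. As a minimal sketch, assuming labels are encoded as 1 for anomalous and 0 for normal (the encoding is an assumption, and the label lists below are made up for illustration), they can be computed from confusion-matrix counts as follows.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels, not taken from the dataset.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, it drops whenever either of the two drops, which is why it is reported alongside precision.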

Node             Encryption   Decryption
Client 1         0.0030159    0.005802
Client 2         0.002324     0.005743
Central Server   0.001991     0.005042

TABLE III: Time taken by all devices in the network in seconds
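
The timings above refer to the hybrid scheme of Section III, in which a random symmetric key encrypts the weights and the server's public key wraps that symmetric key. The sketch below mirrors only the shape of that flow and how it might be timed. It is deliberately a toy: textbook RSA over two known Mersenne primes (no padding) and a SHA-256 keystream XOR stand in for RSA-2048 and AES, neither of which is secure, and the SSL transport is not modelled. All function and variable names are hypothetical; a real implementation would use a vetted cryptography library.

```python
import hashlib
import json
import secrets
import time

# Toy RSA key pair from two known Mersenne primes (illustration only, NOT secure).
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent; needs Python 3.8+

def keystream_xor(key, data):
    """Stand-in for AES: XOR the data with a SHA-256 counter keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(c ^ p for c, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def encrypt_weights(weights, public_key):
    """Client side: fresh 32-byte key, encrypt the weights, wrap the key."""
    n, e = public_key
    sym_key = secrets.token_bytes(32)
    payload = json.dumps(weights).encode("utf-8")
    ciphertext = keystream_xor(sym_key, payload)
    wrapped_key = pow(int.from_bytes(sym_key, "big"), e, n)
    return {"wrapped_key": wrapped_key, "ciphertext": ciphertext}

def decrypt_weights(message, private_key):
    """Server side: unwrap the symmetric key, then decrypt the weights."""
    n, d = private_key
    sym_key = pow(message["wrapped_key"], d, n).to_bytes(32, "big")
    payload = keystream_xor(sym_key, message["ciphertext"])
    return json.loads(payload.decode("utf-8"))

# Hypothetical weights dictionary standing in for a model state dict.
weights = {"fc1.weight": [[0.1, -0.2], [0.3, 0.4]], "fc1.bias": [0.0, 0.5]}

t0 = time.perf_counter()
message = encrypt_weights(weights, (N, E))
t1 = time.perf_counter()
recovered = decrypt_weights(message, (N, D))
t2 = time.perf_counter()

print(f"encrypt: {t1 - t0:.6f} s, decrypt: {t2 - t1:.6f} s")
```

Measured times from this toy are not comparable to Table III; the point is only where the timer boundaries sit around the encrypt and decrypt calls.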

For both clients, there is a visible decrease in accuracy, precision and F1 score after aggregating into a general model. This suggests that the aggregation has either introduced noise into the learning or made the model more general by incorporating more of what each client learned. While the performance drop may be seen as an incompatibility that has reduced performance, we interpret it as a guard against overfitting.
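
The aggregation step whose effect is discussed above can be sketched as a plain element-wise mean of the clients' weights. This is a minimal illustration that assumes an unweighted average and flat per-layer weight lists; the layer names and values are hypothetical, and weighting clients by their number of training samples is a common alternative that is not shown here.

```python
def federated_average(client_weights):
    """Element-wise mean of per-layer weight lists across all clients."""
    if not client_weights:
        raise ValueError("need at least one client")
    n_clients = len(client_weights)
    global_weights = {}
    for name in client_weights[0]:
        # Group the i-th value of this layer from every client together.
        columns = zip(*(cw[name] for cw in client_weights))
        global_weights[name] = [sum(col) / n_clients for col in columns]
    return global_weights

# Hypothetical decrypted weights from two clients.
client_1 = {"fc1.weight": [0.2, -0.4, 1.0], "fc1.bias": [0.1]}
client_2 = {"fc1.weight": [0.4, 0.0, 0.0], "fc1.bias": [0.3]}

global_model = federated_average([client_1, client_2])
print(global_model)
```

Since each global weight is a compromise between the two local models, the global model need not match either client's local optimum, which is consistent with the small drop seen in Tables I and II.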
B. Encryption/Decryption Times

Decryption times are generally longer than encryption times because of steps such as RSA key validation and the RSA decryption of the AES key. Client 1 takes more time to encrypt than Client 2; this could be due to random number generation during AES key creation, or to differences in hardware performance. These variations are minimal, however, and not significant for this study.

C. Explainable AI Results

The Integrated Gradients results are plotted here to make the model's outcomes trustworthy and auditable. Positive spikes (to the right) indicate a positive contribution to the prediction, and negative spikes (to the left) indicate a negative contribution. The length of a spike is directly proportional to the magnitude of its impact on the model's decision.

Fig. 4: Integrated Gradient values plotted for each feature for a particular prediction

Fig. 5: Hidden layer activations recorded for neuron-level impact

REFERENCES

[1] Shaashwat Agrawal, Sagnik Sarkar, Ons Aouedi, Gokul Yenduri, Kandaraj Piamrat, Mamoun Alazab, Sweta Bhattacharya, Praveen Kumar Reddy Maddikunta, and Thippa Reddy Gadekallu. “Federated learning for intrusion detection system: Concepts, challenges and future directions”. In: Computer Communications 195 (2022), pp. 346–361.
[2] Jayshree Jha and Leena Ragha. “Intrusion detection system using support vector machine”. In: International Journal of Applied Information Systems (IJAIS) 3 (2013), pp. 25–30.
[3] Scott M Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. “Explainable AI for trees: From local explanations to global understanding”. In: arXiv preprint arXiv:1905.04610 (2019).
[4] Yang Qin and Masaaki Kondo. “Federated learning-based network intrusion detection with a feature selection approach”. In: 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE). IEEE. 2021, pp. 1–6.
[5] T Saranya, S Sridevi, C Deisy, Tran Duc Chung, and MKA Ahamed Khan. “Performance analysis of machine learning algorithms in intrusion detection
system: A review”. In: Procedia Computer Science
171 (2020), pp. 1251–1260.
[6] Zhongyun Tang, Haiyang Hu, and Chonghuan Xu.
“A federated learning method for network intrusion
detection”. In: Concurrency and Computation:
Practice and Experience 34.10 (2022), e6812.
[7] Srinath Venkatesan. “Design an intrusion detection
system based on feature selection using ML
algorithms”. In: Mathematical Statistician
and Engineering Applications 72.1 (2023),
pp. 702–710.
[8] Carla Piazzon Ramos Vieira and Luciano Antonio
Digiampietri. “A study about explainable artificial
intelligence: Using decision tree to explain SVM”.
In: Revista Brasileira de Computação Aplicada
12.1 (2020), pp. 113–121.
[9] Syed Wali and Irfan Khan. “Explainable AI and
random forest based reliable intrusion detection
system”. In: Authorea Preprints (2023).
