Computer Networks 203 (2022) 108661
Evaluating Federated Learning for intrusion detection in Internet of Things: Review and challenges
Enrique Mármol Campos a,∗, Pablo Fernández Saura a, Aurora González-Vidal a, José L. Hernández-Ramos b, Jorge Bernal Bernabé a, Gianmarco Baldini b, Antonio Skarmeta a
a University of Murcia, Department of Information and Communication Engineering, Spain
b European Commission, Joint Research Centre, Ispra 21027, Italy
Keywords: Internet of Things; Federated Learning; Intrusion detection systems

Abstract

The application of Machine Learning (ML) techniques to the well-known intrusion detection systems (IDS) is key to cope with increasingly sophisticated cybersecurity attacks through an effective and efficient detection process. In the context of the Internet of Things (IoT), most ML-enabled IDS approaches use centralized approaches where IoT devices share their data with data centers for further analysis. To mitigate privacy concerns associated with centralized approaches, in recent years the use of Federated Learning (FL) has attracted a significant interest in different sectors, including healthcare and transport systems. However, the development of FL-enabled IDS for IoT is in its infancy, and still requires research efforts from various areas, in order to identify the main challenges for the deployment in real-world scenarios. In this direction, our work evaluates a FL-enabled IDS approach based on a multiclass classifier considering different data distributions for the detection of different attacks in an IoT scenario. In particular, we use three different settings that are obtained by partitioning the recent ToN_IoT dataset according to IoT devices’ IP address and types of attack. Furthermore, we evaluate the impact of different aggregation functions according to such setting by using the recent IBMFL framework as FL implementation. Additionally, we identify a set of challenges and future directions based on the existing literature and the analysis of our evaluation results.
∗ Corresponding author.
https://doi.org/10.1016/j.comnet.2021.108661
Received 18 July 2021; Received in revised form 12 November 2021; Accepted 28 November 2021
Available online 14 December 2021
1389-1286/© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Nowadays, the constant development and deployment of Internet of Things (IoT) technologies is increasing the attack surface of physical devices that could be potentially exploited by malicious entities [1]. Well-known attacks, such as the Mirai botnet and recent variants [2], demonstrate the need to strengthen IoT devices’ security in order to protect large-scale IoT-enabled systems. Due to the development of such increasingly sophisticated attacks, in recent years the use of machine learning (ML) techniques has been widely considered for the detection and mitigation of these attacks in IoT scenarios. Indeed, the application of ML techniques has been proposed in recent works to improve the detection capabilities of the well-known intrusion detection systems (IDS) through the application of diverse techniques (e.g., neural networks) to infer potential attacks based on the analysis of network traffic [3]. Despite the advantages provided by the application of ML techniques to enhance IDS approaches (e.g., in terms of attack detection accuracy), most of such ML-enabled IDS deployments are centralized, so that a single entity receives the network traffic data from different devices to train a certain ML model. Therefore, this entity has access to the whole network traffic derived from the communication of the different devices participating in the training process and also devices’ local data, which could lead to privacy issues. This problem could be exacerbated in IoT scenarios due to the amount and sensitivity of the information exchanged through certain devices, such as wearable or eHealth systems [4]; therefore, decentralized data management solutions are of paramount importance [5].

To address the privacy issues of traditional centralized ML approaches, Federated Learning (FL) was proposed in 2016 [6] as a collaborative learning approach in which end devices (a.k.a. clients or parties) do not share their data, but only partial updates of a global model that are aggregated by a central entity (a.k.a. aggregator or coordinator). Therefore, the use of FL is intended to improve users’ privacy, since the data of their devices is never shared with other entities. In general, an FL scenario is characterized by a large number of
client devices with a variable amount and distribution of data. Indeed, real-life scenarios are usually based on non-independent and identically distributed (non-iid) data [7]. For example, in the case of an IDS deployed on a certain network, some target devices could have traffic associated with several kinds of attacks (e.g., DoS or port scanning), while other devices could only have traffic related to their intended operation. The development of FL-enabled IDS approaches in the context of IoT scenarios has attracted an increasing interest in recent years [8–10]. However, most of the proposed approaches are based on unrealistic data distributions among the parties, inappropriate datasets and settings (e.g., [11]), or they use binary classification approaches, in which traffic data is only classified as attack or benign [12]. Other recent works, such as [13], describe several challenges on the general application of FL in IoT scenarios, but they do not provide insights on the integration of FL techniques to enhance IDS approaches. Moreover, while [14] is focused on the challenges and future directions of FL-enabled IDS, its authors do not provide evaluation results to support their contributions, and they do not define a set of criteria to compare existing works in the context of IoT scenarios. Consequently, it is hard for cybersecurity practitioners to identify the most challenging aspects derived from the application of FL to enhance IDS approaches in IoT.

To fill these gaps, this paper provides a comprehensive evaluation on the use of FL for IDS in IoT by considering the impact of non-iid data. While the aspects of non-iid data distribution have been previously analyzed [15,16], their impact when using different aggregation methods in the context of FL-enabled IDS has not been properly studied. In particular, we evaluate the behavior of FL by considering different data distributions, training rounds and aggregation methods. For this purpose, we use the ToN_IoT dataset [17,18], which has recently been proposed for IoT and Industrial IoT scenarios considering sensor data manipulation attacks, in addition to several network attacks. We propose three scenarios based on different partitions and processing of the ToN_IoT dataset: in the first setting, network flows are split according to their destination IP address; the second scenario is balanced according to the types of attacks among the clients; then, a hybrid approach is considered as third setting, in which we find a compromise between the balance of attack types and the destination IP address by means of the Shannon entropy [19]. These three configurations are publicly available at [20]. Then, we evaluate such scenarios by using FedAvg [16] and Fed+ [21] aggregation methods through the IBM framework for Federated Learning IBMFL [22]. Based on our evaluation results, and the analysis of the existing literature, we describe some of the main challenges for the development of FL-based IDS approaches to be deployed in IoT scenarios. Therefore, our work can be used as a reference for future research activities on the use of FL in this context. In summary, our contributions are:

• Identification of the main aspects for the evaluation of FL-enabled IDS for IoT, and analysis of existing proposals according to such aspects.
• Partitioning of the recent ToN_IoT dataset to create different data distributions among clients to evaluate its impact on the overall system accuracy.
• Quantitative analysis of the impact of non-iid data considering different aggregation methods and training rounds by using the recent IBMFL implementation.
• Usage of multi-class classification for differentiating specific types of attacks in the output.
• Definition of the main challenges and future trends to be considered in the future years for the development of FL-enabled IDS for IoT scenarios.

The structure of the paper is organized as follows. Section 2 provides an overview of FL and the main advantages derived from its application to IDS. In Section 3, we describe and classify the existing research proposals on FL-enabled IDSs for IoT. Section 4 describes our methodology, including the aspects of the dataset partitioning, classification techniques and aggregation methods. Then, evaluation results are presented in Section 5. Based on such results and the analysis of existing literature, Section 6 highlights the main challenges for the development of FL-enabled IDS for IoT. Finally, Section 7 concludes the paper with an outlook of potential future directions to be considered.

2. FL-enabled IDS for IoT scenarios

Intrusion detection systems (IDS) have traditionally been considered as key components to protect ICT systems by identifying potential security attacks/threats derived from traffic monitoring and analysis. Although there are several classifications [3,23], IDS approaches are usually categorized as signature-based and anomaly-based systems. The former is based on pre-established network patterns and, consequently, it cannot be used to detect a new attack; the latter uses specific features of network traffic, so that a certain deviation on such network behavior can be considered as a potential attack. In recent years, the application of ML techniques to IDS has attracted a strong interest [24,25] considering different approaches such as neural networks [26,27] or clustering techniques [28]. In the context of IoT, recent efforts have been proposed by considering specific IoT devices and technologies [3]. Indeed, the use of Deep Learning (DL) techniques has been recently evaluated through different types of neural networks for the detection of different attacks in such scenarios [29–31].

Despite these efforts, most of the proposed IDS approaches for IoT are based on centralized approaches in which devices send their local data to data centers in the cloud or servers with considerable computing capabilities to be analyzed through ML/DL techniques [11]. Such scenario raises significant issues that need to be considered [32]. First, the disclosure of IoT devices’ local data could represent a privacy concern for end users, since an attacker could even infer users’ daily habits by analyzing the traffic of their devices (e.g., wearables). This aspect could also pose an issue for a specific company where IoT devices share their network traffic with third parties. Second, given the dynamism of typical IoT environments, the time required to detect a potential attack could become a key aspect (or a limitation if the computing time is considerable) to prevent its spread in a certain network. In particular, it may be crucial to provide an early detection of generic malware used to hijack vulnerable IoT devices and spread rapidly to build up botnets such as Mirai [33] or Torii [34]. In the case of using typical cloud data centers, the latency derived from the communication of a large quantity of data with data centers could be unaffordable or it could decrease the effectiveness of the IDS deployment. Although recent approaches propose the use of fog/edge computing [35] to balance the computing resources in the IDS implementation, this solution still raises privacy concerns as devices’ data is shared with external entities (i.e., fog/edge nodes). Third, many IoT scenarios are comprised of resource-constrained devices communicating through wireless technologies with limited bandwidth and throughput. Therefore, the constant communication of devices’ network data could represent a high overhead for IoT networks with a high number of connected devices.

To address these issues, there is a need for decentralized approaches with on-device learning in which devices themselves could perform local processing on their own network traffic data. As described by [32], a distributed or self-learning approach is a potential solution in which devices perform local training without interacting with each other. However, in this approach, devices are not able to improve their learning capacity based on the learning process of the other devices in the network. As an alternative, Federated Learning (FL) was proposed in 2016 [6] as a collaborative learning approach in which devices still interact with each other through a centralized entity without the need for sharing their data. Fig. 1 shows an overview of the centralized, distributed, and federated learning approaches.
Fig. 1. Comparison between centralized, distributed and federated learning approaches.
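
The federated approach in Fig. 1 can be illustrated with a minimal sketch, assuming a toy one-feature linear model trained with plain SGD in place of a real IDS classifier. The names `local_sgd`, `aggregate` and `run_round` are hypothetical and do not correspond to IBMFL or any other FL framework; the aggregation is a data-size-weighted average in the style of FedAvg. Only model weights travel between the clients and the coordinator, while the local samples stay on each client.

```python
import random

def local_sgd(weights, data, epochs=1, lr=0.1):
    """Client side: refine a copy of the global weights with plain SGD
    on a toy linear model y = w0 + w1 * x (squared error)."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w0 + w1 * x) - y
            w0 -= lr * err
            w1 -= lr * err * x
    return [w0, w1]

def aggregate(updates, sizes):
    """Coordinator side: average the client weights, weighted by local data size."""
    total = sum(sizes)
    return [sum(n * w[i] for w, n in zip(updates, sizes)) / total
            for i in range(len(updates[0]))]

def run_round(global_w, clients, rng, fraction=1.0, epochs=1):
    """One training round: select clients, train locally, aggregate."""
    k = max(1, int(fraction * len(clients)))
    selected = rng.sample(clients, k)                  # client selection
    updates = [local_sgd(global_w, d, epochs) for d in selected]
    return aggregate(updates, [len(d) for d in selected])

# Toy non-iid setting (an assumption for illustration): every client observes
# y = 2x + 1, but over a different range of x and with a different data size.
rng = random.Random(0)
clients = [[(x / 10, 2 * x / 10 + 1) for x in range(lo, hi)]
           for lo, hi in [(0, 5), (5, 10), (0, 10)]]

w = [0.0, 0.0]
for _ in range(300):
    w = run_round(w, clients, rng)
print(w)  # approaches [1.0, 2.0] as the rounds progress
```

In this sketch the coordinator never sees a single `(x, y)` pair; it only receives each client's two weights and its sample count.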
In a typical FL scenario, end devices do not share their data. Instead, they update the global model based on local calculations on their own data. These nodes are typically called clients or parties, and the entity responsible for aggregating such local updates is called coordinator or aggregator. The training process is divided into a set of rounds, in which clients interact with the coordinator to update the global model until a certain number of rounds is performed or a certain accuracy is achieved. In particular, the main steps of each training round comprise [6,36]:

1. The coordinator selects a subset of clients. For this purpose, different aspects can be considered; for example, in an IoT scenario, devices’ computation/communication resources can be used to select the most suitable clients to participate in the training round [37].
2. The coordinator sends the parameters/weights of the global model to the selected clients.
3. The different clients update the global model’s parameters/weights through a training process by using Stochastic Gradient Descent (SGD) with their local data. In the case of an IDS system, the training is intended to be performed by using the local network traffic of each client. In this context, the number of epochs represents the local training iterations performed by a client with its dataset before updating the global model.
4. Then, the clients send their updated model’s parameters/weights back to the coordinator. Depending on the aggregation algorithm being used, the coordinator aggregates all the parameters/weights to build a new global model, which will be used in the next training round. This process in which clients train their model, update the global model and send the results to be aggregated by the server is called a round. Although FedAvg is the most widely used aggregation algorithm [6], there is a plethora of alternative algorithms that can be considered for this process, such as FedProx [38] or the recent Fed+ [21], which is used in our evaluation.

The application of FL in IoT scenarios has attracted a huge interest in recent years due to its benefits compared to traditional centralized learning approaches. However, there are still significant challenges to be considered, such as communication and computing requirements or potential security and privacy attacks [39,40]. In the context of IoT, the FL application for IDS is still in its infancy, and existing proposals are often based on unrealistic settings and data distributions. These efforts are described in the next section.

3. Related work

As previously mentioned, the use of FL has attracted a significant interest in recent years due to its characteristics and strengths, which can be exploited in different IoT scenarios [41]. In this context, recent works have proposed the application of FL to improve IDS. To classify these works, we have considered various aspects, such as: analyzed attacks, training datasets, ML/DL algorithms to detect such attacks, aggregation methods, and implementation frameworks. An overview of these proposals is shown in Table 1.

Based on our analysis, we note that some of the proposed works use their own generated or simulated dataset. For example, [42] integrated an FL approach with fog computing, where fog nodes collaborated for detecting DDoS attacks. For this purpose, authors use Gated Recurrent Units (GRUs) [53] as ML technique, and FedAvg as the aggregation algorithm. Also based on GRU, [8] proposes the creation of communication profiles associated to IoT devices that are used to detect potential attacks. In this case, the dataset is generated from real devices and the use of traffic associated with the Mirai botnet [54]. As these works are not based on publicly available datasets, it is difficult to assess the suitability of their proposed approach. Furthermore, in the case of [42], authors do not provide performance details, such as the different numbers of participating clients and training rounds.

While other FL-enabled IDS approaches have been proposed for IoT scenarios, they are not based on datasets with traffic associated with such devices. In this direction, [11] evaluates different ML models, such as decision trees, Support Vector Machines (SVM), Random Forest and MultiLayer Perceptron (MLP) in a federated environment in which the aggregation process is enabled through the use of blockchain. The proposed approach is based on intermediate nodes to perform local training using IoT devices’ data, as well as the KDDCup99 dataset [55]. Moreover, [9] uses the NSL-KDD dataset [56] and MLP as the ML model for a FL-enabled IDS system. The approach is based on the concept of mimic learning in which a student model is trained with a public dataset, which is labeled with a master model trained with sensitive data. Also based on the NSL-KDD dataset, [32] uses neural networks to propose a FL-enabled IDS considering three scenarios according to different data distributions regarding attack types. The use of neural networks is also proposed by [43], which integrates a differential privacy approach [57]. For this purpose, authors consider a scenario with non-iid data using the CSE-CIC-IDS2018 dataset [58]. Moreover, [44] employs Binarized Neural Networks (BNNs) [59] in edge devices to reduce the overhead of traditional neural networks. The proposal is based on the datasets CICIDS2017 [58] and ISCX Botnet 2014 [60], as well as the aggregation algorithm signSGD [61] in order to reduce the overhead during the communication of model updates.

Besides previous works, recent efforts consider IoT-specific datasets to develop FL-enabled IDS in these scenarios. In particular, [45] proposes the use of deep belief networks [62] to be deployed in IoT gateways to detect potential attacks on a certain IoT subnet. Then, the different models are aggregated through FL. The proposed approach
Table 1
Classification of existing works on FL-enabled IDS for IoT.

| Reference | Attacks studied | Dataset | ML model | FL implementation | Aggregation function | Training parties | Training rounds |
|---|---|---|---|---|---|---|---|
| [42] | 2 | Simulated traffic | GRU | – | FedAvg | – | – |
| [8] | 3 | Generated | GRU | – | FedAvg | 14 | 3 |
| [11] | 1, 4–6 | KDDCup99 | MLP, DT, SVM, RF | – | FedAvg | – | 50 |
| [9] | 1, 4–6 | NSL-KDD | MLPs | TensorFlow, Keras | FedAvg | 10 | 20 |
| [32] | 1, 4–6 | NSL-KDD | NN | – | FedAvg | 4 | 1–5 |
| [43] | 1, 7–11 | CSE-CIC-IDS2018 | NN | TensorFlow | FedAvg | 1–50 | 10000 |
| [44] | 1, 7–10, 12 | CICIDS2017, ISCX Botnet 2014 | Binarized NN | TensorFlow | signSGD | – | – |
| [45] | 1, 4–6, 13–20 | KDD, NSL-KDD, UNSW-NB15, N-BaIoT | Deep belief network | – | FedAvg | – | – |
| [10] | 1, 2, 12, 21, 28, 29 | BoT-IoT | NN | – | FedAvg | 4 | 1000 |
| [12] | 3, 20 | N-BaIoT | MLP, autoencoders | own library [46] | FedAvg, Coordinate-wise median/trimmed mean | 8 | 1–29 |
| [47] | 1, 17, 22 | [48] | Convolutional NN, GRU | Flask [49], Keras [50] | FedAvg | 3–7 | 2–10 |
| [51] | 23, 24 | Modbus dataset | GRU | Pytorch/PySyft | FedAvg | – | 1–40 |
| Our approach | 1, 2, 15, 22, 24–27 | CIC-ToN-IoT | Logistic regression | IBMFL | FedAvg, Fed+ | 4/10 | 1–300 |

Attack codes: 1: DoS, 2: DDoS, 3: Mirai, 4: U2R, 5: R2L, 6: Probe, 7: Web, 8: Bruteforce, 9: Infiltration, 10: Botnet, 11: DDOS+PortScan, 12: PortScan, 13: Fuzzers, 14: Analysis [52], 15: Backdoor, 16: Generic [52], 17: Reconnaissance, 18: Shell code, 19: Worm, 20: BASHLITE, 21: Keylogging, 22: Injection, 23: Flooding, 24: MITM, 25: XSS, 26: Password, 27: Scanning, 28: Data theft, 29: OS Fingerprinting.
uses several datasets, such as the N-BaIoT [63] dataset, which includes IoT devices’ traffic. However, authors do not provide information on the implementation being used or evaluation details considering aspects such as data distribution, number of clients or training rounds. This dataset is also used by [12], which proposes a binary classification approach based on supervised learning (using MLP) and unsupervised learning (using autoencoders). Additionally, the proposed approach uses different aggregation methods based on [64], which are compared considering different types of attack. In this case, it should be noted that authors created a balanced dataset with the same number of samples and proportion of classes for all devices. This distribution could be compared with our balanced scenario described in Section 4.2. Moreover, the Bot-IoT dataset [65] is used by [10], which proposes multiclass classification based on neural networks together with Principal Component Analysis (PCA) in an edge-based network architecture with IoT gateways. The proposal distributes the dataset in four clients according to attackers’ IP address; however, details on the implementation being used and data distribution in the different parties are not described. Additionally, other works on the use of FL for IDS in IoT are based on specific datasets for industrial environments. In this direction, [47] integrates Convolutional Neural Networks (CNN) and GRUs for the detection of different attacks using the dataset described in [48]. Furthermore, [51] also uses GRU with a dataset based on the well-known Modbus protocol [48].

Our literature analysis demonstrates that the development of FL-enabled IDS approaches for IoT is still in its infancy. On the one hand, while most of the previous works are intended to be considered in such scenarios, they are not based on datasets with IoT devices’ network traffic. On the other hand, we note that a significant amount of the previous works do not provide information about the implementation being used, or details related to the evaluation process, such as number of clients or training rounds. Furthermore, most of the works do not describe the data distribution among the different clients, or they consider scenarios where clients’ data are associated to a portion of the dataset that includes the same number of samples for each attack being considered. However, as discussed in previous works [7], the performance of FL can be reduced in the case of scenarios with non-iid and highly skewed data. While these aspects have not been evaluated in the context of FL-enabled IDS, our work provides an exhaustive evaluation under different data distributions using the recently proposed ToN_IoT [66] dataset, which includes several IoT-related attacks. To cope with the impact of non-iid data, we compare the performance of the typical FedAvg algorithm with a recent approach called Fed+ [21]. To the best of our knowledge, this is the first approach evaluating the impact of non-iid data on the development of FL-enabled IDS for IoT.

4. Methodology

Before describing our evaluation results for the proposed FL-enabled IDS for IoT considering non-iid data, in this section we explain the main processes and assets used for this purpose. They include the dataset selection, data distribution among several FL clients, as well as the classifier technique and aggregation functions being considered.

4.1. Dataset selection

For the development of our FL-enabled IDS proposal for IoT, a key aspect is the selection of an appropriate dataset. As described in the previous section, recent approaches are based on obsolete and generic network traffic datasets, which do not consider IoT-specific protocols and attacks. Furthermore, as described by [12], most of the datasets for IDS were not conceived to be used in an FL environment, as they cannot be properly distributed among different clients. Therefore, our analysis is focused on IoT datasets for IDS that can be divided by IP address or device [12], namely Bot-IoT [65], N-BaIoT [63], MedBIoT [67], IoTID20 [68] and ToN_IoT [66]. In the case of ToN_IoT, we consider the CIC-ToN-IoT dataset [69], which is generated from the original pcap files of ToN_IoT. An overview of these datasets is shown in Table 2, in which they are compared according to several aspects, such as number of features and samples, attacks, the use of labeled data, or their testbed.
Table 2
Comparison between relevant contemporary intrusion datasets for IoT (N=NO, Y=YES).

| Dataset | Training/testing sets? | # features | # samples | Normal/malign flow ratio | Attacks | Data labeled? | Best-features set | Realistic testbed? |
|---|---|---|---|---|---|---|---|---|
| Bot-IoT [65] | Y | 46 | 73,370,443 | 0.00013:1 | PortScan, OS Fingerprinting, DoS/DDoS, Data Theft, Keylogging | Y | Y | Y |
| N-BaIoT [63] | N | 115 | 7,062,606 | 0.07:1 | Mirai Bot, BashLite Bot | Y | N | Y |
| MedBIoT [67] | N | 100 | 17,845,567 | 2.36:1 | Mirai Bot, BashLite Bot, Torii Bot | Y | N | Y |
| IoTID20 [68] | N | 83 | 625,784 | 0.06:1 | Mirai Bot, MITM, PortScan, OS Fingerprinting | Y | Y | Y |
| CIC-ToN-IoT [69] | N | 83 | 5,351,760 | 0.88:1 | Backdoor, DoS, DDoS, Injection, MITM, Password, Ransomware, Scanning, XSS | Y | N | Y |
A common aspect of the different datasets is that they are based on realistic testbeds, as well as labeled data considering different types of attack. Bot-IoT is the only analyzed dataset that provides training and testing sets. Furthermore, this dataset and IoTID20 identify a set of best features to be considered. However, we note that most of the datasets suffer from a significant imbalance between benign and attack traffic that can negatively affect the ML/DL approach, so that oversampling/undersampling could be required. In this direction, we note that the ToN_IoT dataset provides the best ratio between benign and attack traffic. This aspect could significantly impact the evaluation results if the effect of very unbalanced data distributions (e.g., with only a few samples of a certain class) is not properly considered. Furthermore, this dataset considers a broader diversity of attack types compared to the other datasets being analyzed. For example, N-BaIoT and MedBIoT focus on particular attacks that are launched by IoT devices composing a botnet. However, they do not consider other attacks, such as DDoS/DoS or MITM that should be considered in IoT environments.

Moreover, while the different datasets are based on realistic testbeds, ToN_IoT is built using an IoT/IIoT testbed composed of edge/fog nodes and cloud components to simulate an IoT/IIoT production environment. Furthermore, ToN_IoT is the only dataset that considers data from sensor readings and telemetry data, which can be used to detect additional attacks (beyond the network level) in such environments. Although ToN_IoT has been used in recent works (e.g., [29]), to the best of our knowledge, this is the first effort to consider ToN_IoT in an FL setting. Therefore, the evaluation results provided in Section 5 could be considered as a starting point for future evaluations on this dataset in an FL setting.

4.2. ToN_IoT partitioning

To create the three proposed scenarios based on different data distributions, we use the CIC-ToN-IoT dataset [69], which was generated through the CICFlowMeter tool [70] from the original pcap files of the ToN_IoT dataset, as previously described. Such tool was used to extract 83 features, which were reduced by removing those with a non-numeric value (e.g., flow ID). Then, we separate the samples of the whole dataset according to the destination IP address, and select the 10 IP addresses with the most samples. The reason for this division is to associate the traffic of each IP address to a single FL client. Furthermore, we selected a subset of the whole dataset considering 10 devices to show the evolution of each node during the federated training process. Those observations constitute our dataset. Such resulting dataset contains 4,404,084 samples, which represent 82.29% of the original CIC-ToN-IoT.

From this dataset, we create three scenarios to evaluate the impact of different data distributions on the performance of our multiclass classifier to detect attacks. The datasets of such scenarios are available at [20]. Specifically, we use Shannon entropy [19] to measure the imbalance of the different local datasets of each FL client. In particular, given a dataset of length $n$, and $k$ classes of size $c_i$, the balance between the classes is given by the formula:

$$\mathrm{Entropy} = \frac{-\sum_{i=1}^{k} \frac{c_i}{n} \log \frac{c_i}{n}}{\log k} \qquad (1)$$

where the function is equal to 0 if all classes are empty except one, and is equal to 1 if all $c_i = \frac{n}{k}$. Furthermore, it should be noted that we consider that each FL client is represented by a single IP address. In this context, $n$ is the number of network flows, $k$ is the number of the attack classes and $c_i$ is their size.

4.2.1. Basic scenario

In this scenario, each FL client’s dataset is based on the network traffic of the corresponding IoT device. As described in Table 3, in this case the distribution of classes and samples among the different nodes is highly unbalanced. Indeed, party 7 only has benign traffic samples, while parties 1 and 3 only have 2 samples of XSS attack. Consequently, these parties have the lowest Shannon entropy value. This scenario represents a typical situation in a certain IoT network in which specific devices can be victims of several attacks while other devices perform their intended operation and they are not subject to attacks. However, as described in Section 5, the straightforward application of FL in this scenario could result in poor performance and convergence issues.

4.2.2. Balanced scenario

In this case, we select a portion of our dataset, which is distributed among the 10 parties, so that each party has the same number of samples of each class. Therefore, as shown in Table 3, all the parties have the same Shannon entropy value. As will be described in Section 5, such balanced scenario presents better performance; however, in this case, each FL client could have samples of other nodes, so that it can result in privacy issues depending on the scenario being considered. It should be noted that such scenario can be compared with similar settings in previous works, such as [12], which uses a version of the N-BaIoT dataset where the number of samples and the class proportions are the same for all devices.

4.2.3. Mixed scenario

The mixed scenario is generated to achieve a tradeoff between the two previous settings in which each party maintains its own samples, but they are locally balanced. In particular, we select the parties with a Shannon entropy value, calculated by (1), higher than a certain threshold (0.2), that is, parties 0, 2, 4 and 5. After this initial filtering step (due to the fact that the parties’ classes are not well balanced) we use a simple instance selection mechanism that removes some of the samples from the predominant classes until the Shannon entropy falls within a range of values. Having this range set between 0.66 and 0.71, we obtain a dataset that represents a compromise between the basic scenario where no balancing was used, and the balanced scenario where we artificially distributed the dataset among the 10 parties.
Table 3
Description of the basic, balanced and mixed scenarios.
Scenario  Party  Total samples  Benign  XSS     Injection  Password  Scanning  MITM  DDoS  DoS  Backdoor  Entropy
Basic     0      811504         42527   474520  140519     140519    13419     –     –     –    –         0.52041
Basic     1      763518         763516  2       –          –         –         –     –     –    –         0
Basic     2      740117         116540  594627  16271      1138      10923     253   202   145  18        0.28669
Basic     3      519806         519804  2       –          –         –         –     –     –    –         0
Basic     4      424531         2794    307962  66812      38009     8954      –     –     –    –         0.38890
Basic     5      330956         10537   206036  44043      67431     2909      –     –     –    –         0.47291
Basic     6      223092         3587    209637  9868       –         –         –     –     –    –         0.11976
Basic     7      217737         217737  –       –          –         –         –     –     –    –         0.0002
Basic     8      186891         8981    177910  –          –         –         –     –     –    –         0.08794
Basic     9      185932         8551    177381  –          –         –         –     –     –    –         0.08511
Balanced  0–9    43549          10000   10000   10000      10000     3500      20    18    10   1         0.7611
Mixed     0      205946         42527   50000   50000      50000     13419     –     –     –    –         0.69858
Mixed     2      42679          10000   10000   10000      1138      10923     253   202   145  18        0.70266
Mixed     4      71748          2794    20000   20000      20000     8954      –     –     –    –         0.66218
Mixed     5      73446          10537   20000   20000      20000     2909      –     –     –    –         0.66888
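The Entropy column of Table 3 and the instance selection used for the mixed scenario can be illustrated with a short sketch. It assumes a Shannon entropy normalised by the logarithm of the number of classes (nine here), which may not be the exact form of Eq. (1); under that assumption party 0 of the basic scenario gives roughly 0.52, but other rows need not match. The function names and the per-class cap in `cap_predominant` are hypothetical and only illustrate the idea of removing samples from predominant classes.

```python
import math

def normalised_entropy(counts, n_classes):
    """Shannon entropy of class counts divided by log(n_classes) (assumed normalisation)."""
    total = sum(counts)
    if total == 0 or n_classes < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(n_classes)

def cap_predominant(counts, cap):
    """Hypothetical instance selection: keep at most `cap` samples per class."""
    return [min(c, cap) for c in counts]

# Party 0 of the basic scenario: Benign, XSS, Injection, Password, Scanning.
party0 = [42527, 474520, 140519, 140519, 13419]
before = normalised_entropy(party0, n_classes=9)
capped = cap_predominant(party0, cap=50000)
after = normalised_entropy(capped, n_classes=9)
print(round(before, 3), capped, round(after, 3))
```

Capping the three predominant attack classes at 50000 samples yields the class counts listed for party 0 in the mixed scenario, and the entropy rises accordingly.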
4.3. Multiclass classification

Considering the already described scenarios, we use a multiclass probabilistic classification model to classify the instances into benign or a specific type of attack. For this purpose, we apply multinomial logistic regression [71], also called softmax regression, due to its easy implementation and training efficiency. Its model coefficients can also be interpreted as indicators of feature importance. We choose this well-known model because we focus on the impact of different data distributions and aggregation functions on the effectiveness of a FL model by considering the same ML technique. Indeed, the use of logistic regression is intended to provide a baseline scenario, avoiding additional complexity for the interpretation of the evaluation results provided in Section 5.

Multinomial logistic regression is a simple extension of binary logistic regression [72] that allows for more than two categories of the dependent or outcome variable, which do not present an order. As with most classifiers, the input variables need to be independent for the correct use of the algorithm. Given the input $x$, the objective is to know the probability of $y$ (the label) in each potential class, $p(y = c \mid x)$. The softmax function takes a vector $z$ of $k$ arbitrary values and maps them to a probability distribution as follows:

$$\mathrm{softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j=1}^{k} \exp(z_j)}. \tag{2}$$

In our case, the input of (2) will be the dot product between a weight vector $w$ and the input vector $x$ plus a bias for each of the $k$ classes:

$$p(y = c \mid x) = \frac{\exp(w_c \cdot x + b_c)}{\sum_{j=1}^{k} \exp(w_j \cdot x + b_j)}. \tag{3}$$

The loss function for multinomial logistic regression generalizes the loss function for binary logistic regression and is known as the cross-entropy loss or log loss.

It should be noted that, unlike previous works based on binary classifiers (e.g., [12]), we consider the detection of a specific attack as a key factor to dynamically deploy the most effective countermeasures to mitigate such threat. Furthermore, while other classifiers could be employed (and it represents part of our future work), our evaluation results are focused on the impact of different data distributions and non-iid data on the classifier performance.

4.4. Aggregation functions

As described in Section 2, the local updates generated by each client in FL are combined through an aggregation function in each training round. The most basic aggregation function is represented by FedAvg [6], which generates the global model based on the average of the weights generated by the FL clients. In particular, let $W = (w_i)$ be the weights of the general model and $W^k = (w_i^k)$ the weights of party $k$; then:

$$w_i = \sum_{k} \frac{d_k}{D} w_i^k, \tag{4}$$

where $D$ is the total data size and $d_k$ is the data size of party $k$.

However, as described in recent works [7,16,73], the performance of FedAvg may be degraded in scenarios with non-iid and highly skewed data. While recent works propose alternative aggregation functions considering convergence and privacy aspects [39], in this work we consider a recent approach called Fed+ [21], which unifies several functions to cope with scenarios composed of heterogeneous data distributions. For this purpose, Fed+ relaxes the requirement of forcing all parties to converge on a single model (as in the case of FedAvg). In particular, let the main objective in FedAvg be:

$$\min F(x) = \frac{1}{D} \sum_{i} f_i(x), \tag{5}$$

where $f_i$ is the local loss function of party $i$. In the case of Fed+, the main objective is:

$$\min F(x) = \frac{1}{D} \sum_{i} f_i(x) + \alpha_i B(x, C(X)), \tag{6}$$

where $\alpha_i$ is a penalty constant, $B(\cdot, \cdot)$ is a distance function, and $C$ is an aggregate function that computes a central point of $x$.

It should be noted that this work represents the first effort to use Fed+ to evaluate its impact in the context of FL-enabled IDS for IoT. As will be described in Section 5, the use of such approach mitigates the convergence issues of FedAvg, especially in settings with non-iid and skewed data.

5. Evaluation results

Based on the different aspects of the proposed methodology, in this section we describe our evaluation results. For this purpose, we consider the following metrics:
• Accuracy: $\frac{TP + TN}{TP + FP + FN + TN}$
• Precision: $\frac{TP}{TP + FP}$
• Recall: $\frac{TP}{TP + FN}$
• F1-score: $2 \cdot \frac{Recall \cdot Precision}{Recall + Precision}$
• False Positive Rate (FPR): $\frac{FP}{FP + TN}$
where TP: true positives, TN: true negatives, FP: false positives, and FN: false negatives.

Precision, recall, F1-score, and FPR metrics are calculated for each scenario described in Section 4.2. In the case of multiclass classification, such metrics can be calculated by using micro, macro, and weighted averaging. The micro-averaging calculates the metrics using the total
amount of TP, TN, FP, and FN, independently of the number of classes. The macro-averaging calculates each metric for each class independently, and then it uses the average of all the classes' values. Finally, the weighted-averaging follows a similar approach to the macro-averaging, but instead of using the plain average, the average is weighted depending on the class size. As some of our scenarios are based on imbalanced datasets (see Section 4.2), we use the weighted-averaging for our evaluation.

Moreover, we train the model across 300 rounds for each scenario by considering one epoch for each training round. We choose 300 rounds because in the basic scenario (when FedAvg is used) the accuracy starts to decrease around round 200; we therefore set the same number of rounds, 300, for every case, even though the rest of the cases converge at approximately round 50. The number of epochs is a hyperparameter that defines the number of times that the learning algorithm will work through the entire training dataset in each specific client. One epoch means that each sample in the training dataset has updated the internal model parameters only once. Furthermore, the logistic regression algorithm is implemented by using scikit-learn SGDClassifier (Stochastic Gradient Descent). In particular, we choose a logarithmic loss function to obtain logistic regression, and the $L_2$ norm in order to shrink model parameters toward the zero vector. Before the application of the ML/DL, the data is normalized. Furthermore, a ratio of 80–20 was defined between training and testing sets.

For our evaluation, we consider FedAvg and Fed+ as aggregation functions in our FL-enabled IDS approach. Furthermore, we also measure the accuracy of each client in a distributed scenario, where each party trains the model using its own data independently from the other parties (see Section 2). It should be noted that we do not consider a centralized setting (in which devices send their data for training a model) because in that case all the classes would be represented in the dataset. Therefore, it would be unfair to compare such setting with a distributed/federated scenario in which clients only have traffic associated to their IP address, and only some of the classes are represented in their partial datasets. Nevertheless, for the sake of completeness, we measure the accuracy of the centralized setting and obtain a value of 0.724 using multinomial logistic regression. This value is close to 0.77, which represents the highest accuracy value obtained in the work describing the ToN_IoT dataset [66].

Our experiments have been carried out in a simulated and distributed testbed using IBMFL, which employs a federated architecture for learning. It has been set up with 10 IoT devices or parties (each IoT device runs a different FL process) plus a central server. Although in the simulation the federated learning processes are all executed on one physical machine, the simulation splits the learning into different isolated processes or threads, each one running the federated learning task in parallel. The federated environment, parties and server, were simulated on a Lenovo laptop with an AMD Ryzen 7 4800H with Radeon Graphics, and 16 GB of RAM.

5.1. Basic scenario

As described in Section 4.2, in this scenario, each party has the data corresponding to the traffic associated to a single IP address. Such scenario is characterized by a non-iid and highly skewed data distribution. This aspect is reflected in Fig. 2 and Fig. 3, which show the accuracy evolution of each client by using FedAvg and Fed+ methods, respectively. As shown, the accuracy value of each party remains stable throughout the training rounds. While the accuracy value seems high for parties 0, 2, 3, 4, 5, 6, and 8, this circumstance may be related to the heavily imbalanced dataset, where accuracy may not be an exhaustive indicator because of the predominance of the data of the larger class (e.g., the legitimate traffic in this case). Then, accuracy is not fully representative since, if a class represents the vast majority of the dataset, the classification process will provide a high accuracy even if only a single class is actually learned. However, the application of such model in a more balanced dataset may result in lower accuracy. It should be noted that, according to Fig. 2, the accuracy of parties 3, 4, 7, and 9 decreases after around 200 training rounds. This aspect could be related to the use of FedAvg as aggregation function, which could present convergence issues, as described by recent works [16].

Table 4 shows the accuracy of each party by considering the distributed and the federated scenario (using Fed+). It should be noted that parties with a low entropy (see Section 4.2) provide a higher accuracy in the distributed setting than in the federated scenario. This can be justified since parties with fewer classes and lower balance will classify better the samples of such predominant classes. Then, in the case of a federated environment, the weights of those parties with few classes will be negatively influenced by the weights of other parties with more classes, because these parties detect different and additional types of attacks.

Table 4
Comparison between distributed method and federated method.
Party    Accuracy distributed  Accuracy federated
Party 0  0.5526                1.0
Party 1  1                     0.7293
Party 2  0.9435                1.0
Party 3  1                     0.9402
Party 4  0.7283                0.9434
Party 5  0.6525                0.9525
Party 6  0.9412                0.9513
Party 7  1                     0.5566
Party 8  0.9493                1.0
Party 9  0.9508                0.6527

As shown in Fig. 4, the other metrics (beyond accuracy), calculated with Fed+, remain stable through the rounds, following a similar trend as the accuracy. The parties with a high FPR have poor results in terms of the other metrics. The values in recall, F1-score and precision of these parties are similar to the ones in the accuracy, except for party 2 and party 8, which provide 0 for precision and recall (and consequently in F1-score), and 1 for FPR. This situation can arise in scenarios with unbalanced datasets (like in this case), where a high accuracy is obtained (due to a high TN ratio) but recall and precision remain low (because of a low value for TP ratio).

Previous results demonstrate that the direct application of FL to scenarios with non-iid and highly skewed data could lead to undesirable results. Therefore, there is a need to consider a suitable client/instance selection process to make the dataset more balanced among the different clients in terms of number of classes and samples. The evaluation results for the balanced and mixed scenarios demonstrate the importance of such process, and are described below.

5.2. Balanced scenario

In this scenario, the data is equally distributed among parties according to the description provided in Section 4.2.2. Figs. 5 and 6 show the evolution of the parties' accuracy by using FedAvg and Fed+ algorithms respectively. In the case of FedAvg, parties with a high accuracy see this value decrease throughout the rounds. For parties with a low accuracy, the evolution is similar to the Fed+ case. Furthermore, as shown in Fig. 6, there is a clear increase in the accuracy for all parties, which remains stable (between around 0.8 and 1) after about 50 rounds.

Furthermore, the evolution of FPR, F1-score, recall and precision metrics in the case of Fed+ is shown in Fig. 7. In particular, the values of recall, F1-score and precision increase throughout the rounds with a similar trend as the accuracy. Moreover, the FPR value decreases throughout the rounds until it converges to a lower value. Compared with the results for the basic scenario, these metrics have values akin to the accuracy following a similar trend.
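The weighted-averaged metrics listed at the start of Section 5 can be sketched in plain Python as follows. This is an illustrative implementation, not the evaluation code behind the reported results; the function name `weighted_metrics` and the toy labels are assumptions. Each per-class metric is computed one-vs-rest and then weighted by the class support.

```python
def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, F1 and FPR, plus overall accuracy."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    out = {"precision": 0.0, "recall": 0.0, "f1": 0.0, "fpr": 0.0}
    for c in sorted(set(y_true)):
        # One-vs-rest confusion counts for class c.
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        weight = (tp + fn) / n  # class support as a fraction of all samples
        for name, value in (("precision", precision), ("recall", recall),
                            ("f1", f1), ("fpr", fpr)):
            out[name] += weight * value
    out["accuracy"] = sum(1 for t, p in pairs if t == p) / n
    return out

# Toy imbalanced party: the model predicts the majority class for every sample.
y_true = ["benign"] * 8 + ["xss"] * 2
y_pred = ["benign"] * 10
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

In this toy case the accuracy is 0.8 although the minority class is never detected and the weighted FPR is 0.8, which is the kind of effect that makes accuracy alone misleading on heavily imbalanced parties.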
Fig. 2. Basic scenario’s accuracy with FedAvg.
Fig. 3. Basic scenario’s accuracy with Fed+.
Fig. 4. Basic scenario’s FPR, F1-score, recall and precision with Fed+.
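The softmax model of Section 4.3 (Eqs. (2) and (3)) and its cross-entropy loss can be written in a few lines of plain Python. This is a minimal sketch for illustration only; the function names and the toy weights are assumptions, and the experiments themselves rely on scikit-learn's SGDClassifier rather than on this code.

```python
import math

def softmax(z):
    """Eq. (2): map k arbitrary scores to a probability distribution."""
    m = max(z)  # subtracting the maximum avoids overflow and leaves the result unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, weights, biases):
    """Eq. (3): class probabilities from per-class weight vectors and biases."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w_c, x)) + b_c
              for w_c, b_c in zip(weights, biases)]
    return softmax(scores)

def cross_entropy(probs, label):
    """Log loss of a single sample whose true class index is `label`."""
    return -math.log(probs[label])

# Toy example with two features and three classes (hypothetical values).
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
probs = predict_proba([2.0, 0.5], weights, biases)
loss = cross_entropy(probs, label=0)
```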
Fig. 5. Balanced scenario’s accuracy with FedAvg.
Fig. 6. Balanced scenario’s accuracy with Fed+.
Fig. 7. Better balanced scenario’s precision, recall, F-1 score and FPR with Fed+.
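The FedAvg aggregation compared in these figures follows Eq. (4): the global weights are the average of the parties' weights, with each party weighted by its share of the data. A minimal sketch, assuming each party reports a flat list of weights together with its number of samples (the function name `fedavg` is hypothetical and this is not the IBMFL implementation):

```python
def fedavg(party_weights, party_sizes):
    """Data-size-weighted mean of the parties' weight vectors, as in Eq. (4)."""
    total = sum(party_sizes)
    n_params = len(party_weights[0])
    return [
        sum(size * weights[i] for weights, size in zip(party_weights, party_sizes)) / total
        for i in range(n_params)
    ]

# Two parties with 1 and 3 samples: the larger party dominates the average.
global_weights = fedavg([[1.0, 0.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Because every party is pulled to the same weighted average, a party holding many samples of a single class can dominate the global model, which is consistent with the degradation reported for skewed data.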
According to the obtained results, this scenario shows a better evolution in the parties for the different metrics being considered compared to the basic scenario. In particular, in the case of Fed+, all the parties improve such metrics throughout the initial 50 rounds,
Fig. 8. Mixed scenario’s accuracy with FedAvg.
Fig. 9. Mixed scenario’s accuracy with Fed+.
when their values remain stable. However, in the case of FedAvg, these values drop for some of the parties. Therefore, even though this scenario was artificially balanced, so that the parties have samples of all the different attacks, the use of FedAvg could still lead to convergence issues. This could be due to the fact that, even with a more balanced dataset among the different parties, the number of samples of each attack in every party still remains unbalanced.

5.3. Mixed scenario

The data distribution for this scenario is described in Section 4.2.3. Fig. 8 shows the accuracy evolution for each party when FedAvg is used. According to it, there is a clear decrease in the accuracy of party 2 until about round 200, and such trend is also observed for party 0 after a significant increase in the very initial rounds. In the case of party 4 and party 5, the accuracy value remains stable. The decrease of accuracy is due to the unbalance of the scenario, in which parties 0, 4 and 5 only have a subset of attack types. Then, Fig. 9 shows the accuracy evolution of the different parties with Fed+, in which accuracy values grow until a certain number of rounds (about 50), after which they remain stable. In the case of party 2, accuracy oscillates more due to the fact that such party has samples of all the different classes in its local dataset.

Fig. 10 shows the evolution for the other metrics when using Fed+, with a similar trend as for the accuracy. Parties 2 and 4 have the best results for each metric: FPR=0.28, F1-score=0.91, recall=0.925 and precision=0.9. Parties 0 and 5 have similar results, except that party 0's precision is similar to parties 2 and 4. It should be noted that these results are similar to the balanced scenario. As in the previous case, it means that accuracy results are consistent with the values obtained for the other metrics.

Based on the obtained results, this scenario represents a trade-off between the previous two scenarios, obtaining similar results to the balanced setting, where samples are shared among the different parties. Furthermore, previous results demonstrate the need for considering additional aggregation functions (beyond FedAvg) in order to deal with scenarios characterized by non-iid and skewed data among the parties, which are common in real-world scenarios.

5.4. Comparison between basic, balanced, mixed, and distributed scenarios

After analyzing the different evaluation metrics, Fig. 11 shows a comparison of the average accuracy of the parties for each federated scenario and a distributed setting, considering FedAvg and Fed+. It should be noted that each federated subcase represents the average of 10 executions where the dataset is shuffled before splitting it into train-test and the random state was changed for each execution. According to these results, Fed+ provides higher accuracy for all the federated scenarios being considered. This demonstrates that it is better able to handle scenarios where parties do not have balanced datasets.
Fig. 10. Mixed scenario’s precision, recall, F-1 score and FPR with Fed+.
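The Fed+ objective of Eq. (6) adds to each local loss a penalty on the distance between the party's model and a central point of all models. The sketch below is only one possible instance under stated assumptions, not the update rule of [21] or of IBMFL: it takes $B$ to be half the squared Euclidean distance and $C$ to be the coordinate-wise mean (or median), and all function names are hypothetical.

```python
from statistics import median

def central_point(party_weights, robust=False):
    """C(X): coordinate-wise mean, or median if `robust`, of the parties' weights."""
    columns = list(zip(*party_weights))
    if robust:
        return [median(col) for col in columns]
    return [sum(col) / len(col) for col in columns]

def penalised_local_step(x, grad, center, alpha, lr):
    """One gradient step on f_i(x) + (alpha / 2) * ||x - center||^2 (assumed form of B)."""
    return [x_j - lr * (g_j + alpha * (x_j - c_j))
            for x_j, g_j, c_j in zip(x, grad, center)]

# Three parties with two weights each; the second party is an outlier.
models = [[0.0, 0.0], [1.0, 10.0], [2.0, 2.0]]
center = central_point(models, robust=True)
# Party 0 takes a local step that is pulled toward the central point.
updated = penalised_local_step(models[0], grad=[0.0, 0.0], center=center, alpha=1.0, lr=0.1)
```

In this reading each party keeps its own model and is only drawn toward the central point with strength `alpha`, which reflects the relaxation of the single-model requirement described in Section 4.4.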
For the basic scenario, graphs are similar for FedAvg and Fed+. However, it should be noted that, in the case of Fed+, accuracy remains constant at about 0.8725, which is close to the 0.8718 of the distributed method, whereas it drops slowly from 0.8725 when using FedAvg. In the balanced scenario, the initial accuracy starts at 0.8569 and rapidly grows to 0.9039 (where it remains stable throughout the rounds) for Fed+. When FedAvg is used, accuracy grows from 0.8349 to 0.88 after 50 rounds, but it gradually drops to 0.87. Compared with the distributed setting, Fed+ has a similar accuracy to the 0.9065 of such scenario, since all parties have the same amount of data and number of classes. However, FedAvg does not reach the accuracy of the distributed case. The main reason is that, while parties' datasets are balanced among each other, each local dataset is unbalanced in relation to the number of samples for each class.

In the case of the mixed scenario, accuracy (when Fed+ is used) goes from 0.8498 to 0.8876 after 50 rounds, and it remains stable until it finishes with 0.8869. Indeed, after about 40 rounds, Fed+ overtakes the accuracy for the distributed case (0.877). However, the behavior of FedAvg is worse than the distributed case. In particular, accuracy goes from 0.8157 to 0.8698 after 10 rounds, but then it decreases slowly to 0.8423. Therefore, in this scenario, Fed+ clearly improves on the behavior of FedAvg.

Based on the previous evaluation, Fed+ provides better results than FedAvg, which could introduce convergence issues in certain situations. Indeed, Fed+ provides better results for the mixed scenario compared to the results of the balanced setting when FedAvg is used. Based on the results for the different scenarios, it should be noted that the impact of different data distributions is clearer in the case of Fed+ and the distributed setting, where the best results are obtained for the balanced scenarios, while the basic scenario provides the lowest value for accuracy. However, in the case of FedAvg the basic and balanced scenarios provide similar accuracy results, while the mixed scenario presents lower accuracy values. In any case, as already mentioned, the use of Fed+ has a clear impact on the results obtained for the different scenarios, improving the evaluation metrics' values with respect to those obtained when FedAvg is employed.

6. Challenges and research directions

Based on the evaluation results provided in the previous section and the analysis of the literature on the use of FL [39,74], below we describe some of the main challenges and future research directions to be considered for the development of FL-enabled IDS in the scope of IoT scenarios. In particular, some of these challenges are directly related to our analysis and evaluation results provided in previous sections (e.g., Sections 6.2 and 6.5), while others are based on existing literature and described in the context of IDS approaches for IoT. Moreover, as described by [74], it should be noted that many of the challenges associated with the use of FL in such context will require multidisciplinary approaches, including the application of privacy techniques, cryptography, distributed optimization, or information theory. Fig. 12 provides a summary of the following subsections.

6.1. Deploying FL on IoT devices

While our work focuses on the impact of different data distributions on FL by using a simulated testbed, a significant set of challenges is derived from the deployment of a FL framework on real IoT devices. Indeed, as described by [75], the computational requirements of well-known ML approaches might not be satisfied by constrained IoT devices in terms of memory, computing power and energy consumption. This aspect can be aggravated in the case of applying DL techniques, which require in general more computing resources than ML. To address such limitations, a current trend is the use of intermediate nodes at the network edge, so that the end devices send their data to these nodes acting as FL clients [76,77]. For example, [78] use intermediate entities (called RSPs) in charge of performing the local training in an FL setting. A similar approach is also proposed by [79], which used an edge computing architecture to determine the aggregation frequency of the global model. However, it should be noted that sharing network traffic with these intermediate nodes to identify potential attacks can still pose privacy concerns. Other approaches consist of the reduction of the data that needs to be sent by segmenting and representing it [80], as well as by exploring feature selection [81,82]. Therefore, more efforts are needed to analyze the practical limitations of FL approaches in IoT scenarios, as well as the security and privacy implications derived from the use of edge computing architectures. In this context, a potential research direction is associated with the application of TinyML frameworks (e.g., TensorFlow Lite [83]) in FL scenarios, as recently described by [84].

6.2. Limitations of existing IDS-IoT datasets for FL

As described in Section 3, some of the existing FL-enabled IDS proposals for IoT are based on general network datasets, which do not consider IoT technologies and devices. Even though some datasets have recently been proposed for IoT scenarios, as described by [12], some of them cannot be applied in an FL environment, since they do
Fig. 11. Comparison of average accuracy between basic, balanced, mixed and distributed scenarios.
Fig. 12. Challenges and future directions in Federated Learning for IDS.
not provide data associated with different IP addresses or devices, in particular the IP destinations that can be identified as the parties of the FL environment in IDS. Furthermore, as described in Section 4, most IDS datasets for IoT present a significant imbalance between benign and attack traffic, as well as a limited set of attacks being considered. Moreover, we note that ToN_IoT is the only dataset that considers possible security threats related to telemetry data and sensor readings, unlike other datasets only dealing with network attacks. However, as described by [85], the development of IDS datasets for IoT still needs to consider a broader scope of IoT technologies (including well-known protocols like CoAP [86]), as well as additional aspects (e.g., energy consumption) that can serve to identify potential attacks. Therefore, more effort is needed in the development of IDS datasets for IoT considering its divisibility to be deployed in a FL setting.

6.3. Aggregator as bottleneck

Even though FL is based on a collaborative training approach, the coordinator entity may become a bottleneck from a performance and privacy perspective, as well as a single point of failure. To address such issue, a current trend is the application of blockchain technology [87], which represents a distributed and immutable ledger shared by several nodes. The use of blockchain can increase the level of trust in an FL environment, where the centralized coordinator is replaced by a set of nodes with distributed functionality, which is carried out through smart contracts [41,88]. Indeed, blockchain has been proposed in recent works to make model updates accountable and avoid potentially malicious updates [89]. Furthermore, the use of blockchain is also proposed by [90] with a similar purpose in the scope of FL-enabled
vehicular networks. In the context of an IDS approach for IoT, [78] uses intermediate nodes acting as blockchain clients to store the model parameters updated by the end devices to avoid potential manipulation. Despite these efforts, we note that most current approaches do not provide comprehensive evaluations considering training frequency and scenarios with a large number of devices, which may be required for IDS approaches. Furthermore, as described by [74], the use of permissionless blockchains (e.g., Ethereum [91]) can raise privacy concerns, which must be addressed by proper encryption or differential privacy techniques, as described in Section 6.8.

6.4. Communication requirements

The need for a significant communication bandwidth to exchange global model updates represents a well-known issue associated with the use of FL [75]. This problem can be exacerbated in IoT scenarios where end devices acting as FL clients need to communicate their model updates through constrained networks and devices, which can degrade the network or IoT performance [92]. In general, there are two main factors that impose strong communication requirements between FL clients and coordinator. The first aspect is related to the amount of data associated with the gradient exchange [93], which is required between clients and the coordinator for the learning process. This is generally addressed by gradient compression techniques, such as quantization and sparsification, as described by [94]. The second aspect is related to the number of training rounds required for the model to converge, which can vary depending on the scenario, dataset, data distribution, or the ML algorithm being considered. For example, based on our evaluation results, the different metrics remain stable after 50 rounds in the balanced and mixed scenarios, although this may be different with other evaluation conditions. While a common trend to reduce the training rounds is to perform several local training iterations before updating the global model [95], the execution of such local training iterations may have a significant impact on FL clients, especially in the case of resource-constrained devices (see Section 6.1). Indeed, while an excessive number of epochs could overload the IoT device, an increase in the number of training rounds could have an impact on the bandwidth requirements, as previously mentioned. Hence, the use of compression techniques, as well as reaching a tradeoff between number of epochs and rounds in a certain FL setting, are crucial aspects to be considered in future FL deployments.

6.5. Client selection

As described in Section 2, in each training round, the coordinator can select a subset of devices to participate as FL clients in the training process. For this purpose, different aspects such as device status, battery level, computing/communication capacity, or ML technique's accuracy could be considered [96,97]. Indeed, the client selection process can have an impact on the obtained accuracy and, therefore, on the detection of potential security attacks in the scope of an IDS approach. In our case, according to the results described in Section 5, we found that even a static client selection process can help to obtain a better performance of the ML algorithm. However, more sophisticated client selection strategies must consider the dynamic aspects of an IoT environment in each training round. For example, some devices may not be available in a certain round due to mobility issues or loss of connectivity [76]. Furthermore, due to device heterogeneity, while some of them could perform the local training in a few milliseconds, other devices could require a longer period to update the model (e.g., due to resource constraints), which could slow down the overall federated training [37]. In the context of IDS, this could lead to a longer delay in detecting a certain attack, which could have severe consequences on the overall cybersecurity of the network. An additional aspect is related to the need to provide incentives to devices, in order to foster their participation in the training process [36]. Otherwise, some devices may not want to use their limited resources for this purpose. While some recent works address this issue in IoT scenarios [98], more efforts are required in real IoT environments to evaluate its impact on the learning process. Therefore, future strategies to come up with an effective client selection in IoT systems must consider the changing conditions of devices in each training round.

6.6. Dynamic IoT devices' behavior throughout their lifecycle

Related to the previous aspects, there is a need to consider the changing behavior of IoT devices throughout their lifecycle that could impact the effectiveness of a FL-enabled IDS approach. This aspect is not considered in the existing literature, and it is based on our own experience on IoT security [99,100]. For example, a software update process for a certain device can change its behavior [101], so that a new learning process is required in order to reflect the new behavior as benign traffic in the context of an IDS. However, such change could also be related to a potential attack affecting this device. Therefore, there is a need to integrate network management approaches to detect if behavioral changes in a certain device are produced intentionally, or they are due to a malicious action. Furthermore, the behavioral changes of a single device, known as data-drifts [80], could affect the behavior of other interacting devices. In the case of a FL scenario, it could require new training rounds that might have a significant impact, especially in settings with constrained devices and networks. More specifically, a Federated Reinforcement Learning scheme is proposed in [76] to control multiple real IoT devices of the same type but with slightly different dynamics. However, this aspect is not addressed in existing FL-enabled IDS approaches, which are based on existing datasets that do not reflect potential behavioral changes on IoT devices throughout their lifecycle, so this is a novel field of research.

6.7. Security attacks

As in the case of centralized approaches, FL is also susceptible to several attacks that can affect the learning process. Indeed, as described in recent works [36], some of the major security threats in FL are represented by data poisoning and model update poisoning attacks. The former is related to the attacker's ability to add false training data or modify the existing dataset of a certain client, for example, by modifying the labels (label-flipping). The latter focuses on changing the global model instead of the local training dataset. The realization of such attacks could cause false alarms in an IDS approach due to misclassification of benign/malicious traffic [102]. To address such concerns, a recent work evaluates the behavior of different aggregation functions against several security attacks in an FL-enabled IDS approach [12]. Indeed, the application of certain aggregation approaches could help to make an FL setting more robust against potential attacks. In this direction, as part of our future work, we will evaluate how Fed+ behaves in the context of different data poisoning and model update poisoning attacks with different data distributions. Other complementary approaches to be considered are based on network management approaches to ensure that only devices behaving as intended can participate in the training process [40]. These proposals still require lightweight cryptographic mechanisms to be considered in real IoT environments so that devices do not provide fake or forged data during the training process. Additionally, trust and reputation mechanisms can also be used in order to prevent malicious nodes from injecting false data into the training phase, even when using suitable cryptographic approaches [103].

6.8. Privacy concerns

While FL was mainly proposed to mitigate the privacy concerns associated with centralized learning approaches, it can still leak information from clients' training data. Indeed, as described by [36], a malicious server could infer information from model updates, as
well as alter them in order to fool the global model. This can be exacerbated in the context of IDS approaches to IoT, where device network traffic data can reveal everyday user habits. Therefore, the application of privacy-preserving techniques for FL has attracted significant interest recently [39], including the use of differential privacy (DP) approaches [57], secure multi-party computation (SMC) [104] and homomorphic encryption [105]. However, these techniques often come at a cost in terms of accuracy and efficiency [39,106], which can negatively affect the attack detection capabilities of IDS approaches. Indeed, a recent work evaluates the application of DP for an FL-enabled IDS considering non-iid data [43]. Although other recent efforts have been proposed for IoT scenarios [92], more studies are required to find a tradeoff between privacy requirements and the performance and accuracy requirements of effective IDS approaches.

7. Conclusions

The application of FL techniques has attracted significant interest in recent years due to their advantages over traditional centralized learning approaches. In this work, we provided an overview of the current research efforts for the application of FL toward the development of IDS approaches for IoT scenarios. Unlike previous works, we considered several settings with different data distributions. Our evaluation demonstrates the impact of non-iid and highly skewed data distributions on the FL performance, which directly affects the effectiveness of the security attack detection. We demonstrate that an instance selection process based on the Shannon entropy of each local dataset can improve the overall accuracy, obtaining similar results compared with a scenario where the dataset is balanced among the parties. Toward this end, we evaluated the use of the FedAvg and Fed+ aggregation functions using the recently proposed ToN_IoT dataset. Furthermore, based on our evaluation and the analysis of existing literature, we described the main challenges to be considered in the coming years for the deployment of FL-enabled IDS in IoT. As future work, we will address some of such challenges by deploying an FL-enabled IDS approach in real IoT scenarios to assess its feasibility in environments with constrained devices and networks. Furthermore, we will analyze the potential application of personalized FL, where each node uses the most appropriate learning model, in order to improve the overall accuracy for attack detection in IoT scenarios.

CRediT authorship contribution statement

Enrique Mármol Campos: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Pablo Fernández Saura: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Aurora González-Vidal: Conceptualization, Methodology, Formal analysis, Investigation, Writing – original draft, Writing – review & editing. José L. Hernández-Ramos: Conceptualization, Methodology, Validation, Investigation, Writing – original draft, Writing – review & editing. Jorge Bernal Bernabé: Conceptualization, Investigation, Writing – original draft, Writing – review & editing. Gianmarco Baldini: Writing – original draft, Writing – review & editing. Antonio Skarmeta: Resources, Writing – original draft, Supervision, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been sponsored by UMU-CAMPUS LIVING LAB EQC2019-006176-P funded by ERDF funds, by the European Commission through the PHOENIX (grant agreement 893079), CyberSec4Europe (g.a. 830929) and DEMETER (g.a. 857202) EU Projects. It was also co-financed by the European Social Fund (ESF) and the Youth European Initiative (YEI) under the Spanish Seneca Foundation (CARM). We also thank Bernardino Romera-Paredes (DeepMind) for his valuable comments.

References

[1] N. Neshenko, E. Bou-Harb, J. Crichigno, G. Kaddoum, N. Ghani, Demystifying IoT security: an exhaustive survey on IoT vulnerabilities and a first empirical look on internet-scale IoT exploitations, IEEE Commun. Surv. Tutor. 21 (3) (2019) 2702–2733.
[2] M.S. Pour, A. Mangino, K. Friday, M. Rathbun, E. Bou-Harb, F. Iqbal, S. Samtani, J. Crichigno, N. Ghani, On data-driven curation, learning, and analysis for inferring evolving internet-of-things (IoT) botnets in the wild, Comput. Secur. 91 (2020) 101707.
[3] K.A. da Costa, J.P. Papa, C.O. Lisboa, R. Munoz, V.H.C. de Albuquerque, Internet of Things: A survey on machine learning-based intrusion detection approaches, Comput. Netw. 151 (2019) 147–157.
[4] W. Ding, X. Jing, Z. Yan, L.T. Yang, A survey on data fusion in internet of things: Towards secure and privacy-preserving fusion, Inf. Fusion 51 (2019) 129–144.
[5] T. Iggena, E. Bin Ilyas, M. Fischer, R. Tönjes, T. Elsaleh, R. Rezvani, N. Pourshahrokhi, S. Bischof, A. Fernbach, J. Xavier Parreira, et al., IoTCrawler: Challenges and solutions for searching the Internet of Things, Sensors 21 (5) (2021) 1559.
[6] B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Artificial Intelligence and Statistics, PMLR, 2017, pp. 1273–1282.
[7] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, V. Chandra, Federated learning with non-iid data, 2018, arXiv preprint arXiv:1806.00582.
[8] T.D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, A.-R. Sadeghi, DÏoT: A federated self-learning anomaly detection system for IoT, in: 2019 IEEE 39th International Conference on Distributed Computing Systems, ICDCS, IEEE, 2019, pp. 756–767.
[9] N.A.A.-A. Al-Marri, B.S. Ciftler, M.M. Abdallah, Federated mimic learning for privacy preserving intrusion detection, 2020, arXiv:2012.06974v1.
[10] T. Huong, P.B. Ta, D. Long, B. Thang, N. Binh, T. Luong, K.P. Tran, LocKedge: Low-complexity cyberattack detection in IoT edge computing, IEEE Access PP (2021).
[11] X. Hei, X. Yin, Y. Wang, J. Ren, L. Zhu, A trusted feature aggregator federated learning for distributed malicious attack detection, Comput. Secur. 99 (2020) 102033.
[12] V. Rey, P.M.S. Sánchez, A.H. Celdrán, G. Bovet, M. Jaggi, Federated learning for malware detection in IoT devices, 2021, arXiv preprint arXiv:2104.09994.
[13] P. M, S.P.R. M, Q.-V. Pham, K. Dev, P.K.R. Maddikunta, T.R. Gadekallu, T. Huynh-The, Fusion of federated learning and industrial Internet of Things: A survey, 2021, arXiv:2101.00798.
[14] S. Agrawal, S. Sarkar, O. Aouedi, G. Yenduri, K. Piamrat, S. Bhattacharya, P.K.R. Maddikunta, T.R. Gadekallu, Federated learning for intrusion detection system: Concepts, challenges and future directions, 2021, arXiv:2106.09527.
[15] F. Sattler, S. Wiedemann, K.-R. Müller, W. Samek, Robust and communication-efficient federated learning from non-iid data, IEEE Trans. Neural Netw. Learn. Syst. 31 (9) (2019) 3400–3413.
[16] X. Li, K. Huang, W. Yang, S. Wang, Z. Zhang, On the convergence of FedAvg on non-iid data, 2019, arXiv preprint arXiv:1907.02189.
[17] T.M. Booij, I. Chiscop, E. Meeuwissen, N. Moustafa, F.T. den Hartog, ToN_IoT: The role of heterogeneity and the need for standardization of features and attack types in IoT network intrusion datasets, IEEE Internet Things J. (2021).
[18] A. Alsaedi, N. Moustafa, Z. Tari, A. Mahmood, A. Anwar, ToN_IoT telemetry dataset: a new generation dataset of IoT and IIoT for data-driven intrusion detection systems, IEEE Access 8 (2020) 165130–165150.
[19] J.A. Bonachela, H. Hinrichsen, M.A. Munoz, Entropy estimates of small data sets, J. Phys. A 41 (20) (2008) 202001.
[20] Evaluating-FL-for-intrusion-detection-in-IoT-review-and-challenges datasets, 2021, URL https://github.com/Enrique-Marmol/Evaluating-FL-for-Intrusion-Detection-in-IoT-review-and-challenges.
[21] P. Yu, L. Wynter, S.H. Lim, Fed+: A family of fusion algorithms for federated learning, 2020, arXiv:2009.06303.
[22] H. Ludwig, N. Baracaldo, G. Thomas, Y. Zhou, A. Anwar, S. Rajamoni, Y. Ong, J. Radhakrishnan, A. Verma, M. Sinn, et al., IBM federated learning: an enterprise framework white paper v0.1, 2020, arXiv preprint arXiv:2007.10987.
[23] A. Khraisat, I. Gondal, P. Vamplew, J. Kamruzzaman, Survey of intrusion detection systems: techniques, datasets and challenges, Cybersecurity 2 (1) (2019) 1–22.
[24] R. Chapaneri, S. Shah, A comprehensive survey of machine learning-based network intrusion detection, Smart Intell. Comput. Appl. (2019) 345–356.
[25] H. Liu, B. Lang, Machine learning and deep learning methods for intrusion detection systems: A survey, Appl. Sci. 9 (20) (2019) 4396.
[26] C. Yin, Y. Zhu, J. Fei, X. He, A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access 5 (2017) 21954–21961.
[27] A. Drewek-Ossowicka, M. Pietrołaj, J. Rumiński, A survey of neural networks usage for intrusion detection systems, J. Ambient Intell. Humaniz. Comput. 12 (2021) 497–514.
[28] W. Liang, K.-C. Li, J. Long, X. Kui, A.Y. Zomaya, An industrial network intrusion detection algorithm based on multifeature data clustering optimization model, IEEE Trans. Ind. Inf. 16 (3) (2019) 2063–2071.
[29] M.A. Ferrag, L. Shu, H. Djallel, K.-K.R. Choo, Deep learning-based intrusion detection for distributed denial of service attack in agriculture 4.0, Electronics 10 (11) (2021) 1257.
[30] M. Ge, N.F. Syed, X. Fu, Z. Baig, A. Robles-Kelly, Towards a deep learning-driven intrusion detection approach for Internet of Things, Comput. Netw. 186 (2021) 107784.
[31] N. Garcia, T. Alcaniz, A. González-Vidal, J.B. Bernabe, D. Rivera, A. Skarmeta, Distributed real-time SlowDoS attacks detection over encrypted traffic using artificial intelligence, J. Netw. Comput. Appl. 173 (2021) 102871.
[32] S.A. Rahman, H. Tout, C. Talhi, A. Mourad, Internet of Things intrusion detection: Centralized, on-device, or federated learning? IEEE Netw. 34 (6) (2020) 310–317.
[33] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J.A. Halderman, L. Invernizzi, M. Kallitsis, D. Kumar, C. Lever, Z. Ma, J. Mason, D. Menscher, C. Seaman, N. Sullivan, K. Thomas, Y. Zhou, Understanding the Mirai Botnet, in: Proceedings of the 26th USENIX Conference on Security Symposium, in: SEC’17, USENIX Association, USA, ISBN: 9781931971409, 2017, pp. 1093–1110.
[34] J. Kroustek, V. Iliushin, A. Shirokova, J. Neduchal, M. Hron, Torii botnet - Not another Mirai variant, Avast, 2018, https://blog.avast.com/new-torii-botnet-threat-research.
[35] M. Eskandari, Z.H. Janjua, M. Vecchio, F. Antonelli, Passban IDS: an intelligent anomaly-based intrusion detection system for IoT edge devices, IEEE Internet Things J. 7 (8) (2020) 6882–6897.
[36] S.A. Rahman, H. Tout, H. Ould-Slimane, A. Mourad, C. Talhi, M. Guizani, A survey on federated learning: The journey from centralized to distributed on-site learning and beyond, IEEE Internet Things J. (2020).
[37] T. Nishio, R. Yonetani, Client selection for federated learning with heterogeneous resources in mobile edge, in: ICC 2019-2019 IEEE International Conference on Communications, ICC, IEEE, 2019, pp. 1–7.
[38] T. Li, A.K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, V. Smith, Federated optimization in heterogeneous networks, 2018, arXiv preprint arXiv:1812.06127.
[39] V. Mothukuri, R.M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, G. Srivastava, A survey on security and privacy of federated learning, Future Gener. Comput. Syst. 115 (2020) 619–640.
[40] A. Feraudo, P. Yadav, V. Safronov, D.A. Popescu, R. Mortier, S. Wang, P. Bellavista, J. Crowcroft, CoLearn: Enabling federated learning in MUD-compliant IoT edge networks, in: Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking, 2020, pp. 25–30.
[41] D.C. Nguyen, M. Ding, P.N. Pathirana, A. Seneviratne, J. Li, H.V. Poor, Federated learning for Internet of Things: A comprehensive survey, 2021, arXiv preprint arXiv:2104.07914.
[42] J. Li, L. Lyu, X. Liu, X. Zhang, X. Lyu, FLEAM: A federated learning empowered architecture to mitigate DDoS in industrial IoT, 2020, arXiv:2012.06150.
[43] A.K. Chathoth, A. Jagannatha, S. Lee, Federated intrusion detection for IoT with heterogeneous cohort privacy, 2021, arXiv:2101.09878v1.
[44] Q. Qin, K. Poularakis, K.K. Leung, L. Tassiulas, Line-speed and scalable intrusion detection at the network edge via federated learning, in: 2020 IFIP Networking Conference (Networking), 2020, pp. 352–360.
[45] T.V. Khoa, Y.M. Saputra, D.T. Hoang, N.L. Trung, D. Nguyen, N.V. Ha, E. Dutkiewicz, Collaborative learning model for cyberattack detection systems in IoT industry 4.0, in: 2020 IEEE Wireless Communications and Networking Conference, WCNC, 2020, pp. 1–6, http://dx.doi.org/10.1109/WCNC45663.2020.9120761.
[46] V. Rey, fed_iot_guard, URL https://github.com/ValerianRey/fed_iot_guard.
[47] B. Li, Y. Wu, J. Song, R. Lu, T. Li, L. Zhao, DeepFed: Federated deep learning for intrusion detection in industrial cyber-physical systems, IEEE (2020).
[48] T. Morris, W. Gao, Industrial control system traffic data sets for intrusion detection research, in: International Conference on Critical Infrastructure Protection, Springer, 2014, pp. 65–78.
[49] M. Grinberg, Flask-SocketIO documentation, 2021.
[50] N. Ketkar, Introduction to Keras, in: Deep Learning with Python, Springer, 2017, pp. 97–111.
[51] V. Mothukuri, P. Khare, R.M. Parizi, S. Pouriyeh, A. Dehghantanha, G. Srivastava, Federated learning-based anomaly detection for IoT security attacks, IEEE Internet Things J. (2021).
[52] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: 2015 Military Communications and Information Systems Conference, MilCIS, 2015, pp. 1–6, http://dx.doi.org/10.1109/MilCIS.2015.7348942.
[53] R. Dey, F.M. Salem, Gate-variants of gated recurrent unit (GRU) neural networks, in: 2017 IEEE 60th International Midwest Symposium on Circuits and Systems, MWSCAS, IEEE, 2017, pp. 1597–1600.
[54] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J.A. Halderman, L. Invernizzi, M. Kallitsis, et al., Understanding the Mirai botnet, in: 26th USENIX Security Symposium (USENIX Security 17), 2017, pp. 1093–1110.
[55] S. Stolfo, S. Stolfo, KDD cup 1999 dataset, 1999, UCI KDD Repository, http://Kdd.Ics.Uci.Edu.
[56] M. Tavallaee, E. Bagheri, W. Lu, A.A. Ghorbani, A detailed analysis of the KDD cup 99 data set, in: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, IEEE, 2009, pp. 1–6.
[57] K. Wei, J. Li, M. Ding, C. Ma, H.H. Yang, F. Farokhi, S. Jin, T.Q. Quek, H.V. Poor, Federated learning with differential privacy: Algorithms and performance analysis, IEEE Trans. Inf. Forensics Secur. 15 (2020) 3454–3469.
[58] I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, Toward generating a new intrusion detection dataset and intrusion traffic characterization, in: ICISSP, 2018, pp. 108–116.
[59] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, Binarized neural networks, in: Proceedings of the 30th International Conference on Neural Information Processing Systems, 2016, pp. 4114–4122.
[60] E.B. Beigi, H.H. Jazi, N. Stakhanova, A.A. Ghorbani, Towards effective feature selection in machine learning-based botnet detection approaches, in: 2014 IEEE Conference on Communications and Network Security, IEEE, 2014, pp. 247–255.
[61] J. Bernstein, Y.-X. Wang, K. Azizzadenesheli, A. Anandkumar, signSGD: Compressed optimisation for non-convex problems, in: International Conference on Machine Learning, PMLR, 2018, pp. 560–569.
[62] G.E. Hinton, Deep belief networks, Scholarpedia 4 (5) (2009) 5947.
[63] Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, A. Shabtai, D. Breitenbacher, Y. Elovici, N-BaIoT—network-based detection of IoT botnet attacks using deep autoencoders, IEEE Pervasive Comput. 17 (3) (2018) 12–22.
[64] D. Yin, Y. Chen, R. Kannan, P. Bartlett, Byzantine-robust distributed learning: Towards optimal statistical rates, in: International Conference on Machine Learning, PMLR, 2018, pp. 5650–5659.
[65] N. Koroniotis, N. Moustafa, E. Sitnikova, B. Turnbull, Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-IoT dataset, Future Gener. Comput. Syst. 100 (2019) 779–796.
[66] A. Alsaedi, N. Moustafa, Z. Tari, A. Mahmood, A. Anwar, ToN_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems, IEEE Access 8 (2020) 165130–165150, http://dx.doi.org/10.1109/ACCESS.2020.3022862.
[67] A. Guerra-Manzanares, J. Medina-Galindo, H. Bahsi, S. Nõmm, MedBIoT: Generation of an IoT Botnet dataset in a medium-sized IoT network, in: ICISSP, 2020, pp. 207–218.
[68] I. Ullah, Q.H. Mahmoud, A scheme for generating a dataset for anomalous activity detection in IoT networks, in: Canadian Conference on Artificial Intelligence, Springer, 2020, pp. 508–520.
[69] Machine learning-based NIDS datasets, URL https://staff.itee.uq.edu.au/marius/NIDS_datasets/.
[70] A.H. Lashkari, G. Draper-Gil, M.S.I. Mamun, A.A. Ghorbani, Characterization of Tor traffic using time based features, in: ICISSP, 2017, pp. 253–262.
[71] D. Böhning, Multinomial logistic regression algorithm, Ann. Inst. Statist. Math. 44 (1) (1992) 197–200.
[72] Logistic regression explained, URL https://towardsdatascience.com/logistic-regression-explained-9ee73cede081.
[73] J. Pang, Y. Huang, Z. Xie, Q. Han, Z. Cai, Realizing the heterogeneity: A self-organized federated learning framework for IoT, IEEE Internet Things J. 8 (5) (2021) 3088–3098, http://dx.doi.org/10.1109/JIOT.2020.3007662.
[74] P. Kairouz, H.B. McMahan, B. Avent, A. Bellet, M. Bennis, A.N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al., Advances and open problems in federated learning, 2019, arXiv preprint arXiv:1912.04977.
[75] A. Imteaj, U. Thakker, S. Wang, J. Li, M.H. Amini, Federated learning for resource-constrained IoT devices: Panoramas and state-of-the-art, 2020, arXiv:2002.10610v1.
[76] W.Y.B. Lim, N.C. Luong, D.T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, C. Miao, Federated learning in mobile edge networks: A comprehensive survey, IEEE Commun. Surv. Tutor. 22 (3) (2020) 2031–2063.
[77] Y. Ye, S. Li, F. Liu, Y. Tang, W. Hu, EdgeFed: optimized federated learning based on edge computing, IEEE Access 8 (2020) 209191–209198.
[78] X. Hei, X. Yin, Y. Wang, J. Ren, L. Zhu, A trusted feature aggregator federated learning for distributed malicious attack detection, Comput. Secur. 99 (2020) 102033.
[79] S. Wang, T. Tuor, T. Salonidis, K.K. Leung, C. Makaya, T. He, K. Chan, Adaptive federated learning in resource constrained edge computing systems, 2019, arXiv:1804.05271v3.
[80] A. Gonzalez-Vidal, P. Barnaghi, A.F. Skarmeta, BEATS: Blocks of eigenvalues algorithm for time series segmentation, IEEE Trans. Knowl. Data Eng. 30 (11) (2018) 2051–2064.
[81] M. Mafarja, A.A. Heidari, M. Habib, H. Faris, T. Thaher, I. Aljarah, Augmented whale feature selection for IoT attacks: Structure, analysis and applications, Future Gener. Comput. Syst. 112 (2020) 18–40.
[82] A. Gonzalez-Vidal, F. Jimenez, A.F. Gomez-Skarmeta, A methodology for energy multivariate time series forecasting in smart buildings based on feature selection, Energy Build. 196 (2019) 71–82.
[83] P. Warden, D. Situnayake, TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers, O’Reilly Media, Inc., 2019.
[84] A. Mathur, D.J. Beutel, P.P.B. de Gusmão, J. Fernandez-Marques, T. Topal, X. Qiu, T. Parcollet, Y. Gao, N.D. Lane, On-device federated learning with Flower, 2021, arXiv preprint arXiv:2104.03042.
[85] B.B. Zarpelão, R.S. Miani, C.T. Kawakani, S.C. de Alvarenga, A survey of intrusion detection in Internet of Things, J. Netw. Comput. Appl. 84 (2017) 25–37.
[86] Z. Shelby, K. Hartke, C. Bormann, The constrained application protocol (CoAP), 2014.
[87] Z. Zheng, S. Xie, H.-N. Dai, X. Chen, H. Wang, Blockchain challenges and opportunities: A survey, Int. J. Web Grid Serv. 14 (4) (2018) 352–375.
[88] M. Ali, H. Karimipour, M. Tariq, Integration of blockchain and federated learning for Internet of Things: Recent advances and future challenges, Comput. Secur. (2021) 102355.
[89] Y. Zhao, J. Zhao, L. Jiang, R. Tan, D. Niyato, Z. Li, L. Lyu, Y. Liu, Privacy-preserving blockchain-based federated learning for IoT devices, IEEE Internet Things J. (2020).
[90] Y. Qi, M.S. Hossain, J. Nie, X. Li, Privacy-preserving blockchain-based federated learning for traffic flow prediction, Future Gener. Comput. Syst. 117 (2021) 328–337.
[91] G. Wood, et al., Ethereum: A secure decentralised generalised transaction ledger, Ethereum Proj. Yellow Pap. 151 (2014) (2014) 1–32.
[92] R. Hu, Y. Guo, E.P. Ratazzi, Y. Gong, Differentially private federated learning for resource-constrained Internet of Things, 2020, arXiv preprint arXiv:2003.12705.
[93] J. Hamer, M. Mohri, A.T. Suresh, FedBoost: A communication-efficient algorithm for federated learning, in: International Conference on Machine Learning, PMLR, 2020, pp. 3973–3983.
[94] Y. Liu, N. Kumar, Z. Xiong, W.Y.B. Lim, J. Kang, D. Niyato, Communication-efficient federated learning for anomaly detection in industrial Internet of Things, in: GLOBECOM, 2020, 2020, pp. 1–6.
[95] N. Guha, A. Talwalkar, V. Smith, One-shot federated learning, 2019, arXiv preprint arXiv:1902.11175.
[96] S. AbdulRahman, H. Tout, A. Mourad, C. Talhi, FedMCCS: Multicriteria client selection model for optimal IoT federated learning, IEEE Internet Things J. 8 (6) (2020) 4723–4735.
[97] I. Mohammed, S. Tabatabai, A. Al-Fuqaha, F. El Bouanani, J. Qadir, B. Qolomany, M. Guizani, Budgeted online selection of candidate IoT clients to participate in federated learning, IEEE Internet Things J. (2020).
[98] Y. Zhan, P. Li, Z. Qu, D. Zeng, S. Guo, A learning-based incentive mechanism for federated learning, IEEE Internet Things J. 7 (7) (2020) 6360–6368.
[99] J.L. Hernandez-Ramos, J.A. Martinez, V. Savarino, M. Angelini, V. Napolitano, A.F. Skarmeta, G. Baldini, Security and privacy in Internet of Things-enabled smart cities: Challenges and future directions, IEEE Secur. Priv. 19 (1) (2020) 12–23.
[100] J.L.H. Ramos, S.N. Matheu, A. Feraudo, G. Baldini, J.B. Bernabe, P. Yadav, A. Skarmeta, P. Bellavista, Defining the behavior of IoT devices through the MUD standard: review, challenges and research directions, IEEE Access (2021).
[101] J.L. Hernández-Ramos, G. Baldini, S.N. Matheu, A. Skarmeta, Updating IoT devices: challenges and potential approaches, in: 2020 Global Internet of Things Summit (GIoTS), IEEE, 2020, pp. 1–5.
[102] T.D. Nguyen, P. Rieger, M. Miettinen, A.-R. Sadeghi, Poisoning attacks on federated learning-based IoT intrusion detection system, in: NDSS Workshop on Decentralized IoT Systems and Security, 2020.
[103] J. Kang, Z. Xiong, D. Niyato, Y. Zou, Y. Zhang, M. Guizani, Reliable federated learning for mobile networks, IEEE Wirel. Commun. 27 (2) (2020) 72–80.
[104] C. Zhao, S. Zhao, M. Zhao, Z. Chen, C.-Z. Gao, H. Li, Y.-a. Tan, Secure multi-party computation: Theory, practice and applications, Inform. Sci. 476 (2019) 357–372.
[105] C. Zhang, S. Li, J. Xia, W. Wang, F. Yan, Y. Liu, BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning, in: 2020 USENIX Annual Technical Conference, 2020, pp. 493–506.
[106] T. Li, A.K. Sahu, A. Talwalkar, V. Smith, Federated learning: Challenges, methods, and future directions, 2019, arXiv:1908.07873v1.

Enrique Mármol Campos is a Ph.D. student at the University of Murcia. He graduated in Mathematics in 2018. Then, in 2019, he finished the M.S. in advanced mathematics, specializing in operations research and statistics, at the University of Murcia. He is currently researching federated learning applied to cybersecurity in IoT devices.

Pablo Fernández Saura received the B.S. degree in computer science from the University of Murcia. Currently, he is studying for an M.S. in New Technologies in Computer Science while working as a researcher at the University of Murcia in several European projects such as H2020 CyberSec4Europe and H2020 Inspire-5Gplus. His main research interests are in the field of cybersecurity, artificial intelligence and 5G networks.

Aurora Gonzalez Vidal graduated in Mathematics from the University of Murcia in 2014. In 2015 she got a fellowship to work in the Statistical Division of the Research Support Service, where she specialized in Statistics and Data Analysis. Afterward, she studied a Big Data Master. In 2019, she got a Ph.D. in Computer Science. Currently, she is a postdoctoral researcher at the University of Murcia. She has collaborated in several national and European projects such as ENTROPY, IoTCrawler, and DEMETER. Her research covers machine learning in IoT-based environments, missing values imputation, and time-series segmentation. She is the president of the R Users Association UMUR.

José L. Hernández-Ramos received the Ph.D. degree in computer science from the University of Murcia, Spain. He is currently a Scientific Project Officer with the European Commission, Joint Research Centre. His research interests include the application of security and privacy mechanisms in the IoT and transport systems scenarios, including blockchain and federated learning. He has participated in different European research projects, such as SocIoTal, SMARTIE, and SerIoT. He has served as a technical program committee and chair member for different international conferences.

Jorge Bernal Bernabe received the B.S., M.S., and Ph.D. degrees in computer science and the M.B.A. degree from the University of Murcia, Spain. He is an Assistant Professor at the University of Murcia. He has been a Visiting Researcher with the Hewlett-Packard Laboratories and University of the West of Scotland. He has authored several book chapters and more than 60 articles in international top-level conferences and journals. During the last years, he has been working in several European research projects, such as SocIoTal, ARIES, OLYMPUS, ANASTACIA, INSPIRE-5G and CyberSec4EU.

Gianmarco Baldini received the Laurea degree in Electronic Engineering from the University of Rome in 1993 and his Ph.D. degree in computer science at the University of Insubria in 2019. He worked in R&D departments in the field of wireless communications in Italy, Ireland and USA before joining the European Commission, Joint Research Centre (JRC) in 2007. In the JRC he has worked in wireless communications, security, positioning, and machine learning and he has contributed to the formulation of European policies in the areas of radio frequency spectrum, road transportation, and cybersecurity.

Antonio Skarmeta is a full professor at the University of Murcia, the Department of Information and Communications Engineering. His research interests are the integration of security services, identity, the Internet of Things, and smart cities. Skarmeta received a Ph.D. in computer science from the University of Murcia. He has published more than 200 international papers and been a member of several program committees.