1. Introduction
The Internet of Things (IoT) augments physical objects (usually referred to as IoT nodes) with internet connectivity so that they can collect and share data with other nodes in the network without human intervention. To enable the secure and reliable exchange of data among IoT nodes, different communication and messaging protocols have been developed, such as the Constrained Application Protocol (CoAP), Advanced Message Queuing Protocol (AMQP), Message Queuing Telemetry Transport (MQTT), and Extensible Messaging and Presence Protocol (XMPP) [1]. Among these, MQTT has been widely used in smart homes [2,3,4], agricultural IoT [5,6], and industrial applications [7]. The reasons include support for communication over low bandwidth, low memory requirements, and reduced packet loss [1,8,9].
The MQTT communication protocol consists of four major components: the broker (central device), clients (IoT nodes), topics, and messages. A topic in the MQTT protocol also carries information about the source and destination nodes of the messages transmitted across the network. Topics are structured using the forward slash delimiter (/), and messages consist of data gathered by IoT sensors. Using the MQTT messaging transport protocol, every node has three main associated tasks: topic selection, topic publication, and topic subscription [9,10]. The IoT nodes (clients) of MQTT communicate with each other via a central node called a broker; a broker can work on the edge, i.e., as a local broker, or in the cloud, i.e., as a remote broker. It allows IoT nodes to publish to topics, subscribe to topics, or do both at the same time if the node functionality allows, as shown in Figure 1. For example, a Passive Infrared Sensor (PIS) for motion detection publishes sensed data to a topic on the broker to which a camera is subscribed. As soon as the PIS detects motion, the information is sent to the camera for further action.
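The publish/subscribe flow in the motion-sensor example can be sketched with a toy in-memory broker. This is only an illustration under stated assumptions: the class `ToyBroker`, the topic names, and the payloads are hypothetical, no networking or real MQTT library is involved, and topics are matched exactly (no wildcards).

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for an MQTT broker (hypothetical, no networking)."""

    def __init__(self):
        # Maps a topic string to the callbacks of the clients subscribed to it.
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Forward the message to every client subscribed to this exact topic.
        for callback in self.subscribers[topic]:
            callback(topic, message)


broker = ToyBroker()
camera_log = []

# The camera subscribes to the motion topic (the topic name is an assumption).
broker.subscribe("home/hall/motion", lambda t, m: camera_log.append((t, m)))

# The motion sensor publishes a reading; the broker forwards it to the camera.
broker.publish("home/hall/motion", "motion detected")

# A message on a different topic does not reach the camera.
broker.publish("home/hall/temperature", "21.5")
```

The sensor and the camera never address each other directly; both only know the broker and the topic, which is also why a compromised broker exposes every client.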
The IoT has been identified as one of the networks most vulnerable to attack by external as well as internal attackers [11,12,13]. External attackers try to corrupt the system from outside the network, whereas internal attackers operate from inside the network under threat and can therefore access information more easily. In either case, prior to initiating an attack, attackers usually gather information to check the vulnerability of the network or system using tools such as Masscan, Network Mapper (NMAP), or Shodan [14]. For example, the paper [15] presents a scenario in which an attacker uses penetration testing tools to collect information about brokers through Shodan. Furthermore, Shodan provided connection codes that indicate whether a broker needs authentication or not.
The different types of threats to the broker in the MQTT protocol are illustrated in Figure 2. As an example, by breaching the broker security and subscribing to all topics, an attacker can expose critical information of the system. Similarly, if an attacker publishes to the same topic as a legitimate publisher, it can control the subscribers of that topic [15]. For example, streetlights can be subscribers to a valid publisher in a smart streetlight system [16]. An attacker connected to the broker can generate and send false information to control these streetlights. In addition, an internal attacker can compromise the integrity of MQTT data packets, as it has the opportunity to analyse and modify them.
Studies show that attackers usually target the central communication devices, i.e., brokers, in MQTT-based IoT systems. Denial-of-Service (DoS) [17], Man-in-the-Middle (MitM), scanning, and Intrusion are a few examples of common attacks on brokers [15,17,18,19]. In principle, the MQTT client starts a connection with a broker by sending a connect packet (CONNECT) over TCP/IP, on top of which MQTT works, and the broker replies with a connection acknowledgement (CONNACK). After receiving the acknowledgement, the client starts data transmission to the broker. The MQTT protocol provides three levels of Quality of Service (QoS) that define the level of agreement and the assurance of successful communication between a transmitter and receiver in the network. QoS level 0 has no acknowledgement mechanism between the sender and receiver [20], whereas QoS levels 1 and 2 require messages to be acknowledged. An internal attacker can therefore send multiple messages with QoS 1 and QoS 2 to keep the broker busy with acknowledgements, thereby imposing a DoS attack [18].
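To make the acknowledgement load concrete, the sketch below counts the control packets exchanged when a client publishes to the broker at each QoS level. The packet sequences follow the MQTT 3.1.1 publish flows; the function names are hypothetical, and the model covers only the publisher-to-broker leg, not the delivery to subscribers.

```python
# Control packets exchanged for one published message at each QoS level.
QOS_FLOW = {
    0: ["PUBLISH"],
    1: ["PUBLISH", "PUBACK"],
    2: ["PUBLISH", "PUBREC", "PUBREL", "PUBCOMP"],
}

# Packets in these flows that the broker sends back to the publisher.
SENT_BY_BROKER = {"PUBACK", "PUBREC", "PUBCOMP"}


def packets_handled(qos, n_messages):
    """Packets the broker receives or sends for n published messages."""
    return len(QOS_FLOW[qos]) * n_messages


def acks_sent(qos, n_messages):
    """Acknowledgement packets the broker itself has to send."""
    per_message = sum(1 for packet in QOS_FLOW[qos] if packet in SENT_BY_BROKER)
    return per_message * n_messages


flood = 1000  # illustrative number of messages sent by an attacker
load = {qos: acks_sent(qos, flood) for qos in QOS_FLOW}
```

Under these assumptions a flood at QoS 2 makes the broker handle four packets per message instead of one, which is the extra work that the DoS attack described above exploits.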
Machine learning (ML) has shown efficiency in different application areas, including intrusion detection systems for IoT [21,22,23]. Some researchers opine that ML has the potential not only to detect attacks efficiently but also to predict them, provided that sufficient data have been used for training. Therefore, in this paper, we propose an Intrusion Detection System (IDS) for the MQTT protocol based on an ML algorithm, i.e., a Deep Neural Network (DNN). The proposed DNN is evaluated on the recent MQTT-IoT-IDS2020 dataset and on the dataset (https://joseaveleira.es/dataset; accessed on 8 July 2021) discussed in [24], which contains three well-known attacks: MitM, Intrusion, and DoS over MQTT. The selected datasets [24,25] are generated in a simulated MQTT environment. MQTT-IoT-IDS2020 has three abstraction levels of features, namely Packet-flow, Uni-flow, and Bi-flow, as mentioned in [25]. Detailed statistics of this dataset are given in a later section of this paper. The contributions can be summarised as follows.
A DNN is proposed in this work for intrusion detection in MQTT-based IoT systems. Additionally, a number of ML models have been evaluated and compared for the three abstraction levels of the MQTT-IoT-IDS2020 dataset, i.e., Bi-flow, Uni-flow, and Packet-flow. The evaluation has been performed for binary as well as multi-class classification.
The performance of the proposed DNN model is also evaluated for different attacks, including DoS, Intrusion, and MitM, in another dataset [24].
The remainder of this paper is organised as follows.
Section 2 presents a literature review and a detailed discussion about related works.
Section 3 provides a detailed explanation of the proposed intrusion detection system and other classical ML models.
Section 4 illustrates the experimental setup, dataset selection criteria, results, and a discussion of the results.
Section 5 concludes the paper and highlights potential future directions.
2. Related Works
IoT security is an open research area currently being addressed by researchers around the globe. Different security-enhancing methods have been proposed to protect IoT against anomalous adversarial attacks. These methods commonly aim at detecting intruders in the network by monitoring network activities, such as data flow rate. Here, a short literature review is presented to put forward current advances in IoT security, with a focus on intrusion detection systems targeting the MQTT messaging protocol of IoT. The authors of [26] presented an attack detection strategy for the MQTT protocol based on a process tree. It models the network behaviour in terms of hierarchical branches of a tree, which is then applied to detect attacks or anomalous behaviours. The model is evaluated using the detection rate, with a total of four common types of attacks induced in the network. However, newly developed adversarial attacks and intrusions have not been addressed. Furthermore, the paper [27] presents a fuzzy logic-based intrusion detection model specifically designed for protecting IoT nodes using the MQTT protocol against DoS attacks. Although fuzzy logic has shown its efficiency for different applications, including sensor fault detection in IoT [28], its complexity grows with the input dimension, which limits its potential in intrusion detection for IoT, where huge amounts of data are transferred continuously. In addition, more advanced and complex attacks are left untouched in [27], which raises questions over the efficiency of the proposed model for detecting other types of attacks.
ML and deep learning (DL) have shown efficiency in detecting complex and unknown intrusions, such as MitM and DoS [23]. Commonly used algorithms include Support Vector Machine (SVM) [29,30], Semi-supervised Spatio-Temporal Deep Learning Intrusions Detection (SS-Deep-ID) [31], and Deep Feed-Forward Neural Network (DFFNN) [32].
In [33], multiple ML algorithms, including Autoencoder, Random Forest (RF), K-Means clustering, and Isolation Forest (IF), are employed to detect attacks in the IoT. However, the paper does not make clear which types of attacks are considered. In addition, the authors developed and evaluated an intrusion detection strategy on the network layer of the IoT that is not necessarily based on the MQTT messaging protocol.
Faker and Dodge [32] proposed a DL-based network intrusion detection system and evaluated it against the CIC-IDS2017 and UNSW-NB15 datasets, using accuracy and prediction time as evaluation metrics. A total of 32 attack types from the two datasets were included in the experiment. The results show the significance of applying deep learning (DL) algorithms while designing intrusion detection systems for IoT. However, CIC-IDS2017 and UNSW-NB15 are general-purpose datasets that do not represent MQTT specifically. In [34], the authors worked on a new dataset known as MQTTset and proposed various ML algorithms for intrusion detection.
In [35], the performance of eight different ML algorithms, namely DNN, Logistic Regression (LR), Naive Bayes (NB), SVM, Adaptive Boosting (AB), K-Nearest Neighbour (kNN), Decision Tree (DT), and RF, is analysed against six datasets: KDD-99, NSL-KDD, UNSW-NB15, Kyoto2006+, WSN-DS, and CICIDS2017. The DL model achieved the best accuracy compared to the classical ML classifiers, at the cost of high computational requirements. This paper also does not address MQTT messaging protocol-related issues.
Table 1 and Table 2 summarise intrusion detection systems proposed in recent literature, with a tick (√) in the last column indicating that the given model was developed targeting the MQTT protocol. The first four columns show the reference number of the paper, the ML model exploited, the evaluation method, and the evaluation metrics, respectively.
In [24], a number of ML algorithms, such as eXtreme Gradient Boosting (XGBoost), Gated Recurrent Units (GRUs), and Long Short-Term Memory (LSTM), are used to design security models for the MQTT protocol in IoT. To verify the proposed algorithms, the authors used an MQTT dataset containing three types of attacks: intrusion (illegal entry), DoS, and MitM. Different ML algorithms, such as NB, RF, DT, LR, KNN, and SVM, are evaluated using the MQTT-IoT-IDS2020 dataset in [25], where acceptable performance of these ML models for MQTT intrusion detection was reported. The author of [40] proposed a single-layer ANN-based model for intrusion detection in an MQTT-enabled IoT system. The proposed model is evaluated on the KDD-99 and NSL-KDD datasets with acceptable performance measures. However, these datasets do not represent an MQTT-enabled IoT environment. In [39], the author proposed a model for anomaly-based IDS in IoT systems using a Convolutional Neural Network (CNN) and GRUs for MQTT-IoT-IDS2020. The present study compares several ML-based models for intrusion detection in MQTT-enabled IoT systems with the proposed DNN.
3. Proposed Deep Neural Network (DNN) Based Intrusion Detection System
Deep learning (DL) is a sub-field of machine learning inspired by the biological brain. These algorithms, also known as Artificial Neural Networks (ANNs), have better predictive capabilities than the conventional Multi-Layer Perceptron (MLP) because of a higher number of hidden layers. ANNs consist of layers of neurons, each connected to the neighbouring layer, which process the input data using activation functions [41] in order to predict the output. Our proposed model consists of an input layer, two fully connected hidden layers, and an output layer. Data are forward propagated from the input through the hidden layers to the output layer, and errors are backward propagated. Figure 3 shows the framework of the DNN-based IDS for attack classification. The output layer differs depending on the classification task, i.e., binary or multi-class. The input layer of our proposed DNN-based learning model takes the features of the MQTT protocol-based network; it is followed by two hidden layers with Rectified Linear Unit (ReLU) activation and an output layer with sigmoid activation in the case of binary classification and softmax for multi-class attack classification. The choice of softmax for multi-class classification is based on the experimental results obtained in this paper. As the MQTT-IoT-IDS2020 dataset contains three abstraction levels of features of the MQTT protocol, i.e., Packet-flow, Bi-flow, and Uni-flow, the proposed model is tested on all three.
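Figure 4 shows the number of input neurons, hidden layers, and output neurons. The data from the input layer are forward propagated through the hidden layer neurons during model training and backward propagated to update the weights and reduce the loss function until the model learns suitable weights and biases. Mathematically, the processing of data through a dense layer of neurons can be expressed as $H: \mathbb{R}^m \rightarrow \mathbb{R}^n$, where m represents the input vector size, while n is the size of the output vector. Suppose X represents the input vector such that $X = (x_1, x_2, \ldots, x_m)$; then the computation of the $i$-th neuron of the hidden layer can be expressed as a product of weights and an addition of bias, as in the following equation:
$$h_i(x) = f\left(w_i^{T} x + b_i\right) \tag{1}$$
where $h_i$ is defined as $\mathbb{R}^m \rightarrow \mathbb{R}$ and f is a function from $\mathbb{R} \rightarrow \mathbb{R}$ defined by (2a) for the hidden layer. In Equation (1), $b_i$ represents the bias that is added to the product of the input and the weights, i.e., $w_i^{T} x$. The activation functions used in the model are ReLU for the hidden layers, and softmax or sigmoid for the output layer:
$$f(z) = \max(0, z) \tag{2a}$$
$$f(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{n} e^{z_k}} \tag{2b}$$
$$f(z) = \frac{1}{1 + e^{-z}} \tag{2c}$$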
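An Artificial Neural Network (ANN) consists of many stacked hidden layers that become a deep network. In general, these hidden layers can be expressed mathematically via Equation (3):
$$H(x) = H_l\left(H_{l-1}\left(\cdots H_1(x)\right)\right) \tag{3}$$
where $H_k$ denotes the $k$-th hidden layer and l is the number of hidden layers.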
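The forward pass through stacked dense layers can be sketched in plain Python as follows. This is a minimal illustration, not the trained model: the weights, biases, layer sizes, and input vector are made-up values, and all function names are hypothetical. It uses ReLU in the hidden layers and sigmoid or softmax at the output, as described for the proposed architecture.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(zs)
    exps = [math.exp(z - peak) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, biases, activation=None):
    """One fully connected layer: w_i . x + b_i per neuron, then activation."""
    zs = [sum(w * xi for w, xi in zip(row, x)) + b
          for row, b in zip(weights, biases)]
    return [activation(z) for z in zs] if activation else zs


def forward(x, hidden_layers, output_layer, multi_class=False):
    """Pass x through the stacked ReLU layers, then the output layer."""
    h = x
    for weights, biases in hidden_layers:
        h = dense(h, weights, biases, relu)
    logits = dense(h, *output_layer)
    return softmax(logits) if multi_class else [sigmoid(z) for z in logits]


# Illustrative values only: 3 input features, two hidden layers of 2 neurons.
hidden = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
]
binary_out = ([[1.0, -1.0]], [0.0])
multi_out = ([[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]], [0.0, 0.0, 0.0])

p_attack = forward([1.0, 2.0, 3.0], hidden, binary_out)[0]
class_probs = forward([1.0, 2.0, 3.0], hidden, multi_out, multi_class=True)
```

Only the output layer changes between the two tasks: a single sigmoid neuron gives one probability for binary classification, while softmax gives one probability per class.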
Our proposed model is tested for binary as well as multi-class attack classification. Therefore, two different activation functions are used at the output layer. For binary classification, $\hat{y}$ is calculated at the last layer via sigmoid, as presented in Equation (2c), while softmax is used for multi-class classification. Depending on the classification task, we utilised different cost functions: for binary attack classification, we used binary cross-entropy, as presented in Equation (4), while in the case of multi-attack classification, we utilised categorical cross-entropy, as presented in Equation (5). The loss function measures the difference between predicted labels and actual labels. The smaller the loss, the more accurate the predictions of the model. Optimisation algorithms find the parameters that minimise or maximise a mathematical function; in deep learning, they are used to reduce the cost function. Out of the many existing optimisation algorithms used in deep learning, we adopted Adaptive Moment Estimation (ADAM) as the optimiser to reduce the cost function of our proposed model. The ADAM optimiser combines the best features of the Root Mean Square Propagation (RMSProp) optimiser and momentum, which is why it is widely used for optimisation problems in deep learning. The two cost functions are:
$$J(y, \hat{y}) = -\frac{1}{t}\sum_{i=1}^{t}\left[y_i \log \hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right)\right] \tag{4}$$
$$J(y, \hat{y}) = -\frac{1}{t}\sum_{i=1}^{t}\sum_{c} y_{i,c}\log \hat{y}_{i,c} \tag{5}$$
where J is a function defined on y and $\hat{y}$, $\hat{y}$ is the predicted output calculated at the last layer by sigmoid or softmax of our proposed model, y is the actual label, t is the batch size, and c denotes the class category.
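The two cost functions can be written out directly. The sketch below uses their standard batch-averaged form, which is an assumption about the exact normalisation used in the paper; the clipping constant `EPS` and the example labels and predictions are illustrative only.

```python
import math

EPS = 1e-12  # keeps log() away from zero; the value is an assumption


def binary_cross_entropy(y_true, y_pred):
    """Mean binary cross-entropy over a batch of t samples."""
    t = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, EPS), 1.0 - EPS)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / t


def categorical_cross_entropy(y_true, y_pred):
    """Mean categorical cross-entropy; y_true rows are one-hot vectors."""
    t = len(y_true)
    total = 0.0
    for one_hot, probs in zip(y_true, y_pred):
        total += sum(y * math.log(max(p, EPS)) for y, p in zip(one_hot, probs))
    return -total / t


# Confident correct predictions give a smaller loss than poor ones.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
poor = binary_cross_entropy([1, 0], [0.4, 0.6])
multi = categorical_cross_entropy([[1, 0, 0]], [[0.5, 0.25, 0.25]])
```

The comparison between `good` and `poor` reflects the statement above: the closer the predicted probabilities are to the actual labels, the smaller the loss the optimiser has to reduce.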
3.1. Other Classical ML Models
This subsection highlights the brief theoretical concepts behind the other classical ML models that are used for cross-comparison in this study with the proposed Deep Neural Network model.
3.1.1. K-Nearest Neighbour
This learning algorithm is categorised as a supervised learning model and is known as a lazy learner because it does not learn a discriminative function from the training data but rather memorises the data. By contrast, a model such as logistic regression learns its weights during the training process. The KNN algorithm is relatively straightforward; its working can be summarised in the following three main points:
Choose the number of neighbours k and the distance metric.
Locate the k nearest neighbours of the test sample.
Assign the label held by the majority of the k neighbours.
Different distance metrics exist, such as the Manhattan, Minkowski, and Euclidean distances. Among these, the Euclidean distance is the most widely used in KNN. The Euclidean and Manhattan distances are special cases of the Minkowski distance, which is given in Equation (6):
$$d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^{p}\right)^{1/p} \tag{6}$$
where x and y are two samples with n features each. Changing the parameter p yields other distance metrics: if $p = 2$, the equation becomes the Euclidean distance, and if $p = 1$, it becomes the Manhattan distance.
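The three KNN steps and the Minkowski distance can be sketched together as follows. The function names, the toy two-feature samples, and the labels "normal" and "attack" are hypothetical and serve only to show the mechanics.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance; p=2 is Euclidean and p=1 is Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_x, train_y, sample, k=3, p=2):
    # Step 1 is the choice of k and p, passed in as arguments.
    # Step 2: locate the k training points closest to the sample.
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: minkowski(pair[0], sample, p))
    nearest_labels = [label for _, label in ranked[:k]]
    # Step 3: assign the majority label among those neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated groups of points.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]

label_a = knn_predict(train_x, train_y, (0.5, 0.5))
label_b = knn_predict(train_x, train_y, (5.5, 5.5), p=1)
```

Note that nothing is fitted in advance: every prediction scans the stored training data, which is what makes KNN a lazy learner.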
3.1.2. Decision Tree
This model breaks the data down in a hierarchical manner in order to make predictions on new data; this hierarchical learning style is why it is called a decision tree. The model also belongs to supervised learning and can handle both classification and regression problems. It builds a tree in which each internal node represents an attribute and each leaf node represents a class label. The main working of the decision tree can be described as:
Find the best attribute and place it at the root of the tree.
Make subsets of the training data in such a way that each subset contains data with the same value for an attribute.
Repeat the above two steps until a leaf node is reached.
Assume a dataset consists of n attributes. To select the best attribute as the root node of the tree, mathematical measures such as information gain and the Gini index are used. Information gain is mostly used when the attribute is categorical, while the Gini index is favourable for continuous attributes. Information gain is a reduction in entropy, where entropy measures the randomness of the data within the attributes or features of a dataset. Mathematically, entropy can be represented as:
$$E(S) = -\sum_{c} p_c \log_2 p_c \tag{7}$$
where S is the set of samples and $p_c$ denotes the proportion of the samples that belong to class c. The information gain of an attribute A is the entropy of S minus the weighted entropy of the subsets $S_v$ obtained by splitting S on each value v of A. Mathematically, it is represented as:
$$IG(S, A) = E(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} E(S_v) \tag{8}$$
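Entropy and information gain, in their standard definitions, can be computed as in the sketch below. The attribute names (`flag`, `port`), the values, and the labels are made up for illustration; the attribute with the highest gain would be chosen for the root node.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after the split."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Toy data: "flag" separates the classes perfectly, "port" not at all.
rows = [
    {"flag": "syn", "port": 1883},
    {"flag": "syn", "port": 1883},
    {"flag": "ack", "port": 1883},
    {"flag": "ack", "port": 1883},
]
labels = ["attack", "attack", "normal", "normal"]

gain_flag = information_gain(rows, labels, "flag")
gain_port = information_gain(rows, labels, "port")
best_attribute = max(["flag", "port"],
                     key=lambda a: information_gain(rows, labels, a))
```

Here the labels start with an entropy of one bit; splitting on `flag` removes all of it, while splitting on `port` removes none.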
3.1.3. Random Forest
The Random Forest learning model is a type of supervised ML model. It is an ensemble model that makes use of multiple trees in predicting a target, and it is used for regression and classification problems. It takes n samples as input and creates multiple trees, each based on a subset of the input features. A majority vote over the results of all trees then gives the final prediction for the target class variable.
Assume m denotes the total number of features in the data; the main working of this learning model can be summarised in the following points.
Select k features randomly from the m features of the data such that $k \ll m$.
Calculate the best split among the k selected features.
Split the node into child nodes using the best split.
Repeat the above steps until a leaf node is reached.
Build a forest of trees by repeating the above steps.
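The steps above can be sketched in a deliberately reduced form. To keep the example short, each tree here is a depth-one tree (a decision stump) scored with the Gini index, so the "repeat until a leaf node is reached" step is omitted, and every tree sees all training rows; full random forest implementations usually grow deeper trees on bootstrap samples. All names and data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows, labels, k, rng):
    """Depth-one tree: best threshold split among k randomly chosen features."""
    m = len(rows[0])
    best = None
    for f in rng.sample(range(m), k):  # select k of the m features at random
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:  # keep the best split so far
                best = (score, f, threshold, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        label = majority(labels)
        return (0, float("inf"), label, label)
    return best[1:]


def fit_forest(rows, labels, n_trees=15, k=1, seed=0):
    rng = random.Random(seed)
    return [fit_stump(rows, labels, k, rng) for _ in range(n_trees)]


def predict(forest, row):
    # Each tree votes; the majority vote is the final prediction.
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


rows = [(1, 10), (2, 11), (3, 12), (8, 20), (9, 21), (10, 22)]
labels = ["normal"] * 3 + ["attack"] * 3
forest = fit_forest(rows, labels)
```

With `k=1` each tree is restricted to a single randomly chosen feature, so the trees differ from one another and the majority vote combines their individual decisions.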
3.1.4. Naive Bayes
This learning model is based on Bayes' rule for learning and predicting the class label of new instances. Bayes' theorem provides a way of calculating the posterior probability of a class, as depicted in Equation (9) below.
$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \tag{9}$$
where $P(c \mid x)$ indicates the posterior probability of the target class given the independent variable x, $P(c)$ indicates the prior probability of the target class, $P(x \mid c)$ represents the likelihood, and $P(x)$ is the prior probability of the independent variable. Compared with the other models, NB makes fast predictions on the test set and performs well in multi-class classification problems. scikit-learn provides several Naive Bayes variants, including Gaussian, Multinomial, and Bernoulli.
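A small numeric example shows how Bayes' theorem turns priors and likelihoods into a posterior. The probabilities below are invented for illustration and are not taken from either dataset; the function name `posterior` is hypothetical.

```python
def posterior(priors, likelihoods):
    """Return P(c | x) for each class c, given P(c) and P(x | c)."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # P(x), by the law of total probability
    return {c: joint[c] / evidence for c in joint}


# Illustrative numbers only: attacks are rare, but the observed feature
# value x is far more likely under an attack than under normal traffic.
priors = {"normal": 0.9, "attack": 0.1}
likelihoods = {"normal": 0.05, "attack": 0.6}

post = posterior(priors, likelihoods)
predicted = max(post, key=post.get)
```

Even with a prior of only 0.1, the attack class ends up with the larger posterior here because its likelihood is twelve times higher; the classifier predicts whichever class has the highest posterior.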
5. Conclusions and Future Direction
This paper presents a DNN-based intrusion detection system for MQTT-enabled IoT smart systems. The recently published MQTT-IoT-IDS2020 dataset and another MQTT dataset are used to evaluate the performance of the proposed model. The MQTT-IoT-IDS2020 dataset contains three abstraction levels of features of MQTT-enabled IoT, namely Packet-flow, Bi-flow, and Uni-flow. Each of these contains five files representing attack and normal scenarios. The data were organised such that each feature level has its own subset in order to assess the performance in binary and multi-class attack classification. The tests were conducted with batch sizes of 32, 64, and 128 for binary and multi-class classification. The results show that increasing the batch size of the training subset improves the convergence and performance of the classifier. The performance of the proposed DL-based IDS, with a default learning rate and the ADAM optimiser, was compared with that of conventional ML-based IDSs, including KNN, NB, DT, and RF. Furthermore, the proposed model was tested for binary as well as multi-class attack classification with different activation functions at the output layer. For the Bi-flow and Uni-flow data, the DL-based model achieves 99% accuracy for binary and 98% accuracy for multi-class attack classification. For the Packet-flow data, however, the accuracies for binary and multi-class classification were 94% and 90%, respectively. We also tested the performance of the proposed model against attacks such as DoS and MitM over an MQTT-based IoT system. From the results and comparison tables, it was evident that the proposed model has higher accuracy than other state-of-the-art deep learning models. In the future, we intend to investigate the vulnerability of various IoT protocols to new types of attacks. Our aim is to propose a novel deep learning-based model for new vulnerabilities.