From Zero-Shot Machine Learning To Zero-Day Attack Detection
https://doi.org/10.1007/s10207-023-00676-0
REGULAR CONTRIBUTION
Abstract
Machine learning (ML) models have proved efficient in classifying data samples into their respective categories. The standard
ML evaluation methodology assumes that test data samples are derived from pre-observed classes used in the training phase.
However, in applications such as Network Intrusion Detection Systems (NIDSs), obtaining data samples of all attack classes
to be observed is challenging. ML-based NIDSs face new attack traffic known as zero-day attacks that are not used in training
due to their non-existence at the time. Therefore, this paper proposes a novel zero-shot learning methodology to evaluate
the performance of ML-based NIDSs in recognising zero-day attack scenarios. In the attribute learning stage, the learning
models map network data features to semantic attributes that distinguish between known attacks and benign behaviour. In
the inference stage, the models construct the relationships between known and zero-day attacks to detect them as malicious.
A new evaluation metric, the Zero-day Detection Rate (Z-DR), is defined to measure the effectiveness of the learning model
in detecting unknown attacks. The proposed framework is evaluated using two key ML models and two modern NIDS data
sets. The results demonstrate that ML-based NIDSs are ineffective in detecting certain zero-day attack groups identified in
this paper as malicious. Further analysis shows that attacks with a low Z-DR have a significantly distinct feature
distribution and a higher Wasserstein Distance range than the other attack classes.
Keywords Machine learning · Network Intrusion Detection System · Wasserstein Distance · Zero-day attacks · Zero-shot
learning
that targeted local escalation privileges [14]. Generally, when a zero-day attack is discovered, it is added to the publicly shared Common Vulnerabilities and Exposures (CVE) list [15] and defined using a CVE code and a severity level [15]. From a network layer perspective, zero-day attack detection is generally carried out by adding threat-related IOCs to a list of detection databases [16] used by signature-based NIDSs. As such, signature-based NIDSs are deemed unreliable in detecting zero-day attacks simply because the complete set of IOCs has not been discovered or registered for monitoring at the time of exploitation.

Organisations protected by signature-based NIDSs are vulnerable to zero-day attacks until the IOCs associated with the threat have been discovered. Therefore, the focus has been diverted to the design of ML-based NIDSs [12], an enhanced modern edition of traditional NIDSs that aims to overcome the limitations faced in the detection of zero-day or unseen attacks. ML-based NIDSs are designed and deployed to scan and analyse incoming network traffic for any anomalies or malicious intent [12]. The analysis is carried out by comparing the behavioural pattern of the incoming network traffic with the learnt behaviour of safe and intrusive traffic [17]. During the design process, the ML model is trained using a set of benign and attack samples, where the hidden complex traffic pattern is learnt. Unlike signature-based NIDSs, which rely solely on IOCs for detection, ML-based NIDSs utilise the learnt behavioural pattern to detect network attacks [17]. This has great potential for detecting zero-day attacks, as the requirement of obtaining IOCs becomes obsolete [18]. The main difference between signature- and ML-based NIDSs is the detection engine: signature-based detection relies on IOCs, whereas ML-based detection focuses on malicious and benign behavioural patterns. Most of the available research work has aimed at designing and evaluating ML-based NIDSs to detect known attack groups. However, limited research has focused on evaluating zero-day attack detection to measure the benefits of ML-based NIDSs over signature-based NIDSs.

A large number of proposed ML-based NIDSs do not consider the most likely re-occurring scenario of zero-day attacks, where a new attack class may appear after the learning stage and deployment of the ML model. Zero-shot learning (ZSL) is an emerging methodology used to evaluate and improve the generalisability of ML models to new or unseen data classes [19]. This technique assumes that the training data set might not include the entire set of classes that the ML model could observe once deployed in the real world. ZSL addresses the ever-growing set of classes that might render it unfeasible to collect training samples for each of them [20]. ZSL involves the recognition of new data samples derived from previously unseen classes. As such, ZSL addresses one of the main challenges in building a reliable NIDS: the evaluation of recognising new attack classes that are not available in the training phase [21]. This includes zero-day attacks that could lead to fatal consequences for the adopting organisation if undetected [13]. A reliable ML-based NIDS must be evaluated across a test set of unknown attacks not available in the training set (unseen classes), simulating the likely scenario of a zero-day threat.

This paper proposes a new ZSL framework to evaluate the performance of ML-based NIDSs in recognising zero-day attack scenarios. The framework measures how well an ML-based NIDS can detect unseen attacks using a set of semantic attributes learnt from seen attacks. There are two main stages in the proposed ZSL setup. In the attribute learning stage, the models extract and map the network data features to the unique attributes of known attacks (seen classes). In the inference phase, the model associates the relationships between seen and zero-day (unseen) attacks to assist in their discovery and classification as malicious. The training and testing sets containing the seen and unseen classes remain disjoint throughout the setup. Unlike traditional evaluation methods, the proposed set-up aims to evaluate ML-based NIDSs in detecting zero-day attacks using a new metric, the Zero-day Detection Rate (Z-DR). The proposed methodology has been implemented using two ML models widely used in the research field and evaluated on two key NIDS data sets, each consisting of a wide range of modern attacks. Furthermore, the results obtained were analysed using the Wasserstein Distance (WD) technique to investigate and explain the variation in the Z-DR across different attack groups. The key contributions of this paper are a) the proposal of a novel ZSL-based methodology to evaluate NIDSs in the recognition of unseen (zero-day) attack types, b) the implementation of the framework using two widely used ML models and two modern NIDS data sets, and c) the analysis and explanation of the detection results using the WD technique. In Sect. 2, key related works are discussed, followed by a detailed explanation of the proposed ZSL-based methodology in Sect. 3. The experimental methodology followed in this paper and the results obtained are discussed and explained in Sects. 4 and 5, respectively.

2 Related works

This section discusses key related papers that aim to evaluate NIDSs for the detection of zero-day attacks. Although most of the articles propose sophisticated ML-based NIDSs [22], the evaluation focuses on detecting a range of known attacks, where traditional signature-based NIDSs have achieved satisfactory performance throughout the years. Therefore, it is surprising that only a few papers have attempted to challenge ML-based NIDSs in the detection of unknown or zero-day attacks. In the case of unsupervised anomaly detection systems, where the model only learns the behaviour of benign
traffic, NIDSs fundamentally work to detect each attack type as an unknown attack. However, it is noted that such models lead to many false alarms, causing alert fatigue [23], as they do not consider the attack behavioural pattern. Overall, a limited number of papers follow a ZSL methodology to detect zero-day attacks. To the best of our knowledge, none of these works has aimed to utilise modern network data sets that represent current network traffic characteristics to evaluate their approach.

In [24], the author evaluated the zero-day attack detection performance of a signature-based NIDS. The paper studies the frequent claim that such systems cannot detect zero-day attacks. The experiment studies 356 network attacks, of which 183 are unknown (zero-day) to the ruleset. The paper utilised the Snort tool, a well-known signature-based NIDS, and the Metasploit framework to simulate attack scenarios. The detection rate is calculated by applying a Snort rule set that does not disclose the vulnerabilities relevant to the attack. The results show that Snort has an unreliable detection rate of 17% against zero-day attacks. The paper argues that the frequent claim that signature-based NIDSs cannot detect zero-day attacks is incorrect, since 17% is significantly larger than zero. The author mentions that more mechanisms should be implemented to complement signature-based NIDSs in detecting unregistered attacks. The results of this paper can be seen as a baseline for zero-day attack detection.

Zhang et al. [21] have evaluated the detection performance of ML-based NIDSs against zero-day attacks. The authors used ZSL to simulate the occurrence of zero-day attack scenarios, with a sparse autoencoder model that projects the features of known attacks into a semantic space and establishes a feature-to-semantic mapping to detect unknown attacks. The ML models learn the distinguishing information between the attack and benign classes by mapping the feature and attribute space. The paper used the attacks present in the NSL-KDD data set, released in 1998, to simulate a zero-day scenario; the data set contains four attack scenarios. The results demonstrate that the average accuracy is 88.3% across all available attacks in the data set.

In [25], Hindy et al. aimed to improve unsupervised outlier-based detection systems that generally suffer from a high false alarm rate (FAR). The paper explored an autoencoder to detect zero-day attacks while maintaining a high detection rate and lowering the FAR. The system is evaluated across two key data sets, CICIDS2017 and NSL-KDD. The methodology involved training the classifiers using benign data samples and evaluating the detection of zero-day attacks. The results are compared to a one-class support vector machine, where the autoencoder is superior. The results demonstrate a zero-day detection accuracy of 89–99% for the NSL-KDD data set and 75–98% for the CICIDS2017 data set. However, the proposed models do not consider attack behaviour, and the number of undetected attacks and false alarms is unmeasured.

Li et al. [26] focused on attribute learning methods to detect unknown attack types. The authors followed a ZSL method to design an NIDS that overcomes the limitations in anomaly detection faced by current methods. The architecture involves a pipeline using Random Forest (RF) feature selection and a spatial clustering attribute conversion method. The results demonstrate that the proposed method outperforms state-of-the-art approaches in anomaly detection. The attribute learning framework converts network data samples into unsupervised cluster attributes. The NSL-KDD data set has been used to evaluate the proposed framework, where it could detect DoS (apache2) and Probe (saint) attacks, achieving an overall accuracy of 34.71%. The authors compared its performance with a decision tree classifier that achieved a poor overall accuracy of 13.59%.

In [27], Kumar et al. propose a robust detection model to detect zero-day attacks. The model utilises the heavy-hitter concept to derive signatures for high-volume attacks and a graph technique to derive signatures for low-volume attacks. The proposed framework consists of two stages: signature generation and evaluation. The detection accuracy is evaluated using signatures generated in the training phase. Using a real-time attack data set, accuracies of 91.33% and 90.35% were achieved following binary- and multi-classification methods, respectively. Using the benchmark CICIDS18 data set, a performance of 91.62% and 88.98% was achieved.

In general, several studies have evaluated the performance of ML-based NIDSs in detecting unknown attacks. However, only a small number adopted the emerging ZSL-based setup to simulate the occurrence of zero-day attacks. Moreover, minimal experimental work has been done on current zero-day attack scenarios with recent data sets and attack types, which limits the identification of sophisticated attacks that cannot be detected in zero-day scenarios. In addition, it is surprising that some recent work still uses the NSL-KDD data set for evaluation purposes, given that it is more than 20 years old. The attack scenarios in the data set do not represent modern network traffic characteristics and threats, limiting the reliability and evaluation of the proposed methodology [28]. In this paper, a ZSL approach is proposed to evaluate ML models in the recognition of a broader range of modern zero-day attacks. A new metric defined as Z-DR is utilised to measure the detection accuracy of each unseen class. The results presented in this paper are explained and analysed using the WD technique to provide additional insights.
3 Proposed methodology

In a traditional ML evaluation methodology, the learning model is trained and tested on the same set of data classes. The model learns to identify patterns directly from each data class in the training stage. In the testing stage, the model applies the learnt patterns to identify data samples derived from the same data classes used in the training stage. The data set used in an experimental set-up is split into training and testing partitions. The learning model is trained on the training set, which contains the same number and type of classes present in the test set used in the evaluation stage. This evaluation approach follows the assumption that the data set collected for the training of ML models includes the complete set of classes that the model will observe post-deployment in production. In the case of currently proposed ML-based NIDSs, the model is trained and tested using a set of known attack classes. Therefore, the model is evaluated to determine how well it can detect data samples derived from known attack groups as malicious.

The training set D_tr and testing set D_tst of an NIDS data set can be represented as follows:

D_{tr} = \{(x, y) \mid x \in X_{tr},\; y \in Y_{tr}\}   (1)
D_{tst} = \{(x, y) \mid x \in X_{tst},\; y \in Y_{tst}\}   (2)

where X_{tr} \subset X and X_{tst} \subset X, in which x represents a data sample (flow) chosen from the training set X_tr or testing set X_tst, X represents all data samples, y represents the corresponding label, Y_tr represents the set of class labels observed in the training phase, and Y_tst represents the set of class labels used in the testing phase. In traditional ML, Y_tr = Y_tst, that is, the set of classes observed during the training phase is identical to the set of classes encountered by the model during testing.

The traditional ML set-up has been commonly used in the ML-based NIDS evaluation process, proving effective in measuring the detection rate of known attack classes. However, obtaining data samples for each attack class is challenging for different reasons. For instance, zero-day attacks have emerged repeatedly over the past few decades and present a severe risk to organisations' computer networks. A zero-day attack can be a new kind of modified threat that has not been seen or available earlier [13]. Furthermore, due to the wide variety of tactics and techniques used in executing network attacks, each threat presents a unique behavioural pattern, and collecting samples of each attack type for ML training is unrealistic. Therefore, the traditional ML evaluation set-up cannot support the conclusion that ML-based NIDSs are effective in detecting zero-day or unseen attack scenarios, as such attacks are unavailable at the time of training.

ZSL techniques have been adopted to address such shortcomings in the evaluation of ML systems that are required to detect a more extensive set of classes than the one used in training. ZSL was developed principally to overcome the issue of not having training samples available. ZSL is a promising approach to leverage supervised learning for the recognition of classes with no available training data samples [19]. Unlike traditional ML methods, the objective of ZSL is to improve the recognition of unseen classes by generalising the learning model to data samples not derived from pre-observed classes. This approach overcomes the limitation of evaluating ML-based NIDSs in the detection of zero-day attacks, because the collection of data samples of zero-day attacks remains an impossible task simply due to their absence at the time of the ML-based NIDS development phase.

In this paper, we propose a ZSL-based methodology, illustrated in Fig. 1, to evaluate ML-based NIDSs in the recognition of zero-day attacks. The proposed methodology overcomes the necessity of collecting training data samples of all the attack classes that the model will observe post-deployment. In the attribute learning stage, the model captures the semantic attributes of the attack behaviour using a set of known attacks and benign data samples. The attributes present the distinguishing vectors between the attack and benign network traffic. In the inference stage, the learnt knowledge is utilised to reconstruct the relationships between known attack classes and the zero-day attack to classify the unseen threat as malicious. Three main data concepts exist as part of the proposed methodology: 1) Known attacks: precedent attacks for which labelled data samples are available during training. 2) Zero-day attacks: unknown attacks that will emerge post-deployment and for which labelled data samples are unavailable during training. 3) Semantic attributes: the distinguishing information that the ML model will learn from the known attacks to detect the zero-day attacks.

The proposed methodology assumes that the model is evaluated, at the testing stage, using zero-shot samples derived from an attack class that is unavailable during the training stage. Given an NIDS data set, we can define a ZSL training set D^z_tr for attack class z as follows:

D^{z}_{tr} = \{(x, y) \mid x \in X_{tr},\; y \in Y^{z}_{tr} = \{b, a_1, a_2, \ldots, a_n\} \setminus \{a_z\}\}, \quad z \in \{1, \ldots, n\}   (3)
D_{tst} = \{(x, y) \mid x \in X_{tst},\; y \in Y_{tst} = \{b, a_1, a_2, \ldots, a_n\}\}   (4)

where X_{tr} \subset X and X_{tst} \subset X. The set of training classes Y^z_tr consists of benign traffic b and the n attack classes a_1, ..., a_n, but importantly, minus the zero-day attack class a_z. In contrast, the test data set D_tst always consists of samples of all classes, without removing any attack class.
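For illustration, the leave-one-attack-out split defined in Eqs. (3) and (4) can be sketched as follows. This is a minimal sketch, assuming the NIDS data set is loaded as a pandas DataFrame with an Attack column holding "Benign" plus the attack class names, and using a single random split for brevity; the column and label names are illustrative rather than taken verbatim from the data sets.

import pandas as pd
from sklearn.model_selection import train_test_split

def zsl_split(df: pd.DataFrame, zero_day_class: str, test_size: float = 0.3):
    """Build D_tr^z and D_tst for one zero-day scenario (Eqs. 3 and 4)."""
    # Split the full data set into disjoint training and testing partitions.
    train_df, test_df = train_test_split(df, test_size=test_size, random_state=0)
    # Remove every sample of the held-out zero-day class a_z from training only;
    # the test partition keeps all classes, including the unseen one.
    train_df = train_df[train_df["Attack"] != zero_day_class]
    return train_df, test_df

# Example: treat "Fuzzers" as the unseen zero-day attack class.
# d_tr_z, d_tst = zsl_split(df, zero_day_class="Fuzzers")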
where H is the set of learnt attributes used to predict zero-day attacks.

During the inference phase, the zero-day attack traffic class a_z is added to the test set to measure the zero-day detection accuracy. Therefore, the test set includes known attacks, the zero-day attack, and benign data samples.

4 Experimental setup

The evaluation of the ML-based NIDS capability to detect zero-day attacks is crucial. In this paper, two commonly used ML models have been used in the design of ML-based NIDSs,
Random Forest (RF) [30] and Multi-Layer Perceptron (MLP) [31]. The RF classifier is designed using 50 randomised decision tree classifiers in the forest. The model utilises the Gini impurity loss function [32] to measure the quality of a split, with no maximum tree depth defined. The RF model requires a minimum of 2 data samples to split an internal node and 1 data sample at a leaf node. The MLP neural network model is structured with 100 neurons in two hidden layers, each applying the Rectified Linear Unit (ReLU) [33] activation function. The Adam optimiser is used for the model's loss function and parameter optimisation with a 0.001 learning rate. A 0.0001 L2 regularisation parameter is used to avoid over-fitting, and the training rounds are set to 50. The semantic representations are learnt by the RF and MLP models in the training phase using their respective loss optimisation functions. A fivefold cross-validation method is adopted in the inference stage to calculate the mean results.
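As a sketch, the two classifiers could be configured with scikit-learn as shown below, assuming the stated hyper-parameters map directly onto RandomForestClassifier and MLPClassifier and that each of the two hidden layers holds 100 neurons; the exact implementation used for the experiments is not reproduced here.

from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# RF: 50 randomised trees, Gini impurity, unlimited depth,
# min_samples_split=2 and min_samples_leaf=1 (scikit-learn defaults).
rf = RandomForestClassifier(
    n_estimators=50,
    criterion="gini",
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
)

# MLP: two hidden layers (assumed 100 neurons each), ReLU activations,
# Adam optimiser with a 0.001 learning rate, L2 penalty (alpha) of 0.0001,
# and 50 training rounds (max_iter).
mlp = MLPClassifier(
    hidden_layer_sizes=(100, 100),
    activation="relu",
    solver="adam",
    learning_rate_init=0.001,
    alpha=0.0001,
    max_iter=50,
)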
In this paper, two NIDS data sets are used to evaluate the ML models following the proposed methodology, i.e. UNSW-NB15 [34] and NF-UNSW-NB15-v2 [35]. The data sets are synthetic and were created via virtual network testbeds representing modern network structures. In designing such data sets, specific attack scenarios are conducted, and the corresponding network traffic is captured and labelled with the respective attack type. In addition, benign network traffic is generated, captured, and labelled accordingly. Both malicious and non-malicious traffic are captured in the native packet capture (pcap) format, and network data features are extracted to represent explicit information regarding the data flow. The chosen data sets include a variety of modern network attacks, each of which can be used to simulate the arrival of a zero-day attack. Such data sets have been widely used in the literature, as they do not present the privacy limitations faced by the collection and labelling of real-world production networks.

• UNSW-NB15 [34] - A well-known and widely used NIDS data set released in 2015 by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS). The synthetic data set uses the IXIA Perfect Storm tool to generate benign network activities and pre-meditated attack scenarios. The data set contains 49 features, listed and discussed in [34], extracted by the Argus and Bro-IDS tools and twelve additional SQL algorithms. The data set consists of 2,218,761 (87.35%) benign and 321,283 (12.65%) attack samples, that is, 2,540,044 network data samples in total.

• NF-UNSW-NB15-v2 [35] - A NetFlow data set based on the UNSW-NB15 data set, generated and released in 2021. The data set is generated by extracting 43 NetFlow-based features, explained in [35], from the pcap files of the UNSW-NB15 data set. The nprobe feature extraction tool extracts network data flows, and the flows are labelled using the appropriate data labels. The total number of data flows is 2,390,275, of which 95,053 (3.98%) are attack samples and 2,295,222 (96.02%) benign.

This paper uses the complete set of data samples in each data set. This is required as distinct nodes on the testbed have been used to launch attack scenarios targeting specific network ports. Initially, the flow identifiers such as sample id, source/destination IPs, source/destination ports, and timestamps are dropped to avoid learning bias towards the attacking and victim endpoints. Moreover, all categorical features are converted to numerical values using the label encoding technique, where each label is assigned a unique integer. Once a complete numerical data set is obtained, the Min-Max Scaler technique is applied to normalise all values between 0 and 1 to accommodate efficient experiments.

The standard classification performance metrics of precision, detection rate (DR), false alarm rate (FAR), area under the curve (AUC), and F1 score are used for our evaluation. These metrics are defined based on the numbers of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN), as shown in Table 1. In addition to these standard metrics, we define a new evaluation metric called the Zero-Day Detection Rate (Z-DR_z), also shown in Table 1, which is the specific detection rate of the zero-day attack class a_z that is excluded from the training data set. TP_{a_z} and FN_{a_z} are the numbers of True Positives and False Negatives explicitly calculated for the samples of the zero-day attack class a_z. The new metric, defined in Eq. 6, measures how well the ML model can detect zero-day attacks of class a_z. The Z-DR_z is used to explicitly measure the performance of the trained model in recognising the zero-day attack samples, while the DR provides insights into the detection of the complete set of attack samples.

Z\text{-}DR_{z} = \frac{TP_{a_z}}{TP_{a_z} + FN_{a_z}} \times 100   (6)
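For clarity, Eq. 6 can be computed directly from the binary predictions over the test set, assuming the multi-class ground-truth labels are kept alongside so that the samples of the held-out class a_z can be isolated; the names below are illustrative.

import numpy as np

def zero_day_detection_rate(y_true_class, y_pred_binary, zero_day_class):
    """Z-DR (Eq. 6): detection rate restricted to samples of the held-out class a_z.

    y_true_class  : array of multi-class labels (e.g. "Benign", "Fuzzers", ...)
    y_pred_binary : array of binary predictions (1 = attack, 0 = benign)
    zero_day_class: name of the attack class that was excluded from training
    """
    y_true_class = np.asarray(y_true_class)
    y_pred_binary = np.asarray(y_pred_binary)

    mask = y_true_class == zero_day_class      # samples of the zero-day class a_z
    tp = np.sum(y_pred_binary[mask] == 1)      # zero-day samples flagged as attacks
    fn = np.sum(y_pred_binary[mask] == 0)      # zero-day samples missed (labelled benign)
    return 100.0 * tp / (tp + fn)

# Example: Z-DR for the "Fuzzers" class held out of training.
# z_dr = zero_day_detection_rate(y_test_class, model.predict(X_test), "Fuzzers")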
5 Evaluation

In this section, two ML models, MLP and RF, have been used to detect zero-day attacks using our proposed ZSL evaluation framework. The experiments use two synthetic NIDS data sets (UNSW-NB15 and NF-UNSW-NB15-v2). Each available attack class in the data sets is considered in turn to simulate a zero-day attack incident. The models are evaluated based on the Z-DR and the overall detection accuracy over the test set, which includes known attacks, a zero-day attack, and benign data samples. This represents a generalised ZSL set-up where the test set includes known and unknown data samples, which is appropriate for ML-based NIDS evaluation. The baseline used for the Z-DR comparison is the traditional DR metric, to highlight the difference in each scenario. Finally, the results are analysed using WD to explain the variance of the Z-DR across different attack classes.

5.1 Results

Tables 2, 3, 4 and 5 display the complete set of results collected. Each table represents a unique ML model and data set combination. The first column in each table lists the attacks used to simulate a zero-day attack incident. The second column displays the corresponding Z-DR value, and the rest present the remaining evaluation metrics collected over the complete test set, including the zero-day attack, known attacks, and benign data samples.

In Tables 2 and 3, the performance of the MLP and RF classifiers, when evaluated using the UNSW-NB15 data set, is presented. During the simulation of zero-day attacks, the Exploits, Reconnaissance, and DoS attacks are detected at around 90% using the MLP classifier. The RF classifier is more effective in detecting Exploits and DoS attacks. The MLP and RF models detect only 20% and 15% of the Fuzzers attack data samples, respectively. The MLP model is superior to RF in detecting the Generic and Shellcode attack types, achieving high Z-DRs of 96% and 97% compared to 59% and 91%, respectively. The Analysis attack type is deemed complex to detect as a zero-day attack, where the MLP model achieved 84% and the RF model 81%. Other attack types, such as Backdoor and Worms, were almost entirely detected by both ML models when observed as zero-day attacks.

The performance of both ML models depends on the complexity of the incoming zero-day attacks. The models successfully detected 95% or more of the samples of attacks such as Generic, DoS, Backdoor, Shellcode, and Worms. Exploits, Reconnaissance, and Analysis are harder to detect, with both models achieving around 90% Z-DR. However, in the likely scenario of the models observing attacks related to the Fuzzers attack group as a zero-day attack, ML-based NIDSs would be highly vulnerable, as more than 80% of their data samples were undetected and classified as benign samples. The extremely low Z-DRs of both models present severe risks to organisations protected by ML-based NIDSs in the scenario of a new zero-day attack group similar to Fuzzers. The MLP classifier achieved an average detection rate of 85.5% across zero-day attacks. The RF classifier was slightly inferior, with an average detection rate of 80.67%.

In Fig. 3, the detection rate of each attack group in the UNSW-NB15 data set is measured in the known attack and zero-day attack scenarios. Figure 3a and 3b represent the performance using the MLP and RF models, respectively. The drop in detection rate is highly notable in certain attack types such as Fuzzers and Reconnaissance, where the DR value dropped by around 70% and 10%, respectively, for the two ML models. Furthermore, there are distinct differences in the performance of the two models. The MLP model was more successful in detecting Generic attacks as a zero-day, at a Z-DR of 95.90% compared to 59.06% achieved by RF.
Both models achieved a 100% detection rate when the attack class was observed in the training set. The RF classifier has been slightly more efficient in detecting the Exploits and DoS attack groups as a zero-day.

In Tables 4 and 5, the zero-day attack detection performance of the ML models is evaluated using NF-UNSW-NB15-v2, the NetFlow-based edition of the UNSW-NB15 data set. The MLP model is superior to the RF model in detecting the zero-day Exploits and Fuzzers attack groups, with detection rates of 82% and 76% compared to 59% and 51%, respectively. The ML models did not successfully apply the learnt semantic attributes of the attack behaviour to relate the Exploits and Fuzzers zero-day attacks to malicious traffic. Attacks such as Generic, Reconnaissance, Backdoor, and Shellcode present a significantly lower cybersecurity risk to organisations protected by an ML-based NIDS when observed for the first time as zero-day attacks. The utilised models correctly detected close to 100% of their data samples as intrusive traffic. Moreover, the DoS and Analysis attack groups were slightly harder to detect, as both ML models detected around 90% of their data samples.

Most of the attacks in the NF-UNSW-NB15-v2 data set were reliably detected using the two ML models in a zero-day attack scenario. The learning models successfully utilised the learnt information from known attacks to detect zero-day attack types. However, the Exploits and Fuzzers attack scenarios seem harder to detect if the ML models encounter them as zero-day attacks. The MLP and RF models achieve average Z-DR values of 92.45% and 87.37%, respectively. The UNSW-NB15 and NF-UNSW-NB15-v2 data sets contain the same attack groups and differ only in their respective feature sets. The NetFlow-based feature set of NF-UNSW-NB15-v2 results in an increased Z-DR of around 7% for each of the two ML models. This demonstrates an advantage of using NetFlow-based features in the detection of zero-day attack scenarios.

In Fig. 4, the detection rate of each attack group in the NF-UNSW-NB15-v2 data set is shown for the known and zero-day attack scenarios. Figure 4a and 4b show the performance of the MLP and RF models, respectively. In this data set, a significant drop in detection rates is observed for the Exploits and Fuzzers attack groups, with an average decrease of 28% and 35%, respectively, for the two ML models in a zero-day attack scenario. The ML models could detect the attacks of the remaining attack groups; however, the DoS and Analysis attacks were slightly harder to detect, even in a known attack scenario.

Overall, effective Z-DRs have been achieved by both ML models on most of the zero-day attack data samples. This demonstrates the efficiency of the proposed technique and increases the motivation to adopt ML-based NIDSs in securing organisational perimeters. However, the Fuzzers attack group is challenging for such systems to detect in a zero-day scenario. The Fuzzers group contains attack scenarios
in which the attacker sends a large amount of random data, causing a system to crash, while aiming to discover security vulnerabilities. From a security perspective, network scanning traffic often appears similar to benign traffic with an increased volume [36]. In the next section, we analyse the statistical distribution of the Fuzzers attack data samples.

5.2 Analysis

In order to investigate the results provided in the previous subsection, particularly the low Z-DRs of some attack classes, this section examines the distribution differences of features between the training and test sets, i.e. where all attacks are seen (training set) and where there is an unseen attack (testing set). Since the main objective of this analysis was to find any possible differences between the ZSL training and testing sets, statistical measures that could identify differences between (feature) distributions were explored.

The Wasserstein Distance (WD) metric is a commonly used tool in the ML community, which has been successfully used in [37] for quantifying feature distribution distances. The WD, also known as the Earth Mover distance, is mathematically defined as a distance function of two probability distributions u and v in Eq. 7 [38]:

W(u, v) = \inf_{\gamma \in \Gamma(u, v)} \int_{\mathbb{R} \times \mathbb{R}} |x - y| \, \mathrm{d}\gamma(x, y)   (7)

where \Gamma(u, v) is the set of (probability) distributions on \mathbb{R} \times \mathbb{R} whose first and second factor marginals are u and v. \gamma(x, y) can be interpreted as a transport plan/function that gives the amount of mass to move from each x to y in order to transport u to v, subject to the following constraints:

\int_{\mathbb{R}} \gamma(x, y) \, \mathrm{d}y = u(x), \qquad \int_{\mathbb{R}} \gamma(x, y) \, \mathrm{d}x = v(y)   (8)

This indicates that for an infinitesimal region around x, the total mass moved out must be equal to u(x)dx. Similarly, for an infinitesimal region around y, the total mass moved in must be equal to v(y)dy.

Accordingly, we use WD as the comparison metric and conducted a series of experiments to investigate the differences in the feature distributions of the training and test sets in 9 different zero-day scenarios (one per attack class in each data set). In each scenario, after selecting the training (D^z_tr) and testing (D^z_tst) sets, the distribution of each feature (except the flow identifier features that were removed in the pre-processing stage) was compared across the two sets using the WD metric (i.e. W(D^z_tr, D^z_tst) in the form of the Eq. 7 notation).
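A sketch of this per-feature comparison using SciPy's one-dimensional Wasserstein distance is shown below, assuming the ZSL training and test partitions are available as DataFrames of the preprocessed numeric features (flow identifiers already dropped); averaging the per-feature values gives the per-scenario summary reported in Fig. 5.

import pandas as pd
from scipy.stats import wasserstein_distance

def feature_wasserstein_distances(train_df: pd.DataFrame, test_df: pd.DataFrame) -> pd.Series:
    """Compute W(D_tr^z, D_tst^z) feature by feature for one zero-day scenario."""
    distances = {
        col: wasserstein_distance(train_df[col].values, test_df[col].values)
        for col in train_df.columns
    }
    return pd.Series(distances)

# Per-feature distances and their mean (the value plotted per attack in Fig. 5).
# wd = feature_wasserstein_distances(d_tr_z[features], d_tst[features])
# print(wd.sort_values(ascending=False).head(), wd.mean())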
Fig. 5 Average WD of the feature distributions (averaged over 45 features) in the training vs. test sets, W(D^z_tr, D^z_tst), for each zero-day attack: a) UNSW-NB15 data set, and b) NF-UNSW-NB15-v2 data set
The method is performed by measuring the WD between the set of known attacks and the set including the zero-day attack. Hence, a WD value corresponding to each feature of the data set was obtained for each zero-day attack scenario (45 WD values per zero-day attack scenario). A higher WD value for a feature indicates a more distinctive distribution between the training (D^z_tr) and testing (D^z_tst) sets of the corresponding zero-day attack.

Figure 5a and 5b show the average WD value of the feature distributions (averaged over 45 features) for each zero-day attack scenario, for the UNSW-NB15 and NF-UNSW-NB15-v2 data sets, respectively. In both figures, each column corresponds to an unseen/zero-day attack scenario. Accordingly, the WD value in each column is the average of the 45 per-feature WD values between the training and test sets in that zero-day attack scenario. As can be seen, most of the unseen/zero-day attacks have a low WD value of around 0.2, which indicates that the overall feature distributions are similar between the training and testing sets in these zero-day attack scenarios. This shows that these unseen/zero-day attacks have statistical feature distributions similar to the seen attacks, i.e. the attacks present in the training set. Due to the similarity in the attack types, a higher zero-day detection performance is expected. This mainly includes the Analysis, Backdoor, DoS, Reconnaissance, Shellcode, and Worms attacks. Taking into account Tables 2, 3, 4, and 5, the Z-DR values of these attacks indicate that they are detected with a high detection rate in a zero-day attack scenario. Our results also show only a minor degradation in their Z-DR values compared to their (non-zero-day) DR, using the same ML model.

Our analysis using the WD between feature distributions of different attack classes provides a solid explanation of the results presented and is consistent with the main findings of this paper. Overall, the WD function has identified several attack groups with a unique malicious pattern compared to the remainder of the attacks. This matches the results in this paper, as there is a significant difference between their Z-DR and DR values. Therefore, their detection as zero-day attacks using an ML-based NIDS will be challenging from an ML perspective. More studies are required to improve ML-based NIDSs in detecting unique attack behaviour related to sophisticated attacks.

6 Conclusion

A novel ZSL-based framework has been proposed to evaluate the performance of ML-based NIDSs in the recognition of unseen attacks, also known as zero-day attacks. In the attribute learning stage, the model learns the distinguishing attributes of the attack traffic using a set of known attacks. This is accomplished by mapping relationships between the network data features and semantic attributes. In the inference stage, the model is required to associate the relationships of the known attack behaviour to detect a zero-day attack. Using our proposed methodology, two well-known ML models have been designed to evaluate their ability to detect each attack present in the UNSW-NB15 and NF-UNSW-NB15-v2 data sets as a zero-day attack. The results demonstrate that while most attack classes have high Z-DR values, certain attack groups identified in this paper were unreliably detected as zero-day threats. The results presented in this paper were further analysed and confirmed using the WD technique, in which the statistical differences in feature distributions have been directly correlated with the WD and Z-DR metrics. The ability to detect zero-day attacks is an essential feature of ML-based NIDSs and is critical for their increased practical deployment in production networks. However, this vital issue has attracted only relatively limited attention in the
research literature. We hope that the work presented in this paper provides a basis and motivation for further research.

Funding Open Access funding enabled and organized by CAUL and its Member Institutions. No funding was received to assist with the preparation of this manuscript.

Research data policy and data availability statements All data generated or analysed during this study are included in this published article: Sarhan, M., Layeghy, S., Portmann, M.: Towards a standard feature set for network intrusion detection system datasets. Mobile Netw. Appl. 27, 357–370 (2022). https://doi.org/10.1007/s11036-021-01843-0

Declarations

Conflict of interest The authors have no competing interests to declare relevant to this article's content.

Human and animal participants This article does not contain any studies with human participants or animals performed by any authors.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Ghahramani, Z.: Probabilistic machine learning and artificial intelligence. Nature 521(7553), 452–459 (2015)
2. Panch, T., Szolovits, P., Atun, R.: Artificial intelligence, machine learning and health systems. J. Glob. Health 8(2) (2018)
3. Koza, J.R., Bennett, F.H., Andre, D., Keane, M.A.: Automated Design of Both the Topology and Sizing of Analog Electrical Circuits Using Genetic Programming, pp. 151–170. Springer Netherlands, Dordrecht (1996)
4. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E.: Deep learning applications and challenges in big data analytics. J. Big Data 2(1), 1–21 (2015)
5. Bloomfield, R., Khlaaf, H., Conmy, P.R., Fletcher, G.: Disruptive innovations and disruptive assurance: assuring machine learning and autonomy. Computer 52(9), 82–89 (2019)
6. Buczak, A.L., Guven, E.: A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun. Surv. Tutorials 18(2), 1153–1176 (2015)
7. Alrashdi, I., Alqazzaz, A., Aloufi, E., Alharthi, R., Zohdy, M., Ming, H.: AD-IoT: anomaly detection of IoT cyberattacks in smart city using machine learning. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0305–0310. IEEE (2019)
8. Dua, S., Du, X.: Data Mining and Machine Learning in Cybersecurity. CRC Press (2016)
9. Apruzzese, G., Colajanni, M., Ferretti, L., Guido, A., Marchetti, M.: On the effectiveness of machine and deep learning for cyber security. In: 2018 10th International Conference on Cyber Conflict (CyCon), pp. 371–390. IEEE (2018)
10. Mukherjee, B., Heberlein, L.T., Levitt, K.N.: Network intrusion detection. IEEE Netw. 8(3), 26–41 (1994)
11. Kumar, V., Sangwan, O.P.: Signature based intrusion detection system using Snort. Int. J. Comput. Appl. Inf. Technol. 1(3), 35–41 (2012)
12. Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G., Vázquez, E.: Anomaly-based network intrusion detection: techniques, systems and challenges. Comput. Secur. 28(1–2), 18–28 (2009)
13. Bilge, L., Dumitraş, T.: Before we knew it: an empirical study of zero-day attacks in the real world. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 833–844 (2012)
14. Stellios, I., Kotzanikolaou, P., Psarakis, M.: Advanced persistent threats and zero-day exploits in industrial internet of things. In: Security and Privacy Trends in the Industrial Internet of Things, pp. 47–68. Springer (2019)
15. Mell, P., Grance, T.: Use of the Common Vulnerabilities and Exposures (CVE) vulnerability naming scheme. Tech. rep., National Institute of Standards and Technology, Gaithersburg, MD, Computer Security Division (2002)
16. Ganame, K., Allaire, M.A., Zagdene, G., Boudar, O.: Network behavioral analysis for zero-day malware detection: a case study. In: International Conference on Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments, pp. 169–181. Springer (2017)
17. Sinclair, C., Pierce, L., Matzner, S.: An application of machine learning to network intrusion detection. In: Proceedings 15th Annual Computer Security Applications Conference (ACSAC'99), pp. 371–377. IEEE (1999)
18. Sahu, S., Mehtre, B.M.: Network intrusion detection system using J48 decision tree. In: 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2023–2026. IEEE (2015)
19. Xian, Y., Schiele, B., Akata, Z.: Zero-shot learning - the good, the bad and the ugly. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4582–4591 (2017)
20. Wang, W., Zheng, V.W., Yu, H., Miao, C.: A survey of zero-shot learning: settings, methods, and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–37 (2019)
21. Zhang, Z., Liu, Q., Qiu, S., Zhou, S., Zhang, C.: Unknown attack detection based on zero-shot learning. IEEE Access 8, 193981–193991 (2020)
22. Sommer, R., Paxson, V.: Outside the closed world: on using machine learning for network intrusion detection. In: 2010 IEEE Symposium on Security and Privacy, pp. 305–316. IEEE (2010)
23. Casas, P., Mazel, J., Owezarski, P.: Unsupervised network intrusion detection systems: detecting the unknown without knowledge. Comput. Commun. 35(7), 772–783 (2012)
24. Holm, H.: Signature based intrusion detection for zero-day attacks: (not) a closed chapter? In: 2014 47th Hawaii International Conference on System Sciences, pp. 4895–4904. IEEE (2014)
25. Hindy, H., Atkinson, R., Tachtatzis, C., Colin, J.-N., Bayne, E., Bellekens, X.: Utilising deep learning techniques for effective zero-day attack detection. Electronics 9(10), 1684 (2020)
26. Li, Z., Qin, Z., Shen, P., Jiang, L.: Zero-shot learning for intrusion detection via attribute representation. In: International Conference on Neural Information Processing, pp. 352–364. Springer (2019)
27. Kumar, V., Sinha, D.: A robust intelligent zero-day cyber-attack detection technique. Complex Intell. Syst. 7(5), 2211–2234 (2021)
28. Siddique, K., Akhtar, Z., Aslam Khan, F., Kim, Y.: KDD Cup 99 data sets: a perspective on the role of data sets in network intrusion detection research. Computer 52(2), 41–51 (2019)
29. Felix, R., Harwood, B., Sasdelli, M., Carneiro, G.: Generalised zero-shot learning with domain classification in a joint semantic and visual space. In: 2019 Digital Image Computing: Techniques and Applications (DICTA), pp. 1–8. IEEE (2019)
30. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
31. Hinton, G.E.: Connectionist learning procedures. In: Machine Learning, pp. 555–610. Elsevier (1990)
32. Breiman, L.: Some properties of splitting criteria. Mach. Learn. 24(1), 41–47 (1996)
33. Agarap, A.F.: Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375 (2018)
34. Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6. IEEE (2015)
35. Sarhan, M., Layeghy, S., Moustafa, N., Portmann, M.: Towards a standard feature set of NIDS datasets. arXiv preprint arXiv:2101.11315 (2021)
36. Corchado, E., Herrero, Á.: Neural visualization of network traffic data for intrusion detection. Appl. Soft Comput. 11(2), 2042–2056 (2011)
37. Layeghy, S., Gallagher, M., Portmann, M.: Benchmarking the benchmark - analysis of synthetic NIDS datasets. arXiv preprint arXiv:2104.09029 (2021)
38. Ramdas, A., Trillos, N.G., Cuturi, M.: On Wasserstein two-sample testing and related families of nonparametric tests. Entropy 19(2), 47 (2017)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.