The Evaluation of Network Anomaly Detection Systems: Statistical Analysis of the UNSW-NB15 Data Set and the Comparison with the KDD99 Data Set
To cite this article: Nour Moustafa & Jill Slay (2016): The evaluation of Network Anomaly
Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison
with the KDD99 data set, Information Security Journal: A Global Perspective, DOI:
10.1080/19393555.2015.1125974
INFORMATION SECURITY JOURNAL: A GLOBAL PERSPECTIVE
http://dx.doi.org/10.1080/19393555.2015.1125974
ABSTRACT
Over the last three decades, Network Intrusion Detection Systems (NIDSs), particularly Anomaly Detection Systems (ADSs), have become more significant than Signature Detection Systems (SDSs) in detecting novel attacks. Evaluating NIDSs using the existing benchmark data sets, KDD99 and NSLKDD, does not reflect satisfactory results, due to three major issues: (1) their lack of modern low-footprint attack styles, (2) their lack of modern normal traffic scenarios, and (3) the different distributions of their training and testing sets. To address these issues, the UNSW-NB15 data set has recently been generated. This data set contains nine types of modern attack fashions and new patterns of normal traffic, and it has 49 attributes that comprise flow-based features between hosts and packet-inspection features used to discriminate between normal and abnormal observations. In this paper, we demonstrate the complexity of the UNSW-NB15 data set in three aspects. First, a statistical analysis of the observations and the attributes is presented. Second, an examination of the feature correlations is provided. Third, five existing classifiers are used to evaluate the complexity in terms of accuracy and false alarm rates (FARs), and the results are compared with the KDD99 data set. The experimental results show that UNSW-NB15 is more complex than KDD99 and can be considered as a new benchmark data set for evaluating NIDSs.

KEYWORDS
Feature correlations; multivariate analysis; NIDSs; UNSW-NB15 data set
1. Introduction

Because of the ubiquitous usage of computer networks and the plurality of applications running on them, cyber attackers attempt to exploit weak points of network architectures to steal, corrupt, or destroy valuable information (DeWeese, 2009; Eom et al., 2012; Vatis, 2001). Consequently, the function of a NIDS is to detect and identify anomalies in network systems (Denning, 1987). NIDSs are classified into misuse-based (MNIDS) and anomaly-based (ANIDS) systems (Lee, Stolfo, & Mok, 1999; Moustafa & Slay, 2015a; Valdes & Anderson, 1995). In an MNIDS, known attacks are detected by matching against the stored signatures of those attacks (Lee et al., 1999; Vigna & Kemmerer, 1999), while an ANIDS creates a profile of normal activities and considers any deviation from this profile an anomaly (Ghosh, Wanken, & Charron, 1998; Valdes & Anderson, 1995). Several studies have stated that an MNIDS can often accomplish higher accuracy and a lower FAR than an ANIDS (Lee et al., 1999), but an ANIDS has the ability to detect novel attacks (Lazarevic et al., 2003). Therefore, ANIDSs are becoming a necessity rather than MNIDSs (Aziz et al., 2014; Bhuyan, Bhattacharyya, & Kalita, 2014; García-Teodoro, Díaz-Verdejo, Maciá-Fernández, & Vázquez, 2009).

Evaluating the efficiency of any NIDS requires a modern, comprehensive data set that contains contemporary normal and attack activities. McHugh (2000), Tavallaee et al. (2009), and Moustafa and Slay (2015a) stated that the existing benchmark data sets, especially KDD99 and NSLKDD, negatively affect NIDS results because of three major problems. First, they lack modern low-footprint attack fashions, for instance, stealthy or spy attacks that change their styles over time to become similar to normal behaviors (Cunningham & Lippmann, 2000; Tavallaee et al., 2009). Second, the existing data sets were created two decades ago, indicating that the
CONTACT Nour Moustafa [email protected] School of Engineering and Information Technology, University of New
South Wales at the Australian Defence Force Academy, Northcott Drive, Campbell, ACT 2600, Canberra, Australia.
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/uiss.
© 2016 Taylor & Francis
normal traffic of the existing benchmark data sets is different from the current normal traffic because of the revolution in network speeds and applications (McHugh, 2000). Third, the testing sets of the existing benchmark data sets contain some attack types that are not in the training sets; this means that the training and testing sets have different distributions (Tavallaee et al., 2009). The difference in distribution persuades classifier systems to skew toward some observations, causing a higher FAR (Cieslak & Chawla, 2009; Tavallaee et al., 2009).

In light of the above discussion, to address these challenges, the UNSW-NB15 data set has recently been released (Moustafa & Slay, 2014, 2015b). This data set includes nine categories of modern attack types and involves realistic normal traffic activities that were captured as they changed over time. In addition, it contains 49 features that comprise flow-based features between hosts (i.e., client-to-server or server-to-client) and packet-header features which cover in-depth characteristics of the network traffic.

A part of the UNSW-NB15 data set was decomposed into two partitions, the training and testing sets, to determine the analysis aspects. The goal of the three aspects is to evaluate the complexity of the training and testing sets. First, the Kolmogorov-Smirnov test (Justel, Peña, & Zamar, 1997; Massey, 1951) defines and compares the distributions of the training and testing sets; skewness (Mardia, 1970) measures the asymmetry of the features; and kurtosis (Mardia, 1970) estimates the flatness of the features. Reliable results can be achieved when these statistics are approximately similar for the features of the training and testing sets. Second, the feature correlations are measured from two perspectives: (1) the feature correlations without the class label, and (2) the feature correlations with the class label. To achieve the first perspective, Pearson's Correlation Coefficient (PCC) (Bland & Altman, 1995) is used; the Gain Ratio (GR) method (Hall & Smith, 1998) is utilised to achieve the second perspective. Third, five existing techniques, namely, Naïve Bayes (NB) (Panda & Patra, 2007), Decision Tree (DT) (Bouzida & Cuppens, 2006), Artificial Neural Network (ANN) (Bouzida & Cuppens, 2006; Mukkamala, Sung, & Abraham, 2005), Logistic Regression (LR) (Mukkamala et al., 2005), and Expectation-Maximisation (EM) Clustering (Sharif, Prugel-Benett, & Wills, 2012), are executed on the training and testing sets to assess the complexity in terms of accuracy and FARs. Further, the results on this data set are compared with the KDD99 data set (KDDCUP1999, 2007) to identify the capability of the UNSW-NB15 data set in appraising existing and novel classifiers.

The objective of the paper is to analyse the UNSW-NB15 data set statistically and practically. First, in the statistical aspect, the distribution of the data points specifies the suitable classification algorithms. To be clear, if a data set follows a Gaussian distribution, many statistical algorithms, for instance, the HMM and the Kalman filter, can be used; however, if a data set does not fit a Gaussian distribution, other algorithms, for example, particle filters and mixture models, are applied. Second, in the practical aspect, the adoption of the best attributes decreases false alarm rates and reduces the execution costs. For that purpose, feature correlations with and without the class label are demonstrated.

The rest of this paper is organised as follows: Section 2 describes the UNSW-NB15 data set. Section 3 discusses the training and testing sets extracted from this data set. Section 4 discusses the statistical mechanisms used on the two sets. Section 5 presents the feature correlation methods. Section 6 identifies the classification techniques which are involved to evaluate the complexity of the KDD99 and UNSW-NB15 data sets. Section 7 presents the experimental results of the statistical techniques, the feature correlations, and the complexity evaluation. Finally, Section 8 provides a conclusion to the paper and examines the future research area.

2. Description of the UNSW-NB15 data set

The UNSW-NB15 data set (Moustafa & Slay, 2014, 2015b) was created using the IXIA PerfectStorm tool (IXIA PerfectStormOne Tool, 2014) in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) (Australian Center for Cyber Security (ACCS), 2014) to generate a hybrid of realistic modern normal activities and synthetic contemporary attack behaviors from network traffic. A tcpdump tool (tcpdump tool, 2014) was used to
capture 100 GB of raw network traffic. The Argus (Argus tool, 2014) and Bro-IDS (Bro-IDS Tool, 2014) tools were used, and 12 models were developed, for extracting the features of Tables 1, 2, 3, 4, and 5, respectively. These techniques were configured in parallel processing to extract the 49 features with the class label. After finishing the implementation of the configured techniques, the total number of records, 2,540,044, was stored in four CSV files. The records and the features of the UNSW-NB15 data set are described in depth as follows.

Table 1. Flow features.
No.  Name    Description
1    srcip   Source IP address
2    sport   Source port number
3    dstip   Destination IP address
4    dsport  Destination port number
5    proto   Protocol type (such as TCP, UDP)

Table 2. Basic features.
6    state    Indicates the state and its dependent protocol (such as ACC, CLO, and CON)
7    dur      Record total duration
8    sbytes   Source to destination bytes
9    dbytes   Destination to source bytes
10   sttl     Source to destination time to live
11   dttl     Destination to source time to live
12   sloss    Source packets retransmitted or dropped
13   dloss    Destination packets retransmitted or dropped
14   service  Such as http, ftp, smtp, ssh, dns, and ftp-data
15   sload    Source bits per second
16   dload    Destination bits per second
17   spkts    Source to destination packet count
18   dpkts    Destination to source packet count

Table 3. Content features.
19   swin         Source TCP window advertisement value
20   dwin         Destination TCP window advertisement value
21   stcpb        Source TCP base sequence number
22   dtcpb        Destination TCP base sequence number
23   smeansz      Mean of the flow packet size transmitted by the src
24   dmeansz      Mean of the flow packet size transmitted by the dst
25   trans_depth  Represents the pipelined depth into the connection of http request/response transactions
26   res_bdy_len  Actual uncompressed content size of the data transferred from the server's http service

Table 4. Time features.
27   sjit     Source jitter (mSec)
28   djit     Destination jitter (mSec)
29   stime    Record start time
30   ltime    Record last time
31   sintpkt  Source interpacket arrival time (mSec)
32   dintpkt  Destination interpacket arrival time (mSec)
33   tcprtt   TCP connection setup round-trip time, the sum of 'synack' and 'ackdat'
34   synack   TCP connection setup time, the time between the SYN and the SYN_ACK packets
35   ackdat   TCP connection setup time, the time between the SYN_ACK and the ACK packets

Table 5. Additional generated features.
36   is_sm_ips_ports   If srcip (1) equals dstip (3) and sport (2) equals dsport (4), this variable is assigned 1, otherwise 0
37   ct_state_ttl      No. for each state (6) according to a specific range of values of sttl (10) and dttl (11)
38   ct_flw_http_mthd  No. of flows that have methods such as Get and Post in the http service
39   is_ftp_login      If the ftp session is accessed by user and password then 1, else 0
40   ct_ftp_cmd        No. of flows that have a command in the ftp session
41   ct_srv_src        No. of records that contain the same service (14) and srcip (1) in 100 records according to the ltime (26)
42   ct_srv_dst        No. of records that contain the same service (14) and dstip (3) in 100 records according to the ltime (26)
43   ct_dst_ltm        No. of records of the same dstip (3) in 100 records according to the ltime (26)
44   ct_src_ltm        No. of records of the srcip (1) in 100 records according to the ltime (26)
45   ct_src_dport_ltm  No. of records of the same srcip (1) and the dsport (4) in 100 records according to the ltime (26)
46   ct_dst_sport_ltm  No. of records of the same dstip (3) and the sport (2) in 100 records according to the ltime (26)
47   ct_dst_src_ltm    No. of records of the same srcip (1) and the dstip (3) in 100 records according to the ltime (26)

2.1. Attack types

Attack types can be classified into nine groups:

(1) Fuzzers: an attack in which the attacker attempts to discover security loopholes in a program, operating system, or network by feeding it with massive inputs of random data to make it crash.
(2) Analysis: a series of intrusions that penetrate web applications via ports (e.g., port scans), emails (e.g., spam), and web scripts (e.g., HTML files).
(3) Backdoor: a technique of stealthily bypassing normal authentication, securing unauthorized remote access to a device, and locating the entrance to plain text, while struggling to continue unobserved.
(4) DoS: an intrusion which disrupts the computer resources, via memory, making them extremely
busy in order to prevent authorized requests from accessing a device.
(5) Exploit: a sequence of instructions that takes advantage of a glitch, bug, or vulnerability, caused by unintentional or unsuspected behavior, on a host or network.
(6) Generic: a technique that works against every block cipher, using a hash function to cause a collision, without respect to the configuration of the block cipher.
(7) Reconnaissance: can be defined as a probe; an attack that gathers information about a computer network to evade its security controls.
(8) Shellcode: an attack in which the attacker penetrates a small piece of code, starting from a shell, to control the compromised machine.
(9) Worm: an attack whereby the attacker replicates itself in order to spread to other computers. Often, it uses a computer network to spread itself, depending on security failures on the target computer to access it.

2.2. Features

Features are categorized into five groups:

(1) Flow features: include the identifier attributes between hosts (e.g., client-to-server or server-to-client), as reflected in Table 1.
(2) Basic features: involve the attributes that represent protocol connections, as shown in Table 2.
(3) Content features: encapsulate the attributes of TCP/IP; they also contain some attributes of http services, as reflected in Table 3.
(4) Time features: contain the time attributes, for example, arrival time between packets, start/end packet time, and the round-trip time of the TCP protocol, as shown in Table 4.
(5) Additional generated features: in Table 5, this category can be further divided into two groups: (1) general-purpose features (i.e., 36-40), whereby each feature has its own purpose to protect the service of protocols, and (2) connection features (i.e., 41-47) that are built from the flow of 100 record connections based on the sequential order of the last-time feature.

To label this data set, two attributes were provided: attack_cat represents the nine categories of attack plus normal, and label is 0 for normal and 1 otherwise.

3. Training and testing set distribution

A NIDS data set can be conceptualized as a relational table (T) (Witten & Mining, 2005). The input to any NIDS is a set of instances (I) (e.g., normal and attack records). Each instance consists of features (F) that have different data types (i.e., ∀f ∈ {R ∪ S}, where ∀f means each feature in T, R is the set of real numbers, and S denotes characters). It is observed that NIDS techniques face challenges in using these features because no standard format for feature values (e.g., numeric or nominal) is offered (Shyu et al., 2005). From a statistical perspective, T is a multivariate data representation, which is codified in Definition 1.

Definition 1: Let I_{1:N} ∈ T, I_{1:N} = {f_{ij} ∈ F}, Y_{1:N} = {c_i ∈ C}, where i, j = 1, 2, ..., N. Suppose F is iid (independently and identically distributed). Define I_{1:N} and Y_{1:N} as column vectors, as given in Eq. (1):

    I_{1:N} = [ f_{11} f_{12} ... ; f_{21} f_{22} ... f_{ij} ],   Y_{1:N} = [ c_1, ..., c_i ]^T   (1)

such that I represents the observations of T, Y is the class label (C) for each I, N is the number of instances, and F denotes the features of I.

Proposition 1: A standard format for the features (F) is prepared so that they have the same type (i.e., numbers only, ∀F ∈ {R}) to make the analysis of the data points easier. It assigns each nominal feature (S) to a sequence of numbers (i.e., ∀S → {0 : R}, where {0 : R} denotes a sequence of numbers) (Salem & Buehler, 2012). For instance, the UNSW-NB15 data set has three major nominal features (e.g., protocol types (e.g., TCP, UDP), states (e.g., CON, ACC), and services (e.g., HTTP, FTP)). This issue can be tackled by converting each value of these features into ordered numbers such as TCP = 1, UDP = 2, and so on.
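The conversion described in Proposition 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the integers are assigned in order of first appearance, and the sample protocol values are made up for the example.

```python
def encode_nominal(values):
    """Assign each distinct nominal value an integer 1..k, per Proposition 1,
    in order of first appearance (an assumed convention for illustration)."""
    mapping = {}
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1  # next unused ordinal
        encoded.append(mapping[v])
    return encoded, mapping

# Hypothetical proto column values
codes, table = encode_nominal(["tcp", "udp", "tcp", "icmp"])
# codes == [1, 2, 1, 3]; table == {"tcp": 1, "udp": 2, "icmp": 3}
```

The same function would be applied independently to each nominal feature (proto, state, service), after which every column of T is numeric.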
A part of the data set records has been divided with an approximate 60%:40% ratio into the training and testing sets, respectively. To achieve the authenticity of NIDS evaluations, there are no redundant records among the training and testing sets.

The z-score function is utilised as formulated in Eq. (3). It is a linear transformation that standardises the format of the f_{ij} values; this makes it easier to compare values in diverse distributions without changing the shape of the original distribution.

    f(x) = (sqrt(2π)/x) Σ_{n=1}^{∞} e^{−(2n−1)² π² / (8x²)}   (5)

From Eqs. (4) and (5), the K-S test assesses how well F_n(x) fits f(x) by maximizing the absolute difference, as follows:

    D_n = max_x |f(x) − F_n(x)|   (6)

In the case where the critical value D_{n,α} (i.e., α denotes the significance level) falls into the Kolmogorov-Smirnov table (KDDCUP1999, 2007), P(D_n ≤ D_{n,α}) = 1 − α, so D_n can be used to test ∀f within f(x).

4.3. Multivariate skewness

The skewness method (Mardia, 1970) is an asymmetry measure of the probability distribution of ∀f that has values x_1, x_2, ..., x_n with mean M; the skewness function can be defined as:

    ske = Σ_{i=1}^{n} (x_i − M)³ / (n δ³)   (7)

In Eq. (7), if the result is positive, the distribution has an asymmetric tail spreading toward larger positive values; a negative value indicates that the distribution has an asymmetric tail extending toward more negative values. It is acknowledged that if the skewness and kurtosis values tend to 0, then the distribution approximates a normal distribution.

4.5. The statistical functions utilisation on the TRIN and TSIN

The K-S test and the multivariate skewness and kurtosis functions are applied to the TRIN and TSIN to estimate their compatibility, as declared in Eqs. (9)-(11):

    ∀f_{TRIN} D_{n,TRIN} ρ ∀f_{TSIN} D_{n,TSIN} ≤ D_{n,α,∀f}   (9)

    ∀f_{TRIN} ske_{TRIN} ρ ∀f_{TSIN} ske_{TSIN}   (10)

    ∀f_{TRIN} kur_{TRIN} ρ ∀f_{TSIN} kur_{TSIN}   (11)

Eq. (9) estimates the best fit of the distribution to the TRIN and TSIN of ∀f, requiring that both sides be less than or equal to the K-S critical value (i.e., D_{n,α,∀f}), while Eqs. (10) and (11) compare the skewness and the kurtosis of the TRIN and TSIN of ∀f, respectively. It is observed that ρ denotes a threshold operator (e.g., =, <, or >) which compares the results between the two sides of the equations.

Based on the above explanation, the TRIN and TSIN of ∀f are analysed to evaluate their statistical relationship as in the following algorithm:
5: compare the results of step 4 for the TRIN and TSIN as formulated in Eqs. (9) and (10).

where cov(·) is the covariance and σ is the standard deviation.
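The per-feature comparison of section 4.5 can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it uses a hand-rolled two-sample K-S statistic together with the skewness of Eq. (7) and excess kurtosis, the feature values are made-up placeholders, and the critical-value lookup D_{n,α} is omitted.

```python
import bisect
import math

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic, D_n = max_x |F_a(x) - F_b(x)| (cf. Eq. (6))."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    ecdf = lambda s, x: bisect.bisect_right(s, x) / len(s)  # fraction of points <= x
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

def skewness(xs):
    """Eq. (7): sum((x_i - M)^3) / (n * delta^3), with the population std. deviation."""
    n, m = len(xs), sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum((x - m) ** 3 for x in xs) / (n * sd ** 3)

def kurtosis(xs):
    """Excess kurtosis; values near 0 suggest an approximately normal distribution."""
    n, m = len(xs), sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum((x - m) ** 4 for x in xs) / (n * sd ** 4) - 3

# Hypothetical values of one feature in the training (TRIN) and testing (TSIN) partitions
train_f = [1.0, 2.0, 3.0, 4.0, 5.0]
test_f = [1.5, 2.5, 3.5, 4.5, 5.5]
d = ks_statistic(train_f, test_f)  # a small D_n indicates similar empirical distributions
```

In practice the same three statistics would be computed for every feature of the TRIN and TSIN, and the pairs compared under the threshold operator ρ of Eqs. (9)-(11).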
The splitting value between the subsets I_i^r is indicated as:

    Split(I) = −Σ_{i=1}^{r} (|I_i| / |I_N|) log₂(|I_i| / |I_N|)   (16)

In Eq. (16), the split value of I expresses the information generated by dividing I into r parts conforming to r on the features. From Eqs. (14) and (15), the GR can be defined as GR(f) = Gain(f) / Split(I), where the feature with the highest gain ratio is selected as the splitting feature. Thus, the strongest features with respect to the class label are evaluated and ranked by utilising the GR, where the scores of the features are placed in descending order.

6. Techniques for evaluating the complexity

This section discusses the techniques that are used to evaluate the complexity in terms of accuracy and false alarm rate (FAR) on the UNSW-NB15 data set. The five techniques used are Naïve Bayes (NB) (Panda & Patra, 2007), Decision Tree (DT) (Bouzida & Cuppens, 2006), Artificial Neural Network (ANN) (Bouzida & Cuppens, 2006; Mukkamala et al., 2005), Logistic Regression (LR) (Mukkamala et al., 2005), and Expectation-Maximization (EM) Clustering (Sharif et al., 2012). Each technique has its own characteristics to learn and evaluate the data points of the TRIN and TSIN, which are described respectively in the following.

First, the NB classifier is a conditional probability model which constructs the classification of the two classes (i.e., normal (0) or anomaly (1)). It applies the maximum a posteriori (MAP) function, which is denoted as:

    P(C|I) = argmax_{w ∈ {1,2,...,N}} P(C_w) ∏_{j=1}^{N} P(I_j | C_w)   (17)

where C is the class label, I is the observation of each class, w is the class number, P(C|I) denotes the probability of the class given a specified observation, and ∏_{j=1}^{N} P(I_j | C_w) indicates multiplying all the probabilities of the instances, conditioned on their classes, to achieve the maximum outcome.

Second, the DT classifier is a structure similar to a flowchart which consists of a root, nodes, and branches to represent the classification rules. Each node denotes rules or procedures on a feature, each branch contains the results of the rules, and each leaf node expresses a class label.

Third, ANN learning is used to approximate an activation function that depends on a large number of input observations I. The basic ANN function can be defined as:

    f(I) = τ( Σ_j W_j · I_j )   (18)

where f(I) represents the predicted output of the class label, τ is an activation function (e.g., the sigmoid), and W_j is the weight of each input instance I_j.

Fourth, the Logistic Regression algorithm establishes the correlation between a dependent variable (C) and independent variables (F). It uses the maximum likelihood function to estimate the regression parameters.

Fifth, the Expectation-Maximization (EM) clustering technique depends on maximizing the probability density function of a Gaussian distribution to calculate the mean and the covariance of each instance I in T. The EM clustering algorithm encompasses two steps (i.e., Expectation (E-step) and Maximization (M-step)). In the E-step, it estimates the likelihood for each instance I in T, whilst the M-step is utilised to re-estimate the parameter values from the E-step to achieve the best expected output.

Two parameters (accuracy and false alarm rate) are calculated from the outcomes of these techniques to measure the complexity of the UNSW-NB15 data set. Let the factors of the classification be FC = {TP, TN, FP, FN}, where TP (i.e., true positives) denotes the number of correctly classified attack records, TN (i.e., true negatives) expresses the number of correctly classified normal records, FP (i.e., false positives) is the number of normal records misclassified as attacks, and FN (i.e., false negatives) is the number of attack records misclassified as normal (Sokolova, Japkowicz, & Szpakowicz, 2006). The accuracy (So-In et al., 2014; Sokolova et al., 2006) is the rate of the correctly classified records to all the records, whether correctly or incorrectly classified, which is denoted as:

    accuracy = (TP + TN) / (TP + TN + FP + FN)   (19)
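The MAP rule of Eq. (17) can be sketched with a small categorical Naïve Bayes. This is a minimal sketch, not the authors' configuration: the toy records, the two-feature encoding, and the Laplace smoothing are all assumptions for illustration.

```python
from collections import Counter, defaultdict

def train_nb(instances, labels):
    """Fit class priors P(C_w) and per-feature conditionals P(I_j | C_w), per Eq. (17)."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (class, feature index) -> value counts
    for inst, c in zip(instances, labels):
        for j, v in enumerate(inst):
            cond[(c, j)][v] += 1
    return priors, cond, len(labels)

def predict_nb(model, inst):
    """MAP decision: the class maximizing P(C_w) * prod_j P(I_j | C_w)."""
    priors, cond, n = model
    best, best_p = None, -1.0
    for c, pc in priors.items():
        p = pc / n
        for j, v in enumerate(inst):
            counts = cond[(c, j)]
            # Laplace smoothing (an added assumption) avoids zero probabilities
            p *= (counts[v] + 1) / (sum(counts.values()) + len(counts) + 1)
        if p > best_p:
            best, best_p = c, p
    return best

# Toy records: (proto code, state code) -> label 0 (normal) / 1 (anomaly)
X = [(1, 1), (1, 2), (2, 1), (2, 2)]
y = [0, 0, 1, 1]
model = train_nb(X, y)
# predict_nb(model, (1, 1)) -> 0; predict_nb(model, (2, 2)) -> 1
```

The argmax over classes mirrors Eq. (17); on the real data set the same rule would run over the 42 analysis features rather than two toy codes.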
The false alarm rate (FAR) is the average ratio of misclassified to classified records, whether normal or abnormal, as denoted in Eq. (22). It is derived from Eqs. (20) and (21), which calculate the false positive rate (FPR) and the false negative rate (FNR), respectively.

    FPR = FP / (FP + TN)   (20)

    FNR = FN / (FN + TP)   (21)

    FAR = (FPR + FNR) / 2   (22)

7. Results and discussion

This paper examines analytical approaches to measure the complexity of the UNSW-NB15 data set, which was developed to evaluate NIDSs. The study uses three approaches: (7.1) the statistical explanation (i.e., the K-S test and the multivariate skewness and kurtosis measures), (7.2) the feature correlations (i.e., PCC and GR), and (7.3) the complexity evaluation using the five classifiers. The complexity of the UNSW-NB15 data set is measured within the adopted part of the training and the testing sets, as presented in Table 6. The features that are selected to execute these aspects are reflected in Table 7.

Table 7. The features of the analysis.
Id  Name     Id  Name
1   dur      22  synack
2   spkts    23  ackdat
3   dpkts    24  smean
4   sbytes   25  dmean
5   dbytes   26  trans_depth
6   rate     27  response_body_len
7   sttl     28  ct_srv_src
8   dttl     29  ct_state_ttl
9   sload    30  ct_dst_ltm
10  dload    31  ct_src_dport_ltm
11  sloss    32  ct_dst_sport_ltm
12  dloss    33  ct_dst_src_ltm
13  sinpkt   34  is_ftp_logn
14  dinpkt   35  ct_ftp_cmd
15  sjit     36  ct_flw_http_mthd
16  djit     37  ct_src_ltm
17  swin     38  ct_srv_dst
18  stcpb    39  is_sm_ips_ports
19  dtcpb    40  proto
20  dwin     41  service
21  tcprtt   42  state
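Eqs. (19)-(22) can be computed directly from the four classification factors; the counts below are made up for illustration, not taken from the experiments.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy and FAR from the factors FC = {TP, TN, FP, FN}, per Eqs. (19)-(22)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # Eq. (19)
    fpr = fp / (fp + tn)                        # Eq. (20)
    fnr = fn / (fn + tp)                        # Eq. (21)
    far = (fpr + fnr) / 2                       # Eq. (22)
    return accuracy, far

acc, far = metrics(tp=80, tn=90, fp=10, fn=20)
# acc = 170/200 = 0.85; fpr = 10/100 = 0.1; fnr = 20/100 = 0.2; far = 0.15
```

Note that FAR averages the two error rates rather than pooling the error counts, so it is insensitive to class imbalance between normal and attack records.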
Table 8. Comparison between the results of the KDD99 and UNSW-NB15 data sets.

                                            KDD99 data set          UNSW-NB15 data set
Technique      Reference                    Accuracy (%)  FAR (%)   Accuracy (%)  FAR (%)
DT             (Bro-IDS Tool, 2014)         92.30         11.71     85.56         15.78
LR             (Witten & Mining, 2005)      92.75         -         83.15         18.48
NB             (Shyu et al., 2005)          95            5         82.07         18.56
ANN            (Witten & Mining, 2005)      97.04         1.48      81.34         21.13
EM clustering  (Salem & Buehler, 2012)      78.06         10.37     78.47         23.79
On the contrary, the attack and normal behaviors of the KDD99 data set are outdated. Additionally, the similarities of the normal and the attack observations in the majority of the features add another factor to the complexity of the UNSW-NB15 data set.

Second, from the perspective of the statistical test, as shown in Figures 1, 2, and 3, the features of the training and the testing sets are highly correlated because the features are almost similar in the skewness and the kurtosis indicators. Further, the training and the testing sets have the same distribution, which is non-linear and non-normal. As a result, the two perspectives demonstrate the major reasons for the complexity of the UNSW-NB15 data set compared to the KDD99 data set.

8. Conclusion

In this paper, the analysis and the evaluation of the UNSW-NB15 data set are discussed. A part of this data set is divided into a training set and a testing set to examine the data set. The training and testing sets are analysed in three aspects: the statistical analysis phase, the feature correlation phase, and the complexity evaluation phase. First, the features of the two sets are converted into numerical values to be statistically processed and normalized using the z-score transformation to prevent any change in the original distribution. The statistical results, using the Kolmogorov-Smirnov test, show that the two sets have the same distribution, which is non-normal and non-linear. Further, the skewness and kurtosis indicators of the training and the testing sets are statistically similar. Second, the feature correlations of the training and the testing sets are measured either without the class label (i.e., the Pearson's correlation coefficient method) or with the label (i.e., the Gain Ratio technique). The feature correlation results demonstrate that these features are highly relevant observations. Third, the five techniques of DT, LR, NB, ANN, and EM clustering are used to measure the complexity in terms of accuracy and False Alarm Rate (FAR) of this data set, and then the results are compared with the KDD99 data set. The evaluation results of the five techniques show that the DT technique accomplishes the best efficiency compared to the others. Comparing the results of the two data sets, the efficiency of the techniques using the KDD99 data set is better than with the UNSW-NB15 data set. As a consequence, the UNSW-NB15 data set is considered complex due to the similar behaviours of the modern attack and normal network traffic. This means that this data set can be used to evaluate existing and novel NIDS methods in a reliable way.

In the future, we plan to develop a new classification technique to identify anomalies from the non-linear and non-normal data representation.

ORCID

Nour Moustafa http://orcid.org/0000-0001-6127-9349

References

Argus tool. (2014). Retrieved from http://qosient.com/argus/flowtools.shtml
Australian Center for Cyber Security (ACCS). (2014). Retrieved from http://www.accs.unsw.adfa.edu.au/
Aziz, A. S. A., Azar, A. T., Hassanien, A. E., & Hanafy, S. E. (2014). Continuous features discretization for anomaly intrusion detectors generation. In Proceedings of the 17th Online World Conference on Soft Computing in Industrial Applications (pp. 209–221). Switzerland: Springer.
Bhuyan, M. H., Bhattacharyya, D. K., & Kalita, J. K. (2014). Network anomaly detection: Methods, systems and tools. IEEE Communications Surveys & Tutorials, 16(1), 303–336. doi:10.1109/SURV.2013.052213.00046
Bland, J. M., & Altman, D. G. (1995). Statistics notes: Calculating correlation coefficients with repeated observations: Part 2, correlation between subjects. BMJ, 310(6980), 633. doi:10.1136/bmj.310.6980.633
Bouzida, Y., & Cuppens, F. (2006). Neural networks vs. decision trees for intrusion detection. IEEE/IST Workshop on Monitoring, Attack Detection and Mitigation (MonAM), Tuebingen, Germany.
Bro-IDS Tool. (2014). Retrieved from https://www.bro.org/
Cherkassky, V., & Mulier, F. M. (2007). Learning from data: Concepts, theory, and methods. Hoboken, NJ: John Wiley & Sons.
Cieslak, D. A., & Chawla, N. V. (2009). A framework for monitoring classifiers' performance: When and why failure occurs? Knowledge and Information Systems, 18(1), 83–108. doi:10.1007/s10115-008-0139-1
Cunningham, R., & Lippmann, R. (2000). Detecting computer attackers: Recognizing patterns of malicious stealthy behavior. MIT Lincoln Laboratory, Presentation to CERIAS, 11, 29.
Denning, D. E. (1987). An intrusion-detection model. IEEE Transactions on Software Engineering, SE-13(2), 222–232. doi:10.1109/TSE.1987.232894
DeWeese, S. (2009). Capability of the People's Republic of China (PRC) to conduct cyber warfare and computer network exploitation. Darby, PA: DIANE Publishing.
Eom, J.-H., Kim, S.-H., & Chung, T.-M. (2012). Cyber military strategy for cyberspace superiority in cyber warfare. In 2012 International Conference on Cyber Security, Cyber Warfare and Digital Forensic (CyberSec). IEEE.
García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., & Vázquez, E. (2009). Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security, 28(1–2), 18–28. doi:10.1016/j.cose.2008.08.003
Ghosh, A. K., Wanken, J., & Charron, F. (1998). Detecting anomalous and unknown intrusions against programs. In Proceedings of the 14th Annual Computer Security Applications Conference. IEEE.
Hall, M. A., & Smith, L. A. (1998). Practical feature subset selection for machine learning. In McDonald, C. (Ed.), Computer Science '98: Proceedings of the 21st Australasian Computer Science Conference ACSC'98 (pp. 181–191). Berlin, Germany: Springer.
IXIA PerfectStormOne Tool. (2014). Retrieved from http://www.ixiacom.com/products/perfectstorm
Jain, A., Nandakumar, K., & Ross, A. (2005). Score normalization in multimodal biometric systems. Pattern Recognition, 38(12), 2270–2285. doi:10.1016/j.patcog.2005.01.012
Justel, A., Peña, D., & Zamar, R. (1997). A multivariate Kolmogorov-Smirnov test of goodness of fit. Statistics &
Moustafa, N., & Slay, J. (2015a). Creating novel features to anomaly network detection using DARPA-2009 data set. In 14th European Conference on Cyber Warfare and Security ECCWS-2015. The University of Hertfordshire, Hatfield, UK.
Moustafa, N., & Slay, J. (2015b). UNSW-NB15: A comprehensive data set for network intrusion detection. In 2015 Military Communications and Information Systems Conference (MilCIS). Canberra, Australia: IEEE.
Mukkamala, S., Sung, A. H., & Abraham, A. (2005). Intrusion detection using an ensemble of intelligent paradigms. Journal of Network and Computer Applications, 28(2), 167–182. doi:10.1016/j.jnca.2004.01.003
Panda, M., & Patra, M. R. (2007). Network intrusion detection using naive Bayes. International Journal of Computer Science and Network Security, 7(12), 258–263.
Salem, M., & Buehler, U. (2012). Mining techniques in network security to enhance intrusion detection systems. International Journal of Network Security & Its Applications (IJNSA), 4(6). doi:10.5121/ijnsa
Sharif, I., Prugel-Benett, A., & Wills, G. (2012). Unsupervised clustering approach for network anomaly detection. In Benlamri, R. (Ed.), Networked Digital Technologies (Vol. 293, pp. 135–145). Communications in Computer and Information Science. Berlin, Germany: Springer Berlin Heidelberg.
Probability Letters, 35 (3), 251–259. doi:10.1016/S0167- Shyu, M.-L., Sarinnapakorn, K., Kuruppu-Appuhamilage, I.,
7152(97)00020-5 Chen, S.-C., Chang, L., & Goldring, T. (2005). Handling
KDDCUP1999. (2007). Retrieved from http://kdd.ics.uci.edu/ nominal features in anomaly intrusion detection problems.
databases/kddcup99/KDDCUP99.html 15th International Workshop on Research Issues in Data
Lazarevic, A., Ertoz, L., Kumar, V., Ozgur, A., & Srivastava, J. Engineering: Stream Data Mining and Applications, 2005.
(2003). A comparative study of anomaly detection schemes in RIDE-SDMA 2005. IEEE.
network intrusion detection. SDM. SIAM. So-In, C., Mongkonchai, N., Aimtongkham, P., Wijitsopon,
Lee, W., Stolfo, S. J., & Mok, K. W. (1999). A data mining K., & Rujirakul, K. (2014). An evaluation of data mining
framework for building intrusion detection models. classification models for network intrusion detection. 2014
Proceedings of the 1999 IEEE Symposium on Security Fourth International Conference on Digital Information
and Privacy, 1999. IEEE. and Communication Technology and its Applications
Mardia, K. V. (1970). Measures of multivariate skewness and (DICTAP), IEEE.
kurtosis with applications. Biometrika, 57 (3), 519–530. Sokolova, M., Japkowicz, N., & Szpakowicz, S. (2006). Beyond
doi:10.1093/biomet/57.3.519 accuracy, F-score and ROC: A family of discriminant
Massey, F. J., Jr (1951). The Kolmogorov-Smirnov test for good- measures for performance evaluation. In AI 2006:
ness of fit. Journal of the American Statistical Association, 46 Advances in artificial intelligence (Vol. 4304, pp.
(253), 68–78. doi:10.1080/01621459.1951.10500769 1015–1021). Lecture Notes in Computer Science. Berlin,
Matlab Tool. (2014). Retrieved from http://au.mathworks. Germany: Springer.
com/products/matlab/?refresh=true SPSS tool. (2014). Retrieved from http://www-01.ibm.com/
McHugh, J. (2000). Testing intrusion detection systems: A software/analytics/spss/
critique of the 1998 and 1999 DARPA intrusion detection Tavallaee, M., (2009). A detailed analysis of the KDD CUP 99
system evaluations as performed by Lincoln Laboratory. data set. In Proceedings of the Second IEEE Symposium on
ACM Transactions on Information and System Security, 3 Computational Intelligence for Security and Defence
(4), 262–294. doi:10.1145/382912.382923 Applications 2009 (pp. 53–58). Piscataway, NJ: IEEE.
Moustafa, N., & Slay, J. (2014, May) UNSW-NB15 DataSet for tcpdump tool. (2014). Retrieved from http://www.tcpdump.
Network Intrusion Detection Systems. Retrieved from org/
http://www.cybersecurity.unsw.adfa.edu.au/ADFA% Valdes, A., & Anderson, D. (1995). Statistical methods for
20NB15%20Datasets computer usage anomaly detection using NIDES (Next-
14 N. MOUSTAFA AND J. SLAY
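The accuracy and FAR figures used to compare the five classifiers can, in principle, be reproduced from a binary confusion matrix. The sketch below is ours, not the evaluation code of this study, and it assumes the common definition FAR = FP / (FP + TN), i.e., the fraction of normal records wrongly flagged as attacks; other formulations (such as averaging the false positive and false negative rates) also appear in the literature.

```python
def accuracy_and_far(y_true, y_pred):
    """Accuracy and False Alarm Rate (FAR) for binary labels,
    where 1 = attack (positive) and 0 = normal (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    accuracy = (tp + tn) / len(y_true)
    # FAR here: share of normal records misclassified as attacks.
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, far
```

For instance, with two attack and three normal records, `accuracy_and_far([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])` gives an accuracy of 0.6 and a FAR of one third.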