ms160400843 - Synopsis v2

Research Proposal/Synopsis for MS/M.Phil/Ph.D Thesis
Department of Computer Science
1. Name of the Student: Muhammad Shahid Azeem

2. VU ID: MS160400843
3. Session: Fall 2016
4. Semester: Fall 2018
5. Field of Specialization: Computer Networks
6. Title of Research Proposal: Enhanced Network Anomaly Detection Model Based on Supervised Learning Techniques with Symbolic Features Selection
7. Date of Enrolment in Research: October 19, 2017

8. Duration of Proposed Research: 1.5 years
9. Total Funds Requested (if any): None
===============================================================================
Supervisor, Supervisory Committee (SC) Information
1. Name of Supervisor: Hasnain Ahmed
Designation: Assistant Professor
Email ID: [email protected]
Affiliation: Virtual University of Pakistan
2. Name of Supervisory Committee (SC) Member 1: Mr. Syed Shah Muhammad
Designation: Lecturer
Email ID: [email protected]
Affiliation: Virtual University of Pakistan
3. Name of Supervisory Committee (SC) Member 2: Mr. Zafar Nazir
Designation: Instructor
Email ID: [email protected]
Affiliation: Virtual University of Pakistan
Topic
Enhanced Network Anomaly Detection Model Based On Supervised Learning Techniques with
Symbolic Features Selection
Abstract/Summary
The massive growth of the Internet over the last two decades has increased the importance of cyber
security, and numerous new threats to data security emerge every day. An Intrusion Detection
System (IDS) is a primary defence mechanism for protecting data and resources from illegal
disclosure and unauthorized use. Researchers have proposed two main approaches to intrusion
detection: signature-based detection and anomaly-based detection. In the signature-based
approach, the IDS maintains a database of signatures of harmful traffic, such as viruses. The IDS
sniffs and analyses each data packet and compares it with this database. If a packet matches, it is
flagged as malicious and can be dropped from the network. The major flaws of this approach are
its dependence on human analysts to write signatures and its inability to detect zero-day attacks.
Anomaly-based traffic analysis emerged to address these problems. In this approach, network
traffic is categorized into valid traffic and anomalous traffic. Appropriate feature selection plays a
vital role in the performance and accuracy of an anomaly detection model. Existing approaches
mostly rely on quantitative features and have limited capability to work with symbolic
(qualitative) features. A significant gap in these approaches is the lack of work on encoding
qualitative features into quantitative ones. Such encoding can increase the accuracy of an
anomaly detection model. In this study, I propose an Intrusion Detection System based on an
anomaly detection model that uses supervised learning techniques with both quantitative and
symbolic features. Several supervised learning techniques (Nearest Neighbour, Random Forest,
Multilayer Perceptron and Decision Tree) will be combined with several encoding techniques
(polynomial encoding, leave-one-out encoding and target encoding) to enhance anomaly
detection in imbalanced network traffic. The proposed model will be tested on the UNSW-NB15
data set. The experimental results will be recorded and compared to assess how suitable each
configuration is for anomaly detection on this data set.
Introduction
Countering security threats to information and other network resources is an active area of
research in Information Technology. The cyber threat landscape is expanding drastically.
According to the Internet Security Threat Report (ISTR) (Semente, 2016), about 430 million new
pieces of malware, 362 crypto-ransomware and other cyber threats were discovered on the
Internet in 2015. Protecting cyber assets has become a critical concern for governments,
corporations and individuals alike. In 2016, 75 million dollars were spent on cyber security
services around the globe.
In network security, intrusion detection is the process of analysing network traffic for invalid
usage patterns. As a primary security mechanism, an intrusion detection system is responsible for
distinguishing hostile traffic flows from valid ones. Cyber assets must be protected from
unauthorized access and illegal disclosure, and this need places the IDS at the forefront of the
cyber security domain. Various approaches have been proposed for detecting invalid behaviour.
In the past, signature-based IDSs were commonly used. They perform well against known attacks
and attack vectors (Naseer et al., 2018). However, these systems are not appropriate for the
modern era because the threat landscape changes constantly and a huge number of new threats are
invented and spread via the Internet every day. Traffic analysts must continuously analyse
network traffic, identify harmful data patterns and add them to the signature database. Until a
pattern has been identified and added to the database, nothing stops such packets from damaging
resources. This scenario is known as a zero-day attack. It can be very dangerous and can even
lead to an unrecoverable state.
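A minimal Python sketch can illustrate signature matching and its zero-day weakness. The signature IDs and byte patterns below are hypothetical placeholders, not real attack signatures. A production IDS would also use an optimized multi-pattern matcher rather than a loop over substrings.

```python
# Minimal sketch of signature-based detection (illustrative only).
# The signature database below is hypothetical, not a real rule set.
SIGNATURES = {
    "sig-001": b"\x90\x90\x90\x90",   # hypothetical NOP-sled fragment
    "sig-002": b"' OR '1'='1",        # hypothetical SQL-injection string
}


def match_signatures(payload, signatures):
    """Return the IDs of all signatures whose byte pattern occurs in the payload."""
    return [sig_id for sig_id, pattern in signatures.items() if pattern in payload]


def inspect(payload):
    """Flag a packet if any known signature matches; otherwise let it pass."""
    hits = match_signatures(payload, SIGNATURES)
    return f"ALERT {hits}" if hits else "PASS"


print(inspect(b"GET /login?user=admin' OR '1'='1"))    # known attack -> ALERT ['sig-002']
print(inspect(b"GET /index.html"))                      # normal traffic -> PASS
# A new (zero-day) attack has no signature yet, so it passes undetected.
print(inspect(b"GET /cgi-bin/new_exploit?payload=..."))
```

The last call shows the point made above: the detector only recognizes patterns that are already in its database.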
An alternative approach is anomaly detection in network traffic. An anomaly is a traffic pattern
that deviates from the expected behaviour of network traffic (Nevat et al., 2018). An anomaly can
cause serious damage to a network, for example when it is a scan for vulnerabilities or the start of
an attack such as a DDoS attack. Anomaly detection is also used for similar purposes in other
domains, such as fraud detection and location-spoofing detection in the IoT (Koh,
Nevat, Leong & Wong, 2016). Because it models expected behaviour instead of matching known
signatures, anomaly detection can also flag previously unknown attacks.
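The idea of an anomaly as a deviation from expected behaviour can be sketched with a simple statistical baseline. This is an assumption-laden illustration, not the proposed model. It learns the mean and standard deviation of one traffic metric (a hypothetical packets-per-second count) from normal traffic. It then flags values more than `k` standard deviations away. The threshold `k = 3` is a common convention chosen here for illustration.

```python
import statistics


def fit_baseline(normal_values):
    """Learn the expected behaviour of one traffic metric from normal traffic."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_anomalous(value, baseline, k=3.0):
    """Flag a value that deviates from the baseline mean by more than k standard deviations."""
    mean, std = baseline
    return abs(value - mean) > k * std


# Hypothetical packets-per-second counts observed during normal operation.
normal_pps = [120, 130, 125, 118, 122, 128, 135, 119]
baseline = fit_baseline(normal_pps)

print(is_anomalous(127, baseline))   # within the normal range -> False
print(is_anomalous(900, baseline))   # sudden burst, e.g. a flood -> True
```

No attack signature is involved. The burst is flagged only because it departs from the learned baseline.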
An efficient intrusion detection system is a core component of network security (Li, Ma & Jiao,
2015). An IDS applies all of its capabilities to detect unusual and unacceptable traffic patterns and
intrusions attempted by crackers (Salama, Eid, Ramadan, Darwish & Hassanien, 2011). Designing
an efficient, reliable and affordable IDS that meets network security objectives is a major challenge.
Anomaly detection is modelled as a classification problem. Seminal work in this regard was
presented by Denning (Denning, 1987) and Stanford Chen (García-Teodoro, Díaz-Verdejo,
Maciá-Fernández & Vázquez, 2009). Denning (1987) proposed applying learning algorithms to
traffic flows to classify abnormal traffic and intrusion attempts. The classifiers that can be used
fall into three main categories: supervised learning, unsupervised learning and semi-supervised
learning. In supervised learning, labelled data is used to train the anomaly detection model. The
set of possible classes is defined in advance, so the training data set consists of inputs together
with their expected outcomes. The model then classifies new data based on this training set
(Aljawarneh, Aldwairi & Yassein, 2018).
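The supervised setting described above can be illustrated with a minimal k-nearest-neighbour classifier written in plain Python. The features (flow duration and bytes sent) and all values are hypothetical. The sketch does not reproduce the model that will be built in this study.

```python
import math
from collections import Counter

# Hypothetical labelled training set: (features, label) pairs.
# Features are (duration in seconds, bytes sent); values are illustrative only.
TRAIN = [
    ((0.5, 300.0), "normal"),
    ((0.7, 450.0), "normal"),
    ((0.6, 380.0), "normal"),
    ((0.01, 40.0), "attack"),
    ((0.02, 44.0), "attack"),
    ((0.01, 60.0), "attack"),
]


def knn_predict(x, train, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((0.55, 350.0), TRAIN))  # close to the normal flows -> "normal"
print(knn_predict((0.015, 50.0), TRAIN))  # close to the short, tiny flows -> "attack"
```

Because the features are left unscaled here, the byte count dominates the Euclidean distance. In practice, features would normally be scaled (for example min-max normalization) before applying a distance-based classifier.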
A review of the literature on the anomaly detection problem reveals several prominent gaps. First,
there has been no comprehensive attempt to investigate a complete solution for network security.
Different researchers have carried out isolated studies, as described in the Literature Review
section of this document, but no adequate solution has been presented yet. Second, a number of
researchers suggest deep learning. Deep learning is a powerful and successful technique for
classification problems, but it is computationally very expensive and therefore difficult to apply
in low-cost solutions. A DNN-based classification model is also hard to train and may require
thousands of training examples, which makes training tedious and time-consuming. Third, proper
validation metrics are rarely used to compare different classifiers.
To fill these gaps, an efficient anomaly detection model will be designed and implemented using
conventional supervised learning techniques. Supervised learning (Kwon et al., 2017), also
referred to as predictive or directed classification, defines a set of possible classes in advance.
These techniques receive a set of pre-classified data instances for training. The training dataset
comprises both inputs and desired outputs, and new data are then classified on the basis of it.
Well-known supervised learning algorithms include support vector machines (SVMs), artificial
neural networks (ANNs), logistic regression, naïve Bayes (NB), k-nearest neighbours (KNN),
random forests (RF) and decision trees (DT).
Feature selection is the procedure of choosing an appropriate set of features that characterise the
various attack classes, so that anomalous behaviour in the data flow can be detected (Ravale,
Marathe & Padiya, 2015). The number of features can be very large, and it is impractical to work
with all of them. A subset of informative features is therefore selected. Most studies work with
quantitative features. For example, Ravale et al. work with measurable features such as the
number of failed login attempts (Ravale, Marathe & Padiya, 2015). Network traffic also contains
symbolic features (for example, protocol or service type) alongside quantitative ones. I believe
that an anomaly detection model that uses symbolic features will perform more accurately than
existing anomaly-detection-based IDSs. Most classifiers require numeric input, so symbolic
features must first be encoded into quantitative form. Various encoding schemes can be used for
this purpose: BinaryEncoder, HashingEncoder, HelmertEncoder, OneHotEncoder, OrdinalEncoder,
SumEncoder, PolynomialEncoder, BaseNEncoder, LeaveOneOutEncoder and TargetEncoder.
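Three of the encoding ideas above (one-hot, target and leave-one-out encoding) can be sketched in plain Python on a single symbolic feature. The encoder names listed above appear to match classes in the Python `category_encoders` package. The functions below are simplified, hypothetical re-implementations written for illustration only. They omit the smoothing and other options that a library version may apply. The protocol values and labels are made up.

```python
from collections import defaultdict

# Hypothetical rows: a symbolic feature (protocol) and a binary label (1 = attack).
proto = ["tcp", "udp", "tcp", "icmp", "udp", "tcp"]
label = [0, 1, 0, 1, 1, 1]


def one_hot(values):
    """Map each category to a 0/1 indicator vector (one column per category)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


def _sums_and_counts(values, targets):
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return sums, counts


def target_encode(values, targets):
    """Replace each category by the mean target value observed for that category."""
    sums, counts = _sums_and_counts(values, targets)
    return [sums[v] / counts[v] for v in values]


def leave_one_out_encode(values, targets, prior):
    """Like target encoding, but exclude the row's own target to reduce label leakage.

    A category seen only once has no other rows, so it falls back to `prior`.
    """
    sums, counts = _sums_and_counts(values, targets)
    encoded = []
    for v, t in zip(values, targets):
        n = counts[v] - 1
        encoded.append((sums[v] - t) / n if n > 0 else prior)
    return encoded


prior = sum(label) / len(label)   # global attack rate, used as a fallback value
print(one_hot(proto))
print(target_encode(proto, label))
print(leave_one_out_encode(proto, label, prior))
```

In a real experiment, the target-based encodings should be fitted on the training split only and then applied to the test split. Otherwise, information from the test labels leaks into the features.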
Research Questions
1. Can dimensionality reduction techniques be useful for detecting anomalous behaviour in
network traffic?
2. Which evaluation parameters can be used to effectively evaluate the performance of an
anomaly detection model?
3. Do encoding and including qualitative (symbolic) features increase the performance
of an anomaly detection model?
Research Objectives
This study aims to propose a reliable, acceptable and affordable Intrusion Detection System
(IDS) model that enhances network security using supervised learning techniques from the
domain of machine learning. The IDS model will be trained to distinguish normal network traffic
flows from anomalous ones. The model will be validated against various known cyber-attacks. A
major objective of the study is to demonstrate the validity of the IDS model against these
well-known network attacks.
Literature Review:
The anomaly detection problem is a kind of classification problem that uses certain features or
characteristics of sample data to categorize it. Each feature is a kind of summary of the raw data.
Various supervised, unsupervised and semi-supervised learning techniques, as well as
dimensionality reduction techniques, have been proposed to improve the performance of anomaly
detection models. Aljawarneh et al. state that the first IDS was proposed by Dorothy E. Denning
during research conducted at SRI International (Aljawarneh, Aldwairi & Yassein, 2018). This
work led to a new generation of intrusion detection systems, referred to as anomaly-detection-based
IDSs.
Borisaniya and Patel (Borisaniya & Patel, 2015) discussed a misuse detection case study using the
ADFA-LD and ADFA-WD datasets. They used a modified vector space representation and an
N-gram feature extraction technique to generate classifier models for both binary and multiclass
classification. Using the IBk and J48 classifiers in Weka, they reported a classification accuracy
of up to 92% with a 20% false positive rate on the binary and multiclass datasets. They also
reported an accuracy of 96% with a 19% false positive rate for the binary class problem.
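N-gram feature extraction from system-call traces can be sketched as a bag-of-n-grams count vector. This plain representation is a simplified assumption. It does not reproduce the modified vector space representation of Borisaniya and Patel. The traces of call numbers are hypothetical.

```python
from collections import Counter


def ngrams(trace, n):
    """Return the contiguous n-grams (as tuples) of a system-call trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def ngram_vector(trace, vocabulary, n):
    """Count how often each vocabulary n-gram occurs in the trace (bag-of-n-grams vector)."""
    counts = Counter(ngrams(trace, n))
    return [counts[g] for g in vocabulary]


# Hypothetical system-call traces, written as call numbers (illustrative only).
traces = [[3, 4, 3, 4, 5], [3, 4, 6, 6, 6]]
n = 2
vocabulary = sorted({g for t in traces for g in ngrams(t, n)})
vectors = [ngram_vector(t, vocabulary, n) for t in traces]
print(vocabulary)   # [(3, 4), (4, 3), (4, 5), (4, 6), (6, 6)]
print(vectors)      # [[2, 1, 1, 0, 0], [1, 0, 0, 1, 2]]
```

Each trace becomes a fixed-length numeric vector, which classifiers such as IBk or J48 can then consume.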
In a study by Assem, Rachidi and El Graini (Assem, Rachidi & Taha El Graini, 2018), a misuse
detection system named SC2.2 is proposed to address the binary class problem using the UNM
datasets. Using a Markov chain model of long system-call sequences, they defined conditional
probabilities on four different datasets. They measured the true positive rate (TPR) and false
positive rate (FPR) of naïve Bayes multinomial (NBm), C4.5 decision tree, Repeated Incremental
Pruning to Produce Error Reduction (RIPPER), support vector machine (SVM) and logistic
regression (LR) classifiers in their model. They reported classifier accuracy ranging from 97% to
99% and false positive rates ranging from 0.3% to 3% across all UNM datasets.
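The Markov chain idea can be illustrated with a generic first-order model. It estimates conditional probabilities P(next call | current call) from normal traces and scores new sequences by their log-likelihood. This is a hypothetical sketch of the general technique, not a reproduction of SC2.2. The traces and the floor probability for unseen transitions are assumptions.

```python
import math
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate first-order transition probabilities P(next call | current call)."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            pair_counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(counter.values()) for nxt, c in counter.items()}
        for cur, counter in pair_counts.items()
    }


def log_likelihood(seq, probs, floor=1e-6):
    """Sum of log transition probabilities; unseen transitions get a small floor value."""
    return sum(math.log(probs.get(cur, {}).get(nxt, floor)) for cur, nxt in zip(seq, seq[1:]))


# Hypothetical normal traces of system-call names (illustrative only).
normal = [["open", "read", "read", "close"], ["open", "read", "close"]]
probs = fit_transitions(normal)
print(probs["open"])                                      # {'read': 1.0}
print(log_likelihood(["open", "read", "close"], probs))   # typical sequence: higher score
print(log_likelihood(["open", "exec", "close"], probs))   # unseen transitions: much lower score
```

A low likelihood marks a sequence as unusual. Such scores, or the transition probabilities themselves, can then be passed to a classifier as features.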
De la Hoz et al. (De la Hoz, De La Hoz, Ortiz, Ortega & Prieto, 2015) discussed a combination
of statistical techniques and self-organizing maps for network anomaly detection. They used
Fisher's discriminant ratio together with principal component analysis for feature selection.
Using probabilistic self-organizing maps and noise removal, they classified network traffic into
two broad categories: normal traffic and anomalous traffic.
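Feature ranking with a two-class Fisher discriminant ratio, (μ₁ − μ₂)² / (σ₁² + σ₂²), can be sketched in plain Python. The sketch covers only this single ranking step and does not reproduce De la Hoz et al.'s full PCA and SOM pipeline. The feature names `dur` and `sbytes` follow UNSW-NB15 naming, but all values are made up.

```python
import statistics


def fisher_ratio(feature_normal, feature_attack):
    """Two-class Fisher discriminant ratio of one feature: (mu1 - mu2)^2 / (var1 + var2)."""
    mu1, mu2 = statistics.mean(feature_normal), statistics.mean(feature_attack)
    var1, var2 = statistics.pvariance(feature_normal), statistics.pvariance(feature_attack)
    denom = var1 + var2
    if denom == 0:  # both classes constant: perfectly separable if the means differ
        return 0.0 if mu1 == mu2 else float("inf")
    return (mu1 - mu2) ** 2 / denom


def rank_features(normal_rows, attack_rows, names):
    """Rank features from most to least discriminative by their Fisher ratio."""
    scores = {
        name: fisher_ratio([r[j] for r in normal_rows], [r[j] for r in attack_rows])
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical flows with two numeric features (values are illustrative only).
names = ["dur", "sbytes"]
normal_rows = [(1.0, 500), (1.2, 520), (0.9, 480)]
attack_rows = [(1.1, 60), (0.8, 40), (1.0, 50)]
print(rank_features(normal_rows, attack_rows, names))  # 'sbytes' ranks far above 'dur'
```

Features with a high ratio have class means far apart relative to their spread, so they are kept. Low-ratio features are candidates for removal.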
Wathiq Laftah Al-Yaseen et al. (Al-Yaseen, Othman & Nazri, 2017) proposed a multi-level hybrid
intrusion detection model to classify network traffic into normal and abnormal behaviour.
Their model is based on a support vector machine and an extreme learning machine, and it aims to
enhance the detection of both known and unknown attacks. They used a modified K-means
algorithm to build new training datasets for the model. This improved the performance of the
classifiers, reduced training effort and time, and contributed significantly to the efficiency of the
IDS. The KDD Cup 1999 dataset was used to evaluate the performance and efficiency of the
model, and an attack detection accuracy of up to 95.75% was reported.
In another study (Naseer et al., 2018), different deep learning techniques were investigated for
their suitability for anomaly detection in network flows. The authors developed an IDS model
based on several deep learning techniques: Convolutional Neural Networks (CNNs),
Auto-encoders and Recurrent Neural Networks (RNNs). They trained their models on the NSLKDD
training dataset and evaluated them on two test datasets, NSLKDDTest+ and NSLKDDTest21.
They reported that the Deep Convolutional Neural Network (DCNN) and Long Short-Term
Memory (LSTM) recurrent neural network (RNN) models achieved accuracies of up to 85% and
89%, respectively, on the test dataset. They concluded that deep learning is a viable and
promising technology for anomaly detection in network security.
Aygun and Yavuz (Aygun & Yavuz, 2017) used vanilla and denoising deep auto-encoders for an
anomaly-detection-based IDS model with the NSLKDD dataset. They reported accuracy rates of
88.28% and 88.6% on the NSLKDDTest+ dataset. Results on the NSLKDDTest21 dataset were not
provided in this study, and other quality metrics for evaluating their classifiers are also missing.
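The quality metrics that such studies report, or omit, can all be derived from a binary confusion matrix. The following plain-Python sketch treats "attack" (label 1) as the positive class. The labels and predictions are hypothetical. Libraries such as scikit-learn provide equivalent functions, but this version keeps every formula visible.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN for a binary problem (positive class = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn


def safe_div(a, b):
    """Divide, returning 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called the true positive rate (TPR)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall_tpr": recall,
        "fpr": safe_div(fp, fp + tn),  # false positive rate
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical labels (1 = attack, 0 = normal) and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(metrics(y_true, y_pred))
```

Accuracy alone can mislead on imbalanced traffic. In this example, a classifier that labels every flow as normal would still reach 60% accuracy while detecting no attacks at all. Its recall (TPR) of zero exposes the failure.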
Methodology/Research Design
The proposed IDS model uses supervised learning techniques to classify normal and abnormal
traffic patterns. The model employs four state-of-the-art classification techniques: Nearest
Neighbour, Random Forest, Multilayer Perceptron and Decision Tree. All of these techniques will
be implemented in the Python programming language.
The model will be trained on UNSW-NB15, a well-known and reliable standard dataset. The
dataset is divided into two parts, a training set and a testing set (Moustafa & Slay, 2016).
Moustafa and Slay (2016) state that UNSW-NB15 was recently generated as a benchmark dataset
for IDS performance evaluation. It contains nine types of modern attacks along with realistic
normal traffic activities captured over time. The dataset also includes 49 features describing data
flows between different nodes in a network. The authors compared UNSW-NB15 with the KDD99
dataset both statistically and practically. They showed that UNSW-NB15 is more complex and
reflects real-world situations more faithfully.
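One preparation step is to separate the symbolic features from the numeric ones so that only the symbolic columns are passed to the encoders. A minimal sketch using Python's `csv` module follows. It assumes a CSV file with a header row containing the column names `id`, `proto`, `service`, `state`, `attack_cat` and `label`, as in the UNSW-NB15 training/testing set files. This file layout is an assumption to check against the actual download. The two sample rows are made up and include only a few columns.

```python
import csv
import io

SYMBOLIC = {"proto", "service", "state"}   # symbolic features to be encoded later
TARGETS = {"attack_cat", "label"}          # attack category and binary label (1 = attack)
IGNORED = {"id"}                           # row identifier, not a feature


def load_flows(fileobj):
    """Split each CSV row into numeric features, symbolic features and the binary label."""
    skip = SYMBOLIC | TARGETS | IGNORED
    numeric, symbolic, labels = [], [], []
    for row in csv.DictReader(fileobj):
        numeric.append({k: float(v) for k, v in row.items() if k not in skip})
        symbolic.append({k: row[k] for k in SYMBOLIC if k in row})
        labels.append(int(row["label"]))
    return numeric, symbolic, labels


# A tiny made-up excerpt with a subset of the columns; real files have many more.
sample = io.StringIO(
    "id,dur,proto,service,state,sbytes,attack_cat,label\n"
    "1,0.12,tcp,http,FIN,1024,Normal,0\n"
    "2,0.00,udp,dns,INT,114,Generic,1\n"
)
numeric, symbolic, labels = load_flows(sample)
print(numeric[0], symbolic[1], labels)
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` sample. Numeric conversion may also need to handle any missing or non-numeric values found in the data.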
After training, the anomaly detection accuracy of each model will be evaluated on the UNSW-NB15
testing set, and the models will be compared statistically.
References/Bibliography
[1] Semente. (2016). Internet Security Threat Report (ISTR), vol. 21, p. 8, April 2016.
[2] Nevat, I., Divakaran, D., Nagarajan, S., Zhang, P., Su, L., Ko, L., & Thing, V. (2018).
Anomaly Detection and Attribution in Networks with Temporally Correlated
Traffic. IEEE/ACM Transactions on Networking, 26(1), 131-144. doi:
10.1109/tnet.2017.2765719
[3] Koh, J., Nevat, I., Leong, D., & Wong, W. (2016). Geo-Spatial Location Spoofing
Detection for Internet of Things. IEEE Internet Of Things Journal, 3(6), 971-978. doi:
10.1109/jiot.2016.2535165
[4] Li, Y., Ma, R., & Jiao, R. (2015). A Hybrid Malicious Code Detection Method based on
Deep Learning. International Journal Of Security And Its Applications, 9(5), 205-216.
doi: 10.14257/ijsia.2015.9.5.21
[5] Salama, M., Eid, H., Ramadan, R., Darwish, A., & Hassanien, A. (2011). Hybrid
Intelligent Intrusion Detection Scheme. Advances In Intelligent And Soft Computing,
293-303. doi: 10.1007/978-3-642-20505-7_26
[6] Denning, D. (1987). An Intrusion-Detection Model. IEEE Transactions On Software
Engineering, SE-13(2), 222-232. doi: 10.1109/tse.1987.232894
[7] García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., & Vázquez, E. (2009).
Anomaly-based network intrusion detection: Techniques, systems and
challenges. Computers & Security, 28(1-2), 18-28. doi: 10.1016/j.cose.2008.08.003
[8] Ravale, U., Marathe, N., & Padiya, P. (2015). Feature Selection Based Hybrid Anomaly
Intrusion Detection System Using K Means and RBF Kernel Function. Procedia
Computer Science, 45, 428-435. doi: 10.1016/j.procs.2015.03.174
[9] Aljawarneh, S., Aldwairi, M., & Yassein, M. (2018). Anomaly-based intrusion detection
system through feature selection analysis and building hybrid efficient model. Journal Of
Computational Science, 25, 152-160. doi: 10.1016/j.jocs.2017.03.006
[10] Kwon, D., Kim, H., Kim, J., Suh, S., Kim, I., & Kim, K. (2017). A survey of deep
learning-based network anomaly detection. Cluster Computing. doi:
10.1007/s10586-017-1117-8
[11] Liao, Y., & Vemuri, V. (2002). Use of K-Nearest Neighbor classifier for intrusion
detection. Computers & Security, 21(5), 439-448. doi: 10.1016/s0167-4048(02)00514-x
[12] Borisaniya, B., & Patel, D. (2015). Evaluation of Modified Vector Space Representation
Using ADFA-LD and ADFA-WD Datasets. Journal of Information Security, 06(03),
250-264. doi: 10.4236/jis.2015.63025
[13] Assem, N., Rachidi, T., & Taha El Graini, M. (2018). INTRUSION DETECTION USING
BAYESIAN CLASSIFIER FOR ARBITRARILY LONG SYSTEM CALL
SEQUENCES. IADIS International Journal On Computer Science And Information
Systems, 9(1), 71-81. Retrieved from
http://www.iadisportal.org/ijcsis/papers/2014170106.pdf
[14] De la Hoz, E., De La Hoz, E., Ortiz, A., Ortega, J., & Prieto, B. (2015). PCA filtering and
probabilistic SOM for network intrusion detection. Neurocomputing, 164, 71-81. doi:
10.1016/j.neucom.2014.09.083
[15] Al-Yaseen, W., Othman, Z., & Nazri, M. (2017). Multi-level hybrid support vector
machine and extreme learning machine based on modified K-means for intrusion
detection system. Expert Systems with Applications, 67, 296-303. doi:
10.1016/j.eswa.2016.09.041
[16] Naseer, S., Saleem, Y., Khalid, S., Bashir, M., Han, J., Iqbal, M., & Han, K. (2018).
Enhanced Network Anomaly Detection Based on Deep Neural Networks. IEEE
Access, 6, 48231-48246. doi: 10.1109/access.2018.2863036
[17] Aygun, R., & Yavuz, A. (2017). Network Anomaly Detection with Stochastically
Improved Autoencoder Based Models. 2017 IEEE 4th International Conference On
Cyber Security And Cloud Computing (CSCloud). doi: 10.1109/cscloud.2017.39
[18] Moustafa, N., & Slay, J. (2016). The evaluation of Network Anomaly Detection Systems:
Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99
data set. Information Security Journal: A Global Perspective, 25(1-3), 18-31. doi:
10.1080/19393555.2015.1125974
DECLARATION
We hereby agree to supervise the research work as per above proposal/synopsis.
____________________
Signature of Supervisor
Signature of SC Member 1        Date: _________
Signature of SC Member 2        Date: _________
Note: Hard and soft copies of the synopsis/research proposal must be submitted to the Secretary ASRB for final approval.
FOR VU THESIS SUPERVISOR USE ONLY
Profile of Supervisor
Name of Supervisor:_________________________________________________________
Designation: _______________________________________________________________
Total No. of Impact Factor Research Publications during last 5 years: ____
Total No. of Publications without Impact Factor during last 5 years: _______________
Ongoing Research Students
Number of MS/M.Phil. students: ____
Number of PhD students: ____
Signature of Supervisor
Endst. No. ___________ Dated:______________
The proposal entitled “ ” duly recommended by the Graduate Research Committee (GRC) in its
meeting held on __________ is forwarded to ASRB through the Dean of the Faculty for approval
and allocation of funds (if requested).
Signature / Seal
Chairperson of the Department
Date: ___________
Signature / Seal
Dean of the Faculty
Date: ___________

Signature / Seal
Secretary ASRB
Date: ___________
