Preventing SQL Injection Attacks With Python
Real Python
Every few years, the Open Web Application Security Project (OWASP) ranks the most critical web application
security risks. Injection risks have appeared in every report and have repeatedly topped the list. Among all
injection types, SQL injection is one of the most common attack vectors, and arguably the most dangerous.
Because Python is one of the most popular programming languages in the world, knowing how to protect against
Python SQL injection is critical.
In this tutorial, you’re going to learn:
• What Python SQL injection is and how to prevent it
• How to compose queries with both literals and identifiers as parameters
• How to safely execute queries in a database
This tutorial is suited for users of all database engines. The examples here use PostgreSQL, but the results
can be reproduced in other database management systems (such as SQLite, MySQL, Microsoft SQL Server,
Oracle, and so on).
Setting Up a Database
To get started, you’re going to set up a fresh PostgreSQL database and populate it with data. Throughout the
tutorial, you’ll use this database to witness firsthand how Python SQL injection works.
Creating a Database
First, open your shell and create a new PostgreSQL database owned by the user postgres:
$ createdb -O postgres psycopgtest
Here you used the command line option -O to set the owner of the database to the user postgres. You also
specified the name of the database, which is psycopgtest.
Your new database is ready to go! You can connect to it using psql:
$ psql -U postgres -d psycopgtest
psql (11.2, server 10.5)
Type "help" for help.
You’re now connected to the database psycopgtest as the user postgres. This user is also the database
owner, so you’ll have read permissions on every table in the database.
Now that you have a database, it’s time to set up your Python environment. For step-by-step instructions on how
to do this, check out Python Virtual Environments: A Primer.
Create your virtual environment in a new directory:
(~/src) $ mkdir psycopgtest
(~/src) $ cd psycopgtest
(~/src/psycopgtest) $ python3 -m venv venv
After you run this command, a new directory called venv will be created. This directory will store all the
packages you install inside the virtual environment.
To connect to a database in Python, you need a database adapter. Most database adapters follow version 2.0
of the Python Database API Specification, PEP 249, and every major database engine has a leading adapter.
To connect to a PostgreSQL database, you’ll need to install Psycopg, which is the most popular adapter for
PostgreSQL in Python. Django ORM uses it by default, and it’s also supported by SQLAlchemy.
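Because adapters share the DB-API 2.0 interface, the same connect, cursor, execute, and fetch pattern carries over from one engine to another. As a rough illustration, the sketch below uses the standard library's sqlite3 module (an assumption made so that it runs without a PostgreSQL server) to read two module-level attributes that PEP 249 asks every adapter to expose.

```python
import sqlite3

# PEP 249 requires every adapter module to expose these attributes.
api_level = sqlite3.apilevel      # DB-API version implemented by the module
param_style = sqlite3.paramstyle  # placeholder style expected in queries

# The connect -> cursor -> execute -> fetch flow is common to DB-API adapters.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("SELECT 1 + 1")
(answer,) = cursor.fetchone()
cursor.close()
connection.close()

print(api_level, param_style, answer)
```

psycopg2 exposes the same attributes, but its paramstyle is pyformat, which is why the queries in this tutorial use %s and %(name)s placeholders rather than the ? placeholder that sqlite3 expects.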
In your terminal, activate the virtual environment and use pip to install psycopg2. Quote the requirement so
that the shell doesn't treat >= as an output redirection:
(~/src/psycopgtest) $ source venv/bin/activate
(~/src/psycopgtest) $ python -m pip install "psycopg2>=2.8.0"
Collecting psycopg2
Using cached https://....
psycopg2-2.8.2.tar.gz
Installing collected packages: psycopg2
Running setup.py install for psycopg2 ... done
Successfully installed psycopg2-2.8.2
Now you’re ready to create a connection to your database. Here’s the start of your Python script:
import psycopg2

connection = psycopg2.connect(
    host="localhost",
    database="psycopgtest",
    user="postgres",
    password=None,
)
connection.set_session(autocommit=True)
You used psycopg2.connect() to create the connection. This function accepts the following arguments:
• host is the IP address or the DNS name of the server where your database is located. In this case, the host
is your local machine, or localhost.
• database is the name of the database to connect to. You want to connect to the database you created earlier,
psycopgtest.
• user is a user with permissions for the database. In this case, you want to connect to the database as the
owner, so you pass the user postgres.
• password is the password for whoever you specified in user. In most development environments, users can
connect to the local database without a password.
After setting up the connection, you configured the session with autocommit=True. Activating autocommit
means you won’t have to manually manage transactions by issuing a commit or rollback. This is the default
behavior in most ORMs. You use this behavior here as well so that you can focus on composing SQL queries
instead of managing transactions.
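To make the effect of autocommit more concrete, here is a small sketch that uses the standard library's sqlite3 module instead of psycopg2 (an assumption made so that it runs without a PostgreSQL server). The helper name pending_after_insert is hypothetical. In sqlite3, passing isolation_level=None plays roughly the role that autocommit=True plays in psycopg2.

```python
import sqlite3


def pending_after_insert(autocommit: bool) -> bool:
    """Return True if an INSERT leaves an open transaction awaiting commit()."""
    # isolation_level=None switches sqlite3 to autocommit mode; "DEFERRED"
    # makes the module open a transaction implicitly before the INSERT.
    level = None if autocommit else "DEFERRED"
    connection = sqlite3.connect(":memory:", isolation_level=level)
    try:
        connection.execute("CREATE TABLE users (username TEXT, admin INTEGER)")
        connection.execute("INSERT INTO users VALUES ('ran', 1)")
        return connection.in_transaction
    finally:
        connection.close()


print(pending_after_insert(autocommit=True))
print(pending_after_insert(autocommit=False))
```

Without autocommit, the insert stays inside an open transaction until you call commit() or rollback(). With autocommit, each statement takes effect as soon as it runs.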
Executing a Query
Now that you have a connection to the database, you’re ready to execute a query:
>>> with connection.cursor() as cursor:
...     cursor.execute('SELECT COUNT(*) FROM users')
...     result = cursor.fetchone()
...     print(result)
(2,)
You used the connection object to create a cursor. Just like a file in Python, cursor is implemented as a
context manager. When you create the context, a cursor is opened for you to use to send commands to the
database. When the context exits, the cursor closes and you can no longer use it.
While inside the context, you used cursor to execute a query and fetch the results. In this case, you issued a
query to count the rows in the users table. To fetch the result from the query, you executed
cursor.fetchone() and received a tuple. Since the query can return only one row, you used fetchone(). If
the query were to return more than one row, then you’d need to either iterate over cursor or use one of the
other fetch* methods.
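The different ways of reading results can be sketched as follows. This example assumes a users table with username and admin columns holding the two users that appear in this tutorial, and it uses the standard library's sqlite3 module so that it runs without a PostgreSQL server.

```python
import sqlite3
from contextlib import closing

connection = sqlite3.connect(":memory:")
# Assumed schema and data, chosen to mirror the two users in the tutorial.
connection.execute("CREATE TABLE users (username TEXT, admin INTEGER)")
connection.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ran", 1), ("haki", 0)],
)

# sqlite3 cursors are not context managers, so closing() stands in for
# the `with connection.cursor()` form that psycopg2 supports.
with closing(connection.cursor()) as cursor:
    # fetchone() returns a single row as a tuple (or None when no row is left).
    cursor.execute("SELECT COUNT(*) FROM users")
    (user_count,) = cursor.fetchone()

    # Iterating over the cursor yields one row at a time.
    cursor.execute("SELECT username FROM users ORDER BY username")
    usernames = [username for (username,) in cursor]

    # fetchall() collects every remaining row into a list.
    cursor.execute("SELECT username, admin FROM users ORDER BY username")
    rows = cursor.fetchall()

print(user_count, usernames, rows)
```

Iterating is usually the better fit for large result sets, since fetchall() loads every row into memory at once.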
In the previous section, you saw how an intruder can exploit your system and gain admin permissions by using a
carefully crafted string. The issue was that you allowed the value passed from the client to be sent directly
to the database and executed, without performing any sort of check or validation. SQL injections rely on this
type of vulnerability.
Any time user input is used in a database query, there’s a possible vulnerability for SQL injection. The key to
preventing Python SQL injection is to make sure the value is being used as the developer intended. In the
previous example, you intended for username to be used as a string. In reality, it was used as a raw SQL
statement.
To make sure values are used as they’re intended, you need to escape the value. For example, to prevent
intruders from injecting raw SQL in the place of a string argument, you can escape quotation marks:
>>> # BAD EXAMPLE. DON'T DO THIS!
>>> username = username.replace("'", "''")
This is just one example. There are a lot of special characters and scenarios to think about when trying to
prevent Python SQL injection. Luckily, modern database adapters come with built-in tools for preventing
Python SQL injection by using query parameters. These are used instead of plain string interpolation to
compose a query with parameters.
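One scenario that quote escaping misses is a value used where SQL expects a number. The sketch below is only an illustration: the numeric id column and the escape_quotes() helper are hypothetical, and it uses the standard library's sqlite3 module (with ? placeholders) so that it runs without a PostgreSQL server.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Hypothetical schema: the numeric id column exists only for this illustration.
connection.execute("CREATE TABLE users (id INTEGER, username TEXT, admin INTEGER)")
connection.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ran", 1), (2, "haki", 0)],
)


def escape_quotes(value: str) -> str:
    """Hand-rolled escaping that only doubles single quotation marks."""
    return value.replace("'", "''")


# The payload contains no quotation marks, so escaping leaves it unchanged.
payload = "2 OR 1=1"

# Interpolated into a numeric comparison, the payload rewrites the WHERE clause.
unsafe_query = "SELECT username FROM users WHERE id = " + escape_quotes(payload)
leaked = sorted(row[0] for row in connection.execute(unsafe_query))

# Bound as a parameter, the payload is compared as one value and matches nothing.
matched = connection.execute(
    "SELECT username FROM users WHERE id = ?", (payload,)
).fetchall()

print(leaked, matched)
```

The interpolated query returns every username, while the parameterized query returns no rows, because the whole payload is treated as a single value.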
Now that you have a better understanding of the vulnerability, you’re ready to rewrite the function using query
parameters instead of string interpolation:
 1 def is_admin(username: str) -> bool:
 2     with connection.cursor() as cursor:
 3         cursor.execute("""
 4             SELECT
 5                 admin
 6             FROM
 7                 users
 8             WHERE
 9                 username = %(username)s
10         """, {
11             'username': username
12         })
13         result = cursor.fetchone()
14
15     if result is None:
16         # User does not exist
17         return False
18
19     admin, = result
20     return admin
Here’s what’s different in this example:
• In line 9, you used a named parameter username to indicate where the username should go. Notice how the
parameter username is no longer surrounded by single quotation marks.
• In line 11, you passed the value of username as the second argument to cursor.execute(). The connection
will use the type and value of username when executing the query in the database.
To test this function, try some valid and invalid values, including the dangerous string from before:
>>> is_admin('haki')
False
>>> is_admin('ran')
True
>>> is_admin('foo')
False
>>> is_admin("'; select true; --")
False
Amazing! The function returned the expected result for all values. What’s more, the dangerous string no longer
works. To understand why, you can inspect the query generated by execute():
>>> with connection.cursor() as cursor:
...     cursor.execute("""
...         SELECT
...             admin
...         FROM
...             users
...         WHERE
...             username = %(username)s
...     """, {
...         'username': "'; select true; --"
...     })
...     print(cursor.query.decode('utf-8'))
        SELECT
            admin
        FROM
            users
        WHERE
            username = '''; select true; --'
The connection treated the value of username as a string and escaped any characters that might terminate the
string and introduce Python SQL injection.
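If you don't have PostgreSQL at hand, the contrast between interpolation and query parameters can be reproduced with the standard library's sqlite3 module. The following is a minimal sketch under a few assumptions: the table layout is assumed, the function names is_admin_unsafe() and is_admin_safe() are hypothetical, and the payload is a UNION variant, because sqlite3 refuses to run more than one statement per execute() call.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Assumed schema and data, chosen to mirror the two users in the tutorial.
connection.execute("CREATE TABLE users (username TEXT, admin INTEGER)")
connection.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ran", 1), ("haki", 0)],
)


def is_admin_unsafe(username: str) -> bool:
    """BAD: builds the query with string interpolation."""
    query = f"SELECT admin FROM users WHERE username = '{username}'"
    result = connection.execute(query).fetchone()
    return bool(result[0]) if result else False


def is_admin_safe(username: str) -> bool:
    """Passes username as a query parameter (sqlite3 uses ? placeholders)."""
    result = connection.execute(
        "SELECT admin FROM users WHERE username = ?", (username,)
    ).fetchone()
    return bool(result[0]) if result else False


# sqlite3 rejects multi-statement strings, so this sketch uses a UNION
# payload instead of the "'; select true; --" string.
payload = "' UNION SELECT 1 --"

print(is_admin_unsafe(payload))
print(is_admin_safe(payload))
```

The interpolated version reports the made-up user as an admin, while the parameterized version looks for a user whose name is literally the payload and finds none.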
Database adapters usually offer several ways to pass query parameters. Named placeholders are usually the
best for readability, but some implementations might benefit from using other options.
Let’s take a quick look at some of the right and wrong ways to use query parameters. The following code block
shows the types of queries you’ll want to avoid:
# BAD EXAMPLES. DON'T DO THIS!
cursor.execute("SELECT admin FROM users WHERE username = '" + username + "'")
cursor.execute("SELECT admin FROM users WHERE username = '%s'" % username)
cursor.execute("SELECT admin FROM users WHERE username = '{}'".format(username))
cursor.execute(f"SELECT admin FROM users WHERE username = '{username}'")
Each of these statements passes username from the client directly to the database, without performing any sort
of check or validation. This sort of code is ripe for inviting Python SQL injection.
In contrast, these types of queries should be safe for you to execute:
# SAFE EXAMPLES. DO THIS!
cursor.execute("SELECT admin FROM users WHERE username = %s", (username,))
cursor.execute("SELECT admin FROM users WHERE username = %(username)s", {'username': username})
In these statements, username is passed as a query parameter: positionally in the first statement and by name
in the second. Now, the database will use the specified type and value of username when executing the query,
offering protection from Python SQL injection.
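One detail in the positional form is easy to miss: the trailing comma. The parameters argument has to be a sequence (or a mapping for named placeholders), and (username) without the comma is just a string. The sketch below shows what happens in that case with the standard library's sqlite3 module, which is used here only so the example runs without a PostgreSQL server. Other adapters may report the mistake differently.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Assumed schema and data for illustration.
connection.execute("CREATE TABLE users (username TEXT, admin INTEGER)")
connection.execute("INSERT INTO users VALUES ('ran', 1)")

query = "SELECT admin FROM users WHERE username = ?"
username = "ran"

# (username,) is a one-element tuple: one placeholder, one value.
(admin,) = connection.execute(query, (username,)).fetchone()

# (username) is just the string "ran", which sqlite3 reads as a
# sequence of three separate values for a single placeholder.
try:
    connection.execute(query, (username))
    error_name = None
except sqlite3.ProgrammingError as exc:
    error_name = type(exc).__name__

print(admin, error_name)
```

Passing a list, such as [username], avoids the trailing-comma pitfall altogether.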
rowcount, = result
return rowcount
Try to execute the function on your users table:
>>> count_rows('users')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in count_rows
psycopg2.errors.SyntaxError: syntax error at or near "'users'"
LINE 5: 'users'
        ^
The command failed to generate the SQL. As you’ve seen already, the database adapter treats the variable as a
string or a literal. A table name, however, is not a plain string. This is where SQL composition comes in.
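The same limitation shows up in other adapters. As a rough illustration, the sketch below tries to bind a table name through a placeholder using the standard library's sqlite3 module (an assumption made so that it runs without a PostgreSQL server). The error differs from the psycopg2 traceback, but the cause is the same: placeholders stand for values, not identifiers.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Assumed schema for illustration.
connection.execute("CREATE TABLE users (username TEXT, admin INTEGER)")

# Placeholders stand for values only. Using one where SQL expects a table
# name makes the statement invalid before any data is read.
try:
    connection.execute("SELECT COUNT(*) FROM ?", ("users",))
    outcome = "executed"
except sqlite3.OperationalError as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```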
You already know it’s not safe to use string interpolation to compose SQL. Luckily, Psycopg provides a module
called psycopg2.sql to help you safely compose SQL queries. Let’s rewrite the function using
psycopg2.sql.SQL():
from psycopg2 import sql
rowcount, = result
return rowcount
There are two differences in this implementation. First, you used sql.SQL() to compose the query. Then, you
used sql.Identifier() to annotate the argument value table_name. (An identifier is a column or table
name.)
Now, try executing the function on the users table:
>>> count_rows('users')
2
Great! Next, let’s see what happens when the table does not exist:
>>> count_rows('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in count_rows
psycopg2.errors.UndefinedTable: relation "foo" does not exist
LINE 5: "foo"
        ^
The function raises the UndefinedTable exception. In the following steps, you’ll use this exception as an
indication that your function is safe from a Python SQL injection attack.
To put it all together, add an option to count rows in the table up to a certain limit. This feature might be useful
for very large tables. To implement this, add a LIMIT clause to the query, along with query parameters for the
limit’s value:
from psycopg2 import sql
rowcount, = result
return rowcount
In this code block, you annotated limit using sql.Literal(). As in the previous example, psycopg will bind all
query parameters as literals when using the simple approach. However, when using sql.SQL(), you need to
explicitly annotate each parameter using either sql.Identifier() or sql.Literal().
Execute the function to make sure that it works:
>>> count_rows('users', 1)
1
>>> count_rows('users', 10)
2
Now that you see the function is working, make sure it’s also safe:
>>> count_rows("(select 1) as foo; update users set admin = true where name = 'haki'; --", 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 18, in count_rows
psycopg2.errors.UndefinedTable: relation "(select 1) as foo; update users set admin = true where name = '" does not exist
LINE 8: "(select 1) as foo; update users set adm...
        ^
This traceback shows that psycopg escaped the value, and the database treated it as a table name. Since a
table with this name doesn’t exist, an UndefinedTable exception was raised and you were not hacked!
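If you're following along with a different adapter, the same split between identifiers and literals still applies, although the tools differ. The sketch below is one possible approach for the standard library's sqlite3 module, which has no counterpart to sql.Identifier(). It swaps in a different technique, an allow-list lookup against the database catalog, before quoting the table name. The schema is assumed, and this count_rows() is a hypothetical stand-in rather than the psycopg2 implementation described above.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Assumed schema and data, chosen to mirror the two users in the tutorial.
connection.execute("CREATE TABLE users (username TEXT, admin INTEGER)")
connection.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ran", 1), ("haki", 0)],
)


def count_rows(table_name: str, limit: int) -> int:
    """Count up to `limit` rows in `table_name`."""
    # Allow-list check: the name must match an existing table exactly.
    # The lookup itself binds table_name as an ordinary value.
    known = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    if known is None:
        raise ValueError(f"unknown table: {table_name!r}")

    # Quote the verified name as an identifier; bind limit as a literal value.
    quoted = '"' + known[0].replace('"', '""') + '"'
    query = f"SELECT COUNT(*) FROM (SELECT 1 FROM {quoted} LIMIT ?)"
    (rowcount,) = connection.execute(query, (limit,)).fetchone()
    return rowcount


print(count_rows("users", 1))
print(count_rows("users", 10))
```

Here the malicious table name from the example above never reaches the query text. The lookup finds no table with that name, so the function raises ValueError instead of composing any SQL from it.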
Conclusion
You’ve successfully implemented a function that composes dynamic SQL without putting your system at risk for
Python SQL injection! You’ve used both literals and identifiers in your query without compromising security.
You’ve learned:
• What Python SQL injection is and how it can be exploited
• How to prevent Python SQL injection using query parameters
• How to safely compose SQL statements that use literals and identifiers as parameters
You’re now able to create programs that can withstand attacks from the outside. Go forth and thwart the
hackers!