SQL For Beginners Tutorial (Learn SQL in 2020) - Datagy
Welcome to our SQL for Beginners Tutorial! In this guide, you’ll learn everything you need to know to
get started with SQL for data analysis.
We cover fundamental concepts of the SQL language, such as creating databases and tables,
selecting records, and updating and deleting records.
We also cover some more intermediate concepts, such as joining tables, and provide many SQL
examples to guide you through the process.
What is SQL?
SQL stands for Structured Query Language and is a standard programming language designed for
storing, retrieving, and managing data stored in a relational database management system (RDBMS).
SQL is the most widely used database language, but different database systems implement it with
their own variations. For the purposes of this tutorial, we’ll use SQLite3, a lightweight database
engine that implements most of standard SQL and is easy to set up if you want to follow along.
SQL is used to create, maintain, and update databases. Because databases are everywhere in
technology, whether on your iPhone or on this website, SQL is used almost everywhere.
It’s also in high demand for data jobs: according to an Indeed study, nearly a quarter of tech jobs
require knowledge of SQL.
How easy SQL is to learn depends on your previous knowledge. If you understand relational
databases or other programming languages, you might have an easier time.
The best way to learn is to dive in with beginner exercises. Later, you can apply what you’ve
learned to larger, more complex examples to better prepare you for the real world.
What is SQLite?
SQLite is a relational database management system that is embedded in the application that uses it,
rather than running as a separate server. This makes it an easy solution to follow along with in this
tutorial, as it’s something you can set up immediately on your machine. Its SQL dialect is quite
similar to that of PostgreSQL, another popular database system.
Tip! If you want to follow along by entering the code yourself, you can download the free DB
Browser for SQLite.
SQL for Beginners Tutorial – What We’re Creating
In this SQL for beginners tutorial, I’ll walk you through all the code you need to create the
following database. It’s a simple one, but it’ll teach you basic and intermediate SQL skills!
If you’re not familiar with database structures, let’s go over a few key elements before diving in:
Table names are listed in blue. In this database, we have two tables: clients and orders.
Primary keys of tables are in bold. Primary keys uniquely identify a record in a table.
A line is drawn between columns that have a relationship. In this case, the client_id column in the
clients table connects with the userid column in the orders table. Each client can have multiple
orders, which means that the clients table has a one-to-many relationship with the orders table.
CREATE TABLE is the command used to instruct SQL to create a new table.
IF NOT EXISTS makes SQL create the table only if it doesn’t already exist.
Within the parentheses, each column is defined by its name, an optional data type, and any
column constraints.
SQLite supports column constraints such as PRIMARY KEY, UNIQUE, NOT NULL, and CHECK.
After the column definitions, still within the parentheses, you can add table constraints such as
PRIMARY KEY, FOREIGN KEY, and UNIQUE.
We end with a semicolon, which lets SQL know that the command is complete.
Assigning a PRIMARY KEY constraint to a column (or to multiple columns) means that the column(s)
uniquely identify each record in the table.
In order to create the two tables for our sample database, we would write the following code:
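Here is a minimal sketch of the two CREATE TABLE statements, run with Python’s built-in sqlite3 module against a throwaway in-memory database. The columns are assumed from the ones this tutorial mentions: client_id, fname, and lname in clients, and order_id, userid, and total in orders. The CHECK constraint is purely illustrative, so the tutorial’s real sample database may differ.

```python
import sqlite3

# Hypothetical, simplified schema based on the columns this tutorial mentions.
CREATE_TABLES = """
CREATE TABLE IF NOT EXISTS clients (
    client_id INTEGER PRIMARY KEY,   -- uniquely identifies each client
    fname     TEXT NOT NULL,
    lname     TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS orders (
    order_id INTEGER PRIMARY KEY,    -- uniquely identifies each order
    userid   INTEGER NOT NULL,
    total    REAL CHECK (total >= 0),  -- illustrative column constraint
    FOREIGN KEY (userid) REFERENCES clients (client_id)  -- table constraint
);
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(CREATE_TABLES)
conn.executescript(CREATE_TABLES)   # no error the second time, thanks to IF NOT EXISTS

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['clients', 'orders']
```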
When we run this command, we create our two tables. We’re including the IF NOT EXISTS clause
so that we can re-run the command without SQL throwing an error that the tables already exist.
We first specify the name of the table we want to add values into.
We then specify the list of columns we’re providing values for. While this list is optional, it’s
good practice to include it.
We then follow with a list of the values we want to insert. If we leave out the column list, we have
to include a value for every column, in the order the columns were defined.
If the table has constraints, such as UNIQUE or NOT NULL, the values in our INSERT statement need
to satisfy them.
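The following sketch runs the INSERT patterns described above against a simplified, hypothetical clients table, using Python’s sqlite3 module. It inserts one row with a column list and one without, and then shows a NOT NULL violation being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified clients table.
conn.execute("""CREATE TABLE clients (
    client_id INTEGER PRIMARY KEY,
    fname TEXT NOT NULL,
    lname TEXT NOT NULL)""")

# With a column list: values map to the listed columns in order.
conn.execute("INSERT INTO clients (client_id, fname, lname) VALUES (1, 'John', 'Smith');")

# Without a column list: a value is needed for every column, in table order.
conn.execute("INSERT INTO clients VALUES (2, 'Jane', 'Doe');")

# Breaking a constraint (NOT NULL on lname) makes the INSERT fail.
try:
    conn.execute("INSERT INTO clients (client_id, fname) VALUES (3, 'Ada');")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT * FROM clients ORDER BY client_id").fetchall()
print(rows)      # [(1, 'John', 'Smith'), (2, 'Jane', 'Doe')]
print(rejected)  # True
```

In application code, values are usually passed as ? placeholders, for example conn.execute("INSERT INTO clients VALUES (?, ?, ?)", (3, 'Ada', 'Lee')), rather than pasted into the SQL string.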
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
We follow UPDATE with the name of the table where we want to update records.
SET is followed by a list of column = value pairs, naming each column to update and its new value.
The WHERE clause identifies the records we want to update.
Note! While the WHERE clause is optional, if it’s left out, every record in the table is
updated.
Let’s try this out to update one of our records in our client table:
UPDATE clients
SET fname = 'Jean', lname = 'Grey'
WHERE client_id = 2;
In this example, we updated the second record in the clients table, changing the client’s first name
from Jane to Jean and the last name from Doe to Grey.
If we had left out the WHERE clause, all first names in the table would have become Jean and all
last names would have become Grey!
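To see the difference the WHERE clause makes, this sketch runs the UPDATE from the example twice on two hypothetical client rows, using Python’s sqlite3 module. The first run includes the WHERE clause and the second leaves it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)")
conn.executemany("INSERT INTO clients VALUES (?, ?, ?)",
                 [(1, "John", "Smith"), (2, "Jane", "Doe")])  # hypothetical rows

# The UPDATE from the example: only the row matching WHERE changes.
cur = conn.execute("UPDATE clients SET fname = 'Jean', lname = 'Grey' WHERE client_id = 2;")
updated_count = cur.rowcount
rows = conn.execute("SELECT * FROM clients ORDER BY client_id").fetchall()
print(updated_count, rows)  # 1 [(1, 'John', 'Smith'), (2, 'Jean', 'Grey')]

# The same statement WITHOUT a WHERE clause overwrites every row.
conn.execute("UPDATE clients SET fname = 'Jean', lname = 'Grey';")
rows_without_where = conn.execute("SELECT * FROM clients ORDER BY client_id").fetchall()
print(rows_without_where)  # [(1, 'Jean', 'Grey'), (2, 'Jean', 'Grey')]
```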
WHERE is followed by the condition(s) that tell SQL which records to delete.
Note! The WHERE clause is optional, but if it’s left out, all the records in the table will
be deleted.
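This sketch runs the DELETE FROM table_name WHERE condition; pattern and deletes the order whose order_id is 6. It uses Python’s sqlite3 module and a few hypothetical order rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(5, 3, 310.0), (6, 2, 45.0), (7, 1, 99.5)])  # hypothetical rows

# Only the record matching the WHERE condition is removed.
conn.execute("DELETE FROM orders WHERE order_id = 6;")

remaining = [row[0] for row in conn.execute("SELECT order_id FROM orders ORDER BY order_id")]
print(remaining)  # [5, 7]
```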
In the above example, we deleted the record where order_id is equal to 6 from the orders table.
Selecting and retrieving data is an important skill for data analysis and data science. Because of this,
we’ll dedicate a significant amount of time to this to provide helpful examples!
The most straightforward way to select data with SQL is to select all the records in a table. This is
accomplished using the structure below:
The asterisk (*) is used as a wildcard character in SQL. Here, it asks SQL to return all columns in the table.
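The following sketch runs SELECT * FROM clients; against a hypothetical two-row table, using Python’s sqlite3 module. It reads the cursor’s column names to show that * expands to every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)")
conn.executemany("INSERT INTO clients VALUES (?, ?, ?)",
                 [(1, "John", "Smith"), (2, "Jane", "Doe")])  # hypothetical rows

cur = conn.execute("SELECT * FROM clients;")
columns = [d[0] for d in cur.description]  # * expands to every column in the table
rows = cur.fetchall()
print(columns)  # ['client_id', 'fname', 'lname']
print(rows)
```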
If we only wanted to return some of the columns in a table, we could specify the column names in
our SELECT statement. This follows the structure below:
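Here, the same hypothetical clients table is queried with an explicit column list, using Python’s sqlite3 module. Only the named columns come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)")
conn.executemany("INSERT INTO clients VALUES (?, ?, ?)",
                 [(1, "John", "Smith"), (2, "Jane", "Doe")])  # hypothetical rows

# SELECT column_names FROM table_name;
cur = conn.execute("SELECT fname, lname FROM clients;")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['fname', 'lname']
```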
In real-world databases, tables will have many, many more rows than our sample tables. By limiting
the number of rows a query returns, you can also improve its performance, which is especially
useful when you’re testing a query.
Let’s see what this looks like! For the purposes of our SQL for beginners tutorial, we will follow SQLite
syntax, which uses the same LIMIT clause as MySQL:
SELECT column_names
FROM table_name
LIMIT num_of_rows;
This follows a similar structure to a regular select statement, except we add a LIMIT clause at the end
with a number of rows we want to limit the query to.
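This sketch applies LIMIT to 100 hypothetical orders using Python’s sqlite3 module. Without ORDER BY, SQL doesn’t guarantee which rows a LIMIT returns, so the sketch adds one to make the result predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
# 100 hypothetical orders.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 3 + 1, 10.0 * i) for i in range(1, 101)])

# Only the first 5 rows come back, however large the table is.
rows = conn.execute(
    "SELECT order_id, total FROM orders ORDER BY order_id LIMIT 5;").fetchall()
print(len(rows))  # 5
```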
The WHERE clause is used in the SELECT statement in the following way:
The FROM statement identifies the table from which to extract records, and the WHERE clause lists
the condition(s) a record must meet to be returned.
Say we only wanted to select records from our orders table where the total was higher than 100.
We could write:
SELECT *
FROM orders
WHERE total > 100;
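The following sketch runs that query on six hypothetical orders with Python’s sqlite3 module. Only the orders whose total exceeds 100 are returned. An ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
# Hypothetical sample orders: (order_id, userid, total)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 250.0), (2, 1, 80.0), (3, 2, 120.0),
    (4, 3, 85.0), (5, 3, 310.0), (6, 1, 200.0),
])

ids = [row[0] for row in conn.execute(
    "SELECT * FROM orders WHERE total > 100 ORDER BY order_id;")]
print(ids)  # [1, 3, 5, 6]
```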
To filter data more precisely, we can use different operators, which are listed below. Since this is
a SQL for beginners tutorial, we’ll only cover some of them.
Operator | Description | Example
BETWEEN | Matches a value within a range of values (inclusive) | WHERE total BETWEEN 50 AND 100
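This sketch uses Python’s sqlite3 module and hypothetical totals that sit right on and just outside the bounds. It shows that BETWEEN includes both ends of the range and is equivalent to using >= and <=.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
# Hypothetical totals chosen around the 50..100 boundaries.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 50.0), (2, 1, 75.0), (3, 2, 100.0), (4, 3, 100.5), (5, 3, 49.99),
])

between = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders WHERE total BETWEEN 50 AND 100 ORDER BY order_id;")]
explicit = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders WHERE total >= 50 AND total <= 100 ORDER BY order_id;")]
print(between)  # [1, 2, 3]
```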
We can also apply multiple conditions in a WHERE clause, using the operators shown above and
combining the conditions with AND and OR to further filter data.
The AND operator checks whether both conditions are TRUE. It’s used in combination with
the WHERE clause, following the format below:
SELECT column_names
FROM table_name
WHERE condition1 AND condition2;
This returns only records where both condition1 and condition2 are met.
For example, if we wanted to return all orders where the userid is equal to 1 and the order total is
greater than or equal to 200, we could use the following code:
SELECT *
FROM orders
WHERE userid = 1 AND total >= 200;
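This sketch runs the AND query on six hypothetical orders with Python’s sqlite3 module. An order is returned only when both conditions hold. Order 6, with a total of exactly 200, is included because >= is inclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
# Hypothetical sample orders: (order_id, userid, total)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 250.0), (2, 1, 80.0), (3, 2, 120.0),
    (4, 3, 85.0), (5, 3, 310.0), (6, 1, 200.0),
])

ids = [row[0] for row in conn.execute(
    "SELECT * FROM orders WHERE userid = 1 AND total >= 200 ORDER BY order_id;")]
print(ids)  # [1, 6]
```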
The OR operator is used when at least one of the conditions needs to be true. This is helpful in
situations where it doesn’t matter which condition is true.
SELECT column_names
FROM table_name
WHERE condition1 OR condition2;
SELECT *
FROM orders
WHERE userid = 1 OR total >= 90;
This returns any record where the userid is equal to 1 or where the total is greater than or
equal to 90.
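The same six hypothetical orders, run through Python’s sqlite3 module, illustrate OR. Every order that meets either condition is returned. Only order 4 (userid 3, total 85) fails both conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
# Hypothetical sample orders: (order_id, userid, total)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 250.0), (2, 1, 80.0), (3, 2, 120.0),
    (4, 3, 85.0), (5, 3, 310.0), (6, 1, 200.0),
])

ids = [row[0] for row in conn.execute(
    "SELECT * FROM orders WHERE userid = 1 OR total >= 90 ORDER BY order_id;")]
print(ids)  # [1, 2, 3, 5, 6]
```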
SELECT column_names
FROM table_name
WHERE condition1 AND (condition2 OR condition3);
SELECT *
FROM orders
WHERE total < 200 AND (userid = 1 OR userid = 3);
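The parentheses matter because SQL evaluates AND before OR. Without them, the condition is read as (total < 200 AND userid = 1) OR userid = 3. This sketch compares both versions on six hypothetical orders with Python’s sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
# Hypothetical sample orders: (order_id, userid, total)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 250.0), (2, 1, 80.0), (3, 2, 120.0),
    (4, 3, 85.0), (5, 3, 310.0), (6, 1, 200.0),
])

with_parens = [r[0] for r in conn.execute(
    "SELECT * FROM orders WHERE total < 200 AND (userid = 1 OR userid = 3) ORDER BY order_id;")]
# AND binds tighter than OR, so order 5 (userid 3, total 310) slips in here.
without_parens = [r[0] for r in conn.execute(
    "SELECT * FROM orders WHERE total < 200 AND userid = 1 OR userid = 3 ORDER BY order_id;")]
print(with_parens)     # [2, 4]
print(without_parens)  # [2, 4, 5]
```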
As part of our SQL for beginners tutorial, let’s take a look at an example. We may be asked, “What is
the average value of an order?” We can answer this easily in SQL using our sample database by
writing the following code:
This returns:
AVG(total)
185.6
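This sketch runs SELECT AVG(total) FROM orders; with Python’s sqlite3 module. The six order totals are hypothetical, so the result differs from the 185.6 returned by the tutorial’s own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
# Hypothetical sample orders: (order_id, userid, total)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 250.0), (2, 1, 80.0), (3, 2, 120.0),
    (4, 3, 85.0), (5, 3, 310.0), (6, 1, 200.0),
])

# AVG adds up the column and divides by the number of non-NULL values.
avg_total = conn.execute("SELECT AVG(total) FROM orders;").fetchone()[0]
print(round(avg_total, 2))  # 174.17 for these hypothetical rows
```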
The SELECT statement lists out columns and aggregate functions applied to columns.
The GROUP BY statement identifies which column to group by. It’s helpful to have this column in
the SELECT statement.
Let’s try this with an example. Say we wanted to know the total value and the number of orders for
each client. We could write:
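This sketch runs the GROUP BY query on six hypothetical orders with Python’s sqlite3 module. Because no alias is given, the cursor reports the column names exactly as the expressions were written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
# Hypothetical sample orders: (order_id, userid, total)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 250.0), (2, 1, 80.0), (3, 2, 120.0),
    (4, 3, 85.0), (5, 3, 310.0), (6, 1, 200.0),
])

# One output row per userid, with the aggregates computed within each group.
cur = conn.execute(
    "SELECT userid, SUM(total), COUNT(total) FROM orders GROUP BY userid ORDER BY userid;")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['userid', 'SUM(total)', 'COUNT(total)']
print(rows)     # [(1, 530.0, 3), (2, 120.0, 1), (3, 395.0, 2)]
```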
In the example above, we can see that the column names of SUM(total) and COUNT(total) are
accurate, but not easy to understand. We may want to change the column names to
“Total_Sale_Value” and “Total_Number_of_Sales”.
In SQL, this is done with what is known as an alias. Let’s see how this is accomplished:
The “as” is optional, but makes the code easier to read. The same would be accomplished using:
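This sketch runs both alias forms against a hypothetical clients table with Python’s sqlite3 module. One column is renamed with AS and the other by placing the alias right after the column. Both forms rename the returned column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)")
conn.execute("INSERT INTO clients VALUES (1, 'John', 'Smith')")  # hypothetical row

# "fname AS first_name" and "lname last_name" are equivalent alias forms.
cur = conn.execute("SELECT fname AS first_name, lname last_name FROM clients;")
columns = [d[0] for d in cur.description]
print(columns)  # ['first_name', 'last_name']
```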
If we wanted to apply this to our query from the Aggregating Data example from earlier in our SQL
for beginners tutorial, we could write:
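Applying aliases to the aggregate query could look like the following sketch. It uses Python’s sqlite3 module and six hypothetical orders. Only the column names change, not the values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL)")
# Hypothetical sample orders: (order_id, userid, total)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 250.0), (2, 1, 80.0), (3, 2, 120.0),
    (4, 3, 85.0), (5, 3, 310.0), (6, 1, 200.0),
])

cur = conn.execute("""
    SELECT userid,
           SUM(total)   AS Total_Sale_Value,
           COUNT(total) AS Total_Number_of_Sales
    FROM orders
    GROUP BY userid
    ORDER BY userid;
""")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['userid', 'Total_Sale_Value', 'Total_Number_of_Sales']
```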
Let’s take a quick look at the database we created for this SQL tutorial for beginners:
The diagram above shows that client_id in the clients table has a one-to-many relationship with the
userid field in the orders table. Practically, this means that a single client can have multiple orders.
In database terms, this means that userid is a foreign key referencing the client_id field. Because this
relationship exists, we know that we can join these two tables.
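A foreign key can also be declared so that the database enforces it. This is a minimal sketch with hypothetical tables and Python’s sqlite3 module. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been run on the connection, and once it has, an order pointing at a non-existent client is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off unless enabled
conn.executescript("""
CREATE TABLE clients (client_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    userid   INTEGER REFERENCES clients (client_id),  -- foreign key to clients
    total    REAL
);
INSERT INTO clients VALUES (1, 'John', 'Smith');  -- hypothetical row
INSERT INTO orders VALUES (1, 1, 250.0);          -- client 1 exists: accepted
""")

try:
    conn.execute("INSERT INTO orders VALUES (2, 99, 10.0)")  # there is no client 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```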
There are a number of different types of joins. Let’s take a look at these now.
The different types of joins available in SQL
Inner Join
An inner join returns only the rows that have matching values in both tables. Take our two tables as
an example: if we created a new client who did not yet have any orders, that client would not show
up, as they would not be represented in the orders table.
In the SELECT statement, we include all the fields we want to bring in, from both tables. We prefix
the column names with the table name as best practice, in case there is an overlap between
column names.
If you wanted to return all columns from one table, you could write table_name.*
The FROM statement is followed by an INNER JOIN statement that identifies the table to join.
The ON statement identifies which fields to merge on. This identifies the two fields in each table
that have a foreign key relationship.
Let’s demonstrate this with an example. Say we wanted to join the first and last names of clients
onto the orders table. To demonstrate this better, let’s create a customer in the clients table, but no
orders for that customer.
Now, let’s do an inner join of the two tables. If this runs correctly, we should not see our new client
returned in the table.
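The following sketch runs the whole example with Python’s sqlite3 module. It adds a hypothetical client 4 with no orders, then performs the inner join. Client 4 does not appear in the results. All names and order rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (client_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL);
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO clients VALUES (?, ?, ?)",
                 [(1, "John", "Smith"), (2, "Jane", "Doe"), (3, "Sam", "Lee")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 250.0), (2, 1, 80.0), (3, 2, 120.0),
                  (4, 3, 85.0), (5, 3, 310.0), (6, 1, 200.0)])

# The new client: present in clients, but with no orders.
conn.execute("INSERT INTO clients (client_id, fname, lname) VALUES (4, 'Alex', 'Kim');")

rows = conn.execute("""
    SELECT orders.order_id, clients.client_id, clients.fname, clients.lname, orders.total
    FROM orders
    INNER JOIN clients ON orders.userid = clients.client_id
    ORDER BY orders.order_id;
""").fetchall()

joined_client_ids = {row[1] for row in rows}
print(len(rows), sorted(joined_client_ids))  # 6 [1, 2, 3]
```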
Outer Joins
There are three types of outer joins: left join, right join, and full (outer) join.
Left Join
A left join includes all the records from the table on the “left” and only matching records from the
table on the right. If you’re familiar with VLOOKUP in Excel, you can think of a left join as being a
VLOOKUP from one table to another.
The format is quite similar to an inner join. Let’s explore this in more detail:
In the SELECT statement, we list out all the fields we want to bring in. We prefix the column names
with the table name.
If you wanted to return all columns from one table, you could write table_name.*
The FROM statement is followed by a LEFT JOIN statement that identifies the table to join.
Let’s now write a statement that merges order data into the clients table. We would expect any client
that does not yet have any orders to still appear in the returned data, but without any data in the
columns relating to orders.
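This sketch performs the left join with Python’s sqlite3 module on hypothetical data. Client 4 has no orders, so it still appears, with None (SQL NULL) in the order columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (client_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL);
""")
# Hypothetical sample data; client 4 has no orders.
conn.executemany("INSERT INTO clients VALUES (?, ?, ?)",
                 [(1, "John", "Smith"), (2, "Jane", "Doe"),
                  (3, "Sam", "Lee"), (4, "Alex", "Kim")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 250.0), (2, 1, 80.0), (3, 2, 120.0),
                  (4, 3, 85.0), (5, 3, 310.0), (6, 1, 200.0)])

# Every client (left table) is kept; order columns are NULL where nothing matches.
rows = conn.execute("""
    SELECT clients.client_id, clients.fname, orders.order_id, orders.total
    FROM clients
    LEFT JOIN orders ON clients.client_id = orders.userid
    ORDER BY clients.client_id, orders.order_id;
""").fetchall()
print(rows[-1])  # (4, 'Alex', None, None)
```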
Note here that client_id 4 appears in the results, but has no value (NULL) in the total column. This is
because that client hasn’t placed any orders (and therefore doesn’t appear in the right table), but
does exist in the left table.
Right Join
A right join includes all the records from the table on the “right” and only matching records from the
table on the left. Note that SQLite only added support for RIGHT JOIN in version 3.39.0 (2022); earlier
versions don’t support it, although other implementations of SQL do. As this is a SQL for beginners
tutorial, we won’t cover other implementations here.
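A right join is a left join with the tables swapped, so a common workaround on older SQLite versions is to reverse the table order. This is a minimal sketch with hypothetical data and Python’s sqlite3 module. When the bundled SQLite is 3.39.0 or newer, it also runs a native RIGHT JOIN and checks that both versions return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (client_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL);
INSERT INTO clients VALUES (1, 'John', 'Smith'), (2, 'Jane', 'Doe'), (3, 'Alex', 'Kim');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 2, 120.0), (3, 1, 80.0);
""")  # hypothetical rows; client 3 has no orders

# "orders RIGHT JOIN clients" keeps every client, just like "clients LEFT JOIN orders".
emulated = conn.execute("""
    SELECT clients.client_id, orders.order_id
    FROM clients
    LEFT JOIN orders ON orders.userid = clients.client_id
""").fetchall()

if sqlite3.sqlite_version_info >= (3, 39, 0):  # native RIGHT JOIN is available
    native = conn.execute("""
        SELECT clients.client_id, orders.order_id
        FROM orders
        RIGHT JOIN clients ON orders.userid = clients.client_id
    """).fetchall()
    assert set(native) == set(emulated)

print(sorted(emulated, key=lambda r: (r[0], r[1] or 0)))
```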
Full Outer Join
A full outer join returns records from both tables, whether or not they have a match in the other
table. As with right joins, SQLite only supports FULL OUTER JOIN from version 3.39.0 onward, but other implementations of SQL support it too.
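On SQLite versions without FULL OUTER JOIN, a common workaround is to take a LEFT JOIN and add the unmatched rows from the other table with UNION ALL. This sketch uses hypothetical data and Python’s sqlite3 module, including an order whose userid (99) matches no client. It compares the result against the native join when the SQLite version supports it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (client_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, userid INTEGER, total REAL);
INSERT INTO clients VALUES (1, 'John', 'Smith'), (2, 'Jane', 'Doe'), (3, 'Alex', 'Kim');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 2, 120.0), (3, 1, 80.0), (4, 99, 15.0);
""")  # hypothetical rows: client 3 has no orders, order 4 has no matching client

full = conn.execute("""
    SELECT clients.client_id, orders.order_id
    FROM clients LEFT JOIN orders ON orders.userid = clients.client_id
    UNION ALL
    -- add the orders that matched no client in the first query
    SELECT clients.client_id, orders.order_id
    FROM orders LEFT JOIN clients ON orders.userid = clients.client_id
    WHERE clients.client_id IS NULL
""").fetchall()

if sqlite3.sqlite_version_info >= (3, 39, 0):  # native FULL OUTER JOIN is available
    native = conn.execute("""
        SELECT clients.client_id, orders.order_id
        FROM clients FULL OUTER JOIN orders ON orders.userid = clients.client_id
    """).fetchall()
    assert set(native) == set(full)

print(len(full))  # 5
```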