Top 33 SQL Interview Questions For Data Analysts (
Introduction
Data Analyst SQL interview questions aim to quickly evaluate your ability to extract metrics and work with
data using SQL. You’ll usually need to write SQL queries on a whiteboard or in a code editor.
SQL questions cover a range of topics, from asking when to use a GROUP BY statement to challenging you
to create a query that supports or disproves a product metrics hypothesis.
The goal remains simple, regardless of the interview format: produce clean SQL code as fast as possible.
Data analyst SQL questions fall into three categories:
Easy SQL questions - These questions focus on defining SQL features, basic use cases, and
differentiating between commands like WHERE and HAVING. They may also include simple queries.
Intermediate SQL questions - These questions require you to write complex queries using joins,
subqueries, self-joins, and window functions. They may also include analytics case studies.
Hard SQL questions - These questions challenge you to write advanced queries, including the use of
indices and complex SQL clauses. They may also include more advanced analytics case studies.
How Is SQL Tested in Data Analyst Interviews?
SQL technical screens are a part of nearly every data analyst interview. In these screens, candidates are
asked to answer real-world problems using SQL.
Most commonly, candidates are provided a dataset and asked to write a SQL query to return the desired
data.
How do companies test proficiency? There are three main types of SQL interview questions:
1. Whiteboarding - SQL whiteboard tests are a common part of interviews. In a whiteboard test, you’re
required to write SQL queries by hand, which allows companies to assess your understanding of SQL
concepts and problem-solving ability.
2. Coding tests - Many companies ask you to write code and run queries in live interviews. With live coding
screens, you can check for syntax errors while you work, and it provides companies a way to see your
coding efficiency.
3. SQL case studies - In case interviews, you’re given a real-world problem and asked to use your SQL skills
to solve the problem. These are typically open-ended questions that leave room for analysis and creative
problem-solving.
Easy SQL Questions for Data Analysts
Data analyst interviews and technical screens generally start with beginner SQL questions. There are two
main types of easy SQL questions:
Basic SQL Queries - You will often be asked basic SQL interview questions that require you to write a
query. You might be asked to get the COUNT of a table, make a simple join, or use the HAVING clause.
Definitions - Less frequently encountered, these questions ask you to explain technical concepts,
compare two or more SQL functions, or define how a concept is used.
If you select a number of rows with a WHERE clause and then try to display an aggregate value alongside
them, most databases return an error. This is because SQL cannot display, in a single result table, the
individual row values returned by the WHERE query next to one aggregate value computed over those rows.
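As a minimal sketch (assuming a hypothetical employees table and an in-memory database from Python's built-in sqlite3 module), GROUP BY is the usual way to show discrete values next to aggregates: each group gets its own aggregate. SQLite is unusually permissive about bare columns next to aggregates, so it does not reproduce the error itself; the sketch only shows the working pattern.

```python
import sqlite3

# Hypothetical employees table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, department TEXT, salary INTEGER);
INSERT INTO employees (department, salary) VALUES
  ('Engineering', 120000), ('Engineering', 95000),
  ('Sales', 70000), ('Sales', 65000), ('Marketing', 80000);
""")

# WHERE filters rows first; GROUP BY then collapses them into one row per
# department, so each department can be shown next to its own aggregates.
rows = conn.execute("""
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 60000
GROUP BY department
ORDER BY department
""").fetchall()
print(rows)
# [('Engineering', 2, 107500.0), ('Marketing', 1, 80000.0), ('Sales', 2, 67500.0)]
```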
2. What are the most common aggregate functions in SQL? What do they do?
An aggregate function performs a calculation on a set of values and returns a single value summarizing the
set. The three most common aggregate functions in SQL are COUNT, SUM, and AVG.
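A small illustration, assuming a hypothetical orders table in an in-memory SQLite database: each aggregate collapses the whole set of rows into one summary value. It also shows a detail worth knowing in interviews: COUNT(*) counts rows, while COUNT(column), SUM, and AVG skip NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
INSERT INTO orders (amount) VALUES (10), (20), (30), (NULL);
""")

# One row of summary values for the whole table.
count_rows, count_amounts, total, average = conn.execute("""
SELECT COUNT(*), COUNT(amount), SUM(amount), AVG(amount) FROM orders
""").fetchone()
print(count_rows, count_amounts, total, average)  # 4 3 60 20.0
```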
In SQL, a unique key is one or more columns or fields that uniquely identify a record in a table. A table can
have multiple unique keys but only one primary key, which is one difference between the two. A unique key
column cannot contain duplicate values. NULL handling varies by database: SQL Server accepts only one NULL in a
unique column, while PostgreSQL, MySQL, and SQLite accept multiple NULLs by default.
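A minimal sketch of a UNIQUE constraint in SQLite, using a hypothetical users table with two separate unique keys. Duplicate non-NULL values are rejected, while SQLite (unlike SQL Server) accepts several NULLs in the same unique column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one primary key and two separate UNIQUE keys.
conn.execute("""
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  email TEXT UNIQUE,
  username TEXT UNIQUE
)
""")
conn.execute("INSERT INTO users (email, username) VALUES ('a@x.com', 'ann')")

def try_insert(email, username):
    """Return True if the row is accepted, False if a UNIQUE key rejects it."""
    try:
        conn.execute(
            "INSERT INTO users (email, username) VALUES (?, ?)", (email, username)
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("a@x.com", "bob"),  # rejected: duplicate email
    try_insert(None, "cat"),       # accepted
    try_insert(None, "dan"),       # accepted: SQLite allows several NULLs
]
print(results)  # [False, True, True]
```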
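UNION and UNION ALL are SQL operators used to concatenate two or more result sets. This allows us to
write multiple SELECT statements, retrieve the desired results, then combine them into a final,
unified set. The difference is that UNION removes duplicate rows from the combined result, while UNION ALL
keeps every row, which also makes UNION ALL typically faster because no deduplication step is needed.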
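The difference is easiest to see side by side. This sketch assumes two hypothetical yearly customer tables that share one name, run in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_2022 (name TEXT);
CREATE TABLE customers_2023 (name TEXT);
INSERT INTO customers_2022 VALUES ('Ana'), ('Ben');
INSERT INTO customers_2023 VALUES ('Ben'), ('Cy');
""")

# UNION removes duplicate rows from the combined result.
union_rows = conn.execute("""
SELECT name FROM customers_2022
UNION
SELECT name FROM customers_2023
ORDER BY name
""").fetchall()

# UNION ALL keeps every row, duplicates included.
union_all_rows = conn.execute("""
SELECT name FROM customers_2022
UNION ALL
SELECT name FROM customers_2023
ORDER BY name
""").fetchall()

print(union_rows)      # [('Ana',), ('Ben',), ('Cy',)]
print(union_all_rows)  # [('Ana',), ('Ben',), ('Ben',), ('Cy',)]
```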
LEFT JOIN and RIGHT JOIN are both outer joins. The main difference between them is which table's
unmatched rows they keep.
A LEFT JOIN returns all rows from the left table and the matched rows from the right; where there is no
match, the right table's columns are NULL. A RIGHT JOIN returns all rows from the right table and the matched
rows from the left.
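A minimal sketch with hypothetical employees and departments tables. SQLite only added RIGHT JOIN in version 3.39, so the sketch produces the RIGHT JOIN result the portable way: by swapping the table order in a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, department_id INTEGER);
CREATE TABLE departments (id INTEGER, name TEXT);
INSERT INTO employees VALUES (1, 'Ana', 10), (2, 'Ben', NULL);
INSERT INTO departments VALUES (10, 'Sales'), (20, 'Legal');
""")

# LEFT JOIN: every employee appears; Ben has no match, so his department is NULL.
left = conn.execute("""
SELECT e.name, d.name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
ORDER BY e.id
""").fetchall()

# Equivalent of "employees RIGHT JOIN departments": every department appears,
# and Legal, which has no employees, gets a NULL employee name.
right = conn.execute("""
SELECT e.name, d.name
FROM departments d
LEFT JOIN employees e ON e.department_id = d.id
ORDER BY d.id
""").fetchall()

print(left)   # [('Ana', 'Sales'), ('Ben', None)]
print(right)  # [('Ana', 'Sales'), (None, 'Legal')]
```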
Tables contain data, and they are made up of columns and rows. A view is a virtual table defined by a stored
query: it generally does not store data itself and instead displays data from the underlying tables.
One use case for a view is looking at a subset of data from a table. You could create the view with
CREATE VIEW ... AS SELECT, using the SELECT statement to define which data it shows.
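A small sketch of a view over a subset of a hypothetical employees table, run in SQLite. Because the view stores only its query, rows added to the base table show up in the view automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT);
INSERT INTO employees (name, department) VALUES
  ('Ana', 'Sales'), ('Ben', 'Engineering'), ('Cy', 'Sales');

-- The view stores only the query, not a copy of the rows.
CREATE VIEW sales_team AS
SELECT id, name FROM employees WHERE department = 'Sales';
""")

before = conn.execute("SELECT name FROM sales_team ORDER BY id").fetchall()

# New rows in the base table are visible through the view immediately.
conn.execute("INSERT INTO employees (name, department) VALUES ('Di', 'Sales')")
after = conn.execute("SELECT name FROM sales_team ORDER BY id").fetchall()

print(before)  # [('Ana',), ('Cy',)]
print(after)   # [('Ana',), ('Cy',), ('Di',)]
```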
The LIKE operator is used to search for a specific pattern within a column. It is used in a WHERE clause to
filter rows whose values match the pattern.
WHERE EmployeeName LIKE '%r' - Finds values that end with "r"
WHERE EmployeeName LIKE '%gh%' - Finds values that include "gh" in any position
WHERE EmployeeName LIKE '_ch%' - Finds values with "ch" in the second and third positions
WHERE EmployeeName LIKE 'g%r' - Finds values that start with "g" and end with "r"
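The four patterns can be checked against a few sample names. This sketch assumes a hypothetical employees table with an EmployeeName column in SQLite: % matches any run of characters and _ matches exactly one. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which is why 'gh' also matches "Ghost"; other engines, such as PostgreSQL, are case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (EmployeeName TEXT);
INSERT INTO employees VALUES ('Peter'), ('Hugh'), ('Achim'), ('Gunnar'), ('Ghost');
""")

def matches(pattern):
    """Return the names that match a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT EmployeeName FROM employees WHERE EmployeeName LIKE ? ORDER BY EmployeeName",
        (pattern,),
    ).fetchall()
    return [name for (name,) in rows]

print(matches('%r'))    # ['Gunnar', 'Peter']
print(matches('%gh%'))  # ['Ghost', 'Hugh']
print(matches('_ch%'))  # ['Achim']
print(matches('g%r'))   # ['Gunnar']
```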
You update an existing table with the UPDATE command in SQL. It is used with SET (which includes the
updated information) and WHERE (which selects the specific rows to update).
Example: In the table ‘Employees’, you want to change the emergency contact, ContactName, for an
employee with EmployeeID 3.
UPDATE Employees
SET ContactName = 'Bob Smith'
WHERE EmployeeID = 3;
9. Which operator is used to select values within a range? What types of values can be
selected?
The BETWEEN operator is used to select values within a range. You can use numbers, text, or dates with
BETWEEN.
One important thing to note: the BETWEEN operator is inclusive, so it includes both the start and end values.
SELECT EmployeeID
FROM Employees
WHERE EmployeeID BETWEEN 378 AND 492
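A quick check of the inclusive bounds, using a hypothetical Employees table in SQLite with IDs just inside and just outside the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?)", [(377,), (378,), (450,), (492,), (493,)]
)

# BETWEEN 378 AND 492 behaves like EmployeeID >= 378 AND EmployeeID <= 492.
ids = [row[0] for row in conn.execute(
    "SELECT EmployeeID FROM Employees WHERE EmployeeID BETWEEN 378 AND 492 ORDER BY EmployeeID"
)]
print(ids)  # [378, 450, 492]
```

With DATETIME columns, an end bound such as '2020-01-31' only reaches midnight of that day, so rows later on January 31 are missed; a half-open range (>= start AND < next day) is often safer.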
10. Write a query that outputs a random manufacturer’s name
Given a table of cars with columns id and make, write a query that outputs a random manufacturer’s name
with an equal probability of selecting any name.
Input:
cars table
id make
1 Ford
2 Toyota
3 Toyota
4 Honda
5 Honda
6 Honda
Output:
Column Type
make Text
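A minimal sketch of one approach, using the cars table from the example in SQLite. The key step is deduplicating the makes first: sampling raw rows would pick Honda three times as often as Ford. RANDOM() is the SQLite and PostgreSQL spelling; MySQL uses RAND().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars (id INTEGER PRIMARY KEY, make TEXT);
INSERT INTO cars (make) VALUES
  ('Ford'), ('Toyota'), ('Toyota'), ('Honda'), ('Honda'), ('Honda');
""")

# DISTINCT gives each make exactly one row, then ORDER BY RANDOM()
# shuffles those rows and LIMIT 1 keeps one, so every make is equally likely.
QUERY = """
SELECT make
FROM (SELECT DISTINCT make FROM cars)
ORDER BY RANDOM()
LIMIT 1
"""

def random_make():
    return conn.execute(QUERY).fetchone()[0]

print(random_make())  # one of 'Ford', 'Toyota', 'Honda'
```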
Intermediate SQL Questions for Data Analysts
Complex SQL queries - Intermediate SQL questions ask you to perform joins, subqueries, self-joins, and
window functions.
SQL/Analytics case studies - Many intermediate questions take the form of case studies or ask you to
perform analysis on the data you pull from a query.
Given tables employees, employee_projects, and projects, find the 3 lowest-paid employees that
have completed at least 2 projects.
Note: incomplete projects will have an end date of NULL in the projects table.
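A minimal sketch of one solution in SQLite. The column names (first_name, salary, end_date) are assumptions modeled on the schemas used elsewhere in this article. Completed projects are those with a non-NULL end_date; COUNT(DISTINCT p.id) guards against duplicate assignment rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data.
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT, salary INTEGER);
CREATE TABLE projects (id INTEGER PRIMARY KEY, end_date TEXT);
CREATE TABLE employee_projects (employee_id INTEGER, project_id INTEGER);
INSERT INTO employees VALUES
  (1, 'Ana', 50000), (2, 'Ben', 40000), (3, 'Cy', 60000), (4, 'Di', 45000), (5, 'Ed', 30000);
INSERT INTO projects VALUES (1, '2020-01-01'), (2, '2020-06-01'), (3, NULL);
INSERT INTO employee_projects VALUES
  (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (4, 1), (4, 2), (5, 1), (5, 3);
""")

# Keep completed projects, require at least two per employee,
# then take the three smallest salaries.
rows = conn.execute("""
SELECT e.id, e.first_name, e.salary
FROM employees e
JOIN employee_projects ep ON ep.employee_id = e.id
JOIN projects p ON p.id = ep.project_id
WHERE p.end_date IS NOT NULL
GROUP BY e.id, e.first_name, e.salary
HAVING COUNT(DISTINCT p.id) >= 2
ORDER BY e.salary
LIMIT 3
""").fetchall()
print(rows)  # [(2, 'Ben', 40000), (4, 'Di', 45000), (1, 'Ana', 50000)]
```

Ed is the lowest-paid employee but is excluded, because one of his two projects is still open.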
12. Given the tables users and rides, write a query to report the distance traveled by
each user in descending order.
For this question, you need to accomplish two things: the first is to figure out the total distance traveled by
each user_id, and the second is to sort the users by that calculated distance, from greatest to least.
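A minimal sketch of both steps in SQLite. The schema is an assumption, since the question does not list columns: users(id, name) and rides(id, passenger_user_id, distance). The LEFT JOIN keeps users with no rides, and COALESCE reports them as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rides (id INTEGER PRIMARY KEY, passenger_user_id INTEGER, distance REAL);
INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO rides VALUES (1, 1, 5.0), (2, 1, 7.5), (3, 2, 20.0);
""")

# Step 1: total distance per user. Step 2: sort from greatest to least.
rows = conn.execute("""
SELECT u.name, COALESCE(SUM(r.distance), 0) AS distance_traveled
FROM users u
LEFT JOIN rides r ON r.passenger_user_id = u.id
GROUP BY u.id, u.name
ORDER BY distance_traveled DESC
""").fetchall()
print(rows)  # [('Ben', 20.0), ('Ana', 12.5), ('Cy', 0)]
```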
13. Write a query to find all the users that are currently “Excited” and have never been
“Bored” within a campaign.
For this medium SQL problem, assume you work at an advertising firm. You have a table of users’
impressions of ad campaigns over time. Each user_id from these campaigns has an attached impression_id,
categorized as either “Excited” or “Bored”. You will need to assess which users are “Excited” by their most
recent campaign and have never been “Bored” in any past campaign.
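One way to express both conditions, sketched in SQLite. The table and column names (ad_impressions, campaign_id, impression, dt) are assumptions. ROW_NUMBER() picks each user's most recent impression, and NOT EXISTS removes anyone who was ever "Bored". Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per user impression of a campaign.
conn.executescript("""
CREATE TABLE ad_impressions (user_id INTEGER, campaign_id INTEGER, impression TEXT, dt TEXT);
INSERT INTO ad_impressions VALUES
  (1, 1, 'Excited', '2020-01-01'), (1, 2, 'Excited', '2020-02-01'),
  (2, 1, 'Bored',   '2020-01-01'), (2, 2, 'Excited', '2020-02-01'),
  (3, 1, 'Excited', '2020-01-01'), (3, 2, 'Bored',   '2020-02-01'),
  (4, 2, 'Excited', '2020-02-01');
""")

# rn = 1 is each user's newest impression; NOT EXISTS drops anyone ever "Bored".
rows = conn.execute("""
SELECT latest.user_id
FROM (
  SELECT user_id, impression,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY dt DESC) AS rn
  FROM ad_impressions
) AS latest
WHERE latest.rn = 1
  AND latest.impression = 'Excited'
  AND NOT EXISTS (
    SELECT 1 FROM ad_impressions b
    WHERE b.user_id = latest.user_id AND b.impression = 'Bored'
  )
ORDER BY latest.user_id
""").fetchall()
excited_users = [user_id for (user_id,) in rows]
print(excited_users)  # [1, 4]
```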
Note: This is the type of question that might get asked for a marketing analyst job.
14. Write a SQL query to select the second-highest salary in the engineering department.
To answer this question, you need the name of the department to be associated with each employee in the
employees table to understand which department each employee is a part of.
The “department_id” field in the employees table is associated with the “id” field in the departments
table. You can call the “department_id” a foreign key because it is a column that references the primary key
of another table, which in this case is the “id” field in the departments table.
Based on this shared field, you can join both tables using INNER JOIN to associate the department name
with each employee.
SELECT salary
FROM employees
INNER JOIN departments
ON employees.department_id = departments.id
With the department name in place, you can now look at the employees of the Engineering team and sort by
salary to find the second-highest paid.
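Completing the partial join above, a minimal SQLite sketch. It assumes the department is stored as 'engineering' in departments.name. DISTINCT matters here: if two engineers tie for the top salary, skipping one row without DISTINCT would return the top salary again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER, department_id INTEGER);
INSERT INTO departments VALUES (1, 'engineering'), (2, 'sales');
INSERT INTO employees (salary, department_id) VALUES
  (150000, 1), (150000, 1), (120000, 1), (100000, 1), (200000, 2);
""")

# Sort the distinct engineering salaries from high to low and skip the first.
row = conn.execute("""
SELECT DISTINCT e.salary
FROM employees e
INNER JOIN departments d ON e.department_id = d.id
WHERE d.name = 'engineering'
ORDER BY e.salary DESC
LIMIT 1 OFFSET 1
""").fetchone()
second_highest = row[0] if row else None  # None if there is no second salary
print(second_highest)  # 120000
```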
15. Given a table of bank transactions, write a query to get the last transaction for each
day.
More Context: The table includes the columns: id, transaction_value and created_at (representing the time
for each transaction).
Since our goal in this problem is to pull the last transaction from each day, you want to group the
transactions by the day they occurred and create a chronological order within each day from which you can
retrieve the latest transaction.
To accomplish the task of grouping and ordering, create a modified version of the bank_transactions table with
an added column denoting the chronological ordering of transactions within each day.
To partition by date, you can use a window function with an OVER (PARTITION BY ...) clause. After partitioning,
order by created_at in descending order so that the first entry in each partition is the last transaction
chronologically. Here is how that query can be written:
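SELECT *,
  ROW_NUMBER() OVER (
    PARTITION BY DATE(created_at)
    ORDER BY created_at DESC
  ) AS ordered_time
FROM bank_transactions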
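A complete, runnable version of this approach, sketched in SQLite with sample rows (ROW_NUMBER() is one reasonable choice of window function here). Filtering on ordered_time = 1 keeps exactly the last transaction of each calendar day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bank_transactions (id INTEGER PRIMARY KEY, transaction_value INTEGER, created_at TEXT);
INSERT INTO bank_transactions VALUES
  (1, 10, '2020-01-01 09:00:00'), (2, 20, '2020-01-01 18:30:00'),
  (3,  5, '2020-01-02 08:00:00'), (4,  7, '2020-01-02 23:59:00'),
  (5,  1, '2020-01-03 12:00:00');
""")

# Each day's partition is sorted newest first, so ordered_time = 1 is the
# latest transaction of that day.
rows = conn.execute("""
WITH ordered AS (
  SELECT id, created_at, transaction_value,
         ROW_NUMBER() OVER (
           PARTITION BY DATE(created_at)
           ORDER BY created_at DESC
         ) AS ordered_time
  FROM bank_transactions
)
SELECT id, created_at, transaction_value
FROM ordered
WHERE ordered_time = 1
ORDER BY created_at
""").fetchall()
print([r[0] for r in rows])  # [2, 4, 5]
```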
16. Write a query to debug an error and select the top five most expensive projects by
budget-to-employee ratio.
More context: You are given two tables: a projects table and another that maps employees to their
projects, called employee_projects. In this question, however, a bug exists that is causing duplicate rows
in the employee_projects table.
Example:
Input:
projects table
Column Type
id INTEGER
title VARCHAR
start_date DATETIME
end_date DATETIME
budget INTEGER
employee_projects table
Column Type
project_id INTEGER
employee_id INTEGER
Output:
Column Type
title VARCHAR
budget_per_employee FLOAT
This is a good example of a logic-based SQL problem. Although there are a few steps to the solution, the
actual SQL queries are fairly simple.
HINT: One way to do the debugging is to simply group by both project_id and employee_id. By
grouping by both columns, you create a table that contains each distinct (project_id, employee_id) pair
once, thereby excluding any duplicates.
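A minimal SQLite sketch of the full solution, using the column names from the example tables (start and end dates omitted because they are not needed). The deduplicated pairs feed the per-project employee count. The inner join also skips projects with no employees, which avoids dividing by zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT, budget INTEGER);
CREATE TABLE employee_projects (project_id INTEGER, employee_id INTEGER);
INSERT INTO projects VALUES
  (1, 'A', 1000), (2, 'B', 900), (3, 'C', 800),
  (4, 'D', 600), (5, 'E', 500), (6, 'F', 50);
-- (1, 1) appears twice: the duplicate-row bug from the question.
INSERT INTO employee_projects VALUES
  (1, 1), (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (4, 3),
  (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (6, 1);
""")

rows = conn.execute("""
WITH dedup AS (
  -- Grouping by both columns leaves one row per (project, employee) pair.
  SELECT project_id, employee_id
  FROM employee_projects
  GROUP BY project_id, employee_id
)
SELECT p.title,
       1.0 * p.budget / COUNT(d.employee_id) AS budget_per_employee
FROM projects p
JOIN dedup d ON d.project_id = p.id
GROUP BY p.id, p.title, p.budget
ORDER BY budget_per_employee DESC
LIMIT 5
""").fetchall()
print(rows)
# [('B', 900.0), ('D', 600.0), ('A', 500.0), ('C', 400.0), ('E', 100.0)]
```

Without the dedup step, project A would count three employees and fall to 333.3, changing the ranking.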
17. You have a table that represents the total number of messages sent between two users
by date on Facebook Messenger. Answer these questions:
What are some insights that could be derived from this table?
What do you think the distribution of the number of conversations created by each user per day looks
like?
Write a query to get the distribution of the number of conversations created by each user by day in 2020.
This question tests your data sense, as well as your SQL writing skills. It has also appeared in Facebook data
analyst interviews.
To answer the first part of the question regarding insights, there are a number of metrics you could evaluate.
You can find the total number of messages sent per day, the number of conversations being started, or the
average number of messages per conversation. All of these metrics measure users’ level of engagement
and connectivity.
A full solution to parts one through three is available in a YouTube video.
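For the third part, here is a minimal SQLite sketch under stated assumptions: a hypothetical messages table with columns user1, user2, sent_date, and msg_count, where user1 is treated as the user who started the conversation. Conversations per user per day are counted as distinct partners; the distribution then counts how many (user, day) pairs have each conversation count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: user1 sent msg_count messages to user2 on sent_date.
conn.executescript("""
CREATE TABLE messages (user1 INTEGER, user2 INTEGER, sent_date TEXT, msg_count INTEGER);
INSERT INTO messages VALUES
  (1, 2, '2020-01-01', 5), (1, 3, '2020-01-01', 2), (2, 3, '2020-01-01', 1),
  (1, 2, '2020-01-02', 4), (3, 1, '2019-12-31', 9);
""")

rows = conn.execute("""
WITH per_user_day AS (
  SELECT user1, sent_date, COUNT(DISTINCT user2) AS conversations
  FROM messages
  WHERE sent_date >= '2020-01-01' AND sent_date < '2021-01-01'
  GROUP BY user1, sent_date
)
SELECT conversations, COUNT(*) AS frequency
FROM per_user_day
GROUP BY conversations
ORDER BY conversations
""").fetchall()
print(rows)  # [(1, 2), (2, 1)]
```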
18. Write a SQL query to create a histogram of the number of comments per user in the
month of January 2020.
This intermediate SQL question has been asked in Amazon data analyst interviews. Here is a partial answer
from Interview Query:
What does a histogram represent, and what kind of story does it tell? In this case, you are interested in using
a histogram to represent the distribution of comments each user has made in January 2020. A histogram
with bin buckets of size one means that you can avoid the logical overhead of grouping frequencies into
specific intervals.
For example, if you want a histogram with bins of size five, you would have to run a SELECT statement like so:
SELECT
CASE WHEN frequency BETWEEN 0 AND 5 THEN 5
WHEN frequency BETWEEN 6 AND 10 THEN 10
-- ...and so on for higher bins
END AS bin
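With bins of size one, each bin is simply the exact comment count. A minimal SQLite sketch, assuming hypothetical tables users(id) and comments(user_id, created_at). Putting the January filter in the ON clause (not WHERE) keeps users with zero January comments in the 0 bin; whether to include them is a design choice to confirm with the interviewer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE comments (user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES (1), (2), (3), (4);
INSERT INTO comments VALUES
  (1, '2020-01-05 10:00:00'), (1, '2020-01-20 08:00:00'), (1, '2020-02-01 09:00:00'),
  (2, '2020-01-31 23:00:00'),
  (4, '2019-12-31 12:00:00');
""")

rows = conn.execute("""
WITH jan_counts AS (
  SELECT u.id, COUNT(c.user_id) AS comment_count
  FROM users u
  LEFT JOIN comments c
    ON c.user_id = u.id
   AND c.created_at >= '2020-01-01' AND c.created_at < '2020-02-01'
  GROUP BY u.id
)
SELECT comment_count, COUNT(*) AS frequency
FROM jan_counts
GROUP BY comment_count
ORDER BY comment_count
""").fetchall()
print(rows)  # [(0, 2), (1, 1), (2, 1)]
```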
19. Select the largest three departments with ten or more employees and rank them
according to the percentage of employees making over 100,000 dollars.
In this problem, you are given two tables: An employees table and a departments table.
Example:
Input:
employees table
Columns Type
id INTEGER
first_name VARCHAR
last_name VARCHAR
salary INTEGER
department_id INTEGER
departments table
Columns Type
id INTEGER
name VARCHAR
Output:
Column Type
percentage_over_100k FLOAT
department_name VARCHAR
number of employees INTEGER
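First, break down the question to understand what it’s asking. Specifically, you break the question down into
three conditions: a department must have ten or more employees, departments are ranked by the percentage of
employees earning over $100,000, and only three departments are returned.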
From here, think about how you would associate employees with their department, calculate and display the
percentage of employees making over $100,000 a year, and order those results to provide an answer to
the original question.
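A minimal SQLite sketch using the example schema, reading "largest three" as the top three by percentage among departments with ten or more employees (an interpretation). The sample data is synthetic. Each condition maps to one clause: HAVING, ORDER BY, and LIMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER, department_id INTEGER);
INSERT INTO departments VALUES
  (1, 'Engineering'), (2, 'Sales'), (3, 'Data'), (4, 'Legal'), (5, 'Tiny');
""")

# Synthetic data: (department_id, headcount, employees earning over 100k).
plan = [(1, 10, 6), (2, 12, 3), (3, 10, 8), (4, 10, 1), (5, 3, 3)]
conn.executemany(
    "INSERT INTO employees (salary, department_id) VALUES (?, ?)",
    [(150000 if i < high else 80000, dept_id)
     for dept_id, headcount, high in plan
     for i in range(headcount)],
)

rows = conn.execute("""
SELECT
  100.0 * SUM(CASE WHEN e.salary > 100000 THEN 1 ELSE 0 END) / COUNT(*)
    AS percentage_over_100k,
  d.name AS department_name,
  COUNT(*) AS number_of_employees
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.id, d.name
HAVING COUNT(*) >= 10                -- ten or more employees
ORDER BY percentage_over_100k DESC   -- rank by share earning over 100k
LIMIT 3                              -- keep three departments
""").fetchall()
print(rows)
# [(80.0, 'Data', 10), (60.0, 'Engineering', 10), (25.0, 'Sales', 12)]
```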
20. Given a table of students and their SAT test scores, write a query to return the two
students with the closest test scores by score difference.
Given that this problem references one table with only two columns, you have to join the table to itself
(a self-join) and compare different copies of the same table. It is helpful to think about this problem in the
form of two different tables with the same values.
The first part compares each combination of students and their SAT scores.
The second part is figuring out which two students’ scores are the closest.
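Both parts fit in one self-join, sketched here in SQLite with a hypothetical scores(student, score) table and the assumption that student names are unique. The condition s1.student < s2.student builds every pair once and skips pairing a student with themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (student TEXT, score INTEGER);
INSERT INTO scores VALUES ('Ann', 1200), ('Ben', 1350), ('Cy', 1230), ('Di', 1500);
""")

# Part 1: every pair of students and their score difference.
# Part 2: keep the pair with the smallest difference.
row = conn.execute("""
SELECT s1.student AS one_student,
       s2.student AS other_student,
       ABS(s1.score - s2.score) AS score_diff
FROM scores s1
JOIN scores s2 ON s1.student < s2.student
ORDER BY score_diff, one_student
LIMIT 1
""").fetchone()
print(row)  # ('Ann', 'Cy', 30)
```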
21. Write a query to support or disprove the hypothesis: Clickthrough Rate (CTR) is
dependent on search rating.
This question provides a table that represents search results on Facebook, including a query, a position, and
a human rating.
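One starting point, sketched in SQLite: compute CTR per rating and check whether it rises with the rating. The question does not name a click column, so has_clicked (0 or 1) is an assumed column, and the table name is hypothetical. Averaging a 0/1 flag gives the clickthrough rate directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; has_clicked is an assumed 0/1 column.
conn.executescript("""
CREATE TABLE search_results (query TEXT, position INTEGER, rating INTEGER, has_clicked INTEGER);
INSERT INTO search_results VALUES
  ('a', 1, 1, 0), ('a', 2, 1, 0), ('b', 1, 1, 1), ('b', 2, 1, 0),
  ('c', 1, 3, 1), ('c', 2, 3, 0),
  ('d', 1, 5, 1), ('d', 2, 5, 1), ('e', 1, 5, 1), ('e', 2, 5, 0);
""")

rows = conn.execute("""
SELECT rating, COUNT(*) AS results, AVG(has_clicked) AS ctr
FROM search_results
GROUP BY rating
ORDER BY rating
""").fetchall()
print(rows)  # [(1, 4, 0.25), (3, 2, 0.5), (5, 4, 0.75)]
```

Position also drives clicks, so grouping by both position and rating is a fairer comparison than rating alone.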
22. Write a query to get the number of customers that were upsold by purchasing
additional products.
For this problem, you are given a table of product purchases. Each row in the table represents an individual
product purchase.
Note: If the customer purchased two things on the same day, that does not count as an upsell, as they were
purchased within a similar timeframe. We are looking for a customer returning on a different date to
purchase a product.
This question is a little tricky because you have to note the dates that each user purchased products. You
can’t just group by the user_id and look where the number of products purchased is greater than one
because of the upsell condition.
You have to group by both the date field and the user_id to get each transaction broken out by day and
user:
SELECT
user_id
, DATE(created_at) AS date
FROM transactions
GROUP BY 1,2
The query above will now give us a user_id and date field for each row. If there exists a duplicate user_id,
then you know that the user purchased on multiple days, which satisfies the upsell condition. What comes
next?
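One way to finish, sketched in SQLite with a hypothetical transactions table: instead of looking for duplicate user_ids in the grouped output, count distinct purchase days per user directly and keep users with more than one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT, product_id INTEGER);
INSERT INTO transactions (user_id, created_at, product_id) VALUES
  (1, '2020-01-01 09:00:00', 10), (1, '2020-01-01 17:00:00', 11),
  (2, '2020-01-01 09:00:00', 10), (2, '2020-01-04 12:00:00', 12),
  (3, '2020-01-02 10:00:00', 10),
  (4, '2020-01-01 08:00:00', 10), (4, '2020-01-01 09:00:00', 11), (4, '2020-02-01 09:00:00', 12);
""")

# Several purchases on the same day count once; only repeat days qualify.
upsold = conn.execute("""
SELECT COUNT(*) AS num_upsold_customers
FROM (
  SELECT user_id
  FROM transactions
  GROUP BY user_id
  HAVING COUNT(DISTINCT DATE(created_at)) > 1
) AS multi_day_buyers
""").fetchone()[0]
print(upsold)  # 2
```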
23. Given the transactions table below, write a query that finds the third purchase of
every user.
Note: Sort the results by the user_id in ascending order. If a user purchases two products at the same time,
the lower ID field is used to determine which is the first purchase.
Example:
Input:
transactions table
Columns Type
id INTEGER
user_id INTEGER
created_at DATETIME
product_id INTEGER
quantity INTEGER
Output:
Columns Type
user_id INTEGER
created_at DATETIME
product_id INTEGER
quantity INTEGER
Here is a helpful hint for this question: You need an indicator of which purchase was the third by a specific
user. Whenever you are thinking of ranking a dataset, it is helpful to immediately think of a specific window
function you can use. You need to apply the RANK function to the transactions table. The RANK function
is a window function that assigns a rank to each row in the partition of the result set.
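A minimal SQLite sketch using the example schema. Ordering by created_at and then id applies the tie-break rule, which makes every rank unique, so RANK() and ROW_NUMBER() return the same result here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
  id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT,
  product_id INTEGER, quantity INTEGER
);
INSERT INTO transactions VALUES
  (1, 1, '2020-01-01 10:00:00', 10, 1), (2, 1, '2020-01-02 10:00:00', 20, 1),
  (3, 1, '2020-01-02 10:00:00', 30, 2), (4, 1, '2020-01-05 10:00:00', 40, 1),
  (5, 2, '2020-01-01 10:00:00', 50, 1), (6, 2, '2020-01-03 10:00:00', 60, 1),
  (7, 3, '2020-03-01 10:00:00', 70, 1), (8, 3, '2020-02-01 10:00:00', 80, 1),
  (9, 3, '2020-02-15 10:00:00', 90, 1);
""")

# Rank purchases per user by time; the lower id wins simultaneous purchases.
rows = conn.execute("""
SELECT user_id, created_at, product_id, quantity
FROM (
  SELECT *,
         RANK() OVER (PARTITION BY user_id ORDER BY created_at, id) AS purchase_rank
  FROM transactions
) AS ranked
WHERE purchase_rank = 3
ORDER BY user_id
""").fetchall()
print(rows)
# [(1, '2020-01-02 10:00:00', 30, 2), (3, '2020-03-01 10:00:00', 70, 1)]
```

User 2 has only two purchases, so they do not appear.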
24. Write a query to retrieve the number of users who have posted each of their job listings
only once and the number of users who have posted at least one job multiple times.
This is a LinkedIn data analyst interview question. A full solution to this question is available on YouTube.
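A minimal SQLite sketch under an assumed schema, job_postings(id, job_id, user_id, date_posted). The query first counts posts per (user, job), then classifies each user by their most-repeated job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data.
conn.executescript("""
CREATE TABLE job_postings (id INTEGER PRIMARY KEY, job_id INTEGER, user_id INTEGER, date_posted TEXT);
INSERT INTO job_postings (job_id, user_id, date_posted) VALUES
  (1, 1, '2020-01-01'), (2, 1, '2020-01-02'),
  (3, 2, '2020-01-01'), (3, 2, '2020-02-01'),
  (4, 3, '2020-01-01'), (5, 3, '2020-01-01'), (5, 3, '2020-02-01'), (5, 3, '2020-03-01'),
  (6, 4, '2020-01-01'),
  (7, 5, '2020-01-01');
""")

# max_posts = 1 means every job was posted once; > 1 means at least one repeat.
posted_once, posted_multiple = conn.execute("""
WITH per_user AS (
  SELECT user_id, MAX(posts_per_job) AS max_posts
  FROM (
    SELECT user_id, job_id, COUNT(*) AS posts_per_job
    FROM job_postings
    GROUP BY user_id, job_id
  ) AS job_counts
  GROUP BY user_id
)
SELECT
  SUM(CASE WHEN max_posts = 1 THEN 1 ELSE 0 END) AS posted_once,
  SUM(CASE WHEN max_posts > 1 THEN 1 ELSE 0 END) AS posted_multiple
FROM per_user
""").fetchone()
print(posted_once, posted_multiple)  # 3 2
```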
25. Write a query to get the top three highest employee salaries for each department.
For this problem, you are given an employees table and a departments table.
Note: If the department contains fewer than three employees, the top two or top one highest salaries should
be listed.
Here’s a hint: You need to order the salaries by department. A window function is useful here. Window
functions enable calculations within a certain partition of rows. In this case, the RANK() function would be
useful. What would you put in the PARTITION BY and ORDER BY clauses?
Note: When you substitute for the actual id and metric fields, make sure the substitutes are relevant to the
question asked and aligned with the data provided to you.
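One answer to the hint, sketched in SQLite with assumed column names (first_name, salary, department_id): partition by department and order by salary descending. Departments with fewer than three employees naturally return fewer rows. With tied salaries, RANK() can return more than three rows per department; DENSE_RANK() changes that behavior, so confirm which one the interviewer wants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT, salary INTEGER, department_id INTEGER);
INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
INSERT INTO employees VALUES
  (1, 'A', 200000, 1), (2, 'B', 180000, 1), (3, 'C', 150000, 1), (4, 'D', 120000, 1),
  (5, 'E', 90000, 2), (6, 'F', 70000, 2);
""")

rows = conn.execute("""
SELECT department_name, first_name, salary
FROM (
  SELECT d.name AS department_name, e.first_name, e.salary,
         RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS salary_rank
  FROM employees e
  JOIN departments d ON e.department_id = d.id
) AS ranked
WHERE salary_rank <= 3
ORDER BY department_name, salary DESC
""").fetchall()
print(rows)
```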
26. Write a query to find the number of non-purchased seats for each flight.
In this Robinhood data analyst question, assume you work for a small airline, and you are given three tables:
flights, planes, and flight_purchases.
To get the number of unsold seats per flight, you need to get each flight’s total number of seats available and
the total seats sold.
You can do an inner join on all 3 tables since the question states that the flight_purchases table does not
have entries of flights or seats that do not exist.
To calculate the number of seats sold per flight, you use GROUP BY on the flight_id together with COUNT() on
seat_id. You then subtract the total seats sold from the total number of seats on the flight to find how many
seats remained unsold.
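A minimal SQLite sketch under an assumed schema: flights(id, plane_id), planes(id, number_of_seats), and flight_purchases(flight_id, seat_id). One design choice differs from a pure inner join: flight_purchases is LEFT JOINed so that a flight with no purchases yet still appears, with all seats unsold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data.
conn.executescript("""
CREATE TABLE planes (id INTEGER PRIMARY KEY, number_of_seats INTEGER);
CREATE TABLE flights (id INTEGER PRIMARY KEY, plane_id INTEGER);
CREATE TABLE flight_purchases (flight_id INTEGER, seat_id INTEGER);
INSERT INTO planes VALUES (1, 100), (2, 50);
INSERT INTO flights VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO flight_purchases VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# Seats on the plane minus seats sold on that flight.
rows = conn.execute("""
SELECT f.id AS flight_id,
       p.number_of_seats - COUNT(fp.seat_id) AS unsold_seats
FROM flights f
JOIN planes p ON p.id = f.plane_id
LEFT JOIN flight_purchases fp ON fp.flight_id = f.id
GROUP BY f.id, p.number_of_seats
ORDER BY f.id
""").fetchall()
print(rows)  # [(1, 97), (2, 49), (3, 100)]
```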
27. Given a transactions table with date timestamps, sample every fourth row ordered
by date.
Here’s a hint for this question to get you started: If you are sampling from this table and you want to sample
every fourth value specifically, you will probably have to use a window function.
A general rule of thumb to follow is that when a question states or asks for some Nth value (like the third
purchase of each customer or the tenth notification sent), a window function is usually the best option. Window
functions allow us to use the RANK() or ROW_NUMBER() function to provide a numerical index based on a
certain ordering.
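A minimal SQLite sketch with a hypothetical transactions(id, created_at) table. ROW_NUMBER() numbers the rows in date order, and the modulo filter keeps rows 4, 8, 12, and so on. The sample IDs run opposite to the dates to show that the sampling follows created_at, not id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, created_at TEXT)")
# ids run opposite to dates: day 1 has id 10, day 10 has id 1.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(11 - day, f"2020-01-{day:02d}") for day in range(1, 11)],
)

rows = conn.execute("""
SELECT id, created_at
FROM (
  SELECT *, ROW_NUMBER() OVER (ORDER BY created_at) AS row_num
  FROM transactions
) AS numbered
WHERE row_num % 4 = 0
ORDER BY created_at
""").fetchall()
print(rows)  # [(7, '2020-01-04'), (3, '2020-01-08')]
```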
28. Write a query that returns all of the neighborhoods that have zero users.
More Context: You are given two tables: the first is a users table with demographic information and the
neighborhoods they live in, and the second is a neighborhoods table.
This is an intermediate SQL problem that requires you to write a simple query. Our task is to find all the
neighborhoods without users. To reframe the task, you need all the neighborhoods that do not have a single
user living in them. This means you have to find neighborhood values that appear in one table but not in the
other, for example by counting users per neighborhood and keeping the neighborhoods with no users.
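A common pattern for "rows in one table with no match in another" is a LEFT JOIN followed by an IS NULL filter. This SQLite sketch assumes hypothetical columns neighborhoods(id, name) and users(id, name, neighborhood_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data.
conn.executescript("""
CREATE TABLE neighborhoods (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, neighborhood_id INTEGER);
INSERT INTO neighborhoods VALUES (1, 'Mission'), (2, 'SoMa'), (3, 'Noe Valley');
INSERT INTO users VALUES (1, 'Ana', 1), (2, 'Ben', 1), (3, 'Cy', 3);
""")

# Neighborhoods with no matching user get NULLs in the user columns.
empty = [name for (name,) in conn.execute("""
SELECT n.name
FROM neighborhoods n
LEFT JOIN users u ON u.neighborhood_id = n.id
WHERE u.id IS NULL
ORDER BY n.name
""")]
print(empty)  # ['SoMa']
```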
Hard SQL Questions for Data Analysts
Advanced SQL writing - Writing queries to debug code, using indices to tune SQL queries, and using
advanced SQL clauses.
Logic-based questions - These questions can be more challenging analytics case studies or queries that
first require you to solve a logic-based problem.
29. An online marketplace company has introduced a new feature that allows potential
buyers and sellers to conduct audio chats with each other prior to transacting. Answer the
following questions:
A full solution to this complex data analytics case study is available on YouTube.
30. Write a query to get the total three-day rolling average for deposits by day.
For this question, you are given a table of bank transactions with three columns: user_id, a deposit or
withdrawal value (positive for deposits, negative for withdrawals), and created_at time for each transaction.
Here’s a hint: Usually, if the problem asks you to solve for a moving/rolling average, you are provided the dataset
in the form of a table with two columns: date and value. This problem is taken one step further, as it provides
a table of just transactions, with an interest in filtering for deposits (positive values) and removing records
representing withdrawals (negative values, e.g., -10).
You also need to know the total deposit amount (sum of all deposits) for each day, as it will factor into
calculating the numerator for the rolling three-day average.
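A minimal SQLite sketch, assuming the value column is called transaction_value and that every day has at least one deposit. The ROWS frame counts rows, not calendar days, so gaps would need a date-based join instead. The first two days average over fewer than three days; filter them out if the interviewer wants only full windows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bank_transactions (user_id INTEGER, transaction_value INTEGER, created_at TEXT);
INSERT INTO bank_transactions VALUES
  (1, 10, '2020-01-01 09:00:00'), (2, 20, '2020-01-01 10:00:00'), (1, -5, '2020-01-01 11:00:00'),
  (2, 60, '2020-01-02 09:00:00'),
  (1, 30, '2020-01-03 09:00:00'), (2, -100, '2020-01-03 12:00:00'),
  (3, 90, '2020-01-04 09:00:00');
""")

rows = conn.execute("""
WITH daily AS (
  -- Keep deposits only, then total them per day.
  SELECT DATE(created_at) AS dt, SUM(transaction_value) AS deposits
  FROM bank_transactions
  WHERE transaction_value > 0
  GROUP BY DATE(created_at)
)
SELECT dt,
       AVG(deposits) OVER (
         ORDER BY dt ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS rolling_three_day
FROM daily
ORDER BY dt
""").fetchall()
print(rows)
# [('2020-01-01', 30.0), ('2020-01-02', 45.0), ('2020-01-03', 40.0), ('2020-01-04', 60.0)]
```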
31. Write a SQL query that creates a cumulative distribution of the number of comments
per user. Assume bin buckets with class intervals of one.
To solve this cumulative distribution practice problem, you are given two tables, a users table and a
comments table.
Example output:
frequency cumulative
0 10
1 25
2 27
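A minimal SQLite sketch, assuming hypothetical tables users(id) and comments(user_id). It builds the one-per-bin histogram first (including users with zero comments), then a running SUM() window turns the counts into a cumulative distribution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE comments (user_id INTEGER);
INSERT INTO users VALUES (1), (2), (3), (4), (5);
INSERT INTO comments VALUES (2), (3), (4), (4);
""")

rows = conn.execute("""
WITH per_user AS (
  SELECT u.id, COUNT(c.user_id) AS frequency
  FROM users u
  LEFT JOIN comments c ON c.user_id = u.id
  GROUP BY u.id
),
histogram AS (
  SELECT frequency, COUNT(*) AS num_users
  FROM per_user
  GROUP BY frequency
)
SELECT frequency,
       SUM(num_users) OVER (ORDER BY frequency) AS cumulative
FROM histogram
ORDER BY frequency
""").fetchall()
print(rows)  # [(0, 2), (1, 4), (2, 5)]
```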
32. Write a query to display a graph to understand how unsubscribes are affecting login
rates over time.
For this question, assume that you work at Twitter. Twitter wants to roll out more push notifications to users
because they think users are missing out on good content. Twitter decides to do this in an A/B test. After
you release more push notifications, you suddenly see the total number of unsubscribes increase. How would
you visually represent this growth in unsubscribes and its effect on login rates?
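One possible data pull for such a graph, sketched in SQLite under assumed tables: events(user_id, created_at, action) with 'login' and 'unsubscribe' actions, and variants(user_id, variant) for the A/B assignment. It produces one row per day and variant with the unsubscribe count and the share of the variant's users who logged in, ready to plot as two lines per variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data.
conn.executescript("""
CREATE TABLE variants (user_id INTEGER, variant TEXT);
CREATE TABLE events (user_id INTEGER, created_at TEXT, action TEXT);
INSERT INTO variants VALUES (1, 'control'), (2, 'control'), (3, 'treatment'), (4, 'treatment');
INSERT INTO events VALUES
  (1, '2020-01-01 08:00:00', 'login'), (2, '2020-01-01 09:00:00', 'login'),
  (3, '2020-01-01 10:00:00', 'login'), (4, '2020-01-01 11:00:00', 'unsubscribe'),
  (1, '2020-01-02 08:00:00', 'login'), (3, '2020-01-02 09:00:00', 'unsubscribe');
""")

rows = conn.execute("""
WITH variant_sizes AS (
  SELECT variant, COUNT(*) AS num_users
  FROM variants
  GROUP BY variant
)
SELECT DATE(e.created_at) AS dt,
       v.variant,
       SUM(CASE WHEN e.action = 'unsubscribe' THEN 1 ELSE 0 END) AS unsubscribes,
       1.0 * COUNT(DISTINCT CASE WHEN e.action = 'login' THEN e.user_id END)
         / vs.num_users AS login_rate
FROM events e
JOIN variants v ON v.user_id = e.user_id
JOIN variant_sizes vs ON vs.variant = v.variant
GROUP BY DATE(e.created_at), v.variant, vs.num_users
ORDER BY dt, v.variant
""").fetchall()
for row in rows:
    print(row)
```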
33. You are given a table of user experiences representing each person’s past employment
history. Answer the following:
Write a query to prove or disprove this hypothesis: Data scientists who switch jobs more frequently become
managers faster than data scientists that stay at one job for longer.
For this question, you are interested in analyzing the career paths of data scientists. Let’s say that the titles
you care about are bucketed into three categories: data scientist, senior data scientist, and data science
manager.
This question requires a bit of creative problem-solving to understand how you can prove or disprove the
hypothesis. The hypothesis is that data scientists who end up switching jobs more often get promoted faster.
Therefore, in analyzing this dataset, you can test this hypothesis by separating the data scientists into
specific segments based on how often they switch jobs in their careers.
For example, if you looked at the number of job switches for data scientists who have been in their field for
five years, you could check whether the number of data science managers increases with the number of
times they switched jobs; if it does, that supports the hypothesis.
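One way to build those segments, sketched in SQLite under an assumed schema, user_experiences(user_id, title, company, start_date, end_date). For each data scientist who reached 'data science manager', it counts job switches (distinct companies up to the first manager role, minus one) and the years from their first job to that role. It then averages the years per switch count. Returning to a previous employer is not counted as a switch, so treat the switch count as an approximation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data.
conn.executescript("""
CREATE TABLE user_experiences (user_id INTEGER, title TEXT, company TEXT, start_date TEXT, end_date TEXT);
INSERT INTO user_experiences VALUES
  (1, 'data scientist', 'A', '2010-01-01', '2012-01-01'),
  (1, 'senior data scientist', 'B', '2012-01-01', '2014-01-01'),
  (1, 'data science manager', 'C', '2014-01-01', NULL),
  (2, 'data scientist', 'A', '2010-01-01', '2013-01-01'),
  (2, 'senior data scientist', 'A', '2013-01-01', '2016-01-01'),
  (2, 'data science manager', 'A', '2016-01-01', NULL),
  (3, 'data scientist', 'X', '2011-01-01', '2012-01-01'),
  (3, 'data scientist', 'Y', '2012-01-01', '2013-01-01'),
  (3, 'data science manager', 'Z', '2013-01-01', NULL),
  (4, 'data scientist', 'A', '2015-01-01', '2017-01-01'),
  (4, 'senior data scientist', 'B', '2017-01-01', NULL);
""")

rows = conn.execute("""
WITH first_job AS (
  SELECT user_id, MIN(start_date) AS career_start
  FROM user_experiences
  GROUP BY user_id
),
first_manager AS (
  SELECT user_id, MIN(start_date) AS manager_start
  FROM user_experiences
  WHERE title = 'data science manager'
  GROUP BY user_id
),
paths AS (
  SELECT m.user_id,
         COUNT(DISTINCT ue.company) - 1 AS job_switches,
         (julianday(m.manager_start) - julianday(f.career_start)) / 365.25 AS years_to_manager
  FROM first_manager m
  JOIN first_job f ON f.user_id = m.user_id
  JOIN user_experiences ue
    ON ue.user_id = m.user_id AND ue.start_date <= m.manager_start
  GROUP BY m.user_id, m.manager_start, f.career_start
)
SELECT job_switches, COUNT(*) AS num_managers, AVG(years_to_manager) AS avg_years
FROM paths
GROUP BY job_switches
ORDER BY job_switches
""").fetchall()
for switches, managers, years in rows:
    print(switches, managers, round(years, 1))  # 0 1 6.0, then 2 2 3.0
```

If the average years to manager fall as switches rise, the data supports the hypothesis; user 4 never became a manager and is excluded.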
The main difference between medium and hard SQL questions is how straightforward they are. An
intermediate question asks you to write a query that retrieves a specific set of data. Hard
questions, by contrast, also require you to work out what data you need in order to solve the question.