MySQL Data Analysis
MySQL FOR DATA ANALYSIS
With Expert SQL Instructor John Pauler
Quizzes & Homework Exercises to test and reinforce key concepts, with step-by-step solutions
Bonus Projects to test your abilities and apply the skills developed throughout the course
3 Analyzing Data from Single Tables
Use SQL queries to select data, apply filters and sorting rules, analyze segments of data, and evaluate conditional logic
[MID-COURSE PROJECT]
[FINAL PROJECT]
THE SITUATION
You and your rich Uncle Jimmy just purchased Maven Movies, a brick and mortar DVD Rental
business. Uncle Jimmy put up the money, and you’re in charge of the day-to-day operations.
THE BRIEF
As a new owner, you’ll need to learn everything you can about your business: your product
inventory, your staff, your customer purchase behaviors, etc.
You have access to the entire Maven Movies SQL database, but the remaining employees are
not able to give you much direction. You’ll need to analyze everything on your own.
2 This course is designed to get you up & running with SQL for analysis
• The goal is to provide a foundational understanding of SQL and relational databases; some concepts may be
covered at a high level, and we won’t cover everything that SQL can do in the scope of this course
You can load, query and analyze massive data sets using SQL
• SQL stands for Structured Query Language, and was designed in the early
1970s at IBM to manipulate and retrieve data stored in a relational
database management system (RDBMS)
Step 1 Download Community Server: This allows SQL to run on your machine
Step 2 Download MySQL Workbench: This is the program you’ll use to write and run SQL queries (it’s intuitive, and works across operating systems)
Step 5 Create the Database: We’ll run the SQL code to build the 16-table database which we’ll be exploring throughout the course (this part is easy!)
2 Select the macOS operating system, and download the DMG Archive version
• Note: you’ll likely see a later version than the one shown (just download the latest)
3 No need to Login or Sign Up, just click “No thanks, just start my download”
4 Find the install file in your downloads, then double click to run the installer package
5 Click through each install step, leaving defaults unless you need customized settings
• Note: Make sure to store your root password somewhere, you’ll need this later!
2 Select the Microsoft Windows operating system, and the Installer MSI download
• Note: On the download page you may see two versions: select mysql-installer-web-community if you are connected
to the internet, and keep in mind that you may see a later version than the one shown (just download the latest)
3 No need to Login or Sign Up, just click “No thanks, just start my download”
4 Find the install file in your downloads, then double click to run the installer package
5 Click through each install step, leaving defaults unless you need customized settings
• Note: Make sure to store your root password somewhere, you’ll need this later!
2 We’ll be using version 8.0.16 for this course, so you can either click “Looking for previous
GA versions?” to search for the same one, or simply download the latest available
3 No need to Login or Sign Up, just click “No thanks, just start my download”
4 Find the install file in your downloads, click the MySQL Workbench logo (with the dolphin)
and drag it into your Applications folder
5 Look for MySQL Workbench in your list of applications, double click to launch, then proceed
to Step 3: Connecting to the server
2 We’ll be using version 8.0.13 for this course, so you can either click “Looking for previous
GA versions?” to search for the same one, or simply download the latest available
3 No need to Login or Sign Up, just click “No thanks, just start my download”
4 Find the install file in your downloads, double click to run the installation process, and
stick with default settings unless you need a custom configuration
5 Look for MySQL Workbench in your list of programs, double click to launch, then proceed to
Step 3: Connecting to the server
• Note: You may see a warning if you aren’t on Windows 10+, but most older systems (e.g. Windows 7) should be compatible
1 After launching Workbench, check the MySQL Connections section on the welcome page
• If you see a connection already, right-click to Edit Connection, otherwise click the plus sign (+) to add a new one
2 Name the connection “MavenMovies”, confirm that the Username is “root”, and click OK
3 Once you see the MavenMovies connection on your welcome screen, simply click the tile
and enter your root password to complete the connection
Result Grid
After running your SQL queries, your results appear here
Schemas Tab
Here you can view tables and views in your database
Action Output
This is a summary of actions taken by the server
(TIP: the ‘Response’ column is great for troubleshooting errors!)
1 In MySQL Workbench, click File from the top menu, then select Open SQL Script
3 Click anywhere in the SQL Query Editor window (without highlighting any code), and click
the lightning bolt icon to run all of the code and create the database
Tables contain information organized into columns (or fields) and rows (or records)
inventory_id, customer_id, and staff_id are Foreign Keys, which reference primary keys from other tables
SELECT Identifies the column(s) you want your query to select for your results
PRO TIP:
SELECT tells the SQL server to retrieve all rows for the columns specified. Don’t worry about filtering for
now (we’ll do that later by adding a WHERE clause)
FROM Identifies the table(s) your query will select data from
FROM tableName
• FROM lets the SQL server know you are about to specify the table(s) to select from for your query result
• tableName is the table SQL will select from
HEY THIS IS IMPORTANT!
FROM is one of the two things you will need to include in every SQL query you
write (SELECT is the other)
PRO TIP:
Unlike specifying multiple columns in your SELECT, you should not just list multiple tables separated by
commas in your FROM: without a matching condition, this pairs every row of one table with every row of the
other. To use multiple tables, you’ll need to use a JOIN (we’ll cover how to do this later).
PRO TIP:
SELECT * FROM is a great way to
quickly see what data a table contains
PRO TIP:
Use line breaks and indentation to make
your code more human-readable.
PRO TIP:
SELECT DISTINCT is a great way to
quickly see all values in a column
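The SELECT and FROM tips above can be tried with a minimal sketch like the one below. It assumes MySQL 8.0 or later (for the WITH clause), and `demo_film` with its rows is a hypothetical stand-in built inline, so it does not touch the course database. The ratings are illustrative values only.

```sql
-- demo_film is a hypothetical inline table (MySQL 8.0+ WITH clause);
-- its rows are illustrative sample data, not course data.
WITH demo_film AS (
    SELECT 1 AS film_id, 'ACADEMY DINOSAUR' AS title, 'PG' AS rating
    UNION ALL SELECT 2, 'ACE GOLDFINGER', 'G'
    UNION ALL SELECT 3, 'ADAPTATION HOLES', 'NC-17'
    UNION ALL SELECT 4, 'AFFAIR PREJUDICE', 'G'
)
SELECT DISTINCT
    rating          -- each rating value is listed once, even though 'G' appears twice
FROM demo_film;

-- Variations to try by swapping the final SELECT:
--   SELECT * FROM demo_film;               -- every column, every row
--   SELECT film_id, title FROM demo_film;  -- only the named columns
```

Swapping in the commented variations is a quick way to see how the select list changes the shape of the result without changing which table is read.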
WHERE (Optional) Specifies criteria for filtering the records of your result set
WHERE logicalCondition
• WHERE lets the SQL server know you are about to specify a logical condition for which rows to include
• logicalCondition is where you prescribe the logical conditions used to filter your result set
Examples:
• WHERE category = 'Sci-Fi'
• WHERE amount > 5.99
• WHERE rental_date BETWEEN '2006-01-01' AND '2006-06-01'
HEY THIS IS IMPORTANT!
WHERE is an optional clause, and always comes after your FROM clause and before
any GROUP BY, HAVING, or ORDER BY clauses (if included)
PRO TIP:
Specify multiple filter criteria using AND/OR statements within your WHERE clause
PRO TIP:
Add and remove filtering logic and
compare results to quickly master this
PRO TIP:
You can use AND and OR together in the
same WHERE clause. This is powerful!
PRO TIP:
Using IN saves time, and will make it
easier to read your query later
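As a rough illustration of combining IN, AND, OR and BETWEEN in one WHERE clause, here is a small sketch. It assumes MySQL 8.0 or later, and `demo_payment`, its columns and its rows are hypothetical sample data defined inline.

```sql
-- demo_payment is a hypothetical inline table (MySQL 8.0+ WITH clause).
WITH demo_payment AS (
    SELECT 1 AS payment_id, 1 AS customer_id, 2.99 AS amount, DATE '2006-01-15' AS payment_date
    UNION ALL SELECT 2, 1, 7.99, DATE '2006-03-02'
    UNION ALL SELECT 3, 2, 4.99, DATE '2006-02-10'
    UNION ALL SELECT 4, 3, 9.99, DATE '2006-01-20'
)
SELECT
    payment_id,
    customer_id,
    amount,
    payment_date
FROM demo_payment
WHERE customer_id IN (1, 2)                     -- same as customer_id = 1 OR customer_id = 2
    AND (
        amount > 5.99                           -- either a larger payment...
        OR payment_date BETWEEN '2006-01-01' AND '2006-01-31'  -- ...or any payment in January
    );
-- With the sample rows above, payments 1 and 2 pass the filter.
```

The parentheses around the OR conditions matter: AND is evaluated before OR, so removing them would change which rows qualify (payment 4 would then be returned as well).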
LIKE Allows you to use pattern matching in your logical operators (instead of exact matching)
LIKE '%patternToLookFor%'
• LIKE tells SQL that you’re prescribing a pattern instead of an exact number or string of characters
• '%patternToLookFor%' is where you define the pattern. The “%” before and after the text is a type of wildcard (along with “_”)
Examples:
• WHERE name LIKE 'Denise%' -- records where name starts with 'Denise', with any number of characters after
• WHERE description LIKE '%fancy%' -- records that contain 'fancy', with any characters before OR after
• WHERE name LIKE '%Johnson' -- records that end with 'Johnson', with any number of characters before
• WHERE first_name LIKE '_erry' -- records that end with 'erry', with exactly one character before (e.g. Terry, Jerry)
HEY THIS IS IMPORTANT!
Capitalization can matter. Whether the server considers the capitalization you
provided in your SQL statement depends on the column’s collation: MySQL’s default
collations are case-insensitive, while binary or case-sensitive collations match
capitalization exactly.
PRO TIP:
NOT LIKE can also be used to filter out (rather than keep) records where values match a pattern you provide,
and uses the same wildcards and capitalization rules as LIKE
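The two wildcards can be compared side by side with a sketch like this one. It assumes MySQL 8.0 or later, and `demo_customer` and its names are hypothetical sample rows defined inline.

```sql
-- demo_customer is a hypothetical inline table (MySQL 8.0+ WITH clause).
WITH demo_customer AS (
    SELECT 'Terry' AS first_name, 'Johnson' AS last_name
    UNION ALL SELECT 'Jerry', 'Carter'
    UNION ALL SELECT 'Denise', 'Kelly'
    UNION ALL SELECT 'Sherry', 'Johnson'
)
SELECT
    first_name,
    last_name
FROM demo_customer
WHERE first_name LIKE '_erry'   -- exactly one character, then 'erry' (Terry, Jerry)
    OR last_name LIKE 'Kel%';   -- starts with 'Kel', any number of characters after (Kelly)
-- 'Sherry' is not matched by '_erry' because two characters precede 'erry'.
```

Changing `'_erry'` to `'%erry'` would also pick up Sherry, which is a handy way to see the difference between “_” and “%”.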
PRO TIP:
GROUP BY is great for comparing different segments of your data (similar to Pivot Tables in Excel)
PRO TIP:
Aggregate functions serve the same purpose
as summarization modes in a Pivot Table
PRO TIP:
Comments make queries much more
human-readable -- use them often!
Assigning an alias does NOT change the resulting values
PRO TIP:
Group by customer segment, and add a
time dimension to create cohort trends
PRO TIP:
HAVING is a great way to limit your results to the most important groups for your business. For example,
you can limit results to customers with total payments above a certain amount, or to your most-rented films.
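The GROUP BY, aggregate, alias and HAVING tips above fit together roughly as follows. This is a sketch assuming MySQL 8.0 or later; `demo_payment` and its rows are hypothetical sample data, and the threshold of 10 is arbitrary.

```sql
-- demo_payment is a hypothetical inline table (MySQL 8.0+ WITH clause).
WITH demo_payment AS (
    SELECT 1 AS customer_id, 2.99 AS amount
    UNION ALL SELECT 1, 7.99
    UNION ALL SELECT 2, 4.99
    UNION ALL SELECT 3, 9.99
    UNION ALL SELECT 3, 0.99
    UNION ALL SELECT 3, 5.99
)
SELECT
    customer_id,                    -- one output row per customer (the "row label")
    COUNT(*)    AS total_payments,  -- the alias only renames the output column
    SUM(amount) AS total_amount
FROM demo_payment
GROUP BY customer_id
HAVING SUM(amount) > 10;            -- filters groups after aggregation, unlike WHERE
-- With the sample rows above, customers 1 (10.98) and 3 (16.97) are kept.
```

WHERE filters individual rows before grouping, while HAVING filters the grouped results, which is why the total-amount condition has to live in HAVING.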
ORDER BY (Optional) Specifies the order in which your query results are displayed
PRO TIP:
When you use ORDER BY with multiple criteria, the server will prioritize
sorting the data based on the first column specified, then use additional columns as tiebreakers
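A short sketch of sorting on more than one column, assuming MySQL 8.0 or later. `demo_film` and its rental rates are hypothetical values chosen so that two films tie on the first sort column.

```sql
-- demo_film is a hypothetical inline table (MySQL 8.0+ WITH clause).
WITH demo_film AS (
    SELECT 'ACADEMY DINOSAUR' AS title, 0.99 AS rental_rate
    UNION ALL SELECT 'ACE GOLDFINGER', 4.99
    UNION ALL SELECT 'AFFAIR PREJUDICE', 2.99
    UNION ALL SELECT 'ADAPTATION HOLES', 2.99
)
SELECT
    title,
    rental_rate
FROM demo_film
ORDER BY
    rental_rate DESC,   -- primary sort: most expensive rentals first
    title ASC;          -- tiebreaker: alphabetical within the same rate
-- Order with the sample rows: ACE GOLDFINGER, ADAPTATION HOLES, AFFAIR PREJUDICE, ACADEMY DINOSAUR
```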
CASE Allows you to process a series of IF/THEN logical operators in a specific order
CASE WHEN logic1 THEN value1 WHEN logic2 THEN value2 ELSE value3 END
PRO TIP:
I often use my ELSE condition as a catch all, and write something like “Oops…check logic!”
This helps me quickly see if I forgot to include any conditions in my CASE statement (examples to follow)
PRO TIP:
When values feel too granular or noisy, CASE
can help you roll them up to a higher level
PRO TIP:
Include an error message using an ELSE
statement to see if you missed any logic
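The roll-up and catch-all ELSE ideas above can be sketched like this, assuming MySQL 8.0 or later. `demo_film`, the `length_min` column and the bucket boundaries are all hypothetical choices for illustration.

```sql
-- demo_film is a hypothetical inline table (MySQL 8.0+ WITH clause).
WITH demo_film AS (
    SELECT 'SHORT FEATURE' AS title, 46 AS length_min
    UNION ALL SELECT 'MID FEATURE', 95
    UNION ALL SELECT 'LONG FEATURE', 150
    UNION ALL SELECT 'UNKNOWN FEATURE', NULL   -- no length recorded
)
SELECT
    title,
    length_min,
    CASE
        WHEN length_min < 60 THEN 'under 1 hr'
        WHEN length_min BETWEEN 60 AND 120 THEN '1-2 hrs'
        WHEN length_min > 120 THEN 'over 2 hrs'
        ELSE 'Oops...check logic!'             -- catches anything the WHENs missed
    END AS length_bucket
FROM demo_film;
-- The NULL row matches none of the WHEN conditions, so it lands in the ELSE bucket.
```

The conditions are checked top to bottom and the first match wins, so the order of the WHEN lines matters whenever ranges overlap.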
PRO TIP:
Use GROUP BY to define your row
labels, and CASE to pivot to columns
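One way to sketch the pivot described above is to count a CASE expression per column. This assumes MySQL 8.0 or later; `demo_inventory`, its `store_id` column and its rows are hypothetical sample data.

```sql
-- demo_inventory is a hypothetical inline table (MySQL 8.0+ WITH clause).
WITH demo_inventory AS (
    SELECT 1 AS inventory_id, 1 AS film_id, 1 AS store_id
    UNION ALL SELECT 2, 1, 1
    UNION ALL SELECT 3, 1, 2
    UNION ALL SELECT 4, 2, 2
    UNION ALL SELECT 5, 2, 2
)
SELECT
    film_id,    -- GROUP BY column: one row per film
    COUNT(CASE WHEN store_id = 1 THEN inventory_id ELSE NULL END) AS store_1_copies,
    COUNT(CASE WHEN store_id = 2 THEN inventory_id ELSE NULL END) AS store_2_copies
FROM demo_inventory
GROUP BY film_id
ORDER BY film_id;
-- With the sample rows: film 1 has 2 and 1 copies, film 2 has 0 and 2 copies.
```

COUNT skips NULL values, so each CASE expression only counts the inventory items belonging to its own store.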
THE SITUATION
The company’s insurance policy is up for renewal and the insurance company’s underwriters
need some updated information from us before they will issue a new policy.
Sincerely,
Joe Scardycat, Lead Underwriter
1 We will need a list of all staff members, including their first and last names, email addresses, and the store
identification number where they work.
2 We will need separate counts of inventory items held at each of your two stores.
3 We will need a count of active customers for each of your stores. Separately, please.
4 In order to assess the liability of a data breach, we will need you to provide a count of all customer email
addresses stored in the database.
5 We are interested in how diverse your film offering is as a means of understanding how likely you are to
keep customers engaged in the future. Please provide a count of unique film titles you have in inventory at
each store and then provide a count of the unique categories of films you provide.
6 We would like to understand the replacement cost of your films. Please provide the replacement cost for the
film that is least expensive to replace, the most expensive to replace, and the average of all films you carry.
7 We are interested in having you put payment monitoring systems and maximum payment processing
restrictions in place in order to minimize the future risk of fraud by your staff. Please provide the average
payment you process, as well as the maximum payment you have processed.
8 We would like to better understand what your customer base looks like. Please provide a list of all customer
identification values, with a count of rentals they have made all-time, with your highest volume customers at
the top of the list.
Cardinality refers to the uniqueness of values in a column (or attribute) of a table, and is commonly used to describe
how two tables relate (one-to-one, one-to-many, or many-to-many). For now, here are the key points to grasp:
• Primary keys are unique
  • They cannot repeat, so there is only one instance of each primary key value in a column
• Foreign keys are non-unique
  • They can repeat, so there may be many instances of each foreign key value in a column
• We can create a one-to-many relationship by connecting a foreign key in one table to
  a primary key in another

Table with FOREIGN (MANY) keys film_id and address_id:
inventory_id | film_id | address_id
1  | 1 | 1
2  | 1 | 1
3  | 1 | 1
4  | 1 | 1
5  | 1 | 2
6  | 1 | 2
7  | 1 | 2
8  | 1 | 2
9  | 2 | 2
10 | 2 | 2
11 | 2 | 2
12 | 3 | 2
13 | 3 | 2
14 | 3 | 2
15 | 3 | 2
16 | 4 | 1
17 | 4 | 1
18 | 4 | 1
19 | 4 | 1
20 | 4 | 2

Table with PRIMARY (ONE) key film_id:
film_id | title | release_year
1 | ACADEMY DINOSAUR | 2006
2 | ACE GOLDFINGER | 2006
3 | ADAPTATION HOLES | 2006
4 | AFFAIR PREJUDICE | 2006

Table with PRIMARY (ONE) key address_id:
address_id | address | district
1 | 47 MySakila Drive | Alberta
2 | 28 MySQL Boulevard | QLD
Customer table:
PRO TIP:
When you explore a database for the first time, diagram
your relationships to understand your table structure
The whole point of table relationships is to enable multi-table querying (i.e. pulling data from multiple tables at once)
• In SQL, we use JOIN statements to do this, by writing these table relationships directly into our queries
For example, what if you need a table showing all film titles in
each store’s inventory?
Since title and store_id live in separate tables, we can’t use
single-table queries; we’ll need to use a JOIN!
LEFT JOIN Returns ALL records from the LEFT table, and any matching records from the RIGHT table
FROM leftTableName
LEFT JOIN rightTableName
RIGHT JOIN Returns ALL records from the RIGHT table, and any matching records from the LEFT table
FROM leftTableName
RIGHT JOIN rightTableName
UNION Returns all data from one table, with all data from another table appended to the end
SELECT FROM firstTableName
UNION
SELECT FROM secondTableName
*Copyright 2019, Excel Maven & Maven Analytics, LLC
INNER JOIN Returns records from BOTH tables when there is a match, and excludes unmatched records
• INNER JOIN tells SQL to return only the overlap
• The rest of the clause is where you name your right table, and tell SQL which column to match on by
  specifying the column name from each table
Example:
FROM rental
INNER JOIN customer
ON rental.customer_id = customer.customer_id
HEY THIS IS IMPORTANT!
INNER JOIN is one of the two join types you’ll likely use most (LEFT JOIN is the
other). Make sure you understand the differences between the two types!
PRO TIP:
A good way to remember how INNER JOIN works is to think of it as only returning the inner/middle part of a
Venn diagram (where the two tables match/overlap)
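A minimal INNER JOIN sketch, assuming MySQL 8.0 or later. `demo_customer` and `demo_rental` are hypothetical inline tables that mimic the rental/customer relationship from the example above; their rows are invented for illustration.

```sql
-- demo_customer and demo_rental are hypothetical inline tables (MySQL 8.0+ WITH clause).
WITH demo_customer AS (
    SELECT 1 AS customer_id, 'MARY' AS first_name
    UNION ALL SELECT 2, 'PATRICIA'
    UNION ALL SELECT 3, 'LINDA'         -- has no rentals
),
demo_rental AS (
    SELECT 101 AS rental_id, 1 AS customer_id
    UNION ALL SELECT 102, 1
    UNION ALL SELECT 103, 2
)
SELECT
    demo_rental.rental_id,
    demo_customer.first_name
FROM demo_rental
    INNER JOIN demo_customer
        ON demo_rental.customer_id = demo_customer.customer_id;
-- Three rows come back (rentals 101, 102, 103); LINDA is excluded because she has no match.
```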
LEFT JOIN Returns all records from the LEFT table, and any matched records from the RIGHT table
PRO TIP:
This is the join type that I use most often. When you’re working with a data set and want to add data from
another table while keeping all of your current records, LEFT JOIN is the way to go!
PRO TIP:
Compare your query with LEFT and INNER
joins to quickly master the differences
INNER JOIN = 4,580 ROWS, LEFT JOIN = 4,581 ROWS
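To see the LEFT versus INNER difference on a small scale, a sketch like the following can help. It assumes MySQL 8.0 or later, and `demo_customer` and `demo_rental` are hypothetical inline tables with invented rows.

```sql
-- demo_customer and demo_rental are hypothetical inline tables (MySQL 8.0+ WITH clause).
WITH demo_customer AS (
    SELECT 1 AS customer_id, 'MARY' AS first_name
    UNION ALL SELECT 2, 'PATRICIA'
    UNION ALL SELECT 3, 'LINDA'         -- has no rentals
),
demo_rental AS (
    SELECT 101 AS rental_id, 1 AS customer_id
    UNION ALL SELECT 102, 1
    UNION ALL SELECT 103, 2
)
SELECT
    demo_customer.first_name,
    demo_rental.rental_id               -- NULL when the customer has no rental
FROM demo_customer
    LEFT JOIN demo_rental
        ON demo_customer.customer_id = demo_rental.customer_id;
-- Four rows come back: the three matched rentals plus LINDA with a NULL rental_id.
-- Changing LEFT JOIN to INNER JOIN drops LINDA and returns three rows.
```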
RIGHT JOIN Returns all records from the RIGHT table, and any matched records from the LEFT table
• RIGHT JOIN tells SQL to return the overlap and everything from the right table
• The ON clause is where you tell SQL how to look for a match by specifying columns from each table
Example:
FROM rental
RIGHT JOIN customer
ON rental.customer_id = customer.customer_id
HEY THIS IS IMPORTANT!
RIGHT JOIN is like the opposite of LEFT JOIN; instead of keeping all records from
the first (left) table named, it keeps all records from the second (right)
PRO TIP:
I write SQL queries every day in my professional career, and I’ve never needed to use RIGHT JOIN.
To keep things simple, just use LEFT JOIN (I’m only teaching you about RIGHT JOIN for completeness)
LEFT JOIN: Returns all values from the left (first) table and any records from the right (second) table which match the JOIN criteria. Returns NULL when no match is found.
INNER JOIN: Returns any matching values from both tables. For records with no match, INNER JOIN does not return values from either table.
RIGHT JOIN: Returns all values from the right (second) table, and any records from the left (first) table which match the JOIN criteria. Returns NULL when no match is found.
FULL JOIN Returns all records from BOTH tables when there is a match in either one of the tables
• FULL JOIN tells SQL to return records from both tables
• The ON clause is where you tell SQL how to look for a match, by specifying columns from each table
Example:
FROM rental
FULL JOIN customer
ON rental.customer_id = customer.customer_id
HEY THIS IS IMPORTANT!
Use a FULL JOIN when you want records from both tables even when there isn’t a
match for the column you’re joining on (this may yield lots of records!)
PRO TIP:
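FULL JOIN likely isn’t something you’ll use every day, but can come in handy if you ever need to merge ALL
records from two tables. Note that MySQL itself does not support the FULL JOIN syntax; the same result is
usually produced by combining a LEFT JOIN and a RIGHT JOIN with UNION.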
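MySQL does not accept FULL JOIN directly, so a common workaround is to UNION a LEFT JOIN with a RIGHT JOIN. The sketch below shows that swapped-in technique under stated assumptions: MySQL 8.0 or later, and hypothetical inline tables `demo_customer` and `demo_rental` with invented rows, including one rental whose customer_id has no matching customer.

```sql
-- demo_customer and demo_rental are hypothetical inline tables (MySQL 8.0+ WITH clause).
WITH demo_customer AS (
    SELECT 1 AS customer_id, 'MARY' AS first_name
    UNION ALL SELECT 2, 'PATRICIA'      -- has no rentals
),
demo_rental AS (
    SELECT 101 AS rental_id, 1 AS customer_id
    UNION ALL SELECT 103, 9             -- customer 9 does not exist in demo_customer
)
-- Part 1: every customer, with rentals where they exist
SELECT
    demo_customer.customer_id,
    demo_customer.first_name,
    demo_rental.rental_id
FROM demo_customer
    LEFT JOIN demo_rental
        ON demo_customer.customer_id = demo_rental.customer_id
UNION
-- Part 2: every rental, with customers where they exist
SELECT
    demo_rental.customer_id,
    demo_customer.first_name,
    demo_rental.rental_id
FROM demo_customer
    RIGHT JOIN demo_rental
        ON demo_customer.customer_id = demo_rental.customer_id;
-- Three rows result: the matched pair (1, MARY, 101) appears once,
-- plus (2, PATRICIA, NULL) and (9, NULL, 103).
```

One caveat of this approach: UNION removes all duplicate rows, so genuinely repeated rows in the source data would also be collapsed.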
BRIDGE
UNION Returns all data from the FIRST table, with all data from the SECOND table appended to the end
• UNION tells SQL to combine the results of one SELECT statement with another
• Instead of a JOIN to another table, we write a second SELECT statement
Example:
-- yields one combined list of all names (first + last)
SELECT first_name, last_name FROM advisor
UNION
SELECT first_name, last_name FROM investor
HEY THIS IS IMPORTANT!
UNION will deduplicate records, and keep only distinct values in your result
set. If you want to keep duplicate records as well, use UNION ALL.
PRO TIP:
UNION is an easy one to run into errors with. Make sure that A) your two SELECT statements have the same
number of columns, B) columns are in the same order, and C) columns in each table have similar data types.
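The deduplicating behavior of UNION can be checked with a small sketch like this one. It assumes MySQL 8.0 or later; `demo_advisor` and `demo_investor` and the names in them are hypothetical, with one person deliberately present in both lists.

```sql
-- demo_advisor and demo_investor are hypothetical inline tables (MySQL 8.0+ WITH clause).
WITH demo_advisor AS (
    SELECT 'Ann' AS first_name, 'Lee' AS last_name
    UNION ALL SELECT 'Bob', 'Ray'
),
demo_investor AS (
    SELECT 'Ann' AS first_name, 'Lee' AS last_name   -- also listed as an advisor
    UNION ALL SELECT 'Cy', 'Dunn'
)
SELECT first_name, last_name FROM demo_advisor
UNION                                   -- swap in UNION ALL to keep the duplicate row
SELECT first_name, last_name FROM demo_investor;
-- UNION returns three rows (Ann Lee appears once); UNION ALL would return four.
```

Both SELECT statements list the same number of columns in the same order, which is the first thing to check when a UNION raises an error.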
THE SITUATION
You and your business partner were recently approached by another local business owner
who is interested in purchasing Maven Movies. He primarily owns restaurants and bars, so he
has lots of questions for you about your business and the rental business in general. His offer
seems very generous, so you are going to entertain his questions.
Best,
Martin Moneybags
1 My partner and I want to come by each of the stores in person and meet the managers. Please send over
the managers’ names at each store, with the full address of each property (street address, district, city, and
country please).
2 I would like to get a better understanding of all of the inventory that would come along with the business.
Please pull together a list of each inventory item you have stocked, including the store_id number, the
inventory_id, the name of the film, the film’s rating, its rental rate and replacement cost.
3 From the same list of films you just pulled, please roll that data up and provide a summary level overview of
your inventory. We would like to know how many inventory items you have with each rating at each store.
4 Similarly, we want to understand how diversified the inventory is in terms of replacement cost. We want to
see how big of a hit it would be if a certain category of film became unpopular at a certain store.
We would like to see the number of films, as well as the average replacement cost, and total replacement
cost, sliced by store and film category.
5 We want to make sure you folks have a good handle on who your customers are. Please provide a list
of all customer names, which store they go to, whether or not they are currently active, and their full
addresses – street address, city, and country.
6 We would like to understand how much your customers are spending with you, and also to know who your
most valuable customers are. Please pull together a list of customer names, their total lifetime rentals, and the
sum of all payments you have collected from them. It would be great to see this ordered on total lifetime value,
with the most valuable customers at the top of the list.
7 My partner and I would like to get to know your board of advisors and any current investors. Could you
please provide a list of advisor and investor names in one table? Could you please note whether they are an
investor or an advisor, and for the investors, it would be good to include which company they work with.
8 We're interested in how well you have covered the most-awarded actors. Of all the actors with three types of
awards, for what % of them do we carry a film? And how about for actors with two types of awards? Same
questions. Finally, how about actors with just one award?