
INTRO TO

MySQL FOR
DATA ANALYSIS
With Expert SQL Instructor John Pauler

*Copyright 2019, Maven Analytics, LLC


COURSE STRUCTURE

This is a project-based course for students looking for a practical, hands-on, and highly engaging approach to querying and analyzing databases with MySQL.

Additional resources include:

Downloadable Ebook to serve as a helpful reference when you’re offline or on the go

Quizzes & Homework Exercises to test and reinforce key concepts, with step-by-step solutions

Bonus Projects to test your abilities and apply the skills developed throughout the course

*Copyright 2019, Excel Maven & Maven Analytics, LLC


COURSE OUTLINE

1 SQL Intro & Setup
Explore the SQL landscape, download Community Server and Workbench, and create the project database

2 Database Fundamentals (Part 1)
Review basic database concepts, including table structure, columns and records, and primary vs. foreign keys

3 Analyzing Data from Single Tables
Use SQL queries to select data, apply filters and sorting rules, analyze segments of data, and evaluate conditional logic

[MID-COURSE PROJECT]

4 Database Fundamentals (Part 2)
Understand table relationships, dimension vs. fact tables, and the basics of normalization and cardinality

5 Analyzing Multiple Tables via Joins
Learn the different types of joins, when to use each of them, and how to analyze data across multiple tables

[FINAL PROJECT]


INTRODUCING THE COURSE PROJECT

THE SITUATION
You and your rich Uncle Jimmy just purchased Maven Movies, a brick and mortar DVD Rental business. Uncle Jimmy put up the money, and you’re in charge of the day-to-day operations.

THE BRIEF
As a new owner, you’ll need to learn everything you can about your business: your product inventory, your staff, your customer purchase behaviors, etc.
You have access to the entire Maven Movies SQL database, but the remaining employees are not able to give you much direction. You’ll need to analyze everything on your own.

THE OBJECTIVE
Use MySQL to:
• Access and explore the Maven Movies database
• Develop a firm grasp of the 16 database tables and how they relate to each other
• Analyze all aspects of the company’s data, including transactions, customers, staff, etc.


SETTING EXPECTATIONS

1 You’ll be learning MySQL, and practicing that using MySQL Workbench


• In your career, you may end up using other “flavors” of SQL (T-SQL, PL/SQL, PostgreSQL, etc.)
• Each flavor is very similar, with only minor syntax changes; the concepts you learn will apply universally

2 This course is designed to get you up & running with SQL for analysis
• The goal is to provide a foundational understanding of SQL and relational databases; some concepts may be
covered at a high level, and we won’t cover everything that SQL can do in the scope of this course

3 The course focuses on understanding, extracting, and analyzing data


• This course is ideal for people who want a firm understanding of analyzing data in relational databases
• We start with the basics, so feel free to skip ahead as you see fit if you have had some SQL exposure in the past

4 We will NOT cover building & maintenance of databases in this course


• This course is more oriented toward analysis, and will not get into typical database administrator tasks
• We will NOT cover database or table creation, altering or dropping tables, or managing user permissions



INTRODUCING SQL



WHY LEARN SQL?

SQL is the standard language for relational database management

You can load, query and analyze massive data sets using SQL

Learning SQL is fun, intuitive and surprisingly easy

Companies always need employees who know SQL

Becoming a database + SQL expert has never-ending practical benefits



BRIEF HISTORY OF SQL

• SQL stands for Structured Query Language, and was designed in the early 1970s at IBM to manipulate and retrieve data stored in a relational database management system (RDBMS)

• There is a universal standard for SQL set by the International Organization for Standardization (ISO) and the American National Standards Institute (ANSI), with updates released every ~3-5 years

• Vendors are constantly adding new features on top of the standards, which creates different “flavors” of SQL (MySQL, PostgreSQL, SQLite, etc.)


COMMON FLAVORS OF SQL

HEY THIS IS IMPORTANT!


These flavors of SQL are much more similar than they are different – all
are based on the same universal standard, with slight variations in syntax.



POPULAR MySQL EDITORS

Mac, Windows, Linux Web App (works on any OS) Windows only Mac only

HEY THIS IS IMPORTANT!


We’ll be practicing MySQL using MySQL Workbench, but rather than thinking “I learned MySQL” or “I can use Workbench”,
you should think “I’m learning SQL”. Period. You’ll be able to use the concepts and syntax you learn here universally.



DOWNLOAD & SETUP



MySQL DOWNLOAD & SETUP – OVERVIEW

Step 1 Download Community Server
This allows SQL to run on your machine

Step 2 Download MySQL Workbench
This is the program you’ll use to write and run SQL queries (it’s intuitive, and works across operating systems)

Step 3 Connect Workbench to Server
We’ll get you connected to the server so you can use Workbench to start running your own SQL queries

Step 4 Review Workbench Interface
We’ll take a quick tour of the Workbench interface to get you familiar with the layout and key components

Step 5 Create the Database
We’ll run the SQL code to build the 16-table database which we’ll be exploring throughout the course (this part is easy!)


STEP 1: COMMUNITY SERVER (MAC)



MySQL COMMUNITY SERVER – MAC DOWNLOAD GUIDE

1 Go to https://dev.mysql.com/downloads and download MySQL Community Server

2 Select the MacOS operating system, and download the DMG Archive version
• Note: you’ll likely see a later version than the one shown (just download the latest)

3 No need to Login or Sign Up, just click “No thanks, just start my download”

4 Find the install file in your downloads, then double click to run the installer package

5 Click through each install step, leaving defaults unless you need customized settings
• Note: Make sure to store your root password somewhere, you’ll need this later!



STEP 1: COMMUNITY SERVER (PC)



MySQL COMMUNITY SERVER – PC DOWNLOAD GUIDE

1 Go to https://dev.mysql.com/downloads and download MySQL Community Server

2 Select the Microsoft Windows operating system, and the Installer MSI download
• Note: On the download page you may see two versions: select mysql-installer-web-community if you are connected
to the internet, and keep in mind that you may see a later version than the one shown (just download the latest)

3 No need to Login or Sign Up, just click “No thanks, just start my download”

4 Find the install file in your downloads, then double click to run the installer package

5 Click through each install step, leaving defaults unless you need customized settings
• Note: Make sure to store your root password somewhere, you’ll need this later!



STEP 2: MySQL WORKBENCH (MAC)



MySQL WORKBENCH – MAC DOWNLOAD GUIDE

1 Go to https://dev.mysql.com/downloads/workbench, scroll down to Generally Available (GA) Releases, and select the MacOS operating system

2 We’ll be using version 8.0.16 for this course, so you can either click “Looking for previous
GA versions?” to search for the same one, or simply download the latest available

3 No need to Login or Sign Up, just click “No thanks, just start my download”

4 Find the install file in your downloads, click the MySQL Workbench logo (with the dolphin)
and drag it into your Applications folder

5 Look for MySQL Workbench in your list of applications, double click to launch, then proceed to Step 3: Connecting to the server



STEP 2: MySQL WORKBENCH (PC)



MySQL WORKBENCH – PC DOWNLOAD GUIDE

1 Go to https://dev.mysql.com/downloads/workbench, scroll down to Generally Available (GA) Releases, and select the Microsoft Windows operating system

2 We’ll be using version 8.0.13 for this course, so you can either click “Looking for previous
GA versions?” to search for the same one, or simply download the latest available

3 No need to Login or Sign Up, just click “No thanks, just start my download”

4 Find the install file in your downloads, double click to run the installation process, and
stick with default settings unless you need a custom configuration

5 Look for MySQL Workbench in your list of programs, double click to launch, then proceed to Step 3: Connecting to the server
• Note: You may see a warning if you aren’t on Windows 10+, but most older systems (e.g. Windows 7) should be compatible



STEP 3: CONNECTING TO THE SERVER



CONNECTING TO THE SERVER

1 After launching Workbench, check the MySQL Connections section on the welcome page
• If you see a connection already, right-click to Edit Connection, otherwise click the plus sign (+) to add a new one

2 Name the connection “MavenMovies”, confirm that the Username is “root”, and click OK

3 Once you see the MavenMovies connection on your welcome screen, simply click the tile
and enter your root password to complete the connection



STEP 4: MySQL WORKBENCH INTERFACE



MySQL WORKBENCH INTERFACE (MAC VS. PC)

Mac interface PC interface

HEY THIS IS IMPORTANT!


Workbench looks slightly different on Mac vs. PC, but everything you need is found in the same place.
The course is recorded on a Mac, but you should have no problem keeping up on a PC.


QUICK TOUR: THE WORKBENCH INTERFACE

Query Editor Window
This is where you write and run your code

Result Grid
After running your SQL queries, your results appear here

Schemas Tab
Here you can view tables and views in your database

Action Output
This is a summary of actions taken by the server (TIP: the ‘Response’ column is great for troubleshooting errors!)


STEP 5: CREATING THE DATABASE



CREATING THE DATABASE

1 In MySQL Workbench, click File from the top menu, then select Open SQL Script

2 Navigate to the create_mavenmovies.sql file provided in the course resources


• This code will automatically generate the entire database that we’ll be exploring throughout the course, modeling a
real-world DVD rental business

3 Click anywhere in the SQL Query Editor window (without highlighting any code), and click
the lightning bolt icon to run all of the code and create the database

4 After running the code, confirm the following:


1. You see a list of results in the Action Output window, with green check marks and no errors in the Response column
2. When you refresh the Schemas list, you should see a new database called mavenmovies, containing 16 tables



DATABASE FUNDAMENTALS (PART 1)



A DATABASE CAN CONTAIN MANY RELATED TABLES

In this case our database includes 16 related tables, containing information about:
• Customers (Name, Address, etc.)
• Business (Staff, Rentals, etc.)
• Inventory (Films, Categories, etc.)

We’ll start by using MySQL to explore individual tables, then discuss table relationships and multi-table joins later in the course


EACH TABLE CONTAINS ROWS & COLUMNS

Tables contain information organized into columns (or fields) and rows (or records)

In this case, our Rental table contains 7 columns and 10 rows:


• Each column contains an attribute related to our film rentals (rental/return date, customer ID, etc.)
• Each row corresponds to one specific rental (which film was rented, when, who rented it, etc.)



TABLES CAN CONTAIN PRIMARY & FOREIGN KEYS
The rental_id column is known as a Primary Key, which serves as a unique identifier for each record in the rental table

inventory_id, customer_id, and staff_id are Foreign Keys, which reference primary keys from other tables

HEY THIS IS IMPORTANT!


Primary Keys cannot contain duplicates, and cannot be NULL; Foreign Keys can repeat, and can be NULL



USE MAVENMOVIES
HEY THIS IS IMPORTANT!
The USE statement identifies the schema you will be selecting data from in Workbench.
Example: USE mavenmovies;
If you encounter an error that says ‘no database selected’, you’ll need to select your database with a USE statement.


ANALYZING SINGLE TABLES



THE “BIG 6” ELEMENTS OF A SQL SELECT STATEMENT
START OF STATEMENT

SELECT columnName
Identifies the column(s) you want your query to select for your results

FROM tableName
Identifies the table(s) your query will pull data from

WHERE logicalCondition
(Optional) Specifies record-filtering criteria for filtering your results

GROUP BY columnName
(Optional) Specifies how to group the data in your results

HAVING logicalCondition
(Optional) Specifies group-filtering criteria for filtering your results

ORDER BY columnName
(Optional) Specifies the order in which your query results are displayed

END OF STATEMENT
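
To see how the six clauses fit together in one statement, here is a minimal sketch. It is not taken from the course files: it uses Python's built-in sqlite3 module and a tiny made-up payment table so it can run without a MySQL server, and the rows and the alias payments_over_1 are assumptions chosen for illustration. The SQL inside the string is the part that carries over to Workbench.

```python
import sqlite3

# Stand-in data: a small, made-up "payment" table (not the real mavenmovies data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO payment (customer_id, amount) VALUES (?, ?)",
    [(1, 2.99), (1, 5.99), (1, 0.99), (2, 4.99), (2, 6.99), (3, 3.99)],
)

# All six clauses, written in the order SQL requires.
query = """
SELECT customer_id, COUNT(*) AS payments_over_1   -- columns to return
FROM payment                                      -- table to read from
WHERE amount > 1.00                               -- filter rows before grouping
GROUP BY customer_id                              -- one result row per customer
HAVING COUNT(*) >= 2                              -- filter groups after grouping
ORDER BY customer_id                              -- sort the final result
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customer 3 has only one payment over 1.00, so the HAVING filter drops that group, while the WHERE filter has already removed the 0.99 payment before any counting happens.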


THE SELECT STATEMENT

SELECT Identifies the column(s) you want your query to select for your results

SELECT columnName, otherColumnName

• SELECT: This lets the SQL server know you are about to specify the column(s) to select for your query result
• columnName, otherColumnName: This list is what SQL will select into your query result. You can list a single column, or specify multiple columns separated by a comma.

HEY THIS IS IMPORTANT!
SELECT is one of the two things you will need to include in every SQL query you write (FROM is the other)

PRO TIP:
SELECT tells the SQL server to retrieve all rows for the columns specified. Don’t worry about filtering for now (we’ll do that later by adding a WHERE clause)


THE FROM CLAUSE

FROM Identifies the table(s) your query will select data from

FROM tableName

• FROM: This lets the SQL server know you are about to specify the table(s) to select from for your query result
• tableName: This is the table SQL will select from

HEY THIS IS IMPORTANT!
FROM is one of the two things you will need to include in every SQL query you write (SELECT is the other)

PRO TIP:
Unlike listing multiple columns in your SELECT, listing multiple tables separated by commas in your FROM does not relate them to each other; on its own it pairs every row of one table with every row of the other. To combine tables properly, you’ll need to use a JOIN (we’ll cover how to do this later).


SELECT * FROM

• SELECT * FROM tells SQL to select all columns from the specified table

• Running SELECT * FROM {table} without using a WHERE clause will return the entire table (all columns, all rows)

PRO TIP:
SELECT * FROM is a great way to quickly see what data a table contains


SELECTING COLUMNS

• You can return one or more specific columns in your results by naming them in your SELECT statement

• To select data from multiple columns, separate them with commas

PRO TIP:
Use line breaks and indentation to make your code more human-readable.
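
The difference between selecting everything and selecting named columns can be sketched as follows. This is an illustrative example rather than course material: it runs on Python's built-in sqlite3 module with a two-row stand-in rental table, and the column list and values are assumptions.

```python
import sqlite3

# Stand-in "rental" table with made-up rows (the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, customer_id INTEGER, rental_date TEXT)")
conn.executemany(
    "INSERT INTO rental VALUES (?, ?, ?)",
    [(1, 130, "2005-05-24"), (2, 459, "2005-05-24")],
)

# SELECT * returns every column of every row.
all_columns = conn.execute("SELECT * FROM rental").fetchall()

# Naming columns returns only those columns; one column per line keeps it readable.
some_columns = conn.execute("""
    SELECT
        customer_id,
        rental_date
    FROM rental
""").fetchall()

print(all_columns)
print(some_columns)
```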


ERROR TYPE: MISSING/EXTRA COMMA

ERROR MESSAGE
Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM rental' at line 4

WHAT IT MEANS
Error Code 1064 means you’re not following SQL syntax. The most common culprits are an extra comma after the final column of your SELECT statement, or a missing comma between two columns.

HOW TO DEBUG LIKE A PRO
Look for the line number at the end of your Error Response, in this case line 4. The Response also tells you where your syntax error is near, in this case ‘FROM rental’. Find the row, and that part of your code. These are your best clues.

TEST YOUR SKILLS: QUERYING TABLES

YOUR ASSIGNMENT:
“I’m going to send an email letting our customers know there has been a management change. Could you pull a list of the first name, last name, and email of each of our customers?”

SELECT DISTINCT

• When you use SELECT DISTINCT, your result set will return just the distinct (or unique) values in those columns

• If you include multiple columns, your result set will return all distinct combinations of values across those columns

PRO TIP:
SELECT DISTINCT is a great way to quickly see all values in a column
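
A small sketch of the two behaviors described above, single-column and multi-column DISTINCT. It is an illustration only: it uses Python's built-in sqlite3 module instead of MySQL, and the film table, its columns, and its four rows are made up for the example.

```python
import sqlite3

# Stand-in "film" table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, rating TEXT, rental_duration INTEGER)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?, ?)",
    [("Film A", "PG", 3), ("Film B", "PG", 3), ("Film C", "G", 5), ("Film D", "PG", 5)],
)

# One column: each rating appears once, however many films share it.
one_column = conn.execute("SELECT DISTINCT rating FROM film").fetchall()

# Two columns: each (rating, rental_duration) combination appears once.
two_columns = conn.execute("SELECT DISTINCT rating, rental_duration FROM film").fetchall()

print(sorted(one_column))
print(sorted(two_columns))
```

Four films collapse to two distinct ratings, but to three distinct rating and duration pairs, because PG films exist with both 3-day and 5-day durations.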


TEST YOUR SKILLS: SELECT DISTINCT

YOUR ASSIGNMENT:
“My understanding is that we have titles that we rent for durations of 3, 5, or 7 days. Could you pull the records of our films and see if there are any other rental durations?”

THE WHERE CLAUSE

WHERE (Optional) Specifies criteria for filtering the records of your result set

WHERE logicalCondition

• WHERE: This lets the SQL server know you are about to specify a logical condition for which rows to include
• logicalCondition: This is where you prescribe the logical conditions used to filter your result set

Examples:
• WHERE category = 'Sci-Fi'
• WHERE amount > 5.99
• WHERE rental_date BETWEEN '2006-01-01' AND '2006-06-01'

HEY THIS IS IMPORTANT!
WHERE is an optional clause, and always comes after your FROM clause and before any GROUP BY, HAVING, or ORDER BY clauses (if included)

PRO TIP:
Specify multiple filter criteria using AND/OR statements within your WHERE clause
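
The comparison and BETWEEN filters above can be tried on a toy table. The sketch below is an assumption-laden illustration, not the course's own code: it uses Python's built-in sqlite3 module, a made-up payment table, and a hypothetical helper named ids. Dates are stored as 'YYYY-MM-DD' text here, which is enough to show that BETWEEN includes both endpoints.

```python
import sqlite3

# Stand-in "payment" table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, amount REAL, payment_date TEXT)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [
        (1, 4.99, "2005-12-30"),
        (2, 6.99, "2006-01-15"),
        (3, 5.00, "2006-06-01"),
        (4, 9.99, "2006-07-04"),
    ],
)

def ids(sql):
    """Run a query and return the payment_id values it selects, lowest first."""
    return sorted(row[0] for row in conn.execute(sql))

# Comparison operator: only payments strictly greater than 5.99.
over_599 = ids("SELECT payment_id FROM payment WHERE amount > 5.99")

# BETWEEN is inclusive, so the payment dated exactly 2006-06-01 is kept.
in_range = ids(
    "SELECT payment_id FROM payment "
    "WHERE payment_date BETWEEN '2006-01-01' AND '2006-06-01'"
)

print(over_599, in_range)
```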


WHERE OPERATORS

• WHERE clauses can filter records using any of these logical operators:

Operator    What it Means
=           Equals
<>          Does NOT Equal
>           Greater Than
<           Less Than
>=          Greater Than Or Equal To
<=          Less Than Or Equal To
BETWEEN     A Range Between Two Values
LIKE        Matching a Pattern Like This
IN()        Equals One of These Values


WHERE EXAMPLES

1. Pulling records before or after a date, or between two dates

2. Limiting your result set to include only transactions above a certain payment amount

3. Filtering your result set to just records related to certain customers, or to certain lines of business


ERROR TYPE: WHERE BEFORE FROM

ERROR MESSAGE
Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM payment' at line 7

WHAT IT MEANS
Again, the Error Code 1064 means you’re not following SQL syntax. In this case, we’ve tried to apply the WHERE clause before the FROM, forgetting that WHERE always comes after FROM.

HOW TO DEBUG LIKE A PRO
Look for the line number at the end of your Error Response, in this case line 7. The Response also tells you where your syntax error is near, in this case ‘FROM payment’. Find the row, and that part of your code. These are your best clues.

TEST YOUR SKILLS: WHERE CLAUSES

YOUR ASSIGNMENT:
“I’d like to look at payment records for our long-term customers to learn about their purchase patterns. Could you pull all payments from our first 100 customers (based on customer ID)?”

WHERE & AND

• You can include multiple logical conditions in your WHERE clause using an AND statement

• Use AND to return records which satisfy all criteria

PRO TIP:
Add and remove filtering logic and compare results to quickly master this


ERROR TYPE: MISSING QUOTATION MARK

ERROR MESSAGE
Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''2006-01' at line 9

WHAT IT MEANS
Error Code 1064 means you’re not following SQL syntax. In this case, a very common example, we have forgotten to put a quotation mark after the date.

HOW TO DEBUG LIKE A PRO
Look for the line number at the end of your Error Response, in this case line 9. The Response also tells you where your syntax error is near, in this case the date. Find the row, and that part of your code. These are your best clues.

TEST YOUR SKILLS: WHERE & AND

YOUR ASSIGNMENT:
“The payment data you gave me on our first 100 customers was great – thank you! Now I’d love to see just payments over $5 for those same customers, since January 1, 2006.”

TEST YOUR SKILLS: WHERE & AND (Solution Query)

ALTERNATIVE OPTIONS:
You can add the three filters in any order without changing the outcome, and can define the customer_id filter in a number of different ways. As long as you see the 4 records shown on the previous slide, you’re good to go!

WHERE & OR

• You can include multiple logical conditions in your WHERE clause using an OR statement

• Use OR to return records which satisfy any criteria

PRO TIP:
You can use AND and OR together in the same WHERE clause. This is powerful!
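
When AND and OR share a WHERE clause, SQL evaluates AND before OR, so parentheses decide which conditions belong together. The sketch below illustrates this under stated assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, and the payment rows and the helper name ids are invented for the example.

```python
import sqlite3

# Stand-in "payment" table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [(1, 42, 1.99), (2, 42, 7.99), (3, 53, 0.99), (4, 99, 8.99)],
)

def ids(sql):
    """Run a query and return the payment_id values it selects, lowest first."""
    return sorted(row[0] for row in conn.execute(sql))

# No parentheses: read as customer 42 (any amount) OR (customer 53 AND amount > 5).
without_parens = ids(
    "SELECT payment_id FROM payment "
    "WHERE customer_id = 42 OR customer_id = 53 AND amount > 5"
)

# Parentheses: (customer 42 OR customer 53), and the amount filter applies to both.
with_parens = ids(
    "SELECT payment_id FROM payment "
    "WHERE (customer_id = 42 OR customer_id = 53) AND amount > 5"
)

print(without_parens, with_parens)
```

The first query keeps customer 42's 1.99 payment because the amount filter only attaches to customer 53; the second keeps just the 7.99 payment.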


TEST YOUR SKILLS: WHERE & OR

YOUR ASSIGNMENT:
“The data you shared previously on customers 42, 53, 60, and 75 was good to see. Now, could you please write a query to pull all payments from those specific customers, along with payments over $5, from any customer?”

TEST YOUR SKILLS: WHERE & OR (Solution Query)

ALTERNATIVE OPTIONS:
The logic in this WHERE can be written in any order.

WHERE & IN

• If you find yourself writing multiple OR conditions that reference different values in the same column, you can use IN() to save some time

• The two queries shown at the right produce identical results, but the second version is easier to read (and write!)

PRO TIP:
Using IN saves time, and will make it easier to read your query later
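
The equivalence between a chain of OR conditions and a single IN() can be checked directly. This is a minimal sketch with assumptions: Python's built-in sqlite3 module stands in for MySQL, and the payment table holds five made-up rows, one per customer.

```python
import sqlite3

# Stand-in "payment" table with made-up rows, one per customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 42), (2, 53), (3, 60), (4, 75), (5, 99)],
)

# Version 1: one OR per customer.
with_or = conn.execute("""
    SELECT payment_id
    FROM payment
    WHERE customer_id = 42
       OR customer_id = 53
       OR customer_id = 60
       OR customer_id = 75
    ORDER BY payment_id
""").fetchall()

# Version 2: the same filter written with IN().
with_in = conn.execute("""
    SELECT payment_id
    FROM payment
    WHERE customer_id IN (42, 53, 60, 75)
    ORDER BY payment_id
""").fetchall()

print(with_or == with_in, with_in)
```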


ERROR TYPE: UNKNOWN COLUMN

ERROR MESSAGE
Error Code: 1054. Unknown column 'cusotmer_id' in 'where clause'

WHAT IT MEANS
Error Code 1054 means you’re telling the SQL server to use a column that it cannot find. The most common reasons:
• You’ve misspelled the column name
• That column is not in this table

HOW TO DEBUG LIKE A PRO
Read the column name provided in the Response. If it looks like a misspelling, find the misspelling in your code and fix it. If the column appears spelled correctly, check to make sure that column exists in the table you are using.

THE LIKE OPERATOR

LIKE Allows you to use pattern matching in your logical operators (instead of exact matching)

LIKE '%patternToLookFor%'

• LIKE: This tells SQL that you’re prescribing a pattern instead of an exact number or string of characters
• '%patternToLookFor%': This is where you define the pattern. The “%” before and after the text is a type of wildcard (along with “_”)

Examples:
• WHERE name LIKE 'Denise%' -- records where name starts with ‘Denise’, with any number of characters after
• WHERE description LIKE '%fancy%' -- records that contain ‘fancy’, with any characters before OR after
• WHERE name LIKE '%Johnson' -- records that end with ‘Johnson’, with any number of characters before
• WHERE first_name LIKE '_erry' -- records that end with ‘erry’, with exactly one character before (e.g. Terry, Jerry)

HEY THIS IS IMPORTANT!
Capitalization can matter. Whether the server considers the capitalization you provided depends on the collation of the column: MySQL’s default collations are case-insensitive, while binary or case-sensitive collations treat ‘denise’ and ‘Denise’ as different values.

PRO TIP:
NOT LIKE can also be used to filter out (rather than keep) records where values match a pattern you provide, and uses the same wildcards and capitalization rules as LIKE
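
The two wildcards behave differently: “%” stands for any number of characters, while “_” stands for exactly one. A small sketch, assuming a made-up customer table with five first names and using Python's built-in sqlite3 module in place of MySQL (the helper name names is hypothetical):

```python
import sqlite3

# Stand-in "customer" table with made-up first names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?)",
    [("Terry",), ("Jerry",), ("Sherry",), ("Denise",), ("Denis",)],
)

def names(pattern_clause):
    """Return the first names matching the given WHERE condition, alphabetically."""
    sql = "SELECT first_name FROM customer WHERE " + pattern_clause
    return sorted(row[0] for row in conn.execute(sql))

starts_with = names("first_name LIKE 'Denise%'")    # 'Denise' plus anything after
one_char_then = names("first_name LIKE '_erry'")    # exactly one character, then 'erry'
ends_with = names("first_name LIKE '%erry'")        # anything, then 'erry'
excluded = names("first_name NOT LIKE '%erry'")     # everything that does not end in 'erry'

print(starts_with, one_char_then, ends_with, excluded)
```

Sherry matches '%erry' but not '_erry', because two characters come before 'erry' rather than one.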


WILDCARD EXAMPLES

Syntax                     How SQL Evaluates Logic
column = 'value'           TRUE if the value in the column matches your value exactly (nothing before or after)
column LIKE 'pattern%'     TRUE if the first characters in the column match your pattern, even if they are followed by others
column LIKE '%pattern'     TRUE if the last characters in the column match your pattern, even if they are preceded by others
column LIKE '%pattern%'    TRUE if any characters in the column match your pattern, even if there are extra characters before or after
column LIKE 'pattern_'     TRUE if the first characters in the column match your pattern and are followed by exactly 1 other character
column LIKE '_pattern'     TRUE if the last characters in the column match your pattern and are preceded by exactly 1 other character
column LIKE '_pattern_'    TRUE if the column contains your pattern with exactly 1 character before it and exactly 1 character after it


TEST YOUR SKILLS: LIKE W/ WILDCARDS

YOUR ASSIGNMENT:
“We need to understand the special features in our films. Could you pull a list of films which include a Behind the Scenes special feature?”

THE GROUP BY CLAUSE

GROUP BY (Optional) Specifies how to group the data in your results

GROUP BY columnName, otherColumnName

• GROUP BY: This lets the SQL server know you are about to specify how you want your result set to be grouped
• columnName, otherColumnName: This is where you prescribe which column(s) the server will use to group your data. You can use multiple columns for grouping if needed.

Examples:
• GROUP BY category
• GROUP BY store_location
• GROUP BY store_location, category

HEY THIS IS IMPORTANT!
GROUP BY is optional, and comes after any WHERE, and before any HAVING or ORDER BY clauses in your query

PRO TIP:
GROUP BY is great for comparing different segments of your data (similar to Pivot Tables in Excel)


GROUP AGGREGATION

• GROUP BY allows us to perform segment analysis on our data at various levels of granularity

• Combine GROUP BY with aggregate functions (like COUNT or SUM) to specify how values are summarized for each group

PRO TIP:
Aggregate functions serve the same purpose as summarization modes in a Pivot Table
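
Pairing GROUP BY with an aggregate function can be sketched in a few lines. The example is illustrative only: it uses Python's built-in sqlite3 module instead of MySQL, and the film table, its ratings, and the alias films are assumptions.

```python
import sqlite3

# Stand-in "film" table with made-up ratings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, rating TEXT)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("A", "PG"), ("B", "PG"), ("C", "G"), ("D", "R"), ("E", "R"), ("F", "R")],
)

# One output row per rating; COUNT(*) summarizes the films inside each group.
rows = conn.execute("""
    SELECT
        rating,
        COUNT(*) AS films
    FROM film
    GROUP BY rating
    ORDER BY rating
""").fetchall()

print(rows)
```

Here rating plays the role of the dimension and COUNT(*) the metric, which mirrors row labels and values in a Pivot Table.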


ERROR TYPE: NON-AGGREGATED COLUMN

ERROR MESSAGE
Error Code: 1140. In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column

WHAT IT MEANS
Error Code 1140 means you’re using an aggregate function, but have also included some non-aggregated column (in this case ratings) and haven’t included it in a GROUP BY

HOW TO DEBUG LIKE A PRO
Look at the expression # provided in the error response to quickly identify the column that is causing the error. Add that column to a GROUP BY.
Think “Dimensions” vs “Metrics”. Make sure all Dimensions are in your GROUP BY.

COMMENTS & ALIASES MySQL QUERY IN ACTION:

• Comments can be added to your These are


comments.
code using “--” or “/*”, which tells SQL skips
these lines
SQL to ignore those lines This is an example of an “alias”

• Comments can apply to entire lines,


portions of a line, or multiple lines

• Aliases allow you to assign a


QUERY RESULTS:
custom name to a field in your
result set, using an AS statement

PRO TIP:
Commenting make queries much more
human readable -- use them often! Assigning an alias does NOT change the resulting values



TEST YOUR SKILLS: GROUP BY

YOUR ASSIGNMENT:
“I need to get a quick overview of how long our movies tend to be rented out for. Could you please pull a count of titles sliced by rental duration?”

BONUS: USE AN ALIAS TO MATCH THE COLUMN NAMES

Solution Query
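
One possible solution, sketched under assumptions: the film table and its title and rental_duration columns are modeled on the names in the assignment, the six rows are made up, and Python's built-in sqlite3 module stands in for the MySQL server. The alias name is hypothetical too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, rental_duration INTEGER)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("FILM A", 3), ("FILM B", 3), ("FILM C", 5),
     ("FILM D", 7), ("FILM E", 7), ("FILM F", 7)],
)

# One output row per rental_duration value, with a count of titles in each
rows = conn.execute("""
    SELECT
        rental_duration,
        COUNT(title) AS films_with_this_rental_duration
    FROM film
    GROUP BY rental_duration
    ORDER BY rental_duration
""").fetchall()
print(rows)
```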
MULTIPLE GROUP BY

• You can use GROUP BY with multiple columns at once, by listing the columns and separating them with a comma

• This allows you to create groups and sub-groups in your result set (just like specifying multiple row or column labels in a PivotTable!)

PRO TIP:
Group by customer segment, and add a
time dimension to create cohort trends
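
A minimal sketch of grouping on two columns at once. The sale table, its rows, and the amount column are made up for illustration (store_location and category echo the earlier GROUP BY examples), and Python's built-in sqlite3 module stands in for the MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (store_location TEXT, category TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [("Boston", "toys", 10), ("Boston", "toys", 5),
     ("Boston", "games", 7), ("Denver", "toys", 3)],
)

# Each distinct (store_location, category) pair becomes one row in the result
rows = conn.execute("""
    SELECT
        store_location,
        category,
        COUNT(*) AS sales,
        SUM(amount) AS revenue
    FROM sale
    GROUP BY store_location, category
    ORDER BY store_location, category
""").fetchall()
for row in rows:
    print(row)
```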



AGGREGATE FUNCTIONS

• The powerful functions below can all be used with GROUP BY to provide group-level summaries

Function           Purpose                     NULL handling
COUNT()            Count of Records            Skips NULL, except COUNT(*)
COUNT(DISTINCT)    Count of Distinct Values    Skips NULL values
MIN()              Finds the Smallest Value    Skips NULL values
MAX()              Finds the Largest Value     Skips NULL values
AVG()              Average of All Values       Skips NULL values
SUM()              Sum of All Values           Skips NULL values (same result as treating them as zero, except that SUM returns NULL when every value is NULL)

Apply aliases to make the output easier to interpret
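
To see the NULL handling in action, here is a small sketch that runs every function above on one column containing a NULL. The payment table and its five rows are made up, and Python's built-in sqlite3 module stands in for the MySQL server (NULL shows up as None on the Python side).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (amount INTEGER)")
conn.executemany(
    "INSERT INTO payment VALUES (?)",
    [(10,), (None,), (30,), (30,), (10,)],
)

summary = conn.execute("""
    SELECT
        COUNT(*) AS all_records,                  -- counts the NULL row too
        COUNT(amount) AS non_null_amounts,        -- skips the NULL
        COUNT(DISTINCT amount) AS distinct_amounts,
        MIN(amount) AS smallest,
        MAX(amount) AS largest,
        AVG(amount) AS average,                   -- 80 / 4, not 80 / 5
        SUM(amount) AS total
    FROM payment
""").fetchone()
print(summary)

# When every value in the group is NULL, SUM has nothing to add up
all_null_sum = conn.execute(
    "SELECT SUM(amount) FROM payment WHERE amount IS NULL"
).fetchone()[0]
print(all_null_sum)
```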



TEST YOUR SKILLS: AGGREGATE FUNCTIONS

YOUR ASSIGNMENT:
“I’m wondering if we charge more for a rental when the replacement cost is higher. Can you help me pull a count of films, along with the average, min, and max rental rate, grouped by replacement cost?”

BONUS: USE ALIASES TO MATCH THE COLUMN NAMES

THE HAVING CLAUSE

HAVING   (Optional) Specifies group-filtering criteria for filtering your results

HAVING logicalCondition

This is where you specify the filtering logic that you want applied to your group-level aggregated metrics.

Examples:
• HAVING COUNT(*) > 1
• HAVING SUM(payment) > 10
• HAVING MIN(rental_date) < '2005-05-25'

HEY THIS IS IMPORTANT!
HAVING is designed to filter groups, so use it together with GROUP BY. If you are trying to filter your results, but aren’t grouping with GROUP BY, then you should use a WHERE clause instead.

PRO TIP:
HAVING is a great way to limit your results to the most important groups for your business. For example, you can limit results to customers with total payments above a certain amount, or to your most-rented films.


HAVING

• HAVING is an optional clause you can use with GROUP BY to limit your result set to groups which satisfy certain logical criteria

• HAVING comes after the GROUP BY clause and before ORDER BY, if you are sorting your results (we’ll cover this next)


TEST YOUR SKILLS: HAVING

YOUR ASSIGNMENT:
“I’d like to talk to customers that have not rented much from us to understand if there is something we could be doing better. Could you pull a list of customer_ids with less than 15 rentals all-time?”

Solution Query
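
One possible solution, sketched under assumptions: the rental table with rental_id and customer_id columns is modeled on the names in the assignment, the rental counts are made up, and Python's built-in sqlite3 module stands in for the MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, customer_id INTEGER)")

# Made-up data: customer 1 has 20 rentals, customer 2 has 14, customer 3 has 15
rental_counts = {1: 20, 2: 14, 3: 15}
conn.executemany(
    "INSERT INTO rental (customer_id) VALUES (?)",
    [(customer_id,) for customer_id, n in rental_counts.items() for _ in range(n)],
)

# GROUP BY builds one group per customer; HAVING then filters those groups
rows = conn.execute("""
    SELECT
        customer_id,
        COUNT(rental_id) AS total_rentals
    FROM rental
    GROUP BY customer_id
    HAVING COUNT(rental_id) < 15
    ORDER BY customer_id
""").fetchall()
print(rows)
```

Note that a customer with no rentals at all has no rows in this table, so a query like this one would not list them.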
THE ORDER BY CLAUSE

ORDER BY   (Optional) Specifies the order in which your query results are displayed

ORDER BY columnName, otherColumnName

This lets the SQL server know you are about to specify how you want to order (or sort) the records returned in your result set

This is where you prescribe which column(s) to use to sort your data (default is ascending order)

Examples:
• ORDER BY amount
• ORDER BY created_date DESC
• ORDER BY created_date DESC, amount DESC

HEY THIS IS IMPORTANT!
ORDER BY is optional, and is always the last clause of “The Big 6”. It comes after any WHERE, GROUP BY, or HAVING clauses (if included)

PRO TIP:
When you use ORDER BY with multiple criteria (see the third example above), the server will prioritize sorting the data based on the first column specified, then use additional columns as tiebreakers


ORDER BY

• SQL lets you sort your results by including ORDER BY after your FROM clause (and after your WHERE, GROUP BY, and HAVING clauses, if applicable)

• ORDER BY defaults to ascending order (low to high), but can be modified to sort in descending order by adding DESC after the column reference
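
A minimal sketch of a two-column sort with DESC, where the second column only breaks ties in the first. The payment table and its rows are made up (created_date and amount echo the ORDER BY examples), and Python's built-in sqlite3 module stands in for the MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (created_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [("2005-05-25", 2), ("2005-05-26", 5), ("2005-05-26", 9), ("2005-05-24", 1)],
)

# Newest dates first; within the same date, larger amounts first
rows = conn.execute("""
    SELECT created_date, amount
    FROM payment
    ORDER BY created_date DESC, amount DESC
""").fetchall()
for row in rows:
    print(row)
```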



TEST YOUR SKILLS: ORDER BY

YOUR ASSIGNMENT:
“I’d like to see if our longest films also tend to be our most expensive rentals. Could you pull me a list of all film titles along with their lengths and rental rates, and sort them from longest to shortest?”

RECAP: THE “BIG 6” ELEMENTS OF A SQL SELECT STATEMENT

START OF STATEMENT

Clause     Syntax                     What it does
SELECT     SELECT columnName          Identifies the column(s) you want your query to select for your results
FROM       FROM tableName             Identifies the table(s) your query will pull data from
WHERE      WHERE logicalCondition     (Optional) Specifies record-filtering criteria for filtering your results
GROUP BY   GROUP BY columnName        (Optional) Specifies how to group the data in your results
HAVING     HAVING logicalCondition    (Optional) Specifies group-filtering criteria for filtering your results
ORDER BY   ORDER BY columnName        (Optional) Specifies the order in which your query results are displayed

END OF STATEMENT


THE CASE STATEMENT

CASE   Allows you to process a series of IF/THEN logical operators in a specific order

CASE WHEN logic1 THEN value1 WHEN logic2 THEN value2 ELSE value3 END

Every CASE statement begins with CASE, ends with END, and contains at least one WHEN/THEN pair.

This is where you define your logical operators and the values you want assigned when met.

Example:
CASE
WHEN category IN ('horror', 'suspense') THEN 'too scary'
WHEN length > 90 THEN 'too long'
ELSE 'we should see it'
END

HEY THIS IS IMPORTANT!
CASE statements execute in the order they appear; if a record satisfies more than one logical condition, the record will be assigned the value from the first matching THEN statement

PRO TIP:
I often use my ELSE condition as a catch all, and write something like “Oops…check logic!”
This helps me quickly see if I forgot to include any conditions in my CASE statement (examples to follow)


CASE STATEMENTS

• CASE Statements allow you to use conditional logic to specify how your results should be calculated for various cases

• One of the most common applications for CASE is to “bucket” values

PRO TIP:
When values feel too granular or noisy, CASE can help you roll them up to a higher level


CASE STATEMENTS

• The top WHEN/THEN pair executes first. If true, the CASE is complete. If not, it continues testing each condition until the END is reached

• If multiple WHEN/THEN conditions are true, the upper-most condition determines the value, since it’s the first to be evaluated (top to bottom)

• If no conditions are satisfied when the CASE reaches the END (and there is no ELSE), a NULL value will be returned

CASE STATEMENT EVALUATION PROCESS ILLUSTRATED

LENGTH   EVALUATION
52       1. length < 60 = TRUE ➔ return ‘under 1 hr’
65       1. length < 60 = FALSE ➔ try next condition
         2. length BETWEEN 60 AND 90 = TRUE ➔ return ‘1-1.5hrs’
125      1. length < 60 = FALSE ➔ try next condition
         2. length BETWEEN 60 AND 90 = FALSE ➔ try next
         3. length > 90 = TRUE ➔ return ‘over 1.5hrs’
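
The three lengths walked through above can be checked with a short sketch. The film table is made up, the ELSE catch-all is borrowed from the earlier PRO TIP, and Python's built-in sqlite3 module stands in for the MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (length INTEGER)")
conn.executemany("INSERT INTO film VALUES (?)", [(52,), (65,), (125,)])

# Conditions are tested top to bottom; the first TRUE one decides the bucket
rows = conn.execute("""
    SELECT
        length,
        CASE
            WHEN length < 60 THEN 'under 1 hr'
            WHEN length BETWEEN 60 AND 90 THEN '1-1.5hrs'
            WHEN length > 90 THEN 'over 1.5hrs'
            ELSE 'Oops...check logic!'
        END AS length_bucket
    FROM film
    ORDER BY length
""").fetchall()
for row in rows:
    print(row)
```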



CASE WITH ELSE

• In this CASE, our buckets are not mutually exclusive or collectively exhaustive

• Notice how 48 and 50 get assigned to ‘under 1 hr’ by the first condition, even though they also satisfy the second, and that 117 doesn’t satisfy any of the given criteria

PRO TIP:
Include an error message using an ELSE statement to see if you missed any logic


CASE OPERATORS

• CASE can work with the same set of logical operators that we used with our WHERE statements:

Operator   What it Means
=          Equals
<>         Does NOT Equal
>          Greater Than
<          Less Than
>=         Greater Than Or Equal To
<=         Less Than Or Equal To
BETWEEN    A Range Between Two Values
LIKE       Matching a Pattern Like This
IN()       Equals One of These Values

TEST YOUR SKILLS: CASE STATEMENTS

YOUR ASSIGNMENT:
“I’d like to know which store each customer goes to, and whether or not they are active. Could you pull a list of first and last names of all customers, and label them as either ‘store 1 active’, ‘store 1 inactive’, ‘store 2 active’, or ‘store 2 inactive’?”

Solution Query
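
One possible solution, sketched under assumptions: the customer table and its store_id and active columns are modeled on the names in the assignment, active is assumed to be stored as 1 or 0, the four rows are illustrative, and Python's built-in sqlite3 module stands in for the MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        store_id INTEGER,
        active INTEGER  -- assumed: 1 = active, 0 = inactive
    );
    INSERT INTO customer VALUES
        (1, 'MARY', 'SMITH', 1, 1),
        (2, 'PATRICIA', 'JOHNSON', 2, 1),
        (3, 'LINDA', 'WILLIAMS', 1, 0),
        (4, 'BARBARA', 'JONES', 2, 0);
""")

# Four mutually exclusive buckets, plus an ELSE to flag anything unexpected
rows = conn.execute("""
    SELECT
        first_name,
        last_name,
        CASE
            WHEN store_id = 1 AND active = 1 THEN 'store 1 active'
            WHEN store_id = 1 AND active = 0 THEN 'store 1 inactive'
            WHEN store_id = 2 AND active = 1 THEN 'store 2 active'
            WHEN store_id = 2 AND active = 0 THEN 'store 2 inactive'
            ELSE 'Oops...check logic!'
        END AS store_and_status
    FROM customer
    ORDER BY customer_id
""").fetchall()
for row in rows:
    print(row)
```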

ALTERNATIVE OPTIONS:
There are a number of ways to write a valid CASE statement that produces the same results. If you can produce the correct values for these 4 buckets, you’re good to go!

PRO TIP: “PIVOTING” DATA WITH COUNT & CASE

CASE “PIVOTS”   Excel’s ability to pivot to columns can be replicated in SQL using COUNT and CASE

Excel makes it very easy to “pivot” data on two dimensions. Here we’re breaking down the count of inventory_id by film_id (rows) and store_id (columns) to quickly see how many copies of each film we have at each store.

MySQL can do the same thing using CASE statements inside of COUNT functions. Both methods yield identical results.


CASE & COUNT

• Excel makes it easy to pivot data into columns and rows

• We can do the same thing in MySQL by using GROUP BY, combined with the “CASE Pivot”

• When CASE Pivoting, we use COUNT() and only count records that match certain criteria

PRO TIP:
Use GROUP BY to define your row
labels, and CASE to pivot to columns
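
A minimal sketch of the CASE pivot: film_id supplies the row labels and one COUNT(CASE ...) per store supplies the columns. The inventory rows are made up, and Python's built-in sqlite3 module stands in for the MySQL server. It relies on the fact that a CASE with no matching WHEN and no ELSE returns NULL, which COUNT skips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER, store_id INTEGER)"
)
# Made-up stock: film 1 has 4 copies at each store, film 2 has 3 copies at store 2 only
stock = [(1, 1)] * 4 + [(1, 2)] * 4 + [(2, 2)] * 3
conn.executemany("INSERT INTO inventory (film_id, store_id) VALUES (?, ?)", stock)

rows = conn.execute("""
    SELECT
        film_id,
        COUNT(CASE WHEN store_id = 1 THEN inventory_id END) AS store_1_copies,
        COUNT(CASE WHEN store_id = 2 THEN inventory_id END) AS store_2_copies
    FROM inventory
    GROUP BY film_id
    ORDER BY film_id
""").fetchall()
for row in rows:
    print(row)
```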



TEST YOUR SKILLS: CASE & COUNT

YOUR ASSIGNMENT:
“I’m curious how many inactive customers we have at each store. Could you please create a table to count the number of customers broken down by store_id (in rows), and active status (in columns)?”

INTRODUCING THE MID COURSE PROJECT

THE SITUATION
The company’s insurance policy is up for renewal and the insurance company’s underwriters need some updated information from us before they will issue a new policy.

THE OBJECTIVE
Use MySQL to:
Leverage your SQL skills to extract and analyze data from various tables in the Maven Movies database to answer the underwriters’ questions. Each question can be answered by querying just one table. Part of your job as an Analyst is figuring out which table to use.


INTRODUCING THE MID COURSE PROJECT

THE LETTER

Dear Maven Movies Management,

In our review of your policy renewal application, we have realized that your business information has not been updated in a number of years.

In order to accurately assess the risk and approve your policy renewal, we will need you to provide all of the following information.

Sincerely,
Joe Scardycat, Lead Underwriter


MID COURSE PROJECT QUESTIONS

1 We will need a list of all staff members, including their first and last names, email addresses, and the store
identification number where they work.

2 We will need separate counts of inventory items held at each of your two stores.

3 We will need a count of active customers for each of your stores. Separately, please.

4 In order to assess the liability of a data breach, we will need you to provide a count of all customer email
addresses stored in the database.



MID COURSE PROJECT QUESTIONS

5 We are interested in how diverse your film offering is as a means of understanding how likely you are to
keep customers engaged in the future. Please provide a count of unique film titles you have in inventory at
each store and then provide a count of the unique categories of films you provide.

6 We would like to understand the replacement cost of your films. Please provide the replacement cost for the
film that is least expensive to replace, the most expensive to replace, and the average of all films you carry.

7 We are interested in having you put payment monitoring systems and maximum payment processing
restrictions in place in order to minimize the future risk of fraud by your staff. Please provide the average
payment you process, as well as the maximum payment you have processed.

8 We would like to better understand what your customer base looks like. Please provide a list of all customer
identification values, with a count of rentals they have made all-time, with your highest volume customers at
the top of the list.



DATABASE FUNDAMENTALS (PART 2)



DATABASE NORMALIZATION
Normalization is the process of structuring the tables and columns in a relational database to minimize redundancy
and preserve data integrity. Benefits of normalization include:
• Eliminating duplicate data (this makes storage and query processing more efficient)
• Reducing errors and anomalies (restrictions around data structure help to prevent human errors)

inventory_id   title              release_year   store_address         store_district
1              ACADEMY DINOSAUR   2006           47 MySakila Drive     Alberta
2              ACADEMY DINOSAUR   2006           47 MySakila Drive     Alberta
3              ACADEMY DINOSAUR   2006           47 MySakila Drive     Alberta
4              ACADEMY DINOSAUR   2006           47 MySakila Drive     Alberta
5              ACADEMY DINOSAUR   2006           28 MySQL Boulevard    QLD
6              ACADEMY DINOSAUR   2006           28 MySQL Boulevard    QLD
7              ACADEMY DINOSAUR   2006           28 MySQL Boulevard    QLD
8              ACADEMY DINOSAUR   2006           28 MySQL Boulevard    QLD
9              ACE GOLDFINGER     2006           28 MySQL Boulevard    QLD
10             ACE GOLDFINGER     2006           28 MySQL Boulevard    QLD
11             ACE GOLDFINGER     2006           28 MySQL Boulevard    QLD
12             ADAPTATION HOLES   2006           28 MySQL Boulevard    QLD
13             ADAPTATION HOLES   2006           28 MySQL Boulevard    QLD
14             ADAPTATION HOLES   2006           28 MySQL Boulevard    QLD
15             ADAPTATION HOLES   2006           28 MySQL Boulevard    QLD
16             AFFAIR PREJUDICE   2006           47 MySakila Drive     Alberta
17             AFFAIR PREJUDICE   2006           47 MySakila Drive     Alberta
18             AFFAIR PREJUDICE   2006           47 MySakila Drive     Alberta
19             AFFAIR PREJUDICE   2006           47 MySakila Drive     Alberta
20             AFFAIR PREJUDICE   2006           28 MySQL Boulevard    QLD

HEY THIS IS IMPORTANT!
If you don’t normalize your database, you end up with tables that look like this, with lots and lots of duplicate values.
This may not seem like a lot for the 20 records shown, but can have a significant impact as your database scales.

NORMALIZATION: MULTIPLE RELATED TABLES
In practice, normalization involves breaking out data from a single merged table into multiple related tables
• Instead of storing redundant information about each store and film in a single table (like the not-normalized one), we create new tables containing a single record for each unique value, and link to those tables using a simple id

Not normalized:
One merged table (inventory_id, title, release_year, store_address, store_district) that repeats the film and store details on each of its 20 records

Normalized!

Inventory (one record per inventory item):
inventory_id   film_id   address_id
1              1         1
2              1         1
3              1         1
4              1         1
5              1         2
6              1         2
7              1         2
8              1         2
9              2         2
10             2         2
11             2         2
12             3         2
13             3         2
14             3         2
15             3         2
16             4         1
17             4         1
18             4         1
19             4         1
20             4         2

Films (one record per film):
film_id   title              release_year
1         ACADEMY DINOSAUR   2006
2         ACE GOLDFINGER     2006
3         ADAPTATION HOLES   2006
4         AFFAIR PREJUDICE   2006

Addresses (one record per address):
address_id   address               district
1            47 MySakila Drive     Alberta
2            28 MySQL Boulevard    QLD

We now have single records containing all of the information about our films and addresses. No more redundancy!


TABLE RELATIONSHIPS & CARDINALITY

Cardinality refers to the uniqueness of values in a column (or attribute) of a table, and is commonly used to describe
how two tables relate (one-to-one, one-to-many, or many-to-many). For now, here are the key points to grasp:

• Primary keys are unique
  • They cannot repeat, so there is only one instance of each primary key value in a column

• Foreign keys are non-unique
  • They can repeat, so there may be many instances of each foreign key value in a column

• We can create a one-to-many relationship by connecting a foreign key in one table to a primary key in another

In the tables from the normalization example:
• Inventory: film_id and address_id are FOREIGN keys (MANY), and their values repeat across the 20 inventory records
• Films: film_id is the PRIMARY key (ONE), with a single record for each of film_id 1 through 4
• Addresses: address_id is the PRIMARY key (ONE), with a single record for each of address_id 1 and 2


RELATIONSHIP DIAGRAMS
Consider the two tables shown below:
• The Customer table below contains details about each customer, identified by a unique customer_id (the table’s primary key)
• The Rental table contains records of each rental, and includes a non-unique customer_id field since customers may rent films
on multiple occasions (this is one of the table’s foreign keys)

We can diagram table relationships in Workbench to understand how the records in each table relate

NOTE: This isn’t required, but serves as a helpful reference

PRO TIP:
When you explore a database for the first time, diagram your relationships to understand your table structure


USING JOINS FOR MULTI-TABLE QUERYING

The whole point of table relationships is to enable multi-table querying (i.e. pulling data from multiple tables at once)
• In SQL, we use JOIN statements to do this, by writing these table relationships directly into our queries

For example, what if you need a table showing all film titles in
each store’s inventory?
Since title and store_id live in separate tables, we can’t use
single-table queries; we’ll need to use a JOIN!

Sneak peek at a MySQL JOIN query



FULL MAVEN MOVIES DATABASE



JOINING MULTIPLE TABLES



COMMON JOIN TYPES

INNER JOIN
Returns records that exist in BOTH tables, and excludes unmatched records from either table
FROM leftTableName
INNER JOIN rightTableName

LEFT JOIN
Returns ALL records from the LEFT table, and any matching records from the RIGHT table
FROM leftTableName
LEFT JOIN rightTableName

RIGHT JOIN
Returns ALL records from the RIGHT table, and any matching records from the LEFT table
FROM leftTableName
RIGHT JOIN rightTableName

FULL OUTER JOIN
Returns ALL records from BOTH tables, including non-matching records
FROM leftTableName
FULL JOIN rightTableName

UNION
Returns all data from one table, with all data from another table appended to the end
SELECT FROM firstTableName
UNION
SELECT FROM secondTableName

INNER JOIN

INNER JOIN   Returns records from BOTH tables when there is a match, and excludes unmatched records

INNER JOIN rightTableName ON leftTable.columnName = rightTable.columnName

Tells SQL to return only the overlap

This is where you name your right table, and tell SQL which column to match on by specifying the column name from each table

Example:
FROM rental
INNER JOIN customer
ON rental.customer_id = customer.customer_id

HEY THIS IS IMPORTANT!
INNER JOIN is one of the two join types you’ll likely use most (LEFT JOIN is the other). Make sure you understand the differences between the two types!

PRO TIP:
A good way to remember how INNER JOIN works is to think of it as only returning the inner/middle part of a Venn diagram (where the two tables match/overlap)


INNER JOIN

• INNER JOIN lets us pull records from two tables at once, and only returns records when the “JOIN ON” logic produces a match

• After the FROM clause, we add INNER JOIN, followed by the name of the table we want to join to, followed by logic for how the server should perform the join

• When you write queries using multiple tables, you need to specify both the table and column name (e.g. inventory.inventory_id)

The 1st query pulls DISTINCT inventory_id values from inventory. The 2nd query joins the rental and inventory tables to return matching inventory_ids.

Notice how the 1st query returns 4,581 rows and the 2nd query returns 4,580 rows. Inventory_id = 5 does not appear in the rental table, so the record is filtered out of the result set even though it does appear in the inventory table (must be a really bad movie…)
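
The same effect can be reproduced on a tiny scale. In this sketch the inventory and rental tables hold made-up rows in which inventory_id 5 has never been rented, and Python's built-in sqlite3 module stands in for the MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER);
    INSERT INTO inventory VALUES (4, 1), (5, 1), (6, 2);
    INSERT INTO rental VALUES (1, 4), (2, 4), (3, 6);  -- nothing for inventory_id 5
""")

# 1st query: every inventory_id in the inventory table
all_ids = conn.execute("""
    SELECT DISTINCT inventory_id
    FROM inventory
    ORDER BY inventory_id
""").fetchall()

# 2nd query: only inventory_ids that also appear in rental.
# Both tables have an inventory_id column, so each reference names its table.
rented_ids = conn.execute("""
    SELECT DISTINCT inventory.inventory_id
    FROM inventory
    INNER JOIN rental
        ON inventory.inventory_id = rental.inventory_id
    ORDER BY inventory.inventory_id
""").fetchall()

print(all_ids)     # includes 5
print(rented_ids)  # 5 is filtered out by the INNER JOIN
```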

ERROR TYPE: COLUMN IS AMBIGUOUS

ERROR MESSAGE
Error Code: 1052. Column 'inventory_id' in field list is ambiguous

WHAT IT MEANS
Error Code 1052 means you are referencing a column name which appears in more than one of your joined tables, and you have not specified which table you want to use.

HOW TO DEBUG LIKE A PRO
Read the column name and the location in the response to find the ambiguous column name. Specify a table and re-run.
In this case, the location is ‘field list’. Other times, you could see ‘where clause’, ‘on clause’, etc.

TEST YOUR SKILLS: JOINING TABLES

YOUR ASSIGNMENT:
“Can you pull for me a list of each film we have in inventory? I would like to see the film’s title, description, and the store_id value associated with each item, and its inventory_id. Thanks!”

LEFT JOIN

LEFT JOIN   Returns all records from the LEFT table, and any matched records from the RIGHT table

LEFT JOIN rightTableName ON leftTable.columnName = rightTable.columnName

Tells SQL to return the overlap and everything from the left table

This is where you name your right table, and tell SQL which column to match on by specifying the column name from each table

Example:
FROM rental
LEFT JOIN customer
ON rental.customer_id = customer.customer_id

HEY THIS IS IMPORTANT!
LEFT JOIN is what you’ll use when you want additional data from a second table, and you want to keep ALL records from your first table (even if there isn’t a match)

PRO TIP:
This is the join type that I use most often. When you’re working with a data set and want to add data from another table while keeping all of your current records, LEFT JOIN is the way to go!


LEFT JOIN

• In this example, we’re looking for a COUNT of all films in which each actor appeared

• We use LEFT JOIN here because we want to return all of the actors’ names, even if they don’t match with any film records


LEFT vs INNER JOIN

• While INNER JOIN returns results only when there is a match, LEFT JOIN returns any matches plus all records from the left table (the first table named)

• You can think of INNER JOIN as being more restrictive, and LEFT JOIN being a little looser

INNER JOIN: 4,580 ROWS vs. LEFT JOIN: 4,581 ROWS

PRO TIP:
Compare your query with LEFT and INNER joins to quickly master the differences
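
Here is a small sketch of that comparison: the same query is run twice, changing only the join type. The actor and film_actor tables hold made-up rows (one actor has no films), the helper function name is hypothetical, and Python's built-in sqlite3 module stands in for the MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO actor VALUES (1, 'PENELOPE'), (2, 'NICK');
    INSERT INTO film_actor VALUES (1, 10), (1, 20);  -- NICK has no films
""")

def films_per_actor(join_type):
    # join_type is "INNER" or "LEFT"; the rest of the query is identical
    query = f"""
        SELECT actor.first_name, COUNT(film_actor.film_id) AS films
        FROM actor
        {join_type} JOIN film_actor
            ON actor.actor_id = film_actor.actor_id
        GROUP BY actor.actor_id, actor.first_name
        ORDER BY actor.actor_id
    """
    return conn.execute(query).fetchall()

inner_rows = films_per_actor("INNER")
left_rows = films_per_actor("LEFT")
print(inner_rows)  # only actors with at least one matching film
print(left_rows)   # every actor, with a count of 0 when nothing matched
```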



TEST YOUR SKILLS: LEFT JOIN

YOUR ASSIGNMENT:
“One of our investors is interested in the films we carry and how many actors are listed for each film title. Can you pull a list of all titles, and figure out how many actors are associated with each title?”

RIGHT JOIN

RIGHT JOIN   Returns all records from the RIGHT table, and any matched records from the LEFT table

RIGHT JOIN rightTableName ON leftTable.columnName = rightTable.columnName

Tells SQL to return the overlap and everything from the right table

This is where you tell SQL how to look for a match by specifying columns from each table

Example:
FROM rental
RIGHT JOIN customer
ON rental.customer_id = customer.customer_id

HEY THIS IS IMPORTANT!
RIGHT JOIN is like the opposite of LEFT JOIN; instead of keeping all records from the first (left) table named, it keeps all records from the second (right)

PRO TIP:
I write SQL queries every day in my professional career, and I’ve never needed to use RIGHT JOIN.
To keep things simple, just use LEFT JOIN (I’m only teaching you about RIGHT JOIN for completeness)


LEFT vs INNER vs RIGHT

LEFT JOIN
Returns all values from the left (first) table and any records from the right (second) table which match the JOIN criteria. Returns NULL when no match is found.

INNER JOIN
Returns any matching values from both tables. For records with no match, INNER JOIN does not return values from either table.

RIGHT JOIN
Returns all values from the right (second) table, and any records from the left (first) table which match the JOIN criteria. Returns NULL when no match is found.


FULL OUTER JOIN (aka FULL JOIN)

FULL JOIN   Returns all records from BOTH tables, matched up where a match exists and filled with NULL where it does not

FULL JOIN rightTableName ON leftTable.columnName = rightTable.columnName

Tells SQL to return records from both tables

This is where you tell SQL how to look for a match, by specifying columns from each table

Example:
FROM rental
FULL JOIN customer
ON rental.customer_id = customer.customer_id

HEY THIS IS IMPORTANT!
Use a FULL JOIN when you want records from both tables even when there isn’t a match for the column you’re joining on (this may yield lots of records!)

NOTE: MySQL itself does not support the FULL JOIN / FULL OUTER JOIN syntax shown here (other database systems such as PostgreSQL and SQL Server do). In MySQL, the same result is produced by combining a LEFT JOIN and a RIGHT JOIN with UNION.

PRO TIP:
FULL JOIN likely isn’t something you’ll use every day, but can come in handy if you ever need to merge ALL
records from two tables
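
A minimal sketch of a full join built from two outer joins and a UNION. The customer and rental rows are made up so that each table has one unmatched record, and Python's built-in sqlite3 module stands in for the MySQL server. The second half is written as a LEFT JOIN with the tables swapped, which is equivalent to a RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1), (2);        -- customer 2 never rented
    INSERT INTO rental VALUES (10, 1), (11, 3);  -- rental 11 has no customer record
""")

rows = conn.execute("""
    -- every rental, with its customer when one exists
    SELECT rental.rental_id, customer.customer_id
    FROM rental
    LEFT JOIN customer ON rental.customer_id = customer.customer_id
    UNION
    -- every customer, with their rentals when any exist
    SELECT rental.rental_id, customer.customer_id
    FROM customer
    LEFT JOIN rental ON rental.customer_id = customer.customer_id
""").fetchall()
print(rows)  # NULLs arrive in Python as None
```

Because UNION removes duplicates, the matched rows that both halves return appear only once. Rows that are genuinely identical in the source data would also collapse, so treat this as an approximation when that matters.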



PRO TIP: “BRIDGING” UNRELATED TABLES
When you need to connect two tables which do not directly relate, look for a third table containing keys common to
both; this can serve as a “bridge” to join your tables together
• Here we have no key to connect the customer table directly to city, but we can join customer to address (using
address_id), and address to city (using city_id). In this case the address table serves as our bridge.

BRIDGE



PRO TIP: BRIDGING

• In this example, we need the film.title and category.name fields, but the film and category tables don’t have a matching key

• We’ll need to find a “bridge” table, which relates to both of the tables we want to pull data from

• In this example, film_category has keys to both film and category tables, and can serve as our bridge

Using two JOINs, we can produce a result set containing values from 3 tables:
• film
• film_category
• category

PRO TIP:
If you aren’t sure if two tables can connect, list the tables each one is joined to. Find a bridge!
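
The bridge described above can be sketched with two joins. The three tables follow the names on the slide, but their key columns (film_id, category_id) and the rows are assumptions for illustration, and Python's built-in sqlite3 module stands in for the MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
    INSERT INTO film VALUES (1, 'ACADEMY DINOSAUR'), (2, 'ACE GOLDFINGER');
    INSERT INTO category VALUES (6, 'Documentary'), (11, 'Horror');
    INSERT INTO film_category VALUES (1, 6), (2, 11);
""")

# film -> film_category (on film_id) -> category (on category_id)
rows = conn.execute("""
    SELECT
        film.title,
        category.name AS category_name
    FROM film
    INNER JOIN film_category
        ON film.film_id = film_category.film_id
    INNER JOIN category
        ON film_category.category_id = category.category_id
    ORDER BY film.title
""").fetchall()
for row in rows:
    print(row)
```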



TEST YOUR SKILLS: BRIDGING

YOUR ASSIGNMENT:
“Customers often ask which films their favorite actors appear in. It would be great to have a list of all actors, with each title that they appear in. Could you please pull that for me?”

MULTI-CONDITION JOINS

• The top query joins film to category (using film_category as a bridge), then uses a WHERE clause to filter down the result set to only include ‘horror’ films
• The bottom query makes being a ‘horror’ film part of the JOIN criteria, by adding an AND statement within the ON clause (which makes the join itself more selective)
• Both methods are valid, and both filter the result set to ‘horror’ films, producing identical results

MySQL QUERY IN ACTION:
These two queries produce the exact same results
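The two approaches described above might look like the following pair of queries. This is a sketch under the assumption that film_category carries film_id and category_id keys; those column names are not shown on the slide.

```sql
-- Approach 1: join first, then filter with WHERE
SELECT
    film.title,
    category.name
FROM film
    INNER JOIN film_category
        ON film.film_id = film_category.film_id
    INNER JOIN category
        ON film_category.category_id = category.category_id
WHERE category.name = 'horror';

-- Approach 2: make the filter part of the join condition
SELECT
    film.title,
    category.name
FROM film
    INNER JOIN film_category
        ON film.film_id = film_category.film_id
    INNER JOIN category
        ON film_category.category_id = category.category_id
        AND category.name = 'horror';
```

The results match because both joins are INNER JOINs. With a LEFT JOIN the two forms can differ: a condition in the ON clause leaves unmatched rows from the left table in place with NULLs, while the same condition in WHERE removes them.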



TEST YOUR SKILLS: MULTI-CONDITION JOINS

YOUR ASSIGNMENT:
“The Manager from Store 2 is working on expanding our film collection there. Could you pull a list of distinct titles and their descriptions, currently available in inventory at store 2?”
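A possible answer, offered as a sketch and not as the official solution, puts the store filter inside the join condition. It assumes inventory has film_id and store_id columns and that film has a description column.

```sql
-- Distinct titles stocked at store 2, with the store filter in the ON clause.
-- film_id, store_id and description are assumed column names.
SELECT DISTINCT
    film.title,
    film.description
FROM film
    INNER JOIN inventory
        ON film.film_id = inventory.film_id
        AND inventory.store_id = 2;
```

DISTINCT matters here because a store may hold several copies of the same film, and each copy would otherwise produce its own row.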


UNION: A DIFFERENT WAY TO COMBINE TABLES

UNION: Returns all data from the FIRST table, with all data from the SECOND table appended to the end

UNION SELECT sameColumnName FROM secondTableName

• UNION tells SQL to combine the results of one SELECT statement with another
• Instead of a JOIN to another table, we write a second SELECT statement here

Example:
    -- yields one list with all names (first + last) from both tables
    SELECT first_name, last_name FROM advisor
    UNION
    SELECT first_name, last_name FROM investor

HEY THIS IS IMPORTANT!
UNION will deduplicate records, and keep only distinct values in your result set. If you want to keep duplicate records as well, use UNION ALL.

PRO TIP:
UNION is an easy one to run into errors with. Make sure that A) your two SELECT statements have the same
number of columns, B) columns are in the same order, and C) columns in each table have similar data types.
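The difference between UNION and UNION ALL is easy to see with literal values, which need no tables at all. The names below are made up purely for illustration.

```sql
-- UNION removes the duplicate 'Ann': this returns 2 rows
SELECT 'Ann' AS first_name
UNION
SELECT 'Ann'
UNION
SELECT 'Bob';

-- UNION ALL keeps every row: this returns 3 rows
SELECT 'Ann' AS first_name
UNION ALL
SELECT 'Ann'
UNION ALL
SELECT 'Bob';
```

The column name of the combined result comes from the first SELECT, which is why only that one carries an alias.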



UNION

• UNION allows you to combine the results of two or more SELECT statements into a single result set
• Think of UNION as “stacking” two result sets on top of each other to make one longer result set
• Each SELECT statement must include the same number of columns, and stacked columns must share compatible formats



TEST YOUR SKILLS: UNION

YOUR ASSIGNMENT:
“We will be hosting a meeting with all of our staff and advisors soon. Could you pull one list of all staff and advisor names, and include a column noting whether they are a staff member or advisor? Thanks!”
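One way to approach this, as a sketch only: add a literal label column to each SELECT and stack the two with UNION. The staff table is assumed to have first_name and last_name columns like advisor does, and the alias type is an arbitrary choice.

```sql
-- Stack advisors and staff into one list, tagging each row with its source.
-- staff.first_name / staff.last_name are assumed column names.
SELECT
    'advisor' AS type,
    first_name,
    last_name
FROM advisor

UNION

SELECT
    'staff' AS type,
    first_name,
    last_name
FROM staff;
```

Both SELECT statements list three columns in the same order, which satisfies the UNION requirements from the earlier pro tip.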


INTRODUCING THE FINAL COURSE PROJECT

THE SITUATION:
You and your business partner were recently approached by another local business owner who is interested in purchasing Maven Movies. He primarily owns restaurants and bars, so he has lots of questions for you about your business and the rental business in general. His offer seems very generous, so you are going to entertain his questions.

THE OBJECTIVE:
Use MySQL to extract and analyze data from various tables in the Maven Movies database to answer your potential acquirer’s questions. Each question will require you to write a multi-table SQL query, joining at least two tables.



INTRODUCING THE FINAL COURSE PROJECT

THE LETTER:

Dear Maven Movies Management,

I am excited about the potential acquisition and learning more about your rental business.

Please bear with me as I am new to the industry, but I have a number of questions for you. Assuming you can answer them all, and that there are no major surprises, we should be able to move forward with the purchase.

Best,
Martin Moneybags



FINAL COURSE PROJECT QUESTIONS

1 My partner and I want to come by each of the stores in person and meet the managers. Please send over
the managers’ names at each store, with the full address of each property (street address, district, city, and
country please).

2 I would like to get a better understanding of all of the inventory that would come along with the business.
Please pull together a list of each inventory item you have stocked, including the store_id number, the
inventory_id, the name of the film, the film’s rating, its rental rate and replacement cost.

3 From the same list of films you just pulled, please roll that data up and provide a summary level overview of
your inventory. We would like to know how many inventory items you have with each rating at each store.

4 Similarly, we want to understand how diversified the inventory is in terms of replacement cost. We want to
see how big of a hit it would be if a certain category of film became unpopular at a certain store.
We would like to see the number of films, as well as the average replacement cost, and total replacement
cost, sliced by store and film category.
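As a starting point for the roll-up questions, here is a sketch of how question 3 might be approached: join inventory to film, then count items per store and rating. The column names film_id and rating are assumptions; store_id and inventory_id are named in question 2.

```sql
-- Count inventory items for each rating at each store.
-- film_id and rating are assumed column names.
SELECT
    inventory.store_id,
    film.rating,
    COUNT(inventory.inventory_id) AS inventory_items
FROM inventory
    LEFT JOIN film
        ON inventory.film_id = film.film_id
GROUP BY
    inventory.store_id,
    film.rating;
```

Question 4 follows the same pattern, with the category tables bridged in and AVG and SUM added alongside COUNT.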



FINAL COURSE PROJECT QUESTIONS

5 We want to make sure you folks have a good handle on who your customers are. Please provide a list
of all customer names, which store they go to, whether or not they are currently active, and their full
addresses – street address, city, and country.

6 We would like to understand how much your customers are spending with you, and also to know who your
most valuable customers are. Please pull together a list of customer names, their total lifetime rentals, and the
sum of all payments you have collected from them. It would be great to see this ordered on total lifetime value,
with the most valuable customers at the top of the list.

7 My partner and I would like to get to know your board of advisors and any current investors. Could you
please provide a list of advisor and investor names in one table? Could you please note whether they are an
investor or an advisor, and for the investors, it would be good to include which company they work with.

8 We're interested in how well you have covered the most-awarded actors. Of all the actors with three types of
awards, for what % of them do we carry a film? And how about for actors with two types of awards? Same
questions. Finally, how about actors with just one award?
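To illustrate how the joins and aggregates from the course combine on a question like number 6, here is one possible sketch. It assumes rental and payment tables with customer_id, rental_id and amount columns. None of these names are confirmed by the slides.

```sql
-- Lifetime rentals and total payments per customer, most valuable first.
-- Table and column names for rental and payment are assumptions.
SELECT
    customer.first_name,
    customer.last_name,
    COUNT(DISTINCT rental.rental_id) AS total_rentals,
    SUM(payment.amount) AS total_payment_amount
FROM customer
    LEFT JOIN rental
        ON customer.customer_id = rental.customer_id
    LEFT JOIN payment
        ON rental.rental_id = payment.rental_id
GROUP BY
    customer.customer_id,
    customer.first_name,
    customer.last_name
ORDER BY
    total_payment_amount DESC;
```

Grouping on customer_id as well as the name keeps two customers who happen to share a name from being merged, and COUNT(DISTINCT ...) guards against a rental being counted twice if it has more than one payment row.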
