SQL Theory for Data Science

This document provides an overview of SQL (Structured Query Language) and its applications in data science, including data retrieval, manipulation, and database management. It covers fundamental concepts such as data models, relationships, SQL syntax, filtering, sorting, and joining tables, along with practical examples. The course is designed for various roles including data scientists, database administrators, and backend developers, utilizing SQLite as the primary database management system.

Uploaded by

dain55788

SQL for Data Science - University of California

Module 1 : Getting into SQL


Module 1.1 : Selecting and Retrieving Data with SQL

Part 1 : WHAT IS SQL?


1. SQL - Structured Query Language - a standard computer language for relational database
management and data manipulation.
*remember : SQL keywords are not case-sensitive (example : select and SELECT are both accepted).
2. USED to query, insert, update and modify data. USED to communicate with databases.

3. HOW is SQL used?


- SQL is all about data.
- Read/ Retrieve data.
- Write data.
- Update data.
- Insert new data.

4. WHO uses SQL? - Backend Developers, QA Engineers, Data Architects, Database Administrators, Data
Scientists, ...

Database Administrator (DBA) :

- Manages/ Governs entire database.
- Gives permissions to users.
- Determines access to data.
- Manages and creates tables.
- Uses SQL to query and retrieve data.

Data Scientists :
- End user of a database.
- Uses SQL to query and retrieve data.

5. HOW do Data Scientists use SQL?

- Retrieve data. (main thing used by DS)
- Create their own table or test environment.
- Combine multiple sources together.
- Write complex queries for analysis.

6. SQL and Database Management Systems. (DBMS)


- How we write syntax will depend on what DBMS we’re using.
Such as : MySQL, PostgreSQL, SQLite, Microsoft SQL Server.…
In this course with University of California, we’ll use SQLite.

Part 2 : Data Models, Thinking about our Data :


Think before code : What’s the problem we’re trying to solve?

Understand the Data :

- Understand the business process or subject matter the data is modeled after.
- Know the business rules.
- Understand how your data is organized and structured in the table.

Think about why and what we're doing; spend some time understanding how the data relates to each
other.

Databases and Tables :


A database is a container (a file or set of files) that stores organized data, a set of related information.
A table is a structured list of data of a specific type.

A table is made up of columns (fields) and rows (records).

Part 3 : The evolution of Data Models :

What is data modeling?


- Organizes and structures information into multiple, related tables.
- Represents a business process or shows relationships between business processes.
- Should always represent a real-world problem.

The relational database model allows us to write queries and to retrieve, update and write data.

Part 4 : Relational and Transaction Models :


(Objectives : - Describe and explain the differences between one-one, one-many and
many-many relationships - Describe primary keys in a database - Explain how ER diagrams
are used).

1. Relational vs Transactional Model


- The Relational Model allows for querying and data manipulation in an easy, logical way.
- The Transactional Model is an operational database. (we may need to extract transaction
information and move it into a relational model before querying it).

2. Data model building blocks : Entity (person, place or event …), Attribute (attributes of an
entity), Relationship : one-to-many, many-to-many, one-to-one.

3. ER Diagrams : composed of entity types, and specify the relationships that can exist
between instances of those entities.

4. Relationships :
A One-to-Many (1:M) relationship : a painter can paint many paintings, and each painting is painted
by one painter. (example)
A Many-to-Many (M:N) relationship : an employee can learn many skills, and each skill can be learned by
many employees.
A One-to-One (1:1) relationship : an employee manages one store, and each store is managed by only one employee.
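
The relationships above can be sketched as tables. This is only an illustration run through Python's built-in sqlite3 module (the course uses SQLite); every table name, column name and row below is an assumption made up for the sketch. A 1:M relationship is modeled with a foreign key on the "many" side, and an M:N relationship with a separate linking table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    -- 1:M : each painting points to exactly one painter
    CREATE TABLE Painter  (painter_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Painting (painting_id INTEGER PRIMARY KEY, title TEXT,
                           painter_id INTEGER REFERENCES Painter(painter_id));

    -- M:N : a linking table pairs employees with skills
    CREATE TABLE Employee (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Skill    (skill_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE EmployeeSkill (
        employee_id INTEGER REFERENCES Employee(employee_id),
        skill_id    INTEGER REFERENCES Skill(skill_id),
        PRIMARY KEY (employee_id, skill_id)
    );

    -- made-up sample rows
    INSERT INTO Painter  VALUES (1, 'painter_1');
    INSERT INTO Painting VALUES (1, 'painting_a', 1), (2, 'painting_b', 1);
    INSERT INTO Employee VALUES (1, 'emp_1'), (2, 'emp_2');
    INSERT INTO Skill    VALUES (1, 'skill_x'), (2, 'skill_y');
    INSERT INTO EmployeeSkill VALUES (1, 1), (1, 2), (2, 1);
""")

def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]

paintings_by_painter_1 = count("SELECT COUNT(*) FROM Painting WHERE painter_id = 1")
skills_of_emp_1 = count("SELECT COUNT(*) FROM EmployeeSkill WHERE employee_id = 1")
employees_with_skill_1 = count("SELECT COUNT(*) FROM EmployeeSkill WHERE skill_id = 1")
print(paintings_by_painter_1, skills_of_emp_1, employees_with_skill_1)
```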

ER Diagram Notation : Chen Notation, Crow’s Foot Notation, UML Class Diagram Notation.
---> Try to understand notations when reading ER diagrams.

Part 5 : Retrieving data with a SELECT statement :


Retrieving multiple columns :
SELECT prod_name, prod_id, prod_price -- or use * to select all columns
FROM Products;

or, with one column per line :
SELECT prod_name
, prod_id
, prod_price
FROM Products;

Limit output using LIMIT :
SELECT columns we wish
FROM specific table
LIMIT number of records

Example :
SELECT prod_name
FROM Products
LIMIT 5;
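
To try SELECT and LIMIT without installing anything, one option is Python's built-in sqlite3 module with an in-memory database. This is a minimal sketch: the Products rows are made-up placeholders, not course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Products (prod_id INTEGER, prod_name TEXT, prod_price REAL)")
# Hypothetical sample rows, invented for this sketch
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(i, f"product_{i}", float(i)) for i in range(1, 9)],
)

# Retrieve only the columns we name
named = conn.execute("SELECT prod_name, prod_price FROM Products").fetchall()

# LIMIT caps how many rows come back
first_five = conn.execute("SELECT prod_name FROM Products LIMIT 5").fetchall()

print(len(named), len(first_five))
```

Without an ORDER BY, which five rows LIMIT returns is not guaranteed; only the number of rows is.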

Part 6 : Creating Tables and Adding Data to the Table:


Why tables? - Tables are used to make models and predictions.
- Create dashboards.
- Visualize data with other tools.
- Extract data from other sources.
Example :

CREATE TABLE Shoes
(
Id char(10) PRIMARY KEY, -- PRIMARY KEY : unique identifier; in standard SQL it cannot be null.
Brand char(10) NOT NULL, -- NOT NULL : the column does not accept null values.
Type char(10) NOT NULL,
Price decimal(8,2) NOT NULL
);

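Create TEMPORARY Table :

- A temporary table is deleted automatically when the current session (database connection) ends.
Example :
CREATE TEMPORARY TABLE Sandals AS
SELECT *
FROM Shoes
WHERE Type = 'Slippers';

Insert data :
-- This example assumes Shoes also has Color and Desc columns, and that every other NOT NULL column gets a value.
-- Desc is a reserved word, so it is quoted.
INSERT INTO Shoes (Id, Brand, Type, Color, "Desc")
VALUES ('12345'
, 'Gucci'
, 'Slippers'
, 'Pink'
, NULL );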
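
A runnable sketch of the three steps above (create a table, insert a row, build a temporary table from a query), using Python's sqlite3 module. It assumes only the four columns Id, Brand, Type and Price; the price and the second row are made-up values for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE Shoes (
        Id    CHAR(10) PRIMARY KEY,
        Brand CHAR(10) NOT NULL,
        Type  CHAR(10) NOT NULL,
        Price DECIMAL(8,2) NOT NULL
    )
""")

# Price here is an invented value
conn.execute(
    "INSERT INTO Shoes (Id, Brand, Type, Price) VALUES ('12345', 'Gucci', 'Slippers', 100.00)"
)

# A NULL in a NOT NULL column is rejected by the database
try:
    conn.execute("INSERT INTO Shoes (Id, Brand, Type, Price) VALUES ('2', NULL, 'Boots', 50.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Temporary table built from a query; it disappears when the connection closes
conn.execute("CREATE TEMPORARY TABLE Sandals AS SELECT * FROM Shoes WHERE Type = 'Slippers'")
sandals = conn.execute("SELECT Id, Brand FROM Sandals").fetchall()
print(rejected, sandals)
```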

Part 7 : Add comments to SQL code :


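- There are 2 ways :
Single Line (everything after -- on that line is ignored) :
SELECT shoe_id
--, brand_id (this column is commented out)
, shoe_name
FROM Shoes
Section (everything between /* and */ is ignored) :
SELECT shoe_id
/*, brand_id
, shoe_name (both lines are commented out)
*/
FROM Shoes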
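
A quick way to confirm that commented-out columns really are skipped is to look at the column names the query returns. A small sketch with Python's sqlite3 module; the Shoes columns and the single row are assumptions made up for the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Shoes (shoe_id INTEGER, brand_id INTEGER, shoe_name TEXT)")
conn.execute("INSERT INTO Shoes VALUES (1, 10, 'runner')")  # made-up row

query = """
SELECT shoe_id
-- , brand_id      single-line comment: this column is skipped
/* , shoe_name
   block comment: everything up to the closing marker is skipped */
FROM Shoes
"""
cursor = conn.execute(query)
# cursor.description holds one entry per returned column; item 0 is its name
columns = [d[0] for d in cursor.description]
print(columns)
```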

MODULE 2 : Filtering, Sorting and Calculating Data with SQL :


New Clauses and Operators : WHERE, BETWEEN, IN, OR, NOT, LIKE, ORDER BY, GROUP BY.
Aggregate Functions : AVG (average), COUNT, MAX, MIN.

Part 1 : Basic Filtering with SQL:


(Objectives : use WHERE and BETWEEN clauses …)

WHERE clause : (where goes with SELECT and FROM clauses.)


Example :
SELECT column_name, column_name
FROM table_name
WHERE column_name operator value;
Operator : =, <> (not equal), >, <, <=, >=, BETWEEN, IS NULL.

Example 1 :
SELECT ProductName
, UnitPrice
, SupplierID
FROM Products
WHERE ProductName = 'Tofu'; -- to filter on a numeric value instead : WHERE UnitPrice >= 75;

Filter using Range of Values (BETWEEN, AND) :


SELECT ProductName
,UnitPrice
, SupplierID
,UnitsInStock
FROM Products
WHERE UnitsInStock BETWEEN 15 AND 80;
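
The filters above can be tried on a tiny table. This is a sketch with Python's sqlite3 module; the four Products rows are invented so that each filter keeps some rows and drops others. Note that BETWEEN includes both end values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Products (ProductName TEXT, UnitPrice REAL, SupplierID INTEGER, UnitsInStock INTEGER)"
)
# Made-up rows for illustration
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [("Tofu", 23.0, 6, 35), ("Konbu", 6.0, 6, 10), ("Caviar", 80.0, 9, 90), ("Rice", 15.0, 10, 15)],
)

# Exact match on a text column
tofu = conn.execute(
    "SELECT ProductName, UnitPrice, SupplierID FROM Products WHERE ProductName = 'Tofu'"
).fetchall()

# Comparison on a numeric column
expensive = [r[0] for r in conn.execute("SELECT ProductName FROM Products WHERE UnitPrice >= 75")]

# Range filter: 15 and 80 themselves are included
in_range = [
    r[0]
    for r in conn.execute(
        "SELECT ProductName FROM Products WHERE UnitsInStock BETWEEN 15 AND 80 ORDER BY ProductName"
    )
]
print(tofu, expensive, in_range)
```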

Part 2 : Advanced Filtering : IN, OR, NOT:


For example :
SELECT ProductID
, UnitPrice
, SupplierID
FROM Products
WHERE SupplierID IN (9, 10, 11);
OR operator : (a row is returned if either condition is true; once the first condition is met, the second need not be evaluated)
SELECT ProductID,….
….
WHERE ProductName = 'Tofu' OR ProductName = 'Konbu';
AND with OR :
..….
WHERE (SupplierID = 9 OR SupplierID = 11) AND UnitPrice > 15;
(AND is normally evaluated before OR; the parentheses force the OR condition to be evaluated first, then the AND condition.)
NOT Operator :
SELECT *
FROM Employees
WHERE NOT City = 'London' AND NOT City = 'Seattle'; -- same as : WHERE NOT (City = 'London' OR City = 'Seattle')
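
The effect of the parentheses is easiest to see by running the same condition with and without them. A sketch with Python's sqlite3 module; the rows are made up, and row 6 (supplier 9, low price) is there on purpose because it is the one the two versions disagree on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER, ProductName TEXT, UnitPrice REAL, SupplierID INTEGER)"
)
# Made-up rows for illustration
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [
        (1, "Tofu", 23.0, 9), (2, "Konbu", 6.0, 11), (3, "Caviar", 80.0, 9),
        (4, "Rice", 10.0, 10), (5, "Salt", 20.0, 3), (6, "Beans", 5.0, 9),
    ],
)

def ids(where):
    """Return the ProductIDs matching a WHERE condition, in ID order."""
    sql = f"SELECT ProductID FROM Products WHERE {where} ORDER BY ProductID"
    return [row[0] for row in conn.execute(sql)]

in_list = ids("SupplierID IN (9, 10, 11)")
either_name = ids("ProductName = 'Tofu' OR ProductName = 'Konbu'")

# Parentheses: (9 OR 11) is decided first, then the price test applies to both
with_parens = ids("(SupplierID = 9 OR SupplierID = 11) AND UnitPrice > 15")
# No parentheses: AND binds tighter, so this means 9 OR (11 AND price > 15)
without_parens = ids("SupplierID = 9 OR SupplierID = 11 AND UnitPrice > 15")

# Two NOTs joined by AND exclude both values, the same as NOT IN
not_both = ids("NOT SupplierID = 9 AND NOT SupplierID = 11")
not_in = ids("SupplierID NOT IN (9, 11)")
print(with_parens, without_parens)
```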

Part 3 : Using wildcards in SQL :


- Wildcards are used with the LIKE operator and only work on text (string) data, not numerical data.
Ways of using % wildcards (% matches any number of characters, including none) :
'%Pizza' : Grabs anything ending with the word Pizza
'Pizza%' : Grabs anything starting with the word Pizza
'%Pizza%' : Grabs anything containing the word Pizza
'S%E' : Grabs anything that starts with S and ends with E

Ways of using Underscore (_) wildcards :

Example : WHERE size LIKE '_pizza' -- like '%pizza', but _ matches exactly one character
Output example : spizza mpizza
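
The patterns above can be checked against a few strings. A sketch with Python's sqlite3 module; the Menu and Sizes tables and their values are invented for the test. One thing to keep in mind: in SQLite, LIKE ignores case for ASCII letters by default, which other DBMSs may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Menu (item TEXT)")
conn.execute("CREATE TABLE Sizes (size TEXT)")
# Made-up values for illustration
conn.executemany(
    "INSERT INTO Menu VALUES (?)",
    [("Cheese Pizza",), ("Pizza Roll",), ("Hot Pizza Slice",), ("SALE",)],
)
conn.executemany("INSERT INTO Sizes VALUES (?)", [("spizza",), ("mpizza",), ("xlpizza",)])

def menu_like(pattern):
    """Menu items matching a LIKE pattern, sorted by name."""
    sql = "SELECT item FROM Menu WHERE item LIKE ? ORDER BY item"
    return [row[0] for row in conn.execute(sql, (pattern,))]

def sizes_like(pattern):
    """Sizes matching a LIKE pattern, sorted by name."""
    sql = "SELECT size FROM Sizes WHERE size LIKE ? ORDER BY size"
    return [row[0] for row in conn.execute(sql, (pattern,))]

ends_with = menu_like("%Pizza")
starts_with = menu_like("Pizza%")
contains = menu_like("%Pizza%")
s_to_e = menu_like("S%E")

one_char = sizes_like("_pizza")   # exactly one character before 'pizza'
any_chars = sizes_like("%pizza")  # any number of characters before 'pizza'
print(one_char, any_chars)
```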

Part 4 : Sorting with ORDER BY


- Rules : ORDER BY comes after the WHERE, GROUP BY and HAVING clauses; only LIMIT can follow it.
Example :
SELECT something
FROM table_name
ORDER BY characteristics

Sorting by Column Position : ORDER BY 2,3 (2nd column and 3rd column of the SELECT list)
Sort Direction : DESC (descending order) and ASC (ascending order, the default).
*Remember : a sort direction only applies to the column it follows. (specify ASC or DESC for each
individual column).
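
A short sketch of sorting by name and by position, with a different direction on each column. It uses Python's sqlite3 module and four made-up Products rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Products (ProductName TEXT, UnitPrice REAL, SupplierID INTEGER)")
# Made-up rows for illustration
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [("Tofu", 23.0, 6), ("Konbu", 6.0, 6), ("Caviar", 80.0, 9), ("Rice", 15.0, 9)],
)

# DESC applies only to SupplierID; UnitPrice is sorted ascending
by_name = [
    row[0]
    for row in conn.execute(
        "SELECT ProductName, UnitPrice, SupplierID FROM Products "
        "ORDER BY SupplierID DESC, UnitPrice ASC"
    )
]

# Same sort written with column positions: 3 = SupplierID, 2 = UnitPrice
by_position = [
    row[0]
    for row in conn.execute(
        "SELECT ProductName, UnitPrice, SupplierID FROM Products ORDER BY 3 DESC, 2 ASC"
    )
]
print(by_name)
```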

Part 5 : Math Operations :


Example :
SELECT ProductID
, UnitsOnOrder
, UnitPrice
, UnitsOnOrder * UnitPrice AS Total_Order_Cost -- the result appears in a column named Total_Order_Cost
FROM Products;

Example 1 :
…...
,(UnitPrice - Discount) * Quantity AS Total_Cost
FROM OrderDetails;
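
The parentheses in (UnitPrice - Discount) * Quantity matter, because multiplication is done before subtraction. A sketch with Python's sqlite3 module; the two OrderDetails rows are made-up numbers, and Discount is treated as an amount subtracted from the price, as in the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE OrderDetails (UnitPrice REAL, Discount REAL, Quantity INTEGER)")
# Made-up rows for illustration
conn.executemany("INSERT INTO OrderDetails VALUES (?, ?, ?)", [(10.0, 2.0, 3), (5.0, 0.0, 4)])

rows = conn.execute("""
    SELECT (UnitPrice - Discount) * Quantity AS Total_Cost   -- subtraction first
         , UnitPrice - Discount * Quantity   AS No_Parens    -- multiplication first
    FROM OrderDetails
    ORDER BY rowid
""").fetchall()
print(rows)
```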

Part 6 : Aggregate Functions + Distinct statement :


- Include : COUNT(), AVG(), MAX(), MIN(), SUM()
- Used to summarize data : finding the highest and lowest values,
the total number of rows, the average value.
Example :
SELECT AVG(UnitPrice) AS avg_price
FROM Products;
Example 1 :
SELECT COUNT(CustomerID) AS total_customers
FROM Customers;
*remember : null values are ignored by MAX and MIN (and by AVG, SUM and COUNT(column)); COUNT(*) counts every row.
Example 2 :
SELECT MAX(UnitPrice) AS max_prod_price
, MIN(UnitPrice) AS min_prod_price
FROM Products;

DISTINCT Statement:
- Use DISTINCT to return only unique values (duplicates are removed).
Example :
SELECT COUNT(DISTINCT CustomerID) -- DISTINCT can also be used without an aggregate function.
FROM Customers;
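
How the aggregate functions treat null values can be checked on a small table. A sketch with Python's sqlite3 module; the Orders table and its four rows (one with no price, one with no customer) are assumptions made up for the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Orders (CustomerID TEXT, UnitPrice REAL)")
# Made-up rows; None is stored as NULL
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("A", 10.0), ("A", 20.0), ("B", None), (None, 30.0)],
)

stats = conn.execute("""
    SELECT COUNT(*)           -- every row, NULLs included
         , COUNT(UnitPrice)   -- only rows where UnitPrice is not NULL
         , AVG(UnitPrice)
         , MAX(UnitPrice)
         , MIN(UnitPrice)
    FROM Orders
""").fetchone()

# DISTINCT removes duplicates before counting; NULL is not counted
distinct_customers = conn.execute("SELECT COUNT(DISTINCT CustomerID) FROM Orders").fetchone()[0]
print(stats, distinct_customers)
```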

Part 7 : Grouping Data with SQL (Group By and HAVING clauses):


GROUP BY clause :
- Every column in the SELECT statement that is not inside an aggregate function must be listed in the GROUP BY clause.
SELECT Region, COUNT(CustomerID) AS total_customers
FROM Customers
GROUP BY Region;
HAVING clause :
- Filters groups; it is applied after the data has been grouped by the GROUP BY clause.

Example :
SELECT CustomerID
, COUNT(*) AS orders
FROM Orders
GROUP BY CustomerID
HAVING COUNT(*) >= 2;
-- In this example, we select the customers who have 2 or more orders.

Compare WHERE vs HAVING :


- WHERE filters before data is grouped.
- HAVING filters after data is grouped.
- Rows eliminated by the WHERE clause won't be included in any group.
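
The difference between WHERE and HAVING shows up when both are used in one query. A sketch with Python's sqlite3 module; the Orders rows and Freight values are made up so that the two queries give different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Orders (CustomerID TEXT, Freight REAL)")
# Made-up rows for illustration
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("A", 120.0), ("A", 30.0), ("B", 50.0), ("C", 200.0), ("C", 150.0), ("C", 10.0)],
)

# HAVING only: count all orders per customer, then keep groups with 2 or more
all_orders = conn.execute("""
    SELECT CustomerID, COUNT(*) AS orders
    FROM Orders
    GROUP BY CustomerID
    HAVING COUNT(*) >= 2
    ORDER BY CustomerID
""").fetchall()

# WHERE + HAVING: rows with Freight <= 100 are dropped before grouping
heavy_orders = conn.execute("""
    SELECT CustomerID, COUNT(*) AS orders
    FROM Orders
    WHERE Freight > 100
    GROUP BY CustomerID
    HAVING COUNT(*) >= 2
    ORDER BY CustomerID
""").fetchall()
print(all_orders, heavy_orders)
```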

MODULE 3 : Subqueries and Joins Method in SQL :

Part 1 : Using subqueries :

- Subqueries (queries nested inside another query) help us merge data from 2 or more tables or from multiple sources.
- They help us add other filtering criteria.
Example :
-- In this query, the DBMS performs 2 operations :
-- 1. The inner query gets the customer IDs of the orders with Freight > 100.
-- 2. That list is passed to the WHERE clause and the overall SELECT statement is processed.
SELECT CustomerID
, CompanyID
, Region
FROM Customers
WHERE CustomerID IN (SELECT CustomerID
                     FROM Orders
                     WHERE Freight > 100);
Another example : written in an indented way :
SELECT Customer_name, Customer_contact
FROM Customers
WHERE cust_id IN
    (SELECT Customer_id
     FROM Orders
     WHERE order_number IN (SELECT order_number
                            FROM OrderItems
                            WHERE prod_name = 'Toothbrush'));

Example 2 using Subquery (HackerRank) :


SELECT CITY, LENGTH(CITY)
FROM STATION
WHERE CITY IN ((SELECT CITY FROM STATION ORDER BY LENGTH(CITY), CITY LIMIT 1),
(SELECT CITY FROM STATION ORDER BY LENGTH(CITY) DESC, CITY LIMIT 1))

Example 3 :
Counting the orders of a single customer :
SELECT COUNT(*) AS orders
FROM Orders
WHERE customer_id = '143569';

The same count as a subquery that runs once for every customer :
SELECT customer_name
, customer_state
, (SELECT COUNT(*) AS orders
   FROM Orders
   WHERE Orders.customer_id = Customers.customer_id) AS orders
FROM Customers
ORDER BY customer_name;

Output looks like this (example) :

| customer_name | customer_state | orders |
|---------------|----------------|--------|
| Customer A    | CA             | 2      |
| Customer B    | NY             | 1      |
| Customer C    | TX             | 2      |
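
The two subquery styles above (a subquery inside IN, and a subquery in the SELECT list that is re-run for each customer) can be sketched with Python's sqlite3 module. The customer names, states and order counts follow the example output table; the IDs and freight values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Customers (customer_id INTEGER, customer_name TEXT, customer_state TEXT)")
conn.execute("CREATE TABLE Orders (order_id INTEGER, customer_id INTEGER, freight REAL)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Customer A", "CA"), (2, "Customer B", "NY"), (3, "Customer C", "TX")],
)
# Made-up freight values
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 1, 120.0), (2, 1, 30.0), (3, 2, 50.0), (4, 3, 200.0), (5, 3, 150.0)],
)

# Subquery in WHERE: customers having at least one order with freight > 100
big_freight = [
    row[0]
    for row in conn.execute("""
        SELECT customer_name
        FROM Customers
        WHERE customer_id IN (SELECT customer_id FROM Orders WHERE freight > 100)
        ORDER BY customer_name
    """)
]

# Subquery in SELECT: the inner COUNT is evaluated for each customer row
order_counts = conn.execute("""
    SELECT customer_name
         , customer_state
         , (SELECT COUNT(*)
            FROM Orders
            WHERE Orders.customer_id = Customers.customer_id) AS orders
    FROM Customers
    ORDER BY customer_name
""").fetchall()
print(big_freight, order_counts)
```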

Part 4 : Joining tables :


- Associate correct records from each table on the fly.
- Allows data retrieval from multiple tables in 1 query.

Part 5 : Cartesian (Cross) join :


CROSS JOIN : each row from the first table is joined with every row of the other table.

Example :
SELECT product_name
, unit_price
, company_name
FROM suppliers CROSS JOIN products;

INPUT : Table 1 (suppliers), Table 2 (products).

OUTPUT : the number of rows in the 1st table multiplied by the number of rows in the 2nd.
Example : if there are 29 records in table 1 and 77 records in table 2, the result will have 29 x 77 = 2,233 rows.
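
The 29 x 77 row count can be reproduced with placeholder data. A sketch with Python's sqlite3 module; the supplier and product names and prices are generated fillers, only the row counts come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE suppliers (company_name TEXT)")
conn.execute("CREATE TABLE products (product_name TEXT, unit_price REAL)")
# Generated placeholder rows: 29 suppliers and 77 products
conn.executemany("INSERT INTO suppliers VALUES (?)", [(f"supplier_{i}",) for i in range(29)])
conn.executemany("INSERT INTO products VALUES (?, ?)", [(f"product_{i}", 1.0) for i in range(77)])

# Every supplier row is paired with every product row
pairs = conn.execute("SELECT COUNT(*) FROM suppliers CROSS JOIN products").fetchone()[0]
print(pairs)
```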

Part 6 : INNER JOIN :

- An inner join is used when we want only the rows that have matching values in both tables.
- The result combines the columns of the two tables for the rows whose join values match.
Example :
SELECT suppliers.CompanyName -- CompanyName comes from the suppliers table
, ProductName -- ProductName and UnitPrice come from Products
, UnitPrice
FROM suppliers INNER JOIN Products
ON suppliers.supplierID = Products.supplierID; -- condition used to join the 2 tables

Example 2 :
SELECT o.OrderID, c.CompanyName, e.LastName
FROM ((Orders o INNER JOIN Customers c ON o.CustomerID = c.CustomerID)
INNER JOIN Employees e ON o.EmployeeID = e.EmployeeID);
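
An inner join drops rows that have no match on either side. A sketch with Python's sqlite3 module; the supplier and product rows are made up, with one supplier that has no products and one product whose supplierID matches no supplier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE suppliers (supplierID INTEGER, CompanyName TEXT)")
conn.execute("CREATE TABLE Products (ProductName TEXT, UnitPrice REAL, supplierID INTEGER)")
# Made-up rows: supplier 3 has no products, and product 'Orphan' has no supplier
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex"), (3, "Initech")],
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [("Tofu", 23.0, 1), ("Konbu", 6.0, 1), ("Rice", 15.0, 2), ("Orphan", 1.0, 99)],
)

joined = conn.execute("""
    SELECT suppliers.CompanyName
         , ProductName
         , UnitPrice
    FROM suppliers INNER JOIN Products
         ON suppliers.supplierID = Products.supplierID
    ORDER BY ProductName
""").fetchall()
print(joined)
```

Only the three products with a matching supplier appear; 'Initech' and 'Orphan' are left out of the result.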
