SQL Theory for Data Science
4. Who uses SQL ? - Backend developers, QA engineers, data architects, database administrators,
data scientists, ...
Data Scientists :
Are end users of a database.
Use SQL to query and retrieve data.
Should think about what they are doing and why, and spend some time understanding how the data
items relate to each other ...
The relational database model lets us write queries to retrieve, update and write data.
2. Data model building blocks : Entity (a person, place or event ...), Attribute (a characteristic of an
entity), Relationship (one-to-many, many-to-many, one-to-one).
3. ER Diagrams : an ER diagram is composed of entity types and specifies the relationships that can exist
between instances of those entity types.
4. Relationships :
A One-to-Many (1:M) relationship : a painter can paint many paintings, and each painting is painted
by one painter (example).
A Many-to-Many (M:N) relationship : an employee can learn many skills, and each skill can be learned by
many employees.
A One-to-One (1:1) relationship : an employee manages one store, and each store is managed by only one employee.
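The three relationship types above can be sketched as tables. This is a minimal sketch using Python's built-in sqlite3 module; every table and column name is hypothetical (none of them come from these notes), and it shows one common way to model the relationships, not the only one.

```python
import sqlite3

# Hypothetical schema: all names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE painter  (painter_id INTEGER PRIMARY KEY, name TEXT);
-- 1:M: each painting row points at exactly one painter.
CREATE TABLE painting (painting_id INTEGER PRIMARY KEY, title TEXT,
                       painter_id INTEGER NOT NULL REFERENCES painter(painter_id));
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE skill    (skill_id INTEGER PRIMARY KEY, name TEXT);
-- M:N: a junction table holds one row per (employee, skill) pair.
CREATE TABLE employee_skill (
    employee_id INTEGER REFERENCES employee(employee_id),
    skill_id    INTEGER REFERENCES skill(skill_id),
    PRIMARY KEY (employee_id, skill_id));
-- 1:1: UNIQUE on the foreign key stops one employee managing two stores.
CREATE TABLE store (store_id INTEGER PRIMARY KEY,
                    manager_id INTEGER UNIQUE REFERENCES employee(employee_id));
""")
conn.execute("INSERT INTO painter VALUES (1, 'Painter A')")
conn.executemany("INSERT INTO painting VALUES (?, ?, ?)",
                 [(1, 'Work 1', 1), (2, 'Work 2', 1)])
conn.execute("INSERT INTO employee VALUES (1, 'Employee A')")
conn.execute("INSERT INTO store VALUES (1, 1)")

# One painter, many paintings.
paintings_by_painter = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE painter_id = 1").fetchone()[0]

# A second store for the same manager violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO store VALUES (2, 1)")
    second_store_allowed = True
except sqlite3.IntegrityError:
    second_store_allowed = False
```

The junction table is what turns one M:N relationship into two 1:M relationships, which is how it usually appears in an ER diagram.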
ER Diagram Notation : Chen Notation, Crow’s Foot Notation, UML Class Diagram Notation.
---> Try to understand notations when reading ER diagrams.
or SELECT prod_name
, prod_id
, prod_price
FROM Products;
Limit output using LIMIT :
SELECT columns we wish
FROM specific table
LIMIT number of records
Example :
SELECT prod_name
FROM Products
LIMIT 5;
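To try the LIMIT example, here is a small sketch using Python's built-in sqlite3 module. The Products rows are made up for illustration. Note that LIMIT is supported by SQLite, MySQL and PostgreSQL, but other systems use different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (prod_id INTEGER PRIMARY KEY, prod_name TEXT, prod_price REAL)")
# Eight made-up products (assumption: illustrative data only).
conn.executemany("INSERT INTO Products (prod_name, prod_price) VALUES (?, ?)",
                 [(f"item{i}", float(i)) for i in range(1, 9)])

# The single semicolon-free statement: SELECT ... FROM ... LIMIT n
rows = conn.execute("SELECT prod_name FROM Products LIMIT 5").fetchall()
print(len(rows))
```

Without an ORDER BY, the database is free to choose which 5 rows it returns, so only the row count is predictable here.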
Example 1 :
SELECT ProductName
, UnitPrice
, SupplierID
FROM Products
WHERE ProductName = 'Tofu'; # or filter on a numeric value : WHERE UnitPrice >= 75;
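Both filters from the example (a text match and a numeric comparison) can be run as a sketch with Python's sqlite3 module. The rows below are made-up sample data, and the `?` placeholders are sqlite3's way of passing values, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, UnitPrice REAL, SupplierID INTEGER)")
# Made-up rows for illustration.
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [("Tofu", 23.25, 6), ("Chai", 18.0, 1), ("Caviar", 97.0, 7)])

query = "SELECT ProductName, UnitPrice, SupplierID FROM Products WHERE "
# Text values are compared with = ; the value is passed as a parameter.
by_name = conn.execute(query + "ProductName = ?", ("Tofu",)).fetchall()
# Numeric values can use comparison operators such as >= .
by_price = conn.execute(query + "UnitPrice >= ?", (75,)).fetchall()
```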
Sorting by Column Position : ORDER BY 2,3 (sorts by the 2nd, then by the 3rd column of the SELECT list)
Sort Direction : DESC (descending order) and ASC (ascending order, the default).
*Remember : a sort direction applies only to the column it directly follows, so specify ASC or DESC
for each column individually.
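The two sorting rules above (column position, and direction applying per column) can be checked with a small sketch in Python's sqlite3 module. The three rows are made up so that two products share a price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, UnitPrice REAL, SupplierID INTEGER)")
# Made-up rows: A and B have the same price, so the second sort key matters.
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [("A", 10.0, 2), ("B", 10.0, 1), ("C", 20.0, 3)])

# ORDER BY 2 DESC, 3 : price descending, then SupplierID ascending (default).
by_position = conn.execute(
    "SELECT ProductName, UnitPrice, SupplierID FROM Products ORDER BY 2 DESC, 3"
).fetchall()

# DESC here applies to SupplierID only; UnitPrice stays ascending.
by_name = conn.execute(
    "SELECT ProductName, UnitPrice, SupplierID FROM Products "
    "ORDER BY UnitPrice, SupplierID DESC"
).fetchall()
```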
Example 1 :
…...
,(UnitPrice - Discount) * Quantity AS Total_Cost
FROM OrderDetails;
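A calculated column like Total_Cost can be tried with the sketch below (Python's sqlite3 module). The OrderDetails rows are made up, and Discount is assumed to be an amount per unit, as the formula in the example implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderDetails (OrderID INTEGER, UnitPrice REAL, Discount REAL, Quantity INTEGER)")
# Made-up rows; Discount is treated as a per-unit amount (assumption).
conn.executemany("INSERT INTO OrderDetails VALUES (?, ?, ?, ?)",
                 [(1, 10.0, 2.0, 3), (2, 5.0, 0.0, 4)])

cur = conn.execute(
    "SELECT OrderID, (UnitPrice - Discount) * Quantity AS Total_Cost "
    "FROM OrderDetails ORDER BY OrderID")
alias = cur.description[1][0]   # name of the calculated column
totals = cur.fetchall()
```

The parentheses matter: without them, multiplication would be applied before the subtraction.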
DISTINCT Statement:
- use DISTINCT to return only unique values (duplicate values are removed).
Example :
SELECT COUNT(DISTINCT CustomerID) # DISTINCT also works without an aggregate function : SELECT DISTINCT CustomerID
FROM Customers;
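A short sketch of both forms of DISTINCT, using Python's sqlite3 module. It uses a made-up Orders table (rather than Customers) so that the same CustomerID appears more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT)")
# Made-up orders: customer A has 2, B has 1, C has 3.
conn.executemany("INSERT INTO Orders (CustomerID) VALUES (?)",
                 [("A",), ("A",), ("B",), ("C",), ("C",), ("C",)])

# DISTINCT inside an aggregate: count each customer once.
n_customers = conn.execute(
    "SELECT COUNT(DISTINCT CustomerID) FROM Orders").fetchone()[0]

# DISTINCT on its own: list each customer once.
customers = conn.execute(
    "SELECT DISTINCT CustomerID FROM Orders ORDER BY CustomerID").fetchall()
```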
Example :
SELECT CustomerID
, COUNT(*) AS orders
FROM Orders
GROUP BY CustomerID
HAVING COUNT(*) >= 2;
# In this example, we select the customers who placed at least 2 orders.
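The GROUP BY / HAVING query above can be run against made-up data with Python's sqlite3 module. The sketch also runs the query without HAVING, to show that HAVING filters groups after they have been counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT)")
# Made-up orders: customer A has 2, B has 1, C has 3.
conn.executemany("INSERT INTO Orders (CustomerID) VALUES (?)",
                 [("A",), ("A",), ("B",), ("C",), ("C",), ("C",)])

grouped = "SELECT CustomerID, COUNT(*) AS orders FROM Orders GROUP BY CustomerID"
all_groups = conn.execute(grouped + " ORDER BY CustomerID").fetchall()
# HAVING keeps only the groups whose count is at least 2.
frequent = conn.execute(
    grouped + " HAVING COUNT(*) >= 2 ORDER BY CustomerID").fetchall()
```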
- Subqueries (queries nested inside another query) let us combine data from two or more tables or sources.
- They also let us add further filtering criteria.
Example :
SELECT CustomerID #To run this query, the DBMS performs 2 operations :
, CompanyID # 1. Getting the customer IDs of the orders with Freight > 100.
, Region # 2. Passing that list to the WHERE clause and processing
FROM Customers # the overall SELECT statement.
WHERE customerID IN (SELECT customerID
FROM Orders
WHERE Freight > 100);
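The two steps of the subquery can be made visible by running the inner query on its own first. This is a sketch with Python's sqlite3 module and made-up rows; only the columns needed for the illustration are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, Region TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, Freight REAL);
-- Made-up rows for illustration.
INSERT INTO Customers VALUES ('C1', 'West'), ('C2', 'East'), ('C3', 'North');
INSERT INTO Orders VALUES (1, 'C1', 150.0), (2, 'C1', 20.0),
                          (3, 'C2', 99.0),  (4, 'C3', 101.5);
""")

# Step 1: the inner query alone returns the matching customer IDs.
inner = conn.execute(
    "SELECT CustomerID FROM Orders WHERE Freight > 100 ORDER BY CustomerID"
).fetchall()

# Step 2: the outer query keeps the customers whose ID is in that list.
rows = conn.execute("""
    SELECT CustomerID, Region
    FROM Customers
    WHERE CustomerID IN (SELECT CustomerID
                         FROM Orders
                         WHERE Freight > 100)
    ORDER BY CustomerID
""").fetchall()
```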
Another example, written with indentation :
SELECT Customer_name, Customer_contact
FROM Customers
WHERE cust_id IN
    (SELECT Customer_id
     FROM Orders
     WHERE order_number IN (SELECT order_number
                            FROM OrderItems
                            WHERE prod_name = 'Toothbrush'));
Example 3 :
SELECT COUNT(*) AS orders
FROM Orders
WHERE customer_id = '143569';
SELECT customer_name
,customer_state
,(SELECT COUNT(*) AS orders
FROM Orders
WHERE Orders.customer_id = customers.customer_id) AS orders
FROM customers
ORDER BY customer_name;
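The second query of Example 3 uses a subquery as a calculated column: the inner COUNT(*) is evaluated once for each customer row. A sketch with Python's sqlite3 module and made-up rows, including a customer with no orders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                        customer_name TEXT, customer_state TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
-- Made-up rows: Ann has 2 orders, Bob has 1, Cy has none.
INSERT INTO customers VALUES (1, 'Ann', 'CA'), (2, 'Bob', 'NY'), (3, 'Cy', 'TX');
INSERT INTO Orders (customer_id) VALUES (1), (1), (2);
""")

rows = conn.execute("""
    SELECT customer_name
         , customer_state
         , (SELECT COUNT(*)
            FROM Orders
            WHERE Orders.customer_id = customers.customer_id) AS orders
    FROM customers
    ORDER BY customer_name
""").fetchall()
```

Customers without orders still appear, with a count of 0, because the outer query reads every row of customers.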
Example :
SELECT product_name
, unit_price
, company_name
FROM suppliers CROSS JOIN products;
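A CROSS JOIN pairs every row of one table with every row of the other, so the result has (rows in suppliers) x (rows in products) rows. A sketch with Python's sqlite3 module and made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (company_name TEXT);
CREATE TABLE products  (product_name TEXT, unit_price REAL);
-- Made-up rows: 2 suppliers and 3 products.
INSERT INTO suppliers VALUES ('Supplier A'), ('Supplier B');
INSERT INTO products  VALUES ('P1', 1.0), ('P2', 2.0), ('P3', 3.0);
""")

rows = conn.execute("""
    SELECT product_name, unit_price, company_name
    FROM suppliers CROSS JOIN products
""").fetchall()
print(len(rows))  # 2 suppliers x 3 products
```

Because no join condition is used, the pairs are not necessarily meaningful; on large tables the result grows very quickly.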
Example 2 :
SELECT o.OrderID, c.CompanyName, e.LastName
FROM ((Orders o INNER JOIN Customers c ON o.CustomerID = c.CustomerID)
INNER JOIN Employees e ON o.EmployeeID = e.EmployeeID);
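The three-table INNER JOIN of Example 2 can be tried with the sketch below (Python's sqlite3 module, made-up rows). The joins are written without the outer parentheses, which is an equivalent and more portable form; one order refers to a customer that does not exist, to show that INNER JOIN drops unmatched rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, EmployeeID INTEGER);
-- Made-up rows; order 10250 points at a customer that is not in Customers.
INSERT INTO Customers VALUES ('C1', 'Alpha Co'), ('C2', 'Beta Ltd');
INSERT INTO Employees VALUES (5, 'Buchanan'), (6, 'Suyama');
INSERT INTO Orders VALUES (10248, 'C1', 5), (10249, 'C2', 6), (10250, 'C9', 5);
""")

rows = conn.execute("""
    SELECT o.OrderID, c.CompanyName, e.LastName
    FROM Orders o
    INNER JOIN Customers c ON o.CustomerID = c.CustomerID
    INNER JOIN Employees e ON o.EmployeeID = e.EmployeeID
    ORDER BY o.OrderID
""").fetchall()
```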