Unit 3 - Rdbms Notes
Normalization
Normalization is a process used in database design to
systematically organize data and eliminate redundancy and
inconsistencies. It involves breaking down large tables into smaller,
more focused tables and defining relationships between them. This
process helps to improve data integrity, reduce data duplication,
and make the database more efficient and maintainable.
1. Poor Normalization
3. Lack of Indexing
Selection (σ):
σ GPA>3.5(Students)
Projection (π):
π StudentID, Name(Students)
Union (∪):
Freshmen ∪ Sophomores
Intersection (∩):
Honors ∩ MathClub
Difference (−):
Freshmen − Honors
Cartesian Product (×):
Students × Courses
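The operators above can be tried out on small in-memory relations. The following is a minimal Python sketch, not a database engine: the sample rows, the `Courses` column, and the helper names `select`, `project`, and `product` are assumptions made for illustration.

```python
# Hypothetical sample data: rows and column names are illustrative only.
students = [
    {"StudentID": 1, "Name": "Asha", "GPA": 3.9},
    {"StudentID": 2, "Name": "Ben", "GPA": 3.2},
    {"StudentID": 3, "Name": "Chen", "GPA": 3.7},
]
courses = [{"CourseID": "C1"}, {"CourseID": "C2"}]


def select(relation, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection (pi): keep only the named columns, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def product(left, right):
    """Cartesian product: pair every left row with every right row."""
    return [{**l, **r} for l in left for r in right]


# Union-compatible relations modelled as Python sets of StudentIDs.
freshmen, sophomores, honors, math_club = {1, 2}, {3}, {1, 3}, {3}

high_gpa = select(students, lambda row: row["GPA"] > 3.5)
id_and_name = project(students, ["StudentID", "Name"])

print([row["Name"] for row in high_gpa])  # ['Asha', 'Chen']
print(freshmen | sophomores)              # union
print(honors & math_club)                 # intersection
print(freshmen - honors)                  # difference
print(len(product(students, courses)))    # 3 * 2 = 6 rows
```

Set operators map directly onto Python's `|`, `&`, and `-`, which is why union, intersection, and difference need no helper here.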
Join (⋈):
1. Combines rows from two or more relations, based on a
related column between them.
2. Types of Joins:
1. Natural Join: Joins relations based on common
attributes.
2. Equijoin: Joins relations based on an equality
condition.
3. Theta Join: Joins relations based on a general
comparison operator (e.g., <, >, ≠).
4. Outer Join: Preserves tuples from one relation
even if they don't have matching tuples in the
other relation. (Left Outer Join, Right Outer Join,
Full Outer Join)
3. Example: To join the "Students" and
"Enrollments" relations based on the
"StudentID" attribute:
Students ⋈ Enrollments
Denormalization in DBMS
Denormalization is a database optimization technique that involves
adding redundant data to one or more tables in a normalized
database. This is done to improve read performance by reducing
the number of joins required to retrieve data. While normalization
is a technique to eliminate redundancy and improve data integrity,
denormalization intentionally introduces redundancy to enhance
performance.
Why Denormalize?
Performance Improvement:
o Reduces Join Operations: By storing redundant data,
denormalization can reduce the number of joins
required to retrieve information, leading to faster query
execution.
o Improves Read Performance: In read-heavy
workloads, denormalization can significantly boost
performance.
When to Denormalize:
Risks of Denormalization:
Denormalization Techniques:
Example:
Normalized Tables:
SQL
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(50),
    Address VARCHAR(100)
);
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10,2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
Denormalized Table:
SQL
CREATE TABLE OrdersWithCustomerInfo (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10,2),
    CustomerName VARCHAR(50),
    Address VARCHAR(100),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
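To see the trade-off in practice, the sketch below loads the three tables into an in-memory SQLite database through Python's sqlite3 module and compares a normalized read (with a join) against a denormalized read (without one). The sample customer and order rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(50),
    Address VARCHAR(100)
);
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10,2),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
CREATE TABLE OrdersWithCustomerInfo (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10,2),
    CustomerName VARCHAR(50),
    Address VARCHAR(100),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
-- Hypothetical sample rows.
INSERT INTO Customers VALUES (1, 'Asha', '12 Lake Road');
INSERT INTO Orders VALUES (100, 1, '2024-01-15', 250.00);
-- The denormalized row repeats the customer's name and address.
INSERT INTO OrdersWithCustomerInfo
    VALUES (100, 1, '2024-01-15', 250.00, 'Asha', '12 Lake Road');
""")

# Normalized read: a join is needed to get the customer name.
normalized = conn.execute("""
    SELECT o.OrderID, c.CustomerName
    FROM Orders AS o JOIN Customers AS c ON o.CustomerID = c.CustomerID
""").fetchall()

# Denormalized read: the same answer from a single table, no join.
denormalized = conn.execute(
    "SELECT OrderID, CustomerName FROM OrdersWithCustomerInfo"
).fetchall()
print(normalized == denormalized)  # True

# The cost: an address change in Customers does not reach the redundant copy.
conn.execute("UPDATE Customers SET Address = '9 Hill Street' WHERE CustomerID = 1")
stale = conn.execute(
    "SELECT Address FROM OrdersWithCustomerInfo WHERE OrderID = 100"
).fetchone()[0]
print(stale)  # 12 Lake Road
```

The last query shows the main risk of redundancy: the copy in OrdersWithCustomerInfo stays out of date until the application updates it as well.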
2. Performance Tuning:
5. Scalability:
1. Inner Join
Syntax:
Table A
Number Square
2 4
3 9
Table B
Number Cube
2 8
3 27
A ⋈ B
Output
Number Square Cube
2 4 8
3 9 27
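The inner join of Table A and Table B can be reproduced with a short script. This is a minimal sketch that assumes the tables are named TableA and TableB (as in the outer join queries in these notes) and runs the SQL through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Number INT, Square INT);
CREATE TABLE TableB (Number INT, Cube INT);
INSERT INTO TableA VALUES (2, 4), (3, 9);
INSERT INTO TableB VALUES (2, 8), (3, 27);
""")

# Only rows whose Number appears in both tables are returned.
rows = conn.execute("""
    SELECT TableA.Number, TableA.Square, TableB.Cube
    FROM TableA INNER JOIN TableB ON TableA.Number = TableB.Number
    ORDER BY TableA.Number
""").fetchall()
print(rows)  # [(2, 4, 8), (3, 9, 27)]
```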
2. Outer Join
An outer join is a type of join that retrieves matching as well as
non-matching records from related tables. There are three types of
outer join:
Left outer join
Right outer join
Full outer join
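Left Outer Join
It is also called a left join. This type of outer join retrieves all
records from the left table and the matching records from the right
table. Left-table records that have no match in the right table get
NULL in the right table's columns.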
Example: Suppose there are two tables Table A and Table B
Table A
Number Square
2 4
3 9
4 16
Table B
Number Cube
2 8
3 27
5 125
A ⟕ B
Output
Number Square Cube
2 4 8
3 9 27
4 16 NULL
SQL Query:
SELECT * FROM TableA LEFT OUTER JOIN TableB
ON TableA.Number = TableB.Number;
Explanation: In a left outer join we keep every row from the left
table (here Table A). Table B has no row for Number 4, so there is
no Cube value to pair with it, and the Cube column is marked as
NULL.
Right Outer Join
It is also called a right join. This type of outer join retrieves all
records from the right table and the matching records from the left
table. Right-table records that have no match in the left table get
NULL in the left table's columns.
Example: Table A and Table B are the same as in the left outer
join
A ⟖ B
Output:
Number Square Cube
2 4 8
3 9 27
5 NULL 125
SQL Query:
SELECT * FROM TableA RIGHT OUTER JOIN TableB
ON TableA.Number = TableB.Number;
Explanation: In a right outer join we keep every row from the right
table (here Table B). Table A has no row for Number 5, so there is
no Square value to pair with it, and the Square column is marked as
NULL.
Full Outer Join
FULL JOIN creates the result set by combining the results of both
LEFT JOIN and RIGHT JOIN. The result set contains all the rows from
both tables. Where a row has no match, the columns that come from
the other table contain NULL values.
Example: Table A and Table B are the same as in the left outer
join
A ⟗ B
Output:
Number Square Cube
2 4 8
3 9 27
4 16 NULL
5 NULL 125
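Some database engines do not accept FULL OUTER JOIN directly. A common workaround, sketched below under that assumption, is to take the UNION of two left joins with the table order swapped. The script uses Python's sqlite3 module with the same Table A and Table B as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Number INT, Square INT);
CREATE TABLE TableB (Number INT, Cube INT);
INSERT INTO TableA VALUES (2, 4), (3, 9), (4, 16);
INSERT INTO TableB VALUES (2, 8), (3, 27), (5, 125);
""")

# First branch: every row of TableA (left join).
# Second branch: every row of TableB (left join with the tables swapped).
# UNION removes the matching rows that both branches produce.
rows = conn.execute("""
    SELECT a.Number, a.Square, b.Cube
    FROM TableA AS a LEFT JOIN TableB AS b ON a.Number = b.Number
    UNION
    SELECT b.Number, a.Square, b.Cube
    FROM TableB AS b LEFT JOIN TableA AS a ON a.Number = b.Number
    ORDER BY 1
""").fetchall()
for row in rows:
    print(row)
```

Python prints SQL NULL as None, so the last two rows appear as (4, 16, None) and (5, None, 125).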
3. Self Join
A SQL SELF JOIN is a type of join operation where a table
is joined with itself. The table is referenced twice in the
query under two different aliases, so it behaves as if a
virtual copy of the table existed alongside the original.
Self joins are used to compare or combine data within the
same table by relating some rows of the table to other rows
of the same table.
Syntax
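As an illustration of the aliasing idea, here is a minimal sketch of a self join run through Python's sqlite3 module. The Employees table, its ManagerID column, and the sample rows are hypothetical and are not taken from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: ManagerID points at another row of the same table.
CREATE TABLE Employees (
    EmpID INT PRIMARY KEY,
    Name VARCHAR(50),
    ManagerID INT
);
INSERT INTO Employees VALUES (1, 'Asha', NULL), (2, 'Ben', 1), (3, 'Chen', 1);
""")

# The same table appears twice: alias e is the employee, alias m the manager.
rows = conn.execute("""
    SELECT e.Name AS Employee, m.Name AS Manager
    FROM Employees AS e
    JOIN Employees AS m ON e.ManagerID = m.EmpID
    ORDER BY e.EmpID
""").fetchall()
print(rows)  # [('Ben', 'Asha'), ('Chen', 'Asha')]
```

Asha has no manager, so the inner join leaves her out of the result. A LEFT JOIN would keep her with a NULL manager.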
Join Dependency
A Join Dependency on a relation schema R specifies a constraint
on the states r of R: every legal state r of R should have a
lossless join decomposition into R1, R2, ..., Rn. In a
database management system, join dependency is a
generalization of the idea of multivalued dependency.
E_Name Company
Rohan Comp1
Harpreet Comp2
Anant Comp3
E_Name Product
Rohan Jeans
Harpreet Jacket
Anant TShirt
Company Product
Comp1 Jeans
Comp2 Jacket
Comp3 TShirt
Step 2- Next, let's perform the natural join of the above table
with R3:
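The two join steps can be checked with a small script. This is a sketch under the assumption that the three tables shown above are, in order, R1 (E_Name, Company), R2 (E_Name, Product), and R3 (Company, Product). Only the name R3 appears in the text, and the helper `natural_join` is introduced here for illustration.

```python
def natural_join(left, right):
    """Natural join: combine rows that agree on every shared column name."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[c] == r[c] for c in shared)
    ]


r1 = [
    {"E_Name": "Rohan", "Company": "Comp1"},
    {"E_Name": "Harpreet", "Company": "Comp2"},
    {"E_Name": "Anant", "Company": "Comp3"},
]
r2 = [
    {"E_Name": "Rohan", "Product": "Jeans"},
    {"E_Name": "Harpreet", "Product": "Jacket"},
    {"E_Name": "Anant", "Product": "TShirt"},
]
r3 = [
    {"Company": "Comp1", "Product": "Jeans"},
    {"Company": "Comp2", "Product": "Jacket"},
    {"Company": "Comp3", "Product": "TShirt"},
]

step1 = natural_join(r1, r2)     # joined on E_Name
step2 = natural_join(step1, r3)  # joined on Company and Product
for row in step2:
    print(row["E_Name"], row["Company"], row["Product"])
```

With this data the second join neither adds nor removes rows, which is what a lossless join decomposition requires.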
Multi-valued dependency
o Multivalued dependency occurs when two attributes in a
table are independent of each other but, both depend on a
third attribute.
o A multivalued dependency consists of at least two attributes
that are dependent on a third attribute. That is why it always
requires at least three attributes.
BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines
MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".
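One way to test whether a multivalued dependency holds in a three-column table is to check, for each value of the determining attribute, that every combination of the two dependent values is present. The sketch below does this in Python. The bike rows and the helper name `satisfies_mvd` are assumptions made for illustration.

```python
from itertools import product


def satisfies_mvd(rows, x, y, z):
    """Check X ->> Y in a relation whose columns are x, y and z.

    For every X value, the (Y, Z) pairs present must be exactly the
    cross product of that X value's Y values and Z values.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {pair[0] for pair in pairs}
        zs = {pair[1] for pair in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True


# Hypothetical rows: model M1 was made in two years and in two colors.
bikes = [
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2007, "COLOR": "Black"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2007, "COLOR": "Red"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2008, "COLOR": "Black"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2008, "COLOR": "Red"},
]

print(satisfies_mvd(bikes, "BIKE_MODEL", "MANUF_YEAR", "COLOR"))      # True
# Dropping one combination breaks the independence of year and color.
print(satisfies_mvd(bikes[:3], "BIKE_MODEL", "MANUF_YEAR", "COLOR"))  # False
```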
Concepts of Normalization - 1NF, 2NF, 3NF, 4NF
and 5NF with examples
What is Normalization?
Advantages of Normalization :-
First Normal Form (1NF)
For a table to be in first normal form, it must follow these rules:
a single cell must not hold more than one value (atomicity)
there must be a primary key for identification
no duplicated rows or columns
each column must have only one value for each row in the
table
Examples of 1NF :-
Imagine we're building a restaurant management application. That
application needs to store data about the company's employees
and it starts out by creating the following table of employees:
All the entries are atomic and there is a composite primary key
(employee_id, job_code), so the table is in the first normal form
(1NF).
But even if you only know someone's employee_id, you can
determine their name, home_state, and state_code (because they
belong to the same person). This means name, home_state,
and state_code depend on employee_id alone, which is only a part
of the composite primary key. This is a partial dependency, so the
table is not in 2NF. We should move these columns to a different
table to make it 2NF.
employees Table
employee_id name state_code home_state
E001 Alice 26 Michigan
E002 Bob 56 Wyoming
E003 Alice 56 Wyoming
jobs table
job_code job
J01 Chef
J02 Waiter
J03 Bartender
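For a table to be in third normal form (3NF), it must:
be in 2NF
have no transitive dependency (a non-key column must not depend on
another non-key column).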
employee_roles Table
employee_id job_code
E001 J01
E001 J02
E002 J02
E002 J03
E003 J01
employees Table
employee_id name state_code
E001 Alice 26
E002 Bob 56
E003 Alice 56
jobs Table
job_code job
J01 Chef
J02 Waiter
J03 Bartender
states Table
state_code home_state
26 Michigan
56 Wyoming
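The four 3NF tables still hold everything needed to rebuild a single wide view with one row per employee and job. The sketch below loads them into an in-memory SQLite database through Python's sqlite3 module and joins them back together. The column types and key declarations are assumptions, since the notes give only the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (state_code INT PRIMARY KEY, home_state VARCHAR(50));
CREATE TABLE employees (
    employee_id VARCHAR(10) PRIMARY KEY,
    name VARCHAR(50),
    state_code INT REFERENCES states(state_code)
);
CREATE TABLE jobs (job_code VARCHAR(10) PRIMARY KEY, job VARCHAR(50));
CREATE TABLE employee_roles (
    employee_id VARCHAR(10) REFERENCES employees(employee_id),
    job_code VARCHAR(10) REFERENCES jobs(job_code),
    PRIMARY KEY (employee_id, job_code)
);
INSERT INTO states VALUES (26, 'Michigan'), (56, 'Wyoming');
INSERT INTO employees VALUES
    ('E001', 'Alice', 26), ('E002', 'Bob', 56), ('E003', 'Alice', 56);
INSERT INTO jobs VALUES ('J01', 'Chef'), ('J02', 'Waiter'), ('J03', 'Bartender');
INSERT INTO employee_roles VALUES
    ('E001', 'J01'), ('E001', 'J02'), ('E002', 'J02'),
    ('E002', 'J03'), ('E003', 'J01');
""")

# Each fact is stored once; the joins reassemble the wide view on demand.
rows = conn.execute("""
    SELECT r.employee_id, e.name, r.job_code, j.job, e.state_code, s.home_state
    FROM employee_roles AS r
    JOIN employees AS e ON e.employee_id = r.employee_id
    JOIN jobs AS j ON j.job_code = r.job_code
    JOIN states AS s ON s.state_code = e.state_code
    ORDER BY r.employee_id, r.job_code
""").fetchall()
for row in rows:
    print(row)
```

Renaming a state now means changing one row in states, instead of every employee row that mentions it.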
Example:-
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
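Joining the two decomposed tables back on STU_ID shows what the independence of COURSE and HOBBY means in practice. The sketch below uses Python's sqlite3 module. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT_COURSE (STU_ID INT, COURSE VARCHAR(50));
CREATE TABLE STUDENT_HOBBY (STU_ID INT, HOBBY VARCHAR(50));
INSERT INTO STUDENT_COURSE VALUES
    (21, 'Computer'), (21, 'Math'), (34, 'Chemistry'),
    (74, 'Biology'), (59, 'Physics');
INSERT INTO STUDENT_HOBBY VALUES
    (21, 'Dancing'), (21, 'Singing'), (34, 'Dancing'),
    (74, 'Cricket'), (59, 'Hockey');
""")

# Student 21 has two courses and two hobbies, so the join pairs them all.
rows = conn.execute("""
    SELECT c.STU_ID, c.COURSE, h.HOBBY
    FROM STUDENT_COURSE AS c
    JOIN STUDENT_HOBBY AS h ON c.STU_ID = h.STU_ID
    WHERE c.STU_ID = 21
    ORDER BY c.COURSE, h.HOBBY
""").fetchall()
for row in rows:
    print(row)
```

The join returns four rows for student 21, one for each course and hobby combination, while the STUDENT listing above shows only two of them. Under the multivalued dependency reading, all four combinations are implied, and this is the repetition that the 4NF decomposition avoids storing.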
Fifth Normal Form/Project-Join Normal Form (5NF)
The fifth normal form (5NF) is also called the Project-Join Normal
Form (PJNF). A relation is in 5NF if it is in 4NF and every
non-trivial join dependency that holds on it is implied by its
candidate keys. In practice this means the relation cannot be split
any further into smaller tables that rejoin without loss.
Table ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
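Whether R1, R2, and R3 together carry the same information as ACP can be checked by projecting ACP onto each pair of columns and joining the projections back. The following is a minimal Python sketch using sets of tuples. The variable names are chosen here for illustration.

```python
# Table ACP as a set of (Agent, Company, Product) tuples.
acp = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

# Projections of ACP onto each pair of columns.
r1 = {(agent, company) for agent, company, _ in acp}
r2 = {(agent, product) for agent, _, product in acp}
r3 = {(company, product) for _, company, product in acp}

# Natural join of R1, R2 and R3: keep a combination only when
# all three pairwise facts are present.
rejoined = {
    (agent, company, product)
    for (agent, company) in r1
    for (agent2, product) in r2
    if agent == agent2 and (company, product) in r3
}

print(sorted(r1))       # matches Table R1
print(rejoined == acp)  # True: the decomposition is lossless for this data
```

The check covers only this one table state. A join dependency is a constraint on every legal state of the relation, so a single passing instance is evidence, not proof.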
42 abc 17
43 pqr 18
44 xyz 18
42 abc 17
43 pqr 18
44 xyz 18
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2