
Database Systems

Schema Refinement
(Normalization)
Outline

Informal Design Guidelines for good Relation Schemas
Formal concepts of Functional Dependencies
and Normal Forms
– 1NF (First Normal Form)
– 2NF (Second Normal Form)
– 3NF (Third Normal Form)
Introduction
Relational database design: the grouping of
attributes to form "good" relation schemas
Two levels of relation schemas:
– The logical "user view" level
– The storage "base relation" level
Design is concerned mainly with base relations

What are the criteria for "good" base relations?
– Informal guidelines for good relational design
– Formal concepts (normalization) of functional dependencies
and normal forms: 1NF, 2NF, 3NF
Informal Design Guidelines For Relation
Schemas
1. Making sure attribute semantics are clear
2. Reducing redundant information in tuples
3. Reducing NULL values in tuples

Problems that occur in a poorly planned,
unnormalized database where all the data is stored in
one table (a flat-file database) are called anomalies.
Informal Design Guidelines For Relation Schemas

1. Semantics of the Relation Attributes

• Whenever attributes are grouped to form a relation schema, it is
assumed that the attributes belonging to one relation have a certain
real-world meaning and a proper interpretation associated with
them.
• In general, the easier it is to explain the semantics of the relation,
the better the relation schema design is.

Guideline 1: Design a relation schema so that it is easy to explain
its meaning.
Do not combine attributes from multiple entity types and
relationship types into a single relation.
Informal Design Guidelines For Relation Schemas

Example of violating Guideline 1:

A tuple in the EMP_DEPT relation represents a single employee
but also includes additional information: the name (Dname) of the
department for which the employee works and the SSN (Dmgr_ssn)
of the department manager.
Informal Design Guidelines For Relation Schemas

2. Redundant Information in Tuples and Update Anomalies

Redundancy: duplication of the data stored in the database

• Grouping attributes into relation schemas has a significant
effect on storage space.
• One goal of schema design is to minimize the storage space
used by the base relations.
– Mixing attributes of multiple entities may cause
redundancy
Informal Design Guidelines For Relation Schemas
2. Redundant Information in Tuples and Update Anomalies

DNUMBER, DNAME and DMGRSSN form a repeating group: their values
are duplicated for every employee working in the same department.
Informal Design Guidelines For Relation Schemas

2. Redundant Information in Tuples and Update Anomalies

• Redundant information may cause update anomalies.
• Update anomalies:
– Insertion anomalies
– Deletion anomalies
– Modification anomalies
Informal Design Guidelines For Relation Schemas

2. Redundant Information in Tuples and Update Anomalies

EMP_PROJ
SSN PNumber Hours EName PName PLocation

• Insertion Anomalies:
• Occurs when it is impossible to store a fact until another fact is
known.
• Example:
1. Cannot insert a project unless an employee is assigned.
2. Cannot insert an employee unless he/she is assigned to a
project.

Informal Design Guidelines For Relation Schemas

2. Redundant Information in Tuples and Update Anomalies

EMP_PROJ
SSN PNumber Hours EName PName PLocation

• Deletion Anomalies:
• Occur when the deletion of one fact causes other facts to be
deleted.
• Example:
1. When a project is deleted, it will result in deleting all the
employees who work on that project.
2. If an employee is the sole employee on a project, deleting
that employee would result in deleting the corresponding
project.

Informal Design Guidelines For Relation Schemas

2. Redundant Information in Tuples and Update Anomalies

EMP_PROJ
SSN PNumber Hours EName PName PLocation

• Modification Anomalies:
• Occur when a change in one fact requires multiple
modifications.
• Example: Changing the name of project number P1 (for
example) requires the update to be made in the tuples of all
employees working on that project.
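
The modification and deletion anomalies can be made concrete with a small sketch. The rows below are invented sample data for EMP_PROJ, and the helper names rename_project and delete_employee are hypothetical, chosen only for this illustration.

```python
# Hypothetical EMP_PROJ rows: (SSN, PNumber, Hours, EName, PName, PLocation)
emp_proj = [
    ("111", "P1", 10, "Ann", "Payroll", "Houston"),
    ("222", "P1", 20, "Bob", "Payroll", "Houston"),
    ("333", "P2", 5, "Cy", "Audit", "Stafford"),
]

def rename_project(rows, pnumber, new_name):
    """Rename a project; return the new rows and how many tuples were touched."""
    updated, touched = [], 0
    for ssn, pnum, hours, ename, pname, ploc in rows:
        if pnum == pnumber:
            pname = new_name
            touched += 1
        updated.append((ssn, pnum, hours, ename, pname, ploc))
    return updated, touched

def delete_employee(rows, ssn):
    """Delete an employee's tuples; return remaining rows and projects that vanished."""
    remaining = [r for r in rows if r[0] != ssn]
    lost = {r[1] for r in rows} - {r[1] for r in remaining}
    return remaining, lost

# Modification anomaly: one logical change touches several tuples.
_, touched = rename_project(emp_proj, "P1", "Payroll v2")
# Deletion anomaly: removing the sole employee on P2 also removes project P2.
_, lost_projects = delete_employee(emp_proj, "333")
print(touched)        # 2
print(lost_projects)  # {'P2'}
```

In a design where project data lives in its own relation, the rename would touch one tuple and deleting an employee would leave the project in place.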

Informal Design Guidelines For Relation Schemas

2. Redundant Information in Tuples and Update Anomalies

Guideline 2: Design the base relation schemas so that no insertion,
deletion, or modification anomalies are present in the relations.

If any anomalies are present, note them clearly and make sure that
the programs that update the database will operate correctly.
Informal Design Guidelines For Relation Schemas

3. Null Values in Tuples

• In some schema designs many attributes may be grouped together
into a "flat" relation.
• If many of the attributes do not apply to all tuples in the relation,
many null values will appear in those tuples.

Guideline 3: As far as possible, avoid placing attributes in a base
relation whose values may frequently be null.
If nulls are unavoidable, make sure that they apply in exceptional
cases only and do not apply to a majority of tuples in the relation.
Functional Dependencies (FDs)
• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes
• Used to define normal forms for relations
• Represented as X → Y
– A set of attributes X functionally determines a set of attributes Y if
the value of X determines a unique value for Y
– That is, if we know the value of X, we can determine
exactly one value of Y
• Example:
– Social Security Number determines employee name
SSN → ENAME
– The attribute or set of attributes on the left side is called the
determinant, and those on the right are called the dependents
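
A minimal sketch of checking whether an FD is violated by a given set of rows. The function name fd_holds and the sample rows are assumptions made for this illustration. Note that sample data can only refute an FD: a real FD is a constraint that comes from the meaning of the attributes, not from one particular table state.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value; a different
        # Y for the same X violates X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample rows.
employees = [
    {"SSN": "111", "ENAME": "Mary"},
    {"SSN": "122", "ENAME": "Sarah"},
    {"SSN": "133", "ENAME": "Mary"},
]

ssn_to_name = fd_holds(employees, ["SSN"], ["ENAME"])  # True
name_to_ssn = fd_holds(employees, ["ENAME"], ["SSN"])  # False: two Marys
```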
Partial Functional Dependency
With a COMPOSITE primary key, every non-key
attribute of the relation should be determined by the whole key.

• Partial dependency: a non-key attribute is determined
by only a part of the COMPOSITE primary key, not the whole key.

CUSTOMER
Cust_ID  Name   Order_ID
101      AT&T   1234
101      AT&T   156
125      Cisco  1250

{Cust_ID, Order_ID} → Name is not a full FD. It is called a partial
dependency, since Cust_ID → Name also holds and Cust_ID is only
part of the key.
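
A minimal sketch of spotting partial dependencies from a list of stated FDs. The function name partial_dependencies is hypothetical, and the check only inspects the FDs as written (it does not derive implied FDs).

```python
def partial_dependencies(key, fds):
    """FDs whose determinant is a proper subset of the composite key
    and whose dependents include at least one non-key attribute."""
    return [(lhs, rhs - key) for lhs, rhs in fds if lhs < key and rhs - key]

# CUSTOMER example: composite key and the two FDs named on the slide.
key = {"Cust_ID", "Order_ID"}
fds = [
    ({"Cust_ID", "Order_ID"}, {"Name"}),  # determinant is the whole key
    ({"Cust_ID"}, {"Name"}),              # determinant is only part of the key
]

found = partial_dependencies(key, fds)
print(found)  # [({'Cust_ID'}, {'Name'})]
```

The `<` operator on Python sets tests for a proper subset, which matches the "part, not the whole" wording of the definition.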
Transitive Dependency
Occurs when a non-key (nonprime) attribute determines another
non-key attribute.
– A nonprime attribute is one that is not a member of any
candidate key

EMPLOYEE
Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sarah   Smith   2        Mktg

Transitive dependency: Emp_ID → Dept_ID → Dept_Name
Transitive Dependency
• A transitive dependency is a functional dependency which holds
by virtue of transitivity. A transitive dependency can occur only in
a relation that has three or more attributes.
• The functional dependency {Book} → {Author Nationality} applies; that is, if
we know the book, we know the author's nationality. Furthermore:
– {Book} → {Author}
– {Author} does not determine {Book}
– {Author} → {Author Nationality}
– Therefore {Book} → {Author Nationality} is a transitive dependency.
• The transitive dependency occurs because a non-key attribute (Author)
determines another non-key attribute (Author Nationality).
Decomposition
• A solution to the problems caused by data redundancy and
functional dependencies
• Decomposition means breaking up a large schema into
multiple smaller schemas
• It helps to remove update anomalies and to maintain data
integrity
EMP_PROJ(SSN, PNumber, Hours, EName, PName, PLocation)
• EMP_PROJ can be decomposed into the following smaller schemas to remove update
anomalies:

EMP(SSN, EName)
WORK_ON(SSN, PNumber, Hours)
PROJECT(PNumber, PName, PLocation)
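
A minimal sketch of decomposition as projection with duplicate removal. The sample rows and the helper name project are assumptions for this illustration, and EMP is taken here to hold only SSN and EName, so that each employee name is stored once.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicate tuples but keeping order."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

# Hypothetical EMP_PROJ state with repeated employee and project facts.
emp_proj = [
    {"SSN": "111", "PNumber": "P1", "Hours": 10, "EName": "Ann",
     "PName": "Payroll", "PLocation": "Houston"},
    {"SSN": "222", "PNumber": "P1", "Hours": 20, "EName": "Bob",
     "PName": "Payroll", "PLocation": "Houston"},
    {"SSN": "111", "PNumber": "P2", "Hours": 5, "EName": "Ann",
     "PName": "Audit", "PLocation": "Stafford"},
]

emp = project(emp_proj, ["SSN", "EName"])                    # 2 rows
work_on = project(emp_proj, ["SSN", "PNumber", "Hours"])      # 3 rows
project_rel = project(emp_proj, ["PNumber", "PName", "PLocation"])  # 2 rows
```

After the split, each employee name and each project name appears in exactly one tuple, so a rename is a single update.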


Decomposition Problems
• Decomposition may lead to problems of its own
• The following two properties of a decomposition must be considered:

1. Lossless-join property: joining the smaller relations obtained
by decomposition must reproduce exactly the instances (rows) of
the original relation, with no spurious rows.

2. Dependency preservation property: ensures that each
functional dependency is represented in some individual
relation resulting from the decomposition.
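
A minimal sketch of testing the lossless-join property on one sample state: project the rows onto the two attribute sets, natural-join the projections, and compare with the original. The helper names and rows are assumptions for this illustration. Passing on one state does not prove the property for all legal states, but a failure shows that the decomposition loses information.

```python
def project(rows, attrs):
    """Projection as a set of tuples of (attribute, value) pairs."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(r, s):
    """Natural join of two projections produced by project()."""
    out = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(tuple(sorted({**dx, **dy}.items())))
    return out

def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical rows: two employees working at the same location.
rows = [
    {"SSN": "111", "EName": "Ann", "PLocation": "Houston"},
    {"SSN": "222", "EName": "Bob", "PLocation": "Houston"},
]

good = is_lossless_on(rows, ["SSN", "EName"], ["SSN", "PLocation"])      # True
bad = is_lossless_on(rows, ["SSN", "PLocation"], ["EName", "PLocation"])  # False
```

The second decomposition joins on PLocation alone, which pairs every SSN with every EName at that location and so produces spurious rows.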
Normalization of Relations
• Normalization is the process of decomposing
relations with anomalies to produce smaller, well-structured
relations.
– Normalization can be accomplished and
understood in stages, each of which corresponds
to a normal form.

A normal form is a state of a relation that results
from applying simple rules regarding functional
dependencies to that relation.
First Normal Form (1NF)
A relation is said to be in 1NF if:
– The attribute values are atomic:
An attribute value is atomic if it contains only a
single data value at any given row and column
intersection
– There is no repeating group within any
row
A relation in 1NF disallows:
– Multivalued attributes
– Composite or nested attributes
– Repeating groups
First Normal Form (1NF)

Consider the following DEPARTMENT relation, in which DLOCATIONS
is a multivalued attribute.
First Normal Form (1NF)
There are three main techniques to achieve first normal
form for multivalued attributes:
1. Expand the key so that there will be a separate tuple in the
original DEPARTMENT relation for each location of a
DEPARTMENT. Drawback: redundancy (repeating groups).

2. Decompose: remove the attribute DLOCATIONS that violates
1NF and place it in a separate relation DEPT_LOCATIONS
along with the primary key DNUMBER of DEPARTMENT.

3. If a maximum number of values is known for the attribute
(e.g. 3), replace the DLOCATIONS attribute by three atomic
attributes: DLOCATION1, DLOCATION2, DLOCATION3.
Drawback: null values.
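
A minimal sketch of technique 2 (decompose), using the DEPARTMENT values shown in these slides. The function name split_multivalued is hypothetical, and the multivalued attribute is modeled as a Python list.

```python
def split_multivalued(rows, key, multi):
    """Move a multivalued attribute into its own relation, keyed by (key, value)."""
    base = [{a: v for a, v in row.items() if a != multi} for row in rows]
    detail = [{key: row[key], multi: value} for row in rows for value in row[multi]]
    return base, detail

department = [
    {"Dname": "Research", "Dnumber": 5, "Dmgr_ssn": "333445555",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Administration", "Dnumber": 4, "Dmgr_ssn": "987654321",
     "Dlocations": ["Stafford"]},
    {"Dname": "Headquarter", "Dnumber": 1, "Dmgr_ssn": "888665555",
     "Dlocations": ["Houston"]},
]

dept, dept_locations = split_multivalued(department, "Dnumber", "Dlocations")
print(len(dept), len(dept_locations))  # 3 5
```

Unlike technique 1, the department name and manager are stored once per department; unlike technique 3, no null placeholders are needed and there is no fixed limit on the number of locations.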
1. Expand the primary key
(a) A relation schema that is not in 1NF.
(b) Sample state of relation DEPARTMENT.
(c) 1NF version of the same relation with redundancy: a repeating
group is introduced, because the group (Dname, Dnumber, Dmgr_ssn)
is repeated for each value of Dlocation.

2. Decompose into two relations (DEPT_LOC, DEPT) to remove the
repeating group. The new relation DEPT_LOC has Dnumber and
Dlocation as its primary key.

DEPT_LOC
Dnumber  Dlocation
5        Bellaire
5        Sugarland
5        Houston
4        Stafford
1        Houston

DEPT
Dname           Dnumber  Mng_ssn
Research        5        333445555
Administration  4        987654321
Headquarter     1        888665555

3. Three atomic location attributes

DEPT_LOCATIONS
Dname           Dnumber  Mng_ssn    Dlocation1  Dlocation2  Dlocation3
Research        5        333445555  Bellaire    Sugarland   Houston
Administration  4        987654321  Stafford    Null        Null
Headquarter     1        888665555  Houston     Null        Null
First Normal Form (1NF)
Does not allow nested relations
– In a nested relation, each tuple can contain a relation within it
To change to 1NF:
– Move the nested relation attributes into a new
relation
– Propagate the primary key into it
– This un-nests the relation into a set of 1NF relations
First Normal Form (1NF)
Remove Nested Relation
(a) Schema of the EMP_PROJ relation with a "nested relation" PROJS.
(b) Example extension of the EMP_PROJ relation showing nested
relations within each tuple.
(c) Decomposing EMP_PROJ into 1NF relations EMP_PROJ1
and EMP_PROJ2 by propagating the primary key (SSN).
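
A minimal sketch of un-nesting by key propagation. The attribute names inside PROJS (PNumber, Hours), the sample values, and the function name unnest are assumptions for this illustration; the slides only name the relations and the propagated key SSN.

```python
def unnest(rows, key, nested):
    """Split a nested relation: outer attributes stay, inner tuples get the key."""
    outer = [{a: v for a, v in row.items() if a != nested} for row in rows]
    inner = [{key: row[key], **sub} for row in rows for sub in row[nested]]
    return outer, inner

# Hypothetical EMP_PROJ state with a nested relation PROJS in each tuple.
emp_proj = [
    {"SSN": "111", "EName": "Ann",
     "PROJS": [{"PNumber": 1, "Hours": 32.5}, {"PNumber": 2, "Hours": 7.5}]},
    {"SSN": "222", "EName": "Bob",
     "PROJS": [{"PNumber": 3, "Hours": 40.0}]},
]

emp_proj1, emp_proj2 = unnest(emp_proj, "SSN", "PROJS")
# emp_proj1 holds (SSN, EName); emp_proj2 holds (SSN, PNumber, Hours).
```

In emp_proj2 the propagated SSN combines with PNumber to identify each tuple.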

Second Normal Form (2NF)
• A relation is in 2NF if:
– it is in 1NF
– every nonprime attribute is fully functionally dependent
on the primary key

• Remove partial dependencies:
– Identify any attributes which are dependent on only part of the
composite key
– Put these attributes into a separate table along with
that part of the composite key.
Second Normal Form: Examples

• {SSN, PNUMBER} → HOURS is a full FD, since neither
SSN → HOURS nor PNUMBER → HOURS holds

• {SSN, PNUMBER} → ENAME is not a full FD (it is
called a partial dependency),
– since SSN → ENAME also holds

• A relation schema R is in second normal form (2NF) if
every non-prime attribute A in R is fully functionally
dependent on the primary key

• R can be decomposed into 2NF relations via the process of
2NF normalization
Example: Determine NF
ORDER(Order_No, Product_ID, Description)
Primary key: {Order_No, Product_ID}

Product_ID → Description

In your solution you will write the following justification:
1) No multivalued (M/V) attributes, therefore at least 1NF
2) There is a partial dependency
(Product_ID → Description), therefore not in 2NF

Solution:
Order(Order_No, Product_ID)
Prod(Product_ID, Description)
Third Normal Form (3NF)
Based on the concept of transitive dependency

A relation is in third normal form (3NF) if it is in 2NF
and there is no transitive dependency:
– No non-prime attribute is dependent on another non-prime
attribute
– Formally, for every nontrivial functional dependency A → B,
either A is a superkey or B is a prime attribute
– The stricter condition that every determinant is a key
defines Boyce-Codd normal form (BCNF)
Solution: decompose the relation by moving attributes that are
dependent on an attribute other than the primary
key into a separate table.
Transitive Dependency
• Transitive functional dependency: X → Z is transitive if there is a
set of attributes Y that is neither a candidate key nor a subset of
any key, and both X → Y and Y → Z hold.

Examples:
– SSN → DMGRSSN is a transitive FD since
SSN → DNUMBER and DNUMBER → DMGRSSN hold

– SSN → ENAME is non-transitive since there is no non-key set
of attributes X where SSN → X and X → ENAME
Third Normal Form: Example
DNUMBER is a non-prime attribute which determines the other
non-prime attributes DNAME and DMGRSSN.
In other words, SSN transitively (indirectly) determines
DNAME and DMGRSSN.
Example: Determine NF
BOOK(ISBN, Title, Publisher, Address)

ISBN → Title
ISBN → Publisher
Publisher → Address

In your solution you will write the following justification:
1. No multivalued (M/V) attributes, therefore at least 1NF
2. No partial dependencies, therefore at least 2NF
3. There is a transitive dependency
(Publisher → Address), therefore not 3NF

Solution:
Book(ISBN, Title, Publisher)
Pub_Add(Publisher, Address)
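
A minimal sketch of checking the BOOK example mechanically. It assumes a single candidate key per relation (so "prime" means "in that key") and uses attribute closure to test whether a determinant is a superkey. The function names closure and third_nf_violations are hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(attributes, key, fds):
    """FDs whose determinant is not a superkey and whose dependent is nonprime."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attributes
        nonprime_rhs = rhs - key - lhs
        if not is_superkey and nonprime_rhs:
            bad.append((lhs, nonprime_rhs))
    return bad

book_fds = [({"ISBN"}, {"Title"}), ({"ISBN"}, {"Publisher"}),
            ({"Publisher"}, {"Address"})]

before = third_nf_violations({"ISBN", "Title", "Publisher", "Address"},
                             {"ISBN"}, book_fds)
after_book = third_nf_violations({"ISBN", "Title", "Publisher"},
                                 {"ISBN"}, book_fds[:2])
after_pub = third_nf_violations({"Publisher", "Address"},
                                {"Publisher"}, book_fds[2:])
print(before)  # [({'Publisher'}, {'Address'})]
```

Both relations of the decomposition report no violations, which agrees with the justification written out in the example.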
Steps in Data Normalization
UNNORMALIZED ENTITY

Step 1: remove repeating groups

1st NORMAL FORM

Step 2: remove partial dependencies

2nd NORMAL FORM

Step 3: remove indirect (transitive) dependencies

3rd NORMAL FORM

Step 4: remove multivalued dependencies, giving 4th NORMAL FORM
Step 4 (alternative): make every determinant a key, giving BOYCE-CODD NORMAL FORM
Advantages of Normalization

• The amount of unnecessary redundant data is reduced.
• Data integrity is easily maintained within the database.
• The database and application design processes are much more
flexible.
• Security is easier to manage.
Disadvantages of Normalization

• Produces many tables, each with a relatively small number of
columns
• Usually requires joins to put the information back
together in the way it needs to be used
• The extra joins can affect performance (CPU, I/O, memory).
