Schema Refinement (Normalization) in DBMS

Database Systems
Schema Refinement (Normalization)
Outline
Informal Design Guidelines for Good Relation Schemas
Formal Concepts of Functional Dependencies and Normal Forms
– 1NF (First Normal Form)
– 2NF (Second Normal Form)
– 3NF (Third Normal Form)
Introduction
Relational database design: the grouping of attributes to form "good" relation schemas
Two levels of relation schemas:
– The logical "user view" level
– The storage "base relation" level
Design is concerned mainly with base relations
What are the criteria for "good" base relations?
– Informal guidelines for good relational design
– Formal concepts (normalization) of functional dependencies and normal forms: 1NF, 2NF, 3NF
Informal Design Guidelines For Relation Schemas
1. Making sure attribute semantics are clear
2. Reducing redundant information in tuples
3. Reducing NULL values in tuples

Problems that occur in a poorly planned, unnormalized database, where all the data is stored in one table (a flat-file database), are called anomalies.
Informal Design Guidelines For Relation Schemas
1. Semantics of the Relation Attributes
• Whenever attributes are grouped to form a relation schema, it is assumed that the attributes belonging to one relation have a certain real-world meaning and a proper interpretation associated with them.
• In general, the easier it is to explain the semantics of the relation, the better the relation schema design is.
Guideline 1: Design a relation schema so that it is easy to explain its meaning.
Do not combine attributes from multiple entity types and relationship types into a single relation.
Example of violating Guideline 1:
A tuple in the EMP_DEPT relation represents a single employee, but it also includes additional information: the name (Dname) of the department for which the employee works and the SSN (Dmgr_ssn) of the department manager.
2. Redundant Information in Tuples and Update Anomalies
Redundancy: duplication of the same data stored in the database
• Grouping attributes into relation schemas has a significant effect on storage space.
• One goal of schema design is to minimize the storage space used by the base relations.
– Mixing attributes of multiple entities may cause redundancy.
In EMP_DEPT, the values of DNUMBER, DNAME and DMGRSSN form a repeating group: they are repeated for every employee who works in the same department.
• Redundant information may cause update anomalies.
• Update anomalies:
– Insertion anomalies
– Deletion anomalies
– Modification anomalies
EMP_PROJ (SSN, PNumber, Hours, EName, PName, PLocation)
• Insertion anomalies:
– Occur when it is impossible to store a fact until another fact is known.
– Example:
1. Cannot insert a project unless an employee is assigned to it.
2. Cannot insert an employee unless he/she is assigned to a project.
• Deletion anomalies:
– Occur when deleting one fact causes other facts to be deleted.
– Example:
1. When a project is deleted, all the employees who work on that project are deleted with it.
2. If an employee is the only employee on a project, deleting that employee also deletes the corresponding project.
• Modification anomalies:
– Occur when a change to one fact requires multiple modifications.
– Example: changing the name of project number P1 requires updating the tuple of every employee who works on that project.
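A minimal Python sketch of the modification anomaly, assuming a flat EMP_PROJ table held as a list of rows. The row values and the helpers `rename_project` and `names_for_project` are hypothetical, chosen only for illustration. A careless update that changes one tuple leaves project P1 with two different names; a correct update must find and change every tuple of every employee on that project.

```python
# Hypothetical flat EMP_PROJ rows: (SSN, PNumber, Hours, EName, PName, PLocation).
# The values are made-up sample data for illustration only.
emp_proj = [
    ["123456789", "P1", 32.5, "Smith", "ProductX", "Bellaire"],
    ["453453453", "P1", 20.0, "English", "ProductX", "Bellaire"],
    ["123456789", "P2", 7.5, "Smith", "ProductY", "Sugarland"],
]
PNUMBER, PNAME = 1, 4  # column positions inside each row


def rename_project(rows, pnumber, new_name):
    """Rename a project; in the flat table every matching tuple must be updated."""
    changed = 0
    for row in rows:
        if row[PNUMBER] == pnumber:
            row[PNAME] = new_name
            changed += 1
    return changed


def names_for_project(rows, pnumber):
    """Collect every name currently stored for one project number."""
    return {row[PNAME] for row in rows if row[PNUMBER] == pnumber}


# A careless update that changes only the first tuple leaves P1 with two names.
emp_proj[0][PNAME] = "ProductX-v2"
print(names_for_project(emp_proj, "P1"))  # two different names: inconsistent

# Updating all matching tuples restores consistency.
print(rename_project(emp_proj, "P1", "ProductX-v2"))  # 2 tuples changed
print(names_for_project(emp_proj, "P1"))
```

In the decomposed design the project name lives in a single PROJECT tuple, so the same rename touches exactly one row.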
Guideline 2: Design the base relation schemas so that NO insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.
3. Null Values in Tuples
• In some schema designs, many attributes may be grouped together into a "flat" relation.
• If many of the attributes do not apply to all tuples in the relation, many null values will appear in those tuples.
Guideline 3: As far as possible, avoid placing attributes in a base relation whose values may frequently be null.
If nulls are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation.
Functional Dependencies (FDs)
• FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
• They are used to define normal forms for relations.
• Represented as X → Y
– A set of attributes X functionally determines a set of attributes Y if each value of X is associated with exactly one value of Y.
– That is, if we know the value of X, we can determine the unique corresponding value of Y.
• Example:
– Social Security Number determines employee name: SSN → ENAME
– The attribute or set of attributes on the left side is called the determinant, and the attributes on the right side are called dependents.
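To make the definition concrete, the following Python sketch (hypothetical helper `fd_holds`, made-up sample rows) checks whether X → Y is violated by a given set of rows: it fails as soon as two rows agree on X but disagree on Y. Because FDs are derived from the meaning of the data, a sample instance can only refute an FD, never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on every attribute in lhs but differ on rhs.

    True only means "not contradicted by these rows"; the FD itself must come
    from the meaning of the data.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for this X; any different Y refutes the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows (made-up values).
works_on = [
    {"SSN": "123456789", "ENAME": "Smith", "PNUMBER": 1},
    {"SSN": "123456789", "ENAME": "Smith", "PNUMBER": 2},
    {"SSN": "333445555", "ENAME": "Wong", "PNUMBER": 2},
]

print(fd_holds(works_on, ["SSN"], ["ENAME"]))      # True
print(fd_holds(works_on, ["PNUMBER"], ["ENAME"]))  # False: project 2 has Smith and Wong
```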
Partial Functional Dependency
In the case of a COMPOSITE primary key, every non-key attribute of the relation should be determined by the whole key.
• Partial dependency: a non-key attribute is determined by a part, not the whole, of a COMPOSITE primary key.
CUSTOMER
Cust_ID   Name    Order_ID
101       AT&T    1234
101       AT&T    156
125       Cisco   1250
{Cust_ID, Order_ID} → Name is not a full FD; it is called a partial dependency, since Cust_ID → Name also holds and Cust_ID is part of the key.
Transitive Dependency
A transitive dependency arises when a non-key (non-prime) attribute determines another non-key attribute.
– A non-prime attribute is an attribute that is not a member of any candidate key.
EMPLOYEE
Emp_ID   F_Name   L_Name   Dept_ID   Dept_Name
111      Mary     Jones    1         Acct
122      Sarah    Smith    2         Mktg
Transitive dependency: Emp_ID → Dept_ID → Dept_Name
• A transitive dependency is a functional dependency that holds by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more attributes.
• The functional dependency {Book} → {Author Nationality} applies; that is, if we know the book, we know the author's nationality. Furthermore:
– {Book} → {Author}
– {Author} → {Book} does not hold
– {Author} → {Author Nationality}
– Therefore {Book} → {Author Nationality} is a transitive dependency.
• The transitive dependency occurs because a non-key attribute (Author) determines another non-key attribute (Author Nationality).
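The transitivity argument can be mechanized by computing the attribute closure X+, the set of all attributes determined by X under a set of FDs. The Python sketch below is the standard closure algorithm, not something taken from these notes; the helper name `closure` is illustrative and the attribute names come from the Book example.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already known, add its dependents.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


book_fds = [
    ({"Book"}, {"Author"}),
    ({"Author"}, {"Author Nationality"}),
]

# {Book} -> {Author Nationality} follows by transitivity through Author.
print(sorted(closure({"Book"}, book_fds)))    # ['Author', 'Author Nationality', 'Book']
# Book is not reachable from Author, so {Author} -> {Book} does not follow.
print(sorted(closure({"Author"}, book_fds)))  # ['Author', 'Author Nationality']
```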
Decomposition
• The solution to the problems caused by data redundancy and functional dependencies
• Decomposition means breaking up a large schema into multiple smaller schemas.
• It helps remove the anomalies and maintain data integrity.
EMP_PROJ
• EMP_PROJ can be decomposed into the following smaller schemas to remove update anomalies:
EMP (SSN, EName)
WORK_ON (SSN, PNumber, Hours)
PROJECT (PNumber, PName, PLocation)
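A small Python sketch of this decomposition as relational projection, assuming EMP keeps only (SSN, EName). The rows are hypothetical sample values and `projection` is an illustrative helper, not a library function. Because each projection drops duplicate tuples, a project's name and location are stored once instead of once per employee.

```python
# Hypothetical EMP_PROJ sample rows; the values are made up for illustration.
COLUMNS = ("SSN", "PNumber", "Hours", "EName", "PName", "PLocation")
emp_proj = [
    ("123456789", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("123456789", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("453453453", 1, 20.0, "English", "ProductX", "Bellaire"),
]


def projection(rows, columns, keep):
    """Relational projection: keep only the named columns, dropping duplicate tuples."""
    idx = [columns.index(c) for c in keep]
    result = []
    for row in rows:
        t = tuple(row[i] for i in idx)
        if t not in result:  # a relation is a set: store each tuple once
            result.append(t)
    return result


emp = projection(emp_proj, COLUMNS, ["SSN", "EName"])
work_on = projection(emp_proj, COLUMNS, ["SSN", "PNumber", "Hours"])
project = projection(emp_proj, COLUMNS, ["PNumber", "PName", "PLocation"])

print(emp)      # each employee name stored once
print(project)  # ProductX and its location stored once, not once per employee
```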

Decomposition Problems
• Decomposition may lead to problems of its own.
• The following two properties of a decomposition must be considered:
1. Lossless-join property: joining the smaller relations obtained from the decomposition must reproduce exactly the original relation instance, so every original row can be recovered and no spurious rows appear.
2. Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.
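For a decomposition of R into exactly two relations R1 and R2, a standard test for the lossless-join property is that the common attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The Python sketch below applies that test using attribute closure. The EMP_PROJ FDs are an assumption based on the semantics described earlier (SSN → EName, PNumber → {PName, PLocation}, {SSN, PNumber} → Hours), and the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """A split of R into R1 and R2 is lossless iff the shared attributes
    functionally determine all of R1 or all of R2."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach


emp_proj_fds = [
    ({"SSN"}, {"EName"}),
    ({"PNumber"}, {"PName", "PLocation"}),
    ({"SSN", "PNumber"}, {"Hours"}),
]
rest = ["SSN", "PNumber", "Hours", "PName", "PLocation"]

print(is_lossless_binary(["SSN", "EName"], rest, emp_proj_fds))       # True: SSN -> EName
print(is_lossless_binary(["EName", "PLocation"], rest, emp_proj_fds))  # False: PLocation determines neither side
```

This shortcut covers only two-way splits. If a larger decomposition was produced by successive two-way splits, applying the test at each split is enough; otherwise the general chase (tableau) algorithm is needed.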
Normalization of Relations
• Normalization is the process of decomposing relations with anomalies to produce smaller, well-structured relations.
– Normalization can be accomplished and understood in stages, each of which corresponds to a normal form.
A normal form is a state of a relation that results from applying simple rules regarding functional dependencies to that relation.
First Normal Form (1NF)
A relation is said to be in 1NF if:
– The attribute values are atomic: an attribute value is atomic if it contains only a single value at any given row and column intersection.
– There are no repeating groups within a row.
A relation in 1NF disallows:
– Multivalued attributes
– Composite or nested attributes
– Repeating groups of rows
Consider the following DEPARTMENT relation, in which DLOCATIONS is a multivalued attribute.
There are three main techniques to achieve first normal form for multivalued attributes:
1. Expand the key so that there is a separate tuple in the original DEPARTMENT relation for each location of a department. Drawback: redundancy (repeating groups).
2. Decompose: remove the attribute DLOCATIONS that violates 1NF and place it in a separate relation DEPT_LOCATIONS along with the primary key DNUMBER of DEPARTMENT.
3. If a maximum number of values is known for the attribute (e.g. 3), replace the DLOCATIONS attribute by three atomic attributes: DLOCATION1, DLOCATION2, DLOCATION3. Drawback: null values.
1- Expand the PK:
(a) A relation schema that is not in 1NF.
(b) Sample state of relation DEPARTMENT.
(c) 1NF version of the same relation with redundancy (it introduces a repeating group): for each value of Dlocation, the group (Dname, Dnumber, Dmgr_ssn) is repeated.

2- Decompose into two relations (DEPT_LOCATIONS, DEPT) to remove the repeating group. The new relation DEPT_LOCATIONS has (Dnumber, Dlocation) as its PK.
DEPT_LOCATIONS
Dnumber  Dlocation
5        Bellaire
5        Sugarland
5        Houston
4        Stafford
1        Houston

DEPT
Dname           Dnumber  Dmgr_ssn
Research        5        333445555
Administration  4        987654321
Headquarters    1        888665555

3- Three atomic location attributes:
Dname           Dnumber  Dmgr_ssn   Dlocation1  Dlocation2  Dlocation3
Research        5        333445555  Bellaire    Sugarland   Houston
Administration  4        987654321  Stafford    Null        Null
Headquarters    1        888665555  Houston     Null        Null
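Technique 2 (decomposition) can be sketched in Python as follows. The DEPARTMENT rows reproduce the sample state above with Dlocations modeled as a Python list; `decompose_locations` is a hypothetical helper name. Each location becomes its own (Dnumber, Dlocation) tuple, so no cell holds more than one value.

```python
# DEPARTMENT sample state with the multivalued attribute Dlocations.
department = [
    ("Research", 5, "333445555", ["Bellaire", "Sugarland", "Houston"]),
    ("Administration", 4, "987654321", ["Stafford"]),
    ("Headquarters", 1, "888665555", ["Houston"]),
]


def decompose_locations(rows):
    """Technique 2: move Dlocations into DEPT_LOCATIONS together with the key Dnumber."""
    dept = []
    dept_locations = []
    for dname, dnumber, mgr_ssn, locations in rows:
        dept.append((dname, dnumber, mgr_ssn))
        for location in locations:
            # One atomic (Dnumber, Dlocation) tuple per location.
            dept_locations.append((dnumber, location))
    return dept, dept_locations


dept, dept_locations = decompose_locations(department)
print(dept_locations)
# [(5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston'), (4, 'Stafford'), (1, 'Houston')]
```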
1NF also does not allow nested relations:
– In a nested relation, each tuple can contain a relation within it.
To change to 1NF:
– Remove the nested relation attributes into a new relation
– Propagate the primary key into it
– Un-nest the relation into a set of 1NF relations
Remove Nested Relation
(a) Schema of the EMP_PROJ relation with a "nested relation" PROJS.
(b) Example extension of the EMP_PROJ relation showing nested relations within each tuple.
(c) Decomposing EMP_PROJ into 1NF relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key (SSN).
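The un-nesting step can be sketched in Python. The attribute list of PROJS is not given in these notes; this sketch assumes it holds (Pnumber, Hours), and the employee values are hypothetical. The key Ssn is propagated into every tuple of the nested relation, producing EMP_PROJ1(Ssn, Ename) and EMP_PROJ2(Ssn, Pnumber, Hours).

```python
# Hypothetical EMP_PROJ state with a nested relation PROJS per employee.
# Assumption: PROJS holds (Pnumber, Hours); the values are made up.
emp_proj = [
    {"Ssn": "123456789", "Ename": "Smith",
     "PROJS": [{"Pnumber": 1, "Hours": 32.5}, {"Pnumber": 2, "Hours": 7.5}]},
    {"Ssn": "453453453", "Ename": "English",
     "PROJS": [{"Pnumber": 1, "Hours": 20.0}, {"Pnumber": 2, "Hours": 20.0}]},
]


def unnest(rows):
    """Split off the nested PROJS relation, propagating the key Ssn into it."""
    emp_proj1 = [(r["Ssn"], r["Ename"]) for r in rows]
    emp_proj2 = [
        (r["Ssn"], p["Pnumber"], p["Hours"])  # Ssn is copied into every nested tuple
        for r in rows
        for p in r["PROJS"]
    ]
    return emp_proj1, emp_proj2


emp_proj1, emp_proj2 = unnest(emp_proj)
print(emp_proj1)  # [('123456789', 'Smith'), ('453453453', 'English')]
print(emp_proj2)  # four flat (Ssn, Pnumber, Hours) tuples
```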
Second Normal Form (2NF)
• A relation is in 2NF if it is:
– in 1NF, and
– every nonprime attribute is fully functionally dependent on the primary key.
• Remove partial dependencies:
– Find any attributes that are dependent on only part of the composite key.
– Put these attributes into a separate table along with that part of the composite key.
Second Normal Form: Examples
Second Normal Form: Examples (cont’d…)
• {SSN, PNUMBER} → HOURS is a full FD, since neither SSN → HOURS nor PNUMBER → HOURS holds.
• {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency), since SSN → ENAME also holds.
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
• R can be decomposed into 2NF relations via the process of 2NF normalization.
Example: Determine NF
ORDER (Order_No, Product_ID, Description), with composite key {Order_No, Product_ID}
FD: Product_ID → Description
In your solution you will write the following justification:
1) No multivalued (M/V) attributes, therefore at least 1NF.
2) There is a partial dependency (Product_ID → Description), therefore not in 2NF.
Solution:
Order(Order_No, Product_ID)
Product(Product_ID, Description)
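The partial-dependency check can be automated from the declared FDs. The Python sketch below is an illustration, not a prescribed algorithm from these notes: it computes the attribute closure of every proper subset of the composite key and reports any non-prime attribute that subset already reaches. The ORDER example is encoded as the FD Product_ID → Description with the composite key {Order_No, Product_ID}; the helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """Return (key_subset, attribute) pairs where a proper subset of the
    composite key already determines a non-prime attribute (a 2NF violation)."""
    found = []
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(sorted(key), size):
            reach = closure(subset, fds)
            for attr in sorted(non_prime):
                if attr in reach:
                    found.append((subset, attr))
    return found


order_fds = [({"Product_ID"}, {"Description"})]
print(partial_dependencies({"Order_No", "Product_ID"}, {"Description"}, order_fds))
# [(('Product_ID',), 'Description')]
```

After the split, Product(Product_ID, Description) holds the offending FD with Product_ID as its whole key, and Order(Order_No, Product_ID) has no non-prime attributes left.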
Third Normal Form (3NF)
Based on the concept of transitive dependency
A relation is in third normal form (3NF) if it is in 2NF and there is no transitive dependency:
– No non-prime attribute is functionally dependent on another non-prime attribute.
– In general form: for every nontrivial FD X → A, either X is a superkey or A is a prime attribute.
– The stricter rule "every determinant is a key" (for every nontrivial FD X → Y, X is a superkey) defines Boyce-Codd Normal Form (BCNF), not 3NF.
Solution: decompose the relation, moving attributes that depend on an attribute other than the primary key into a separate relation.
• Transitive functional dependency: an FD X → Z is transitive if there is a set of attributes Y that is neither a candidate key nor a subset of any key, and both X → Y and Y → Z hold.
Examples:
– SSN → DMGRSSN is a transitive FD, since SSN → DNUMBER and DNUMBER → DMGRSSN hold.
– SSN → ENAME is non-transitive, since there is no set of attributes X where SSN → X and X → ENAME.
Third Normal Form: Example
DNUMBER is a non-prime attribute that determines the other non-prime attributes DNAME and DMGRSSN.
In other words, SSN transitively (indirectly) determines DNAME and DMGRSSN.
Example: Determine NF
BOOK (ISBN, Title, Publisher, Address)
FDs: ISBN → Title, ISBN → Publisher, Publisher → Address
In your solution you will write the following justification:
1. No M/V attributes, therefore at least 1NF.
2. No partial dependencies, therefore at least 2NF.
3. There is a transitive dependency (Publisher → Address), therefore not 3NF.
Solution:
Book(ISBN, Title, Publisher)
Pub_Add(Publisher, Address)
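A minimal Python sketch of a 3NF check using the general condition (for every FD X → A, X is a superkey or A is prime). It assumes the candidate keys are supplied by hand and checks only the FDs passed in; the helper names are hypothetical. Applied to BOOK it flags Publisher → Address, and both relations of the solution pass.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, candidate_keys, fds):
    """Return (lhs, attr) pairs for FDs lhs -> attr that break 3NF:
    lhs is not a superkey and attr is not prime (in no candidate key)."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        is_superkey = set(attributes) <= closure(lhs, fds)
        for attr in sorted(set(rhs) - set(lhs)):  # skip trivial parts
            if not is_superkey and attr not in prime:
                violations.append((tuple(sorted(lhs)), attr))
    return violations


book_attrs = {"ISBN", "Title", "Publisher", "Address"}
book_fds = [
    ({"ISBN"}, {"Title"}),
    ({"ISBN"}, {"Publisher"}),
    ({"Publisher"}, {"Address"}),
]
print(third_nf_violations(book_attrs, [{"ISBN"}], book_fds))
# [(('Publisher',), 'Address')]

# After decomposition, each relation keeps only the FDs on its own attributes.
print(third_nf_violations({"ISBN", "Title", "Publisher"}, [{"ISBN"}], book_fds[:2]))  # []
print(third_nf_violations({"Publisher", "Address"}, [{"Publisher"}], book_fds[2:]))   # []
```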
Steps in Data Normalization
UNNORMALIZED ENTITY
Step 1: remove repeating groups → 1st NORMAL FORM
Step 2: remove partial dependencies → 2nd NORMAL FORM
Step 3: remove indirect (transitive) dependencies → 3rd NORMAL FORM
Step 4: make every determinant a key → BOYCE-CODD NORMAL FORM
Step 5: remove multivalued dependencies → 4th NORMAL FORM
Advantages of Normalization
• The amount of unnecessary redundant data is reduced.
• Data integrity is easier to maintain within the database.
• The database and application design processes are more flexible.
• Security is easier to manage.
Disadvantages of Normalization
• Produces many tables with a relatively small number of columns.
• Often requires joins to put the information back together in the form in which it needs to be used.
• Impacts performance (CPU, I/O, memory).
