Normalization of Database Tables


IT131-8: Information Management
Learning Objectives
After completing this chapter, you will be able to:
Explain normalization and its role in the database design process
Identify and describe each of the normal forms: 1NF, 2NF, 3NF, BCNF, and 4NF
Explain how normal forms can be transformed from lower normal forms to higher normal forms
Apply normalization rules to evaluate and correct table structures
Identify situations that require denormalization to generate information efficiently
Use a data-modeling checklist to check that the ERD meets a set of minimum requirements
Recall: Determination
State in which knowing the value of one attribute makes it possible to determine the value of another
Dependencies
Functional Dependence
The value of one or more attributes determines the value of one or more other attributes (A → B)
Determinant (A): attribute whose value determines another
Dependent (B): attribute whose value is determined by the other attribute
Full Functional Dependence
The entire collection of attributes in the determinant is necessary for the relationship; no proper subset of the determinant determines the dependent attribute
Examples
a) STU_NUM → STU_LNAME
321452 → BOWSER
324273 → SMITH
b) STU_LNAME ↛ STU_NUM, because the last name SMITH appears with two different student numbers:
SMITH ↛ 324273
SMITH ↛ 324299
STU_NUM = Student number
STU_LNAME = Student last name
STU_FNAME = Student first name
STU_INIT = Student middle initial
STU_DOB = Student date of birth
STU_HRS = Credit hours earned
STU_CLASS = Student classification
STU_GPA = Grade point average
STU_TRANSFER = Student transferred from another institution
DEPT_CODE = Department code
STU_PHONE = 4-digit campus phone extension
PROF_NUM = Number of the professor who is the student's advisor
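The determination examples above can be checked mechanically against sample rows. The sketch below is a minimal illustration, assuming each row is a Python dict keyed by attribute name; the helper `fd_holds` is hypothetical, and the three rows reuse the student numbers and last names from the examples. Sample data can only refute a dependency: a `True` result means the rows are consistent with it, not that the business rule guarantees it.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Same determinant value already seen with a different dependent value?
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"STU_NUM": 321452, "STU_LNAME": "BOWSER"},
    {"STU_NUM": 324273, "STU_LNAME": "SMITH"},
    {"STU_NUM": 324299, "STU_LNAME": "SMITH"},
]

print(fd_holds(students, ["STU_NUM"], ["STU_LNAME"]))  # True
print(fd_holds(students, ["STU_LNAME"], ["STU_NUM"]))  # False
```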
More Examples

{STU_LNAME} → {STU_LNAME}
{STU_LNAME, STU_NUM} → {STU_DOB}
{STU_NUM} → {STU_LNAME, STU_DOB, DEPT_CODE}
{EMAIL} → {STU_NUM, STU_LNAME, STU_DOB, STU_GPA}
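Whether a dependency such as {STU_LNAME, STU_NUM} → {STU_DOB} follows from the others, and whether it is a full functional dependence, can be tested with attribute closure, a standard technique that these slides do not themselves define. The sketch below assumes the dependencies are exactly the two nontrivial ones listed above; `fd`, `closure`, `implies`, and `is_full` are hypothetical helper names.

```python
from itertools import combinations

FD = tuple[frozenset[str], frozenset[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build a dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: frozenset[str], fds: list[FD]) -> frozenset[str]:
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(lhs: frozenset[str], rhs: frozenset[str], fds: list[FD]) -> bool:
    return rhs <= closure(lhs, fds)


def is_full(lhs: frozenset[str], rhs: frozenset[str], fds: list[FD]) -> bool:
    """True if lhs -> rhs holds and no proper subset of lhs also determines rhs."""
    if not implies(lhs, rhs, fds):
        return False
    return not any(
        implies(frozenset(sub), rhs, fds)
        for size in range(len(lhs))
        for sub in combinations(sorted(lhs), size)
    )


student_fds = [
    fd("STU_NUM", "STU_LNAME STU_DOB DEPT_CODE"),
    fd("EMAIL", "STU_NUM STU_LNAME STU_DOB STU_GPA"),
]

lhs, rhs = fd("STU_LNAME STU_NUM", "STU_DOB")
print(implies(lhs, rhs, student_fds))  # True
print(is_full(lhs, rhs, student_fds))  # False: STU_NUM alone suffices
```

In other words, {STU_LNAME, STU_NUM} → {STU_DOB} holds, but only as a partial dependency, which is exactly the situation 2NF later removes.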
Database Tables and Normalization
Normalization: evaluating and correcting table structures to minimize data redundancies and data inconsistencies
Reduces data anomalies
Assigns attributes to tables based on determination/functional dependencies
Performed as a series of steps that result in a set of desired forms

Normal forms
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
Structural point of view of normal forms
Higher normal forms are better than lower normal forms
There are higher normal forms, but they are not based on functional dependencies (e.g., 4NF is based on “multivalued dependencies”)
Properly designed 3NF structures also meet the requirements of fourth normal form (4NF)

Denormalization
Produces a lower normal form
Results in increased performance at the cost of greater data redundancy
The Need for Normalization
Used while designing a new database structure
Analyzes the relationships among the attributes within each entity
Determines if the structure can be improved through normalization
Improves the existing data structure and creates an appropriate database design
The Normalization Process
Objective is to ensure that each table conforms to the concept of well-formed relations:
Each table represents a single subject
Each row/column intersection contains only one value and not a group of values
No data item will be unnecessarily stored in more than one table
All nonprime attributes in a table are dependent on the primary key
Each table has no insertion, update, or deletion anomalies
Ensures that all tables are in at least 3NF
Higher forms are not likely to be encountered in a business environment
Works one relation at a time
Identifies the dependencies of a relation (table)
Progressively breaks the relation up into a new set of relations
Normal Forms
Normal Form                     Characteristic
First normal form (1NF)         Table format, no repeating groups, and PK identified
Second normal form (2NF)        1NF and no partial dependencies
Third normal form (3NF)         2NF and no transitive dependencies
Boyce-Codd normal form (BCNF)   3NF and every determinant of a functional dependency is a candidate key
Fourth normal form (4NF)        BCNF and no independent multivalued dependencies

The Normalization Process
Unnormalized Form (UNF)
→ "Remove" repeating groups →
First Normal Form (1NF)
→ Remove partial dependencies →
Second Normal Form (2NF)
→ Remove transitive dependencies →
Third Normal Form (3NF)
→ Remove dependencies whose determinants are not candidate keys →
Boyce-Codd Normal Form (BCNF)

Conversion to the First Normal Form (1NF)
Unnormalized form: a table that contains one or more repeating groups
Repeating group: a group of multiple entries of the same type that can exist for any single key attribute occurrence
1NF describes a tabular format in which:
All key attributes are defined
There are no repeating groups in the table
All attributes are dependent on the primary key
All relational tables satisfy 1NF requirements
Some 1NF tables contain partial dependencies


Three-step procedure:
1. Nominate an attribute or group of attributes to act as the identifier for the elements in the unnormalized table (i.e., identify the primary key)
2. Identify and eliminate the repeating groups by entering appropriate rows of data that contain scalar values only
3. Identify all dependencies
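Steps 1 and 2 can be sketched in a few lines. This is a minimal illustration, assuming the unnormalized report is held as a list of project records, each with a nested list of employees (the repeating group); the helper `to_1nf` and every data value are hypothetical. Step 3, identifying the dependencies, is still the designer's job.

```python
def to_1nf(report: list[dict]) -> list[dict]:
    """Step 2: expand each project's repeating employee group into scalar rows."""
    rows = []
    for project in report:
        for emp in project["EMPLOYEES"]:  # the repeating group
            rows.append({"PROJ_NUM": project["PROJ_NUM"],
                         "PROJ_NAME": project["PROJ_NAME"],
                         **emp})           # one employee's scalar values
    return rows


# Hypothetical report data: one entry per project, employees nested inside.
report = [
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha", "EMPLOYEES": [
        {"EMP_NUM": 101, "EMP_NAME": "A. Cruz", "JOB_CLASS": "Programmer",
         "CHG_HOUR": 35.75, "HOURS": 10.0},
        {"EMP_NUM": 102, "EMP_NAME": "B. Reyes", "JOB_CLASS": "Analyst",
         "CHG_HOUR": 40.00, "HOURS": 5.0},
    ]},
    {"PROJ_NUM": 2, "PROJ_NAME": "Beta", "EMPLOYEES": [
        {"EMP_NUM": 101, "EMP_NAME": "A. Cruz", "JOB_CLASS": "Programmer",
         "CHG_HOUR": 35.75, "HOURS": 8.0},
    ]},
]

rows = to_1nf(report)

# Step 1: the nominated PK (PROJ_NUM, EMP_NUM) must identify every row.
pk_values = [(r["PROJ_NUM"], r["EMP_NUM"]) for r in rows]
assert len(pk_values) == len(set(pk_values)), "PK does not identify rows uniquely"

for r in rows:
    print(r)
```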

Dependency diagram
Depicts all dependencies found within a given table structure
Helps to get an overview of all relationships among a table's attributes
Makes it less likely that an important dependency will be overlooked
Example
A Sample Report Layout
A consulting company charges its clients by billing the hours spent on each project. The hourly billing rate is dependent on the employee's position. Periodically, a report is generated whose content corresponds to the reporting requirements.
Example
Assignment Table (Primary Key: PROJ_NUM & EMP_NUM)
Example
Assignment Table (Identify repeating groups & enter appropriate scalar data)
Example
1NF (PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
The arrows above the attributes indicate that all of the table's attributes are dependent on the PK. The arrows below indicate any other functional dependencies:
PARTIAL DEPENDENCIES:
(PROJ_NUM -> PROJ_NAME)
(EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR)
TRANSITIVE DEPENDENCY:
(JOB_CLASS -> CHG_HOUR)
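The labels in the dependency diagram follow directly from where each determinant sits relative to the primary key. A minimal sketch, assuming the PK is the table's only candidate key (so any determinant that contains a non-PK attribute is treated as transitive); `classify` is a hypothetical helper, and the dependencies are the ones listed above.

```python
def classify(pk, fds):
    """Sort dependencies into partial and transitive ones, relative to pk.

    Simplifying assumption: the PK is the only candidate key, so any
    determinant containing a non-PK attribute is treated as transitive.
    """
    pk = frozenset(pk)
    found = {"partial": [], "transitive": []}
    for lhs, rhs in fds:
        lhs, dependents = frozenset(lhs), frozenset(rhs) - pk
        if lhs < pk:              # proper subset of a composite PK
            found["partial"].append((lhs, dependents))
        elif not lhs <= pk:       # determinant includes a nonkey attribute
            found["transitive"].append((lhs, dependents))
        # lhs == pk: ordinary full dependency on the primary key
    return found


pk = {"PROJ_NUM", "EMP_NUM"}
fds = [
    ({"PROJ_NUM", "EMP_NUM"}, {"PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"}),
    ({"PROJ_NUM"}, {"PROJ_NAME"}),
    ({"EMP_NUM"}, {"EMP_NAME", "JOB_CLASS", "CHG_HOUR"}),
    ({"JOB_CLASS"}, {"CHG_HOUR"}),
]

for kind, deps in classify(pk, fds).items():
    for lhs, rhs in deps:
        print(kind, sorted(lhs), "->", sorted(rhs))
```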
Conversion to Second Normal Form (2NF)
Conversion to 2NF occurs only when the 1NF table has a composite primary key
If the 1NF table has a single-attribute primary key, then the table is automatically in 2NF
The 1NF-to-2NF conversion is simple:
Make new tables to eliminate partial dependencies
Reassign corresponding dependent attributes
Table is in 2NF when it:
Is in 1NF
Includes no partial dependencies
Example
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
TRANSITIVE DEPENDENCY:
(JOB_CLASS -> CHG_HOUR)
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
Note that if there are no partial dependencies, 1NF and 2NF are the same.
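The 2NF split can be sketched as relational projection, followed by a check that joining the pieces back reproduces the 1NF rows (i.e., no information was lost). A minimal sketch, assuming rows are dicts; `project`, `natural_join`, and the data values are hypothetical, and the attribute HOURS is kept under its 1NF name rather than renamed to ASSIGN_HOURS.

```python
def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate rows."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def natural_join(left, right):
    """Join two non-empty lists of dicts on their shared attribute names."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_set(rows):
    """Order-insensitive view of a list of rows, for comparison."""
    return {frozenset(r.items()) for r in rows}


# Hypothetical 1NF rows (PK: PROJ_NUM, EMP_NUM).
rows_1nf = [
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha", "EMP_NUM": 101, "EMP_NAME": "A. Cruz",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75, "HOURS": 10.0},
    {"PROJ_NUM": 1, "PROJ_NAME": "Alpha", "EMP_NUM": 102, "EMP_NAME": "B. Reyes",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 40.00, "HOURS": 5.0},
    {"PROJ_NUM": 2, "PROJ_NAME": "Beta", "EMP_NUM": 101, "EMP_NAME": "A. Cruz",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75, "HOURS": 8.0},
]

project_t = project(rows_1nf, ["PROJ_NUM", "PROJ_NAME"])
employee_t = project(rows_1nf, ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
assignment_t = project(rows_1nf, ["PROJ_NUM", "EMP_NUM", "HOURS"])

# Joining the three 2NF tables back should reproduce the 1NF rows exactly.
rebuilt = natural_join(natural_join(assignment_t, project_t), employee_t)
print(len(project_t), len(employee_t), len(assignment_t))  # 2 2 3
print(as_set(rebuilt) == as_set(rows_1nf))                  # True
```

Note that PROJ_NAME and EMP_NAME are now stored once per project and once per employee instead of once per assignment row.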
Conversion to Third Normal Form (3NF)
The data anomalies created by the database organization shown in the figure from the previous slide are easily eliminated:
Make new tables to eliminate transitive dependencies
Reassign corresponding dependent attributes
Table is in 3NF when it:
Is in 2NF
Contains no transitive dependencies
Example
Note that if there are no transitive dependencies, 2NF and 3NF are the same.
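For the running example, the only transitive dependency is JOB_CLASS → CHG_HOUR, so 3NF moves it out of EMPLOYEE into its own JOB table. A minimal sketch, assuming dict rows; `split_transitive` and the employee data are hypothetical. It refuses to split if the existing data already violate the dependency, and it shows that a rate change afterward touches one entry instead of every matching employee row.

```python
def split_transitive(rows, determinant, dependent):
    """3NF step: move determinant -> dependent into its own lookup table."""
    lookup = {}
    for r in rows:
        stored = lookup.setdefault(r[determinant], r[dependent])
        if stored != r[dependent]:
            raise ValueError(
                f"{determinant} {r[determinant]!r} has conflicting {dependent} values")
    remaining = [{k: v for k, v in r.items() if k != dependent} for r in rows]
    return remaining, lookup


# Hypothetical 2NF EMPLOYEE rows; two employees share a job class.
employee_2nf = [
    {"EMP_NUM": 101, "EMP_NAME": "A. Cruz", "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75},
    {"EMP_NUM": 102, "EMP_NAME": "B. Reyes", "JOB_CLASS": "Analyst", "CHG_HOUR": 40.00},
    {"EMP_NUM": 103, "EMP_NAME": "C. Santos", "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75},
]

employee_3nf, job = split_transitive(employee_2nf, "JOB_CLASS", "CHG_HOUR")

# A rate change now touches one JOB entry instead of every Programmer row.
job["Programmer"] = 37.00
rate_for_103 = job[employee_3nf[2]["JOB_CLASS"]]
print(job, rate_for_103)
```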
Normalization is valuable because its use helps eliminate data redundancies
Improving the Design
Issues that need to be addressed to produce a good normalized set of tables and enhance operational activities:
Evaluate PK assignments
It is better to create a JOB_CODE attribute as the PK of the JOB table than to use JOB_CLASS
Evaluate naming conventions
Change CHG_HOUR to JOB_CHG_HOUR to indicate its association with the JOB table
Change JOB_CLASS to JOB_DESCRIPTION
Refine attribute atomicity
Atomic attribute: an attribute that cannot be further subdivided
Atomicity: the characteristic of an atomic attribute
Subdivide EMP_NAME into EMP_FNAME, EMP_LNAME, and EMP_INITIAL
Identify new attributes
In a real-world scenario, attributes such as EMP_HIREDATE and EMP_SSS can be desirable for the EMPLOYEE table
Improving the Design
Identify new relationships
Each PROJECT has one PROJECT MANAGER. Adding EMP_NUM as a foreign key to PROJECT ensures that you have access to the project manager's data without data redundancy.
Refine primary keys as required for data granularity
Granularity: level of detail represented by the values stored in a table's row
ASSIGN_HOURS in the ASSIGNMENT table represents the hours worked on a project; does it represent a daily, weekly, monthly, or yearly total?
Using a surrogate key, ASSIGN_NUM, provides lower granularity (a finer level of detail per row)
Maintain historical accuracy
Writing ASSIGN_CHG_HR into the ASSIGNMENT table is crucial to maintaining historical accuracy, since the charge per hour changes over time.
Evaluate using derived attributes
You can use a derived attribute, ASSIGN_CHARGE, computed by multiplying ASSIGN_HOURS by ASSIGN_CHG_HR, for ease of use.
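The historical-accuracy and derived-attribute points can be sketched with Python's built-in `sqlite3` module. This is a simplified, hypothetical schema, not the completed database from the slides: PROJ_NUM and EMP_NUM are omitted from ASSIGNMENT and JOB_CODE stands in for them, and the job code 'PRG' and the rates are made up. The current job rate is copied into ASSIGN_CHG_HR when the assignment is recorded, so a later rate change does not rewrite old charges; ASSIGN_CHARGE is computed at query time here rather than stored.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
con.executescript("""
    CREATE TABLE JOB (
        JOB_CODE        TEXT PRIMARY KEY,
        JOB_DESCRIPTION TEXT NOT NULL,
        JOB_CHG_HOUR    REAL NOT NULL
    );
    -- Simplified: JOB_CODE stands in for the PROJ_NUM/EMP_NUM foreign keys.
    CREATE TABLE ASSIGNMENT (
        ASSIGN_NUM    INTEGER PRIMARY KEY,
        JOB_CODE      TEXT NOT NULL REFERENCES JOB (JOB_CODE),
        ASSIGN_HOURS  REAL NOT NULL,
        ASSIGN_CHG_HR REAL NOT NULL
    );
""")


def record_assignment(job_code: str, hours: float) -> None:
    """Copy the job's current rate into the new assignment row."""
    con.execute(
        "INSERT INTO ASSIGNMENT (JOB_CODE, ASSIGN_HOURS, ASSIGN_CHG_HR) "
        "SELECT JOB_CODE, ?, JOB_CHG_HOUR FROM JOB WHERE JOB_CODE = ?",
        (hours, job_code),
    )


con.execute("INSERT INTO JOB VALUES ('PRG', 'Programmer', 35.75)")
record_assignment("PRG", 10)
con.execute("UPDATE JOB SET JOB_CHG_HOUR = 40.00 WHERE JOB_CODE = 'PRG'")
record_assignment("PRG", 5)

# ASSIGN_CHARGE is derived at query time rather than stored.
charges = con.execute(
    "SELECT ASSIGN_NUM, ASSIGN_HOURS * ASSIGN_CHG_HR AS ASSIGN_CHARGE "
    "FROM ASSIGNMENT ORDER BY ASSIGN_NUM"
).fetchall()
print(charges)  # [(1, 357.5), (2, 200.0)]
```

Storing ASSIGN_CHARGE instead would save the multiplication on every read, at the cost of keeping it in sync whenever hours are corrected; that is the trade-off the slide asks you to evaluate.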
The Completed Database
Surrogate Key Considerations
Used by designers when the primary key is considered unsuitable
System-defined attribute
Created and managed via the DBMS
Usually has a numeric value that is automatically incremented for each new row
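A minimal sketch of a surrogate key, again using Python's `sqlite3` module. In SQLite, a column declared `INTEGER PRIMARY KEY` is assigned automatically when no value is supplied; other DBMSs use identity columns, AUTO_INCREMENT, or sequences for the same purpose. The BOOKING table and its columns are hypothetical. The natural key is kept as a UNIQUE constraint so the surrogate does not hide duplicate real-world facts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE BOOKING (
        BOOK_ID   INTEGER PRIMARY KEY,   -- surrogate key: SQLite assigns the value
        BOOK_DATE TEXT NOT NULL,
        BOOK_ROOM TEXT NOT NULL,
        BOOK_TIME TEXT NOT NULL,
        UNIQUE (BOOK_DATE, BOOK_ROOM, BOOK_TIME)  -- natural key still enforced
    )
""")

insert = "INSERT INTO BOOKING (BOOK_DATE, BOOK_ROOM, BOOK_TIME) VALUES (?, ?, ?)"
new_ids = [con.execute(insert, row).lastrowid
           for row in [("2024-05-01", "Hall A", "09:00"),
                       ("2024-05-01", "Hall A", "13:00")]]
print(new_ids)  # [1, 2]

# Re-inserting the same natural key is rejected despite the surrogate PK.
try:
    con.execute(insert, ("2024-05-01", "Hall A", "09:00"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```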
The Boyce-Codd Normal Form
Tables in 3NF will perform suitably in business transactional databases. However, higher normal forms are sometimes useful.
BCNF
Every determinant in the table is a candidate key
Candidate key: has the same characteristics as the primary key but was not chosen to be the primary key
Equivalent to 3NF when the table contains only one candidate key
A 3NF table can violate BCNF only when it contains more than one candidate key
Considered to be a special case of 3NF
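The BCNF rule can be tested with attribute closure: a determinant qualifies if its closure covers every attribute of the table (i.e., it is a superkey, the usual formal wording of the same rule). The table R(A, B, C, D) and its dependencies below are a hypothetical illustration, not the sample data on the slides. It is in 3NF, has two candidate keys, (A, B) and (A, C), and still violates BCNF because C is not a key. `closure` and `bcnf_violations` are hypothetical helpers.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Nontrivial dependencies whose determinant is not a (super)key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)]


table = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C", "D"}),   # (A, B) is a candidate key
       ({"C"}, {"B"})]             # C determines part of that key

print(closure({"A", "C"}, fds) == table)  # True: (A, C) is also a candidate key
print(bcnf_violations(table, fds))        # [({'C'}, {'B'})]
```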
The Boyce-Codd Normal Form
Decomposition to BCNF
Sample Data for a BCNF Conversion
Example
Staff_Property_Inspection (1NF)
DreamHome company rents properties to clients. Periodically, the properties are inspected in order to perform certain repairs and to maintain the properties in the best possible condition. When a staff member is required to inspect a property, he or she uses the company car for the day. However, a car may be allocated to several members of staff throughout the working day, as needed and available. A staff member may inspect several properties on a given day (using one car), but a property is only inspected once on a given day.
Example
Dependency Diagram – 1NF
Example
Dependency Diagram – 2NF
Example
Dependency Diagram – 3NF
Example
If functional dependencies exist in the table whose determinants are not candidate keys for the table, remove those functional dependencies by placing them in new tables (one new table per dependency).
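The rule above can be sketched as one decomposition pass over a list of violating dependencies. This is a minimal sketch, assuming the violations have already been identified; `decompose` is a hypothetical helper and R(A, B, C, D) with C → B is a made-up table. A complete algorithm would re-check each resulting table and repeat until no violations remain.

```python
def decompose(all_attrs, violating):
    """One new table per violating dependency; the rest stays in the original."""
    new_tables = []
    remainder = set(all_attrs)
    for lhs, rhs in violating:
        new_tables.append(set(lhs) | set(rhs))  # determinant becomes this table's key
        remainder -= set(rhs) - set(lhs)         # dependents leave the original table
    return new_tables + [remainder]


# Hypothetical table R(A, B, C, D) where C -> B violates BCNF
# (the other dependency is A, B -> C, D).
tables = decompose({"A", "B", "C", "D"}, [({"C"}, {"B"})])
print([sorted(t) for t in tables])  # [['B', 'C'], ['A', 'C', 'D']]
```

One known trade-off: after this split, A and B no longer sit in the same table, so the dependency A, B → C, D can no longer be enforced within a single table.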
Fourth Normal Form (4NF)
Rules:
All attributes must be dependent on the primary key, but they must be independent of each other
No row may contain two or more multivalued facts about an entity
Table is in 4NF when it:
Is in 3NF/BCNF
Has no independent multivalued dependencies
Multivalued dependencies exist when there are at least three attributes in a table (e.g., A, B, C) and, for each value of attribute A, there is a set of values of B and a set of values of C that is independent of the set of values of B
To remove the dependency, we divide the table into two new tables:
Table 1 has attributes A and B
Table 2 has attributes A and C
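The definition above gives a direct test: for each value of A, the stored (B, C) pairs must be exactly every combination of that A's B values with its C values. A minimal sketch, assuming a table whose only attributes are A, B, and C, held as dict rows; `mvd_holds` and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import product


def mvd_holds(rows, a, b, c):
    """Check A ->> B (and A ->> C) in a table whose only attributes are a, b, c."""
    pairs = defaultdict(set)
    for r in rows:
        pairs[r[a]].add((r[b], r[c]))
    for combos in pairs.values():
        b_values = {x for x, _ in combos}
        c_values = {y for _, y in combos}
        # Every B value must appear with every C value for this A value.
        if combos != set(product(b_values, c_values)):
            return False
    return True


# Hypothetical rows for a single attribute value A = "a1".
full = [{"A": "a1", "B": b, "C": c} for b in ("b1", "b2") for c in ("c1", "c2")]
print(mvd_holds(full, "A", "B", "C"))      # True: all 4 combinations present
print(mvd_holds(full[:3], "A", "B", "C"))  # False: (b2, c2) is missing
```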
Table Assignment has, for each Course, a set of Instructors that teach that Course and a set of Textbooks that are used for that Course:
“Management” has 3 Instructors and 2 Textbooks
“Finance” has 1 Instructor and 2 Textbooks
The textbooks used for a Course are independent of the Instructors that are teaching the Course.
First, we convert Assignment into table Offering by filling the empty cells
For each Course, we need all possible combinations of Instructor and Textbook
To get the tables into 4NF, we divide Offering into an Instructor table and a Textbook table
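The Course example can be reproduced to see why Offering is redundant and why the split loses nothing. The counts (3 Instructors and 2 Textbooks for Management, 1 Instructor and 2 Textbooks for Finance) come from the example; the instructor and textbook names are hypothetical placeholders.

```python
from itertools import product

# Hypothetical names; the counts follow the example.
assignment = {
    "Management": {"instructors": ["Instr. A", "Instr. B", "Instr. C"],
                   "textbooks": ["Text 1", "Text 2"]},
    "Finance": {"instructors": ["Instr. D"],
                "textbooks": ["Text 3", "Text 4"]},
}

# Offering: every Instructor/Textbook combination for each Course.
offering = [(course, i, t)
            for course, g in assignment.items()
            for i, t in product(g["instructors"], g["textbooks"])]

# 4NF: one table per independent multivalued fact.
instructor = sorted({(course, i) for course, i, _ in offering})
textbook = sorted({(course, t) for course, _, t in offering})

# Joining the two tables on Course gives back Offering, so nothing is lost.
rejoined = sorted((c1, i, t) for c1, i in instructor
                  for c2, t in textbook if c1 == c2)

print(len(offering), len(instructor), len(textbook))  # 8 4 4
print(rejoined == sorted(offering))                   # True
```

With 3 instructors and 2 textbooks, Management alone needs 3 × 2 = 6 Offering rows; adding a fourth instructor would add two more rows, whereas the split tables need only one new Instructor row.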
Normalization and Database Design
Normalization should be part of the design process
Proposed entities must meet the required normal form before table structures are created
Principles and normalization procedures must be understood to redesign and modify databases
ERD is created through an iterative process
Normalization focuses on the characteristics of specific entities
Initial Contracting Company ERD
Modified Contracting Company ERD
Incorrect M:N Relationship Representation
Final Contracting Company ERD
The Implemented Database
Table Name: EMPLOYEE
Table Name: JOB
Table Name: PROJECT
Table Name: ASSIGNMENT
Denormalization
Design goals considerations:
Creation of normalized relations
Processing requirements and speed
Problem in normalization: the number of database tables expands
Tables are decomposed to conform to normalization requirements
Joining a larger number of tables takes additional input/output (I/O) operations and processing logic, which reduces system speed
Defects in unnormalized tables:
Data updates are less efficient because tables are larger
Indexing is more cumbersome
No simple strategies for creating virtual tables known as views
Denormalization
Common Denormalization Examples

Case: Redundant data
Example: Storing ZIP and CITY attributes in the AGENT table when ZIP determines CITY
Rationale and controls: Avoid extra join operations. Program can validate city (drop-down box) based on the zip code.

Case: Derived data
Example: Storing STU_HRS and STU_CLASS (student classification) when STU_HRS determines STU_CLASS
Rationale and controls: Avoid extra join operations. Program can validate classification (lookup) based on the student hours.

Case: Preaggregated data (also derived data)
Example: Storing the student grade point average (STU_GPA) aggregate value in the STUDENT table when this can be calculated from the ENROLL and COURSE tables
Rationale and controls: Avoid extra join operations. Program computes the GPA every time a grade is entered or updated. STU_GPA can be updated only via administrative routine.

Case: Information requirements
Example: Using a temporary denormalized table to hold report data; this is required when creating a tabular report in which the columns represent data that are stored in the table as rows
Rationale and controls: Impossible to generate the data required by the report using plain SQL. No need to maintain table. Temporary table is deleted once report is done. Processing speed is not an issue.
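For the redundant-data case, the slide's control is validation at entry (a drop-down based on the zip code). A complementary after-the-fact audit can flag ZIP values already stored with more than one CITY. A minimal sketch, assuming the AGENT rows are available as dicts; the attribute names AGENT_CODE, AGENT_ZIP, and AGENT_CITY, the helper `inconsistent_zips`, and all data values are hypothetical.

```python
from collections import defaultdict


def inconsistent_zips(agents):
    """Return ZIP codes that are stored with more than one CITY value."""
    cities = defaultdict(set)
    for agent in agents:
        cities[agent["AGENT_ZIP"]].add(agent["AGENT_CITY"])
    return sorted(z for z, names in cities.items() if len(names) > 1)


# Hypothetical AGENT rows; the third one has a mistyped city.
agents = [
    {"AGENT_CODE": 501, "AGENT_ZIP": "11111", "AGENT_CITY": "Northville"},
    {"AGENT_CODE": 502, "AGENT_ZIP": "11111", "AGENT_CITY": "Northville"},
    {"AGENT_CODE": 503, "AGENT_ZIP": "11111", "AGENT_CITY": "Northvile"},
    {"AGENT_CODE": 504, "AGENT_ZIP": "22222", "AGENT_CITY": "Southport"},
]

print(inconsistent_zips(agents))  # ['11111']
```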
Data Modeling Checklist
Business rules
Properly document and verify all business rules with the end users
Ensure that all business rules are written precisely, clearly, and simply
The business rules must help identify entities, attributes, relationships, and constraints
Identify the source of all business rules, and ensure that each business rule is justified, dated, and signed off by an approving authority
Data modeling
Naming conventions: all names should be limited in length (database-dependent size)
Entity names:
Should be nouns that are familiar to business and should be short and meaningful
Should document abbreviations, synonyms, and aliases for each entity
Should be unique within the model
For composite entities, may include a combination of abbreviated names of the entities linked through the composite entity
Attribute names:
Should be unique within the entity
Should use the entity abbreviation as a prefix
Should be descriptive of the characteristic
Should use suffixes such as _ID, _NUM, or _CODE for the PK attribute
Should not be a reserved word
Should not contain spaces or special characters such as @, !, or &
Relationship names:
Should be active or passive verbs that clearly indicate the nature of the relationship
Entities:
Each entity should represent a single subject
Each entity should represent a set of distinguishable entity instances
All entities should be in 3NF or higher
Any entities below 3NF should be justified
Granularity of the entity instance should be clearly defined
PK should be clearly defined and support the selected data granularity
Attributes:
Should be simple and single-valued (atomic data)
Should document default values, constraints, synonyms, and aliases
Derived attributes should be clearly identified and include source(s)
Should not be redundant unless this is required for transaction accuracy, performance, or maintaining a history
Nonkey attributes must be fully dependent on the PK attribute
Relationships:
Should clearly identify relationship participants
Should clearly define participation, connectivity, and document cardinality
ER model:
Should be validated against expected processes: inserts, updates, and deletions
Should evaluate where, when, and how to maintain a history
Should not contain redundant relationships except as required (see attributes)
Should minimize data redundancy to ensure single-place updates
Should conform to the minimal data rule: All that is needed is there, and all that is there is needed
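Several of the attribute-naming rules in the checklist are mechanical enough to lint automatically. A minimal sketch: the helper `check_attribute_name`, the 30-character limit, and the small reserved-word set are all hypothetical stand-ins, since real length limits and reserved-word lists are DBMS-specific. Uniqueness within the entity and descriptiveness still need a human reviewer.

```python
import re

# A few SQL reserved words for illustration only; real lists are DBMS-specific.
RESERVED = {"SELECT", "FROM", "WHERE", "ORDER", "GROUP", "TABLE", "USER"}
PK_SUFFIXES = ("_ID", "_NUM", "_CODE")


def check_attribute_name(name, entity_prefix, is_pk=False, max_len=30):
    """Return the naming-checklist rules that an attribute name breaks."""
    problems = []
    if len(name) > max_len:
        problems.append("too long")
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        problems.append("contains spaces or special characters")
    if name.upper() in RESERVED:
        problems.append("reserved word")
    if not name.upper().startswith(entity_prefix.upper() + "_"):
        problems.append("missing entity prefix")
    if is_pk and not name.upper().endswith(PK_SUFFIXES):
        problems.append("PK lacks _ID, _NUM, or _CODE suffix")
    return problems


print(check_attribute_name("EMP_NUM", "EMP", is_pk=True))    # []
print(check_attribute_name("JOB_CLASS", "JOB", is_pk=True))  # ['PK lacks _ID, _NUM, or _CODE suffix']
print(check_attribute_name("EMP_NAME@", "EMP"))              # ['contains spaces or special characters']
```

The JOB_CLASS result echoes the earlier design advice to replace JOB_CLASS with a JOB_CODE primary key.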
Summary
Normalization is a technique used to design tables in which data redundancies are minimized
A table is in 1NF when all key attributes are defined and all remaining attributes are dependent on the primary key
A table is in 2NF when it is in 1NF and contains no partial dependencies
A table is in 3NF when it is in 2NF and contains no transitive dependencies
A table that is not in 3NF may be split into new tables until all of the tables meet the 3NF requirements
Normalization is an important part, but only a part, of the design process
A table in 3NF might contain multivalued dependencies that produce either numerous null values or redundant data
The larger the number of tables, the more additional I/O operations and processing logic you need to join them
The data-modeling checklist provides a way for the designer to check that the ERD meets a set of minimum requirements
