Normalization of Database Tables
Information Management
Learning Objectives
After completing this chapter, you will be able to:
STU_NUM → STU_LNAME
321452 → BOWSER
{STU_LNAME} → {STU_LNAME}
{STU_LNAME, STU_NUM} → {STU_DOB}
{STU_NUM} → {STU_LNAME, STU_DOB, DEPT_CODE}
{EMAIL} → {STU_NUM, STU_LNAME, STU_DOB, STU_GPA}
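A dependency such as STU_NUM → STU_LNAME says that rows agreeing on the determinant must agree on the dependent attributes. The sketch below tests that property on a few rows. The function name fd_holds and every sample value except 321452 / BOWSER are hypothetical, made up for illustration. Sample data can only refute a dependency, it cannot prove that the dependency holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # The first time a determinant value is seen, remember its dependents;
        # any later row with the same determinant must match them.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical sample rows (only 321452 / BOWSER appears in the slides).
STUDENTS = [
    {"STU_NUM": 321452, "STU_LNAME": "BOWSER", "STU_DOB": "1999-02-12"},
    {"STU_NUM": 324257, "STU_LNAME": "SMITH", "STU_DOB": "2000-11-02"},
    {"STU_NUM": 324258, "STU_LNAME": "SMITH", "STU_DOB": "2001-05-30"},
]

num_determines_name = fd_holds(STUDENTS, ["STU_NUM"], ["STU_LNAME"])
name_determines_dob = fd_holds(STUDENTS, ["STU_LNAME"], ["STU_DOB"])
```

In this sample, STU_NUM determines STU_LNAME, while STU_LNAME does not determine STU_DOB because two students share a last name.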
Database Tables and Normalization
Normalization: evaluating and correcting table structures to minimize data redundancies and data inconsistencies
Reduces data anomalies
Assigns attributes to tables based on determination/functional dependencies
Performed as a series of steps that result in a set of desired forms
Normal forms
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
Structural point of view of normal forms
Higher normal forms are better than lower normal forms
There are higher normal forms, but they are not based on functional dependencies (e.g., 4NF is based on “multivalued dependencies”)
Properly designed 3NF structures usually also meet the requirements of fourth normal form (4NF)
Denormalization
Produces a lower normal form
Results in increased performance and greater data redundancy
The Need for Normalization
Used while designing a new database structure
Analyzes the relationship among the attributes within
each entity
Determines if the structure can be improved through
normalization
Improves the existing data structure and creates an
appropriate database design
The Normalization Process
Objective is to ensure that each table conforms to the concept of well-formed relations
Each table represents a single subject
Each row/column intersection contains only one value and not a group of values
No data item will be unnecessarily stored in more than one table
All nonprime attributes in a table are dependent on the primary key
Each table has no insertion, update, or deletion anomalies
First normal form (1NF): table format, no repeating groups, and PK identified
Dependency diagram
Depicts all dependencies found within a given table structure
Helps to get an overview of all relationships among a table’s attributes
Makes it less likely that an important dependency will be overlooked
Example
A Sample Report Layout
A consulting company charges its clients by billing the hours spent on each project. The hourly billing rate is dependent on the employee’s position. Periodically, a report is generated whose contents correspond to the reporting requirements.
Assignment Table (Primary Key: PROJ_NUM & EMP_NUM)
Assignment Table (Identify repeating groups & enter appropriate scalar data)
Example
1NF (PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
The arrows above indicate that all the table’s attributes are dependent on the PK.
The arrows below indicate any other functional dependencies.
PARTIAL DEPENDENCIES:
(PROJ_NUM -> PROJ_NAME)
(EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR)
TRANSITIVE DEPENDENCY:
(JOB_CLASS -> CHG_HOUR)
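The dependencies listed for the 1NF table can be written down as data and checked mechanically. The sketch below computes an attribute closure and labels each dependency relative to the composite primary key. The function names closure and classify are hypothetical, and the labeling rule is a simplification that only looks at whether the determinant is the key, part of the key, or made up of nonkey attributes.

```python
# Functional dependencies from the 1NF table, as (determinant, dependents).
FDS = [
    ({"PROJ_NUM", "EMP_NUM"},
     {"PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"}),
    ({"PROJ_NUM"}, {"PROJ_NAME"}),
    ({"EMP_NUM"}, {"EMP_NAME", "JOB_CLASS", "CHG_HOUR"}),
    ({"JOB_CLASS"}, {"CHG_HOUR"}),
]
PRIMARY_KEY = {"PROJ_NUM", "EMP_NUM"}


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def classify(fds, key):
    """Label each dependency by how its determinant relates to the key."""
    labels = {}
    for lhs, _rhs in fds:
        if lhs == key:
            kind = "full"
        elif lhs < key:
            kind = "partial"      # determinant is only part of the key
        elif not (lhs & key):
            kind = "transitive"   # determinant is a nonkey attribute
        else:
            kind = "other"
        labels[tuple(sorted(lhs))] = kind
    return labels


labels = classify(FDS, PRIMARY_KEY)
emp_closure = closure({"EMP_NUM"}, FDS)
```

Here the closure of EMP_NUM contains EMP_NAME, JOB_CLASS, and CHG_HOUR but not HOURS, which is why HOURS needs the full composite key.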
Conversion to Second Normal Form (2NF)
Conversion to 2NF occurs only when the 1NF has a composite primary key
If the 1NF has a single-attribute primary key, then the table is automatically in 2NF
Example
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
TRANSITIVE DEPENDENCY
(JOB_CLASS -> CHG_HOUR)
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
Note that if a table in 1NF has no partial dependencies, it is already in 2NF.
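The conversion to 2NF can be illustrated by projecting the 1NF rows onto one table per determinant. The sketch below does this on hypothetical sample rows (all data values are made up). It assumes a PROJECT (PROJ_NUM, PROJ_NAME) table implied by the partial dependency PROJ_NUM -> PROJ_NAME, and it keeps the column name HOURS where the slides rename it to ASSIGN_HOURS. The helper names project and rejoin are also hypothetical.

```python
COLUMNS = ("PROJ_NUM", "EMP_NUM", "PROJ_NAME", "EMP_NAME",
           "JOB_CLASS", "CHG_HOUR", "HOURS")

# Hypothetical 1NF rows; project and employee facts repeat on every row.
ROWS_1NF = [
    (1, 101, "Alpha", "Ann", "Engineer", 80.0, 10.0),
    (1, 102, "Alpha", "Bob", "Analyst", 60.0, 5.0),
    (2, 101, "Beta", "Ann", "Engineer", 80.0, 7.5),
    (2, 103, "Beta", "Cy", "Analyst", 60.0, 12.0),
]


def project(rows, columns, keep):
    """Project rows onto the keep columns and drop duplicate tuples."""
    idx = [columns.index(c) for c in keep]
    result = []
    for row in rows:
        t = tuple(row[i] for i in idx)
        if t not in result:
            result.append(t)
    return result


PROJECT = project(ROWS_1NF, COLUMNS, ("PROJ_NUM", "PROJ_NAME"))
EMPLOYEE = project(ROWS_1NF, COLUMNS,
                   ("EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"))
ASSIGNMENT = project(ROWS_1NF, COLUMNS, ("PROJ_NUM", "EMP_NUM", "HOURS"))


def rejoin(project_t, employee_t, assignment_t):
    """Join the three tables back into the original 1NF column order."""
    proj = {p[0]: p[1] for p in project_t}
    emp = {e[0]: e[1:] for e in employee_t}
    return [(pn, en, proj[pn], *emp[en], hrs)
            for pn, en, hrs in assignment_t]


reconstructed = rejoin(PROJECT, EMPLOYEE, ASSIGNMENT)
```

Rejoining the three tables reproduces the original rows, so no information is lost, while each project name and each employee's details are now stored once.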
Conversion to Third Normal Form (3NF)
The data anomalies created by the database organization shown in the figure from the previous slide are easily eliminated
Make new tables to eliminate transitive dependencies
Reassign corresponding dependent attributes
System-defined attribute
Created and managed via the DBMS
Has a numeric value that is automatically incremented for each new row
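A minimal sketch of both ideas, using Python's built-in sqlite3 module: the transitive dependency JOB_CLASS -> CHG_HOUR moves into its own table, and that table gets a system-defined key. The JOB table, the JOB_CODE column, and all data values are assumptions made for illustration. AUTOINCREMENT is SQLite syntax and other DBMSs use different keywords for the same feature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CODE  INTEGER PRIMARY KEY AUTOINCREMENT,  -- system-defined attribute
    JOB_CLASS TEXT NOT NULL UNIQUE,
    CHG_HOUR  REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM  INTEGER PRIMARY KEY,
    EMP_NAME TEXT NOT NULL,
    JOB_CODE INTEGER NOT NULL REFERENCES JOB (JOB_CODE)
);
""")

# JOB_CODE is omitted from the insert; the DBMS assigns 1, 2, ...
conn.executemany(
    "INSERT INTO JOB (JOB_CLASS, CHG_HOUR) VALUES (?, ?)",
    [("Engineer", 80.0), ("Analyst", 60.0)],
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(101, "Ann", 1), (102, "Bob", 2), (103, "Cy", 2)],
)

# The hourly rate is stored once, so a rate change is a single-row update.
conn.execute("UPDATE JOB SET CHG_HOUR = 65.0 WHERE JOB_CLASS = 'Analyst'")

job_codes = [r[0] for r in conn.execute(
    "SELECT JOB_CODE FROM JOB ORDER BY JOB_CODE")]
rates = conn.execute("""
    SELECT E.EMP_NUM, J.CHG_HOUR
    FROM EMPLOYEE E JOIN JOB J ON J.JOB_CODE = E.JOB_CODE
    ORDER BY E.EMP_NUM
""").fetchall()
```

Because CHG_HOUR lives only in JOB, both analysts pick up the new rate without any change to their EMPLOYEE rows.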
The Boyce-Codd Normal Form
Tables in 3NF will perform suitably in business transactional databases. However, higher normal forms are sometimes useful.
BCNF
Every determinant in the table is a candidate key
Candidate key: same characteristics as primary key but not
chosen to be the primary key
Equivalent to 3NF when the table contains only one
candidate key
Violated only when the table contains more than one
candidate key
Considered to be a special case of 3NF
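The BCNF condition above can be tested with attribute closures. The sketch below uses a hypothetical table R(A, B, C, D) with the dependencies A,B -> C,D and C -> B; the attribute names and the function names are illustrative only. The check treats a determinant as acceptable when its closure covers the whole table, that is, when it is a candidate key or contains one.

```python
from itertools import combinations

ATTRIBUTES = {"A", "B", "C", "D"}
FDS = [
    (frozenset({"A", "B"}), frozenset({"C", "D"})),
    (frozenset({"C"}), frozenset({"B"})),
]


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole table."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            attrs = set(combo)
            is_superkey = closure(attrs, fds) == set(attributes)
            # Smaller keys were found first, so skip supersets of them.
            if is_superkey and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys


def bcnf_violations(attributes, fds):
    """Return determinants of nontrivial dependencies that are not keys."""
    return [lhs for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attributes)]


keys = candidate_keys(ATTRIBUTES, FDS)
violations = bcnf_violations(ATTRIBUTES, FDS)
```

This table has two candidate keys, {A, B} and {A, C}. The determinant C is not a key, so C -> B violates BCNF even though the table satisfies 3NF, since B is part of a candidate key.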
Decomposition to BCNF
Sample Data for a BCNF Conversion
Example
Staff_Property_Inspection (1NF)
The DreamHome company rents properties to clients. Periodically, the properties are inspected in order to perform certain repairs and to keep them in the best possible condition. When a staff member is required to inspect a property, he or she uses a company car for the day. However, a car may be allocated to several members of staff throughout the working day, as needed and available. A staff member may inspect several properties on a given day (using one car), but a property is inspected only once on a given day.
Dependency Diagram – 1NF
Dependency Diagram – 2NF
Dependency Diagram – 3NF
Example
If functional dependencies exist in the table whose determinants are not candidate keys for the table, remove those dependencies by placing them in new tables (one new table per dependency)
Fourth Normal Form (4NF)
Rules
All attributes must be dependent on the primary key, but they must be independent of each other
No row may contain two or more multivalued facts about an entity
Table is in 4NF when it:
Is in 3NF/BCNF
Has no multivalued dependencies
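To see why two independent multivalued facts should not share a row, the sketch below compares one combined table with two separate tables. The employee number, organization codes, and project numbers are hypothetical, as are the function names.

```python
from itertools import product

# Two independent multivalued facts about the same (hypothetical) employee.
ORGS = {10123: ["RC", "UW", "SA"]}   # employee -> volunteer organizations
PROJECTS = {10123: [1, 3, 4]}        # employee -> assigned projects


def single_table(orgs, projects):
    """One table for both facts: every organization pairs with every project."""
    return [(emp, org, proj)
            for emp in orgs
            for org, proj in product(orgs[emp], projects[emp])]


def split_tables(orgs, projects):
    """One table per multivalued fact, each keyed by the employee."""
    emp_org = [(emp, org) for emp, values in orgs.items() for org in values]
    emp_proj = [(emp, proj) for emp, values in projects.items() for proj in values]
    return emp_org, emp_proj


combined = single_table(ORGS, PROJECTS)
emp_org, emp_proj = split_tables(ORGS, PROJECTS)
```

The combined table needs nine rows to record six facts, and adding one more organization would add three rows. The split tables grow by one row per new fact.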
Normalization and Database Design
Initial Contracting Company ERD
Modified Contracting Company ERD
Incorrect M:N Relationship Representation
Final Contracting Company ERD
The Implemented Database
Table Name: EMPLOYEE
Table Name: ASSIGNMENT
Denormalization
Design goal considerations:
Creation of normalized relations
Processing requirements and speed
Preaggregated data (also derived data):
Storing the student grade point average (STU_GPA) aggregate value in the STUDENT table when it can be calculated from the ENROLL and COURSE tables
Avoids extra join operations
Program computes the GPA every time a grade is entered or updated
STU_GPA can be updated only via administrative routine
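The STU_GPA case can be sketched with Python's built-in sqlite3 module. Only the STUDENT, ENROLL, and COURSE table names and the STU_NUM and STU_GPA columns come from the slides. The other columns (CRS_CODE, CRS_CREDIT, GRADE_POINTS), the credit-weighted GPA formula, the record_grade routine, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (STU_NUM INTEGER PRIMARY KEY, STU_GPA REAL);
CREATE TABLE COURSE  (CRS_CODE TEXT PRIMARY KEY, CRS_CREDIT INTEGER NOT NULL);
CREATE TABLE ENROLL (
    STU_NUM      INTEGER REFERENCES STUDENT (STU_NUM),
    CRS_CODE     TEXT REFERENCES COURSE (CRS_CODE),
    GRADE_POINTS REAL NOT NULL,
    PRIMARY KEY (STU_NUM, CRS_CODE)
);
INSERT INTO STUDENT VALUES (321452, NULL);
INSERT INTO COURSE VALUES ('CIS-220', 3), ('MATH-243', 4);
""")

# The normalized way to get a GPA: join ENROLL and COURSE on every read.
GPA_QUERY = """
SELECT SUM(E.GRADE_POINTS * C.CRS_CREDIT) / SUM(C.CRS_CREDIT)
FROM ENROLL E JOIN COURSE C ON C.CRS_CODE = E.CRS_CODE
WHERE E.STU_NUM = ?
"""


def record_grade(conn, stu_num, crs_code, grade_points):
    """Enter or update a grade and refresh the stored STU_GPA in one transaction."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO ENROLL VALUES (?, ?, ?)",
                     (stu_num, crs_code, grade_points))
        (gpa,) = conn.execute(GPA_QUERY, (stu_num,)).fetchone()
        conn.execute("UPDATE STUDENT SET STU_GPA = ? WHERE STU_NUM = ?",
                     (gpa, stu_num))


record_grade(conn, 321452, "CIS-220", 4.0)
record_grade(conn, 321452, "MATH-243", 3.0)

# Reading the GPA now needs no join at all.
stored_gpa = conn.execute(
    "SELECT STU_GPA FROM STUDENT WHERE STU_NUM = ?", (321452,)).fetchone()[0]
computed_gpa = conn.execute(GPA_QUERY, (321452,)).fetchone()[0]
```

Routing every grade change through one routine is the control that keeps the redundant STU_GPA value consistent with the ENROLL and COURSE data.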
Data Modeling Checklist
Naming conventions: all names should be limited in length (database-dependent size)
Entity names:
Should be nouns that are familiar to business and should be short and meaningful
Should document abbreviations, synonyms, and aliases for each entity
Should be unique within the model
For composite entities, may include a combination of abbreviated names of the
entities linked through the composite entity
Attribute names:
Should be unique within the entity
Should use the entity abbreviation as a prefix
Should be descriptive of the characteristic
Should use suffixes such as _ID, _NUM, or _CODE for the PK attribute
Should not be a reserved word
Should not contain spaces or special characters such as @, !, or &
Relationship names:
Should be active or passive verbs that clearly indicate the nature
of the relationship
Entities:
Each entity should represent a single subject
Each entity should represent a set of distinguishable entity instances
All entities should be in 3NF or higher
Any entities below 3NF should be justified
Granularity of the entity instance should be clearly defined
PK should be clearly defined and support the selected data granularity
Attributes:
Should be simple and single-valued (atomic data)
Should document default values, constraints, synonyms, and aliases
Derived attributes should be clearly identified and include source(s)
Should not be redundant unless this is required for transaction
accuracy, performance, or maintaining a history
Nonkey attributes must be fully dependent on the PK attribute
Relationships:
Should clearly identify relationship participants
Should clearly define participation and connectivity, and document cardinality
ER model:
Should be validated against expected processes: inserts, updates, and
deletions
Should evaluate where, when, and how to maintain a history
Should not contain redundant relationships except as required (see
attributes)
Should minimize data redundancy to ensure single-place updates
Should conform to the minimal data rule: All that is needed is there, and
all that is there is needed
Summary
Normalization is a technique used to design tables in which data redundancies are minimized
A table is in 1NF when all key attributes are defined and all remaining attributes are dependent on the primary key
A table is in 2NF when it is in 1NF and contains no partial
dependencies
A table is in 3NF when it is in 2NF and contains no transitive
dependencies
A table that is not in 3NF may be split into new tables until all of the
tables meet the 3NF requirements
Normalization is an important part—but only a part—of the design
process
A table in 3NF might contain multivalued dependencies that produce
either numerous null values or redundant data
The larger the number of tables, the more additional I/O operations
and processing logic you need to join them
The data-modeling checklist provides a way for the designer to
check that the ERD meets a set of minimum requirements