Normalization
A functional dependency (FD) is a relationship between two attributes, typically between the
PK and other non-key attributes within a table. For any relation R, attribute Y is functionally
dependent on attribute X (usually the PK) if, for every valid instance of X, that value of X
uniquely determines the value of Y. This relationship is indicated by the representation below:
X → Y
The left side of the above FD diagram is called the determinant, and the right side is the
dependent. Here are a few examples.
In the first example, below, SIN determines Name, Address and Birthdate. Given SIN, we can
determine any of the other attributes within the table.
In the second example, SIN and Course together determine the date completed (DateCompleted). This
shows that the determinant can also be a composite PK.
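To make the definition concrete, the Python sketch below tests whether an FD X → Y is consistent with one set of rows: any two rows that agree on X must also agree on Y. The function name `fd_holds` and the sample rows are hypothetical, invented for illustration. A table's contents can only refute an FD; whether the dependency holds for every valid instance is a business rule the designer still has to confirm.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on the lhs attributes but differ on the rhs attributes."""
    seen: Dict[Tuple[Any, ...], Tuple[Any, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)  # determinant value
        y = tuple(row[a] for a in rhs)  # dependent value
        # setdefault returns the value already recorded for x, if any.
        if seen.setdefault(x, y) != y:
            return False  # same determinant, different dependent: FD violated
    return True


# Hypothetical sample data for the two examples in the text.
person_rows: List[Row] = [
    {"SIN": "123", "Name": "Ann", "Address": "1 Elm St", "Birthdate": "1990-01-02"},
    {"SIN": "456", "Name": "Bob", "Address": "2 Oak Ave", "Birthdate": "1985-06-30"},
]
completion_rows: List[Row] = [
    {"SIN": "123", "Course": "DB101", "DateCompleted": "2023-04-30"},
    {"SIN": "123", "Course": "SQL200", "DateCompleted": "2023-12-15"},
    {"SIN": "456", "Course": "DB101", "DateCompleted": "2024-04-30"},
]

print(fd_holds(person_rows, ["SIN"], ["Name", "Address", "Birthdate"]))  # True
print(fd_holds(completion_rows, ["SIN", "Course"], ["DateCompleted"]))    # True
print(fd_holds(completion_rows, ["SIN"], ["DateCompleted"]))              # False
```

The last call fails because SIN 123 appears with two different completion dates, which is why SIN alone cannot be the determinant in the second example.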
Use an entity relationship diagram (ERD) to provide the big picture, or macro view, of an organization’s data
requirements and operations. The ERD is created through an iterative process that involves identifying relevant entities,
their attributes and their relationships.
The normalization procedure focuses on the characteristics of specific entities and represents the micro view of entities
within the ERD.
What Is Normalization?
Normalization is the branch of relational theory that provides design insights. It is the process of determining how
much redundancy exists in a table. The goals of normalization are to:
Normalization theory draws heavily on the theory of functional dependencies. Normalization theory defines six
normal forms (NF). Each normal form involves a set of dependency properties that a schema must satisfy, and each
normal form gives guarantees about the presence and/or absence of update anomalies. This means that higher normal
forms have less redundancy and, as a result, fewer update problems.
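As a small illustration of an update anomaly, the sketch below stores a course name redundantly on every enrollment row, changes it in only one row, and ends up with two names for the same course. The attribute names, the helper functions and the data are assumptions made for this example. In the normalized alternative the name is stored once per CourseNo, so a single update keeps the data consistent.

```python
from typing import Dict, List, Set

# Hypothetical unnormalized rows: CourseName is repeated on every enrollment.
enrollments: List[Dict[str, str]] = [
    {"StudentNo": "38214", "CourseNo": "M-212", "CourseName": "Linear Algebra", "Grade": "A"},
    {"StudentNo": "40875", "CourseNo": "M-212", "CourseName": "Linear Algebra", "Grade": "B"},
]


def rename_course_in_first_row(rows: List[Dict[str, str]], course_no: str, new_name: str) -> None:
    """Simulate a careless update that changes only the first matching row."""
    for row in rows:
        if row["CourseNo"] == course_no:
            row["CourseName"] = new_name
            break


def course_names(rows: List[Dict[str, str]], course_no: str) -> Set[str]:
    """Collect every name currently stored for one course."""
    return {row["CourseName"] for row in rows if row["CourseNo"] == course_no}


rename_course_in_first_row(enrollments, "M-212", "Matrix Algebra")
print(sorted(course_names(enrollments, "M-212")))  # ['Linear Algebra', 'Matrix Algebra']

# Normalized alternative: the name is stored once, keyed by CourseNo,
# so one update leaves no conflicting copies behind.
course_name_by_no: Dict[str, str] = {"M-212": "Linear Algebra"}
course_name_by_no["M-212"] = "Matrix Algebra"
```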
Normal Forms
Any table in a database can be in one of the normal forms discussed next. Ideally, the only redundancy left is the
minimal redundancy created when PK values are repeated as FKs; everything else should be derived from other tables.
There are six normal forms, but we will only look at the first four, which are:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
First Normal Form (1NF)
To normalize a relation that contains a repeating group, remove the repeating group and form two new relations.
The PK of the new relation is a combination of the PK of the original relation plus an attribute from the newly created
relation for unique identification.
In the Student Grade Report table, the repeating group is the course information. A student can take many
courses.
Remove the repeating group. In this case, it’s the course information for each student.
The PK of the new table must uniquely identify each row; here it is the combination of StudentNo and CourseNo.
After removing all the attributes related to the course and student, you are left with the student course table
(StudentCourse).
The Student table (Student) is now in first normal form with the repeating group removed.
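The steps above can be sketched in Python, assuming the Student Grade Report is held as a list of dictionaries in which each student record carries a nested `Courses` list (the repeating group). The function `remove_repeating_group` and attributes such as `StudentName`, `Major` and `CourseName` are hypothetical; only `StudentNo` and `CourseNo` come from the text.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def remove_repeating_group(report: List[Row], pk: str, group: str) -> Tuple[List[Row], List[Row]]:
    """Split records that hold a repeating group into two flat relations.

    The first relation keeps the non-repeating attributes. The second gets one
    row per group entry and copies the original PK into it, so that the PK plus
    an attribute of the entry can form its composite PK.
    """
    outer: List[Row] = []
    inner: List[Row] = []
    for record in report:
        outer.append({k: v for k, v in record.items() if k != group})
        for entry in record[group]:
            inner.append({pk: record[pk], **entry})
    return outer, inner


# Hypothetical Student Grade Report with the course information nested per student.
grade_report: List[Row] = [
    {"StudentNo": 38214, "StudentName": "Lee", "Major": "IT",
     "Courses": [
         {"CourseNo": "M-212", "CourseName": "Linear Algebra", "Grade": "A"},
         {"CourseNo": "CS-101", "CourseName": "Intro to Programming", "Grade": "B"},
     ]},
    {"StudentNo": 40875, "StudentName": "Cruz", "Major": "Math",
     "Courses": [
         {"CourseNo": "M-212", "CourseName": "Linear Algebra", "Grade": "B"},
     ]},
]

student, student_course = remove_repeating_group(grade_report, "StudentNo", "Courses")

# (StudentNo, CourseNo) should identify each StudentCourse row uniquely.
keys = [(row["StudentNo"], row["CourseNo"]) for row in student_course]
assert len(keys) == len(set(keys))
```

Copying StudentNo into every course row is what lets the new table keep the link back to the student while staying flat.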
Second Normal Form (2NF)
If the relation has a composite PK, then each non-key attribute must be fully dependent on the entire PK and not on a
subset of the PK (i.e., there must be no partial dependency or augmentation).
When examining the Student Course table, we see that not all the attributes are fully dependent on the PK;
in particular, the course information depends on CourseNo alone. The only attribute that is fully dependent on the whole PK is Grade.
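One way to look for partial dependencies is to test every proper subset of the composite PK against each non-key attribute. The self-contained sketch below does this on sample StudentCourse rows and then projects out a Course table, leaving StudentNo, CourseNo and Grade. The function names, `CourseName` and the data are hypothetical. Because it inspects data rather than business rules, it can report false positives on small samples (if each student had taken only one course, StudentNo would appear to determine Grade), so treat its output as a hint to check against the real dependencies.

```python
from itertools import combinations
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on lhs but differ on rhs."""
    seen: Dict[Tuple[Any, ...], Tuple[Any, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def partial_dependencies(rows: List[Row], pk: Sequence[str], non_key: Sequence[str]) -> Dict[str, Tuple[str, ...]]:
    """Map each non-key attribute to the first proper subset of pk that determines it."""
    found: Dict[str, Tuple[str, ...]] = {}
    for attr in non_key:
        for size in range(1, len(pk)):  # proper subsets only
            for subset in combinations(pk, size):
                if fd_holds(rows, subset, [attr]):
                    found[attr] = subset
                    break
            if attr in found:
                break
    return found


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Keep only attrs and drop duplicate rows, preserving first-seen order."""
    out: List[Row] = []
    seen = set()
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# Hypothetical StudentCourse rows after the 1NF step.
student_course: List[Row] = [
    {"StudentNo": 38214, "CourseNo": "M-212", "CourseName": "Linear Algebra", "Grade": "A"},
    {"StudentNo": 38214, "CourseNo": "CS-101", "CourseName": "Intro to Programming", "Grade": "B"},
    {"StudentNo": 40875, "CourseNo": "M-212", "CourseName": "Linear Algebra", "Grade": "B"},
    {"StudentNo": 40875, "CourseNo": "CS-101", "CourseName": "Intro to Programming", "Grade": "A"},
]

pk = ["StudentNo", "CourseNo"]
print(partial_dependencies(student_course, pk, ["CourseName", "Grade"]))
# {'CourseName': ('CourseNo',)}

# Move the partially dependent attribute into its own table.
course = project(student_course, ["CourseNo", "CourseName"])
enrollment = project(student_course, ["StudentNo", "CourseNo", "Grade"])
```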
Check the new tables, as well as any modified tables, to make sure that each table has a determinant and that no
table contains inappropriate dependencies.
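Working from declared FDs instead of data, this check can be sketched with attribute closure: the closure of a set of attributes X is every attribute reachable from X by repeatedly applying the FDs, and X is a superkey of a table when its closure contains every attribute of that table. The sketch assumes that an "inappropriate dependency" means a non-trivial FD inside the table whose determinant is not a superkey (the Boyce-Codd test); the FD list and the helper names `fd`, `closure` and `inappropriate` are hypothetical. For simplicity it only examines FDs whose determinant lies wholly inside the table, rather than computing the full projection of the FDs onto each table.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build an FD lhs -> rhs as a pair of frozensets."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def inappropriate(table: Set[str], fds: List[FD]) -> List[FD]:
    """FDs inside table whose determinant is not a superkey of table (simplified)."""
    bad: List[FD] = []
    for lhs, rhs in fds:
        # Consider only non-trivial FDs whose determinant lies in this table.
        if lhs <= table and (rhs & table) - lhs:
            if not table <= closure(lhs, fds):
                bad.append((lhs, rhs))
    return bad


# Hypothetical FDs for the student/course example.
fds: List[FD] = [
    fd(["StudentNo"], ["StudentName"]),
    fd(["CourseNo"], ["CourseName"]),
    fd(["StudentNo", "CourseNo"], ["Grade"]),
]

print(inappropriate({"StudentNo", "CourseNo", "CourseName", "Grade"}, fds))
# [(frozenset({'CourseNo'}), frozenset({'CourseName'}))]
print(inappropriate({"CourseNo", "CourseName"}, fds))          # []
print(inappropriate({"StudentNo", "CourseNo", "Grade"}, fds))  # []
```

The unsplit StudentCourse table is flagged because CourseNo determines CourseName without determining the whole table; after CourseName moves to its own table, neither resulting table is flagged.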
At this stage, there should be no anomalies in third normal form. Let’s look at the dependency diagram (Figure 12.1)
for this example. The first step is to remove repeating groups, as discussed above.