Integrity Constraints (Ref: DBMS by Silberschatz and Galvin)


RT503 Database Management Systems, Module 4
Department of Computer Science & Engineering

Integrity constraints (ref: DBMS by Silberschatz and Galvin)

Unauthorized users who gain access to the database can damage its data or leave the database
inconsistent. An authorized DBMS user can also leave the database in an inconsistent state by
accident. Some restrictions are therefore placed on the database so that users do not change data
accidentally. These restrictions are called constraints.
Integrity constraints are intended for the normal user. They ensure that changes made to the
database by authorized users do not result in a loss of data consistency, so they guard against
accidental damage to the database. There are a number of ways to specify integrity constraints.
They are

Key constraints ( primary keys, foreign keys and candidate key specification)
Using ‘not null’
Using ‘check’ clause
Using assertions
Using triggers
Using functional dependencies

Domain constraints

An attribute has a set of possible values associated with it, called its domain.
For example, in the student table

Student (stdid, name, marks)

the set of possible values for the attribute stdid is the integers.
For the attribute name, the possible values are character strings.
For the attribute marks, the possible values are integers.
Types such as integer, character and date are called standard domain types.

Declaring an attribute to be of a particular domain acts as a constraint on the values that it can take. It is
possible for several attributes to have the same domain. For example, in our student table the domain of
stdid is the same as the domain of marks, namely integer. Even so, a query such as "find the names of
students whose stdid equals some marks value" is not meaningful, because the two attributes represent
different things.
We can define new domains by using the create domain clause.

That is
create domain Dollars int;
create domain Pounds int;
These statements define the domains Dollars and Pounds to be integers. An attempt to assign a value of
type Dollars to a variable of type Pounds would result in a syntax error, although both have the same
underlying type, because they are different domains.
The check clause in SQL permits domains to be restricted in powerful ways. For example, suppose we are
creating a domain Studmarks and the condition is that a Studmarks value should not be more than 100.
We can specify this by

create domain Studmarks int
    constraint marktest check (value <= 100);

Complex check conditions can be useful when we want to ensure integrity of data.
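
The effect of such a check condition can be tried out with a small sketch. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, and because SQLite does not support create domain, the marktest condition is attached directly to a marks column instead of to a Studmarks domain. The two-column student table and the sample values are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: SQLite has no CREATE DOMAIN, so the Studmarks restriction is
# written as a named CHECK constraint on the column itself.
conn.execute(
    "CREATE TABLE student ("
    " stdid INTEGER PRIMARY KEY,"
    " marks INTEGER CONSTRAINT marktest CHECK (marks <= 100))"
)

conn.execute("INSERT INTO student VALUES (1, 95)")  # satisfies marktest

try:
    conn.execute("INSERT INTO student VALUES (2, 120)")  # violates marktest
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Only the valid row was stored.
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The second insert raises an integrity error, so the table never holds a marks value above 100.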

Referential integrity

Here we use foreign keys. Sometimes we wish to ensure that a value that appears in one table
for a given set of attributes also appears for a certain set of attributes in another table. This condition is
called referential integrity. We can illustrate it with an example.

Suppose a college stores the details of all its students in the student table, and the college library stores
the details of all its books in the books table.

Student (stdid, name, marks)
Books (bid, bname, author)

Suppose there is a facility for students to access and reserve books, and the college uses 2 more tables to
record the books that are reserved and accessed.

Reserved (stdid, bid, rdate)
Accessed (stdid, bid, adate)

Suppose we create the student and books tables like this.

create table student (
    stdid int,
    name char(10),
    marks int,
    primary key (stdid)
);

create table books (
    bid int,
    bname char(10),
    author char(10),
    primary key (bid)
);

Suppose the college has two conditions.
First, only students of the college are allowed to access and reserve books. In other words, the stdid
values in the reserved and accessed tables must also be present in the student table.
Second, students are allowed to access and reserve only those books that are present in the college
library. In other words, the bid values in the reserved and accessed tables must also be present in the
books table.
We can specify the above conditions by using a foreign key clause.

That is, the tables reserved and accessed are created by

create table reserved (
    stdid int,
    bid int,
    foreign key (stdid) references student (stdid),
    foreign key (bid) references books (bid)
);

create table accessed (
    stdid int,
    bid int,
    foreign key (stdid) references student (stdid),
    foreign key (bid) references books (bid)
);

This means that for any tuple inserted into the reserved table, the values of stdid and bid must be present
in the student and books tables respectively.
The same holds for any tuple inserted into the accessed table.
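
The behaviour of these foreign keys can be checked with a short sketch. It assumes Python's built-in sqlite3 module as the database, which enforces foreign keys only after the foreign_keys pragma is switched on. The helper try_reserve and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE student (stdid INT PRIMARY KEY, name CHAR(10), marks INT);
CREATE TABLE books (bid INT PRIMARY KEY, bname CHAR(10), author CHAR(10));
CREATE TABLE reserved (
    stdid INT,
    bid INT,
    FOREIGN KEY (stdid) REFERENCES student (stdid),
    FOREIGN KEY (bid) REFERENCES books (bid)
);
INSERT INTO student VALUES (100, 'Anil', 50);
INSERT INTO books VALUES (1, 'DBMS', 'Navathe');
""")


def try_reserve(stdid, bid):
    """Hypothetical helper: return True if the reservation row is accepted."""
    try:
        conn.execute("INSERT INTO reserved VALUES (?, ?)", (stdid, bid))
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_reserve(100, 1)            # student 100 and book 1 both exist
bad_student = try_reserve(999, 1)   # no student 999 in the student table
bad_book = try_reserve(100, 42)     # no book 42 in the books table
```

Only the first reservation is stored. The other two are rejected because the referenced stdid or bid is missing from the parent table.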
We can also create the tables reserved and accessed by specifying a constraint name for each of these
foreign keys. That is, another way of creating the tables is

create table reserved (
    stdid int,
    bid int,
    constraint st foreign key (stdid) references student (stdid),
    constraint bks foreign key (bid) references books (bid)
);

create table accessed (
    stdid int,
    bid int,
    constraint stud foreign key (stdid) references student (stdid),
    constraint bk1 foreign key (bid) references books (bid)
);

Here we have given names to these constraints.

So there are 2 foreign key constraints for the reserved table: st and bks.
There are 2 foreign key constraints for the accessed table: stud and bk1.

These facts can be represented by the following schemas, in which Reserved.Stdid and Accessed.Stdid
refer to Student.Stdid, and Reserved.Bid and Accessed.Bid refer to Books.Bid.

Student (Stdid, Name, Marks)
Books (Bid, Bname, Author)
Reserved (Stdid, Bid, Rdate)
Accessed (Stdid, Bid, Adate)

Other types of constraints are primary key, unique, not null and check constraints.
For example, consider the table student.

Student (stdid, branch, sem, rn, name, marks)

This table has 2 candidate keys: stdid and (branch, sem, rn). We declare one of them as the primary
key and the other as unique.
Suppose we also require that the name and marks of a student must not be null, and that the value of
marks must not be more than 100. We can ensure the first with not null and the second with a check clause.
We can create the table by

create table student (
    stdid int,
    branch char(2),
    sem int,
    rn int,
    name char(10) not null,
    marks int not null,
    primary key (stdid),
    unique (branch, sem, rn),
    check (marks <= 100)
);

OR
we can give a name to each of these constraints:

create table student (
    stdid int,
    branch char(2),
    sem int,
    rn int,
    name char(10) not null,
    marks int not null,
    constraint pk primary key (stdid),
    constraint cdk unique (branch, sem, rn),
    constraint chk check (marks <= 100)
);

If the student table is created in this way, we cannot insert two tuples that have the same stdid value,
since stdid is declared as the primary key.

Also, we cannot insert two tuples that have the same (branch, sem, rn) values, since these three
attributes together form another key, which is declared using the unique keyword.

We cannot insert a tuple whose marks value is greater than 100, since the check clause restricts
the marks values to be at most 100.

We have declared the marks and name fields to be not null. So for each tuple that is inserted into the
table there must be some value in the marks and name fields.
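
The four rules can be exercised with the sketch below. It assumes Python's built-in sqlite3 module as the database and creates the table with the named constraints pk, cdk and chk. The helper try_insert and the sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student (
    stdid INT,
    branch CHAR(2),
    sem INT,
    rn INT,
    name CHAR(10) NOT NULL,
    marks INT NOT NULL,
    CONSTRAINT pk PRIMARY KEY (stdid),
    CONSTRAINT cdk UNIQUE (branch, sem, rn),
    CONSTRAINT chk CHECK (marks <= 100)
)""")


def try_insert(row):
    """Hypothetical helper: report whether the DBMS accepts the row."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert((100, "CS", 3, 1, "Anil", 50)),    # valid first row
    try_insert((100, "CS", 3, 2, "Binil", 80)),   # same stdid: primary key
    try_insert((101, "CS", 3, 1, "Cinil", 70)),   # same (branch, sem, rn): unique
    try_insert((102, "CS", 3, 3, "Dinil", 120)),  # marks > 100: check
    try_insert((103, "CS", 3, 4, None, 80)),      # missing name: not null
]
```

Only the first row is accepted. Each of the other four rows breaks exactly one of the declared constraints.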

Other ways of specifying integrity constraints are triggers, assertions and functional dependencies.
These are explained in other sections.

Pitfalls in relational database design (ref: Navathe / Silberschatz)

Before we discuss normalization of databases, we look at the drawbacks of a poorly designed
database.
Some of the undesirable properties of a bad design are
Repetition of information
Inability to represent certain information
Problems in updating values
Lossy join decomposition
We can see an example. Suppose the information related to a college is stored as

College (dname, dhod, dphone, stdid, stdname, stdmarks)

College

Dname  Dhod  Dphone  Stdid  Stdname  Stdmarks
CS     Abc   23456   100    Ss1      70
CS     Abc   23456   101    Ss2      20
CS     Abc   23456   102    Ss3      45
EC     Bgh   78905   100    Ss7      67
EC     Bgh   78905   101    Ss8      55
AE     Mkl   34443   100    Ss2      68
CS     Abc   23456   103    Ss4      34
AE     Mkl   34443   101    Ss3      70

Suppose we want to add the details of a new student to the college table: the student (800, hjk, 50)
in the AE department.

In our design we need a tuple with values on all attributes of the college schema. Thus we must repeat
the dhod and dphone values of AE and add the tuple

AE, Mkl, 34443, 800, hjk, 50

In general, the dhod and dphone of a department must appear once for each student admitted to
that department.
This repetition of information is undesirable. It wastes space and it complicates updates.
Suppose the phone number of department CS changes from 23456 to 56789.
Under this design many tuples of the college relation need to be changed, so updates are costly.
When we perform the update, we must ensure that every tuple corresponding to the CS
department is updated. Otherwise the table will show 2 different phone numbers for CS.
From this we can say that this design of our table is bad.
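
The update problem can be reproduced in a few lines. The sketch below is only an illustration in Python: the college relation is held as a list of dictionaries with a subset of its columns, and the helper phones_of is an invented name.

```python
# A few tuples of the college relation (only the columns needed here).
college = [
    {"dname": "CS", "dphone": 23456, "stdid": 100},
    {"dname": "CS", "dphone": 23456, "stdid": 101},
    {"dname": "CS", "dphone": 23456, "stdid": 102},
    {"dname": "EC", "dphone": 78905, "stdid": 100},
]


def phones_of(rows, dname):
    """Hypothetical helper: the set of phone numbers stored for one department."""
    return {r["dphone"] for r in rows if r["dname"] == dname}


# Careless update: only the first CS tuple is changed.
for r in college:
    if r["dname"] == "CS":
        r["dphone"] = 56789
        break
after_partial_update = phones_of(college, "CS")   # two different numbers

# Correct update: every CS tuple is changed.
for r in college:
    if r["dname"] == "CS":
        r["dphone"] = 56789
after_full_update = phones_of(college, "CS")      # one number again
```

After the careless update the relation records two phone numbers for CS. Only when every CS tuple has been changed does it return to a single value.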

A department has a single phone number, so given a department name we can
uniquely identify the phone number.
A department has many students, so given a department name we cannot uniquely
determine the stdid. In other words, the functional dependency dname → dphone holds on the
college schema, but the functional dependency dname → stdid does not hold.
The fact that a department has a particular phone number and the fact that a department has a
particular student are independent, so these facts are best represented in separate tables. We will see that
we can use functional dependencies to specify formally when a database design is good.
Another problem with the college relation is that we cannot directly represent the information
about a department (dname, dhod, dphone) if there are no students in that department. This is
because tuples in the college relation require values for stdid, stdname and stdmarks.
One solution is to use null values, but null values are difficult to handle. If we do not
want to deal with null values, we can create the department information only when the first student is
admitted to that department. And if all students of that department leave, we have to delete all
information about that department. This situation is undesirable.

Other problems that can occur are update anomalies (problems in updates) and lossy join
decompositions.
For example, consider the student table
Student (stdid, branch, name, marks, hod, deptphoneno)

Student

Stdid  Branch  Name  Marks  Hod  Deptphoneno
100    Cs      Abc   60     Def  567890
101    Cs      Bcd   70     Def  567890
102    Ec      Sad   80     Ghj  123456
105    Ec      Abc   10     Ghj  123456

In this table there is repetition of information. There is one particular person as hod for each branch,
so if the details of all students are stored in this table and there are 100 students in each branch, the
hod's name is repeated 100 times. The department phone number is also repeated 100 times.
Suppose the hod of a particular branch changes. Then we have to update the hod field in every tuple
of that branch. If there are 100 tuples for the branch, all 100 have to be updated. The same is true of
deptphoneno: if we want to change the phone number of a particular department, it has to be changed
in all these tuples. These problems are called update anomalies.
Lossy join decomposition is another pitfall in relational database design. It is explained along with
fourth normal form.

Functional dependency (ref: Navathe)

This is a very important concept in relational database design. A functional dependency is a
constraint between 2 sets of attributes from the database. First we can see an example.

Consider the student table.

Student

Stdid  Sname  Marks  Rn  Branch  Sem  Hod  Grade
100    Anil   50     1   Cs      3    Abc  D
101    Binil  80     2   Cs      3    Abc  A
102    Cinil  70     3   Cs      3    Abc  B
103    Dinil  80     4   Cs      3    Abc  A

We consider the student table, and our assumptions are based on a real-world view of a student.
The candidate keys of the table are stdid and (branch, sem, rn). A key means that the value of the
key attribute (or attributes) is distinct for each tuple. For example, the stdid value is different for
each row of the student table. The same is true of the key (branch, sem, rn): the 3 values of these
attributes, taken together, are distinct for each row.

Stdid
100
101
102
103
108

Branch  Sem  Rn
cs      3    1
cs      3    2
cs      3    3
cs      5    1
cs      5    2
ec      3    1
ec      3    2
ec      3    3
ec      5    1
ec      5    2

We can see that the key values are distinct for each row.

If we write
Stdid → marks
this is called a functional dependency. That is, stdid functionally determines marks.

Suppose in the above table the values for the attributes are

Stdid  Marks
100    80
101    85
102    70
103    70
104    85
108    70
109    80

The stdid values are different for each row, since stdid is a candidate key. So for each stdid value
there is exactly one marks value. For example, if the stdid is 102, its corresponding marks value is
always 70 in this student table. This means that the value of the marks attribute of a tuple in student
depends on, or is determined by, the value of the stdid attribute. We say that the stdid value of a tuple
uniquely (functionally) determines its marks value, that there is a functional dependency from stdid
to marks, or that marks is functionally dependent on stdid. The attribute stdid is called the left hand
side of the FD and marks is called the right hand side.

We can write other functional dependencies as

Stdid → sname
Stdid → rn
Stdid → sem
Stdid → branch
Stdid → hod
Stdid → grade

We can also write them together as

Stdid → sname, marks, rn, branch, sem, hod, grade

These dependencies hold because stdid is a key.

We can also write

Branch, sem, rn → marks

We can write this because the left hand side is a key.

Branch  Sem  Rn  Marks
Cs      3    1   50
Cs      3    2   60
Cs      3    3   70
Cs      3    4   50
Ec      3    1   50
Ec      3    2   20
Ec      3    3   30

Looking at this table, we can say that

(branch, sem, rn) functionally determines marks.

We can also write

Branch, sem, rn → stdid
Branch, sem, rn → sname
Branch, sem, rn → marks
Branch, sem, rn → hod
Branch, sem, rn → grade

Or together
Branch, sem, rn → stdid, sname, marks, hod, grade

Since stdid and (branch, sem, rn) are both keys of student, we have these 2 functional dependencies.

Stdid → branch, sem, rn, sname, marks, hod, grade

Branch, sem, rn → stdid, sname, marks, hod, grade

If we look at the table again, we can find other functional dependencies.

For example

Stdid  Branch  Hod
100    cs      abc
101    cs      abc
103    cs      abc
104    cs      abc
106    ec      bcd
107    ec      bcd
105    cs      abc
108    ec      bcd

For each branch there is only one hod. That is, for each value of
branch there is a unique hod.
We can write this as
Branch → hod
Now take marks and grade. Suppose the grade is A for marks of 80 and above. Then
whenever the marks value is 80, the grade is A.
So for each value of marks there is a unique grade.


Stdid  Marks  Grade
100    50     D
101    80     A
102    85     A
103    50     D
104    60     C
105    75     B
106    60     C

So we can write
Marks → grade

So the following functional dependencies hold in the student relation.

Stdid → branch, sem, rn, sname, marks, hod, grade

Branch, sem, rn → stdid, sname, marks, hod, grade

Branch → hod

Marks → grade

In the student schema these functional dependencies are drawn on the attributes of

Student (Stdid, Sname, Marks, Rn, Branch, Sem, Hod, Grade)

[Figure: the student schema with the above functional dependencies drawn as arrows]

A functional dependency (FD), denoted by X → Y, between 2 sets of attributes X and Y that
are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The
constraint is that for any two tuples t1 and t2 in r that have t1[X] = t2[X], we must also have
t1[Y] = t2[Y].

This means that the values of the Y component of a tuple in r depend on, or are determined by, the
values of the X component. In other words, the values of the X component of a tuple uniquely
(functionally) determine the values of the Y component.
We say that there is a functional dependency from X to Y, or that Y is functionally
dependent on X. The abbreviation for functional dependency is FD. The set of attributes X is called the
left hand side of the FD, and Y is called the right hand side of the FD.

A functional dependency is a property of the relation schema R, not of a particular relation state r
of R. So an FD cannot be automatically inferred from a given relation state. It must be explicitly defined
by someone who knows the meaning or semantics of the attributes of R.
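
The tuple condition in the definition can be written as a short test on one relation state. The sketch below is an illustration in Python, with the invented function name fd_holds and a few columns of the student table as sample data. In line with the remark above, such a test can only show that an FD is violated by a state. A result of True does not prove that the FD holds on the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs, else True."""
    seen = {}  # maps each X value to the Y value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


student = [
    {"stdid": 100, "marks": 50, "branch": "Cs", "hod": "Abc", "grade": "D"},
    {"stdid": 101, "marks": 80, "branch": "Cs", "hod": "Abc", "grade": "A"},
    {"stdid": 102, "marks": 70, "branch": "Cs", "hod": "Abc", "grade": "B"},
    {"stdid": 103, "marks": 80, "branch": "Cs", "hod": "Abc", "grade": "A"},
]

stdid_to_marks = fd_holds(student, ["stdid"], ["marks"])
marks_to_grade = fd_holds(student, ["marks"], ["grade"])
branch_to_hod = fd_holds(student, ["branch"], ["hod"])
marks_to_stdid = fd_holds(student, ["marks"], ["stdid"])  # 80 appears with 101 and 103
```

In this state the first three dependencies are not violated, while marks → stdid is violated by the two students who scored 80.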

Trivial and non-trivial functional dependencies

A functional dependency X → Y is a trivial FD if Y ⊆ X.

X → Y is a non-trivial FD when Y is not a subset of X.

For example

A  B  C
Q  L  M
E  J  N
R  B  Y
Q  L  J
T  B  D
U  G  P
R  B  Y

The FDs are
A → B
C → A
These two are non-trivial functional dependencies.
We can also write
A, B → B
A, B, C → A, C
These are trivial functional dependencies because the RHS is a subset of the LHS.
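
The subset test for triviality is a one-line check. The sketch below is an illustration in Python. The function name is_trivial is invented, and each attribute set is written as a string of single-letter attribute names.

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)


checks = {
    "A -> B": is_trivial("A", "B"),
    "C -> A": is_trivial("C", "A"),
    "AB -> B": is_trivial("AB", "B"),
    "ABC -> AC": is_trivial("ABC", "AC"),
}
```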

Inference rules for functional dependencies

• Armstrong's axioms are the basic inference rules.
• Armstrong's axioms are used to infer FDs on a relational database.
• An inference rule is a type of assertion. It can be applied to a set of FDs to derive other FDs.
• Using the inference rules, we can derive additional functional dependencies from the initial set F.
• The set of all such dependencies is called the closure of F and is denoted by F+.

Armstrong's inference rules

The following are the well-known inference rules for FDs.

1. Reflexive rule
If Y ⊆ X, then X → Y (if Y is a subset of X, then X determines Y)

2. Augmentation rule
If X → Y, we can infer XZ → YZ for any Z

3. Transitive rule
If X → Y and Y → Z, we can infer X → Z

4. Decomposition rule
If X → YZ, we can infer X → Y and X → Z

5. Union rule
If X → Y and X → Z, we can infer X → YZ

6. Pseudotransitive rule
If X → Y and WY → Z, we can infer WX → Z
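
Rules 4 to 6 need not be taken on trust, since each follows from rules 1 to 3. The derivations below are a standard sketch, written in LaTeX, using only the reflexive, augmentation and transitive rules as stated above.

```latex
\begin{align*}
\textbf{Decomposition:}\quad
  & X \to YZ  && \text{(given)} \\
  & YZ \to Y  && \text{(reflexive, since } Y \subseteq YZ\text{)} \\
  & X \to Y   && \text{(transitive; } X \to Z \text{ follows in the same way)} \\[6pt]
\textbf{Union:}\quad
  & X \to Y,\; X \to Z && \text{(given)} \\
  & X \to XY  && \text{(augment } X \to Y \text{ with } X\text{, using } XX = X\text{)} \\
  & XY \to YZ && \text{(augment } X \to Z \text{ with } Y\text{)} \\
  & X \to YZ  && \text{(transitive)} \\[6pt]
\textbf{Pseudotransitive:}\quad
  & X \to Y,\; WY \to Z && \text{(given)} \\
  & WX \to WY && \text{(augment } X \to Y \text{ with } W\text{)} \\
  & WX \to Z  && \text{(transitive)}
\end{align*}
```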

Closure of an Attribute Set

The set of all attributes that can be functionally determined from an attribute or set of attributes
is called the closure of that attribute or set of attributes.
The closure of an attribute set {X} is denoted by {X}+.
The following steps are followed to find the closure of an attribute set.
Step 1: Start with the attributes of the set itself, since every attribute set determines itself.
Step 2: For each functional dependency whose LHS is contained in the closure found so far, add the
attributes on its RHS.
Step 3: Using the newly added attributes, check whether other attributes can be derived
from the remaining functional dependencies. Repeat this process until no more attributes
can be added to the closure.

Ex 1: Consider relation R(A,B,C,D,E,F,G) with the functional dependencies

F = {A → BC, BC → DE, D → F, CF → G}

Closure of attribute A
A+ = {A}
   = {A,B,C}          (using FD A → BC)
   = {A,B,C,D,E}      (using FD BC → DE)
   = {A,B,C,D,E,F}    (using FD D → F)
   = {A,B,C,D,E,F,G}  (using FD CF → G)
A+ = {A,B,C,D,E,F,G} contains every attribute of R, so A is a candidate key.

Ex 2:
R(A,B,C) with the FDs A → B, B → C, AB → C

A+ = {A,B,C}, so A is a candidate key
B+ = {B,C}
C+ = {C}, since C does not determine any other attribute under the above FDs
AB+ = {A,B,C}, so AB is a super key
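
The closure steps can be sketched as a short loop. This is an illustrative Python version with the invented function name closure. Each FD is written as a pair (LHS, RHS) of strings of single-letter attribute names, and the two FD sets are the ones from Ex 1 and Ex 2.

```python
def closure(attrs, fds):
    """Attribute closure: repeatedly apply every FD whose LHS is already covered."""
    result = set(attrs)          # Step 1: the set determines itself
    changed = True
    while changed:               # Step 3: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)   # Step 2: add the RHS attributes
                changed = True
    return result


fds_ex1 = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]
fds_ex2 = [("A", "B"), ("B", "C"), ("AB", "C")]

a_plus_ex1 = closure("A", fds_ex1)
a_plus_ex2 = closure("A", fds_ex2)
b_plus_ex2 = closure("B", fds_ex2)
c_plus_ex2 = closure("C", fds_ex2)
ab_plus_ex2 = closure("AB", fds_ex2)
```

The results agree with the closures worked out by hand in the two examples.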

FINDING KEYS USING CLOSURE

Super Key
• If the closure of an attribute set contains all the attributes of the relation, then that
attribute set is called a super key of that relation.
• Thus we can say:
The closure of a super key is the entire relation schema.

For example, in Ex 2 above A is a super key because A+ = {A,B,C}. BC is not a super key, because
(BC)+ = {B,C} does not contain A. An attribute set is a super key only if its closure contains all the attributes.

Candidate Key

If an attribute set is a super key and no proper subset of it is also a super key, then
that attribute set is called a candidate key of that relation. A candidate key is a minimal super key.

For example, A is a super key, and being a single attribute it has no proper subset that is a super key.
Thus attribute A is a candidate key for that relation.

AC is a super key, but its proper subset A is also a super key, so AC is not a candidate key.
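
For a small relation the candidate keys can be found by trying attribute sets in order of increasing size. The sketch below is a brute-force illustration in Python, practical only for a handful of attributes. The function names closure, is_super_key and candidate_keys are invented, and the FDs are those of Ex 2.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (LHS, RHS) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_super_key(attrs, relation, fds):
    """A super key is a set whose closure is the whole relation schema."""
    return closure(attrs, fds) == set(relation)


def candidate_keys(relation, fds):
    """Super keys that contain no smaller super key (tried smallest first)."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_super_key(candidate, relation, fds):
                keys.append(candidate)
    return keys


fds = [("A", "B"), ("B", "C"), ("AB", "C")]
keys = candidate_keys("ABC", fds)
ac_is_super_key = is_super_key("AC", "ABC", fds)
bc_is_super_key = is_super_key("BC", "ABC", fds)
```

For R(A,B,C) the only candidate key found is A. AC is a super key but is skipped as non-minimal, and BC is not a super key at all.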

Equivalence of Functional Dependencies

Let F and G be two sets of functional dependencies.

1. If all FDs of G can be derived from the FDs that are present in F, we say that F covers G.
2. If all FDs of F can be derived from the FDs that are present in G, we say that G
covers F.
If both 1 and 2 are satisfied, then F and G are equivalent (F = G).
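
Whether one set covers another can be decided with attribute closures: F covers G when, for every FD X → Y in G, Y is contained in the closure of X under F. The sketch below illustrates this in Python. The function names and the three sample FD sets F, G and H are invented for the example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (LHS, RHS) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """F covers G if every FD of G can be derived from F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """F and G are equivalent if each covers the other."""
    return covers(f, g) and covers(g, f)


F = [("A", "B"), ("B", "C")]
G = [("A", "B"), ("B", "C"), ("AB", "C")]   # AB -> C is derivable from F
H = [("A", "B")]                            # B -> C cannot be derived from H

f_equals_g = equivalent(F, G)
f_covers_h = covers(F, H)
h_covers_f = covers(H, F)
```

Here F and G are equivalent. F covers H, but H does not cover F, so F and H are not equivalent.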


Normal forms (ref: DBMS by Navathe)

The normalization process was first proposed by Codd. It takes a relation schema
or a set of tables through a series of tests to check whether the database satisfies a certain normal
form. Codd proposed 3 normal forms:

First normal form
Second normal form and
Third normal form

Later a stronger definition of the third normal form was proposed. It is called

Boyce Codd normal form

All these normal forms are based on functional dependencies.

Later the fourth and fifth normal forms were proposed. They are based on multivalued dependencies
and join dependencies respectively.
We have already studied some drawbacks or pitfalls in relational database design. The main
drawbacks are repetition of information and inability to represent certain information. The purpose of
normalization is to analyze the given relation schemas or tables on the basis of their functional
dependencies and candidate keys, and to remove these drawbacks from the database. If a relation schema
or table does not satisfy the normal form tests, it is decomposed into new relations that satisfy the
normal form tests.
We already know the concepts of candidate key and primary key of a table.

Prime attribute
An attribute of relation schema R is called a prime attribute if it is a member of some candidate key
of R. An attribute is called non-prime if it is not a prime attribute, that is, if it is not a member of any
candidate key.
For example

Student (stdid, branch, sem, rn, sname, marks)

'Branch' is a prime attribute because it is a member of the candidate key (branch, sem, rn).
Likewise 'sem' and 'rn' are prime attributes.
'Stdid' is a prime attribute because it is itself a candidate key.
'Marks' is not a prime attribute.
Also 'sname' is not a prime attribute.

First normal form (1NF)

It is defined to disallow multivalued attributes, composite attributes and their combinations.

It states that the domain of an attribute must include only atomic (indivisible) values and that the value
of any attribute in a tuple must be a single value from the domain of that attribute.
So first normal form disallows having a set of values, a tuple of values or a combination of both as an
attribute value for a single tuple.
We can explain this using an example.

Consider the student relation.

Student (stdid, sname, saddress, phoneno)

Student

Stdid  Sname  Saddress              Phoneno
100    Abc    No. 20, KTM, Kerala   567890
102    Bcd    No. 35, EKM, Kerala   564476, 234789
105    Def    No. 41, KTM, Kerala   123245, 367840, 300898

In this relation there are 3 tuples. There is a composite attribute 'saddress' having
three fields: house no, city and state.
There is also a multivalued attribute 'phoneno'. Student 102 has 2 phones and student 105
has 3 phones.
According to 1NF, multivalued and composite attributes are not allowed.
We have to find a way to normalize this schema to first normal form.
First we solve the problem caused by the multivalued attribute, here phoneno.
We remove the attribute 'phoneno' and place it in a separate table or relation along with the primary
key of student, that is 'stdid'.

Then we get
Student1 (stdid, sname, saddress)
Std_phone (stdid, phoneno)

Here the
primary key of student1 is 'stdid' and
primary key of std_phone is (stdid, phoneno)

Student1

Stdid  Sname  Saddress
100    Abc    No. 20, KTM, Kerala
102    Bcd    No. 35, EKM, Kerala
105    Def    No. 41, KTM, Kerala

Std_phone

Stdid  Phoneno
100    567890
102    564476
102    234789
105    123245
105    367840
105    300898

Next we have to deal with the composite attribute. We can expand 'saddress' into 3 attributes,
'add_house', 'add_city' and 'add_state'. Then the relations will be

Student1A

Stdid  Sname  Add_house  Add_city  Add_state
100    Abc    No. 20     KTM       Kerala
102    Bcd    No. 35     EKM       Kerala
105    Def    No. 41     KTM       Kerala

Std_phone

Stdid  Phoneno
100    567890
102    564476
102    234789
105    123245
105    367840
105    300898

We can see that student1A and std_phone are in first normal form (1NF).
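
The two normalization steps can be sketched as a small transformation. The Python below is only an illustration: the unnormalized student tuples are held as dictionaries whose phoneno value is a list, the function name to_1nf is invented, and the address is assumed to always have exactly three comma-separated parts.

```python
# Unnormalized student relation: phoneno is multivalued, saddress is composite.
student = [
    {"stdid": 100, "sname": "Abc", "saddress": "No. 20, KTM, Kerala",
     "phoneno": [567890]},
    {"stdid": 102, "sname": "Bcd", "saddress": "No. 35, EKM, Kerala",
     "phoneno": [564476, 234789]},
    {"stdid": 105, "sname": "Def", "saddress": "No. 41, KTM, Kerala",
     "phoneno": [123245, 367840, 300898]},
]


def to_1nf(rows):
    """Split the rows into Student1A and Std_phone tuples."""
    student1a, std_phone = [], []
    for row in rows:
        # Composite attribute: assume "house, city, state" (three parts).
        house, city, state = [part.strip() for part in row["saddress"].split(",")]
        student1a.append((row["stdid"], row["sname"], house, city, state))
        # Multivalued attribute: one (stdid, phoneno) tuple per phone.
        for phone in row["phoneno"]:
            std_phone.append((row["stdid"], phone))
    return student1a, std_phone


student1a, std_phone = to_1nf(student)
```

Every value in the two resulting lists is atomic, and the pair (stdid, phoneno) identifies each std_phone tuple.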

Second normal form (2NF)

A relation schema or a table R is in second normal form if it is in 1NF and every non prime attribute A in R
is fully functionally dependent on the primary key of R.

Before seeing second normal form, we have to learn some definitions

Prime Attribute: An attribute which is a part of some candidate key.

Non Prime Attribute: An attribute which is not a part of any candidate key.

Partial and full functional dependencies

A functional dependency X -> Y is a full functional dependency if removal of any attribute A from X
(that is, A is an element of X) means that the dependency does not hold any more.

A functional dependency X -> Y is a partial functional dependency if some attribute A can be removed
from X and the dependency still holds.

For example
Student (stdid, branch, sem, rn, name, marks, hod)

We know that the following FD’s are correct for this table.

FD1 -- stdid -> branch, sem, rn, name, marks, hod

FD2 -- branch, sem, rn -> stdid, name, marks

Also
FD3 -- branch, sem, rn -> hod

In FD2, if we remove the attribute sem from the LHS or X part, we can see that
(branch, rn) does not functionally determine stdid, name, marks. The same is true if we remove branch
or rn. So FD2 is called a full functional dependency.
In FD3, if we remove the attributes sem and rn, we can see that the FD still holds.
That is, branch -> hod is also a functional dependency. So FD3 is a partial functional dependency.
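
The test for a full dependency can be written down using attribute closure. The sketch below is a minimal illustration, not a standard library routine: the helper names fd, closure and is_full_dependency are hypothetical, and branch -> hod is listed as a separate dependency so that the closure can find it.

```python
def fd(lhs, rhs):
    """Build a dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_dependency(lhs, rhs, fds):
    """X -> Y is full if dropping any single attribute of X loses Y."""
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

# Dependencies of Student (stdid, branch, sem, rn, name, marks, hod).
fds = [
    fd("stdid", "branch sem rn name marks hod"),
    fd("branch sem rn", "stdid name marks"),
    fd("branch", "hod"),
]

key = frozenset({"branch", "sem", "rn"})
fd2_is_full = is_full_dependency(key, frozenset({"stdid", "name", "marks"}), fds)
fd3_is_full = is_full_dependency(key, frozenset({"hod"}), fds)
print("FD2 full:", fd2_is_full)
print("FD3 full:", fd3_is_full)
```

FD2 comes out as full and FD3 as partial, because hod is still determined after sem or rn is dropped from the left-hand side.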

For example
Student1 (stdid,branch, sem, rn, name, hod, marks, grade )

[Dependency diagram showing FD1, FD2, FD3 and FD4 on Student1]

We can see that the student1 relation is not in second normal form because of FD3, that is
branch -> hod

It violates 2NF because the non prime attribute hod is partially dependent on the candidate key (branch,
sem, rn).
This is a partial functional dependency because in
branch, sem, rn -> hod
the FD still holds if we remove the attributes sem and rn.

Other non prime attributes are name, marks, grade. They are fully functionally dependent on the keys.
stdid -> name
branch, sem, rn -> name
stdid -> marks
branch, sem, rn -> marks
stdid -> grade
branch, sem, rn -> grade

marks -> grade does not violate 2NF, because marks is not a prime attribute, so this is not a partial dependency on a key.

As a next step we have to normalize student1 to 2NF.

We decompose it by removing the attribute hod, which forms a partial dependency, from student1 and
putting it in another relation together with branch.

That is, we decompose student1 into student1A and student1B.

Student1A

Stdid Branch Sem Rn Name Marks Grade

[Dependency diagram showing FD1, FD2 and FD3 on Student1A]

Student1B

Branch Hod

So we have decomposed student1 into

student1A (stdid, branch, sem, rn, name, marks, grade) and
student1B (branch, hod)

Both relations are in 2NF.
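
The 2NF test can also be checked mechanically: a non prime attribute is partially dependent if it lies in the closure of a proper subset of some candidate key. The sketch below is one possible way to do this. The helper names are hypothetical, and marks -> grade is assumed as the fourth dependency of student1, taken from the 3NF example in these notes.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build a dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partially_dependent(keys, attributes, fds):
    """Non prime attributes determined by a proper subset of some candidate key."""
    prime = set().union(*keys)
    non_prime = set(attributes) - prime
    found = set()
    for key in keys:
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                found |= closure(subset, fds) & non_prime
    return found

keys = [frozenset({"stdid"}), frozenset({"branch", "sem", "rn"})]

# student1 before decomposition (marks -> grade is an assumption, see above).
student1 = "stdid branch sem rn name hod marks grade".split()
student1_fds = [
    fd("stdid", "branch sem rn name hod marks grade"),
    fd("branch sem rn", "stdid name marks grade"),
    fd("branch", "hod"),
    fd("marks", "grade"),
]
before = partially_dependent(keys, student1, student1_fds)

# student1A after hod has been moved to student1B (branch, hod).
student1a = "stdid branch sem rn name marks grade".split()
student1a_fds = [
    fd("stdid", "branch sem rn name marks grade"),
    fd("branch sem rn", "stdid name marks grade"),
    fd("marks", "grade"),
]
after = partially_dependent(keys, student1a, student1a_fds)

print("partial in student1:", before)
print("partial in student1A:", after)
```

Only hod is reported for student1, and nothing is reported for student1A, which agrees with the decomposition above.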

Third normal form (3NF)

3NF is based on the concept of transitive dependency. Transitive dependencies of non prime attributes
on a key are not allowed in 3NF, and the relation should already be in 2NF.
A functional dependency X -> Z in a relation R is a transitive dependency if there is a set of
attributes Y such that X -> Y and Y -> Z both hold, where Y is neither a candidate key nor a subset of
any key of R.
We can see this by an example.
Student3

Stdid Branch Sem Rn Name Marks Grade

We have shown 3 FD’s here. That is

Fd1 Stdid -> Marks

Fd2 Marks -> Grade

Fd3 Stdid -> Grade

We can see that marks is not a prime attribute of student3.

Stdid -> grade is a transitive dependency because of Fd1 and Fd2.

This is not allowed in 3NF.

A relation R is said to be in 3NF if R is in 2NF and no non prime attribute of R is transitively
dependent on the key of R.
The above relation schema student3 is in 2NF, since no partial dependency on a key exists. But
it is not in 3NF because of the transitive dependency stdid -> grade via ‘marks’.
We can normalize student3 by decomposing it into two 3NF relation schemas,
Student3A and Student3B, as follows.

Student3A (stdid, branch, sem, rn, name, marks)

Student3B (marks, grade)


Student3
Stdid Branch Sem Rn Name Marks Grade

Student3A

Stdid Branch Sem Rn Name Marks

Student3B

Marks Grade

We can see that this is in 3NF.

Example 2:

Emp_dept

Ename Ssn Bdate Address Dnumber Dname Dmgrssn


We can see that the above schema is in 2NF, but it is not in 3NF because of transitive dependencies on the key ssn.

ssn -> dmgrssn holds through dnumber. Also

ssn -> dname holds through dnumber.

We can decompose this into

ED1
Ename Ssn Bdate Address Dnumber

ED2

Dnumber Dname Dmgrssn

We can see that ED1 and ED2 are in 3NF.

General definitions of second and third normal forms

General definition of second normal form

A relation schema R is in 2NF if no non prime attribute A in R is partially dependent on
any key of R. We can see an example.

LOTS
Propertyid Countyname Lot Area Price Taxrate

[Dependency diagram on LOTS: Fd1 and Fd2 have keys on the left-hand side, Fd3 is countyname -> taxrate, Fd4 is area -> price]

We can see that the LOTS schema violates the general definition of 2NF because ‘tax rate’ is
partially dependent on the candidate key (county name, lot) due to FD3.
To normalize LOTS into 2NF, we decompose it into 2 relations, Lots1 and Lots2. We construct
Lots1 by removing the attribute tax rate that violates 2NF and placing it with county name (the LHS of
FD3 that causes the partial dependency) into another relation Lots2. Both Lots1 and Lots2 are in 2NF. We
can see that FD4 does not violate 2NF.

LOTS1

Propertyid Countyname Lot Area Price

[Dependency diagram showing Fd1, Fd2 and Fd4 on LOTS1]

LOTS2

Countyname Taxrate

[Dependency diagram showing Fd3 on LOTS2]

The relations LOTS1 and LOTS2 are in second normal form.

General definition of third normal form (3NF)

A relation schema R is in 3NF if, whenever a non-trivial functional dependency
X -> A holds in R, either
a) X is a super key of R
OR
b) A is a prime attribute of R.

If at least one of these conditions holds for every such dependency, we can say that the relation schema is in 3NF.

Using this we can directly check whether a relation schema is in 3NF.

Consider the LOTS relation.

LOTS
Propertyid Countyname Lot Area Price Taxrate

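[Dependency diagram on LOTS: Fd1 and Fd2 have keys on the left-hand side, Fd3 is countyname -> taxrate, Fd4 is area -> price]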

According to this, LOTS is not in 3NF, because FD3 and FD4 violate the conditions.

We can see that FD1 and FD2 satisfy the 3NF conditions.

But in FD3
countyname -> taxrate
county name by itself is not a super key and tax rate is not a prime attribute.

Also in FD4
area -> price
area is not a super key and price is not a prime attribute.

So LOTS is not in 3NF.
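
The two conditions of the general definition translate directly into a small check. The sketch below is illustrative only: the function name violations_3nf is hypothetical, the candidate keys are passed in rather than computed, and FD1 and FD2 are assumed to be the two key dependencies (propertyid and (countyname, lot) each determine all other attributes), since their diagram is not reproduced here.

```python
def fd(lhs, rhs):
    """Build a dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def violations_3nf(fds, candidate_keys):
    """Return (X, A) pairs where X is not a super key and A is not prime.

    Only the listed dependencies are checked, not every implied one.
    """
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = any(key <= lhs for key in candidate_keys)
        for a in sorted(rhs - lhs):  # skip the trivial part of the dependency
            if not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad

lots_keys = [frozenset({"propertyid"}), frozenset({"countyname", "lot"})]
lots_fds = [
    fd("propertyid", "countyname lot area price taxrate"),   # FD1 (assumed)
    fd("countyname lot", "propertyid area price taxrate"),   # FD2 (assumed)
    fd("countyname", "taxrate"),                             # FD3
    fd("area", "price"),                                     # FD4
]

lots_violations = violations_3nf(lots_fds, lots_keys)
for lhs, a in lots_violations:
    print(sorted(lhs), "->", a)
```

The check reports exactly FD3 and FD4, the same two dependencies identified in the analysis above.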

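To normalize LOTS we decompose it into LOTS2, LOTS1A and LOTS1B.

We construct LOTS2 by removing the attribute taxrate, which violates 3NF, and placing it with countyname.
We construct LOTS1B by removing the attribute price, which also violates 3NF, and placing it with area. The remaining attributes form LOTS1A.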

LOTS2

Countyname Taxrate

[Dependency diagram showing Fd3 on LOTS2]

LOTS1A

Propertyid Countyname Lot Area

[Dependency diagram showing Fd1 and Fd2 on LOTS1A]

LOTS1B

Area Price

[Dependency diagram showing Fd4 on LOTS1B]

We can see that all the above relations LOTS2, LOTS1A, LOTS1B are in 3NF.

Equivalently, a relation schema R is in 3NF if every non prime attribute of R meets the following conditions.
It is fully functionally dependent on every key of R.
It is non transitively dependent on every key of R.

Boyce Codd Normal form (BCNF)

It was first proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. This is
because every relation in BCNF is also in 3NF. However a relation in 3NF may not be in BCNF.
A relation schema R is in BCNF if, whenever a non-trivial functional dependency X -> A
holds in R, X is a superkey of R. The only difference between BCNF and 3NF is that
condition (b) of 3NF (which allows A to be prime) is absent from BCNF.


Suppose we have a table Lots1A

Lots1A

Propertyid Countyname Lot Area

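[Dependency diagram on Lots1A: Fd1 and Fd2 have keys on the left-hand side, Fd5 is area -> countyname]

Here we can see that the relation Lots1A is in 3NF, but it is not in BCNF.
Fd5 violates BCNF because area is not a superkey. It is allowed in 3NF only because countyname is a
prime attribute. Fd1 and Fd2 satisfy BCNF because their LHS are super keys.
So we remove the attribute countyname and place it with area in another relation.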

Lots1AX

Propertyid Area Lot

Lots1AY

Area Countyname

These relations are in BCNF.
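
The BCNF condition can be tested with attribute closure: the left-hand side of every non-trivial dependency must determine all attributes of the relation. The sketch below is a minimal illustration with hypothetical helper names. Fd1 and Fd2 are assumed to be the key dependencies of Lots1A, and the dependency listed for Lots1AX is the assumed projection of Fd1.

```python
def fd(lhs, rhs):
    """Build a dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Listed non-trivial dependencies whose left-hand side is not a super key."""
    attributes = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attributes]

lots1a = "propertyid countyname lot area".split()
lots1a_fds = [
    fd("propertyid", "countyname lot area"),   # Fd1 (assumed)
    fd("countyname lot", "propertyid area"),   # Fd2 (assumed)
    fd("area", "countyname"),                  # Fd5
]
violations = bcnf_violations(lots1a, lots1a_fds)

# The two relations after decomposition.
lots1ax_ok = bcnf_violations("propertyid area lot".split(), [fd("propertyid", "area lot")])
lots1ay_ok = bcnf_violations("area countyname".split(), [fd("area", "countyname")])

print("Lots1A violations:", violations)
print("Lots1AX violations:", lots1ax_ok)
print("Lots1AY violations:", lots1ay_ok)
```

Only Fd5 is reported for Lots1A, and neither decomposed relation has a violation.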

Every relation in BCNF is also in 3NF. Every relation in 3NF may not necessarily be in BCNF.

For example
R
A B C

[Dependency diagram showing Fd1 and Fd2 on R]

Here the relation R is in 3NF. But we can see that it is not in BCNF, because C, the left-hand side of one of the dependencies, is not a super key of R.

Exercise:

1. Consider the relation R = { A, B, C, D, E, F, G, H, I, J } and the set of functional dependencies

A, B -> C
A -> D, E
B -> F
F -> G, H
D -> I, J

What is the key of R?
Decompose R into 2NF, then 3NF relations.

Answer

A B C D E F G H I J

Fd1: A, B -> C
Fd2: A -> D, E
Fd3: B -> F
Fd4: F -> G, H
Fd5: D -> I, J

From the figure, the key of R is (A, B).
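
The key can be confirmed by brute force: try attribute sets in increasing size and keep the minimal ones whose closure is the whole of R. This is a sketch for small schemas only, and the helper names are hypothetical.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build a dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full schema."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = frozenset(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys

r = "A B C D E F G H I J".split()
r_fds = [
    fd("A B", "C"),
    fd("A", "D E"),
    fd("B", "F"),
    fd("F", "G H"),
    fd("D", "I J"),
]

r_keys = candidate_keys(r, r_fds)
print([sorted(key) for key in r_keys])
```

A and B never appear on the right-hand side of any dependency, so every key must contain both, and the search finds (A, B) as the only key.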

This is not in 2NF because Fd2 and Fd3 are partial functional dependencies on the key (A, B). So we
remove attributes D, E, F. But we can see
A -> D
D -> I
D -> J
So we have to remove I, J along with D.
B -> F
F -> G
F -> H
So we have to remove G, H along with F.

So we get the following relations in 2NF.

R1 (A, B, C) with Fd1

R2 (A, D, E, I, J) with Fd2 and Fd5

R3 (B, F, G, H) with Fd3 and Fd4

The above relations R1, R2, R3 are in 1NF and have no partial functional dependencies, so they are in 2NF.

DECOMPOSITION TO 3NF

We can take each of R1, R2 and R3 and analyse them

R1 (A, B, C) with Fd1

R1 is in 3NF because in Fd1 (A, B -> C), (A, B) is a super key.


R2 (A, D, E, I, J) with Fd2 and Fd5

R2 is not in 3NF.

Fd2 (A -> D, E) satisfies 3NF because A is a super key of R2.

Fd5 (D -> I, J) violates 3NF because D is not a super key and I, J are not prime attributes.

So we remove I and J from R2.


We decompose R2 as

R2A (A, D, E) with Fd2

R2B (D, I, J) with Fd5

R2A and R2B are in 3NF.


Consider R3

R3 (B, F, G, H) with Fd3 and Fd4

We can see Fd3 satisfies 3NF because B is a super key of R3.

Fd4 violates 3NF because F is not a super key and G, H are not prime attributes.
We decompose R3 into 2 relations, R3A and R3B.

R3A (B, F) with Fd3

R3B (F, G, H) with Fd4

R3A and R3B are in 3NF.

So we get the final set of relations as

R1 (A, B, C) with Fd1
R2A (A, D, E) with Fd2
R2B (D, I, J) with Fd5
R3A (B, F) with Fd3
R3B (F, G, H) with Fd4

Numerical Questions:

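1. Consider a relation R(A, B, C, D) with the following functional dependencies:

   A -> B
   BC -> D

Determine the minimal cover of the functional dependencies.

Solution: The given set is already a minimal cover:

   A -> B
   BC -> D

Every right-hand side is a single attribute. Neither B nor C can be dropped from BC -> D, because B alone
does not determine D and C alone does not determine D. Neither dependency follows from the other.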
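
A minimal cover can be cross-checked mechanically with the usual three steps: split right-hand sides, drop unnecessary left-hand attributes, then drop redundant dependencies. The sketch below is one way to write this, with hypothetical helper names, applied to the dependencies of this question.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: give every dependency a single attribute on the right-hand side.
    cover = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: drop left-hand attributes that are not needed.
    reduced = []
    for lhs, rhs in cover:
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
        reduced.append((lhs, rhs))
    # Step 3: drop dependencies that are implied by the remaining ones.
    result = list(dict.fromkeys(reduced))
    for dep in list(result):
        rest = [d for d in result if d != dep]
        if dep[1] <= closure(dep[0], rest):
            result = rest
    return result

given = [({"A"}, {"B"}), ({"B", "C"}, {"D"})]
cover = minimal_cover(given)
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

Running the sketch prints the dependencies that survive all three steps, which can be compared with a hand-derived cover.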

2. Given a relation R(A, B, C, D) with the following functional dependencies:

   AB -> C
   C -> D

Determine if the relation R is in Second Normal Form (2NF) and Third Normal Form (3NF).

Solution: The only candidate key is AB. R is in 2NF, because no non prime attribute depends on a part of AB.
R is not in 3NF, because D is transitively dependent on the candidate key AB through C (AB -> C and C -> D).
To bring R into 3NF, it needs to be decomposed into two relations: R1(A, B, C) and R2(C, D).

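3. Consider a relation R(A, B, C, D) with the following functional dependencies:

   A -> B
   BC -> D

Decompose the relation R into Boyce-Codd Normal Form (BCNF) while preserving dependencies. Show
the resulting decomposed relations.

Solution: The only candidate key of R is AC, since the closure of AC is {A, B, C, D} and neither A nor C
alone determines all attributes. R is therefore not in BCNF: in A -> B the left-hand side A is not a superkey,
and in BC -> D the left-hand side BC is not a superkey.
A dependency-preserving BCNF decomposition is

R1(A, B) with A -> B
R2(B, C, D) with BC -> D
R3(A, C), which holds the key AC and keeps the join lossless

Each relation is in BCNF, because every non-trivial dependency in it has a key on its left-hand side.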
