BCS403 DBMS Module2 Notes

Database Management System (BCS403) Module 2: Relational Model
Module 2
Module 2: Relational Model: Relational Model Concepts, Relational Model Constraints and
relational database schemas, Update operations, transactions, and dealing with constraint
violations. Relational Algebra: Unary and Binary relational operations, additional relational
operations (aggregate, grouping, etc.) Examples of Queries in relational algebra. Mapping
Conceptual Design into a Logical Design: Relational Database Design using ER-to-Relational
mapping.
Textbook:
1. Fundamentals of Database Systems, Ramez Elmasri and Shamkant B. Navathe, 7th Edition,
2017, Pearson.
2. Database management systems, Ramakrishnan, and Gehrke, 3rd Edition, 2014, McGraw Hill.
1. Relational Model Concepts

 The relational model represents the database as a collection of relations.
Informally, each relation resembles a table of values or, to some extent, a flat file of records
 A relation is thought of as a table of values, each row in the table represents a collection
of related data values.
 A row represents a fact that typically corresponds to a real-world entity or relationship.
The table name and column names are used to help to interpret the meaning of the values
in each row.
 In the formal relational model terminology, a row
 A tuple, a column header
 An attribute, and the table
 A relation. The data type describing the types of values that can appear in each column is
represented by a domain of possible values.
Domains, Attributes, Tuples, and Relations

 A domain D is a set of atomic values. By atomic means each value in the domain is
indivisible in formal relational model. A common method of specifying a domain is to
specify a data type from which the data values forming the domain are drawn.
 Some examples of domains follow:
USA_phone_number: string of digits of length ten
Department of CSE, Vemana IT Page 1 of 25

SSN: string of digits of length nine

Name: string of characters beginning with an upper case letter
GPA: a real number between 0.0 and 4.0
Sex: a member of the set { female, male }
Dept_Code: a member of the set { CMPS, MATH, ENGL, PHYS, PSYC, ... }
 A relation schema R, denoted by R(A1, A2, … , An), is made up of a relation name R and
a list of attributes, A1, A2, … , An.
 Attribute: Ai is the name of a role played by some domain D in the relation schema R. D
is called the domain of Ai and is denoted by dom(Ai).
 Tuple: A tuple is a mapping from attributes to values drawn from the respective domains
of those attributes. A tuple is intended to describe some entity (or relationship between
entities) in the miniworld.
 R is called the name of this relation.
 The degree (or arity) of a relation is the number of attributes n of its relation schema.
 A relation of degree seven, which stores information about university students, would
contain seven attributes describing each student as follows:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)
 Relational Database: A collection of relations, each one consistent with its specified
relational schema.
 A relation (or relation state) r of the relation schema R(A1, A2, … , An), also denoted
by r(R), is a set of n-tuples r = {t1, t2, … , tm}. Each n-tuple t is an ordered list of n values
t
=<v1,v2…,vn>

Characteristics of Relations
1. Ordering of Tuples in a Relation
 A relation is defined as a set of tuples. Mathematically, elements of a set have no
order among them; hence, tuples in a relation do not have any particular order.
 Similarly, when tuples are represented on a storage device, they must be organized in
some fashion, and it may be advantageous, from a performance standpoint, to
organize them in a way that depends upon their content.
2. Ordering of Values within a Tuple

 The order of attributes and their values is not that important as long as the
correspondence between attributes and values is maintained.
 A tuple can be considered as a set of (<attribute>,<value> ) pairs, where each pair
gives the value of the mapping from an attribute Ai to a value vi from dom(Ai). The
ordering of attributes is not important, because the attribute name appears with its
value.
3. Values and NULLs in the Tuples

 Each value in a tuple is an atomic value; that is, it is not divisible into components.
 An important concept is NULL values, which are used to represent the values of
attributes that may be unknown or may not apply to a tuple.
 NULL values has several meanings, such as value unknown, value exists but is
not available, or attribute does not apply to this tuple.
4. Interpretation (Meaning) of a Relation

 Each tuple in the relation can then be interpreted as a fact or a particular instance of
the assertion.
 Each relation can be viewed as a predicate and each tuple in that relation can be
viewed as an assertion for which that predicate is satisfied (i.e., has value true) for
the combination of values in it.
 Example: There exists a student having name Benjamin Bayer, having SSN 305-61-
2435, having age 19, etc
Relational Model Notation

The following notation are used for presentation:
 A relation schema R of degree n is denoted by R(A1, A2, … , An).

 The uppercase letters Q, R, S denote relation names.

 The lowercase letters q, r, s denote relation states.
 The letters t, u, v denote tuples.
 In general, the name of a relation schema such as STUDENT also indicates the current
set of tuples in that relation—the current relation state—whereas STUDENT(Name, Ssn,
…) refers only to the relation schema.
 An attribute A can be qualified with the relation name R to which it belongs by using the
dot notation R.A—for example, STUDENT.Name or STUDENT.Age. This is because the
same name may be used for two attributes in different relations.
 An n-tuple t in a relation r(R) is denoted by t =<v1,v2, ….., vn> , where vi is the value
corresponding to attribute Ai. The following notation refers to component values of
tuples:
 Both t[Ai] and t.Ai (and sometimes t[i]) refer to the value vi in t for attribute Ai.
 Both t[Au, Aw, … , Az] and t.(Au, Aw, … , Az), where Au, Aw, … , Az is a list of
attributes from R, refer to the subtuple of values from t corresponding to the
attributes specified in the list.
2. Relational Model Constraints and Relational Database Schemas
Relational Model Constraints on databases can generally be divided into three main categories:
1. Constraints that are inherent in the data model known as inherent model-based
constraints or implicit constraints.
2. Constraints that can be directly expressed in the schemas of the data model, typically by
specifying them in the DDL known as schema-based constraints or explicit constraints.
3. Constraints that cannot be directly expressed in the schemas of the data model, and hence
must be expressed and enforced by the application programs or in some other way known as
application-based or semantic constraints or business rules.
The schema-based constraints include domain constraints, key constraints, constraints on

NULLs, entity integrity constraints, and referential integrity constraints.
1. Domain Constraints
 Domain constraints specify that within each tuple, the value of each attribute A must be
an atomic value from the domain dom(A).
 The data types associated with domains typically include standard numeric data types for
integers and real numbers. Characters, Booleans, fixed-length strings, and variable-length
strings are also available, as are date, time, timestamp, and other special data types.

2. Key Constraints and Constraints on NULL Values

 In the formal relational model, a relation is defined as a set of tuples.
 By definition, all elements of a set are distinct; hence, all tuples in a relation must also be
distinct.
 This means that no two tuples can have the same combination of values for all their
attributes. Usually, there are other subsets of attributes of a relation schema R with the
property that no two tuples in any relation state r of R should have the same combination
of values for these attributes.
 Suppose that we denote one such subset of attributes by SK; then for any two distinct
tuples t1 and t2 in a relation state r of R, we have the constraint that:
t1[SK] ≠ t2[SK]
 A superkey SK specifies a uniqueness constraint that no two distinct tuples in any state r
of R can have the same value for SK.
 A key k of a relation schema R is a superkey of R with the additional property that

removing any attribute A from K leaves a set of attributes K′ that is not a superkey of R
any more. Hence, a key satisfies two properties:
1. Two distinct tuples in any state of the relation cannot have identical values for (all) the
attributes in the key. This uniqueness property also applies to a superkey.
2. It is a minimal superkey—that is, a superkey from which we cannot remove any

attributes and still have the uniqueness constraint hold. This minimality property is
required for a key but is optional for a superkey.
3. Relational Databases and Relational Database Schemas
 A relational database is a collection of many relations.

 A relational database schema S is a set of relation schemas S = {R1, R2, … , Rm}
and a set of integrity constraints IC.
 A relational database state DB of S is a set of relation states DB = {r1, r2, … ,rm}

such that each ri is a state of Ri and such that the ri relation states satisfy the
integrity

constraints specified in IC.

 A relational database schema that we call COMPANY = {EMPLOYEE, DEPARTMENT,
DEPT_LOCATIONS, PROJECT, WORKS_ON, DEPENDENT}.
 When we refer to a relational database, we implicitly include both its schema and its
current state.
 A database state that does not obey all the integrity constraints is called not valid,
and a state that satisfies all the constraints in the defined set of integrity constraints
IC is called a valid state.

4. Entity Integrity, Referential Integrity, and Foreign Keys

 The entity integrity constraint states that no primary key value can be NULL. This
is because the primary key value is used to identify individual tuples in a relation.
 Key constraints and entity integrity constraints are specified on individual relations.
 The referential integrity constraint is specified between two relations and is used to
maintain the consistency among tuples in the two relations.
 Informally, the referential integrity constraint states that a tuple in one relation
that refers to another relation must refer to an existing tuple in that
relation.
 For example, the attribute Dno of EMPLOYEE gives the department number for which
each employee works; hence, its value in every EMPLOYEE tuple must match the

Dnumber value of some tuple in the DEPARTMENT relation.

 The conditions for a foreign key, given below, specify a referential integrity
constraint between the two relation schemas R1 and R2.
 A set of attributes FK in relation schema R1 is a foreign key of R1 that references
relation R2 if it satisfies the following rules:
1. The attributes in FK have the same domain(s) as the primary key attributes PK of
R2; the attributes FK are said to reference or refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK
for some tuple t2 in the current state r2(R2) or is NULL. In the former case, we have
t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2.
 In this definition, R1 is called the referencing relation and R2 is the referenced
relation. If these two conditions hold, a referential integrity constraint from R1 to R2
is said to hold.
 In the EMPLOYEE relation, the attribute Dno refers to the department for which an
employee works;
hence, it is designated Dno to be a foreign key of EMPLOYEE referencing the
DEPARTMENT relation.
 This means that a value of Dno in any tuple t1 of the EMPLOYEE relation must match
a value of Constraints the primary key of DEPARTMENT—the Dnumber attribute—in
some tuple t2 of the DEPARTMENT relation, or the value of Dno can be NULL if the
employee does not belong to a department or will be assigned to a department later.

Other types of constraints

 The salary of an employee should not exceed the salary of the employee’s supervisor
and the maximum number of hours an employee can work on all projects per week is
56. Such constraints can be specified and enforced within the application programs
that update the database, or by using a general-purpose constraint specification
language. Sometimes called as Semantic Integrity constraint.
Update Operations, Transactions, and Dealing with Constraint Violations

i. There are three basic operations that can change the states of relations
in the database: Insert, Delete, and Update (or Modify).
ii. Insert is used to insert one or more new tuples in a relation.
iii. Delete is used to delete tuples.
iv. Update (or Modify) is used to change the values of some attributes in existing tuples.
The Insert Operation:

The Insert operation provides a list of attribute values for a new tuple t that is to be inserted into a
relation R.
Insert can violate any of the four types of constraints.
1. Domain constraints can be violated if an attribute value is given that
does not appear in the corresponding domain or is not of the appropriate
data type.
2. Key constraints can be violated if a key value in the new tuple t already exists
in another tuple in the relation r(R).
3. Entity integrity can be violated if any part of the primary key of the new tuple t is
NULL.
4. Referential integrity can be violated if the value of any foreign key in t refers
to a tuple that does not exist in the referenced relation.
 If an insertion violates one or more constraints, the default option is to reject the
insertion.
 Another option is to attempt to correct the reason for rejecting the insertion, but
this is typically not used for violations caused by Insert; rather, it is used more
often in correcting violations for Delete and Update.

The Delete Operation

 The Delete operation can violate only referential integrity.
 This occurs if the tuple being deleted is referenced by foreign keys from other tuples in the
database.
 To specify deletion, a condition on the attributes of the relation selects the tuple (or
tuples) to be deleted.
 Several options are available if a deletion operation causes a violation. The first option,
called restrict, is to reject the deletion.
 The second option, called cascade, is to attempt to cascade (or propagate) the deletion
by deleting tuples that reference the tuple that is being deleted. For example, in operation
2, the DBMS could automatically delete the offending tuples from WORKS_ON with Essn =
‘999887777’.
 A third option, called set null or set default, is to modify the referencing attribute
values that cause the violation; each such value is either set to NULL or changed to
reference another default valid tuple.

The Update Operation

The Update (or Modify) operation is used to change the values of one or more attributes in a
tuple (or tuples) of some relation R. It is necessary to specify a condition on the attributes of the
relation to select the tuple (or tuples) to be modified.
A transaction is an executing program that includes some database operations, such as

reading from the database, or applying insertions, deletions, or updates to the database. At the
end of the transaction, it must leave the database in a valid or consistent state that satisfies all
the constraints specified on the database schema.
A large number of commercial applications running against relational databases in online

transaction processing (OLTP) systems are executing transactions at rates that reach several
hundred per second.

RELATIONAL ALGEBRA
Unary and Binary relational operations
SELECT and PROJECT
The SELECT Operation
 The SELECT operation is used to choose a subset of the tuples from a relation that
satisfies a selection condition.
 It restricts the tuples in a relation to only those tuples that satisfy the condition.
 It can also be visualized as a horizontal partition of the relation into two sets of tuples—
those tuples that satisfy the condition and are selected, and those tuples that do not satisfy
the condition and are discarded.
 For example, to select the EMPLOYEE tuples whose department is 4, or those whose salary
is greater than $30,000
σDno=4(EMPLOYEE)
σSalary>30000(EMPLOYEE)
 In general, the SELECT operation is denoted by

σ<selection condition>(R)
where the symbol σ (sigma) is used to denote the SELECT operator and the selection condition is
a Boolean expression (condition) specified on the attributes of relation R.
 The Boolean expression specified in is made up of a number of clauses of the form :
<attribute name><comparison op><constant value>
Or
<attribute name><comparison op><attribute name>
 Clauses can be connected by the standard Boolean operators and, or, and not to form a
general selection condition.
 For example, to select the tuples for all employees who either work in department 4 and
make over
$25,000 per year, or work in department 5 and make over $30,000:
σ(Dno=4 AND Salary>25000) OR (Dno=5 AND

Salary>30000)(EMPLOYEE)
 The Boolean conditions AND, OR, and NOT have their normal interpretation, as follows:
■ (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; otherwise,it is FALSE.
■ (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE; otherwise, it is

FALSE.
■ (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.
 The SELECT operator is unary; that is, it is applied to a single relation. Hence, selection
conditions cannot involve more than one tuple.
 The degree of the relation resulting from a SELECT operation—its number of attributes—is
the same as the degree of R.
 The SELECT operation is commutative; that is,
σ (cond1)(σ(cond2)(R)) = σ(cond2)(σ(cond1)(R))
The PROJECT Operation

 The PROJECT operation, selects certain columns from the table and discards the other
columns.
 The result of the PROJECT operation can be visualized as a vertical partition of the relation
into two relations: one has the needed columns (attributes) and contains the result of the
operation, and the other contains the discarded columns.
 For example, to list each employee’s first and last name and salary, we can use the
PROJECT operation as follows:
πLname, Fname, Salary(EMPLOYEE)

 The general form of the PROJECT operation is where (pi) is the symbol used to represent the
PROJECT operation, and is the desired sublist of attributes from the attributes of relation R.
π<attribute list>(R)
 The result of the PROJECT operation has only the attributes specified in in the same order
as they appear in the list. Hence, its degree is equal to the number of attributes in <attribute
list>.
 The PROJECT operation removes any duplicate tuples, so the result of the PROJECT
operation is a set of distinct tuples, and hence a valid relation. This is known as duplicate
elimination.
Sequences of Operations and the RENAME Operation

 The relations shown above depict operation results do not have any names.
 Either we can write the operations as a single relational algebra expression by nesting the
operations, or we can apply one operation at a time and create intermediate result relations.
 In the latter case, we must give names to the relations that hold the intermediate results.
 For example, to retrieve the first name, last name, and salary of all employees who
work in department number 5, apply a SELECT and a PROJECT operation.

DEP5_EMPS ← σDno=5(EMPLOYEE)
RESULT ← πFname, Lname, Salary(DEP5_EMPS)
 It is sometimes simpler to break down a complex sequence of operations by specifying
intermediate result relations than to write a single relational algebra expression.
 We can also use this technique to rename the attributes in the intermediate and result
relations.
 To rename the attributes in a relation, we simply list the new attribute names in
parentheses, as in the following example:
TEMP ← σDno=5(EMPLOYEE)
R(First_name, Last_name, Salary) ← πFname, Lname, Salary(TEMP)
 The formal RENAME operation—which can rename either the relation name or the attribute
names, or both—as a unary operator.
 The general RENAME operation when applied to a relation R of degree n is denoted
by any of the following three forms: ρS(B1, B2, ... , Bn)(R) or ρS(R) or ρ(B1, B2, ... ,
Bn)(R) where the symbol ρ (rho) is used to denote the RENAME operator, S is the
new relation name, and B1, B2, … , Bn are the new attribute names.
 The first expression renames both the relation and its attributes, the second renames the
relation only, and the third renames the attributes only.
Relational Algebra Operations from Set Theory
The UNION, INTERSECTION, and MINUS Operations

■ UNION: The result of this operation, denoted by R ∪S, is a relation that includes all tuples
that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
■ INTERSECTION: The result of this operation, denoted by R ∩ S, is a relation that includes
all tuples that are in both R and S.
■ SET DIFFERENCE (or MINUS): The result of this operation, denoted by R – S, is a relation
that includes all tuples that are in R but not in S.
These are binary operations; that is, each is applied to two sets (of tuples).
When these operations are adapted to relational databases, the two relations on which
any of these three operations are applied must have the same type of tuples; this
condition has been called union compatibility or type compatibility.
Two relations R(A1, A2, … , An) and S(B1, B2, … , Bn) are said to be union compatible
(or type compatible) if they have the same degree n and if dom(Ai) = dom(Bi) for 1 ≤ i
≤ n. This means that the two relations have the same number of attributes and each

corresponding pair of attributes has the same domain.

For example, to retrieve the Social Security numbers of all employees who either work
in department 5 or directly supervise an employee who works in department 5,
DEP5_EMPS ← σDno=5(EMPLOYEE)
RESULT1 ← πSsn(DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2

RESULT2(Ssn) ← πSuper_ssn(DEP5_EMPS)
Both UNION and INTERSECTION are commutative operations; that is,

R ∪ S = S ∪ R and R ∩ S = S ∩ R
Both UNION and INTERSECTION can be treated as n-ary operations applicable to any
number of relations because both are also associative operations; that is,
R ∪ (S ∪ T ) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T)
The MINUS operation is not commutative; that is, in general, R − S ≠ S − R
The INTERSECTION can be expressed in terms of union and set difference as follows:
R ∩ S = ((R ∪ S) − (R − S)) − (S − R)
The CARTESIAN PRODUCT (CROSS PRODUCT) Operation
CARTESIAN PRODUCT operation—also known as CROSS PRODUCT or CROSS JOIN—

which is denoted by ×.
This is also a binary set operation, but the relations on which it is applied do not have to
be union compatible.
This set operation produces a new element by combining every member (tuple) from one
relation (set) with every member (tuple) from the other relation (set).
In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q with degree
n+m attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
The resulting relation Q has one tuple for each combination of tuples—one from R and one
from S.
Hence, if R has m tuples and S has n tuples, then R × S will have m*n tuples.
Example, suppose that we want to retrieve a list of names of each female employee’s
dependents. We can do this as follows:
FEMALE_EMPS ← σSex=‘F’(EMPLOYEE)
EMPNAMES ← πFname, Lname, Ssn(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σSsn=Essn(EMP_DEPENDENTS)
RESULT ← πFname, Lname, Dependent_name(ACTUAL_DEPENDENTS)

Binary Relational Operations: JOIN and DIVISION

The JOIN Operation
The JOIN operation, denoted by, is used to combine related tuples from two
relations into single “longer” tuples.
This operation is very important for any relational database with more than a single
relation because it allows us to process relationships among relations.
To illustrate JOIN, suppose that we want to retrieve the name of the manager of each
department, as follows:
DEPT_MGR ← DEPARTMENT Mgr_ssn=Ssn

EMPLOYEE
RESULT ← πDname, Lname, Fname(DEPT_MGR)
The general form of a JOIN operation on two relations R(A1, A2, … , An) and S(B1, B2, … , Bm)
is
R <join condition>S
The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, … , An, B1, B2, … ,
Bm) in that order; Q has one tuple for each combination of tuples—one from R and one
from S—whenever the combination satisfies the join condition.
The main difference between CARTESIAN PRODUCT and JOIN are, In JOIN, only
combinations of tuples satisfying the join condition appear in the result, whereas in the
CARTESIAN PRODUCT all combinations of tuples are included in the result.
The join condition is specified on attributes from the two relations R and S and is
evaluated for each combination of tuples. Each tuple combination for which the join
condition evaluates to TRUE is included in the resulting relation Q as a single combined
 A general join condition is of the form

<condition>AND<condition> AND … AND<condition>
where each <condition> is of the form Ai θ Bj , Ai is an attribute of R, Bj is an attribute of S,
Ai and Bj have the same domain, and θ (theta) is one of the comparison operators {=,
<,>,< , ≥, ≠}. A JOIN operation with such a general join condition is called a THETA JOIN.
tuple.
Variations of JOIN:
The EQUIJOIN and NATURAL JOIN
 The most common use of JOIN involves join conditions with equality comparisons only. Such
a JOIN, where the only comparison operator used is =, is called an EQUIJOIN

 Notice that in the result of an EQUIJOIN we always have one or more pairs of attributes that

have identical values in every tuple

 For example, the values of the attributes Mgr_ssn and Ssn are identical in every tuple of
DEPT_MGR (the EQUIJOIN result) because the equality join condition specified on these two
attributes requires the values to be identical in every tuple in the result
 Because one of each pair of attributes with identical values is superfluous, a new operation
called NATURAL JOIN—denoted by * was created to get rid of the second (superfluous)
attribute in an EQUIJOIN condition
 The standard definition of NATURAL JOIN requires that the two join attributes (or
each pair of join attributes) have the same name in both relations. If this is not
the case, a renaming operation is applied first.
 Suppose we want to combine each PROJECT tuple with the DEPARTMENT tuple
that controls the project. In the following example, first we rename the Dnumber
attribute of DEPARTMENT to Dnum— so that it has the same name as the Dnum
attribute in PROJECT—and then we apply NATURAL JOIN: PROJ_DEPT ← PROJECT
* ρ(Dname, Dnum, Mgr_ssn, Mgr_start_date)(DEPARTMENT)
 The same query can be done in two steps by creating an intermediate table DEPT as
follows: DEPT ← ρ(Dname, Dnum, Mgr_ssn, Mgr_start_date)(DEPARTMENT) PROJ_DEPT
← PROJECT * DEPT
 The attribute Dnum is called the join attribute for the NATURAL JOIN operation,
because it is the only attribute with the same name in both relations.
 In general, the join condition for NATURAL JOIN is constructed by equating each
pair of join attributes that have the same name in the two relations and
combining these conditions with AND.
 A single JOIN operation is used to combine data from two relations so that
related information can be presented in a single table. These operations are also
known as inner joins.
 A more general, but nonstandard definition for NATURAL JOIN is Q ← R
*(list1),(list2)S
 In this case,<list1> specifies a list of i attributes from R, and <list2>specifies a list
of i attributes from S.
 The NATURAL JOIN or EQUIJOIN operation can also be specified among multiple tables,
leading to an n-way join. For example, consider the following three-way join:
((PROJECT Dnum=DnumberDEPARTMENT) Mgr_ssn=SsnEMPLOYEE)

A Complete Set of Relational Algebra Operations

It has been shown that the set of relational algebra operations {σ, π, ∪, ρ, –, ×} is a
complete set; that is, any of the other original relational algebra operations can be
expressed as a sequence of operations from this set.
For example, the INTERSECTION operation can be expressed by using UNION and MINUS
as follows: R ∩ S ≡ (R ∪ S) – ((R – S) ∪(S – R))
JOIN operation can be specified as a CARTESIAN PRODUCT followed by a SELECT
operation: R <condition>S ≡ σ <condition>(R × S)
A NATURAL JOIN can be specified as a CARTESIAN PRODUCT preceded by RENAME and
followed by SELECT and PROJECT operations.
The DIVISION Operation

The DIVISION operation, denoted by ÷, is useful for a special kind of query that sometimes
occurs in database applications.
example is Retrieve the names of employees who work on all the projects that ‘John Smith’
works on.
To express this query using the DIVISION operation, proceed as follows. First, retrieve the list of
project numbers that ‘John Smith’ works on in the intermediate relation SMITH_PNOS:
SMITH ← σFname=‘John’ AND Lname=‘Smith’(EMPLOYEE) SMITH_PNOS ←
πPno(WORKS_ON Essn=SsnSMITH)
 Next, create a relation that includes a tuple whenever the employee whose Ssn is Essn
works on the project whose number is Pno in the intermediate relation SSN_PNOS:
SSN_PNOS ← πEssn, Pno(WORKS_ON)
 Finally, apply the DIVISION operation to the two relations, which gives the desired
employees’ Social Security numbers SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS
RESULT ← πFname, Lname(SSNS * EMPLOYEE)
In general, the DIVISION operation is applied to two relations R(Z) ÷ S(X), where the
attributes of R are a subset of the attributes of S; that is, X ⊆ Z. Let Y be the set of
attributes of R that are not attributes of S.
The DIVISION operation is defined for convenience for dealing with queries that involve
universal quantification or the all condition.

Notation for Query Trees
Describes about the notation typically used in relational systems to represent queries
internally.
The notation is called a query tree or sometimes it is known as a query evaluation tree or
query execution tree.
It includes the relational algebra operations being executed and is used as a possible data
structure for the internal representation of the query in an RDBMS.
A query tree is a tree data structure that corresponds to a relational algebra expression.
It represents the input relations of the query as leaf nodes of the tree, and represents the
relational algebra operations as internal nodes.
An execution of the query tree consists of executing an internal node operation
whenever its operands (represented by its child nodes) are available, and then replacing
that internal node by the relation that results from executing the operation.
The execution terminates when the root node is executed and produces the result relation
for the query.
πPnumber, Dnum, Lname, Address, Bdate(((σPlocation
=‘Stafford’(PROJECT)) Dnum=Dnumber(DEPARTMENT)) Mgr_ssn=Ssn(EMPLOYEE))
Query tree for the above query. In this, the three leaf nodes P, D, and E represent the
three relations PROJECT, DEPARTMENT, and EMPLOYEE.
 In order to execute query, the node marked (1) in Figure must begin execution before node(2)
because some resulting tuples of operation (1) must be available before we can begin to
execute

operation (2). Similarly, node (2) must begin to execute and produce results before node (3)
can start execution, and so on.
 In general, a query tree gives a good visual representation and understanding of the query in
terms of the relational operations it uses and is recommended as an additional means for
expressing queries in relational algebra.
Additional Relational Operations (aggregate, grouping, etc.)
Generalized Projection
The generalized projection operation extends the projection operation by allowing
functions of attributes to be included in the projection list.
The generalized form can be expressed as: πF1, F2, ..., Fn (R)
where F1, F2, … , Fn are functions over the attributes in relation R and may involve arithmetic
operations and constant values.
This operation is helpful when developing reports where computed values have to be
produced in the columns of a query result.
Example: EMPLOYEE (Ssn, Salary, Deduction,
Years_service) A report may be required to show
Net Salary = Salary – Deduction, Bonus = 2000 * Years_service, and Tax = 0.25 *
Salary. Then a generalized projection combined with renaming may be used as follows:
REPORT ← ρ(Ssn, Net_salary, Bonus, Tax)(πSsn, Salary – Deduction, 2000 * Years_service, 0.25 *
Salary(EMPLOYEE))
Aggregate Functions and Grouping

 Mathematical aggregate functions on collections of values from the database. Examples of
such functions include retrieving the average or total salary of all employees or the total
number of employee tuples.
 Common functions applied to collections of numeric values include SUM, AVERAGE,
MAXIMUM, and MINIMUM.
o The COUNT function is used for counting tuples or values.
 AGGREGATE FUNCTION operation defined, using the symbol ℑ , to specify these types of
requests as follows:
o <grouping attribute>ℑ<function list> (R)
o Where<grouping attribute> is a list of attributes of the relation specified in R, and
<function list>is a list of (<function><attribute> ) pairs.
 In each such pair, is one of the allowed functions—such as SUM, AVERAGE, MAXIMUM,

MINIMUM, COUNT—and is an attribute of the relation specified by R.

 The resulting relation has the grouping attributes plus one attribute for each element in the
function list.
 For example, to retrieve each department number, the number of employees in the
department, and their average salary, while renaming the resulting attributes as indicated
below, we write: ρR(Dno, No_of_employees, Average_sal) (Dno ℑ COUNT Ssn, AVERAGE
Salary (EMPLOYEE))
 If we do not want to rename the attributes then the above query we can write it as, Dno
COUNT Ssn, AVERAGE Salary(EMPLOYEE)
Note: If no grouping attributes are specified, the functions are applied to all the tuples in the
relation, so the resulting relation has a single tuple only.
Recursive Closure Operations

 Recursive closure operation in relational algebra is applied to a recursive relationship
between tuples of the same type, such as the relationship between an employee and a
supervisor.
 This relationship is described by the foreign key Super_ssn of the EMPLOYEE relation and it
relates each employee tuple (in the role of supervisee) to another employee tuple (in the
role of supervisor).
 An example of a recursive operation is to retrieve all supervisees of an employee e at all
levels—that is, all employees e’ directly supervised by e, all employees e’’ directly
supervised by each employee e’, all employees e’’’ directly supervised by each employee
e’’, and so on.
 For example, to specify the Ssns of all employees e_ directly supervised—at level one—by
the employee e whose name is ‘James Borg’ we can apply the following operation:
BORG_SSN ← πSsn(σFname=‘James’ AND Lname=‘Borg’(EMPLOYEE)) SUPERVISION(Ssn1,

Ssn2) ← πSsn,Super_ssn(EMPLOYEE) RESULT1(Ssn) ← πSsn1(SUPERVISION
Ssn2=SsnBORG_SSN)
 To retrieve all employees supervised by Borg at level 2—that is, all employees e’’
supervised by some employee e’ who is directly supervised by Borg—we can apply another
JOIN to the result of the first query, as follows:
RESULT2(Ssn) ← πSsn1(SUPERVISION Ssn2=SsnRESULT1)
 To get both sets of employees supervised at levels 1 and 2 by ‘James Borg’, we can apply
the UNION operation to the two results, as follows:
RESULT ← RESULT2 U RESULT1

OUTER JOIN Operations

 For a NATURAL JOIN operation R * S, only tuples from R that have matching tuples in S—and
vice versa—appear in the result. Hence, tuples without a matching (or related) tuple are
eliminated from the JOIN result.
 Tuples with NULL values in the join attributes are also eliminated. This type of join, where
tuples with no match are eliminated, is known as an inner join.
 This amounts to the loss of information if the user wants the result of the JOIN to include all
the tuples in one or more of the component relations.
 A set of operations, called outer joins, were developed for the case where the user wants to
keep all the tuples in R, or all those in S, or all those in both relations in the result of the JOIN,
regardless of whether or not they have matching tuples in the other relation.
 This satisfies the need of queries in which tuples from two tables are to be combined by
matching corresponding rows, but without losing any tuples for lack of matching values.
 For example, suppose that we want a list of all employee names as well as the name of the
departments they manage if they happen to manage a department; if they do not manage
one, we can indicate it with a NULL value. We can apply an operation LEFT OUTER JOIN,
denoted by , to retrieve the result as follows:
EMP ← (EMPLOYEE Ssn=Mgr_ssnDEPARTMENT)

RESULT ← πFname, Minit, Lname, Dname(TEMP)
 The LEFT OUTER JOIN operation keeps every tuple in the first, or left, relation R in R S; if no
matching tuple is found in S, then the attributes of S in the join result are filled with NULL
values.
 A similar operation, RIGHT OUTER JOIN, denoted by , keeps every tuple in the
second, or right, relation S in the result of R S.
 A third operation, FULL OUTER JOIN, keeps all tuples in both the left and the right
relations when no matching tuples are found, filling them with NULL values as needed.

The OUTER UNION Operation

 The OUTER UNION operation was developed to take the union of tuples from two relations
that have some common attributes, but are not union (type) compatible.
 This operation will take the UNION of tuples in two relations R(X, Y) and S(X, Z) that are
partially compatible, meaning that only some of their attributes, say X, are union
compatible.
 The attributes that are union compatible are represented only once in the result, and
those attributes that are not union compatible from either relation are also kept in the
result relation T(X,Y, Z). It is therefore the same as a FULL OUTER JOIN on the common
attributes.
 Two tuples t1 in R and t2 in S are said to match if t1[X]=t2[X]. These will be combined
(unioned) into a single tuple in t. Tuples in either relation that have no matching tuple in
the other relation are padded with NULL values.
 For example, an OUTER UNION can be applied to two relations whose schemas are
STUDENT(Name, Ssn, Department, Advisor) and INSTRUCTOR(Name, Ssn, Department,
Rank).
 Tuples from the two relations are matched based on having the same combination of
values of the shared attributes—Name, Ssn, Department.
 The resulting relation, STUDENT_OR_INSTRUCTOR, will have the following attributes:

STUDENT_OR_INSTRUCTOR(Name, Ssn, Department, Advisor, Rank)
 All the tuples from both relations are included in the result, but tuples with the same
(Name, Ssn, Department) combination will appear only once in the result.
 Tuples appearing only in STUDENT will have a NULL for the Rank attribute, whereas tuples
appearing only in INSTRUCTOR will have a NULL for the Advisor attribute. A tuple that
exists in both relations, which represent a student who is also an instructor, will have
values for all its attributes.

Mapping Conceptual Design into a Logical Design
Relational Database Design using ER-to-Relational mapping.
Step 1: For each regular (strong) entity type E in the ER schema, create a relation R that includes
all the simple attributes of E.
Step 2: For each weak entity type W in the ER schema with owner entity type E, create a relation
R, and include all simple attributes (or simple components of composite attributes) of W as
attributes. In addition, include as foreign key attributes of R the primary key attribute(s) of the
relation(s) that correspond to the owner entity type(s).
Step 3: For each binary 1:1 relationship type R in the ER schema, identify the relations S and T
that correspond to the entity types participating in R. Choose one of the relations, say S, and include
the primary key of T as a foreign key in S. Include all the simple attributes of R as attributes of S.
Step 4: For each regular binary 1:N relationship type R identify the relation (N) relation S. the
primary key of T as a foreign key of S. Simple attributes of R map to attributes of S.
Step 5: For each binary M:N relationship type R, create a relation S. Include the primary keys of
participant relations as foreign keys in S. Their combination will be the primary key for S. Simple
attributes of R become attributes of S.
Step 6: For each multi-valued attribute A, create a new relation R. This relation will include an
attribute corresponding to A, plus the primary key K of the parent relation (entity type or
relationship type) as a foreign key in R. The primary key of R is the combination of A and K.
Step 7: For each n-ary relationship type R, where n>2, create a new relation S to represent R.
Include the primary keys of the relations participating in R as foreign keys in S. Simple attributes of
R map to attributes of S. The primary key of S is a combination of all the foreign keys that reference
the participants that have cardinality constraint > 1. For a recursive relationship, we will need a new
relation.

BCS403 DBMS Module2 Notes

Uploaded by

BCS403 DBMS Module2 Notes

Uploaded by

Database Management System (BCS403) Module 2: Relational Model

1. Relational Model Concepts

Domains, Attributes, Tuples, and Relations

Department of CSE, Vemana IT Page 1 of 25

SSN: string of digits of length nine

Department of CSE, Vemana IT Page 2 of 25

2. Ordering of Values within a Tuple

3. Values and NULLs in the Tuples

4. Interpretation (Meaning) of a Relation

Relational Model Notation

Department of CSE, Vemana IT Page 3 of 25

 The uppercase letters Q, R, S denote relation names.

2. Relational Model Constraints and Relational Database Schemas

The schema-based constraints include domain constraints, key constraints, constraints on

Department of CSE, Vemana IT Page 4 of 25

2. Key Constraints and Constraints on NULL Values

 A key k of a relation schema R is a superkey of R with the additional property that

2. It is a minimal superkey—that is, a superkey from which we cannot remove any

3. Relational Databases and Relational Database Schemas

 A relational database is a collection of many relations.

 A relational database state DB of S is a set of relation states DB = {r1, r2, … ,rm}

Department of CSE, Vemana IT Page 5 of 25

constraints specified in IC.

Department of CSE, Vemana IT Page 6 of 25

4. Entity Integrity, Referential Integrity, and Foreign Keys

Department of CSE, Vemana IT Page 7 of 25

Dnumber value of some tuple in the DEPARTMENT relation.

Department of CSE, Vemana IT Page 8 of 25

Other types of constraints

Update Operations, Transactions, and Dealing with Constraint Violations

The Insert Operation:

Department of CSE, Vemana IT Page 9 of 25

The Delete Operation

Department of CSE, Vemana IT Page 10 of 25

The Update Operation

A transaction is an executing program that includes some database operations, such as

A large number of commercial applications running against relational databases in online

Department of CSE, Vemana IT Page 11 of 25

 In general, the SELECT operation is denoted by

σ(Dno=4 AND Salary>25000) OR (Dno=5 AND

Department of CSE, Vemana IT Page 12 of 25

The PROJECT Operation

πLname, Fname, Salary(EMPLOYEE)

Sequences of Operations and the RENAME Operation

Department of CSE, Vemana IT Page 13 of 25

Relational Algebra Operations from Set Theory

The UNION, INTERSECTION, and MINUS Operations

Department of CSE, Vemana IT Page 14 of 25

corresponding pair of attributes has the same domain.

RESULT ← RESULT1 ∪ RESULT2

Both UNION and INTERSECTION are commutative operations; that is,

The CARTESIAN PRODUCT (CROSS PRODUCT) Operation

CARTESIAN PRODUCT operation—also known as CROSS PRODUCT or CROSS JOIN—

Department of CSE, Vemana IT Page 15 of 25

Binary Relational Operations: JOIN and DIVISION

DEPT_MGR ← DEPARTMENT Mgr_ssn=Ssn

 A general join condition is of the form

Department of CSE, Vemana IT Page 16 of 25

Department of CSE, Vemana IT Page 17 of 25

have identical values in every tuple

Department of CSE, Vemana IT Page 18 of 25

A Complete Set of Relational Algebra Operations

The DIVISION Operation

Department of CSE, Vemana IT Page 19 of 25

Notation for Query Trees

Department of CSE, Vemana IT Page 20 of 25

Additional Relational Operations (aggregate, grouping, etc.)

Aggregate Functions and Grouping

Department of CSE, Vemana IT Page 21 of 25

MINIMUM, COUNT—and is an attribute of the relation specified by R.

Recursive Closure Operations

BORG_SSN ← πSsn(σFname=‘James’ AND Lname=‘Borg’(EMPLOYEE)) SUPERVISION(Ssn1,

Department of CSE, Vemana IT Page 22 of 25

OUTER JOIN Operations

denoted by , to retrieve the result as follows:

EMP ← (EMPLOYEE Ssn=Mgr_ssnDEPARTMENT)