BCS403 DBMS Module2 Notes
Module 2
Module 2: Relational Model: Relational Model Concepts, Relational Model Constraints and
relational database schemas, Update operations, transactions, and dealing with constraint
violations. Relational Algebra: Unary and Binary relational operations, additional relational
operations (aggregate, grouping, etc.) Examples of Queries in relational algebra. Mapping
Conceptual Design into a Logical Design: Relational Database Design using ER-to-Relational
mapping.
Textbook:
1. Fundamentals of Database Systems, Ramez Elmasri and Shamkant B. Navathe, 7th Edition,
2017, Pearson.
2. Database management systems, Ramakrishnan, and Gehrke, 3rd Edition, 2014, McGraw Hill.
Characteristics of Relations
1. Ordering of Tuples in a Relation
A relation is defined as a set of tuples. Mathematically, elements of a set have no
order among them; hence, tuples in a relation do not have any particular order.
Similarly, when tuples are represented on a storage device, they must be organized in
some fashion, and it may be advantageous, from a performance standpoint, to
organize them in a way that depends upon their content.
Relational Model Constraints
Constraints on databases can generally be divided into three main categories:
1. Constraints that are inherent in the data model known as inherent model-based
constraints or implicit constraints.
2. Constraints that can be directly expressed in the schemas of the data model, typically by
specifying them in the DDL known as schema-based constraints or explicit constraints.
3. Constraints that cannot be directly expressed in the schemas of the data model, and hence
must be expressed and enforced by the application programs or in some other way known as
application-based or semantic constraints or business rules.
1. Domain Constraints
Domain constraints specify that within each tuple, the value of each attribute A must be
an atomic value from the domain dom(A).
The data types associated with domains typically include standard numeric data types for
integers and real numbers. Characters, Booleans, fixed-length strings, and variable-length
strings are also available, as are date, time, timestamp, and other special data types.
2. Key Constraints
By definition, all elements of a set are distinct; hence, all tuples in a relation must also be
distinct.
This means that no two tuples can have the same combination of values for all their
attributes. Usually, there are other subsets of attributes of a relation schema R with the
property that no two tuples in any relation state r of R should have the same combination
of values for these attributes.
Suppose that we denote one such subset of attributes by SK; then for any two distinct
tuples t1 and t2 in a relation state r of R, we have the constraint that:
t1[SK] ≠ t2[SK]
A superkey SK specifies a uniqueness constraint that no two distinct tuples in any state r
of R can have the same value for SK.
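As a rough illustration of the condition t1[SK] ≠ t2[SK], the Python sketch below checks whether a set of attributes behaves as a superkey in one given relation state. The function name is_superkey and the sample tuples are assumptions made for this example. A real superkey must hold in every valid state of R, so passing this check on one state is necessary but not sufficient.

```python
def is_superkey(state, attrs):
    """Return True if no two distinct tuples in `state` agree on all of `attrs`."""
    seen = set()
    for t in state:
        key = tuple(t[a] for a in attrs)
        if key in seen:
            return False  # two tuples share the same value for attrs
        seen.add(key)
    return True

# Hypothetical relation state, invented for illustration.
employee = [
    {"Ssn": "111", "Lname": "Smith", "Dno": 5},
    {"Ssn": "222", "Lname": "Wong", "Dno": 5},
    {"Ssn": "333", "Lname": "Smith", "Dno": 4},
]

ssn_ok = is_superkey(employee, ["Ssn"])   # every Ssn value is distinct
dno_ok = is_superkey(employee, ["Dno"])   # two tuples share Dno = 5
```

Here {Ssn} passes the check, while {Dno} fails because two tuples have the same department number.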
When we refer to a relational database, we implicitly include both its schema and its
current state.
A database state that does not obey all the integrity constraints is called not valid,
and a state that satisfies all the constraints in the defined set of integrity constraints
IC is called a valid state.
If an insertion violates one or more constraints, the default option is to reject the
insertion.
Another option is to attempt to correct the reason for rejecting the insertion, but
this is typically not used for violations caused by Insert; rather, it is used more
often in correcting violations for Delete and Update.
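The default "reject the insertion" behavior can be sketched in Python as follows. This is only an illustration that checks a key constraint; the function name insert_tuple and the sample data are assumptions, and a real DBMS checks every constraint defined on the schema.

```python
def insert_tuple(state, new_tuple, key_attrs):
    """Return a new state containing new_tuple, or raise ValueError to reject it."""
    new_key = tuple(new_tuple[a] for a in key_attrs)
    for t in state:
        if tuple(t[a] for a in key_attrs) == new_key:
            # Key constraint violated: the default option is to reject.
            raise ValueError(f"insert rejected: duplicate key value {new_key}")
    return state + [new_tuple]

# Hypothetical relation state, invented for illustration.
employee = [
    {"Ssn": "111", "Lname": "Smith"},
    {"Ssn": "222", "Lname": "Wong"},
]

accepted = insert_tuple(employee, {"Ssn": "333", "Lname": "Zelaya"}, ["Ssn"])

try:
    insert_tuple(employee, {"Ssn": "111", "Lname": "Jones"}, ["Ssn"])
    rejected = False
except ValueError:
    rejected = True
```

The original state is left unchanged when an insertion is rejected, so the database stays in a valid state.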
RELATIONAL ALGEBRA
Unary and Binary relational operations
SELECT and PROJECT
The SELECT Operation
The SELECT operation is used to choose a subset of the tuples from a relation that
satisfies a selection condition.
It restricts the tuples in a relation to only those tuples that satisfy the condition.
It can also be visualized as a horizontal partition of the relation into two sets of tuples—
those tuples that satisfy the condition and are selected, and those tuples that do not satisfy
the condition and are discarded.
For example, to select the EMPLOYEE tuples whose department is 4, or those whose salary
is greater than $30,000, we can write each condition as a separate SELECT operation:
σDno=4(EMPLOYEE)
σSalary>30000(EMPLOYEE)
where the symbol σ (sigma) is used to denote the SELECT operator and the selection condition is
a Boolean expression (condition) specified on the attributes of relation R.
The Boolean expression specified in <selection condition> is made up of a number of clauses of the form:
<attribute name><comparison op><constant value>
Or
<attribute name><comparison op><attribute name>
Clauses can be connected by the standard Boolean operators and, or, and not to form a
general selection condition.
For example, to select the tuples for all employees who either work in department 4 and
make over $25,000 per year, or work in department 5 and make over $30,000:
σ(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)(EMPLOYEE)
The Boolean conditions AND, OR, and NOT have their normal interpretation, as follows:
■ (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; otherwise, it is FALSE.
■ (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE; otherwise, it is
FALSE.
■ (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.
The SELECT operator is unary; that is, it is applied to a single relation. Hence, selection
conditions cannot involve more than one tuple.
The degree of the relation resulting from a SELECT operation—its number of attributes—is
the same as the degree of R.
The SELECT operation is commutative; that is,
σ (cond1)(σ(cond2)(R)) = σ(cond2)(σ(cond1)(R))
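The behavior of SELECT, including a compound Boolean condition and the commutativity property, can be sketched in Python. Relations are modeled here as lists of dictionaries; the function name select and the sample EMPLOYEE tuples are assumptions made for illustration.

```python
def select(relation, condition):
    """σ_condition(relation): keep only the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]

# Hypothetical EMPLOYEE state, invented for illustration.
employee = [
    {"Ssn": "1", "Dno": 4, "Salary": 26000},
    {"Ssn": "2", "Dno": 4, "Salary": 24000},
    {"Ssn": "3", "Dno": 5, "Salary": 31000},
    {"Ssn": "4", "Dno": 5, "Salary": 28000},
]

# (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)
result = select(
    employee,
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
)

# Commutativity: applying the two conditions in either order gives the same tuples.
in_dept_5 = lambda t: t["Dno"] == 5
high_paid = lambda t: t["Salary"] > 30000
order_a = select(select(employee, high_paid), in_dept_5)
order_b = select(select(employee, in_dept_5), high_paid)
```

Each condition is evaluated on one tuple at a time, and the result keeps all attributes of the input, so its degree is unchanged.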
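The PROJECT Operation
The general form of the PROJECT operation is:
π<attribute list>(R)
where π (pi) is the symbol used to denote the PROJECT operation and <attribute list> is the
desired list of attributes of relation R.
The result of the PROJECT operation has only the attributes specified in <attribute list>, in
the same order as they appear in the list. Hence, its degree is equal to the number of
attributes in <attribute list>.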
The PROJECT operation removes any duplicate tuples, so the result of the PROJECT
operation is a set of distinct tuples, and hence a valid relation. This is known as duplicate
elimination.
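Duplicate elimination can be seen in a small Python sketch of PROJECT. The function name project and the sample tuples are assumptions made for this example.

```python
def project(relation, attrs):
    """π_attrs(relation): keep only attrs, in the listed order, without duplicates."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:  # duplicate elimination
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

# Hypothetical EMPLOYEE state, invented for illustration.
employee = [
    {"Ssn": "1", "Sex": "M", "Salary": 30000},
    {"Ssn": "2", "Sex": "F", "Salary": 25000},
    {"Ssn": "3", "Sex": "F", "Salary": 25000},
]

sex_salary = project(employee, ["Sex", "Salary"])
```

The input has three tuples, but two of them agree on (Sex, Salary), so the projection contains only two tuples.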
DEP5_EMPS ← σDno=5(EMPLOYEE)
RESULT ← πFname, Lname, Salary(DEP5_EMPS)
It is sometimes simpler to break down a complex sequence of operations by specifying
intermediate result relations than to write a single relational algebra expression.
We can also use this technique to rename the attributes in the intermediate and result
relations.
To rename the attributes in a relation, we simply list the new attribute names in
parentheses, as in the following example:
TEMP ← σDno=5(EMPLOYEE)
R(First_name, Last_name, Salary) ← πFname, Lname, Salary(TEMP)
We can also define a formal RENAME operation, which can rename either the relation name
or the attribute names, or both, as a unary operator.
The general RENAME operation when applied to a relation R of degree n is denoted
by any of the following three forms: ρS(B1, B2, ... , Bn)(R) or ρS(R) or ρ(B1, B2, ... ,
Bn)(R) where the symbol ρ (rho) is used to denote the RENAME operator, S is the
new relation name, and B1, B2, … , Bn are the new attribute names.
The first expression renames both the relation and its attributes, the second renames the
relation only, and the third renames the attributes only.
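The attribute-renaming form ρ(B1, B2, ... , Bn)(R) can be sketched in Python as a positional renaming. The function name rename and the sample tuple are assumptions; the sketch relies on Python dictionaries preserving insertion order (Python 3.7 and later).

```python
def rename(relation, new_attrs):
    """ρ(B1, ..., Bn)(R): rename the attributes of each tuple by position."""
    result = []
    for t in relation:
        if len(t) != len(new_attrs):
            raise ValueError("number of new names must equal the degree of R")
        result.append(dict(zip(new_attrs, t.values())))
    return result

# Hypothetical intermediate relation, invented for illustration.
temp = [{"Fname": "Ann", "Lname": "Lee", "Salary": 30000}]

r = rename(temp, ["First_name", "Last_name", "Salary"])
```

Only the attribute names change; the values in every tuple stay the same.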
When the set operations UNION, INTERSECTION, and SET DIFFERENCE are adapted to relational
databases, the two relations on which any of these three operations are applied must have
the same type of tuples; this condition has been called union compatibility or type
compatibility.
Two relations R(A1, A2, … , An) and S(B1, B2, … , Bn) are said to be union compatible
(or type compatible) if they have the same degree n and if dom(Ai) = dom(Bi) for 1 ≤ i
≤ n. This means that the two relations have the same number of attributes and each
corresponding pair of attributes has the same domain.
DEP5_EMPS ← σDno=5(EMPLOYEE)
RESULT1 ← πSsn(DEP5_EMPS)
In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q with degree
n + m and attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
The resulting relation Q has one tuple for each combination of tuples—one from R and one
from S.
Hence, if R has nR tuples and S has nS tuples, then R × S will have nR * nS tuples.
For example, suppose that we want to retrieve a list of names of each female employee’s
dependents. We can do this as follows:
FEMALE_EMPS ← σSex=‘F’(EMPLOYEE)
EMPNAMES ← πFname, Lname, Ssn(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σSsn=Essn(EMP_DEPENDENTS)
RESULT ← πFname, Lname, Dependent_name(ACTUAL_DEPENDENTS)
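The CARTESIAN PRODUCT followed by SELECT in the example above can be traced with a short Python sketch. The function name cartesian_product and all tuple values are assumptions made for illustration, and the sketch assumes that the two relations have no attribute names in common.

```python
from itertools import product

def cartesian_product(r, s):
    """R × S: one combined tuple for every pair (assumes disjoint attribute names)."""
    return [{**t1, **t2} for t1, t2 in product(r, s)]

# Hypothetical data, invented for illustration.
emp_names = [
    {"Fname": "Ann", "Lname": "Lee", "Ssn": "1"},
    {"Fname": "Eve", "Lname": "Kim", "Ssn": "2"},
]
dependent = [
    {"Essn": "1", "Dependent_name": "Tom"},
    {"Essn": "1", "Dependent_name": "Sue"},
    {"Essn": "3", "Dependent_name": "Joy"},
]

emp_dependents = cartesian_product(emp_names, dependent)  # 2 * 3 combinations
actual_dependents = [t for t in emp_dependents if t["Ssn"] == t["Essn"]]
result = [(t["Fname"], t["Lname"], t["Dependent_name"]) for t in actual_dependents]
```

Most of the combinations produced by the product are meaningless on their own; the SELECT on Ssn = Essn keeps only the pairs in which the dependent belongs to the employee.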
The JOIN operation, denoted by ⋈, is very important for any relational database with more
than a single relation because it allows us to process relationships among relations.
To illustrate JOIN, suppose that we want to retrieve the name of the manager of each
department, as follows:
The general form of a JOIN operation on two relations R(A1, A2, … , An) and S(B1, B2, … , Bm)
is
R ⋈<join condition> S
The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, … , An, B1, B2, … ,
Bm) in that order; Q has one tuple for each combination of tuples—one from R and one
from S—whenever the combination satisfies the join condition.
The main difference between CARTESIAN PRODUCT and JOIN is that in JOIN, only
combinations of tuples satisfying the join condition appear in the result, whereas in the
CARTESIAN PRODUCT all combinations of tuples are included in the result.
The join condition is specified on attributes from the two relations R and S and is
evaluated for each combination of tuples. Each tuple combination for which the join
condition evaluates to TRUE is included in the resulting relation Q as a single combined
tuple.
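A JOIN can be sketched in Python as a nested loop that evaluates the join condition for every pair of tuples. The function name theta_join and the sample DEPARTMENT and EMPLOYEE tuples are assumptions made for this example, which again assumes disjoint attribute names.

```python
def theta_join(r, s, condition):
    """R ⋈_condition S: combine each pair of tuples that satisfies the condition."""
    return [{**t1, **t2} for t1 in r for t2 in s if condition(t1, t2)]

# Hypothetical data, invented for illustration.
department = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "2"},
    {"Dname": "Admin", "Dnumber": 4, "Mgr_ssn": "3"},
]
employee = [
    {"Ssn": "1", "Lname": "Lee"},
    {"Ssn": "2", "Lname": "Kim"},
    {"Ssn": "3", "Lname": "Roy"},
]

dept_mgr = theta_join(department, employee, lambda d, e: d["Mgr_ssn"] == e["Ssn"])
managers = [(t["Dname"], t["Lname"]) for t in dept_mgr]
```

The product of these two relations would have 2 * 3 = 6 tuples, while the join keeps only the 2 combinations in which Mgr_ssn equals Ssn.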
Variations of JOIN:
The EQUIJOIN and NATURAL JOIN
The most common use of JOIN involves join conditions with equality comparisons only. Such
a JOIN, where the only comparison operator used is =, is called an EQUIJOIN.
Next, create a relation that includes a tuple whenever the employee whose Ssn is Essn
works on the project whose number is Pno in the intermediate relation SSN_PNOS:
SSN_PNOS ← πEssn, Pno(WORKS_ON)
Finally, apply the DIVISION operation to the two relations, which gives the desired
employees’ Social Security numbers:
SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS
RESULT ← πFname, Lname(SSNS * EMPLOYEE)
In general, the DIVISION operation is applied to two relations R(Z) ÷ S(X), where the
attributes of S are a subset of the attributes of R; that is, X ⊆ Z. Let Y be the set of
attributes of R that are not attributes of S; that is, Y = Z – X.
The DIVISION operation is defined for convenience for dealing with queries that involve
universal quantification or the all condition.
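The "all" condition behind DIVISION can be sketched in Python: a Y-value appears in R(Z) ÷ S(X) only if it is paired in R with every X-value found in S. The function name divide and the sample tuples are assumptions made for illustration, and the sketch assumes that the attributes X carry the same names in R and S.

```python
def divide(r, s, x_attrs):
    """R(Z) ÷ S(X) with X ⊆ Z: return the Y-values (Y = Z - X) paired with ALL of S."""
    if not r:
        return []
    y_attrs = [a for a in r[0] if a not in x_attrs]
    required = {tuple(t[a] for a in x_attrs) for t in s}
    groups = {}  # Y-value -> set of X-values it appears with in R
    for t in r:
        y = tuple(t[a] for a in y_attrs)
        groups.setdefault(y, set()).add(tuple(t[a] for a in x_attrs))
    return [dict(zip(y_attrs, y)) for y, xs in groups.items() if required <= xs]

# Hypothetical data, invented for illustration.
ssn_pnos = [
    {"Essn": "1", "Pno": 1}, {"Essn": "1", "Pno": 2},
    {"Essn": "2", "Pno": 1},
    {"Essn": "3", "Pno": 1}, {"Essn": "3", "Pno": 2}, {"Essn": "3", "Pno": 3},
]
smith_pnos = [{"Pno": 1}, {"Pno": 2}]

ssns = divide(ssn_pnos, smith_pnos, ["Pno"])
```

Employees "1" and "3" work on both projects listed in smith_pnos, so they are in the result; employee "2" works on only one of them and is excluded.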
This section describes the notation typically used in relational systems to represent
queries internally.
The notation is called a query tree or sometimes it is known as a query evaluation tree or
query execution tree.
It includes the relational algebra operations being executed and is used as a possible data
structure for the internal representation of the query in an RDBMS.
A query tree is a tree data structure that corresponds to a relational algebra expression.
It represents the input relations of the query as leaf nodes of the tree, and represents the
relational algebra operations as internal nodes.
An execution of the query tree consists of executing an internal node operation
whenever its operands (represented by its child nodes) are available, and then replacing
that internal node by the relation that results from executing the operation.
The execution terminates when the root node is executed and produces the result relation
for the query.
πPnumber, Dnum, Lname, Address, Bdate(((σPlocation=‘Stafford’(PROJECT)) ⋈Dnum=Dnumber(DEPARTMENT)) ⋈Mgr_ssn=Ssn(EMPLOYEE))
In the query tree for the above query, the three leaf nodes P, D, and E represent the
three relations PROJECT, DEPARTMENT, and EMPLOYEE.
To execute the query, the node marked (1) in the figure must begin execution before node (2)
because some resulting tuples of operation (1) must be available before we can begin to
execute operation (2). Similarly, node (2) must begin to execute and produce results before
node (3) can start execution, and so on.
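The execution of a query tree can be sketched in Python as a recursive evaluation in which every internal node first executes its children and then applies its own operation. The class names Leaf and Op, the helper functions, and all tuple values are assumptions made for illustration; this is not how any particular RDBMS implements query trees.

```python
class Leaf:
    """Leaf node: an input relation of the query."""
    def __init__(self, relation):
        self.relation = relation

    def execute(self):
        return self.relation

class Op:
    """Internal node: a relational algebra operation applied to its child nodes."""
    def __init__(self, func, *children):
        self.func = func
        self.children = children

    def execute(self):
        # Operands must be available before the operation itself can run.
        operands = [child.execute() for child in self.children]
        return self.func(*operands)

def select(cond):
    return lambda r: [t for t in r if cond(t)]

def join(cond):
    return lambda r, s: [{**a, **b} for a in r for b in s if cond(a, b)]

def project(attrs):
    def run(r):
        seen, out = set(), []
        for t in r:
            row = tuple(t[a] for a in attrs)
            if row not in seen:
                seen.add(row)
                out.append(dict(zip(attrs, row)))
        return out
    return run

# Hypothetical data, invented for illustration.
P = Leaf([{"Pnumber": 10, "Plocation": "Stafford", "Dnum": 4},
          {"Pnumber": 1, "Plocation": "Bellaire", "Dnum": 5}])
D = Leaf([{"Dnumber": 4, "Mgr_ssn": "9"}, {"Dnumber": 5, "Mgr_ssn": "3"}])
E = Leaf([{"Ssn": "9", "Lname": "Lee", "Address": "12 Oak St", "Bdate": "1970-01-01"},
          {"Ssn": "3", "Lname": "Kim", "Address": "5 Elm St", "Bdate": "1980-05-05"}])

stafford = Op(select(lambda t: t["Plocation"] == "Stafford"), P)
first_join = Op(join(lambda p, d: p["Dnum"] == d["Dnumber"]), stafford, D)
second_join = Op(join(lambda pd, e: pd["Mgr_ssn"] == e["Ssn"]), first_join, E)
root = Op(project(["Pnumber", "Dnum", "Lname", "Address", "Bdate"]), second_join)

result = root.execute()
```

Calling execute on the root produces the result relation; the recursion guarantees that each join runs only after the relations it depends on have been produced.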
In general, a query tree gives a good visual representation and understanding of the query in
terms of the relational operations it uses and is recommended as an additional means for
expressing queries in relational algebra.
Generalized Projection
The generalized projection operation extends the projection operation by allowing
functions of attributes to be included in the projection list.
The generalized form can be expressed as: πF1, F2, ..., Fn (R)
where F1, F2, … , Fn are functions over the attributes in relation R and may involve arithmetic
operations and constant values.
This operation is helpful when developing reports where computed values have to be
produced in the columns of a query result.
Example: Consider the relation EMPLOYEE(Ssn, Salary, Deduction, Years_service). A report may
be required to show Net Salary = Salary – Deduction, Bonus = 2000 * Years_service, and
Tax = 0.25 * Salary. Then a generalized projection combined with renaming may be used as follows:
REPORT ← ρ(Ssn, Net_salary, Bonus, Tax)(πSsn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary(EMPLOYEE))
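The REPORT expression above can be mimicked in Python by mapping each output column name to a function of the input tuple. The function name generalized_project and the sample tuple are assumptions made for illustration.

```python
def generalized_project(relation, columns):
    """π_{F1, ..., Fn}(R): `columns` maps each output name to a function of a tuple."""
    result = []
    for t in relation:
        row = {name: f(t) for name, f in columns.items()}
        if row not in result:  # keep the result a set of distinct tuples
            result.append(row)
    return result

# Hypothetical EMPLOYEE state, invented for illustration.
employee = [{"Ssn": "1", "Salary": 40000, "Deduction": 5000, "Years_service": 3}]

report = generalized_project(employee, {
    "Ssn": lambda t: t["Ssn"],
    "Net_salary": lambda t: t["Salary"] - t["Deduction"],
    "Bonus": lambda t: 2000 * t["Years_service"],
    "Tax": lambda t: 0.25 * t["Salary"],
})
```

For this tuple the computed columns are Net_salary = 35000, Bonus = 6000, and Tax = 10000.0.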
To retrieve all employees supervised by Borg at level 2—that is, all employees e’’
supervised by some employee e’ who is directly supervised by Borg—we can apply another
JOIN to the result of the first query, as follows:
RESULT2(Ssn) ← πSsn1(SUPERVISION ⋈Ssn2=Ssn RESULT1)
To get both sets of employees supervised at levels 1 and 2 by ‘James Borg’, we can apply
the UNION operation to the two results, as follows:
RESULT ← RESULT2 ∪ RESULT1
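The two-level supervision query can be traced with a small Python sketch. SUPERVISION is modeled as a set of (Ssn1, Ssn2) pairs meaning "Ssn1 is directly supervised by Ssn2"; the function name supervised_by and the one-letter Ssn values are assumptions made for illustration.

```python
def supervised_by(supervision, supervisors):
    """π_Ssn1(SUPERVISION ⋈ Ssn2=Ssn supervisors), with relations modeled as sets."""
    return {ssn1 for (ssn1, ssn2) in supervision if ssn2 in supervisors}

# Hypothetical hierarchy: A supervises B and C, B supervises D, C supervises E, D supervises F.
supervision = {("B", "A"), ("C", "A"), ("D", "B"), ("E", "C"), ("F", "D")}
borg_ssn = {"A"}

result1 = supervised_by(supervision, borg_ssn)  # level 1
result2 = supervised_by(supervision, result1)   # level 2
result = result2 | result1                      # UNION of both levels
```

Employee F sits at level 3 and is not retrieved; each additional level needs one more JOIN, which is why relational algebra cannot express the closure over an unknown number of levels in a single expression.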
Tuples with NULL values in the join attributes are also eliminated. This type of join, where
tuples with no match are eliminated, is known as an inner join.
This amounts to the loss of information if the user wants the result of the JOIN to include all
the tuples in one or more of the component relations.
A set of operations, called outer joins, were developed for the case where the user wants to
keep all the tuples in R, or all those in S, or all those in both relations in the result of the JOIN,
regardless of whether or not they have matching tuples in the other relation.
This satisfies the need of queries in which tuples from two tables are to be combined by
matching corresponding rows, but without losing any tuples for lack of matching values.
For example, suppose that we want a list of all employee names as well as the name of the
departments they manage if they happen to manage a department; if they do not manage
one, we can indicate it with a NULL value. We can apply the LEFT OUTER JOIN operation,
denoted by ⟕, to retrieve this result.
The LEFT OUTER JOIN operation keeps every tuple in the first, or left, relation R in R ⟕ S; if no
matching tuple is found in S, then the attributes of S in the join result are filled with NULL
values.
A similar operation, RIGHT OUTER JOIN, denoted by ⟖, keeps every tuple in the
second, or right, relation S in the result of R ⟖ S.
A third operation, FULL OUTER JOIN, keeps all tuples in both the left and the right
relations when no matching tuples are found, filling them with NULL values as needed.
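The padding behavior of LEFT OUTER JOIN can be sketched in Python, with None standing in for NULL. The function name left_outer_join, its s_attrs parameter, and the sample tuples are assumptions made for illustration.

```python
def left_outer_join(r, s, condition, s_attrs):
    """Keep every tuple of r; pad the attributes of s with None when nothing matches."""
    result = []
    for t1 in r:
        matches = [t2 for t2 in s if condition(t1, t2)]
        if matches:
            result.extend({**t1, **t2} for t2 in matches)
        else:
            result.append({**t1, **{a: None for a in s_attrs}})
    return result

# Hypothetical data, invented for illustration.
employee = [{"Ssn": "1", "Lname": "Lee"}, {"Ssn": "2", "Lname": "Kim"}]
department = [{"Dname": "Research", "Mgr_ssn": "1"}]

temp = left_outer_join(
    employee, department,
    lambda e, d: e["Ssn"] == d["Mgr_ssn"],
    ["Dname", "Mgr_ssn"],
)
result = [(t["Lname"], t["Dname"]) for t in temp]
```

Kim manages no department, so an inner join would drop that tuple; here it is kept with a NULL department name. A RIGHT OUTER JOIN can be obtained by swapping the roles of the two relations.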
The OUTER UNION operation takes the UNION of tuples in two relations R(X, Y) and S(X, Z)
that are partially compatible, meaning that only some of their attributes, say X, are union
compatible.
The attributes that are union compatible are represented only once in the result, and
those attributes that are not union compatible from either relation are also kept in the
result relation T(X,Y, Z). It is therefore the same as a FULL OUTER JOIN on the common
attributes.
Two tuples t1 in R and t2 in S are said to match if t1[X] = t2[X]. These will be combined
(unioned) into a single tuple in T. Tuples in either relation that have no matching tuple in
the other relation are padded with NULL values.
For example, an OUTER UNION can be applied to two relations whose schemas are
STUDENT(Name, Ssn, Department, Advisor) and INSTRUCTOR(Name, Ssn, Department,
Rank).
Tuples from the two relations are matched based on having the same combination of
values of the shared attributes—Name, Ssn, Department.
All the tuples from both relations are included in the result, but tuples with the same
(Name, Ssn, Department) combination will appear only once in the result.
Tuples appearing only in STUDENT will have a NULL for the Rank attribute, whereas tuples
appearing only in INSTRUCTOR will have a NULL for the Advisor attribute. A tuple that
exists in both relations, which represents a student who is also an instructor, will have
values for all its attributes.
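The STUDENT and INSTRUCTOR example can be traced with a short Python sketch of OUTER UNION, with None standing in for NULL. The function name outer_union and all tuple values are assumptions made for illustration; the sketch also assumes that the shared attributes identify at most one tuple within each relation.

```python
def outer_union(r, s, shared, r_only, s_only):
    """Merge tuples that agree on `shared`; pad attributes that are missing with None."""
    all_attrs = shared + r_only + s_only
    merged = {}
    for t in r + s:
        key = tuple(t[a] for a in shared)
        row = merged.setdefault(key, {a: None for a in all_attrs})
        row.update(t)  # fill in the attributes this tuple actually has
    return list(merged.values())

# Hypothetical data, invented for illustration.
student = [
    {"Name": "Ann", "Ssn": "1", "Department": "CS", "Advisor": "Bob"},
    {"Name": "Cy", "Ssn": "2", "Department": "EE", "Advisor": "Dee"},
]
instructor = [
    {"Name": "Ann", "Ssn": "1", "Department": "CS", "Rank": "Lecturer"},
    {"Name": "Eve", "Ssn": "3", "Department": "CS", "Rank": "Professor"},
]

t = outer_union(student, instructor,
                ["Name", "Ssn", "Department"], ["Advisor"], ["Rank"])
```

Ann appears in both relations and ends up as one tuple with both Advisor and Rank; Cy has a NULL Rank and Eve has a NULL Advisor.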
Step 1: For each regular (strong) entity type E in the ER schema, create a relation R that includes
all the simple attributes of E.
Step 2: For each weak entity type W in the ER schema with owner entity type E, create a relation
R, and include all simple attributes (or simple components of composite attributes) of W as
attributes. In addition, include as foreign key attributes of R the primary key attribute(s) of the
relation(s) that correspond to the owner entity type(s).
Step 3: For each binary 1:1 relationship type R in the ER schema, identify the relations S and T
that correspond to the entity types participating in R. Choose one of the relations, say S, and include
the primary key of T as a foreign key in S. Include all the simple attributes of R as attributes of S.
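Step 4: For each regular binary 1:N relationship type R, identify the relation S that represents the
participating entity type at the N-side of the relationship type. Include the primary key of the
relation T, which represents the other participating entity type, as a foreign key of S. Simple
attributes of R map to attributes of S.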
Step 5: For each binary M:N relationship type R, create a relation S. Include the primary keys of
participant relations as foreign keys in S. Their combination will be the primary key for S. Simple
attributes of R become attributes of S.
Step 6: For each multi-valued attribute A, create a new relation R. This relation will include an
attribute corresponding to A, plus the primary key K of the parent relation (entity type or
relationship type) as a foreign key in R. The primary key of R is the combination of A and K.
Step 7: For each n-ary relationship type R, where n>2, create a new relation S to represent R.
Include the primary keys of the relations participating in R as foreign keys in S. Simple attributes of
R map to attributes of S. The primary key of S is a combination of all the foreign keys that reference
the participants that have cardinality constraint > 1. For a recursive relationship, we will need a new
relation.
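As one concrete case, Step 5 (mapping a binary M:N relationship type) can be sketched in Python as a function that builds the description of the new relation S. The function name map_m_n_relationship, the dictionary layout, the prefixed foreign key names, and the Hours attribute are all assumptions made for illustration.

```python
def map_m_n_relationship(name, participants, relationship_attrs):
    """Step 5 sketch: build relation S for an M:N relationship type.

    participants: list of (relation_name, [primary key attributes]).
    """
    foreign_keys = {}
    for rel_name, pk_attrs in participants:
        for attr in pk_attrs:
            # Hypothetical naming scheme: prefix the key with its source relation.
            foreign_keys[f"{rel_name}_{attr}"] = (rel_name, attr)
    return {
        "name": name,
        "attributes": list(foreign_keys) + list(relationship_attrs),
        "primary_key": list(foreign_keys),  # combination of all the foreign keys
        "foreign_keys": foreign_keys,
    }

works_on = map_m_n_relationship(
    "WORKS_ON",
    [("EMPLOYEE", ["Ssn"]), ("PROJECT", ["Pnumber"])],
    ["Hours"],
)
```

The resulting relation has the two foreign keys as its combined primary key, and the simple attribute of the relationship becomes an ordinary attribute of S.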