Lecture - 21 22 23
requirements and how to put them in ER diagrams, how to convert them into tables and their
columns, set their constraints etc. Once the database is ready, users will start using it.
But how will they access the database? Most of the time they access the data through
applications. These applications communicate with the database using SQL, and the DBMS is
responsible for executing those SQL requests while keeping the data intact. SQL has its own
querying methods to interact with the database. But how do these queries work inside the
database? They work in a way similar to relational algebra in mathematics. In a database,
the tables are the relations that participate in the relational algebra.
Relational Query Language
Relational databases use relational algebra to break down user requests and instruct the
DBMS to execute them. A relational query language is used by the user to communicate
with the database. Such languages generally operate at a higher level than general-purpose
programming languages.
Relational query languages are further divided into two types:
• Procedural Query Language
• Non-Procedural Language
Relational algebra has operators to indicate the operations. An operator can be applied
to a single relation (called unary) or to two relations (called binary). The result of
applying an operation to a relation is itself a new relation. Some operations involve
multiple steps, and the intermediate results of those steps are also relations.
Relational Operations
Select (σ) – This is a unary relational operation. This operation pulls the horizontal subset
(subset of rows) of the relation that satisfies the conditions. This can use operators like <, >,
<=, >=, = and != to filter the data from the relation. It can also use logical AND, OR and NOT
operators to combine the various filtering conditions. This operation can be represented as
below:
σ p (r)
Where σ is the symbol for select operation, r represents the relation/table, and p is the logical
formula or the filtering conditions to get the subset. Let us see an example as below:
σ dept_id = 20 AND salary >= 10000 (EMPLOYEE) – Selects the records from the EMPLOYEE table
with department ID 20 and a salary of at least 10000.
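The select operation can be sketched in a few lines of Python. This is only an illustration: the relation is modelled as a list of dictionaries, and the `emp_id` column and all row values are assumptions made up for the example, not data from the lecture.

```python
# Hypothetical EMPLOYEE relation; rows and the emp_id column are illustrative.
EMPLOYEE = [
    {"emp_id": 1, "dept_id": 20, "salary": 12000},
    {"emp_id": 2, "dept_id": 20, "salary": 8000},
    {"emp_id": 3, "dept_id": 10, "salary": 15000},
]


def select(relation, predicate):
    """Return the horizontal subset: every row for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


# sigma dept_id = 20 AND salary >= 10000 (EMPLOYEE)
result = select(EMPLOYEE, lambda r: r["dept_id"] == 20 and r["salary"] >= 10000)
print(result)
```

Every row that survives keeps all of its attributes. Only the number of rows changes.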
Project (∏)
Project (∏) – This is a unary operator. Like the select operation above, it creates a
subset of the relation, but instead of filtering rows by a condition it keeps only the
specified columns/attributes: a vertical subset of the relation. The select operation above,
by contrast, returns a subset of rows with all the attributes of the relation. It is denoted as below:
∏a1, a2, a3 (r)
Where ∏ is the operator for projection, r is the relation and a1, a2, a3 are the attributes of
the relations which will be shown in the resultant subset.
∏std_name, address, course (STUDENT) – This will select all the records from STUDENT table
but only selected columns – std_name, address and course. Suppose we have to select only
these 3 columns for particular student then we have to combine both project and select
operations.
∏STD_ID, address, course (σ STD_NAME = “James” (STUDENT)) – This selects the record for
‘James’ and displays only his STD_ID, address and course columns. Here two unary
operators are combined, so two operations are performed. First it selects the tuple for
‘James’ from the STUDENT table. The resulting subset of STUDENT is an intermediate
relation. It is temporary and exists only until the end of this operation. It then keeps
only the 3 columns from this temporary relation.
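The two-step evaluation described above can be sketched as follows. The STUDENT rows are invented for illustration, and the helper names `select` and `project` are assumptions of this sketch. Because a relation is a set, the sketch also drops duplicate rows after projecting.

```python
# Hypothetical STUDENT relation; the rows are illustrative only.
STUDENT = [
    {"STD_ID": 100, "STD_NAME": "James", "address": "Troy", "course": "DBMS"},
    {"STD_ID": 101, "STD_NAME": "Mary", "address": "Novi", "course": "DBMS"},
]


def select(relation, predicate):
    """Horizontal subset: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Vertical subset: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# Step 1: the temporary relation holding only the tuple for James.
temp = select(STUDENT, lambda r: r["STD_NAME"] == "James")
# Step 2: keep three columns of that temporary relation.
james = project(temp, ["STD_ID", "address", "course"])
print(james)
```

Projecting STUDENT onto `course` alone returns a single row in this sketch, because both students share the same course value.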
Union (U)
Union (U) – It is a binary operator, which combines the tuples of two relations. It is denoted
by
R U S
Where R and S are the relations and U is the operator.
DESIGN_EMPLOYEE U TESTING_EMPLOYEE
Where DESIGN_EMPLOYEE and TESTING_EMPLOYEE are two relations.
• Cartesian product combines the attributes of two relations into one relation
whereas Union combines the tuples of two relations into one relation.
• In Union, both relations should have the same number of columns. Suppose we
have to list the employees who are working for the design and testing departments.
Then we will do the union on the employee table. Since it is a union on the same
table, both sides have the same number of attributes. Cartesian product places no
restriction on the number of attributes or rows. It simply combines the attributes.
• In Union, both relations should have same types of attributes in same order. In
the above example, since union is on employee relation, it has same type of
attribute in the same order.
The two relations need not have the same number of tuples. If a tuple appears in both
relations, the result keeps only one copy. If a tuple is present in either relation, it is
kept in the new relation. In the above example, the number of employees in the design
department need not be the same as the number of employees in the testing department.
Union combines the table data column by column, in the order the columns appear in the tables.
We would not be able to take the union of these tables if the order of columns or the number
of columns were different.
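A minimal sketch of union, assuming each relation is stored as a pair of an attribute tuple and a set of row tuples. The employee rows are invented for illustration. The sketch checks only the number of attributes; checking attribute types as well is left out for brevity.

```python
def union(r, s):
    """Union of two relations given as (attributes, set_of_row_tuples) pairs."""
    r_attrs, r_rows = r
    s_attrs, s_rows = s
    # Union compatibility: this sketch checks the attribute count only.
    if len(r_attrs) != len(s_attrs):
        raise ValueError("union needs the same number of attributes")
    # Set union keeps a single copy of any tuple present in both relations.
    return (r_attrs, r_rows | s_rows)


# Hypothetical rows; Ben works in both departments.
DESIGN_EMPLOYEE = (("emp_id", "name"), {(1, "Asha"), (2, "Ben")})
TESTING_EMPLOYEE = (("emp_id", "name"), {(2, "Ben"), (3, "Chen")})

attrs, rows = union(DESIGN_EMPLOYEE, TESTING_EMPLOYEE)
print(attrs, sorted(rows))
```

The two inputs hold two tuples each, yet the result holds three, because the shared tuple is kept once.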
Set-difference (-)
Set-difference (-) – This is a binary operator. This operator creates a new relation with tuples
that are in one relation but not in the other relation. It is denoted by the ‘-’ symbol.
R - S
Where R and S are the relations. Suppose we want to retrieve the employees who are working
in the Design department but not in Testing.
DESIGN_EMPLOYEE - TESTING_EMPLOYEE
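With relations modelled as Python sets of tuples, set difference maps directly onto the `-` operator. The rows below are invented for illustration.

```python
# Hypothetical rows; Ben works in both departments.
DESIGN_EMPLOYEE = {(1, "Asha"), (2, "Ben")}
TESTING_EMPLOYEE = {(2, "Ben"), (3, "Chen")}

# Employees in Design but not in Testing.
only_design = DESIGN_EMPLOYEE - TESTING_EMPLOYEE
# Swapping the operands answers a different question.
only_testing = TESTING_EMPLOYEE - DESIGN_EMPLOYEE

print(only_design)
print(only_testing)
```

Unlike union, set difference is not commutative, so the order of the two relations matters.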
Rename (ρ)
Rename (ρ) – This is a unary operator used to rename the tables and columns of a relation.
When we perform a self-join, we have to differentiate between two copies of the same table.
In such a case the rename operator is applied to the table. When we join two or more tables
that have the same column names, it is better to rename the columns to differentiate them.
This occurs when we perform a Cartesian product operation.
ρ R(E)
Where ρ is the rename operator, E is the existing relation name, and R is the new relation’s
name.
ρ STD_ID, STD_NAME, STD_ADDRESS(STUDENT) – It renames the columns of STUDENT to STD_ID,
STD_NAME and STD_ADDRESS, in the order the columns appear in the table.
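Positional renaming of columns can be sketched as below. The original column names `id`, `name` and `addr` and the row values are assumptions made up for the example.

```python
def rename_attributes(relation, new_names):
    """Rename columns by position: the i-th new name replaces the i-th column."""
    renamed = []
    for row in relation:
        if len(row) != len(new_names):
            raise ValueError("one new name is needed per column")
        # dict preserves insertion order, so values line up with new_names.
        renamed.append(dict(zip(new_names, row.values())))
    return renamed


# Hypothetical STUDENT relation with its original column names.
STUDENT = [{"id": 100, "name": "James", "addr": "Troy"}]

renamed = rename_attributes(STUDENT, ["STD_ID", "STD_NAME", "STD_ADDRESS"])
print(renamed)
```

The data is untouched. Only the attribute names change, which is what makes the two sides of a self-join distinguishable.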
There are additional relational operations based on the above fundamental operations.
Some of them are:
Set Intersection
Set Intersection – This operation is a binary operation. It results in a relation with tuples that
are in both the relations. It is denoted by ‘∩’.
R∩S
Where R and S are the relations. It picks all the tuples that are present in both R and S, and
results it in a new relation.
Suppose we have to find the employees who are working in both the design and testing
departments. If we have tuples as in the above example, the resulting relation will not have
any tuples. Suppose we have tuples like below and see the new relation after set intersection.
This set intersection can also be written as a combination of set difference operations.
R ∩ S = R - (R - S)
That is, it first evaluates R - S to get the tuples which are present only in R, and then
removes those tuples from R, leaving the tuples of R that are also present in S.
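The identity between intersection and a double set difference can be checked on a small example. The sets R and S below are invented for illustration.

```python
# Hypothetical relations as sets of tuples.
R = {(1, "Asha"), (2, "Ben"), (4, "Dev")}
S = {(2, "Ben"), (3, "Chen"), (4, "Dev")}

only_in_r = R - S             # tuples present in R but not in S
via_difference = R - only_in_r  # remove them from R again
direct = R & S                # built-in set intersection

print(via_difference == direct)
```

This is why intersection is treated as a derived operation: it adds convenience but no new expressive power beyond set difference.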
Assignment
Assignment – As the name indicates, the assignment operator ‘←’ is used to assign the result
of a relational operation to a temporary relational variable. This is useful when there are
multiple steps in a relational operation and handling everything in one single expression is
difficult. Assigning the results to a temporary relation and using this temporary relation in
the next operation makes the task simple and easy.
T ← S – denotes that relation S is assigned to temporary relation T
A relational operation ∏a1, a2 (σ p (E)) with selection and projection can be divided as below.
T ← σ p (E)
S ← ∏a1, a2 (T)
Our example above in projection for getting STD_ID, ADDRESS and COURSE for the Student
‘James’ can be re-written as below.
T ← σ STD_NAME = “James” (STUDENT)
S ← ∏STD_ID, address, course (T)
Natural Join
Natural join – As we have seen above, cartesian product simply combines the attributes of
two relations into one. But the new relation contains every combination of tuples, most of
which are not meaningful. In order to get the correct tuples, we have to use a selection
operation on the cartesian product result. This set of operations (cartesian product followed
by selection) is combined into one operation called natural join. It is denoted by ⋈
R ⋈ S
Suppose we want to select the employees who are working for department 10. Then we will
perform the cartesian product on EMPLOYEES and DEPT and keep the tuples whose DEPT_ID matches
in both relations and equals 10. The same is done with natural join as
σ DEPT_ID = 10 (EMPLOYEES ⋈ DEPT)
From the above example, we see that only the matching data from both relations is
retained in the final relation. Suppose we want to retain all the information from the first
relation and the corresponding information from the second relation, whether or not it exists.
In such a case we use an outer join. Unlike cartesian product, an outer join creates a combined
tuple only where there is a match between the two tables, and where there is no match it adds
null to the attributes of the missing side. Let us see this in the types of outer join below.
Division
Division – This operation is used to answer queries containing the phrase ‘for all’. It is
denoted by ‘÷’. Suppose we want to see all the employees who work in all of the departments.
What are the steps involved to find this?
• First we find all the department IDs – T1 ← ∏DEPT_ID (DEPARTMENT)
• Next we list all the employees and their departments – T2 ← ∏EMP_ID,
DEPT_ID (EMPLOYEE)
In the third step we find the employees in T2 that are paired with every department ID in T1.
This is obtained by using the division operation – T2 ÷ T1
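The three steps above can be sketched as follows, with T1 and T2 written out directly as small sets. The IDs are invented for illustration, and `divide` is a hypothetical helper name, not an operator of any particular DBMS.

```python
def divide(r, s):
    """Return every x such that (x, y) is in r for all y in s.

    r is a set of (x, y) pairs and s is a set of y values.
    """
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s)}


# T1: all department IDs (hypothetical values).
T1 = {10, 20}
# T2: (EMP_ID, DEPT_ID) pairs (hypothetical values).
T2 = {(1, 10), (1, 20), (2, 10), (3, 20)}

# T2 divided by T1: employees who work in every department.
works_everywhere = divide(T2, T1)
print(works_everywhere)
```

Only employee 1 is paired with both department 10 and department 20, so only that employee appears in the result.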
Relational operators are broadly divided into two groups: basic operations and
derived/extended operations.
Join in DBMS is a binary operation which allows you to combine a Cartesian product and a
selection in one single statement. The goal of a join condition is to help you combine the
data from two or more DBMS tables.
Inner Join
• Inner Join is used to return rows from both tables which satisfy the given condition. It
is the most widely used join operation and can be considered the default join type.
• An equijoin is an inner join whose join-predicate uses only equality comparisons.
If you use other comparison operators like “>”, it can’t be called an equijoin.
• Inner Join is further divided into three subtypes:
Theta Join
Theta Join allows you to merge two tables based on the condition represented by theta. Theta
joins work for all comparison operators. It is denoted by symbol θ. The general case of JOIN
operation is called a Theta join.
Syntax - A ⋈θ B
Theta join can use any conditions in the selection criteria.
Table A
column 1   column 2
1          1
1          2
Table B
column 1   column 2
1          1
1          3
For example, taking θ to be A.column 2 > B.column 2, the theta join of A and B gives:
column 1   column 2
1          2
EQUI Join
EQUI Join is done when a Theta join uses only the equivalence condition. Joins of this kind
can be difficult to implement efficiently in an RDBMS, and they are a common source of
performance problems.
A ⋈ A.column 2 = B.column 2 (B)
column 1 column 2
1 1
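Theta join and equi join differ only in the comparison used, which a short sketch can make concrete. The rows are those of Table A and Table B above. The `>` condition for the theta case is an assumption of this sketch, chosen because it reproduces the result row shown for the theta join. Attribute names are prefixed with the table name here so that the two `column 2` attributes stay distinguishable.

```python
A = [{"column 1": 1, "column 2": 1}, {"column 1": 1, "column 2": 2}]
B = [{"column 1": 1, "column 2": 1}, {"column 1": 1, "column 2": 3}]


def theta_join(r, s, condition, r_name="A", s_name="B"):
    """Cartesian product of r and s, keeping pairs that satisfy condition(a, b)."""
    result = []
    for a in r:
        for b in s:
            if condition(a, b):
                row = {f"{r_name}.{k}": v for k, v in a.items()}
                row.update({f"{s_name}.{k}": v for k, v in b.items()})
                result.append(row)
    return result


# Theta join with a non-equality comparison (assumed condition).
greater = theta_join(A, B, lambda a, b: a["column 2"] > b["column 2"])
# Equi join: the same machinery with an equality comparison.
equal = theta_join(A, B, lambda a, b: a["column 2"] == b["column 2"])

print(greater)
print(equal)
```

Each join yields one combined row. Looking only at the columns that come from A gives (1, 2) for the `>` case and (1, 1) for the `=` case.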
C
Num   Square
2     4
3     9
D
Num   Cube
2     8
3     27
C ⋈ D
Num   Square   Cube
2     4        8
3     9        27
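A natural join of C and D can be sketched as below. The join matches rows on every attribute name the two relations share (here only `Num`) and keeps that shared attribute once. The Square and Cube values are computed as true squares and cubes rather than typed in, and `natural_join` is a hypothetical helper name.

```python
def natural_join(r, s):
    """Join rows of r and s that agree on all attribute names they share."""
    if not r or not s:
        return []
    common = [attr for attr in r[0] if attr in s[0]]
    result = []
    for a in r:
        for b in s:
            if all(a[c] == b[c] for c in common):
                # Merging the dicts keeps each shared attribute only once.
                result.append({**a, **b})
    return result


C = [{"Num": n, "Square": n ** 2} for n in (2, 3)]
D = [{"Num": n, "Cube": n ** 3} for n in (2, 3)]

joined = natural_join(C, D)
print(joined)
```

No join condition is written out by the caller. The shared attribute name alone decides which rows are paired.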
Outer Join
An Outer Join doesn’t require each record in the two joined tables to have a matching record.
In this type of join, the result retains each record even if no matching record exists.
The three types of outer join are left outer join, right outer join and full outer join.
A
Num   Square
2     4
3     9
4     16
B
Num   Cube
2     8
3     27
5     125
A ⟕ B (left outer join)
Num   Square   Cube
2     4        8
3     9        27
4     16       –
A ⟗ B (full outer join)
Num   Square   Cube
2     4        8
3     9        27
4     16       –
5     –        125
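The two outer joins tabulated above can be sketched as follows, using `None` for the missing values. The helper names and their parameters are assumptions of this sketch, and the Square and Cube values are computed as true squares and cubes.

```python
A = [{"Num": n, "Square": n ** 2} for n in (2, 3, 4)]
B = [{"Num": n, "Cube": n ** 3} for n in (2, 3, 5)]


def left_outer_join(r, s, key, s_attrs):
    """Keep every row of r; fill s_attrs with None when s has no match."""
    result = []
    for a in r:
        matches = [b for b in s if b[key] == a[key]]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            result.append({**a, **{attr: None for attr in s_attrs}})
    return result


def full_outer_join(r, s, key, r_attrs, s_attrs):
    """Left outer join plus the rows of s that matched nothing in r."""
    result = left_outer_join(r, s, key, s_attrs)
    r_keys = {a[key] for a in r}
    for b in s:
        if b[key] not in r_keys:
            result.append({**{attr: None for attr in r_attrs}, **b})
    return result


left = left_outer_join(A, B, "Num", ["Cube"])
full = full_outer_join(A, B, "Num", ["Square"], ["Cube"])
print(left)
print(full)
```

A right outer join would be the mirror image: call `left_outer_join` with the two relations swapped.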
Division operation
The division operator is used for queries which involve the word ‘all’.
R1 ÷ R2 = tuples of R1 associated with all tuples of R2.
Retrieve the name of the subject that is taught in all courses.
Name       Course
System     Btech
Database   Mtech
Database   Btech
Algebra    Btech
÷
Course
Btech
Mtech
=
Name
Database
The result contains exactly those values from the first relation R that appear in
combination with every tuple of the second relation S.
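Division is a derived operation, so it can also be built from projection, Cartesian product and set difference. The sketch below applies that construction to the subject and course data above. The variable names are illustrative.

```python
# R: (Name, Course) pairs from the table above.
R = {
    ("System", "Btech"),
    ("Database", "Mtech"),
    ("Database", "Btech"),
    ("Algebra", "Btech"),
}
# S: the courses to divide by.
S = {"Btech", "Mtech"}

# Project R onto Name.
names = {name for name, _ in R}
# Cartesian product: every pairing that would have to exist.
all_pairs = {(name, course) for name in names for course in S}
# Pairings that are required but absent from R.
missing = all_pairs - R
# Names with at least one missing pairing are disqualified.
disqualified = {name for name, _ in missing}
# What remains is R divided by S.
quotient = names - disqualified

print(quotient)
```

System and Algebra are each missing the Mtech pairing, so only Database survives.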
Retrieve names of employees who work on all the projects that John Smith works on.
Consider the Employee table given below:
John    123   P1
Smith   123   P2
A       121   P3
÷
Works on the following:
123   P1   Market
123   P2   Sales
=
The result is as follows:
Eno
123