Lecture - 21 22 23

Download as pdf or txt
Download as pdf or txt
You are on page 1of 11

So far, we have seen what a database is, what is the features of database, how to gather

requirements and how to put them in ER diagrams, how to convert them into tables and their
columns, set their constraints etc. Once we have database ready users will start using them.
But how will they access the database? Most of the time they access the data by using some
applications. These applications will communicate to database by SQL and DBMS is
responsible for managing the application and SQL intact. SQL has its own querying methods
to interact with database. But how these queries work in the database? These queries work
similar to relational algebra that we have in mathematics. In database we
have tables participating in relational algebra.
Relational Query Language
Relational algebra uses relational algebra to break the user requests and instruct the
DBMS to execute them. Relational Query language is used by the user to communicate
with the database. They are generally on a higher level than any other programming
language.
This is further divided into two types
• Procedural Query Language
• Non-Procedural Language

Procedural Query Language


A procedural query language will have set of queries instructing the DBMS to perform various
transactions in the sequence to meet the user request. The user instructs the system to
perform a set of operations on the database to determine the desired result s. For
example, get_CGPA procedure will have various queries to get the marks of student in each
subject, calculate the total marks, and then decide the CGPA based on his total marks. This
procedural query language tells the database what is required from the database and how to
get them from the database. Relational algebra is a procedural query language.

Non-Procedural Query Language


Non-procedural queries will have single query on one or more tables to get result from the
database. The user outlines the desired information without giving a specific procedure
for attaining the information. For example, get the name and address of the student with
particular ID will have single query on STUDENT table. Relational Calculus is a non-procedural
language which informs what to do with the tables, but doesn’t inform how to accomplish
this. Relational calculus is a non-procedural query language in which information is
retrieved from the database without specifying sequence of operation to be
performed.
Relational calculus is of two types which are as follows −
• Tuple calculus
• Domain calculus
These query languages basically will have queries on tables in the database. In the relational
database, a table is known as relation. Records / rows of the table are referred as tuples.
Columns of the table are also known as attributes. All these names are used interchangeably
in relational database.
Relational Algebra

Relational algebra is a procedural query language. It gives a step-by-step process to obtain


the result of the query. It uses operators to perform queries. This result is also considered as
a new table or relation. Suppose we have to retrieve student name, address and class for the
given ID. What a relational algebra will do in this case is, it filters the name, address and class
from the STUDENT table for the input ID. In mathematical terms, relational algebra has
produced a subset of STUDENT table for the given ID.

Relational algebra will have operators to indicate the operations. This algebra can be applied
on single relation – called unary or can be applied on two tables – called binary. While
applying the operations on the relation, the resulting subset of relation is also known as new
relation. There can be multiple steps involved in some of the operations. The subsets of
relations at the intermediary level are also known as relation.

Relational Operations

Select (σ) – This is a unary relational operation. This operation pulls the horizontal subset
(subset of rows) of the relation that satisfies the conditions. This can use operators like <, >,
<=, >=, = and != to filter the data from the relation. It can also use logical AND, OR and NOT
operators to combine the various filtering conditions. This operation can be represented as
below:
σ p (r)
Where σ is the symbol for select operation, r represents the relation/table, and p is the logical
formula or the filtering conditions to get the subset. Let us see an example as below:

σSTD_NAME = “James” (STUDENT)


What does above relation algebra do? It selects the record/tuple from the STUDENT table
with Student name as ‘James’

σdept_id = 20 AND salary>=10000 (EMPLOYEE) – Selects the records from EMPLOYEE table
with department ID = 20 and employees whose salary is more than 10000.

Project (∏)
Project (∏) – This is a unary operator and is similar to select operation above. It creates the
subset of relation based on the conditions specified. Here, it selects only selected
columns/attributes from the relation- vertical subset of relation. The select operation above
creates subset of relation but for all the attributes in the relation. It is denoted as below:
∏a1, a2, a3 (r)
Where ∏ is the operator for projection, r is the relation and a1, a2, a3 are the attributes of
the relations which will be shown in the resultant subset.

∏std_name, address, course (STUDENT) – This will select all the records from STUDENT table
but only selected columns – std_name, address and course. Suppose we have to select only
these 3 columns for particular student then we have to combine both project and select
operations.
∏STD_ID, address, course (σ STD_NAME = “James”(STUDENT)) – this selects the record for
‘James’ and displays only std_ID, address and his course columns. Here we can see two unary
operators are combined, and it has two operations performing. First it selects the tuple from
STUDENT table for ‘James’. The resultant subset of STUDENT is also considered as
intermediary relation. But it is temporary and exists till the end of this operation. It then filters
the 3 columns from this temporary relation.

Union (U)
Union (U) – It is a binary operator, which combines the tuples of two relations. It is denoted
by
RUS
Where R and S are the relations and U is the operator.

DESIGN_EMPLOYEE U TESTING_EMPLOYEE
Where DESIGN_EMPLOYEE and TESTING_EMPLOYEE are two relations.

∏ NAME (DESIGN_EMPLOYEE) ∪ ∏ NAME (TESTING_EMPLOYEE)

It is different from cartesian product in:

• Cartesian product combines the attributes of two relations into one relation
whereas Union combines the tuples of two relations into one relation.
• In Union, both relations should have same number of columns. Suppose we
have to list the employees who are working for design and testing department.
Then we will do the union on employee table. Since it is union on same table
it has same number of attributes. Cartesian product does not concentrate on
number of attribute or rows. It blindly combines the attributes.
• In Union, both relations should have same types of attributes in same order. In
the above example, since union is on employee relation, it has same type of
attribute in the same order.
It need not have same number of tuples in both the relation. If there is a duplicate tuple as a
result of union, then it keeps only one tuple. If a tuple is present in any one relation, then it
keeps that tuple in the new relation. In the above example, number of employees in design
department need not be same as employees in testing department. It combines the table
data in the order they appear in the table.

We would not able to join both these tables if the order of columns or the number of columns
were different.

Set-difference (-)
Set-difference (-) – This is a binary operator. This operator creates a new relation with tuples
that are in one relation but not in other relation. It is denoted by ‘-‘symbol.
R–S
Where R and S are the relations. Suppose we want to retrieve the employees who are working
in Design department but not in testing.

DESIGN_EMPLOYEE −TESTING_EMPLOYEE

Cartesian product (X)


Cartesian product (X): – This is a binary operator. It combines the tuples of two relations into
one relation.
RXS
Where R and S are two relations and X is the operator. If relation R has m tuples and relation
S has n tuples, then the resultant relation will have mn tuples. For example, if we perform
cartesian product on EMPLOYEE (5 tuples) and DEPT relations (3 tuples), then we will have
new tuple with 15 tuples.
EMPLOYEE X DEPT
This operator will simply create a pair between the tuples of each table. i.e.; each employee
in the EMPLOYEE table will be mapped with each department in DEPT table.

Rename (ρ)
Rename (ρ) – This is a unary operator used to rename the tables and columns of a relation.
When we perform self-join operation, we have to differentiate two same tables. In such case
rename operator on tables comes into picture. When we join two or more tables and if those
tables have same column names, then it is always better to rename the columns to
differentiate them. This occurs when we perform Cartesian product operation.
ρ R(E)
Where ρ is the rename operator, E is the existing relation name, and R is the new relation’s
name.

ρ STUDENT (STD_TABLE) – Renames STD_TABLE table to STUDENT


Let us see another example to rename the columns of the table. If the STUDENT table has ID,
NAME and ADDRESS columns and if they have to be renamed to STD_ID, STD_NAME,
STD_ADDRESS, then we have to write as follows.

ρ STD_ID, STD_NAME, STD_ADDRESS(STUDENT) – It will rename the columns in the order the
names appear in the table

There are additional relational operations based on the above fundamental operations.
Some of them are:
Set Intersection
Set Intersection – This operation is a binary operation. It results in a relation with tuples that
are in both the relations. It is denoted by ‘∩ ‘.
R∩S
Where R and S are the relations. It picks all the tuples that are present in both R and S, and
results it in a new relation.

Suppose we have to find the employees who are working in both design and testing
department. If we have tuples as in above example, the new result relation will not have any
tuples. Suppose we have tuples like below and see the new relation after set difference. This
set intersection can also be written as a combination of set difference operations.

R ∩ S R-(R-S)
i.e.; it evaluates R-S to get the tuples which are present only in R and then it gets the record
which are present only in R but not in new resultant relation of R-S.

In above example of employees,


DESIGN_EMPLOYEE – (DESIGN_EMPLOYEE – TESTING_EMPLOYEE)
It first filters only those employees who are only design employees – (104, Kathy). This result
is then used to find the difference with design employee. This will find those employees who
are design employees but not in new result – (100, James). Thus it gives the result tuple which
is both designer and tester. We can see here fundamental relational operator is used twice to
get set intersection. Hence this operation is not fundamental operation.

Assignment
Assignment – As the name indicates, the assignment operator ‘ ’ is used to assign the result
of a relational operation to temporary relational variable. This is useful when there is multiple
steps in relational operation and handling everything in one single expression is difficult.
Assigning the results into temporary relation and using this temporary relation in next
operation makes task simple and easy.
T S – denotes relation S is assigned to temporary relation T
A relational operation ∏a1, a2 (σ p (E)) with selection and projection can be divided as below.
T σ p (E)
S ∏a1, a2 (T)
Our example above in projection for getting STD_ID, ADDRESS and COURSE for the Student
‘James’ can be re-written as below.

∏STD_ID, address, course (σ STD_NAME = “James”(STUDENT))



T σ STD_NAME = “James”(STUDENT)
S ∏STD_ID, address, course (T)

Natural Join
Natural join – As we have seen above, cartesian product simply combines the attributes of
two relations into one. But the new relation will not have correct tuples. It has only
combinations of tuples. In order to get the correct tuples, we have to use selection operation
on the cartesian product result. This set of operations – cartesian product followed by
selection – is combined into one relation called natural join. It is denoted by ∞
R∞S
Suppose we want to select the employees who are working for department 10. Then we will
perform the cartesian product on the EMPLOYEES and DEPT and find the DEPT_ID in both
relations matching to 10. The same is done with natural join as

σ EMPLOYEE.DEPT_ID = DEPT>DEPT_ID AND EMPLOYEE.DEPT_ID = 10(EMPLOYEE X


DEPT)
Same can be written using natural join as EMPLOYEE ∞ DEPT

From the above example, we see that only the matching data from both the relations are
retained in the final relation. Suppose we want to retain all the information from first relation
and the corresponding information from the second relation irrespective of if it exists or not.
In such case we use outer join. This join makes sure all the combinations of tuples are shown
in correct way. Unlike cartesian product, this join make sure that to create a tuple from both
the table if there exists right match for them, and if there is no match null is added to those
attribute. Let see them in below types of outer join.

There are three types of outer joins


Left Outer Join
Left outer join – In this operation, all the tuples in the left hand side relation is retained. All
matching attribute in the right hand relation is displayed with values and the ones which do
not have value are shown as NULL.
Below example of left outer join on DEPT and EMPLOYEE table combines the matching
combination of DEPT_ID = 10 with values. But DEPT_ID = 30 does not have any employees
yet. Hence it displays NULL for those employees. Thus this outer join makes more meaningful
to combining two relations than a cartesian product.

Right outer join


Right outer join – This is opposite of left outer join. Here all the attributes of right-hand side
is retained and it matching attribute in left hand relation is found and displayed. If no
matching is found then null is displayed. Same above example is re-written to understand this
as below:
Notice the order and column difference in both the cases.

Full Outer Join


Full outer join – This is the combination of both left and right outer join. It displays all the
attributes from both the relation. If the matching attribute exists in other relation, then that
will be displayed, else those attributes are shown as null.
Hope above diagram is self-explanatory.

Division
Division – This operation is used to find the tuples with phrase ‘for all’. It is denoted by ‘÷’.
Suppose we want to see all the employees who work in all of departments. What are the steps
involved to find this?
• First we find all the department ID – T1 ∏DEPT_ID (DEPARTMENT)
• Next step is list all the employees and their departments – T2 ∏ EMP_ID,
DEPT_ID (EMPLOYEE)
In third step we will find the employees in T2 with the entire department ID in T1. This is
obtained by using division operation – T2 ÷ T1
Relational Operators are broadly divided into two group as basic operation and
derived/extended operation.
Join in DBMS is a binary operation which allows you to combine join product and selection in
one single statement. The goal of creating a join condition is that it helps you to combine the
data from two or more DBMS tables.

There are mainly two types of joins in DBMS:

1. Inner Joins: Theta, Natural, EQUI


2. Outer Join: Left, Right, Full

Inner Join
• Inner Join is used to return rows from both tables which satisfy the given condition. It
is the most widely used join operation and can be considered as a default join-type
• An Inner join or equijoin is a comparator-based join which uses equality comparisons
in the join-predicate. However, if you use other comparison operators like “>” it can’t
be called equijoin.
• Inner Join further divided into three subtypes:

Theta Join
Theta Join allows you to merge two tables based on the condition represented by theta. Theta
joins work for all comparison operators. It is denoted by symbol θ. The general case of JOIN
operation is called a Theta join.
Syntax - A ⋈θ B
Theta join can use any conditions in the selection criteria.

Table A Table B
column 1 column 2 column 1 column 2
1 1 1 1
1 2 1 3

A ⋈ A.column 2 > B.column 2 (B)

column 1 column 2
1 2

EQUI Join
EQUI Join is done when a Theta join uses only the equivalence condition. EQUI join is the most
difficult operation to implement efficiently in an RDBMS, and one reason why RDBMS have
essential performance problems.
A ⋈ A.column 2 = B.column 2 (B)
column 1 column 2
1 1

Natural Join (⋈)


Natural Join does not utilize any of the comparison operators. In this type of join, the
attributes should have the same name and domain. In Natural Join, there should be at least
one common attribute between two relations.
It performs selection forming equality on those attributes which appear in both relations and
eliminates the duplicate attributes.

C
Num Square
2 4
3 9

D
Num Cube
2 8
3 18

C⋈D
Num Square Cube
2 4 8
3 9 18

Outer Join
An Outer Join doesn’t require each record in the two join tables to have a matching record.
In this type of join, the table retains each record even if no other matching record exists.
Three types of Outer Joins are:

• Left Outer Join


• Right Outer Join
• Full Outer Join

Left Outer Join (A B)


Left Outer Join returns all the rows from the table on the left even if no matching rows have
been found in the table on the right. When no matching record is found in the table on the
right, NULL is returned.

A
Num Square
2 4
3 9
4 16
B
Num Cube
2 8
3 18
5 75

A B
Num Square Cube
2 4 8
3 9 18
4 16 –

Right Outer Join ( A B)


Right Outer Join returns all the columns from the table on the right even if no matching rows
have been found in the table on the left. Where no matches have been found in the table on
the left, NULL is returned. RIGHT outer JOIN is the opposite of LEFT JOIN
In our example, let’s assume that you need to get the names of members and movies rented
by them. Now we have a new member who has not rented any movie yet.
A B
Num Cube Square
2 8 4
3 18 9
5 75 –

Full Outer Join ( A B)


In a Full Outer Join , all tuples from both relations are included in the result, irrespective of
the matching condition.

A B
Num Square Cube
2 4 8
3 9 18
4 16 –
5 – 75

Division operation
The division operator is used for queries which involve the ‘all’.
R1 ÷ R2 = tuples of R1 associated with all tuples of R2.
Retrieve the name of the subject that is taught in all courses.

Name Course

System Btech

Database Mtech

Database Btech

Algebra Btech

÷
Course

Btech

Mtech
=

Name

database

The resulting operation must have all combinations of tuples of relation S that are present in
the first relation or R.
Retrieve names of employees who work on all the projects that John Smith works on.
Consider the Employee table given below −

Name Eno Pno

John 123 P1

Smith 123 P2

A 121 P3
÷
Works on the following −

Eno Pno Pname

123 P1 Market

123 P2 Sales
=
The result is as follows

Eno

123

The expression is as follows


Smith <- ΠPno(σEname = ‘john smith’ (employee * works on Pno=Eno))

You might also like