Relational Algebra


Relational Algebra - Example

Consider the following SQL to find which departments have had employees on the 'Further Accounting'
course.
SELECT DISTINCT dname
FROM department, course, empcourse, employee
WHERE cname = 'Further Accounting'
AND course.courseno = empcourse.courseno
AND empcourse.empno = employee.empno
AND employee.depno = department.depno;
The equivalent relational algebra is
PROJECT_{dname} (department JOIN_{depno = depno} (
PROJECT_{depno} (employee JOIN_{empno = empno} (
PROJECT_{empno} (empcourse JOIN_{courseno = courseno} (
PROJECT_{courseno} (SELECT_{cname = 'Further Accounting'} course)
))
))
))
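The nesting of the expression can be traced by evaluating it from the innermost operator outwards. The following is only a sketch: the helper names select, project and join and all of the rows are invented for illustration, and relations are modelled as lists of dictionaries.

```python
# Toy data; every row here is invented for illustration.
course = [{"courseno": 1, "cname": "Basic Accounting"},
          {"courseno": 2, "cname": "Further Accounting"}]
empcourse = [{"empno": 10, "courseno": 2}, {"empno": 11, "courseno": 1}]
employee = [{"empno": 10, "depno": 3}, {"empno": 11, "depno": 4}]
department = [{"depno": 3, "dname": "Finance"}, {"depno": 4, "dname": "Sales"}]


def select(rel, pred):
    """SELECT: keep the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(rel, attrs):
    """PROJECT: keep the named attributes and drop duplicate rows."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def join(left, right, attr):
    """JOIN on one attribute that has the same name on both sides."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


# Innermost expression first, exactly as the brackets nest.
step1 = project(select(course, lambda r: r["cname"] == "Further Accounting"), ["courseno"])
step2 = project(join(empcourse, step1, "courseno"), ["empno"])
step3 = project(join(employee, step2, "empno"), ["depno"])
result = project(join(department, step3, "depno"), ["dname"])
print(result)
```

Each intermediate PROJECT keeps only the attribute needed for the next JOIN, which is why the intermediate relations stay narrow.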
Symbolic Notation
From the example, one can see that for complicated cases a large amount of the answer is formed from
operator names, such as PROJECT and JOIN. It is therefore commonplace to use symbolic notation to
represent the operators.
• SELECT -> σ (sigma)
• PROJECT -> π (pi)
• PRODUCT -> × (times)
• JOIN -> |×| (bow-tie)
• UNION -> ∪ (cup)
• INTERSECTION -> ∩ (cap)
• DIFFERENCE -> - (minus)
• RENAME -> ρ (rho)
Usage
The symbolic operators are used in the same way as the verbal ones. So, to find all employees in department 1:
SELECT_{depno = 1}(employee)
becomes σ_{depno = 1}(employee)
Conditions can be combined using ^ (AND) and v (OR). For example, all employees in department 1
called 'Smith':
SELECT_{depno = 1 ^ surname = 'Smith'}(employee)
becomes σ_{depno = 1 ^ surname = 'Smith'}(employee)
The symbolic notation lends itself to brevity. Even better, when the JOIN is a natural join, the
JOIN condition may be omitted from |×|. The earlier example resulted in:
PROJECT_{dname} (department JOIN_{depno = depno} (
PROJECT_{depno} (employee JOIN_{empno = empno} (
PROJECT_{empno} (empcourse JOIN_{courseno = courseno} (
PROJECT_{courseno} (SELECT_{cname = 'Further Accounting'} course)))))))
becomes
π_{dname} (department |×| (
π_{depno} (employee |×| (
π_{empno} (empcourse |×| (
π_{courseno} (σ_{cname = 'Further Accounting'} course) ))))))

Rename Operator
The rename operator returns an existing relation under a new name. ρ_{A}(B) is the relation B with its name
changed to A. For example, find the employees in the same department as employee 3.
π_{emp2.surname, emp2.forenames} (
σ_{employee.empno = 3 ^ employee.depno = emp2.depno} (
employee × (ρ_{emp2}(employee))
)
)
Derivable Operators
• Fundamental operators: σ, π, ×, ∪, -, ρ
• Derivable operators: |×|, ∩

A ∩ B ⇔ A - (A - B)
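The derivation of intersection from difference can be checked on a small example. This sketch assumes two union-compatible relations modelled as Python sets of tuples; the rows are invented.

```python
# Two union-compatible relations (same attributes: name, empno).
A = {("Harry", 3415), ("Sally", 2241), ("George", 3401)}
B = {("Sally", 2241), ("George", 3401), ("Tim", 1123)}

# A - B removes everything A shares with B; removing that remainder
# from A leaves exactly the shared tuples.
derived = A - (A - B)

assert derived == A & B
print(sorted(derived))
```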

Equivalence

A |×|_{c} B ⇔ π_{a1,a2,...,aN}(σ_{c}(A × B))
• where c is the join condition (e.g. A.a1 = B.a1),
• and a1,a2,...,aN are all the attributes of A and B without repetition.
c is called the join-condition, and is usually the comparison of primary and foreign key. Where there are N
tables, there are usually N-1 join-conditions. In the case of a natural join, the conditions can be missed out, but
otherwise missing out conditions results in a cartesian product (a common mistake to make).
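The effect of a missing join-condition is easy to see by counting rows. The sketch below uses invented employee and department rows, modelled as tuples, and builds the join as a selection and projection over the product.

```python
from itertools import product

employee = [(1, "Smith", 10), (2, "Jones", 20), (3, "Brown", 10)]  # (empno, surname, depno)
department = [(10, "Accounts"), (20, "Sales")]                     # (depno, dname)

# A x B with no condition: every employee paired with every department.
cartesian = [e + d for e, d in product(employee, department)]

# Select on employee.depno = department.depno, then project away
# the repeated depno column.
joined = [(empno, surname, depno, dname)
          for empno, surname, depno, d_depno, dname in cartesian
          if depno == d_depno]

print(len(cartesian), len(joined))
```

With two tables there is one join-condition; leaving it out returns all six pairings rather than the three meaningful ones.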
Equivalences
The same relational algebraic expression can be written in many different ways. The order in which tuples
appear in relations is never significant.
• A × B ⇔ B × A
• A ∩ B ⇔ B ∩ A
• A ∪ B ⇔ B ∪ A
• (A - B) is not the same as (B - A)
• σ_{c1}(σ_{c2}(A)) ⇔ σ_{c2}(σ_{c1}(A)) ⇔ σ_{c1 ^ c2}(A)
• π_{a1}(A) ⇔ π_{a1}(π_{a1,etc}(A))
where etc represents any other attributes of A.
• many other equivalences exist.
While equivalent expressions always give the same result, some may be much easier to evaluate than others.
When any query is submitted to the DBMS, its query optimiser tries to find the most efficient equivalent
expression before evaluating it.
Comparing RA and SQL
• Relational algebra:
• is closed (the result of every expression is a relation)
• has a rigorous foundation
• has simple semantics
• is used for reasoning, query optimisation, etc.
• SQL:
• is a superset of relational algebra
• has convenient formatting features, etc.
• provides aggregate functions
• has complicated semantics
• is an end-user language.
Comparing RA and SQL
Any relational language as powerful as relational algebra is called relationally complete.

A relationally complete language can perform all basic, meaningful operations on relations.
Since SQL is a superset of relational algebra, it is also relationally complete.

Relational algebra
Relational algebra, an offshoot of first-order logic (and of the algebra of sets), deals with a set of finitary relations
which is closed under certain operators. These operators operate on one or more relations to
yield a relation. Relational algebra is a part of computer science.

Introduction
Relational algebras received little attention until the publication of E.F. Codd's relational model of data in 1970. Codd
proposed such an algebra as a basis for database query languages.

Relational algebra is essentially equivalent in expressive power to relational calculus (and thus first-order logic); this
result is known as Codd's theorem. Some care, however, has to be taken to avoid a mismatch that may arise between
the two languages, since negation, applied to a formula of the calculus, constructs a formula that may be true on an
infinite set of possible tuples, while the difference operator of relational algebra always returns a finite result. To
overcome these difficulties, Codd restricted the operands of relational algebra to finite relations only and also proposed
restricted support for negation (NOT) and disjunction (OR). Analogous restrictions are found in many other logic-based
computer languages. Codd defined the term relational completeness to refer to a language that is complete with
respect to first-order predicate calculus apart from the restrictions he proposed. In practice the restrictions have no
adverse effect on the applicability of his relational algebra for database purposes.

Primitive operations
As in any algebra, some operators are primitive and the others, being definable in terms of the primitive ones, are
derived. It is useful if the choice of primitive operators parallels the usual choice of primitive logical operators. Although it
is well known that the usual choice in logic of AND, OR and NOT is somewhat arbitrary, Codd made a similar arbitrary
choice for his algebra.

The six primitive operators of Codd's algebra are the selection, the projection, the Cartesian product (also called
the cross product or cross join), the set union, the set difference, and the rename. (Actually, Codd omitted the rename,
but the compelling case for its inclusion was shown by the inventors of ISBL.) These six operators are fundamental in the
sense that none of them can be omitted without losing expressive power. Many other operators have been defined in
terms of these six. Among the most important are set intersection, division, and the natural join. In fact ISBL made a
compelling case for replacing the Cartesian product with the natural join, of which the Cartesian product is a degenerate
case.

Altogether, the operators of relational algebra have identical expressive power to that of domain relational
calculus or tuple relational calculus. However, for the reasons given in the Introduction above, relational algebra has
strictly less expressive power than that of first-order predicate calculus without function symbols. Relational algebra
actually corresponds to a subset of first-order logic that is Horn clauses without recursion and negation.

Set operators
Although three of the six basic operators are taken from set theory, their relational algebra counterparts carry
additional constraints. For set union and set difference, the two relations involved must be union-compatible, that is,
they must have the same set of attributes. As set intersection can be defined in terms of
set difference, the two relations involved in set intersection must also be union-compatible.

The Cartesian product is defined differently from the one defined in set theory in the sense that tuples are considered to
be 'shallow' for the purposes of the operation. That is, unlike in set theory, where the Cartesian product of an n-tuple by
an m-tuple is a set of 2-tuples, the Cartesian product in relational algebra has the 2-tuple "flattened" into an n+m-tuple.
More formally, R × S is defined as follows:

R × S = {(r1, r2, ..., rn, s1, s2, ..., sm) | (r1, r2, ..., rn) ∈ R, (s1, s2, ..., sm) ∈ S}

As with the set-theoretic Cartesian product, the cardinality of the result is the product of the cardinalities of its
factors, i.e., |R × S| = |R| × |S|. In addition, for the Cartesian product to be defined, the two relations involved must
have disjoint headers, that is, they must not have a common attribute name.
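The difference between the set-theoretic product and the flattened relational product can be seen side by side. This is a sketch with invented values, modelling relations as sets of Python tuples.

```python
from itertools import product

R = {(1, "a"), (2, "b")}        # two 2-tuples (n = 2)
S = {("x",), ("y",), ("z",)}    # three 1-tuples (m = 1)

# Set theory: each element is a pair (2-tuple) of whole tuples.
nested = set(product(R, S))

# Relational algebra: each pair is flattened into one (n + m)-tuple.
flat = {r + s for r, s in product(R, S)}

print(len(nested), len(flat))
print(sorted(flat)[0])
```

Both results have |R| × |S| elements; only the shape of each element differs.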

Projection (π)

A projection is a unary operation written as π_{a1,...,an}(R) where a1,...,an is a set of attribute names. The result of
such a projection is defined as the set that is obtained when all tuples in R are restricted to the set {a1,...,an}.

Selection (σ)

A generalized selection is a unary operation written as σ_{φ}(R) where φ is a propositional formula that consists
of atoms as allowed in the normal selection and the logical operators ∧ (and), ∨ (or) and ¬ (negation). This
selection selects all those tuples in R for which φ holds.

Rename (ρ)

A rename is a unary operation written as ρ_{a / b}(R) where the result is identical to R except that the b field in all
tuples is renamed to an a field. This is simply used to rename the attribute of a relation or the relation itself.

Joins and join-like operators

Natural join (⋈)
Natural join (⋈) is a binary operator that is written as (R ⋈ S) where R and S are relations.[1] The result of the
natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For
an example consider the tables Employee and Dept and their natural join:

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Sales

Dept
DeptName    Manager
Finance     George
Sales       Harriet
Production  Charles

Employee ⋈ Dept
Name     EmpId  DeptName  Manager
Harry    3415   Finance   George
Sally    2241   Sales     Harriet
George   3401   Finance   George
Harriet  2202   Sales     Harriet

This can also be used to define composition of relations. In category theory, the join is precisely the fiber product.

The natural join is arguably one of the most important operators since it is the relational counterpart of logical AND. Note
carefully that if the same variable appears in each of two predicates that are connected by AND, then that variable
stands for the same thing and both appearances must always be substituted by the same value. In particular, natural join
allows the combination of relations that are associated by a foreign key. In the example above, a foreign
key probably holds from Employee.DeptName to Dept.DeptName, and the natural join
of Employee and Dept then combines all employees with their departments. This works because the foreign key
holds between attributes with the same name. If this is not the case, as in a foreign key
from Dept.manager to Emp.emp-number, then these columns have to be renamed before the natural join is taken. Such a
join is sometimes also referred to as an equijoin (see θ-join).

More formally the semantics of the natural join is defined as follows:

R ⋈ S = { t ∪ s | t ∈ R ∧ s ∈ S ∧ Fun(t ∪ s) }

where Fun is a predicate that is true for a relation r if and only if r is a function. It is usually required that R and S must
have at least one common attribute, but if this constraint is omitted then the natural join becomes exactly the Cartesian
product.
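A natural join can be sketched directly from this description: two tuples combine when they agree on every attribute name they share. The function name natural_join is an assumption of this sketch, and relations are modelled as lists of dictionaries holding the Employee and Dept rows of the example.

```python
employee = [
    {"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"},
]
dept = [
    {"DeptName": "Finance", "Manager": "George"},
    {"DeptName": "Sales", "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
]


def natural_join(r, s):
    """Combine every pair of tuples that agree on all shared attribute names."""
    joined = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            # With no shared names, all() is True for every pair,
            # so the join degenerates to the Cartesian product.
            if all(t[a] == u[a] for a in shared):
                joined.append({**t, **u})
    return joined


result = natural_join(employee, dept)
for row in result:
    print(row)
```

The Production department has no employees, so it does not appear in the result.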

The natural join can be simulated with Codd's primitives as follows. Assume that b1,...,bm are the attribute names
common to R and S, a1,...,an are the attribute names unique to R and c1,...,ck are the attribute names unique to S.
Furthermore assume that the attribute names d1,...,dm are neither in R nor in S. In a first step we can now rename the
common attribute names in S:

T := ρ_{d1 / b1, ..., dm / bm}(S) = ρ_{d1 / b1}(ρ_{d2 / b2}(... ρ_{dm / bm}(S) ...))

Then we take the Cartesian product and select the tuples that are to be joined:

P := σ_{b1 = d1, ..., bm = dm}(R × T) = σ_{b1 = d1}(σ_{b2 = d2}(... σ_{bm = dm}(R × T) ...))

Finally we take a projection to get rid of the renamed attributes:

U := π_{a1,...,an,b1,...,bm,c1,...,ck}(P)

θ-join and equijoin
Consider tables Car and Boat which list models of cars and boats and their respective prices. Suppose a customer
wants to buy a car and a boat, but she doesn't want to spend more money for the boat than for the car. The θ-join on
the condition CarPrice ≥ BoatPrice produces a table with all the possible options. When the condition is equality on an
attribute that has the same name in both relations, such as Price, it may be written as Price = Price or abbreviated to (Price).

Car
CarModel  CarPrice
CarA      20'000
CarB      30'000
CarC      50'000

Boat
BoatModel  BoatPrice
Boat1      10'000
Boat2      40'000
Boat3      60'000

Car ⋈_{CarPrice ≥ BoatPrice} Boat
CarModel  CarPrice  BoatModel  BoatPrice
CarA      20'000    Boat1      10'000
CarB      30'000    Boat1      10'000
CarC      50'000    Boat1      10'000
CarC      50'000    Boat2      40'000

If we want to combine tuples from two relations where the combination condition is not simply the equality of shared
attributes then it is convenient to have a more general form of join operator, which is the θ-join (or theta-join). The θ-join
is a binary operator that is written as R ⋈_{a θ b} S or R ⋈_{a θ v} S where a and b are attribute names, θ is a binary
relation in the set {<, ≤, =, >, ≥}, v is a value constant, and R and S are relations. The result of this operation consists of
all combinations of tuples in R and S that satisfy the relation θ. The result of the θ-join is defined only if the headers
of S and R are disjoint, that is, do not contain a common attribute.

The simulation of this operation in the fundamental operations is therefore as follows, where φ stands for the join
condition (a θ b or a θ v):

R ⋈_{φ} S = σ_{φ}(R × S)

In case the operator θ is the equality operator (=) then this join is also called an equijoin.

Note, however, that a computer language that supports the natural join and rename operators does not need θ-join as
well, as this can be achieved by selection from the result of a natural join (which degenerates to Cartesian product when
there are no shared attributes).
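The Car and Boat example can be reproduced by a selection over the Cartesian product. In this sketch the helper name theta_join is an assumption, prices are written as plain integers, and relations are lists of tuples.

```python
from itertools import product

car = [("CarA", 20000), ("CarB", 30000), ("CarC", 50000)]        # (CarModel, CarPrice)
boat = [("Boat1", 10000), ("Boat2", 40000), ("Boat3", 60000)]    # (BoatModel, BoatPrice)


def theta_join(r, s, condition):
    """Select, from the Cartesian product of r and s, the pairs meeting the condition."""
    return [t + u for t, u in product(r, s) if condition(t, u)]


# CarPrice >= BoatPrice
options = theta_join(car, boat, lambda c, b: c[1] >= b[1])
for row in options:
    print(row)
```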

Semijoin (⋉)(⋊)
The semijoin is a join similar to the natural join and written as R ⋉ S where R and S are relations.[2] The result of the
semijoin is only the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names. For
an example consider the tables Employee and Dept and their semijoin:

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Production

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ⋉ Dept
Name     EmpId  DeptName
Sally    2241   Sales
Harriet  2202   Production

More formally the semantics of the semijoin is defined as follows:

R ⋉ S = { t : t ∈ R, s ∈ S, Fun(t ∪ s) }

where Fun(r) is as in the definition of natural join.

The semijoin can be simulated using the natural join as follows. If a1, ..., an are the attribute names of R, then

R ⋉ S = π_{a1,...,an}(R ⋈ S).

Since we can simulate the natural join with the basic operators it follows that this also holds for the semijoin.

Antijoin (▷)
The antijoin, written as R ▷ S where R and S are relations, is similar to the natural join, but the result of an antijoin is
only those tuples in R for which there is no tuple in S that is equal on their common attribute names.[3]

For an example consider the tables Employee and Dept and their antijoin:

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Production

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ▷ Dept
Name     EmpId  DeptName
Harry    3415   Finance
George   3401   Finance

The antijoin is formally defined as follows:

R ▷ S = { t : t ∈ R ∧ ¬∃s ∈ S : Fun(t ∪ s) }

or

R ▷ S = { t : t ∈ R, there is no tuple s of S that satisfies Fun(t ∪ s) }

where Fun(r) is as in the definition of natural join.

The antijoin can also be defined as the complement of the semijoin, as follows:

R ▷ S = R - R ⋉ S

Given this, the antijoin is sometimes called the anti-semijoin, and the antijoin operator is sometimes written as the semijoin
symbol with a bar above it, instead of ▷.
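The relationship between the two operators can be checked on the Employee and Dept rows of the examples. This sketch models relations as sets of tuples and hard-codes the position of the shared DeptName attribute, which is a simplification made only for brevity.

```python
employee = {                                   # (Name, EmpId, DeptName)
    ("Harry", 3415, "Finance"),
    ("Sally", 2241, "Sales"),
    ("George", 3401, "Finance"),
    ("Harriet", 2202, "Production"),
}
dept = {("Sales", "Harriet"), ("Production", "Charles")}   # (DeptName, Manager)

dept_names = {d[0] for d in dept}

# Semijoin: employees with at least one matching department.
semijoin = {e for e in employee if e[2] in dept_names}

# Antijoin as the complement of the semijoin within employee.
antijoin = employee - semijoin

print(sorted(semijoin))
print(sorted(antijoin))
```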

Division (÷)
The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the
attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their
combinations with tuples in S are present in R. For an example see the tables Completed, DBProject and their division:

Completed
Student  Task
Fred     Database1
Fred     Database2
Fred     Compiler1
Eugene   Database1
Eugene   Compiler1
Sara     Database1
Sara     Database2

DBProject
Task
Database1
Database2

Completed ÷ DBProject
Student
Fred
Sara

If DBProject contains all the tasks of the Database project then the result of the division above contains exactly all the
students that have completed both of the tasks in the Database project.

More formally the semantics of the division is defined as follows:

R ÷ S = { t[a1,...,an] : t ∈ R ∧ ∀s ∈ S ( (t[a1,...,an] ∪ s) ∈ R ) }

where {a1,...,an} is the set of attribute names unique to R and t[a1,...,an] is the restriction of t to this set. It is usually
required that the attribute names in the header of S are a subset of those of R because otherwise the result of the
operation will always be empty.

The simulation of the division with the basic operations is as follows. We assume that a1,...,an are the attribute names
unique to R and b1,...,bm are the attribute names of S. In the first step we project R on its unique attribute names and
construct all combinations with tuples in S:

T := π_{a1,...,an}(R) × S

In the prior example, T would represent a table such that every Student (because Student is the only attribute unique to
the Completed table) is combined with every given Task. So Eugene, for instance, would have two rows, Eugene ->
Database1 and Eugene -> Database2 in T.

In the next step we subtract R from this relation:

U := T - R

Note that in U we have the possible combinations that "could have" been in R, but weren't. So if we now take the
projection on the attribute names unique to R then we have the restrictions of the tuples in R for which not all
combinations with tuples in S were present in R:

V := π_{a1,...,an}(U)

So what remains to be done is take the projection of R on its unique attribute names and subtract those in V:

W := π_{a1,...,an}(R) - V
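The four steps T, U, V and W can be followed on the Completed and DBProject tables. In this sketch relations are Python sets, and DBProject is held as a set of bare task names rather than 1-tuples, a simplification made for readability.

```python
completed = {                                   # R: (Student, Task)
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"),
    ("Sara", "Database1"), ("Sara", "Database2"),
}
db_project = {"Database1", "Database2"}         # S: (Task)

# Projection of R on its unique attribute, Student.
students = {student for student, _ in completed}

# T: every student combined with every task in S.
T = {(student, task) for student in students for task in db_project}

# U: combinations that could have been in R but are not.
U = T - completed

# V: students missing at least one task.
V = {student for student, _ in U}

# W: students who completed every task in S.
W = students - V
print(sorted(W))
```

Here U contains only the pair (Eugene, Database2), so Eugene is the one student removed in the last step.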

Outer joins
Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands,
an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the
operands by "fill" values for each of the attributes of the other operand.

The operators defined in this section assume the existence of a null value, ω, which we do not define, to be used for the
fill values. It should not be assumed that this is the NULL defined for the database language SQL, nor should it be
assumed that ω is a mark rather than a value, nor should it be assumed that the controversial three-valued logic is
introduced by it.

Three outer join operators are defined: left outer join, right outer join, and full outer join. (The word "outer" is sometimes
omitted.)
Left outer join (⟕)

The left outer join is written as R ⟕ S where R and S are relations.[4] The result of the left outer join is the set of all
combinations of tuples in R and S that are equal on their common attribute names, in addition (loosely speaking) to
tuples in R that have no matching tuples in S.

For an example consider the tables Employee and Dept and their left outer join:

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Sales
Tim      1123   Executive

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ⟕ Dept
Name     EmpId  DeptName   Manager
Harry    3415   Finance    ω
Sally    2241   Sales      Harriet
George   3401   Finance    ω
Harriet  2202   Sales      Harriet
Tim      1123   Executive  ω

In the resulting relation, tuples in R that have no matching tuple in S on the common attribute names are padded with
the null value ω in the attributes unique to S.

Since there are no tuples in Dept with a DeptName of Finance or Executive, ωs occur in the Manager attribute of the
resulting relation for the tuples whose DeptName is Finance or Executive.
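The padding of unmatched tuples can be sketched as follows. The constant OMEGA is a plain string standing in for the fill value ω (deliberately not Python's None, since the text does not equate ω with SQL's NULL), and the function name left_outer_join and the fixed column positions are assumptions of this sketch.

```python
OMEGA = "ω"   # stand-in for the fill value; an assumption of this sketch

employee = [                                   # (Name, EmpId, DeptName)
    ("Harry", 3415, "Finance"),
    ("Sally", 2241, "Sales"),
    ("George", 3401, "Finance"),
    ("Harriet", 2202, "Sales"),
    ("Tim", 1123, "Executive"),
]
dept = [("Sales", "Harriet"), ("Production", "Charles")]   # (DeptName, Manager)


def left_outer_join(r, s):
    """Join on DeptName, keeping unmatched tuples of r padded with OMEGA."""
    out = []
    for name, emp_id, dept_name in r:
        managers = [manager for d, manager in s if d == dept_name]
        if managers:
            out.extend((name, emp_id, dept_name, m) for m in managers)
        else:
            out.append((name, emp_id, dept_name, OMEGA))
    return out


result = left_outer_join(employee, dept)
for row in result:
    print(row)
```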

Let r1, r2, ..., rn be the attributes of the relation R and let {(ω, ..., ω)} be the singleton relation on the attributes that
are unique to the relation S (those that are not attributes of R). Then the left outer join can be described in terms of the
natural join (and hence using basic operators) as follows:

(R ⋈ S) ∪ ((R - π_{r1, r2, ..., rn}(R ⋈ S)) × {(ω, ..., ω)})

Right outer join (⟖)

The right outer join behaves almost identically to the left outer join, but the roles of the tables are switched.

The right outer join of relations R and S is written as R ⟖ S.[5] The result of the right outer join is the set of all
combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that
have no matching tuples in R.

For example consider the tables Employee and Dept and their right outer join:

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Sales
Tim      1123   Executive

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ⟖ Dept
Name     EmpId  DeptName    Manager
Sally    2241   Sales       Harriet
Harriet  2202   Sales       Harriet
ω        ω      Production  Charles

In the resulting relation, tuples in S that have no matching tuple in R on the common attribute names are padded with
the null value ω in the attributes unique to R.

Since there are no tuples in Employee with a DeptName of Production, ωs occur in the Name and EmpId attributes of the
resulting relation for the tuple whose DeptName is Production.

Let s1, s2, ..., sn be the attributes of the relation S and let {(ω, ..., ω)} be the singleton relation on the attributes that
are unique to the relation R (those that are not attributes of S). Then, as with the left outer join, the right outer join
can be simulated using the natural join as follows:

(R ⋈ S) ∪ ({(ω, ..., ω)} × (S - π_{s1, s2, ..., sn}(R ⋈ S)))

Full outer join (⟗)

The outer join or full outer join in effect combines the results of the left and right outer joins.

The full outer join is written as R ⟗ S where R and S are relations.[6] The result of the full outer join is the set of all
combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no
matching tuples in R and tuples in R that have no matching tuples in S in their common attribute names.

For an example consider the tables Employee and Dept and their full outer join:

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Sales
Tim      1123   Executive

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ⟗ Dept
Name     EmpId  DeptName    Manager
Harry    3415   Finance     ω
Sally    2241   Sales       Harriet
George   3401   Finance     ω
Harriet  2202   Sales       Harriet
Tim      1123   Executive   ω
ω        ω      Production  Charles

In the resulting relation, tuples in R that have no matching tuple in S are padded with the null value ω in the attributes
unique to S, and tuples in S that have no matching tuple in R are padded with ω in the attributes unique to R.

The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as
follows:

R ⟗ S = (R ⟕ S) ∪ (R ⟖ S)
Operations for domain computations
The aggregation operation
There are five aggregate functions that are included with most databases. These operations are Sum, Count, Average,
Maximum and Minimum. In relational algebra the aggregation operation over a schema (A1, A2, ... An) is written as
follows:

G1, G2, ..., Gm g f1(A1'), f2(A2'), ..., fk(Ak') (r)

where each Aj', 1 ≤ j ≤ k, is one of the original attributes Ai, 1 ≤ i ≤ n.

The attributes preceding the g are grouping attributes, which function like a "group by" clause in SQL. Then there are an
arbitrary number of aggregation functions applied to individual attributes. The operation is applied to an arbitrary
relation r. The grouping attributes are optional, and if they are not supplied, the aggregation functions are applied across
the entire relation to which the operation is applied.

Let's assume that we have a table named Account with three columns, namely Account_Number, Branch_Name and
Balance. We wish to find the maximum balance of each branch. This is accomplished by Branch_Name g Max(Balance) (Account).
To find the highest balance of all accounts regardless of branch, we could simply write g Max(Balance) (Account).
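The two Account queries can be sketched with a small grouping helper. The function name aggregate, its index-based parameters and all account rows are invented for illustration; relations are lists of tuples.

```python
account = [                                    # (Account_Number, Branch_Name, Balance)
    ("A-101", "Downtown", 500),
    ("A-102", "Downtown", 900),
    ("A-201", "Uptown", 700),
]


def aggregate(rel, group_idx, func, value_idx):
    """Group rows by the columns in group_idx and apply func to one column."""
    groups = {}
    for row in rel:
        key = tuple(row[i] for i in group_idx)
        groups.setdefault(key, []).append(row[value_idx])
    return {key: func(values) for key, values in groups.items()}


# Maximum balance per branch (grouping attribute: Branch_Name).
per_branch = aggregate(account, [1], max, 2)

# No grouping attributes: one group covering the whole relation.
overall = aggregate(account, [], max, 2)

print(per_branch)
print(overall)
```

With an empty list of grouping attributes every row falls into the single group keyed by the empty tuple, which mirrors the remark that the grouping attributes are optional.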

The extend operation


The extend operation is used for computed attributes. For example, if you have an attribute that contains the price of
some item and another attribute that contains the quantity of that item, you can form a computed attribute: EXTEND
(price * quantity) AS Total_Price.

Limitation of relational algebra


Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural
operators on relations which cannot be expressed by relational algebra. The transitive closure of a binary relation is one
of them. Given a domain D, let binary relation R be a subset of D×D. The transitive closure R+ of R is the smallest subset
of D×D containing R which satisfies the following condition:

∀x ∀y ∀z ((x,y) ∈ R+ ∧ (y,z) ∈ R+ ⇒ (x,z) ∈ R+)

It can be proven that there is no relational algebra expression E(R) taking R as a variable argument which
produces R+. The proof is based on the fact that, given a relational expression E for which it is claimed that E(R)
= R+, where R is a variable, we can always find an instance r of R (and a corresponding domain d) such that E(r)
≠ r+.[7]
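Outside the algebra, the transitive closure is usually computed by repeating a join until nothing new appears. The sketch below is one such fixpoint loop, with an invented relation r; the number of iterations depends on the data, which gives some intuition for why no single fixed expression suffices.

```python
def transitive_closure(r):
    """Repeatedly join the relation with itself until no new pairs are found."""
    closure = set(r)
    while True:
        new_pairs = {(x, z)
                     for (x, y) in closure
                     for (y2, z) in closure
                     if y == y2}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


r = {(1, 2), (2, 3), (3, 4)}
r_plus = transitive_closure(r)
print(sorted(r_plus))
```

A longer chain in r would need more passes through the loop, whereas any given relational algebra expression contains a fixed number of joins.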

Use of algebraic properties for query optimization


Queries can be represented as a tree, where

• the internal nodes are operators,
• leaves are relations,
• subtrees are subexpressions.

Our primary goal is to transform expression trees into equivalent expression trees, where the average size of the
relations yielded by subexpressions in the tree is smaller than it was before the optimization. Our secondary goal is
to try to form common subexpressions within a single query or, if more than one query is being evaluated at the
same time, across all of those queries. The rationale behind that second goal is that it is enough to compute common
subexpressions once, and the results can be used in all queries that contain that subexpression.

Here we present a set of rules that can be used in such transformations.


Selection

Rules about selection operators play the most important role in query optimization. Selection is an operator that very
effectively decreases the number of rows in its operand, so if we manage to move the selections in an expression tree
towards the leaves, the internal relations (yielded by subexpressions) will likely shrink.
Basic selection properties

Selection is idempotent (multiple applications of the same selection have no additional effect beyond the first one),
and commutative (the order in which selections are applied has no effect on the eventual result).

1. σ_{A}(R) = σ_{A}(σ_{A}(R))
2. σ_{A}(σ_{B}(R)) = σ_{B}(σ_{A}(R))

Breaking up selections with complex conditions

A selection whose condition is a conjunction of simpler conditions is equivalent to a sequence of selections with
those same individual conditions, and a selection whose condition is a disjunction is equivalent to a union of
selections. These identities can be used to merge selections so that fewer selections need to be evaluated, or to
split them so that the component selections may be moved or optimized separately.

1. σ_{A ∧ B}(R) = σ_{A}(σ_{B}(R)) = σ_{B}(σ_{A}(R))
2. σ_{A ∨ B}(R) = σ_{A}(R) ∪ σ_{B}(R)

Selection and cross product

Cross product is the costliest operator to evaluate. If the input relations have N and M rows, the result will
contain NM rows. Therefore it is very important to do our best to decrease the size of both operands before applying the
cross product operator.

This can be effectively done, if the cross product is followed by a selection operator, e.g. σA(R × P). Considering the
definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try to
push down a selection from higher levels of the expression tree using the other selection rules.

In the above case we break up condition A into conditions B, C and D using the split rules about complex selection
conditions, so that A = B ∧ C ∧ D and B only contains attributes from R, C contains attributes only
from P and D contains the part of A that contains attributes from both R and P. Note that B, C or D are possibly empty.
Then the following holds:

σ_{A}(R × P) = σ_{B ∧ C ∧ D}(R × P) = σ_{D}(σ_{B}(R) × σ_{C}(P))
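The rule that σ over R × P with condition B ∧ C ∧ D equals σ with D applied to the product of σ_B(R) and σ_C(P) can be checked on a small case. The relations and the three predicates below are invented; the sketch also records how many pairs each plan has to build.

```python
from itertools import product

R = [(1, "x"), (2, "y"), (3, "z")]     # (r_id, r_tag)
P = [(1, 10), (2, 20), (4, 40)]        # (p_id, p_val)


def cond_b(r):          # B: mentions attributes of R only
    return r[0] >= 2


def cond_c(p):          # C: mentions attributes of P only
    return p[1] <= 20


def cond_d(r, p):       # D: mentions attributes of both R and P
    return r[0] == p[0]


# Unoptimized plan: build the full product, then filter on B and C and D.
full_product = list(product(R, P))
naive = [r + p for r, p in full_product if cond_b(r) and cond_c(p) and cond_d(r, p)]

# Optimized plan: filter each operand first, then apply D to the smaller product.
small_product = list(product(filter(cond_b, R), filter(cond_c, P)))
pushed = [r + p for r, p in small_product if cond_d(r, p)]

print(len(full_product), len(small_product), naive == pushed)
```

Both plans return the same tuples, but the second builds four pairs instead of nine before the final selection.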

Selection and set operators

Selection is distributive over the set difference, intersection, and union operators. The following three rules are used to
push selection below set operations in the expression tree. Note that for the set difference and the intersection operators
it is possible to apply the selection operator to only one of the operands after the transformation. This can make sense in
cases where one of the operands is small, and the overhead of evaluating the selection operator outweighs the benefits
of using a smaller relation as an operand.

1. σ_{A}(R - P) = σ_{A}(R) - σ_{A}(P) = σ_{A}(R) - P
2. σ_{A}(R ∪ P) = σ_{A}(R) ∪ σ_{A}(P)
3. σ_{A}(R ∩ P) = σ_{A}(R) ∩ σ_{A}(P) = σ_{A}(R) ∩ P = R ∩ σ_{A}(P)

Selection and projection

Selection commutes with projection if and only if the fields referenced in the selection condition are a subset of the
fields in the projection. Performing selection before projection may be useful if the operand is a cross product or join. In
other cases, if the selection condition is relatively expensive to compute, moving selection outside the projection may
reduce the number of tuples which must be tested (since projection may produce fewer tuples due to the elimination of
duplicates resulting from elided fields).

π_{a1,...,an}(σ_{A}(R)) = σ_{A}(π_{a1,...,an}(R)), where the fields in A ⊆ {a1,...,an}

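Projection
Basic projection properties

Projection is idempotent, so that a series of (valid) projections is equivalent to the outermost projection.

π_{a1,...,an}(π_{b1,...,bm}(R)) = π_{a1,...,an}(R), where {a1,...,an} ⊆ {b1,...,bm}

Projection and set operators

Projection is distributive over set union.

π_{a1,...,an}(R ∪ P) = π_{a1,...,an}(R) ∪ π_{a1,...,an}(P)

Projection does not distribute over intersection and set difference. Counterexamples are given by:

π_{A}({⟨A=a, B=b⟩} ∩ {⟨A=a, B=b'⟩}) = ∅
π_{A}({⟨A=a, B=b⟩}) ∩ π_{A}({⟨A=a, B=b'⟩}) = {⟨A=a⟩}

and

π_{A}({⟨A=a, B=b⟩} - {⟨A=a, B=b'⟩}) = {⟨A=a⟩}
π_{A}({⟨A=a, B=b⟩}) - π_{A}({⟨A=a, B=b'⟩}) = ∅

where b is assumed to be distinct from b'.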
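The two counterexamples can be replayed with one-tuple relations over attributes (A, B). In this sketch the strings "b" and "b'" play the roles of the distinct values b and b', and the helper name project_a is an assumption.

```python
def project_a(rel):
    """Projection on the first attribute, A, of a relation over (A, B)."""
    return {(a,) for a, _ in rel}


R = {("a", "b")}
P = {("a", "b'")}      # same A value, different B value

# Intersection: projecting after intersecting differs from intersecting projections.
assert project_a(R & P) == set()
assert (project_a(R) & project_a(P)) == {("a",)}

# Difference: the mismatch runs the other way round.
assert project_a(R - P) == {("a",)}
assert (project_a(R) - project_a(P)) == set()

print("projection does not distribute over intersection or difference")
```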

Rename
Basic rename properties

Successive renames of a variable can be collapsed into a single rename. Rename operations which have no variables in
common can be arbitrarily reordered with respect to one another, which can be exploited to make successive renames
adjacent so that they can be collapsed.

1. ρ_{a / b}(ρ_{b / c}(R)) = ρ_{a / c}(R)
2. ρ_{a / b}(ρ_{c / d}(R)) = ρ_{c / d}(ρ_{a / b}(R))

Rename and set operators

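Rename is distributive over set difference, union, and intersection.

1. ρ_{a / b}(R - P) = ρ_{a / b}(R) - ρ_{a / b}(P)
2. ρ_{a / b}(R ∪ P) = ρ_{a / b}(R) ∪ ρ_{a / b}(P)
3. ρ_{a / b}(R ∩ P) = ρ_{a / b}(R) ∩ ρ_{a / b}(P)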