Query Processing
Practice Exercises
12.1 Assume (for simplicity in this exercise) that only one tuple fits in a block
and memory holds at most 3 blocks. Show the runs created on each pass
of the sort-merge algorithm, when applied to sort the following tuples on
the first attribute: (kangaroo, 17), (wallaby, 21), (emu, 1), (wombat, 13),
(platypus, 3), (lion, 8), (warthog, 4), (zebra, 11), (meerkat, 6), (hyena, 9),
(hornbill, 2), (baboon, 12).
Answer: We will refer to the tuples (kangaroo, 17) through (baboon, 12)
using tuple numbers t1 through t12. We refer to the jth run used by the ith
pass as rij. The initial sorted runs have three blocks each. They are:
r11 = {t3, t1, t2}
r12 = {t6, t5, t4}
r13 = {t9, t7, t8}
r14 = {t12, t11, t10}
Each pass merges three runs. Therefore the runs after the end of the first
pass are:
r21 = {t3, t1, t6, t9, t5, t2, t7, t4, t8}
r22 = {t12, t11, t10}
At the end of the second pass, the tuples are completely sorted into one
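The passes described in this answer can be reproduced with a short simulation. The following is a minimal Python sketch, not part of the original solution: the function names (initial_runs, merge_pass, sort_merge_passes, labels) are hypothetical, runs are held in memory as plain lists rather than disk blocks, and the merge fan-in of three is taken from the statement above that each pass merges three runs.

```python
import heapq

# The twelve tuples from the exercise, in their original order (t1 .. t12).
tuples = [("kangaroo", 17), ("wallaby", 21), ("emu", 1), ("wombat", 13),
          ("platypus", 3), ("lion", 8), ("warthog", 4), ("zebra", 11),
          ("meerkat", 6), ("hyena", 9), ("hornbill", 2), ("baboon", 12)]


def initial_runs(data, run_size):
    """Cut the input into chunks of run_size tuples and sort each chunk."""
    return [sorted(data[i:i + run_size]) for i in range(0, len(data), run_size)]


def merge_pass(runs, fan_in):
    """Merge groups of fan_in consecutive runs into longer sorted runs."""
    return [list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)]


def sort_merge_passes(data, run_size, fan_in):
    """Return the list of runs present at the start of each pass."""
    passes = [initial_runs(data, run_size)]
    while len(passes[-1]) > 1:
        passes.append(merge_pass(passes[-1], fan_in))
    return passes


def labels(run):
    """Translate tuples back to the t1 .. t12 names used in the answer."""
    return ["t%d" % (tuples.index(t) + 1) for t in run]


# Assumption: run size 3 and fan-in 3, as stated in the answer text.
passes = sort_merge_passes(tuples, run_size=3, fan_in=3)
for i, runs in enumerate(passes, start=1):
    for j, run in enumerate(runs, start=1):
        print("r%d%d = {%s}" % (i, j, ", ".join(labels(run))))
```

Changing fan_in shows how the number of passes depends on how many runs are merged at a time.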
12.2 Consider the bank database of Figure 12.13, where the primary keys are
underlined, and the following SQL query:
This expression performs the theta join on the smallest amount of data
possible. It does this by restricting the right-hand operand of the join
to only those branches in Brooklyn, and by eliminating the unneeded
attributes from both operands.
12.3 Let relations r1(A, B, C) and r2(C, D, E) have the following properties: r1
has 20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 fit on one block, and
30 tuples of r2 fit on one block. Estimate the number of block transfers and
seeks required, using each of the following join strategies for r1 ⋈ r2:
a. Nested-loop join.
b. Block nested-loop join.
c. Merge join.
d. Hash join.
Answer: r1 needs 800 blocks, and r2 needs 1500 blocks. Let us assume M pages
of memory. If M > 800, the join can easily be done in 1500 + 800 disk
accesses, using even plain nested-loop join. So we consider only the case
where M ≤ 800 pages.
a. Nested-loop join:
Using r1 as the outer relation, we need 20000 × 1500 + 800 =
30,000,800 disk accesses. If r2 is the outer relation, we need 45000 ×
800 + 1500 = 36,001,500 disk accesses.
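The nested-loop figures in part (a) can be checked with a few lines of arithmetic. This is a minimal Python sketch, assuming the worst case in which the inner relation is scanned in full once per outer tuple; the helper names blocks and nested_loop_transfers are hypothetical and chosen only for this illustration.

```python
import math


def blocks(num_tuples, tuples_per_block):
    """Number of blocks needed to store a relation."""
    return math.ceil(num_tuples / tuples_per_block)


def nested_loop_transfers(n_outer, b_outer, b_inner):
    """Worst case: one full scan of the inner relation per outer tuple,
    plus a single scan of the outer relation itself."""
    return n_outer * b_inner + b_outer


n_r1, n_r2 = 20_000, 45_000
b_r1 = blocks(n_r1, 25)   # r1: 25 tuples per block
b_r2 = blocks(n_r2, 30)   # r2: 30 tuples per block

r1_outer = nested_loop_transfers(n_r1, b_r1, b_r2)
r2_outer = nested_loop_transfers(n_r2, b_r2, b_r1)

print("blocks:", b_r1, b_r2)
print("r1 as outer:", r1_outer)
print("r2 as outer:", r2_outer)
```

The comparison makes the usual rule of thumb visible: with plain nested-loop join, the relation with fewer tuples is the cheaper choice for the outer relation.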
c. Merge-join:
Assuming that r1 and r2 are not initially sorted on the join key, the total
sorting cost inclusive of the output is Bs = 1500(2⌈log_{M−1}(1500/M)⌉ +
Using the branch-city index, we can retrieve all tuples with branch-city
value greater than or equal to Brooklyn by following the pointer
chains from the first Brooklyn tuple. We also apply the additional
criterion of assets < 5000 to every tuple.
12.7 Write pseudocode for an iterator that implements indexed nested-loop
join, where the outer relation is pipelined. Your pseudocode must define
the standard iterator functions open(), next(), and close(). Show what state
information the iterator must maintain between calls.
Answer: Let outer be the iterator that returns successive tuples from
the pipelined outer relation. Let inner be the iterator that returns
successive tuples of the inner relation having a given value at the join
attributes. The inner iterator returns these tuples by performing an index
lookup. The functions IndexedNLJoin::open, IndexedNLJoin::close, and
IndexedNLJoin::next to implement the indexed nested-loop join iterator
are given below. The two iterators outer and inner, the value of the last
read outer relation tuple t_r, and a flag done_r indicating whether the end of
the outer relation scan has been reached are the state information that
needs to be remembered by IndexedNLJoin between calls.
IndexedNLJoin::open()
    outer.open();
    inner.open();
    done_r := false;
    if (outer.next() ≠ false)
        move tuple from outer's output buffer to t_r;
    else
        done_r := true;

IndexedNLJoin::close()
    outer.close();
    inner.close();

boolean IndexedNLJoin::next()
    while (not done_r)
        if (inner.next(t_r[JoinAttrs]) ≠ false)
            move tuple from inner's output buffer to t_s;
            compute t_r ⋈ t_s and place it in output buffer;
            return true;
        else if (outer.next() ≠ false)
            move tuple from outer's output buffer to t_r;
            rewind inner to first tuple of s;
        else
            done_r := true;
    return false;
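The pseudocode can also be exercised as a small runnable program. The following Python sketch makes several assumptions that are not in the original answer: tuples are dictionaries, the pipelined outer relation is simulated by a hypothetical ListIterator class, and the index on the inner relation is simulated by a hypothetical IndexLookup class backed by an in-memory dictionary. Its seek() method plays the role of "rewind inner" together with the index lookup on t_r[JoinAttrs]. The state kept between calls is the same as above: the two iterators, t_r, and done_r.

```python
from collections import defaultdict


class ListIterator:
    """Stand-in for the pipelined outer relation (assumption: an in-memory list)."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos < len(self.rows):
            row = self.rows[self.pos]
            self.pos += 1
            return row
        return None  # end of the outer relation

    def close(self):
        pass


class IndexLookup:
    """Stand-in for an index on the inner relation's join attribute."""

    def __init__(self, rows, key):
        self.index = defaultdict(list)
        for row in rows:
            self.index[row[key]].append(row)
        self.matches = iter(())

    def open(self):
        self.matches = iter(())

    def seek(self, value):
        # Position the iterator on the inner tuples matching this join value.
        self.matches = iter(self.index.get(value, []))

    def next(self):
        return next(self.matches, None)

    def close(self):
        pass


class IndexedNLJoin:
    def __init__(self, outer, inner, join_attr):
        self.outer, self.inner, self.join_attr = outer, inner, join_attr
        self.t_r = None      # last outer tuple read
        self.done_r = True   # True once the outer scan is exhausted

    def open(self):
        self.outer.open()
        self.inner.open()
        self.t_r = self.outer.next()
        self.done_r = self.t_r is None
        if not self.done_r:
            self.inner.seek(self.t_r[self.join_attr])

    def next(self):
        while not self.done_r:
            t_s = self.inner.next()
            if t_s is not None:
                return {**self.t_r, **t_s}  # the joined tuple
            self.t_r = self.outer.next()
            if self.t_r is None:
                self.done_r = True
            else:
                self.inner.seek(self.t_r[self.join_attr])
        return None

    def close(self):
        self.outer.close()
        self.inner.close()


# Small illustrative relations (made-up data, joined on attribute C).
outer_rows = [{"A": 1, "C": 10}, {"A": 2, "C": 20}, {"A": 3, "C": 10}]
inner_rows = [{"C": 10, "D": "x"}, {"C": 10, "D": "y"}, {"C": 30, "D": "z"}]

join = IndexedNLJoin(ListIterator(outer_rows), IndexLookup(inner_rows, "C"), "C")
join.open()
results = []
row = join.next()
while row is not None:
    results.append(row)
    row = join.next()
join.close()
print(results)
```

Returning None at the end of the scan replaces the boolean return value and output buffer of the pseudocode; that choice is a simplification for the sketch rather than part of the exercise.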
12.8 Design sort-based and hash-based algorithms for computing the relational
division operation (see Practice Exercises of Chapter 6 for a definition of
the division operation).
Answer: Suppose r(T ∪ S) and s(S) are two relations and r ÷ s has to be computed.
For the sort-based algorithm, sort relation s on S and sort relation r on
(T, S). Then start scanning r and note the T value of the first tuple.
Continue scanning r while the tuples have the same value of T, scanning s
simultaneously to check whether every tuple of s also occurs as the S
value of one of these r tuples, in a fashion similar to merge join. If so,
output that value of T and proceed with the next value of T. Relation s may
have to be scanned multiple times, but r will be scanned only once. Total disk accesses, after
