MySQL Index

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 79

Indexing and Hashing

Indexing and Hashing


 Basic Concepts
 Ordered Indices
 B+-Tree Index Files
 B-Tree Index Files
 Static Hashing
 Dynamic Hashing
 Comparison of Ordered Indexing and Hashing
 Index Definition in SQL
 Multiple-Key Access

22/2/20 Cryptography page:2


2
Basic Concepts
 Indexing mechanisms used to speed up access to
desired data.
 E.g., author catalog in library
 Search Key - attribute to set of attributes used to
look up records in a file.
 An index file consists of records (called index
entries) of the form
search-key pointer

 Index files are typically much smaller than the


original file
 Two basic kinds of indices:
 Ordered indices: search keys are stored in sorted order
 Hash indices: search keys are distributed uniformly across “buckets”
using a “hash function”.
22/2/20 Cryptography page:3
3
Index Evaluation Metrics
 Access types supported efficiently.
E.g.,
 records with a specified value in the attribute
 or records with an attribute value falling in a specified
range of values (e.g. 10000 < salary < 40000)
 Access time
 Insertion time

 Deletion time

 Space overhead

22/2/20 Cryptography page:4


4
Ordered Indices
 In an ordered index, index entries are stored sorted
on the search key value. E.g., author catalog in
library
 Primary index: in a sequentially ordered file, the index
whose search key specifies the sequential order of
the file
 Also called clustering index
 The search key of a primary index is usually but not necessarily the primary
key
 Secondary index: an index whose search key specifies
an order different from the sequential order of the
file. Also called non-clustering index
 Index-sequential file: ordered sequential file with a
primary index

22/2/20 Cryptography page:5


5
Dense Index Files
 Dense index — Index record appears for
every search-key value in the file.

22/2/20 Cryptography page:6


6
Sparse Index Files
 Sparse Index: contains index records for only some search-
key values.
 Applicable when records are sequentially ordered on search-key
 To locate a record with search-key value K we:
 Find index record with largest search-key value < K
 Search file sequentially starting at the record to which the index record points

22/2/20 Cryptography page:7


7
Sparse Index Files (Cont.)
 Compared to dense indices:
 Less space and less maintenance overhead for
insertions and deletions.
 Generally slower than dense index for locating
records.
 Good tradeoff: sparse index with an
index entry for every block in file,
corresponding to least search-key
value in the block.

22/2/20 Cryptography page:8


8
Multilevel Index
 If primary index does not fit in memory, access
becomes expensive
 Solution: treat primary index kept on disk as a
sequential file and construct a sparse index on it
 outer index – a sparse index of primary index
 inner index – the primary index file
 If even outer index is too large to fit in main
memory, yet another level of index can be
created, and so on
 Indices at all levels must be updated on insertion
or deletion from the file

22/2/20 Cryptography page:9


9
Multilevel Index (Cont.)

22/2/20 Cryptography page:10


10
Index Update: Record Deletion
 If deleted record was the only record in the file
with its particular search-key value, the search-key
is deleted from the index also.
 Single-level index deletion:
 Dense indices – deletion of search-key: similar to file record deletion.
 Sparse indices –
 if deleted key value exists in the index, the value is replaced by the next
search-key value in the file (in search-key order).
 If the next search-key value already has an index entry, the entry is
deleted instead of being replaced.

22/2/20 Cryptography page:11


11
Index Update: Record Insertion
 Single-level index insertion:
 Perform a lookup using the key value from inserted record
 Dense indices – if the search-key value does not appear in
the index, insert it.
 Sparse indices – if index stores an entry for each block of
the file, no change needs to be made to the index unless a
new block is created.
 If a new block is created, the first search-key value
appearing in the new block is inserted into the index.
 Multilevel insertion (as well as deletion)
algorithms are simple extensions of the
single-level algorithms
22/2/20 Cryptography page:12
12
Secondary Indices Example

Secondary index on balance field of account


 Index record points to a bucket that contains
pointers to all the actual records with that particular
search-key value
 Secondary indices have to be dense
22/2/20 Cryptography page:13
13
Primary and Secondary Indices
 Indices offer substantial benefits when searching
for records.
 BUT: Updating indices imposes overhead on
database modification --when a file is modified,
every index on the file must be updated,
 Sequential scan using primary index is efficient,
but a sequential scan using a secondary index is
expensive
 Each record access may fetch a new block from disk
 Block fetch requires about 5 to 10 micro seconds, versus about 100
nanoseconds for memory access

22/2/20 Cryptography page:14


14
B+-Tree Index Files
B+-tree indices are an alternative to indexed-sequential files.
 Disadvantage of indexed-sequential files
 performance degrades as file grows, since many overflow blocks get
created.
 Periodic reorganization of entire file is required.
 Advantage of B+-tree index files:
 automatically reorganizes itself with small, local, changes, in the
face of insertions and deletions.
 Reorganization of entire file is not required to maintain performance.
 (Minor) disadvantage of B+-trees:
 extra insertion and deletion overhead, space overhead .
 Advantages of B+-trees outweigh disadvantages
 B+-trees are used extensively

22/2/20 Cryptography page:15


15
B+-Tree Index Files (Cont.)
A B+-tree is a rooted tree satisfying the following properties:
 All paths from root to leaf are of the same
length
 Each node that is not a root or a leaf has
between n/2 and n children.
 A leaf node has between (n–1)/2 and n–1
values
 Special cases:
 If the root is not a leaf, it has at least 2 children.
 If the root is a leaf (that is, there are no other nodes in the tree),
it can have between 0 and (n–1) values.

22/2/20 Cryptography page:16


16
B+-Tree Node Structure
 Typical node

 Ki are the search-key values


 Pi are pointers to children (for non-leaf nodes) or
pointers to records or buckets of records (for leaf
nodes).
 The search-keys in a node are ordered
K1 < K2 < K3 < . . . < Kn–1
22/2/20 Cryptography page:17
17
Leaf Nodes in B+-Trees
Properties of a leaf node:

 For i = 1, 2, . . ., n–1, pointer Pi either points


to a file record with search-key value Ki, or to
a bucket of pointers to file records, each record
having search-key value Ki. Only need bucket st
ructure if search-key does not form a primary k
ey.
 If Li, Lj are leaf nodes and i < j, Li’s search-key
values are less than Lj’s search-key values
 Pn points to next leaf node in search-key order

22/2/20 Cryptography page:18


18
Non-Leaf Nodes in B+-Trees
 Non leaf nodes form a multi-level sparse index on the lea
f nodes. For a non-leaf node with m pointers:
 All the search-keys in the subtree to which P1 points are less
than K1
 For 2  i  n – 1, all the search-keys in the subtree to whic
h Pi points have values greater than or equal to Ki–1 and less
than Ki
 All the search-keys in the subtree to which Pn points have valu
es greater than or equal to Kn–1

22/2/20 Cryptography page:19


19
Example of a B+-tree

B+-tree for account file (n = 3)

22/2/20 Cryptography page:20


20
Example of B+-tree

B+-tree for account file (n = 5)

 Leaf nodes must have between 2 and 4 values


((n–1)/2 and n –1, with n = 5).
 Non-leaf nodes other than root must have
between 3 and 5 children ((n/2 and n with n
=5).
 Root must have at least 2 children.

22/2/20 Cryptography page:21


21
Observations about B+-trees
 Since the inter-node connections are done by pointers, “logi
cally” close blocks need not be “physically” close.
 The non-leaf levels of the B+-tree form a hierarchy of spar
se indices.
 The B+-tree contains a relatively small number of levels
 Level below root has at least 2* n/2 values
 Next level has at least 2* n/2 * n/2 values
 .. etc.
 If there are K search-key values in the file, the tr
ee height is no more than  logn/2(K)
 thus searches can be conducted efficiently.
 Insertions and deletions to the main file can be handle
d efficiently, as the index can be restructured in logar
ithmic time (as we shall see).
22/2/20 Cryptography page:22
22
Queries on B+-Trees
 Find all records with a search-key value of k.
1. N=root
2. Repeat
1. Examine N for the smallest search-key value > k.
2. If such a value exists, assume it is Ki. Then set N = Pi
3. Otherwise k  Kn–1. Set N = Pn
Until N is a leaf node
3. If for some i, key Ki = k follow pointer Pi to the desired rec
ord or bucket.
4. Else no record with search-key value k exists.

22/2/20 Cryptography page:23


23
Queries on B+-Trees (Cont.)
 If there are K search-key values in the file, the
height of the tree is no more than logn/2(K).
 A node is generally the same size as a disk block,
typically 4 kilobytes
 and n is typically around 100 (40 bytes per index entry).
 With 1 million search key values and n = 100
 at most log50(1,000,000) = 4 nodes are accessed in a lookup.
 Contrast this with a balanced binary tree with 1
million search key values — around 20 nodes are
accessed in a lookup
 above difference is significant since every node access may need a disk
I/O, costing around 20 milliseconds

22/2/20 Cryptography page:24


24
Updates on B+-Trees: Insertion
1. Find the leaf node in which the search-key
value would appear
2. If the search-key value is already present in
the leaf node
1. Add record to the file
3. If the search-key value is not present, then
1. add the record to the main file (and create a bucket if
necessary)
2. If there is room in the leaf node, insert (key-value, pointer)
pair in the leaf node
3. Otherwise, split the node (along with the new (key-value,
pointer) entry) as discussed in the next slide.
22/2/20 Cryptography page:25
25
Updates on B+-Trees: Insertion (Cont.)
 Splitting a leaf node:
 take the n (search-key value, pointer) pairs (including the
one being inserted) in sorted order. Place the first n/2 in
the original node, and the rest in a new node.
 let the new node be p, and let k be the least key value in p.
Insert (k,p) in the parent of the node being split.
 If the parent is full, split it and propagate the split further
up.
 Splitting of nodes proceeds upwards till a
node that is not full is found.
 In the worst case the root node may be split increasing the
height of the tree by 1.
Result of splitting node containing Brighton and Downtown on inserting Clearvie
w
Next step: insert entry with (Downtown,pointer-to-new-node)
22/2/20 Cryptography into parent page:26
26
Updates on B -Trees: Insertion
(Cont.)

B+-Tree before and after insertion of “Clearview”


22/2/20 Cryptography page:27
27
Insertion in B+-Trees (Cont.)
 Splitting a non-leaf node: when inserting
(k,p) into an already full internal node N
 Copy N to an in-memory area M with space for
n+1 pointers and n keys
 Insert (k,p) into M
 Copy P1,K1, …, K n/2-1,P n/2 from M back into
node N
 Copy Pn/2+1,K n/2+1,…,Kn,Pn+1 from M into newly
allocated node N’
 Insert (K n/2,N’) into parent NMianus
 Read pseudocode in book!
Downtown Mianus Perryridge Downtown Redwood

22/2/20 Cryptography page:28


28
Updates on B+-Trees: Deletion
 Find the record to be deleted, and remove it from the
main file and from the bucket (if present)
 Remove (search-key value, pointer) from the leaf node
if there is no bucket or if the bucket has become
empty
 If the node has too few entries due to the removal,
and the entries in the node and a sibling fit into a
single node, then merge siblings:
 Insert all the search-key values in the two nodes into a single node (the one
on the left), and delete the other node.
 Delete the pair (Ki–1, Pi), where Pi is the pointer to the deleted node, from
its parent, recursively using the above procedure.

22/2/20 Cryptography page:29


29
Updates on B+-Trees:
Deletion
 Otherwise, if the node has too few entries due to
the removal, but the entries in the node and a
sibling do not fit into a single node, then
redistribute pointers:
 Redistribute the pointers between the node and a sibling such that
both have more than the minimum number of entries.
 Update the corresponding search-key value in the parent of the node.
 The node deletions may cascade upwards till a
node which has n/2 or more pointers is found.
 If the root node has only one pointer after
deletion, it is deleted and the sole child becomes
the root.

22/2/20 Cryptography page:30


30
Examples of B+-Tree Deletion

Before and after deleting “Downtown”


 Deleting “Downtown” causes merging of under-full leaves
 leaf node can become empty only for n=3!
22/2/20 Cryptography page:31
31
Examples of B+-Tree Deletion
(Cont.)

Before and After deletion of “Perryridge” from result of


previous example

22/2/20 Cryptography page:32


32
Examples of B+-Tree Deletion
(Cont.)

 Leaf with “Perryridge” becomes underfull (actually empty, in this special case)
and merged with its sibling.
 As a result “Perryridge” node’s parent became underfull, and was merged with i
ts sibling
 Value separating two nodes (at parent) moves into merged node
 Entry deleted from parent
 Root node then has only one child, and is deleted

22/2/20 Cryptography page:33


33
Example of B+-tree Deletion
(Cont.)

Before and after deletion of “Perryridge” from earlier example


 Parent of leaf containing Perryridge became underfull, and
borrowed a pointer from its left sibling

22/2/20 Search-key value in the Cryptography
parent’s parent changes as a result
page:34
34
B+-Tree File Organization
 Index file degradation problem is solved by using B+-
Tree indices.
 Data file degradation problem is solved by using B+-T
ree File Organization.
 The leaf nodes in a B+-tree file organization store re
cords, instead of pointers.
 Leaf nodes are still required to be half full
 Since records are larger than pointers, the maximum number o
f records that can be stored in a leaf node is less than the
number of pointers in a nonleaf node.
 Insertion and deletion are handled in the same way a
s insertion and deletion of entries in a B+-tree index.

22/2/20 Cryptography page:35


35
B+-Tree File Organization (Cont.)

Example of B+-tree File Organization


 Good space utilization important since records use
more space than pointers.
 To improve space utilization, involve more sibling
nodes in redistribution during splits and merges

2n / 3
22/2/20 Cryptography page:36
36
Indexing Strings
 Variable length strings as keys
 Variable fanout
 Use space utilization as criterion for splitting, not numbe
r of pointers
 Prefix compression
 Key values at internal nodes can be prefixes of full key
 Keep enough characters to distinguish entries in the subtrees
separated by the key value
 E.g. “Silas” and “Silberschatz” can be separated by “Sil
b”
 Keys in leaf node can be compressed by sharing common prefi
xes

22/2/20 Cryptography page:37


37
B-Tree Index Files
 Similar to B+-tree, but B-tree allows search-key values to appear only
once; eliminates redundant storage of search keys.
 Search keys in nonleaf nodes appear nowhere else in the B-tree; an
additional pointer field for each search key in a nonleaf node must be
included.
 Generalized B-tree leaf node

 Nonleaf node – pointers Bi are the


bucket or file record pointers
22/2/20 Cryptography page:38
38
B-Tree Index File Example

B-tree (above) and B+-tree (below) on same data

22/2/20 Cryptography page:39


39
B-Tree Index Files (Cont.)
 Advantages of B-Tree indices:
 May use less tree nodes than a corresponding B+-Tree.
 Sometimes possible to find search-key value before reaching leaf node.
 Disadvantages of B-Tree indices:
 Only small fraction of all search-key values are found early
 Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically
have greater depth than corresponding B+-Tree
 Insertion and deletion more complicated than in B+-Trees
 Implementation is harder than B+-Trees.
 Typically, advantages of B-Trees do not out weigh
disadvantages.

22/2/20 Cryptography page:40


40
Multiple-Key Access
 Use multiple indices for certain types of queries.
 Example:
select account_number
from account
where branch_name = “Perryridge” and balance = 1000
 Possible strategies for processing query using indices
on single attributes:
1. Use index on branch_name to find accounts with branch name
Perryridge; test balance = 1000
2. Use index on balance to find accounts with balances of $100
0; test branch_name = “Perryridge”.
3. Use branch_name index to find pointers to all records perta
ining to the Perryridge branch. Similarly use index on bal
ance. Take intersection of both sets of pointers obtained.

22/2/20 Cryptography page:41


41
Indices on Multiple Keys
 Composite search keys are search keys c
ontaining more than one attribute
 E.g. (branch_name, balance)
 Lexicographic ordering: (a1, a2) < (b1, b
2) if either
 a1 < b1, or
 a1=b1 and a2 < b 2

22/2/20 Cryptography page:42


42
Indices on Multiple Attributes
Suppose we have an index on combined search-key (branch_name, balan
ce)
 For where branch_name = “Perryridge” and balance =
1000
the index on (branch_name, balance) can be used to f
etch only records that satisfy both conditions.
 Using separate indices in less efficient — we may fetch many
records (or pointers) that satisfy only one of the conditions.
 Can also efficiently handle where branch_name = “Perr
yridge” and balance < 1000
 But cannot efficiently handle where branch_name < “P
erryridge” and balance = 1000
 May fetch many records that satisfy the first but not the sec
ond condition

22/2/20 Cryptography page:43


43
Non-Unique Search Keys
 Alternatives:
 Buckets on separate block (bad idea)
 List of tuple pointers with each key
 Low space overhead, no extra cost for queries
 Extra code to handle read/update of long lists
 Deletion of a tuple can be expensive if there are
many duplicates on search key (why?)
 Make search key unique by adding a record-identifier
 Extra storage overhead for keys
 Simpler code for insertion/deletion
 Widely used
22/2/20 Cryptography page:44
44
Other Issues in Indexing
 Covering indices
 Add extra attributes to index so (some) queries can avoid fetching the actual
records
 Particularly useful for secondary indices
 Why?
 Can store extra attributes only at leaf
 Record relocation and secondary indices
 If a record moves, all secondary indices that store record pointers have to be
updated
 Node splits in B+-tree file organizations become very expensive
 Solution: use primary-index search key instead of record pointer in
secondary index
 Extra traversal of primary index to locate record
 Higher cost for queries, but node splits are cheap
 Add record-id if primary-index search key is non-unique

22/2/20 Cryptography page:45


45
Hashing
Static Hashing
 A bucket is a unit of storage containing one or more
records (a bucket is typically a disk block).
 In a hash file organization we obtain the bucket of a
record directly from its search-key value using a hash
function.
 Hash function h is a function from the set of all search-
key values K to the set of all bucket addresses B.
 Hash function is used to locate records for access,
insertion as well as deletion.
 Records with different search-key values may be mapped
to the same bucket; thus entire bucket has to be
searched sequentially to locate a record.

22/2/20 Cryptography page:47


47
Example of Hash File Organization
Hash file organization of account file, using branch_name as key
(See figure in next slide.)

 There are 10 buckets,


 The binary representation of the ith charac
ter is assumed to be the integer i.
 The hash function returns the sum of the bi
nary representations of the characters mod
ulo 10
 E.g. h(Perryridge) = 5 h(Round Hill) = 3 h
(Brighton) = 3

22/2/20 Cryptography page:48


48
Example of Hash File Organization

Hash file organization of acco


unt file, using branch_name a
s key
(see previous slide for details
).

22/2/20 Cryptography page:49


49
Hash Functions
 Worst hash function maps all search-key values to the same
bucket; this makes access time proportional to the number of
search-key values in the file.
 An ideal hash function is uniform, i.e., each bucket is assigned
the same number of search-key values from the set of all
possible values.
 Ideal hash function is random, so each bucket will have the
same number of records assigned to it irrespective of the
actual distribution of search-key values in the file.
 Typical hash functions perform computation on the internal
binary representation of the search-key.
 For example, for a string search-key, the binary representations of all the
characters in the string could be added and the sum modulo the number of buckets
could be returned. .

22/2/20 Cryptography page:50


50
Handling of Bucket Overflows
 Bucket overflow can occur because of
 Insufficient buckets
 Skew in distribution of records. This can occur due to two
reasons:
 multiple records have same search-key value
 chosen hash function produces non-uniform distribution of
key values
 Although the probability of bucket overflow
can be reduced, it cannot be eliminated; it is
handled by using overflow buckets.

22/2/20 Cryptography page:51


51
Handling of Bucket Overflows (Cont.)
 Overflow chaining – the overflow buckets of a
given bucket are chained together in a linked
list.
 Above scheme is called closed hashing.
 An alternative, called open hashing, which does not use
overflow buckets, is not suitable for database applications.

22/2/20 Cryptography page:52


52
Hash Indices
 Hashing can be used not only for file
organization, but also for index-structure
creation.
 A hash index organizes the search keys, with
their associated record pointers, into a hash
file structure.
 Strictly speaking, hash indices are always
secondary indices
 if the file itself is organized using hashing, a separate primary
hash index on it using the same search-key is unnecessary.
 However, we use the term hash index to refer to both secondary
index structures and hash organized files.
22/2/20 Cryptography page:53
53
Example of Hash Index

22/2/20 Cryptography page:54


54
Deficiencies of Static Hashing
 In static hashing, function h maps search-key values to
a fixed set of B of bucket addresses. Databases grow
or shrink with time.
 If initial number of buckets is too small, and file grows, per
formance will degrade due to too much overflows.
 If space is allocated for anticipated growth, a significant am
ount of space will be wasted initially (and buckets will be un
derfull).
 If database shrinks, again space will be wasted.
 One solution: periodic re-organization of the file with
a new hash function
 Expensive, disrupts normal operations
 Better solution: allow the number of buckets to be mod
ified dynamically.
22/2/20 Cryptography page:55
55
Dynamic Hashing
 Good for database that grows and shrinks in size
 Allows the hash function to be modified dynamically
 Extendable hashing – one form of dynamic hashing
 Hash function generates values over a large range — typically b-bit
integers, with b = 32.
 At any time use only a prefix of the hash function to index into a table of
bucket addresses.
 Let the length of the prefix be i bits, 0  i  32.
 Bucket address table size = 2i. Initially i = 0
 Value of i grows and shrinks as the size of the database grows and
shrinks.
 Multiple entries in the bucket address table may point to a bucket (why?)
 Thus, actual number of buckets is < 2i
 The number of buckets also changes dynamically due to coalescing
and splitting of buckets.

22/2/20 Cryptography page:56


56
General Extendable Hash Structure

In this structure, i2 = i3 = i, whereas i1 = i – 1 (see next


slide for details)
22/2/20 Cryptography page:57
57
Use of Extendable Hash Structure
 Each bucket j stores a value ij
 All the entries that point to the same bucket have the same values on
the first ij bits.
 To locate the bucket containing search-key Kj:
1. Compute h(Kj) = X
2. Use the first i high order bits of X as a displacement into bucket ad
dress table, and follow the pointer to appropriate bucket
 To insert a record with search-key value Kj
 follow same procedure as look-up and locate the bucket, say j.
 If there is room in the bucket j insert record in the bucket.
 Else the bucket must be split and insertion re-attempted (next slid
e.)
 Overflow buckets used instead in some cases (will see shortly)

22/2/20 Cryptography page:58


58
Insertion in Extendable Hash Structure
(Cont)
To split a bucket j when inserting record with search-key value Kj:

 If i > ij (more than one pointer to bucket j)


 allocate a new bucket z, and set ij = iz = (ij + 1)
 Update the second half of the bucket address table entries
originally pointing to j, to point to z
 remove each record in bucket j and reinsert (in j or z)
 recompute new bucket for Kj and insert record in the bucket
(further splitting is required if the bucket is still full)
 If i = ij (only one pointer to bucket j)
 If i reaches some limit b, or too many splits have happened
in this insertion, create an overflow bucket
 Else
 increment i and double the size of the bucket address table.
 replace each entry in the table by two entries that point to the same buc
ket.
 recompute new bucket address table entry for Kj
Now i > ij so use the first case above.
22/2/20 Cryptography page:59
59
Deletion in Extendable Hash Structure
 To delete a key value,
 locate it in its bucket and remove it.
 The bucket itself can be removed if it becomes empty
(with appropriate updates to the bucket address table).

 Coalescing of buckets can be done (can coalesce only w


ith a “buddy” bucket having same value of ij and sam
e ij –1 prefix, if it is present)
 Decreasing bucket address table size is also possible
 Note: decreasing bucket address table size is an expensive operati
on and should be done only if number of buckets becomes much sm
aller than the size of the table

22/2/20 Cryptography page:60


60
Use of Extendable Hash Structure

Initial Hash structure, bucket size = 2


22/2/20 Cryptography page:61
61
Example (Cont.)
 Hash structure after insertion of one
Brighton and two Downtown records

22/2/20 Cryptography page:62


62
Example (Cont.)
Hash structure after insertion of Mianus record

22/2/20 Cryptography page:63


63
Example (Cont.)

Hash structure after insertion of three Perryridge records

22/2/20 Cryptography page:64


64
Example (Cont.)
 Hash structure after insertion of Redwood
and Round Hill records

22/2/20 Cryptography page:65


65
Extendable Hashing vs. Other Schemes
 Benefits of extendable hashing:
 Hash performance does not degrade with growth of file
 Minimal space overhead
 Disadvantages of extendable hashing
 Extra level of indirection to find desired record
 Bucket address table may itself become very big (larger than memory)
 Cannot allocate very large contiguous areas on disk either
 Solution: B+-tree file organization to store bucket address table
 Changing size of bucket address table is an expensive operation
 Linear hashing is an alternative mechanism
 Allows incremental growth of its directory (equivalent to bucket address
table)
 At the cost of more bucket overflows

22/2/20 Cryptography page:66


66
Comparison of Ordered Indexing and Hashing
 Cost of periodic re-organization
 Relative frequency of insertions and deletions
 Is it desirable to optimize average access time at the
expense of worst-case access time?
 Expected type of queries:
 Hashing is generally better at retrieving records having a spe
cified value of the key.
 If range queries are common, ordered indices are to be preferr
ed
 In practice:
 PostgreSQL supports hash indices, but discourages use due to p
oor performance
 Oracle supports static hash organization, but not hash indices
 MySQL supports hash indices
22/2/20 Cryptography page:67
67
Index Definition in SQL
 Create an index
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.: create index b-index on branch(branch_name)
 Use create unique index to indirectly specify and enf
orce the condition that the search key is a candida
te key is a candidate key.
 Not really required if SQL unique integrity constraint is
supported
 To drop an index
drop index <index-name>
 Most database systems allow specification of type o
f index, and clustering.
22/2/20 Cryptography page:68
68
End of Chapter
Partitioned Hashing
 Hash values are split into segments that depend on each
attribute of the search-key.
(A1, A2, . . . , An) for n attribute search-key
 Example: n = 2, for customer, search-key being
(customer-street, customer-city)
search-key value hash value
(Main, Harrison) 101 111
(Main, Brooklyn) 101 001
(Park, Palo Alto) 010 010
(Spring, Brooklyn) 001 001
(Alma, Palo Alto) 110 010
 To answer equality query on single attribute, need to look
up multiple buckets. Similar in effect to grid files.

22/2/20 Cryptography page:70


70
Sequential File For account
Records

22/2/20 Cryptography page:71


71
Sample account File

22/2/20 Cryptography page:72


72
Figure 12.2

22/2/20 Cryptography page:73


73
Figure 12.14

22/2/20 Cryptography page:74


74
Figure 12.25

22/2/20 Cryptography page:75


75
Grid Files
 Structure used to speed the processing of
general multiple search-key queries involving
one or more comparison operators.
 The grid file has a single grid array and one
linear scale for each search-key attribute.
The grid array has number of dimensions
equal to number of search-key attributes.
 Multiple cells of grid array can point to
same bucket
 To find the bucket for a search-key value,
locate the row and column of its cell using
the linear scales and follow pointer

22/2/20 Cryptography page:76


76
Example Grid File for account

22/2/20 Cryptography page:77


77
Queries on a Grid File
 A grid file on two attributes A and B can
handle queries of all following forms with
reasonable efficiency
 (a1  A  a2)
 (b1  B  b2)
 (a1  A  a2  b1  B  b2),.
 E.g., to answer (a1  A  a2  b1  B  b2),
use linear scales to find corresponding
candidate grid array cells, and look up all the
buckets pointed to from those cells.

22/2/20 Cryptography page:78


78
Grid Files (Cont.)
 During insertion, if a bucket becomes full, new bucket
can be created if more than one cell points to it.
 Idea similar to extendable hashing, but on multiple dimensions
 If only one cell points to it, either an overflow bucket must be created or the
grid size must be increased
 Linear scales must be chosen to uniformly distribute
records across cells.
 Otherwise there will be too many overflow buckets.
 Periodic re-organization to increase grid size will help.
 But reorganization can be very expensive.
 Space overhead of grid array can be high.
 R-trees (Chapter 23) are an alternative

22/2/20 Cryptography page:79


79

You might also like