Module 5
Module 5-Hashing
HASHING: Introduction, Static Hashing,
Dynamic Hashing
5.0 Hashing
• Hashing enables us to perform dictionary operations such as search, insert and delete in O(1)
expected time.
• In a mathematical sense, a map is a relation between two sets. We can define a map M as a set of
pairs, where each pair is of the form (key, value) and, given a key, we can find its value using some
kind of “function” that maps keys to values.
• The key for a given object can be calculated using a function called a hash function.
• The hashing technique uses a special function, called the hash function, to map a given value to
a particular key for faster access to elements.
• A hash function is a function used to place data in the hash table. The integer returned by the
hash function is called the hash key.
1. A hash function should produce hash addresses such that keys are distributed as evenly as
possible among the various cells of the hash table.
• A hash table is a data structure used for storing and retrieving data very quickly.
• Insertion, deletion and retrieval operations take place with the help of the hash value.
• Hence every entry in the hash table is associated with some key.
• Using the hash key, the required piece of data can be found in the hash table with a few key
comparisons. The searching time depends on the size of the hash table.
2. Mid-Square Method:
• Here, the key K is squared. A number l in the middle of K² is selected by removing digits
from both ends, and H(K) = l.
• Example 1:
Solution: Let key = 2345. Its square is K² = 5499025.
H(2345) = 90, the two middle digits, obtained by discarding 549 and 25.
BCS304
• Example 2: Calculate the hash value for the key 1234 using the mid-square method. The hash table
has 100 memory locations.
Solution: Note that the hash table has 100 memory locations whose indices vary from 0 to 99.
This means that only two digits are needed to map the key to a location in the hash table.
When k = 1234, k² = 1522756, so h(1234) = 27.
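The mid-square rule can be sketched in a few lines of Python. This is only an illustration: the function name `mid_square_hash` and the parameter `table_digits` are assumptions, and so is the tie-breaking choice of taking the pair just right of centre when the square has an odd number of digits (this choice reproduces h(1234) = 27 from Example 2).

```python
def mid_square_hash(key, table_digits=2):
    """Square the key and return `table_digits` digits taken from the middle of the square."""
    squared = str(key * key)
    # Centre the selected window; with an odd-length square and an even
    # window this picks the digits just right of centre (an assumption).
    start = (len(squared) - table_digits + 1) // 2
    return int(squared[start:start + table_digits])


print(mid_square_hash(1234))  # 1234 * 1234 = 1522756, middle digits are 27
```

Other texts pick the middle digits slightly differently; what matters is that the same positions are used for every key.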
3. Folding Method:
Step 1: Divide the key value into a number of parts. That is, divide k into parts k1, k2, ..., kn,
where each part has the same number of digits except the last part, which may have fewer digits
than the other parts.
Step 2: Add the individual parts. That is, obtain the sum of k1 + k2 + ... + kn. The hash value is
produced by ignoring the last carry, if any.
Note that the number of digits in each part of the key will vary depending upon the size of the
hash table.
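The two folding steps can be sketched as follows. The keys used here (5678, 321 and 34567) are illustrative values and do not come from the text; `folding_hash` and `part_digits` are hypothetical names. The sketch assumes two-digit parts, as would suit a table with 100 locations.

```python
def folding_hash(key, part_digits=2):
    """Split the key into parts of `part_digits` digits, add them, and drop any carry."""
    digits = str(key)
    # Step 1: divide the key into equal-sized parts (the last part may be shorter).
    parts = [int(digits[i:i + part_digits]) for i in range(0, len(digits), part_digits)]
    # Step 2: add the parts and ignore the carry beyond `part_digits` digits.
    return sum(parts) % (10 ** part_digits)


print(folding_hash(5678))   # 56 + 78 = 134, carry dropped -> 34
print(folding_hash(321))    # 32 + 1 = 33
print(folding_hash(34567))  # 34 + 56 + 7 = 97
```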
Figure 5.18 shows a hash table in which each key from the set K is mapped to locations generated by
using a hash function. Note that keys k2 and k6 point to the same memory location. This is known as
a collision. That is, when two or more keys map to the same memory location, a collision is said to
occur.
To solve the problem of collisions, a method called a collision resolution technique is applied.
The two most popular methods of resolving collisions are open addressing and chaining.
1. Linear Probing
Suppose a new record R with key K is to be added to the memory table T, but the memory location with
hash address H(K) = h is already filled. One way of resolving the collision is to assign R to the first
available location following T[h]. We assume that the table T with m locations is circular, so that T[1]
comes after T[m]. Accordingly, we search for record R in table T by linearly searching the locations
T[h], T[h+1], ... until we either find R or meet an empty location.
Example: Consider a hash table of size 10. Using linear probing, insert the keys 72, 27, 36, 24, 63, 81, 92,
and 101 into the table.
Solution: Let H(k) = k mod m, m = 10
Initially hash table will be
0 1 2 3 4 5 6 7 8 9
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1
H(72) = 72 mod 10 = 2
0 1 2 3 4 5 6 7 8 9
-1 -1 72 -1 -1 -1 -1 -1 -1 -1
H(27) = 27 mod 10 = 7
0 1 2 3 4 5 6 7 8 9
-1 -1 72 -1 -1 -1 -1 27 -1 -1
H(36) = 36 mod 10 = 6
0 1 2 3 4 5 6 7 8 9
-1 -1 72 -1 -1 -1 36 27 -1 -1
H(24) = 24 mod 10 = 4
0 1 2 3 4 5 6 7 8 9
-1 -1 72 -1 24 -1 36 27 -1 -1
H(63) = 63 mod 10 = 3
0 1 2 3 4 5 6 7 8 9
-1 -1 72 63 24 -1 36 27 -1 -1
H(81) = 81 mod 10 = 1
0 1 2 3 4 5 6 7 8 9
-1 81 72 63 24 -1 36 27 -1 -1
H(92) = 92 mod 10 = 2
Collision occurred since position 2 is already filled. The next position, 3, is also filled, and so is
position 4. Position 5 is not filled, so insert the key 92 in position 5.
0 1 2 3 4 5 6 7 8 9
-1 81 72 63 24 92 36 27 -1 -1
H(101) = 101 mod 10 = 1
Collision occurred since position 1 is already filled. Linear probing finds positions 2 to 7 filled as
well; the next free position is 8, so insert key 101 in position 8.
0 1 2 3 4 5 6 7 8 9
-1 81 72 63 24 92 36 27 101 -1
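The insertions traced above can be reproduced with a short sketch. It assumes, as in the example, that -1 marks an empty cell and that H(k) = k mod m; the names `EMPTY` and `linear_probe_insert` are hypothetical.

```python
EMPTY = -1  # same empty-cell marker as in the worked example


def linear_probe_insert(table, key):
    """Insert key at the first free cell at or after H(key), wrapping around circularly."""
    m = len(table)
    h = key % m
    for step in range(m):
        pos = (h + step) % m
        if table[pos] == EMPTY:
            table[pos] = key
            return pos
    raise OverflowError("hash table is full")


table = [EMPTY] * 10
for k in [72, 27, 36, 24, 63, 81, 92, 101]:
    linear_probe_insert(table, k)
print(table)  # [-1, 81, 72, 63, 24, 92, 36, 27, 101, -1]
```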
2. Quadratic Probing
A variation of the linear probing idea is called quadratic probing. Instead of using a constant
“skip” value, if the first hash value that resulted in a collision is h, the successive values
probed are h+1, h+4, h+9, h+16, and so on (each taken modulo the table size). In other words,
quadratic probing uses a skip consisting of successive perfect squares.
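A minimal sketch of quadratic probing follows. It reuses the keys of the linear probing example only for comparison; the resulting table is what this sketch produces, not a result given in the text. The names `quadratic_probe_sequence` and `quadratic_probe_insert` are hypothetical, and -1 is assumed to mark an empty cell.

```python
EMPTY = -1  # assumed empty-cell marker


def quadratic_probe_sequence(h, m, count):
    """Return the first `count` positions probed: h, h+1, h+4, h+9, ... (mod m)."""
    return [(h + i * i) % m for i in range(count)]


def quadratic_probe_insert(table, key):
    """Insert key at the first free cell along the quadratic probe sequence."""
    m = len(table)
    for pos in quadratic_probe_sequence(key % m, m, m):
        if table[pos] == EMPTY:
            table[pos] = key
            return pos
    # Unlike linear probing, the sequence may miss free cells.
    raise OverflowError("no free cell found on the probe sequence")


table = [EMPTY] * 10
for k in [72, 27, 36, 24, 63, 81, 92, 101]:
    quadratic_probe_insert(table, k)
print(table)  # [-1, 81, 72, 63, 24, 101, 36, 27, 92, -1]
```

Here 92 probes positions 2, 3, 6, 1 and then 8, while 101 probes 1, 2 and then 5, so the two colliding keys end up further apart than with linear probing.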
3. Double Hashing
In double hashing, we use two hash functions rather than a single function. Double hashing
applies a second hash function to the key when a collision occurs. The result of the second
hash function is the number of positions from the point of collision at which to insert.
There are a couple of requirements for the second function:
• it must never evaluate to 0
• it must make sure that all cells can be probed
• A popular second hash function is Hash2(key) = M - (key % M), where M is a prime number that is
smaller than the size of the table. Any other independent hash function may also be used.
Example: Insert the keys 37, 90, 45, 22, 17, 49, 55 into a hash table of size 10.
H1(key) = key % 10
H2(key) = 7 - (key % 7)
After a collision, the next position probed is
(H1(key) + 1*H2(key)) % 10
Resulting hash table (-1 marks an empty cell):
0 1 2 3 4 5 6 7 8 9
90 17 22 -1 -1 45 55 37 -1 49
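The example can be checked with a small sketch that uses the H1 and H2 given above and a table of size 10. The function names and the -1 empty marker are assumptions made for illustration; the i-th probe is taken as (H1(key) + i*H2(key)) % 10, which generalises the single-step formula in the example.

```python
EMPTY = -1  # assumed empty-cell marker
SIZE = 10


def h1(key):
    return key % SIZE


def h2(key):
    return 7 - (key % 7)  # never 0: the result is always between 1 and 7


def double_hash_insert(table, key):
    """Probe H1(key), H1(key) + H2(key), H1(key) + 2*H2(key), ... (mod table size)."""
    for i in range(len(table)):
        pos = (h1(key) + i * h2(key)) % len(table)
        if table[pos] == EMPTY:
            table[pos] = key
            return pos
    raise OverflowError("no free cell found on the probe sequence")


table = [EMPTY] * SIZE
for k in [37, 90, 45, 22, 17, 49, 55]:
    double_hash_insert(table, k)
print(table)  # [90, 17, 22, -1, -1, 45, 55, 37, -1, 49]
```

Only 17 and 55 collide: 17 moves from position 7 by H2(17) = 4 to position 1, and 55 moves from position 5 by H2(55) = 1 to position 6.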
4. Rehashing
When the hash table becomes nearly full, the number of collisions increases, thereby degrading
the performance of insertion and search operations. In such cases, a better option is to create a new
hash table with size double of the original hash table.
All the entries in the original hash table will then have to be moved to the new hash table.
This is done by taking each entry, computing its new hash value, and then inserting it in the new hash table.
Though rehashing seems to be a simple process, it is quite expensive and must therefore not be
done frequently.
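A minimal sketch of rehashing is shown below. It assumes linear probing inside the new table, the hash function x % (table size), and -1 as the empty marker. The keys 26, 31, 43 and 17 are illustrative values chosen for this sketch, and the function name `rehash` is hypothetical.

```python
EMPTY = -1  # assumed empty-cell marker


def rehash(old_table):
    """Move every entry into a new table of double the size, recomputing its hash value."""
    new_size = 2 * len(old_table)
    new_table = [EMPTY] * new_size
    for key in old_table:
        if key == EMPTY:
            continue
        pos = key % new_size               # new hash value
        while new_table[pos] != EMPTY:     # resolve collisions by linear probing
            pos = (pos + 1) % new_size
        new_table[pos] = key
    return new_table


old = [EMPTY, 26, 31, 43, 17]   # size 5, built with h(x) = x % 5 (illustrative keys)
print(rehash(old))              # [-1, 31, -1, 43, -1, -1, 26, 17, -1, -1]
```

Every key is visited once, which is why the cost grows with the number of stored entries.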
Consider the hash table of size 5 given below. The hash function used is h(x) = x % 5. Rehash the entries
into a new hash table.
5. Chaining
• The chaining technique handles collisions using an array of linked lists (allocated at run time). If
more than one key has the same hash value, all such keys are inserted at the end of the corresponding
list (insert rear) one by one, and thus the collision is resolved.
• Example: Construct a hash table of size and store the following words: like, a, tree, you, first, a,
place, to
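Chaining with insert-rear can be sketched as follows. Because the table size and hash function of the word example are not given here, this sketch uses integer keys with key mod 5 instead; the keys, the `Node` class and the function names are all assumptions for illustration.

```python
class Node:
    """One cell of a singly linked chain."""
    def __init__(self, key):
        self.key = key
        self.next = None


def chain_insert(table, key):
    """Append key at the rear of the chain for its hash value."""
    idx = key % len(table)
    node = Node(key)
    if table[idx] is None:
        table[idx] = node
        return
    cur = table[idx]
    while cur.next is not None:   # walk to the rear of the list
        cur = cur.next
    cur.next = node


def chain_keys(table, idx):
    """Return the keys stored in chain idx, front to rear."""
    keys, cur = [], table[idx]
    while cur is not None:
        keys.append(cur.key)
        cur = cur.next
    return keys


table = [None] * 5
for k in [12, 7, 22, 5, 17]:
    chain_insert(table, k)
print(chain_keys(table, 2))  # [12, 7, 22, 17]
print(chain_keys(table, 0))  # [5]
```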
1. Static Hashing:
• A hashing technique in which the table (bucket) size remains the same (fixed at compile time) is
called static hashing.
2. Dynamic Hashing:
• Dynamic hashing increases the size of the hash table dynamically as collisions occur. There are two types:
1) Dynamic hashing using a directory (extendible hashing): uses a directory that grows or shrinks
depending on the data distribution. No overflow buckets are used.
2) Directoryless dynamic hashing (linear hashing): no directory. Buckets are split in linear order,
and overflow buckets are used.
Dynamic hashing using directory
Explain directoryless dynamic hashing. (Appeared in Dec. 2016/Jan 2017)- 5 Marks
Directoryless dynamic hashing
Basic Idea:
• Pages are split when overflows occur, but not necessarily the page with the overflow.
• A directory is avoided in linear hashing by using overflow pages (chaining approach).
• Splitting occurs in turn, in round-robin fashion, one by one from the first bucket to the last bucket.
• Use a family of hash functions h0, h1, h2, ...
• Each function’s range is twice that of its predecessor.
• When all the pages at one level (the current hash function) have been split, a new level is applied.
• Insert in order using linear hashing: 1, 7, 3, 8, 12, 4, 11, 2, 10
• After insertion till 12:
• When 4 is inserted, an overflow occurs. So we split the bucket pointed to by s (no matter whether it is
full or partially empty) and increment the pointer.
• So we split bucket 0 and rehash all the keys in it using h1: 3 is placed in the new bucket 3
(3 mod 6 = 3) and 12 stays in bucket 0 (12 mod 6 = 0). Then 11 and 2 are inserted, and an overflow
occurs again. s is now pointing to bucket 1, hence bucket 1 is split by rehashing it.
After split:
Insertion of 10: as h0(10) = 10 mod 3 = 1 and bucket 1 < s, we need to hash 10 again using
h1(10) = 10 mod 6 = 4, so 10 goes to bucket 4.
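The walkthrough above can be replayed with a small sketch of linear hashing. Several details are assumptions inferred from the narration rather than stated in the text: three initial buckets, a bucket capacity of 2, h0(k) = k mod 3 and h1(k) = k mod 6, and a split of bucket s whenever an insert causes an overflow. Keys beyond the capacity of a bucket stand in for its overflow page, and the class and attribute names are hypothetical.

```python
class LinearHashTable:
    def __init__(self, initial_buckets=3, capacity=2):
        self.n0 = initial_buckets
        self.capacity = capacity
        self.level = 0            # index of the current hash function h_level
        self.s = 0                # split pointer
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key):
        size = self.n0 * (2 ** self.level)
        idx = key % size                  # h_level(key)
        if idx < self.s:                  # that bucket was already split this round
            idx = key % (2 * size)        # so use h_(level+1)(key)
        return idx

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)                # extra entries model the overflow page
        if len(bucket) > self.capacity:   # an overflow triggers a split of bucket s
            self._split()

    def _split(self):
        size = self.n0 * (2 ** self.level)
        old_keys = self.buckets[self.s]
        self.buckets[self.s] = []
        self.buckets.append([])           # new bucket at index s + size
        for k in old_keys:                # rehash with the next hash function
            self.buckets[k % (2 * size)].append(k)
        self.s += 1
        if self.s == size:                # every bucket of this level has been split
            self.level += 1
            self.s = 0


t = LinearHashTable()
for k in [1, 7, 3, 8, 12, 4, 11, 2, 10]:
    t.insert(k)
print(t.buckets)  # [[12], [1, 7], [8, 11, 2], [3], [4, 10]]
print(t.s)        # 2
```

Bucket 2 still holds three keys at the end because the bucket that overflows is not the one that gets split; it will be split when s reaches it.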
The single-ended priority queues may be categorized as min and max priority queues. The operations
supported by a min priority queue are:
SP1: Return an element with minimum priority.
SP2: Insert an element with an arbitrary priority.
SP3: Delete an element with minimum priority.
The operations supported by a max priority queue are the same as those supported by a min priority queue
except that in SP1 and SP3 we replace minimum by maximum. The heap structure is a classical data
structure for the representation of a priority queue. Using a min (max) heap, the minimum (maximum)
element can be found in O(1) time. A meldable (single-ended) priority queue augments the operations SP1
through SP3 with a meld operation that melds together two priority queues.
A double-ended priority queue (DEPQ) is a data structure that supports the following operations on a
collection of elements.
DP1: Return an element with minimum priority.
DP2: Return an element with maximum priority.
DP3: Insert an element with an arbitrary priority.
DP4: Delete an element with minimum priority.
DP5: Delete an element with maximum priority.
Example: A DEPQ may be used to implement a network buffer. This buffer holds packets that are waiting
their turn to be sent out over a network link; each packet has an associated priority. When the network link
becomes available, a packet with the highest priority is transmitted. This corresponds to a DeleteMax
operation. When a packet arrives at the buffer from elsewhere in the network, it is added to this buffer.
This corresponds to an Insert operation. If the buffer is full, we must drop a packet with the minimum
priority before we can insert one. This is achieved using a DeleteMin operation.
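The network buffer can be sketched to show how the three operations fit together. This is an interface illustration only: it keeps the priorities in a sorted Python list, which costs O(n) per update and is not an efficient DEPQ representation. The class name `PacketBuffer` and its methods are hypothetical.

```python
import bisect


class PacketBuffer:
    """Bounded buffer of packet priorities, backed by a sorted list."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.priorities = []              # kept in ascending order

    def insert(self, priority):
        """Insert a packet; if the buffer is full, first drop a minimum-priority packet."""
        dropped = None
        if len(self.priorities) == self.capacity:
            dropped = self.priorities.pop(0)        # DeleteMin
        bisect.insort(self.priorities, priority)    # Insert
        return dropped

    def transmit(self):
        """Remove and return a packet with the highest priority."""
        return self.priorities.pop()                # DeleteMax


buf = PacketBuffer(capacity=3)
for p in [5, 1, 9]:
    buf.insert(p)
print(buf.insert(4))   # buffer full, so the packet with priority 1 is dropped -> 1
print(buf.transmit())  # 9
```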
Let n be the total number of elements in the two priority queues that are to be combined. If heaps are used to
represent priority queues, then the combine operation takes O(n) time. Using a leftist tree, the combine
operation as well as the normal priority queue operations take logarithmic time.
Leftist trees are defined using the concept of an extended binary tree. An extended binary tree is a binary
tree in which all empty binary subtrees have been replaced by a square node. The square nodes in an
extended binary tree are called external nodes. The original (circular) nodes of the binary tree are called
internal nodes.
Let x be a node in an extended binary tree. Let left-child (x) and right-child (x), respectively, denote the left
and right children of the internal node x. Define shortest (x) to be the length of a shortest path from x to an
external node. It is easy to see that shortest (x) satisfies the following recurrence:
shortest (x) = 0 if x is an external node
shortest (x) = 1 + min {shortest (left-child (x)), shortest (right-child (x))} otherwise
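The definition of shortest (x) translates directly into a recursive function. In this sketch an external (square) node is represented by None, which is an assumption of the sketch; the `Node` class is hypothetical.

```python
class Node:
    """Internal node of an extended binary tree; a missing child (None) is an external node."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def shortest(x):
    """Length of a shortest path from x to an external node."""
    if x is None:                 # external node
        return 0
    return 1 + min(shortest(x.left), shortest(x.right))


leaf = Node()
print(shortest(leaf))                  # 1: both children are external
print(shortest(Node(Node(), None)))    # 1: the right child is external
print(shortest(Node(Node(), Node())))  # 2
```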
In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced
binary tree, is a binary search tree which provides the smallest possible search time (or expected search
time) for a given sequence of accesses (or access probabilities).
We solve the problem by first computing W (i, i+1), C (i, i+1) and R (i, i+1) for 0 ≤ i < 4,
then W (i, i+2), C (i, i+2) and R (i, i+2) for 0 ≤ i < 3, and repeating until W (0, n),
C (0, n) and R (0, n) are obtained.
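The order of computation described above can be sketched as a table-filling loop. The sketch assumes the usual dynamic-programming formulation, which is not spelled out in the text: W(i, i) = q_i, W(i, j) = W(i, j-1) + p_j + q_j, C(i, j) = W(i, j) + min over i < k ≤ j of C(i, k-1) + C(k, j), and R(i, j) is the k that attains the minimum. The weights p and q below are illustrative values for n = 4, and the function name `optimal_bst` is hypothetical.

```python
def optimal_bst(p, q):
    """Fill the W, C and R tables; p[1..n] are success weights (p[0] unused), q[0..n] failure weights."""
    n = len(q) - 1
    W = [[0] * (n + 1) for _ in range(n + 1)]
    C = [[0] * (n + 1) for _ in range(n + 1)]
    R = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        W[i][i] = q[i]                      # empty tree: C and R stay 0
    for length in range(1, n + 1):          # j - i = 1, then 2, ... up to n
        for i in range(n - length + 1):
            j = i + length
            W[i][j] = W[i][j - 1] + p[j] + q[j]
            # choose the root k that minimises the cost of the two subtrees
            k = min(range(i + 1, j + 1), key=lambda r: C[i][r - 1] + C[r][j])
            C[i][j] = W[i][j] + C[i][k - 1] + C[k][j]
            R[i][j] = k
    return W, C, R


p = [0, 3, 3, 1, 1]      # illustrative weights, not taken from the text
q = [2, 3, 1, 1, 1]
W, C, R = optimal_bst(p, q)
print(W[0][4], C[0][4], R[0][4])  # 16 32 2
```

R (0, n) gives the root of the optimal tree; the roots of its subtrees are read from R (0, k-1) and R (k, n) in the same way.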
Question Bank
1. Explain open addressing and chaining used to handle overflows in hashing. ( Appeared in Dec.
2016/Jan 2017)- 5 Marks
4. What is collision? What are the methods to resolve collision? Explain linear probing with an
example. (Appeared in June/July 2017)- 8 Marks
5. Write a short note on hashing-Explain any 3 popular HASH functions. (Appeared in June/July
2018)- 8 Marks
6. Explain in detail about static and dynamic hashing. (Appeared in Dec. 2018/Jan 2019)- 10 Marks
7. Explain Hashing and Collision. What are the methods used to resolve collision. (Appeared in
June/July 2019)- 8 Marks
8. What is hashing? Explain with example hash following Hashing function (Appeared in June/July
2019)- 6 Marks
9. Consider the following 4-digit employee numbers: 9614, 5882, 6713, 4409, 1825.
Find the 2-digit hash address of each number using: (Appeared in June/July 2019)- 8 Marks
i) The division method with m = 97.
ii) The midsquare method.
iii) The folding method without reversing.
iv) The folding method with reversing.