Foundations and Trends® in Theoretical Computer Science > Vol 12 > Issue 3–4

Hashing, Load Balancing and Multiple Choice

By Udi Wieder, VMware Research, USA

 
Suggested Citation
Udi Wieder (2017), "Hashing, Load Balancing and Multiple Choice", Foundations and Trends® in Theoretical Computer Science: Vol. 12: No. 3–4, pp 275-379. http://dx.doi.org/10.1561/0400000070

Publication Date: 11 Jul 2017
© 2017 U. Wieder
 
Subjects
Private and Secure Data Management, Theory, Optimization, Data structures, Design and analysis of algorithms, Stochastic Optimization
 

In this article:
1. Introduction
2. Simple Hashing - the One Choice Scheme
3. Multiple Choice Schemes
4. The Heavily Loaded Case
5. Dictionaries
Acknowledgments
References

Abstract

Many tasks in computer systems can be abstracted as distributing items into buckets so that the allocation of items across buckets is as balanced as possible and, given an item’s identifier, the bucket to which it was assigned can be determined quickly. A canonical example is a dictionary data structure, where ‘items’ stands for key-value pairs and ‘buckets’ for memory locations. Another example is a distributed key-value store, where the buckets represent locations on disk or even whole servers. A third example is a distributed execution engine, where items represent processes and buckets represent compute devices. A common technique in this domain is the use of a hash function that maps an item to a relatively short fixed-length string. The hash function is then used to associate the item with its bucket. Hashing is typically only the first step in the solution; additional algorithmic ideas are required to deal with collisions and the imbalance of hash values. In this monograph we survey some of these techniques. We focus on multiple choice schemes, in which items are placed into buckets using several independent hash functions, and an item is typically placed in the least loaded of its candidate buckets at the time of placement. We analyze the resulting distributions in detail and show how these ideas can be used to design basic data structures. With respect to data structures we focus on dictionaries, presenting linear probing, cuckoo hashing and many of their variants.
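
The multiple choice placement rule described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the monograph: the function names (`bucket_choices`, `place_items`, `find_bucket`) are hypothetical, and salted SHA-256 digests are assumed here as a stand-in for several independent hash functions. Each item is placed in the least loaded of its d candidate buckets, and a lookup only needs to inspect those same d buckets.

```python
import hashlib

def bucket_choices(item: str, n_buckets: int, d: int) -> list[int]:
    """Return d candidate buckets for an item.

    Assumption: salting SHA-256 with the index i stands in for
    d independent hash functions.
    """
    choices = []
    for i in range(d):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        choices.append(int.from_bytes(digest[:8], "big") % n_buckets)
    return choices

def place_items(items, n_buckets: int, d: int) -> list[list[str]]:
    """Greedy d-choice placement: put each item in its least loaded candidate."""
    buckets = [[] for _ in range(n_buckets)]
    for item in items:
        candidates = bucket_choices(item, n_buckets, d)
        # Ties are broken in favor of the earlier hash function.
        target = min(candidates, key=lambda b: len(buckets[b]))
        buckets[target].append(item)
    return buckets

def find_bucket(item: str, buckets, d: int):
    """Locate an item by checking only its d candidate buckets."""
    for b in bucket_choices(item, len(buckets), d):
        if item in buckets[b]:
            return b
    return None

if __name__ == "__main__":
    items = [f"item-{k}" for k in range(10_000)]
    for d in (1, 2):
        buckets = place_items(items, n_buckets=10_000, d=d)
        print(f"d={d}: maximum load = {max(len(b) for b in buckets)}")
```

Running the script with d = 1 (the one choice scheme) and d = 2 prints the maximum bucket load for each, which gives a rough feel for how a second choice affects balance.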

DOI:10.1561/0400000070
Paperback: ISBN 978-1-68083-282-2, 120 pp., $85.00
E-book (.pdf): ISBN 978-1-68083-283-9, 120 pp., $260.00

Hashing, Load Balancing and Multiple Choice

Many tasks in computer systems can be abstracted as distributing items into buckets so that the allocation of items across buckets is as balanced as possible and, given an item’s identifier, the bucket to which it was assigned can be determined quickly. A canonical example is a dictionary data structure, where ‘items’ stands for key-value pairs and ‘buckets’ for memory locations. Another example is a distributed key-value store, where the buckets represent locations on disk or even whole servers. A third example is a distributed execution engine, where items represent processes and buckets represent compute devices. A common technique in this domain is the use of a hash function that maps an item to a relatively short fixed-length string. The hash function is then used to associate the item with its bucket. Hashing is typically only the first step in the solution; additional algorithmic ideas are required to deal with collisions and the imbalance of hash values.

Hashing, Load Balancing and Multiple Choice presents some of the basic algorithmic ideas that underpin many of the practical and theoretically interesting approaches to this problem. It focuses on multiple choice schemes, in which items are placed into buckets using several independent hash functions, and an item is typically placed in the least loaded of its candidate buckets at the time of placement. It analyzes the resulting distributions and shows how these ideas can be used to design basic data structures. With respect to data structures it focuses on dictionaries, presenting linear probing, cuckoo hashing and many of their variants.
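
As a rough illustration of one of the dictionary schemes named above, the following Python sketch shows a textbook form of linear probing. It is not the monograph's presentation: the class name `LinearProbingDict` is hypothetical, the table has a fixed capacity, and deletion and resizing are omitted for brevity. A key is hashed to a starting slot, and both insertion and lookup scan forward cyclically from that slot until they find the key or an empty slot.

```python
class LinearProbingDict:
    """Fixed-capacity linear probing table (illustrative sketch only).

    Assumptions: no deletion, no resizing, Python's built-in hash()
    as the hash function.
    """

    def __init__(self, capacity: int = 16):
        # Each slot is either None (empty) or a (key, value) pair.
        self._slots = [None] * capacity

    def _probe(self, key):
        """Yield slot indices from the key's home slot, wrapping around once."""
        n = len(self._slots)
        start = hash(key) % n
        for step in range(n):
            yield (start + step) % n

    def put(self, key, value):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None or slot[0] == key:
                # Empty slot: insert. Matching key: overwrite.
                self._slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                # An empty slot ends the probe sequence: key is absent.
                return default
            if slot[0] == key:
                return slot[1]
        return default
```

For example, with `capacity=8` the integer keys 1 and 9 share the same home slot, so the second one inserted is stored in the next free slot and is still found by `get`.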

 
TCS-070