Lecture 5: Linear Sorting: Review
6.006
Massachusetts Institute of Technology
Instructors: Erik Demaine, Jason Ku, and Justin Solomon
Review
• Comparison search lower bound: any decision tree with n nodes has height ≥ ⌈lg(n+1)⌉ − 1
• Can search faster using random access indexing: an operation with linear branching factor!
• Direct access array is fast, but may use a lot of space (Θ(u))
• Hashing: make the expected running time input-independent by choosing the hash function randomly from a universal hash family
• Last time we achieved faster find. Can we also achieve faster sort?
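The height bound in the first review bullet can be checked numerically. The sketch below uses hypothetical helper names: it computes ⌈lg(n+1)⌉ − 1 exactly with integers, using the fact that for n ≥ 1 the value n.bit_length() equals ⌈lg(n+1)⌉, and compares it against a brute-force search for the smallest height h whose binary tree (at most 2^(h+1) − 1 nodes) can hold n nodes.

```python
def min_height_formula(n: int) -> int:
    """ceil(lg(n + 1)) - 1, computed exactly with integers (assumes n >= 1)."""
    # For n >= 1: 2^(k-1) <= n < 2^k with k = n.bit_length(), so ceil(lg(n + 1)) == k.
    return n.bit_length() - 1


def min_height_brute_force(n: int) -> int:
    """Smallest h such that a binary tree of height h can hold n nodes."""
    h = 0
    while 2 ** (h + 1) - 1 < n:   # a height-h binary tree has at most 2^(h+1) - 1 nodes
        h += 1
    return h


for n in range(1, 1025):
    assert min_height_formula(n) == min_height_brute_force(n)

print(min_height_formula(7), min_height_formula(8))  # 2 3
```

Seven nodes fit in a perfect tree of height 2, while an eighth node forces height 3.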
Operations O(·)
                       Container   Static    Dynamic     Order
Data Structure         build(X)    find(k)   insert(x)   find_min()   find_prev(k)
Array                  n           n         n           n            n
Sorted Array           n log n     log n     n           1            log n
Direct Access Array    u           1         1           u            u
Hash Table             n(e)        1(e)      1(a)(e)     n            n
(e) = expected, (a) = amortized
Direct Access Array Sort
```python
def direct_access_sort(A):
    "Sort A assuming items have distinct non-negative keys"
    u = 1 + max(x.key for x in A)      # O(n) find maximum key
    D = [None] * u                     # O(u) direct access array
    for x in A:                        # O(n) insert items
        D[x.key] = x
    i = 0
    for key in range(u):               # O(u) read out items in order
        if D[key] is not None:
            A[i] = D[key]
            i += 1
```
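A small usage sketch, assuming a hypothetical `Item` class (any object with a non-negative integer `.key` works). It repeats `direct_access_sort` so the block runs on its own, and it also shows what happens when the distinct-keys assumption is violated.

```python
class Item:
    """Hypothetical record type: any object with a non-negative integer .key works."""
    def __init__(self, key, value):
        self.key = key
        self.value = value


def direct_access_sort(A):
    "Sort A assuming items have distinct non-negative keys"
    u = 1 + max(x.key for x in A)      # O(n) find maximum key
    D = [None] * u                     # O(u) direct access array
    for x in A:                        # O(n) insert items
        D[x.key] = x
    i = 0
    for key in range(u):               # O(u) read out items in order
        if D[key] is not None:
            A[i] = D[key]
            i += 1


A = [Item(5, "e"), Item(0, "a"), Item(3, "d"), Item(1, "b")]
direct_access_sort(A)
print([(x.key, x.value) for x in A])   # [(0, 'a'), (1, 'b'), (3, 'd'), (5, 'e')]

# Repeated keys break the distinctness assumption: Item(2, "x") is overwritten
# by Item(2, "y"), and the last slot of B is never rewritten.
B = [Item(2, "x"), Item(2, "y"), Item(0, "z")]
direct_access_sort(B)
print([(x.key, x.value) for x in B])   # [(0, 'z'), (2, 'y'), (0, 'z')]
```

The running time is O(n + u): linear whenever u = O(n), but dominated by the u-sized array when keys are spread out.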
Tuple Sort
• Item keys are tuples of equal length, i.e. item x.key = (x.k1, x.k2, x.k3, ...).
• Want to sort on all entries lexicographically, so the first key k1 is the most significant
• How to sort? Idea! Use other auxiliary sorting algorithms to separately sort each key
• (Like sorting rows in a spreadsheet by multiple columns)
• What order to sort them in? Least significant to most significant!
• Exercise: [32, 03, 44, 42, 22] ⇒ [42, 22, 32, 03, 44] ⇒ [03, 22, 32, 42, 44] (n = 5)
• Idea! If u < n², write each key k as a tuple (a, b) with k = a·n + b and 0 ≤ a, b < n, then use tuple sort with auxiliary direct access array sort to sort the tuples (a, b).
• Problem! Many integers could have the same a or b value, even if the input keys are distinct
• Need sort allowing repeated keys which preserves input order
• Want sort to be stable: repeated keys appear in output in same order as input
• Direct access array sort cannot even sort arrays having repeated keys!
• Can we modify direct access array sort to admit multiple keys in a way that is stable?
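A minimal sketch of tuple sort, assuming Python's built-in `sorted` (documented to be stable) as the auxiliary sort in place of the direct access or counting sort used in these notes. The names `tuple_sort` and `digits` are hypothetical. It makes one stable pass per tuple position, least significant position first, and reproduces the exercise above.

```python
def tuple_sort(keys, width):
    """Lexicographically sort equal-length tuples with one stable pass per position,
    from the least significant (last) position to the most significant (first)."""
    out = list(keys)
    for pos in reversed(range(width)):
        # sorted() is stable: ties on position pos keep the order from the previous pass
        out = sorted(out, key=lambda t: t[pos])
    return out


def digits(k):
    """Hypothetical helper: two-digit decimal number -> (tens, ones) tuple."""
    return divmod(k, 10)


nums = [32, 3, 44, 42, 22]
result = tuple_sort([digits(k) for k in nums], width=2)
print([10 * a + b for a, b in result])  # [3, 22, 32, 42, 44]
```

Because each pass is stable, the list after the first (ones-digit) pass is [32, 42, 22, 03, 44]: the three keys ending in 2 keep their input order. For this input any sort by the ones digit happens to work, but only a stable one guarantees the final pass leaves earlier orderings intact in general.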
Counting Sort
• Instead of storing a single item at each array index, store a chain, just like hashing!
• For stability, the chain data structure should remember the order in which items were added
• Use a sequence data structure which maintains insertion order
• To insert item x, insert it last, at the end of the chain at index x.key
• Then to sort, read through all chains in sequence order, returning items one by one
```python
def counting_sort(A):
    "Sort A assuming items have non-negative keys"
    u = 1 + max(x.key for x in A)      # O(n) find maximum key
    D = [[] for _ in range(u)]         # O(u) direct access array of chains
    for x in A:                        # O(n) insert into chain at x.key
        D[x.key].append(x)
    i = 0
    for chain in D:                    # O(u) read out items in order
        for x in chain:
            A[i] = x
            i += 1
```
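To see stability in action, here is a sketch with a hypothetical `Item` class carrying a label. Items with equal keys come out in the order they went in, because each chain is a Python list that only grows at its end (the "insert last" of the bullets above). `counting_sort` is repeated so the block runs on its own.

```python
class Item:
    """Hypothetical record type with a non-negative integer .key and a label."""
    def __init__(self, key, label):
        self.key = key
        self.label = label


def counting_sort(A):
    "Sort A assuming items have non-negative keys"
    u = 1 + max(x.key for x in A)      # O(n) find maximum key
    D = [[] for _ in range(u)]         # O(u) direct access array of chains
    for x in A:                        # O(n) insert into chain at x.key
        D[x.key].append(x)
    i = 0
    for chain in D:                    # O(u) read out items in order
        for x in chain:
            A[i] = x
            i += 1


A = [Item(2, "a"), Item(0, "b"), Item(2, "c"), Item(1, "d"), Item(0, "e")]
counting_sort(A)
print([(x.key, x.label) for x in A])
# [(0, 'b'), (0, 'e'), (1, 'd'), (2, 'a'), (2, 'c')]
```

Total time is O(n + u), the same as direct access array sort, but repeated keys are now kept rather than overwritten.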
Radix Sort
• Idea! If u < n², use tuple sort with auxiliary counting sort to sort tuples (a, b)
• Sort least significant key b, then most significant key a
• Stability ensures previous sorts stay sorted
• Running time for this algorithm is O(2n) = O(n). Yay!
• If every key < n^c for some positive c = log_n(u), every key has at most c digits in base n
• A c-digit number can be written as a c-element tuple in O(c) time
• We sort each of the c base-n digits in O(n) time
• So tuple sort with auxiliary counting sort runs in O(cn) time in total
• If c is constant, so that each key is ≤ n^c, this sort is linear: O(n)!
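The two-pass case (u ≤ n²) from the first bullets can be sketched on plain integers. The helper names `counting_sort_by` and `sort_keys_below_n_squared` are hypothetical, and unlike the notes' code they return new lists instead of sorting in place. Each key k is split as (a, b) = divmod(k, n), so k = a·n + b, and a stable counting sort runs on b first and then on a. Both digits lie in [0, n), so each pass costs O(n).

```python
def counting_sort_by(A, key, u):
    """Stable counting sort of list A by key(x), assuming 0 <= key(x) < u.
    Returns a new list."""
    chains = [[] for _ in range(u)]
    for x in A:
        chains[key(x)].append(x)       # appending keeps input order within a chain
    return [x for chain in chains for x in chain]


def sort_keys_below_n_squared(keys):
    """Sort non-negative integers that are all < n**2, where n = len(keys)."""
    n = len(keys)
    if n <= 1:
        return list(keys)
    assert all(0 <= k < n * n for k in keys), "this sketch assumes every key < n^2"
    out = counting_sort_by(keys, lambda k: k % n, n)    # least significant digit b
    out = counting_sort_by(out, lambda k: k // n, n)    # most significant digit a
    return out


print(sort_keys_below_n_squared([17, 3, 24, 8, 11]))   # n = 5, keys < 25
# [3, 8, 11, 17, 24]
```

After the first pass the list is [11, 17, 3, 8, 24] (ordered by k mod 5). The second pass orders by k // 5, and stability keeps 3 ahead of 8, which share neither digit conflict but would tie on a alone if they had the same quotient.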
```python
def radix_sort(A):
    "Sort A assuming items have non-negative integer keys"
    n = len(A)
    if n <= 1:                             # already sorted; also avoids base n = 1
        return
    u = 1 + max(x.key for x in A)          # O(n) find maximum key
    c = 1                                  # c = number of base-n digits:
    while n ** c < u:                      #   smallest c >= 1 with n^c >= u
        c += 1
    class Obj: pass                        # holds one item and its digit tuple
    D = [Obj() for _ in A]
    for i in range(n):                     # O(nc) make digit tuples
        D[i].digits = []
        D[i].item = A[i]
        high = A[i].key
        for _ in range(c):                 # O(c) make digit tuple
            high, low = divmod(high, n)
            D[i].digits.append(low)        # least significant digit first
    for i in range(c):                     # O(nc) sort each digit
        for j in range(n):                 # O(n) assign key i to tuples
            D[j].key = D[j].digits[i]
        counting_sort(D)                   # O(n) stable sort on digit i
    for i in range(n):                     # O(n) output to A
        A[i] = D[i].item
```
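A quick check of radix sort, assuming a hypothetical `Item` wrapper and a hypothetical `sorted_keys` helper. `counting_sort` is repeated so the block runs on its own. This sketch computes c as the smallest c ≥ 1 with n^c ≥ u (that is, ⌈log_n u⌉, at least 1), so every key fits in c base-n digits. The second input is useful because n = 4 and its largest key, 510, needs five base-4 digits.

```python
class Item:
    """Hypothetical record type with a non-negative integer .key."""
    def __init__(self, key):
        self.key = key


def counting_sort(A):
    "Stable sort of A by x.key, assuming non-negative integer keys"
    u = 1 + max(x.key for x in A)
    D = [[] for _ in range(u)]
    for x in A:
        D[x.key].append(x)
    i = 0
    for chain in D:
        for x in chain:
            A[i] = x
            i += 1


def radix_sort(A):
    "Sort A in place, assuming items have non-negative integer keys"
    n = len(A)
    if n <= 1:
        return
    u = 1 + max(x.key for x in A)
    c = 1
    while n ** c < u:                  # smallest c with n^c >= u
        c += 1
    class Obj: pass
    D = [Obj() for _ in A]
    for i in range(n):
        D[i].item = A[i]
        D[i].digits = []
        high = A[i].key
        for _ in range(c):
            high, low = divmod(high, n)
            D[i].digits.append(low)    # least significant digit first
    for i in range(c):                 # one stable counting sort per digit
        for obj in D:
            obj.key = obj.digits[i]
        counting_sort(D)
    for i in range(n):
        A[i] = D[i].item


def sorted_keys(keys):
    """Hypothetical helper: wrap ints in Items, radix sort them, unwrap."""
    A = [Item(k) for k in keys]
    radix_sort(A)
    return [x.key for x in A]


print(sorted_keys([329, 457, 657, 839, 436, 720, 355]))
# [329, 355, 436, 457, 657, 720, 839]
print(sorted_keys([510, 254, 3, 255]))   # n = 4, largest key needs 5 base-4 digits
# [3, 254, 255, 510]
```

Each of the c passes is a counting sort over digits in [0, n), so the total is O(nc), matching the analysis above.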
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms