B-Trees: COL 106 Shweta Agrawal, Amit Kumar
Slide Credit: Yael Moses, IDC Herzliya
Motivation
Large differences in access time between
disk, cache memory and core memory
Minimize expensive accesses
(e.g., disk accesses)
B-tree: an implementation of dynamic sets
that is optimized for disks
B-Trees
A B-tree is an M-way search tree with the following properties:
1. It is perfectly balanced: every leaf node is at the same
depth
2. Every internal node other than the root is at least half full, i.e. M/2-1 ≤ #keys ≤ M-1
3. Every internal node with k keys has k+1 non-null
children
For simplicity we consider M even and we use t=M/2:
2.* Every internal node other than the root is at least half full, i.e. t-1 ≤ #keys ≤ 2t-1, t ≤ #children ≤ 2t
[Figure: two example trees, one labeled "B-tree" and one labeled "4-way tree"]
B-tree
1. It is perfectly balanced: every leaf node is at the same
depth.
2. Every node, except maybe the root, is at least half full:
t-1 ≤ #keys ≤ 2t-1
3. Every internal node with k keys has k+1 non-null children
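The properties listed above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the `Node` class and the function name `is_btree` are hypothetical, a leaf is assumed to be a node with an empty child list, and the check covers only balance, key counts and child counts (it does not verify the search-tree ordering between a node and its children).

```python
class Node:
    """Hypothetical node type: sorted keys plus a list of children."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list marks a leaf


def is_btree(root, t):
    """Check balance, key counts and child counts for minimum degree t."""
    leaf_depths = set()

    def visit(node, depth, is_root):
        k = len(node.keys)
        low = 1 if is_root else t - 1          # the root may hold a single key
        if not (low <= k <= 2 * t - 1):
            return False
        if node.keys != sorted(node.keys):
            return False
        if not node.children:                  # leaf: remember its depth
            leaf_depths.add(depth)
            return True
        if len(node.children) != k + 1:        # k keys need k+1 children
            return False
        return all(visit(c, depth + 1, False) for c in node.children)

    # Property 1: all leaves must sit at one single depth.
    return visit(root, 0, True) and len(leaf_depths) == 1


balanced = Node([20, 40], [Node([0, 5, 10]), Node([25, 35]), Node([45, 55])])
unbalanced = Node([10, 40], [Node([0, 5]), Node([25, 35]),
                             Node([45], [Node([42]), Node([55])])])
print(is_btree(balanced, t=2), is_btree(unbalanced, t=2))
```

The second tree is a legal 4-way tree node by node, but its leaves sit at two different depths, so it fails the balance property.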
B-tree Height
Claim: any B-tree with n keys, height h and
minimum degree t satisfies:
$h \le \log_t \frac{n+1}{2}$
Proof:
The minimum number of KEYS for a tree with
height h is obtained when:
The root contains one key
All other nodes contain t-1 keys
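The proof outline above can be completed with a short count. This is a sketch under one assumption not stated on the slide: the root is at depth 0 and h is the depth of the leaves. In the minimal tree the root has 2 children and every other internal node has t children, so there are $2t^{i-1}$ nodes at depth i, each holding t-1 keys.

```latex
\begin{align*}
n &\ge 1 + (t-1)\sum_{i=1}^{h} 2t^{\,i-1} \\
  &= 1 + 2(t-1)\,\frac{t^{h}-1}{t-1} \\
  &= 2t^{h} - 1 \\
\Longrightarrow\quad t^{h} &\le \frac{n+1}{2}
\quad\Longrightarrow\quad h \le \log_t \frac{n+1}{2}
\end{align*}
```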
B-Tree: Insert X
1.
Fix an Overflowed
1.
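[Figure: the overflowed node 60 65 68 83 86 90 is split into 60 65 and a new node z holding 83 86 90; the middle key 68 moves up to the parent]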
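The split of an overflowed node can be sketched on a plain key list. The function name `split_overflowed` is hypothetical, and the choice of which of the two middle keys moves up is an assumption: the code takes the upper middle (index t), while the 60 65 68 83 86 90 figure takes the lower one. Either choice leaves both halves with at least t-1 keys.

```python
def split_overflowed(keys, t):
    """Split a sorted key list that has overflowed to 2t keys.

    Returns (left, median, right); the median is the key that moves
    up into the parent.
    """
    if len(keys) != 2 * t:
        raise ValueError("expected exactly 2t keys in an overflowed node")
    mid = t  # assumption: upper middle; index t - 1 would be equally valid
    return keys[:mid], keys[mid], keys[mid + 1:]


left, median, right = split_overflowed([61, 62, 66, 70, 74, 78], t=3)
print(left, median, right)  # [61, 62, 66] 70 [74, 78]
```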
Insert example
M = 6; t = 3
Insert 3:
[Figure: the tree before and after inserting 3. The root holds 20 40 60 80; the leaves shown include 10 15, 25 35, 45 55, 62 66 70 74 78 and 87 98.]
Insert 61:
M = 6; t = 3
[Figure: 61 is added to the leaf 62 66 70 74 78, giving 61 62 66 70 74 78. OVERFLOW. SPLIT IT: the leaf is split into 61 62 66 and 74 78, and 70 moves up into the root, which becomes 20 40 60 70 80.]
Insert 38:
M = 6; t = 3
[Figure: 38 is added to the leaf 25 35, giving 25 35 38. No overflow occurs.]
Insert 4:
M = 6; t = 3
[Figure: the leftmost leaf (shown as 3 4 5 10 15) is marked OVERFLOW. SPLIT IT: 5 moves up and 10 15 becomes a separate leaf, so the root becomes 5 20 40 60 70 80, which is again marked OVERFLOW.]
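[Figure: the overflowed root 5 20 40 60 70 80. SPLIT IT: the root is split into 5 20 40 and 70 80, and 60 becomes the new root, so the tree grows by one level. The leaves shown include 10 15, 25 35 38, 45 55, 61 62 66, 74 78 and 87 98.]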
Complexity Insert
Inserting a key into a B-tree of height h is
done in a single pass down the tree and a
single pass up the tree
Complexity: O(h) = O(log_t n)
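The pass down and pass up can be sketched with recursion: the descent is the pass down, and the returns from the recursive calls are the pass up that repairs overflows. This is a minimal sketch under stated assumptions, not the course's reference code: the `Node` class and both function names are hypothetical, keys are assumed distinct, and the upper middle key is the one promoted on a split.

```python
import bisect


class Node:
    """Hypothetical node type: sorted keys plus a list of children."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list marks a leaf


def insert(root, key, t):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, t)
    if promoted is None:
        return root
    median, right = promoted
    return Node([median], [root, right])   # root was split: tree grows a level


def _insert(node, key, t):
    """Descend to a leaf, then fix overflows while returning upward.

    Returns None, or (median, new_right_sibling) if this node was split.
    """
    i = bisect.bisect_left(node.keys, key)
    if not node.children:                   # leaf: place the key here
        node.keys.insert(i, key)
    else:
        promoted = _insert(node.children[i], key, t)
        if promoted is not None:            # child split: absorb its median
            median, right = promoted
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * t - 1:
        return None                         # no overflow at this node
    mid = t                                 # node now holds 2t keys
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right


# The tree from the insert example before inserting 61 (t = 3).
root = Node([20, 40, 60, 80],
            [Node([10, 15]), Node([25, 35]), Node([45, 55]),
             Node([62, 66, 70, 74, 78]), Node([87, 98])])
root = insert(root, 61, t=3)
print(root.keys)                                   # [20, 40, 60, 70, 80]
print(root.children[3].keys, root.children[4].keys)  # [61, 62, 66] [74, 78]
```

Each level is visited once on the way down and at most once on the way back, which matches the O(h) bound stated above.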
B-Tree: Delete X
Delete as in M-way tree
A problem:
might cause underflow: the number
of keys remaining in a node is < t-1
Recall: The root should have at least 1 value in it, and all other nodes should
have at least t-1 values in them
Underflow Example
M = 6; t = 3
Delete 87:
[Figure: deleting 87 from the leaf 87 98 leaves only 98 in that leaf, fewer than t-1 = 2 keys. UNDERFLOW.]
B-Tree: Delete X
Delete as in M-way tree
A problem:
might cause underflow: the number
of keys remaining in a node is < t-1
Solution:
make sure every node visited on the way down has
at least t keys instead of t-1
Recall: The root should have at least 1 value in it, and all other nodes should
have at least t-1 (at most 2t-1) values in them
B-Tree-Delete(x,k)
1st case: k is in x and x is a leaf
delete k
Example, k=66: the leaf 62 66 70 74 becomes 62 70 74
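The leaf case, together with the underflow test, can be sketched on a plain key list. The function name `delete_from_leaf` is hypothetical, and the sketch only reports an underflow; repairing it is outside its scope.

```python
def delete_from_leaf(keys, k, t):
    """Remove k from a non-root leaf's sorted key list.

    Returns (new_keys, underflow); underflow is True when fewer than
    t - 1 keys remain in the leaf.
    """
    if k not in keys:
        raise KeyError(k)
    new_keys = [key for key in keys if key != k]
    return new_keys, len(new_keys) < t - 1


print(delete_from_leaf([62, 66, 70, 74], 66, t=3))  # ([62, 70, 74], False)
print(delete_from_leaf([87, 98], 87, t=3))          # ([98], True)
```

The second call corresponds to deleting 87 with t = 3: a single remaining key is below the minimum of t-1 = 2.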
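Example t=3
[Figure: k=50 is in the internal node x, which holds 30 50 70. After the deletion x holds 30 45 70, i.e. 50 has been replaced by 45, taken from the child holding 35 40 45.]
Example t=3
[Figure: a further deletion example involving a node labeled z.]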
Questions
When does the height of the tree shrink?
Why do we need the number of keys to be
at least t and not t-1 when we proceed
down in the tree?
Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Delete Complexity
Basically a downward pass:
Most of the keys are in the leaves:
one downward pass
When deleting a key in an internal node, we
may have to go one step up to
replace the key with its predecessor
or successor
Complexity
O(h) = O(log_t n)
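To get a feel for how small h is in practice, the height bound h ≤ log_t((n+1)/2) can be evaluated for concrete values. The function name `max_height` is hypothetical, and the values of n and t below are illustrative choices, not figures from the slides. Integer arithmetic is used to avoid floating-point rounding in the logarithm.

```python
def max_height(n, t):
    """Largest h with 2*t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2))."""
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 keys and minimum degree t >= 2")
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


print(max_height(10**9, 1001))  # 2
print(max_height(10**6, 3))     # 11
```

With a large minimum degree, even a billion keys fit in a tree whose leaves are only two levels below the root, so an operation touches very few nodes.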
A typical B-Tree
Why B-Tree?
A B-tree is an implementation of dynamic
sets that is optimized for disks
Memory has a hierarchy, and there is a
tradeoff between the size of units/blocks and access
time
The goal is to minimize the number of
accesses to memory with expensive access
time
The size of a node is determined by
characteristics of the disk (block size, page
size)
The number of accesses is proportional to the tree