0% found this document useful (0 votes)
13 views

B Trees

Uploaded by

kr gael
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views

B Trees

Uploaded by

kr gael
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 76

B-trees

Eduardo Laber
David Sotelo
What are B-trees?
• Balanced search trees designed for secondary
storage devices

• Similar to AVL-trees but better at minimizing


disk I/O operations

• Main data structure used by DBMS to store


and retrieve information
What are B-trees?
• Nodes may have many children (from a few to
thousands)

• Branching factor can be quite large

• Every B-tree of n keys has height O(log n)

• In practice, its height is smaller than the height of


an AVL-Tree
B-trees and Branching Factor
Definition of B-trees
B-tree is a rooted tree containing the following
five properties:

1. Every node x has the following attributes:


a) x.n, the number of keys stored in node x
b) The x.n keys: x.key1 ≤ x.key2 ≤ ... ≤ x.keyx.n
c) The boolen x.leaf indicating if x is a leaf or an
internal node
Definition of B-trees
2. If x is an internal node it contains x.n + 1
pointers x.p1 , x.p2 , ... , x.p(x.n + 1) to its children

3. The keys x.keyi separate ranges of trees stored


in each subtree (x.pi , x.pi+1 )

4. All leaves have the same depth == tree’s


height.
Definition of B-trees
5. Bounds on the number of keys of a node:

• Let B be a positive integer representing the order of


the B-tree.

• Every node (except the root) must have at least B keys.

• Every node (except the root) must have at most 2B


keys.

• Root is free to contain between 1 and 2B nodes (why?)


Example of a B-tree
Exercise 1

Enumerate all valid B-trees of order 2


that represent the set {1, 2, ... , 8}
Exercise 1
Solution:

4 5

1 2 3 5 6 7 8 1 2 3 4 6 7 8

3 6

1 2 4 5 7 8
The height of a B-tree
Theorem : Let h be the height of a B-tree of n
keys and order B > 1. Then: h ≤ log B (n+1)/2

Proof:
• Root contains at least one key.
• All other nodes contain at least B keys
• At least one key at depth 0
• At least 2B keys at depth 1
• At least 2B2 + B keys at depth 2
• At least 2Bi + Bi-1 + Bi-2 + ... + B keys at depth i
Proof (continued)
h
n 1   2 B i

i 1
h 1
B 1
 n 1  2
B 1
h n 1 h
 n 2 B  1  B
2
n 1
 h log B ■
2
Searching a B-tree
• Similar to searching a binary search tree.

• Multiway branching decision according to the


number of the node’s chidren.

• Recursive procedure with a time complexity of


O(B log B n) for a tree of n keys and order B.
Searching a B-tree
B-TREE-SEARCH (x, k)
1 i=1
2 while i ≤ x.n and k > x.keyi do i = i + 1
3 if i ≤ x.n and k == x.keyi then return (x, i)
4 if x.leaf then return NIL
5 else
DISK-READ(x.pi)
return B-TREE-SEARCH (x.pi , k)
Searching a B-tree
• Search for the key F

C G K P S

A B D E F H I L M N O Q R T U
Searching a B-tree
• Search for the key F

C G K P S

A B D E F H I L M N O Q R T U
Searching a B-tree
• Search for the key F

C G K P S

A B D E F H I L M N O Q R T U
Searching a B-tree
• Search for the key N

C G K P S

A B D E F H I L M N O Q R T U
Searching a B-tree
Lemma: The time complexity of procedure
B-TREE-SEARCH is O(B log B n)

Proof:
• Number of recursive calls is equal to tree’s
height.
• The height of a B-tree is O(log B n)
• Cost between B and 2B iterations per call.
• Total of O(B log B n) steps. ■
Exercise 2
• Suppose that B-TREE-SEARCH is implemented
to use binary search rather than linear search
within each node.

• Show that this changes makes the time


complexity O(lg n), independently of how B
might be chosen as a function of n.
Exercise 2
Solution:
• By using binary search the number of steps of
the algorithm becomes O(lg B log B n) .

• Observe that log B n = lg n / lg B .

• Therefore O(lg B log B n) = O(lg n).


Linear or Binary B-tree search ?
Lemma: If 1 < B < n then lg n ≤ B logB n

Proof: lg n B
B
B log B n log B n 
lg B
B

lg n  B 
n B  B n B / B
 B
   
B
 lg n  B lg n / B lg n / n B

 lg n B
/ n lg n lg n
B 1
Inserting a key into a B-tree
• The new key is always inserted into an existing leaf node (why?)

• Firstly we search for the leaf position at which to insert the new
key.

• If such a node is full we split it.

• A split operation splits a full node around its median key into
two nodes having B keys each.

• Median key moves up into splitted node’s parent


(insertion recursive call).
Split operation
• Inserting key F into a full node (B = 2)

A C E G K M O Q
Split operation
• Node found but already full

A C E FG K M O Q
Split operation
• Median key identified

A C E FG K M O Q
Split operation
• Splitting the node

E J

A C F G K M O Q
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)

E J T X

A C F G K M O Q UW Y Z
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)

E J T X

A C F G K MNO Q UW Y Z
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)

E J N T X

A C F G K M O Q UW Y Z

SPLIT
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)

E J SPLIT T X

A C F G K M O Q U W Y Z
Inserting a key into a B-tree
B-TREE-INSERT (x, k, y)
1 i=1
2 while i ≤ x.n and k < x.keyi do i = i + 1
3 x.n = x.n + 1
4 x.keyi = k
5 x.pi+1 = y
6 for j = x.n downto i+1 do
7 x.keyj = x.keyj-1
8 x.pj = x.pj-1
9 end-for
10 DISK-WRITE(x)
Inserting a key into a B-tree
B-TREE-INSERT (x, k)
11 if x.n > 2*B then
12 [m, z] = SPLIT (x)
13 if x.parent != NIL then
14 DISK-READ (x.parent)
15 end-if
16 else
17 x.parent = ALLOCATE-NODE()
18 DISK-WRITE (x)
19 root = x.parent
20 end-else
21 B-TREE-INSERT (x.parent, m, z)
22 end-if
Inserting a key into a B-tree
SPLIT (x)
1 z = ALLOCATE-NODE()
2 m = FIND-MEDIAN (x)
3 COPY-GREATER-ELEMENTS(x, m, z)
4 DISK-WRITE (z)
5 COPY-SMALLER-ELEMENTS(x, m, x)
6 DISK-WRITE (x)
7 return [m, z]
Inserting a key into a B-tree
• Function B-TREE-INSERT has three arguments:
– The node x at which an element of key k should be inserted
– The key k to be inserted
– A pointer y to the left child of k to be used as one of the
pointers of x during insertion process.

• There is a global variable named root which is a pointer to the root


of the B-Tree.

• Observe that the field x.parent was not defined as an original B-


tree attribute, but is considered just to simplify the process.

• The fields x.leaf should also be updated accordingly.


Inserting a key into a B-tree
Lemma: The time complexity of B-TREE-INSERT is O(B log B n)
Proof:
• Recall that B-TREE-SEARCH function is called first and costs
O(log n) by using binary search. Then, B-TREE-INSERT starts by
visiting a node and proceeds upward.

• At most one node is visited per level/depth and only visited nodes
can be splitted. A most one node is created during the insertion
process. Cost for splitting is proportional to 2B.

• Number of visited nodes is equal to tree’s height and the height of


a B-tree is O(log B n). Cost between B and 2B iterations per visited
node. Total of O(B log B n) steps. ■
Some questions on insertion
• Which split operation increases the tree’s height?
The split of the root of the tree.

• How many DISK-READ operations are executed by the


insertion algorithm?
Every node was read at least twice.

• Does binary search make sense here?


Not exactly. We already pay O(B) to split a node (for
finding the median).
Drawbacks of our insertion method
• Once that the key’s insertion node is found it may be
necessary to read its parent node again (due to splitting).

• DISK-READ/WRITE operations are expensive and would be


executed al least twice for each node in the key’s path.

• It would be necessary to store a nodes’s parent or to use


the recursion stack to keep its reference.

• (Mond and Raz, 1985) provide a solution that spends one


DISK-READ/WRITE per visited node (See at CLRS)
Exercise 3

• Show the results of inserting the keys


E, H, B, A, F, G, C, J, D, I
in order into an empty B-tree of order 1.
Exercise 3
Solution: (final configuration)
E

B G I

A C D F H J
Exercise 4

• Does a B-tree of order 1 is a good choice for a balanced


search tree?

• What about the expression h ≤ log B (n+1)/2 when B =


1?
Deleting a key from a B-tree
• Analogous to insertion but a little more complicated.

• A key can be deleted from any node (not just a leaf) and
can affect its parent and its children (insertion operation
just affect parents).

• One must ensure that a node does not get to small


during deletion (less than B keys).

• As a result deleting a node is the most complex operation


on B-trees. It will be considered in 4 particular cases.
Deleting a key from a B-tree

• Case 1: The key is in a leaf node with more


than B elements.

Procedure:
 Just remove the key from the node.
Deleting a key from a B-tree
• Case 1: The key is in a leaf node with more
than B elements (B = 2)

E J T X

A C D F G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 1: The key is in a leaf node with more
than B elements (B = 2)

E J T X

A D F G K M O Q U W Y Z
Deleting a key from a B-tree
Case 2: The join procedure
• The key k1 to be deleted is in a leaf x with exactly B elements.
• Let y be a node that is an “adjacent brother” of x.
• Suppose that y has exactly B elements.
Procedure:
 Remove the key k1.
 Let k2 be the key that separates nodes x and y in their parent.
 Join the the nodes x and y and move the key k2 from the parent
to the new joined node.
 If the parent of x becomes with B-1 elements and also has an
“adjacent brother” with B elements, apply the join procedure
recursively for the parent of x (seen as x) and its adjacent
brother (seen as y).
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

K T X

...
H I O Q U W Y Z
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

F
Parent

K T X

...
H I O Q U W Y Z
Node x Node y
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

F
Parent

K T X

...
H I O U W Y Z
Node x Node y
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

F
Parent

K T X

...
H I O U W Y Z
Node x Node y
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

F
Parent

K X

...
H I O T U W Y Z
Join
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

K X

...
H I O T U W Y Z
Deleting a key from a B-tree
Case 3: join and split
• The key k1 to be deleted is in a leaf x with exactly B elements.
• Let y be a node that is an “adjacent brother” of x.
• Suppose that y has more than B elements.
Procedure:
 Remove the key k1.
 Let k2 be the key that separates nodes x and y in their parent.
 Join the the nodes x and y and move the key k2 from the parent
to the new joined node z.
 Find the median key m of z
 Determine the new nodes x and y by splitting z around m.
 Insert m into the parent of x and y.
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

E J T X

A C D F G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

E J T X

A C D F G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

E J T X

A C D G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

N
Parent

E J T X

A C D G K M O Q U W Y Z
Node y Node x
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

N
Parent

E J T X

A C D G K M O Q U W Y Z
Node y Node x
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

J T X

Join
A C D E G
K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

J T X
Median key

A C D E G
K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

D J T X

Split

A C E G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

D J T X

A C E G K M O Q U W Y Z
Deleting a key from a B-tree
Case 4: internal node
• The key k1 to be deleted is in a node x that is not a leaf or
a root.

Procedure:
 Let k2 be the smallest key that is greater than k1.
 Let y be the node of k2, which will be a leaf.
 Insert key k2 into x.
 Remove the key k1 from x.
 Solve now the problem of removing k2 from a leaf y,
previously considered.
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

D J T X

A C E G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J T X

A C E G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J T X

A C E G K M O Q U W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U X

A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U X

A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U X

A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U X

A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

D J N U

A C E G K M O Q W X Y Z

You might also like