Chp2 - Advanced Data Structure
STRUCTURES
B Tree
● In a binary search tree, AVL Tree, Red-Black tree etc., every node can have
only one value (key) and maximum of two children but there is another type of
search tree called B-Tree in which a node can store more than one value (key)
and it can have more than two children.
● B-Tree can be defined as a self-balanced search tree with multiple keys in
every node and more than two children for every node. Here, the number of keys in
a node and the number of children for a node depend on the order of the
B-Tree. Every B-Tree has an order.
● The main idea of using B-Trees is to reduce the number of disk accesses.
● Generally, the B-Tree node size is kept equal to the disk block size.
● Since the height of the B-tree is low, the total disk accesses for most of the
operations are reduced significantly compared to balanced Binary Search
Trees like AVL Tree, Red-Black Tree, etc.
B Tree-Time complexity
1. Search O(log n)
2. Insert O(log n)
3. Delete O(log n)
Properties of B tree
● All the leaf nodes must be at same level.
● All nodes except root must have at least ⌈m/2⌉-1 keys and a
maximum of m-1 keys (m is the order of the tree).
● All non leaf nodes except root (i.e. all internal nodes) must have
at least ⌈m/2⌉ children.
● If the root node is a non leaf node, then it must have at least 2
children.
● A non leaf node with n-1 keys must have n number of children.
● All the key values within a node must be in Ascending Order.
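The key and child limits above follow directly from the order m. A minimal sketch, assuming m/2 is rounded up (the usual convention) and using a hypothetical helper name `btree_bounds`:

```python
import math


def btree_bounds(m: int) -> dict:
    """Key and child limits for a B-Tree of order m (assumes m >= 3)."""
    if m < 3:
        raise ValueError("order must be at least 3")
    min_children = math.ceil(m / 2)
    return {
        "max_keys": m - 1,               # every node
        "min_keys": min_children - 1,    # every node except the root
        "max_children": m,               # every internal node
        "min_children": min_children,    # internal nodes except the root
    }


print(btree_bounds(5))
# {'max_keys': 4, 'min_keys': 2, 'max_children': 5, 'min_children': 3}
```

For order 5 this gives between 2 and 4 keys per non-root node, which is the limit used in the insertion example.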
B tree Insertion
Insertions are done at the leaf node level. The following algorithm needs to be followed in order to insert an item into a B Tree.
1. Traverse the B Tree to find the appropriate leaf node at which the key can be inserted.
2. If the leaf node contains fewer than m-1 keys, insert the element in increasing order.
3. Else, if the leaf node contains m-1 keys, follow these steps.
○ Insert the new element in the increasing order of elements.
○ Split the node into two nodes at the median.
○ Push the median element up to its parent node.
○ If the parent node also contains m-1 keys, split it too by following the same steps.
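The leaf-level part of these steps can be sketched as follows. This is only an illustration under stated assumptions: a leaf is a plain sorted Python list, `insert_into_leaf` is a hypothetical helper, and pushing the median into the parent (and any further splits) is left to the caller.

```python
import bisect


def insert_into_leaf(keys, key, order):
    """Insert key into a sorted leaf and split it if it overflows.

    Returns (left_keys, median, right_keys) when a split happens,
    or (keys, None, None) when the leaf still fits in one node.
    """
    keys = list(keys)                 # work on a copy
    bisect.insort(keys, key)          # keep increasing order
    if len(keys) <= order - 1:        # at most m-1 keys: no split needed
        return keys, None, None
    mid = len(keys) // 2              # position of the median
    return keys[:mid], keys[mid], keys[mid + 1:]


# Hypothetical full leaf of an order-5 tree; inserting 8 overflows it.
print(insert_into_leaf([6, 7, 9, 10], 8, 5))   # ([6, 7], 8, [9, 10])
print(insert_into_leaf([1, 3], 2, 5))          # ([1, 2, 3], None, None)
```

The returned median is the key that would be pushed up to the parent, where the same overflow check applies again.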
Example:
Insert the node 8 into the B Tree of order 5 shown in the following image.
The node now contains 5 keys, which is greater than the maximum of 5 - 1 = 4 keys. Therefore, split the node at the median, i.e. 8, and push it up to
its parent node, shown as follows.
Deletion
Deletion is also performed at the leaf nodes. The key which is to be deleted can be in either a leaf node or an internal node. The following
algorithm needs to be followed in order to delete a key from a B tree.
If the key which is to be deleted is in an internal node, replace it with its in-order successor or predecessor. Since the
successor or predecessor will always be in a leaf node, the process is then the same as deleting the key from the leaf
node.
Example
Delete the node 53 from the B Tree of order 5 shown in the following figure.
Searching in B Trees is similar to that in a Binary search tree. For example, suppose we search for an item 49 in the following B Tree.
The process will be something like the following:
1. Compare item 49 with root node 78. Since 49 < 78, move to its left sub-tree.
2. Since 40 < 49 < 56, traverse the right sub-tree of 40.
3. 49 > 45, move to the right. Compare 49.
4. Match found, return.
Searching in a B tree depends upon the height of the tree. The search algorithm takes O(log n) time to search any element in
a B tree.
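The search walk described above can be sketched in a few lines. The `BTreeNode` class and the sample tree are assumptions: the tree is a hypothetical order-3 reconstruction that only reuses the keys named in the walk-through (78, 40, 56, 45, 49); all other keys are made up.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list                                     # sorted keys of this node
    children: list = field(default_factory=list)   # empty for a leaf


def btree_search(node, key):
    """Return True if key is stored in the subtree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)            # first key >= search key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:                      # reached a leaf: not found
            return False
        node = node.children[i]                    # descend between the keys
    return False


left = BTreeNode([40, 56], [BTreeNode([10, 20]), BTreeNode([45, 49]), BTreeNode([60, 70])])
right = BTreeNode([100], [BTreeNode([80, 90]), BTreeNode([110, 120])])
root = BTreeNode([78], [left, right])

print(btree_search(root, 49))   # True
print(btree_search(root, 50))   # False
```

Each loop iteration visits one node per level, so the number of node reads is bounded by the height of the tree.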
Application of B tree
● B tree is used to index the data and provides fast access to the actual data stored on the
disks, since accessing a value stored in a large database that is stored on a disk is a very
time consuming process.
● Searching an un-indexed and unsorted database containing n key values needs O(n)
running time in the worst case. However, if we use a B Tree to index this database, it will be
searched in O(log n) time in the worst case.
Drawback
● The drawback of a B-tree used for indexing, however, is that it stores the data pointer (a pointer to the
disk file block containing the key value) corresponding to a particular key value, along with that key
value, in the node of the B-tree.
● This technique greatly reduces the number of entries that can be packed into a node of a B-tree, thereby
contributing to an increase in the number of levels in the B-tree, and hence increasing the search time of a
record.
Solution
B+ tree eliminates the above drawback by storing data pointers only at the leaf nodes of the tree.
B+Tree
● B+ Tree is an extension of B Tree which allows efficient insertion, deletion and search operations.
● In a B Tree, both keys and records can be stored in the internal as well as leaf nodes. In a B+ tree, however,
records (data) can only be stored in the leaf nodes, while internal nodes can only store the key values.
● The leaf nodes of a B+ tree are linked together in the form of a singly linked list to make the search
queries more efficient.
● B+ Trees are used to store large amounts of data which cannot be stored in the main memory. Because the
size of main memory is always limited, the internal nodes (keys to access records) of the B+ tree
are stored in the main memory, whereas leaf nodes are stored in the secondary memory.
● A B+ tree is a data structure often used in the implementation of database indexes.
● Each node of the tree contains an ordered list of keys and pointers to lower level nodes in the tree. These
pointers can be thought of as being between each of the keys.
● To search for or insert an element into the tree, one loads up the root node, finds the adjacent keys that the
searched-for value is between, and follows the corresponding pointer to the next node in the tree.
Recursing eventually leads to the desired value or the conclusion that the value is not present.
Properties
a. The root node points to at least two nodes.
b. All non-root nodes are at least half full.
c. For a tree of order m, all internal nodes have at most m-1 keys and m pointers.
d. A B+-Tree grows upwards.
e. A B+-Tree is balanced.
f. Sibling pointers allow sequential searching.
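Property (f) is what makes range queries cheap: once the first leaf is found, the rest of the range is read by following sibling pointers. A minimal sketch, assuming a hypothetical `Leaf` class with a `next` pointer and made-up leaf contents:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list                       # sorted keys stored in this leaf
    next: Optional["Leaf"] = None    # sibling pointer to the next leaf


def range_scan(start_leaf, low, high):
    """Collect all keys in [low, high] by walking the leaf chain."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:             # keys are sorted, so we can stop here
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


# Hypothetical leaf level of a small B+ tree.
l4 = Leaf([190, 195, 200])
l3 = Leaf([129, 154], l4)
l2 = Leaf([108, 120], l3)
l1 = Leaf([60, 78], l2)

print(range_scan(l1, 100, 190))   # [108, 120, 129, 154, 190]
```

In a full implementation the scan would start at the leaf reached by a normal search for `low` rather than at the leftmost leaf.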
Advantages Of B+ Trees
Step 2: If the leaf doesn't have required space, split the node and copy the middle node to the next index node.
Step 3: If the index node doesn't have required space, split the node and copy the middle element to the next index page.
Example :
Insert the value 195 into the B+ tree of order 5 shown in the following figure.
195 will be inserted in the right sub-tree of 120 after 190. Insert it at the desired position.
The node now contains more than the maximum number of elements, i.e. 4; therefore split it and move the median up to
the parent.
Now, the index node contains 6 children and 5 keys, which violates the B+ tree properties; therefore we need to split it, shown as
follows.
Deletion in B+ Tree
Step 2: If the leaf node contains fewer than the minimum number of elements, merge the node with its sibling and delete
the key in between them.
Step 3: If the index node contains fewer than the minimum number of elements, merge the node with its sibling and move down
the key in between them.
Example
Delete the key 200 from the B+ Tree shown in the following figure.
200 is present in the right sub-tree of 190, after 195. Delete it.
Merge the two nodes by using 195, 190, 154 and 129.
Now, element 120 is the single element present in the node which is violating the B+ Tree properties. Therefore, we need to
merge it by using 60, 78, 108 and 120.
B Tree vs B+ Tree
1. B Tree: All internal and leaf nodes have data pointers. B+ Tree: Only leaf nodes have data pointers.
2. B Tree: Since all keys are not available at leaf, search often takes more time. B+ Tree: All keys are at leaf nodes, hence search is faster and accurate.
5. B Tree: Deletion of internal node is very complex and tree has to undergo lot of transformations. B+ Tree: Deletion of any node is easy because all node are found at leaf.
6. B Tree: Leaf nodes are not stored as structural linked list. B+ Tree: Leaf nodes are stored as structural linked list.
7. B Tree: No redundant search keys are present. B+ Tree: Redundant search keys may be present.
Red black tree
● A red-black tree is a kind of self-balancing binary search tree where each node
has an extra bit, and that bit is often interpreted as the colour (red or black).
● These colours are used to ensure that the tree remains balanced during insertions
and deletions.
● Although the balance of the tree is not perfect, it is good enough to reduce the
searching time and maintain it around O(log n) time, where n is the total number of
elements in the tree.
● It must be noted that as each node requires only 1 bit of space to store the colour
information, these types of trees show identical memory footprint to the classic
(uncoloured) binary search tree.
Rules
● Every node follows the BST property.
● Every node has a colour, either red or black.
● The root of the tree is always black.
● There are no two adjacent red nodes (a red node cannot have a red parent or red child).
● Every path from a node (including the root) to any of its descendant NULL nodes has the
same number of black nodes (in all the paths of the tree, there should be the same number of
BLACK coloured nodes).
● Every leaf (NULL) is black.
● A NIL is treated as black. This convention means that every non-NIL node has two children.
● Every new node must be inserted with RED color.
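The rules above can be checked mechanically. The following is a sketch of such a checker, not a standard API: the `Node` class, the colour constants and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "R", "B"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, lo=float("-inf"), hi=float("inf")):
    """Check the BST ordering rule with open bounds (lo, hi)."""
    if node is None:
        return True
    return (lo < node.key < hi
            and is_bst(node.left, lo, node.key)
            and is_bst(node.right, node.key, hi))


def black_height(node):
    """Return the black height of a subtree, or -1 if a colour rule is broken."""
    if node is None:
        return 1                                  # NIL leaves count as black
    for child in (node.left, node.right):
        if node.color == RED and child is not None and child.color == RED:
            return -1                             # two adjacent red nodes
    lh, rh = black_height(node.left), black_height(node.right)
    if lh == -1 or rh == -1 or lh != rh:
        return -1                                 # unequal black counts
    return lh + (1 if node.color == BLACK else 0)


def is_red_black(root):
    if root is None:
        return True
    return root.color == BLACK and is_bst(root) and black_height(root) != -1


good = Node(21, BLACK, Node(3, BLACK, right=Node(17, RED)), Node(32, BLACK))
bad = Node(21, BLACK, Node(3, RED, right=Node(17, RED)), Node(32, BLACK))
print(is_red_black(good), is_red_black(bad))   # True False
```

The second tree fails because 3 and 17 are adjacent red nodes.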
Why Red-Black Trees?
● Most of the BST operations (e.g., search, max, min, insert, delete,
etc.) take O(h) time where h is the height of the BST.
● The cost of these operations may become O(n) for a skewed Binary
tree.
● If we make sure that the height of the tree remains O(log n) after
every insertion and deletion, then we can guarantee an upper bound
of O(log n) for all these operations.
● The height of a Red-Black tree is always O(log n) where n is the
number of nodes in the tree.
Comparison with AVL Tree:
● The AVL trees are more balanced compared to Red-Black Trees, but they
may cause more rotations during insertion and deletion.
● So if your application involves frequent insertions and deletions, then
Red-Black trees should be preferred.
● And if the insertions and deletions are less frequent and search is a more
frequent operation, then AVL tree should be preferred over Red-Black Tree.
Examples of finding red black tree
Insertion
In a Red-Black Tree, every new node must be inserted with the color RED. The insertion operation in a Red Black Tree
is similar to the insertion operation in a Binary Search Tree, except that the new node is inserted with a color property. After every insertion
operation, we need to check all the properties of the Red-Black Tree. If all the properties are satisfied, we go to the next
operation; otherwise we perform one of the following operations to make it a Red Black Tree again.
● 1. Recolor
● 2. Rotation
● 3. Rotation followed by Recolor
The insertion operation in Red Black tree is performed using the following steps...
● Step 1 - Check whether tree is Empty.
● Step 2 - If tree is Empty then insert the newNode as Root node with color Black and exit from the operation.
● Step 3 - If tree is not Empty then insert the newNode as leaf node with color Red.
● Step 4 - If the parent of newNode is Black then exit from the operation.
● Step 5 - If the parent of newNode is Red then check the color of the parent's sibling (the uncle of newNode).
● Step 6 - If it is colored Black or is NULL then make a suitable Rotation and Recolor.
● Step 7 - If it is colored Red then perform Recolor. Repeat the same until the tree becomes a Red Black Tree.
Recolour
Recolouring is the change in colour of the node, i.e. if it is red then change
it to black and vice versa. It must be noted that the colour of the NULL
node is always black. Moreover, we always try recolouring first; if
recolouring doesn't work, then we go for rotation. Following is a detailed
algorithm. The algorithm has mainly two cases depending upon the
colour of the uncle. If the uncle is red, we do recolour. If the uncle is black,
we do rotations and/or recolouring.
Logic:
First, you have to insert the node similarly to that in a binary search tree and assign a red colour to
it. Now, if the node is the root node then change its colour to black, but if it is not then check
the colour of the parent node. If its colour is black then don't change the colour, but if it is not,
i.e. it is red, then check the colour of the node's uncle. If the node's uncle has a red colour
then change the colour of the node's parent and uncle to black and that of the grandfather to red
and repeat the same process for him (i.e. the grandfather).
But, if the node's uncle has black colour then there are 4 possible cases:
Now, after these rotations, if the colours of the nodes are mismatched then recolour them.
Algorithm
● Let x be the newly inserted node.
○ Perform standard BST insertion and make the colour of newly inserted nodes as RED.
○ If x is the root, change the colour of x as BLACK (Black height of complete tree increases by 1).
○ Do the following if the color of x’s parent is not BLACK and x is not the root.
a) If x’s uncle is RED (Grandparent must have been black from property 4)
(i) Change the colour of parent and uncle as BLACK.
(ii) Colour of a grandparent as RED.
(iii) Change x = x’s grandparent, repeat steps 2 and 3 for new x.
b) If x’s uncle is BLACK, then there can be four configurations for x, x’s parent (p) and x’s grandparent (g) (This is similar to AVL Tree)
(i) Left Left Case (p is left child of g and x is left child of p)
(ii) Left Right Case (p is left child of g and x is the right child of p)
(iii) Right Right Case (Mirror of case i)
(iv) Right Left Case (Mirror of case ii)
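The algorithm above can be sketched as follows. This is one possible arrangement, not a definitive implementation: the class and method names are assumptions, NIL leaves are represented by `None`, and duplicates are sent to the right subtree.

```python
RED, BLACK = "R", "B"


class Node:
    def __init__(self, key):
        self.key = key
        self.color = RED                      # every new node starts red
        self.left = self.right = self.parent = None


class RedBlackTree:
    def __init__(self):
        self.root = None

    def _replace_child(self, old, new):
        # Hook `new` into the slot that `old` occupied under old's parent.
        new.parent = old.parent
        if old.parent is None:
            self.root = new
        elif old is old.parent.left:
            old.parent.left = new
        else:
            old.parent.right = new

    def _rotate_left(self, x):
        y = x.right
        x.right = y.left
        if y.left:
            y.left.parent = x
        self._replace_child(x, y)
        y.left, x.parent = x, y

    def _rotate_right(self, x):
        y = x.left
        x.left = y.right
        if y.right:
            y.right.parent = x
        self._replace_child(x, y)
        y.right, x.parent = x, y

    def insert(self, key):
        node, parent, cur = Node(key), None, self.root
        while cur:                            # standard BST descent
            parent = cur
            cur = cur.left if key < cur.key else cur.right
        node.parent = parent
        if parent is None:
            self.root = node
        elif key < parent.key:
            parent.left = node
        else:
            parent.right = node
        self._fix(node)

    def _fix(self, x):
        while x.parent and x.parent.color == RED:
            p, g = x.parent, x.parent.parent  # g exists: a red node is never the root
            uncle = g.right if p is g.left else g.left
            if uncle and uncle.color == RED:  # case a: recolour and move up
                p.color = uncle.color = BLACK
                g.color = RED
                x = g
                continue
            if p is g.left:
                if x is p.right:              # Left Right -> Left Left
                    self._rotate_left(p)
                    x, p = p, x
                self._rotate_right(g)
            else:
                if x is p.left:               # Right Left -> Right Right
                    self._rotate_right(p)
                    x, p = p, x
                self._rotate_left(g)
            p.color, g.color = BLACK, RED     # recolour after the rotation
            break
        self.root.color = BLACK


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [(node.key, node.color)] + inorder(node.right)


tree = RedBlackTree()
for k in (3, 21, 32, 17):
    tree.insert(k)
print(tree.root.key, inorder(tree.root))
# 21 [(3, 'B'), (17, 'R'), (21, 'B'), (32, 'B')]
```

Inserting 32 triggers the Right Right case (a left rotation at 3), and inserting 17 triggers the red-uncle recolouring, after which the root is painted black again.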
Creating a red-black tree with elements 3, 21, 32 and 17 in an empty tree
Final tree
Deletion in RB tree
● The main property that is violated after insertion is two consecutive reds.
● In delete, the main violated property is the change of black height in subtrees, as
deletion of a black node may cause reduced black height in one root to leaf path.
● Deletion is a fairly complex process.
● To understand deletion, the notion of double black is used.
● When a black node is deleted and replaced by a black child, the child is marked as
double black. The main task now becomes to convert this double black to single
black.
Step 1
1) Perform standard BST delete. When we perform the standard delete operation
in a BST, we always end up deleting a node which is either a leaf or has only
one child (for an internal node, we copy the successor and then recursively
call delete for the successor; the successor is always a leaf node or a node with one
child). So we only need to handle cases where a node is a leaf or has one child.
Let v be the node to be deleted and u be the child that replaces v (note that u
is NULL when v is a leaf and the color of NULL is considered as Black).
Step 2
2) Simple Case: If either u or v is red, we mark the replaced child as black (No change in black height). Note
that both u and v cannot be red as v is parent of u and two consecutive reds are not allowed in red-black tree.
Step 3
3) If Both u and v are Black.
3.1) Color u as double black. Now our task reduces to convert this double black to single black. Note that If v is
leaf, then u is NULL and color of NULL is considered black. So the deletion of a black leaf also causes a double
black.
3.2) Do the following while the current node u is double black and it is not the root. Let the sibling of the node be s.
(a): If sibling s is black and at least one of the sibling's children is red, perform rotation(s). Let the red child of
s be r. This case can be divided into four subcases depending upon the positions of s and r.
(i) Left Left Case (s is left child of its parent and r is left child of s or both children of s are red). This is the
mirror of the right right case shown in the below diagram.
(ii) Left Right Case (s is left child of its parent and r is right child). This is the mirror of the right left case shown
in the below diagram.
(iii) Right Right Case (s is right child of its parent and r is right child of s or both children of s are red)
(iv) Right Left Case (s is right child of its parent and r is left child of s)
(b): If the sibling is black and both of its children are black, perform recoloring, and recur for the parent if the parent is
black. In this case, if the parent was red, then we don't need to recur for the parent; we can simply make it black (red +
double black = single black).
(c): If the sibling is red, perform a rotation to move the old sibling up, and recolor the old sibling and parent. The new sibling
is always black (see the below diagram). This mainly converts the tree to the black sibling case (by rotation) and
leads to case (a) or (b). This case can be divided into two subcases.
(i) Left Case (s is left child of its parent). This is the mirror of the right case shown in the below diagram. We
right rotate the parent p.
(ii) Right Case (s is right child of its parent). We left rotate the parent p.
3.3) If u is root, make it single black and return (Black height of complete tree reduces by 1)
Heap data structure
A Heap is a special Tree-based data structure in which the tree is a complete binary tree (a complete
binary tree is a binary tree in which all the levels except the last level, i.e., the leaf level, should be completely
filled, and the nodes in the last level should be as far left as possible).
The heap tree is a special balanced binary tree data structure where the root node is compared with its
children and arranged accordingly.
Deletion Operation in Max Heap
In a max heap, deleting the last node is very simple as it does not disturb the max heap properties.
Deleting the root node from a max heap is a little more difficult as it disturbs the max heap properties. We use the
following steps to delete the root node from a max heap...
● Step 1 - Swap the root node with last node in max heap
● Step 2 - Delete last node.
● Step 3 - Now, compare root value with its left child value.
● Step 4 - If root value is smaller than its left child, then compare left child with its right sibling. Else
go to Step 6.
● Step 5 - If left child value is larger than its right sibling, then swap root with left child otherwise
swap root with its right child.
● Step 6 - If root value is larger than its left child, then compare root value with its right child value.
● Step 7 - If root value is smaller than its right child, then swap root with right child otherwise stop
the process.
● Step 8 - Repeat the same until root node fixes at its exact position.
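The steps above can be sketched on an array-backed heap. The array layout (children of index i at 2i+1 and 2i+2) is an assumption, as is the helper name `delete_root`. Instead of comparing the children one after another as in Steps 3 to 7, the sketch picks the larger child directly, which leads to the same swaps. The sample array is a hypothetical heap built around the values named in the example that follows (90, 89, 70, 36, 85, 15, 75); the remaining values are made up.

```python
def delete_root(heap):
    """Remove and return the largest value of a list-based max heap."""
    if not heap:
        raise IndexError("heap is empty")
    heap[0], heap[-1] = heap[-1], heap[0]    # Step 1: swap root with last node
    largest = heap.pop()                     # Step 2: delete the last node
    i, n = 0, len(heap)
    while True:                              # Steps 3-8: sift the new root down
        left, right = 2 * i + 1, 2 * i + 2
        biggest = i
        if left < n and heap[left] > heap[biggest]:
            biggest = left
        if right < n and heap[right] > heap[biggest]:
            biggest = right
        if biggest == i:                     # both children are smaller: done
            break
        heap[i], heap[biggest] = heap[biggest], heap[i]
        i = biggest
    return largest


heap = [90, 89, 70, 36, 85, 65, 44, 21, 18, 15, 75]
print(delete_root(heap))   # 90
print(heap)                # [89, 85, 70, 36, 75, 65, 44, 21, 18, 15]
```

The loop runs at most once per level of the tree, which is where the O(log N) deletion cost comes from.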
Delete root node 90
Swap root node 90 with last node 75
Delete last node. Here the last node is 90. After deleting node with value 90 from heap, max
heap is as follows...
Compare root node (75) with its left child (89).
Here, root value (75) is smaller than its left child value (89). So, compare left child (89) with
its right sibling (70).
Here, left child value (89) is larger than its right sibling (70). So, swap root (75) with left
child (89).
Now, again compare 75 with its left child (36).
Here, node with value 75 is larger than its left child. So, we compare node 75 with its right
child 85.
Here, node with value 75 is smaller than its right child (85). So, we swap both of them.
Now, compare node with value 75 with its left child (15).
Here, node with value 75 is larger than its left child (15) and it does not have a right child. So
we stop the process.
Final tree
Applications of Binary Heaps
● Binary heaps are used in a famous sorting algorithm known as Heap sort.
● Binary heaps are also the main way of implementing priority queues: because of them,
priority queue operations like add(), remove() etc. get a time complexity of O(log n).
● They are also the most preferred choice for solving Kth smallest / Kth Largest element
questions.
Insertion: O(log N)
Deletion: O (log N)
A priority queue is a type of queue in which every element has a key associated with it and the queue
returns the elements according to these keys, unlike the traditional queue which works on a first come first
served basis.
Priority queue using heap
A max-priority queue returns the element with the maximum key first, whereas a
min-priority queue returns the element with the smallest key first.
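As a quick illustration of the two flavours, here is a sketch using Python's standard `heapq` module, which maintains a min-heap inside a list. Getting max-priority behaviour by negating the keys is a common trick and an assumption of this sketch, as are the helper names and the sample tasks.

```python
import heapq


def drain_min(items):
    """Return values in order of smallest key first (min-priority queue)."""
    heap = []
    for key, value in items:
        heapq.heappush(heap, (key, value))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def drain_max(items):
    """Return values in order of largest key first (max-priority queue)."""
    heap = []
    for key, value in items:
        heapq.heappush(heap, (-key, value))   # negate so the largest key pops first
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


tasks = [(3, "write"), (1, "plan"), (2, "review")]
print(drain_min(tasks))   # ['plan', 'review', 'write']
print(drain_max(tasks))   # ['write', 'review', 'plan']
```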
Priority queues are used in many algorithms like Huffman Codes, Prim's algorithm, etc. It is also
Heaps are great for implementing a priority queue because the largest and smallest elements are at
the root of the tree for a max-heap and a min-heap respectively. We use a max-heap for a
2. Maximum/Minimum → To get the maximum and the minimum element from the max-priority queue and
min-priority queue respectively.
3. Extract Maximum/Minimum → To remove and return the maximum and the minimum element from the
max-priority queue and min-priority queue respectively.
● The entire point of the priority queue is to get the data according to the key of the data, and the
new data must go in a place according to the specified order. This is what the insert operation does.
● Situation: we need to change the key of an element, so Increase/Decrease key is used to do that.
Operations on priority queue-Maximum/Minimum
We know that the maximum (or minimum) element of a priority queue is at the root of the max-heap (or
min-heap). So, we just need to return the element at the root of the heap. Returning an element from an array is a
● Doing this, we have disturbed the heap property of the root but we have not
touched any of its children, so they are still heaps. So, we can call Heapify on
the root to make the tree a heap again.
● All the steps are constant time operations except the Heapify operation, which
takes O(lg n) time, and thus Extract Maximum/Minimum is going to take
O(lg n) time.
Operations on priority queue-Increase and decrease key
● Whenever we change the key of an element, it must change its position to go in a place of correct order
according to the new key.
● If the heap is a max-heap and we are decreasing the key, then we just need to check if the key became
smaller than any of its children or not.
● If the new key is smaller than any of its children, then it is violating the heap property, so we will call
Heapify on it.
● In the case of increasing the key of an element in a max-heap, we might make it greater than
the key of its parent and thus violate the heap property.
● In this case, we swap the values of the parent and the node and this is done until the parent of
the node becomes greater than the node itself.
Operations on priority queue-Insert
● The insert operation inserts a new element in the correct order according to
its key. We just insert a new element at the end of the heap and increase
the heap size by 1.
● Since it is the last element, we first give it a very large value (infinity) in the
case of a min-heap and a very small value (-inf) in the case of a max-heap. Then
we just change the key of the element by using the Increase/Decrease key
operation.
● All the steps are constant time steps, except the Increase/Decrease
key. So it will also take O(lg n) time.
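For a max-heap, increase-key and insert can be sketched together, since insert is built on top of increase-key as described above. The list-based layout (parent of index i at (i - 1) // 2) and the function names are assumptions.

```python
import math


def increase_key(heap, i, new_key):
    """Raise the key at index i of a list-based max heap and restore order."""
    if new_key < heap[i]:
        raise ValueError("new key is smaller than the current key")
    heap[i] = new_key
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] >= heap[i]:          # parent is large enough: stop
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent                           # keep climbing towards the root


def insert(heap, key):
    """Append a -inf placeholder at the end, then raise it to the real key."""
    heap.append(-math.inf)
    increase_key(heap, len(heap) - 1, key)


heap = [89, 85, 70, 36, 75]
insert(heap, 95)
print(heap)   # [95, 85, 89, 36, 75, 70]
increase_key(heap, 3, 90)
print(heap)   # [95, 90, 89, 85, 75, 70]
```

Both operations only walk one path from a node to the root, so their cost is bounded by the height of the heap.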
Topological Sorting
● Topological Sort is a linear ordering of the vertices in such a way that if there is an edge in
the DAG going from vertex ‘u’ to vertex ‘v’, then ‘u’ comes before ‘v’ in the ordering.
● Topological Sorting is possible if and only if the graph is a Directed Acyclic Graph.
● There may exist multiple different topological orderings for a given directed acyclic graph.
● The ordering of the nodes in the array is called a topological ordering.
● 123456
● 123465
● 132456
● 132465
The Algorithm
1) Select a node that does not have any incoming directed edges, i.e. a node with an indegree of zero, and
add it to the topological ordering.
2) Once a node is added to the topological ordering, we can take the node, and its outgoing edges,
out of the graph.
3) Then, we can repeat our earlier approach: look for any node with an indegree of zero and add it to
the ordering.
Implementation
We'll use the strategy we outlined above:
4. Repeat.
We'll keep looping until there aren't any more nodes with indegree zero. This could happen for two reasons:
● There are no nodes left. We've taken all of them out of the graph and added them to the topological ordering.
● There are some nodes left, but they all have incoming edges. This means the graph has a cycle, and no
topological ordering exists.
One small tweak. Instead of actually removing the nodes from the graph (and destroying our input!), we'll use a hash
map to track each node's indegree. When we add a node to the topological ordering, we'll decrement the indegree of
that node's neighbors, representing that those nodes have one fewer incoming edge.
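This strategy (often called Kahn's algorithm) can be sketched as below. The adjacency-dict input format and the function name are assumptions, and the sample graph is hypothetical: it was chosen so that it admits exactly the four orderings listed earlier (123456, 123465, 132456, 132465).

```python
from collections import deque


def topological_sort(graph):
    """graph maps each node to a list of the nodes it points to."""
    indegree = {node: 0 for node in graph}
    for neighbours in graph.values():
        for nb in neighbours:
            indegree[nb] = indegree.get(nb, 0) + 1

    ready = deque(n for n, d in indegree.items() if d == 0)
    ordering = []
    while ready:
        node = ready.popleft()
        ordering.append(node)
        for nb in graph.get(node, ()):
            indegree[nb] -= 1                # "remove" the edge node -> nb
            if indegree[nb] == 0:
                ready.append(nb)

    if len(ordering) != len(indegree):       # leftover nodes all have incoming edges
        raise ValueError("graph has a cycle; no topological ordering exists")
    return ordering


graph = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [], 6: []}
print(topological_sort(graph))   # [1, 2, 3, 4, 5, 6]
```

Each node and each edge is handled once, so the running time is proportional to the number of nodes plus the number of edges.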
Example
Applications of Topological Sort-