Data Structures Notes Using C++
Unit I:
Algorithms, performance analysis‐ time complexity and space
complexity, Searching: Linear and binary search methods.
Sorting: Bubble sort, selection sort, Insertion sort, Quick sort, Merge sort, Heap sort.
Time complexities.
Unit II:
Basic data structures: the list ADT, Stack ADT, Queue ADT, array and linked list
implementation using template classes in C++. Trees: basic terminology, Binary Tree
ADT, array and linked list implementation, binary tree traversals, threaded binary tree.
Unit III:
Priority Queues – Definition, ADT, Realizing a Priority Queue using Heaps, Definition,
insertion, Deletion, External Sorting‐ Model for external sorting, Multiway merge,
Polyphase merge.
Unit IV:
Dictionaries, linear list representation, skip list representation, operations insertion,
deletion and searching, hash table representation, hash functions, collision
resolution‐separate chaining, open addressing‐linear probing, quadratic probing, double
hashing, rehashing, extendible hashing, comparison of hashing and skip lists.
Unit V:
Search Trees:‐
Binary Search Trees, Definition, ADT, Implementation, Operations‐ Searching,
Insertion and Deletion, AVL Trees, Definition, Height of an AVL Tree, Operations –
Insertion, Deletion and Searching, B‐ Trees, B‐Tree of order m, height of a B‐Tree,
insertion, deletion and searching.
Graphs: Basic terminology, representation of graphs, graph search methods DFS,BFS
TEXT BOOKS:
1. Data Structures using C++, Special Edition‐MRCET, Tata McGraw‐Hill Publishers
2017.
2. Data Structures, Algorithms and Applications in C++, S. Sahni, Universities
Press (India) Pvt. Ltd., 2nd edition.
REFERENCES:
1. Data Structures and Algorithms in C++, Michael T. Goodrich, R. Tamassia and
D. Mount, Wiley student edition, John Wiley and Sons.
2. Data structures and Algorithm Analysis in C++, Mark Allen Weiss, Pearson
Education. Ltd., Second Edition.
3. Data structures and algorithms in C++, 3rd Edition, Adam Drozdek, Thomson
4. Data structures using C and C++, Langsam, Augenstein and Tanenbaum, PHI.
5. Problem Solving with C++: The Object of Programming, Fourth edition, W. Savitch, Pearson Education.
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
DEPARTMENT OF INFORMATION TECHNOLOGY
INDEX
S. No  Unit  Topic
1      I     Algorithms, performance analysis: time complexity and space complexity
4      II    Basic data structures: the list ADT
5      II    Doubly linked list
6      II    Binary trees
7      III   Priority Queues: definition, ADT
8      III   External Sorting: model for external sorting
9      III   Multiway merge
10     IV    Dictionaries, linear list representation
11     IV    Skip list representation
12     IV    Hashing, rehashing, extendible hashing
13     V     Binary Search Trees, definition
14     V     AVL Trees, definition, height of an AVL tree
15     V     Graphs: basic terminology, representation of graphs
UNIT -1
ALGORITHMS
Definition: An algorithm is a step-by-step procedure for solving a problem. It
finds the answer by breaking the problem down into simple, well-defined steps that
lead from the given input to the required output.
An algorithm must satisfy the following criteria:
1. Definiteness: each instruction must be clear and unambiguous.
2. Finiteness: the algorithm must terminate after a finite number of steps.
3. Effectiveness: every instruction must be basic enough to be carried out exactly.
4. Generality: the algorithm should work for every valid instance of the problem.
5. Input/output: each algorithm must take zero, one or more quantities as input
data and give one or more output values.
An algorithm can be written in English-like sentences or in any
standard representation. An algorithm written in such informal English-like notation
is called pseudocode.
Development Of An Algorithm
The steps involved in the development of an algorithm are as follows:
Testing and validating: once the program is written, it must be tested and then
validated, i.e., checked to see whether it produces correct results for different
values of input.
PERFORMANCE ANALYSIS
When several algorithms can be designed for the solution of a problem, there arises
the need to determine which among them is the best. The efficiency of a program or an
algorithm is measured by computing its time and/or space complexities.
The time complexity of an algorithm is a function of the running time of the
algorithm.
The space complexity is a function of the space required by it to run to completion.
The time complexity is therefore given in terms of frequency count.
Frequency count is a count denoting the number of times a statement is executed.
Asymptotic Notations:
To choose the best algorithm, we need to check the efficiency of each algorithm.
The efficiency can be measured by computing the time complexity of each
algorithm. Asymptotic notation is a shorthand way to represent the time
complexity.
Using asymptotic notations we can express the time complexity as "fastest possible",
"slowest possible" or "average time".
Various notations such as Ω, θ and O are called asymptotic notations.
Big 'O' Notation:-
Big O notation is a method of representing the upper bound of an algorithm's running time.
Definition:
Let f(n) and g(n) be two non-negative functions. If there exist an integer n0 and a
constant c > 0 such that f(n) ≤ c*g(n) for all integers n > n0, then
f(n) = O(g(n)).
Omega Notation:-
Omega notation, denoted 'Ω', is a method of representing the lower bound of an
algorithm's running time. Using omega notation we can denote the shortest amount of time
taken by an algorithm to complete.
Definition:
Let f(n) and g(n) be two non-negative functions. If there exist an integer n0
and a constant c > 0 such that f(n) ≥ c*g(n) for all integers n > n0, then
f(n) = Ω(g(n)).
Statement                          Frequency count
4    {                             0
5        write("Hello");           n
6    }                             0
7 }                                0
Total frequency count: 2n + 1
While computing the time complexity we will neglect all the constants, hence ignoring 2
and 1 we will get n. Hence the time complexity becomes O(n).
f(n) = O(g(n)),
i.e. f(n) = O(2n+1)
         = O(n)   // ignore constants
Statement                               Frequency count
1 Algorithm add(A,B,m,n)                0
2 {                                     0
3    for i=1 to m do                    m+1
4        for j=1 to n do                m(n+1)
5            C[i,j] = A[i,j]+B[i,j]     mn
Total frequency count: (m+1) + m(n+1) + mn = 2mn + 2m + 1, hence the time complexity is O(mn).
SEARCHING
Searching is the process of locating a target element within some data structure. Data
structures can include linked lists, arrays, search trees, hash tables, or various other
storage methods. The appropriate search algorithm often depends on the data structure
being searched.
Search algorithms can be classified based on their mechanism of searching. They are
Linear searching
Binary searching
LINEAR SEARCHING
We begin the search by comparing the first element of the list with the target element. If
it matches, the search ends and the position of the element is returned. Otherwise, we move
to the next element and compare. In this way, the target element is compared with all the
elements until a match occurs. If no match occurs and there are no more elements to
be compared, we conclude that the target element is absent in the list and return the
position -1.
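The procedure above can be sketched in C++ (a minimal illustration; the function name and the use of std::vector are my own, not from the notes):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index of key in v, or -1 if it is absent.
int linearSearch(const std::vector<int>& v, int key)
{
    for (std::size_t i = 0; i < v.size(); i++)
        if (v[i] == key)
            return static_cast<int>(i);  // match found: report its position
    return -1;                           // no match and no elements left
}
```

Every element may have to be examined, so the worst-case cost is O(n).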
BINARY SEARCHING
Binary search is a fast search algorithm with run-time complexity of O(log n). This search
algorithm works on the principle of divide and conquer. Binary search looks for a particular
item by comparing it with the middle-most item of the collection. If a match occurs, the
index of the item is returned. If the middle item is greater than the target, the search
continues in the sub-array to the left of the middle item; otherwise, it continues in the
sub-array to the right of the middle item. This process repeats on the sub-array until the
size of the sub-array reduces to zero.
Before applying binary searching, the list of items should be sorted in ascending or
descending order.
Best case time complexity is O(1)
Worst case time complexity is O(log n)
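The halving process can be sketched as an iterative C++ function (function name and std::vector usage are illustrative assumptions):

```cpp
#include <cstddef>
#include <vector>

// Iterative binary search on a vector sorted in ascending order.
// Returns the index of key, or -1 if it is absent.
int binarySearch(const std::vector<int>& v, int key)
{
    int low = 0, high = static_cast<int>(v.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  // middle-most item, overflow-safe
        if (v[mid] == key)
            return mid;
        if (v[mid] < key)
            low = mid + 1;                 // continue in the right sub-array
        else
            high = mid - 1;                // continue in the left sub-array
    }
    return -1;                             // sub-array shrank to size zero
}
```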
SORTING
Sorting means arranging the elements of a list in either ascending or descending order.
Various sorting algorithms are
Bubble sort
selection sort
Insertion sort
Quick sort
Merge sort
Heap sort
BUBBLE SORT
Bubble sort is an example of an exchange sort. In this method, repeated comparison is
performed among elements and elements are swapped when necessary. It is easy to
understand but time consuming, i.e., it takes a large number of comparisons to sort a
list. Two successive elements are compared and swapped if they are out of order; step by
step, the entire array is checked. It differs from selection sort: instead of searching
for the minimum element and then swapping, two records are swapped immediately upon
noticing that they are not in order.
ALGORITHM:
Bubble_Sort ( A [ ] , N )
Step 1: Start
Step 2: Take an array of n elements
Step 3: for i=0,………….n-2
Step 4: for j=0,…….n-2-i
Step 5: if arr[j]>arr[j+1] then
Interchange arr[j] and arr[j+1]
End of if
Step 6: Print the sorted array arr
Step 7:Stop
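The steps above can be sketched in C++ (a minimal sketch; the function name and std::vector are assumptions):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// After pass i the largest remaining element has "bubbled" to slot n-1-i,
// so the inner loop can stop one position earlier on every pass.
void bubbleSort(std::vector<int>& a)
{
    std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; i++)
        for (std::size_t j = 0; j + 1 < n - i; j++)
            if (a[j] > a[j + 1])
                std::swap(a[j], a[j + 1]);  // exchange out-of-order neighbours
}
```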
INSERTION SORT
Insertion sort iterates, consuming one input element on each repetition and growing a
sorted output list. On each iteration, insertion sort removes one element from the input
data, finds the location it belongs to within the sorted list, and inserts it there. It
repeats until no input elements remain.
ALGORITHM:
Step 1: start
Step 2: for i ← 1 to length(A)
Step 3: j ← i
Step 4: while j > 0 and A[j-1] > A[j]
Step 5: swap A[j] and A[j-1]
Step 6: j←j-1
Step 7: end while
Step 8: end for
Step 9: stop
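The pseudocode above maps directly onto a short C++ function (a sketch; names are illustrative):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Grows a sorted prefix a[0..i-1]; each pass sinks a[i] leftwards,
// swapping with larger neighbours, until it reaches where it belongs.
void insertionSort(std::vector<int>& a)
{
    for (std::size_t i = 1; i < a.size(); i++)
        for (std::size_t j = i; j > 0 && a[j - 1] > a[j]; j--)
            std::swap(a[j], a[j - 1]);
}
```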
QUICK SORT
Quick sort: It is a divide and conquer algorithm, developed by Tony Hoare in 1959. Quick
sort first divides a large array into two smaller sub-arrays: the low elements and the
high elements. It can then recursively sort the sub-arrays.
ALGORITHM:
Step 1: Pick an element from the array; this element is called the pivot.
Step 2: Partitioning: reorder the array so that all elements with values less than the pivot
come before the pivot, while all elements with values greater than the pivot come
after it (equal values can go either way). After this partitioning, the pivot is in its
final position. This is called the partition operation.
Step 3: Recursively apply the above steps to the sub-array of elements with smaller
values and separately to the sub-array of elements with greater values.
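A minimal sketch of these steps in C++; the notes do not fix a pivot choice, so picking the last element (the Lomuto partition scheme) is an assumption:

```cpp
#include <utility>
#include <vector>

// Lomuto partition with the last element as pivot: afterwards the pivot
// sits at its final index, smaller values to its left, greater to its right.
int partition(std::vector<int>& a, int low, int high)
{
    int pivot = a[high];
    int i = low - 1;
    for (int j = low; j < high; j++)
        if (a[j] < pivot)
            std::swap(a[++i], a[j]);
    std::swap(a[i + 1], a[high]);
    return i + 1;
}

void quickSort(std::vector<int>& a, int low, int high)
{
    if (low < high) {
        int p = partition(a, low, high);  // pivot now in its final position
        quickSort(a, low, p - 1);         // sub-array of smaller values
        quickSort(a, p + 1, high);        // sub-array of greater values
    }
}
```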
MERGE SORT
Merge sort is a sorting technique based on the divide and conquer technique. In merge sort the
unsorted list is divided into N sublists, each having one element, because a list of one
element is considered sorted. Then it repeatedly merges these sublists to produce new sorted
sublists, until at last one sorted list is produced. Merge sort is quite fast, and has a time
complexity of O(n log n).
#include <iostream>
using namespace std;

void mergesort(int list[], int low, int high);  // recursively sorts list[low..high]

int main()
{
    int n, i;
    int list[30];
    cout << "enter no of elements\n";
    cin >> n;
    cout << "enter " << n << " numbers ";
    for (i = 0; i < n; i++)
        cin >> list[i];
    mergesort(list, 0, n - 1);
    cout << " after sorting\n";
    for (i = 0; i < n; i++)
        cout << list[i] << "\t";
    return 0;
}
RUN 1:
enter no of elements 5
enter 5 numbers 44 33 55 11 -1
after sorting -1 11 33 44 55
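The program above calls a mergesort routine whose body is not shown at this point; a minimal sketch consistent with that call could look like this (the 30-element scratch array mirrors the size of list[] used in main):

```cpp
// Merge the two sorted halves list[low..mid] and list[mid+1..high].
void merge(int list[], int low, int mid, int high)
{
    int tmp[30];  // scratch space; 30 matches the array size assumed in main
    int i = low, j = mid + 1, k = low;
    while (i <= mid && j <= high)
        tmp[k++] = (list[i] <= list[j]) ? list[i++] : list[j++];
    while (i <= mid)
        tmp[k++] = list[i++];   // drain the left half
    while (j <= high)
        tmp[k++] = list[j++];   // drain the right half
    for (k = low; k <= high; k++)
        list[k] = tmp[k];       // copy the merged run back
}

void mergesort(int list[], int low, int high)
{
    if (low < high) {
        int mid = (low + high) / 2;
        mergesort(list, low, mid);       // sort left half
        mergesort(list, mid + 1, high);  // sort right half
        merge(list, low, mid, high);     // combine the two sorted runs
    }
}
```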
Heap sort uses a complete binary tree with the property that a parent is always greater than or
equal to either of its children (if they exist). First the heap (max or min) is created using a
binary tree, and then the heap is sorted using a priority queue.
Steps Followed:
a) Start with just one element. One element will always satisfy heap property.
b) Insert next elements and make this heap.
c) Repeat step b, until all elements are included in the heap.
UNIT - 2
Basic data structures: the list ADT, Stack ADT, Queue ADT, array and linked list
implementation using template classes in C++. Trees: basic terminology, Binary Tree ADT,
array and linked list implementation, binary tree traversals, threaded binary tree.
Data structure A data structure is a specialized format for organizing and storing data.
General data structure types include the array, the file, the record, the table, the tree, and so
on. Any data structure is designed to organize data to suit a specific purpose so that it can be
accessed and worked with in appropriate ways
In computer science, an abstract data type (ADT) is a mathematical model for data
types where a data type is defined by its behavior (semantics) from the point of view of a
user of the data, specifically in terms of possible values, possible operations on data of this
type, and the behavior of these operations. When a class is used as a type, it is an abstract
type that refers to a hidden representation. In this model an ADT is typically implemented
as a class, and each instance of the ADT is usually an object of that class. In an ADT all the
implementation details are hidden.
Linear data structures are the data structures in which data is arranged in a list or
in a sequence.
Non-linear data structures are the data structures in which data may be
arranged in a hierarchical manner.
LIST ADT
A list is basically a collection of elements arranged in a sequential manner. In
memory we can store the list in two ways: one way is to store the elements in
sequential memory locations, that is, to store the list in arrays.
The other way is to use pointers or links to associate elements sequentially.
This is known as a linked list.
The linked list is a very different type of collection from an array. Using such lists, we
can store collections of information limited only by the total amount of memory that the OS
will allow us to use. Furthermore, there is no need to specify our needs in advance. The
linked list is a very flexible dynamic data structure: items may be added to it or deleted from
it at will. A programmer need not worry about how many items a program will have to
accommodate in advance. This allows us to write robust programs which require much less
maintenance.
A singly linked list, or simply a linked list, is a linear collection of data items. The
linear order is given by means of POINTERS. These types of lists are often referred to
as linear linked list.
* Each item in the list is called a node.
* Each node of the list has two fields:
1. Information- contains the item being stored in the list.
2. Next address- contains the address of the next item in the list.
* The last node in the list contains NULL pointer to indicate that it is the end
of the list. Conceptual view of Singly Linked List
Insertion of a node
Deletions of a node
Traversing the list
Method -1:
struct node
{
int data;
struct node *link;
};
Method -2:
class node
{
public:
int data;
node *link;
};
Insertion at the beginning: head is the pointer variable which contains the address of the
first node and temp contains the address of the new node to be inserted; then the sample code is
temp->link=head;
head=temp;
Insertion at the end: head is the pointer variable which contains the address of the first
node and temp contains the address of the new node to be inserted; then the sample code is
t=head;
while(t->link!=NULL)
{
t=t->link;
}
t->link=temp;
Insertion at a given position pos (prev and cur are traversal pointers):
c=1;
while(c<pos)
{
prev=cur;
cur=cur->link;
c++;
}
prev->link=temp;
temp->link=cur;
Deletions: Removing an element from the list, without destroying the integrity of the list
itself.
Deletion at the beginning: head is the pointer variable which contains the address of the
first node; the sample code is
t=head;
head=head->link;
cout<<"node "<<t->data<<" Deletion is success";
delete(t);
Deletion at the end:
struct node<T> *cur,*prev;
cur=prev=head;
while(cur->link!=NULL)
{
prev=cur;
cur=cur->link;
}
prev->link=NULL;
cout<<"node "<<cur->data<<" Deletion is success";
delete(cur);
Deletion at a given position pos (prev and cur are traversal pointers):
c=1;
while(c<pos)
{
prev=cur;
cur=cur->link;
c++;
}
prev->link=cur->link;
delete(cur);
Traversing the list: assuming we are given the pointer to the head of the list, how do we
visit every node up to the end of the list?
if(head==NULL)
{
cout<<"List is Empty\n";
}
else
{
t=head;
while(t!=NULL)
{
cout<<t->data<<"->";
t=t->link;
}
}
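The snippets above can be combined into one small self-contained sketch (the function names are illustrative, not from the notes):

```cpp
#include <cstddef>

struct node
{
    int data;
    node *link;
};

// Insert a new node at the beginning: it points at the old first node
// and becomes the new head of the list.
node* insert_begin(node *head, int value)
{
    node *temp = new node{value, head};
    return temp;
}

// Traverse the list from head to the NULL link, counting the nodes.
int length(node *head)
{
    int n = 0;
    for (node *t = head; t != NULL; t = t->link)
        n++;
    return n;
}
```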
DOUBLY LINKED LIST
A doubly linked list is a linked list in which each node holds the data and two pointers:
one to the previous node and one to the next node.
Method -1:
struct node
{
int data;
struct node *prev;
struct node * next;
};
Method -2:
class node
{
public:
int data;
node *prev;
node * next;
};
Insertion at the beginning: head is the pointer variable which contains the address of the
first node and temp contains the address of the new node to be inserted; then the sample code is
temp->next=head;
head->prev=temp;
head=temp;
template <class T>
void DLL<T>::insert_end()
{
struct dnode<T> *t,*temp;
int n;
cout<<"Enter data into dnode:";
cin>>n;
temp=create_dnode(n);
if(head==NULL)
head=temp;
else
{
t=head;
while(t->next!=NULL)
t=t->next;
t->next=temp;
temp->prev=t;
}
}
Deletions: Removing an element from the list, without destroying the integrity of the list
itself.
Deletion at the beginning: head is the pointer variable which contains the address of the
first node; the sample code is
t=head;
head=head->next;
head->prev=NULL;
cout<<"dnode "<<t->data<<" Deletion is success";
delete(t);
Deletion at the end:
struct dnode<T> *pr,*cr;
pr=cr=head;
while(cr->next!=NULL)
{
pr=cr;
cr=cr->next;
}
pr->next=NULL;
cout<<"dnode "<<cr->data<<" Deletion is success";
delete(cr);
Deletion at a given position pos (pr and cr are traversal pointers):
while(count<pos)
{
pr=cr;
cr=cr->next;
count++;
}
pr->next=cr->next;
cr->next->prev=pr;
CIRCULAR LINKED LIST
In a circular linked list the last node points back to the first node instead of containing a
NULL pointer.
Advantages:
Any node can be traversed starting from any other node in the list.
There is no need of NULL pointer to signal the end of the list and hence, all
pointers contain valid addresses.
In contrast to singly linked list, deletion operation in circular list is simplified as the
search for the previous node of an element to be deleted can be started from that
item itself.
STACK ADT:- A stack is a linear data structure where insertion and deletion of items take
place at one end, called the top of the stack. A stack is defined as a data structure which
operates on a last-in first-out basis, so it is also referred to as Last-In First-Out (LIFO).
Stack uses a single index or pointer to keep track of the information in the stack.
The basic operations associated with the stack are:
a) push(insert) an item onto the stack.
b) pop(remove) an item from the stack.
Assume that the array elements begin at 0 ( because the array subscript starts from 0)
and the maximum elements that can be placed in stack is max. The stack pointer, top, is
considered to be pointing to the top element of the stack. A push operation thus involves
adjusting the stack pointer to point to next free slot and then copying data into that slot of
the stack. Initially the top is initialized to -1.
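The array scheme just described can be sketched as a small class (a minimal sketch; the class name and the capacity value are assumptions):

```cpp
const int maxSize = 100;  // capacity fixed at compile time (assumed value)

class Stack
{
    int items[maxSize];
    int top;              // index of the top element; -1 means empty
public:
    Stack() : top(-1) {}
    bool push(int x)
    {
        if (top == maxSize - 1)
            return false;       // overflow: no free slot left
        items[++top] = x;       // advance top, then copy data into that slot
        return true;
    }
    bool pop(int &x)
    {
        if (top == -1)
            return false;       // underflow: stack is empty
        x = items[top--];
        return true;
    }
};
```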
Applications of Stack:
1. Stacks are used in conversion of infix to postfix expression.
2. Stacks are also used in evaluation of postfix expression.
3. Stacks are used to implement recursive procedures.
4. Stacks are used in compilers.
5. Reverse String
Towers of Hanoi: The Tower of Hanoi (also called the Tower of Brahma or Lucas' Tower) is a
mathematical game or puzzle. It consists of three rods, and a number of disks of different
sizes which can slide onto any rod. The puzzle starts with the disks in a neat stack in
ascending order of size on one rod, the smallest at the top, thus making a conical shape.
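The classic recursive solution, which implicitly uses the call stack, can be sketched as follows; returning the move count (2^n - 1 in total) is my own addition for checking:

```cpp
#include <iostream>

// Move n disks from rod 'from' to rod 'to' using 'aux' as the spare rod.
// Returns the number of single-disk moves performed.
long hanoi(int n, char from, char to, char aux)
{
    if (n == 0)
        return 0;
    long moves = hanoi(n - 1, from, aux, to);        // park n-1 disks on spare
    std::cout << "move disk " << n << " from " << from
              << " to " << to << "\n";
    return moves + 1 + hanoi(n - 1, aux, to, from);  // stack them back on top
}
```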
QUEUE ADT
A queue is an ordered collection of data such that data is inserted at one end and
deleted from the other end. The key difference compared to stacks is that in a queue the
information stored is processed first-in first-out, or FIFO: information retrieved from a
queue comes out in the same order in which it was placed in the queue.
Representing a Queue:
One of the most common ways to implement a queue is using an array. An easy way to do so
is to define an array Queue and two additional variables, front and rear. The rules for
manipulating these variables are simple:
Each time information is added to the queue, increment rear.
Each time information is taken from the queue, increment front.
Whenever front > rear or front = rear = -1, the queue is empty.
The array implementation of a queue does have drawbacks. The maximum queue size has to
be set at compile time, rather than at run time, and space can be wasted if we do not use
the full capacity of the array.
For insertion and deletion of an element from a queue, the array elements begin at 0 and
the maximum elements of the array is maxSize. The variable front will hold the index of
the item that is considered the front of the queue, while the rear variable will hold the
index of the last item in the queue.
Assume that initially the front and rear variables are initialized to -1. Like stacks,
underflow and overflow conditions are to be checked before operations in a queue.
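Those conventions can be sketched as a small class (class name and capacity are assumptions):

```cpp
const int maxSize = 100;  // assumed capacity

class Queue
{
    int items[maxSize];
    int front, rear;      // both -1 while the queue is empty
public:
    Queue() : front(-1), rear(-1) {}
    bool enqueue(int x)
    {
        if (rear == maxSize - 1)
            return false;          // overflow: no room at the rear
        if (front == -1)
            front = 0;             // first insertion
        items[++rear] = x;         // add at the rear
        return true;
    }
    bool dequeue(int &x)
    {
        if (front == -1 || front > rear)
            return false;          // underflow: queue is empty
        x = items[front++];        // take from the front
        return true;
    }
};
```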
CIRCULAR QUEUE
Once the queue gets filled up, no more elements can be added to it even if elements are
subsequently removed from it. This is because during deletion the rear pointer is not
adjusted. When the queue contains very few items and the rear pointer points to the last
element, i.e. rear = maxSize-1, we cannot insert any more items into the queue because the
overflow condition is satisfied. That means a lot of space is wasted.
Frequent reshuffling of elements is time consuming. One solution to this is
arranging all elements in a circular fashion. Such structures are often referred to as
circular queues.
A circular queue is a queue in which all locations are treated as circular such that
the first location CQ[0] follows the last location CQ[max-1].
Empty and full conditions for a circular queue:
if(front==-1)
cout<<"Queue is empty";
if(front==(rear+1)%max)
{
cout<<"Circular Queue is full\n";
}
Algorithm CQueueDeletion(Q, maxSize, Front, Rear, item)
Step 1: If Front = -1 then
print "Queue Underflow"
Return
Step 2: K = Q[Front]
Step 3: If Front = Rear then
begin
Front = -1
Rear = -1
end
else
If Front = maxSize-1 then
Front = 0
else
Front = Front + 1
Step 4: Return K
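Both circular-queue operations can be sketched together in C++ (class name and the small illustrative capacity are assumptions):

```cpp
const int maxSize = 5;  // small assumed capacity for illustration

class CircularQueue
{
    int q[maxSize];
    int front, rear;    // both -1 while the queue is empty
public:
    CircularQueue() : front(-1), rear(-1) {}
    bool isEmpty() const { return front == -1; }
    bool isFull()  const { return front == (rear + 1) % maxSize; }
    bool enqueue(int x)
    {
        if (isFull())
            return false;
        if (front == -1)
            front = 0;                  // first insertion
        rear = (rear + 1) % maxSize;    // wrap past the last location
        q[rear] = x;
        return true;
    }
    bool dequeue(int &x)
    {
        if (isEmpty())
            return false;
        x = q[front];
        if (front == rear)
            front = rear = -1;          // last remaining item removed
        else
            front = (front + 1) % maxSize;
        return true;
    }
};
```

Because the indices wrap around, a slot freed by a deletion at the front can be reused by a later insertion at the rear, which is exactly the waste the linear array version suffers from.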
DEQUE (DOUBLE-ENDED QUEUE)
In a linear queue, the usual practice is that for insertion of elements we use one end,
called the rear, and for deletion of elements we use the other end, called the front. But in
the double-ended queue we can make use of both ends for insertion of elements as well as
for deletion of elements. That means it is possible to insert elements at the rear as well
as at the front, and similarly it is possible to delete elements from the rear as well as
from the front.
Normally, insertion of elements is done at the rear end and elements are deleted from the
front end. For example, elements 10, 20, 30 are inserted at the rear end. To insert an
element at the front end in an array implementation, first shift all the elements to the right.
BINARY TREES
Binary tree is a tree in which each node has at most two children, a left child and a right
child. Thus the order of binary tree is 2.
A binary tree is either empty or consists of:
a) a node called the root
b) left and right sub-trees which are themselves binary trees.
2. Right skewed binary tree: if the left sub-tree is missing in every node of a tree, we
call it a right skewed binary tree.
b) Linked Representation
Linked representation of trees in memory is implemented using pointers. Since each node
in a binary tree can have maximum two children, a node in a linked representation has two
pointers for both left and right child, and one information field. If a node does not have any
child, the corresponding pointer field is made NULL pointer. In linked list each node will
look like this:
Left child | Data | Right child
2. Insertions and deletions, which are the most common operations, can be done
without moving the nodes.
Disadvantages:
1. This representation does not provide direct access to a node, and special algorithms are
required.
2. This representation needs additional space in each node for storing the pointers to the
left and right sub-trees.
C-B-A-D-E is the inorder traversal i.e. first we go towards the leftmost node. i.e. C so print
that node C. Then go back to the node B and print B. Then root node A then move towards
the right sub-tree print D and finally E. Thus we are following the tracing sequence of
Left|Root|Right. This type of traversal is called inorder traversal. The basic principle is to
traverse left sub-tree then root and then the right sub-tree.
template <class T>
void inorder(bintree<T> *root)
{
if(root!=NULL)
{
inorder(root->left);
cout<<root->data;
inorder(root->right);
}
}
A-B-C-D-E is the preorder traversal of the above fig. We are following Root|Left|Right
path i.e. data at the root node will be printed first then we move on the left sub-tree and go
on printing the data till we reach to the left most node. Print the data at that node and then
move to the right sub-tree. Follow the same principle at each sub-tree and go on printing the
data accordingly.
Postorder Traversal:
From the figure the postorder traversal is C-D-B-E-A. In the postorder traversal we follow
the Left|Right|Root principle: at each sub-tree, first traverse the left sub-tree, then the
right sub-tree, and only then print the root of that sub-tree.
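Preorder and postorder can be coded just like the inorder routine; the sketch below collects the visited data into a string so the visiting order is easy to inspect (the struct layout and function names are illustrative):

```cpp
#include <cstddef>
#include <string>

struct bintree
{
    char data;
    bintree *left;
    bintree *right;
};

// Root|Left|Right: visit the root before either sub-tree.
void preorder(bintree *root, std::string &out)
{
    if (root != NULL) {
        out += root->data;
        preorder(root->left, out);
        preorder(root->right, out);
    }
}

// Left|Right|Root: visit both sub-trees before their root.
void postorder(bintree *root, std::string &out)
{
    if (root != NULL) {
        postorder(root->left, out);
        postorder(root->right, out);
        out += root->data;
    }
}
```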
Threaded binary tree:- "A binary tree is threaded by making all right child pointers that
would normally be null point to the inorder successor of the node (if it exists), and all left
child pointers that would normally be null point to the inorder predecessor of the node."
Both the left and right NULL pointers can be used to point to the predecessor and successor
of a node respectively, under inorder traversal. Such a tree is called a fully threaded tree.
A threaded binary tree where only one thread is used is known as a one-way threaded tree,
and one where both threads are used is known as a two-way threaded tree.
Priority Queue
DEFINITION:
A priority queue is a collection of zero or more elements. Each element has a priority or value.
Unlike the queues, which are FIFO structures, the order of deleting from a priority queue is determined by
the element priority.
Elements are removed/deleted either in increasing or decreasing order of priority rather than in the order in
which they arrived in the queue.
There are two types of priority queues:
Min priority queue
Max priority queue
Min priority queue: Collection of elements in which the items can be inserted arbitrarily, but only smallest
element can be removed.
Max priority queue: Collection of elements in which insertion of items can be in any order but only largest
element can be removed.
In a priority queue, the elements are arranged in any order, and only the smallest or
largest element is allowed to be deleted each time.
The implementation of priority queue can be done using arrays or linked list. The data structure heap is
used to implement the priority queue effectively.
APPLICATIONS:
1. The typical example of a priority queue is scheduling jobs in an operating system.
Typically the OS allocates a priority to each job. The jobs are placed in the queue, and the
position of a job in the priority queue determines its priority. In an OS there are three
kinds of jobs: real time jobs, foreground jobs and background jobs. The OS always schedules
the real time jobs first; if there are no real time jobs pending, it schedules foreground
jobs; lastly, if no real time or foreground jobs are pending, the OS schedules the
background jobs.
2. In network communication, the priority queue is used to manage the limited bandwidth for transmission.
3. In simulation modeling to manage the discrete events the priority queue is used.
Various operations that can be performed on priority queue are-
1. Find an element
2. Insert a new element
3. Remove or delete an element
The abstract data type specification for a max priority queue is given below. The
specification for a min priority queue is the same, except that find and remove operate on
the element with minimum priority.
Now if we want to insert 7, we cannot simply place 7 as the left child of 4: a max heap has
the property that the value of any node is always greater than or equal to the values of its
children. Hence 7 will bubble up and 4 will become the left child of 7.
Note: when a new node is to be inserted in a complete binary tree, we start at the bottom
level, from the leftmost free position on that level. The heap is always a complete binary tree.
(figure: the heap after 7 is inserted, with 4 as the left child of 7)
If we want to insert node 25, then as 25 is greatest element it should be the root. Hence 25 will bubble up and 18
will move down.
(figure: the heap after 25 is inserted, with 25 at the root and 18 moved down)
The insertion strategy just outlined makes a single bubbling pass from a leaf toward the
root. At each level we do O(1) work, so we should be able to implement the strategy with
complexity O(height) = O(log n).
For deletion operation always the maximum element is deleted from heap. In Max heap the maximum
element is always present at root. And if root element is deleted then we need to reheapify the tree.
(figure: max heap with levels 25; 12, 18; 11, 10, 4)
Delete the root element, 25. Now we cannot simply put either 12 or 18 at the root: the root
must be greater than all of its children.
(figure: heap after deletion, with levels 18; 12, 4; 11, 10)
We cannot put 4 at the root, as that would not satisfy the heap property. Hence we bubble up
18 and place 18 at the root, and 4 at the old position of 18.
If 18 gets deleted then 12 becomes root and 11 becomes parent node of 10.
Thus deletion operation can be performed. The time complexity of deletion operation is O(log n).
1. Remove the maximum element, which is present at the root. A hole is created at the root.
2. Now reheapify the tree: starting from the root, repeatedly move the larger child up into
the hole until the heap property is restored.
3. Repeat steps 1 and 2 if any more elements are to be deleted.
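The insert (bubble up) and delete-max (reheapify downwards) steps can be sketched over an array, where the parent of a[i] sits at a[(i-1)/2] (the class name is illustrative):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

class MaxHeap
{
    std::vector<int> a;   // a[0] is the root; parent of a[i] is a[(i-1)/2]
public:
    void insert(int x)
    {
        a.push_back(x);                        // new leaf at the bottom level
        std::size_t i = a.size() - 1;
        while (i > 0 && a[(i - 1) / 2] < a[i]) {
            std::swap(a[(i - 1) / 2], a[i]);   // bubble up past smaller parent
            i = (i - 1) / 2;
        }
    }
    int removeMax()                            // precondition: heap not empty
    {
        int top = a[0];                        // maximum is always at the root
        a[0] = a.back();                       // last leaf fills the hole
        a.pop_back();
        std::size_t i = 0;
        for (;;) {                             // reheapify downwards
            std::size_t l = 2 * i + 1, r = l + 1, big = i;
            if (l < a.size() && a[l] > a[big]) big = l;
            if (r < a.size() && a[r] > a[big]) big = r;
            if (big == i) break;               // heap property restored
            std::swap(a[i], a[big]);           // larger child moves up
            i = big;
        }
        return top;
    }
};
```

Both operations walk one root-to-leaf path, so each costs O(log n).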
Applications Of Heap:
1. Heap is used in sorting algorithms. One such algorithm using heap is known as heap sort.
HEAP SORT
Heap sort is a method in which a binary tree is used. In this method first the heap is created using binary tree and then
heap is sorted using priority queue.
Eg:
25 57 48 38 10 91 84 33
In the heap sort method we first take all these elements into the array "A".
Now start building the heap structure. In forming the heap, the key point is to build the
heap in such a way that the highest value in the array is always at the root.
Insert 25
EXTERNAL SORTING
All the sorting algorithms discussed so far require that the input fit into main memory.
There are, however, some applications where the input is much too large to fit into memory.
External sorting algorithms are designed to handle such very large inputs.
Internal sorting deals with the ordering of the records of a file in ascending or descending
order when the whole file or list is compact enough to be accommodated in the internal
memory of the computer.
In many applications and problems it is quite common to encounter huge files comprising millions
of records which need to be sorted for their effective use in the application concerned.
The application domains of e-governance, digital library, search engines, on-line telephone
directory and electoral system, to list a few, deal with voluminous files of records.
Majority of the internal sorting techniques are virtually incapable of sorting large files since they require the whole
file in the internal memory of the computer, which is impossible. Hence the need for external sorting methods which
are exclusive strategies to sort huge files.
External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External
sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM)
and instead they must reside in the slower external memory (usually a hard drive). External sorting typically uses a
hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted,
and written out to a temporary file. In the merge phase, the sorted sub-files are combined into a single larger file.
One example of external sorting is the external merge sort algorithm, which sorts chunks that each fit in
RAM, then merges the sorted chunks together. We first divide the file into runs such that the size of a run is small
enough to fit into main memory. Then we sort each run in main memory using the merge sort algorithm. Finally we
merge the resulting runs together into successively bigger runs, until the file is sorted.
Due to their large volume, the files are stored in external storage devices such as tapes, disks or
drums.
The external sorting strategies therefore need to take into consideration the kind of medium on
which the files reside, since these influence their work strategy.
A common principle behind most popular external sorting methods is outlined below:
1. Internally sort batches of records from the source file to generate runs. Write out the runs as and
when they are generated on to the external storage devices.
2. Merge the runs generated in the earlier phase, to obtain larger but fewer runs, and write them out
onto the external storage devices.
3. Repeat the run generation and merge steps until, in the final phase, only one run is generated; that run is the sorted file.
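The run-generation step above can be sketched in C++. For illustration the "tapes" are simulated with in-memory vectors; a real external sort would read batches from a file and write each run back to external storage (the function name generateRuns is an assumption of this sketch):

```cpp
#include <algorithm>
#include <vector>

// Split the input into batches of `memorySize` records, sort each batch
// internally, and write it out as one run. Each run is returned as a
// separate vector, standing in for a run on an external tape/file.
std::vector<std::vector<int>> generateRuns(const std::vector<int>& input,
                                           std::size_t memorySize) {
    std::vector<std::vector<int>> runs;
    for (std::size_t i = 0; i < input.size(); i += memorySize) {
        std::size_t end = std::min(i + memorySize, input.size());
        std::vector<int> run(input.begin() + i, input.begin() + end);
        std::sort(run.begin(), run.end());   // internal sort of one batch
        runs.push_back(run);
    }
    return runs;
}
```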
MULTIWAY MERGE:
K-Way Merge Algorithms or Multiway Merges are a specific type of Sequence Merge Algorithms that
specialize in taking in multiple sorted lists and merging them into a single sorted list.
Example 1:
External Sorting: Example of multiway external sorting
Ta1: 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43
Assume that we have three tapes (k = 3) and the memory can hold three records.
Main memory sort
The first three records are read into memory, sorted and written on Tb1, the
second three records are read into memory, sorted and stored on Tb2, finally
the third three records are read into memory, sorted and stored on Tb3. Now
we have one run on each of the three tapes:
The fourth portion is sorted and stored on Tb1 (6, 36, 45), and the last portion is sorted and stored on Tb2:
Tb2: 18, 24, 56 | 11, 43
Thus, after the main memory sort, our tapes look like this:
Tb1: 3, 17, 29 | 6, 36, 45
Tb2: 18, 24, 56 | 11, 43
Tb3: 4, 9, 10
Merging
We build a heap tree in main memory out of the first records in each tape.
These records are: 3, 18, and 4.
We take the smallest of them - 3, using the deleteMin
operation, and store it on tape Ta1.
The record '3' belonged to Tb1, so we read the next record from Tb1 -
17, and insert it into the heap. Now the heap contains 18, 4, and 17.
The next deleteMin operation will output 4, and it will be stored on Ta1.
The record '4' comes from Tb3, so we read the next record '9' from Tb3
and insert it into the heap.
Now the heap contains 18, 17 and 9.
Proceeding in this way, the first three runs are merged and stored in sorted order on Ta1:
Ta1: 3, 4, 9, 10, 17, 18, 24, 29, 56
The remaining runs (6, 36, 45 and 11, 43) are merged onto Ta2, and a final merge of Ta1 and Ta2 produces the completely sorted file:
Tb1: 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56
Example 2:
If we have extra tapes, then we can expect to reduce the number of passes required to sort the input. This
is done by extending the basic (two- way) merge to a k-way merge.
Merging two runs is done by winding each input tape to the beginning of each run. Then the smaller
element is found, placed on an output tape, and the appropriate input tape is advanced. If there are k input
tapes, this strategy works the same way, the only difference being that it is slightly more complicated to
find the smallest of the k elements. We can find the smallest of these elements by using a priority queue.
To obtain the next element to write on the output tape, perform a deleteMin operation. The appropriate
input tape is advanced, and if the run on the input tape is not yet completed, we insert the new element
into the priority queue. For example distribute the input onto three tapes.
Ta1, Ta2, Ta3: (empty, used for output)
Tb1: 11, 81, 94 | 41, 58, 75
Tb2: 12, 35, 96 | 15
Tb3: 17, 28, 99
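The k-way merge step can be sketched with a priority queue, as described above. This is a simplified in-memory model: each run is a vector standing in for a tape, and the tuple layout and function name are assumptions of this sketch:

```cpp
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merge k sorted runs into one sorted sequence. A min-heap of
// (value, run index, position) triples always yields the smallest
// front element among the runs, mirroring the deleteMin step above.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);
    std::vector<int> out;
    while (!heap.empty()) {
        auto [v, r, p] = heap.top();          // deleteMin
        heap.pop();
        out.push_back(v);
        if (p + 1 < runs[r].size())           // advance that run's "tape"
            heap.emplace(runs[r][p + 1], r, p + 1);
    }
    return out;
}
```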
DICTIONARIES:
A dictionary is a collection of pairs of key and value, where every value is associated with a corresponding key.
Basic operations that can be performed on dictionary are:
1. Insertion of value in the dictionary
2. Deletion of particular value from dictionary
3. Searching of a specific value with the help of key
// Dictionary implemented as a sorted singly linked chain of <key, value> pairs.
class dictionary
{
private:
    int k, data;            // temporaries used while reading input
    struct node
    {
        int key;
        int value;
        struct node *next;  // next pair in the chain
    } *head;
public:
    dictionary();           // initialize an empty chain
    void insert_d();        // insert a <key, value> pair in sorted order
    void delete_d();        // delete the pair with a given key
    void display_d();       // display all pairs
    void length();          // count the pairs
};
After inserting the pairs <1,10> and <4,20>, the sorted chain is:
head → (1,10) → (4,20) → NULL
Compare the key values of the 'curr' and 'New' nodes. If New->key > curr->key, then attach the New node after the 'curr' node. If we insert <3,15>, we have to search for its proper position by comparing key values: at the node <4,20>, (curr->key < New->key) is false, hence the else part is executed and the new node is linked in before it.
head → (1,10) → (3,15) → (4,20) → (7,80) → NULL
Case 1: Initially assign the 'head' node to 'curr'. Then ask for the key value of the node which is to be deleted. Starting from the head node, the key value of each node is checked and compared with the desired node's key value. The node to be deleted is obtained in the variable 'curr', and the variable 'prev' keeps track of the previous node of the 'curr' node. For example, to delete the node with key value 4:
head → (1,10) → (3,15) → (4,20) → (7,80) → NULL, with 'curr' pointing at (4,20) and 'prev' at (3,15).
Case 2: If the node to be deleted is the head node itself, simply make the next node the new 'head' and delete 'curr':
Before: head/curr → (1,10) → (3,15) → (4,20) → (7,80) → NULL
After: head → (3,15) → (4,20) → (7,80) → NULL
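The insertion and deletion cases above can be sketched as plain functions over a simplified node type (the names insertPair and deletePair are illustrative, not the class methods declared earlier):

```cpp
#include <cstddef>

struct Node {
    int key, value;
    Node* next;
};

// Insert a <key, value> pair into the chain sorted on key
// (head may change when the new node comes first).
void insertPair(Node*& head, int key, int value) {
    Node* n = new Node{key, value, nullptr};
    if (head == nullptr || key < head->key) {   // new node becomes head
        n->next = head;
        head = n;
        return;
    }
    Node* curr = head;
    while (curr->next != nullptr && curr->next->key < key)
        curr = curr->next;                      // find the proper position
    n->next = curr->next;
    curr->next = n;
}

// Delete the node with the given key, covering both cases from the text:
// an interior node tracked by prev (case 1) and the head node (case 2).
bool deletePair(Node*& head, int key) {
    Node* curr = head;
    Node* prev = nullptr;
    while (curr != nullptr && curr->key != key) {
        prev = curr;
        curr = curr->next;
    }
    if (curr == nullptr) return false;          // key not present
    if (prev == nullptr) head = curr->next;     // deleting the head
    else prev->next = curr->next;
    delete curr;
    return true;
}
```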
SKIP LISTS:
Consider a sorted chain of nodes, each with a single forward pointer, ending in NULL.
Now, to search for any node in this sorted chain we have to search the chain from the head node, visiting each node in turn. This searching time can be reduced if we add one more level to every alternate node. This extra level contains a forward pointer to some later node; that means in the sorted chain some nodes can hold pointers to more than one node.
If we want to search for node 40 in such a chain, we require comparatively less time. The search can be made still more efficient if we add a few more levels of forward references. Each skip-list node therefore stores its element and an array of forward pointers, one per level.
Searching:
The desired node is searched with the help of a key value.
Searching for a key within a skip list begins at the header, at the overall list level, and moves forward in the list comparing node keys to the key_val. If the node key is less than the key_val, the search continues moving forward at the same level. If, on the other hand, the node key is equal to or greater than the key_val, the search drops one level and continues forward. This process continues until the desired key_val has been found, if it is present in the skip list. If it is not present, the search will either reach the end of the list or hit the first key with a value greater than the search key.
Insertion:
There are two tasks that should be done before the insertion operation:
1. Before the insertion of any node, the place for the new node in the skip list is searched. Hence before any insertion takes place, the search routine executes. The last[] array in the search routine is used to keep track of references to the nodes where the search drops down one level.
2. The level for the new node is obtained by the routine randomLevel().
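The search procedure described above can be sketched as follows; the node layout (a vector of forward pointers, one per level) is an assumption of this sketch, and the randomized insertion is omitted:

```cpp
#include <vector>

// A simplified skip-list node: forward[i] is the successor at level i.
struct SkipNode {
    int key;
    std::vector<SkipNode*> forward;
    SkipNode(int k, int levels) : key(k), forward(levels, nullptr) {}
};

// Search as described above: start at the top level of the header,
// move right while the next key is smaller, then drop one level.
bool skipSearch(SkipNode* header, int keyVal) {
    SkipNode* x = header;
    for (int i = static_cast<int>(header->forward.size()) - 1; i >= 0; --i)
        while (x->forward[i] != nullptr && x->forward[i]->key < keyVal)
            x = x->forward[i];                  // advance on level i
    x = x->forward[0];                          // candidate on the bottom level
    return x != nullptr && x->key == keyVal;
}
```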
HASH TABLE REPRESENTATION
Using the hash key, the required piece of data can be searched in the hash table with few key comparisons; the searching time then depends upon the size of the hash table. An effective representation of a dictionary can be obtained using a hash table: we place the dictionary entries in the hash table using a hash function.
HASH FUNCTION
A hash function is a function which is used to put the data in the hash table; the same hash function is then used to retrieve the data from the hash table. Thus the hash function is used to implement the hash table.
For example, consider that we want to place some employee records in a hash table. The record of an employee is placed with the help of the key: employee ID. The employee ID is a 7-digit number. For placing a record in the hash table, the 7-digit number is converted into 3 digits by taking only the last three digits of the key.
If the key is 4967000, the record is stored at position 0. For the key 8421002, the record is placed at position 2 in the array.
Hence the hash function is H(key) = key % 1000, where key % 1000 is the hash function and the value obtained from it is called the hash key.
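As a minimal sketch, the division-method hash above is simply:

```cpp
// Division-method hash as in the text: the 7-digit employee ID is
// reduced to 3 digits by taking the last three digits (key mod 1000).
int hashKey(long key) {
    return static_cast<int>(key % 1000);
}
```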
Bucket and Home bucket: The hash function H(key) is used to map several dictionary entries into the hash table. Each position of the hash table is called a bucket. The bucket H(key) is the home bucket for the dictionary pair whose key is key.
There are various other types of hash functions that can be used to place a record in the hash table, for example the multiplication method:
H(key) = floor(p * (fractional part of key * A)), where p is an integer constant and A is a constant real number.
For key = 107, p = 50 and A = 0.61803398987:
107 * 0.61803398987 = 66.1296..., whose fractional part is 0.1296...
H(107) = floor(50 * 0.1296...) = floor(6.4818458045) = 6
The record with key 107 will therefore be placed at location 6 in the hash table.
4. Digit Folding:
The key is divided into separate parts and using some simple operation these parts are
combined to produce the hash key.
For example, consider the key 12365412. It is divided into the parts 123, 654 and 12, and these are added together:
H(key) = 123 + 654 + 12 = 789
The record will be placed at location 789.
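A sketch of digit folding follows. Note one assumption: this implementation groups digits from the right, whereas the example above groups them from the left; for the key 12365412 both groupings happen to sum to 789.

```cpp
// Digit-folding hash: split the key's decimal digits into groups of
// three (taken from the right here) and add the groups together.
int foldHash(long key) {
    int sum = 0;
    while (key > 0) {
        sum += static_cast<int>(key % 1000);   // take one 3-digit part
        key /= 1000;
    }
    return sum;
}
```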
5. Digit Analysis:
The digit analysis is used in a situation when all the identifiers are known in advance.
We first transform the identifiers into numbers using some radix, r. Then examine the digits of
each identifier. Some digits having most skewed distributions are deleted. This deleting of digits
is continued until the number of remaining digits is small enough to give an address in the range
of the hash table. Then these digits are used to calculate the hash address.
COLLISIONS
The hash function returns the hash key using which a record can be placed in the hash table. This function helps us place the record at an appropriate position, so that we can later retrieve the record directly from that location. The function needs to be designed very carefully: it should ideally not return the same hash key address for two different records, since that is an undesirable situation in hashing.
Definition: The situation in which the hash function returns the same hash key (home bucket) for more than one record is called a collision, and two different keys that hash to the same value are called synonyms.
Similarly, when there is no room for a new pair in the hash table, the situation is called overflow. Sometimes handling a collision may lead to an overflow condition. Frequent collisions and overflows indicate a poor hash function.
For example, consider the hash function H(key) = key % 10 with a hash table of size 10. The record keys to be placed are 131, 44, 43, 78, 19, 36, 57 and 77.
131 % 10 = 1
44 % 10 = 4
43 % 10 = 3
78 % 10 = 8
19 % 10 = 9
36 % 10 = 6
57 % 10 = 7
77 % 10 = 7
Index: 0    1    2    3    4    5    6    7    8    9
Key:   -    131  -    43   44   -    36   57   78   19
Now if we try to place 77 in the hash table, we get the hash key 7, and at index 7 the key 57 is already placed. This situation is called a collision. From index 7, if we look for the next vacant position at the subsequent indices 8 and 9, we find that there is no room to place 77 in the hash table. This situation is called overflow.
SEPARATE CHAINING
In this collision-handling method an additional field, a chain, is introduced with the data. When a collision occurs, a linked list (chain) is maintained at the home bucket, holding all the colliding entries.
For example, with D = 10 buckets: the key 131 has home bucket 131 % 10 = 1, and the keys 21 and 61 also demand home bucket 1. Hence a chain is maintained at index 1:
index 1: 131 → 21 → 61 → NULL
index 7: 97 → NULL
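A minimal separate-chaining table along the lines of the example above (the class and method names are assumptions of this sketch):

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Hash table with separate chaining: each home bucket holds a linked
// chain of all keys that hash to it (D = number of buckets).
class ChainedHashTable {
    std::vector<std::list<int>> buckets;
public:
    explicit ChainedHashTable(std::size_t d) : buckets(d) {}
    void insert(int key) {
        buckets[key % buckets.size()].push_back(key);  // append to the chain
    }
    bool search(int key) const {
        const auto& chain = buckets[key % buckets.size()];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }
    std::size_t chainLength(std::size_t index) const {
        return buckets[index].size();
    }
};
```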
LINEAR PROBING (OPEN ADDRESSING)
This is the easiest method of handling collisions. When a collision occurs, i.e. when two records demand the same home bucket in the hash table, the collision is resolved by placing the second record linearly down from the home bucket, at the first empty bucket found. When using linear probing (open addressing), the hash table is represented as a one-dimensional array with indices that range from 0 to the desired table size-1. Before inserting any elements into this table, we must initialize the table to represent the situation where all slots are empty; this allows us to detect overflows and collisions when we insert elements into the table. Then, using some suitable hash function, each element can be inserted into the hash table.
For example:
H(key) = 131 % 10
=1
Index 1 will be the home bucket for 131. Continuing in this fashion we will place 4, 8, 7.
Now the next key to be inserted is 21. According to the hash function
H(key)=21%10
H(key) = 1
But index 1 is already occupied by 131, i.e. a collision occurs. To resolve this collision we move linearly down and probe the element into the next empty location; therefore 21 is placed at index 2. If the next element is 5, its home bucket is index 5, which is empty, so we put the element 5 at index 5.
Problem with linear probing: clustering. Consider inserting the keys 19, 18, 39, 29 and 8 into a table of size 10:
19 % 10 = 9
18 % 10 = 8
39 % 10 = 9
29 % 10 = 9
8 % 10 = 8
All of these keys compete for indices 8 and 9, so they end up occupying a run of consecutive slots: a cluster is formed, and every further insertion into this region must probe past the whole cluster, making it grow longer and longer.
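Linear probing can be sketched as below. One assumption here: the probe wraps around to the start of the table, a common variant; in the overflow example above the scan stops at the end of the table instead.

```cpp
#include <vector>

// Open addressing with linear probing: on a collision, scan forward
// (wrapping around) to the next empty slot. -1 marks an empty slot.
class LinearProbingTable {
    std::vector<int> slot;
public:
    explicit LinearProbingTable(std::size_t m) : slot(m, -1) {}
    // Returns the index where key was placed, or -1 on overflow (table full).
    int insert(int key) {
        std::size_t m = slot.size();
        std::size_t h = key % m;               // home bucket
        for (std::size_t i = 0; i < m; ++i) {
            std::size_t idx = (h + i) % m;     // linear probe sequence
            if (slot[idx] == -1) {
                slot[idx] = key;
                return static_cast<int>(idx);
            }
        }
        return -1;
    }
    int at(std::size_t i) const { return slot[i]; }
};
```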
QUADRATIC PROBING:
Quadratic probing operates by taking the original hash value and adding successive values of an arbitrary quadratic polynomial to the starting value. This method uses the following formula:
H(key) = (Hash(key) + i²) % m, for i = 0, 1, 2, ...
where m is the table size.
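As a small illustration, the quadratic probe sequence for a key can be computed as:

```cpp
#include <vector>

// Probe sequence for quadratic probing: h, h+1², h+2², ... (mod m),
// where h = key % m is the home bucket.
std::vector<int> quadraticProbes(int key, int m, int count) {
    std::vector<int> probes;
    for (int i = 0; i < count; ++i)
        probes.push_back((key % m + i * i) % m);
    return probes;
}
```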
DOUBLE HASHING
Double hashing resolves collisions by using a second hash function to decide the step size of the probe sequence. For example, insert the elements 37, 90, 45, 22, 49, 17 and 55 into a hash table of size 10 using H1(key) = key % 10:
H1(37) = 37 % 10 = 7
H1(90) = 90 % 10 = 0
H1(45) = 45 % 10 = 5
H1(22) = 22 % 10 = 2
H1(49) = 49 % 10 = 9
These five elements are placed without collision. The second hash function is
H2(key) = M - (key % M)
where M is a prime number smaller than the size of the table. The prime number smaller than table size 10 is 7, hence M = 7.
Now insert 17: H1(17) = 7, which collides with 37.
H2(17) = 7 - (17 % 7) = 7 - 3 = 4
That means we have to move 4 places from index 7, i.e. take jumps of size 4. Therefore 17 will be placed at index (7 + 4) % 10 = 1.
Now insert 55: H1(55) = 55 % 10 = 5, a collision with 45.
H2(55) = 7 - (55 % 7) = 7 - 6 = 1
That means we take one jump from index 5, placing 55 at index 6. Finally the hash table will be:
Index: 0    1    2    3    4    5    6    7    8    9
Key:   90   17   22   -    -    45   55   37   -    49
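The double-hashing example above can be sketched as a small class (the names are illustrative):

```cpp
#include <vector>

// Double hashing: H1(key) = key % m gives the home bucket and
// H2(key) = M - (key % M) gives the jump size (M a prime < m).
class DoubleHashTable {
    std::vector<int> slot;
    int M;                                      // prime smaller than table size
public:
    DoubleHashTable(std::size_t m, int prime) : slot(m, -1), M(prime) {}
    // Returns the index where key was placed, or -1 if the probe fails.
    int insert(int key) {
        std::size_t m = slot.size();
        std::size_t idx = key % m;              // H1: home bucket
        std::size_t jump = M - (key % M);       // H2: jump size
        for (std::size_t i = 0; i < m; ++i) {
            if (slot[idx] == -1) {
                slot[idx] = key;
                return static_cast<int>(idx);
            }
            idx = (idx + jump) % m;             // take another jump
        }
        return -1;
    }
    int at(std::size_t i) const { return slot[i]; }
};
```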
Comparison of Quadratic Probing & Double Hashing
Double hashing requires a second hash function, and its probe sequences behave much like random probing, which avoids the clustering produced by linear and quadratic probing. Double hashing is more complex to implement than quadratic probing, while quadratic probing is the faster technique of the two.
REHASHING
Rehashing is a technique in which the table is resized, i.e., the size of the table is roughly doubled by creating a new table. It is preferable that the new table size be a prime number. There are situations in which rehashing is required.
For example, consider that we have to insert the elements 37, 90, 55, 22, 17, 49 and 87. The table size is 10 and the hash function is H(key) = key % 10:
37 % 10 = 7
90 % 10 = 0
55 % 10 = 5
22 % 10 = 2
17 % 10 = 7 (collision, solved by linear probing)
49 % 10 = 9
87 % 10 = 7 (collision, again solved by linear probing)
Now this table is almost full, and if we try to insert more elements collisions will occur and eventually further insertions will fail. Hence we rehash by doubling the table size. The old table size is 10, so the new table should have size 20; but 20 is not a prime number, so we prefer to make the table size 23. The new hash function will then be
H(key) = key % 23
Advantages:
1. This technique gives the programmer the flexibility to enlarge the table size if required.
2. Only the space gets doubled while keeping a simple hash function, which reduces the occurrence of collisions.
EXTENSIBLE HASHING
Extensible hashing is a technique which handles large amounts of data. The data are placed in buckets by examining a certain number of bits of each key. An extensible hash table grows and shrinks in a manner similar to B-trees. In extensible hashing, the size of the directory determines which bucket an element is placed in; the depth of each bucket is indicated in parentheses.
(Figure: a directory with the entries 0 and 1; bucket (0) holds the keys 001 and 010, and bucket (1) holds the key 111; the number in parentheses is the bucket's depth.)
Each bucket can hold only a fixed number of entries. If a bucket receives more data than it can hold, the bucket is split and, when necessary, the directory is doubled.
Consider that we have to insert 1, 4, 5, 7, 8, 10, and assume each page (bucket) can hold 2 data entries.
Step 1: Insert 1 and 4 (1 = 001, 4 = 100). We examine the last bit of each key and insert the data into the bucket selected by that bit.
Deletion Operation:
Delete 7.
(Figure: the directory 00, 01, 10, 11 before and after the deletion; once 7 is removed, buckets of depth (1) can be merged and the directory can shrink, reversing the splitting process.)
COMPARISON OF HASHING AND SKIP LISTS:
1. Hashing carries out dictionary operations using a randomized process; skip lists also implement dictionary operations using a randomized process.
2. If the data are given in sorted order, hashing is not an effective method to implement a dictionary, whereas sorted data improves the performance of a skip list.
3. The space requirement in hashing is the hash table plus one forward pointer per node; a skip list requires forward pointers for every level of a node.
4. Hashing is a more efficient method than skip lists; skip lists are not as efficient.
5. Skip lists are more versatile than hash tables, but the worst-case space requirement is larger for a skip list than for hashing.
Search Trees:-Binary Search Trees, Definition, ADT, Implementation, Operations- Searching, Insertion
and Deletion, AVL Trees, Definition, Height of an AVL Tree, Operations – Insertion, Deletion and
Searching, B-Trees, B-Tree of order m, height of a B-Tree, insertion, deletion and searching. Graphs:
Basic terminology representation of graphs, graph search methods DFS,BFS.
TREES
A Tree is a data structure in which each element is attached to zero or more elements directly beneath it.
(Figure: a tree with root A at level 0; B, C and D at level 1; E, F, G, H, I and J at level 2; K and L at level 3.)
Terminology
The connections between elements are called branches.
A tree has a single root, called root node, which is shown at the top of the tree. i.e. root is always at
the highest level 0.
Each node except the root has exactly one node above it, called its parent. Eg: A is the parent of B, C and D.
The nodes just below a node are called its children. ie. child nodes are one level lower than the
parent node.
A node which does not have any child called leaf or terminal node. Eg: E, F, K, L, H, I and M are leaves.
Nodes with at least one child are called non terminal or internal nodes.
The child nodes of same parent are said to be siblings.
A path in a tree is a list of distinct nodes in which successive nodes are connected by branches in
the tree.
The length of a particular path is the number of branches in that path. The degree of a node
of a tree is the number of children of that node.
The maximum number of children a node can have is often referred to as the order of a
tree. The height or depth of a tree is the length of the longest path from root to any leaf.
1. Root: This is the unique node in the tree to which further sub-trees are attached. Eg: A
2. Degree of the node: The total number of sub-trees attached to the node is called the degree of the node. Eg: for node A the degree is 3; for node K the degree is 0.
3. Leaves: These are the terminal nodes of the tree. The nodes with degree 0 are always the leaf nodes. Eg: E, F, K, L, H, I, J
BINARY TREE
A binary tree is a tree in which each node has at most two children, a left child and a right child. Thus the order of a binary tree is 2.
A binary tree is a finite set of nodes which is either empty or consists of a root and two disjoint
trees called left sub-tree and right sub- tree.
In binary tree each node will have one data field and two pointer fields for representing the
sub-branches. The degree of each node in the binary tree will be at the most two.
1. Left skewed binary tree: If the right sub-tree is missing in every node of a tree, we call it a left skewed tree.
(Figure: a full binary tree with root A, children B and C, and grandchildren D, E, F and G.)
Note:
1. A binary tree of depth n will have at most 2^n - 1 nodes.
2. A complete binary tree will have at most 2^l nodes at level l, where l starts from 0.
3. Any binary tree with n nodes will have at most n+1 null branches.
4. The total number of edges in a complete binary tree with n terminal nodes is 2(n-1).
a) Sequential Representation
b) Linked Representation
a) Sequential Representation
The simplest way to represent binary trees in memory is the sequential representation that uses
one-dimensional array.
1) The root of the binary tree is stored in the first location of the array (index 0).
2) If a node is at location j of the array, then its left child is at location 2j+1 and its right child at location 2j+2.
The maximum array size required to store a tree is 2^(d+1) - 1, where d is the depth of the tree.
An advantage of the linked representation is that insertions and deletions, which are the most common operations, can be done without moving the other nodes.
Disadvantages:
1. This representation does not provide direct access to a node; special algorithms are required.
2. This representation needs additional space in each node for storing the pointers to the left and right sub-trees.
Traversing a tree means processing it so that each node is visited exactly once. A binary tree can be traversed in a number of ways. The most common tree traversals are:
In-order
Pre-order and
Post-order
Pre-order 1.Visit the root Root | Left | Right
2.Traverse the left sub tree in pre-order
3.Traverse the right sub tree in pre-order.
In-order 1.Traverse the left sub tree in in-order Left | Root | Right
2.Visit the root
3.Traverse the right sub tree in in-order.
Post-order 1.Traverse the left sub tree in post-order Left | Right | Root
2.Traverse the right sub tree in post-order.
3.Visit the root
(Figure: root A; A's children are B and C; B's children are D and E; H is the left child of E; C's children are F and G; G's children are I and J; K is the left child of I.)
The pre-order traversal is: ABDEHCFGIKJ
The in-order traversal is : DBHEAFCKIGJ
The post-order traversal is:DHEBFKIJGCA
(Figure: root A with left child B and right child D; C is the left child of B and E is the right child of D. In the inorder traversal, C is printed first, B second, A third, D fourth and E at the last.)
C-B-A-D-E is the inorder traversal: first we go towards the leftmost node, C, and print it; then go back to node B and print B; then the root node A; then move towards the right sub-tree, printing D and finally E. Thus we follow the tracing sequence Left | Root | Right. This type of traversal is called inorder traversal: the basic principle is to traverse the left sub-tree, then the root, and then the right sub-tree.
A-B-C-D-E is the preorder traversal of the above fig. We are following Root|Left|Right path i.e.
data at the root node will be printed first then we move on the left sub-tree and go on printing the
data till we reach to the left most node. Print the data at that node and then move to the right sub-
tree. Follow the same principle at each sub-tree and go on printing the data accordingly.
From the figure the postorder traversal is C-B-E-D-A. In the postorder traversal we follow the Left | Right | Root principle: move to the leftmost node and print it, then process the right sub-tree (if there is one) in the same way, and print the root of each sub-tree last. The key idea is that at each sub-tree we follow the Left | Right | Root principle and print the data accordingly.
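The three traversals described above can be written as short recursive routines (the node layout is an assumption of this sketch):

```cpp
#include <string>

// A binary tree node and the three recursive traversals described above.
struct TNode {
    char data;
    TNode *left, *right;
};

void preorder(TNode* t, std::string& out) {   // Root | Left | Right
    if (t == nullptr) return;
    out += t->data;
    preorder(t->left, out);
    preorder(t->right, out);
}

void inorder(TNode* t, std::string& out) {    // Left | Root | Right
    if (t == nullptr) return;
    inorder(t->left, out);
    out += t->data;
    inorder(t->right, out);
}

void postorder(TNode* t, std::string& out) {  // Left | Right | Root
    if (t == nullptr) return;
    postorder(t->left, out);
    postorder(t->right, out);
    out += t->data;
}
```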
Insertion:
(Figure, before insertion: a binary search tree with 10 at the root and 20, 22, 24 along its right branch.)
In the figure, suppose we want to insert 23. We start by comparing 23 with the value of the root node, 10. As 23 is greater than 10, we move to the right sub-tree. Now we compare 23 with 20 and move right, then compare 23 with 22 and move right. Next we compare 23 with 24; as it is less than 24, we move to the left branch of 24, and since there is no node as the left child of 24, we attach 23 as the left child of 24.
Deletion of a leaf node is the simplest deletion, in which we set the left or right pointer of the parent node to NULL.
(Figure, before deletion: root 10; left child 7 with children 5 and 9; right child 15 with children 12 and 18.)
From the figure, to delete the node having value 5 we set the left pointer of its parent node to NULL; that is, the left pointer of the node having value 7 is set to NULL.
AVL TREES
Adelson-Velskii and Landis in 1962 introduced a binary tree structure that is balanced with respect to the heights of its sub-trees. Because the tree is kept balanced, retrieval of any node can be done in O(log n) time, where n is the total number of nodes. From the names of these scientists the tree is called an AVL tree.
Definition:
An empty tree is height balanced. If T is a non-empty binary tree with TL and TR as its left and right sub-trees, then T is height balanced if and only if
i. TL and TR are height balanced, and
ii. |hL - hR| <= 1, where hL and hR are the heights of TL and TR.
The idea of balancing a tree is based on calculating the balance factor of each node. The balance factor BF(T) of a node in a binary tree is defined to be hL - hR, where hL and hR are the heights of the left and right sub-trees of T. For any node in an AVL tree the balance factor BF(T) is -1, 0 or +1.
Height of an AVL tree: The height of an AVL tree with n nodes is O(log n).
Proof: Let Nh be the minimum number of nodes in an AVL tree of height h. In the worst case, one sub-tree has height h-1 and the other has height h-2, and both of these sub-trees are themselves AVL trees, since for every node of an AVL tree the heights of the left and right sub-trees differ by at most 1. Hence
Nh = Nh-1 + Nh-2 + 1
with N1 = 1 and N2 = 2. Since Nh-1 > Nh-2,
Nh > 2Nh-2 > 4Nh-4 > ... > 2^i Nh-2i
and taking i = h/2 - 1 gives
Nh > 2^(h/2 - 1) N2 = 2^(h/2)
so that h < 2 log2(Nh), i.e. h = O(log n).
This proves that the height of an AVL tree is always O(log n); hence search, insertion and deletion can all be carried out in logarithmic time.
The AVL tree follows the property of binary search tree. In fact AVL trees are
basically binary search trees with balance factors as -1, 0, or +1.
After insertion of any node in an AVL tree if the balance factor of any node
becomes other than -1, 0, or +1 then it is said that AVL property is violated. Then
we have to restore the destroyed balance condition. The balance factor is denoted at
right top corner inside the node.
After insertion of a new node if balance condition gets destroyed, then the nodes on that path(new node
insertion point to root) needs to be readjusted. That means only the affected sub tree is to be rebalanced.
The rebalancing should be such that entire tree should satisfy AVL property.
There are four different cases in which rebalancing is required after the insertion of a new node:
1. An insertion of new node into left sub tree of left child. (LL).
2. An insertion of new node into right sub tree of left child. (LR).
3. An insertion of new node into left sub tree of right child. (RL).
4. An insertion of new node into right sub tree of right child.(RR).
The modifications done on an AVL tree in order to rebalance it are called rotations.
Insertion Algorithm:
1.Insert a new node as new leaf just as an ordinary binary search tree.
2. Now trace the path from the insertion point (the new leaf) towards the root. For each node 'n' encountered, check whether the heights of left(n) and right(n) differ by at most 1.
a) If yes, move towards parent(n).
b) Otherwise, restructure by doing either a single rotation or a double rotation.
Once we perform a rotation at node 'n', we do not need to perform a rotation at any ancestor of 'n'.
2. RR rotation:
When node '4' gets attached as the right child of node 'C', node 'A' becomes unbalanced. The rotation which needs to be applied is the RR rotation, as shown in the figure.
3. RL rotation:
When node '2' is attached as a left child of node 'C', node 'A' becomes unbalanced as its balance factor becomes -2. Then the RL rotation needs to be applied to rebalance the AVL tree.
Example:
To insert node '1' we have to attach it as a left child of '2'. This unbalances the tree, and we apply an LL rotation to restore the AVL property.
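The single rotation used for the LL case can be sketched as follows. Height bookkeeping is omitted to keep the sketch short, so this is not a complete AVL implementation:

```cpp
// Single right rotation for the LL case: the left child rises and the
// unbalanced node becomes its right child. A full AVL node would also
// store and update sub-tree heights.
struct AVLNode {
    int key;
    AVLNode *left, *right;
};

AVLNode* rotateRight(AVLNode* a) {        // a is the unbalanced node
    AVLNode* b = a->left;                 // the left child rises
    a->left = b->right;                   // b's right sub-tree moves under a
    b->right = a;
    return b;                             // b is the new sub-tree root
}
```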
Insert 25
We will attach 25 as a right child of 18. No balancing is required as entire tree preserves the AVL
property
Even after the deletion of any particular node from an AVL tree, the tree has to be restructured in order to preserve the AVL property, and thereby various rotations need to be applied.
a) If the node to be deleted is a leaf node, then simply make it NULL to remove it.
b) If the node to be deleted is not a leaf node, i.e. the node has one or two children, then the node must be swapped with its inorder successor. Once the node is swapped, we can remove this node.
3. Now we have to traverse back up the path towards root, checking the balance factor of every
node along the path. If we encounter unbalancing in some sub tree
then balance that sub tree using appropriate single or double rotations. The deletion
algorithm takes O(log n) time to delete any node.
Searching a node in an AVL tree is very simple. As an AVL tree is basically a binary search tree, the algorithm used for searching a node in a binary search tree is the same one used for an AVL tree.
B-TREES
Multi-way trees are tree data structures with more than two branches at a node. The data
structures of m-way search trees, B trees and Tries belong to this category of tree
structures.
AVL search trees, being height balanced versions of binary search trees, provide efficient retrieval and storage operations. The complexity of insert, delete and search operations on AVL search trees is O(log n).
In applications such as file indexing, where the entries in an index may be very large, maintaining the index as an m-way search tree provides a better option than AVL search trees, which are only balanced binary search trees.
While binary search trees are two-way search trees, m-way search trees generalize them to more than two branches per node and hence provide efficient retrievals. B-trees are height balanced versions of m-way search trees, but they are not designed for keys of varying sizes.
Tries are tree based data structures that support keys with varying sizes.
A B-tree of order m is an m-way search tree and hence may be empty. If non-empty, then the following properties are satisfied by its extended tree representation:
i. The root node must have at least two child nodes and at most m child nodes.
ii. All internal nodes other than the root must have at least ⌈m/2⌉ non-empty child nodes and at most m non-empty child nodes.
iii. The number of keys in each internal node is one less than its number of child nodes, and these keys partition the keys of the tree into sub-trees.
iv. All external nodes are at the same level.
Example:
(Figure: a B-tree of order 4 whose root holds the keys F, K and O; the next level holds the key groups C D, G, M N and P Q W; the key groups S T and X Y Z appear at the level below.)
Insertion
For example construct a B-tree of order 5 using following numbers. 3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23, 12,
20, 26, 4, 16, 18, 24, 25, 19
The order 5 means at the most 4 keys are allowed. The internal node should have at least 3 non empty
children and each leaf node must contain at least 2 keys.
Step 1: Insert 3, 14, 7, 1. They all fit in one node: [1 3 7 14].
Step 2: Insert 8. The node would then hold five keys, so it is split and the median 7 moves up:
[7]
[1 3] [8 14]
Step 3: Insert 5, 11, 17 into the leaves:
[7]
[1 3 5] [8 11 14 17]
Step 4: Now insert 13. If we insert 13, the leaf node will have 5 keys, which is not allowed. Hence 8, 11, 13, 14, 17 is split and the median key 13 is moved up.
[7 13]
[1 3 5] [8 11] [14 17]
Step 5: Insert 6, 23, 12, 20 into the leaves:
[7 13]
[1 3 5 6] [8 11 12] [14 17 20 23]
Step 6: 26 is inserted into the rightmost leaf node. The node 14, 17, 20, 23, 26 is then split and 20 is moved up.
[7 13 20]
[1 3 5 6] [8 11 12] [14 17] [23 26]
Step 7: Insert 4, 16, 18, 24, 25. Inserting 4 makes the first leaf overflow; it is split and 4 moves up:
[4 7 13 20]
[1 3] [5 6] [8 11 12] [14 16 17 18] [23 24 25 26]
Step 8: Finally insert 19. The leaf 14, 16, 17, 18, 19 is split and the median 17 moves up, which makes the root 4, 7, 13, 17, 20 overflow; it is split in turn and the median 13 is moved up to form a new root. The tree then will be:
[13]
[4 7] [17 20]
[1 3] [5 6] [8 11 12] [14 16] [18 19] [23 24 25 26]
Deletion
Consider deleting 8 from the tree above. The key 8 lies in the leaf [8 11 12]; removing it leaves [11 12], which still has the required minimum of two keys, so nothing else changes:
[13]
[4 7] [17 20]
[1 3] [5 6] [11 12] [14 16] [18 19] [23 24 25 26]
Now we will delete 20. Since 20 is not in a leaf node, we find its inorder successor, which is 23; hence 23 is moved up to replace 20.
[13]
[4 7] [17 23]
[1 3] [5 6] [11 12] [14 16] [18 19] [24 25 26]
Next we will delete 18. Deleting 18 leaves the corresponding node with only one key, which is not allowed in a B-tree of order 5. The sibling node to the immediate right has an extra key, so in such a case we can borrow: a key moves down from the parent and the sibling's spare key moves up.
[13]
[4 7] [17 24]
[1 3] [5 6] [11 12] [14 16] [19 23] [25 26]
Now delete 5. This deletion is not as easy: 5 is in a leaf node, and this leaf has no extra keys, nor does the sibling to its immediate left or right. In such a situation we combine the node with one of its siblings: remove 5 and combine 6 with the node 1, 3. To keep the tree balanced we move the parent's key down; hence 4 moves down, as 4 lies between 1, 3 and 6. The tree will be:
[13]
[7] [17 24]
[1 3 4 6] [11 12] [14 16] [19 23] [25 26]
But again the internal node [7] contains only one key, which is not allowed in a B-tree of order 5. We then try to borrow a key from a sibling, but the sibling [17 24] has no spare key. Hence all we can do is combine 7 with the root key 13 and the sibling keys 17, 24; the combined node becomes the new root, and the B-tree will be:
[7 13 17 24]
[1 3 4 6] [11 12] [14 16] [19 23] [25 26]
Searching
The search operation on a B-tree is similar to a search on a binary search tree. Instead of choosing between a left and a right child as in a binary search tree, a B-tree makes an m-way choice at each node. Consider the B-tree given below.
13
4 7 | 17 20
1 3 | 5 6 | 8 11 12 | 14 16 | 18 19 | 23 24 25 26
The running time of search operation depends upon the height of the tree. It is O(log n).
Height of B-tree
The maximum height of a B-tree gives an upper bound on the number of disk accesses. In a B-tree of order 2m, each node has at most 2m children and at most 2m - 1 keys, so level i of the tree contains at most (2m)^(i-1) nodes. Summing over h levels, the maximum number of keys in a B-tree of order 2m and depth h is (2m)^h - 1. Equivalently, a B-tree holding n keys has height at least log base 2m of (n + 1), which is O(log n).
Terminology of Graph
Graphs:-
A graph G is a discrete structure consisting of nodes (called vertices) and lines joining the nodes (called edges). Two vertices are adjacent to each other if they are joined by an edge, and that edge is said to be incident with the two vertices. We use V(G) and E(G) to denote the set of vertices and the set of edges of G respectively.
Incidence Matrix
In this representation, a graph is represented using a matrix of size (total number of vertices) by (total number of edges). That means a graph with 4 vertices and 6 edges can be represented using a 4 x 6 matrix. In this matrix, rows represent vertices and columns represent edges. Each entry is 0, 1 or -1. Here, 0 means the column's edge is not incident on the row's vertex, 1 means the edge leaves the row's vertex (outgoing), and -1 means the edge enters the row's vertex (incoming).
Graph traversals
Graph traversal means visiting every vertex and edge exactly once in a well-defined order. While using certain graph algorithms, you must ensure that each vertex of the graph is visited exactly once. During a traversal, it is important that you track which vertices have been visited. The most common way of tracking vertices is to mark them.
The recursive nature of DFS can be simulated using an explicit stack. The basic idea is as follows:
1. Pick a starting node and push all its adjacent nodes onto a stack.
2. Pop a node from the stack to select the next node to visit, and push all its adjacent nodes onto the stack.
3. Repeat this process until the stack is empty.
However, ensure that the nodes that are visited are marked. This will prevent you from visiting the same node more than once. If you do not mark the nodes that are visited and you visit the same node more than once, you may end up in an infinite loop.