Unit I
1.1 Definitions of an Algorithm
An algorithm is a step-by-step procedure that defines a set of instructions to be
executed in a certain order to get the desired output.
(Or)
An algorithm is a finite set of instructions which, if followed, accomplishes a particular task.
1.2 Characteristics of an Algorithm
In addition, every algorithm must satisfy the following criteria:
Input: Zero or more quantities are externally supplied.
Output: At least one output must be produced.
Finiteness: All the operations can be carried out within a finite number of steps.
Effectiveness: Every instruction must be basic enough to be carried out exactly; a good
algorithm should also occupy as little memory space as possible.
Unambiguous: The algorithm should be clear and unambiguous. Each of its steps (or
phases), and their inputs/outputs, should be clear and must lead to only one meaning.
Algorithms are generally created independently of underlying languages, i.e. an
algorithm can be implemented in more than one programming language.
Big O Notation, O
The O notation is the formal way to express the upper bound of an algorithm's running time.
It measures the worst-case time complexity, i.e. the longest amount of time an algorithm can
possibly take to complete.
O(g(n)) = {f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0}
Omega Notation, Ω
The Ω notation is the formal way to express the lower bound of an algorithm's running time. It measures
the best-case time complexity, i.e. the least amount of time an algorithm can possibly take to complete.
Ω(g(n)) = {f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0}
Theta Notation, Θ
The Θ notation is the formal way to express both the lower bound and the upper bound of an
algorithm's running time. It is represented as follows:
Θ(g(n)) = {f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤
c2·g(n) for all n ≥ n0}
Relations between Θ, Ω and O
Theorem: For any two functions g(n) and f(n),
f(n) = Θ(g(n)) if and only if
f(n) = O(g(n)) and f(n) = Ω(g(n)).
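The definitions above can be checked numerically. The sketch below (not from the original text; the functions and the witness constants c1, c2, n0 are illustrative choices) verifies that f(n) = 3n² + 10n is Θ(n²) by testing the inequality 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) over a range of n ≥ n0:

```python
# Numeric sanity check of the Theta definition for an example function.
def f(n):
    return 3 * n**2 + 10 * n

def g(n):
    return n**2

# Candidate witnesses: 3n^2 <= 3n^2 + 10n always, and
# 3n^2 + 10n <= 4n^2 whenever 10n <= n^2, i.e. n >= 10.
c1, c2, n0 = 3, 4, 10

# Verify 0 <= c1*g(n) <= f(n) <= c2*g(n) for all tested n >= n0.
ok = all(0 <= c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, 10_000))
print(ok)  # True, so f(n) = Theta(n^2) with these witnesses
```

Such a check cannot prove the bound for all n, but it quickly exposes wrongly chosen constants.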
Classification of sorting
a. External sorting
b. Internal sorting
c. Stable Sorting
External sorting
External sorting is a process of sorting in which large blocks of data stored
on storage devices are moved to the main memory and then sorted, i.e. a sort is
external if the records that it is sorting are in auxiliary storage.
Internal sorting
Internal sorting is a process of sorting the data in the main memory, i.e. a sort
is internal if the records that it is sorting are in main memory.
Stable sort
A sorting technique is called stable if for all records i and j such that k[i] equals
k[j], if r[i] precedes r[j] in the original file, r[i] precedes r[j] in the sorted file. That is, a
stable sort keeps records with the same key in the same relative order that they were in
before the sort.
1.6.1 Insertion sort
If the first few objects are already sorted, an unsorted object can be inserted into the sorted
set in its proper place. This is called insertion sort. (Or) It is the one that sorts a set of records
by inserting records into an existing sorted file.
With each pass of an insertion sort, one or more pieces of data are inserted into their correct
location in an ordered list.
Instead of inserting an element anywhere in the list and resorting it again, each time when a
new element is encountered, it is inserted in the correct position.
In this sorting, the list is divided into two parts: sorted and unsorted.
In each pass the first element of the unsorted sub-list is transferred to the sorted list by
inserting it at the appropriate place.
In pass p, move the pth element left until its correct place is found among the first p
elements.
Example: Card players. (As they pick up each card, they insert it into the proper sequence in
their hand.)
Steps
Let A be the array of n numbers.
Scan the array from A[1] to A[n-1] and find A[R], where R = 1, 2, 3, …, (N-1), and
insert it into the proper position in the previously sorted sub-array A[1], A[2], …, A[R-1].
If R = 2, A[2] is inserted into the previously sorted sub-array A[1], i.e., A[2] is
inserted either before A[1] or after A[1].
If R = 3, A[3] is inserted into the previously sorted sub-array A[1], A[2], i.e., A[3] is
inserted either before A[1], after A[2], or in between A[1] and A[2].
We repeat the process (n-1) times, and finally we get the sorted array.
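The steps above can be sketched in Python (a minimal in-place version; the function name and 0-based indexing are my own choices, not from the text):

```python
def insertion_sort(a):
    """Sort list a in place: in pass p, move the p-th element left
    until its correct place is found among the first p elements."""
    for p in range(1, len(a)):
        key = a[p]
        j = p
        # Shift larger elements of the sorted part one position right.
        while j > 0 and a[j - 1] > key:
            a[j] = a[j - 1]
            j -= 1
        a[j] = key   # insert the element at its proper position
    return a

print(insertion_sort([34, 8, 64, 51, 32, 21]))  # [8, 21, 32, 34, 51, 64]
```

Each iteration of the outer loop corresponds to one pass that moves the wall between the sorted and unsorted parts one element to the right.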
Fig.1.1: Insertion Sort concept (an imaginary wall separates the sorted sublist from the
unsorted sublist).
Figure 1 traces the insertion sort through a list of six numbers. Sorting these data requires
five sort passes. Each pass moves the wall one element to the right as an element is removed
from the unsorted sublist and inserted into the sorted sublist.
Input (unsorted list): 34 | 8 64 51 32 21
After p = 2: 8 34 | 64 51 32 21
After p = 3: 8 34 64 | 51 32 21
After p = 4: 8 34 51 64 | 32 21
After p = 5: 8 32 34 51 64 | 21
After p = 6: 8 21 32 34 51 64 |
1.6.2 Selection sort
The list is divided into two sublists, sorted and unsorted, which are separated by an
imaginary wall.
We find the smallest element from the unsorted sublist and swap it with the element at the
beginning of the unsorted data.
After each selection and swap, the imaginary wall between the two sublists moves one
element ahead, increasing the number of sorted elements and decreasing the number of unsorted
ones.
Each time we move one element from the unsorted sublist to the sorted sublist, we say
that we have completed a sort pass.
A list of n elements requires n-1 passes to completely rearrange the data.
Example 1:
Original list: 23 78 45 8 32 56
After pass 1: 8 78 45 23 32 56
After pass 2: 8 23 45 78 32 56
After pass 3: 8 23 32 78 45 56
After pass 4: 8 23 32 45 78 56
After pass 5: 8 23 32 45 56 78
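The passes above can be sketched in Python (a minimal version; the function name is my own, and `wall` marks the boundary between the sorted and unsorted sublists):

```python
def selection_sort(a):
    """Each pass finds the smallest element of the unsorted sublist and
    swaps it to the front, moving the imaginary wall one element ahead."""
    n = len(a)
    for wall in range(n - 1):              # n-1 passes for n elements
        smallest = wall
        for j in range(wall + 1, n):       # scan the unsorted sublist
            if a[j] < a[smallest]:
                smallest = j
        a[wall], a[smallest] = a[smallest], a[wall]
    return a

print(selection_sort([23, 78, 45, 8, 32, 56]))  # [8, 23, 32, 45, 56, 78]
```

Note that each pass performs at most one swap, which is why selection sort makes only O(n) moves despite O(n²) comparisons.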
This means that the behavior of the selection sort algorithm does not depend on
the initial organization of the data.
Since O(n^2) grows so rapidly, the selection sort algorithm is appropriate only for
small n.
Although the selection sort algorithm requires O(n^2) key comparisons, it only
requires O(n) moves.
A selection sort could be a good choice if data moves are costly but key
comparisons are not (short keys, long records).
1.6.3 Shell sort
Shell sort is a highly efficient sorting algorithm based on the insertion sort algorithm.
It avoids the large shifts that insertion sort performs when a small value is far to the right
and has to move far to the left.
This algorithm uses insertion sort on widely spaced elements first to sort them, and then
sorts the less widely spaced elements. This spacing is termed the interval. The interval is
calculated based on Knuth's formula as
h = h * 3 + 1
where h is the interval with initial value 1.
This algorithm is quite efficient for medium-sized data sets; its running time depends on
the chosen interval sequence (with Knuth's sequence the worst case is about O(n^(3/2)),
where n is the number of items).
Example:
The following example shows how shell sort works. Consider the array
{35, 33, 42, 10, 14, 19, 27, 44}. For ease of understanding we take an interval of 4,
and make a virtual sublist of all values located at an interval of 4 positions. Here these
values are {35, 14}, {33, 19}, {42, 27} and {10, 44}.
We compare the values in each sublist and swap them (if necessary) in the original
array. After this step, the array is {14, 19, 27, 10, 35, 33, 42, 44}.
Then we take an interval of 2, and this gap generates two sublists: {14, 27, 35, 42}
and {19, 10, 33, 44}.
We compare and swap the values, if required, in the original array. After this step,
the sublists are {14, 27, 35, 42} and {10, 19, 33, 44}.
And finally, we sort the rest of the array using an interval of value 1. Shell sort uses insertion
sort to sort the array, giving {10, 14, 19, 27, 33, 35, 42, 44}.
We see that it required only four swaps to sort the rest of the array.
Example 2:
5 6
9 10
11 1
8},
{1,
9},
{2,
10},
{3,
11},
{4,
12},
{5,
6,
10,
13},
{6,
14},
{7,
15}
7,
11,
15}
next gap = 8 / 2 = 4
{0,
4,
8,
12},
{1,
5,
9,
13},
{2,
14},
{3,
next gap = 4 / 2 = 2
{0, 2, 4, 6, 8, 10, 12, 14}, {1, 3, 5, 7, 9, 11, 13, 15}
final gap = 2 / 2 = 1
Algorithm for shell sort
We shall now see the algorithm for shell sort.
Step 1 Initialize the value of h.
Step 2 Divide the list into smaller sub-list of equal interval h.
Step 3 Sort these sub-lists using insertion sort.
Step 4 Repeat until the complete list is sorted.
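The steps above can be sketched in Python (a minimal version using Knuth's h = 3h + 1 interval sequence; the function name is my own choice):

```python
def shell_sort(a):
    """Gapped insertion sort with Knuth's interval sequence h = 3h + 1."""
    n = len(a)
    h = 1
    while h * 3 + 1 < n:          # largest Knuth interval below n
        h = h * 3 + 1
    while h >= 1:
        # Insertion sort on the virtual sublists of elements h apart.
        for i in range(h, n):
            key = a[i]
            j = i
            while j >= h and a[j - h] > key:
                a[j] = a[j - h]
                j -= h
            a[j] = key
        h //= 3                   # shrink the interval: ..., 13, 4, 1
    return a

print(shell_sort([35, 33, 42, 10, 14, 19, 27, 44]))
# [10, 14, 19, 27, 33, 35, 42, 44]
```

The final pass with h = 1 is an ordinary insertion sort, but by then most elements are already close to their final positions.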
1.6.4 Bubble sort
Another well-known sorting method is bubble sort. It differs from the selection sort in
that, instead of finding the smallest record and then performing an interchange, two records are
interchanged immediately upon discovering that they are out of order.
When this approach is used there are at most n-1 passes required. During the first pass
k1 and k2 are compared, and if they are out of order, then records R1 and R2 are interchanged;
this process is repeated for records R2 and R3, R3 and R4, and so on. This method will cause
records with small keys to bubble up.
After the first pass, the record with the largest key will be in the nth position. On each
successive pass, the records with the next largest keys will be placed in positions n-1, n-2,
… respectively, thereby resulting in a sorted table.
After each pass through the table, a check can be made to determine whether any
interchanges were made during that pass. If no interchanges occurred then the table must be
sorted and no further passes are required.
A general algorithm for bubble sort is:
If the current element in the vector > next element in the vector, then exchange the elements.
If no exchanges were made then return; else reduce the size of the unsorted vector by one
and repeat.
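The general algorithm above, including the early-exit check after a pass with no interchanges, can be sketched as (function name my own):

```python
def bubble_sort(a):
    """Adjacent records are exchanged immediately upon discovering they
    are out of order; stop early if a pass makes no interchanges."""
    n = len(a)
    for unsorted in range(n - 1, 0, -1):     # at most n-1 passes
        exchanged = False
        for i in range(unsorted):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                exchanged = True
        if not exchanged:                    # table already sorted
            break
    return a

print(bubble_sort([23, 78, 45, 8, 32, 56]))  # [8, 23, 32, 45, 56, 78]
```

After pass k the k largest records occupy the last k positions, which is why the inner loop shrinks by one each pass.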
1.6.5 Quick sort
QUICK_SORT(K, LB, UB)
Given a table K of N records, this recursive procedure sorts the table, as previously
described, in ascending order.
A dummy record with key K[N+1] is assumed, where K[I] <= K[N+1] for all 1 <= I <= N.
The integer parameters LB and UB denote the lower and upper bounds of the current sub-table
being processed.
The indices I and J are used to select certain keys during the processing of each sub-table.
KEY contains the key value which is being placed in its final position within the sorted sub-table.
FLAG is a logical variable which indicates the end of the process that places a record in its final
position.
When FLAG becomes false, the input sub-table has been partitioned into two disjoint
parts.
Variables used
K − Array to hold the elements
LB, UB − Lower and upper bounds of the current sub-table
I, J − Used to select certain keys during processing
KEY − Holds the key value being placed in its final position
FLAG − Logical variable to indicate the end of the process
The quick-sort algorithm consists of the following three steps:
1. Divide: Partition the list into two sublists around a pivot.
2. Conquer: Sort each sublist recursively by quick sort.
3. Combine: No extra work is needed, because the sublists are sorted in place.
To partition the list, we first choose some element from the list for which we hope
about half the elements will come before and half after. Call this element the
pivot.
Then we partition the elements so that all those with values less than the pivot
come in one sublist and all those with greater values come in another.
Partitioning places the pivot in its correct position within the array.
Arranging the array elements around the pivot p generates two smaller sorting problems.
Sort the left section of the array, and sort the right section of the array.
When these two smaller sorting problems are solved recursively, our bigger
sorting problem is solved.
First, we have to select a pivot element among the elements of the given array, and we
put this pivot into the first location of the array before partitioning.
Which array item should be selected as pivot?
Somehow we have to select a pivot, and we hope that we will get a good
partitioning.
We can choose the first or last element as a pivot (it may not give a good
partitioning).
Partition Function
Invariant for the partition algorithm
The pivot value divides the list into two parts. Recursively we find a pivot for each
sub-list until all sub-lists contain only one element.
Quick Sort Pivot Algorithm
Based on our understanding of partitioning in quick sort, we should now try to write an
algorithm for it here.
Step 1 Choose the highest index value as pivot
Step 2 Take two variables to point left and right of the list, excluding the pivot
Step 3 left points to the low index
Step 4 right points to the high index
Step 5 while the value at left is less than the pivot, move right
Step 6 while the value at right is greater than the pivot, move left
Step 7 if both step 5 and step 6 do not match, swap the values at left and right
Step 8 if left >= right, the point where they met is the new pivot position
QuickSort Algorithm
Using pivot algorithm recursively we end-up with smaller possible partitions. Each
partition then processed for quick sort. We define recursive algorithm for quicksort as below
Step 1 Make the right-most index value pivot
Step 2 partition the array using pivot value
Step 3 quicksort left partition recursively
Step 4 quicksort right partition recursively
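The recursive algorithm above can be sketched in Python. Note that this sketch uses the right-most element as pivot as in Step 1, but partitions with the simpler single-scan (Lomuto) method rather than the two-pointer scan of Steps 2-8; the function names are my own:

```python
def partition(a, lb, ub):
    """Partition a[lb..ub] around the right-most element as pivot;
    return the pivot's final position."""
    pivot = a[ub]
    i = lb - 1
    for j in range(lb, ub):
        if a[j] < pivot:          # move smaller keys to the left side
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[ub] = a[ub], a[i + 1]   # place pivot in final position
    return i + 1

def quick_sort(a, lb=0, ub=None):
    """Recursively quick sort the left and right partitions."""
    if ub is None:
        ub = len(a) - 1
    if lb < ub:
        p = partition(a, lb, ub)
        quick_sort(a, lb, p - 1)
        quick_sort(a, p + 1, ub)
    return a

print(quick_sort([23, 78, 45, 8, 32, 56]))  # [8, 23, 32, 45, 56, 78]
```

After each call to partition, the pivot is in its final sorted position, so the two recursive calls never need to look at it again.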
Quicksort is slow when the array is sorted and we choose the first element as the pivot.
Although the worst case behavior is not so good, its average case behavior is much better
than its worst case.
1.6.6 Merge sort
Best-case
All the elements in the first array are smaller (or larger) than all the elements in
the second array, so each merge scans one array almost entirely before touching the other.
Worst-case
The elements of the two arrays alternate, so nearly every element requires a key
comparison during the merge.
But merge sort requires an extra array whose size equals the size of the original array.
If we use a linked list, we do not need an extra array.
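The merge step described above can be sketched in Python. This array version makes the extra-storage cost visible: each merge builds a new list of the same total size (function name my own):

```python
def merge_sort(a):
    """Split, sort each half recursively, then merge the two sorted halves."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge: the extra list 'merged' is the auxiliary array.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])    # one side is exhausted; copy the rest
    merged.extend(right[j:])
    return merged

print(merge_sort([23, 78, 45, 8, 32, 56]))  # [8, 23, 32, 45, 56, 78]
```

Using `<=` in the comparison keeps equal keys in their original relative order, so this merge sort is stable.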
1.6.7 Radix sort
Radix sort first groups data items according to their rightmost character, and puts these
groups into order with respect to this rightmost character. We repeat these grouping and
combining operations for all other character positions in the data items, from the rightmost
to the leftmost character position.
Given a table of N records arranged as a linked list, where each node in the list consists
of a key field and a link field, this procedure performs the sort.
The first node is pointed to by a pointer called FIRST. The vectors T and B are pointers
that store the addresses of the rear and front of each queue (pocket); in particular, T[I] and
B[I] denote the top and bottom of pocket I.
The pointer R is used to denote the current record. NEXT is the pointer to the next
record. The PREV pointer is used to combine the queues. D is used to examine the current digit.
Variables used
FIRST − Pointer to the first node in the table
T − Denotes the top (rear) of each queue
B − Denotes the bottom (front) of each queue
J − Pass index
P − Pocket index to point to the temporary table
R − Pointer to store the address of the current record being handled
NEXT − Pointer which has the address of the next record in the table
PREV − Pointer to combine the pockets
D − Current digit being handled in the current key field
Example 1:
bad,bar,cat,dad,fat,god,him,mad,mom,par
combine groups (SORTED)
Example 2:
Its memory requirement is d times the original size of the data (because each group
should be big enough to hold the original data collection).
The radix sort is more appropriate for a linked list than an array (we will not need
the huge memory in this case).
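The character-by-character grouping and combining described above can be sketched in Python for equal-length string keys; the function name and the dictionary-of-pockets representation are my own choices:

```python
def radix_sort(words):
    """Distribute words into pockets by one character per pass, from the
    rightmost to the leftmost position, then combine the pockets."""
    width = len(words[0])                  # assumes equal-length keys
    for pos in range(width - 1, -1, -1):
        pockets = {}
        for w in words:                    # stable: append preserves order
            pockets.setdefault(w[pos], []).append(w)
        # Combine the pockets in character order for the next pass.
        words = [w for ch in sorted(pockets) for w in pockets[ch]]
    return words

words = ["cat", "him", "bar", "mom", "dad", "god", "mad", "bad", "fat", "par"]
print(radix_sort(words))
# ['bad', 'bar', 'cat', 'dad', 'fat', 'god', 'him', 'mad', 'mom', 'par']
```

Because each distribution pass is stable, the order established by earlier (less significant) character positions is preserved by later passes, which is what makes the rightmost-to-leftmost scheme correct.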
1.7 External sorting with M buffers
Pass 1:
Read the file into memory M blocks at a time, sort each group of M blocks internally,
and write each sorted chunk back to disk.
Requirement: the number of chunks K <= M-1.
Pass 2:
Divide the M buffers into:
- M-1 input buffers
- 1 output buffer
Use the M-1 input buffers to read the K sorted chunks (1 block at a time).
Merge sort the K sorted chunks together into a sorted file using 1 output buffer as
follows:
- Find the record with the smallest sort key among the K buffers.
- Move the record with the smallest sort key to the output buffer.
- When the output buffer is full, then write the output buffer to disk.
- When some input buffer is empty, then read another block from the sorted chunk
if there is more data.
We can use the M buffers to merge sort any number K <= M-1 of sorted chunks into one
larger (sorted) chunk.
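The merge step of Pass 2 — repeatedly moving the record with the smallest sort key among the K input buffers to the output — can be sketched in memory with a heap standing in for the buffer scan; the function name and in-memory lists standing in for disk blocks are my own simplifications:

```python
import heapq

def k_way_merge(chunks):
    """Merge K sorted chunks: repeatedly move the record with the
    smallest sort key among the K inputs to the output."""
    # Seed the heap with the first record of each non-empty chunk.
    heap = [(chunk[0], k, 0) for k, chunk in enumerate(chunks) if chunk]
    heapq.heapify(heap)
    output = []
    while heap:
        key, k, i = heapq.heappop(heap)
        output.append(key)                 # "write to the output buffer"
        if i + 1 < len(chunks[k]):         # "read the next record from chunk k"
            heapq.heappush(heap, (chunks[k][i + 1], k, i + 1))
    return output

print(k_way_merge([[1, 4, 7], [2, 5], [3, 6, 8]]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

In a real external sort the output list would be flushed to disk whenever the output buffer fills, and an input buffer would be refilled one block at a time from its chunk.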
1.8 Searching
1.8.1 Linear/Sequential searching
The simplest search technique is the linear or sequential search. In this technique, we start
at the beginning of a list or table and examine each record until the desired record is found
or the list is exhausted. This technique is suitable for a table, a linked list, or an array, and
it can be applied to an unordered list, but the efficiency is increased if the list is ordered.
For any search, the total work is reflected by the number of key comparisons made in the
search. The number of comparisons depends on where the target key (the value to be
searched for) appears.
If the desired target key is at the first position of the list, only one comparison is required.
If the record is at the second position, two comparisons are required. If it is at the last
position of the list, n comparisons are required.
If the search is unsuccessful, it makes n comparisons, as the target will be compared with
all the entries of the list.
Variables used:
K Array to hold elements
N Total no of elements
X Element to be searched
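Using the variables above, the search can be sketched in Python (function name my own):

```python
def linear_search(K, N, X):
    """Compare X with each element of array K in turn; return the index
    of the first match, or -1 if the list is exhausted."""
    for i in range(N):
        if K[i] == X:
            return i
    return -1

print(linear_search([10, 7, 3, 9, 5], 5, 9))  # 3 (found after 4 comparisons)
```

An unsuccessful search such as `linear_search([10, 7, 3, 9, 5], 5, 42)` makes all N comparisons and returns -1.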
For example:
Let us search for the target 9 in the array A = {10, 7, 3, 9, 5}.
Comparison 1:
A[0] = 10 — the two elements are not equal, so the target is checked against the next value.
Comparison 2:
A[1] = 7 — the two elements are not equal, so the target is checked against the next value.
Comparison 3:
A[2] = 3 — the two elements are not equal, so the target is checked against the next value.
Comparison 4:
A[3] = 9 — the two elements are equal, so the checking is finished and the position is returned.
Analysis of linear /sequential search:
Whether the sequential search carried out on list is implemented as arrays or linked list or
on files. The criterion in performance is the comparison loop. The fewer the number of
comparisons, the sooner the algorithm will terminate.
The fewer possible comparisons = 1. When they require item is the first in the list. The
maximum comparisons = N when the required item is the last item in the list. Thus if required
item is in position I in the list, I comparisons are required.
Hence the average number of comparisons done by sequential search is
1+2+3+. I +..+N/N
= N(N+1)/2*N
= (N+1)/2
Thus sequential search is easy to write and efficient for short lists. It does not require
sorted data.
1.8.2 Binary search
For searching lists with many values, linear search is inefficient; binary search helps in
searching larger (sorted) lists. To search for a particular item with value target, the
approximate middle entry of the table is located, and its key value is examined.
If the target value is higher than the middle value, then the search is made among the
elements after the middle element. If the target value is smaller than the middle element,
then the search is made among the elements before the middle value. This process continues
until the required target is found.
Variables used
low − Index of the first element of the current search range
high − Index of the last element of the current search range
mid − Index of the middle element, mid = (low + high) / 2

Let us search for the element 17 in the array A = {9, 11, 17, 20, 25, 30, 33}.
Is low > high ? No. Mid = (0+6)/2 = 3.
9 11 17 20 25 30 33 (low = 0, mid = 3, high = 6)
Is 17 == A[3] ? No.
17 < A[3], repeat the steps with low = 0 and high = mid-1 = 2.
Is low > high ? No. Mid = (0+2)/2 = 1.
9 11 17 (low = 0, mid = 1, high = 2)
Is 17 == A[1] ? No.
17 > A[1], repeat the steps with low = mid+1 = 2 and high = 2.
Is low > high ? No. Mid = (2+2)/2 = 2.
17 (low = mid = high = 2)
Is 17 == A[2] ? Yes.
Return (2).

Let us search for an element that is not in the list, e.g. 10.
Is low > high ? No. Mid = (0+6)/2 = 3.
Is 10 == A[3] ? No.
10 < A[3], repeat the steps with low = 0 and high = mid-1 = 2.
Is low > high ? No. Mid = (0+2)/2 = 1.
Is 10 == A[1] ? No.
10 < A[1], repeat the steps with low = 0 and high = mid-1 = 0.
Is low > high ? No. Mid = (0+0)/2 = 0.
Is 10 == A[0] ? No.
Repeat the steps with low = mid+1 = 1 and high = 0.
Is low > high ? Yes.
Return (-1).
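The trace above can be sketched as an iterative function (function name my own):

```python
def binary_search(A, target):
    """Repeatedly halve the search range [low, high] around mid;
    return the index of target, or -1 when low passes high."""
    low, high = 0, len(A) - 1
    while low <= high:
        mid = (low + high) // 2
        if A[mid] == target:
            return mid
        elif target < A[mid]:
            high = mid - 1        # search the elements before mid
        else:
            low = mid + 1         # search the elements after mid
    return -1

A = [9, 11, 17, 20, 25, 30, 33]
print(binary_search(A, 17))  # 2
print(binary_search(A, 10))  # -1
```

The two calls reproduce the successful and unsuccessful traces shown above.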
Analysis of binary search
The binary search method needs no more than ⌊log2 n⌋ + 1 comparisons. This implies that
for an array of a million entries, only about twenty comparisons will be needed. Contrast this
with sequential search, which on average needs (n+1)/2 comparisons.
1.8.3 Ternary search trees
A ternary search tree is a special trie data structure where the child nodes of a standard
trie are ordered as a binary search tree.
Search proceeds recursively: a search miss occurs if we encounter a null link or reach the
end of the key before reaching an end-of-string mark.
Representation of ternary search trees
Unlike the trie (standard) data structure, where each node contains 26 pointers for its
children, each node in a ternary search tree contains only three pointers:
1. The left pointer points to the node whose value is less than the value in the current
node.
2. The equal pointer points to the node whose value is equal to the value in the current
node.
3. The right pointer points to the node whose value is greater than the value in the current
node.
Apart from the above three pointers, each node has a field to indicate data (a character in
the case of a dictionary) and another field to mark the end of a string.
So, more or less it is similar to a BST, which stores data based on some order. However,
data in a ternary search tree is distributed over the nodes; e.g. it needs 4 nodes to store the
word "Geek".
Example 1, Example 2, Example 3: [diagrams of ternary search trees built from small sets
of words]
A search or insertion in a full TST requires time proportional to the key length. The
number of links in a TST is at most three times the number of characters in all the keys.
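The three-pointer node and the recursive insert/search described above can be sketched in Python; the class and function names, and the sample word set, are my own choices:

```python
class Node:
    """One TST node: a character, three child pointers, and an end mark."""
    __slots__ = ("ch", "left", "eq", "right", "is_end")
    def __init__(self, ch):
        self.ch = ch
        self.left = self.eq = self.right = None
        self.is_end = False            # marks the end of a stored string

def insert(root, word):
    """Insert word into the TST rooted at root; return the (new) root."""
    if not word:
        return root
    ch, rest = word[0], word[1:]
    if root is None:
        root = Node(ch)
    if ch < root.ch:
        root.left = insert(root.left, word)    # smaller character: go left
    elif ch > root.ch:
        root.right = insert(root.right, word)  # greater character: go right
    elif rest:
        root.eq = insert(root.eq, rest)        # equal: consume one character
    else:
        root.is_end = True
    return root

def search(root, word):
    """Miss on a null link, or if the key ends before an end-of-string mark."""
    if root is None or not word:
        return False
    ch, rest = word[0], word[1:]
    if ch < root.ch:
        return search(root.left, word)
    if ch > root.ch:
        return search(root.right, word)
    if rest:
        return search(root.eq, rest)
    return root.is_end

root = None
for w in ["cat", "cats", "up", "bug"]:
    root = insert(root, w)
print(search(root, "cats"), search(root, "ca"))  # True False
```

Only the equal pointer consumes a character of the key, which is why a search in a full TST takes time proportional to the key length (plus the binary-search-like left/right steps at each character).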
Advantages
One of the advantages of using ternary search trees over tries is that ternary search trees
are more space-efficient (they involve only three pointers per node, as compared to 26 in
standard tries). Further, ternary search trees can be used any time a hash table would be used
to store strings.
Tries are suitable when there is a proper distribution of words over the alphabet, so that
the space is utilized most efficiently; otherwise ternary search trees are better. Ternary search
trees are efficient to use (in terms of space) when the strings to be stored share a common prefix.
1. Can make more space efficient by …
2. Can compromise speed and space by having a large branch at the root (R or R^2) while
the rest of the trie is a regular TST.
Disadvantage
1. Adapts to the non-uniformity often seen in real keys.
2. Though the character set may be large, often only a few characters are used, or are used
only after a particular prefix.
3. Accesses bytes or larger symbols rather than bits (like Patricia tries), which are often
better supported/efficient, or more natural to the keys.
Applications of ternary search trees
1. Ternary search trees are efficient for queries like "given a word, find the next word in the
dictionary" (near-neighbor lookups), "find all telephone numbers starting with 9342", or
typing a few starting characters in a web browser to display all website names with this
prefix (auto-complete feature).
2. Used in spell checks: ternary search trees can be used as a dictionary to store all the
words. Once a word is typed in an editor, it can be searched in the ternary search tree to
check for correct spelling.
The following are three creative applications of the TST:
An English dictionary that matches words as you type and checks spelling.
A flexible array that can assume any size or dimension on the fly.
A database that stores all information in the same place (regardless of which record or
column the information belongs to), thereby decreasing access time and reducing storage
requirements.
Time complexity: The time complexity of ternary search tree operations is similar to
that of a binary search tree, i.e. the insertion, deletion, and search operations take time
proportional to the height of the ternary search tree. The space is proportional to the length
of the strings to be stored.