AOA Module 6 - String of Algorithms - Aeraxia - in
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
// function to find prefix
void prefixSearch(char* pat, int m, int* pps) {
int length = 0;
// array to store prefix
pps[0] = 0;
int i = 1;
while(i < m) {
// to check if the current character matches the previous character
if(pat[i] == pat[length]) {
// increment the length
length++;
// store the length in the prefix array
pps[i] = length;
}else {
if(length != 0) {
// to update length of previous prefix length
length = pps[length - 1];
i--;
} else
// if the length is 0, store 0 in the prefix array
pps[i] = 0;
}
i++; // incrementing i
}
}
// function to search for pattern
void patrnSearch(char* orgnString, char* patt, int m, int *locArray, int *loc) {
int n, i = 0, j = 0;
n = strlen(orgnString);
// array to store the prefix values
// allocate memory for the prefix array
int* prefixArray = (int*)malloc(m * sizeof(int));
// calling prefix function to fill the prefix array
prefixSearch(patt, m, prefixArray);
*loc = 0; // initialize the location index
while(i < n) {
// checking if main string character matches pattern string character
if(orgnString[i] == patt[j]) {
// increment both i and j
i++;
j++;
}
// if j and m are equal pattern is found
if(j == m) {
// store the location of the pattern
locArray[*loc] = i-j;
(*loc)++; // increment the location index
// update j to the previous prefix value
j = prefixArray[j-1];
// checking if i is less than n and the current characters do not match
}else if(i < n && patt[j] != orgnString[i]) {
if(j != 0)
// update j to the previous prefix value
j = prefixArray[j-1];
// if j is zero
else
i++; // increment i
}
}
free(prefixArray); // free the memory of the prefix array
}
int main() {
// declare the original text
char* orgnStr = "AAAABCAEAAABCBDDAAAABC";
// pattern to be found
char* patrn = "AAABC";
// get the size of the pattern
int m = strlen(patrn);
// array to store the locations of the pattern
int locationArray[strlen(orgnStr)];
// to store the number of locations
int index;
// calling pattern search function
patrnSearch(orgnStr, patrn, m, locationArray, &index);
// to loop through location array
for(int i = 0; i<index; i++) {
// print the location of the pattern
printf("Pattern found at location: %d\n", locationArray[i]);
}
}
Output
Pattern found at location: 1
Pattern found at location: 8
Pattern found at location: 17
Q2) Rewrite and Compare Rabin Karp and Knuth Morris Pratt Algorithms
Give the pseudo code for the KMP String Matching Algorithm
The Rabin-Karp-Algorithm
The Rabin-Karp string matching algorithm calculates a hash value for the pattern, as well as for each
M-character subsequence of the text to be compared. If the hash values are unequal, the algorithm
computes the hash value of the next M-character subsequence. If the hash values are equal, the algorithm
compares the pattern with the M-character subsequence character by character. In this way, there is only
one hash comparison per text subsequence, and character matching is only required when the hash values match.
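RABIN-KARP-MATCHER (T, P, d, q)
1. n ← length [T]
2. m ← length [P]
3. h ← d^(m-1) mod q
4. p ← 0
5. t_0 ← 0
6. for i ← 1 to m
7.     do p ← (d p + P [i]) mod q
8.        t_0 ← (d t_0 + T [i]) mod q
9. for s ← 0 to n - m
10.    do if p = t_s
11.          then if P [1..m] = T [s + 1..s + m]
12.                  then print "Pattern occurs with shift" s
13.       if s < n - m
14.          then t_(s+1) ← (d (t_s - T [s + 1] h) + T [s + m + 1]) mod q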
Example: For string matching, working modulo q = 11, how many spurious hits does the
Rabin-Karp matcher encounter in the text T below when looking for the pattern P?
T = 31415926535.......
P = 26
Solution:
Complexity:
The running time of RABIN-KARP-MATCHER in the worst case is O((n-m+1)m), but it has a
good average-case running time. If the expected number of valid shifts is small, O(1), and the prime q is
chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in time O(n+m) plus
the time required to process spurious hits.
Knuth, Morris and Pratt introduced a linear-time algorithm for the string matching problem. A matching
time of O(n) is achieved by avoiding comparisons with any element of 'S' that has previously been involved
in a comparison with some element of the pattern 'p' to be matched, i.e., backtracking on the string 'S' never
occurs. The algorithm has two components:
1. The Prefix Function (Π): The prefix function Π for a pattern encapsulates knowledge about how the
pattern matches against shifts of itself. This information can be used to avoid useless shifts of the
pattern 'p'. In other words, it makes it possible to avoid backtracking on the string 'S'.
2. The KMP Matcher: With the string 'S', the pattern 'p' and the prefix function 'Π' as inputs, it finds the
occurrences of 'p' in 'S' and returns the number of shifts of 'p' after which each occurrence is found.
Solution:
KMP-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. Π ← COMPUTE-PREFIX-FUNCTION (P)
4. q ← 0 // number of characters matched
5. for i ← 1 to n // scan T from left to right
6.     do while q > 0 and P [q + 1] ≠ T [i]
7.            do q ← Π [q] // next character does not match
8.        if P [q + 1] = T [i]
9.           then q ← q + 1 // next character matches
10.       if q = m // is all of P matched?
11.          then print "Pattern occurs with shift" i - m
12.               q ← Π [q] // look for the next match
Let us execute the KMP algorithm to find whether 'P' occurs in 'T'.
For 'P', the prefix function Π was computed previously and is as follows:
Solution:
Initially: n = size of T = 15
m = size of P = 7
Pattern 'P' has been found to occur in the string 'T'. The total number of shifts that took place
for the match to be found is i - m = 13 - 7 = 6 shifts.
Q3) The Naive String Matching Algorithm
The naïve approach tests all the possible placements of the pattern P [1..m] relative to the text T [1..n].
We try the shifts s = 0, 1, ..., n-m successively, and for each shift s we compare T [s+1..s+m] to P [1..m].
The naïve algorithm finds all valid shifts using a loop that checks the condition P [1..m] = T [s+1..s+m]
for each of the n - m + 1 possible values of s.
NAIVE-STRING-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. for s ← 0 to n - m
4.     do if P [1..m] = T [s + 1..s + m]
5.        then print "Pattern occurs with shift" s
Analysis: The for loop in lines 3 to 5 executes n - m + 1 times (the last shift must leave at least m
characters at the end of the text), and each iteration performs up to m comparisons. So the total
complexity is O((n-m+1)m).
Example:
o Suppose T = 1011101110
o P = 111
o Find all the valid shifts
Solution:
Q4) Naive algorithm for Pattern Searching
Given a text string of length n and a pattern of length m, the task is to print all
occurrences of the pattern in the text.
Note: You may assume that n > m.
Examples:
Slide the pattern over the text one position at a time and check for a match. If a match is found,
slide by 1 again to check for subsequent matches.
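#include <stdio.h>
#include <string.h>
// naive search: print every index at which pat occurs in txt
// (the message format is an assumption; the original output is not shown)
void search(char* pat, char* txt) {
    int m = strlen(pat);
    int n = strlen(txt);
    // try every shift i at which the pattern still fits in the text
    for (int i = 0; i + m <= n; i++) {
        int j = 0;
        // compare the pattern with the text window starting at i
        while (j < m && txt[i + j] == pat[j])
            j++;
        if (j == m)
            printf("Pattern found at index %d\n", i);
    }
}
int main() {
    // Example 1
    char txt1[] = "AABAACAADAABAABA";
    char pat1[] = "AABA";
    printf("Example 1:\n");
    search(pat1, txt1);
    // Example 2
    char txt2[] = "agd";
    char pat2[] = "g";
    printf("\nExample 2:\n");
    search(pat2, txt2);
    return 0;
}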
Output
• Best case: when the pattern is found at the very beginning of the text (or very early on).
• Worst case: when the pattern doesn't appear in the text at all or appears only at the very end.
Finite Automata:
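A finite automaton M is a 5-tuple (Q, q0, A, ∑, δ), where
o Q is a finite set of states,
o q0 ∈ Q is the start state,
o A ⊆ Q is the set of accepting states,
o ∑ is a finite input alphabet,
o δ is a function from Q × ∑ into Q, called the transition function of M.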
The finite automaton starts in state q0 and reads the characters of its input string one at a time. If the
automaton is in state q and reads input character a, it moves from state q to state δ (q, a). Whenever its
current state q is a member of A, the machine M has accepted the string read so far. An input that is not
accepted is rejected.
A finite automaton M induces a function φ, called the final-state function, from ∑* to Q such
that φ(w) is the state M ends up in after scanning the string w. Thus, M accepts a string w if and only if
φ(w) ∈ A. The function φ is defined recursively, where ε is the empty string:
φ(ε) = q0
φ(wa) = δ(φ(w), a) for w ∈ ∑*, a ∈ ∑
FINITE-AUTOMATON-MATCHER (T, δ, m)
1. n ← length [T]
2. q ← 0
3. for i ← 1 to n
4.     do q ← δ (q, T [i])
5.        if q = m
6.           then s ← i - m
7.                print "Pattern occurs with shift" s
The primary loop structure of FINITE-AUTOMATON-MATCHER implies that its running time on a
text string of length n is O(n).
Computing the Transition Function: The following procedure computes the transition function δ from
given pattern P [1......m]
COMPUTE-TRANSITION-FUNCTION (P, ∑)
1. m ← length [P]
2. for q ← 0 to m
3.     do for each character a ∈ ∑
4.            do k ← min (m + 1, q + 2)
5.               repeat k ← k - 1
6.               until P [1..k] is a suffix of P [1..q] a
7.               δ (q, a) ← k
8. return δ
Example: Consider a finite automaton that accepts strings containing an even number of a's, where ∑ = {a, b, c}.
Solution:
The Boyer-Moore (B-M) algorithm takes a 'backward' approach: the pattern string (P) is aligned with the
start of the text string (T), and the characters of the pattern are then compared from right to left,
beginning with the rightmost character.
If a text character is compared that does not occur in the pattern, no match can be found by comparing any
further characters at this position, so the pattern can be shifted entirely past the mismatching character.
For deciding the possible shifts, the B-M algorithm uses two preprocessing strategies simultaneously.
Whenever a mismatch occurs, the algorithm computes a shift using both strategies and selects the
larger one; thus, it makes use of the most effective strategy in each case.
The two strategies are called the heuristics of B-M, as they are used to reduce the search. They are the bad
character heuristic and the good suffix heuristic. The bad character heuristic covers two cases:
o Suppose there is a character in the text that does not occur in the pattern at all. When a
mismatch happens at this character (called the bad character), the whole pattern can be
shifted past it, and matching begins again from the substring next to this 'bad character'.
o On the other hand, the bad character may be present in the pattern. In this case, align
the rightmost occurrence of that character in the pattern with the bad character in the text.
This means that we need some extra information to produce a shift on encountering a bad character. This
information is the last position of every character in the pattern and also the set of characters used in
the pattern (often called the alphabet ∑ of the pattern).
COMPUTE-LAST-OCCURRENCE-FUNCTION (P, m, ∑)
1. for each character a ∈ ∑
2.     do λ [a] ← 0
3. for j ← 1 to m
4.     do λ [P [j]] ← j
5. return λ
Example:
COMPUTE-GOOD-SUFFIX-FUNCTION (P, m)
1. Π ← COMPUTE-PREFIX-FUNCTION (P)
2. P' ← reverse (P)
3. Π' ← COMPUTE-PREFIX-FUNCTION (P')
4. for j ← 0 to m
5.     do ɣ [j] ← m - Π [m]
6. for l ← 1 to m
7.     do j ← m - Π' [l]
8.        if ɣ [j] > l - Π' [l]
9.           then ɣ [j] ← l - Π' [l]
10. return ɣ
BOYER-MOORE-MATCHER (T, P, ∑)
1. n ← length [T]
2. m ← length [P]
3. λ ← COMPUTE-LAST-OCCURRENCE-FUNCTION (P, m, ∑)
4. ɣ ← COMPUTE-GOOD-SUFFIX-FUNCTION (P, m)
5. s ← 0
6. while s ≤ n - m
7.     do j ← m
8.        while j > 0 and P [j] = T [s + j]
9.            do j ← j - 1
10.       if j = 0
11.          then print "Pattern occurs at shift" s
12.               s ← s + ɣ [0]
13.          else s ← s + max (ɣ [j], j - λ [T [s + j]])
Complexity Comparison of String Matching Algorithms:
Algorithm    Preprocessing Time    Matching Time
Naive        0                     O((n - m + 1)m)