Z algorithm (Linear time pattern searching Algorithm)

KMP Algorithm for Pattern Searching

Last Updated : 25 Feb, 2025

Try it on GfG Practice

Given two strings txt and pat, the task is to return all indices of occurrences of pat within txt.

Examples:

Input: txt = “abcab”, pat = “ab”
Output: [0, 3]
Explanation: The string “ab” occurs twice in txt, first occurrence starts from index 0 and second from index 3.

Input: txt= “aabaacaadaabaaba”, pat = “aaba”
Output: [0, 9, 12]
Explanation:

Naive Pattern Searching Algorithm

We start at every index in the text and compare it with the first character of the pattern, if they match we move to the next character in both text and pattern.
If there is a mismatch, we start the same process for the next index of the text.

Please refer Naive algorithm for pattern searching for implementation.

KMP Pattern Searching Algorithm

The Naive Algorithm can work in linear time if we know for sure that all characters are distinct. Please refer Naive Pattern Searching for Distinct Characters in Pattern. The Naive algorithm can not be made better than linear when we have repeating characters.

Examples:

1) txt[] = “AAAAAAAAAAAAAAAAAB”, pat[] = “AAAAB”
2) txt[] = “ABABABCABABABCABABABC”, pat[] = “ABABAC” (not a worst case, but a bad case for Naive)

The KMP matching algorithm uses degenerating property (pattern having the same sub-patterns appearing more than once in the pattern) of the pattern and improves the worst-case complexity to O(n+m).

The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text of the next window. We take advantage of this information to avoid matching the characters that we know will anyway match.

Matching Overview

txt = “AAAAABAAABA”
pat = “AAAA”
We compare first window of txt with pat

txt = “AAAAABAAABA”
pat = “AAAA” [Initial position]
We find a match. This is same as Naive String Matching.

In the next step, we compare next window of txt with pat.

txt = “AAAAABAAABA”
pat = “AAAA” [Pattern shifted one position]

This is where KMP does optimization over Naive. In this second window, we only compare fourth A of pattern
with fourth character of current window of text to decide whether current window matches or not. Since we know
first three characters will anyway match, we skipped matching first three characters.

Need of Preprocessing?

An important question arises from the above explanation, how to know how many characters to be skipped. To know this,
we pre-process pattern and prepare an integer array lps[] that tells us the count of characters to be skipped

In KMP Algorithm,

We preprocess the pattern and build LPS array for it. The size of this array is same as pattern length.
LPS is the Longest Proper Prefix which is also a Suffix. A proper prefix is a prefix that doesn’t include whole string. For example, prefixes of “abc” are “”, “a”, “ab” and “abc” but proper prefixes are “”, “a” and “ab” only. Suffixes of the string are “”, “c”, “bc”, and “abc”.
Each value, lps[i] is the length of longest proper prefix of pat[0..i] which is also a suffix of pat[0..i].

Preprocessing Overview:

We search for lps in subpatterns. More clearly we focus on sub-strings of patterns that are both prefix and suffix.
For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores the length of the maximum matching proper prefix which is also a suffix of the sub-pattern pat[0..i].

lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i].

Note: lps[i] could also be defined as the longest prefix which is also a proper suffix. We need to use it properly in one place to make sure that the whole substring is not considered.

Examples of lps[] construction:

For the pattern “AAAA”, lps[] is [0, 1, 2, 3]

For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0]

For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]

For the pattern “AAACAAAAAC”, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]

For the pattern “AAABAAA”, lps[] is [0, 1, 2, 0, 1, 2, 3]

Algorithm for Construction of LPS Array

lps[0] is always 0 since a string of length one has no non-empty proper prefix. We store the value of the previous LPS in a variable len, initialized to 0. As we traverse the pattern, we compare the current character at index i, with the character at index len.

Case 1 – pat[i] = pat[len]: this means that we can simply extend the LPS at the previous index, so increment len by 1 and store its value at lps[i].

Case 2 – pat[i] != pat[len] and len = 0: it means that there were no matching characters earlier and the current characters are also not matching, so lps[i] = 0.

Case 3 – pat[i] != pat[len] and len > 0: it means that we can’t extend the LPS at index i-1. However, there may be a smaller prefix that matches the suffix ending at i. To find this, we look for a smaller suffix of pat[i-len…i-1] that is also a proper prefix of pat. We then attempt to match pat[i] with the next character of this prefix. If there is a match, pat[i] = length of that matching prefix. Since lps[i-1] equals len, we know that pat[0…len-1] is the same as pat[i-len…i-1]. Thus, rather than searching through pat[i-len…i-1], we can use lps[len – 1] to update len, since that part of the pattern has already been matched.

Refer the below illustration for better explanation of all the cases:

Example of Construction of LPS Array

Implementation of KMP Algorithm

We initialize two pointers, one for the text string and another for the pattern. When the characters at both pointers match, we increment both pointers and continue the comparison. If they do not match, we reset the pattern pointer to the last value from the LPS array, because that portion of the pattern has already been matched with the text string. Similarly, if we have traversed the entire pattern string, we add the starting index of occurrence of pattern in text, to the result and continue the search from the lps value of last element of the pattern.

Let’s say we are at position i in the text string and position j in the pattern string when a mismatch occurs:

At this point, we know that pat[0..j-1] has already matched with txt[i-j..i-1].
The value of lps[j-1] represents the length of the longest proper prefix of the substring pat[0..j-1] that is also a suffix of the same substring.
From these two observations, we can conclude that there’s no need to recheck the characters in pat[0..lps[j-1]]. Instead, we can directly resume our search from lps[j-1].

C++

// C++ program to search the pattern in given text using
// KMP Algorithm

#include <iostream>
#include <string>
#include <vector>
using namespace std;

void constructLps(string &pat, vector<int> &lps) {

    // len stores the length of longest prefix which
    // is also a suffix for the previous index
    int len = 0;

    // lps[0] is always 0
    lps[0] = 0;

    int i = 1;
    while (i < pat.length()) {

        // If characters match, increment the size of lps
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        }

        // If there is a mismatch
        else {
            if (len != 0) {

                // Update len to the previous lps value
                // to avoid reduntant comparisons
                len = lps[len - 1];
            }
            else {

                // If no matching prefix found, set lps[i] to 0
                lps[i] = 0;
                i++;
            }
        }
    }
}

vector<int> search(string &pat, string &txt) {
    int n = txt.length();
    int m = pat.length();

    vector<int> lps(m);
    vector<int> res;

    constructLps(pat, lps);

    // Pointers i and j, for traversing
    // the text and pattern
    int i = 0;
    int j = 0;

    while (i < n) {

        // If characters match, move both pointers forward
        if (txt[i] == pat[j]) {
            i++;
            j++;

            // If the entire pattern is matched
            // store the start index in result
            if (j == m) {
                res.push_back(i - j);

                // Use LPS of previous index to
                // skip unnecessary comparisons
                j = lps[j - 1];
            }
        }

        // If there is a mismatch
        else {

            // Use lps value of previous index
            // to avoid redundant comparisons
            if (j != 0)
                j = lps[j - 1];
            else
                i++;
        }
    }
    return res;
}

int main() {
    string txt = "aabaacaadaabaaba";
    string pat = "aaba";

    vector<int> res = search(pat, txt);
    for (int i = 0; i < res.size(); i++)
        cout << res[i] << " ";

    return 0;
}

Java

// Java program to search the pattern in given text using
// KMP Algorithm

import java.util.ArrayList;

class GfG {
    
    static void constructLps(String pat, int[] lps) {
        
        // len stores the length of longest prefix which 
        // is also a suffix for the previous index
        int len = 0;

        // lps[0] is always 0
        lps[0] = 0;

        int i = 1;
        while (i < pat.length()) {
            
            // If characters match, increment the size of lps
            if (pat.charAt(i) == pat.charAt(len)) {
                len++;
                lps[i] = len;
                i++;
            }
            
            // If there is a mismatch
            else {
                if (len != 0) {
                    
                    // Update len to the previous lps value 
                    // to avoid redundant comparisons
                    len = lps[len - 1];
                } 
                else {
                    
                    // If no matching prefix found, set lps[i] to 0
                    lps[i] = 0;
                    i++;
                }
            }
        }
    }

    static ArrayList<Integer> search(String pat, String txt) {
        int n = txt.length();
        int m = pat.length();

        int[] lps = new int[m];
        ArrayList<Integer> res = new ArrayList<>();

        constructLps(pat, lps);

        // Pointers i and j, for traversing 
        // the text and pattern
        int i = 0;
        int j = 0;

        while (i < n) {
            // If characters match, move both pointers forward
            if (txt.charAt(i) == pat.charAt(j)) {
                i++;
                j++;

                // If the entire pattern is matched 
                // store the start index in result
                if (j == m) {
                    res.add(i - j);
                    
                    // Use LPS of previous index to 
                    // skip unnecessary comparisons
                    j = lps[j - 1];
                }
            }
            
            // If there is a mismatch
            else {
                
                // Use lps value of previous index
                // to avoid redundant comparisons
                if (j != 0)
                    j = lps[j - 1];
                else
                    i++;
            }
        }
        return res; 
    }

    public static void main(String[] args) {
        String txt = "aabaacaadaabaaba"; 
        String pat = "aaba"; 

        ArrayList<Integer> res = search(pat, txt);
        for (int i = 0; i < res.size(); i++) 
            System.out.print(res.get(i) + " ");
    }
}

Python

# Python program to search the pattern in given text 
# using KMP Algorithm

def constructLps(pat, lps):
    
    # len stores the length of longest prefix which 
    # is also a suffix for the previous index
    len_ = 0
    m = len(pat)
    
    # lps[0] is always 0
    lps[0] = 0

    i = 1
    while i < m:
        
        # If characters match, increment the size of lps
        if pat[i] == pat[len_]:
            len_ += 1
            lps[i] = len_
            i += 1
        
        # If there is a mismatch
        else:
            if len_ != 0:
                
                # Update len to the previous lps value 
                # to avoid redundant comparisons
                len_ = lps[len_ - 1]
            else:
                
                # If no matching prefix found, set lps[i] to 0
                lps[i] = 0
                i += 1

def search(pat, txt):
    n = len(txt)
    m = len(pat)

    lps = [0] * m
    res = []

    constructLps(pat, lps)

    # Pointers i and j, for traversing 
    # the text and pattern
    i = 0
    j = 0

    while i < n:
        
        # If characters match, move both pointers forward
        if txt[i] == pat[j]:
            i += 1
            j += 1

            # If the entire pattern is matched 
            # store the start index in result
            if j == m:
                res.append(i - j)
                
                # Use LPS of previous index to 
                # skip unnecessary comparisons
                j = lps[j - 1]
        
        # If there is a mismatch
        else:
            
            # Use lps value of previous index
            # to avoid redundant comparisons
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
    return res 

if __name__ == "__main__":
    txt = "aabaacaadaabaaba"
    pat = "aaba"

    res = search(pat, txt)
    for i in range(len(res)):
        print(res[i], end=" ")

C#

// C# program to search the pattern in given text using
// KMP Algorithm

using System;
using System.Collections.Generic;

class GfG {
    static void ConstructLps(string pat, int[] lps) {
        // len stores the length of longest prefix which 
        // is also a suffix for the previous index
        int len = 0;

        // lps[0] is always 0
        lps[0] = 0;

        int i = 1;
        while (i < pat.Length) {
          
            // If characters match, increment the size of lps
            if (pat[i] == pat[len]) {
                len++;
                lps[i] = len;
                i++;
            }
          
            // If there is a mismatch
            else {
                if (len != 0) {
                  
                    // Update len to the previous lps value 
                    // to avoid redundant comparisons
                    len = lps[len - 1];
                }
                else {
                  
                    // If no matching prefix found, set lps[i] to 0
                    lps[i] = 0;
                    i++;
                }
            }
        }
    }

    static List<int> search(string pat, string txt) {
        int n = txt.Length;
        int m = pat.Length;

        int[] lps = new int[m];
        List<int> res = new List<int>();

        ConstructLps(pat, lps);

        // Pointers i and j, for traversing 
        // the text and pattern
        int i = 0;
        int j = 0;

        while (i < n) {
          
            // If characters match, move both pointers forward
            if (txt[i] == pat[j]) {
                i++;
                j++;

                // If the entire pattern is matched 
                // store the start index in result
                if (j == m) {
                    res.Add(i - j);
                    
                    // Use LPS of previous index to 
                    // skip unnecessary comparisons
                    j = lps[j - 1];
                }
            }
            
            // If there is a mismatch
            else {
            
                // Use lps value of previous index
                // to avoid redundant comparisons
                if (j != 0)
                    j = lps[j - 1];
                else
                    i++;
            }
        }
        return res;
    }

    static void Main(string[] args) {
        string txt = "aabaacaadaabaaba";
        string pat = "aaba";

        List<int> res = search(pat, txt);
        for (int i = 0; i < res.Count; i++)
            Console.Write(res[i] + " ");
    }
}

JavaScript

// JavaScript program to search the pattern in given text 
// using KMP Algorithm

function constructLps(pat, lps) {
    
    // len stores the length of longest prefix which 
    // is also a suffix for the previous index
    let len = 0;

    // lps[0] is always 0
    lps[0] = 0;

    let i = 1;
    while (i < pat.length) {
        
        // If characters match, increment the size of lps
        if (pat[i] === pat[len]) {
            len++;
            lps[i] = len;
            i++;
        }
        
        // If there is a mismatch
        else {
            if (len !== 0) {
                
                // Update len to the previous lps value 
                // to avoid redundant comparisons
                len = lps[len - 1];
            } else {
                
                // If no matching prefix found, set lps[i] to 0
                lps[i] = 0;
                i++;
            }
        }
    }
}

function search(pat, txt) {
    const n = txt.length;
    const m = pat.length;

    const lps = new Array(m);
    const res = [];

    constructLps(pat, lps);

    // Pointers i and j, for traversing 
    // the text and pattern
    let i = 0;
    let j = 0;

    while (i < n) {
        
        // If characters match, move both pointers forward
        if (txt[i] === pat[j]) {
            i++;
            j++;

            // If the entire pattern is matched 
            // store the start index in result
            if (j === m) {
                res.push(i - j);
                
                // Use LPS of previous index to 
                // skip unnecessary comparisons
                j = lps[j - 1];
            }
        }
        
        // If there is a mismatch
        else {
            
            // Use lps value of previous index
            // to avoid redundant comparisons
            if (j !== 0)
                j = lps[j - 1];
            else
                i++;
        }
    }
    return res; 
}

const txt = "aabaacaadaabaaba";
const pat = "aaba";

const res = search(pat, txt);
console.log(res.join(" "));

Output

0 9 12

Time Complexity: O(n + m), where n is the length of the text and m is the length of the pattern. This is because creating the LPS (Longest Prefix Suffix) array takes O(m) time, and the search through the text takes O(n) time.

Auxiliary Space: O(m), as we need to store the LPS array of size m.

Z algorithm (Linear time pattern searching Algorithm)

kartik

News

Improve

Article Tags :

Practice Tags :

Similar Reads

What is Pattern Searching ?

Pattern searching in Data Structures and Algorithms (DSA) is a fundamental concept that involves searching for a specific pattern or sequence of elements within a given data structure. This technique is commonly used in string matching algorithms to find occurrences of a particular pattern within a

Introduction to Pattern Searching - Data Structure and Algorithm Tutorial

Pattern searching is an algorithm that involves searching for patterns such as strings, words, images, etc. We use certain algorithms to do the search process. The complexity of pattern searching varies from algorithm to algorithm. They are very useful when performing a search in a database. The Pat

Naive algorithm for Pattern Searching

Given text string with length n and a pattern with length m, the task is to prints all occurrences of pattern in text. Note: You may assume that n > m. Examples:Â Input: Â text = "THIS IS A TEST TEXT", pattern = "TEST"Output: Pattern found at index 10 Input: Â text = Â "AABAACAADAABAABA", pattern =

Rabin-Karp Algorithm for Pattern Searching

Given a text T[0. . .n-1] and a pattern P[0. . .m-1], write a function search(char P[], char T[]) that prints all occurrences of P[] present in T[] using Rabin Karp algorithm. You may assume that n > m. Examples: Input: T[] = "THIS IS A TEST TEXT", P[] = "TEST"Output: Pattern found at index 10 In

KMP Algorithm for Pattern Searching

Given two strings txt and pat, the task is to return all indices of occurrences of pat within txt. Examples: Input: txt = "abcab", pat = "ab"Output: [0, 3]Explanation: The string "ab" occurs twice in txt, first occurrence starts from index 0 and second from index 3. Input: txt= "aabaacaadaabaaba", p

Z algorithm (Linear time pattern searching Algorithm)

This algorithm efficiently locates all instances of a specific pattern within a text in linear time. If the length of the text is "n" and the length of the pattern is "m," then the total time taken is O(m + n), with a linear auxiliary space. It is worth noting that the time and auxiliary space of th

Finite Automata algorithm for Pattern Searching

Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m.Examples: Input: txt[] = "THIS IS A TEST TEXT" pat[] = "TEST" Output: Pattern found at index 10 Input: txt[] = "AABAACAADAAB

Boyer Moore Algorithm for Pattern Searching

Pattern searching is an important problem in computer science. When we do search for a string in a notepad/word file, browser, or database, pattern searching algorithms are used to show the search results. A typical problem statement would be-Â " Given a text txt[0..n-1] and a pattern pat[0..m-1] wh

Aho-Corasick Algorithm for Pattern Searching

Given an input text and an array of k words, arr[], find all occurrences of all words in the input text. Let n be the length of text and m be the total number of characters in all words, i.e. m = length(arr[0]) + length(arr[1]) + ... + length(arr[k-1]). Here k is total numbers of input words. Exampl

ÂÂkasaiâ€™s Algorithm for Construction of LCP array from Suffix Array

Background Suffix Array : A suffix array is a sorted array of all suffixes of a given string. Let the given string be "banana". 0 banana 5 a1 anana Sort the Suffixes 3 ana2 nana ----------------> 1 anana 3 ana alphabetically 0 banana 4 na 4 na 5 a 2 nanaThe suffix array for "banana" :suffix[] = {