
Minimum number of deletions and insertions to transform one string into another

Last Updated : 16 Nov, 2024

Given two strings s1 and s2, the task is to transform s1 into s2 using the minimum total number of character deletions and insertions. The same character may need to be deleted from one position in s1 and inserted at another.

Examples:

Input: s1 = “heap”, s2 = “pea”
Output: 3
Explanation: Minimum deletions = 2 and minimum insertions = 1.
The characters p and h are deleted from “heap”, and then p is inserted at the beginning. Note that although p is needed in the result, it is first deleted from its original position and then inserted at a new one. Thus, p contributes one to the deletion count and one to the insertion count.

Input: s1 = “geeksforgeeks”, s2 = “geeks”
Output: 8
Explanation: 8 deletions, i.e. remove all characters of the string “forgeeks”.

Using Recursion – O(2^(m+n)) Time and O(m+n) Space

A brute-force approach would generate every subsequence of s1 and, for each one, count the deletions and insertions needed to transform it into s2. A more efficient approach uses the longest common subsequence (LCS) of the two strings: the characters of the LCS stay in place, and every other character is either deleted from s1 or inserted from s2. Once we have the LCS length, the minimum numbers of insertions and deletions needed to convert s1 into s2 follow directly.

  • To minimize deletions, we only need to remove characters from s1 that are not part of the longest common subsequence (LCS) with s2. This can be determined by subtracting the LCS length from the length of s1. Thus, the minimum number of deletions is:
    minDeletions = length of s1 – LCS length.
  • Similarly, to minimize insertions, we only need to insert characters from s2 into s1 that are not part of the LCS. This can be determined by subtracting the LCS length from the length of s2. Thus, the minimum number of insertions is:
    minInsertions = length of s2 – LCS length.
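
The two formulas above can be checked on the first example with a minimal sketch. The helper name countEdits is hypothetical (it does not appear in the article), and the LCS length is computed here with a plain table only to keep the snippet short and self-contained.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Hypothetical helper: returns {minDeletions, minInsertions}
// for transforming s1 into s2.
pair<int, int> countEdits(const string &s1, const string &s2) {
    int m = s1.size();
    int n = s2.size();

    // dp[i][j] = LCS length of the first i characters of s1
    // and the first j characters of s2
    vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));
    for (int i = 1; i <= m; ++i) {
        for (int j = 1; j <= n; ++j) {
            if (s1[i - 1] == s2[j - 1])
                dp[i][j] = dp[i - 1][j - 1] + 1;
            else
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
        }
    }

    int lcsLen = dp[m][n];

    // minDeletions = length of s1 - LCS length
    // minInsertions = length of s2 - LCS length
    return {m - lcsLen, n - lcsLen};
}
```

For s1 = “heap” and s2 = “pea”, the LCS is “ea” (length 2), so countEdits returns 2 deletions and 1 insertion, matching the explanation in the first example.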
C++
// C++ program to find the minimum number of insertions and deletions
// using recursion.

#include <algorithm>
#include <iostream>
#include <string>
using namespace std;

int lcs(string &s1, string &s2, int m, int n) {
  
    // Base case: If either string is empty,
    // the LCS length is 0
    if (m == 0 || n == 0)
        return 0;

    // If the last characters of both substrings match
    if (s1[m - 1] == s2[n - 1])
        // Include the matching character in LCS and
        // recurse for remaining substrings
        return 1 + lcs(s1, s2, m - 1, n - 1);

    else
        // If the last characters do not match,
        // find the maximum LCS length by:
        // 1. Excluding the last character of s1
        // 2. Excluding the last character of s2
        return max(lcs(s1, s2, m, n - 1), lcs(s1, s2, m - 1, n));
}

int minOperations(string s1, string s2) {
    int m = s1.size();
    int n = s2.size();

    // The length of the LCS for s1[0..m-1]
    // and s2[0..n-1]
    int len = lcs(s1, s2, m, n);

    // Characters to delete from s1
    int minDeletions = m - len;

    // Characters to insert into s1
    int minInsertions = n - len;

    // Total operations needed
    int total = minDeletions + minInsertions;
    return total;
}

int main() {
    string s1 = "AGGTAB";
    string s2 = "GXTXAYB";
    int res = minOperations(s1, s2);
    cout << res;
    return 0;
}

Output
5

Using Top-Down DP (Memoization) – O(m*n) Time and O(m*n) Space

In this approach, we apply memoization to store the results of overlapping subproblems while computing the length of the Longest Common Subsequence (LCS). A 2D array memo stores the LCS length for each pair of prefixes of the two input strings, ensuring that each subproblem is solved only once.
This method is similar to the Longest Common Subsequence (LCS) problem using memoization.

C++
// C++ program to find the minimum number of insertions and deletions
// using memoization.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int lcs(string &s1, string &s2, int m, int n, 
        vector<vector<int>> &memo) {
  
    // Base case: If either string is empty, the LCS length is 0
    if (m == 0 || n == 0)
        return 0;

    // If the value is already computed, return
    // it from the memo array
    if (memo[m][n] != -1)
        return memo[m][n];
   
    // If the last characters of both substrings match
    if (s1[m - 1] == s2[n - 1])
      
        // Include the matching character in LCS and recurse for
        // remaining substrings
        return memo[m][n] = 1 + lcs(s1, s2, m - 1, n - 1, memo);

    else
      
        // If the last characters do not match, find the maximum LCS length by:
        // 1. Excluding the last character of s1
        // 2. Excluding the last character of s2
        return memo[m][n] = max(lcs(s1, s2, m, n - 1, memo),
                                lcs(s1, s2, m - 1, n, memo));
}

int minOperations(string s1, string s2) {
  
    int m = s1.size();
    int n = s2.size();

    // Initialize the memoization array with -1.
    vector<vector<int>> memo(m + 1, vector<int>(n + 1, -1));

    // The length of the LCS for
    // s1[0..m-1] and s2[0..n-1]
    int len = lcs(s1, s2, m, n, memo);

    // Characters to delete from s1
    int minDeletions = m - len;

    // Characters to insert into s1
    int minInsertions = n - len;

    // Total operations needed
    int total = minDeletions + minInsertions;
    return total;
}

int main() {
  
    string s1 = "AGGTAB";
    string s2 = "GXTXAYB";
    int res = minOperations(s1, s2);
    cout << res;
    return 0;
}

Output
5
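
To see why storing subproblem results matters, the following sketch counts how many times each version of the recursion is invoked. The function names lcsPlain and lcsMemo and the calls counter are assumptions added for illustration; they are not part of the article's code.

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Plain recursion; `calls` is incremented on every invocation.
int lcsPlain(const string &s1, const string &s2, int m, int n,
             long long &calls) {
    ++calls;
    if (m == 0 || n == 0)
        return 0;
    if (s1[m - 1] == s2[n - 1])
        return 1 + lcsPlain(s1, s2, m - 1, n - 1, calls);
    return max(lcsPlain(s1, s2, m, n - 1, calls),
               lcsPlain(s1, s2, m - 1, n, calls));
}

// Memoized recursion; memo must be (m+1) x (n+1), filled with -1.
int lcsMemo(const string &s1, const string &s2, int m, int n,
            vector<vector<int>> &memo, long long &calls) {
    ++calls;
    if (m == 0 || n == 0)
        return 0;

    // Reuse a stored result instead of recursing again
    if (memo[m][n] != -1)
        return memo[m][n];

    if (s1[m - 1] == s2[n - 1])
        return memo[m][n] = 1 + lcsMemo(s1, s2, m - 1, n - 1, memo, calls);
    return memo[m][n] = max(lcsMemo(s1, s2, m, n - 1, memo, calls),
                            lcsMemo(s1, s2, m - 1, n, memo, calls));
}
```

Each (m, n) state is computed at most once in lcsMemo and makes at most two further calls, so the total number of invocations is at most 1 + 2*m*n. The plain version has no such bound, and its call count grows exponentially when the strings share few characters.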

Using Bottom-Up DP (Tabulation) – O(m*n) Time and O(m*n) Space

The approach is similar to the previous one, but instead of breaking the problem down recursively, we build up the solution iteratively in a bottom-up manner. We maintain a 2D table dp[][] in which dp[i][j] stores the LCS length of the first i characters of s1 and the first j characters of s2.
This approach is similar to finding the LCS in a bottom-up manner.

C++
// C++ program to find the minimum number of insertions and deletions
// using tabulation.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
 
int lcs(string &s1, string &s2) {
  
    int m = s1.size();
    int n = s2.size();

    // Initializing a matrix of size (m+1)*(n+1)
    vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));

    // Building dp[m+1][n+1] in bottom-up fashion
    for (int i = 1; i <= m; ++i) {
        for (int j = 1; j <= n; ++j) {
            if (s1[i - 1] == s2[j - 1])
                dp[i][j] = dp[i - 1][j - 1] + 1;
            else
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
        }
    }

    // dp[m][n] contains length of LCS for s1[0..m-1]
    // and s2[0..n-1]
    return dp[m][n];
}

int minOperations(string s1, string s2) {
  
    int m = s1.size();
    int n = s2.size();

    // The length of the LCS for
    // s1[0..m-1] and s2[0..n-1]
    int len = lcs(s1, s2);

    // Characters to delete from s1
    int minDeletions = m - len;

    // Characters to insert into s1
    int minInsertions = n - len;

    // Total operations needed
    int total = minDeletions + minInsertions;
    return total;
}

int main() {
  
    string s1 = "AGGTAB";
    string s2 = "GXTXAYB";
    int res = minOperations(s1, s2);
    cout << res;

    return 0;
}

Output
5
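
The filled dp table holds more than the final count: walking back from dp[m][n] shows which characters are deleted and which are inserted. The sketch below is one possible way to do this, not the article's method; the helper name editScript and the "-c" / "+c" notation for deleting or inserting character c are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: returns one minimal list of operations,
// "-c" meaning delete c from s1 and "+c" meaning insert c from s2.
vector<string> editScript(const string &s1, const string &s2) {
    int m = s1.size();
    int n = s2.size();

    // Same table as in the tabulation approach
    vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));
    for (int i = 1; i <= m; ++i) {
        for (int j = 1; j <= n; ++j) {
            if (s1[i - 1] == s2[j - 1])
                dp[i][j] = dp[i - 1][j - 1] + 1;
            else
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
        }
    }

    // Walk back from dp[m][n] to dp[0][0]
    vector<string> ops;
    int i = m, j = n;
    while (i > 0 && j > 0) {
        if (s1[i - 1] == s2[j - 1]) {
            // Character belongs to the LCS: keep it
            --i;
            --j;
        } else if (dp[i - 1][j] >= dp[i][j - 1]) {
            ops.push_back(string("-") + s1[i - 1]);
            --i;
        } else {
            ops.push_back(string("+") + s2[j - 1]);
            --j;
        }
    }
    // Leftover characters of s1 are deleted, of s2 are inserted
    while (i > 0) {
        ops.push_back(string("-") + s1[i - 1]);
        --i;
    }
    while (j > 0) {
        ops.push_back(string("+") + s2[j - 1]);
        --j;
    }

    // Operations were collected back to front
    reverse(ops.begin(), ops.end());
    return ops;
}
```

The number of operations returned always equals minDeletions + minInsertions. When several longest common subsequences exist, the particular script depends on how ties between dp[i - 1][j] and dp[i][j - 1] are broken.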

Using Bottom-Up DP (Space Optimization) – O(m*n) Time and O(n) Space

In the previous approach, the longest common subsequence (LCS) algorithm uses O(m * n) space to store the entire dp table, where m and n are the lengths of s1 and s2. However, since dp[i][j] depends only on values in the current row and the previous row, we don’t need to store the entire table. Keeping only the current and previous rows reduces the space to O(n). For more details, refer to A Space Optimized Solution of LCS.

C++
// C++ program to find the minimum number of insertions and deletions
// using space-optimized tabulation.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int lcs(string &s1, string &s2) {
  
    int m = s1.length(), n = s2.length();

    vector<vector<int>> dp(2, vector<int>(n + 1));

    for (int i = 0; i <= m; i++) {

        // Index of the current row in the two-row table:
        // 0 if i is even, 1 if i is odd
        int curr = i & 1;

        for (int j = 0; j <= n; j++) {
          
            // Initialize first row and first column with 0
            if (i == 0 || j == 0)
                dp[curr][j] = 0;

            else if (s1[i - 1] == s2[j - 1])
                dp[curr][j] = dp[1 - curr][j - 1] + 1;

            else
                dp[curr][j] = max(dp[1 - curr][j], dp[curr][j - 1]);
        }
    }

    return dp[m & 1][n];
}

int minOperations(string s1, string s2) {
    int m = s1.size();
    int n = s2.size();

    // the length of the LCS for s1[0..m-1] and s2[0..n-1]
    int len = lcs(s1, s2);

    // Characters to delete from s1
    int minDeletions = m - len;

    // Characters to insert into s1
    int minInsertions = n - len;

    // Total operations needed
    int total = minDeletions + minInsertions;
    return total;
}

int main() {
    string s1 = "AGGTAB";
    string s2 = "GXTXAYB";
    int res = minOperations(s1, s2);
    cout << res;
    return 0;
}

Output
5
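
The same two-row idea can also be written with two named vectors that are swapped after each row, which some readers find easier to follow than the parity index. This is a sketch under stated assumptions: the function name lcsTwoRows is hypothetical, and the initial swap of arguments (so that the rows are sized by the shorter string) is an addition not present in the article's code. It relies on the fact that the LCS length does not change when the two strings are exchanged.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Hypothetical variant: LCS length using two explicit rows.
int lcsTwoRows(const string &s1, const string &s2) {

    // Added assumption: make s2 the shorter string so the
    // rows use O(min(m, n)) space
    if (s1.size() < s2.size())
        return lcsTwoRows(s2, s1);

    int m = s1.size();
    int n = s2.size();

    // prev holds row i-1 of the table, curr holds row i
    vector<int> prev(n + 1, 0), curr(n + 1, 0);

    for (int i = 1; i <= m; ++i) {
        curr[0] = 0;
        for (int j = 1; j <= n; ++j) {
            if (s1[i - 1] == s2[j - 1])
                curr[j] = prev[j - 1] + 1;
            else
                curr[j] = max(prev[j], curr[j - 1]);
        }
        // The row just computed becomes the previous row
        swap(prev, curr);
    }

    // After the final swap, prev holds the last computed row
    return prev[n];
}
```

The result can be plugged into minOperations in place of lcs; for s1 = "AGGTAB" and s2 = "GXTXAYB" it returns 4, giving the same total of 5.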

