Count number of Distinct Substring in a String

Last Updated : 06 Mar, 2025

Given a string, count all distinct substrings of the given string.

Examples: 

Input : abcd
Output : 10
All substrings: abcd abc ab a bcd bc b cd c d
All 10 substrings are distinct.

Input : aaa
Output : 3
All substrings: aaa aa a aa a a
Only 3 of them are distinct: a, aa, aaa.

Prerequisite : Print subarrays of a given array

The idea is to store every generated substring in a set (std::set in the C++ code below, HashSet in Java), which discards duplicates. The answer is the size of the set.

Implementation:

C++
// C++ program to count all distinct substrings in a string
#include<bits/stdc++.h>
using namespace std;

int distinctSubstring(const string& str)
{
    // Ordered set that keeps one copy of each distinct substring
    set<string> result;

    // Enumerate every start index i and every length j
    for (size_t i = 0; i < str.length(); i++)
    {
        for (size_t j = 1; j <= str.length() - i; j++)
        {
            // Insert the substring; duplicates are ignored by the set
            result.insert(str.substr(i, j));
        }
    }

    // The set size is the number of distinct substrings
    return result.size();
}

// Driver Code
int main()
{
    string str = "aaaa";
    cout << (distinctSubstring(str));
}

// This code is contributed by Rajput-Ji

Output
4

Complexity Analysis:

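  • Time Complexity: O(n^3 log n). There are O(n^2) substrings, substr() takes O(n) for each one, and each insertion into the ordered set makes O(log n) string comparisons of up to O(n) characters.
  • Auxiliary Space: O(n^3) characters in the worst case, since the set can hold O(n^2) distinct substrings of average length O(n).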

How to print the distinct substrings?

C++
// C++ program to print all distinct
// substrings of a string
#include <bits/stdc++.h>
using namespace std;

set<string> distinctSubstring(const string& str)
{

    // Ordered set that keeps one copy
    // of each distinct substring
    set<string> result;

    // i is the start index, j is one past the end index
    for(size_t i = 0; i < str.length(); i++)
    {
        for(size_t j = i + 1; j <= str.length(); j++)
        {

            // substr() takes (start, length), so the length is j - i
            result.insert(str.substr(i, j - i));
        }
    }

    // Return the set
    return result;
}

// Driver code
int main()
{
    string str = "aaaa";
    set<string> subs = distinctSubstring(str);

    cout << "Distinct Substrings are: \n";
    for(auto i : subs)
        cout << i << endl;
}

// This code is contributed by Ronak Mangal

Output
Distinct Substrings are: 
a
aa
aaa
aaaa

Complexity Analysis:

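  • Time Complexity: O(n^3 log n)
  • Auxiliary Space: O(n^3) characters in the worst case, since the returned set can hold O(n^2) distinct substrings of average length O(n).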

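Optimization: We can further optimize the above code. The substr() function takes time linear in the substring length, and it is called from scratch for every (start, length) pair. Instead, we can append the current character to the previous substring to get the current substring, and use a hash-based set (unordered_set) to avoid the O(log n) comparisons of an ordered set.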

Implementation:

C++
// C++ implementation of the approach
#include <bits/stdc++.h>
using namespace std;

// Function to print all distinct
// substrings of s
void printSubstrings(const string& s)
{

    // To store distinct output substrings
    unordered_set<string> us;

    // Traverse through the given string and
    // one by one generate substrings beginning
    // from s[i].
    for (size_t i = 0; i < s.size(); ++i) {

        // One by one generate substrings ending
        // with s[j]
        string ss = "";
        for (size_t j = i; j < s.size(); ++j) {

            // Append in place instead of building a new string
            ss += s[j];
            us.insert(ss);
        }
    }

    // Print all substrings one by one
    for (const auto& sub : us)
        cout << sub << " ";
}

// Driver code
int main()
{
    string str = "aaabc";
    printSubstrings(str);
    return 0;
}

Output (the order may differ, since unordered_set does not keep its elements in any particular order)
bc b abc ab aabc aa aaa c a aaab aab aaabc 

Complexity Analysis:

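  • Time Complexity: O(n^2) set insertions on average. Each insertion still hashes and copies a string of up to n characters, so the worst case is O(n^3) character operations.
  • Auxiliary Space: O(n^3) characters in the worst case, since the set can hold O(n^2) distinct substrings of average length O(n).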
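
When only the count is needed, one possible variation is to store views into the original string instead of copies of each substring. The following is a minimal sketch, assuming a C++17 compiler; the function name countDistinctViews is hypothetical and not part of the original code. Hashing each view still takes time proportional to its length, so this mainly reduces the memory used per stored substring.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Sketch (C++17): counts distinct substrings while storing only views
// into s. The views are valid only while s is alive and unmodified,
// so the set must not outlive this function.
std::size_t countDistinctViews(const std::string& s)
{
    std::string_view whole(s);
    std::unordered_set<std::string_view> seen;

    for (std::size_t i = 0; i < whole.size(); ++i) {
        for (std::size_t len = 1; i + len <= whole.size(); ++len) {
            // substr() on a string_view is O(1); hashing is still O(len)
            seen.insert(whole.substr(i, len));
        }
    }
    return seen.size();
}
```

For the string "aaabc" used above, this sketch returns 12.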

Space Optimization using Trie Data Structure (when we just need count of distinct substrings)

The approaches above store every distinct substring, which may exceed the memory limit (MLE) for very large strings. There can be n(n+1)/2 substrings, which is O(n^2), and their lengths range from 1 to n, i.e. O(n) on average. The total space is therefore O(n^3) characters in the worst case.
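
The cubic growth can be checked on small inputs. The sketch below uses two hypothetical helpers (totalStoredChars and worstCaseChars, neither from the original code): the first sums the lengths of all distinct substrings, and the second evaluates the closed form for a string whose characters are all different, where the sum of k(n-k+1) over k = 1..n equals n(n+1)(n+2)/6.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical helper: sums the lengths of all distinct substrings of s,
// i.e. the number of characters a set of std::string keys has to hold.
std::uint64_t totalStoredChars(const std::string& s)
{
    std::unordered_set<std::string> seen;
    std::uint64_t total = 0;

    for (std::size_t i = 0; i < s.size(); ++i) {
        std::string cur;
        for (std::size_t j = i; j < s.size(); ++j) {
            cur += s[j];
            // insert() reports whether cur was new to the set
            if (seen.insert(cur).second)
                total += cur.size();
        }
    }
    return total;
}

// Closed form for the worst case (all characters distinct):
// there are (n - k + 1) substrings of length k, so the total is
// sum over k = 1..n of k * (n - k + 1) = n(n+1)(n+2)/6.
std::uint64_t worstCaseChars(std::uint64_t n)
{
    return n * (n + 1) * (n + 2) / 6;
}
```

For example, totalStoredChars("abcd") and worstCaseChars(4) both give 20, while totalStoredChars("aaaa") gives only 10 because repeated substrings are stored once.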

We can improve on this using a Trie. We insert every suffix of the string character by character, creating a node only when the required child is not already present. Each newly created node corresponds to a substring that occurs for the first time, so we increment the count. If the child is already present, we just move on to it without allocating anything, which is what saves space.

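The time complexity of this approach is O(n^2), similar to the previous approach. The Trie has one node per distinct substring, so it uses O(n^2) nodes in the worst case, each holding 26 child pointers. This is still far less than the O(n^3) characters stored by the set-based approaches.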

Implementation:

C++
#include <bits/stdc++.h>
using namespace std;

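class TrieNode {
public:
    // true if the path from the root to this node spells a substring
    bool isWord;
    TrieNode* child[26];

    TrieNode()
    {
        isWord = false;
        for (int i = 0; i < 26; i++) {
            child[i] = nullptr;
        }
    }

    // Free the whole subtree below this node
    ~TrieNode()
    {
        for (int i = 0; i < 26; i++) {
            delete child[i];
        }
    }
};

// Assumes str contains only lowercase letters 'a' to 'z'
int countDistinctSubstring(const string& str)
{
    TrieNode* head = new TrieNode();

    // Holds the count of unique substrings
    // (the empty substring is not counted)
    int count = 0;

    for (size_t i = 0; i < str.length(); i++) {
        TrieNode* temp = head;

        for (size_t j = i; j < str.length(); j++) {
            int c = str[j] - 'a';
            // When the char is not present, add it to the trie;
            // the new node is a substring seen for the first time
            if (temp->child[c] == nullptr) {
                temp->child[c] = new TrieNode();
                temp->child[c]->isWord = true;
                count++;
            }
            // Move on to the next char
            temp = temp->child[c];
        }
    }

    delete head;
    return count;
}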

int main()
{
    int count = countDistinctSubstring("aaabc");

    cout << "Count of Distinct Substrings: " << count
         << endl;

    return 0;
}

Output
Count of Distinct Substrings: 12

Complexity Analysis:

  • Time Complexity: O(n^2)
  • Auxiliary Space: O(n^2) Trie nodes in the worst case, one per distinct substring
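
As a possible variation on the Trie above, the nodes can be kept in a single std::vector and linked by index rather than by pointer, so no manual new or delete is needed. This is only a sketch under the same assumption that the input contains lowercase letters 'a' to 'z'; the function name countDistinctSubstringsArena is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: trie nodes live in a std::vector and refer to each other by
// index, so all memory is released when the vector goes out of scope.
// Assumes the input contains only lowercase 'a'..'z'.
int countDistinctSubstringsArena(const std::string& str)
{
    // -1 marks a missing child; node 0 is the root (empty string).
    std::vector<std::array<int, 26>> nodes(1);
    nodes[0].fill(-1);

    int count = 0;
    for (std::size_t i = 0; i < str.size(); ++i) {
        int cur = 0;
        for (std::size_t j = i; j < str.size(); ++j) {
            int c = str[j] - 'a';
            if (nodes[cur][c] == -1) {
                // Take the new index before growing the vector,
                // because emplace_back may reallocate the storage.
                int next = static_cast<int>(nodes.size());
                nodes.emplace_back();
                nodes.back().fill(-1);
                nodes[cur][c] = next;
                ++count; // a new node is a new distinct substring
            }
            cur = nodes[cur][c];
        }
    }
    return count;
}
```

Calling countDistinctSubstringsArena("aaabc") gives 12, matching the pointer-based version. The worst-case node count is unchanged; the index-based layout only changes how the nodes are allocated and freed.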

