Remove duplicate words from Sentence using Regular Expression
Given a string str that represents a sentence, the task is to remove consecutive duplicate words from the sentence using a regular expression, in programming languages such as C++, Java, Python, C#, and JavaScript.
Examples of Removing Duplicate Words from Sentences
Input: str = “Good bye bye world world”
Output: Good bye world
Explanation: We remove the second occurrence of bye and world from “Good bye bye world world”.

Input: str = “Ram went went to to to his home”
Output: Ram went to his home
Explanation: We remove the second occurrence of went and the second and third occurrences of to from “Ram went went to to to his home”.

Input: str = “Hello hello world world”
Output: Hello world
Explanation: We remove the second occurrence of hello and world from “Hello hello world world”. The comparison ignores case, so “hello” counts as a repeat of “Hello”.
1. Get the sentence.
2. Form a regular expression to remove duplicate words from sentences.
regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
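The backslashes are doubled because the pattern is written as a C++ or Java string literal; the regex engine itself sees \b(\w+)(?:\W+\1\b)+. The details of the above regular expression can be understood as: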
- “\\b”: A word boundary. Boundaries ensure that only whole words are compared. For example, in “My thesis is great”, the “is” at the end of “thesis” is not counted as a repetition of the word “is”.
- “(\\w+)”: One or more word characters ([a-zA-Z_0-9]), captured as group 1.
- “(?:\\W+\\1\\b)+”: This part is a non-capturing group (denoted by (?:…)). It groups together the repeated words. Let’s break it down further:
  - “\\W+”: This matches one or more non-word characters (anything that is not a word character), such as the spaces between words.
  - “\\1”: This is a back reference to the first capturing group (\\w+). It ensures that the same word that was captured earlier is repeated, because \\1 refers to the exact text captured by that group.
  - “\\b”: Another word boundary anchor to ensure that the repeated word is a whole word.
  - “+”: This quantifier makes the non-capturing group (?:\\W+\\1\\b) match one or more times, so a word repeated any number of times is covered by a single match.
3. Match the sentence against the regex and replace every match with the first captured group (the word itself). In Java, the matching is done through Pattern.matcher().
4. Return the modified sentence.
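To see how the pieces of the pattern behave, the following minimal Python sketch lists the whole match (group 0) and the captured word (group 1) for a sample sentence. It assumes Python's standard re module and the single-backslash raw-string form of the pattern; the helper name show_matches is hypothetical and used only for illustration.

```python
import re

# Same pattern as above, written as a raw string so the
# backslashes do not need to be doubled.
PATTERN = re.compile(r'\b(\w+)(?:\W+\1\b)+', re.IGNORECASE)


def show_matches(sentence):
    """Return (whole match, captured word) pairs found in sentence.

    show_matches is a hypothetical helper for illustration only.
    """
    return [(m.group(0), m.group(1)) for m in PATTERN.finditer(sentence)]


if __name__ == "__main__":
    # Each run of repeats is one match; group 1 is its first word.
    print(show_matches("Ram went went to to to his home"))
    # The "is" inside "thesis" is not a whole word, so nothing matches.
    print(show_matches("My thesis is great"))
```

Replacing each whole match with its group 1 is what turns the run “to to to” into a single “to”.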
Below is the implementation of the above approach:
// C++ program to remove duplicate words
// using Regular Expression or ReGex.
#include <iostream>
#include <regex>
#include <string>
using namespace std;

// Function to remove consecutive duplicate
// words from the sentence
string removeDuplicateWords(const string& s)
{
    // Regex matching a word followed by one or more
    // repetitions of itself (case-insensitive).
    const regex pattern("\\b(\\w+)(?:\\W+\\1\\b)+",
                        regex_constants::icase);

    // Replace every match with its first captured group,
    // i.e. the first occurrence of the repeated word.
    // Replacing at the match position avoids searching for
    // the matched text by value, which could hit an
    // unrelated earlier substring.
    return regex_replace(s, pattern, "$1");
}

// Driver Code
int main()
{
    // Test Case: 1
    string str1 = "Good bye bye world world";
    cout << removeDuplicateWords(str1) << endl;

    // Test Case: 2
    string str2 = "Ram went went to to his home";
    cout << removeDuplicateWords(str2) << endl;

    // Test Case: 3
    string str3 = "Hello hello world world";
    cout << removeDuplicateWords(str3) << endl;

    return 0;
}

// This code is contributed by yuvraj_chandra
// Java program to remove duplicate words
// Using Regular Expression or ReGex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Driver Class
class GFG {
    // Function to remove consecutive duplicate
    // words from the sentence
    public static String removeDuplicateWords(String input)
    {
        // Regex matching repeated words.
        String regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        // Pattern class contains matcher() method
        // to find matches between the given sentence
        // and the regular expression.
        Matcher m = p.matcher(input);

        // Replace every run of repeated words with
        // its first captured group (the word itself).
        return m.replaceAll("$1");
    }

    // Driver code
    public static void main(String args[])
    {
        // Test Case: 1
        String str1 = "Good bye bye world world";
        System.out.println(removeDuplicateWords(str1));

        // Test Case: 2
        String str2 = "Ram went went to to his home";
        System.out.println(removeDuplicateWords(str2));

        // Test Case: 3
        String str3 = "Hello hello world world";
        System.out.println(removeDuplicateWords(str3));
    }
}
# Python program to remove duplicate words
# using Regular Expression or ReGex.
import re


# Function to remove consecutive duplicate
# words from the sentence
def removeDuplicateWords(input):
    # Regex matching repeated words
    regex = r'\b(\w+)(?:\W+\1\b)+'
    # Replace each match with its first captured group
    return re.sub(regex, r'\1', input, flags=re.IGNORECASE)


# Driver Code
if __name__ == "__main__":
    # Test Case: 1
    str1 = "Good bye bye world world"
    print(removeDuplicateWords(str1))

    # Test Case: 2
    str2 = "Ram went went to to his home"
    print(removeDuplicateWords(str2))

    # Test Case: 3
    str3 = "Hello hello world world"
    print(removeDuplicateWords(str3))

# This code is contributed by yuvraj_chandra
// C# program to remove duplicate words
// using Regular Expression or ReGex.
using System;
using System.Text.RegularExpressions;

class Program
{
    // Function to remove consecutive duplicate
    // words from the sentence
    static string RemoveDuplicateWords(string s)
    {
        // Regex matching repeated words.
        Regex pattern = new Regex(@"\b(\w+)(?:\W+\1\b)+", RegexOptions.IgnoreCase);

        // Replace each match with its first captured group.
        // Replacing at the match position avoids touching
        // unrelated substrings that happen to contain the
        // same text.
        return pattern.Replace(s, "$1");
    }

    // Driver Code
    static void Main()
    {
        // Test Case: 1
        string str1 = "Good bye bye world world";
        Console.WriteLine(RemoveDuplicateWords(str1));

        // Test Case: 2
        string str2 = "Ram went went to to his home";
        Console.WriteLine(RemoveDuplicateWords(str2));

        // Test Case: 3
        string str3 = "Hello hello world world";
        Console.WriteLine(RemoveDuplicateWords(str3));
    }
}
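// JavaScript program to remove duplicate words
// using Regular Expression

// Function to remove duplicate words using Regular Expression
function removeDuplicateWords(input) {
    // Regular expression to match repeated words
    // (g: all matches, i: ignore case)
    let regex = /\b(\w+)(?:\W+\1\b)+/gi;

    // Replace duplicate words with the first occurrence
    return input.replace(regex, '$1');
}

// Test cases
// Test Case: 1
let str1 = "Good bye bye world world";
console.log(removeDuplicateWords(str1));

// Test Case: 2
let str2 = "Ram went went to to his home";
console.log(removeDuplicateWords(str2));

// Test Case: 3
let str3 = "Hello hello world world";
console.log(removeDuplicateWords(str3));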
Output:
Good bye world
Ram went to his home
Hello world
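Two properties of this approach are easy to miss: the comparison ignores case only because the programs pass a case-insensitive flag, and only adjacent repeats are removed. The sketch below illustrates both in Python; dedupe and its ignore_case parameter are hypothetical names introduced here for the demonstration, not part of the programs above.

```python
import re


def dedupe(sentence, ignore_case=True):
    """Collapse adjacent repeated words (hypothetical demo helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(r'\b(\w+)(?:\W+\1\b)+', r'\1', sentence, flags=flags)


if __name__ == "__main__":
    # With the flag, "hello" counts as a repeat of "Hello".
    print(dedupe("Hello hello world world"))                     # Hello world
    # Without it, only the exact repeat "world world" collapses.
    print(dedupe("Hello hello world world", ignore_case=False))  # Hello hello world
    # Non-adjacent repeats are left alone.
    print(dedupe("bye world bye"))                               # bye world bye
    # \W+ also covers punctuation between the repeats.
    print(dedupe("bye, bye!"))                                   # bye!
```

If duplicates anywhere in the sentence had to be removed, this pattern alone would not be enough, since the back reference only looks at the word immediately following.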
Complexity of the above Programs
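Time Complexity: O(n) for typical sentences, where n is the length of the string. Because the pattern uses a back reference, the engine matches by backtracking, so the exact cost depends on the regex engine and the input.
Auxiliary Space: O(n) for the returned string; O(1) if the output is not counted.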
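As a rough cross-check of the linear-time figure, the same results can be produced without a regex by a single pass over the words. This is a different technique from the regex approach described here, shown only for comparison. It is a minimal sketch that assumes words are separated by whitespace and rejoins them with single spaces; the function name is hypothetical.

```python
def remove_adjacent_duplicates(sentence):
    """Single-pass, non-regex variant (hypothetical comparison helper).

    Assumes whitespace-separated words; separators are normalized
    to one space in the result.
    """
    kept = []
    for word in sentence.split():
        # Keep the word unless it repeats the previous kept word,
        # ignoring case as the regex versions do.
        if not kept or kept[-1].lower() != word.lower():
            kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    for s in ("Good bye bye world world",
              "Ram went went to to to his home",
              "Hello hello world world"):
        print(remove_adjacent_duplicates(s))
```

Each word is visited once and compared only with the previously kept word, so the work grows linearly with the length of the sentence.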