SQL Query to Delete Duplicate Rows
Duplicate rows in a database can cause inaccurate results, waste storage space, and slow down queries. Cleaning duplicate records is therefore an essential maintenance task: duplicates lead to data inconsistencies and performance issues, so it is crucial to identify and remove them effectively.
In this article, we will explain the process of deleting duplicate rows from a SQL table step by step, using SQL Server, with examples and outputs. We'll cover techniques using GROUP BY, ROW_NUMBER(), and CTEs, incorporating best practices to help us handle duplicates effectively.
What Are Duplicate Rows?
Duplicate rows are records in a database table that have identical values in one or more columns. These rows often arise due to issues like multiple imports, user errors, or missing constraints like primary keys or unique indexes. Duplicate rows can lead to inaccurate reports, redundant data storage, and slower query performance, making it essential to remove them.
A SQL query to delete duplicate rows typically involves identifying duplicates using functions like ROW_NUMBER() or COUNT() and making sure that only one copy of each record is kept in the table.
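As a quick illustration of the COUNT(*) idea (a sketch using Python's built-in sqlite3 module rather than SQL Server, with a made-up one-column table), any group that occurs more than once is a duplicate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (name) VALUES (?)",
                 [("A",), ("B",), ("A",)])

# Groups whose COUNT(*) exceeds 1 are duplicated.
dupes = conn.execute(
    "SELECT name, COUNT(*) FROM t GROUP BY name HAVING COUNT(*) > 1"
).fetchall()
print(dupes)  # [('A', 2)]
```

The same GROUP BY ... HAVING pattern scales to any combination of columns, as shown in the steps below.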
Why Remove Duplicate Rows?
- Optimize memory usage by eliminating redundant data.
- Ensure data accuracy for reports and queries.
- Improve query performance by reducing data redundancy.
Steps to Delete Duplicate Rows in SQL
To effectively remove duplicate rows in SQL, we can follow a structured approach. Below are the steps to create a DETAILS table and demonstrate how to identify and delete duplicate records:
Step 1: Create the Sample Table
We will create a table named DETAILS to demonstrate how to identify and delete duplicate rows. This step sets up the necessary structure to store sample data and perform operations like detecting duplicates and applying deletion techniques.
Query:
CREATE TABLE DETAILS (
SN INT IDENTITY(1,1) PRIMARY KEY,
EMPNAME VARCHAR(25) NOT NULL,
DEPT VARCHAR(20) NOT NULL,
CONTACTNO BIGINT NOT NULL,
CITY VARCHAR(15) NOT NULL
);

Step 2: Insert Data into the Table
Let’s insert some data, including duplicates, into the DETAILS table. This step replicates real-world scenarios where duplicate records might occur, enabling us to demonstrate how to identify and remove them effectively.
Query:
INSERT INTO DETAILS (EMPNAME, DEPT, CONTACTNO, CITY)
VALUES
('VISHAL', 'SALES', 9193458625, 'GAZIABAD'),
('VIPIN', 'MANAGER', 7352158944, 'BAREILLY'),
('ROHIT', 'IT', 7830246946, 'KANPUR'),
('RAHUL', 'MARKETING', 9635688441, 'MEERUT'),
('SANJAY', 'SALES', 9149335694, 'MORADABAD'),
('VIPIN', 'MANAGER', 7352158944, 'BAREILLY'),
('VISHAL', 'SALES', 9193458625, 'GAZIABAD'),
('AMAN', 'IT', 78359941265, 'RAMPUR');

Output
The table now contains 8 rows; SN 6 and SN 7 duplicate the VIPIN (SN 2) and VISHAL (SN 1) rows.
Step 3: Identify Duplicate Rows
We use the GROUP BY clause with the COUNT(*) function to find rows with duplicate values. This step groups the records by specific columns and counts how many times each combination occurs, making it easy to spot combinations that appear more than once in the table.
Query:
SELECT EMPNAME, DEPT, CONTACTNO, CITY, COUNT(*)
FROM DETAILS
GROUP BY EMPNAME, DEPT, CONTACTNO, CITY
HAVING COUNT(*) > 1;
Output
The VISHAL and VIPIN rows each appear twice (COUNT(*) = 2).
Step 4: Retain Unique Rows
To identify unique rows, use the same query without the HAVING
clause or filter rows with a count of 1
. This step allows us to isolate the rows that appear only once in the table, ensuring that only the distinct records are retained while duplicates are removed.
Query:
SELECT EMPNAME, DEPT, CONTACTNO, CITY, COUNT(*)
FROM DETAILS
GROUP BY EMPNAME, DEPT, CONTACTNO, CITY;
Output
One row per distinct combination: six groups in total, with the VISHAL and VIPIN groups showing a count of 2.
Step 5: Delete Duplicate Rows
Use the GROUP BY clause along with MIN(SN) to retain one row for each duplicate group. This method keeps the first occurrence of each combination (the lowest SN) and deletes the remaining duplicate rows.
Query:
DELETE FROM DETAILS
WHERE SN NOT IN (
SELECT MIN(SN)
FROM DETAILS
GROUP BY EMPNAME, DEPT, CONTACTNO, CITY
);
SELECT * FROM DETAILS;
Output after Deleting Duplicate Rows
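The DELETE ... WHERE SN NOT IN (SELECT MIN(SN) ...) pattern can be tried end-to-end with Python's built-in sqlite3 module (a sketch, not SQL Server: SQLite's INTEGER PRIMARY KEY stands in for IDENTITY(1,1)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE DETAILS (
    SN INTEGER PRIMARY KEY,   -- stands in for IDENTITY(1,1)
    EMPNAME TEXT, DEPT TEXT, CONTACTNO INTEGER, CITY TEXT)""")
conn.executemany(
    "INSERT INTO DETAILS (EMPNAME, DEPT, CONTACTNO, CITY) VALUES (?, ?, ?, ?)",
    [('VISHAL', 'SALES', 9193458625, 'GAZIABAD'),
     ('VIPIN', 'MANAGER', 7352158944, 'BAREILLY'),
     ('ROHIT', 'IT', 7830246946, 'KANPUR'),
     ('RAHUL', 'MARKETING', 9635688441, 'MEERUT'),
     ('SANJAY', 'SALES', 9149335694, 'MORADABAD'),
     ('VIPIN', 'MANAGER', 7352158944, 'BAREILLY'),
     ('VISHAL', 'SALES', 9193458625, 'GAZIABAD'),
     ('AMAN', 'IT', 78359941265, 'RAMPUR')])

# Keep the lowest SN per duplicate group, delete the rest.
conn.execute("""DELETE FROM DETAILS
               WHERE SN NOT IN (SELECT MIN(SN) FROM DETAILS
                                GROUP BY EMPNAME, DEPT, CONTACTNO, CITY)""")

remaining = conn.execute("SELECT SN, EMPNAME FROM DETAILS ORDER BY SN").fetchall()
print(remaining)
# [(1, 'VISHAL'), (2, 'VIPIN'), (3, 'ROHIT'), (4, 'RAHUL'), (5, 'SANJAY'), (8, 'AMAN')]
```

As in the SQL Server output, the duplicate SN 6 and SN 7 rows are gone and the gap in the serial numbers (5 jumps to 8) shows where they were.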
Step 6: Alternative Approach Using CTE
Using a Common Table Expression (CTE), we can delete duplicates in a more structured way. CTEs provide a cleaner approach by allowing us to define a temporary result set that can be referenced within the DELETE statement. This method can be more readable and maintainable, especially when dealing with complex queries. Note that it is an alternative to Step 5, so it should be run against the original table while the duplicates are still present.
Query:
WITH CTE AS (
SELECT SN, EMPNAME, DEPT, CONTACTNO, CITY,
ROW_NUMBER() OVER (PARTITION BY EMPNAME, DEPT, CONTACTNO, CITY ORDER BY SN) AS RowNum
FROM DETAILS
)
DELETE FROM CTE WHERE RowNum > 1;
Output
| SN | EMPNAME | DEPT      | CONTACTNO   | CITY      |
|----|---------|-----------|-------------|-----------|
| 1  | VISHAL  | SALES     | 9193458625  | GAZIABAD  |
| 2  | VIPIN   | MANAGER   | 7352158944  | BAREILLY  |
| 3  | ROHIT   | IT        | 7830246946  | KANPUR    |
| 4  | RAHUL   | MARKETING | 9635688441  | MEERUT    |
| 5  | SANJAY  | SALES     | 9149335694  | MORADABAD |
| 8  | AMAN    | IT        | 78359941265 | RAMPUR    |
Explanation:
In the above result, the duplicate rows for VISHAL and VIPIN have been removed, and only one instance of each remains.
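Deleting rows through a CTE is a SQL Server feature; in engines that do not support it, the same ROW_NUMBER() idea works by deleting the SNs the window function flags. A sqlite3 sketch of that variant (same column names, a reduced three-row version of the sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE DETAILS (
    SN INTEGER PRIMARY KEY, EMPNAME TEXT, DEPT TEXT, CONTACTNO INTEGER, CITY TEXT)""")
conn.executemany(
    "INSERT INTO DETAILS (EMPNAME, DEPT, CONTACTNO, CITY) VALUES (?, ?, ?, ?)",
    [('VISHAL', 'SALES', 9193458625, 'GAZIABAD'),
     ('VIPIN', 'MANAGER', 7352158944, 'BAREILLY'),
     ('VISHAL', 'SALES', 9193458625, 'GAZIABAD')])   # duplicate of SN 1

# RowNum = 1 marks the row to keep within each duplicate group;
# everything with RowNum > 1 is deleted.
conn.execute("""DELETE FROM DETAILS WHERE SN IN (
    SELECT SN FROM (
        SELECT SN, ROW_NUMBER() OVER (
            PARTITION BY EMPNAME, DEPT, CONTACTNO, CITY ORDER BY SN) AS RowNum
        FROM DETAILS)
    WHERE RowNum > 1)""")

kept = conn.execute("SELECT SN, EMPNAME FROM DETAILS ORDER BY SN").fetchall()
print(kept)  # [(1, 'VISHAL'), (2, 'VIPIN')]
```

Because ORDER BY SN numbers the lowest SN first, this keeps the same rows as the MIN(SN) approach in Step 5.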
Conclusion
Duplicate rows in SQL databases can negatively impact performance and data accuracy. Using methods like GROUP BY, ROW_NUMBER(), and CTEs, we can efficiently delete duplicate rows while retaining unique records. Always test your queries on a backup or development environment to ensure accuracy before applying them to production databases. By using these methods, we can confidently remove duplicate rows in SQL, keeping our database clean and reliable.
FAQs
How to delete duplicate records in SQL query?
Use the ROW_NUMBER() function with a CTE to identify duplicates based on specific criteria, then delete rows where the row number is greater than 1.
How do I delete duplicate rows?
Identify duplicate rows using a unique column or combination of columns, leverage the DISTINCT keyword or ROW_NUMBER() function, and delete the unwanted duplicates.
How do I delete duplicate records in the same table?
Use a WITH clause to define a CTE (a temporary result set, not a physical table) that flags duplicates, then delete the rows that exceed your duplicate threshold, typically by row number or timestamp.
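If you want to keep the most recent copy instead of the first, flip the window's ORDER BY to descending. A sqlite3 sketch (a made-up table where "newest" means the highest id; with a timestamp column you would order by that instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (name) VALUES (?)", [("A",), ("B",), ("A",)])

# ORDER BY id DESC makes the newest row in each group RowNum = 1,
# so the older copies are the ones deleted.
conn.execute("""DELETE FROM t WHERE id IN (
    SELECT id FROM (
        SELECT id, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
        FROM t)
    WHERE rn > 1)""")

kept = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()
print(kept)  # [(2, 'B'), (3, 'A')] -- the older 'A' (id 1) was removed
```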