How to Delete Duplicate Rows in a Table in SQL Server
Database:
Operators:
Table of Contents
Problem
You want to delete duplicate rows from an existing table in SQL Server.
Example
A company has its employee data stored in the table employees
with the columns empName
and dept
.
empName | dept |
---|---|
Jack Russel | Sales |
Jan Kowalski | HR |
John Doe | Sales |
Jack Russel | Sales |
John Doe | Sales |
Marta Wilson | HR |
Jack Russel | Sales |
Let’s say you want to delete duplicates from this table and leave only one record among the duplicates. For example, Jack Russel from the Sales department has multiple records that are the same.
Solution
One way to approach this problem in SQL Server is to use a CTE and the ROW_NUMBER()
function. Let’s look at the query:
WITH duplicates (name, dept, duplicate_count) AS ( SELECT name, dept, ROW_NUMBER() OVER(PARTITION BY name, dept ORDER BY name) FROM employees ) DELETE from duplicates WHERE duplicate_count > 1
Here are the contents of the employees
table after you run this query.
empName | dept |
---|---|
Jack Russel | Sales |
Jan Kowalski | HR |
John Doe | Sales |
Marta Wilson | HR |
Discussion
First, we create a CTE called duplicates
with a new column called duplicate_count
, which stores all records from the table employees
. The new column stores the count of repetitions for each record in the table. The first occurrence of “Jack Russel - Sales” gets duplicate_count
1, the second one gets 2, and so on. Here’s what the contents of the duplicates
CTE look like:
empName | dept | duplicate_count |
---|---|---|
Jack Russel | Sales | 1 |
Jack Russel | Sales | 2 |
Jack Russel | Sales | 3 |
Jan Kowalski | HR | 1 |
John Doe | Sales | 1 |
John Doe | Sales | 2 |
Marta Wilson | HR | 1 |
We now have a CTE with information about which rows are duplicated. Those that have duplicate_count
of 2 or more are duplications. With that information, we may safely delete all records that are duplications. The query keeps only the rows that have duplicate_count
equal to 1.