Delete Duplicate Rows from SQL Server Table

In this article we will learn how to Delete Duplicate Rows using these methods.

  • Delete statement with Group by and Having clause.
  • By collecting result on CTE with ROW_NUMBER then Delete statement on CTE.

Using GROUP BY and HAVING Clause

Since when I am into SQL World I have been using this method to find the duplicate rows and remove them from the table. I have used this method many many times in experience of 12 years in SQL Server database management. What I do is, just get a min id or max from the table and run the Delete statement with NOT IN that ID.

First of lets run select statement to find the records which we want to keep in the table.

  1. Run Select statement min of a column which represents identity of row. Mostly it can be primary key or id column.
  2. Group by on columns which generates duplicates
  3. Find only rows which has multiple occurrences in this group

Step 1 : Review the Table and understand the Uniqueness

Here is the example table- EmployeeMst having two employees duplicated but they have different age. We have to decide which rows should be kept and which should be removed.

Select statement EmployeeMst
EmployeeMst

Lets take a decision that any row which is inserted first should be kept and rest duplicates should be removed.

Group by with Min and Max
Group by with Min and Max

Step 2 : Find Duplicate Records with Select statement

-- Code to find duplicate records.
select * from EmployeeMst 
where id in (
	select  max(id) 
	from employeemst 
	group by EmployeeName,Department 
	having count(1)>1
)

Step 3 : Final Query – Run Delete statement using with IN or NOT IN

-- Delete duplicate records
delete from EmployeeMst 
where id in (
	select  max(id) 
	from employeemst 
	group by EmployeeName,Department 
	having count(1)>1
)

Same thing can be done in different way. We will remove having count(1)>1 from inner query and we will use Not IN instead of IN

delete from EmployeeMst 
where id not in (
	select  min(id) 
	from employeemst 
	group by EmployeeName,Department 
)

This will give you stronger result. How? When you have have multiple repetition of the same rows then the above will be perfect solution. Yes, it may take little more time comparatively but it is the perfect one.


Using CTE with ROW_NUMBER

We have another method to find and remove the Duplicate Rows from the table. We can use ROW_NUMBER function to with Temp Table along with Delete Statement. It is pretty easy process.

Step 1 : Review the Table by using ORDER BY and understand Uniqueness

Duplicate Rows
Duplicate Rows

Here is the sample table where we have Employees. We have two employees duplicated, but their age is different. We will ignore the age and find the duplicate rows by EmployeeName and Department.

Step 2 : Apply RepeatCount Column to find number of repetition

We will run ROW_NUMBER with PARTITION BY to find Repetition sequence of employee. In the example below we will get Repeat count using partition on EmployeeName and Department

SELECT EmployeeName,ROW_NUMBER() OVER(PARTITION by EmployeeName, Department ORDER BY EmployeeName) 
AS RepeatCount
FROM dbo.EmployeeMst
ROW_NUMBER to find duplicate rows
Employee List with Repeat Count

Step 3 : Collect result in CTE

Now we will use Common Table Expression(CTE) to collect result from above select statement.

Lets first run select statement on CTE to make sure we are getting right result in CTE then we will replace select statement with Delete.

Here is the select statement on CTE named TempEmp

WITH TempEmp (EmployeeName,RepeatCount)
AS
(
SELECT EmployeeName,ROW_NUMBER() OVER(PARTITION by EmployeeName, Department ORDER BY EmployeeName) 
AS RepeatCount
FROM dbo.EmployeeMst
)
--Select Rows from TempEmp where RepeatCount is more than 1
select * FROM TempEmp
WHERE RepeatCount> 1 
Repeat Rows in CTE
Repeat Rows in CTE

Step 4: Final Query – Run Delete Statement on CTE

We can see two rows are selected in CTE. We can run delete statement on CTE as below.

WITH TempEmp (EmployeeName,RepeatCount)
AS
(
SELECT EmployeeName,ROW_NUMBER() OVER(PARTITION by EmployeeName, Department ORDER BY EmployeeName) 
AS RepeatCount
FROM dbo.EmployeeMst
)
--Delete from TempEmp where RepeatCount is more than 1
Delete  FROM TempEmp
WHERE RepeatCount> 1 

This is final result after deletion of Duplicate Rows.


See Also :

How to use ROW_NUMBER function with PARTITION BY clause

Should we Shrink Database to release unused space on disk?

How to get list of Tables with Table Size, Row Count and Index size allocated on Disk

2 thoughts on “Delete Duplicate Rows from SQL Server Table

Leave a Reply

Your email address will not be published. Required fields are marked *