I was helping someone out today on Experts-Exchange who had a quite a few duplicate records in a table that he wanted to get rid of. Here’s an example of the data.
SELECT * FROM People GO
LastName FirstName -------- --------- Smith John Smith John Smith John Jackson Tom Jones Sarah Jones Sarah Williams Jessica
As you can see, there is nothing in this table that makes each row unique, and therefore you cannot delete one of the records for John Smith without deleting both records.
The easiest way to delete these duplicates is to add a new IDENTITY column to the table.
ALTER TABLE People ADD ID INT IDENTITY(1,1) GO SELECT * FROM People GO
ID LastName FirstName ----------- -------- --------- 1 Smith John 2 Smith John 3 Smith John 4 Jackson Tom 5 Jones Sarah 6 Jones Sarah 7 Williams Jessica
This will provide us with a unique value across all records, which we can use to find and delete the duplicate rows.
DELETE FROM People WHERE ID NOT IN ( --Return only the ID for records we want to keep SELECT ID FROM ( --Return one ID for each person SELECT MIN(ID) AS ID, LastName, FirstName FROM People GROUP BY LastName, FirstName ) a ) GO SELECT * FROM People GO
ID LastName FirstName ----------- -------- --------- 1 Smith John 4 Jackson Tom 5 Jones Sarah 7 Williams Jessica
We are now left with only one record for each person. You can now drop the ID column from the table.
ALTER TABLE People DROP COLUMN ID GO
-A
Advertisement
One of my friends over at http://www.sqlservernation.com, posted another way to delete duplicate records that’s actually much easier if you’re using SQL 2005 or 2008. Instead of having to make changes to the structure of the table, he used a Common Table Expression (CTE) to delete the duplicate records. http://www.sqlservernation.com/blogs/tipweek/archive/2009/02/23/how-to-delete-duplicate-values.aspx
Here’s an example of using a CTE with the data used in this post. This is for SQL 2005 or 2008 only since CTEs and ranking functions are new to SQL 2005.
;WITH CTE_People (FirstName, LastName, Ranking)
AS
(
SELECT FirstName, LastName, DENSE_RANK() OVER(PARTITION BY FirstName, LastName ORDER BY NEWID() ASC)
FROM People
)
DELETE FROM CTE_People
WHERE Ranking > 1
Comment by Aaron Akin — February 26, 2009 @ 9:02 pm |
I like. It’s one of those simple things that us novices don’t always think about.
Comment by Chris — March 9, 2009 @ 7:35 am |
Dude. Stop giving the answers to my interview questions.
Comment by Erik Klein — March 8, 2010 @ 5:36 am |
It sounds like your candidates could use some help! Hope things are going well at GS.
Comment by Aaron Akin — March 8, 2010 @ 8:28 am |