SQL Server 2005 introduces several features that were not available in SQL Server 2000. One of them, and the subject of this article, is the Row_Number() function. SQL Server database administrators and developers have been waiting for this function for a long time, and now the wait is over.
Traditionally, developers and database administrators used temporary tables and correlated subqueries to generate calculated row numbers in a query. SQL Server 2005 now provides a built-in function that replaces those workarounds.
Let us assume that we have a database named [EMPLOYEE TEST] that contains a table named [EMPLOYEE]. You can use the following script to create the database, the table and all of its rows.
USE [MASTER]
GO
IF EXISTS (SELECT NAME FROM SYS.DATABASES WHERE NAME = N'EMPLOYEE TEST')
    DROP DATABASE [EMPLOYEE TEST]
GO
CREATE DATABASE [EMPLOYEE TEST]
GO
USE [EMPLOYEE TEST]
GO
IF EXISTS (SELECT * FROM SYS.OBJECTS WHERE OBJECT_ID = OBJECT_ID(N'[DBO].[EMPLOYEE]') AND TYPE IN (N'U'))
    DROP TABLE [DBO].[EMPLOYEE]
GO
CREATE TABLE EMPLOYEE (EMPID INT, FNAME VARCHAR(50), LNAME VARCHAR(50))
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME) VALUES (2021110, 'MICHAEL', 'POLAND')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME) VALUES (2021110, 'MICHAEL', 'POLAND')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME) VALUES (2021115, 'JIM', 'KENNEDY')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME) VALUES (2121000, 'JAMES', 'SMITH')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME) VALUES (2011111, 'ADAM', 'ACKERMAN')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME) VALUES (3015670, 'MARTHA', 'LEDERER')
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME) VALUES (1021710, 'MARIAH', 'MANDEZ')
GO
Let us browse the EMPLOYEE table using the following query.
SELECT EMPID, FNAME, LNAME FROM EMPLOYEE
The results of the above query look like Table 1.0.
EMPID | FNAME | LNAME
2021110 | MICHAEL | POLAND
2021110 | MICHAEL | POLAND
2021115 | JIM | KENNEDY
2121000 | JAMES | SMITH
2011111 | ADAM | ACKERMAN
3015670 | MARTHA | LEDERER
1021710 | MARIAH | MANDEZ
Table 1.0
In SQL Server 2000, we would typically use a query like the following to create row numbers for the rows in a table.
SELECT ROWID = IDENTITY(int, 1, 1), EMPID, FNAME, LNAME
INTO EMPLOYEE2
FROM EMPLOYEE
ORDER BY EMPID
This query creates a new table, EMPLOYEE2, and uses the IDENTITY function to generate the ROWID column.
Let us query the new table using the following query.
SELECT RowID, EMPID, FNAME, LNAME FROM EMPLOYEE2
The results of the above query would look like Table 1.1.
ROWID | EMPID | FNAME | LNAME
1 | 1021710 | MARIAH | MANDEZ
2 | 2011111 | ADAM | ACKERMAN
3 | 2021110 | MICHAEL | POLAND
4 | 2021110 | MICHAEL | POLAND
5 | 2021115 | JIM | KENNEDY
6 | 2121000 | JAMES | SMITH
7 | 3015670 | MARTHA | LEDERER
Table 1.1
This illustration makes it clear that the table has a duplicate row with EMPID = 2021110.
To remove the duplicate, we have to delete it from the EMPLOYEE2 table, where ROWID tells the two copies apart. We cannot delete it directly from the EMPLOYEE table, because the two rows there are identical and a DELETE filtered on EMPID = 2021110 would remove both of them.
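The limitation described above can be sketched as follows. This is a minimal illustration, assuming EMPLOYEE2 was created by the SELECT ... INTO query shown earlier and that the duplicate sits at ROWID 4 as in Table 1.1.

```sql
-- Remove the second copy of EMPID 2021110 from the numbered copy of the table.
-- ROWID = 4 is taken from Table 1.1; adjust it if your data differs.
DELETE FROM EMPLOYEE2
WHERE ROWID = 4;

-- By contrast, the statement below would remove BOTH copies from the original
-- table, because no column distinguishes one identical row from the other.
-- It is left commented out so that the sample data stays intact.
-- DELETE FROM EMPLOYEE WHERE EMPID = 2021110;
```

Note that the first statement only cleans up EMPLOYEE2; the original EMPLOYEE table still contains both copies.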
SQL Server 2005 provides a new function, Row_Number(), for generating row numbers. To delete the duplicate row from the original table, we can combine it with another new feature, the common table expression (CTE).
Let us generate the ROWID using the Row_Number() function based on EMPID.
SELECT ROW_NUMBER() OVER (ORDER BY EMPID ASC) AS ROWID, *
FROM EMPLOYEE
The results of the above query would look like Table 1.2.
ROWID | EMPID | FNAME | LNAME
1 | 1021710 | MARIAH | MANDEZ
2 | 2011111 | ADAM | ACKERMAN
3 | 2021110 | MICHAEL | POLAND
4 | 2021110 | MICHAEL | POLAND
5 | 2021115 | JIM | KENNEDY
6 | 2121000 | JAMES | SMITH
7 | 3015670 | MARTHA | LEDERER
Table 1.2
In this result set, we can identify the duplicate row for the EMPID 2021110.
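Spotting a duplicate by eye works for seven rows but not for a large table. As a small sketch that is not part of the original walkthrough, a GROUP BY with HAVING lists every combination of values that occurs more than once. The COPIES alias is a name chosen here for illustration.

```sql
-- List each distinct (EMPID, FNAME, LNAME) combination that appears
-- more than once, together with the number of copies found.
SELECT EMPID, FNAME, LNAME, COUNT(*) AS COPIES
FROM EMPLOYEE
GROUP BY EMPID, FNAME, LNAME
HAVING COUNT(*) > 1;
```

Run against the sample data before any rows are deleted, this should list only EMPID 2021110.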
Let us display the duplicate row using a common table expression and the Row_Number() function together, as in the following query.
WITH [EMPLOYEE ORDERED BY ROWID] AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY EMPID ASC) AS ROWID, *
    FROM EMPLOYEE
)
SELECT *
FROM [EMPLOYEE ORDERED BY ROWID]
WHERE ROWID = 4
The results of the above query would look like Table 1.3.
ROWID | EMPID | FNAME | LNAME
4 | 2021110 | MICHAEL | POLAND
Table 1.3
This duplicate row can be deleted using the same common table expression and Row_Number() function, as in the following query.
WITH [EMPLOYEE ORDERED BY ROWID] AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY EMPID ASC) AS ROWID, *
    FROM EMPLOYEE
)
DELETE FROM [EMPLOYEE ORDERED BY ROWID]
WHERE ROWID = 4
Let us query the table again using the following query.
SELECT * FROM EMPLOYEE
The results of the above query would look like Table 1.4.
EMPID | FNAME | LNAME
2021110 | MICHAEL | POLAND
2021115 | JIM | KENNEDY
2121000 | JAMES | SMITH
2011111 | ADAM | ACKERMAN
3015670 | MARTHA | LEDERER
1021710 | MARIAH | MANDEZ
Table 1.4
In this illustration, we can see that the duplicate row has been deleted.
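The delete above relies on knowing that the duplicate landed at ROWID 4. A more general variation, offered as a sketch rather than as part of the original walkthrough, numbers the copies within each group of identical rows by adding PARTITION BY to the OVER clause, then deletes every copy after the first. The CTE name [EMPLOYEE DUPLICATES] and the COPYNO column are names chosen here for illustration.

```sql
-- Number the copies inside each group of identical rows:
-- the first copy gets COPYNO = 1, the second COPYNO = 2, and so on.
WITH [EMPLOYEE DUPLICATES] AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY EMPID, FNAME, LNAME
                              ORDER BY EMPID ASC) AS COPYNO,
           *
    FROM EMPLOYEE
)
-- Keep the first copy of each group and delete the rest.
DELETE FROM [EMPLOYEE DUPLICATES]
WHERE COPYNO > 1;
```

If the WITH clause is not the first statement in its batch, the statement before it must end with a semicolon. Because the numbering restarts for each distinct combination of EMPID, FNAME and LNAME, this form does not depend on where a duplicate happens to fall in the overall ordering.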
Conclusion
In this article, we have discussed two new features of SQL Server 2005, the Row_Number() function and the common table expression, and used them together to delete a duplicate row.