Deleting Duplicates in oracle

Frequently developers ask this question on SQL - I am unable to add a PK constraint to a table - what is the best way of finding the problem rows ?

If you issue an "alter table ... add primary key" command, it may come back with "ORA-02437: cannot validate ..." , because you have some problem rows.

The key to solving the problem is to identify the problem rows and delete the duplicates.

How to identify duplicates
Before issuing a command to remove duplicate records you may want to know how much extra records are present. There are several ways to do that.

Using a "group by" is the best option

	SQL> select empno, count(*) from employee group by empno having count(*)>1;

Another method is to make use of rowid. Select the rowid's greater than the minimum rowid for each key column.

	SQL> select empno from employee a where a.rowid>(select min(rowid) from employee b where a.empno=b.empno);

Using Exceptions table to identify duplicates
If the table contains more than 10000 rows it is better to create an index on empno columns before running the above query. The above query may do a cartesian self join on employee table. This means there will be NxN comparison opeartions. Even if the index is not required for future use, it will be better to create index and drop the index after deleting the duplicates.

Deleting duplicates
In order to delete duplicate rows the following command can be used

	SQL> delete employee a where a.rowid>(select min(rowid) from employee b where a.empno=b.empno);

My Oracle Debug Guide