How to remove duplicate rows from an SQL table

Overview

In SQL, we can follow the following two steps to remove duplicate values:

  1. Group all the rows using the target column(s).
  2. Find all the row IDs using the MIN() function. This will give us a list of all the row IDs. In case of duplicate rows, only the ID of the first instance will be taken.
  3. Delete all the rows that are not present in the list returned in step 2.

Example

Let's create a table and insert some values into it.

/* Create table Student */
CREATE TABLE Student(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
PRIMARY KEY (ID)
);
/* Insert values in table Student */
INSERT INTO Student (ID, NAME, AGE)
VALUES (1, 'Shubh', 22);
INSERT INTO Student (ID, NAME, AGE)
VALUES (2, 'Parth', 21);
INSERT INTO Student (ID, NAME, AGE)
VALUES (3, 'Shubh', 22);
INSERT INTO Student (ID, NAME, AGE)
VALUES (4, 'Pratik', 23);
/* Display table data */
SELECT * FROM Student;

Explanation

  • Lines 2 to 7: We create a table, Student.
  • Lines 10 to 20: We insert a few values in the Student table.
  • Line 23: We display the data present in the Student table.

Now let's find and delete all the duplicate rows from the Student table.

/* Create table Student */
CREATE TABLE Student(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
PRIMARY KEY (ID)
);
/* Insert values in table Student */
INSERT INTO Student (ID, NAME, AGE)
VALUES (1, 'Shubh', 22);
INSERT INTO Student (ID, NAME, AGE)
VALUES (2, 'Parth', 21);
INSERT INTO Student (ID, NAME, AGE)
VALUES (3, 'Shubh', 22);
INSERT INTO Student (ID, NAME, AGE)
VALUES (4, 'Pratik', 23);
/* Removing duplicate rows from 'Student' table */
DELETE FROM Student WHERE ID NOT IN (
SELECT ID FROM (
SELECT MIN(ID) AS ID FROM Student
GROUP BY NAME, AGE
) AS S
);
/* Display table data */
SELECT * FROM Student;

Explanation

  • Lines 2 to 7: We create a table, Student.
  • Lines 10 to 20: We insert a few values in the Student table.
  • Lines 23 to 28: We find a list of all the row IDs in the Student table using the MIN() function and delete those rows that are not present in this list.
  • Line 31: We display the data present in the Student table.

Free Resources