How to Find Duplicate Records in a MySQL Table
-
Use the
SELECT DISTINCT
Statement to Check for Duplicates in a MySQL Table -
Use the
SELECT COUNT
Statement to Find the Number of Occurrence of Duplicates in a MySQL Table -
Use the
INNER JOIN
Statement to View Duplicate Records in a MySQL Table
This tutorial explores how to check for duplicates in a MySQL table and retrieve the number of duplicates with the SELECT DISTINCT
and SELECT COUNT
statements, respectively.
Use the SELECT DISTINCT
Statement to Check for Duplicates in a MySQL Table
The impact of having duplicate records in a database table can vary from a minor inconvenience to disaster. Luckily, MySQL has a few nifty keywords that can combine to scan a table for duplicates.
Also, we can count the number of occurrences of duplicate records and delete them where necessary.
Let us create a test_company
database with a sample table called employee_details
.
CREATE DATABASE test_company;
USE test_company;
CREATE TABLE employees_details(
id INT AUTO_INCREMENT,
name VARCHAR (255) NOT NULL,
title VARCHAR(255) NOT NULL,
salary INT,
PRIMARY KEY (id)
);
The table is populated with values, including duplicates.
INSERT INTO employees_details (name, title, salary) Values
('James Maddison','Computer Engineer',80000),
('Matthew Defoe','Software Architect',150000),
('Daniel Jameson','Software Engineer II', 95000),
('Jules Reddington','Senior Software Engineer',120000),
('Carlos Rodriguez','Data Engineer',100000),
('Matthew Defoe','Software Architect',150000),
('Daniel Jameson','Software Engineer II', 95000),
('Jules Reddington','Senior Software Engineer',120000);
SELECT * FROM employees_details;
Output:
+----+------------------+--------------------------+--------+
| id | name | title | salary |
+----+------------------+--------------------------+--------+
| 1 | James Maddison | Computer Engineer | 80000 |
| 2 | Matthew Defoe | Software Architect | 150000 |
| 3 | Daniel Jameson | Software Engineer II | 95000 |
| 4 | Jules Reddington | Senior Software Engineer | 120000 |
| 5 | Carlos Rodriguez | Data Engineer | 100000 |
| 6 | Matthew Defoe | Software Architect | 150000 |
| 7 | Daniel Jameson | Software Engineer II | 95000 |
| 8 | Jules Reddington | Senior Software Engineer | 120000 |
+----+------------------+--------------------------+--------+
8 rows in set (0.00 sec)
There are three duplicates in this table, and this is easy to spot because of the small size of the table. We use the SELECT DISTINCT
statement for larger tables to retrieve unique records from the table.
According to the official documentation, the SELECT DISTINCT
statement retrieves only one instance of an output record, even if it occurs multiple times.
Hence, we can be sure there are duplicates when the number of records returned by the SELECT DISTINCT
statement is less than the total records in the table.
-- Retrieving only distinct records.
SELECT DISTINCT name,title,salary FROM employees_details;
Output:
+------------------+--------------------------+--------+
| name | title | salary |
+------------------+--------------------------+--------+
| James Maddison | Computer Engineer | 80000 |
| Matthew Defoe | Software Architect | 150000 |
| Daniel Jameson | Software Engineer II | 95000 |
| Jules Reddington | Senior Software Engineer | 120000 |
| Carlos Rodriguez | Data Engineer | 100000 |
+------------------+--------------------------+--------+
5 rows in set (0.00 sec)
Notice that the id
column is not included in the query because MySQL identifies the id
column as unique records.
Hence, including the id
column in the query will return all records (including the duplicates) as unique.
From the result-set, we can deduce that there are three (3) duplicate records since the query returns five (5) distinct records and there are eight (8) records in the table.
Use the SELECT COUNT
Statement to Find the Number of Occurrence of Duplicates in a MySQL Table
Now, after detecting the existence of duplicates, we can use the SELECT COUNT
statement to find the number of occurrences of the duplicates.
SELECT name AS 'employee name', COUNT(*) AS Occurrence FROM employees_details
GROUP BY name
HAVING Occurrence > 1;
Output:
+------------------+------------+
| employee name | Occurrence |
+------------------+------------+
| Daniel Jameson | 2 |
| Jules Reddington | 2 |
| Matthew Defoe | 2 |
+------------------+------------+
3 rows in set (0.001 sec)
This retrieves the duplicated records and the number of duplicates per record in the table. As expected, there are three (3) duplicated records.
The use of the SELECT COUNT
statement is further discussed via this official reference.
Use the INNER JOIN
Statement to View Duplicate Records in a MySQL Table
We can use an INNER JOIN
query between the target table and the SELECT DISTINCT
query to view the duplicate records alongside the main record.
SELECT I.id, O.name, O.title, O.salary
FROM employees_details AS I
INNER JOIN(SELECT DISTINCT name, title, salary FROM employees_details) AS O
ON I.name = O.name;
Output:
+----+------------------+--------------------------+--------+
| id | name | title | salary |
+----+------------------+--------------------------+--------+
| 1 | James Maddison | Computer Engineer | 80000 |
| 2 | Matthew Defoe | Software Architect | 150000 |
| 6 | Matthew Defoe | Software Architect | 150000 |
| 3 | Daniel Jameson | Software Engineer II | 95000 |
| 7 | Daniel Jameson | Software Engineer II | 95000 |
| 4 | Jules Reddington | Senior Software Engineer | 120000 |
| 8 | Jules Reddington | Senior Software Engineer | 120000 |
| 5 | Carlos Rodriguez | Data Engineer | 100000 |
+----+------------------+--------------------------+--------+
8 rows in set (0.001 sec)
Victor is an experienced Python Developer, Machine Learning Engineer and Technical Writer with interests across various fields of science and engineering. He is passionate about learning new technologies and skill and working on challenging problems. He enjoys teaching, intellectual discourse, and gaming, among other things.
LinkedIn GitHub