Locating Duplicate Entries In A MySQL Table


If you don’t have unique indexes on your table, it is likely that you will occasionally have entries that are duplicated. This can often happen because of a software bug or possibly a user error. Some applications even choose not to have unique indexes for performance reasons; though this happens at the cost of data integrity.

The best I know of, to demonstrate how to locate the duplicate entries, is to use an example. So let’s say you have the following table:

mysql> desc result;
+-------------------------+--------------+------+-----+---------+----------------+
| Field                   | Type         | Null | Key | Default | Extra          |
+-------------------------+--------------+------+-----+---------+----------------+
| id                      | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| comment                 | varchar(255) | YES  |     | NULL    |                |
| key1_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| str1                    | varchar(255) | YES  | MUL | NULL    |                |
| result                  | float        | YES  |     | NULL    |                |
+-------------------------+--------------+------+-----+---------+----------------+

In this example, let’s say you want the following field set to have a unique combination of values. In other words, you want these fields to be a sort of composite key; so that there are no two records in the database which have the same values for these three fields.

| key1_id             | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id             | bigint(20)   | YES  | MUL | NULL    |                |
| str1                | varchar(255) | YES  | MUL | NULL    |                |

Well, let’s say that you are concerned about having duplicate values in your data. You need to have an effective way of querying the database in order to find the duplicate records. The following query would locate duplicates of the above field-set.

mysql> select id, str1, key1_id, key2_id, count(*)
from test_result group by str1,key1_id,key2_id
having count(*) > 1;

In the example you group by the list of fields that you want to use as your composite key and select the fields you want to see in your output. The above output looked like the following:

+-------+----------+--------------+-------------+----------+
| id    | str1     | key1_id      | key2_id | count(*) |
+-------+----------+--------------+-------------+----------+
| 17293 | EDDP     |         1951 |        1923 |        4 |
| 17302 | GRAV     |         1951 |        1923 |        4 |
| 17294 | MTDN     |         1951 |        1923 |        4 |
| 17301 | OXID     |         1951 |        1923 |        4 |
| 17303 | PH       |         1951 |        1923 |        4 |
| 17288 | XTC      |         1951 |        1923 |        4 |
+-------+----------+--------------+-------------+----------+
6 rows in set (0.52 sec)

What we have in this example is four of each of these records which are supposed to be unique.


Leave a Reply

Your email address will not be published. Required fields are marked *