Locating Duplicate Entries In A MySQL Table

If you don’t have unique index­es on your table, it is like­ly that you will occa­sion­al­ly have entries that are dupli­cat­ed. This can often hap­pen because of a soft­ware bug or pos­si­bly a user error. Some appli­ca­tions even choose not to have unique index­es for per­for­mance rea­sons; though this hap­pens at the cost of data integri­ty.

The best I know of, to demon­strate how to locate the dupli­cate entries, is to use an exam­ple. So let’s say you have the fol­low­ing table:

mysql> desc result;
+-------------------------+--------------+------+-----+---------+----------------+
| Field                   | Type         | Null | Key | Default | Extra          |
+-------------------------+--------------+------+-----+---------+----------------+
| id                      | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| comment                 | varchar(255) | YES  |     | NULL    |                |
| key1_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| str1                    | varchar(255) | YES  | MUL | NULL    |                |
| result                  | float        | YES  |     | NULL    |                |
+-------------------------+--------------+------+-----+---------+----------------+

In this exam­ple, let’s say you want the fol­low­ing field set to have a unique com­bi­na­tion of val­ues. In oth­er words, you want the­se fields to be a sort of com­pos­ite key; so that there are no two records in the data­base which have the same val­ues for the­se three fields.

| key1_id             | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id             | bigint(20)   | YES  | MUL | NULL    |                |
| str1                | varchar(255) | YES  | MUL | NULL    |                |

Well, let’s say that you are con­cerned about hav­ing dupli­cate val­ues in your data. You need to have an effec­tive way of query­ing the data­base in order to find the dupli­cate records. The fol­low­ing query would locate dupli­cates of the above field-set.

mysql> select id, str1, key1_id, key2_id, count(*)
from test_result group by str1,key1_id,key2_id
having count(*) > 1;

In the exam­ple you group by the list of fields that you want to use as your com­pos­ite key and select the fields you want to see in your out­put. The above out­put looked like the fol­low­ing:

+-------+----------+--------------+-------------+----------+
| id    | str1     | key1_id      | key2_id | count(*) |
+-------+----------+--------------+-------------+----------+
| 17293 | EDDP     |         1951 |        1923 |        4 |
| 17302 | GRAV     |         1951 |        1923 |        4 |
| 17294 | MTDN     |         1951 |        1923 |        4 |
| 17301 | OXID     |         1951 |        1923 |        4 |
| 17303 | PH       |         1951 |        1923 |        4 |
| 17288 | XTC      |         1951 |        1923 |        4 |
+-------+----------+--------------+-------------+----------+
6 rows in set (0.52 sec)

What we have in this exam­ple is four of each of the­se records which are sup­posed to be unique.

Leave a Reply

Your email address will not be published. Required fields are marked *