Deleting Rows From a Table Referenced in the Query

If you do a fair amount of SQL, then every now and then you’ll run into a situation where you need to delete rows from a table while also referencing that same table in the query.

The trouble is that MySQL won’t let you do this in many circumstances. Fortunately, there is a simple workaround.

delete from tableA 
 where id in (
   select b.id 
   from tableB b 
   join tableA a 
     on a.tableb_id = b.id
   );

The above query would throw an error similar to the following:

ERROR 1093 (HY000): You can't specify target table 'tableA' for update in FROM clause

We can skirt this limitation by nesting the subselect inside another select statement. Then it will work just fine.

delete from tableA where id in (
  select bId from (
    select b.id as bId from tableB b join tableA a
    on a.tableb_id = b.id
  ) as apt
);

You’ll get output indicating that the query was successful.

Query OK, 183 rows affected (0.53 sec)

This saved me a bunch of time and kept me from having to rework my query completely, so I am sharing it with you!

Using MySQL Queries to Dump/Format Data to Delimited Files

It is often useful to use MySQL queries to export data from tables and write it out to files. MySQL offers the ability to customize many of the details of these exports, which should cover most situations.

In this example, we are selecting data from a table, applying some formatting, and writing it to a file.

SELECT code, name, REPLACE(REPLACE(REPLACE(comment,"\r\n\r\n",'||'),"\r\n",'||'),"\n",'||') 
INTO OUTFILE '/tmp/20111222_practice_comments.csv' 
FIELDS TERMINATED BY ','  
OPTIONALLY ENCLOSED BY '"' 
ESCAPED BY '\\' 
LINES TERMINATED BY '\n' 
FROM practice_table;
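
One thing to keep in mind is that SELECT ... INTO OUTFILE writes the file on the database server itself, not on the client machine you ran the query from, and the file is owned by the account running mysqld. A quick sanity check from the server’s shell, using the path from the example above:

# The OUTFILE is created on the MySQL server host, owned by the account running mysqld.
ls -l /tmp/20111222_practice_comments.csv
# Peek at the first few exported lines to confirm the delimiters and quoting look right.
head -n 3 /tmp/20111222_practice_comments.csv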

RSYNC a File Through a Remote Firewall

One of my recent tasks was to set up a new automatic backup script, which dumps out the MySQL database on the remote host at a regular time; at a later time it is pulled back to the backup server with rsync, through a remote firewall. I must say that I was a little surprised to discover that the finished script, and the configuration that goes along with it, was actually quite simple and easily repeatable. I was able to replicate the process for three sites very quickly and will easily be able to scale it to many more when necessary.

SSH Tunneling and SSH Keys

In order to perform a process on a remote firewalled host, you first need to set up keys to allow the trusted backup server to gain access to the intermediate host. You must also set up a key which allows the intermediate host to gain access to the firewalled host.

First, let’s generate a public key on the backup server, if we don’t already have one. Be sure to use an empty passphrase, since this will be used by an unattended script.

[backup@lexx log]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/backup/.ssh/id_dsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /backup/.ssh/id_dsa.
Your public key has been saved in /backup/.ssh/id_dsa.pub.
The key fingerprint is:
3d:48:9c:0f:46:dc:da:c3:a6:19:82:63:b1:18:91:62 backup@lexx

By default, the public key will be located in ~/.ssh/id_dsa.pub. Copy the contents of this file to the clipboard; you will need it to get the remote server to trust the backup server.

Log on to the remote external server via ssh. We will configure this server to trust the backup server.

[backup@lexx ~]# ssh user@remotehost.com
user@remotehost.com's password: 
Last login: Thu Jul 14 22:57:58 2011 from 69.73.94.214
[user@remotehost ~]# ls -al .ssh
total 28
drwx------  2 user user 4096 2011-07-14 22:05 .
drwxr-x--- 12 user user 4096 2011-07-14 21:54 ..
-rw-------  1 user user 3024 2011-07-14 21:57 authorized_keys2
-rw-------  1 user user  668 2010-10-27 23:52 id_dsa
-rw-r--r--  1 user user  605 2010-10-27 23:52 id_dsa.pub
-rw-r--r--  1 user user 5169 2010-10-21 13:01 known_hosts

If the authorized_keys2 (or similarly named) file does not yet exist, create it. Open the file in your text editor of choice and paste in the key you copied from the id_dsa.pub file on the backup server.
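
If you would rather skip the copy-and-paste step, you can append the key over ssh in one go. This is a minimal sketch that assumes the key is in the default location on the backup server, that ~/.ssh already exists on the remote host, and that your sshd reads authorized_keys2 (adjust the filename if your server uses authorized_keys):

# Run from the backup server; you will be prompted for the remote password one last time.
cat ~/.ssh/id_dsa.pub | ssh user@remotehost.com 'cat >> ~/.ssh/authorized_keys2 && chmod 600 ~/.ssh/authorized_keys2'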

To make the remote server recognize the newly added key, run the following:

[user@remotehost ~]# ssh-agent sh -c 'ssh-add < /dev/null && bash'

Now we can make sure that the key works as intended by running the following command, which will ssh into the server and execute the uptime command:

[backup@lexx ~]$ ssh user@remotehost.com uptime
 23:57:17 up 47 days,  4:11,  1 user,  load average: 0.54, 0.14, 0.04

Since we got the output of the uptime command without being prompted for a password, the key is set up correctly.

Now we repeat the ssh key process, this time between the remotehost server and the firewalled server.

[user@remotehost ~]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/user/.ssh/id_dsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /user/.ssh/id_dsa.
Your public key has been saved in /user/.ssh/id_dsa.pub.
The key fingerprint is:
3d:48:9c:0f:46:dd:df:c3:a6:19:82:63:b1:18:91:62 user@remotehost

Copy the contents of .ssh/id_dsa.pub from the remote external server to the firewalled server, add it to the authorized_keys file there, and run:

[user@firewalledserver ~]# ssh-agent sh -c 'ssh-add < /dev/null && bash'

Now you should be able to pass the rsync command from the backup server all the way through the remote firewall to the firewalled server.

This can be tested with the following command, which tunnels through the firewall and executes the uptime command on the internal server:

[backup@lexx ~]$ ssh user@remotehost.com ssh user@firewalledserver uptime
 23:52:17 up 41 days,  4:12,  1 user,  load average: 0.50, 0.13, 0.03

RSYNC The Data From the Backup Server, Through The Firewall

Now that we’ve got all of our keys set up, most of the work has been done. I’m assuming you have a cron job on the internal server which dumps the MySQL database at a specific time. You should schedule your rsync command late enough that the cron job has had time to finish dumping the database.
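
For reference, the dump job on the firewalled server can be as simple as the sketch below. The credentials, database name, and schedule are placeholders; the output path just needs to match what the rsync command expects to find.

#!/bin/bash
# dump_db.sh - hypothetical dump script on the firewalled server, run from cron, e.g.:
#   30 5 * * * user /home/user/dump_db.sh
# Adjust the credentials and database name to your own setup.
/usr/bin/mysqldump -u dbuser --password=dbpassword dbname > /home/user/rsync-backup/mysqldump.sql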

Here is the rsync command which goes through the firewall to download the remote MySQL database dump. The -z flag enables compression, which can significantly speed up the transfer.

[backup@lexx ~]$ rsync -avz -e "ssh user@remotehost.com ssh" user@firewalledserver:/home/user/rsync-backup/mysqldump.sql /home/backup/

While the transfer is in progress, rsync creates a hidden temporary file named something like .mysqldump.sql.NvD8D, which stores the data until the sync is complete. Once the sync finishes, you will see a file named mysqldump.sql in the /home/backup/ folder.

Just set up the necessary cron scripts to make sure everything happens at the right time, possibly add some logging so you can see what has happened, and you’re done!

Here’s an example of what I did on the backup server to call the backup script. It appends both STDOUT and STDERR to the /var/log/remote_backuplog file each time it runs. It also runs the script as the backup user, so the files it generates have the correct permissions for the backup user to access.

01 6 * * * backup /home/backup/run_backups.sh >> /var/log/remote_backuplog 2>&1

Here is what my rsync script run_backups.sh looks like.

#!/bin/bash
 
echo "running backups"
# print the date into the logfile
date
 
# backup server 1
echo "backing up server1"
ssh user@externalserver1 ssh user@internalserver1 ls -l /home/user/rsync-backup/mysqldump.sql
/usr/bin/rsync -avz -e "ssh user@externalserver1 ssh" user@internalserver1:/home/user/rsync-backup/mysqldump.sql /home/backup/server1/
 
# backup server 2
echo "backing up server2"
ssh user@externalserver2 ssh user@internalserver2 ls -l /home/user/rsync-backup/mysqldump.sql
/usr/bin/rsync -avz -e "ssh user@externalserver2 ssh" user@internalserver2:/home/user/rsync-backup/mysqldump.sql /home/backup/server2/
 
# backup server 3
echo "backing up server3"
ssh user@externalserver3 ssh user@internalserver3 ls -l /home/user/rsync-backup/mysqldump.sql
/usr/bin/rsync -avz -e "ssh user@externalserver3 ssh" user@internalserver3:/home/user/rsync-backup/mysqldump.sql /home/backup/server3/

How To Execute a Script After a Running Process Completes

Most people who are familiar with Linux realize that there are ways of chaining processes to run one after another. Typically this is done by writing a script, or by using && to daisy-chain additional commands on the command line.

There is, however, another way to do this if you’ve already issued a command and want to add another command after the original has already started. This is especially useful if you’re unzipping, say, a 15 gigabyte database dump and you want to make sure that the import happens immediately after the decompression is complete.

Here’s an example of what would happen if I were entering the commands manually.

Macintosh:~ chriscase$ scp user@hostname.com:archive/database.sql.gz .
Macintosh:~ chriscase$ gunzip database.sql.gz
Macintosh:~ chriscase$ mysql -u dbusername -pdbpassword dbname < database.sql

Since I’m not going to stay glued to the console for the entire duration of this process, I either need to write a script or figure out another technique to keep things moving along.

As it turns out, there is a very simple way to accomplish this with the wait command, which waits until a specified background job has finished before the commands chained after it are executed.

Here’s an example of how this could be used if you wanted to add the last two commands after the scp from the above example had already begun.

Macintosh:~ chriscase$ scp user@hostname.com:archive/database.sql.gz .

Once the download is kicked off, you can move it into the background by pressing [ctrl-z], which pauses the process, and then issuing the command [bg], which resumes the paused process in the background. Now, to chain the other commands afterward, you can do the following.

Macintosh:~ chriscase$ wait %1 && gunzip database.sql.gz && mysql -u dbusername -pdbpassword dbname < database.sql

The above command will wait until the scp is done, then use gunzip to decompress the file and mysql to import the database dump. Now that you’ve done this, you can go off and do something else, confident that your database will be done importing in a few hours.
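
One thing to note: the %1 in the wait command refers to job number 1 in your current shell. If you have other background jobs running, check which number the scp was given with the jobs builtin; the output will look something like this:

Macintosh:~ chriscase$ jobs
[1]+  Running                 scp user@hostname.com:archive/database.sql.gz . &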

Efficiently Loading Paginated Results From MySQL with Just One Query Per Page

There are many situations in which pagination of query results is very useful, especially for performance optimization. In most of these situations, paginating the results requires you to determine the total number of results, so the application knows how many pages are available.

The most common way to do this is to use two queries: one which obtains the count of results and one which obtains a particular page of results. However, this method has the potential to be even less optimal than loading the entire set of results, because two queries are now necessary whereas before there was only one.

In most database systems it is possible to overcome this limitation, though the technique is specific to the particular database you are using. This example explains how to do it in MySQL.

Here is the sub-optimal example using two queries; it loads the first page of results. The LIMIT 0, 40 means it will start at position 0 (the beginning) and return a set of 40 results.

SELECT count(*) FROM my_table WHERE timestamp >= '2010-03-15' and timestamp <= '2010-08-01';
 
SELECT id, timestamp, field1, field2, field3 FROM my_table WHERE timestamp >= '2010-03-15' and timestamp <= '2010-08-01' ORDER BY id LIMIT 0, 40;

Here is a more optimal example, which uses two statements, only one of which is a real query. Everything is done during the first statement; the second statement merely retrieves the count, which was calculated during the first.

SELECT SQL_CALC_FOUND_ROWS id, timestamp, field1, field2, field3 FROM my_table WHERE timestamp >= '2010-03-15' and timestamp <= '2010-08-01' ORDER BY id LIMIT 0, 40;
 
SELECT found_rows() AS cnt;
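
From there, the application just turns the count into a number of pages; with a page size of 40, that is the ceiling of cnt divided by 40. A trivial (and purely hypothetical) shell sketch of the arithmetic:

# Hypothetical helper: number of pages for a given row count and page size,
# using integer ceiling division.
pages() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
pages 187 40   # prints 5: 187 rows at 40 per page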

One of the drawbacks of SQL_CALC_FOUND_ROWS, or count(*) in general, is that by running these calculations you lose some of the benefit of pagination. This is because your database is required to examine all of the affected rows in order to generate an accurate count.

Depending on the specifics of your my.cnf configuration file, the first statement may cache part of the information, causing subsequent pages to load faster. In some of my own testing I have seen a significant speedup after the first page is loaded.

If you want to get the most out of your application, you will likely need to do a combination of application tuning, query tuning, and database tuning. Generally you will want to start by tuning the application itself; this way you’re eliminating any bottlenecks inherent in the application. Then you’ll need to do some query optimization if the application still isn’t fast enough. Finally, you’ll want to look into what configuration changes you can make on the database in order to speed things up at the source.

Bash Shell Script to Convert All MySQL Tables to use the Innodb Engine

The following, when run in the Bash shell, will convert the engine for all of your MySQL tables to InnoDB.

mysql -B -N -e "SHOW TABLES" -u <username> --password=<password> <databasename> | while read table; do
    echo "+ Converting Table $table"
    mysql -B -N -e "alter table $table engine=innodb" -u <username> --password=<password> <databasename>
done
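
Before running the conversion (or to verify it afterward), it can help to see which tables are still on MyISAM. A minimal sketch, reusing the same placeholder credentials and the same -B -N batch flags as above:

# List the tables in <databasename> that are still using the MyISAM engine.
mysql -B -N -u <username> --password=<password> \
  -e "SELECT table_name FROM information_schema.tables WHERE table_schema='<databasename>' AND engine='MyISAM'"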

Often, if you have a heavily used database, you will want to consider the InnoDB engine. By default, the MySQL storage engine is MyISAM. However, the InnoDB engine has many advantages, particularly for high-utilization environments. With InnoDB, data integrity is maintained throughout the entire query process because it is transaction safe.

InnoDB also provides row-level locking, as opposed to table-level locking. With row-level locking, while one query is busy updating or inserting a row, another query can update a different row at the same time. Hence InnoDB has superior features for multi-user concurrency and performance.

Locating Duplicate Entries In A MySQL Table

If you don’t have unique indexes on your table, it is likely that you will occasionally have entries that are duplicated. This can happen because of a software bug or possibly a user error. Some applications even choose not to have unique indexes for performance reasons, though this comes at the cost of data integrity.

The best way I know of to demonstrate how to locate duplicate entries is to use an example. So let’s say you have the following table:

mysql> desc result;
+-------------------------+--------------+------+-----+---------+----------------+
| Field                   | Type         | Null | Key | Default | Extra          |
+-------------------------+--------------+------+-----+---------+----------------+
| id                      | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| comment                 | varchar(255) | YES  |     | NULL    |                |
| key1_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| str1                    | varchar(255) | YES  | MUL | NULL    |                |
| result                  | float        | YES  |     | NULL    |                |
+-------------------------+--------------+------+-----+---------+----------------+

In this example, let’s say you want the following set of fields to have a unique combination of values. In other words, you want these fields to act as a sort of composite key, so that no two records in the database have the same values for all three of these fields.

| key1_id             | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id             | bigint(20)   | YES  | MUL | NULL    |                |
| str1                | varchar(255) | YES  | MUL | NULL    |                |

Let’s say that you are concerned about having duplicate values in your data. You need an effective way of querying the database in order to find the duplicate records. The following query will locate duplicates of the above field set.

mysql> select id, str1, key1_id, key2_id, count(*)
from result group by str1,key1_id,key2_id
having count(*) > 1;

In the example, you group by the list of fields that you want to use as your composite key and select the fields you want to see in your output. The output of the above query looked like the following:

+-------+----------+--------------+-------------+----------+
| id    | str1     | key1_id      | key2_id     | count(*) |
+-------+----------+--------------+-------------+----------+
| 17293 | EDDP     |         1951 |        1923 |        4 |
| 17302 | GRAV     |         1951 |        1923 |        4 |
| 17294 | MTDN     |         1951 |        1923 |        4 |
| 17301 | OXID     |         1951 |        1923 |        4 |
| 17303 | PH       |         1951 |        1923 |        4 |
| 17288 | XTC      |         1951 |        1923 |        4 |
+-------+----------+--------------+-------------+----------+
6 rows in set (0.52 sec)

What we have in this example is four copies of each of these records, which were supposed to be unique.
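
Note that the query above returns one summary row per duplicated group. If you want to see every individual row that belongs to a duplicated group, you can join the grouped result back against the table. A minimal sketch, run from the shell with placeholder credentials (the <=> null-safe comparison is used because these columns are nullable):

mysql -u <username> --password=<password> <databasename> -e "
SELECT r.id, r.str1, r.key1_id, r.key2_id
FROM result r
JOIN (
  SELECT str1, key1_id, key2_id
  FROM result
  GROUP BY str1, key1_id, key2_id
  HAVING COUNT(*) > 1
) dup ON r.str1 <=> dup.str1 AND r.key1_id <=> dup.key1_id AND r.key2_id <=> dup.key2_id
ORDER BY r.str1, r.key1_id, r.key2_id;"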

Eliminating Duplicate Records From MySQL Tables

Anyone who works with database-driven development to any extent will occasionally run into a situation where duplicate information is added to their database. I have personally run into such a problem on many occasions since I started working with database-driven software.

Being able to quickly undo data corruption is extremely important in production databases. As much as we would like to have a perfect production environment, it is often very difficult to dedicate the time and resources necessary to avoid all mistakes.

The following example is a great way to remove duplicates based upon a set of fields. Basically, you create a unique index on a list of fields (a composite key) of your choosing.

Let’s say you have the following table:

mysql> desc result;
+-------------------------+--------------+------+-----+---------+----------------+
| Field                   | Type         | Null | Key | Default | Extra          |
+-------------------------+--------------+------+-----+---------+----------------+
| id                      | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| comment                 | varchar(255) | YES  |     | NULL    |                |
| key1_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| str1                    | varchar(255) | YES  | MUL | NULL    |                |
| result                  | float        | YES  |     | NULL    |                |
+-------------------------+--------------+------+-----+---------+----------------+

In this example, the fields which you do not want duplicated are as follows:

| key1_id              | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id              | bigint(20)   | YES  | MUL | NULL    |                |
| str1                 | varchar(255) | YES  | MUL | NULL    |                |

To remove duplicates on those three fields, so that only one record remains in the database for any given combination of values of those three fields, you would do the following:

alter ignore table result add unique index unique_result (key1_id,key2_id,str1);

I ran this on one of my databases and got the following:

Query OK, 12687876 rows affected (21 min 13.74 sec)
Records: 12687876  Duplicates: 1688  Warnings: 0

In this example, we removed 1688 duplicate records based upon the three specified fields.
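
To confirm that the cleanup worked, you can re-run the duplicate-locating query from the previous section; after the ALTER IGNORE it should return an empty set. A quick way to do that from the shell, with placeholder credentials:

# Re-run the duplicate-finding query; after the ALTER IGNORE it should return no rows.
mysql -u <username> --password=<password> <databasename> -e "
SELECT id, str1, key1_id, key2_id, COUNT(*)
FROM result
GROUP BY str1, key1_id, key2_id
HAVING COUNT(*) > 1;"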