Efficiently Loading Paginated Results From MySQL with Just One Query Per Page

There are many situations in which pagination of query results is very useful, especially for performance optimization. In most of these situations, paginating results requires you to determine the total number of results, so the application knows how many pages are available.

The most common way to do this is to use two queries: one that obtains the count of results and one that obtains a particular page of results. However, this method has the potential to be even less optimal than loading the entire set of results, because two queries are now necessary where before there was only one.

In most database systems it is possible to overcome this limitation, though the technique is specific to the particular database you are using. This example explains how to do it in MySQL.

Here is the sub-optimal example, using two queries to load the first page of results. LIMIT 0, 40 means it will start at position 0 (the beginning) and return a set of 40 results.

SELECT count(*) FROM my_table WHERE timestamp >= '2010-03-15' and timestamp <= '2010-08-01';
 
SELECT id, timestamp, field1, field2, field3 FROM my_table WHERE timestamp >= '2010-03-15' and timestamp <= '2010-08-01' ORDER BY id LIMIT 0, 40;

Here is a more optimal example, which uses two statements, only one of which is a real query. Everything is done during the first statement; the second statement merely loads the count, which was calculated during the first statement.

SELECT SQL_CALC_FOUND_ROWS id, timestamp, field1, field2, field3 FROM my_table WHERE timestamp >= '2010-03-15' and timestamp <= '2010-08-01' ORDER BY id LIMIT 0, 40;
 
SELECT found_rows() AS cnt;
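In application code, the only catch is that both statements must run on the same connection, because FOUND_ROWS() reports the count calculated by the previous query on that connection. Below is a minimal JDBC sketch of the idea; the connection URL, credentials, and page size are placeholders rather than values from a real system, and it assumes the MySQL JDBC driver is on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PaginationExample {

	public static void main(String[] args) throws Exception {
		// load the MySQL driver (only needed on older JDBC versions)
		Class.forName("com.mysql.jdbc.Driver");

		// placeholder connection details -- substitute your own
		Connection conn = DriverManager.getConnection(
				"jdbc:mysql://localhost/mydb", "user", "password");
		Statement stmt = conn.createStatement();

		// first statement: fetch one page and have MySQL remember the full count
		ResultSet page = stmt.executeQuery(
				"SELECT SQL_CALC_FOUND_ROWS id, timestamp, field1, field2, field3 "
				+ "FROM my_table WHERE timestamp >= '2010-03-15' AND timestamp <= '2010-08-01' "
				+ "ORDER BY id LIMIT 0, 40");
		while (page.next()) {
			// render the current page of results here
		}

		// second statement: read back the count calculated above, on the same connection
		ResultSet count = stmt.executeQuery("SELECT found_rows() AS cnt");
		if (count.next()) {
			long totalRows = count.getLong("cnt");
			long totalPages = (totalRows + 39) / 40;   // 40 results per page
			System.out.println(totalRows + " rows, " + totalPages + " pages");
		}

		conn.close();
	}
}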

One of the drawbacks of SQL_CALC_FOUND_ROWS, or count(*) in general, is that by running these calculations you lose some of the benefit of pagination: the database must examine all of the affected data in order to generate an accurate count.

Depending on the specifics of your my.cnf configuration file, the first statement may cache part of the information, causing subsequent pages to load faster. In some of my own testing I have seen a significant speedup after the first page is loaded.
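For reference, these are the kinds of my.cnf settings that influence how much of that work is cached between page loads. The values below are purely illustrative placeholders; which setting matters most depends on your storage engine and workload.

[mysqld]
# illustrative values only -- tune to your own hardware and workload
key_buffer_size         = 256M   # MyISAM index cache
innodb_buffer_pool_size = 1G     # InnoDB data and index cache
query_cache_type        = 1      # enable the query cache
query_cache_size        = 64M    # memory reserved for cached result sets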

If you want to get the most out of your application, you will likely need to do a combination of application tuning, query tuning and database tuning. Generally you will want to start by tuning the application itself; this way you're eliminating any bottlenecks inherent in the application. Then you'll need to do some query optimizations if the application still isn't fast enough. Finally, you'll want to look into what configuration changes you can make on the database in order to speed things up from the source.

Looking Towards Calmer Seas Ahead

For the first time in a long time, as far as software work is concerned at least, I've felt more and more like the Maytag Repairman. After many weeks up to my ears in customer issues, these hectic times may be drawing to a close, with more peaceful waters ahead.

I remember when one of the Computer Science professors at UAHuntsville once told me that software really is a machine, though most people don't think of it as such. There really are moving parts, and though they are microscopic, their design makes a huge difference when you start to use something heavily in a business-critical environment. These are the lessons I have been learning first-hand over the past couple of years.

I started out at the job I am working now as a novice who knew a little about everything, working on a software project that, while generally well designed, had some scalability and stability issues in certain areas, as well as some user experience issues that needed addressing.

Now I've been through some major storms, in a manner of speaking, and I feel like I can really do something significant. I've seen pesky issues arise as our customers grow, sometimes spending weeks tracking them down and fixing them.

In the process of moving through these issues, I've learned a great deal about tuning up MySQL and HQL (Hibernate Query Language), as well as tuning the application itself. I've learned that the most important place to start optimizing is the application itself, because otherwise you are just wasting your available resources.

I've gotten into this routine where I hunger for more wisdom and hands-on experience. Often, when I get a free evening, I'll go to the software section of the bookstore, drink coffee and absorb as much information as I can. I really love keeping up with the latest innovations. When the opportunity arises, I like to employ what I learn in the projects I'm working on.

Writing Complex Web Apps With Google Web Toolkit (GWT)

The Google Web Toolkit (GWT) is a relatively new set of open source tools, developed by Google, which aims to allow developers to write much of their client-side code in Java. This Java code is then compiled into the appropriate JavaScript to run in the user's web browser. Basically, the Google team has come up with a way of allowing developers to write most of their web application in Java, instead of having to switch between Java and JavaScript, thus minimizing the amount of cross-browser JavaScript development and testing.
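As a rough illustration of what that client-side Java looks like, here is a minimal GWT entry point; the class name and widgets are made up for the example, but the compiler turns code like this into cross-browser JavaScript.

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.event.dom.client.ClickEvent;
import com.google.gwt.event.dom.client.ClickHandler;
import com.google.gwt.user.client.Window;
import com.google.gwt.user.client.ui.Button;
import com.google.gwt.user.client.ui.RootPanel;

// hypothetical entry point class, declared in the module's XML descriptor
public class HelloGwt implements EntryPoint {

	public void onModuleLoad() {
		Button button = new Button("Say hello");
		button.addClickHandler(new ClickHandler() {
			public void onClick(ClickEvent event) {
				// plain Java here; the GWT compiler emits the equivalent JavaScript
				Window.alert("Hello from GWT");
			}
		});
		RootPanel.get().add(button);
	}
}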

The developers of GWT have chosen to focus their efforts on Eclipse as the preferred IDE, though you are not limited to Eclipse. One of the great benefits of GWT is that you can now step through most of your application in the Eclipse debugger. This makes developing the client-side aspects of your app much easier and more stable than having to rely on JavaScript debugging tools like Firebug.

Below is a Google Tech Talk from Google developer Bruce Johnson, in which he explains GWT in great detail. The video is a couple of years old, but it is still a good intro to GWT.

Google Tech Talk, June 24, 2008
Eclipse Day at the Googleplex: GWT in Eclipse
Speaker: Bruce Johnson, Google

Building high-performance Ajax with Google Web Toolkit (GWT) in Eclipse has always been possible, but soon it will be downright easy. Bruce will present GWT's upcoming Eclipse plugin that helps novices get started and lets experts fly.

Sending Mail in Shell Scripts via an External Server with Nail

If you've ever tried sending email via the command line using the mail utility, you may find that the method can be unreliable in some cases. The email messages are often caught by spam filters, blocked by security programs, etc. A more elegant and stable alternative is to use your existing email server to send the message. Using the program nail makes this an easy task to do via the command line.

The following example shows you how to send a simple message with an attachment. Here is the syntax for sending a message with nail.

echo "" | nail -s "" -a   ...

In order for nail to function, you must have a .mailrc configuration file in your home directory. Here is a sample .mailrc to get you started quickly.

set smtp=smtp://yourhost.com
set from="yourname@yourhost.com (Display Name)"
set smtp-auth=login
set smtp-auth-user=your_username
set smtp-auth-password=your_password
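With that in place, a typical invocation from a script might look like the following; the file name and recipient address are just placeholders.

echo "Daily report attached." | nail -s "Daily report for $(hostname)" -a /tmp/report.csv admin@yourhost.com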

Streaming Data as Downloadable Files in Struts Servlet

One way to stream data to the client is to use the PrintWriter obtained from the HttpServletResponse, a class that lets you write directly to the output stream sent to the client. One of the benefits of streaming output with a PrintWriter is the ability to send data as it is generated, instead of having to buffer all of the data on the server and only send it to the client after the entire set is generated.

For convenience, and especially for large files, it is important to modify the HTTP headers in HttpServletResponse, instructing the client's browser to save the file to disk.

The following is a minimal example, which shows how to dump a dummy CSV text file as a downloadable file in a Struts Action.

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class CsvDataDumpAction extends Action {

	public ActionForward execute
	(ActionMapping mapping, ActionForm form, HttpServletRequest request, HttpServletResponse response)
	{
		// modify the HTTP response headers, so the browser saves the output as DataDump.txt
		response.setHeader("Content-disposition", "attachment; filename=DataDump.txt");
		response.setContentType("application/octet-stream");

		// catch the IOException that obtaining the PrintWriter may generate
		try {
			// obtain the PrintWriter that writes directly to the response's output stream
			PrintWriter pw = response.getWriter();

			// sample header with four fields
			String header = "Field1,Field2,Field3,Field4";
			pw.println(header);
			// flush the buffer, sending the header line to the client
			pw.flush();

			// generate 1000 lines of dummy test data, with the field name followed by the number of the row
			for(int i = 0; i < 1000; i++) {
				pw.println("Field1_"+i+",Field2_"+i+",Field3_"+i+",Field4_"+i);
				// flush the buffer after each line is generated,
				// sending the data to the client as it is generated
				pw.flush();
			}

			// show stack traces for the PrintWriter in the logs
		} catch (IOException e) { 	e.printStackTrace(); 	}

		// returning null tells Struts that the response has already been written
		return null;
	}
}
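For the action to be reachable, it also needs a mapping in struts-config.xml; a minimal entry might look like the following, where the path and package name are only examples.

<action-mappings>
  <!-- hypothetical mapping for the CSV dump action -->
  <action path="/csvDataDump"
          type="com.example.actions.CsvDataDumpAction"/>
</action-mappings>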

Bash Shell Script to Convert All MySQL Tables to use the Innodb Engine

The following, when run in the Bash shell, will convert the engine for all of your MySQL tables to InnoDB.

mysql -B -N -e "SHOW TABLES" -u  <username> --password=<password> <databasename> | while read table; \
do \
     echo "+ Converting Table $table"; \
     mysql -B -N -e "alter table $table engine=innodb" -u <username> --password=<password> <databasename>; \ 
done;

Often, if you have a heavily used database, you will want to consider the InnoDB engine. By default, the MySQL database engine is MyISAM. However, the InnoDB engine has many advantages, particularly for high-utilization environments. With InnoDB, data integrity is maintained throughout the entire query process because it is transaction safe.

InnoDB also provides row-locking, as opposed to table-locking. With row-locking, while one query is busy updating or inserting a row, another query can update a different row at the same time. Hence InnoDB has superior features for multi-user concurrency and performance.
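To confirm the conversion worked, you can check each table's engine through information_schema (available since MySQL 5.0); substitute your own database name.

SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = '<databasename>';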

Copying Yesterday’s Exceptions with Stack Traces from Logs, Then Emailing To Administrators

When you have a Java application server which generates a great deal of logs, it can be tricky to find the most important information, especially if you have detailed logging. Fortunately, grep is capable of doing this very well.

The following command will gather all WARN, ERROR, FATAL, and Exception stack traces. This command can be very useful for Java log monitoring scripts.

cat /jboss-4.0.2/server/default/log/server.log | grep "WARN\|ERROR\|FATAL\|Exception\|at.*\.java\:.*"

Understanding this expression

In this expression, '\|' is used as an OR operator to look for different patterns. The 'WARN', 'ERROR', and 'FATAL' patterns match the first line of a log event that may contain an exception logged at the WARN, ERROR, or FATAL level. The 'Exception' pattern then matches the first line of the stack trace, which usually has the exception name followed by the exception message, e.g. 'java.lang.NullPointerException'.

After this you have the stack trace elements, which start with 'at' and end with the pattern '(FileName.java:lineNo)', e.g. at java.lang.Thread.run(Thread.java:595). These stack trace elements are matched with 'at.*\.java\:.*'. All these patterns OR'ed together can pull the complete stack trace out of the log at the WARN, ERROR, and FATAL log levels. Some false positives may also be picked up by this command if a log message happens to contain words like WARN, ERROR, FATAL, or Exception.

Source: Computer Technology Roller

Filtering by Date: Yesterday’s Logs

If you want to filter the logfiles after a certain date, the following command is very useful. It gets the date for yesterday, using the date format yyyy-mm-dd, then it uses sed to print all of the lines after the specified date. This is a good command to run after midnight, to retrieve the previous day's logs.

cat /jboss-4.0.2/server/default/log/server.log | sed "1,/$(date --date='yesterday' '+%Y-%m-%d')/d"

Putting it All Together

Daily Log Monitor Script to Email Error Stack Traces to the Administrator

Here is a complete monitoring script I wrote, which emails me all of the previous day's error stack traces. I have it running in cron.daily in order to regularly send me the JBoss error stack traces.

#!/bin/bash

# email addresses to send the message to
email="address@host.com"

# gather yesterday's errors and stack traces from the JBoss log
errors=$(cat /jboss-4.0.2/server/default/log/server.log | sed "1,/$(date --date='yesterday' '+%Y-%m-%d')/d" | grep "WARN\|ERROR\|FATAL\|Exception\|at.*\.java\:.*")

subject="JBOSS DAILY ERROR DIGEST FOR: $(hostname)"

echo "$errors" | /bin/mail -s "$subject" "$email"

Monitoring Process Counts and Alerting Via Email

Below is a simple script called monitor_jboss, which checks whether JBoss is running and whether too many instances are currently running. I found a need to write this script because we have some cron scripts which automatically restart JBoss each day, and the JBoss shutdown script itself sometimes fails to properly shut down, causing some quirky behavior.

If it determines that one of the following conditions is true, it sends a short email describing the problem to the address specified in the email variable.

  • JBoss is not running at all
  • JBoss has more than the maximum allowed number of instances running

This script is then placed in /etc/cron.hourly/, where it will check the system once an hour and send an email as appropriate.

#!/bin/bash

# email addresses to send the message to
email="address@host.com"

# maximum number of concurrently running instances allowed
max=1

# determine the number of running instances
count_running_jbosses=$(ps aux | grep jboss | grep -v grep | grep -v monitor_jboss  | wc -l)

if [ "$count_running_jbosses" -eq 0 ]            # jboss isn't running
then
        message="JBoss Is Currently Not Running"
fi
if [ "$count_running_jbosses" -gt "$max" ]       # too many jboss instances running
then
        message="JBoss Is Currently Running $count_running_jbosses instances; the maximum is $max"
fi

subject="JBOSS MONITORING ALERT FOR: $(hostname)"

# only send the alert if one of the checks above set a message
if [ -n "$message" ]
then
        echo "$message" | /bin/mail -s "$subject" "$email"
fi

Locating Duplicate Entries In A MySQL Table

If you don't have unique indexes on your table, it is likely that you will occasionally have entries that are duplicated. This can often happen because of a software bug or possibly a user error. Some applications even choose not to have unique indexes for performance reasons, though this comes at the cost of data integrity.

The best way I know of to demonstrate how to locate duplicate entries is to use an example. So let's say you have the following table:

mysql> desc result;
+-------------------------+--------------+------+-----+---------+----------------+
| Field                   | Type         | Null | Key | Default | Extra          |
+-------------------------+--------------+------+-----+---------+----------------+
| id                      | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| comment                 | varchar(255) | YES  |     | NULL    |                |
| key1_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| str1                    | varchar(255) | YES  | MUL | NULL    |                |
| result                  | float        | YES  |     | NULL    |                |
+-------------------------+--------------+------+-----+---------+----------------+

In this example, let's say you want the following set of fields to have a unique combination of values. In other words, you want these fields to act as a sort of composite key, so that no two records in the database have the same values for all three of these fields.

| key1_id             | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id             | bigint(20)   | YES  | MUL | NULL    |                |
| str1                | varchar(255) | YES  | MUL | NULL    |                |

Well, let's say that you are concerned about having duplicate values in your data. You need an effective way of querying the database in order to find the duplicate records. The following query will locate duplicates of the above field set.

mysql> select id, str1, key1_id, key2_id, count(*)
from result group by str1,key1_id,key2_id
having count(*) > 1;

In the example, you group by the list of fields that you want to use as your composite key and select the fields you want to see in the output. The output from the above looked like the following:

+-------+----------+--------------+-------------+----------+
| id    | str1     | key1_id      | key2_id     | count(*) |
+-------+----------+--------------+-------------+----------+
| 17293 | EDDP     |         1951 |        1923 |        4 |
| 17302 | GRAV     |         1951 |        1923 |        4 |
| 17294 | MTDN     |         1951 |        1923 |        4 |
| 17301 | OXID     |         1951 |        1923 |        4 |
| 17303 | PH       |         1951 |        1923 |        4 |
| 17288 | XTC      |         1951 |        1923 |        4 |
+-------+----------+--------------+-------------+----------+
6 rows in set (0.52 sec)

What we have in this example is four copies of each of these records, which are supposed to be unique.
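The query above shows only one row from each duplicated group. If you want to see every individual duplicate row (all four copies in each group here), one way is to join the table back against the grouped result; the null-safe <=> operator is used so that groups containing NULL values still match.

select r.id, r.str1, r.key1_id, r.key2_id, r.result
from result r
join (
    select str1, key1_id, key2_id
    from result
    group by str1, key1_id, key2_id
    having count(*) > 1
) dup on r.str1 <=> dup.str1
     and r.key1_id <=> dup.key1_id
     and r.key2_id <=> dup.key2_id
order by r.str1, r.key1_id, r.key2_id, r.id;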

Eliminating Duplicate Records From MySQL Tables

Anyone who works with database-driven development to any extent will occasionally run into a situation where duplicate information is added to their database. I have personally run into such a problem on many occasions since I started working with database-driven software.

Being able to quickly undo data corruption is extremely important in production databases. As much as we would like to have a perfect production environment, it is often very difficult to dedicate the time and resources necessary to avoid all mistakes.

The following example is a great way to remove duplicates based upon a set of fields. Basically, you create a unique index on a list of fields (a composite key) of your specification.

Let's say you have the following table:

mysql> desc result;
+-------------------------+--------------+------+-----+---------+----------------+
| Field                   | Type         | Null | Key | Default | Extra          |
+-------------------------+--------------+------+-----+---------+----------------+
| id                      | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| comment                 | varchar(255) | YES  |     | NULL    |                |
| key1_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id                 | bigint(20)   | YES  | MUL | NULL    |                |
| str1                    | varchar(255) | YES  | MUL | NULL    |                |
| result                  | float        | YES  |     | NULL    |                |
+-------------------------+--------------+------+-----+---------+----------------+

In this example, the fields which you do not want duplicated are as follows:

| key1_id              | bigint(20)   | YES  | MUL | NULL    |                |
| key2_id              | bigint(20)   | YES  | MUL | NULL    |                |
| str1                 | varchar(255) | YES  | MUL | NULL    |                |

To remove duplicates on those three fields, so that only one record remains in the database for each combination of values of those three fields, you would do the following:

alter ignore table result add unique index unique_result (key1_id,key2_id,str1);

I ran this on one of my databases and got the following:

Query OK, 12687876 rows affected (21 min 13.74 sec)
Records: 12687876  Duplicates: 1688  Warnings: 0

In this example, we removed 1688 duplicate records based upon the three specified fields.
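The IGNORE modifier keeps one row from each duplicate group and silently discards the rest. If you only wanted a one-time cleanup and do not intend to keep enforcing uniqueness on future inserts, the index can be dropped afterwards.

alter table result drop index unique_result;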