We recently examined a customer's system to try to speed up an ETL (Extraction, Transformation, and Loading) process that loads a large data set into a sort of data mart or data warehouse. What we typically do is ask the customer to run the process in question, and then examine what's happening. In this case, the (very large, powerful) database server was almost completely idle, with virtually no I/O activity or CPU usage. So we looked at the server where the ETL process was running. It was running at 25% CPU usage and was writing some files to disk, but not waiting on I/O.

What’s going on here?  Where’s the bottleneck?  The process is slow, and neither machine is really doing much work.  Why?

Maybe you guessed the network.  Nope, not the network either.  There was plenty of spare network capacity.

If I told you the ETL machine was using exactly 25% of its CPU capacity, would you guess that it had 4 CPU cores and one of them was running at 100% usage?  This is what was happening.  The ETL app was single-threaded and CPU-bound.
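
A quick way to spot this pattern is to look at per-core rather than aggregate CPU usage. Here's a minimal sketch in Python (using the third-party psutil library; the 90% threshold is just an illustrative cutoff, not a number from the case above):

import psutil  # third-party library: pip install psutil

# Sample per-core CPU usage over one second.  A single-threaded,
# CPU-bound process shows up as one core pegged near 100% while the
# machine-wide average sits near 100/N percent on an N-core box.
per_core = psutil.cpu_percent(interval=1, percpu=True)
overall = sum(per_core) / len(per_core)

print("per-core usage:", per_core)
print("overall: %.1f%%" % overall)
if max(per_core) > 90:
    print("At least one core is saturated; suspect a single-threaded bottleneck.")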

Of course, we measured the entire process, so we could say authoritatively what was going on. But this problem actually took only a few minutes to diagnose. The point here is to look beyond the database server for what seems to be a database problem. This is why we call ourselves "full-stack performance tuning experts." We try not to have tunnel vision. This reminds me of another problem I helped debug a few weeks ago: a really slow website turned out to be caused by a curl call hidden in the code, which was timing out because of DNS issues.

One of the best things you can do to improve your performance is build profiling into your application, or run the application under a profiler (Google for “profiling <language>”) and find the places where it consumes the most time.  It’s especially valuable to profile “external resource calls” such as calls to the database, web services, and so on. These approaches can make it much easier to find the slow parts.
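
To make that concrete, here is a minimal Python sketch of the kind of light-weight timing wrapper we mean; the timed decorator, run_query function, and labels are hypothetical names for illustration, not any particular library's API:

import functools
import time
from collections import defaultdict

# Accumulated wall-clock samples per call site, e.g. {"db.query": [0.45, ...]}
timings = defaultdict(list)

def timed(label):
    """Record how long each call to the wrapped function takes."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[label].append(time.perf_counter() - start)
        return wrapper
    return decorator

# Hypothetical usage: wrap every external resource call.
@timed("db.query")
def run_query(conn, sql, params=()):
    return conn.execute(sql, params)  # placeholder for your DB driver

def report():
    """Print per-label call counts and total time, slowest first."""
    for label, samples in sorted(timings.items(), key=lambda kv: -sum(kv[1])):
        print("%-12s calls=%-6d total=%.3fs" % (label, len(samples), sum(samples)))

Wrap every database, web-service, and similar external call this way, and the report shows immediately where the time is really going.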

We devoted part of a chapter to profiling in our book.  We explain a lot of useful techniques to help you build “light-weight” profiling into the application from the start (a very smart thing to do).

Sometimes our clients believe they already know the source of the problem, but they can’t prove it. They ask us to either prove them right or show them the real problem. The way we do this is to prefer measurements to guesses.

Of course, finding the problem is only part of the battle. Fixing it is another matter. But “how to fix every performance problem” doesn’t fit into a single blog post!

Joe Izenman

Admittedly I haven’t finished reading it yet, but I’m going to hazard a guess and say that “how to fix every performance problem” probably doesn’t quite fit into a book, either.

Rock Howard

Good article. In the late '90s my company made a lot of money tuning deployment stacks running Java-based software systems. I led most of the efforts, and while the typical solution involved tuning DB queries, that was not generally done to reduce the load on the DB server, but rather to reduce the CPU overhead of data translation on massive result sets. (The Oracle JDBC drivers were really bad in those days. I wonder if they ever fixed them.) I gave a presentation once on performance tuning Java-based server stacks and easily filled up 30 slides with all manner of possible bottlenecks. With all those considerations, the only possible approach, which I espoused relentlessly, was: measure, don't guess!

Another useful subpoint about the tuning process is this: change one thing at a time. I had a client where two different teams were tuning simultaneously. One was tuning the database and the other the application. As it turned out, those teams were tossing the bottleneck back and forth between them like a hot potato without even realizing it! Given the extreme panic within the organization, they might never have solved the issue without an outsider like myself stepping in and taking charge of the tuning process.

My favorite tuning challenge was a very tough nut where the key to dramatically improving throughput was inserting a strategically located sleep statement. The over-clogged network and database queue was spewing so many errors that the CPUs were spending most of their time churning through exceptions. The sleep statement forced everything to wait a bit, and this dramatically reduced exception processing. From there I was able to tune the system to the point where the sleep was no longer necessary (but it didn't hurt either, so the customer decided to deploy with it in place as a safety precaution).
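
For readers who want to picture that kind of fix, here is a hypothetical Python sketch of the general idea (not the actual code from the story): pause briefly before each attempt against an overloaded resource, so failures don't turn into a CPU-burning exception storm.

import time

def call_with_throttle(fn, attempts=3, delay=0.05):
    """Pause briefly before each attempt against an overloaded resource.

    The sleep gives a saturated network/database queue time to drain,
    so the CPU spends its cycles doing real work instead of churning
    through a storm of exceptions from failed calls.
    """
    for attempt in range(attempts):
        time.sleep(delay)            # the strategically placed sleep
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                # give up after the last attempt
            delay *= 2               # back off a little more each time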

Well, we made a lot of money and saved our clients millions of dollars, but I am very happy to be out of the performance tuning game. It is much harder work than most greenfield development. However, I have recently soft-launched a new public site, and I wouldn't mind at all if I were required to performance-tune that site. At least I have a firm handle on the system design, and that, in the long run, is the key to scalable performance.

Jim Murphy

Oh, I liked the way this started, but it didn't go the way I wanted, which is to say, toward solving my particular problem. 🙂

I have a simple table with lots of data, some 60 million rows. Queries take a long, long time, though CPU is low, never getting above 4%. What should I start measuring? What can I check to know if it's I/O bound, or what the bottleneck is?

Table Described:

+-----------------+--------------+------+-----+---------------------+----------------+
| Field           | Type         | Null | Key | Default             | Extra          |
+-----------------+--------------+------+-----+---------------------+----------------+
| id              | int(11)      | NO   | PRI | NULL                | auto_increment |
| title           | varchar(255) | YES  |     | NULL                |                |
| description     | text         | YES  |     | NULL                |                |
| full_story      | text         | YES  |     | NULL                |                |
| uri             | varchar(255) | YES  | MUL | NULL                |                |
| pubdate         | datetime     | YES  |     | 0000-00-00 00:00:00 |                |
| pubdate_reverse | int(9)       | YES  |     | NULL                |                |
| feed_id         | int(5)       | YES  | MUL | NULL                |                |
+-----------------+--------------+------+-----+---------------------+----------------+

The Query:

SELECT id, title, uri, pubdate FROM entries WHERE feed_id = 151 ORDER BY pubdate_reverse LIMIT 1000;

1000 rows in set (3 min 34.11 sec)

The EXPLAIN:

+----+-------------+---------+------+-------------------+-------------------+---------+-------+-------+-------------+
| id | select_type | table   | type | possible_keys     | key               | key_len | ref   | rows  | Extra       |
+----+-------------+---------+------+-------------------+-------------------+---------+-------+-------+-------------+
|  1 | SIMPLE      | entries | ref  | index_feed_recent | index_feed_recent | 5       | const | 18032 | Using where |
+----+-------------+---------+------+-------------------+-------------------+---------+-------+-------+-------------+
1 row in set (0.00 sec)

Hints?

Best,

Jim

Jim Murphy

Thanks Baron,

It appears to me that the disk is pretty close to saturated:

avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
           0.51    0.00     0.20    54.55    0.00   44.73

Device:  rrqm/s  wrqm/s     r/s     w/s   rsec/s   wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sdb      255.00  221.80   70.60   49.00  2580.80  2159.00     39.63      3.88  32.34   7.66  91.60
sdc      235.20  216.80   65.40   47.80  2360.00  2104.60     39.44      3.82  33.53   7.23  81.80
dm-0       0.00    0.00  628.00  535.20  5024.00  4263.60      7.98     39.04  33.41   0.85  99.20

So what do you do when your MySQL queries are I/O bound? Any query hints to help them do less work?

Thanks for the advice,

Jim

Jim Murphy

I appreciate that. I've been reading your book on Safari Online and will head to the forums. Thanks for the help.

Jim