How to debug long-running transactions in MySQL

Among the many things that can cause a “server stall” is a long-running transaction. If a transaction remains open for a very long time without committing, and has modified data, then other transactions could block and fail with a lock wait timeout. The problem is, it can be very difficult to find the offending code so that it can be fixed. I see this much too often, and have developed a favorite technique for tracking down what that long-running transaction is doing.

Of course, in some cases it’s actually easy to figure out what the long-running transaction is doing. The most obvious is if it’s a long-running query. If that’s the case, then you’ll see the query in the processlist, and you can track down where it’s coming from in the source code. The problem comes when the transaction remains open, but either it isn’t running queries anymore, or it runs such fast queries that you can’t capture them in the processlist.

The steps I use to debug this problem are as follows. First, I identify which transaction is blocking others. Next, I find out the TCP port for that connection. I then sniff TCP traffic for a while, use mk-query-digest to convert this into a query log, and examine the log.

In MySQL 5.1 and newer with the InnoDB plugin, it’s easy to figure out which transaction is blocking others. There are some INFORMATION_SCHEMA tables that make this simple. Just query the tables, and if something is blocked, find its thread ID. Match this up with the ID in SHOW PROCESSLIST, and look in the Host column; there you’ll see a hostname followed by a port number. That is the port number of the TCP connection that’s to blame for causing the trouble.

If you’re in pre-5.1 or non-plugin InnoDB versions, you can’t reliably figure out what connection is to blame for blocking others. You can guess, and sometimes that’s good enough. Just pull the oldest transaction that holds any locks from SHOW INNODB STATUS and assume that it’s the culprit. This doesn’t always work, and I hate things that sometimes work. They teach you to assume that it’ll always work, which bites you at the worst times. But I digress.

Here is a small script that I use quite a bit. I simply start a screen session and run it in a loop with a 30-second sleep in between. Notice that it self-purges its own data, and it won’t run if the disk is getting full. I’ve learned the hard way a long time ago never to run a script that will let data accumulate or fill up the disk.

#!/bin/bash

# Begin by deleting things more than 7 days old
find /root/tcpdumps/ -type f -mtime +7 -exec rm -f '{}' \;

# Bail out if the disk is more than this %full.
PCT_THRESHOLD=95
# Bail out if the disk has less than this many MB free.
MB_THRESHOLD=100
# Make sure the disk isn't getting too full.
avail=$(df -m -P /root/tcpdumps/ | awk '/^\//{print $4}');
full=$(df -m -P /root/tcpdumps/ | awk '/^\//{print $5}' | sed -e 's/%//g');
if [ "${avail}" -le "${MB_THRESHOLD}" -o "${full}" -ge "${PCT_THRESHOLD}" ]; then
   echo "Exiting, not enough free space (${full}%, ${avail}MB free)">&2
   exit 1
fi

host=$(mysql -ss -e 'SELECT p.HOST FROM information_schema.innodb_lock_waits w INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id INNER JOIN information_schema.processlist p on b.trx_mysql_thread_id = p.ID LIMIT 1')
if [ "${host}" ]; then
   echo "Host ${host} is blocking"
   port=$(echo ${host} | cut -d: -f2)
   tcpdump -i eth0 -s 65535 -x -nn -q -tttt port 3306 and port ${port} > /root/tcpdumps/`date +%s`-tcpdump &
   mysql -e 'show innodb status\Gshow full processlist' > /root/tcpdumps/`date +%s`-innodbstatus
   pid=$!
   sleep 30
   kill ${pid}
fi

#!/bin/bash

# Begin by deleting things more than 7 days old

find /root/tcpdumps/ -type f -mtime +7 -exec rm -f '{}' \;

# Bail out if the disk is more than this %full.

PCT_THRESHOLD=95

# Bail out if the disk has less than this many MB free.

MB_THRESHOLD=100

# Make sure the disk isn't getting too full.

avail=$(df -m -P /root/tcpdumps/ | awk '/^\//{print $4}');

full=$(df -m -P /root/tcpdumps/ | awk '/^\//{print $5}' | sed -e 's/%//g');

if [ "${avail}" -le "${MB_THRESHOLD}" -o "${full}" -ge "${PCT_THRESHOLD}" ]; then

echo "Exiting, not enough free space (${full}%, ${avail}MB free)">&2

exit 1

host=$(mysql -ss -e 'SELECT p.HOST FROM information_schema.innodb_lock_waits w INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id INNER JOIN information_schema.processlist p on b.trx_mysql_thread_id = p.ID LIMIT 1')

if [ "${host}" ]; then

echo "Host ${host} is blocking"

port=$(echo ${host} | cut -d: -f2)

tcpdump -i eth0 -s 65535 -x -nn -q -tttt port 3306 and port ${port} > /root/tcpdumps/`date +%s`-tcpdump &

mysql -e 'show innodb status\Gshow full processlist' > /root/tcpdumps/`date +%s`-innodbstatus

pid=$!

sleep 30

kill ${pid}

If this captures anything, then you can simply run “mk-query-digest –type=tcpdump –no-report –print” on the resulting tcpdump file (after inspecting the captured SHOW INNODB STATUS and SHOW PROCESSLIST to make sure you haven’t caught a false-positive). I also like to use the –timeline feature in mk-query-digest, which prints out a sequence of queries in an easy-to-view format, rather than aggregating the worst queries together.

This has worked for me a few times. Alas, there are some cases where it won’t work. If the offending transaction is idle, and not continuing to execute more queries, then it won’t work. In addition, I will admit that I ran into a case I couldn’t solve even with full server logging. There was a transaction that was blocking others and had locks and undo log records, yet had only run SELECT queries since its inception. That case is still open and I am still stumped as to how a SELECT-only transaction can possibly have undo log records!

The greater moral of this story is — InnoDB is designed for lots of short, small transactions. Don’t design batch jobs that run in loops and update data, unless you make them commit between iterations. And watch your server to find idle-in-transaction connections that are just sitting there doing nothing, and kill them if you can’t make the developers fix them 🙂 I’ve seen dozens of instances of those two types of problems bringing everything on the system to a grinding “lock wait timeout” halt.

6 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Peter Boros

13 years ago

I am using almost the same thing, but in a stored procedure, and I save the results to a table. If it would be excessive, most likely the lock is easy to spot anyway:). This approach misses one feature: you can’t tell what sql executed on a given thread id last time. If you could, the “many weird sleeping threads” problems would be so much easier to diagnose.

Baron Schwartz

Author

Reply to Peter Boros

13 years ago

That would be very helpful, yes.

Pavel Shevaev

13 years ago

Baron, as for InnoDB being designed for lots of short, small transactions…. what about save points? Do they help in this case? I mean if I have a big fat transaction and if I use save points inside of it periodically, will it make InnoDB happier?

Baron Schwartz

Author

13 years ago

That’s a really good question. I think I know the answer (I am pretty sure that locks are not released after a savepoint), but I am not sure how savepoints are actually implemented in InnoDB. I would need to study that.

Aaron Brown

13 years ago

As always, your timing is impeccable, Baron

Markus Hall

12 years ago

Could also be worth checking your disk space. You get a server stall if the file system holding your database is full, which I don’t think (though not sure) you can tell by looking at the info tables alone.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How to debug long-running transactions in MySQL

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Sets and Sorted Sets

Securing Your MySQL Database: Essential Best Practices

Mastering Database Monitoring: Running PMM in High Availability Mode

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How to debug long-running transactions in MySQL

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Sets and Sorted Sets

Securing Your MySQL Database: Essential Best Practices

Mastering Database Monitoring: Running PMM in High Availability Mode

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation