pt-stalk recipes: Gather forensic data about MySQL when a server problem occurs

It happens to us all from time to time: a server issue arises that leaves you scratching your head. That’s when Percona Toolkit’s pt-stalk comes into play: it captures diagnostic data while the problem is happening, so you can pinpoint what’s causing the havoc hitting your database.

From the documentation (https://www.percona.com/doc/percona-toolkit/pt-stalk.html):

pt-stalk watches for a trigger condition to become true, and then collects data to help in diagnosing problems. It is designed to run as a daemon with root privileges, so that you can diagnose intermittent problems that you cannot observe directly. You can also use it to execute a custom command, or to gather the data on demand without waiting for the trigger to happen.

Some common options apply to all of the examples below, so I recommend reading the documentation if you have any specific questions.

Be prepared! It’s wise to have pt-stalk running 24/7, because problems such as MySQL lockups or spikes of activity typically leave no evidence to use in root cause analysis. By continuously running pt-stalk, you’ll have the data it gathers when the trouble occurs.

Let’s look at some specific “pt-stalk recipes.”

Just collect the information:

pt-stalk collects the information once and then exits.
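
A minimal sketch of this recipe, assuming the mysql client can already connect with its default credentials (for example through ~/.my.cnf). The wrapper name collect_now is only illustrative; the --no-stalk option is what tells pt-stalk to skip the trigger check and collect right away.

```shell
# collect_now is a hypothetical wrapper name used for illustration.
# --no-stalk skips the trigger check, so collection starts immediately.
collect_now() {
    pt-stalk --no-stalk
}
```

Calling collect_now runs a single collection. Add --dest if you want the files somewhere other than pt-stalk's default destination directory.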

Every hour for one day:

Collect the information every hour (--sleep=3600), 24 times (--iterations=24), without waiting for any condition (--threshold=0), and run in the background (--daemonize).
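
The options above can be combined as in the sketch below. It assumes default MySQL credentials are available to the mysql client, and the wrapper name stalk_hourly is hypothetical.

```shell
# stalk_hourly is a hypothetical wrapper name used for illustration.
stalk_hourly() {
    # --threshold=0    the default trigger variable (Threads_running) is
    #                  normally above 0, so the trigger fires right away
    # --sleep=3600     wait one hour after each collection
    # --iterations=24  stop after 24 collections
    # --daemonize      detach and run in the background
    pt-stalk --threshold=0 --sleep=3600 --iterations=24 --daemonize
}
```

Because the process is daemonized, it is worth checking the pt-stalk log afterwards to confirm that the collections are actually happening.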

A host has more than 10 connections:

Collect all the information when the host 10.0.0.23 (--match 10.0.0.23) has more than 10 (--threshold 10) open connections. You can use any column of the “show processlist” output as the variable; in this case, I’m using “Host”.
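
A sketch of how this recipe might look on the command line, assuming default MySQL credentials. The wrapper name stalk_host_connections is hypothetical; the options are the ones named above plus --function processlist and --variable Host, which select the processlist trigger and the column to match.

```shell
# stalk_host_connections is a hypothetical wrapper name used for illustration.
stalk_host_connections() {
    # Count processlist rows whose Host column matches 10.0.0.23 and
    # trigger a collection when that count goes above 10.
    pt-stalk --function processlist --variable Host \
             --match 10.0.0.23 --threshold 10
}
```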

More than one variable:

In some cases you want to check more than one variable. In those cases, you will have to write a small script to do it.

The script:
The script must contain a shell function called “trg_plugin”, and that function must print a number on standard output. pt-stalk compares this number against the --threshold option.
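
As an illustration, the sketch below combines two processlist columns: it counts connections coming from localhost whose Time value is above 5000 seconds. The conditions and the 5000 figure are assumptions chosen for the example. $EXT_ARGV is the variable pt-stalk uses to hand the MySQL connection options to the function, as in the documentation's own example.

```shell
# Hypothetical contents of ./pt-stalk-function
trg_plugin() {
    # -N drops the header row and -B prints tab-separated columns:
    # Id, User, Host, db, Command, Time, State, Info
    mysql $EXT_ARGV -N -B -e "SHOW FULL PROCESSLIST" \
        | awk -F '\t' '$3 ~ /^localhost/ && $6 > 5000 { n++ } END { print n + 0 }'
}
```

The function prints a single integer (0 when nothing matches), which is the value pt-stalk then checks against --threshold.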

The pt-stalk command:
Collect all the information when the function trg_plugin inside the script ./pt-stalk-function (--function ./pt-stalk-function) prints a number greater than 100 (--threshold 100).
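
A sketch of the invocation, assuming a file ./pt-stalk-function that defines trg_plugin already exists in the current directory and that default MySQL credentials are available. The wrapper name stalk_custom_trigger is hypothetical.

```shell
# stalk_custom_trigger is a hypothetical wrapper name used for illustration.
stalk_custom_trigger() {
    # Passing a file to --function makes pt-stalk call the trg_plugin
    # function defined in that file instead of a built-in trigger.
    pt-stalk --function ./pt-stalk-function --threshold 100
}
```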

Custom collector with plugins:

Plugins are useful for collecting information that pt-stalk does not gather by default. For example, if you want to collect process status information from /proc/[mysqlpid]/status, you can use a plugin for that.

The plugin:
In this case the script contains a shell function called “before_collect”, and pt-stalk runs this function before it collects the information. Other hooks are available as well, such as running after the collection; please check the documentation for more information.
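
A minimal sketch of such a plugin, assuming a Linux host with /proc and a single mysqld process. The variable names PID_STATUS_LOG and MYSQLD_PID and the default log path are hypothetical knobs added for the example; they are not pt-stalk settings.

```shell
# Hypothetical contents of ./pt-stalk-pidplugin
before_collect() {
    # Both values can be overridden from the environment; otherwise the
    # log goes to /tmp and the PID is looked up with pidof.
    pidplugin_log="${PID_STATUS_LOG:-/tmp/mysqld-pid-status.log}"
    pidplugin_pid="${MYSQLD_PID:-$(pidof -s mysqld)}"
    {
        date
        cat "/proc/$pidplugin_pid/status"
        printf '%s\n' '-------'
    } >> "$pidplugin_log"
}
```

Each collection appends a timestamp, the status file, and a separator line, so successive snapshots can be told apart.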

The pt-stalk command:
Before collecting the information, pt-stalk runs the plugin ./pt-stalk-pidplugin (--plugin ./pt-stalk-pidplugin).
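
A sketch of the invocation, assuming the plugin file ./pt-stalk-pidplugin exists in the current directory. No trigger options are given here, so pt-stalk's default trigger applies; the wrapper name stalk_with_plugin is hypothetical.

```shell
# stalk_with_plugin is a hypothetical wrapper name used for illustration.
stalk_with_plugin() {
    # --plugin loads the file so that its before_collect hook runs
    # ahead of every collection.
    pt-stalk --plugin ./pt-stalk-pidplugin
}
```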

Have any comments or questions? Just let me know below!

9 Comments

honeybee

question, what is in .my.default.cnf?

Hugo

Hello,

I’ve tried the plugin, but pt-stalk always falls back to the default command, like
--variable=Threads_running --match=

So it never triggers, even if the plugin returns a value greater than the defined threshold.

Alex Dicianu

If you want to check Threads_running, the function needs to be status, not processlist.

pt-stalk --function status --variable Threads_running --threshold 2

Check man pt-stalk for more details.

Ashwini

Hi,

Thanks for your post.
I want to log all SQL queries that run while my slave is lagging.

I am using the command below:

/usr/bin/pt-stalk --function=/root/pt-plug.sh --variable=seconds_behind_master --threshold=5 --cycles=7 –[email protected] --log=/mnt/pt-stalk.log --collect --collect-tcpdump --pid=/root/pt-stalk.pid --daemonize

The --collect and --collect-tcpdump options do not take values.
So where will my sqldump be logged? What is the default location for these logs?

Thanks In Advance !!

Adam Swanson

Thanks for the post. One thing I’m struggling with is how to have pt-stalk stalk multiple triggers.

For example, I have a configs directory and a functions directory. According to the documentation I can use the --config flag and a comma-separated list. This doesn’t seem to work as we would expect. It seems to run only the last config in that list.

How can you use pt-stalk to monitor multiple custom functions?

pt-stalk --config config1.conf, config2.conf, config3.conf does not seem to work as expected.

martinarrieta

Could you please give us the full command? Maybe the issue is the spaces between the files, i.e. --config config1.conf,config2.conf vs --config config1.conf, config2.conf.

Also, running “PTDEBUG=1 pt-stalk ….” can help you give us more information.

Regards,

Martin.

Adam Swanson

/usr/bin/pt-stalk --config /usr/local/bin/mysql/pt-stalk/configs/idleconnections.conf,/usr/local/bin/mysql/pt-stalk/configs/longquery.conf --dest=/var/lib/mysql/pt-stalk --log=/var/log/mysql/pt-stalk.log --sleep=300 --interval=300 --defaults-file=/etc/percona-toolkit.conf --daemonize

is the full command. Have you ever used multiple configs successfully? Can you paste an example of how you use it?

Jo Valerio

From the pt-stalk output in /var/lib/pt-stalk, what are the first five reports you normally check, and why?