July 24, 2014

Beware the Innodb Table Monitor

As I stated in my last post, I decided to use the Innodb Table monitor to diagnose an Index count mismatch error a customers found in their mysqld.err log to verify if the problem still existed.

The Innodb Table Monitor has existed for basically forever in Innodb (the MySQL manual discusses it back in the 4.1 manual), and is from a time when what was going on inside of Innodb was a lot murkier than it is now.   To use it, you create a table (in any database you choose), like this:

This, primitively, tells Innodb that it should start the Table monitor.  The monitor itself dumps the contents of the Data dictionary to the mysql error log, which looks something like this:

Basically you can see every table, every column, and every index in that table.  Once you get the output, you just DROP the table and it stops.  It’s been around for forever, so it should be stable, right?

So, to my customer.  It may be important to note that this customer had a few hundred tables.  I scheduled a time with the customer during off hours to connect to their master and collect the output so we could confirm if Innodb really did have all the indexes or not.  I collected the data just fine, and I found that indeed the indexes that Innodb knew about were correct, and that the error messages the customer saw was apparently only transitory.

However, as I was writing an email to the customer to explain it, I got an email explaining that he saw a big spike in io-wait on the server during the time I ran the Innodb Table monitor (it was just for a minute) and he was very glad we waited until after hours.  The server itself was fine, and there didn’t seem to be any adverse effects from running the monitor, but something clearly was a lot busier than expected during execution.

He had data collected using Percona Tookit‘s pt-stalk during the time and asked me to take a look and see what happened.  I looked through the data and found not only was there high io-wait on the system, but also there were over a hundred user queries in SHOW PROCESSLIST that were stuck in the ‘Opening Tables’ state, uh-oh.

I dug through the source code and found a few interesting things about the table monitor:

  • It holds the dict_sys->mutex for the entire time it runs
  • Before outputting the information for each table, it calls dict_update_statistics and appears to force table statistics to be regenerated for each table
  • It’s possible I’m mistaken, but it sure looks like dict_update_statistics is called twice for each table, at least in the latest Percona Server 5.1 in: storage/innodb_plugin/dict/dict0load.c: line 226, and again in: storage/innodb_plugin/dict/dict0dict.c: line 4883
So, the io-wait spike is explained by table statistics being regenerated, and the SHOW PROCESSLIST connections stuck in ‘Opening Tables’ were waiting for the dict_sys->mutex, which is required for opening tables in Innodb.
It would be better for the Table Monitor if:
  • Table stats weren’t regenerated unless it was necessary
  • dict_sys->mutex weren’t such a big hammer
These points are probably moot, since I don’t think most people don’t use this feature very often.
Just because it’s old, doesn’t mean you should trust it.
About Jay Janssen

Jay joined Percona in 2011 after 7 years at Yahoo working in a variety of fields including High Availability architectures, MySQL training, tool building, global server load balancing, multi-datacenter environments, operationalization, and monitoring. He holds a B.S. of Computer Science from Rochester Institute of Technology.

Comments

  1. Hey Jay,

    I was searching for dict_sys mutex and this post was one of the results. I stumbled upon dict_sys mutex at another place — both row_import_tablespace_for_mysql and fsp_get_available_space_in_free_extents, first one being called for alter table and second one for show table like operations I guess — can get into contention.

    Few bugs exist on bugs.mysql for this — http://bugs.mysql.com/bug.php?id=54538 — though it touched upon other areas where that mutex had affected.

  2. I looked at mysql tip source code, and it is called dict_stats_update now and looks like it has new code for lock propagation (to avoid repeated lock/unlock i guess).

    From cscope, here is what I could get on its call sites:

    0 storage/innobase/dict/dict0dict.cc dict_table_open_on_name 938 dict_stats_update(table,
    1 storage/innobase/dict/dict0dict.cc dict_table_print_low 4944 dict_stats_update(table, DICT_STATS_FETCH, TRUE);
    2 storage/innobase/dict/dict0load.cc dict_process_sys_tables_rec_and_mtr_commit 367 dict_stats_update(*table, DICT_STATS_FETCH, TRUE);
    3 storage/innobase/fts/fts0fts.cc fts_get_total_document_count 2384 dict_stats_update(table, DICT_STATS_RECALC_TRANSIENT, FALSE);
    4 storage/innobase/handler/ha_innodb.cc info_low 9684 ret = dict_stats_update(ib_table, stats_upd_option,FALSE);
    5 storage/innobase/row/row0mysql.cc row_update_statistics_if_needed 980 dict_stats_update(table, DICT_STATS_FETCH, FALSE);
    6 storage/innobase/row/row0mysql.cc row_truncate_table_for_mysql 3318 dict_stats_update(table, DICT_STATS_RECALC_PERSISTENT_SILENT,FALSE)

    FALSE == where caller is not holding the lock and dict_stats_update will in turn hold the lock.

    =========
    | /* check whether caller_has_dict_sys_mutex is set correctly;
    | note that mutex_own() is not implemented in non-debug code so
    | we cannot avoid having this extra param to the current function */
    | ut_ad(caller_has_dict_sys_mutex
    | ? mutex_own(&dict_sys->mutex)
    | : !mutex_own(&dict_sys->mutex));

    ================

Speak Your Mind

*