A few weeks ago I was asked to isolate some functionalities from Mark Callaghan’s MySQL patch bundle. They were extensions adding per-user and per-table accounting to the database, accessible with a new set of commands such as SHOW TABLE_STATISTICS, SHOW INDEX_STATISTICS and SHOW USER_STATISTICS. The first two can interest anyone to periodically check what data or which index are the most active or which are not being used at all and could be candidates for dropping. All the patch features will surely be a great help for hosting providers to produce detailed reports on how each customer utillizes the database.

Having many different users running their queries on a single database instance means they will be constantly “battling” for the resources, each one caring only for his own application, completely ignoring the fact they chose shared hosting. Of course this alone is not a problem and theoretically if you don’t over-allocate, the users should never become conscious of the struggle happening underneath.

Obviously, that’s only in the ideal world and either it’s not even possible or, at best, economically inefficient to allocate the entire machine for only a few customers simply because their skills in SQL are limited.

The real life customers generate very uneven load – some barely any and some a lot. It has to be said that the SQL performance is very tricky, it’s very easy to produce large sets of data for the hardware to process from even small tables. Therefore often the resulting database load may not be directly related to the web traffic some customer has. The load can also be changing in time, mostly growing, often spiking. One way or another after a while every server becomes too tight for everyone. And for you it is best to be pro-active, to observe the changes as they happen and react before any user can start complaining. You can watch the load growth and based on that make the decision on splitting half of the users to some newly purchased hardware. But how to avoid the undesired situation that you only sort out the least active customers from the heavier ones?

It’s probably not uncommon someone uploads a really badly designed application, which runs queries not using indexes or which examine a lot of rows. This can elevate the load, but just as well it can bring the server down. You can have slow query log enabled and review it occasionally, but doing it offline means you can’t react in the real time on the critical situation. You can have some sophisticated mechanisms to browse the process list or scan the slow log live, but that is a bit ugly solution.

That is where this patch can be very useful. With simple MySQL SHOW command it gives you the basic statistics on the database activity broken down by users, tables or indexes. Monitoring those numbers every minute can show you changes in usage by customer and immediately catch any spikes as they happen pinpointing the source to make an intervention. Looking at totals over a longer period you can catch users causing the most traffic to help you with some strategic decisions.

Sample outputs:

The statistics for tables and indexes can be cleaned with the corresponding FLUSH command (e.g. FLUSH INDEX_STATISTICS).

All that can be fed into scripts to bring you automated reports or to perform actions such as:

  • list users by the number of queries issued or any other information
  • find active tables which do not have any indexes used meaning table scans
  • nag users they should consider upgrading to a dedicated server because their usage is very high
  • temporarily block users having huge spike in MySQL usage

Personally I would see a few more things implemented. A few examples to mention:

  • per-user CPU time usage, some queries may be CPU bound rather than reading lots of rows
  • support for prepared statements which all currently fall under Other_commands
  • a few MySQL performance counters broken down by users (full joins, table scans, disk temp tables, etc.)
  • limiting command availability to SUPER users only or restricting regular users to see their own statistics only
  • information_schema support to allow SELECT query transformations, joins and better filtering for the statistics

Also combining these new commands with our msl patch, which introduces the advanced query logging, would give the administrator powerful tool to almost effortlessly catch the users doing bad things on the database and point them to the specific problems they have (e.g. particular query using full joins).

If you want the statistics patch mentioned in this article, then Bluehost.com CEO, Matt Heatton, has published it on his blog.

10 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
quaker

Hello,

What about servers with 64k+ users (with one database per user), about 8k+ users active (sending at least one query per day). I’m asking about performance penalty and extra memory usage.

Mark Callaghan

It is nice to see another addition to the Percona Community Edition of MySQL (microslow patch, this, patch for IO statistics).

I may be biased, but I could not do without this feature. It makes it very easy to determine which users create the most load on the server and how that load changes over time. Once a load problem is isolated to one user, statement sampling can be used to determine the statements that may be causing the problem. I have been able to significantly reduce load on important servers with this by determining the statements that should be changed.

I think Percona published a patch that tracks IO by statement (can you reference that patch again). I hope to add a column for IO per user to SHOW USER_STATISTICS. I think that the patch left out FLUSH TABLE_STATISTICS, but that will be in a future patch.

Tim Harper

That’s hot! Looks like a great way to profile which indexes are pulling their weight.

Jay Pipes

You also may want to check this out: http://mattheaton.com/?p=130

Mark Callaghan

The costs for monitoring should be low and whatever you lose in monitoring overhead should be insignificant compared to what you can get back by identifying and then fixing users/statements that are causing load problems. Maybe Peter can write a tutorial on using this — after identifying the problem user, the next step is to sample or log all statements they run to find the problem statements.

Peter Zaitsev

Mark,

Here is the patch information:
http://www.mysqlperformanceblog.com/2007/10/31/new-patch-for-mysql-performance/

We only track IO for Innodb – for MyISAM it is kind of impossible to split logical IO from OS cache from physical IO from the media unless you know a way to check if logical pread required physical IO inside of OS 🙂

Roman

mysqld crashes with this patch on query
create table bbb SELECT * FROM aaa;
when table aaa does not exists

# mysql -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2817 to server version: 5.0.51b-log

Type ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the buffer.

mysql> create database foobar;
Query OK, 1 row affected (0.00 sec)

mysql> use foobar
Database changed
mysql> create table bbb SELECT * FROM aaa;
ERROR 2013 (HY000): Lost connection to MySQL server during query

nrg

Hello! beautiful patch! Can we adopt it to new centos mysql 5.0.77 ?

Daniel

Works out of the box with MySQL 5.0.70 (gcc-4.2.1,libc-2.6.1 (build against 2.6.19 kernel-header),binutils-2.6.18)

No crashes,no misbehavoir.