August 23, 2008

How to track down the source of Aborted_connects

Posted by Baron Schwartz

Yesterday I helped someone who was seeing a lot of "server has gone away" error messages on his website. While investigating this problem, I noticed several things amiss that appeared to be related but really weren't. The biggest measurable sign was this:

CODE:
  [percona@server ~]$ mysqladmin ext | grep Abort
  | Aborted_clients                | 14835        |
  | Aborted_connects               | 15598        |
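These counters are cumulative since server startup, so a single reading doesn't tell you whether the problem is still happening. Here is a minimal Python sketch, assuming you save two `mysqladmin ext` snapshots taken some time apart as text (the second snapshot below is made up for illustration). It parses the `| name | value |` rows and shows how much each Aborted_* counter grew in between:

```python
def parse_status(text):
    """Parse `mysqladmin extended-status` table output into {name: int}."""
    status = {}
    for line in text.splitlines():
        # Data rows look like: | Aborted_clients | 14835 |
        parts = [p.strip() for p in line.strip().strip("|").split("|")]
        # Skip borders, the header row, and non-numeric values such as ON/OFF.
        if len(parts) == 2 and parts[1].isdigit():
            status[parts[0]] = int(parts[1])
    return status


def counter_growth(before, after, prefix="Aborted_"):
    """Return how much each counter whose name starts with `prefix` increased."""
    old, new = parse_status(before), parse_status(after)
    return {name: new[name] - old.get(name, 0)
            for name in new if name.startswith(prefix)}


first = """| Aborted_clients                | 14835        |
| Aborted_connects               | 15598        |"""
# Hypothetical second snapshot, taken a few minutes later.
second = """| Aborted_clients                | 14835        |
| Aborted_connects               | 15662        |"""

print(counter_growth(first, second))
# {'Aborted_clients': 0, 'Aborted_connects': 64}
```

If one counter keeps climbing while the other stays flat, that tells you which kind of failure (failed connection attempts versus dropped established connections) is still occurring.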


July 30, 2008

Quick tip: how to convert tables to InnoDB

Posted by Baron Schwartz

I use Maatkit for a lot of grunt work and thought you might appreciate this quick tip. Suppose you have a bazillion tables to convert from MyISAM to InnoDB, but they are mixed in with other tables that are already InnoDB, or use another storage engine that you don't want to touch. Here's how to convert just the MyISAM tables:

CODE:
  mk-find <db_name> --engine MyISAM --exec "ALTER TABLE %D.%N ENGINE=INNODB" --print
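To make the selection logic concrete, here is a rough Python model of what that command amounts to: keep only the tables whose engine is MyISAM, and substitute each database and table name into the `%D.%N` placeholders. The table inventory is hypothetical (in practice it would come from the server, for example via `SHOW TABLE STATUS`), and this is an illustration of the filtering idea, not how mk-find is implemented.

```python
def alter_statements(tables, engine="MyISAM",
                     template="ALTER TABLE {db}.{tbl} ENGINE=INNODB"):
    """Build one ALTER statement per table whose engine matches `engine`.

    `tables` is an iterable of (database, table, engine) tuples.
    The engine comparison is case-insensitive.
    """
    return [template.format(db=db, tbl=tbl)
            for db, tbl, eng in tables
            if eng.lower() == engine.lower()]


# Hypothetical inventory: a mix of engines in one database.
inventory = [
    ("shop", "orders", "MyISAM"),
    ("shop", "customers", "InnoDB"),
    ("shop", "audit_log", "ARCHIVE"),
    ("shop", "products", "MyISAM"),
]

for stmt in alter_statements(inventory):
    print(stmt)
```

Only `shop.orders` and `shop.products` get an ALTER; the InnoDB and ARCHIVE tables are left alone, which is the whole point of filtering on the engine first.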

Here's a bonus tip, while I'm at it. I had a client a while back whose application created tables as needed, so they had about 90,000 tables spread across a bunch of different databases, all named things like user_123_456_friends. I wanted to add an index to them -- but not to the ones named friends_123_456_user.

CODE:
  mk-find <db_name> --tblregex '^user_\d+_\d+_friends$' --exec 'ALTER TABLE %D.%N ADD KEY(site_id)'
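Before pointing a pattern at 90,000 tables, it's worth checking it against a few sample names. Maatkit is written in Perl, so --tblregex takes a Perl regular expression; for a simple pattern like this one, Python's `re` module should behave the same way. A quick sketch (the sample names are made up):

```python
import re

# Same pattern as the --tblregex argument above.
PATTERN = re.compile(r"^user_\d+_\d+_friends$")

names = [
    "user_123_456_friends",  # the shape we want
    "friends_123_456_user",  # reversed name: must not match
    "user_123_friends",      # only one number: no match
    "user_1_2_friends_old",  # trailing suffix, rejected by the $ anchor
]

for name in names:
    print(f"{name:25} {'match' if PATTERN.search(name) else 'skip'}")
```

The `^` and `$` anchors matter here: without them, a name like `user_1_2_friends_old` would also match and get the new index.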

Boy, is that a lot easier than adding indexes to 90k tables by hand!

July 25, 2008

The #1 mistake hosting providers make for MySQL servers

Posted by Baron Schwartz

This article is not meant to malign hosting providers, but I want to point out something you should be aware of if you're getting someone else to build and host your servers for you.

Many hosting providers -- even the big names -- continue to install 32-bit GNU/Linux operating systems on 64-bit hardware. This is a serious mistake.

You have to tell them to install a 64-bit operating system. If you don't, sooner or later your needs will grow and you'll want to use more memory. They will gladly install 8 or 16GB of memory for you, but MySQL can't use it: it runs as a single process, and on a 32-bit OS a single process is limited to about 2.5GB of usable memory. At that point you have to reinstall the whole operating system from scratch. But you don't want any downtime, so you have to buy another server, set it up as a slave, switch your site to use it, and then rebuild the old server. That 32-bit OS turned into a pretty expensive mistake.
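The arithmetic behind that ceiling is simple. A 32-bit pointer can address 2^32 bytes, or 4GB, and the kernel reserves part of that range for itself. Assuming the common 3GB/1GB user/kernel split on 32-bit Linux (kernels can be configured differently), a process gets at most 3GB of address space. Shared libraries, thread stacks, and fragmentation eat into that, so the realistically usable amount for mysqld ends up around the 2.5GB mentioned above. A quick sketch:

```python
GIB = 2 ** 30


def addressable_gib(pointer_bits):
    """Theoretical virtual address range of a pointer this wide, in GiB."""
    return 2 ** pointer_bits / GIB


def user_space_gib(pointer_bits=32, kernel_reserved_gib=1):
    """Address space left for a user process after the kernel's share.

    kernel_reserved_gib=1 models the common 3G/1G split on 32-bit Linux
    (an assumption; other splits exist).
    """
    return addressable_gib(pointer_bits) - kernel_reserved_gib


print(addressable_gib(32))           # 4.0
print(user_space_gib())              # 3.0
# Theoretical 64-bit pointer range, in EiB (real CPUs implement fewer
# address bits, but still far more than any server's RAM).
print(addressable_gib(64) / 2**30)   # 16.0
```

Installing 8 or 16GB of RAM doesn't change the first two numbers; only a 64-bit OS (and a 64-bit mysqld) does.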

I don't know why hosting providers keep doing this. Just yesterday I got a quote from a hosting provider for a medium-high-end system with 8GB of RAM. I forgot to specify a 64-bit OS, and they explicitly listed a 32-bit OS on the quote, which means MySQL couldn't use most of that 8GB. I would estimate that about half of all the hosted systems I've seen so far have this mismatch. Maybe there is a reason for it, but I don't know it, and it looks pretty silly to me. 64-bit hardware and operating systems aren't new anymore. In fact, 32-bit-only hardware is hard to find in server-class machines these days. So it certainly looks like the hosting companies need to change what they're doing, unless there's a reason I'm not aware of.

July 13, 2008

How to Outrun the Lions

Posted by Baron Schwartz

I just posted slides from a talk I gave at a Facebook application developer conference in Las Vegas this weekend. The talk is titled Outrun the Lions. Our customers run several of the top 10 applications on Facebook right now (as measured by the number of active users), and I revealed the secrets to building applications that can handle the load.


July 3, 2008

How to load large files safely into InnoDB with LOAD DATA INFILE

Posted by Baron Schwartz

Recently a customer asked me about loading two huge files into InnoDB with LOAD DATA INFILE. The goal was to load this data on many servers without putting it into the binary log. While LOAD DATA INFILE is generally a fast way to load data (especially if you disable unique key checks and foreign key checks), I recommended against loading each file with a single statement, because that creates one enormous transaction, and very large transactions cause several problems. For example, if the load fails or is killed partway through, InnoDB has to roll back the whole thing, which can take a very long time. We didn't want to split the file into pieces for the load, for various reasons. However, I found a way to load the single file in chunks as though it were many small files, which avoided splitting the file and let us load with many transactions instead of one huge transaction.
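The general idea of chunked loading can be sketched in a few lines. This is a minimal Python illustration of one approach, not necessarily the exact technique used here. The idea is to read the big file sequentially and hand off fixed-size groups of lines, each of which would be written to a temporary file (or a named pipe) and loaded with its own LOAD DATA INFILE statement, so that each chunk commits as a separate transaction. The chunk size and the `load_chunk` step are hypothetical.

```python
from itertools import islice


def line_chunks(fileobj, lines_per_chunk):
    """Yield lists of at most `lines_per_chunk` lines, reading lazily.

    The source file is never split on disk; only one chunk is held in
    memory at a time.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be positive")
    while True:
        chunk = list(islice(fileobj, lines_per_chunk))
        if not chunk:
            return
        yield chunk


# Usage sketch (load_chunk is hypothetical: write the lines to a temp
# file or FIFO, run LOAD DATA INFILE on it, then COMMIT):
#
# with open("huge_file.txt") as f:
#     for chunk in line_chunks(f, 1_000_000):
#         load_chunk(chunk)
```

Splitting on line boundaries assumes one record per line; if fields can contain embedded newlines, the chunk boundaries would need to respect the file's quoting rules instead.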


June 23, 2008

Neat tricks for the MySQL command-line pager

Posted by Baron Schwartz

How many of you use the mysql command-line client? And did you know about the pager command you can give it? It's pretty useful. It tells mysql to pipe the output of your commands through the specified program before displaying it to you.

Here's the most basic thing I can think of to do with it: use it as a pager. (It's scary how predictable I am sometimes, isn't it?)


May 31, 2008

Is DNS the Achilles heel in your MySQL installation?

Posted by Baron Schwartz

Do you have skip_name_resolve set in your /etc/my.cnf? If not, consider it. DNS works fine, until it doesn't. Don't let it catch you off guard.

Do you really need to restrict MySQL users based on hostnames? If you don't, you should probably disable this feature of MySQL's authentication system. You never know when your hosting provider's DNS (or your own for that matter) will go into the toilet. And when that happens, MySQL mysteriously stops letting users log in, and all kinds of chaos ensues. Worse, it can be kind of hard to know that this is the problem, and diagnosing adds to your downtime.
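Before turning on skip_name_resolve, check whether any accounts depend on hostnames. With name resolution off, the Host column values in the grant tables must be IP addresses or localhost, so grants written against hostnames stop matching. Here is a rough Python sketch, assuming you've fetched the Host values with something like `SELECT DISTINCT Host FROM mysql.user`. The classification rule is a simplification: it treats anything made only of digits, dots, `%`, `_`, and `/`, or containing a colon, as address-based. The sample values are made up.

```python
import re

# Digits, dots, wildcards (% and _) and netmask slashes look like IPv4
# addresses or patterns; a colon suggests an IPv6 address.
ADDRESS_LIKE = re.compile(r"^[0-9.%_/]+$")


def name_based_hosts(hosts):
    """Return Host values that would stop matching under skip_name_resolve."""
    flagged = []
    for host in hosts:
        if host == "localhost" or ":" in host or ADDRESS_LIKE.match(host):
            continue
        flagged.append(host)
    return flagged


# Hypothetical output of SELECT DISTINCT Host FROM mysql.user
hosts = ["localhost", "%", "10.0.0.%", "192.168.1.0/255.255.255.0",
         "::1", "web1.example.com", "%.example.com"]

print(name_based_hosts(hosts))  # ['web1.example.com', '%.example.com']
```

Any accounts it flags need to be re-granted with IP addresses (or dropped) before you restart with skip_name_resolve, or those users will be locked out.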


May 24, 2008

INFORMATION_SCHEMA tables in the InnoDB pluggable storage engine

Posted by Baron Schwartz

Much has been written about the new InnoDB pluggable storage engine, which Innobase released at the MySQL conference last month. We've written posts ourselves about its fast index creation capabilities and the compressed row format, and how they affect performance. One of the nice things they added in this InnoDB release is a set of INFORMATION_SCHEMA tables that show some status information about InnoDB. Here are the tables:

SQL:
  mysql> SHOW TABLES FROM INFORMATION_SCHEMA LIKE 'INNODB%';
  +----------------------------------------+
  | Tables_in_INFORMATION_SCHEMA (INNODB%) |
  +----------------------------------------+
  | INNODB_CMP                             |
  | INNODB_CMP_RESET                       |
  | INNODB_CMPMEM                          |
  | INNODB_CMPMEM_RESET                    |
  | INNODB_LOCK_WAITS                      |
  | INNODB_LOCKS                           |
  | INNODB_TRX                             |
  +----------------------------------------+

The INNODB_CMP* tables show statistics about compression; they contain a lot of useful information about compression, decompression, memory management, fragmentation, and so on. Beware that selecting from the tables whose names end in _RESET has a side effect: it resets the statistics to 0.
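One common use of INNODB_CMP is checking how often compression succeeds: COMPRESS_OPS counts attempts to compress a page and COMPRESS_OPS_OK counts the successful ones, so a low ratio means InnoDB is spending work on compression attempts that fail. A minimal sketch, assuming the rows have been fetched as (page_size, compress_ops, compress_ops_ok) tuples; the numbers are invented. Reading from INNODB_CMP rather than INNODB_CMP_RESET leaves the counters intact.

```python
def compression_success(rows):
    """Map page size -> fraction of compression attempts that succeeded.

    `rows` holds (page_size, compress_ops, compress_ops_ok) tuples taken
    from INFORMATION_SCHEMA.INNODB_CMP. Page sizes with no attempts are
    reported as None instead of dividing by zero.
    """
    result = {}
    for page_size, ops, ops_ok in rows:
        result[page_size] = ops_ok / ops if ops else None
    return result


# Invented sample rows for illustration.
sample = [(1024, 0, 0), (8192, 2000, 1900), (16384, 500, 250)]
for size, ratio in compression_success(sample).items():
    print(size, "n/a" if ratio is None else f"{ratio:.0%}")
```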

There are also tables for transactions and locks (INNODB_TRX, INNODB_LOCKS, and INNODB_LOCK_WAITS). A while ago, the InnoDB developers contacted me to ask my opinion about what would be useful to put in the INFORMATION_SCHEMA. I told them the single biggest thing I could not get from InnoDB at the time was visibility into which transactions are blocking others when there are lock waits. It appears that they agreed this was important to add. (I subsequently discovered that it is possible to find out more information on InnoDB locks even in the older versions of InnoDB, but it's not really easy.)

These tables are fully documented in the InnoDB plugin manual, along with extensive examples of how to use them to find out what is blocking what and so on. Note that the InnoDB plugin manual is being maintained on www.innodb.com, not as part of the regular MySQL manual.
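Once you have the (requesting_trx_id, blocking_trx_id) pairs from INNODB_LOCK_WAITS, a natural next step is finding the transactions at the head of each chain: the ones that block others but aren't waiting themselves. Those are usually the ones worth investigating (or killing). A small Python sketch of that step, with made-up transaction IDs:

```python
def root_blockers(waits):
    """Return blocking transactions that are not themselves waiting.

    `waits` is an iterable of (requesting_trx_id, blocking_trx_id) pairs,
    as found in INFORMATION_SCHEMA.INNODB_LOCK_WAITS.
    """
    waits = list(waits)
    waiting = {req for req, _ in waits}
    blockers = {blk for _, blk in waits}
    return sorted(blockers - waiting)


# Hypothetical chains: C waits on B, B waits on A; E waits on D.
pairs = [("C", "B"), ("B", "A"), ("E", "D")]
print(root_blockers(pairs))  # ['A', 'D']
```

Joining those IDs back to INNODB_TRX (on trx_id) gives you the MySQL thread ID and current query of each root blocker.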

April 22, 2008

How to estimate query completion time in MySQL

Posted by Baron Schwartz

Have you ever run a query in MySQL and wondered how long it'll take to complete? Many people have had this experience. It's not a big deal until the query has been running for an hour. Or a day and a half. Just when IS that query going to finish, anyway?

There are actually a few ways to estimate how long it'll take for the query to complete, depending on what the query is. One of the simplest is to estimate how many rows the query needs to examine, measure how fast it's working, and do the math.

As an example, I recently worked on a customer's site where a typical data-warehousing query needed optimization. It was a fact table joined to two dimension tables -- a classic star schema query. The fact table was very large, and after some tuning (I'll write more about that later) I convinced MySQL to perform the query as a table scan of the fact table, then an index lookup in each dimension table in turn.

The table structures aren't really important. All you need to know for this post is that the fact table has about 150 million rows and the query was taking over 10 minutes to complete. Actually, according to the customer, it had never completed at all. I wanted to know whether I'd be waiting for a few more minutes, a few hours, or days.

Finding out was simple, because nothing else was running on the server. That meant SHOW GLOBAL STATUS gave a rough idea of what the query was actually doing. (If there had been a lot of activity on the server, I wouldn't have been able to say with confidence that SHOW GLOBAL STATUS showed what that one query was doing; activity from other queries would have been mixed in there too. It would be great to be able to choose another thread and watch only its status, but MySQL doesn't currently let you do that.)

The solution was to measure how fast the query was scanning rows in the table scan of the fact table. This is shown by the Handler_read_rnd_next status variable. Here's an easy way to watch it (innotop is another handy way):

CODE:
  mysqladmin extended -r -i 10 | grep Handler_read_rnd_next
  # ignore the first line of output; it is the running total, not a 10-second delta
  | Handler_read_rnd_next             | 429224      |

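Because -r -i 10 reports the change over each 10-second interval, that's 429,224 rows every 10 seconds, so the server was reading roughly 43K rows per second, and there were 150 million rows in the table. Dividing 150 million rows by 43K rows per second gives about 3,488 seconds to completion, or a little less than an hour. And indeed the query completed in about 55 minutes.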
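The same back-of-the-envelope calculation can be written as a small Python helper. It assumes, as above, that the query is a single table scan and that nothing else on the server is moving Handler_read_rnd_next. `rows_done` is an optional refinement, not part of the original measurement, for when you know how far the scan has already progressed.

```python
def seconds_remaining(total_rows, delta, interval_s, rows_done=0):
    """Estimate time left for a table scan from `mysqladmin -r -i` samples.

    total_rows: rows the scan must read in total
    delta:      Handler_read_rnd_next change reported for one interval
    interval_s: the -i interval in seconds
    rows_done:  rows already scanned (optional)
    """
    rate = delta / interval_s  # rows per second
    if rate <= 0:
        raise ValueError("no scan progress observed")
    return (total_rows - rows_done) / rate


est = seconds_remaining(150_000_000, 429_224, 10)
print(f"{est:.0f} s (~{est / 60:.0f} min)")  # 3495 s (~58 min)
```

With the unrounded rate this gives about 3,495 seconds; rounding the rate to 43K rows per second first, as in the text, gives the 3,488 figure. Either way it's a little under an hour.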

This is the simplest case, and there are more complicated ones to consider, but hopefully this gives you an idea how you can tackle this problem in different situations.

April 2, 2008

Stored Function to generate Sequences

Posted by peter

Today a customer asked me to help convert their sequence generation process to a stored function. Even though I had seen one somewhere before, I couldn't find it with two minutes of googling, so I wrote a simple one myself and am posting it here for public benefit (or my own later use) :)
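As a rough illustration of the general table-based approach (not the function from this post), you can keep one row per named sequence and, in a single transaction, increment it and read back the new value. The sketch below uses Python's built-in sqlite3 module only so that it runs anywhere; the table and function names are hypothetical, and in MySQL the same logic would live in a stored function against an InnoDB table.

```python
import sqlite3


def create_sequence_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        " name TEXT PRIMARY KEY,"
        " val INTEGER NOT NULL)"
    )


def nextval(conn, name):
    """Increment the named sequence and return its new value (starts at 1)."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE sequences SET val = val + 1 WHERE name = ?", (name,))
        if cur.rowcount == 0:
            # First use of this name. Under real concurrency two sessions
            # could race here; creating sequences up front avoids that.
            conn.execute(
                "INSERT INTO sequences (name, val) VALUES (?, 1)", (name,))
        row = conn.execute(
            "SELECT val FROM sequences WHERE name = ?", (name,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_sequence_table(conn)
print(nextval(conn, "invoice"), nextval(conn, "invoice"), nextval(conn, "order"))
# 1 2 1
```

Doing the UPDATE before the SELECT inside one transaction matters: the row lock taken by the UPDATE keeps two sessions from reading the same value.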

