July 27, 2007

More Gotchas with MySQL 5.0

Posted by peter

Working on large upgrade of MySQL 4.1 running Innodb to MySQL 5.0 and doing oprofile analyzes we found very interesting issue of buf_get_latched_pages_number being responsible for most CPU usage. It did not look right.

The close look revealed this is the function which is used to compute number of latched pages in Innodb Buffer Pool, which is called when you run SHOW INNODB STATUS and SHOW STATUS. In this case for same of monitoring SHOW GLOBAL STATUS was run every couple of seconds being considered fast operation, as it is in MySQL 4.1

But why buf_get_latched_pages_number ? Because this function traverses whole buffer pool to compute number of latched pages, which is quite slow with buffer pools containing millions of pages. But even worse this is done while holding global buffer pool mutex.

Another function which we’ve seen being responsible for high CPU usage is ha_print_info which also may traverse large arrays.

But this is only one of the gotchas. The second issue we ran into is number of Threads_Running being much higher for MySQL 5.0 compared to MySQL 4.1 It was something like 2-3 for MySQL 4.1 vs 40-60 for MySQL 5.0
At the same time however there were virtually no disk IO and CPU being loaded less than 20% (out of 4 Cores) in all cases which points to some locks/latches being responsible for this high amount of threads.

It turns out this issue is side effect of the first issue described - if many queries are being run and buffer pool is large, a lot of queries pile up waiting on locked innodb buffer pool mutex, so SHOW STATUS shows high number of running threads.

This theory can be confirmed by the fact using mysqladmin processlist | grep -v Sleep shows much smaller value which is close in MySQL 5.0 and 4.1

Hopefully this issue will be fixed sooner or later (I would be even happy to give up Innodb_buffer_pool_pages_latched n SHOW STATUS output

MySQL 5.0 Community Edition Regression

Posted by peter

It is true you might be better of being unaware about problems, because in this case you might not run into them :)
Just couple of days ago Kaj Arno told me there are issues reported with SHOW PROFILE patch available in latest MySQL Community Edition and I’ve been troubleshooting customer affected by one of these very issues today.

The Bug can dramatically increase System CPU usage (about 10 times in this case)

The “fix” was to get MySQL 5.0.44 sources from Dorsal Source which base

July 26, 2007

MySQL on OSCON

Posted by peter

Interesting enough this year people at OSCON do not show to much interest in the MySQL, and Databases for that reason. Our talk comparing performance of MySQL Storage Engines had probably 20-30 people, Monty’s talk on MySQL Source Code had about same number, and as Monty told me he never had so few people on his talk. This is from massive 2600 people attending the conference.

The other Database related talks I visited had similar size of audience - talks on migration To and From MySQL, Josh Berkus talk on Performance Optimization (this one if about full stack), Developing using SQL Lite. On other hand front end related talks such as PHP performance optimization or High Performance JavaScript were attended extremely well.

MySQL BOF supported by bear and pizza had about same number of attendees. With a lot of familiar faces.

MySQL booth was not to active ether - in many cases I could see lonely Kelly sitting on it. But This could be related to the fact there were few MySQL Developers hanging around it most of the time, compared to army of technical guys at PostgreSQL booth. I would say even SolidDB booth was more attended.

I’ve heard comments from some MySQL people this is probably due to the fact all folks really interested in MySQL go to Users Conference. This is surely true but I would expect significant Overlap between “Open Source Guys” and “MySQL Users” and if you’re happen to be on the conference with topic of your interest you’re likely to visit sessions etc. Or may be people think they know they need to know about MySQL already and nothing cool happened from MySQL Users Conference ?

In general my impression PostgreeSQL is attracting more interest on the OSCON than MySQL.

July 25, 2007

MySQL and PostgreSQL SpecJAppServer benchmark results

Posted by peter

Listening to Josh Berkus presentation on OSCON today I decided to take a closer look at SpecJAppServer benchmarks results which were published by PostgreSQL recently and which as Josh Puts it “This publication shows that a properly tuned PostgreSQL is not only as fast or faster than MySQL, but almost as fast as Oracle (since the hardware platforms are different, it’s hard to compare directly).”

If you look at Benchmark Results List you would see MySQL Scores 720.56 and PostgreSQL scores 778.14 JOPS on 12 cores. This seems to show PostgreSQL is some 10% faster, from the glance view.

If you take a closer look you however would notice hardware is different - MySQL benchmark use Sun Fire X4100 available in Nov 2005 using Opteron 285 CPU, PostgreSQL benchmarks use Sun Fire X4200 M2 available in September 2006, using AMD Opteron 2220 SE. We can see some 5% difference in CPU frequency alone (2600Mhz vs 2800Mhz) and it well may be more in terms of performance due to architecture optimizations.

The other big issue is using MySQL 5.0.27 which has Innodb scalability issues, which were a lot improved in 5.0.37 which I’d expect to show significant gains.

This is not to mention specific tuning for Benchmark which could be performed on database engine level. As Josh now seems to be working for Sun I expect there is significant optimizations which could be done (if they are required).

Working a lot on published SpecJAppServer benchmark while still working for MySQL I remember we spot number of issues in MySQL exposed by this benchmark with only few of them fixed in time to make it into the publication.

It is always hard to argue which performance you would get in the benchmarks in the difference conditions but my expectations would be getting at least 800 JOPS on updated software and hardware configuration, so I would argue MySQL is Slower.

On the other hand It does not really matter… Being 10-20% close in terms of performance is close enough for most applications for other properties such as scaling maintenance to become more important. So I would agree with Josh - PostgreSQL is not slow for this kind of applications and can well be used as MySQL alternative.

However such general transactional application (general so it can work with all databases) is not where most of MySQL sweet spots come from. MySQL performance gains usually come from MySQL unique features - non transactional MyISAM, Multi Value inserts, Query Cache, Merge Tables, Archive Tables, Simple Replication. Most of these do not come implicitly, you’ve got to use them explicitly to get performance advantage.

Regarding general features I guess some things are faster others are slower in MySQL. For example PostgreSQL is slower at connection creation and MySQL is very fast, while SubSelect optimization is very bad in MySQL for many cases until MySQL 5.2

July 24, 2007

What exactly is read_rnd_buffer_size

Posted by peter

Looking for documentation for read_rnd_buffer_size you would find descriptions such as “The read_rnd_buffer_size is used after a sort, when reading rows in sorted order. If you use many queries with ORDER BY, upping this can improve performance” which is cool but it does not really tell you how exactly read_rnd_buffer_size works as well as which layer it corresponds to - SQL or storage engine.

Honestly as it had name very similar to read_buffer_size which is currently only used by MyISAM tables I thought read_rnd_buffer_size is also MyISAM only. But talking to Monty today I learned it is not the case.

read_rnd_buffer can be used for All storage engines not only by MyISAM. It is used for some sorts to optimally read rows after the sort. Here is how it works:

As sort is performed it can be performed having only row pointers together with key value - which are offsets for MyISAM and primary key values for Innodb or storing full data which is being retrieved (good for small data lengths).

In case sort with row pointer storage is used and the fields which are being length can be converted to fixed size (basically everything but BLOB/TEXT) MySQL can use read_rnd_buffer to optimize data retrieval - As data is sorted by the key value it needs to be accessed in pretty much random row pointer (typically physical) order. MySQL takes bunch of pointers from sort_buffer (just enough so all rows fit in read_rnd_buffer as they are read) and sorts them by row pointer, when performs reading into read_rnd_buffer in the sorted order - it can be pretty much sequential if you’re lucky.

The read_rnd_buffer_size is important (optimization works in following conditions):

  • Row pointers are stored in the sort_buffer, not the whole data selected
  • Blob/Text columns are not selected
  • A lot of rows are retrieved after sort - if you have LIMIT 10 it is unlikely to help as MySQL will stop fetching rows by
    pointers quickly

For me this means since MySQL 4.1 this option is used in narrow range of cases - if you retrieve few fields (less than max_length_for_sort_data) data should be stored in sort buffer and sort file so there would be no need for read_rnd_buffer, if the selected columns are long so they are longer than max_length_for_sort_data it would frequently mean there are some TEXT/BLOB columns among them. It would be used however if there is large number of columns or there are long VARCHAR columns used - it takes only couple of UTF8 VARCHAR(255) to create a row which is longer than max_length_for_sort_data in its static presentation.

We should do benchmarks sometime to see how it really impacts performance both for MyISAM and Innodb.

July 23, 2007

Sphinx: Going Beyond full text search

Posted by peter

I’ve already wrote a few times about various projects using Sphinx with MySQL for scalable Full Text Search applications. For example on BoardReader we’re using this combination to build search against over 1 billion of forum posts totaling over 1.5TB of data handling hundreds of thousands of search queries per day.

The count of forum posts being large, is however not the largest we’ve got to deal in the project - number of links originating from forum posts is a bit larger number.

The task we had for links is to be able to search for links pointing to given web site as well. The challenge in this case is we do not only want to match links directed to “mysql.com” but links to “www.mysql.com” or “dev.mysql.com/download/” as well as they are all considered to belong to mysql.com domain, while searching for “dev.mysql.com/download”" will only match dev.mysql.com domain and files within /download/ directory.

Initially we implemented it in MySQL using partitioning by domain which link was pointing to. So “mysql.com” links were stored in one table group and “google.co.uk” on another. We still had serious challenges however - as each applies to many search URLS,
such as “dev.mysql.com/download/mysql-5.1.html” would match “mysql.com”, “dev.mysql.com”, “dev.mysql.com/download/” and
“dev.mysql.com/download/mysql-5.1.html” we could not use link=const where clause but had to use link like “prefix%” which means index could not be used to get 20 last links and filesort over millions of links we had to youtube.com wikipedia.org and other top domains was extremely slow. Not to mention counting number of links (and number number of distinct forum sites) pointing to the given URL or graphs showing number of links per day. To fight this problem we had to restrict number of days we allow to cover based on the amount of links to the domain… but for some top domains it was slow even with just 3 days worth of data.

You might point out if we had link_date between X and Y and link like “prefix%” kind of where clause we would not be able to use index past link_date part, it is true so we had to use link_date in ( ) and link like “prefix%” which allows to use both keyparts which is much better but not good enough.

Caching is not good enough in such case as we do not want a single user to wait for minutes. large variety of problematic search urls does not allow to use pre-caching not to mention general load on server such batch processing would put.

The first alternative to this approach was to store duplicate data storing link to “dev.mysql.com/download/mysql-5.1.html” as links to 4 url prefixes I mentioned above. Unfortunately this would blow up data stored quite significantly, requiring in average of 6 rows for each link and it does not solve all the problems - result counting and number of distinct sites were still pretty slow and we did not want to go into creating all this data as summary tables.

Instead we decided to use Sphinx for this kind of task which proved to be extremely good idea. We converted all URLs to search keywords and now these 6 rows become simply one row in sphinx index with 6 “keywords” - specially crafter strings which corresponded to the URLs. Of course we did not store these in the table but instead used UDF to convert URL to list of “keywords” on the fly.

As results we now can pull up results even for youtube.com for fractions of the second and we could show 3 months worth of data for any URLs. (We could use longer time span but we did not have enough memory for Sphinx attribute storage). It is especially great as there is still room for optimization - Sphinx stores word positions in the index, while we do not need them in this case as we’re doing kind of “boolean full text search”. Plus we can make index built sorted by timestamp which would allow to same on sorting which is now still happening.

Using Sphinx such non-traditional way required implementing some features more traditional for SQL databases rather than full text search applications. Group By was added to Sphinx so we could search number of matches per day, or number of matches per language.

For Domain Profile we’ve got to use even more of those features such as counting number of distinct sites pointing to the given url or domain etc. Honestly this is where we cheated a bit and distinct number is bit approximate for large numbers but it still works really well for our needs.

Sure we could use summary tables for domains but it would be a lot of work and raver inflexible if we would like to add some more features and take a lot of resources to update or periodically build for millions of domains.

As this part worked pretty well we also decided to use Sphinx for other part of the web site - Forum Site Profile. This uses some pre-generated data such as number of posts in total for forum or in thread but most of other stuff is built with Sphinx. This also uses fair amount of tricks using fake full text search to retrieve all posts from given web site or forum from the global sphinx index.

So in general we find parallel processing using sphinx pretty good solution for many data crunching needs especially when lack of parallel abilities in MySQL makes its use rather inconvenient and pre-generating data is inconvenient or impossible.

If you’re attending OSCON 2007 and would like to learn more about Sphinx we have a BOF on Thursday to talk on the topic.

July 18, 2007

Microslow patch for 5.1.20

Posted by Maciej Dobrzanski

Microslow patch has been there for some time, but only for earlier MySQL editions such as 4.1 and 5.0. Now it’s also available for the latest 5.1.

Because MySQL went through a lot of internal changes, the patch had to be written from scratch. It introduces some minor change in existing functionality and also adds new.

For the patch to work you must of course enable slow logging with --slow-query-log parameter. MySQL 5.1 has this nice feature which allows you to redirect query log (it’s actually called general log now) and slow log to CSV tables mysql.general_log and mysql.slow_log respectively. The patch will not work as intended with those tables, because there is no space for storing 64bit integers for time with microsecond resolution. Instead you should redirect the output for your logs to the plain old-fashioned files. In MySQL 5.1 it’s done by setting --log-slow-queries=slow.log and --log-output=FILE.

Also you need to configure so called long query time. It’s done either with startup parameter --long_query_time=n or by setting MySQL system variable with SET [SESSION|GLOBAL] long_query_time=n. The value is in microseconds (0 means to log all queries).

The new feature is that patch can allow you to log queries executed by the slave thread. This can be achieved by running MySQL with startup time parameter --log-slow-slave-statements. Please note, that any runtime changes of long_query_time or min_examined_row_limit variables will take effect on the slave thread only after it’s restarted with SLAVE STOP and SLAVE START commands.

The change in functionality is more of a bugfix than anything else. long_query_time value was meant to set time limit in microseconds, but in the end it was always compared to seconds of query execution time. Now it’s fixed, so for example you can log only those queries that run 0.3s or more with long_query_time=300000.

Maciek

How Innodb flushes data to the disk ?

Posted by peter

In my previous post I was a bit wrong giving Innodb some properties it does not have.
In fact Innodb does not currently sort pages in their position order flushing them to disk. Pages to be flushed are instead identified by other means - LRU and pages which contain oldest LSN (so which were flushed longest time ago).

As pages are to be flushed are identified Innodb looks at pages pages N-1 and N+1 and if they are dirty schedules flush for these as well. Such approach allows to merge few pages together and perform flushes in larger blocks.

It is really interesting to see how much sorting would speed things up, ie how much flushing pages 1 10000 200 20000 300 30000 in order 1, 200, 300, 10000, 20000, 30000 will help compared to random order and if it is worth to bother or if optimization Innodb performs is good enough and IO subsystem and RAID can do the rest.

July 17, 2007

Innodb Recovery - Is large buffer pool always better ?

Posted by peter

How does Buffer Pool size affects Innodb Performance ? I always expected the effect to be positive, Innodb with large buffer pool to performing better. Including Recovery of course. I even blogged about it.

It turns out it is not always the case. Last week I was called to help with Innodb crash recovery on box with large amount of memory (64GB) with Innodb Buffer pool configured to be just a bit over 20GB. Innodb recovery was taking long hours with reasonably sized Innodb log files (256M) gradually progressing slower and slower so it was just about 50% after 3 hours with very little disk IO and one CPU 100% busy.

I took oprofile to see what exactly CPU is being spent for and saw the following picture:

samples % app name symbol name
1345161 95.2609 mysqld buf_flush_insert_sorted_into_flush_list
37849 2.6804 no-vmlinux (no symbols)
8865 0.6278 mysql (no symbols)
7423 0.5257 libc-2.4.so (no symbols)
2623 0.1858 mysqld buf_calc_page_new_checksum

So it is inserting pages in the flush list which takes a lot of time…

Innodb is smart about a way it flushes pages, it tries to merge sequential pages together and generally flush things in order. It is not however overly smart inserting pages to the flush list - keeping simple sorted linked list, so being forced to scan good portion of it finding place to insert the page.

So I took a shot at restarting MySQL with much smaller buffer pool size and we got it recovered much faster.

Interesting enough the problem does not seems to happen with all recoveries out where as 20GB buffer pool is not way more than typical size - I keep buffer pool at 12GB on 16GB boxes for example. I guess the problem in this case was - there was enough memory for everything so flush list could grow unrestricted without Innodb bothering to flush anything (I only could see reads while recovery was going with large buffer pool) plus large distinct amount of pages modified so a lot of pages had to be added to the flush list.

I ran into this problem during Innodb recovery and I guess this is when the problem is most visible as Innodb is not doing anything else. I however would guess such inefficient flush list handling can be affecting performance on normal workloads as well.

Lets hope Heikki will find a time to assign this to someone to be fixed soon :)

Update: Heikki is telling me it only happens on recovery which is good, Also this function in fact sorts page not by position but by LSN this is why it is used only on recovery.

July 10, 2007

Silicon Valley Onsite consulting anyone ?

Posted by peter

Last time I was in Silicon Valley in April after MySQL Users Conference, this time I’m planning to spend July 30 - August 2nd in Silicon Valley after OSCON visiting friends and customers. If you’re located in Silicon Valley or San Francisco area and interested in some onsite MySQL Consulting I can offer 1 day and half day visits during these days with no added cost (meaning you just pay for consulting time, and Hotel and Travel is on me)

This time could be well spent to have mini-training on MySQL high performance design operations or development practices, to look into your application architecture and check it against growth requirements or simply look into performance challenges you might have.