July 26, 2014

Another scalability fix in XtraDB

Recent scalability fixes in InnoDB, along with Google's and our SMP fixes, have made InnoDB's results in primary key lookup queries almost acceptable, but secondary indexes were neglected for some time. Now that we have a Dell PowerEdge R900 on board (16 CPU cores, 16GB RAM), I have some time for experiments, and I played with queries


against the table

with a table size of 1 million records, fully fitting in memory. I ran it with innodb_thread_concurrency=16 to match the number of threads inside InnoDB to the number of cores.

The results for InnoDB-plugin-1.0.2 were rather discouraging, dropping off just after 8 connections, so I decided to test MySQL-5.1.30 with the standard InnoDB. It was better, but still far from what we would expect.

After some investigation, Yasufumi pointed to the page_hash mutex, which was being overused. The mutex is taken exclusively even in places where a shared read lock is enough, so we replaced the page_hash mutex with a page_hash read-write lock.
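The patch itself is not shown here, so the following is only a minimal Python sketch of the general idea, under stated assumptions: a toy page hash keyed by (space id, page number), loosely modeled on what InnoDB's page_hash does. The names `RWLock`, `PageHash`, `lookup`, `insert` and `remove` are hypothetical and are not InnoDB's API. Lookups take the lock in shared mode, so many readers can proceed at once, while anything that modifies the table still takes it exclusively. Because of Python's GIL, this shows the locking discipline, not the speedup measured on the R900.

```python
import threading


class RWLock:
    """Minimal writer-preferring reader-writer lock (illustrative, not InnoDB's rw-lock)."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._readers = 0          # threads currently holding the shared lock
        self._writer = False       # True while a thread holds the exclusive lock
        self._writers_waiting = 0  # queued writers; new readers yield to them

    def acquire_read(self):
        with self._cond:
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any writer waiting for readers to drain

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers


class PageHash:
    """Hypothetical page hash: (space_id, page_no) -> buffer block."""

    def __init__(self):
        self._lock = RWLock()
        self._pages = {}

    def lookup(self, space_id, page_no):
        # Read-only path: any number of threads may hold the shared lock together.
        self._lock.acquire_read()
        try:
            return self._pages.get((space_id, page_no))
        finally:
            self._lock.release_read()

    def insert(self, space_id, page_no, block):
        # Modifies the table, so it still needs exclusive access.
        self._lock.acquire_write()
        try:
            self._pages[(space_id, page_no)] = block
        finally:
            self._lock.release_write()

    def remove(self, space_id, page_no):
        # Also a modification: exclusive access.
        self._lock.acquire_write()
        try:
            self._pages.pop((space_id, page_no), None)
        finally:
            self._lock.release_write()
```

With a plain mutex, two threads looking up different pages serialize; with the shared lock, only `insert` and `remove` do. The swap is only safe at call sites that never modify the table (or anything else the old mutex protected) while holding the lock in shared mode.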

You can see the new results on the graph.

The page_hash patch will be included in the next release of XtraDB, so expect new results.

For reference, InnoDB was run with the following parameters:

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, the Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. peter says:

    Hm. Interesting why the plugin would degrade so badly here. I thought most of the scaling issues with the plugin were related to the storage engine lock, which has been fixed by now.

  2. Mark Callaghan says:

    Very nice. Is there a sysbench command line that you used for this test?

  3. Vadim says:

    Mark,

    You need sysbench 0.5 from SVN, the phptest.lua script from lp:~percona-dev/perconatools/phpweb, and this command:

    sysbench --oltp-table-size=1000000 --init-rng=on --rand-init=on --mysql-engine-trx=yes --rand-type=uniform --subtest=READ_KEY_POINT_LIMIT --test=phptest.lua --oltp-table-name=sbtest --max-requests=0 --num-threads=N --mysql-db=sbtest --max-time=180 run
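Since the graph varies the number of connections, one way to reproduce it is to run the command above once per thread count. The following Python sketch only builds and prints the command lines; the option list is copied from the command above (with `--max-requests=0` given once), while `THREAD_COUNTS`, `sysbench_command` and `run_sweep` are hypothetical names, and the thread counts themselves are an assumption, since the exact values used are not listed.

```python
import shlex
import subprocess

# Options copied from the sysbench 0.5 command above; only --num-threads varies.
BASE_OPTIONS = [
    "--oltp-table-size=1000000",
    "--init-rng=on",
    "--rand-init=on",
    "--mysql-engine-trx=yes",
    "--rand-type=uniform",
    "--subtest=READ_KEY_POINT_LIMIT",
    "--test=phptest.lua",
    "--oltp-table-name=sbtest",
    "--max-requests=0",
    "--mysql-db=sbtest",
    "--max-time=180",
]

# Hypothetical sweep: the exact thread counts behind the graph are not given.
THREAD_COUNTS = [1, 2, 4, 8, 16, 32, 64]


def sysbench_command(threads):
    """Return the argument list for one run with the given number of threads."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return ["sysbench", *BASE_OPTIONS, f"--num-threads={threads}", "run"]


def run_sweep(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for n in THREAD_COUNTS:
        cmd = sysbench_command(n)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_sweep()  # dry run by default; pass dry_run=False to actually benchmark
```

Keeping each run as an argument list (rather than one shell string) avoids quoting problems like the mangled dashes that typographic "smart quotes" tend to introduce.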

  4. pat says:

    This is one of those cases where the professional developer in me is wincing. How sure can you really be that a read lock is sufficient? Presumably the original developers opted for the mutex rather than a read lock for one of two reasons:

    1) It was convenient and he was in a hurry/lazy
    2) There’s a subtle reason the mutex is really required

    While option 1 is possible, the innodb guys don’t seem like the lazy/sloppy sort (although I freely admit to never having met them), and I’m kind of assuming they’re looking at scalability issues as well.

    So if turning a few calls from _getMutex into _getLock is all it takes to get a significant bump in throughput, I’m assuming they’d have figured it out already.

  5. Vadim says:

    pat,

    I can't and will not comment on the InnoDB developers; I believe they are a brilliant engineering team.

    Note that the benchmark was done on a 16-core server and the problem really appears at 8+ threads, so it will not be noticeable on servers with 8 or fewer cores. I have no idea what hardware InnoDB uses for internal development and testing. Let me show you the comment from the Google patch http://code.google.com/p/google-mysql-tools/wiki/InnodbIoTuning : “Rate limiting is used to prevent IO done by background threads from using all of the capacity of the server. The limit is based on the assumption that the server can do 100 IOPs. That is rarely true today, so we added a variable to specify the IOPs provided by the server”. This makes me think InnoDB does not take recent hardware into account.

  6. pat says:

    Vadim,

    I hope you're right, because it implies that if the innodb team spent some serious time looking at higher-end configurations, they could get significant throughput increases w/o architectural changes.

    Right now I’m running into cases where I’ve got some very high end boxes in the data center that I’d like to use as innodb servers, but all the scalability benchmarks seem to indicate I’d be better off sticking with 8 core servers leading to the weird configuration where I’ve got 16 core ap

    These days it's hard to buy a commercial-grade server with fewer than 8 cores, or at the very least it makes no financial sense.

    — Pat

  7. peter says:

    Pat,

    I think there is also a lot of history here. When Heikki started InnoDB, he did his work and investigation on a 2-CPU Pentium box or something like it. When something was not a point of contention, I assume he just kept it simple. As we get more cores, a lot of these contentions become exposed. Some of them, in particular the contention on buffer pool pages, have been fixed by the InnoDB team, and I expect more to be fixed soon. In a lot of cases the fixes do not have to be overly complicated, though. The teams at Percona and Google are just ahead of the curve, getting fixes out right now, often for MySQL versions that are currently used in production.

    Note that it is a common mistake to think that developers of popular software should have all the simple issues worked out. Look at MySQL itself: how could it go through the years with a single mutex on the table cache or key cache?

    Look at the Linux kernel and its arguably most popular file system, EXT3, which has inode locking for O_DIRECT, basically serializing all concurrent writes to the same file.

  8. Pat,

    “if the innodb team spent some serious time looking at higher end configurations they could get significant throughput increases…”

    Right. They could IMO. But will they? My take on any unfinished code — InnoDB, Falcon, PBXT, MySQL, Maria, even Maatkit — is this: don’t pin your hopes on code that might exist someday. When it’s done, use it — until then don’t spend resources and time waiting for a solution to your problems. Either find a different solution, or make sure that the one you need actually becomes reality. It’s kind of like Stephen Covey’s circle-of-influence thing, only applied to software choices. A nice thing about GPL software is that you are not utterly dependent on something that’s beyond your control. You can change it yourself or you can sponsor/hire someone else to change it, which a fair number of Percona customers are choosing to do these days.

    What are the real economics around your choice of hardware? How many boxes, multiplied by what economic factors (initial purchase price, performance per dollar thereafter, performance per watt…) are you looking at? Multiply that over a year, and see how much you can afford to sponsor the features and scalability you need to lower your TCO. If you get 2x performance for your workload, how much is that worth to you? How much does it cost to wait and hope? Do you have any reasonable confidence that you know what InnoDB is working on or will release next, or what they even believe is a problem for users?

  9. In the article you've got a little typo: “I decided to test MySQL-5.0.30 with standard InnoDB”; based on the graph, this should be 5.1.30.

  10. Vadim says:

    Daniel,
    Thank you, fixed.

  11. Vadim, Baron,

    thank you for noticing this. We studied the page hash mutex protection when merging the Google scalability patches.

    The question is whether an rw-lock is better than splitting the mutex into several mutexes. Marko is studying this today.

    Best regards,

    Heikki
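Splitting one mutex into several, as Heikki describes, is commonly called lock striping: each key hashes to one of N locks, so threads touching different stripes do not contend. Below is a minimal Python sketch under stated assumptions: a toy page hash keyed by (space id, page number). `StripedPageHash` and `n_stripes` are hypothetical names, and this is not a description of how InnoDB implements it.

```python
import threading


class StripedPageHash:
    """Hypothetical page hash whose single mutex is split into n_stripes mutexes."""

    def __init__(self, n_stripes=16):
        if n_stripes < 1:
            raise ValueError("n_stripes must be >= 1")
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        # One dict per stripe, so each stripe's lock fully protects its own data.
        self._tables = [{} for _ in range(n_stripes)]

    def _stripe(self, space_id, page_no):
        # Every key always maps to the same stripe.
        return hash((space_id, page_no)) % len(self._locks)

    def lookup(self, space_id, page_no):
        i = self._stripe(space_id, page_no)
        with self._locks[i]:
            return self._tables[i].get((space_id, page_no))

    def insert(self, space_id, page_no, block):
        i = self._stripe(space_id, page_no)
        with self._locks[i]:
            self._tables[i][(space_id, page_no)] = block

    def remove(self, space_id, page_no):
        i = self._stripe(space_id, page_no)
        with self._locks[i]:
            self._tables[i].pop((space_id, page_no), None)
```

The trade-off against a single read-write lock: striping lets inserts and removals on different stripes proceed in parallel, but two lookups that land in the same stripe still serialize. The two approaches can also be combined by giving each stripe its own read-write lock.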

  12. Vadim says:

    Heikki,

    That’s great. If you fix it in InnoDB – less work for us :)

  13. Yasufumi says:

    peter,

    Sorry for my late reply.

    I think the InnoDB Plugin has to access page_hash more often than the standard InnoDB because of its new page management system (for data compression).

    So, “5.1.30 InnoDB page_hash-patch” will perform better in this case for now…
