April 21, 2014

HandlerSocket on SSD

We all enjoyed Yoshinori’s announcement of HandlerSocket, the MySQL plugin that opens a NoSQL way to access data stored in InnoDB.
The published results are impressive, but I wanted to understand them better, which is why I ran a couple more experiments.
In his blog post Yoshinori used the case where all data fits into memory, and one question I had was: what if we put the data on SSD (FusionIO 320GB MLC in this experiment), how will that affect throughput? The idea is to check whether it can be a good NoSQL solution with permanent storage.

I should give credit to the HandlerSocket developers: I was able to install it and get it working with Percona Server 5.1.50-12.1 without any issues.

For the experiment I used a Cisco UCS C250 as the server and a Dell PowerEdge R900 as the client, running a single-threaded Perl script with the HandlerSocket client.

The table is a standard sysbench table with 300 million rows, and I sent PK lookup queries to HandlerSocket.
To measure how IO access affects throughput, I varied the number of rows accessed by the script. At 150 million rows the accessed data no longer fits into memory (and the more rows we access, the more IO we have to perform), and at 300 million rows the data size is twice the available buffer pool.
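To make the workload concrete, here is a minimal Python sketch of what such PK lookup requests could look like on the wire. It is not the Perl script used in this benchmark. The request layout follows the HandlerSocket text protocol as I understand it (tab-separated fields, one request per line), and the database name, table name, column list, and helper function names are all assumptions made for illustration. Keys are assumed to be plain integers, so no escaping is applied.

```python
import random

TAB, LF = "\t", "\n"


def open_index_request(index_id, db, table, index, columns):
    """Build an open-index line: P <indexid> <db> <table> <index> <columns>."""
    fields = ["P", str(index_id), db, table, index, ",".join(columns)]
    return TAB.join(fields) + LF


def pk_lookup_request(index_id, key, limit=1, offset=0):
    """Build an equality lookup line: <indexid> = 1 <key> <limit> <offset>."""
    fields = [str(index_id), "=", "1", str(key), str(limit), str(offset)]
    return TAB.join(fields) + LF


def random_lookup_requests(index_id, max_row_id, count, seed=None):
    """Pick `count` uniformly random ids in [1, max_row_id].

    Raising max_row_id widens the working set, which is how the amount
    of IO is varied in this sketch (uniform access is an assumption).
    """
    rng = random.Random(seed)
    return [pk_lookup_request(index_id, rng.randint(1, max_row_id))
            for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical names: database "sbtest", table "sbtest", columns id and k.
    print(repr(open_index_request(0, "sbtest", "sbtest", "PRIMARY", ["id", "k"])))
    for line in random_lookup_requests(0, 150_000_000, 3, seed=1):
        print(repr(line))
```

A real client would send the open-index line once per connection and then reuse the index id for every lookup.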

We compare results with the table located on FusionIO and on a regular RAID10 (8 disks).

The raw results are on the Wiki page.

As you can see, with regular disks you can’t expect good throughput once the data stops fitting into memory, while
with FusionIO it is quite acceptable. With data twice as big as memory, throughput dropped by only about half, which is a pretty decent result.
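A rough way to reason about these numbers is to estimate how many lookups can be served from the buffer pool. The sketch below is a back-of-the-envelope model, not a measurement: it assumes lookups are uniformly random over the accessed rows and that about 150 million rows fit into the buffer pool, which is my reading of the setup above. The function name is made up for illustration.

```python
def expected_hit_ratio(rows_accessed, rows_in_pool):
    """Estimated fraction of lookups served from memory.

    Assumes uniformly random access, so the ratio is simply the share
    of the accessed rows that can be cached at once.
    """
    if rows_accessed <= 0:
        raise ValueError("rows_accessed must be positive")
    return min(1.0, rows_in_pool / rows_accessed)


# Assumption: roughly 150 million rows fit into the buffer pool.
ROWS_IN_POOL = 150_000_000

if __name__ == "__main__":
    for rows in (50_000_000, 150_000_000, 300_000_000):
        ratio = expected_hit_ratio(rows, ROWS_IN_POOL)
        print(f"{rows:>11,d} rows accessed: ~{ratio:.0%} of lookups from memory")
```

Under these assumptions about half of the lookups at 300 million rows have to go to storage, so the speed of each miss largely decides how far throughput falls.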

I am looking to run write benchmarks on HandlerSocket and if the results are good we may include it in our Percona Server distribution.

About Vadim Tkachenko

Vadim leads Percona's development group, which produces the Percona Server and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. Steve says:

    I have also observed the 10,000 qps limit when testing HandlerSocket with both PHP and Perl under certain conditions, i.e. when not persisting the connection / index objects.

    However, I managed 150,000 qps (all cores on db saturated) using 50 client threads when keeping the connection and opened index objects persistent… but this dropped to 10,000 qps if I re-open them in every loop.

    Indeed… if I tried doing MySQLi based PK lookups in Perl/PHP, I hit an almost identical limit (10,500), and then got failed connects…

    So using persistent connection objects for both MySQLi and HandlerSocket access methods I achieved

    40,000 qps via MySQL (avg but fluctuating between 20,000 and 40,000)
    150,000 qps via HandlerSocket (rock solid, no fluctuations).

    And this was with execute_single statements only.

    I suggest you re-run your tests and simply loop the execute_single or execute_multi statements. You should then see some pretty fantastic results. Also try executing through a low-latency link if available….

    //Steve

  2. Andy says:

    Hi,

    In Yoshinori’s post, he reported a result of 750K qps. In your benchmark the performance (when the data set fits in memory) is about 600K qpm, which is about 10K qps.

    The difference between 750K qps and 10K qps is pretty big. Both you and Yoshinori benchmarked lookup by PK. What accounted for this huge difference in performance?

    Also, if you switched from HandlerSocket to regular SQL, what type of performance would you get? Seems like 10K qps is very achievable even using regular SQL interface.

  3. Thanks for testing. I expected you guys would be interested, so before announcing I checked HandlerSocket would work with the latest Percona Server. Great to see you installed without any issues.

  4. Vadim says:

    Andy,

    There are a couple of things:

    - I used a single thread, while Yoshinori’s results are multi-threaded
    - I used a Perl client, while Yoshinori used a C++ client. As far as I can see, Perl adds a lot of overhead, which is one
    reason why you get worse results with Perl.

  5. peter says:

    Vadim, Yoshinori,

    I’m wondering what this number of 600K really means, especially when we talk about a single socket and a single thread.
    A 1GB network has a round-trip time which would not allow anything close to this. If we are comparing batched accesses, the good question
    is how many keys we access per batch (and hence per round trip), and it should also be compared to SELECT with an IN clause in MySQL
    and multi_get in memcache, shouldn’t it?

  6. Victor says:

    Vadim,

    Your Cisco server has 346GB RAM and 320GB FusionIO. How could you possibly load data from a 320GB FusionIO card that `does not fit into 346GB memory`?

  7. Vadim says:

    Victor,

    To emulate IO I used

    innodb_buffer_pool_size=35G
    innodb_flush_method = O_DIRECT

    That is, only 35G was available to InnoDB.

  8. Ken says:

    Peter, He was using 1 quad-port NIC (Broadcom NetXtreme II BCM5709) and used 3 ports of it. He also said that “Talking about NIC port to CPU core ratio, so far I think 2 cpu cores per NIC port is enough for HandlerSocket read-only workloads. %us+%sy was almost 100% when using 3 ports / 8 cores so we can’t expect much higher performance by just increasing ports.”

    There was also an interesting comment from a user:
    “NICs is you may be able to get a lot more speed if you multiplex the IRQ channels of NICs that support Tx/Rx queues. Here is a post (http://bit.ly/dkugV0) where I got the NOSQL datastore redis to go from about 220 qops/s to 420K qop/s w/ a single NIC (that has Tx/Rq queues) and a quadcore @ 3.0 Ghz.”

    Fetching multiple rows in a single round trip is supported as well, but no specific number was given:
    “multi_get operations (similar to IN(1,2,3..), fetching multiple rows via single network round-trip) are also supported.”

    The gain comes basically from bypassing table open/close and SQL parsing.
