August 30, 2014

Handling big result sets

Sometime it is needed to handle a lot of rows on client side. Usual way is send query via mysql_query and than handle the result in loop mysql_fetch_array (here I use PHP functions but they are common or similar for all APIs, including C).
Consider table:

And simple PHP script which retrieves all rows from the table:

Inital execution time is : 99.04 s.

That’s a bit long and can be improved. How? Let’s look mysql_query function: http://www.php.net/manual/en/function.mysql-query.php. Unfortunately there is nothing interesting for our current task excluding link to mysql_unbuffered_query and this link says:

mysql_unbuffered_query() sends a SQL query query to MySQL, without fetching and buffering the result rows automatically, as mysql_query() does.

So mysql_query buffers all rows on client side. Let’s try our script with mysql_unbuffered_query instead of mysql_query.
execution time: 54.34 s
Good improvement.

The identical results we can get with mysqi_ family. There is mysqli_query function which gets two parameters and second one is result mode : MYSQLI_USE_RESULT or MYSQLI_STORE_RESULT. (Also there can be used pair mysql_real_query / mysql_use_result or mysql_store_result, which are the same). mysqli_query with MYSQLI_STORE_RESULT (used by default) is equal mysql_query, and with MYSQLI_USE_RESULT – is equal to mysql_unbuffered_query.
Results:
MYSQLI_STORE_RESULT: 97.83s
MYSQLI_USE_RESULT: 52.89 s

Next thing which can help is prepared statements. Why?
Let me quote Peter:

If prepared statements are not used server needs to convert millions of values from int to the
string and client has to do reverse conversion. This takes time. Furthemore string data takes more space which also slow down things a bit.

So script with prepared statements:

Please note prepared statements also can use $stmt->store_result(); to buffer the result on client side.
The results:

with $stmt->store_result(): 55.20s
without $stmt->store_result(): 39.77s.

So unbuffered query improves execution time by 1.83 times, and unbuffered query with prepared statement protocol by 2.53 times.
It is often though mysql_store_result is faster than mysql_use_result, as you can see in certain conditions it could be opposite.
Prepared statements are typically considered for performance reasons if statement needs to be executed several time – as you can see there is other cases as well – if you have query which retrieves a lot of data from MySQL prepared statements may also perform better, probably because of binary protocol and so no data conversion to string as with standard protocol

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Clould Tools, the Percona Server, Percona XraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. eric says:

    Thank you for this explaination.

    I won more than 30 seconds on one request

  2. Robert Hopson says:

    I noticed you’re connecting to localhost; do you think that in many environments there is external db server, and the buffering might more positively affect performance?

    I’d be happy to help test this if you wanted to share your test database.

  3. Vadim says:

    Robert,

    Sure,
    here is data: http://mysql.apachephp.com/files/longf.sql.gz
    Yes, buffering might a have positive effect in case with distributed server – but how much it should be tested.

  4. P. Kuiper says:

    Hi Vadim,

    Thanks for pointing this out. I tried this on my database and I came up with the following results:

    mysql_query:
    Number of rows: 1000000. execution time: 24.53382897377 s

    mysql_unbuffered_query: (same code, only changing mysql_query to mysql_unbuffered_query)
    Number of rows: 1000000. execution time: 16.347587108612 s

    I repeated this a few times and the above times seem to be persistent, ie almost the same for every run.

    Now here is the kicker:

    Using the new mysqli and prepared statement based on the code you provided:

    Number of rows: 1000000. execution time: 1.9737339019775 s

    BTW my table (InnoDB) has 10+ million records in there. The query I used:
    SELECT * FROM current LIMIT 1000000;

    I had to limit it to 1 million because it didn’t really like the whole 10 million :) (The waiting on the result drove me nuts)

    Though I have to admit, the first time I ran this code it took 14 seconds. After a rerun it was around 1.9 to 2.0 s. Does mysql cache these prepared statements perhaps? Either way the performance benefit is worthwhile :)

    Thank you for pointing this out. The code used for the last test is actually a few lines more and has somewhat more complicated due to the statement preparing etc. This shows the shortest and easiest path sure isn’t always the best :)

    Is there any reason you used the OO style in your example instead of the procedural style? Myself, I used the procedural style.

    Regards,

    Peter

  5. P. Kuiper says:

    Oh, my MySQL is connected to my webserver through a direct (cross) cable. So no connecting to localhost :)

  6. Apachez says:

    This does also exist in the perl api for mysql (dbi/dbd) and can be used as:

    $dbh->{‘mysql_use_result’}=0; #disable
    $dbh->{‘mysql_use_result’}=1; #enable

    and since mysql 4.1 server based prepares can be used with:

    $dbh->{‘mysql_server_prepare’} = 1;

    however the server based prepares will always use store_result instead of use_result for some reason according to some docs/forumposts.

    One note, in the perl documentation it is said that use_result tends to block other processes. Is this also the case with php?

    If Vadim has the possibility, could you please test the benchmark with say 10 och 100 concurrent runs of this testscript with use_result vs with store_result to see if some blocking migh occur which will make use_result slower than store_result for multiuser environments (like if you have more than 1 concurrent mysql query) ?

  7. Vadim says:

    Peter Kuiper,

    Thank you! Great results!

    I think even better results you can get with native C API and prepared statements.
    It looks like in PHP binded variables have additional processing, but in C they are mapped to native
    C variables and result is fetched without overhead.

    About OO – no special reason. I just copy-pasted parts of code from PHP manual.
    In common I also prefer procedural way, as historically OO works slow in PHP.

  8. Vadim says:

    Apachez,

    mysql_use_result blocks other threads in next sense
    (qoute from http://dev.mysql.com/doc/refman/5.0/en/mysql-use-result.html)

    On the other hand, you shouldn’t use mysql_use_result() if you are doing a lot of processing for each row on the client side, or if the output is sent to a screen on which the user may type a ^S (stop scroll). This ties up the server and prevent other threads from updating any tables from which the data is being fetched.

    Also other drawbacks of mysql_use_result:
    When using mysql_use_result(), you must execute mysql_fetch_row() until a NULL value is returned, otherwise, the unfetched rows are returned as part of the result set for your next query. The C API gives the error Commands out of sync; you can’t run this command now if you forget to do this!

    You may not use mysql_data_seek(), mysql_row_seek(), mysql_row_tell(), mysql_num_rows(), or mysql_affected_rows() with a result returned from mysql_use_result(), nor may you issue other queries until mysql_use_result() has finished.

  9. peter says:

    Vadim,

    This problem of mysql_use_result mainly applies to MyISAM tables as table locks will be held for long time. Also it only applies to the cases then temporary table is not used to store full result set before sending it. For Innodb tables it is practically never problem due to onsistent reads for MyISAM tables it can be worked around by using SQL_BUFFER_RESULT option… It does buffering in temporary table. It is likely to kill all speed benefit on mysql_use_result however :)

  10. Robert Hopson says:

    I ran this code on my MySQL 4.1 server, connecting remotely. Averages for several runs:

    Initial Time: 73.3s
    Unbuffered: 59.3s

    Using MySQLi: 39.46
    MySQLi w/o store_result(): 26.44

    So it would seem that your findings hold true for remote servers. Perhaps on a more congested network there would be a point when buffering would provide some benefit?

    Thanks for taking the time to investigate this.

  11. Vadim says:

    Robert,

    I think you are right about congested networks – I’d expect on connection with big latency the buffering makes sence. Maybe I will test it somehow :)

  12. Marc says:

    When do I need MYSQLI_STORE_RESULT and $stmt->store_result()

    Is there only a benefit if I receive many rows or are they useful for simple selects with LIMIT 10 or similar?

    What are the disadvantages?

    regards

Speak Your Mind

*