Handling big result sets

Sometime it is needed to handle a lot of rows on client side. Usual way is send query via mysql_query and than handle the result in loop mysql_fetch_array (here I use PHP functions but they are common or similar for all APIs, including C).
Consider table:

CREATE TABLE `longf` (
  `f1` int(11) NOT NULL auto_increment,
  `f2` date default NULL,
  `f3` date default NULL,
  `f4` varchar(14) default NULL,
  `f5` varchar(6) default NULL,
  `f6` date default NULL,
  `f7` smallint(6) default NULL,
  `f8` smallint(6) default NULL,
  `f9` varchar(13) default NULL,
  `f10` varchar(39) default NULL,
  `f11` int(11) default NULL,
  `f12` float default NULL,
  `f13` int(11) default NULL,
  `f14` smallint(6) default NULL,
  `f15` varchar(39) default NULL,
  `f16` date default NULL,
  `f17` smallint(6) default NULL,
  `f18` int(11) NOT NULL,
  `f19` date default NULL,
  `f20` date default NULL,
  `f21` varchar(14) default NULL,
  `f22` varchar(6) default NULL,
  `f23` date default NULL,
  `f24` smallint(6) default NULL,
  `f25` smallint(6) default NULL,
  `f26` varchar(13) default NULL,
  `f27` varchar(39) default NULL,
  `f28` int(11) default NULL,
  `f29` float default NULL,
  `f30` int(11) default NULL,
  `f31` smallint(6) default NULL,
  `f32` varchar(39) default NULL,
  `f33` date default NULL,
  `f34` smallint(6) default NULL,
  `f35` int(11) NOT NULL,
  PRIMARY KEY  (`f1`)
);

select count(*) from longf;
+----------+
| count(*) |
+----------+
|  5242880 | 
+----------+

CREATE TABLE `longf` (

`f1` int(11) NOT NULL auto_increment,

`f2` date default NULL,

`f3` date default NULL,

`f4` varchar(14) default NULL,

`f5` varchar(6) default NULL,

`f6` date default NULL,

`f7` smallint(6) default NULL,

`f8` smallint(6) default NULL,

`f9` varchar(13) default NULL,

`f10` varchar(39) default NULL,

`f11` int(11) default NULL,

`f12` float default NULL,

`f13` int(11) default NULL,

`f14` smallint(6) default NULL,

`f15` varchar(39) default NULL,

`f16` date default NULL,

`f17` smallint(6) default NULL,

`f18` int(11) NOT NULL,

`f19` date default NULL,

`f20` date default NULL,

`f21` varchar(14) default NULL,

`f22` varchar(6) default NULL,

`f23` date default NULL,

`f24` smallint(6) default NULL,

`f25` smallint(6) default NULL,

`f26` varchar(13) default NULL,

`f27` varchar(39) default NULL,

`f28` int(11) default NULL,

`f29` float default NULL,

`f30` int(11) default NULL,

`f31` smallint(6) default NULL,

`f32` varchar(39) default NULL,

`f33` date default NULL,

`f34` smallint(6) default NULL,

`f35` int(11) NOT NULL,

PRIMARY KEY (`f1`)

);

select count(*) from longf;

+----------+

| count(*) |

+----------+

| 5242880 |

+----------+

And simple PHP script which retrieves all rows from the table:

< ?php
$link = mysql_connect('localhost', 'root', '')
   or die('Could not connect: ' . mysql_error());
mysql_select_db('test') or die('Could not select database');


$time_start = microtime(true);

// Performing SQL query
$query = 'SELECT * FROM longf';
$result = mysql_query($query) or die('Query failed: ' . mysql_error());

// retrive result
while ($line = mysql_fetch_array($result, MYSQL_NUM)) {
}

// Free resultset
mysql_free_result($result);

$time_end = microtime(true);
$time = $time_end - $time_start;

echo "execution time: ".$time." s";


mysql_close($link);
?>

< ?php

$link = mysql_connect('localhost', 'root', '')

or die('Could not connect: ' . mysql_error());

mysql_select_db('test') or die('Could not select database');

$time_start = microtime(true);

// Performing SQL query

$query = 'SELECT * FROM longf';

$result = mysql_query($query) or die('Query failed: ' . mysql_error());

// retrive result

while ($line = mysql_fetch_array($result, MYSQL_NUM)) {

}

// Free resultset

mysql_free_result($result);

$time_end = microtime(true);

$time = $time_end - $time_start;

echo "execution time: ".$time." s";

mysql_close($link);

Inital execution time is : 99.04 s.

That’s a bit long and can be improved. How? Let’s look mysql_query function: http://www.php.net/manual/en/function.mysql-query.php. Unfortunately there is nothing interesting for our current task excluding link to mysql_unbuffered_query and this link says:

mysql_unbuffered_query() sends a SQL query query to MySQL, without fetching and buffering the result rows automatically, as mysql_query() does.

So mysql_query buffers all rows on client side. Let’s try our script with mysql_unbuffered_query instead of mysql_query.
execution time: 54.34 s
Good improvement.

The identical results we can get with mysqi_ family. There is mysqli_query function which gets two parameters and second one is result mode : MYSQLI_USE_RESULT or MYSQLI_STORE_RESULT. (Also there can be used pair mysql_real_query / mysql_use_result or mysql_store_result, which are the same). mysqli_query with MYSQLI_STORE_RESULT (used by default) is equal mysql_query, and with MYSQLI_USE_RESULT – is equal to mysql_unbuffered_query.
Results:
MYSQLI_STORE_RESULT: 97.83s
MYSQLI_USE_RESULT: 52.89 s

Next thing which can help is prepared statements. Why?
Let me quote Peter:

If prepared statements are not used server needs to convert millions of values from int to the
string and client has to do reverse conversion. This takes time. Furthemore string data takes more space which also slow down things a bit.

So script with prepared statements:

< ?php
$mysqli = new mysqli("localhost", "root", "", "test");

/* check connection */
if (mysqli_connect_errno()) {
   printf("Connect failed: %s\n", mysqli_connect_error());
   exit();
}

$query = 'SELECT * FROM longf';


$time_start = microtime(true);

if ($stmt = $mysqli->prepare($query)) {

   /* execute query */
   $stmt->execute();
   $stmt->bind_result($n1, $n2,$n3, $n4,$n5, $n6,$n7, $n8,$n9, $n10,$n11, $n12,$n13, $n14,$n15, $n16,$n17, $n18,$n19, $n20,$n21, $n22, $n23, $n24,$n25, $n26,$n27, $n28,$n29, $n30,$n31, $n32, $n33, $n34, $n35);

   /* store result */

  $stmt->store_result();
   
  while ($stmt->fetch()) {
   }

   printf("Number of rows: %d.\n", $stmt->num_rows);

   /* free result */
   $stmt->free_result();

   /* close statement */
   $stmt->close();
}



$time_end = microtime(true);
$time = $time_end - $time_start;

echo "execution time: ".$time." s";

/* close connection */
$mysqli->close();
?>

< ?php

$mysqli = new mysqli("localhost", "root", "", "test");

/* check connection */

if (mysqli_connect_errno()) {

printf("Connect failed: %s\n", mysqli_connect_error());

exit();

}

$query = 'SELECT * FROM longf';

$time_start = microtime(true);

if ($stmt = $mysqli->prepare($query)) {

/* execute query */

$stmt->execute();

$stmt->bind_result($n1, $n2,$n3, $n4,$n5, $n6,$n7, $n8,$n9, $n10,$n11, $n12,$n13, $n14,$n15, $n16,$n17, $n18,$n19, $n20,$n21, $n22, $n23, $n24,$n25, $n26,$n27, $n28,$n29, $n30,$n31, $n32, $n33, $n34, $n35);

/* store result */

$stmt->store_result();

while ($stmt->fetch()) {

}

printf("Number of rows: %d.\n", $stmt->num_rows);

/* free result */

$stmt->free_result();

/* close statement */

$stmt->close();

}

$time_end = microtime(true);

$time = $time_end - $time_start;

echo "execution time: ".$time." s";

/* close connection */

$mysqli->close();

Please note prepared statements also can use $stmt->store_result(); to buffer the result on client side.
The results:

with $stmt->store_result(): 55.20s
without $stmt->store_result(): 39.77s.

So unbuffered query improves execution time by 1.83 times, and unbuffered query with prepared statement protocol by 2.53 times.
It is often though mysql_store_result is faster than mysql_use_result, as you can see in certain conditions it could be opposite.
Prepared statements are typically considered for performance reasons if statement needs to be executed several time – as you can see there is other cases as well – if you have query which retrieves a lot of data from MySQL prepared statements may also perform better, probably because of binary protocol and so no data conversion to string as with standard protocol

12 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Robert Hopson

17 years ago

I noticed you’re connecting to localhost; do you think that in many environments there is external db server, and the buffering might more positively affect performance?

I’d be happy to help test this if you wanted to share your test database.

Vadim Tkachenko

Author

17 years ago

Robert,

Sure,
here is data: http://mysql.apachephp.com/files/longf.sql.gz
Yes, buffering might a have positive effect in case with distributed server – but how much it should be tested.

P. Kuiper

17 years ago

Hi Vadim,

Thanks for pointing this out. I tried this on my database and I came up with the following results:

mysql_query:
Number of rows: 1000000. execution time: 24.53382897377 s

mysql_unbuffered_query: (same code, only changing mysql_query to mysql_unbuffered_query)
Number of rows: 1000000. execution time: 16.347587108612 s

I repeated this a few times and the above times seem to be persistent, ie almost the same for every run.

Now here is the kicker:

Using the new mysqli and prepared statement based on the code you provided:

Number of rows: 1000000. execution time: 1.9737339019775 s

BTW my table (InnoDB) has 10+ million records in there. The query I used:
SELECT * FROM current LIMIT 1000000;

I had to limit it to 1 million because it didn’t really like the whole 10 million 🙂 (The waiting on the result drove me nuts)

Though I have to admit, the first time I ran this code it took 14 seconds. After a rerun it was around 1.9 to 2.0 s. Does mysql cache these prepared statements perhaps? Either way the performance benefit is worthwhile 🙂

Thank you for pointing this out. The code used for the last test is actually a few lines more and has somewhat more complicated due to the statement preparing etc. This shows the shortest and easiest path sure isn’t always the best 🙂

Is there any reason you used the OO style in your example instead of the procedural style? Myself, I used the procedural style.

Regards,

Peter

P. Kuiper

17 years ago

Oh, my MySQL is connected to my webserver through a direct (cross) cable. So no connecting to localhost 🙂

Apachez

17 years ago

This does also exist in the perl api for mysql (dbi/dbd) and can be used as:

$dbh->{‘mysql_use_result’}=0; #disable
$dbh->{‘mysql_use_result’}=1; #enable

and since mysql 4.1 server based prepares can be used with:

$dbh->{‘mysql_server_prepare’} = 1;

however the server based prepares will always use store_result instead of use_result for some reason according to some docs/forumposts.

One note, in the perl documentation it is said that use_result tends to block other processes. Is this also the case with php?

If Vadim has the possibility, could you please test the benchmark with say 10 och 100 concurrent runs of this testscript with use_result vs with store_result to see if some blocking migh occur which will make use_result slower than store_result for multiuser environments (like if you have more than 1 concurrent mysql query) ?

Vadim Tkachenko

Author

17 years ago

Peter Kuiper,

Thank you! Great results!

I think even better results you can get with native C API and prepared statements.
It looks like in PHP binded variables have additional processing, but in C they are mapped to native
C variables and result is fetched without overhead.

About OO – no special reason. I just copy-pasted parts of code from PHP manual.
In common I also prefer procedural way, as historically OO works slow in PHP.

Vadim Tkachenko

Author

17 years ago

Apachez,

mysql_use_result blocks other threads in next sense
(qoute from http://dev.mysql.com/doc/refman/5.0/en/mysql-use-result.html)

On the other hand, you shouldn’t use mysql_use_result() if you are doing a lot of processing for each row on the client side, or if the output is sent to a screen on which the user may type a ^S (stop scroll). This ties up the server and prevent other threads from updating any tables from which the data is being fetched.

Also other drawbacks of mysql_use_result:
When using mysql_use_result(), you must execute mysql_fetch_row() until a NULL value is returned, otherwise, the unfetched rows are returned as part of the result set for your next query. The C API gives the error Commands out of sync; you can’t run this command now if you forget to do this!

You may not use mysql_data_seek(), mysql_row_seek(), mysql_row_tell(), mysql_num_rows(), or mysql_affected_rows() with a result returned from mysql_use_result(), nor may you issue other queries until mysql_use_result() has finished.

Peter Zaitsev

Admin

17 years ago

Vadim,

This problem of mysql_use_result mainly applies to MyISAM tables as table locks will be held for long time. Also it only applies to the cases then temporary table is not used to store full result set before sending it. For Innodb tables it is practically never problem due to onsistent reads for MyISAM tables it can be worked around by using SQL_BUFFER_RESULT option… It does buffering in temporary table. It is likely to kill all speed benefit on mysql_use_result however 🙂

Robert Hopson

17 years ago

I ran this code on my MySQL 4.1 server, connecting remotely. Averages for several runs:

Initial Time: 73.3s
Unbuffered: 59.3s

Using MySQLi: 39.46
MySQLi w/o store_result(): 26.44

So it would seem that your findings hold true for remote servers. Perhaps on a more congested network there would be a point when buffering would provide some benefit?

Thanks for taking the time to investigate this.

Vadim Tkachenko

Author

17 years ago

Robert,

I think you are right about congested networks – I’d expect on connection with big latency the buffering makes sence. Maybe I will test it somehow 🙂

Marc

14 years ago

When do I need MYSQLI_STORE_RESULT and $stmt->store_result()

Is there only a benefit if I receive many rows or are they useful for simple selects with LIMIT 10 or similar?

What are the disadvantages?

regards

TimEricsson

17 years ago

Thank you for this explaination.

I won more than 30 seconds on one request

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Handling big result sets

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Not-So-Good Practices

Valkey/Redis: Configuration Best Practices

Valkey/Redis: The Hash Datatype

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Handling big result sets

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Not-So-Good Practices

Valkey/Redis: Configuration Best Practices

Valkey/Redis: The Hash Datatype

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation