August 1, 2014

MySQL Prepared Statements

If you care about archiving best performance in your application using MySQL you should learn about prepared statements. These do not neccesary provide performance beneft but they may, they also have other benefits.

As a quick introduction – before MySQL 4.1 there were only textual statements and textual protocol for data transfer – query was sent as text and result returned back as text. For example number 123 would be sent as string “123″. Such protocol had serious performance implication – queries had to be parsed fully each time, all return values had to be converted to the strings on server side and back on the client side, which is pretty expensive especially for certain data types. Furthermore BLOBs require escaping as not all characters could be used in textual protocol, which not only consumed time but also required extra memory consumption both on server and client.

So in MySQL 4.1 Prepared statement came. Do not mix these native Server Side Prepared Statements with emulated prepared statements which JDBC, ODBC and some other drivers had forever. Now query passed to the server contained placeholders for parameters, for example “select name from user where id=?” and executing query with different parameters did not require passing full query to the server and full parsing. (Note: Optimization phase currently remains). Not only this is faster if you execute statement serveral times but also it is more secure – it saves you from many variants of “SQL Injection” – passing malformed data so query changes it meaning.

You also can bind direct variables to return columns – which means not only data does not need to be converted back and forth but you also save extra data copying from row structure to your variables.

So how much extra performance can you get by using prepared statements ? Results can vary. In certain cases I’ve seen 5x+ performance improvements when really large amounts of data needed to be retrieved from localhost – data conversion can really take most of the time in this case. It could also reduce performance in certain cases because if you execute query only once extra round trip to the server will be required, or because query cache does not work.

I’ve done a simple benchmark (using SysBench) to see performance of simple query (single row point select) using standard statement, prepared statement and have it served from query cache. Prepared statements give 2290 queries/sec which is significantly better than 2000 with standard statements but it is still well below 4470 queries/sec when results are served from query cache.

So there are good reasons to use prepared statements:

  1. Save on query parsing
  2. Save on data conversion and copying
  3. Avoid SQL Injection
  4. Save memory on handling blobs

There are also drawbacks and chewats of using prepared statements:

  1. Query cache does not work
  2. Extra server round trip required if statement used only once
  3. Not all statements can be prepared. So you can’t use prepared API exclusively you’ll need to fall back to normal API for some statements
  4. Newer and sometimes buggy code. I had a lot of problems with PHP prepared statements. It is getting better but still it is less mature than standard API
  5. You can’t use placeholders in place of all identifiers. For example you can’t use them for table name. In certain version it even does not work for LIMIT boundaries
  6. Inconvenient list handling. Unlike in for example PEAR emulated prepard statements there is no nice way to pass list of values to IN
  7. Harder tracing. Logs were now fixed to include full statement text not only “Execute” but in SHOW INNODB STATUS you would still see statements without actual values – quite inonvenient for analyses.

Few notes:

There is no statement cache – You can allocate multiple copies of same prepared statement and they will each use separate structures on the server. It does not matter if you do it from same connection or multiple connections.

Do not forget to close prepared statements – Many memory leaks reported in MySQL Server turned out to be prepare statements or cursors which were forgotten to be closed. Watch Com_stmt_prepare and Com_stmt_close to see if you’re closing all prepared statements. In newer versions you can also use prepared_stmt_count variable to track number of open statements diretly. You can also adjust max_prepared_stmt_count variable which limits how many statements can be open at the same time to avoid overload.

Check execution rate You can take a look at Com_stmt_execute to Com_stmt_prepare ratio to see how many times each statement is executed. If it is executed only once standard statements might be better for you, or might be you just need to cache prepared statements in your application (avoid closing and recycle). Note in some cases it still can be better to use prepared statements even if you use each only once due to data conversion.

Try them if API supports fast switch The great thing I love about Java MySQL API is it can switch between prepared and non-prepared statements transparently (in most cases). So you can try one way and another and see which one works best. Some other API have the same advantage.

In general I hope there is great future ahead for MySQL Prepared statements – many of the problems such as extra round trip, non working query cache or non working prepared statements for some of the statements as well as most of other problems can be fixed. MySQL Development team just need some resources allocated to complete this feature.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Could you explain why it is that the query cache is bypassed?

    Will this ever be fixed?

    Luckily Tailrank doesn’t ever need the query cache because we use an HTTP proxy and memcached which totally removes the DB from the equation.

    Kevin

  2. peter says:

    Kevin,

    Query cache is simply not implemented for prepared statements which use different protocol and would need a bit different algorithm for caching – such as parameter values need to be stored together with query text.

    I honestly do not think it will be fixed soon if ever. Prepared statements were added in 4.1 and 5.1 beta does not have the fix still.

    I guess one of the reason is – simple applications do not use prepared statements and more complex ones use more efficient caching, such as memcached :)

  3. Apachez says:

    Ehm I found that not true.

    I use only prepared statements (prepared in perl dbi/dbd, no server prepares) on my site (http://www.tbg.nu) and the query cache efficiency is at around 55-60%. This will bring me both speed and security through placeholders.

    Also the server based prepares has some issues If im not mistaken regarding non SELECT/INSERT/UPDATE/DELETE queries. But I dont know if that has to do with perl dbi/dbd or mysql server. The issues occurs when using $dbh->{‘mysql_server_prepare’} = 1;

  4. Apachez says:

    Uhh nevermind, that nap seems to be healthy :p

    I see now that this blogitem speaks about serverbased prepares…

  5. peter says:

    Yeh Apachez,

    That are frequently mixed, especially on APIs such as Java, Perl, or ODBC which do prepared statement emulation on client if server supports it. On C or PHP it is hard to make a mistake as API is different, unless you use wrappers of course.

  6. Thanks for a great post on prepared statements. I was very surprised to read that prepared statements do not use a statement cache. We currently have a great hit ration on our query cache, and rarely loop over insert statements, so it’s good to know that prepared statements would not do anything beneficial for our performance there.

  7. Jeff says:

    It’s quite silly that prepared statments don’t allow any caching. A prepared statment would often be cached just as inteligently as using memcached precisely because your caching based solely on the paramaters.

  8. peter says:

    Jeff,

    This is fixed in MySQL 5.1

    But you’re right it should have done long ago

  9. Some micro-benchmarks on this subject, using JDBC:

    http://euedge.com/blog/2007/11/11/prepared-statement-performance-in-mysql/

  10. selvakumar micheal says:

    Hi, i need an urgent help, In mysql Without using the prepare statement how to assign dynamicaly a table name in a procedure. the sample coding is given bellow.

    create procedure table_name(x varchar(100))
    begin
    select * from x;
    end;|

    call table_name(‘books’);

    It shows an error message table x doesnot existt

    Any one can help me how to solve this problem

  11. Bimal says:

    Whatever be the case, the prepared statements are always good, if supported. Consider security, against the speed!

  12. Scott Marlowe says:

    Bimal, that only works if you’re not building HUGE sites. If you’re building amazon.com part II, then you’ll have to go with the way that runs fast enough to handle the load. Web sites so big they have to have multiple IPs because you run out of ports on the front end web servers, and with a farm of db servers. For the CMS portion of a site like that you’d need to add something like memcached if you stuck with prepared statements. But the transactional side would be better off in innodb tables with transactional semantics over it. For that prepared statements would be a perfect match.

    So I disagree that prepared statements are always good. If they’re 1,000 times slower than text statements, it’s likely you’ll switch to text rather than buy a 2048 CPU machine with 128G of ram or something like that, or a farm of equivalent size. When handling money definitely. When handling slashdot, not so much.

  13. Jim Lyons says:

    I devised a benchmark, comparing updating a table using a statement-at-a-time versus using a prepare/execute pair. The table had a million rows and each row was updated.

    The statement-at-a-time file looked like:

    reset query cache;
    update company set data_source_id = data_source_id – 1 where id = 12256006995591;
    update company set data_source_id = data_source_id – 1 where id = 120749009859430;
    .
    . a million rows of updates
    .

    The prepare/execute file looked like:

    reset query cache;
    prepare upd from
    “update company set data_source_id = data_source_id + 1 where id = ?”;
    set @id = 12256006995591;
    execute upd using @id;
    set @id = 120749009859430;
    execute upd using @id;
    .
    . a million rows repeating set @id = … execute upd using @id
    .

    I had thought the prepare/execute file would run faster, since supposedly mysql had already done a hard parse on the command. But it ran significantly slower (about 25%) than the statement-at-a-time version. This was consistent over several tries. Also, I tried one version using Perl DBI and its prepare and execute commands and it just did the same data in half the time of the statement-at-a-time version.

    There’s not any likelihood of extraneous events screwing up the results. I ran the commands one right after the other, several times, and in different order. Also, they were run on the same machine, my own private server, so there was no contention from other processes to account for the difference. There was no query cache defined and I used different operators for the 2 statements so no command was the same as any other. I even reset the query cache even though it wasn’t being used. So, I’m convinced the results are real.

    Why did the prepare/execute script take longer?

  14. peter says:

    Jim,

    The Perl/DBI should have run faster… strange if it did not. The way to run via command line with “execute using @id” simply has much more statements so it should be faster.
    Are you sure you used prepared statements in Perl ? Some versions of DBI will have emulated prepared statements by default so you will not use PS on server in reality.

  15. Jim Lyons says:

    The perl did run faster – a lot faster. sorry if i wasn’t clear about that.

    The tests using the statement-at-a-time approach and the prepare statements were both stored in ASCII text files. For the prepare approach, I did one prepare and a million “set … execute” pairs, withe the statement-at-a-time approach, I did a million “update ” commands. I am baffled why the statement-at-a-time approach is faster than the command line “set … execute” approach.

  16. peter says:

    Good.
    Set/Execute is designed for Prepared Statement testing rather than production use. You should use language provided prepared statements API if you care about performance.

  17. Alex says:

    Hello!
    There are a few unneeded queries during this kind of method

    callableStatement = connection.prepareCall(“{ call SOME_PROCEDURE_NAME(?, ?, ?, ?, ?) }”);
    when using jdbc java connectorJ

    They are (see in logs):
    Query SELECT DATABASE()
    Query SHOW CREATE PROCEDURE `db_name`.`SOME_PROCEDURE_NAME`
    Query SELECT name, type, comment FROM mysql.proc WHERE name like ‘SOME_PROCEDURE_NAME’ and db ‘db_name’ ORDER BY name

    And it can influence on the performance.
    Does anybody know how to make them disappear?
    The most important to avoid is this one – SELECT DATABASE()
    Thank you in advance!
    Alex

  18. faicker says:

    Beginning with 5.1.17, prepared statements use the query cache under certain conditions, which differ depending on the preparation method:

    http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html

  19. julian0zzx says:

    I got a very curious result when I test PreparedStatement in Java. My Driver is 5.1.6 and MySQL is 5.1.
    Four scenario I have test:
    1/ insert tbl (,,,) values(,,,), (,,,),(,,,), … // a long sql sent to mysql
    2/ for () {insert tbl(,,,) values(,,,)} // a short sql sent to mysql 1000*10 times
    3/ preparedstatement(insert tbl(,,,) values(?,?,?,?)); for (ps.setXxx(1,xx), ,,, addBatch) ps.executeBatch() // 1000*10 times
    4/ insert tbl (,,,) values(?,?,?,?),(?,?,?,?),(?,?,?,?),,,1000*10,,,(?,?,?,?)
    The result is:
    1/ ~2000ms
    2/ ~9500ms
    3/ ~9500ms
    4/ ~300ms
    on my env.
    I donot know why the 4th is most effecient and 1st is better than 3, can anyone help me?

    thx!

  20. David says:

    What code do you use to benchmark?

  21. Mark says:

    Thanks for the article and good list of pros/cons. mysqli’s prepared statements are complexly restrictive with dynamic queries . What do you or others reading this many-year old article think about safemysql?

    https://github.com/colshrapnel/safemysql
    Code:
    https://github.com/colshrapnel/safemysql/blob/master/safemysql.class.php

    He’s passing data separately from queries with placeholders which seems to accomplish the same security-wise as mysqli prepared statements, no? If so the best approach may be to widely use a wrapper class like safemysql with prepared statements used for oft-repeated queries, to get the speed advantage of re-use they offer.

    Thoughts on using safemysql?

    Thanks

  22. Lio says:

    Hi guys

    Im not a very good english listener, so be patient with me please.

    I use everyday php, so to use prepared statement in that language, i need to use mysqli or pdo, not conventional right ?

    Thanks a lot

    Best regards

  23. Hi Lio,

    This is a very old post (from 2006). I encourage you to ask your question on Percona’s discussion forums, where your peers along with Percona experts go every day to ask and share advice. http://www.percona.com/forums

Speak Your Mind

*