April 16, 2014

Be careful when joining on CONCAT

The other day I had a case of awful performance on a rather simple join: tb1.vid = CONCAT('prefix-', tb2.id), where tb1.vid is an indexed varchar(100) column and tb2.id is an int(11) column. No matter what I did (forcing the key, forcing a different join order), it would not use the tb1.vid index for the join. And no surprise it was way too slow; the number of rows examined was really huge.
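A minimal sketch of the shape of such a join (the table and column names are from the post; the SELECT list is illustrative):

```sql
-- tb1.vid: indexed varchar(100); tb2.id: int(11).
-- Because tb2.id is numeric, CONCAT('prefix-', tb2.id) yields a
-- binary string, and comparing it to tb1.vid cannot use the index.
SELECT tb1.*
FROM tb2
JOIN tb1 ON tb1.vid = CONCAT('prefix-', tb2.id);
```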

Then I took a look at the MySQL manual, and here's a short quote about CONCAT:

…If all arguments are non-binary strings, the result is a non-binary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent binary string form; if you want to avoid that, you can use an explicit type cast
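You can observe this coercion directly with MySQL's CHARSET() function (the values are illustrative):

```sql
-- The numeric argument makes the whole CONCAT result a binary string
-- (expect 'binary' on MySQL versions with the behavior quoted above):
SELECT CHARSET(CONCAT('prefix-', 1));

-- With an explicit cast, the result stays a non-binary string,
-- in the connection character set (e.g. 'latin1' or 'utf8'):
SELECT CHARSET(CONCAT('prefix-', CAST(1 AS CHAR)));
```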

OK, let's check whether an explicit type cast really helps.
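A sketch of the same join rewritten with the explicit cast the manual suggests (names as in the post):

```sql
-- CAST(tb2.id AS CHAR) keeps the CONCAT result a non-binary string,
-- so the index on tb1.vid can now be used for the join.
SELECT tb1.*
FROM tb2
JOIN tb1 ON tb1.vid = CONCAT('prefix-', CAST(tb2.id AS CHAR));
```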

Much better now.

About Aurimas Mikalauskas

Aurimas joined Percona in 2006, a few months after Peter and Vadim founded the company. His primary focus is on high performance, but he also specializes in full text search, high availability, content caching techniques and MySQL data recovery.

Comments

  1. paul says:

    I am surprised to see this blog posting, as there is nothing to be careful about when joining on CONCAT; simply, if you are doing that, your DB design is grossly wrong. Stop and think how ridiculous it is to be concat'ing a string on every query and joining on it.

  2. Dale says:

    I agree. I had the same initial thought. I could not think of a time that I would need to use CONCAT() within the WHERE clause. The whole idea behind a good schema model is that you use good keys that match up without needing further tweaking.

  3. While I generally agree that the schema design is suboptimal if you get into situations like this, sometimes you just cannot help it. Unfortunately(?) we do not live in a perfect world, and existing systems cannot always easily be changed. So it is important to know the caveats that might bite you if you have to do such things.

  4. peter says:

    Dale, Paul

    That would be the case if we lived in a perfect world. True, in a database with a good design you would not join on CONCAT, but there are a lot of databases written with not-so-good design, and those are the ones we're quite commonly called in to fix.

    Often, doing it right would require redoing quite a lot, and in many cases the customer may not be ready for that, so we end up squeezing as much as we can out of the current schema, which leaves us dealing with strange queries that bring various weird issues.

  5. peter says:

    Thanks Daniel,

    I wanted to add one more thing about it: generally, this choice of turning a number into a binary string is counter-intuitive to me.

    Once converted to a string, a number has no case-sensitivity concerns, so I would expect CONCAT to simply handle numbers by converting them to the type of the other argument.

    Though I do not know; maybe this more intuitive solution would have some ugly side effects.

  6. paul says:

    @peter

    I understand where you're coming from, and whilst I don't recommend wrapping crap design with hacks, you could actually eliminate the CONCAT from the SELECT query by adding an additional column to the table and using a trigger which updates this additional column by performing the CONCAT at the point of data insert/update. Then, by changing the SELECT query to use the new column, we have eliminated the CONCAT in the WHERE clause and get all the speed benefits of a properly designed schema.

  7. Aurimas says:

    Paul,

    thanks. Actually, what you mentioned was my very first recommendation for the customer. The thing is, in this case there are a number of different tables joined in place of tb2, and in each case a different prefix is used. As the query is only executed occasionally, having it run in less than a second (instead of a few minutes) is enough, and it does not cause additional disk/memory waste.

    But I agree: the schema should be planned in advance to avoid that sort of join.

  8. ron says:

    Thanks for posting this!

    I had a query with a CONCAT in it (the db I was forced to use had a bad schema) that was running really slow. Wrapping the integer part of the query in an explicit CAST made a huge difference.

  9. Thanks very much. This took a query that was timing out down to 6 seconds for me. Fantastic advice!

  10. Pierre says:

    Thanks! This worked perfectly!

  11. Daniel says:

    Thanks for your tip!

    As some mentioned above, it would be great to not have to deal with these performance-killers but it still happens…
    You really helped me out :)

    Hopefully i will have time soon to fix the real problem with the whole setup..

  12. Pierre says:

    Thanks! I have already had to deal with this situation twice. There are always people idealizing things, but practical solutions like this can save a huge amount of time. Things cannot always be redesigned from scratch…

  13. Titan says:

    Hi there! Is there any way to EXPLAIN the SQL that is imported via the SOURCE command?

  14. Titan, the short answer is No. You could probably craft something, but it’s not going to be something as simple as EXPLAIN SOURCE.

    Aurimas

  15. Frank says:

    Of course one might need CONCAT in the real world; only an inexperienced person might think it's not necessary. I need to calculate an IBAN from bank number + account number + country suffix number (3 separate columns that by design can only be separate). And CAST or CONVERT isn't doing a proper job with a 23-digit number.

  16. Damodaran says:

    Hi. It gave a remarkable performance improvement. Thanks for the post.

    Well said, Frank:

    “Of course one might need CONCAT in the real world; only an inexperienced person might think it's not necessary.”
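Paul's trigger suggestion in comment 6 could be sketched roughly as follows (the extra column and trigger names are hypothetical):

```sql
-- Materialize the prefixed value once, and index it.
ALTER TABLE tb2 ADD COLUMN vid varchar(100), ADD INDEX (vid);
UPDATE tb2 SET vid = CONCAT('prefix-', id);

-- Keep it in sync on insert and update.
CREATE TRIGGER tb2_vid_ins BEFORE INSERT ON tb2
  FOR EACH ROW SET NEW.vid = CONCAT('prefix-', NEW.id);
CREATE TRIGGER tb2_vid_upd BEFORE UPDATE ON tb2
  FOR EACH ROW SET NEW.vid = CONCAT('prefix-', NEW.id);

-- The join then compares two indexed varchar columns directly:
--   ... JOIN tb2 ON tb1.vid = tb2.vid
```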
