The other day I had a case with an awful performance of a rather simple join. It was a join on tb1.vid = CONCAT(‘prefix-‘, tb2.id) with tb1.vid – indexed varchar(100) and tb2.id – int(11) column. No matter what I did – forced it to use key, forced a different join order – it did not want to use tb1.vid index for it. And no surprise it was way too slow, the number of rows analyzed was really huge:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | mysql> EXPLAIN -> SELECT -> tb1.* -> FROM tb2 -> STRAIGHT_JOIN tb1 -> WHERE -> ( -> tb1.vid LIKE 'prefix-%' AND -> tb1.vid = CONCAT('prefix-', tb2.ID) AND -> tb2.gid = 1337 -> ) ORDER BY tb1.title ASC LIMIT 4\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tb2 type: ref possible_keys: gid key: gid key_len: 4 ref: const rows: 53 Extra: Using where; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: tb1 type: ALL possible_keys: vid key: NULL key_len: NULL ref: NULL rows: 570518 Extra: Using where 2 rows in set (0.00 sec) |
Then I took a look at MySQL manual and here’s a short quote about CONCAT:
…If all arguments are non-binary strings, the result is a non-binary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent binary string form; if you want to avoid that, you can use an explicit type cast…
OK, let’s check if that really helps:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | mysql> EXPLAIN -> SELECT -> tb1.* -> FROM tb2 -> STRAIGHT_JOIN tb1 -> WHERE -> ( -> tb1.vid LIKE 'prefix-%' AND -> tb1.vid = CONCAT('prefix-', CAST(tb2.ID AS CHAR)) AND -> tb2.gid = 1337 -> ) ORDER BY tb1.title ASC LIMIT 4\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tb2 type: ref possible_keys: gid key: gid key_len: 4 ref: const rows: 53 Extra: Using where; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: tb1 type: ref possible_keys: vid key: vid key_len: 101 ref: func rows: 2 Extra: Using where 2 rows in set (0.00 sec) |
Much better now.
I am surprised to see this blog posting, as there is nothing to be careful about joining on concat, simply if you are doing that your DB design is grossly wrong. Stop and think how ridiculous it is to be concat’ing a string on every query and joining on it.
I agree. I had the same initial thought. I could not think of a time that I would need to use a concat() within the where clause. The whole idea behind a good scheme model is that you use good keys that match up without needing further tweaking.
While I generally agree that the schema design is suboptimal if you get into situations like this, sometimes you just cannot help it. Unfortunately(?) we do not live in a perfect world, and existing systems cannot always easily be changed. So it is important to know the caveats that might bite you if you have to do such things.
Dale, Paul
This would be the case if we would live in the perfect world. True in the database with good design you would not join on CONCAT but there are a lot of databases which are written with not so good design, and these are the ones which we’re called to fix quite commonly.
Often to do it right you would need to redo quite a lot, however the customer may not be ready for that in many cases so we end up squeezing as much as we can from the current schema which brings us to deal with strange queries which bring various weird issues.
Thanks Daniel,
I wanted to add one more thing about it – generally this choice of making Number a binary string is counter intuitive to me.
The number is a string and for numbers it does not matter if it is case sensitive and case insensitive so I’d see CONCAT to simply handle numbers by converting them to the type of other argument.
Though I do not know may be this more intuitive solution would some ugly side effects.
@peter
I understand where your coming from and whilst I don’t recommend wrapping crap design with hacks, you could actually eliminate the concat from the select query by adding an additional column to the table and use a trigger which updates this additional column by performing the concat at point of data insert/update. Then by changing the select query to use our new column we have eliminated the concat on where clause and have got all the speed benefits of a properly designed schema.
Paul,
thanks. Actually, what you mentioned – was my very first recommendation for the customer. Thing is, in this case – there is a number of different tables in place of tb2 joined and in each case – different prefix is used. As query is really executed only occasionally, having it executed in less than a second (instead of few minutes) is enough and it does not cause additional disk/memory waste.
But, I agree- schema should be planned to avoid that sort of join in advance.
Thanks for posting this!
I had a query with a CONCAT in it (the db I was forced to use had a bad schema) that was running really slow. Wrapping the integer part of the query in an explicit CAST made a huge difference.
Thanks very much. Reduced a timeout server load down to 6 seconds for me. Fantastic advice!
Thanks! This worked perfectly!
Thanks for your tip!
As some mentioned above, it would be great to not have to deal with these performance-killers but it still happens…
You really helped me out 🙂
Hopefully i will have time soon to fix the real problem with the whole setup..
Thanks! I already had to deal twice with this situation. There are always people idealizing things, but practical solutions like this can save huge amount of time. Not always things can be redesigned from scratch…
Hi there! Is there any ways to EXPLAIN the SQL which is imported from the SOURCE command! @@
Titan, the short answer is No. You could probably craft something, but it’s not going to be something as simple as EXPLAIN SOURCE.
Aurimas
Of course that one might need CONCAT in real world, only unexperienced person might think that’s not necessary. I need to calculate IBAN from bank number+acc.number+sufix country number (3 separate columns that by design can only be separate). And cast or convert isn’t doing proper job with a number with 23 digits.
Hi..It had a remarkable performance improvement..Thanks for the post.
Well said Frank..
“Of course that one might need CONCAT in real world, only unexperienced person might think that’s not necessary.”
Thanks for sharing very useful Tips.
Hi, Aurimas..
Excellent post.I want to thank you for this informative read, I really appreciate sharing this great post. Keep up your work.