The other day I had a case with an awful performance of a rather simple join. It was a join on tb1.vid = CONCAT(‘prefix-‘, tb2.id) with tb1.vid – indexed varchar(100) and tb2.id – int(11) column. No matter what I did – forced it to use key, forced a different join order – it did not want to use tb1.vid index for it. And no surprise it was way too slow, the number of rows analyzed was really huge:

Then I took a look at MySQL manual and here’s a short quote about CONCAT:

…If all arguments are non-binary strings, the result is a non-binary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent binary string form; if you want to avoid that, you can use an explicit type cast

OK, let’s check if that really helps:

Much better now.

18 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
paul

I am surprised to see this blog posting, as there is nothing to be careful about joining on concat, simply if you are doing that your DB design is grossly wrong. Stop and think how ridiculous it is to be concat’ing a string on every query and joining on it.

Dale

I agree. I had the same initial thought. I could not think of a time that I would need to use a concat() within the where clause. The whole idea behind a good scheme model is that you use good keys that match up without needing further tweaking.

Daniel Schneller

While I generally agree that the schema design is suboptimal if you get into situations like this, sometimes you just cannot help it. Unfortunately(?) we do not live in a perfect world, and existing systems cannot always easily be changed. So it is important to know the caveats that might bite you if you have to do such things.

Peter Zaitsev

Dale, Paul

This would be the case if we would live in the perfect world. True in the database with good design you would not join on CONCAT but there are a lot of databases which are written with not so good design, and these are the ones which we’re called to fix quite commonly.

Often to do it right you would need to redo quite a lot, however the customer may not be ready for that in many cases so we end up squeezing as much as we can from the current schema which brings us to deal with strange queries which bring various weird issues.

Peter Zaitsev

Thanks Daniel,

I wanted to add one more thing about it – generally this choice of making Number a binary string is counter intuitive to me.

The number is a string and for numbers it does not matter if it is case sensitive and case insensitive so I’d see CONCAT to simply handle numbers by converting them to the type of other argument.

Though I do not know may be this more intuitive solution would some ugly side effects.

paul

@peter

I understand where your coming from and whilst I don’t recommend wrapping crap design with hacks, you could actually eliminate the concat from the select query by adding an additional column to the table and use a trigger which updates this additional column by performing the concat at point of data insert/update. Then by changing the select query to use our new column we have eliminated the concat on where clause and have got all the speed benefits of a properly designed schema.

ron

Thanks for posting this!

I had a query with a CONCAT in it (the db I was forced to use had a bad schema) that was running really slow. Wrapping the integer part of the query in an explicit CAST made a huge difference.

Barry Fisher

Thanks very much. Reduced a timeout server load down to 6 seconds for me. Fantastic advice!

Pierre

Thanks! This worked perfectly!

Daniel

Thanks for your tip!

As some mentioned above, it would be great to not have to deal with these performance-killers but it still happens…
You really helped me out 🙂

Hopefully i will have time soon to fix the real problem with the whole setup..

Pierre

Thanks! I already had to deal twice with this situation. There are always people idealizing things, but practical solutions like this can save huge amount of time. Not always things can be redesigned from scratch…

Titan

Hi there! Is there any ways to EXPLAIN the SQL which is imported from the SOURCE command! @@

Frank

Of course that one might need CONCAT in real world, only unexperienced person might think that’s not necessary. I need to calculate IBAN from bank number+acc.number+sufix country number (3 separate columns that by design can only be separate). And cast or convert isn’t doing proper job with a number with 23 digits.

Damodaran

Hi..It had a remarkable performance improvement..Thanks for the post.

Well said Frank..

“Of course that one might need CONCAT in real world, only unexperienced person might think that’s not necessary.”

williamjacques

Thanks for sharing very useful Tips.

johnives

Hi, Aurimas..

Excellent post.I want to thank you for this informative read, I really appreciate sharing this great post. Keep up your work.