October 16, 2007

Be careful when joining on CONCAT

Posted by Aurimas Mikalauskas

The other day I had a case of awful performance on a rather simple join. The join condition was tb1.vid = CONCAT('prefix-', tb2.id), where tb1.vid is an indexed varchar(100) column and tb2.id is an int(11) column. No matter what I did (forcing the key, forcing a different join order), MySQL would not use the tb1.vid index for it. Unsurprisingly, the query was far too slow, because the number of rows examined was huge:

SQL:
  mysql> EXPLAIN
      -> SELECT
      ->  tb1.*
      -> FROM tb2
      -> STRAIGHT_JOIN tb1
      -> WHERE
      -> (
      ->    tb1.vid LIKE 'prefix-%' AND
      ->    tb1.vid = CONCAT('prefix-', tb2.ID) AND
      ->    tb2.gid = 1337
      -> ) ORDER BY tb1.title ASC LIMIT 4\G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: tb2
           type: ref
  possible_keys: gid
            key: gid
        key_len: 4
            ref: const
           rows: 53
          Extra: Using where; Using temporary; Using filesort
  *************************** 2. row ***************************
             id: 1
    select_type: SIMPLE
          table: tb1
           type: ALL
  possible_keys: vid
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 570518
          Extra: Using where
  2 rows in set (0.00 sec)

Then I took a look at the MySQL manual, and here is a short quote about CONCAT:

...If all arguments are non-binary strings, the result is a non-binary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent binary string form; if you want to avoid that, you can use an explicit type cast...
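One way to see this conversion for yourself is to ask the server which character set the CONCAT result has. This is only a sketch: the literal 1337 stands in for tb2.id, the column aliases are made up for illustration, and the result depends on the server version, since later MySQL releases changed how numeric arguments are converted.

```sql
-- Compare the character set of the CONCAT result with and without a cast.
-- The aliases without_cast / with_cast are illustrative names only.
SELECT
  CHARSET(CONCAT('prefix-', 1337))               AS without_cast,
  CHARSET(CONCAT('prefix-', CAST(1337 AS CHAR))) AS with_cast;
```

On a server that behaves as the manual quote describes, the first column should come back as binary while the second reports the connection character set. A binary string compared against a non-binary indexed column is the kind of mismatch that can stop the optimizer from using the index.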

OK, let's check whether an explicit cast really helps:

SQL:
  mysql> EXPLAIN
      -> SELECT
      ->  tb1.*
      -> FROM tb2
      -> STRAIGHT_JOIN tb1
      -> WHERE
      -> (
      ->    tb1.vid LIKE 'prefix-%' AND
      ->    tb1.vid = CONCAT('prefix-', CAST(tb2.ID AS CHAR)) AND
      ->    tb2.gid = 1337
      -> ) ORDER BY tb1.title ASC LIMIT 4\G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: tb2
           type: ref
  possible_keys: gid
            key: gid
        key_len: 4
            ref: const
           rows: 53
          Extra: Using where; Using temporary; Using filesort
  *************************** 2. row ***************************
             id: 1
    select_type: SIMPLE
          table: tb1
           type: ref
  possible_keys: vid
            key: vid
        key_len: 101
            ref: func
           rows: 2
          Extra: Using where
  2 rows in set (0.00 sec)

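Much better now: tb1 is accessed through the vid index (type: ref), and the row estimate for it drops from 570518 to 2.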
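If you want to try this on your own server, a minimal reproduction might look like the sketch below. Only the column names vid, title, id and gid and the types of vid and id come from the post; the type of title, the NOT NULL choices and everything else about the schema are assumptions. The tables also need to be filled with a realistic amount of data before the plans mean anything, and a newer MySQL version may pick the index even without the cast.

```sql
-- Hypothetical minimal schema; adjust to match your real tables.
CREATE TABLE tb1 (
  vid   VARCHAR(100),
  title VARCHAR(255),          -- assumed type, not stated in the post
  KEY vid (vid)
);

CREATE TABLE tb2 (
  id  INT(11) NOT NULL,
  gid INT NOT NULL,            -- assumed type, consistent with key_len 4
  KEY gid (gid)
);

-- Load representative data into both tables here, then compare the plans.

-- Join on the raw CONCAT: the numeric argument is converted implicitly.
EXPLAIN
SELECT tb1.*
FROM tb2
STRAIGHT_JOIN tb1
WHERE tb1.vid = CONCAT('prefix-', tb2.id)
  AND tb2.gid = 1337;

-- Join with an explicit cast, so CONCAT receives a non-binary string.
EXPLAIN
SELECT tb1.*
FROM tb2
STRAIGHT_JOIN tb1
WHERE tb1.vid = CONCAT('prefix-', CAST(tb2.id AS CHAR))
  AND tb2.gid = 1337;
```

The thing to compare is the tb1 row of each plan: whether type is ALL or ref, and whether key shows vid.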


8 Comments »

  1. paul

    I am surprised to see this blog post. There is nothing to be careful about when joining on CONCAT: if you are doing that, your DB design is simply grossly wrong. Stop and think how ridiculous it is to be concat’ing a string on every query and joining on it.

    Comment :: October 16, 2007 @ 4:24 pm

  2. Dale

    I agree. I had the same initial thought. I could not think of a time when I would need to use CONCAT() in the WHERE clause. The whole idea behind a good schema model is that you use good keys that match up without needing further tweaking.

    Comment :: October 16, 2007 @ 5:52 pm

  3. While I generally agree that the schema design is suboptimal if you get into situations like this, sometimes you just cannot help it. Unfortunately(?) we do not live in a perfect world, and existing systems cannot always easily be changed. So it is important to know the caveats that might bite you if you have to do such things.

    Comment :: October 17, 2007 @ 2:07 am

  4. Dale, Paul

    This would be the case if we lived in a perfect world. True, in a well-designed database you would not join on CONCAT, but there are a lot of databases with not-so-good designs, and these are the ones we are most commonly called in to fix.

    Often, doing it right would mean redoing quite a lot, and in many cases the customer is not ready for that. So we end up squeezing as much as we can out of the current schema, which means dealing with strange queries that bring various weird issues.

    Comment :: October 17, 2007 @ 2:10 am

  5. Thanks Daniel,

    I wanted to add one more thing about it: in general, this choice of turning a number into a binary string is counterintuitive to me.

    A number is a string, and for numbers it does not matter whether the comparison is case sensitive or case insensitive, so I would expect CONCAT to simply handle numbers by converting them to the type of the other argument.

    Though I do not know; maybe this more intuitive solution would have some ugly side effects.

    Comment :: October 17, 2007 @ 2:14 am

  6. paul

    @peter

    I understand where you’re coming from, and whilst I don’t recommend wrapping crap design with hacks, you could actually eliminate the CONCAT from the SELECT query by adding an additional column to the table and using a trigger that updates this column by performing the CONCAT at the point of data insert/update. Then, by changing the SELECT query to use the new column, we eliminate the CONCAT in the WHERE clause and get all the speed benefits of a properly designed schema.

    Comment :: October 17, 2007 @ 3:34 am

  7. Paul,

    thanks. Actually, what you mentioned was my very first recommendation for the customer. The thing is, in this case a number of different tables are joined in place of tb2, and a different prefix is used in each case. As the query is executed only occasionally, having it run in less than a second (instead of a few minutes) is enough, and it does not cause additional disk/memory waste.

    But I agree: the schema should be planned in advance to avoid that sort of join.

    Comment :: October 17, 2007 @ 4:56 am

  8. [...] He and his partners of Percona write about topics like eliminating ORDER BY function or Be careful when joining on CONCAT. Peter also held presentations at all Mysql Conferences. In 2007, for example he talked about [...]

    Pingback :: November 18, 2007 @ 11:29 am
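The extra-column approach suggested in comment 6 could be sketched roughly as follows. The column, index and trigger names (vid_key, idx_vid_key, tb2_vid_key_bi, tb2_vid_key_bu) are hypothetical, and the sketch assumes that tb2.id is supplied explicitly on insert. If id were an AUTO_INCREMENT column, its generated value would not yet be visible in a BEFORE INSERT trigger, so the column would have to be filled some other way. It also assumes the new column uses the same character set and collation as tb1.vid.

```sql
-- Add a precomputed join key to tb2 and index it (names are hypothetical).
ALTER TABLE tb2
  ADD COLUMN vid_key VARCHAR(100) NULL,
  ADD INDEX idx_vid_key (vid_key);

-- Keep the key in sync on insert and update.
-- Assumes NEW.id is supplied explicitly, not generated by AUTO_INCREMENT.
CREATE TRIGGER tb2_vid_key_bi BEFORE INSERT ON tb2
FOR EACH ROW
  SET NEW.vid_key = CONCAT('prefix-', CAST(NEW.id AS CHAR));

CREATE TRIGGER tb2_vid_key_bu BEFORE UPDATE ON tb2
FOR EACH ROW
  SET NEW.vid_key = CONCAT('prefix-', CAST(NEW.id AS CHAR));

-- Backfill rows that existed before the triggers were created.
UPDATE tb2 SET vid_key = CONCAT('prefix-', CAST(id AS CHAR));

-- The join then compares two plain string columns, with no CONCAT at query time.
SELECT tb1.*
FROM tb2
JOIN tb1 ON tb1.vid = tb2.vid_key
WHERE tb2.gid = 1337
ORDER BY tb1.title ASC
LIMIT 4;
```

As comment 7 points out, this trades extra storage and write overhead for a simpler query, which may not pay off when several tables with different prefixes are involved and the query runs only occasionally.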

 


