October 21, 2008

JOIN Performance & Charsets

Posted by Ryan Lowe |

We have written before about the importance of using numeric types as keys, but maybe you've inherited a schema that you can't change or have chosen string types as keys for a specific reason. Either way, the character sets used on joined columns can have a significant impact on the performance of your queries.

Take the following example, using the InnoDB storage engine:

SQL:
  1. CREATE TABLE `t1` (
  2. `char_id` char(6) NOT NULL,
  3. `v` varchar(128) NOT NULL,
  4. PRIMARY KEY (`char_id`)
  5. ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  6.  
  7. CREATE TABLE `t2` (
  8. `id` int UNSIGNED NOT NULL AUTO_INCREMENT,
  9. `char_id` char(6) NOT NULL,
  10. `v` varchar(128) NOT NULL,
  11. PRIMARY KEY (`id`),
  12. KEY (`char_id`)
  13. ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

t1 has been populated with 100,000 records, while t2 has 400,000. The data set fit easily in memory.

SQL:
  1. mysql> EXPLAIN EXTENDED SELECT SQL_NO_CACHE COUNT(t1.char_id)
  2.      >  FROM t1
  3.      >  JOIN t2 USING (char_id)\G
  4. *************************** 1. row ***************************
  5.            id: 1
  6.   select_type: SIMPLE
  7.         TABLE: t2
  8.          type: INDEX
  9. possible_keys: NULL
  10.           KEY: char_id
  11.       key_len: 6
  12.           ref: NULL
  13.          rows: 394424
  14.         Extra: USING INDEX
  15. *************************** 2. row ***************************
  16.            id: 1
  17.   select_type: SIMPLE
  18.         TABLE: t1
  19.          type: eq_ref
  20. possible_keys: PRIMARY
  21.           KEY: PRIMARY
  22.       key_len: 18
  23.           ref: func
  24.          rows: 1
  25.         Extra: USING WHERE; USING INDEX
  26. 2 rows IN SET, 1 warning (0.00 sec)
  27.  
  28. mysql> SHOW WARNINGS\G
  29. *************************** 1. row ***************************
  30.   Level: Note
  31.    Code: 1003
  32. Message: SELECT sql_no_cache count(`test`.`t1`.`char_id`) AS `COUNT(t1.char_id)` FROM `test`.`t1` JOIN `test`.`t2` WHERE (`test`.`t1`.`char_id` = convert(`test`.`t2`.`char_id` USING utf8))
  33. 1 row IN SET (0.00 sec)

Notice the differences in key_len and the explicit call to CONVERT() in the WHERE clause. This is a result of the joined columns being of different character sets.

The above query took an average of 4.33 seconds to execute with t1 as utf8 and t2 as latin1. Converting both tables to utf8 resulted in an average execution time of 3.12 seconds and had the following EXPLAIN:

SQL:
  1. *************************** 1. row ***************************
  2.            id: 1
  3.   select_type: SIMPLE
  4.         TABLE: t1
  5.          type: INDEX
  6. possible_keys: PRIMARY
  7.           KEY: PRIMARY
  8.       key_len: 18
  9.           ref: NULL
  10.          rows: 99414
  11.         Extra: USING INDEX
  12. *************************** 2. row ***************************
  13.            id: 1
  14.   select_type: SIMPLE
  15.         TABLE: t2
  16.          type: ref
  17. possible_keys: char_id
  18.           KEY: char_id
  19.       key_len: 18
  20.           ref: test.t1.char_id
  21.          rows: 1
  22.         Extra: USING INDEX
  23. 2 rows IN SET, 1 warning (0.00 sec)
  24.  
  25. mysql> SHOW WARNINGS\G
  26. *************************** 1. row ***************************
  27.   Level: Note
  28.    Code: 1003
  29. Message: SELECT sql_no_cache count(`test`.`t1`.`char_id`) AS `COUNT(t1.char_id)` FROM `test`.`t1` JOIN `test`.`t2` WHERE (`test`.`t2`.`char_id` = `test`.`t1`.`char_id`)
  30. 1 row IN SET (0.00 sec)

Notice here how there is no CONVERT() required and the key_len on both tables match. Just this simple change resulted in more than a 25% improvement in average execution time, from 4.33 to 3.12 seconds.

This test was performed with MySQL 5.0.67, FreeBSD 7, on a box with 2GB RAM.

Related posts: :Using delayed JOIN to optimize count(*) and LIMIT queries::Join performance of MyISAM and Innodb::xtrabackup-0.6:
 

No Comments »

 

Subscribe without commenting

Trackbacks/Pingbacks