I prefer to use Integers for joins whenever possible and today I worked with client which used character keys, in my opinion without a big need. I told them this is suboptimal but was challenged with rightful question about the difference. I did not know so I decided to benchmark.

The results below are for MySQL 5.1.18 using MyISAM and Innodb tables. This time unlike other benchmarks I decided to do Join not on primary key and have query to read data for both tables. If the query would be index covering I would expect us to see different ratio. The query I use here is constructed to stress out join code while avoid sending data to the client Do not try to find any good meaning for query or schema. For joins which fetch just few rows difference is likely to be less as the join code itself is likely to be responsible for less portion of response time.

OK. Lets start with first simple MyISAM table and join query performed on INT fields:

So what about Innodb ? Innodb executed the same query in 2.9 seconds which was a bit disappointing for me as I expected MyISAM to be slower due to amount of extra system calls it has to read row data from OS Cache.

So what if we convert i and j columns to varchar while sticking to utf8 character set which we had as default ? Joining on Char columns completes in 4.5 seconds on Innodb which is about 50% slower compared to Joining on Int, for MyISAM however the time became 11.0 seconds which is over 6 times slower than joining on the integer.

In fact this was expected as MyISAM uses key compression for varchar columns so random key lookups become significantly slower. I tried to set pack_keys=0 which typically helps in similar cases but it looks like there is regression bug and this setting does not work any more.

The next test I decided to do is to convert Innodb table to latin1 character set. I was expected this to shorten some internal buffers MySQL has to allocate for key comparison as well as have comparison function significantly faster. The reason for this test was – latin1 encoding is enough for most character based keys – uuid, sha1/md5 based etc.

Latin1 encoding indeed gave significant improvement to 3.5 seconds which is just 20% slower than integer based join for Innodb.

Finally I decided to check if using longer strings slows down things significantly and so I replaced all numbers with their sha1() hashes which still made eq join to run the same ways but gave me much longer keys. The performance dropped down to 6.1 seconds which makes it over 2 times slower compared to integer based join.

So how do I read these results ?

  1. CHAR keys are indeed slower for joins compared to integer keys
  2. Performance degradation can range from few percent to couple of times for Innodb tables
  3. MyISAM Tables may suffer significantly if key compression is not disabled
  4. Joining on Shorter CHAR keys is significantly faster than Long keys
  5. Latin1 (I guess any simple encoding) is significantly faster for joins compared to UTF8

These tests were preformed for in memory tables, for IO bound workload results are likely to be different as longer indexes expected to have much worse cache fit, especially if you keep them unpacked.

16 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Sergei Golubchik

Did you also try latin1 on MyISAM tables ? It’s interesting to see the difference between utf8 and latin1 for MyISAM. And when you say “latin1” what collation do you mean ? Not latin1_german2_ci (which doesn’t allow byte-wise comparison), I hope ?

viktors

Good post! It would be interesting how BIGINT keys compare to INTs – often tables using BIGINTs are the same tables which require the most performance. Also, when comparing SHA1 columns, did you try BINARY/VARBINARY to prevent case-insensitive comparison (which could lower performance)?

vinjvinj

Did you try using char instead of varchar? Most guid used in the db are always of fixed length.

Kevin Burton

“Changing column types from INT to BIGINT slowed down join speed for MyISAM tables from 1.7 seconds to 3.8 seconds”

How much data? Could it have been just reading them off disk since bigints are twice as large?

I’m going to have to play with this since we use latin1 SHA1 varchars for PKs.

Pajz

Hi Peter may be your computer has low memory.
This comment is base in My SQL documenration try again to retreive the data..

Before MySQL 5.0.3, a CHAR or VARCHAR column with a length specification greater than 255 is converted to the smallest TEXT type that can hold values of the given length. For example, VARCHAR(500) is converted to TEXT, and VARCHAR(200000) is converted to MEDIUMTEXT. Similar conversions occur for BINARY and VARBINARY, except that they are converted to a BLOB type

Alexey

Is VARBINARY same as VARCHAR/binary?

Rob

any Mysql Podcast other than She BA Podcast that are good?
sorry this is off topic I know….

Cheers Rob

rafael

how to convert MYSQL ENUM (char type) into ENUM (int type)

Rakesh

I believe this is true from across the database environments; be it MS-SQL or MySQL or ORACLE or DB2 or any other.

Michael Evans

I know it’s been a while since this was posted, but I don’t believe the comparison was done fairly. An int(10) is only 4 bytes long, whereas char(10) is truly 10 bytes. Just from the size difference alone I would except char(10) to be slower because at the very least you’re comparing 2.5 times as many bytes–just like switching from int to bigint slowed things down because bigint is 8 bytes.

I wonder how a char(4) would compare to an int, since at least then you’re comparing the same number of bytes.