September 28, 2009

How number of columns affects performance?

Posted by peter

It is well understood that tables with long rows tend to be slower than tables with short rows. I wanted to check whether row length is the only thing that matters, or whether the number of columns we have to work with also plays an important role. I was interested in peak row processing speed, so I looked at full table scans where the data fits completely in the OS cache. I created 3 tables: the first contains a single tinyint column, which is almost the shortest type possible (CHAR(0) could take less space); the second has 1 tinyint column and a char(99) column; and the third has 100 tinyint columns. The latter two tables have the same row length, but the number of columns differs by a factor of 50. Finally, I created a 4th table which also has 100 columns, but one of them is a VARCHAR, which causes the row format to be dynamic.
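
As a rough illustration of the four layouts described above, here is a minimal sketch that generates the table definitions. The table names (t1, t1c99, t100, t99v1), the column names, the NOT NULL constraints and the VARCHAR length are all assumptions made for the example; the excerpt does not give them, and no storage engine is specified here.

```shell
# Print a CREATE TABLE statement with N tinyint columns and an optional extra column.
# All table and column names are hypothetical, as are NOT NULL and VARCHAR(99).
make_table() {
  name=$1
  ncols=$2
  extra=$3
  # Build "c1 TINYINT NOT NULL, c2 TINYINT NOT NULL, ..." for ncols columns.
  cols=$(awk -v n="$ncols" 'BEGIN { for (i = 1; i <= n; i++) printf "%sc%d TINYINT NOT NULL", (i > 1 ? ", " : ""), i }')
  if [ -n "$extra" ]; then
    cols="$cols, $extra"
  fi
  printf 'CREATE TABLE %s (%s);\n' "$name" "$cols"
}

ddl_t1=$(make_table t1 1 "")                             # single tinyint column
ddl_t1c99=$(make_table t1c99 1 "v CHAR(99) NOT NULL")    # 2 columns, 100-byte row
ddl_t100=$(make_table t100 100 "")                       # 100 columns, 100-byte row
ddl_t99v1=$(make_table t99v1 99 "v VARCHAR(99) NOT NULL") # 100 columns, one VARCHAR

printf '%s\n' "$ddl_t1" "$ddl_t1c99" "$ddl_t100" "$ddl_t99v1"
```

The printed statements could be piped into the mysql client against a scratch database.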


Why InnoDB index cardinality varies strangely

Posted by Baron Schwartz

This is a very old draft, from early 2007 in fact. At that time I started to look into something interesting with the index cardinality statistics reported by InnoDB tables. The cardinality varies because it's derived from estimates, and I know a decent amount about that. The interesting thing I wanted to look into was why the cardinality varies in a particular pattern.

Here I'll grab a bunch of cardinality estimates from sakila.film on MySQL 5.0.45 and put them into a file:

CODE:
  baron@kanga:~$ while true; do mysql sakila -N -e 'show index from film' | head -n 2 | tail -n 1 | awk '{print $7}'; done > sizes

After a while I cancel it and then sort and aggregate them with counts:

CODE:
  baron@kanga:~$ sort sizes | uniq -c
      157 1022
      156 1024
      156 1058
      156 1059
      156 1131
      313 951
      312 952
      312 953

Look at the distribution of the counts. The weighted average of these is 1000.53, so it's close to the truth (1000 rows). But five of the eight distinct estimates appear only about half as often as the other three; it looks like the random choice of which statistic to use is not evenly distributed.
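
The weighted average can be checked with a short awk sketch. The data is copied from the `uniq -c` output above (count first, estimate second); the `weighted_average` function name is just a helper introduced for this example.

```shell
# Weighted average of the estimates: sum(count * estimate) / sum(count).
weighted_average() {
  awk '{ total += $1 * $2; n += $1 } END { printf "%.2f\n", total / n }'
}

# Counts and estimates as printed by `sort sizes | uniq -c`.
result=$(weighted_average <<'EOF'
157 1022
156 1024
156 1058
156 1059
156 1131
313 951
312 952
312 953
EOF
)
echo "$result"   # prints 1000.53
```

With the real file, `sort sizes | uniq -c | weighted_average` would give the same figure.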

I mentioned this to Heikki and he pondered it for a bit -- but neither of us really figured out what was going on. I know the code superficially, but not as well as he or Yasufumi or others do; and I was not able to find a cause.

More recently I saw that I'm not the only one who has noticed oddities in the random number generation. I waited. And indeed the fixes for that bug seem to have fixed the skew in the statistics. Case solved, and all I had to do was wait. Truly, laziness is a virtue.