April 23, 2008

Testing InnoDB “Barracuda” format with compression

Posted by Vadim |

New features of InnoDB - compression format and fast index creation sound so promising so I spent some time to research time and sizes on data we have on our production. The schema of one of shards is

SQL:
  1. CREATE TABLE `article87` (
  2.   `id` bigint(20) UNSIGNED NOT NULL,
  3.   `ext_key` varchar(32) NOT NULL,
  4.   `site_id` int(10) UNSIGNED NOT NULL,
  5.   `forum_id` int(10) UNSIGNED NOT NULL,
  6.   `thread_id` varchar(255) CHARACTER SET latin1 NOT NULL,
  7.   `published` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  8.   `crawled` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  9.   `subject` varchar(255) NOT NULL,
  10.   `title` varchar(255) NOT NULL,
  11.   `url` varchar(255) NOT NULL,
  12.   `num_links` smallint(6) NOT NULL,
  13.   `links_in` int(10) UNSIGNED NOT NULL,
  14.   `cache_author` varchar(255) NOT NULL,
  15.   `cache_site` varchar(255) DEFAULT NULL,
  16.   `anchor` varchar(255) NOT NULL,
  17.   `isthread` tinyint(3) UNSIGNED NOT NULL,
  18.   `author_id` int(10) UNSIGNED NOT NULL,
  19.   `inserted` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  20.   `fromfile` varchar(255) NOT NULL,
  21.   `language_id` tinyint(3) UNSIGNED NOT NULL,
  22.   `encoding` varchar(255) NOT NULL,
  23.   `warning` mediumtext NOT NULL,
  24.   `is_thread_start` tinyint(3) UNSIGNED NOT NULL,
  25.   `source` mediumint(8) UNSIGNED NOT NULL,
  26.   `hash` char(32) NOT NULL,
  27.   `mod_is` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
  28.   `is_adult` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
  29.   `bodyuc` mediumtext NOT NULL,
  30.   PRIMARY KEY (`id`),
  31.  KEY `ext_key` (`ext_key`),
  32.  KEY `forum_id` (`forum_id`,`thread_id`,`published`),
  33.  KEY `site_id` (`site_id`,`published`),
  34.  KEY `hash` (`hash`),
  35.  KEY `forum_id_2` (`forum_id`,`is_thread_start`,`published`),
  36.  KEY `published` (`published`),
  37.  KEY `inserted` (`inserted`),
  38.  KEY `forum_id_3` (`forum_id`,`thread_id`,`is_thread_start`),
  39.  KEY `site_id_2` (`site_id`,`author_id`)
  40. ) ENGINE=InnoDB DEFAULT CHARSET=latin1;


in fact this is not exact schema - difference is we are using DEFAULT CHARSET=utf8, which seems not working with fast index creation (bug 33650) , so I use latin1 for tests. Also we actually store 'bodyuc' as compressed field, doing compress()/uncompress() to store/restore. To test InnoDB compression I use only uncompressed text, though I tested compression on compress()-ed data just to see if there is any benefit from index compression.

For test I use dump created with mysqldump, final size 30286M.
To create InnoDB compression table I use ENGINE=InnoDB KEY_BLOCK_SIZE=8 and to test fast index creation I create table only with primary key, without additional indexes.

Numbers I got:

load dump into table with default format and all indexes (case 1) - 152m34.792s
size of final .ibd file for (case 1) - 43032M

What if load data and indexes separately:

load dump into table with default format and without indexes (case 2a) - 59m12.575s
size of .ibd after (case 2a) - 36968M
create all indexes (case 2b) - 18min46.05s
size of .ibd after (case 2b) - 40400M

So as you see time to load (case 2) (with fast create index way) is faster almost two times and space decreased by 6%.

Now with compressed table:

load dump into table with default format and all indexes (case 3) - 228m55.251s (time took 1.5 times longer than (case 1) )
size of final .ibd file for (case 3) - 16284M (space ratio is 0.37 compare to (case 1) )

What if load data and indexes separately:

load dump into table with default format and without indexes (case 4a) - 71m10.760s (longer 1.2 times than (case 2a))
size of .ibd after (case 4a) - 13208M (smaller 0.35 times)
create all indexes (case 4b) - 42m54.63sec (longer 2.28 times)
size of .ibd after (case 4b) - 14968M (smaller 0.37 times)

Total time for (case 4) , 6844 sec, took 1.46 times longer than (case 2) , 4678 sec, but space decreased by even more ( 0.37 from original)

Also interesting to note that fast index creation allows to load time two time faster than usual load.

Ok, There are a lot of numbers, let me summarize it:

Test Load time, sec Size, MB
Baseline (1), default format, usual load 9154 43032
default format, fast index creation 4678 (0.51)* 40400 (0.93)
compress format, default load 13735 (1.50) 16284 (0.38)
compress format, fast index creation 6844 (0.74) 14968 (0.35)

* - ratio to baseline

So in conclusion
- Fast Index creation allows to speedup load 2 times (even indexes fit into memory)
- Load in compress format slower by 30-50%
- Table in compress format takes only 1/3 of original table

Of course we also expect significant performance gain for I/O bound queries in case of compression tables, this is topic for different research.

Related posts: :Real-Life Use Case for “Barracuda” InnoDB File Format::INFORMATION_SCHEMA tables in the InnoDB pluggable storage engine::MySQL 5.0, 5.1 and Innodb Plugin CPU Efficiency:
 

3 Comments »

  1. You did what am currently doing. Great job, thanks..

    Comment :: April 23, 2008 @ 11:14 am

  2. It would be very interesting to see how large the performance gains were for I/O bound queries for the compression tables.

    Comment :: April 23, 2008 @ 12:00 pm

  3. 2Daniel Crabtree: Check out my post regarding possible performance gain (of course, it was a specific use case, but still could be interesting: http://www.mysqlperformanceblog.com/2008/04/23/real-life-use-case-for-barracuda-innodb-file-format/

    Comment :: April 23, 2008 @ 2:30 pm

 

Subscribe without commenting

Trackbacks/Pingbacks