Testing InnoDB “Barracuda” format with compression
New features of InnoDB - compression format and fast index creation sound so promising so I spent some time to research time and sizes on data we have on our production. The schema of one of shards is
-
CREATE TABLE `article87` (
-
`id` bigint(20) UNSIGNED NOT NULL,
-
`ext_key` varchar(32) NOT NULL,
-
`site_id` int(10) UNSIGNED NOT NULL,
-
`forum_id` int(10) UNSIGNED NOT NULL,
-
`thread_id` varchar(255) CHARACTER SET latin1 NOT NULL,
-
`published` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
-
`crawled` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
-
`subject` varchar(255) NOT NULL,
-
`title` varchar(255) NOT NULL,
-
`url` varchar(255) NOT NULL,
-
`num_links` smallint(6) NOT NULL,
-
`links_in` int(10) UNSIGNED NOT NULL,
-
`cache_author` varchar(255) NOT NULL,
-
`cache_site` varchar(255) DEFAULT NULL,
-
`anchor` varchar(255) NOT NULL,
-
`isthread` tinyint(3) UNSIGNED NOT NULL,
-
`author_id` int(10) UNSIGNED NOT NULL,
-
`inserted` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
-
`fromfile` varchar(255) NOT NULL,
-
`language_id` tinyint(3) UNSIGNED NOT NULL,
-
`encoding` varchar(255) NOT NULL,
-
`warning` mediumtext NOT NULL,
-
`is_thread_start` tinyint(3) UNSIGNED NOT NULL,
-
`source` mediumint(8) UNSIGNED NOT NULL,
-
`hash` char(32) NOT NULL,
-
`mod_is` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
-
`is_adult` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
-
`bodyuc` mediumtext NOT NULL,
-
PRIMARY KEY (`id`),
-
KEY `ext_key` (`ext_key`),
-
KEY `forum_id` (`forum_id`,`thread_id`,`published`),
-
KEY `site_id` (`site_id`,`published`),
-
KEY `hash` (`hash`),
-
KEY `forum_id_2` (`forum_id`,`is_thread_start`,`published`),
-
KEY `published` (`published`),
-
KEY `inserted` (`inserted`),
-
KEY `forum_id_3` (`forum_id`,`thread_id`,`is_thread_start`),
-
KEY `site_id_2` (`site_id`,`author_id`)
-
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
in fact this is not exact schema - difference is we are using DEFAULT CHARSET=utf8, which seems not working with fast index creation (bug 33650) , so I use latin1 for tests. Also we actually store 'bodyuc' as compressed field, doing compress()/uncompress() to store/restore. To test InnoDB compression I use only uncompressed text, though I tested compression on compress()-ed data just to see if there is any benefit from index compression.
For test I use dump created with mysqldump, final size 30286M.
To create InnoDB compression table I use ENGINE=InnoDB KEY_BLOCK_SIZE=8 and to test fast index creation I create table only with primary key, without additional indexes.
Numbers I got:
load dump into table with default format and all indexes (case 1) - 152m34.792s
size of final .ibd file for (case 1) - 43032M
What if load data and indexes separately:
load dump into table with default format and without indexes (case 2a) - 59m12.575s
size of .ibd after (case 2a) - 36968M
create all indexes (case 2b) - 18min46.05s
size of .ibd after (case 2b) - 40400M
So as you see time to load (case 2) (with fast create index way) is faster almost two times and space decreased by 6%.
Now with compressed table:
load dump into table with default format and all indexes (case 3) - 228m55.251s (time took 1.5 times longer than (case 1) )
size of final .ibd file for (case 3) - 16284M (space ratio is 0.37 compare to (case 1) )
What if load data and indexes separately:
load dump into table with default format and without indexes (case 4a) - 71m10.760s (longer 1.2 times than (case 2a))
size of .ibd after (case 4a) - 13208M (smaller 0.35 times)
create all indexes (case 4b) - 42m54.63sec (longer 2.28 times)
size of .ibd after (case 4b) - 14968M (smaller 0.37 times)
Total time for (case 4) , 6844 sec, took 1.46 times longer than (case 2) , 4678 sec, but space decreased by even more ( 0.37 from original)
Also interesting to note that fast index creation allows to load time two time faster than usual load.
Ok, There are a lot of numbers, let me summarize it:
| Test | Load time, sec | Size, MB |
| Baseline (1), default format, usual load | 9154 | 43032 |
| default format, fast index creation | 4678 (0.51)* | 40400 (0.93) |
| compress format, default load | 13735 (1.50) | 16284 (0.38) |
| compress format, fast index creation | 6844 (0.74) | 14968 (0.35) |
* - ratio to baseline
So in conclusion
- Fast Index creation allows to speedup load 2 times (even indexes fit into memory)
- Load in compress format slower by 30-50%
- Table in compress format takes only 1/3 of original table
Of course we also expect significant performance gain for I/O bound queries in case of compression tables, this is topic for different research.
3 Comments











del.icio.us
digg
You did what am currently doing. Great job, thanks..
Comment :: April 23, 2008 @ 11:14 am
It would be very interesting to see how large the performance gains were for I/O bound queries for the compression tables.
Comment :: April 23, 2008 @ 12:00 pm
2Daniel Crabtree: Check out my post regarding possible performance gain (of course, it was a specific use case, but still could be interesting: http://www.mysqlperformanceblog.com/2008/04/23/real-life-use-case-for-barracuda-innodb-file-format/
Comment :: April 23, 2008 @ 2:30 pm