If you have ever wondered how big a given index is in InnoDB, until now you had to calculate it yourself: multiply the row size (which, I should add, is harder in the case of a VARCHAR, since you need to estimate the average length) by the number of records. And the result would still be quite inaccurate, as secondary indexes tend to take more space than that. So we added more detailed index statistics to our xtrabackup utility. Thanks for this feature go to a well-known social network that sponsored the development.
We chose to put this into xtrabackup for a couple of reasons. First, running statistics against your backup does not hurt the production servers. Second, statistics gathered on a stopped database are more accurate than those gathered online (online is also supported, but the results may be inexact).
Let’s see how it works. I have one 13GB table that was filled over about 2.5 years.
The table is:
CREATE TABLE `link_out104` (
  `domain_id` int(10) unsigned NOT NULL,
  `link_id` int(10) unsigned NOT NULL auto_increment,
  `url_from` varchar(255) NOT NULL,
  `url_to` varchar(255) NOT NULL,
  `anchor` varchar(255) NOT NULL,
  `from_site_id` int(10) unsigned NOT NULL,
  `from_forum_id` int(10) unsigned NOT NULL,
  `from_author_id` int(10) unsigned NOT NULL,
  `from_message_id` bigint(20) unsigned NOT NULL,
  `message_published` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `kind` enum('link','img') NOT NULL,
  `url_title` varchar(255) NOT NULL,
  `isexternal` tinyint(3) unsigned NOT NULL,
  `revert_domain` varchar(255) NOT NULL,
  `url_prefix` varchar(255) NOT NULL,
  `from_domain_id` int(10) unsigned NOT NULL,
  `ext` varchar(25) NOT NULL,
  `linktype` enum('html','video','mp3','image','pdf','other') NOT NULL,
  `message_day` date NOT NULL,
  `mod_is` tinyint(3) unsigned NOT NULL default '0',
  `is_adult` tinyint(3) unsigned NOT NULL default '0',
  PRIMARY KEY (`link_id`),
  UNIQUE KEY `domain_id_2` (`domain_id`,`link_id`),
  KEY `domain_id` (`domain_id`,`from_site_id`,`message_published`),
  KEY `revert_domain` (`revert_domain`,`url_prefix`(80)),
  KEY `from_site_id` (`from_site_id`,`message_published`),
  KEY `site_message` (`from_site_id`,`message_day`,`isexternal`),
  KEY `from_message_id` (`from_message_id`,`link_id`)
) ENGINE=InnoDB AUTO_INCREMENT=26141165 DEFAULT CHARSET=utf8;
And the size of the file is about 12.88 GB:
-rw-r--r-- 1 root root 13832814592 Sep 10 14:41 link_out104.ibd
So to get statistics we run:
xtrabackup --stats --tables=art.link* --datadir=/mnt/data/mysql/
which will show something like this:
<INDEX STATISTICS>
  table: art/link_out104, index: PRIMARY, space id: 12, root page 3
  estimated statistics in dictionary:
    key vals: 25265338, leaf pages 497839, size pages 498304
  real statistics:
    level 2 pages: pages=1, data=5395 bytes, data/pages=32%
    level 1 pages: pages=415, data=6471907 bytes, data/pages=95%
    leaf pages: recs=25958413, pages=497839, data=7492026403 bytes, data/pages=91%
  table: art/link_out104, index: domain_id_2, space id: 12, root page 4
  estimated statistics in dictionary:
    key vals: 27755790, leaf pages 23125, size pages 26495
  real statistics:
    level 2 pages: pages=1, data=510 bytes, data/pages=3%
    level 1 pages: pages=30, data=393125 bytes, data/pages=79%
    leaf pages: recs=25958413, pages=23125, data=337459369 bytes, data/pages=89%
  table: art/link_out104, index: domain_id, space id: 12, root page 5
  estimated statistics in dictionary:
    key vals: 3006231, leaf pages 43255, size pages 49600
  real statistics:
    level 2 pages: pages=1, data=2850 bytes, data/pages=17%
    level 1 pages: pages=114, data=1081375 bytes, data/pages=57%
    leaf pages: recs=25953873, pages=43255, data=545031333 bytes, data/pages=76%
  table: art/link_out104, index: revert_domain, space id: 12, root page 6
  estimated statistics in dictionary:
    key vals: 1204830, leaf pages 133869, size pages 153984
  real statistics:
    level 3 pages: pages=1, data=373 bytes, data/pages=2%
    level 2 pages: pages=6, data=58143 bytes, data/pages=59%
    level 1 pages: pages=832, data=9146283 bytes, data/pages=67%
    leaf pages: recs=25839414, pages=133869, data=1566961607 bytes, data/pages=71%
  table: art/link_out104, index: from_site_id, space id: 12, root page 7
  estimated statistics in dictionary:
    key vals: 330426, leaf pages 33889, size pages 38848
  real statistics:
    level 2 pages: pages=1, data=1764 bytes, data/pages=10%
    level 1 pages: pages=84, data=711669 bytes, data/pages=51%
    leaf pages: recs=25956416, pages=33889, data=441259072 bytes, data/pages=79%
  table: art/link_out104, index: site_message, space id: 12, root page 8
  estimated statistics in dictionary:
    key vals: 1399286, leaf pages 32260, size pages 36992
  real statistics:
    level 2 pages: pages=1, data=1680 bytes, data/pages=10%
    level 1 pages: pages=80, data=677460 bytes, data/pages=51%
    leaf pages: recs=25956043, pages=32260, data=441252731 bytes, data/pages=83%
  table: art/link_out104, index: from_message_id, space id: 12, root page 9
  estimated statistics in dictionary:
    key vals: 25964521, leaf pages 27979, size pages 28160
  real statistics:
    level 2 pages: pages=1, data=798 bytes, data/pages=4%
    level 1 pages: pages=38, data=587559 bytes, data/pages=94%
    leaf pages: recs=25958413, pages=27979, data=441293021 bytes, data/pages=96%
The output is extensive; let me highlight some points:
table: art/link_out104, index: PRIMARY, space id: 12, root page 3
leaf pages: recs=25958413, pages=497839, data=7492026403 bytes, data/pages=91%
It says that the PRIMARY key (which is the table itself, since InnoDB clusters data by primary key) takes 497839 pages (16KB each), holding 7492026403 bytes of data (6.98 GB). The density (how well data fits into pages) is quite good at 91%. But this was expected: rows are mostly inserted into this table, and updates and deletes are rare.
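Those figures follow directly from the output. A quick back-of-the-envelope check (a Python sketch; the constants are just the values from the leaf-pages line above):

```python
PAGE_SIZE = 16 * 1024          # default InnoDB page size in bytes

leaf_pages = 497_839           # "pages=" from the PRIMARY leaf-pages line
leaf_data = 7_492_026_403      # "data=" from the same line

allocated = leaf_pages * PAGE_SIZE
print(f"data size: {leaf_data / 2**30:.2f} GB")        # ~6.98 GB
print(f"density:   {leaf_data * 100 // allocated}%")   # 91 (truncated, as xtrabackup prints it)
```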
And let’s take the domain_id index:
table: art/link_out104, index: domain_id, space id: 12, root page 5
leaf pages: recs=25953873, pages=43255, data=545031333 bytes, data/pages=76%
You can see that the allocated pages (43255 pages, or 708689920 bytes) are only 76% full (the data takes 545031333 bytes). That means about 150MB is simply wasted space. It is even worse for the revert_domain key:
leaf pages: recs=25839414, pages=133869, data=1566961607 bytes, data/pages=71%

For this key, about 600MB is empty.
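The wasted-space figures can be reproduced the same way; a small sketch using the leaf-page numbers from the stats output above:

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size in bytes

def wasted_mb(leaf_pages, leaf_data):
    """Bytes allocated to leaf pages minus bytes of actual data, in MB."""
    return (leaf_pages * PAGE_SIZE - leaf_data) / 2**20

# numbers taken from the stats output above
print(f"domain_id:     {wasted_mb(43_255, 545_031_333):.0f} MB")       # ~156 MB
print(f"revert_domain: {wasted_mb(133_869, 1_566_961_607):.0f} MB")    # ~597 MB
```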
This needs a bit of explanation. The secondary indexes are not as space-efficient as the primary key, but much of that is to be expected. In many cases rows are inserted into the primary key in order, which makes page fills very predictable, while inserts into a secondary index arrive in random key order, which leads to a lot of page splits.
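The effect of page splits on fill factor can be illustrated with a toy model of a B-tree leaf level (a sketch only, not InnoDB's actual algorithm: rightmost inserts split at the insertion point, out-of-order inserts split the page in half):

```python
import bisect
import random

PAGE_CAP = 50        # toy records-per-page; real InnoDB pages hold variable counts

def simulate(keys):
    """Return the average leaf fill factor after inserting `keys` in order."""
    leaves = [[keys[0]]]         # each leaf is a sorted list of keys, leaves kept in key order
    mins = [keys[0]]             # first key of each leaf, for binary-search lookup
    for k in keys[1:]:
        idx = max(bisect.bisect_right(mins, k) - 1, 0)
        leaf = leaves[idx]
        bisect.insort(leaf, k)
        mins[idx] = leaf[0]
        if len(leaf) > PAGE_CAP:
            if k == leaf[-1]:
                new = [leaf.pop()]        # rightmost insert: split at the insert point
            else:
                half = len(leaf) // 2     # out-of-order insert: split the page in half
                new = leaf[half:]
                del leaf[half:]
            leaves.insert(idx + 1, new)
            mins.insert(idx + 1, new[0])
    used = sum(len(leaf) for leaf in leaves)
    return used / (len(leaves) * PAGE_CAP)

N = 20_000
print(f"sequential fill: {simulate(list(range(N))):.0%}")              # close to 100%
print(f"random fill:     {simulate(random.sample(range(N), N)):.0%}")  # far lower
```

With ordered keys every page ends up full; with random keys the half-splits leave pages roughly 70% full on average, which matches the densities seen in the secondary indexes above.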
One helpful new feature that addresses this is fast index creation in the XtraDB/InnoDB plugin. With this feature, InnoDB builds an index by sorting the rows, so the page fill factor should be quite good.
To check that, here is the xtrabackup --stats output for the domain_id index, created on a table in the Barracuda format with the fast index creation method:
table: art/link_out104, index: domain_id, space id: 15, root page 49160
estimated statistics in dictionary:
  key vals: 5750565, leaf pages 34383, size pages 34496
real statistics:
  level 2 pages: pages=1, data=1375 bytes, data/pages=8%
  level 1 pages: pages=55, data=859575 bytes, data/pages=95%
  leaf pages: recs=25958413, pages=34383, data=545126673 bytes, data/pages=96%
As you can see, this time the index takes 34383 pages (compared with 43255 in the previous statistics).
It would be interesting to see how the index grows with further inserts, though; I also suspect that random INSERTs into such densely packed pages will be slower than in the previous case.
The --stats mode is not in an xtrabackup release yet, only in the source code repository, but it should be released quite soon.
And the last point of the post: if you are badly missing some feature in MySQL, InnoDB, the InnoDB plugin, XtraDB, or XtraBackup, you know whom to ask!
wow this is probably the best feature added “evar”! Thanks for the hard work!
Vadim,
What is leaf pages vs size pages? ("key vals: 1204830, leaf pages 133869, size pages 153984") It looks like size pages can be larger than the sum of the page counts across all levels. Is this because some pages are allocated to the index by extent allocation but never used, or is it something else?
This actually points to two different issues with full space utilization: partially filled pages, and empty pages in allocated extents.
Another thing: I think it would be nice to see values per page as well. This can be computed from the rest of the data, but it would be good to see it directly. I would use it to understand how efficient range scans could be.
Another thing: how are these stats computed, by a file scan or by scanning the indexes in order? It may be good to see information similar to what "filefrag" gives, i.e. how badly each index's leaf space is fragmented. In a perfect world we would prefer each of them to be sequential, if we are not using SSDs.
I uploaded an awk script to flatten the output from ‘xtrabackup –stats’. It is at http://launchpad.net/mysqlatfacebook/other/files/+download/xtrabackup_flatten.awk. This generates one line of output per index which is suitable for more processing using Unix command line utilities.
Peter,
Hmm…
Simply put, "estimated statistics in dictionary:" shows the same values as the InnoDB Table Monitor.
("appr.key vals %lu, leaf pages %lu, size pages %lu" : http://dev.mysql.com/doc/refman/5.0/en/innodb-monitors.html#innodb-table-monitor )
So, honestly, I don't fully grasp what "size pages" is exactly yet…
And, I will add information about “recs/page” and “contiguousness”.
(contiguousness like https://www.percona.com/docs/wiki/patches:innodb_check_fragmentation ?)
Yasufumi,
Thanks for the explanation. So how will this tool count pages that are allocated in extents for a given index? Will it simply consider these pages empty? Though I'm not sure the "level" for such pages would be known yet.
This is now available in MySQL 5.6 via mysql.innodb_index_stats:
see http://aadant.com/blog/2014/02/04/how-to-calculate-a-specific-innodb-index-size/