May 1, 2008

Learning about MySQL Table Fragmentation

Posted by peter

Recently I was working with the customer who need quick warmup - to get Innodb table fetched in memory as fast as possible to get good in memory access performance.

To do it I run the query: “SELECT count(*) FROM tbl WHERE non_idx_col=0″ I use this particular form of query because it will do full table scan - running count(*) without where clause may pick to scan some small index instead.

If your table is not fragmented one of two things should happen - either you should be reading at your hard drive sequential read rate or you would see MySQL becoming CPU bound if IO subsystem is too fast.

In this case however I saw neither - The vmstat showed read speed less than 10MB/sec which is very low for this system which had 6 15K SAS hard drives in RAID10.

Another indication of bad fragmentation was average IO size seen in SHOW INNODB STATUS output. It was around 20KB which means most reads are single page (16K reads). In case of non fragmented table you would see Innodb sequential read-ahead kick in which does reads in 1MB blocks and so you would see average IO size in hundreds of KB.

Now it is worth to notice you can see poor sequential scan performance even if table is not logically fragmented and Innodb is reading data in large blocks - this can happen in case Innodb table file is itself fragmented.

To check if this is the case I usually do “cat table.ibd > /dev/null” and watch IO statistics. If you see small IO request sizes in iostat and simply read speed. Like for the customer in question I saw file read speed of about 50MB/sec which is of course much better than 10MB/sec but well below RAID array capacity.

To check if file fragmentation is the issue or it is poor or miss configured IO subsystem I do another check by running cat /dev/sdb1 > /dev/null - Physical hard drive should never suffer fragmentation so you can get as much sequential IO as you can get (using IO pattern “cat” uses). In this case I got about 300MB/sec which confirmed file fragmentation is also the issue.

Interesting enough the “cure” for both fragmentation issues is the same - OPTIMIZE TABLE tbl - this command recreates the table by writing the new .ibd file (if you’re using innodb_file_per_table=1) which normally would be much less fragmented because it is written at once. Too bad however it requires table to be locked while it is being rebuilt and also it really only defragments clustered key but not the index.

P.S It would be cool to get Innodb objects (data and Index) fragmentation statistics which actually should not be that hard to implement.

Related posts: :Working with many files and file system fragmentation::RAID System performance surprises::Quickly preloading Innodb tables in the buffer pool:
 

9 Comments »

  1. Would also be nice to get stats about whether and why readahead isn’t working.

    But these might be the same stats….. hm

    Comment :: May 3, 2008 @ 9:01 pm

  2. Kevin,

    I think non working read-ahead is result not the cause - if you have requests coming in a way read-ahead is not triggered it does not work :)

    Comment :: May 3, 2008 @ 9:47 pm

  3. just a note about “Optimize table” fixing file fragmentation and table space fragmentation: For ext-3 this is only true if there is enough contiguous blocks left on the disk for the new file created by optimize table-while the table space fragmentation is still solved with Optimize table.

    http://mysqldba.blogspot.com/2008/01/filefrag-dbas-best-friend.html

    Comment :: May 5, 2008 @ 12:26 pm

  4. Dathan,

    Thanks - Indeed optimize/copy may not succeed defragmenting but this is as much as we can succeed trying to defragment something from application level.
    Even if you do not have enough contiguos blocks in single piece the new file often would still have smaller number of fragments than old one.

    Comment :: May 5, 2008 @ 1:24 pm

  5. You may want to check out filefrag, this will tell you (rather quickly) how many fragments your file has. This works on just about any file system (on linux).

    If running XFS, xfs_bmap gives you much more detailed output. xfs_fsr will also (online) defragment the file. You can also use xfs_io to reserve space (resvspc) for files if you want to do this from the start.

    Comment :: May 5, 2008 @ 8:41 pm

  6. I have been using a similar method to ‘warm up the cache’ so to speak: SELECT 1 FROM (SELECT * FROM tablename) x LIMIT 1;
    Is there any benefit from doing it your way over this way or are they pretty much the same thing?

    Comment :: May 6, 2008 @ 7:47 pm

  7. Jay,

    Your method will require temporary table creation in MySQL which can be very large so it is better to use different query patterns as highlighted.

    Comment :: May 6, 2008 @ 11:33 pm

  8. I posted this to MySQL:
    http://bugs.mysql.com/bug.php?id=36588

    Originally we had this discussion in our Forums:
    http://www.webyog.com/forums//index.php?showtopic=3925

    Is this related, you think. Or am I overlooking something basic?

    The culprit is that also SHOW TABLE STATUS can be very slow. Same for queries to Information_Schema fetching the same information. But it seems not always so - I have tried on my laptop and here it is not the case. Framentation makes the difference?

    Comment :: May 11, 2008 @ 1:40 am

  9. Peter,

    Good catch though it does not have anything to do with table fragmentation. Simply if you’re querying number of rows in table for Innodb it performs estimation - this estimation requires several random dives in the clustered index, which causes IO for large table sizes.

    Comment :: May 12, 2008 @ 10:01 pm

 

Subscribe without commenting


This page was found by: mysql fragmented tab... mysql find fragmente... table fragmentation ... defragmenting a inno... optimize mysql defra...