August 20, 2014

How much space does empty Innodb table take ?

How much space would empty MyISAM table take ? Probably 8K for .frm file, 1KB for .MYI file and 0 for MYD file. .MYI file can be larger if you have many indexes.

How much space will Innodb take:

Check out files (using Innodb File Per Table)

-rw-rw—- 1 mysql mysql 8578 Dec 16 20:33 test_innodb.frm
-rw-rw—- 1 mysql mysql 98304 Dec 16 20:33 test_innodb.ibd

So we get about 100K and so about 10 times more for MyISAM. This is ignored space which needs to be allocated in main tablespace for Innodb data dictionary. But that one is pretty small.

This is the good reason to avoid having very small Innodb tables – they will take much more space than MyISAM.

So .ibd file we get in case of table having no indexes (besides clustered key) – takes 6*16K pages. I wonder why as much as 6 pages are required for start ?

If we add more indexes to this tables – each further index will take additional 16K page even if it contains no data. This is understandable – each index has to have at least one page allocated to it.

Now it is very interesting – SHOW TABLE STATUS does not seems to show everything:

Such table’s .idb file takes 128K from the start while we only see 16K of data+32K of index, so another 5 pages are invisible. This tells me you can’t use this information to reliably identify space tables take on disk, especially for large number of very small Innodb tables.

Also note amount of free space – even though pages contain no data they are not considered free.

Avg_row_length is another field which may need an explanation. This value is computed by dividing Data_Length (exact number) by number of rows (estimated number) which means this value is going to be changing back and forth and it would be very inaccurate for small tables. For example it will show 16K as average row size for table with one row:

Free Space for tables created in innodb_file_per_table mode is interesting question on its own.

As we populate table we will see Free space will remain at zero as Data_length is small:

When at certain point you will see Innodb Free space to become non zero:

And the file size also jumps significantly (to 9MB):

-rw-rw—- 1 mysql mysql 8578 Dec 16 20:58 test_innodb.frm
-rw-rw—- 1 mysql mysql 9437184 Dec 16 21:06 test_innodb.ibd

If you do the math here you can see there is only about 1MB out of 9MB seen as Index_Length+Data_Length while another 4MB are visible in the “Innodb Free Space” and almost 5MB more is not visible at all.

This tells you it is not about tables which contain couple of rows which can take a lot of space in Innodb. Tables showing as using 1MB of Innodb Data can also really be taking almost 10 times more on the disk.

It is not quite clear to me what is happening here. According to documentation each index should get 2 segments one for non-leaf an one for leaf pages. However the space allocation should happen page by page until whole 32 pages allocated. In the case above no single segment should require more than 32 pages so It is surprising why all of them take 5MB (because 4MB are free)

What is clear however is what if some pages from segment are allocated it goes in the interesting space regarding space reporting – it will be gone from the free space, while only pages actually allocated will be shown in Data_Length and Index_Length fields.

Doing more tests with Inserts I can see Innodb seems to always try to keep at least 4MB free in the tablespace – populating table with more and more data I see free space never falls below 4MB while data file on disk continues to grow.

Finally it is worth to note if you’re using innodb_file_per_table the per table tablespaces are not going to grow by innodb_autoextend_increment – instead file will grow by 1MB to 4MB increments. There is a bug reported about it.

As a Summary I should not the following:

  • Small Innodb tables will take more space on disk than one may anticipate.
  • Using innodb_file_per_table can cause significant space use overhead if small tables are used
  • The information in INFORMATION_SCHEMA can’t be used to judge how much space table is really taking on disk.
About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Peter, very interesting,

    On large tables I have found that the difference is insignificant. There’s always these one or two very large tables which consume most disk space, and when considering disk limitations, they are the one being taken into account.

    Did you happen a lot on databases where there were many small tables, and which constituted the majority of disk space?

  2. peter says:

    Shlomi,

    For large tables the difference between reported size and physical size may be insignificant this is true.

    However there are very different applications around the world. There are applications which have hundreds of thousands and millions of tables. For these this difference is significant.

  3. Pete says:

    Related question: what is the storage cost for sparse INNODb tables, for example tables with 20 columns where 18 often contain NULL values? Is space pre-allocated?

  4. peter says:

    Pete,

    In the recent Innodb versions the NULL values are not simply not stored. Format will be something like column 1 value 2, column 10 value 3 skipping all 8 NULL columns in between.

  5. We’ve been using Innodb tables and finding that the mix of federated and innodb makes any attempt at reporting on innodb via information_schema useless.

    Has anyone run across federated tables causing the inability to report anything useful from information_schema?
    What I have found is that *AnythinG* more complicated than a select table_name, table_schema, engine (no like statements, concat, or math on the fly will not work — see below for the concat ex)

    I have and wonder if there is a good work around for 5.1.30
    mysql Ver 14.14 Distrib 5.1.30, for unknown-linux-gnu (x86_64) using readline 5.1

    All these wonderful ideas to monitor growth seem to get whacked when I run specific queries. Some queries to information_schema are ok though so it isn’t consistently and completely broken.

    I’ve tried to dump the entire set of databases with triggers and functions (-R) to search for it and that won’t work even with –no-data and –force

    One, this db.acct_set table does not exist because there is no database db. I assume it could be defined anywhere.
    Two, the obscure error 1146 (select error) and 1431 (error on mysqldump) don’t get me very far

    mysql> SELECT concat(table_schema,’.’,table_name),concat(round(table_rows/1000000,2),’M’) rows,concat(round(data_length/(1024*1024*1024),2),’G’) DATA,concat(round(index_length/(1024*1024*1024),2),’G’) idx,concat(round((data_length+index_length)/(1024*1024*1024),2),’G’) total_size,round(index_length/data_length,2) idxfrac FROM information_schema.TABLES ORDER BY data_length+index_length DESC LIMIT 10;
    ERROR 1431 (HY000): The foreign data source you are trying to reference does not exist. Data source error: error: 1146 ‘Table ‘db.acct_set’ doesn’t exist’

    mysql> desc db.acct_set;
    ERROR 1146 (42S02): Table ‘manhunt.v4_account_settings’ doesn’t exist

  6. Peter, thanks for these explanations,

    However, I’m facing an important issue on large InnoDB table as well. For instance, I created a table with 1 BIGINT field, 7 DATE fields and 9 TINYINT fields (theorically 38 bytes per row and no variable length field).

    The table contains 30 partitions with 7 millions rows each (total of 210 millions rows).

    The information_schema returns an AVG_ROW_LENGTH of 77 BYTES… and the estimated number of rows is correct.

    How can we explain such a difference from 38 to 77 bytes?

  7. Rick James says:

    Bruno,

    There is a lot of overhead in InnoDB.
    * A 16KB block is never completely full. After a lot of churn, the block splits will lead to an average of a block being 69% full.
    * Due to MVCC, there can be extra copies of a row in the block.
    * Each field (at least non-NULL fields) has 1 or 2 bytes of length, etc, even for fixed-length fields.
    * Each row has (used to have) 29 bytes of overhead.
    Your 77 bytes is actually less than what this info predicts.

    Peter,
    I have empirically determined that “free space” (of TABLE STATUS) does not include free space in the 16KB blocks. And I suspect that, for large tables, completely freed 16KB blocks are not counted as “free” until a 1MB extent (or whatever) is completely freed.
    This makes it difficult to know when to OPTIMIZE a table (or REORGANIZE PARTITION) because one cannot see how much would be freed.

  8. Esad says:

    Peter, thank you for your explanations.
    But now i am wondering, should i use innodb or move on with myisam?

Speak Your Mind

*