February 3, 2012

Researching your MySQL table sizes

I posted a simple INFORMATION_SCHEMA query to find largest tables last month and it got a good response. Today I needed little modifications to that query to look into few more aspects of data sizes so here it goes:

Find total number of tables, rows, total data in index size for given MySQL Instance

SELECT count(*) tables,
  concat(round(sum(table_rows)/1000000,2),'M') rows,
  concat(round(sum(data_length)/(1024*1024*1024),2),'G') data,
  concat(round(sum(index_length)/(1024*1024*1024),2),'G') idx,
  concat(round(sum(data_length+index_length)/(1024*1024*1024),2),'G') total_size,
  round(sum(index_length)/sum(data_length),2) idxfrac
FROM information_schema.TABLES;
+--------+----------+---------+--------+------------+---------+
| tables | rows     | data    | idx    | total_size | idxfrac |
+--------+----------+---------+--------+------------+---------+
|   1538 | 1623.91M | 314.00G | 36.86G | 350.85G    |    0.12 |
+--------+----------+---------+--------+------------+---------+
1 row in set (52.56 sec)

Find the same data using some filter
I often use similar queries to find space used by particular table “type” in sharded environment when multiple tables with same structure and similar name exists:

SELECT count(*) tables,
       concat(round(sum(table_rows)/1000000,2),'M') rows,
       concat(round(sum(data_length)/(1024*1024*1024),2),'G') data,
       concat(round(sum(index_length)/(1024*1024*1024),2),'G') idx,
       concat(round(sum(data_length+index_length)/(1024*1024*1024),2),'G') total_size,
       round(sum(index_length)/sum(data_length),2) idxfrac
       FROM information_schema.TABLES
       WHERE  table_name like "%performance_log%";
+--------+---------+---------+-------+------------+---------+
| tables | rows    | data    | idx   | total_size | idxfrac |
+--------+---------+---------+-------+------------+---------+
|    120 | 370.29M | 163.97G | 0.00G | 163.97G    |    0.00 |
+--------+---------+---------+-------+------------+---------+
1 row in set (0.03 sec)

Find biggest databases

SELECT
        count(*) tables,
        table_schema,concat(round(sum(table_rows)/1000000,2),'M') rows,
        concat(round(sum(data_length)/(1024*1024*1024),2),'G') data,
        concat(round(sum(index_length)/(1024*1024*1024),2),'G') idx,
        concat(round(sum(data_length+index_length)/(1024*1024*1024),2),'G') total_size,
        round(sum(index_length)/sum(data_length),2) idxfrac
        FROM information_schema.TABLES
        GROUP BY table_schema
        ORDER BY sum(data_length+index_length) DESC LIMIT 10;
+--------+--------------------+-------+-------+-------+------------+---------+
| tables | table_schema       | rows  | data  | idx   | total_size | idxfrac |
+--------+--------------------+-------+-------+-------+------------+---------+
|     48 | cacti              | 0.01M | 0.00G | 0.00G | 0.00G      |    0.72 |
|     17 | mysql              | 0.00M | 0.00G | 0.00G | 0.00G      |    0.18 |
|      4 | pdns               | 0.00M | 0.00G | 0.00G | 0.00G      |    1.00 |
|      2 | test               | 0.00M | 0.00G | 0.00G | 0.00G      |    0.12 |
|     16 | information_schema | NULL  | 0.00G | 0.00G | 0.00G      |    NULL |
+--------+--------------------+-------+-------+-------+------------+---------+
5 rows in set (0.32 sec)

Data Distribution by Storage Engines
You can change this query a bit and get most popular storage engines by number of tables or number of rows instead of data stored.

SELECT engine,
        count(*) tables,
        concat(round(sum(table_rows)/1000000,2),'M') rows,
        concat(round(sum(data_length)/(1024*1024*1024),2),'G') data,
        concat(round(sum(index_length)/(1024*1024*1024),2),'G') idx,
        concat(round(sum(data_length+index_length)/(1024*1024*1024),2),'G') total_size,
        round(sum(index_length)/sum(data_length),2) idxfrac
        FROM information_schema.TABLES
        GROUP BY engine
        ORDER BY sum(data_length+index_length) DESC LIMIT 10;
+------------+--------+---------+---------+--------+------------+---------+
| engine     | tables | rows    | data    | idx    | total_size | idxfrac |
+------------+--------+---------+---------+--------+------------+---------+
| MyISAM     |   1243 | 941.06M | 244.09G | 4.37G  | 248.47G    |    0.02 |
| InnoDB     |    280 | 682.82M | 63.91G  | 32.49G | 96.40G     |    0.51 |
| MRG_MyISAM |      1 | 13.66M  | 6.01G   | 0.00G  | 6.01G      |    0.00 |
| MEMORY     |     14 | 0.00M   | 0.00G   | 0.00G  | 0.00G      |    NULL |
+------------+--------+---------+---------+--------+------------+---------+
4 rows in set (14.02 sec)

Trivial but handy.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Some very handy queries here, Peter. Thank you. This is going to my bookmarks.

  2. ryan says:

    These have defiantly gone into my snippets list, thanks.

    Also, BTW…
    Your syntax highlighter has a spelling error. When you switch to ‘PLAIN TEXT’, the option to switch back to the highlighted version is tagged ‘HILITED HTML’ – ought to be ‘HIGHLIGHTED HTML’.

  3. ryan, that’s just an American slang version of the word, not really a misspelling.

  4. ryan says:

    Hum, learned two things today then :-)

  5. Prashanth Gajra says:

    Anybody can let me know how can i get the total number of tables in database without using a SELECT query but by using a stored procedure?

  6. peter says:

    Well what is the problem with select query ?
    You can wrap it around by stored procedure if you need to.

  7. Alexey Kupershtokh says:

    Could you remove all the “->” in order to allow people copy-paste right from here?

  8. Neil says:

    Why does SUM(table_rows) fluctuate so wildly? I run this query:

    SELECT CONCAT(SUM(table_rows)/1000000,’M') AS no_rows FROM information_schema.TABLES WHERE TABLE_SCHEMA = ”;

    and no_rows was first 2.73M, then 2.83M, then 2.81M, then 2.78M, then 2.81M…?

    Why does this vary?

  9. It’s a restriction of InnoDB. See http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html. It gives an approximate count, which fluctuates a lot.

  10. Taha Patel says:

    This is really cool. Thanks.

  11. Gerry says:

    Another vote for putting the queries on one line for copy and pasting.

  12. bernard says:

    excellent post, but one question. How to find total no. of indexes/index keys (not index size) and unique keys, ordered keys.I know you have the solution, please post it. Thanks in advance.

  13. Easier to copy paste:

    SELECT count(*) TABLES, concat(round(sum(table_rows)/1000000,2),’M') rows, concat(round(sum(data_length)/(1024*1024*1024),2),’G') DATA, concat(round(sum(index_length)/(1024*1024*1024),2),’G') idx,
    concat(round(sum(data_length+index_length)/(1024*1024*1024),2),’G') total_size, round(sum(index_length)/sum(data_length),2) idxfrac
    FROM information_schema.TABLES;

  14. Nice article! Really good to analyse data usage!

  15. narendra says:

    how to find only size of the table…?

Speak Your Mind

*