One data point that is very helpful, but which surprisingly few people have, is the history of their table sizes. Projecting data growth is a very important component of capacity planning, and simply watching the growth of space used on a partition is not very helpful.
Now that MySQL 5.0+ has the information schema, collecting and keeping this data is very easy:
create database stats;
use stats;
CREATE TABLE `tables` (
  `DAY` date NOT NULL,
  `TABLE_SCHEMA` varchar(64) NOT NULL DEFAULT '',
  `TABLE_NAME` varchar(64) NOT NULL DEFAULT '',
  `ENGINE` varchar(64) DEFAULT NULL,
  `TABLE_ROWS` bigint(21) unsigned DEFAULT NULL,
  `DATA_LENGTH` bigint(21) unsigned DEFAULT NULL,
  `INDEX_LENGTH` bigint(21) unsigned DEFAULT NULL,
  `DATA_FREE` bigint(21) unsigned DEFAULT NULL,
  `AUTO_INCREMENT` bigint(21) unsigned DEFAULT NULL,
  PRIMARY KEY (DAY, TABLE_SCHEMA, TABLE_NAME),
  KEY (TABLE_SCHEMA, TABLE_NAME)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And use this query to populate it:
INSERT INTO stats.tables
SELECT DATE(NOW()),
       TABLE_SCHEMA,
       TABLE_NAME,
       ENGINE,
       TABLE_ROWS,
       DATA_LENGTH,
       INDEX_LENGTH,
       DATA_FREE,
       AUTO_INCREMENT
FROM INFORMATION_SCHEMA.TABLES;
I put it in cron to run nightly as:
1 0 * * * mysql -u root -e "INSERT INTO stats.tables SELECT DATE(NOW()),TABLE_SCHEMA,TABLE_NAME,ENGINE,TABLE_ROWS,DATA_LENGTH,INDEX_LENGTH,DATA_FREE,AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES"
Though if you’re looking to keep it completely inside MySQL, you can create an appropriate event in MySQL 5.1+.
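As a sketch, such an event might look like the following (the event name is my own choice, and the event scheduler has to be enabled for it to fire):

```sql
-- Hypothetical MySQL 5.1+ event doing the same nightly collection.
-- Requires the event scheduler: SET GLOBAL event_scheduler = ON;
CREATE EVENT stats.collect_table_sizes
  ON SCHEDULE EVERY 1 DAY
  STARTS CURRENT_DATE + INTERVAL 1 DAY + INTERVAL 1 MINUTE
DO
  INSERT INTO stats.tables
  SELECT DATE(NOW()), TABLE_SCHEMA, TABLE_NAME, ENGINE, TABLE_ROWS,
         DATA_LENGTH, INDEX_LENGTH, DATA_FREE, AUTO_INCREMENT
  FROM INFORMATION_SCHEMA.TABLES;
```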
Unless you have millions of tables, this is something you can set up and forget, but a year later, when someone asks you about the growth rate of an individual table, you will have it handy.
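When that question comes, a self-join on the collected data answers it; something along these lines (the two dates are placeholders to substitute with sampled days):

```sql
-- Per-table growth between two sampled days (placeholder dates):
SELECT a.TABLE_SCHEMA, a.TABLE_NAME,
       b.DATA_LENGTH + b.INDEX_LENGTH AS size_now,
       (b.DATA_LENGTH + b.INDEX_LENGTH)
         - (a.DATA_LENGTH + a.INDEX_LENGTH) AS growth_bytes
FROM stats.tables a
JOIN stats.tables b
  ON a.TABLE_SCHEMA = b.TABLE_SCHEMA AND a.TABLE_NAME = b.TABLE_NAME
WHERE a.DAY = '2010-01-01' AND b.DAY = '2010-02-01'
ORDER BY growth_bytes DESC;
```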
If you have a large number of MySQL servers, especially in a sharded environment, it makes sense to modify the script so you store the data on one central server; this allows you to run a lot of nice queries to see how evenly data is distributed among shards, etc. In a replicated environment you should put this cron job on the master only. If you use statement-based replication, the insert will re-execute on each slave, so it automatically picks up the sizes, and even different sets of tables, on both master and slaves.
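For the centralized, sharded case, one way to sketch the distribution query: assume the collection script is modified to add a HOST column (not in the schema above) identifying the shard each row came from. Then:

```sql
-- Sketch assuming an extra HOST column added to stats.tables,
-- populated with the shard's hostname by the collection script:
SELECT HOST,
       SUM(DATA_LENGTH + INDEX_LENGTH) AS total_bytes,
       SUM(TABLE_ROWS)                 AS total_rows
FROM stats.tables
WHERE DAY = DATE(NOW())
GROUP BY HOST
ORDER BY total_bytes DESC;
```

A large spread in total_bytes between hosts is a quick sign the sharding key is distributing data unevenly.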
Thanks Peter, what a helpful little tool, fully automated!
Makes me think of keeping this as a table stat to have better expansion of data files.
Nice. Thanks.
Very useful tip. Instead of inserting the records into a table, I would prefer saving them to a text file by adding the standard and error output to the cron entry.
… >> /home/mysql_log.txt 2>> /home/mysql_log_err.txt
When I need to know the number of records, I can now use grep
grep -w tbl_name mysql_log.txt
“Unless you’re having millions of tables this is something you can set it up and forget but when a year later when someone asks you about growth rate for individual table you will have it handy.”
Depending on what your environment looks like, the information_schema query can easily run a server out of IO (among other things) for long periods of time. I have had to clean up a rather large mess caused by a query like the one suggested here.
I suggest that the query above should only be used if there is a relatively small number of tables. I would probably not use it if there were more than a thousand tables on a server.
YMMV
Maybe using events could be nice instead of cron.
I might suggest a small modification to cut out the noise from information_schema / mysql databases:
INSERT INTO stats.tables SELECT DATE(NOW()),TABLE_SCHEMA,TABLE_NAME,ENGINE,TABLE_ROWS,DATA_LENGTH,INDEX_LENGTH,DATA_FREE,AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema');
It’s not a big deal, but these rarely change.
Morgan,
Yeah. You can typically skip these databases. You probably would skip performance_schema in MySQL 5.5 too.
Rob,
Indeed, information schema queries can have a very different impact on server performance. It is not just the number of tables, but also whether they fit in memory, and the number of indexes on InnoDB tables, as statistics would need to be recomputed. How much impact it has depends on how powerful the server is: the query does a lot of random IO, which can hurt the server’s response to other queries. If you’re in such a situation, you can use a more complex script that does SHOW TABLES (which is very fast) and then gets the size table by table, with a sleep in between.
Maatkit’s mk-find has features to iterate the tables and insert various information from them, just as you’ve described. Another advantage of this is that you won’t get false-positive checksum differences, which you’ll get if you let INSERT..SELECT replicate.
Very cool and helpful tool, but the issue with it is that the TABLE_ROWS field does not contain accurate numbers; it contains estimated values.
Realize that for large InnoDB schemas (e.g. over 1TB, tables over 500GB etc), accessing the INFORMATION_SCHEMA.TABLES information for InnoDB tables will cause random index scans of the data for statistics purpose. This is one reason why these queries can take minutes to run.
Sad, but it doesn’t provide correct info for barracuda tables. Any workarounds?