Compact backups in Percona XtraBackup

We continue to improve Percona XtraBackup, and today I would like to preview a feature that is coming in the next release, Percona XtraBackup 2.1.

The feature is called “Compact backups”. Let me explain what it does.
As you may know, the InnoDB PK (Primary Key) contains all of a table's data, and every secondary index holds only a subset of the columns already stored in the Primary Key. So in theory we can store only the PK and rebuild the secondary indexes when we need them. Well, now it is possible not only in theory.

To create a compact backup, run
innobackupex --compact
and it will create a backup in which all InnoDB tables contain only their Primary Keys, without secondary indexes.
This saves some space on the backup storage. How much? Well, it depends on how many indexes you have.
For example, for the order_line table from the tpcc benchmark at 100W (100 warehouses):
Original size: 3140M,
Size in compact backup: 2228M.
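
To put the two sizes above in perspective, here is a minimal sketch of the arithmetic. The function name compact_savings is only illustrative and is not part of XtraBackup; the inputs are the two figures quoted above, taken in the same unit.

```python
from typing import Tuple


def compact_savings(original: float, compact: float) -> Tuple[float, float]:
    """Return (absolute saving, saving as a percentage of the original size).

    Both arguments must be in the same unit (here: the "M" figures above).
    """
    if original <= 0:
        raise ValueError("original size must be positive")
    saved = original - compact
    return saved, 100.0 * saved / original


# Figures for order_line quoted in the post.
saved_m, saved_pct = compact_savings(3140, 2228)
print(f"saved {saved_m:.0f}M ({saved_pct:.1f}% of the original)")
# prints: saved 912M (29.0% of the original)
```

So for this particular table the compact backup is roughly 29% smaller; a table with fewer or narrower secondary indexes would save less.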

You may suspect that there is a catch somewhere, and yes, there is.
To recover a usable database we need to rebuild the indexes. This is done at the prepare stage, and it takes time.
The command to prepare is:

innobackupex --apply-log --rebuild-indexes /data/backup

As a bonus, the secondary indexes are created by sorting, which in general produces much less fragmented indexes, so it may result in additional space savings.

In fact, --rebuild-indexes can also be used on a full backup, and it will result in rebuilt, defragmented indexes.

I encourage you to try this feature and report your experience.
Right now it is available only in source code from https://code.launchpad.net/~percona-core/percona-xtrabackup/2.1,
but preview binaries should be available soon.

11 Comments

marc castrovinci

11 years ago

I’m not sure if this is the forum for this question, but what do you suspect is the max size for a database when XtraBackup becomes an unusable option?

We have a server that is 150GB, and XtraBackup is now taking up to 40 minutes for an incremental (which we do hourly). So out of an hour, the slave server gets 20 minutes of rest. I’m in the process of changing the backup scheme to use ec2-consistent-snapshot.

I haven’t tried the newer XtraDB, which utilizes changed page mapping, but even that will only delay the inevitable: an incremental taking a full hour and running into the next one.

Vadim Tkachenko

Author

11 years ago

Marc,

the max size for a database is defined by your workload and your storage, and also by how long a backup procedure you are willing to tolerate.

For example, with a light read-only workload on SSD storage, even huge databases can be backed up quickly.

As for incremental backups – we have another feature coming, “Bitmap-based backup”, which should
make incremental backups much faster, stay tuned.

Frederic Descamps

11 years ago

Hi Vadim,

Does it mean that taking a backup will become slower, as the tablespace is not “copied” directly from the file system?

Vadim Tkachenko

Author

11 years ago

Fred,

Not necessarily. We still copy from the file system, but we skip the pages used by secondary keys…

marc castrovinci

11 years ago

Vadim,

Thanks for the info. Looking forward to the updates. As a DBA, I still prefer to have at least a few backups that I can manage and move around, unlike an EC2 snapshot, which DevOps owns.

Peter Zaitsev

Admin

11 years ago

Vadim,

Nice. I assume this feature works with compression too, right? Did we do any measurements on it? I guess data and indexes would normally compress at different ratios.

Also, regarding rebuilding indexes: is it done sequentially, or is there some level of parallel index build (for the same or different tables)?

Vadim Tkachenko

Author

11 years ago

Peter,

This works with compression, but I do not have numbers.

Right now indexes are rebuilt sequentially, but rebuilding them in parallel should be the next improvement.

Mark Callaghan

11 years ago

Nice feature. Thanks for advancing the state of the art.

Franck

11 years ago

@ Marc castrovinci
40 min for 150GB: it seems to me you’re doing a full backup.
I thought an incremental backup would be backing up the binary logs?

shawn

11 years ago

Vadim,
I used this new feature, but I have a question. If I don’t use the --rebuild-indexes option, I would expect prepare to only apply the data to the data files, with no space allocated for secondary indexes, so the data files could still be used. In practice, when I don’t use the rebuild option, XtraBackup still adds some vacant pages to the existing tablespace file, and at that point the tablespace file is unusable.

Should it work the way I expect: without the index rebuild, can a tablespace that contains only the primary key be used correctly?

Alexey Kopytov

Editor

11 years ago

@shawn As answered elsewhere, no, it is impossible to use datafiles without rebuilding indexes.
