Want to archive tables? Use Percona Toolkit's pt-archiver

Percona Toolkit’s pt-archiver is one of the best utilities for archiving records from large tables into other tables or files. One interesting thing is that pt-archiver is a read-write tool: it deletes data from the source by default, so after archiving you don’t need to delete it separately.

Because deletion is the default, take care before running it on a production server. You can test your archiving jobs with --dry-run, or use the --no-delete option if you’re not sure about the result. The main purpose of the tool is to archive old data from a table without impacting OLTP queries, either inserting the data into another table on the same or a different server, or writing it to a file in a format suitable for LOAD DATA INFILE.
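One cautious way to rehearse a job is to build the command with an explicit safety mode and look at it before running anything. The sketch below only prints the command line. The DSN, file path and WHERE clause are placeholder values, and the safety_flag and archive_command helpers are hypothetical names, not part of Percona Toolkit.

```shell
#!/bin/sh
# Rehearse an archiving job before it is allowed to delete anything.
# SOURCE_DSN, ARCHIVE_FILE and WHERE_CLAUSE are placeholder values.
SOURCE_DSN='h=localhost,D=nil,t=test'
ARCHIVE_FILE='/tmp/%Y-%m-%d-%t'
WHERE_CLAUSE="name='nil'"

# safety_flag MODE: map a rehearsal mode to the pt-archiver option it needs.
#   dry-run -> --dry-run    (print the queries and exit, change nothing)
#   keep    -> --no-delete  (archive rows but leave them in the source table)
#   purge   -> no extra flag (default behaviour: archived rows are deleted)
safety_flag() {
    case "$1" in
        dry-run) echo '--dry-run' ;;
        keep)    echo '--no-delete' ;;
        purge)   echo '' ;;
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# archive_command MODE: print the full command line instead of running it.
archive_command() {
    flag=$(safety_flag "$1") || return 1
    printf "pt-archiver --source %s --file '%s' --where \"%s\" --limit 1000%s\n" \
        "$SOURCE_DSN" "$ARCHIVE_FILE" "$WHERE_CLAUSE" "${flag:+ $flag}"
}

archive_command dry-run
```

Once the printed command looks right, it can be pasted into a shell against a test server first, moving from dry-run to keep before the deleting default is ever used.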

How does pt-archiver select records to archive?

pt-archiver uses an index to select records from the table. The index is used to optimize repeated accesses to the table. pt-archiver remembers the last row it retrieves from each SELECT statement and uses it to construct a WHERE clause. The clause is built from the columns of the chosen index, which should allow MySQL to start the next SELECT where the last one ended, rather than potentially scanning from the beginning of the table with each successive SELECT.
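The idea of resuming from the last row can be sketched by printing the rough shape of two successive SELECTs. This is an illustration only: the statements pt-archiver really generates differ in detail, and the key column id is an assumption (the table name and WHERE text are borrowed from the examples in this post).

```shell
#!/bin/sh
# Illustrative only: prints the rough shape of the SELECT for each chunk.
# Table "test", key column "id" and the WHERE text are example values.

# chunk_select LAST_KEY LIMIT
#   LAST_KEY is the key of the last row already fetched ("" for the first chunk).
chunk_select() {
    resume=''
    if [ -n "$1" ]; then
        # Resume just past the remembered row instead of rescanning the table.
        resume=" AND (id > $1)"
    fi
    printf "SELECT id, name FROM test WHERE (name='nil')%s ORDER BY id LIMIT %s\n" "$resume" "$2"
}

chunk_select ''   1000   # first chunk: no remembered row yet
chunk_select 1000 1000   # next chunk: continue after id 1000
```

Because the extra condition is on the indexed column and the rows are read in index order, each chunk can begin with an index lookup instead of a scan over rows that were already handled.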

If you want to run pt-archiver with a specific index, you can use the “i” option in the --source DSN. The “i” option tells pt-archiver which index it should scan to archive. This appears in a FORCE INDEX or USE INDEX hint in the SELECT statements that are used to fetch rows to archive. If you don’t specify anything, pt-archiver will auto-discover a good index, preferring a PRIMARY KEY if one exists. Most of the time pt-archiver works well without the “i” option.
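As a small sketch, the index is simply one more key=value pair in the source DSN. The helper name with_index and the index name idx_name below are hypothetical; the command is printed rather than run, and --dry-run is included so that a real run would only show the generated queries.

```shell
#!/bin/sh
# with_index DSN [INDEX]: append the i= key to a DSN when an index is given.
# "idx_name" below is a made-up index name used for illustration.
with_index() {
    if [ -n "$2" ]; then
        printf '%s,i=%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

dsn=$(with_index 'h=localhost,D=nil,t=test' 'idx_name')
printf '%s\n' "pt-archiver --source $dsn --file '/tmp/%Y-%m-%d-%t' --where \"name='nil'\" --limit 1000 --dry-run"
```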

How to run pt-archiver?

To archive records into a plain file, you can run something like

pt-archiver --source h=localhost,D=nil,t=test --file '/home/nilnandan/%Y-%m-%d-tabname' --where "name='nil'" --limit 1000

To archive records from one table to another table on the same or a different server, you can run something like

pt-archiver --source h=localhost,D=nil,t=test --dest h=fedora.vm --where "name='nil'" --limit 1000

Please check this before you use the default file option (-F) in --source.

Archiving in a replication environment:

In a replication environment it’s really important that the slave does not lag for a long time. Two options help keep slave lag under control while archiving:

--check-slave-lag : Pause archiving until the specified DSN’s slave lag is less than --max-lag. With this option you give the connection details of the slave whose lag should be checked (e.g. --check-slave-lag h=localhost,S=/tmp/mysql_sandbox29784.sock).

--max-lag : Pause archiving if the slave given by --check-slave-lag lags.

This option causes pt-archiver to look at the slave every time it’s about to fetch another row. If the slave’s lag is greater than the option’s value, or if the slave isn’t running (so its lag is NULL), pt-archiver sleeps for --check-interval seconds and then looks at the lag again. It repeats until the slave has caught up, then proceeds to fetch and archive the row.
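The pause-and-recheck behaviour described above can be sketched as a small loop. This is not pt-archiver’s own code: get_lag is a hypothetical hook that here replays canned lag samples from a temporary file so the sketch runs without MySQL, whereas a real hook would read the slave’s lag from the server.

```shell
#!/bin/sh
# Sketch of the wait loop: keep checking the slave until its lag is acceptable.
# LAG_SAMPLES is a stand-in data source holding one lag reading per line.
LAG_SAMPLES=$(mktemp)
printf '%s\n' 5 NULL 0 > "$LAG_SAMPLES"

# get_lag: print the next lag reading (seconds, or NULL if the slave is stopped)
# and remove it from the sample file.
get_lag() {
    head -n 1 "$LAG_SAMPLES"
    tail -n +2 "$LAG_SAMPLES" > "$LAG_SAMPLES.tmp" && mv "$LAG_SAMPLES.tmp" "$LAG_SAMPLES"
}

# wait_for_slave MAX_LAG CHECK_INTERVAL
#   Returns once the lag is a number no greater than MAX_LAG.
#   PAUSES counts how many times the loop had to sleep.
wait_for_slave() {
    max_lag=$1
    check_interval=$2
    PAUSES=0
    while :; do
        lag=$(get_lag)
        case "$lag" in
            ''|*[!0-9]*) ;;                            # NULL or unreadable: not caught up
            *) [ "$lag" -le "$max_lag" ] && return 0 ;;
        esac
        PAUSES=$((PAUSES + 1))
        sleep "$check_interval"
    done
}

wait_for_slave 1 0   # max lag 1 second; interval 0 so the demo does not wait
echo "slave caught up after $PAUSES pause(s)"
```

In the sample data the first reading is too high and the second is NULL, so the loop pauses twice before the third reading lets archiving continue.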

Some useful options for pt-archiver:

--for-update / --share-lock : Adds the FOR UPDATE / LOCK IN SHARE MODE modifier to SELECT statements.

--no-delete : Do not delete archived rows.

--plugin : Perl module name to use as a generic plugin.

--progress : Print progress information every X rows.

--statistics : Collect and print timing statistics.

--where : WHERE clause to limit which rows to archive (required).

nilnandan@nil:~$ pt-archiver --source h=localhost,D=nil,t=test,S=/tmp/mysql_sandbox29783.sock --file '/home/nilnandan/%Y-%m-%d-tabname' --where "name='nilnandan'" --limit=50000 --progress=50000 --txn-size=50000 --statistics --bulk-delete --max-lag=1 --check-interval=15 --check-slave-lag h=localhost,S=/tmp/mysql_sandbox29784.sock
TIME ELAPSED COUNT
2013-08-08T10:08:39 0 0
2013-08-08T10:09:25 46 50000
2013-08-08T10:10:32 113 100000
2013-08-08T10:11:41 182 148576
Started at 2013-08-08T10:08:39, ended at 2013-08-08T10:11:59
Source: D=nil,S=/tmp/mysql_sandbox29783.sock,h=localhost,t=test
SELECT 148576
INSERT 0
DELETE 148576
Action Count Time Pct
print_file 148576 18.2674 9.12
bulk_deleting 3 8.9535 4.47
select 4 2.9204 1.46
commit 3 0.0005 0.00
other 0 170.0719 84.95
nilnandan@nil:~$

Percona Toolkit’s pt-archiver works with Percona XtraDB Cluster (PXC) 5.5.28-23.7 and newer, but there are three limitations you should consider before archiving on a cluster. You can get more information here.

pt-archiver is extensible via a plugin mechanism. You can inject your own code to add advanced archiving logic that could be useful for archiving dependent data, applying complex business rules, or building a data warehouse during the archiving process. Follow this URL for more info on that.

Bugs related to pt-archiver: https://bugs.launchpad.net/percona-toolkit/+bugs?field.tag=pt-archiver

More details about pt-archiver: https://docs.percona.com/percona-toolkit/pt-archiver.html

17 Comments

Rachel

10 years ago

When I execute the following command, I always get the following error:
C:\Program Files\MySQL\MySQL Server 5.5\bin>pt-archiver --source h=localhost,D=ami,t=table --user root --password root --file 'c:\test.txt' --where "id='915'" --no-check-charset --commit-each --limit 1
Cannot open 'c:\test.txt': Invalid argument

I would appreciate your comments and suggestions ASAP.

Nilnandan Joshi

Author

10 years ago

Hi Rachel,

Percona Toolkit is not fully supported on Windows, though some utilities work fine. Here it looks like pt-archiver can’t understand the --file path (it seems to be a backslash versus forward slash problem). Can you try running the same command with --file 'C:/test.txt' or --file 'C:\\test.txt'? For example:

>pt-archiver --source h=localhost,D=ami,t=table --user root --password root --file 'c:/test.txt' --where "id=915" --no-check-charset --commit-each --limit 1

Irvin

9 years ago

I am looking to use pt-archiver to archive several fast growing tables. I want to put the script in a cron job but don’t necessarily want to have the password in the script.

Will pt-archiver ever use --login-path=xxx? This seems a much more secure method of running scripts in batch mode.

CEPE

9 years ago

Hi, in your last example, the limit condition (--limit) seems to be useless, doesn’t it?

Rajeev Rai

9 years ago

I have the same problem with pt-archiver as Irvin has.

I don’t want to provide username/password details on the command line. Can this tool pick them up from a source file?

Nilnandan Joshi

Author

9 years ago

Hi Irvin/Rajeev,

You can create a .my.cnf file in the home directory of the user who runs pt-archiver, so you don’t need to give the user and password on the command line. It will take the user credentials from there, e.g.

nilnandan@desktop:~$ cat .my.cnf
[client]
user = root
password=root
nilnandan@desktop:~$

Please check and let me know if it works or not.

Prateek

8 years ago

Hi there, I am new to this tool and want to archive tables created in MySQL. Can you please help me start from the very beginning, including which commands to write? I have read the documentation but could not understand how to start my work. Please help me.
Thanks

Nilnandan Joshi

Author

8 years ago

Hi Prateek,

It’s very easy. You can select the records with the --where option and archive them. I would suggest reading this documentation, where you’ll find all the steps you need.
https://www.percona.com/doc/percona-toolkit/2.2/pt-archiver.html

Nani

8 years ago

Hi Nilandan,

Suppose I’m going to archive old data from a parent table to another table.

How is the application going to fetch the old data if required? Will there be any reference stored somewhere that the data is in another table?

Nani

8 years ago

Also,

One more question: does pt-archiver archive data based on date?

That is, all the data created more than 6 months ago should be archived. Is there any option like that?

Nani

8 years ago

Nilandan,

Also, a few more questions that may not be relevant to this post. I have some questions on indexes.

1) If I take a mysqldump of all databases and restore it on a new server, do the indexes get created automatically on the new server?
2) Is it necessary to periodically drop and recreate indexes for better performance?
3) Can we drop and create indexes on a slave in a master-slave setup, or does this break replication?

Gopal

7 years ago

Hi Nil, I am facing a problem with login when I specify --file.
{code:sql}
gopal@D252:~/Work/percona-toolkit-2.2.17/bin$ perl pt-archiver --source h=sedodb2-analysis.i.sedorz.net -usdbrw -p=$sdbrw_pw,D=temp,t=xyz --dest h=sedodbdevha1.i.sedorz.net -usdbrw -p$sdbrw_pw,D=temp,t=xyz --where "1=1" --progress=1
TIME ELAPSED COUNT
2016-05-19T10:32:51 0 0
2016-05-19T10:32:51 0 1
2016-05-19T10:32:51 0 2
2016-05-19T10:32:51 0 3
2016-05-19T10:32:51 0 4
2016-05-19T10:32:51 0 4

gopal@D252:~/Work/percona-toolkit-2.2.17/bin$ perl pt-archiver --source h=sedodb2-analysis.i.sedorz.net -usdbrw -p=$sdbrw_pw,D=temp,t=xyz --file xyz --where "1=1"
DBI connect('temp;host=sedodb2-analysis.i.sedorz.net;mysql_read_default_group=client','sdbrw',...) failed: Access denied for user 'sdbrw'@'172.29.14.249' (using password: YES) at pt-archiver line 2492.
{code}

I am using the same credentials in the first and second statements.

Gopal

Reply to Gopal

7 years ago

Hi Nil, You can ignore last comment, I found the Problem.

shruti kapoor

Reply to Gopal

5 years ago

I am also facing a Problem with Login.

vishnu

6 years ago

Hi Nil,

Can we load infile into a table using pt-archiver?

shruti kapoor

5 years ago

I am facing a Problem with Login.
