Compression for InnoDB backup

While playing with the latest version of xtrabackup and compressing its output, I noticed that gzip is unacceptably slow for both compression and decompression. Peter actually wrote about this some time ago, but I wanted to revisit that data with some new information. In the current multi-core world a compression utility should use several CPUs to speed up the operation. My other requirement was the ability to work with stdin / stdout, so that I could script something like: innobackupex --stream | compressor | network_copy.
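To make the stdin / stdout requirement concrete, here is a minimal sketch of such a filter in Python. It is only an illustration: the function name stream_compress and the chunk size are my own choices, and Python's zlib runs on a single thread, so this shows the streaming interface, not the speed of the tools tested below.

```python
import zlib

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so memory use stays flat


def stream_compress(src, dst, level=1, chunk_size=CHUNK_SIZE):
    """Copy src to dst as a gzip stream, one chunk at a time.

    src and dst are binary file-like objects (for example
    sys.stdin.buffer and sys.stdout.buffer).
    Returns (bytes_read, bytes_written).
    """
    # wbits=31 asks zlib to emit a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    bytes_in = bytes_out = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        bytes_in += len(chunk)
        out = compressor.compress(chunk)
        dst.write(out)
        bytes_out += len(out)
    # Flush whatever the compressor is still buffering.
    tail = compressor.flush()
    dst.write(tail)
    bytes_out += len(tail)
    return bytes_in, bytes_out
```

Calling stream_compress(sys.stdin.buffer, sys.stdout.buffer) at the end of a script turns it into a filter that can sit in the middle of a pipeline like the one above, and the output can be read by standard gzip.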

My research gave me the following list: pigz (parallel gzip), pbzip2 (parallel bzip2), qpress (a command-line utility for QuickLZ), and I also wanted to try LZO (as the lzop 1.03 command-line tool + LZO 2 libraries). lzop does not actually support parallel operation, but it is known to have good decompression speed even with 1 thread. UPDATE 17-Mar-2009: I also added lzma results, as requested in the comments.

For the compression test I took ~12GB of InnoDB data files generated by a tpcc benchmark with 100 warehouses.

I tested 1, 2 and 4 parallel threads for the tools that support it, and different compression levels (1, 2, 3 for qpress; -1 and -5 for the other tools).

The raw results are available here: http://spreadsheets.google.com/ccc?key=pOIo5aX59b6biPZ0QTVMXHg&hl=en, and I copy the table below in case Google stops working.


tool	threads	level	compressed size, MB	compression ratio	compression time, sec	compression speed, MB/s	decompression time, sec	decompression speed, MB/s
qpress	1	1	6,058.93	0.52	109	55.59	92	65.86
	1	2	5,892.62	0.51	201	29.32	123	47.91
	1	3	5,885.01	0.51	473	12.44	84	70.06
	2	1	6,058.93	0.52	65	93.21	66	91.80
	2	2	5,892.62	0.51	110	53.57	112	52.61
	2	3	5,885.01	0.51	245	24.02	84	70.06
	4	1	6,058.93	0.52	48	126.23	66	91.80
	4	2	5,892.62	0.51	64	92.07	68	86.66
	4	3	5,885.01	0.51	130	45.27	65	90.54
pigz	1	1	4,839.97	0.42	438	11.05	129	37.52
	1	5	3,460.31	0.30	763	4.54	121	28.60
	2	1	4,839.97	0.42	213	22.72	109	44.40
	2	5	3,460.31	0.30	379	9.13	104	33.27
	4	1	4,839.97	0.42	107	45.23	112	43.21
	4	5	3,460.31	0.30	190	18.21	103	33.60
LZOP	1	1	5,831.25	0.50	184	31.69	83	70.26
	1	5	5,850.16	0.50	179	32.68	87	67.24
pbzip2	1	1	4,154.41	0.36	1594	2.61	597	6.96
	1	5	4,007.07	0.34	1702	2.35	644	6.22
	2	1	4,154.41	0.36	800	5.19	605	6.87
	2	5	4,007.07	0.34	844	4.75	648	6.18
	4	1	4,154.41	0.36	399	10.41	602	6.90
	4	5	4,007.07	0.34	421	9.52	645	6.21
LZMA	1	1	3,623.66	0.31	1454	2.49	501	7.23
	1	5	NA	NA	not done in 2h	NA	NA	NA
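The speed columns appear to be the compressed size divided by the elapsed time. That is inferred from the numbers themselves, not stated anywhere, and it assumes the sizes are in MB. A small sketch that reproduces a few rows:

```python
# (tool, threads, level, compressed MB, compress sec, decompress sec),
# copied from the table above.
ROWS = [
    ("qpress", 4, 1, 6058.93, 48, 66),
    ("pigz", 4, 1, 4839.97, 107, 112),
    ("pbzip2", 4, 1, 4154.41, 399, 602),
]


def speeds(compressed_mb, compress_sec, decompress_sec):
    """Return (compression MB/s, decompression MB/s).

    Both are relative to the compressed size, which is what the
    table's speed columns seem to use.
    """
    return (round(compressed_mb / compress_sec, 2),
            round(compressed_mb / decompress_sec, 2))


for tool, threads, level, size_mb, c_sec, d_sec in ROWS:
    c_speed, d_speed = speeds(size_mb, c_sec, d_sec)
    print(f"{tool} threads={threads} level={level}: "
          f"{c_speed} MB/s compress, {d_speed} MB/s decompress")
```

Because the size in the numerator is the compressed one, a tool that compresses better gets a lower MB/s figure for the same elapsed time, so the time columns are the safer ones to compare across tools.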

To summarize the results:

pbzip2 shows good compression, but processing is too slow. Interestingly, at level 5 its compression is worse than pigz at level 5.
pigz compresses well and is faster than pbzip2, but it is still not that fast. Its multi-threaded mode may be OK, though, especially if you need to stay compatible, e.g. to copy the result to boxes where only standard gzip is available.
qpress is not as good in compression ratio, but its speed is impressive, and we may ship xtrabackup with this compression.
LZO is even faster at decompression than qpress, but I would like to see a parallel version. There is a patch for that, but it did not apply cleanly to lzop 1.02, so I skipped it.
In my opinion, in all cases level 1 offers a better tradeoff between archive size and compression/decompression time.

There is no obvious winner; it depends on what matters more to you, size or time, but with this data we can make a decision.
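One rough way to weigh size against time is to estimate how long it takes to get the backup off the machine: compression time plus compressed size divided by network speed. The sketch below does that for the 4-thread, level 1 rows of the table. The link speeds are hypothetical, and adding the two phases assumes you compress first and copy afterwards; in a streamed pipeline they overlap, so the real total would be closer to the slower of the two.

```python
# (tool, compressed MB, compression sec) at 4 threads, level 1, from the table.
CANDIDATES = [
    ("qpress", 6058.93, 48),
    ("pigz", 4839.97, 107),
    ("pbzip2", 4154.41, 399),
]


def total_seconds(compressed_mb, compress_sec, network_mb_s):
    """Compress first, then copy: the two phases are simply added."""
    return compress_sec + compressed_mb / network_mb_s


def fastest_tool(network_mb_s, candidates=CANDIDATES):
    """Return the name of the tool with the smallest estimated total time."""
    best = min(candidates,
               key=lambda c: total_seconds(c[1], c[2], network_mb_s))
    return best[0]


for link_speed in (100, 10, 1):  # hypothetical link speeds in MB/s
    print(link_speed, "MB/s ->", fastest_tool(link_speed))
```

With these numbers the estimate favors qpress on a 100 MB/s link, pigz at 10 MB/s and pbzip2 at 1 MB/s, which is another way of saying that the slower the network, the more a better compression ratio is worth.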

18 Comments

Morgan Tocker

15 years ago

The interesting thing is that decompression doesn’t seem to get the same speed boosts from added threads that compression does. I’ve always thought that decompression should be faster than compression, but in almost all of your 4 threaded tests that’s not the case.

Chip Turner

15 years ago

Very nice comparison of parallel compression choices. This is a fun kind of analysis to perform.

Ultimately the goal is probably to get the data off of the database as quickly as possible. It would be interesting to see compression_time + ultimate_size / network_speed to get the total time to actually get the data off of the machine and thereby have a completed backup. I imagine qpress 4.1 would still be optimal. Also worth factoring in is what rate xtrabackup can provide data to the algorithm; so long as the algorithm is faster than xtrabackup, you can decide strictly on space, right?

Of course, sometimes you want to optimize for network data copied (preferring higher compression ratios) or less impact to the machine you are backing up (preferring fewer parallel cores, or more throughput when running nice’d).

Are you just deciding a default/recommended algorithm for a pluggable system or will what you decide be the only option?

Geoffrey Lee

15 years ago

I think it would be interesting to include p7zip in your benchmark as well. 7-Zip has been well known for its multi-threading support. http://en.wikipedia.org/wiki/Lempel-Ziv-Markov_chain_algorithm#7-Zip_reference_implementation

Vadim Tkachenko

Author

15 years ago

Chip,

We actually offer a stream which can be compressed with any tool you want; that’s why my requirement was to accept stdin and output to stdout.

Vadim Tkachenko

Author

15 years ago

Geoffrey Lee,

I need a tool that accepts stdin and outputs to stdout, see the comment above. I was not able to pipe into p7zip, i.e. cat files.tar | p7zip > files.tar.7zip

slavik

15 years ago

Vadim, use lzma instead of p7zip
(http://tukaani.org/lzma/)

Mark

15 years ago

Vadim,

I use p7zip in some of our production machines where size matters, plus it can encrypt the file at almost no performance hit. On an 8-core machine, it can be quite fast as it can use all of the cores at once (make sure to either ‘nice’ it or use it off-peak because it can really slow things down!). It can be used to read from stdin using the “-si” option:

cat files.tar | 7z a [various options] -si files.tar.7z

I’ve found that using a compression level of 3 gives a good balance between compressed size and time spent compressing, or you can set it to 1 and it will be even faster. For a multi-gigabyte database backup (mysqldump file), it can reduce the file to approximately 70% of the size of a gzip’ed file in about the same amount of time.

Baron Schwartz

15 years ago

Hmmm, does it have almost no performance hit, or does it really slow things down? I’m confused.

Dennis Birkholz

15 years ago

I think Mark meant that 7z slows the machine down so extremely that additional encryption does not hit the performance any more 😉

I would like to see an lzma benchmark too, but I am not sure if it supports multicore processing.

slavik

15 years ago

Baron,
Mark meant that using AES encryption does not affect the time spent on compression.
This is typical for modern processors, usually because the scheduler can’t always use the full power of all cores (cache misses, I/O bottlenecks, kernel tasks and so on), so a small amount of CPU resources (but enough for encryption) is always available.

Vadim Tkachenko

Author

15 years ago

slavik,

I added results for LZMA. With compression level 5 it was not able to finish in 2h, so I stopped it.

Mark

15 years ago

That will teach me to post at 1 in the morning!

Yes, I meant that 7zip does take more resources than gzip, but adding encryption doesn’t add any *more*. As for the performance hit, the machines I use 7zip on have distinct busy and non-busy times, so I can schedule a 25-minute, 7zipped backup during a non-busy time fairly easily. I realize this isn’t necessarily normal for most servers though, for the rest of our off-line backups we use gzip. After looking at these results I’m thinking about lzo instead, especially since it’s less of a hit on a busy machine. Unfortunately, I see that version 2 doesn’t have a nice, gzip-like executable. Since we use 64-bit machines almost exclusively, and v2 promises better performance on 64-bit, does anyone have a link to a command-line archiver that can use lzo v2?

Vadim Tkachenko

Author

15 years ago

Mark,

For me, on Ubuntu 8.10 where I ran the test, lzop comes linked with the LZO v2 libraries.

So even if your distribution has lzop with LZO v1, you can probably compile it linked against v2.

Steven Roussey

15 years ago

lzjb

Reminds me of discussions in this post:

ZFS & MySQL/InnoDB Compression Update
http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/

slavik

15 years ago

Vadim,
the lzma results look too bad, can you post the system spec?
I tested compress/decompress with 7zip on Windows, on a quad-core AMD 9500 with 8GB RAM, and I get a decompression speed of 20 MB/s (too close to the speed of a regular old 160GB PATA drive) and a compression speed (fast mode) of 8 MB/s.
I will try later on a similar system under Linux and post the results.

Vadim

15 years ago

slavik,

it is Dell PowerEdge R900, 4x quadcores

vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E7320 @ 2.13GHz
stepping : 11
cpu MHz : 2127.881
cache size : 2048 KB

with 32GB of RAM

slavik

15 years ago

I found that LZMA doesn’t scale: with -1 it can use only 1 thread, and with -5 (or higher) only 2 worker threads.
“Sets multithread mode. If you have a multiprocessor or multicore system, you can get a increase with this switch. 7-Zip supports multithread mode only for LZMA compression and BZip2 compression / decompression. If you specify {N}, for example mt=4, 7-Zip tries to use 4 threads. LZMA compression uses only 2 threads.” http://www.bugaco.com/7zip/MANUAL/switches/method.htm
In my tests on an AMD 9950 with 2GB of RAM: 4 MB/s compression and about 8 MB/s decompression.
I think it’s the result of terrible optimization in the Unix port.

Snarky

14 years ago

The decompression speed computation is deceiving. You should divide by the size of the uncompressed data not the compressed one because the better the compression the worse the decompression speed will look when it is not the case.
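The point can be sketched with the pigz single-thread rows. The exact uncompressed size is not given in the post, so the 11,650 MB used here is an assumption, back-computed from the size and ratio columns of the table.

```python
# Assumption: the uncompressed size is not stated exactly in the post;
# about 11,650 MB is back-computed from the size and ratio columns.
ORIGINAL_MB = 11650.0


def speed_vs_compressed(compressed_mb, seconds):
    """Speed as listed in the table: compressed MB per second."""
    return compressed_mb / seconds


def speed_vs_original(seconds, original_mb=ORIGINAL_MB):
    """Speed in restored (uncompressed) MB per second."""
    return original_mb / seconds


# pigz, 1 thread: (level, compressed MB, decompression sec) from the table.
for level, size_mb, sec in [(1, 4839.97, 129), (5, 3460.31, 121)]:
    print(f"level {level}: "
          f"{speed_vs_compressed(size_mb, sec):.2f} MB/s of archive, "
          f"{speed_vs_original(sec):.2f} MB/s of restored data")
```

Measured against the archive, level 5 looks slower than level 1; measured against the restored data, it is slightly faster, because it finishes the same job in 121 seconds instead of 129.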
