While playing with the latest version of xtrabackup and compressing its output, I noticed that gzip is unacceptably slow for both compression and decompression. Peter wrote about this some time ago, but I wanted to revisit that data with some new information. In the current multi-core world a compression utility should use several CPUs to speed up the operation. My other requirement was the ability to work with stdin / stdout, so that I could script something like: innobackupex --stream | compressor | network_copy.
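The stdin / stdout requirement is what makes such a chain possible. Below is a minimal Python sketch of the same idea, assuming a POSIX-like system. The function name run_pipeline and the PRODUCER / COMPRESSOR stand-in commands are hypothetical; they only imitate a backup stream and a compressor and are not the real innobackupex or any of the tools tested here.

```python
import subprocess
import sys
import zlib


def run_pipeline(commands):
    """Run commands as a pipeline and return the last command's stdout."""
    procs = []
    for cmd in commands:
        # The first stage reads nothing; later stages read the previous stdout.
        stdin = procs[-1].stdout if procs else subprocess.DEVNULL
        proc = subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE)
        if procs:
            # Close our copy so an upstream stage sees a broken pipe
            # if the downstream stage exits early.
            procs[-1].stdout.close()
        procs.append(proc)
    output, _ = procs[-1].communicate()
    exit_codes = [p.wait() for p in procs]
    if any(exit_codes):
        raise RuntimeError(f"pipeline stage failed, exit codes: {exit_codes}")
    return output


# Hypothetical stand-ins: a "backup" that writes to stdout and a
# "compressor" that reads stdin and writes compressed data to stdout.
PRODUCER = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(b'innodb-page ' * 10000)"]
COMPRESSOR = [sys.executable, "-c",
              "import sys, zlib; "
              "sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read(), 1))"]

if __name__ == "__main__":
    compressed = run_pipeline([PRODUCER, COMPRESSOR])
    print(len(compressed), "compressed bytes")
```

In a real setup each list would hold the actual command line of the backup tool, the compressor and the network copy step; any tool that cannot read stdin and write stdout would not fit into this chain.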

My research produced the following list: pigz (parallel gzip), pbzip2 (parallel bzip2) and qpress (a command-line utility for QuickLZ). I also wanted to try LZO (as the lzop 1.03 command-line tool plus the LZO 2 libraries). lzop does not support parallel operation, but it is known to have good decompression speed even with one thread. UPDATE 17-Mar-2009: I also added lzma results, as requested in the comments.


For the compression test I took ~12GB of InnoDB data files generated by the tpcc benchmark with 100 warehouses.

I tested 1, 2 and 4 parallel threads for the tools that support them, and different compression levels (1, 2 and 3 for qpress; -1 and -5 for the other tools).
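A test matrix like this (threads × levels) can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's built-in zlib as a stand-in compressor, splits the input into independently compressed blocks so that several threads can work at once, and uses a hypothetical block size (CHUNK) and hypothetical function names. It is not the procedure used to produce the numbers in this post.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # hypothetical block size: 1 MiB per independent block


def compress_parallel(data, level, threads):
    """Compress fixed-size blocks of data in a thread pool."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map keeps the blocks in their original order.
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))


def decompress_blocks(blocks):
    """Reverse compress_parallel: decompress each block and join them."""
    return b"".join(zlib.decompress(b) for b in blocks)


def benchmark(data, levels=(1, 5), thread_counts=(1, 2, 4)):
    """Measure compression time and ratio for every threads/level pair."""
    results = []
    for threads in thread_counts:
        for level in levels:
            start = time.perf_counter()
            blocks = compress_parallel(data, level, threads)
            elapsed = time.perf_counter() - start
            size = sum(len(b) for b in blocks)
            results.append({
                "threads": threads,
                "level": level,
                "ratio": size / len(data),
                "seconds": elapsed,
            })
    return results


if __name__ == "__main__":
    sample = b"innodb page " * 200000  # synthetic, highly compressible data
    for row in benchmark(sample):
        print(row)
```

Synthetic data like the sample above compresses far better than real InnoDB files, so only the shape of the loop carries over, not the numbers.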

The raw results are available here: http://spreadsheets.google.com/ccc?key=pOIo5aX59b6biPZ0QTVMXHg&hl=en. I have also copied the table below in case Google stops working.

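| tool | threads | level | compressed size, MB | compress ratio | compression time, sec | compr speed, MB/s | decomp time, sec | decomp speed, MB/s |
|---|---|---|---|---|---|---|---|---|
| qpress | 1 | 1 | 6,058.93 | 0.52 | 109 | 55.59 | 92 | 65.86 |
| qpress | 1 | 2 | 5,892.62 | 0.51 | 201 | 29.32 | 123 | 47.91 |
| qpress | 1 | 3 | 5,885.01 | 0.51 | 473 | 12.44 | 84 | 70.06 |
| qpress | 2 | 1 | 6,058.93 | 0.52 | 65 | 93.21 | 66 | 91.80 |
| qpress | 2 | 2 | 5,892.62 | 0.51 | 110 | 53.57 | 112 | 52.61 |
| qpress | 2 | 3 | 5,885.01 | 0.51 | 245 | 24.02 | 84 | 70.06 |
| qpress | 4 | 1 | 6,058.93 | 0.52 | 48 | 126.23 | 66 | 91.80 |
| qpress | 4 | 2 | 5,892.62 | 0.51 | 64 | 92.07 | 68 | 86.66 |
| qpress | 4 | 3 | 5,885.01 | 0.51 | 130 | 45.27 | 65 | 90.54 |
| pigz | 1 | 1 | 4,839.97 | 0.42 | 438 | 11.05 | 129 | 37.52 |
| pigz | 1 | 5 | 3,460.31 | 0.30 | 763 | 4.54 | 121 | 28.60 |
| pigz | 2 | 1 | 4,839.97 | 0.42 | 213 | 22.72 | 109 | 44.40 |
| pigz | 2 | 5 | 3,460.31 | 0.30 | 379 | 9.13 | 104 | 33.27 |
| pigz | 4 | 1 | 4,839.97 | 0.42 | 107 | 45.23 | 112 | 43.21 |
| pigz | 4 | 5 | 3,460.31 | 0.30 | 190 | 18.21 | 103 | 33.60 |
| LZOP | 1 | 1 | 5,831.25 | 0.50 | 184 | 31.69 | 83 | 70.26 |
| LZOP | 1 | 5 | 5,850.16 | 0.50 | 179 | 32.68 | 87 | 67.24 |
| pbzip2 | 1 | 1 | 4,154.41 | 0.36 | 1594 | 2.61 | 597 | 6.96 |
| pbzip2 | 1 | 5 | 4,007.07 | 0.34 | 1702 | 2.35 | 644 | 6.22 |
| pbzip2 | 2 | 1 | 4,154.41 | 0.36 | 800 | 5.19 | 605 | 6.87 |
| pbzip2 | 2 | 5 | 4,007.07 | 0.34 | 844 | 4.75 | 648 | 6.18 |
| pbzip2 | 4 | 1 | 4,154.41 | 0.36 | 399 | 10.41 | 602 | 6.90 |
| pbzip2 | 4 | 5 | 4,007.07 | 0.34 | 421 | 9.52 | 645 | 6.21 |
| LZMA | 1 | 1 | 3,623.66 | 0.31 | 1454 | 2.49 | 501 | 7.23 |
| LZMA | 1 | 5 | NA | NA | not done in 2h | NA | NA | NA |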

To summarize the results:

  • pbzip2 obviously shows good compression, but its processing speed is too slow. Interestingly, at level 5 its compression is worse than that of pigz at level 5
  • pigz compresses well and is faster than pbzip2, but it is still not that fast; with multi-threaded processing it may be acceptable, especially if you need to keep compatibility, e.g. to copy the result to boxes where only standard gzip is available
  • qpress is not so good in compression ratio, but its speed is impressive, and maybe we will ship xtrabackup with this compression
  • LZO is even faster at decompression than qpress, but I would like to see a parallel version. There is a patch for it, but it did not apply cleanly to lzop 1.02, so I skipped it
  • In my opinion, in all cases compression level 1 shows a better tradeoff between archive size and compression/decompression time

There is no obvious winner; it depends on what is more important to you, size or time, but with this data we can make a decision.

18 Comments
Morgan Tocker

The interesting thing is that decompression doesn’t seem to get the same speed boosts from added threads that compression does. I’ve always thought that decompression should be faster than compression, but in almost all of your 4 threaded tests that’s not the case.

Chip Turner

Very nice comparison of parallel compression choices. This is a fun kind of analysis to perform.

Ultimately the goal is probably to get the data off of the database as quickly as possible. It would be interesting to see compression_time + ultimate_size / network_speed to get the total time to actually get the data off of the machine and thereby have a completed backup. I imagine qpress 4.1 would still be optimal. Also worth factoring in is what rate xtrabackup can provide data to the algorithm; so long as the algorithm is faster than xtrabackup, you can decide strictly on space, right?
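The total-time estimate suggested above (compression_time + ultimate_size / network_speed) can be tried directly on the 4-thread rows of the table. This is a rough sketch: the network speeds used below (100 MB/s and 10 MB/s) are assumptions, not measurements from the post, the function names are hypothetical, and the formula treats compression and transfer as sequential, whereas a streamed backup would overlap them.

```python
def total_backup_seconds(compress_sec, compressed_mb, network_mb_per_sec):
    """compression_time + ultimate_size / network_speed, in seconds."""
    return compress_sec + compressed_mb / network_mb_per_sec


# 4-thread rows from the table: (label, compression time in sec, size in MB)
RESULTS_4_THREADS = [
    ("qpress level 1", 48, 6058.93),
    ("qpress level 2", 64, 5892.62),
    ("qpress level 3", 130, 5885.01),
    ("pigz level 1", 107, 4839.97),
    ("pigz level 5", 190, 3460.31),
    ("pbzip2 level 1", 399, 4154.41),
    ("pbzip2 level 5", 421, 4007.07),
]


def rank(network_mb_per_sec):
    """Sort the 4-thread results by estimated total time, fastest first."""
    return sorted(
        RESULTS_4_THREADS,
        key=lambda r: total_backup_seconds(r[1], r[2], network_mb_per_sec),
    )


if __name__ == "__main__":
    for speed in (100, 10):  # assumed network speeds in MB/s
        label, seconds, size = rank(speed)[0]
        total = total_backup_seconds(seconds, size, speed)
        print(f"{speed} MB/s network: {label} ({total:.0f} sec)")
```

Under these assumptions qpress level 1 comes out first at 100 MB/s, while pigz level 5 comes out first at 10 MB/s, so the ranking depends heavily on the network speed you plug in.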

Of course, sometimes you want to optimize for network data copied (preferring higher compression ratios) or less impact to the machine you are backing up (preferring fewer parallel cores, or more throughput when running nice’d).

Are you just deciding a default/recommended algorithm for a pluggable system or will what you decide be the only option?

Geoffrey Lee

I think it would be interesting to include p7zip in your benchmark as well. 7-Zip has been well known for its multi-threading support. http://en.wikipedia.org/wiki/Lempel-Ziv-Markov_chain_algorithm#7-Zip_reference_implementation

slavik

Vadim, use lzma instead of p7zip
(http://tukaani.org/lzma/)

Mark

Vadim,

I use p7zip in some of our production machines where size matters, plus it can encrypt the file at almost no performance hit. On an 8-core machine, it can be quite fast as it can use all of the cores at once (make sure to either ‘nice’ it or use it off-peak because it can really slow things down!). It can be used to read from stdin using the “-si” option:

cat files.tar | 7z a [various options] -si files.tar.7z

I’ve found that using a compression level of 3 gives a good balance between compressed size and time spent compressing, or you can set it to 1 and it will be even faster. For a multi-gigabyte database backup (mysqldump file), it can reduce the file to approximately 70% of the size of a gzip’ed one in about the same amount of time.

Baron Schwartz

Hmmm, does it have almost no performance hit, or does it really slow things down? I’m confused.

Dennis Birkholz

I think Mark meant that 7z slows the machine down so extremely that additional encryption does not hit the performance any more 😉

I would like to see an lzma benchmark too, but I am not sure if it supports multicore processing.

slavik

Baron,
Mark meant that using AES encryption does not affect the time spent on compression.
This is typical for modern processors, usually because the scheduler can’t always use the full power of all cores (cache misses, IO bottlenecks, kernel tasks and so on), so a small amount of CPU resources (but enough for encryption) is always available.

Mark

That will teach me to post at 1 in the morning!

Yes, I meant that 7zip does take more resources than gzip, but adding encryption doesn’t add any *more*. As for the performance hit, the machines I use 7zip on have distinct busy and non-busy times, so I can schedule a 25-minute, 7zipped backup during a non-busy time fairly easily. I realize this isn’t necessarily normal for most servers, though; for the rest of our off-line backups we use gzip. After looking at these results I’m thinking about lzo instead, especially since it’s less of a hit on a busy machine. Unfortunately, I see that version 2 doesn’t have a nice, gzip-like executable. Since we use 64-bit machines almost exclusively, and v2 promises better performance on 64-bit, does anyone have a link to a command-line archiver that can use lzo v2?

Steven Roussey

lzjb

Reminds me of discussions in this post:

ZFS & MySQL/InnoDB Compression Update
http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/

slavik

Vadim,
the lzma results look too ugly; can you post the system spec?
I tested compression/decompression with 7-Zip on Windows, on a quad-core AMD 9500 with 8GB of RAM, and got a decompression speed of 20 MB/s (very close to the speed of a regular old 160GB PATA drive) and a compression speed (fast mode) of 8 MB/s.
I will try later on a similar system under Linux and post the results.

Vadim

slavik,

it is Dell PowerEdge R900, 4x quadcores

vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E7320 @ 2.13GHz
stepping : 11
cpu MHz : 2127.881
cache size : 2048 KB

with 32GB of RAM

slavik

I found that LZMA can’t scale: with -1 it can use only 1 thread, and with -5 (or higher) only 2 worker threads.
“Sets multithread mode. If you have a multiprocessor or multicore system, you can get a increase with this switch. 7-Zip supports multithread mode only for LZMA compression and BZip2 compression / decompression. If you specify {N}, for example mt=4, 7-Zip tries to use 4 threads. LZMA compression uses only 2 threads.” http://www.bugaco.com/7zip/MANUAL/switches/method.htm
In my tests on an AMD 9950 with 2GB of RAM: 4 MB/s compression, and about 8 MB/s decompression.
I think it’s the result of terrible optimization in the Unix port

Snarky

The decompression speed computation is deceiving. You should divide by the size of the uncompressed data, not the compressed one; otherwise, the better the compression, the worse the decompression speed will look, even when that is not the case.
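The point can be checked against the table, whose speed columns match compressed size divided by time (for example 6,058.93 / 92 ≈ 65.86). The sketch below compares that figure with a speed based on the uncompressed size for the two single-thread pigz rows. The uncompressed size of 11,600 MB is an assumption: the post only says "~12GB", and this value is roughly what the size and ratio columns imply. The function names are hypothetical.

```python
# Assumption: the post only says "~12GB"; 11,600 MB is roughly what the
# compressed size and ratio columns imply.
UNCOMPRESSED_MB = 11_600


def speed_by_compressed(compressed_mb, seconds):
    """Speed as reported in the table: compressed MB per second."""
    return compressed_mb / seconds


def speed_by_uncompressed(seconds, uncompressed_mb=UNCOMPRESSED_MB):
    """Speed relative to the restored data: uncompressed MB per second."""
    return uncompressed_mb / seconds


# Single-thread pigz rows: (label, compressed size in MB, decomp time in sec)
PIGZ_ROWS = [
    ("pigz level 1", 4839.97, 129),
    ("pigz level 5", 3460.31, 121),
]

if __name__ == "__main__":
    for label, size_mb, seconds in PIGZ_ROWS:
        print(
            f"{label}: {speed_by_compressed(size_mb, seconds):.2f} MB/s "
            f"(compressed), {speed_by_uncompressed(seconds):.2f} MB/s "
            f"(uncompressed)"
        )
```

Level 5 finishes decompression sooner (121 vs 129 seconds) yet shows the lower speed in the table, simply because its archive is smaller; measured against the uncompressed size, the order flips.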