While playing with the latest version of xtrabackup and compressing its output, I noticed that gzip is unacceptably slow for both compression and decompression. Peter actually wrote about this some time ago, but I wanted to revisit that data with some new information. In the current multi-core world, a compression utility should use several CPUs to speed up the operation. My other requirement was the ability to work with stdin / stdout, so that I could script something like: innobackupex --stream | compressor | network_copy.
My research produced the following list: pigz (parallel gzip), pbzip2 (parallel bzip2), qpress (a command-line utility for QuickLZ), and I also wanted to try LZO (as the lzop 1.03 command-line tool + LZO 2 libraries). lzop does not support parallel operation, but it is known to have good decompression speed even with 1 thread. UPDATE 17-Mar-2009: I also added lzma results, as requested in the comments.
For the compression test I took ~12GB of InnoDB data files generated by the tpcc benchmark with 100 warehouses.
I tested 1, 2, and 4 parallel threads for the tools that support it, and different compression levels (1, 2, 3 for qpress; -1 and -5 for the other tools).
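For readers who want to try a similar comparison on their own data, here is a minimal sketch of the measurement loop. It is not the harness used for the numbers in this post: it uses Python's single-threaded standard-library codecs (zlib, bz2, lzma) as stand-ins for the command-line tools, and the sample data is a synthetic assumption, so absolute numbers will differ from real InnoDB files.

```python
import bz2
import lzma
import time
import zlib

# Stand-in codecs from the Python standard library (an assumption for
# illustration; these are single-threaded and are NOT pigz/pbzip2/qpress/lzop).
CODECS = {
    "zlib": (lambda data, level: zlib.compress(data, level), zlib.decompress),
    "bz2": (lambda data, level: bz2.compress(data, level), bz2.decompress),
    "lzma": (lambda data, level: lzma.compress(data, preset=level), lzma.decompress),
}


def bench(name, data, level):
    """Time one compress/decompress round trip and report the size ratio."""
    compress, decompress = CODECS[name]
    t0 = time.perf_counter()
    packed = compress(data, level)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    if restored != data:
        raise RuntimeError(f"{name} round trip failed")
    return {
        "tool": name,
        "level": level,
        "ratio": len(packed) / len(data),  # compressed size / original size
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


def run(data, levels=(1, 5)):
    """Benchmark every stand-in codec at each requested level."""
    return [bench(name, data, level) for name in CODECS for level in levels]


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; replace with a slice of real data.
    sample = b"warehouse,district,customer,balance\n" * 50_000
    for row in run(sample):
        print(f"{row['tool']:5} -{row['level']}  ratio={row['ratio']:.4f}  "
              f"compress={row['compress_s']:.3f}s  decompress={row['decompress_s']:.3f}s")
```

Timing compression and decompression separately matters here because, as the results show, a tool can be fast in one direction and slow in the other.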
The raw results are available here http://spreadsheets.google.com/ccc?key=pOIo5aX59b6biPZ0QTVMXHg&hl=en, and I copy the table in place in case Google stops working.
To summarize the results:
- pbzip2 obviously shows good compression, but its processing speed is too slow. Interestingly, at Level 5 its compression is worse than that of pigz at Level 5
- pigz compresses well and is faster than pbzip2, but it is still not that fast; however, multi-threaded processing may be OK, especially if you need to keep compatibility, e.g. to copy the result to boxes where only standard gzip is available
- qpress is not so good in compression ratio, but its speed is impressive, and maybe we will ship xtrabackup with this compression
- LZO is even faster in decompression than qpress, but I would like to see a parallel version. There is a patch for it, but it did not apply cleanly to lzop 1.02, so I skipped it
- In my opinion, in all cases Level 1 compression shows a better tradeoff between archive size and compression/decompression time
There is no obvious winner; it depends on what is more important for you, size or time, but with this data we can make a decision.
The interesting thing is that decompression doesn’t seem to get the same speed boosts from added threads that compression does. I’ve always thought that decompression should be faster than compression, but in almost all of your 4 threaded tests that’s not the case.
Very nice comparison of parallel compression choices. This is a fun kind of analysis to perform.
Ultimately the goal is probably to get the data off of the database as quickly as possible. It would be interesting to see compression_time + ultimate_size / network_speed to get the total time to actually get the data off of the machine and thereby have a completed backup. I imagine qpress 4.1 would still be optimal. Also worth factoring in is what rate xtrabackup can provide data to the algorithm; so long as the algorithm is faster than xtrabackup, you can decide strictly on space, right?
Of course, sometimes you want to optimize for network data copied (preferring higher compression ratios) or less impact to the machine you are backing up (preferring fewer parallel cores, or more throughput when running nice’d).
Are you just deciding a default/recommended algorithm for a pluggable system or will what you decide be the only option?
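The estimate suggested above (compression_time + ultimate_size / network_speed) can be sketched as a small calculation. All figures below are hypothetical and chosen only to show how the preferred tool can change with network speed; they are not taken from the benchmark. The sum also assumes compression and transfer happen one after the other, which is a simplification for a streamed backup.

```python
def total_backup_seconds(compression_s, compressed_bytes, network_bytes_per_s):
    """Estimate time to compress and then ship a backup off the machine."""
    if network_bytes_per_s <= 0:
        raise ValueError("network speed must be positive")
    return compression_s + compressed_bytes / network_bytes_per_s


def best_option(options, network_bytes_per_s):
    """Return the name of the option with the lowest estimated total time.

    `options` maps a label to (compression_seconds, compressed_bytes).
    """
    return min(
        options,
        key=lambda name: total_backup_seconds(*options[name], network_bytes_per_s),
    )


if __name__ == "__main__":
    # Hypothetical candidates: a fast, weak compressor and a slow, strong one.
    options = {
        "fast": (100.0, 6e9),   # 100 s to compress, 6 GB result
        "small": (600.0, 3e9),  # 600 s to compress, 3 GB result
    }
    for speed in (100e6, 5e6):  # bytes per second on the wire
        print(f"{speed / 1e6:.0f} MB/s network -> {best_option(options, speed)}")
```

With these made-up inputs the fast compressor wins on the quick link and the stronger one wins on the slow link, which is the tradeoff described in the comment.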
I think it would be interesting to include p7zip in your benchmark as well. 7-Zip has been well known for its multi-threading support. http://en.wikipedia.org/wiki/Lempel-Ziv-Markov_chain_algorithm#7-Zip_reference_implementation
Chip,
We actually provide a stream that can be compressed with any tool you want; that’s why my requirement was to accept stdin and output to stdout.
Geoffrey Lee,
I need a tool that accepts stdin and outputs to stdout; see the comment above. I was not able to redirect a pipe to p7zip, i.e. cat files.tar | p7zip > files.tar.7zip
Vadim, use lzma instead of p7zip
(http://tukaani.org/lzma/)
Vadim,
I use p7zip in some of our production machines where size matters, plus it can encrypt the file at almost no performance hit. On an 8-core machine, it can be quite fast as it can use all of the cores at once (make sure to either ‘nice’ it or use it off-peak because it can really slow things down!). It can be used to read from stdin using the “-si” option:
cat files.tar | 7z a [various options] -si files.tar.7z
I’ve found that using a compression level of 3 gives a good balance between compressed size and time spent compressing, or you can set it to 1 and it will be even faster. For a multi-gigabyte database backup (mysqldump file), it can reduce the file to approximately 70% of the size of a gzip’ed file in about the same amount of time.
Hmmm, does it have almost no performance hit, or does it really slow things down? I’m confused.
I think Mark meant that 7z slows the machine down so extremely that additional encryption does not hit the performance any more 😉
I would like to see an lzma benchmark too, but I am not sure if it supports multicore processing.
Baron,
Mark meant that using AES encryption does not affect the time spent on compression.
This is typical for modern processors, usually because the scheduler can’t always use the full power of all cores (cache misses, I/O bottlenecks, kernel tasks and so on), so a small amount of CPU resources (but enough for encryption) is always available.
slavik,
I added results for LZMA. With compression level 5 it was not able to finish in 2h, so I stopped it.
That will teach me to post at 1 in the morning!
Yes, I meant that 7zip does take more resources than gzip, but adding encryption doesn’t add any *more*. As for the performance hit, the machines I use 7zip on have distinct busy and non-busy times, so I can schedule a 25-minute, 7zipped backup during a non-busy time fairly easily. I realize this isn’t necessarily normal for most servers, though; for the rest of our off-line backups we use gzip. After looking at these results I’m thinking about lzo instead, especially since it’s less of a hit on a busy machine. Unfortunately, I see that version 2 doesn’t have a nice, gzip-like executable. Since we use 64-bit machines almost exclusively, and v2 promises better performance on 64-bit, does anyone have a link to a command-line archiver that can use lzo v2?
Mark,
for me on Ubuntu 8.10, where I ran the test, lzop comes linked with the LZO v2 libraries.
So even if your distribution ships lzop with LZO v1, you can probably compile it linked to v2.
lzjb
Reminds me of discussions in this post:
ZFS & MySQL/InnoDB Compression Update
http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/
Vadim,
the results for lzma look too ugly; can you post the system spec?
I tested compression/decompression with 7zip on Windows, on a quad-core AMD 9500 with 8GB of RAM, and got a decompression speed of 20 MB/s (too close to the speed of a regular old 160GB PATA drive) and a compression speed (fast mode) of 8 MB/s.
I will try later on a similar system under Linux, and post results.
slavik,
it is a Dell PowerEdge R900 with 4 quad-core CPUs
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E7320 @ 2.13GHz
stepping : 11
cpu MHz : 2127.881
cache size : 2048 KB
with 32GB of RAM
I found that LZMA can’t scale: with -1 it can use only 1 thread, and with -5 (or higher) only 2 worker threads.
“Sets multithread mode. If you have a multiprocessor or multicore system, you can get a increase with this switch. 7-Zip supports multithread mode only for LZMA compression and BZip2 compression / decompression. If you specify {N}, for example mt=4, 7-Zip tries to use 4 threads. LZMA compression uses only 2 threads.” http://www.bugaco.com/7zip/MANUAL/switches/method.htm
In my tests on an AMD 9950 with 2GB of RAM: 4 MB/s compression and about 8 MB/s decompression.
I think it’s the result of terrible optimization in the Unix port.
The decompression speed computation is deceiving. You should divide by the size of the uncompressed data, not the compressed data; otherwise, the better the compression, the worse the decompression speed will look, when that is not the case.
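A short worked example of this point, using hypothetical numbers (a 12 GB input and two made-up archives that both take 120 s to decompress), not figures from the results table:

```python
def throughput_mb_per_s(num_bytes, seconds):
    """Throughput in MB/s (1 MB = 10**6 bytes) for a given byte count and time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / 1e6 / seconds


if __name__ == "__main__":
    uncompressed = 12e9  # hypothetical input size in bytes
    # Hypothetical archives: label -> (compressed_bytes, decompress_seconds)
    archives = {"weaker": (6e9, 120.0), "stronger": (3e9, 120.0)}
    for name, (compressed, seconds) in archives.items():
        by_input = throughput_mb_per_s(compressed, seconds)     # misleading metric
        by_output = throughput_mb_per_s(uncompressed, seconds)  # metric proposed above
        print(f"{name}: {by_input:.0f} MB/s of archive read, "
              f"{by_output:.0f} MB/s of data restored")
```

Both archives restore the same data in the same time, yet dividing by the compressed size makes the better-compressed one look half as fast; dividing by the uncompressed size gives the same figure for both.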