InnoDB compression is getting some traction, and I see quite contradictory opinions: some people report successful production deployments, while others say that compression in its current implementation is useless.
To get an initial impression of its performance, I decided to run some multi-table sysbench benchmarks.
I was actually preparing to do a more thorough study, but even the first results are quite discouraging.

My setup: a Dell PowerEdge R900 running Percona-Server-5.1.57-rel12.8 (which will be in a public release soon); storage is a FusionIO 320GB MLC card, which does not matter much in this CPU-bound benchmark.

First stage: load data. The multi-table sysbench scripts allow loading data in parallel, so let's load 16 tables in 16 parallel threads, 25,000,000 rows in each table. That gives about 6GB of data per table (uncompressed) and 96GB of data in total.
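A sketch of what the parallel load looks like, assuming sysbench 0.5 with the multi-table Lua scripts (the script path, user, and database name here are assumptions, not the exact command used):

    # Load 16 tables of 25,000,000 rows each, one loader thread per table.
    sysbench --test=sysbench/tests/db/parallel_prepare.lua \
      --oltp-tables-count=16 --oltp-table-size=25000000 \
      --num-threads=16 --mysql-table-engine=innodb \
      --mysql-user=root --mysql-db=sbtest run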

Results: load time for regular tables is 19,693 sec; for compressed tables, 38,278 sec.
Compressed tables are created as ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8,
with a final size of 3.1GB per table. So we get a 2x win in space in exchange for a 2x worse load time. Maybe a fair deal.
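For reference, a minimal sketch of the DDL involved; the sbtest1 table name follows the sysbench schema, and with the InnoDB plugin / XtraDB in 5.1 the Barracuda file format and per-table tablespaces are prerequisites for compressed tables:

    # my.cnf prerequisites: innodb_file_per_table=1, innodb_file_format=Barracuda
    mysql sbtest -e "ALTER TABLE sbtest1 ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8"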

Now let's run an OLTP read-only workload in 16 parallel threads, limiting the dataset to 3,000,000 rows per table, which is about 11GB of working set in total. Using 24GB of memory for the buffer pool gives us a fully in-memory, CPU-bound workload.

The command to run:
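A sketch of the kind of read-only run used, again assuming sysbench 0.5 with oltp.lua (the exact path and option values are assumptions):

    sysbench --test=sysbench/tests/db/oltp.lua \
      --oltp-tables-count=16 --oltp-table-size=3000000 \
      --oltp-read-only=on --num-threads=16 \
      --max-time=1800 --max-requests=0 \
      --report-interval=10 --mysql-user=root --mysql-db=sbtest run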

This will report results every 10 seconds.

After the initial warm-up the throughput for regular tables stabilizes at about 4,650 transactions per second. I expected some overhead for compressed tables, but not this much: the throughput with compressed tables is 30 transactions per second. This is a 150x difference.

As the workload is purely read-only and CPU-bound, let's check the CPU stats:

Regular tables:

Compressed tables:

With regular tables the CPU is 85% utilized, which is quite a decent number. With compressed tables CPU utilization is only 7%. Obviously we have some mutex serialization problem.
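CPU utilization during the run can be sampled with something like vmstat (which tool was actually used here is not stated; this is just a sketch):

    # Sample CPU usage every 10 seconds; "us"+"sy" is utilization, "id" is idle time.
    vmstat 10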

Analyzing the SEMAPHORES section of SHOW ENGINE INNODB STATUS for the workload with compressed tables, we can see:

Apparently, with compressed tables we have very strong contention on LRU_list_mutex.
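For reference, the semaphore waits can be pulled out like this while the benchmark runs (a sketch):

    # Dump roughly the SEMAPHORES section of InnoDB status to spot hot mutexes
    # such as LRU_list_mutex.
    mysql -e "SHOW ENGINE INNODB STATUS\G" | sed -n '/^SEMAPHORES/,/^TRANSACTIONS/p'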

I still need to check how compressed tables perform in an IO-bound workload (this is where they should give the main benefit), but for an in-memory load they show a significant scalability problem.

14 Comments
sampath

Hi,
I am working with MySQL 5.5, which is billed as a performance release, but I am getting degraded performance. I have changed some settings in my.cnf, such as innodb_buffer_pool_size, but it didn't help. Would you please let me know if there are any other settings that have to be changed in MySQL 5.5 to get good performance…

Josephan

Hi all,
I'm a newbie to MySQL and I have an issue with an InnoDB table; I stumbled on this site while looking for solutions to it. I want to know if it's possible to allocate space to an InnoDB table. Currently the table with the problem has 0 KB of free space left, and it has only 288 rows, but it also has an AUTO_INCREMENT field with a current value of 8345. Currently no data is being inserted into the table; it just doesn't store the data anymore. I need some help to find out what the cause is.

Thanks

Ovais Tariq

Technically speaking, compression should help in an I/O-bound workload because it allows more data to be buffered in the buffer pool, thereby reducing disk operations. And today we have multi-core machines which shouldn't find it difficult to decompress the compressed pages on demand.

Technically it looks as if compression should have a positive impact on performance, but I guess it's still too early to use it in production.

Peter Zaitsev

Interesting results from both Mark and Vadim. I think it shows there is still a lot of research to be done on compression.

Arrix Zhou

A recent benchmark by Facebook shows different results (also using 5.1):
“Not only did InnoDB cache the table in the buffer pool, it was also able to cache uncompressed copies of all data for the compressed tables and performance for compressed tables was the same as for the uncompressed table.”

http://www.facebook.com/notes/mysql-at-facebook/innodb-compression-for-read-only-workloads/10150190328375933

lanh

Great article, compression is the way to go

Vojtech Kurka

Vadim, your test verifies my findings with Innodb Compression: http://forum.percona.com/index.php/t/1774/

Our workload was not fully CPU-bound, and the interesting fact was that compression also slowed down operations on COMPACT tables.

AndyScott

Peter,

Thanks for the link.

The performance of compression is indeed disappointing. Looks like I'd be better off using a smaller page size rather than compression.

By the way have you seen this post:
http://ralpartha.blogspot.com/2011/05/effect-of-changing-innodb-page-size.html

He seems to have obtained results that are completely opposite to yours: in your test a smaller page size offers better performance, while in his test the smallest page size (4K) produced the worst performance.

Any idea what might have caused the difference?

Peter Zaitsev

Andy,

I did some tests on exactly how much data we can fit in the buffer pool for compressed vs. non-compressed data. It helps somewhat, but significantly less than I would have expected.
http://www.mysqlperformanceblog.com/2011/05/10/innodb-caching-part-2/

inaam

Vadim,

Wondering if you can try this with stock MySQL 5.5. I believe LRU_list_mutex is Percona Server specific, and it most likely protects operations on the LRU list which are protected by the buffer pool mutex in 5.5. This would mean that in 5.5 the buffer pool mutex should show up as a hot mutex. In that case, having multiple buffer pools may ease the pressure on the buffer pool mutex.
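For reference, a minimal sketch of the stock MySQL 5.5 settings inaam is referring to (the values are illustrative only):

    # my.cnf (MySQL 5.5): split the buffer pool to reduce buffer pool mutex contention
    innodb_buffer_pool_size      = 24G
    innodb_buffer_pool_instances = 8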

Having said this, it may as well be that buf_pool_zip_mutex turns out to be the culprit. This mutex is a single replacement for block::mutex for all pages that are present in the buffer pool in compressed-only form; i.e., the compressed-only pages in a buffer pool don't have separate mutexes to protect the fields in buf_page_t; instead, this protection is provided by buf_pool_zip_mutex.

Glad to see that finally you and others are getting to test compression. Hopefully, we’ll iron out any wrinkles soon.

inaam

Andy,

Yes, InnoDB can cache more data pages when compression is used. It is true that when a page is accessed we need a decompressed version of the page, which is also kept in the buffer pool alongside the compressed version. But the LRU mechanism has the capability to discard the uncompressed version of a data page to make room for another page to be read in. What this means is that there can be pages in the buffer pool in compressed-only format. In fact, at any given point in time the buffer pool can contain up to 90% compressed-only pages.

Of course, when these compressed only pages are accessed these have to be decompressed which eats up a bit of CPU. The idea is to cache more data to avoid IO at the cost of extra CPU cycles for decompressions.

AndyScott

Does compression help in turning a dataset that doesn't fit in memory into one that does?

I read that InnoDB keeps both the compressed and uncompressed page in memory. In that case using compression would actually increase memory requirements, because now there is duplicate data.

I had assumed the best use case for compression is to turn a table that's too big to fit in memory (e.g. a 30G table and a 20G buffer pool) into one that fits (e.g. compress the 30G table into 15G, which now fits in the 20G buffer pool). But does that really work, seeing how InnoDB keeps both compressed and uncompressed pages in memory?

Peter Zaitsev

Vadim,

Can you test with a non-uniform distribution too?

From what I've seen, for compression to be a reasonable fit you have to have most accesses happening in memory and touching pages which are already uncompressed.

It also looks to me like the compression code did not get much love in the MySQL 5.5 optimizations. Frankly, I do not see any published benchmarks which include compression.