Have you ever wanted to compress only certain types of columns in a table while leaving the other columns uncompressed? While working on a customer case this week, I saw an interesting problem: a 100GB table with many heavily utilized TEXT fields, where some read queries returned more than 500MB (!!) of data. Since we were not allowed to make any query or application logic changes, we chose to implement the Barracuda file format and use compressed rows, which appealed to me for this mostly-read application. One quick way to see whether your rows will benefit from compression is to read Peter Zaitsev’s blog post and execute:
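For example, something along these lines, where my_table and my_text are placeholder names standing in for your own table and TEXT column:

mysql> SELECT AVG(LENGTH(my_text)) AS avg_uncompressed FROM my_table;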

compare this to:
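Again a sketch with the same placeholder names, this time measuring the average length after running the value through MySQL's COMPRESS() function:

mysql> SELECT AVG(LENGTH(COMPRESS(my_text))) AS avg_compressed FROM my_table;

The ratio between the two averages gives a rough idea of the savings you can expect from compression.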

In our case we saw about a 75% reduction when the TEXT field was compressed, which indicated to us that table compression would be worthwhile.

With the original InnoDB Antelope file format, you have the choice of ROW_FORMAT=COMPACT and ROW_FORMAT=REDUNDANT, where InnoDB stores the first 768 bytes of variable-length columns (BLOB, VARCHAR, TEXT) in the index record, with the remainder stored in overflow pages. COMPACT became the default after MySQL 5.0.3 and has a more compact representation of NULLs and variable-length fields than REDUNDANT.

Using InnoDB’s new Barracuda file format (available since InnoDB plugin 1.1 or MySQL 5.5) you can now leverage table compression by specifying ROW_FORMAT=COMPRESSED.  In our case we only wanted MySQL to try to move the larger (greater than 16KB) TEXT fields off-page, so we used the KEY_BLOCK_SIZE=16 directive.  This means that each TEXT / BLOB field that exceeds 16KB is stored in its own overflow pages, leaving only a 20-byte pointer in the index page.  Based on our analysis, 75% of the blobs stored in the table were over 8KB, and those were responsible for 90% of the space usage, so compressing only the externally stored blobs provided substantial advantages (a sketch of such an analysis follows the manual excerpt below).  Why did we choose a KEY_BLOCK_SIZE equal to the InnoDB page size of 16KB?  As the fine MySQL manual states:

This setting may still be useful for tables with many long BLOB, VARCHAR or TEXT columns, because such values often do compress well, and might, therefore, require fewer “overflow” pages.
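To give an idea of how the blob-size analysis mentioned above might be run, a query along these lines (placeholder names again) shows what share of values exceed 8KB and what share of the column's bytes they account for:

mysql> SELECT SUM(LENGTH(my_text) > 8192) / COUNT(*) AS pct_values_over_8k,
              SUM(IF(LENGTH(my_text) > 8192, LENGTH(my_text), 0)) / SUM(LENGTH(my_text)) AS pct_bytes_over_8k
       FROM my_table;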

I did not test with a smaller KEY_BLOCK_SIZE, as we had minimal time to carry out the table compression change (given the long run-time of the ALTER TABLE); you may find that your application benefits from a different KEY_BLOCK_SIZE value.  Also note that you need to enable the dynamic variable innodb_file_format=BARRACUDA (and don’t forget to set it in my.cnf!):
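On a running MySQL 5.5 server that looks something like this:

mysql> SET GLOBAL innodb_file_format=Barracuda;

and in my.cnf, so the setting survives a restart:

[mysqld]
innodb_file_format = Barracuda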

One caveat: you must be running with innodb_file_per_table=1, as the InnoDB system tablespace cannot be compressed; see this page for further details on how to enable compression for a table.
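A quick way to confirm the setting, and to enable it at runtime on 5.5 (keeping in mind that only tables created or rebuilt afterwards get their own tablespace):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';
mysql> SET GLOBAL innodb_file_per_table=1;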

To use Barracuda-format tables you will need to either create them fresh and migrate the data, or convert existing tables with an ALTER TABLE statement. As table compression is table-specific, the ROW_FORMAT and KEY_BLOCK_SIZE directives are passed via CREATE TABLE or ALTER TABLE statements. In our case, we chose to rebuild the table using ALTER TABLE as a null-operation, like this:
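Something along these lines (my_table is a placeholder for the actual table name; the ENGINE=InnoDB clause is the null-operation that forces the rebuild):

mysql> ALTER TABLE my_table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;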

In our case, even though the customer had a 3GHz 24-core machine, the ALTER TABLE progressed slowly because it was bound to a single CPU while compressing the data. Just have patience. 🙂 Keep in mind too that if you start with a 100GB table and know your approximate compression rate, you will be left with a considerably smaller on-disk footprint, so ideally you will be able to postpone that purchase of additional disk capacity.

So what was the real-world outcome of this exercise?  We were able to show a 70% improvement in queries against this table when the TEXT fields were not part of the query, since Barracuda no longer stores a 768-byte prefix of the blob in the row, and we reduced the table down to 30GB.  Happy customer 🙂

One parting idea: you may be able to leverage pt-online-schema-change from Percona Toolkit 2.1 in order to modify the table if you cannot sustain the blocking effects of a traditional ALTER TABLE statement.
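A sketch of what that might look like (the database, table, and connection details are placeholders; run with --dry-run first to see what the tool would do):

$ pt-online-schema-change \
    --alter "ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16" \
    D=mydb,t=my_table --execute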

I hope this helps you understand a use case where table compression can be beneficial when your workload is mostly-read. Thanks for reading my first percona.com/blog blog post!

3 Comments
aftab

Why do you suggest that table compression is helpful when the workload is mostly read-only? Is it not helpful when the load is mostly writes, and why?

me

Hi aftab, sorry I didn’t reply to this sooner.

You gain a significant performance boost from compression on mostly-read workloads because less disk IO is required to retrieve InnoDB pages, which usually translates into faster query response times, since the database server spends less time blocked on disk IO.

Your mileage may vary on write-heavy workloads: if you do not have enough cores, or fast enough cores, you may simply move the bottleneck to the CPU, since pages must be compressed before they are written to disk. For this reason it is less clear-cut to recommend compression there; we recommend that you test your specific queries against your own dataset before implementing it.

Thanks for the question, aftab!