I’m running into this misconception for the second time in a week or so, so it is time to blog about it.
How are BLOBs stored in Innodb? This depends on three factors: the BLOB size, the full row size, and the Innodb row format.

But before we look into how BLOBs are really stored, let’s see what the misconception is about. A lot of people seem to think that in the standard (“Antelope”) format the first 768 bytes of a BLOB are stored in the row itself while the rest is stored in external pages, which would make such BLOBs perform really badly. I have even seen a solution that stores the data in several smaller BLOB or VARCHAR fields which are concatenated to get the real value back. This is not exactly what happens 🙂

With the COMPACT and REDUNDANT row formats (used before the Innodb plugin and named “Antelope” in the Innodb Plugin and XtraDB), Innodb will try to fit the whole row onto an Innodb page. At least two rows have to fit on each page, plus some page overhead, which makes the limit about 8000 bytes. If the row fits completely, Innodb will store it on the page and not use external BLOB storage pages. For example, a 7KB BLOB can be stored on the page. However, if the row does not fit on the page, for example because it contains two 7KB BLOBs, Innodb will have to pick some of them and store them in external BLOB pages. It will, however, keep at least 768 bytes from each such BLOB on the row page itself. With two 7KB BLOBs we will have one BLOB stored on the page completely, while the other will have 768 bytes stored on the row page and the remainder in an external page.
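
As a rough illustration of the above (table and column names are made up; sizes approximate), a single ~7KB value stays on the page under the COMPACT format, while two of them force one column off-page:

-- Hypothetical tables, assuming the default ("Antelope") COMPACT row format
CREATE TABLE t_one_blob (
  id INT NOT NULL PRIMARY KEY,
  b1 BLOB
) ENGINE=InnoDB ROW_FORMAT=COMPACT;

-- A single ~7KB value fits within the ~8000 byte per-row limit,
-- so no external BLOB pages are used
INSERT INTO t_one_blob VALUES (1, REPEAT('x', 7000));

CREATE TABLE t_two_blobs (
  id INT NOT NULL PRIMARY KEY,
  b1 BLOB,
  b2 BLOB
) ENGINE=InnoDB ROW_FORMAT=COMPACT;

-- Two ~7KB values exceed the limit: one column is moved to external BLOB pages,
-- keeping only its 768-byte prefix (plus a 20-byte pointer) on the row page
INSERT INTO t_two_blobs VALUES (1, REPEAT('x', 7000), REPEAT('y', 7000));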

Such a decision to store the first 768 bytes of the BLOB may look strange, especially as MySQL internally has no optimizations to read portions of a BLOB – it is either read completely or not at all, so the 768 bytes on the row page are of little use: if the BLOB is accessed at all, the external page will always have to be read. This decision seems to be rooted in a desire to keep the code simple while implementing initial BLOB support for Innodb – BLOBs can have a prefix index, and it was easier to implement indexing of BLOBs if their prefix is always stored on the row page.
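
For reference, this is the kind of prefix index being discussed (a hypothetical table; only the first N bytes of the column are indexed):

CREATE TABLE t_prefix (
  id INT NOT NULL PRIMARY KEY,
  doc BLOB,
  KEY idx_doc_prefix (doc(100))  -- index only the first 100 bytes of the BLOB
) ENGINE=InnoDB;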

This decision also causes strange data storage “bugs” – you can store a 200K BLOB easily, but you can’t store 20 BLOBs of 10K each in the same row. Why? Because each of them will try to keep 768 bytes on the row page itself, and together they will not fit.
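
To make the arithmetic explicit: 20 × 768 = 15,360 bytes of prefixes alone, already well over the roughly 8000 byte per-row limit. A sketch of the failing case (hypothetical table):

-- 20 BLOB columns; under the COMPACT/REDUNDANT formats each column that overflows
-- keeps a 768-byte prefix in the row, so a row with ~10K in every column needs
-- 20 * 768 = 15360 bytes on the page and the INSERT is rejected with a
-- row-size error, even though each individual value is small
CREATE TABLE t_many_blobs (
  id INT NOT NULL PRIMARY KEY,
  b1 BLOB, b2 BLOB, b3 BLOB, b4 BLOB, b5 BLOB,
  b6 BLOB, b7 BLOB, b8 BLOB, b9 BLOB, b10 BLOB,
  b11 BLOB, b12 BLOB, b13 BLOB, b14 BLOB, b15 BLOB,
  b16 BLOB, b17 BLOB, b18 BLOB, b19 BLOB, b20 BLOB
) ENGINE=InnoDB;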

Another thing to be aware of with Innodb BLOB storage is that external BLOB pages are not shared among BLOBs. Each BLOB that does not fit on the page, even if it is only one byte over, will have its own 16K page allocated. This can be pretty inefficient, so I’d recommend avoiding multiple large BLOBs per row when possible. A much better choice in many cases is to combine the data into a single large BLOB (and potentially compress it).
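
One way to follow that advice from the SQL side (the application could equally do the combining and compression itself) is something like this hypothetical sketch using MySQL’s COMPRESS()/UNCOMPRESS():

-- One combined, compressed BLOB instead of many separate BLOB columns
CREATE TABLE t_combined (
  id INT NOT NULL PRIMARY KEY,
  payload MEDIUMBLOB  -- all the pieces, serialized and compressed together
) ENGINE=InnoDB;

-- @piece1/@piece2 stand in for the individual values the application would combine
SET @piece1 = REPEAT('a', 4000), @piece2 = REPEAT('b', 4000);
INSERT INTO t_combined VALUES (1, COMPRESS(CONCAT(@piece1, @piece2)));
SELECT UNCOMPRESS(payload) FROM t_combined WHERE id = 1;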

If all the columns do not fit on the page completely, Innodb will automatically choose some of them to be stored on the page and others to be stored externally. This is not clearly documented, and it can neither be hinted nor observed. Furthermore, depending on column sizes, the choice may vary between rows. I wish Innodb had some way to tune this, allowing me to force actively read columns to be stored inline while pushing others to external storage. Maybe one day we’ll get to implementing this in XtraDB 🙂

So BLOB storage was not very efficient in the REDUNDANT (MySQL 4.1 and below) and COMPACT (MySQL 5.0 and above) formats, and the fix comes with the Innodb Plugin in the “Barracuda” file format and ROW_FORMAT=DYNAMIC. In this format Innodb stores either the whole BLOB on the row page or only a 20-byte BLOB pointer, giving preference to smaller columns for on-page storage, which is reasonable as you can store more of them. BLOBs can still have a prefix index, but this no longer requires the column prefix to be stored on the page – you can build prefix indexes on BLOBs which are often stored outside the page.
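
Assuming the Innodb Plugin (or XtraDB) is in use, a DYNAMIC table would be declared like this (hypothetical table; the two settings are the usual prerequisites for the Barracuda formats in the plugin era):

SET GLOBAL innodb_file_format = 'Barracuda';
SET GLOBAL innodb_file_per_table = 1;

-- With ROW_FORMAT=DYNAMIC a long column is either stored fully on the page
-- or stored off-page with only a 20-byte pointer left in the row
CREATE TABLE t_dynamic (
  id INT NOT NULL PRIMARY KEY,
  doc BLOB,
  KEY idx_doc_prefix (doc(100))  -- prefix index still allowed, even though the column may live off-page
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;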

The COMPRESSED row format is similar to DYNAMIC when it comes to handling BLOBs and will use the same strategy of storing BLOBs completely off-page. It will, however, always compress BLOBs which do not fit on the row page, even if KEY_BLOCK_SIZE is not specified and compression for normal data and index pages is not enabled.
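
For comparison, a COMPRESSED table looks like this (again hypothetical; KEY_BLOCK_SIZE is shown but, as noted above, off-page BLOB pages get compressed regardless):

-- COMPRESSED uses the same fully off-page strategy for long columns as DYNAMIC,
-- and the overflow pages are compressed even without an explicit KEY_BLOCK_SIZE
CREATE TABLE t_compressed (
  id INT NOT NULL PRIMARY KEY,
  doc BLOB
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;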

If you’re interested in learning more about the Innodb row formats, check out this page in the Innodb docs:

It is worth noting that I use BLOB here in a very general sense. From a storage perspective, BLOB and TEXT columns, as well as long VARCHARs, are handled the same way by Innodb. This is why the Innodb manual calls them “long columns” rather than BLOBs.

Comments
Shlomi Noach

Hi Peter,

Thank you! I fell into the same misconceptions you described.
With regard to the COMPRESSED format: I’ve noticed no (or no noticeable) reduction in optimized table size for very large tables with long text columns when using KEY_BLOCK_SIZE=16.
You say only the text that won’t fit into the row page is compressed?
How is the text compressed? (any known algorithm?) Is the compression done per-text value, or does it build a dictionary by utilizing more than one value?

Vince Gatto

Peter,

For applications that are very blob heavy, do most people seem to stick with MySQL? What are some of the popular alternatives you see when you’re out in the field?

Ivan Novick

Also note that for very large blobs, as far as I recall, Innodb allocates memory for the entire blob to be returned to the user. So imagine a blob that is 1 GB in size and 25 concurrent requests for blobs of that size: Innodb will need to allocate 25 GB of memory to satisfy these queries. See: http://mysqlinsights.blogspot.com/2009/01/mysql-blobs-and-memory-allocation.html

The people at PBXT have created a solution for their storage engine using a streaming blob protocol, but this requires a change to the MySQL client as far as I know: http://www.blobstreaming.org/

Shlomi Noach

Weird. This post does not show up on planetmysql.

Tony

I seem to be missing the point?? WHY, WHY, WHY would you save images in your DB when saving the link/URL in a simple varchar field and letting the OS do the work on the image directory is so MUCH more efficient?? BLOB, really?

paul

Hi Peter,

re: “If all the columns do not fit on the page completely, Innodb will automatically choose some of them to be stored on the page and others to be stored externally. This is not clearly documented, and it can neither be hinted nor observed.”

Last year, as a result of docs Bug#29042 (http://bugs.mysql.com/bug.php?id=29042), we did identify that the manual indeed implied some of the misconceptions to which you allude. We tried to make some improvements in the discussion. See:

http://dev.mysql.com/doc/refman/5.1/en/innodb-file-space.html:

“The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.

If a row is less than half a page long, all of it is stored locally within the page. If it exceeds half a page, variable-length columns are chosen for external off-page storage until the row fits within half a page. For a column chosen for off-page storage, InnoDB stores the first 768 bytes locally in the row, and the rest externally into overflow pages. Each such column has its own list of overflow pages. The 768-byte prefix is accompanied by a 20-byte value that stores the true length of the column and points into the overflow list where the rest of the value is stored.”

That doesn’t address the newer storage formats provided by the InnoDB Plugin, of course. Integrating Plugin information into the manual is a longer-term project still in progress.

Very interesting article, thanks for writing it.

Dani

— Why store blobs in the DB as opposed to plain files in a filesystem?

I agree with you in general, but there are applications where storing tons of files in a filesystem is bad.

I ran into problems having many thousands (millions?) of files, where I ran out of inodes in the filesystem I was using. Besides disk size, I did not want to have to worry about keeping track of inodes or passing extra options to mkfs to increase inode counts whenever deploying a new server or increasing a disk or logical volume size.
If you split things into multiple subdirectories then you also have to ensure your application(s) maintain those directories — create new, delete old when empty, etc.
Also doing an ‘ls’ on a huge directory will tie things up on your server.

The more efficient thing to do is:
A) Create one table that has ONLY an id sequence column set as the primary key, and a 2nd column for the blob. This table could conceivably be in a different database if you so choose.

B) All other metadata about this blob must be held in a separate table, and the id column used to associate rows (1-to-1 relationship).

C) Do your hefty queries (with joins, complex where clauses, etc) ONLY against the metadata table, and when you actually need the data from the blob query just that single row from your blob table.

What you achieve:
1) A single large file in the filesystem for that 1 InnoDB table that holds the blob. Queries against this table will use the primary index, and will only pull from the actual table what you need.
If disk space usage is important, use row compression and pay the CPU price on your DB server, or have your application compress/decompress upon INSERT/SELECT to pay that price on your app. server.

2) A well-designed meta-data table will contain everything you need to know about that blob, and you could design things to even contain a field that is, say, the first 50 bytes of your blob — if you need that for indexing or searching.

If you think you need multiple blob columns, then as Peter suggested, consider combining them into one blob and splitting it up on the application side. That might not be so easy either, so maybe use multiple “blob-only” tables that have just a few blob columns each — fewer than the 10 or so he said was problematic.
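
A minimal sketch of the split Dani describes (all table and column names are hypothetical):

-- Metadata table: everything you search, filter, or join on lives here
CREATE TABLE file_meta (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  mime_type VARCHAR(64) NOT NULL,
  created_at DATETIME NOT NULL,
  KEY idx_name (name)
) ENGINE=InnoDB;

-- Blob-only table: 1-to-1 with file_meta via the same id
CREATE TABLE file_data (
  id BIGINT NOT NULL PRIMARY KEY,  -- same value as file_meta.id
  data LONGBLOB NOT NULL
) ENGINE=InnoDB;

-- Heavy queries touch only file_meta; the blob itself is fetched by primary key when needed
SELECT d.data
FROM file_meta m
JOIN file_data d ON d.id = m.id
WHERE m.name = 'example.png';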

Tobias

Thanks Peter,

finally a clear statement on storing images as blobs. 🙂
I have been looking for this single-lined answer for quite a long time now…

Tobias 🙂

Mohamad

@Peter, at which point does a VARCHAR become a “long” VARCHAR and get treated the same as TEXT and BLOB? Is there a character threshold?

Rolfen

Another reason for using BLOBs vs. files is what I believe is called “referential integrity”: when you delete a parent row, the data gets deleted too (through foreign key cascading, or directly if the BLOB is stored in the same row). You don’t need to worry about keeping track of and deleting files in the filesystem.
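
For example (hypothetical tables), a cascading foreign key removes the blob row together with its parent:

CREATE TABLE parent (
  id INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE attachment (
  id INT NOT NULL PRIMARY KEY,
  parent_id INT NOT NULL,
  data BLOB,
  FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE CASCADE
) ENGINE=InnoDB;

-- Deleting the parent row removes its attachments (and their blob data) automatically
DELETE FROM parent WHERE id = 42;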

Mahesh

In my DB a table contains 20 LONGBLOB columns (the columns can’t be moved to another table). Each row contains approximately 0.55 MB of data. I created indexes on that table but they don’t improve performance. We need to store more than 1 million records in this table, so I am planning to partition it. My question is whether partitioning will improve (read/write) performance or not?

Can you help me figure out how to improve this? Please suggest an alternate approach if there is one.