Despite being a standard Innodb feature forever, the Insert Buffer remains something of a mystery to a lot of people, so let me try to explain things a little bit.

Innodb uses the insert buffer to “cheat” and not update secondary index leaf pages right away, but instead “buffer” such updates so several updates to the same page can be performed with a single sweep. The insert buffer can only work for non-unique secondary keys, because until the merge is performed it is impossible to check whether the value is unique.
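For illustration, here is a hypothetical table (names made up): inserts into the non-unique secondary key k_created can go through the insert buffer, while the unique key uk_email always requires the leaf page to be checked for duplicates and so cannot be buffered; the primary key (clustered index) is not covered by the insert buffer either.

CREATE TABLE events (
  id int unsigned NOT NULL AUTO_INCREMENT,
  email varchar(64) NOT NULL,
  created datetime NOT NULL,
  PRIMARY KEY (id),             -- clustered index, never insert-buffered
  UNIQUE KEY uk_email (email),  -- unique secondary key, cannot use the insert buffer
  KEY k_created (created)       -- non-unique secondary key, eligible for buffering
) ENGINE=InnoDB;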

The insert buffer is allocated in the Innodb system tablespace. Even though it is called a “buffer”, similar to the “doublewrite buffer”, it is really space in the tablespace, though its pages can be cached in the buffer pool the same as other pages. This property allows the insert buffer to survive transaction commits and even MySQL restarts. It may really take weeks before a given index page is merged, though usually it happens much sooner than that.

There are two ways an insert buffer merge can happen. The first is an on-demand merge – if an accessed page contains unmerged records in the insert buffer, the merge is performed before the page is made available. This means the insert buffer can slow down read operations.

The other way the insert buffer is merged is by the background thread. Very few merges happen while the system is loaded, and the merge process becomes more active when the system is idle. This behavior can cause interesting results: say the system is lightly used and shows very little IO activity, but when you remove the load completely you see a high IO load which goes on for hours, even after all dirty pages in the buffer pool have been flushed. This can be very surprising.

Stats about the Innodb Insert Merge Buffer are available in the SHOW INNODB STATUS output:
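(The exact output depends on the server version; the excerpt below is an illustrative reconstruction in the MySQL 5.0/5.1-era format, with made-up numbers chosen to be consistent with the figures discussed below.)

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 5126, free list len 6313, seg size 11440,
7273251 inserts, 6738912 merged recs, 2246304 merges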

The “seg size” is the full allocated size of the insert buffer segment, in pages (16KB each by default), so in this case it is about 180MB.
The “free list” is the number of pages which are free – containing no unmerged records. The “size” is the size (in pages) of the portion of the insert buffer which is not yet merged.

The fact that the size is reported in pages is not really helpful, because depending on the row size there can be a different number of rows in the insert buffer – and it is rows we see in the performance stats, and rows we need, for example, to understand when the insert buffer merge will be completed.

The “inserts” is the number of inserts into the insert buffer since start, and the number of merged records is the number of records which were merged into their appropriate page locations since start. So we know that in this case the insert buffer has grown by 534339 records (inserts minus merged records) since start. There is a temptation to use this number as the count of unmerged rows in the insert buffer, but this would not be correct – the insert buffer may not have been empty at the start. So you can only tell the insert buffer has at least this number of records. For the same reason, do not get scared if you see more merged records than inserts.

The value of 2246304 merges shows us there were about 3 records merged per merge operation (merged records divided by merges), meaning the insert buffer could in theory reduce the IO needed to update leaf pages by a factor of 3.
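To put these counters together, here is the same arithmetic as a runnable query, using the illustrative numbers from the excerpt above (not real server output):

SELECT 7273251 - 6738912 AS ibuf_growth_records,  -- inserts minus merged recs
       6738912 / 2246304 AS records_per_merge;    -- merged recs divided by merges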

As I mentioned, the insert buffer merge can take quite a while – at 100 records merged per second we’re looking at a minimum of 5343 seconds (534339/100), or about 1.5 hours, on this server… and there are insert buffers which are 10x and 100x larger than this.

Innodb unfortunately offers no control over the insert buffer, while such control would surely be beneficial for different workloads and hardware configurations. For example, there is a very good question whether the insert buffer really makes sense for SSDs, because saving random IO is not nearly as important for these devices.
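As a side note for readers on newer versions: since MySQL 5.5 the insert buffer has been generalized into the change buffer, which did gain some knobs. A sketch of what that looks like, assuming MySQL 5.5/5.6 or later (none of this exists in the stock Innodb discussed here):

-- Disable buffering entirely, for example to test the impact on SSD (MySQL 5.5+)
SET GLOBAL innodb_change_buffering = 'none';
-- Buffer inserts only, i.e. the classic insert buffer behavior (MySQL 5.5+)
SET GLOBAL innodb_change_buffering = 'inserts';
-- Cap the change buffer at 10% of the buffer pool (MySQL 5.6+)
SET GLOBAL innodb_change_buffer_max_size = 10;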

15 Comments
pat

I wonder how many of the “tricks” that modern databases use to work around poor random access IO patterns on winchester drives are going to end up being counterproductive once solid state becomes the norm.

Does the extra complexity of something like the double write buffer or the insert buffer really provide a worthwhile benefit if you assume equal access time to any point in the io subsystem?

Baron Schwartz

You didn’t mention that the Percona patches provide control over the insert buffer.

Pavel

Pat, will solid state disks survive a database workload with their 100k possible writes only?

Morgan Tocker

Pavel,

The 100k max writes refers to the cheaper MLC (multi-level cell) Flash. SLC (aimed more at the server end) can sustain a higher number of writes.

Dan Dascalescu

Solid state drives are not the big revolution. The big revolution is nano-wire memory: 1000 times faster than Flash memory, terabyte capacity, and lower power consumption.

This will render databases obsolete, along with silly tricks involving hard disk rotation cycles.

Wagner Bianchi

Can you give me an explanation about merged and unmerged pages?

Wagner Bianchi

OK, Peter… thank you… I had misunderstood.

mz

Innodb locking problem with unique keys when inserting

CREATE TABLE male_id (
entry_id int(10) unsigned NOT NULL AUTO_INCREMENT,
jy_id int(10) unsigned NOT NULL,
PRIMARY KEY (entry_id),
UNIQUE KEY jy_id (jy_id)
) ENGINE=InnoDB AUTO_INCREMENT=1

When 10 processes concurrently insert rows into the above table, only the rows from one process are actually recorded in the table. I tested with autocommit=1 and even with the transaction isolation level set to SERIALIZABLE. It seems Innodb is not doing locking properly. If I remove the primary key (entry_id), the problem remains. But the problem is gone if I change the unique key (jy_id) to a plain index (jy_id).

justin.ren

“There are two ways an insert buffer merge can happen”
How about the third way: a slow shutdown?

mark wang

Hi Peter,
For MyISAM, bulk_insert_buffer_size has a similar effect. So is the following still true for MyISAM: “The insert buffer can only work for non-unique secondary keys, because until the merge is performed it is impossible to check whether the value is unique”?

Regards
Mark

praji venu

I could not find any reference in the MySQL documentation on how the inserts themselves are buffered (not the secondary index). Do the inserts get buffered in the buffer pool like updates and synced later to the ibdata file? For updates this buffering improves performance and reduces the I/O. Are inserts also buffered the same way? If not, then what is the reason? Thanks

Praji

Alain Craven

We have a “work horse” data server performing massive amounts of data purification (string matching with huge secondary indexes). The server is designed to clean and parse close on a TB of data using fuzzy matching.

I find that with a very large insert buffer I am unable to perform a consistent backup with innobackupex.
The apply-log always fails.
It appears that the xtrabackup_log grows too big (can be multi-GB). Even just applying the log causes the restore server’s resource usage to climb very high (close to 100%).
Right now I am forced to either use a slave and stop it until writes complete, then back up, or stop heavy writes to the master and wait for the insert buffer to clear.