It is a known fact that ext3 is not the most efficient file system out there and for example file removals can be painfully slow and cause a lot of random I/O. However, as it turns out, it can sometimes have a much more severe impact on the MySQL performance that it would seem. When or why?

When you run:

DROP TABLE large_table;

there are several things that need to happen: a write lock is taken on the table so it cannot be used by any other thread, the storage engine removes the data file(s), and in the end MySQL destroys the table definition file (.frm). That’s not all that happens, though – there is one other thing:

The entire code path that removes a table is protected by a global mutex. This mutex is used in various places inside MySQL, but primarily when tables are being opened or closed. It means that while the mutex is locked, no query can be executed, because every query is blocked from accessing any table.

This is where slow file removal on the ext3 file system starts to be a pain. Deleting a 10GB file can take several seconds, and if that file belongs to a MySQL table, the mutex remains locked for all that time, stalling all queries:
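The underlying filesystem effect is easy to reproduce outside of MySQL. A minimal sketch, using a plain file as a stand-in for a table’s data file (the size is scaled down here; a real table could be tens of gigabytes):

```shell
# Create a file standing in for a large .MYD/.ibd data file, then time
# its removal. On ext3, unlink has to walk and free every indirect block,
# so the time grows with file size; XFS frees whole extents instead.
dd if=/dev/zero of=large_table.MYD bs=1M count=64 2>/dev/null
sync
time rm large_table.MYD
```

Run against a multi-gigabyte file on ext3, the `rm` step is where the seconds disappear.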

I tried some alternative approaches to trick MySQL into removing smaller files with DROP TABLE, to minimize the effect, such as:

  • TRUNCATE TABLE large_table; ALTER TABLE large_table ENGINE=…; DROP TABLE large_table;
  • TRUNCATE TABLE large_table; OPTIMIZE TABLE large_table; DROP TABLE large_table;

Unfortunately, as it turned out, each of the administrative commands involved, such as TRUNCATE TABLE or OPTIMIZE TABLE, one way or another takes the same mutex while the old table files are deleted:

The only alternative seems to be changing the file system – to XFS, for example, which handles file removals much more efficiently:

EXT3

XFS

A better solution inside MySQL itself could be to simulate the table drop by renaming the corresponding data file or files and physically removing them outside of the mutex lock. However, it may not be that simple, because the actual removal is performed by the storage engine, so it’s not something the MySQL core code can fully control.
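A rename is a metadata-only operation and returns almost instantly, so such a scheme could, in principle, look like this at the filesystem level (a simulation with plain files, not actual MySQL code):

```shell
# Stand-in for a table's data file (size scaled down for the example).
dd if=/dev/zero of=large_table.ibd bs=1M count=16 2>/dev/null

# Step 1: what would happen inside the critical section - a rename is
# metadata-only and effectively instant, so the mutex is held only briefly.
mv large_table.ibd large_table.ibd.dropped

# Step 2: outside the critical section - the slow unlink runs in the
# background, where it cannot stall queries.
rm large_table.ibd.dropped &
wait
```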

This is certainly not a common situation, but it may become a problem for anyone when it’s least expected (e.g. when dropping old unused tables).

23 Comments
Olivier B.

And what about ext4? For some months now, all my new MySQL servers have used only ext4 instead of XFS; is that a wrong choice?

Toby

Surely the correct solution is, as you say, to change the metadata (quickly) and let the file system operations happen asynchronously. Changing filesystems to solve this sounds pretty backasswards. (That said, how does reiser3 do here?)

Richard Dale

A solution from a completely different field.

The developers of mythtv (a TV recording system) achieved better performing deletes of large files on ext3 through the “slow delete” feature. Typically TV recordings are 1-12GB per hour, so a movie could be 20+GB in size. Without the feature, a delete of the file on ext3 would lock up the entire system for about 20-30 seconds.

The “slow delete” simply truncates the files in chunks in the background.
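A minimal sketch of such a chunked truncate, assuming GNU coreutils (the file name and chunk size are made up for illustration; a real background deleter would also pause between steps):

```shell
# "Slow delete": shrink the file one chunk at a time, so each truncate
# frees only a bounded number of blocks, then unlink the now-empty file.
f=recording.mpg
dd if=/dev/zero of="$f" bs=1M count=8 2>/dev/null

chunk=$((2 * 1024 * 1024))            # 2MB per step, tiny for the example
size=$(stat -c %s "$f")
while [ "$size" -gt 0 ]; do
  size=$(( size > chunk ? size - chunk : 0 ))
  truncate -s "$size" "$f"            # frees at most one chunk of blocks
  # sleep 0.1                         # a real daemon would pause here
done
rm "$f"
```

Because each `truncate` only frees a bounded amount of data, no single filesystem operation holds things up for long.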

Patrick E

Does this also apply to temp tables? Seems like this could be a huge problem and would possibly cause a large sort to block other arbitrary select queries?

pat

Generally, temp tables are going to show up in /tmp, which is usually tmpfs. I wouldn’t expect to see slow-delete mechanics at work in that case, since you’re basically memory-mapped, although I’ll admit to not having run a test :).

Olivier B.

For me there are 3 cases where tables are created in /tmp:
– when you use “create temporary table”
– when MySQL needs a temporary table that uses a BLOB or TEXT column
– when MySQL needs a temporary table that is bigger than tmp_table_size/max_heap_table_size

Right?

And because of this last case, we can’t use /dev/shm to store temporary tables: it might use too much memory.

pat

Maybe I’m wrong here (it’s been a while since I looked seriously at filesystems), but isn’t tmpfs always implemented as virtual memory on Linux?

http://en.wikipedia.org/wiki/TMPFS

Ryan Huddleston

This issue has been a big problem for us for a long time as we constantly create and drop schemas. I worked around the issue by having a special method that drops tables one at a time and sleeps if the process count gets too high on the master or slaves. Though the solution is far from ideal as large tables still lock things up for too long.

See the mysql Worklog task for this at: forge.mysql.com/worklog/task.php?id=3983

Patrick E

On our production redhat machines at least it seems to be ext3, not sure if that’s default or a quirk of our particular setup though.

/dev/vg00/lvtmp /tmp ext3 defaults 1 2

And indeed tmpfs is virtual memory (RAM+swap) backed (http://en.wikipedia.org/wiki/TMPFS). However, for a dedicated MySQL machine that doesn’t make too much sense, since if you had the RAM you’d just increase your tmp_table_size etc. to use that RAM directly, I imagine. I don’t mind that these large sorts use disk, but I do mind if they block my other select queries for no good reason. As a general rule, locks should absolutely never be held around any I/O operation…

Patrick E

To explain our rationale for not using tmpfs, and using a disk store like ext3:

1) Our database is typically 100% cached (both data and indexes), 16 – 20 GB in a few hundred MyISAM tables (out of 24 GB RAM on the machine). It is 99.9999% reads on a very static but highly random set of data.
2) Most sorts are small or medium sized, we have max_heap_table_size=100M and tmp_table_size=100M in order to do these in memory
3) We have occasional large (2-4 GB+) sorts that we’d like to spill to disk. It’s okay if these are somewhat slow. I/O contention is not an issue, since all reads are cached and the few writes we have are not performance critical.

I’m just wondering if in this case this lock will bite us for these large sorts (which afaics, we definitely do need on disk, since our ram is mostly dedicated to caching the db / indexes). I’d go spelunking, but have no experience with the mysql codebase really. Just a concerned user here 🙂

Peter Zaitsev

Maciek,

I should note that this applies not only to MyISAM tables but to InnoDB tables in innodb_file_per_table mode as well.

I recently spoke to a customer who simply can’t drop a 400G table because it stalls everything for a long time.

Georgi Alexandrov

Maciej,

are you testing this with ext3’s data=journal, data=ordered, or data=writeback mode? The default is data=ordered, and the closest behavior to XFS can be achieved via writeback mode (which, btw, will become the default soon).

Michael Will

Note that ext4, as included in RHEL 5.3, is clearly marked as a technology preview, not production quality. But since they have refused to support XFS in favor of the outdated ext3 for years now, the best bet for heavy I/O machines is Novell SLES anyway 🙂

Michael

Garrett

I’ve found that doing a TRUNCATE on a MyISAM table will resize the .MYD file without a lock. You can then drop the table instantly. With InnoDB tables, the .ibd file doesn’t get resized/deleted until you OPTIMIZE or DROP.

Mark Cotner

If you’re using percona xtradb binaries you can use this option to reduce the pain.

https://www.percona.com/doc/percona-server/5.1/performance/innodb_lazy_drop_table.html

‘njoy,
Mark

Robert Lu

You can create a hard link to the table’s data file, then drop the table.

The drop will then only decrement the file’s link count, so it doesn’t lock the table for a long time.

Afterwards, you can delete the hard link, which doesn’t lock the table at all.
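The trick works because unlinking a name whose inode still has another link is nearly instant; the expensive freeing of blocks only happens when the last link disappears. A simulation with plain files (no MySQL server involved; file names are illustrative):

```shell
# Stand-in for the table's InnoDB data file.
dd if=/dev/zero of=large_table.ibd bs=1M count=16 2>/dev/null

# Before DROP TABLE: add a second name for the same inode (instant).
ln large_table.ibd large_table.ibd.keep

# DROP TABLE's unlink now only decrements the link count - fast, because
# the data blocks are still referenced via the hard link.
rm large_table.ibd

# Later, at leisure: removing the last link does the slow block freeing,
# with no MySQL mutex held.
rm large_table.ibd.keep
```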

Murali Krishna

This is still a problem even after 10 years. I recently experienced it as degraded performance when dropping a database hosted along with 10 other databases on the node: almost everything stalled for 3 minutes while connections rapidly climbed from 80 to 500. I am using MySQL 5.5.19.