One of the big “Selling Points” of MySQL is support for Multiple Storage engines, and from the glance view it is indeed great to provide users with same top level SQL interface allowing them to store their data many different way. As nice as it sounds the in theory this benefit comes at very significant cost in performance, operational and development complexity. Let me touch just on the few complications it brings in

Conversions – Each storage engine has its own data storage format so MySQL has to do a lot of copying and conversions while accessing the data, which significantly limits performance compared to basically “zero copy” design one may have streaming data from memory when it fits there.
Optimizer and Execution Storage engines are not created equal especially if you look at In Memory storage engines vs Distributed ones. Trying to handle all different cases in Optimizer – such as in memory nature of MEMORY storage engine, clustering by Primary key in Innodb and distributed nature of NDB makes it more complicated when it should be and still because it has to serve so many use cases it is not tuned to exploit full performance potential of any given storage engine.
Synchronization The top level on MySQL side, such as .frm files and binary log files has to be synchronized with storage engine transactions which require a lot of complications and performance overhead. MySQL has to do several fsync() calls per transaction commit to be on the safe side and still number of operations are not fully safe – you can for example get .frm out of sync with internal data dictionary if you crash at wrong time.
Transaction Support We have to deal with mix of locks on high level as well as on storage engine internally. We have to deal with different locking strategies storage engines may have and if we venture into using cross storage engine transactions we’re up for a lot of surprises.
Backup Cross storage engine backup is a pain because they are so different. Some of them are distributed and some in memory so even OS level approaches such as LVM backup may not work in all cases. We also get into synchronization aspects making backup which prevents from having hot backup in many cases.
Replication Support for different storage engines is a pain in replication. To start we have to write second binary log file when all transaction engines have their own !
Quality Assurance Multiple storage engines are much more complicated to test. And I believe multi-storage engine use is really not tested very deeply. I’ve seen some tests being run for different storage engines, but really this is not enough as MySQL allows you using multiple storage engines in the query you need to test all kinds of combination such as running multi-table update joining PBXT and Innodb tables.
Operations Things like backup and monitoring becomes a lot more complicated plus you have to deal with balancing – allocating resources to different storage engines and performance tuning becomes more complicated.

Now what is interesting for probably 95% of applications single storage engine would be good enough. In fact people already do not love to mix multiple storage engines very actively because of potential complications involved.

Now lets think what we could have if we have a version of MySQL Server which drops everything but Innodb Storage engine (it could be NDB, PBXT etc, it does not matter).

We could save a lot of CPU cycles by having storage format same as processing format. We could tune Optimizer to handle Innodb specifics well. We could get rid of SQL level table locks and using Innodb internal data dictionary instead of Innodb files. We would use Innodb transactional log for replication (which could be extended a bit for this purpose). Finally backup can be done in truly hot way without nasty “FLUSH TABLE WITH READLOCK” and hoping nobody is touching “mysql” database any more. Single Storage Engine server would be also a lot easier to test and operate.

This also would not mean one has to give up flexibility completely, for example one can imagine having Innodb tables which do not log the changes, hence being faster for update operations. One could also lock them in memory to ensure predictable in memory performance.

22 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Jay Paroline

I think this is exactly the reasoning why PostgreSQL only has one storage engine to rule them all.

It seems like the Drizzle project is trying to address a lot of these types of performance issues, but they are keeping support for pluggable storage engines (MyISAM notably absent) and even extending it. I wonder what their take on those issues is, or if they are able to get around some of them (perhaps by rigidly defining how a storage engine must behave/store data to avoid the copying issues?). If anyone has some insight I’d love to hear it.

Igor Babaev

Yep. The only minor problem is to find developers to implement all this.

Arvids

These are right thoughts on how to boost the InnoDB performance and MySQL as a whole. But still, MySQL is really good in cases like you have tables with lots of changing non essential data to move to MEMORY table or tables (like “online”, “peers” in Bittorrent trackers, etc). It gives a really good performance boost. NDB is second good thing to use, although it’s not used as wide as other things.
Basicaly I thing than there shouldn’t be InnoDB only, but to leave some essential engines like InnoDB, MEMORY and NDB and make them work together better. It would be like performance boosted edition with limited capabilities on supported storage engines.

Vladimir Kolesnikov

Peter,

I see your point but I think the current MySQL approach addresses the performance in a more wider scope – you cannot make generic code fit well one specific engine but as time passes and demands change this approach allows fast development of solutions (implemented in the form of engines) that address specific problems better than one highly optimized generic solution.

I think in some sense this is the philosophy of MySQL 🙂

Isotopp

PeterZ, most of these commercial storage engines do not exist – at least where I work. See http://mysqldump.azundris.com/archives/87-Why-commercial-storage-engines-do-not-excite-me.html

Hi Peter!

Some of what you are referring too is just the cost of having much of the design of MySQL based around the limitations, and features, of MyISAM. You can’t entirely get rid of the concept of “multiple engines”. Why? Because you will want to have an in-memory engine for some actions (and Innodb treats standard and temporary tables as the same, which is non-optimum).

As far as testing goes, there are many bugs in the multi-storage interface. It took us a number of months to make the test suite all work with Innodb (and fix the bugs along the way). Paul is still working on fixing issues that were found with PBXT after porting to Drizzle (since we can run a larger test suite against PBXT).

In Drizzle we have taken some engines which cause performance and/or feature issues and made them temporary only for the time being. This solves a number of architecture issues (and we are just beginning to really make use of some of this, there is a lot of work that has to be done in order to make a number of things work). It is tough to really weigh the real cost in multiple engines until you understand how internally each is used (and how the cost adds up). At some point when we close in on our beta I will blog about some of these.

Cheers,
-Brian

Paul McCullagh

Hi Peter,

I think Brian has a very good point when he refers to: “much of the design of MySQL based around the limitations, and features, of MyISAM”.

It is still clear today that the storage engine API was based on a non-transactional, ISAM-like interface. There was no re-design done, when the first transactional engine was added.

But, this is history, and I know that there were very good reasons for doing things this way at the time (for example, not wanting to destabilize the existing MyISAM implementation).

What Drizzle has recognizing, for example, with its latest changes to the API is that the non-transactional case is simply a subset of the transactional case. For example: an engine->startTransaction() call is a no-op for a non-transactional engine.

But this call does not exist in the standard MySQL engine API! Why not? Well because it is possible for the engine to hack in a begin transaction call by counting external_lock() calls…

So what I am saying is this: all of the points you raised above are major sticking points. However, most of these problems can be solved by moving to a well designed API.

And, I think it is worth it, because this is definitely something that sets MySQL/Drizzle apart from other SQL implementations.

Percona has written its own storage engine, xtradb. Does this mean you are going to discontinue development on it? Or will Percona server only use xtradb in the future?

I have no idea how to understand this post in this context; either you are going to radically change Percona Server / xtradb to be incompatible with MySQL, or you are saying one thing and doing another and being a hyprocrite.

Rob Wultsch

For the sake of argument why not just use Postgres?

PG seems significantly less buggy and significantly more responsive to bugs. (#989 I am looking at you)
PG only has a single storage engine. No multiple data dictionaries and hyper easy hot backups.
PG seems to have fewer weird issues scaling up.
BSD licence.
No conference surprise features.
No crippleware^H^H^H^H^H^H^H^H^H^H enterprise only features.

Roland Bouman

Great post!

I always like the idea behind the ability to use multiple storage engines. The idea that “not all data is created equal” makes sense, and it stands to reason that the is benefit in the ability to address specific needs with a specific storage solution. But over time, a few things made me feel uncomfortable:

– if different storage engines cater to different use cases, then why is the SQL dialect, even DDL, just a fixed set? Shouldn’t we be able to configure tables backed by a particular storage with options that are specific to that engine? In the case of a storage engine plugin, shouldn’t the SQL dialect extend accordingly? Sure, there may be some table options today that apply only to a specific engine, but the syntax for these options is hard-wired in the code proper – not in any particular storage engine. Essentially this means that if you want to develop a storage engine yourself that needs specific table options, you can’t just write a plugin – you actually have to fork the server to allow the user to talk properly to your engine.

– perhaps an extension of the previous point, shouldn’t data type support be dynamic too? It seems reasonable that one of the special advantages of a particular storage engine would be that it implements particular data types. Again, if we’d want to create an engine to support a particular specific data type, we have no option but to fork the server.

– you already mentioned it, but indeed – what about the optimizer? Shouldn’t the optimizer be extensible and pluggable too? Column-oriented storage engine vendors like infobright and calpont all had to put in their optimizer next to the native one in order to benefit from their engine. Is that just a corner case? WOuld it even be possible to come up with an optimizer design that is so generic that it is even possible to fully benefit from different engines?

(I am not pretending I have any answers – just voicing concerns about the current state of the ‘pluggable’ SE)

Henrik Ingo

Hi Roland

The points you raise are actually being worked on:

CREATE TABLE options exists in MariaDB 5.2: http://askmonty.org/wiki/Manual:Extending_CREATE_TABLE

As part of Timour’s “Query Fragment Pushdown” work (this is a set of several worklogs) in the SE summit we also discussed the issue of “normalized cost model” for the optimizer, where an engine could give info to the optimizer on which operations will be efficient and which are costly. (A more detailed explanation would be beyond my ability to write it, and space available here…) The obvious alternative would be a pluggable optimizer, but it seems preferable to have the optimizer constant, and let the engine just influence the decision.

Some other of Peter’s comments are also valid, but are not to my knowledge being addressed – for instance nobody is trying to get rid of the .frm files. Especially for “remote” engines like NDB, syncing the .frm is a hassle.

waterguo

well said. modularization comes with cost. in database world, the cost is too high to be ignored.

Kevin

Doesn’t MyISAM still offer some significant benefits in certain situations? If you have a database with a relatively small, constant, known INSERT/UPDATE load, but a very large SELECT load, are you not still better off using MyISAM with fixed-length rows over InnoDB?

If so, I have to question the assumption that “for probably 95% of applications single storage engine would be good enough” if that storage engine is InnoDB. I find it hard to believe that the sort of use model I describe accounts for less than 5% of all MySQL uses.

Has this situation changed?

Simon

Yea, I don’t use MySQL hence I don’t have the possibility to use different storage engines as all other database engines have only one storage engine. I’ve never suffered from that.