About a week ago Marten send me email pointing to his article published on Jays Blog (Come on Marten, it is time for you to get your own blog). I should have replied much earlier but only found time to do that now. So here is my list

1. Be Pluggable

Unlike many OpenSource projects MySQL was single chunk of code and for years the only way you could officially extend it was using UDFs which was very limited. Compare this with other OpenSource projects such as PostgreSQL (plugable indexes etc), Apache, PHP or Linux Kernel. Yes in MySQL 5.1 the situation is changes – now there are plugable storage engines (something even PostgreSQL does not have) as well as Full Text Search parses but there is very long way to go before you could do any significant functionality ourside of storage engines as plugins.

Two aspects why I’m especially interested about it is Sphinx which is currently implemented as storage engine for MySQL but this role does not suite it well as well for our performance optimization works – we would like to get much more stats from MySQL and it would be great if we could write these as plugins instead of patches.

2. Be Scalable

I’d like to see MySQL Scaling to large amount of CPUs for all storage engines all operation systems and all workloads. Well this may be too much of the wish but at least it should happen for most common cases. Innodb scaling issues are the most common problem (and patches currently available only fix part of it) but far from being only problem. There are fair amount of issues with MyISAM and generally on MySQL level which needs to be taken care of. Many of these issues were known for years. The good thing is now with multi-cores coming even on laptops these issues can’t be under carpet any more and they are getting attention and gradually being fixed.

3 Be Distributed
One of the most common buzzwords used together with MySQL is Scale Out, but to say the truth MySQL does not provide much of the functionality for scale out itself, besides its fast and simple replication. You still would normally need to manually chop your data to the multiple databases and handle data merging on application together with high availability and all kinds of problems. The dream for many MySQLers is to be able to get bunch of boxes and use them same as single MySQL server with improved performance and high availability. Of course there is MySQL Cluster but it does not get where quite yet having very particular performance properties and operating requirements. In fact I do not think MySQL Cluster would ever be that thing as it provides too strong guarantees. It is same as with MySQL Replication – the reason it is fast and has low performance overhead on the master is at large extent because it is asynchronous. To offer great performance on distributed systems, especially if they are connected via not very fast network you need to come up with different consistence guaranty and semantics… and believe me most of Web applications could live with it. They anyway read stale data from the slave or memcached manually relaxing data consistence requirements.

But honestly most parts are OK – over time you develop practices and tool set to deal with multiple nodes nicely even if you do not have much support from MySQL to do so. What is really bad is Parallel query. Reporting with group by over 10.000.000 of rows in real time is one painful thing to do requiring complex application programming to do efficiently in distributed manner. Of course there are summary tables and other tricks we use when such need arises but we still can’t deny – MySQL does not handle complex queries well, being able to use only one CPU effectively on single node and no true federation support.

4. Be Solid
MySQL has very good stability track record, with exception of MySQL 5.0 it was typically safe to run it in production well before it was stable (I remember we had successful production on 3.23 alpha). However many of new releases score pretty poor on the scale of being Solid.

By by being Solid I mean Solid functionally complete features which work as uses intuitively expect them to work instead of having a lot of little known documented (sometimes) limitations. I’m saying about Prepared Statements in MySQL 4.1 which came without query cache support but even worse not all of statements could be prepared not to mention other issues. I’m saying about SubSelects which were added in MySQL 4.1 but you still could not run (and still can’t) most of your Oracle/DB2/MS SQL Queries on MySQL because subqueries optimizer weakness. I’m saying about XA which were released in MySQL 5.0 without consistent work with MySQL close Partner – Innobase to ensure Group Commit still works in 5.0 and so we do not get major regressions. I’m saying about Stored Procedures which were released without proper development and debugging support. These are just few examples.

Of course you could say “release early release often” is a good strategy and at large extent I would agree but MySQL is not released often any more. There is relatively long release cycle with great deal of planning and it looks like no features go in stable release just the day before it is released. With such release management I would expect MySQL to try to be a lot more consistent as such approach means if something is inconsistent it stays where for very long time. For example Query Cache support for Prepared Statements is only fixed in MySQL 5.1 and Most of other items are still open.

Well may be this is remainder of wild first VC-backed years of MySQL and not things are getting straightened up. I know on a lot of listed issues the work is going on and it just takes time to get them all fixed.

5. Do not forget about the roots.
MySQL has gained its popularity as database for Web but web users do not pay much because they are OK with GPL and often have smart guys inside who can figure things out and even patch MySQL if needed. So MySQL went to chase Enterprise market and develop enterprise features pushing back a lot of features which Web users would like. These features are often small, typically not SQL Standard compatible but help you if you’re web developer a lot. In my Early years in MySQL I have created more than hundred of such little projects suggested by MySQL customers and myself based on long MySQL usage for Web. Later I stopped adding these because I have not seen much traction for these tasks as well as of similar tasks created by other guys.
These would include support for query timeouts, parallel query execution, CONNECT BY support and various performance related features.

I’m not sure if MySQL got back to spend significant resources on taking care of Web guys needs but at least we have now Community tree and Web guys can take care of themselves if they want to. Jeremy Cole’s patches were accepted, our patches and I guess Google patches are in the queue which is very good sign. There is still fair amount of work to do to straighten up this process so good patches can quickly appear in community tree. Once this is done I think amount of contributions will increase a lot.

6 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Kilroy

I’d like to second that regarding “Be Solid”. When I started using MySQL ten years ago, I defended it with, “It doesn’t do everything, but what it does it does REALLY WELL.”

Nowadays when I see a new feature, I usually cringe when I read about its limitations, and many of those limitations, once documented, never get addressed. (Or I don’t notice that they do, since I never started using it.) It has made MySQL a much more difficult product to use, because there are pitfalls everywhere.

Perhaps if they can get real participation from the volunteer community that will help. They have good people there, and I know they want their product to be good, so it’s probably just a manpower problem. (They say they’re going to switch away from BitKeeper someday; I expect community involvement to improve at that point.)

Anyway, if they could get back to the idea of only deploying a feature if they can do so in a way that doesn’t just work for the narrowest of uses, that would be wonderful.

Thanks for the great blog!

Randy

I third the “Be solid” comment. I can’t count the number of times the last_insert_id() function hasn’t worked properly. It’s broke at least three times in version 5 alone.

James Day

Bugs introduced with 4.1 and 5.0 features are still slowing MySQL down. The backlog of increasingly rarely encountered bugs is gradually being cleared up, so MySQL should gradually start to release more quickly. Not as quickly as we might like, but it’s happening. Hopefully the QA lessons will also help to make the future releases more reliable more quickly.

For a scale-out in a box solution, look at MySQL Proxy as part of the package.

chad

I am glad that you spell these out. I came from Oracle and SQL Server background, where parallel query and partition give you the performance for multiple CPUs and fast IO system. I have spent some time trying to figure out how to drive up performance for large Data warehouse with 16 CPU cores with MySQL. Now I realize that MySQL is just not there in terms of parallel queries, which are so fundimental to ETL process and OLAP processing. Hope MySQL catch up with Oracle and SQL Server soon. Love to know any tricks to speed things up…

no need

I wish MySQL would just GO AWAY. They have set the tech landscape back 10 years with their warped and defective product.