June 15, 2007

Commodity Hardware, Commodity Software and Commodity People

Posted by peter

In the previous post I mentioned that not all architectures and solutions work for Commodity People, and people seem to agree with me.
A number of vendors would claim they are in the Commodity Software or Hardware business, but few would mention they are doing it for Commodity People, because few people would like to be called commodity - each of us would rightfully like to think he is special and unique.

Thinking more about the topic, I think being “Commodity People Friendly” is one of the important properties of commodity products. Look for example at Dell, HP or whitebox x86 servers: they are not only cheaper but also easier to use than minicomputer systems from IBM. Directly attached storage is simpler to use than SAN, MySQL is simpler to use than Oracle or DB2, PHP is simpler than Java.

Even within the same vendor you can find that commodity products are designed to be used by commodity people - they tend to be more failsafe and easier to use than the higher-end ones. Look for example at Linksys home routers, made by Cisco, and compare them to the traditional IOS-based series. HP is another example of a vendor present in both the commodity and high-end markets.

I believe MySQL gained its popularity to a large extent due to these properties of being simple to use and forgiving of errors. Look at early MySQL versions - if you inserted data that was too long it would just truncate it and continue working; you could copy a (MyISAM) table while the server was running and it would typically work. Even if a table was corrupted MySQL would still run, just giving you errors sometimes. No transactions meant you did not have to deal with deadlocks or learn isolation modes. A perfect solution for an absolute beginner.

Now as far as query design goes, MySQL supported a lot of functions (which you can always look up in the manual) but it did not support any complex constructs like subqueries or views, so you would rarely scratch your head wondering what the query your peer has written is supposed to do.

Of course you could tell me: who cares about “commodity people” - we would hire the smartest guys out there, who can compute the optimal join combination for any 15-way join in their head, not to mention understanding all the ups and downs of the database management system. This sounds nice in theory, but there are not so many smart guys out there, and if there are, they may be pretty expensive, so you can’t have many of them on the team. In many MySQL projects there are not even dedicated MySQL people; Web developers simply use MySQL as they see fit, and one of them assumes the DBA role, installs MySQL and chases developers if they write really bad queries (in a good case). In a bad case you may find a MySQL which just happened to come with the OS, and queries for which no one ever ran EXPLAIN, which just happen to work anyway because the database is very small.

Now, for smaller projects, even if you happen to have a smart MySQL guy you might not be in better shape, as your business may be at risk if he leaves and no one else is able to understand his smart ways. This can be much worse than using commodity solutions which may not be optimal but which everyone on the team understands and is able to support if needed.

What scares me in MySQL development is that it is quickly leaving this Commodity Space in terms of overall feature complexity. MySQL 5.2 will give you many storage engines to play with (many with transactions and some with clustering), with support for partitioning, stored procedures, views and a lot of other stuff. Think how much freedom an evil smarty has to design something which would be hard for other people to understand and support. Interestingly enough, the fact that MySQL is Open Source puts it in worse shape than Oracle and other systems here - typically you get charged more for advanced features, but with MySQL everything is free out there to try, so there are no financial barriers stopping you from shooting yourself in the foot.

MySQL skills are also likely to lose portability. With MySQL 4.0 you could simply ask whether a person knows MySQL or not; with the new versions it becomes possible for someone to be an expert in one area and familiar with one design approach but not with another. Someone could be a great MySQL expert but have no experience with MySQL Cluster; another may be good with MySQL Cluster but have no idea how to write storage engines or optimize for the Falcon storage engine. The relatively simple 1000-page book which would cover pretty much all MySQL features in version 4.0 becomes a bookshelf for new MySQL versions.

I can’t say this is an unusual development. I think it is natural for software to chase features, because this is where customers are leading the product (“implement this and we’ll buy it”), but this is also why old products become feature overkill, slow and complicated. I’m not in a position to complain though: increasing complexity will mean more people will come to us for Consulting Services.

Do I see MySQL replaced in its space any time soon? I do not think so. I do not think we need yet another SQL database, because the SQL language itself is way too complicated and outdated, plus it is not expressive enough for many modern application needs. I would expect solutions to be developed which operate on more flexible data structures, handle the distributed semantics of the web better and are expressive in a different form. I know we already had a false start in this area with XML databases, but it is not unusual for a technology to rethink itself and gain the market on the second attempt.

Some interesting developments we have in this area are of course the famous Google BigTable and the Facebook API; I know a few other companies have their own special “database” interfaces which run on top of MySQL or other databases. Another interesting development is Scalable Blob Streaming. This project starts with retrieving data from a storage engine using the HTTP protocol, and I expect it will not be long before other operations follow.

One thing I have been thinking about a lot is all these great storage engines - do they all really need MySQL to run? At this point all the Open Source storage engines out there are for MySQL, but why could one not develop a smaller and lighter top part? The same way you can run PHP as part of Apache but can also hook it up to a bunch of other web servers, such as lighttpd.

In fact, in this area MySQL Cluster, which was always the best-isolated part of the MySQL source, leads the pack - there are a bunch of interfaces for talking to MySQL Cluster, such as PHP session storage or a REST Web Service, which are all rather interesting.

One more interesting development I expect we might see is more active use of remote storage services. Amazon has the S3 service, which deals with files; I bet a similar service could be designed for many data store applications, especially for specific needs, for example when large amounts of data need to be analyzed, which requires a large amount of hardware to offer quick response times, but only for a short time frame.

Anyway, it is hard to predict the future, so it will be fun to watch how things develop.

May 24, 2007

MySQL Geek Job Openings

Posted by peter

The consulting load keeps increasing so we’re looking for some help.

This job would be perfect for someone interested in high performance and scaling with decent knowledge of MySQL and eagerness to learn more.

We do encourage people from all countries to apply.

May 4, 2007

Linux failing to boot screen on the plane

Posted by peter

I’ve seen the Windows Blue Screen of Death, a stalled boot process or simply an application error dialog on many big information screens in shops, airports and other places, as well as in other systems such as cash machines, airport self check-in systems or photo print kiosks.

Today, coming back from the US on a Northwest Airbus A330, we got a stalled Linux boot screen on the main display when we landed. This must be some older kernel, as it was a pseudo-text-mode black screen with a penguin in the top left corner.

It is always fun to learn what is inside these closed systems via the error messages they might throw up. Too bad you can’t see whether MySQL is inside the same way. On the Internet, however, you can frequently learn that smaller sites are powered by MySQL when they print MySQL error messages to the screen in case of problems.

February 14, 2007

Getting use of Slave in MySQL Replication

Posted by peter

MySQL Replication is asynchronous, which causes problems if you would like to use a MySQL slave, as it can contain stale data. It is true the delay is often insignificant, but in times of heavy load, or if you were running some heavy queries on the master which then take time to replicate to the slave, replication lag can be significant. Also, even a very small lag can cause problems - for example you’ve posted a comment on the blog and on the next page reload you do not see it, as the page was read from the slave, which received the update a millisecond later... this is something you would not like to happen.
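
To make the blog-comment example concrete, here is a minimal Python sketch of one common way to avoid it: after a session writes, its reads go to the master for a short window, and only then return to the slave. This is an illustration only and not necessarily one of the techniques from the full post. `ReadRouter`, `max_lag_seconds` and the session ids are hypothetical names, and the 2-second window is an assumed upper bound on replication lag, not a measured one.

```python
import time


class ReadRouter:
    """Toy read router (hypothetical helper, not a MySQL API).

    A session that wrote recently reads from the master; everyone else
    reads from the slave.
    """

    def __init__(self, max_lag_seconds=2.0, clock=time.monotonic):
        # Assumed upper bound on replication lag; tune for a real system.
        self.max_lag_seconds = max_lag_seconds
        self.clock = clock
        self.last_write = {}  # session id -> time of that session's last write

    def record_write(self, session_id):
        """Remember that this session just modified data on the master."""
        self.last_write[session_id] = self.clock()

    def choose_server(self, session_id):
        """Return "master" or "slave" for this session's next read."""
        wrote_at = self.last_write.get(session_id)
        if wrote_at is not None and self.clock() - wrote_at < self.max_lag_seconds:
            return "master"
        return "slave"
```

The window only helps if the real lag stays below it, so a sketch like this trades some master load for fewer surprises rather than removing stale reads completely.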

I’ll list some techniques here which I found to be helpful for offloading load to the slave without causing the application to behave crazily. The same approach can be used in Master-Master replication in Active-Passive mode; just think of the passive node as a slave.
[read more...]

February 11, 2007

Content delivery system design mistakes

Posted by peter

This week I helped deal with performance problems (partly MySQL related and partly related to LAMP in general) of a system which does quite a bit of content delivery, serving file downloads and images - something a lot of web sites need to do these days. There were quite a few design mistakes in this one which I thought worth noting, adding some issues seen in other systems.

Note this list applies to static content distribution; dynamic content has some issues of its own which need different treatment.

DNS TTL Settings: The system was using DNS-based load balancing, using something like img23.domain.com to serve some of the images. I’m not a big fan of purely DNS-based load balancing and HA, but it works if configured well. In this case, however, the problem was a zero TTL set in the DNS configuration. This obviously adds latency, because lookups cannot be cached, especially for “aggregate” pages which may require images to be pulled from 10 different image servers.

Keep Alive: In my previous post I wrote that you often do not need keep alive for dynamic pages (there are exceptions), but you really should have Keep Alive enabled while serving images. Not having it hurts especially if 30 thumbnails are loaded per page.
[read more...]

January 30, 2007

Making MySQL Replication Parallel

Posted by peter

Kevin Burton writes about making MySQL Replication parallel. Many of us have been bitten by the fact that MySQL Replication is single threaded, so in reality it is able to use only a single CPU and a single disk effectively, which is getting worse and worse as computers get “wider” these days with multi-core CPUs.

Kevin proposes executing queries in parallel, and it is generally a good idea; the problem, however, is implementing it right without changing MySQL Replication semantics, which are: the slave database state corresponds to the master database state at a certain point in time. It is delayed, but threads reading from the slave will never see a state of the database which never existed on the master.

As I commented on Kevin’s blog, the problem is very simple to illustrate - assume you have 2 queries modifying 2 different tables, query A and query B. On the master, query A completed first and B followed it. On the slave we execute them in parallel, so query B may complete before query A, causing a database state which never existed on the master. Of course the idea could be to wait at the final commit stage and commit queries A and B in the order defined by the master, but this brings other problems to the plate, such as possible deadlocks between queries if they are complex transactions.
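
The A/B example can be checked with a small Python sketch. It is a toy model, not MySQL code: each query just sets one value in one table, and the table names `t1` and `t2` and all helper names are made up for illustration. It enumerates the states a reader could observe on the master and on a slave that finishes the two queries in either order.

```python
from itertools import permutations


def apply_query(state, query):
    """Return a new state with one table changed by the query."""
    table, value = query
    new_state = dict(state)
    new_state[table] = value
    return new_state


def states_seen(initial, queries_in_completion_order):
    """List every state a reader could observe as queries complete."""
    states = [initial]
    for query in queries_in_completion_order:
        states.append(apply_query(states[-1], query))
    return states


initial = {"t1": 0, "t2": 0}
query_a = ("t1", 1)  # modifies table t1
query_b = ("t2", 1)  # modifies table t2

# On the master, A completed first and B followed it.
master_states = states_seen(initial, [query_a, query_b])

# A parallel slave may complete the two queries in either order.
slave_possible = []
for order in permutations([query_a, query_b]):
    for state in states_seen(initial, order):
        if state not in slave_possible:
            slave_possible.append(state)

never_on_master = [s for s in slave_possible if s not in master_states]
print(never_on_master)  # [{'t1': 0, 't2': 1}]
```

The single extra state is the one where B is visible and A is not, which is exactly the state a reader on the master could never have seen.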

It should not be as bad, however, if we only look at single queries or transactions which do not have any overlap in terms of tables.

For some users the commit order of independent queries may be unimportant, so this restriction could be weakened to only make sure there is a “barrier” between queries which are possibly dependent on each other, such as those reading or writing the same tables.
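
A minimal sketch of the “barrier” idea in Python, under simplifying assumptions: each query is described only by the set of tables it touches (reads and writes are treated alike, which is conservative), and `split_into_batches` is a hypothetical helper, not something MySQL provides. Queries inside one batch share no tables and could run in parallel; a barrier separates consecutive batches.

```python
def split_into_batches(queries):
    """Group queries (in master order) into batches of independent queries.

    queries: list of (name, set_of_table_names).
    A new batch starts as soon as a query touches a table that is already
    used by the current batch.
    """
    batches = []
    current = []
    tables_in_current = set()
    for name, tables in queries:
        if tables_in_current & tables:
            # Possible dependency: close the batch, i.e. place a barrier.
            batches.append(current)
            current = []
            tables_in_current = set()
        current.append(name)
        tables_in_current |= tables
    if current:
        batches.append(current)
    return batches


log = [
    ("A", {"users"}),
    ("B", {"orders"}),
    ("C", {"users", "comments"}),
    ("D", {"orders"}),
    ("E", {"orders"}),
]
batches = split_into_batches(log)
print(batches)  # [['A', 'B'], ['C', 'D'], ['E']]
```

Within a batch the commit order is left open, which is the weakening described above; whether that is acceptable depends on the application.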

There is another possible solution: allow multiple threads inside the server to share the same transactional/lock context. In this case replication could accumulate a number of queries, execute them in parallel and then commit them all at once.

None of these, however, is an easy trick which I would expect to come very soon.

On the other hand, if support for Multi-Master is implemented, for many applications Parallel Replication could be implemented simply by filtering transactions and writing to a number of binary logs.

If you’re “Scaling Out” you may just treat a single server as if it were a few servers, placing several independent pieces on it, for example different databases. Now if you could set up filtering so that updates for each of them are written to their own binary log file, and set up multi-master replication so the slave can read all of them in parallel, you could get replication parallel enough for many applications without serious code complications.
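
The filtering step can be sketched in Python as follows. This models the idea only and does not use any real MySQL option or binary log format: `split_by_database`, the database names and the statements are all hypothetical. Each database gets its own ordered stream, which a slave could then apply independently of the others.

```python
from collections import defaultdict


def split_by_database(events):
    """Split one ordered event stream into one ordered stream per database.

    events: list of (database_name, statement) in master order.
    Order is preserved inside each database; order between different
    databases is deliberately given up.
    """
    logs = defaultdict(list)
    for database, statement in events:
        logs[database].append(statement)
    return dict(logs)


events = [
    ("shop", "UPDATE orders SET paid = 1 WHERE id = 7"),
    ("blog", "INSERT INTO comments (post_id, body) VALUES (3, 'hi')"),
    ("shop", "DELETE FROM carts WHERE id = 9"),
]
per_db_logs = split_by_database(events)
```

This only stays correct if the pieces really are independent, for example no statement touches two databases at once.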

If MySQL does not implement it, it might be a nice feature to hack into the community tree.

October 26, 2006

Speaking on OpenSource Database Conference, Frankfurt

Posted by peter

I’ll have two sessions at the upcoming OpenSource Database Conference in Frankfurt, 6-8 November. One session will be a general MySQL Performance Optimization workshop; the other will focus on Innodb architecture and optimization.

If you’re visiting this event, or the International PHP Conference which runs in parallel with it, drop me a note and we can chat.

You also might have noticed I was not posting too actively in October - it turned out to be a very busy month, but hopefully I’ll get some more time soon :)

September 22, 2006

EuroOSCON 2006 - High Performance FullText Search

Posted by peter

I’m now back from EuroOSCON 2006, which was the reason I was not posting for a while. Pretty interesting event, even though it looks like it is getting less geeky compared to the OSCON in the US I visited two years ago - a lot of presentations have now shifted to philosophical, political and business issues. I do not know, however; maybe this is just a Europe thing.

I gave a talk on High Performance FullText Search for Database Content, which is now available for download from the MySQL Performance Presentations page.

[read more...]