June 12, 2007

RAID and Scale Out Discussions

Posted by peter

Just found this wonderful summary of articles by Jeremy and wanted to give some of my thoughts on the topic.

First lets speak about death of the RAID. I think this is far from the case especially if you consider Software RAID here.

For many workloads you would like to get RAID just for the sake of BBU. As Jeremy mentioned RAID is cheap these days if you buy right one and can offer substantial improvement for write intense workload by safe write buffering and write merging.

Performance is another story. RAID is usually easiest way to get extra performance from your IO subsystem. Spreading the database among say 10 commodity boxes is often expensive for existing applications, even for new applications it will affect development time and complexity and it is well possible it might be no guys inside the company skilled enough to work with distributed systems.

I think to many considerations are being made thinking about commodity hardware and really smart people, while in most cases tasks have to be solved with commodity hardware and commodity people.

RAID can be indeed slower than direct disks. For example for BoardReader now indexing almost billion of forum posts we decided to go ahead with raw drives for search index because it is far faster than using software RAID. But this is only because Sphinx allows using highly parallel architectures which make it possible. The boxes we use indeed even have RAID available which is used for OS hard drive.

Indeed using RAID for OS partition (software) is what we use even for low end boxes. This has zero cost and offer redundancy in case of hard drive failure.

Now regarding infrastructure. When being called for Consulting I often ask how large system is needs to growth and how much effort people would like to spend on the infrastructure because this defines a lot what type of hardware and in which fashion can be used. You can build infrastructure which would use crappy boxes as redundant array of inexpensive servers, basically moving redundancy one way up from the disk. It however would be rather expensive and can be only justified with extreme hardware needs. In fact even Google as far as I remember uses decent servers with RAID for their MySQL installations.

I surely agree with Jeremy on Commodity does not mean Crapy thing. There is usually a sweet spot when you can get well performing hardware with good price/performance. We use a lot of Dell PowerEdge 2950 with 6 hard drives and 16GB of RAM.
With other vendors I see people using 2*2 Opterons with 32GB of RAM and 10-14 hard drives. Both solutions have comparable price/performance, though former depends on your application quite a lot.

I think MySQL is pretty well supported by matching hardware these days. As it does not use many CPUs and many hard drives very efficiently in many cases so going much higher does not buy you much anyway.

Not all the needs are satisfied that well by vendors. Dell for example for some reason failed to provide cheap 32GB of memory option even with their new Opteron servers. They also do not provide high volume internal storage option. Ie find me 12 or more hot swap SATA hard drives in the case in Dell range, while you can by appropriate cases from Supermicro and a lot of other vendors. I think this is where technical possibilities clash with sales needs - there is more revenue up-selling to external directly attach storage or SAN

MySQL Stored Procedures problems and use practices

Posted by peter

To be honest I’m not a big fan of Stored Procedures, At least not in the form they are currently implemented in MySQL 5.0

Only SQL as a Language Which is ancient ugly for algorithmic programming and slow. It is also forces you to use a lot of foreign constructs to “original” MySQL style - to process data via cursors, handle error via Handlers etc. If you spent last 10 years writing Stored Procedures for Oracle or DB2 it may be cool and convenient for you, but not for me :)

Lack of Debugging I like to be able to debug software, if not full blown debugger I’d like to have things like echo and var_dump. Due to the context of execution these are not easy though. Of course you can code a little helper Debug Storage Procedure which will log some information in MySQL table but it is not convenient enough.

Bad Parser Error Messages MySQL Parser is in general far from perfect when it comes to error handling. “You have an error in your SQL syntax … near ” at line 1″ is not very helpful even when dealing with large queries but for Stored Procedures that is the real issue. So you have to overcome few road blocks even before you start fighting with debugging.

No Profiling tools If you’re interested in Performance you need a way to profile what inside stored procedures is taking the time, which is not something readily available. Even if you look at most simple and typically most time consuming part of Stored Procedure execution - running of SQL queries - you do not get these logged, instead slow query log will contain full stored procedure calls.

On the other hand Stored Procedures Indeed can help to Improve MySQL Performance. For DBT2 benchmarks we’ve tried a while back MySQL 5.0 was about 10% slower than 4.1 without stored procedures but was 20% faster if Stored Procedures are used. And this is of course not the limit. You may also have other reasons to use Stored Procedures besides performance and these can be valid.

What I tend to do if Stored Procedures are helpful for Performance reasons is to have two code versions, one using stored procedures and other doing same thing using direct statements. This allows to debug and profile most of the things comfortably and works pretty well especially if you keep your stored procedures simple so you do not add much bugs converting code from your language of choice to SQL.

There is one more thing you should beware with Stored Procedures is to put a lot of computational load in them. Not only the language is slow at it but you’re also loading CPU on your Database Server, which is typically more expensive to scale than Application/Web Servers. It is OK however to do some simple math if it can help you to avoid sending large result set back to the client as in this case server may need to do more work to send it back than Stored Procedure to process it.

Stored Procedures also have number of performance gotchas which I’ve been running into production but have not taken a time to research into details. So just beware.

I also hope other time MySQL and third parties will develop tools and extensions to target many of the problems mentioned.


This page was found by: how to debug slow ra...