April 24, 2014

When to Use a Hardware Upgrade Instead of Software Optimization

One typical question that frequently pops up is whether it is better to upgrade the hardware or keep optimizing the software. I have already written about it, for example here.

Today I’ll look at the same topic from the consultant’s view: when a consultant should suggest a hardware upgrade and when not, in a simple checklist form.

How good is the hardware? Sometimes people run on hardware so poor that it would be much cheaper for them to upgrade before purchasing a lot of professional services. In other cases, though, people want their system to be as optimized as possible, so they prefer to keep running on an old box even if it costs them more to tune it. That can be a valid choice: it keeps a hardware boost in reserve for later, when you need a major performance increase and do not have time for big application changes.

Is current hardware usage balanced? Say you have 10 servers: one of them is overloaded and the rest are almost idle. In this case proper balancing is the answer rather than buying more hardware. Of course, not every application can be balanced easily, but it is also surprising how many people with a reasonably sharded or replicated architecture suffer from poor balance. Balancing can be a simple operational act or require application changes, which is another variable.
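A quick, hedged way to check balance (nothing here is specific to any particular setup; the counters are standard MySQL status variables) is to sample the query counter on each server over the same interval and compare the rates:

```sql
-- Run on each server in the pool twice, about 60 seconds apart.
-- The change in Questions between the two samples gives queries/second for
-- that server; wildly different rates across servers point to a balance
-- problem rather than a capacity problem.
SHOW GLOBAL STATUS LIKE 'Questions';
SHOW GLOBAL STATUS LIKE 'Threads_running';
```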

Are there any spikes in hardware use? Often performance problems happen only sometimes; for example, nightly when you run a backup, or at 15 minutes past the hour when some cron job kicks in. In such cases, evening out the usage is often a better choice than a hardware upgrade, and it can usually be done easily.
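One simple way to locate such spikes (a sketch that assumes the slow query log is written to a table with log_output=TABLE; otherwise a tool such as pt-query-digest over the log file gives a similar picture) is to aggregate slow queries by hour of day:

```sql
-- Count slow queries and total slow-query time per hour of day to spot
-- backup windows, cron jobs and other periodic spikes.
SELECT HOUR(start_time)             AS hour_of_day,
       COUNT(*)                     AS slow_queries,
       SUM(TIME_TO_SEC(query_time)) AS total_query_seconds
FROM mysql.slow_log
GROUP BY HOUR(start_time)
ORDER BY hour_of_day;
```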

Are MySQL, the queries, and the architecture optimized well enough? Do not advise a hardware upgrade as the first step when you can triple performance with a simple my.cnf change. “Well enough” is a tricky term, though. You need to balance things and see what can easily be done by the consultant or the customer and what cannot. Adding indexes is easy. Changing a query in your own application is easy, but it is hard in a third-party application, especially if it is closed source. Big schema changes, caching, and sharding can be even more complicated – it depends. The bigger your application is going to be, the more optimal you want to be at the application level to stay efficient.
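A minimal sketch of the “easy win” category (the orders table and customer_id column here are made-up placeholders, not from any real schema): confirm the full table scan with EXPLAIN, then add the index.

```sql
-- Hypothetical table/column names, used only for illustration.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- type: ALL with a large rows estimate means a full table scan.

ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- type: ref on idx_customer_id means the scan is gone.
-- On the my.cnf side, the classic one-line win is sizing
-- innodb_buffer_pool_size to hold the working set.
```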

What exactly is your goal with the hardware upgrade? As with software optimizations, the changes you make should have a goal. You add an index to avoid a full table scan and make a given query faster. You add memory to avoid disk IO and make lookups faster. The goals at the application level (better throughput, faster queries) should lead you to goals for the hardware (faster random IO, better caching, faster execution), which in turn tell you what needs to be done (getting SSDs, upgrading memory, upgrading CPUs). There are a number of balancing questions to solve as well, such as the fact that with SSDs you may not need such good in-memory caching any more. There is no such thing as better hardware really, just hardware which serves the application’s needs better. Many of us probably remember how moving from 4-core to 8-core systems reduced performance for many MySQL installations even though the hardware itself was superior.
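As a hedged starting point for turning an application goal (“fewer slow lookups”) into a hardware goal (“more memory vs. faster disks”), it helps to see how often InnoDB actually has to touch the disk at all:

```sql
-- If Innodb_buffer_pool_reads (requests that missed the buffer pool and went
-- to disk) is tiny compared to Innodb_buffer_pool_read_requests, the working
-- set already fits in memory and more RAM will not help much; if it is a
-- large fraction, more memory or faster (SSD) storage is the likely win.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```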

There are surely more things you can look at, but these are simple, obvious points to keep in mind. Do you think I should add something else?

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Pat Casey says:

    What are folks deploying for large installs these days?

    To give the background here, we deploy a LOT of mysql running a not entirely predictable workload (we allow freeform query in our app). We started out buying lots and lots of what my operations staff calls CABs (Cheap Ass Boxes). We did some analysis though and demonstrated that we were “wasting” a lot of those boxes.

    We’d have 4 or 5 boxes, for example, with CPU loading under 20% and healthy looking memory, but we couldn’t collapse them onto one box because if we did that the IO subsystem would lock up and performance would be terrible as soon as anybody did a table scan.

    It turned out it was cheaper for us to invest in some high end storage (a NetApp SAN) and replace 4 CABs with one mid-tier blade with a lot of memory.

    Net effect is we have roughly the same cost structure, the same or better performance, and 1/4 the number of boxes under management.

    Has anybody else tried to solve the same problem on a large scale? I know that investing a mess of money in a SAN isn’t the mysql “way”, but I’m kind of curious as to how other folks are solving the IO problem (which admittedly isn’t mysql specific).

  2. peter says:

    Pat,

    What kind of storage did you have on these boxes?

    My general experience is that SAN/NetApp is a poor solution for MySQL Server when it comes to performance. It can make things easier for you to manage, or deal with cases when it is just hard to get that much local storage.

    IO system starvation can certainly be the problem, but good directly attached storage is often a good answer for that.

    Think about caching too. Collapsing 4 boxes into one, you need 4x the memory to maintain the same cache hit ratio, and so keep the IO subsystem demands at just 4x.

    The question of balance is also always a good one. I often see people reading the word “commodity” as “cheap crap”, so they end up buying very cheap boxes for databases – little memory, no good RAID, poor hardware components – and then lose a lot of time managing these.

  3. Pat Casey says:

    The old CAB boxes just had paired (RAID 0) SCSI drives on them (as I said, they were cheap).

    We tested pretty extensively with RAID 5 arrays (5 disks in the array if I recall), but we could still max out the IO if there was a spike of some sort, plus with that kind of array attached, the boxes weren’t all that cheap anyway.

    You’re definitely right about the memory though; we’re collapsing four 8G boxes into one 32G box, so there’s a non-linearity in cost there.

    The other nice things we liked about the SAN were:

    1) Snapshotting was very fast and efficient
    2) If I lost a blade, I could just mount the LUN on the hot spare and be up again in a matter of minutes

    Again though, those are advantages we paid for (san != cheap).

  4. peter says:

    I see, and which IO system did you try to consolidate it to?

    Regarding your two items – as you can see, they do not have much to do with performance but rather with how you have your operations and high availability set up.
    I dislike SANs mainly because of their cost. For most high-volume applications the difference is just too high.

    In reality, if you look at the physics, it all comes down to the number of disks you have, the amount of cache you have, etc. The “smart software” can sometimes optimize things a bit, but you should not expect any magic, in particular free magic – many optimizations are helpful for some workloads but bad for others.

    Regarding snapshotting, LVM works pretty well if it is set up properly. You have to plan some IO capacity for it, but it is still typically more efficient.

    About losing the server… I prefer to design systems so there is never a dependency on a single copy of the data. You can get tables corrupted because of MySQL bugs (or filesystem, hardware, etc.) – so I would rather keep a slave or DRBD partition on another node which can be used for failover.

    There are a lot of choices and conditions, though; among them, some people just love SANs, so purchasing one may make sense at the very least to keep them happy :)

  5. Your point about “Is MySQL, Queries, Architecture optimized well enough?” is well taken. I’ve been bumping up against this situation with a high-traffic Magento store of late. Even after tuning, the site just doesn’t perform adequately, and the kinds of changes that are required really must be undertaken by Varien, the developer of Magento. In the interim, I can only recommend a faster server to my client and hope that he accepts my suggestion that a more distributed architecture (split database/web server) is the only practical solution for the short term.

    This is ordinarily the LAST thing I would recommend, because I tend to think there’s always more performance that can be squeezed out of most environments – but in this case the only sane solution is to apply more hardware.
