I think every person responsible for Development or Operations of growing application sooner or later have to decide on couple few questions on how to tackle application performance. These questions are:

  • Should we Optimize Application or get more Hardware ?
  • Should we do things ourselves or hire an experts to help us ?

The answer on these questions actually depend on a lot of things, some of which we’ll try to cover here.

The things which usually define best solution are economics , resources and risks .

First lets talk about optimizing application vs getting better hardware.

With application optimization typically you have rather interesting relationship between “effort” and “result” typical application would have number of low hanging fruits which are easy to fix – adding couple of proper indexes or tuning couple of MySQL settings.

When low hanging fruits are fixed you end up with either fine tuning which gives little improvement or more application changes such as building summary tables, significant schema changes, use of replication, caching, using memcached, sphinx, implementing proper web caching etc. These changes are not only require more time to do but they also require proper experience to implement well and add has higher risks – adding index you rarely would add bugs to your application while redesigning application schema or implementing caching are much more bugs prone. Another risk is implementing such solution in time before your application collapsing.

When it comes even more serious application architecture changes such as sharding which unlike of things like caching often has to be done at once rather than incrementally which becomes even more expensive, requires even higher experience to do well and poses higher risks.

On the Hardware side things are not so trivial as well. First there is Scale-Up of using more powerful server and there is Scale-Out when you can get use of more servers efficiently.

If you Scale-Up you by getting more powerful MySQL Server hardware the cost is going to be increasing dramatically after certain hardware size. For example right now you can get 8 core x86-64 box with 32G with good price/performance value while if you would like to go significantly higher like 32 cores and 128G of RAM it is going to be dis proportionally more expensive.

There are also however resources and risks issues – fewer people have experience dealing with quite high end hardware plus it is often more risky if it breaks because immediate replacement may not be available and keeping unused spare box can be expensive. Because fewer users use these boxes you also likely to have more bugs on all levels starting from Hardware and going down to MySQL.

Then now even if it all works perfectly you should think if your assessment of MySQL scaling abilities is the right one. It well may be you will not be able to scale efficiently past certain hardware configuration – some of these issues may be MySQL related and others (like excessive row level locks) may be caused by application design.

Another thing to consider – even if you have unlimited money to spend there are still limits of how powerful hardware you can get on the market, especially if you consider some workloads may have problems with latency rather than throughput. If single user
CPU bound query takes 30 seconds you unlikely will be able to solve the problem by CPU upgrade. For single thread workload the most expensive x86-64 CPU you can buy would be unlikely more than 2-3 times faster than the one in laptop I’m using right now.

So what it all means ?

This means it often good idea to pick up low hanging fruit straight away – before you look at upgrade the first time, because it even can take less time than moving the application. On reasonably tuned applications when further tunings are expensive you may well upgrade hardware instead but as you grow further you’ve got to think about application again.

Of course this assumes your application starts from the ground and grows gradually. There are cases when application starts from quite serious load to start with – like if you work for Yahoo and develop the new “Applet” which will be featured on the front page you quite likely start with scaled out and reasonably optimized phase.

Let us think about some Social Network applications – these tend to grow fast if they are lucky.

Initially it runs on some dedicated (or even virtual) rented server which would usually have 1-2GB of memory may be couple of SATA hard drives – something one would rent for $100-200 a month, personally funding projects people rarely like to waste more money until projects takes off.

As it gets slow one can spend say $1000 or personal time optimizing it and get 10x capacity compared to naive implementation.

At the same time one would likely get couple of boxes to spread Web and Database instances and establish may be establish replication mainly to be able to fail over if there are problems.

Over time application still will be limited by MySQL server so the alternative would be either spend $5000 on implementing caching to upload it or to get some higher end box with few cores 16-32G of RAMs and good IO subsystem which would cost $5-10K or some $500 a month if you want to lease it. Of course because of replication you would want at least two.

Implementing memcache may look like best idea from economic standpoint until you think about resources and risks – hardware upgrade is straightforward and can be done by operations and data center stuff so developers can keep developing new features which users are screaming about instead of this boring backend stuff and when hunt for bugs.

After this step you however would need to look at application – as you can get server much higher and you need to scale 10x within next 6 months by your conservative growth forecast. So you go ahead and implement memcache, sharding, replication or other techniques which allow you to scale out to the several boxes.

If you do this step properly you may end up in interesting situation – you can put 10, 20, 40 shards and scale almost linearly. Does it mean you should not look at application performance and just grow by getting more hardware ?

Not really. Even though you may have solved your survival problem it does not mean you’re running things cost effective way and getting maximum out of hardware you have. Plus there could be nice surprises such as data center power or space limits coming your way.

Assuming server cost of $10K per server and operating cost of $200 per server (space/power/admin overhead) if you have just one server optimizing your application very modest 10% does not give you much benefit – you would just need another server a bit later. However if you have 100 MySQL servers to deal with even modest 10% can save you $100K in hardware cost and $2000 a month in operating costs.

The point is the more large scale application you have the more time and effort you should spend on optimization. Same often applies to HA as higher scale applications tend to have higher cost of downtime.

So now what is about using your Own Resources vs Hiring Consultants ?

Again we should think about economics resources and risks.

In the early stage it often makes sense to do a lot of things yourself. Learning to understand the basics allow you to design the application more efficient way and you probably do not want to need consultants to review every simple query you design. Getting low hanging fruit is also not a rocket science – adding couple of extra indexes is not that hard to do.

At the same time it may make sense to attract experts to review what you’re doing to make sure you actually picked up all these low hanging fruits you could and what you’re not doing some silly mistakes with application architecture or table structure which may cost you much more to fix at later stage.

As time goes many project find them in the situation when they have hard time allocating resources to get application optimized properly – they need to move fast rolling out new stuff especially in high pace markets. Besides lack of resources it attracting experts to help you well may be less risky – hopefully they have already dealt with similar situations and know what works well which reduces trial and error and general risk of going wrong way. Plus because they know the stuff they often need to spend significantly less time on the same task which makes it cheaper than looking internal resources in the end.

As project moves to the higher end another variable comes into the play – even if your local resources have created the system which performing well getting experts which may have ideas how to optimize it another 10-15% may be a great idea. On higher end I’d even look to speak to few of them as optimization is as art as it is science and each expert may his own ideas which may or may not work well.

Reading this you may think I’m promoting our Services. Indeed I am. I truly think our services is a good match for a lot of companies and is cost efficient and low risk way to tackle these problems, and even though I just spoke about scaling and performance optimization it goes far beyond this.

Hiring us is of course not the only option. There are variety of companies on he market offering services around MySQL and LAMP and MySQL is one of them. Plus for large companies it may make sense to hire great experts and have them to do inhouse consulting helping various projects to get things right. Both Google and Yahoo for example have great experts in MySQL and Scaling in general.

6 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Or you can hire me 🙂

Geoff McQueen

Awesome post. There’s nothing wrong with a sales pitch when it comes with so much awesome free advice, and when the blog it accompanies comes with so much excellent content. Thanks for taking the time on this and all the other posts…

T-Enterprise

Excellent Article – Saved and forwarded. Cheers!

Wes Mahler

I look forward to the time we get so big, so massive, that we need experts such as yourself to scale a big operation!

Joe

I’m at exactly the stage you’ve mentioned in this entry. I have 3 servers – one master (used for admin too) and 2 replicated servers used for the live site. They are load-balanced too using a dedicated software load-balancer. Currently all three have 8GB RAM. In the schema I use caching tables that are aggregated with several joins and are updated nightly. These are used for searching, etc. and help me having to do joins in real-time. However, the indexes on these tables are big. The indexes for all my tables are 9GB in total. Often I get slow queries even when using these tables in joins. I’ve also implemented memcached and it’s really good once queries are cached, but the first query is always slow.

How do I know if more RAM will help? What is the relationship between index size and RAM size, and is this on a per query basis, or over time? How do I know if faster processors will help? My DB could probably be more optimised along the way, but testing with such large tables is not easy 🙂

Can anyone offer some advice to some of these basic questions?

Thanks