July 30, 2014

MySQL Sizing questions

I frequently get questions along the lines of “how many transactions per second MySQL can do” or “how many servers I need to handle 100.000 users” or “which hardware would be enough to handle my 40GB” database.

There are two things which are very interesting about these questions which make them funny and annoying at the same time

1) People asking these questions demand the answer. It could be internal Sales and Marketing people or even customers saying they typically get the answers from other vendors.

2) The best answer to this question will not be a number but something like “it depends”. Seriously you need much more information than given in this example to give estimate which is correct at least to the order of magnitude.

Seriously in MySQL Consulting practice it would not be exception to have 1-2 orders of magnitude performance increase just by proper tuning – this means even application implementation details may give huge difference, not to mention such aspecs as data details, load distribution, what exactly application is doing and a lot of these little substle variables which add up to completely different picture.

Because of so huge use case differences it hard even to give sizing recomendations even if application is defined. For example if you would ask me what hardware you need for ecommerce web site with 100.000 visitors per day I still would not have direct answers. This is because there are still a lot of seriously impacting variables. For example are you selling 100 items or millions of items as amazon.com. What search services are you going to offer ? Is there any extra services such as live advicing of items based on your previous purchase history etc. To illustrate it better – take a look at Dell DVD Store Whitepaper This paper is using “real world” kind of application benchmark which emulates internet shop activit. According to this paper you can get some 2000 orders/min from single box (including search etc). Actually with proper optimization you can even get it to 5000 orders/min from the same single server. Even with 10$ per average order it means $26 billions per year. This is more than Amazon.com does. So would they be able to run their whole orders from single MySQL server. Of course not, because their application is a lot different even though it is doing about the same thing.

So why people are asking questions which have no answers, and why are they demanding answers for them ? I guess there are two reasons:

1) Even if it does not make it is used by Sales and Marketing forces in many companies because it looks good. Also Big bosses (frequenly barely technical) need estimates for their plans and they would rather take wrong estimates than no estimates. Now than these estimates turn out to be wrong they can blame software vendor for providing these, while vendor typically would not accept any responsibilty for estimates being wrong. On the contrary – underestimates would turn extra license sales, while overestimates would typically mean more than needed licenses and hardware is already purchased.

2) There are certain cases where quick estimates actually make sense. This typically applies to partcular end user applications with with very particular usage scenario – quite typical in the enterprise, telecom words etc. For example purchasing retail system you can count number of sellers you have, number of items plus few more variables and very well estimate hardware requirements. These would of course apply to particular system in particular configuration but it will be rather accurate. Many people however would assume it is the same for database server handling custom application – no it is not.

So am I saying we do not need any sizing at all and should just pray we purchased enough hardware ? Of course note. We need estimates, however we need to spend time doing serious capacity planning, probably look at real data try tunning sample queries etc. This should be serious effort if you want to get usable results. Trying to guess from just couple of numbers could be extremely dangerous as it can be very wrong. You may have good intuition and you may guess right but it still should be treated as a guess, making sure there is not catastrophy if you are wrong.

I should write about some techniques and formulas which you can use for capacity planning in one of the next articles.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. People, who are looking for a budgetary sizing estimate for MySQL can do that on http://www.sizinglounge.com . It delivers results for servers from hp, dell, sun and ibm.

  2. peter says:

    Just visited sizinglounge web size and it seems to be just what I wrote in my article. There are some numbers published but there is no information what meet stays behind these numbers. 100 users or 1000 users is meaningless numbers before workload is defined (and application is known) – The same server handling 1000 users for one application might have hard time handling 10 for other one.

  3. May be,

    No of users connecting to server at a given time to pull and push said amount of data using system resources will give a fair idea about proposed capacity planning.

    Also, system resource loads and MySQL loads, will both have to go hand in hand in generating such formulas.

    This is just to begin at…

  4. Hi Peter
    We have a cluster to our site. We are a NewsPaper and we have 6K of visits daily. This cluster include:
    2 Web Servers with
    Port:100Mbps, i7 QuadCore- 2.93 Ghz H/T, 16GB of RAM, 2 HD of 1TB each one, and 10 TB of Bandwith. This servers active-active controlled by loadbalancing HA proxy. PHP installed. Apache Server.

    2 Database server with the same configuration, but active-pasive controlled by load balancing using HT proxy. MySQL installed.

    2 load balancing servers Celeron 2.4 Ghz with 1GB of RAM. (each one)

    Our cluster solution include a 20 TB of Bandwith and 100Mbps of uplink.

    Our Database can increase in records 5000 by month, because we are a newspaper.

    Question: Do You consider that our server configuration is properly for the number of clients and the database?

    We are currently experiencing slowness and timeout on our website http://www.panorama.com.ve and we need a different opinion about it.

    Thanks and greetings from Venezuela.

Speak Your Mind

*