Modeling MySQL Capacity by Measuring Resource Consumptions

There are many angles you can look at the system to predict in performance, the model baron has published for example is good for measuring scalability of the system as concurrency growths. In many cases however we’re facing a need to answer a question how much load a given system can handle when load is low and we might not be able to perform reliable benchmark.

Before I get into further details I’d like to look at basics – what resources are really needed to provide resource for given query ? It surely needs CPU cycles, it may need disk IO. You may also need other resources such as network IO or memory to store temporary table, but let us ignore them for a moment. The amount of resources system has will place a limit on amount of queries system can ran, for example if we have query which requires 1 CPU second and 1 IO to execute and we have 16 core system with hard drive which can do 100 IOPS we will be consuming all CPU when we’re running 16 seconds per second.

Of course no system scale perfectly, so you would unlikely be able to get 16 queries per second on such system. There are internal scaling aspect of the system which include both latching as well as inevitable application specific scalability restrictions, such as row level locks. There is also a load aspect – “random arrivals” tell us the number of work system has to do will vary significantly all the time. Baron’s model deals with some of these pretty well and for sake of this discussion we will diminish it to workload and hardware specific constant. For example we can use factor of 0.7 and state we can
safely run 0.7*16 ~= 12 queries per second. For some ideas about what factor may make sense for your system check out Thinking Clearly About Performance by Cary Millsap.

So how we can use this data to estimate capacity of MySQL system ? We can look at CPU and IO consumption per Query and compare it to estimated (or benchmarked) system performance to provide our estimates.

If we’re running Innodb with MySQL we can use Innodb_data_reads , Innodb_data_writes, Innodb_os_log_fsyncs for your disk IO estimation. When you can divide it per number of “Questions” or “Com_select” to get amount of IO per query or per Select. It is good to check it over certain intervals – some workloads will have this as a very stable value for others it might go back and forth a lot.

How to get CPU consumption per query ? You can take a look at procfs for MySQL process:

root@ubuntu:/var/log/mysql# cat /proc/19018/stat
19018 (mysqld) S 1 19018 19018 0 -1 4202752 198731 0 0 0 347 5303 0 0 20 0 20 0 75673117 472850432 12361 18446744073709551615 4194304 11903564 140737335329904 140737335328304 139790070763411 0 552967 4096 26345 18446744073709551615 0 0 17 0 0 0 0 0 0

1 2	root@ubuntu:/var/log/mysql# cat /proc/19018/stat 19018 (mysqld) S 1 19018 19018 0 -1 4202752 198731 0 0 0 347 5303 0 0 20 0 20 0 75673117 472850432 12361 18446744073709551615 4194304 11903564 140737335329904 140737335328304 139790070763411 0 552967 4096 26345 18446744073709551615 0 0 17 0 0 0 0 0 0

The #14 and #15 here is kernel and user CPU usage of MySQL process in 1/100 of the second. (This is pretty idle test system). So 347 and 5303 correspond to 3.47 seconds of user time and 53.03 system time consumed by the process. Collecting these at regular intervals and correlating to number of queries running will give average CPU usage per query.

If you’re running Percona Server you can get the value from User Statistics

*************************** 1. row ***************************
                  USER:user
     TOTAL_CONNECTIONS: 1
CONCURRENT_CONNECTIONS: 0
        CONNECTED_TIME: 800
             BUSY_TIME: 775
              CPU_TIME: 49
        BYTES_RECEIVED: 21847267
            BYTES_SENT: 336986112
  BINLOG_BYTES_WRITTEN: 0
          ROWS_FETCHED: 485139
          ROWS_UPDATED: 0
       TABLE_ROWS_READ: 610954
       SELECT_COMMANDS: 181243
       UPDATE_COMMANDS: 0
        OTHER_COMMANDS: 0
   COMMIT_TRANSACTIONS: 181243
 ROLLBACK_TRANSACTIONS: 0
    DENIED_CONNECTIONS: 0
      LOST_CONNECTIONS: 0
         ACCESS_DENIED: 0
         EMPTY_QUERIES: 13099

*************************** 1. row ***************************

USER:user

TOTAL_CONNECTIONS: 1

CONCURRENT_CONNECTIONS: 0

CONNECTED_TIME: 800

BUSY_TIME: 775

CPU_TIME: 49

BYTES_RECEIVED: 21847267

BYTES_SENT: 336986112

BINLOG_BYTES_WRITTEN: 0

ROWS_FETCHED: 485139

ROWS_UPDATED: 0

TABLE_ROWS_READ: 610954

SELECT_COMMANDS: 181243

UPDATE_COMMANDS: 0

OTHER_COMMANDS: 0

COMMIT_TRANSACTIONS: 181243

ROLLBACK_TRANSACTIONS: 0

DENIED_CONNECTIONS: 0

LOST_CONNECTIONS: 0

ACCESS_DENIED: 0

EMPTY_QUERIES: 13099

In this case I can see this user took 49 CPU seconds per 181243 select queries which is about 270us per select query. We can also get “BUSY TIME” here, subtracting CPU time from it we get “Wait Time”
which is in this case 775-49=726 seconds or 4005us per select. Wait time is often IO (which you can see separately through number of IOPS) but it also can be row level locks, etc. The ratio between Wait Time and CPU time is very helpful to see how “wait free your system is”. If it system is already have low wait ratio increasing amount of memory for example is unlikely to help.

One helpful way to use this information is to compare systems with different memory amount having same workload. You will often see increasing amount of memory not only helps you to reduce wait time and number of IOs per query as well as increase CPU time spent. This is because IO handling requires significant number of CPU.

With Percona Server, enabled full query logging and log_slow_verbosity=full you can also get great amount of related data from mk-query-digest report:

# Overall: 1.79M total, 115 unique, 0 QPS, 0x concurrency ________________
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time          8338s     1us    284s     5ms    13ms   298ms   185us
# Lock time            71s       0     3ms    39us    54us    28us    35us
# Rows sent          7.15M       0  56.36k    4.19    0.99  360.95    0.99
# Rows examine      33.77M       0   2.21M   19.81    0.99   3.53k    0.99
# Rows affecte           0       0       0       0       0       0       0
# Rows read          5.25M       0 507.86k    3.08    1.96  469.31    0.99
# Bytes sent         8.77G      11  55.41M   5.15k   3.88k 258.55k   1.46k
# Merge passes           0       0       0       0       0       0       0
# Tmp tables        21.36k       0       2    0.01       0    0.11       0
# Tmp disk tbl      21.21k       0       2    0.01       0    0.11       0
# Tmp tbl size     554.04M       0  12.83M  324.99       0  32.65k       0
# Query size       294.90M      14     783  158.14  621.67  149.04  107.34
# InnoDB:
# IO r bytes        14.81G       0   1.10G   8.69k  15.96k 910.44k       0
# IO r ops         947.67k       0  70.34k    0.54    0.99   56.64       0
# IO r wait          7127s       0    266s     4ms    12ms   238ms       0
# pages distin      17.51M       1  44.28k   10.27    9.83  250.90    7.70
# queue wait             0       0       0       0       0       0       0
# rec lock wai           0       0       0       0       0       0       0
# Boolean:
# Filesort       0% yes,  99% no
# Full scan      0% yes,  99% no
# Tmp table      1% yes,  98% no
# Tmp table on   1% yes,  98% no

# Overall: 1.79M total, 115 unique, 0 QPS, 0x concurrency ________________

# Attribute total min max avg 95% stddev median

# ============ ======= ======= ======= ======= ======= ======= =======

# Exec time 8338s 1us 284s 5ms 13ms 298ms 185us

# Lock time 71s 0 3ms 39us 54us 28us 35us

# Rows sent 7.15M 0 56.36k 4.19 0.99 360.95 0.99

# Rows examine 33.77M 0 2.21M 19.81 0.99 3.53k 0.99

# Rows affecte 0 0 0 0 0 0 0

# Rows read 5.25M 0 507.86k 3.08 1.96 469.31 0.99

# Bytes sent 8.77G 11 55.41M 5.15k 3.88k 258.55k 1.46k

# Merge passes 0 0 0 0 0 0 0

# Tmp tables 21.36k 0 2 0.01 0 0.11 0

# Tmp disk tbl 21.21k 0 2 0.01 0 0.11 0

# Tmp tbl size 554.04M 0 12.83M 324.99 0 32.65k 0

# Query size 294.90M 14 783 158.14 621.67 149.04 107.34

# InnoDB:

# IO r bytes 14.81G 0 1.10G 8.69k 15.96k 910.44k 0

# IO r ops 947.67k 0 70.34k 0.54 0.99 56.64 0

# IO r wait 7127s 0 266s 4ms 12ms 238ms 0

# pages distin 17.51M 1 44.28k 10.27 9.83 250.90 7.70

# queue wait 0 0 0 0 0 0 0

# rec lock wai 0 0 0 0 0 0 0

# Boolean:

# Filesort 0% yes, 99% no

# Full scan 0% yes, 99% no

# Tmp table 1% yes, 98% no

# Tmp table on 1% yes, 98% no

In this case I can see there is average time per query is 5ms; it is requiring in average 0.54 read operation per second which takes 4ms which adds up pretty well. We can also see what the query in average examines 20 rows, which means about 1 IO per 40 rows… which amounts to pretty IO bound load for me.

But average is only average. It is a lot more interesting to look at Per-query information from mk-query digest (I omit queries text for client privacy)

# Query 1: 0 QPS, 0x concurrency, ID 0x382A5F3785EB3CEE at byte 114085880
# Scores: Apdex = 1.00 [1.0], V/M = 0.02
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         98 1753697
# Exec time     52   4337s    71us   466ms     2ms    12ms     7ms   176us
# Lock time     96     69s    17us     3ms    39us    49us    27us    33us
# Rows sent     21   1.57M       0       1    0.94    0.99    0.24    0.99
# Rows examine   4   1.57M       0       1    0.94    0.99    0.24    0.99
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read     32   1.70M       0       3    1.02    1.96    0.33    0.99
# Bytes sent    32   2.88G      80 127.65k   1.72k   3.88k   1.23k   1.46k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     0       0       0       0       0       0       0       0
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   0       0       0       0       0       0       0       0
# Query size    61 180.07M      95     142  107.67  107.34    3.05  107.34
# InnoDB:
# IO r bytes    50   7.50G       0 128.00k   4.49k  15.96k   8.37k       0
# IO r ops      50 480.18k       0       8    0.28    0.99    0.52       0
# IO r wait     55   3949s       0   466ms     2ms    12ms     7ms       0
# pages distin  65  11.43M       1      18    6.84    9.83    2.20    7.70
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# String:
# Databases    
# Hosts        localhost
# InnoDB trxID 3BBF3B55 (1/0%), 3BBF3B5A (1/0%)... 1753695 more
# Last errno   0
# Users        user
# Query_time distribution
#   1us
#  10us  #
# 100us  ################################################################
#   1ms  #############
#  10ms  ######
# 100ms  #
#    1s
#  10s+

# Query 1: 0 QPS, 0x concurrency, ID 0x382A5F3785EB3CEE at byte 114085880

# Scores: Apdex = 1.00 [1.0], V/M = 0.02

# Attribute pct total min max avg 95% stddev median

# ============ === ======= ======= ======= ======= ======= ======= =======

# Count 98 1753697

# Exec time 52 4337s 71us 466ms 2ms 12ms 7ms 176us

# Lock time 96 69s 17us 3ms 39us 49us 27us 33us

# Rows sent 21 1.57M 0 1 0.94 0.99 0.24 0.99

# Rows examine 4 1.57M 0 1 0.94 0.99 0.24 0.99

# Rows affecte 0 0 0 0 0 0 0 0

# Rows read 32 1.70M 0 3 1.02 1.96 0.33 0.99

# Bytes sent 32 2.88G 80 127.65k 1.72k 3.88k 1.23k 1.46k

# Merge passes 0 0 0 0 0 0 0 0

# Tmp tables 0 0 0 0 0 0 0 0

# Tmp disk tbl 0 0 0 0 0 0 0 0

# Tmp tbl size 0 0 0 0 0 0 0 0

# Query size 61 180.07M 95 142 107.67 107.34 3.05 107.34

# InnoDB:

# IO r bytes 50 7.50G 0 128.00k 4.49k 15.96k 8.37k 0

# IO r ops 50 480.18k 0 8 0.28 0.99 0.52 0

# IO r wait 55 3949s 0 466ms 2ms 12ms 7ms 0

# pages distin 65 11.43M 1 18 6.84 9.83 2.20 7.70

# queue wait 0 0 0 0 0 0 0 0

# rec lock wai 0 0 0 0 0 0 0 0

# String:

# Databases

# Hosts localhost

# InnoDB trxID 3BBF3B55 (1/0%), 3BBF3B5A (1/0%)... 1753695 more

# Last errno 0

# Users user

# Query_time distribution

# 1us

# 10us #

# 100us ################################################################

# 1ms #############

# 10ms ######

# 100ms #

# 1s

# 10s+

We can see this query takes 2ms to respond in average. Most of which is taken by IO and also what this query takes 0.28 IOs per query in average. It is also simple query which touches less than 1 row in average which makes it very IO bound.

So what If I am planning for load growth and need to have system handle another 1000 of such queries per second ? I will need to do another 280 reads per second which you can use to guess whenever current IO subsystem can handle it or whenever it needs an increase.

The query time distribution histogram is also very interesting here we can see this query which analyzes no more than 1 row may take up to 8 io requests (could happen due to looking to undo space etc) and can take between up to 10-100 ms. The queries which are in 100us range are ones where no IO needed to happen so such histogram also gives us a good clue how many queries needed no io, needed 1 IO which was not queued (less than 10 ms) or needed more than that.

Going from this we can also estimate the cost of such of such query. Lets assume it is restricted by IO performance (which it is in this disk) and having and the cost of system which can run 1000 IOPs
costs $500 to run per month (including leasing, power etc). Such system will be able to do 2592000000 IOPS per month and (using our 0.7 factor) we have 6480000000 queries such system can run comfortably in a month. This gives us 12960000 or about 13M queries per dollar.

As a summary it is often very helpful to take a close look at your workload and get an understanding how much your queries (at least most important ones) cost you in terms of CPU and IO. From this you can very easily understand what kind of hardware will take you to reach appropriate performance, what kind of hardware provides better balance of CPU vs IO utilization as well as as simple as how much does it cost to run a query. With Cloud Computing being hot a lot of Directors would like to know the costs in “utility” model and you do not have to be in the cloud to provide them with estimates.

1 Comment

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

shenglin

13 years ago

Good post! It’s really good model to convert IO/CPU cost to money. That’s the first step for capacity plan. Once we can evaluate the tps/qps with specified hardware and platform, you can provide the capacity plan per the application requirement

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Modeling MySQL Capacity by Measuring Resource Consumptions

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Not-So-Good Practices

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Valkey/Redis: Configuration Best Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Modeling MySQL Capacity by Measuring Resource Consumptions

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Not-So-Good Practices

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Valkey/Redis: Configuration Best Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation