July 30, 2014

Performance Schema overhead

As a continuation of my CPU benchmarks, it is interesting to see where the scalability limitations in MySQL 5.6.2 are, and I am going to check that using PERFORMANCE SCHEMA. Before that, though, let’s estimate the potential overhead of using PERFORMANCE SCHEMA itself. So I am going to run the same benchmarks (sysbench read-only and read-write) as in the previous post with different Performance Schema options and compare the results.

I am going to use a Cisco UCS C250 with the following settings:

  • PERFORMANCE SCHEMA disabled (NO PS)
  • PERFORMANCE SCHEMA enabled, with all consumers ON (PS on)
  • PERFORMANCE SCHEMA enabled, but only the global_instrumentation consumer enabled. This allows gathering table and index access statistics (PS only global)
  • PERFORMANCE SCHEMA enabled, but all consumers OFF (PS all off)
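For readers who want to reproduce the four configurations, here is a minimal sketch of the SQL each one might need. It is an illustration, not the script used for this benchmark: the helper name `consumer_setup_sql` is hypothetical, and it assumes the consumers are toggled through the `performance_schema.setup_consumers` table while the "NO PS" case is handled by starting the server without `performance_schema` enabled.

```python
# Hypothetical helper, not the script used for the benchmark: it only builds
# the SQL text for each configuration label used in this post.

def consumer_setup_sql(config):
    """Return the list of SQL statements for one benchmark configuration."""
    all_off = "UPDATE performance_schema.setup_consumers SET ENABLED = 'NO';"
    all_on = "UPDATE performance_schema.setup_consumers SET ENABLED = 'YES';"
    only_global = (
        "UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' "
        "WHERE NAME = 'global_instrumentation';"
    )
    plans = {
        # Assumption: the server is started with performance_schema off,
        # so there is nothing to run at the SQL level.
        "NO PS": [],
        "PS on": [all_on],
        # Switch everything off first, then re-enable the single consumer.
        "PS only global": [all_off, only_global],
        "PS all off": [all_off],
    }
    return plans[config]


if __name__ == "__main__":
    for name in ("NO PS", "PS on", "PS only global", "PS all off"):
        print(name)
        for statement in consumer_setup_sql(name):
            print("   ", statement)
```

Changes to setup_consumers take effect at run-time, which is why only the fully disabled case needs a server restart.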

The full results with details are on our Benchmark Wiki.

Here is the graph for the read-only case:

and for the read-write case:

To put some numbers on it, let’s look at the ratio between the results with and without PS (values above 1.00 mean overhead from PS).

Here is the table with ratios for the read-only case:

Threads   PS on   PS only global   PS all off
1         1.11    1.10             1.13
2         1.13    1.08             1.04
4         1.15    1.07             1.05
8         1.18    1.07             1.03
24        1.21    1.08             1.06
32        1.25    1.10             1.08
48        1.25    1.10             1.08
64        1.23    1.10             1.06
128       1.23    1.10             1.04
256       1.21    1.08             1.04
512       1.18    1.07             1.01
1024      1.17    1.01             0.96

Here is the table with ratios for the read-write case:

Threads   PS on   PS only global   PS all off
1         1.07    0.94             0.98
2         1.11    1.00             1.06
4         1.15    1.04             1.08
8         1.19    1.02             1.08
24        1.17    1.00             1.07
32        1.18    1.07             1.06
48        1.18    1.09             1.13
64        1.17    1.11             1.11
128       1.18    1.09             1.12
256       1.14    1.04             0.99
512       1.17    1.02             1.04
1024      1.21    1.06             1.07

This allows us to make the following summary:

In the read-only case, Performance Schema with all consumers enabled gives about 25% overhead,
with “global instrumentation” only about 10%, and with all consumers disabled about 8%.

In the read-write case, Performance Schema with all consumers enabled gives about 19% overhead,
with “global instrumentation” only about 11%, and it is about the same with all consumers disabled.
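The summary figures can be cross-checked against the ratio tables. The snippet below is a small sketch of that arithmetic: it copies the ratios from the two tables, converts each to an overhead percentage as (ratio - 1) * 100, and reports the peak and the mean per configuration. The names `overhead_percent` and `summarize` are made up for this illustration.

```python
# Ratios copied from the two tables in this post, in thread-count order
# (1, 2, 4, 8, 24, 32, 48, 64, 128, 256, 512, 1024).
READ_ONLY = {
    "PS on": [1.11, 1.13, 1.15, 1.18, 1.21, 1.25, 1.25, 1.23, 1.23, 1.21, 1.18, 1.17],
    "PS only global": [1.10, 1.08, 1.07, 1.07, 1.08, 1.10, 1.10, 1.10, 1.10, 1.08, 1.07, 1.01],
    "PS all off": [1.13, 1.04, 1.05, 1.03, 1.06, 1.08, 1.08, 1.06, 1.04, 1.04, 1.01, 0.96],
}
READ_WRITE = {
    "PS on": [1.07, 1.11, 1.15, 1.19, 1.17, 1.18, 1.18, 1.17, 1.18, 1.14, 1.17, 1.21],
    "PS only global": [0.94, 1.00, 1.04, 1.02, 1.00, 1.07, 1.09, 1.11, 1.09, 1.04, 1.02, 1.06],
    "PS all off": [0.98, 1.06, 1.08, 1.08, 1.07, 1.06, 1.13, 1.11, 1.12, 0.99, 1.04, 1.07],
}


def overhead_percent(ratio):
    """Convert a ratio such as 1.25 into an overhead percentage such as 25.0."""
    return round((ratio - 1.0) * 100, 1)


def summarize(table):
    """Return peak and mean overhead (in percent) for every configuration."""
    summary = {}
    for name, ratios in table.items():
        summary[name] = {
            "peak": overhead_percent(max(ratios)),
            "mean": overhead_percent(sum(ratios) / len(ratios)),
        }
    return summary


if __name__ == "__main__":
    for label, table in (("read-only", READ_ONLY), ("read-write", READ_WRITE)):
        for name, stats in summarize(table).items():
            print(f"{label:10s} {name:15s} peak {stats['peak']}%  mean {stats['mean']}%")
```

The read-only peaks for “PS on” and “PS only global” come out at 25% and 10%, and the read-write peak for “PS only global” at 11%, in line with the summary. The remaining summary figures read more like a typical value across thread counts than a strict maximum.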

Is that big or small? I leave it to you to decide; I think it may be acceptable in some cases and not in others.
I only wish that Performance Schema with all consumers disabled had less overhead; 8-11% seems significant.
If nothing else helps, I would like to be able to fully disable or enable Performance Schema at run-time, not only at start-time.

As I understand it, dtrace / systemtap probes can be disabled and enabled at run-time, and when they are disabled the overhead is almost 0%. Why can’t Performance Schema do the same?

(Disclaimer: This benchmark is sponsored by Well Known Social Network, and they are generous enough to make it public)

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, the Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. Mark Callaghan says:

    What is the bug number for the performance problem being investigated?

  2. Marc Alff says:

    Our internal benchmarks for mysql-trunk also show the same problem, this is a performance bug currently being investigated.

    About how the performance schema instrumentation works, a good way to start is to follow the instrumentation APIs. The code is publicly available, and also documented.

    For example, for the mutex instrumentation, see mysql_mutex_lock().

    Regards,
    – Marc

  3. Davi Arnaut says:

    Vadim,

    Please take a look at the high level architecture of WL#2360 (http://forge.mysql.com/worklog/task.php?id=2360). See the overhead heading.

  4. Vadim Tkachenko says:

    Davi,

    You explained how dtrace works, but I would be very much interested to hear details how P_S works. In particular, why with enabled P_S, but disabled all consumers, we still have 10% overhead? P_S does not need to collect and aggregate stats in this case, so what is it doing ?
    Can you shed some light on this ?

  5. Davi Arnaut says:

    > My point is that ideally we should have near to zero overhead when we do not
    > use dtrace / systemtap / P_S.

    If we are speaking ideally, we should have it improve performance! Seriously though, I explained how dtrace operates on a completely different level than P_S.

  6. Vadim Tkachenko says:

    Davi,

    My point is that ideally we should have near to zero overhead when we do not use dtrace / systemtap / P_S.

    E.g. when we disable all consumers, P_S does not need to aggregate and store data, does it?
    Then why do we see about 10% degradation?

  7. Davi Arnaut says:

    > As I understand dtrace / systemtap probes can be disabled / enabled at run-time,
    > and when they disabled – it is almost 0% overhead, why Performance Schema
    > can’t do the same ?

    For example, dtrace is able to patch instructions in a binary at runtime in order to add trap instructions — this requires kernel support, etc. Also, differently from dtrace/systemtap, P_S needs to aggregate and store data, etc.

  8. Patrick Casey says:

    Interesting benchmark … that’s more overhead than I would have guessed w/o having looked at the code.

    Has anybody looked at how other database vendors (I’m thinking specifically of Oracle here) offer these kind of stats? Are they taking a similar non-trivial performance hit, or is their architecture different enough that they get away with it?

  9. Peter Zaitsev says:

    One interesting thing is how overhead changes by workload type.
    Do we have more overhead for short statements (such as sysbench simple tests) or for long queries which crunch many rows? How much overhead do we get when we’re IO bound? Many applications have spare CPU and they just would not like their performance to drop for IO bound workloads due to added contention etc.

    Finally do you remember how Performance Schema in “row access counters only” mode compares to user_statistics patch in its overhead ?

  10. Vadim Tkachenko says:

    Peter,

    I used recommendations from that post.
    The results are in the “PS all off” columns, with consumers disabled in the way described in the ‘cheat sheet’.

  11. Peter Laursen says:

    I think there is a ‘cheat sheet’ here. But I did not try it.
    http://marcalff.blogspot.com/2011/04/performance-schema-faq-1-enable-without.html

  21. The overhead is still there. I measured 10% for PS=on with default consumers and read-only sysbench – http://bugs.mysql.com/bug.php?id=68413

  22. Gia McNerney says:

    Great test and report Vadim! I think the performance schema will be a great addition to MySQL once it’s more established. Just starting with MySQL after retiring from handling oracle databases – now volunteering :) Thanks for sharing.
