July 24, 2009

Maatkit Now Supports Memcached

Posted by Ryan Lowe |

Have you ever wondered how optimized your Memcached installation is? There is a common misconception that one doesn't have to think too deeply about Memcached performance, but that is not true. If your setup is inefficient, you could:

  • Burn Memory
  • Waste Network Round-Trips
  • Store Keys That Never Get Retrieved
  • Have a Low Cache Hit Ratio (i.e. query MySQL too much)
  • Suffer a fate too horrible to contemplate.

Percona does a lot of consulting around Memcached, so we try to take a quantitative, scientific approach to measuring memcached performance, just like everything else we do.

memcached is basically a key-value in-memory database, so it works well to analyze its traffic with Maatkit's mk-query-digest tool. You can now use mk-query-digest to get "query event" information and report on memcached effectiveness. As far as we know, this is the first and only tool of its kind.

The output is very similar to that which would be displayed for MySQL queries. Commands are fingerprinted so that similar things can be aggregated together. The output begins with an overall report:

CODE:
  1. # Overall: 2.69k total, 113 unique, 28.36 QPS, 0.04x concurrency _________
  2. #                    total     min     max     avg     95%  stddev  median
  3. # Exec time             4s    11us      2s     1ms   403us    43ms    28us
  4. # Hosts               2694
  5. # Time range        2009-06-11 21:54:47.089744 to 2009-06-11 21:56:22.136503
  6. # bytes              7.34M       0 265.49k   2.79k   4.71k  16.50k   51.63
  7. # key print           2694
  8. #   4% (119)  Memc_delete
  9. #   6% (163)  Memc_error
  10. 77% (2k)   Memc_get
  11. #   0% (4)    Memc_incr
  12. 15% (415)  Memc_miss
  13. 17% (483)  Memc_set

We can see that 77% of commands (cmds) were get, 17% set, 4% delete and 4 incr (increment) cmds. At present, Memc_error only refers to gets (or, retrieval cmds in general) that were interrupted (some things are TODO). Memc_miss is for any cmd that resulted in a NOT_FOUND result. So in this case, 15% of all cmds tried to do something with a nonexistent key.

The next bit aggregates similar queries into classes and prints out the most expensive ones. Here's the top one:

CODE:
  1. # Query 1: 7.06 QPS, 0.04x concurrency, ID 0x7AFBE9433477C7BC at byte 5983429
  2. # This item is included in the report because it matches --limit.
  3. #              pct   total     min     max     avg     95%  stddev  median
  4. # Count         24     657
  5. # Exec time     87      3s    19us      2s     5ms     1ms    87ms    31us
  6. # Hosts                  1 127.0.0.1
  7. # Time range 2009-06-11 21:54:48.050834 to 2009-06-11 21:56:21.669302
  8. # bytes         38   2.79M       0   4.75k   4.35k   4.71k   1.31k   4.71k
  9. # key print              1 TEST_KEY
  10. 17% (114)  Memc_error
  11. # 100% (657)  Memc_get
  12. # Query_time distribution
  13. #   1us
  14. #  10us  ################################################################
  15. # 100us  #############
  16. #   1ms  ####
  17. #  10ms
  18. # 100ms
  19. #    1s  #
  20. #  10s+
  21. get TEST_KEY\G

Interestingly, apparently 17% of all "get TEST_KEY" were interrupted (Memc_error), probably by the TCP connection being closed client-side.

You'll notice 100% Memc_get. This is a result of the default --report and --group-by, which is "fingerprint" just like for slow query logs. So the fingerprint for this event is "get TEST_KEY" and since the cmd is "get", 100% Memc_get. There's another way to aggregate -- by the fingerprinted key, omitting the verb (get/set/etc). To do this, you'd specify key_print as the group-by value. More details are below.

This feature is still in beta. Some things are TODO; the report format could be cleaned up a little, etc. But we're releasing early and often, and we're calling on the community to help us out with testing! Here's how:

  1. Get the latest mk-query-digest
  2. Capture memcached network traffic with tcpdump
  3. Parse memcached network traffic with mk-query-digest
  4. Ponder the results, or consider a different --report option

In code:

CODE:
  1. wget  http://maatkit.googlecode.com/svn/trunk/mk-query-digest/mk-query-digest
  2. sudo tcpdump -s 65535 -x -n -q -tttt -i eth0 port 11211> memc_tcpdump.txt
  3. mk-query-digest --type memcached memc_tcpdump.txt

Depending on what you're interested in, you may want to try these other reports:

  • --report key_print: Group events by a fingerprinted version of keys. So "get user_123" and "set user_456" are grouped together because the key_print for both is "user_?". Doing this, per-class results will have various percentages for Memc_get, Memc_set, etc. instead of 100% Memc_cmd like above.
  • --report key: Like key_print but the keys are not fingerprinted so the two examples above are different classes because "user_123" != "user_456". This would allow you to see per-key stats. Maybe, for example, how often a certain key is get'ed in comparison to other keys. (I.e. "what's the most popular get'ed or set'ed key" etc.)
  • --report cmd: Group events by cmd. So "get user_123" and "get whatever" are grouped together because both are "get" cmds. You may want to do this to see, for example, what percentage of get cmds miss, i.e. result in NOT_FOUND.

These instructions could change in the future -- this blog post is not authoritative documentation. The most current documentation is always embedded in the tool itself, and is readable with perldoc! Let us know what you think. Please use the Maatkit mailing list and and post bug reports on Google Code.

Related posts: :Daniel Nichter and Percona join forces on Maatkit::A quick way to get memcached status::APC or Memcached:
 

9 Comments »

  1. 1. Piotr

    I have

    root@lb2:~# ./mk-query-digest –type memcached memc_tcpdump.txt
    # mk_query_digest:6381 6920 Use of uninitialized value in string eq at ./mk-query-digest line 5363, chunk 1.
    #
    # mk_query_digest:6381 6920 Use of uninitialized value in string eq at ./mk-query-digest line 5363, chunk 2.
    #

    Comment :: July 24, 2009 @ 2:57 pm

  2. Hey Piotr,

    How’d you capture tcpdump info? Also, it doesn’t yet understand IPv6. Send Daniel the tcpdump output. And let’s put bug reports on Google Code:)

    Comment :: July 24, 2009 @ 3:50 pm

  3. 3. Gavin Towey

    When I try to use it, it just gives lots of these lines:

    ./mk-query-digest –type memcached memc_tcpdump.txt

    # mk_query_digest:6758 5136 tell() on closed filehandle at ./mk-query-digest line 2019.

    Comment :: July 30, 2009 @ 3:05 pm

  4. 4. DLS

    I also get errors:

    # mk_query_digest:6758 29461 Use of uninitialized value in pattern match (m//) at mk-query-digest line 2045, chunk 2972.

    Here’s how I captured tcpdump:

    tcpdump -s 65535 -v -x -n -q -tttt -i any port 11211 > memc_tcpdump.txt

    Comment :: August 3, 2009 @ 1:14 pm

  5. For those having issues parsing memcached dumps, I’ll be happy to fix them, but I’ll need a sample of your dump file that’s causing the crash. If your dump file doesn’t have private data, could you please create a new issue at http://code.google.com/p/maatkit/issues/list and upload the dump file? Else, you can email me your dump file, I’ll fix it and keep your data private; the address is my name at percona.com.

    Also, if you haven’t tried the latest version already, try it: wget http://www.maatkit.org/trunk/mk-query-digest
    I fixed some similar issues last week with memc parsing.

    Comment :: August 3, 2009 @ 1:36 pm

  6. Hi there!

    I’m excited to try this out, but have similar problems shown here in the comments:

    mk_query_digest:6758 23474 Use of uninitialized value $source in pattern match (m//) at /usr/bin/mk-query-digest line 2045, chunk 180.

    Many lines of this – I even tried to make modifications to not apply the regex if not defined, but I’m not getting anywhere with it.

    The tcpdump command options:

    tcpdump -s 65535 -v -x -n -q -tttt -i any port 11211 > memc_tcpdump.txt

    Any suggestions?

    Thanks!

    Comment :: August 6, 2009 @ 6:10 am

  7. CaptTofu, I see you commented on http://code.google.com/p/maatkit/issues/detail?id=544 and that you fixed this by removing the -v option from tcpdump?

    Currently, the tcpdump output must be like:

    2009-07-04 21:33:39.229179 IP 127.0.0.1.35126 > 127.0.0.1.11211: tcp 31
    0×0000: 4500 0053 1d2a 4000 4006 1f79 7f00 0001
    0×0010: 7f00 0001 8936 2bcb d4c3 3924 d4c8 1de4
    0×0020: 8018 0201 fe47 0000 0101 080a 0008 240b
    0×0030: 0008 240b 7365 7420 6d79 5f6b 6579 2030
    0×0040: 2030 2031 300d 0a53 6f6d 6520 7661 6c75
    0×0050: 650d 0a

    So you may need to tweak your tcpdump options to get the output correct. Any extra info will confuse the tcpdump parser and cause these kinds of errors.

    Comment :: August 6, 2009 @ 10:32 am

  8. Due to loss of formatting, the above sample output is missing spaces before each hex dump line.

    Comment :: August 6, 2009 @ 10:33 am

  9. Daniel,

    removing the -v helped, but a later run, even without the -v, gave me the error. It was a huge dump file, and strangely, while I was doing this, my amazon instance kicked me out and wouldn’t let me back in (not related I’m sure!) So I have no way of obtaining that file that was giving me problems.

    Comment :: August 6, 2009 @ 3:43 pm

 

Subscribe without commenting

Trackbacks/Pingbacks