April 20, 2007

Back from RIT2007

Posted by peter |

So I’m back from RIT2007 which took place April 16,17 in Moscow,Russia.
The conference was well organized, while the conference center was a bit small for event – halls were jam packed during the breaks and there was limited sitting available for many sessions.

It was great to see attendees from almost all serious Internet Technologies players among attendees and speakers – Yandex, Rambler, Mail.ru all were quite well presented and there were a lot of people to talk to and large number of interesting talks.

The conference was quite diverse in presented topics which made it yet more interesting for person with wide range of interests as me – there were talks starting from very technical such as FreeBSD kernel tuning, Web QA or AJAX Development, number of Project Management focused talks, Investment related talks (both for ones looking for investments and looking to invest) and even Search Engine Doorway creating tutorial.

I’ve got so many questions so I had to skip number of presentations I originally was hoping to visit, and I almost lost my voice after first day of the conference.

The session schedule was too packed in my opinion. Sessions were running by 30 minutes (3 sessions in a raw without breaks) with 15 minutes recommended for talk and 15 minutes for questions. Also you could not get your computer on the speakers table neither you had remote control so you had to say “next slide” all the time which was a bit inconvenient.

As you guess you can’t tell much about MySQL Performance Optimization in just 15 minutes so I had to keep it to very basics while trying to balance it and include some non trivial stuff as well.

I also found Russian language is more elaborate than English so I spent more time talking than I originally planned.

Andrew Aksenoff, author of Sphinx was to speak about Sphinx but he got badly sick few days before the conference so I had to do this presentation as well. I must say speaking with someone else’s slides is much harder than using your own.

After the conference, 18th of April I had 1 day training/master class which had about 40 people which is more than I would expected. That was first time for me speaking 8 hours in a row, without breaks (which were used just to ask me more questions),
before that I did 3 hours presentations max.

I’ve published slides for 3 sessions I and Andrew wrote specially for this conference on MySQL Presentations page.

Next week I’m going to MySQL Users Conference to have some more fun :)

April 11, 2007

How simple answer are you looking for ?

Posted by peter |

I’ve got an interesting comment the other day saying “I’ve heard Jay Pipes saying indexes with cardinality lower than 30% are worthless, true?”

That is interesting question and it has different answers depending on who is asking. A lot of people want to hear simple answers to the questions because they would be overwhelmed by complete answer, it is like children being taught math first taught simple division skills like you can divide 10 by 2 but not by 3. As we grow up we learn the truth is much more complex than that and if you Major in math you will find there is a lot of stuff out where far beyond basic stuff we use in every day life.

Answering such question you can say yes it is true and for clueless person it will be helpful answer because it will allow him to avoid creation indexes on the column containing just one value which is good idea.

If you start to explain it in details, a lot of variables add in – covering indexes, skewed distribution vs lookup frequency, IO bound or CPU bound workload which may be far too complicated.

It is also interesting the broad answer would not be helpful – there are workloads when even with 1% matching rows you’re better of doing full table scans, and there are ones with 99% same values in the column having it indexed helps.

The other questions which people like to get quick answers for are for example “Which MySQL Storage Engine to Use” or “How Much Memory Compared to Your Database Size Should you Have” , “What Key Cache Hit Ratio is Good One”

This often gives me head aches – I like correctness in the answers but it is often too much to deal with, so I’m trying to keep the answers tailored towards customer needs omitting details which should not be important and making sure all relevant stuff is covered.

This is one of the reasons of hiring MySQL Expert – you typically can find same information in the books, Internet, and even in this blog but selecting what is relevant for your case might be complicated.

April 10, 2007

COUNT(*) vs COUNT(col)

Posted by peter |

Looking at how people are using COUNT(*) and COUNT(col) it looks like most of them think they are synonyms and just using what they happen to like, while there is substantial difference in performance and even query result.

Lets look at the following series of examples:

[read more...]

April 8, 2007

PBXT benchmarks

Posted by Vadim |

The PBXT Storage Engine (http://www.primebase.com/xt/) is getting stable and we decided to benchmark it in different workloads.
This time I tested only READ queries, similar to ones in benchmark InnoDB vs MyISAM vs Falcon
(http://www.mysqlperformanceblog.com/2007/01/08/innodb-vs-myisam-vs-falcon-benchmarks-part-1)
The difference is I used new sysbench with Lua scripting language, so all queries were scripted for sysbench.
[read more...]

April 6, 2007

Master-Master or Master with Many Slaves

Posted by peter |

I just found post by Kevin, in which he criticizes Master-Master approach, finding Master with many slaves more optimal. There is surely room for master-N-slaves systems but I find Master-Master replication much better approach in many cases.

Kevin Writes “It requires extra hardware thats sitting in standby and not being used (more money and higher capital expenditures)”, I’m surprised why would it ? Typically you can use both nodes for most of the reads, and this is in fact use pattern MMM was designed for.

“There’s only one machine to act as the standby master. If you have 10 slaves you should be able to fail five times and still be operational.” This is valid consideration but honestly for most of applications it is enough unless you’re using unreliable hardware.
Typically other risks of downtime are higher, what is about having application error which deletes/trashes all the data and gets replicated to all the slaves ?

When you compare master-master to multiple slaves you should compare it for same amount of servers. For example if we have 6 boxes we can use 1 master and 5 slaves or 3 master-master pairs. In this case each of the master-master nodes gets 1/3rd of database size and about 1/3rd of traffic.

The benefits of Master-Master replication are the following

It is more simple especially if you just write to one node fallback and recovery are rather easy. Even if all things are automated simple things mean less software bugs.

Handling write load If your application is write intensive master-N-slave configuration will be saturated much faster because it has to handle much more write load. Especially keeping into account MySQL replication is single thread it might be not long before it will be unable to keep up.

Waste of cache memory If you have same data on the slaves you will likely same data cached in their database caches. You can partially improve it by load partitioning but still it will not be perfect – for example all of write load has to go to all nodes getting appropriate data in the cache. In our example if you have 16GB boxes and say 12GB allocated to MySQL database caches you can get 12GB effective cache on the master-N-slave configuration compared to 36GB of effective cache on 3 master-master pairs.

Waste of Disk Disk is cheap but for IO bound workloads you may need fast disk array, which becomes not so cheap so having less data to deal with becomes important.

More time to clone If replication breaks you may need more time to re-clone it (or restore database from backup) compared to multiple master-master pairs.

I however agree if you have small database (compared to memory size) with insignificant write load master-N-slave configuration may be indeed better as it allows to keep application simple, having all data on the same server.

The customer for which MMM cluster was implemented has about 600GB per node and some other installations have similar sizes.
MMM Cluster support Master with multiple slaves configuration as well, this however was not the main focus.

Using delayed JOIN to optimize count(*) and LIMIT queries

Posted by peter |

In many Search/Browse applications you would see main (fact) table which contains search fields and dimension tables which contain more information about facts and which need to be joined to get query result.

If you're executing count(*) queries for such result sets MySQL will perform the join even if you use LEFT JOIN so it is not needed which slows down things considerably. In similar way MySQL generates full rows while executing queries with limit before throwing them away which makes queries with high offset values very expensive.

To get better performance you can "Help" MySQL and remove JOIN for count(*) and do JOIN after limiting result set for retrieval queries.

Lets look at following simple example with one dimension table. In real life you will usually have several of these so performance improvements can be even higher.

SQL:
  1. CREATE TABLE `fact` (
  2.   `i` int(10) UNSIGNED NOT NULL,
  3.   `val` int(10) UNSIGNED NOT NULL,
  4.   KEY `i` (`i`,`val`)
  5. )
  6.  
  7. CREATE TABLE `dim` (
  8.   `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  9.   `pad` varchar(100) NOT NULL,
  10.   PRIMARY KEY  (`id`)
  11. )
  12.  
  13. mysql> SELECT count(*) FROM dim;
  14. +----------+
  15. | count(*) |
  16. +----------+
  17. |    30720 |
  18. +----------+
  19. 1 row IN SET (0.00 sec)
  20.  
  21. mysql> SELECT count(*) FROM fact;
  22. +----------+
  23. | count(*) |
  24. +----------+
  25. 7340032 |
  26. +----------+
  27. 1 row IN SET (0.00 sec)
  28.  
  29. mysql> SELECT count(*) FROM fact WHERE i<10000;
  30. +----------+
  31. | count(*) |
  32. +----------+
  33. |   733444 |
  34. +----------+
  35. 1 row IN SET (0.44 sec)
  36.  
  37.  
  38. mysql> SELECT count(*) FROM fact LEFT JOIN dim ON val=id WHERE i<10000;
  39. +----------+
  40. | count(*) |
  41. +----------+
  42. |   733444 |
  43. +----------+
  44. 1 row IN SET (2.15 sec)
  45.  
  46.  
  47. mysql> SELECT i,pad FROM fact LEFT JOIN dim ON val=id WHERE i<10000 LIMIT 500000,10;
  48. +------+------------------------------------------+
  49. | i    | pad                                      |
  50. +------+------------------------------------------+
  51. | 6811 | 06bfea523be29a6070488ee66e874dffa170de76 |
  52. | 6811 | 3baf40c2d76998270f8954bedda386b5021e0624 |
  53. | 6811 | 35ad5c3a9d0763acc305992327864bed1af34167 |
  54. | 6811 | 81de98a3ef74ddc0fa4f7c95a27e3dbebca8df0d |
  55. | 6811 | 11cde5d0bd8ffe1eda86b39d05a58c525e8fac8f |
  56. | 6811 | 25c474b380388c23b1de730c4255612e1233e14e |
  57. | 6811 | 1d32b5ba28a513097fc88f3efd91155b2697aeec |
  58. | 6811 | bdc9a39cdfafda26fc2f48a48abd3bc5f051a4ea |
  59. | 6811 | d2e6cb9ca5aa9dd2bc3d033de45579a76ccdafdf |
  60. | 6811 | 0130c708083d77377255bd8f5e0daa15fbb24212 |
  61. +------+------------------------------------------+
  62. 10 rows IN SET (3.88 sec)
  63.  
  64. mysql> SELECT i,pad FROM (SELECT i,val FROM fact WHERE i<10000 LIMIT 500000,10) res LEFT JOIN dim ON val=id;
  65. +------+------------------------------------------+
  66. | i    | pad                                      |
  67. +------+------------------------------------------+
  68. | 6811 | 06bfea523be29a6070488ee66e874dffa170de76 |
  69. | 6811 | 3baf40c2d76998270f8954bedda386b5021e0624 |
  70. | 6811 | 35ad5c3a9d0763acc305992327864bed1af34167 |
  71. | 6811 | 81de98a3ef74ddc0fa4f7c95a27e3dbebca8df0d |
  72. | 6811 | 11cde5d0bd8ffe1eda86b39d05a58c525e8fac8f |
  73. | 6811 | 25c474b380388c23b1de730c4255612e1233e14e |
  74. | 6811 | 1d32b5ba28a513097fc88f3efd91155b2697aeec |
  75. | 6811 | bdc9a39cdfafda26fc2f48a48abd3bc5f051a4ea |
  76. | 6811 | d2e6cb9ca5aa9dd2bc3d033de45579a76ccdafdf |
  77. | 6811 | 0130c708083d77377255bd8f5e0daa15fbb24212 |
  78. +------+------------------------------------------+
  79. 10 rows IN SET (0.30 sec)

So as you can see using this trick we get 5 times speed up for count(*) query and 12 times. This is of course for extremely high offset value but it is also for example with only one dimension which fully fits in memory. for IO bound workload performance difference can be much higher than that.

You may also notice one more trick I'm using here - fact table has covered index on which has val column in it this allow to get join query a bit more optimal.

So right now performance gain may be worth the trick, in the future I hope MySQL Optimizer will be improved so it does these transformations automatically.

April 5, 2007

MySQL Master-Master replication manager released

Posted by peter |

The MySQL Master-Master replication (often in active-passive mode) is popular pattern used by many companies using MySQL for scale out. Most of the companies would have some internal scripts to handle things as automatic fallback and slave cloning but no Open Source solution was made available.

Few months ago we were asked to implement such solution for one of the customers and they kindly agreed to let us release things under GPL2 License, and we gave them reduced rate for being Open Source friendly.

So what does this tool do and how it works ?

Currently it is implemented based on Linux IP management tools and LVM for snapshot creation but we hope support for other operation systems added in the future.

It can manage master-master pair as well as other configurations such as master-master and bunch of slaves.

Typically you would define "roles" for example READER and WRITER roles for most simply case and assign them to the severs. For example you can say both servers in pair can be reader at the same time but only one of them should be writable at the same time for master-master pair.

Each of the roles will have pair of IPs associated with it in this case and it will make sure all of these IPs are handled by some server, so you can use DNS for load balancing without worrying about TTL and similar things.

If Active server fails in the pair both its READER and WRITER role will be taken over by passive node and depending on monitoring configuration it can happen within few seconds.

Such IP based high availability and load balancing does not require any extra hardware or software and works well for bunch of applications which could be implemented using different languages etc, which makes application based fallbacks problematic.

The tool also takes extra caution to prevent application mistakes. What will happen if you will write to the slave because of application error ? Well you'll break replication often without knowing about it. You can kind of solve it with using read-only user for slave connection but what if your application simply was misconfigured and things the server is master when it is not ? To take care of this Master Master Manager makes sure only one of the nodes is writable at all times and other is set to --read-only, so unless you use user with SUPER privilege you should be safe.

It has some other neat features, for example you may configure it to remove READER role from the server if it gets too delayed with replication (or if replication breaks) so you do not have to do it in each application.

We also took extra caution about making sure things can't run out of sync silently. For example you might know if slave server reboots (say power goes down) you can't be sure about data consistency because replication may be restarted with wrong position. Sometimes this shows up as an errors but for some query pattern it will not. MMM will detect this situation and will hold the server for administrator to decide.

One command LVM based sync is also implemented so restoring broken replication safe way becomes very easy.
Besides simple "cloning" of the nodes you can use same tool to create a backup with number of methods supported including compressed backup or incremental backups with rdiff.

Finally we have implemented safe role switch, meaning you can move writer to other node in clean way (making sure replication is still in sync) - I see too often this switch happens "dirty" way potentially risking replication inconsistencies. This is very handy if you want to restart one of servers in clusters to upgrade OS or add more RAM to the system.

This tool works well for number of customers and users but it is surely early software version so try it on your own risk and make sure to provide your feedback and suggestions.

You can learn some more about MMM here, download the tool here and have your questions answered and suggestions welcomed here