While we have many posts on replication on our blog, such as those on replication being single-threaded, on semi-synchronous replication, and on estimating replication capacity, I don’t think we have one that covers the very basics of how MySQL replication really works at a high level. Or it’s been so long ago I can’t [...]
SQL Injection Questions Followup
I presented a webinar today about SQL Injection, to try to clear up some of the misconceptions that many other blogs and articles have about this security risk. You can register for the webinar even now that I’ve presented it, and you’ll be emailed a link to the recording, which will be available soon. During [...]
Aligning IO on a hard disk RAID – the Benchmarks
In the first part of this article I showed how I align IO. Now I want to share the results of the benchmarks I have been running to see how much benefit we can get from proper IO alignment on a 4-disk RAID1+0 with a 64k stripe element. I haven’t been running any benchmarks [...]
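To make "alignment" concrete, here is a minimal sketch that checks whether a partition's starting sector falls on a stripe element boundary. Only the 64k stripe element comes from the text; the 512-byte sector size, the example start sectors, and the helper names are assumptions for illustration, not the method from the first part of the article.

```python
# Minimal sketch: is a partition start aligned to the RAID stripe element?
# Assumptions: 512-byte logical sectors; the 64 KiB stripe element is from
# the text. Helper names are hypothetical.

SECTOR_SIZE = 512              # bytes per logical sector (assumed)
STRIPE_ELEMENT = 64 * 1024     # 64k stripe element, as in the benchmark


def is_aligned(start_sector: int,
               sector_size: int = SECTOR_SIZE,
               stripe_element: int = STRIPE_ELEMENT) -> bool:
    """Return True if the partition start falls on a stripe element boundary."""
    return (start_sector * sector_size) % stripe_element == 0


def next_aligned_sector(start_sector: int,
                        sector_size: int = SECTOR_SIZE,
                        stripe_element: int = STRIPE_ELEMENT) -> int:
    """Return the first sector at or after start_sector that is aligned."""
    sectors_per_element = stripe_element // sector_size
    # Ceiling division, then scale back up to a whole number of elements.
    return -(-start_sector // sectors_per_element) * sectors_per_element


if __name__ == "__main__":
    # Sector 63 is the legacy DOS-style partition start; 128 sectors = 64 KiB.
    for sector in (63, 128, 2048):
        print(sector, is_aligned(sector), next_aligned_sector(sector))
```

With these assumptions, a partition starting at sector 63 straddles stripe elements, while one starting at sector 128 or 2048 is aligned.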
Using any general purpose computer as a special purpose SIMD computer
Often, from a computing perspective, one must run a function on a large amount of input. Frequently the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel. Shard-Query introduces set-based processing, which on the surface appears [...]
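The "same function, many inputs, in parallel" idea can be sketched as split, apply, combine. This is a generic data-parallel illustration under stated assumptions, not Shard-Query's implementation; the chunking scheme and the names `chunk` and `parallel_map_reduce` are hypothetical.

```python
# Minimal sketch of data parallelism: split the input, run the same function
# on every chunk concurrently, then combine the partial results.
# NOT Shard-Query's implementation; helper names are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunk(items: Sequence[T], n_chunks: int) -> List[Sequence[T]]:
    """Split items into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_map_reduce(func: Callable[[Sequence[T]], R],
                        combine: Callable[[Iterable[R]], R],
                        items: Sequence[T],
                        workers: int = 4) -> R:
    """Apply func to each chunk concurrently, then combine the partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(func, chunk(items, workers)))
    return combine(partials)


if __name__ == "__main__":
    data = list(range(1, 1_000_001))
    # A SUM over the whole set equals the SUM of the per-chunk SUMs.
    print(parallel_map_reduce(sum, sum, data))
```

Threads keep the sketch simple and portable; in CPython they do not speed up CPU-bound Python code, so for real work the usual swap is `ProcessPoolExecutor` with module-level functions. The approach only works for functions whose partial results can be combined, such as SUM or COUNT.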
Distributed Set Processing with Shard-Query
Can Shard-Query scale to 20 nodes? Peter asked this question in the comments to my previous Shard-Query benchmark. Actually, he asked whether it could scale to 50, but testing 20 was all I could do due to EC2 and time limits. I think the results at 20 nodes are very useful for understanding the performance: [...]
Modeling MySQL Capacity by Measuring Resource Consumptions
There are many angles from which you can look at a system to predict its performance; the model Baron has published, for example, is good for measuring the scalability of a system as concurrency grows. In many cases, however, we need to answer the question of how much load a given system can handle when load is [...]
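One simple way to turn measured resource consumption into a capacity estimate is to divide each resource's capacity by the amount one transaction consumes, then take the smallest result as the bottleneck. This is a minimal sketch assuming consumption grows linearly with load; it is not necessarily the model the post develops, and the numbers are made-up example values.

```python
# Minimal sketch: estimate maximum throughput from per-transaction resource
# consumption. Assumes linear scaling and that the most loaded resource caps
# throughput. The example numbers are hypothetical.
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Resource:
    capacity_per_sec: float   # e.g. CPU-seconds per second available, or IOPS
    used_per_tx: float        # measured cost of one transaction on this resource


def max_throughput(resources: Dict[str, Resource]) -> Tuple[str, float]:
    """Return (bottleneck resource name, max transactions per second)."""
    limits = {name: r.capacity_per_sec / r.used_per_tx
              for name, r in resources.items()}
    bottleneck = min(limits, key=limits.get)
    return bottleneck, limits[bottleneck]


if __name__ == "__main__":
    measured = {
        "cpu": Resource(capacity_per_sec=8.0, used_per_tx=0.002),    # 8 cores
        "disk": Resource(capacity_per_sec=1000.0, used_per_tx=0.5),  # IOPS
        "net": Resource(capacity_per_sec=125e6, used_per_tx=20e3),   # bytes
    }
    print(max_throughput(measured))
```

With these example values the limits are 4000 tx/s for CPU, 2000 tx/s for disk, and 6250 tx/s for network, so disk is the bottleneck. In practice you would plan to run well below that ceiling.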
Impact of the sort buffer size in MySQL
The parameter sort_buffer_size is one of the MySQL parameters that is far from obvious to adjust. It is a per-session buffer that is allocated every time it is needed. The problem with the sort buffer comes from the way Linux allocates memory. Monty Taylor (here) has described the underlying issue in detail, but basically above [...]
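Because the buffer is per session, the memory it can claim grows with the number of sessions that sort at the same time. This rough sketch assumes at most one sort buffer per session at a time; the session count and buffer sizes are hypothetical example values, and it does not model the allocation behavior discussed in the post.

```python
# Rough sketch: memory needed if every session allocates a full sort buffer
# at the same time. Assumes one sort buffer per session; the values below
# are hypothetical examples.

KIB = 1024
MIB = 1024 * KIB


def worst_case_sort_memory(sort_buffer_size: int, concurrent_sessions: int) -> int:
    """Bytes needed if all sessions hold a full sort buffer simultaneously."""
    return sort_buffer_size * concurrent_sessions


if __name__ == "__main__":
    for size in (256 * KIB, 2 * MIB, 32 * MIB):
        total = worst_case_sort_memory(size, concurrent_sessions=200)
        print(f"sort_buffer_size={size // KIB} KiB -> {total / MIB:.0f} MiB")
```

This is one reason a large global value is risky. MySQL also lets you raise the value for a single session with `SET SESSION sort_buffer_size = ...` for the queries that need it.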
Scaling: Consider both Size and Load
So let's imagine you have a server handling 100,000 user accounts. You can see that CPU, IO and network usage are below 10% of capacity. Does that mean you can count on the server being able to handle 1,000,000 accounts? Not really, and there are a few reasons why. I’ll name the most important of them: [...]
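Even the most optimistic reading of the question, that utilization grows linearly with the number of accounts, leaves no room to spare. This minimal sketch only shows that naive linear projection; the function name is hypothetical, and the model ignores the reasons the post goes on to list.

```python
# Minimal sketch of the naive linear projection implied by the question.
# Optimistic baseline only: it assumes utilization scales exactly with the
# number of accounts, which the post argues is not safe to count on.


def naive_projected_utilization(current_util: float,
                                current_accounts: int,
                                target_accounts: int) -> float:
    """Scale utilization linearly with the number of accounts."""
    return current_util * (target_accounts / current_accounts)


if __name__ == "__main__":
    projected = naive_projected_utilization(0.10, 100_000, 1_000_000)
    print(f"projected utilization: {projected:.0%}")
```

At 10% utilization for 100,000 accounts, a 10x increase projects to about 100% utilization, so even this best case leaves no headroom.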
Star Schema Benchmark: InfoBright, InfiniDB and LucidDB
In my previous rounds with DataWarehouse-oriented engines I used a single table without joins, and a small (for DW) data size (see http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/, http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/, http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/). To address these issues, I took the Star Schema Benchmark, which is a modification of TPC-H, and tried to run its queries against InfoBright, InfiniDB, LucidDB and MonetDB. I did not get results for MonetDB, will [...]
MySQL-Memcached or NOSQL Tokyo Tyrant – part 2
Part 1 of our series set up our “test” application and looked at boosting its performance by buffering MySQL with memcached. Our test application is simple and requires only 3 basic operations per transaction: 2 reads and 1 write. Using memcached combined with MySQL, we ended up getting nearly a 10X performance boost from [...]
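The general idea of buffering MySQL with memcached can be sketched as the cache-aside pattern: reads try the cache first and fall back to the database, and writes go to the database and then refresh the cache. In this minimal sketch, plain dicts stand in for memcached and MySQL, and the class, key names, and transaction shape are hypothetical; it is not the test application from the series.

```python
# Minimal cache-aside sketch. Dicts stand in for memcached and MySQL; all
# names are hypothetical and this is not the series' test application.
from typing import Dict, Optional


class CachedStore:
    def __init__(self) -> None:
        self.db: Dict[str, str] = {}      # stands in for a MySQL table
        self.cache: Dict[str, str] = {}   # stands in for memcached
        self.db_reads = 0                 # counts trips to the "database"

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:             # cache hit: no database round trip
            return self.cache[key]
        self.db_reads += 1                # cache miss: go to the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value       # populate the cache for later reads
        return value

    def write(self, key: str, value: str) -> None:
        self.db[key] = value              # the database stays the source of truth
        self.cache[key] = value           # keep the cached copy in sync


def transaction(store: CachedStore, user: str) -> None:
    """One transaction shaped like the one described: 2 reads and 1 write."""
    store.read(f"profile:{user}")
    store.read(f"settings:{user}")
    store.write(f"last_seen:{user}", "now")


if __name__ == "__main__":
    store = CachedStore()
    store.db.update({"profile:alice": "A", "settings:alice": "S"})
    for _ in range(100):
        transaction(store, "alice")
    print("database reads:", store.db_reads)
```

After the first transaction warms the cache, repeated reads never touch the database, which is where this kind of speedup comes from. Some designs delete the cached key on write instead of updating it, which narrows the window for stale data under concurrent writers.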

