July 20, 2008

Missing Data - rows used to generate result set

Posted by peter

As Baron writes it is not the number of rows returned by the query but number of rows accessed by the query will most likely be defining query performance. Of course not all row accessed are created equal (such as full table scan row accesses may be much faster than random index lookups row accesses in the same table) but this is very valuable data point to optimize query anyway.
[read more...]

June 23, 2008

Neat tricks for the MySQL command-line pager

Posted by Baron Schwartz

How many of you use the mysql command-line client?  And did you know about the pager command you can give it?  It’s pretty useful.  It tells mysql to pipe the output of your commands through the specified program before displaying it to you.

Here’s the most basic thing I can think of to do with it: use it as a pager.  (It’s scary how predictable I am sometimes, isn’t it?)

[read more...]

April 18, 2008

Idea: Couple of more string types

Posted by peter

MySQL has a lot of string data types - CHAR, VARCHAR, BLOB, TEXT, ENUM and bunch of variants such as VARBINARY but I think it is not enough :)

I would also like to see type HEXCHAR which would be able to store hex strings, such as those returned as MD5() and SHA1() efficiently. With little modification it could work for UUID() as well (it adds some dashes). Currently it is quite inconvenient to deal with strings like that in MySQL. Either you store them as strings and waste space or you spend them as binary and deal with inconvenience of having not readable strings in the table OR adding UNHEX() everywhere - which also adds overhead.
[read more...]

April 9, 2008

State of MySQL Market and will Replication live ?

Posted by peter

There are interesting posts these day about future of MySQL Replication by Brian Frank and Arjen.

I think it very interesting to take a look at a bit bigger picture using framework from Innovators Dilemma and Innovators Solution.

I’m not going to speak about disruption and commoditisation of Database Market, leaving this for Market talks, my interest is market of Web Applications in general.
[read more...]

April 8, 2008

MySQL should have dynamic durability settings

Posted by peter

If you’re using Innodb tables MySQL has innodb_flush_log_at_trx_commit variable which defines how durable your transactions are. If you have high durability requirements you set it to 1 and log records are pushed directly to the disk on transaction commit. If you do not bother loosing come committed transactions you can set it to 0 and Innodb will only flush log approximately once per second. Finally you can set it to 2 which is flushes data to operation system cache (so if MySQL crashes transaction is not lost) but does not save from OS crashes or power failures.

So far so good. The only problem is many applications have different kind of transactions in the mix.
[read more...]

January 24, 2008

Enum Fields VS Varchar VS Int + Joined table: What is Faster?

Posted by Alexey Kovyrin

Really often in customers’ application we can see a huge tables with varchar/char fields, with small sets of possible values. These are “state”, “gender”, “status”, “weapon_type”, etc, etc. Frequently we suggest to change such fields to use ENUM column type, but is it really necessary (from performance standpoint)? In this post I’d like to present a small benchmark which shows MySQL performance when you use 3 different approaches: ENUM, VARCHAR and tinyint (+joined table) columns.

In practice you can also often use 4th variant which is not comparable directly, which is using integer value and having value mapping done on application level.
[read more...]

November 18, 2007

MySQL: Data Storage or Data Processing

Posted by peter

I was thinking today of how people tend to use MySQL in modern applications and it stroke me in many cases MySQL is not used to process the data, at least not on the large scale - instead it is used for data storage and light duty data retrieval. Even in this case however the cache (ie memcache) is often implemented to offload MySQL.

When it comes to large scale data processing other techniques are used. Google uses some bigtable based technologies for search and Google Analytics (which is quite typical log processing application), when you look at recommendation task as well as a lot of other personalized services they may not use MySQL (and SQL at all). In many cases you would see absolutely custom solutions are implemented, in other cases you would see Hadoop or Sphinx used to crunch data in parallel or it even could be special wrapper on top of MySQL do do the work
[read more...]

September 28, 2007

Using VIEW to reduce number of tables used

Posted by peter

Many Open Source software solutions use database per user (or set of tables per user) which starts to cause problems if it is used on massive scale (blog hosting, forum hosting etc), resulting of hundreds of thousands if not millions of tables per server which can become really inefficient.

It is especially inefficient with Innodb tables both in terms of space (some tables would keep only couple of small rows, but require at least 16K page in Innodb), keeping all tables open in Innodb dictionary and number of other challenges in IO management and recovery. For MyISAM it works better but still overhead can get significant because table_cache can’t be made large enough and so a lot of table reopens needs to happen which requires table header modification, which is costly.

Of course if you can simply rewrite software to store multiple users per table it is best way to go, however quite typically this is way too much work and also requires constant patches as new software versions come out.
[read more...]

September 18, 2007

Possible optimization for sort_merge and UNION ORDER BY LIMIT

Posted by peter

Every so often you need to perform sort results retrieved from MySQL when your WHERE clause goes beyound col=const values which would allow MySQL to still use second portion of the index for the order by. Ranges as well as IN lists make this optimization impossible, not even speaking about index merge optimization. Lets look at this example:
[read more...]

September 12, 2007

Read Buffers, mmap, malloc and MySQL Performance

Posted by peter

Monty Taylor posted interesting investigation of the fact read_buffer_size variable affects connection speed.

This is not something you would expect right ? me too. Not only global user data is expected to be cached on startup but even if it is not why would you do full table scan to fetch single user information ? Something is fishy here and perhaps bug needs to be filed either resulting in fixes or docummentation updates.
[read more...]