March 29, 2007

Speaking on RIT-2007, Moscow, Russia

Posted by peter

I’ve been invited to speak at RIT-2007 which stands for “Russian Internet Technologies” conference. It takes place 16-17 April in Moscow, Russia. It looks like this is going to be biggest Russian conference on Internet technologies so far with many large Russian Internet projects sending their delegates. You can check out conference sessions schedule here.

Andrew Aksyonoff, the author of Sphinx Search Engine will also be giving a talk there.

The day following the conference April 18 I’m giving full day “Master Class”/Training on MySQL Performance.

Even though this post is in English the conference and training will be in Russian language :)

March 28, 2007

MySQL Replication and Slow Queries

Posted by peter

I just filed a bug regarding slow queries executed by replication thread are not being logged to the slow query log.

This is not a big deal but it is ugly little gotcha which I think few people know about. It is especially bad if you’re using tools to analyze slow query log to find what queries take up the most resources on your server - in certain configurations replication load itself may be very significant contributor.

But even if you do not, as I wrote in the bug report it is quite handy to have this information our where as query times on master and slave can be different in a lot of circumstances.

One more thing to consider - slow replication queries also identify replication lag, ie if you have query which ran 10 seconds on the slave and your replication is well loaded, this means at some point in time your replication lag was at least 10 seconds.

March 27, 2007

Integers in PHP, running with scissors, and portability

Posted by shodan

Until recently I thought that currently popular scripting languages, which mostly evolved over last 10 years or something, must allow for easier portability across different platforms compared to ye good olde C/C++.

After all, their development started a few decades after C, so its notorious caveats are all well-known and should be easy to avoid when designing a new language, right?

However, PHP just brought me a new definition of “portable” - and that was when working with… integers.

[read more...]

PHP Sessions - Files vs Database Based

Posted by peter

One may think changing PHP session handler from file based to database driven is fully transparent. In many cases it is, sometimes however it may cause some unexpected problems as happened to one of our customers.

If you use file based sessions PHP will lock session file for whole script execution duration, which means all requests from the same sessions will be serialized on PHP level, which means they also will be serialized for single user on database level. If you change to store PHP sessions in MySQL instead this effect may be no more true and you may have number of requests executing for the same session at the same time. First of course means you may have your session data damaged because you will have lost session variables update from one of the script, in addition however you may run into database related issues of modifying user profile or other user/session related data in parallel, if you do not use transactions or lock tables.

So how you can get back your old file based session behavior with MySQL Sessions ?

If you have dedicated connection to session database and use Innodb tables for your session storage you can start transaction on the session start and use SELECT … FOR UPDATE to lock the session row in the session table for whole request length. On the end of the session the same row is updated and transaction is committed.

If you share session connection with other modules or do not use transactional tables for session you can use GET_LOCK to get same behavior. In the start of the session you can do SELECT GET_LOCK(’‘,10) and in in the end of the request
SELECT RELEASE_LOCK(’‘) where session_id is current session identifier. Note - setting this external lock on session name should be done before session data is read from database for things to work properly.

This approach assumes you do not use GET_LOCK in other places in your application as as soon as it is called second time previous lock is automatically released. The good thing about it however - you can use it as an extra to your current MySQL Sessions system without need to change how it works internally. If you do not use persistent connections you even do not have to release lock - as soon as connection is closed the lock is automatically released.

The value 10 in GET_LOCK is timeout in seconds - if lock can’t be granted for this amount of time it will return “0″ indicating lock was not granted in this case you can select to continue without session or may do something else, like logging error as this generally should not happen in well tuned applications.

March 26, 2007

Only Design What You Can Implement

Posted by peter

Working with various projects using MySQL I observe a lot of problems are coming from very simple fact - product is designed containing features which developers do not know how to implement effectively.

In many companies you would see “waterfall” like approach for web application development at least on business-development boundary. Business people would dream up some features for developers to implement which developers simply take and implement as good as they can - unfortunately often not good enough to match performance and scalability requirements for the application.

In the end such approach may have nasty surprises - application gets serve performance problems due to features which might be not critical after all and even Business people would rather make things running fast than having these features working exactly as required.

So what is my point ? Only plan for features which you know how to implement to match your performance goals. If you do not know how to implement something performing well - do not implement it at all. Sometimes you can find our similar feature may be implemented working much faster way, in other cases it may be most efficient to ask for help or search for help - most problems you’ve got to deal with in web applications are already solved by someone in one form or another.

Talk to Business People They may not agree to give up features they have designed but still would like to note it will require 30 servers instead of planned 10.

Make sure you both understand performance requirements This is probably problem with smaller companies rather than big ones - sometimes speaking to developers and business people you would see very different performance metrics.
For example Business guys often forget “bots” in their planning which may be responsible for 50-70% load for some projects.

In many cases poorly implemented features are actually fine from business point of view - they still work fine on the load and the database size in question. Just make sure you know when things are expected to stop working.

Doing architecture review I always ask what growth patterns for database size and load is expected - if growth is slow it might be easily to solve things which are not created scalable with extra hardware, if growth is fast implementing things differently is often needed.

March 23, 2007

Beware large Query_Cache sizes

Posted by peter

During last couple of months I had number of consulting projects with MySQL Query Cache being source of serious problems. One would see queries both for MyISAM and Innodb tables mysteriously stalling for few seconds with “NULL” in State column.

These could be troubleshooted to waiting on Query Cache which at the same time had massive amount of entries invalidated by some batch data load job.

When you should worry ? If you set query_cache_size relatively high at 256MB or more. It can be seen worse if your query cache size is in Gigabytes. At the same time check how many queries do you have in cache - Qcache_queries_in_cache - if it is in hundreds of thousands it may take a while to invalidate them. But first of all you should have something which causes massive amount of invalidations like tens-hundreds of thousands queries being invalidated by single insert - this typically happens if you have rare bulk loads rather than constant insertion.
You can also check how long FLUSH QUERY CACHE takes - if it is in seconds this is about as long as invalidation can take in extreme case.

How to solve the problem ? The easiest solution is to decrease query cache size - smaller values do not have such problems but also often will be less efficient. In many cases you may want to disable query cache at all and use external cache such as memcached which does not have this problem. You also can cause invalidations to happen more regularly but this also should affect query cache hit rate.

March 19, 2007

Onsite consulting in Sillicon Valley anyone ?

Posted by peter

I’ve already wrote we do not generally do a lot of on-site onsite consulting, but as I’m anyway going to be speaking at MySQL Users Conference anyway it may be good time for some onsite jobs. In fact I have already scheduled visits on Friday, Monday and Tuesday following the conference but I can do couple of more till the end of that week. Just let me know soon before I bought plane tickets.

I also know I virtually know a lot of guys from Sillicon Valley and SF whom I’d love to meet in person. Drop me a note if you’re interested I’ll have number of evenings and post conference weekend available.

March 18, 2007

MySQL Releases first real Community Release

Posted by peter

Few days ago MySQL released MySQL 5.0.37 which I would call first real Community Version, because unlike previous versions which were released from the same source tree as Enterprise MySQL this one has some community features added, which is great.

Also over last half a year MySQL seems to have organized resources for Community Engineering team which means we’ll likely get more cool contributions accepted into community version.

The great question however why MySQL did not announce the split just now when it really got some real meaning from technical standpoint ? I guess it is mainly to be able to sell to more customers which would not have to pay for MySQL running it under GPL but still would like convenience of having quick binary updates and running some kind of Certified version.

But lets leave this aside - this is how Sales and Marketing frequently works - lets go ahead and sell the idea and we can fill it with meaning later on if it takes on, if not - it is cheaper to scrap stuff if there were no real engineering done to support it.

The questions which I’m pretty curious right now are the following:

Are the more cool features to follow ? Jeremy’s PROFILE patches are great but I hope the patches will be continue to flow in. In fact we have submitted our slow query log with microseconds patch so we’ll see how quickly and if it will be accepted for inclusion.

How good will be quality control This is another scary thing for me. For example I have found Fedora Core quality sub par (updates breaking things in some cases) and so I use CentOS for production servers, which is effectively rebuild of RedHat Enterprise Version. Community Version of MySQL has all chances to be as stable as Enterprise or even more stable as it will be used by larger volume of users, and if there would be such nice stuff as profiling patches in it I guess many big web guys will use community version too. This is of course if MySQL applies reasonable quality assurance to accepted code.

What is about binaries ? MySQL 5.0.37 was rare exception of community release with binaries and if MySQL follows its promises we now will not see any more community binaries for few months. For us and our customers we have no problem building binaries. Some Linux distributions, FreeBSD etc will also likely have recent community releases quickly available (again assuming quality will be where) but distributions with conservative build policies, including RHEL will need third party binaries provided. Hopefully the people will settle in community which will do it.

What is about End of Life ? One thing about community versions is they are expected to reach end of life faster, so you will need to upgrade to new MySQL Version or you would be left without fixes. Same situation with Fedora Core actually was another reason to use CentOS for servers on production - Fedora Cores keep coming up and become unsupported.
Of course in theory I could upgrade servers to the new version relatively painless but I’m not venturing it for servers which are hosted on the other continent. With MySQL it is happily less painful and if new version does not work you can normally roll back easier but still. I suspect regarding this issue again will a lot depend on MySQL - if new version will be easy to upgrade to, the amount of new bugs and incompatible changes will be small - most community members will just move, if not there will be critical mass of community members reached and process of backporting patches to community version may be organized. Another good thing - unless you used some specific community features you should be good with self built enterprise version build as soon as sources will be available during for a whole life cycle of this version.

March 15, 2007

MySQL MyISAM Active Active Clustering - looking for trouble ?

Posted by peter

Reading last few days worth of planet MySQL and commenting on some entries as you can see. The post by Oli catches my attention. There is also PDF with more details available

Oli is saying you can use MySQL with Active Active Clustering and MyISAM tables if you follow certain rules like enabling external locking, disabling query cache and delayed key writes etc. This is as far as many articles on this point go, and in theory this is what should work. In practice however you should think about what exactly are you looking for with this setup.

Performance ? This configuration adds effectively global table locks besides extra overhead you get from external locking. Plus key cache have to be invalidated with remote writes which makes even less efficient. So this configuration would unlikely be well performing compared to other setups, ie replication based. If you can’t use replication because it is async and your load loads CPU more than IO subsystem it may sense configure multiple MySQL servers this way.

High availability ? This is where the biggest gotcha is waiting for you. If other MySQL node crashes it leaves table in potentially corrupted state which you need to check and repair before you can safely use it, which takes long time for large tables. Furthermore as I remember MySQL simply was not repairing corrupted tables in this configuration even if myisam_repair is set.

Space Saving ? As I mentioned replication can be better idea but may be the problem is your data is so large you can’t afford multiple copies ? Well in this case you’re risking more in terms of downtime while repairing large tables. Disk space is cheap these days so it is rarely the problem.

Stability ? But the main problem probably comes from the fact barely anyone is using this setup and there is no QA being run with it (at least was not running) which means you can’t be sure all new features are really being compatible with such setup. As Oli describes even query cache which was implemented many years ago in 4.0 was implemented without looking at this feature, so what are you expecting from newer stuff. The other stuff which Oli mentions is table cache which does not seems to be properly controlled. Manually doing FLUSH TABLES does not really works as it does not allow to prevent race condition.

Of course sometimes your Boss simply heard Active Active clustering is cool and you should just set up that one with MySQL is yet another argument to use it :)

MySQL Query Cache and prepared statements - support comming ?

Posted by peter

Looking at post by Konstantin Osipov we should finally get Query Cache working with prepared statements in MySQL 5.1

The interesting thing Konstantin notes it just took few days to fix it - I believe MySQL Support Team alone spent much more times explaining customers why is it not done or troubleshooting cases with wrong customer assumptions regarding this problems. I think I alone spent tens of hours in various cases related to this problem.

This is one thing MySQL Development Management could consider - if the thing is annoying and it does not take a long to fix it why not to go ahead and make it done instead of spending longer time explaining over and over again why this stuff is not high priority :)

This feature should help MySQL getting even better results in SpecJAppServer benchmarks as previously it was hard choice - if you want to have fast prepared statements without query cache or no prepared statements with it.

Would this make me use/advice Query Cache more ? Not really. There are enough of over limits with Query Cache which makes it best for small simple applications.


This page was found by: mysql fixing connect... mysql remote connect... mysql "community fea...