May 30, 2006

InnoDB memory usage

Posted by Vadim

There are many questions how InnoDB allocates memory. I’ll try to give some explanation about the memory allocation at startup.
Some important constants:
NBLOCKS=count of block in innodb_buffer_pool = innodb_buffer_pool_size / 16384
OS_THREADS= if (innodb_buffer_pool_size >= 1000Mb) = 50000
else if (innodb_buffer_pool_size >= 8Mb) = 10000
else = 1000 (it’s true for *nixes, for Windows there is a bit another calculation for OS_THREADS)
So InnoDB uses:

  • innodb_buffer_pool
  • innodb_additional_mem_pool_size
  • innodb_log_buffer_size
  • adaptive index hash, size= innodb_buffer_pool / 64
  • system dictionary hash, size = 6 * innodb_buffer_pool_size / 512
  • memory for sync_array, which is used for syncronization primitives, size = OS_THREADS * 152
  • memory for os_events, which are also used for syncronization primitives, OS_THREADS * 216
  • and memory for locking system, size=5 * 4 * NBLOCKS

So the final formula for innodb:
innodb_buffer_pool_size + innodb_log_buffer_size + innodb_additional_mem_pool_size + 812 / 16384 * innodb_buffer_pool_size + OS_THREADS * 368

For simplicity we can use: 812 / 16384 * innodb_buffer_pool_size ~~ innodb_buffer_pool_size / 20
and OS_THREADS*368 = 17.5MB if innodb_buffer_pool > 1000MB
= 3.5MB if > 8MB

For example if you have innodb_buffer_pool_size=1500M, innodb_additional_mem_pool_size = 20M, innodb_log_buffer_size = 8M, InnoDB will allocate = 1500M + 20M + 8M + 1500/20M + 17.5M = 1620.5M.
Take the additional memory into account when you are planning memory usage for your server.

May 29, 2006

Join performance of MyISAM and Innodb

Posted by peter

We had discussion today which involved benchmarks of Join speed for MyISAM and Innodb storage engines for CPU bound workload, this is when data size is small enough to fit in memory and so buffer pool.

I tested very simple table, having with about 20.000 rows in it on 32bit Linux. The columns "id" "i" and "c" were populated with same integers so we can allow the same job to be done using different kinds of columns - primary key, integer indexed column and indexed char column. The query is also trivial - the point was to make sure it is not index covered query so it reads the rows and it does not return many rows. I varied the join clause to be id, i and C columns appropriately.

SQL:
  1. CREATE TABLE `t1` (
  2.   `id` int(10) UNSIGNED NOT NULL DEFAULT '0',
  3.   `i` int(10) UNSIGNED NOT NULL DEFAULT '0',
  4.   `c` char(15) DEFAULT NULL,
  5.   `pad` char(8) DEFAULT NULL,
  6.   PRIMARY KEY  (`id`),
  7.   KEY `i` (`i`),
  8.   KEY `c` (`c`)
  9. ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
  10.  
  11. SELECT count(t1.pad),count(t2.pad) FROM t1,t1 t2 WHERE t1.id=t2.id;

The result I've got are as follows

Storage Engine ID I C
MyISAM 0.24s 0.27s 1.19s
Innodb 0.07s 0.30s 0.38s

As you see in such circumstances Innodb is actually faster than MyISAM in 2 cases out of 3. I guess the reasons are the following:

  • Innodb primary key joins are very fast as data is clustered together with index and generally highly optimized
  • Innodb builds hash indexes which helps to speed up lookup by indexes by passing BTREE index and using hash, which is faster
  • MyISAM does compression for character keys which makes it perform slower for random lookups
  • MyISAM generally has lower processing overhead due to its simplicity
  • MyISAM still a bit better by primary key join than for secondary key join. I guess because it knows for sure there is no more than one row which matches the index, so there is no need for MySQL to request next row matching index

Note: This applies to CPU bound workload with all content fitting in memory. In other cases situation is very different and MyISAM compression for char keys could frequently positevely impact performance.

INSERT ON DUPLICATE KEY UPDATE and summary counters.

Posted by peter

INSERT ... ON DUPLICATE KEY UPDATE is very powerful but often forgotten MySQL feature. It was introduced in MySQL 4.1 but I still constantly see people unaware of it.

Myself I like this feature big deal because it is designed in truly MySQL style - very efficient solution for freqent task while keeping it beautiful and easy to use.

So what is this feature great for ? Well any kind of maintaining counters. If you're writing traffic accounting it could be traffic and number of packet passed for given port or IP address. For Web Applications it could be counting number of visits per page or IP address, number of times particular keyword was searched etc.

This functionality also makes it very easy to do incremental single pass log file processing and building summary tables.

Here is example:

SQL:
  1. CREATE TABLE ipstat(ip int UNSIGNED NOT NULL PRIMARY KEY,
  2.                           hits int UNSIGNED NOT NULL,
  3.                           last_hit timestamp);
  4.  
  5. INSERT INTO ipstat VALUES(inet_aton('192.168.0.1'),1,now())
  6.                        ON duplicate KEY UPDATE hits=hits+1;

This example actually shows one more neat feature of MySQL - inet_aton and inet_ntoa functions which can convert IP address strings to integers and back. This allows to save on field length significantly by using 4 bytes instead of 15

The third feature this example takes advantage of is TIMESTAMP field. By default first TIMESTAMP column will have its value automatically updated to current timestamp on insert and update. We actually could have omitted now() in insert clause but this would require to specify list of columns which we skipped for sake of example.

So how would this example work ? Well just you would expect it. If there is no such IP address in the table it will be added with hits=1 if it is already where (note ip is PRIMARY KEY) it would be just incremented and last visit timestamp updated.

The benefit of using this feature insted of INSERT + UPDATE could be different, depending on number of new rows and data set size. 30% speedup should be typical. Performance increase is not the only benefit - what is even more important the application code becomes simplier - less error prone and easy to read.

May 27, 2006

MySQL SpecJAppServer2004 results

Posted by peter

Sun has now published SpecJAppServer2004 benchmark results with MySQL. The results are pretty good as we can see and it is also good to know some room for tuning remains so we can hope getting even better results this time. Jenny Chen published good MySQL Tuning writeup so I would not repeat it. It was great colaboration between Sun and MySQL teams which allowed us to deliver these results.

Jeremy Cole on MySQL Replication

Posted by peter

Jeremy Cole recently posted very nice post about MySQL Replication Performance. There are however few points I should comment on.

  • Jeremy Speaks about MyISAM and Innodb in the same terms, in fact they are quite different for Replication. As you probably know MyISAM uses buffered IO so if master crashes you're out of sync anyway - whatever sync_binlog option was set to. Well it is still a bit better with sync_binlog enabled as slave could have consistent data after it catches up but master may have some of its tables out of sync with binary log and it is not easy to find which.
  • I'm surprised to read enabling binary log drops update performance. I guess Jeremy speaks about enabling binary log and setting sync_binlog option which is good for safe replication. But it is setting which is off by default so you should not see such effect just by enabling log_bin.
  • Innodb tables also have to flush logs on transaction commit unless you disable it so you should not be getting 1500 update transactions per second from the single thread without battery backed up cache. If you do it is quite possible Operation system is faking fsync() for you. In certain cases it only does so for sequential writes this is why sync_binlog changes things. Two logs are synced now which requires seeks which make certain disks to flush their cache.
  • Watch out for MySQL 5.0 - if you enabled Binary log with MySQL 5.0 you loose group commit which can dramatically increase commit rate for multiple user load. See this post by Vadim for benchmark results.
  • Solution with Battery Backed up RAID is great one and is not that expensive these days. Other two probably just create conditions for Operation System to fake fsync. You need to get data to the disk and you can't physically get data to the disk surface at 1500 operarions per second. Jeremy is speaking about seeks but it is only part of the latency. Disk rotation is another big contibutor. Even if we stay on the same track we'll be only able to do single disk write per rotation, which is about 250 for 15000 RPM drive. If you're getting more without battery backed up cache something is likely faking it.

I also should mention this covers only Master overhead of Replication. Slave is where you can expect more problems, especially on high end systems. Slave serializes all operations to single stream so group commit can't work. Furthermore even if you run with innodb_flush_logs_at_trx_commit=2 you still do not solve all problems. You still will practically have only One CPU to handle replication load. Furthermore if you have multiple disks they will not be used effectively as replication SQL thread will typically submit IO requests to the disk one by one so only one drive will do the work and other will be idle. You mostly really affected by this if you have a lot of writes to large database size.
This is one more reason why you might want to go with scale out and keep single server a box with few CPUs and few hard drives while running MySQL.

May 26, 2006

MyISAM mmap feature (5.1)

Posted by Vadim

As you know MyISAM does not cache data, only indexes. MyISAM assumes OS cache is good enough and uses pread/pwrite system calls for reading/writing datafiles. However OS is not always good in this task, my benchmarks show Linux/Solaris aren't scalable on intensive pread calls (I believe the same for Windows, but I did not test it).
In 5.1 I implemented a new feature: memory mapping for the datafiles. That can be enabled by --myisam_use_mmap=1 startup option.
In this case instead of systems call MyISAM will use memcpy function. There is a memory addressing limit for 32bit platforms - 2Gb, so the datafiles over 2GB will be used the old way - pread/pwrite functions. Mmap is available on all POSIX-compatible platforms. It will work faster for SELECT/UPDATE/INSERT inside file queries, and no performance gain (maybe a bit slower) for INSERT at the end of file. In case with INSERT at the end of file we have to use a remap technic - resize memory mmaped area to new extended size. Currenlty we call remap once per 1000 inserts at the end of file and on an exclusive operation (DELETE/UPDATE/INSERT inside file), for work with non-mmaped area we use pread/pwrite functions.
To approve effectiveness of memory mapping several benchmarks:

[read more...]

May 22, 2006

How Web Services world affect LAMP Stack

Posted by peter

In connection to my previous post I was thinking how Web 2.0 with its massive use of Web Services affect LAMP Stack ? Well actually there are serious difference both for client and server application development which you might want to take into account.

If you're developing Web Service Server application you will most likely be dealing with two types of requests

  • Real time requests. These are executed during web page generation on the client and add up to its response time. These are important to be answered as fast as possible. If you can't service them it time it might be better to return partial results or error code - not everyone is good at handling timeouts properly. One more nice feature you might consider is allowing to execute multiple operations with single request. This both allow you to service them in parallel and reduce number of round trips which are very expensive.
  • Batch Operations. Web services are also often used in batch mode - where client just needs to process certain data using Web Service. In this case you do not really care about response time of individual requests it is throughtput which is important. Such requests are also might be treated with idle priority - batch processing often can be done during the night when real time load is light. Actually even certain web requests may fall into this category - for example Search Engine bots often much more patient than users and can wait for page generation longer. Of course Batch operations would also benefit from ability to submit multiple requests at the same time, as well as keep alive.

There are not many web services out where which make this difference but I think it would be quite useful especially with Business to Business web services which might expect competent users using them.

There are also number of changes on the client application, which is frequently yet another web application:

  • More workers required. In my post yesterday I mentioned you might need only few "workers" - actively working web server processes, if you do not have to wait for slow clients. Now you however also have to wait for remote services during web page generation which must be done while worker is active. So you end up needing much more workers, especially dealing with slow web services.
  • Timeouts. Web services might get slower themselves or network connectivity to them might slow things down so proper timeout handling is a must. If you use multiple web services you might want to use timeout for total call times during same web page generation. For example you might want your page to be generated in 5 seconds and if first web service responded in 4 seconds second one will have only one second to respond. You might use different values but it is better to display page with partial information than no page at all. You also might use certain flags to avoid querying service if it is down and thus avoid having each page generated with maximum allowed response time.
  • Graceful failure. Web services are out of our control so are networks, so you should plan for web services to be too slow or unavailable sometimes. It is great if you can have partial service functionality even in such circumstances.
  • Intelligent caching. Working with local data we typically cache in memory (possibly network memory) as it is normally fast enough to regenerate the data. With Web services it is slower plus you might have limits on how frequently you can call Web Services. So you might want to cache data on Disk - in files or MySQL tables. There are multiple solutions available out where. In addition to improving speed of your application caching couple be used to provide graceful failure. It is often better to show a bit stale data rather than show no data at all.

There are many other interesting challenges and questions in Web Services world, so I should write more on this topic sometimes.

May 21, 2006

Speedup your LAMP stack with LigHTTPD

Posted by peter

Apache Web server could frequently be bottleneck for your LAMP stack, or you might be wasting resources on your web servers. The problem with Apache server comes from "client per process" architecture which you have to stick to at least for dynamic content served by PHP or ModPerl. With "client per process" you end up having large amount of processes if you have many clients. As processes service all requests they can get you end up having a lot of memory allocated per process which is not fully released to OS even if process is serving static requests or doing keepalive. So why do you need many processes ? Keep alive is one thing, which you can disable of course. But the real problem is slow clients which need to be spoon feed. This is especially the problem for large pages and files but with bad/slow networks even small objects may cause the stall. There are multiple solutions and workarounds to this problem:

  • Just get more memory/more boxes. The good thing with Web servers is - it is easy to scale, this is if you have money to waste of course. Just get more servers, beef them up with memory and you should be fine. If this is your choice you can stop reading :)
  • Restrict resources. You can limit number of Apache children (MaxClients 50) , disable keep alive (KeepAlive Off) and relay on Operation System connection buffering (ListenBacklog 1024) but such approach is often bad for site performance. Even if you can balance it to work right in normal case, minor network or database server slowdowns may cause overload.
  • Use LingerD. I have not ever used this one, but in theory it should relieve apache from closing connection which is part of the job, however it does not play good with keep alive and does not eliminate all problems.
  • Use FastCGI instead of mod_php etc. Apache itself without any programming language modules is light, futhermore you may use threaded worker models to keep overhead lower. However FastCGI is a bit less tested interface, so some software might not play well with it, some, including APC requires careful configuration to work like you expect it to work.
  • Use other server, ie lighttpd with FastCGI. Honestly I like this solution more - it allows you to get rid of all the fat Apache server has and get efficient event driven socket processing. This is however hard step as it may require changes in your application if you used apache specific modules - ie mod_rewrite, .htaccess, apache specific environment variables etc. So this is typically the path for new applications which do not require legacy applications.
  • Use Reverse proxy in front. You may configure other Apache server to use as reverse proxy, you may also use Squid, Lighttpd, Nginx etc, in fact Most of light http servers support such configuration. There are several benefits in this configuration. One obvious one is Caching - it however requires you to provide cache expiration information for dynamic contents etc which your applications might not be doing at this point. The other benefit is more important for scope of this article. Now you have proxy server spoonfeeding the clients, doing keep-alive and your apache is relieved from this job, so you can shrink number of needed children dramatically. One more benefit is load balacing and HA - you may have round robin or random request routing to your proxy servers, which pretty much any load balancer can do, but you may configure each of them to do hash routing to real servers (especially for static content this may help to increase file cache efficiency dramatically). Such solution also typically allow you to relieve Apache from serving static content - Proxy may be configured to handle only dynamic requests, serving static files by itself. The tiny but annoying with such configuration could be logging. Instead of real IP of your clients you would see 10.0.0.1 or whatever IP your proxy server is located. Just recently I found out - there is great module out where - mod_rpaf which can replace remote host with IP passed by your local proxy. So now you can put reverse proxy infront of your Apache web server in 10 minutes and you will likely need no changes in your application.
  • Dedicated server for static content. If you serve a lot of static content - music, videos, file download I guess it is good idea anyway as such servers might be configured differently. You also may have different routing policies set for these servers - you want Web pages do be real time while you do not really care about latency for large files download. If your mostly static contents are few images used in pages, setting them to be directly services by reverse proxy servers could be enough.

The note about LigHTTPD - I like this one in particular because it is really easy to set up, it is pretty stable which is proved by number of major sites using it. It has good set of modules, decent documentation and community helping to develop it. So it is not single guy project any more. I however can't say it is best one - I simply did not test too much of other alternatives as it already worked well. There are few features which it lacks and which could make it faster. Few missing items I should mention:

  • Persistent connections to FastCGI servers
  • Keep alive from Reverse proxy to web server
  • Async IO for serving large large files

The good thing I hear most of these are already in works. Thanks Jan and Lighttpd contributors.

May 19, 2006

Group commit and XA

Posted by Vadim

Returning to post Group commit and real fsync I made several experiments:

I ran sysbench update_key benchmarks without ---log-bin, with ---log-bin, and with ---log-bin and ---innodb-support-xa=0 (default value is 1). Results (in transactions / sec)

threads without ---log-bin ---log-bin ---log-bin and
---innodb_support-xa=0
1 1218.68 614.94 1010.44
4 2686.36 667.77 1162.60
16 3993.59 666.14 1161.56
64 3630.55 665.18 1151.36

So we can see group commit is not only broken when XA is enabled but also if XA is disabled but binary log is enabled. Performance without XA can be twice as good as with XA if binary logs are enabled as Innodb will need to flush its log buffer only once. So, if you are using ---log-bin with innodb tables it makes sense to set ---innodb-support-xa=0

May 17, 2006

MySQL Server Memory Usage

Posted by peter

Every so often people ask me the question how should they estimate memory consumption by MySQL Server in given configuration. What is the formula they could use.

The reasons to worry about memory usage are quite understandable. If you configure MySQL Server so it uses too small amount of memory it will likey perform suboptimally. If you however configure it so it consumes too much memory it may be crashing , failing to execute queries or make operation to swap seriously slowing down. On now legacy 32bit platforms you could also run out of address space so that had to be watched as well.
Having said so, I do not think looking for the secret fomula to compute your possible memory usage is the right approach to this problem. The reasons are - this formula is very complex nowadays and what is even more important "theoretically possible" maximum it provides have nothing to do with real memory consumptions. In fact typical server with 8GB of memory will often run with maximum theoretical memory usage of 100GB or more. Furthermore there is no easy "overcommit factor" you can use - it really depends on application and configuration. Some applications will drive server to 10% of theoretical memory consumptions others only to 1%.

So what could you do instead ? First take a look at global buffers which are allocated at start and always where - these are key_buffer_size, innodb_buffer_pool_size, innodb_additional_memory_pool_size, innodb_log_buffer_size, query_cache_size. If you're using MyISAM seriously you can also add the size of Operation System cache you would like MySQL to use for your table. Take this number add to it number of memory Operation System and other applications need, add might be 32MB more for MySQL Server code and various small static buffers. This is memory which you can consider used when you just start MySQL Server. The rest of memory is available for connections. For exampe with 8GB server you might have everything listed adding up to 6GB, so you have 2GB left for your threads.

Each thread connecting to MySQL server will needs its own buffers. About 256K is allocated at once even if thread is idle - they are used by default thread stack, net buffer etc. If transaction is started some more space can add up. Running small queries might only barely increase memory consumption for given thread, however if table will perform complex operations such as full table scans, sorts, or need temporary tables as much as read_buffer_size, sort_buffer_size, read_rnd_buffer_size, tmp_table_size of memory might be allocated. But they are only allocated upon the need and freed once given stage of query is done. Some of them are allocated as single chunk at once others, for example tmp_table_size is rather maximum amount of memory MySQL will allocate for this operation. Note it is more complicated than once may think - multiple buffers of the same type might be allocated for exampe to handle subqueries. For some special queries memory usage might be even larger - bulk inserts may allocate bulk_insert_buffer_size bytes of memory if done to MyISAM tables. myisam_sort_buffer_size used for ALTER TABLE, OPTIMIZE TABLE, REPAIR TABLE commands.

For OLTP applications with simple queries memory consumption is often less than 1MB per thread with default buffers, and you really do not need to increase per thread buffers unless you have complex queries. Sorting 10 rows will be as fast with 1MB sort buffer as with 16MB (actually 16MB might be even slower but it is other story).

Another approach you may take is to come up with amount of memory you want MySQL Server to consume at peak. This can be easily computed by memory needed for OS, File Cache and other applications. For 32bit envinronment you also should keep 32bit limits into account and probably limit "mysqld" size to about 2.5GB (exact number depens on a lot of factors). Now you can use "ps aux" to see VSZ - Virtual Memory allocated by MySQL process. You can also look at "Resident Memory" but I find it less helpful as it may down because of swapping - not what you would like to see. Monitor how the value changes so you know memory requirements with current settings and increase/decrease values appropriately.

Some may say, Hey we want to have 100% guarantee our server will never run out of memory, no matter which queries or users will decide to run. Unfortunately this is as much close to impossible to be impractical. Here is why:

List of rarely considered MySQL Server Memory Requirements

  • Thread buffers can be allocated more than once for each thread. Consider for example subqueries - each layer may need its own read_buffer,sort_buffer, tmp_table_size etc
  • Many variabes can be set per connection. So you can't relay on global values if developers may use their local values to run some queries.
  • There can be mutiple key caches. Multiple key caches can be created to accomodate query executions
  • Query Parsing and optimization needs memory. This is usually small to be ignored but certain queries can have very large memory requrement for this step, especially specially crafted ones.
  • Stored Procedures. Compex stored procedures may require a lot of memory
  • Prepared statements and Cursors. Single connection may have many prepared statements and cursors. Their number finally can be limited but each of them still can have very large memory consumption
  • Innodb Table Cache. Innodb has its own table cache in which meta data about each table accessed from the start is stored. It is never purged and may be large if you have a lot of tables. It also means user having CREATE TABLE privilege should be able to run MySQL server out of memory
  • MyISAM buffers. MyISAM may allocate buffer which is large enough to contain largest record in the given table which is held until table is closed.
  • Federated Storage Engine. This may have unbound memory requirements retriving result sets from remove queries.
  • Blobs may require 3x time of memory. This is important if you're deaing with large Blobs (your max_allowed_packet is large) Processing of 256MB of blob may require 768MB of memory.
  • Storage Engines. In general storage engines may have their own per thread or global memory allocations which are not tuned as buffers. Watch for these especially now with many storage engines being released for MySQL by various parties.

I do not pretend this to be complete list. On the contrary I'm quite sure I've missed something (drop me a note if you have something to add). But the main point is - there are a lot of memory consumers out where and trying to find peak possible usage for each is impractical - so my advice would be measure what you get in practice and how memory consumption reacts to changing various variables. For example you may find out increasing sort_buffer_size from 1MB to 4MB and 1000 max_connections increases peak memory consumption just 30MB not 3000MB as you might have counted.


This page was found by: innodb_additional_me... innodb_buffer_pool mysql maximum query ... lighttpd index.so increase innodb_buff...