April 25, 2014

Opening Tables scalability

I was restarting MySQL on box with 50.000 of Innodb tables and again it took couple of hours to reach decent performance because of “Opening Tables” stage was taking long.

Part of the problem is Innodb is updating stats on each table open which is possibly expensive operation, but really it is only great test case for general MySQL problem.

During warmup process I noticed I get very low CPU usage and disk Usage and IOWait about 25% (4CPU box) which indicates table opens and stats updates are serialized rather than performed in parallel.

I’ve checked with Heikki and he confirmed MySQL has global table cache mutex (LOCK_open) which is held for all open table operation so table opens are serialized.

Do not think however it is only Innodb problem. I’ve seen similar problems with MyISAM – these also take few IO operations to open and could take quite a while to close if there were unflushed key blocks and we have no idea what other storage engines may do to perform table open – some may need network operation etc.

In general I think it is extremely poor design choice to have global mutexes for anything which may require blocking physical IO or network operation if you care about scalability. MySQL has same problem with key buffer but that was fixed in MySQL 4.1, this one still remains.

I do not blame Monty – implementing first MySQL version in 3 months he had to take shortcuts and implement most simple solutions for many things. Now years later and having 50+ developers this should have been fixed.

Until this is fixed it is especially important to keep your table_cache large enough so table opens will be rare (I use 1/sec as a number to worry about) and also worry about possible limited performance while table cache is being warmed up in addition to all other caches :)

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. med amine says:

    I am an engineering student, and I am working on the subject of mysql database sharding. ds you can suggest me books or links useful for my work .. thank you in advance ..

  2. Jay Pipes says:

    Hi Peter!

    Tell me, does the table_definition_cache/table_open_cache split in 5.1 resolve these issues, or is that a separate thing?

    Cheers!

    Jay

  3. Frank Mash says:

    Hey Peter,

    Wassup man.

    Why do you have 50,000 tables on one server?

    Just curious :)
    Frank

  4. peter says:

    Jay,

    Honestly I do not know. Have not tested with 5.1 as that one is production. You can check with developers and proudly tell me “all table cache global locks are removed in 5.1″ if it is the case :)

  5. peter says:

    Frank,

    50.000 tables is probably a bit too many but many tables per server often is much better then single set of large tables, even if you use Innodb.

    You can get better clustering of data together so local operations are faster, each of tables fits in memory completely so batch operations like purging etc go much faster than they would go with single large table, not to mention ALTER TABLE, OPTIMIZE TABLE etc.

    It also makes it easier to transfer table to other server if needed etc.

    I know it looks quite contradicting to DBMS theory which teaches to use large table and let DBMS to do it job but in practice it may not be always best :)

  6. James Day says:

    Frank, there are people who have more than 100,000 tables on a server. It’s one reason why the InnoDB data dictionary in RAM was made 50-70% smaller in 5.1 and why a couple of other InnoDB changes have been made in 5.0 and 5.1 to speed up handling of lots of tables.

    We had a feature request recently from someone who wanted more than 32,000 databases on a file system that couldn’t handle that many subdirectories, so the MySQL database count was limited by that. No current plans to do this.

    The InnoDB stats continue to be quite frustrating, since the sample size isn’t big enough to give consistent query results at larger table sizes and the overhead at opening is annoying. It’d be nice if it remembered the stats and refreshed occasionally, or perhaps added to them in the background during use.

  7. peter says:

    James

    I honestly do not think MySQL should work around file system limits (this is about more than 32000 databases) plus there is solution to that problem – use symlinks – this way you can get as many directories as you want :)

  8. thomas says:

    or just use a decent filesystem :).

  9. Alexey says:

    With 32000 databases the real bottleneck would be MySQL privilege system. It becomes a problem when you have 5000 databases, and with 10000 databases you’ll have to fix some things in the source – otherwise queries like SHOW DATABASES will use 100% of your CPU for a couple of minutes.

  10. peter says:

    Alexey,

    Why would privilege system cause the problems ? Or are you speaking about special case of web hosting then there would be 32000 users created for each database. With handful of users I never seen it being the problem.

    Also “SHOW DATABASES” is probably not the query you really care about (It does not mean it should run for minutes though) – most queries would only need to check permissions to few databases/tables.

  11. Alexey Polyakov says:

    Yes, I was speaking about web hosting case. Companies that sell shared web hosting services often place all of their customers’ DBs to a dedicated boxes. Most of DBs do not produce high load so it’s ok to stuff thousands or even tens of thousands databases on a single server.
    I’m not sure why one would want to create thousands of DBs belonging to a handful of users. :)

    Regarding ‘show databases’ – you have to care about it in a web hosting case. phpMyAdmin calls it twice when you open it’s main page. 10 users actively working with phpMyAdmin can ruin the performance of otherwise not-heavily-loaded server.

  12. Alexey says:

    Here’s a piece of profiling data from live system, running lot of DBs without too much load (about 80 qps, select-heavy with pretty good hit rate):
    root@titanic [/var/lib/mysql]# ls -la|wc -l
    24547
    root@titanic [/var/lib/mysql]# opreport -l *mysqld|head
    warning: /oprofile could not be found.
    CPU: AMD64 processors, speed 2394.07 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0×00 (No unit mask) count 100000
    samples % image name symbol name
    10174899 26.6171 libc-2.3.4.so strcmp
    5460631 14.2848 mysqld acl_getroot(THD*, user_resources*, char const*, unsigned int)
    2592977 6.7831 libc-2.3.4.so memcpy
    1956617 5.1184 vmlinux-2.6.17.7 copy_user_generic_c
    1797812 4.7030 mysqld Query_cache::insert_into_free_memory_list(Query_cache_block*)
    1685928 4.4103 mysqld my_lengthsp_8bit
    1066380 2.7896 mysqld anonymous symbol from section .plt

    First two functions are privilege-system related (strcmp is used for comparing logins, filenames etc., mysql uses it’s own functions for matching data). As you can see those two functions alone consume more than 40% of cpu time. That’s after fixing show databases problem, with show databases the number would be more like 95%. :)

  13. Matt says:

    If anyone is interested in improving the way MySQL deals with a ton of table we (WordPress.com) would happily sponsor that development. We have about 5 million tables today and it’s going up constantly. I’m reachable on my site.

  14. peter says:

    Alexey,

    OK This is different story when privilege check system comes into play. I guess it was not optimized well to handle such large amount of tables. I remember seeing quite a few of list scans when I worked with code few years ago.
    I would suggest to post it as a bug and hope it would be fixed sometime.

  15. The 32000 limit is in ext2 and ext3 (still not fixed for ext4, although there is an old patch to make it 64000). If you happen to be using ext as a file system with a lot of either databases or file systems, you might try adding dir_index and then do a tune. Don’t know what the speed up is, your mileage may vary.

    Matt: why 5 million tables? We host 500,000 forums and don’t give each one its own set of tables, it was just too much work and too slow and complex.

  16. peter says:

    Steven,

    Alexey is saying MySQL Privilege system may be slow with so huge number of databases and a lot of grants on database level.

    Regarding Matt, as I understand Matt is using standard software as a base which has its own set of tables for each user. If you have something custom having so many tables is of course not good idea.

  17. Hakan says:

    WordPress.com hosts a free blog system running WordPress Mu (multiuser wordpress) and it creates a new set of tables for every new blog.

    /Hakan

  18. peter says:

    This actually was always curious for me why WordPress MU was not fixed to operate with smaller amount of tables rather than create single table set for each users.

    You can blame MySQL for certain performance problems but generally dealing with a lot of files is slower with most of file systems.

  19. Sysadm says:

    We have forum hosting community – around 3 500 000 myisam tables per each server. Each forum gets it’s own database (~90 tables in one database) . We have similar issue – when server is overloaded, status of all threads in “show processlist” is “opening tables”. We have many php and system tweaks but this problem seems to be hard to resolve without MySQL source code rewrite -we have not enought knowledge to do it. It’s MySQL 4.0.

    We have tried to increase table_cache but it doesn’t make any sense with such many useable tables – best performance setting for us is table_cache=0.

    Now it’s hosted on Linux 2.6 & XFS which supports more than 32K subdirectories in one directory. Reiserfs supports too, but it’s terribly unstable within such load and brokes it’s journal itself very often, and reiserfs is not visible faster in this than XFS.

    I’m afraid of putting such many tables into one database – this cause worse performance problems – like hosting >10 000 000 files in one directory (3 files per table)…

  20. art says:

    Does mysql open all innodb tables at startup or only with query on that particular table?

  21. peter says:

    MySQL only opens tables once you access them.

  22. We work on the basis of a different database for each new company account. The problem I’m starting to have with 5000 databases is the show databases and use database commands are taking a long time to run and are showing up in my slow-queries log. Are there any tips for making this go faster?

  23. Mattias J says:

    We have some 2000 databases with the same set of 250+ tables (both numbers constantly growing) = 500’000 tables and counting on the same server.

    As we are trying to scale out, our main options seems to be either “manually” partition different databases to different servers, or putting all the data in a single set of large tables in combination with built-in MySQL Partitioning.

    But how does MySQL Partitioning affect table opening / tables cache???

    Are there other options we should consider?

  24. Sergei says:

    And I was wondering if MySQL can handle a few thousands of tables. Compared to wordpress’s numbers, this is like a toy project :-)

  25. Andreas Bergman says:

    Hi,

    Do you know if this issue is fixed in 5.5?

    We run a WP multisite with about 200k tables and as tables in open_table cache increase the execution time of the query increase and at last there is always 400 querys running and most of them in opening/closing state. If I restart the process it works like a charm for a couple of hours, and then it starts all over again.

    Any ideas?
    Andreas

Speak Your Mind

*