August 11, 2006

Database problems in MySQL/PHP Applications

Posted by peter

An article about database design problems is being discussed by Kristian.

Both the article itself and the response cause mixed feelings, so I decided it is worth commenting:

1. Using mysql_* functions directly. This is probably bad, but I do not like the solutions proposed by the original article either. PEAR is slow, as are other complex connectors. I have not yet tested PDO but would not expect it to beat MySQLi in speed. It is however a bad idea to use mysql_ functions directly as well - I would go for the mysqli object approach. The great thing about objects is that you can easily override methods to get debugging and profiling tools, as well as tools which protect you from SQL injection.

For example, I have a little wrapper which allows you to do $dbcon->query("Select email from user where name=%s",$name) - the wrapper will detect that query is being called with multiple parameters and will perform the needed checks and query rewriting. You can also use a pretty much direct path to the mysqli extension for performance-critical queries if you need to.

I would also note that for many PHP applications the abstraction layer is not the main performance problem, and the benefit from persistent connections can be much more modest. DVD Store was a special type of application, designed to have very simple logic besides the database - in most cases you would have beautiful page rendering as well as many more queries per page, which will make the performance improvement much smaller. A notable exception is AJAX applications, which may do very little work and formatting, so the database connection may become the issue. Caching should be a good help in this case though.

About consulting - it is worth mentioning that it was my group which did the Dell DVD Store optimization, and I'm now on my own, offering MySQL and LAMP Consulting Services.

2. Not using auto_increment functionality. This is right, with some exceptions however. For example, InnoDB tables take an internal full table lock if auto_increment is used, so using values generated elsewhere might be faster.

3. Using multiple databases. Honestly, I do not see applications using one database per table that often. I do however often see applications using multiple databases to group tables by certain logic, much as you use directories to group files. I think this makes a lot of sense. Sometimes the grouping is done so that a lot of databases are needed - for example if grouping is done by user. This might be a bit extreme if you have thousands of users - I would rather do a many-to-many relationship between users and tables, but it also might work.

Regarding "if you use many tables you're doing something wrong": this is frequently said by people with a traditional database background. Things are different with MySQL.

There are many successful applications using tens of thousands of tables per host and achieving great performance by doing so.
Using multiple tables gives some very important benefits - your data becomes manageable, and your ALTER TABLE or OPTIMIZE TABLE now locks a small table for a few seconds rather than a giant 100GB table for a few hours, so it can be done pretty much online. You also get good data clustering, so a user's table becomes hot very quickly due to data locality once this user starts his queries. It is also much easier to do backup and restore if you need only a portion of your data recovered.

There are some performance problems with many tables: some are OS and file system dependent, others relate to the InnoDB storage engine, or to using the innodb_file_per_table option in particular.

4. Not using relations. This one is right, but it also comes with a catch. Normalizing your data is a very traditional recommendation, however it does not always bring good performance. Joins are expensive and you can often do much better with denormalized data. You may wish to use the denormalized data as a cached lookup table, however, so you do not have all these problems with losing data etc. Read more in my Why MySQL Could be slow with Large Tables article.
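One way to read the "cached lookup table" idea is sketched below, in Python with sqlite3 standing in for MySQL. The table and function names (users, articles, article_listing, rebuild_article_listing) are hypothetical. The normalized tables stay the source of truth, and the denormalized table is only a cache that can be thrown away and rebuilt from them at any time, so nothing is lost if it goes stale or is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source of truth.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'peter'), (2, 'vadim');
    INSERT INTO articles VALUES (10, 1, 'Indexes'), (11, 2, 'Caching');
""")


def rebuild_article_listing(conn):
    """Recreate the denormalized cache table from the normalized tables."""
    conn.executescript("""
        DROP TABLE IF EXISTS article_listing;
        CREATE TABLE article_listing AS
            SELECT a.id AS article_id, a.title, u.name AS author_name
            FROM articles a JOIN users u ON u.id = a.user_id;
    """)


rebuild_article_listing(conn)

# Page rendering reads the cache table and needs no join at request time.
listing = conn.execute(
    "SELECT article_id, title, author_name FROM article_listing ORDER BY article_id"
).fetchall()
```

How often to rebuild, or whether to update the cache incrementally instead, depends on how stale the page is allowed to be.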

5. The n+1 pattern. This probably should rather be called "Not using joins". This is a typical error. On the other hand, in MySQL you might be better off using several queries than doing complicated ones. Of course you would rather use IN() than do 100 queries in this case. This mostly applies to subqueries, where subselects with IN() become correlated even if they are not, so you are better off using an IN() list of values derived by a previous query. For example you can do:

SQL:
  SELECT id FROM users WHERE featured=1;

  -- Now populate the list for IN() in your PHP application:

  SELECT * FROM articles WHERE user_id IN (23,545,654,34);

  -- instead of:

  SELECT * FROM articles WHERE user_id IN (SELECT id FROM users WHERE featured=1);

Some day this should be fixed, but do not expect it soon.
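The two-step pattern can be sketched from the application side as follows. This is an illustration in Python with sqlite3 rather than PHP with MySQL, and the sample rows are made up. The first query fetches the ids, the application builds the IN() list with one bound placeholder per value, and the second query uses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, featured INTEGER);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (23, 1), (34, 1), (99, 0);
    INSERT INTO articles VALUES (1, 23), (2, 99), (3, 34);
""")

# Query 1: fetch the ids that would otherwise come from the subselect.
featured_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM users WHERE featured=1 ORDER BY id")
]

# Query 2: one placeholder per id, values bound by the driver.
# An empty IN() list is invalid SQL in MySQL, so skip the query in that case.
if featured_ids:
    placeholders = ",".join("?" * len(featured_ids))
    articles = conn.execute(
        f"SELECT id, user_id FROM articles WHERE user_id IN ({placeholders}) ORDER BY id",
        featured_ids,
    ).fetchall()
else:
    articles = []
```

This costs two round trips instead of one, which is usually a good trade when the single query would be executed as a correlated subquery.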

Use indexes. This item was not in the original article, however I think this is the most common mistake and it is very important to fix it. Most applications I have to fix have a number of indexes missing, which forces queries to do full table scans. Funnily enough, this is often not a problem in the beginning - if the application is bought or custom ordered it can frequently pass customer QA, as it will work quite fast with an almost empty database. With database growth it will however start to crawl.
So when developing your PHP applications, use a test database with a reasonable amount of data in it. And do run EXPLAIN for your queries, especially if you see them in the slow query log. If you have trouble understanding EXPLAIN or optimizing your queries, remember we're here to help.
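A small sketch of that workflow: load a non-trivial amount of test data, look at the query plan, add the index, and look again. It is written in Python against SQLite, whose EXPLAIN QUERY PLAN stands in here for MySQL's EXPLAIN; the output format differs between the two, and the table, index name and row counts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
# Test data: enough rows that a full scan is clearly the wrong plan.
conn.executemany(
    "INSERT INTO articles (user_id, title) VALUES (?, ?)",
    [(i % 100, f"title {i}") for i in range(10000)],
)


def plan(conn, sql):
    """Return the human-readable plan; the detail text is the last column."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM articles WHERE user_id = 42"

before = plan(conn, query)   # no index on user_id yet: a table scan
conn.execute("CREATE INDEX idx_articles_user_id ON articles (user_id)")
after = plan(conn, query)    # the plan now names the new index

print(before)
print(after)
```

In MySQL the equivalent check is to prefix the query with EXPLAIN and look at the type, key and rows columns.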


11 Comments »

  1. [...] Update:  Seems I am not alone.  Peter, its not bad, you are right.  Kristian backs up my feelings about speed as well. [...]

    Pingback :: August 11, 2006 @ 11:04 am

  2. Vadim

    What is considered a “reasonable amount of data”?
    I understand that it depends on the application and the number of tables/columns, but just as a rule of thumb, how many rows ensure that I am on the safe side?

    Comment :: August 12, 2006 @ 6:33 am

  3. Vadim,

    For a test dataset the rule is pretty simple - the system should behave the same as it will behave in production. So generate about the same amount of data as you would run on a single box in production. Sometimes you want to scale it down a bit if your test system is low end.

    Basically you want two things to apply. First, queries should be executed the same way on the production and test systems, i.e. the EXPLAIN output should be the same. Second, cache efficiency should be similar. A CPU bound workload can’t be compared to a disk bound one.

    Comment :: August 13, 2006 @ 6:47 am

  4. On the other hand, in MySQL you might be better off using several queries than doing complicated ones.

    This is true, however most developers don’t realize that this is *within the database system only*, i.e. if you only go to the database once, then go ahead, run as many queries as you want.

    If a system is being developed with a *remote* database in mind, you actually have to find a good balance. If the network back and forth with 100 queries costs more than just doing the complex query in the first place, then it’s faster to do the complex query and only go across the network and back to the database once.

    Comment :: August 13, 2006 @ 4:49 pm

  5. Sheery,

    I would not limit it to a single system. If you’re using several boxes you probably have them on a local network with a 1GBit connection between them. This allows you to do many thousands of queries per second from a single connection. From multiple connections it will be many tens of thousands per second.

    Networks are fast these days; this is why memcached is getting so popular and why MySQL Cluster can exist.

    Now, you probably do not want to do 100 queries instead of one. Please take a close look at my recommendation: I recommend using 2-3 queries when the MySQL optimizer does not optimize the query efficiently enough, or when for some other reason a single query requires much more work than separate queries.

    Comment :: August 14, 2006 @ 12:34 am

  6. suma

    My website has 6 subdomains and separate databases... is that the reason why it is slow? (I'm using PHP and MySQL with WordPress.)

    Comment :: November 2, 2006 @ 12:14 am

  7. Suma,

    It should not be the reason why it is slow per se.

    Comment :: November 2, 2006 @ 2:46 am

  8. Has anyone seen a situation where MySQL reports max connections reached and freezes, with a bunch of processes waiting to finish?
    Also, MySQL will not shut down unless a manual kill -9 is run.

    Our MySQL db contains a mix of InnoDB and MyISAM tables and runs on Linux with 16 GB of RAM.

    Here is our my.cnf

    # Sample MySQL config file for very large systems.
    #
    #
    # This is for a large system with memory of 1G-2G where the system runs mainly
    # MySQL.
    #
    # You can copy this file to
    # /etc/my.cnf to set global options,
    # mysql-data-dir/my.cnf to set server-specific options (in this
    # installation this directory is /usr/local/mysql/data) or
    # ~/.my.cnf to set user-specific options.
    #
    # In this file, you can use all long options that a program supports.
    # If you want to know which options a program supports, run the program
    # with the "--help" option.

    # The following options will be passed to all MySQL clients
    [client]
    #password = your_password
    port = 3306
    socket = /tmp/mysql.sock
    #tmpdir =/mysql_tmp/
    # Here follows entries for some specific programs

    # The MySQL server
    [mysqld]
    port = 3306
    socket = /tmp/mysql.sock
    bind-address=10.234.94.71
    skip-locking
    key_buffer_size = 2000M
    max_allowed_packet = 32M

    # table_cache=20M
    # open-files-limit=20000

    table_cache = 3072
    open_files_limit = 9216

    tmp_table_size=1000M
    sort_buffer_size = 100M
    read_buffer_size = 100M
    read_rnd_buffer_size = 100M
    myisam_sort_buffer_size = 100M
    max_length_for_sort_data=2048
    max_sort_length=2048
    long-query-time=5
    log-slow-queries=/apps/log/slow-query
    interactive_timeout=300
    wait_timeout=300
    thread_cache = 40
    max_connections=500
    query_cache_size = 2000M
    # Try number of CPU's*2 for thread_concurrency
    thread_concurrency = 8
    ft_min_word_len=3
    #skip-grant-tables

    # Don't listen on a TCP/IP port at all. This can be a security enhancement,
    # if all processes that need to connect to mysqld run on the same host.
    # All interaction with mysqld must be made via Unix sockets or named pipes.
    # Note that using this option without enabling named pipes on Windows
    # (via the "enable-named-pipe" option) will render mysqld useless!
    #
    #skip-networking

    # Replication Master Server (default)
    # binary logging is required for replication
    log-bin=db1-bin
    log-bin-index=db1-bin.index

    binlog-ignore-db=chrome_vin
    binlog-ignore-db=dummyData
    #binlog-ignore-db=edmunds
    #binlog-ignore-db=evox
    #binlog-ignore-db=jato
    #binlog-ignore-db=kbb
    binlog-ignore-db=mysql
    binlog-ignore-db=test
    #binlog-ignore-db=us_incentives_extract
    #binlog-ignore-db=vehicles
    #binlog-ignore-db=voiceshot

    # required unique id between 1 and 2^32 - 1
    # defaults to 1 if master-host is not set
    # but will not function as a master if omitted
    server-id = 1

    # Replication Slave (comment out master section to use this)
    #
    # To configure this host as a replication slave, you can choose between
    # two methods :
    #
    # 1) Use the CHANGE MASTER TO command (fully described in our manual) -
    # the syntax is:
    #
    # CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
    # MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
    #
    # where you replace <host>, <user>, <password> by quoted strings and
    # <port> by the master's port number (3306 by default).
    #
    # Example:
    #
    # CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
    # MASTER_USER='joe', MASTER_PASSWORD='secret';
    #
    # OR
    #
    # 2) Set the variables below. However, in case you choose this method, then
    # start replication for the first time (even unsuccessfully, for example
    # if you mistyped the password in master-password and the slave fails to
    # connect), the slave will create a master.info file, and any later
    # change in this file to the variables’ values below will be ignored and
    # overridden by the content of the master.info file, unless you shutdown
    # the slave server, delete master.info and restart the slaver server.
    # For that reason, you may want to leave the lines below untouched
    # (commented) and instead use CHANGE MASTER TO (see above)
    #
    # required unique id between 2 and 2^32 - 1
    # (and different from the master)
    # defaults to 2 if master-host is set
    # but will not function as a slave if omitted
    #server-id = 2
    #
    # The replication master for this slave - required
    #master-host =
    #
    # The username the slave will use for authentication when connecting
    # to the master - required
    #master-user =
    #
    # The password the slave will authenticate with when connecting to
    # the master - required
    #master-password =
    #
    # The port the master is listening on.
    # optional - defaults to 3306
    #master-port =
    #
    # binary logging - not required for slaves, but recommended
    #log-bin
    # Point the following paths to different dedicated disks
    #tmpdir = /tmp/
    #log-update = /path-to-dedicated-directory/hostname
    tmpdir =/mysql_tmp/:/tmp/
    # Uncomment the following if you are using BDB tables
    #bdb_cache_size = 384M
    #bdb_max_lock = 100000

    # Uncomment the following if you are using InnoDB tables
    #innodb_data_home_dir = /usr/local/mysql/data/
    #innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
    #innodb_log_group_home_dir = /usr/local/mysql/data/
    #innodb_log_arch_dir = /usr/local/mysql/data/
    innodb_data_home_dir = /db
    innodb_data_file_path = ibdata1:10M:autoextend
    innodb_log_group_home_dir = /db
    innodb_log_arch_dir = /db

    # You can set .._buffer_pool_size up to 50 - 80 %
    # of RAM but beware of setting memory usage too high
    innodb_buffer_pool_size = 8000M
    #innodb_additional_mem_pool_size = 80M
    # Set .._log_file_size to 25 % of buffer pool size
    innodb_log_file_size = 1000M
    #innodb_log_buffer_size = 32M
    #innodb_flush_log_at_trx_commit = 1
    #innodb_lock_wait_timeout = 50

    [mysqldump]
    quick
    max_allowed_packet = 16M

    [mysql]
    no-auto-rehash
    # Remove the next comment character if you are not familiar with SQL
    #safe-updates

    [isamchk]
    key_buffer = 256M
    sort_buffer_size = 256M
    read_buffer = 2M
    write_buffer = 2M

    [myisamchk]
    key_buffer = 256M
    sort_buffer_size = 256M
    read_buffer = 2M
    write_buffer = 2M

    #[mysqlhotcopy]
    interactive-timeout

    Comment :: November 6, 2006 @ 12:06 pm

  9. Greg, can you show the processlist output so we can understand exactly what you mean, and also specify the MySQL version and what kind of binary you’re using? It would be best if you report the problem on the forum, as it is not really related to this blog post.

    Comment :: November 12, 2006 @ 9:23 pm

  10. I have a question regarding using multiple databases and wanted some advice. I have a jewelry website I am working on and we are switching everything to be dynamic. Let's say we are selling jewelry. The user clicks on Jewelry and from Jewelry they click on a brand, let's call it Brand A - under Brand A there are “Rings, Necklaces, Engagement Rings”. When a user clicks on “Rings” there is a page that displays sub-categories like “Wedding Band”, so each category like Rings, Necklaces, etc. has its own sub-categories of items. How would I go about structuring the database? Should I have 1 DB per vendor (i.e. Brand A, B, C, D)? I am stuck figuring this out because each vendor A, B, C, D etc. has its own MAIN CATEGORIES, and in those MAIN CATEGORIES you have sub-categories.

    Thanks so much for your help.

    Comment :: December 21, 2006 @ 5:01 pm

  11. juddy

    Has anyone encountered a problem where posting to a specific rowid posts to all of them?

    Comment :: November 16, 2007 @ 3:12 pm

 


