Yesterday I had a fun time repairing a 1.5TB ext3 partition containing many millions of files. Of course it should never have happened – this was a decent PowerEdge 2850 box with a RAID volume, ECC memory and a reliable CentOS 4.4 distribution – but still it did. We got a “journal failed” message in the kernel log, and the filesystem needed to be checked and repaired, even though ext3 is a journaling filesystem which should not need checks in normal use, even after power failures. Checking and repairing took many hours, especially as the automatic check on boot failed and had to be restarted manually.

The same may happen with InnoDB tables. They are designed to survive crashes, power failures and even partial page writes, but they can still get corrupted because of MySQL bugs, OS bugs, hardware bugs, misconfiguration or failures.

Sometimes the corruption is mild, and an ALTER TABLE that rebuilds the table fixes it. Sometimes the table needs to be dropped and recovered from backup, and in certain cases you may need to reimport the whole database – for example if the corruption happens to be in the undo tablespace or the log files.
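For the mild case, a simple sketch (table name hypothetical) is to check the table and then rebuild it with a “null” ALTER TABLE, which copies every row into a fresh table:

```sql
-- Check the table, then rebuild it in place by converting it to the
-- engine it already uses; this recreates the data and all indexes.
CHECK TABLE orders;
ALTER TABLE orders ENGINE=InnoDB;
```

Keep in mind the rebuild needs roughly as much free space as the table itself and blocks writes while it runs.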

So do not forget to have a recovery plan for this kind of failure. This is one thing you had better have backups for. Backups, however, take time to restore, especially if you do point-in-time recovery using the binary log to get back to the actual database state.

A good practice for approaching this kind of problem is first to have enough redundancy. I always assume any component, whether a piece of hardware or software, can fail, even if that piece of hardware has some internal redundancy of its own, such as RAID or SAN solutions.

If you can’t afford full redundancy for everything (and probably even if you can), it is a good idea to keep your objects small, so any maintenance you need to do on them takes less time. Smaller RAID volumes typically rebuild faster, a smaller database size per system (yet another reason to like mid-range commodity hardware) makes it faster to recover, and smaller tables allow per-table backup and recovery to happen faster.

With MySQL and its blocking ALTER TABLE there is yet another reason to keep tables small: so you do not have to use complicated scenarios to do simple things. Assume, for example, you need to add an extra column to a 500GB InnoDB table. It will probably take long hours or even days for ALTER TABLE to complete, and about 500GB of temporary space will be required, which you simply might not have. You can of course use master-master replication, run the statement on one server, switch roles and then do it on the other, but if the ALTER TABLE takes several days, can you really afford having no box to fall back to for such a long time?
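One common way the master-master trick is done (sketched here with a hypothetical table; exact steps depend on your setup) is to keep the long ALTER out of the binary log on the standby server, so it does not replicate to the active one:

```sql
-- On the standby master only, in the session that runs the change:
SET SQL_LOG_BIN = 0;   -- keep the following statement out of the binary log
ALTER TABLE orders ADD COLUMN notes TEXT;   -- may run for hours or days on 500GB
SET SQL_LOG_BIN = 1;
-- Then switch application traffic to this server and repeat on the other box.
```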

On the other hand, if you had 500 tables of 1GB each it would be very easy – you can simply take small pieces of data offline for a minute and alter them while the rest stays live. The whole process will also be much faster this way, as the entire indexes will easily fit in memory for such small tables.
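As a rough sketch (table names are hypothetical), the per-chunk change is just an ordinary ALTER TABLE run against one small table at a time, while the application temporarily routes around the chunk being altered:

```sql
-- With data split into e.g. messages_000 .. messages_499 (about 1GB each),
-- the application marks one chunk as offline for a minute, alters it, and
-- moves on to the next one. Each statement only touches ~1GB of data.
ALTER TABLE messages_042 ADD COLUMN flags TINYINT UNSIGNED NOT NULL DEFAULT 0;
ALTER TABLE messages_043 ADD COLUMN flags TINYINT UNSIGNED NOT NULL DEFAULT 0;
-- ... and so on for the remaining chunks, scripted or by hand.
```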

Not to mention that splitting 500 tables across several servers will likely be easier than splitting one big one.

There are a bunch of complications with many tables, of course – it is not always easy to partition your data appropriately, and the code gets more complicated – but for many applications it is worth the trouble.

At NNSEEK, for example, we have the data split into 256 groups of tables. The current data size is small enough that even a single table would not be a big problem, but it is much easier to write your code to handle the split from the very beginning rather than trying to add it later on, when there are 100 helper scripts written, etc.
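The post does not spell out the exact NNSEEK scheme; as one hedged sketch of how such a split can look (all names made up), the application derives the table suffix from a hash of the sharding key:

```sql
-- Hypothetical layout: 256 identical tables, with a suffix 00..ff chosen by
-- the application as hash(thread_id) % 256, formatted as two hex digits.
CREATE TABLE posts_00 (
  thread_id INT UNSIGNED NOT NULL,
  post_id   INT UNSIGNED NOT NULL,
  body      TEXT,
  PRIMARY KEY (thread_id, post_id)
) ENGINE=InnoDB;
-- ... posts_01 through posts_ff are created identically ...

-- The application routes a query for thread 123456 to the table whose
-- suffix matches hash(123456) % 256, for example:
SELECT body FROM posts_9a WHERE thread_id = 123456;
```

With a scheme like this, per-chunk maintenance (ALTER, backup, moving a group to another server) only ever touches 1/256th of the data.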

For the same reason I would recommend setting up multiple virtual servers even if you start out with a single physical one. Different accounts with different permissions will be good enough. Doing so will ensure you do not run into problems once you really need to scale to multiple servers.
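A minimal sketch of such “virtual servers” on a single box, assuming one database and one MySQL account per shard (names hypothetical):

```sql
-- Each shard lives in its own database and is reached through its own
-- account, so moving it to a separate physical server later is mostly a
-- matter of changing connection settings, not rewriting queries.
CREATE DATABASE shard_00;
GRANT SELECT, INSERT, UPDATE, DELETE ON shard_00.*
    TO 'app_shard_00'@'%' IDENTIFIED BY 'secret';
```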

14 Comments
Daniel Schneller

What about the new partitioning feature of recent MySQL versions? Would that help? Because splitting the data as you say is something you should have considered right from the beginning. If not, you end up with lots and lots of code to adapt and handle the new situation. Moreover, I suspect it might not be possible to split data into separate tables under all circumstances, as e.g. joins might get too complicated to maintain.

Mike

I’m not exactly a MySQL expert, but I do use it on a daily basis and have had to evolve a backup strategy in the face of not-always-reliable hardware. What I’ve done is this: I have two “main” servers, with lots of RAM and processors, in a dual-master-slave configuration. One acts as a primary and one as a backup. If one fails, I can switch over to the other one seamlessly. I just move an IP alias from one machine to the other. But then the problem is, how do I restore the failed server if I’ve had a crash that corrupts the tables, or the filesystem? The solution is that I have a third and a fourth replication slave, one on-site and one off-site via VPN. Once I repair whatever’s broken on the downed “main” server, I then shut down MySQL on one of the backup slaves and synchronize the files with rsync. I could improve this situation even more with LVM or EVMS snapshots, and perhaps automated failover, but for now this approach has saved me on a few occasions, and my boss is happy with the near-zero downtime.

Peter Halliday

You said:

“On the other hand, if you had 500 tables of 1GB each it would be very easy – you can simply take small pieces of data offline for a minute and alter them while the rest stays live. The whole process will also be much faster this way, as the entire indexes will easily fit in memory for such small tables.”

How exactly would you recommend taking small tables offline? Partitioning in this way manually has always intrigued me, but it is the maintenance like this that I didn’t understand.

Piet

How is performance affected when I have, e.g., private data for 500 customers in tables cust_1_data up to cust_500_data, instead of one big cust_data table with a customer_id column?

Doesn’t having lots of tables give a performance penalty? And how does that compare to the performance increase from having smaller tables?

A SELECT for, e.g., customer 33 would then be ‘select * from cust_33_data’, instead of ‘select * from cust_data where customer_id = 33’.

I’ve often wondered about this.

sean

Peter,
I have a question about ALTER TABLE and how it handles incoming writes. The documentation states that “ALTER TABLE works by making a temporary copy of the original table. The alteration is performed on the copy, and then the original table is deleted and the new one is renamed. While ALTER TABLE is executing, the original table is readable by other clients. Updates and writes to the table are stalled until the new table is ready, and then are automatically redirected to the new table without any failed updates.”

I’m hoping you can elaborate on ‘…are stalled until the new table is ready…’ I assume MySQL is storing these writes somewhere, either in memory or perhaps on disk, but I’m not sure.

Thanks.

lotso

peter,

Coming back to your answer to sean’s question: when I do an ALTER TABLE (to add a new column), I see that a new table is being populated at the same time. (BTW, why does it do this? My experience with MSSQL is that an ALTER TABLE takes merely seconds to add a new column.) I always presumed that it was doing this because it needs to:

1. Maintain a read/write copy of the existing table
2. be able to still populate data
3. be able to switch over w/o any loss in data as soon as the task is finished.

Is this not true?

lotso

Hi Peter,

After reading your post on http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/ I was just very surprised when reading some of the comments that an ALTER TABLE on a (say) 500GB table could take up to 2 days!! I have my tables as separate files per table (which I thought was a good idea; now I _know_ it’s a good idea).

Actually, I don’t get the statement about the column information NOT being stored with each row. (Can you point me to another resource to read up on? Or even a Google phrase?)