Q&A: How to Avoid Common (but Deadly) MySQL Operations MistakesJanuary 22 I gave a presentation on “How to Avoid Common (but Deadly) MySQL Operations Mistakes” for Percona MySQL Webinars. If you missed it, you can still register to view the recording and my slides.

Thanks to everyone who attended, and especially to folks who asked the great questions. I answered as many as we had time for during the session, but here are all the questions with my complete answers:

Q: What if I use rsync to sync the mysql dir to another server as a backup?

You can do this only if you shut down the mysqld daemon first. Otherwise, you have a high risk that your copy will be incomplete and impossible to restore.

If you need to create a physical backup on a running instance of MySQL, use Percona XtraBackup. This is safe because this tool carefully synchronizes copying the tablespace with the transaction log, so it assures against getting a partial copy.

Q: Is MIXED binlog-format useful as well as ROW?

The MIXED format defaults to STATEMENT, and switches to ROW for individual events that it detects are non-deterministic and therefore unsafe for statement-based replication. In theory, this should be give you the best of both worlds. But there could still be cases where MySQL replicates an event in statement format because it fails to detect a non-deterministic case.

Q: Percona Server version 5.1 — is this version compatible with pt-mext tool?

Yes, pt-mext works fine with Percona Server 5.1.

Q: We have multiple slaves and slaves to slaves and one slave sends changes to the master. Checksum is breaking the replication.

I am not sure I follow your description of your topology, or how the slave would send changes to the master.

I suggest you contact Percona Oncall and they can help you solve any issue with running checksums.

Q: To verify a restore can the check table extended command be used? This supposedly makes sure the table is consistent. I was wondering if this is still useful in to verify a restore in the latest versions of MySQL.

CHECK TABLE analyzes a table for errors, but it can’t tell if the data is different from the data in the same table on a different instance (e.g. a slave). This can check for physical corruption after a restore, but it can’t verify that the data is correct.

Q: Query optimize really helps? And for large databases like 100GB +, how that will be affected?

By choosing query optimization techniques carefully, some queries can be made to run thousands of times faster.

The larger the table, the more important it is to make sure your queries run efficiently.

Q: Is pt-online-schema-change available in 5.1? All my servers are Percona 5.1.

Yes, pt-online-schema-change works fine with Percona Server 5.1.

Q: What is the best way to perform backup?

Choosing the right tool, schedule, and method for performing backups is a complex problem. I suggest you read a white paper Percona has published on “MySQL Backup and Recovery Best Practices.”

Q: Is there a list of measurable indicators of performance somewhere?

A good tool to help you monitor many performance indicators is Percona Monitoring Plugins. This works with popular open-source monitoring frameworks like Cacti and Zabbix.

Q: How does most of this apply to Amazon’s RDS? Not having direct root access seems like a problem.

You’re right, Amazon RDS is a convenient way to deploy a preconfigured MySQL appliance, but you don’t get to log into the server, and many MySQL tuning variables are not accessible. We can still use some of the tools we are accustomed to using with MySQL, because the tools can access a remote MySQL server. But other tools require local access to the data directory.

Amazon RDS has a management console that allows you to do backups and restores, but if you want to choose specific tools, you may need to migrate to another environment such as Amazon EC2.

Q: A sales person told me that Percona XtraDB Cluster was not fully baked yet about half a year ago, is it ready to go for production now? (we have cluster instances in Amazon for low latency)

PXC is fully baked, has a beautiful golden crust, and smells delicious.

But seriously, we’ve helped many customers deploy PXC over the past year, and it’s working in many production environments.

Q: What buffer size and max_heap_size would you recommend for small 512 Mb RAM server (runs php5-fpm+nginx)?

I suggest you try the Percona Configuration Wizard for MySQL to get you started. It will suggest configuration values appropriate for your server hardware. This won’t be optimized specifically for your site’s workload, but it will be a good start.

Q: Is there any harm in in running pt-table-sync without running pt-table-checksum?

No harm. You can optionally use pt-table-sync to calculate its own checks to find out which rows need to be synchronized. You can even synchronize MySQL instances that aren’t replication master and slave.

Q: Is Percona XtraDB Cluster a viable option when MySQL servers are located in different data centers and connected via shared Internet connections?

Yes, this is a great use case for PXC. Refer to an early proof of concept test we ran to prove that multi-datacenter clusters work, and our white paper on High Level Multi-Datacenter MySQL High Availability.

Q: Can Percona XtraBackup be used to take a backup of a single table?

Yes, you can use partial backup options to make innobackupex back up only specific databases or specific tables.

Q: What methods do you recommend to replicate the binlogs outside of replication? We are working with DRBD any other recommendations?

MySQL 5.6 adds an option to the mysqlbinlog tool to backup binary logs files continously. So you can effectively keep your binlogs backed up on a separate server for safety.

Q: How will pt-table-checksum tolerate binlog-format=MIXED with GTID replication?

pt-table-checksum must use statement-based binlog events for the checksums to work, so it overrides any default binlog row format you have defined on your system.

Q: What are your thoughts on SymmetricDS for db replication over standard MySQL replication?

I have not evaluated SymmetricDS, so I can’t offer a specific opinion about it.

Most alternative solutions fit a specific type of project, and no single solution works for everyone.
So if this one works for your needs, it’s worth taking a look at it.

You should compare it with Tungsten Replicator, which is designed for a similar use case, as a highly-available solution for multi-master and multi-site replication.

Q: A question about indexes: in a table with persons, should I add an index on the column gender?

The best indexes depend on what queries you need to run, and the selectivity of data. If you never run a query that uses the gender column for searching or sorting, there would be no benefit to adding an index on it.

Furthermore, MySQL may still not use an index even if you do search on that column, if the value you search for occurs in a large (typically 20%+) of the rows of the table.

Q: I have tried Percona XtraBackup but I’m not sure about the best way to backup full server and restore only a single database from that full backup. I’m using mysqldump to backup and restore now.

Percona XtraBackup does support a method to restore individual tablespaces, but the steps to do it are laborious, and must be done one table at a time. Restoring all tables from a given database this way is possible, but involves more work that so far is mostly manual.

To be honest, using mysqldump is probably still the easier solution for this.

Q: Does Percona Xtradb Cluster have any replication drift? How can one minimize it?

PXC uses a different method of replication, not the built-in replication of standard MySQL. For purposes of replication drift and lag, you can think of it as similar to ROW based replication with semi-synchronous slaves. PXC should therefore have minimal chance of replication drift all by itself.

Q: How reliable are Percona XtraBackup incremental backups, in combination with binary logs for point in time recovery?

The incremental backups in Percona XtraBackup work very reliably, the most common problem is when you make a mistake and apply the incremental backups in an incorrect order.

Likewise, binary logs are reliable, but you must apply all the binary logs after the last incremental backup has been applied, and you must have a contiguous set of binary logs. For maximum safety, use sync_binlog=1 to assure the last events in the binlog are written to disk.

. . .

Thanks again for attending my webinar! Here are some more tips:

4 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Jason

Regarding binlog-format=MIXED and “But there could still be cases where MySQL replicates an event in statement format because it fails to detect a non-deterministic case.”

Are there known bugs or (edge) cases that you can share or is this anecdotal?

Thanks!

Bill Karwin

Hi Jason, I don’t know of any specific case where MIXED mode fails to switch to ROW and results in non-deterministic changes. But I know the logic for the switch is complex and has undergone several changes in different versions of MySQL. I also anecdotally know that almost every site that uses MySQL uses only STATEMENT mode. For any feature that is used by a small minority of MySQL sites, I assume that there’s an increased likelihood of undiscovered bugs.

Jason

Thanks for your reply Bill.

If MIXED should be avoided I was wondering if you had any comments about why the MySQL docs state that RBR is the “safest form of replication” (http://dev.mysql.com/doc/refman/5.6/en/replication-sbr-rbr.html) but that STATEMENT is the default binlog-format (http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_format)?

How does one reconcile that?

Cheers

Bill Karwin

Jason, it’s not a 100% simple answer.

ROW based records are safer because they solve the problem of non-deterministic statements being replicated. But ROW based binlogs have disadvantages too. For example, they sometimes make your binlogs much larger, and they can cause table locks on the slave unless the table has a primary key.

You can read a summary of pros and cons of the different binlog formats here: http://dev.mysql.com/doc/refman/5.6/en/replication-sbr-rbr.html

They did try to adopt MIXED as the default binlog format from MySQL 5.1.12 to MySQL 5.1.28 (which was still prior the 5.1 GA), but then switched back to STATEMENT as the default. I believe the reason for this was that they assumed the more mature and widely-used binlog format was a better choice for a default.