Hi, I recently had the experience of assisting with a migration of a customer MySQL installation to Amazon RDS (Relational Database Service). Amazon RDS is a great platform for hosting your MySQL installation and offers the following list of pros and cons:

  • You can scale your CPU, IOPS, and storage space separately by using Amazon RDS. Otherwise you need to take downtime and upgrade physical components of a rack-mounted server.
  • Backups, software version patching, failure detection, and (some) recovery is automated with Amazon RDS.
  • You lose shell access to your DB instance
  • You lose SUPER privilege for regular users. Many SUPER-type statements and commands are provided for as a Stored Procedure.
  • It is easy to set up multiple read replicas (slaves in READ_ONLY=1 mode).
  • You can set up a secondary sychronous instance for failover in the event your primary instance fails.

While this article is written to be Amazon RDS-specific it also has implications for any sort of migration.

  1. The only way to interface with RDS is through mysql client, which means loading data must be done using SQL. This means you need to use mysqldump or mydumper, which can be a large endeavour should your dataset be north of 500GB — this is a lot of single threaded activity!  Think about not only how long dumping and loading will take, but also factor in how much time it will take for replication to catch up on the hours/days/weeks your dumping and loading procedure took.  You might need to allocate more disk space and Provisioned IOPS to your RDS node in order to improve disk throughput, along with a change to innodb_flush_log_at_trx_commit, and sync_binlog.
  2. RDS is set to UTC (system_time_zone=UTC) and this cannot be changed as in Parameter Groups you will see that default_time_zone is set as Modifiable=false. This can bite you if you are planning to use RDS as a slave for a short while before failing the application over to Amazon RDS.  If you have configured binlog_format=STATEMENT on your master and you have TIMESTAMP columns, this will lead to differences in RDS data set for absolute values ‘2014-07-24 10:15:00’ vs NOW(). It is also a concern for the Developer who may not be explicitly declaring their MySQL connections to set an appropriate time zone. Often the best piece of advice can be to leave all database data in UTC no matter where the server is physically located, and deal with localization at the presentation layer.
  3. Amazon RDS by default has max_allowed_packet=1MB. This is pretty low as most other configs are 16MB so if you’re using extended-insert (by default, you are), the size of each insert statement will be close to 16MB and thus can lead to errors related to “packet too big” on Amazon RDS side, thus failing out an import.
  4. Amazon RDS does not support the SUPER privilege for regular users. For example, this becomes quite a challenge as many tools (Percona Toolkit) are authored to assume you have SUPER-level access on all nodes — simple tasks become vastly more complicated as you need to think of clever workarounds (I’m looking at you pt-table-sync!).
  5. Triggers and views thus cannot be applied using the default mysqldump syntax which includes SQL DEFINER entries — these lines are there so that a user with SUPER can “grant” another user ability to execute the trigger/view. Your load will fail if you forget this.
  6. Consider running your load with –force to the mysql client, and log to disk stderr/stdout so you can review errors later. It is painful to spend 4 days loading a 500GB database only to have it fail partially through because you forgot about SQL DEFINER issue..
  7. Consider splitting the mysqldump into two phases: –no-data so you dump schema only, and then –data-only so you get just the rows. This way you can isolate faults and solve them along the way.
  8. Skipping replication events is SLOW. You don’t have ability to do sql_slave_skip_counter (since this requires SUPER), instead you need to use an Amazon RDS function of mysql.rds_skip_repl_error. Sadly this Stored Procedure takes no argument and thus it only skips one event at a time. It takes about 2-3 seconds for each execution, so if you have a lot of events to skip, that’s a problem. Having to skip ANYTHING is indication that something went wrong in the process, so if you find yourself in the unenviable position of skipping events, know that pt-table-checksum should be able to give you an idea how widespread is the data divergence issue.
  9. pt-table-sync doesn’t work against Amazon RDS as it is written to expect SUPER because it wants to do binlog_format=STATEMENT in session, but that’s not allowed. Kenny Gryp hacked me a version to just skip this check, and Kenny also reported it for inclusion in a future release of Percona Toolkit, but in the meantime you need to work around the lack of SUPER privilege.
  10. pt-table-sync is SLOW against RDS. As pt-table-sync doesn’t log a lot of detail about where time is spent, I haven’t completely isolated the source of the latency, but I suspect this is more about network round trip than anything else.
  11. innodb_log_file_size is hardcoded to 128MB in Amazon RDS, you can’t change this.  innodb_log_files_in_group is not even showing up in Parameter Groups view but SHOW GLOBAL VARIABLES reports as 2. So you’re cookin’ on 256MB, if your writes are heavy this may become a bottleneck with little workaround available in MySQL.
  12. CHANGE MASTER isn’t available in RDS. You define RDS as a slave by calling a stored procedure where you pass the appropriate options such as CALL mysql.rds_set_external_master.

For those of you wondering about the SUPER-privilege, I was fortunate that Bill Karwin from Percona’s Support team took the time to review my post and suggested I dig into this deeper, turns out that Amazon didn’t hack MySQL to remove the SUPER privilege, but instead run the Stored Procedures with security_type of DEFINER:

So for those of you working with Amazon RDS, I hope that this list saves you some time and helps our your migration!  If you get stuck you can always contact Percona Consulting for assistance.

20 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Soenke Ruempler

Hi Michael,

thanks for the write-up. We did a similiar migration in the past and encountered similar issues.

An addendum to point 2: Actually you *CAN* set the timezone with a hack:

You can tune the “init_connect” parameter to set the timezone before each connect.

In RDS it looks like this:

> show variables like ‘init_connect’;
+—————+——————————————+
| Variable_name | Value |
+—————+——————————————+
| init_connect | SET SESSION time_zone = ‘Europe/Berlin’; |
+—————+——————————————+
1 row in set (0.04 sec)

Cheers,

Soenke

Elcio

Hi Michael,

Improving the original idea, I made a Stored Procedure testing the current user and seting the time zone only for ordinary users.

CREATE DEFINER=pddad@% PROCEDURE set_time_zone()
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT ‘Set the time zone for Brasil/East, handling daylight savings. Tr’
BEGIN
IF NOT (POSITION(‘rdsadmin@’ IN CURRENT_USER()) = 1) THEN
SET SESSION time_zone = ‘Brazil/East’;
SET SESSION lc_time_names = ‘pt_BR’;
END IF;
END

mysql> show variables like ‘init_connect’;
+—————+—————————+
| Variable_name | Value |
+—————+—————————+
| init_connect | call shared.set_time_zone |
+—————+—————————+
1 row in set (0.03 sec)

Cheers,

Elcio.

yocmen

Hi Elcio, please, can u tell me if this is a good idea on a high traffic RDS??? with lots (millions) of requests???? thanks in advance…

Elcio

Another tip.
If you intend to use funcions and Stored procedures, you need to enable log_bin_trust_function_creators, by default it´s turned off (log_bin_trust_function_creators=0);

Hi Michael,

This is interesting, all I see is a list of cons, and I don’t see any pros.

By the way, you can’t even use phpMyAdmin to interface with Amazon RDS? I think we’ve used it once on Amazon – but it’s been some time.

krishna

Hi Michael,

Nicely done. This post might saved my few days head-spin on the point 2 if posted few months back.

derek

Valuable article. Thanks!

Christian

Where can one get this hacked version of pt-table-sync? Do you know if this will make it into the percona tools mainline? I desperately need to work with RDS MIXED replication type.

Armen Ro.

Hi there – I would greatly appreciate a copy of the hacked version of the pt-table-sync script you mentioned.

Finding a workaround for skipping that mode would be of GREAT use to me.

How might I get into contact with you to find a workaround?

Thanks!
_Armen

Bill Karwin

Another effect of the small innodb_log_file_size is that you can’t insert large BLOB/TEXT content. The log files must be 10x larger than the largest content you insert to an InnoDB table. Since you’re limited to 256MB of log file space, that means you can’t insert a blob or text greater than 25MB. This might be a deal-breaker for a given application.

I was at a talk about RDS at the recent AWS Summit in April 2015, and I asked if they had any plans to make the log file size configurable. The speaker said no, no plans, and suggested that we should store large blobs outside the database, and store a filename in the database.

Michael Coburn

Hi @Armen, sorry that I missed your message! If you want to contact me I can be reached via email at michael.coburn.com

Mahesh Chaudhari

Thank you very much for such informative and useful article. I have been struggling with percona connectivity with amazon rds since last few weeks. I did get the password for rdsadmin@localhost. However, I still cannot connect to the rds instance using that username and password because rdsadmin is allowed to connect only on localhost. How do I use rdsadmin to connect via percona? Do you have some example that you can share?

Thanks a bunch.
Mahesh

Michael Coburn

Hi Mahesh,

You don’t want to use the rdsadmin account for anything, this is reserved for Amazon to run their own monitoring and other routines within your instance. You are better off creating multiple other users, perhaps one for your application, one for monitoring, and one for your own access — each will require different levels of permissions.

What is it you mean with “Percona connectivity with Amazon RDS” — are you using RDS as a slave or master to a Percona Server instance?

Mahesh Chaudhari

Hello Michael,

I have been trying to use Percona XtraBackup is an open-source hot backup utility for MySQL to do my backups and restore over bunch of Amazon RDS instances. However, the users that I have dont have super privileges. Hence I wanted to see how we can make XtraBackup to work.

Brendan Price

@Mahesh

To restore a backup from XtraBackup you would need access to the filesystem, which you don’t get from using RDS. I’m not sure what your circumstances are, but if you are unable to stop your current DB and are looking to restore from a backup (say, to setup replication for instance), you’re best bet is to do so in EC2. Once fully restored in EC2, you can then stop that DB and do a MySQL dump from EC2 into RDS. I should note that if you can stop the DB that is currently running, you would not even need EC2 in that case. Hope this helps.

Anup Bharti

Good information but Does RDS allow replication filters? .My organization is planning to migrate few of the tables to new RDS instance .

Anagha

Thanks for this information! Also, great inputs by other contributors on this page.
I have a question:
I am looking to connect my nodejs application, which is deployed in OpenStack, to a MySQL DB instance in Amazon RDS. Any pointers as to what I need to configure on Amazon RDS side or on OpenStack side to enable this connection ?

Thanks.