Setting up replication from RDS to external MySQL

At first, when I set out to set up the replication between MySQL on Amazon Web Services RDS to external MySQL, I found a few comments that it cannot be done or shouldn't be done, because it's not supported by Amazon. In any case, I was set out for the task, and I found that it was actually as simple as following the official documentation on Using Replication to Export MySQL Data. In this post I will go through exactly what need's to be done, and you'll see how simple it really is. To complete the instructions in this post you'll need RDS MySQL version 5.6.13+ or 5.7.10+.

I will explain how you set up replication without any downtime. However, it comes at a cost - you'll need a RDS Read Replica. During the dump process the Replica will lag, so it should be used solely for dumping the master data. If it's fine with some downtime, then you will not need the Read Replica, but can dump the data from the master directly.

Configuring the RDS master

The first step is to create a MySQL user for the replication. Connect to the master RDS with the MySQL client and run:

mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'slavepwd';  
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';  

Feel free to change the username repl and password slavepwd.

You'll also need to make sure that the retention period of the Binary Log is long enough. You need to make sure that replication is started while the Binary Log is still in place on master. To set the retention period, make sure to set the RDS backup retention period at least as high. Then run:

mysql> CALL mysql.rds_set_configuration('binlog retention hours', 48);  

This will make sure that the logs are available for at least 48 hours. Make sure that you have enough storage space on RDS master to keep the logs for the specified period!

Dumping the database

The next step is to dump the data from RDS. This is done by using the tool mysqldump against the Read Replica, while the replication is stopped. This is necessary to get the log position right, so that the replication can be resumed correctly. It's also important that all tables are converted to the InnoDB storage engine, to avoid unconsistency and unexpected errors.

Start with stopping the replication. This is done by running CALL mysql.rds_stop_replication;. If you want to automate the backups you can do use the client with an inline command:

mysql -p -u root -h <read replica endpoint> -e 'CALL mysql.rds_stop_replication;'  

The expected output is:

+---------------------------+
| Message                   |
+---------------------------+
| Slave is down or disabled |
+---------------------------+

In this post I will use -p, which will make the client ask for the password. In a script you should add the password after -p or use --defaults-extra-file=<file> with the password specified in the file.

When the replication has stopped, you can find out the current Binary Log file and position:

mysql -p -u root -h -e 'SHOW SLAVE STATUS\G'

From the output you should parse out or note Master_Log_File and Exec_Master_Log_Pos.

Now it's time to create the dump file. Normally when creating a dump file for replication, the option --master-data is used, but this is not supported by RDS. Instead we will imitate this behavior. Create the dump by command:

mysqldump -p -h <read replica host> \  
    -u root \
    --single-transaction \
    --routines \
    --apply-slave-statements \
    --databases <database to replicate> > dump.sql

The option --apply-slave-statements will add STOP SLAVE; and START SLAVE; statements to the dump, to make sure that the slave this dump is restored to will be stopped and then started when the whole dump has been read.

As the dump has been completed we can start the replication again:

mysql -p -u root -h <read replica endpoint> -e 'CALL mysql.rds_start_replication;'  

The expected output is:

+-------------------------+
| Message                 |
+-------------------------+
| Slave running normally. |
+-------------------------+

As we couldn't use --master-data we will add a CHANGE MASTER TO statement manually when the dump is ready. Add the statement before CREATE DATABASE line in dump.sql or use sed -i "/CREATE DATABASE/i <statement>" to insert the the statement at the correct place. The statement should be:

CHANGE MASTER TO MASTER_HOST='<master endpoint>', MASTER_USER='repl',\  
 MASTER_PASSWORD='slavepwd', MASTER_LOG_FILE='<Master_Log_File value>',\
 MASTER_LOG_POS=<Exec_Master_Log_Pos value>;

Now the dump is complete, and you have the specified Binary Log retention period to start using it.

Restore and start replication

You need to have a MySQL server with access to the master RDS. To set up a slave you also need to specify a server-id in the MySQL configuration (/etc/mysql/my.cnf). You might also need to add time zones, which can be done like this:

mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root -D mysql -p  

When MySQL is configured and time zones are loaded, restart it:

sudo service mysql restart  

Now you can copy dump.sql to the MySQL instance and run:

cat dump.sql | mysql -u root -p  

When the command has returned, all data has been restored and MySQL should be replicating from the RDS master. Check the status by connecting to MySQL and run SHOW SLAVE STATUS\G. That statement will tell will if the slave has encountered any errors and give an indication of the replica lag.

I hope you enjoy using your new EC2 MySQL slave!