Recently we’ve been having some problems with our production server. Every now and then it just reboots – for no aparent reason. We’re still looking into the reason this occurs (looks like an overheating CPU) however when the reboot occurs, it causes our Mysql replication slaves to get out-of-sync with the master.

The following steps detail how to correct this problem:

1) Firstly, we need to get a new dump of all the databases on the production server. To do this we need to initiate a read-lock to stop the data in the tables from changing while we do the dump:

FLUSH TABLES WITH READ LOCK;

2) Secondly we need to dump the databases to a file which we can transfer to the slave replication server:

mysqldump --all-databases --master-data > dbdump.db

3) Once this process is complete, we can unlock the tables so that production can be written to again:

UNLOCK TABLES;

4) Now we need to compress and transfer the dump file to the replication slave:

On Production:

tar -czf dbdump.db.tar.gz dbdump.db

Now on the replication slave:

scp production.myhost.com:/tmp/dbdump.db.tar.gz .

5) On the slave server, ensure that its slave service is stopped:

stop slave;

6) Now unzip the database file and load it:

tar -zxf dbdump.db.tar.gz
mysql -uroot -p
 < dbdump.db

7) Start the slave process:

start slave;

8 ) Your replication slave show now be working! To check, issue this command on the production server:

mysql> show slave hosts;
+-----------+------------------------------+------+-------------------+-----------+
| Server_id | Host                         | Port | Rpl_recovery_rank | Master_id |
+-----------+------------------------------+------+-------------------+-----------+
|         2 | slave.myhost.com | 3306 |                 0 |         1 |
+-----------+------------------------------+------+-------------------+-----------+

And you should be back up and replicating now!