MySQL Replication Monitoring and Failover with GTID

GTID (Global Transaction Identifiers) replication, available since MySQL 5.6, makes replica promotion and failover deterministic — each transaction has a unique ID that replicas track, so a replica can be promoted to primary without manually calculating binary log positions. For WordPress high-availability setups, GTID replication combined with a monitoring script and WordPress’s HyperDB drop-in provides read scaling with automatic write failover.

Problem: A WordPress database cluster uses MySQL replication, but monitoring replica lag and automating failover are manual — a primary failure requires manually promoting a replica, updating wp-config.php, and restarting PHP-FPM.

Solution: Use GTID-based replication (gtid_mode = ON, enforce_gtid_consistency = ON) to make replica promotion deterministic — any replica can become primary without manually calculating binary log positions. Monitor replica lag with SHOW REPLICA STATUS\G and alert when Seconds_Behind_Source exceeds a threshold. Use Orchestrator or MySQL Router for automated failover.


The SQL and Bash below configure GTID replication, show how to monitor replica lag and GTID gaps, perform a manual failover, and integrate the health check into a WordPress HyperDB configuration.


-- ── Primary: enable GTID mode (add to my.cnf then restart) ───────────────
-- gtid_mode           = ON
-- enforce_gtid_consistency = ON
-- log_bin             = ON
-- binlog_format       = ROW

-- Create replication user on primary
CREATE USER 'replicator'@'10.0.0.%'
    IDENTIFIED WITH caching_sha2_password BY 'strong-password';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'10.0.0.%';

-- ── Replica: connect using GTID auto-positioning ─────────────────────────
CHANGE REPLICATION SOURCE TO
    SOURCE_HOST     = '10.0.0.1',
    SOURCE_USER     = 'replicator',
    SOURCE_PASSWORD = 'strong-password',
    SOURCE_AUTO_POSITION = 1;   -- GTID auto-position (no log file/pos needed)

START REPLICA;

-- ── Monitor replication health ────────────────────────────────────────────
SHOW REPLICA STATUS\G
-- Key fields:
--   Seconds_Behind_Source   : replica lag in seconds (0 = in sync)
--   Replica_SQL_Running      : YES = applying transactions
--   Replica_IO_Running       : YES = receiving binlog
--   Retrieved_Gtid_Set       : GTIDs received from primary
--   Executed_Gtid_Set        : GTIDs applied on replica

-- Detect GTID gaps: transactions received but not yet applied
SELECT GTID_SUBTRACT(
    @@global.gtid_purged,
    (SELECT Executed_Gtid_Set FROM performance_schema.replication_applier_status_by_coordinator)
) AS gtid_lag;

-- ── Failover: promote replica to primary ──────────────────────────────────
-- 1. On old primary (if accessible): FLUSH TABLES WITH READ LOCK;
-- 2. On replica:
STOP REPLICA;
RESET REPLICA ALL;
-- The replica is now standalone; update app DB_HOST to point here


#!/bin/bash
# Replication health check script (run via cron every 60s)
REPLICA_HOST="10.0.0.2"
MYSQL="mysql -h $REPLICA_HOST -u monitor -pmonitor_password --skip-column-names -e"

LAG=$($MYSQL "SHOW REPLICA STATUS\G" | grep 'Seconds_Behind_Source' | awk '{print $2}')
IO_OK=$($MYSQL "SHOW REPLICA STATUS\G" | grep 'Replica_IO_Running' | awk '{print $2}')
SQL_OK=$($MYSQL "SHOW REPLICA STATUS\G" | grep 'Replica_SQL_Running' | awk '{print $2}')

if [[ "$IO_OK" != "Yes" || "$SQL_OK" != "Yes" ]]; then
    echo "CRITICAL: Replication stopped on $REPLICA_HOST"
    # Alert via PagerDuty / Slack webhook here
elif [[ "$LAG" -gt 30 ]]; then
    echo "WARNING: Replica lag ${LAG}s on $REPLICA_HOST"
else
    echo "OK: Replica lag ${LAG}s"
fi


NOTE: Never run RESET REPLICA ALL on a replica that is still receiving traffic from the primary — it clears all replication metadata including GTID state; only run it after the replica has been fully promoted and the primary's VIP or DNS has been updated to point to the new primary.