MySQL Galera Cluster Configuration
Galera Cluster provides synchronous multi-master replication for MySQL and MariaDB databases. Unlike traditional asynchronous replication, Galera ensures that all nodes have the same data at any given moment, eliminating replication lag and providing true high availability.
🎯 What is Galera Cluster?
Galera Cluster is a synchronous multi-master cluster solution that uses certification-based replication. Every transaction is certified across all nodes before being committed, ensuring data consistency.
| Feature | Description |
|---|---|
| Synchronous Replication | All nodes contain identical data at all times |
| Multi-Master | Read and write to any node |
| Automatic Node Joining | New nodes automatically sync with cluster |
| Automatic Failover | No single point of failure |
| Row-Level Parallel Replication | High performance parallel applying |
💡 Key Advantage: Galera provides "virtually synchronous" replication - the commit on the originating node waits until certification succeeds on all nodes, but doesn't wait for applying on all nodes.
🔧 Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│ GALERA CLUSTER TOPOLOGY │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ NODE 1 │ │ NODE 2 │ │ NODE 3 │ │
│ │ (Primary) │◄──►│ (Primary) │◄──►│ (Primary) │ │
│ │ R/W │ │ R/W │ │ R/W │ │
│ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │
│ │ │ │ │
│ │ Galera Replication Protocol │ │
│ │ (wsrep - Write Set Replication) │ │
│ │ │ │ │
│ ┌───────▼────────────────────▼────────────────────▼───────┐ │
│ │ Group Communication System │ │
│ │ (GComm / GCS) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Transaction Flow: │
│ 1. Client commits on Node 1 │
│ 2. Write-set created and broadcast to all nodes │
│ 3. All nodes certify (check for conflicts) │
│ 4. If certified, commit proceeds on all nodes │
│ 5. Client receives commit acknowledgment │
│ │
└─────────────────────────────────────────────────────────────────────┘
📋 Available Implementations
| Implementation | Based On | Notes |
|---|---|---|
| Percona XtraDB Cluster (PXC) | Percona Server | Most popular, includes XtraBackup |
| MariaDB Galera Cluster | MariaDB | Built into MariaDB 10.1+ |
| MySQL + Galera | MySQL Community | Codership's original implementation |
🚀 Installation Guide
Installing Percona XtraDB Cluster (Recommended)
# Node 1, 2, and 3 - Install Percona XtraDB Cluster
# Add Percona repository
wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb
percona-release setup pxc80
# Install PXC
apt-get update
apt-get install percona-xtradb-cluster
# During installation, set root password when prompted
Installing MariaDB Galera Cluster
# Add MariaDB repository
curl -LsS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash
# Install MariaDB Galera
apt-get install mariadb-server galera-4
# Verify Galera support
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%';"
🔧 Configuration
Node 1 Configuration
# /etc/mysql/mysql.conf.d/galera.cnf
[mysqld]
# ============================================
# Basic MySQL Settings
# ============================================
server_id = 1
bind-address = 0.0.0.0
datadir = /var/lib/mysql
# Binary logging (required)
log_bin = mysql-bin
binlog_format = ROW
log_slave_updates = ON
expire_logs_days = 7
# InnoDB Settings (required for Galera)
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2 # Required for Galera
innodb_flush_log_at_trx_commit = 2
innodb_buffer_pool_size = 4G
# ============================================
# Galera Cluster Settings
# ============================================
wsrep_on = ON
# Galera Provider
wsrep_provider = /usr/lib/galera4/libgalera_smm.so
# Cluster name (must be identical on all nodes)
wsrep_cluster_name = "production_cluster"
# Cluster addresses (all nodes)
wsrep_cluster_address = "gcomm://192.168.1.101,192.168.1.102,192.168.1.103"
# Node name and address
wsrep_node_name = "node1"
wsrep_node_address = "192.168.1.101"
# SST (State Snapshot Transfer) Method
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = "sst_user:SSTPassword123!"
# Certification-based replication
wsrep_certify_nonPK = 1
wsrep_max_ws_rows = 0
wsrep_max_ws_size = 2147483647
# Parallel threads for applying
wsrep_slave_threads = 8
wsrep_log_conflicts = ON
# ============================================
# Performance Tuning
# ============================================
wsrep_provider_options = "gcache.size=1G; gcs.fc_limit=256; gcs.fc_factor=0.9; evs.send_window=256; evs.user_send_window=128"
# ============================================
# Certification Settings
# ============================================
wsrep_retry_autocommit = 3
wsrep_auto_increment_control = ON
Node 2 Configuration
# /etc/mysql/mysql.conf.d/galera.cnf
[mysqld]
# Same as Node 1, except:
server_id = 2
wsrep_node_name = "node2"
wsrep_node_address = "192.168.1.102"
# All other settings identical to Node 1
Node 3 Configuration
# /etc/mysql/mysql.conf.d/galera.cnf
[mysqld]
# Same as Node 1, except:
server_id = 3
wsrep_node_name = "node3"
wsrep_node_address = "192.168.1.103"
# All other settings identical to Node 1
🚀 Bootstrapping the Cluster
Step 1: Create SST User on All Nodes
-- Run on each node before starting the cluster
CREATE USER 'sst_user'@'localhost' IDENTIFIED BY 'SSTPassword123!';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost';
GRANT BACKUP_ADMIN ON *.* TO 'sst_user'@'localhost';
FLUSH PRIVILEGES;
Step 2: Bootstrap First Node
# On Node 1 (first node only!)
# Stop MySQL if running
systemctl stop mysql
# Bootstrap the cluster
# For Percona XtraDB Cluster:
systemctl start mysql@bootstrap
# For MariaDB Galera:
galera_new_cluster
# Verify bootstrap
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
Expected output:
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
Step 3: Join Additional Nodes
# On Node 2
systemctl start mysql
# On Node 3
systemctl start mysql
# Verify cluster size (should be 3)
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
Step 4: Verify Cluster Status
-- Check cluster health
SHOW STATUS LIKE 'wsrep_%';
-- Key variables to check:
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN (
'wsrep_cluster_size',
'wsrep_cluster_status',
'wsrep_connected',
'wsrep_ready',
'wsrep_local_state_comment'
);
Expected output:
+---------------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------------------+----------------+
| wsrep_cluster_size | 3 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_ready | ON |
| wsrep_local_state_comment | Synced |
+---------------------------+----------------+
📊 Monitoring Galera Cluster
Essential Monitoring Queries
-- Cluster overview
SELECT
@@hostname AS node_name,
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_cluster_size') AS cluster_size,
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_cluster_status') AS cluster_status,
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_local_state_comment') AS node_state,
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_ready') AS ready;
-- Replication health
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN (
'wsrep_local_recv_queue', -- Incoming write-set queue
'wsrep_local_send_queue', -- Outgoing write-set queue
'wsrep_flow_control_paused', -- Flow control pause time
'wsrep_cert_deps_distance', -- Certification dependency distance
'wsrep_apply_oooe', -- Out-of-order applying
'wsrep_apply_oool', -- Out-of-order committing
'wsrep_local_replays' -- Transaction replays
);
-- Conflict detection
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN (
'wsrep_local_cert_failures', -- Certification failures
'wsrep_local_bf_aborts' -- Transactions aborted by brute force
);
Create Monitoring View
-- Create helpful monitoring view
CREATE VIEW galera_status AS
SELECT
'Cluster Size' AS metric,
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_cluster_size') AS value
UNION ALL
SELECT 'Cluster Status',
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_cluster_status')
UNION ALL
SELECT 'Node State',
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_local_state_comment')
UNION ALL
SELECT 'Receive Queue',
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_local_recv_queue')
UNION ALL
SELECT 'Send Queue',
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_local_send_queue')
UNION ALL
SELECT 'Flow Control Paused (ns)',
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_flow_control_paused_ns')
UNION ALL
SELECT 'Cert Failures',
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_local_cert_failures');
SELECT * FROM galera_status;
⚠️ Handling Common Issues
Issue 1: Node Won't Join Cluster
# Check if cluster is in Primary state
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status';"
# If cluster shows "non-Primary", you have a split-brain situation
# On the node that should be primary:
SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
# Alternative: Start fresh with safe_to_bootstrap
# Check grastate.dat for safe_to_bootstrap flag
cat /var/lib/mysql/grastate.dat
# If no node is safe, manually set on one node:
# Edit /var/lib/mysql/grastate.dat
# Change: safe_to_bootstrap: 1
Issue 2: SST Failing
# Check SST logs
tail -f /var/lib/mysql/innobackup.backup.log
# Common fixes:
# 1. Verify SST user credentials
mysql -u sst_user -p'SSTPassword123!' -e "SELECT 1;"
# 2. Check disk space on joining node
df -h /var/lib/mysql
# 3. Verify network connectivity
nc -zv 192.168.1.101 4444 # SST port
nc -zv 192.168.1.101 4567 # Galera port
nc -zv 192.168.1.101 4568 # IST port
Issue 3: Flow Control Pauses
-- Check flow control statistics
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME LIKE 'wsrep_flow_control%';
-- Tune flow control thresholds
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=256';
SET GLOBAL wsrep_provider_options = 'gcs.fc_factor=0.9';
-- Increase parallel applying threads if applier is slow
SET GLOBAL wsrep_slave_threads = 16;
Issue 4: Certification Conflicts
-- Monitor certification failures
SELECT
VARIABLE_VALUE AS cert_failures
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'wsrep_local_cert_failures';
-- Check for hot rows (frequently updated same rows from multiple nodes)
-- Enable wsrep_debug for detailed logging
SET GLOBAL wsrep_debug = 1;
-- Check for queries causing conflicts
SET GLOBAL wsrep_log_conflicts = ON;
-- Check error log for conflict details
🚀 Performance Optimization
Tuning for High Throughput
# /etc/mysql/mysql.conf.d/galera_tuning.cnf
[mysqld]
# Increase write-set cache for high write workloads
wsrep_provider_options = "gcache.size=2G"
# Parallel applying (adjust based on CPU cores)
wsrep_slave_threads = 16
# Optimize certification
wsrep_cert_deps_distance = 40
# InnoDB tuning for Galera
innodb_buffer_pool_size = 8G
innodb_log_file_size = 1G
innodb_flush_log_at_trx_commit = 2 # Better performance
innodb_flush_method = O_DIRECT
# Network optimization
wsrep_provider_options = "evs.send_window=512; evs.user_send_window=256; gcs.fc_limit=512"
Benchmark Comparison
| Configuration | Transactions/sec | Latency (avg) | Conflicts/sec |
|---|---|---|---|
| Default | 2,500 | 15ms | 5 |
| Optimized wsrep_slave_threads=8 | 4,200 | 10ms | 8 |
| Full optimization | 6,800 | 6ms | 12 |
🔐 Security Best Practices
Enable Encryption
# /etc/mysql/mysql.conf.d/galera_ssl.cnf
[mysqld]
# IST/SST encryption
wsrep_provider_options = "socket.ssl_key=/etc/mysql/certs/server-key.pem;socket.ssl_cert=/etc/mysql/certs/server-cert.pem;socket.ssl_ca=/etc/mysql/certs/ca.pem"
# Enable SST encryption
[sst]
encrypt = 4
ssl-key = /etc/mysql/certs/server-key.pem
ssl-cert = /etc/mysql/certs/server-cert.pem
ssl-ca = /etc/mysql/certs/ca.pem
Firewall Configuration
# Required ports for Galera
# 3306 - MySQL client connections
# 4444 - SST (State Snapshot Transfer)
# 4567 - Galera Cluster replication
# 4568 - IST (Incremental State Transfer)
# UFW example
ufw allow from 192.168.1.0/24 to any port 3306
ufw allow from 192.168.1.0/24 to any port 4444
ufw allow from 192.168.1.0/24 to any port 4567
ufw allow from 192.168.1.0/24 to any port 4568
# iptables example
iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 3306 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 4444 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 4567 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 4568 -j ACCEPT
🔄 Maintenance Procedures
Adding a New Node
# 1. Install MySQL/PXC on new node
# 2. Configure galera.cnf with existing cluster address
# 3. Start MySQL - it will automatically SST from a donor
systemctl start mysql
# Monitor SST progress
tail -f /var/lib/mysql/innobackup.backup.log
# Verify node joined
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
Graceful Node Shutdown
-- Before shutting down, check replication queues are empty
SHOW STATUS LIKE 'wsrep_local_recv_queue';
SHOW STATUS LIKE 'wsrep_local_send_queue';
-- Desync node for maintenance (optional, prevents becoming donor)
SET GLOBAL wsrep_desync = ON;
# Graceful shutdown
systemctl stop mysql
# When starting back up, no bootstrap needed
systemctl start mysql
Full Cluster Restart
# 1. Stop all nodes gracefully
# 2. Find the most advanced node (check grastate.dat)
cat /var/lib/mysql/grastate.dat
# Look for highest seqno
# 3. Bootstrap from that node
# Set safe_to_bootstrap: 1 in grastate.dat if needed
systemctl start mysql@bootstrap
# 4. Start other nodes normally
# They will IST from the bootstrapped node
✅ Best Practices Checklist
Deployment
- [ ] Use odd number of nodes (3, 5, 7)
- [ ] Deploy across different failure domains
- [ ] Use SSDs for optimal performance
- [ ] Configure proper gcache size for your workload
- [ ] Set up monitoring before going to production
Schema Design
- [ ] All tables must have PRIMARY KEY
- [ ] Use InnoDB storage engine only
- [ ] Avoid large transactions (split if needed)
- [ ] Minimize multi-table updates in single transaction
Operations
- [ ] Regular backup testing (even with Galera!)
- [ ] Monitor flow control pauses
- [ ] Keep nodes on same MySQL version
- [ ] Document bootstrap procedures
🎓 Conclusion
Galera Cluster provides true synchronous multi-master replication for MySQL, making it an excellent choice for applications requiring high availability and strong consistency. With proper configuration and monitoring, it can handle demanding production workloads.
💡 Pro Tip: While Galera provides excellent HA, always maintain proper backup procedures. Galera replicates all data changes, including accidental DELETEs.
Key Takeaways
- Synchronous replication means no replication lag
- Multi-master allows writes to any node
- Automatic SST/IST handles node provisioning
- Certification-based conflict detection is highly efficient
- Proper monitoring is essential for production deployments
About the Author
Jamaurice Holt is a Cloud Database Expert and AWS Solutions Architect with over 10 years of experience in database optimization, high availability, and cloud migrations. Specializing in MySQL, SQL optimization, and enterprise database solutions.
Related Articles
- MySQL Group Replication for High Availability
- MySQL ProxySQL Load Balancing Setup
- MySQL Backup with mysqldump and Percona XtraBackup
Need help with MySQL Galera Cluster? Contact us for expert database consulting.