2025-04-06 Update From: SLTechnology News&Howtos
MariaDB's Galera Cluster and Percona's XtraDB Cluster (PXC) are both built on the same Galera replication library, so they work on the same principles.
# How Galera Cluster / PXC clusters work
When a client sends a DML update request to a server, the server's local process handles the request and returns OK. When the client then sends COMMIT for the transaction, the server broadcasts the replicate writeset (the transaction's write set) to the group (the cluster), and the cluster assigns the write set a unique GTID (global transaction ID) that is delivered to every server (node) in the cluster. If certification passes on the originating node, it performs the commit_cd action, applies the change to its local database, and returns OK to the client; if certification fails, it performs rollback_cd and rolls back the transaction it had just tried to commit. Each of the other nodes, after receiving the write set and passing certification, performs the apply_cd and commit_cd actions to update its local database; if certification fails there, the write set is discarded.
In short: any node can receive SQL requests. For a DML update transaction, before commit the wsrep API calls the Galera library to broadcast the write set inside the cluster and certify whether the transaction can be applied on all nodes. Only after certification passes is the transaction actually committed on every node in the cluster; otherwise it is rolled back. The purpose of this certification mechanism is to guarantee data consistency across all nodes.
InnoDB internally uses pessimistic locking to guarantee that a transaction commits and executes successfully.
PXC/Galera Cluster uses optimistic locking: every transaction is broadcast to all nodes in the cluster, and a transaction that fails certification is rolled back.
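A minimal way to observe this optimistic (certification-based) behavior on a running node, using standard wsrep status counters:

```sql
-- Transactions that failed cluster-wide certification on this node
-- and were therefore rolled back
SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';

-- Local transactions aborted because a replicated (higher-priority)
-- transaction conflicted with them
SHOW GLOBAL STATUS LIKE 'wsrep_local_bf_aborts';
```

If these counters grow quickly, the workload is writing conflicting rows on multiple nodes at once.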
# PXC/Galera Cluster architecture
Group communication layer: implements the cluster-wide data synchronization policy and the total ordering of all transactions within the cluster, which is what makes GTID generation possible.
Replication layer: performs the actual data synchronization; it consists of the applier and the slave queue. The efficiency of this replication module directly limits the write throughput of the whole cluster.
# Explanation of main terms
WS (Write Set): the set of rows written by a write/update transaction
IST (Incremental State Transfer): incremental synchronization
SST (State Snapshot Transfer): full synchronization. Methods for transferring the SST: mysqldump / xtrabackup / rsync
UUID: the unique identifier of a node's state and its sequence of state changes
GTID (Global Transaction ID): consists of a UUID plus a sequence-number offset. It is the cluster-wide transaction id defined in the wsrep API, used to record the unique identity of a state change in the cluster and its offset in the queue.
wsrep API: the interface between the DBMS library and the wsrep provider
Commit: applies the changes made by the transaction to the database, i.e. executes the user's SQL request against the database
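The UUID-plus-offset composition of a Galera GTID can be seen directly from two standard status variables:

```sql
-- The cluster state UUID (the UUID part of the GTID)
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_uuid';

-- Sequence number of the last committed write set (the offset part)
SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';
```

On a healthy cluster every node reports the same UUID, and the sequence numbers stay close together.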
# PXC/Galera Cluster ports
3306: the port on which the database serves clients
4444: SST (full snapshot) data transfer port; used for full cluster data synchronization when a new node joins
4567: the port over which cluster nodes communicate with each other
4568: IST (incremental) data synchronization port; used when a node that went offline rejoins after a restart and synchronizes data incrementally
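As a sketch, these ports map onto my.cnf settings such as the following (the addresses are placeholder examples; the option names are standard wsrep/Galera settings):

```ini
[mysqld]
port = 3306
# Where this node receives the SST stream (port 4444)
wsrep_sst_receive_address = 192.168.1.11:4444
# Group communication listener (4567) and IST receive address (4568)
wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4567;ist.recv_addr=192.168.1.11:4568"
```

All four ports must be open between cluster nodes for joins and rejoins to work.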
# Node status
1. OPEN: the node has started successfully and is trying to connect to the cluster; if that fails, it exits or creates a new cluster, depending on its configuration.
2. PRIMARY: the node is already in the cluster. This state occurs when a new node joins and a donor is being selected for data synchronization.
3. JOINER: the node is waiting to receive / is receiving the synchronization files.
4. JOINED: the node has completed data synchronization but is still slightly behind and is catching up with the cluster's progress; for example, the state of a node that rejoined after a failure and is catching up with the cluster.
5. SYNCED: the node is serving normally; synchronization is complete and the node is consistent with the cluster's progress.
6. DONOR: the node is providing a full data synchronization to a new node. While in this state, the node does not serve clients.
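The current state of a node can be checked with two standard status variables:

```sql
-- Human-readable state: Open / Primary / Joiner / Joined / Synced / Donor
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';

-- The numeric equivalent of the same state
SHOW GLOBAL STATUS LIKE 'wsrep_local_state';
```

A node is safe to route traffic to only when the comment reads Synced.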
## Factors that cause node status changes
A new node joins the cluster
A failed node recovers and rejoins the cluster
A node's synchronization fails
# Advantages and disadvantages of PXC/Galera Cluster
Advantages:
1. High availability. The nodes in the cluster are functionally equal, providing load sharing and redundancy and avoiding a single point of failure.
2. Strong consistency. All nodes in the cluster apply modifications synchronously; reads and writes are synchronous, with no replication delay.
3. Easy to scale. To add a new node, just join it to the cluster; the full SST synchronization and the subsequent IST incremental synchronization complete automatically.
Disadvantages:
1. Any update transaction must pass cluster-wide certification before it is applied on each node, so cluster performance is limited by the worst-performing node.
2. To guarantee data consistency, a galera/pxc cluster must certify every transaction on all nodes. With concurrent writes on multiple nodes, lock conflicts become severe.
For example, when several clients write on different nodes at the same time, every update operation must be certified across the cluster before the rows can be locked and changed.
3. When a new node joins, or a node that has fallen too far behind rejoins, a full copy of the data (SST) is made. The donor node (the node that provides the synchronization files) cannot serve reads or writes during the transfer; its state shows as DONOR, and changes to SYNCED when the transfer completes.
# Analysis of single-node and whole-cluster downtime in a Galera cluster
1. Single-node downtime
The node stops, restarts, rejoins the cluster, and synchronizes data incrementally via IST to stay consistent with the cluster. Whether IST is possible is determined by the wsrep_provider_options="gcache.size=1G" parameter, commonly set to 1G. What determines the right size? The expected downtime: if a node may be down for one hour, check how much binlog the cluster generates in one hour and size the cache accordingly.
1.1 If the downtime is too long and some of the needed data is no longer in gcache, the node falls back to SST and synchronizes all the data.
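A sketch of the relevant my.cnf setting (the 1G value is an example from the text; compute your own from binlog volume per unit of expected downtime):

```ini
[mysqld]
# Ring buffer that keeps recent write sets for IST.
# If a node is down longer than this cache covers, it falls back to SST.
wsrep_provider_options = "gcache.size=1G"
```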
2. When all nodes must be shut down, use a rolling approach: shut down node a, repair it, and rejoin it to the cluster; then shut down node b, repair it, and rejoin it; and so on.
The principle is to keep at least one member of the cluster alive and perform a rolling restart.
2.1 If all nodes in the cluster are shut down and none survives:
Each node saves its last GTID when its database shuts down. When starting the cluster again, start the nodes in reverse shutdown order, beginning with the node that was shut down last.
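On PXC/Galera the saved state lives in grastate.dat; a sketch of picking the bootstrap node after a full outage (the path and the galera_new_cluster helper are the usual defaults on systemd-based installs):

```shell
# On every node, inspect the saved state:
cat /var/lib/mysql/grastate.dat
#   seqno:             the last committed GTID offset on this node
#   safe_to_bootstrap: 1 on the node that was shut down last
# Bootstrap ONLY the node with the highest seqno / safe_to_bootstrap: 1:
galera_new_cluster
# Then start mysqld normally on the remaining nodes so they IST/SST from it.
```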
3. Avoiding data loss when shutting down and starting nodes
3.1 Keep at least one member of the cluster alive and restart the others on a rolling basis.
3.2 Using master-slave replication, turn a slave of the cluster into a node of the PXC/Galera cluster.
# FAQ Summary
What if the primary node (the node responsible for writes) writes so much that apply_cd takes too long and data-update operations become too slow?
Configure the wsrep_slave_threads parameter to the number of CPUs, or 1.5 times the number of CPUs.
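A sketch of applying that tuning advice (8 is an assumed example for an 8-core node; wsrep_slave_threads is a standard, dynamically settable wsrep variable):

```sql
-- Raise the number of parallel applier threads to roughly
-- 1x-1.5x the CPU count of the node
SET GLOBAL wsrep_slave_threads = 8;
SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';
```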
Split brain
If every command returns "unknown command", a split brain has occurred: the 4567 communication port between two nodes of the cluster is unreachable, and the cluster cannot serve requests. As an emergency override: SET GLOBAL wsrep_provider_options="pc.ignore_sb=true"
Concurrent writes
If writes/updates are performed on multiple nodes of the cluster, lock conflicts can occur when different nodes update the same row at the same time, e.g. Error: 1213, SQLSTATE: 40001 (deadlock). Solution: direct all updates and writes to the same node.
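For single-statement (autocommit) conflicts, Galera can also retry transparently before surfacing the deadlock error; a sketch using the standard wsrep_retry_autocommit variable (the value 3 is an assumed example):

```sql
-- Retry an autocommit statement that failed certification up to 3 times
-- before returning error 1213 to the client
SET GLOBAL wsrep_retry_autocommit = 3;
```

This helps only autocommit statements; multi-statement transactions still need application-level retry or single-node writes.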
DDL global lock
Use pt-online-schema-change
Only the InnoDB engine is supported, and the table must have a primary key; otherwise the data pages on each node of the cluster can become inconsistent.
Table-level locks are not supported, i.e. LOCK/UNLOCK TABLES is not allowed; row-level locks are used.
When joining a new node, or when a failed node rejoins the cluster, there must be no write traffic; otherwise the database being written to can hit a DDL deadlock. So pause the cluster's business writes and resume them once the data is consistent.
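A sketch of an online schema change with pt-online-schema-change (the database, table, and column names here are assumed examples):

```shell
pt-online-schema-change \
  --alter "ADD COLUMN note VARCHAR(64)" \
  D=appdb,t=orders \
  --execute
```

The tool copies the table in the background and swaps it in, avoiding the long cluster-wide DDL lock a plain ALTER TABLE would take.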