Interpretation of the selection of 10 common MySQL high availability schemes

2026-02-13 Update From: SLTechnology News&Howtos

Original: https://yq.aliyun.com/articles/80365?utm_campaign=wenzhang&utm_medium=article&utm_source=QQ-qun&2017517&utm_content=m_21166

Advantages:

The architecture is relatively simple and uses native semi-synchronous replication as the basis for data synchronization.

With only two nodes, there is no master election problem when the master goes down; failover is a direct switch.

Only two nodes are needed, so resource requirements are low and deployment is simple.

Disadvantages:

It depends entirely on semi-synchronous replication. If semi-synchronous replication degrades to asynchronous replication, data consistency can no longer be guaranteed.

The high availability of HAProxy and Keepalived themselves must also be considered.

2. Optimization of semi-synchronous replication

Semi-synchronous replication is reliable: as long as it stays in effect, the data can be considered consistent. However, factors such as network fluctuations can cause semi-synchronous replication to time out and fall back to asynchronous replication, at which point data consistency can no longer be guaranteed. Data consistency can therefore be improved by keeping replication semi-synchronous as much as possible.
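The timeout fallback described above can be illustrated with a small toy model. This is only a sketch: the class `SemiSyncMaster`, its `timeout_ms` field and the delay values are hypothetical, loosely modeled on the idea of MySQL's `rpl_semi_sync_master_timeout` setting, and recovery back to semi-synchronous mode is deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class SemiSyncMaster:
    """Toy model of a master whose semi-sync commit waits for one slave ACK."""
    timeout_ms: int = 1000      # hypothetical stand-in for the ACK wait timeout
    semi_sync: bool = True      # False once the master has degraded to async

    def commit(self, ack_delay_ms):
        """Return the mode a transaction committed in.

        ack_delay_ms is how long the slave took to acknowledge the binlog
        event, or None if no acknowledgement arrived at all.
        """
        if not self.semi_sync:
            return "async"      # already degraded: commit without waiting
        if ack_delay_ms is not None and ack_delay_ms <= self.timeout_ms:
            return "semi-sync"  # slave confirmed receipt before commit returned
        # Timeout: stop waiting for ACKs; this and later commits are async.
        self.semi_sync = False
        return "async"


master = SemiSyncMaster(timeout_ms=1000)
modes = [master.commit(delay) for delay in (20, 35, 5000, 15)]
print(modes)
```

In this run the third commit exceeds the timeout, so it and the fourth commit are both reported as `async` even though the fourth acknowledgement was fast. That window is exactly where a master failure could lose data.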

This scheme also uses a two-node architecture, but it enhances native semi-synchronous replication to make the mechanism more reliable.

The following optimizations can serve as references:

Dual channel replication

When semi-synchronous replication breaks because of a timeout and replication is re-established, two channels are created at the same time. The semi-synchronous channel starts replicating from the master's current position, so that the slave knows how far the master has progressed. The asynchronous channel catches up on the data the slave is missing. Once the asynchronous channel reaches the starting position of the semi-synchronous channel, semi-synchronous replication is resumed.
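The two channels can be sketched as a short simulation. Everything here is an illustrative assumption rather than MySQL code: the function `reconnect_dual_channel`, the list-based binlog, and the rule that one new event arrives on the live channel per catch-up round are all hypothetical.

```python
def reconnect_dual_channel(master_log, slave_applied, incoming, batch=2):
    """Simulate reconnecting a lagging slave over two channels.

    master_log    -- events already in the master's binlog at reconnect time
    slave_applied -- how many of those events the slave had applied
    incoming      -- events the master produces after the reconnect
    Returns (events applied on the slave, position where the live channel began).
    """
    resume_pos = len(master_log)              # live channel starts here
    applied = list(master_log[:slave_applied])
    live_buffer = []                          # events received on the live channel
    incoming = list(incoming)
    pos = slave_applied                       # next event the catch-up channel fetches
    while pos < resume_pos:
        if incoming:
            live_buffer.append(incoming.pop(0))   # live channel keeps receiving
        nxt = min(pos + batch, resume_pos)
        applied.extend(master_log[pos:nxt])       # async channel closes the gap
        pos = nxt
    # The catch-up channel has reached resume_pos, so the buffered live events
    # can be applied and semi-synchronous replication resumed.
    live_buffer.extend(incoming)
    applied.extend(live_buffer)
    return applied, resume_pos


old_events = [f"e{i}" for i in range(7)]
new_events = ["e7", "e8", "e9"]
applied, resume_pos = reconnect_dual_channel(old_events, 3, new_events)
print(resume_pos, applied)
```

The point of the design is that the slave learns the master's current position immediately through the live channel, while the backlog is filled in separately, so neither stream has to wait for the other.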

Binlog file server

Two semi-synchronous replication channels are set up, and the one connected to the file server is normally disabled. When master-slave semi-synchronous replication degrades because of network problems, the semi-synchronous channel to the file server is enabled. When master-slave semi-synchronous replication recovers, the channel to the file server is closed again.

Advantages:

Only two nodes are needed, so resource requirements are low and deployment is simple.

The architecture is simple and there is no master election problem; failover is a direct switch.

Compared with native replication, optimized semi-synchronous replication ensures data consistency better.

Disadvantages:

The kernel source code must be modified, or the MySQL communication protocol must be used. This requires some understanding of the source code and the ability to do a certain amount of secondary development.

It still relies on semi-synchronous replication, so the data consistency problem is not fundamentally solved.

3. High availability architecture optimization

The two-node database is extended to a multi-node database or a multi-node database cluster. Depending on requirements, the cluster can have one master and two slaves, one master and multiple slaves, or multiple masters and multiple slaves.

Semi-synchronous replication is considered successful as soon as one slave acknowledges, so semi-synchronous replication with multiple slaves is more reliable than with a single slave. The probability of several nodes being down at the same time is also lower than that of a single node being down, so to some extent a multi-node architecture offers better high availability than a two-node architecture.
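The reliability argument can be made concrete with a little arithmetic. The sketch below assumes that each slave misses the acknowledgement window independently with the same probability, which is a simplification (real network failures are often correlated); the function name and the 1% figure are hypothetical.

```python
def degrade_probability(p_miss: float, slaves: int) -> float:
    """Probability that no slave acknowledges in time, assuming independence."""
    if slaves < 1:
        raise ValueError("at least one slave is required")
    # Semi-sync only degrades when every slave misses the window.
    return p_miss ** slaves


# Hypothetical figure: each slave misses the ACK window 1% of the time.
for n in (1, 2, 3):
    print(n, degrade_probability(0.01, n))
```

Under these assumptions every additional slave multiplies the chance of degradation by the per-slave miss probability, which is why one acknowledgement out of several slaves is easier to obtain than one out of one.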

However, with a larger number of databases, database management software is needed to keep them maintainable. Options include MMM, MHA, and various Proxy implementations. Common scenarios are as follows:

MHA + multi-node cluster

MHA Manager periodically checks the master node of the cluster. When the master fails, it automatically promotes the slave with the most recent data to be the new master and redirects all other slaves to it. The whole failover process is transparent to the application.

MHA Node runs on every MySQL server. Its main job is to process the binary logs during failover so that as little data as possible is lost.
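The promotion rule described above can be sketched as follows. This is not MHA's actual code or API: the `Slave` class, the integer `binlog_pos` and the `fail_over` function are hypothetical names used only to show the selection logic.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    name: str
    binlog_pos: int   # how far this slave has received the old master's binlog


def fail_over(slaves):
    """Pick the most up-to-date slave as new master and repoint the rest."""
    if not slaves:
        raise ValueError("no slave available for promotion")
    new_master = max(slaves, key=lambda s: s.binlog_pos)
    plan = {"promote": new_master.name, "repoint": {}}
    for s in slaves:
        if s is not new_master:
            # Number of events this slave is missing relative to the new
            # master; this is the gap that log processing has to fill.
            plan["repoint"][s.name] = new_master.binlog_pos - s.binlog_pos
    return plan


plan = fail_over([Slave("s1", 120), Slave("s2", 150), Slave("s3", 140)])
print(plan)
```

Choosing the slave with the highest received position minimizes the amount of data that has to be recovered from logs before the other slaves can follow the new master.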

MHA can also be extended to the following multi-node clusters:

Advantages:

Automatic fault detection and failover are supported.

It has good scalability and can expand the number and structure of MySQL nodes as needed.

Compared with two-node MySQL replication, three-node / multi-node MySQL is less likely to be unavailable.

Disadvantages:

At least three nodes are required, which takes more resources than two nodes.

The logic is more complex, so troubleshooting and locating problems after a failure is more difficult.

Data consistency is still guaranteed by native semi-synchronous replication, so there is still a risk of data inconsistency.

A network partition may cause split-brain.

ZooKeeper+Proxy

ZooKeeper uses a distributed algorithm to keep cluster data consistent. Using ZooKeeper can effectively ensure the high availability of the Proxy and avoid the split-brain that a network partition could otherwise cause.
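The reason a majority-based coordinator helps against split-brain can be shown with a minimal quorum check. The function names are hypothetical and this is a sketch of the majority rule only, not of ZooKeeper's API or its full protocol.

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True if `reachable` voters form a strict majority of the cluster."""
    return reachable > cluster_size // 2


def partitions_allowed_to_lead(partition_sizes):
    """Return the sizes of the partitions that may keep electing a leader."""
    total = sum(partition_sizes)
    return [size for size in partition_sizes if has_quorum(size, total)]


print(partitions_allowed_to_lead([3, 2]))   # five voters split 3 / 2
print(partitions_allowed_to_lead([2, 2]))   # four voters split 2 / 2
```

Because two disjoint groups cannot both hold a strict majority, at most one side of a partition can keep acting as the authority, so two masters are never endorsed at once. The even split also shows why an odd number of voters is usually preferred.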

Advantages:

It ensures the high availability of the whole system, including Proxy and MySQL.

It has good scalability and can be extended to a large-scale cluster.

Disadvantages:

Data consistency still depends on native MySQL semi-synchronous replication.

Introducing ZooKeeper makes the logic of the whole system more complex.

4. Shared storage

Shared storage decouples the database server from the storage device. Data synchronization between database instances no longer depends on MySQL's native replication; data consistency is ensured by synchronizing data at the disk level.

SAN shared storage

A SAN (storage area network) provides a direct, high-speed network connection (compared with a LAN) between storage devices and servers, so that data can be stored centrally. Common architectures are as follows:

With shared storage, the MySQL server mounts the file system and operates normally. If the master goes down, the standby mounts the same file system, which ensures that master and standby use the same data.

Advantages:

Only two nodes are needed; deployment and switching logic are both simple.

Strong data consistency is well guaranteed.

MySQL logic errors cannot lead to data inconsistency.

Disadvantages:

The high availability of the shared storage itself must be considered.

It is expensive.

DRBD disk replication

DRBD is a software-based, network-based block replication storage solution, used mainly to mirror disks, partitions and logical volumes between servers. When data is written to the local disk, it is also sent to the disk of another host on the network, so that the local host (primary node) and the remote host (standby node) stay synchronized in real time. Common architectures are as follows:

If the local host fails, the remote host still holds an identical copy of the data and can take over, which keeps the data safe.

DRBD is a block-level synchronous replication technology implemented as a Linux kernel module, and it can achieve the same effect as SAN shared storage.
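The mirroring idea can be illustrated with a toy in-memory model. It is only a sketch: the class `MirroredDisk` and its methods are hypothetical, there is no real networking, and DRBD's actual behaviour (for example when the peer is unreachable) depends on its configuration and is not modeled here.

```python
class MirroredDisk:
    """Toy two-node block device: every write lands on both copies."""

    def __init__(self, blocks: int):
        self.primary = [b""] * blocks   # local disk on the primary node
        self.standby = [b""] * blocks   # mirrored disk on the standby node

    def write(self, block_no: int, data: bytes) -> bool:
        self.primary[block_no] = data   # local disk write
        self.standby[block_no] = data   # same block shipped to the peer
        return True                     # acknowledged only after both writes

    def fail_over(self):
        """Primary host is lost: the standby's copy becomes the active one."""
        return list(self.standby)


disk = MirroredDisk(blocks=4)
disk.write(0, b"ibdata")
disk.write(2, b"redo")
print(disk.fail_over())
```

Because the write is acknowledged only after both copies are updated, the standby is never behind the primary in this model. The same property is what makes each write pay for a network round trip.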

Advantages:

Only two nodes are needed; deployment and switching logic are both simple.

It is cheaper than a SAN storage network.

Strong data consistency is guaranteed.

Disadvantages:

It has a significant impact on I/O performance.

The standby (slave) database cannot serve read operations.

5. Distributed protocol

Distributed protocols solve the data consistency problem well. The more common schemes are as follows:

MySQL Cluster

MySQL Cluster is the official clustering solution. It uses the NDB storage engine to keep redundant copies of data in real time, achieving both high availability and data consistency.

Advantages:

It uses only official components and does not depend on third-party software.

Strong data consistency can be achieved.

Disadvantages:

It is not widely used in China.

The configuration is complex, and the NDB storage engine is required, which differs from the conventional MySQL engines.

At least three nodes are required.

Galera

A Galera-based MySQL high availability cluster is a multi-master data synchronization solution that is easy to use, has no single point of failure, and offers high availability. Common architectures are as follows:

Advantages:

Multi-master writes with no replication delay, which ensures strong data consistency.

It has a mature community, and Internet companies use it on a large scale.

Automatic failover, with automatic addition and removal of nodes.

Disadvantages:

Native MySQL nodes need to be patched with wsrep.

Only the InnoDB storage engine is supported.

At least three nodes are required.

Paxos

The Paxos algorithm solves the problem of how a distributed system reaches agreement on a single value (a resolution). It is widely considered one of the most effective algorithms of its kind. Combining Paxos with MySQL makes it possible to achieve strong consistency of distributed MySQL data. Common architectures are as follows:

Advantages:

Multi-master writes with no replication delay, which ensures strong data consistency.

It has a mature theoretical basis.

Automatic failover, with automatic addition and removal of nodes.

Disadvantages:

Only the InnoDB storage engine is supported.

At least three nodes are required.
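How Paxos lets a group of nodes agree on one value can be sketched with a minimal single-decree version. This is an in-memory illustration under simplifying assumptions (no networking, no failures, sequential proposers); the class and function names and the `master=db1` style values are hypothetical, and it does not reflect how any particular MySQL product implements the protocol.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1        # highest proposal number promised so far
        self.accepted_n = -1      # number of the accepted proposal, if any
        self.accepted_v = None    # value of the accepted proposal, if any

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, -1, None

    def accept(self, n, v):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None
    # A proposer must adopt the value of the highest-numbered proposal
    # that any promising acceptor has already accepted.
    prev_n, prev_v = max(granted, key=lambda t: t[0])
    if prev_n >= 0:
        value = prev_v
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "master=db1")
second = propose(acceptors, 2, "master=db2")   # later proposer, different wish
stale = propose(acceptors, 1, "master=db3")    # outdated proposal number
print(first, second, stale)
```

In this run the second proposer ends up confirming the value that was already chosen instead of its own, and the stale proposal is rejected. That is the property that keeps all nodes on a single agreed value.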

Summary

As the demand for data consistency grows, more and more approaches are being tried to solve the problem of distributed data consistency, such as optimizing MySQL itself, optimizing the MySQL cluster architecture, and introducing algorithms such as Paxos, Raft and 2PC.

Using distributed algorithms to solve the data consistency problem of MySQL is also increasingly accepted, and a series of mature products such as PhxSQL, MariaDB Galera Cluster and Percona XtraDB Cluster are being used on a larger and larger scale.

With the official MySQL Group Replication reaching GA, using distributed protocols to solve data consistency problems has become the mainstream direction. More and more excellent solutions can be expected, and the MySQL high availability problem should be solved better as a result.
