Updated 2025-03-18. From: SLTechnology News&Howtos (Database)
Original: https://yq.aliyun.com/articles/80365?utm_campaign=wenzhang&utm_medium=article&utm_source=QQ-qun&2017517&utm_content=m_21166
The architecture is relatively simple and uses native semi-synchronous replication as the basis for data synchronization.
With only two nodes, no master election is needed after the master goes down. You can switch over directly.
Only two nodes are needed, so resource requirements are low and deployment is simple.
Disadvantages:
It depends entirely on semi-synchronous replication. If semi-synchronous replication degrades to asynchronous replication, data consistency can no longer be guaranteed.
The high availability of HAProxy and Keepalived themselves must also be taken into account.
2. Optimization of semi-synchronous replication
Semi-synchronous replication is reliable while it is in effect: as long as it never falls back, the data can be considered consistent. In practice, however, factors such as network fluctuations can cause semi-synchronous replication to time out and fall back to asynchronous replication, at which point data consistency is no longer guaranteed. Consistency can therefore be improved by keeping replication in semi-synchronous mode as much as possible.
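The fallback described above can be detected from the master's status counters. The following is a minimal sketch, assuming the status variable names of the MySQL 5.7 semi-synchronous plugin (`Rpl_semi_sync_master_status`, `Rpl_semi_sync_master_no_tx`); newer releases expose equivalents under source/replica names. The function names are hypothetical, and fetching the values from `SHOW GLOBAL STATUS` is left out so the example stays self-contained.

```python
def replication_mode(status):
    """Classify replication mode from a dict of SHOW GLOBAL STATUS values."""
    # "ON" means the master is currently waiting for slave acknowledgements;
    # "OFF" means it has fallen back to asynchronous replication.
    if str(status.get("Rpl_semi_sync_master_status", "OFF")).upper() == "ON":
        return "semi-sync"
    return "async"


def needs_alert(status, previous_no_tx=0):
    """Alert when replication is async or unacknowledged commits have grown."""
    no_tx = int(status.get("Rpl_semi_sync_master_no_tx", 0))
    return replication_mode(status) == "async" or no_tx > previous_no_tx
```

A monitoring job could call `needs_alert` on each poll and pass in the counter value from the previous poll, so that even a short fallback between two polls is noticed.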
This scheme also uses a two-node architecture, but improves on native semi-synchronous replication to make the mechanism more reliable.
Possible optimizations include the following:
Dual channel replication
When semi-synchronous replication breaks because of a timeout and replication is re-established, two channels are set up at the same time. The semi-synchronous channel starts replicating from the master's current position, so that the slave always knows how far the master has progressed. A second, asynchronous channel catches up on the data the slave is missing. Once the asynchronous channel reaches the position where the semi-synchronous channel started, semi-synchronous replication is resumed.
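The catch-up logic can be illustrated with a toy model. This is only a sketch under simplifying assumptions: binlog positions are plain integers, and the class and attribute names (`DualChannelSlave`, `resume_pos`, and so on) are invented for illustration rather than taken from any real patch.

```python
from dataclasses import dataclass


@dataclass
class DualChannelSlave:
    applied_pos: int   # last position applied before the link broke
    resume_pos: int    # master position when replication was re-established
    live_pos: int = 0  # highest position seen on the live channel

    def __post_init__(self):
        # The live channel starts at the master's current position.
        self.live_pos = self.resume_pos

    def receive_live(self, master_pos):
        """Live channel: track the master's current progress."""
        self.live_pos = max(self.live_pos, master_pos)

    def catch_up(self, batch):
        """Asynchronous channel: replay missing events, up to resume_pos."""
        self.applied_pos = min(self.applied_pos + batch, self.resume_pos)

    @property
    def semi_sync_restored(self):
        """True once the backlog has reached the live channel's start."""
        return self.applied_pos >= self.resume_pos
```

For example, a slave that stopped at position 100 while the master moved on to 250 needs 150 positions of catch-up before `semi_sync_restored` becomes true, and it keeps tracking the master's progress through `receive_live` in the meantime.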
Binlog file server
Set up two semi-synchronous replication channels: one between the master and the slave, and one between the master and a binlog file server. The channel to the file server is normally disabled. When master-slave semi-synchronous replication degrades because of network problems, enable the semi-synchronous channel to the file server. When master-slave semi-synchronous replication recovers, disable the channel to the file server again.
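The switching rule for the file-server channel is a small state machine. A minimal sketch, assuming a hypothetical health probe that reports on each poll whether master-slave semi-synchronous replication is working:

```python
def file_server_actions(slave_semi_sync_ok):
    """Return the action taken on the file-server channel for each poll.

    slave_semi_sync_ok is a sequence of booleans, one per health check.
    """
    enabled = False  # the file-server channel is normally disabled
    actions = []
    for ok in slave_semi_sync_ok:
        if not ok and not enabled:
            enabled = True
            actions.append("start")  # slave degraded: enable the backup channel
        elif ok and enabled:
            enabled = False
            actions.append("stop")   # slave recovered: disable it again
        else:
            actions.append("noop")
    return actions
```

The point of the design is that at every moment at least one semi-synchronous receiver, either the slave or the file server, holds the master's latest binlog events.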
Advantages:
Dual nodes, low resource requirements and simple deployment
The architecture is simple: no master election is needed, and you can switch over directly.
Compared with native replication, optimized semi-synchronous replication can better ensure data consistency.
Disadvantages:
The MySQL kernel source code must be modified, or the MySQL communication protocol must be used directly. This requires some understanding of the source code and the ability to do a certain amount of secondary development.
It still relies on semi-synchronous replication, so it does not fundamentally solve the data consistency problem.
3. High availability architecture optimization
Extend the two-node database to a multi-node database or a multi-node database cluster. Depending on your needs, you can choose one master with two slaves, one master with multiple slaves, or multiple masters with multiple slaves.
With semi-synchronous replication, a transaction counts as successfully replicated as soon as one slave acknowledges it, so semi-synchronous replication with several slaves is more reliable than with a single slave. The probability that several nodes are down at the same time is also lower than the probability that a single node is down, so a multi-node architecture can, to some extent, be considered more highly available than a two-node architecture.
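The reliability argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that slaves fail independently and with the same probability, which real deployments only approximate (a shared switch or rack breaks the independence); the function name is hypothetical.

```python
def prob_no_slave_acks(p_unavailable, n_slaves):
    """Probability that every slave is unavailable at the same moment.

    Assumes independent failures with identical probability per slave.
    """
    if not 0.0 <= p_unavailable <= 1.0:
        raise ValueError("p_unavailable must be between 0 and 1")
    if n_slaves < 1:
        raise ValueError("n_slaves must be at least 1")
    return p_unavailable ** n_slaves
```

Under these assumptions, if a single slave is unreachable 1% of the time, the chance that semi-synchronous replication has no slave left to acknowledge a commit drops from 1% with one slave to 0.01% with two.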
However, with more database instances, management software is needed to keep the databases maintainable. Options include MMM, MHA, or various kinds of Proxy. Common scenarios are as follows:
MHA + multi-node cluster
MHA Manager periodically checks the master node of the cluster. When the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to the new master. The whole failover process is transparent to the application.
MHA Node runs on every MySQL server. Its main job is to process the binary logs during a switchover so that as little data as possible is lost.
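The promotion rule described above (pick the slave with the most recent data, repoint the rest) can be sketched as follows. This is not MHA's actual code: the function name and the input format are assumptions, and replication progress is modelled as a `(binlog file, position)` pair whose file names share the same zero-padded numbering so that they compare correctly as strings.

```python
from typing import Dict, List, Tuple


def plan_failover(slaves: Dict[str, Tuple[str, int]]) -> Dict[str, object]:
    """Choose the most advanced slave and list the ones to repoint.

    slaves maps a slave name to the (binlog file, position) it has received.
    """
    if not slaves:
        raise ValueError("no slave available for promotion")
    # Tuples compare element by element: first the file name, then the offset.
    new_master = max(slaves, key=lambda name: slaves[name])
    repoint: List[str] = sorted(n for n in slaves if n != new_master)
    return {"promote": new_master, "repoint": repoint}
```

A real tool also has to copy the binary log events that the chosen slave has but the others lack before repointing them, which is the part handled by MHA Node in the description above.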
MHA can also be extended to the following multi-node clusters:
Advantages:
Faults are detected and failed over automatically.
It has good scalability and can expand the number and structure of MySQL nodes as needed.
Compared with two-node MySQL replication, three-node / multi-node MySQL is less likely to be unavailable.
Disadvantages:
At least three nodes are required, so it needs more resources than a two-node setup.
The logic is more complex, so troubleshooting and locating problems after a failure is harder.
Data consistency is still guaranteed by native semi-synchronous replication, and there is still a risk of data inconsistency.
A network partition may cause split-brain.
ZooKeeper + Proxy
ZooKeeper uses a distributed algorithm to keep cluster data consistent. Using ZooKeeper can effectively ensure the high availability of the Proxy and avoid the split-brain that a network partition would otherwise cause.
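The reason a quorum-based coordinator avoids split-brain is the majority rule: only the side of a partition that can reach more than half of the members may act. A minimal sketch of that rule, with a hypothetical function name:

```python
def has_quorum(reachable, cluster_size):
    """True if this side of a partition holds a strict majority of members."""
    if not 0 <= reachable <= cluster_size:
        raise ValueError("reachable must be between 0 and cluster_size")
    return reachable > cluster_size // 2
```

In a three-member ensemble split 2/1, only the two-member side has a quorum, so at most one master can be chosen. An even split, such as 1/1 or 2/2, leaves neither side with a majority, which is why quorum-based setups are normally deployed with an odd number of members and at least three.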
Advantages:
It ensures the high availability of the whole system, including Proxy and MySQL.
It has good scalability and can be extended to a large-scale cluster.
Disadvantages:
Data consistency still depends on native MySQL semi-synchronous replication.
Introducing ZooKeeper makes the logic of the whole system more complex.
4. Shared storage
Shared storage decouples the database server from the storage device. Data synchronization between databases no longer depends on MySQL's native replication, but relies on synchronizing data at the disk level to ensure consistency.
SAN shared storage
A SAN (storage area network) provides a direct high-speed network connection (compared with a LAN) between storage devices and servers in order to centralize data storage. Common architectures are as follows:
With shared storage, the MySQL server mounts the file system and runs normally. If the master goes down, the slave mounts the same file system, which guarantees that master and slave use the same data.
Advantages:
Only two nodes are needed, and both deployment and switchover logic are simple.
Strong data consistency is well guaranteed.
Data inconsistencies will not occur because of MySQL logic errors.
Disadvantages:
The high availability of the shared storage itself must be considered.
It is expensive.
DRBD disk replication
DRBD (Distributed Replicated Block Device) is a software-based, network-based block replication storage solution, mainly used to mirror disks, partitions and logical volumes between servers. When data is written to the local disk, it is also sent to the disk of another host on the network, so that the local host (primary node) and the remote host (standby node) stay synchronized in real time. Common architectures are as follows:
When the local host has a problem, the remote host still holds an identical copy of the data and can continue to serve it, which keeps the data safe.
DRBD is a block-level synchronous replication technology implemented as a Linux kernel module, and it can achieve the same effect as SAN shared storage.
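The mirroring behaviour can be pictured with a toy in-memory model. This is only an illustrative sketch, loosely modelled on fully synchronous replication (what DRBD calls protocol C); the class and its attributes are hypothetical, and the handling of an unreachable peer is deliberately simplified.

```python
class MirroredDisk:
    """Toy model of a primary disk mirrored to a standby over the network."""

    def __init__(self):
        self.local = {}           # block number -> data on the primary
        self.remote = {}          # block number -> data on the standby
        self.peer_up = True       # whether the standby is reachable
        self.out_of_sync = set()  # blocks written while the peer was down

    def write(self, block_no, data):
        """Write a block; return True only if both copies are up to date."""
        self.local[block_no] = data
        if self.peer_up:
            self.remote[block_no] = data  # in reality: a network round trip
            return True
        self.out_of_sync.add(block_no)
        return False

    def resync(self):
        """Copy blocks the standby missed once the peer is reachable again."""
        if not self.peer_up:
            return
        for block_no in sorted(self.out_of_sync):
            self.remote[block_no] = self.local[block_no]
        self.out_of_sync.clear()
```

Because every acknowledged write includes a round trip to the standby, write latency is bounded below by the network latency between the two hosts, which is where the I/O cost of this approach comes from.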
Advantages:
Only two nodes are needed, and both deployment and switchover logic are simple.
It is cheaper than a SAN storage network.
Strong data consistency is guaranteed.
Disadvantages:
It has a significant impact on I/O performance.
The standby node cannot serve read operations.
5. Distributed protocol
Distributed protocols can solve the problem of data consistency well. The more common scenarios are as follows:
MySQL Cluster
MySQL Cluster is the official cluster deployment scheme. It achieves high availability and data consistency by using the NDB storage engine to replicate redundant data in real time.
Advantages:
It uses only official components and does not rely on third-party software.
Strong consistency of data can be achieved.
Disadvantages:
It is less used in China.
The configuration is complex, and the NDB storage engine is required, which is different from the conventional MySQL engine.
At least three nodes
Galera
A Galera-based MySQL high availability cluster is a multi-master data synchronization solution that is easy to use, has no single point of failure, and offers high availability. Common architectures are as follows:
Advantages:
Multi-master writes and replication without delay, which ensures strong data consistency
It has a mature community and is used at scale by Internet companies.
Automatic failover; nodes are added and removed automatically
Disadvantages:
Native MySQL nodes need to be patched with wsrep
Only the InnoDB storage engine is supported
At least three nodes
Paxos
The Paxos algorithm solves the problem of how a distributed system can agree on a single value (a resolution). It is widely regarded as one of the most effective algorithms of its kind. Combining Paxos with MySQL can achieve strong consistency of distributed MySQL data. Common architectures are as follows:
Advantages:
Multi-master writes and replication without delay, which ensures strong data consistency
It has a mature theoretical basis
Automatic failover; nodes are added and removed automatically
Disadvantages:
Only InnoDB storage engine is supported
At least three nodes
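How Paxos makes a group of nodes agree on one value can be sketched with a single-decree version of the algorithm. This is a teaching sketch only: it runs in one process with no network, failures or retries, the class and function names are hypothetical, and it is not how any particular MySQL product implements consensus.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1          # highest proposal number promised so far
        self.accepted_n = -1        # number of the accepted proposal, if any
        self.accepted_value = None  # value of the accepted proposal, if any

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, -1, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1

    replies = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None

    # If any acceptor has already accepted a value, the proposer must adopt
    # the one with the highest proposal number instead of its own value.
    prior_n, prior_value = max(granted, key=lambda reply: reply[0])
    if prior_n >= 0:
        value = prior_value

    accepted = sum(1 for a in acceptors if a.accept(n, value))
    return value if accepted >= majority else None
```

With three acceptors, a first proposal numbered 1 for "master=A" is chosen. A later proposal numbered 2 for "master=B" also returns "master=A", because the proposer is forced to adopt the value that was already accepted. This is what lets every node converge on the same decision, and the majority requirement is also why at least three nodes are needed.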
Summary
As the demand for data consistency grows, more and more approaches have been tried to solve the problem of distributed data consistency, such as optimizing MySQL itself, optimizing the MySQL cluster architecture, and introducing algorithms such as Paxos, Raft and 2PC.
Using distributed algorithms to solve MySQL's data consistency problem is also increasingly accepted, and a series of mature products such as PhxSQL, MariaDB Galera Cluster and Percona XtraDB Cluster are being used at larger and larger scale.
With the official MySQL Group Replication reaching GA, using distributed protocols to solve data consistency problems has become the mainstream direction. More and better solutions can be expected, and the problem of MySQL high availability should be solved better as a result.