Sample Analysis of the MySQL 8.0 Master-Slave Replication Model

This article analyzes the master-slave replication model of MySQL 8.0, tracing how parallel replication evolved from MySQL 5.6 and 5.7 to the WriteSet mechanism in 8.0. It is intended as a practical reference.
I. The MySQL master-slave replication model
It all starts with MySQL's master-slave replication model. Below is the classic architecture:
(Figure: MySQL replication model)
MySQL's master-slave architecture depends on the MySQL Binlog: the Master node records each transaction as Binlog events and writes them to its Binlog file.
The Slave node starts two threads: an IO thread that fetches the Binlog from the Master and writes it to the local RelayLog, and a SQL thread that continuously reads events from the RelayLog, parses them, and executes them. Through this chain of sequential reads and writes, every SQL statement executed on the master is replayed identically on the slave.
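As a minimal sketch of this architecture (hostnames, credentials, and Binlog coordinates below are placeholders, not values from the original post), wiring a slave to a master looks roughly like this:

```sql
-- On the master: binary logging must be enabled (log_bin and a unique
-- server_id in my.cnf), and a replication account must exist.
CREATE USER 'repl'@'%' IDENTIFIED BY 'repl_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- On the slave: point the IO thread at the master's Binlog, then start
-- the IO and SQL threads.
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',    -- placeholder host
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',  -- placeholder coordinates
  MASTER_LOG_POS = 4;
START SLAVE;

-- Slave_IO_Running / Slave_SQL_Running in the output are the two threads
-- described above; Seconds_Behind_Master is the replication delay discussed next.
SHOW SLAVE STATUS\G
```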
Replication delay is the time between a transaction finishing on the Master and the same transaction finishing on the Slave.
Because Binlog and RelayLog files are read and written sequentially, the IO thread on the Slave rarely lags behind the Master's Binlog dump in production. Moreover, since MySQL 5.5 the official distribution has provided a semi-synchronous replication plugin: each transaction's Binlog must be transferred to the Slave and written to the RelayLog before the transaction can commit. This architecture guarantees data integrity between master and slave, ensuring the slave holds a complete copy of the data even after a master failure. Consequently, replication delay usually arises during SQL thread execution.
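For reference, a sketch of enabling that semi-synchronous plugin (plugin file names assume a Linux build; recent 8.0 releases also ship renamed "source"/"replica" variants of these plugins):

```sql
-- On the master: load and enable the semi-sync master plugin. A commit now
-- waits for a slave ACK, falling back to async after the timeout (in ms).
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;

-- On the slave: load and enable the semi-sync slave plugin, then restart
-- the IO thread so it reconnects with semi-sync negotiated.
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
```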
As the architecture diagram shows, in the earliest master-slave replication model only one thread executed the RelayLog: every operation on the master was replayed serially on the slave. This creates a problem: if write pressure on the master is high, playback on the slave is likely to fall behind. (In addition, MySQL's design writes the Binlog to the Binlog file and dumps it to the slave only during the Commit phase, which also delays transaction execution on the slave. The effect is especially visible with large transactions, but that issue is beyond the scope of this article.)
Since the root cause of master-slave delay is that single-threaded RelayLog playback is too slow, the way to reduce the delay is to increase the parallelism of RelayLog playback on the slave.
II. Parallel replication in MySQL 5.7
1. Schema-level parallel replication
MySQL officially introduced a relatively simple parallel replication scheme in 5.6. Its architecture is as follows:
(Figure: 5.6 parallel replication architecture, from Jiang Chengyao's blog)
The part in the red box is the key to parallel playback. With parallel playback enabled in 5.6, multiple Worker threads are started, and the SQL thread that previously performed playback itself becomes a Coordinator: it decides whether transactions can be executed in parallel and distributes them to the Worker threads.
If transactions belong to different Schemas, contain no DDL statements, and perform no cross-Schema operations, they can be played back in parallel; otherwise the Coordinator must wait for all Worker threads to finish before executing the current log entry.
This parallelism is therefore Schema-level: it helps when an instance hosts more than one Schema, but with a single Schema transactions cannot be played back in parallel at all, and efficiency even drops slightly because of the extra distribution work. In practice, a single database with many tables is the more common situation.
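Enabling this scheme is just a matter of giving the Coordinator some Worker threads; a minimal sketch (the SQL thread must be restarted for the topology change to take effect):

```sql
-- 0 = classic single-threaded playback; >0 starts that many Worker threads.
SET GLOBAL slave_parallel_workers = 4;
-- Restart the SQL thread so the Coordinator/Worker topology is rebuilt.
STOP SLAVE SQL_THREAD;
START SLAVE SQL_THREAD;
```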
2. Parallel replication based on Group Commit
Although 5.6's parallel replication does not improve playback speed in most workloads, its architecture became the foundation of all later MySQL parallel replication: the SQL thread on the Slave decides whether events can be played back in parallel and assigns them to Worker threads for playback.
Group Commit was introduced to solve the loss of concurrency caused by fsync at transaction commit. Put simply: when a transaction commits, its Binlog must be written to disk with fsync, an expensive operation. If concurrently committing transactions each take the log lock and perform their own fsync, they end up writing to the Binlog file serially, which drastically reduces commit concurrency.
The Group Commit technique used since 5.6 splits the commit into three stages: Flush, Sync, and Commit. Each stage maintains a queue, and the first thread in the queue performs that step on behalf of everyone queued behind it, so a single fsync can flush a whole batch of transactions' Binlog to disk at once. A batch of transactions committed together this way is called a Group.
Although Group Commit is a parallel commit technique, it incidentally solved a hard problem of parallel playback on the slave: determining which transactions can be played back in parallel. If a batch of transactions commit at the same moment, they cannot hold mutually exclusive locks or depend on each other's execution results, so they can certainly be played back in parallel.
MySQL 5.7 therefore introduced a new kind of parallel playback, controlled by the parameter slave_parallel_type. The default value DATABASE keeps the Schema-level parallel playback of 5.6; setting it to LOGICAL_CLOCK enables Group-Commit-based parallel playback, in which transactions of the same Group are played back in parallel on the Slave.
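A sketch of switching a 5.7 slave over (slave_parallel_type can only be changed while the SQL thread is stopped):

```sql
STOP SLAVE SQL_THREAD;
-- Replay transactions that committed in the same group in parallel.
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 8;
-- Optional: make the slave commit in the master's order (in 5.7 this also
-- requires log_bin and log_slave_updates to be enabled).
SET GLOBAL slave_preserve_commit_order = 1;
START SLAVE SQL_THREAD;
```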
To mark which group a transaction belongs to, MySQL 5.7 records two special values in each transaction's Binlog events: last_committed and sequence_number. last_committed is the number of the most recently committed transaction at the time this transaction committed, and sequence_number is the transaction's own commit sequence number, which increases monotonically within a Binlog file. If two transactions have the same last_committed value, they were committed within the same group.
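The original post showed a mysqlbinlog listing at this point, which has not survived conversion. Based on the description below, it looked roughly like this (illustrative output, not from a real server):

```
$ mysqlbinlog mysql-bin.000001 | grep -o 'last_committed=.*'
last_committed=0    sequence_number=1
last_committed=0    sequence_number=2
...
last_committed=0    sequence_number=6
last_committed=6    sequence_number=7
...
last_committed=6    sequence_number=12
```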
In the Binlog listing above, the transactions with sequence_number 1-6 all have last_committed = 0, so they belong to the same group and can be played back in parallel on the slave; those with sequence_number 7-12 have last_committed = 6 and likewise form a group that can be played back in parallel.
The introduction of LOGICAL_CLOCK greatly improves slave playback speed when the host is under high concurrent pressure, essentially achieving "replayed on the slave the same way it was committed on the host."
III. WriteSet in MySQL MGR
However, parallel replication based on the logical clock (LOGICAL_CLOCK) still has a shortcoming in 5.7: only transactions that committed in parallel on the master can be played back in parallel on the slave, so when concurrency pressure on the master is low, the slave gains little from parallel replication. The parameters binlog_group_commit_sync_delay and binlog_group_commit_sync_no_delay_count were introduced to raise the group-commit rate on the Master by making the Binlog wait a short while before fsync. But however they are tuned, the parallel playback speed of the slave still depends on parallel commits on the host.
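A sketch of nudging the master toward larger commit groups with these two parameters (the values here are illustrative; they trade a little commit latency for larger groups):

```sql
-- Wait up to 100 microseconds before fsync so more transactions can join
-- the current commit group...
SET GLOBAL binlog_group_commit_sync_delay = 100;
-- ...but stop waiting early once 20 transactions have queued up.
SET GLOBAL binlog_group_commit_sync_no_delay_count = 20;
```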
MySQL 8.0 introduces a new mechanism for deciding whether transactions can be played back in parallel: it detects write conflicts as transactions execute and derives the slave's playback order from that, so the degree of parallelism on the slave no longer depends on the host.
In fact, this mechanism was already quietly in use by MySQL 5.7.20, which shipped an important feature: Group Replication (MGR). MGR distributes the Binlog among multiple MySQL nodes via the Paxos protocol, so that a transaction commits successfully only once a majority of the nodes in the cluster have received it.
To support multi-master writes, MySQL MGR runs a Certify phase after Binlog distribution to decide whether the transactions in the Binlog may be written to the RelayLog. The Certify phase uses WriteSets to check whether transactions conflict with one another, and when writing to the RelayLog it assigns the same last_committed value to transactions that do not conflict.
For example, on 5.7.20, execute the following:
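The original code listing did not survive conversion; based on the description below, it was essentially of this shape (database, table, and column names are placeholders):

```sql
CREATE DATABASE wstest;
USE wstest;
-- MGR requires InnoDB tables with a primary key.
CREATE TABLE t1 (
  id INT PRIMARY KEY,
  c1 INT
) ENGINE = InnoDB;
INSERT INTO t1 VALUES (1, 1);
INSERT INTO t1 VALUES (2, 2);
INSERT INTO t1 VALUES (3, 3);
```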
The code above creates a database and an InnoDB table in an MGR cluster and inserts three records. Querying the Binlog on the Primary node at this point might show the following:
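The original screenshot is gone; per the description below, the relevant lines would have looked roughly like this (illustrative, since each transaction in a single session commits serially and gets a strictly increasing last_committed):

```
last_committed=0    sequence_number=1    (CREATE DATABASE)
last_committed=1    sequence_number=2    (CREATE TABLE)
last_committed=2    sequence_number=3    (INSERT 1)
last_committed=3    sequence_number=4    (INSERT 2)
last_committed=4    sequence_number=5    (INSERT 3)
```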
As you can see, because these operations ran serially in one session, they have different, strictly increasing last_committed values; ordinarily these Binlog events would be played back just as serially on a slave. Now let's look at the RelayLog in the MGR cluster:
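Again the original screenshot is lost; per the description below, the Secondary's RelayLog showed the same transactions sharing one last_committed value (illustrative):

```
last_committed=0    sequence_number=1
last_committed=0    sequence_number=2
last_committed=0    sequence_number=3
last_committed=0    sequence_number=4
last_committed=0    sequence_number=5
```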
Interestingly, in the RelayLog of the Secondary node these transactions carry the same last_committed value, which means they can be played back in parallel within the MGR cluster.
In MGR, WriteSet technology detects whether different transactions have write conflicts and re-plans their parallel playback accordingly. In 8.0, this technology was moved to the Binlog generation phase and adopted for the ordinary master-slave replication architecture.
IV. Parallel replication in MySQL 8.0
With all that background, we finally arrive at MySQL 8.0. From the description above, the reader should already have a rough picture of how 8.0 optimizes parallel replication: with WriteSet-based conflict detection, the parallelism recorded when the Binlog is generated on the host is no longer based on commit groups but on the transactions' actual update conflicts.
1. Related MySQL parameters
MySQL 8.0 introduces the parameter binlog_transaction_dependency_tracking to control how transaction dependencies are determined. It has three values:
- COMMIT_ORDER (the default) keeps determining dependencies from commit groups, as in 5.7.
- WRITESET determines dependencies from write sets.
- WRITESET_SESSION also uses WriteSets to determine dependencies, but transactions within the same Session never share the same last_committed value.
In the implementation, MySQL keeps a vector of hashes describing committed transactions: the primary-key and non-null unique-key values modified by a committing transaction are hashed and compared against the values in this vector, to determine whether the committing transaction updated the same rows as an earlier transaction and hence what its dependencies are. The size of the vector is controlled by the parameter binlog_transaction_dependency_history_size, which ranges from 1 to 1000000 and defaults to 25000.
In addition, the parameter transaction_write_set_extraction controls the hash algorithm used for dependency detection; it has three values: OFF | XXHASH64 | MURMUR32. If binlog_transaction_dependency_tracking is WRITESET or WRITESET_SESSION, this value cannot be OFF and cannot be changed.
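Putting the three parameters together, a sketch of enabling WriteSet-based dependency tracking on the host:

```sql
-- The hash algorithm must be set (not OFF) before WRITESET tracking is enabled.
SET GLOBAL transaction_write_set_extraction = 'XXHASH64';
-- Derive last_committed from actual row conflicts instead of commit groups.
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';
-- How many row hashes of committed transactions to retain (default 25000).
SET GLOBAL binlog_transaction_dependency_history_size = 25000;
```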
2. WriteSet dependency detection conditions
WriteSet decides whether transactions can be played back in parallel by detecting whether two transactions update the same records, so information about committed transactions must be kept at run time to record which rows historical transactions have updated. The amount of history kept is controlled by binlog_transaction_dependency_history_size: the larger the value, the more committed-transaction information can be recorded. Note that this value is not a transaction size but a count of tracked row updates. With WRITESET or WRITESET_SESSION enabled, MySQL identifies and records a transaction's updates according to the following rules (a small SQL illustration follows the list).
- If the row being updated has a primary key (PRIMARY KEY), then HASH(db name, table name, key name, key_value1, key_value2, ...) is added to the current transaction's write_set vector.
- If the row being updated has a non-null unique key (UNIQUE KEY NOT NULL), the same kind of HASH(db name, table name, key name, key_value1, ...) is added to the current transaction's write_set.
- If the row being updated has a non-null foreign key (FOREIGN KEY) constraint, the hash of the foreign-key information and value is added to the current transaction's write_set; and if the primary key of the table being updated is a foreign key of some other table, the current transaction is marked has_related_foreign_key = true. If a row is updated but nothing is added to the write_set, the current transaction is marked has_missing_key = true.
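As a hypothetical illustration of these rules (all names invented; a wstest.customers table with primary key id is assumed to exist), consider what a single-row update contributes to the write_set:

```sql
CREATE TABLE wstest.orders (
  id       INT PRIMARY KEY,           -- rule 1: HASH('wstest','orders','PRIMARY', id)
  order_no VARCHAR(32) NOT NULL,
  customer INT NOT NULL,
  UNIQUE KEY uk_order_no (order_no),  -- rule 2: HASH('wstest','orders','uk_order_no', order_no)
  FOREIGN KEY (customer) REFERENCES wstest.customers (id)  -- rule 3: the FK value is hashed too
) ENGINE = InnoDB;

-- This one-row update contributes a hash per key above to the transaction's
-- write_set; those hashes are what conflict detection compares.
UPDATE wstest.orders SET order_no = 'A-1001' WHERE id = 1;

-- By contrast, a table with no primary key and no non-null unique key adds
-- nothing to the write_set, so its transactions are marked
-- has_missing_key = true and fall back to COMMIT_ORDER dependencies.
```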
During conflict detection, has_related_foreign_key and has_missing_key are checked first; if either is true, the transaction falls back to COMMIT_ORDER mode. Otherwise, the hashes in the transaction's write_set are compared against the write_sets of committed transactions.
If there is no conflict, the current transaction shares the same last_committed as the previously committed transaction; if there is a conflict, all write_set entries committed before the conflicting transaction are deleted from the global committed write_set, and last_committed degenerates to the COMMIT_ORDER calculation.
After each transaction's last_committed value is calculated, MySQL checks whether the global committed-transaction write_set has grown beyond binlog_transaction_dependency_history_size; if so, the global committed write_set is cleared.
In terms of detection conditions, this feature depends on primary keys and unique indexes: if a table involved in a transaction has neither a primary key nor a non-null unique index, the feature brings no performance improvement for that transaction. In addition, the Binlog format must be set to ROW.
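A quick sketch of those preconditions, with one table that can benefit and one that cannot:

```sql
-- WriteSet tracking requires row-based Binlog.
SET GLOBAL binlog_format = 'ROW';

-- Can benefit: has a primary key, so every update lands in the write_set.
CREATE TABLE good_t (id INT PRIMARY KEY, v INT) ENGINE = InnoDB;

-- Cannot benefit: no primary key and no non-null unique index, so updates
-- set has_missing_key and fall back to COMMIT_ORDER dependencies.
CREATE TABLE bad_t (v INT, w INT) ENGINE = InnoDB;
```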
3. Performance improvement
The MySQL High Availability blog has tested replication performance with WriteSet enabled. The results are reproduced here; interested readers can consult the original post.
In the test, 1,000,000 transactions were executed on the host through Sysbench, and then the Slave's replication threads were started. The tests ran on a 16-core Xeon E5-2699 v3 host. Here are the results:
(Figure: benchmark results from the MySQL High Availability blog)
As you can see, WRITESET performs best when there are fewer client threads on the master, and with only a single connection WRITESET_SESSION and COMMIT_ORDER show little difference.
That concludes this sample analysis of the MySQL 8.0 master-slave replication model; I hope it serves as a useful reference.