What is the reason for the master-slave delay? 07/13 Update SLTechnology News&Howtos



Shulou(Shulou.com)05/31 Report--

This article explains what causes master-slave delay in MySQL. Many people run into this problem in real-world operations, so we will walk through how replication works and where the delay comes from.

Preface

In real production environments, a single standalone MySQL instance often cannot meet requirements for security, high availability, and high concurrency.

MySQL is therefore usually deployed as a cluster that uses master-slave replication (Master-Slave) to synchronize data, combined with read-write splitting (for example via MySQL-Proxy) to increase the concurrent load the database can handle.

To summarize, a MySQL master-slave cluster provides the following:

Higher load capacity: the master handles writes (inserts, updates, and deletes) as well as reads, while the slaves serve only queries.

Better read/write performance, scalability, and high availability.

Data backup and disaster recovery: a slave can be located at a different site, so if the master fails, the slave can take over quickly without a lengthy restore.

Master-slave synchronization depends on the binlog, so let's introduce the binlog first.

Binlog

What is the binlog, and what is it used for?

The binlog records the write operations performed by the database (queries are not recorded) and is stored on disk in binary form. Loosely speaking, it is a record of the SQL statements that changed data.

The binlog is MySQL's logical log and is written by the Server layer, so it is recorded no matter which storage engine a table uses.

In practical applications, there are two main usage scenarios for binlog:

Master-slave replication: the master sends its binlog to the slave, and the slave saves the events it receives in its relay log.

Data recovery: after a backup is restored, the binlog events recorded since the backup point can be replayed to bring the data forward to a later point in time.

Log format

Binlog files can be written in three formats: STATEMENT, ROW, and MIXED.

Before MySQL 5.7.7 the default format was STATEMENT; from MySQL 5.7.7 onward the default is ROW.

The format is set with the binlog_format system variable (the --binlog-format server option).

STATEMENT: statement-based replication; every SQL statement that modifies data is recorded in the binlog.

ROW: row-based replication; the binlog records how each affected row changed, rather than the statement that changed it.

MIXED: a mix of STATEMENT and ROW. Statement-based logging is used by default, and MySQL switches to row-based logging for statements that are unsafe to replicate as statements, such as those that call UUID().
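To see why a statement that calls UUID() is unsafe to log as a statement, here is a toy model in Python. It is only an illustration under stated assumptions: the table, the helper insert_uuids, and the event tuples are hypothetical and have nothing to do with MySQL's real binlog event format. Replaying the statement re-evaluates UUID() on the slave, while replaying row images copies the exact values written on the master.

```python
import uuid

def insert_uuids(table, n):
    """Hypothetical helper: simulate running `INSERT INTO t (token) VALUES (UUID())` n times."""
    rows = [str(uuid.uuid4()) for _ in range(n)]  # UUID() is evaluated at execution time
    table.extend(rows)
    return rows

master, slave_stmt, slave_row = [], [], []
new_rows = insert_uuids(master, 3)

# STATEMENT-style event: only the SQL text (and how many times it ran) is logged
statement_event = ("INSERT INTO t (token) VALUES (UUID())", 3)
# ROW-style event: the actual values written on the master are logged
row_event = list(new_rows)

# Replaying the statement re-runs UUID() on the slave, producing different values
insert_uuids(slave_stmt, statement_event[1])
# Replaying the row images copies the master's values exactly
slave_row.extend(row_event)

print("ROW replica matches master:", slave_row == master)        # True
print("STATEMENT replica matches master:", slave_stmt == master)  # False (fresh random UUIDs)
```

This is the divergence that MIXED avoids by switching to row-based logging for such statements.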

You can view the contents of a binlog file with the mysqlbinlog tool that ships with MySQL, for example:

mysqlbinlog mysql-bin.000001 | more

As the server runs, the binlog keeps growing and new binlog files are created, each with the next number in its suffix: mysql-bin.000002, mysql-bin.000003, and so on.

Principle of master-slave replication

MySQL master-slave replication uses three threads: the binlog dump thread on the master, and the I/O thread and SQL thread on the slave.

Binlog dump thread: when data on the master changes, the change is written to the master's binlog as events in the configured binlog format. The master runs a binlog dump thread for each connected slave to send it these events. When the slave's I/O thread requests log content, the dump thread also passes along the current binlog file name and position.

I/O thread: this thread connects to the master, asks the binlog dump thread for the binlog starting at a specified file and position, and stores what it receives in the local relay log.

SQL thread: when this thread detects new content in the relay log, it reads the events and replays them locally, re-executing the changes that happened on the master so that the slave's data stays in sync.

Basic process summary


1. The master writes data and records the changes in its binlog. MySQL writes transactions to the binary log serially, even if the statements of different transactions were interleaved during execution.

2. After the events are written to the binary log, the master tells the storage engine to commit the transaction.

3. The I/O thread on the slave connects to the master and requests binlog content starting from a specified position in a specified binlog file.

4. When the master receives the request, its binlog dump thread reads the binlog in batches according to the file and position the slave asked for, and sends the events back to the slave's I/O thread.

5. When the slave's I/O thread receives the log content, together with the binlog file name and position, it appends the events to the end of its own relay log and records the new binlog file name and position in the master-info file. The next time it asks the master for new binlog content, it can tell the master exactly which file and position to start reading from.

6. The SQL thread on the slave watches the local relay log for new content and replays the events in order to update the slave's data.

7. The slave records the relay log file name and position it has applied up to in relay-log.info, so that replication can resume from that point.
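The seven steps can be modeled as a toy, single-process Python simulation. It is a sketch under simplifying assumptions: positions are list indexes rather than byte offsets, the class and field names (Master, Slave, master_info_pos, relay_log_info_pos) are hypothetical stand-ins for the master-info and relay-log.info records, and the two slave "threads" are plain method calls run in turn rather than real threads.

```python
from dataclasses import dataclass, field

@dataclass
class Master:
    binlog: list = field(default_factory=list)  # events in commit order
    data: dict = field(default_factory=dict)

    def commit(self, key, value):
        self.binlog.append((key, value))  # steps 1-2: write the event to the binlog, then commit
        self.data[key] = value

    def dump(self, pos, batch=2):
        """Binlog dump thread: return up to `batch` events starting at `pos`."""
        return self.binlog[pos:pos + batch]

@dataclass
class Slave:
    relay_log: list = field(default_factory=list)
    data: dict = field(default_factory=dict)
    master_info_pos: int = 0     # role of master-info: next binlog position to request
    relay_log_info_pos: int = 0  # role of relay-log.info: next relay log event to apply

    def io_thread_step(self, master):
        """Steps 3-5: fetch a batch from the master and append it to the relay log."""
        events = master.dump(self.master_info_pos)
        self.relay_log.extend(events)
        self.master_info_pos += len(events)
        return len(events)

    def sql_thread_step(self):
        """Steps 6-7: apply every relay log event not yet applied, in order."""
        applied = 0
        while self.relay_log_info_pos < len(self.relay_log):
            key, value = self.relay_log[self.relay_log_info_pos]
            self.data[key] = value
            self.relay_log_info_pos += 1
            applied += 1
        return applied

m, s = Master(), Slave()
for i in range(5):
    m.commit(f"k{i}", i)

# Run the I/O and SQL "threads" alternately until the slave has caught up
while s.io_thread_step(m):
    s.sql_thread_step()

print(s.data == m.data, s.master_info_pos, s.relay_log_info_pos)  # True 5 5
```

Because each side remembers its own position, either "thread" can stop and resume without losing or repeating events, which is the purpose of the two position records.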

Parallel replication

Before MySQL 5.6, the slave had two replication threads: the I/O thread and the SQL thread.

The I/O thread receives the binary log, and the SQL thread replays it. When parallel replication is enabled in MySQL 5.6, the SQL thread becomes a coordinator thread, which is mainly responsible for two things: if a transaction can be applied in parallel, it picks a worker thread to apply it; if it cannot (for example, a DDL statement or a transaction that spans databases), it waits until all worker threads have finished and then applies the event itself.

The red box in the image above is the key to parallel replication.

In other words, the coordinator thread does not only hand events to the worker threads; it can also replay events itself. Everything that runs in parallel, however, is executed by the workers.

The coordinator thread and the worker threads form a classic producer-consumer model.
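A minimal Python sketch of that producer-consumer structure, assuming a simple round-robin dispatch policy. This policy is an assumption for illustration only, not how MySQL assigns transactions (its two strategies are described next). The coordinator is the producer that puts transactions on per-worker queues; each worker thread consumes from its own queue until it sees a stop sentinel.

```python
import queue
import threading

def run_coordinator(transactions, n_workers=4):
    """Toy coordinator: dispatches transactions to per-worker queues and waits for the workers."""
    queues = [queue.Queue() for _ in range(n_workers)]
    applied = []                 # (worker_id, txn) pairs, recorded by the workers
    lock = threading.Lock()

    def worker(wid, q):
        while True:
            txn = q.get()
            if txn is None:      # sentinel: no more work for this worker
                break
            with lock:
                applied.append((wid, txn))  # stand-in for applying the transaction

    threads = [threading.Thread(target=worker, args=(i, q)) for i, q in enumerate(queues)]
    for t in threads:
        t.start()
    for i, txn in enumerate(transactions):
        queues[i % n_workers].put(txn)      # producer side: round-robin dispatch
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return applied

result = run_coordinator([f"trx{i}" for i in range(6)], n_workers=3)
print(sorted(result))
# [(0, 'trx0'), (0, 'trx3'), (1, 'trx1'), (1, 'trx4'), (2, 'trx2'), (2, 'trx5')]
```

Round-robin ignores conflicts between transactions, which is exactly the problem the real dispatch strategies have to solve.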

However, only from MySQL 5.7 onward can this be called true parallel replication. The main reason is that the slave replays transactions with the same parallelism they had on the master: transactions that were executed in parallel on the master are replayed in parallel on the slave. Parallelism is no longer limited to transactions on different databases, and there are no special requirements on the binary log format.

To stay compatible with the database-based parallel replication of MySQL 5.6, MySQL 5.7 introduced a new variable, slave_parallel_type, which accepts the following values:

DATABASE: the default; transactions on different databases are applied in parallel.

LOGICAL_CLOCK: parallel replication based on group commit.

The two methods are described below.

Parallel by database

Each worker thread has a hash table that holds the databases involved in the transactions currently in its execution queue. The key of the hash table is the database name, and it is used to decide which worker a transaction is dispatched to. The advantage of this strategy is that the hash key is cheap to compute, since only the database name is needed, and it places no requirement on the binlog format.

However, this strategy works well only when the master has several databases and the load on them is balanced. If all the tables on the master are in the same database, or if the load is concentrated on one hot database, it brings little benefit.
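The sketch below illustrates database-based dispatch in Python under simplifying assumptions that are not MySQL's exact rules: each transaction touches exactly one database, a database stays with the same worker for the whole run (the hash table described above only tracks transactions still queued), and a database seen for the first time goes to the least-loaded worker. The function name and the sample database names are hypothetical. It also shows why a single hot database gets no parallelism.

```python
def dispatch_by_database(transactions, n_workers):
    """transactions: list of (txn_id, database). Returns list of (txn_id, database, worker)."""
    owner = {}              # database name -> worker id (the per-database hash lookup)
    load = [0] * n_workers  # transactions assigned to each worker so far
    plan = []
    for txn_id, db in transactions:
        if db not in owner:
            # First time this database appears: give it to the least-loaded worker
            owner[db] = min(range(n_workers), key=lambda w: load[w])
        worker = owner[db]  # same database -> same worker, preserving per-database order
        load[worker] += 1
        plan.append((txn_id, db, worker))
    return plan

balanced = dispatch_by_database([(1, "shop"), (2, "user"), (3, "shop"), (4, "log")], n_workers=2)
hot = dispatch_by_database([(i, "shop") for i in range(4)], n_workers=2)
print([w for _, _, w in balanced])  # [0, 1, 0, 1]
print([w for _, _, w in hot])       # [0, 0, 0, 0]: one hot database uses only one worker
```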

Group commit optimization

It relies on two properties:


1. Transactions that commit in the same group cannot modify the same rows: if two transactions needed a lock on the same row, one would have to wait for the other, and they could not enter the commit phase together.

2. Transactions that could be executed in parallel on the master can therefore also be executed in parallel on the slave.

How it is implemented:


1. Transactions committed together in the same group get the same commit_id, the next group gets commit_id+1, and the commit_id is written directly to the binlog. (In the MySQL 5.7 binlog, this information appears as the last_committed and sequence_number fields of each transaction.)

2. On the slave, transactions with the same commit_id are distributed to multiple workers and executed in parallel. Only after every transaction in that commit_id group has finished does the coordinator fetch the next batch.
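A Python sketch of the commit_id scheme as described in the text, using that simplified commit_id model rather than MySQL's internal scheduling code. The binlog is modeled as a hypothetical list of (commit_id, transaction) pairs: transactions are grouped by commit_id, each group is applied in parallel on a thread pool, and the next group starts only after the whole group has finished.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def replay_by_commit_id(binlog, apply, n_workers=4):
    """binlog: list of (commit_id, txn) in binlog order. Returns the groups in replay order."""
    groups = defaultdict(list)
    for commit_id, txn in binlog:
        groups[commit_id].append(txn)

    replayed = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for commit_id in sorted(groups):
            # Same commit_id => committed together on the master => no row conflicts, run in parallel
            list(pool.map(apply, groups[commit_id]))  # list() waits for the whole group to finish
            replayed.append((commit_id, groups[commit_id]))
    return replayed

applied = []
lock = threading.Lock()

def apply(txn):
    with lock:
        applied.append(txn)  # stand-in for executing the transaction on the slave

binlog = [(1, "t1"), (1, "t2"), (2, "t3"), (3, "t4"), (3, "t5")]
print(replay_by_commit_id(binlog, apply))
# [(1, ['t1', 't2']), (2, ['t3']), (3, ['t4', 't5'])]
```

Within a group the order of t1 and t2 is not fixed, but t3 always runs after both of them, because the pool must finish group 1 before group 2 is submitted.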

More details can be found on the official website: https://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html

Now let's look at master-slave delay itself.

Master-slave delay

What is master-slave delay?

From the replication process described above, there is a window during which the data on the master and on the slave is inconsistent. This is what we call master-slave delay.

Let's look at the points in time involved:

Master A executes a transaction and writes it to the binlog; call this moment T1.

The binlog is sent to slave B, which receives it at moment T2.

Slave B finishes executing the transaction at moment T3.

The master-slave delay is the difference between the time the slave finishes executing a transaction and the time the master executed the same transaction, namely T3 - T1.

You can also run SHOW SLAVE STATUS on the slave; the Seconds_Behind_Master field in the result shows how many seconds the slave is currently behind.

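How is seconds_behind_master calculated?

The binlog events of each transaction carry a timestamp that records when the transaction was written on the master.

The slave takes the timestamp of the transaction it is currently executing and subtracts it from its current system time; the result is seconds_behind_master, which approximates the T3 - T1 described earlier. MySQL also compensates for any difference between the master's and the slave's system clocks, which the slave measures when its I/O thread connects to the master.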
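A simplified Python version of this calculation, together with the T1/T2/T3 breakdown from the previous section. The function and parameter names are hypothetical, times are Unix timestamps in seconds, and special cases are ignored (for example, MySQL reports NULL when the replication threads are not running).

```python
def seconds_behind_master(event_ts, slave_now, clock_diff=0):
    """
    event_ts:   timestamp stored in the binlog event the SQL thread is executing (master's clock)
    slave_now:  current time on the slave
    clock_diff: slave clock minus master clock, measured when the I/O thread connected
    """
    lag = slave_now - clock_diff - event_ts
    return max(0, int(lag))  # this sketch never reports a negative delay

def delay_breakdown(t1, t2, t3):
    """Split the T3 - T1 delay into transfer time (T2 - T1) and apply time (T3 - T2)."""
    return {"transfer": t2 - t1, "apply": t3 - t2, "total": t3 - t1}

print(seconds_behind_master(event_ts=1_700_000_000, slave_now=1_700_000_012))  # 12
print(seconds_behind_master(1_700_000_000, 1_700_000_012, clock_diff=2))       # 10
print(delay_breakdown(t1=100, t2=101, t3=130))  # {'transfer': 1, 'apply': 29, 'total': 30}
```

In the made-up breakdown, almost all of the delay is apply time, which is the typical situation discussed next.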

Causes of master-slave delay

Why does master-slave delay happen?

Under normal conditions, if the network adds no significant latency, transferring the log from the master to the slave takes very little time, so T2 - T1 is usually negligible.

The biggest factor is how long the slave takes to consume the relay log, that is, T3 - T2. A slow apply is generally caused by the following:

1. The slave machine performs worse than the master.

For example, suppose 20 master instances are spread over four machines while all of their slaves are placed on a single machine. Applying updates also triggers a large number of read operations, so the many slave instances on that one machine compete for resources, which causes master-slave delay.

Nowadays, however, most deployments use machines of the same specification for the master and the slaves.

2. The slave is under heavy load.

Under the usual read-write splitting strategy, the master provides write capacity and the slaves provide read capacity. If a large volume of queries is sent to a slave, it consumes a lot of CPU there, which slows down replication and causes master-slave delay.

In this case, you can spread the read load across several slaves (one master, multiple slaves), or ship the binlog to an external system such as Hadoop and let that system serve the queries.

3. Large transactions

A transaction is written to the binlog only when it commits, so the master must finish a large transaction before it reaches the binlog. The slave cannot start applying it until then, and applying it takes roughly as long again, so a transaction that runs for 10 minutes on the master can delay the slave by roughly 10 minutes.

For example, if the master executes a very large INSERT ... SELECT, it can generate hundreds of gigabytes of binlog that must be transferred to the read-only nodes, which then fall behind while applying it.

This is why DBAs often remind developers not to delete large amounts of data with a single DELETE statement, but to limit the amount of data each statement touches and work in batches.
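A minimal sketch of deleting in batches, written in Python against an in-memory SQLite database so that it runs anywhere. SQLite only stands in for MySQL here, and the orders table and status column are hypothetical; on MySQL itself, a single-table DELETE ... WHERE ... LIMIT n serves the same purpose. Each batch commits as its own small transaction, so the binlog receives many small transactions instead of one huge one, and the optional pause gives the slaves time to apply each batch.

```python
import sqlite3
import time

def delete_in_batches(conn, batch_size=1000, pause_s=0.0):
    """Delete expired rows from the hypothetical `orders` table, batch_size rows at a time."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM orders WHERE id IN "
            "(SELECT id FROM orders WHERE status = 'expired' LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # each batch is its own small transaction
        total += cur.rowcount
        if cur.rowcount < batch_size:  # last (partial or empty) batch
            return total
        time.sleep(pause_s)  # the "sleep" in update-sleep: let replicas catch up

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("expired",)] * 2500 + [("active",)] * 100,
)
conn.commit()
print(delete_in_batches(conn, batch_size=1000))                   # 2500
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 100
```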

4. DDL on the master (ALTER, DROP, CREATE)

1. DDL is replicated to the read-only nodes serially. If a DDL operation takes a long time on the master, it takes about as long on the slave. For example, if adding a column to a table with 5 million rows takes 10 minutes on the master, adding it on the slave will also take about 10 minutes.

2. If a query on the slave takes a very long time to run, it blocks the DDL replicated from the master: the DDL has to wait for the table's metadata lock until the query ends, and replication stalls behind it, so the slave's data falls behind.

5. Lock conflicts

Lock conflicts can also slow down the SQL thread on the slave, for example when queries on the slave use SELECT ... FOR UPDATE, or when tables use the MyISAM engine, which locks at table level.

6. The slave's replication capacity

Normally, if the slave falls a few minutes behind because of some incident, it catches up with the master once it recovers. However, if the slave applies changes more slowly than the master generates them and the master is under sustained load, the delay keeps growing and can become very long. In that case the likely problem is the slave's replication (apply) capacity.
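The argument above is simple arithmetic: if the master produces transactions faster than the slave can apply them, the backlog grows every second; if the slave is faster, a temporary backlog drains. A toy Python model, assuming constant rates (real workloads are burstier); the function name and the rates are made up for illustration.

```python
def simulate_backlog(master_tps, slave_tps, seconds, initial_backlog=0):
    """Return the relay log backlog (unapplied transactions) at the end of each second."""
    backlog = initial_backlog
    history = []
    for _ in range(seconds):
        backlog += master_tps               # new transactions arriving from the master
        backlog -= min(backlog, slave_tps)  # the slave applies at most slave_tps per second
        history.append(backlog)
    return history

# Slave slower than a constantly busy master: the backlog grows without bound
print(simulate_backlog(master_tps=1000, slave_tps=800, seconds=5))
# [200, 400, 600, 800, 1000]

# A faster slave recovering from an incident: the backlog drains
print(simulate_backlog(master_tps=500, slave_tps=800, seconds=5, initial_backlog=900))
# [600, 300, 0, 0, 0]
```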

On the slave, changes are applied by the sql_thread, which was single-threaded before MySQL 5.6. So when concurrency and TPS on the master are high, the slave falls far behind.

To address this, MySQL 5.6 introduced database-based parallel replication, and MySQL 5.7 added group-commit-based parallel replication. To use the latter, set slave_parallel_workers to a value greater than 0 on the slave and set slave_parallel_type to LOGICAL_CLOCK.

Master-slave synchronization is always a tradeoff between consistency and performance, and the right balance depends on the application scenario. To reduce master-slave delay, you can take the following measures:

Reduce the chance of many large transactions running concurrently, and optimize the business logic.

Optimize SQL to avoid slow queries, and reduce large batch operations; for bulk changes, write scripts in an update-sleep style that change a small batch and then pause.

Give the slave better hardware, so that it is not much slower than the master at writing and reading binlogs.

Keep the network path short: place the master and slave close to each other and provide enough bandwidth to reduce the network latency of binlog transmission.

Force business reads that need real-time data to go to the master, and use the slave only for disaster recovery and backup.
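The last measure pairs naturally with read-write splitting. Below is a hypothetical Python routing helper (not the API of MySQL-Proxy or any other real proxy), assuming the application can mark which reads need real-time data: writes, locking reads, and real-time reads go to the master, while other reads are spread round-robin over the slaves, which also shares the read load as suggested in cause 2.

```python
import itertools

class ReadWriteRouter:
    """Hypothetical router choosing a backend name for each SQL statement."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql, realtime=False):
        text = sql.lstrip().upper()
        is_plain_read = text.startswith("SELECT") and "FOR UPDATE" not in text
        if not is_plain_read or realtime or self._slaves is None:
            return self.master     # writes, locking reads and real-time reads
        return next(self._slaves)  # everything else: round-robin over the slaves

router = ReadWriteRouter("master", ["slave1", "slave2"])
print(router.route("UPDATE orders SET status = 'paid' WHERE id = 1"))    # master
print(router.route("SELECT * FROM orders WHERE id = 1"))                 # slave1
print(router.route("SELECT * FROM orders WHERE id = 1", realtime=True))  # master
print(router.route("SELECT COUNT(*) FROM orders"))                       # slave2
```

Detecting reads by the leading keyword is a deliberate simplification; a real router would need a proper SQL parser and awareness of open transactions.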


