How to repair and find problems after MYSQL MGR crash

2025-02-23 Update From: SLTechnology News&Howtos > Internet Technology


Shulou(Shulou.com)06/02 Report--

This article explains how to repair a MySQL MGR cluster after a crash and how to track down the root cause. The content is fairly detailed; I hope interested readers find it helpful.

MySQL Group Replication is, I would guess, not used by most companies, even in major projects and critical systems. As a result, there is not much material online about repairing MySQL Group Replication. As it happens, the MGR of our test system crashed recently.

Our MGR test system consists of three MySQL 5.7.23 nodes plus ProxySQL. Once before, one machine went down (for network reasons), but MGR kept providing database service steadily. This crash was not as simple as the last one: two of the three machines went down. In fact, this had something to do with the lack of monitoring; because it is a test environment, there was no monitoring, hence this investigation.

Seen from the second machine (a secondary), the primary could not be reached, and machine 3 was not in the member list at all; on machine 3 itself, the node was in the ERROR state. Although the primary appeared to be alive, it could not be logged into.

Both the project manager and the developers use this test system, so the instruction was simple: analyze the problem and fix it. In fact, I also wanted to analyze in detail what had gone wrong.

After saving the error log, I tried to restore the primary first; it could be logged into after a restart, and the commands could be run again. Usually you have to rebuild the group from scratch, and it is essential to know which node was the last primary before the crash. The following commands must then be run on that node. (This is very important and affects the subsequent recovery.)

SET GLOBAL group_replication_bootstrap_group = ON;

START GROUP_REPLICATION;

SET GLOBAL group_replication_bootstrap_group = OFF;

After running these commands, the primary was up and generated the corresponding log.
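At this point it is worth confirming that the bootstrapped node is actually ONLINE. On MySQL 5.7 this can be checked from performance_schema (a standard query, not taken from the original article):

```sql
-- Check the group membership as seen by this node.
-- After bootstrapping, the node should appear with MEMBER_STATE = 'ONLINE'.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members;
```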

Then move to the second machine for recovery and re-execute the following commands:

CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='password' FOR CHANNEL 'group_replication_recovery';

SET GLOBAL group_replication_allow_local_disjoint_gtids_join = ON; (this variable no longer exists in MySQL 8)

START GROUP_REPLICATION;

SET GLOBAL group_replication_allow_local_disjoint_gtids_join = OFF;

After executing these, wait a moment.

The two machines were restored, the cluster could work normally, and it was accessible again through ProxySQL.
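One way to confirm that the cluster is reachable from ProxySQL is to check its runtime server table from the ProxySQL admin interface (standard ProxySQL admin queries, not from the original article):

```sql
-- Run from the ProxySQL admin interface (default port 6032).
-- Backends that ProxySQL considers usable should show status = 'ONLINE'.
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers;
```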

At that point one machine was still missing, and its recovery was not so simple. While re-adding the third machine to the cluster, the following problem appeared.

[ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)

From this error, at least two things can be inferred:

1. If this server tries to rejoin the cluster directly, it is very unlikely to succeed, because the binary logs it needs are no longer available.

2. The time at which this server left the cluster must be earlier than the time at which the cluster failed.
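The situation can be confirmed by comparing GTID sets: recovery from binlogs is only possible if everything the donor has purged is already present on the joiner. A quick check using standard GTID functions (my own illustration, not from the original article):

```sql
-- On a surviving member (the donor): what has been purged from its binlogs.
SELECT @@GLOBAL.gtid_purged;

-- On the joining node: what it has already executed.
SELECT @@GLOBAL.gtid_executed;

-- Substitute the donor's purged set as the first argument on the joiner.
-- GTID_SUBSET returns 1 if the first set is contained in the second;
-- a result of 0 means binlog-based recovery cannot work.
SELECT GTID_SUBSET('<donor gtid_purged here>', @@GLOBAL.gtid_executed);
```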

Later, while analyzing the error logs, I confirmed both of these guesses.

How to restore this third machine? The fastest way is to restore it from a backup. After backing up the primary with xtrabackup, however, the prepare phase turned out to be very slow, and copying the logs during the backup itself was also very slow. I suspected something was wrong inside the instance.

Strangely, during the recovery process, restoring the backup files to the third machine produced an error.

Going back to look at machine 1, the former primary, it had indeed crashed.

There were also problems with the doublewrite buffer: some data may not have been fully written, which is why a backup taken from that host failed when it was later restored on the third machine.

Later, since the database on machine 2 was still normal, I simply restarted MySQL on machine 1. The figure below shows one of the reasons the backup of machine 1 was extremely slow during both the backup and the xtrabackup prepare: most of the data needed to be undone.

The situation now was that machines 1 and 2 started normally. In line with the state at the time, machine 1 was again made the primary (member weights for MGR are set in the configuration file). The steps for re-initializing MGR are omitted here (an earlier article covered MGR installation).

Soon, machines 1 and 2 were back to normal, the cluster was healthy, and external access worked again.

Next, back to the last question: how to restore machine 3. Through backup and recovery, machine 3 was restored. After startup it automatically tried to connect to the cluster, but it failed; after 10 attempts it was evicted by the group. The reason was simple: a GTID conflict. The fix was to set the GTID information that xtrabackup recorded at backup time.

After applying this GTID_PURGED value, everything was OK.
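A typical way to apply it looks like this (a sketch based on the usual xtrabackup restore procedure; the GTID set shown is a placeholder, and the real value comes from the xtrabackup_binlog_info file in the backup):

```sql
-- Clear the restored server's GTID state first.
RESET MASTER;

-- Declare which transactions the backup already contains, using the
-- GTID set recorded in xtrabackup_binlog_info.
-- The UUID:range below is a placeholder, not a real value.
SET GLOBAL gtid_purged = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-12345';

-- Now the node can join the group again.
START GROUP_REPLICATION;
```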

Machine 3 was then added back to the cluster successfully.

The whole cluster was restored.

Judging from the error logs and related evidence, the rough picture is this: machine 3 had already been out of the cluster for some time due to network problems, and the cluster's unavailability was roughly caused by a tester's stress test of the system, as shown in the figure above: the cleaner thread could not flush dirty pages from memory to disk in time. It is probably impossible now to find out exactly which statements were executed at the time. Later we will consider installing an audit facility to record statements and provide more information for handling such problems.

That is all on how to repair and diagnose a MySQL MGR cluster after a crash. I hope the above content is helpful and teaches you something. If you think the article is good, please share it so more people can see it.
