Example Analysis of MHA Research and Application

2025-07-11 Update From: SLTechnology News&Howtos shulou

I. Problems and requirements

1.1. Summary of problems

1. There is no tool for quickly switching a cluster's master. Every switch requires a DBA to manually repoint the slaves, update the meta information, and so on.

2. The solution must go online quickly without affecting the current architecture.

3. Everything should be automated for the DBA's convenience: checking, operation, display, and so on.

II. MHA introduction

2.1. What is MHA

MHA (Master High Availability) is a relatively mature high-availability solution for MySQL. It was developed by Yoshinori Matsunobu at DeNA, a Japanese company, and handles failover and slave promotion in MySQL high-availability environments.

When a MySQL master fails, MHA can usually complete the failover automatically within 30 seconds while preserving data consistency as far as possible, which is what makes it high availability in the real sense.

2.2. The MHA principle

Phase 1: Configuration Check Phase

MHA reads its default configuration file and then determines the status of all MySQL nodes and the relationships between them: which node is the master, which are slaves, which are dead and which are alive.

Phase 2: Dead Master Shutdown Phase

MHA uses master_ip_failover_script and shutdown_script to shut down or deactivate the dead master (for example by switching its IP or DNS entry, as defined in advance in a custom script) to guard against split-brain. I tend to write these scripts in Python.

(Execute master_ip_failover_script --command=stopssh to invalidate the original master's IP; execute shutdown_script --command=stopssh to shut down the original master.)

Phase 3: Master Recovery Phase

3.1 Compare the binlog positions of all slaves to find the latest and the oldest binlog file and position.

3.2 Try to save the binlog from the original master.

3.3 Select the new master according to the no_master and candidate_master settings and the delay of each slave, and generate the logs the new master is missing.

3.4 Apply the missing logs to the newly selected master and activate it (this generates the "change master to" statement).

Phase 4: Slaves Recovery Phase

4.1 Fill in the missing logs on the slaves, taken either from the master or from the latest slave.

4.2 Execute the "change master to" command generated in the previous steps on all available slaves.

Phase 5: New master cleanup

Clear the slave information on the new master.

[info] * Phase 1: Configuration Check Phase..

[info] * * Phase 1: Configuration Check Phase completed.

[info] * Phase 2: Dead Master Shutdown Phase..

[info] * Phase 2: Dead Master Shutdown Phase completed.

[info] * Phase 3: Master Recovery Phase..

[info] * Phase 3.1: Getting Latest Slaves Phase..

[info] * Phase 3.2: Saving Dead Master's Binlog Phase..

[info] * Phase 3.3: Determining New Master Phase..

[info] * Phase 3.3: New Master Diff Log Generation Phase..

[info] * Phase 3.4: Master Log Apply Phase..

[info] * Phase 3: Master Recovery Phase completed.

[info] * Phase 4: Slaves Recovery Phase..

[info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..

[info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..

[info] * Phase 5: New master cleanup phase..

[info] Sending mail..
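The selection in Phase 3.3 can be sketched roughly as below. This is a simplified illustration, not MHA's actual code: the `Slave` record and `select_new_master` function are hypothetical names, and the sketch assumes that only no_master, candidate_master and the replication position matter, and that binlog file names are zero-padded so they compare correctly as strings. Real MHA applies additional checks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    host: str
    binlog_file: str          # master binlog file the slave has read up to
    binlog_pos: int           # position inside that file
    no_master: int = 0        # 1 = must never be promoted
    candidate_master: int = 0 # 1 = preferred for promotion


def select_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Pick a promotion target, or None if every slave is excluded."""
    eligible = [s for s in slaves if s.no_master == 0]
    if not eligible:
        return None
    # Assumption: prefer candidate_master=1 first, then the most advanced
    # (file, position) pair, i.e. the slave with the least delay.
    return max(
        eligible,
        key=lambda s: (s.candidate_master, s.binlog_file, s.binlog_pos),
    )
```

With three slaves where one has no_master=1 and another has candidate_master=1, the candidate is chosen even if it is slightly behind; whether that trade-off is acceptable depends on how much delay you tolerate.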

2.3. Advantages of MHA

1. It does not affect server performance, is easy to install, and does not change the existing deployment.

2. Automatic fault detection and failover, usually within 30 seconds.

3. Data consistency guarantee.

2.4. MHA modes

By node role, MHA is divided into manager and node: the MHA management machine and the cluster nodes.

By switching type, it is divided into online and failover: switching while the master is alive, and switching after the master has failed.

2.5. MHA requirements

1. At least one master and one slave

2. SSH mutual trust between hosts

3. A MySQL account

4. A Linux system

5. MySQL version 5.0 or later

6. mysqlbinlog must be version 3.3 or above

7. log-bin must be set on every MySQL server that may become master

8. Replication filtering rules must be identical on all MySQL servers

9. A replication account must exist on every server that may become master

10. Do not use the LOAD DATA INFILE command with statement-based replication

11. relay_log_purge must be turned off, so relay logs have to be cleaned regularly by script or by hand (cleaning method: set global relay_log_purge=1; flush logs;)

2.6. MHA version selection

Starting with version 0.56 of MHA, GTID-based failover is also supported. MHA automatically detects whether mysqld is running on GTID. If GTID is enabled, MHA performs failover with GTID. If not, MHA uses relay log-based failover.

2.7. Important parameters

ignore_fail=1: failover continues even if this server has failed

candidate_master=1: 1 = preferred as the new master, 0 = no priority

no_master=1: 1 = can never become master, 0 = can become master

III. MHA implementation

3.1. Architecture diagram

3.2. Specific implementation and automation

3.2.1. mhacluster: the program for MHA operation, deployment and inspection

mhacluster integrates the daily operation, deployment, checking and repair functions for MHA.

These include:

Adding mutual trust relationships

Installing the MHA packages, etc.

Authorizing the relevant accounts

Checking the correctness of ssh, repl, conf, etc.

Automatically repairing problem clusters

Automatically updating the conf file of each cluster

Automatically updating aliases for ease of use

Automatically cleaning up relay logs

The functions are as follows

3.2.2. mhacluster function: mutual trust

Set up mutual trust from the central control / MHA management machine to all machines.

Add mutual trust within the cluster, using the make_ssh_authentication.sh script to add it automatically.

3.2.3. mhacluster function: authorization

Grant the database users that need SUPER permission on all instances of the source cluster, by IP.

3.2.4. mhacluster function: relay_log_purge setting

relay_log_purge is set to off by default, and a scheduled script task cleans up the relay logs regularly.

The purge script that ships with MHA can be deployed to crontab (it is available automatically after installing the node package). This method is not used for the time being; instead, our own script simulates it for cleaning.
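A minimal sketch of such a simulated cleanup is shown below, based on the cleaning method given in the requirements (set global relay_log_purge=1; flush logs;). The function name `purge_relay_logs` and the `execute` callable are hypothetical; in practice `execute` would be something like a database cursor's execute method. Switching relay_log_purge back to 0 afterwards is an assumption added here so that the instance keeps meeting the MHA requirement.

```python
def purge_relay_logs(execute):
    """Run the cleanup statements through `execute`, a callable that
    takes a single SQL string (for example cursor.execute)."""
    statements = [
        "SET GLOBAL relay_log_purge=1",
        "FLUSH LOGS",
        # Assumption: turn purging off again once the cleanup is done.
        "SET GLOBAL relay_log_purge=0",
    ]
    for sql in statements:
        execute(sql)
    return statements
```

Passing the executor in as a parameter keeps the sketch independent of any particular MySQL driver and makes it easy to dry-run by passing `print`.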

3.2.5. mhacluster function: collecting and updating mysqlbinlog information

As the binlog directory is not fixed at present, scripts collect it into the meta information for the time being.

3.2.6. mhacluster function: installing packages

Install 2 software packages on all instances:

rpm -ivh /data/soft/perl-DBD-MySQL-4.013-3.el6.x86_64.rpm

rpm -ivh /data/soft/mha4mysql-node-0.56-0.el6.noarch.rpm

3.2.7. mhacluster function: checking ssh/repl/conf

Run the masterha_check_ssh and masterha_check_repl checks that MHA requires.

Check whether the MHA configuration file is consistent with the meta information.
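The consistency check between a cluster conf file and the meta information could look roughly like this. It is only a sketch: `conf_mismatches` is a hypothetical name, and the `host` and `port` keys in each meta row are assumed, since the article does not list how the meta table stores them. Only rows with mha_write_into_conf=1 are expected to appear in the file.

```python
import configparser


def conf_mismatches(conf_text, meta_rows):
    """Compare the [serverN] sections of an MHA conf with meta rows."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Every section that defines a hostname is treated as a server entry.
    in_conf = {
        (parser[name]["hostname"], parser[name].getint("port"))
        for name in parser.sections()
        if "hostname" in parser[name]
    }
    in_meta = {
        (row["host"], int(row["port"]))
        for row in meta_rows
        if row["mha_write_into_conf"] == 1
    }
    return {
        "missing_in_conf": sorted(in_meta - in_conf),
        "not_in_meta": sorted(in_conf - in_meta),
    }
```

An empty list under both keys means the file and the meta information agree on which instances belong to the cluster.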

3.2.8. mhacluster function: automatically repairing problem clusters

Automatically repair clusters that fail the ssh/repl/conf checks.

3.2.9. mhacluster function: updating aliases and MHA configuration files

Update the common aliases for MHA:

alias masterha_check_ssh.1='masterha_check_ssh --conf=/data/masterha/conf/1#testdb'

alias masterha_check_repl.1='masterha_check_repl --conf=/data/masterha/conf/1#testdb'

Note: the 1 here is the cluster number.

Update the MHA configuration files:

Update the MHA conf file of each cluster based on the meta information.

3.2.10. mhacluster function: authorizing the accounts MHA requires

Automatically grant the accounts that MHA requires.

3.4. Meta-information fields

cluster_id: cluster number

cluster_name: cluster name

role: role of the instance (Master, Slave or Backup)

binlog_dir: binlog directory

no_master: whether the instance is barred from becoming master; 1 = cannot be master, 0 = can be master

candidate_master: whether the instance is preferred when choosing a new master; 1 = preferred, 0 = not preferred

mha_write_into_conf: whether the instance is written to the MHA configuration file; 1 = written, 0 = not written

3.5. MHA configuration

3.5.1. MHA default configuration file

[root@dbmon conf]# vi /etc/masterha_default.cnf

[server default]

manager_workdir=/data/masterha/work/

user=dba

password=

ssh_user=root

repl_user=repadm

repl_password=

ping_interval=30

shutdown_script=""

master_ip_failover_script="/data/mha/master_ip_failover_script.py" Note: main script for failover-mode switching

master_ip_online_change_script="/data/mha/master_ip_online_change_script.py" Note: main script for online-mode switching

report_script="/data/mha/send_report.py" Note: script that sends the report mail after switching

3.5.2. Cluster conf configuration file for MHA

Generated automatically from the meta information.

[server1]

hostname=10.0.0.1

port=3306

master_binlog_dir=/data/my3306/

ignore_fail=1

no_master=1

candidate_master=0

[server2]

hostname=10.0.0.2

port=3306

master_binlog_dir=/data/my3306/

ignore_fail=1

no_master=0

candidate_master=0
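Generating a file like the one above from meta information might be sketched as follows. The function name `render_cluster_conf` and the `host` and `port` keys are assumptions for illustration; the other keys follow the meta-information fields in section 3.4. The sketch hard-codes ignore_fail=1 because both servers in the example use it.

```python
def render_cluster_conf(meta_rows):
    """Render [serverN] sections for rows flagged mha_write_into_conf=1."""
    lines = []
    index = 0
    for row in meta_rows:
        if row["mha_write_into_conf"] != 1:
            continue  # this instance is deliberately kept out of the conf
        index += 1
        lines += [
            f"[server{index}]",
            f"hostname={row['host']}",
            f"port={row['port']}",
            f"master_binlog_dir={row['binlog_dir']}",
            "ignore_fail=1",
            f"no_master={row['no_master']}",
            f"candidate_master={row['candidate_master']}",
            "",  # blank line between sections
        ]
    return "\n".join(lines)
```

Numbering the sections only for rows that are actually written keeps the section names contiguous even when some instances are excluded.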

3.6. Switching program: mhaswitch

mhaswitch is a switching program wrapped in Python for convenient day-to-day switching.

It supports batch switching of clusters.

Switching modes: online and dead, that is, switching while the original master is alive, and failover after the original master has died.

3.6.1. mhaswitch architecture

Note: other auxiliary scripts are not marked for the time being

3.6.2. mhaswitch function: displaying switching information

It can display the instance information of the cluster, as follows

3.6.3. mhaswitch function: 2 switching modes

It supports switching in online mode and in failover mode (alive and dead in the program's prompts).

Online mode: you can choose whether the original master becomes a slave of the new master.

Failover mode: the original master is shut down before switching.

3.6.4. mhaswitch auxiliary script: what master_ip_failover_script.py does

When the incoming command is stopssh or stop, it shuts down the original master.

It waits 2 seconds and checks whether the original master is down. If it is still running, the script prints "old master still run,please check" and exits.

When the incoming command is start, it begins modifying the metadata:

Modify the mapping between domain name and IP

Set read_only=off on the new master and update its configuration file at the same time

Set read_only=on on the original master and update its configuration file at the same time
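The command handling described above can be sketched as below. This is not the author's script: `parse_args`, `handle` and the methods on the `actions` object (`shutdown_old_master`, `old_master_running`, `update_domain`, `set_read_only`) are hypothetical names, and the sketch assumes MHA invokes the script with options such as --command, --orig_master_ip and --new_master_ip. The real work (SSH, DNS, MySQL) is injected so the control flow can be read and tested on its own.

```python
import argparse
import time


def parse_args(argv):
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--command", required=True)
    parser.add_argument("--orig_master_ip")
    parser.add_argument("--new_master_ip")
    # Any other options passed by the caller are ignored in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    return args


def handle(args, actions, wait_seconds=2):
    """Return 0 on success, 1 if the old master is still running."""
    if args.command in ("stop", "stopssh"):
        actions.shutdown_old_master(args.orig_master_ip)
        time.sleep(wait_seconds)
        if actions.old_master_running(args.orig_master_ip):
            print("old master still run,please check")
            return 1
        return 0
    if args.command == "start":
        actions.update_domain(args.new_master_ip)
        actions.set_read_only(args.new_master_ip, False)
        actions.set_read_only(args.orig_master_ip, True)
        return 0
    return 0  # other commands are treated as no-ops here
```

A real script would end with something like `sys.exit(handle(parse_args(sys.argv[1:]), actions))`, so that a non-zero exit code can signal that the step failed.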

3.6.5. mhaswitch auxiliary script: what master_ip_online_change_script.py does

When the incoming command is stopssh or stop, it sets the original master to read_only.

It checks whether the original master is read_only. If it is not, the script prints "not read_only,please check" and exits.

When the incoming command is start, it begins modifying the metadata:

Modify the mapping between domain name and IP

Set read_only=off on the new master and update its configuration file at the same time

Set read_only=on on the original master and update its configuration file at the same time

3.6.6. mhaswitch auxiliary script: what send_report.py does

It sends the log of the failover-mode switch and the switch result, including:

MHA configuration file path

Old master IP and port

New master IP and port

Database name, service name

Cluster status check after switching (in tabular form):

Cluster number, IP, role, whether it can be connected, slave SQL thread, slave IO thread, check of the master host the slave points to, check of the master port the slave points to, slave delay, number of connections, /data space usage, read-only status, time

3.6.7. mhaswitch auxiliary script: what online_switch_send_report.py does

It sends the log of the online switch and the switch result, including:

MHA configuration file path

Old master IP and port

New master IP and port

Database name, service name

Cluster status check after switching (in tabular form):

3.6.8. mhaswitch auxiliary script: what change_domain_ip.sh does

The script that changes the domain name to IP mapping.

3.7. Start switching

3.7.1. mhaswitch instructions

mhaswitch

Please enter the cluster number to switch: 78 Note: the number of the cluster to be switched

# Please enter the status of the main database in the cluster ['78'] [alive/dead]: alive Note: select the switching mode

alive: do not shut down the original master

dead: shut down the original master

The online mode of MHA will be used to switch, the master library will not be closed, and will the 'old master library' serve as the 'slave library' for the 'new cluster'? [yes/no], default no:yes Note: shown after selecting alive. Choose whether the original master becomes a slave of the new master: yes makes it a slave, no does not (the original master is only set to read_only and is not shut down)

# whether to switch [yes/no]. Default no:yes Note: confirms the switch. yes starts the switch, no exits.
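The decision flow behind these prompts can be summarized in a few lines. This is a hedged sketch of the logic as described, not the mhaswitch source; `plan_switch` and the keys of the returned dictionary are hypothetical. Both yes/no answers default to "no", matching the prompts.

```python
def plan_switch(master_status, keep_old_as_slave="no", confirm="no"):
    """Translate the prompt answers into a switching plan.

    Returns None when the operator does not confirm the switch.
    """
    if confirm != "yes":
        return None
    if master_status == "alive":
        # Online mode: the old master stays up; optionally it becomes a slave.
        return {"mode": "online", "old_master_as_slave": keep_old_as_slave == "yes"}
    if master_status == "dead":
        # Failover mode: the old master is shut down.
        return {"mode": "failover", "old_master_as_slave": False}
    raise ValueError("master_status must be 'alive' or 'dead'")
```

For the session shown above (alive, yes, yes) this yields an online switch in which the old master becomes a slave of the new one.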

3.7.2. Check after switching

Review the mhaswitch output

Check the email

Review the instance status report

Check access to the new master, etc.

Check data consistency, etc.

3.8. Switching example

Only an online-mode switch is shown here; failover mode is similar.

3.8.1. Cluster topology before switching

3.8.2. Switching with mhaswitch

3.8.3. After switching

Topology

Email

Input time: about 10 seconds

Switching time: about 3-5 seconds

Check, update and email time: about 5 seconds

Total: about 18-20 seconds. The actual impact on business writes is about 3-5 seconds.

IV. Monitoring

4.1. MHA daily inspection and monitoring

mhacluster checks the correctness of ssh, repl and conf, stores the results in the database, and a Django front end displays them.

The command is:

python mhacluster.py --options=check_all_mha_ssh_repl_conf

The front-end report is:

Automatic repair is supported, that is, clusters can be repaired according to the error condition. The command is as follows:

python mhacluster.py --options=auto_repair_all_mha_fail

4.2. relay_log_purge monitoring

Clean up expired relay logs, check the running status of the program and the state after cleaning, then store the results and display them in the front end. The command is as follows:

python mhacluster.py --options=purge_relaylog_all

Front end:

4.3. Instance status check

Check the running status of each instance, including read_only, the slave's IO and SQL threads, whether the master IP and port the slave points to are correct, master-slave delay, number of connections, disk space, and so on. This makes it easy to review a cluster after switching and to locate problems quickly.

python mysql_check.py --options=check_all
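The replication part of such a check can be sketched as follows. This is an illustration rather than the contents of mysql_check.py: `check_slave` is a hypothetical name, `status` is assumed to be one row of SHOW SLAVE STATUS as a dictionary, and the 60-second delay threshold is an arbitrary example value.

```python
def check_slave(status, expected_master_host, expected_master_port, max_delay=60):
    """Return a list of problems found in one SHOW SLAVE STATUS row."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("io thread not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("sql thread not running")
    actual = (status.get("Master_Host"), status.get("Master_Port"))
    if actual != (expected_master_host, expected_master_port):
        problems.append("points at wrong master")
    delay = status.get("Seconds_Behind_Master")
    # Seconds_Behind_Master is NULL (None) when replication is not running.
    if delay is None or delay > max_delay:
        problems.append("replication delay")
    return problems
```

An empty list means the slave looks healthy; after a switch, "points at wrong master" is the first thing to look for on slaves that failed to follow the new master.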

The front-end report is as follows:

V. Common switching errors and handling

5.1. Common switching situations

Not switched: no change to meta information, etc.

Switched, but some slave instances failed to switch

5.2. The masterha_check_ssh/repl check of MHA fails and switching cannot proceed

Situation: no switch, no meta information modified

Reason:

There are several reasons why the masterha_check_repl check fails:

1. No trust relationship

2. Account permission problems

3. Incorrect settings, such as whether an instance may become master, etc.

Resolve:

Problems 1 and 2 can be solved by initializing the MHA environment:

python mhacluster.py --options=add_mha_single_cluster --cluster_ids=1,2,3

Problem 3: the no_master, candidate_master or mha_write_into_conf settings are wrong; correct them.

5.3. MHA has switched, but some slaves failed to switch

Situation: the switch has happened, but some slaves have problems.

Reason: the causes vary, for example network problems, or the change master command itself could not be executed (for example because of certain 5.7 configurations).

Handling:

Check the status of the instances to confirm which ones have problems

View the report to confirm instance status, for example:

Repair the slaves that failed to switch

Repair or rebuild them according to the logs and so on.

VI. Online performance and advantages

6.1. Online switching

So far it has been running online successfully for more than 4 months, and more than 6 online clusters have been switched (plus 30+ switches in the online test environment). No problems have been found yet.

The real switching time is a matter of seconds (mostly between 3 and 5 seconds).

6.2. The situation before using the MHA environment

1. There was no convenient tool to quickly switch the cluster master. When the master failed:

It took the DBA about 5 minutes to manually repoint the slaves, 2-3 minutes to check the cluster status, and 3-5 minutes to modify the meta information.

The actual downtime was 3-5 minutes; the total waiting time was 10-20 minutes.

(DBA operations only)

2. There was no tool to quickly display the topology of a cluster.

3. There was no tool to quickly check the running status of instances.

6.3. The situation after using the MHA environment

1. The mhaswitch tool switches masters quickly.

It reduces the risk of data loss.

Writes are affected for less time, on the order of seconds.

The whole process takes about 5 minutes before and after switching (DBA operations only). The actual downtime is about 3-30 seconds (about 3-5 seconds for an online switch, more than 10 seconds for a switch that shuts down the original master).

DBA efficiency improves by 50%-75%. If done quickly, the total time can be kept within 1 minute.

Actual online operation: about 10 seconds to input information, about 3-5 seconds of impact on writes during the switch, and about 9 seconds for updates and checks, totaling about 22-24 seconds.

mhacluster integrates deployment: all MySQL instances can be deployed automatically in a few hours (currently nearly 700 instances on 500 machines; actual deployment and inspection take about 4-6 hours).

The DBA no longer needs to manually repoint master and slaves, saving about 10-20 minutes of manual operation.

The DBA no longer needs to manually modify meta information, saving about 3-5 minutes.

The DBA no longer needs to manually adjust the domain name to IP mapping, saving about 1-3 minutes.

MHA is wrapped so that the DBA can use it without tedious commands.

The mail program sends all the information, so the switching results and logs can be reviewed quickly.

2. The cluster topology query tool qmysql shows a cluster's topology within 1 second.

3. The cluster status check tool mysql_check queries nearly 700 instances on nearly 500 machines in only about 30 seconds.

4. A Django front end displays MHA monitoring and reports, which makes monitoring and troubleshooting MHA convenient.
