2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
Research and Application of MHA
I. Problems and requirements
1.1. Summary of problems
1. There is no tool for quickly switching a cluster's master. Every switch requires a DBA to manually repoint the slaves, update the meta information, and so on.
2. The solution must go online quickly without affecting the current architecture.
3. Everything should be automated for the DBA's convenience: checks, operations, reporting, and so on.
II. MHA introduction
2.1. What is MHA
MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu while at the Japanese company DeNA, and is an excellent piece of high-availability software for failover and slave promotion in MySQL replication environments.
When a MySQL master fails, MHA can complete the failover automatically within 30 seconds, and during the failover it preserves data consistency as far as possible, which is what makes it high availability in the real sense.
2.2. How MHA works
Phase 1: Configuration Check Phase
MHA reads its default configuration file and determines the status of every MySQL node and the relationships between them: which node is the master, which are slaves, which are alive, and which are dead.
Phase 2: Dead Master Shutdown Phase
Use master_ip_failover_script and shutdown_script to shut down or deactivate the dead master (for example by switching its IP or DNS entry, as defined in advance in a custom script) to guard against split-brain. I tend to write these scripts in Python.
(Run master_ip_failover_script --command=stopssh to invalidate the original master's IP; run shutdown_script --command=stopssh to shut down the original master.)
Phase 3: Master Recovery Phase
3.1 Compare the binlog positions of all slaves to find the latest and the oldest binlog file and position.
3.2 Try to save the binlog from the original master.
3.3 Select the new master according to the no_master and candidate_master settings and the replication delay of each slave, and generate the logs the new master is missing.
3.4 Apply the missing logs to the newly selected master and activate it (the CHANGE MASTER TO statement is generated here).
Phase 4: Slaves Recovery Phase
4.1 Bring the slaves up to date: apply the missing logs, taken from the master or from the latest slave, to the other slaves.
4.2 Execute the CHANGE MASTER TO statement generated in the previous steps on all available slaves.
Phase 5: New master cleanup
Clear the slave information on the new master.
[info] * Phase 1: Configuration Check Phase..
[info] * * Phase 1: Configuration Check Phase completed.
[info] * Phase 2: Dead Master Shutdown Phase..
[info] * Phase 2: Dead Master Shutdown Phase completed.
[info] * Phase 3: Master Recovery Phase..
[info] * Phase 3.1: Getting Latest Slaves Phase..
[info] * Phase 3.2: Saving Dead Master's Binlog Phase..
[info] * Phase 3.3: Determining New Master Phase..
[info] * Phase 3.3: New Master Diff Log Generation Phase..
[info] * Phase 3.4: Master Log Apply Phase..
[info] * Phase 3: Master Recovery Phase completed.
[info] * Phase 4: Slaves Recovery Phase..
[info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
[info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
[info] * Phase 5: New master cleanup phase..
[info] Sending mail..
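The selection step in Phase 3.3 can be illustrated with a small Python sketch. It is a simplified rule written for this article, not MHA's actual selection code: the Slave class, its field names, and the tie-breaking order are assumptions; only the no_master and candidate_master flags come from MHA's configuration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slave:
    # Hypothetical record for one slave; field names are illustrative.
    host: str
    binlog_file: str          # master binlog file the slave has read up to
    binlog_pos: int           # offset inside that file
    candidate_master: int = 0
    no_master: int = 0


def position(slave: Slave) -> Tuple[int, int]:
    """Sort key: a higher file number, then a higher offset, means more data."""
    file_number = int(slave.binlog_file.rsplit(".", 1)[1])
    return (file_number, slave.binlog_pos)


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Simplified rule: drop no_master hosts, prefer candidate_master hosts,
    then take the host that has received the most binlog."""
    eligible = [s for s in slaves if not s.no_master]
    if not eligible:
        return None
    preferred = [s for s in eligible if s.candidate_master] or eligible
    return max(preferred, key=position)
```

A slave marked no_master=1 can still be the most up-to-date one; in that case it serves only as a source of missing logs (Phase 4.1) and is never promoted.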
2.3. Advantages of MHA
1. It does not affect server performance, is easy to install, and does not change the existing deployment.
2. Automatic fault detection and failover, usually within 30 seconds.
3. Data consistency is guaranteed as far as possible.
2.4. MHA modes
By node role: manager and node, that is, the MHA management machine and the cluster nodes.
By switching type: online and failover, that is, a planned switch while the master is alive and a switch after the master has failed.
2.5. MHA requirements
1. At least one master and one slave
2. SSH mutual trust between the hosts
3. A MySQL account
4. Linux
5. MySQL 5.0 or later
6. mysqlbinlog 3.3 or later
7. log-bin must be set on every MySQL server that can become master
8. Replication filtering rules must be identical on all MySQL servers
9. A replication account must exist on every server that can become master
10. Do not use LOAD DATA INFILE with statement-based replication
11. relay_log_purge must be turned off, so relay logs have to be cleaned up regularly by a script or by hand (cleaning method: set global relay_log_purge=1; flush logs;)
2.6. MHA version selection
Starting with version 0.56 of MHA, GTID-based failover is also supported. MHA automatically detects whether mysqld is running on GTID. If GTID is enabled, MHA performs failover with GTID. If not, MHA uses relay log-based failover.
2.7. Important parameters
ignore_fail=1: continue the failover even if this server has failed
candidate_master=1: 1 = preferred as the new master, 0 = no preference
no_master=1: 1 = can never become master, 0 = can become master
III. MHA implementation
3.1. Architecture diagram
3.2. Implementation and automation
3.2.1. mhacluster: the program for MHA operation, deployment, and checks
mhacluster integrates the day-to-day MHA functions: operation, deployment, checking, repair, and so on.
These include:
Adding SSH mutual trust
Installing the MHA packages
Granting the required accounts
Checking that ssh, repl, and conf are correct
Automatically repairing problem clusters
Automatically updating the conf file of each cluster
Automatically updating aliases for ease of use
Automatically cleaning up relay logs
The functions are described below.
3.2.2. mhacluster function: mutual trust
Set up SSH trust from the central control / MHA management machine to all machines.
Add mutual trust within the cluster, using the make_ssh_authentication.sh script to add it automatically.
3.2.3. mhacluster function: authorization
Grant the database users that need the SUPER privilege on every instance of the source cluster, by IP.
3.2.4. mhacluster function: relay_log_purge setting
relay_log_purge is set to off by default, and a scheduled script cleans up the relay logs.
MHA ships a purge script that can be deployed to crontab (it becomes available after the node package is installed). That script is not used for the time being; the cleanup simulates what it does.
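A scheduled cleanup of this kind might look like the following Python sketch. The first two statements are the cleaning method given in the requirements list; switching relay_log_purge back to 0 afterwards is an assumption made here so that the default "off" setting is restored. The function name is hypothetical, and the cursor is any DB-API cursor connected to the slave.

```python
# Statements run against one slave, in order.
PURGE_STATEMENTS = (
    "SET GLOBAL relay_log_purge = 1",  # allow the server to delete applied relay logs
    "FLUSH LOGS",                      # rotating the logs triggers the deletion
    "SET GLOBAL relay_log_purge = 0",  # assumption: restore the 'off' default for MHA
)


def purge_relay_logs(cursor):
    """Run the cleanup on one slave and return the statements that were executed."""
    executed = []
    for statement in PURGE_STATEMENTS:
        cursor.execute(statement)
        executed.append(statement)
    return executed
```

Running this from crontab at different times on different slaves avoids the case where every slave purges its relay logs at the same moment and none of them can supply missing events during a failover.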
3.2.5. mhacluster function: collecting and updating mysqlbinlog information
Because the binlog directory is not fixed at present, scripts collect it into the meta information for the time being.
3.2.6. mhacluster function: installing packages
Install two packages on all instances:
rpm -ivh /data/soft/perl-DBD-MySQL-4.013-3.el6.x86_64.rpm
rpm -ivh /data/soft/mha4mysql-node-0.56-0.el6.noarch.rpm
3.2.7. mhacluster function: checking ssh/repl/conf
Run the masterha_check_ssh and masterha_check_repl checks that MHA requires.
Check whether the MHA configuration file is consistent with the meta information.
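The consistency check between a cluster conf file and the meta information can be sketched in Python as follows. This is an illustration, not the mhacluster code: the function names and the "ip" and "port" keys of a meta row are assumptions, while no_master, candidate_master, and mha_write_into_conf are the meta fields described in this article.

```python
import configparser


def load_servers(conf_text):
    """Parse an MHA cluster conf and index its [serverN] sections by (host, port)."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    servers = {}
    for section in parser.sections():
        if section == "server default":
            continue  # global settings, not an instance
        item = parser[section]
        servers[(item["hostname"], item.getint("port"))] = {
            "no_master": item.getint("no_master", fallback=0),
            "candidate_master": item.getint("candidate_master", fallback=0),
        }
    return servers


def find_mismatches(conf_text, meta_rows):
    """Return the (host, port) pairs whose conf entry differs from the meta rows."""
    conf = load_servers(conf_text)
    expected = {
        (row["ip"], row["port"]): {
            "no_master": row["no_master"],
            "candidate_master": row["candidate_master"],
        }
        for row in meta_rows
        if row["mha_write_into_conf"]
    }
    keys = sorted(set(conf) | set(expected))
    return [key for key in keys if conf.get(key) != expected.get(key)]
```

An empty result means the conf file and the meta information agree; any returned pair is either missing on one side or has a different no_master or candidate_master value.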
3.2.8. mhacluster function: automatically repairing problem clusters
Automatically repair clusters that fail the ssh/repl/conf checks.
3.2.9. mhacluster function: updating aliases and MHA configuration files
Update the common MHA aliases:
alias masterha_check_ssh.1='masterha_check_ssh --conf=/data/masterha/conf/1#testdb'
alias masterha_check_repl.1='masterha_check_repl --conf=/data/masterha/conf/1#testdb'
Note: the 1 here is the cluster number.
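Generating these aliases per cluster is a small string-formatting job. The Python sketch below shows one way to do it; the function name is hypothetical, and the only external names used are the masterha_check_ssh and masterha_check_repl tools and their --conf option.

```python
def mha_aliases(cluster_id, conf_path):
    """Build the two check aliases for one cluster, suffixed with its cluster number."""
    return [
        f"alias masterha_check_{check}.{cluster_id}="
        f"'masterha_check_{check} --conf={conf_path}'"
        for check in ("ssh", "repl")
    ]
```

Writing the returned lines to a shell profile lets a DBA run the check for cluster 1 simply as masterha_check_repl.1.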
Update the MHA configuration file:
Update MHA's conf file based on the meta information.
3.2.10. mhacluster function: granting the accounts MHA requires
Automatically grant the accounts that MHA requires.
3.4. Meta-information fields
cluster_id: cluster number
cluster_name: cluster name
role: role of the instance (Master, Slave, or Backup)
binlog_dir: binlog directory
no_master: whether the instance is barred from becoming master (1 = cannot be master, 0 = can be master)
candidate_master: whether the instance is preferred as the new master (1 = preferred, 0 = not preferred)
mha_write_into_conf: whether the instance is written to the MHA configuration file (1 = written, 0 = not written)
3.5. MHA configuration
3.5.1. MHA default configuration file
[root@dbmon conf]# vi /etc/masterha_default.cnf
[server default]
manager_workdir=/data/masterha/work/
user=dba
password=
ssh_user=root
repl_user=repadm
repl_password=
ping_interval=30
shutdown_script=""
# main script for failover-mode switching
master_ip_failover_script="/data/mha/master_ip_failover_script.py"
# main script for online-mode switching
master_ip_online_change_script="/data/mha/master_ip_online_change_script.py"
# script that sends the report mail after a switch
report_script="/data/mha/send_report.py"
3.5.2. Cluster conf file for MHA
Generated automatically from the meta information:
[server1]
hostname=10.0.0.1
port=3306
master_binlog_dir=/data/my3306/
ignore_fail=1
no_master=1
candidate_master=0
[server2]
hostname=10.0.0.2
port=3306
master_binlog_dir=/data/my3306/
ignore_fail=1
no_master=0
candidate_master=0
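Generating a file like this from the meta information might look like the Python sketch below. It is an illustration under stated assumptions: the function name and the "ip" and "port" keys of a meta row are hypothetical, ignore_fail=1 is hard-coded because every entry in the sample above uses it, and rows with mha_write_into_conf=0 are skipped.

```python
def render_cluster_conf(meta_rows):
    """Render [serverN] sections for the rows flagged mha_write_into_conf=1."""
    lines = []
    index = 0
    for row in meta_rows:
        if not row["mha_write_into_conf"]:
            continue  # instance is deliberately kept out of the MHA conf
        index += 1
        lines += [
            f"[server{index}]",
            f"hostname={row['ip']}",
            f"port={row['port']}",
            f"master_binlog_dir={row['binlog_dir']}",
            "ignore_fail=1",
            f"no_master={row['no_master']}",
            f"candidate_master={row['candidate_master']}",
        ]
    return "\n".join(lines) + "\n"
```

Regenerating the file from the meta information on every change keeps the conf and the metadata from drifting apart, which is what the ssh/repl/conf check verifies.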
3.6. Switching program: mhaswitch
A switching program wrapped in Python, convenient for day-to-day switching.
Supports switching clusters in batches.
Switching modes: online and dead, that is, switching while the original master is alive, and failing over after the original master has died.
3.6.1. mhaswitch architecture
Note: other auxiliary scripts are not shown for the time being.
3.6.2. mhaswitch function: displaying switching information
mhaswitch can display the instance information of the cluster, as follows.
3.6.3. mhaswitch function: two switching modes
Supports online mode and failover mode (alive and dead in the program's prompts).
Online mode: you can choose whether the original master becomes a slave of the new master.
Failover mode: the original master is shut down before switching.
3.6.4. mhaswitch auxiliary script: master_ip_failover_script.py
When the command passed in is stopssh or stop, shut down the original master.
Wait 2 seconds and check whether the original master is down. If it is still running, print "old master still run,please check" and exit.
When the command passed in is start, begin updating the metadata:
Update the domain name to IP mapping.
Set read_only=off on the new master and update its configuration file.
Set read_only=on on the original master and update its configuration file.
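The command handling described above can be sketched in Python. This is not the author's script: the run function and the stop_master, is_running, and promote callables are hypothetical stand-ins for the real shutdown, liveness check, and metadata update. The --command, --orig_master_host, and --new_master_host options are among the arguments MHA passes to master_ip_failover_script; other arguments are ignored here.

```python
import argparse
import time


def parse_args(argv):
    """Read the options this sketch uses; ignore any other arguments MHA passes."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--command", required=True)
    parser.add_argument("--orig_master_host")
    parser.add_argument("--new_master_host")
    args, _unused = parser.parse_known_args(argv)
    return args


def run(argv, stop_master, is_running, promote, sleep=time.sleep):
    """Dispatch on --command and return the process exit code."""
    args = parse_args(argv)
    if args.command in ("stop", "stopssh"):
        stop_master(args.orig_master_host)
        sleep(2)  # give the old master time to go down before checking
        if is_running(args.orig_master_host):
            print("old master still run,please check")
            return 1  # non-zero exit tells the caller the shutdown failed
        return 0
    if args.command == "start":
        # domain/IP mapping and read_only changes happen inside promote()
        promote(args.orig_master_host, args.new_master_host)
        return 0
    return 0  # other commands such as status need no action in this sketch
```

A real script would end with sys.exit(run(sys.argv[1:], ...)), passing in functions that actually stop MySQL, probe it, and update the metadata.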
3.6.5. mhaswitch auxiliary script: master_ip_online_change_script.py
When the command passed in is stopssh or stop, set the original master to read_only.
Check whether the original master is read_only. If it is not, print "not read_only,please check" and exit.
When the command passed in is start, begin updating the metadata:
Update the domain name to IP mapping.
Set read_only=off on the new master and update its configuration file.
Set read_only=on on the original master and update its configuration file.
3.6.6. mhaswitch auxiliary script: send_report.py
Sends the log of the failover-mode switch and the switch result, including:
Path of the MHA configuration file
Old master IP and port
New master IP and port
Database name and service name
Cluster status check after the switch (as a table):
Cluster number, IP, role, connectivity, slave SQL thread, slave IO thread, check of the master host the slave points to, check of the master port the slave points to, slave delay, number of connections, /data disk usage, read-only state, time
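Rendering the status check as a plain-text table for the mail body might look like the Python sketch below. The function name and the layout are assumptions; the article does not show how send_report.py formats its table.

```python
def render_table(headers, rows):
    """Format headers and rows as a fixed-width text table with one separator line."""
    table = [[str(cell) for cell in headers]]
    table += [[str(cell) for cell in row] for row in rows]
    # each column is as wide as its longest cell
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]

    def fmt(row):
        return " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    separator = "-+-".join("-" * w for w in widths)
    return "\n".join([fmt(table[0]), separator] + [fmt(row) for row in table[1:]])
```

Calling render_table(["ip", "role"], [["10.0.0.1", "Master"], ["10.0.0.2", "Slave"]]) yields a four-line string that stays aligned in a monospaced mail client.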
3.6.7. mhaswitch auxiliary script: online_switch_send_report.py
Sends the log of the online switch and the switch result, including:
Path of the MHA configuration file
Old master IP and port
New master IP and port
Database name and service name
Cluster status check after the switch (as a table):
Cluster number, IP, role, connectivity, slave SQL thread, slave IO thread, check of the master host the slave points to, check of the master port the slave points to, slave delay, number of connections, /data disk usage, read-only state, time
3.6.8. mhaswitch auxiliary script: change_domain_ip.sh
The script that changes the domain name to IP mapping.
3.7. Performing a switch
3.7.1. Using mhaswitch
mhaswitch
Please enter the cluster number to switch: 78    Note: the number of the cluster to switch
# Please enter the status of the master in cluster ['78'] [alive/dead]: alive    Note: selects the switching mode
alive: do not shut down the original master
dead: shut down the original master
MHA's online mode will be used and the master will not be shut down. Should the 'old master' serve as a 'slave' of the 'new cluster'? [yes/no], default no: yes    Note: asked only after choosing alive. yes makes the original master a slave of the new master; no leaves it out of replication (it is set to read_only but not shut down)
# Switch now? [yes/no], default no: yes    Note: confirms the switch. yes starts it; no exits.
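How the answers to these prompts could map onto an MHA command line is sketched below in Python. The article does not show mhaswitch's internals, so treating it as a wrapper around masterha_master_switch is an assumption, as are the function name and the example conf path; the --master_state, --orig_master_is_new_slave, and --dead_master_host options are standard masterha_master_switch options.

```python
def build_switch_command(conf_path, master_state,
                         old_master_as_slave=False, dead_master_host=None):
    """Translate the prompt answers into a masterha_master_switch argument list."""
    if master_state not in ("alive", "dead"):
        raise ValueError("master_state must be 'alive' or 'dead'")
    cmd = [
        "masterha_master_switch",
        f"--conf={conf_path}",
        f"--master_state={master_state}",
    ]
    if master_state == "alive":
        if old_master_as_slave:
            # online switch: keep the old master in the cluster as a slave
            cmd.append("--orig_master_is_new_slave")
    else:
        if not dead_master_host:
            raise ValueError("dead_master_host is required when master_state is 'dead'")
        cmd.append(f"--dead_master_host={dead_master_host}")
    return cmd
```

Returning a list rather than a string means the command can be passed directly to subprocess.run without shell quoting, which matters for conf paths that contain characters such as '#'.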
3.7.2. Checks after switching
Review the mhaswitch output
Check the report email
Review the instance status report
Check access to the new master
Check data consistency
3.8. Switching example
Only an online-mode switch is shown here; failover mode is similar.
3.8.1. Cluster topology before switching
3.8.2. Switching with mhaswitch
3.8.3. After switching
Topology
Email
Input time: about 10 seconds
Switching time: about 3-5 seconds
Check, update, and email time: about 5 seconds
Total: about 18-20 seconds, of which business writes are actually affected for about 3-5 seconds
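The total follows from adding the lower and upper bounds of each step. A short Python check of that arithmetic, using the figures quoted above (the helper name is hypothetical):

```python
def total_range(*steps):
    """Add (low, high) second ranges step by step."""
    low = sum(lo for lo, _hi in steps)
    high = sum(hi for _lo, hi in steps)
    return low, high


input_time = (10, 10)   # entering the answers at the prompts
switch_time = (3, 5)    # the only window in which writes are affected
check_time = (5, 5)     # checks, metadata update, and email

total = total_range(input_time, switch_time, check_time)
```

Here total evaluates to (18, 20), matching the 18-20 seconds reported for the example.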
IV. Monitoring
4.1. MHA daily inspection and monitoring
mhacluster checks that ssh, repl, and conf are correct, stores the results in the database, and displays them in a Django front end.
The command is:
python mhacluster.py --options=check_all_mha_ssh_repl_conf
The front-end report is:
Automatic repair is supported, that is, a cluster can be repaired according to the error found. The command is:
python mhacluster.py --options=auto_repair_all_mha_fail
4.2. relay_log_purge monitoring
Cleans up expired relay logs, checks the running status of the program and the state after cleaning, stores the results, and displays them in the front end. The command is:
python mhacluster.py --options=purge_relaylog_all
Front end:
4.3. Instance status check
Checks the running status of each instance: read_only, the slave's IO and SQL threads, whether the master IP and port the slave points to are correct, master-slave delay, number of connections, disk space, and so on. This makes it easy to review a cluster after a switch and locate problems quickly.
python mysql_check.py --options=check_all
The front-end report is as follows:
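The replication part of such a check can be sketched in Python against one row of SHOW SLAVE STATUS. This is an illustration, not the mysql_check.py code: the function name, the messages, and the 60-second delay threshold are assumptions, while Slave_IO_Running, Slave_SQL_Running, Master_Host, Master_Port, and Seconds_Behind_Master are real columns of that statement.

```python
def check_slave(status, master_host, master_port, max_delay=60):
    """Return a list of problems found in one SHOW SLAVE STATUS row (a dict)."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("io thread not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("sql thread not running")
    if status.get("Master_Host") != master_host:
        problems.append("points to wrong master host")
    if int(status.get("Master_Port", 0)) != master_port:
        problems.append("points to wrong master port")
    delay = status.get("Seconds_Behind_Master")
    if delay is None:
        # MySQL reports NULL when replication is not running
        problems.append("delay unknown")
    elif int(delay) > max_delay:
        problems.append("delay above threshold")
    return problems
```

After a switch, running this against every slave with the new master's host and port quickly shows which instances were not repointed.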
V. Common switching errors and handling
5.1. Common situations
The switch did not happen, and the meta information was not changed.
The switch happened, but some slave instances failed to switch.
5.2. MHA's masterha_check_ssh/repl check fails and the switch cannot proceed
Situation: no switch, no change to the meta information.
Reason:
The masterha_check_repl check can fail for several reasons:
1. No SSH trust relationship
2. Insufficient account privileges
3. Wrong meta-information settings, such as "cannot be master"
Resolution:
Problems 1 and 2 can be solved by initializing the MHA environment:
python mhacluster.py --options=add_mha_single_cluster --cluster_ids=1,2,3
Problem 3: the "cannot be master", "preferred as master", or "write to configuration file" setting is wrong; correct it.
5.3. MHA switched, but some slaves failed to switch
Situation: the switch has happened, and some slaves have problems.
Reason: the causes vary, for example network problems, or the CHANGE MASTER command itself cannot be executed (for example because of certain MySQL 5.7 settings).
Handling:
Check the instance status to confirm which instances have problems.
View the report, which shows the instance status, for example:
Repair the slaves that failed to switch:
Repair or rebuild them according to the logs.
VI. Online performance and advantages
6.1. Online switching
The system has been running successfully in production for more than 4 months. More than 6 production clusters have been switched (plus 30+ switches in the test environment), and no problems have been found so far.
The real switching time is a matter of seconds (mostly between 3 and 5 seconds).
6.2. Before MHA
1. There was no convenient tool for quickly switching a cluster's master. When the master failed:
A DBA needed about 5 minutes to manually repoint the slaves, 2-3 minutes to check the cluster status, and 3-5 minutes to update the meta information.
The actual downtime was 3-5 minutes; the total elapsed time was 10-20 minutes.
(DBA operations only)
2. There was no tool for quickly displaying a cluster's topology.
3. There was no tool for quickly checking the state of the instances.
6.3. After MHA
1. The mhaswitch tool switches masters quickly.
It reduces the risk of data loss.
Writes are interrupted only briefly, at the level of seconds.
A switch takes about 5 minutes of DBA work end to end. The actual interruption is about 3-30 seconds (about 3-5 seconds for an online switch, more than 10 seconds when failing over from a dead master).
DBA efficiency improves by 50% to 75%. When things go quickly, the total time can be kept within 1 minute.
Actual online operation: about 10 seconds to enter the information, about 3-5 seconds during which writes are affected, and about 9 seconds for updates and checks, about 22-24 seconds in total.
mhacluster integrates deployment, so all MySQL instances can be deployed automatically in a few hours (currently nearly 700 instances on 500 machines; actual deployment and checking takes about 4-6 hours).
DBAs no longer repoint masters and slaves by hand, saving about 10-20 minutes of manual work.
DBAs no longer update meta information by hand, saving about 3-5 minutes.
DBAs no longer adjust the domain name to IP mapping by hand, saving about 1-3 minutes.
MHA is wrapped so that DBAs can use it without tedious commands.
The mail program sends all the information, so the switch results and logs can be reviewed quickly.
2. qmysql, the cluster topology query tool, shows a cluster's topology in about 1 second.
3. mysql_check, the cluster status check tool, checks nearly 700 instances on nearly 500 machines in only about 30 seconds.
4. A Django front end displays MHA monitoring and reports, which makes monitoring MHA and troubleshooting convenient.