This article introduces MySQL MHA. It combines the theoretical background, for which many books and documents are available, with practical experience accumulated over years of running MHA, in the hope that it is useful to you.
Brief introduction:
MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by youshimaton (Yoshinori Matsunobu) at the Japanese company DeNA (he now works for Facebook). It is an excellent piece of high-availability software for failover and slave promotion in MySQL environments. During a failover, MHA can complete the database switchover automatically within 30 seconds, and while doing so it preserves data consistency to the greatest extent possible, achieving high availability in the real sense.
The software consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a separate machine to manage multiple master-slave clusters, or it can be deployed on one of the slaves. MHA Node runs on every MySQL server. MHA Manager regularly probes the master in the cluster; when the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to it. The entire failover process is completely transparent to the application.
During automatic failover, MHA tries to save the binary logs from the downed master so that data loss is minimized, but this is not always possible. For example, if the master's hardware has failed or it cannot be reached over SSH, MHA cannot save the binary logs and can only fail over, losing the most recent data. Combined with the semi-synchronous replication introduced in MySQL 5.5, the risk of data loss can be greatly reduced: if even one slave has received the latest binary log events, MHA can apply them to all the other slaves, keeping the data consistent across all nodes.
At present MHA mainly supports a one-master, multi-slave architecture. To build MHA, a replication cluster needs at least three database servers: one master, one standby (candidate) master, and one slave. For machine-cost reasons Taobao has adapted it, and Taobao's TMHA already supports one master and one slave. If you want to build an environment quickly, see: MHA Quick build.
In fact a one-master, one-slave setup can also work, but if the master host itself goes down it cannot be switched over and the binlog cannot be completed. If only the mysqld process on the master crashes (the host is still reachable), the switchover can still succeed and the binlog can be completed.
Official introduction: https://code.google.com/p/mysql-master-ha/
Figure 01 shows how to manage multiple sets of master-slave replications through MHA Manager. You can summarize how MHA works as follows:
(figure 01)
(1) Save the binary log events (binlog events) from the crashed master
(2) Identify the slave with the latest updates
(3) Apply the differential relay logs (relay log) to the other slaves
(4) Apply the binary log events (binlog events) saved from the master
(5) Promote one slave to be the new master
(6) Make the other slaves connect to the new master for replication
MHA software consists of two parts, Manager toolkit and Node toolkit, which are described in detail as follows.
The Manager toolkit mainly includes the following tools:
masterha_check_ssh        check the MHA SSH configuration
masterha_check_repl       check the MySQL replication status
masterha_manager          start MHA
masterha_check_status     check the current MHA running status
masterha_master_monitor   detect whether the master is down
masterha_master_switch    control failover (automatic or manual)
masterha_conf_host        add or remove configured server information
The Node toolkit (these tools are usually invoked by MHA Manager scripts and do not need to be run manually) mainly includes the following tools:
save_binary_logs          save and copy the master's binary logs
apply_diff_relay_logs     identify differential relay log events and apply them to other slaves
filter_mysqlbinlog        remove unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs          clear relay logs (without blocking the SQL thread)
Note:
To minimize data loss caused by hardware failure or outage of the master, it is recommended to configure MySQL 5.5 semi-synchronous replication while configuring MHA. See the semi-synchronous replication documentation for the principle behind it. (Not required.)
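For reference, a minimal sketch of enabling MySQL 5.5 semi-synchronous replication is shown below; it follows the mysql -e style used later in this article and assumes the root/123456 credentials used throughout (add the corresponding settings to my.cnf so they survive a restart):

# on the master (192.168.0.50)
mysql -uroot -p123456 -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'"
mysql -uroot -p123456 -e "SET GLOBAL rpl_semi_sync_master_enabled = 1"
mysql -uroot -p123456 -e "SET GLOBAL rpl_semi_sync_master_timeout = 1000"   # fall back to async after 1 second
# on each slave (including the candidate master)
mysql -uroot -p123456 -e "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'"
mysql -uroot -p123456 -e "SET GLOBAL rpl_semi_sync_slave_enabled = 1"
mysql -uroot -p123456 -e "STOP SLAVE IO_THREAD; START SLAVE IO_THREAD"      # restart the I/O thread so the setting takes effect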
1. Deploy MHA
Next we deploy MHA. The environment is as follows (all operating systems are CentOS 6.2 64-bit, though this is not required; server03 and server04 are slaves of server02). Building the replication environment is only demonstrated briefly; replication details are not covered here — see the earlier article on MySQL Replication and the issues to watch out for:
Role              IP address      Hostname   server_id   Type
Monitor host      192.168.0.20    server01   -           monitors the replication group
Master            192.168.0.50    server02   1           write
Candidate master  192.168.0.60    server03   2           read
Slave             192.168.0.70    server04   3           read
The master provides the write service, the candidate master (an actual slave, hostname server03) provides the read service, and the slave also provides read service. Once the master goes down, the candidate master is promoted to the new master and the slave is repointed to it.
(1) Install the Perl module (DBD::mysql) required by MHA Node on all nodes. The installation script is as follows:
[root@192.168.0.50 ~]# cat install.sh
#!/bin/bash
wget http://xrl.us/cpanm --no-check-certificate
mv cpanm /usr/bin
chmod 755 /usr/bin/cpanm
cat > /root/list
...
The script then feeds the module list in /root/list (DBD::mysql) to cpanm. After that, the master is backed up with mysqldump to all.sql; the options used are explained below.
Here --master-data=2 records the master's binlog file name and position at backup time, --single-transaction takes a consistent snapshot, -R backs up stored procedures and functions, --triggers backs up triggers, and -A backs up all databases. For more information see mysqldump --help.
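The mysqldump command itself did not survive above; based on the options just described, it would have looked roughly like this (run on the master, with the root/123456 credentials used elsewhere in this article):

mysqldump -uroot -p123456 --master-data=2 --single-transaction -R --triggers -A > all.sql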
(2) create a replication user on server02:
mysql> grant replication slave on *.* to 'repl'@'192.168.0.%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql>
(3) Check the binlog name and position recorded during the backup of the master (MASTER_LOG_FILE and MASTER_LOG_POS):
[root@192.168.0.50 ~]# head -n 30 all.sql | grep 'CHANGE MASTER TO'
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000010', MASTER_LOG_POS=112;
[root@192.168.0.50 ~]#
(4) copy the backup to server03 and server04, namely 192.168.0.60 and 192.168.0.70
scp all.sql server03:/data/
scp all.sql server04:/data/
(5) Import backup to server03 and execute replication related commands
mysql < /data/all.sql
mysql> CHANGE MASTER TO MASTER_HOST='192.168.0.50',MASTER_USER='repl', MASTER_PASSWORD='123456',MASTER_LOG_FILE='mysql-bin.000010',MASTER_LOG_POS=112;
Query OK, 0 rows affected (0.02 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
mysql>
Check the replication status (you can see that the replication was successful):
[root@192.168.0.60 ~]# mysql -e 'show slave status\G' | egrep 'Slave_IO|Slave_SQL'
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
[root@192.168.0.60 ~]#
(6) set up a replication environment on server04 (192.168.0.70), and operate as above.
mysql < /data/all.sql
mysql> CHANGE MASTER TO MASTER_HOST='192.168.0.50',MASTER_USER='repl', MASTER_PASSWORD='123456',MASTER_LOG_FILE='mysql-bin.000010',MASTER_LOG_POS=112;
Query OK, 0 rows affected (0.07 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql>
View replication status:
[root@192.168.0.70 ~]# mysql -e 'show slave status\G' | egrep 'Slave_IO|Slave_SQL'
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
[root@192.168.0.70 ~]#
(7) Set read_only on the two slave servers (the slaves serve reads; the setting is made at runtime rather than in the configuration file, because either slave may be promoted to master at any time):
[root@192.168.0.60 ~] # mysql-e 'set global read_only=1' [root@192.168.0.60 ~] # [root@192.168.0.70 ~] # mysql-e' set global read_only=1' [root@192.168.0.70 ~] #
(8) create a monitoring user (executed on master, that is, 192.168.0.50):
mysql> grant all privileges on *.* to 'root'@'192.168.0.%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
mysql>
The entire cluster environment has been built here, and all that is left is to configure the MHA software.
5. Configure MHA
(1) create a working directory for MHA, and create relevant configuration files (there are sample configuration files in the unzipped directory of the package).
[root@192.168.0.20 ~]# mkdir -p /etc/masterha
[root@192.168.0.20 ~]# cp mha4mysql-manager-0.53/samples/conf/app1.cnf /etc/masterha/
[root@192.168.0.20 ~]#
Modify the app1.cnf configuration file. The contents after modification are as follows (note that any inline comments must be removed from the actual file; they are only there for explanation):
[root@192.168.0.20 ~]# cat /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1.log
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/data/mysql
master_ip_failover_script= /usr/local/bin/master_ip_failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change
password=123456
user=root
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=repl
report_script=/usr/local/send_report
secondary_check_script= /usr/local/bin/masterha_secondary_check -s server03 -s server02
shutdown_script=""
ssh_user=root

[server1]
hostname=192.168.0.50
port=3306

[server2]
hostname=192.168.0.60
port=3306
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.0.70
port=3306
[root@192.168.0.20 ~]#
Here candidate_master=1 marks server03 as the preferred new master, and check_repl_delay=0 allows it to be promoted even if its relay logs are lagging behind.
(2) set the clearing method of relay log (on each slave node):
[root@192.168.0.60 ~]# mysql -e 'set global relay_log_purge=0'
[root@192.168.0.70 ~]# mysql -e 'set global relay_log_purge=0'
Note:
During a switchover, MHA relies on the relay logs when it recovers the other slaves, so automatic relay log purging must be turned OFF and the relay logs cleared manually instead. By default, a slave's relay logs are deleted automatically once the SQL thread has executed them; in an MHA environment, however, those relay logs may be needed to recover other slaves, so automatic deletion must be disabled. Periodic manual purging then has to take replication latency into account: on an ext3 file system, deleting a large file takes a noticeable amount of time and can cause serious replication delay. To avoid this, a hard link to the relay log is created first, because on Linux deleting a large file through a hard link is fast. (In MySQL environments, hard links are often used to drop large tables for the same reason.)
The MHA Node package includes the purge_relay_logs tool. It creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds for the SQL thread to switch to a new relay log, and then executes SET GLOBAL relay_log_purge=0.
The parameters of purge_relay_logs are as follows:
--user                     MySQL user name
--password                 MySQL password
--port                     port number
--workdir                  the directory where hard links to the relay logs are created; defaults to /var/tmp. Creating a hard link fails across different partitions, so specify a location on the same partition as the relay logs. The hard-linked relay log files are deleted once the script completes successfully.
--disable_relay_log_purge  by default, if relay_log_purge=1 the script does nothing and exits. With this parameter, when relay_log_purge=1 it is set to 0, the relay logs are cleaned up, and the variable is finally set back to OFF.
(3) set up periodic cleaning relay scripts (two slave servers)
[root@192.168.0.60 ~]# cat purge_relay_log.sh
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir='/data/masterha/log'
work_dir='/data'
purge='/usr/local/bin/purge_relay_logs'

if [ ! -d $log_dir ]
then
    mkdir $log_dir -p
fi

$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir > $log_dir/purge_relay_logs.log 2>&1
[root@192.168.0.60 ~]#
Add to crontab periodic execution
[root@192.168.0.60 ~]# crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh
[root@192.168.0.60 ~]#
The purge_relay_logs script deletes the relay log without blocking the SQL thread. Let's do it manually and see what happens.
[root@192.168.0.60 ~]# purge_relay_logs --user=root --password=123456 --port=3306 --disable_relay_log_purge --workdir=/data/
2014-04-20 15:47:24: purge_relay_logs script started.
 Found relay_log.info: /data/mysql/relay-log.info
 Removing hard linked relay log files server03-relay-bin* under /data/.. done.
 Current relay log file: /data/mysql/server03-relay-bin.000002
 Archiving unused relay log files (up to /data/mysql/server03-relay-bin.000001) ...
 Creating hard link for /data/mysql/server03-relay-bin.000001 under /data//server03-relay-bin.000001 .. ok.
 Creating hard links for unused relay log files completed.
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
 Removing hard linked relay log files server03-relay-bin* under /data/.. done.
2014-04-20 15:47:27: All relay log purging operations succeeded.
[root@192.168.0.60 ~]#
6. Check SSH configuration
Check the SSH connection status of MHA Manger to all MHA Node:
[root@192.168.0.20 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sun Apr 20 17:17:39 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Apr 20 17:17:39 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sun Apr 20 17:17:39 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sun Apr 20 17:17:39 2014 - [info] Starting SSH connection tests..
Sun Apr 20 17:17:39 2014 - [debug]  Connecting via SSH from root@192.168.0.50(192.168.0.50:22) to root@192.168.0.60(192.168.0.60:22)..
Sun Apr 20 17:17:39 2014 - [debug]   ok.
Sun Apr 20 17:17:39 2014 - [debug]  Connecting via SSH from root@192.168.0.50(192.168.0.50:22) to root@192.168.0.70(192.168.0.70:22)..
Sun Apr 20 17:17:39 2014 - [debug]   ok.
Sun Apr 20 17:17:40 2014 - [debug]  Connecting via SSH from root@192.168.0.60(192.168.0.60:22) to root@192.168.0.50(192.168.0.50:22)..
Sun Apr 20 17:17:40 2014 - [debug]   ok.
Sun Apr 20 17:17:40 2014 - [debug]  Connecting via SSH from root@192.168.0.60(192.168.0.60:22) to root@192.168.0.70(192.168.0.70:22)..
Sun Apr 20 17:17:40 2014 - [debug]   ok.
Sun Apr 20 17:17:40 2014 - [debug]  Connecting via SSH from root@192.168.0.70(192.168.0.70:22) to root@192.168.0.50(192.168.0.50:22)..
Sun Apr 20 17:17:40 2014 - [debug]   ok.
Sun Apr 20 17:17:40 2014 - [debug]  Connecting via SSH from root@192.168.0.70(192.168.0.70:22) to root@192.168.0.60(192.168.0.60:22)..
Sun Apr 20 17:17:41 2014 - [debug]   ok.
Sun Apr 20 17:17:41 2014 - [info] All SSH connection tests passed successfully.
You can see that the ssh authentication of each node is ok.
7. Check the state of the entire replication environment.
View the status of the entire cluster through the masterha_check_repl script
[root@192.168.0.20 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Apr 20 18:36:55 2014 - [info] Checking replication health on 192.168.0.60..
Sun Apr 20 18:36:55 2014 - [info]  ok.
Sun Apr 20 18:36:55 2014 - [info] Checking replication health on 192.168.0.70..
Sun Apr 20 18:36:55 2014 - [info]  ok.
Sun Apr 20 18:36:55 2014 - [info] Checking master_ip_failover_script status:
Sun Apr 20 18:36:55 2014 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.0.50 --orig_master_ip=192.168.0.50 --orig_master_port=3306
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 88.
Execution of /usr/local/bin/master_ip_failover aborted due to compilation errors.
Sun Apr 20 18:36:55 2014 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln214] Failed to get master_ip_failover_script status with return code 255:0.
Sun Apr 20 18:36:55 2014 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations. at /usr/local/bin/masterha_check_repl line 48
Sun Apr 20 18:36:55 2014 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
Sun Apr 20 18:36:55 2014 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
The final verdict says my replication is NOT OK, even though the messages above clearly show the individual replication health checks passing, and I had verified things inside the database as well. I was stuck here for quite a while, until I stumbled on Huoding's blog notes and understood the reason: failover can be handled in two ways, with a virtual IP address or with a global configuration file, and MHA does not mandate either. The virtual IP approach involves additional software, such as keepalived, and requires modifying the master_ip_failover script. (The error appeared precisely because I had not modified the script yet — and not knowing Perl made that painful.)
If the following error is found:
Can't exec "mysqlbinlog": No such file or directory at / usr/local/share/perl5/MHA/BinlogManager.pm line 99.mysqlbinlog version not testing mysql connection and privileges..sh: mysql: command not found
The solution is as follows: add soft connections (all nodes)
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
So for now, comment out the master_ip_failover_script= /usr/local/bin/master_ip_failover option; it will be turned back on later, after keepalived has been introduced and the script has been modified.
[root@192.168.0.20 ~]# grep master_ip_failover /etc/masterha/app1.cnf
#master_ip_failover_script= /usr/local/bin/master_ip_failover
[root@192.168.0.20 ~]#
Check the status again:
Sun Apr 20 18:46:08 2014 - [info] Checking replication health on 192.168.0.60..
Sun Apr 20 18:46:08 2014 - [info]  ok.
Sun Apr 20 18:46:08 2014 - [info] Checking replication health on 192.168.0.70..
Sun Apr 20 18:46:08 2014 - [info]  ok.
Sun Apr 20 18:46:08 2014 - [warning] master_ip_failover_script is not defined.
Sun Apr 20 18:46:08 2014 - [warning] shutdown_script is not defined.
Sun Apr 20 18:46:08 2014 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
There are no obvious errors now, only two warnings, and replication appears to be normal.
8. Check the status of MHA Manager:
View the status of the Manager through the masterha_check_status script:
[root@192.168.0.20 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
[root@192.168.0.20 ~]#
Note: if normal, "PING_OK" will be displayed, otherwise "NOT_RUNNING" will be displayed, which means that MHA monitoring is not turned on.
9. Enable MHA Manager monitoring
[root@192.168.0.20 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 30867
[root@192.168.0.20 ~]#
Introduction of startup parameters:
-- remove_dead_master_conf this parameter means that when a master-slave switch occurs, the ip of the old master library will be removed from the configuration file.
--manager_log specifies the location of the manager log file.
--ignore_last_failover By default, if MHA detects two consecutive outages less than 8 hours apart it will not fail over again; this restriction exists to avoid a ping-pong effect. This parameter tells MHA to ignore the file generated by the previous switchover: by default, after a switchover MHA creates an app1.failover.complete file in the log directory (the /data directory set above), and the next switchover will not be triggered while that file exists, unless it is deleted first. For convenience, --ignore_last_failover is set here.
Check whether MHA Manager monitoring is normal:
[root@192.168.0.20 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:20386) is running(0:PING_OK), master:192.168.0.50
[root@192.168.0.20 ~]#
You can see that it is already under monitoring, and the host of master is 192.168.0.50.
10. View the startup log
[root@192.168.0.20 ~]# tail -n20 /var/log/masterha/app1/manager.log
Sun Apr 20 19:12:01 2014 - [info]  Connecting to root@192.168.0.70(192.168.0.70:22)..
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql, up to server04-relay-bin.000002
    Temporary relay log file is /data/mysql/server04-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Apr 20 19:12:01 2014 - [info] Slaves settings check done.
Sun Apr 20 19:12:01 2014 - [info]
192.168.0.50 (current master)
 +--192.168.0.60
 +--192.168.0.70
Sun Apr 20 19:12:01 2014 - [warning] master_ip_failover_script is not defined.
Sun Apr 20 19:12:01 2014 - [warning] shutdown_script is not defined.
Sun Apr 20 19:12:01 2014 - [info] Set master ping interval 1 seconds.
Sun Apr 20 19:12:01 2014 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s server03 -s server02 --user=root --master_host=server02 --master_ip=192.168.0.50 --master_port=3306
Sun Apr 20 19:12:01 2014 - [info] Starting ping health check on 192.168.0.50(192.168.0.50:3306)..
Sun Apr 20 19:12:01 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
[root@192.168.0.20 ~]#
Where "Ping (SELECT) succeeded, waiting until MySQL doesn't respond.." Which means the whole system is already monitoring.
11. Turn off MHA Manage monitoring
Turning it off is simple, using the masterha_stop command.
[root@192.168.0.20 ~]# masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+  Exit 1    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover --manager_log=/data/mamanager.log
[root@192.168.0.20 ~]#
12. Configure the VIP
The VIP can be configured in two ways: one is to let keepalived manage the floating of the virtual IP, and the other is to bring the virtual IP up and down from a script (no keepalived, heartbeat or similar software required).
1. Managing the virtual IP with keepalived. The keepalived configuration is as follows:
(1) Download and install the software (on both masters — strictly speaking, on the master and on the candidate master, which acts as a slave before the switchover):
[root@192.168.0.50 ~] # wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
tar xf keepalived-1.2.12.tar.gz
cd keepalived-1.2.12
./configure --prefix=/usr/local/keepalived
make && make install
cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
mkdir /etc/keepalived
cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
(2) configure the configuration file of keepalived, configure on master (192.168.0.50)
[root@192.168.0.50 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
   notification_email {
     saltstack@163.com
   }
   notification_email_from dba@dbserver.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth2
    virtual_router_id 51
    priority 150
    advert_int 1
    nopreempt

    authentication {
        auth_type PASS
        auth_pass 1111
    }

    virtual_ipaddress {
        192.168.0.88
    }
}
[root@192.168.0.50 ~]#
Here router_id MySQL-HA names the keepalived group, the virtual IP 192.168.0.88 is bound to the host's eth2 interface, the state is set to BACKUP, keepalived runs in non-preemptive mode (nopreempt), and priority 150 sets the priority to 150. The configuration on the other node differs only slightly and means the same thing.
Configure on candidate master (192.168.0.60)
[root@192.168.0.60 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
   notification_email {
     saltstack@163.com
   }
   notification_email_from dba@dbserver.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth2
    virtual_router_id 51
    priority 120
    advert_int 1
    nopreempt

    authentication {
        auth_type PASS
        auth_pass 1111
    }

    virtual_ipaddress {
        192.168.0.88
    }
}
[root@192.168.0.60 ~]#
(3) start the keepalived service, start and view the log on master
[root@192.168.0.50 ~]# /etc/init.d/keepalived start
Starting keepalived: [ OK ]
[root@192.168.0.50 ~]# tail -f /var/log/messages
Apr 20 20:22:16 192 Keepalived_healthcheckers[15334]: Opening file '/etc/keepalived/keepalived.conf'.
Apr 20 20:22:16 192 Keepalived_healthcheckers[15334]: Configuration is using : 7231 Bytes
Apr 20 20:22:16 192 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Apr 20 20:22:16 192 kernel: IPVS: ipvs loaded.
Apr 20 20:22:16 192 Keepalived_healthcheckers[15334]: Using LinkWatch kernel netlink reflector...
Apr 20 20:22:19 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Transition to MASTER STATE
Apr 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Entering MASTER STATE
Apr 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) setting protocol VIPs.
Apr 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth2 for 192.168.0.88
Apr 20 20:22:20 192 Keepalived_healthcheckers[15334]: Netlink reflector reports IP 192.168.0.88 added
Apr 20 20:22:25 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth2 for 192.168.0.88
It is found that the virtual ip 192.168.0.88 has been bound to the network card eth2.
(4) check the binding situation
[root@192.168.0.50 ~]# ip addr | grep eth2
3: eth2: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.0.50/24 brd 192.168.0.255 scope global eth2
    inet 192.168.0.88/32 scope global eth2
[root@192.168.0.50 ~]#
Start the keepalived service on another server, candidate master, and observe
[root@192.168.0.60 ~]# /etc/init.d/keepalived start
Starting keepalived: [ OK ]
[root@192.168.0.60 ~]# tail -f /var/log/messages
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Registering gratuitous ARP shared channel
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Opening file '/etc/keepalived/keepalived.conf'.
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Configuration is using : 62976 Bytes
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Using LinkWatch kernel netlink reflector...
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: VRRP_Instance(VI_1) Entering BACKUP STATE
Apr 20 20:26:18 192 Keepalived_vrrp[9472]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP 192.168.80.138 added
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP 192.168.0.60 added
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP fe80::20c:29ff:fe9d:6a9e added
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP fe80::20c:29ff:fe9d:6aa8 added
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Registering Kernel netlink reflector
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Registering Kernel netlink command channel
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Opening file '/etc/keepalived/keepalived.conf'.
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Configuration is using : 7231 Bytes
Apr 20 20:26:18 192 kernel: IPVS: Registered protocols (TCP, UDP, AH, ESP)
Apr 20 20:26:18 192 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Apr 20 20:26:18 192 kernel: IPVS: ipvs loaded.
Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Using LinkWatch kernel netlink reflector...
From the information above, you can see that keepalived has been configured successfully.
Note:
Keepalived on both servers is configured in BACKUP state. Keepalived can run in two modes, master->backup and backup->backup, and the difference matters. In master->backup mode, once the master database goes down the virtual IP drifts to the slave automatically; but when the original master is repaired and keepalived is started again, it takes the virtual IP back, even if nopreempt is set. In backup->backup mode, when the master database goes down the virtual IP also drifts to the slave automatically, but when the original master comes back and starts keepalived it does not grab the virtual IP from the new master, even if its priority is higher. To reduce the number of VIP moves, the repaired old master is therefore usually brought back as the new standby.
(5) MHA introduces keepalived (stop keepalived through MHA when the MySQL service process dies):
To bring the keepalived service into MHA, all we need to do is modify the script file master_ip_failover where the switch is triggered, and add to the script the handling of keepalived in the event of a master outage.
Edit the script /usr/local/bin/master_ip_failover as follows. I am not familiar with Perl, so the full script is posted here (this was done on the master, 192.168.0.50).
The content of the modified script is as follows (there are relatively few references on this):
Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port
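Because only the usage line of the Perl script is reproduced above, here is a rough shell sketch of the same idea — not the author's original script. MHA invokes the failover script with the options shown in the usage line, and in this keepalived setup the script's job on failover is to stop keepalived on the old master so that the VIP drifts to the candidate master (hostnames come from the options MHA passes in; everything else is an assumption):

#!/bin/bash
# rough sketch of master_ip_failover for the keepalived setup (not the original Perl script)
ssh_user=root
command="" ; orig_master_host="" ; new_master_host=""
for arg in "$@"; do
  case $arg in
    --command=*)          command=${arg#*=} ;;
    --orig_master_host=*) orig_master_host=${arg#*=} ;;
    --new_master_host=*)  new_master_host=${arg#*=} ;;
  esac
done
case $command in
  stop|stopssh)
    # the old master has died or is being demoted: stop keepalived there so the VIP is released
    ssh $ssh_user@$orig_master_host "/etc/init.d/keepalived stop"
    exit 0 ;;
  start)
    # the new master has been promoted: make sure keepalived is running there so it can take the VIP
    ssh $ssh_user@$new_master_host "/etc/init.d/keepalived start"
    exit 0 ;;
  status)
    # masterha_check_repl only needs a zero exit code here
    exit 0 ;;
  *)
    echo "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=... --new_master_host=..."
    exit 1 ;;
esac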
Now that the script has been modified, re-enable the master_ip_failover_script option mentioned earlier and check the cluster status again to see whether any errors are reported.
[root@192.168.0.20 ~]# grep 'master_ip_failover_script' /etc/masterha/app1.cnf
master_ip_failover_script= /usr/local/bin/master_ip_failover
[root@192.168.0.20 ~]#
[root@192.168.0.20 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Apr 20 23:10:01 2014 - [info] Slaves settings check done.
Sun Apr 20 23:10:01 2014 - [info]
192.168.0.50 (current master)
 +--192.168.0.60
 +--192.168.0.70
Sun Apr 20 23:10:01 2014 - [info] Checking replication health on 192.168.0.60..
Sun Apr 20 23:10:01 2014 - [info]  ok.
Sun Apr 20 23:10:01 2014 - [info] Checking replication health on 192.168.0.70..
Sun Apr 20 23:10:01 2014 - [info]  ok.
Sun Apr 20 23:10:01 2014 - [info] Checking master_ip_failover_script status:
Sun Apr 20 23:10:01 2014 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.0.50 --orig_master_ip=192.168.0.50 --orig_master_port=3306
Sun Apr 20 23:10:01 2014 - [info]  OK.
Sun Apr 20 23:10:01 2014 - [warning] shutdown_script is not defined.
Sun Apr 20 23:10:01 2014 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
You can see that no error is reported this time.
The content added or modified by / usr/local/bin/master_ip_failover means that when the master database fails, the MHA switch will be triggered, and the MHA Manager will stop the keepalived service on the master database and trigger the virtual ip to drift to the alternate slave database to complete the switch. Of course, you can introduce a script into keepalived, which monitors whether mysql is running properly, and if not, it is called to kill the keepalived process.
2. Managing the VIP through a script. Here the same /usr/local/bin/master_ip_failover is modified; this could also be written in another language, such as PHP (a PHP failover script is not covered here). After modification the content is as follows. Note that if a script manages the VIP, you first need to bind the VIP manually on the master server. (Modifying it gave me a bit of a feel for Perl — maybe I am suited to learning it after all ^_^)
[root@192.168.0.50 ~]# /sbin/ifconfig eth2:1 192.168.0.88/24
I will not walk through testing the script-managed VIP here; try it yourself. The script is as follows (it passed my tests):
Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port
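Again, only the usage line survives above. A rough shell sketch of the script-managed VIP logic (not the original Perl script) follows; the VIP 192.168.0.88/24 on eth2:1 comes from the command just shown, and the rest is an assumption:

#!/bin/bash
# rough sketch of master_ip_failover when the VIP is moved directly by the script
vip=192.168.0.88/24
dev=eth2:1
ssh_user=root
command="" ; orig_master_host="" ; new_master_host=""
for arg in "$@"; do
  case $arg in
    --command=*)          command=${arg#*=} ;;
    --orig_master_host=*) orig_master_host=${arg#*=} ;;
    --new_master_host=*)  new_master_host=${arg#*=} ;;
  esac
done
case $command in
  stop|stopssh)
    # take the VIP down on the old master (if it is still reachable over SSH)
    ssh $ssh_user@$orig_master_host "/sbin/ifconfig $dev down" ;;
  start)
    # bring the VIP up on the newly promoted master
    ssh $ssh_user@$new_master_host "/sbin/ifconfig $dev $vip" ;;
  status)
    exit 0 ;;
esac
exit 0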
To prevent split-brain, it is recommended that production environments manage the virtual IP with a script rather than with keepalived. At this point the basic MHA cluster is configured; what remains is actual testing, to see exactly how MHA behaves. We will look at MHA in three scenarios: automatic failover, manual failover, and online switchover.
1. Automatic failover (MHA Manager must be running first, otherwise automatic switching cannot happen; manual switching does not require MHA Manager monitoring to be enabled — see the earlier section on starting MHA Manager)
Here is the test environment again (the article is long, so it helps to restate it):
Role              IP address      Hostname   server_id   Type
Monitor host      192.168.0.20    server01   -           monitors the replication group
Master            192.168.0.50    server02   1           write
Candidate master  192.168.0.60    server03   2           read
Slave             192.168.0.70    server04   3           read
The steps for automatic failover simulation testing are as follows.
(1) use sysbench to generate test data (use yum to quickly install)
yum install sysbench -y
Generate the sysbench data on the master (192.168.0.50): the sbtest table is created in the sbtest database, with a total of 1,000,000 records.
[root@192.168.0.50 ~]# sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 --mysql-user=root --mysql-socket=/tmp/mysql.sock --mysql-password=123456 --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex prepare
(2) Stop the slave I/O thread to simulate master-slave delay (on 192.168.0.60):
mysql> stop slave io_thread;
Query OK, 0 rows affected (0.08 sec)
mysql>
On the other slave, we didn't stop the io thread, so we continue to receive logs.
(3) simulated sysbench stress test.
The stress test was carried out on the main library (192.168.0.50) for 3 minutes, resulting in a large amount of binlog.
[root@192.168.0.50 ~]# sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=180 --mysql-user=root --mysql-socket=/tmp/mysql.sock --mysql-password=123456 --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.

Doing OLTP test.
Running mixed OLTP test
Using Uniform distribution
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting...
(last message repeated 15 times)
Done.

OLTP test statistics:
    queries performed:
        read:                            15092
        write:                           5390
        other:                           2156
        total:                           22638
    transactions:                        1078   (5.92 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 20482  (112.56 per sec.)
    other operations:                    2156

Test execution summary:
    total time:                          181.9728s
    total number of events:              1078
    total time taken by event execution: 2910.4518
    per-request statistics:
         min:                                934.29ms
         avg:                               2699.86ms
         max:                               7679.95ms
         approx.  95 percentile:            4441.47ms

Threads fairness:
    events (avg/stddev):           67.3750/1.49
    execution time (avg/stddev):   181.9032/0.11
(4) start the IO thread on slave (192.168.0.60) to catch up with the binlog that lags behind master.
mysql> start slave io_thread;
Query OK, 0 rows affected (0.00 sec)
mysql>
(5) kill the mysql process of the main library, simulate the failure of the main library, and carry out automatic failover operation.
[root@192.168.0.50 ~]# pkill -9 mysqld
(6) check the MHA switching log to understand the whole switching process, and view the log on 192.168.0.20:
The last line of the log reads "Master failover to 192.168.0.60(192.168.0.60:3306) completed successfully.", which means the candidate master has taken over as the new master.
From the output above, you can see the entire switching process of MHA, which includes the following steps:
1. Configuration file check phase, which checks the configuration of the entire cluster
2. Handling of the downed master, including removing the virtual IP and shutting down the host (I have not implemented this part here yet; it still needs study)
3. Copying the differential relay logs between the dead master and the most up-to-date slave, and saving them to MHA Manager's working directory
4. Identifying the slave with the latest updates
5. Applying the binary log events (binlog events) saved from the master
6. Promoting one slave to be the new master
7. Making the other slaves replicate from the new master
Finally, start MHA Manager monitoring again and check who the current master of the cluster is. (Monitoring stops after a switchover — at first I wondered whether I had misconfigured something, but the following passage from the official site explains it.)
Running MHA Manager from daemontools
Currently MHA Manager process does not run as a daemon. If failover completed successfully or the master process was killed by accident, the manager stops working. To run as a daemon, daemontool, or any external daemon program can be used. Here is an example to run from daemontools.

[root@192.168.0.20 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:23971) is running(0:PING_OK), master:192.168.0.60
[root@192.168.0.20 ~]#
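A minimal daemontools service for this might look like the sketch below, assuming daemontools is installed, svscan is watching /service, and masterha_manager lives in /usr/local/bin; supervise then restarts the manager after every exit, including a completed failover:

#!/bin/sh
# /service/mha-app1/run  (must be executable)
exec 2>&1
exec /usr/local/bin/masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover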
2. Manual failover (MHA Manager monitoring must not be running)
Manual failover means that MHA automatic switchover feature is not enabled in the business. When the master server fails, manually call MHA to perform the failover operation. Specific commands are as follows:
Note: if MHA manager detects a server without dead, it will report an error and end the failover:
Mon Apr 21 21:23:33 2014 - [info] Dead Servers:
Mon Apr 21 21:23:33 2014 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln181] None of server is dead. Stop failover.
Mon Apr 21 21:23:33 2014 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53
The manual switching command is as follows:
[root@192.168.0.20 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.0.50 --dead_master_port=3306 --new_master_host=192.168.0.60 --new_master_port=3306 --ignore_last_failover
The output asks you to confirm whether you really want to switch; the full interactive output is not reproduced here.
The above simulates manually promoting 192.168.0.60 to be the new master when the master is down.
3. Online switchover
In many cases it is necessary to migrate the existing master to another machine: for example, the master's hardware is failing, the RAID controller needs to be rebuilt, or the master is being moved to a server with better performance. Traditional maintenance of the master means degraded performance and at least some downtime during which no data can be written, and blocking or killing running sessions can leave hosts with inconsistent data. MHA provides fast switching with graceful blocking of writes: the switchover takes only 0.5-2 seconds, during which writes are blocked. In many cases a 0.5-2 second write block is acceptable, so the master can be switched without scheduling a maintenance window.
The general process of online switching of MHA:
1. Detect the replication settings and determine the current master
2. Determine the new master
3. Block writes on the current master
4. Wait for all slaves to catch up with replication
5. Grant writes on the new master
6. Repoint the slaves to the new master
Note that the application architecture needs to consider the following two issues when switching online:
1. The problem of automatically identifying master and slave (master machines may switch) can basically be solved if you use the vip approach.
2. Load balancing (you can define an approximate read/write ratio and the load each machine can bear; this needs to be considered when a machine leaves the cluster)
In order to ensure the complete consistency of data and complete the handover in the fastest time, the online handover of MHA must meet the following conditions before the handover is successful, otherwise the handover will fail.
1. The I/O threads on all slaves are running
2. The SQL threads on all slaves are running
3. Seconds_Behind_Master in the output of show slave status is less than or equal to running_updates_limit seconds on all slaves; if running_updates_limit is not specified for the switch, it defaults to 1 second
4. On the master, show processlist shows no update statement running longer than running_updates_limit seconds
The online switching steps are as follows:
First, stop MHA monitoring:
[root@192.168.0.20 ~]# masterha_stop --conf=/etc/masterha/app1.cnf
Next, perform the online switchover (this simulates switching the master online: the original master 192.168.0.50 becomes a slave and 192.168.0.60 is promoted to the new master):
[root@192.168.0.20 ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Finally, check the manager log to review the switching process.
The meaning of the parameters is:
--orig_master_is_new_slave With this parameter, the original master becomes a slave of the new master after the switch. Without it, the original master is not brought back into replication.
--running_updates_limit=10000 During the switchover (or a failover), if the candidate master is lagging, MHA cannot switch successfully; with this parameter, a delay within the given number of seconds is tolerated, although the actual switchover time also depends on the size of the relay logs applied during recovery.
Note: online switching calls the master_ip_online_change script, but the sample script is incomplete and has to be modified. I also ran into a problem where the new_master_password variable in the script could not be obtained, which made the online switchover fail, so I hard-coded it and assigned the MySQL root password directly to new_master_password. If anyone knows the reason, please let me know. This script can also manage the VIP.
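For reference, the core of what master_ip_online_change has to do is block writes on the current master and then enable writes on the new one (moving the VIP the same way as in the failover script). A minimal sketch of the MySQL side, assuming the root/123456 credentials used in this article and using $orig_master_host/$new_master_host to stand for values MHA passes to the script, looks like this — it illustrates the flow, it is not the sample script itself:

# on --command=stop/stopssh: block writes on the current master before the switch
mysql -uroot -p123456 -h$orig_master_host -e "set global read_only=1"
# on --command=start: re-enable writes on the new master (this is where new_master_password is needed)
mysql -uroot -p123456 -h$new_master_host -e "set global read_only=0"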
4. Repair the downed master
Usually, after an automatic switchover the original master is effectively discarded. Once the original master host has been repaired, if its data is intact you will probably want to bring it back as a slave of the new master. The MHA log written at the time of the switchover can be used for this; the following command extracts the relevant line:
[root@192.168.0.20 app1]# grep -i "All other slaves should start" manager.log
Mon Apr 21 22:28:33 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000022', MASTER_LOG_POS=506716, MASTER_USER='repl', MASTER_PASSWORD='xxx';
[root@192.168.0.20 app1]#
With this information, you can run the CHANGE MASTER TO statement directly on the repaired master and bring it back as a slave of the new master.
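For example, on the repaired 192.168.0.50 this would look roughly like the following, substituting the repl password used earlier in this article for the masked 'xxx':

mysql -uroot -p123456 -e "CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000022', MASTER_LOG_POS=506716, MASTER_USER='repl', MASTER_PASSWORD='123456'; START SLAVE;"
mysql -e 'show slave status\G' | egrep 'Slave_IO|Slave_SQL'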
Finally, there is the mail-alert script send_report (obtained from a friend), which the report_script option points to.
Here is an example of the alert mail sent after a switchover. Note that it comes from a later test, so never mind that the IPs differ from the environment above.
Summary:
The high-availability solutions introduced earlier — MMM, heartbeat+drbd, MySQL Cluster, Percona's Galera Cluster and so on — can each provide a certain degree of database high availability, and each has its own strengths and weaknesses. The choice mainly depends on the business and on how strict the data-consistency requirements are. For the combination of high availability and data consistency, MHA is the recommended architecture.