2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how master-slave switching works in RocketMQ DLedger multi-copy mode.
DLedger is based on the Raft protocol, so it naturally supports master-slave switching: if the master node (Leader) fails, a new election is triggered and the cluster elects a new master node.
In RocketMQ master-slave synchronization, a slave node synchronizes not only message data from the master node but also metadata, including topic routing information, consumption progress, delay queue processing progress, and consumer group subscription configuration. How is this metadata synchronized after a master-slave switch? In particular, how much does a switch affect message consumption, and can messages be lost?
1. Detailed explanation of master-slave related methods in BrokerController
This section starts with the methods related to master-slave switching in BrokerController.
1.1 startProcessorByHa
BrokerController#startProcessorByHa
    private void startProcessorByHa(BrokerRole role) {
        // Only a master node runs the transaction status check service.
        if (BrokerRole.SLAVE != role) {
            if (this.transactionalMessageCheckService != null) {
                this.transactionalMessageCheckService.start();
            }
        }
    }

The method name is somewhat arbitrary. What it actually does is start the transaction status check service: when the node is a master, the service is started and issues transaction status check requests for messages in the PREPARE state.
1.2 shutdownProcessorByHa
BrokerController#shutdownProcessorByHa
    private void shutdownProcessorByHa() {
        if (this.transactionalMessageCheckService != null) {
            this.transactionalMessageCheckService.shutdown(true);
        }
    }

This method shuts down the transaction status check service. It is called when the node changes from master to slave.
1.3 handleSlaveSynchronize
BrokerController#handleSlaveSynchronize
    private void handleSlaveSynchronize(BrokerRole role) {
        if (role == BrokerRole.SLAVE) { // @1
            if (null != slaveSyncFuture) {
                slaveSyncFuture.cancel(false);
            }
            this.slaveSynchronize.setMasterAddr(null);
            slaveSyncFuture = this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    try {
                        BrokerController.this.slaveSynchronize.syncAll();
                    } catch (Throwable e) {
                        log.error("ScheduledTask SlaveSynchronize syncAll error.", e);
                    }
                }
            }, 1000 * 3, 1000 * 10, TimeUnit.MILLISECONDS);
        } else { // @2
            // handle the slave synchronise
            if (null != slaveSyncFuture) {
                slaveSyncFuture.cancel(false);
            }
            this.slaveSynchronize.setMasterAddr(null);
        }
    }
This method handles metadata synchronization for a slave node: the slave actively pulls topic routing information, consumption progress, delay queue processing progress, consumer group subscription configuration, and similar data from the master node.
Code @1: if the current node's role is slave:
If the future of the previous synchronization task is not null, cancel it first.
Then set the master address of slaveSynchronize to null. This raises a question: if the master address of a slave node is set to null, how can it synchronize metadata, and when is this value set again?
Start the scheduled synchronization task, which pulls metadata from the master node every 10s (after an initial delay of 3s).
Code @2: if the current node's role is master, cancel the scheduled synchronization task and set the master address to null.
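The cancel-then-reschedule pattern described above can be sketched in isolation. This is a minimal illustration, not RocketMQ code: the class MetadataSyncScheduler, the syncAll callback, and the isSyncing helper are hypothetical names; only the 3s delay and 10s period come from the method above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the broker's slave-sync scheduling; not RocketMQ code.
class MetadataSyncScheduler {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final Runnable syncAll;        // assumed callback that pulls metadata from the master
    private ScheduledFuture<?> syncFuture; // handle of the currently scheduled task, if any

    MetadataSyncScheduler(Runnable syncAll) {
        this.syncAll = syncAll;
    }

    // Applies the two branches: a slave (re)starts the periodic pull, a master only cancels it.
    synchronized void onRoleChange(boolean slave) {
        if (syncFuture != null) {
            syncFuture.cancel(false); // never leave two periodic tasks running
            syncFuture = null;
        }
        if (slave) {
            syncFuture = executor.scheduleAtFixedRate(() -> {
                try {
                    syncAll.run();
                } catch (Throwable t) {
                    // Swallow the error: an exception escaping here would stop all future runs.
                }
            }, 3_000, 10_000, TimeUnit.MILLISECONDS);
        }
    }

    synchronized boolean isSyncing() {
        return syncFuture != null && !syncFuture.isCancelled();
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Catching Throwable inside the task matters because scheduleAtFixedRate suppresses all subsequent executions once one run throws.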
1.4 changeToSlave
BrokerController#changeToSlave
    public void changeToSlave(int brokerId) {
        log.info("Begin to change to slave brokerName={} brokerId={}", brokerConfig.getBrokerName(), brokerId);

        // change the role
        brokerConfig.setBrokerId(brokerId == 0 ? 1 : brokerId); // TO DO check   // @1
        messageStoreConfig.setBrokerRole(BrokerRole.SLAVE);                     // @2

        // handle the scheduled service
        try {
            this.messageStore.handleScheduleMessageService(BrokerRole.SLAVE);   // @3
        } catch (Throwable t) {
            log.error("[MONITOR] handleScheduleMessageService failed when changing to slave", t);
        }

        // handle the transactional service
        try {
            this.shutdownProcessorByHa();                                       // @4
        } catch (Throwable t) {
            log.error("[MONITOR] shutdownProcessorByHa failed when changing to slave", t);
        }

        // handle the slave synchronise
        handleSlaveSynchronize(BrokerRole.SLAVE);                               // @5

        try {
            this.registerBrokerAll(true, true, brokerConfig.isForceRegister()); // @6
        } catch (Throwable ignored) {
        }
        log.info("Finish to change to slave brokerName={} brokerId={}", brokerConfig.getBrokerName(), brokerId);
    }
This method switches the Broker to the slave role. The key steps are as follows:
Code @1: set the brokerId; if the given id is 0, use 1 instead. Keep this in mind when planning the brokerId of the nodes in the cluster.
Code @2: set the broker role to BrokerRole.SLAVE.
Code @3: handle the scheduled message service (which processes the RocketMQ delay queues): it is stopped on a slave node and started on a master node.
Code @4: shut down the transaction status check service.
Code @5: start the metadata synchronization task, that is, start SlaveSynchronize so that the slave periodically pulls metadata from the master.
Code @6: immediately register with all nameservers in the cluster to report the broker's changed state.
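The brokerId rule from the first step is small enough to isolate. The helper below is a hypothetical sketch (the class SlaveBrokerId and the method forSlave are not part of RocketMQ); it only restates the expression `brokerId == 0 ? 1 : brokerId` from the method above.

```java
// Hypothetical helper that isolates the brokerId rule applied when a node becomes a slave.
class SlaveBrokerId {
    static final int MASTER_ID = 0; // brokerId 0 is reserved for the master

    // A slave may never keep id 0, so 0 is mapped to 1; every other id is kept as-is.
    static int forSlave(int configuredId) {
        return configuredId == MASTER_ID ? 1 : configuredId;
    }
}
```

With this rule, the inputs 0 and 1 both yield 1, which is one reason to plan the ids of the cluster's nodes so that two slaves cannot end up with the same brokerId.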
1.5 changeToMaster
BrokerController#changeToMaster
    public void changeToMaster(BrokerRole role) {
        if (role == BrokerRole.SLAVE) {
            return;
        }
        log.info("Begin to change to master brokerName={}", brokerConfig.getBrokerName());

        // handle the slave synchronise
        handleSlaveSynchronize(role);                                           // @1

        // handle the scheduled service
        try {
            this.messageStore.handleScheduleMessageService(role);               // @2
        } catch (Throwable t) {
            log.error("[MONITOR] handleScheduleMessageService failed when changing to master", t);
        }

        // handle the transactional service
        try {
            this.startProcessorByHa(BrokerRole.SYNC_MASTER);                    // @3
        } catch (Throwable t) {
            log.error("[MONITOR] startProcessorByHa failed when changing to master", t);
        }

        // if the operations above are totally successful, we change to master
        brokerConfig.setBrokerId(0); // TO DO check                             // @4
        messageStoreConfig.setBrokerRole(role);

        try {
            this.registerBrokerAll(true, true, brokerConfig.isForceRegister()); // @5
        } catch (Throwable ignored) {
        }
        log.info("Finish to change to master brokerName={}", brokerConfig.getBrokerName());
    }
This method contains the logic for changing the Broker role from slave to master. The key steps are as follows:
Code @1: stop the metadata synchronization task, because a master node does not need to pull metadata.
Code @2: start the scheduled message service (delay queue processing).
Code @3: start the transaction status check service.
Code @4: set the brokerId to 0 and update the broker role.
Code @5: immediately send a heartbeat to the nameservers to report the broker's current state.
That covers the core methods for master-slave role changes. Next, let's look at what triggers a master-slave switch.
2. How to trigger the master-slave switch
As described in the previous article, RocketMQ DLedger is implemented on top of the Raft protocol. The protocol elects a master node, and when the master fails the cluster automatically holds a new election in which a new master is chosen by negotiation and voting, which is how high availability is achieved.
BrokerController#initialize
    if (messageStoreConfig.isEnableDLegerCommitLog()) {
        DLedgerRoleChangeHandler roleChangeHandler =
            new DLedgerRoleChangeHandler(this, (DefaultMessageStore) messageStore);
        ((DLedgerCommitLog) ((DefaultMessageStore) messageStore).getCommitLog())
            .getdLedgerServer()
            .getdLedgerLeaderElector()
            .addRoleChangeHandler(roleChangeHandler);
    }

The snippet above is taken from the initialize method of BrokerController. When the Broker starts with the multi-copy mechanism enabled, that is, with the enableDLegerCommitLog parameter set to true (the code spells it DLeger), a roleChangeHandler is registered with the cluster's leader elector. This is the handler that is invoked whenever the node's role changes.
Next we will focus on DLedgerRoleChangeHandler.
2.1 Class Diagram
DLedgerRoleChangeHandler implements RoleChangeHandler, the event handler invoked after a node's state changes. Its properties are simple; the one to focus on is ExecutorService executorService, the event-handling thread pool. It runs only one thread, so events are handled one at a time, in order.
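Why a single-thread pool gives one-at-a-time, in-order handling can be shown with a small sketch. The class SerialRoleEvents and its method are hypothetical and unrelated to the DLedger source; the point is only that tasks submitted to Executors.newSingleThreadExecutor() run sequentially in submission order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: role-change events handled by a single worker thread.
class SerialRoleEvents {
    // Submits each role name as a task and returns the order in which the tasks actually ran.
    static List<String> handleInOrder(List<String> roles) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<String> handled = Collections.synchronizedList(new ArrayList<>());
        for (String role : roles) {
            executor.submit(() -> { handled.add(role); });
        }
        executor.shutdown(); // already-submitted tasks still run
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(handled);
    }
}
```

Because only one thread drains the queue, a LEADER event can never be processed concurrently with an earlier FOLLOWER event for the same node.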
Next, let's focus on the execution of the handle method.
2.2 handle: master-slave role switching logic
DLedgerRoleChangeHandler#handle
    public void handle(long term, MemberState.Role role) {
        Runnable runnable = new Runnable() {
            public void run() {
                long start = System.currentTimeMillis();
                try {
                    boolean succ = true;
                    log.info("Begin handling broker role change term={} role={} currStoreRole={}",
                        term, role, messageStore.getMessageStoreConfig().getBrokerRole());
                    switch (role) {
                        case CANDIDATE: // @1
                            if (messageStore.getMessageStoreConfig().getBrokerRole() != BrokerRole.SLAVE) {
                                brokerController.changeToSlave(dLedgerCommitLog.getId());
                            }
                            break;
                        case FOLLOWER: // @2
                            brokerController.changeToSlave(dLedgerCommitLog.getId());
                            break;
                        case LEADER: // @3
                            while (true) {
                                if (!dLegerServer.getMemberState().isLeader()) {
                                    succ = false;
                                    break;
                                }
                                if (dLegerServer.getdLedgerStore().getLedgerEndIndex() == -1) {
                                    break;
                                }
                                if (dLegerServer.getdLedgerStore().getLedgerEndIndex() == dLegerServer.getdLedgerStore().getCommittedIndex()
                                    && messageStore.dispatchBehindBytes() == 0) {
                                    break;
                                }
                                Thread.sleep(100);
                            }
                            if (succ) {
                                messageStore.recoverTopicQueueTable();
                                brokerController.changeToMaster(BrokerRole.SYNC_MASTER);
                            }
                            break;
                        default:
                            break;
                    }
                    log.info("Finish handling broker role change succ={} term={} role={} currStoreRole={} cost={}",
                        succ, term, role, messageStore.getMessageStoreConfig().getBrokerRole(), DLedgerUtils.elapsed(start));
                } catch (Throwable t) {
                    log.info("[MONITOR] Failed handling broker role change term={} role={} currStoreRole={} cost={}",
                        term, role, messageStore.getMessageStoreConfig().getBrokerRole(), DLedgerUtils.elapsed(start), t);
                }
            }
        };
        executorService.submit(runnable);
    }
Code @1: if the node's state machine is in the CANDIDATE state, a leader election is in progress. If the broker's role is not already SLAVE, it is changed to SLAVE.
Code @2: if the node's state machine is in the FOLLOWER state, the broker is switched to a slave node.
Code @3: if the node's state machine is in the LEADER state, the node has been elected Leader. Before the broker switches to master, it must wait until the data appended on this node has been committed. The key steps are as follows:
If ledgerEndIndex is -1, the node has no log data yet, so the loop exits without waiting.
If ledgerEndIndex is not -1, the node must wait until the data is committed, that is, until ledgerEndIndex equals committedIndex.
It must also wait until the commitlog has been fully dispatched to the consumequeue, that is, until reputFromOffset in ReputMessageService equals maxOffset of the commitlog (dispatchBehindBytes() returns 0).
Once these conditions are met, the role can change: the ConsumeQueue state is recovered first, so that the maxOffset of each queue is maintained, and then the broker role is changed to master.
With these steps, the broker switches to the master role automatically.
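The waiting conditions above can be sketched as a stand-alone loop. This is an illustrative sketch only: the class LeaderReadiness, the supplier parameters, and the poll interval argument are assumptions standing in for the DLedger store and message store calls used in handle.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical sketch of the "wait until safe to become master" loop.
class LeaderReadiness {
    // Returns true once the node may act as master, false if leadership was lost
    // (or the thread was interrupted) while waiting.
    static boolean awaitReady(BooleanSupplier isLeader,
                              LongSupplier ledgerEndIndex,
                              LongSupplier committedIndex,
                              LongSupplier dispatchBehindBytes,
                              long pollMillis) {
        while (true) {
            if (!isLeader.getAsBoolean()) {
                return false; // another node took over in the meantime
            }
            long end = ledgerEndIndex.getAsLong();
            if (end == -1) {
                return true; // empty log: nothing to commit or dispatch
            }
            if (end == committedIndex.getAsLong() && dispatchBehindBytes.getAsLong() == 0) {
                return true; // everything appended is committed and dispatched
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Rechecking leadership on every iteration is what keeps the loop from spinning forever if the node loses the election while its log is still catching up.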
2.3 Master-Slave switching flow chart
Since it may not be intuitive enough from the point of view of the source code, this section gives its flow chart.
> Tip: the first half of the flowchart is described in the earlier article in this source code analysis series on the design techniques behind integrating RocketMQ with DLedger (multiple copies) for a smooth upgrade.
3. Thoughts on some problems of master-slave switching
After the explanation above, the principle of master-slave switching should be clearer. Two questions naturally follow: can a master-slave switch lose messages, and can it lose message consumption progress and so cause repeated consumption?
3.1 Is there a risk of losing message consumption progress?
First, RocketMQ metadata, including message consumption progress, is synchronized by having the slave periodically pull it from the master, so there is a delay. Introducing DLedger does not make this metadata consistent; DLedger only guarantees the consistency of the commitlog files.
When the master node goes down, the slave nodes can no longer synchronize consumption progress. Meanwhile message consumption continues: consumers keep pulling messages from a slave node and reporting progress to it, but that slave will not necessarily become the new master, so consumption progress may be lost on the broker side. It may also survive, because as long as the consumer does not restart, the consumption progress is still held in the consumer's memory.
To sum up, message consumption progress may be lost on the broker side, and repeated consumption is possible. This is not a serious problem, because RocketMQ itself does not guarantee that a message is consumed only once.
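Since repeated consumption is possible, consumers are usually written to be idempotent. The following is a minimal sketch under stated assumptions: IdempotentHandler is a hypothetical class, the message key is assumed to identify a message uniquely, and the in-memory set stands in for whatever durable store a real application would need.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical wrapper that skips messages whose key has already been processed.
class IdempotentHandler {
    private final Set<String> seen = ConcurrentHashMap.newKeySet(); // in-memory only, lost on restart
    private final Consumer<String> businessLogic;

    IdempotentHandler(Consumer<String> businessLogic) {
        this.businessLogic = businessLogic;
    }

    // Returns true if the message was processed, false if it was recognized as a duplicate.
    boolean handle(String messageKey, String body) {
        if (!seen.add(messageKey)) {
            return false; // key already recorded: redelivered message
        }
        try {
            businessLogic.accept(body);
            return true;
        } catch (RuntimeException e) {
            seen.remove(messageKey); // allow a retry after a failed attempt
            throw e;
        }
    }
}
```

With this wrapper, a message redelivered after a master-slave switch is detected by its key and skipped instead of being applied twice.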
3.2 Is there a risk of losing messages?
Whether messages can be lost depends on whether a slave node whose log replication is lagging can be elected master. If a slave that lags behind the master were elected as the new master when the old master goes down, the result would be a disaster: data would be lost. The logic by which one node decides whether to vote for another is described in detail in section 2.4.2, the handleVote method, of the earlier source code analysis article on Leader election in RocketMQ DLedger multi-copy. Its core point is shown again here by screenshot:
As can be seen, a node votes against a candidate whose replication progress is smaller than its own.
A candidate must be approved by more than half of the nodes in the cluster. The replication progress of the node that is finally elected master is therefore no smaller than that of a majority of the nodes, and so no smaller than the committed offset already acknowledged to clients. The conclusion is that messages will not be lost.
That concludes the main content of this article. To finish, here is a question to think about, which also serves as a summary review of DLedger multi-copy, that is, master-slave switching. I will give the answer in a comment or in the next article.
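The argument can be made concrete with a simplified sketch. The class VoteSketch and its methods are hypothetical, and the sketch keeps only the replication-progress comparison described above; the actual handleVote method performs further checks that are not reproduced here.

```java
// Hypothetical, simplified model of the voting rule: progress comparison plus majority.
class VoteSketch {
    // A voter grants its vote only if the candidate's log is at least as advanced as its own.
    static boolean grants(long candidateEndIndex, long voterEndIndex) {
        return candidateEndIndex >= voterEndIndex;
    }

    // The candidate votes for itself and needs more than half of clusterSize in total.
    // respondingVoters holds the progress of the other nodes that are still reachable.
    static boolean winsElection(long candidateEndIndex, long[] respondingVoters, int clusterSize) {
        int quorum = clusterSize / 2 + 1;
        int votes = 1; // the candidate's own vote
        for (long voterEndIndex : respondingVoters) {
            if (grants(candidateEndIndex, voterEndIndex)) {
                votes++;
            }
        }
        return votes >= quorum;
    }
}
```

In a three-node cluster whose master is down, a candidate at index 5 facing a voter at index 7 collects only its own vote and loses, while the node at index 7 wins with two votes; the lagging node cannot be elected.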
4. Thinking questions
Consider a DLedger cluster with five nodes. Leader node: n0-broker-a. Follower nodes: n1-broker-a, n2-broker-a, n3-broker-a, n4-broker-a.
The replication progress of the slave nodes may differ, for example:
n1-broker-a replication progress is 100
n2-broker-a replication progress is 120
n3-broker-a replication progress is 90
n4-broker-a replication progress is 90
Suppose the n0-broker-a node goes down at this point and an election is triggered. If n1 is the first to request votes, its replication progress is greater than that of n3 and n4, so with its own vote added it may become leader. Will messages be lost? Why?