SOFAJRaft: An Example Analysis from SOFAChannel#8

2025-04-04 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article presents the example analysis from SOFAJRaft | SOFAChannel#8 in a concise, easy-to-follow form; I hope the detailed walkthrough below gives you something to take away.

Raft consensus algorithm

Raft is a consensus algorithm that lets multiple participants agree on something: one matter, one conclusion, and once a conclusion is reached it cannot be overturned. A bank account makes a good illustration: suppose a group of servers forms a cluster that maintains an account system. If a Client sends a "deposit 100 yuan" instruction to the cluster and the cluster replies with success, then the Client is guaranteed to find the 100 yuan stored, and even if some machines become unavailable, that 100 yuan cannot be tampered with. This is what a consensus algorithm sets out to achieve.

Compared with other consensus algorithms, Raft has the following distinguishing characteristics:

Strong leader: a Raft cluster has at most one Leader, and logs can only be replicated from the Leader to the Followers.

Leader election: Raft uses randomized election timeouts to trigger elections, which avoids split votes and ensures elections complete smoothly.

Membership changes: nodes joining or leaving the cluster are handled in two phases, and the cluster continues serving external requests throughout.

A typical application of a consensus algorithm is the replicated state machine. A Client sends the state machine a series of commands that can be executed on it, and the consensus algorithm replicates these commands to the other state machines in the form of a Log. As long as the different state machines execute the same commands in exactly the same order, they produce the same output; the consensus algorithm's job is therefore to ensure that the content and order of the replicated logs are consistent.
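The determinism property described above can be sketched in a few lines of Java. This is an illustrative toy, not SOFAJRaft code: every replica applies the same command log in the same order, so all replicas end in the same state.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicatedStateMachineDemo {

    // A trivial deterministic state machine: state is a bank balance,
    // each command is a signed amount to add.
    static class AccountStateMachine {
        private long balance = 0;

        void apply(long delta) {   // commands must be deterministic
            balance += delta;
        }

        long state() {
            return balance;
        }
    }

    // Replicate one command log, in order, to several state machines.
    public static long[] replicate(List<Long> log, int replicas) {
        AccountStateMachine[] machines = new AccountStateMachine[replicas];
        for (int i = 0; i < replicas; i++) machines[i] = new AccountStateMachine();
        for (long cmd : log) {
            for (AccountStateMachine m : machines) m.apply(cmd);
        }
        long[] states = new long[replicas];
        for (int i = 0; i < replicas; i++) states[i] = machines[i].state();
        return states;
    }

    public static void main(String[] args) {
        List<Long> log = new ArrayList<>();
        log.add(100L);  // "deposit 100"
        log.add(-30L);  // "withdraw 30"
        // All three replicas reach the same balance: 70
        for (long s : replicate(log, 3)) System.out.println(s);
    }
}
```

The consensus algorithm's whole job is to guarantee the precondition this toy takes for granted: that every replica sees the same log in the same order.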

Figure 1 - Replicated state machine

SOFAJRaft

SOFAJRaft is a production-grade, high-performance Java implementation of the Raft algorithm that supports MULTI-RAFT-GROUP. Its application scenarios include Leader election, distributed lock services, highly reliable meta-information management, and distributed storage systems.

Figure 2-SOFAJRaft structure

This diagram shows the design of SOFAJRaft. Node represents a SOFAJRaft Server node, and the boxes represent its internal modules. We will continue to use the bank account system as an example to illustrate how each module of SOFAJRaft works.

When a Client sends a "deposit 100 CNY" command to SOFAJRaft, the Node's log module first stores the command locally in the form of a Log entry. At the same time, the Replicators copy the Log to the other Nodes; there is one Replicator per other node in the cluster, so log replication happens concurrently. Once the Node has received "replication successful" responses from more than half of the Followers in the cluster, this Log and all preceding Logs can be handed to the state machine for execution, in order. The state machine is implemented by the user; in our bank account example it performs the debit and credit operations on the account balance, while in other scenarios a user-defined state machine would execute something else.

A Snapshot is a record of the data's current value. Having the Leader generate snapshots serves several purposes:

When a new Node joins the cluster, it does not have to catch up with the Leader purely by log replication and replay; installing the Leader's snapshot lets it skip replaying a large number of logs.

The Leader can send a snapshot instead of replicating logs, reducing the amount of data on the network.

Replacing earlier Logs with a snapshot saves storage space.
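The three benefits above all come from the same mechanic: a snapshot captures the state machine's value at some log index, after which the entries it covers can be discarded and a newcomer can install the snapshot and replay only what follows. A minimal in-memory sketch (the class and field names are illustrative, not SOFAJRaft's API):

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDemo {
    long counter = 0;                          // state machine value
    final List<Long> log = new ArrayList<>();  // entries not yet snapshotted
    long snapshotValue = 0;                    // last saved snapshot
    long snapshotIndex = 0;                    // log entries covered by it

    void apply(long delta) {
        log.add(delta);
        counter += delta;
    }

    // Save a snapshot, then truncate the log entries it covers.
    void saveSnapshot() {
        snapshotValue = counter;
        snapshotIndex += log.size();
        log.clear();
    }

    // A new node installs the snapshot, then replays only the remaining log.
    static long catchUp(SnapshotDemo leader) {
        long state = leader.snapshotValue;
        for (long delta : leader.log) state += delta;
        return state;
    }
}
```

After `saveSnapshot()`, only entries appended since the snapshot need to be stored or shipped over the network.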

Figure 3 - Parts the user must implement: StateMachine and Client

SOFAJRaft requires users to implement two parts: StateMachine and Client.

SOFAJRaft is just a tool: its purpose is to help a cluster reach consensus, while the concrete business logic that consensus is reached on must be defined by the user. The part the user needs to implement is defined as the StateMachine interface; an accounting system and a distributed store, for example, require different StateMachine logic. The Client side is just as easy to understand: depending on the business, the user defines different message types and the client-side logic to handle them.

Figure 4 - Interfaces the user needs to implement

With that introduction behind us, we come to today's topic: how do you implement a distributed counter with SOFAJRaft?

Requirements

Our requirement is very simple; in one sentence: provide a Counter whose Client can specify the delta for each count and can query the current value at any time.

Analyzing this requirement a little, we translate it into concrete functional points, in three parts:

Implement a Counter server with a counting function; the calculation is Cn = Cn-1 + delta

Provide a write service: writing a delta triggers the counter operation

Provide a read service: read the current value of Cn
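The three functional points above fit in a handful of lines. This is a minimal single-node sketch of the business logic only (not the open-sourced project code); everything the rest of the article adds is about making this logic replicated and fault-tolerant.

```java
// Minimal sketch of the counter's business logic from the requirements:
// the write service applies Cn = Cn-1 + delta, the read service returns Cn.
public class Counter {
    private long value = 0;

    // Write service: apply a delta and return the new count.
    public synchronized long incrementAndGet(long delta) {
        value = value + delta;   // Cn = Cn-1 + delta
        return value;
    }

    // Read service: return the current Cn.
    public synchronized long get() {
        return value;
    }
}
```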

In addition, we have an optional availability requirement: there must be a backup machine, and the read and write services must not become unavailable.

System architecture

Based on the functional requirements just analyzed, we design architecture 1.0. This architecture is very simple: a single Counter Server node provides the counting function and receives the counting and query requests initiated by clients.

Figure 5-Architecture 1.0

But this architecture has two problems: first, the Server is a single point, so if the Server node fails the service becomes unavailable; second, the operation results are kept in memory, so a node failure loses the data.

Figure 6-deficiency of Architecture 1.0: single Point

For the second problem, we optimize by adding local file storage: after each counting operation, the data is flushed to disk. Now, when the node fails, we must bring up a new standby machine, copy over the file data, and have it take over from the failed machine. This removes the risk of data loss, but introduces other problems: disk IO becomes very frequent, and this cold-standby mode still leaves the service unavailable for a period of time.

Figure 7-deficiency of Architecture 1.0: cold standby

So we propose architecture 2.0, which serves requests from a cluster. Three nodes form the cluster, with one node serving requests. When the Server receives a write request from a Client, it computes the result, replicates the result to the other two machines, and returns the result to the Client only after receiving successful responses from all other nodes.

Figure 8-Architecture 2.0

But this architecture also has problems:

Which Server should act as Leader and serve external requests?

When the Leader becomes unavailable, which node should replace it?

When the Leader handles a write request, it must wait for all nodes to respond before responding to the Client.

Just as important, we cannot guarantee that the data the Leader copies to the Followers arrives in order, so at any given moment the data on the three nodes may differ.

Guaranteeing that replicated data is consistent in both order and content is exactly where a consensus algorithm shows its strength, so in the 3.0 architecture we use SOFAJRaft to implement the cluster.

Figure 8-Architecture 3.0: using SOFAJRaft

In the 3.0 architecture, the Counter Servers use SOFAJRaft to form a cluster, and Leader election and data replication are left to SOFAJRaft. In the sequence diagram we can see that Counter's business logic has become as simple as it was in architecture 1.0; the work of keeping data consistent is delegated to SOFAJRaft, so the gray part of the diagram is invisible to the business.

Figure 9-Architecture 3.0: sequence Diagram

In the 3.0 architecture built on SOFAJRaft, SOFAJRaft handles Leader election and data synchronization between nodes for us. Moreover, SOFAJRaft only needs responses from more than half of the nodes rather than from every node in the cluster, which further improves the efficiency of write-request handling.
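The "more than half" rule the text relies on can be stated precisely: with n nodes, a write is committed once n/2 + 1 nodes (the Leader's own acknowledgment included) have it. A tiny sketch of that arithmetic:

```java
// Majority ("quorum") arithmetic behind Raft's commit rule.
public class Quorum {
    // Smallest number of nodes that forms a majority of clusterSize.
    public static int quorum(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // A log entry is committed once a majority has acknowledged it.
    public static boolean committed(int acks, int clusterSize) {
        return acks >= quorum(clusterSize);
    }
}
```

This is also why a 3-node cluster tolerates one failed node: the remaining 2 still form a quorum.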

Figure 10-Architecture 3.0:SOFAJRaft implements Leader election and log replication

Use SOFAJRaft

So how do you use SOFAJRaft? As mentioned earlier, SOFAJRaft mainly exposes two places for us to implement, the Client and the StateMachine, so those are the two parts our counter has to provide.

On the Client side, we need to define concrete message types and, for each message type, implement a message Processor to handle it; the messages are then handed to SOFAJRaft, which completes the data synchronization within the cluster.

On the StateMachine side, we implement the several interfaces the state machine exposes, the most important of which is onApply: in it, the Client's request instructions are computed and turned into concrete counter values. The onSnapshotSave and onSnapshotLoad interfaces are responsible for generating and loading snapshots.
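The shape of those three hooks can be sketched as follows. This is a self-contained toy mirroring the roles the text describes, not the real SOFAJRaft StateMachine interface (whose onApply iterates over raw log entries and whose snapshot hooks work against snapshot storage):

```java
import java.util.Iterator;

public class CounterStateMachineSketch {
    private long value = 0;

    // onApply: execute each committed log entry (here, a delta) in order.
    public void onApply(Iterator<Long> committedDeltas) {
        while (committedDeltas.hasNext()) {
            value += committedDeltas.next();
        }
    }

    // onSnapshotSave: capture the current value (a real implementation
    // would write it to snapshot storage).
    public long onSnapshotSave() {
        return value;
    }

    // onSnapshotLoad: restore state from a snapshot instead of replay.
    public void onSnapshotLoad(long snapshot) {
        value = snapshot;
    }

    public long getValue() {
        return value;
    }
}
```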

Figure 11-Module relationship

The diagram below shows the final module breakdown of the implementation; it reflects the actual code, which has been open-sourced with the project, so no code is pasted here. We implement two message types, IncrementAndGetRequest and GetValueRequest, corresponding to write and read requests respectively; because the response to both is a counter value, they share a single ValueResponse. The two request types have two corresponding Processors, IncrementAndGetRequestProcessor and GetValueRequestProcessor. The state machine CounterStateMachine implements the three interfaces mentioned earlier, plus onLeaderStart and onLeaderStop, which perform some processing when the node becomes Leader or loses Leader status. Write-request handling uses IncrementAndAddClosure, so that the response can be delivered through a callback.
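To make the class diagram concrete, here is one plausible shape for the three message types it names. The type names come from the text above, but the field names are assumptions for illustration; the open-sourced project's actual definitions may differ.

```java
import java.io.Serializable;

// Illustrative message shapes; field names are hypothetical.
public class CounterMessages {
    // Write request: carries the delta to add to the counter.
    public static class IncrementAndGetRequest implements Serializable {
        public long delta;
    }

    // Read request: no payload needed to query the current value.
    public static class GetValueRequest implements Serializable {
    }

    // Shared response: both requests answer with a counter value.
    public static class ValueResponse implements Serializable {
        public long value;
        public boolean success;
        public String errorMsg;
    }
}
```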

Figure 12-Class Diagram

Start up and run

Let's walk through the whole startup process, starting with the Follower nodes (though of course we cannot know before startup which node will become Leader). Counter is run locally as three processes simulating three nodes, on ports 8081, 8082, and 8083, labeled nodes A, B, and C.

Node A starts first and begins sending preVote requests to B and C, but since the other two nodes have not started yet, node A's communication fails; it then waits, retries, and so on. During one of node A's waits after a failed attempt, it receives a preVote request from node B. After a series of checks it approves the preVote request and returns a successful response, then also responds successfully to node B's vote request, after which node B is elected Leader. That is Follower A's startup and voting process.

Figure 13 - Follower startup log

Now look at node B's startup. After starting, node B happens to fall inside one of node A's retry gaps, so it receives no preVote request from another node and instead launches its own preVote requests to the other two nodes as a candidate. It receives a confirming response from node A, then initiates a vote request, and again receives node A's response. Node B has thus collected more than half of the cluster's votes (node A's plus its own, 2 out of 3) and is elected. Throughout this process node C has never started, but since A and B already form a majority, the consensus algorithm can work normally.

Figure 14 - Leader startup log

In the process just described, we mentioned two keywords, preVote and vote, which are the two phases of an election. preVote exists to cope with network partitions. We have a dedicated article analyzing SOFAJRaft's elections if you want the details; here we can roughly state the selection principle as: whichever node holds the newest and most complete log is the better qualified to become Leader.
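The "newest and most complete log" principle can be sketched as the standard Raft up-to-date check: a voter grants its vote only if the candidate's log is at least as up-to-date as its own, comparing the last log entry's term first and its index second. A minimal illustrative version (not SOFAJRaft's actual code):

```java
// Raft's log up-to-date comparison, used when deciding whether to vote.
public class VoteCheck {
    public static boolean candidateLogIsUpToDate(long candLastTerm, long candLastIndex,
                                                 long voterLastTerm, long voterLastIndex) {
        if (candLastTerm != voterLastTerm) {
            // Higher last term means a newer log.
            return candLastTerm > voterLastTerm;
        }
        // Same term: the longer log is at least as complete.
        return candLastIndex >= voterLastIndex;
    }
}
```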

Next, let's look at write requests initiated by a Client. The Client issues three write requests: "+0", "+1", and "+2". From the log we can see that after receiving these requests, the Leader first sends them to the other nodes in the form of log entries (in batches). When it receives successful replication responses from the other nodes, it updates committedIndex, and finally calls the onApply interface to perform the counting operation, applying the Client's instructions to the counter. The most important step in the Leader's handling of a write request is replicating the log to the other nodes; let's look more closely at that step and at the committedIndex just mentioned.

Figure 15-Leader handles write requests

committedIndex marks a position: all logs up to and including it have been replicated to more than half of the nodes in the cluster. In the figure, committedIndex initially points at position "3", meaning the logs for "0-3" have been copied to a majority of nodes (as we can also see on the Followers). The Leader then replicates logs "4" and "5" to the Followers in a batch, which lets committedIndex slide right to position "5", indicating that the logs for "0-5" have now been replicated to a majority.

Figure 16-Log replication

Another question then arises: how do we know which logs the StateMachine has executed? committedIndex tells us which logs have been successfully replicated to the other nodes of the cluster, but the StateMachine's current state reflects execution up to some particular log, and that is what applyIndex expresses. In the figure, applyIndex points at "3", meaning the instructions represented by logs "0-3" have all been executed by the StateMachine, whose state now reflects the execution of log "3". As committedIndex slides right, applyIndex follows it rightward as the state machine executes. Together, applyIndex and committedIndex can support linearizable reads; we already have an article analyzing that concept, linked at the end of this article.
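The sliding of the two indexes can be sketched numerically. In Raft-style implementations the Leader tracks, per node, the highest log index known to be replicated there; committedIndex is the highest index replicated on a majority, and applyIndex trails it as entries are executed. This is an illustrative sketch of that bookkeeping, not SOFAJRaft's internals:

```java
import java.util.Arrays;

public class IndexDemo {
    // matchIndex[i] = highest log index known replicated on node i
    // (the Leader's own entry included).
    public static long committedIndex(long[] matchIndex) {
        long[] sorted = matchIndex.clone();
        Arrays.sort(sorted);
        // Ascending order: the entry at position (n-1)/2 is the highest
        // index already present on a majority (n/2 + 1) of the n nodes.
        return sorted[(sorted.length - 1) / 2];
    }

    // applyIndex advances one executed entry at a time and never
    // passes committedIndex.
    public static long advanceApplyIndex(long applyIndex, long committedIndex) {
        return Math.max(applyIndex, Math.min(applyIndex + 1, committedIndex));
    }
}
```

For example, with match indexes {3, 5, 5} the committed index is 5, while {3, 4, 5} commits only up to 4, matching the "more than half" rule from the figures.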

Figure 17-ApplyIndex updates

That concludes the example analysis of SOFAJRaft | SOFAChannel#8; I hope it has added something to your knowledge or skills.
