What is the implementation principle of SOFAJRaft?

2025-07-01 Update From: SLTechnology News&Howtos

SOFAStack (Scalable Open Financial Architecture Stack) is a finance-grade distributed architecture developed in-house by Ant Financial Services. It contains the components needed to build a finance-grade cloud-native architecture and reflects practices refined in financial scenarios.

SOFAJRaft is a production-grade, high-performance Java implementation of the Raft consensus algorithm. It supports MULTI-RAFT-GROUP and suits high-load, low-latency scenarios.

This article is Chapter 6 of the "Analysis | SOFAJRaft Implementation Principles" series, written by Xu Jiafeng of Zhiwei Information and Li Kun of Ant Financial Services. The series is produced by the SOFA team together with source code enthusiasts, and readers who share that enthusiasm are welcome to join.

SOFAJRaft: https://github.com/sofastack/sofa-jraft

This article set out to introduce the pipeline mechanism that SOFAJRaft uses in log replication. On reflection, that is an abrupt place to start, because we should not assume that readers already understand log replication. So we begin with the problem that log replication in SOFAJRaft is meant to solve.

Concept Introduction

SOFAJRaft is a Java implementation of the Raft consensus algorithm. Because it is a consensus algorithm, the content on which consensus must be reached inevitably has to be transferred between multiple server nodes. In SOFAJRaft, this content is encapsulated into log entries, and the transfer of log entries between server nodes has its own term: log replication.

For ease of reading, let's use a story about Chinese chess to simulate the process and problems of log replication.

Suppose we travel back to ancient times and have to design a live broadcast for an upcoming chess match. Of course, no electronic communication technology is available. Fortunately, a chess game can be described in a few terse phrases, such as "cannon two across to five", "horse eight advances to seven", or "chariot two retreats three". We call this written description the game record. As long as we set up an identical board outside the venue (perhaps a very large one, for the convenience of onlookers), we can replay the players' game for the audience by following the game record.

Figure 1 - Live broadcast via the game record

Our broadcast scheme is therefore as follows. The two players play as usual inside the venue. A dedicated recorder writes down every move they make, and a flag boy runs back and forth between the inside and the outside of the venue. After each move, the flag boy carries it outside in the form of the game record, so that the audience can follow the game in near real time and get much the same experience as watching it live.

Figure 2 - A simple live broadcast scenario

This is a human-powered version of SOFAJRaft's log replication. Next, let's refine this "live broadcast system" so that it gradually lines up with real log replication.

Improvement 1. Increase the number of recorders

Suppose our match attracts a lot of attention, so we need to set up more broadcast venues outside the arena for more viewers.

We then need more flag boys to carry the game record: each broadcast venue needs its own flag boy, running back and forth between the arena and that venue. Some venues are far from the arena, so their flag boys run for a long time and the broadcast lags more. Other venues are close by, and their flag boys can bring the board up to date quickly.

As the number of broadcast venues grows, the pressure on the recorder grows too, because he has to hand a different part of the game record to each flag boy, some of whom are slow and some fast. If the recorder gets confused or misreads something, the result is a serious broadcast accident: the audience no longer sees the game the players are actually playing.

Figure 4 - Stressed recorder

So we optimize: each broadcast venue gets its own dedicated recorder, so that "game - recorder - flag boy - broadcast venue" forms a single line, with dedicated staff working efficiently and reliably.

Figure 5 - "Game - Recorder - Flag Boy - Broadcast Venue"

Improvement 2. Increase the amount of information a flag boy carries each time

At first, we asked the flag boy to carry the game record out once for every move. As the game went on, the drawbacks showed. On the one hand, the recorder piled up a backlog of moves that had not yet been sent, to the point of having to ask the players to stop and wait (unthinkable). On the other hand, the audience outside was unhappy with a broadcast that kept stuttering.

So we made an improvement: the flag boy memorizes several moves at a time. The recorder no longer accumulates a large backlog, and the audience sees several moves at once. This is no trouble for a clever flag boy, so everyone wins.

Figure 6 - Flag boy carrying information in batches

Improvement 3. Add snapshot mode

The match grew more and more exciting, and at the strong request of the fans we added several broadcast venues at short notice. By then the players had already made many moves. With our usual method, the recorder and flag boy in charge of a new venue would have to reproduce every past move on its board (a replay) while the players kept producing new ones.

Intuitively, that is unwise, so we use a snapshot mode instead. Rather than having the flag boy relay every past move, we draw a diagram of the current position. The flag boy takes the diagram out and places the pieces directly according to it. The new venue quickly catches up with the game, and its audience can follow along in step.

Figure 7 - Using snapshot mode

Improvement 4. Each live broadcast platform uses multiple flag boys to deliver information

Although Improvement 2 increased the amount of information the flag boy carries on each trip, in some cases (the players play quickly, the venue is far away, and so on) the recorder still cannot get the information outside in time. So we add more flag boys, who carry the information outside one after another in order, and the recorder can synchronize the venue more quickly.

Figure 8 - Pipeline effect of using multiple flag boys to convey information

After these gradual improvements, our human-powered broadcast system has the main characteristics of SOFAJRaft log replication:

Feature 1: Replicated logs are ordered and continuous

If the moves are applied in a different order, the resulting position may be completely different. Likewise, when SOFAJRaft replicates logs, they must be transmitted in strict order: no log may arrive out of order, and there must be no holes (that is, no log may be skipped).

Figure 9 - Logs remain strictly ordered and continuous
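The ordering rule can be sketched in a few lines of Java. This is a minimal illustration, not SOFAJRaft's storage code: the class ContiguousLog and its methods are hypothetical names, and it assumes log indexes start at 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical follower-side log that accepts only the next consecutive index. */
final class ContiguousLog {
    private final List<String> entries = new ArrayList<>();

    /** Index of the last stored entry; 0 means the log is empty (indexes start at 1). */
    long lastIndex() {
        return entries.size();
    }

    /**
     * Appends an entry only if its index is exactly lastIndex + 1.
     * A duplicate, an out-of-order entry, or an entry that would leave a hole is rejected.
     */
    boolean append(long index, String data) {
        if (index != lastIndex() + 1) {
            return false;
        }
        entries.add(data);
        return true;
    }

    /** Returns the entry stored at the given 1-based index. */
    String get(long index) {
        return entries.get((int) (index - 1));
    }
}
```

With this rule, appending index 3 to a log whose last index is 1 is refused until index 2 has arrived, which is the "no holes" property described above.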

Feature 2: Log replication is concurrent

In SOFAJRaft, the Leader node replicates logs to multiple Follower nodes at the same time. For each Follower, the Leader allocates a Replicator dedicated to that Follower's replication task. In the chess story, we likewise assigned a recorder to each broadcast venue to keep its board in sync with the game.

Figure 10 - Concurrent log replication
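As a rough illustration of "one dedicated worker per Follower", the sketch below runs one task per follower on its own thread and waits for all of them. It is a toy model under stated assumptions: ConcurrentReplicationSketch is a hypothetical class, follower logs are plain in-memory lists, and no networking or failure handling is modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: the leader copies its log to every follower concurrently. */
final class ConcurrentReplicationSketch {

    /** Returns each follower's copy of the log, keyed by follower id. */
    static Map<String, List<String>> replicate(List<String> leaderLog, List<String> followerIds) {
        Map<String, List<String>> followerLogs = new ConcurrentHashMap<>();
        // One thread per follower, standing in for one Replicator per Follower.
        ExecutorService replicators = Executors.newFixedThreadPool(Math.max(1, followerIds.size()));
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (String followerId : followerIds) {
                tasks.add(replicators.submit(() -> {
                    // Each task serves exactly one follower and preserves log order.
                    followerLogs.put(followerId, new ArrayList<>(leaderLog));
                }));
            }
            for (Future<?> task : tasks) {
                task.get(); // wait until this follower has caught up
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            replicators.shutdown();
        }
        return followerLogs;
    }
}
```

Because every follower has its own task, a slow follower does not hold back a fast one, which mirrors the reason for giving each broadcast venue its own recorder.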

Feature 3: Logs are replicated in batches

The Leader node in SOFAJRaft replicates logs to Followers in batches, just as the flag boy carries several moves out of the venue at a time.

Figure 11 - Logs replicated in batches
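Batching can be pictured as cutting the pending entries into consecutive groups, each of which would travel in one request. The following is a minimal sketch: BatchSketch and the maxEntriesPerBatch parameter are hypothetical names chosen for illustration, not SOFAJRaft options.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split pending log entries into ordered batches. */
final class BatchSketch {

    /** Cuts the pending entries into consecutive batches of at most maxEntriesPerBatch. */
    static List<List<String>> toBatches(List<String> pending, int maxEntriesPerBatch) {
        if (maxEntriesPerBatch <= 0) {
            throw new IllegalArgumentException("maxEntriesPerBatch must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < pending.size(); from += maxEntriesPerBatch) {
            int to = Math.min(from + maxEntriesPerBatch, pending.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(pending.subList(from, to)));
        }
        return batches;
    }
}
```

Five pending entries with a batch size of two yield three batches of sizes 2, 2, and 1, in the original order.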

Feature 4: Snapshots in log replication

In Improvement 3, we let a newly added broadcast venue copy the current position directly instead of replaying every past move. This is the Snapshot mechanism in SOFAJRaft. A Snapshot lets a Follower catch up with the Leader's log progress quickly without replaying logs from long ago, which reduces the amount of data sent over the network and improves the efficiency of log synchronization.
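One way to picture the choice between replaying logs and sending a snapshot is the decision function below. It is a sketch built on an assumption the text above does not spell out: that the Leader has discarded old log entries already covered by a snapshot, so a Follower that is too far behind can no longer be served from the log. CatchUpPlanner and its parameter names are hypothetical.

```java
/** Hypothetical sketch: decide how the leader should bring a follower up to date. */
final class CatchUpPlanner {

    enum Action { NOTHING_TO_SEND, SEND_LOG_ENTRIES, INSTALL_SNAPSHOT }

    /**
     * @param followerNextIndex   the next log index the follower needs
     * @param leaderFirstLogIndex the oldest index still kept in the leader's log (assumed compacted before it)
     * @param leaderLastLogIndex  the newest index in the leader's log
     */
    static Action plan(long followerNextIndex, long leaderFirstLogIndex, long leaderLastLogIndex) {
        if (followerNextIndex < leaderFirstLogIndex) {
            // The entries the follower needs are gone from the log: send the snapshot instead.
            return Action.INSTALL_SNAPSHOT;
        }
        if (followerNextIndex > leaderLastLogIndex) {
            return Action.NOTHING_TO_SEND;
        }
        return Action.SEND_LOG_ENTRIES;
    }
}
```

A newly added Follower that needs index 5 while the Leader's log starts at index 100 would receive a snapshot, just as the new broadcast venue receives a diagram of the current position.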

Feature 5: Pipeline mechanism for replication logs

In Improvement 4, we let several flag boys take part in carrying information, so that information flows from the recorder to the broadcast venue as a stream while still arriving in order and without gaps.

SOFAJRaft has a similar mechanism for keeping log replication streaming: the pipeline. With the pipeline, the Leader and Follower no longer need to follow a strict "Request - Response - Request" interaction. The Leader can keep sending AppendEntries Requests to a Follower without waiting for the Response to the previous one.

In the implementation, the Leader only needs to maintain a queue per Follower that records the logs already sent. If a log fails to replicate, the Leader resends that log and all subsequent logs to the Follower. This ensures that log replication is reliable; we discuss the details in the source code analysis.

Figure 12 - Pipeline mechanism for log replication
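The difference from a strict request-response exchange can be sketched as a sender that keeps several requests outstanding at once. This is an illustrative model only: PipelinedSender is a hypothetical class, and the maxInflight cap is an assumption added to keep the window bounded.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: send requests back to back without waiting for each response. */
final class PipelinedSender {
    private final int maxInflight;                           // assumed cap on unacknowledged requests
    private final Deque<Long> unacked = new ArrayDeque<>();  // sequence numbers awaiting a response
    private long nextSeq = 0;

    PipelinedSender(int maxInflight) {
        this.maxInflight = maxInflight;
    }

    /** Sends the next request if the window allows; returns its sequence number, or -1 if the window is full. */
    long trySend() {
        if (unacked.size() >= maxInflight) {
            return -1;
        }
        long seq = nextSeq++;
        unacked.addLast(seq);
        return seq;
    }

    /** Accepts a response only for the oldest outstanding request, so acknowledgements stay in order. */
    boolean onResponse(long seq) {
        Long oldest = unacked.peekFirst();
        if (oldest == null || oldest != seq) {
            return false;
        }
        unacked.removeFirst();
        return true;
    }

    /** Number of requests sent but not yet acknowledged. */
    int inflightCount() {
        return unacked.size();
    }
}
```

With maxInflight set to 3, three requests go out before any response arrives; a fourth is held back until the oldest one is acknowledged.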

Source code analysis

The above introduced log replication at the level of principle. In the code, Replicator and NodeImpl mainly implement the Leader-side and Follower-side logic respectively. The main methods are listed below. Three points in the source code deserve particular attention.

Figure 13 - Related methods

Focus 1: The Probe state of Replicator

Figure 14 - States of Replicator

After the Leader establishes a connection to a Follower through a Replicator, it first sends a Probe-type request to find out how far the Follower's log already extends, so that it knows which logs to send next.

Figure 15 - Sending a probe to learn the Follower's log index
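The probing step can be sketched as a small state machine. This is a simplified model, not the Replicator source: ProbeSketch, its two states, and the back-off rule are assumptions for illustration, and terms and log matching details are left out.

```java
/** Hypothetical sketch: the leader probes until it knows where to start sending. */
final class ProbeSketch {

    enum State { PROBE, REPLICATE }

    private State state = State.PROBE;
    private long nextIndex; // next log index the leader intends to send

    /** Starts optimistically, assuming the follower already has everything the leader has. */
    ProbeSketch(long leaderLastLogIndex) {
        this.nextIndex = leaderLastLogIndex + 1;
    }

    /** The index just before nextIndex, which a probe would ask the follower to confirm. */
    long probePrevLogIndex() {
        return nextIndex - 1;
    }

    /**
     * @param matched              whether the follower confirmed the probed position
     * @param followerLastLogIndex the last log index the follower reports having
     */
    void onProbeResponse(boolean matched, long followerLastLogIndex) {
        if (matched) {
            state = State.REPLICATE; // position known: start sending real entries
            return;
        }
        if (followerLastLogIndex + 1 < nextIndex) {
            nextIndex = followerLastLogIndex + 1; // jump back to just after the follower's log
        } else if (nextIndex > 1) {
            nextIndex--; // otherwise step back one entry and probe again
        }
    }

    State state() {
        return state;
    }

    long nextIndex() {
        return nextIndex;
    }
}
```

If the Leader's last index is 10 and the Follower reports 6, the next probe targets index 6 and, once confirmed, replication starts from index 7.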

Focus 2: Using Inflight to aid pipeline implementation

Inflight is an abstraction over a batch of logEntries. It indicates which logEntries have already been packaged into a log replication request and sent.

Figure 16 - Inflight structure

The Leader maintains a queue and, for every batch of logEntries it sends, adds an Inflight representing that batch to the queue. When it learns that a batch has failed to replicate, it can use the Inflights in the queue to resend that batch and all subsequent logs to the Follower. This ensures both that log replication eventually completes and that the order of the replicated logs is preserved.

The logic here is clear, but there is a lot to consider at the code level, so the source code is shown here for readers to explore further.

Figure 17 - The main method for replicating logs

Figure 18 - Adding an Inflight to the queue
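The queue logic described above can be reduced to a small model. It is a sketch under stated assumptions, not a copy of the Replicator code: InflightQueueSketch and its fields are hypothetical, and a failure is handled simply by discarding the failed batch and everything sent after it, then rewinding the send position.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of an Inflight queue kept by the leader for one follower. */
final class InflightQueueSketch {

    /** One batch of log entries already sent: indexes [startIndex, startIndex + count). */
    static final class Inflight {
        final long startIndex;
        final int count;

        Inflight(long startIndex, int count) {
            this.startIndex = startIndex;
            this.count = count;
        }
    }

    private final Deque<Inflight> inflights = new ArrayDeque<>();
    private long nextSendingIndex;

    InflightQueueSketch(long firstIndexToSend) {
        this.nextSendingIndex = firstIndexToSend;
    }

    /** Records a sent batch of count entries and advances the next index to send. */
    Inflight send(int count) {
        Inflight inflight = new Inflight(nextSendingIndex, count);
        inflights.addLast(inflight);
        nextSendingIndex += count;
        return inflight;
    }

    /** The oldest batch was acknowledged, so it leaves the queue. */
    void onSuccess() {
        inflights.pollFirst();
    }

    /** A batch failed: forget it and every later batch, and rewind so they are resent in order. */
    void onFailure(long failedStartIndex) {
        while (!inflights.isEmpty() && inflights.peekLast().startIndex >= failedStartIndex) {
            inflights.pollLast();
        }
        nextSendingIndex = failedStartIndex;
    }

    long nextSendingIndex() {
        return nextSendingIndex;
    }

    int size() {
        return inflights.size();
    }
}
```

After batches starting at indexes 1, 4, and 6 have been sent, a failure of the batch at index 4 leaves only the first batch in the queue and makes index 4 the next one to send, so order is preserved on retry.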

Of course, real log replication has to handle more complicated situations, such as what happens to a Follower when the Leader changes. We hope you will go into the source code to find the answers.

Focus 3: The communication layer uses a single thread and a single connection

In the pipeline mechanism, the Inflight queue ensures at the SOFAJRaft level that logs are replicated in order, and LogEntries that arrive out of order are rejected through various exception paths. But rejected out-of-order logs must eventually be retransmitted for replication to succeed, and that hurts the efficiency of log replication.

Figure 19 - The communication layer does not guarantee order

As the figure above shows, both the Connection Pool at the sending end and the Thread Pool at the receiving end turn the "one-way street" on which logs travel in order into a "multi-lane road", so order can no longer be guaranteed. SOFAJRaft therefore makes two optimizations at the communication layer to keep LogEntries in order during transmission as far as possible.

On the Replicator side, the URL used for log transfer is uniquely identified by a uniqueKey, so that SOFABolt (the communication framework underlying SOFAJRaft) establishes a single connection for this URL. In other words, only one connection is available in the Connection Pool on the sending side.

Figure 20 - Customizing the URL with uniqueKey

On the receiving side, tasks are not dispatched through a thread pool. A check on dispatch_msg_list_in_default_executor was added so that tasks can be delivered to the Processor directly on the IO thread. This required some enhancements to SOFABolt; see PR #84 if you are interested.

Figure 21 - SOFABolt uses the IO thread to dispatch AppendEntries Requests to the Processor

The communication model for log replication thus becomes the desired "one-way street". This "one-way street" largely ensures that logs are transmitted in order and without gaps, which improves the efficiency of the pipeline.

Figure 22 - Optimized communication model
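The effect of handling requests on a single thread can be shown with the plain JDK. This sketch does not use SOFABolt at all: a single-thread executor merely stands in for the IO thread, and OrderedDispatchSketch is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a single worker thread processes requests in submission order. */
final class OrderedDispatchSketch {

    /** Submits requestCount numbered requests and returns the order in which they were processed. */
    static List<Integer> dispatchInOrder(int requestCount) {
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        // One thread stands in for the IO thread; tasks run sequentially in the order submitted.
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < requestCount; i++) {
                final int seq = i;
                ioThread.execute(() -> processed.add(seq));
            }
        } finally {
            ioThread.shutdown(); // already submitted tasks still run to completion
        }
        try {
            if (!ioThread.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for dispatch");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }
}
```

With a multi-thread pool in place of the single-thread executor, the processed order would no longer be guaranteed, which is the "multi-lane road" problem described above.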

Summary

Log replication is not a complex concept, and the pipeline mechanism is an intuitive optimization; we can even find these ideas at work in daily life. In SOFAJRaft, the real challenge of log replication is to sustain high performance in a distributed environment while handling all the details and exceptions. This article is only a conceptual introduction to log replication; the remaining details are left for readers to discover in the code.
