Quick introduction to Block chain (3)-- CFT (non-Byzantine Fault tolerance) consensus algorithm 04/16 Update SLTechnology News&Howtos

Quick introduction to Block chain (3)-- CFT (non-Byzantine Fault tolerance) consensus algorithm

2025-04-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/03 Report--

Quick introduction to blockchain (3)-- CFT (non-Byzantine fault tolerant) consensus algorithm 1. Introduction to CFT

CFT (Crash Fault Tolerance), that is, fault tolerance, is a fault tolerance technique for non-Byzantine problems.

Paxos problem refers to the consensus problem in the distributed system where there is a crash fault, but there is no malicious (corrupt) node (that is, the message may be lost or repeated, but there is no error message). It is the most common problem in the distributed consensus field. It was first named by Leslie Lamport to describe it with the story model of Paxon Island. The main algorithms to solve the Paxos problem are Paxos series algorithm and Raft algorithm. Paxos algorithm and Raft algorithm belong to strong consistency algorithm.

2. Paxos algorithm 1. Background of Paxos algorithm

In common distributed systems, situations such as machine downtime or network anomalies (including message delay, loss, repetition, disorder, network partition) always occur. The problem that Paxos algorithm needs to solve is how to quickly and correctly agree on the value of a certain data within the cluster in a distributed system where exceptions may occur, and ensure that no matter what exception occurs, it will not destroy the consistency of the whole system.

Paxos algorithm is used to solve the consistency problem of distributed systems.

2. Brief introduction of Paxos algorithm

In 1990, Leslie Lamport proposed the Paxos consensus algorithm in his paper "The Part-time Parliament", which realized a mechanism to maximize the consistency of distributed systems (there is a very small probability of not achieving consistency) from an engineering point of view. As an early researcher in the field of distributed systems, Leslie Lamport won the Turing Award in 2013 for related achievements.

In his thesis, Leslie Lamport expressed the Paxos problem as follows:

Law enforcers on the Greek island of Paxon vote to pass the law (a Paxos process) in the hall of parliament and exchange information through proposer, each of which records the passed law in his or her own account. The problem is that law enforcers and waiters are unreliable, they may leave the parliament hall at any time because of various things (server extension machine or network disconnection), and new law enforcers may enter the parliament hall at any time to vote on the law (newly joined machine). In what way can the voting process proceed normally, and the laws passed do not contradict (agree on a value).

There is no Byzantine general problem (message will not be tampered with) and two general problem (reliable channel) in the Paxos process.

Paxos is the first consensus algorithm that has been proved and widely used. Its principle is similar to the two-stage commit algorithm, which is generalized and extended, and gradually eliminates the uncertain state in the system through message passing.

As the basis of many subsequent consensus algorithms (such as Raft, ZAB, etc.), the basic idea of Paxos algorithm is not complex, but it is difficult to understand in the initial paper, even after several twists and turns. In 2001, Leslie Lamport published a special paper "Paxos Made Simple" for reinterpretation, which described the Paxos algorithm as follows:

Phase1

(a) A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors.

(B) If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered pro-posal (if any) that it has accepted.

Phase 2

(a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.

(B) If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.

At present, Paxos algorithm has been applied in Google's Chubby, MegaStore, Spanner and other systems. ZooKeeper in Hadoop also uses Paxos algorithm, but the algorithm used is an improvement of the original Paxos algorithm. Usually based on the Paxos algorithm, an improved Paxos algorithm can be obtained by dealing with the specific details of the actual application scene in the implementation process.

3. The principle of Paxos algorithm

The basic idea of the Paxos algorithm is similar to two-stage submission: multiple sponsors must first fight for the right of the proposal (supported by the majority of recipients); the successful sponsors send the proposal to all for confirmation, and the proposal confirmed by the majority of people becomes the approved closure of the case.

The Paxos protocol has three roles: Proposer (proponent), Acceptor (decision maker), and Learner (decision learner).

Paxos is a two-stage communication protocol. The basic flow of Paxos algorithm is as follows:

Phase I Prepare:

A, Proposer generates a globally unique proposal number N, and then sends a Prepare request numbered N to all Acceptor.

B. If an Acceptor receives a Prepare request numbered N and N is greater than the number of all Prepare requests that this Acceptor has responded to, then this Acceptor will feedback to Proposer the highest numbered proposal (if any) it has accepted, and this Acceptor undertakes not to accept any proposal with a number less than N.

Second stage Accept

A. If Proposer receives a response from more than half of Acceptor to its Prepare request numbered N, then Proposer will send an Accept request for [NMagi V] proposal to more than half of Acceptor. V is the value of the most numbered proposal in the response received, and if the response does not contain any proposals, then V is up to Proposer to decide.

B. If Acceptor receives an Accept request for a proposal numbered N, Acceptor accepts the proposal as long as the Acceptor does not respond to a Prepare request with a number greater than N.

Paxos does not guarantee that the system will always be in a consistent state. However, because more than half of the nodes participate in each consensus, the whole system will get the consensus result eventually. If the sponsor fails in the process of the proposal, it can be alleviated through the timeout mechanism.

Paxos can ensure that the system can always reach a consensus with a high probability when more than half of the nodes are working normally.

4. The algorithm for generating proposal ID

For the choice of proposal ID, what is mentioned in "Paxos made simple" is to let all Proposer choose from data sets that never intersect.

In Google's Chubby paper, the algorithm for generating proposal ID is as follows: suppose there are n Proposer, each numbered as follows

Ir (0 of 0P1, so P1 recalculates the number: new P1 = 1 "3" 1 = 4

2) P3 is submitted as number 2 and is found to be less than 4 of P1, so P3 is renumbered: new P3 = 1 "3" 2 = 5.

5. Livelock of Paxos algorithm

If the two sponsors happen to submit updated proposals in turn, it will lead to live locks, and the system will never be able to reach a consensus (the actual probability of occurrence is very small). The livelock did not block, but no agreement could be reached.

There are three solutions for live locks:

A. add a random waiting time before launching the PrepareRequest request again in the first phase.

B. Set a timeout. When the timeout is reached, no PrepareRequest requests will be received.

C, elect a leader in Proposer, send out PrepareRequest and AcceptRequest uniformly through leader.

6. Exception handling of Paxos algorithm

During the execution of Paxos algorithm, there will be a lot of anomalies: proposal downtime, Acceptor downtime after receiving Proposal, Proposer downtime after receiving messages, Acceptor downtime after Accept, Learner downtime, storage failure and so on.

In order to ensure the correctness of Paxos algorithm, Proposer, Aceptor and Learner all need to achieve persistent storage, so that they can still correctly participate in Paxos processing after Server recovery.

Proposer stores the maximum proposal number and resolution number (instance id) that have been submitted.

Acceptor stores the maximum number of committed (promise), the maximum number of accepted (Accept), and the Value, resolution number.

Learner stores decisions and numbers that have been learned.

7. Application of Paxos algorithm

In the Paxos algorithm, there are only two cases in which the service is unavailable: one is more than half of the Proposer exceptions, and the other is the occurrence of livelock. The former can reduce the probability of affecting the service due to Proposer anomalies by increasing the number of Proposer, while the probability of the latter itself is very low.

Paxos is the basis of distributed system consistency protocol, and other protocols (raft, zab, etc.) are improved versions of Paxos protocol. Paxos focuses on theory, so it is very difficult to implement Paxos.

PhxPaxos implementation of Wechat backend production-level Paxos class library:

Https://github.com/Tencent/paxosstore

Https://github.com/tencent-wechat/phxpaxos

The data set of the improved algorithm based on Paxos algorithm:

Https://github.com/dgryski/awesome-consensus

Third, the issue of the three armed forces. 1. a brief introduction to the issue of the three armed forces.

The description of the issue of the armed forces is as follows:

1) one Red Army camped in the valley, and three Blue Army troops were stationed on the surrounding hillside

2) the red army is stronger than any blue army; if one blue army fights alone, the red army wins; if two or more blue armies attack at the same time, the blue army wins

3) the three blue armies need to synchronize their attack time; but their only communication medium is to send messengers into the valley on foot, where they may be captured and thus lose information, or to avoid capture, may stay in the valley for a long time.

4) one staff officer in each army is responsible for proposing the attack time; each army also has one general to approve the attack time proposed by the staff officer; obviously, the attack time proposed by one staff officer needs to be approved by at least two generals in order to be meaningful

5) question: is there an agreement that allows the Blues to synchronize their attack time?

The issue of the armed forces is in line with the Paxos scenario, and staff and generals need to follow some basic rules:

1) the staff officer initiates the proposal in the form of a two-stage submission (prepare/commit). A proposal number is required in the Prepare phase.

2) when there is a conflict in the Prepare phase, the general decides according to the size of the proposed number, and proposes that the staff officer with the large number win.

3) if the staff officer receives the accepted attack time returned by the general in the Prepare phase, he must use the returned attack time in the Commit phase.

2. The two staff officers proposed the scene one after another.

Staff Officer 1 initiated a proposal to send messengers to send messages to three generals, the content is (No. 1)

B, 3 generals received a proposal from staff Officer 1, because no number had been saved before, so save (number 1) to avoid forgetting; at the same time, send the messenger back with the message (ok).

C. Staff Officer 1 received replies from at least 2 generals and again sent messengers to send messages to 3 generals, the content is (number 1, attack time 1)

D, 3 generals received staff 1 time, save (serial number 1, attack time 1) to avoid forgetting; at the same time, let the messengers return with the message (Accepted)

E, staff 1 received (Accepted) from at least 2 generals, confirming that the attack time has been accepted by everyone.

F, staff 2 launched a proposal to send messengers to three generals, the content is (No. 2)

G, three generals received the proposal of staff Officer 2, because (number 2) is larger than (number 1), so save (number 2) to avoid forgetting; and because they have previously accepted the proposal of staff Officer 1, they asked the messengers to bring a message back, the content is (number 1, attack time 1)

H, staff 2 received replies from at least two generals, and because the reply brought the accepted proposal of staff Officer 1, staff Officer 2 no longer proposed a new attack time and accepted the time proposed by staff Officer 1.

3. Cross-proposal scenario between two staff officers

1) staff Officer 1 initiated a proposal to send messengers to send messages to three generals, the content is (No. 1)

2) the situation of the three generals is as follows

A, General 1 and General 2 received the proposal of staff Officer 1, and General 1 and General 2 recorded it (number 1). If other staff officers put forward a smaller number, they would be rejected; at the same time, send the messenger back with the message (ok).

B. The messenger responsible for notifying General 3 was arrested, so General 3 did not receive a proposal from staff Officer 1

3) staff Officer 2 also launched a proposal at the same time, sending messengers to send messages to three generals, the content is (No. 2)

4) the situation of the three generals is as follows

A, General 2 and General 3 received a proposal from staff Officer 2, and General 2 and General 3 recorded it (number 2). If other staff officers put forward a smaller number, they would be rejected; at the same time, send the messenger back with the message (ok).

B. The messenger responsible for notifying General 1 was arrested, so General 1 did not receive a proposal from staff Officer 2

5) staff 1 received replies from at least 2 generals, and sent messengers again to send messages to the 2 generals who answered, the content is (number 1, attack time 1)

6) the situation of the two generals is as follows

General 1 received (number 1, attack time 1), which is the same as the number saved by himself, so save it (number 1, attack time 1); at the same time, let the messenger bring the message back, the content is (Accepted)

B, General 2 received (number 1, attack time 1), because (number 1) is less than the saved one (number 2), so send the messenger back with the message (Rejected, number 2).

7) staff 2 received replies from at least 2 generals and again sent messengers to send messages to the 2 generals who answered, the content is (number 2, attack time 2)

8) General 2 and General 3 received (number 2, attack time 2), which is the same as the number they saved, so save (number 2, attack time 2) and let the messengers bring the message back with the content (Accepted).

9) staff 2 received (Accepted) from at least 2 generals, confirming that the attack time has been accepted by the majority.

10) staff Officer 1 received only one General's (Accepted) content and one (Rejected, No. 2); staff Officer 1 renewed the proposal to send messengers to three generals with the following contents (No. 3)

11) the situation of the three generals is as follows

General 1 received the proposal of staff Officer 1, and because (number 3) was larger than that previously saved (number 1), keep it (number 3); because General 1 had accepted the previous proposal of staff Officer 1, let the messenger take the message back, the content is (number 1, attack time 1)

B. General 2 received the proposal of staff Officer 1, and because (number 3) was larger than that previously saved (number 2), it was saved (number 3). Since General 2 has accepted the proposal of staff Officer 2, let the messenger go back with a message. the content is (number 2, attack time 2)

C. The messenger responsible for notifying General 3 was arrested, so General 3 did not receive a proposal from staff Officer 1

12) staff 1 received responses from at least two generals, compared the numbers of the two responses, and selected the attack time corresponding to the large number as the latest proposal; staff 1 again sent messengers to send messages to the two generals who replied, the content is (number 3, attack time 2)

13) General 1 and General 2 received (number 3, attack time 2), which is the same as the number saved by themselves, so save it (number 3, attack time 2), and send the messenger back with the message (Accepted).

14) staff 1 received accepted from at least 2 generals, confirming that the time of attack had been accepted by the majority.

4. Multi-Paxos algorithm 1. Brief introduction of Multi-Paxos algorithm

Paxos is to agree on a value, and Multi-Paxos is a series of Paxos instance to agree on multiple values. There is a Leader in the Multi-Paxos protocol. Leader is the only Proposal in the system, all proposals in the lease lease cycle have the same ProposalId, can skip the prepare phase, the motion has only the accept process, a ProposalId can correspond to multiple Value, so it is called Multi-Paxos.

Multi-Paxos protocol is a simplified version of the classic Paxos protocol, which simplifies the original 2-Phase process to 1-Phase, thus speeding up the delivery speed. Multi-Paxos requires that there is a unique Leader in each Proposer, and the Leader uniquely submits the value to each Acceptor for voting. When there is only one Leader in the system for value submission, the Prepare process can be skipped, and the election of Leader can be completed by Paxos Lease.

2 、 PaxosLease

In the Paxos algorithm, if a leader can be elected, it will help to improve the success rate of voting. In addition, leader plays an important role in the election of multiple resolutions (used to obtain successive id of resolutions). Therefore, how to get a leader in some way is what PaxosLease explains.

The process of Master election is as follows: select one of the many Node as the Master, if the Master has been alive, there is no need to elect, if the master crash, then the other node to elect the next master. The guarantee of correct selection is that there can be at most one master at any one time.

Logically, Master is more like an invisible lock, and any node that gets the lock can become a master, so in essence, Master election is a distributed lock problem, but it is also risky to solve the election problem entirely by locking: if a Node gets the lock and then crash, the lock cannot be released, that is, deadlock. One feasible solution is to add a Lease to the lock. The Master that gets the lock can only access the locked resources within the validity period of the Lease. After the Lease timeout, other Node can continue to compete for locks, which fundamentally avoids deadlocks.

After getting the lock, Master can "renew" the lock from other node if it keeps alive, so as to avoid frequent election activities.

5. Raft algorithm 1. Brief introduction of Raft algorithm

The design of Paxos algorithm does not take into account some optimization mechanisms, and the paper does not give too many implementation details, so there are many algorithms and implementations with better performance, including Fast Paxos, Multi-Paxos and so on.

The Raft algorithm was proposed by Diego Ongaro and John Ousterhout of Stanford University in their paper "In Search of an Understandable Consensus Algorithm" in 2013. Facing the problem of reaching agreement on multiple decisions, Raft algorithm simplifies the design and implementation of Multi-Paxos, decomposes the considerations of leader election, log replication and security, and reduces the uncertain state space through constraints.

The Raft algorithm divides the consistency problem into three parts: leadership election (leader election), log replication (log replication) and security (safety).

2. Raft algorithm role

The Raft algorithm includes three roles: leader (Leader), candidate (Candidate) and follower (Follower). Before decision-making, a global leader is elected to simplify the subsequent decision-making process. The leader decides to submit the log, and the log can only be copied one-way from the leader to the follower.

Leader: there is only one server in the Leader state in the cluster, which is responsible for responding to requests from all clients.

Follower: all nodes are in Follower status at startup, responding to the log synchronization request of Leader and the request of Candidate.

The state that the Candidate:Follower state server needs to transition to before preparing to initiate a new Leader election is the intermediate state between Follower and Leader.

The initial states of all nodes are Follower roles

If the request from Leader is not received within the timeout period, it will be converted to Candidate for election.

When Candidate receives votes from most nodes, it is converted to Leader; discovery Leader or a request for a higher term is converted to Follower.

Leader is converted to Follower after receiving a request for a higher term of office

3. Term (term of office)

Term (term of office): each Leader has its own term of office, and a new round of election needs to be started when the term of office expires. In each term, there can be no leader, but there cannot be more than two leader.

The Raft algorithm divides the execution time of the whole system into a sequence of Term (term of office) with different time intervals, with an increasing number as the number of Term; each Term starts with Election, and several servers in the Candidate state compete to generate a new Leader during the term of office. If a candidate wins the election, he will serve as the leader for the rest of the term. In some cases, the votes will be divided, and there may be no leader elected, then another term will begin and the next election will begin immediately. The Raft algorithm ensures that there is at most one leader in a given term of office.

4. The flow of Raft algorithm

The typical process of Raft algorithm consists of two main stages:

(1) Leader election

When the whole system starts, all servers are in Follower state; if there is a Leader,Leader in the system, it will periodically send a heartbeat (AppendEntries RPC) to tell other servers that they are Leader;. If Follower does not receive any heartbeat information after a period of time, it can be considered that Leader does not exist and Leader election is needed.

Before the election, Follower increases its Term number and changes its state to Candidate, and then sends RequestVote RPC,Candidate status to other servers in the cluster until any of the following three events occur:

A. win the election: Candidate accepts the votes of most servers, becomes Leader, and then sends an AppendEntries RPC to other servers to tell other servers.

B, other servers get elected: while waiting, Candidate receives a heartbeat (AppendEntries RPC) from a server that calls itself Leader. If the Term number of RPC is greater than or equal to the Term number of Candidate itself, Candidate acknowledges Leader, and its own state becomes Follower; or refuses to recognize Leader, and the status is still Candidate.

C. After a period of time, if no new Leader is generated, the Term will increase step by step and re-launch the election (because it is possible that multiple Follower will be transferred to the Candidate state at the same time, resulting in diversion and no majority of votes will be obtained).

(2) Log replication

Log replication is mainly used to ensure the consistency of nodes, and the operation is to ensure consistency and high availability; when Leader is elected, it will be responsible for client requests, and all requests must be processed by Leader first. After receiving the client request, Leader appends it to the tail of Log, and then sends AppendEntries RPC to other servers in the cluster, causing other servers to copy the new request. When most servers finish copying, Leader applies the request to the internal state machine and returns the execution result to the client.

The items in each Log contain two contents: the action command itself and the Term number, and a global Log Index to indicate the sequential number of the Log project in the Log. When most servers store Log projects in Log, Log projects can be considered submittable, and Log projects before Log Index 7 in the figure above can be submitted.

(3) Security

Security is a security mechanism used to ensure that each node executes the same sequence. For example, when a Follower is not available when the current Leader submits a command, and the Follower may be elected as Leader later, the new Leader may overwrite the previously submitted Log with the new Log, which causes the node to execute a different sequence; security is the mechanism used to ensure that the elected Leader must contain a previously submitted Log.

In order to achieve security, the Raft algorithm adds two constraints:

A. require that only those servers whose Log contains all submitted operation commands have the right to be selected as Leader.

B. For the new Leader, it is considered a real submission only if it has submitted the operation command of the current Term.

5. The application of Raft algorithm.

Raft algorithm has been implemented in many languages, such as Go, C++, Python and other mainstream development languages.

Animation demonstration of the principle of Raft algorithm:

Http://thesecretlivesofdata.com/raft/

The resources of Raft consensus algorithm are as follows:

Https://raft.github.io/

The implementation of Raft algorithm in GE language:

Https://github.com/goraft/raft

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.