How to solve the problem of annoying data inconsistency? -- data consistency through "consensus" 11/19 Update SLTechnology News&Howtos

How to solve the problem of annoying data inconsistency? -- data consistency through "consensus"

2025-11-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)06/02 Report--

This time I'm ready to start a new series to talk about concerns in distributed systems. The rhythm will not be too tight, so plan a biweekly shift.

This is the second article in this series. It is the follow-up to the previous article "I don't know if it is the most easy-to-understand"data consistency Analysis".

The previous article may be too popular, the pressure is not high, not very popular. This article will continue to keep it as easy to understand as possible, and firmly believe that it is of greater value to let more people understand it. However, relatively speaking, the professionalism of content has increased.

The problem of data consistency has been analyzed, so how to solve the problem of inconsistency caused by failure? This article will focus on the point of "consensus".

01 what is "consensus"? Why did it happen?

The consistency problem is actually a "result", which is essentially caused by data redundancy. If there is no redundancy, there will be no consistency problem.

The reason why the subsystems in the distributed system can cooperate with each other is that the same data is redundant as a "token". Otherwise, why would I cooperate with you if I don't know you? So this "token" has changed, you have to let me know, or I won't know you again. The process of achieving agreement on this "token" change is called "consensus". So:

The problem of consistency is the result, and consensus is the process, or a means, to achieve the result.

In distributed systems, the scenario of redundant data is not limited to this, because the larger the system, the more can not tolerate the butterfly effect when a subsystem goes wrong, so it tends to be highly available. Xiaoming No. 1 is down, and tens of thousands of Xiaoming X are sticking to their posts, ideally providing 24 hours a day service. The essence of high availability is that multiple copies are stored through the same data, and all of them can provide services. For example, each Xiaoming X account has a "× × finger white paper". Whoever asks for leave can provide the same services by other Xiaoming X accounts. However, if this "× × finger white paper" is changed, everyone will have to be notified. Because this is the whole and source of the service, the problem of data redundancy is even more prominent in highly available clusters.

In fact, if each node in the distributed system can guarantee instantaneous response and fault-free operation, it is easy to reach a consensus. Just like us, within a certain range, as long as a roar, through a stable air transmission, whether the person concerned received the message, and the response can be almost "instantaneous". But as mentioned in [previous article, ← I], such a system only stays in the imagination, there are often delays in response requests, network interruptions, node failures, and even malicious nodes deliberately trying to destroy the system. This leads to the classic "Byzantine General problem" [1].

02 question of Byzantine generals

We generally divide the "Byzantine General problem" into two situations:

■ Byzantine error. Represents an error resulting from a malicious response by falsifying information.

■ is not a Byzantine error. There is no error resulting from the response.

The crux of the problem is:

How to solve the consistent implementation results of a change in the distributed network is recognized by all parties involved, and this information is determined and irrefutable.

For example, how to make all Xiaoming X receive the "× × × finger white paper Ⅱ" instead of the others, and destroy the original one. This problem gives rise to many "consensus" algorithms, the Byzantine Fault Tolerance (BFT) algorithm for solving Byzantine errors and the Crash Fault Tolerance (CFT) algorithm for solving non-Byzantine errors. It can also be seen from these two names that the essential job is "fault tolerance". Some partners may not have such a strong sense of the importance of "fault tolerance" in their normal work, resulting in an BUG or abnormal data, but in the aerospace field, a small mistake may lead to the failure of the whole launch, which is very costly.

If you want to have an in-depth understanding of the "Byzantine General problem", you can consult the relevant materials by yourself. it will not be carried out here, and the paper is attached at the end.

Byzantine errors are not generally considered in our common software development, but it is a necessity for blockchain projects. However, "non-Byzantine errors" can be seen in mainstream distributed databases, such as Tidb's Paxos algorithm and CockroachDB's Raft algorithm. Although we all in the daily coding, the understanding of the underlying principles of the database is not necessary. But when it comes to high application-level availability, at least "non-Byzantine errors" are a hurdle that must be faced.

03 BFT algorithm

The BFT type algorithm has two more branches. "based on certainty" and "based on probability".

Let's start with "certainty-based", which means that once a consensus is reached on a result, it is irreversible, that is, consensus is the final result. Its masterpiece is the PBFT (Practical Byzantine Fault Tolerance) algorithm [2], which has become more famous since the introduction of the central bank endorsement (blockchain digital bill trading platform). The principle of the algorithm, as shown below:

▲ pictures come from the Internet, and the copyright belongs to the original author.

Take the army as an analogy, straight line C here can be regarded as "commander-in-chief", straight line 0 is "army commander," straight line 1, straight line 2, and straight line 3 are all "division commanders." it is worth noting that the third division commander has defected. The whole process is explained as follows:

■ "request": the commander-in-chief gave an order to the commander, "do it!" .

■ "pre-prepare": the army commander broadcasts the order to three more division commanders.

■ "prepare": after receiving and agreeing, each division commander will send "receive" to the army commander and the other two division commanders.

■ "commit": each division commander receives a "receipt" request from 2f division commanders (army commanders do not do prepare) and sends "work at any time" to army commanders and two other division commanders. (F is the tolerable number of Byzantine nodes)

■ "reply": after each division commander receives the 2f+1 message "open at any time", he can think that the command of the commander-in-chief has reached the state of "dry at any time" among the relevant division commanders, then he will fire directly!

If you really have an in-depth understanding of PBFT, there is still a lot of content, which will not be carried out here. Interested partners will consult the address of the final paper or follow the official account and then reply "consistency" to package and download directly in the background.

When we talk about "probability-based", the consensus result of this kind of algorithm is temporary, and with the passage of time or some kind of reinforcement, the probability of the consensus result being overturned becomes less and less, which becomes the de facto final result. Its masterpiece is the PoW (Proof of Work) algorithm, based on which bitcoins, which used to be as high as $2W each, are implemented. The principle of the algorithm takes "Xiuxian" as a simple analogy (the algorithm in the actual bit is more complex than this):

■ works hard to practice by himself, and makes more than half of the immortals recognize your practice and allow you to become an immortal.

■, and then you become a fairy. And participate in judging whether others can become "immortals" in the future.

If ■ is to be achieved through bribery, as the number of people in the team increases and the cost of bribery increases, it can be assumed that the fewer people are going to bribe, the lower the probability of being misjudged, and the more credible it will eventually be.

The probability formula of being misjudged is 0.5 ^. If the number is 6, the probability of misjudgment is 1.5625%. If the number is 10, it is already 0.09765625%, which is exponentially lower.

It is worth noting that the criteria for non-cooperative nodes are different between "certainty-based" and "probability-based". The former can tolerate at most 1ax 3, while the latter is less than 1max 2.

04 CFT algorithm

As mentioned above, CFT-like algorithms solve the consensus problem in distributed systems where there are failures, but there are no malicious nodes (that is, messages may be lost or duplicated, but no error messages). Leslie Lamport, the author of the Byzantine General problem, also raised the "Paxos problem" in his other paper [3], similar to this. In the paper, this question is compared with a story, as follows:

The "law enforcers" on the Greek island of Paxon voted to pass the "law" in the "parliament hall" and exchanged information through the "waiter". Each "law enforcer" will record the passed "law" in his own "account". The problem is that both "law enforcers" and "waiters" are unreliable. They may leave the "parliament hall" at any time because of various things, and new "law enforcers" may enter the "parliament hall" at any time to vote on the law.

What method can be used to make the voting process proceed normally, and there is no contradiction in the "law" passed.

Baidu encyclopedia

The key object here is in our system, which can be compared to:

■ Parliament Hall = distributed system

■ enforcers = a procedure

■ server = RPC channel

■ accounts = database

■ law = one change operation

Leslie Lamport also proposed an algorithm to solve this problem, the "Paxos" algorithm [4]. The key to this algorithm is reflected by the following three definitions:

■ has a unique serial number for each "change" and can identify the new from the old by it.

■ Enforcement can only accept changes that are newer than known "changes"

Any two "changes" of ■ must involve the same "law enforcers"

These three points are only the most critical part of ensuring consistency, and there is a lot more to it. Interested partners check the address of the paper at the end of the article or follow the official account to reply "consistency" package and download directly in the background.

The "Paxos" algorithm is a Leaderless algorithm, and its implementation is complex, so there are many variants to simplify it, the most famous of which is "Raft", which came out in 2013. The "Raft" algorithm is a Leadership algorithm. Consensus is ensured by the following two processes:

There will only be one living leader in ■, and the leader is responsible for synchronizing the data of his followers.

■ if the leader is "lost", then every follower can become a candidate, and finally compare whose term is the latest, who is the new leader. This term is a self-increment maintained internally by each node.

Although followers vote on a first-come-first-served basis, there will still be multiple candidates with the same term who get the same number of votes (referred to as "split voting"), so a new round of voting will be held until the outcome is decided. Because Raft uses random timers to increase term, and the network is unstable, the probability of encountering the same number of votes again is greatly reduced.

The complete process is more complex, there is a Raft algorithm animation recommended to you, interested can learn about: http://thesecretlivesofdata.com/raft/.

Beside the question, the "ZAB" (ZooKeeper Atomic Broadcast) algorithm that we often use in Zookeeper is also a CFT algorithm, which is based on Fast Paxos algorithm.

05 conclusion

In retrospect, we found that for more rigorous consistency, we needed to increase the number of mutual communication confirmations, but this led to poor performance, just like PBFT and Paxos. But distributed systems are like this. Balance is needed everywhere, and the most important thing is to find the most suitable one.

After talking about the issue of "consensus" at the data level, let's talk about "distributed transactions" next time, focusing on the common CAP and BASE theories.

Finally, if you want to be a data consistency expert, ask if there are any shortcuts. The shortcut is to read Don Leslie Lamport's paper, his home page: http://www.lamport.org/.

▶ Wechat replies with the keyword "consistency" at the background, which can be packaged and downloaded.

[1] The Byzantine Generals Problem, ACM Transactions on Programming Languages and Systems, Leslie Lamport,1982.

Link: https://www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Byzantine-Generals-Problem.pdf

[2] Practical Byzantine Fault Tolerance, Miguel Castro&Barbara Liskov,1999.

Link: http://101.96.10.63/pmg.csail.mit.edu/papers/osdi99.pdf

[3] "The Part-Time Parliament", Leslie Lamport,1998.

Link: https://www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Part-Time-Parliament.pdf

[4] "In Search of an Understandable Consensus Algorithm", Diego Ongaro&John Ousterhout,2013

Link: https://raft.github.io/raft.pdf

Author: Zachary (personal WeChat account: Zachary-ZF)

Wechat official account (launch): cross-border architect.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.