
Quick Introduction to Blockchain (2) -- Core Technologies of Distributed Systems


I. Consistency of Distributed Systems

1. Consistency of distributed systems

As Moore's Law runs up against physical limits, more and more systems have to rely on scalable distributed architectures to achieve massive processing capacity. When a single-point architecture evolves into a distributed one, the first problem to solve is data consistency: if the nodes of a distributed cluster cannot guarantee consistent processing results, the business systems built on top of it cannot work properly. A blockchain system is a typical distributed system, so consistency must be considered in its design.

When facing large-scale, complex task scenarios, a single-point service often cannot meet the requirements of scalability and fault tolerance, so multiple servers are combined into a cluster system, virtualized as a single more powerful and more stable "super server".

The larger the cluster, the greater its processing power, and the greater the complexity of managing it. Large clusters in operation today include Google's search system, which supports search over the content of the entire Internet with hundreds of thousands of servers.

Different nodes in a cluster may be in different states and receive different requests at any given moment, yet the cluster must present consistent responses to the outside world at all times.

In the field of distributed systems, consistency (Consistency) means that, for a given series of operations, multiple service nodes, under the protection of an agreed-upon protocol, reach a certain degree of agreement on the processing results.

Ideally (ignoring node failures), if every service node strictly follows the same processing protocol (i.e., implements the same state machine logic), then starting from the same initial state and given the same input sequence, every step of execution is guaranteed to produce the same result on every node. Discussions of consistency in traditional distributed systems therefore usually concern how to get most nodes in the system to process requests in the same effective order when external requests may arrive arbitrarily (e.g., different requests sent to different nodes); that is, how to impose a global ordering on requests.

Consistency concerns the state the system presents, not whether the result is correct; for example, all nodes returning a negative answer to a request is also a form of consistency.

2. Challenges of distributed systems

A typical distributed system faces the following challenges:

A. Nodes can only interact and synchronize through messages, and network communication is unreliable: any message may be delayed, reordered, or corrupted.

B. The time a node takes to process a request cannot be guaranteed, the processing result may be wrong, and the node itself may fail at any time.

C. Synchronous calls avoid conflicts and simplify the design, but they severely reduce the scalability of the system and may even degrade it into a single-point system.

3. Requirements for consistency of distributed systems

The process by which a distributed system reaches agreement should satisfy the following requirements:

A. Termination

A consistent result is reached within finite time.

B. Agreement

The final decisions of different nodes are identical. To ensure coordination, distributed systems usually impose a global, unique order on events that occur at different times and places, and this order must be recognized by all nodes.

C. Validity

The decided result must be a proposal put forward by some node.

4. Consistency under relaxed constraints

Achieving the ideal of absolutely strict consistency (Strict Consistency) is costly: unless the system never fails and communication between nodes takes zero time, the system cannot behave exactly like a single point. In practice, stronger consistency requirements usually mean weaker processing performance and worse scalability. Depending on actual needs, one can choose different levels of consistency, including strong consistency (Strong Consistency) and weak consistency (Weak Consistency).

Strong consistency comes in two main forms:

A. Sequential consistency

Sequential consistency (Sequential Consistency) was proposed by Leslie Lamport in his classic 1979 paper "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs". It is a strong constraint requiring that all processes see the same global execution order (total order), and that each process's own operations appear in that order just as they actually occurred (local order). For example, if a process executes A before B, then the global result must show A before B, not the reverse, and every other process must observe this same order. Sequential consistency constrains the partial order of instructions within each process, but it does not order operations across processes according to the physical time at which they occur.

B. Linearizability

Linearizability (Linearizability, also called linear or atomic consistency) was proposed by Maurice P. Herlihy and Jeannette M. Wing in their classic 1990 paper "Linearizability: A Correctness Condition for Concurrent Objects". On top of sequential consistency, it strengthens the ordering of operations across processes into a unique global order (the system behaves as if all operations executed one after another; every process sees the same sequence, and that sequence is consistent with the actual real-time order of occurrence), which amounts to a strong guarantee of atomicity. It is hard to implement: current approaches rely either on a global clock or on locks, or on fairly complex algorithms, and performance is usually not high.
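To make the distinction concrete, the tiny brute-force checker below (an illustrative sketch, not from the papers above) searches for any legal total order of a history that preserves each process's program order. The example history, where a read observes the old value even though it occurs after the write in real time, is sequentially consistent but would not be linearizable.

```python
from itertools import permutations

# One shared register x (initially 0). P1 writes x = 1; afterwards, in
# real time, P2 reads x and sees 0.
p1 = [("P1", "W", 1)]
p2 = [("P2", "R", 0)]

def legal(history):
    """A total order is legal if every read returns the latest write."""
    x = 0
    for _, kind, value in history:
        if kind == "W":
            x = value
        elif x != value:          # a read observing the wrong value
            return False
    return True

def sequentially_consistent(processes):
    ops = [op for proc in processes for op in proc]
    for candidate in permutations(ops):
        # keep each process's program order within the candidate total order
        in_order = all(
            [op for op in candidate if op[0] == proc[0][0]] == proc
            for proc in processes
        )
        if in_order and legal(candidate):
            return True
    return False

# True: placing R(x)=0 before W(x,1) is a legal total order, and sequential
# consistency imposes no real-time constraint across processes.
# Linearizability would reject this history, since the read actually
# happened after the write.
print(sequentially_consistent([p1, p2]))
```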

Achieving strong consistency often requires precise timing equipment. A high-precision quartz clock has a drift rate of about 10^-7, and the most accurate atomic clocks drift at about 10^-13. Google's distributed database Spanner adopts the TrueTime scheme, based on atomic clocks and GPS, which keeps the time deviation between different data centers within 10 ms. Cost aside, TrueTime is simple, crude, and effective.

Strongly consistent systems are often difficult to build, and many scenarios do not actually need consistency that strong. The consistency requirement can therefore be appropriately relaxed to reduce the difficulty of implementation. For example, under certain constraints one can settle for so-called eventual consistency (Eventual Consistency): there will always be some moment (though not immediately) at which the system reaches a consistent state.

For example, when shopping on an e-commerce site, an item can sit in the shopping cart, yet it may turn out to be sold out only at final checkout. In fact, most Web systems settle for eventual consistency in order to keep the service stable.

Compared with strong consistency, weak consistency relaxes consistency in some respect; eventual consistency is one example.

II. Distributed Consensus Algorithms

1. A brief introduction to distributed consensus

Consensus (Consensus) is often discussed together with consistency (Consistency), but strictly speaking the two do not mean exactly the same thing.

Consistency is the broader term and means different things in different contexts (transactional databases, distributed systems, and so on). In the distributed-system context, consistency refers to the state that multiple replicas present to the outside; sequential consistency and linearizability, for instance, describe the ability of multiple nodes to jointly maintain a data state. Consensus, by contrast, refers to the process by which the nodes of a distributed system agree on something, such as the order in which multiple transaction requests are to be executed. Reaching consensus therefore does not by itself guarantee consistency.

In practice, ensuring that a system satisfies a given degree of consistency usually requires a consensus algorithm.

A consensus algorithm solves the problem of getting most nodes in a distributed system to agree on a proposal (Proposal). In distributed systems, "proposal" is a very broad notion: the order in which events occur, the value bound to a key, and so on; any piece of information that can be agreed upon is a proposal. A distributed system usually models every node as the same deterministic state machine (the state machine replication problem, State-Machine Replication): starting from the same initial state and receiving the same instructions in the same order, all nodes are guaranteed to end in the same state. The key to consensus in a distributed system is therefore agreeing on the order of events, that is, ordering.
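The state-machine replication idea fits in a few lines of Python (a toy key-value machine; the class and log below are illustrative, not from any real system): once the nodes agree on a single ordering of commands, deterministic replay guarantees identical states.

```python
# Minimal sketch of state machine replication.
class KVStateMachine:
    def __init__(self):
        self.state = {}

    def apply(self, command):
        # Deterministic: the same commands in the same order yield the same state.
        key, value = command
        self.state[key] = value

# The log stands for the command sequence the nodes have agreed to order.
log = [("a", 1), ("b", 2), ("a", 3)]

replicas = [KVStateMachine() for _ in range(3)]
for replica in replicas:
    for command in log:            # same initial state + same order...
        replica.apply(command)

assert all(r.state == replicas[0].state for r in replicas)
print(replicas[0].state)           # {'a': 3, 'b': 2} on every replica
```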

2. Challenges of distributed consensus

To reach consensus, a distributed system must solve two basic problems:

A. How to put forward a proposal to be agreed upon, e.g., via token passing, random selection, weight comparison, or puzzle solving.

B. How to get multiple nodes to reach agreement on the proposal (accept or reject it), e.g., via voting or rule-based verification.

In practical distributed systems, communication between nodes is subject to delay (the physical limit of the speed of light, processing delays), and any component may fail (the larger the system, the higher the probability that something fails). For example, the network may be interrupted, nodes may crash, and nodes may even deliberately forge messages to sabotage the normal consensus process.

In general, a node that fails (Crash or Fail-stop, i.e., stops responding) but does not forge messages exhibits a "non-Byzantine fault" (Non-Byzantine Fault) or "crash fault" (Crash Fault); a node that responds maliciously with forged messages exhibits a "Byzantine fault" (Byzantine Fault) and is called a Byzantine node. In the Byzantine setting, consensus is harder to reach because saboteurs are present.

3. Common distributed consensus algorithms

Depending on whether the target scenario must tolerate Byzantine faults, consensus algorithms fall into two categories: CFT (Crash Fault Tolerance) and BFT (Byzantine Fault Tolerance).

For non-Byzantine faults, the classic consensus algorithms are Paxos (1990), Raft (2014), and their variants. CFT algorithms usually perform better and process requests faster, and they can tolerate fewer than half of the nodes being faulty.

Algorithms that tolerate Byzantine faults include the deterministic family represented by PBFT (Practical Byzantine Fault Tolerance, 1999) and the probabilistic family represented by PoW (proof of work, 1997). Once a deterministic algorithm reaches consensus, the result is irreversible: the consensus is final. The consensus of a probabilistic algorithm, by contrast, is initially tentative; as time passes or the result is reinforced, the probability of it being overturned shrinks until it becomes final in practice. Byzantine fault-tolerant algorithms usually perform worse and can tolerate no more than one third of the nodes being faulty.
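A small sketch of the fault-tolerance bounds just cited: CFT protocols such as Paxos/Raft need a majority of correct nodes (f < n/2), while PBFT-style BFT protocols need n >= 3f + 1 (f <= (n - 1)/3).

```python
# Maximum tolerable faulty nodes for a cluster of n nodes.
def max_crash_faults(n):        # CFT: majority of correct nodes required
    return (n - 1) // 2

def max_byzantine_faults(n):    # BFT: n >= 3f + 1
    return (n - 1) // 3

for n in (4, 7, 10):
    print(f"n={n}: CFT tolerates {max_crash_faults(n)}, "
          f"BFT tolerates {max_byzantine_faults(n)}")
```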

In addition, recently proposed improved algorithms such as XFT (Cross Fault Tolerance, 2015) can offer response speeds similar to CFT algorithms while providing BFT guarantees as long as a majority of nodes are working correctly.

The Algorand algorithm (2017) improves on PBFT by introducing verifiable random functions to solve the proposal-selection problem; in theory it can achieve better performance (1000+ TPS) while tolerating Byzantine faults.

In practice, a client usually needs to verify the consensus result for itself, typically by querying enough service nodes and comparing their answers to confirm the result is accurate.
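As a sketch of such client-side self-verification (query_fn below is a hypothetical stand-in for whatever RPC the client uses to read from one node), the client queries several nodes and accepts a value only when enough of them agree:

```python
from collections import Counter

def verified_read(query_fn, nodes, quorum):
    results = Counter(query_fn(node) for node in nodes)
    value, votes = results.most_common(1)[0]
    if votes >= quorum:
        return value
    raise RuntimeError(f"no quorum: {dict(results)}")

# Example: four nodes, one of which answers incorrectly.
answers = {"n1": "abc", "n2": "abc", "n3": "abc", "n4": "xyz"}
print(verified_read(answers.get, answers, quorum=3))   # 'abc'
```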

III. The FLP Impossibility Principle

1. A brief introduction to the FLP principle

The FLP impossibility principle: in a minimal asynchronous model with a reliable network but in which nodes may fail (even just one), there exists no deterministic consensus algorithm that is guaranteed to solve the consensus problem.

The FLP impossibility result was stated and proved by the three scientists Fischer, Lynch, and Paterson in their 1985 paper "Impossibility of Distributed Consensus with One Faulty Process".

The FLP impossibility principle tells designers not to waste time trying to devise a deterministic consensus algorithm that works for asynchronous distributed systems under every scenario.

2. Synchrony and asynchrony in distributed systems

Synchrony and asynchrony in distributed systems are defined as follows:

Synchrony means that the clock skew between nodes in the system is bounded, that message delivery must complete within a bounded time (otherwise it is considered to have failed), and that every node processes a message within a bounded time. In a synchronous system it is therefore easy to determine whether a message has been lost.

Asynchrony means that clock skew between nodes may be large, that message delivery may take arbitrarily long, and that each node may take arbitrarily long to process a message. It is therefore impossible to tell why the response to a message is delayed (node failure or transmission failure).

Distributed systems in real life are usually asynchronous systems.

3. The significance of the FLP principle

What the FLP impossibility principle actually shows is that, when nodes may fail, a purely asynchronous system cannot guarantee that consensus completes in finite time. Even under the non-Byzantine assumption, algorithms including Paxos and Raft have extreme cases in which consensus cannot be reached, although the probability of hitting them in engineering practice is very small.

The FLP principle does not make research on consensus algorithms meaningless. Academic research usually considers worst cases in the mathematical or physical sense, and the real world is in most respects far more stable; when an engineered system fails to reach consensus, a few more attempts are very likely to succeed.

IV. The CAP Principle

1. Introduction to the CAP principle

The CAP principle first appeared in 2000 as a conjecture put forward by Professor Eric Brewer of the University of California, Berkeley, at the ACM Symposium on Principles of Distributed Computing (PODC); it was later proved by Nancy Lynch and other MIT scholars.

The CAP principle is considered one of the important principles of the distributed-systems field and has had a profound influence on distributed computing and system design.

The CAP principle: a distributed system cannot simultaneously guarantee consistency (Consistency), availability (Availability), and partition tolerance (Partition tolerance); a design usually has to weaken its requirement for one of these properties.

Consistency (Consistency): every transaction is atomic, and the state on all replicas is the result of a successfully committed transaction, i.e., strongly consistent.

Availability (Availability): the system (its non-failed nodes) answers operation requests within finite time.

Partition tolerance (Partition tolerance): the network may suffer partition failures (splitting into multiple subnets, with nodes going offline and coming back), i.e., communication between nodes is not guaranteed; such network failures should not affect the system's normal service.

According to the CAP principle, a distributed system can guarantee at most two of the three properties. When the network may partition, the system cannot guarantee consistency and availability at the same time: either a node withholds its reply because it cannot get confirmation from other nodes (sacrificing availability), or it replies with possibly inconsistent results (sacrificing consistency).
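A toy model of that choice in Python (the Replica class and the "CP"/"AP" labels are illustrative only, not a real system): during a partition, a replica either refuses to answer or serves possibly stale local state.

```python
class Replica:
    def __init__(self, mode):
        self.mode = mode               # "CP" or "AP"
        self.local_value = "possibly-stale"
        self.partitioned = True        # peers unreachable

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise TimeoutError("cannot confirm with peers")   # availability lost
        return self.local_value        # answers, but consistency is at risk

print(Replica("AP").read())            # serves a possibly stale value
```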

Since the network can be considered reliable most of the time, the system can normally provide consistent and reliable service; when the network is unreliable, the system must sacrifice either consistency (in most scenarios) or availability.

Network partitions do occur, and a partition can easily lead to "split brain", in which multiple newly promoted primary nodes may each try to shut the others down.

2. Application scenarios of the CAP principle

Since the three CAP properties cannot all be guaranteed at once, a system design must weaken its support for one of them. Three application scenarios can be defined accordingly:

A. Weakening consistency

Applications that are not sensitive to result consistency can tolerate a window of time after a new version goes live before updates fully take effect, during which consistency is not guaranteed. Examples include static website content and weakly real-time query databases. Simple distributed synchronization protocols such as Gossip, and databases such as CouchDB and Cassandra, all weaken consistency.
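As a rough sketch of how a Gossip-style protocol reaches eventual consistency (a toy round-based push simulation, not the actual protocol used by CouchDB or Cassandra): every round, each node pushes its state to one random peer, and all nodes eventually converge.

```python
import random

random.seed(0)
versions = [0] * 10     # per-node version of some datum
versions[0] = 1         # one node has seen the update

rounds = 0
while min(versions) < 1:                   # until everyone has converged
    for i in range(len(versions)):
        j = random.randrange(len(versions))
        versions[j] = max(versions[j], versions[i])   # push newer state
    rounds += 1

print(f"all nodes consistent after {rounds} rounds")
```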

B. Weakening availability

Applications that are sensitive to result consistency, such as bank ATMs, refuse service when the system fails. MongoDB, Redis, MapReduce, and the like weaken availability.

Consensus algorithms such as Paxos and Raft mainly address situations that are sensitive to consistency. In Paxos-like algorithms there may be periods during which no usable result is available, and a minority of nodes is allowed to be offline.

C. Weakening partition tolerance

In reality, the probability of a network partition is small, but a partition is hard to rule out entirely.

This design consideration appears mainly in the two-phase commit algorithm, some relational databases, and ZooKeeper.

In practice, network reliability can be strengthened through mechanisms such as redundant channels to achieve highly stable communication.

V. The ACID Principle and Multi-Phase Commit

1. A brief introduction to the ACID principle

ACID is an acronym for four properties: Atomicity, Consistency, Isolation, and Durability.

ACID is a well-known principle describing consistency requirements, and it usually appears in transaction-based systems such as distributed databases.

The ACID principle describes the consistency requirements a distributed database must satisfy, while accepting a cost in availability.

Atomicity: each transaction is atomic; the operations it contains either all succeed or none are executed. If any operation fails, the system must roll back to the state before the transaction began.

Consistency: the database state is consistent and complete before and after transaction execution, with no visible intermediate state; the database can only be in a state produced by a successfully committed transaction.

Isolation: transactions can execute concurrently without affecting one another. The standard SQL specification defines four isolation levels, from weak to strong: read uncommitted, read committed, repeatable read, and serializable.

Durability: state changes are permanent and do not fade. Once a transaction commits, the state change it causes is permanent.
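Atomicity and rollback can be demonstrated with Python's built-in sqlite3 module (a minimal sketch; the account-transfer scenario is an invented example): either both updates commit, or the failed transaction is rolled back in its entirety.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:   # transaction scope: commits on success, rolls back on error
        conn.execute(
            "UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("simulated crash between the two updates")
        # the matching credit to 'bob' is never reached
except RuntimeError:
    pass

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 100, 'bob': 0} -- the debit was rolled back, state is consistent
```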

A counterpart to ACID is the BASE principle (Basically Available, Soft-state, Eventual Consistency), put forward by eBay technologist Dan Pritchett. BASE is oriented toward large, highly available distributed systems; it advocates giving up the pursuit of strong consistency and settling for eventual consistency in exchange for availability.

The ACID and BASE principles are, in effect, different trade-offs among the three properties of the CAP principle.

Research results on distributed transaction consistency include the famous two-phase commit algorithm (Two-phase Commit, 2PC) and three-phase commit algorithm (Three-phase Commit, 3PC).

2. Two-phase commit algorithm

The two-phase commit algorithm was first proposed by Jim Gray in the paper "Notes on Database Operating Systems" in 1979. The basic idea is simple: since directly committing a transaction in a distributed setting may run into all kinds of failures and conflicts, the commit is split into two phases, pre-commit and formal commit, to avoid the risk of conflict.

Pre-commit: the coordinator (Coordinator) initiates a request to commit a transaction, and each participant (Participant) tries to prepare the commit and reports back whether it can complete.

Formal commit: if the coordinator receives successful responses from all participants, it issues the formal commit request; if that completes successfully, the algorithm succeeds.

If any step in this process goes wrong (for example, a participant replies in the pre-commit phase that it cannot complete the commit), a rollback is required.

Because it is simple and easy to implement, the two-phase commit algorithm is widely used in relational databases and other systems. Its drawbacks are that the whole process is synchronous and blocking, so performance is poor; that the coordinator is a single point of failure, and in the worst case the commit may never complete; and that data inconsistency can occur (for example, when the coordinator and a participant both fail during the second phase).
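A minimal sketch of the two phases (toy classes; the logging, timeouts, and recovery that a real implementation needs are omitted):

```python
class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):          # phase 1: try to prepare and vote
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):           # phase 2, on unanimous yes
        self.state = "committed"

    def rollback(self):         # phase 2, on any no
        self.state = "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]     # phase 1: collect votes
    if all(votes):
        for p in participants:
            p.commit()                              # phase 2: formal commit
        return True
    for p in participants:
        p.rollback()                                # phase 2: global rollback
    return False

print(two_phase_commit([Participant(), Participant()]))                  # True
print(two_phase_commit([Participant(), Participant(can_commit=False)]))  # False
```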

3. Three-phase commit algorithm

Three-phase commit addresses the case in which some participants may block during the first phase of two-phase commit: it splits the pre-commit phase further into two steps, try-pre-commit (inquiry) and pre-commit.

The complete process is as follows:

Try pre-commit: the coordinator asks each participant whether it can commit the transaction. The participant must reply, but does not yet need to prepare the commit, which avoids leaving some participants blocked by a request that cannot succeed.

Pre-commit: the coordinator checks the collected replies and, if all are positive, initiates a pre-commit request for the transaction; each participant (Participant) tries to prepare the commit and reports back whether it can complete.

Formal commit: if the coordinator receives successful responses from all participants, it issues the formal commit request; if that completes successfully, the algorithm succeeds.

Neither two-phase nor three-phase commit fully guarantees consistency; they only mitigate commit conflicts to a degree. The first effective algorithm in the multi-phase commit family is the Paxos algorithm.

VI. Reliability Metrics

1. A brief introduction to reliability metrics

Reliability (here used in the sense of Availability) is an important metric of a system's ability to provide service. Highly reliable distributed systems usually need a variety of complex mechanisms as safeguards.

In general, service availability can be measured with service level agreements (Service Level Agreement, SLA), service level indicators (Service Level Indicator, SLI), service level objectives (Service Level Objective, SLO), and so on. Reference values for the allowed annual unavailability at each reliability level are roughly as follows (365-day year):

Two 9s (99%): about 3.65 days of downtime per year
Three 9s (99.9%): about 8.76 hours per year
Four 9s (99.99%): about 52.6 minutes per year
Five 9s (99.999%): about 5.26 minutes per year
Six 9s (99.9999%): about 31.5 seconds per year

In general, a single-server system should reach at least two 9s; three 9s is sufficient for an ordinary enterprise information system; a system reaching four 9s is already at a leading level (compare cloud computing platforms such as AWS); telecom-grade applications generally need five 9s, allowing at most about five minutes of unavailability per year; systems with six or more 9s are rare, and achieving them usually comes at a very high price.
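The reference values above follow from simple arithmetic; a small helper (assuming a 365-day year) reproduces them:

```python
def downtime_seconds_per_year(nines):
    """Allowed downtime for an availability of 1 - 10**-nines."""
    return 10 ** (-nines) * 365 * 24 * 3600

for n in range(2, 7):
    print(f"{n} nines: {downtime_seconds_per_year(n) / 60:.2f} minutes/year")
```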

2. Two core time metrics

Two basic metrics describe how likely a system is to fail and how quickly it recovers: MTBF and MTTR.

MTBF (Mean Time Between Failures) is the expected time the system runs without failure.

MTTR (Mean Time To Repair) is the expected time for the system to return to normal operation after a failure.

MTBF measures the frequency of system failures: a very short MTBF means low availability. MTTR reflects the system's resilience after a failure: if MTTR is too long, the system takes a long time to restore service whenever something goes wrong.

A highly available system should have an MTBF as long as possible and an MTTR as short as possible.
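The two times combine into the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR); a quick sketch:

```python
def availability(mtbf_hours, mttr_hours):
    # Fraction of time the system is up, in steady state.
    return mtbf_hours / (mtbf_hours + mttr_hours)

print(availability(999.0, 1.0))    # 0.999 -> three 9s
```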

3. Improving reliability

Reliability can be improved in two ways: make each individual component of the system more reliable, or eliminate single points of failure.

The reliability obtainable from a single point is ultimately limited. To improve system reliability further, single points must be eliminated: multiple nodes collectively take over the work that the single point used to do (i.e., the system becomes distributed), via primary/backup, active-active, and similar schemes. In a probabilistic sense, this improves the overall reliability of the service.
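"In a probabilistic sense" can be made concrete under a simplified model (independent failures, instant failover, both assumptions idealized): if one node is up with probability p, at least one of n redundant nodes is up with probability 1 - (1 - p)^n.

```python
def group_availability(p, n):
    # Probability that at least one of n independent nodes is up.
    return 1 - (1 - p) ** n

print(group_availability(0.99, 1))   # 0.99     -- single point
print(group_availability(0.99, 3))   # 0.999999 -- three-way redundancy
```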
