In this article, the editor looks at how the CAP theorem and the common consistency protocols are applied in practice. The content is fairly rich and is analyzed from a professional point of view; hopefully you will take something away from it.
1. Consistency
1.1 CAP theory
C (consistency): in a distributed environment, consistency means that multiple replicas hold the same value at the same time.
A (availability): the service provided by the system must remain available at all times, even if some nodes in the cluster fail.
P (partition tolerance): when the system encounters node failures or a network partition, it can still provide consistent and available service. In practical terms, a partition is equivalent to a time limit on communication: if the system cannot reach data consistency within a certain time, a partition is considered to have occurred, and a choice must be made between C and A for the current operation.
1.2 proof that CAP cannot be satisfied at the same time
Suppose the system has five nodes, n1~n5. Nodes n1~n3 are in physical room A and nodes n4~n5 are in physical room B. Now a network partition occurs, and rooms A and B cannot reach each other.
Ensure consistency: a client writes data in room A, and the write cannot be synchronized to room B, so the write fails and availability is lost.
Ensure availability: a write to nodes n1~n3 in room A succeeds and returns success, while a write to nodes n4~n5 in room B also succeeds and returns success, so the same piece of data is now inconsistent between rooms A and B. The sharp-eyed may think of zookeeper: when a node goes down, the system removes it, and a write still succeeds as long as more than half of the remaining nodes accept it, so does zookeeper satisfy C, A and P at the same time? This is a misunderstanding. "The system removes the node" implicitly assumes a scheduler that kicks out the bad node, and when a network partition occurs between that scheduler and the zookeeper nodes, the whole system is still unavailable.
1.3 Common scenarios
CA without P: in a distributed environment P is unavoidable; both natural disasters (a certain company's Azure region being struck by lightning) and man-made ones (the fiber between a certain company's A and B rooms being dug up) can cause a partition.
CP without A: every write request must reach strong consistency across the servers before it returns, and P (a partition) can stretch the synchronization time indefinitely; consistency can be guaranteed at the cost of availability. Examples: distributed database transactions, two-phase commit, three-phase commit and so on.
AP without C: when a network partition occurs and clusters A and B lose contact, the system keeps accepting writes on whichever nodes it can reach in order to stay highly available, so for a period of time clients may read different data from different machines. Example: the redis master-slave asynchronous replication architecture. When the master goes down the system fails over to a slave; because replication is asynchronous, the slave may not have the latest data, which leads to consistency problems.
II. Consistency protocols
2.1 Two-phase commit protocol (2PC)
Two-phase commit (2PC) is an algorithm, used in computer networking and databases, that makes all nodes of a distributed system agree when committing a transaction; it is commonly referred to as a protocol. In a distributed system, each node knows whether its own operation succeeded or failed, but it cannot know whether the operations on the other nodes succeeded or failed. When a transaction spans multiple nodes, preserving the ACID properties of the transaction requires introducing a component acting as coordinator, which collects the operation results of all the nodes (called participants) and finally tells them whether to actually commit those results (for example, write the updated data to disk). The idea of two-phase commit can therefore be summarized as: the participants report the success or failure of their operations to the coordinator, and the coordinator, based on the feedback from all participants, decides whether each participant should commit or abort.
2.1.1 two roles
Coordinator
Participant
2.1.2 processing phase
Voting (prepare) phase: the transaction coordinator sends a Prepare message to every participant. On receiving it, a participant either successfully writes its redo and undo logs locally and replies that it agrees, or replies that the transaction should be aborted.
Commit (execution) phase: after receiving replies from all participants, if any participant voted to abort, the coordinator sends a rollback instruction to every participant; otherwise it sends a commit message (a rough sketch of this decision logic is shown below).
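As a rough illustration of these two phases, here is a minimal coordinator sketch in Java. The Participant interface and its method names are hypothetical, invented for the example, and a real coordinator would also need durable state, timeouts and retries.

```java
import java.util.List;

// Hypothetical participant interface, only for illustrating 2PC.
interface Participant {
    boolean prepare();   // phase 1: write redo/undo logs locally, vote yes/no
    void commit();       // phase 2: make the prepared changes durable
    void rollback();     // phase 2: undo using the local undo log
}

class TwoPhaseCommitCoordinator {
    // Phase 1: ask every participant to prepare; Phase 2: commit only if all voted yes.
    public boolean execute(List<Participant> participants) {
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) {          // any "no" vote aborts the whole transaction
                allPrepared = false;
                break;
            }
        }
        if (allPrepared) {
            for (Participant p : participants) p.commit();
        } else {
            for (Participant p : participants) p.rollback();
        }
        return allPrepared;
    }
}
```

Note how the coordinator is a single point of decision: if it crashes between the two loops, the participants are stuck holding locks, which is exactly the blocking problem discussed below.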
2.1.3 exception handling
Coordinator failure: a backup coordinator takes over and queries the participants to find out how far the transaction has progressed.
Participant failure: the coordinator waits for it to restart and then continues.
Coordinator and participant fail at the same time: the coordinator fails and then a participant also fails. For example, there are machines 1, 2, 3 and 4, where 4 is the coordinator and 1, 2, 3 are participants. Machine 4 fails after sending the transaction to 1 and 2, and machine 3 fails at the same time; note that 3 has not committed the transaction data. After the backup coordinator starts and asks the participants, it cannot know what state the dead machine 3 was in (had it accepted the transaction? had it replied whether it could execute it?). 2PC cannot resolve this situation; the 3PC described below is needed.
2.1.4 disadvantages
Synchronous blocking: all participating nodes block on the transaction. For example, with update table set status=1 where current_day=20181103, the rows of the participant's table matching current_day=20181103 are locked, and any other transaction that wants to modify those rows is blocked.
A single point of failure blocks other transactions: if the coordinator goes down during the commit phase, all participants are left holding locks on transaction resources and cannot complete the transaction.
The participant and the coordinator go down at the same time: the coordinator goes down after sending the commit message, and the only participant that received it also goes down. A new coordinator takes over but has no way of knowing the status of this transaction; neither commit nor rollback is safe. Two-phase commit cannot resolve this.
2.2 three-phase commit protocol (3PC)
2PC can just about cope with a single machine failing, but when the coordinator and a participant fail at the same time its theory falls short. This is where 3PC takes the stage.
3PC patches the gaps in 2PC. There are two main changes:
A preparation phase is inserted between the first and second phases of 2PC, so that even if participants and coordinator fail at the same time there is no blocking and consistency is still guaranteed.
A timeout mechanism is introduced between the coordinator and the participants.
2.2.1 three stages of processing
Transaction query phase (CanCommit): the coordinator sends a commit request to the participants and waits for their responses. Unlike the first phase of 2PC, the participants do not lock resources and do not write redo or undo logs, so the cost of rolling back is low.
Transaction preparation phase (PreCommit): if all participants returned OK, a Prepare message is sent and each participant writes its redo and undo logs locally; otherwise an abort request is sent to the participants. If the Prepare message has been sent and the wait for responses times out, an abort request is also sent to the participants.
Transaction execution phase (DoCommit): if every Prepare returned success, the protocol enters the execution phase and a commit message is sent to the participants; otherwise the transaction is rolled back. In this phase, if a participant does not receive the DoCommit message within a certain time, its timeout mechanism fires and it commits the transaction on its own. The reasoning is that reaching this phase means all nodes were healthy during the query phase, so even if some commit messages are lost it is reasonable to believe that most nodes are still healthy and the commit can go through (this timeout behavior is sketched below).
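The self-commit-on-timeout behavior of the DoCommit phase can be sketched as follows. This is only an illustration: the class, the message handlers and the 5-second timeout are all assumptions made for the example, not a real 3PC implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative 3PC participant: after acknowledging preCommit, it commits on its own
// if neither doCommit nor abort arrives before the timeout, as described in the text.
class ThreePhaseParticipant {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private volatile boolean decided = false;

    void onPreCommitAcked() {
        // Start the timeout clock once preCommit has been acknowledged.
        timer.schedule(() -> {
            if (!decided) {
                commitLocally();   // timeout: assume the majority is committing and do the same
            }
        }, 5, TimeUnit.SECONDS);   // timeout value is arbitrary for the sketch
    }

    void onDoCommit() { decided = true; commitLocally(); }
    void onAbort()    { decided = true; rollbackLocally(); }

    private void commitLocally()   { System.out.println("commit local transaction"); }
    private void rollbackLocally() { System.out.println("roll back local transaction"); }
}
```

The sketch also makes the weakness visible: if the coordinator actually decided to abort but the abort message is cut off by a network partition, the timed-out participant still commits, which is the inconsistency described in 2.2.2.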
2.2.2 shortcomings
It cannot solve the data inconsistency caused by a network partition. For example, with five participant nodes, nodes 1-3 in room A and nodes 4-5 in room B: in the PreCommit phase all five nodes received the Prepare message, but node 1 failed to execute it, so the coordinator sends a rollback message to nodes 1-5. At that moment a network partition separates rooms A and B: nodes 1-3 roll back, but nodes 4 and 5 never receive the rollback message and commit the transaction instead. After the partition heals, the data is inconsistent.
It cannot fully solve the fail-recover problem. Thanks to the timeout mechanism, 3PC does solve the case, unsolved in 2PC, where a participant and the coordinator go down at the same time: once a participant hears nothing from the coordinator within the timeout, it commits on its own, which also stops participants from holding shared resources forever. Under a network partition, however, data consistency still cannot be guaranteed.
2.3 Paxos protocol
Both 2PC and 3PC need to introduce a coordinator. When the coordinator goes down, the whole transaction cannot be committed, the participants' resources stay locked, and the impact on the system is catastrophic; under a network partition, data inconsistencies are likely. Is there a solution that needs no coordinator, lets every participant take part in coordinating the transaction, and preserves consistency as far as possible under a network partition? This is where Paxos appears.
The Paxos algorithm is a message-passing-based consistency algorithm proposed by Lamport in 1990. Because it was hard to understand, it attracted little attention at first; Lamport republished it eight years later, and even then it was not taken seriously. It was Google's famous papers that changed that: the Chubby lock service used Paxos for consistency inside a Chubby cell, and from then on Paxos received wide attention.
2.3.1 what problems have been solved?
The Paxos protocol is a communication protocol that solves the problem of getting multiple nodes in a distributed system to agree on a value (a proposal). It tolerates a minority of nodes going offline while the remaining majority can still reach agreement; every node is both a participant and a decision maker.
2.3.2 two roles (both can be the same machine)
Proposer: the server that proposed the proposal
Acceptor: the server that approved the proposal
Because Paxos is so similar to the ZAB protocol used by zookeeper, described below, see the Zookeeper principles section for a detailed explanation.
2.4 Raft protocol
Paxos demonstrated that a consistency protocol is feasible, but its proof is notoriously obscure and lacks the details needed for an engineering implementation, so implementing it is hard; the best-known implementation is zk's zab protocol. The distributed consistent replication protocol Raft was later proposed in Stanford's RamCloud project; it is easy to implement and understand, and Java, C++, Go and other languages all have implementations of it.
2.4.1 basic nouns
Node state
Leader (master node): accepts client update requests, writes them to its local log, and then replicates them to the other replicas.
Follower (slave node): accepts update requests from the Leader and writes them to its local log file; serves read requests from clients.
Candidate (candidate node): if a follower receives no heartbeat from the leader for a period of time, it assumes the leader may have failed and initiates an election; the node's state changes from Follower to Candidate until the election ends.
TermId: term number. Time is divided into terms; each election produces a new termId, and there is only one leader within a term. TermId plays the same role as paxos's proposalId.
RequestVote: a vote request, initiated by a candidate during an election; the candidate becomes leader after receiving responses from a quorum (majority).
AppendEntries: log appending, the mechanism by which the leader ships log entries and heartbeats (both RPCs are sketched after this list).
Election timeout: if a follower receives no message (appended log or heartbeat) within a period of time, the election timeout fires and it starts an election.
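The two RPCs above can be modeled with plain data classes. The field names below follow the Raft paper's RequestVote and AppendEntries definitions; the classes themselves are just an illustrative sketch, not part of any particular Raft library.

```java
// Fields follow the Raft paper's RequestVote RPC (sketch only).
class RequestVoteRequest {
    long term;          // candidate's current term
    int candidateId;    // candidate requesting the vote
    long lastLogIndex;  // index of the candidate's last log entry
    long lastLogTerm;   // term of the candidate's last log entry
}

// Fields follow the Raft paper's AppendEntries RPC (sketch only).
class AppendEntriesRequest {
    long term;          // leader's current term
    int leaderId;       // so followers can redirect clients
    long prevLogIndex;  // index of the log entry immediately preceding the new ones
    long prevLogTerm;   // term of that preceding entry
    byte[][] entries;   // log entries to store (empty for a heartbeat)
    long leaderCommit;  // leader's commitIndex, telling followers how far they may apply
}
```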
2.4.2 Features
The Leader never modifies its own log; it only appends, and log entries flow only from the Leader to the Followers. For example, suppose a Leader (node 1) has committed log 1 but not logs 2 and 3 when it goes down. Node 2 then becomes leader with log 1 as its latest entry and commits log 4. If node 1 happens to come back, node 2's log 4 is appended after node 1's log 1, and node 1's logs 2 and 3 are discarded.
Consistency does not rely on the physical clocks of the individual nodes; it relies on logically increasing term ids and log ids.
2.4.3 When an election is triggered
No heartbeat received from the Leader within the election timeout
At startup
2.4.4 The election process
As shown in figure raft-2, Raft divides time into terms, identified by consecutive integers, and each term begins with an election. For example, follower node 1, at the boundary between term1 and term2, cannot reach the Leader; it increments currentTerm from 1 to 2, enters term2, completes the election during the blue part of term2, and works normally during the green part.
Of course, a term may fail to elect a Leader; in that case currentTerm is incremented again and the election continues, like t3 in the figure. The election rule is that each voter has one vote per round, granted to whoever asks first, and a voter only grants its vote if it finds the candidate's log id is greater than or equal to its own; the node that collects more than half of the votes becomes the master. Note that this does not mean the elected node necessarily has the largest transaction id. For example, figure raft-1 shows six nodes a~f (the number in each box is the election term of that log entry).
In the fourth round of election, a votes first; of the six machines, a~e all vote for a, so even if f does not vote for a, a still wins the election. If there is no transaction id to compare (for example right after startup), whoever requests the vote first gets it. The Leader then replicates its latest log to every node and starts serving.
Beyond these election restrictions there are other safeguards, such as the commit restriction, which together guarantee that a successfully elected Leader contains every committed log entry (the vote-granting rule is sketched below).
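The vote-granting side of the election can be written down roughly as below. The class and method are assumptions made for the sketch; the up-to-date check follows the Raft paper (compare the last log term first, then the last log index), which is a slightly more precise form of the "log id greater than or equal to mine" rule described above.

```java
// Sketch of a follower's vote decision: one vote per term, and the candidate's log
// must be at least as up-to-date as the voter's own log.
class VoteDecision {
    long currentTerm;
    Integer votedFor;   // null if no vote has been cast in currentTerm
    long lastLogTerm;   // term of this node's last log entry
    long lastLogIndex;  // index of this node's last log entry

    boolean grantVote(long candTerm, int candidateId, long candLastLogTerm, long candLastLogIndex) {
        if (candTerm < currentTerm) return false;   // stale candidate
        if (candTerm == currentTerm && votedFor != null && votedFor != candidateId) {
            return false;                           // already voted for someone else this term
        }
        boolean logOk = candLastLogTerm > lastLogTerm
                || (candLastLogTerm == lastLogTerm && candLastLogIndex >= lastLogIndex);
        if (logOk) {
            currentTerm = candTerm;                 // adopt the newer term
            votedFor = candidateId;
        }
        return logOk;
    }
}
```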
2.4.5 Log replication process
In the Raft log writing process, the master node, after receiving a request such as x=1, first writes it to its local log and then broadcasts the x=1 log entry. A follower that receives the request writes the entry to its local log and returns success. When the leader has received responses from more than half of the nodes, it changes the entry's status to committed and broadcasts a message telling the followers to commit it. After committing the entry, each node updates the log index in its state machine.
firstLogIndex/lastLogIndex: the start and end index positions of the log held on a node (covering committed, uncommitted and applied entries). commitIndex: the highest committed index. applyIndex: the highest index applied to the state machine.
The essence of log replication is to make the committed log order and content on the followers exactly the same as on the Leader, which is what guarantees consistency.
The specific principles are:
Principle 1: if two log entries on different raft nodes have the same term and logIndex, their contents are identical.
Principle 2: if two logs on different raft nodes have the same starting and ending term and logIndex, then all the log entries between them are identical.
How are these guaranteed?
The first principle only requires that a fresh logIndex be used when an entry is created, guaranteeing the logIndex's uniqueness, and that the entry never be changed afterwards; then after the leader replicates it to a follower, the logIndex, term and content stay the same.
The second principle is guaranteed by the Leader sending, along with the latest entry (currentTermId, currentLogIndex), the preceding entry's (preCurrentTermId, preCurrentLogIndex) when replicating to a Follower. As shown in figure raft-1, node d holds (term7, logIndex12). When synchronizing to node a it sends (term7, logIndex11) and (term7, logIndex12); node a cannot find (term7, logIndex11), so it refuses and the Leader d resends. d then sends (term6, logIndex10) and (term7, logIndex11); a has no (term6, logIndex10) either and still refuses. Next comes (term6, logIndex9) and (term6, logIndex10); node a does have (term6, logIndex9), so the leader then sends node a the log entries after (term6, logIndex9), i.e. (term6, logIndex10) through (term7, logIndex12), and node a ends up with the same log as node d (a minimal sketch of this consistency check follows).
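The back-tracking behavior is driven by a simple consistency check on the follower side. Below is a minimal sketch, with a hypothetical in-memory list standing in for the real log store:

```java
import java.util.ArrayList;
import java.util.List;

class LogEntry {
    final long term;
    final String data;
    LogEntry(long term, String data) { this.term = term; this.data = data; }
}

class FollowerLog {
    // Index 0 is a sentinel so that log indexes start at 1, as in the Raft paper.
    private final List<LogEntry> log = new ArrayList<>(List.of(new LogEntry(0, "")));

    // Returns false when the follower has no entry matching (prevLogIndex, prevLogTerm),
    // which makes the leader retry with an earlier preceding entry, as described above.
    boolean appendEntries(long prevLogIndex, long prevLogTerm, List<LogEntry> entries) {
        if (prevLogIndex >= log.size() || log.get((int) prevLogIndex).term != prevLogTerm) {
            return false;                                   // reject: logs diverge at prevLogIndex
        }
        // Truncate any conflicting suffix, then append the leader's entries.
        while (log.size() > prevLogIndex + 1) log.remove(log.size() - 1);
        log.addAll(entries);
        return true;
    }
}
```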
III. An overview of the principles of Zookeeper
Burrows, the designer and developer of Google's coarse-grained lock service Chubby, once said: "all consistency protocols are essentially either Paxos or variants of it." Paxos solves the problem of getting multiple nodes in a distributed system to agree on a value, but it introduces problems of its own: because every node can both propose and approve proposals, when three or more proposers send prepare requests it is very hard for any one proposer to receive responses from more than half and finish the first phase of the protocol. Under this kind of contention, elections slow down.
So zookeeper proposed the ZAB protocol on top of paxos. In essence only one machine may make proposals (the Proposer role), and that machine is called the Leader; the other participants play the Acceptor role. To keep the Leader robust, a Leader election mechanism is introduced.
The ZAB protocol also solves the following problems:
As long as fewer than half of the nodes are down, the cluster can still provide service.
All client write requests are handled by the Leader; after a successful write, the data must be synchronized to all followers and observers.
If the Leader goes down or the cluster restarts, every transaction already committed by the Leader must eventually be committed by all servers, and the cluster must be able to return quickly to its pre-failure state.
3.2 basic concepts
Basic terms
Data node (dataNode): the smallest data unit in the zk data model. The data model is a tree, and each node is uniquely identified by a slash-separated (/) path; a data node can store content and a set of attributes, and can also have child nodes mounted under it, forming a hierarchical namespace.
Transaction and zxid: a transaction is an operation that changes the state of the Zookeeper server, including creating and deleting data nodes, updating a data node's content, and creating or expiring a client session. For every transaction request, zk assigns a globally unique transaction id, the zxid, which is a 64-bit number: the high 32 bits identify the election epoch in which the transaction occurred (the value increases by 1 every time a leader election happens), and the low 32 bits give the transaction's order within the current epoch (the value increases by 1 for every transaction request the leader processes, and is reset to 0 when a new leader is elected); the layout is illustrated in the sketch after these definitions.
Transaction log: every transaction operation is recorded in a log file, in the directory configured with dataLogDir. Each file is named with the zxid of the first transaction written into it as a suffix, which makes later lookups easier. Zk pre-allocates disk space to reduce disk seeks and improve the server's ability to respond to transaction requests. By default every transaction log write is flushed to disk in real time; it can also be set to non-real-time (write to an in-memory file stream and flush to disk in batches), but then data can be lost in a power failure.
Transaction snapshots: data snapshots are another core mechanism of zk's data storage. A snapshot records the full in-memory data of the zk server at a point in time and writes it to a disk file in the directory configured with dataDir. The parameter snapCount sets the number of transaction operations between two snapshots; while recording the transaction log, a zk node checks whether a snapshot is needed (counting from the last snapshot, when the number of transaction operations reaches a random value between snapCount/2 and snapCount, a snapshot is triggered; the randomness prevents all nodes from snapshotting at the same time and slowing the cluster down).
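The 64-bit zxid layout mentioned above (high 32 bits = election epoch, low 32 bits = counter) comes down to a few lines of bit arithmetic:

```java
// Demonstrates the zxid layout: high 32 bits hold the leader epoch, low 32 bits the counter.
public class ZxidDemo {
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }
    static long epochOf(long zxid)   { return zxid >>> 32; }
    static long counterOf(long zxid) { return zxid & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long zxid = makeZxid(5, 17);                          // 5th epoch, 17th transaction
        System.out.printf("zxid=0x%x epoch=%d counter=%d%n",
                zxid, epochOf(zxid), counterOf(zxid));
        long afterElection = makeZxid(epochOf(zxid) + 1, 0);  // new leader: epoch + 1, counter reset
        System.out.printf("after election: 0x%x%n", afterElection);
    }
}
```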
Core roles
Leader: the cluster's main process; a Leader is elected when the system first starts or after the previous Leader crashes.
Follower: stays in data synchronization with the Leader and takes part in elections.
Observer: synchronizes data from the Leader like a Follower, but does not take part in election voting.
Node state
LOOKING: the node is in the leader-election state and does not provide service until the election ends.
FOLLOWING: acts as a slave node of the system, accepts updates from the master node and writes them to its local log.
LEADING: acts as the system's master node, accepts client updates, writes them to its local log and replicates them to the slave nodes.
3.3 Common misunderstandings
"Data written to one node can be read back immediately from any node" is wrong. In zk, writes must go through the leader serially, and a write succeeds as long as more than half of the nodes acknowledge it, while any node can serve reads. For example, in a zk cluster with nodes 1-5, the latest data is written to nodes 1-3 and success is returned; a read request for that data may then be routed to node 4 or 5, where the latest data has not yet been synchronized, so the latest value cannot be read. If you need to read the latest data, call the sync command before reading (a minimal sketch with the Java client follows).
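For the sync-before-read pattern, here is a minimal sketch with the ZooKeeper Java client. The connection string and the /my/test path are placeholders; error handling and watches are omitted.

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class SyncThenRead {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; a real cluster would list its own servers.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 3000, event -> {});

        // sync() asks the server we are connected to to catch up with the leader,
        // so the getData() that follows sees the latest committed value.
        CountDownLatch synced = new CountDownLatch(1);
        zk.sync("/my/test", (rc, path, ctx) -> synced.countDown(), null);
        synced.await();

        Stat stat = new Stat();
        byte[] data = zk.getData("/my/test", false, stat);
        System.out.println("value=" + new String(data) + " mzxid=0x" + Long.toHexString(stat.getMzxid()));
        zk.close();
    }
}
```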
"A zk cluster cannot have an even number of nodes" is also wrong. Zk only needs more than half of its nodes to be working. With 4 nodes, more than half means 3, so at most one machine may go down; with 3 nodes, more than half means 2, which likewise tolerates at most one machine going down. Four nodes cost one extra machine but are no more robust than a three-node cluster, so on cost grounds an even number is simply not recommended (see the quick calculation below).
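The cost argument is just the majority rule; a quick calculation makes it concrete:

```java
// Majority quorum: a cluster of n nodes keeps working while n/2 + 1 nodes are alive,
// so it tolerates n - (n/2 + 1) failures.
public class QuorumDemo {
    public static void main(String[] args) {
        for (int n : new int[]{3, 4, 5, 6}) {
            int quorum = n / 2 + 1;
            System.out.printf("nodes=%d quorum=%d tolerated failures=%d%n", n, quorum, n - quorum);
        }
        // 3 and 4 nodes both tolerate 1 failure; 5 and 6 both tolerate 2.
    }
}
```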
3.4 The election and synchronization process
3.4.1 When a vote is initiated
A node starts
A running node loses its connection to the Leader
The Leader loses its connection to more than half of the nodes
3.4.2 How transactions are guaranteed
The ZAB protocol is similar to a two-phase commit. A write request arrives from a client, for example setting /my/test to 1. The Leader generates a corresponding transaction proposal (if the current zxid is 0x5000010, the proposed zxid is 0x5000011), writes the set /my/test 1 operation (pseudo-code here) to its local transaction log, and then ships that log entry to all followers. A follower receives the proposal and writes it to its own transaction log. Once the Leader has received responses from more than half of the followers, it broadcasts a commit request; after receiving the commit request, a follower applies the transaction with zxid 0x5000011 from the file to memory.
That is the normal path; two failure cases matter. In the first, the Leader goes down after writing to its local transaction log but before sending the synchronization request. Even if it is later restarted and rejoins as a follower, that log entry is lost (the newly elected leader does not have it and so cannot synchronize it). In the second, the Leader goes down after issuing the synchronization request but before the commit. In that case the log entry is not lost; it will be committed and synchronized to the other nodes.
3.4.3 Voting process during server startup
Now the five zk machines are numbered 1 to 5 in turn.
Node 1 starts; the votes it sends out get no response, so it stays in the Looking state.
Node 2 starts and communicates with node 1, exchanging election results. Since neither has any historical data (so zxids cannot be compared), node 2 wins because it has the larger id, but since more than half of the nodes have not yet joined, both 1 and 2 remain in the Looking state.
Node 3 starts; by the same reasoning, the node with the largest id, node 3, wins, and this time more than half of the nodes have taken part in the election, so node 3 becomes the Leader.
Node 4 starts and communicates with nodes 1-3; it learns that node 3 is already the leader, and since its own zxid is no greater than node 3's, it acknowledges node 3 as the leader.
Node 5 starts and, like node 4, acknowledges node 3 as the leader.
3.4.4 The election process while the servers are running
1. Node 1 initiates a vote; in the first round it votes for itself, then enters the Looking wait state.
2. Another node (say node 2) receives node 1's vote. If node 2 is itself in the Looking state, it broadcasts its own vote back (the Looking branch on the left of the figure); if it is not in the Looking state, it tells node 1 who the current Leader is and stays out of the election (the Leading/Following branch on the right of the figure).
3. Node 1 then receives node 2's vote. If node 2's zxid is larger, node 1 empties its ballot box, creates a new one, and broadcasts its updated vote. Within one round, if after collecting the votes of the nodes the ballot box shows that more than half of them have chosen the same node, a leader has been elected and the voting ends; otherwise the process keeps looping.
In zookeeper's election, the node with the largest zxid wins because it has the most up-to-date data; if there is no zxid to compare, for example when the system has just started, the machine numbers are compared and the larger number wins (this comparison rule is sketched below).
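That comparison rule (larger zxid wins; on a tie, the larger server id wins) can be written directly. The Vote class is a hypothetical holder; ZooKeeper's real election code also compares the election epoch, which is omitted here.

```java
// Hypothetical vote holder illustrating the ordering rule described above.
class Vote {
    final long zxid;
    final long serverId;
    Vote(long zxid, long serverId) { this.zxid = zxid; this.serverId = serverId; }

    // Returns true if 'other' should replace this node's current vote.
    boolean shouldSwitchTo(Vote other) {
        if (other.zxid != this.zxid) return other.zxid > this.zxid;
        return other.serverId > this.serverId;
    }
}
```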
3.5 The synchronization process
After the Leader is chosen, zk enters the state synchronization phase: the log data up to the latest zxid is applied to the other nodes. This includes zxids that a follower has written to its log but not yet committed; they live in the server's proposal cache queue, committedLog.
Synchronization begins by initializing three zxid values:
peerLastZxid: the last zxid processed by the learner server. minCommittedLog: the minimum zxid in the leader server's proposal cache queue committedLog. maxCommittedLog: the maximum zxid in the leader server's proposal cache queue committedLog.
The system chooses a synchronization strategy by comparing the learner's peerLastZxid with the leader's minCommittedLog and maxCommittedLog.
3.5.1 Direct differential synchronization
Scenario: peerLastZxid lies between minCommittedLog and maxCommittedLog.
This is the scenario mentioned above in which the Leader issued the synchronization request but went down before the commit. The leader sends the Proposal packets and the corresponding commit instruction packets, and the newly elected leader finishes the work its predecessor left unfinished.
For example, suppose the Leader's proposal cache queue is 0x20001, 0x20002, 0x20003, 0x20004 and the learner's peerLastZxid is 0x20002; the Leader synchronizes 0x20003 and 0x20004 to the learner.
3.5.2 Rollback-then-differential synchronization / rollback-only synchronization
This is the scenario mentioned above in which the Leader wrote to its local transaction log but went down before issuing the synchronization request, and later comes back as a learner during log synchronization.
For example, leader node 1, just before going down, had already processed 0x20001 and 0x20002 and was handling 0x20003 when it went down, before the proposal was sent. Node 2 is later elected as the new leader, and node 1 comes back to life while data is being synchronized. If the new leader has not yet processed any new transactions and its queue is 0x20001, 0x20002, then node 1 is simply rolled back to 0x20002 and the 0x20003 entry is discarded; this is called rollback-only synchronization. If the new leader has already handled transactions 0x30001 and 0x30002, so its queue is 0x20001, 0x20002, 0x30001, 0x30002, then node 1 first rolls back to 0x20002 and differential synchronization then sends it 0x30001 and 0x30002.
3.5.3 full synchronization
Scenario: peerLastZxid is less than minCommittedLog, or the leader has no cache queue at all. The leader simply uses the SNAP command for full synchronization (the three-way choice is summarized in the sketch below).
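The choice between the three strategies can be summarized as a small decision function. This is a simplified sketch; real zk also handles the roll-back-then-diff case by locating peerLastZxid inside committedLog, which is omitted here.

```java
enum SyncStrategy { DIFF, TRUNC, SNAP }

class SyncDecision {
    // peerLastZxid: last zxid the learner holds; min/maxCommittedLog: bounds of the
    // leader's proposal cache queue. Mirrors the three scenarios described above.
    static SyncStrategy choose(long peerLastZxid, long minCommittedLog, long maxCommittedLog,
                               boolean leaderHasCache) {
        if (!leaderHasCache || peerLastZxid < minCommittedLog) {
            return SyncStrategy.SNAP;    // full snapshot synchronization
        }
        if (peerLastZxid > maxCommittedLog) {
            return SyncStrategy.TRUNC;   // learner is ahead of the leader: roll back
        }
        return SyncStrategy.DIFF;        // replay the proposals the learner is missing
    }
}
```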
IV. Youzan's Raft + RocksDB distributed KV storage service
Most of today's open-source cache KV systems are AP systems, for example a redis master-slave cluster with asynchronous replication: after the master goes down a slave takes over, but if the master accepted a write and went down before replicating it to the slave, and that slave is then promoted and keeps serving, some data is lost. That is unacceptable for systems that require strong consistency. Using redis for distributed locks, for instance, has this inherent defect in many scenarios: if the master stops serving, the lock is not reliable. The probability is small, but when it does happen the error is fatal.
To build a CP KV storage system while staying compatible with existing redis business, Youzan developed ZanKV (its predecessor ZanRedisDB was open-sourced first).
The underlying storage is RocksDB (built on an LSM data structure). A set x=1 arrives over the redis protocol, and its content is synchronously written through the Raft protocol to the RocksDB instances on the other nodes. With Raft's theory plus RocksDB's excellent storage performance, the system can comfortably handle abnormal situations such as network partitions and master or slave nodes going down. For scaling out, the system maintains a mapping table between partitions and nodes; the table is generated by a certain algorithm together with a flexible strategy, which makes expansion convenient. For details see "Building Youzan's distributed KV storage service with open source technology".
This article has covered consistency from three angles: first, the core theory of distributed architecture, CAP, and a simple proof of it; second, the consistency protocols under CAP, with the emphasis on Raft; third, the principles behind the widely used zookeeper.
To ensure that committed data is never lost, systems use a WAL (write-ahead log): before the data is modified, the operation is written to a log, and only then is the data changed, so that even if something goes wrong while modifying the data, it can be recovered from the operation log (a minimal sketch follows).
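In its simplest form a write-ahead log just appends and flushes the operation record before the in-memory state is touched; a minimal sketch (a real WAL would also fsync and add checksums):

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Minimal write-ahead log sketch: record the operation durably first, then apply it,
// so the in-memory state can be rebuilt from the log after a crash.
public class TinyWal {
    private final FileWriter logFile;
    private final Map<String, String> state = new HashMap<>();

    public TinyWal(String path) throws IOException {
        this.logFile = new FileWriter(path, true);   // append mode
    }

    public void set(String key, String value) throws IOException {
        logFile.write("SET " + key + " " + value + "\n");
        logFile.flush();                  // pushed to the OS before the data is modified
        state.put(key, value);            // only now is it safe to change the data
    }

    public static void main(String[] args) throws IOException {
        TinyWal wal = new TinyWal("tiny.wal");
        wal.set("x", "1");
    }
}
```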
Distributed storage systems are designed on the assumption that machines are unreliable and may go down at any moment; in other words, data written by users must not be lost even when machines fail, and single points of failure must be avoided. For that reason every piece of written data is stored in multiple copies at the same time, for example zk node data replication or etcd data replication. Replicating data in turn brings consistency problems between nodes, such as how the master and slave nodes synchronize data, and availability problems, such as how to elect a new master quickly and recover data after the leader goes down. Fortunately there are mature theories for all of this: the Paxos protocol, the ZAB protocol, the Raft protocol and so on.
That is how the CAP theorem and the common consistency protocols are put into practice; if you have run into similar questions, the analysis above may help you work through them.