What are the knowledge points of zookeeper data consistency 07/13 Update SLTechnology News&Howtos


2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the knowledge points behind ZooKeeper data consistency: what the different kinds of "consistency" mean, which consistency models exist, and where ZooKeeper fits among them. Interested readers may wish to follow along.

Point-in-time consistency (Point in Time Consistency)

I think point-in-time consistency can also be called replica consistency. It is defined as follows:

If all related data components are consistent at every moment, the system is said to have point-in-time consistency.

If you know the CAP theorem, this definition should look familiar. (If not, you can read my article on distributed transactions.)

The C in CAP means that every read returns the result of the most recent write. The point-in-time definition does not explicitly require the data to be the latest, so some readers may object that point-in-time consistency is broader than CAP consistency. In practice, though, consider what happens when one data component updates a value: to preserve point-in-time consistency, all related components must then hold the same value, which means they all hold the latest data. That is the same requirement as CAP: once data is updated on one node, every other node must return the new value.

Still, CAP consistency and point-in-time consistency are not identical. Point-in-time consistency requires the data in all components to agree at every instant, but information cannot travel faster than light, so in reality there is always a short window in which copies differ. CAP consistency only requires that reads return the latest data, which does not need strict agreement at every instant.

Note also that this concept is not limited to distributed systems. On a single machine with a multi-core processor, the same idea applies: at any moment, every core must see the same value of a shared variable.

Transaction consistency

Consistency can describe not only that copies of data change together and hold the same value, but also that data satisfies certain constraints, and transaction consistency is one such constraint. Transaction consistency is what we usually call the C in ACID, defined as follows:

Transaction consistency means that the database must be in a consistent state both before and after a transaction executes. If the transaction completes successfully, all of its changes are applied correctly and the system is in a valid state. If an error occurs during the transaction, all of its changes are automatically rolled back and the system returns to its original state.

Transaction consistency holds only before the transaction starts and after it completes; while it is running, the data may be inconsistent. For example, if A transfers 100 yuan to B, the system deducts 100 from A and adds 100 to B, and the two accounts balance before the transaction starts and after it finishes: that is transaction consistency. During the transaction, however, there may be a moment when 100 yuan has been deducted from A but not yet added to B, which is an inconsistent state.
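To make the transfer example concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The account names, starting balances and the `CHECK (balance >= 0)` rule are assumptions for illustration, and `transfer` and `balances` are hypothetical helpers. The credit is deliberately applied before the debit so that a failing debit shows the rollback undoing work that had already happened inside the transaction.

```python
import sqlite3

# In-memory database; the accounts, balances and CHECK rule are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 150), ("B", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction: all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit shows the rollback undoing the credit.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint fired (src would go negative); both updates are undone.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "A", "B", 100), balances(conn))  # True {'A': 50, 'B': 100}
print(transfer(conn, "A", "B", 100), balances(conn))  # False {'A': 50, 'B': 100}
```

Outside the transaction, the total of 150 never changes; the intermediate state where B has been credited but A not yet debited is only visible inside the transaction and is discarded on failure.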

Beginners often assume that the C in CAP and the C in ACID mean the same thing. In fact, the former means that copies of the same data agree, while the latter expresses a constraint on the data.

Application consistency

Application consistency can be regarded as a kind of constraint consistency. Transaction consistency, described above, covers a single data source. When there are several data sources, such as multiple databases, file systems and caches, we need application consistency, which here is the same thing as distributed transaction consistency.

An application may involve several different stand-alone transactions, and the data is fully consistent only before they start and after all of them complete. For example, when a user is given both coupons and points, the coupon service and the points service are two separate services, each with its own stand-alone transaction. The user's account is correct before the two transactions start and after both complete, but while they are executing, the user may have received the coupons without the points, which is an incorrect state.
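The failure window can be shown with a small simulation. `LocalService` and `reward_user` are hypothetical stand-ins for the coupon and points services: each `grant` call models a local transaction that commits immediately, and the `fail` flag is an assumption used to simulate the points service being unavailable.

```python
class LocalService:
    """Hypothetical stand-alone service: each grant() is its own committed local transaction."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail          # simulate a crash or network error on this service
        self.granted = set()      # users who have received this benefit

    def grant(self, user):
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        self.granted.add(user)    # the local transaction commits right away


def reward_user(user, coupons, points):
    """Give coupons and points as two independent local transactions."""
    coupons.grant(user)           # step 1 commits on its own
    points.grant(user)            # step 2 may fail after step 1 has already committed


coupons = LocalService("coupon-service")
points = LocalService("points-service", fail=True)
try:
    reward_user("alice", coupons, points)
except RuntimeError as err:
    print("reward failed:", err)  # reward failed: points-service unavailable

# Each service is consistent on its own, but the application as a whole is not.
print("alice" in coupons.granted, "alice" in points.granted)  # True False
```

Neither service can roll back the other's committed work by itself, which is why this case needs distributed transaction techniques rather than a single local transaction.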

These three kinds of consistency fall into two categories: consistency of data replicas and consistency of data constraints. The rest of this article focuses on replica consistency; for constraint consistency, see my previous article on distributed transactions.

Consistency models

Before writing this article, I always thought there were only the few commonly mentioned kinds of consistency: strong consistency, weak consistency and eventual consistency. After consulting some of the literature, I found that there are many more. Here I pick some of the more important ones.

If someone asked you which consistency models you know, what would you say? Many people immediately answer: strong consistency and eventual consistency.

Many consistency models were originally devised to describe memory, not distributed systems. On a single-core machine, memory is always strongly consistent. On a multi-core machine, inconsistencies can occur because each processor does not access memory directly but goes through its own cache. In a distributed system, each node can be regarded as an independent processor, so the models first used for memory consistency apply to distributed systems as well. Below I describe some common consistency models, from strongest to weakest.

Linear consistency

Linear consistency (linearizability) is also called atomic consistency or strong consistency. A linearizable system behaves as if there were only one single-core processor, or only one copy of the data, with every operation atomic. In a linearizable distributed system, once one node updates the data, every other node reads the latest value. This is exactly the C in CAP.

As an example of a violation, imagine a flash sale in which you and a friend try to buy the same item at the same time. Your friend's phone already shows the item as sold out, while your phone still shows a few in stock. This violates linear consistency, and it is still a violation even if your phone shows no stock a moment later.
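A minimal simulation of the flash-sale scenario, assuming one primary that applies purchases and one replica that is updated asynchronously. `ReplicatedStock` and its fields are hypothetical; the point is only that a read served by the lagging replica after the purchase has completed returns stale stock.

```python
class ReplicatedStock:
    """Hypothetical primary/replica pair with asynchronous replication."""

    def __init__(self, stock):
        self.primary = stock     # where purchases are applied
        self.replica = stock     # what some clients read from
        self.pending = []        # updates not yet shipped to the replica

    def buy(self):
        if self.primary == 0:
            return False
        self.primary -= 1
        self.pending.append(self.primary)
        return True

    def replicate(self):
        """Ship pending updates; until this runs, the replica lags behind."""
        for value in self.pending:
            self.replica = value
        self.pending.clear()


shop = ReplicatedStock(stock=1)
shop.buy()                       # your friend buys the last item
friend_view = shop.primary       # friend's phone reads the primary
your_view = shop.replica         # your phone reads the lagging replica
print(friend_view, your_view)    # 0 1
shop.replicate()
print(shop.replica)              # 0, but the earlier stale read already violated linearizability
```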

What is linear consistency good for? The book DDIA (Designing Data-Intensive Applications) describes three uses:

Locking and leader election: a master-slave replication system must ensure that there is only one master node, otherwise split brain occurs. A common way to elect a new master is with a lock: every node that starts up tries to acquire the lock, and the node that gets it becomes the master. The lock must be linearizable so that all nodes agree on which node holds it at any given time. ZooKeeper can be used to provide distributed locks, so can we say that ZooKeeper is linearizable? That is only partly correct; the section on sequential consistency explains what kind of consistency ZooKeeper actually provides.

Constraints and uniqueness guarantees: a directory may not contain two files with the same name, and a database primary key cannot be duplicated; both require linearizability. In essence this is similar to locking: saving a file amounts to taking a lock on its name and then saving it, so any later attempt to save under the same name fails.
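Both uses above boil down to an atomic "create if absent" on a single, linearizable copy of the data. The sketch below models that copy in one process, with a mutex making each `create()` atomic. `LinearizableNamespace`, `AlreadyExists` and the `/leader-lock` path are hypothetical and only echo ZooKeeper-style naming; this is not ZooKeeper's API, and a real distributed lock also needs consensus and failure handling.

```python
import threading


class AlreadyExists(Exception):
    """Raised when a name is already taken (hypothetical error type)."""


class LinearizableNamespace:
    """Single-copy key space; one mutex makes create() atomic, hence linearizable."""

    def __init__(self):
        self._entries = {}
        self._mutex = threading.Lock()

    def create(self, name, owner):
        with self._mutex:                     # check and insert as one atomic step
            if name in self._entries:
                raise AlreadyExists(name)
            self._entries[name] = owner

    def get(self, name):
        with self._mutex:
            return self._entries.get(name)


ns = LinearizableNamespace()


def try_become_leader(node_id, winners):
    """Leader election: whoever creates the lock entry first holds the lock."""
    try:
        ns.create("/leader-lock", node_id)
        winners.append(node_id)
    except AlreadyExists:
        pass                                  # another node already holds the lock


winners = []
threads = [threading.Thread(target=try_become_leader, args=(n, winners))
           for n in ("node-1", "node-2", "node-3")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(winners), ns.get("/leader-lock") == winners[0])  # 1 True

# Uniqueness: saving a second file under the same name is rejected.
ns.create("report.txt", "alice")
try:
    ns.create("report.txt", "bob")
except AlreadyExists as err:
    print("duplicate name rejected:", err)    # duplicate name rejected: report.txt
```

Because every node sees the same single copy, all nodes agree on who holds the lock and which names are taken; with several copies updated asynchronously, two nodes could each believe they won.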

Cross-channel timing dependencies: why did the earlier flash-sale example count as a violation? Because there was a second channel: your friend told you in advance that the item was sold out. Computer systems can have multiple channels too. Suppose a user spends 50 yuan: 50 yuan is deducted from their balance, and the event is published to a message queue. An SMS service then queries the user's balance and sends a text message. If the balance database it reads from has not been updated yet, the SMS may contain the user's old balance. The inconsistency comes from the extra channel, just like the friend who told us the item was sold out. One fix is to control that channel, for example by passing the user's balance in the message as a parameter, or by reading only from the master database. In the flash-sale case, the equivalent is to look at your friend's phone instead of your own.
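The two options for the SMS service can be compared in a short sketch. `BalanceDB`, `pay` and the two `sms_*` functions are hypothetical, the starting balance of 100 yuan is an assumption, and the message queue is an in-memory `deque` standing in for a real broker.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class BalanceDB:
    primary: int
    replica: int           # updated asynchronously, so it may lag behind


message_queue = deque()    # the second "channel"


def pay(db, amount):
    db.primary -= amount
    # The event also carries the new balance, so consumers need not look it up.
    message_queue.append({"amount": amount, "balance": db.primary})


def sms_from_replica(db, msg):
    return f"Spent {msg['amount']}, balance {db.replica}"      # may be stale


def sms_from_message(msg):
    return f"Spent {msg['amount']}, balance {msg['balance']}"  # value travels with the event


db = BalanceDB(primary=100, replica=100)
pay(db, 50)                       # replication to db.replica has not happened yet
msg = message_queue.popleft()
print(sms_from_replica(db, msg))  # Spent 50, balance 100  (old balance)
print(sms_from_message(msg))      # Spent 50, balance 50
```

Carrying the balance in the message removes the dependency on replication timing; reading from the primary would also work, at the cost of extra load on it.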

Sequential consistency

Sequential consistency is weaker than strict consistency. Writes to a variable do not have to be seen instantly, but writes to variables by different processors must be seen in the same order on all processors. In a distributed system, "processors" can be read as "nodes".

Now back to ZooKeeper: what consistency does it provide? Many interview questions ask whether ZooKeeper is CP or AP. Many people answer that ZooKeeper is CP, but that answer is not rigorous. As the section on linear consistency showed, the C in CAP means linear consistency, so is ZooKeeper linearizable? The answer is no. A write is handled by the leader, and the Zab protocol only requires a majority (quorum) of servers to acknowledge it, so some followers may still hold old data. A client reading from such a follower may not see the latest value, which breaks linear consistency.

What ZooKeeper actually provides is sequential consistency: it uses the zxid (ZooKeeper Transaction Id) to impose one global order on all updates. You can also say that ZooKeeper's writes are linearizable while its reads are sequentially consistent. Followers apply the leader's broadcasts in zxid order, so ZooKeeper cannot guarantee that every client sees an update immediately, but every client will eventually see it. ZooKeeper can in fact be made linearizable: it has a sync() command, and if we call sync() to force the data to synchronize before every read, we can ensure the read is up to date.
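The zxid ordering and the effect of sync() can be modeled in a few lines. This is a toy model, not ZooKeeper: `Leader`, `Follower` and `apply_up_to` are hypothetical, and quorum acknowledgements, sessions and leader changes are ignored. In the real Java client, sync is the asynchronous `sync(path, callback, ctx)` method, called before reading.

```python
class Leader:
    """Toy leader: assigns each write the next zxid and appends it to a log."""

    def __init__(self):
        self.zxid = 0
        self.log = []                          # committed (zxid, path, value) entries

    def write(self, path, value):
        self.zxid += 1
        self.log.append((self.zxid, path, value))
        return self.zxid


class Follower:
    """Toy follower: applies the leader's log strictly in zxid order."""

    def __init__(self, leader):
        self.leader = leader
        self.last_zxid = 0                     # highest zxid applied locally
        self.data = {}

    def apply_up_to(self, zxid):
        # Log index i holds zxid i + 1, so unapplied entries start at index last_zxid.
        for entry_zxid, path, value in self.leader.log[self.last_zxid:zxid]:
            self.data[path] = value
            self.last_zxid = entry_zxid

    def read(self, path):
        return self.data.get(path)             # served locally, possibly stale

    def sync(self):
        # Model of sync(): catch up with everything the leader has committed so far.
        self.apply_up_to(self.leader.zxid)


leader = Leader()
follower = Follower(leader)
leader.write("/config", "v1")
leader.write("/config", "v2")
follower.apply_up_to(1)          # this follower has only received zxid 1 so far
print(follower.read("/config"))  # v1: stale, but updates are never applied out of order
follower.sync()
print(follower.read("/config"))  # v2
```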

Sequential consistency was proposed by Lamport, the author of the Paxos algorithm, and was originally used only to define the consistency of memory in multiprocessors. In "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs" he defines sequential consistency as follows:

The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

Roughly, this says that a multiprocessor execution has the same effect as some execution on a single processor, with each processor's operations appearing in that single sequence in the order its program specifies. The model was originally intended for concurrent programming, but enforcing it strictly costs a multiprocessor much of its advantage over a single processor; it was later adopted for distributed systems. In ZooKeeper, all writes go to the leader and all updates are applied in zxid order: that is the "sequential order" in the definition, and the sequence is ordered by zxid.
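Lamport's definition can be turned directly into a brute-force check: search for one total order that keeps each processor's program order and explains every read. The sketch below assumes each operation is encoded as `("w", var, value)` or `("r", var, value_seen)` with all variables starting at 0; `is_sequentially_consistent` is a hypothetical helper, and its search is exponential, so it only suits tiny histories.

```python
def is_sequentially_consistent(programs, initial=0):
    """Search for a single total order that explains every read.

    programs: one list of operations per processor, each ("w", var, value)
              or ("r", var, value_seen), in that processor's program order.
    """
    def search(positions, memory):
        if all(pos == len(prog) for pos, prog in zip(positions, programs)):
            return True                                 # every operation has been placed
        for i, prog in enumerate(programs):
            if positions[i] == len(prog):
                continue
            kind, var, value = prog[positions[i]]       # next op in program order
            if kind == "r" and memory.get(var, initial) != value:
                continue                                # this read cannot happen here
            new_memory = dict(memory)
            if kind == "w":
                new_memory[var] = value
            next_positions = positions[:i] + (positions[i] + 1,) + positions[i + 1:]
            if search(next_positions, new_memory):
                return True
        return False

    return search(tuple(0 for _ in programs), {})


# p1 writes x=1, p2 writes x=2; p3 and p4 each read x twice.
ok = [[("w", "x", 1)], [("w", "x", 2)],
      [("r", "x", 1), ("r", "x", 2)], [("r", "x", 1), ("r", "x", 2)]]
bad = [[("w", "x", 1)], [("w", "x", 2)],
       [("r", "x", 1), ("r", "x", 2)], [("r", "x", 2), ("r", "x", 1)]]
print(is_sequentially_consistent(ok))   # True
print(is_sequentially_consistent(bad))  # False: p3 and p4 disagree on the write order
```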

Causal consistency

Causal consistency is weaker than sequential consistency. Sequential consistency requires all operations to appear in one order, as if executed on a single processor (node). Causal consistency only requires that causally related operations be seen in that consistent order.

How should we understand causality? Simply put, if someone asks you a question and you then give the answer, the two are causally related; if you gave the answer before the question was asked, that would violate causality.

For a simple example, node 1 updates data A, then node 2 reads A and updates data B. Because B may have been computed from A, the two updates are causally related. If node 3 sees the updated B before the updated A, causal consistency is broken.
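One way for node 3 to avoid this is to hold back an update until everything it depends on has been applied. The sketch assumes each update arrives with an explicit set of dependency ids; `CausalReplica` is hypothetical, and real systems often track these dependencies with vector clocks instead.

```python
class CausalReplica:
    """Hypothetical replica that delays an update until its dependencies are applied."""

    def __init__(self):
        self.applied = []          # update ids in the order they became visible
        self.buffer = []           # (update_id, deps) waiting for missing dependencies

    def receive(self, update_id, deps=()):
        self.buffer.append((update_id, set(deps)))
        progress = True
        while progress:            # keep applying until nothing else is unblocked
            progress = False
            for item in list(self.buffer):
                uid, needed = item
                if needed <= set(self.applied):
                    self.applied.append(uid)
                    self.buffer.remove(item)
                    progress = True


node3 = CausalReplica()
node3.receive("B", deps={"A"})   # B (computed from A) happens to arrive first
print(node3.applied)             # []  -> B is held back
node3.receive("A")
print(node3.applied)             # ['A', 'B']  -> A always becomes visible before B
```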

Processor consistency

Processor consistency is weaker still. Besides seeing each processor's own writes in the order they were issued, all processors only need to see writes to the same location in one consistent order, whether those writes come from a single processor or from several. Causality does not need to be considered, but updates to the same memory location or the same data item must be seen in a consistent order.

FIFO consistency

FIFO consistency is weaker than processor consistency: it does not require writes to the same location to be seen in the same order.

All writes made by one processor are seen by every other processor in the order in which they actually occurred, but writes made by different processors may be seen by other processors in a different order. This reflects the fact that network delays between different nodes in a distributed system may differ. To illustrate the difference from processor consistency, here is an example:
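The example can be encoded as data and checked mechanically. The sketch assumes one plausible layout: p1 performs w(x)1, p2 performs w(x)2, p3 sees 1 then 2, and p4 sees 2 then 1. The write tuples and both checker functions are hypothetical, and `processor_consistent` assumes every reader observes every write.

```python
# A write is identified as (writer, variable, value); the layout is an assumption.
W1 = ("p1", "x", 1)
W2 = ("p2", "x", 2)

issued = {"p1": [W1], "p2": [W2]}               # each writer's program order
observed = {"p3": [W1, W2], "p4": [W2, W1]}     # order in which each reader saw them


def fifo_consistent(issued, observed):
    """Every reader sees each writer's writes in that writer's issue order."""
    for seen in observed.values():
        for writes in issued.values():
            order = [w for w in seen if w in writes]
            if order != writes[:len(order)]:
                return False
    return True


def processor_consistent(issued, observed):
    """FIFO, plus all readers agree on the order of writes to each variable.

    Assumes every reader observes every write.
    """
    if not fifo_consistent(issued, observed):
        return False
    variables = {var for writes in issued.values() for _, var, _ in writes}
    for var in variables:
        orders = {tuple(w for w in seen if w[1] == var) for seen in observed.values()}
        if len(orders) > 1:        # two readers disagree on the order of writes to var
            return False
    return True


print(fifo_consistent(issued, observed))       # True
print(processor_consistent(issued, observed))  # False
```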

In the figure above, processor consistency is violated. Why? Because the writes occurred in the order w(x)1, then w(x)2, so p4 should read R(x)1 and then R(x)2. The execution does, however, satisfy FIFO consistency: FIFO only requires each processor's writes to be seen by other processors or nodes in the order they occurred, and does not require writes to the same location to be seen in the same order.

Eventual consistency

In fact, every model other than strong consistency can be regarded as a form of eventual consistency; the more specific models were derived from the different ordering requirements of different systems. The simplest form of eventual consistency ignores the order of intermediate changes entirely and only requires the copies to agree at some point in time, and how soon that point must arrive depends on the system and the business. Until the data has converged, a read may return any value, with no ordering guarantee for those values.
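A small simulation of "any value in between, same value in the end". It assumes last-writer-wins merging on `(timestamp, replica_id)`, which is one common way to make replicas converge but is not prescribed by the text; the updates, replica ids and seeds are illustrative.

```python
import random


def apply_lww(state, update):
    """Last-writer-wins merge: keep whichever update has the larger (timestamp, replica_id)."""
    ts, replica_id, value = update
    if state is None or (ts, replica_id) > state[:2]:
        return update
    return state


updates = [(1, "r1", "a"), (2, "r2", "b"), (3, "r1", "c")]

replicas = []
for seed in range(3):
    order = updates[:]
    random.Random(seed).shuffle(order)   # each replica receives the updates in its own order
    state, history = None, []
    for u in order:
        state = apply_lww(state, u)
        history.append(state[2])         # what a reader could have seen along the way
    replicas.append((history, state[2]))

for history, final in replicas:
    print(history, "->", final)          # intermediate values may differ; final is always 'c'
```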

The E in BASE theory stands for eventual consistency.

What is the use of consistency models?

From the models described above, we can see that the stronger the consistency, the more constraints it imposes and the higher the cost of implementing it. For example, for ZooKeeper to achieve full linear consistency, it has to call sync() before every read to synchronize the data.

In real systems, consider master-slave database replication (replication through the binlog is also sequentially consistent): the main purpose of the slave databases is to relieve read pressure on the master. If we blindly insisted on linear consistency, every read would have to go to the master, and the slaves would be nearly pointless.

Different system models and business requirements therefore call for different levels of consistency, which is why it is worth understanding these consistency models.


