
The consistency principle of Zookeeper


This article introduces the consistency principle of Zookeeper. Many people get confused about it in practice, so let's walk through it step by step. I hope you find it worth a careful read.

"Zookeeper is eventually consistent: because of its multiple replicas and the majority-based Zab protocol, when one client process writes a new value, another client process is not guaranteed to read it immediately, but it is guaranteed to read it eventually."

Zookeeper's Zab protocol is similar to the Paxos protocol and provides strong consistency.

Whenever I hear these two statements, I want to correct them. Zookeeper provides sequential consistency (Sequential Consistency), though the explanation is more involved. I will share my view below.

What is Sequential Consistency?

As we can see from the Zookeeper documentation, it is clearly stated that its consistency is Sequential Consistency.

So, what is Sequential Consistency?

Sequential Consistency was proposed by Lamport in 1979. According to the definition in that paper, a system is sequentially consistent when the following condition is met:

The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

This English definition is very obscure, which is the usual style of Lamport, rigorous but obscure, and so is the Paxos protocol.

When I first read it, my reaction was, "what on earth is this?" I knew every English word, yet I had no idea what he was saying.

Let's first look at a keyword in the title and definition of this paper to illustrate the scope of application of Sequential Consistency.

The title and definition of the paper include the word "multiprocessor". A multiprocessor is a machine with multiple processors, what we would nowadays loosely call a multi-core processor.

Judging from this keyword, Sequential Consistency is a feature used to define multi-core processors and programs running on multi-core processors.

The title of Lamport's paper can be translated as: "how to make computers with multi-core processors execute multi-process programs correctly?"

That is to say, if a multi-core processor has the characteristics of Sequential Consistency, the multi-core processor will run correctly, and I will explain what this correct operation means later (that is, the role of Sequential Consistency later in this article).

We can also see from this title that Sequential Consistency should be a concept in the field of concurrent programming (Concurrent Programming).

But now we often talk about Sequential Consistency in the field of distributed systems, such as the consistency of Zookeeper (Zookeeper is obviously a distributed system), which is mainly discussed in this article.

In fact, multiple programs running on a multi-core processor are also a distributed system (Lamport made this point in his pioneering distributed-systems paper, "Time, Clocks, and the Ordering of Events in a Distributed System").

So although Sequential Consistency was first proposed in concurrent programming, it can be used in distributed systems, such as Zookeeper, a distributed storage system discussed in this article.

Another important model, Linearizability (linearizability), was also first proposed in concurrent programming and is likewise widely used in the field of distributed systems.

Next, let's try to translate the obscure definition of the above paragraph. Doing this translation gives me the feeling of reading comprehension at school.

I won't translate it word for word at first, because even a direct translation would leave many people thinking, "why do I understand every word and still don't know what it is saying?"

Let me first explain a few individual terms. What does "any execution" mean? Suppose you have multiple programs (Program) running on a multi-core processor, for example two programs.

The first program is called P1, and its code is as follows:

P1_write(x); P1_read(y)

The second program is called P2 and the code is as follows:

P2_write(u); P2_read(v)

Theoretically, how many kinds of execution are possible when these two programs run on two independent processor cores? Let me list several to illustrate.

The first kind:

P1: ----------------------write(x)--read(y)--
P2: --write(u)--read(v)------------------------

The second kind:

P1: --write(x)--read(y)------------------------
P2: ----------------------write(u)--read(v)--

The third kind:

P1: ----------------------read(y)--write(x)--
P2: --write(u)--read(v)------------------------

In total there are 24 possible execution orders, that is, every arrangement of these four operations: 4! = 24.

Possibilities like the first and the second are easy to understand. Why is there a possible execution like the third?

That is because, even within the same program, due to multi-level caches and reordering inside the processor, although your program says write before read, the order in which the operations actually take effect in memory may be read first and then write.

In fact, there will also be an execution similar to the following, where two operations are performed on both processors at the same time.

P1: --write(x)--read(y)--
P2: --write(u)--read(v)--

If you add in the case of running at the same time, there are more possibilities. My arithmetic is not very good, so I won't go on counting here, because it doesn't matter how many there are. The important thing is to know that there are many possibilities.
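To make the reordering point concrete, here is a minimal Java sketch (my own construction, not from Lamport's paper): two threads play the roles of P1 and P2 on plain, non-volatile shared fields. Under sequential consistency the outcome r1 == 0 && r2 == 0 would be impossible, because at least one thread must see the other's write; on real hardware and JVMs, store buffering and compiler reordering make it appear sooner or later (how quickly depends on your machine).

public class ReorderingDemo {
    static int x, y;        // deliberately plain (non-volatile) shared fields
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        for (long i = 0; ; i++) {
            x = 0; y = 0; r1 = -1; r2 = -1;

            Thread p1 = new Thread(() -> { x = 1; r1 = y; });  // like "write(x); read(y)"
            Thread p2 = new Thread(() -> { y = 1; r2 = x; });  // like "write(u); read(v)"
            p1.start(); p2.start();
            p1.join();  p2.join();

            // Under sequential consistency at least one thread observes the other's
            // write, so (r1, r2) == (0, 0) is impossible. On real processors it happens.
            if (r1 == 0 && r2 == 0) {
                System.out.println("observed r1 == 0 && r2 == 0 after " + i + " runs");
                return;
            }
        }
    }
}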

So the term "any execution" in the definition refers to any one of these possible executions; you can also read it as covering all of them.

Then let's explain another term, "sequential order". What is sequential order? Let's look it up in an English dictionary (this feels even more like reading comprehension).

Sequential: continuous; successive; sequential

Order: orders; order; rules; [trade] orders

Put together, sequential order is "successive order". What on earth is that supposed to mean?

In fact, sequential means one after another. In the context of a processor, sequential means that operations are performed one after another, that is, serial execution without overlap.

Order means arranging things according to some rule. For example, a sorting algorithm performs ordering: it arranges an array from largest to smallest or from smallest to largest.

So sequential order means arranging the operations one after another, without overlap.

Going back to the example above, if you arrange the four operations one after another, you get 4! = 24 possible permutations (orders). Again, the exact number does not matter.

For example:

P1_write(x); P1_read(y); P2_write(u); P2_read(v)
P1_read(y); P1_write(x); P2_write(u); P2_read(v)
P2_write(u); P2_read(v); P1_read(y); P1_write(x)

I will only list three of them here; you can enumerate the rest yourself.

The point is, in fact, sequential order is to allow these four operations to be performed one after another without overlap.

Note that such an arrangement is not a real execution; the real execution is the "any execution" part. The arrangement is a logical construct, which is why the definition says "as if".

After laying the groundwork, let's begin to translate the sentences in the definition:

The effect of any possible execution is the same as if the operations of all processors were executed in some sequential order.

Note that "some" here means "a certain one", not "several", because "order" is singular (this really is reading comprehension).

This means that if a processor wants to satisfy this condition, it can only allow the possible executions that satisfy it; the other possible executions must not occur.

We can see from the sentence that if a multi-core processor wants to meet Sequential Consistency, the effect of multiple programs running on multiple cores is the same as performing all operations sequentially on one core.

If that were the case, the power of multiple cores would largely disappear. That is why no real multi-core processor has implemented Sequential Consistency, neither when Lamport wrote the paper in 1979 nor today.

So why did Lamport come up with such an unrealistic concept? (Note that when Lamport wrote this paper, he had not yet extended it to distributed systems; it stayed within multi-core processors and concurrent programming.) We will discuss that later.

Another thing to note is that my translation above used the word "effect", but the original English text uses the word "result". Is there any difference between effect and result? Let me explain what the execution result is.

Whether it is a real execution or a hypothetical ordered execution, the programs produce certain results, such as printed output. What the definition says is that these results are the same.

If "effect" is used in the definition, then the definition is only a qualitative definition, and if "result" is used, then the definition is a quantitative definition. Quantitative, that is to say, it can be proved mathematically.

From this we can see how the master differs from the rest of us: the theory can be proved correct mathematically. The proof will be touched on later in this article; let me keep you in suspense for now.

At this point, a more accurate translation of the definition of the sentence is:

The result of any possible execution is the same as executing all the operations sequentially, in some order, on a single processor.

Note also that "the same result" means that if someone really wanted to build a sequentially consistent multi-core processor, they would only have to guarantee the same results, not a literally sequential execution, which leaves them some room for optimization.

However, that room for optimization is presumably very limited. Well, the most difficult sentence is finally explained; you can breathe a sigh of relief, because the second sentence is much simpler.

Let's first explain a word that appears in the second sentence: "sequence". The "sequential order" just explained is the act of arranging the operations one after another.

That is an action; the action produces a result, and the result is a queue of operations. The "sequence" in the second sentence refers to this queue of operations.

OK, the translation of the second sentence is:

And the operations of each individual processor appear in this operation queue in the order specified by its program.

That is, if the program first does write(x) and then read(y), only the operation queues that preserve this order are eligible.

In this way, many of the orderings we just talked about are ruled out. I won't count how many are left; as before, the exact number does not matter, only that there are fewer.
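As a small illustration (again my own sketch), the following Java program enumerates all permutations of the four operations and keeps only those in which each program's own two operations stay in program order; it prints the 6 surviving operation queues out of the 24 permutations.

import java.util.ArrayList;
import java.util.List;

public class SequentialOrders {
    static final String[] OPS = {"P1_write(x)", "P1_read(y)", "P2_write(u)", "P2_read(v)"};

    public static void main(String[] args) {
        List<List<String>> valid = new ArrayList<>();
        permute(new ArrayList<>(), new boolean[OPS.length], valid);
        valid.forEach(System.out::println);
        System.out.println(valid.size() + " orders respect program order (out of 24)");
    }

    // Build every permutation of the four operations, but only keep those in
    // which each program's own operations appear in program order.
    static void permute(List<String> prefix, boolean[] used, List<List<String>> out) {
        if (prefix.size() == OPS.length) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (int i = 0; i < OPS.length; i++) {
            if (used[i]) continue;
            // Program order: OPS[0] before OPS[1] (P1), OPS[2] before OPS[3] (P2).
            if (i == 1 && !used[0]) continue;
            if (i == 3 && !used[2]) continue;
            used[i] = true;
            prefix.add(OPS[i]);
            permute(prefix, used, out);
            prefix.remove(prefix.size() - 1);
            used[i] = false;
        }
    }
}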

So what does the second sentence mean? It means that if a multi-core processor implements Sequential Consistency, it basically has to say goodbye to optimizations such as caching and out-of-order execution.

Here I will keep you in suspense again: if even the cache, the most effective optimization for improving processor performance, cannot be used, why did Lamport propose this concept?

All right, here we can translate the two sentences together and take a complete look at them:

The result of any possible execution is the same as if all the operations were executed sequentially, in some order, on a single processor, and the operations of each individual processor appear in that operation queue in the order specified by its program.

From this definition, we can see that the core of the concept is sequential order, which is why Lamport named this consistency model Sequential Consistency.

The name is very apt. I do not know whether it is easier for native English speakers to understand; perhaps they never have the "what on earth is this" reaction. If you find "sequential" fitting after reading this article, then I have explained it clearly.

Next, let's give a specific example to illustrate:

Execution A
P0: write x=1
P1: write x=2
P2: read x==1, read x==2
P3: read x==1, read x==2
Sequential order: P0_write x=1; P2_read x==1; P3_read x==1; P1_write x=2; P2_read x==2; P3_read x==2

Execution B
P0: write x=1
P1: write x=2
P2: read x==2, read x==1
P3: read x==2, read x==1
Sequential order: P1_write x=2; P2_read x==2; P3_read x==2; P0_write x=1; P2_read x==1; P3_read x==1

Execution C
P0: write x=1
P1: write x=2
P2: read x==1, read x==2
P3: read x==2, read x==1
Sequential order: no order can be found that satisfies both conditions in the definition.

So if a multi-core processor only allows executions A and B, but not C, then it is sequentially consistent; if it allows C, it is not sequentially consistent.
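To make the A/B/C discussion mechanical, here is a brute-force checker sketch in Java (my own construction). It searches for one total order that respects each process's program order and in which every read returns the value of the most recent preceding write (0 if there is none); executions A and B admit such an order, execution C does not.

import java.util.ArrayList;
import java.util.List;

public class SeqConsistencyChecker {

    // One operation: a write of `value`, or a read that observed `value` (requires a recent JDK for records).
    record Op(int process, boolean isWrite, int value) {}

    public static void main(String[] args) {
        check("A", execution(1, 2, 1, 2));   // both readers see 1 then 2
        check("B", execution(2, 1, 2, 1));   // both readers see 2 then 1
        check("C", execution(1, 2, 2, 1));   // readers disagree on the order of the writes
    }

    static List<Op> execution(int p2a, int p2b, int p3a, int p3b) {
        List<Op> ops = new ArrayList<>();
        ops.add(new Op(0, true, 1));                              // P0: write x = 1
        ops.add(new Op(1, true, 2));                              // P1: write x = 2
        ops.add(new Op(2, false, p2a)); ops.add(new Op(2, false, p2b));
        ops.add(new Op(3, false, p3a)); ops.add(new Op(3, false, p3b));
        return ops;
    }

    static void check(String name, List<Op> ops) {
        boolean ok = search(ops, new boolean[ops.size()], new int[4], 0, 0);
        System.out.println("Execution " + name + ": "
                + (ok ? "sequentially consistent" : "NOT sequentially consistent"));
    }

    // Try to build one total order that (1) keeps each process's program order and
    // (2) lets every read return the most recently written value (0 before any write).
    static boolean search(List<Op> ops, boolean[] used, int[] doneByProc, int placed, int lastWritten) {
        if (placed == ops.size()) return true;
        for (int i = 0; i < ops.size(); i++) {
            Op op = ops.get(i);
            if (used[i]) continue;
            // Program order: only the next unplaced op of each process may be chosen.
            if (indexWithinProcess(ops, i) != doneByProc[op.process()]) continue;
            if (!op.isWrite() && op.value() != lastWritten) continue;
            used[i] = true; doneByProc[op.process()]++;
            int next = op.isWrite() ? op.value() : lastWritten;
            if (search(ops, used, doneByProc, placed + 1, next)) return true;
            used[i] = false; doneByProc[op.process()]--;
        }
        return false;
    }

    static int indexWithinProcess(List<Op> ops, int i) {
        int k = 0;
        for (int j = 0; j < i; j++) if (ops.get(j).process() == ops.get(i).process()) k++;
        return k;
    }
}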

At this point, we have fully explained what Sequential Consistency is. However, a careful reader may ask: what if your program is multithreaded? Then let's explain one more detail in the definition: the word "program".

Program here refers to a sequence of instructions that runs directly on the processor. This is not a strict definition, but I want to point out that this kind of program dates from the early days before operating systems existed, and "program" in the definition above refers to the programs of that era.

Such a program has no concept of processes or threads; those concepts only appeared with operating systems. And because there is no operating system, there is no concept of separate memory address spaces either.

Unlike the programs we write today, which each have their own independent memory address space, here memory is shared among the different programs.

Also note that "program" covers all kinds of programs, whether an operating system kernel or an application.

Sequential Consistency is also a concept in the distributed domain

As we just said, although Sequential Consistency originally targeted concurrent programming, it is also used in the distributed domain, especially in distributed storage systems.

In Distributed Systems: Principles and Paradigms (by Andrew S. Tanenbaum and Maarten van Steen), the authors slightly modified Lamport's definition to bring it closer to distributed-systems concepts.

Let's take a look at how the author changed it:

The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.

The authors replaced "processor" with "process" and added the qualification "on the data store"; Lamport's definition has no such qualification, but it implicitly refers to memory.

Process here refers to an operating-system process; in the Zookeeper case, it is the application process that accesses Zookeeper. It is no longer the low-level "program" of Lamport's definition, but an application running on an operating system.

The role of Sequential Consistency

All right, now it is time to reveal the two points I left in suspense above. Lamport's paper gives a small example:

process 1: a := 1; if b = 0 then critical section; a := 0 else ... fi
process 2: b := 1; if a = 0 then critical section; b := 0 else ... fi

Lamport says in the paper that if a multi-core processor satisfies Sequential Consistency, then at most one of the two programs can enter its critical section.
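Here is the same example rendered as a Java sketch (my own rendering, not from the paper). Plain Java fields are not sequentially consistent, so a and b are declared volatile; for this particular pattern, volatile accesses give the sequentially consistent behavior Lamport assumes, and then at most one of the two threads can enter its critical section in any run.

public class LamportExample {
    // volatile makes these accesses behave, for this pattern, like the
    // sequentially consistent memory Lamport assumes; without it the
    // reasoning below breaks on real JVMs and hardware.
    static volatile int a = 0, b = 0;

    // process 1: a := 1; if b = 0 then critical section; a := 0 fi
    static void process1() {
        a = 1;
        if (b == 0) {
            // critical section
            a = 0;
        }
    }

    // process 2: b := 1; if a = 0 then critical section; b := 0 fi
    static void process2() {
        b = 1;
        if (a == 0) {
            // critical section
            b = 0;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(LamportExample::process1);
        Thread t2 = new Thread(LamportExample::process2);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // With sequentially consistent memory, at most one of the two threads
        // can have entered its critical section in this run.
    }
}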

In the paper, Lamport does not explain why at most one program can enter the critical section.

Instead, this proof is left to the reader of the paper, just like the after-school exercises in our common textbooks.

Lamport presumably thought the proof too simple to spend ink on. The Sequential Consistency paper is less than two A4 pages, the shortest paper I have ever seen.

This is Lamport's consistent style; many details in his Paxos papers are likewise glossed over, leaving readers with endless room for imagination.

Suppose now that the claim is proved correct (I did not prove it myself; the paper cites two references that do). What does this example illustrate?

You may have noticed that no locks are used in this example, yet it implements a critical section, which is a synchronization mechanism for concurrent programs.

If the multi-core processor is sequentially consistent, then the concurrent programs you write are "naturally correct".

However, in order to pursue performance, processor designers leave the task of ensuring that the program is correct to the program developers.

Only a few instructions such as fences and CAS are provided at the hardware level. On top of these instructions, various synchronization mechanisms are built to ensure the correctness of the operating system and of applications.
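As an illustration of building a synchronization mechanism on top of a CAS primitive, here is a minimal spinlock sketch using Java's AtomicBoolean.compareAndSet (the class and method are real JDK APIs; the lock itself is a toy with no fairness, reentrancy, or blocking). Real locks in the JDK, such as java.util.concurrent.locks.ReentrantLock, are built on the same kind of primitive, with queueing and blocking added.

import java.util.concurrent.atomic.AtomicBoolean;

public class CasSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    // compareAndSet maps to a hardware CAS-style instruction; it atomically
    // flips the flag from false to true, and fails if someone else got there first.
    public void lock() {
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();   // busy-wait until the lock is released
        }
    }

    public void unlock() {
        locked.set(false);         // volatile write, visible to the spinning threads
    }
}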

Programmers must handle threads and these synchronization mechanisms carefully, or all sorts of unexpected problems will occur. If you have never spent two days straight of overtime debugging a multithreaded bug, you are lucky indeed.

These instructions provide an even higher level of consistency, namely linearizability, but only for individual instructions; the processor as a whole provides a much weaker level of consistency than Sequential Consistency, which greatly reduces the difficulty of implementation.

Although Lamport's concept of Sequential Consistency has no practical significance in the Concurrent Programming field, it shows us where the programmer's paradise is.

In the programmer's paradise there is no multithreading and there are no locks; you just write your program. No one asks you about multithreaded programming or locks in interviews.

In the distributed world, Sequential Consistency is more practical. Zookeeper implements Sequential Consistency.

By the same token, this should also be provable, but so far the Zookeeper community has not published a paper proving it.

If you have understood the definition explained above, you can reason for yourself that Zookeeper is sequentially consistent. You are welcome to discuss it.

Why does Zookeeper implement Sequential Consistency?

In fact, Zookeeper's consistency is more nuanced: Zookeeper's read operations are sequentially consistent, while its write operations are linearizable.

This statement is not written in Zookeeper's official documentation, but it is discussed in detail on the community mailing list.

In addition, this point of view is also mentioned in the Zookeeper-related paper Modular Composition of Coordination Services (not one of the mainstream Zookeeper papers, but it analyzes Zookeeper's characteristics and its cross-datacenter deployment comprehensively, and some of its viewpoints were also referenced in ele.me's Zookeeper rework).

We can understand Zookeeper this way: taken as a whole (reads plus writes) it provides Sequential Consistency, while the write path alone provides linearizability.
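One practical consequence: a read served by a follower may lag behind the latest committed write, and a client that needs an up-to-date value can call sync() before reading. Below is a hedged sketch using the official org.apache.zookeeper Java client; the connect string and the /demo znode are placeholders of my own, and error handling is omitted.

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class StaleReadDemo {
    public static void main(String[] args) throws Exception {
        // Two independent client sessions, possibly connected to different servers.
        ZooKeeper writer = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 3000, event -> {});
        ZooKeeper reader = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 3000, event -> {});

        writer.create("/demo", "v1".getBytes(StandardCharsets.UTF_8),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Writer updates the znode; the write is committed by a quorum (Zab).
        writer.setData("/demo", "v2".getBytes(StandardCharsets.UTF_8), -1);

        // The reader's server may not have applied the write yet, so this read
        // is allowed to return the old value "v1" (reads are not linearizable).
        byte[] maybeStale = reader.getData("/demo", false, null);

        // sync() asks the reader's server to catch up with the leader, so the
        // read that follows observes the latest committed write.
        CountDownLatch synced = new CountDownLatch(1);
        reader.sync("/demo", (rc, path, ctx) -> synced.countDown(), null);
        synced.await();
        byte[] upToDate = reader.getData("/demo", false, null);

        System.out.println(new String(maybeStale, StandardCharsets.UTF_8));
        System.out.println(new String(upToDate, StandardCharsets.UTF_8));

        writer.close();
        reader.close();
    }
}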

With a little reasoning, the small example from Lamport's paper also holds in Zookeeper, so we could implement distributed locks in that way.

However, the distributed lock recipe officially recommended by Zookeeper does not take this approach; it relies on Zookeeper's linearizable writes to implement distributed locks.
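For reference, here is a condensed sketch of that officially recommended recipe (an ephemeral sequential znode per contender, plus a watch on the immediate predecessor), written against the plain org.apache.zookeeper API. It assumes the lock root znode already exists and omits reconnection and error handling, so treat it as an outline rather than production code.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SimpleZkLock {
    private final ZooKeeper zk;
    private final String lockRoot;   // e.g. "/locks/resource-1" (must already exist)
    private String myNode;

    public SimpleZkLock(ZooKeeper zk, String lockRoot) {
        this.zk = zk;
        this.lockRoot = lockRoot;
    }

    public void lock() throws Exception {
        // 1. Create an ephemeral sequential child; its sequence number is our ticket.
        myNode = zk.create(lockRoot + "/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String myName = myNode.substring(lockRoot.length() + 1);

        while (true) {
            List<String> children = zk.getChildren(lockRoot, false);
            Collections.sort(children);

            // 2. The lowest sequence number holds the lock.
            if (myName.equals(children.get(0))) {
                return;
            }

            // 3. Otherwise watch only the node immediately before ours and wait.
            String previous = children.get(children.indexOf(myName) - 1);
            CountDownLatch gone = new CountDownLatch(1);
            if (zk.exists(lockRoot + "/" + previous,
                    event -> { if (event.getType() == EventType.NodeDeleted) gone.countDown(); }) != null) {
                gone.await();
            }
            // Loop and re-check: the previous holder may have died rather than unlocked.
        }
    }

    public void unlock() throws Exception {
        zk.delete(myNode, -1);
    }
}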

Why does Zookeeper implement Sequential Consistency? Zookeeper's core function is coordination, that is, providing distributed lock services. In a distributed environment, how can Zookeeper itself be "naturally correct"?

There is no other synchronization mechanism to guarantee that Zookeeper is correct, so as long as Zookeeper implements Sequential Consistency, it can guarantee its correctness and provide locking services to the outside world.

This is the end of the introduction to the consistency principle of Zookeeper. Thank you for reading.
