This article introduces how a distributed server system handles transactions. Many people have questions about transaction processing in distributed systems, so the editor has gathered material and organized it into a simple, practical walkthrough. I hope it helps you answer the question "how does a distributed server system process transactions?" Please follow along.
When we use a single server to provide data services in production, we usually run into two problems:
1) the performance of one server is not enough to provide sufficient capacity to serve all network requests.
2) We are always afraid that our server will go down, resulting in service unavailability or data loss.
So we have to scale out our servers: add more machines to share the load and eliminate the single point of failure. Typically, we scale a data service in two ways:
1) data partitioning: the data is split into blocks placed on different servers (for example by uid % 16, consistent hashing, etc.).
2) data mirroring: let all servers have the same data and provide equivalent services.
The first approach cannot, by itself, solve the data-loss problem: when a single server fails, part of the data is gone. So high availability of a data service can only be achieved through the second approach, redundant storage of the data (the industry generally considers three copies a reasonably safe number of backups, as in Hadoop and Dynamo). However, adding more machines makes our data service very complex, especially the handling of cross-server transactions, that is, cross-server data consistency. This is a very hard problem. Let's explain it with the classic use case: "account A transfers money to account B". Anyone familiar with RDBMS transactions knows that moving money from account A to account B takes six operations (a minimal sketch of these steps follows the list):
Read the balance of account A.
Subtract the amount from account A.
Write the result back to account A.
Read the balance of account B.
Add the amount to account B.
Write the result back to account B.
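To make the atomicity requirement concrete, here is a minimal single-node sketch of those six steps wrapped in one "transaction"; the in-memory accounts dictionary, the lock and the transfer function are all hypothetical and purely illustrative:

```python
import threading

accounts = {"A": 100, "B": 50}   # hypothetical in-memory "database"
lock = threading.Lock()          # excludes all other reads/writes during the transfer

def transfer(src, dst, amount):
    with lock:                               # begin "transaction"
        balance_src = accounts[src]          # 1. read the balance of A
        if balance_src < amount:
            return False                     # abort: nothing has been modified
        accounts[src] = balance_src - amount # 2-3. subtract and write back to A
        balance_dst = accounts[dst]          # 4. read the balance of B
        accounts[dst] = balance_dst + amount # 5-6. add and write back to B
        return True                          # commit: all six steps happen together

transfer("A", "B", 30)
```

Once the two accounts live on different machines, or on several mirrored machines, a process-local lock like this no longer protects anything, which is exactly the problem discussed next.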
For data consistency, these six operations must either all succeed or all fail, and while they are in progress any other access to accounts A and B must be locked out. Locking means excluding all other reads and writes; otherwise there will be dirty data. That is what a transaction is. As we add more machines, this gets complicated:
1) in the data partitioning scheme: what if the data of account A and account B are not on the same server? Then we need a cross-machine transaction. That is, if debiting A succeeds but crediting B fails, we have to roll back the operation on A. Across machines, this becomes much more complicated.
2) in the data mirroring scheme: the transfer between account A and account B can be done on one machine, but do not forget that several machines hold copies of account A and account B. What if two concurrent transfers out of account A (one to B and one to C) arrive at two different servers? In other words, with data mirroring, how do we keep writes to the same data on different servers consistent and conflict-free?
At the same time, we also have to consider performance. If we ignore performance, guaranteeing transactions is not hard: the system can simply be slow. Besides performance we also have to consider availability, that is, when one machine is gone, no data is lost and another machine can continue to provide the service. So we need to weigh the following concerns:
1) disaster recovery: Failover of nodes without data loss
2) data consistency: transaction processing
3) performance: throughput, response time
As mentioned earlier, to avoid losing data we can only rely on data redundancy. Even with data partitioning, each partition needs redundant copies. This is the data replica: when a node loses data, it can still be read from a replica; replicas are the only way for a distributed system to handle data loss. So, to keep things simple, this article only discusses data consistency and performance in the presence of data redundancy. To put it simply:
1) if you want the data to be highly available, you have to write multiple copies of the data.
2) writing multiple copies raises data-consistency problems.
3) data-consistency problems in turn cause performance problems.
This is software development: push one problem down and another pops up.
Consistency model
When it comes to data consistency, there are roughly three levels (of course, if you subdivide there are many more consistency models, such as sequential consistency, FIFO consistency, session consistency, monotonic-read consistency and monotonic-write consistency, but to keep this article readable I will only mention the following three):
1) Weak (weak consistency): after you write a new value, a read on a data replica may or may not see it. For example: some cache systems; other players' data in an online game that has nothing to do with you; systems like VoIP; or the Baidu search engine (heh).
2) Eventually (eventual consistency): after you write a new value, you may not be able to read it immediately, but you are guaranteed to read it after some time window. For example: DNS, e-mail, Amazon S3, or a search engine like Google's.
3) Strong (strong consistency): once new data is written, the new value can be read from any replica at any time. For example: file systems, RDBMSs and Azure Table are strongly consistent.
From these three consistency models we can see that Weak and Eventually are generally implemented with asynchronous replication, while Strong generally requires synchronous replication. Asynchronous usually means better performance but also more complex state control; synchronous means simplicity but also a drop in performance. OK, let's go from shallow to deep and look at the techniques step by step:
Master-Slave
First is the Master-Slave structure. In this structure, the Slave is usually a backup of the Master. Such a system is generally designed as follows:
1) Master is responsible for all read and write requests.
2) after the write request is written to Master, it is synchronized to Slave by Master.
Synchronization from Master to Slave can be asynchronous or synchronous, pushed by the Master or pulled by the Slave. Generally the Slave pulls periodically, so this is eventual consistency. The problem with this design is that if the Master crashes within a pull interval, the data written in that time slice is lost. If you cannot afford to lose data, the Slave can only serve in Read-Only mode while waiting for the Master to recover.
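As a rough illustration of the pull-based, eventually consistent variant just described, here is a minimal sketch; the Master and Slave classes, the write log and the pull interval are all hypothetical:

```python
import threading
import time

class Master:
    def __init__(self):
        self.log = []                     # ordered list of (key, value) writes

    def write(self, key, value):
        self.log.append((key, value))     # acknowledged as soon as the Master has it

class Slave:
    def __init__(self, master, interval=1.0):
        self.master, self.interval = master, interval
        self.data, self.applied = {}, 0

    def pull_loop(self):                  # periodic pull -> eventual consistency
        while True:
            for key, value in self.master.log[self.applied:]:
                self.data[key] = value
                self.applied += 1
            time.sleep(self.interval)     # a Master crash inside this window loses data

master = Master()
slave = Slave(master)
threading.Thread(target=slave.pull_loop, daemon=True).start()
master.write("x", 1)                      # a read on the Slave may not see x=1 yet
```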
Of course, if you can tolerate data loss, you can immediately promote the Slave to replace the Master (for nodes that only do computation and have no data-consistency or data-loss concerns, Master-Slave already solves the single point of failure). Master-Slave can also be made strongly consistent, for example: when we write, the Master first writes locally and, on success, writes to the Slave; only when both succeed does the whole operation succeed, and the whole process is synchronous. If writing to the Slave fails there are two options: one is to mark the Slave unavailable, report the error, and keep serving (the Slave re-synchronizes the Master's data after it recovers; there can be multiple Slaves, so losing one still leaves backups, as mentioned earlier); the other is to roll back the Master's own write and return a write failure. (Note: we generally do not write the Slave first, because if the Master write then fails we would have to roll back the Slave, and if that rollback fails we would have to correct the data by hand.) You can see how complicated Master-Slave becomes once it has to provide strong consistency.
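And a minimal sketch of the synchronous, strongly consistent write path described above; the Node class, the replicate method and the rollback strategy are hypothetical, shown only to make the two failure-handling options concrete:

```python
class Node:
    def __init__(self):
        self.data = {}
        self.available = True

    def replicate(self, key, value):
        if not self.available:
            raise ConnectionError("slave unreachable")
        self.data[key] = value

def write_both(master, slave, key, value):
    old = master.data.get(key)            # remember the old value for rollback
    master.data[key] = value              # 1. write the Master first
    try:
        slave.replicate(key, value)       # 2. then write the Slave synchronously
        return True                       # both copies agree: strong consistency
    except ConnectionError:
        # Option A would be: mark the Slave unavailable and keep serving.
        # Option B (shown here): roll the Master back and report the write as failed.
        if old is None:
            master.data.pop(key, None)
        else:
            master.data[key] = old
        return False

master, slave = Node(), Node()
print(write_both(master, slave, "x", 1))  # True
slave.available = False
print(write_both(master, slave, "x", 2))  # False, and the Master is rolled back to 1
```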
Master-Master
Master-Master, also known as Multi-Master, means the system has two or more Masters, each providing read and write service. This model is an enhanced version of Master-Slave; data between the Masters is generally synchronized asynchronously, so it is eventually consistent. The advantage of Master-Master is that when one Master goes down, the other Masters can continue serving reads and writes normally. Like Master-Slave, data that has not yet been copied to another Master is lost when a Master fails. Many databases support Master-Master replication.
In addition, when multiple Masters modify the same data, the model's nightmare appears: merging conflicting data is not an easy task. Look at the design of Dynamo's Vector Clock (which records the version number and the modifier of the data) and you will see this is not that simple; Dynamo leaves data conflicts to the user to resolve, just as conflicts on the same line of code in SVN can only be handled by the developers themselves. (Dynamo's Vector Clock is discussed later in this article.)
Two/Three Phase Commit
This protocol is usually abbreviated 2PC, short for two-phase commit. In a distributed system, each node knows whether its own operation succeeded or failed, but it cannot know whether the operations on other nodes succeeded or failed. When a transaction spans multiple nodes, in order to preserve the ACID properties of the transaction, we introduce a coordinator component to collect the operation results of all the nodes (called participants) and finally tell these nodes whether to actually commit their results (for example, write the updated data to disk). The two-phase commit algorithm works as follows:
The first phase:
The coordinator asks all participant nodes whether they can perform the commit operation.
Each participant prepares to execute the transaction, for example by locking resources, reserving resources, and writing the undo/redo log.
Each participant responds to the coordinator: if its preparation succeeded, it answers "can commit"; otherwise it answers "refuse to commit".
The second phase:
If all participants answered "can commit", the coordinator sends a "commit" command to all participants. Each participant completes the commit, releases all resources, and then answers "done". The coordinator collects the "done" responses from all nodes and finishes the global transaction.
If any participant answered "refuse to commit", the coordinator sends a "rollback" command to all participants, each of which releases its resources and then answers "rollback done". After the coordinator collects the "rollback done" responses from all nodes, it cancels the global transaction.
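A minimal in-process sketch of the coordinator logic just described; the Participant class and its methods are hypothetical, and a real implementation would add persistent logs and timeouts:

```python
class Participant:
    def __init__(self, name, will_succeed=True):
        self.name, self.will_succeed = name, will_succeed

    def prepare(self):             # phase 1: lock resources, write undo/redo log
        return self.will_succeed   # True = "can commit", False = "refuse to commit"

    def commit(self):              # phase 2a: make the change durable, release locks
        print(f"{self.name}: committed")

    def rollback(self):            # phase 2b: undo the prepared work, release locks
        print(f"{self.name}: rolled back")

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: vote
    if all(votes):                                # phase 2: decide
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"

print(two_phase_commit([Participant("A"), Participant("B")]))         # committed
print(two_phase_commit([Participant("A"), Participant("B", False)]))  # aborted
```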
To put it bluntly, 2PC is an algorithm that votes in the first phase and decides in the second, and it is a strong-consistency algorithm. The strong-consistency strategy we discussed for Master-Slave is somewhat similar to 2PC, except that 2PC is more conservative: try first, commit later. 2PC is used quite widely. Some system designs chain a series of calls, such as A -> B -> C -> D, where every step allocates resources or modifies data. For example, a B2C shopping order triggers a series of background processes. If these are done step by step, we run into the problem that when some step cannot be completed, the resources allocated by the previous steps need reverse operations to reclaim them, which is complicated to operate. Many workflow systems now borrow from the 2PC algorithm, using a try -> confirm flow to make sure the whole process completes successfully. A popular example: a Western church wedding has this exchange:
1) the priest asks the bride and groom: "Will you..., in sickness and in health, till death do you part..." (the inquiry phase)
2) when both the bride and groom answer "I do" (locking the resource for a lifetime), the priest says: "I now pronounce you..." (the transaction commit)
What a classic two-phase commit transaction. We can also see some of its problems: A) one is that it is a synchronous blocking operation, which is bound to hurt performance significantly; B) the other major problem is timeouts, for example:
1) if, in the first phase, a participant never receives the inquiry, or the participant's response never reaches the coordinator, the coordinator has to handle the timeout; once it times out, it can treat the vote as a failure or retry.
2) if, in the second phase, after the commit is sent, some participant does not receive it, or the participant's acknowledgement of the commit/rollback never comes back, then once the participant's response times out the coordinator either retries, or marks that participant as a problem node and removes it from the cluster, which keeps the remaining service nodes consistent.
3) the really bad case is in the second phase: if a participant never receives the coordinator's commit/rollback instruction, it is stuck in an "unknown status" state with no idea what to do. For example, suppose all participants have finished replying in the first phase (maybe all yes, maybe all no, maybe some yes and some no) and the coordinator dies at that moment; then none of the nodes know what to do (and they cannot find out by asking the other participants either). For consistency, they can only wait for the coordinator to come back or resend their first-phase yes/no answer.
The biggest problem with two-phase commit is item 3). If, after the first phase completes, a participant never receives the second-phase decision, that data node enters a "bewildered" state which blocks the whole transaction. In other words, the Coordinator is critical to completing the transaction, and the Coordinator's availability is the key. For this reason three-phase commit was introduced. As described on Wikipedia, it splits the first phase of two-phase commit into two: first ask, then lock the resources, and only at the end actually commit.
The core idea of three-phase commit is that resources are not locked during the initial query; they are locked only after everyone has agreed.
The theory is that if the first phase succeeds on all nodes, there is good reason to believe the commit will very likely succeed, which reduces the probability that a participant (cohort) ends up in an unknown state. In other words, once a participant receives PreCommit, it knows that everyone has in fact agreed to the modification. This matters. In the 3PC state transition diagram, the dashed edges marked F/T stand for Failure or Timeout, and the states are q-Query, a-Abort, w-Wait, p-PreCommit, c-Commit.
Following those dashed F/T edges, we can see that if a Failure or Timeout occurs while a node is in the P (PreCommit) state, three-phase commit has an advantage over two-phase commit: it can go ahead and move directly to the C (Commit) state, whereas two-phase commit is at a loss.
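A toy participant-side state machine capturing the transitions just described; the state and event names follow the q/w/p/c/a naming above, and the timeout decisions are simplified assumptions (a real protocol also involves the coordinator's own states and recovery):

```python
# Participant states: q = Query, w = Wait (voted yes), p = PreCommit received,
# c = Commit, a = Abort.  "F/T" means failure or timeout of the coordinator.
TRANSITIONS = {
    ("q", "can-commit?"): "w",   # participant votes yes and waits for a decision
    ("q", "F/T"):         "a",   # never asked properly, abort
    ("w", "pre-commit"):  "p",   # everyone agreed, resources are now locked
    ("w", "abort"):       "a",
    ("w", "F/T"):         "a",   # no decision heard yet, still safe to abort
    ("p", "do-commit"):   "c",
    ("p", "F/T"):         "c",   # the key 3PC property: at p, a timeout still commits
}

def step(state, event):
    return TRANSITIONS.get((state, event), state)

state = "q"
for event in ["can-commit?", "pre-commit", "F/T"]:
    state = step(state, event)
print(state)   # 'c': the participant commits even though the coordinator vanished
```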
In practice, three-phase commit is quite complex, rather difficult to implement, and it has problems of its own.
Reading this far, I believe you have plenty of questions; you must be turning over all kinds of failure scenarios in 2PC/3PC in your mind. You will find that Timeout is very hard to handle, because a timeout on the network often leaves you with no idea whether the other side performed the operation or not, and a perfectly good state machine ends up crippled by timeouts.
A network service call really has three possible outcomes: 1) Success, 2) Failure, 3) Timeout. The third one is definitely the nightmare, especially when you need to maintain state.
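A tiny sketch of why the third outcome is awkward (the call_remote helper is hypothetical): on a timeout the caller cannot know whether the remote side applied the request, so it cannot safely retry a non-idempotent operation.

```python
import socket

def call_remote(sock, request, timeout=2.0):
    """Return 'success', 'failure' or 'timeout': the three possible outcomes."""
    sock.settimeout(timeout)
    try:
        sock.sendall(request)
        reply = sock.recv(1024)
        return "success" if reply == b"OK" else "failure"
    except socket.timeout:
        # The remote side may or may not have executed the request; we cannot tell.
        return "timeout"
```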
Two Generals Problem
The Two Generals Problem is a thought experiment: two armies, each led by a general, are preparing to attack a fortified city. Both armies are camped near the city, each on its own hill, with a valley between them, and the only way the two generals can communicate is by sending messengers back and forth through the valley. Unfortunately the valley is occupied by the city's defenders, and any messenger sent through it may be captured. Note that although the two generals have agreed to attack the city, they did not agree on a time for the attack before taking up their hilltop positions. The two generals must have their armies attack the city at the same time in order to succeed, so they must communicate to settle on a time and commit to attacking then. If only one general attacks, it will be a catastrophic failure. The thought experiment is about how they can pull this off. Here is the line of reasoning:
1) the first general sends a message: "Let us start the attack at 9 a.m." But once the messenger is dispatched, he has no idea whether the messenger made it through the valley. Any uncertainty makes the first general hesitate to attack, because if the second general cannot attack at the same time, the troops garrisoned in the city will repel his army's assault and his army will be destroyed.
2) knowing this, the second general needs to send a confirmation: "I received your message and will attack at 9:00." But what if the messenger carrying the confirmation is captured? So the second general will hesitate over whether his confirmation arrived.
3) so it seems the first general has to send yet another confirmation: "I received your confirmation." But what if that messenger is captured?
4) and then does the second general need to send a "confirming receipt of your confirmation" message?
Damn it. You will find that no matter how many confirmation messages are sent, there is no way for the two generals to be confident enough that their messengers were not captured by the enemy.
This problem has no solution. The Two Generals Problem and the proof of its unsolvability were first published by E. A. Akkoyunlu, K. Ekanadham and R. V. Huber in 1975 in "Some Constraints and Trade-offs in the Design of Network Communications", where it is illustrated, on page 73, by a passage describing communication between two gangs. In 1978 it was named the Two Generals Paradox in Jim Gray's "Notes on Data Base Operating Systems" (starting on page 465), and that reference is widely cited as the source of the definition of the problem and the proof of its unsolvability.
The point of this thought experiment is to make clear the pitfalls and design challenges of trying to coordinate an action over an unreliable communication channel.
From an engineering point of view, the practical way to deal with the Two Generals Problem is to use a scheme that tolerates the unreliability of the channel rather than trying to eliminate it, reducing the unreliability to an acceptable level. For example, the first general can line up 100 messengers and bet that the chance of all of them being captured is very small; in that case he attacks regardless of whether the second general attacks or whether any reply arrives. Alternatively, the first general can send a stream of messages and the second general can acknowledge each one; the more of them get through, the more confident both generals feel. As the proof shows, however, neither of them can be certain the attack will be coordinated: there is no algorithm they could follow (for example, attack upon receiving more than four messages) that guarantees one side will never attack alone. The first general can also number the messages 1, 2, and so on up to n; this lets the second general gauge how reliable the channel is and send back an appropriate number of acknowledgements so that the last message is likely to get through. But if the channel were reliable, a single message would have sufficed and the rest help little: the last message is just as likely to be lost as the first.
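A quick worked example of the "reduce the unreliability to an acceptable level" idea; the 10% per-messenger capture probability is an arbitrary assumption for illustration:

```python
# If each messenger is captured independently with probability p, the chance
# that ALL n messengers are captured is p ** n.
p = 0.10                      # assumed capture probability per messenger
for n in (1, 5, 10, 100):
    print(n, p ** n)
# With n = 100 the chance that no message gets through is 1e-100: negligible
# in practice, yet still not zero, which is exactly what the proof says.
```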
The Two Generals Problem can be extended into the even nastier Byzantine Generals Problem, which goes like this: Byzantium, today's Istanbul in Turkey, was the capital of the Eastern Roman Empire. Because the Byzantine empire was vast, for defensive purposes its armies were stationed far apart, and generals could only pass messages to one another by messenger. In wartime, all the generals of the Byzantine army must reach a consensus and decide whether they have a chance of winning before attacking the enemy camp. But the army may contain traitors and enemy spies, and these traitorous generals can disrupt or sway the decision-making process. Given that some members are known to be rebels, how do the remaining loyal generals reach agreement without being influenced by the traitors? That is the Byzantine Generals Problem.
Paxos algorithm
Wikipedia's description of the various Paxos algorithms is very detailed; you can go and have a look.
The problem the Paxos algorithm solves is how a distributed system in which the exceptions above may occur can agree on a value, so that no matter which of those exceptions happens, the consistency of the decision is not broken. A typical scenario: in a distributed database system, if the initial state of every node is the same and every node executes the same sequence of operations, they end up in a consistent state. To make sure every node executes the same command sequence, a "consensus algorithm" has to be run for every instruction so that the instructions seen by every node agree. A general consensus algorithm can be applied in many scenarios and is an important problem in distributed computing; research on consensus algorithms has not stopped since the 1980s.
Note: the Paxos algorithm is a message-passing consensus algorithm proposed in 1990 by Leslie Lamport (the "La" in LaTeX, now at Microsoft Research). Because the algorithm was hard to understand, it attracted little attention at first, and eight years later Lamport republished it in ACM Transactions on Computer Systems ("The Part-Time Parliament"). Even then Paxos was not taken seriously, and in 2001 Lamport, feeling that his peers could not appreciate his sense of humour, restated it in plainer terms ("Paxos Made Simple"). Clearly Lamport has a soft spot for Paxos. The widespread use of Paxos in recent years has also confirmed its importance among distributed consensus algorithms. In 2006, the word "cloud" began to surface in three Google papers, one of which described the Chubby lock service using Paxos as the consensus algorithm inside a Chubby cell, and the popularity of Paxos has soared ever since. (Lamport himself describes how he spent nine years getting the algorithm published.)
Note: in Amazon's AWS, all cloud services are built on a framework called ALF (Async Lock Framework), which uses the Paxos algorithm. When I was at Amazon and watched the internal sharing videos, the designer said in an internal Principle Talk that he had referred to ZooKeeper's approach but implemented the algorithm in a way that is easier to read than ZooKeeper's.
In a nutshell, the purpose of Paxos is to get the nodes of a cluster to agree on a change to some value. The Paxos algorithm is basically a democratic election: the majority's decision becomes the unified decision of the whole cluster. Any node can propose a modification to some piece of data, and whether the proposal passes depends on whether more than half of the nodes in the cluster agree (which is why Paxos clusters are usually deployed with an odd number of nodes).
The algorithm has two phases (suppose there are three nodes: A, B, C):
The first phase: the Prepare phase
A sends a Prepare Request for the modification to all nodes A, B, C. Note that the Paxos algorithm attaches a Sequence Number to the request (you can think of it as a proposal number; it is increasing and unique, so A and B can never hold the same proposal number). This proposal number is sent along with the modification request, and during the Prepare phase any node will reject a request whose number is smaller than the current proposal number it has seen. So when node A sends its modification request to all nodes, it carries a proposal number, and the newer the proposal, the larger the number. If the proposal number n that a node receives is greater than any proposal number it has seen so far, the node responds Yes (returning the most recent proposal it has approved on that node) and promises not to accept any other proposal numbered less than n.
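A minimal sketch of the acceptor side of this Prepare phase (only the prepare step is shown; the class and method names are hypothetical):

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1    # highest proposal number seen so far
        self.accepted = None    # (n, value) of the last accepted proposal, if any

    def on_prepare(self, n):
        if n > self.promised_n:
            self.promised_n = n          # promise: reject anything numbered below n
            return ("yes", self.accepted)
        return ("no", None)              # proposal number too old, reject it

acceptor = Acceptor()
print(acceptor.on_prepare(5))   # ('yes', None)
print(acceptor.on_prepare(3))   # ('no', None): a smaller proposal number is rejected
```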
NWR model
Another way to balance consistency and performance, used by systems such as Amazon's Dynamo, is the NWR model: N is the number of replicas of a piece of data, W is the number of replicas that must acknowledge a write before it counts as successful, and R is the number of replicas consulted on a read. If W + R > N, then R > N - W. What does that mean? The number of replicas read must be greater than the total number of replicas minus the number of replicas guaranteed to have been written.
That is to say, every read is guaranteed to include at least one replica holding the latest version, so you never read only stale data. When we need a highly writable environment, we can configure W = 1, R = N = 3: a write is considered successful as soon as any single node succeeds, but a read has to fetch the data from all nodes. If we want efficient reads instead, we can configure W = N, R = 1: a read from any single node counts as successful, but a write must succeed on all three nodes to count as successful.
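A small sketch of the quorum arithmetic and of a read that keeps the newest of the R replies; the replica tuples and version numbers here are simplified assumptions:

```python
N, W, R = 3, 1, 3                    # the highly writable configuration from the text
assert W + R > N                     # the quorum condition for reading fresh data

# Each replica stores (version, value); only W of them hold the newest write.
replicas = [(2, "new"), (1, "old"), (1, "old")]

def quorum_read(replicas, r):
    replies = replicas[:r]           # pretend we contacted r of the replicas
    return max(replies)              # the highest version wins

print(quorum_read(replicas, R))      # (2, 'new'): at least one fresh copy is seen
```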
Some NWR configurations can produce dirty data, because NWR is clearly not a strongly consistent protocol in the way Paxos is: successive reads and writes may not land on the same nodes, so some nodes may be holding data that is not the latest version even though newer writes have already happened elsewhere.
So Amazon Dynamo introduced data versioning. That is, if you read data at version v1 and, after finishing your computation, try to write it back only to find that the data's version has already moved on to v2, the server will reject you. Versioning works like an "optimistic lock".
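A minimal sketch of that optimistic-lock style check; the compare_and_set helper and the (version, value) store layout are hypothetical:

```python
def compare_and_set(store, key, expected_version, new_value):
    """Reject the write-back if someone else bumped the version first."""
    version, _ = store[key]
    if version != expected_version:
        return False                       # the data moved on while we computed
    store[key] = (version + 1, new_value)
    return True

store = {"d": (1, "old")}
print(compare_and_set(store, "d", 1, "mine"))   # True, version is now 2
print(compare_and_set(store, "d", 1, "stale"))  # False, our read was based on v1
```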
However, in a distributed NWR setting, versions can also become a nightmare: the problem of version divergence. For example, suppose we set N = 3, W = 1. A value is accepted on node A and its version goes from v1 to v2, but it has not yet been synchronized to node B (replication is asynchronous, which is allowed because W = 1: writing one copy already counts as success), so node B is still at v1. At this point node B receives a write request for the same data; strictly speaking it ought to refuse, but on the one hand it has no way of knowing that other nodes have already moved to v2, and on the other hand it cannot refuse, because W = 1 means a write to a single node counts as success. The result is a serious version conflict.
Amazon's Dynamo cleverly sidesteps the version-conflict problem: version conflicts are left to the user to resolve.
So Dynamo introduced the Vector Clock (a vector clock!) design. This design lets each node record its own version information, that is, for the same piece of data, two things are recorded: 1) who updated me, and 2) what my version number is.
Next, let's walk through a sequence of operations (a small code sketch follows the walk-through):
1) a write request is handled for the first time by node A. Node A adds version information, (A,1). We denote the data at this point D1(A,1). Then another write request for the same key is handled by A, giving D2(A,2). At this point D2 can simply overwrite D1; there is no conflict.
2) now assume D2 propagates to all nodes (B and C). The data B and C receive was not generated by a client but copied to them by another node, so they do not generate new version information, and the data held by B and C is still D2(A,2). So on A, B and C the data and its version number are identical.
3) if a new write request reaches node B, node B produces data D3(A,2; B,1), meaning: the global version number of data D is 3, A has bumped it twice and B once. Isn't this just like a version log?
4) if D3 has not propagated to C and another write request is handled by C, the data on node C becomes D4(A,2; C,1).
5) and now the best part: if a read request arrives at this moment, remember that our configuration is W=1, R=N=3, so R will read from all three nodes, and it will get back three versions:
Node A: D2(A,2)
Node B: D3(A,2; B,1)
Node C: D4(A,2; C,1)
6) at this point it can be determined that D2 is an old version (it is already contained in both D3 and D4) and can be discarded.
7) but D3 and D4 are an obvious version conflict. So it is left to the caller to resolve the conflict himself, just as in source-code version management.
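A small sketch of the vector-clock comparison behind steps 6) and 7); the dict-based clocks and the dominates/concurrent helper names are hypothetical:

```python
def dominates(vc1, vc2):
    """True if vc1 has seen every update vc2 has seen (vc2 is an ancestor of vc1)."""
    return all(vc1.get(node, 0) >= count for node, count in vc2.items())

def concurrent(vc1, vc2):
    return not dominates(vc1, vc2) and not dominates(vc2, vc1)

D2 = {"A": 2}
D3 = {"A": 2, "B": 1}
D4 = {"A": 2, "C": 1}

print(dominates(D3, D2), dominates(D4, D2))   # True True: D2 can be discarded
print(concurrent(D3, D4))                     # True: a conflict the caller must resolve
```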
Obviously, the Dynamo configuration described above chooses the A and the P of CAP.
At this point, the study of "how the distributed system of a server handles transactions" is over. I hope it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it out! If you want to keep learning more related knowledge, please keep following this site; the editor will keep working to bring you more practical articles.