In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/01 Report--
Today, I will talk to you about the application of MVCC ideas in distributed systems, which may not be well understood by many people. in order to make you understand better, the editor has summarized the following content for you. I hope you can get something according to this article.
Recently, a concurrency control problem of distributed system has been encountered in the project. The problem can be abstracted as follows: a distributed system consists of a data center D and several business processing centers L 1 and L 2. It is composed of Ln; D is essentially a key-value storage, which provides CRUD operation interface based on HTTP protocol. The business logic of L can be abstracted into the following three steps:
Read: according to keySet {K1,... Kn} get keyValueSet {k1:v1,... from D Kn:vn}
Do: conduct business processing according to keyValueSet, and get the dataset keyValueSet' {K1 updated V1,. Km':vm'} (Note: read keySet and updated keySet' may be different)
Update: update keyValueSet' to D (Note: d guarantees the atomicity of updating multiple key in one call)
In the absence of transaction support, concurrent processing of multiple Ls may cause data consistency problems. For example, consider the following execution order for L1 and L2:
L1 reads the corresponding value of key:123 from D
L2 reads the corresponding 100s of key:123 from D
L1 increases the value by 1 and updates key:123 to 100 + 1.
L2 increases the value by 2 and updates key:123 to 100 + 2.
If L1 and L2 execute serially, the corresponding value of key:123 will be 103, but the execution effect of L1 in the above concurrent execution is completely overridden by L2, and the corresponding value of the actual key:123 becomes 102.
Solution 1: locking mechanism
To make L's processing serializable (Serializable), one of the most direct solutions is to consider adding simple lock-based transactions to D. Let L lock D before business processing, and release the lock after completion. In addition, in order to prevent the L holding the lock from not committing the transaction for a long time for some reason, D also needs to have a timeout mechanism, which will get an error response when L tries to commit a transaction that has timed out.
The advantage of this scheme is that it is easy to implement, but the disadvantage is that the whole data set is locked, and the granularity is too large; the time includes the whole processing time of L, and the span is too long. For this reason, you can consider reducing the locking granularity to the data item level and locking by key, but this will lead to other problems. Because the updated keySet' may be uncertain in advance, it may not be possible to lock all the key; at the beginning of the transaction. If you lock the required key in phases, there may be a Deadlock problem. In addition, locking by key does not solve the problem of locking for too long in the case of lock contention. Therefore, there are still important deficiencies in locking by key.
Solution 2: multi-version concurrency control
In order to realize serializability and avoid various problems of locking mechanism, we can adopt unlocked concurrency mechanism based on multi-version concurrency control (Multiversion concurrency control,MVCC). Lock-based concurrency controllers are generally called pessimistic mechanisms, while mechanisms such as MVCC are called optimistic mechanisms. This is because the locking mechanism is preventative, read will block write, write will also block read, when the locking granularity is larger, longer time, the concurrency performance will not be too good; while MVCC is a posteriori, read does not block write, write does not block read, wait until the submission to check whether there is a conflict, because there is no lock, so read and write will not block each other, thus greatly improving the concurrency performance.
We can use source code version control to understand MVCC, everyone is free to read and modify the local code without blocking each other, only when the version controller will check for conflicts and prompt merge. At present, Oracle, PostgreSQL and MySQL all support the concurrency mechanism based on MVCC, but the specific implementation is different.
A simple implementation of MVCC is conditional update (Conditional Update) based on CAS (Compare-and-swap) ideas. The normal update parameter contains only one keyValueSet',Conditional Update and adds a set of update conditions conditionSet {. Data [keyx] = valuex,...}, that is, the data is updated to keyValueSet'; only if D meets the update condition. Otherwise, an error message is returned. In this way, L forms the processing mode of Try/Conditional Update/ (Try again) as shown in the following figure:
Although there is no guarantee for a single L to update successfully every time, from the perspective of the whole system, there are always tasks that can proceed smoothly. This scheme uses Conditional Update to avoid large granularity and long-term locking, and the concurrency performance is very good when there is little resource contention between businesses. However, because Conditional Update requires more parameters, if the length of the value in condition is very long, then the amount of data transmitted by the network each time will be relatively large, resulting in performance degradation. Especially when the keyValueSet' that needs to be updated is very small and the condition is very large, it is very uneconomical.
In order to avoid the performance problems caused by too large condition, an int version number field can be added to each data item. D maintains the version number and increases the version number every time the data is updated; L replaces the specific value with the version number when Conditional Update.
Another problem is that the above solution assumes that D can support Conditional Update; what if D is a third-party key-value storage that does not support Conditional Update? At this point, we can add a P as a proxy between L and D, and all CRUD operations must go through P for conditional checking, while the actual data operation is placed on D. This method realizes the separation of conditional checking and data operation, but at the same time reduces the performance, so it needs to add cache in P to improve the performance. Because P is the only client of D, the cache management of P is very simple, and there is no need to worry about cache invalidation as in the multi-client case. However, in fact, as far as I know, both redis and Amazon SimpleDB already have Conditional Update support.
Comparison between lock mechanism and MVCC
The basic principles of lock mechanism and MVCC are introduced above, but it is not clear where they are applicable and where the advantages and disadvantages of the two mechanisms are reflected in different situations. Here I will make a simple analysis of some typical application scenarios. It should be noted that the following analysis does not target distribution. Locking and MVCC mechanisms exist at all levels of distributed systems, single database systems, and even in-memory variables.
Scenario 1: high response speed to read
There is a class of systems that update very frequently and require high response to reads, such as stock trading systems. Under the locking mechanism, writes block reads, so when there are write operations, the response speed of read operations will be affected; while in MVCC, there is no read-write lock, and read operations are not blocked, so the response speed of reads will be faster and more stable.
Scenario 2: far more reading than writing
For many systems, the proportion of read operations is often much larger than that of write operations, especially in some massive concurrent reading systems. Under the lock mechanism, when write operations occupy the lock, a large number of read operations will be blocked, affecting the concurrency performance, while MVCC can maintain high and stable read concurrency ability.
Scenario 3: frequent write conflicts
If the proportion of write operations in the system is high and conflicts are frequent, it needs to be carefully evaluated. Suppose two conflicting services, L1 and L2, are time-consuming to execute separately. Under the locking mechanism, their total time is approximately equal to the time of serial execution:
T = T1 + T2
Under MVCC, suppose that L1 is updated before L2, and L2 needs retry once, and their total time is approximately equal to the time taken by L2 to execute twice. (here, it is assumed that the two executions of L2 take equal time, and even better, if the first time can cache some of the valid results, the time required for the second execution of L2 may be reduced.)
Tightness = 2 * T2
At this point, the key is to evaluate the cost of retry. If the cost of retry is low, for example, incrementing a counter, or if the second execution can be much faster than the first, the MVCC mechanism is more appropriate. On the other hand, if the cost of retry is high, for example, report statistics need to be calculated for several hours or even a day, then you should use a locking mechanism to avoid retry.
From the above analysis, we can simply draw the conclusion that scenarios with higher response speed and concurrency requirements for reads are suitable for MVCC;, while scenarios with higher retry cost are more suitable for locking mechanism.
After reading the above, do you have any further understanding of the application of MVCC ideas in distributed systems? If you want to know more knowledge or related content, please follow the industry information channel, thank you for your support.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.