In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
How to analyze the database concurrency control, I believe that many inexperienced people are at a loss about it. Therefore, this paper summarizes the causes and solutions of the problem. Through this article, I hope you can solve this problem.
In the development standard of database transaction isolation, this paper introduces the isolation level of database from the point of view of standard formulation, and introduces the definition of isolation level of Read Uncommitted, Read Committed, Repeatable Read, Serializable and so on. Let's take a look at the common mechanisms for implementing transaction isolation, called concurrency control (Concurrency Control).
Principle
The so-called concurrency control is the mechanism that ensures the correct execution of concurrently executed transactions at a certain isolation level. It should be pointed out that the scheduler of the database is responsible for concurrency control, and the transaction itself is not aware of it. As shown in the following figure, Scheduler arranges the read and write requests of multiple transactions into a legal sequence to be executed in turn:
In this process, the scheduler may choose the following two ways to handle conflicting transactions that may compromise the correctness of the data:
Delay: delaying the execution of a transaction to a legitimate moment
Abort: directly discard the commit of the transaction and roll back the possible impact of the transaction
It can be seen that Abort brings higher costs than Delay, so let's introduce how different concurrency control mechanisms are handled in different situations.
classification
As shown in the figure above, common concurrency control mechanisms are classified from horizontal and vertical dimensions:
1. Degree of optimism
Different implementation mechanisms, based on different assumptions about the probability of conflict, assume that as long as two transactions access the same database object, there must be a conflict, so it should be stopped as soon as possible, while the optimistic way is that the probability of conflict is small, so it will postpone the opportunity to deal with the conflict. As shown in the Abscissa above, optimism increases from left to right:
Based on Lock: the most pessimistic implementation requires locking the database objects to be accessed and Delay the conflicting operation before the operation or even the transaction starts.
Based on Timestamp: optimistic implementation, each transaction gets a globally incremental timestamp at the beginning, expects to be executed sequentially according to the initial timestamp, checks for conflicts and selects Delay or Abort when manipulating database objects
Based on Validation: a more optimistic implementation, Validate only before Commit, Abort for conflicting transactions
As you can see, the essential difference between mechanisms with different levels of optimism is to check or anticipate the timing of conflicts, Lock at the beginning of the transaction, Timestamp during the operation, and Validation before the final Commit. Compared with the pessimistic approach, the optimistic mechanism can achieve a higher degree of concurrency, and once a conflict occurs, Abort transactions will also bring more overhead than Delay.
two。 Single version VS multiple version
As shown in the vertical coordinates above, there are multiple versions of the implementation with the same degree of optimism. The so-called multi-version is to generate a new version of the data each time the database object needs to be modified, and multiple versions of each object coexist. The read request can directly access the corresponding version of the data, thus avoiding the mutual blocking of read-write transactions and read-only transactions. Of course, multiple versions also incur maintenance costs for different versions, such as the need for a garbage collection mechanism to release versions that are not visible by anything.
It should be pointed out that these concurrency control mechanisms are not bound with specific isolation levels, and different intensity isolation levels can be achieved through different rules of conflict judgment. The implementation of each mechanism is described in detail based on Serializable.
Based on Lock
Scheduler based on Lock needs to add necessary lock protection before transactions access data. In order to improve concurrency, different modes of locks are assigned according to the actual access situation, such as read-write locks, update locks and so on. Most simply, it is necessary to hold the lock for a long time until the end of the transaction. In order to improve the degree of parallelism on the basis of ensuring correctness as much as possible, the commonly used locking method in the database is called two-phase lock (2PL). The Growing phase can apply for locking, and the Shrinking phase can only be released, that is, there can be no more locking requests after the lock is released for the first time. It should be noted that 2PL can not solve the problem of deadlock, so there needs to be a mechanism for deadlock detection and handling, usually selecting deadlocked transactions for Abort.
Scheduler's judgment of conflicts also needs to cooperate with Lock Table. As shown in the following figure, a possible Lock Table information is indicated. Each database object accessed has a corresponding table entry in Lock Table, which records the current highest mode of holding locks, whether there are transactions in the Delay, and the list of transactions that hold or wait for the corresponding locks. At the same time, record its transaction ID for each transaction in the linked list, the mode in which the lock is requested, and whether the lock is already held. Scheduler will determine whether the lock or Delay can be added by looking for Lock Table when the lock request arrives, if the Delay needs to be inserted into the linked list. Corresponding when the transaction Commit or Abort needs to release the lock held by it, and wake up the transaction waiting for Delay in the queue according to different policies.
Based on Timestamp
Timestamp-based Scheduler allocates a global self-incrementing Timestamp at the beginning of a transaction, which is usually generated by a physical timestamp or system-maintained self-incrementing id to distinguish the start of a transaction. At the same time, each database object needs to add some additional information, which is updated by the corresponding transaction after access, including:
RT (X): Timestamp of the largest read transaction
WT (X): the Timestamp of the largest write transaction
C (X): whether the newly modified transaction has been committed
Based on the Timestamp assumption that the order of the Timestamp at the beginning is the order in which the transaction executes, when the transaction accesses the database object, by comparing the transaction's own Timestamp with the information of the object, we can find the situation that is inconsistent with the starting order, and deal with it:
Read Late: transactions that are later than their own Timestamp write to the data before they want to Read, and modify WT (X), which will Read inconsistent data.
Write Late: transactions that are later than their own Timestamp read this data before they want to Write and modify RT (X), which will cause inconsistent data to be read by the other party if they continue to write.
Both cases are caused by the difference between the order in which the actual data is accessed and the starting order, and Scheduler needs to Abort the conflicting transactions.
Read Dirty: by comparing C (X), you can see whether you see the data that is already Commit. If you need to guarantee Read Commit, you need to commit the Delay transaction to the Commit of the other party.
Based on Validation (OCC)
The Validation-based approach, sometimes called Optimistic Concurrency Control (OCC), is probably because it is more optimistic than the Timestamp-based approach, delaying conflict detection until before Commit. Unlike Timestamp, which records the read and write time of each object, Validation records the set of read and write operations of each object. And divide things into three stages:
Read phase: reading data from the database and completing the write operation in the private space, at this time it is not actually written to the database. Maintain the read-write set of current transactions, RS, WS
Validate phase: compare the read and write sets of the current transaction with other transactions with time overlap to determine whether it can be committed.
Write phase: if the Validate is successful and enter the Write phase, the database will only be written here.
At the same time, Scheduler records the start time START (T), verification time VAL (T), and completion write time FIN (T) of each transaction. The Validataion-based approach assumes that the order of the transaction Validation is the order in which the transaction is executed, so it may be inconsistent to check the order of access data during verification:
Whether there is an intersection between RS (T) and WS (U), for any transaction, Udre fin (U) > START (T), if there is an intersection, the reading of T may be out of order with the writing of U
Whether there is an intersection between WS (T) and WS ({U)? for any transaction U, Fin (U) > VAL (T), if there is an intersection, the writing of T may be out of order with the writing of U.
Multiversion (MVCC)
Corresponding to each of the above levels of optimism, there can be multiple versions of the implementation, and the advantage of multiple versions is that read-write transactions and read-only transactions do not interfere with each other, resulting in a better degree of parallelism, which is precisely because this has become the choice of almost all mainstream databases. To achieve multiple versions of concurrency control, each transaction needs to be assigned a unique identity TID at the beginning, and the following information needs to be added to the database object:
Txd-id to create this version of the transactional TID
Begin-ts and end-ts record the transaction TID when the version is created and expired, respectively.
Pointer: a linked list that points to other versions of the object
The basic idea of its implementation is that each write operation to the database object generates a new version, marks the new version of begin-ts and the previous version of end-ts with its own TID, and adds itself to the linked list. The read operation compares its own TID with the data version of begin-ts,end-ts and finds the latest visible version to access it. Multiple versions of mechanisms are also divided into three categories according to optimism:
1. Two-phase Locking (MV2PL)
Similar to the single version of 2PL, it also requires Lock Table to track current locking and wait information, and adds begin-ts and end-ts information needed for multiple versions to database objects. The write operation requires a write lock on the latest version and the generation of a new data version. The read operation adds a read lock to the latest visible version found.
2. Timestamp Ordering (MVTO)
Compared with the single version of Timestamp to record Read TimeStamp (RT), Write TimeStamp (WT) and Commited flag (C) information for each database object, begin-ts and end-ts are added to identify the version, and also get a unique incremental Start TimeStamp (TS) before the transaction starts. Writing a transaction needs to verify the order by comparing its own TS with the latest visible version of RT. To write is to create a new version and mark the new version of WT with your own TS. Unlike a single version, this WT message will never change. The read request reads the latest version visible to itself, and modifies the corresponding version of RT after access, and also avoids Read Uncommitted by judging C flag information.
3. Optimistic Concurrency Control (MVOCC)
Compared with the single version of Validataion (OCC), it is also divided into three stages. The Read phase finds the latest visible version according to begin-ts,end-ts. The difference is that in multiple versions, the write operation of the Read phase is not completed in the private space, but directly generates a new version and operates on it. Because its pre-commit begin-ts is INF, it is not used by other transaction courseware. The Validation phase allocates a new Commit TID and uses it to determine whether it can be committed; the begin-ts is modified to Commit TID through the Write phase through the transaction of the Validation.
Compared with the pessimistic lock implementation, the optimistic mechanism can achieve better concurrency effect in the case of fewer conflicts, but in the event of conflict, the overhead of transaction rollback is much greater than the blocking of pessimistic implementation. so they adapt to different scenarios. Multi-version can achieve good concurrency effect in most database scenarios because it avoids the mutual blocking of read-write transaction and read-only transaction, so it is adopted by most mainstream databases. It can be seen that all kinds of concurrency control mechanisms, such as optimistic and pessimistic choice, multi-version implementation, read-write lock, two-stage lock and so on, are all based on the determined isolation level to improve the system throughput as much as possible. it can be said that the choice of isolation level determines the upper limit, while the implementation of concurrency control determines the lower limit.
This paper divides the selection of available concurrency control mechanisms from the degree of optimism and pessimism and the selection of single version and multiple versions, and introduces the general design ideas of various mechanisms, but there is still a long way to go from the real implementation. including implementation details and supporting mechanisms. For example, in various types of MVCC, there are some related problems caused by the existence of multiple versions, such as garbage collection, index management, version storage and so on. We will then take MyRocks as an example to look at the specific implementation of concurrency control in engineering.
After reading the above, have you mastered how to analyze the method of database concurrency control? If you want to learn more skills or want to know more about it, you are welcome to follow the industry information channel, thank you for reading!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.