
The function and example Analysis of Database concurrency Control

2025-02-22 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

Many inexperienced readers do not know what database concurrency control does or how to analyze examples of it, so this article summarizes the underlying concepts and the common implementation techniques. I hope it helps you understand the topic.

1. The role of database concurrency control

1.1 The concept of transactions

Before introducing concurrency control, you first need to understand transactions. The database provides several basic operations, such as insert, delete, update, and query, and users can combine these operations flexibly to express complex semantics. In many scenarios, users want a group of operations to take effect as a whole; this is a transaction. A transaction is the basic unit of database state change, consisting of one or more operations (for example, multiple SQL statements). The classic transfer transaction consists of three operations: (1) check whether account A's balance is sufficient; (2) if it is, deduct $100 from account A; (3) add $100 to account B.
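The three-step transfer can be sketched as a single transaction. This is a minimal illustration (the schema, account names, and amounts are assumed for the example, not taken from the article), using SQLite with manually managed transactions:

```python
import sqlite3

# Illustrative schema: accounts(name TEXT PRIMARY KEY, balance INTEGER).
conn = sqlite3.connect(":memory:", isolation_level=None)  # manage txns manually
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 100)])

def transfer(conn, src, dst, amount):
    """Run check / deduct / credit as one atomic transaction."""
    try:
        conn.execute("BEGIN")
        # (1) Check that the source balance is sufficient.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        # (2) Deduct from the source account.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # (3) Credit the destination account.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.execute("COMMIT")    # both updates take effect together
        return True
    except Exception:
        conn.execute("ROLLBACK")  # atomicity: neither update takes effect
        return False

transfer(conn, "A", "B", 100)
```

If any step fails, the `ROLLBACK` undoes whatever has already been applied, which is exactly the atomicity property described next.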

A transaction has a basic property: either the whole group of operations takes effect, or none of it does. If an error occurs during execution, all operations already performed must be undone; this is the atomicity of the transaction. If, after a failure, partially applied transactions could not be undone, the database would enter an inconsistent state that contradicts the real world. For example, suppose the transfer transaction fails after deducting $100 from account A but before crediting account B; if the deduction is not undone, the world inexplicably loses $100. Atomicity can be achieved by keeping a log of the values before each change; some databases instead buffer a transaction's operations locally and simply discard the buffered operations on failure.

Once a transaction commits, its result cannot be changed. Even if the system crashes, the state of the database after restart is the same as before the crash; this is the durability of the transaction. As long as data is stored on a non-volatile medium, a crash will not lose it, so the database can ensure durability in one of two ways: (1) before the transaction completes, all of its changes are guaranteed to be on disk; or (2) before the commit completes, the transaction's changes are written to disk in the form of a log, and the restart procedure recovers the in-memory state of the database from that log. Databases generally choose method (2); the reason is left for the reader to consider.
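Method (2) can be sketched as a tiny redo log: change records and a commit marker are appended before the transaction is acknowledged, and recovery replays only committed transactions. All names and the record format here are illustrative assumptions, not the article's code:

```python
# Minimal redo-log sketch for durability method (2).
log = []   # stand-in for the log file on non-volatile storage
db = {}    # in-memory database state (lost on a crash)

def commit(txn_id, writes):
    """Append every change record plus a commit marker to the log
    before the transaction is acknowledged, then apply the writes."""
    for key, value in writes.items():
        log.append(("write", txn_id, key, value))
    log.append(("commit", txn_id))
    db.update(writes)

def recover(log):
    """Replay the log after a crash; transactions without a commit
    marker are dropped, so only committed changes survive."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = {}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            state[rec[2]] = rec[3]
    return state

commit("T1", {"x": 1})
log.append(("write", "T2", "y", 9))  # T2 crashed before its commit marker
```

Replaying the log reconstructs T1's effect while discarding the unfinished T2, which is why method (2) does not need to force every data page to disk before commit.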

To improve resource utilization and transaction throughput and to reduce response time, the database allows transactions to execute concurrently. However, if multiple transactions operate on the same object at the same time, conflicts inevitably arise, and the intermediate state of one transaction may be exposed to others, causing some transactions to write wrong values based on another transaction's intermediate state. A mechanism is therefore needed to ensure that a transaction's execution is not affected by concurrent transactions, so that each user feels as if only their own transaction is running; this is isolation. Isolation lets users focus on the logic of a single transaction and ignore the effects of concurrent execution. The database ensures isolation through its concurrency control mechanism. Because full isolation places strict requirements on the execution order of transactions, many databases offer different options: users can sacrifice some isolation to improve system performance. These options are the transaction isolation levels.

The database reflects the real world, and the real world imposes many constraints, for example: no matter how money is transferred between accounts, the total amount must not change; age cannot be negative; gender can only take values from a fixed set of options. These are integrity constraints. Transaction execution must not break them: each transaction must move the database from one correct state to another, which is consistency. Unlike the first three properties, which are guaranteed entirely by the database implementation, consistency depends not only on the database implementation (atomicity, durability, and isolation are prerequisites for consistency) but also on the transaction logic written by the application.

1.2 How concurrency control ensures isolation

To ensure isolation, one approach is to execute all transactions serially so they cannot interfere with each other. But serial execution is very inefficient. To increase throughput and reduce response time, databases usually allow multiple transactions to execute simultaneously. The concurrency control module therefore needs to guarantee that the effect of concurrent transaction execution is identical to that of some serial execution of the same transactions (serializability); this is how the isolation requirement is met.

To describe how concurrency control ensures isolation, we simplify the transaction model. A transaction consists of one or more operations, all of which can ultimately be decomposed into a series of reads and writes. A batch of concurrently executing transactions, together with one particular execution order of all their reads and writes, is called a schedule. For example:

T1 and T2 execute concurrently; one possible schedule is: T1.read(A), T2.read(B), T1.write(A), T1.read(B), T2.write(A)

If the effect of a schedule of concurrently executed transactions is equivalent to some serial schedule, serializability is satisfied. By repeatedly swapping adjacent read and write operations, any schedule can eventually be turned into a serial schedule, but some swaps change the results of transaction execution. If swapping two adjacent operations in a schedule changes the transaction results, those two operations conflict. A conflict must satisfy all of the following conditions at once:

1. The two operations come from different transactions.

2. At least one of them is a write.

3. They operate on the same object.

Therefore, common conflicts include:

Read-write conflict. Consider two schedules: in one, transaction A reads a row and transaction B then modifies it; in the other, transaction B modifies the row first and transaction A then reads it. Transaction A reads different results in the two schedules. This kind of conflict can lead to the unrepeatable-read anomaly.

Write-read conflict. The cause is analogous to the read-write conflict. This kind of conflict can lead to the dirty-read anomaly.

Write-write conflict. Two operations write the same object one after another, and the later write determines the object's final value. This kind of conflict can lead to the lost-update anomaly.

As long as the database keeps the execution order of conflicting operations unchanged and only swaps non-conflicting operations, the resulting schedule can be considered equivalent to a serial schedule. This equivalence criterion is called conflict equivalence: two schedules are conflict-equivalent if their conflicting operations execute in the same order. For example, in the schedule shown in the figure below, T1.write(A) conflicts with T3.read(A) and T1 occurs before T3; T1.read(B) conflicts with T2.write(B) and T2 precedes T1. So the schedule on the left is equivalent to the serial schedule on the right, in which T2, T1, T3 execute serially. The execution order on the left satisfies conflict serializability.

Now consider a counterexample: T1.read(A) conflicts with T2.write(A) with T1 before T2, while T2.write(A) conflicts with T1.write(A) with T2 before T1. The following schedule cannot be equivalent to any serial schedule; it violates conflict serializability and results in a lost update.
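The conflict-serializability test described above can be automated with a precedence graph: draw an edge Ti → Tj whenever a conflicting operation of Ti precedes one of Tj, and check for cycles. A minimal sketch (schedule encoding and function names are my own, for illustration):

```python
def precedence_edges(schedule):
    """schedule: list of (txn, op, obj) in execution order, op in {'r', 'w'}.
    An edge (t1, t2) means t1's conflicting operation precedes t2's."""
    edges = set()
    for i, (t1, op1, o1) in enumerate(schedule):
        for t2, op2, o2 in schedule[i + 1:]:
            # Conflict: different transactions, same object, at least one write.
            if t1 != t2 and o1 == o2 and 'w' in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()
    def dfs(node):
        if node in visiting:
            return True          # back edge: cycle found
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(nxt) for nxt in graph.get(node, ())):
            return True
        visiting.remove(node)
        done.add(node)
        return False
    return any(dfs(n) for n in graph)

def conflict_serializable(schedule):
    """A schedule is conflict serializable iff its precedence graph is acyclic."""
    return not has_cycle(precedence_edges(schedule))
```

On the article's counterexample (T1 reads A, T2 writes A, T1 writes A), the graph contains both T1 → T2 and T2 → T1, so the check fails.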

Generally speaking, serializability is a fairly strict requirement. To improve the concurrency performance of the database system, many users are willing to relax the isolation requirement in exchange for better performance, so database systems usually implement several isolation levels for users to choose from flexibly; see the literature on transaction isolation levels for details.

With the requirements of concurrency control now clear, how are they implemented? Below, the common implementations of concurrency control are introduced one by one, ordered by how optimistically they detect conflicts.

2. Concurrency Control based on Two-Phase Locking

2.1 2PL

Since the goal is to ensure that operations execute in the correct order, the most obvious approach is to lock the accessed objects. The lock manager module of the database system is responsible for granting and releasing locks on accessed objects, ensuring that only transactions holding a lock can operate on the corresponding object. Locks fall into two categories: S-Lock, the shared lock used by read requests, and X-Lock, the exclusive lock used by write requests. Their compatibility is as follows: on the same object, only two read requests are compatible and can execute simultaneously; read-write and write-write combinations must execute serially because of lock conflicts.

2PL (Two-phase locking) is the most common lock-based concurrency control protocol for databases. As its name implies, it consists of two phases:

Phase 1: Growing: the transaction requests the locks it needs from the lock manager (a lock request may fail or block).

Phase 2: Shrinking: the transaction releases the locks acquired in the growing phase and is not allowed to request new locks.
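The two phases can be sketched as a small wrapper that enforces the rule "no new locks after the first release". This is an illustrative single-node sketch (the class and its API are assumptions for the example; a real lock manager also distinguishes S/X modes and handles waiting):

```python
import threading

class TwoPhaseLockTxn:
    """Illustrative 2PL wrapper: lock() is only legal in the growing phase;
    release_all() starts (and here completes) the shrinking phase."""
    _locks = {}                        # global lock table: object -> Lock
    _table_guard = threading.Lock()    # protects the lock table itself

    def __init__(self, name):
        self.name = name
        self.held = []
        self.shrinking = False

    def lock(self, obj):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after shrinking began")
        with self._table_guard:
            lk = self._locks.setdefault(obj, threading.Lock())
        lk.acquire()                   # blocks if another transaction holds it
        self.held.append(lk)

    def release_all(self):
        self.shrinking = True          # from now on, no new locks allowed
        while self.held:
            self.held.pop().release()

t1 = TwoPhaseLockTxn("T1")
t1.lock("A"); t1.lock("B")   # growing phase
t1.release_all()             # shrinking phase
```

Releasing everything in one step at the end is also how the SS2PL variant discussed below behaves.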

Why must locking and unlocking be divided into two phases?

2PL concurrency control aims at serializability. If a transaction, instead of holding all the locks it needs, were allowed to release a lock and later request locks again, another transaction could modify the same object between two of the transaction's operations on it (as shown in the figure below). Conflict serializability could then no longer be guaranteed, leading to inconsistency (the example below is a lost update).

2PL guarantees conflict serializability because a transaction must hold all the locks it needs while it executes. If a transaction A that is executing conflicts with a transaction B, then B has either already finished or is still waiting. The execution order of the conflicting operations is therefore the same as serial execution in order BA or AB.

So, can the database ensure consistency and isolation as long as it uses 2PL? Take a look at this example:

The execution order above conforms to 2PL, yet T2 reads uncommitted data. If T1 then rolls back, a cascading rollback is triggered (no transaction may be allowed to see T1's changes). Databases therefore often use an enhanced variant, SS2PL (Strong Strict 2PL), which differs from plain 2PL in one respect: in the shrinking phase, locks may only be released after the transaction ends, which completely prevents uncommitted data from being read.

2.2 Deadlock handling

Locking by concurrent transactions cannot avoid one problem: deadlock. Transaction 1 holds lock A and waits for lock B, while transaction 2 holds lock B and waits for lock A. There are currently two main ways to deal with the deadlock problem:

Deadlock Detection:

The database system records the wait relationships among transactions in a waits-for graph, where each vertex represents a transaction and a directed edge means one transaction is waiting for another to release a lock. A cycle in the waits-for graph indicates a deadlock. A background task periodically examines the graph; if it finds a cycle, it selects a suitable transaction to abort.
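The periodic check described above is just a cycle search over the waits-for graph. A minimal sketch (the graph encoding is assumed for illustration):

```python
def find_deadlock(waits_for):
    """waits_for: dict mapping a txn to the set of txns it waits on.
    Returns the list of transactions forming a cycle, or None."""
    def dfs(node, path, on_path):
        if node in on_path:
            return path[path.index(node):]   # the cycle itself
        on_path.add(node)
        path.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = dfs(nxt, path, on_path)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(node)
        return None

    for start in waits_for:
        cycle = dfs(start, [], set())
        if cycle:
            return cycle
    return None
```

When a cycle is found, the system picks one victim transaction from it (for example, the youngest, or the one with the least work done) and aborts it to break the deadlock.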

Deadlock Prevention:

When a transaction requests a lock that is already held, the database system kills one of the two transactions to prevent a deadlock from forming (typically the transaction that has run longer has higher priority and is kept). This preventive approach requires no waits-for graph, but it increases the rate at which transactions are killed.
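One concrete policy of this kind is Wound-Wait, shown here as an assumed example (the article does not name a specific scheme): the older transaction, with the smaller timestamp, always wins, so every wait edge points from younger to older and no cycle can ever form:

```python
def wound_wait(requester_ts, holder_ts):
    """Wound-Wait deadlock prevention (illustrative): an older requester
    'wounds' (aborts) the younger lock holder; a younger requester waits."""
    if requester_ts < holder_ts:
        return "abort holder"   # old wounds young: the holder is killed
    return "wait"               # young waits for old: the wait is safe
```

Because waiting is only ever allowed in one direction of the age order, the waits-for relation is acyclic by construction, at the cost of aborting transactions that were not actually deadlocked.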

2.3 Intention locks

Locks protect the objects accessed inside the database; by granularity these may be an attribute, a tuple, a page, or a table, and the corresponding locks are row locks, page locks, and table locks (hardly anyone implements attribute locks; for OLTP databases the smallest unit of operation is the row). If there were only row locks, a transaction updating 100 million records would need to acquire 100 million row locks, consuming an enormous amount of memory. From the transaction's point of view it is best to acquire as few locks as possible, for example a single table lock when updating 100 million records.

Higher-level locks (such as table locks) effectively reduce resource usage and significantly cut the number of lock checks, but severely limit concurrency. Lower-level locks (such as row locks) favor concurrent execution, but when a transaction touches many objects, a large number of lock checks is required. To solve the problem that high-level locks restrict concurrency, database systems introduce the concept of intention locks:

Intention-Shared (IS): indicates that one or more objects inside it are protected by an S-Lock. For example, if a table holds IS, at least one row in the table is protected by an S-Lock.

Intention-Exclusive (IX): indicates that one or more of its internal objects are protected by X-Lock. For example, if a table is added with IX, at least one row in the table is protected by X-Lock.

Shared+Intention-Exclusive (SIX): indicates that at least one internal object is protected by X-Lock and itself is protected by S-Lock. For example, if an operation requires a full table scan and changes several rows in the table, you can add SIX to the table. Readers can think about why there is no XIX or XIS.

The compatibility relationship between intention locks and normal locks is as follows:
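The compatibility figure is not reproduced here, so the following sketch assumes the standard textbook multigranularity matrix for IS, IX, S, SIX, and X (a reasonable reconstruction, not copied from the article's figure):

```python
# Standard multigranularity lock compatibility matrix (assumed textbook values):
# True means a new request in that mode can coexist with a held lock.
MODES = ("IS", "IX", "S", "SIX", "X")
COMPAT = {
    #           IS     IX     S      SIX    X
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def compatible(held, requested):
    """Can a lock in mode `requested` be granted while `held` is in place?"""
    return COMPAT[held][requested]
```

For example, two IX locks are compatible (two transactions can modify different rows of the same table), but IX is incompatible with the table-level S and X modes, which is exactly what makes the DDL case below cheap to check.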

The advantage of intention locks: when a table holds IX, it means some rows in the table are being modified. (1) When a DDL operation on the table arrives, it must request the table's X lock; seeing that the table holds IX, it simply waits, instead of checking row by row whether any row lock is held, which greatly reduces checking overhead. (2) When other read and write transactions arrive, since the table holds IX rather than X, they are not prevented from reading and writing rows (they first take IX on the table, then S/X on the record). As long as a transaction does not touch rows that are X-locked, it executes normally, increasing system concurrency.

3. Concurrency Control based on Timestamp Ordering (T/O)

Each transaction is assigned a timestamp, which determines the order in which transactions take effect: when transaction 1's timestamp is smaller than transaction 2's, the database system must ensure the effect is as if transaction 1 executed before transaction 2. Timestamps can be allocated from: (1) a physical clock, (2) a logical clock, or (3) a hybrid clock.

3.1 Basic T/O

In concurrency control based on basic T/O, reads and writes take no locks; each record carries the timestamps of the transactions that last wrote and last read it. When a transaction's timestamp is smaller than a record's write timestamp (it must not read data from the "future"), it is aborted and re-executed. Suppose record X carries a write timestamp WTS(X) and a read timestamp RTS(X), and the transaction's timestamp is TTS. The visibility checks are as follows:

Read:

TTS < WTS(X): the object is not visible to the transaction; abort the transaction and restart it with a new timestamp.

TTS ≥ WTS(X): the object is visible to the transaction; update RTS(X) = max(TTS, RTS(X)). To satisfy repeatable read, the transaction keeps a copy of X's value.

To avoid reading dirty data, a record being written can carry a special mark, and a read request must wait for the writing transaction to commit before reading it.

Write:

TTS < WTS(X) || TTS < RTS(X): abort the transaction and restart it.

TTS ≥ WTS(X) && TTS ≥ RTS(X): the transaction updates X and sets WTS(X) = TTS.

TTS ≥ RTS(X) is required here to prevent the following situation: a read request with timestamp rts has already read X and set RTS(X) = rts; if a new transaction with TTS < RTS(X) were allowed to update X successfully, the rts reader would see the new change on a second read, violating repeatable read. The condition therefore avoids this read-write conflict. Because every record stores its latest read and write timestamps, conflict serializability is guaranteed. This approach also avoids write skew. For example: in the initial state there are two records X and Y with X = -3, Y = 5, the constraint X + Y > 0, and RTS(X) = RTS(Y) = WTS(X) = WTS(Y) = 0; transaction T1 has timestamp TTS1 = 1 and transaction T2 has timestamp TTS2 = 2.
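The basic T/O read and write rules above can be sketched directly (the `Record` class and return conventions are illustrative; a `None`/`False` result stands for "abort and restart with a new timestamp"):

```python
class Record:
    """A record with the per-object metadata basic T/O needs."""
    def __init__(self, value):
        self.value = value
        self.rts = 0   # timestamp of the latest read
        self.wts = 0   # timestamp of the latest write

def to_read(record, tts):
    """Basic T/O read rule: a transaction may not read a 'future' write.
    Returns the value, or None to signal abort-and-restart."""
    if tts < record.wts:
        return None
    record.rts = max(record.rts, tts)   # note: reads also write metadata
    return record.value

def to_write(record, tts, value):
    """Basic T/O write rule: abort on a conflict with a later read or write."""
    if tts < record.wts or tts < record.rts:
        return False                    # abort and restart
    record.value, record.wts = value, tts
    return True
```

The `max` update in `to_read` is exactly the shortcoming noted below: even pure reads generate writes (to RTS).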

Its shortcomings include:

Long transactions are prone to starvation: a long transaction's timestamp is too small, so after running for a while it is very likely to encounter data updated by newer transactions and be aborted.

Read operations also generate writes (updating RTS).

4. Concurrency Control based on Validation (OCC)

During execution, each transaction maintains its own write set (whereas basic T/O writes to the database during execution) together with the corresponding RTS/WTS of the records involved; at commit time, it checks whether its changes conflict with data already in the database, and only writes them to the database if there is no conflict. OCC is divided into three phases:

Read & Write Phase: the read-write phase, in which the transaction keeps the results of its reads, the changes it intends to commit, and the RTS and WTS of the records involved.

Validation Phase: check whether the transaction conflicts with the data in the database.

Write Phase: if there is no conflict, write; otherwise abort and restart.

When the Read & Write Phase ends and the Validation Phase begins, the transaction's preparation is complete and it enters the commit stage. The moment of entering the Validation Phase is chosen as the timestamp used to order transactions. The transaction start time is not used because execution may be long: a transaction that started later might commit first, and ordering by start time would increase the probability of conflict, since a transaction with a smaller timestamp writing to the database after a larger-timestamp transaction has committed would certainly be aborted.

Validation process

Assume there are only two transactions, T1 and T2, which modify the same rows, and T1's timestamp is smaller than T2's (that is, the validation order is T1 < T2; from the user's perspective, T1 commits before T2). The cases are as follows:

(1) T1 is in its validation phase while T2 is still in its Read & Write Phase. T1 can commit as long as there is no read-write or write-write conflict between T1 and T2:

If WS(T1) ∩ (RS(T2) ∪ WS(T2)) = ∅, the records written by T1 do not conflict with T2's reads or writes; validation passes and T1 can write.

Otherwise, there is a read-write or write-write conflict between T1 and T2, and T1 must be rolled back. Read-write conflict: T2 has read the version before T1's write; after T1 commits, T2 might read T1's written version, breaking repeatable read. Write-write conflict: T2 might update based on the old version and write it back, losing T1's update.

(2) T1 has completed validation and entered its write phase, which is irreversible until the commit finishes. Whatever T2 read or wrote before T1 entered its write phase cannot conflict with T1's operations (T1's validation passed). But reads and writes that T2 performs afterwards may conflict with what T1 is committing, so when T2 enters its validation phase:

If WS(T1) ∩ RS(T2) = ∅, T2 has not read any record written by T1; validation passes and T2 can write. (Why is WS(T2) not checked? WS(T1) has already been committed, and its timestamp is smaller than T2's: T2's writes take effect after T1's, so they overwrite WS(T1) in the correct serial order rather than losing T1's update.)

Otherwise, there is a read-write conflict between T2 and T1, and T2 must be rolled back. Read-write conflict: T2 has read the version before T1's write; after T1 commits, it might read T1's written version, breaking repeatable read.
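The two validation cases above reduce to set intersections on the read and write sets, which can be sketched directly (function names and the set-based encoding are illustrative):

```python
def validate_concurrent(ws_t1, rs_t2, ws_t2):
    """Case (1): T1 validates while T2 is still in its Read & Write Phase.
    T1 passes iff WS(T1) ∩ (RS(T2) ∪ WS(T2)) is empty."""
    return not (set(ws_t1) & (set(rs_t2) | set(ws_t2)))

def validate_after_write(ws_t1, rs_t2):
    """Case (2): T1 has already finished its write phase; when T2
    validates, it passes iff WS(T1) ∩ RS(T2) is empty."""
    return not (set(ws_t1) & set(rs_t2))
```

The cheapness of these checks is what makes OCC attractive when conflicts are rare: no locks are held during the read-write phase, and validation is a handful of set operations at commit.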

5. Concurrency Control based on MVCC

The database maintains multiple physical versions of each record. When a transaction writes, it creates a new version of the data; a read request obtains the latest already-committed version visible to the snapshot taken at the start of the transaction or statement. The most immediate benefits are that writes do not block reads, reads do not block writes, and read requests never fail due to conflicts (as in single-version T/O) or wait (as in single-version 2PL). Since read requests usually outnumber write requests, almost all mainstream databases adopt this optimization.

MVCC is an optimization technique for read and write requests; it does not by itself solve the database's concurrency problems. It must be combined with one of the concurrency control techniques above to provide complete concurrency control. The common combinations are MV-2PL, MV-T/O, and MV-OCC, with the following characteristics:

MVCC also has two key design points: the storage of multi-version data and the garbage collection of stale versions.

Multi-version storage falls roughly into two categories: (1) append-only, where old and new versions live in the same table space, as in LSM-tree-based storage engines; (2) the main table space holds only the latest version, and earlier images are kept in other table spaces or segments, as in InnoDB, where multi-version information is recorded in the undo log. Reclaiming multi-version data is known as garbage collection (GC): old versions that no read request can ever see should be deleted promptly.
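An append-only version chain and the snapshot read over it can be sketched as follows (the `Version` layout and timestamp convention are illustrative assumptions, not a specific engine's format):

```python
class Version:
    """One entry in an append-only version chain, newest at the head."""
    def __init__(self, value, begin_ts, prev=None):
        self.value = value
        self.begin_ts = begin_ts  # commit timestamp of the writing transaction
        self.prev = prev          # link to the next-older version

def install(chain, value, commit_ts):
    """A committed write installs a new version at the head of the chain."""
    return Version(value, commit_ts, prev=chain)

def snapshot_read(chain, snapshot_ts):
    """Return the newest version visible at snapshot_ts. Readers never
    block writers: they just walk past versions that are too new."""
    v = chain
    while v is not None and v.begin_ts > snapshot_ts:
        v = v.prev
    return None if v is None else v.value

head = install(None, "v1", commit_ts=10)
head = install(head, "v2", commit_ts=20)
```

GC in this picture means trimming tail versions whose `begin_ts` is older than the oldest active snapshot, since no reader can ever reach them again.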

Ordered by when conflicts are handled (from pessimistic to optimistic), we introduced transaction concurrency control based on locks (preventing conflicts before operations execute), T/O (detecting conflicts during execution), and validation (checking conflicts at commit time). Different implementations suit different workloads: workloads with few concurrency conflicts naturally suit the more optimistic methods. MVCC, meanwhile, eliminates mutual blocking between read-only and read-write transactions and improves read concurrency, which is why most mainstream database systems have adopted it.

Having read the above, have you grasped the role of database concurrency control and how to analyze examples of it? If you want to learn more skills or dig deeper, you are welcome to follow the industry information channel. Thank you for reading!
