Implementation of Transactions and Locks in MySQL


This article mainly discusses the implementation of transactions and locks in MySQL. The material is straightforward and practical; let's take a look at how MySQL implements transactions and locks.

Implementation of Transactions in MySQL

In relational databases, the importance of transactions is self-evident. Anyone with even a little knowledge of databases knows that transactions have the four basic ACID properties; what we may not know is how the database actually implements them. This article analyzes that implementation and tries to understand how a database realizes transactions, with a brief introduction to how MySQL implements each of the ACID properties.

A transaction is the basic unit of concurrency control: a sequence of operations that either all execute or none execute, an indivisible unit of work. The four ACID properties are the foundation of transactions; once we understand how ACID is implemented, we understand how transactions are implemented. The following sections introduce how the database implements each of the four properties in turn.

Atomicity

When you learn about transactions, you are often told that a transaction is a series of operations that either all happen or none happen; this describes the transaction's atomicity. But atomicity is not unique to transactions; it appears in many other places.

When an operation is not atomic and can be subdivided into multiple steps, an error or exception in one step may stop the whole operation midway, and the side effects of the steps already performed can lose or corrupt data.

A transaction is not fundamentally different: it is a collection of database operations (think of them as SQL statements). If transactions were not atomic, there would be no way to guarantee that all operations in a transaction execute or none do, and the database system would be neither usable nor trustworthy.

Rollback log (undo log)

To guarantee atomicity, operations already performed must be rolled back when an exception occurs. In MySQL this recovery mechanism is implemented with the rollback log (undo log): every change a transaction makes is first recorded in the rollback log, and only then is the corresponding row in the database written.

The process is easy to understand: to undo all previous actions when an error occurs, those actions must have been recorded beforehand so that they can be reversed.

The rollback log not only provides the information needed to roll back when an error occurs or the user executes ROLLBACK; even after the whole system crashes or the database process is killed, it lets the restarted database process consult the rollback log and roll back transactions that had not completed. This requires that the rollback log be persisted to disk before the data itself, which is the main reason we write the log before writing the database.

The rollback log does not physically restore the database to the state before the statement or transaction executed; it is a logical log, and rolling back only logically undoes the changes recorded in it. You can think of every INSERT in a transaction as corresponding to a DELETE, and every UPDATE as corresponding to an opposite UPDATE statement.
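As a minimal sketch of this logical undo, assuming a hypothetical accounts table (not from the original article):

CREATE TABLE accounts (id INT PRIMARY KEY, balance INT);
INSERT INTO accounts VALUES (1, 100);

START TRANSACTION;
UPDATE accounts SET balance = 50 WHERE id = 1; -- the undo log records the old value 100
ROLLBACK;                                      -- replays the logical opposite: an UPDATE back to 100
SELECT balance FROM accounts WHERE id = 1;     -- returns 100 again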

We will not cover the format of the rollback log or how it is managed here; this article focuses on what it is, what problems it solves, and how. Readers who want implementation details will find plenty of articles about the undo log online.

Transaction states

Because a transaction is atomic, from a distance it looks like an indivisible whole with only three states: Active, Committed, and Failed. The transaction is either executing, or it has succeeded or failed:

But if we zoom in, the transaction is no longer atomic: it passes through intermediate states such as partially committed, and the state diagram becomes more complex.

The state diagram and the descriptions below are taken from Chapter 14 of Database System Concepts.

Active: the initial state; the transaction is executing

Partially Committed: the last statement has been executed

Failed: it has been discovered that the transaction cannot proceed normally

Aborted: the transaction has been rolled back and the database restored to its state before the transaction

Committed: the entire transaction executed successfully

Although the database state can be restored when an error occurs, if the transaction performed operations with visible external effects, such as printing to standard output, sending email, modifying disk contents outside the database, or even transferring money, there is no way to roll those back. Those problems must be solved by application developers; in most cases, operations that cannot be rolled back should be triggered only after the entire transaction has committed.

Take booking as an example: even if we call a third party only after the whole transaction ends, requesting the third party and obtaining the result takes a long time. If the database or server crashes just after the transaction commits, we may well lose track of whether the request was initiated, which creates a serious problem. The database cannot guarantee this; developers need to check at an appropriate time whether the request was initiated and whether it succeeded or failed.

Atomicity of parallel transactions

So far we have only considered transactions executed serially; in practice, transactions executing in parallel are the norm, and parallel execution can create very complex problems:

Suppose Transaction1 reads and writes the user with id = 1 during its execution but has neither committed nor rolled back, and Transaction2 reads the same data and then commits. Transaction2 now depends on Transaction1. If Transaction1 must be rolled back due to an error, atomicity demands that Transaction2 be rolled back as well, but Transaction2 has already committed and cannot be. Database System Concepts calls this phenomenon a nonrecoverable schedule (Nonrecoverable Schedule). Under what circumstances is a schedule recoverable?

A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.

Put simply, if Transaction2 depends on Transaction1, then Transaction1 must commit before Transaction2 commits:

But this is not the end of it: as the number of transactions grows, the recovery process becomes more and more complex, and recovering from an error inside a transaction is not so easy.

In the schedule shown above, Transaction2 depends on Transaction1, and Transaction3 also depends on Transaction1. When Transaction1 is rolled back because of an execution problem, all the work in Transaction2 and Transaction3 must be rolled back too in order to preserve atomicity. This is called cascading rollback (Cascading Rollback). A cascading rollback undoes a great deal of work, which is hard to accept, but it is the price of absolute atomicity. We will explain later in the article how the atomicity of parallel transactions is handled.

Durability

A database must store data persistently: once data is written to the database, it must be safely stored on disk. The durability of transactions means that once a transaction commits, its data is guaranteed to be written to the database and persisted.

After a transaction has committed it cannot be rolled back; the only way to undo a committed transaction is to create an opposite, compensating transaction that reverses the original operations. This, too, is a manifestation of durability.

Redo log

Like atomicity, durability is achieved through logging: MySQL uses the redo log to make transactions durable. The redo log has two parts: the redo log buffer in memory, which is volatile, and the redo log file on disk, which is persistent.

When a transaction modifies data, the data is first read from disk into memory and updated in the in-memory cache; then a redo log record is generated and written to the redo log buffer. When the transaction actually commits, MySQL flushes the contents of the redo log buffer to the redo log file, and the in-memory data is later written back to disk. (Steps 4 and 5 in the original article's figure happen at commit time.)
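How aggressively the redo log buffer is flushed at commit is configurable in InnoDB; a quick check using the real server variable:

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1 (the default) flushes and syncs the redo log at every commit, giving full durability;
-- 0 and 2 trade durability for speed by flushing roughly once per second
SET GLOBAL innodb_flush_log_at_trx_commit = 1;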

In InnoDB, the redo log is written in 512-byte blocks. Because the block size equals the disk sector size, writing a redo log block is atomic: a power failure cannot leave a half-written block of dirty redo records.

Besides the redo records produced by every change to the database, the rollback log must itself be persisted, so changes to the undo log also generate redo records. After a crash, when the database restarts, it replays the redo log records that had not yet been applied to the data files on disk, which satisfies durability.

Rollback log and redo log

So far we have seen two kinds of logs in MySQL: the rollback log (undo log) and the redo log. In database systems, the atomicity and durability of transactions are guaranteed by the transaction log (transaction log): the undo log undoes the effects of a transaction, and the redo log replays committed transactions during error recovery. Together they ensure two things:

Transactions that error out or need to be rolled back can be rolled back successfully (atomicity)

After a transaction commits, if the machine goes down before the data reaches disk, the data can be recovered on the next restart (durability)

In the database these two logs work together, and we can view them as a single transaction log that contains the transaction ID, the modified row, and the values before and after the modification.

Because a transaction log record contains both the pre- and post-modification values, rollback and redo are both easy. We will not go further into how the two logs are used here; they may come up again in a later article on database recovery mechanisms.

Isolation

The author already introduced the isolation of database transactions in the earlier article on MySQL and InnoDB, but for the independence and integrity of this article we cover isolation again; the content may differ slightly.

Isolation is one of the foundations of how a database processes data. Without isolation between transactions, the cascading rollbacks described in the atomicity section would occur, causing a huge performance loss. If all transactions executed in a linear order, managing them would be much easier, but allowing parallel execution improves throughput and resource utilization and reduces the waiting time of each transaction.

When multiple transactions execute concurrently, isolation can be violated: even though each individual transaction may execute without error, their interleaving can leave the database inconsistent as a whole. Serial execution lets developers ignore the effects of parallelism and preserves consistency easily, but it hurts transaction throughput.

Transaction isolation levels

Isolation versus consistency is therefore a trade-off the developer must weigh: the isolation level a database provides determines both its performance and the kind of consistency it can achieve. The SQL standard defines four transaction isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. Each level solves one more problem than the level below it:

READ UNCOMMITTED: query statements take no locks and may read uncommitted rows (Dirty Read)

READ COMMITTED: only record locks are taken on the records themselves, with no gap locks between them, so new rows may be inserted next to locked records; running the same query several times may return different results (Non-Repeatable Read)

REPEATABLE READ: reading the same range of data multiple times returns the snapshot taken by the first query, so no different data rows are returned, but phantom reads (Phantom Read) may still occur

SERIALIZABLE: InnoDB implicitly adds a shared lock to every query statement, which also solves the phantom read problem
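A quick way to experiment with these levels in MySQL (the @@transaction_isolation variable is spelled tx_isolation before MySQL 5.7.20):

SELECT @@transaction_isolation; -- REPEATABLE-READ is the MySQL default
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
-- statements here run under READ COMMITTED
COMMIT;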

None of these isolation levels allows dirty writes (Dirty Write), that is, a transaction updating data that another transaction has updated but not yet committed. Most databases use READ COMMITTED as the default isolation level, but MySQL defaults to REPEATABLE READ. From READ UNCOMMITTED to SERIALIZABLE, as the isolation level grows stricter, the database's performance on concurrent transactions gradually degrades.

In theory, database users do not need to know how an isolation level is implemented, only what problems it solves; in practice, however, the implementation details of different databases at different levels often hold unexpected pitfalls.

Readers who do not know what dirty reads, non-repeatable reads, and phantom reads are can consult the earlier MySQL and InnoDB article; here the original article only shows a figure of which isolation level solves which problem.

Implementation of isolation levels

Databases implement isolation levels with concurrency control mechanisms that coordinate simultaneously executing transactions and restrict how different transactions access and update the same resource. Below we briefly introduce the working principles of the three most important concurrency control mechanisms.

Lock

Locks are among the most common concurrency control mechanisms. Within a transaction we do not lock the entire database, only the data items that need to be accessed. In MySQL, as in most databases, locks come in two types: shared (Shared) locks and exclusive (Exclusive) locks, also called read locks and write locks respectively.

Read locks let read operations run concurrently without affecting each other, while write locks ensure that while data is being updated, no other transaction can access or change the same record and cause unpredictable problems.
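A sketch of the two lock modes using MySQL's locking reads, again assuming the hypothetical accounts table from the earlier sketch (LOCK IN SHARE MODE is spelled FOR SHARE as of MySQL 8.0):

-- session A: takes a shared (read) lock; other shared locks are still allowed
START TRANSACTION;
SELECT * FROM accounts WHERE id = 1 LOCK IN SHARE MODE;

-- session B: an exclusive (write) lock on the same row blocks until A commits
START TRANSACTION;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;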

Timestamps

Besides locks, transaction isolation can also be achieved with timestamps. A database that implements transactions this way, PostgreSQL for example, keeps two extra fields for each record: the read timestamp stores the largest timestamp among all transactions that have read the record, and the write timestamp stores the timestamp of the transaction that wrote the record's current value.

When timestamps are used for isolation, optimistic locking is common: modify the data first, and when writing it back check whether the current value, that is, the timestamp, has changed. If it has not, write; otherwise take a new timestamp and apply the update again. Optimistic locking is not a real locking mechanism, only an idea, and we will not expand on it here.
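The usual application-level expression of this idea is a version (or timestamp) column checked inside the UPDATE itself; a minimal sketch, assuming a hypothetical version column on the accounts table:

SELECT balance, version FROM accounts WHERE id = 1; -- suppose this returns version = 7
UPDATE accounts
   SET balance = 50, version = version + 1
 WHERE id = 1 AND version = 7;
-- 0 affected rows means another transaction changed the row first: re-read and retry

In practice the read-update pair runs in a retry loop, which is exactly the "generate a new timestamp and update again" step described above.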

Multi-version and snapshot isolation

By keeping multiple versions of data, a database can let transactions read an older version of data while it is being updated by other transactions; many databases implement this mechanism. Because read operations no longer wait for write locks to be released, read performance improves significantly. Both MySQL and PostgreSQL implement this mechanism, MVCC, though in different ways; MySQL implements MVCC through the rollback log described above, which lets parallel transactions obtain data directly without waiting for an exclusive lock to be released.

Isolation and atomicity

Here we should briefly return to the cascading rollback from the atomicity section. If a transaction that writes data acquires an exclusive lock, other transactions wanting a read lock on the changed data must wait for the write lock to be released, so cascading rollback naturally cannot occur.

However, most databases, MySQL included, use features such as MVCC, so ordinary reads do not acquire locks; when you want to update data you have read, you use SELECT ... FOR UPDATE to acquire an exclusive lock on the corresponding rows and ensure that different transactions work correctly.

Consistency

The author believes that database consistency is a very confusing concept, because the database domain actually contains two consistency, one is consistency in ACID, the other is consistency in CAP definition.

These two consistencies are not the same thing at all, and many people misunderstand the two concepts deeply. When discussing database consistency we must know the semantics of the context and be clear whether we mean consistency in ACID or consistency in CAP.

ACID

ACID defines consistency as follows: if a transaction runs atomically on a consistent database, the database must still be consistent after it executes. The first layer of this concept concerns data-integrity constraints, including primary key constraints, referential constraints, and other constraint checks: before, during, and after the transaction, no integrity constraint may be violated; every write to the database must be legal and must not produce an illegal data state.

A transaction must preserve database consistency-if a transaction is run atomically in isolation starting from a consistent database, the database must again be consistent at the end of the transaction.

We can think of a transaction as a function: it accepts external SQL input and a consistent database, and it must return a consistent database.

The second layer is a logical requirement on developers: we must write correct transaction logic in our code. In a bank transfer, for example, the transaction cannot only deduct money or only add money; this is the application-level requirement for database consistency.

Ensuring consistency for an individual transaction is the responsibility of the application programmer who codes the transaction. -Database System Concepts

Consistency in ACID thus demands not only data integrity and legality checks but also correctness of application-level logic.
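Returning to the transfer example, a minimal sketch of an application-level consistent transaction, again using the hypothetical accounts table:

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- debit one account
UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- credit the other
COMMIT; -- atomicity guarantees both legs or neither; the developer must guarantee the legs balance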

Data consistency in the CAP theorem means that every node in a distributed system holds the same value for the same copy of data, while consistency in ACID refers to the database's rules: if the schema stipulates that a value must be unique, a consistent system ensures the value stays unique across all operations. The definitions of consistency in CAP and ACID are therefore fundamentally different.

Summary

The four basic ACID properties of transactions are the cornerstone that keeps a database running, but fully guaranteeing ACID, especially isolation, has a large performance cost, so in practice we adjust isolation according to the needs of the business. Atomicity and durability are comparatively easy to understand: the former ensures that the transactions in the database either execute completely or not at all, while the latter ensures that writes to the database are persistent and non-volatile. Consistency is not only the database's requirement on the integrity of its own data; it also requires developers to write logically correct and reasonable transactions.

Finally, and most importantly: when others talk about consistency, be sure to figure out their context. If you have any questions about the article, feel free to leave a comment.

Talking about Database Concurrency Control: Locks and MVCC

Adapted from https://draveness.me/database-concurrency-control

After a few years of programming you find that no problem has a simple, quick solution; many problems require trade-offs and compromises. This part of the article introduces the trade-off between database concurrency performance and serializability: the concurrency control mechanism.

If all transactions in a database execute serially, the database easily becomes the performance bottleneck of the whole application; although any node that cannot scale horizontally becomes a bottleneck eventually, serial transaction execution accelerates the process. Concurrency makes everything possible: it solves some performance problems, but it brings many stranger errors.

Once concurrent transactions are introduced, if their execution is not controlled, all kinds of problems appear; you may be tormented by bizarre bugs without ever enjoying the performance gains of concurrency.

Overview

How to control concurrency is one of the most important problems in the database field. By now there are many mature solutions for transaction concurrency control, and their principles are what this article wants to introduce. We will cover the three most common concurrency control mechanisms:

They are pessimistic concurrency control, optimistic concurrency control, and multi-version concurrency control. Pessimistic concurrency control is the most common mechanism, namely locks; optimistic concurrency control also goes by the name optimistic locking, though it is not a real lock, as we will explain later in the article. Finally there is multi-version concurrency control (MVCC); unlike the first two, whose names are opposites, MVCC can be combined with either of them to improve the database's read performance.

Since this article introduces different concurrency control mechanisms, it necessarily involves concurrent transactions; we will analyze how each mechanism works with the help of diagrams.

Pessimistic concurrency control

Controlling how different transactions access the same data is the most fundamental way to ensure database consistency. If transactions can hold exclusive access to a resource, different transactions operating on the same resource cannot affect each other.

The simplest, most widely used method is locking: before a transaction operates on a resource, it must acquire the resource's lock, ensuring other transactions cannot access it. In pessimistic concurrency control, the database program takes a pessimistic view of data modification and locks data while processing it, thereby resolving contention.

Read-write lock

To maximize transaction concurrency, locks in databases come in two modes: shared locks and exclusive locks. A transaction holding a shared lock may only read, so shared locks are also called read locks; a transaction holding an exclusive lock on a row may both read and write that row, so exclusive locks are also called write locks.

Besides limiting what operations a transaction may perform, shared and exclusive locks have a compatibility relationship: multiple transactions can hold a shared lock on the same row at the same time, but an exclusive lock is incompatible with both shared locks and other exclusive locks. The reason for this design is natural: multiple transactions writing the same data at the same time inevitably produces all kinds of strange problems.

If the current transaction cannot acquire the lock for a row, it waits until other transactions release the lock on that data, after which it can acquire the lock and perform its operation.

Two-phase locking protocol

The two-phase locking protocol (2PL) is a protocol that guarantees serializability of transactions. It divides lock acquisition and release into two different phases: growing and shrinking.

In the growing phase, a transaction may acquire locks but not release any; in the shrinking phase, it may only release locks and may not acquire new ones. If we only look at the definition, 2PL ends here, but it has two common variants:

Strict 2PL: exclusive locks held by a transaction are released only after it commits

Rigorous 2PL: all locks held by a transaction are released only after it commits

Although locks solve the problems caused by concurrent execution of different transactions, two-phase locking introduces another serious problem: deadlocks. Transactions deadlock when each waits for resources the other has already locked. A simple example:

At the start, the two transactions acquire locks on the draven and beacon resources respectively; then each requests the lock the other already holds, and a deadlock occurs. Neither side can wait out the other's lock release; without a deadlock-handling mechanism they would wait indefinitely and neither transaction could finish.
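A sketch of this deadlock in MySQL with two sessions and a hypothetical pages table holding rows named draven and beacon:

-- session A
START TRANSACTION;
UPDATE pages SET hits = hits + 1 WHERE name = 'draven';

-- session B
START TRANSACTION;
UPDATE pages SET hits = hits + 1 WHERE name = 'beacon';

-- session A (blocks, waiting for B's lock on 'beacon')
UPDATE pages SET hits = hits + 1 WHERE name = 'beacon';

-- session B (closes the cycle; InnoDB detects the deadlock and rolls one session back)
UPDATE pages SET hits = hits + 1 WHERE name = 'draven';
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction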

Handling of deadlocks

Deadlocks are common in multithreaded programming as well: whenever multiple threads compete for resources, one must consider whether the current threads or transactions can deadlock. Broadly there are two approaches: eliminate deadlocks at the source so they can never arise, or allow the system to enter a deadlocked state but detect it and recover in time.

Preventing deadlocks

There are two ways to prevent deadlocks. One is to ensure that the waits between transactions contain no cycle, that is, the wait graph between transactions is a directed acyclic graph with no circular waiting; the other is to lock all the resources a transaction will need atomically at the start of the transaction, so that all resources are either locked together or not at all.

The second approach has two problems, though: at the start of a transaction it is hard to know which resources it will need to lock, and because data used late in the transaction is locked early, data utilization and transaction concurrency are very low. One solution is to lock data rows in a fixed order, combined with the 2PL protocol, so that in the locking phase all rows are locked from smallest to largest; but this still requires the transaction to know its lock set in advance.

Another way to prevent deadlocks is preemption plus transaction rollback. When a transaction starts executing it first obtains a timestamp, and the database uses the timestamps to decide whether a transaction should wait or roll back. Two mechanisms are available here; the first is the wait-die mechanism:

when a transaction's timestamp is smaller than the other's, that is, transaction A started before B, A is allowed to wait for the other transaction to release the lock on the resource; otherwise A is rolled back, keeping its original timestamp.

The other mechanism is called wound-wait, a preemptive scheme whose outcome is the opposite of wait-die. If the current transaction started before the other transaction and requests the other's resources, the other transaction is immediately rolled back and yields its resources; otherwise the current transaction waits for the other to release them:

Both mechanisms cause unnecessary transaction rollbacks, and with them some performance loss. A simpler way to resolve deadlocks is a lock-wait timeout, but the timeout value has to be chosen carefully: too short and long transactions cannot execute normally, too long and deadlocks are not detected promptly, so its use has limitations.
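In MySQL the lock-wait timeout (and, in recent versions, active deadlock detection itself) is exposed as InnoDB server variables:

SHOW VARIABLES LIKE 'innodb_lock_wait_timeout'; -- how long a lock wait may last; default 50 seconds
SET SESSION innodb_lock_wait_timeout = 5;
SHOW VARIABLES LIKE 'innodb_deadlock_detect';   -- active deadlock detection can be disabled,
                                                -- falling back on timeouts alone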

Deadlock detection and recovery

If the database program cannot guarantee, by protocol, that deadlocks never occur, it must detect deadlocks in time and recover from the deadlocked state so it can keep working. To use detection and recovery, the database must maintain the wait relationships between data and transactions, provide an algorithm that decides whether the database has entered a deadlocked state, and supply an appropriate strategy for recovering once a deadlock occurs.

As mentioned in the previous section, deadlock detection can be done with a directed wait graph: if a transaction depends on data being processed by another transaction, the current transaction waits for the other to finish, which forms one edge of the wait graph:

As the figure above shows, if a cycle appears in this directed graph, say TransB -> TransE -> TransF -> TransD -> TransB, the database has entered a deadlocked state and the deadlock recovery mechanism must take over.

Recovering from a deadlock is actually simple: the most common solution is to pick one transaction in the cycle and roll it back, breaking the cycle in the wait graph. Three things must be considered during recovery:

Every deadlock affects multiple transactions, and one must choose which to roll back. The golden rule when choosing a victim is to minimize cost, considering factors such as how much computation the transaction has already done, which data rows it uses, and which other transactions are involved. Once the victim is chosen, rollback can begin; there are two options, total rollback and partial rollback, the latter rolling back to a checkpoint earlier in the transaction. If there is no checkpoint, partial rollback is impossible.

During deadlock recovery, some transactions may be chosen as victims in deadlock after deadlock and never execute successfully, causing starvation (Starvation). We must ensure transactions execute within a bounded time, so the timestamp should be taken into account when selecting the victim.
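In MySQL you can inspect the most recently detected deadlock, including the two transactions involved and which one InnoDB chose as the victim:

SHOW ENGINE INNODB STATUS\G
-- the LATEST DETECTED DEADLOCK section lists the locks each transaction held and waited for,
-- and ends with "WE ROLL BACK TRANSACTION (n)", identifying the victim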

Lock granularity

So far we have discussed only row locks, but sometimes we want to treat multiple nodes as a single data unit and lock that unit, a table, or even the whole database directly with one lock. Achieving this requires defining locks of different granularities in the database:

With locks of different granularities, when a transaction wants to lock the whole database or a whole table, it simply locks the corresponding node, placing an explicit lock on that node and implicit locks on all of its children. Although this solves the problem of locking child nodes while a parent is locked, it gives us no fast way to determine, when locking a parent node, whether any of its child nodes is already locked.

Here we introduce intention locks to solve the problem: before locking a child node, a transaction first places the corresponding intention lock on all of its parent nodes. Intention locks do not conflict with each other at all; they exist only to help a transaction quickly determine whether a parent node can be locked:

With the two intention locks, intention shared and intention exclusive, added, all the lock modes form a compatibility relationship; with locks of different granularities plus intention locks, the database's throughput is improved.
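For reference, the standard compatibility matrix for intention shared (IS), intention exclusive (IX), shared (S), and exclusive (X) locks, which the original article showed as a figure:

      | IS  | IX  | S   | X
  IS  | yes | yes | yes | no
  IX  | yes | yes | no  | no
  S   | yes | no  | yes | no
  X   | no  | no  | no  | no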

Optimistic concurrency control

Besides the pessimistic concurrency control mechanism, locks, there are other concurrency control mechanisms, among them optimistic concurrency control (Optimistic Concurrency Control). It is also called optimistic locking, and many people mistakenly think it is a real lock, but it is only an idea for concurrency control.

In this section we first introduce concurrency control based on timestamps, and then extend that protocol into an optimistic concurrency control mechanism.

Timestamp-based protocol

Lock-based protocols execute transactions in the order in which they request the same data items, because data that a later transaction wants is locked by an earlier one and the later transaction can only wait for the release; the execution order is thus tied to the order of lock acquisition. The timestamp-based protocol introduced here instead determines the execution order of transactions before they run.

Every transaction has a globally unique timestamp, which can come from the system clock or from a counter, as long as all timestamps are unique and increase over time.

A timestamp-based protocol guarantees that the parallel execution order of transactions is exactly the same as if they executed serially in timestamp order. Each data item carries two timestamps, a read timestamp and a write timestamp, which record the timestamps of the transactions that last successfully performed the corresponding operation.

The protocol ensures that all conflicting read and write operations execute serially in timestamp order; when performing an operation, a transaction need not watch other transactions, only the timestamps on the data item:

Both reads and writes compare the transaction's timestamp against the data item's read and write timestamps; an operation whose timestamp is smaller than the current value is rejected outright and the transaction rolled back. The database then assigns the rolled-back transaction a new timestamp and re-executes it.

Validation-based protocol

Optimistic concurrency control is essentially a validation-based protocol. In most applications read-only transactions are the vast majority, and the chance that transactions conflict over writes is very small; that is, most transactions run perfectly well without any concurrency control and still keep the database consistent. The concurrency control mechanism adds considerable overhead to the whole database system, and other strategies can reduce that overhead.

The validation protocol is that solution: it divides a transaction's execution into two or three phases, depending on whether the transaction is read-only or performs updates:

During the read phase, the database performs all of the transaction's read and write operations, storing written values in temporary variables without actually updating the database. In the validation phase the database checks whether the change is legal, that is, whether another transaction updated the data during the read phase. A transaction that passes validation enters the write phase, where all the changes held in temporary variables are written to the database; a transaction that fails validation is aborted.

To keep optimistic concurrency control working correctly, we need to know when the different phases of a transaction occur: the transaction's start time, the start of its validation phase, and the end of its write phase. With these three timestamps we can guarantee that conflicting transactions are never written to the database at the same time; as soon as one transaction completes validation it is written immediately, and other transactions that read the same data are rolled back and re-executed.

As an optimistic mechanism, it assumes that transactions will ultimately pass validation and execute successfully, whereas locking and timestamp-based ordering protocols are pessimistic: they force transactions to wait or roll back whenever a conflict is merely possible, even in cases where a lock was not actually needed to keep transactions from conflicting.

Multi-version concurrency control

All the concurrency control mechanisms introduced so far resolve race conditions between transactions by delaying or aborting the transactions involved, in order to guarantee serializability. Although the previous two mechanisms solve the serializability of concurrent transactions at the root, in real environments database transactions are mostly read-only, with read requests outnumbering writes many times over. If there were no concurrency control boundary between writes and reads, the worst case would be that reads observe data being written, which is acceptable to many applications.

Under this premise, databases introduce another concurrency control mechanism: multi-version concurrency control (Multiversion Concurrency Control). Every write creates a new version of the data, and every read selects the most appropriate one among the existing versions and returns it directly. Conflicts between reads and writes no longer matter; managing versions and quickly choosing the right one becomes the main problem MVCC has to solve.

MVCC is not the opposite of optimistic or pessimistic concurrency control; it combines well with either to increase transaction concurrency. Both MySQL and PostgreSQL, the most popular SQL databases, implement MVCC, but because one builds on pessimistic locking and the other on optimistic concurrency control, their implementations differ.

MySQL and MVCC

The multiversion two-phase locking protocol (Multiversion 2PL) implemented in MySQL combines the advantages of MVCC and 2PL. Every version of a data row has a unique timestamp; when a read-only transaction arrives, the database returns, from the row's versions, the one with the largest timestamp.

Updates are slightly more complicated: the transaction first reads the latest version of the data, computes the updated result, and then creates a new version whose timestamp is the row's current maximum version plus 1:

Old data versions are also selected for deletion by timestamp: MySQL periodically purges the lowest versions from the database so that stale content does not accumulate.

PostgreSQL and MVCC

Unlike MySQL's pessimistic concurrency control, PostgreSQL uses optimistic concurrency control, which leads to differences when MVCC is combined with optimistic locking. The resulting implementation is the multiversion timestamp ordering protocol (Multiversion Timestamp Ordering): every transaction is assigned a unique timestamp before execution, and every data item carries a read timestamp and a write timestamp:

When a PostgreSQL transaction issues a read request, the database returns the latest version of the data directly, blocked by nothing; when it executes a write, the transaction's timestamp must be greater than or equal to the data row's read timestamp, otherwise it is rolled back.

This MVCC implementation guarantees that read transactions never fail and never wait for locks to be released; for applications where reads far outnumber writes, optimistic concurrency control plus MVCC can greatly improve database performance. The protocol also has two drawbacks, though: each read operation updates the read timestamp and thus causes two disk writes, and conflicts between transactions are resolved by rollback, so if conflicts are likely or rollbacks are expensive, read/write performance can be worse than with traditional lock waiting.

1. Introduction and practice of MVCC

Under the InnoDB engine, MySQL reads run in one of two modes: current read and snapshot read.

1. A current read takes locks: it locks the latest version of the record so that other concurrent transactions cannot modify it until the lock is released. INSERT / UPDATE / DELETE operations default to current reads, and so do queries written explicitly as SELECT ... LOCK IN SHARE MODE or SELECT ... FOR UPDATE.

2. A snapshot read reads a snapshot version of the record, not necessarily the latest one, and takes no locks. It uses the MVCC mechanism; its biggest advantage is that reading requires no locks and reads do not conflict with writes, which suits applications with far more reads than writes. A plain SELECT statement without LOCK IN SHARE MODE / FOR UPDATE therefore executes as a snapshot read in MVCC mode by default. The demonstration below uses both current reads and snapshot reads.
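A quick contrast of the two modes, using the yang table created in section 2.1 below:

SELECT * FROM yang WHERE id = 1;                     -- snapshot read: no locks, served by MVCC
SELECT * FROM yang WHERE id = 1 LOCK IN SHARE MODE;  -- current read with a shared lock
SELECT * FROM yang WHERE id = 1 FOR UPDATE;          -- current read with an exclusive lock
UPDATE yang SET name = 'yang2' WHERE id = 1;         -- DML always performs a current read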

1.1 What is MVCC?

MVCC is a multi-version concurrency control mechanism.

1.2 What problem does MVCC solve?

Most of MySQL's transactional storage engines, such as InnoDB, Falcon, and PBXT, do not use a simple row-locking mechanism; they all combine row locks with MVCC, multi-version concurrency control.

As we all know, locking controls concurrent operations but carries high system overhead; MVCC can replace row-level locks in most cases and thereby reduce that overhead.

1.3 MVCC implementation

MVCC is achieved by keeping a snapshot of the data as of a certain point in time. Different storage engines implement MVCC differently, for instance with optimistic or with pessimistic concurrency control.

2. Analysis of the concrete implementation of MVCC

Next we analyze how MVCC implements concurrency control through InnoDB's implementation. InnoDB's MVCC works by storing two hidden columns with each row of records: one holds the row's creation time, the other the row's deletion time. What is stored is not an actual time value but the system version number (which you can think of as the transaction's ID). Each time a new transaction starts, the system version number is incremented, and the system version number at the start of the transaction serves as the transaction's ID. Let's look at how MVCC operates under the REPEATABLE READ isolation level.

2.1 A simple example

CREATE TABLE yang (id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(20));

Suppose the system version number starts at 1.

INSERT

InnoDB stores the current system version number as the row's version number for every newly inserted row. The first transaction has ID 1:

START TRANSACTION;
INSERT INTO yang VALUES (NULL, 'yang');
INSERT INTO yang VALUES (NULL, 'long');
INSERT INTO yang VALUES (NULL, 'fei');
COMMIT;

The corresponding table data is shown below (the last two columns are hidden columns, which cannot be seen through query statements):

id | name | creation time (transaction ID) | deletion time (transaction ID)
1  | yang | 1                              | undefined
2  | long | 1                              | undefined
3  | fei  | 1                              | undefined

SELECT

InnoDB checks each row of records against two conditions: (a) InnoDB only finds rows whose version is no later than the current transaction's version (that is, the row's system version number is less than or equal to the transaction's system version number), which ensures the transaction reads only rows that existed before it started or that it inserted or modified itself; (b) the row's deletion version is either undefined or greater than the current transaction's version number, which ensures the rows the transaction reads were not deleted before it started. Only records satisfying both (a) and (b) are returned as query results.

DELETE

InnoDB stores the current system version number (transaction ID) as the deletion mark for every row that is deleted. Consider the following concrete example; the second transaction has ID 2:

START TRANSACTION;
SELECT * FROM yang; -- (1)
SELECT * FROM yang; -- (2)
COMMIT;

Scenario 1

Suppose that while this transaction with ID 2 is executing, just after statement (1) runs, another transaction inserts a row into the table; the third transaction has ID 3:

START TRANSACTION;
INSERT INTO yang VALUES (NULL, 'tian');
COMMIT;

The data in the table is as follows:

id | name | creation time (transaction ID) | deletion time (transaction ID)
1  | yang | 1                              | undefined
2  | long | 1                              | undefined
3  | fei  | 1                              | undefined
4  | tian | 3                              | undefined

Now statement (2) executes in transaction 2. Because the creation time of the id=4 row is transaction ID 3 while the current transaction's ID is 2, and InnoDB only finds rows whose transaction ID is less than or equal to the current transaction's ID, statement (2) in transaction 2 does not retrieve the id=4 row. Both SELECT statements in transaction 2 therefore retrieve only the rows in the following table:

id | name | creation time (transaction ID) | deletion time (transaction ID)
1  | yang | 1                              | undefined
2  | long | 1                              | undefined
3  | fei  | 1                              | undefined

Scenario 2

Suppose that while the transaction with ID 2 is executing, just after statement (1), transaction 3 has finished executing, and then transaction 4 runs; the fourth transaction:

START TRANSACTION;
DELETE FROM yang WHERE id = 1;
COMMIT;

The table in the database now looks like this:

id | name | creation time (transaction ID) | deletion time (transaction ID)
1  | yang | 1                              | 4
2  | long | 1                              | undefined
3  | fei  | 1                              | undefined
4  | tian | 3                              | undefined

Then statement (2) of the transaction with ID 2 executes. By the SELECT rules above, it retrieves rows whose creation time (the creating transaction's ID) is less than or equal to the current transaction's ID and whose deletion time (the deleting transaction's ID) is undefined or greater than the current transaction's ID. The id=4 row is excluded as explained above, while the id=1 row is still visible because its deletion time (4) is greater than the current transaction's ID (2). So statement (2), select * from yang, still retrieves the id=1 row. The data retrieved by both SELECT statements in transaction 2 is therefore:

id | name | creation time (transaction ID) | deletion time (transaction ID)
1  | yang | 1                              | 4
2  | long | 1                              | undefined
3  | fei  | 1                              | undefined

UPDATE

When InnoDB executes an UPDATE, it actually inserts a new row of records, stores the current transaction's ID as the new row's creation time, and stores the current transaction's ID as the deletion time of the row being updated.

Scenario 3

Suppose that after statement (1) of transaction 2, transactions 3 and 4 have executed, and then another user performs an UPDATE on the table; the fifth transaction:

START TRANSACTION;
UPDATE yang SET name = 'Long' WHERE id = 2;
COMMIT;

Following the UPDATE rule above, a new row is generated and the transaction's ID is written into the deletion-time column of the row being modified; the table becomes:

id | name | creation time (transaction ID) | deletion time (transaction ID)
1  | yang | 1                              | 4
2  | long | 1                              | 5
3  | fei  | 1                              | undefined
4  | tian | 3                              | undefined
2  | Long | 5                              | undefined

Continuing with statement (2) of transaction 2, the SELECT rules yield the following table:

id | name | creation time (transaction ID) | deletion time (transaction ID)
1  | yang | 1                              | 4
2  | long | 1                              | 5
3  | fei  | 1                              | undefined

This is still the same result as statement (1) in transaction 2: the read is repeatable.
