Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What are the principles of transactions and MVCC in MySQL

2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

Most people do not understand the knowledge points of this article "what is the transaction in MySQL and the principle of MVCC", so the editor summarizes the following content, detailed content, clear steps, and has a certain reference value. I hope you can get something after reading this article. Let's take a look at this article, "what are the transactions and MVCC principles in MySQL?"

What is a transaction?

A database transaction refers to a set of data operations, in which either all the operations in the transaction are successful or all failed, and nothing is done. In fact, it is not that it has not been done, but it is possible to do some of it, but as long as one step fails, it is necessary to roll back all the operations.

In MySQL, transaction support is implemented at the engine layer. MySQL is a system that supports multiple engines, but not all engines support transactions. For example, MySQL's native MyISAM engine does not support transactions, which is one of the important reasons why MyISAM has been replaced by InnoDB.

1.1 four characteristics

Atomicity: after a transaction starts, all operations are either done or not done at all, and cannot be stuck in the middle. When an error occurs during the execution of the transaction, it is rolled back to the state before the start of the transaction, and all operations are as if they had not occurred. In other words, affairs are an indivisible whole, just like atoms learned in chemistry, are the basic units of matter.

Consistency: the integrity constraints of the database are not broken before and after the transaction begins. For example, if A transfers money to B, it is impossible for A to deduct the money, but B does not receive it.

Isolation: only one transaction is allowed to request the same data at a time, and there is no interference between different transactions. For example, An is withdrawing money from a bank card, and B cannot transfer money to this card until the withdrawal process of An is over.

Durability: after the transaction completes, all updates to the database made by the transaction are saved to the database and cannot be rolled back.

1.2 isolation level

Atomicity, consistency, and persistence are easy to understand among the four characteristics of SQL transactions. However, the isolation level of transactions is really difficult, today we will mainly talk about the isolation of MySQL transactions.

SQL standard transaction isolation from low to high levels are read uncommitted (read uncommitted), read commit (read committed), repeatable read (repeatable read), and serialization (serializable). The higher the level, the lower the efficiency.

Read uncommitted: when a transaction is not committed, its changes can be seen by other transactions.

Read commit: after a transaction commits, its changes will not be seen by other transactions.

Repeatable: the data seen during the execution of a transaction is always the same as the data seen when the transaction is started. Of course, under the repeatable readable isolation level, uncommitted changes are not visible to other transactions.

Serialization: as the name implies, for the same row of records, "write" adds "write lock" and "read" adds "read lock". When there is a read-write lock conflict, the later accessed transaction must wait for the previous transaction to complete before it can continue execution. So all the data at this isolation level is the most stable, but the performance is also the worst.

1.3 concurrency issues solved

The SQL transaction isolation level is designed to solve concurrency problems as much as possible:

Dirty read: transaction A reads the data updated by transaction B, and then B rolls back the operation, so the data read by An is dirty data.

Unrepeatable: transaction A reads the same data many times, and transaction B updates and commits the data during transaction A's multiple reads, resulting in inconsistent results when transaction A reads the same data many times.

Illusion: system administrator A changed the scores of all students in the database from specific scores to ABCDE grades, but system administrator B inserted a record of specific scores at this time. When system administrator A finished the change, he found that there was a record that had not been changed, as if there were hallucinations, which is called hallucination.

Different transaction isolation levels of SQL can also solve different concurrency problems, as shown in the following table: only serialized isolation levels solve all three problems, and the other three isolation levels are flawed.

Transaction isolation level dirty read non-repeatable uncommitted read may have been committed impossible repeatable read impossible possible serialization impossible

PS: unrepeatable readings and phantom readings are easily confused, non-repeatable readings focus on modification, and phantom readings focus on adding or deleting. To solve the problem of unrepeatable reading, you only need to lock the rows that meet the conditions, and to solve the phantom reading, you need to lock the table.

1.4 give me a chestnut

This may be a little difficult to understand. Take a chestnut. Or the previous table structure and table data

CREATE TABLE `student` (`id` int (11) NOT NULL AUTO_INCREMENT, `name` varchar (100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `age` int (11) NULL DEFAULT NULL, PRIMARY KEY (`id`) USING BTREE) ENGINE = InnoDB AUTO_INCREMENT = 66 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact

Suppose now, I want to start two foods at the same time, one transaction A queries the age of the student with id = 2, and the other transaction B updates the age of the student with id = 2. The process is as follows: what are the values of X1, X2, and X3 under the four isolation levels?

Read uncommitted: the value of X1 is 23, because transaction B is not committed but its changes have been seen by A. The value of X1 is dirty if it rolls back after B. The value of X2 and X3 is also 23, which is understandable.

Read submitted: the value of X1 is 22, because B has changed, but A cannot see it. (if B rolls back, the value of X1 remains the same, and the dirty reading is solved.) the value of X2 and X3 is 23, and there is nothing wrong with it. B is submitted before A can see it.

Repeatable: X1 and X2 are both 22. When An is turned on, the value is 22, so throughout the process of A, its value is 22. (no matter how B is modified during this period, as long as A has not been submitted, it is invisible and can not be read repeatedly.) while the value of X3 is 23, because A has been submitted, you can see the modified value of B.

Serialization: B is locked during the change until A commits. B to continue with the execution. B cannot write while An is reading. Make sure the data is up-to-date at this time. So X1 and X2 are both 22, while the final X3 is executed after B commits, and its value is 23.

Then why did this happen? How exactly is the transaction isolation level implemented?

How is the transaction isolation level implemented? I found the answer in Mr. Dinky's class at geek time:

In fact, a view is created in the database and accessed based on the logical results of the view. Under the repeatable isolation level, this view is created at the start of the transaction and is used throughout the existence of the transaction. Under the read commit isolation level, this view is created at the beginning of each SQL statement execution. It should be noted here that the latest value on the record is returned directly under the read uncommitted isolation level, without the concept of view, while under the serialization isolation level, parallel access is avoided by locking directly.

1.5 set transaction isolation level

Transaction isolation levels vary greatly from database to database. The default isolation level for Oracle databases is read commit, while MySQL is repeatable. So, when your system needs to migrate the database from Oracle to MySQL, please set the level to be the same as before the relocation (read commit) to avoid unpredictable problems.

1.5.1 View transaction isolation level

# check SELECT @ @ transaction_isolationshow variables like 'transaction_isolation' before transaction isolation level 5.7.20 # 5.7.20 and then SELECT @ @ tx_isolationshow variables like 'tx_isolation'+-+-+ | Variable_name | Value | +-+-+ | tx_isolation | REPEATABLE-READ | +- -+

1.5.2 set isolation level

Modify the isolation level statement format as: set [scope] transaction isolation level [transaction isolation level]

The scope is optional: SESSION (session) and GLOBAL (global). Isolation levels are the four mentioned above, which are not case-sensitive.

For example: set the global isolation level to read commit

Set global transaction isolation level read committed

1.6 initiation of transaction

There are several ways to start a transaction for MySQL:

Explicitly start a transaction statement, begin or start transaction. The accompanying commit statement is commit, or the rollback statement is rollback.

# Update the student's name START TRANSACTION;update student set name = 'Zhang San' where id = 2 commit

Set autocommit = 0, this command turns off the autocommit of the thread. This means that if you execute only one select statement, the transaction starts and does not commit automatically. This transaction persists until you actively execute commit or rollback statements, or disconnect.

Set autocommit = 1, which means that MySQL automatically starts and commits transactions. For example, when a update statement is executed, the statement is automatically submitted after the statement is completed. Do not need to display the use of begin, commit to open and commit transactions. So when we execute multiple statements, we need to manually use begin and commit to open and commit transactions.

The begin/start transaction commands mentioned above in start transaction with consistent snapshot; are not the starting point of a transaction, and the transaction does not really start until the first statement that operates on the InnoDB table after them is executed. If you want to start a transaction immediately, you can use the start transaction with consistent snapshot command. In the first startup, the consistency view is created when the first snapshot read statement is executed; in the second startup, the consistency view is created when the start transaction with consistent snapshot is executed.

02 implementation of transaction isolation

After understanding the isolation level, how is the isolation of transactions achieved? To understand transaction isolation, you need to understand the concept of concurrency control in multiple versions of MVCC. MVCC in turn relies on undo log and read view implementations.

2.1What is MVCC?

The explanation on Baidu is as follows:

MVCC, full name Multi-Version Concurrency Control, that is, multi-version concurrency control. MVCC is a method of concurrency control, which generally implements concurrent access to the database in the database management system and transaction memory in the programming language.

MVCC makes the database read will not lock the data, ordinary SELECT requests will not be locked, improving the concurrent processing ability of the database; database write will be locked. With MVCC, the database can achieve isolation levels such as READ COMMITTED,REPEATABLE READ, and users can view the previous or previous historical versions of the current data, ensuring the I feature (isolation) in ACID.

MVCC only works under two isolation levels, REPEATABLE READ and READ COMMITIED. The other two isolation levels are not compatible with MVCC because READ UNCOMMITIED always reads the latest rows of data rather than rows that match the current transaction version. SERIALIZABLE locks all rows read.

2.1.1 MVCC in InnDB

Each transaction in InnDB has a unique transaction ID, recorded as transaction_id. It applies to InnDB at the beginning of the transaction and is strictly incremented in chronological order.

There are actually multiple versions of each row of data, which depends on undo log. Each time the transaction updates the data, a new version of the data is generated, and the transaction_id is recorded as row trx_id. At the same time, the old data version will be retained in undo log, and the new version will record the rollback pointer of the old version and get the previous version directly through it.

So, MVCC in InnDB is actually achieved by saving two hidden columns after each row of records. One column is the transaction ID:trx_id; and the other column is the rollback pointer: roll_pt.

2.2 undo log

The rollback log holds a version of the data before the transaction and can be used for rollback, while providing MVCC under multi-version concurrency control, that is, unlocked read.

Depending on the operation, undo log is divided into two types: insert undo log and update undo log.

2.2.1 insert undo log

The undo log generated by the insert operation, because no historical version of the insert operation record is visible only to the current transaction itself, and this record is not visible to other transactions, so the insert undo log can be deleted directly after the transaction commits without the need for a purge operation.

The main task of purge is to delete the data that has already been mark del in the database, and also to recycle undo pages in bulk.

So, when inserting data. Its initial state is as follows:

2.2.2 update undo log

The Undo log generated by the UPDATE and DELETE operations are of the same type: update_undo. (update can be regarded as insert new data to the original location, delete old data, undo log temporarily retain the old data).

When the transaction commits, it is placed on the history list, and there is no transaction to use these rollback logs, that is, when there is no earlier version of the rollback log in the system, the purge thread will perform the final delete operation.

A transaction modifies the current data:

Another transaction modifies the data:

There are multiple versions of the same record in the database, which is the multi-version concurrency control MVCC mentioned above.

In addition, you can return to the previous version state by rolling back with undo log. For example, to go back to V1, you only need to perform two rollbacks sequentially.

2.3 read-view

Read view is a consistent read view used by InnDB to implement MVCC to support the implementation of RC (read commit) and RR (repeatable readable) isolation levels.

Read view is not real, it's just a concept, and undo log is its embodiment. It is mainly calculated by version and undolog. The role is to determine what data the transaction can see.

Each transaction or statement has its own consistent view. Normal query statements are consistent reads, which determine the visibility of the data version based on the row trx_id and the consistency view.

2.3.1 visibility rules for data version

Read view mainly contains the active read and write transactions in the current system. In implementation, InnDB constructs an array for each transaction to save the transactions that are currently active (not yet committed) at the moment the transaction starts.

As mentioned earlier, transaction ID increases strictly over time, recording the maximum value of committed transaction ID in the system as the low water level of the array, and the created transaction ID + 1 as high water level.

This array of views and the high water level constitute a consistent view of the current transaction (read view)

Draw a picture of this array, which looks like this:

The rules are as follows:

1 if the trx_id is in a gray area, the trx_id of the accessed version is less than the id value of the low water level in the array, that is, the transaction that generated the version has been committed before generating the read view, so the version is visible and can be accessed by the current transaction.

2 if the trx_id is in the orange area, the trx_id of the accessed version is greater than the id value of the high water level in the array, that is, the transaction that generated the version is generated after the read view is generated, so the version is not visible and cannot be accessed by the current transaction.

3 if you are in a green zone, there will be two situations:

A) trx_id proves in the array that this version is generated by an uncommitted transaction and is not visible

B) trx_id is not in the array, proving that this version was generated by a committed transaction, as shown in

Third, I was a little confused when I was reading the tutorial, but fortunately, there were enthusiastic netizens to answer:

Falling in the green zone means that the transaction ID is in the range of low and high water levels, and whether it is really visible depends on whether the green zone has this value. If the green area does not have this transaction ID, it is visible, and if so, it is not visible. Being in this range does not mean that there is this value in this range, for example, 4 is in the range of 1-5 in this array, but not in this array.

This may be a little difficult to understand. I assume a scenario in which three transactions query and update the same data, and draw a diagram to make it easier to understand:

The original data is still shown in the following figure. Update the information of Zhang San with id = 2:

In response to the picture above, I would like to ask a question. What are the query age values for T4 and T5 at the RC (read commit) and RR (repeatable readable) isolation levels, respectively? What is the value of T4 update? Think about it for a moment. I believe everyone has his own answer. The answer is at the end of the article. I hope you can continue to read it with your own questions.

2.3.2 results under RR (repeatable)

At the RR level, the query only recognizes data that has been committed before the transaction starts, and a view is built once the transaction is started. So using the start transaction with consistent snapshot command, the view will be created immediately.

Now assume:

Before transaction A starts, there is only one active transaction, ID = 2

The committed transaction is the transaction that inserts the data ID = 1

The transaction ID of transaction A, B and C is 3, 4 and 5, respectively.

At this isolation level, the moment they create the view is as follows:

According to the figure above, the view array of transaction An is [2jue 3]; the view array of transaction B is [2jue 3re4]; and the view array of transaction C is [2pc3re4]. Analyze a wave:

At T4, the B-read data is read from the current version, and the process is as follows:

Read the current version of trx_id = 4, which happens to be yourself, you can see

So age = 24

At T5, the A-read data is read from the current version, and the process is as follows:

Read the current version of trx_id = 4, which is larger than the high water level of your own view array and is not visible.

Then read trx_id = 5, which is higher than the high water level of your own view array and is not visible.

Then read trx_id = 1, which is lower than the low water level of your own view array, and you can see

So age = 22

In this way, although this row of data has been modified during this period, transaction A sees that the results of this row of data are consistent whenever it is queried, so we call it a consistent read.

In fact, whether the view is visible or not depends on the timing of view creation and submission, and summarizes the rules:

Version not submitted, not visible

The version has been submitted, but is not visible after the view is created

The version has been submitted, but before the view is created, you can see

2.3.2.1 Snapshot read and current read

The update statement of transaction B, if read according to the consistency of the above figure, does not seem to have a good result.

As shown in the following figure, the view array of B is made by Mr., and transaction C commits after that. Then you should not see the C modified age = 23? How did B get 24 in the end?

Yes, if B executes the query before the update, the result must be age = 22. The problem is that the update cannot be updated in the historical version, otherwise the update of C will be lost.

Therefore, there is a rule for updating data: update data is read first and then written (read is the execution of update statements, not manually), reading is the value of the current version, which is called current reading, while our ordinary query statement is called snapshot reading.

Therefore, when updating, what you currently read is age = 23, and after the update, it becomes 24.

2.3.2.2 select current read

In addition to the update statement, the query statement is currently read if locked. If you change the query statement select age from t where id = 2 of transaction An and add a lock (lock in mode or for update), you can also get the age = 24 returned by the current version 4.

Here is the locked select statement:

Select age from t where id = 2 lock in mode; select age from t where id = 2 for update;2.3.2.3 transaction C does not commit immediately

Suppose transaction C is not committed immediately, but the age = 23 version has been generated. How will the update of transaction B go?

Transaction C has not been committed and the write lock has not been released, but the update of transaction B must be currently read and must be locked. So transaction B is blocked and must wait until transaction C commits to release the lock to continue the current read.

2.3.3 results under RC (read submission)

At the read commit isolation level, the query recognizes only the data that has been committed before the statement starts; a new view is recalculated before each statement is executed.

Note: in the table above, it is the start transaction with consistent snapshot command that starts the transaction, which creates a view that persists throughout the transaction. So, at the RC level, this command doesn't really work. Equivalent to a normal start transaction (the transaction is not started until the sql statement is executed). Therefore, the update of transaction B is actually after transaction C, which has not really started the transaction, and C has been committed.

Now assume:

Before transaction A starts, there is only one active transaction, ID = 2

The committed transaction is the transaction that inserts the data ID = 1

The transaction ID of transaction A, B and C is 3, 4 and 5, respectively.

At this isolation level, the moment they create the view is as follows:

According to the figure above, the view array of transaction An is [2jue 3re4], but its high water level is 6 or greater (transaction ID + 1 has been created); the view array of transaction B is [2jue 4]; and the view array of transaction C is [2jue 5]. Analyze a wave:

At T4, the B-read data is read from the current version, and the process is as follows:

Read the current version of trx_id = 4, which happens to be yourself, you can see

So age = 24

At T5, the A-read data is read from the current version, and the process is as follows:

Read the current version of trx_id = 4, which can contain 4 within your own consistent view, and is not visible.

Read up to trx_id = 5, which is within the scope of your own consistent view, but does not contain 5, you can see

So age = 23

The above is the content of this article on "what is the transaction in MySQL and the principle of MVCC". I believe we all have a certain understanding. I hope the content shared by the editor will be helpful to you. If you want to know more about the relevant knowledge, please pay attention to the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report