In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
This article will explain in detail the example analysis of MVCC multi-version concurrency in MySQL technology. The content of the article is of high quality, so the editor shares it with you for reference. I hope you will have a certain understanding of the relevant knowledge after reading this article.
one。 Premise outline what is MVCC
MVCC, full name Multi-Version Concurrency Control, that is, multi-version concurrency control. MVCC is a method of concurrency control, which generally implements concurrent access to the database in the database management system and transaction memory in the programming language.
The main purpose of the implementation of MVCC in MySQL InnoDB is to improve the concurrency performance of the database and to deal with read-write conflicts in a better way, so that even when there are read-write conflicts, it can be unlocked and non-blocking. Anyway, every change is recorded in the version chain. SELECT can get records from the version chain, which realizes the concurrent execution of read-write, write-read, and improves the performance of the system.
Transactions under two isolation levels, committed read (READ COMMITTD) and repeatable read (REPEATABLE READ), access records in the version chain for SELECT operations.
What is current read and snapshot read
Current read and snapshot read under MySQL InnoDB?
Current read
Operations such as select lock in share mode (shared lock), select for update; update, insert, and delete (exclusive lock) are all current reads, that is, they read the latest version of the record, and make sure that other concurrent transactions cannot modify the current record and lock the read record.
Snapshot read
For example, an unlocked select operation is a snapshot read, that is, an unlocked non-blocking read; the premise of a snapshot read is that the isolation level is not a serial level, and the snapshot read at the serial level will degenerate to the current read The reason for snapshot reading is based on the consideration of improving concurrency performance. The implementation of snapshot reading is based on multi-version concurrency control, that is, MVCC. It can be considered that MVCC is a variant of row lock, but in many cases, it avoids locking operations and reduces overhead. Since it is based on multiple versions, snapshot reading may not necessarily read the latest version of the data, but may be the previous historical version.
MVCC is to realize read-write conflict without lock, and this read refers to snapshot read, not current read, which is actually a locking operation and the realization of pessimistic lock.
The relationship between current read, snapshot read and MVCC
MVCC multi-version concurrency control refers to the concept of "maintaining multiple versions of a data so that there is no conflict between read and write operations". It is only an ideal concept and in MySQL, to realize such an MVCC ideal concept, we need MySQL to provide specific functions to implement it, and snapshot reading is one of the specific non-blocking read functions that MySQL implements for us in the MVCC ideal model.
Relatively speaking, the current reading is the specific function of the pessimistic lock, to be more detailed, snapshot reading itself is also an abstract concept, and then in-depth study. The specific implementation of MVCC model in MySQL is accomplished by three implicit fields, undo log, Read View and so on. For more information, please see the following MVCC implementation principle.
What problems can MVCC solve? what are the benefits?
There are three database concurrency scenarios, which are:
Read-read: there are no problems and no concurrency control
Read-write: there are thread safety problems, which may cause transaction isolation problems, and may encounter dirty reading, phantom reading, and unrepeatable reading.
Write-write: there are thread safety problems, and there may be update loss problems, such as the first type of update loss and the second type of update loss.
What are the benefits of MVCC?
Multi-version concurrency control (MVCC) is an unlocked concurrency control used to resolve read-write conflicts, that is, an one-way growing timestamp is allocated to a transaction, a version is saved for each modification, the version is associated with the transaction timestamp, and the read operation only reads the snapshot of the database before the transaction starts. So MVCC can solve the following problems for the database
When reading and writing to the database concurrently, it can be achieved that there is no need to block the write operation during the read operation, and the write operation does not have to block the read operation, which improves the performance of the database for concurrent read and write. at the same time, it can also solve the transaction isolation problems such as dirty read, phantom read and non-repeatable read, but it can not solve the problem of update loss.
In short, MVCC is because Daniel is not satisfied with the solution proposed by the database to solve read-write conflicts in the form of pessimistic locks, so with MVCC in the database, we can form two combinations:
MVCC + pessimistic lock: MVCC solves read and write conflicts, pessimistic lock solves write conflicts
MVCC + optimistic lock: MVCC solves read-write conflicts, optimistic locks resolve write-write conflicts
This combination can maximize database concurrency performance and solve the problems caused by read-write conflicts and write-write conflicts.
The realization principle of MVCC
The purpose of MVCC is multi-version concurrency control, and its implementation in the database is to solve the read-write conflict. Its implementation principle is mainly based on three implicit fields in the record, undo log and Read View. So let's first take a look at these three point concepts.
Implicit field
In addition to our custom fields, each row of records also has fields such as DB_TRX_ID,DB_ROLL_PTR,DB_ROW_ID implicitly defined in the database.
DB_TRX_ID:6byte, recently modified / inserted transaction ID: records the transaction ID that created this record / last modified it. This transaction ID is used to store the transaction ID each time a clustered index record is modified.
DB_ROLL_PTR:7byte, roll back the pointer, point to the last version of this record (stored in rollback segment)-> the target is the undoLog pointer, and each time a change is made to which clustered index is recorded, the old version is written to the undo log. This roll_pointer stores a pointer to the location of the previous version of the clustered index record, which is used to obtain the record information of the previous version. (note that the undo log for the insert operation does not have this property because it does not have the older version.)
DB_ROW_ID:6byte, the implicit self-increasing ID (hidden primary key). If the data table does not have a primary key, InnoDB will automatically generate a clustered index as DB_ROW_ID.
There is actually a delete flag hidden field, since the record is updated or deleted does not mean that it is really deleted, but the deletion flag has changed.
DB_ROW_ID is the only implicit primary key that the database generates by default for this record.
DB_TRX_ID is the transaction ID that currently operates on the record
DB_ROLL_PTR is a rollback pointer to match the undo log and point to the previous version
Undo log
There are two main types of undo log:
Insert undo log
The undo log that represents the undo log generated by a transaction when a new record is insert is only needed when the transaction is rolled back and can be discarded immediately after the transaction is committed.
Update undo log
The undo log; generated by the update or delete on behalf of the transaction is needed not only when the transaction is rolled back, but also when the snapshot is read back. Therefore, it cannot be deleted casually. Only when the snapshot read or the transaction rollback does not involve the log, the corresponding log will be cleared by the purge thread.
Purge operation in order to implement the MVCC mechanism of InnoDB, update or delete operations only set the deleted_bit of old records, and do not really delete outdated records.
To save disk space, InnoDB has a special purge thread to clean up records whose deleted_bit is true. In order not to affect the normal operation of the MVCC, the purge thread also maintains a read view (this read view is equivalent to the read view of the oldest active transaction in the system); if the deleted_bit of a record is true and the DB_TRX_ID is visible relative to the read view of the purge thread, then the record must be safely cleared.
The essence that is helpful to MVCC is update undo log. Undo log actually exists the old record chain in rollback segment, and its execution process is as follows:
1. For example, if a transaction inserts a persion table and inserts a new record, the record is as follows: name is Jerry, age is 24 years old, implicit primary key is 1, transaction ID and rollback pointer, let's assume NULL
2. Now there is a transaction 1 that changes the name of the record to Tom
When transaction 1 modifies the row (record) data, the database will first add an exclusive lock to the row
Then copy that line of data into undo log as an old record, since there is a copy of the current row in undo log
After the copy is completed, modify the name of the line to Tom, and modify the transaction ID of the hidden field to the ID of the current transaction 1. By default, we start with 1, and then increment it. The rollback pointer points to the copy record copied to undo log, which means that my last version is it.
Release the lock after the transaction is committed
Third, another transaction 2 modifies the same record of the person table, changing the age to 30 years old
When transaction 2 modifies the data of the row, the database also locks the row first
Then copy this line of data into undo log, as an old record, and find that the row record already has undo log, so the latest old data is inserted in front of the undo log of the row record as the header of the linked list.
Change the age of the row to 30 years old, and modify the transaction ID of the hidden field to the ID of the current transaction 2, which is 2, and the rollback pointer points to the copy record that has just been copied to undo log.
Transaction commit, release lock
From the above, we can see that the modification of the same record by different transactions or the same transaction will cause the undo log of the record to become a linear list of the record version. the linked list, the head of the undo log is the latest old record, and the end of the chain is the earliest old record (of course, as mentioned earlier, the node of the undo log may be cleared by the purge thread, to the first insert undo log in the diagram. In fact, it may be deleted and lost after the transaction is committed, but it is still here for demonstration)
Read View (read View) what is Read View? The difference between committed reads and repeatable reads is that they have different strategies for generating ReadView.
What is Read View? to put it bluntly, Read View is the read view (Read View) produced when the transaction performs a snapshot read operation. At the moment the snapshot read is performed by the transaction, a snapshot of the database system is generated and the ID of the current active transaction of the system is recorded and maintained (when each transaction is opened, an ID is assigned, this ID is incremented, so the latest transaction, the higher the ID value)
So we know that Read View is mainly used for visibility judgment, that is, when a transaction performs a snapshot read, create a Read View read view of the record and compare it to a condition to determine which version of the data the current transaction can see. It may be either the latest data or some version of the data in the undo log recorded by the row.
Read View follows a visibility algorithm, which mainly takes out the DB_TRX_ID (that is, the current transaction ID) from the latest record of the data to be modified, and compares it with the ID of other active transactions in the system (maintained by Read View). If the properties of DB_TRX_ID and Read View do not conform to the visibility, use the DB_ROLL_PTR rollback pointer to retrieve the DB_TRX_ID in the Undo Log and compare again. That is, traversing the DB_TRX_ID of the linked list (from the beginning to the end of the chain, that is, from the last modification) to finding a DB_TRX_ID that meets a specific condition, then the old record in which the DB_TRX_ID is located is the latest old version that can be seen by the current transaction.
So what is the condition of this judgment?
As above, it is a piece of source code for MySQL to judge visibility, that is, the changes_visible method (not completely ha, but you can see the general logic). This method shows how we compare DB_TRX_ID with some attributes of Read View.
Before I show you, let me simplify Read View. We can simply understand Read View as having three global properties.
Trx_list (I chose the name casually)
A list of values used to maintain the active transaction ID of the system at the time of Read View generation
Up_limit_id
Record the ID with the lowest transaction ID in the trx_list list
Low_limit_id
ReadView generates the next transaction ID that has not been allocated by the system at the time of generation, that is, the maximum value of the transaction ID that has occurred so far.
First, compare DB_TRX_ID < up_limit_id. If less than, the current transaction can see the record where DB_TRX_ID is located, and if greater than or equal to proceed to the next judgment.
Next, it is determined that the DB_TRX_ID is greater than or equal to low_limit_id. If it is greater than or equal to, it means that the record in which the DB_TRX_ID is located does not appear until after the Read View is generated. It must not be visible to the current transaction. If it is less than, then proceed to the next judgment.
Judge whether the DB_TRX_ID is in an active transaction, trx_list.contains (DB_TRX_ID). If so, it represents the time when my Read View is generated. Your transaction is still active, without Commit, and the data you modified is invisible to my current transaction. If not, it means that your transaction was already Commit before the Read View was generated, and the result of your modification can be seen in my current transaction.
Overall process
After understanding the concepts of implicit fields, undo log, and Read View, you can take a look at the overall flow of the MVCC implementation.
When transaction 2 performs a snapshot read on a row of data, the database generates a Read View read view for that row of data, assuming that the current transaction ID is 2, transaction 1 and transaction 3 are active at this time, and transaction 4 commits an update just before the snapshot read of transaction 2, so Read View records the ID of the system's current active transaction 1, which is maintained on a list, assuming that we call it trx_list.
Read View will not only maintain transaction 2 through a list trx_list to perform snapshot reads of the transaction ID that the system is active at the moment, but there will also be two attributes up_limit_id (the ID with the smallest transaction ID in the record trx_list list), low_limit_id (the ID with the largest transaction ID in the record trx_list list, and some people say that snapshot reads the next transaction ID that has not been assigned by the system at that moment, that is, the maximum value of transaction ID that has appeared so far. I prefer the latter. So in this example, up_limit_id is 1, and the value of the collection 4 + 1 = 5 is 1, and the value of the collection, 3 read View, is shown below.
In the example, only transaction 4 modifies the row record and commits the transaction before transaction 2 performs a snapshot read, so the undo log of the current data in the row is shown in the following figure When our transaction 2 takes a snapshot to read the row record, it compares the DB_TRX_ID of the row record with the up_limit_id,low_limit_id and the active transaction ID list (trx_list) to determine which version of the record the current transaction 2 can see.
So first compare the transaction ID4 recorded in the DB_TRX_ID field of the record with the up_limit_id of Read View to see whether 4 is less than up_limit_id (1), so it does not meet the condition. Continue to judge whether 4 is greater than or equal to low_limit_id (5), nor does it meet the condition. Finally, it determines whether transaction 4 is an active transaction in trx_list, and finally finds that the transaction with transaction ID of 4 is not in the current active transaction list and meets the visibility condition, so the latest result of transaction 4's modified commit is visible to transaction 2's snapshot, so the latest data record that transaction 2 can read is the version committed by transaction 4, and the version of transaction 4's commit is also the latest version from a global point of view. It is also the different timing of Read View generation, which results in different results of snapshot reading at the RC,RR level.
Problems related to MVCC
How does RR solve the problem of unrepeatable reading on the basis of RC level? The difference between current reads and snapshot reads at the RR level:
In the order here in Table 2, the snapshot read and current read of transaction B after transaction A commits are both real-time new data 400. Why?
The only difference between this and the above table is that transaction B in Table 1 has taken a snapshot of the amount data before transaction A modifies the amount, while transaction B in Table 2 has not done a snapshot read before transaction A modified the amount.
So we know that the result of snapshot reading in a transaction is very dependent on the place where the snapshot read occurs for the first time in the transaction, that is, the place where the snapshot read occurs for the first time in a transaction is critical, and it has the ability to determine the result of subsequent snapshot reads in the transaction. What we are testing here is update, and deletion and update are the same. If the snapshot read of transaction B is performed after transaction An operation, the snapshot read of transaction B can also read the latest data.
What is the difference between InnoDB snapshot reads at the RC,RR level?
It is the different timing of Read View generation that leads to different results of snapshot reading at the RC,RR level.
The first snapshot read of a record by a transaction at the RR level creates a snapshot and a Read View to record other transactions active in the current system, and then the snapshot read is still called with the same Read View, so as long as the current transaction uses the same snapshot read before the other transaction commits the update, then all subsequent snapshot reads use the same Read View. So subsequent changes are not visible.
That is, at the RR level, when a snapshot read generates Read View, Read View takes a snapshot of all other active transactions at this time, and the modifications of these transactions are not visible to the current transaction. Changes made by transactions created earlier than Read View are visible, and at the RC level, each snapshot read generates a new snapshot and Read View, which is why we can see updates committed by other transactions in transactions at the RC level.
In short, at the RC isolation level, each snapshot read generates and acquires the latest Read View;, while at the RR isolation level, the Read View is created by the first snapshot read in the same transaction, and subsequent snapshot reads get the same Read View.
This is the end of the sample analysis of MVCC multi-version concurrency in MySQL technology. I hope the above content can be helpful to you and learn more knowledge. If you think the article is good, you can share it for more people to see.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.