In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/01 Report--
This article introduces the relevant knowledge of "what is the value read when MySQL MVCC updates the data". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!
If it is a repeatable readable isolation level, transaction T creates a view read-view when it starts, and then during transaction T execution, even if other transactions modify the data, transaction T still sees what it sees at startup. In other words, a transaction executed under the repeatable readable isolation level seems to be at peace with the world and free from external influence.
But when sharing a row lock, it is mentioned that a transaction needs to update a row, and if another transaction happens to have a row lock for that row, it cannot be so detached, it will be locked and enter the waiting state. The question is, now that it is in a waiting state, what value does the transaction read when it acquires the row lock to update data?
Let me give you an example. The following is an initialization statement for a table with only two rows.
Mysql > CREATE TABLE `t` (`id` int (11) NOT NULL, `k` int (11) DEFAULT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB;insert into t (id, k) values (1), (2)
Here, we need to pay attention to the timing of the transaction.
The begin/start transaction command is not the starting point of a transaction, and the transaction really starts only after the execution of the first statement that operates on the InnoDB table (the first snapshot read statement). If you want to start a transaction immediately, you can use the command start transaction with consistent snapshot. It is also important to note that in our example, if not specified, it is the default autocommit=1.
In this example, transaction C does not explicitly use begin/commit, indicating that the update statement itself is a transaction and is automatically committed when the statement is completed. Transaction B is queried after updating the row; transaction An is queried in a read-only transaction and in chronological order after the query of transaction B.
At this point, if I tell you that the value of k found by transaction B is 3 and the value of k found by transaction An is 1, do you feel a little dizzy?
So, in today's article, I actually want to clarify this problem with you. I hope that the process of solving this doubt will help you to have a better understanding of InnoDB's transactions and locks. In MySQL, there are two concepts of views:
One is view. It is a virtual table defined by a query statement that executes the query statement and generates the results when called. The syntax for creating a view is create view... And its query method is the same as a table
The other is the consistent read view that InnoDB uses when implementing MVCC, namely consistent read view, which is used to support the implementation of RC (Read Committed, read commit) and RR (Repeatable Read, repeatable readable) isolation levels.
It has no physical structure and is used to define what data I can see during transaction execution. I explained to you the implementation logic of MVCC. Today, to illustrate the difference between query and update, I'll take read view apart in a different way. You can combine the instructions in these two articles to learn more about MVCC.
How does a snapshot work in MVCC?
At the repeatable readable isolation level, the transaction "took a snapshot" when it started. Note that this snapshot is based on the entire library.
At this point, you will say that this does not look very realistic. If a library has 100 gigabytes, then I start a transaction and MySQL has to copy 100g of data, which is a slow process. However, my usual affairs are carried out very quickly.
In fact, we do not need to copy out the 100G data. Let's take a look at how this snapshot is implemented.
Each transaction in InnoDB has a unique transaction ID, called transaction id. It is applied to the transaction system of InnoDB at the beginning of the transaction and is strictly incremented in the order of application.
There are multiple versions of each row of data. Each time the transaction updates the data, a new data version is generated and the transaction id is assigned to the transaction ID for that data version, marked as row trx_id. At the same time, the old data version should be retained, and in the new data version, information can be obtained directly. In other words, a row of records in a data table may actually have multiple versions (row), each with its own row trx_id. The following figure shows the state of a record after it has been continuously updated by multiple transactions.
In the dotted frame in the picture are four versions of the same row of data. The current latest version is V4Magi k with a value of 22, which is updated by a transaction with a transaction id of 25, so its row trx_id is also 25.
You might ask, didn't the previous article say that statement updates generate undo log (rollback logs)? So, where is undo log?
In fact, the three dotted arrows in figure 2 are undo log;, while V1, V2, and V3 are not physically real, but are calculated based on the current version and undo log each time they are needed. For example, when you need V2, you use V4 to calculate U3 and U2 in turn.
Now that we understand the concept of multiple versions and row trx_id, let's think about how InnoDB defines that "100G" snapshot.
According to the definition of repeatable readability, when a transaction starts, you can see the results of all committed transactions. But then, during the execution of this transaction, updates from other transactions are not visible to it.
Therefore, a transaction only needs to declare at startup time, "at the time of my startup, if a data version is generated before I start, I will recognize it; if it is generated after I start, I will not recognize it. I have to find the last version of it.
Of course, if the previous version is not visible, you have to move on. Also, if the transaction updates its own data, it still has to recognize it.
In implementation, InnoDB constructs an array for each transaction to hold the transaction startup moment, and all transactions that are currently "active" ID. "active" means that it is started but has not yet been committed.
The minimum value of transaction ID in the array is marked as low water level, and the maximum value of transaction ID that has been created in the current system plus 1 is recorded as high water level.
This array of views and the high water level form a consistent view of the current transaction (read-view). The visibility rules of the data version are based on the comparison between the row trx_id of the data and this consistent view. This view array divides all row trx_id into several different cases.
In this way, there are several possibilities for a data version of row trx_id at the start of the current transaction:
1. If it falls in the green section, it indicates that this version is a committed transaction or the current transaction generated by itself, and this data is visible.
2. If it falls on the red part, it means that this version is generated by a transaction started in the future and is definitely invisible.
3. If it falls in the yellow part, there are two situations.
3.1 if row trx_id is in the array, this version is generated by a transaction that has not yet been committed and is not visible
3.2 if row trx_id is not in the array, it means that this version is generated by a committed transaction, visible.
For example, for the data in figure 2, if there is a transaction whose low water level is 18, then when it accesses this row of data, it calculates V3 from V4 through U3, so in its view, the value of this row is 11.
You see, with this statement, the subsequent updates in the system have nothing to do with what the transaction sees? Because of the subsequent update, the generated version must be in the case of 2 or 3 (a) above, and for it, these new data versions do not exist, so the snapshot of the transaction is static.
So as you know now, InnoDB takes advantage of the fact that all data has multiple versions to achieve the ability to create snapshots in seconds.
Next, let's take a look at the three transactions in figure 1 and analyze the result returned by the statement of transaction An and why it is knew1. Here, we might as well make the following assumptions:
Before transaction A starts, there is only one active transaction in the system. The ID is 99.
The version numbers of transaction A, B and C are 100, 101 and 102 respectively, and there are only these four transactions in the current system.
Before the start of the three transactions, (1) the row trx_id of this row is 90.
Thus, the view array of transaction An is [99100], the view array of transaction B is [99100101], and the view array of transaction C is [99100101102].
In order to simplify the analysis, I first remove the other interfering statements and draw only the operations related to the transaction A query logic:
As you can see from the figure, the first valid update is transaction C, which changes the data from (1) to (1). At this time, the latest version of the data is row trx_id 102, while version 90 has become a historical version.
The second valid update is transaction B, which changes the data from (1p2) to (1p3). At this time, the latest version of the data (row trx_id) is 101and 102 becomes the historical version.
You may have noticed that at the time of transaction A query, transaction B has not yet committed, but the version it generated (1d3) has become the current version. But this version must be invisible to transaction A, or it will become dirty.
OK, now transaction An is going to read the data, and its view array is [99100]. Of course, the data is read from the current version. Therefore, the read data flow of the transaction A query statement is as follows:
When row trx_id=101 is found, it is judged that it is larger than the high water level, is in the red area, and is not visible.
Then, find the previous historical version, look at the row trx_id=102, larger than the high water level, in the red area, invisible
Looking further, I finally found (1), its row trx_id=90, which is smaller than the low water level, is in the green zone, visible.
In this way, although this row of data has been modified during this period, transaction A sees that the results of this row of data are consistent whenever it is queried, so we call it a consistent read.
This judgment rule is translated directly from the code logic, but as you can see, it is troublesome to use visibility in human flesh analysis. So, let me translate it for you. For a data version, there are three situations for a transaction view, except that its own updates are always visible:
Version not submitted, not visible
The version was submitted, but it was submitted after the view was created and is not visible
The version has been submitted, and it was submitted before the view was created, visible.
Now, we use this rule to determine the query result in figure 4. The view array of the query statement of transaction An is generated when transaction A starts, which is the time:
(1p3) has not been submitted yet. It belongs to case 1 and is not visible.
(1) although it was submitted, it was submitted after the view array was created, which belongs to case 2 and is not visible
(1recovery1) is submitted before the view array is created, visible.
You see, after getting rid of the digital comparison, it is much easier to judge only by the chronological order. So, later, we all use this rule to analyze.
Update logic
Careful students may have questions: if you read the update statement of transaction B according to consistency, it seems that the result is not correct. If you look at the figure below, the view array of transaction B is made by Mr., and then transaction C commits. Isn't it supposed to be invisible (1Mague 2)? how can it be calculated (1Magne3)?
Yes, if transaction B queries the data once before the update, the value of k returned by this query is indeed 1. However, when it is going to update the data, it can no longer be updated in the historical version, otherwise the update of transaction C will be lost. Therefore, the set k=k+1 of transaction B at this time is based on (1mem2).
So, here is a rule: update data is read before you write, and this read can only read the current value, called "current read" (current read).
Therefore, at the time of the update, the data currently read is (1mem2). After the update, a new version of the data is generated (1Jing 3), and this new version of row trx_id is 101. Therefore, when executing the transaction B query statement, I see that my version number is 101, and the latest data version number is also 101, which is my own update and can be used directly, so the value of k obtained by the query is 3.
Here we mention a concept called current reading. In fact, in addition to the update statement, the select statement, if locked, is also currently read. Therefore, if you modify the query statement select * from t where id=1 of transaction A, and add lock in share mode or for update, you can also read the data with version number 101, and the value of k returned is 3. The following two select statements add read locks (S locks, shared locks) and write locks (X locks, exclusive locks), respectively.
Mysql > select k from t where id=1 lock in share mode;mysql > select k from t where id=1 for update; one more step forward
Suppose transaction C is not committed immediately, but becomes the following transaction C?
The difference of transaction C'is that it is not committed immediately after the update, and the update statement of transaction B is initiated before it commits. As mentioned earlier, although transaction C 'has not yet been committed, this version has been generated and is currently the latest version. So, what about the update statement for transaction B?
Keep in mind at this point (in InnoDB transactions, row locks are added when needed, but not immediately when they are not needed, but not until the end of the transaction. This is the two-phase locking protocol. Transaction C'is not committed, which means that the write lock on this version has not been released. Transaction B is the current read, must read the latest version, and must be locked, so it is locked, and must wait until transaction C 'releases the lock before continuing its current read.
At this point, we string consistent reads, current reads, and row locks together. Now, let's return to the question at the beginning of the article: how is the repeatability of transactions achieved?
The core of repeatable readability is consistent read; when a transaction updates data, it can only use the current read. If the row lock of the current record is occupied by another transaction, you need to enter the lock and wait.
The logic of read submission is similar to that of repeatable readings, with the main difference being:
At the repeatable readable isolation level, you only need to create a consistency view at the beginning of the transaction, which is shared by all other queries in the transaction
At the read commit isolation level, a new view is recalculated before each statement is executed.
So, let's take a look at how much k should be found by the query statements of transaction An and transaction B at the read commit isolation level.
To be clear here, start transaction with consistent snapshot; means to create a consistent snapshot that persists throughout the transaction, starting with this statement. Therefore, at the read commit isolation level, this usage is meaningless and is equivalent to a normal start transaction.
The following is the state diagram when reading the submission, and you can see that the timing of creating the view array of these two query statements has changed, which is the read view box in the diagram. (note: here, we are using the logic of transaction C to commit directly, not transaction C')
At this point, the view array of the query statement of transaction An is created when the statement is executed, and the generation time in sequence (1meme2) and (1meme3) is before the time when the view array is created. But at this moment:
(1p3) has not been submitted yet. It belongs to case 1 and is not visible.
(1pd2) submitted, in case 3, visible.
So, at this point, the transaction A query statement returns krypton2. Obviously, transaction B queries the result knew3.
Thinking
Using the following table structure and initialization statements as the experimental environment, the transaction isolation level is repeatable. Now, I'm going to zero the c values of all "rows with equal field c and id values", but I found a "weird" situation that can't be changed. Please construct this situation and explain its principle.
Mysql > CREATE TABLE `t` (`id` int (11) NOT NULL, `c` int (11) DEFAULT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB;insert into t (id, c) values (1), (2), (3), (4)
Answer: take advantage of the accuracy of update
This is the end of the content of "what is the value read when MySQL MVCC updates the data". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.