In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article introduces the knowledge of "can the repeatable level of MySQL solve the problem of phantom reading". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!
Introduction
When I had an in-depth understanding of database theory, I learned about the possible problems with different isolation levels of transactions. For a better understanding, test and reproduce these problems in the MySQL database. Both dirty reading and unrepeatable reading can be easily repeated at the corresponding isolation level.
But for phantom reading, I found that it did not appear under the isolation level of repeatable reading, and I wondered what MySQL had done to phantom reading.
Test:
Create a test table dept:
CREATE TABLE `dept` (`id` int (11) NOT NULL AUTO_INCREMENT, `name` varchar (20) DEFAULT NULL, PRIMARY KEY (`id`) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 insert into dept (name) values ("Logistics Department")
According to the above process execution, it is expected that the first select of transaction 1 queries out one piece of data, and the second select queries out two pieces of data (including the data committed by transaction 2).
But in the actual test, it is found that the second select actually only queries and processes one piece of data. This is but according to the repeatable implementation of database theory (exclusive lock and shared lock) this is not the case.
Let's review the relevant theories of affairs before we understand the actual causes.
Database principle theory
Business
Transaction, which generally refers to what is to be done or done. In computer terminology, a program execution unit (unit) that accesses and may update various data items in a database. A transaction consists of all operations performed between the start of the transaction (begin transaction) and the end of the transaction (end transaction).
In a relational database, a transaction can be a set of SQL statements or an entire program.
Why should there be a transaction?
A database transaction usually contains a sequence of operations that read or write to the database. It exists for the following two purposes:
It provides a method for database operation to recover from failure to normal state, and also provides a way for database to maintain consistency in abnormal state.
When multiple applications are accessing the database concurrently, an isolation method can be provided between these applications to ensure that their operations interfere with each other.
Transaction characteristics
Transactions have four characteristics: atomicity, consistency, isolation, and persistence. These four attributes are often referred to as ACID attributes.
Atomicity: a transaction should be an indivisible unit of work, and the operations included in the transaction are either successful or unsuccessful.
Consistency: the transaction must be to change the database from one consistency state to another. Consistency is closely related to atomicity.
Isolation: the execution of one transaction cannot be interfered with by other transactions. That is, the operations and the data used within a transaction are isolated from other concurrent transactions before the transaction is committed, and the concurrent transactions can not influence each other.
Durability: once a transaction is successfully committed, its changes to the data in the database should be permanent. Other operations or failures that follow should not affect it in any way.
Several features between transactions are not an equivalent set of concepts:
If there is only one transaction at any time, it is naturally isolated, and it can be consistent as long as atomicity is guaranteed.
If there is concurrency, atomicity and isolation need to be ensured to ensure consistency.
Problems in database concurrent transactions
If you do not consider the isolation of transactions, the following problems occur:
Dirty reading
Dirty reading refers to reading data from another uncommitted transaction during one transaction. When a transaction is modifying a data many times, and the many modifications in the transaction have not been committed, then a concurrent transaction to access the data will cause the data obtained by the two transactions to be inconsistent.
Non-repeatable
Non-repeatable reading means that different data values are returned by multiple queries within a transaction range for a piece of data in the database (the difference here means that the contents of one or more pieces of data are inconsistent, but the number of data items is the same). This is because the data needed by the transaction is modified and committed by another transaction during the query interval.
The difference between unrepeatable reading and dirty reading is that dirty reading is that one transaction reads dirty data not committed by another transaction, while unrepeatable reading reads data committed by other transactions. It should be noted that in some cases unrepeatable reading is not a problem.
Illusory reading
Illusory reading is a phenomenon that occurs when a transaction is not executed independently. For example, transaction T1 modifies a data item of all rows in a table from "1" to "2", and transaction T2 inserts a row of data items into the table, and the value of this data item is still "1" and submitted to the database.
On the other hand, if the user who operates transaction T1 looks at the data that has just been modified, he will find that there is still one line that has not been modified. In fact, this line is added from transaction T2, which is like an illusion, which is a hallucination.
Both phantom and unrepeatable reads read another committed transaction (this is different from dirty reads), except that unrepeatable reads may occur in update,delete operations, while phantom reads occur in insert operations.
Exclusive lock, shared lock
Exclusive lock (Exclusive), also known as X lock, write lock.
Shared lock (Shared), also known as S lock, read lock.
There are the following relationships between read-write locks:
A transaction adds an S lock to the data object O, and the O can be read, but not updated. During locking, other transactions can add S locks to O, but not X locks.
A transaction can read and update O by adding an X lock to the data object. No locks can be applied to O by other transactions during locking.
That is, the relationship between read-write locks can be summarized as: multiple read-write
Isolation level of the transaction
There are several isolation levels in a transaction:
Read unsubmitted (Read Uncommitted)
Resolve the issue of missing updates. If one transaction has already started writing, other transactions are not allowed to write at the same time, but other transactions are allowed to read this row of data. This isolation level can be achieved through an "exclusive write lock", that is, some data that the transaction needs to modify must be locked by X, and the read data does not need to be locked by S.
Read submitted (Read Committed)
Solved the problem of dirty reading. Transactions that read data allow other transactions to continue to access the row's data, but uncommitted write transactions will prevent other transactions from accessing the row. This can be achieved through "instant shared read lock" and "exclusive write lock", that is, some data that the transaction needs to modify must be locked by X and S when the data is read, and the S lock is released immediately after the data is read. You don't have to wait until the end of the transaction.
Repeatable read (Repeatable Read)
Non-repeatable and dirty reads are prohibited, but phantom reading data may sometimes occur. Transactions that read data will prohibit writing transactions (but read transactions are allowed), and write transactions will prohibit any other transactions.
Mysql uses this isolation level by default. This can be achieved through "shared read locks" and "exclusive write locks", that is, some data that the transaction needs to modify must be locked by X and S when the data is read, and the S lock will not be released immediately when the data is read. Instead, wait until the end of the transaction.
Serialization (Serializable)
Solved the problem of fantasy reading. Provide strict transaction isolation. It requires transaction serialization execution, and transactions can only be executed one after another, not concurrently. Transaction serialization cannot be achieved through row-level locks alone, and other mechanisms must be used to ensure that newly inserted data is not accessed by the transaction that has just performed the query operation.
Implementation of isolation level in MySQL
The above explains some concepts of database theory, but in databases such as MySQL and ORACLE, performance considerations are not entirely based on the theory described above.
MVCC
Multi-version concurrency control (Multi-Version Concurrency Control, MVCC) is a way to realize isolation level based on optimistic lock theory in MySQL, which is used to realize read committed and repeatable read isolation level.
Implementation (isolation level is repeatable)
Before talking about how to implement, introduce two concepts:
System version number: an incremental number that is automatically incremented each time a new transaction is started.
Transaction version number: the system version number at the beginning of the transaction.
In MySQL, two fields are added after each piece of data in the table:
Create version number: when creating a row of data, the current system version number is assigned as the creation version number
Delete version number: when deleting a row of data, the current system version number is assigned as the deletion version number
SELECT
The rule for reading data when select is: create the version number of the current transaction version number.
Creating the version number the current transaction version number ensures that the data is not deleted and should be checked out at least before the transaction is opened.
INSERT
Insert assigns the current system version number to the create version number field.
UPDATE
Insert a new record, save the current transaction version number to create the version number for the row, and save the current transaction version number to the original deleted row, which is actually updated through delete and insert.
DELETE
When deleting, the current system version number is assigned to the delete version number field, identifying the transaction in which the data will be deleted, even if it is not actually deleted when the commit is in place. The data opened according to the rules of select will not be queried.
Did MVCC really solve the illusion?
From the beginning of our test example and the above theoretical support, it seems that the problem of phantom reading has been solved through MVCC in MySQL, so since such serial reading seems to be meaningless, continue the test with doubt.
Pre-test data:
Based on the above results, we expect that the result is as follows:
Idname1 Finance Department 2 Research and Development Department
But in fact, what we have gone through is:
What we had hoped for was that the department of the first piece of data was changed to finance, but it turned out that both pieces of data had been modified.
This result tells us that in fact, the problem of phantom reading is not completely solved in the MySQL repeatable isolation level, but in the case of reading data. However, for the modified operation, there is still a problem of phantom reading, that is to say, MVCC is not thorough in solving the problem of phantom reading.
Snapshot read and current read
With the above situation, we need to know why this happened. After consulting some data, it is found that at the RR level, although the data becomes readable repeatedly through the MVCC mechanism, the data we read may be historical data, not the latest data in the database.
This way of reading historical data is called snapshot read, while the way of reading the latest version of the database is called current read.
Select snapshot read
When performing a select operation, innodb will perform a snapshot read by default, and record the result of this select. After that, the select will return the data of this snapshot, even if other transactions have committed data that will not affect the current select. This enables repeatable reading.
Snapshot generation when select is executed for the first time, that is, suppose that when An opens the transaction and then does nothing, B insert a piece of data and then commit, and An executes select, then the returned data will have the data added by B.
It doesn't matter if there is any other transaction commit after that, because the snapshot has been generated, and the subsequent select is based on the snapshot.
Current read
The current read mode is used for operations that will modify the data (update, insert, delete). When performing these operations, the latest records are read, and even the data committed by other transactions can be queried.
Suppose you want to update a record, but the data has been delete and commit in another transaction, if update will cause conflicts, so you need to know the latest data when update. It is precisely because of this that leads to the kind of situation we tested above.
The current read of select requires manual locking:
Select * from table where? Lock in share mode; select * from table where? For update
There's a problem to explain.
At the beginning of the test, I thought that using the begin statement was to start a transaction, so in the second test above, because transaction 1 started first, the data added to transaction 2 was found in transaction 1. At that time, I thought that this was inconsistent with the rule of select in the previous MVCC, so I did the following test:
SELECT * FROM information_schema.INNODB_TRX / / is used to query currently executing transactions
You can see that if you just execute the begin statement, you don't actually open a transaction.
Here is to add a select statement after begin:
Therefore, it is necessary to understand that in fact, it is only after adding, deleting, changing and checking the data that a transaction is opened.
How to solve Phantom Reading
It is clear that the repeatable isolation level does not completely solve the problem of phantom reading, and there are two ways to solve the problem of phantom reading if we need to solve it in our project:
Isolation level using serialized reads
MVCC+next-key locks:next-key locks consists of record locks (index lock) and gap locks (gap lock, which locks not only the data that needs to be used, but also the data near it).
In fact, many projects will not use the above two methods, the performance of serial reading is too poor, and in fact, most of the time, phantom reading is completely acceptable to us.
This is the end of the content of "can the repeatable level of MySQL solve the problem of phantom reading". Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.