In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article introduces what is transaction and lock in MySQL. The content is very detailed. Interested friends can use it for reference. I hope it will be helpful to you.
Implementation of transaction in MySQL
In relational databases, the importance of transactions is self-evident. As long as people who have a little knowledge of the database know that transactions have four basic attributes of ACID, what we do not know may be how the database implements these four attributes. In this article, we will analyze the implementation of transactions and try to understand how the database implements transactions. Of course, we will also give a brief introduction to the implementation of ACID in MySQL.
Transaction is actually the basic unit of concurrency control; I believe we all know that a transaction is a sequential operation in which either all or none of the operations are performed, and it is an indivisible unit of work. The four ACID features of database transactions are the basis of transactions. After understanding how ACID is implemented, we will clear the implementation of transactions. Next, we will introduce how the database implements these four features in turn.
Atomicity
When learning a transaction, someone will often tell you that a transaction is a series of operations, either all or not, which is actually a portrayal of the atomicity of the transaction; although the transaction is atomic, but atomicity is not only related to the transaction, it will appear in many places.
Because the operation is not atomic and can be subdivided into multiple operations, when these operations have errors or throw exceptions, the whole operation may not continue, and the side effects caused by the operations that have already been carried out may result in the loss or error of data updates.
In fact, a transaction is not much different from an operation. It is a collection of a series of database operations (which can be understood as SQL). If the transaction is not atomic, then there is no way to guarantee that all operations in the same transaction are executed or not executed, and the whole database system is neither available nor trusted.
Roll back the log
To ensure the atomicity of a transaction, you need to roll back the operations that have been performed when an exception occurs, while in MySQL, the recovery mechanism is implemented through a rollback log (undo log), in which all changes made by transactions are first recorded in the rollback log, and then the corresponding rows in the database are written.
This process is actually very understandable, in order to undo all previous actions when an error occurs, it must be necessary to record all previous actions so that they can be rolled back when an error occurs.
The rollback log can not only provide rollback related information when an error occurs or when the user executes ROLLBACK, but also after the whole system crashes and the database process is killed directly, when the user starts the database process again, it can immediately query the rollback log to roll back the previously uncompleted transactions, which requires that the rollback log must be persisted to disk before the data is persisted. It is the main reason why we need to write the log before writing the database.
The rollback log does not physically restore the database to the way it was before the statement or transaction was executed; it is a logical log, and when the rollback log is used, it will only logically undo the changes in the database according to the log. It can be understood that every INSERT we use in the transaction corresponds to a DELETE, and each UPDATE corresponds to an opposite UPDATE statement.
Here, we will not introduce the format of the rollback log and how it is managed. This article focuses on what kind of thing it is and what problems have been solved and how. If you want to know the details of the implementation, I believe that there must be a lot of articles on the network about the rollback log.
The status of the transaction
Because a transaction is atomic, from a distance, a transaction is an inseparable whole, and there are only three states of a transaction: Active, Commited, and Failed. The transaction is either in execution or a state of success or failure:
But if we zoom in, we will find that the transaction is no longer atomic, including many intermediate states, such as partial commit, and the state diagram of the transaction becomes more and more complex.
The state diagram and description of the transaction are taken from Chapter 14 of the Database System Concepts book.
Active: the initial state of the transaction, indicating that the transaction is executing
Partially Commited: after the last statement is executed
Failed: after discovering that the transaction cannot be executed properly
Aborted: after the transaction is rolled back and the database is restored to the state before the transaction
Commited: execute the entire transaction successfully
Although the state of the entire database can be restored when an error occurs, however, if we perform operations such as printing logs to standard output, sending emails to the outside world, not modifying the contents of the disk through the database, or even transferring remittances during the execution of the transaction, then there is no way to roll back these operations as visible external output These problems are solved and responsible by application developers, and in most cases, we need to trigger similar operations that cannot be rolled back after the entire transaction is committed.
Take booking as an example, even if we make a request to a third party only after the end of the whole transaction, because requesting from the third party and obtaining the result is an operation that requires a long event, if the database or server crashes when the transaction is just committed, then we are very likely to lose the process of initiating the request, which creates a very serious problem. This is not guaranteed by the database, and developers need to check at the appropriate time whether the request was initiated and whether the result was successful or failed.
Atomicity of parallel transactions
So far, all transactions are executed serially, and the problem of parallel execution has not been considered; however, in practice, transactions executed in parallel are the norm, but in parallel tasks, very complex problems may arise:
When Transaction1 reads and writes to the user with id = 1 during execution, but does not commit or roll back the modified content, Transaction2 reads the same data and commits the transaction That is to say, Transaction2 depends on Transaction1. When Transaction1 needs to be rolled back due to some errors, it is necessary to roll back Transaction2 because of the atomicity of the transaction, but because we have committed Transaction2, we have no way to roll back. Under this problem, we have a problem. Database System Concepts calls this phenomenon an unrecoverable arrangement (Nonrecoverable Schedule). Under what circumstances can it be recovered?
A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
To put it simply, if the Transaction2 depends on the transaction Transaction1, then the transaction Transaction1 must complete the commit before the Transaction2 commit:
However, this is not over, as the number of transactions increases, the whole recovery process becomes more and more complex, and it is not so easy if we want to recover from the errors that occurred in the transaction.
In an event shown in the figure above, Transaction2 depends on Transaction1, while Transaction3 depends on Transaction1. When Transaction1 rolls back due to execution problems, it rolls back all the work in Transaction2 and Transaction3 in order to ensure the atomicity of the transaction. This situation is also called cascading rollback (Cascading Rollback). The occurrence of cascading rollback will cause a lot of work to be withdrawn, which is difficult for us to accept, but if we want to achieve absolute atomicity. This matter has to be dealt with again, and we will explain in detail how to deal with the atomicity of parallel transactions later in the article.
Persistence
Since it is a database, then there must be a very strong demand for persistent storage of data, if the data is written to the database, then the data must be safely stored on disk; and the persistence of the transaction is reflected in, once the transaction is committed, then the data must be written to the database and persisted.
After the transaction has been committed, it cannot be rolled back again, and the only way to recall the committed transaction is to create an opposite transaction to "compensate" the original operation, which is also one of the manifestations of transaction persistence.
Redo log
Like atomicity, transaction persistence is achieved through logs. MySQL uses redo logs (redo log) to achieve transaction persistence. Redo logs are composed of two parts, one is the redo log buffer in memory, because the redo log buffer is in memory, it is volatile, and the other is the redo log file on disk, which is persistent.
When we try to modify the data in a transaction, it will first read the data from disk into memory and update the data cached in memory, then generate a redo log and write to the redo log cache. When the transaction is actually committed, MySQL will flush the contents of the redo log cache to the redo log file, and then update the data in memory to disk. Steps 4 and 5 in the figure are performed when the transaction is committed.
In InnoDB, redo logs are stored in 512-byte blocks, and because the block size is the same as the disk sector size, redo log writes can ensure atomicity and will not cause redo logs to write only half and leave dirty data due to machine power outage.
In addition to all changes to the database will result in redo logs, because rollback logs also need to be persisted, they will also create corresponding redo logs, after an error occurs, when the database is restarted, it will reexecute the logs from the redo log that have not been updated to the database disk to meet the persistence of the transaction.
Roll back logs and redo logs
Up to now, we have learned about two kinds of logs in MySQL, rollback log (undo log) and redo log (redo log). In database systems, the atomicity and persistence of transactions are guaranteed by transaction logs (transaction log). In implementation, the former is used to undo the impact of transactions, while the latter redoes committed transactions during error handling, which ensures two things:
Transactions that have an error or need to be rolled back can be rolled back successfully (atomicity)
After the transaction is committed, when the disk goes down before the data can be written, the data can be successfully recovered after the next reboot (persistence)
In the database, these two kinds of logs often work together, and we can think of them as a transaction log as a whole, which contains the ID of the transaction, the modified row elements, and the values before and after modification.
A transaction log contains both pre-and post-modified values, so it is very easy to roll back and redo. We will not introduce redo and rollback log deployment here. The use of two kinds of logs may be mentioned in a later article when talking about the recovery mechanism of the database system.
Isolation
In fact, the author in the previous article "simple and simple" MySQL and InnoDB have introduced the isolation of database transactions, but in order to ensure the independence and integrity of the article, we will also introduce the isolation of transactions, the content may be slightly different.
The isolation of transactions is one of the bases for the database to process data. if there is no isolation between transactions without the database, it will occur in the cascading rollback mentioned in the atomicity of parallel transactions, resulting in a huge loss of performance. If the execution order of all transactions is linear, then it is much easier to manage transactions, but allowing parallel execution of transactions can improve throughput and resource utilization, and reduce the waiting time of each transaction.
When multiple transactions are executed concurrently, the isolation of transactions may be violated. Although there may be no errors in the execution of a single transaction, it will cause problems with the consistency of the database as a whole. Although serial allows developers to ignore the impact of parallelism and maintain database consistency well, it will affect the performance of transaction execution.
Isolation level of the transaction
Therefore, the isolation and consistency of the database is actually a problem that needs to be weighed by developers. What kind of isolation level is provided for the database also determines the performance of the database and what kind of consistency can be achieved. In the SQL standard, the isolation level of transactions in four databases: READ UNCOMMITED, READ COMMITED, REPEATABLE READ and SERIALIZABLE;, the isolation level of each transaction actually solves one more problem than the previous level:
RAED UNCOMMITED: use query statements without locking and may read uncommitted rows (Dirty Read)
READ COMMITED: only record locks are added to records, not gap locks between records, so new records are allowed to be inserted near locked records, so when you use query statements multiple times, you may get different results (Non-Repeatable Read)
REPEATABLE READ: reading the same range of data multiple times will return a snapshot of the first query. No different data rows will be returned, but Phantom Read may occur.
SERIALIZABLE:InnoDB implicitly adds a shared lock to all the query statements, which solves the problem of phantom reading.
All of the above transaction isolation levels do not allow dirty writing (Dirty Write), that is, the current transaction updates data that another transaction has updated but has not yet committed. Most databases use READ COMMITED as the default transaction isolation level, but MySQL uses REPEATABLE READ as the default configuration. From RAED UNCOMMITED to SERIALIZABLE, as the transaction isolation level becomes more and more stringent, the performance of the database for concurrent transaction execution is gradually degraded.
For database users, in theory, we do not need to know how the isolation level of the transaction is implemented, we just need to know what problems this isolation level solves, but the implementation details of different databases for different isolation levels often make us encounter unexpected pitfalls.
If the reader doesn't know what dirty reading, unrepeatable reading, and illusory reading are, you can read the previous articles MySQL and InnoDB, and here we just put a picture to show how each isolation level solves these problems.
Implementation of isolation level
The implementation of the isolation level in the database is to use the concurrency control mechanism to control the transactions executed at the same time, to restrict the access and update of different transactions to the same resource, and the most important and common concurrency control mechanism. Here we will briefly introduce the working principles of the three most important concurrency controller mechanisms.
Lock
Lock is one of the most common concurrency control mechanisms. In a transaction, we will not lock the entire database, but only lock those data items that need to be accessed. MySQL and common database locks are divided into two types, shared lock (Shared) and mutex lock (Exclusive), the former is also called read lock, the latter is called write lock.
The read lock ensures that the read operations can be performed concurrently and will not affect each other, while the write lock ensures that there are no unpredictable problems caused by other transactions accessing or changing the same record when updating database data.
Time stamp
In addition to locks, another way to achieve transaction isolation is through timestamps, a database that implements transactions in this way, for example, PostgreSQL reserves two fields for each record; the read timestamp misstates the maximum timestamp of all transactions accessing the record, and the write timestamp of the record row stores the timestamp of the transaction that changed the record to the current value.
When using timestamps to achieve transaction isolation, optimistic locks are often used to modify the data first, and then judge the current value when writing back, that is, whether the timestamp has changed, and if it has not changed, write it, otherwise, generate a new timestamp and update the data again. Optimistic locking is not a real locking mechanism, it is just an idea, and it will not be introduced here.
Multi-version and snapshot isolation
By maintaining multiple versions of data, databases can allow transactions to read older versions of data when it is updated by other transactions, and many databases implement this mechanism. Because all read operations no longer need to wait for the release of the write lock, it can significantly improve the read performance. Both MySQL and PostgreSQL implement this mechanism, that is, MVCC, although their own implementation methods are different. MySQL implements MVCC through the rollback log mentioned in the article, ensuring that transactions can obtain data directly without waiting for the release of mutexes when executing in parallel.
Isolation and atomicity
Here, we need to briefly mention the cascading rollback encountered in the atomicity section. If a transaction writes to the data, it will acquire a mutex lock. Other transactions will have to wait for the release of the write lock if they want to obtain the read lock of the changed data. Naturally, cascading rollback will not occur.
However, features such as MVCC are used in most databases, such as MySQL, that is, normal reading methods do not need to acquire locks, and you need to use SELECT when you want to update the read data. FOR UPDATE attempts to acquire mutexes for the corresponding rows to ensure that different transactions work properly.
Consistency
The author believes that database consistency is a very confusing concept, because the database domain actually contains two consistency, one is consistency in ACID, the other is consistency in CAP definition.
The consistency of the two databases is not the same thing at all, and many people have a very deep misunderstanding of the concepts of the two. When we discuss the consistency of the database, we must know what the semantics of the context is, and try to be clear about whether we are talking about consistency in ACID or consistency in CAP.
ACID
The database defines consistency in ACID as follows: if a transaction runs atomically in a consistent database, then the state of the database must be consistent after it is executed. For this concept, its first meaning is that the constraints on data integrity, including primary key constraints, reference constraints and some constraint checks, will not violate the constraints on data integrity before and after the execution of the transaction and during the process. all operations written to the database should be legal and cannot produce an illegal data state.
A transaction must preserve database consistency-if a transaction is run atomically in isolation starting from a consistent database, the database must again be consistent at the end of the transaction.
We can think of a transaction as a function that accepts an external SQL input and a consistent database, which must return a consistent database.
The second layer actually refers to the logical requirements for developers. We need to write the correct transaction logic in the code, such as bank transfer. The logic in the transaction cannot only deduct money or only add money. This is the requirement for database consistency at the application level.
Ensuring consistency for an individual transaction is the responsibility of the application programmer who codes the transaction. -Database System Concepts
The requirement of consistency in database ACID for transactions includes not only the check of data integrity and legitimacy, but also the correctness of application-level logic.
The data consistency in CAP theorem actually means that each node in the distributed system has the same value for the copy of the same data; while the consistency in ACID refers to the rules of the database. If it is stipulated in schema that a value must be unique, then the consistent system must ensure that the value is unique in all operations, so there is a fundamental difference in the definition of consistency between CAP and ACID.
About what transactions and locks in MySQL is shared here, I hope the above content can be of some help to you, can learn more knowledge. If you think the article is good, you can share it for more people to see.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.