Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the principle of implementing the ACID feature of MySQL transactions?

2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

Editor to share with you what is the principle of the implementation of the ACID feature of MySQL transactions. I hope you will gain a lot after reading this article. Let's discuss it together.

Transaction is an important aspect that distinguishes relational database such as MySQL from NoSQL, and it is an important means to ensure data consistency. This article will first introduce the basic concepts related to MySQL transactions, then introduce the ACID features of transactions, and analyze their implementation principles.

MySQL is broad and profound, and the omissions in the article are inevitable. Criticism and correction are welcome.

I. basic concepts

A Transaction is a program execution unit that accesses and updates a database; a transaction may contain one or more sql statements, all or none of which are executed. As a relational database, MySQL supports transactions, which is based on MySQL5.6.

Let's first review the basics of MySQL transactions. (recommended course: MySQL video tutorial)

1. Logical architecture and storage engine

As shown in the figure above, the logical architecture of the MySQL server can be divided into three layers from top to bottom:

(1) the first layer: deal with client connection, authorization authentication and so on.

(2) the second layer: server layer, which is responsible for the parsing, optimization and caching of query statements, as well as the implementation of built-in functions, stored procedures and so on.

(3) the third layer: storage engine, which is responsible for the storage and extraction of data in MySQL. In MySQL, the server layer does not manage transactions, which are implemented by the storage engine. MySQL storage engines that support transactions include InnoDB, NDB Cluster, etc., among which InnoDB is the most widely used; other storage engines do not support transactions, such as MyIsam, Memory and so on.

Unless otherwise noted, the content described later is based on InnoDB.

two。 Commit and rollback

A typical MySQL transaction operates as follows:

Start transaction;... # one or more sql statements commit

Where start transaction identifies the start of the transaction, and commit commits the transaction and writes the execution result to the database. If there is a problem with the execution of the sql statement, rollback is called and all sql statements that have been successfully executed are rolled back. Of course, you can also use the rollback statement directly in the transaction for rollback.

Automatic submission

Autocommit (autocommit) mode is used by default in MySQL, as shown below:

In autocommit mode, if there is no start transaction to explicitly start a transaction, each sql statement is executed as a transaction.

You can turn off autocommit; in the following way. Note that the autocommit parameter is specific to the connection, and if you modify the parameter in one connection, it will not affect other connections.

If autocommit is turned off, all sql statements are in one transaction until commit or rollback is executed, and the transaction ends and another transaction starts.

Special operation

In MySQL, there are special commands that, if executed in a transaction, force the commit commit transaction to be executed immediately, such as DDL statements (create table/drop table/alter/table), lock tables statements, and so on.

However, the commonly used select, insert, update, and delete commands do not force the transaction to be committed.

3. ACID characteristics

ACID is the four characteristics that measure transactions:

Atomicity (Atomicity, or indivisibility)

Consistency (Consistency)

Isolation (Isolation)

Persistence (Durability)

According to strict standards, it is a transaction only if it satisfies the ACID feature at the same time; but in the implementation of the major database vendors, there are very few transactions that really meet the ACID. For example, NDB Cluster transactions of MySQL do not satisfy persistence and isolation; InnoDB default transaction isolation level is repeatable and does not meet isolation; Oracle default transaction isolation level is READ COMMITTED, which does not meet isolation. Therefore, ACID is not so much a condition that a transaction must meet, but rather four dimensions that measure a transaction.

The ACID features and how they are implemented are described in detail below; for ease of understanding, the order of introduction is not strictly A-C-I-D.

Second, atomicity 1. Define

Atomicity means that a transaction is an indivisible unit of work, in which the operations are either done or not done; if a sql statement in the transaction fails, the executed statement must also be rolled back and the database returns to the pre-transaction state.

two。 Implementation principle: undo log

Before explaining the principle of atomicity, let's first introduce MySQL's transaction log. There are many kinds of MySQL logs, such as binary log, error log, query log, slow query log and so on. In addition, InnoDB storage engine also provides two kinds of transaction log: redo log (redo log) and undo log (rollback log). Redo log is used to ensure transaction persistence; undo log is the basis for the implementation of transaction atomicity and isolation.

Let's go back to undo log. The key to atomicity is the ability to undo all sql statements that have been successfully executed when the transaction is rolled back. InnoDB rollback is achieved by undo log: when a transaction modifies the database, InnoDB generates the corresponding undo log;. If the transaction fails or calls rollback, causing the transaction to need to be rolled back, you can use the information in undo log to roll back the data to the way it was before the modification.

Undo log is a logical log, which records information related to sql execution. When a rollback occurs, InnoDB will do the opposite work according to the contents of undo log: for each insert, delete; will be executed on rollback for each delete, insert; will be executed on rollback for each update, and an opposite update will be performed on rollback to change the data back.

Take the update operation as an example: when a transaction executes update, the generated undo log contains information such as the primary key of the modified row (in order to know which rows have been modified), which columns have been modified, and the value of these columns before and after modification. When rollback, you can use this information to restore the data to the state before update.

Third, persistence 1. Define

Persistence means that once a transaction is committed, its changes to the database should be permanent. Other operations or failures that follow should not affect it in any way.

two。 Implementation principle: redo log

Both redo log and undo log belong to the transaction log of InnoDB. Let's talk about the background of the existence of redo log.

As the storage engine of MySQL, InnoDB stores data on disk, but if disk IO is needed for every read and write of data, the efficiency will be very low. To this end, InnoDB provides Buffer Pool, and Buffer Pool contains the mapping of some data pages on disk as a buffer to access the database: when reading data from the database, it will first read from Buffer Pool, and if not in Buffer Pool, it will be put into Buffer Pool; after reading from disk. When writing data to the database, the modified data in Buffer Pool,Buffer Pool will be written first and flushed to disk periodically (this process is called brushing).

The use of Buffer Pool greatly improves the efficiency of reading and writing data, but it also brings new problems: if MySQL is down, and the modified data in Buffer Pool has not been refreshed to disk, it will lead to the loss of data, and the durability of the transaction can not be guaranteed.

Therefore, redo log is introduced to solve this problem: when the data is modified, in addition to modifying the data in Buffer Pool, the operation is recorded in redo log; when the transaction is committed, the fsync API is called to flush the redo log. If the MySQL is down, you can read the data in the redo log and restore the database when you restart. Redo log uses WAL (Write-ahead logging, pre-written log). All modifications are first written to the log, and then updated to Buffer Pool, ensuring that data will not be lost due to MySQL downtime, thus meeting the persistence requirements.

Since redo log also needs to write logs to disk when a transaction commits, why is it faster than writing data modified in Buffer Pool directly to disk (that is, brushing)? There are two main reasons:

(1) brushing dirty is a random IO, because the position of each modified data is random, but writing redo log is an additional operation and belongs to sequential IO.

(2) brushing dirty is in terms of data pages (Page). The default page size of MySQL is 16KB, and a small modification on a Page is written to the whole page; while redo log contains only the parts that really need to be written, and the invalid IO is greatly reduced.

3. Redo log and binlog

We know that there is also a binlog (binary log) in MySQL that can also record writes and be used for data recovery, but the two are fundamentally different:

(1) different functions: redo log is used for crash recovery to ensure that MySQL outage will not affect persistence; binlog is used for point-in-time recovery to ensure that the server can recover data based on point-in-time. In addition, binlog is also used for master-slave replication.

(2) different levels: redo log is implemented by InnoDB storage engine, while binlog is implemented by the server layer of MySQL (see the introduction to MySQL logic architecture above), and supports InnoDB and other storage engines at the same time.

(3) the content is different: redo log is a physical log, disk-based Page;binlog is a logical log, and the content is sql.

(4) the timing of writing is different: binlog writes when the transaction is committed, while the timing of redo log is relatively diverse:

As mentioned earlier: when a transaction commits, fsync is called to flush the redo log; this is the default policy, which can be changed by changing the innodb_flush_log_at_trx_commit parameter, but the persistence of the transaction will not be guaranteed.

In addition to transaction commit, there are other opportunities to refresh the disk: for example, master thread flushes the disk once per second redo log and so on. The advantage is that you don't have to wait for the commit to flush the disk, and the speed of commit is greatly accelerated.

Fourth, isolation 1. Define

Unlike atomicity and persistence, which focus on the transaction itself, isolation studies the interaction between different transactions. Isolation means that the operations within the transaction are isolated from other transactions, and the transactions executed concurrently cannot interfere with each other. Strict isolation corresponds to Serializable (serializability) in the transaction isolation level, but serializability is rarely used in practical applications for performance reasons.

Isolation pursues non-interference between transactions in the case of concurrency. For simplicity, we only consider the simplest read and write operations (special operations such as locked reads are not considered for the time being), then the discussion of isolation can be divided into two aspects:

The effect of a write operation on a write operation of (another transaction): the locking mechanism ensures isolation

Effect of write operations on read operations of (one transaction): MVCC guarantees isolation

two。 Locking mechanism

First, let's look at the interaction between the write operations of the two transactions. Isolation requires that only one transaction can write to the data at a time, and InnoDB ensures this through the locking mechanism.

The basic principle of the locking mechanism can be summarized as follows: before the transaction modifies the data, the transaction needs to obtain the corresponding lock; after obtaining the lock, the transaction can modify the data; during the transaction operation, this part of the data is locked. If other transactions need to modify the data, they need to wait for the current transaction to commit or roll back the lock.

Row lock and table lock

According to granularity, locks can be divided into table locks, row locks, and other locks that lie between the two. The table lock locks the whole table when operating the data, and the performance is poor, while the row lock only locks the data that needs to be operated, and the concurrent performance is good. However, because locking itself consumes resources (obtaining locks, checking locks, releasing locks, etc.), using table locks in the case of more locking data can save a lot of resources. Different storage engines in MySQL support different locks. For example, MyIsam only supports table locks, while InnoDB supports both table locks and row locks, and row locks are used in most cases for performance reasons.

How to view lock information

There are several ways to view locks in InnoDB, such as:

Overview of select * from information_schema.innodb_locks; # locks show engine innodb status; # InnoDB overall status, including lock status

Let's look at an example:

# execute in transaction A: start transaction;update account SET balance = 1000 where id = 1 * * # execute in transaction B: start transaction;update account SET balance = 2000 where id = 1

Check the lock at this point:

Show engine innodb status looks at the lock-related parts:

You can see how transactions 24052 and 24053 occupy locks through the above command, where lock_type is RECORD, representing a row lock (record lock), and lock_mode is X, representing an exclusive lock (write lock).

In addition to exclusive locks (write locks), there is a concept of shared locks (read locks) in MySQL. Since this article focuses on the implementation principle of MySQL transactions, the introduction of locks ends here. Later, we will write a special article to analyze the differences and usage scenarios of different locks in MySQL.

After introducing the interaction between write operations, let's discuss the impact of write operations on read operations.

3. Dirty reading, unrepeatable reading and phantom reading

First, let's look at three types of problems that may exist in read operations in the case of concurrency:

(1) dirty reading: the uncommitted data (dirty data) of other transactions (B) can be read in the current transaction (A). This phenomenon is dirty reading. Examples are as follows (take the account balance statement as an example):

(2) unrepeatable reading: the same data is read twice in transaction A, and the result of the two reads is different. This phenomenon is called non-repeatable reading. The difference between dirty reading and unrepeatable reading is that the former reads data that is not committed by other transactions, while the latter reads data that has been committed by other transactions. Examples are as follows:

(3) Phantom reading: in transaction A, the database is queried twice according to a certain condition, and the number of query results is different. This phenomenon is called phantom reading. The difference between unrepeatable reading and phantom reading can be understood as: the former is that the data has changed, while the latter is that the number of rows of the data has changed. Examples are as follows:

4. Transaction isolation level

The SQL standard defines four isolation levels and specifies whether the above problems exist under each isolation level. In general, the lower the isolation level, the lower the system overhead, the higher the concurrency that can be supported, but the worse the isolation. The relationship between isolation levels and reading problems is as follows:

In practical applications, read uncommitted will cause a lot of problems in concurrency, but the performance improvement compared with other isolation levels is very limited, so it is used less. Serializability forces transactions to be serial, and the concurrency efficiency is very low. It is only used when data consistency is extremely high and no concurrency is acceptable, so it is less used. Therefore, in most database systems, the default isolation level is read committed (such as Oracle) or repeatable read (hereinafter referred to as RR).

You can view the global isolation level and the isolation level for this session with the following two commands:

The default isolation level for InnoDB is RR, which will focus on RR later. It should be noted that in the SQL standard, RR cannot avoid phantom reading, but RR implemented by InnoDB avoids phantom reading.

5. MVCC

RR solves the problems of dirty reading, unrepeatable reading, phantom reading and so on, using the full name of MVCC:MVCC Multi-Version Concurrency Control, that is, multi-version concurrency control protocol. The following example shows the characteristics of MVCC: different transactions may read different data at the same time (that is, multiple versions)-transaction An and transaction C can read different versions of the data at T5.

The biggest advantage of MVCC is that there is no lock on read, so there is no conflict between read and write, and the performance of concurrency is good. InnoDB implements MVCC, and multiple versions of data can coexist, mainly depending on the hidden columns of the data (also known as tag bits) and undo log. The hidden column of the data includes the version number of the data, the deletion time, the pointer to undo log, and so on; when reading the data, MySQL can determine whether it needs to be rolled back and find the undo log needed by the rollback through the hidden column, so that the detailed format of the MVCC; hidden column is no longer expanded.

The following is explained respectively in combination with the several problems mentioned above.

(1) dirty reading

When transaction A reads the balance of zhangsan at the T3 time node, it will find that the data has been modified by other transactions and the status is uncommitted. At this time, transaction A reads the latest data and performs a rollback operation according to the undo log of the data to get the data before transaction B is modified, thus avoiding dirty reading.

(2) non-repeatable

When transaction A reads the data for the first time on the T2 node, it records the version number of the data (the version number of the data is recorded in row), assuming that the version number is 1; when transaction B commits, the version number of the row record increases, assuming that the version number is 2 When transaction A reads the data again at T5, it finds that the version number (2) of the data is greater than the version number (1) recorded when it was first read, so it will perform a rollback operation according to undo log to get the data with version number 1, thus achieving repeatable readability.

(3) Phantom reading

RR implemented by InnoDB avoids phantom reading through the next-key lock mechanism.

Next-key lock is a kind of row lock that implements the equivalent of record lock (record lock) + gap lock (gap lock); it is characterized by locking not only the record itself (the function of record lock), but also a range (the function of gap lock). Of course, what we are talking about here is unlocked reading: at this time, next-key lock is not really locked, but just adds a tag to the read data (including the data version number, etc.); let's call it a next-key lock-like mechanism for the sake of accuracy. Let's use the previous example to illustrate:

When transaction A reads 0 for the first time on T2 node

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report