

Under what circumstances will MySQL's auto-increment primary key have holes?


This article explains under what circumstances MySQL's auto-increment primary key will have holes. The method introduced here is simple, fast, and practical, so let's learn it together.

For illustration purposes, we create a table t, where id is the auto-increment primary key field and c carries a unique index.

CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c` (`c`)
) ENGINE=InnoDB;

Where is the auto-increment value stored?

In the empty table t, execute insert into t values (null, 1, 1); to insert one row, then execute the show create table command. You can see the result shown in the figure below:

Figure 1: the automatically generated AUTO_INCREMENT value

As you can see, AUTO_INCREMENT=2 appears in the table definition, indicating that the next time a row is inserted and an auto-increment id needs to be generated, id=2 will be used.
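For instance, the behaviour above can be reproduced directly in a client session; the output below is abridged, and the exact column attributes may differ by version:

insert into t values (null, 1, 1);
show create table t;
-- ...
--   `id` int(11) NOT NULL AUTO_INCREMENT,
-- ...
-- ) ENGINE=InnoDB AUTO_INCREMENT=2 ...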

This output can easily lead to the misunderstanding that the auto-increment value is stored in the table structure definition. In fact, the table's structure definition is stored in a file with the .frm suffix, but the auto-increment value is not saved there.

Different engines have different strategies for saving the auto-increment value.

The MyISAM engine saves the auto-increment value in its data file.

The InnoDB engine stores the auto-increment value in memory, and it was not until MySQL 8.0 that it gained "auto-increment persistence", meaning that if a restart occurs, the table's auto-increment value is restored to what it was before the restart. The details are as follows:

In MySQL 5.7 and earlier, the auto-increment value is kept only in memory and is not persisted. After each restart, the first time the table is opened, InnoDB finds the maximum id with max(id) and takes max(id) + 1 as the table's current auto-increment value.

For example, suppose the largest id currently in a table is 10. We delete the row with id=10; AUTO_INCREMENT is still 11. But if the instance is restarted right away, the table's AUTO_INCREMENT becomes 10 after the restart.
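As a minimal sketch of that pre-8.0 behaviour (the ids are illustrative, and the restart step happens outside SQL):

delete from t where id = 10;          -- max(id) is now 9, AUTO_INCREMENT stays 11
-- ... restart the MySQL instance ...
insert into t values (null, 99, 99);  -- on 5.7 and earlier this row gets id = 10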

That is, a MySQL restart may change the value of a table's AUTO_INCREMENT.

In MySQL 8.0, changes to the auto-increment value are recorded in the redo log, and on restart the redo log is used to restore the pre-restart value.

Now that we understand how MySQL saves the auto-increment value, let's look at the mechanism for modifying it.

The auto-increment modification mechanism

In MySQL, if the field id is defined as AUTO_INCREMENT, the behavior when inserting a row is as follows:

If the id field is specified as 0, null, or left unspecified when inserting, the table's current AUTO_INCREMENT value is filled into the auto-increment field.

If the id field is given a specific value when inserting, the value specified in the statement is used directly.

How the auto-increment value changes depends on the value to be inserted and the current auto-increment value. Suppose the value to be inserted is X and the current auto-increment value is Y.

If X ≥ Y, the new auto-increment value is found by stepping from auto_increment_offset in increments of auto_increment_increment until the first value greater than X; with both parameters at their default of 1, this is simply X + 1.

If X < Y, the auto-increment value stays unchanged.
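As a small sketch of these rules, assuming the table's current auto-increment value Y is 3, ids 2 and 5 are free, and both parameters are at their default of 1 (the c values are chosen only to avoid unique key conflicts):

insert into t values (5, 30, 30);     -- X=5 >= Y=3: AUTO_INCREMENT becomes 6
insert into t values (2, 31, 31);     -- X=2 <  Y=6: AUTO_INCREMENT stays 6
insert into t values (null, 32, 32);  -- auto-generated: this row gets id = 6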

This brings us back to the question at the beginning of the article: even with both parameters set to 1, the auto-increment primary key id is not guaranteed to be continuous. Why is that?

Timing of auto-increment modification

To answer this question, we need to look at when the auto-increment value is modified.

Suppose table t already contains one record, and we then execute another insert command:

insert into t values (null, 1, 1);

The execution process of this statement is:

The executor calls the InnoDB engine interface to write a row, passing in the row (0, 1, 1).

InnoDB finds that the user did not specify a value for the auto-increment id, and fetches table t's current auto-increment value, 2.

The value of the passed-in row is changed to (2, 1, 1).

The table's auto-increment value is changed to 3.

The insert continues, but since a record with c=1 already exists, a Duplicate key error is reported and the statement returns.

The corresponding execution flow chart is as follows:

Figure 3: steps that reproduce a discontinuous auto-increment primary key id

As you can see, this sequence of operations reproduces a scenario where the auto-increment primary key id is discontinuous (there is no row with id=2). Unique key conflicts are thus the first cause of discontinuous auto-increment primary key ids.
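A minimal sketch reproducing this first cause, assuming table t already contains the row (1, 1, 1):

insert into t values (null, 1, 1);  -- fails: Duplicate entry '1' for key 'c',
                                    -- but the auto-increment value has moved to 3
insert into t values (null, 2, 2);  -- succeeds with id = 3; id = 2 is a hole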

Similarly, a transaction rollback produces the same phenomenon; this is the second cause.

The following statement sequence constructs discontinuous auto-increment ids, which you can verify for yourself.

insert into t values (null, 1, 1);
begin;
insert into t values (null, 2, 2);
rollback;
insert into t values (null, 2, 2);
-- the row actually inserted is (3, 2, 2)

You might ask: when a unique key conflict or a rollback happens, why doesn't MySQL change table t's auto-increment value back? If it changed the current auto-increment value from 3 back to 2, wouldn't a later insert be able to generate a row with id=2?

In fact, MySQL is designed this way for performance. Let's analyze the design together to see why the auto-increment value cannot be rolled back.

Suppose two transactions execute in parallel. When they request auto-increment ids, a lock must be taken to prevent both transactions from receiving the same id, so the requests happen sequentially.

Assume transaction A gets id=2 and transaction B gets id=3; table t's auto-increment value is then 4, and execution continues.

Transaction B commits successfully, but transaction A hits a unique key conflict.

If transaction A were allowed to roll back the auto-increment id, that is, to change table t's current auto-increment value back to 2, then the table would already contain a row with id=3 while the current auto-increment value is 2.

Transactions that follow would then be allocated id=2 and, next, id=3. At that point, the insert for id=3 would report a "primary key conflict" error.

To resolve this primary key conflict, there are two ways:

Before allocating an id, check whether the id already exists in the table, and skip it if so. But this method is costly: allocating an id is normally a very fast operation, and this adds a lookup on the primary key index tree for every allocation.

Extend the scope of the auto-increment id lock so that it is released only when a transaction finishes and commits, and only then can the next transaction request an auto-increment id. The problem with this method is that the lock granularity becomes too large, and the system's concurrency drops sharply.

Both methods cause performance problems, and the culprit is the premise we assumed: "allowing the auto-increment id to be rolled back".

So InnoDB abandoned that design: when a statement fails, the auto-increment id is not rolled back. This is exactly why auto-increment ids are guaranteed only to be increasing, not continuous.

Optimization of the auto-increment lock

As you can see, the auto-increment id lock is not a transaction-scoped lock; it is released immediately after each allocation so that other transactions can request ids. However, this was not the case before MySQL 5.1.

Next, let me introduce the history of the auto-increment lock's design; it will help you analyze the problem that follows.

In MySQL 5.0, the auto-increment lock was statement-scoped: if a statement took a table's auto-increment lock, the lock was released only when the statement finished executing. Obviously, this design hurts concurrency.

MySQL 5.1.22 introduced a new strategy along with the parameter innodb_autoinc_lock_mode, whose default value is 1.

When the value of this parameter is set to 0, the strategy matches that of MySQL 5.0: the lock is released only when the statement finishes.

When the value of this parameter is set to 1:

For an ordinary insert statement, the lock is released immediately after the auto-increment id is allocated.

For bulk-insert statements such as insert ... select, the lock is released only when the statement finishes.

When the value of this parameter is set to 2, every allocation of an auto-increment id releases the lock right after the allocation.
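You can check which mode a server is running with; note that innodb_autoinc_lock_mode is read-only at runtime and can only be changed via the server startup configuration:

show variables like 'innodb_autoinc_lock_mode';
-- +--------------------------+-------+
-- | Variable_name            | Value |
-- +--------------------------+-------+
-- | innodb_autoinc_lock_mode | 1     |
-- +--------------------------+-------+
-- the value 1 shown here is the 5.x default; MySQL 8.0 changed the default to 2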

You probably have two questions: with the default setting, why must insert ... select use a statement-scoped lock? And why isn't the default value of this parameter 2?

The answer is that it is designed for the sake of data consistency.

Let's take a look at this scenario:

Figure 4: auto-increment locking during bulk inserts

In this example, I insert four rows into table t, then create a table t2 with the same structure, and then have two sessions insert into table t2 at the same time.

Imagine that if session B released the auto-increment lock as soon as each id was allocated, the following could happen:

Session B inserts two records first, (1, 1, 1) and (2, 2, 2).

Then session A requests an auto-increment id, gets id=3, and inserts (3, 5, 5).

After that, session B continues executing, inserting two more records: (4, 3, 3) and (5, 4, 4).
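Written out as a rough timeline in commented SQL (the session labels are annotations, not SQL syntax):

-- session B: insert into t2 (c, d) select c, d from t;
--            ... allocates id = 1, inserts (1, 1, 1)
--            ... allocates id = 2, inserts (2, 2, 2)
-- session A: insert into t2 values (null, 5, 5);
--            ... allocates id = 3, inserts (3, 5, 5)
-- session B: ... allocates id = 4, inserts (4, 3, 3)
--            ... allocates id = 5, inserts (5, 4, 4)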

You might say that this doesn't matter: after all, session B's semantics never required that table t2's rows match session A's data.

True, the data is logically correct. But if binlog_format=statement is in effect, think about how the binlog would record this.

Because the two sessions execute their insert commands concurrently, the binlog can record the updates to table t2 in only two orders: either session A's statement first, or session B's first.

Either way, when this binlog is replayed on a replica or used to restore a temporary instance, session B's statement runs as a single statement, and the ids it generates there are continuous. At that point, the replica or temporary instance has data inconsistent with the source.

Can you work out the cause of this problem?

The cause is that session B's insert statement on the source generated discontinuous ids, but a statement-format binlog is replayed serially, so replaying it cannot reproduce that discontinuity.

There are two ways to solve this problem:

One idea is to make bulk-insert statements on the source generate continuous id values. Holding the lock until the end of the statement achieves exactly this goal.

The other idea is to record the actually inserted rows in the binlog, so that replicas no longer rely on the auto-increment logic to regenerate ids. This corresponds to setting innodb_autoinc_lock_mode to 2 and binlog_format to row.

Therefore, in production, especially in scenarios with bulk inserts such as insert ... select, I suggest setting innodb_autoinc_lock_mode=2 and binlog_format=row for the sake of concurrent insert performance. This improves concurrency without creating data consistency problems.
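A sketch of that recommended setup: innodb_autoinc_lock_mode must go into the server configuration (for example, a my.cnf [mysqld] section) since it cannot be changed at runtime, while binlog_format can also be set dynamically:

-- in the server configuration:
--   [mysqld]
--   innodb_autoinc_lock_mode = 2
--   binlog_format            = ROW
set global binlog_format = 'ROW';               -- affects new sessions
show variables like 'innodb_autoinc_lock_mode'; -- verify after a restart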

Note that the bulk inserts I am referring to here are the statement types insert ... select, replace ... select, and load data.

However, for an ordinary insert statement containing multiple value lists, even with innodb_autoinc_lock_mode set to 1, the lock does not have to be held until the statement finishes. Such a statement can compute exactly how many ids it needs, request them all at once, and release the lock as soon as the allocation completes.

In other words, bulk-insert statements need the statement-scoped lock precisely because "the number of ids needed cannot be known in advance".

Since the number of auto-increment ids needed is unknown in advance, an obvious approach is to request one id each time one is needed. But if an insert ... select statement inserts 100,000 rows, it would make 100,000 requests under that logic. This strategy is not only slow, but also hurts the performance of concurrent inserts.

Therefore, for bulk-insert statements, MySQL adopts a strategy of allocating auto-increment ids in batches:

During statement execution, the first request allocates 1 auto-increment id.

After that one is used up, the statement's second request allocates 2 ids.

After those 2 are used up, the same statement's third request allocates 4 ids.

And so on: within the same statement, each request allocates twice as many auto-increment ids as the previous one.

For example, let's take a look at the following statement sequence:

insert into t values (null, 1, 1);
insert into t values (null, 2, 2);
insert into t values (null, 3, 3);
insert into t values (null, 4, 4);
create table t2 like t;
insert into t2 (c, d) select c, d from t;
insert into t2 values (null, 5, 5);

The insert ... select statement actually inserts four rows into table t2, but those four rows get their auto-increment ids over three requests: the first request returns id=1, the second id=2 and id=3, and the third id=4 through id=7.

Since the statement actually uses only four ids, id=5 through id=7 are wasted. The subsequent insert into t2 values (null, 5, 5) therefore inserts the row (8, 5, 5).

This batch allocation is the third cause of discontinuous auto-increment primary key ids.

Summary

Today we started from the question of why auto-increment primary keys produce discontinuous values, beginning with how the auto-increment value is stored.

In the MyISAM engine, the auto-increment value is saved in the data file. In InnoDB, it is kept in memory; MySQL did not add persistence for InnoDB's auto-increment value until version 8.0, which ensures that a table's auto-increment value stays unchanged across a restart.

Then I walked through when the auto-increment value is modified during statement execution, and analyzed why MySQL cannot reclaim auto-increment ids when a transaction rolls back.

The parameter innodb_autoinc_lock_mode, introduced in MySQL 5.1.22, controls the locking scope when auto-increment values are requested. From the point of view of concurrent performance, I recommend setting it to 2 together with binlog_format=row. I have mentioned several times before that binlog_format should be set to row; today's example gives one more reason for that conclusion.

Why are there so many locks in insert statements?

Earlier, we saw that MySQL tries to release the auto-increment lock as soon as the auto-increment id has been allocated.

So the insert statement is a very lightweight operation. However, this conclusion holds only for "normal insert statements": some insert statements are "special cases" that need to lock other resources during execution, or cannot release the auto-increment lock immediately after the id is allocated.

So, in today's article, let's talk about this topic together.

Insert ... select statements

Let's start with the earlier problem. The table structures and initialization statements for tables t and t2 are as follows; the examples here build on these two tables.

CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c` (`c`)
) ENGINE=InnoDB;

insert into t values (null, 1, 1);
insert into t values (null, 2, 2);
insert into t values (null, 3, 3);
insert into t values (null, 4, 4);

create table t2 like t;

Now let's look at why, under the repeatable read isolation level with binlog_format=statement, executing:

insert into t2 (c, d) select c, d from t;

needs to lock all the rows and gaps of table t.

What we need to consider is the consistency of the binlog and the data. Let's look at this execution sequence:

Figure 1: concurrent insert scenario

The actual effect is that if session B executes first, then because its statement adds the next-key lock (-∞, 1] on table t's primary key index, session A's insert statement (insert into t values (-1, -1, -1)) is blocked until session B's statement finishes.

However, if there were no lock, session B's insert statement might execute first but be written to the binlog afterwards. In that case, with binlog_format=statement, the binlog would record a statement sequence like this:

insert into t values (-1, -1, -1);
insert into t2 (c, d) select c, d from t;

When these statements are replayed on the replica, the row id=-1 is also written into table t2, causing an inconsistency between source and replica.

Insert circular write

Of course, when insert ... select executes, the table is not locked in its entirety; only the resources that need to be accessed are locked.

Suppose we need to insert a row into table t2 whose c value is the maximum c value in table t plus 1.

At this point, we can write this SQL statement as follows:

insert into t2 (c, d) (select c + 1, d from t force index (c) order by c desc limit 1);

The locking scope of this statement is the next-key lock on the highest record of table t's index c, plus the row id=4 on the primary key index.

Its execution is simple: scan table t in descending order of index c, take the first row, and write the result into table t2.

So the number of rows scanned for the entire statement is 1.

However, if we instead insert such a row into table t itself:

insert into t (c, d) (select c + 1, d from t force index (c) order by c desc limit 1);

the slow query log (slow log) of this statement is shown in the figure below:

Figure 3: slow query log for inserting data into table t

As you can see, the value of Rows_examined at this time is 5.

As I mentioned in the previous article, I hope you can learn to use explain results to "reconstruct" how a whole statement executes. Let's try it together.

The explain result of this statement is shown in figure 4.

From the explain result you can see that the statement uses a temporary table. You can verify this by watching how the status value Innodb_rows_read changes before and after the statement executes.

Figure 5: viewing the Innodb_rows_read changes

As you can see, Innodb_rows_read increased by 4 across the execution of this statement. Because the default temporary table uses the Memory engine, these four reads all came from table t, that is, table t underwent a full table scan.

In this way, we have sorted out the whole implementation process:

Create a temporary table with two fields c and d.

Scan table t by index c, reading c=4, 3, 2, 1 in turn, going back to the primary key index each time to read the values of c and d and write them into the temporary table. At this point, Rows_examined=4.

Because the statement has limit 1 in its semantics, only the first row of the temporary table is taken and inserted into table t. The value of Rows_examined then increases by 1, to 5.

That is, this statement performs a full table scan on table t and adds shared next-key locks on all the gaps of index c. Consequently, while this statement executes, no other transaction can insert data into this table.

As for why a temporary table is needed: the statement reads from the table while updating it. If the rows read were written straight back into the same table, rows just inserted during the traversal might be read again; and if those newly inserted rows took part in the computation, the statement's semantics would be violated.

Because this statement's implementation does not apply limit 1 directly inside the subquery, it has to traverse the whole table t. The optimization is simple: use the method described earlier and first insert the row into a temporary table temp_t, so that only one row needs to be scanned; then take that row from temp_t and insert it into table t.

Of course, since the amount of data involved is very small, you can consider using an in-memory temporary table for this optimization. Written with a memory temporary table, the statement sequence looks like this:

create temporary table temp_t (c int, d int) engine=memory;
insert into temp_t (select c + 1, d from t force index (c) order by c desc limit 1);
insert into t select * from temp_t;
drop table temp_t;

Insert unique key conflict

The first two examples covered insert ... select; the next example is the most common case: a unique key conflict in an insert statement.

For tables with unique keys, it is also common for unique key conflicts to occur when inserting data. Let me give you a simple example of a unique key conflict.

Figure 7: unique key conflict leading to deadlock

When session A executes the rollback statement, session C detects a deadlock and returns almost at the same time.

The logic of this deadlock is as follows:

At time T1, session A starts and executes the insert statement, adding a record lock on c=5 on index c. Note that this index is a unique index, so the lock degenerates into a record lock (if your memory of this is fuzzy, review the locking rules introduced in the 21st article).

At time T2, session B tries to execute the same insert statement, finds a unique key conflict, and adds a read lock; similarly, session C also adds a read lock on the c=5 record on index c.

At time T3, session A rolls back. Session B and session C then both try to continue their inserts, which requires a write lock. Each session is left waiting for the other's row lock, so a deadlock occurs.

The state change diagram of this process is shown below.
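The same timeline, sketched as commented SQL (session labels are annotations, not SQL syntax):

-- T1, session A: begin; insert into t values (null, 5, 5);
--                -- record lock on c = 5 (unique index)
-- T2, session B: insert into t values (null, 5, 5);  -- waits for S lock on c = 5
--     session C: insert into t values (null, 5, 5);  -- waits for S lock on c = 5
-- T3, session A: rollback;
--                -- B and C both acquire the S lock, then each requests the X lock
--                -- to insert, each blocked by the other's S lock: deadlock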

The last case is insert ... on duplicate key update, which, when a unique key conflict occurs, updates the conflicting row instead of reporting an error. If the row being inserted violates two unique constraints at once, the result is shown below.

Figure 9: two unique keys conflict at the same time

As you can see, the primary key id is judged first: MySQL decides that the statement conflicts with the row id=2, so it is the id=2 row that gets modified.

Note that affected rows for this statement returns 2, which is easy to misread. Only one row is really updated; it is just that, in the code implementation, both the insert and the update consider themselves to have succeeded, so the update count and the insert count each increase by 1.
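A minimal sketch consistent with this behaviour, assuming table t currently contains the rows (1, 1, 1) and (2, 2, 2):

insert into t values (2, 1, 100) on duplicate key update d = 100;
-- Query OK, 2 rows affected
-- the primary key conflict on id = 2 is detected first, so the row id = 2
-- is updated, becoming (2, 2, 100); the row with c = 1 is left untouched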

At this point, you should have a deeper understanding of the circumstances under which MySQL's auto-increment primary key will have holes. Why not try it out in practice yourself!
