2025-01-17 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 05/31 report
This article covers what can go wrong when insert into select is misused in MySQL. Many people run into exactly this situation in real projects, so let's walk through how to handle it. I hope you read it carefully and come away with something useful!
How it started
The company handles a large transaction volume on MySQL, with a daily increment of roughly one million rows. There is no sharding (no split into multiple databases or tables), so to keep this table performant the only option was to migrate old data out.
Our colleague Li took on the task and came up with two plans:
1. Query the data through the application, insert it into the history table, then delete it from the original table.
2. Use insert into select and let the database handle all the I/O itself.
With the first plan, loading everything in one go immediately OOM'd the application, while doing it in batches meant too much I/O and far too long a runtime. So he chose the second plan, which showed no problems at all during testing. He was delighted — and then he got fired.
So what on earth happened? Let's retrace it.
Let's look at the first plan; here is the pseudocode:
// 1. Query the data to be migrated
List list = selectData();
// 2. Insert the data into the history table
insertData(list);
// 3. Delete the migrated rows from the original table
deleteByIds(ids);
From this code the cause of the OOM is obvious: we load all of the data into memory at once — no wonder memory blew up.
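The batched variant of plan 1 (which trades memory for I/O and runtime) can be sketched as follows. This is a minimal sketch only: it uses Python's sqlite3 as a stand-in for MySQL, and the table and column names (payment_flow, payment_flow_history, created_at) are illustrative assumptions, not the original schema.

```python
# Batched migration: move rows older than a cutoff to the history table
# one bounded batch at a time, so application memory stays flat instead of
# loading everything at once. sqlite3 stands in for MySQL; names are
# illustrative assumptions.
import sqlite3

BATCH_SIZE = 1000

def migrate_in_batches(conn, cutoff):
    while True:
        # 1. Query one batch of ids to migrate.
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM payment_flow WHERE created_at < ? LIMIT ?",
            (cutoff, BATCH_SIZE))]
        if not ids:
            break
        placeholders = ",".join("?" * len(ids))
        # 2. Insert the batch into the history table.
        conn.execute(
            f"INSERT INTO payment_flow_history "
            f"SELECT * FROM payment_flow WHERE id IN ({placeholders})", ids)
        # 3. Delete the migrated rows from the original table.
        conn.execute(
            f"DELETE FROM payment_flow WHERE id IN ({placeholders})", ids)
        conn.commit()
```

Each round trip touches at most BATCH_SIZE rows, which is exactly the "too much I/O and too long a runtime" trade-off Li rejected.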
Now let's look at the second plan. What happened there?
To keep the table performant while retaining the useful data, we agreed on a retention window: keep 10 days of data, which still leaves roughly 10 million rows (1kw) in the table.
So the colleague added a time filter and ran the migration as a single statement: insert into (history table) select ... where datetime < (ten days ago).
This felt great: it avoided paging through the data entirely, so there was no OOM risk; it simplified a lot of code; and it cut down on network round trips. For testing, he even generated 10 million rows to simulate production. The test environment, of course, showed no problems, and the change passed smoothly.
Since this is a payment flow table, the migration was set up as a scheduled task, timed to run at 8 p.m. Nighttime volume is low, so that night nothing seemed wrong. But the next morning, when the finance team came in and started reconciliation, the money didn't add up: many payment records had never made it into the database.
Investigation showed that after 8 p.m., payment flow inserts had started failing one after another, and a lot of data was lost as a result. The problem was finally traced to the migration task. At first it made no sense — everything was fine during the day — until we realized the nighttime failures might be caused by the nighttime task. After stopping the task's second run, the failures were gone.
Post-mortem
Where exactly was the problem? Why did everything recover once the migration task was stopped? What does this insert into select actually do? Let's look at the statement's explain:
From the explain output it is not hard to see that this query goes straight to a full table scan. At this point we can already start to guess at the problem.
If it scans the full table and our table is this large, doesn't that mean the migration runs for a very long time? If the migration takes an hour, that would explain why the problem appeared only at night, while the task was running, and never during the day. But is the full table scan really the root cause?
To restore the scene, we tried running the migration while performing normal inserts on the table, and sure enough the same problem appeared.
Now we could make a bold hypothesis: if the statement does not scan the whole table, the problem should not occur. We changed the condition so the query no longer did a full table scan, re-ran the scenario, and the problem was gone:
So we can conclude that the full table scan triggered the incident. That resolves the immediate problem, but it still doesn't explain why the inserts failed one after another rather than all at once.
Reason
At the default transaction isolation level (REPEATABLE READ), an insert into a select ... from b does not only write to table a: as it scans table b, it locks b's records one by one (InnoDB takes shared next-key locks on the rows it reads). Scan the whole table, and in effect all of b ends up locked. That explains the successive failures: while the migration was gradually locking the payment flow table, concurrent payment inserts could not obtain their locks, so some timed out or failed outright, while others happened to acquire theirs and succeeded.
Why did testing show nothing wrong?
In testing we did use production-scale data, but we overlooked one thing: a test environment is, after all, a test environment. A realistic data volume is not the same as a realistic business scenario. In this case, the test ran only the migration; it never ran the migration concurrently with a heavy stream of new inserts, and that combination is what produced the online bug.
Solution
Since avoiding the full table scan avoids the problem, that is exactly what we do: add an index on the column in the where condition, so the select side of the statement goes through the index instead of scanning (and locking) the whole table.
Can we still use insert into select, then? Of course — as long as the select is indexed, only the matching records are locked rather than the entire table.
That concludes this analysis of misusing insert into select in MySQL. Thank you for reading; I hope it helps you avoid the same pitfall.