Optimizing MySQL Deduplication to the Extreme: Three Rounds in a Row (updated 07/06, SLTechnology News&Howtos)


2025-07-06 update. From: SLTechnology News&Howtos (shulou)

Reported by Shulou (Shulou.com), 06/01.

Optimizing MySQL Deduplication to the Extreme, Round 1: Clever Use of Indexes and Variables

http://blog.csdn.net/wzy0623/article/details/54377986

Preparation for the experiment:

MySQL 5.6.14

create table t_source
(
    item_id int,
    created_time datetime,
    modified_time datetime,
    item_name varchar(20),
    other varchar(20)
);

create table t_target like t_source;

delimiter //

create procedure sp_generate_data()
begin
    set @i := 1;

while @i ... apply where filter -> process select list -> apply order by clause.

For the variables to be assigned and compared in created_time, item_name order, the rows must be read in index order. The force index (idx_sort) hint serves exactly this purpose, and the statement must be written this way for the whole deduplication query to be correct. Otherwise, because the table is scanned and the where clause evaluated before the sort, the order in which the variables are assigned is not guaranteed, and neither is the correctness of the result. The order by clause cannot be omitted either: without it, MySQL uses a full table scan instead of a full index scan even with the force index hint, which again produces incorrect results.

The index also delivers the rows already ordered by created_time, item_name, which avoids a filesort. Both the force index (idx_sort) hint and the order by clause are indispensable, and idx_sort kills two birds with one stone here.
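To see why the row order matters, here is a minimal Python sketch (not MySQL itself) of the same single-pass idea: keep a row only when its (created_time, item_name) differs from the previous row's. The function name dedup_adjacent and the sample rows are hypothetical. The pass can only remove duplicates that arrive next to each other, which is why the rows have to come out of idx_sort in sorted order.

```python
from typing import Iterable, List, Tuple

# Hypothetical simplified row: (item_id, created_time, item_name)
Row = Tuple[int, str, str]


def dedup_adjacent(rows: Iterable[Row]) -> List[Row]:
    """Keep a row only if its (created_time, item_name) differs from the previous row's."""
    prev = None  # plays the role of @a/@b before the first row
    kept = []
    for row in rows:
        key = (row[1], row[2])
        if key != prev:
            kept.append(row)
        prev = key  # "assign the variables" after the comparison
    return kept


rows = [
    (1, "2017-01-01 10:00:00", "apple"),
    (2, "2017-01-02 09:00:00", "pear"),
    (3, "2017-01-01 10:00:00", "apple"),  # duplicate of item 1
]

# Storage order: the duplicates are not adjacent, so nothing is removed.
print([r[0] for r in dedup_adjacent(rows)])  # [1, 2, 3]

# Sorted order, as a scan of idx_sort would deliver the rows.
ordered = sorted(rows, key=lambda r: (r[1], r[2]))
print([r[0] for r in dedup_adjacent(ordered)])  # [1, 2]
```

In storage order both "apple" rows survive; once the rows are sorted, the second one is dropped.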

Before the query runs, the variables are initialized to values that cannot occur in the data. The where clause is then evaluated from left to right: first the variables are compared with the current row's fields, then the row's created_time and item_name are assigned to the variables, so the rows are processed one by one in created_time, item_name order. Because item_name is a string, (@b:=item_name) is not a valid Boolean expression, so it is written as (@b:=item_name) is not null.
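The where clause logic can be mimicked in Python as a sketch, assuming the rows already arrive in created_time, item_name order. The SessionVars class, its assign_a/assign_b methods and the sentinel values are hypothetical stand-ins for @a, @b and their initial values. One caveat: Python guarantees left-to-right short-circuit evaluation, but the MySQL manual states that the order of evaluation for expressions involving user variables is undefined, and setting user variables inside expressions is deprecated in MySQL 8.0, so the SQL version relies on behavior the server does not promise.

```python
from typing import List, Optional, Tuple


class SessionVars:
    """Hypothetical stand-in for the MySQL user variables @a and @b."""

    def __init__(self) -> None:
        # Start from values assumed never to occur in the data.
        self.a: Optional[str] = "1000-01-01 00:00:00"
        self.b: Optional[str] = ""

    def assign_a(self, value: Optional[str]) -> Optional[str]:
        self.a = value  # like @a := created_time, which also returns the value
        return value

    def assign_b(self, value: Optional[str]) -> Optional[str]:
        self.b = value  # like @b := item_name
        return value


def where_clause(v: SessionVars, created_time: str, item_name: str) -> bool:
    # Mirrors:
    #   (@a != created_time or @b != item_name)
    #   and (@a := created_time) is not null
    #   and (@b := item_name) is not null
    # For a duplicate row the first test is False, so the assignments are
    # skipped; that is harmless because the variables already hold these values.
    return ((v.a != created_time or v.b != item_name)
            and v.assign_a(created_time) is not None
            and v.assign_b(item_name) is not None)


rows: List[Tuple[str, str]] = [  # already in (created_time, item_name) order
    ("2017-01-01 10:00:00", "apple"),
    ("2017-01-01 10:00:00", "apple"),
    ("2017-01-01 10:00:00", "pear"),
    ("2017-01-02 09:00:00", "apple"),
]
v = SessionVars()
kept = [r for r in rows if where_clause(v, *r)]
print([name for _, name in kept])  # ['apple', 'pear', 'apple']
```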

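The shorter form "insert into t_target select * from t_source group by created_time,item_name;" only works when sql_mode does not include ONLY_FULL_GROUP_BY; with that mode enabled, MySQL rejects it because select * returns columns that are neither grouped nor aggregated.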

My running times differ from those in the original article, probably because my environment uses SSDs.

In addition, to avoid the overhead of going back to the table for every row, you can add the remaining columns to the index so that it covers the query:

drop index idx_sort on t_source;
create index idx_sort on t_source (created_time, item_name, item_id, modified_time, other);
analyze table t_source;

With this covering index, the run time of the final optimized SQL drops to 9.5 s.
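A toy Python model of what the wider index saves, assuming a simplified layout rather than InnoDB's real page structure: with idx_sort on (created_time, item_name) only, each index entry carries just the key plus a row identifier (t_source has no primary key, so InnoDB uses an internal row ID), and every entry needs a separate fetch of the full row; with all five columns in the index, the rows come straight from the index entries. The function names and the row-id dictionary are hypothetical.

```python
from typing import Dict, List, Tuple

# (item_id, created_time, modified_time, item_name, other)
Row = Tuple[int, str, str, str, str]


def scan_non_covering(table: Dict[int, Row]) -> Tuple[List[Row], int]:
    """Toy model of idx_sort on (created_time, item_name) only."""
    # Each index entry stores the key columns plus a row id pointing back to the table.
    index = sorted((row[1], row[3], rid) for rid, row in table.items())
    rows, lookups = [], 0
    for _created, _name, rid in index:
        rows.append(table[rid])  # go back to the table for the other columns
        lookups += 1
    return rows, lookups


def scan_covering(table: Dict[int, Row]) -> Tuple[List[Row], int]:
    """Toy model of idx_sort on all five columns."""
    # Each index entry already carries every column that select * needs.
    index = sorted((row[1], row[3], row[0], row[2], row[4]) for row in table.values())
    rows = [(item_id, created, modified, name, other)
            for created, name, item_id, modified, other in index]
    return rows, 0  # no extra table lookups


table = {
    101: (1, "2017-01-02 00:00:00", "2017-01-02 00:00:00", "pear", "x"),
    102: (2, "2017-01-01 00:00:00", "2017-01-01 00:00:00", "apple", "y"),
}
r1, n1 = scan_non_covering(table)
r2, n2 = scan_covering(table)
assert r1 == r2  # same rows in the same order
print(n1, n2)  # 2 0
```

Both scans return the same sorted rows; the difference is only the number of extra lookups, which is what the covering index eliminates.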

References:

http://blog.csdn.net/wzy0623/article/details/54378367

http://blog.csdn.net/wzy0623/article/details/54378575
