In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Optimize the MySQL deduplication operation to the top three: continuous firing (1): skillful use of indexes and variables
Http://blog.csdn.net/wzy0623/article/details/54377986
Preparation for the experiment:
MySQL 5.6.14
Create table t_source
(
Item_id int
Created_time datetime
Modified_time datetime
Item_name varchar (20)
Other varchar (20)
);
Create table t_target like t_source
Delimiter / /
Create procedure sp_generate_data ()
Begin
Set @ I: = 1
While @ I apply where filter-> process select list-> apply order by clause.
In order for variables to be assigned and compared in the sort order of created_time and item_name, the rows of data must be looked up in the order of index. The force index (idx_sort) hint here serves this purpose, and must be written in this way to make the entire duplicate check statement valid. Otherwise, because the table is scanned before sorting, the order in which variables are assigned cannot be guaranteed, and the correctness of the query results cannot be guaranteed. The order by clause is also not negligible, otherwise MySQL will use a full table scan instead of a full index scan, even if prompted by force index, resulting in incorrect results.
The index also ensures the order of created_time,item_name and avoids file sorting. The force index (idx_sort) hint and the order by clause are indispensable, and the index idx_sort is just right here to kill two birds with one stone.
Before the query statement starts, initialize the variable to an impossible value in the data, and then enter the where clause to judge from left to right. Compare the values of variables and fields, then assign the values of created_time and item_name to the variables, and process them line by line in the order of created_time,item_name. Item_name is a string type, (@ b:=item_name) is not a valid Boolean expression, so write it as (@ b:=item_name) is not null.
The way "insert into t_target select * from t_source group by created_time,item_name;" is written, which is restricted by "sql_mode='ONLY_FULL_GROUP_BY'".
The running time is different from the original text, probably because my environment is SSD.
In addition, to avoid the overhead of returning to the table, you can increase the fields of the index
Drop index idx_sort on t_source
Create index idx_sort on t_source (created_time,item_name,item_id,modified_time,other)
Analyze table t_source
Using the above index, the time consumption of the ultimate improved SQL can be reduced to 9.5s.
Reference:
Http://blog.csdn.net/wzy0623/article/details/54378367
Http://blog.csdn.net/wzy0623/article/details/54378575
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.