In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article will share with you what are the basic principles of join optimization. The editor thinks it is very practical, so share it with you as a reference and follow the editor to have a look.
1. The basic principles of join optimization:
A: small result sets drive large result sets
B: ensure that the driven table is indexed
C: there is no guarantee that the driver table will be indexed to increase the join_buffer_size size.
Principle: mysql's join algorithm has only one Nested Loop Join algorithm. The most basic principle is to cycle through every record in the driver table.
Filter into the matching table, get the result set list, loop list each record again to filter in the next matching table, and so on.
Pseudo code [2 table association]:
For each recode in table_a {
For each recode in table_b that table_a.column=table_b.column {
Combination to output
}
}
Analysis: the cost of Nested Loop Join nested loops depends on the product of the inner and outer loop costs. That is, [number of rows in the drive table] Numm [find a cost in the matching table]
The height of the innodb B + tree index is generally 3 to 4, that is to say, no matter which table is used as the matching table, the cost of a query is constant T.
That is, Join cost: n [table rows] * T [constant], so use a small result set as the driving table, and emphasize that it is a small result set rather than a small table, because small and large are relative, so it is possible for a large table to pass the filtered result.
The set is much smaller than a small table. So emphasize small result sets.
Case: 120 million large table association, no result after 3 hours of execution before optimization. Allah wants to sing "it's enough to wait for you to love me even once."
Select c. Currentaccounname count (*)
From (select distinct PHONE from cis_data_qixin_score) a
Join TMP_A1_INFO_MOBILE_H_20151201 b on substr (a. PHONEJ 1) 7) = b.mobile_h_code
Join TMP_A1_DICT_AREA_20151201 c on c.tele_code=b.prov_telecode
Group by c.current_name
Description: the sql function is to obtain the number of phone numbers for each province. Cis_data_qixin_score: number list, TMP_A1_INFO_MOBILE_H_20151201 number H code table, TMP_A1_DICT_AREA_20151201
Number H code corresponding province table
Execute the plan:
+-+-+ -- +
| | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+-+-+ -- +
| | 1 | PRIMARY | c | ALL | NULL | NULL | NULL | NULL | 41 | Using temporary; Using filesort |
| | 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 124364159 | Using join buffer (Block Nested Loop) |
| | 1 | PRIMARY | b | ref | idx_mobile_h_code | idx_mobile_h_code | 33 | func | 1 | Using index condition; Using where |
| | 2 | DERIVED | cis_data_qixin_score | index | PRIMARY,idx_phone | idx_phone | 62 | NULL | 124364159 | Using index |
+-+-+ -- +
Analysis: it can be seen from the execution plan that cis_data_qixin_score deduplicates the derivative table DERIVED2's 124364159 records by number [actually about 80 million, the execution plan statistics are not very accurate] as the driver table and match.
Jion is performed with 35W records of table idx_mobile_h_code. This is a violation of the "strategist's taboo". The cost of a large result set as a driver table is 124364159T. You can't choose a small table as the driver table.
What shall I do? Nowadays, girls are looking forward to losing weight every day, so let's also lose weight for this girl named "drive watch".
Sql function analysis: through the number table and the number table substr (a. PHONE _ 1 ~ (7)) = b.mobile_h_code, the code of the province that gets the number is associated with the province table, and finally the province name is grouped by the province name
Give the number of all provinces. In other words, the first seven digits of PHONE correspond to the same province code. In a word. Group the first seven bits of phone directly, and then join.
Sql after rewriting:
Select c. Current _ name _ sum (a.cou)
From (
Select substr (a. PHONEJ 1) 7) PHONE_h_code, count (*) cou
From (select distinct PHONE from cis_data_qixin_score) a
Group by substr (a.PHONEJ 1) order by null 7)
) a
Join TMP_A1_INFO_MOBILE_H_20151201 b on a.PHONE_h_code=b.mobile_h_code
Join TMP_A1_DICT_AREA_20151201 c on c.tele_code=b.prov_telecode
Group by c.current_name
Execute the plan:
+-+- -+-+
| | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+-+- -+-+
| | 1 | PRIMARY | c | ALL | NULL | NULL | NULL | NULL | 41 | Using temporary; Using filesort |
| | 1 | PRIMARY | b | ALL | idx_mobile_h_code | NULL | NULL | NULL | 318794 | Using where; Using join buffer (Block Nested Loop) |
| | 1 | PRIMARY | | ref | 23 | cis_gather.b.mobile_h_code | 390 | Using where |
| | 2 | DERIVED | | ALL | NULL | NULL | NULL | NULL | 124364170 | Using temporary |
| | 3 | DERIVED | cis_data_qixin_score | index | PRIMARY,idx_phone | idx_phone | 62 | NULL | 124364170 | Using index |
+-+- -+-+
5 rows in set (0.00 sec)
Parsing: by grouping the first seven digits of the number to get the materialized table [about 35w] automatically create the index PHONE_h_code as the matching table. The cost of join is 35000T
The ratio of join cost before and after rewriting is: 124364159T / 350000T = 355 is there a way to fly?
Results: the optimized sql was completed in 4 minutes.
Thank you for reading! This is the end of this article on "what are the basic principles of join optimization". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, you can share it for more people to see!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.