2025-03-26 Update | SLTechnology News & Howtos > Database
Shulou (Shulou.com), 06/01 report
Author | Tang Aizhong, a developer of Yunhe Enmo's SQM product and an important contributor to its SQL parsing, SQL auditing, and intelligent SQL optimization engines for Oracle, MySQL, and DB2. The products are widely used by customers in finance, telecommunications, and other industries.
Abstract
The optimizer is the interpreter between logical SQL and physical storage. It is a complex yet "stupid" mathematical model: its inputs are typically the SQL statement, statistics, and optimizer parameters, and its output is an executable query plan. The quality of an optimizer therefore depends on the stability and robustness of this model, and understanding the model means understanding how the database processes SQL.
01 Execution Flow of the Optimizer
(Note: the figure above is from Li Haixiang.)
The figure above shows the optimizer's overall execution flow, which can be summarized as follows:
1. Build the initial table-access arrays (init_plan_arrays) from the syntax tree and statistics.
2. For each table in the access array, compute the best access path (find_best_ref) and keep the execution plan with the lowest cost found so far.
3. If a cheaper execution plan is found, update the current best plan; otherwise, optimization ends.
As this flow shows, plan generation is a dynamic-programming / greedy process. The recurrence can be written as:
Min(Cost(T1, ..., Tn)) = Min(Cost(T1)) + Min(Cost(T2)) + ... + Min(Cost(Tn-1)) + Min(Cost(Tn))
where Cost(Tn) is the cost of accessing table Tn on top of the join prefix T1 ... Tn-1. With no prior knowledge, the optimizer would have to try all n! join orders, a full permutation. In practice it does have prior knowledge, such as table sizes, outer joins, and subqueries, which partially orders the table accesses, so the search can be understood as a "pruned" dynamic program. Its core function is Optimize_table_order::best_extension_by_limited_search, whose structure in the source code is roughly:
bool Optimize_table_order::best_extension_by_limited_search(
    table_map remaining_tables, uint idx, uint current_search_depth) {
  for (JOIN_TAB **pos = join->best_ref + idx; *pos; pos++) {
    ...
    best_access_path(s, remaining_tables, idx, false,
                     idx ? (position - 1)->prefix_rowcount : 1.0, position);
    ...
    if (best_extension_by_limited_search(remaining_tables_after, idx + 1,
                                         current_search_depth - 1))
      ...
    backout_nj_state(remaining_tables, s);
    ...
  }
}
The code above is a recursive search inside a for loop: a classic full-permutation, depth-first algorithm.
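The shape of this search can be sketched as follows. This is a deliberately naive model, not MySQL's code: the 10% fanout and the 0.2 per-row cost are invented for illustration, but the structure (recursively extending a join prefix and pruning on accumulated cost) mirrors best_extension_by_limited_search:

```python
import itertools

# A toy model of best_extension_by_limited_search: try every join order
# (a full permutation), accumulate a cost for each prefix, and keep the
# cheapest complete plan. The row counts, the 10% fanout and the 0.2
# per-row cost are illustrative assumptions, not MySQL's real cost model.
def best_join_order(tables):
    """tables: dict of name -> row count. Returns (best_order, best_cost)."""
    best_order, best_cost = None, float("inf")
    for order in itertools.permutations(tables):
        cost, prefix_rows = 0.0, 1.0
        for name in order:
            rows = tables[name]
            cost += prefix_rows * rows * 0.2   # evaluate rows joined so far
            prefix_rows *= rows * 0.1          # assume 10% of rows match
            if cost >= best_cost:              # prune, like the real search
                break
        else:
            if cost < best_cost:
                best_order, best_cost = order, cost
    return best_order, best_cost

if __name__ == "__main__":
    order, cost = best_join_order({"a": 1000, "b": 10, "c": 100})
    print(order, cost)  # the small table ends up first in the join order
```

Note how starting from the smallest table keeps prefix_rows, and therefore every later step's cost, low; this is exactly why the optimizer's prior knowledge of table sizes prunes the permutation space so effectively.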
02 Optimizer Parameters
Compared with Oracle, MySQL's optimizer is still naive. Oracle maintains a rich variety of statistics (system statistics, table statistics, index statistics, and so on), while MySQL relies more heavily on built-in constants. MySQL 5.7 added the tables mysql.server_cost and mysql.engine_cost, which users can configure to adjust the cost model. A few common and very important constants are:
1. #define ROW_EVALUATE_COST 0.2f
The cost of evaluating one row against a condition; the more rows, the higher the cost.
2. #define IO_BLOCK_READ_COST 1.0f
The cost of reading one page from disk.
3. #define MEMORY_BLOCK_READ_COST 1.0f
The cost of reading one page from memory. For InnoDB this means reading a page from the buffer pool, so by default a memory page and a disk page cost the same.
4. #define COND_FILTER_EQUALITY 0.1f
The default filter factor of an equality condition is 0.1. For example, with name = 'lily' on a table of 100 rows, 10 rows are expected to be returned.
5. #define COND_FILTER_INEQUALITY 0.3333f
The default filter factor of an inequality condition is 0.3333, for example col1 > col2.
6. #define COND_FILTER_BETWEEN 0.1111f
The default filter factor of a BETWEEN condition is 0.1111, for example col1 BETWEEN a AND b.
.
There are many more such constants, covering filter conditions, the join cache, temporary tables, and other costs. Once you know them, the Cost figures in execution plans suddenly make sense!
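To make the filter factors concrete, here is a small sketch with the constant values copied from the list above; the helper function is mine, not MySQL's:

```python
# The constants above, restated so the arithmetic can be checked.
# Values are those quoted in this article for MySQL 5.7.
ROW_EVALUATE_COST = 0.2
IO_BLOCK_READ_COST = 1.0
MEMORY_BLOCK_READ_COST = 1.0
COND_FILTER_EQUALITY = 0.1
COND_FILTER_INEQUALITY = 0.3333
COND_FILTER_BETWEEN = 0.1111

def estimated_rows(table_rows, filter_factor):
    """Rows the optimizer expects a predicate to let through."""
    return table_rows * filter_factor

if __name__ == "__main__":
    # name = 'lily' on a 100-row table: 100 * 0.1 = 10 rows expected
    print(estimated_rows(100, COND_FILTER_EQUALITY))
    # col1 BETWEEN a AND b: 100 * 0.1111 ~= 11 rows expected
    print(estimated_rows(100, COND_FILTER_BETWEEN))
```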
03 Optimizer Options
In MySQL, executing select @@optimizer_switch shows the current optimizer switches, for example:
index_merge=on, index_merge_union=off, index_merge_sort_union=off, index_merge_intersection=on, engine_condition_pushdown=on, index_condition_pushdown=on, mrr=on, mrr_cost_based=on, block_nested_loop=on, batched_key_access=off, materialization=on, semijoin=on, loosescan=on, firstmatch=on, subquery_materialization_cost_based=on, use_index_extensions=on, condition_fanout_filter=on
04 How Is the Optimizer Trace Generated?
The functions in the flowchart contain many statements like this:
Opt_trace_object trace_ls(trace, "searching_loose_scan_index");
So as the optimizer runs, its execution path is recorded through Opt_trace_object and then written to information_schema.optimizer_trace, where users can query and trace it.
05 Typical Optimizer Scenarios
5.1 Full table scan
SELECT * FROM sakila.actor;
Table actor's statistics are as follows:

| Db_name | Table_name | Last_update | N_rows | Clustered_index_size | Other_index_size |
|---------|------------|---------------------|--------|----------------------|------------------|
| sakila  | actor      | 2018-11-20 16:20:12 | 200    | 1                    | 0                |
The statistics of the primary key (actor_id) are as follows:

| Index_name | Last_update | Stat_name | Stat_value | Sample_size | Stat_description |
|------------|---------------------|--------------|------------|-------------|----------------------------------|
| PRIMARY    | 2018-11-14 14:25:49 | n_diff_pfx01 | 200        | 1           | actor_id                         |
| PRIMARY    | 2018-11-14 14:25:49 | n_leaf_pages | 1          | NULL        | Number of leaf pages in the index |
| PRIMARY    | 2018-11-14 14:25:49 | size         | 1          | NULL        | Number of pages in the index      |
The execution plan:

{
  "query_block": {
    "select_id": 1,
    "cost_info": { "query_cost": "41.00" },
    "table": {
      "table_name": "actor",
      "access_type": "ALL",
      "rows_examined_per_scan": 200,
      "rows_produced_per_join": 200,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "40.00",
        "prefix_cost": "41.00",
        "data_read_per_join": "56K"
      },
      "used_columns": ["actor_id", "first_name", "last_name", "last_update", "id"]
    }
  }
}

The cost breaks down as:
IO_COST = CLUSTERED_INDEX_SIZE * IO_BLOCK_READ_COST = 1 * 1 = 1
EVAL_COST = TABLE_ROWS * ROW_EVALUATE_COST = 200 * 0.2 = 40
prefix_cost = IO_COST + EVAL_COST = 41
Note that the above process ignores the difference in access costs between memory pages and disk pages.
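The full-scan arithmetic above can be replayed in a few lines (a sketch using the constants from section 02; actor's statistics supply the 1 page and 200 rows):

```python
# Full-table-scan cost, as derived in the text: read every page of the
# clustered index, then evaluate every row. Constants are the MySQL 5.7
# defaults quoted earlier.
IO_BLOCK_READ_COST = 1.0
ROW_EVALUATE_COST = 0.2

def full_scan_cost(clustered_index_pages, table_rows):
    io_cost = clustered_index_pages * IO_BLOCK_READ_COST  # read every page
    eval_cost = table_rows * ROW_EVALUATE_COST            # evaluate every row
    return io_cost, eval_cost, io_cost + eval_cost

if __name__ == "__main__":
    io, ev, prefix = full_scan_cost(1, 200)
    print(io, ev, prefix)  # matches read_cost, eval_cost, prefix_cost above
```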
5.2 Table join using a full table scan
SELECT * FROM sakila.actor a, sakila.film_actor b WHERE a.actor_id = b.actor_id;
The statistics of table film_actor are as follows:
| Db_name | Table_name | Last_update | N_rows | Clustered_index_size | Other_index_size |
|---------|------------|---------------------|--------|----------------------|------------------|
| sakila  | film_actor | 2018-11-20 16:55:31 | 5462   | 12                   | 5                |
The statistics of the primary key (actor_id, film_id) on film_actor are as follows:
| Index_name | Last_update | Stat_name | Stat_value | Sample_size | Stat_description |
|------------|---------------------|--------------|------------|-------------|----------------------------------|
| PRIMARY    | 2018-11-14 14:25:49 | n_diff_pfx01 | 200        | 1           | actor_id                         |
| PRIMARY    | 2018-11-14 14:25:49 | n_diff_pfx02 | 5462       | 1           | actor_id,film_id                 |
| PRIMARY    | 2018-11-14 14:25:49 | n_leaf_pages | 11         | NULL        | Number of leaf pages in the index |
| PRIMARY    | 2018-11-14 14:25:49 | size         | 12         | NULL        | Number of pages in the index      |
{
  "query_block": {
    "select_id": 1,
    "cost_info": { "query_cost": "1338.07" },
    "nested_loop": [
      { "table": {
          "table_name": "a",
          "access_type": "ALL",
          "possible_keys": ["PRIMARY"],
          "rows_examined_per_scan": 200,
          "rows_produced_per_join": 200,
          "filtered": "100.00",
          "cost_info": { "read_cost": "1.00", "eval_cost": "40.00", "prefix_cost": "41.00", "data_read_per_join": "54K" },
          "used_columns": ["actor_id", "first_name", "last_name", "last_update"]
      } },
      { "table": {
          "table_name": "b",
          "access_type": "ref",
          "possible_keys": ["PRIMARY"],
          "key": "PRIMARY",
          "used_key_parts": ["actor_id"],
          "key_length": "2",
          "ref": ["sakila.a.actor_id"],
          "rows_examined_per_scan": 27,
          "rows_produced_per_join": 5461,
          "filtered": "100.00",
          "cost_info": { "read_cost": "204.67", "eval_cost": "1092.40", "prefix_cost": "1338.07", "data_read_per_join": "85K" },
          "used_columns": ["actor_id", "film_id", "last_update"]
      } }
    ]
  }
}
The full-table-scan cost of the first table, actor, is 41; see example 5.1.
The cost of the second table in the nested-loop join is:
read_cost (204.67) = prefix_rowcount * (1 + keys_per_value / table_rows * clustered_index_size) = 200 * (1 + 27 / 13863 * 12) ≈ 204.67
Note: 27 is the index estimate of matching rows per actor_id in film_actor: on average 5462 / 200 ≈ 27 records per actor_id.
How is table_rows (13863) calculated? The film_actor table actually holds 5462 records in 12 pages, 11 of them leaf pages, for a total data size of 11 * 16K (the default page size) = 180224 bytes. The minimum record length is 26 bytes (obtained by summing the field lengths), so table_rows = 180224 / 26 * 2 ≈ 13863, where 2 is a safety factor that biases the estimate toward the worst case.
The join returns 200 * 5462 / 200 = 5462 rows, so the row evaluation cost is 5462 * 0.2 = 1092.4.
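The derivation above can be checked as follows. This is a sketch: the helper names are mine, and the formulas are the simplified versions quoted in the text rather than MySQL's exact source code; all inputs come from the statistics discussed above.

```python
# Replaying the ref-access cost arithmetic for the nested-loop join.
ROW_EVALUATE_COST = 0.2

def worst_case_rows(leaf_pages, page_size, min_rec_len, safety_factor=2):
    """InnoDB-style worst-case row estimate: leaf data volume divided by the
    minimum record length, doubled by a safety factor."""
    return int(leaf_pages * page_size / min_rec_len * safety_factor)

def ref_read_cost(prefix_rowcount, keys_per_value, table_rows, index_pages):
    """read_cost of a ref join as given in the text: one lookup per prefix
    row, each touching a fraction of the clustered index."""
    return prefix_rowcount * (1 + keys_per_value / table_rows * index_pages)

if __name__ == "__main__":
    table_rows = worst_case_rows(11, 16 * 1024, 26)     # 13863
    read_cost = ref_read_cost(200, 27, table_rows, 12)  # about 204.67
    eval_cost = 5462 * ROW_EVALUATE_COST                # 1092.4
    print(table_rows, round(read_cost, 2), round(eval_cost, 1))
```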
5.3 IN query
The statistics of index idx_id (film_id) on film_actor are as follows:
| Index_name | Last_update | Stat_name | Stat_value | Sample_size | Stat_description |
|------------|---------------------|--------------|------------|-------------|----------------------------------|
| idx_id     | 2018-11-14 14:25:49 | n_diff_pfx01 | 997        | 4           | film_id                          |
| idx_id     | 2018-11-14 14:25:49 | n_diff_pfx02 | 5462       | 4           | film_id,actor_id                 |
| idx_id     | 2018-11-14 14:25:49 | n_leaf_pages | 4          | NULL        | Number of leaf pages in the index |
| idx_id     | 2018-11-14 14:25:49 | size         | 5          | NULL        | Number of pages in the index      |
EXPLAIN SELECT * FROM ACTOR WHERE actor_id IN (SELECT film_id FROM film_actor);

{
  "query_block": {
    "select_id": 1,
    "cost_info": { "query_cost": "460.79" },
    "nested_loop": [
      { "table": {
          "table_name": "ACTOR",
          "access_type": "ALL",
          "possible_keys": ["PRIMARY"],
          "rows_examined_per_scan": 200,
          "rows_produced_per_join": 200,
          "filtered": "100.00",
          "cost_info": { "read_cost": "1.00", "eval_cost": "40.00", "prefix_cost": "41.00", "data_read_per_join": "56K" },
          "used_columns": ["actor_id", "first_name", "last_name", "last_update", "id"]
      } },
      { "table": {
          "table_name": "film_actor",
          "access_type": "ref",
          "possible_keys": ["idx_id"],
          "key": "idx_id",
          "used_key_parts": ["film_id"],
          "key_length": "2",
          "ref": ["sakila.ACTOR.actor_id"],
          "rows_examined_per_scan": 5,
          "rows_produced_per_join": 200,
          "filtered": "100.00",
          "using_index": true,
          "first_match": "ACTOR",
          "cost_info": { "read_cost": "200.66", "eval_cost": "40.00", "prefix_cost": "460.79", "data_read_per_join": "3K" },
          "used_columns": ["film_id"],
          "attached_condition": "(`sakila`.`actor`.`actor_id` = `sakila`.`film_actor`.`film_id`)"
      } }
    ]
  }
}

| id | select_type | table      | partitions | type | possible_keys | key    | key_len | ref                   | rows | filtered | Extra |
|----|-------------|------------|------------|------|---------------|--------|---------|-----------------------|------|----------|-------|
| 1  | SIMPLE      | ACTOR      | NULL       | ALL  | PRIMARY       | NULL   | NULL    | NULL                  | 200  | 100.00   | NULL  |
| 1  | SIMPLE      | film_actor | NULL       | ref  | idx_id        | idx_id | 2       | sakila.ACTOR.actor_id | 5    | 100.00   | Using where; Using index; FirstMatch(ACTOR) |
As the execution plan shows, MySQL chose the FirstMatch strategy. MySQL's semi-join optimization strategies are Materialization, LooseScan, FirstMatch, and DuplicateWeedout. All four are enabled by default, and the cheapest one is selected by cost. Let us take FirstMatch as an example of the optimizer's execution flow. The SQL is as follows:
SELECT * FROM Country
WHERE Country.code IN (SELECT City.Country FROM City WHERE City.Population > 1*1000*1000)
  AND Country.continent = 'Europe';
As the figure above shows, FirstMatch cuts down the lookup-and-match work by checking whether a record already exists in the result set.
For the access cost of table actor, please refer to example 1.
How is the 200.66 access cost of table film_actor calculated?
Since film_actor is accessed through the indexed field film_id, MySQL uses a covering-index scan (an index-only scan). How is the cost of such an index access calculated? See the function:
double handler::index_only_read_time(uint keynr, double records)
The index block size is 16K, and MySQL assumes all blocks are half full, so the number of index records one block can hold is:
keys_per_block = 16K / 2 / (index key length + primary key length) + 1 = 16384 / 2 / (2 + 4) + 1 = 1366
Here the primary key (actor_id, film_id) consists of two smallint columns totaling 4 bytes, and the idx_id (film_id) key is 2 bytes. Each lookup is expected to match 5462 / 997 ≈ 5.47 index entries, so the cost of one index access is (5.47 + 1366 - 1) / 1366 ≈ 1.0032. The film_actor table is accessed 200 times in total, so the overall access cost is 200 * 1.0032 ≈ 200.66.
Total cost 460.79 = table actor access cost + table film_actor access cost + row evaluation cost = 41 + 200.66 + 200 * 1 * (5462 / 997) * 1 * 0.2 ≈ 460.79.
The two 1s are filter factors: since neither table carries a filter condition, both factors are 1.
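Putting the pieces together, the whole 460.79 can be reproduced with a toy version of this calculation. The function below is a simplification of handler::index_only_read_time, not the real method; the defaults (16K block, 2-byte key, 4-byte primary-key reference) are the values from the text.

```python
# Index-only-scan cost plus the final roll-up of query_cost = 460.79.
def index_only_read_time(records, block_size=16384, key_len=2, ref_len=4):
    # MySQL assumes index blocks are half full.
    keys_per_block = block_size // 2 // (key_len + ref_len) + 1  # 1366
    return (records + keys_per_block - 1) / keys_per_block

def total_cost():
    keys_per_value = 5462 / 997                              # ~5.48 per film_id
    read_cost = 200 * index_only_read_time(keys_per_value)   # ~200.66
    eval_cost = 200 * keys_per_value * 0.2                   # ~219.14
    return 41 + read_cost + eval_cost                        # ~460.79

if __name__ == "__main__":
    print(round(total_cost(), 2))
```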