What are the new features of the MySQL 8.0 optimizer

2025-01-15 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article mainly introduces the new features of the MySQL 8.0 optimizer. Many people have questions about them in daily operation, so the material below has been sorted into simple, easy-to-follow explanations that will hopefully answer those questions. Please follow along and study!

Why do we need configurable cost model constants? MySQL has been developed for decades, yet the optimizer still uses hardcoded weights to measure resources such as I/O and CPU, and those weights are based on experience from a decade or more ago. Consider how fast hardware has developed since then: servers with dozens of cores are no longer rare and are even deployed at scale in large companies, SSDs have long been mainstream, NVMe is on the rise, and high-speed RDMA networks are becoming affordable. All of this affects the design and evolution of database systems. The hardcoded weights are clearly out of date; we need to give users a way to define them, or better yet, set them automatically and intelligently based on the hardware environment.

MySQL 5.7 introduced two new system tables, exposed to the user so that the constants can be updated, as follows:

root@(none) 04:05:24> select * from mysql.server_cost;
+------------------------------+------------+---------------------+---------+---------------+
| cost_name                    | cost_value | last_update         | comment | default_value |
+------------------------------+------------+---------------------+---------+---------------+
| disk_temptable_create_cost   |       NULL | 2018-04-23 13:55:20 | NULL    |            20 |
| disk_temptable_row_cost      |       NULL | 2018-04-23 13:55:20 | NULL    |           0.5 |
| key_compare_cost             |       NULL | 2018-04-23 13:55:20 | NULL    |          0.05 |
| memory_temptable_create_cost |       NULL | 2018-04-23 13:55:20 | NULL    |             1 |
| memory_temptable_row_cost    |       NULL | 2018-04-23 13:55:20 | NULL    |           0.1 |
| row_evaluate_cost            |       NULL | 2018-04-23 13:55:20 | NULL    |           0.1 |
+------------------------------+------------+---------------------+---------+---------------+
6 rows in set (0.00 sec)

Here default_value is a generated column whose expression is fixed to the default value:

default_value float GENERATED ALWAYS AS (
  (case cost_name
    when _utf8mb3'disk_temptable_create_cost' then 20.0
    when _utf8mb3'disk_temptable_row_cost' then 0.5
    when _utf8mb3'key_compare_cost' then 0.05
    when _utf8mb3'memory_temptable_create_cost' then 1.0
    when _utf8mb3'memory_temptable_row_cost' then 0.1
    when _utf8mb3'row_evaluate_cost' then 0.1
    else NULL
  end)) VIRTUAL

root@(none) 04:05:35> select * from mysql.engine_cost;
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
| engine_name | device_type | cost_name              | cost_value | last_update         | comment | default_value |
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+
| default     |           0 | io_block_read_cost     |       NULL | 2018-04-23 13:55:20 | NULL    |             1 |
| default     |           0 | memory_block_read_cost |       NULL | 2018-04-23 13:55:20 | NULL    |          0.25 |
+-------------+-------------+------------------------+------------+---------------------+---------+---------------+

You can change the values through an ordinary UPDATE statement, for example:

root@(none) 04:05:52> update mysql.server_cost set cost_value = 40 where cost_name = 'disk_temptable_create_cost';
Query OK, 1 row affected (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0

root@(none) 04:07:13> select * from mysql.server_cost where cost_name = 'disk_temptable_create_cost';
+----------------------------+------------+---------------------+---------+---------------+
| cost_name                  | cost_value | last_update         | comment | default_value |
+----------------------------+------------+---------------------+---------+---------------+
| disk_temptable_create_cost |         40 | 2018-06-23 16:07:05 | NULL    |            20 |
+----------------------------+------------+---------------------+---------+---------------+
1 row in set (0.00 sec)

// after the update, run flush optimizer_costs to refresh the in-memory copy
// note that existing sessions will still use the old cost data
root@(none) 10:10:12> flush optimizer_costs;
Query OK, 0 rows affected (0.00 sec)

As you can see, usage is very simple. There are two tables, server_cost and engine_cost, which configure the server layer and the engine layer respectively.
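To make the role of these constants concrete, here is a minimal Python sketch (not MySQL source code; the formula and the in-memory/on-disk switch are deliberate simplifications) of how constants like those above can combine into a table-scan cost estimate:

```python
# Hypothetical sketch of a cost calculation driven by constants like the ones
# above. The values mirror the default_value column of server_cost/engine_cost.
SERVER_COST = {"row_evaluate_cost": 0.1, "key_compare_cost": 0.05}
ENGINE_COST = {"io_block_read_cost": 1.0, "memory_block_read_cost": 0.25}

def table_scan_cost(rows: int, pages: int, in_memory: bool = False) -> float:
    """Estimated cost = block access cost + per-row evaluation cost."""
    per_block = (ENGINE_COST["memory_block_read_cost"] if in_memory
                 else ENGINE_COST["io_block_read_cost"])
    io_cost = pages * per_block                          # reading the table blocks
    cpu_cost = rows * SERVER_COST["row_evaluate_cost"]   # evaluating each row
    return io_cost + cpu_cost

print(table_scan_cost(10000, 500))        # cold scan: 500*1.0 + 10000*0.1 = 1500.0
print(table_scan_cost(10000, 500, True))  # cached scan is cheaper: 1125.0
```

Raising io_block_read_cost (as the UPDATE above raised disk_temptable_create_cost) would immediately tilt such estimates away from I/O-heavy plans.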

Related code:

Global cache: Cost_constant_cache

The global cache maintains the current cost model information. During lex_start, each user thread checks whether its local pointer has been initialized; if not, it copies the pointer from the cache.

Initialize the global cache:

Cost_constant_cache::init:

Creates a Cost_model_constants, which contains two kinds of information: the server-layer cost model and the engine-layer cost model. The class structure is as follows:

Cost_constant_cache -> Cost_model_constants
  -> Server_cost_constants   // server_cost
  -> Cost_model_se_info
    -> SE_cost_constants     // engine_cost; if the storage engine provides the
                             // interface function get_cost_constants, the constants
                             // are taken from the storage engine

The configuration is read from the system tables for initialization; flush optimizer_costs also updates the cache:

read_cost_constants()
  |-> read_server_cost_constants
  |-> read_engine_cost_constants

Since users can update the system tables dynamically, some sessions may still be using the old version after flush optimizer_costs is executed, so reference counting is required: an old version can only be released after its ref counter drops to 0.
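The swap-and-refcount scheme can be sketched as follows (a simplified model, not the MySQL implementation; the class and method names here are made up for illustration):

```python
import threading

class CostConstants:
    """One immutable snapshot of the cost model constants."""
    def __init__(self, values: dict):
        self.values = dict(values)
        self.refs = 0  # number of sessions currently pinning this snapshot

class CostConstantCache:
    def __init__(self, values: dict):
        self._lock = threading.Lock()
        self._current = CostConstants(values)

    def acquire(self) -> CostConstants:
        """Called at statement start: pin and return the current snapshot."""
        with self._lock:
            self._current.refs += 1
            return self._current

    def release(self, snap: CostConstants) -> None:
        """Called when the session is done; a stale snapshot whose refs
        drop to 0 can then be freed."""
        with self._lock:
            snap.refs -= 1

    def flush(self, new_values: dict) -> None:
        """FLUSH OPTIMIZER_COSTS: install a new snapshot; sessions that
        pinned the old one keep using it until they release it."""
        with self._lock:
            self._current = CostConstants(new_values)
```

A session that acquired the snapshot before the flush keeps seeing the old constants until it releases and re-acquires, which matches the behaviour noted above.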

Thread cost model initialization

Cost_model_server

A Cost_model_server object, THD::m_cost_model, hangs off each thread's thd. At lex_start(), if the thread's m_cost_model has not been initialized, it fetches the global pointer and stores it locally:

Cost_model_server::init

const Cost_model_constants *m_cost_constants = cost_constant_cache->get_cost_constants();
// a reference count is added to make sure the constants are not deleted while referenced

const Server_cost_constants *m_server_cost_constants = m_cost_constants->get_server_cost_constants();
// this, too, just takes the global pointer

As you can see, thd does not create its own cost model; it only references the pointers in the cache.

Table Cost Model

struct TABLE::m_cost_model, of type Cost_model_table

Its value is taken from the cost model object stored on the thd described above.

Cost_estimate

A unified type, Cost_estimate, stores the calculated cost result in four dimensions:

double io_cost;     ///< cost of I/O operations
double cpu_cost;    ///< cost of CPU operations
double import_cost; ///< cost of remote operations
double mem_cost;    ///< memory used (bytes)

Future

For now, reasonable configuration values can only be obtained through thorough testing against the actual workload. But how should they be configured, and what counts as a reasonable value? In my opinion the configuration should eventually be tunable automatically; the key is to work out the mapping between configuration values and hardware characteristics. That is a direction we can work on in the future.

Reference:

Cost Model official documentation
Official blog 1: The MySQL Optimizer Cost Model Project
Official blog 2: A new dimension to MySQL query optimizations
Optimizer Cost Model Improvements in MySQL 5.7.5 DMR
Slide: MySQL Cost Model
Related worklogs:
WL#7182: Optimizer Cost Model API
WL#7209: Handler interface changes for new cost model
WL#7276: Configuration data base for Optimizer Cost Model
WL#7315: Optimizer cost model: main memory management of cost constants
WL#7316: Optimizer cost model: Command for online updating of cost model constants

Histogram

The histogram is another long-awaited MySQL feature. It is common in other database products and can guide the optimizer well in choosing execution paths, by storing the data distribution of a specified column. MariaDB has supported it since the early 10.0.2 release, and MySQL now supports it in the latest 8.0 release.

Usage

In MySQL, histograms are managed through the ANALYZE TABLE syntax:

ANALYZE [NO_WRITE_TO_BINLOG | LOCAL] TABLE tbl_name
  UPDATE HISTOGRAM ON col_name [, col_name] ...
  [WITH N BUCKETS]

ANALYZE [NO_WRITE_TO_BINLOG | LOCAL] TABLE tbl_name
  DROP HISTOGRAM ON col_name [, col_name] ...

A simple example, using an ordinary sysbench table:

root@sb1 05:16:33> show create table sbtest1\G

*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=200001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.01 sec)

Create a histogram; it is stored in the data dictionary:

root@sb1 05:16:38> ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON k WITH 10 BUCKETS;
+-------------+-----------+----------+----------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                     |
+-------------+-----------+----------+----------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'k'. |
+-------------+-----------+----------+----------------------------------------------+
1 row in set (0.55 sec)

root@sb1 05:17:03> ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON k, pad WITH 10 BUCKETS;
+-------------+-----------+----------+------------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                       |
+-------------+-----------+----------+------------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'k'.   |
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'pad'. |
+-------------+-----------+----------+------------------------------------------------+
2 rows in set (7.98 sec)

Delete the histogram on the pad column:

root@sb1 05:17:51> ANALYZE TABLE sbtest1 DROP HISTOGRAM ON pad;
+-------------+-----------+----------+------------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                       |
+-------------+-----------+----------+------------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics removed for column 'pad'. |
+-------------+-----------+----------+------------------------------------------------+
1 row in set (0.06 sec)

root@sb1 05:58:12> ANALYZE TABLE sbtest1 DROP HISTOGRAM ON k;
+-------------+-----------+----------+----------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                     |
+-------------+-----------+----------+----------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics removed for column 'k'. |
+-------------+-----------+----------+----------------------------------------------+
1 row in set (0.08 sec)

If you do not specify the number of buckets, the default is 100:

root@sb1 05:58:27> ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON k;
+-------------+-----------+----------+----------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                     |
+-------------+-----------+----------+----------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'k'. |
+-------------+-----------+----------+----------------------------------------------+
1 row in set (0.56 sec)

Histogram statistics are stored in the data dictionary and can be obtained through an information_schema view:

root@information_schema 05:34:49> SHOW CREATE TABLE INFORMATION_SCHEMA.COLUMN_STATISTICS\G
*************************** 1. row ***************************
                View: COLUMN_STATISTICS
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=mysql.infoschema@localhost SQL SECURITY DEFINER VIEW COLUMN_STATISTICS AS select mysql.column_statistics.schema_name AS SCHEMA_NAME,mysql.column_statistics.table_name AS TABLE_NAME,mysql.column_statistics.column_name AS COLUMN_NAME,mysql.column_statistics.histogram AS HISTOGRAM from mysql.column_statistics where can_access_table(mysql.column_statistics.schema_name,mysql.column_statistics.table_name)
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.00 sec)

As you can see from this view definition, behind it is a system table, mysql.column_statistics, which is hidden and not exposed directly.

Here is a simple example:

root@sb1 05:58:55> ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON k WITH 4 BUCKETS;
+-------------+-----------+----------+----------------------------------------------+
| Table       | Op        | Msg_type | Msg_text                                     |
+-------------+-----------+----------+----------------------------------------------+
| sb1.sbtest1 | histogram | status   | Histogram statistics created for column 'k'. |
+-------------+-----------+----------+----------------------------------------------+
1 row in set (0.63 sec)

Query the histogram information on the table

root@sb1 06:00:43> SELECT JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS WHERE SCHEMA_NAME='sb1' AND TABLE_NAME='sbtest1'\G
*************************** 1. row ***************************
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      38671,
      99756,
      0.249795,
      17002
    ],
    [
      99757,
      100248,
      0.500035,
      492
    ],
    [
      100249,
      100743,
      0.749945,
      495
    ],
    [
      100744,
      172775,
      1.0,
      16630
    ]
  ],
  "data-type": "int",
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2018-09-22 09:59:30.857797",
  "sampling-rate": 1.0,
  "histogram-type": "equi-height",
  "number-of-buckets-specified": 4
}
1 row in set (0.00 sec)

As you can see from the JSON output, the histogram produced by the statement above has four buckets, the data type is int, and the type is equi-height, that is, an equal-height histogram (the other type is the equal-width histogram, SINGLETON). Each bucket records the lower and upper bounds of its values, the cumulative frequency, and the number of distinct values. With this information the optimizer obtains a fairly accurate picture of the data distribution, and can therefore determine a better execution plan from these statistics.
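How such an equi-height histogram can be computed is easy to see in a small sketch (a simplified model, not the MySQL implementation; real buckets are also adjusted so that equal values never straddle a bucket boundary):

```python
def build_equi_height(values, num_buckets):
    """Cut the sorted values into buckets of roughly equal row counts and
    record [lower bound, upper bound, cumulative frequency, distinct count],
    the same four fields as each bucket in the JSON output above."""
    values = sorted(values)
    n = len(values)
    per_bucket = n / num_buckets
    buckets, start = [], 0
    for b in range(1, num_buckets + 1):
        end = min(round(b * per_bucket), n)
        if end <= start:
            continue  # fewer distinct positions than buckets
        chunk = values[start:end]
        buckets.append([chunk[0], chunk[-1], end / n, len(set(chunk))])
        start = end
    return buckets

# four buckets over 0..99, matching the shape of the JSON above
print(build_equi_height(range(100), 4))
```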

If there are a large number of duplicate values in the column, MySQL may choose the equal-width histogram instead. For example, continuing the example above, if we update column k so that half the values are 10 and half are 20, the resulting histogram data is as follows:

root@sb1 10:41:17> SELECT JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS WHERE SCHEMA_NAME='sb1' AND TABLE_NAME='sbtest1'\G
*************************** 1. row ***************************
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      10,
      0.499995
    ],
    [
      20,
      1.0
    ]
  ],
  "data-type": "int",
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2018-09-22 14:41:17.312601",
  "sampling-rate": 1.0,
  "histogram-type": "singleton",
  "number-of-buckets-specified": 4
}
1 row in set (0.00 sec)

As shown above, for the SINGLETON type each bucket contains only two values: the column value and the corresponding cumulative frequency (that is, what fraction of the data is less than or equal to the current bucket's value).

Note the sampling-rate here: a value of 1 means all data in the table was read for the statistics. For large tables, however, we may not want to read that much data, since it can cause excessive memory consumption, so MySQL provides the parameter histogram_generation_max_mem_size to cap memory usage.
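The effect of the memory cap can be sketched like this (an assumed, simplified model; the real sampling happens inside handler::sample_next and the byte accounting is more involved):

```python
import random

def sample_rows(rows, max_mem_bytes, bytes_per_row, seed=42):
    """Pick a sampling rate so that the kept rows fit the memory budget,
    then keep each row with that probability."""
    budget_rows = max_mem_bytes // bytes_per_row
    rate = min(1.0, budget_rows / max(len(rows), 1))
    rng = random.Random(seed)
    kept = [r for r in rows if rng.random() < rate]
    return rate, kept

# a 100000-row table with an 80 KB budget yields sampling-rate 0.1
rate, kept = sample_rows(list(range(100000)), max_mem_bytes=80000, bytes_per_row=8)
print(rate)
```

The returned rate plays the role of the "sampling-rate" field in the histogram JSON: when the budget covers the whole table it stays at 1.0.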

If there is not much DML on the table, the histogram is basically stable; but if the table is written frequently, we may need to update the histogram regularly, since MySQL does not update it automatically.

The optimizer uses the histogram to calculate the selectivity (filterability) of columns, and most predicate types can make use of it. Refer to the official documentation for details.
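As an illustration of what the selectivity dispatchers compute, here is a sketch of a less-than estimate over equi-height buckets (a simplified model, not the MySQL code; buckets are [low, high, cumulative frequency, distinct count] as in the JSON shown earlier, and values are assumed uniform inside a bucket):

```python
def less_than_selectivity(buckets, value):
    """Estimated fraction of rows with column < value."""
    prev_cum = 0.0
    for low, high, cum, _distinct in buckets:
        if value <= low:
            return prev_cum                          # everything here is >= value
        if value <= high:
            frac = (value - low) / (high - low + 1)  # uniformity assumption
            return prev_cum + (cum - prev_cum) * frac
        prev_cum = cum
    return 1.0

buckets = [[0, 24, 0.25, 25], [25, 49, 0.5, 25], [50, 74, 0.75, 25], [75, 99, 1.0, 25]]
print(less_than_selectivity(buckets, 50))   # 0.5: half the rows fall below 50
```

The optimizer multiplies such a selectivity by the table row count to estimate how many rows a predicate lets through, which in turn feeds the cost model described earlier.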

For the influence of histograms on query plans, see the official blog posts on the subject.

Related code

Code structure:

Taking MySQL 8.0.12 as an example, the main code is in the sql/histograms directory:

$ ls sql/histograms/
equi_height_bucket.cc  equi_height_bucket.h  equi_height.cc  equi_height.h
histogram.cc           histogram.h           singleton.cc    singleton.h
value_map.cc           value_map.h           value_map_type.h

Class structure:

namespace histograms
  |-> Histogram        // base class
    |-> Equi_height    // equi-height histogram; a template class instantiated per
                       // supported data type, see equi_height.cc
    |-> Singleton      // equal-width histogram; stores only each value and the
                       // frequency of its occurrence

Create and store histogram:

The related functions and call stacks for handling histograms are as follows:

Sql_cmd_analyze_table::handle_histogram_command
  |-> update_histogram                  // update histograms
    |-> histograms::update_histogram    // the API function in the namespace
      a. Check each column:
         // histograms::field_type_to_value_map_type: is the column type supported?
         // covered_by_single_part_index: no histogram is created for a column that
         //   is a single-column primary key or unique key
         // for a generated column, find the columns it depends on and add them to the set
      b. Determine the sampling ratio, which is mainly limited by the parameter
         histogram_generation_max_mem_size; if it is set large enough, the whole
         table is read and analyzed.
      |-> fill_value_maps               // start reading the column data to be analyzed
        |-> ha_sample_init
        |-> ha_sample_next
          |-> handler::sample_next      // read the next record, sampling by random number
            |-> Value_map::add_values   // add the read data to the map
        |-> ...
        |-> ha_sample_end
      |-> build_histogram               // create the histogram object
          a. determine the histogram type: Singleton if the number of distinct
             values is no more than the number of buckets, otherwise Equi_height
          |-> Singleton::build_histogram
          |-> Equi_height::build_histogram
      |-> Histogram::store_histogram    // store the histogram in mysql.column_statistics
        |-> dd::cache::Dictionary_client::update
  |-> drop_histogram                    // drop histograms

Use histogram

The way it is used is simpler:

First, the table object TABLE_SHARE gains a member m_histograms, an unordered map whose key is the field index and whose value is the corresponding histogram object.

The call stack for obtaining the selectivity of column values is as follows:

get_histogram_selectivity
  |-> Histogram::get_selectivity
    |-> get_equal_to_selectivity_dispatcher
    |-> get_greater_than_selectivity_dispatcher
    |-> get_less_than_selectivity_dispatcher
  |-> write_histogram_to_trace   // write to optimizer_trace

MySQL supports using histograms for a variety of predicate types, including:

col_name = constant
col_name <=> constant
col_name != constant
col_name > constant
col_name < constant
col_name >= constant
col_name <= constant
