
How to make the index of MySQL more efficient

2025-04-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains in detail how to make MySQL indexes more efficient. It should be a useful reference for anyone interested in the topic.

Index in MySQL

The InnoDB engine in MySQL stores indexes in a B+Tree structure, which minimizes the number of disk IO operations needed to query data. The height of the tree directly affects query performance, and it generally stays at 3-4 levels.

A B+Tree consists of three parts: the root, the branch nodes and the leaf nodes. The root and branch nodes do not store data, only pointer addresses; all data is stored in the Leaf Nodes, which are linked to each other by a doubly linked list. The structure is as follows:

As can be seen from the structure, each Leaf Node is composed of three parts: the predecessor pointer p_prev, the data, and the successor pointer p_next. The data is ordered, ascending (ASC) by default, so key values on the right side of the B+Tree are always greater than those on the left. The distance from the root to every Leaf Node is equal, so the IO required to reach any Leaf Node is the same: it equals the height of the index tree, that is, Level + 1 IO operations.
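The claim that a tree of 3-4 levels is enough for very large tables can be checked with a rough capacity estimate. The sketch below is only an illustration: the fanout of 1000 pointers per non-leaf page and 16 rows per leaf page are assumed round numbers, not measurements from this article, and `btree_capacity` is a hypothetical helper.

```python
def btree_capacity(height: int, fanout: int, rows_per_leaf: int) -> int:
    """Rough upper bound on the rows a B+Tree of the given height can hold.

    height counts every level including the leaf level, so a tree of
    height 1 is a single leaf page.
    """
    if height < 1:
        raise ValueError("height must be at least 1")
    # Each non-leaf level multiplies the number of reachable pages by fanout.
    leaf_pages = fanout ** (height - 1)
    return leaf_pages * rows_per_leaf


# Assumed, illustrative figures (not taken from the article).
FANOUT = 1000
ROWS_PER_LEAF = 16

for h in (1, 2, 3, 4):
    print(f"height {h}: up to {btree_capacity(h, FANOUT, ROWS_PER_LEAF):,} rows")
```

Under these assumptions a tree of height 3 already addresses millions of rows and height 4 addresses billions, which is why the number of IO operations per lookup stays small.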

We can think of an index in MySQL as a small table that takes up disk space. Creating an index is essentially sorting the data by the index columns. The sort is done in the sort buffer first (for InnoDB index builds its size is set by innodb_sort_buffer_size); if the amount of data to sort is too large for the buffer, the sort spills to temporary files. Most importantly, an index lets queries avoid sort operations (distinct, group by, order by).

Clustered index

An InnoDB table in MySQL is an IOT (Index-Organized Table): the data is stored in primary key id order (logically continuous, physically discontinuous), and the primary key id is the clustered index, which stores the entire row. If no primary key is explicitly specified, InnoDB uses the first unique index whose columns are all NOT NULL as the clustered index, and if there is no such index it generates a hidden row_id to serve as the primary key. For example, in table users (id, user_id, user_name, phone, primary key (id)), id is the clustered index and stores the entire row of id, user_id, user_name and phone data.

Secondary index

A secondary index is also known as an auxiliary index. Besides the index columns, the primary key id is also stored in the index. The index idx_user_name (user_name) on user_name is actually equivalent to idx_user_name (user_name, id), because MySQL automatically appends the primary key id to the end of a secondary index. Anyone familiar with Oracle knows that an Oracle index stores the rowid in addition to the index columns (the rowid represents the physical location of the data and consists of four parts: object number + data file number + data block number + row number). We can also add the primary key id explicitly when creating a secondary index.

Compare the statistics of the two indexes idx_user_name and idx_user_name_id: n_fields is the number of columns in the index, n_leaf_pages is the number of leaf pages in the index, and size is the total number of pages in the index. The figures show that the secondary index does contain the primary key id and that the two indexes are exactly the same.

Index_name | n_fields | n_leaf_pages | size
idx_user_name | 2 | 1358 | 1572
idx_user_name_id | 2 | 1358 | 1572

Returning to the table

We have shown that a secondary index contains the primary key id. When you filter data through a secondary index column, MySQL may need to return to the table. For example, the business needs to query the user table users by user name user_name, and the business interface issues this SQL:

Select user_id, user_name, phone from users where user_name = 'Laaa'

We know that for index idx_user_name, it is actually a small table idx_user_name (user_name, id). If you only query the columns in the index, you only need to scan the index to get the required data, and there is no need to return to the table, as shown in the following SQL statement:

SQL 1: select id, user_name from users where user_name = 'Laaa'

SQL 2: select id from users where user_name = 'Laaa'

Extra=Using index in the execution plans of SQL 1 and SQL 2 indicates that a covering index scan is used and there is no need to return to the table. Now look again at the business SQL above:

Select user_id, user_name, phone from users where user_name = 'Laaa'

The user_id and phone columns in the select list are not in the index idx_user_name, so MySQL has to look up the table through the primary key id. Internally, MySQL processes the query in two steps:

Step 1: select id from users where user_name = 'Laaa' -- id = 100101

Step 2: select user_id, user_name, phone from users where id = 100101

Step 2 is called returning to the table: the row is fetched from the original table using the primary key id found in the secondary index.
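The difference between a covering index scan and a lookup that returns to the table can be modelled with two dictionaries. This is a toy sketch, not how InnoDB is implemented: the names `CLUSTERED_INDEX`, `SECONDARY_IDX_USER_NAME` and `query_by_user_name` are hypothetical, and the user_id and phone values are made-up sample data.

```python
# Clustered index: primary key id -> entire row (sample values are invented).
CLUSTERED_INDEX = {
    100101: {"id": 100101, "user_id": 9001, "user_name": "Laaa", "phone": "555-0100"},
    100102: {"id": 100102, "user_id": 9002, "user_name": "Lbbb", "phone": "555-0101"},
}

# Secondary index idx_user_name (user_name, id): user_name -> primary key ids.
SECONDARY_IDX_USER_NAME = {"Laaa": [100101], "Lbbb": [100102]}

# Columns that can be answered from the secondary index alone.
INDEX_COLUMNS = {"user_name", "id"}


def query_by_user_name(user_name, columns):
    """Model `select <columns> from users where user_name = ?`.

    Returns (rows, table_lookups), where table_lookups counts how many
    times the clustered index had to be visited.
    """
    rows, table_lookups = [], 0
    covered = set(columns) <= INDEX_COLUMNS
    for pk in SECONDARY_IDX_USER_NAME.get(user_name, []):  # step 1: find ids
        if covered:
            source = {"user_name": user_name, "id": pk}     # covering index scan
        else:
            source = CLUSTERED_INDEX[pk]                    # step 2: return to the table
            table_lookups += 1
        rows.append({col: source[col] for col in columns})
    return rows, table_lookups


print(query_by_user_name("Laaa", ["id", "user_name"]))
print(query_by_user_name("Laaa", ["user_id", "user_name", "phone"]))
```

The first call never touches the clustered index, while the second needs one extra lookup per matching row, which is the cost that a covering index avoids.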

Index height

A MySQL index is a B+Tree, so even if a table holds hundreds of millions of rows the index does not grow very tall; its height usually stays at about 3-4 levels. Let us calculate the height of the index idx_user_name. From the index information we know index_id = 4003 and page_no = 5, so the offset is page_no x innodb_page_size + 64 = 5 x 16384 + 64 = 81984, which we inspect with hexdump:

$ hexdump -s 81984 -n 10 /usr/local/var/mysql/test/users.ibd
0014040 00 02 00 00 00 0f a3
001404a

The first two bytes of the output are the PAGE_LEVEL of the index, and the height of the idx_user_name index is PAGE_LEVEL + 1. The trailing bytes 0f a3 are the index number; converted to decimal this is 4003, which is exactly the index_id.
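The offset arithmetic and the decoding of those 10 bytes can be sketched in a few lines. The sketch assumes the usual InnoDB page layout (a 38-byte file header, then PAGE_LEVEL 26 bytes into the page header, followed by an 8-byte index id, both big-endian) and the default 16 KB page size. The helper names are hypothetical, and the byte string passed in is a made-up sample rather than the output shown above.

```python
import struct

PAGE_LEVEL_OFFSET = 64  # 38-byte file header + 26 bytes into the page header


def page_level_offset(page_no: int, page_size: int = 16384) -> int:
    """Byte offset of PAGE_LEVEL for the given page in an .ibd file."""
    return page_no * page_size + PAGE_LEVEL_OFFSET


def parse_level_and_index_id(raw: bytes):
    """Decode 10 bytes: 2-byte PAGE_LEVEL followed by an 8-byte index id."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    level, index_id = struct.unpack(">HQ", raw)  # big-endian, no padding
    return level, index_id


print(page_level_offset(5))  # 81984, the offset passed to hexdump -s

# Hypothetical sample bytes: level 2 and index id 0x0fa3.
sample = bytes.fromhex("0002" "0000000000000fa3")
level, index_id = parse_level_and_index_id(sample)
print("PAGE_LEVEL:", level, "height:", level + 1, "index_id:", index_id)
```

Reading the bytes directly from a live data file is best done on a copy, since the file may change while the server is running.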

Data scanning mode

Full table scan

A full table scan reads the whole B+Tree from left to right to fetch all of the table data. The IO overhead is high, the scan is slow, and locking is heavy, all of which hurts the concurrency of MySQL.

For the business scenario of OLAP, a large amount of data needs to be returned by scanning, so the sequential IO of full table scanning is more efficient.

Index scan

Generally speaking, an index is smaller than the table, so less data is scanned, less IO is consumed, execution is fast, and there is almost no locking, all of which improves the concurrency of MySQL.

For OLTP systems, we always want every SQL statement to hit a suitable index.

The main difference between the two is the amount of data scanned and the IO pattern: a full table scan is sequential IO, while an index scan is random IO. MySQL optimizes for this, for example with the change buffer, which reduces random IO when secondary index pages are modified.

Index optimization case

Paging query optimization

The business needs to query the transaction records according to the time range. The original SQL of the API is as follows:

select * from trade_info where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time ...

... >= '2020-10-01 00:00:00' and create_time ...

... >= '2020-10-01 00:00:00' and create_time ...

mysql> explain select min(id) min_id, max(id) max_id from coupons where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time ...

... >= current_id and id ...

mysql> create index idx_nickname on users (nickname); // `nickname` varchar
Records: 0  Duplicates: 0  Warnings: 1

mysql> show warnings;
| Level | Code | Message |
| Warning | 1071 | Specified key was too long; max key length is 767 bytes |

In the early stage of business development, to ship features quickly, the lengths of some table fields are defined rather loosely. For example, the nickname column of the user table users is defined as varchar, and some business interfaces need to query by nickname. After the system has been running for a while, a query shows that the longest nickname in the users table is 30 characters. At this point a prefix index can be created to reduce the length of the index and improve performance.

Key_len=515: because both the table and the column use the utf8mb4 character set at 4 bytes per character, a variable-length data type adds 2 bytes, and a column that allows NULL adds 1 more byte, the total is 128 x 4 + 2 + 1 = 515 bytes. When creating a prefix index, the prefix length does not have to be the maximum length of the values currently in the column; it should be the length of the most discriminating part of the value, chosen so that selectivity generally stays above 90%. For example, if the email field stores values like xxxx@yyy.com, the prefix length can be the maximum length of the xxxx part.
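One way to pick a prefix length is to measure how selective each candidate length is on the existing values. The sketch below defines selectivity as distinct prefixes divided by total rows, which is one common convention and an assumption here; the helper names and the sample email addresses are hypothetical.

```python
def prefix_selectivity(values, prefix_len):
    """Distinct prefixes of the given length divided by the number of rows."""
    if not values:
        return 0.0
    return len({v[:prefix_len] for v in values}) / len(values)


def shortest_prefix(values, target=0.9):
    """Smallest prefix length whose selectivity reaches the target.

    Falls back to the longest value length when the target is unreachable.
    """
    max_len = max((len(v) for v in values), default=0)
    for n in range(1, max_len + 1):
        if prefix_selectivity(values, n) >= target:
            return n
    return max_len


# Made-up sample data for illustration only.
emails = [
    "alice@example.com",
    "alina@example.com",
    "bob@example.com",
    "bobby@example.com",
    "carol@example.com",
]

for n in range(1, 6):
    print(n, prefix_selectivity(emails, n))
print("shortest prefix reaching 90%:", shortest_prefix(emails))
```

On a real table the same measurement would be run against the column itself, and the chosen length rechecked as the data grows.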

You can see that the optimizer chooses the prefix index, whose key_len is 123, that is, 30 x 4 + 2 + 1 = 123 bytes, which is less than a quarter of the original size.
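The key_len arithmetic used in this section can be written as a small function. This is a minimal sketch of the formula as the article states it (bytes per character times length, plus 2 bytes for a variable-length type, plus 1 byte if the column allows NULL); the function name `key_len` is hypothetical.

```python
def key_len(chars: int, bytes_per_char: int = 4,
            variable_length: bool = True, nullable: bool = True) -> int:
    """key_len of a single string index column, following the article's formula."""
    length = chars * bytes_per_char
    if variable_length:
        length += 2  # length prefix of a variable-length type
    if nullable:
        length += 1  # NULL flag
    return length


full = key_len(128)    # whole-column index on a utf8mb4 column: 515
prefix = key_len(30)   # prefix index on the first 30 characters: 123
print(full, prefix, round(prefix / full, 3))
```

The ratio confirms the claim above: 123 bytes is a little under a quarter of 515 bytes.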

Although the prefix index can reduce the size of the index, it cannot eliminate sorting.

When a single-column index cannot filter the data well, you can combine it with other fields from the where condition into a composite index, which filters the data better and reduces the amount of IO scanned. For example, the business needs to query transaction records by time period, with the following SQL:

select * from trade_info where status = 1 and create_time >= '2020-10-01 00:00:00' and create_time ...

mysql> create index idx_status_create_time on trade_info (status, create_time);
mysql> create index idx_create_time_status on trade_info (create_time, status);

-- View the execution plan of SQL
mysql> explain select * from users where status = 1 and create_time >= '2021-10-01 00:00:00' and create_time ...

mysql> set session optimizer_trace="enabled=on", end_markers_in_json=on;
-- execute the SQL statement
mysql> select * from trade_info where status = 1 and create_time >= '2021-10-01 00:00:00' and create_time ...
mysql> SELECT trace FROM information_schema.OPTIMIZER_TRACE\G

Comparing the statistics of the two indexes gives the following:

Composite index | Type | Rows | Index columns participating in filtering | Chosen | Cause
idx_status_create_time | Index Range Scan | 98518 | status AND create_time | True | Cost low
idx_create_time_status | Index Range Scan | 98518 | create_time | False | Cost high

The MySQL optimizer is cost-based. The cost mainly consists of IO_COST and CPU_COST, and MySQL's CBO (Cost-Based Optimizer) always chooses the plan with the lowest cost as the final execution plan. From the analysis above, the CBO chooses the composite index idx_status_create_time because both status and create_time in that index participate in filtering, so the cost is low. With idx_create_time_status, only create_time participates in filtering and status is ignored. In effect the CBO reduces it to a single-column index idx_create_time, which is less selective than the composite index idx_status_create_time.

Design principles of composite index

Place the column of the range query at the end of the composite index, such as idx_status_create_time.

Columns that are filtered on more frequently and have better selectivity should be the leading columns of the composite index; they suit equality lookups, as in idx_user_id_status.

These two principles are not contradictory, but complement each other.
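The effect of column order can be illustrated by counting how many index entries a range scan must read. The sketch below models an index as a sorted list of tuples and is a simplification of what the optimizer does. The data set (two status values and integer days 1 to 100 standing in for create_time) is invented for the example.

```python
from bisect import bisect_left, bisect_right


def entries_scanned(index_entries, low_key, high_key):
    """Entries in a sorted index that fall between low_key and high_key."""
    return bisect_right(index_entries, high_key) - bisect_left(index_entries, low_key)


# Hypothetical rows: (status, day), one row per status per day.
rows = [(status, day) for day in range(1, 101) for status in (0, 1)]

idx_status_create_time = sorted(rows)                       # (status, day)
idx_create_time_status = sorted((d, s) for s, d in rows)    # (day, status)

# Query: status = 1 and day between 10 and 19.
# Equality column first: the range on day is one contiguous slice.
scanned_a = entries_scanned(idx_status_create_time, (1, 10), (1, 19))

# Range column first: every status in the day range is read, then filtered.
scanned_b = entries_scanned(idx_create_time_status, (10,), (19, float("inf")))

print("idx_status_create_time scans", scanned_a, "entries")
print("idx_create_time_status scans", scanned_b, "entries")
```

With the equality column leading, only matching entries are read. With the range column leading, the scan covers every status value inside the time range, which is the extra cost reflected in the comparison above.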

Index skip scan

Suppose the table users has a composite index idx_status_create_time. As is well known, if you query on create_time alone, the MySQL optimizer generally does not use this index, so you would need to create a single-column index idx_create_time. Those who have used Oracle know that it offers Index Skip Scan for this case. MySQL 8.0 implements a similar index skip scan, and skip_scan=on can be seen among the optimizer options:

| | optimizer_switch | use_invisible_indexes=off,skip_scan=on,hash_join=on |

Skip scan suits the case where the leading column of the composite index has few distinct values and the trailing column has many. If the leading column has many distinct values, the MySQL CBO will not choose an index skip scan; the decision depends on the data distribution of the indexed columns.
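The idea behind a skip scan can be sketched as one small range scan per distinct value of the leading column. This is a conceptual model over a sorted list, not MySQL's implementation; `skip_scan` and the sample data (integer days standing in for create_time) are hypothetical.

```python
from bisect import bisect_left, bisect_right


def skip_scan(index_entries, low, high):
    """Emulate a skip scan over entries sorted by (leading, trailing).

    Returns (matches, probes), where probes is the number of distinct
    leading values visited, each costing one range scan.
    """
    matches, probes, pos = [], 0, 0
    while pos < len(index_entries):
        leading = index_entries[pos][0]
        probes += 1
        # Range scan on the trailing column within this leading value.
        start = bisect_left(index_entries, (leading, low))
        end = bisect_right(index_entries, (leading, high))
        matches.extend(index_entries[start:end])
        # Skip ahead to the first entry of the next distinct leading value.
        pos = bisect_right(index_entries, (leading, float("inf")))
    return matches, probes


# Few distinct leading values: 2 statuses, 100 days each.
few = sorted((status, day) for status in (0, 1) for day in range(1, 101))
# Many distinct leading values: 100 statuses, 2 days each.
many = sorted((status, day) for status in range(100) for day in (10, 50))

print(skip_scan(few, 10, 19)[1], "probes for a low-cardinality leading column")
print(skip_scan(many, 10, 19)[1], "probes for a high-cardinality leading column")
```

The number of probes grows with the number of distinct leading values, which matches the condition above: the more distinct values the leading column has, the less attractive the skip scan becomes.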

mysql> explain select id, user_id, status, phone from users where create_time >= '2021-01-02 23:01:00' and create_time ...
