Last time we focused on why indexes are used and the data structure of indexes. Today we'll show you how to design high-performance indexes.
One key point was that InnoDB uses a clustered index: the leaf nodes of the primary (clustered) index's B+ tree store the complete data rows, while a secondary index's leaf nodes store only the secondary key value and the primary key value.
As a result, when a query goes through a secondary index, MySQL first finds the primary key value, and then uses that value to look up the target row in the primary index.
For example, imagine a table that stores the four rows shown below, where Id is the primary key (clustered index) and Name carries a secondary index.
Id | Name    | Company
 5 | Gates   | Microsoft
 7 | Bezos   | Amazon
11 | Jobs    | Apple
14 | Ellison | Oracle
With the clustered index, a query by primary key such as select * from tab where id = 14 simply follows the B+ tree search algorithm on the primary index down to the leaf node and reads the row data there.
If you query through the secondary index, with a condition on the Name column, two steps are needed:
1. First, search the secondary index B+ tree for Name and reach its leaf node to obtain the corresponding primary key value.
2. Second, search the primary index B+ tree with that primary key value; the leaf node reached there holds the whole row of data.
The above process is called "going back to the table" (a table lookup).
Back to the table: when querying, after the index search returns the row's primary key (or rowid), the engine must go back to the table with that value to fetch the actual row; that extra lookup is the "back to the table" step.
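As a concrete sketch of the discussion so far (the column types, the ENGINE clause, and the index name idx_name are assumptions; the table and column names come from the example above):
CREATE TABLE tab (
    Id      INT NOT NULL,
    Name    VARCHAR(20),
    Company VARCHAR(20),
    PRIMARY KEY (Id),      -- clustered index: leaf nodes hold the complete rows
    KEY idx_name (Name)    -- secondary index: leaf nodes hold only (Name, Id)
) ENGINE = InnoDB;

-- One B+ tree search on the clustered index, no extra lookup:
SELECT * FROM tab WHERE Id = 14;

-- Searches idx_name for 'Jobs', obtains the primary key 11, then looks the
-- full row up in the clustered index by that key: the "back to the table" step.
SELECT * FROM tab WHERE Name = 'Jobs';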
Obviously, querying through a secondary index causes table lookups, which inevitably hurts query performance. Is there any way to reduce them?
Let's start our topic: How to make MySQL indexing more efficient!
Covering index
In the queries above we used select *. What if we only need Id, looked up by Name? select Id from tab where Name='Jobs'.
Obviously, since the secondary index on Name already stores the value of Id, this query does not need to go back to the table.
If an index already contains all the data a query needs, we call it a covering index, and no table lookup is necessary.
A covering index is a very powerful tool that can greatly improve query performance. Reading only the index instead of the data rows has several advantages:
Index entries are usually much smaller than full data rows, so reading only the index greatly reduces the amount of data MySQL has to access.
Because indexes are stored in key order, IO-intensive range scans need far less IO than reading each row randomly from disk.
Covering indexes are especially useful for InnoDB tables: since InnoDB's secondary indexes store the row's primary key value in their leaf nodes, a secondary index that covers the query avoids the second lookup against the primary key index.
Because covering indexes reduce the number of tree searches and significantly improve query performance, they are a common optimization technique; a quick way to verify one with EXPLAIN is sketched below.
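A rough way to check this (reusing the assumed idx_name index from the sketch above): when a query is fully covered by an index, EXPLAIN shows "Using index" in the Extra column.
-- Covered: idx_name already stores (Name, Id), so no table lookup is needed.
EXPLAIN SELECT Id FROM tab WHERE Name = 'Jobs';
-- Extra: Using index

-- Not covered: Company lives only in the clustered index, so this query goes back to the table.
EXPLAIN SELECT Id, Company FROM tab WHERE Name = 'Jobs';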
Composite index / leftmost prefix matching principle
Also known as a composite (joint) index: an index built on two or more columns.
MySQL uses the index columns from left to right, so the order of the columns matters.
A query may also use only part of the index, more precisely its leftmost part (leftmost first); this is the famous leftmost prefix matching principle.
Leftmost-first, for example:
If there is a 2-column index (col1,col2), it is equivalent to having established indexes on (col1) and (col1,col2);
If there is a three-column index (col1,col2,col3), it is equivalent to having established indexes on (col1),(col1, col2), and (col1,col2,col3);
But no index is available for (col2, col3).
Suppose a table has a two-column composite index (a, b). In the index B+ tree, the keys are sorted, and all entries can be read in logical order through the leaf nodes.
The entries (1, 1), (1, 2), (2, 1), (2, 4), (3, 1), (3, 2) are arranged in (a, b) order: compare a first, then b.
So a is globally ordered, while b is not.
Based on this structure, the composite index (a, b) can obviously serve the following queries:
select * from table where a=xxx and b=xxx ;
select * from table where a=xxx;
However, this composite index cannot be used for the SQL below, because the b values in the leaf nodes, 1, 2, 1, 4, 1, 2, are clearly not sorted on their own.
select * from table where b=xxx
As long as the leftmost prefix is satisfied, the index can be used to speed up retrieval. That leftmost prefix can be the leftmost N columns of a composite index, or the leftmost M characters of a string index.
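A minimal sketch of the (a, b) example (the table name t and index name idx_a_b are assumptions); EXPLAIN's key column shows whether the composite index is actually chosen:
CREATE TABLE t (
    a INT,
    b INT,
    KEY idx_a_b (a, b)
) ENGINE = InnoDB;

EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;  -- uses idx_a_b on both columns
EXPLAIN SELECT * FROM t WHERE a = 1;            -- uses idx_a_b: a is the leftmost prefix
EXPLAIN SELECT * FROM t WHERE b = 2;            -- cannot use idx_a_b: b alone is not ordered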
note
1. The primary key column effectively forms a composite index with every secondary index: even when it is not explicitly declared in the index definition, it is implicitly stored at the rightmost position of the index entries.
2. Leftmost prefix matching: MySQL matches columns from left to right and stops at the first range condition (>, <, BETWEEN, LIKE). For example, with a = 1 and b = 2 and c > 3 and d = 4, an index built in (a, b, c, d) order cannot use d, while an index built in (a, b, d, c) order can be used in full; the order of a, b, d in the query can be adjusted freely.
3. Equality (=) and IN conditions can appear in any order.
MySQL's query optimizer rewrites them into a form the index can use; it determines the most efficient order of the conditions before generating the actual execution plan. (Points 2 and 3 are illustrated in the sketch after this list.)
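A hedged illustration of points 2 and 3 (the table name t2 and the index name idx_abcd are assumptions): with an index on (a, b, c, d), matching stops at the range condition on c, while pure equality conditions may be written in any order.
CREATE TABLE t2 (
    a INT, b INT, c INT, d INT,
    KEY idx_abcd (a, b, c, d)
) ENGINE = InnoDB;

-- The range condition on c stops the prefix match, so d is not used by idx_abcd;
-- an index built as (a, b, d, c) would let all four conditions use the index.
SELECT * FROM t2 WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;

-- Same query with the equality conditions shuffled: the optimizer reorders
-- them to match the index column order, so it behaves exactly like the one above.
SELECT * FROM t2 WHERE b = 2 AND d = 4 AND a = 1 AND c > 3;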
Why use a composite index?
1. Reduced overhead
One index for the price of three: building a composite index (col1, col2, col3) is effectively equivalent to building the three indexes (col1), (col1, col2), and (col1, col2, col3).
Every additional index adds write overhead and disk space overhead. For tables with large amounts of data, using a composite index can greatly reduce that overhead!
2. Covering index
For a composite index (col1, col2, col3) and the SQL select col1,col2,col3 from test where col1=1 and col2=2, MySQL can return the result directly by traversing the index, without going back to the table, which eliminates a lot of random IO.
Reducing IO, especially random IO, is one of a DBA's main optimization goals, so in practice covering indexes are one of the main ways to improve performance.
3. Higher filtering efficiency
The more columns the index contains, the fewer rows survive the index filter.
Take a table with 10 million rows and the SQL select col1,col2,col3 from table where col1=1 and col2=2 and col3=3, and assume each condition keeps only 10% of the rows.
With only a single-column index, the index narrows the search to 10 million × 10% = 1 million rows; the rows matching col2 = 2 and col3 = 3 then have to be picked out of that million, and only afterwards sorted and paginated.
With a composite index, the index itself narrows it to 10 million × 10% × 10% × 10% = 10,000 rows. The difference in efficiency is easy to imagine!
Index condition pushdown
Index condition pushdown (ICP) is a common MySQL optimization, used when MySQL needs to fetch complete rows from a table through an index.
ICP lets MySQL filter records using the (secondary) index entries together with the WHERE conditions on columns that are part of the index, before the full rows are fetched from the table.
The idea is that the storage engine evaluates the WHERE conditions on indexed columns while it is accessing the index, and discards any record whose index entry does not satisfy the pushed-down condition.
In other words, the optimizer pushes index-condition evaluation down from the server layer to the storage engine layer as far as possible.
The storage engine uses the index to filter out irrelevant data and returns only rows that satisfy the index condition to the server layer. Filtering is thus done inside the storage engine as much as possible, instead of returning all rows to the server layer and filtering them there with the WHERE clause.
push-down process
When the optimizer does not use ICP
Data access and extraction proceed as follows (the numbered steps follow Figure 1):
①: MySQL Server issues a read request, calling the storage engine's index read or full table read; here it is an index read.
②③: Inside the storage engine, the index tree is read and searched, and for every matching index entry the corresponding table record is read (step ④, usually an IO).
⑤: The rows that were found are returned by the storage engine.
⑥: All of these rows are handed back to MySQL Server, so the server layer receives many tuples.
⑦⑧: The server layer filters them with the WHERE clause to obtain the rows that actually qualify.
So in this mode, besides the cheap index reads (step ③, usually in memory and fast), step ④ (a table record read, usually an IO) is performed for every index match; the MySQL Server layer first receives a relatively large amount of data and only then filters it down to the small qualifying set.
On systems that do not support ICP, the index is used purely as a data access path.
When the optimizer uses ICP
①: MySQL Server issues a read request; this step is the same as in Figure 1.
②③: Inside the storage engine, the index tree is read and searched, and only the entries that satisfy the pushed-down conditions have their table records read (step ④, usually an IO).
⑤: The rows that were found are returned by the storage engine.
Here the index entries are not merely read (step ③, usually in memory and fast): the pushed-down conditions are checked at this stage, and if an entry fails them the table row is not read at all; the engine simply moves on to the next index entry. Only entries that pass go on to step ④, so the amount of IO is lower than without ICP.
⑥: Only a small number of rows are returned from the storage engine to MySQL Server, so the server layer receives only a small amount of data.
Therefore, compared with the non-ICP path of Figure 1, only a small amount of already-qualifying data reaches the MySQL Server layer.
With ICP enabled, the storage engine first uses the index to evaluate whatever WHERE conditions it can, and only then uses the index for data access. Rows filtered out by the index condition are never read from the table and never returned to the server.
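ICP is enabled by default (since MySQL 5.6) and can be toggled through the optimizer_switch system variable, which makes it easy to compare the two paths described above:
-- Turn index condition pushdown off / on for the current session:
SET optimizer_switch = 'index_condition_pushdown=off';
SET optimizer_switch = 'index_condition_pushdown=on';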
example
For example:
SELECT * FROM employees
WHERE first_name='Mary'
AND last_name LIKE '%man';
When ICP is not used, the storage engine reads, via the index prefix, every record whose first_name is 'Mary', and the LIKE condition on last_name is then applied with WHERE at the server end.
With ICP enabled, since the LIKE filter on last_name can be evaluated against the index columns, the storage engine compares the index entries with the WHERE condition internally and skips records that do not match. It never reads those full records and returns only the filtered records to the server, which improves query performance.
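For ICP to apply here, both columns have to be present in a secondary index, say a composite index on (first_name, last_name) (an assumed index, not shown in the original text). When ICP is used, EXPLAIN typically reports "Using index condition" in the Extra column:
-- Assumed composite index so the storage engine can see both columns:
ALTER TABLE employees ADD KEY idx_first_last (first_name, last_name);

EXPLAIN SELECT * FROM employees
WHERE first_name = 'Mary'
  AND last_name LIKE '%man';
-- With ICP enabled:  Extra shows "Using index condition"
-- With ICP disabled: Extra shows "Using where"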
precautions
There are a few things to note about ICP:
ICP can only be used for secondary indexes, not primary indexes;
Not every WHERE condition can be filtered by ICP: if a condition refers to a column that is not in the index, the whole record still has to be read, and that condition is still filtered at the server layer;
The acceleration effect of ICP depends on the proportion of data filtered by ICP within the storage engine;
Summary: principles for building indexes
1. Leftmost prefix matching, a very important principle: MySQL matches from left to right and stops using further index columns at the first range condition (>, <, BETWEEN, LIKE).