Case study of Index in MySQL 04/21 Update SLTechnology News&Howtos

Case study of Index in MySQL

2025-04-21 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

I would like to share with you the case study of the index in MySQL. I hope you will gain a lot after reading this article. Let's discuss it together.

1. Index category

In MySQL, according to the logic or field characteristics of indexes, indexes are roughly divided into the following categories: general index, unique index, primary key index, federated index and prefix index.

General index: the most basic index, without any restrictions. Unique index: the value of the index column must be unique. Primary key index: a special unique index whose value cannot be empty as a primary key. Federated index: a federated index is a common index in which the index is listed as multiple fields, and the leftmost prefix principle needs to be considered. Prefix index: indexes the first few characters of a character type or the first few bytes of a binary type.

There is another index classification distinguished from physical storage: clustered index and non-clustered index.

Clustered index: the index order is the same as the data storage order, and its leaf nodes store data rows. Non-clustered index: the leaf node of a non-clustered index stores the value of the clustered index, and it is created based on the clustered index.

To put it simply, the so-called clustered index is that the index key is with the data rows, and the value of the index key of the non-clustered index is the value of the clustered index.

two。 Data structure of index

Common data structures used to implement indexes are hash tables, ordered arrays, and search trees.

2.1 Hashi indexing

A hash table is a container that stores data in the form of key-value. Like HashMap, the hash index will also calculate the index value of key through a specific hash function, and then store the corresponding value of key in the corresponding position of the array. If there are two key with the same index value calculated by the hash function (hash conflict occurs), then this position of the array will become a linked list, storing all value with the same hash value.

Therefore, in general, the time complexity of the hash table for equivalent query can reach O (1), but in the case of hash conflict, it is necessary to traverse all the values in the linked list in order to find the qualified data.

In addition, considering that the index calculated by the hash function is irregular-the hash table hopes that all key can be fully hashed, so that the key can be evenly distributed without wasting space-that is, the key of the hash table is out of order, so it is slow to use the hash table for interval query, and sorting is the same.

Therefore, the hash table only applies to equivalent queries.

2.2 ordered array

As the name implies, an ordered array is an array arranged in the order of key. The time complexity of its equivalent query can reach O (logN) by using binary query, which is inferior to the hash table.

But range queries through ordered arrays are more efficient: first find the minimum (or maximum) value through a binary query, and then traverse backwards to another boundary.

As for sorting, the ordered array is ordered and has already been sorted naturally, except that the sort field is not an index field.

But the ordered array has a disadvantage, because the array elements are continuous and ordered, if a new data row is inserted at this time, in order to maintain the order of the ordered array, the elements larger than this element key need to be moved back one unit to make room for insertion. This way of maintaining the index is costly.

Therefore, ordered arrays are suitable for storing data that is no longer updated after the clothes are initialized.

2.3 search tree

Anyone who has known the data structure should know that the search tree is a data structure whose query time complexity is O (logN) and update time complexity is O (logN). So the search tree takes into account both query and update compared with hash table and ordered array. For this reason, the most commonly used data model in MySQL is the search tree.

Considering that the index is stored on disk, if the search tree is a binary tree, then its child nodes can only have two left and right. In the case of more data than price, the height of this binary tree may be very high. When MySQL queries, it may cause too many times of disk I / O and slow down the query efficiency.

2.4 full-text index

In addition, there is a full-text index, which solves the problem of determining whether a field is included by establishing an inverted index.

Inverted index is used to store the mapping of the storage location of a word in a document or a group of documents under full-text search. Through inverted index, the list of documents containing this word can be quickly obtained according to the word.

Full-text indexing comes in handy when retrieved by keywords.

3. BTree Index 3.1 B+ Tree in InnoDB

This is a relatively simple B + tree.

Photo Source: Data Structure Visualizations

As you can see from the example above, the bottom leaf node of this B+ tree stores all the elements and stores them sequentially, while the non-leaf node stores only the values of the index column.

3.2 schematic BTree index

In InnoDB, the BTree-based index model is the most commonly used. Here is a practical example to illustrate the structure of the BTree index in InnoDB.

CREATE TABLE `user` (`id` int (11) NOT NULL, `name` varchar (36) DEFAULT NULL, `age` int (11) DEFAULT NULL, PRIMARY KEY (`id`) USING BTREE, INDEX `nameIndex` (`name`) USING BTREE) ENGINE = InnoDB;-- insert data insert into user1 (id,name,age) values (1) values (21), (22) (22), (23), (24), (25); copy code

There are only two fields in this table: the primary key id and the name field, and an BTree index with the name field as the index column is created.

An index with the primary key id field as the index, also known as the primary key index. Its index tree structure is as follows: the non-leaf stage of the index tree stores the value of the primary key id, and the leaf node stores the whole data row corresponding to the primary key id, as shown in the following figure:

Also because the leaf node of the primary key index stores the entire data row corresponding to the primary key id, the primary key index is also called clustered index.

For the index tree with the name field as the column, the non-leaf node also stores the value of the index column, while the value stored in the leaf stage is the value of the primary key id, as shown in the following figure.

3.3 execution process of the index

First, take a look at the following sentence SQL, which queries the data rows of id=1 in the user table.

Select * from user where id=1; copy code

The execution process of this SQL is very simple. The storage engine will take the index tree of the primary key id, and when it finds id=1, it will return the data row of id=1 on the index tree (because the primary key value is unique, so finding the hit target will stop the search and return the result set directly).

3.3.1 return to the table

Next, let's look at the case of using a normal index for a query, which is slightly different from the primary key index.

Select * from user where name='one'; copy code

The flow of the above SQL query statement is as follows: first, the storage engine will search the index tree of the ordinary index name column. When hitting the record whose name is equal to one, the storage engine needs to go through a very important step: returning the table.

Because the child node of the index tree of the ordinary index stores the primary key value, when the query statement needs to query other fields except the primary key id and the index column, it needs to go back to the primary key index tree according to the value of the primary key id to query, get the whole data row corresponding to the primary key id, and then get the fields needed by the client before adding this row to the result set.

The storage engine then continues to search the index tree until it encounters the first record that does not meet the name='one', and finally returns all hit records to the client.

We call the process of querying the entire data row in the primary key index based on the id value of the primary key queried from the normal index to return the table.

When the amount of data is very large, returning to the table is a very time-consuming process, so we should try our best to avoid returning to the table, which leads to the next problem: using overlay indexes to avoid returning to the table.

3.3.2 overwrite index

I don't know if you have noticed that there is a description in the last table question: "when the query statement needs to query fields other than the primary key id and index column." in this scenario, you need to get other query fields by going back to the table. In other words, if the query statement needs to query only the fields of the primary key id and the index column, does it not need to return to the table?

Let's analyze a wave of this process, starting with a federated index.

Alter table user add index name_age ('name','age'); copy the code

Then the structure of the index tree should look like this:

The order of the child nodes of the federated index tree is sorted by the field in which the index is declared, similar to order by name, age, and the corresponding value of the index is the same as the primary key value of the ordinary index.

Select name,age from user where name='one'; copy code

The above SQL queries the name and age fields of all name='one' records, and the ideal execution plan would be to search for the federated index that has just been created.

Like normal indexes, the storage engine searches for federated indexes, and because the order of federated indexes is sorted first by name and then by age, the search stops when the first index whose name is not one is found.

Because the SQL statement only queries the name and age fields, the data obtained when the storage engine hits the query condition is the name, age and id fields, which already contains the fields needed by the client, so there is no need to return to the table.

We make the index that can get all the fields needed by the query statement on only one index tree as the overlay index, and the overlay index does not need to return to the table, so it will be faster, so we can consider using the overlay index to optimize the SQL optimization.

4. Leftmost prefix principle

The above examples are all about the use of indexes, and in fact, indexes may not be used in complex query statements in a project. First of all, we need to know that when MySQL executes the SQL statement, a table will select only one index tree to search, so generally, when building an index, we need to cover all the query conditions as much as possible and establish a joint index.

For federated indexes, MySQL follows the leftmost prefix principle: the query condition is consistent with the leftmost column or leftmost consecutive columns of the federated index, so the index can be used.

In order to explain the leftmost prefix principle in detail, it also explains some special cases of the leftmost prefix principle.

5. Index failure scenario

Even if we create a federated index based on the principle of the leftmost prefix, there are still some special scenarios that can cause the index to fail, as illustrated below.

Suppose you have a table table with a federated index listed as three fields: a _ line _ b _ line _ c, all of which are 10 in length.

CREATE TABLE `demo` (`a` varchar (1) DEFAULT NULL, `b` varchar (1) DEFAULT NULL, `c` varchar (1) DEFAULT NULL, INDEX `abc_ index` (`a`, `b`, `c`) USING BTREE) ENGINE = InnoDB; copy code 5.1 full field match

In the first case, the query conditions are all the same as the index fields, and the equivalent query is used, such as:

Select * from demo where axioms 1 'and bundles 1' and cations 1 copies select * from demo where cations 1 'and bundles 1' and breads 1; copy the code

Output the execution plans of the above two SQL to see how they use the index.

The first SQL obviously can use federated indexes.

As you can see from the execution plan, the second SQL uses the same index and index length as the first SQL, using abc_index indexes with an index length of 18 bytes.

In theory, the query condition is inconsistent with the order of the index, so the index should not be used, but because MySQL has an optimizer, it will optimize the second SQL to look like the first SQL, so the second SQL also uses the federated index abc_index.

To sum up, in the case of full field matching and equivalent query, the inconsistent order of query conditions can also be used for federated indexes.

5.2 partial field matching

The second case is that the query conditions are consistent with the index field, so you need to follow the principle of leftmost prefix, such as:

Select * from demo where axioms 1 'and bundles 1' from demo where axioms 1 'and cations 1; copy code

The above two query statements correspond to only two fields for each of the three index fields, and their execution plan is as follows:

As can be seen from their execution plans, both queries use the abc_index index, but the length of the index they use is 12 and 6 bytes, respectively.

Here we need to mention the method of calculating the index length. For the a field declared as varchar (1) in this example, its index length = 1 * (3) + 1 + 2 = 6.

The first number 1 is the length of the field when it is declared. The second number 3 is the length of the field character type: utf8=3, gbk=2, latin1=1. The third number 1 is the default type of this field. If NULL is allowed by default, the third number is 1 because NULL requires one byte of extra space; if NULL is not allowed by default, it should be 0. The fourth number, 2, is the additional bytes required for variable-length fields of type varchar.

So the use of indexes by these two queries is as follows:

Using the federated index, the index length is 12 bytes, and the index field used is the adepartment b field; using the federated index, the index length is 6 bytes, and the index field used is the a field.

Thus it can be seen that the leftmost prefix principle requires that the query condition must be consecutive columns starting from the leftmost column of the index.

5.3 range query

The third case is when the query condition uses a range query (,! =, =, between,like), such as:

Select * from demo where asides, copy code, copy code

The execution plan of these two query statements is:

As you can see from the execution plan, the first SQL uses a federated index with a length of 12 bytes, that is, two fields aforme b are used; the second SQL also uses a federated index with a length of 6 bytes, using only the a field in the federated index.

To sum up, in the case of a full-field match and a range query, a federated index can also be used, but only the first field in the federated index where a range query condition occurs.

It is important to note that:

Like must require a left fuzzy match to use the index, because the index tree of the character type field is also ordered. Between is not necessarily a range query, it is equivalent to using in multi-valued exact matching, so between does not invalidate the index columns behind the federated index because it is a range query. 5.4 query condition is a function or expression

The fourth case is where there are functions or special expressions in the query conditions, such as:

Select * from demo where id + 1 = 2 * select * from demo where concat (a,'1') ='11'; copy the code

Perhaps because of the data (empty table), the execution plan I output uses a federated index, but in fact, in the query condition, the field to the left of the equality inequality contains an expression or function. The field will not use the index.

As for the reason, it is because the value of the index field itself is no longer ordered in the case of a function or expression.

5.5 in other scenarios where the index fails, 25% of the query conditions in which the number of rows is greater than that of the whole table are used (! =). If the data types of values in not in and is not nullin query conditions are inconsistent, MySQL will convert all values to data types consistent with the index columns, so index 6 cannot be used. Index push-down

The actual structure of the federated index, the leftmost prefix principle, and the scenario of index failure have been listed above, so let's talk about the important optimization rule of index push-down.

The above query statement, as can be seen from its execution plan, uses an index length of 6 bytes, using only the first field.

Therefore, in the process of query, MySQL will only judge the condition of the first field a >'1'. When the condition is met, the storage engine will not make a bread1 judgment, but will make a judgment after returning the table to get the whole data row.

This seems stupid, even if the index only uses the first field, but it is clear that there is data in the b field in the index tree, why not judge directly?

It sounds like a bug, but in fact, before the index is pushed down, the whole query logic is: the storage engine retrieves the index tree, even if there is a value of the b field in the index tree, but because the execution plan of this query statement uses the federated index but not the b field, it is impossible to judge the condition of the b field, when the storage engine gets the data that meets the condition (a >'1'). Then the condition is judged by the MySQL server.

This situation is optimized in the MySQL5.6 version, and the index push-down technology is introduced: in the process of searching the index tree, even if other fields of the federated index are not used, we can first judge the fields contained in the query conditions and the index, so as to reduce the times of returning to the table and improve the query efficiency.

After using the index push-down optimization, the b field is used as the joint index column and exists in the query condition, but it is not used in the search index tree at the same time. The MySQL server will also pass the part of the b field in the query condition to the storage engine, and the storage engine will judge the query condition of the b field after the search index tree hits the data, and only when it is satisfied will it join the result set.

Ps: when the value of the Extra field in the execution plan contains Using index condition, the index push-down is used.

After reading this article, I believe you have a certain understanding of the case study of the index in MySQL, want to know more about it, welcome to follow the industry information channel, thank you for reading!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.