
How to Use Indexes in MySQL


This article introduces how to use indexes in MySQL. The content is fairly detailed; interested readers can use it for reference, and I hope it will be helpful to you.

MySQL uses a B+ tree to store index information.

The structure of the B+ tree is as follows:

Let's look at several features of the B+ tree:

The leaf nodes (the bottom layer) store the keywords (values of the index fields) and the corresponding data; together, the leaf nodes hold the keyword information of all records.

Non-leaf nodes only store keyword information and pointers to child nodes.

Each leaf node corresponds to a page in MySQL, and leaf nodes at the same level are connected in the form of a doubly linked list.

Multiple records are stored in each node (page), and within a page the records are connected in the form of a singly linked list, forming an ordered list sorted by the index fields.

When retrieving data in a B+ tree, each retrieval starts from the root node, and you always need to search down to a leaf node.

InnoDB reads and writes data in units of data pages. In other words, when you need to read a record, InnoDB does not read just that record from disk; instead, the whole page containing it is loaded into memory (a page may contain many records), and the page is then searched in memory. In InnoDB, each page defaults to 16 KB.
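
As a quick check (assuming a stock InnoDB server), the page size can be confirmed with a system variable query; 16384 bytes corresponds to the 16 KB default mentioned above:

-- show the InnoDB page size in bytes (16384 = 16 KB by default)
show variables like 'innodb_page_size';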

Indexes in MySQL are divided into:

Clustered index (primary key index)

Each table has exactly one clustered index. The data of the whole table is stored in the file in the form of a B+ tree: the key in a leaf node of this B+ tree is the primary key value and the data is the full record; non-leaf nodes store only primary key values.

Retrieving data through the clustered index only requires following the B+ tree search process; the corresponding record is obtained directly.

Nonclustered index

Each table can have multiple nonclustered indexes, also B+ tree structures: the key of a leaf node is the value of the index field and the data is the primary key value; non-leaf nodes store only the value of the index field.

When retrieving records through a nonclustered index, two steps are required: first retrieve the primary key in the nonclustered index, and then retrieve the record corresponding to that primary key in the clustered index. This is one more step than using the clustered index.
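
As a minimal sketch of the two index types, here is a hypothetical table t_user (not from the article's test data): the primary key on id builds the clustered index, and idx_name is a nonclustered index whose leaf nodes store (name, id).

-- hypothetical table: the primary key on id becomes the clustered index
create table t_user (
    id   int primary key,
    name varchar(50),
    age  int,
    key idx_name (name)   -- nonclustered index: leaves hold (name, id)
);

-- walks the clustered B+ tree once and returns the full record
select * from t_user where id = 100;

-- first finds the id in the idx_name B+ tree, then looks the row up
-- again in the clustered index (one extra step)
select * from t_user where name = 'Zhang San';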

Why does a query use the index? Why do some queries not use the index? Why does applying a function to an index field stop the index from being used?

Let's put these questions aside for a moment and first look at the data retrieval process of the B+ tree. This part is the underlying principle; once we understand the various retrieval processes of the B+ tree, the questions above answer themselves.

What does it mean for a query to "use the index"?

When we run some kind of search on the value of a field, and during the search we can quickly locate the pages where the target data may be, effectively reducing page I/O instead of scanning all data pages, we say the query makes effective use of the index, or "uses the index". If the process cannot determine which pages the data is in, the index is not effective for that query, and the query does not use the index.
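
A practical, hedged way to check this is EXPLAIN: the key column shows the index actually chosen and type = ALL indicates a full table scan. The sketch below uses the test1 table that appears later in this article.

-- locates pages through the primary key: key = PRIMARY, type = const
explain select * from test1 where id = 100;

-- cannot narrow down the pages (no usable index): key = NULL, type = ALL
explain select * from test1 a where a.email like '%163.com%';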

Data retrieval process in a B+ tree

Unique record retrieval

As shown in the figure above, all the data is unique. Querying the record with value 105 works as follows:


Load page P1 into memory.

Using binary search in memory, we can determine that 105 falls in the range [100, 150), so we need to load the page P4 associated with 100.

Load P4 into memory, use binary search to find the record with value 105, and stop.

Query all records with a given value

As shown in the figure above, querying all records with value 105 works as follows:


Load page P1 into memory.

Using binary search in memory, we can determine that 105 falls in the range [100, 150), and 100 is associated with page P4.

Load P4 into memory, use binary search to find the largest record less than 105, that is, 100, and then walk forward from 100 through the linked list to collect all records with value 105, until the first value greater than 105 is encountered.

Range lookup

For data such as in the figure above, query all records in [55, 150]. Since the pages and the records inside each page are connected by ascending linked lists, you only need to find the position of the start of the range and then follow the linked lists to visit all the data up to the end of the range. The process is as follows:


Load page P1 into memory.

Using binary search in memory, we find that 55 lies in page P3 (associated with 50) and 150 lies in page P5.

Load P3 into memory, use binary search to find the first record with value 55, then continue through the linked list to 60 and 67 in P3, and follow P3's next-page pointer to P4. Continue traversing all records in P4, and so on, until all the 150s in P5 have been visited.

Fuzzy matching

The data is shown in the picture above.

Query all records that begin with `f`

The process is as follows:


Load page P1 into memory.

Using binary search, find in page P1 the last value less than or equal to f (this value is f, pointing to leaf node P3) and the first value greater than f (this value is z, pointing to leaf node P6). From this we can conclude that records beginning with f can only exist in the pages in the range [P3, P6), that is, P3, P4 and P5.

Load page P3, find the first record that begins with f inside it, and then continue through the linked list to the records in P4 and P5; in this way all the data beginning with f can be found.

Query records containing `f`

A "contains" query is written as %f% in sql. Can we quickly locate the pages through the index?

Looking at the data above, f can appear in any page; from page P1 we cannot tell which pages contain records with f in them. The only option is to load all leaf nodes via I/O and traverse every record, filtering out the ones that contain f.

So if the %value% form is used, the index is not effective for the query.

Leftmost matching principle

When the data item of the B+ tree is a composite structure, such as (name, age, sex), the B+ tree builds the search tree comparing from left to right. For example, when retrieving data like (Zhang San, 20, F), the B+ tree first compares name to decide the next search direction; if name is equal, it compares age and then sex in turn, and finally obtains the target data. But when the data lacks name, like (20, F), the B+ tree does not know which node to look at next, because name is the first comparison factor when the search tree was built, and you must search by name first to know where to go next. When retrieving data like (Zhang San, F), the B+ tree can use name to determine the search direction, but the next field, age, is missing, so it can only find all the data whose name equals Zhang San and then filter for gender F. This is a very important property: the leftmost matching property of the index.
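
A minimal sketch of this behaviour, assuming a hypothetical table t_person with a composite index built in the (name, age, sex) order described above:

-- hypothetical table with a composite index (name, age, sex)
create table t_person (
    id   int primary key,
    name varchar(50),
    age  int,
    sex  char(1),
    key idx_name_age_sex (name, age, sex)
);

-- all leftmost columns supplied: the index is used fully
select * from t_person where name = 'Zhang San' and age = 20 and sex = 'F';

-- name missing: the first comparison factor is unknown, so this is a full scan
select * from t_person where age = 20 and sex = 'F';

-- name present but age skipped: the index narrows by name only, sex is filtered afterwards
select * from t_person where name = 'Zhang San' and sex = 'F';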

Let's try some examples.

The following figure shows a joint index on three fields (a asc, b asc, c asc). The data in the index nodes is stored sorted by the a field first; if a is equal, the b field is in ascending order, and if b is equal, the c field is in ascending order. Look carefully at the entries in each node.

Query the records where a = 1

Because the records on a page are stored sorted by (a asc, b asc, c asc), the a field is ordered and can be searched quickly with binary search, as follows:


Load P1 into memory

Using binary search on the records in P1, we can determine that the records where a = 1 fall between two entries in P1 whose associated child nodes are P2 and P4, respectively.

Load the leaf node P2, use binary search in P2 to quickly find the first record where a = 1, and then follow the linked list to the next record and the next page, until the first record that does not satisfy a = 1 is found in P4.

Query the records where a = 1 and b = 5

The method is the same as above: from the entries on page P1 we can determine the range of pages that may contain records where a = 1 and b = 5, and the search process is similar to the steps for searching a = 1.

Query the records where b = 1

In this case, the records on page P1 cannot tell us which pages the b = 1 records are in. The only option is to load all the leaf nodes of the index tree, traverse all records, and then filter. At this point the index is ineffective.

Query by the value of c

This is the same as querying b = 1: only a scan of all leaf nodes will do, and the index is ineffective.

Query by b and c together

The index cannot be used in this case either; all data has to be scanned and checked one by one, so the index is ineffective.

Query by the two fields a and c

This can only make use of the a field in the index: determine the range of pages through a, load all the records associated with that value of a, and then filter on the value of c.

Query the records where a = 1 and b >= 0 and c = 1

In this case, we can only determine the range of pages containing a = 1 and b >= 0, and then traverse all the pages in that range. During this query it is impossible to use the index to locate pages by c. We say that c does not use the index; only a and b can effectively narrow the range of index pages.
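
The cases above can be summarized in a small sketch, assuming a hypothetical table t_abc with the joint index (a, b, c):

create table t_abc (
    id int primary key,
    a  int,
    b  int,
    c  int,
    key idx_abc (a, b, c)
);

select * from t_abc where a = 1;                      -- uses the index on a
select * from t_abc where a = 1 and b = 5;            -- uses the index on a and b
select * from t_abc where b = 1;                      -- leftmost column missing: full scan
select * from t_abc where b = 1 and c = 2;            -- leftmost column missing: full scan
select * from t_abc where a = 1 and c = 1;            -- narrows by a only, c filtered afterwards
select * from t_abc where a = 1 and b >= 0 and c = 1; -- narrows by a and the b range, c filtered afterwards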

Similar to this are other range conditions such as >, <, >=, and <=.

Let's verify the above on a test table test1 (about 4 million rows with the fields id, name, sex, email). First set id as the primary key:

mysql> alter table test1 modify id int not null primary key;
Query OK, 0 rows affected (10.93 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show index from test1;
+-------+------------+----------+--------------+-------------+-----------+-------------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Index_type |
+-------+------------+----------+--------------+-------------+-----------+-------------+------------+
| test1 |          0 | PRIMARY  |            1 | id          | A         |     3980477 | BTREE      |
+-------+------------+----------+--------------+-------------+-----------+-------------+------------+
1 row in set (0.00 sec)

After id is set as the primary key, a clustered index is built on id, and we can retrieve any single record quickly. Let's look at the result:

mysql> select * from test1 where id = 1000000;
+---------+-----------------+-----+-------------------------+
| id      | name            | sex | email                   |
+---------+-----------------+-----+-------------------------+
| 1000000 | javacode1000000 |   2 | javacode1000000@163.com |
+---------+-----------------+-----+-------------------------+
1 row in set (0.00 sec)

The speed is very fast; this is the unique record retrieval described above.

Range search with between ... and

mysql> select count(*) from test1 where id between 100 and 110;
+----------+
| count(*) |
+----------+
|       11 |
+----------+
1 row in set (0.00 sec)

The speed is also very fast: there is a primary key index on id, and the range search described above quickly locates the target data.

However, if the range is too large and spans too many pages, the query will be relatively slow, as follows:

mysql> select count(*) from test1 where id between 1 and 2000000;
+----------+
| count(*) |
+----------+
|  2000000 |
+----------+
1 row in set (1.17 sec)

The span of id values above is too large; many pages have to be read to cover ids from 1 to 2 million, so the query is relatively slow.

So when using between ... and, the interval span should not be too large.

Retrieval with in

We often use in to retrieve data.

Usually, when we build a project, it is recommended to use fewer table joins. For example, an e-commerce system that needs to query order information together with the names of the goods in each order can first query the order table, then take the list of goods ids from the order, and use in to retrieve the goods information from the goods table. Because the goods id is the primary key of the goods table, this retrieval is fast.
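
A rough sketch of that pattern, using hypothetical tables t_order_item and t_goods (names and ids are placeholders, not from the article):

-- step 1: collect the goods ids belonging to one order
select goods_id from t_order_item where order_id = 1001;

-- step 2: fetch the goods by primary key with in, instead of joining the tables
select id, goods_name, price
from t_goods
where id in (2001, 2003, 2007);   -- the ids returned by step 1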

Let's retrieve 100 records by id from the 4-million-row table and look at the result:

mysql> select * from test1 a where a.id in (
    100000, 100001, 100002, 100003, 100004, 100005, 100006, 100007, 100008, 100009,
    100010, 100011, 100012, 100013, 100014, 100015, 100016, 100017, 100018, 100019,
    100020, 100021, 100022, 100023, 100024, 100025, 100026, 100027, 100028, 100029,
    100030, 100031, 100032, 100033, 100034, 100035, 100036, 100037, 100038, 100039,
    100040, 100041, 100042, 100043, 100044, 100045, 100046, 100047, 100048, 100049,
    100050, 100051, 100052, 100053, 100054, 100055, 100056, 100057, 100058, 100059,
    100060, 100061, 100062, 100063, 100064, 100065, 100066, 100067, 100068, 100069,
    100070, 100071, 100072, 100073, 100074, 100075, 100076, 100077, 100078, 100079,
    100080, 100081, 100082, 100083, 100084, 100085, 100086, 100087, 100088, 100089,
    100090, 100091, 100092, 100093, 100094, 100095, 100096, 100097, 100098, 100099);
+--------+----------------+-----+------------------------+
| id     | name           | sex | email                  |
+--------+----------------+-----+------------------------+
| 100000 | javacode100000 |   2 | javacode100000@163.com |
| 100001 | javacode100001 |   1 | javacode100001@163.com |
| 100002 | javacode100002 |   2 | javacode100002@163.com |
| ...    | ...            | ... | ...                    |
| 100099 | javacode100099 |   1 | javacode100099@163.com |
+--------+----------------+-----+------------------------+
100 rows in set (0.00 sec)

It takes less than a millisecond, and it's pretty fast.

This is equivalent to decomposing the query into multiple unique-record lookups and then merging the results.

Which index does the query use when there are multiple indexes?

Let's build an index on the name and sex fields respectively.

mysql> create index idx1 on test1 (name);
Query OK, 0 rows affected (13.50 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> create index idx2 on test1 (sex);
Query OK, 0 rows affected (6.77 sec)
Records: 0  Duplicates: 0  Warnings: 0

Take a look at the query:

mysql> select * from test1 where name='javacode3500000' and sex=2;
+---------+-----------------+-----+-------------------------+
| id      | name            | sex | email                   |
+---------+-----------------+-----+-------------------------+
| 3500000 | javacode3500000 |   2 | javacode3500000@163.com |
+---------+-----------------+-----+-------------------------+
1 row in set (0.00 sec)

The query above is very fast. There is an index on name and another on sex; which index do you think the query uses?

Some people say that name comes first in the where clause, so the query uses the index on the name field, and the process can be explained like this:

Use the index on name to find all records whose name is javacode3500000.

Traverse those records and filter out the ones where sex = 2.

Let's look at the retrieval speed of name='javacode3500000' on its own. It is indeed fast, as follows:

mysql> select * from test1 where name='javacode3500000';
+---------+-----------------+-----+-------------------------+
| id      | name            | sex | email                   |
+---------+-----------------+-----+-------------------------+
| 3500000 | javacode3500000 |   2 | javacode3500000@163.com |
+---------+-----------------+-----+-------------------------+
1 row in set (0.00 sec)

Using the name index and then filtering really works, and it is fast. But is the choice really related to the order of the fields after where? Let's reverse the order of name and sex, as follows:

mysql> select * from test1 where sex=2 and name='javacode3500000';
+---------+-----------------+-----+-------------------------+
| id      | name            | sex | email                   |
+---------+-----------------+-----+-------------------------+
| 3500000 | javacode3500000 |   2 | javacode3500000@163.com |
+---------+-----------------+-----+-------------------------+
1 row in set (0.00 sec)

The query is still very fast. This time, should it use the sex index to retrieve the data and then filter on name? Let's first look at the speed of querying sex=2 alone:

mysql> select count(id) from test1 where sex=2;
+-----------+
| count(id) |
+-----------+
|   2000000 |
+-----------+
1 row in set (0.36 sec)

As you can see, that query takes 360 ms and matches 2 million rows; going through the sex index would not work well.

Let's take a look at this using explain:

mysql> explain select * from test1 where sex=2 and name='javacode3500000';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test1 | NULL       | ref  | idx1,idx2     | idx1 | 62      | const |    1 |    50.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

possible_keys lists the indexes this query might use (idx1, idx2).

The key column shows the index actually used: idx1.

When multiple and-ed conditions have indexes, the index with the higher selectivity is chosen. The name field obviously has very few duplicate values, so querying through the name index is faster.
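
If you want to compare the selectivity of the two columns yourself, a simple (hedged) check is the ratio of distinct values to total rows; a value close to 1 means the index is far more useful:

-- selectivity = distinct values / total rows; name should be near 1, sex near 0
select count(distinct name) / count(*) as name_selectivity,
       count(distinct sex)  / count(*) as sex_selectivity
from test1;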

Fuzzy query

Look at two queries.

mysql> select count(*) from test1 a where a.name like 'javacode1000%';
+----------+
| count(*) |
+----------+
|     1111 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from test1 a where a.name like '%javacode1000%';
+----------+
| count(*) |
+----------+
|     1111 |
+----------+
1 row in set (1.78 sec)

The first query can use the index on the name field. The second cannot determine the range of values to look in, so it can only scan the whole table and cannot use the index, which is why it is slower, as explained above.

Going back to the table

When the data to be queried does not exist in the index tree being used, it has to be fetched again from the clustered index. This process is called going back to the table. For example, this query:

mysql> select * from test1 where name='javacode3500000';
+---------+-----------------+-----+-------------------------+
| id      | name            | sex | email                   |
+---------+-----------------+-----+-------------------------+
| 3500000 | javacode3500000 |   2 | javacode3500000@163.com |
+---------+-----------------+-----+-------------------------+
1 row in set (0.00 sec)

The query above selects *. Since the index on the name column contains only the values of name and id, not sex and email, the process is as follows:

Use the name index to find the record whose name is javacode3500000 and take out its id, 3500000.

Look up the record with id=3500000 in the primary key index and get the values of all fields.

Covering index

When the index tree used by a query contains the values of all the fields the query needs, there is no need to go back to the clustered index to fetch data. This is called index covering.

Let's take a look at a query:

select id, name from test1 where name = 'javacode3500000';

name is covered by the idx1 index and id is the primary key, so the leaf nodes of the idx1 index tree contain the values of both name and id. This query only needs to visit the idx1 index. If you select * instead, you also need to go back to the table to get the values of sex and email.

Therefore, when writing sql, try to avoid select *: it may cost one extra trip back to the table. Check whether index covering can be used to get better efficiency.

Index condition pushdown

Index Condition Pushdown (ICP) is a feature introduced in MySQL 5.6. It is an optimization that uses the index to filter data at the storage engine layer. ICP can reduce the number of times the storage engine accesses the base table and the number of times the MySQL server accesses the storage engine.

For example:

We need to count the records whose name starts with javacode35 and whose sex is 1. The sql is as follows:

mysql> select count(id) from test1 a where name like 'javacode35%' and sex = 1;
+-----------+
| count(id) |
+-----------+
|     55556 |
+-----------+
1 row in set (0.19 sec)

Process:

Use the name index to find the first record starting with javacode35 and get its id.

Use that id to look up the full record in the primary key index; call it R1.

Check whether sex in R1 is 1, then repeat the above steps until all matching records have been found.

In this process, the name index is used and a back-to-table lookup is needed for every candidate record.

With ICP we can do better by creating a composite index (name, sex). The query process then becomes:

Use the (name, sex) index to find the first record starting with javacode35; from it we already get (name, sex, id), written as R1.

Check whether R1.sex is 1, then repeat the above until all matching records have been found.

In this process there is no back-to-table lookup; the whole condition can be filtered using the data in the index, which is faster than the process above.
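
A sketch of that setup (the index name idx_name_sex is my choice, not from the article); with ICP in effect, EXPLAIN typically reports "Using index condition" in the Extra column:

-- composite index so the sex condition can be checked inside the index itself
create index idx_name_sex on test1 (name, sex);

-- the like prefix walks the index; sex = 1 is filtered at the storage engine layer
explain select count(id) from test1 where name like 'javacode35%' and sex = 1;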

Numbers invalidate string-type indexes

mysql> insert into test1 (id, name, sex, email) values (4000001, '1', 1, 'javacode2018@163.com');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test1 where name = '1';
+---------+------+-----+----------------------+
| id      | name | sex | email                |
+---------+------+-----+----------------------+
| 4000001 | 1    |   1 | javacode2018@163.com |
+---------+------+-----+----------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where name = 1;
+---------+------+-----+----------------------+
| id      | name | sex | email                |
+---------+------+-----+----------------------+
| 4000001 | 1    |   1 | javacode2018@163.com |
+---------+------+-----+----------------------+
1 row in set, 65535 warnings (3.30 sec)

In the three sql statements above, we first insert a record.

The second statement (name = '1') is very fast. The third compares name with the number 1. There is an index on name, but name is a string type; when a string is compared with a number, the string is converted to a number before comparison, so the third query becomes a full table scan: every row is read, its name converted to a number, and compared with 1.

What is the effect of comparing a numeric field with a string? As follows:

mysql> select * from test1 where id = '4000000';
+---------+-----------------+-----+-------------------------+
| id      | name            | sex | email                   |
+---------+-----------------+-----+-------------------------+
| 4000000 | javacode4000000 |   2 | javacode4000000@163.com |
+---------+-----------------+-----+-------------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where id = 4000000;
+---------+-----------------+-----+-------------------------+
| id      | name            | sex | email                   |
+---------+-----------------+-----+-------------------------+
| 4000000 | javacode4000000 |   2 | javacode4000000@163.com |
+---------+-----------------+-----+-------------------------+
1 row in set (0.00 sec)

id has a primary key index and id is of int type. As you can see, both queries are very fast and use the index, so when the field is numeric, the query value uses the index whether it is written as a string or as a number.

Functions invalidate the index

mysql> select a.name+1 from test1 a where a.name = 'javacode1';
+----------+
| a.name+1 |
+----------+
|        1 |
+----------+
1 row in set, 1 warning (0.00 sec)

mysql> select * from test1 a where concat(a.name, '1') = 'javacode11';
+----+-----------+-----+-------------------+
| id | name      | sex | email             |
+----+-----------+-----+-------------------+
|  1 | javacode1 |   1 | javacode1@163.com |
+----+-----------+-----+-------------------+
1 row in set (2.88 sec)

There is an index on name. In the queries above, the first uses the index and the second does not: once a function is applied to name, the index tree on name can no longer locate the pages containing the target data, so all pages have to be loaded into memory and the function evaluated on every row before the condition can be checked. The index is ineffective and the query becomes a full table scan.

Conclusion: applying a function to an index field in a query invalidates the index.
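
A hedged fix for cases like this is to move the computation off the indexed column, so that the bare column is compared with a constant:

-- the function wrapped around name forces a full table scan
select * from test1 where concat(name, '1') = 'javacode11';

-- equivalent condition rewritten so the name index can be used
select * from test1 where name = 'javacode1';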

Operators invalidate the index

mysql> select * from test1 a where id = 2-1;
+----+-----------+-----+-------------------+
| id | name      | sex | email             |
+----+-----------+-----+-------------------+
|  1 | javacode1 |   1 | javacode1@163.com |
+----+-----------+-----+-------------------+
1 row in set (0.00 sec)

mysql> select * from test1 a where id+1 = 2;
+----+-----------+-----+-------------------+
| id | name      | sex | email             |
+----+-----------+-----+-------------------+
|  1 | javacode1 |   1 | javacode1@163.com |
+----+-----------+-----+-------------------+
1 row in set (2.41 sec)

There is a primary key index on id. In the queries above, the first uses the index and the second does not: the second applies an arithmetic operation to id, so the index tree on id can no longer locate the pages containing the target data. All pages have to be loaded into memory and id+1 computed for every row before checking whether it equals 2. The index is ineffective and the query becomes a full table scan.

Conclusion: applying operators (calculations) to an index field invalidates the index.

Optimize sorting using indexes

We have an order table t_order (id, user_id, addtime, price). We often query one user's orders sorted by addtime in ascending order. How should the index be created? Let's analyze it.

If we create an index only on user_id, the data retrieval process is as follows:


Use the user_id index to find the ids of the matching records.

Fetch each full record from the table through the primary key index by id.

Repeat the above to get all the target records

Sort the target records in memory by addtime.

Keep in mind that when the data volume is large, sorting is relatively slow and may spill to files on disk. Is there a way for the queried data to come out already sorted?

Recall the structure of B+ tree data in MySQL: records form a linked list sorted by the index values. If user_id and addtime are combined into a joint index (user_id, addtime), the data retrieved by user_id is naturally already sorted by addtime, which removes the sorting step and is more efficient. If you need addtime in descending order, you only need to traverse the result in reverse.
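
A sketch of that index and query (the index name idx_user_addtime and the user id 10086 are placeholders); with this index in place, EXPLAIN should no longer show "Using filesort" for the query:

-- joint index so one user's rows come back already ordered by addtime
create index idx_user_addtime on t_order (user_id, addtime);

-- the ascending order falls out of the index directly
select id, addtime, price
from t_order
where user_id = 10086
order by addtime;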

To summarize, some suggestions for using indexes:


1. Build indexes on fields with high selectivity; this makes effective use of the index. If selectivity is too low, the index cannot be used effectively and all data pages may have to be scanned, which is almost the same as not using the index.

2. Pay attention to the leftmost matching principle of the joint index: matching must go from left to right, and MySQL keeps matching to the right until it meets a range condition (>, <, between, like). For example, for a = 1 and b = 2 and c > 3 and d = 4, with an index built in the order (a, b, c, d), d cannot use the index; with an index built as (a, b, d, c), all four conditions can use it, and the order of a, b, d in the query can be adjusted at will.

3. When querying records, avoid select * and make use of index covering as much as possible; this reduces back-to-table operations and improves efficiency.

4. Some queries can use a joint index together with index condition pushdown (ICP), which also reduces back-to-table operations and improves efficiency.

5. Do not use functions or operators on index fields; doing so invalidates the index.

6. Comparing a string field with a number invalidates the index.

7. The fuzzy query '%value%' invalidates the index and becomes a full table scan, but 'value%' can use the index effectively.

8. Use index fields for sorting as much as possible; this avoids extra sorting work and improves query efficiency.

That's all on how to use indexes in MySQL. I hope the content above is helpful and lets you learn something new. If you think the article is good, feel free to share it so more people can see it.
