How to understand MySQL Index deeply

2025-07-09 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

This article explains the MySQL index in detail: where it appears in query execution, how indexes are classified, which data structures lie underneath, and when an index fails. I hope you will have a better understanding of these topics after reading it.

Preface

When it comes to the MySQL database, a few key terms come to mind: indexes, transactions, locks, and so on. The index is the soul of MySQL, a sharp tool for queries, and a top priority in interviews.

You may already know that the index is built on a B+ tree and that creating an index on a table speeds up queries, but this is far from enough. Here are a few common interview questions about indexes:

1. Why does the index use the B+ tree as its data structure?

2. What is the difference between a clustered index and a nonclustered index?

3. When does an index fail, and what is the leftmost matching principle?

If these questions give you trouble, you may find that you still know little about indexes. Today, let's learn about MySQL indexes.

I. How a query statement is executed

First of all, let's take a look at how a query statement is executed in a MySQL database, where the index appears, and what role it plays.

1.1 The application sends SQL to the server

To execute a SQL statement, the application connects to the appropriate database server and sends it the statement, and the server processes the SQL.

1.2 Query cache

The database server first checks whether the result of this SQL statement is in the query cache. The key is the text of the query and the value is its result. If your query hits the cache, the server takes the value straight out of the cache and returns it to the client. (The query cache exists only in MySQL 5.7 and earlier; it was removed in MySQL 8.0.)

Note: on a cache hit the query is not parsed, no execution plan is generated, and nothing is executed.

1.3 Query optimization and execution plan generation

If the cache is not hit, the server proceeds to step 3.

Parse SQL: generate a parse tree and verify that keywords (such as select, where, left join, etc.) are used correctly.

Preprocessing: further check whether the parse tree is legal, for example by checking that the tables and columns exist and by verifying user permissions.

Optimize SQL: decide which index to use, or determine the join order when multiple tables are joined. The optimizer then turns the SQL statement into an execution plan, which is executed against the storage engine.

1.4 Return the query results to the client

Finally, the database server returns the query results to the client. (If the query can be cached, MySQL also puts the results in the query cache.)

This is the execution process of a query statement. The index comes into play in the SQL optimization step. So what exactly is an index?
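
The four steps above can be sketched as a toy model. This is only an illustration of the control flow, not how the MySQL server is implemented: `run_query`, `fake_engine`, and the stage names are all hypothetical.

```python
def run_query(sql, cache, execute):
    """Toy model of steps 1.1 to 1.4; returns (result, stages that ran)."""
    stages = ["receive"]                       # 1.1: the server receives the SQL text
    if sql in cache:                           # 1.2: the key is the statement text
        stages += ["cache hit", "return"]
        return cache[sql], stages              # no parsing, planning, or execution
    # 1.3: parse, preprocess, optimize (this is where an index is chosen), execute
    stages += ["cache miss", "parse", "preprocess", "optimize", "execute"]
    result = execute(sql)
    cache[sql] = result                        # 1.4: keep the result for next time
    stages.append("return")
    return result, stages


def fake_engine(sql):
    # Hypothetical stand-in for real execution.
    return [("result of", sql)]


cache = {}
first, stages_miss = run_query("select 1", cache, fake_engine)
second, stages_hit = run_query("select 1", cache, fake_engine)
print(stages_miss)
print(stages_hit)
```

The second call skips every stage between the cache check and the return, which is the point of the note in step 1.2.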

II. Overview of the index

Let's take a brief look at the basic concepts of the index.

2.1 What is an index

An index is a data structure that helps the database retrieve data efficiently.

2.2 Classification of indexes

1) By storage structure

B-tree index (B-tree or B+ tree)

Hash index

Full-text index

R-tree index

2) By application level

Normal index: an index that contains only a single column; a table can have multiple single-column indexes.

Unique index: the values of the indexed column must be unique, but null values are allowed.

Composite index: an index that contains multiple columns.

3) By whether the order of the table records matches the order of the index

Clustered index: the order of the table records is the same as the index order.

Nonclustered index: the order of the table records is not the same as the index order.

2.3 Clustered and nonclustered indexes

1) Simple summary

Clustered index: the index created on the primary key.

Nonclustered index: an index created on a non-primary-key column (also known as a secondary index).

2) Detailed summary

Clustered index

In a table with a clustered index, the records are stored in the same order as the index, so queries are fast: once the record for the first index value is found, the records that follow it are stored contiguously.

Disadvantage: inserts are slow, because data pages may have to be reordered when a record is inserted so that the physical order of the records stays consistent with the index order.

Nonclustered index

The logical order of the index differs from the physical storage order of the rows on disk. A nonclustered index stores the primary key and the indexed columns in its leaf nodes. When we query through a nonclustered index, we first get the primary key from the leaf and then use it to look up the row in the table. This process is what we call going back to the table.

3) the difference between clustered index and nonclustered index

The clustered index stores data in the table at the leaf node.

Nonclustered indexes store primary keys and index columns on the leaf node.
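
A minimal sketch of this difference, with Python dicts standing in for the two B+ trees. The table contents and the names `clustered`, `secondary_name`, `find_by_id`, and `find_by_name` are made up for illustration.

```python
# Clustered index: primary key -> the full row (the row lives in the leaf).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
}

# Nonclustered index on name: the leaf holds only the indexed column
# and the primary key, not the row itself.
secondary_name = {"alice": 1, "bob": 2}


def find_by_id(pk):
    return clustered[pk]                 # one lookup reaches the row


def find_by_name(name):
    pk = secondary_name[name]            # first lookup: nonclustered index -> primary key
    return clustered[pk]                 # second lookup: going back to the table


print(find_by_id(1))
print(find_by_name("bob"))
```

A lookup by name always pays for two lookups here, while a lookup by primary key pays for one.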

For instance

Take a Chinese dictionary. To look up a character pronounced "a", you only need to turn to the first few pages of the dictionary, where the entries beginning with "a" are, and there you find "a" (ah) and "ai" (love). In other words, the body of the dictionary is itself a directory, and there is no need to consult any other directory to find what you are looking for. When the body text itself is a directory arranged according to a certain rule, we call it a clustered index.

If you meet a character you cannot pronounce, you can only look it up by its radical, and then turn directly to the page number listed beside the character. However, the order of the characters in the radical catalogue and the character lookup table is not the real order of the body text.

For example, to look up the character "yu" (jade): in the lookup table for its radical, the page number of "yu" is 587, and the next character, "jue", is on page 251. Obviously the two characters are not next to each other in the dictionary body. The consecutive characters "yu, jue, ying" that we see in the lookup table are their order in the nonclustered index, which maps each character to its place in the dictionary body. We can find the character we need this way, but it takes two steps: first find the entry in the directory, then turn to the corresponding page number. When the directory is purely a directory and the body text is purely body text, we call this arrangement a nonclustered index.

2.4 How to add an index in MySQL

1) Add a PRIMARY KEY (primary key index)

ALTER TABLE `table_name` ADD PRIMARY KEY (`column`)

2) Add a UNIQUE index (unique index)

ALTER TABLE `table_name` ADD UNIQUE (`column`)

3) Add an INDEX (normal index)

ALTER TABLE `table_name` ADD INDEX index_name (`column`)

4) Add a FULLTEXT index (full-text index)

ALTER TABLE `table_name` ADD FULLTEXT (`column`)

5) Add a multi-column index

ALTER TABLE `table_name` ADD INDEX index_name (`column1`, `column2`, `column3`)

III. Index underlying data structure

Having covered the basic concepts, the most intriguing question is probably how an index is implemented underneath. Why can an index look up data so efficiently? How should the data structure be designed to meet our requirements? Let's think like an ordinary programmer: if we had to design an index ourselves, how would we design it to achieve this effect?

3.1 Hash index

The first thing that may come to mind is a hash table for fast lookup, just like the hashmap we use every day: value = get(key) finds the value in one step with O(1) time complexity. Indeed, the hash index is one approach.

1) Definition

A hash index applies a hash algorithm to the key; a single hash computation locates the corresponding position immediately, so it is very fast. In essence, it converts the key value into a hash value and locates the record by that hash value.

2) Limitations

A hash index cannot be used for sorting.

It cannot serve a query on only some of the fields of a multi-field key.

With a large number of repeated keys, a hash index is also inefficient (hash collisions occur).

Range queries are not supported.

In InnoDB, the storage engine most commonly used in MySQL, B+ tree indexes are used far more often. InnoDB does have an adaptive hash index, but the storage engine creates and tunes it automatically, and we cannot create it ourselves.
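
The range-query limitation is easy to see with an ordinary Python dict, which is itself a hash table. This is a sketch with made-up keys and hypothetical helper names, not InnoDB's adaptive hash index.

```python
# A dict plays the role of the hash index: key -> row.
rows = {17: "q", 3: "c", 42: "z", 8: "h"}


def equal(key):
    # Equality lookup: one hash computation, O(1) on average.
    return rows.get(key)


def between(lo, hi):
    # A hash table keeps no key order, so a range query has to
    # examine every entry and then sort the matches itself.
    return sorted(k for k in rows if lo <= k <= hi)


print(equal(42))        # z
print(between(5, 20))   # [8, 17]
```

The equality lookup touches one bucket, while the range query touches all four entries no matter how narrow the range is.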

3.2 How to design the data structure of the index?

Suppose we want to query the data in a certain interval: we only need to look up the starting value of the interval in the tree.

If the data are (figure omitted):

1) Query the data in the range [7, 30]

Once the starting node 10 is found, we traverse along the linked list until a node's value is greater than the end value of the interval. All the nodes traversed before that point are the data that fall within the interval.
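
The traversal just described can be sketched on a sorted list, where a binary search finds the starting point and a linear walk collects the rest. The keys and the bounds 7 and 30 are made-up sample values, and `range_query` is a hypothetical helper.

```python
from bisect import bisect_left

keys = [3, 10, 15, 22, 30, 41]   # sample data, kept in sorted order


def range_query(sorted_keys, lo, hi):
    i = bisect_left(sorted_keys, lo)          # locate the first key >= lo
    found = []
    while i < len(sorted_keys) and sorted_keys[i] <= hi:
        found.append(sorted_keys[i])          # walk forward while still inside the range
        i += 1
    return found


print(range_query(keys, 7, 30))   # [10, 15, 22, 30]
```

Only the start position needs a search; everything after it is a sequential walk, which is why an ordered structure suits range queries.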

2) How else can it be optimized?

With a binary search tree, interval queries are already possible. However, the tree is too large to keep entirely in memory, so we have to store it on the hard disk.

Then reading or visiting each node corresponds to one disk IO operation. The number of disk IO operations per query, also called the IO asymptotic complexity, equals the height of the tree.

So to reduce the number of disk IO operations we need to reduce the height of the tree.

The structural optimization process is shown in the following figure:

Here the binary tree is changed into an m-ary tree, which reduces the height of the tree. So how large should m be?

Question: when an m-ary tree index is built over the same amount of data, the larger m is, the lower the tree. Is a bigger m always better? What is the right size?

Whether the data is in memory or on disk, the operating system reads it by page (a page is usually 4 KB, which can be checked with the getconf PAGE_SIZE command), one page at a time.

If the amount of data to be read exceeds the size of a page, multiple IO operations are triggered. So when choosing m, try to make the size of each node equal to the size of a page.

In practice, the degree d (the number of branches per node) is a very large number, usually more than 100, so the height h of the tree is very small, usually no more than 3.
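
A rough back-of-the-envelope check of this claim: the number of levels needed is the smallest h with m^h >= N. The function below is a simplified estimate (it assumes every node is completely full), and the figure of one million keys is just an example.

```python
def tree_height(num_keys, fanout):
    """Smallest h such that fanout**h >= num_keys (integer arithmetic only)."""
    height, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout     # each extra level multiplies the reachable keys by the fan-out
        height += 1
    return height


print(tree_height(1_000_000, 2))     # 20
print(tree_height(1_000_000, 100))   # 3
```

With one disk IO per level, raising the fan-out from 2 to 100 cuts a lookup over a million keys from about 20 IO operations to about 3.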

3.3 B-tree

Following this problem-solving approach, we now know what kind of data structure we want. The data structure commonly used for indexes today is the B+ tree. Let's first introduce the B-tree.

1) Characteristics of the B-tree:

Keys are distributed over all nodes of the tree.

Every key appears in exactly one node.

A search may end at a non-leaf node.

Its search performance is equivalent to a binary search over the full set of keys.

As shown in the following figure:

3.4 B+ tree

Now that you know the B-tree, let's look at the B+ tree, the data structure that MySQL indexes use in most cases.

1) Basic characteristics of the B+ tree

A non-leaf node has as many subtree pointers as it has keys.

The subtree pointer P[i] of a non-leaf node points to the subtree whose keys lie in [K[i], K[i+1]) (note: the interval is left-closed and right-open).

All leaf nodes are linked together by a chain pointer.

All keys appear in the leaf nodes.

These basic characteristics give the B+ tree the following properties.

2) Properties of the B+ tree

All keys appear in the linked list of leaf nodes, and the keys in the linked list are ordered.

A search can only hit at a leaf node.

The non-leaf nodes act as an index layer over the leaf nodes, and the leaf nodes are the data layer that stores the data for each key.
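
These properties can be sketched with a tiny hand-built, two-level B+ tree. The node layout follows the convention above (as many pointers as keys, child i covering [K[i], K[i+1])); insertion and node splitting are left out, and the class and function names are hypothetical.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None   # next: chain pointer


class Internal:
    def __init__(self, keys, children):
        # children[i] covers the keys in [keys[i], keys[i+1])
        self.keys, self.children = keys, children


def find_leaf(node, key):
    # Descend through the index layer; a search never stops before a leaf.
    while isinstance(node, Internal):
        i = max(bisect_right(node.keys, key) - 1, 0)
        node = node.children[i]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    return leaf.values[leaf.keys.index(key)] if key in leaf.keys else None


def scan(root, lo, hi):
    leaf, found = find_leaf(root, lo), []
    while leaf is not None:                      # follow the leaf chain
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return found
            if k >= lo:
                found.append(v)
        leaf = leaf.next
    return found


a = Leaf([5, 8], ["r5", "r8"])
b = Leaf([12, 17], ["r12", "r17"])
c = Leaf([21, 30], ["r21", "r30"])
a.next, b.next = b, c
root = Internal([5, 12, 21], [a, b, c])

print(search(root, 17))     # r17
print(scan(root, 8, 21))    # ['r8', 'r12', 'r17', 'r21']
```

The range scan descends the tree once and then only follows the chain pointers between leaves, without going back up to the index layer.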

3) Advantages of the B+ tree over the B-tree as an index

The disk read and write cost of the B+ tree is lower. The internal nodes of a B+ tree hold no pointers to the record data of their keys, so they are smaller than the internal nodes of a B-tree. If the keys of one internal node are stored in the same disk block, the block holds more keys, more keys are read into memory at one time, and the number of IO reads and writes drops accordingly.

The query efficiency of the B+ tree is more stable. All the data of a B+ tree is stored in the leaf nodes, so every key lookup follows a path of the same length and every query costs the same. A B-tree search may stop at a non-leaf node, so its query efficiency is less stable.

To traverse the whole tree, the B+ tree only needs to traverse its leaf nodes.

Why does MongoDB choose the B-tree for its indexes while MySQL chooses the B+ tree?

Because MongoDB is not a traditional relational database but a NoSQL, non-relational database that stores data in JSON format, and its goals are high performance, high availability, and easy scaling. Having left the relational model behind, it has a weaker need for range and traversal queries.

3.6 What is the difference between MyISAM and InnoDB indexes?

1) MyISAM storage engine

Primary key index

MyISAM keeps its index file (.MYI) and data file (.MYD) separate. The index file only holds pointers (physical locations) to where the records are stored; the engine follows these pointers to read the page and then the indexed row.

Each leaf node in the tree holds the physical location of the corresponding row. With this value, the storage engine can go back to the table and fetch the complete row.

Each leaf also holds a pointer to the next leaf, which makes range traversal over the leaf nodes easy.

Secondary index

In MyISAM there is no structural difference between the primary key index and a secondary index, except that the primary key index requires its keys to be unique while the keys of a secondary index may repeat.

2) InnoDB storage engine

InnoDB's primary key index and secondary index were covered earlier; let's review them.

Primary key index

The InnoDB primary key index stores both the primary key values and the row data.

Secondary index

For a secondary index, InnoDB stores the primary key value in the leaf node and uses it to go back to the table for the complete record. A secondary index lookup is therefore really two lookups, which is less efficient than a lookup through the primary key index.
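
For contrast with InnoDB, here is a minimal sketch of the MyISAM layout described in 1). A list stands in for the .MYD data file and dicts stand in for the trees in the .MYI file; the rows and all names are made up.

```python
# Stand-in for the .MYD data file: rows stored in insertion order.
data_file = [
    {"id": 2, "name": "bob"},
    {"id": 1, "name": "alice"},
    {"id": 3, "name": "bob"},
]

# Stand-ins for two trees in the .MYI index file. Both map a key to row
# positions in data_file; the only difference is that primary keys are unique.
primary_index = {1: 1, 2: 0, 3: 2}
name_index = {"alice": [1], "bob": [0, 2]}


def by_id(pk):
    return data_file[primary_index[pk]]                           # index -> position -> row


def by_name(name):
    return [data_file[pos] for pos in name_index.get(name, [])]   # same path, keys may repeat


print(by_id(1))
print(by_name("bob"))
```

In this layout both kinds of index take the same single hop into the data file, whereas an InnoDB secondary index first yields a primary key and then needs a second lookup in the primary key index.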

IV. Invalidation of MySQL index

In the previous section we looked at the data structures behind the index and compared the B-tree with the B+ tree, so we now have a basic understanding of how an index is implemented underneath. This section takes the application's point of view: how to create indexes that serve our queries well, and when a MySQL index fails.

Let's think about a small question first.

Question: when a query has two or more conditions, is it better to create several single-column indexes or one composite index? What is the difference between them? Which is more efficient?

Let's first create some single-column indexes for testing:

Here we create a table with three single-column indexes: userId, mobile, and billMonth.

Then run a query on multiple columns.

explain select * from `troommobilesms11` where userid = '1' and mobile = '13504679876' and billMonth = '1998-03'

We find that the query uses only one single-column index, userid. Why? It depends on the strategy of the MySQL optimizer.

For a query with several conditions, the optimizer evaluates which condition's index is the most efficient and chooses that one. In other words, all three indexed columns here are candidates, but the optimizer decides that the userid index alone is enough to complete this query, so the key finally shown by explain is userid.

4.1 Summary

With multiple single-column indexes, the optimizer picks what it judges to be the best strategy for a multi-condition query: it may use only one index, or it may use several.

However, multiple single-column indexes mean multiple B+ trees underneath, which take up more space and also cost search efficiency, so for multi-condition queries it is best to create a composite index.

So can a composite index be used for all three conditions? Can it fail too?

4.2 Failure of composite index

This section refers to and quotes the article "A picture shows that the index of MySQL is invalid."

Create the user table, and then create a composite index on the four fields name, age, pos, and phone.

Full-value match (the index works best)

The index takes effect; this is the best case for a query.

So when does it fail?

1) Violating the leftmost matching principle

Leftmost matching principle: matching starts from the leftmost column of the index. Any run of consecutive index columns starting from the left can be matched; once a column is skipped, the columns after it cannot be matched.

For example, if you create a composite index on (a, b), a query with only where b = 2 will not use it. In other words, if the index is (a, b, c), only conditions on (a), (a, b), or (a, b, c) can use it.

Here the leftmost name field is skipped in the query, and the index is found to be invalid.

Matching also stops at a range condition (>, <, between, like). For example, for a = 1 and b = 2 and c > 3 and d = 4 with an index built in the order (a, b, c, d), d cannot use the index, because c is a range condition and the fields after it stop matching.
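
The leftmost matching rule can be written down as a small function. This is a simplified model of the rule as stated above, not of the real MySQL optimizer, which has further strategies; `usable_prefix` and its parameters are hypothetical.

```python
def usable_prefix(index_cols, equality_cols, range_cols=()):
    """Return the leading index columns that a lookup can use, in index order."""
    used = []
    for col in index_cols:
        if col in equality_cols:
            used.append(col)      # equality: keep matching the next column
        elif col in range_cols:
            used.append(col)      # range: this column is used, but matching stops here
            break
        else:
            break                 # column skipped: nothing after it can be matched
    return used


# Index (a, b), query: where b = 2
print(usable_prefix(("a", "b"), {"b"}))                            # []
# Index (a, b, c, d), query: a = 1 and b = 2 and c > 3 and d = 4
print(usable_prefix(("a", "b", "c", "d"), {"a", "b", "d"}, {"c"}))  # ['a', 'b', 'c']
```

The order of the conditions in the where clause does not matter in this model; only the order of the columns in the index does.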

2) Performing operations on the indexed column

Operations such as calculations, functions, and (manual or automatic) type conversions on an indexed column cause the index to fail and result in a full table scan.

explain select * from user where left(name,3) = 'zhangsan' and age = 20

Here the left function is applied to the name field, which causes the index to fail.

3) Using not-equal (!=, <>)

explain select * from user where age != 20

explain select * from user where age <> 20

4) Starting a like pattern with a wildcard ('%abc')

The index fails

explain select * from user where name like '%zhangsan'

The index takes effect

explain select * from user where name like 'zhangsan%'
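
Why the position of the wildcard matters can be seen on a sorted list of names, which is what the leaf level of an index on name looks like. The names and helper functions below are made up for illustration.

```python
from bisect import bisect_left

names = sorted(["li", "wang", "xiaozhangsan", "zhangsan", "zhangsanfeng", "zhao"])


def like_prefix(prefix):
    # 'prefix%': all matches sit next to each other in sorted order,
    # so a binary search finds the first one and a short walk finds the rest.
    i = bisect_left(names, prefix)
    found = []
    while i < len(names) and names[i].startswith(prefix):
        found.append(names[i])
        i += 1
    return found


def like_suffix(suffix):
    # '%suffix': matches can be anywhere in the order, so every entry is checked.
    return [n for n in names if n.endswith(suffix)]


print(like_prefix("zhangsan"))   # ['zhangsan', 'zhangsanfeng']
print(like_suffix("zhangsan"))   # ['xiaozhangsan', 'zhangsan']
```

Sorted order helps only when the known part of the string comes first. The same reasoning applies to case 2): the index is sorted by name itself, not by the result of a function applied to name.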

5) A string without single quotation marks: the index fails

explain select * from user where name = 2000

6) Conditions joined with or: the index fails

explain select * from user where name = '2000' or age = 20 or pos = 'cxy'

7) order by

Normal (the index takes part in sorting): the leftmost matching principle is not violated.

explain select * from user where name = 'zhangsan' and age = 20 order by age,pos

Violating the leftmost prefix rule leads to an extra filesort (which degrades performance).

explain select name,age from user where name = 'zhangsan' order by pos
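
The two order by cases can be checked on a sorted list of tuples, which models the entries of a composite index on (name, age, pos). The phone column is left out for brevity and the rows are made up.

```python
# Entries of a composite index on (name, age, pos), kept in sorted order.
index_entries = sorted([
    ("zhangsan", 20, "dev"),
    ("zhangsan", 20, "cxy"),
    ("zhangsan", 18, "qa"),
    ("lisi", 30, "dev"),
])

# where name = 'zhangsan': the matches form one contiguous run of the index.
matches = [e for e in index_entries if e[0] == "zhangsan"]

by_age_pos = [(age, pos) for _, age, pos in matches]
by_pos = [pos for _, _, pos in matches]

print(by_age_pos == sorted(by_age_pos))   # True: already ordered by age, pos
print(by_pos == sorted(by_pos))           # False: an extra sort is needed
```

Once name is fixed, the entries come out of the index already ordered by (age, pos), but not by pos alone, so ordering by pos requires a separate sort. The same reasoning explains the group by cases that follow.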

8) group by

Normal (the index takes part in sorting).

explain select name,age from user where name = 'zhangsan' group by age

Violating the leftmost prefix rule leads to a temporary table (which degrades performance).

explain select name,age from user where name = 'zhangsan' group by pos,age

We saw how a query statement is executed and learned that an index is a data structure that can be searched efficiently.

We learned about the classification of indexes, the difference between clustered and nonclustered indexes, and how to create the various kinds of index.

By analyzing the requirements step by step, we saw why MySQL chooses the B+ tree as the data structure of its index, and compared the B-tree with the B+ tree and the indexes of MyISAM with those of InnoDB.

We looked at the situations in which an index fails and at the important leftmost matching principle, so that we can optimize accordingly when creating indexes.

That is all on how to understand the MySQL index in depth. I hope the content above is of some help to you. If you think the article is good, feel free to share it so that more people can see it.
