Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the principle of MySQL index?

2025-01-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "what is the principle of MySQL index". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Index, may make many people daunting, after all, every interview when the index of MySQL must be asked content, even if put aside the interview, in normal development, the optimization of SQL is also the top priority.

It is no exaggeration to say that the quality of the SQL in the system can directly determine the speed of your system. But have you ever thought about a question before optimization? That is: what is our principle of optimization? What is the theoretical basis of optimizing SQL?

Although it is said that true knowledge comes from practice, I believe that theory is the basis for supporting practice, because we cannot practice blindly without purpose, because we often get half the result with twice the effort.

So having said so much, I just want to tell you that before we really start index optimization, we need to thoroughly understand the principle of the index. If you talk about optimization in this way, you will feel more silky.

1. The nature of the index

The essence of an index is a sorted data structure. I believe this is no stranger to you, because when it comes to indexing, many people naturally associate it with the contents in the dictionary.

Yes, this analogy is very vivid, but if you go deeper, I'm afraid many friends will be a little tongue-tied, then now that you already know the nature of the index, then you already have the basis for reading this article. I believe that you who read this article will certainly have a new understanding of the principle of the index.

2. Classification of index

In the database, the index is divided into many kinds (do not narrowly think that the index is only B + tree, that is because we usually use MySQL). Different categories are obviously designed to cope with different situations, so what are the types of indexes? Let's get a general idea of it.

2.1Index of Hash

Hash index is a common index, its query efficiency of single record is very high, and the time complexity is 1. However, Hash index is not the most commonly used database index type, especially our commonly used Mysql Innodb engine does not support hash index. The main reasons are as follows:

Hash indexes are suitable for precise lookups, but range lookups are not.

Because the storage engine calculates a hash code for each line, the hash code is relatively small, and the hash codes of different key value rows are usually different. What is stored in the hash index is the Hash code, and the hash code is irregular to each other, and the Hash operation can not guarantee the sequence, so the two data with similar values are very different and are divided into different buckets. This is why hash indexes can only perform full-time matching queries, because only in this way can the hash code match the data.

For hash indexing, all you need to know is here.

2.2, binary tree

In addition, the common data structure used in indexing is the tree structure. First of all, let's introduce the most classic binary tree.

Let's first introduce the characteristics of the binary tree:

The time complexity of binary tree is O (n).

A node can only have two child nodes. That is, the degree does not exceed 2.

The left child node is smaller than this node, and the right child node is larger than this node.

First of all, let's take a look at the binary tree.

However, in extreme cases, there will be chaining, that is, the number of nodes has been increasing on one side. The figure below is as follows

In the binary tree, there is a special structure-balanced binary tree, the characteristics of balanced binary tree:

The root node changes as the data changes.

The more data, the more iterations, the more IO, the slower (the IO of the disk is determined by the height of the tree)

2.4, B tree (two or three trees)

After understanding the binary tree, we can further talk about what a B-tree is. The B-tree looks like this:

From the structure diagram of the B-tree, you can see that each node contains not only the key value of the data, but also the data value.

The storage space of each page is limited, if the data is relatively large, it will lead to less key storage of each node, when the amount of data is large, it will also lead to the B-tree is very deep, thus increasing the number of disk IO, and then affect the query efficiency.

All right, at this point, the common types of indexes are over, the above content is only as a groundwork, let's officially start the MySQL B+ tree.

2.5, B+ tree

The data structure of the most commonly used index in MySQL is the B+ tree, which has the following characteristics:

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

In the B+ tree, all the data recording nodes are stored on the leaf nodes of the same layer according to the size of the key value, while the non-leaf nodes only store key information, which can greatly reduce the number of key stored in each node and reduce the height of the B+ tree.

The keywords of the leaf nodes of the B+ tree are arranged orderly from small to large, and the data at the end of the left holds the pointer to the starting data of the node on the right.

The B + tree has fewer levels: compared with the B tree B +, each non-leaf node stores more keywords, and the tree has fewer levels, so querying data is faster.

The query speed of the B + tree is more stable: all the keyword data addresses of the B + tree are stored on the leaf node, so the times of each search are the same, so the query speed is more stable than the B tree.

The B + tree naturally has the sorting function: all the leaf node data of the B + tree form an ordered linked list, which is more convenient to query the data of the size interval, the data compactness is very high, and the hit rate of the cache is higher than that of the B tree.

The full node traversal of the B + tree is faster: the B + tree only needs to traverse all the leaf nodes, rather than traversing each layer like the B tree, which is beneficial to the full table scan of the database.

All right, having said so much about the characteristics of the B + tree, let's take a picture to see what the B + tree looks like (if you don't understand it, it doesn't matter, which will be explained step by step below)

The above data page is the place where the data page is actually stored, and the data pages are connected through a two-way linked list. OK, here we quickly understand the types of each index, and let's start the formal B+ tree analysis.

3. Primary key directory

Let's take out the data page in the above picture and refine it to become the following picture.

We all know that when MySQL stores data, it takes the data page as the smallest unit, and the storage of the data in the data page is continuous, the data in the data page is sorted by the primary key (no primary key is sorted by the ROW_ID maintained by MySQL itself), the data page and the data page are associated through a two-way linked list, and the data and data time are associated through an one-way linked list.

That is to say, there must be a minimum primary key in each data page, and then the page number and the smallest primary key of each data page will form a primary key directory (like the left part in the figure above). Suppose you want to find the data with the primary key 2 now, and finally determine that the record with the primary key 2 is in the data page 1 through the binary search method, and then locate the record with the primary key 2. We first know the general process, do not delve into the details, first look at the structural principle from the macro, and then look at the implementation principle from the micro.

What is mentioned above can actually be understood as the primary key index, and the primary key index is also the simplest and most basic index. At this time, you should know why you set up a primary key query so that it can be faster, right?

4. Index page

But now suppose there are a lot of data pages, is that the corresponding primary key directory will be very large?

What if there are 10 million records and 50 million records? Even if it is dichotomy search, its efficiency is still very low, so in order to solve this problem, MySQL designed a new storage structure-index page. For example, there is a situation like this.

Assuming that there are a lot of records in the primary key directory above, and the structure above evolves like this, MySQL will split the records into different index pages, like the following

What is recorded in the index page is the page number of each data page and the record of the smallest primary key in the data page, that is to say, the minimum primary key and data page number are not simply maintained in the primary key directory, but evolved into an index page. an index page is similar to a data page, where one page is not saved enough to split to the next.

If you want to find this record of id=20 now, huh? Which index page should I go to to find the record? So it must be necessary to maintain the index page at this time.

Yes, MySQL is also designed in this way, which means that MySQL also designs a data structure for maintaining index pages, which is also called index pages, but they are at different levels, similar to the following:

That is to say, the index page that maintains the index page is on the upper level of the index page that really stores records and data pages. Now if you want to look for this record in id=20, it starts from the top index page, and through dichotomy, you will soon be able to locate id=20 s. This record is on index page 2, then go to index page 2, and then it will be the same as before. The records in the index page are also connected through an one-way linked list), according to each of the smallest primary keys can be located id=20 is on data page 5, assuming that data page 5 is like this

Can you figure out how the data is located at this time?

5. Layering of index pages

Well, now that you know that too many index pages will spread one layer up, what if there are too many index page records in the upper layer? It's very simple, continue to split, go on to the next level, no nonsense, I'll draw a picture to help you understand.

I get it. Do you get it? Let's simulate a search process, suppose you want to look for the record 37, to be honest, I have no idea where this record is. OK, now let's simulate the search process of MySQL. First, we start with the top index page. Because of id=37, we locate to index page 16, and then continue to search in index page 16. At this time, we can also locate id=37 in index page 3, and then continue to search. Finally, we can locate the data in real data page 8, assuming that data page 8 is like this.

Isn't it perfect? If I have to complete the picture above, then. The younger brother is duty-bound (the picture is so large that the linked list structure of the data in the index page will not be drawn)

Have you discovered any secrets when you are witty at this time? Doesn't he look like a binary tree? In fact, this is the structure of a B+ tree, and this is the physical structure in which the data is actually stored on disk. What are the characteristics of B+ tree? The B+ tree is also a kind of binary search tree, but its data is only stored in the leaf node (in this case, the data page). The B+ tree composed of index page and data page is the clustered index (this sentence is very important).

The clustered index is created by MySQL based on the primary key index structure

6. Non-primary key index

But now the problem comes again, since the emphasis here is on the primary key index, we usually use a lot of other indexes in addition to the primary key index. What should we do at this time? Suppose you index name and age now. Now looking back, does the primary key index maintain a B+ tree based on the order of the primary key when inserting data?

In fact, the principle of non-primary key index is the same, MySQL is to maintain a Btree, to put it bluntly, how many indexes you build, MySQL will help you maintain how many Btrees (this is also suddenly figured out why the index can not build too many? It was known before that you can't build too many indexes, because indexes also take up space, which is actually the root cause.)

If name+age is really indexed now, what if it is stored at this time? At this time, MySQL maintains a separate B+ tree structure according to name+age, and the data is still stored in the data page, but each record in the original data is written as id=xx, but now it is written as name=xx,age=xx,id=xx. Anyway, the primary key will definitely be stored, so let's take a picture to suppress the surprise.

When inserting data, MySQL will first sort by name, if name is the same, sort by age in the federated index, and if it is the same, sort by primary key field. This is how the insertion works.

At this point, the records in each data page are actually index fields and primary key fields, while the other fields are not stored (why not? Storing the same data everywhere is a waste of space and unnecessary, which is why there is the following index optimization). As for lookup, the principle and process are the same as clustered indexes, so I won't repeat them here, but the following is crucial: suppose you perform such a SQL now:

SELECT name FROM student WHERE name='wx'

Then the query at this time is perfect, using the index and no need to return to the table.

7. Return to the table

It is like this, now we need to find the record according to name, and the query field (that is, the query field after select) only has name (as long as it is in all three fields of name,age,id). At this time, the final record can be obtained directly.

In other words, because the records in the federated index only have name,age,id, if only these three fields are queried in the query, then the desired results can be queried in the B+ tree.

Now assume that the SQL of the query looks like this (we assume that there are other fields in the student besides name,age,id)

SELECT * FROM student WHERE name='wx'

Then it's over, because although you quickly locate the record according to name, because name+age is not a clustered index, the data page of the B+ tree stores only its associated index and primary key index fields, and does not store other fields, so other attribute values cannot be obtained at this time, so what should I do?

In this case, MySQL needs to query back to the table. At this point, MySQL will do a clustered index search again according to the id in a record located, that is, it will look in the B+ tree of id maintenance according to id. Because the data page in a clustered index records a complete record of a record, this process is called back to the table.

Again emphasize the meaning of the next table: according to the results of the non-primary key index, there is no field value to look for. At this time, you need to search again according to the primary key from the root node of the clustered index, so that the records found again are complete.

Finally, let me take a look at MySQL's maintenance process for non-primary key indexes:

For non-primary key indexes (usually federated indexes), when maintaining the B+ tree, it will be judged according to the fields of the federated index. Assuming that the federated index is name + address + age, then MySQL will first sort according to name when maintaining the B+ tree of the index. If name is the same, it will sort according to the second address. If address is the same, then it will be sorted according to age, if age is the same. Then it is sorted by the value of the primary key field, and for non-primary key indexes, MySQL maintains only the index field and the primary key field when maintaining the B+ tree.

This is the end of the content of "what is the principle of MySQL Index". Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report