How does MySQL query data 07/04 Update SLTechnology News&Howtos

How does MySQL query data

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "how MySQL queries data". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

In the last article, we talked about the principle of MySQL indexing, mainly talking about how MySQL maintains the fields of the index. Let's briefly review:

MySQL for the primary key index maintenance is the simplest, that is, according to the primary key to maintain a B + tree, because the characteristics of the primary key is generally incremental, that is to say, orderly, so MySQL in maintenance only need to add records to the data page in turn, the data page is full and then continue to add to the next data page. And each record is complete, that is, the values of all columns are maintained.

However, for non-primary key indexes, when maintaining the B+ tree, it will be judged according to the fields of the federated index.

Suppose the federated index is: name + address + age, then MySQL will first sort according to name when maintaining the B+ tree of the index, if name is the same, it will sort according to the second address, if address is the same, then it will sort according to age, if age is the same, then it will sort according to the primary key field value (the primary key can not be the same), and for non-primary key indexes, MySQL will maintain the B+ tree. Just maintain index fields and primary key fields.

In addition, the structure of the B+ tree is roughly like this:

The maintenance process of the data here will not be described in detail. Friends who are not clear can read an article.

Today, let's take a look at the basic principles of querying MySQL.

Because only on the basis of understanding the principles, can you write a SQL that meets the expectations, and you can know whether your SQL has used an index or not. This is the most basic principle.

This article is about some principles, so a lot of things are not very good to draw, but I will certainly draw for you.

Principle of equivalent matching

We now know that if it is a [primary key index], when inserting data, it is arranged in turn according to the order of the primary key, and if there is not enough data page, it will split into another data page, and then maintain the data page through the index page.

The data pages are maintained by two-way linked lists, and if there are too many index pages, they will split up (like the figure above), and so on, thus forming a B + tree structure made up of components, namely [clustered index].

But the question is that we have established not only the primary key index, but also the non-primary key index, so how is the non-primary key index maintained at this time?

Because it is impossible to repeat the primary key index, it is directly appended and inserted when saved to the data page (we default that the primary key is self-increasing)

As for the non-primary key is generally repeatable, suppose that the values of a federated index field are really the same at this time, what should I do?

As mentioned at the beginning, you can only sort by primary key field at this time, which is why non-primary key records also save a primary key field when they are saved.

In addition, as mentioned above, if the index is built too much, it will take up too much space, because MySQL will maintain a B+ tree for each index, after all, non-primary key fields are not necessarily incremental on the one hand, and may be duplicated on the other. So based on this point, those fields that are frequently added and deleted must not be suitable for indexing.

All right, let's go back to the federated index of name+age, assuming we now have a SQL like this.

SELECT * FROM student WHERE name='wx' AND age=1

Conditions like this WHERE are federated and arranged in the order of the fields in the federated index, and all use the equal sign condition, which we call equivalent matching; this is a very important principle.

Leftmost prefix matching principle

Suppose there are several records:

ClassId=1,name=wx,age=1,id=1; classId=1,name=xq,age=2,id=2; classId=1,name=wx,age=1,id=3; classId=2,name=zs,age=3,id=4

According to the above (classId,name,age) federated index, they are saved in the data page like this.

First sort by the value of the classId field.

If the classId field value is the same, it is sorted according to the second name field value.

If the name field value is the same, then it is the same according to the age field value, and if the age field value is the same, it is sorted by the primary key field value.

Then in the search, because you are now the condition of calssId and name, so MySQL can quickly locate a batch of data through classId.

Because this condition is the first condition for MySQL to maintain the B+ tree (that is, sorting according to classId), and then similarly, name is the second condition for MySQL to maintain the B+ tree (that is, sorting according to name), so even if you do not add the age condition, it will be OK to use the indexes classId and name, but if you query in this way

SELECT * FROM student WHERE age=1

This won't work, because MySQL will be based on the federated index you built.

The query is first based on classId, then on name, and then on age.

If you skip the first two fields directly, this is no different from a full table scan, because MySQL can't confirm where the age is at all and can only scan it one by one.

Similarly, if your WHERE condition is followed by classId=xx and then age=xx, classId can use the index in this case, because the first field maintained by the B+ tree is classId.

However, age cannot use the index query, because name cannot be located, so at this time we can only do another full scan based on the records that meet the classId. This rule is called the leftmost prefix matching principle.

If you don't understand the leftmost matching principle, let me make an analogy and introduce it again. Let's assume classId,name,age, the joint index of these three is like three floors, classId is the first floor, name is the second floor, and age is the third floor.

Suppose you want to get to the third floor, do you have to climb from the first floor, then the second floor, then the third floor; you can just climb to the first floor, and it doesn't matter if you don't climb the remaining two layers, which means you can use classId to do the equivalence query, and it doesn't matter if you don't use the remaining fields.

By the same token, you can climb from the first layer, and then climb to the second layer without climbing the third layer, just like you use classId,name to query, or you can climb from the first layer to the second layer and then to the third layer, that is, you use the three fields classId,name,age to do the equivalent query in turn. So far, it all belongs to OK.

But if you don't want to climb the first floor, you want to skip the first floor and start straight from the second floor, is it possible?

This is obviously impossible, which means skipping classId and querying name directly makes it impossible to use the index at all. It is the same with calling name to query age, skipping one or two layers directly from the third layer, that is to say, calling classId and name to query age directly can not be used to index.

Now you should fully understand the principle of leftmost matching, right? The most basic condition of the following principle is the need to meet: the leftmost prefix matching principle.

Range lookup rule

Scope search rules, I believe this is also the most frequently used principle, such as the following SQL

SELECT * FROM student WHERE classId > 1 AND classId

< 4 因为此时由联合索引（classId,name,age）构建出来的 B+ 树中的数据是根据 classId,name,age 去排序的。所以此时是能够根据 classId 查询到一个范围中的数据的，虽然他们可能不在同一个数据页中，但是我们说过了，数据页之间是通过双向链表进行连接的。所以此时针对 classId 的范围查找依旧是能走索引的。继续看如果条件是这样子的 SELECT * FROM student WHERE classId >

1 AND classId

< 4 and name >

An AND name

< x 你是不是觉得前面的 classId 是符合范围查找的，然后在查询出来的结果中继续范围查找 name。但实际上并不是这样子的，因为我们说了联合索引（classId,name,age）是按照 calssId、name、age 依次去排序的，因为此时 classId 的顺序确定以后，是不需要根据 name 排序的，也就是说在 classId 的范围内 name 是无序的，听不明白？没关系，看我画图现在我们依次插入的数据是上面的四条，因为在插入的时候是可以直接根据 classId 就能够确定下这四条记录的顺序了。所以此时是根本不会去管后面的 name 或者是 age 是什么顺序的，或许你可能觉得如果他们的 class Id 一样呢？好，我就来一点一点排除你心中的疑惑，看下面这张图

We assume that the calssId of the third record is the same as the classId field value of the second record, then the judgment will be based on the name at this time.

It turns out that aa is less than zz, which puts a smaller name first, but why do I say above that name is unordered after it is sorted by calssId?

Because the second case we are talking about (classId is the same) is a special case, we cannot use special cases to draw a general conclusion. Age is the same.

So remember: for range lookups, as long as the leftmost column of the federated index is valid, the rest of the index cannot be used (since the index cannot be used, it can only be scanned by a full table).

Equivalent matching + range search

Suppose we have a SQL like this.

SELECT * FROM student WHERE classId = 1 AND name > an AND name < x

First of all, calssId is indexed, and secondly, name is indexed.

Why? Why are you contradicting what you said? Just said the range after the first column can go to the index, but now that name also go to the index, name is obviously the second column.

Listen to me slowly. First of all, the scope search only has the first column to walk the index. I have explained the specific reasons in detail, but now if you use the first condition in the federated index to do equivalent matching, the second one can use the range query to walk the index. Look at the figure below.

Now we first locate the classId=1 records, which must be certain, but when MySQL maintains the B + tree, it is impossible to sort directly according to the classId=1 records.

Because at this time, the classId is all 1, in other words, the classId of all three records is 1. MySQL simply cannot determine who is in front and who is behind.

So we need to continue to judge according to name at this time, and the result is what the above picture looks like.

According to the name discovery, the order of records can be determined, so all name in records with classId equal to 1 are ordered.

This is why the equivalence can be followed by a range (but a consistent principle is that the leftmost matching principle must be satisfied, that is, the previous record must be determined in order to continue to judge the later data). Do you want to shout loudly at this moment?

So far, can you now determine whether and which indexes are used by your SQL based on the indexes you have built?

Do you want to write some SQL as soon as possible?

Don't worry, be sure to finish the summary.

Order By + limit optimization

Some of the above are the most basic principles of query, but if you want to use them in practice, you must learn here, because we can't write sql without paging.

And paging is basically also used in sorting combinations, so we also put this together.

Suppose you now create a federated index on the three fields name,age,adderss, and the SQL statement looks like this when querying:

SELECT name,age,address FROM student ORDER BY name,age,address LIMIT 10

If you write this, then MySQL can understand, ah, you want to sort according to the name,age,address federated index, and then take the first 10 records, and the fields of the records are all in the B + tree that maintains the federated index, then you need to go back to the table and query in the clustered index.

In addition, the fields after ORDER BY must be in the same way, that is to say, they are all ascending or descending, and some cannot be ascending and some descending.

To put it bluntly, it is common to index what fields are sorted, but do not mix ascending and descending order.

In fact, for the optimization of MySQL, I believe we have more or less found that optimization really does not have all the rules and routines, because the best optimization is to make adjustments in combination with the actual business area. There is no way to achieve it overnight and to do it once and for all.

Grouping query optimization

In fact, grouping query optimization is similar to the above Order By + limit optimization, which is basically the same reason. For example, there is such a SQL.

SELECT count (*) FROM student GROUP BY NAME

If the NAME is not indexed, then all the data is queried, placed in a temporary file, and then grouped into groups of data according to the grouped fields.

Then perform the aggregation operation (in this case, the count (*) operation), which is obviously inefficient, so we definitely need to index the NAME.

I don't understand why there is a need for temporary documents.

According to the semantic logic of group by, statistics are done by name. Because name does not have an index at this time, grouping according to name first needs to get a data sorted by name, so we need a temporary table to record and count the results.

In other words, what we need is a well-ordered result? Then you can index name directly.

Suppose we build the index based on name, because name is already sorted, and we can get the results of group by at this time, without temporary tables or additional sorting.

In other words, if the statement execution process can read the data and get the results directly, there is no need for additional memory, otherwise additional memory is needed to save the intermediate results.

Concluding remarks

Summary of basic principles of index query

Generally speaking, we usually write SQL in [equivalence + range], which is the most common conditional search. In a case like this, an index must be established. The fundamental basis for building an index is to understand how MySQL helps us maintain non-primary key indexes.

How MySQL helps us maintain non-primary key indexes

In fact, I have already emphasized it at the beginning, but in order for you to consolidate it, I will say it again here.

For the primary key index, MySQL sorts according to the primary key field (usually we set the primary key field to be self-increasing, otherwise we are really making trouble for ourselves, assuming that the primary key is not self-increasing, which will also lead to page splitting, which will degrade the performance.)

For non-primary key indexes (we generally refer to federated indexes), MySQL will also maintain a B+ tree for us, except that the data stored on this B+ leaf node (that is, the data page) is only index field data and primary key data.

Suppose there are joint indexes name, address, age, so that when inserting data, MySQL will first sort according to name, name will sort by address, and address field value will sort by age field value.

The value of the age field is the same, sorted by the primary key field.

This is why the primary key field is maintained.

In addition, why only maintain index columns for non-primary key fields? Because the clustered index (by maintaining the Btree of the primary key field) already has the values of all records, if other indexes maintain all the fields, it is a waste of space.

So much for the content of "how MySQL queries data". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.