
What are the reasons for slow paging queries?


This article mainly explains why paging queries are slow. The approach taken here is simple and practical, so let's work through "what are the reasons for slow paging queries" step by step.

Start with a question.

When I was at tx five years ago, I found that MySQL requests were very slow in paging scenarios. With only about 100,000 rows of data, a select on a single machine took around 2 to 3 seconds. I asked my mentor why, and he asked back: "In an indexed scenario, what is the time complexity of fetching the nth row in MySQL?"

The pursuit of the answer

To pin down the scenario: assume there is an index on status. `select * from table where status = xx limit 10 offset 10000` will be very slow. Even when the amount of data is small, there is a delay of a few seconds.
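As a concrete illustration, here is a minimal sketch of that scenario (the table name t, the payload column, and the specific values are assumptions for illustration, not from the original):

```sql
-- Hypothetical schema: id is the clustered (primary key) index in InnoDB,
-- idx_status is a secondary index on status.
CREATE TABLE t (
    id      BIGINT PRIMARY KEY AUTO_INCREMENT,
    status  TINYINT NOT NULL,
    payload VARCHAR(255),
    KEY idx_status (status)
);

-- The slow paging query from the text.
SELECT * FROM t WHERE status = 1 LIMIT 10 OFFSET 10000;
```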

My naive answer was to guess O(log N), reasoning that finding a node in a tree is O(log N). Naturally, the mentor told me to go study it myself.

This stage took 10 minutes.

Continuing to chase the answer and analyzing carefully, you find that locating the row through the index is awkward: you do not know how the first 100 rows are distributed between the left and right subtrees, so you cannot exploit the search property of a binary tree. A bit of study shows that a MySQL index is a B+ tree.

Looking at the structure of a B+ tree, it suddenly becomes clear: the 100th record can be found directly by walking the linked list of leaf nodes, with O(n) complexity. But even O(n) should not be excruciatingly slow, so there must be another reason.

This stage, mostly searching for information online, took about 10 days on and off.

For systematic study, two books are recommended here. The first is "MySQL Technology Insider: InnoDB Storage Engine", through which you can gain a deeper understanding of how InnoDB is implemented, such as MVCC, index implementation, and file storage.

The second is "High Performance MySQL", which starts from the usage level but goes quite deep and discusses many design ideas.

Working through the two books together and re-reading them, you can just about get past the entry level with MySQL.

Here are two key concepts:

Clustered index: contains the primary key index and the corresponding actual row data; the leaf nodes of the index are the data nodes.

Secondary index: its leaf nodes are still index nodes; they hold the indexed column plus the primary key id.

So even though the first 10,000 rows will be thrown away, MySQL still takes the primary key id from the secondary index and goes back to the clustered index to fetch each row. That is 10,000 random IOs, which is naturally dog-slow.

Why does MySQL behave this way? It comes down to MySQL's layering: limit and offset can only be applied to the result set returned by the engine layer. In other words, the engine layer is innocent too; it has no idea that those 10,000 rows are about to be discarded. MySQL is split into a server layer and a storage engine layer, and the two really are separate. At this point I roughly understood the reason for the slowness. This stage took a year.
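To make the cost concrete, here is a commented walkthrough of what effectively happens for a query like the one above (a conceptual sketch in comment form, reusing the hypothetical table t and index idx_status from earlier; it is not real MySQL plan output):

```sql
-- Conceptual walkthrough of: SELECT * FROM t WHERE status = 1 LIMIT 10 OFFSET 10000;
--
-- 1. Engine layer: scan the secondary index idx_status for entries with status = 1.
-- 2. Engine layer: for each matching entry, take its primary key id and fetch the
--    full row from the clustered index (roughly one random IO per row).
-- 3. Engine layer: hand every fetched row up to the server layer.
-- 4. Server layer: apply LIMIT/OFFSET to that result set, discarding the first
--    10000 rows and keeping the next 10.
--
-- Net effect: about 10010 clustered-index lookups are paid for 10 returned rows.
```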

For comparison, by this time I had been working for 3 years and had started reading some source code. After etcd, I read some of TiDB's source code. Whichever database it is, the query for a statement is in fact built out of logical operators.

Logical operators: before getting into specific optimization rules, here is a brief introduction to some of the logical operators in a query plan.

DataSource: the data source, that is, the table; the t in select * from t.

Selection: filtering, such as the where condition in select xxx from t where xx = 5.

Projection: projecting columns; taking column c in select c from t is a projection operation.

Join: joining tables; select xx from t1, t2 where t1.c = t2.c joins the two tables t1 and t2.

Selection, Projection, and Join (SPJ for short) are the most basic operators. Join in particular comes in many forms, such as inner join, left outer join, right outer join, and so on.

Take select b from t1, t2 where t1.c = t2.c and t1.a > 5. When it becomes a logical query plan, the DataSource operators for t1 and t2 are responsible for pulling up the data. Above them sits a Join operator that joins the results of the two tables on t1.c = t2.c, then a Selection filters on t1.a > 5, and finally the b column is projected. (A rough sketch of the unoptimized plan is shown below.) So it is not that MySQL does not want to pass limit and offset down to the engine layer; because the work is divided into logical operators, an individual operator cannot tell how much qualifying data the others will produce.
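Roughly, the unoptimized operator tree for this query looks like the following (an illustrative sketch based on the description above, not actual MySQL or TiDB plan output):

```sql
-- Unoptimized logical plan for:
--   SELECT b FROM t1, t2 WHERE t1.c = t2.c AND t1.a > 5;
--
--   Projection(b)
--     └─ Selection(t1.a > 5)
--          └─ Join(t1.c = t2.c)
--               ├─ DataSource(t1)
--               └─ DataSource(t2)
```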

How to solve the problem?

"High Performance MySQL" mentions two approaches.

Approach one: look at the actual business requirement and see whether full paging can be replaced with plain "next page" / "previous page" functions; on iOS and Android in particular, the old style of complete pagination is no longer common. Here limit/offset is replaced by the secondary-index condition (that is, the search criterion) plus an id > last_id condition, as shown below; the last id of each page has to be returned to the front end so the next request can pass it back.
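A minimal sketch of this "next page" style, reusing the hypothetical table t from earlier (the boundary id 12345 is made up for illustration):

```sql
-- First page: no offset needed.
SELECT * FROM t
WHERE status = 1
ORDER BY id
LIMIT 10;

-- The back end returns the last id of the page (say 12345) to the front end,
-- which sends it back as the starting point of the next page.
SELECT * FROM t
WHERE status = 1 AND id > 12345
ORDER BY id
LIMIT 10;
```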

The second approach follows directly from the previous discussion. Here is a concept first: covering index: when the data a query reads from the secondary index is only the id and the secondary index column itself, there is no need to look up the clustered index at all.

The idea is as follows: `select xxx, xxx from table where id in (select id from table where second_index = xxx limit 10 offset 10000)`. The inner query finds only the primary key ids of the matching rows; because the primary key is stored on the secondary index, it never has to go back to the clustered index on disk. The outer query then looks up only the 10 surviving ids in the clustered index, which costs just ten random IOs. When the business really does need offset-style paging, this approach greatly improves performance and usually meets the requirement.
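One caveat: MySQL has traditionally rejected LIMIT inside an IN subquery, so in practice the same idea is usually written as a join against a derived table. A sketch using the hypothetical table t from earlier:

```sql
-- Deferred lookup: the inner query stays entirely on the secondary index
-- (it only needs status and the primary key id), so there is no back-to-table
-- cost for the 10000 skipped rows; only the 10 surviving ids are then
-- fetched from the clustered index.
SELECT o.*
FROM t AS o
JOIN (
    SELECT id
    FROM t
    WHERE status = 1
    ORDER BY id
    LIMIT 10 OFFSET 10000
) AS page USING (id);
```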

At this point, I believe you have a deeper understanding of why paging queries are slow; the best way to consolidate it is to try it out in practice.
