How to handle deep paging over hundreds of millions of rows, in a way that works across MySQL + ES + MongoDB


This article explains in detail how to handle deep paging over hundreds of millions of rows in a way that works across MySQL, ES and MongoDB. I hope that after reading it you will have a solid grasp of the relevant techniques.

Interview questions & Real experience

Interview question: how do you implement deep paging when the data volume is very large?

You may run into this question in an interview or while preparing for one. Most answers boil down to sharding the database and tables plus indexing, which is a perfectly standard and correct answer. But reality is always harsh, so the interviewer will usually follow up: given that the schedule is tight and the team is short-staffed, how would you implement deep paging?

If you have no hands-on experience with this, read on.

A painful lesson

First of all, be clear about one thing: deep paging is doable, but random jumps to arbitrarily deep pages absolutely must be forbidden.

Databases such as MySQL and MongoDB can cope on their own: when they handle it badly, the query is merely slow. ES is a different matter. There we have to loop over the search_after API to fetch data, which raises the question of memory usage, and if the code is not written carefully it can lead straight to an out-of-memory error.

Why random deep page jumps cannot be allowed

From a technical point of view, let's look at why random deep paging is not allowed, or rather why deep paging in general is not recommended.

MySQL

The basic principles of paging:

SELECT * FROM test ORDER BY id DESC LIMIT 10000, 20

LIMIT 10000, 20 means scanning 10,020 rows that match the conditions, throwing away the first 10,000 and returning the last 20. For LIMIT 1000000, 100, a total of 1,000,100 rows must be scanned; in a high-concurrency application, that is more than a million rows scanned per query.

MongoDB

The basic principles of paging:

db.t_data.find().skip(5).limit(5)

Likewise, as the page number grows, the number of documents that skip has to pass over grows with it. Skipping is implemented by walking the cursor iterator, so the CPU cost becomes very noticeable; with very large page numbers and frequent requests, it is bound to blow up.

ElasticSearch

From a business point of view, ElasticSearch is not really a database but a search engine: if the data you want does not show up under the current filter conditions, paging ever deeper will not find it either. And even if we insist on using ES as a database for paged queries, we run into the max_result_window restriction: by default, the official limit caps the window (from + size) at 10,000.

Query process:


Suppose you query page 501 with 10 items per page; the client sends the request to a coordinating node.

That node broadcasts the request to every shard, and each shard queries its own top 5,010 items.

The results are returned to the coordinating node, which merges them and extracts the overall top 5,010 items.

The last 10 of those (items 5,001 to 5,010) are returned to the client.

From this you can see why the offset has to be limited. Moreover, if you use a scrolling-style API such as search_after for a deep page jump, you still have to scroll page by page through thousands, possibly tens of millions, of documents just to get the final 20; the efficiency speaks for itself.
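As an illustration only, here is a minimal sketch of such a search_after loop. It is not the code from the article; it assumes the elasticsearch-py 7.x client and a hypothetical index named t_data with a sortable numeric id field.

from elasticsearch import Elasticsearch  # assumption: elasticsearch-py 7.x client

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster address

page_size = 20
search_after = None  # sort values of the last hit on the previous page

while True:
    body = {
        "size": page_size,
        "sort": [{"id": "asc"}],  # search_after needs a deterministic sort
        "query": {"match_all": {}},
    }
    if search_after is not None:
        body["search_after"] = search_after

    resp = es.search(index="t_data", body=body)
    hits = resp["hits"]["hits"]
    if not hits:
        break

    # hand this page to the caller; reaching page N means walking through
    # every page before it, which is exactly the cost described above
    search_after = hits[-1]["sort"]  # cursor for the next page

Note how the loop has to walk through every intermediate page before reaching the target one, which is why deep random jumps are so expensive here.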

Get aligned with the product team again

As the saying goes, problems that technology cannot solve should be solved on the business side.

If, back in your internship days, you let the product team talk you into implementing deep paging with arbitrary page jumps, now is the time to set things right. The business side needs to change in the following ways:

Add as many default filter conditions as possible, such as a time range, to reduce the amount of data that has to be displayed.

Change how page navigation is presented: switch to scrolling, or only allow jumping within a small range of pages.

Reference figure for small-range page jumps: (figure omitted)

General solution

The quick fixes available on a short timeline are mainly the following:

Must-have: sort fields and filter conditions must be indexed.

Core: reduce the offset, either by restricting page jumps to a small range or by using already-loaded (scrolled) data as a known starting point.

Extra: if you run into a case that is hard to handle, you can also over-fetch a little extra data and truncate it; the impact on performance is small.

MySQL

Original paging SQL:

# page 1
SELECT * FROM `year_score` WHERE `year` = 2017 ORDER BY id LIMIT 0, 20;

# page N
SELECT * FROM `year_score` WHERE `year` = 2017 ORDER BY id LIMIT (N - 1) * 20, 20;

Using known data from the previous page, it can be rewritten as follows:

# XXXX is the known data, i.e. the last id already returned
SELECT * FROM `year_score` WHERE `year` = 2017 AND id > XXXX ORDER BY id LIMIT 20;

As mentioned in the article "SQL Optimization and Diagnosis", LIMIT stops the scan as soon as enough matching rows have been found, so with this scheme the total number of rows scanned drops sharply and efficiency improves dramatically.

ES

The solution is the same as for MySQL: because the known starting point keeps the offset small, we can keep using the ordinary from/size API without ever worrying about the maximum window limit.
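A minimal sketch of that idea, again assuming the elasticsearch-py 7.x client and a hypothetical year_score index with a numeric id field, where last_id is the largest id the user has already seen:

from elasticsearch import Elasticsearch  # assumption: elasticsearch-py 7.x client

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster address
last_id = 123456  # hypothetical: the largest id already shown on the previous page

resp = es.search(
    index="year_score",
    body={
        "size": 20,  # "from" stays at its default of 0, far below max_result_window
        "sort": [{"id": "asc"}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"year": 2017}},
                    {"range": {"id": {"gt": last_id}}},  # known data replaces the offset
                ]
            }
        },
    },
)
page = resp["hits"]["hits"]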

MongoDB

The scheme is basically the same.
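The original snippet is missing here, but a minimal sketch of the same keyset idea, assuming the pymongo client and a hypothetical t_data collection paged by _id, looks roughly like this:

from pymongo import MongoClient  # assumption: pymongo client

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
coll = client["test"]["t_data"]  # hypothetical database and collection

def next_page(last_id, page_size=20):
    # Filter on the last _id already shown instead of calling skip(),
    # so the server never has to walk over the skipped documents.
    return list(
        coll.find({"_id": {"$gt": last_id}})
            .sort("_id", 1)
            .limit(page_size)
    )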

Related performance tests: (figures omitted)

If you really must support deep random page jumps

Haven't managed to push back on the product manager yet? Don't worry, there is still a chance.

The SQL trick for deep paging is also mentioned in the MySQL optimization article. The code is as follows:

# counterexample (129.570 s)
SELECT * FROM task_result LIMIT 20000000, 10;

# positive example (5.114 s)
SELECT a.* FROM task_result a, (SELECT id FROM task_result LIMIT 20000000, 10) b WHERE a.id = b.id;

# Note: task_result is a production table with a total of about 34 million rows,
# id as the primary key, and an offset of 20 million.

The core of this scheme relies on the clustered index: the subquery reads only primary-key ids, so it can reach the ids at the specified offset quickly without fetching full rows, and the outer query then uses the clustered index to look up just those 10 rows, which is very efficient.

So for MySQL, ES and MongoDB alike we can take the same two-step approach (a sketch follows below):

Restrict the selected fields: run the filtered, deeply paged query but fetch only the primary-key ids.

Then query the rows actually needed by those primary-key ids.

Drawback: when the offset is very large, the first step still takes a while, for example about 5 s in the case above.
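As an example of this two-step approach against ES, here is a sketch with the same assumed client and a hypothetical t_data index; note that in ES the first step is still bounded by max_result_window:

from elasticsearch import Elasticsearch  # assumption: elasticsearch-py 7.x client

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster address

# Step 1: deep-page over ids only, returning no document bodies.
ids_resp = es.search(
    index="t_data",
    body={
        "from": 5000,
        "size": 10,
        "sort": [{"id": "asc"}],
        "_source": False,  # fetch document ids only
        "query": {"match_all": {}},
    },
)
ids = [hit["_id"] for hit in ids_resp["hits"]["hits"]]

# Step 2: fetch the full documents for just those ids.
docs = es.search(index="t_data", body={"query": {"ids": {"values": ids}}})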

That is all on how to handle deep paging over hundreds of millions of rows in a way that works across MySQL, ES and MongoDB. I hope the content above is of some help; if you think the article is good, feel free to share it so more people can see it.
