How to solve the problem of deep paging in mongodb


This article explains how to solve the problem of deep paging in MongoDB. The content is simple and clear and easy to learn; please follow the author's train of thought through the analysis below.

One day a user reported that all of their saved personal templates had disappeared. When I heard this, my first thought was that the user had deleted them, and with daily development tasks pressing I did not look into it further. From then on, however, users reported similar problems every few days. My next suspicion was that a recently released change had introduced a bug that deleted templates, since there had been no such feedback before. I urgently checked the release history of the code and found no template deletion logic in it. At the same time, the template list was loading very slowly and the interface occasionally returned 500, although when a test colleague tried it the data loaded normally. Still skeptical, I began to analyze this part of the business logic and found some clues:

Saving a work as "my template" creates a "personal template" scene, whose ID is stored in the user's data table.

Saved template pages and scene pages are stored in the same MongoDB collection, and that collection holds more than a billion documents.

The query function is paginated, and with a MongoDB collection this large, deep paging performs poorly. The paging code:

query = new Query(Criteria.where("sceneId").is(sceneId))
        .with(new Sort(Sort.Direction.DESC, "id"))
        .skip((page.getPageNo() - 1) * page.getPageSize())
        .limit(page.getPageSize());
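The reason deep pages are slow is that skip() cannot jump via the index: mongod still walks the matching documents in sort order and discards everything before the offset, so the cost of page N grows linearly with N. At page 10,000 with a page size of 20, roughly 200,000 documents are scanned just to be discarded.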


The obvious cause of the lost templates then emerged: when a user deletes the "personal template" scene, the template pages under that scene are deleted with it. With the cause identified, the following optimization scheme came to mind:

Stop showing the "personal template" scene in the scene list query, so users cannot delete it by accident.

Optimize the MongoDB paging query as follows:

if (tplId == null) {
    // first page: fall back to skip/limit
    query = new Query(Criteria.where("sceneId").is(sceneId))
            .with(new Sort(Sort.Direction.DESC, "id"))
            .skip((page.getPageNo() - 1) * page.getPageSize())
            .limit(page.getPageSize());
} else {
    // later pages: seek from the last id of the previous page instead of skipping
    // (seek on the sort key id; the original line applied lte() directly to sceneId)
    query = new Query(Criteria.where("sceneId").is(sceneId).and("id").lte(tplId))
            .with(new Sort(Sort.Direction.DESC, "id"))
            .limit(page.getPageSize());
}
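Here tplId is presumably the id of the last record of the previous page, sent back by the client with the next request: the query seeks straight to that position through the index on id and reads one page forward, so the cost no longer depends on how deep the page is. (With lte the boundary record itself is returned again; a strictly-less-than comparison would avoid the overlap.)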

The optimization plan was soon settled, and the program was pre-released, deployed, tested, and launched. Sure enough, once users could no longer see the "personal template" scene in the list, the reports of missing templates stopped, but the unresponsive loading of the personal template list persisted. Could it be that the MongoDB paging optimization did not work? Not exactly: because the collection holds on the order of a billion documents, and the API's maximum response time is set to 2 seconds (returning 500 beyond that), paging optimization alone cannot fundamentally solve the slow loading. So a new optimization plan was drawn up:

Store scene pages and template pages separately; statistics showed just over 7 million template pages in total, with the rest being scene page data.

Saving as a personal template no longer generates a "personal template" scene; the scene ID is stored in the user table.

Establish a relationship table between userId and pageId in the MySQL library.

The optimized query path looks roughly like this:
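A minimal sketch of that read path, assuming hypothetical names (TplRelDao and findPageIds are illustrative; ScenePageTpl is the mapping class from the consumer code later in the article): page over the small MySQL relation table first, then fetch only those documents by id from the new template collection.

import java.util.List;

import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

public class TplQueryService {

    // DAO over the MySQL uid-pageId relation table (assumed interface)
    interface TplRelDao {
        List<Long> findPageIds(long uid, int offset, int limit);
    }

    private final TplRelDao tplRelDao;
    private final MongoTemplate mongoTemplate;

    public TplQueryService(TplRelDao tplRelDao, MongoTemplate mongoTemplate) {
        this.tplRelDao = tplRelDao;
        this.mongoTemplate = mongoTemplate;
    }

    public List<ScenePageTpl> findUserTpls(long uid, int pageNo, int pageSize) {
        // 1. paginate over the relation table, which is small and indexed on uid
        List<Long> pageIds = tplRelDao.findPageIds(uid, (pageNo - 1) * pageSize, pageSize);
        // 2. load exactly those documents from the ~7 million row template collection
        Query query = new Query(Criteria.where("id").in(pageIds));
        return mongoTemplate.find(query, ScenePageTpl.class);
    }
}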

This raised two questions: with such a large volume of data, how should the existing template pages be migrated, and how should newly generated template pages be stored? After some thought, the solution was:

Double-write template page data: newly inserted data is also stored in the separate MongoDB collection, and the relationship between UID and Tpl is written to MySQL.

Develop a template migration program: from the user's point of view, iterate over every user flagged as having templates, obtain the sceneId, query the template pages, save them to the new collection, and establish the UID-Tpl relationship.

We then proceeded with this optimization scheme:

Step 1: establish the relationship table between UID and Tpl in MySQL, and turn on double writing of the data first.
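A minimal version of this relation table needs little more than an auto-increment primary key, a uid column, a page_id column holding the template page's MongoDB id, a creation timestamp, and a unique key on (uid, page_id); the exact column names here are assumptions rather than the original design, but an index led by uid is what lets the template list query page efficiently per user.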

Step 2: develop the template migration program. There are several ways to migrate: the first is to use an ETL tool for the data migration; the second is to read out the historical data, send it to MQ, and have a set number of consumers process it concurrently in multiple threads. In the end the second seemed the best option, as described below:

The process was defined so as not to affect the normal execution of the business; the general approach is to pull the data to migrate from the database. Next we developed the migration program. First we set up two projects, data-provider and data-consumer: data-provider queries users and sends each "saved as template" scene ID to MQ; data-consumer receives the scene IDs, queries the pages, and saves them to the new template page collection and to the UID-Tpl relation table in the MySQL library, respectively.

The data-provider code is as follows:

@Component
public class TaskTplSyncRunner implements ApplicationRunner {

    private Logger logger = LoggerFactory.getLogger(this.getClass());

    @Autowired
    private TaskService taskService;

    @Override
    public void run(ApplicationArguments args) throws Exception {
        AtomicInteger total = new AtomicInteger(0);
        AtomicInteger count = new AtomicInteger(1);
        // starting sceneId
        Long start = null;
        if (!CollectionUtils.isEmpty(args.getOptionValues("start"))) {
            start = args.getOptionValues("start").get(0) == null
                    ? 1 : Long.valueOf(args.getOptionValues("start").get(0));
        }
        // last sceneId
        Long end = null;
        if (!CollectionUtils.isEmpty(args.getOptionValues("end"))) {
            end = args.getOptionValues("end").get(0) == null
                    ? 1000 : Long.valueOf(args.getOptionValues("end").get(0));
        }
        // the span of each execution
        Integer pageSize = null;
        if (!CollectionUtils.isEmpty(args.getOptionValues("pageSize"))) {
            pageSize = args.getOptionValues("pageSize").get(0) == null
                    ? 2000 : Integer.valueOf(args.getOptionValues("pageSize").get(0));
        }
        logger.info("init start value is= {}, end value is= {}", start, end);
        while (true) {
            Map objectMap = taskService.sendTplMq(start, end);
            if (objectMap.containsKey("endSceneId")) {
                // the last id of this batch becomes the upper bound of the next loop
                end = Long.valueOf(objectMap.get("endSceneId").toString());
                start = end - pageSize;
            }
            count.getAndIncrement();
            if (objectMap.containsKey("total")) {
                total.addAndGet(Integer.valueOf(objectMap.get("total") + ""));
            }
            // the last user has been reached, jump straight out
            if (start < 1101) {
                break;
            }
        }
        logger.info("execute personage tpl sync success, total count {}", total.intValue());
        logger.info("execute personage tpl sync task end..........");
    }
}

// in TaskService:
@Async
public Map sendTplMq(Long start, Long end) {
    Map paramMap = new HashMap();
    paramMap.put("start", start);
    paramMap.put("end", end);
    List sceneList = sceneDao.findSceneList(paramMap);
    Map resultMap = new HashMap();
    sceneList.stream().forEach(scene ->
            amqpTemplate.convertAndSend("exchange.sync.tpl", "scene.tpl.data.sync.test",
                    JsonMapper.getInstance().toJson(scene)));
    resultMap.put("total", sceneList.size());
    resultMap.put("resultFlag", false);
    resultMap.put("endSceneId", start);
    return resultMap;
}

The reason start, end, and pageSize are read from ApplicationArguments is so that if execution is interrupted, the program can be restarted from any position by setting the VM options (program arguments) yourself.

The data-consumer code is as follows:

@Component
public class Receiver {

    Logger logger = LoggerFactory.getLogger(this.getClass());

    @Autowired
    private ScenePageDoc scenePageDoc;

    @RabbitListener(queues = "queue.scene.tpl.data.sync.test", containerFactory = "containerFactory")
    public void receiveTplMessage(String message) {
        Long pageId = null;
        try {
            HashMap hashMap = JsonMapper.getInstance().fromJson(message, HashMap.class);
            pageId = Long.valueOf(String.valueOf(hashMap.get("pageId")));
            // query the page information in the scene_page table
            ScenePage scenePage = scenePageDoc.findPageById(pageId);
            if (scenePage != null) {
                // check whether the page has already been synchronized
                ScenePageTpl scenePageTpl = scenePageDoc.findPageTplById(pageId);
                if (scenePageTpl == null) {
                    scenePageDoc.savePageTpl(scenePage);
                    logger.info("execute sync success pageId value is= {}", pageId);
                    // establish the relationship between UID and tpl
                }
                // delete the page data from the eqs_scene_page table
                scenePageDoc.removeScenePageById(pageId);
            }
        } catch (Exception e) {
            logger.error("error param is = {}", message);
            scenePageDoc.saveSyncError(pageId);
            e.printStackTrace();
        }
    }
}
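Note the findPageTplById check before saving: it makes the consumer idempotent, so re-running the provider over an overlapping sceneId range (for example after an interruption) does not create duplicate template documents.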

The MQ configuration is optimized as follows:

@Configuration
public class MqConfig {

    @Bean
    public SimpleRabbitListenerContainerFactory containerFactory(
            SimpleRabbitListenerContainerFactoryConfigurer configurer,
            ConnectionFactory connectionFactory) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        // set the thread pool (core/max pool size and keep-alive reconstructed; the original line was truncated)
        ExecutorService service = new ThreadPoolExecutor(60, 60, 0L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(), new ThreadPoolExecutor.CallerRunsPolicy());
        factory.setTaskExecutor(service);
        // set the number of consumers
        factory.setConcurrentConsumers(60);
        // turn off ack
        factory.setAcknowledgeMode(AcknowledgeMode.NONE);
        configurer.configure(factory, connectionFactory);
        return factory;
    }
}
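One caveat worth noting (not discussed in the original article): with AcknowledgeMode.NONE the broker treats every message as acknowledged the moment it is delivered, so any message that fails in the consumer is lost rather than redelivered. Here that risk is mitigated by the saveSyncError call above, which records failed pageIds, presumably so they can be replayed later.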

The next step is to deploy the programs:

nohup java -jar -Djava.security.egd=file:/dev/./urandom eqxiu-data-provider-0.0.5.jar --start=81947540 --end=81950540 --pageSize=3000 > /data/logs/tomcat/data-provider/spring.log &

nohup java -jar eqxiu-data-consumer-0.0.5.jar > /data/logs/tomcat/data-consumer/spring.log &
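For what it's worth, the -Djava.security.egd=file:/dev/./urandom flag points the JVM's SecureRandom at a non-blocking entropy source; on Linux servers this avoids long stalls during Spring Boot startup while the default source waits for entropy.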

However, monitoring showed that consumer utilisation was not high.

After looking into it, there are three common reasons for low consumer utilisation:

1. Too few consumers

2. The ack on the consumer side is too slow.

3. Too many consumers.

Since factory.setAcknowledgeMode(AcknowledgeMode.NONE) is set, the second reason could be ruled out, so I adjusted the VM option parameters to speed things up. Soon consumer utilisation stayed above 96%, and the data was fully migrated after the program had run for less than 3 hours.
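The article does not show which parameter values were finally used; as a rough sketch (class name and numbers here are illustrative, not from the original), the usual knobs on SimpleRabbitListenerContainerFactory for raising throughput are the consumer counts:

import org.springframework.amqp.core.AcknowledgeMode;
import org.springframework.amqp.rabbit.config.SimpleRabbitListenerContainerFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.boot.autoconfigure.amqp.SimpleRabbitListenerContainerFactoryConfigurer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TunedMqConfig {

    @Bean
    public SimpleRabbitListenerContainerFactory containerFactory(
            SimpleRabbitListenerContainerFactoryConfigurer configurer,
            ConnectionFactory connectionFactory) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConcurrentConsumers(60);       // initial number of consumers per listener
        factory.setMaxConcurrentConsumers(120);   // let the container add consumers under sustained load
        // note: setPrefetchCount() has no effect here, since AcknowledgeMode.NONE auto-acks on delivery
        factory.setAcknowledgeMode(AcknowledgeMode.NONE);
        configurer.configure(factory, connectionFactory);
        return factory;
    }
}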

After the optimization, the query speed was back to normal and the personal template list loaded quickly.

Thank you for reading. The above is the content of "how to solve the problem of deep paging in mongodb". After studying this article, I believe you have a deeper understanding of how to solve deep paging in MongoDB; the specific usage still needs to be verified in practice.
