This article focuses on how to quickly process a large amount of data in a database. The method described is simple, fast, and practical, and is illustrated with a real table-merging task.
Background
Merge hundreds of tables with the same structure (denoted Tn) into a single table (denoted C).
The data volume of the T tables is very unevenly distributed, ranging from single-digit rows to hundreds of thousands of rows per table.
There is no business relationship between the T tables.
Table C adds several fields to the T table structure, so a plain INSERT INTO ... SELECT * cannot be used; each row has to be transformed before insertion.
The total data volume is about 3 million rows. A single-process test showed a processing speed of about 500 rows/s, giving an estimated total time of about 100 min (3,000,000 / 500 = 6,000 s).
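For concreteness, here is a minimal sketch of the single-process baseline, assuming pymysql and a hypothetical schema (an id and a payload column in each Tn, plus extra source_table and migrated_at columns in C); the real column names are not given in the article.

# Single-process baseline: read each Tn table and insert its rows into C.
# Hypothetical schema: Tn(id, payload), C(src_id, payload, source_table, migrated_at).
import pymysql
from datetime import datetime

conn = pymysql.connect(host="127.0.0.1", user="app", password="secret", database="demo")

def merge_table(table_name):
    with conn.cursor() as cur:
        cur.execute(f"SELECT id, payload FROM {table_name}")
        rows = cur.fetchall()
        cur.executemany(
            "INSERT INTO C (src_id, payload, source_table, migrated_at) "
            "VALUES (%s, %s, %s, %s)",
            [(r[0], r[1], table_name, datetime.now()) for r in rows],
        )
    conn.commit()

for i in range(1, 301):            # "hundreds of tables": T1 ... T300 is an assumed naming scheme
    merge_table(f"T{i}")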
Goal
Maximize the processing speed and cut the total time to about 10 min, which requires a write speed to table C of about 5,000 rows/s (3,000,000 rows / 600 s).
Solution evolution
Option 1
Because there is no business relationship between the T tables, each table can be processed independently.
Sort the T tables by row count and assign N tables to each process, so that the load is balanced across processes as far as possible.
The problem: the data volume of the T tables is extremely uneven. Several tables hold about 700,000 rows, so the total time is bounded by roughly 700,000 rows / 500 rows/s ≈ 1,400 s (about 23 min); the large-table bottleneck is serious.
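The sorting-and-assignment step of Option 1 can be illustrated with a simple greedy balancer (the row counts below are made up, and the article does not specify the exact assignment algorithm):

# Greedy load balancing: always hand the next-largest table to the least-loaded process.
import heapq

def assign_tables(table_rows, num_workers):
    # table_rows: dict mapping table name -> row count
    heap = [(0, i, []) for i in range(num_workers)]      # (assigned rows, worker id, tables)
    heapq.heapify(heap)
    for name, rows in sorted(table_rows.items(), key=lambda kv: kv[1], reverse=True):
        load, wid, tables = heapq.heappop(heap)
        tables.append(name)
        heapq.heappush(heap, (load + rows, wid, tables))
    return {wid: (load, tables) for load, wid, tables in heap}

print(assign_tables({"T1": 700_000, "T2": 120_000, "T3": 35_000, "T4": 9}, num_workers=2))
# However well the tables are spread out, one worker still owns the whole 700,000-row table.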
Option 2
Building on Option 1, the large-table bottleneck can be removed by parallelizing along the dimension of table + data range, so that a single large table is worked on by several processes at once (see the sketch after this list).
The problem: the code becomes complex, because it has to take into account
the row count of each T table
how to segment the T tables with large row counts
how to avoid processing the same data twice
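A minimal sketch of the "table + data" segmentation, assuming each T table has an auto-increment primary key id (the article does not describe the key):

# Split a large T table into contiguous id ranges so several processes can copy
# different segments of the same table without overlapping or duplicating rows.
def split_by_id(min_id, max_id, num_segments):
    step = (max_id - min_id + num_segments) // num_segments
    segments = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step - 1, max_id)
        segments.append((lo, hi))      # one worker handles rows WHERE id BETWEEN lo AND hi
        lo = hi + 1
    return segments

print(split_by_id(1, 700_000, 4))
# [(1, 175000), (175001, 350000), (350001, 525000), (525001, 700000)]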
Option 3
With the help of Redis's pub/sub mechanism, production and consumption are separated (a sketch follows below).
The producer publishes the table name + ID of each T-table row to a channel; the number of channels equals the number of consumer processes.
On the consumer side, each process subscribes to a different channel, reads the table name + ID, and writes the corresponding row into table C.
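A minimal pub/sub sketch with redis-py; the channel names, the "table:id" message format, and the write_to_c helper are assumptions, not the article's actual code:

# Pub/sub: the producer spreads "table:id" messages over N channels and each
# consumer process subscribes to exactly one of them.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
NUM_CHANNELS = 10

def produce(records):
    # records: iterable of (table_name, row_id) pairs
    for i, (table, row_id) in enumerate(records):
        r.publish(f"merge-chan-{i % NUM_CHANNELS}", f"{table}:{row_id}")

def consume(channel_index):
    p = r.pubsub(ignore_subscribe_messages=True)
    p.subscribe(f"merge-chan-{channel_index}")
    for msg in p.listen():
        table, row_id = msg["data"].decode().split(":")
        write_to_c(table, int(row_id))     # hypothetical helper that copies one row into C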
Option 4
This is a variant of Option 3, using Redis's List to separate production from consumption.
The producer writes the table name + ID of each T-table row to the List.
The consumers read from the List and write the rows identified by table name + ID into table C.
Compared with Option 3, the advantage is that the code logic is simpler and neither the producer nor the consumers need to do any load balancing: each consumer simply pulls work as fast as it can, so multiple consumer processes finish at roughly the same time (see the producer and consumer sketches under Implementation details below).
Implementation details
Option 4 was finally adopted.
Producer
Read each T table in turn and write the table name + ID to the List. Note that List supports batch writes: pushing 100 items per call gives a write rate of about 50,000 items/s.
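A minimal producer sketch with redis-py, batching 100 items per RPUSH call; the queue name merge-queue and the id column are assumptions (conn is a database connection as in the baseline sketch above):

# Producer: push "table:id" items onto a Redis List, 100 items per round trip.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
BATCH = 100

def produce(conn, table_names):
    buf = []
    for table in table_names:
        with conn.cursor() as cur:
            cur.execute(f"SELECT id FROM {table}")
            for (row_id,) in cur.fetchall():
                buf.append(f"{table}:{row_id}")
                if len(buf) == BATCH:
                    r.rpush("merge-queue", *buf)    # one batched write of 100 items
                    buf.clear()
    if buf:
        r.rpush("merge-queue", *buf)                # flush the last partial batch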
Consumer
A single consumer process handles about 300 rows/s, so 10 consumer processes reach an overall rate of about 3,000 rows/s. If the database's write speed allows, the number of consumer processes can be increased further.
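A minimal consumer sketch: each of the 10 processes runs the same blocking-pop loop, and because BLPOP removes an item atomically no row is processed twice; write_to_c is again a hypothetical helper for the insert into table C:

# Consumer: every process pops items from the shared List and writes them into C.
import redis
from multiprocessing import Process

def consume():
    r = redis.Redis(host="127.0.0.1", port=6379)
    while True:
        item = r.blpop("merge-queue", timeout=10)
        if item is None:                  # queue empty for 10 s: assume the producer is done
            break
        _, payload = item                 # blpop returns a (queue name, value) pair
        table, row_id = payload.decode().split(":")
        write_to_c(table, int(row_id))    # hypothetical helper that copies one row into C

if __name__ == "__main__":
    workers = [Process(target=consume) for _ in range(10)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()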
At this point you should have a better understanding of how to quickly process a large amount of data in a database; the best way to consolidate it is to try the approach in practice.