2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
The group table is a high-performance storage format provided by esProc. It works by sorting the data in advance and storing it compactly in compressed form. The advantages are that it takes up less space and that records can be located quickly by key order.
However, this storage method runs into trouble when the data is updated. New data has to be sorted and compressed together with the historical data, which often means rewriting the entire group table. That is very time-consuming, but it cannot be avoided in the general case.
In some scenarios, though, there are high-performance ways to update the data. Let's take a look.
Tail update
Group tables allow a small amount of data to be modified in place. Once the modifications accumulate, however, a reset (reorganization) is needed, otherwise query performance suffers, because the modified part cannot be stored in compact, compressed form (see the other documents for details).
But a reset is equivalent to rewriting the entire group table, which takes a long time when the group table is large. Is there a faster way?
If the modified records are all recent, that is, they sit near the end of the key order, you can use reset@q when reorganizing the group table, which can greatly reduce the reorganization time.
The principle is as follows:
Group table data is divided into two areas: a compact, efficient body area and a loose, less efficient supplement area. Modified data is stored only in the supplement area and does not change the body area. A plain reset rewrites the entire file, merging the supplement area into the body area. With reset@q, the group table first finds the reorganization position in the body area (the position of the earliest modified record held in the supplement area), and then merges only the data after that position with the supplement area.
For example, suppose the group table file stores data from January to December with time as the key, and some December data has been changed. With reset@q, the January to November data is left untouched; only the December data in the body area is merged with the supplement area, so the reorganization involves just one month of data.
Without the @q option, reset rewrites all the data from January to December. That involves 12 months of data and takes much longer.
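The difference between the two kinds of reorganization can be modeled in a few lines of Python. This is not esProc code and says nothing about how esProc implements reset internally; it is a minimal sketch that assumes the body area is a list sorted by key and the supplement area is a dictionary of changed records. The names reset_full and reset_tail are hypothetical.

```python
from bisect import bisect_left

def reset_full(body, supplement):
    """Model of a plain reset: every record is merged and rewritten."""
    merged = dict(body)
    merged.update(supplement)
    new_body = sorted(merged.items())
    return new_body, len(new_body)

def reset_tail(body, supplement):
    """Model of a tail reset: rewrite only from the earliest modified key onward."""
    if not supplement:
        return body, 0
    keys = [k for k, _ in body]
    start = bisect_left(keys, min(supplement))  # reorganization position
    tail = dict(body[start:])
    tail.update(supplement)
    new_tail = sorted(tail.items())
    return body[:start] + new_tail, len(new_tail)

# Keys are (month, day); 12 months with 28 records each.
body = [((m, d), f"v{m}-{d}") for m in range(1, 13) for d in range(1, 29)]
# One changed December record and one new December record.
supplement = {(12, 5): "changed", (12, 30): "new"}

full_body, full_rewritten = reset_full(body, supplement)
tail_body, tail_rewritten = reset_tail(body, supplement)
```

Both functions produce the same final data, but the tail version rewrites only the records from the earliest modified key onward, while the full version rewrites all of them.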
Data not sorted by time
Querying history by account is a very common requirement. Sorting the data by account is enough to get good query performance, even under high concurrency.
For example, the following code sorts historical tax returns by account and generates a group table:
A1  =orcl.cursor@x("select * from tax where declareTime>=? order by cardNo,declareTime",date("2019-01-01"))
A2  =file("taxHis.ctx").create(#cardNo,#declareTime,tax,area,declaretype,unit,network)    / create the new group table
A3  =A2.append(A1)    / write the data to the group table
Queries by account against this ordered group table then perform well:
A1  =file("taxHis.ctx")    / group table file object
A2  =A1.create().cursor(;cardNo=="010319760818002X")    / query the group table
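Why ordering by account helps can be illustrated outside esProc. The following Python sketch assumes the records are held in a list sorted by (cardNo, declareTime); a binary search then finds one account's contiguous block without scanning the rest. The sample records and the function name query_by_account are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records: (cardNo, declareTime, tax), sorted by cardNo then declareTime.
records = sorted([
    ("A002", "2019-03-01", 80.0),
    ("A001", "2019-01-15", 120.0),
    ("A003", "2019-02-10", 45.5),
    ("A001", "2019-02-15", 130.0),
    ("A002", "2019-01-20", 75.0),
])
# Key column kept separately so each lookup is a pure binary search.
card_nos = [r[0] for r in records]

def query_by_account(card_no):
    """Return the contiguous block of records for one account."""
    lo = bisect_left(card_nos, card_no)
    hi = bisect_right(card_nos, card_no)
    return records[lo:hi]

history_a001 = query_by_account("A001")
history_missing = query_by_account("A999")
```

Because all records of an account are adjacent, the lookup touches only that block, which is the property the sorted group table relies on.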
However, the production system keeps generating new data, and that data is not sorted by account (it usually arrives in order of generation time). To add the new data to the history, the two must be merged and sorted together.
The group table provides the append@m function, which automatically merges and sorts the new data with the historical data. Its time cost is high, though: even a merge sort has to rewrite all of the historical data, which is often very large.
So is there a good way to reduce the time spent on merging and sorting, and thereby improve update performance?
This problem can be solved with group table filegroups.
A filegroup presents multiple files with the same structure as a single file. Logically it can be used like an ordinary file, that is, it supports all the functions of an ordinary file. In particular, filegroups also support automatic merging.
We can divide the data into two files: a history group table and an incremental group table. Queries run against the filegroup made up of these two files, which is equivalent to querying a single group table. Each regular update only performs a small merge into the incremental group table, which improves the performance of daily updates. Once the incremental data has accumulated to a certain size, the incremental group table is merged into the history group table.
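The history-plus-increment scheme can be modeled as follows. This is a Python sketch of the idea only, not the esProc filegroup API; the class FileGroupModel and its method names are hypothetical, and both parts are assumed to be lists sorted by (cardNo, declareTime).

```python
import heapq

class FileGroupModel:
    """Toy model: a large sorted history part plus a small sorted increment part."""

    def __init__(self, history, increment=None):
        self.history = list(history)
        self.increment = list(increment or [])

    def cursor(self):
        # Queries see one ordered stream, merged on the fly from both parts.
        return heapq.merge(self.history, self.increment)

    def daily_append(self, new_rows):
        # Only the small increment part is rewritten; history is untouched.
        self.increment = list(heapq.merge(self.increment, sorted(new_rows)))
        return len(self.increment)  # number of rows rewritten

    def periodic_reset(self):
        # Occasionally fold the increment into the history part.
        self.history = list(heapq.merge(self.history, self.increment))
        self.increment = []
        return len(self.history)  # number of rows rewritten

history = [("A001", "2019-01-05"), ("A002", "2019-01-07"), ("A003", "2019-01-02")]
fg = FileGroupModel(history)
daily_rewritten = fg.daily_append([("A002", "2019-02-01"), ("A001", "2019-02-01")])
query_result = list(fg.cursor())
reset_rewritten = fg.periodic_reset()
```

The daily update rewrites only the two incremental rows, while the query still sees all five rows in account order. The expensive full rewrite happens only in periodic_reset.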
For example, if 100,000 tax returns are added per day, the following script can be executed every day to process the new data:
A1  if day(now())==1    B1  =file(["taxHis.ctx","taxMon.ctx"]).reset@m()    C1  / reorganize at the beginning of the month
A2  =orcl.cursor@x("select * from tax where declareTime>=? and declareTime ...