Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

[data Jiang Tang] issue 40: multiplier Segmentation technique

2025-03-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

The block segmentation scheme can meet the four goals we have set. However, in addition to the hassle of dealing with block marking, this approach is not very suitable for storage.

After the data is stored separately by column, the synchronization of each column must be ensured when segmenting, that is, the segmentation point of each column corresponds to the column of the same record, otherwise the data will be misplaced. The width of each column is different, the same size of the block in the storage of different column values, can hold the number is different, continue to segment by the block can not guarantee synchronization.

To segment the columns synchronously, you need to segment by the number of records, but instead of using a fixed size block, you need a block index. If the data is no longer appended, a fixed-length index can be established, but as the data is constantly appended, the index will grow dynamically. At this time, it is costly to either rewrite all the data each time the data is appended to ensure the continuity of the index, or to use some complex mechanism to deal with discontiguous indexes.

At present, the commonly used column storage segmentation in the industry is the block scheme: the data is divided into several blocks, the column storage is in the block, and the segmentation is based on the block. The number of chunks should be enough to ensure average segmentation, and the chunks should be large enough to have an effect, which is a contradiction, and it is only appropriate when there is a large amount of data. Moreover, this block is divided by the number of records, can not be fixed size, we need the above-mentioned block index, when the data is constantly added, the index will become larger dynamically, the continuity can not be guaranteed.

In order to solve these problems, we design a multiplier segmentation scheme.

Set aside a fixed-length index area, you can save N location information, N is a fixed number, such as 1024. We use the I bit of the index area to represent the I location saved in it.

There is no record in the initial state. After adding the first record, fill in the position where the record (called record 1) was written into the storage (such as a file) in the first place of the index area; after adding the second record, fill in the position of record 2 in the second position of the index area. After adding the section N record, fill in the position of record N in the index area position N.

The I bit of the index area can be regarded as a block, corresponding to all records (including head and tail) from the record pointed to by the I bit content to the record pointed to by the I + 1 bit content. In this round of adding leaves, there is only one record in each segment.

When we continue to add records, there is no room in the index area. At this time, we do such an operation: bit 1 is reserved, bit 2 is filled with the contents of the original number 3, and bit 3 is filled with the contents of bit 5. Fill in the contents of the original position 2i-1 in bit I. This lasts until you fill in the original 2*N/2-1, that is, N-1, in position Namp 2, and then clear the contents of positions N / 2 from 1 to N.

This operation is equivalent to merging Block 1 and Block 2 into Block 1, Block 3 and Block 4 into Block 2.. Block 2i-1 and block 2i are merged into block I, and these blocks are composed of two records, and the data records are written continuously, so that the merged block is still composed of continuous records.

Then, emptying the second half of the number bit is equivalent to vacating some empty blocks. if you continue to add records, you will use the next number bit every time you write 2 records, that is, fill the position of record Nasty 1 into the position of Number 1, record Numb2 to continue writing, and record Numb3 to fill in bit 2, and record Numb4 to continue to write.

After adding and filling all the positions, do the merge action again, doubling the record number of each block to 4. At the same time, the second half of the digits are vacated, and when additional data is added, each bit needs to be added by 4 records before the next bit is used. And so on and on.

Under this mechanism, there are 2 to N available blocks at any time. As long as N is large enough (1024 is basically enough), a higher average degree of segmentation can be achieved. Each segment consists of continuous blocks, which are composed of continuous and compact records, which ensures goal 3, and the above algorithm process has solved goal 4. Moreover, the traversal process does not need to deal with tags like the fixed block scheme, but can be read continuously from the start position of the segment to the end position, and the action is very simple.

The multiplier segmentation scheme is based on the number of records, so it is also suitable for storage. After each column appends data in this way, the segmentation points always fall on the same record for each column, and there is no dislocation. Moreover, the data of each column is always continuous and there is no break point. Unlike the block method, it can only be continuous within the block, and it will not face the contradiction between the size and quantity of the block.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report