Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What are the knowledge points of Fractured Mirrors?

2025-02-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

This article mainly explains "what are the knowledge points of Fractured Mirrors". The content of the explanation is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "what are the knowledge points of Fractured Mirrors"?

Background knowledge

PAX optimizes the cache performance of NSM, while this article focuses on disk IO. NSM is mainly aimed at scenarios with high projection rate and selection rate, while DSM is suitable for scenarios with low projection rate and selection rate.

The projectivity is the number of columns followed by the select statement, that is, the number of columns in the query. A high projection rate means a lot of check. The selection rate (selectivity) is the degree of picky. If the selection rate is high, the result set will be few.

The projection rate and selection rate are high, that is, only a small amount of data is selected, and most of the attributes of these data are taken out, which is more suitable for NSM.

Shortcomings of DSM and its implementation Optimization

It is said that it takes time for DSM to reorganize data. How much time will it take? This article gives a test result.

Use the Line-item table of the TPC-H dataset (16 attributes per row of data) for a total of 1GB data. Using NSM takes up 1.1 GB disk space, and a full table scan takes 74.5 seconds.

DSM (16 relationships, each relational table with one ID column and one attribute column) occupies 2.8GB space

In order to achieve a full table scan, each row of data needs to be reorganized, and the scanning time is 68.29s to 759.39s as the number of columns selected is from 1 to 8 columns. It's a little worse than the NSM model, and it's not much better to choose only one column.

This is contrary to what I have always said that DSM performs well when selecting a small number of columns. There is an implementation problem. The author uses one of the simplest DSM implementations to compare, using slotted page to store each pair of "ID, attribute columns", regardless of whether the attribute column is of fixed or variable length. This way of implementation is a waste of space.

In addition, each attribute needs to carry an ID, which is a waste of space. Although there is almost no cost to the disk now, taking up more space is not a problem, but taking up more space will lead to more IO, which becomes a problem. Therefore, it is necessary to reduce IO data transmission.

Aiming at some problems in the implementation of DSM, the author proposes a sparse B-Tree index and removes the redundant ID columns. On the basis of these DSM optimization, a multi-path merging algorithm based on Chunk is proposed, and the main idea is to change from single-point loading to batch loading.

The author compares the optimized single point loading algorithm, batch loading algorithm and NSM scanning time. This is basically the speed that DSM should have, and you can see that the performance of an idea is also related to its implementation. You can turn a very clever idea into a piece of shit, or you can implement a stupid algorithm very quickly.

Mirror: Raid 1

Mirroring technology is to use two disks to make a copy of the data, both of which can provide access services and do load balancing. The data storage structure of both disks is the same.

Simple version of fractured mirror

All above are talking about the foundation and implementation of DSM and NSM. In fact, this article is mainly based on the mirror image.

Because NSM and DSM are suitable for different scenarios, a simple idea is to have two images, one with NSM and one with DSM. In this way, different query loads can be distributed to different disks for execution, which is fast and which is fast.

This is the easiest way, but there is a problem, that is, the load is unbalanced, maybe the queries are partial to the NSM scenario, then the NSM disk is more busy.

Upgraded version of fractured mirror

In order to solve the problem of load balancing, an upgraded version is proposed. The following figure:

A table, on a disk, the first half of the data is stored in NSM structure, and the second half is stored in DSM. Turn it upside down on the other disk. This is a relatively simple allocation strategy, and it can also be allocated by rotation.

In this way, a query load that is all partial to NSM, for each query, assuming that the data accessed by the query is randomly distributed, the query can also be shared equally between the two disks. Load balancing is no problem.

Synchronization mode

In the original Raid 1, the insert update operation on both disks was the same, but now the physical storage is different. An update operation involving 10 attributes on a row needs only one write on NSM and 10 writes on DSM, which is greatly out of sync.

The solution is to use all the Differential file that resides in memory, record all the changes here, and merge the disk file periodically to ensure that the file is of moderate size.

To delete a record that uses bitmap, you need to merge Differential file and bitmap together when querying.

Query plan generation

After such a query comes, you can generate two optimal plans, the NSM-based optimal plan and the DSM-based optimal plan, which will make the query optimizer more complex. And two query plans need to be generated, which is equivalent to optimizing twice (optimize-twice).

Another way is to search the query plan from the bottom up, generate all possible scan operators and join operators, and then spell out an optimal one. In this way, optimize-twice is avoided and a hybrid query plan (which includes both NSM and DSM) can be generated.

Limitation

First of all, query plan generation is based on rules. The author says that optimization is avoided twice, but I think it is just a mixture of the two query plan generators and scattered to every small step. It is not simple than optimize-twice, and the advantage is that it can generate mixed plans. Due to the different underlying storage, the need to maintain two sets of query engines is relatively large.

There are only two different physical storage structures, NSM and DSM, which apply only to two replicas.

Differential file is a potential performance problem, one disk is down, and the time it takes to recover from another is not measured.

Thank you for your reading, the above is the content of "what are the Fractured Mirrors knowledge points". After the study of this article, I believe you have a deeper understanding of what Fractured Mirrors knowledge points have, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report