Key knowledge points of OLAP systems for big data

2025-01-30 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

This article covers the key points of OLAP systems for big data: the problems of producing data with Kylin's MOLAP mode, the trade-offs between MOLAP and ROLAP, and how we chose an MPP engine.

Challenges in data production

Data volumes are exploding, and every day the historical data has to be recalculated retrospectively using the latest dimension values. In this situation, Kylin's MOLAP mode has the following problems:

Historical data is refreshed every day, which defeats the purpose of incremental builds.

The daily backfill covers a large volume of history: more than 1 billion rows.

Computation takes more than 3 hours and storage reaches 1 TB, consuming a lot of compute and storage resources and seriously affecting SLA stability.

Most of the pre-computed history is rarely used. In practice, 80% of historical lookbacks fall within roughly the last month, but to cover every possible scenario the business requires more than half a year of history to be computed.

Querying detail data is not supported.
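To illustrate why a dimension change forces a backfill, here is a minimal Python sketch with hypothetical merchant IDs, organization names and amounts. A MOLAP-style aggregate keyed by organization bakes in the merchant-to-organization mapping that was in force at build time, so when a single merchant is reassigned, every historical day on which that merchant had activity becomes stale.

```python
from collections import defaultdict

# Hypothetical detail rows at merchant granularity: (day, merchant_id, amount)
DETAIL = [
    ("2019-01-01", "m1", 100), ("2019-01-01", "m2", 50),
    ("2019-01-02", "m1", 80),  ("2019-01-02", "m3", 20),
    ("2019-01-03", "m2", 70),  ("2019-01-03", "m3", 30),
]

def build_cube(detail, merchant_to_org):
    """Pre-aggregate (day, org) -> total amount using the mapping at build time."""
    cube = defaultdict(int)
    for day, merchant, amount in detail:
        cube[(day, merchant_to_org[merchant])] += amount
    return dict(cube)

def stale_days(detail, moved_merchants):
    """Days whose pre-aggregated rows are wrong once these merchants change org."""
    return sorted({day for day, merchant, _ in detail if merchant in moved_merchants})

old_mapping = {"m1": "org_a", "m2": "org_a", "m3": "org_b"}
new_mapping = {**old_mapping, "m2": "org_b"}  # m2 is reassigned to org_b

old_cube = build_cube(DETAIL, old_mapping)
new_cube = build_cube(DETAIL, new_mapping)
print(old_cube == new_cube)        # False: the existing cube no longer matches the latest dimension
print(stale_days(DETAIL, {"m2"}))  # ['2019-01-01', '2019-01-03']
```

With many merchants reassigned every day and half a year of history, the stale set can cover most of the history, which is how a daily build degenerates into a full refresh.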

We therefore introduced an MPP engine so that data is computed on demand, at the moment it is used.

Pre-computing historical data is very costly, and the better approach is to compute it only when it is actually needed; this, however, requires strong parallel computing power.

OLAP is implemented in three forms: MOLAP (multidimensional OLAP), ROLAP (relational OLAP) and HOLAP (hybrid OLAP).

MOLAP is typified by the cube, but computing and managing cubes is costly.

ROLAP needs a powerful relational database engine to support it.

For a long time, the limited processing capacity of traditional relational DBMSs held the ROLAP model back. As distributed and parallel technologies have matured, MPP engines have shown strong high-throughput, low-latency computing power. Many engines now claim second-level responses over hundreds of millions of rows, so the ROLAP mode can be applied much more widely. From the standpoint of practical business applications alone, this performance covers many scenarios and makes on-the-fly join queries over tens of millions of rows feasible. For example, on-the-fly ROLAP computation over daily data, weekly and monthly trend calculations, and browsing of detail data can all be handled well.
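As a rough illustration of on-the-fly ROLAP computation, the sketch below keeps only daily detail rows and derives weekly and monthly trends at query time instead of pre-computing them. It is plain Python standing in for what an MPP engine would do with a `GROUP BY`; the data and the `rollup` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily detail rows: (day, merchant_id, orders)
DAILY = [
    (date(2019, 12, 30), "m1", 5),
    (date(2019, 12, 31), "m2", 3),
    (date(2020, 1, 1), "m1", 4),
    (date(2020, 1, 6), "m2", 7),
]

def rollup(rows, period):
    """Aggregate daily rows to a coarser period chosen at query time."""
    keyers = {
        "day": lambda d: d.isoformat(),
        # ISO year and week number, e.g. "2020-W01"
        "week": lambda d: "{}-W{:02d}".format(*d.isocalendar()[:2]),
        "month": lambda d: d.strftime("%Y-%m"),
    }
    key = keyers[period]
    totals = defaultdict(int)
    for day, _merchant, orders in rows:
        totals[key(day)] += orders
    return dict(sorted(totals.items()))

print(rollup(DAILY, "week"))   # {'2020-W01': 12, '2020-W02': 7}
print(rollup(DAILY, "month"))  # {'2019-12': 8, '2020-01': 11}
```

Because nothing is pre-aggregated, a new period (or a new grouping key) is just another key function, and the same rows still serve detail browsing.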

Disadvantages of the MOLAP model

The application-layer model is complex: to satisfy both business needs and Kylin's production requirements, a lot of model preprocessing is needed, and the resulting models are poorly reused across business scenarios.

Configuring Kylin is tedious: the model has to be designed carefully and combined with an appropriate "pruning" strategy to balance computation cost against query efficiency.

Because MOLAP does not support querying detail data, in "summary + details" scenarios the detail data has to be synchronized to a separate DBMS engine to serve interactive queries, which increases operation and maintenance costs.

More preprocessing also means higher production costs.
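Pruning matters because a cube over n dimensions has 2^n cuboids. The sketch below counts how many survive under a simplified model of the aggregation-group rules Kylin offers (mandatory, hierarchy and joint dimensions). The dimension names are hypothetical, and the rules are an approximation for illustration; real Kylin cube planning involves more than this.

```python
from itertools import combinations

def all_cuboids(dims):
    """Every subset of the dimensions is a cuboid, so there are 2**len(dims)."""
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]

def keep(cuboid, mandatory=(), hierarchies=(), joints=()):
    """Simplified pruning rules (an assumption modeled on Kylin's aggregation groups)."""
    # Mandatory dimensions must appear in every cuboid.
    if not set(mandatory) <= cuboid:
        return False
    # Hierarchy, listed parent first: a level is kept only together with all its parents.
    for levels in hierarchies:
        for i, level in enumerate(levels):
            if level in cuboid and not set(levels[:i]) <= cuboid:
                return False
    # Joint dimensions appear all together or not at all.
    for group in joints:
        present = set(group) & cuboid
        if present and present != set(group):
            return False
    return True

dims = ["date", "region", "province", "city", "channel", "category"]
full = all_cuboids(dims)
pruned = [c for c in full
          if keep(c, mandatory=["date"],
                  hierarchies=[["region", "province", "city"]],
                  joints=[["channel", "category"]])]
print(len(full), len(pruned))  # 64 8
```

Each mandatory dimension halves the count, a hierarchy of k levels contributes k + 1 combinations instead of 2^k, and a joint group contributes 2 instead of 2^k: here 1 × 4 × 2 = 8.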

Advantages of the ROLAP model

Application-layer model design is simplified: data can be kept at a stable granularity, such as a star schema at business granularity, and such models are highly reusable.

Business expressions at the App layer can be encapsulated in views, which reduces data redundancy, makes applications more flexible and lowers operation and maintenance costs.

"Summary + details" queries are supported as well.

The models are lightweight and standardized, which greatly reduces production costs.
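A minimal sketch of this modeling style, using Python's built-in `sqlite3` module as a stand-in for the MPP engine (table, column and view names are hypothetical, and the SQL has not been checked against Doris): the fact table stays at merchant granularity, a dimension table carries each merchant's organization, a view encapsulates the business expression, and the same view answers both summary and detail queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Fact table kept at a stable (day, merchant) granularity
    CREATE TABLE fact_sales (day TEXT, merchant_id TEXT, amount INTEGER);
    -- Dimension table holding each merchant's current organization
    CREATE TABLE dim_merchant (merchant_id TEXT PRIMARY KEY, org TEXT);

    INSERT INTO fact_sales VALUES
        ('2019-01-01', 'm1', 100), ('2019-01-01', 'm2', 50),
        ('2019-01-02', 'm1', 80),  ('2019-01-02', 'm3', 20);
    INSERT INTO dim_merchant VALUES ('m1', 'org_a'), ('m2', 'org_a'), ('m3', 'org_b');

    -- App-layer business expression wrapped in a view instead of a copied table
    CREATE VIEW v_sales AS
        SELECT f.day, f.merchant_id, d.org, f.amount
        FROM fact_sales AS f
        JOIN dim_merchant AS d ON f.merchant_id = d.merchant_id;
""")

# Summary: totals per organization, computed on the fly
summary = conn.execute(
    "SELECT org, SUM(amount) FROM v_sales GROUP BY org ORDER BY org").fetchall()
# Details: drill into the rows behind one summary cell, from the same view
details = conn.execute(
    "SELECT day, merchant_id, amount FROM v_sales WHERE org = ? "
    "ORDER BY day, merchant_id", ("org_a",)).fetchall()
print(summary)  # [('org_a', 230), ('org_b', 20)]
print(details)
```

No aggregate table is materialized: changing the business expression means redefining the view, not re-running a production job.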

In short, for scenarios with changing dimensions, non-preset dimensions and fine-grained statistics, the ROLAP mode driven by an MPP engine simplifies model design, reduces pre-computation costs and, thanks to strong real-time computing power, delivers a good interactive experience.

Matching application scenarios with twin engines

Architecturally, a MOLAP + ROLAP twin-engine mode is used so that each application scenario is served by the engine that suits it.
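The article does not describe how queries are routed between the two engines. The following is one hypothetical way to express the criteria it gives for choosing between MOLAP and ROLAP (fixed versus flexible dimensions, need for detail data, use of the latest dimension state) as a routing rule. The `Query` fields and `CUBE_DIMENSIONS` are assumptions for illustration, not part of Kylin or Doris.

```python
from dataclasses import dataclass, field

# Hypothetical: dimensions covered by pre-computed Kylin cubes
CUBE_DIMENSIONS = {"date", "org", "channel"}

@dataclass
class Query:
    dimensions: set = field(default_factory=set)
    needs_detail: bool = False           # "summary + details" drill-down
    uses_latest_dimension: bool = False  # e.g. view history by today's org ownership

def choose_engine(q: Query) -> str:
    """Route to MOLAP only when a pre-computed cube can answer the query as-is."""
    if q.needs_detail or q.uses_latest_dimension:
        return "ROLAP"  # cubes hold no detail rows and bake in past dimension values
    if q.dimensions <= CUBE_DIMENSIONS:
        return "MOLAP"  # fixed dimensions: serve from pre-computed slices
    return "ROLAP"      # non-preset dimensions: compute on the fly in the MPP engine

print(choose_engine(Query({"date", "org"})))                     # MOLAP
print(choose_engine(Query({"date", "merchant"})))                # ROLAP
print(choose_engine(Query({"date", "org"}, needs_detail=True)))  # ROLAP
```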

Technical trade-offs

MOLAP: pre-computation provides stable slice data on a "compute once, query many times" basis, which reduces the computational load at query time and keeps query performance stable; it is the classic "trade space for time" approach. Its Bitmap-based deduplication algorithm efficiently supports real-time distinct-count statistics for multiple metrics across different dimensions.
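Why bitmaps help: distinct counts cannot simply be added when rolling up (a user active in two slices would be counted twice), but bitmaps of user IDs can be OR-ed together and counted at any level. The sketch below uses a Python integer as an uncompressed bitmap and assumes user IDs have already been mapped to small non-negative integers; production systems typically use compressed bitmaps such as Roaring bitmaps. The slices are hypothetical.

```python
def to_bitmap(user_ids):
    """Set bit i for each integer user ID i (IDs assumed to be small non-negative ints)."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap

def bitmap_count(bitmap):
    """Number of distinct users = number of set bits."""
    return bin(bitmap).count("1")

# Hypothetical pre-computed slices: (day, channel) -> bitmap of active users
slices = {
    ("2019-01-01", "app"): to_bitmap([1, 2, 3]),
    ("2019-01-01", "web"): to_bitmap([3, 4]),
    ("2019-01-02", "app"): to_bitmap([2, 5]),
}

# Summing per-slice distinct counts over-counts users present in several slices...
naive = sum(bitmap_count(b) for b in slices.values())
# ...while OR-ing the bitmaps gives the exact distinct count for any roll-up.
merged = 0
for b in slices.values():
    merged |= b
print(naive, bitmap_count(merged))  # 7 5

# Roll up to one dimension value: channel = "app" across all days
app = 0
for (_day, channel), b in slices.items():
    if channel == "app":
        app |= b
print(bitmap_count(app))  # 4
```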

ROLAP: relies on real-time, massively parallel computation, which places higher demands on the cluster.

The core of an MPP engine is distributing data so that CPU, I/O and memory resources are distributed too, which raises parallel computing capacity. With data currently stored on disk, the heavy disk I/O required by data scans and the high CPU load caused by parallelism remain resource bottlenecks. High-frequency, large-scale summary statistics and high concurrency therefore pose major challenges and depend on the parallel computing capacity of the cluster hardware. Traditional deduplication algorithms consume a lot of computing resources, and real-time large-scale deduplication puts heavy pressure on CPU and memory. The latest version of Doris already supports the Bitmap algorithm, which handles deduplication scenarios very well.
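To make the MPP idea concrete, here is a minimal single-machine sketch with hypothetical data, where threads stand in for cluster nodes: rows are hash-distributed across nodes by user ID, each node computes partial aggregates over its own shard, and a coordinator merges them. Because every user lands on exactly one node, per-node distinct counts can be added without double counting. This is one standard way to avoid a global deduplication step, not a description of Doris internals.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib

NODES = 3

# Hypothetical detail rows: (user_id, merchant_id, amount)
ROWS = [("u1", "m1", 10), ("u2", "m1", 5), ("u1", "m2", 7),
        ("u3", "m2", 3), ("u4", "m1", 8), ("u2", "m2", 2)]

def shard_of(user_id: str) -> int:
    """Deterministic hash distribution on the user key."""
    return zlib.crc32(user_id.encode()) % NODES

def local_aggregate(shard):
    """Work done on one node: partial sums per merchant and local distinct users."""
    amount_by_merchant = Counter()
    users = set()
    for user_id, merchant_id, amount in shard:
        amount_by_merchant[merchant_id] += amount
        users.add(user_id)
    return amount_by_merchant, len(users)

# Distribute rows to nodes
shards = [[] for _ in range(NODES)]
for row in ROWS:
    shards[shard_of(row[0])].append(row)

# Each node aggregates its shard in parallel
with ThreadPoolExecutor(max_workers=NODES) as pool:
    partials = list(pool.map(local_aggregate, shards))

# Coordinator merges partial results
totals = Counter()
distinct_users = 0
for amounts, n_users in partials:
    totals.update(amounts)      # Counter.update adds counts
    distinct_users += n_users   # safe: each user lives on exactly one shard

print(totals["m1"], totals["m2"], distinct_users)  # 23 12 4
```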

MOLAP: when the business analysis dimensions are relatively fixed and historical state can be reused, data is produced incrementally by time, so processing cost grows linearly. The data is processed to a coarser granularity (such as organizational unit), which shrinks the resulting data volume and improves interaction efficiency. As shown in the figure above, using Kylin to pre-compute from model A to model B is a good choice.
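When dimensions are stable, a daily MOLAP build only has to process the new date partition and add it to the pre-aggregated store, so build work grows linearly with the data added. A minimal sketch aggregating merchant-level rows up to organization granularity; `AggregateStore`, `build_day` and the mapping are hypothetical names for illustration.

```python
from collections import defaultdict

# Hypothetical, stable merchant -> organization mapping
MERCHANT_TO_ORG = {"m1": "org_a", "m2": "org_a", "m3": "org_b"}

class AggregateStore:
    """Pre-aggregated (day, org) -> amount, built one date partition at a time."""

    def __init__(self):
        self.cells = {}
        self.rows_processed = 0

    def build_day(self, day, detail_rows):
        """Incremental build: touch only this day's rows; earlier days stay as they are."""
        partition = defaultdict(int)
        for merchant_id, amount in detail_rows:
            partition[(day, MERCHANT_TO_ORG[merchant_id])] += amount
        self.cells.update(partition)
        self.rows_processed += len(detail_rows)

    def query(self, org):
        """Answer from the coarse, much smaller aggregate instead of the detail rows."""
        return sum(v for (_day, o), v in self.cells.items() if o == org)

store = AggregateStore()
store.build_day("2019-01-01", [("m1", 100), ("m2", 50), ("m3", 20)])
store.build_day("2019-01-02", [("m1", 80), ("m3", 30)])
print(store.query("org_a"), store.rows_processed)  # 230 5
```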

ROLAP: when the business analysis dimensions are flexible or must reflect the latest state (as with model A in the figure above, where history is always viewed using the latest organizational ownership), pre-computing and backfilling historical data is costly. In this scenario the data is kept at merchant granularity, and history is analyzed retrospectively through on-the-fly computation at query time. This avoids the huge cost of pre-computation and brings greater application flexibility, which makes the scenario a good fit for the ROLAP production mode backed by an MPP engine.
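In the ROLAP approach the facts stay at merchant granularity and the latest merchant-to-organization mapping is applied only at query time, so a reassignment needs no backfill. A minimal sketch with hypothetical data; `org_history` is an illustrative name, and the loop plays the role of the join plus `GROUP BY` the MPP engine would run.

```python
from collections import defaultdict

# Facts stored once at merchant granularity: (day, merchant_id, amount)
FACTS = [
    ("2019-01-01", "m1", 100), ("2019-01-01", "m2", 50),
    ("2019-01-02", "m1", 80),  ("2019-01-02", "m2", 20),
]

def org_history(facts, latest_mapping):
    """Join with the current mapping at query time and aggregate by (day, org)."""
    result = defaultdict(int)
    for day, merchant_id, amount in facts:
        result[(day, latest_mapping[merchant_id])] += amount
    return dict(sorted(result.items()))

mapping = {"m1": "org_a", "m2": "org_a"}
print(org_history(FACTS, mapping))
# {('2019-01-01', 'org_a'): 150, ('2019-01-02', 'org_a'): 100}

# m2 moves to org_b today: only the small mapping changes, no fact rows are rebuilt,
# and the whole history is immediately reported under the new ownership.
mapping["m2"] = "org_b"
print(org_history(FACTS, mapping))
```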

Selection of MPP engine

Many open-source OLAP engines have attracted attention, such as Greenplum, Apache Impala, Presto, Doris, ClickHouse, Druid and TiDB, but few practical case studies have been published, so there was little experience to learn from. We therefore weighed our own business needs, the cost of building on each engine, integration with the company's technology ecosystem, ease of use and other factors. On that basis, our platform department selected Doris, which had just entered the Apache Incubator in 2018.


© 2024 shulou.com SLNews company. All rights reserved.
