Which two categories can big data's OLAP system be divided into?

2025-04-21 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains which two categories big data OLAP systems can be divided into, a question many people run into in day-to-day work, and then compares several common open source components.

Open source big data OLAP components can be divided into two categories: MOLAP and ROLAP. ROLAP can be subdivided into MPP databases and SQL engines. SQL engines can in turn be subdivided into those based on an MPP architecture and those based on a general-purpose computing framework.
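The classification described above can be written down as a small tree. The following is only an illustrative sketch: the names `OLAP_TAXONOMY` and `leaf_paths` are made up for this example, and the labels simply restate the categories in the text.

```python
# The classification from the text as a nested dict (a leaf is an empty dict).
OLAP_TAXONOMY = {
    "MOLAP": {},
    "ROLAP": {
        "MPP database": {},
        "SQL engine": {
            "SQL engine based on MPP architecture": {},
            "SQL engine based on general computing framework": {},
        },
    },
}


def leaf_paths(tree, prefix=()):
    """Return every root-to-leaf path in the taxonomy as a tuple of names."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            paths.extend(leaf_paths(children, path))
        else:
            paths.append(path)
    return paths


if __name__ == "__main__":
    for path in leaf_paths(OLAP_TAXONOMY):
        print(" > ".join(path))
```

Running the script prints one line per leaf category, with its parents listed first.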

MOLAP generally optimizes data storage and precomputes part of the results, so its query performance is usually the highest. The trade-off is that query flexibility is usually restricted.
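To make the precomputation trade-off concrete, here is a minimal sketch of the idea, not the design of any particular MOLAP product. The sample rows, the dimension names and the helpers `build_cube` and `query` are all hypothetical. Sums are computed once for every combination of dimensions, so a query becomes a dictionary lookup, but only groupings that were precomputed can be answered.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, sales amount).
ROWS = [
    ("north", "apple", 10),
    ("north", "pear", 5),
    ("south", "apple", 7),
    ("south", "pear", 3),
]
DIMENSIONS = ("region", "product")


def build_cube(rows, dimensions):
    """Precompute SUM(amount) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for group in combinations(range(len(dimensions)), size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in group)
                totals[key] += row[-1]  # the last field is the measure
            cube[tuple(dimensions[i] for i in group)] = dict(totals)
    return cube


def query(cube, group_by, key):
    """Answer a query by lookup; raises KeyError if it was not precomputed."""
    return cube[tuple(group_by)][tuple(key)]


CUBE = build_cube(ROWS, DIMENSIONS)
```

For example, `query(CUBE, ["region"], ["north"])` is answered without scanning `ROWS`, while a grouping on a dimension that was never modeled (say, a hypothetical `"month"`) fails with a `KeyError`. That is the flexibility restriction in miniature.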

An MPP database is a complete database, and data usually has to be imported into it before it can serve OLAP queries. Because it controls storage, an MPP database can optimize how the data is distributed at load time. This makes loading somewhat slower, but it greatly improves later query performance. MPP databases provide flexible ad hoc query capability, but there is generally a limit on the amount of data a query can touch, so they cannot support queries over very large data volumes.
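The benefit of optimizing data distribution at load time can be sketched as follows. This is an assumption-laden toy, not how any specific MPP database is implemented: the function names, the choice of CRC32 as the hash, and the sample tables are all illustrative. Rows are hashed on a distribution key when they are loaded, so rows with the same key land on the same segment and a join on that key can run segment by segment without moving data.

```python
import zlib
from collections import defaultdict


def segment_for(key, num_segments):
    """Map a distribution key to a segment number with a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_index, num_segments):
    """Load step: place each row on the segment chosen by its key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_index], num_segments)].append(row)
    return segments


def local_join(orders, customers, num_segments):
    """Join two tables distributed on customer_id, one segment at a time."""
    order_segs = distribute(orders, 0, num_segments)
    customer_segs = distribute(customers, 0, num_segments)
    result = []
    for seg in range(num_segments):
        # Each segment only looks at its own slice of both tables.
        names = dict(customer_segs.get(seg, []))
        for customer_id, amount in order_segs.get(seg, []):
            if customer_id in names:
                result.append((customer_id, names[customer_id], amount))
    return result
```

The extra hashing work happens once, at load time, which mirrors the slower import and faster query trade-off described above.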

A SQL engine only provides the ability to execute SQL and is generally not responsible for data storage. It can usually connect to a variety of data stores, such as HDFS, HBase and MySQL. Some engines also support federated queries, which analyze multiple heterogeneous data sources together. Among SQL engines, those based on an MPP architecture are generally optimized specifically for online query scenarios, so their end-to-end query performance is generally higher than that of engines based on a general-purpose computing framework. However, they are weaker than the latter in fault tolerance and in the data volume they can handle.
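A federated query can be illustrated with a toy example. The sketch below is not the connector API of any real engine: `scan_sqlite`, `scan_csv` and `federated_join` are hypothetical helpers, an in-memory SQLite database stands in for a relational source such as MySQL, and a CSV string stands in for a file on a store such as HDFS. The "engine" stores nothing itself; it only reads from both sources and joins the rows.

```python
import csv
import io
import sqlite3


def scan_sqlite(conn, table):
    """Connector 1: read rows from a relational source as dicts."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


def scan_csv(text):
    """Connector 2: read rows from CSV text (stand-in for a file store)."""
    return list(csv.DictReader(io.StringIO(text)))


def federated_join(left, right, key):
    """Hash join rows from two sources on a shared key column."""
    index = {}
    for row in right:
        # Compare keys as strings because CSV values are always text.
        index.setdefault(str(row[key]), []).append(row)
    # Left-side values win when both sides carry the same column name.
    return [{**r, **l} for l in left for r in index.get(str(l[key]), [])]


# Hypothetical sample data for the two sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
CLICKS_CSV = "user_id,page\n1,home\n1,cart\n2,home\n"

joined = federated_join(scan_sqlite(conn, "users"), scan_csv(CLICKS_CSV), "user_id")
```

Each row in `joined` combines a user from the relational source with a click from the CSV source, which is the essence of analyzing heterogeneous sources in one query.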

In short, no OLAP system can be perfect in processing scale, flexibility and performance all at once. Users have to make trade-offs based on their own needs.

Comparison of components

SparkSQL is another well-known SQL engine in the Hadoop ecosystem. It uses Spark as the underlying computing framework, and Spark uses the RDD as the working set of distributed programs. An RDD provides a restricted form of distributed shared memory: in a general distributed shared memory system, applications can read and write anywhere in the global address space, whereas an RDD is read-only and can only be created, transformed and evaluated. Working in memory in this way greatly improves computing speed. In our comparison, however, SparkSQL performed worse than the other components, and neither its multi-table nor its single-table query performance stood out.
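The "created, transformed and evaluated" model can be shown with a toy class. This is explicitly not Spark's API or implementation: `ToyRDD` is a made-up, single-machine stand-in that only illustrates the read-only, lazily evaluated nature of such a dataset.

```python
class ToyRDD:
    """A read-only dataset that records transformations and runs them lazily."""

    def __init__(self, source, steps=()):
        self._source = tuple(source)  # immutable snapshot of the input
        self._steps = tuple(steps)    # recorded, not yet executed, steps

    def map(self, fn):
        """Transform: return a NEW dataset; the original is left untouched."""
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, fn):
        """Transform: return a NEW dataset that keeps matching items."""
        return ToyRDD(self._source, self._steps + (("filter", fn),))

    def collect(self):
        """Evaluate: run the recorded steps in order and return the results."""
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


if __name__ == "__main__":
    base = ToyRDD(range(5))
    squares_of_evens = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    print(squares_of_evens.collect())
    print(base.collect())
```

There is no method for writing to an element in place. The only ways to get different data are to create a new dataset or to derive one through a transformation, and nothing runs until `collect` is called.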

Impala's official claim is that computing speed is one of its major advantages. In our tests, its multi-table query performance was similar to that of Presto, but its single-table query performance was not as good. Impala also lacks support in many areas: for example, it does not support update and delete operations, the Date data type, or the ORC file format, so we used the Parquet format for our queries. Impala also uses a lot of memory when querying.

Overall, Presto performed better than the other components, both in single-table and multi-table query performance and in the range of supported data sources and data formats. Because Presto is entirely memory-based parallel computing, it uses a lot of memory when querying, but we found that it uses less than Impala. For example, multi-table joins require a lot of memory, and Impala uses more memory for them than Presto.

HAWQ incorporates an advanced cost-based SQL query optimizer that automatically generates execution plans to make the best use of Hadoop cluster resources. HAWQ also uses dynamic pipelining, a parallel data flow framework that accelerates Hadoop queries with linear scalability. The data is stored directly on HDFS, and the query optimizer is carefully tuned for the performance characteristics of HDFS-based file systems. However, we found that HAWQ is worse than Presto and Impala in multi-table queries and is not suited to complex aggregation on a single table: its single-table test performance was much worse than that of the other four components. We also ran into many problems when setting up the HAWQ environment.

ClickHouse is regarded as the fastest of the open source MPP computing frameworks. Its performance is impressive when querying tables with many columns and a large number of rows, but when joining multiple tables its performance is not as good as in queries on a single wide table. Our test results show that ClickHouse has a large performance advantage in single-table queries but performs relatively poorly in multi-table queries, where it trails Presto, Impala and HAWQ.

As a relational database product, Greenplum is characterized mainly by fast queries, fast data loading and fast batch DML processing. Its performance can grow linearly as hardware is added, so it scales very well. It is therefore mainly suited to analytical applications: for example, Greenplum is a good choice for building an enterprise ODS/EDW or a data mart.

That concludes this overview of which two categories big data OLAP systems can be divided into and how the main open source components compare. Combining theory with practice is the best way to learn, so try these systems out for yourself.
