What is the difference between a data warehouse and OLAP

2025-02-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article looks at the difference between a data warehouse and OLAP, a question that confuses many people in day-to-day work. It collects notes from various materials into a simple overview.

The big data ecosystem is huge. I have recently been studying the data warehouse part of it, and these are my notes.

First, the concepts of OLTP and OLAP. OLTP (online transaction processing) is the side most developers already know. It operates on a database, often called an OLTP database (MySQL, for example), and is mainly used for CRUD operations with high concurrency and low latency. It generally holds the business data.

OLAP (online analytical processing) is used for data analysis, for example aggregation over large data sources, and its latency requirements are relatively relaxed. It operates on the data warehouse, and the term OLAP is sometimes used interchangeably with "data warehouse".
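The contrast between the two access patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table and its columns are hypothetical, and SQLite stands in for both kinds of system.

```python
import sqlite3

# Hypothetical `orders` table standing in for a business database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, category TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, category, amount) VALUES (?, ?, ?)",
    [
        ("alice", "books", 12.0),
        ("bob", "books", 8.0),
        ("alice", "toys", 30.0),
        ("carol", "toys", 20.0),
    ],
)

# OLTP-style access: update and read a single row by primary key.
conn.execute("UPDATE orders SET amount = 15.0 WHERE id = 1")
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = 1"
).fetchone()

# OLAP-style access: scan all rows and aggregate along a dimension.
revenue_by_category = dict(
    conn.execute(
        "SELECT category, SUM(amount) FROM orders GROUP BY category"
    ).fetchall()
)
```

The first query touches one row and must return quickly under heavy concurrency. The second reads the whole table, which is why analytical workloads are usually moved off the business database.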

A data warehouse is generally characterized by a multidimensional model, layered data, and ETL processing. It draws on many data sources and formats, including both structured and unstructured data.

ETL processing requires a thorough understanding of the business. Suppose MySQL serves as the business database: a product business may have many kinds of tables, and once the data reaches the warehouse it may be remodeled, for example split into dimension tables and fact tables.
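As a rough sketch of that remodeling step, the example below loads a flat business table into one dimension table and one fact table. All table and column names (`src_order_items`, `dim_product`, `fact_sales`) are made up for illustration, and SQLite stands in for both the source database and the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat business table, as it might arrive from the source database.
CREATE TABLE src_order_items (
    order_id INTEGER, order_date TEXT,
    product_name TEXT, category TEXT,
    quantity INTEGER, unit_price REAL
);
INSERT INTO src_order_items VALUES
    (1, '2024-01-01', 'pen',   'stationery', 2, 1.5),
    (1, '2024-01-01', 'robot', 'toys',       1, 20.0),
    (2, '2024-01-02', 'pen',   'stationery', 4, 1.5);

-- Dimension table: one row per product, with descriptive attributes.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT UNIQUE,
    category TEXT
);
-- Fact table: one row per sale, with measures and a dimension key.
CREATE TABLE fact_sales (
    order_id INTEGER, order_date TEXT,
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER, revenue REAL
);

-- Transform and load.
INSERT INTO dim_product (product_name, category)
SELECT DISTINCT product_name, category
FROM src_order_items ORDER BY product_name;

INSERT INTO fact_sales
SELECT s.order_id, s.order_date, d.product_key,
       s.quantity, s.quantity * s.unit_price
FROM src_order_items s
JOIN dim_product d ON d.product_name = s.product_name;
""")

# Analytical query: join the fact table to the dimension and aggregate.
revenue_by_category = dict(conn.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product d USING (product_key)
    GROUP BY d.category
""").fetchall())
```

Deciding what counts as a dimension and what counts as a measure is exactly where the business knowledge comes in.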

We currently face two problems. First, our ETL is very weak: we basically import the MySQL databases into the warehouse as they are. Second, whenever a business database changes, the warehouse side has to be rebuilt, so our understanding of the business databases always lags behind.

So what is the warehouse for? It supports interactive queries, data analysis, data mining, and BI reports.

Warehouses can be classified in several ways, depending on the point of view:

1: By modeling approach: MOLAP, ROLAP, and HOLAP.

MOLAP pre-computes and stores the results of the queries that are likely to be asked, so it suits scenarios where the analysis is stable. Kylin is one solution in this area.

ROLAP is the current mainstream. It builds the multidimensional data model on top of a relational model and can generally be queried with SQL.

2: Within ROLAP there are two approaches. One is the wide-table model, as in the now-popular ClickHouse. The other is the multi-table (join) model, as in Presto.

3: By timeliness: real-time data warehouses and offline data warehouses. This article is mainly about the offline data warehouse, also known as batch processing, in which the data is prepared in advance. Hadoop, for example, addresses this kind of problem.

4: By how the processing is sped up. OLAP handles very large volumes of data, and there are two ways to make it faster: parallel processing (such as Hadoop MapReduce, Spark, or Presto with its MPP architecture) and pre-computation (such as Kylin).
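The two speed-up strategies in the last item can be sketched in plain Python. This is a toy model under stated assumptions: the rows, the dimension names, and both helper functions are hypothetical. Threads are used only to keep the example self-contained, whereas real engines spread the partitions across processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical fact rows: (region, category, amount).
ROWS = [
    ("north", "books", 10),
    ("north", "toys", 5),
    ("south", "books", 7),
    ("south", "toys", 3),
    ("north", "books", 2),
]
DIMENSIONS = ("region", "category")


def partial_sum(partition):
    """Aggregate one partition on its own (the per-node work)."""
    totals = Counter()
    for region, _category, amount in partition:
        totals[region] += amount
    return totals


def parallel_sum_by_region(rows, workers=2):
    """Strategy 1: split the data, aggregate in parallel, then merge."""
    partitions = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds the partial totals
    return dict(merged)


def build_cube(rows):
    """Strategy 2: precompute totals for every subset of the dimensions."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), size):
            totals = Counter()
            for row in rows:
                totals[tuple(row[i] for i in dims)] += row[-1]
            cube[tuple(DIMENSIONS[i] for i in dims)] = dict(totals)
    return cube


scanned = parallel_sum_by_region(ROWS)          # computed at query time
cube = build_cube(ROWS)                         # computed ahead of time
precomputed = {key[0]: total for key, total in cube[("region",)].items()}
```

Both paths give the same totals per region. The parallel scan pays the cost on every query, while the cube pays it once up front and grows with the number of dimension combinations, which is why pre-computation fits stable analysis scenarios.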

How exactly do you choose?

We use a fairly conventional Hadoop setup: HDFS as the distributed storage and MapReduce as the parallel computing framework. But HDFS is only storage and has no concept of structure, so how do you build a warehouse on it?

Hive solves two problems. First, it stores the table-structure metadata. Second, SQL submitted to Hive is automatically translated into parallel MapReduce (MR) jobs: a job takes the table information from the metadata, reads the data from HDFS, and then performs the computation.
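The idea of keeping metadata separate from schema-less files can be illustrated with a toy model. It is not Hive's API: the `metastore` dictionary, the `scan` and `sum_by` helpers, and the file layout are all hypothetical, and a temporary local directory stands in for HDFS.

```python
import csv
import tempfile
from pathlib import Path

# Raw storage: delimited text files that carry no schema of their own.
storage = Path(tempfile.mkdtemp())
(storage / "orders").mkdir()
(storage / "orders" / "part-0.csv").write_text("1,books,12.5\n2,toys,30.0\n")
(storage / "orders" / "part-1.csv").write_text("3,books,7.5\n")

# Metadata: table name -> file location plus column names and types.
metastore = {
    "orders": {
        "location": storage / "orders",
        "columns": [("id", int), ("category", str), ("amount", float)],
    }
}


def scan(table):
    """Look up the schema first, then apply it to the raw files."""
    meta = metastore[table]
    for path in sorted(meta["location"].glob("*.csv")):
        with path.open(newline="") as handle:
            for raw in csv.reader(handle):
                yield {
                    name: cast(value)
                    for (name, cast), value in zip(meta["columns"], raw)
                }


def sum_by(table, group_col, value_col):
    """A tiny stand-in for SELECT group_col, SUM(value_col) ... GROUP BY."""
    totals = {}
    for row in scan(table):
        totals[row[group_col]] = totals.get(row[group_col], 0.0) + row[value_col]
    return totals


totals = sum_by("orders", "category", "amount")
```

The files never change when the table definition does. Only the metadata decides how the bytes are interpreted at read time.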

Normally this setup is an offline data warehouse. HDFS holds the full T-1 data, that is, everything up to the previous day. In this setup individual records cannot be inserted, updated, or deleted; a file can only be overwritten as a whole. The Sqoop tool is used to import the MySQL data into HDFS.
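A full T-1 load with whole-file overwrite might look roughly like the sketch below. The `export_full_snapshot` function, the `dt=` directory naming, and the JSON file format are assumptions made for illustration. The sketch writes to a temporary local directory and does not call Sqoop or HDFS.

```python
import json
import tempfile
from datetime import date, timedelta
from pathlib import Path

warehouse = Path(tempfile.mkdtemp())  # stand-in for the HDFS warehouse root


def export_full_snapshot(source_rows, run_date):
    """Write every source row as the snapshot for the day before run_date."""
    snapshot_date = run_date - timedelta(days=1)  # T-1
    partition = warehouse / "orders" / f"dt={snapshot_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / "data.json"
    tmp = partition / "data.json.tmp"
    tmp.write_text(json.dumps(source_rows))
    tmp.replace(target)  # swap in the new file; no row-level edits
    return target


# First run of the day, then a rerun after the source data changed.
first = export_full_snapshot([{"id": 1, "amount": 10}], date(2024, 1, 2))
second = export_full_snapshot(
    [{"id": 1, "amount": 12}, {"id": 2, "amount": 5}], date(2024, 1, 2)
)
stored = json.loads(second.read_text())
files = sorted(p.name for p in second.parent.iterdir())
```

Rerunning the job for the same day replaces the whole file instead of patching individual records, which matches the overwrite-only behaviour described above.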

MPP on Hadoop

Because MapReduce keeps its intermediate results on disk, these jobs are still very slow.

Presto is based on an MPP architecture. It makes full use of the CPU capacity of every node and keeps intermediate results in memory, which reduces disk I/O.

Presto is a SQL execution engine and does not store data itself. It can, for example, query MySQL directly.

It can also go through Hive: it reads the metadata and then operates on the HDFS data in parallel.
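An engine that owns no data but queries several stores can be imitated with SQLite's `ATTACH DATABASE`. This is only an analogy and not Presto's interface: in Presto, tables are addressed as catalog.schema.table through configured connectors, whereas here two separate in-memory SQLite databases play the MySQL and Hive roles. The table names are hypothetical.

```python
import sqlite3

# One connection acts as the query engine; each database is a separate store.
engine = sqlite3.connect(":memory:")                         # "main" plays MySQL
engine.execute("ATTACH DATABASE ':memory:' AS warehouse")    # plays Hive/HDFS

engine.executescript("""
CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO main.customers VALUES (1, 'alice'), (2, 'bob');

CREATE TABLE warehouse.order_history (customer_id INTEGER, amount REAL);
INSERT INTO warehouse.order_history VALUES (1, 10.0), (1, 5.0), (2, 7.0);
""")

# A single SQL statement joins tables that live in different stores.
rows = engine.execute("""
    SELECT c.name, SUM(o.amount)
    FROM main.customers AS c
    JOIN warehouse.order_history AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

The point of the analogy is the shape of the query: the caller writes one statement, and the engine decides how to fetch and combine data from each source.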

With Hive and Presto combined with a visual BI tool, you can produce data reports and carry out data analysis and mining.

Finally, BI can be summed up in a simple formula:

BI Platform = Data Warehouse + OLAP Services/Reports.
