Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Why should the data warehouse be layered in the database?

2025-03-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly introduces why the data warehouse should be layered in the database, which has a certain reference value. Interested friends can refer to it. I hope you will gain a lot after reading this article. Let the editor take you to know it.

Why layer the data warehouse:

A) trade space for time, and improve the user experience (efficiency) of the application system through a lot of preprocessing, so there will be a lot of redundant data in the data warehouse.

B) if there is no layering, if the business rules of the source business system change, it will affect the whole data cleaning process, and the workload is huge.

C) the process of data cleaning can be simplified through hierarchical data management, because dividing the original one-step work into multiple steps is equivalent to splitting a complex work into several simple tasks, turning a large black box into a white box, and the processing logic of each layer is relatively simple and easy to understand, so that it is easier for us to ensure the correctness of each step when the data goes wrong. Often we only need to adjust a certain step locally.

Data warehouse can be divided into four layers: ODS (temporary storage layer), PDW (data warehouse layer), MID (data Mart layer) and APP (application layer).

ODS layer:

For the temporary storage layer, it is the temporary storage area of the interface data in preparation for the next step of data processing. Generally speaking, the data of the ODS layer is isomorphic to that of the source system, and the main purpose is to simplify the work of subsequent data processing. The data granularity of the ODS layer is the finest in terms of data granularity. There are usually two types of tables in the ODS layer, one for storing the data that currently needs to be loaded, and one for storing the processed historical data. Historical data generally need to be cleared after 3-6 months to save space. However, different projects should be treated differently. If the amount of data in the source system is small, it can be retained for a longer time, or even saved in full.

PDW layer:

For the data warehouse layer, the data in the PDW layer should be consistent, accurate and clean, that is, the data after cleaning the source system data (removing impurities). The data in this layer generally follows the third normal form of the database, and its data granularity is usually the same as that of ODS. In the BI layer, all the historical data in the PDW system is saved, such as data that has been saved for 10 years.

MID layer:

For the data Mart layer, this layer of data is organized by topics, usually star-shaped or snowflake-structured data. In terms of data granularity, the data at this layer is at the mild summary level, and there is no more detailed data. In terms of the time span of the data, it is usually a part of the PDW layer, and the main purpose is to meet the needs of user analysis, while from the perspective of analysis, users usually only need to analyze the data in recent years (such as the data of the last three years). In terms of the breadth of the data, it still covers all business data.

APP layer:

For the application layer, this layer of data is constructed entirely to meet specific analysis needs, and is also star-shaped or snowflake structure data. Highly aggregated data in terms of data granularity. In terms of the breadth of data, it does not necessarily cover all business data, but a true subset of MID layer data, which in a sense is a repetition of MID layer data. In extreme cases, a model can be built for each report in the APP layer to support it. The standard layering of the data warehouse to achieve the purpose of exchanging space for time is only a suggested standard, and the actual implementation needs to determine the layering of the data warehouse according to the actual situation, and different types of data may also adopt different layering methods.

-[supplement]

Data cache layer:

The database layer used to store the original data provided by the interface. The table structure of this layer is basically the same as the source data, and the data storage time depends on the amount of data and the project. If the amount of data is large, only recent data can be stored. Back up the historical data. The purpose of this layer is to transfer and back up data.

Core data layer:

The data in this layer is integrated to a certain extent on the basis of the data cache layer, which is called data Mart, and it is still a relational model in storage. The purpose of this layer is to carry out the necessary data integration in preparation for the next step of the multidimensional model.

Analyze the application layer:

The data in this layer is the multi-dimensional model data constructed according to the needs of business analysis. The data can be directly used for analysis and presentation.

Note: the division of the data hierarchy can be tailored to the needs of the actual project. If the business is relatively simple and independent, you can merge the core data layer with the analysis application layer. In addition, the data of the analysis application can come from the multi-dimensional model data, the relational model data or even the original data.

Thank you for reading this article carefully. I hope the article "Why layering the data Warehouse in the Database" shared by the editor will be helpful to everyone. At the same time, I also hope that you will support and pay attention to the industry information channel. More related knowledge is waiting for you to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 224

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Servers

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report