2025-04-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains why a data warehouse should be layered. It first gives the reasons for layering and then describes the layers commonly used in practice.
Why layer a data warehouse?
Trade space for time: a large amount of preprocessing improves the responsiveness (and so the user experience) of the application systems, which is why a data warehouse holds a lot of redundant data.
Without layering, a change to the business rules of a source system affects the entire data cleaning process, and the resulting rework is huge.
Layered data management simplifies data cleaning. Work that used to be done in one step is split into several steps, so one complex job becomes a number of simple jobs and one large black box becomes a white box. The processing logic of each layer is relatively simple and easy to understand, which makes it easier to verify each step. When a data error occurs, usually only one step needs to be adjusted.
A standard data warehouse can be divided into four layers: ODS (temporary storage layer), PDW (data warehouse layer), MID (data mart layer), and APP (application layer).
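The idea of splitting one large cleaning job into small, separately checkable steps can be sketched in Python as follows. This is only an illustration: the row fields (`order_id`, `region`, `amount`) and the function names `stage`, `clean` and `summarize` are assumptions made for the example, not part of any particular warehouse tool.

```python
from collections import defaultdict

# Hypothetical raw rows as they might arrive from a source system.
RAW_ROWS = [
    {"order_id": "1", "region": " north ", "amount": "10.50"},
    {"order_id": "2", "region": "south", "amount": "abc"},  # invalid amount
    {"order_id": "3", "region": "north", "amount": "4.50"},
]


def stage(rows):
    """Step 1: copy the source rows unchanged."""
    return [dict(r) for r in rows]


def clean(rows):
    """Step 2: convert types, trim text, and drop rows that fail validation."""
    cleaned = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        cleaned.append(
            {
                "order_id": int(r["order_id"]),
                "region": r["region"].strip(),
                "amount": amount,
            }
        )
    return cleaned


def summarize(rows):
    """Step 3: aggregate the cleaned rows by region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)


# Each intermediate result can be inspected on its own.
staged = stage(RAW_ROWS)
cleaned = clean(staged)
summary = summarize(cleaned)
```

If a validation rule changes, only `clean` needs to be edited and rerun; `stage` and `summarize` stay as they are.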
ODS layer:
The temporary storage layer is a staging area for interface data, in preparation for subsequent data processing. In general, ODS data has the same structure as the source system data, which simplifies the later processing steps. The ODS layer has the finest data granularity of all layers. ODS tables usually fall into two kinds: one stores the data that is currently waiting to be loaded, and the other stores historical data that has already been processed. Historical data is generally purged after 3-6 months to save space, but this depends on the project. If the source system produces little data, the history can be kept longer or even in full.
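The retention rule for processed ODS history could look like the following Python sketch. The `load_date` field, the function names, and the choice to cut off at the first day of a month are assumptions for illustration; passing `retention_months=None` models the case where the history is kept in full.

```python
from datetime import date


def months_before(day, months):
    """Return the first day of the month lying `months` months before `day`."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def purge_history(history_rows, today, retention_months=6):
    """Keep only the history rows loaded on or after the retention cutoff.

    With retention_months=None, nothing is purged (small source systems).
    """
    if retention_months is None:
        return list(history_rows)
    cutoff = months_before(today, retention_months)
    return [r for r in history_rows if r["load_date"] >= cutoff]


# Hypothetical processed-history rows in the ODS layer.
history = [
    {"id": 1, "load_date": date(2024, 9, 30)},
    {"id": 2, "load_date": date(2024, 10, 1)},
    {"id": 3, "load_date": date(2025, 4, 1)},
]

kept = purge_history(history, today=date(2025, 4, 17), retention_months=6)
kept_all = purge_history(history, today=date(2025, 4, 17), retention_months=None)
```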
PDW layer:
The data warehouse layer. PDW data should be consistent, accurate and clean, that is, source system data after cleaning (with the impurities removed). Data at this level generally follows the third normal form, and its granularity is usually the same as that of the ODS layer. The PDW layer stores all the historical data of the BI system, for example 10 years of data.
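As a rough illustration of moving from source-shaped ODS rows to a normalized PDW structure, the Python sketch below splits a flat order extract into a customer table and an order table, so that each customer name is stored once. The field names and the `normalize` helper are hypothetical, and real cleaning rules would be project-specific.

```python
# Hypothetical flat rows in the ODS layer: customer attributes repeat per order.
flat_rows = [
    {"order_id": 1, "customer_id": 100, "customer_name": "Acme", "amount": 25.0},
    {"order_id": 2, "customer_id": 100, "customer_name": "Acme", "amount": 40.0},
    {"order_id": 3, "customer_id": 200, "customer_name": "Globex", "amount": 15.0},
]


def normalize(rows):
    """Split flat rows into a customer table and an order table.

    Customer attributes depend only on customer_id, so they move to their
    own table; orders keep just the foreign key.
    """
    customers = {}
    orders = []
    for r in rows:
        customers[r["customer_id"]] = {
            "customer_id": r["customer_id"],
            "customer_name": r["customer_name"],
        }
        orders.append(
            {
                "order_id": r["order_id"],
                "customer_id": r["customer_id"],
                "amount": r["amount"],
            }
        )
    return list(customers.values()), orders


customers, orders = normalize(flat_rows)
```

The granularity is unchanged: there is still one order row per source order, which matches the statement that PDW granularity is usually the same as ODS.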
MID layer:
The data mart layer. This layer organizes data by subject, usually in a star or snowflake schema. In terms of granularity, the data here is lightly summarized and no longer contains detail records. In terms of time span, it usually holds only part of the PDW data: its main purpose is to serve user analysis, and users usually only need to analyze recent years (for example, the past three years). In terms of breadth, it still covers all business data.
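A lightly summarized star-schema fact table can be sketched with Python's built-in `sqlite3` module. The table names (`pdw_orders`, `dim_product`, `fact_daily_sales`), the sample rows and the `window_start` parameter are all assumptions for this example; the point is only to show detail rows being rolled up per day and per dimension key, restricted to a recent time window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pdw_orders (
        order_id INTEGER PRIMARY KEY,
        order_date TEXT,
        product_id INTEGER,
        amount REAL
    );
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_daily_sales (
        order_date TEXT,
        product_id INTEGER,
        order_count INTEGER,
        total_amount REAL
    );
    """
)
conn.executemany(
    "INSERT INTO pdw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "2019-05-01", 1, 10.0),  # outside the analysis window
        (2, "2024-03-10", 1, 20.0),
        (3, "2024-03-10", 1, 5.0),
        (4, "2024-03-11", 2, 7.5),
    ],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)", [(1, "books"), (2, "games")]
)


def build_daily_sales(conn, window_start):
    """Rebuild the fact table from PDW detail rows dated on or after window_start."""
    conn.execute("DELETE FROM fact_daily_sales")
    conn.execute(
        """
        INSERT INTO fact_daily_sales
        SELECT order_date, product_id, COUNT(*), SUM(amount)
        FROM pdw_orders
        WHERE order_date >= ?
        GROUP BY order_date, product_id
        """,
        (window_start,),
    )


build_daily_sales(conn, "2022-01-01")

# Analysis query: join the fact table to its dimension (star schema).
rows = conn.execute(
    """
    SELECT f.order_date, d.category, f.order_count, f.total_amount
    FROM fact_daily_sales f
    JOIN dim_product d ON d.product_id = f.product_id
    ORDER BY f.order_date, d.category
    """
).fetchall()
```

The 2019 order stays in the PDW table but does not reach the mart, and the two orders on 2024-03-10 collapse into one summary row.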
APP layer:
The application layer. The data structures in this layer are built entirely to meet specific analysis needs, and they are also organized in a star or snowflake schema. In terms of granularity, the data is highly aggregated. In terms of breadth, it does not necessarily cover all business data; it is a proper subset of the MID layer data and, in a sense, a repetition of it. In extreme cases, a separate model can be built in the APP layer for each report, again trading space for time.
The standard layering of a data warehouse is only a recommendation. In an actual implementation, the layering needs to be decided according to the actual conditions, and different types of data may use different layering schemes.
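A per-report APP model might be derived from MID data along the lines of the Python sketch below. The input rows, the `build_monthly_report` function and the idea of filtering by category are illustrative assumptions; the sketch only shows the two properties described above, namely higher aggregation (daily to monthly) and coverage of a subset of the MID data.

```python
from collections import defaultdict

# Hypothetical lightly summarized rows from the MID layer.
mid_daily_sales = [
    {"order_date": "2024-03-10", "category": "books", "total_amount": 25.0},
    {"order_date": "2024-03-11", "category": "games", "total_amount": 7.5},
    {"order_date": "2024-04-02", "category": "books", "total_amount": 12.5},
]


def build_monthly_report(daily_rows, categories):
    """Aggregate daily MID rows to monthly totals for the given categories only."""
    totals = defaultdict(float)
    for r in daily_rows:
        if r["category"] in categories:
            month = r["order_date"][:7]  # "YYYY-MM"
            totals[(month, r["category"])] += r["total_amount"]
    return dict(totals)


# One precomputed model dedicated to a single report.
books_report = build_monthly_report(mid_daily_sales, {"books"})
```

Storing `books_report` duplicates information already present in the MID layer, which is the space cost paid for faster report queries.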
---[Supplement: a three-layer division is also used]
Data cache layer:
The database layer that stores the original data provided by the interface party. The table structures in this layer are basically the same as in the source data. How long data is kept depends on the data volume and the project. If the data volume is large, the layer can keep only recent data and back up the historical data. The purpose of this layer is to transfer and back up data.
Core data layer:
The data in this layer is integrated to some extent on top of the data cache layer. It is referred to as the data mart, and it is still stored in a relational model. The purpose of this layer is to perform the data integration needed to prepare for the multidimensional model in the next layer.
Analysis application layer:
The data in this layer is multidimensional model data built according to business analysis needs. It can be used directly for analysis and presentation.
Note: The division into layers can be tailored to the needs of the actual project. If the business is relatively simple and independent, the core data layer can be merged with the analysis application layer. In addition, the data used by analysis applications can come from multidimensional model data, relational model data, or even raw data.
© 2024 shulou.com SLNews company. All rights reserved.