How to master the hierarchical architecture of data warehouse 04/27 Update SLTechnology News&Howtos

How to master the hierarchical architecture of data warehouse

2025-04-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article mainly introduces "how to master the hierarchical architecture of the data warehouse". In the daily operation, I believe many people have doubts about how to master the hierarchical architecture of the data warehouse. The editor consulted all kinds of materials and sorted out the simple and easy-to-use operation methods. I hope it will be helpful for you to answer the doubts of "how to master the hierarchical architecture of the data warehouse"! Next, please follow the editor to study!

First, why should it be layered?

The main reason for layering is to have a clearer control over the data when managing the data. in detail, there are the following reasons:

Clear data structure:

Each data hierarchy has its scope so that we can more easily locate and understand when using tables.

Facilitate data consanguinity tracking:

To put it simply, what we finally present to the business is a table that can be used directly, but it comes from many sources. if there is a problem with a source table, we hope to be able to quickly and accurately locate the problem and know the scope of its harm.

Reduce repetitive development:

Standardizing data layering and developing some general middle-tier data can reduce great repeated calculation.

Simplify complex problems:

Decompose a complex task into multiple steps, and each layer only deals with a single step, which is relatively simple and easy to understand. And it is easy to maintain the accuracy of the data, when there is a problem with the data, you do not have to repair all the data, you only need to start with the problematic steps.

Mask the exception of the original data:

Shielding the impact of the business, you don't have to change the business to re-access the data.

Second, the idea of layering in data warehouses.

Data layering each enterprise can be divided into different levels according to its own business needs, but the most basic idea of layering, the data is theoretically divided into three layers, data operation layer, data warehouse layer and data service layer. Based on this basic layer, a new layer is added to meet different business needs.

Data Operation layer (ODS)

Operate data store (operational data-storage) is the layer closest to the data in the data source. The data in the data source is loaded into the ODS layer after being extracted, cleaned and transmitted, that is to say, the legendary ETL. Generally speaking, most of the data in this layer are classified according to the classification of the source business system. For example, a table in MySQL can be extracted to the source of ODS layer ODS layer data between sqoop:

Service library

Sqoop is often used to extract, for example, we extract it regularly once a day. In terms of real-time, you can consider using canal to monitor the binlog of mysql and access it in real time.

Bury the log.

The online system will enter a variety of logs, which are usually saved in the form of files. We can choose to extract them regularly with flume, or access them in real time with spark streaming or Flink. Of course, kafka will also play a key role.

Message queue

Data from ActiveMQ, Kafka, etc.

Data Warehouse layer (DW)

Data warehouse (data warehouse). Here, the data obtained from the ODS layer builds various data models according to the topic. For example, the data set with the theme of studying people's tourism consumption can be combined with the boarding and travel information of airlines and the credit card records of UnionPay system to generate data sets. Here, we need to understand four concepts: dimension, Fact, Index, and Granularity.

DW data layering, DWD,DWB,DWS from bottom to top

The DWD:data warehouse detail detail data layer is the isolation layer between the business layer and the data warehouse. DWB:data warehouse base basic data layer, which stores objective data, is generally used as the middle layer, and can be regarded as the data layer of a large number of indicators. The DWS:data warehouse service service data layer, based on the basic data on DWB, is integrated to analyze the service data of a topic domain, usually a wide table.

Data Services layer / Application layer (ADS):

Application Data Service (Application data Service). This layer mainly provides data for data products and data analysis, which is generally stored in ES, MySQL and other systems for online systems, and may also exist in Hive or Druid for data analysis and data mining. For example: we often talk about report data, or that kind of wide table, usually put here.

III. Ali data Warehouse hierarchical Architecture

ODS data preparation layer

Features:

The ODS layer is the data warehouse preparation area, which provides the basic raw data for the DWD layer and reduces the impact on the business system.

Modeling methods and principles:

Incremental extraction from the business system, retention time determined by business requirements, periodic storage of divisible tables, data conversion without cleaning, consistent with the business system data model, divided by subject logic

DWD data detail layer

Features:

Provide source detail data for DW layer, provide long-term precipitation of business system detail data, and provide historical data support for future expansion of analysis requirements.

Modeling methods and principles:

The data model is consistent with the ODS layer, no cleaning conversion processing, additional data business date fields can be added to support data rerunning, tables can be divided by year, month and day, and merge processing can be done with incremental ODS layer data and previous day DWD related tables.

DW (BBAMA S) data aggregation layer

Features:

Provide fine-grained data for DW and ST layers and refine them into DWB and DWS

DWB is converted according to DWD detail data, such as dimension transfer agent key, ID card cleaning, clear source of member registration, field merging, null value processing, dirty data processing, IP clear conversion, account balance cleaning, fund source cleaning, etc.

DWS is a high-granularity summary aggregation based on DWB layer data by each dimension ID, such as by transaction source and transaction type.

Modeling methods and principles:

Aggregate, aggregate and add derivative facts

Associated with the fact table of other topics, the DW layer may cross the topic domain

DWB maintains low granularity summary processing data, while DWS maintains high granularity summary data.

The data model may adopt anti-paradigm design, merge information, and so on.

Data Market (data Mart) layer

Features:

It can be some wide tables. According to the DW layer data, some fact fields that need to be queried are summarized and stored as separate columns according to various dimensions or a combination of multiple dimensions.

Meet some specific query and data mining applications

Apply bazaar data storage

Modeling methods and principles:

Minimize computation during data access (optimized retrieval)

Dimensional modeling, star model

Sub-table storage

ST data Application layer (ADS layer)

Features:

ST level applications and analysis requirements to users, including front-end reports, analysis charts, KPI, dashboards, OLAP, thematic analysis, etc., for end-result users

Suitable for OLAP, report model, such as ROLAP,MOLAP

Online transaction processing OLTP, online analytical processing OLAP.

OLTP is the main application of traditional relational database, which mainly deals with basic and daily transactions, such as bank transactions.

OLAP is the main application of data warehouse system, supporting complex analysis operations, focusing on decision support, and providing intuitive and easy-to-understand query results.

The users of online analytical processing are professional analysts and management decision-makers in the enterprise. when they analyze the data of business operation, it is a natural mode of thinking to examine the measurement indicators of the business from different angles. For example, the analysis of sales data may take into account many factors, such as time period, product category, distribution channel, geographical distribution, customer base and so on.

According to the coarse-grained fact table after aggregation and statistics in the DW layer

Modeling methods and principles:

Keep the amount of data small

Dimensional modeling, star model

Each dimension surrogate key + measure

Added data business date field to support data rerun

Undivided table storage

At this point, the study on "how to master the hierarchical architecture of data warehouse" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.