What is the implementation process of OneData model?

2025-03-26 Update From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com) 05/31

This article explains how the OneData model is implemented. It first reviews the model implementation processes commonly used in the industry, then walks through the OneData process itself. It is written with beginners in mind.

1. Model implementation processes commonly used in the industry

Kimball model implementation process

Kimball dimensional modeling mainly covers the whole process of requirements analysis, high-level model design, detailed model design, and model review.

Building a dimensional model generally goes through four stages. The first is high-level design, which defines the scope of the business-process dimensional model and provides a technical and functional description of each star schema. The second is detailed model design, which adds attributes and measures to each star schema. The third is model review, redesign, and verification. The fourth is producing the detailed design documents and handing them over for ETL design and development.

High-level model: the direct output of the high-level design stage is a high-level dimensional model diagram, a graphical description of the dimension tables and fact tables in the business process. This stage identifies the dimension tables, creates an initial attribute list for each, and proposes measures for each fact table.

Detailed model: detailed dimensional modeling fills in the information missing from the high-level model, resolves design issues, and repeatedly tests whether the model meets business requirements and is complete. It determines the attributes of each dimension table and the measures of each fact table, identifies the location and definition of each information source, and drafts the preliminary business rules for how attributes and measures are populated.

Model review, redesign and verification: in this stage the relevant people are brought together to review and verify the model, and the detailed dimensional model is redesigned according to the review results.

Submit ETL design and development: finally, the detailed model design document is completed and handed to the ETL developers, who carry out ETL design and development and complete the physical model.
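The step from high-level model to detailed model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, table names, and source locations below are all hypothetical and do not come from Kimball's book. The high-level model lists tables, initial attributes, and proposed measures; the detailed model records where each column comes from, and a small check reports what is still missing before review.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    source: str = ""  # source-system location; empty until the detailed stage fills it


@dataclass
class DimensionTable:
    name: str
    attributes: list[Column] = field(default_factory=list)


@dataclass
class FactTable:
    name: str
    dimensions: list[DimensionTable] = field(default_factory=list)
    measures: list[Column] = field(default_factory=list)


def unresolved_columns(fact: FactTable) -> list[str]:
    """Return columns that still have no source, i.e. gaps to close before review."""
    gaps = [f"{fact.name}.{m.name}" for m in fact.measures if not m.source]
    for dim in fact.dimensions:
        gaps += [f"{dim.name}.{a.name}" for a in dim.attributes if not a.source]
    return gaps


# High-level model: tables, an initial attribute list, and proposed measures.
dim_date = DimensionTable("dim_date", [Column("calendar_date")])
dim_product = DimensionTable("dim_product", [Column("product_name"), Column("category")])
fact_orders = FactTable("fact_orders", [dim_date, dim_product], [Column("order_amount")])

# Detailed model: record the source of each column (hypothetical source names).
dim_date.attributes[0].source = "calendar.date"
dim_product.attributes[0].source = "erp.product.name"
fact_orders.measures[0].source = "erp.order_line.amount"

print(unresolved_columns(fact_orders))  # ['dim_product.category']
```

Here the category attribute has no source yet, so the model is not ready for the review stage.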

The above is mainly drawn from The Data Warehouse Lifecycle Toolkit by Ralph Kimball et al.; please refer to the original work for details.

Inmon model implementation process

Inmon positions the data model as an intelligent roadmap to the other parts of the data warehouse. A data warehouse is not built overnight, so to coordinate the work of different people and serve different types of users, a roadmap is needed: a data model that describes how the parts of the data warehouse fit together.

Inmon divides the model into three layers: the ERD (Entity Relationship Diagram) layer, the DIS (Data Item Set) layer, and the physical layer (Physical Model).

The ERD layer is the highest layer of the data model and describes the entities or subject areas in the company's business and the relationships between them. The DIS layer is the middle layer and describes the relationships among keys, attributes, and detail data in the data model. The physical layer is the lowest layer and describes the physical characteristics of the data model.

Inmon recommends a spiral development approach for building the data warehouse model, with requirements delivered over multiple iterations. A unified ERD model is still needed to integrate the results of each iteration. The ERD model is a highly abstract data model that describes the complete data of the enterprise. Each iteration completes a subset of the ERD model, which is implemented through the DIS and physical data models.
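One way to picture the relationship between the unified ERD and the iterations is as a coverage check. The sketch below is an assumption-laden illustration, not Inmon's method: the entity names and the two iterations are hypothetical. It verifies that each iteration only implements entities that exist in the ERD and reports which entities are still waiting for a later iteration.

```python
# Unified ERD: entity -> related entities (hypothetical subject areas).
erd = {
    "customer": {"order"},
    "order": {"customer", "product"},
    "product": {"order"},
    "shipment": {"order"},
}

# Each iteration implements a subset of the ERD through DIS and physical models.
iterations = [
    {"customer", "order"},
    {"product"},
]


def check_iterations(erd, iterations):
    """Return (implemented entities missing from the ERD, ERD entities not yet implemented)."""
    entities = set(erd)
    implemented = set().union(*iterations)
    unknown = implemented - entities
    remaining = entities - implemented
    return sorted(unknown), sorted(remaining)


unknown, remaining = check_iterations(erd, iterations)
print(unknown)    # []
print(remaining)  # ['shipment']
```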

The above is mainly drawn from Building the Data Warehouse by Inmon; please refer to the original work for details.

Other model implementation processes

In practice, the following layered division of the data warehouse model is also common. It has some similarities with the model implementation theories of Kimball and Inmon, but does not prescribe a specific way of expressing the model.

Business modeling produces the business model and mainly addresses the decomposition and proceduralization of the business.

Domain modeling produces the domain model, mainly by abstracting the business model into a domain conceptual model.

Logical modeling produces the logical model, mainly by expressing the conceptual entities of the domain model and the relationships between them at the database logical level.

Physical modeling produces the physical model and mainly solves specific technical problems, such as physical implementation and performance, when the logical model is deployed on different relational databases.

2. OneData implementation process

This part focuses on how to use the OneData methodology and its supporting tools to build the models of a big data system, illustrated with specific Alibaba businesses.

Guidelines

First, building a big data warehouse requires thorough business research and demand analysis. This is the cornerstone of data warehouse construction, and how thorough it is directly determines whether the construction succeeds. Second, the overall data architecture is designed: the data is partitioned by data domain, and, following dimensional modeling theory, the bus matrix is built and the business processes and dimensions are abstracted. Third, the reporting requirements are abstracted into an indicator system, and the OneData tools are used to complete the indicator specification definitions and the model design. Finally come code development and operations. This article focuses on the steps up to and including physical model design.

Implementation workflow

Data research-business research: Alibaba Group spans e-commerce, digital entertainment, navigation (Gaode), mobile Internet services, and other fields. Each field covers several business lines; e-commerce, for example, covers Class C (Taobao, Tmall, Tmall International) and Class B (Alibaba Chinese site, international site, AliExpress) businesses. Should one data warehouse cover all business fields, or should each field build its own? The business lines within a field face the same question. To build a big data warehouse, we therefore need to understand what the business fields and business lines have in common and where they differ, which business modules each business line breaks down into, and what the specific business process of each module is. How thorough the business research is directly determines whether the data warehouse construction succeeds.

At Alibaba, data warehouses are generally built independently for each business field, while the business lines within a field are built in a unified, centralized way because their businesses are similar and related.

Data research-demand research: building a data warehouse from business research alone, without considering the data needs of analysts and business operators, amounts to working behind closed doors. Understanding the business of the source systems does not mean the warehouse can be implemented. The next step is to collect the needs of the data users by asking analysts and business operators what data they need. At this stage, the needs are mostly reporting requirements.

There are two ways to research demand: one is to communicate with analysts and business operators (e-mail, IM); the other is to study the existing reports in the reporting system. After the demand research and analysis, it is clear what data needs to be produced. In many cases the data warehouse team is driven by specific data requirements to learn the business data of the source systems, so there is no strict order between business research and demand research.

For example, analysts need the transaction amount by first-level category for Taobao (Taobao, Tmall, Tmall International). Given this requirement, we need to work out what to summarize by (the dimension) and what to summarize (the measure): here the category is the dimension and the amount is the measure. We also need to decide how the detail data and summary data should be designed. Is this a shared report? Should the result be materialized in a summary table, or aggregated in the reporting tool?
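The category example can be made concrete with a small sketch. The rows, field names, and amounts below are invented for illustration; only the roles come from the text: the first-level category is the dimension and the amount is the measure. The detail rows stand in for a detail fact table, and the aggregated result stands in for what a summary table would hold.

```python
from collections import defaultdict

# Detail data: one row per paid order line (hypothetical fields and values).
detail_rows = [
    {"platform": "Taobao", "level1_category": "Apparel", "amount": 120.0},
    {"platform": "Tmall", "level1_category": "Apparel", "amount": 80.0},
    {"platform": "Tmall", "level1_category": "Electronics", "amount": 300.0},
]


def summarize(rows, dimension, measure):
    """Sum the measure for each value of the dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


summary = summarize(detail_rows, "level1_category", "amount")
print(summary)  # {'Apparel': 200.0, 'Electronics': 300.0}
```

Whether this aggregation runs once and is stored in a summary table, or runs on demand in the reporting tool, is exactly the design question raised above; the same function can summarize by platform instead by passing a different dimension.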

Architecture design-data domain partition: a data domain is a collection of business processes or dimensions abstracted for business analysis. A business process can be thought of as an indivisible behavioral event, such as placing an order, paying, or refunding. To keep the whole system viable, data domains need to be abstracted carefully and then maintained and updated over the long term, but they should not change lightly. The partition should cover all current business requirements, and when a new business arrives it should either fit into an existing data domain or be accommodated by adding a new data domain without affecting the others.
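A minimal sketch of this partitioning rule follows. The domain names and most process names are hypothetical, and the constraint that a business process belongs to exactly one data domain is an assumption made for the sketch rather than something stated above. New processes either join an existing domain or create a new one, and existing domains are left untouched.

```python
# Data domains: domain name -> business processes (hypothetical names).
data_domains = {
    "trade": {"place_order", "payment", "refund"},
    "logistics": {"ship", "sign_for"},
}


def assign_process(domains, process, domain):
    """Add a process to a domain, creating the domain if it does not exist yet.

    Assumption for this sketch: a process may belong to only one domain.
    """
    for name, processes in domains.items():
        if process in processes and name != domain:
            raise ValueError(f"{process} already belongs to domain {name}")
    domains.setdefault(domain, set()).add(process)
    return domains


# A new process that fits an existing domain.
assign_process(data_domains, "confirm_receipt", "trade")
# A new business whose process needs a new domain; other domains are unaffected.
assign_process(data_domains, "post_review", "interaction")

print(sorted(data_domains))  # ['interaction', 'logistics', 'trade']
```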

Architecture design-build the bus matrix: after thorough business research and demand research, the bus matrix needs to be built. Two things need to be done: identify which business processes each data domain contains, and identify which dimensions each business process relates to, defining the business processes and dimensions under each data domain along the way.
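A bus matrix can be represented as a mapping from business processes to the dimensions they relate to, grouped by data domain. The sketch below assumes a hypothetical "trade" domain with made-up dimensions; it renders the matrix as rows of processes and columns of dimensions, with an "x" where a process uses a dimension.

```python
# Bus matrix: data domain -> business process -> related dimensions (hypothetical).
bus_matrix = {
    "trade": {
        "place_order": {"buyer", "seller", "product", "date"},
        "payment": {"buyer", "seller", "date"},
        "refund": {"buyer", "seller", "product", "date"},
    },
}


def render(matrix, domain):
    """Render one domain's bus matrix as comma-separated rows."""
    processes = matrix[domain]
    dims = sorted(set().union(*processes.values()))
    lines = ["process," + ",".join(dims)]
    for name, used in processes.items():
        marks = ["x" if d in used else "" for d in dims]
        lines.append(name + "," + ",".join(marks))
    return "\n".join(lines)


print(render(bus_matrix, "trade"))
# process,buyer,date,product,seller
# place_order,x,x,x,x
# payment,x,x,,x
# refund,x,x,x,x
```

Dimensions that appear in every row, such as buyer and date here, are the natural candidates for shared dimensions across the domain.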

Normative definition: the normative definition mainly specifies the indicator system, including atomic indicators, modifiers, time periods, and derived indicators.
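The four elements can be sketched as simple data types. The composition used here, a derived indicator built from one atomic indicator, zero or more modifiers, and one time period, follows the way OneData is commonly described but is not spelled out in the text above, so treat it as an assumption. The codes and the naming rule that joins them with underscores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicIndicator:
    code: str  # e.g. "pay_amt"
    name: str


@dataclass(frozen=True)
class Modifier:
    code: str  # e.g. "tmall"
    name: str


@dataclass(frozen=True)
class TimePeriod:
    code: str  # e.g. "1d"
    name: str


@dataclass(frozen=True)
class DerivedIndicator:
    atomic: AtomicIndicator
    modifiers: tuple[Modifier, ...]
    period: TimePeriod

    @property
    def code(self) -> str:
        # Hypothetical naming rule: atomic code, modifier codes, then period code.
        parts = [self.atomic.code, *(m.code for m in self.modifiers), self.period.code]
        return "_".join(parts)


pay_amt = AtomicIndicator("pay_amt", "payment amount")
tmall = Modifier("tmall", "Tmall platform")
last_day = TimePeriod("1d", "last 1 day")

indicator = DerivedIndicator(pay_amt, (tmall,), last_day)
print(indicator.code)  # pay_amt_tmall_1d
```

Defining derived indicators only through such compositions keeps names and calculation scopes consistent, since two teams combining the same elements arrive at the same indicator.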

Model design: model design mainly covers the standard definition of dimensions and their attributes, and the design of dimension tables, detail fact tables, and summary fact tables. For a detailed explanation of the relevant practice, please refer to the following chapters.
