2025-01-19 Update. Source: SLTechnology News & Howtos (Shulou.com), 06/03 report.
This article focuses on metadata, one of the foundations and the core of data governance, and explains it from the following angles:
The concept of metadata
Where metadata lives and how it is collected
Some practical application scenarios of metadata
1. What exactly is metadata?
If I say "metadata is the data that describes data", a passerby with no technical background may well be baffled by this tongue twister.
To put it simply, metadata is the household register of data.
What is a household register? It records not only basic descriptive information such as a person's name, age, gender, and ID number, but also that person's relationships to family members, such as father and son or brother and sister. Taken together, all of this information gives a comprehensive description of the person; you could call it the person's metadata.
Similarly, to clearly describe an actual piece of data, take a table as an example: we need to know the table name, its alias, its owner, the physical location where the data is stored, the primary key, the indexes, the fields in the table, the relationships between this table and other tables, and so on. All of this information together is the metadata of the table. With this comparison the concept becomes much clearer: metadata is the household register of data.
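As a minimal sketch, the "household register" of a table can be captured in a small data structure. The table names, owner, and paths below are hypothetical, invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    """Technical metadata for one table: its 'household register'."""
    name: str
    alias: str
    owner: str
    location: str                 # physical storage path
    primary_key: list[str]
    columns: dict[str, str]       # column name -> declared data type
    upstream: list[str] = field(default_factory=list)  # tables this one is derived from

# A hypothetical order table described by its metadata.
orders = TableMetadata(
    name="dw.orders",
    alias="Order fact table",
    owner="sales-team",
    location="hdfs:///warehouse/dw/orders",
    primary_key=["order_id"],
    columns={"order_id": "BIGINT", "customer_id": "BIGINT", "amount": "DECIMAL(10,2)"},
    upstream=["ods.orders_raw"],
)
print(orders.name, "derived from", orders.upstream)
```

Nothing here touches the data itself; every field describes the table, which is exactly the point of metadata.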
2. Metadata management is the core and foundation of data governance
If you were asked to lead troops into battle, what would you absolutely need? Yes, a map of the battlefield! In data governance, metadata is the map of all your data.
From this data map, we can learn:
What data do we have?
Where is the data distributed?
What are the types of these data?
What is the relationship between the data?
Which data is frequently referenced? Which data is never touched?
……
Therefore, doing data governance without this map is like the blind men and the elephant. In later articles we will talk about data asset management and knowledge graphs; most of that, too, is built on metadata. So we say: metadata is an organization's data map, and it is the core and foundation of data governance.
3. What is a metamodel?
A metamodel (Meta Model) is data that describes metadata. Its relationship to metadata and data can be pictured with the following diagram.
We will not go deep into the metamodel here. It is enough to know this: the data structure of metadata itself also needs to be defined and standardized, and that definition and specification is the metamodel. The international standard for metamodels is CWM (Common Warehouse Metamodel), and a mature metadata management tool should support the CWM standard.
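The three levels can be sketched in a few lines. This is a deliberately simplified illustration of the idea, not the CWM standard itself; all names are invented:

```python
# Level 0 -- data: an actual record.
row = {"order_id": 1001, "amount": 99.5}

# Level 1 -- metadata: describes the data.
table_metadata = {
    "table_name": "orders",
    "owner": "sales-team",
    "columns": ["order_id", "amount"],
}

# Level 2 -- metamodel: describes what a valid piece of metadata must contain.
metamodel = {"required_fields": ["table_name", "owner", "columns"]}

def conforms(metadata: dict, model: dict) -> bool:
    """Check that metadata carries every field the metamodel requires."""
    return all(f in metadata for f in model["required_fields"])

print(conforms(table_metadata, metamodel))  # True: this metadata fits the metamodel
```

Each level constrains the one below it: the metamodel standardizes metadata, and metadata describes data.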
The content below is a bit more technical, so please read it carefully.
4. Where does metadata come from?
On a big data platform, metadata runs through the entire data flow: data-source metadata, data-processing metadata, subject and thematic database metadata, service-layer metadata, application-layer metadata, and so on. The following figure shows the distribution of metadata, taking a data middle platform as an example:
The industry generally classifies metadata into the following types:
Technical metadata: database table structure, field constraints, data model, ETL program, SQL program, etc.
Business metadata: business metrics, business codes, business terminology, etc.
Management metadata: data owner, data quality responsibility, data security level, etc.
Metadata collection is the process of obtaining metadata across the data life cycle, organizing it, and writing it to a repository. Using technical means such as direct database connections, interfaces, and log files, collection gathers, both automatically and manually, the data dictionaries of structured data, the metadata of unstructured data, business indicators, codes, data-processing workflows, and so on. Once collected, the metadata is organized into a structure that conforms to the CWM model and stored in a relational database.
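The "direct database connection" route mentioned above can be sketched with SQLite's built-in data dictionary. The schema below is a hypothetical stand-in for a real source system:

```python
import sqlite3

# In-memory database standing in for a source system (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount REAL)""")

def collect_table_metadata(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Harvest one table's data dictionary over a direct connection."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    return [{"column": c[1], "type": c[2], "not_null": bool(c[3]), "pk": bool(c[5])}
            for c in cols]

for entry in collect_table_metadata(conn, "orders"):
    print(entry)
```

A production collector would do the same against `information_schema` in MySQL or PostgreSQL, then map the harvested dictionary into a CWM-conformant store.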
5. What can we do with metadata?
Let's look at a diagram of the overall functional architecture of metadata management. With metadata in hand, what we can do becomes clear at a glance:
① Metadata viewing
Metadata is generally organized in a tree structure and can be browsed and searched by type. For example, we can browse a table's structure, field information, data model, index information, and so on. With sensible permission assignment, metadata viewing greatly improves information sharing within the organization.
② Data lineage and impact analysis
Data lineage and impact analysis mainly answer the question "how are the data related?". Because of its value, some vendors pull this capability out of metadata management and offer it as a standalone feature. But since lineage and impact analysis are in fact derived from metadata, we describe them here under metadata management.
Lineage analysis traces the "bloodline" of a piece of data, recording as historical fact where the data came from and how it was processed. Taking a table as an example, lineage analysis reveals the table's upstream sources and the processing steps that produced it.
Lineage analysis is highly valuable: when problem data turns up during analysis, we can follow the lineage back to its origin, quickly locating the source of the problem data and the processing flow it went through, which reduces the time and difficulty of troubleshooting.
A typical lineage scenario: a business user finds a quality problem in the "monthly marketing analysis" report and raises it with the IT department. Through metadata lineage analysis, the engineers find that the report is fed by four different tables in the upstream FDM layer, quickly locating the source of the problem and fixing it at low cost.
Alongside lineage analysis there is impact analysis, which traces where data flows downstream. When upgrading a system and modifying metadata such as a data structure or an ETL program, impact analysis quickly identifies which downstream systems the change will affect, reducing the risk of the upgrade. As this suggests, impact analysis and lineage analysis point in opposite directions: lineage points to the upstream sources of the data, impact analysis to its downstream consumers.
A typical impact-analysis scenario: because of a business-system upgrade, an organization modifies a field in the "FINAL_ZENT" table, widening TRADE_ACCORD from length 8 to 64, and needs to assess the impact on downstream systems. Impact analysis on the "FINAL_ZENT" metadata shows that downstream DW-layer tables and ETL programs are affected. Once the IT department has located the impact, it updates the downstream programs and table structures in time and avoids problems. Impact analysis thus helps lock down the reach of a metadata change and nip potential problems in the bud.
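Both directions can be sketched as walks over the same dependency graph: lineage walks upstream, impact walks downstream. The graph below reuses FINAL_ZENT from the scenario above; the other table names are hypothetical:

```python
from collections import deque

# Edges point from an upstream table to the tables derived from it.
downstream = {
    "ods.trade_raw":         ["fdm.final_zent"],
    "fdm.final_zent":        ["dw.trade_fact"],
    "dw.trade_fact":         ["rpt.monthly_marketing"],
    "rpt.monthly_marketing": [],
}

# Invert the graph for lineage (upstream) queries.
upstream = {t: [] for t in downstream}
for src, targets in downstream.items():
    for t in targets:
        upstream[t].append(src)

def trace(graph: dict, start: str) -> list[str]:
    """Breadth-first walk; returns every table reachable from `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

print("lineage of rpt.monthly_marketing:", trace(upstream, "rpt.monthly_marketing"))
print("impact of fdm.final_zent:", trace(downstream, "fdm.final_zent"))
```

The same traversal answers both questions; only the edge direction changes, which is exactly the "opposite directions" point made above.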
③ Data hot/cold analysis
Hot/cold analysis gathers statistics on how data tables are used: the relationships between tables and ETL programs, tables and analytical applications, and tables and other tables. From the perspectives of access frequency and business need, it rates how "hot" or "cold" each table is and presents the table's importance index in chart form.
Hot/cold analysis is valuable too. A typical scenario: we notice that some data resources have sat idle for a long time, called by no application and used by no other program. Users can consult the hot/cold report, combine it with manual analysis, and store data of different temperatures in different tiers to make better use of HDFS resources, or evaluate whether to take this low-value data offline to save storage space.
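A minimal sketch of such a report, assuming access counts have already been harvested from query logs; the table names and thresholds are illustrative, not a standard:

```python
# Hypothetical 30-day access counts collected from query logs.
access_counts = {
    "dw.orders": 1520,
    "dw.customers": 430,
    "tmp.migration_2021": 0,
    "ods.legacy_dump": 2,
}

def temperature(count: int) -> str:
    """Bucket a table by access frequency (thresholds chosen for illustration)."""
    if count >= 100:
        return "hot"
    if count > 0:
        return "warm"
    return "cold"

report = {table: temperature(c) for table, c in access_counts.items()}
cold_tables = [t for t, temp in report.items() if temp == "cold"]
print(report)
print("candidates for tiered storage or offlining:", cold_tables)
```

The cold list is only a candidate list; as the text says, it should be combined with manual analysis before anything is archived or taken offline.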
④ Data asset map
By processing metadata, applications such as a data asset map can be built. A data asset map typically organizes information at the macro level, merging and arranging it from a global perspective to show data volume, data changes, data storage, overall data quality, and other information, providing a reference for data management departments and decision makers.
⑤ Other applications of metadata management
Metadata management has other important functions as well, for example: metadata change management, which lets you query a piece of metadata's change history and compare versions before and after a change; metadata comparison, which sets similar metadata side by side; and metadata statistics, which counts metadata of each kind, such as the types and quantities of data, so users can grasp the summary picture. We will not enumerate every such application here.
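Change management and comparison can be sketched together as a diff between two metadata snapshots. The versions below reuse the TRADE_ACCORD widening from the impact-analysis scenario; the REMARK column is hypothetical:

```python
# Two snapshots of one table's metadata (hypothetical versions).
v1 = {"table": "FINAL_ZENT",
      "columns": {"TRADE_ACCORD": "VARCHAR(8)"}}
v2 = {"table": "FINAL_ZENT",
      "columns": {"TRADE_ACCORD": "VARCHAR(64)", "REMARK": "VARCHAR(256)"}}

def diff_columns(old: dict, new: dict) -> dict:
    """Compare two metadata versions column by column."""
    added   = {c: t for c, t in new.items() if c not in old}
    removed = {c: t for c, t in old.items() if c not in new}
    changed = {c: (old[c], new[c])
               for c in old.keys() & new.keys() if old[c] != new[c]}
    return {"added": added, "removed": removed, "changed": changed}

print(diff_columns(v1["columns"], v2["columns"]))
```

A real tool would persist every snapshot so that any two versions in the change history can be compared this way.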
6. Summary
About the author: Jiang Zhenbo has 6+ years of data governance experience and specializes in providing customers with sound, practical data governance solutions. He has worked at Longtong, Ruotong Power, Puyuan Information, and other companies, responsible for pre-sales consulting on data warehouse construction, BI, big data platforms, and data governance, with experience in the government, power, manufacturing, and other industries. He currently works as a big data platform pre-sales consultant at Qilan Technology.