Principles of distributed and highly available metadata acquisition

2026-05-06 Update From: SLTechnology News&Howtos shulou

Introduction:

Metadata collection is the core of any metadata product, and improving collection efficiency deserves careful thought: the collector must remain stable while keeping pace with mainstream technology. As metadata products have moved from the original centralized web application to today's distributed, microservice architectures, the original collection approach can no longer meet application needs.

Table of contents:

1. Principle of metadata acquisition

2. Distributed acquisition architecture

3. Application of distributed acquisition architecture

1. Principle of metadata acquisition

If we want to collect metadata, we must first understand what metadata is, where it is stored, and why it is collected.

Metadata (MetaData) is popularly explained as data that describes data. In practice, apart from the business data that business logic reads and writes directly, all the other information needed to keep the whole system running can be called metadata. Examples include the schema, table and column information of a database, the lineage of tasks, and the permission mappings between users and scripts or tasks.

Taking a big data platform as an example, metadata runs through the entire data flow on the platform: data source metadata, data processing metadata, subject database and thematic database metadata, service layer metadata, application layer metadata and so on.

The key to data governance is collecting information. Without data there is nothing to analyze, and the platform's data links cannot be managed or improved effectively. One of the most important functions of a metadata management platform is therefore information collection. What is collected depends on the needs of the business and the problems we are trying to solve.

How to collect metadata?

Metadata collection is the process of obtaining metadata across the data life cycle, organizing it, and writing it to a database.

Collection methods differ by source. They include direct database connections, interfaces, log files and other technical means, and they gather metadata both automatically and manually: data dictionaries of structured data, metadata of unstructured data, business indicators, codes, data processing flows and so on. Once collected, the metadata is organized into a structure that conforms to the CWM (Common Warehouse Metamodel) model and stored in a relational database.
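As a minimal sketch of the direct-connection method, the Python snippet below reads table and column metadata from a SQLite database and returns it as flat records. SQLite, the function name collect_sqlite_metadata and the record fields are illustrative assumptions; the article does not say which databases or record layout the product uses, and the mapping to CWM is not shown.

```python
import sqlite3

def collect_sqlite_metadata(conn):
    """Return one record per column found in the connected SQLite database.

    Hypothetical helper for illustration; the record layout is an assumption.
    """
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
        for _cid, col_name, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table_name}")'
        ):
            records.append({
                "table": table_name,
                "column": col_name,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            })
    return records

# Usage with an in-memory database standing in for a business system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
metadata = collect_sqlite_metadata(source)
for record in metadata:
    print(record)
```

A real collector would use the catalog views of each source database in the same way and then write the records to the metadata store.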

2. Distributed acquisition architecture

The timeliness requirements on the metadata held by metadata management tools keep rising. Our tool manages metadata from many sources and runs many collection tasks on a schedule, so how efficiently those tasks complete directly affects how current the stored metadata is. Our original strategy was a single acquisition program executing collection tasks serially, which was very inefficient. To improve efficiency, we now use multiple acquisition programs to execute collection tasks concurrently.
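The difference between the two strategies can be sketched in Python as follows. This is only an illustration: threads in one process stand in for the multiple acquisition programs, the source names are hypothetical, and run_collection_task simulates a task with a short sleep instead of querying a real data source.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_collection_task(source_name):
    """Stand-in for one collection task; a real task would query the source."""
    time.sleep(0.1)  # simulate I/O wait on the data source
    return f"{source_name}: collected"

sources = ["crm_db", "trading_db", "risk_db", "report_db"]  # hypothetical names

def collect_serially(names):
    # Original strategy: one task after another.
    return [run_collection_task(name) for name in names]

def collect_concurrently(names, workers=4):
    # Threads suit this workload because collection is mostly I/O-bound.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_collection_task, names))

start = time.perf_counter()
serial_results = collect_serially(sources)
serial_seconds = time.perf_counter() - start

start = time.perf_counter()
concurrent_results = collect_concurrently(sources)
concurrent_seconds = time.perf_counter() - start

print(f"serial: {serial_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

With four simulated tasks, the serial run takes roughly the sum of the task times, while the concurrent run takes roughly the time of the slowest task.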

The common architecture for metadata management tools is the traditional centralized web application, in which all functional modules are concentrated in a single application.

3. Application of distributed acquisition architecture

While doing data governance at a securities company, we found that the customer's network architecture was fairly complex. It was divided roughly into three layers: a business system layer, a data acquisition layer and a data storage layer.

Within the business system layer, the business systems are distributed across regions: for example, business system A in Beijing, business system B in Shanghai and business system C in Guangzhou. The database of each business system can only be reached through a proxy IP in the data acquisition layer, and the proxies for different regions sit in different network segments. The network segments of the data acquisition layer cannot reach one another, while the data storage layer can connect directly to every network segment of the data acquisition layer.

At present, the metadata product's architecture has two parts, the application and the collection service, in a one-to-one relationship. Given this network situation, the architecture needed to be adjusted.

First, change the relationship between the metadata application and the collection service to one-to-many. This requires a collection service management module that maintains (adds, deletes, modifies) the information of each collection service (IP, port) and maps each target data source to a collection service. A target data source can be configured with a primary and a standby collection service; if the primary fails, collection continues through the standby.

The collection service management module should also be easy to operate and practical, for example by showing the running status of each collection service and allowing a default collection service to be set.
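A collection service management module of this kind might look like the following Python sketch. The class names, method names, service names and IP addresses are all hypothetical, and health is tracked by an explicit mark_down call rather than by real health checks, which the article does not describe.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CollectionService:
    name: str
    ip: str
    port: int

@dataclass
class CollectionServiceRegistry:
    """Hypothetical management module: services plus source-to-service mapping."""
    services: dict = field(default_factory=dict)  # name -> CollectionService
    routes: dict = field(default_factory=dict)    # source -> (primary, standby)
    healthy: set = field(default_factory=set)     # names of services that are up

    def add_service(self, service):
        self.services[service.name] = service
        self.healthy.add(service.name)

    def remove_service(self, name):
        self.services.pop(name, None)
        self.healthy.discard(name)

    def map_source(self, source, primary, standby=None):
        self.routes[source] = (primary, standby)

    def mark_down(self, name):
        self.healthy.discard(name)

    def resolve(self, source):
        """Return the primary service if it is up, otherwise the standby."""
        for name in self.routes[source]:
            if name is not None and name in self.healthy:
                return self.services[name]
        raise RuntimeError(f"no healthy collection service for {source}")

registry = CollectionServiceRegistry()
registry.add_service(CollectionService("beijing-1", "10.1.0.10", 8080))
registry.add_service(CollectionService("beijing-2", "10.1.0.11", 8080))
registry.map_source("system_a_db", primary="beijing-1", standby="beijing-2")

before_failure = registry.resolve("system_a_db")
registry.mark_down("beijing-1")
after_failure = registry.resolve("system_a_db")
print(before_failure.name, "->", after_failure.name)  # beijing-1 -> beijing-2
```

Keeping the primary and standby as an ordered pair makes failover a simple "first healthy entry wins" lookup.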

Second, adjust metadata collection tasks to execute in parallel. The current collection steps are: obtain metadata > load into the temporary table > compare with the formal table, update metadata IDs and derive change information > write metadata and change information into the formal table.

The main difficulty in making collection tasks parallel is eliminating the temporary table. The metadata storage database has only one temporary table, so the next collection task can start only after the current task has finished and the temporary table has been emptied.

The purpose of a temporary table is:

To update metadata IDs and to find new, modified and deleted metadata. During collection, a random UUID is generated for each piece of metadata as its metadata ID. When the temporary table is compared with the formal table, any metadata that was stored previously must have its ID in the temporary table replaced with its ID from the formal table.
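The ID reconciliation step can be sketched with SQLite as follows. The table names, the columns and the use of a path column as the matching key are assumptions made for illustration; the article does not give the actual schema or say how rows are matched.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE formal (id TEXT PRIMARY KEY, path TEXT UNIQUE, attrs TEXT);
    CREATE TABLE temp_stage (id TEXT PRIMARY KEY, path TEXT UNIQUE, attrs TEXT);
""")
db.execute("INSERT INTO formal VALUES ('known-id', '/crm/customer/name', 'VARCHAR(64)')")

# Step 1: stage freshly collected metadata under new random UUIDs.
collected = [
    ("/crm/customer/name", "VARCHAR(128)"),
    ("/crm/customer/email", "VARCHAR(255)"),
]
db.executemany(
    "INSERT INTO temp_stage VALUES (?, ?, ?)",
    [(str(uuid.uuid4()), path, attrs) for path, attrs in collected],
)

# Step 2: metadata stored before must take over its existing formal-table ID.
db.execute("""
    UPDATE temp_stage
    SET id = (SELECT f.id FROM formal f WHERE f.path = temp_stage.path)
    WHERE path IN (SELECT path FROM formal)
""")

staged = dict(db.execute("SELECT path, id FROM temp_stage"))
print(staged["/crm/customer/name"])  # known-id
```

Because every task stages its rows in the same temp_stage table, two tasks running at once would mix their rows, which is why the tasks had to run one at a time.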

Measures to eliminate the temporary table:

1. We use the MD5 hash of the metadata code, metadata type and metadata parent path as the metadata ID. The ID is therefore fixed and no longer needs to be reconciled with the formal table.

2. Querying the formal table by metadata ID shows which metadata has been added or deleted.

We use the MD5 hash of all the attribute values of a piece of metadata as its attribute ID, so comparing attribute IDs tells us whether the metadata has been modified.
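The two identifiers described above could be computed along these lines in Python. The function names, the separator character, the sorting of attribute keys and the sample values are assumptions; the article only states which fields feed each MD5 hash.

```python
import hashlib

def metadata_id(code, metadata_type, parent_path):
    """Deterministic ID from code, type and parent path (MD5 hex digest)."""
    # A separator that is unlikely to appear in the parts avoids ambiguous
    # joins; the choice of "\x1f" is an assumption.
    key = "\x1f".join([code, metadata_type, parent_path])
    return hashlib.md5(key.encode("utf-8")).hexdigest()

def attribute_id(attributes):
    """Fingerprint of all attribute values; sorted so key order is irrelevant."""
    canonical = "\x1f".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

first_run = metadata_id("customer_name", "Column", "/crm/public/customer")
second_run = metadata_id("customer_name", "Column", "/crm/public/customer")
old_attrs = attribute_id({"type": "VARCHAR", "length": 64})
new_attrs = attribute_id({"length": 128, "type": "VARCHAR"})
print(first_run == second_run, old_attrs != new_attrs)  # True True
```

Because the metadata ID depends only on stable identifying fields, every collection run produces the same ID for the same object, while any change to an attribute value changes the attribute ID.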

In this way the temporary table can be eliminated: the collection service compares the collected metadata with the formal table data, derives the changed metadata, and writes the records directly to the formal table in the database. Metadata collection tasks can then run in parallel.
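The comparison inside the collection service can be sketched as a diff of two mappings from metadata ID to attribute ID. The function name diff_metadata and the sample IDs are hypothetical, and the sketch assumes the formal-table mapping has already been restricted to the data source being collected, since otherwise metadata from other sources would appear deleted.

```python
def diff_metadata(collected, formal):
    """Compare {metadata_id: attribute_id} maps and classify the changes."""
    added = sorted(collected.keys() - formal.keys())
    deleted = sorted(formal.keys() - collected.keys())
    modified = sorted(
        mid for mid in collected.keys() & formal.keys()
        if collected[mid] != formal[mid]
    )
    return {"added": added, "deleted": deleted, "modified": modified}

formal_table = {"id-a": "attr-1", "id-b": "attr-2", "id-c": "attr-3"}
collected_now = {"id-a": "attr-1", "id-b": "attr-9", "id-d": "attr-4"}
changes = diff_metadata(collected_now, formal_table)
print(changes)
# {'added': ['id-d'], 'deleted': ['id-c'], 'modified': ['id-b']}
```

Each task computes its own diff in memory, so no shared staging table is needed and tasks for different data sources do not block one another.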

We deploy a collection service in each network segment of the data acquisition layer to achieve highly concurrent metadata collection. The advantages of this distributed acquisition strategy are:

1. Metadata is collected efficiently.

2. Collection tasks can run in parallel.

3. Metadata collection adapts to complex network environments.
