2025-01-18 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report
This article walks through the data-monitoring problems that arise around done files. The material is short and straightforward; follow along to see where done-file monitoring commonly breaks down and what a better design looks like.
1 Problems
Outside of Alibaba with its DataWorks, few companies manage data scheduling, data monitoring, data lineage, metadata management, and so on as a single integrated platform. Many shops, ours included, build these pieces independently, each owned by a different team. Among them, the scheduling function is the foundational platform at most companies, though how complete that scheduling is varies widely. The problems below are offered as a starting point for discussion, pointing out issues that come up regularly in production. If there is follow-up work, I will try to open-source some code and append it to this post.
At a high level there are two kinds of monitoring: one is used to intercept, i.e. it acts as a blocking dependency, and the other is used only for alarms and analysis.
Because dependencies come from many different sources, the following problems occur frequently:
1.1 Data is produced late, a partition comes out empty, or data quality is suspect (row count, incorrect timestamps)
Typical handling of a delay problem, taking 30+ minutes: build the dependency graph → confirm which upstream table has no output → copy the table name → find the person in charge in the data map → usually pull a chat group to follow up → wait for the fix → follow along synchronously or asynchronously → the output finally lands.
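The manual walk above (find the upstream table with no output, look up its owner, pull a group) can be partially automated once a dependency graph and an owner registry exist. A minimal sketch, where `deps`, `has_output`, and `owners` are hypothetical structures standing in for the scheduler's metadata and the data map:

```python
from collections import deque

def find_blocking_upstreams(table, deps, has_output):
    """BFS upstream from `table` and return the nearest ancestor
    tables whose expected output has not been produced.

    deps:       dict mapping table -> list of direct upstream tables
    has_output: dict mapping table -> bool (did today's partition land?)
    """
    blocking, seen = [], {table}
    queue = deque(deps.get(table, []))
    while queue:
        t = queue.popleft()
        if t in seen:
            continue
        seen.add(t)
        if not has_output.get(t, False):
            blocking.append(t)              # this ancestor is the hold-up
        else:
            queue.extend(deps.get(t, []))   # produced fine; look further up
    return blocking

def notify_owners(blocking, owners):
    # owners: dict table -> person in charge, as looked up in the data map
    return {t: owners.get(t, "unknown") for t in blocking}
```

This replaces the copy-the-table-name / find-the-owner steps with a lookup; the actual follow-up conversation still happens in the group chat.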
1.2 The user consumes the wrong data without realizing it; an empty-partition problem takes 60+ minutes to handle
Handling: quality checks are needed on the label distribution of the final output, but none exist → the problem is found only later → copy the table name → find the person in charge in the data map → usually pull a chat group to follow up → wait for the fix → follow synchronously or asynchronously → backfill the data → notify users of the data problem.
1.3 The user notices the wrong data first
Time spent: 60+ minutes. Data-quality problems (row count, timestamps) are usually only discovered by the tag's consumers → reproduce the problem → copy the table name → find the person in charge in the data map → usually pull a chat group to follow up → wait for the fix, following synchronously or asynchronously, until it is done.
1.4 Data delayed past its deadline
Some routine datasets must land by a fixed hour every day. When they do not, someone has to go to the upstream owners one by one to find the problem. As in 1.1–1.3, the upstream has to be traced manually.
2 Ideas for a solution
Looking at the problems above, the common thread is incomplete monitoring. So what would complete monitoring look like?
For a known problem class, as long as the label distribution of a table or dataset is monitored, both the data's consumers and its publisher can be notified automatically when something goes wrong. Once the problem is assigned to someone, they can acknowledge the alarm and must handle it within a set time window. If it is not resolved in time, the alarm keeps firing, but with a widening blast radius: phoning the responsible manager, email, SMS, @-mentioning everyone in the group, and so on. A side benefit is that the data's SLA is guaranteed to some degree, which can be audited afterward or used on "certain special occasions" later.
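The widening-blast-radius idea above can be sketched as a small escalation ladder. The thresholds and recipient names here are hypothetical placeholders, not values from the original design:

```python
# Hypothetical escalation ladder: minutes an alarm sits unacknowledged
# -> who gets paged. Channels widen as time passes (IM -> lead -> manager).
ESCALATION = [
    (0,  ["task_owner"]),                           # immediate IM alert
    (30, ["task_owner", "team_lead"]),              # unhandled after 30 min
    (60, ["task_owner", "team_lead", "manager"]),   # phone/SMS, @all in the group
]

def recipients(minutes_unacked):
    """Return who should be alerted given how long the alarm went unhandled."""
    who = ESCALATION[0][1]
    for threshold, targets in ESCALATION:
        if minutes_unacked >= threshold:
            who = targets
    return who
```

Keeping the ladder as data (rather than hard-coded branches) makes it easy to audit against the SLA the alarm is supposed to protect.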
Given these requirements, the design is as follows: monitoring is kept independent of the scheduling system. The only interaction with the scheduler is the done file, and scheduling proceeds only after the done file is produced.
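That contract can be made concrete with a simple polling gate on the scheduler side. This is a minimal sketch under the stated design (the done file is the sole interface); the path layout and timeouts are assumptions:

```python
import os
import time

def wait_for_done(done_path, timeout_s=3600, poll_s=30):
    """Block a downstream task until the upstream done file appears.

    The done file is the only contract between the monitor and the
    scheduler: the monitor writes it once its checks pass, and the
    scheduler polls for it before releasing dependent tasks.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if os.path.exists(done_path):
            return True
        time.sleep(poll_s)
    return False  # timed out -> raise a delay alarm instead of running
```

Because the scheduler only ever looks for a file, the monitoring side can be rewritten or replaced without touching scheduling code.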
2.0 Why base it on done files?
Task dependency: to check the quality of a data source under task-level dependency, a detection dependency must be configured for every task. This causes two problems: the detection scripts end up scattered, and much of the detection logic is similar, so the scripts become redundant.
Table dependency: here the detection target is a partition of the table. When the data-quality check passes, a marker partition is generated and registered via ADD PARTITION, along the lines of dt=xxxx/rule=check_t1_count.done.
File dependency: similar to table dependency in that a done file is generated, but the done file can be created directly through a service call, which makes file dependency the most convenient choice.
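Putting the last two sections together, the monitor's job is: run the quality rule, and only on success emit the done file that downstream dependencies wait on. A minimal sketch; the specific rule (`check_row_count`) and the flat-directory layout are assumptions for illustration, and the filename follows the table-name-plus-task-id convention described below:

```python
import os

def check_row_count(partition_rows, min_rows=1):
    # One example rule: a non-empty partition. A real deployment would
    # also check timestamps, schema, and the distribution of key labels.
    return partition_rows >= min_rows

def publish_done(done_dir, table, task_id, partition_rows):
    """Run the quality rule and, only on success, write the done file.

    No done file means downstream tasks stay blocked, which is exactly
    the interception behavior described in section 1.
    """
    if not check_row_count(partition_rows):
        return None  # check failed -> downstream remains blocked
    path = os.path.join(done_dir, f"{table}_{task_id}.done")
    with open(path, "w") as f:
        f.write("ok\n")
    return path
```

In practice this function would sit behind the service endpoint the text mentions, so any producer can publish a done file with one call.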
2.1 The done file is named as a unique table name plus the task id: <table name>_<task id>.done
2.2 Single-point alarms plus multi-layer escalation: when table A and table B are linked, define who is alerted for which table, specifically for delayed output and task failures.
That covers the data-monitoring problems of done files. Hopefully you now have a deeper understanding of them; the specifics still need to be verified in practice. Thanks for reading.