What is the solution for timed tasks in a big data distributed task scheduling system?


2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains in detail how timed tasks are handled in a distributed task scheduling system for big data workloads.

From the perspective of architecture and technical implementation, we explain how the distributed task scheduling system TCT (Tencent Cloud Task) achieves accurate, real-time, stable, and efficient task scheduling, as well as task sharding and orchestration.

01 background introduction

First, let's consider a few business scenarios:

A credit card center needs to generate the monthly fee statements for all of its online users between 1:00 and 3:00 on the 28th of each month.
A clothing retailer needs to send birthday greetings to its members starting at 9:00 every morning.
A game platform needs to create, after each new user registers, a scheduled task that settles that user's virtual currency exchange commission quota at the end of the month.
A file service system needs to run Python scripts regularly to clean up invalid tmp files.
An insurance company needs to count the new policies added on the previous day at 2:00 every morning, trigger a report generation task, and send the report by email (with CC) once it completes.

Business scenarios like these, which batch-process massive numbers of timed tasks, are common as enterprises evolve from monolithic architectures to microservice and cloud-based service architectures. The conventional Quartz-based scheduling framework can no longer meet the needs of such distributed scenarios: it cannot deliver accurate, real-time, stable, and efficient task scheduling, nor can it provide task sharding, orchestration, and failure compensation. Enterprises therefore urgently need a one-stop distributed task scheduling solution that helps them manage complex scheduled tasks uniformly, strengthens the service capability of their microservice platforms, and supports their transition to cloud services.

02 existing open source solutions

Stones from other hills may serve to polish jade. Our predecessors left behind many excellent solutions, each with its own advantages and disadvantages. Common open source products include Quartz, XXL-JOB, Elastic-Job, Antares, and SIA-TASK.

Quartz: the most widely used framework, written entirely in Java. It offers extremely fine-grained control over a single task. Thanks to its powerful features and flexibility, it has become an authority in open source task scheduling and the cornerstone of similar open source products such as Antares.
XXL-JOB: a lightweight distributed task scheduling platform whose core design goals are rapid development, ease of learning, light weight, and easy extension. It supports sharding, simple task dependencies, and subtask dependencies, but it is not cross-platform.
Elastic-Job: supports task sharding (with job sharding consistency), but has no task orchestration and is not cross-platform.
SIA-TASK: offers cross-platform support, task orchestration, high availability, non-intrusive integration, consistency, asynchronous parallel execution, dynamic scaling, real-time monitoring, and other features.

Logical architecture diagram of open source solution

Technical implementation diagram of open source solution

The logical architecture and technical implementation of the open source solutions also make their shortcomings directly visible:

Architecture: the scheduler's responsibilities are not clearly divided, and the system is not sufficiently extensible. Simple remote calls are not adequate for large-scale virtualization and complex network environments.
Performance: as the number of tasks and high-frequency events grows, the ZooKeeper cluster becomes the bottleneck of system performance. Simple approaches such as remote calls or task pulling cannot meet high-volume, high-frequency business demands.
Functionality: there is no complete authentication design, so security cannot be guaranteed. Operations capabilities such as task intervention, monitoring, and alerting are weak.

03 introduction to TCT

To solve the problems above, we explored the space in depth and designed an enterprise-level distributed task scheduling system, TCT (Tencent Cloud Task). TCT provides a one-stop distributed task scheduling solution: it supports multiple task types such as random and broadcast execution, offers task sharding and task orchestration, and provides a complete monitoring and alerting system. Drawing on users' real business scenarios and on past experience, we focused on solving several core problems:

The above core elements have different requirements for the system, which can be summarized as follows for reference:

04 technical architecture

Technical architecture diagram

The functional modules in the architecture diagram are explained as follows:

05 functional architecture

Functional architecture diagram

This design of distributed task scheduling system has the following advantages:

Advantage 1: modular micro-service architecture design, clear responsibilities

Trigger

The trigger only needs to compute and parse the task trigger events at different times according to each task's execution rules. It delivers events reliably through MQ (follow-up articles will explain how reliable delivery is achieved), which smooths peaks and fills valleys, avoids problems such as peak I/O, and improves throughput. A reasonable sharding strategy and disaster recovery strategy replace the traditional parse-and-load approach, in which multiple nodes compete for a lock and poll, and so reduce the pressure on storage. A mechanism that loads hot and cold data separately further reduces storage pressure and system overhead.

For high-frequency tasks, a preloading strategy with a dynamically adjusted preloading algorithm addresses the high load caused by high-frequency triggering.
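The sharding and preloading ideas described for the trigger can be illustrated with a minimal sketch. It assumes that each task is mapped to a shard by a stable hash of its ID and that each trigger node loads only the tasks in the shards it owns that fire within a short preload window. The names `Task`, `shard_of`, and `tasks_to_preload` are hypothetical and are not part of TCT; the source does not describe the actual algorithm.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    next_fire_ts: int  # next trigger time, in seconds since the epoch


def shard_of(task_id, shard_count):
    """Map a task to a shard with a hash that is stable across processes."""
    return zlib.crc32(task_id.encode("utf-8")) % shard_count


def tasks_to_preload(tasks, owned_shards, shard_count, now_ts, window_s):
    """Return the tasks in this node's shards that fire within the window."""
    return [
        t for t in tasks
        if shard_of(t.task_id, shard_count) in owned_shards
        and now_ts <= t.next_fire_ts < now_ts + window_s
    ]


# Hypothetical data: 20 tasks firing 10 seconds apart, 4 shards, 2 nodes.
tasks = [Task(f"task-{i}", 1000 + i * 10) for i in range(20)]
node_a = tasks_to_preload(tasks, {0, 1}, 4, now_ts=1000, window_s=60)
node_b = tasks_to_preload(tasks, {2, 3}, 4, now_ts=1000, window_s=60)

# Every due task is loaded by exactly one node, so no lock is needed.
print(len(node_a) + len(node_b))  # 6
```

Because ownership is decided by the shard number, nodes never compete for the same task, and only the tasks inside the window are read from storage. A dynamic variant could shrink or widen `window_s` according to load, which this sketch does not attempt.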

Dispatcher

The dispatcher is the component with the most complex control logic in the whole task scheduling system, and it is I/O-intensive. By subscribing to MQ message events, it is decoupled from the trigger, which effectively improves system throughput.

It focuses on the control logic of task scheduling, such as task execution scheduling, load balancing, fault tolerance, rate limiting, and billing.
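Two of the dispatcher concerns listed above, rate limiting and load balancing, can be sketched as follows. This is an illustration under stated assumptions, not TCT's implementation: it assumes a token bucket for rate limiting and a least-loaded rule for node selection, and the names `TokenBucket`, `pick_least_loaded`, and `dispatch` are hypothetical.

```python
class TokenBucket:
    """Token bucket: refills at rate_per_s, holds at most capacity tokens."""

    def __init__(self, rate_per_s, capacity):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_ts = 0.0

    def allow(self, now_ts):
        # Refill according to the time elapsed since the last call.
        elapsed = max(0.0, now_ts - self.last_ts)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_ts = now_ts
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def pick_least_loaded(running_by_node):
    """Pick the node with the fewest running tasks (ties: by node name)."""
    return min(sorted(running_by_node), key=lambda n: running_by_node[n])


def dispatch(events, bucket, running_by_node):
    """Assign (timestamp, task_id) events to nodes, throttling the excess."""
    assigned, throttled = [], []
    for ts, task_id in events:
        if not bucket.allow(ts):
            throttled.append(task_id)
            continue
        node = pick_least_loaded(running_by_node)
        running_by_node[node] += 1
        assigned.append((task_id, node))
    return assigned, throttled


events = [(0.0, "t1"), (0.0, "t2"), (0.0, "t3"), (1.0, "t4")]
load = {"node-a": 0, "node-b": 0}
assigned, throttled = dispatch(events, TokenBucket(1.0, 2), load)
print(assigned)   # [('t1', 'node-a'), ('t2', 'node-b'), ('t4', 'node-a')]
print(throttled)  # ['t3']
```

In a real dispatcher the events would arrive from the MQ subscription, and a throttled event would more likely be delayed or requeued than dropped.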

Access gateway

The access gateway independently handles client authentication and authorization and provides an effective permission verification policy. It manages calls on the uplink and downlink channels and is decoupled from complex business logic. Automatic detection and awareness of client nodes and service nodes enables effective session management. Data transmission and routing form a closed loop within the component.

Combined with the design of the SDK/Agent side, it effectively avoids the connection-count bottleneck of a single node and the problem of highly concurrent TCP connection establishment when service nodes cold start.

Advantage 2: stateless design, simple horizontal expansion

Trigger

With an effective sharding strategy, the service can scale out and in elastically and quickly while avoiding concentrated trigger pressure, which gives it nearly stateless horizontal scaling.

Dispatcher

The design is completely stateless, so it does not need to consider the back-to-origin problem of tasks and scales horizontally without state.

Access gateway

The design is completely stateless, so it scales horizontally without state and can in theory support an unlimited number of TCP connections.

Advantage 3: complete functionality

Flexible trigger rules

Cron expressions are supported.
Trigger rules with a specific cycle frequency are supported, such as an interval of 36 minutes.
Convenient management: a variety of management and control operations are provided, such as pause, resume, stop, and retry.
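A short sketch shows why a fixed-interval rule such as 36 minutes is offered alongside Cron expressions. In standard cron semantics, a step in the minute field restarts at minute 0 of every hour, so `*/36` fires at minutes 0 and 36 and produces uneven gaps, whereas a fixed interval keeps the gap constant. The helper names below are hypothetical, and the code only simulates the minute field rather than parsing full Cron expressions.

```python
from datetime import datetime, timedelta


def interval_fire_times(start, every, count):
    """Fire times for a fixed-interval rule: start, start + every, ..."""
    return [start + i * every for i in range(count)]


def cron_minute_step_fire_times(start, step, count):
    """Fire times for a cron minute field of the form */step.

    The step restarts at minute 0 of every hour, so only the minutes
    0, step, 2 * step, ... below 60 match.
    """
    matching_minutes = set(range(0, 60, step))
    times = []
    t = start.replace(second=0, microsecond=0)
    while len(times) < count:
        if t.minute in matching_minutes and t >= start:
            times.append(t)
        t += timedelta(minutes=1)
    return times


def gaps_in_minutes(times):
    """Minutes between consecutive fire times."""
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


start = datetime(2025, 1, 1, 0, 0)
interval_gaps = gaps_in_minutes(
    interval_fire_times(start, timedelta(minutes=36), 5)
)
cron_gaps = gaps_in_minutes(cron_minute_step_fire_times(start, 36, 5))
print(interval_gaps)  # [36, 36, 36, 36]
print(cron_gaps)      # [36, 24, 36, 24]
```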

Task management

Three execution modes are supported

Random node execution: one available execution node in the cluster is selected to run the scheduled task. Applicable scenario: regular reconciliation.
Broadcast execution: the scheduled task is distributed to all execution nodes in the cluster, and every node runs it. Applicable scenario: batch operations and maintenance.
Sharding execution: the task is split according to user-defined sharding logic and distributed to different nodes in the cluster for parallel execution, which improves resource utilization. Applicable scenario: massive log statistics.
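The three execution modes can be compared with a small sketch. It assumes that a plan is a mapping from node name to the work that node receives, and that shards are handed out round-robin; both are illustrative choices, since the source leaves the sharding logic to the user. The function names are hypothetical.

```python
import random


def plan_random(nodes, rng):
    """Random node execution: exactly one available node runs the task."""
    return {rng.choice(nodes): "whole task"}


def plan_broadcast(nodes):
    """Broadcast execution: every node runs the task."""
    return {node: "whole task" for node in nodes}


def plan_sharding(nodes, shard_keys):
    """Sharding execution: split the shard keys across nodes round-robin."""
    plan = {node: [] for node in nodes}
    for i, key in enumerate(shard_keys):
        plan[nodes[i % len(nodes)]].append(key)
    return plan


nodes = ["node-1", "node-2", "node-3"]
print(len(plan_random(nodes, random.Random(0))))  # 1
print(sorted(plan_broadcast(nodes)))  # ['node-1', 'node-2', 'node-3']
print(plan_sharding(nodes, ["log-0", "log-1", "log-2", "log-3", "log-4"]))
# {'node-1': ['log-0', 'log-3'], 'node-2': ['log-1', 'log-4'], 'node-3': ['log-2']}
```

For massive log statistics, for example, each shard key could name a log partition, so that the nodes process disjoint partitions in parallel.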

Task scheduling and execution mode

Three triggering modes are supported

Manual trigger: the user selects a specific task in the task management list and runs it manually; the scheduler immediately dispatches the task and generates an execution batch. Applicable scenario: supplementary runs of periodically executed tasks.
Cycle trigger: the execution time of a task is defined by setting the interval between triggers, which supports cycles that Cron expressions cannot express. Applicable scenario: regular backup.
Workflow trigger: a workflow is a set of tasks; it orchestrates the upstream and downstream logical dependencies among the tasks and triggers them accordingly. Applicable scenario: massive data processing, such as orchestrating data acquisition, data filtering, data cleaning, and data aggregation.

Task trigger mode

Log traceability

The log service lets users conveniently query task execution logs. The execution records list the batch details of all tasks: users can stop a batch whose current status is running, and trigger re-execution of a batch that has been terminated. Clicking a batch ID opens the execution details of that batch, clicking a task ID opens the list of that task's execution batches, and clicking an execution deployment group opens the resource details list.

Log query

Support for complex task orchestration

Task workflows for multiple scenarios can be implemented. Complex orchestration logic is built by defining the upstream and downstream dependencies between scheduled tasks. This suits big data processing pipelines, task execution work orders, batch operations and maintenance processes, and similar scenarios.
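Building a workflow from upstream and downstream dependencies amounts to running the tasks in topological order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) and the data-processing stages named earlier as an example. The `run_workflow` helper and the task names are hypothetical, and this sketch runs tasks one at a time, whereas a real scheduler would dispatch all ready tasks in parallel.

```python
from graphlib import TopologicalSorter


def run_workflow(dependencies, run_task):
    """Run tasks so that each one starts only after its upstream tasks.

    dependencies maps each task name to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        # Every task returned here has all of its upstream tasks finished.
        for task in sorted(sorter.get_ready()):
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order


pipeline = {
    "filter": {"acquire"},
    "clean": {"filter"},
    "aggregate": {"clean"},
}
print(run_workflow(pipeline, lambda task: None))
# ['acquire', 'filter', 'clean', 'aggregate']
```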

Task scheduling

06 summary

A platform system faces challenges on every front, from product features to technical architecture, and it takes layers of abstraction and gradual optimization to deliver a mature product. In the era of big data, with massive data volumes and user scale, any architecture design faces many problems, such as network response, fault tolerance, idempotency, and data reliability and consistency.

For the platform, task reliability is the first priority, followed by the timeliness of task execution. The platform should split its functions into reasonable modules, design different scaling schemes for different scenarios, improve overall system throughput while meeting the SLA, provide reliable and effective access, and handle high-frequency, high-volume business scenarios.

Users, for their part, look for diversified management tools, multi-dimensional operational metric queries, and end-to-end link monitoring. Only when users are freed from complex and chaotic timed-task scenarios can they focus more on business research and development.

