Trustworthy open source | Architecture design and operation flow of distributed task scheduling platform SIA-TASK

2025-07-09 Update From: SLTechnology News&Howtos

1. Background of distributed task scheduling

Both Internet applications and enterprise applications run large numbers of batch-processing tasks, and we often need a task scheduling system to manage them. As micro-service architecture has matured, monolithic architectures have gradually evolved into distributed and micro-service architectures. In this context, many existing task scheduling platforms can no longer meet the needs of business systems, which is why distributed task scheduling platforms have emerged.

1.1 Evolution of distributed task scheduling

In real business development, we inevitably need scheduled tasks to solve some problems. There are several common solutions, such as Crontab or Spring Cron (suitable when there are few machines and few tasks). However, as an application grows more complex, the number of scheduled tasks increases and dependencies appear between tasks. Managing and configuring scheduled tasks in Crontab then becomes chaotic and seriously hurts work efficiency. A series of problems arises:

Task management is chaotic, and task life cycles cannot be managed in a unified way.

Dependencies between tasks are difficult to orchestrate.

With the development of the Internet, the distributed service architecture is becoming more and more popular. Accordingly, a distributed task scheduling system is needed to manage scheduled tasks in the distributed architecture.

1.2 Distributed task scheduling architecture

As the number of vertical applications grows, the interactions between them become increasingly complex. We usually adopt a distributed or micro-service architecture and extract the core business into separate services. These independent micro-services gradually form a stable service center, which lets business applications respond more quickly to changing market demand.

At this point, a distributed service framework that improves business reuse and integration becomes the key. Because services are independent, scheduled tasks can generally be made independent as well, so a task change has very little impact on the overall system. Usually we separate the task from the scheduling (as shown in the figure above): the execution logic of a task does not need to care about scheduling or orchestration. This separation also makes it possible to keep both executors and schedulers highly available, and makes the system easy to develop and maintain.

1.3 Advantages of distributed task scheduling

In a distributed service architecture there may be a large number of independent services. If each service implements its own scheduled tasks, they quickly become hard to manage, and a change to a scheduled task forces a restart of the business service. An independent distributed task scheduling system is therefore necessary to manage all scheduled tasks as a whole. Task configuration is also separated out as a function of the scheduling system, so that changing a scheduled task does not affect any business service or the system as a whole. The advantages are:

Separating scheduling from tasks greatly reduces development and maintenance costs.

Distributed deployment ensures high availability, scalability and load balancing, and improves fault tolerance.

Scheduled tasks can be deployed and managed through a console, which is convenient, flexible and efficient.

Tasks can be persisted to a database, which avoids the risks of downtime and data loss. There are also a complete failure-retry mechanism and detailed task tracking and alarm strategies.

2. Selection of distributed task scheduling technology

2.1 Considerations for distributed task scheduling

Task orchestration: scheduled tasks across multiple services must run in a defined order.

Task sharding: a large task needs to be split and executed in parallel.

Cross-platform: besides projects on the Java stack (SpringBoot, Spring, etc.), there are applications written in other languages.

Non-intrusive: the business should not be tightly coupled to scheduling and should focus only on its own execution logic.

Failover: problems during task execution are compensated automatically to reduce human intervention.

High availability: the scheduling system itself must be highly available.

Real-time monitoring: the execution status of tasks is available in real time.

Visualization: task orchestration is done on a visual page for ease of use.

Dynamic editing: the clock parameters of a business task may change, and the change should not require downtime or redeployment.

2.2 Comparison between SIA-TASK and other distributed task scheduling technologies

SIA (Simple is Awesome) is the basic development platform of Yixin Company, and SIA-TASK, a micro-service task scheduling platform, is one of its important products. SIA-TASK fits the current micro-service architecture model and is cross-platform, orchestratable, highly available, non-intrusive, consistent, asynchronous and parallel, dynamically scalable, and monitorable in real time.

Open source address: https://github.com/siaorg/sia-task

We first compare the mainstream open source distributed task scheduling frameworks on the market and analyze their advantages and disadvantages, and then introduce our technology selection.

Quartz: Quartz is an open source task scheduling project from the OpenSymphony open source organization, implemented entirely in Java. The project was acquired by Terracotta in 2009 and is currently a Terracotta project. Compared with the scheduled tasks provided by the JDK or Spring, Quartz offers very fine-grained control over a single task, and it plays a major role in enterprise applications because of its powerful functions and flexibility. However, Quartz supports neither task orchestration (dependencies between tasks) nor task sharding.

TBSchedule: TBSchedule is a distributed scheduling framework, open-sourced by Alibaba, that dynamically assigns batch tasks or ever-changing tasks to the JVMs of multiple hosts and executes them in parallel in different thread groups. It is a pure Java implementation based on ZooKeeper. TBSchedule focuses on task distribution and supports task sharding, but it has no task orchestration and is not cross-platform.

Elastic-Job: Elastic-Job is a distributed scheduling solution open-sourced by Dangdang. It consists of two independent subprojects, Elastic-Job-Lite and Elastic-Job-Cloud. Elastic-Job supports task sharding (with job sharding consistency), but it has no task orchestration and is not cross-platform.

Saturn: Saturn is VIPSHOP's open source distributed, highly available scheduling service. Saturn is a secondary development on top of Elastic-Job. It supports monitoring, task sharding and cross-platform use, but has no task orchestration.

Antares: Antares is a Quartz-based distributed scheduler that supports sharding and tree-structured task dependencies, but it is not cross-platform.

Uncode-Schedule: Uncode-Schedule is a distributed task scheduling component based on Zookeeper. It ensures that all tasks in the cluster are executed without repetition or omission, and it supports adding and deleting tasks dynamically. However, it does not support task sharding, has no task orchestration, and is not cross-platform.

XXL-JOB: XXL-JOB is a lightweight distributed task scheduling platform. Its core design goals are rapid development, easy learning, light weight and easy extension. XXL-JOB supports sharding and simple task dependency in the form of subtask dependencies, but it is not cross-platform.

Let's briefly compare SIA-TASK with these task scheduling frameworks:

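Framework | Task orchestration | Task sharding | Cross-platform | High availability | Failover | Real-time monitoring
SIA-TASK | √ | √ | √ | √ | √ | √
Quartz | × | × | .NET | √ | × | API monitoring
TBSchedule | × | √ | × | √ | √ | √
Elastic-Job | × | √ | × | √ | √ | √
Saturn | × | √ | √ | √ | √ | √
Antares | √ | √ | × | √ | √ | √
Uncode-Schedule | × | × | × | √ | √ | √
XXL-JOB | subtask dependency | √ | × | √ | √ | √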

The comparison shows that these scheduling frameworks generally support high availability, failover and real-time monitoring, but each has a different emphasis when it comes to task orchestration, task sharding and cross-platform support. SIA-TASK will fully support these features.

3. Introduction to SIA-TASK

3.1 SIA-TASK technology selection

REST: a software architecture style. Executors are required to expose HTTP interfaces so that the platform is cross-platform.

AOP: aspect-oriented programming. Used in Hunter, the Spring project extension package, to ensure that a Task is called serially (singleton, single thread).

Quartz: powerful and flexible, with very fine-grained control over a single task. Used as the clock component of the scheduling center.

MySQL: used for metadata storage and (temporary) log access.

Elastic: a Lucene-based search server that provides a distributed, multi-user full-text search engine. Used for log storage and query.

SpringCloud: a development framework with an active community and the unified development framework designated by the company. Used for rapid development and fast iteration.

MyBatis: an excellent persistence-layer framework that supports custom SQL, stored procedures and advanced mapping. Used to simplify persistence-layer development.

Zookeeper: a tried and tested registry. Used to solve problems such as scheduling center high availability and distributed consistency.

3.2 Design ideas of SIA-TASK

SIA-TASK borrows from micro-service design: it collects the Task metadata distributed across the executor nodes and uploads it to the registry. Tasks are orchestrated online in an editable mode, and task clocks can be modified dynamically. HTTP is the transport protocol for interaction, and JSON is the unified data exchange format. The user operates through the orchestration center (described below), which triggers an event; the scheduler receives the event, and the scheduling center parses the clock, executes the task flow, and notifies the tasks.
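To make the HTTP and JSON interaction concrete, here is a minimal Java sketch of how a scheduler-side component might post a JSON payload to a Task endpoint without blocking. It is not SIA-TASK code: the class name TaskHttpCall, the URL, the payload and both timeout values are assumptions chosen for illustration, and it uses the JDK's java.net.http client (Java 11+) rather than whatever HTTP client SIA-TASK itself uses.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch, not SIA-TASK code. URL, payload and timeouts are assumptions.
class TaskHttpCall {

    // One client shared by all calls; the 2-second connect timeout is an assumed value.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    // Build a POST request that carries a JSON body to a Task endpoint.
    static HttpRequest buildRequest(String taskUrl, String jsonBody, Duration responseTimeout) {
        return HttpRequest.newBuilder(URI.create(taskUrl))
                .timeout(responseTimeout) // maximum wait for the response
                .header("Content-Type", "application/json;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Send the request asynchronously; the caller gets a future for the response body.
    static CompletableFuture<String> callAsync(HttpRequest request) {
        return CLIENT.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(
                "http://localhost:8080/example", "{\"jobKey\":\"demo\"}", Duration.ofSeconds(5));
        // Only the request is built here; callAsync(request) would actually send it.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Because the call returns a future, the calling thread is free to trigger other Tasks while it waits, which is the property an asynchronous scheduler relies on.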

3.3 Basic concepts of SIA-TASK

SIA-TASK separates tasks from scheduling: the business's task execution logic and the scheduling logic are completely decoupled. The system involves the following core concepts:

Task: the basic execution unit, an HTTP interface exposed by the executor.

Job: consists of one or more Tasks that are logically related to each other (serial or parallel). It is the smallest unit scheduled by the task scheduling center.

Plan: consists of several Jobs executed in sequence. Each Job has its own execution cycle; the Plan itself has none.

Task scheduling center (Scheduler): schedules according to the execution cycle of each Job, that is, it makes HTTP requests according to the logic of Plans, Jobs and Tasks.

Task orchestration center (Config): uses Tasks to create Plans and Jobs.

Task executor (Executor): receives HTTP requests and executes the business logic.

Hunter: a Spring project extension package that captures the Tasks in an executor and uploads them to the registry. Business code can depend on this component when writing Tasks.

3.4 SIA-TASK system architecture

SIA-TASK can be divided into three modules (scheduling center, orchestration center and executor) and two components (persistent storage and registry). The functions of these three modules and two components are as follows:

Task scheduling center: responsible for preempting Jobs, scheduling tasks and migrating tasks. It is the core functional module of SIA-TASK.

Task orchestration center: responsible for the logical orchestration of online tasks, and provides log viewing and real-time monitoring.

Task executor: responsible for receiving scheduling requests and executing task logic.

Task registry (ZK): coordinates the workflow of Jobs, Tasks, schedulers, etc.

Persistent storage (DB): records the Job and Task data of each project and provides log storage.

SIA-TASK is built on the SpringBoot stack, with secondary development based on Quartz and Zookeeper, and supports the corresponding features. The logical architecture diagram of SIA-TASK is shown below:

3.5 SIA-TASK module description

3.5.1 Task scheduling center

The task scheduling center is responsible for scheduling tasks: it manages scheduling information and issues scheduling requests according to the scheduling configuration, and it carries no business code. Decoupling the scheduling system from the tasks improves the availability and stability of the system, and the performance of the scheduling system is no longer limited by the task modules. Scheduling information can be managed visually, simply and dynamically, including creating, updating and deleting tasks and configuring task alarms, and all of these operations take effect in real time. The scheduling center also supports monitoring of scheduling results and execution logs, as well as executor fault recovery.

3.5.2 Task orchestration center

The task orchestration center is the component of the distributed scheduling platform that supports online orchestration of task models. Tasks can be orchestrated on the web through the UI.

We can use the above basic model to orchestrate some complex scheduling models, such as:

SIA-TASK's UI orchestration interface:

View the orchestration information of task after the orchestration is finished, as shown in the following figure:

The orchestration center also provides home page statistics, scheduling monitoring, Job management, Task management and log management.

3.5.3 Task executor

Responsible for receiving scheduling requests and executing task logic. The task module focuses on operations such as task execution, making it easier and more efficient to develop and maintain.

Two types of executors are supported:

(1) With sia-task-hunter: SpringBoot projects and Spring projects are supported. Once sia-task-hunter (the Task capture client) is introduced, compliant HTTP interfaces (called Tasks) are captured automatically and uploaded to the registry.

(2) Without sia-task-hunter: you only need to provide an HTTP API that the scheduling center can call. In this case, the Task must be entered manually, and the business code must control concurrent calls to the Task itself.
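For the second case, where the business code has to control concurrent calls itself, one simple option is a guard that lets only one invocation of a Task run at a time. The sketch below is an assumption for illustration, not the sia-task-hunter implementation: the class SerialTaskGuard is hypothetical, and rejecting an overlapping call (rather than queuing it) is a design choice made here, not something the source specifies.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch, not the sia-task-hunter implementation.
class SerialTaskGuard {

    // True while one invocation of the Task is in progress.
    private final AtomicBoolean running = new AtomicBoolean(false);

    // Runs the task only if no other call is in progress; otherwise returns busyResult.
    String runExclusively(Supplier<String> task, String busyResult) {
        if (!running.compareAndSet(false, true)) {
            return busyResult; // an earlier invocation has not finished yet
        }
        try {
            return task.get();
        } finally {
            running.set(false); // always release, even if the task throws
        }
    }

    public static void main(String[] args) {
        SerialTaskGuard guard = new SerialTaskGuard();
        System.out.println(guard.runExclusively(() -> "success", "busy"));
    }
}
```

A controller method that exposes the Task over HTTP could wrap its business logic in runExclusively and return the busy result as its JSON response when a call overlaps.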

3.5.4 Task Registry (Zookeeper)

The distributed framework uses Zookeeper as the registry.

(1) Task registration

Both the scheduling center and the executor cluster use Zookeeper as the registry. All data is registered as nodes and node contents, and each host reports its status to Zookeeper periodically to keep itself alive.

(2) metadata storage

The registry not only provides registration services, but also stores information about each executor (including executor instance information, Task metadata uploaded by the executor, and some temporary status data when the task is running).

(3) event release

Tasks are issued through Zookeeper's event push mechanism, and a balancing algorithm ensures that Jobs are distributed evenly among the schedulers that preempt them.

(4) load balancing

Ensures that the number of Jobs each scheduler is executing stays balanced, so that no single node is overloaded.

3.5.5 Persistent storage (DB)

Here MySQL is used as the data persistence solution.

Except for Task dynamic metadata stored in the registry, other related metadata is stored in MySQL, including but not limited to: manually entered Task, configured Job information, choreographed Task dependency information, scheduling log, business staff operation log, Task execution log, and so on.

3.6 SIA-TASK key operation processes

3.6.1 Task release process

(1) Users create a Job through the UI, where they can select the Job type, set the alert mailbox, and enter a Job description. Tasks are then orchestrated for the created Job.

(2) After the Job has been created and the Task orchestration relationships are set, the Job can be released and operated through the UI (activate, execute once, stop, and delete).

(3) A user's Task can be captured automatically by the Task capture client, or created manually through the UI.

3.6.2 Execution process

(1) After a Job is created, you can activate it to trigger scheduled execution.

(2) When the Job's scheduled time arrives, the scheduling center triggers the Job, notifies the Task executors over HTTP according to the orchestrated Task logic, and asynchronously listens for the execution results.

(3) If a Task succeeds, the scheduling center checks whether there is a post-Task. If so, it schedules the next Task; if not, the Job run is complete and the call ends. If a Task fails, the fault recovery strategy is triggered: stop immediately, ignore the failure, retry several times, or transfer to another executor.
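The flow in step (3) can be sketched as a loop over a Job's Tasks with a pluggable failure policy. This is a simplified illustration under stated assumptions, not SIA-TASK's implementation: the names JobRunSketch and FailurePolicy are hypothetical, Tasks are treated as a purely serial chain, and the HTTP call is replaced by a BiPredicate that reports success or failure.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of the execution flow; names are not taken from SIA-TASK.
class JobRunSketch {

    enum FailurePolicy { STOP, IGNORE, RETRY, TRANSFER }

    // Runs Tasks in order. 'call' stands in for the HTTP request: (executor, task) -> success?
    static boolean runJob(List<String> tasks, List<String> executors,
                          FailurePolicy policy, int maxRetries,
                          BiPredicate<String, String> call) {
        for (String task : tasks) {
            boolean ok = call.test(executors.get(0), task);
            if (!ok && policy == FailurePolicy.RETRY) {
                for (int i = 0; i < maxRetries && !ok; i++) {
                    ok = call.test(executors.get(0), task); // same executor again
                }
            } else if (!ok && policy == FailurePolicy.TRANSFER) {
                for (int i = 1; i < executors.size() && !ok; i++) {
                    ok = call.test(executors.get(i), task); // next executor in the list
                }
            }
            if (!ok && policy != FailurePolicy.IGNORE) {
                return false; // the Job stops at the failed Task
            }
        }
        return true; // no post-Task left: the Job run is complete
    }

    public static void main(String[] args) {
        List<String> tasks = Arrays.asList("taskA", "taskB");
        List<String> executors = Arrays.asList("10.0.0.1", "10.0.0.2");
        // Simulated executors: only the second host succeeds.
        boolean done = runJob(tasks, executors, FailurePolicy.TRANSFER, 0,
                (host, task) -> host.equals("10.0.0.2"));
        System.out.println("job completed: " + done);
    }
}
```

Keeping the policy as a parameter means the same loop covers all four strategies; a real scheduler would additionally record each attempt in the Task log.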

3.6.3 State transfer

A Job has four states throughout its life cycle: stopped (NULL), ready (READY), running (RUNNING), and abnormally stopped (STOP). The state transitions and their conditions are shown in the following figure.

3.7 SIA-TASK module design

The physical network topology diagram of SIA-TASK is shown below:

The interaction between SIA-TASK modules is designed as follows:

(1) Tasks are created through the orchestration center or captured automatically by Hunter, and Task information is saved to the DB asynchronously. A Job is then created and activated, and a JobKey is created in Zookeeper.

(2) The scheduling center listens for JobKey creation events in Zookeeper and preempts the created Job. After a successful preemption, the Job is added to Quartz as a scheduled task and is triggered when its time arrives. The scheduling center asynchronously calls the executor service to execute the Tasks in the Job (there may be several, and the Task failure policy is followed), and the results are returned to the scheduling center.

(3) The execution status of the Job is updated on Zookeeper as it changes and can be queried through the query interface of the orchestration center.

(4) After the Job run finishes, the Job waits for its next execution.
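The preemption in step (2) is essentially a race in which exactly one scheduler may claim each JobKey. The sketch below illustrates that idea only: a ConcurrentHashMap stands in for Zookeeper, and the class JobPreemptionSketch and its methods are hypothetical names. How SIA-TASK actually records ownership in Zookeeper is not described in this article, so treat the atomic put as an assumption playing the role of an atomic node creation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: an in-memory map stands in for the Zookeeper registry.
class JobPreemptionSketch {

    // JobKey -> id of the scheduler that owns the Job.
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    // Atomically claim the Job; only the first scheduler to arrive wins.
    boolean tryPreempt(String jobKey, String schedulerId) {
        return owners.putIfAbsent(jobKey, schedulerId) == null;
    }

    // Returns the owning scheduler, or null if nobody has claimed the Job yet.
    String ownerOf(String jobKey) {
        return owners.get(jobKey);
    }

    public static void main(String[] args) {
        JobPreemptionSketch registry = new JobPreemptionSketch();
        boolean first = registry.tryPreempt("demo-job", "scheduler-1");
        boolean second = registry.tryPreempt("demo-job", "scheduler-2");
        System.out.println(first + " " + second + " owner=" + registry.ownerOf("demo-job"));
    }
}
```

The scheduler whose tryPreempt call returns true would then register the Job with its local Quartz instance; the others simply move on to the next JobKey event.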

3.7.1 Task orchestration center design

The orchestration center can exchange data with DB and zookeeper, and its main functions can be divided into three aspects:

Data persistence interface service.

Metadata changes on Zookeeper.

Data visualization: viewing the various statistics of the system, etc.

The monitoring display on the home page of the orchestration center is as follows:

3.7.2 Task scheduling center design

The scheduling center mainly interacts with the DB, ZK and executors, and its main functions are as follows:

Job execution logging.

Job status changes.

Invoking executor services to execute Jobs.

Scheduling center high availability.

Job scheduling thread pool.

3.7.3 Task executor design

The executor interacts with ZK and the scheduling center, and its main functions fall into two areas:

Accepting scheduling from the scheduling center, executing scheduled tasks, and returning the results to the scheduling center.

Automatically capturing the Tasks on the executor and submitting them to ZK.

Example of an executor Task:

    @OnlineTask(description = "online task example", enableSerial = true)
    @RequestMapping(value = "/example", method = {RequestMethod.POST}, produces = "application/json;charset=UTF-8")
    @CrossOrigin(methods = {RequestMethod.POST}, origins = "*")
    @ResponseBody
    public String example(@RequestBody String json) {
        /* TODO: client business logic processing */
        Map<String, String> info = new HashMap<String, String>();
        // The result is returned to the scheduling center as JSON
        info.put("status", "success");
        info.put("result", "as you need");
        return JSONHelper.toString(info);
    }

As the example shows, writing a Task is very simple.

3.8 SIA-TASK High availability Design

Distributed services generally need a high availability solution. SIA-TASK strengthens each service component along different dimensions to ensure high availability.

3.8.1 High availability of the task orchestration center

SIA-TASK achieves high availability of the orchestration center through measures such as front-end/back-end separation and service splitting. When one instance in the cluster fails, the other instances are unaffected, so the remaining orchestration centers in the cluster can be used without any special action.

3.8.2 High availability of the task scheduling center

3.8.2.1 Failover on abnormal exit

If an instance node in the scheduling center cluster goes down, all Jobs on that node are migrated smoothly to the available instances in the cluster, so no scheduled execution is missed. When the crashed instance is repaired and reconnects to the cluster, it resumes preempting Jobs and providing service.
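As a rough illustration of such a migration, the sketch below hands each Job of a failed scheduler to whichever surviving scheduler currently owns the fewest Jobs. Everything here is an assumption made for the example: the class JobMigrationSketch is hypothetical, ownership is kept in an in-memory map instead of Zookeeper, and the least-loaded rule is one plausible balancing choice rather than SIA-TASK's documented algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: in-memory bookkeeping instead of Zookeeper.
class JobMigrationSketch {

    // schedulerId -> JobKeys currently owned by that scheduler
    private final Map<String, List<String>> jobsByScheduler = new HashMap<>();

    void register(String schedulerId) {
        jobsByScheduler.putIfAbsent(schedulerId, new ArrayList<>());
    }

    void assign(String schedulerId, String jobKey) {
        jobsByScheduler.get(schedulerId).add(jobKey);
    }

    int loadOf(String schedulerId) {
        List<String> jobs = jobsByScheduler.get(schedulerId);
        return jobs == null ? 0 : jobs.size();
    }

    // Remove the failed scheduler and give each of its Jobs to the least-loaded survivor.
    void onSchedulerDown(String failedId) {
        List<String> orphaned = jobsByScheduler.remove(failedId);
        if (orphaned == null || jobsByScheduler.isEmpty()) {
            return; // nothing to migrate, or no survivor to take the Jobs
        }
        for (String jobKey : orphaned) {
            String target = null;
            for (Map.Entry<String, List<String>> e : jobsByScheduler.entrySet()) {
                if (target == null || e.getValue().size() < jobsByScheduler.get(target).size()) {
                    target = e.getKey();
                }
            }
            jobsByScheduler.get(target).add(jobKey);
        }
    }

    public static void main(String[] args) {
        JobMigrationSketch cluster = new JobMigrationSketch();
        cluster.register("s1");
        cluster.register("s2");
        cluster.assign("s1", "job1");
        cluster.onSchedulerDown("s1");
        System.out.println("s2 load: " + cluster.loadOf("s2"));
    }
}
```

Re-selecting the target for every Job keeps the survivors within one Job of each other, at the cost of a scan per Job, which is acceptable for a cluster with a handful of schedulers.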

3.8.2.2 Thread pool configuration

Scheduling is implemented with a thread pool to avoid the scheduling delays that a single blocked thread would cause. The default number of threads in the pool is 10. If many time-consuming tasks run concurrently, choose the thread pool size according to the characteristics of the business.

    org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
    org.quartz.threadPool.threadCount = 60
    org.quartz.threadPool.threadPriority = 5
    org.quartz.threadPool.threadsInheritContextClassLoaderOfInitializingThread = true

SIA-TASK builds another layer of thread pooling on top of the threadPool that Quartz provides. It redefines the pooling so that each Job is allocated its own thread pool, whose size scales dynamically with the number of Tasks orchestrated in the Job. This keeps the scheduling threads of each Job completely independent, so thread resources are not exhausted when the number of orchestrated Tasks rises sharply. Thread pool reclamation logic is also provided: when a Job is permanently terminated, the thread pool resources allocated to it are reclaimed.

    public static ExecutorService getExecutorService(String JobKey) {
        ExecutorService exec = executorPool.get(JobKey);
        if (exec == null) {
            LOGGER.info(Constants.LOG_PREFIX + "Initialize thread pool for running Jobs,Job is {}", JobKey);
            exec = Executors.newCachedThreadPool();
            // If another thread registered a pool first, putIfAbsent keeps that one
            executorPool.putIfAbsent(JobKey, exec);
            exec = executorPool.get(JobKey);
        }
        return exec;
    }

3.8.2.3 Full log tracing

SIA-TASK tracks the entire scheduling life cycle of a Job and uses AOP to add logging. A log entry is written every time the scheduling center triggers a Job, and the execution of each Task orchestrated in the Job is logged as well.

Logs are divided into Job logs and Task logs:

Job log: contains scheduler information, scheduling time, scheduling status, and other additional properties.

Task log: contains executor information, execution time, execution status, return information, and other additional properties.

3.8.2.4 Asynchronous encapsulation

From the very beginning, SIA-TASK was designed with the scheduling center's consumption of concurrent thread resources during remote task calls in mind. All remote Task calls encapsulated in a Job are asynchronous, and each task request is very lightweight: it can be regarded as a single HTTP request. A Task can set a user-defined timeout in two modes, connecttimeout and readtimeout, so users can choose timeouts according to the actual execution cycle of the business.

    public interface RestTemplate {

        /**
         * Asynchronous Post method
         * @param request
         * @param responseType
         * @param uriVariables
         * @return
         */
        ListenableFuture postAsyncForEntity(Request request, Class responseType, Object... uriVariables);
    }

3.8.2.5 Custom scheduler resource pool

SIA-TASK designs the scheduler resource pool from the perspective of physical resources: schedulers are pooled to handle some special cases, and a scheduler can change state through different operations and thereby change its capabilities.

Work scheduler resource pool: manages scheduler resources that have the ability to acquire tasks and can actually get tasks.

Offline scheduler resource pool: manages scheduler resources that have the ability to acquire tasks but are not actually allowed to do so.

Offline scheduler resource pool: manages scheduler resources from the offline scheduler resource pool that have gone down.

3.8.3 High availability of task executors

Because networks are unstable, SIA-TASK includes an important design for that case: it tests the connectivity of nodes and checks the health of the nodes running Task instances ahead of time, so that unhealthy Task instance nodes are detected in advance and Task scheduling stays highly available.

In addition, the executor's Zookeeper reconnection mechanism was redesigned to handle links broken by network problems. A node running Task instances that loses its link keeps retrying until the connection is restored, and then rejoins the executor pool to receive scheduled tasks as normal.
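A retry loop of this kind can be sketched in a few lines of Java. The example is illustrative only: ReconnectSketch is a hypothetical name, the connection attempt is abstracted as a BooleanSupplier instead of a real Zookeeper client, the doubling delay is an assumed backoff policy, and the attempt limit exists only so the sketch terminates (the text above describes retrying until the link is restored).

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: 'connect' stands in for a real Zookeeper reconnection attempt.
class ReconnectSketch {

    // Retries until 'connect' succeeds or maxAttempts is used up.
    // Returns the number of attempts needed, or -1 if the link was not restored.
    static int reconnect(BooleanSupplier connect, int maxAttempts,
                         long initialDelayMs, long maxDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (connect.getAsBoolean()) {
                return attempt; // link restored: the node can rejoin the executor pool
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return -1;
                }
                delay = Math.min(delay * 2, maxDelayMs); // assumed backoff: double, with a cap
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated network: the third attempt succeeds.
        int used = reconnect(() -> ++calls[0] >= 3, 5, 10, 100);
        System.out.println("attempts used: " + used);
    }
}
```

Capping the delay keeps a long outage from pushing the next attempt arbitrarily far into the future.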

Executors are generally deployed as a cluster as well. The executor is the execution unit of a Task: if execution fails on one machine in the executor cluster, the scheduling center fails over according to the failure policy. Two failover strategies are provided: polling failover and maximum compensation failover. Polling failover polls the list of available executors: the Task succeeds as soon as one executor succeeds, and fails only if every executor fails. Maximum compensation failover first retries several times on the current executor; if a retry succeeds, no transfer takes place, and if execution still fails, the polling failover strategy is applied.

4. Summary

This article has given a brief introduction to the micro-service task scheduling platform SIA-TASK, covering its design background, architecture design, and the functions and features of its components. SIA-TASK largely meets current business needs and provides simple and efficient scheduling services. It will continue to iterate to provide better services, and related technical and usage documentation will be provided later.

Links

Open source address: https://github.com/siaorg/sia-task

Further reading: reliable open source micro-service task scheduling platform (SIA-TASK)

Author: Mao Zhengwei / × × Fei / Liang Xin

Original release: SpringCloud Community
