How to choose a web distributed task scheduling framework

2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to choose a distributed task scheduling framework for web services: the problems that motivate one, the principles used to evaluate candidates, and the frameworks that were considered.

1. Background

Scheduled tasks are an unavoidable part of day-to-day development. An e-commerce system may send birthday coupons to users on a schedule, and a reconciliation system may run reconciliation on a schedule. In the past, each service often ran on a single machine, and a Timer schedule running directly on that machine was basically enough. As traffic grew, a single machine could no longer keep up, and a service might need 10, 20, or more machines to carry the business. This is what we call scaling out. The question is what happens if every one of those machines still runs its own Timer schedule. In the e-commerce example, a user could receive many birthday coupons, which costs the company money. We therefore need a way to make sure a scheduled task runs only once across multiple machines.
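To make the problem concrete, here is a minimal sketch that simulates several instances of the same service, each with its own local timer. The class and method names are hypothetical, and the "machines" are only threads inside one JVM, so this illustrates the effect rather than a real deployment.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DuplicateTimerDemo {

    // Simulates `machines` service instances. Each instance schedules the same
    // coupon job on its own local timer. Returns how many coupons were sent.
    static int couponsSent(int machines) {
        AtomicInteger sent = new AtomicInteger();
        ScheduledExecutorService[] timers = new ScheduledExecutorService[machines];
        for (int i = 0; i < machines; i++) {
            timers[i] = Executors.newSingleThreadScheduledExecutor();
            // Every instance fires independently; nothing coordinates them.
            timers[i].schedule(() -> { sent.incrementAndGet(); }, 10, TimeUnit.MILLISECONDS);
        }
        try {
            for (ScheduledExecutorService timer : timers) {
                // By default, delayed tasks that are already scheduled still run after shutdown().
                timer.shutdown();
                timer.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for timers", e);
        }
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println("coupons sent with 1 machine:  " + couponsSent(1));
        System.out.println("coupons sent with 3 machines: " + couponsSent(3));
    }
}
```

The number of coupons sent equals the number of machines, which is exactly the duplication that the rest of this article tries to avoid.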

Before you knew about or used a distributed task scheduling framework, how did you run scheduled tasks? In a Spring project, everyone knows Spring's scheduling support: adding the @Scheduled annotation to a bean method is enough to create a scheduled task. The annotation alone, however, does nothing to prevent the task from being executed multiple times when several instances are running, so some other safeguard is needed. For Spring-based projects, the usual options are the following:

A single machine: host the less important scheduled tasks in a dedicated service and run it on one machine. Even if it goes down, the business is not affected as long as it is restored within an acceptable time.

Multiple machines plus a distributed lock: each instance first tries to acquire a distributed lock before executing the task. If acquisition fails, another instance is already running the task; if it succeeds, this instance executes it.

Multiple machines with ZooKeeper: run scheduled tasks only on the Leader machine. Many businesses already use ZK, so when a task fires, each instance checks whether it is the Leader. If not, it skips the run; if so, it executes the business logic. This also achieves our goal.
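The distributed lock approach can be sketched roughly as follows. This is only an illustration under stated assumptions: InMemoryLockStore is a hypothetical stand-in for a shared lock store (in practice something outside the JVM, such as a database or a cache with expiry), the run key format is made up, and the instances are simulated by a loop rather than by separate machines.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

final class LockGuardedJobDemo {

    // Hypothetical stand-in for a shared lock store. Here it is just a map in one JVM.
    static final class InMemoryLockStore {
        private final ConcurrentHashMap<String, Instant> expiryByKey = new ConcurrentHashMap<>();

        // Returns true only if no unexpired lock exists for the key at time `now`.
        boolean tryAcquire(String key, Duration ttl, Instant now) {
            boolean[] acquired = {false};
            // compute() runs atomically per key, so only one caller can win.
            expiryByKey.compute(key, (k, currentExpiry) -> {
                if (currentExpiry == null || !currentExpiry.isAfter(now)) {
                    acquired[0] = true;
                    return now.plus(ttl);
                }
                return currentExpiry;
            });
            return acquired[0];
        }
    }

    // Every instance fires its local timer for the same run and tries to take the lock.
    // Returns how many instances actually executed the business logic.
    static int executions(int instances, InMemoryLockStore store, String runKey, Instant now) {
        int executed = 0;
        for (int i = 0; i < instances; i++) {
            if (store.tryAcquire(runKey, Duration.ofMinutes(10), now)) {
                executed++; // business logic would run here
            }
            // Otherwise another instance already owns this run, so this one skips it.
        }
        return executed;
    }

    public static void main(String[] args) {
        InMemoryLockStore store = new InMemoryLockStore();
        Instant now = Instant.now();
        System.out.println("executions across 5 instances: "
                + executions(5, store, "birthday-coupons:run-1", now));
    }
}
```

Including the run in the lock key and giving the lock an expiry means a crashed instance cannot block later runs forever. The Leader approach has the same shape: the tryAcquire check is replaced by an "am I the Leader" check.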

At present, our company also uses the three methods above for scheduled tasks. They were basically sufficient in the early days of the business, but over time we ran into more and more problems. Let me share them:

First, the single-machine approach. Deciding which tasks are "less important" is already complicated, because everyone may say that their own business is important. Second, if the single machine goes down, whether from a crash or from some other cause, how do we make sure it recovers within an acceptable time? These are all difficulties.

When we want a scheduled task to run immediately, we may need to write an extra REST interface or a separate Job.

Changing the execution time of a scheduled task is also costly. For example, to change a task from once every 12 hours to once every 6 hours, we have to modify the code, submit a PR, and then package and release it. A small change takes a lot of time.

Scheduled tasks cannot be paused. Suppose a task sends periodic alarms and the alarms suddenly become too many; we need to pause the task to stop them. We could use a switch in a distributed configuration system and have the task check whether the switch is on before doing its work. That is easy to do, but it adds logic that has nothing to do with the task itself.

Scheduled tasks lack monitoring, so developers do not find out when a task fails. Some will say that there is an Error log, but if every Error log raised an alarm, could your service stand it? Generally an alarm fires only after several consecutive errors, and because scheduled tasks run periodically, they rarely produce consecutive errors.

There are other, smaller problems that are not listed here. If you have worked this way, you will have run into them yourself.
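To give a feel for the hand-written plumbing these problems lead to, here is a minimal sketch of a job wrapper with a pause switch, a "run now" entry point, and a consecutive-failure alarm. All names are hypothetical. In a real system the switch would live in a distributed configuration service and the alarm would notify someone; here both are plain in-memory fields.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class ManagedJobDemo {

    static final class ManagedJob {
        private final Runnable body;
        private final int alarmThreshold;
        private final AtomicBoolean paused = new AtomicBoolean(false);
        private final AtomicInteger consecutiveFailures = new AtomicInteger();
        private final AtomicInteger alarmsRaised = new AtomicInteger();

        ManagedJob(Runnable body, int alarmThreshold) {
            this.body = body;
            this.alarmThreshold = alarmThreshold;
        }

        void pause() { paused.set(true); }

        void resume() { paused.set(false); }

        int alarmsRaised() { return alarmsRaised.get(); }

        // Called by the timer and by a manual "run now" action alike.
        // Returns true only if the body ran and completed without an exception.
        boolean runOnce() {
            if (paused.get()) {
                return false; // switch is off: skip this run
            }
            try {
                body.run();
                consecutiveFailures.set(0);
                return true;
            } catch (RuntimeException e) {
                // Raise an alarm on every failure once the streak reaches the threshold.
                if (consecutiveFailures.incrementAndGet() >= alarmThreshold) {
                    alarmsRaised.incrementAndGet();
                }
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ManagedJob job = new ManagedJob(() -> { throw new IllegalStateException("boom"); }, 2);
        job.runOnce();
        job.runOnce();
        System.out.println("alarms after two failures: " + job.alarmsRaised());
        job.pause();
        System.out.println("ran while paused: " + job.runOnce());
    }
}
```

None of this code has anything to do with the business task itself, and every team ends up repeating it, which is the argument for moving these concerns into a scheduling framework.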

2. Basic principles of the evaluation

The first section explained why we wanted to introduce a framework. Whatever you introduce or improve, you need a reason, because everything has a cost. I often see very small projects introduce message queues, distributed transactions, and so on, which is putting the cart before the horse. For example, a blog system might add a message queue for peak shaving and end up slower than plain synchronous calls.

Once we have a reason, we can start the research or the technical design. Here are the basic principles I use when evaluating a framework. If you need to evaluate a framework in the future, you can apply them too.

Simple: easy for developers to integrate and easy for users to operate.

Rich documentation: many open source projects have very little documentation, and some have English documentation only. If your English is not strong, you may need to take Chinese documentation into account.

A management interface, which makes operations and statistics easy.

Support for mainstream frameworks such as Spring and Spring Boot. At a minimum, it should support the mainstream frameworks used in your business.

Lightweight, so that it is convenient to customize for your own needs.

High performance, high reliability, high availability: do not let the framework become a bottleneck for the business.

Code update frequency and community adoption: the more companies use it, the more popular it is, and more frequent updates suggest fewer lingering problems. Ideally it is open source and maintained by a large company.

Multi-language support: if your company uses several development languages and needs one scheduling framework for all of them, the framework must support multiple languages. In the RPC world, for example, the representative multi-language framework is Thrift.

Whether it solves your current pain points: this is the most important principle. What is the point of adopting a framework that cannot solve your problem?

With these principles in hand, we can start the survey.

3. Frameworks surveyed

3.1 TBSchedule

When surveying Java frameworks, a good first step is to check whether Alibaba has open-sourced something, since Alibaba has done very good open source work in recent years. An online search shows that Alibaba open-sourced a scheduling framework called TBSchedule in 2012. Searching for the code now, however, shows that the project has gone cold and the code has been cleaned out. There is a personal fork that continues to be maintained, but it is not covered here because it has very few users. Github address: https://github.com/taobao/TBSchedule

3.2 elastic-job

Elastic-Job is a distributed scheduling solution open-sourced by Dangdang. It consists of two independent sub-projects, Elastic-Job-Lite and Elastic-Job-Cloud. Elastic-Job-Lite is positioned as a lightweight, decentralized solution that provides coordination services for distributed tasks in the form of a jar package. It supports distributed scheduling coordination, elastic scaling out and in, failover, re-triggering of missed job executions, parallel scheduling, and self-diagnosis and repair.

This framework was very popular about 2 years ago. Many companies used it and many people have heard of it. Unfortunately it is no longer maintained and the code has not been updated for 2 years, which violates the update-frequency principle. If something goes wrong, there may be no one to help, so we do not recommend it. Github address: https://github.com/elasticjob/elastic-job-lite

3.3 Some niche projects

There are also some niche projects with relatively few GitHub stars that are rarely updated, such as Uncode-Schedule, LTS, and openCron. They do not meet our principles and are not considered.

3.4 XXL-JOB

Distributed scheduled tasks have no flagship projects at foundations such as CNCF or Apache, so the choice is arguably less difficult. Compare message queues, where Apache alone hosts Kafka, RocketMQ, Pulsar, and others, each with a large community, which makes choosing hard. That leaves us with basically two options. One is to build in-house. A task scheduling framework is far less difficult to develop than a message queue, so many companies have in fact built their own, such as Meituan's Crane. For complex middleware such as message queues, companies tend to choose secondary development instead: Meituan's Mafka is built on Kafka, and Didi's DDMQ is built on RocketMQ. We clearly do not have the resources to build in-house right now, so here too we take the approach of secondary development on an existing framework.

The other option is XXL-Job: http://www.xuxueli.com/xxl-job. It basically meets our principles: the code is still being updated, the author actively responds to issues, and more than 200 companies use it. Checked against the principles reviewed earlier, it meets the others as well. In general, when you decide on a framework, list its advantages in detail so that others can be convinced.

XXL-JOB has the following features:

Simple: tasks can be created, read, updated, and deleted through a web page. It is easy to operate and takes about a minute to get started.

Dynamic: task status can be modified dynamically, tasks can be started and stopped, and running tasks can be terminated, all taking effect immediately.

Scheduling center HA (centralized): scheduling uses a centralized design. The scheduling center uses a self-developed scheduling component and supports cluster deployment, which ensures scheduling center HA.

Executor HA (distributed): tasks are executed in a distributed fashion. The task "executor" supports cluster deployment, which ensures task execution HA.

Registry: executors register periodically and automatically, and the scheduling center automatically discovers registered tasks and triggers their execution. Manual entry of executor addresses is also supported.

Elastic scaling out and in: when an executor machine comes online or goes offline, tasks are reassigned at the next scheduling.

Routing strategy: for executor cluster deployments, a rich set of routing strategies is provided, including first, last, round robin, random, consistent hash, least frequently used, least recently used, failover, and busy-over.

Failover: when the task routing strategy is "failover" and a machine in the executor cluster fails, scheduling requests are automatically switched to a healthy executor.

Blocking strategy: what to do when scheduling is too frequent for the executor to keep up. The options are single-machine serial (default), discard subsequent scheduling, and override earlier scheduling.

Event trigger: in addition to "Cron mode" and "task dependency mode", tasks can be triggered by events. The scheduling center provides an API service that triggers a single execution of a task, so tasks can be triggered flexibly in response to business events.

Task progress monitoring: task progress can be monitored in real time.

Rolling real-time log: scheduling results can be viewed online, and the complete execution log output by the executor can be viewed in real time in rolling mode.
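To illustrate what the routing strategies in the feature list mean, here is a minimal sketch of a few of them in plain Java. This is not XXL-JOB's code or API. The Router interface, the method names, and the use of String.hashCode() for the hash ring are assumptions made for illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

final class RoutingDemo {

    // Hypothetical abstraction: pick one executor address for a job.
    interface Router {
        String route(String jobId, List<String> executors);
    }

    static final Router FIRST = (jobId, executors) -> executors.get(0);

    static final Router LAST = (jobId, executors) -> executors.get(executors.size() - 1);

    // Round robin: cycle through the executors in order.
    static Router roundRobin() {
        AtomicInteger next = new AtomicInteger();
        return (jobId, executors) ->
                executors.get(Math.floorMod(next.getAndIncrement(), executors.size()));
    }

    // Consistent hash: place virtual nodes for each executor on a ring and pick the
    // first node at or after the job's hash, wrapping around at the end of the ring.
    static Router consistentHash(int virtualNodes) {
        return (jobId, executors) -> {
            TreeMap<Integer, String> ring = new TreeMap<>();
            for (String executor : executors) {
                for (int v = 0; v < virtualNodes; v++) {
                    ring.put((executor + "#" + v).hashCode(), executor);
                }
            }
            Map.Entry<Integer, String> entry = ring.ceilingEntry(jobId.hashCode());
            return (entry != null ? entry : ring.firstEntry()).getValue();
        };
    }

    // Failover: use the first executor that passes the health check.
    static Router failover(Predicate<String> isAlive) {
        return (jobId, executors) -> executors.stream()
                .filter(isAlive)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no live executor"));
    }

    public static void main(String[] args) {
        List<String> executors = List.of("10.0.0.1:9999", "10.0.0.2:9999", "10.0.0.3:9999");
        Router rr = roundRobin();
        System.out.println("round robin: " + rr.route("job-1", executors)
                + ", " + rr.route("job-1", executors));
        System.out.println("consistent hash: " + consistentHash(16).route("job-1", executors));
        System.out.println("failover: "
                + failover(e -> !e.startsWith("10.0.0.1")).route("job-1", executors));
    }
}
```

With consistent hashing, the same job keeps landing on the same executor as long as the executor list does not change, while failover skips machines that fail the health check.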
