How to schedule jobs with Flink

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article shares how Flink schedules jobs. The editor finds it very practical, so it is shared here for your reference; follow along to take a look.

Scheduling

Execution resources in Flink are defined through task slots. Each TaskManager has one or more task slots, each of which can run one pipeline of parallel tasks. A pipeline consists of multiple successive tasks, such as the n-th parallel instance of a MapFunction together with the n-th parallel instance of a ReduceFunction. Note that Flink often executes successive tasks concurrently: for streaming programs this always happens, and for batch programs it happens frequently.

The picture below illustrates this. Consider a program with a data source, a MapFunction, and a ReduceFunction. The source and the MapFunction run with a parallelism of 4, while the ReduceFunction runs with a parallelism of 3, forming a pipeline in the sequence Source - Map - Reduce. On a cluster with two TaskManagers, each with three task slots, the program is executed as described below.
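To make the slot arithmetic concrete, here is a minimal sketch (not Flink's actual scheduler): when all operators share the default slot group, one slot can hold one parallel subtask of each operator, so the number of slots a pipeline needs equals the highest operator parallelism.

```python
def slots_needed(parallelisms):
    """Slots required when every operator shares the default slot group:
    one slot hosts one subtask of each operator, so the pipeline needs
    as many slots as the most parallel operator."""
    return max(parallelisms)

# The example program from the text: Source (4), Map (4), Reduce (3).
program = {"Source": 4, "Map": 4, "Reduce": 3}
needed = slots_needed(program.values())
print(needed)  # 4 slots suffice for the whole pipeline

# Two TaskManagers with three slots each provide 6 slots in total,
# so the program fits with two slots to spare.
cluster_slots = 2 * 3
assert needed <= cluster_slots
```

With this packing, three slots each run a full Source - Map - Reduce pipeline, and a fourth slot runs the remaining Source and Map subtasks.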

Internally, Flink uses a SlotSharingGroup to define which tasks may share a slot (permissive), and a CoLocationGroup to define which tasks must be placed strictly in the same slot.
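The distinction between the two groups can be sketched as follows. This is an illustrative model only; the class and function names mirror the concepts, not Flink's internal API.

```python
class SlotSharingGroup:
    """Tasks in the same sharing group MAY be placed into one slot."""
    def __init__(self, name):
        self.name = name

class CoLocationGroup:
    """Subtask i of every task in the group MUST run in the same slot."""
    def __init__(self, name):
        self.name = name

def may_share_slot(task_a, task_b):
    # Permissive rule: sharing is allowed, not forced.
    return task_a["sharing"].name == task_b["sharing"].name

def must_co_locate(task_a, task_b):
    # Strict rule: same co-location group AND same subtask index.
    return (task_a.get("coloc") is not None
            and task_a.get("coloc") is task_b.get("coloc")
            and task_a["subtask"] == task_b["subtask"])

default = SlotSharingGroup("default")
iteration = CoLocationGroup("iteration")
head = {"sharing": default, "coloc": iteration, "subtask": 0}
tail = {"sharing": default, "coloc": iteration, "subtask": 0}
assert may_share_slot(head, tail)  # allowed to share a slot
assert must_co_locate(head, tail)  # and in fact required to
```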

JobManager Data Structures

During job execution, the JobManager keeps track of the distributed tasks, decides when to schedule the next task (or set of tasks), and reacts to finished tasks or execution failures.

The JobManager receives the JobGraph, a representation of the data flow consisting of operators (JobVertex) and intermediate results (IntermediateDataSet). Each operator has properties, such as its parallelism and the code it executes. In addition, the JobGraph carries a set of attached libraries needed to execute the operator code.

The JobManager transforms the JobGraph into an ExecutionGraph. The ExecutionGraph is a parallel version of the JobGraph: for each JobVertex, it contains one ExecutionVertex per parallel subtask. An operator with a parallelism of 100 will have one JobVertex and 100 ExecutionVertices. Each ExecutionVertex tracks the execution state of its particular subtask. All ExecutionVertices of a JobVertex are held in an ExecutionJobVertex, which tracks the status of the operator as a whole. Besides the vertices, the ExecutionGraph also contains the IntermediateResult and the IntermediateResultPartition: the former tracks the state of an IntermediateDataSet, the latter the state of each of its partitions.
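The expansion from one JobVertex into its parallel execution structures can be sketched like this. The dataclass shapes are illustrative stand-ins, not Flink's real classes.

```python
from dataclasses import dataclass, field

@dataclass
class ExecutionVertex:
    """Tracks one parallel subtask of an operator."""
    subtask_index: int
    state: str = "CREATED"

@dataclass
class ExecutionJobVertex:
    """Holds all ExecutionVertices of one JobVertex and the
    overall status of that operator."""
    name: str
    parallelism: int
    subtasks: list = field(default_factory=list)

def expand(job_vertex_name, parallelism):
    # One JobVertex expands into `parallelism` ExecutionVertices.
    ejv = ExecutionJobVertex(job_vertex_name, parallelism)
    ejv.subtasks = [ExecutionVertex(i) for i in range(parallelism)]
    return ejv

ejv = expand("Map", 100)
assert len(ejv.subtasks) == 100  # one JobVertex -> 100 ExecutionVertices
```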

Each ExecutionGraph has a job status associated with it, which represents the current state of the job execution.

A Flink job starts in the created state, transitions to the running state, and, upon completion of all work, transitions to the finished state. In case of failures, the job first switches to the failing state, where all running tasks are cancelled. If all job vertices have reached a final state and the job is not restartable, it transitions to failed. If the job can be restarted, it enters the restarting state; once the restart is complete, it returns to the created state.

If the user cancels the job, it goes into the cancelling state, which also entails cancelling all currently running tasks. Once all running tasks have reached a final state, the job transitions to the cancelled state.

The finished, cancelled, and failed states denote globally terminal states and trigger clean-up of the job, whereas the suspended state is only locally terminal. Locally terminal means that execution of the job has been terminated on its own JobManager, but another JobManager of the Flink cluster can retrieve the job from persistent HA storage and restart it. Consequently, a job that reaches the suspended state is not completely cleaned up.
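The job lifecycle described above can be modeled as a small state machine. This is a toy transition table that mirrors the prose, not Flink's real JobStatus enum.

```python
# Legal transitions between job states, as described in the text.
TRANSITIONS = {
    "created":    {"running", "failing"},
    "running":    {"finished", "failing", "cancelling", "suspended"},
    "failing":    {"failed", "restarting"},
    "restarting": {"created"},
    "cancelling": {"cancelled"},
}
GLOBALLY_TERMINAL = {"finished", "cancelled", "failed"}  # trigger clean-up
LOCALLY_TERMINAL = {"suspended"}  # another JobManager may still resume it

def transition(state, new_state):
    """Move to new_state, rejecting transitions the table does not allow."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

# A failed-and-restarted job ends up in the created state again.
s = "created"
for nxt in ("running", "failing", "restarting", "created"):
    s = transition(s, nxt)
assert s == "created"
assert "suspended" not in GLOBALLY_TERMINAL
```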

During the execution of the ExecutionGraph, each parallel task goes through multiple states, from created to finished or failed. The figure below illustrates these states and the possible transitions between them. A task may be executed multiple times (for example, during failure recovery). For this reason, the execution of an ExecutionVertex is tracked in an Execution: each ExecutionVertex has a current Execution and prior Executions.
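The attempt tracking in the last sentence can be sketched as follows: each vertex keeps its current Execution plus the archived prior ones. Class names mirror the concepts; the method name is illustrative.

```python
class Execution:
    """One execution attempt of a parallel subtask."""
    def __init__(self, attempt):
        self.attempt = attempt
        self.state = "CREATED"

class ExecutionVertex:
    """Tracks a subtask across retries: one current Execution,
    plus all prior attempts."""
    def __init__(self):
        self.current = Execution(attempt=0)
        self.prior = []

    def reset_for_retry(self):
        # On recovery the old attempt is archived and a fresh one starts.
        self.prior.append(self.current)
        self.current = Execution(self.current.attempt + 1)

v = ExecutionVertex()
v.current.state = "FAILED"
v.reset_for_retry()
assert v.current.attempt == 1 and v.current.state == "CREATED"
assert len(v.prior) == 1 and v.prior[0].state == "FAILED"
```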

Thank you for reading! This concludes the article on how Flink schedules jobs. I hope the content above has been of some help and that you have learned something new. If you found the article useful, feel free to share it so more people can see it!
