
How to Understand Job and DaemonSet in Application Orchestration and Management


Today we will look at how to understand Job and DaemonSet in application orchestration and management. Many readers may not be familiar with these controllers, so this article summarizes the key concepts; I hope you find it useful.

I. Job: background and source of demand

First, let's look at where the demand for Job comes from. In K8s, the smallest scheduling unit is the Pod, and we could run task processes directly in bare Pods. However, doing so raises the following problems:

How do we ensure that the process in the Pod terminates successfully?

How do we retry when the process fails to run?

How do we manage multiple tasks that have dependencies between them?

How do we run tasks in parallel and manage the queue size of the tasks?

Job: the controller for managing tasks

Let's take a look at what the Kubernetes Job provides for us:

First of all, a Kubernetes Job is a controller for managing tasks. It can create one or more Pods as specified, and it monitors whether they run and terminate successfully.

Based on the status of the Pods, we can set the restart policy and the number of retries for the Job.

Based on dependencies, we can also ensure that one task completes before the next task runs.

It can also control the parallelism of the task: according to the configured parallelism, it maintains the number of Pods running in parallel and the total number of completions.

Use case interpretation

Let's look at an example to see how a Job is used in practice.

Job syntax

The figure above shows the simplest yaml format for a Job. It introduces a new kind called Job, which is the resource type handled by the job-controller. The name in metadata specifies the name of the Job, and spec.template below is actually the spec of a Pod.
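Since the original figure is not reproduced here, a minimal manifest along the lines described might look like this; it follows the classic pi-computing example (the perl image and the bpi command are assumptions based on that standard example):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never   # never restart this Pod; the Job retries by creating new Pods
  backoffLimit: 4            # retry at most 4 times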

The contents are the same as for a plain Pod, except for two additional points:

The first is restartPolicy. A Pod's restart policy can be Never, OnFailure or Always. Use Never if the Pod should not be restarted after it finishes; use OnFailure to rerun it when it fails; Always reruns it in any case. Note that for a Job, only Never and OnFailure are meaningful (and are the only values the API accepts).

In addition, a Job should not retry indefinitely at run time, so we need a parameter to control the number of retries. backoffLimit caps how many times a Job can be retried.

So in a Job, our main concerns are the restartPolicy restart strategy and the backoffLimit retry cap.

Job status

After the Job is created, we can run kubectl get jobs to see its current status. The output basically shows the name of the Job, how many Pods have completed, and how long it has been running.

AGE means the time elapsed since the Pod was created, i.e. the current time minus the creation time; it tells you how long ago the Pod was created. DURATION shows how long the actual business in the Job has been running, which is very useful when tuning performance. COMPLETIONS shows how many Pods the task has in total and how many of them have completed.
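For reference, the output of kubectl get jobs for the pi Job above might look like this (the exact durations are illustrative):

NAME   COMPLETIONS   DURATION   AGE
pi     1/1           2m5s       3m10s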

View Pod

Now let's look at the Pods. The actual execution unit of a Job is the Pod. The Job we just created produces a Pod called "pi" whose task is to compute pi, and the Pod's name follows the pattern "${job-name}-${random-suffix}". We can look at the Pod's yaml format below.

Compared with an ordinary Pod, it has one extra field, ownerReferences, which declares which upper-level controller manages the Pod. Here the ownerReferences points to a Job in batch/v1, that is, the Job above. The Pod thus declares who its controller is, so from the Pod we can find its controller, and from the Job we can find which Pods it owns.
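For illustration, the ownerReferences block in the Pod's metadata would look roughly like this (the name suffix and uid are placeholders):

metadata:
  name: pi-xxxxx                 # ${job-name}-${random-suffix}
  ownerReferences:
  - apiVersion: batch/v1
    kind: Job                    # the upper-level controller is the Job
    name: pi
    uid: 00000000-0000-0000-0000-000000000000
    controller: true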

Running Job in parallel

Sometimes we have requirements like this: we want a Job to maximize parallelism, producing n Pods to execute quickly. At the same time, because the number of nodes is limited, we may not want too many Pods running in parallel at once. Handling this pipeline-like notion of a bounded maximum parallelism is something the Job controller can do for us.

Here we mainly look at two parameters: one is completions and the other is parallelism.

The first parameter, completions, specifies how many times this Pod queue is executed; you can think of it as the total number of runs specified by the Job. For example, if it is set to 8, the task will be executed 8 times in total.

The second parameter, parallelism, represents the number of parallel executions, which is effectively the size of the buffer queue in the pipeline. If it is set to 2, the Job still executes 8 times, but with 2 Pods at a time, so 4 batches run in total.
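A sketch of such a parallel Job; the name and the sleeping workload are assumptions chosen to match the timings described below:

apiVersion: batch/v1
kind: Job
metadata:
  name: paral-1
spec:
  completions: 8    # run the task 8 times in total
  parallelism: 2    # at most 2 Pods at a time, i.e. 4 batches
  template:
    spec:
      containers:
      - name: param
        image: ubuntu
        command: ["/bin/bash", "-c", "sleep 30; date"]
      restartPolicy: OnFailure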

View parallel Job running

Let's look at the actual running effect. The figure above shows the Job as a whole: the name of the job, that a total of 8 Pods were created, and that 2 minutes and 23 seconds have passed since creation.

Now let's look at the actual Pods. There are eight in total, and the status of each is Completed. Looking at their AGE values from the bottom up, we see 73s, 40s, 110s and 2m26s; the Pods come in pairs with identical ages, meaning the 40s pair was created last and the 2m26s pair first. In other words, two Pods were always created at the same time, ran in parallel, finished, and then the next two were created and run.

This is the parallelism parameter at work: it controls the number of parallel executions of the Job, and here we can see its role as the buffer or pipeline queue size.

CronJob syntax

Let's introduce another kind of Job, CronJob, which can also be called a scheduled Job. A CronJob is basically similar to a Job, except that it runs on a schedule. It is especially suitable for periodic work such as cleanup tasks at night, or tasks that must run every few minutes or every few hours.

Compared with Job, a scheduled task has several additional fields:

schedule: this field sets the schedule, and its time format is the same as a Linux crontab, so you can write it exactly as you would a crontab entry. For example, */1 * * * * means the Job executes every minute. In this example, the Job prints the current time and then the line "Hello from the Kubernetes cluster".

startingDeadlineSeconds: how long the CronJob will wait for a scheduled Job to start. If a run misses its scheduled time by more than this many seconds, the CronJob skips that run.

concurrencyPolicy: whether concurrent runs are allowed. Suppose the Job executes every minute but takes two minutes to run, so the next run is due before the previous one has finished. If the policy is Allow, a new Job starts every minute regardless of whether the previous one has finished; if it is Forbid, the CronJob waits for the previous Job to finish before running the next one.

successfulJobsHistoryLimit (and failedJobsHistoryLimit): each run of a CronJob leaves behind a finished Job whose history and timing can be inspected. The amount kept cannot be unlimited, so you set how many historical Jobs to retain; a default of around 10 or 100 is common, depending on the size of your cluster.
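Putting these fields together, a CronJob along the lines described might look like this (the image, history limit and apiVersion assume a reasonably recent cluster):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"        # run every minute, Linux crontab format
  startingDeadlineSeconds: 10    # skip a run that cannot start within 10s of its schedule
  concurrencyPolicy: Allow       # allow overlapping runs; Forbid would serialize them
  successfulJobsHistoryLimit: 3  # keep the 3 most recent successful Jobs
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["/bin/sh", "-c", "date; echo Hello from the Kubernetes cluster"]
          restartPolicy: OnFailure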

Operation demonstration: the Job orchestration file

Let's take a look at how to use Job.

Creation and running verification of a Job

First take a look at job.yaml. This is the very simple task of computing pi. Submit it with kubectl create -f job.yaml. Then kubectl get jobs shows the job running, and kubectl get pods shows the Pod running; finally, fetch the logs of the Job's Pod, and you can see pi printed, as in the picture below.
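The sequence of commands used in the demo, roughly (the Pod name suffix is random):

kubectl create -f job.yaml
kubectl get jobs                 # shows the pi job and its completion status
kubectl get pods                 # shows the pod named pi-<random-suffix>
kubectl logs pi-<random-suffix>  # prints the computed value of pi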

Orchestration file of parallel Job

Let's look at the second example:

Creation and running Verification of parallel Job

In this example, after the parallel Job is created, you can see the parallel Job listed.

There are now two Pods in Running state, and you can see they have been executing for almost 30s.

After about another 30 seconds, the second batch should start.

The first batch of Pods has completed and the second batch is now running, with two Pods in each batch. That is, every 40s or so, two Pods execute in parallel; there will be 4 batches and 8 Pods in total, and when all the Pods have finished executing, that is the buffer-queue behavior of parallel execution just mentioned.

Looking at the Pods again after a while, you can see that the second batch has finished and the third batch has been created.
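A snapshot of kubectl get pods at this point might look roughly like this (names and ages are illustrative), with the earlier batches Completed and the current batch Running:

NAME            READY   STATUS      RESTARTS   AGE
paral-1-4gwhp   0/1     Completed   0          80s
paral-1-jw9hx   0/1     Completed   0          80s
paral-1-n2d6k   0/1     Completed   0          40s
paral-1-x7q4z   0/1     Completed   0          40s
paral-1-zl5mm   1/1     Running     0          5s
paral-1-qv8rt   1/1     Running     0          5s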

CronJob orchestration file

Let's look at the third example: CronJob. This CronJob executes every minute, one Job at a time.

Creation and running verification of a CronJob

The following figure shows that the CronJob has been created, and you can see there is currently one CronJob via kubectl get cronjob. Now let's look at the jobs; since it executes every minute, we have to wait a little while.

Meanwhile, you can see that the previous parallel Job is still running; its duration is about 2m12s, and its completion count went from 6/8 to 7/8 and then to 8/8. That is, our previous task performed its final step, running two Pods at a time. Running two jobs at a time makes it particularly convenient to run large workflows or batches of work tasks.

In the picture above, a new job suddenly appears: the "hello-xxxx" job comes from the CronJob just mentioned. One minute has passed since the CronJob was submitted, so it automatically created a Job. If you do not intervene, it will create such a Job every minute, unless we specify that it should no longer run.
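Illustratively, the commands and output at this point (values approximate):

kubectl get cronjob
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        30s             1m

kubectl get jobs
NAME         COMPLETIONS   DURATION   AGE
hello-xxxx   1/1           6s         30s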

CronJob is mainly used to run cleanup tasks or other scheduled work; it works particularly well for tasks such as Jenkins builds.

Architecture design: Job management mode

Let's look at the architectural design of Job. The Job Controller is mainly responsible for creating the corresponding Pods; it tracks the status of the Job and, according to the configuration we submitted, retries or continues creating Pods in time. As mentioned, each Pod carries a label that ties it to its Job Controller, and the controller is configured to create Pods in parallel or serially.

Job controller

The figure above shows the main flow of the Job controller. All Jobs are handled by this controller, which watches the API Server. Every time we submit a Job, its yaml is written to etcd through the api-server; the Job Controller registers several handlers, and whenever there is an add, update or delete event, it is sent to the controller through a memory-level message queue.

The Job Controller checks whether there are running Pods: if not, it scales up by creating Pods; if there are more than the desired number, it scales down; and if a Pod changes, it updates the Pod's state in time.

It also checks whether the Job is parallel or serial and, according to the configured parallelism and completions, creates the right number of Pods in time. Finally, it updates the overall Job status to the API Server so that we can see the final result.

II. DaemonSet: background and source of demand

The second controller is DaemonSet. Ask the same question: what would happen if we did not have DaemonSet? Consider a few requirements:

First, what if we want every node to run the same Pod?

What if we want a new node to be noticed immediately when it joins the cluster, with a Pod deployed on it to help initialize something?

What if we want the corresponding Pod to be deleted when a node leaves the cluster?

What if a Pod's status becomes abnormal and we need to detect the node problem in time and take monitoring or reporting actions? Which controller handles all of these things?

DaemonSet: daemon controller

DaemonSet is another controller that Kubernetes provides by default. It acts as a daemon controller and can do the following things for us:

First, it ensures that every node in the cluster runs the same set of Pods.

When a new node joins, it automatically creates the corresponding Pod on it according to the node's status.

When a node is removed, it deletes the corresponding Pod.

It tracks the status of each Pod, and when a Pod crashes or becomes abnormal, it restores it in time.

Use case interpretation: DaemonSet syntax

Let's look at an example; a DaemonSet yaml is a little longer.

The first thing to note is kind: DaemonSet. If you have learned Deployment before, this yaml will look familiar. It has a matchLabels selector, through which the controller manages the corresponding Pods: the Pod's labels must match the DaemonSet controller's selector, so the controller can find the Pods it manages via label.selector. Everything in spec.template.spec.containers below is the same as in a Pod. Here we use fluentd as an example (a representative manifest is sketched after the list below). The most common use cases for DaemonSet are the following:

First, storage systems such as GlusterFS or Ceph, which need to run an agent-like process on every node; DaemonSet meets this need very well.

Log collection tools such as logstash or fluentd have the same requirement: each node runs an agent so that its state can be collected and the information on each node reported in a timely manner.

Node monitoring is another case: every node needs to run the same monitoring component, such as Prometheus's node exporter, which also relies on DaemonSet.
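As promised above, a representative fluentd DaemonSet manifest, modeled on the standard example (the image tag and resource figures are illustrative):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch   # must match the Pod template's labels
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      containers:
      - name: fluentd-elasticsearch
        image: fluent/fluentd-kubernetes-daemonset:v1
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi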

View DaemonSet status

After creating the DaemonSet, we can run kubectl get daemonset (abbreviated ds). The output is very similar to a Deployment's: how many Pods are desired, how many are current, and how many are READY. Of course, READY counts only Pods, so ultimately all the Pods it creates are reflected there.

The columns are: the number of Pods desired, the number currently created, the number ready, the number up-to-date and available (those passing health checks), and NODE SELECTOR. NODE SELECTOR is very useful in a DaemonSet, because sometimes we want only some nodes, not all, to run the Pod: if certain nodes are labeled, the DaemonSet runs only on those nodes. For example, if I want only master nodes, or only worker nodes, to run certain Pods, I can use NODE SELECTOR.
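A sample of kubectl get ds for the fluentd example on a four-node cluster (values illustrative):

NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
fluentd-elasticsearch   4         4         4       4            4           <none>          30s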

Update DaemonSet

DaemonSet is very similar to Deployment in that it also has two update strategies: RollingUpdate and OnDelete.

RollingUpdate is easy to understand: Pods are updated one by one. The first Pod is updated and the old one removed; after it passes its health check, the second Pod is updated, and so on, so the business is upgraded smoothly without interruption.

OnDelete is also a useful update strategy: after the template is updated, no Pod changes until we act manually. If we delete a node's Pod, it is rebuilt with the new template; if we do not delete it, it is not rebuilt. This is particularly good for special cases that we need to control by hand.
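The strategy is set in the DaemonSet spec; a minimal sketch (the maxUnavailable value is illustrative):

spec:
  updateStrategy:
    type: RollingUpdate     # or OnDelete for manual control
    rollingUpdate:
      maxUnavailable: 1     # update one Pod at a time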

Operation demonstration: DaemonSet orchestration file

Here is an example: we change the DaemonSet's image and watch its status as it updates Pods one by one.

The picture above is the DaemonSet yaml from before, with a little more added: some resource limits, which do not affect the demonstration.

Verification of the creation and running of DaemonSet

Let's create the DaemonSet and then look at its status. The picture below shows the ready status of the DaemonSet we just described.

As the figure below shows, four Pods have been created in total. Why four? Because the cluster has four nodes, and one corresponding Pod runs on each node.

Updates to DaemonSet

Now let's update the DaemonSet. After executing kubectl apply -f, the DaemonSet is updated; next, let's check its update status.
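The update step, roughly (the file name is assumed):

kubectl apply -f daemonset.yaml
kubectl rollout status ds/fluentd-elasticsearch   # watch the rolling update progress
kubectl get ds                                    # UP-TO-DATE climbs from 0 toward 4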

In the resulting status output, DaemonSet defaults to RollingUpdate. UP-TO-DATE goes from 0 of 4 to 1 of 4: the first Pod is being updated, then the second, then the third; that is RollingUpdate. It updates automatically, one Pod at a time, without anyone on duty, and the process is relatively smooth, which helps us do in-place releases and other operations.

At the end of the image above, you can see that the rolling update of the entire DaemonSet has completed.

Architecture design: DaemonSet management mode

Next, look at the DaemonSet architecture. DaemonSet is also a controller, and its real unit of work is the Pod. It is very similar to the Job controller: it watches the state of the API Server and adds Pods in a timely manner. The difference is that it also monitors node status, creating the corresponding Pod when a node joins and removing it when the node disappears, and it selects nodes according to the affinity or labels you configure.

DaemonSet controller

Finally, let's look at the DaemonSet controller. It does much the same thing as the Job controller: both watch state via the API Server. The only difference is that the DaemonSet controller also watches node state, which itself reaches etcd through the API Server.

When a node's status changes, the event is delivered through an in-memory message queue, and the DaemonSet controller checks whether each node has the corresponding Pod, creating one if not. If a Pod exists, it compares versions and then applies the strategy mentioned above: with RollingUpdate, the Pod is recreated; with OnDelete, it waits for the Pod to be deleted and then checks whether to update it or create the corresponding Pod.

Finally, when all updates have finished, it updates the overall DaemonSet status to the API Server, completing the update.

After reading the above, do you have a better understanding of Job and DaemonSet in application orchestration and management? If you want to learn more, please follow our industry information channel. Thank you for your support.
