
How to stop a Pod gracefully in Kubernetes


Today I'd like to talk about how to stop a Pod gracefully in Kubernetes. Many people may not know much about this, so I've summarized the following for you, and I hope you get something out of this article.

I used to think that gracefully stopping a Pod was simple: don't you just use a PreStop hook to exit cleanly? Recently, however, I found that PreStop Hook does not meet the requirements very well in many scenarios, so this article takes a brief look at the whole business of "stopping a Pod gracefully".

What is a graceful shutdown?

The term "graceful Graceful shutdown" comes from the operating system. After we perform a shutdown, we all have to OS to complete some cleaning operations, as opposed to a hard stop (Hard shutdown), such as unplugging a power supply.

In a distributed system, a graceful shutdown is not just a matter for the process on one machine; it also has to be coordinated with the other components of the system. For example, suppose we run a micro-service and the gateway routes part of the traffic to us. At this point:

If we kill the process without a word, that part of the traffic cannot be handled correctly and some users are affected. Fortunately, the gateway or service registry usually maintains a heartbeat with our service, and after the heartbeat times out it automatically removes our instance, so the problem eventually resolves itself; this is a hard shutdown. Even if the overall system is well written and heals itself, it still produces some jitter and even errors.

If instead we first tell the gateway or service registry that we are going offline, wait for them to finish removing the service, and only then stop the process, then no traffic is affected; this is a graceful shutdown, which minimizes the impact that a single component's start and stop has on the whole system.

Traditionally, SIGKILL is the hard-termination signal, while SIGTERM is the signal that asks a process to exit gracefully, so many micro-service frameworks listen for SIGTERM and, upon receiving it, perform cleanup such as deregistration in order to exit gracefully.

PreStop Hook

Going back to Kubernetes (K8s for short), when we want to kill a Pod, the ideal situation is of course that K8s removes the Pod from the corresponding Service (if any), sends SIGTERM to the Pod, and lets each container in the Pod exit gracefully. In reality, though, the Pod can misbehave in all sorts of ways:

The process is so stuck that it cannot run its graceful-exit logic, or the logic takes too long to complete.

The graceful-exit logic has a bug and loops forever.

The code is so sloppy that it ignores SIGTERM entirely.

Therefore, the Pod termination process in K8s also has a "maximum tolerable time", the grace period (defined in the Pod's .spec.terminationGracePeriodSeconds field), which defaults to 30 seconds. We can also override the value configured in the Pod by explicitly specifying a grace period with the --grace-period flag when running kubectl delete. Once the grace period expires, K8s has no choice but to SIGKILL the Pod.
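As a minimal illustration (the Pod, container, and image names here are placeholders), the grace period is set in the Pod spec like this:

apiVersion: v1
kind: Pod
metadata:
  name: my-awesome-pod                  # placeholder name
spec:
  terminationGracePeriodSeconds: 60     # allow up to 60s instead of the default 30s
  containers:
  - name: my-awesome-container
    image: my-awesome-image:latest      # placeholder image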

In many scenarios, besides being removed from the K8s Service and exiting gracefully inside the process, the Pod has to do some extra work, such as deregistering from a service registry outside of K8s. That is where PreStop Hook comes in. K8s currently provides two kinds of PreStop Hook, Exec and HTTP; in practice, each container in the Pod is configured separately through the Pod's .spec.containers[].lifecycle.preStop field, for example:

spec:
  containers:
  - name: my-awesome-container
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "/pre-stop.sh"]

Our own cleanup logic can then be written in the /pre-stop.sh script.

Finally, let's string everything together and walk through the whole Pod exit process (the official documentation is more rigorous):

1. The user deletes the Pod.

2.1. The Pod enters the Terminating state.

2.2. At the same time, K8s removes the Pod from the corresponding Service.

2.3. At the same time, for containers that define a PreStop Hook, kubelet calls each container's PreStop Hook; if a PreStop Hook runs longer than the grace period, kubelet sends SIGTERM and waits another 2 seconds.

2.4. At the same time, kubelet sends SIGTERM to containers that do not have a PreStop Hook.

3. Once the grace period expires, kubelet sends SIGKILL to kill any containers that have not yet exited.

This process is fine, but one problem with it is that we can neither predict how long the Pod will take to finish a graceful exit, nor handle a failed "graceful exit" gracefully. In our product, TiDB Operator, that is unacceptable.

Challenges of stateful distributed applications

Why can't we accept this process? In fact, it is usually fine for stateless applications, but the following scenario is a bit more complicated:

TiDB has a core distributed KV storage layer, TiKV. TiKV achieves consistent storage internally with Multi-Raft; the architecture is fairly complex, but here we can simplify it to a one-leader, multiple-follower design: the Leader handles writes and the Followers replicate. Our scenario is planned maintenance on TiKV, such as rolling upgrades and migrating nodes.

In this scenario, although the system can tolerate fewer than half of the nodes going down, for a planned outage we should still try to stop gracefully. The database sits at the very core of the whole architecture and its requirements are extremely strict, so we want to keep jitter as small as possible. To do that, a lot of cleanup has to happen; for example, before taking a node down we need to migrate all of the Leaders on that node to other nodes.

Thanks to the good design of the system, most of the time these operations are fast. But anomalies are the norm in distributed systems, and a graceful exit can take too long or even fail. When that happens, for the sake of business stability and data safety we cannot force the Pod to shut down; instead we should abort the operation and notify an engineer to intervene. At that point, the Pod exit process described above no longer fits.

The careful way: manually control every step

In fact, K8s itself does not offer an out-of-the-box solution here, so in our own Controller (the TiDB object itself is a CRD) we carefully control the start and stop logic of the services in each operational scenario.

Leaving the details aside, the final logic is roughly: before stopping each node, the Controller tells the cluster to carry out the various migration operations needed for that node to go offline; only after they have completed is the node actually taken offline, and then the next node is handled.

If the cluster cannot complete operations such as migration normally, or they take too long, we can also "hold the line" and refuse to forcibly kill the node, which keeps operations like rolling upgrades and node migration safe.

But one problem with this approach is that it is complicated to implement. We have to write a Controller ourselves, put fine-grained control logic into it, and keep checking inside the Controller's control loop whether the Pod can be stopped safely.

Another way: decouple from the Pod deletion control flow

Complex logic is never as maintainable as simple logic, and writing a CRD and a Controller is no small amount of development work. So is there a simpler, more general piece of logic that can satisfy the requirement of "shut down gracefully, or not at all"?

Yes, the solution is ValidatingAdmissionWebhook.

First, a little background. Kubernetes's apiserver has had the AdmissionController design from the very beginning. This design is much like a Filter or Middleware in a web framework: it is a plug-in chain of responsibility, in which each plug-in performs some operation or validation on the requests the apiserver receives. Two example plug-ins:

DefaultStorageClass, which automatically sets a storageClass for PVCs that do not declare one.

ResourceQuota, which checks whether the Pod's resource usage exceeds the quota of its Namespace.

Although this is a plug-in design, before 1.7 all plugins had to be written into the apiserver code and compiled together, which is very inflexible. In 1.7, K8s introduced the Dynamic Admission Control mechanism, which lets users register webhooks with the apiserver; the apiserver then calls an external server through the webhook to run the filtering logic. This feature was later refined further, splitting webhooks into two kinds: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. As the names imply, the former mutates API objects, like DefaultStorageClass in the example above, while the latter validates API objects, like ResourceQuota. After the split, the apiserver can guarantee that all mutations (Mutating) happen before validation (Validating).

Our solution is to use a ValidatingAdmissionWebhook: when an important Pod receives a deletion request, the webhook server asks the cluster to do the cleanup and preparation needed before the node goes offline, and simply rejects the deletion. Then, in order to reach the target state (for example, upgrading to a new version), the control loop keeps reconciling and retrying the deletion, and our webhook keeps rejecting it, until the cluster has finished all the cleanup and preparation.
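A minimal sketch of registering such a webhook is shown below, using the current admissionregistration.k8s.io/v1 API; the webhook name, the Service that exposes the webhook server, the namespace, and the CA bundle are all placeholders that depend on how the server is actually deployed:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-deletion-guard                 # placeholder name
webhooks:
- name: pod-deletion-guard.example.com     # placeholder webhook name
  admissionReviewVersions: ["v1"]
  sideEffects: NoneOnDryRun                # the hook triggers prep work, but not on dry-run
  failurePolicy: Fail                      # if the webhook is unreachable, the deletion is rejected
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["DELETE"]                 # intercept Pod deletion requests
    resources: ["pods"]
  clientConfig:
    service:
      namespace: kube-system               # placeholder namespace
      name: pod-deletion-guard             # placeholder Service name
      path: /validate
    caBundle: "<base64-encoded CA bundle>" # placeholder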

Here is a step-by-step description of the process:

1. The user updates the resource object.

2. The controller-manager watches the object change.

3. The controller-manager starts reconciling the object's state and tries to delete the first Pod.

4. The apiserver calls the external webhook.

5. The webhook server asks the cluster to do the preparation work needed before the tikv-1 node goes offline (this request is idempotent) and checks whether the preparation is complete. If it is, the deletion is allowed; if not, it is rejected, and the whole flow goes back to step 2, driven by the controller-manager's control loop (a sketch of such a rejection response follows this list).
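For illustration, a rejection from the webhook server could look roughly like the following AdmissionReview response (shown as YAML for readability; the uid must echo the uid of the incoming request, and the message text is made up):

apiVersion: admission.k8s.io/v1
kind: AdmissionReview
response:
  uid: "<uid copied from the request>"   # placeholder
  allowed: false                         # reject the Pod deletion for now
  status:
    code: 403
    message: "tikv-1 is still transferring leaders; deletion is postponed"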

Suddenly everything becomes clear. The logic of this webhook is very simple: guarantee that every deletion of a relevant Pod has finished its graceful-exit preparation, no matter how the external control loop behaves. That makes it easy to write and easy to test, and it elegantly satisfies our requirement of "shut down gracefully, or not at all". We are currently considering replacing the old online scheme with this approach.

Postscript

In fact, Dynamic Admission Control has a wide range of uses. For example, Istio uses a MutatingAdmissionWebhook to inject the envoy sidecar container. From the example above we can also see how extensible it is: it often lets you attack a problem from an orthogonal angle, solve it cleanly, and stay well decoupled from other logic.

Of course, Kubernetes has plenty of other extension points, from kubectl to the apiserver, the scheduler, kubelet (device plugins, FlexVolume), custom Controllers, all the way to cluster-level networking (CNI) and storage (CSI): there is room to work almost everywhere. In the past, deploying conventional micro-services, we were not familiar with most of them and never needed them; but now, facing a complex distributed system like TiDB, especially while Kubernetes's support for stateful applications and local storage is still not good enough, it is very interesting to think carefully about each extension point.

After reading the above, do you have a better understanding of how to stop a Pod gracefully in Kubernetes?
