Author | Wang Siyu (Jiuzhu), Alibaba Cloud technical expert

Join the discussion in the comments section of the Alibaba Cloud Native official account (see the end of this article) for a chance to receive a free copy of the book and have your questions answered by the author!
Concept introduction
The "upgrade" part is easy to understand: an application instance is replaced from an old version to a new one. But what does "in place" mean in the context of Kubernetes?

Let's first look at how native Kubernetes workloads perform a release. Suppose we need to deploy an application whose Pod contains two containers, foo and bar. The foo container is first deployed with the v1 image and needs to be upgraded to the v2 image. What happens?

If the application is deployed with a Deployment, the Deployment triggers its new-version ReplicaSet to create new Pods and delete the old-version Pods during the upgrade.

During this upgrade, the original Pod object is deleted and a new Pod object is created. The new Pod is scheduled onto another Node, is assigned a new IP, and its foo and bar containers pull their images and start again on that Node.
If the application is deployed with a StatefulSet instead, the StatefulSet deletes the old Pod object during the upgrade and, once the deletion completes, creates a new Pod object with the same name.

It is worth noting that although the old and new Pods are both named pod-0, they are actually two completely different Pod objects (the uid changes as well). The StatefulSet waits until the original pod-0 object has been completely removed from the Kubernetes cluster before creating the new pod-0. The new Pod is again scheduled, assigned an IP, and has its images pulled and containers started.
The so-called in-place upgrade mode avoids deleting and recreating the whole Pod object during the upgrade; instead, the image versions of one or more containers are upgraded on the original Pod object.

During an in-place upgrade, only the image field of the foo container in the original Pod object is updated, which triggers the foo container to be upgraded to the new version. Neither the Pod object nor its Node or IP changes, and the other containers in the Pod, such as bar, keep running throughout the upgrade.
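For example, a minimal sketch of what this looks like on the Pod object (the image names here are placeholders that follow the foo/bar example above):

spec:
  containers:
  - name: foo
    image: foo-image:v1   # updated in place to foo-image:v2
  - name: bar
    image: bar-image:v1   # untouched; bar keeps running
# metadata.uid, spec.nodeName, and status.podIP all stay the same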
To summarize: this mode of upgrading, which updates only the versions of one or more containers in a Pod without affecting the Pod object as a whole or the remaining containers, is what Kubernetes calls an in-place upgrade.
Benefit analysis

So why do we need this concept and design of in-place upgrade in Kubernetes?

First, in-place upgrades significantly improve release efficiency. According to rough statistics from Alibaba's environment, an in-place upgrade is at least 80% faster than a full rebuild upgrade. This is easy to understand, because in-place upgrades bring the following optimizations to release efficiency:
- It saves scheduling time: the Pod's placement and resources stay unchanged.
- It saves network allocation time: the Pod keeps its original IP.
- It saves the time to allocate and mount remote disks: the Pod keeps using its original PVs, which are already mounted on the Node.
- It saves most of the image pull time: the application's old image already exists on the Node, so only a few layers of the new image need to be downloaded.
Second, when we upgrade sidecar containers in a Pod (log collection, monitoring, and so on), we do not want to disturb the business container. But in this scenario, a Deployment or StatefulSet upgrade rebuilds the entire Pod, which inevitably affects the business. In contrast, an in-place upgrade scoped to the container level is very controllable: only the containers that need upgrading are rebuilt, and everything else, including the network and mounted disks, is left untouched.

Finally, in-place upgrades also bring stability and determinism to the cluster. When a large number of applications in a Kubernetes cluster trigger rebuild-style Pod upgrades, it can cause large-scale Pod drift and repeated preemption and migration of low-priority tasks on the Nodes. These large-scale Pod rebuilds put heavy pressure on central components such as the apiserver, the scheduler, and network/disk allocation, and the resulting latency of these components in turn feeds a vicious cycle of Pod rebuilding. With in-place upgrades, the whole upgrade process involves only the controller updating the Pod objects and the kubelet rebuilding the corresponding containers.
Technical background
Within Alibaba, most e-commerce applications in the cloud-native environment are released via in-place upgrades, and the set of controllers that support in-place upgrade lives in the OpenKruise open-source project.

In other words, cloud-native applications inside Alibaba are all deployed and managed with the extended workloads in OpenKruise rather than with native Deployment, StatefulSet, and the like.

So how does OpenKruise implement in-place upgrades? Before introducing the implementation principles, let's look at some native Kubernetes features that the in-place upgrade capability depends on:
Background 1: the kubelet's version management of Pod containers

On every Node, the kubelet computes a hash for each container in each Pod's spec.containers on that machine and records it on the container it actually creates.

If we modify the image field of a container in a Pod, the kubelet notices that the container's hash has changed and no longer matches the hash of the container previously created on the machine; it then stops the old container and creates a new one based on the container definition in the latest Pod spec.
This feature is actually the core principle of upgrading in place for a single Pod.
Background 2: restrictions on Pod updates

In the native kube-apiserver, update requests against existing Pod objects go through strict validation logic:
// validate updateable fields:
// 1. spec.containers[*].image
// 2. spec.initContainers[*].image
// 3. spec.activeDeadlineSeconds
Simply put, for a Pod that has already been created, the only fields in the Pod spec that may be modified are the image fields under containers/initContainers and the activeDeadlineSeconds field. Updates to any other field in the Pod spec are rejected by kube-apiserver.
Background 3: containerStatuses reporting

The kubelet reports containerStatuses in pod.status, reflecting the actual running state of all containers in the Pod:
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
status:
  containerStatuses:
  - name: nginx
    image: nginx:mainline
    imageID: docker-pullable://nginx@sha256:2f68b99bc0d6d25d0c56876b924ec20418544ff28e1fb89a4c27679a40da811b
In most cases, spec.containers[x].image and status.containerStatuses[x].image are consistent.

However, cases like the one above also exist, where the image reported by the kubelet is inconsistent with the image in the spec (nginx:latest in spec, but nginx:mainline in status).

This is because the image the kubelet reports is the container's image name as obtained from the CRI API. If multiple image names on the Node correspond to the same imageID, any one of them may be reported:
$ docker images | grep nginx
nginx    latest     2622e6cca7eb   2 days ago   132MB
nginx    mainline   2622e6cca7eb   2 days ago   132MB
Therefore, a mismatch between the image fields in a Pod's spec and status does not necessarily mean that the image version of the container running on the host differs from the expected version.
Background 4: readinessGates control whether a Pod is Ready

Before Kubernetes 1.12, whether a Pod was Ready was determined solely by the kubelet based on container state: if all containers in the Pod were ready, the Pod was Ready.

In practice, however, upper-layer operators or users often need to control whether a Pod is Ready themselves. So since Kubernetes 1.12, a readinessGates feature is provided for this scenario. For example:
apiVersion: v1
kind: Pod
spec:
  readinessGates:
  - conditionType: MyDemo
status:
  conditions:
  - type: MyDemo
    status: "True"
  - type: ContainersReady
    status: "True"
  - type: Ready
    status: "True"
Currently, the kubelet considers a Pod Ready only when two prerequisites are both met:

- All containers in the Pod are ready (that is, the ContainersReady condition is True);
- If one or more conditionTypes are defined in pod.spec.readinessGates, each of these conditionTypes has a corresponding condition with status: "True" in pod.status.conditions.

Only when both prerequisites are satisfied will the kubelet report the Ready condition as True.
Implementation principles

With the above four backgrounds in mind, let's look at how OpenKruise implements in-place upgrades in Kubernetes.
1. How to upgrade a single Pod in place?
As we saw in Background 1, if we modify spec.containers[x] of an existing Pod, the kubelet will detect that the container's hash has changed, stop the corresponding old container, pull the new image, and create and start a new container.

As we saw in Background 2, the only field in spec.containers[x] of an existing Pod that we are currently allowed to modify is image.

This yields the first implementation principle: for an existing Pod object, we can, and can only, modify the spec.containers[x].image field to trigger the corresponding container in the Pod to upgrade to a new image.
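For illustration, a minimal sketch of such an update as a strategic merge patch (the pod name and image names are placeholders); a controller performs this update on the Pod object directly, and the same effect can be reproduced manually with kubectl patch:

# the only change is the foo container's image
spec:
  containers:
  - name: foo
    image: foo-image:v2
# e.g. kubectl patch pod <pod-name> -p '{"spec":{"containers":[{"name":"foo","image":"foo-image:v2"}]}}'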
2. How do we judge whether an in-place upgrade of a Pod has succeeded?

The next question is: after we modify the spec.containers[x].image field in a Pod, how do we know whether the kubelet has successfully rebuilt the container?

As we saw in Background 3, comparing the image fields in spec and status is unreliable, because the status may well report a different image name that exists on the Node (with the same imageID).

This gives the second implementation principle: a relatively reliable way to judge whether an in-place upgrade has succeeded is to record status.containerStatuses[x].imageID before the upgrade. After updating the image in the spec, if we observe that the Pod's status.containerStatuses[x].imageID has changed, we consider the in-place upgrade to have rebuilt the container.

However, this also places a requirement on the images used for in-place upgrades: you cannot upgrade in place to an image whose name (tag) is different but whose imageID is actually the same, otherwise the upgrade will never be judged successful (because the imageID in status will not change).
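As an illustration of this check, a sketch of the relevant status fields before and after an in-place upgrade (the container name and digests are placeholders):

# before updating spec.containers[x].image
status:
  containerStatuses:
  - name: foo
    imageID: docker-pullable://foo-image@sha256:aaaa...   # recorded by the controller
# after the kubelet has rebuilt the container
status:
  containerStatuses:
  - name: foo
    imageID: docker-pullable://foo-image@sha256:bbbb...   # imageID changed, so the in-place upgrade is considered done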
Of course, this can be optimized further. OpenKruise is about to open-source an image pre-warming capability, which deploys a NodeImage Pod on every Node through a DaemonSet. From the NodeImage reports we can learn the imageID corresponding to the image in the pod spec and compare it with the imageID in the pod status, so as to determine precisely whether the in-place upgrade has succeeded.
3. How do we ensure lossless traffic during an in-place upgrade?
In Kubernetes, whether a Pod is Ready represents whether it can serve traffic. Traffic entry points such as Service therefore look at a Pod's Ready state to decide whether the Pod should be added to their endpoints.

As we saw in Background 4, since Kubernetes 1.12, components such as operators/controllers can also control a Pod's availability by declaring readinessGates and updating a custom condition type in pod.status.conditions.

This leads to the third implementation principle: define a conditionType named InPlaceUpdateReady in pod.spec.readinessGates.
When upgrading in place:
- First, set the InPlaceUpdateReady condition in pod.status.conditions to "False". This causes the kubelet to report the Pod as NotReady, and traffic components (such as the endpoints controller) then remove the Pod from the service endpoints.
- Next, update the image in the pod spec to trigger the in-place upgrade.
- After the in-place upgrade finishes, set the InPlaceUpdateReady condition back to "True" so the Pod becomes Ready again.

In addition, between the first two steps, after the Pod turns NotReady it may take some time for traffic components to observe the change via watch and remove the endpoint. So we also provide a graceful in-place upgrade capability: through gracePeriodSeconds you can configure a quiet period between setting the Pod NotReady and actually updating the image to trigger the in-place upgrade.
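To make this concrete, here is a sketch of what the Pod might look like mid-upgrade under this scheme (the condition values are illustrative): the readiness gate keeps the Pod NotReady while InPlaceUpdateReady is False, even though its containers may still be ContainersReady.

spec:
  readinessGates:
  - conditionType: InPlaceUpdateReady
status:
  conditions:
  - type: InPlaceUpdateReady
    status: "False"      # set by the controller before it updates the image
  - type: ContainersReady
    status: "True"
  - type: Ready
    status: "False"      # kubelet reports NotReady because the readiness gate is not satisfied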
4. Combining with release strategies

Like rebuild upgrades, in-place upgrades can be combined with various release strategies (a configuration sketch follows the list below):
- partition: if partition is configured for a grayscale release, only replicas - partition Pods are upgraded in place.
- maxUnavailable: if maxUnavailable is configured, only Pods within the allowed unavailable count are upgraded in place at a time.
- maxSurge: if maxSurge is configured for extra capacity, the existing Pods are still upgraded in place after the additional maxSurge Pods have been created first.
- priority/scatter: if a release priority/scatter policy is configured, Pods are upgraded in place in the order dictated by the policy.
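For illustration, a minimal sketch of how these options are typically expressed on an OpenKruise CloneSet; the field names follow the CloneSet v1alpha1 API as I understand it and the values are placeholders, so treat this as an assumption and consult the OpenKruise documentation for the authoritative schema:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample              # hypothetical name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      readinessGates:
      - conditionType: InPlaceUpdateReady   # gate used for graceful in-place upgrade
      containers:
      - name: foo
        image: foo-image:v2                 # changing only the image allows an in-place upgrade
  updateStrategy:
    type: InPlaceIfPossible                 # upgrade in place when only the image changes
    partition: 3                            # keep 3 Pods on the old version (grayscale)
    maxUnavailable: 1                       # at most 1 Pod unavailable at a time
    inPlaceUpdateStrategy:
      gracePeriodSeconds: 10                # quiet period between NotReady and the image update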
Summary

As described above, OpenKruise combines features provided natively by Kubernetes, such as the kubelet's container version management and readinessGates, to implement in-place upgrade for Pods.

In-place upgrades also bring significant gains in release efficiency and stability, and these gains become more and more obvious as the cluster and application scale grows. It is this in-place upgrade capability that has helped Alibaba's ultra-large-scale application containers migrate smoothly to the Kubernetes-based cloud-native environment over the past two years, something native Deployment/StatefulSet could not have achieved at that scale. (Welcome to join the DingTalk group: 23330762)
- Free book giveaway -

Ask your question in the comments section of the [Alibaba Cloud Native official account] before 12:00 on June 19. The first selected comment will receive a free copy of the book, and we will also invite the author to answer the top 5 questions from the comments!
Course recommendation
So that more developers can enjoy the dividends brought by Serverless, this time we have gathered 10+ technical experts in Alibaba's Serverless field to create an open Serverless course that is well suited for developers getting started, letting you easily embrace the new paradigm of cloud computing: Serverless.

Click to watch the course for free: https://developer.aliyun.com/learning/roadmap/serverless

"Alibaba Cloud Native focuses on microservices, Serverless, containers, Service Mesh, and other technology areas, follows popular cloud-native technology trends and large-scale cloud-native production practices, and aims to be the official account closest to cloud-native developers."