How OpenKruise delivers the first large-scale image preheating capability in the K8s community

2025-07-13 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)05/31 Report--

This article explains in detail how OpenKruise provides the first large-scale image preheating capability in the K8s community.

Background: why image preheating is necessary

The "image" is one of the major innovations that Docker brought to the container field. Before Docker, Linux already provided cgroup isolation, and Alibaba had been gradually containerizing on LXC since 2011, but nothing like the image existed to package the runtime environment. Images bring many benefits, but in real scenarios pulling them also causes a variety of problems, the most common of which is the time an image pull takes.

In the past we have heard many users describe what they expect from containerization: "extreme elasticity", "scale-out in seconds", "efficient release", and so on. However, the standard Pod creation process in Kubernetes still falls short of these expectations (assuming the Pod contains a sidecar container and an app container).

Normally, scheduling, allocating and mounting remote disks, and allocating the network take little time in small clusters. In large clusters these steps need some optimization, but they remain within a controllable range. Pulling images, however, is particularly tricky in large-scale elastic clusters: even with P2P and other optimizations, pulling a large business image can take a long time, which falls short of the scale-out and release speed that users expect.

If the sidecar image and the base image of the business container are pulled onto the node in advance, the Pod creation process can be shortened greatly, and the time spent pulling images can be reduced by more than 70%.

Kubernetes itself provides no image-oriented operations, and the Kubernetes ecosystem has had no reasonably mature product for large-scale image preheating. That is why we provide image preheating in OpenKruise. This capability has already been rolled out at scale in Alibaba's cloud-native environment, and its basic usage is introduced later in this article.

How OpenKruise implements image preheating

To understand how image preheating works in OpenKruise, we should first look at its runtime architecture.

Starting with v0.8.0, installing Kruise creates two components in the kruise-system namespace: kruise-manager and kruise-daemon. The former is a centralized component deployed as a Deployment; each kruise-manager container (process) runs multiple controllers and webhooks. The latter is deployed to the nodes of the cluster as a DaemonSet and provides extended capabilities (such as pulling images and restarting containers) by interacting with CRI directly, bypassing kubelet.

Kruise therefore creates a custom resource, NodeImage, for each node (Node), with the same name as the node. The NodeImage of a node specifies which images need to be preheated on that node, so the kruise-daemon on that node only has to carry out image pull tasks according to its NodeImage.

In a NodeImage we can specify the name of the image to pull, its tags, and the pull policy: the timeout for a single pull, the number of retries on failure, the deadline of the task, the TTL, and so on.
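As an illustration, a NodeImage along these lines might look like the sketch below. The field layout is based on the apps.kruise.io/v1alpha1 NodeImage API as described in the OpenKruise documentation and should be checked against the installed version; the node name node-xxx, the image xxx/base-image, and all values are placeholders assumed for this example.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: NodeImage
metadata:
  name: node-xxx                        # assumed placeholder; same name as the Node it belongs to
spec:
  images:
    xxx/base-image:                     # image name (placeholder)
      tags:
      - tag: latest                     # image tag to pull
        pullPolicy:
          timeoutSeconds: 600           # timeout for a single pull
          backoffLimit: 3               # number of retries after a failed pull
          activeDeadlineSeconds: 1200   # deadline for the whole pull task
          ttlSecondsAfterFinished: 300  # how long the finished task is kept before removal
```

In normal use these resources are written by Kruise rather than edited by hand, which is the motivation for the higher-level resource described next.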

NodeImage gives us the most basic image preheating capability, but it does not fully meet the needs of large-scale scenarios. In a cluster with 5k nodes, asking users to update NodeImage resources one by one is clearly unfriendly. Kruise therefore also provides a higher-level custom resource, ImagePullJob.

In an ImagePullJob, users specify the range of nodes on which an image is to be preheated in batches, as well as the pull policy and the lifecycle of the job. Once created, an ImagePullJob is received and processed by the imagepulljob-controller in kruise-manager, which breaks it down and writes it into the NodeImage of every matching node to complete the large-scale preheating.

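The overall process is as follows: the user creates an ImagePullJob; the imagepulljob-controller in kruise-manager writes the image into the NodeImage of each matching node; and the kruise-daemon on each of those nodes pulls the image through CRI according to its NodeImage.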

With image preheating available, how should we use it, and in which scenarios is it needed? Next we introduce several common ways image preheating is used at Alibaba.

Common ways to use image preheating

1. Base images: cluster-wide preheating

The most common scenario is to preheat a set of base images continuously across the whole cluster:

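apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: base-image-job
spec:
  image: xxx/base-image:latest
  parallelism: 10          # number of nodes pulling at the same time
  completionPolicy:
    type: Never            # long-running job, never completes
  pullPolicy:
    backoffLimit: 3        # retries after a failed pull
    timeoutSeconds: 300    # timeout for a single pull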

The ImagePullJob above has the following characteristics:

No selector is configured, so the whole cluster is preheated by default: existing nodes are preheated uniformly, and nodes added (imported) later are preheated automatically as soon as they join.

The completionPolicy type is Never, so the job runs long-term: it keeps preheating and never finishes unless it is deleted. Under the Never policy, ImagePullJob triggers a retry pull on all matching nodes roughly every 24 hours, which ensures that the image is present on them every day.

In our experience, a cluster has roughly 10 to 30 ImagePullJobs preheating base images, depending on the cluster and the business scenarios.

2. Sidecar images: cluster-wide preheating

We can also preheat sidecar images, especially the basic sidecars that are included in almost every business Pod:

apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: sidecar-image-job
spec:
  image: xxx/sidecar-image:latest
  parallelism: 20
  completionPolicy:
    type: Always                   # one-time job
    activeDeadlineSeconds: 1800    # timeout for the whole job (30 min)
    ttlSecondsAfterFinished: 300   # delete the job 5 min after it finishes
  pullPolicy:
    backoffLimit: 3
    timeoutSeconds: 300

The ImagePullJob above has the following characteristics:

No selector is configured, so the whole cluster is preheated by default, the same as for the base image.

The completionPolicy type is Always, so the job preheats once: every node is preheated a single time, the whole job times out after 30 minutes (activeDeadlineSeconds: 1800), and the job is deleted automatically 5 minutes after it completes (ttlSecondsAfterFinished: 300).

Of course, the sidecar preheating here can also be configured with the Never policy, depending on the scenario. In our experience, especially when a sidecar is iterating versions and upgrading its image, running a large-scale preheating job in advance greatly speeds up the subsequent Pod scale-out and release.

3. Special business images: resource-pool preheating

A multi-tenant Kubernetes cluster may contain several different business resource pools, and some business-specific images may need to be preheated per resource pool:

apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: serverless-job
spec:
  image: xxx/serverless-image:latest
  parallelism: 10
  completionPolicy:
    type: Never
  pullPolicy:
    backoffLimit: 3
    timeoutSeconds: 300
  selector:
    matchLabels:
      resource-pool: serverless    # only nodes carrying this label are preheated

The ImagePullJob above has the following characteristics:

It uses the Never policy to preheat long-term.

It specifies a selector to limit the preheating range to the nodes that carry the label resource-pool=serverless.

The resource pool is only an example; users can define which nodes preheat a given image according to their own scenarios.
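As one more illustration of limiting the range, the selector of an ImagePullJob can, according to the OpenKruise documentation, list node names explicitly instead of matching labels (only one of the two forms may be set). The sketch below assumes that API; the job name, the image, and the node names node-1 and node-2 are hypothetical.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: named-nodes-job            # hypothetical job name
spec:
  image: xxx/special-image:latest  # hypothetical image
  parallelism: 2
  completionPolicy:
    type: Always                   # one-time preheating
    activeDeadlineSeconds: 1800
    ttlSecondsAfterFinished: 300
  pullPolicy:
    backoffLimit: 3
    timeoutSeconds: 300
  selector:
    names:                         # explicit node names instead of matchLabels
    - node-1
    - node-2
```

Listing names suits small, fixed sets of nodes; for anything that changes over time, a label selector such as the resource-pool example above is easier to maintain.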

Looking ahead: combining in-place upgrade with image preheating

Finally, let us introduce the enhancements that the next version of OpenKruise (v0.9.0) will build on the current image preheating capability.

Readers who already know OpenKruise will know that one of its major features is "in-place upgrade". It breaks the pattern in which a Pod must be deleted and rebuilt when a Kubernetes native workload is released, and instead updates only the image of one of the containers on the existing Pod. Readers interested in the principle of in-place upgrade can read this article: "Secret: how to achieve in-place upgrade for Kubernetes?".

Because in-place upgrade avoids deleting and rebuilding the Pod, it already brings the following benefits:

Scheduling time is saved; the location and resources of the Pod do not change.

Network allocation time is saved; the Pod keeps its original IP.

The time to allocate and mount remote disks is saved; the Pod keeps its original PV, which is already mounted on the Node.

Most of the image pull time is saved, because the old image of the application already exists on the node and only a few layers need to be downloaded when the new version is pulled.

While one container in the Pod is upgraded in place, the other containers keep running normally, and the network and storage are unaffected.

Regarding "most of the image pull time is saved": only some of the upper layers of the new image need to be downloaded. Is it possible to eliminate this remaining pull time entirely? The answer is yes.

In the next release, OpenKruise's CloneSet will support automatic image preheating during the release process. While the user is still upgrading the first batch of Pods in a canary release, Kruise preheats the new version of the image on the nodes where the remaining Pods reside. When the later batches of Pods are upgraded in place, the new image is already on the node, which saves the image pull time during the actual release.

Of course, this "release + preheating" mode applies only to OpenKruise's in-place upgrade scenarios. For native workloads such as Deployment, Pods are newly created at release time, so the nodes they will be scheduled to cannot be predicted and the image cannot be preheated in advance.
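To make the batch release concrete, the following is a rough sketch of a CloneSet that is partway through an in-place upgrade. It is not taken from this article: it assumes the behaviour described in later OpenKruise documentation, where pre-downloading is enabled through the PreDownloadImageForInPlaceUpdate feature gate and its parallelism is tuned with the apps.kruise.io/image-predownload-parallelism annotation. The workload name, labels, image, and numbers are hypothetical.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample                     # hypothetical workload name
  annotations:
    # Assumed annotation: how many nodes pre-download the new image at once
    apps.kruise.io/image-predownload-parallelism: "10"
spec:
  replicas: 10
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: app
        image: xxx/app-image:v2    # new version being released (hypothetical)
  updateStrategy:
    type: InPlaceIfPossible        # upgrade in place instead of recreating Pods
    partition: 8                   # 8 Pods stay on the old version; only the first 2 are upgraded
```

With partition set to 8, only the first batch is upgraded, and the nodes of the remaining 8 Pods are the ones on which the new image would be preheated before partition is lowered.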
