This article is about how to optimize the efficiency of Pod creation in Serverless scenarios. It is quite practical, so it is shared here in the hope that you will get something out of it. Let's take a look.
A brief introduction to Serverless computing
Before moving on to the topic, let's briefly review the definition of Serverless computing.
According to Wikipedia, Serverless computing is a form of cloud computing in which the cloud vendor manages the servers, dynamically allocates machine resources to users, and charges based on the amount of resources actually used.
Users build and run services without having to think about servers, which relieves them of the burden of server management. The platform's elasticity automatically scales instances out during business peaks and scales them back in during troughs, reducing resource costs.
Serverless computing platform
The following is the architecture of current common Serverless computing products.
The product architecture usually has two layers: the control plane and the data plane. The control plane serves developers: it manages the application life cycle and meets developers' needs for application management. The data plane serves the application's visitors, such as the end users of the developer's business, and meets the application's traffic management and access requirements.
In the control plane, Kubernetes is usually used for resource management and scheduling. The masters are typically deployed on 3 nodes to meet high-availability requirements, and nodes reach the K8s masters through an intranet SLB.
At the node level, there are usually two types of nodes:
One type is nodes that run kubelet, such as bare-metal servers and virtual machines. On these nodes, secure containers run as Pods, and each Pod has its own kernel, which reduces the security risk of sharing the host kernel. Meanwhile, tenants' network access is isolated at the data link layer through the cloud VPC network or other network technologies. Secure containers plus layer-2 network isolation allow a single node to provide a reliable multi-tenant runtime environment.
The other type is virtual nodes, which connect K8s to elastic instances through VirtualKubelet. An elastic instance is a lightweight, VM-like resource form in cloud products that offers a container group service backed by an effectively unlimited resource pool. The container group concept corresponds to the Pod concept in K8s. AWS provides Fargate elastic instances, and Aliyun provides ECI elastic instances.
Serverless products provide a PaaS layer on top of K8s that offers deployment, development, and related services to developers, shielding K8s concepts and lowering the cost of developing, operating, and maintaining applications.
In the data plane, users access application instances through an SLB. In this plane the PaaS layer also usually provides traffic management services, such as traffic grayscale releases and A/B testing, to meet developers' traffic management needs.
Elasticity is the core competitiveness of a Serverless computing platform. It must meet developers' demands for Pod scale, offering something close to an unlimited resource pool, and it must also create Pods efficiently enough to respond to requests in a timely manner.
Pod scale can be met by adding IaaS-layer resources, so the rest of this article focuses on techniques for improving Pod creation efficiency.
Scenarios involving Pod creation
First, let's look at the scenarios that involve Pod creation, so that the technology can address the business demands more precisely.
There are two scenarios in the business that involve Pod creation:
The first is creating an application: scheduling decides which node is most suitable for the Pod, and the Pod is then created on that node.
The second is upgrading an application, which is usually a process of creating new Pods and destroying old ones.
In Serverless services, developers focus on the application life cycle, especially the creation and upgrade phases. Pod creation efficiency affects the overall duration of both phases and therefore the developer experience. In the face of sudden traffic, creation efficiency strongly influences how quickly the developer's service can respond and, in severe cases, can damage the developer's business.
In the face of the above business scenarios, we will focus on how to improve the efficiency of Pod creation.
The Pod creation process
Analyzing the process as a whole, we address each stage of Pod creation in turn, in order of its impact on creation efficiency.
This is the simplified process of creating a Pod:
When a Pod creation request arrives, scheduling first selects the most appropriate node for the Pod. On that node, the image is pulled first, and the container group is created once the image is ready locally. The image pull stage consists of two steps: downloading the image and decompressing it.
We tested two types of images, and the results are as follows:
As the test results show, the time spent decompressing images is a non-negligible share of the whole image pull. For the golang:1.10 image (about 248 MB before decompression), decompression takes 77.02% of the pull time. For the hadoop namenode image (about 506 MB before decompression), decompression and download take roughly 40% and 60% of the time respectively. In both cases, decompression accounts for a significant share of the total pull time.
Next, we optimize the individual stages of this process, discussing the process as a whole, image decompression, image download, and so on.
Improve image pull efficiency: image preheating
An obvious approach is to preheat images: prepare the image on the node before the Pod is scheduled there, removing the image pull from the critical path of Pod creation, as shown below:
You can preheat globally, pulling the image on all nodes in advance of scheduling, or preheat during scheduling, pulling the image on the target node once it has been determined.
Both approaches are valid; choose according to the actual situation of the cluster. A sketch of the global approach follows.
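As an illustration of global preheating (this sketch is not from the original article), a plain K8s DaemonSet can pull a target image onto every node: an init container references the image so that kubelet downloads it, and the main container is just pause. The image name registry.example.com/app:v1 is a placeholder, and the init container assumes the image ships a shell.

```
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepuller
spec:
  selector:
    matchLabels:
      app: image-prepuller
  template:
    metadata:
      labels:
        app: image-prepuller
    spec:
      initContainers:
      - name: prepull
        # Hypothetical image to preheat; kubelet pulls it onto every node
        image: registry.example.com/app:v1
        command: ["sh", "-c", "exit 0"]   # assumes the image contains a shell
      containers:
      - name: pause
        # Tiny container that keeps the DaemonSet Pod alive at negligible cost
        image: registry.k8s.io/pause:3.9
EOF
```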
The OpenKruise project in the community is about to launch an image preheating service, which is worth following. The following is how the service is used:
An image prefetch task is issued through the ImagePullJob CRD, specifying the target image and nodes. You can configure the pull concurrency, the timeout for processing the Job, and the time after which the Job object is automatically reclaimed. For a private image, you can specify the secret to use when pulling. The Events of the ImagePullJob reflect the status of the task; consider increasing the Job object's auto-reclaim time so that the task's progress remains easy to inspect through the ImagePullJob Events.
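A minimal sketch of such a task, based on the OpenKruise v1alpha1 API as documented (field names may differ across versions; the image, node names, and secret are placeholders):

```
cat <<'EOF' | kubectl apply -f -
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: prewarm-app
spec:
  image: registry.example.com/app:v1   # target image to prefetch (placeholder)
  parallelism: 10                      # pull concurrency across nodes
  selector:
    names:                             # target nodes (placeholders)
    - node-1
    - node-2
  completionPolicy:
    type: Always
    activeDeadlineSeconds: 1200        # timeout for processing the Job
    ttlSecondsAfterFinished: 3600      # delay auto-reclaim to keep Events inspectable
  pullSecrets:
  - my-registry-secret                 # secret for a private registry (placeholder)
EOF

kubectl describe imagepulljob prewarm-app   # task status shows up in Events
```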
Improve decompression efficiency
From the image pull data above, decompression accounts for a large share of the total pull time, up to 77% in the tested examples, so we need to consider how to improve decompression efficiency.
Let's review the technical details of docker pull:
In docker pull, there are two phases as a whole:
Download the image layers in parallel
Decompress (unpack) the image layers
By default, gunzip is used when decompressing each image layer.
Let's take a look at the process of docker push:
The image layers are packaged first and compressed with gzip.
They are then uploaded in parallel.
gzip/gunzip are single-threaded compression/decompression tools; pigz/unpigz can be used instead for multi-threaded compression/decompression that takes full advantage of multiple cores.
containerd has supported pigz since version 1.2: once the unpigz tool is installed on a node, containerd prefers it for decompression. In this way, image decompression can exploit the node's multi-core capacity, as sketched below.
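A minimal sketch of enabling this on a node (assuming a Debian/Ubuntu system; package names differ elsewhere, and layer.tar.gz is a placeholder file used only for a rough timing comparison):

```
# Installing pigz also provides the unpigz binary (use yum/dnf on RHEL-family systems)
sudo apt-get update && sudo apt-get install -y pigz

# containerd >= 1.2 prefers unpigz for layer decompression when it is on PATH
which unpigz

# Rough single-threaded vs multi-threaded comparison on a saved layer (placeholder file)
time gunzip -c layer.tar.gz > /dev/null
time unpigz -c layer.tar.gz > /dev/null
```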
This process also needs attention to download/upload concurrency. The docker daemon provides two parameters to control how many image layers are processed in parallel: --max-concurrent-downloads and --max-concurrent-uploads. By default, download concurrency is 3 and upload concurrency is 5; adjust them to suitable values based on test results, for example:
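The same settings can be made persistent in the daemon configuration file (the values below are illustrative, not recommendations):

```
# Raise docker's pull/push parallelism (illustrative values; tune by testing,
# and merge with any existing /etc/docker/daemon.json)
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "max-concurrent-downloads": 10,
  "max-concurrent-uploads": 10
}
EOF
sudo systemctl restart docker
```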
Image decompression efficiency after using unpigz:
In the same environment, the decompression time of the golang:1.10 image improved by 35.88%, and that of the hadoop namenode image improved by 16.41%.
Uncompressed images
Usually the internal network bandwidth is large enough. Is it possible, then, to omit the decompression/compression logic altogether and concentrate the image pull time on the download, accepting a somewhat longer download in exchange for a much shorter decompression?
Looking back at the docker pull/push flow, consider removing the gunzip and gzip logic from the unpack/pack phases:
For docker images, if the image was pushed uncompressed, there is no need to decompress it during docker pull; so to achieve this goal, the compression logic must be removed from docker push.
The docker daemon does not support this out of the box. We modified docker so that no compression is performed when uploading images. The test results are as follows:
Here we focus on image decompression time. The decompression time of the golang:1.10 image improved by about 50%, and that of the hadoop namenode image by about 28%. The scheme also has a clear effect on the total image pull time.
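The original experiment patched docker itself; as a rough equivalent with standard tooling (a sketch, assuming buildah is available; the registry and image names are placeholders), layers can be pushed uncompressed like this:

```
# Pull the source image, then push it without compressing the layers
buildah pull docker.io/library/golang:1.10
buildah push --disable-compression \
  docker.io/library/golang:1.10 \
  docker://registry.example.com/library/golang:1.10-uncompressed
```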
Image distribution
In small clusters, improving image pull efficiency means focusing on decompression, and downloading is usually not the bottleneck. In large clusters, however, the sheer number of nodes means the bandwidth and stability of a centralized Image Registry also affect pull efficiency, as shown below:
The pressure of downloading images is concentrated on the central Image Registry.
To solve this problem, a P2P-based image distribution system can be introduced, taking CNCF's Dragonfly project as an example:
Here are several core components:
ClusterManager
It is essentially a central SuperNode that acts as a tracker and scheduler, coordinating the nodes' download tasks in the P2P network. It is also a caching service: it caches images downloaded from the Image Registry, so that a growing number of nodes does not put more pressure on the Registry.
Dfget
It is not only the client that downloads images on a node; it also serves data to peers, providing locally available image data to other nodes on demand.
Dfdaemon
Each node runs a Dfdaemon component, which is essentially a proxy: it transparently proxies the docker daemon's image pull requests and uses Dfget to download the images.
Through the P2P network, the central Image Registry's data is cached in the ClusterManager, which coordinates the nodes' download demands and spreads the download pressure across the cluster's nodes. Each node is both a consumer and a provider of image data, making full use of internal network bandwidth for image distribution. A node-side setup sketch follows.
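A sketch of wiring a node into such a network, following the classic Dragonfly 1.x deployment (the SuperNode address lives in dfdaemon's config file, whose location and format vary across versions; 65001 is dfdaemon's documented default port):

```
# Start dfdaemon on the node; its config file points at the SuperNode(s)
dfdaemon &

# Route docker's image pulls through dfdaemon as a local registry mirror
# (merge with any existing /etc/docker/daemon.json)
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "registry-mirrors": ["http://127.0.0.1:65001"]
}
EOF
sudo systemctl restart docker
```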
Load images on demand
In addition to the methods described above, are there any other optimization methods?
Today, creating a container on a node requires pulling all of the image data locally before the container can start. Compare this with starting a virtual machine: even with virtual machine images of several hundred GB, startup is usually a matter of seconds, and the image size is barely felt.
So can similar technologies be used in the container field?
Let's look at the paper "Slacker: Fast Distribution with Lazy Docker Containers", published at USENIX FAST '16:
Our analysis shows that pulling packages accounts for 76% of container start time, but only 6.4% of that data is read.
According to the paper's analysis, pulling the image accounts for 76% of container start time, yet only 6.4% of that data is read during startup; the amount of image data actually needed at startup is very small. It is worth considering loading the image on demand during startup, changing how image data is consumed.
Instead of "the container cannot start until all of the image's layers have been downloaded", the container should load image data on demand at startup, much as a virtual machine does, transferring over the network only the data needed during the startup phase.
But the current image format is usually tar.gz or tar: a tar file has no index, and a gzip stream cannot be read from an arbitrary offset, so the format cannot satisfy on-demand fetches of specific files. The image format needs to change to one that is indexable.
Google proposed a new image format, stargz, whose full name is "seekable tar.gz". It is compatible with the current image format but provides a file index, so data can be read from a specified location.
A traditional .tar.gz file is generated as Gzip(TarF(file1) + TarF(file2) + TarF(file3) + TarFooter): each file is packaged into the archive, and the whole group is then compressed at once.
A stargz file instead takes the form Gzip(TarF(file1)) + Gzip(TarF(file2)) + Gzip(TarF(file3_chunk1)) + Gzip(F(file3_chunk2)) + Gzip(F(index of earlier files in magic file), TarFooter): each file (or chunk of a large file) is packaged and compressed separately, and an index of the earlier entries is compressed together with the TarFooter.
In this way, the index makes it possible to quickly locate a target file and pull just that file from the specified offset. A conversion sketch follows.
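As an illustration, later tooling from the stargz ecosystem can convert an existing image into this format; a sketch assuming nerdctl is installed, with placeholder image names:

```
# Convert an image to eStargz (the seekable tar.gz format) and push it
nerdctl image convert --estargz --oci \
  registry.example.com/app:v1 registry.example.com/app:v1-esgz
nerdctl push registry.example.com/app:v1-esgz
```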
Then, on the containerd side, a remote snapshotter is provided: when creating the container's rootfs layers, instead of first downloading the image layers and building the layers from them, it directly mounts layers backed by remote storage, as shown below:
Achieving this capability requires two things: adapting containerd's current logic so that remote image layers are recognized in the filter phase and skipped during download, and implementing a remote snapshotter to manage those remote layers.
When containerd creates a container through the remote snapshotter, the image pull stage is omitted. For the files needed during startup, HTTP Range GET requests are issued against the stargz-format image data to fetch exactly the target bytes. A configuration sketch follows.
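A sketch of registering a remote snapshotter with containerd, using the open-source stargz snapshotter as an example (socket path and binary names follow that project's documentation; verify against your version, and the image name is a placeholder):

```
# Register stargz-snapshotter as a containerd proxy plugin
cat <<'EOF' | sudo tee -a /etc/containerd/config.toml
[proxy_plugins]
  [proxy_plugins.stargz]
    type = "snapshot"
    address = "/run/containerd-stargz-grpc/containerd-stargz-grpc.sock"
EOF

# Run the snapshotter daemon, then restart containerd
containerd-stargz-grpc &
sudo systemctl restart containerd

# Lazily register the eStargz image and run it without a full pull
ctr-remote image rpull registry.example.com/app:v1-esgz
ctr-remote run --snapshotter=stargz --rm registry.example.com/app:v1-esgz demo
```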
Aliyun has implemented an accelerator called DADI along similar lines; it is currently used in Aliyun's container service and can start 10,000 containers in 3.01 s, effectively eliminating the long cold-start wait. Interested readers can refer to this article: https://developer.aliyun.com/article/742103
Upgrade in place
The solutions above all target the Pod creation process. For upgrade scenarios, can efficiency be improved further with existing technology; specifically, can the Pod creation process be eliminated entirely so that the Pod is upgraded in place?
In upgrade scenarios, the most common case is upgrading only the image. For this case, the patch capability of K8s itself can be used: by patching the image, the Pod is not rebuilt and only the target container is, so the full scheduling-plus-new-Pod flow is skipped and only the containers that need upgrading are upgraded in place.
During an in-place upgrade, the K8s readinessGates capability can be used to take the Pod offline gracefully: the K8s Endpoint Controller removes the Pod that is about to be upgraded and adds it back once the in-place upgrade completes, so traffic stays lossless throughout the upgrade. A minimal patch sketch follows.
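For the image-only case, K8s already allows the image field of a running Pod's container to be patched, which restarts just that container instead of recreating the Pod (the Pod and image names are placeholders):

```
# Replace only the first container's image in a running Pod;
# kubelet restarts that container in place rather than recreating the Pod
kubectl patch pod my-app-pod --type='json' -p='[
  {"op": "replace", "path": "/spec/containers/0/image",
   "value": "registry.example.com/app:v2"}
]'
```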
The CloneSet Controller in the OpenKruise project provides the above capabilities:
Developers declare applications with CloneSet, which is used much like Deployment. When the image is upgraded, the CloneSet Controller performs the patch operation while ensuring that business traffic stays lossless during the upgrade, for example:
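A sketch of a CloneSet that opts into in-place updates, based on the OpenKruise v1alpha1 API (field names may vary across versions; the labels and image are placeholders):

```
cat <<'EOF' | kubectl apply -f -
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample-app
spec:
  replicas: 3
  updateStrategy:
    type: InPlaceIfPossible   # rebuild only the changed container, not the Pod
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1   # changing this triggers an in-place upgrade
EOF
```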
The above is how to optimize the efficiency of Pod creation in Serverless scenarios. Some of these points may well come up in daily work; I hope you have learned something from this article.