

How to bring stable and efficient deployment capabilities to cloud native applications?




Author | Jiuzhu, Alibaba Cloud technical expert, and Mofeng, Alibaba Cloud development engineer

Full video replay of the live broadcast: https://www.bilibili.com/video/BV1mK4y1t7WS/

Follow the "Alibaba Cloud Native" official account and reply "528" in the background to download the PPT.

On May 28 we held the third SIG Cloud-Provider-Alibaba webinar. This session mainly covered the core deployment problems encountered during the large-scale adoption of cloud native across the Alibaba economy, the corresponding solutions, and how these solutions, after being distilled into general-purpose capabilities and open sourced, help users on Alibaba Cloud improve the efficiency and stability of application deployment and release.

This article gathers the full video replay and the materials for download, and collates the questions and answers collected during the live broadcast. We hope it is helpful to everyone.

Preface

As Kubernetes has gradually become the de facto standard in recent years and a large number of applications have been moving to cloud native, we often find that the native Kubernetes workloads are not very "friendly" for supporting large-scale applications. How to provide more complete, efficient and flexible deployment and release capabilities for applications on Kubernetes has become the goal of our exploration.

This article introduces the improvements and optimizations we made to application deployment while moving the Alibaba economy fully onto cloud native: we built enhanced workloads with much richer functionality and open sourced them to the community, so that every Kubernetes developer and every Alibaba Cloud user can easily use the same deployment and release capabilities used by Alibaba's internal cloud native applications.

Review of the first webinar: Kubernetes SIG-Cloud-Provider-Alibaba's first webinar (with PPT download)

Review of the second webinar: How does Alibaba Cloud build a high-performance cloud native container network in a production environment? (with PPT download)

Alibaba application scenarios and native workloads

Alibaba started down the containerization road relatively early, both in China and worldwide. Although the technical concept of containers appeared long ago, it did not become widely known until Docker arrived in 2013, whereas Alibaba began developing LXC-based container technology as early as 2011. After several generations of system evolution, Alibaba now runs more than one million containers, which places it among the largest deployments in the world.

With the development of cloud technology and the rise of cloud native applications, over the past two years we have gradually moved these containers onto a Kubernetes-based cloud native environment, and we have run into many problems around application deployment along the way. First of all, application developers' expectations for migrating to a cloud native environment are:

policy capabilities for rich business scenarios;
extreme deployment and release efficiency;
runtime stability and fault tolerance.

Alibaba's application scenarios are very complex. On top of Kubernetes there are many different PaaS layers, such as the operations center, large-scale operations, middleware, Serverless, and function compute, and each platform has different requirements for deployment and release.

Let's take a look at the capabilities of the two commonly used workloads provided natively by Kubernetes:

To put it simply, Deployment and StatefulSet work in some small-scale scenarios, but at Alibaba's application and container scale it is completely unrealistic to rely on the native workloads alone. At present there are more than 100,000 applications and millions of containers in Alibaba's internal container clusters, and some key core applications have tens of thousands of containers under a single application. Combined with the problems in the figure above, we find that not only are the release capabilities for a single application insufficient, but when a large number of applications are upgraded at the same time, super-large-scale Pod reconstruction also becomes a "disaster".

Extended workloads developed by Alibaba

To address the fact that the native workloads fall far short of our application scenarios, we abstracted common application deployment requirements out of a variety of complex business scenarios and developed a set of extended workloads. In these workloads we made significant enhancements and improvements, while strictly keeping the functionality general and never allowing business logic to be coupled into them.

Here we focus on CloneSet and Advanced StatefulSet. In Alibaba's internal cloud native environment, almost all e-commerce related applications are deployed and released using CloneSet, while stateful applications such as middleware are managed with Advanced StatefulSet.

As the name implies, Advanced StatefulSet is an enhanced version of the native StatefulSet whose default behavior is exactly the same as the native one. On top of that, it provides features such as in-place upgrade, parallel release (max unavailable), and release pause. CloneSet, on the other hand, is the counterpart of the native Deployment: it mainly serves stateless applications and provides the most comprehensive and rich deployment and release strategies.

In-place upgrade

Both CloneSet and Advanced StatefulSet support specifying the Pod upgrade type:

ReCreate: upgrade by rebuilding the Pod, the same as the native Deployment/StatefulSet;
InPlaceIfPossible: if only the image or the metadata labels/annotations fields are modified, an in-place upgrade is triggered; if any other field of the template spec is modified, the upgrade falls back to Pod reconstruction;
InPlaceOnly: only the image and the metadata labels/annotations fields may be modified, and only in-place upgrade is used.
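
As a rough illustration, the upgrade type sits under the workload's updateStrategy. Below is a minimal CloneSet sketch (the name, labels and image are placeholders, not taken from the article):

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample                  # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: app
        image: nginx:1.19       # changing only this image allows an in-place upgrade
  updateStrategy:
    type: InPlaceIfPossible     # or ReCreate / InPlaceOnly, as described above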

In-place upgrade means that when the Pod template is upgraded, the workload does not delete and recreate the original Pod, but instead updates the image and other fields directly on the existing Pod object.

As shown in the figure above, during an in-place upgrade CloneSet only updates the image of the corresponding container in the Pod spec. When kubelet sees that the definition of this container in the Pod has changed, it stops the old container, pulls the new image, and creates and starts a container from it. Throughout the process, the Pod sandbox container and the containers that are not being upgraded keep running normally; only the containers that need to be upgraded are affected.

In-place upgrade brings many benefits:

First of all, release efficiency improves dramatically: according to rough statistics, in Alibaba's environment an in-place upgrade is at least 80% faster than a full rebuild upgrade, since it not only saves the time spent on scheduling, network allocation and remote disk allocation, but also benefits from the old image already being present on the node, so only a few incremental layers need to be pulled for the new image;
the IP stays the same and the Pod network remains connected throughout the upgrade, and all containers other than the one being upgraded keep running normally;
the Volumes stay the same and the mount devices of the original containers are fully reused;
cluster determinism is preserved, so the layout topology still passes verification.

We will publish a dedicated follow-up article on in-place upgrade, because it is of great significance to Alibaba on Kubernetes: without it, the super-large-scale application scenarios inside Alibaba could hardly land smoothly on a native Kubernetes environment. We also encourage every Kubernetes user to try in-place upgrade, as it brings a change to the traditional Kubernetes release model.

Streaming + batch release

As mentioned in the previous section, Deployment currently supports streaming upgrades via maxUnavailable/maxSurge, while StatefulSet supports batch upgrades via partition. The problem is that Deployment cannot release in grayscale batches, while StatefulSet can only release Pods serially one by one and has no way to do a parallel streaming upgrade.

First, we introduced maxUnavailable into Advanced StatefulSet. The one-by-one release of the native StatefulSet can be understood as forcing maxUnavailable=1; in Advanced StatefulSet, configuring a larger maxUnavailable allows more Pods to be released in parallel.
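
A minimal sketch of what this looks like, assuming the Advanced StatefulSet field layout of the OpenKruise version of the time (the name and field values below are illustrative, not quoted from the article):

apiVersion: apps.kruise.io/v1alpha1
kind: StatefulSet               # Advanced StatefulSet reuses the StatefulSet kind under the kruise API group
metadata:
  name: sample                  # hypothetical name
spec:
  # ... replicas / selector / template as in a native StatefulSet
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible   # upgrade in place where possible
      maxUnavailable: 20%                  # allow multiple Pods to be upgraded in parallel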

Now let's look at CloneSet, which supports all the release strategies of the native Deployment and StatefulSet, including maxUnavailable, maxSurge, and partition. How does CloneSet combine them? Let's look at an example:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
# ...
spec:
  replicas: 5                # 5 Pods in total
  updateStrategy:
    type: InPlaceIfPossible
    maxSurge: 20%            # surge 5 * 20% = 1 Pod (rounded up)
    maxUnavailable: 0        # keep 5 - 0 = 5 Pods available throughout the release
    partition: 3             # keep 3 Pods on the old version (only release 5 - 3 = 2 Pods)

For this CloneSet with 5 replicas, suppose we modify the image in the template and configure maxSurge=20%, maxUnavailable=0, partition=3. When the release starts:

First, one new-version Pod is created while the 5 existing old Pods remain unchanged;
once the new Pod is ready, the old-version Pods are gradually upgraded;
when only 3 old-version Pods remain, the partition target state has been reached, so the extra new-version Pod is deleted;
at this point the total number of Pods is still 5, with 3 on the old version and 2 on the new version.

If we then adjust partition to 0, CloneSet again creates one extra new-version Pod first, then gradually upgrades all remaining old Pods to the new version, and finally deletes one Pod, reaching the final state of 5 replicas all on the new version.

The release order is configurable

With native Deployment and StatefulSet, users cannot configure the release order. The release order of Pods under a Deployment depends entirely on its scale-up and scale-down order after it modifies the ReplicaSets, while StatefulSet strictly upgrades Pods one by one in reverse ordinal order.

In CloneSet and Advanced StatefulSet, however, we made the release order configurable so that users can customize it. Currently the order can be defined through the following two kinds of release priority and one release scatter strategy:

Priority (1): order by a given label key; at release time, Pods are prioritized by the value of that key in their labels:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  # ...
  updateStrategy:
    priorityStrategy:
      orderPriority:
      - orderedKey: some-label-key

Priority (2): weight calculated by selector matching; at release time, a Pod's weight is the sum of the weights of all the weight selectors it matches:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  # ...
  updateStrategy:
    priorityStrategy:
      weightPriority:
      - weight: 50
        matchSelector:
          matchLabels:
            test-key: foo
      - weight: 30
        matchSelector:
          matchLabels:
            test-key: bar

Scatter: scatter the Pods matching a given key-value across different release batches:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  # ...
  updateStrategy:
    scatterStrategy:
    - key: some-label-key
      value: foo

Some readers may ask why the release order needs to be configurable. For an application like ZooKeeper, for example, a release should upgrade all non-master nodes first and upgrade the master node last, so that only one master switchover happens during the whole release. Users can label the Pods manually during the process, or write an operator that automatically labels ZooKeeper Pods with their node role, and then configure the non-master nodes to release with a higher weight, reducing the number of master switchovers as much as possible.
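
For instance, a hedged sketch of such a configuration, reusing the weightPriority structure shown above; the zookeeper-role label and its values are assumptions written by the user or an operator, not taken from the article:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  # ...
  updateStrategy:
    priorityStrategy:
      weightPriority:
      - weight: 80                      # non-master Pods get a higher weight and release first
        matchSelector:
          matchLabels:
            zookeeper-role: follower
      - weight: 10                      # the master Pod releases last
        matchSelector:
          matchLabels:
            zookeeper-role: leader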

Sidecar Container Management

Lightweight containers are another major reform Alibaba made in its move to cloud native. In the past, most of Alibaba's containers ran as "rich containers": a single container that runs not only the business process but also a variety of plug-ins and daemons. In the cloud native era, we are gradually splitting the auxiliary plug-ins out of the original "rich container" into separate sidecar containers, so that the main container can return to the business itself.

Without going into the benefits of the split here, let's look at another question: how are these sidecar containers managed after the split? The most intuitive way is to explicitly define the needed sidecars in each application's workload Pod template, but this brings a lot of problems:

when there are a large number of applications and workloads, it is difficult to uniformly manage the addition and removal of sidecars;
application developers do not know (and may not even care) which sidecar containers their applications need to be configured with;
when a sidecar image needs to be upgraded, it is unrealistic to update the workloads of every application.

Therefore, we designed SidecarSet to decouple sidecar container definitions from application workloads. Application developers no longer need to care about which sidecar containers to write into their workloads, and through in-place upgrade, sidecar maintainers can manage and upgrade the sidecar containers on their own.
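
A minimal SidecarSet sketch (the name, label and image below are hypothetical placeholders, not from the article):

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: log-sidecar               # hypothetical name
spec:
  selector:
    matchLabels:
      app-type: web               # inject into every Pod that carries this label
  containers:
  - name: log-agent
    image: log-agent:1.0          # upgrading this image rolls the sidecar across all matched Pods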

Open sourced capabilities

By now you should have a basic understanding of Alibaba's application deployment model. In fact, all of the capabilities above have been open sourced to the community in our project, OpenKruise, which currently provides five extended workloads:

CloneSet: provides more efficient, deterministic and controllable application management and deployment capabilities, supporting rich strategies such as graceful in-place upgrade, specified deletion, configurable release order, and parallel/grayscale release, which satisfy more diverse application scenarios;
Advanced StatefulSet: an enhanced version of the native StatefulSet with the same default behavior, which additionally provides features such as in-place upgrade, parallel release (max unavailable) and release pause;
SidecarSet: manages sidecar containers in a unified way and injects the specified sidecar containers into Pods that match the selector;
UnitedDeployment: deploys an application to multiple availability zones through multiple subset workloads;
BroadcastJob: configures a job to run a Pod task on all eligible Nodes in the cluster (a minimal sketch follows this list).
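
As referenced above, here is a minimal BroadcastJob sketch that runs a one-off Pod task on every eligible Node; the name, image and command are hypothetical:

apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  name: node-check                # hypothetical name
spec:
  template:
    spec:
      containers:
      - name: check
        image: busybox
        command: ["sh", "-c", "echo node ok"]   # hypothetical task
      restartPolicy: Never
  completionPolicy:
    type: Always                  # the job completes once the task has run on all eligible Nodes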

In addition, we have more extended capabilities on the way to open source! In the near future we will open up the internal Advanced DaemonSet to OpenKruise, which provides additional release strategies such as batching and a selector on top of the native DaemonSet's maxUnavailable. The batching feature lets a DaemonSet upgrade only part of its Pods in a release, and the selector lets chosen nodes be upgraded first, which gives us grayscale capability and stability guarantees when upgrading DaemonSets in large-scale clusters.

Going forward, we also plan to open source generic capabilities such as the HPA and scheduling plug-ins extended internally by Alibaba, so that every Kubernetes developer and every Alibaba Cloud user can easily use the cloud native enhancements developed inside Alibaba.

Finally, we welcome every cloud native enthusiast to take part in building OpenKruise. Unlike some other open source projects, OpenKruise is not a copy of Alibaba's internal code; on the contrary, the OpenKruise GitHub repository is the upstream of Alibaba's internal code base. Every line of code you contribute will therefore run in all of Alibaba's internal Kubernetes clusters and jointly support Alibaba's world-class application scenarios!

Q & A

Q1: How many Pods does Alibaba's largest application have at present, and how long does a release take?

A1: We can only reveal that the largest single application is on the order of tens of thousands of Pods. The release time depends on the specific batching and grayscale duration; with more batches and a longer observation time, a release may last a week or two.

Q2: How are Pod resource requests and limits configured? What is the ratio of request to limit? Too large a request causes waste, while too small a request may overload hot nodes.

A2: This is mainly determined by the needs of the application. At present most online applications use a 1:1 ratio, and some offline and job-type workloads are configured with limit greater than request.
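
For reference, the 1:1 online case is just the standard Kubernetes container resources block with equal requests and limits (the values below are hypothetical, not Alibaba's actual configuration):

containers:
- name: app                       # hypothetical container
  image: app:1.0
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "2"                    # request : limit = 1 : 1 for online applications
      memory: 4Gi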

Q3: A Kruise upgrade question: when the Kruise apiVersion is upgraded, how are workloads deployed with the original version upgraded?

A3: At present the apiVersion of all Kruise resources is unified. We plan to upgrade the versions of some of the more mature workloads in the second half of this year. After users upgrade Kruise in their K8s cluster, existing old-version resources will be automatically upgraded to the new version through conversion.

Q4: Does OpenKruise provide a go-client?

A4: At present there are two ways: 1. Import the github.com/openkruise/kruise/pkg/client package, which contains the generated clientset / informer / lister and other tools; 2. Users of controller-runtime (including kubebuilder and operator-sdk) can directly import the lightweight github.com/openkruise/kruise-api dependency, add it to the scheme, and use it directly.

Q5: How does Alibaba upgrade its K8s versions?

A5: Alibaba Group uses a Kube-on-Kube architecture for large-scale Kubernetes cluster management, with one meta K8s cluster managing hundreds of business K8s clusters. The meta cluster's version is relatively stable, while the business clusters are upgraded frequently. Upgrading a business cluster is in fact a matter of upgrading the version or configuration of the workloads (native workloads and Kruise workloads) in the meta cluster, which is similar to the normal upgrade process of business workloads.

Q6: During a grayscale release, how is traffic switched?

A6: Before an in-place upgrade, Kruise first sets the Pod to not-ready through a readinessGate, and controllers such as the endpoint controller notice this and remove the Pod from the endpoints. Kruise then updates the Pod image to trigger the container rebuild, and sets the Pod back to ready once it is finished.
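
A minimal sketch of the readinessGate part of this flow; the conditionType name below (InPlaceUpdateReady) is an assumption for illustration, not a value quoted from the article:

apiVersion: v1
kind: Pod
spec:
  readinessGates:
  - conditionType: InPlaceUpdateReady   # assumed condition name; the controller sets this condition to False before the in-place upgrade and back to True afterwards
  containers:
  - name: app
    image: app:1.0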

Q7: Is DaemonSet batching achieved through a pause mechanism similar to Deployment's: count how many Pods have been released, pause, continue, then pause again?

A7: The overall process is similar: during the upgrade, the numbers of new- and old-version Pods are counted to determine whether the specified target state has been reached. However, compared with Deployment, DaemonSet has more complex boundary cases to handle (such as Pods that did not exist in the cluster at the time of the initial release); for the details, keep an eye on the code we are about to open source.

Q8: On the multi-cluster release page, how is a release started?

A8: What was demonstrated in the live broadcast is an example of a demo release system combined with Kruise workloads. Interactively, the user selects the corresponding cluster and clicks to start the release. In terms of implementation, a diff is computed between the new-version YAML and the YAML in the cluster, the result is patched into the cluster, and then the control fields of the DaemonSet (partition / paused, etc.) are manipulated to control the grayscale process.

"Alibaba Cloud Native focus on micro services, Serverless, containers, Service Mesh and other technology areas, focus on cloud native popular technology trends, cloud native large-scale landing practice, to be the official account of cloud native developers."



