Author | Mo Yuan, container technology expert at Aliyun
This article is based on a talk given at the K8s & Cloud Native Meetup in Shenzhen on August 31. Follow the "Alibaba Cloud Native" official account and reply with the keyword "data" to get the PPT collection from the 2019 meetup series and the most complete K8s knowledge map.
Introduction: Serverless and auto scaling have drawn a lot of developer attention in recent years. Some say Serverless is Container 2.0 and that containers and Serverless will one day fight a decisive battle. In fact, containers and Serverless can coexist and complement each other, especially in auto scaling scenarios: Serverless integrates well with containers and makes up for their shortcomings in simplicity, speed, and cost. This article introduces the principles, solutions, and challenges of container elasticity, and how Serverless helps containers solve these problems.
What are we talking about when we talk about "elasticity"?
When we talk about "elasticity" (auto scaling), what are we actually talking about? Elasticity means different things to different roles on a team, and that is part of its appeal.
Start with a resource graph
This diagram, which is often used to illustrate the problem of auto scaling, shows the relationship between the actual resource capacity of the cluster and the capacity required by the application.
The red curve represents the capacity the application actually needs; because each application's resource request is much smaller than a node's capacity, this curve is relatively smooth. The green polyline represents the cluster's actual resource capacity; each inflection point marks a manual capacity adjustment, such as adding or removing nodes. Because a single node's capacity is fixed and relatively large, the line takes the shape of a series of steps.
First, look at the yellow grid area on the left. Here the cluster's capacity cannot meet the business's capacity requirements; in practice this usually shows up as Pods that cannot be scheduled because of insufficient resources.
In the middle grid area, the cluster's capacity is far higher than what the workloads actually need, which wastes resources. In practice this usually shows up as uneven load distribution across nodes: some nodes carry no scheduled load at all while others are relatively heavily loaded.
The grid area on the right represents a surge to peak capacity; the curve rises very steeply just before the peak. This usually comes from situations that regular capacity planning cannot cover, such as a sudden traffic surge or a large batch of tasks. A traffic spike leaves operations engineers very little time to react, and mishandling it can cause an incident.
Auto scaling has different meanings for people in different roles:
Developers want auto scaling to keep their applications highly available; operators want it to reduce infrastructure management costs; and architects want an elastic architecture that can absorb sudden traffic peaks.
Auto scaling has a variety of different components and solutions, and choosing a solution that suits your business needs is the first step before implementation.
Interpreting Kubernetes auto scaling capabilities
Kubernetes auto scaling components
The components of Kubernetes auto scaling can be interpreted from two dimensions: one is the scaling direction, and the other is the scaling object.
By scaling direction, scaling is either horizontal or vertical; by scaling object, it targets either nodes or Pods. Expanding this quadrant gives the following three types of components:
cluster-autoscaler, for horizontal scaling of nodes; HPA and cluster-proportional-autoscaler, for horizontal scaling of Pods; vertical-pod-autoscaler and addon-resizer, for vertical scaling of Pods.
HPA and cluster-autoscaler are the components developers use most often: HPA handles horizontal scaling of containers, and cluster-autoscaler handles horizontal scaling of nodes. Many developers wonder why a single auto scaling capability needs to be split into so many components handled separately. Can't we just set a threshold and have the cluster manage its resource water level automatically?
The challenges of Kubernetes auto scaling
Understanding how Kubernetes schedules workloads helps developers better understand its elastic design philosophy. In Kubernetes, the smallest unit of scheduling is the Pod. A Pod is scheduled onto a node that satisfies the scheduling policies, which include resource matching, affinity and anti-affinity, and so on; of these, the resource-matching calculation is the core of scheduling.
There are usually four concepts related to resources:
Capacity represents the total capacity that a node can allocate; Limit represents the total amount of resources that a Pod can use; Request represents the resource space occupied by a Pod in scheduling; and Used represents the actual resource usage of a Pod.
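To make these four concepts concrete, here is a minimal sketch of how Request and Limit are declared on a Pod. The names and values are illustrative only; the scheduler reserves the requests against a node's Capacity, while the limits cap what the container may actually use.

```yaml
# Illustrative only: a Pod whose container requests 0.5 CPU and 512Mi
# (what the scheduler reserves on the node) and is limited to
# 1 CPU and 1Gi (the most the container may actually use).
apiVersion: v1
kind: Pod
metadata:
  name: demo-app            # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.25       # any image works for this illustration
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
```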
After understanding these four basic concepts and usage scenarios, let's take a look at the three major challenges of Kubernetes auto scaling:
1. Capacity planning
Remember how capacity planning was done before containers? Machines were generally allocated per application: for example, application A needs two 4C8G machines and application B needs four 8C16G machines, and application A's machines and application B's machines are independent and do not interfere with each other. In the container world, most developers no longer need to care about the underlying resources, so where does capacity planning live now?
In Kubernetes it is expressed through Request and Limit: Request is the amount of a resource a Pod asks for, and Limit is the upper bound it may use. Since Request and Limit are the container-world equivalent of capacity planning, scheduling calculations are based on Request and Limit rather than on a per-node resource reservation threshold; relying on a fixed reservation per node easily leads to small nodes reserving too little to satisfy scheduling while large nodes reserve far more than is ever needed.
2. The percentage fragmentation trap
A Kubernetes cluster usually contains more than one machine specification. Depending on the scenario and requirements, machine configurations and capacities can differ greatly, so percentage-based reasoning about cluster scaling can be very misleading.
Suppose the cluster contains two machine specifications, 4C8G and 16C32G. A 10% resource reservation means completely different absolute amounts on the two. Scale-down is especially tricky: to keep the cluster from thrashing, nodes are removed one at a time, so deciding whether a node is a scale-down candidate based on a utilization percentage really matters. If a large machine is removed just because its utilization percentage is low, the containers rescheduled from it may end up competing for resources and starving. If you instead add a rule that prefers to remove small nodes first, scale-down may leave a lot of redundant capacity, and eventually only the "boulder" nodes remain in the cluster.
3. Resource utilization dilemma
Can the cluster's resource utilization really represent its current state? A Pod with very low utilization does not mean that others can encroach on the resources it has requested. In most production clusters, resource utilization is not kept at a very high level, yet the scheduling water level of resources should be kept relatively high; only then is the cluster both stably available and not overly wasteful.
What does it mean if Request and Limit are not set and the cluster's overall resource utilization is high? It means all Pods are effectively scheduled by their real load and compete fiercely with one another, and simply adding nodes does not help at all, because once a Pod is scheduled there is no way to move it off a high-load node other than manual rescheduling or eviction. What does it mean if Request and Limit are set and node resource utilization is very high? Unfortunately, this rarely happens in practice, because different applications and workloads have different utilization at different times; most likely, Pods become unschedulable before the cluster ever reaches the threshold you set.
After understanding the three major problems of Kubernetes auto scaling, let's take a look at the solution of Kubernetes.
Kubernetes's elastic design philosophy
Kubernetes's design splits elastic scaling into scheduling-layer scaling and resource-layer scaling. The scheduling layer scales the scheduling units horizontally according to metrics and thresholds, while the resource layer is responsible for satisfying the resource needs of those scheduling units.
At the scheduling layer, Pods are usually scaled horizontally through HPA. Using HPA is very close to the traditional understanding of elastic scaling: you set a metric and a threshold, and replicas are scaled horizontally against them.
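As a minimal sketch of that metric-and-threshold model (names and values are illustrative, not taken from the talk), an HPA targeting average CPU utilization might look like this:

```yaml
# Illustrative only: keep a hypothetical Deployment "demo-app" between
# 2 and 10 replicas, targeting 80% average CPU utilization
# (utilization is measured against the containers' requests).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```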
At the resource layer, the mainstream solution today is to scale nodes horizontally with cluster-autoscaler. When Pods cannot be scheduled due to insufficient resources, cluster-autoscaler tries to pick, from the configured scaling groups, one that can satisfy the scheduling requirement and automatically adds an instance to it. Once the instance starts and registers with Kubernetes, kube-scheduler re-triggers scheduling and places the previously unschedulable Pods onto the new node, completing the full scale-up path.
Scale-down works the same way in reverse: the scheduling layer compares resource utilization against the configured threshold and reduces the number of Pods. When the scheduled (requested) resources on a node drop below the resource layer's scale-down threshold, cluster-autoscaler drains that low-utilization node and then removes it, completing the full scale-down path.
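The resource-layer thresholds mentioned here are configured on cluster-autoscaler itself. A hedged sketch of the relevant flags, shown as container args in its Deployment (defaults and the exact image vary by version and platform):

```yaml
# Sketch of cluster-autoscaler's scale-down knobs; the utilization
# threshold is computed from Pod requests, not from real usage.
spec:
  containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0  # illustrative version
    command:
    - ./cluster-autoscaler
    - --scale-down-enabled=true
    - --scale-down-utilization-threshold=0.5   # node becomes a candidate below 50% requested
    - --scale-down-unneeded-time=10m           # must stay below the threshold this long before draining
```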
The Achilles' heel of Kubernetes auto scaling
A classic Kubernetes auto scaling case
This diagram shows a very classic auto scaling example that covers most online business scenarios. The application starts as a Deployment with two Pods underneath, and its access layer is exposed through an Ingress Controller. The scaling policy is: when the QPS of a single Pod reaches 100, scale out, with a minimum of 2 Pods and a maximum of 10 Pods.
The HPA controller keeps polling alibaba-cloud-metrics-adapter to obtain the QPS metric for the current Ingress gateway route. When the traffic through the Ingress gateway reaches the QPS threshold, the HPA controller changes the number of Pods in the Deployment; when the Pods' requested capacity exceeds what the cluster can hold, cluster-autoscaler selects a suitable scaling group and pops up a node to host the previously unschedulable Pods.
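Expressed in the same HPA shape as before, but driven by an external metric from alibaba-cloud-metrics-adapter, the policy above (100 QPS per Pod, 2 to 10 replicas) might look roughly like this. The metric name and selector labels are placeholders, not confirmed by the talk; check the adapter's documentation for the exact names.

```yaml
# Sketch only: the external metric name and the route selector below are
# hypothetical; they must match what alibaba-cloud-metrics-adapter exposes.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-qps-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: sls_ingress_qps                         # placeholder metric name
        selector:
          matchLabels:
            sls.ingress.route: "default-demo-app-80"  # placeholder route label
      target:
        type: AverageValue
        averageValue: "100"                           # scale out above 100 QPS per Pod
```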
That is the classic auto scaling case. So what problems come up in actual development?
Drawbacks of classic Kubernetes auto scaling, and how to address them
The first is scale-up latency. In the community's standard mode, nodes are added by creating ECS instances and removed by releasing them, which takes roughly 2 to 2.5 minutes. Alibaba Cloud's proprietary extreme-speed mode instead works by creating instances once and then stopping and starting them; while an instance is stopped, only storage is billed, not compute. This improves elastic efficiency by more than 50% at a very low price.
In addition, complexity is a problem cluster-autoscaler cannot avoid. To use it well you need a deep understanding of its internal mechanisms; otherwise it is very easy to end up in situations where nodes cannot be popped up or cannot be scaled down.
For most developers, cluster-autoscaler is a black box, and checking its logs is still the main way to troubleshoot it. Once cluster-autoscaler misbehaves, or fails to scale as expected because of a configuration error, more than 80% of developers find it hard to fix the problem on their own.
The Alibaba Cloud Container Service team has developed a kubectl plugin that provides deeper observability into cluster-autoscaler: it shows the current scaling phase of cluster-autoscaler and can automatically correct auto scaling errors.
Although none of the problems above is the last straw that breaks the camel's back, we kept asking whether there is another way to make auto scaling simpler and more efficient to use.
Martin boots for Achilles: Serverless Autoscaling
The core problems of resource-layer scaling are its high learning cost, difficult troubleshooting, and poor timeliness. Looking back at Serverless, these happen to be exactly its characteristics and strengths. So is there a way to make Serverless the elastic solution for the Kubernetes resource layer?
The Serverless autoscaling component: virtual-kubelet-autoscaler
The Alibaba Cloud Container Service team developed virtual-kubelet-autoscaler, a component that implements Serverless autoscaling in Kubernetes.
When a Pod cannot be scheduled, virtual-kubelet takes on the real load; it can be thought of as a virtual node with unlimited capacity. When a Pod is scheduled onto virtual-kubelet, it is started as a lightweight ECI instance. ECI currently starts in under 30 seconds, so a program goes from scheduling to running within one minute.
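For reference, here is a hedged sketch of how a Pod is typically steered onto a virtual-kubelet node. The node label and taint below follow the common virtual-kubelet convention and are assumptions here; the exact values depend on the provider setup.

```yaml
# Sketch only: label and taint values are assumptions following the usual
# virtual-kubelet convention; adjust them to the actual virtual node.
apiVersion: v1
kind: Pod
metadata:
  name: burst-task              # hypothetical name
spec:
  nodeSelector:
    type: virtual-kubelet       # assumes the virtual node carries this label
  tolerations:
  - key: virtual-kubelet.io/provider
    operator: Exists
    effect: NoSchedule          # tolerate the virtual node's taint
  containers:
  - name: worker
    image: busybox:1.36
    command: ["sh", "-c", "echo running on ECI; sleep 3600"]
```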
Like cluster-autoscaler, virtual-kubelet-autoscaler also uses simulated scheduling to decide whether a Pod can actually be carried. Compared with cluster-autoscaler, however, there are the following differences:
virtual-kubelet-autoscaler simulates scheduling against the Pod Template rather than the Node Template, and it adds scheduling policies on top; its core job is to choose a virtual-kubelet to carry the load. Once a Pod is bound to virtual-kubelet through successful simulated scheduling, its lifecycle management and troubleshooting are no different from a normal Pod's; it is no longer a black-box troubleshooting problem.
virtual-kubelet-autoscaler is not a "silver bullet"
virtual-kubelet-autoscaler is not meant to replace cluster-autoscaler. It is simple to use, highly elastic, handles high concurrency, and is billed on demand, but it trades away some compatibility: support for mechanisms such as cluster-pi and coredns is not yet complete, and a small amount of configuration is needed for virtual-kubelet-autoscaler to work alongside cluster-autoscaler. It is particularly well suited to big data offline jobs, CI/CD tasks, sudden online load spikes, and similar workloads.
Finally
Serverless autoscaling has gradually become an important part of Kubernetes auto scaling. Once its compatibility is largely complete, the Serverless characteristics of simplicity, zero operations, and cost savings will complement Kubernetes perfectly and take Kubernetes auto scaling to a new level.