
Analyzing the extended concept and component layout of Kubernetes auto-scaling


Many readers are unsure how to analyze the extended concept and component layout of Kubernetes auto-scaling. This article summarizes the problem and the ways to approach it; I hope it helps you work through it.

The dilemma of traditional auto-scaling

Auto-scaling is one of the highlights of Kubernetes. Before discussing the related components and implementation options, I would first like to widen the boundary and definition of auto-scaling. Traditionally, the main problem auto-scaling solves is the contradiction between capacity planning and actual load.

Picture the classic capacity diagram: a blue water-level line shows cluster capacity stepping up as load grows, while a red curve traces the cluster's actual load. The problem auto-scaling addresses is the window in which actual load surges faster than capacity planning can respond.

Conventional auto-scaling is threshold-based: a resource buffer level is set to guarantee headroom, and a reservation of roughly 15% to 30% of resources is a common choice. In other words, resource redundancy, in the form of a buffered resource pool, is traded for cluster availability.

On the surface this approach looks fine, and indeed many solutions and open-source components implement it this way. But when we think more deeply about the implementation, three classic problems emerge.

1. Percentage fragmentation problem

A Kubernetes cluster usually contains more than one type of machine, and machine configurations and capacities can vary widely across scenarios and requirements, so percentage-based scaling thresholds become very confusing. Suppose our cluster contains two specifications, 4C8G machines and 16C32G machines; a 10% resource reservation means something completely different on each.
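To make this concrete: 10% of a 4C8G node is 0.4 cores and 0.8 GB of memory, while 10% of a 16C32G node is 1.6 cores and 3.2 GB. The same percentage reserves four times the absolute headroom on the larger node.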

The scale-in scenario makes this worse. To keep the shrinking cluster out of a thrashing state, we remove capacity node by node or by bisection, so judging from a percentage whether a node is a scale-in candidate becomes critical. If a large machine is judged under-utilized and removed, the rescheduled containers are likely to cause contention and starvation on the remaining nodes. If instead we add a rule that prefers removing small-configuration nodes first, the cluster may be left with a lot of redundant capacity after scale-in, until eventually only the largest nodes remain.

2. Capacity planning bomb

Remember how capacity planning was done before containers? Machines were generally allocated per application: application A needs two 4C8G machines, application B needs four 8C16G machines, and A's machines and B's machines are independent and do not interfere with each other. In the container scenario, most developers no longer need to care about the underlying resources, so where does capacity planning live now?

In Kubernetes, it is expressed through Request and Limit: Request is the amount of a resource a container asks for, and Limit is the ceiling it may use. Since Request and Limit are the container-world equivalent of capacity planning, resource accounting is more accurate when based on Request and Limit. A per-node resource reservation threshold, by contrast, easily produces the scenario where the reserve on small nodes is too small to admit any scheduling, while the reserve on large nodes sits idle.
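As a minimal sketch of how these two values are declared (the names and numbers here are illustrative, not from the original article):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app            # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.25       # illustrative image
    resources:
      requests:             # what the scheduler reserves for this container
        cpu: "500m"
        memory: "512Mi"
      limits:               # the hard ceiling the container may consume
        cpu: "1"
        memory: "1Gi"
```

The scheduler places Pods by summing Requests against a node's allocatable capacity; Limits only cap consumption at runtime. This gap is exactly why utilization-based thresholds and Request-based accounting can disagree.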

3. Resource utilization dilemma

Can the cluster's resource utilization really represent its current state? When a Pod's resource utilization is very low, that does not mean its requested resources may be encroached upon. In most production clusters, resource utilization is not kept at a very high water level, yet in scheduling terms the scheduling water level, that is, the sum of Requests, should be kept relatively high. Only then is the cluster stably available without wasting too many resources.

What does it mean if Request and Limit are not set and the cluster's overall utilization is high? It means every Pod is scheduled according to its real load, Pods compete severely with one another, and simply adding nodes cannot solve the problem, because apart from manual rescheduling and eviction there is no way to move an already-scheduled Pod off a high-load node. And what does it mean if Request and Limit are set and node utilization is very high? Unfortunately, that is unattainable in most scenarios, because different applications and workloads consume their requests differently at different times; most likely, Pods become unschedulable before the cluster ever reaches the configured threshold.

Extending the concept of auto-scaling

Given these three known problems with utilization-based scaling, is there a way out? As application types diversify, the resource requirements of different applications diverge further, and the concept and meaning of auto-scaling change with them. Traditionally, auto-scaling resolved the contradiction between capacity planning and online load; today it is a trade-off between resource cost and availability. Grouped by their requirements, common applications fall into four types:

1. Online task type

Typical Internet business applications such as websites, API services, and microservices, characterized by relatively high consumption of conventional resources: CPU, memory, network I/O, disk I/O, and so on.

2. Offline task type

For example, offline big-data computing, edge computing, and so on. These applications are characterized by lower reliability requirements and no hard timeliness requirements; the main concern is how to reduce cost.

3. Scheduled task type

Running batch computing tasks on a schedule is the common form of this type; cost savings and scheduling capability are the key concerns.

4. Special task type

For example, idle-time computing scenarios, IoT services, grid computing, supercomputing, and so on, all of which place high demands on resource utilization.

Utilization-based auto-scaling mostly serves the first type of application and is a poor fit for the other three. So how does Kubernetes solve this problem?

The auto-scaling layout of Kubernetes

Kubernetes abstracts the essence of auto-scaling. Setting implementation aside, how can the scaling of different applications be unified under one model? Kubernetes's design splits auto-scaling into two layers: keeping each application's load within its capacity plan, and keeping the size of the resource pool sufficient for the overall capacity plan. Put simply, when scaling is needed, what changes first is the capacity plan of the workload; when the cluster's resource pool can no longer satisfy that plan, the pool's water level is adjusted to preserve availability. A single Pod cannot realize this combination, so developers use components such as HPA and VPA to manage workload capacity planning, achieving real-time elasticity while the cluster's resource water level is low, and use Cluster-Autoscaler to adjust the cluster's resource water level and reschedule when resources run short, providing scaling compensation. The two layers are decoupled yet combined, delivering elasticity end to end.

The Kubernetes ecosystem provides components at multiple dimensions and levels for different scaling scenarios. Reading Kubernetes auto-scaling along two axes, the scaled object and the scaling direction, yields the following scaling matrix.

Cluster-autoscaler: the Kubernetes community's component for horizontal node scaling, currently in the GA phase (General Availability, i.e. an officially released, stable version).

HPA: the Kubernetes community's component for horizontal Pod scaling, and the oldest of all the scaling components. It currently serves autoscaling/v1, autoscaling/v2beta1, and autoscaling/v2beta2: autoscaling/v1 supports only CPU as a scaling metric, autoscaling/v2beta1 adds custom metrics, and autoscaling/v2beta2 adds external metrics. A minimal manifest sketch follows this component rundown.

Cluster-proportional-autoscaler: a component that horizontally adjusts the number of Pods according to the number of nodes in the cluster, currently in the GA stage.

Vertical-pod-autoscaler: a component that dynamically adjusts a workload's Request values based on Pod resource utilization, historical data, and anomaly events. It mainly targets resource scaling for stateful services and single-instance applications, and is currently in the beta phase.

Addon-resizer: a component that vertically adjusts a workload's Request according to the number of nodes in the cluster, currently in the beta phase.

Of these five components, cluster-autoscaler, HPA, and cluster-proportional-autoscaler are currently the most stable; developers with matching requirements are advised to choose among them.
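As a minimal sketch of the HPA described above (the names and numbers are illustrative; autoscaling/v2beta2 is the newest version named in this article, and later clusters serve the same shape as autoscaling/v2):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-hpa            # hypothetical name
spec:
  scaleTargetRef:               # the workload whose replica count HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app              # hypothetical target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # keep average CPU near 70% of each Pod's Request
```

Note that the utilization target is measured against each Pod's Request, which again ties scaling decisions back to capacity planning rather than raw machine load.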

For more than 80% of scenarios, we recommend managing cluster auto-scaling with HPA combined with cluster-autoscaler: HPA handles capacity planning and management for the workload, while cluster-autoscaler handles expanding and shrinking the resource pool.
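For the cluster-autoscaler side of that pairing, a sketch of the commonly used bounds (flags vary by cloud provider and version; the provider, node group name, and values here are assumptions for illustration):

```yaml
# Fragment of a cluster-autoscaler container spec
command:
- ./cluster-autoscaler
- --cloud-provider=aws                    # assumption: AWS; substitute your provider
- --nodes=2:10:my-node-group              # min:max bounds for a hypothetical node group
- --scale-down-utilization-threshold=0.5  # a node becomes a scale-down candidate below 50%
```

Notably, cluster-autoscaler computes that utilization from the sum of Pod Requests on a node, not from real load, consistent with the Request-based accounting argued for earlier.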

After reading the above, do you have a clearer grasp of how to analyze the extended concept and component layout of Kubernetes auto-scaling? Thank you for reading!
