
Principle and Analysis of HPA in K8s


Today I will talk about the principle and analysis of HPA in K8s, which many people may not understand well. To help you understand it better, I have summarized the following content; I hope you gain something from this article.

I. Introduction

HPA stands for Horizontal Pod Autoscaling. It dynamically scales the number of Pod replicas up or down according to current Pod resource utilization (such as CPU or memory), so as to reduce the pressure on each Pod.

When Pod load reaches a certain threshold, new Pods are created according to the scale-up policy to share the load; when Pods are relatively idle, after a stable idle period the number of Pod replicas is automatically reduced.

II. The Working Principle of HPA

In K8s, a metrics server (Heapster or a custom metrics server) continuously collects metric data for all Pod replicas.

The HPA controller obtains this data through the metrics server's API (Heapster's API or the aggregation API) and, based on user-defined scaling rules, calculates the target number of Pod replicas.

When the target replica count differs from the current one, the HPA controller issues a scale operation to the Pod's replica controller (Deployment, RC, or ReplicaSet), adjusting the number of replicas and completing the scaling operation.
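To make the loop concrete, here is a minimal sketch in Python. The metrics and scaler objects are hypothetical stand-ins for the Metrics Server API and the workload's scale subresource; none of these names come from a real client library.

```python
import math
import time

# Illustrative sketch of the HPA control loop. "metrics" and "scaler" are
# hypothetical stand-ins for the Metrics Server API and the workload's
# scale subresource, not real client-library types.
def hpa_loop(metrics, scaler, desired_metric, sync_period=15):
    while True:
        replicas = scaler.current_replicas()      # from Deployment/RC/RS
        current = metrics.average_across_pods()   # e.g. mean CPU across Pods
        target = math.ceil(replicas * current / desired_metric)
        if target != replicas:                    # only act on a difference
            scaler.scale_to(target)               # issue the scale operation
        time.sleep(sync_period)                   # controller sync period
```

The real controller also applies the tolerance and cooldown rules described later in this article before acting.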

III. Types of Metrics

The kube-controller-manager service on the Master continuously monitors performance metrics of the target Pods to determine whether the number of replicas needs to be adjusted. The metric types currently supported by K8s are as follows.

◎ Pod resource usage: a Pod-level performance metric, usually expressed as a ratio, such as CPU usage.

◎ Pod custom metrics: a Pod-level performance metric, usually a raw numeric value, such as the number of requests received.

◎ Object custom metrics or external custom metrics: usually a value that the container application must provide in some way, such as through the HTTP URL "/metrics", or metrics collected from a URL provided by an external service (see the sketch after this list).
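As an illustration of the last item, here is a minimal sketch of an application exposing a custom metric over the HTTP URL "/metrics", using only the Python standard library; the metric name http_requests_total and the Prometheus text format are assumptions for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUESTS = 0  # simplistic request counter for the sketch

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global REQUESTS
        REQUESTS += 1  # counts every GET, including scrapes of /metrics
        if self.path == "/metrics":
            body = f"http_requests_total {REQUESTS}\n".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), MetricsHandler).serve_forever()
```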

Starting from version 1.11, K8s deprecated the Heapster-based mechanism for collecting Pod CPU utilization and moved to data collection based on Metrics Server. Metrics Server exposes the collected Pod performance data to the HPA controller through aggregated APIs (Aggregated API) such as metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io. The concepts of the aggregation API and the API Aggregator will be explained in more detail later.
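Assuming Metrics Server is installed and a kubeconfig is available, a sketch like the following can query the metrics.k8s.io aggregated API directly (shelling out to kubectl rather than using a client library):

```python
import json
import subprocess

# List PodMetrics in the default namespace through the aggregated API.
raw = subprocess.check_output([
    "kubectl", "get", "--raw",
    "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods",
])
for item in json.loads(raw)["items"]:
    for container in item["containers"]:
        # "usage" holds quantities such as {"cpu": "2m", "memory": "14Mi"}
        print(item["metadata"]["name"], container["usage"])
```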

IV. Scale-Up and Scale-Down Strategy

Whether to scale up or down is determined by a scaling ratio.

From the metric values it obtains, HPA computes a scaling ratio: the ratio of the metric's current value to its desired value. If the ratio is greater than 1, it scales up; if it is less than 1, it scales down.

Tolerance

--horizontal-pod-autoscaler-tolerance: the tolerance.

It allows usage to fluctuate within a certain range without triggering scaling; the default is currently 0.1. This, too, is for the sake of system stability.

For example, if the HPA policy is set to trigger scale-up when CPU utilization exceeds 50%, scaling is actually triggered only when utilization is above 55% or below 45%. HPA tries to keep Pod utilization within this range.
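A small sketch of that dead zone, with utilizations written as fractions (0.50 = 50%):

```python
# Whether a given utilization falls outside the tolerance band around the
# target; with a 50% target and 0.1 tolerance the band is 45%-55%.
def should_scale(current_util: float, target_util: float,
                 tolerance: float = 0.1) -> bool:
    ratio = current_util / target_util
    return abs(ratio - 1.0) > tolerance

print(should_scale(0.52, 0.50))   # False: 52% is inside the 45%-55% band
print(should_scale(0.60, 0.50))   # True:  ratio 1.2 exceeds the tolerance
```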

The Algorithm

The formula used to compute the target number of Pods for each scale-up or scale-down is: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

For example:

If the current metric value is 200m and the desired value is 100m, the number of Pod replicas doubles, because 200.0 / 100.0 = 2.0.

If the current value is 50m, the number of Pod replicas is halved, because 50.0 / 100.0 = 0.5.

If the ratio is close to 1.0, for example 0.9 or 1.1 (that is, within the tolerance of 0.1), no scaling occurs (this depends on the built-in global parameter --horizontal-pod-autoscaler-tolerance, which defaults to 0.1). A sketch of this calculation follows.
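The sketch below implements the formula together with the tolerance check, reproducing the three examples above (a starting count of 4 replicas is assumed for illustration):

```python
import math

# desiredReplicas = ceil[currentReplicas * (currentMetric / desiredMetric)],
# short-circuited when the ratio is within the tolerance.
def desired_replicas(current_replicas: int, current_metric: float,
                     desired_metric: float, tolerance: float = 0.1) -> int:
    ratio = current_metric / desired_metric
    if abs(ratio - 1.0) <= tolerance:        # within tolerance: no scaling
        return current_replicas
    return math.ceil(current_replicas * ratio)

print(desired_replicas(4, 200.0, 100.0))     # 8: ratio 2.0 doubles replicas
print(desired_replicas(4, 50.0, 100.0))      # 2: ratio 0.5 halves replicas
print(desired_replicas(4, 110.0, 100.0))     # 4: ratio 1.1 is within tolerance
```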

In addition, several Pod edge cases are handled specially, as described below (a filtering sketch follows the notes).

◎ A Pod that is being deleted (its deletion timestamp is set) is not counted toward the target replica count.

◎ A Pod whose current metric value is unavailable is excluded from this pass of the calculation; subsequent passes include it again.

◎ If the metric type is CPU utilization, a Pod that is starting but has not yet reached the Ready state is temporarily excluded from the target replica count (see "Pod delay detection mechanism" below).

Note:

Each scale-up will at most double the current number of replicas.

If some Pod containers lack the required resource metrics, autoscaling will not act on those metrics.

If targetAverageValue or targetAverageUtilization is specified, currentMetricValue is computed as the mean of the metric across all target Pods. Before the tolerance is checked and the final value determined, Pod readiness and missing metrics are taken into account as described above.

All Pods with a deletion timestamp set (that is, Pods in the process of shutting down) and all failed Pods are discarded.
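The sketch below rolls these filtering rules into one function. The Pod dataclass is a simplified stand-in for the real API object, not a type from any client library:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Pod:
    ready: bool
    deleted: bool                  # deletion timestamp set
    failed: bool
    metric: Optional[float]        # None if the metric is unavailable

def usable_metrics(pods: List[Pod], cpu_based: bool = True) -> List[float]:
    values = []
    for p in pods:
        if p.deleted or p.failed:        # discarded outright
            continue
        if p.metric is None:             # skipped for this pass only
            continue
        if cpu_based and not p.ready:    # not yet Ready: excluded for now
            continue
        values.append(p.metric)
    return values
```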

Cooldown and delay mechanism

When HPA manages a set of replicas, dynamically changing metrics can cause the replica count to fluctuate frequently, a phenomenon called "thrashing".

Imagine a scenario:

When the CPU load on Pods is too high, system CPU usage may rise further while new Pods are being created. Therefore, after each scaling decision, no further decision is made for a period of time: 3 minutes for scale-up and 5 minutes for scale-down.

--horizontal-pod-autoscaler-downscale-delay: tells the autoscaler how long to wait after a scale-down before the next scale-down may occur. The default is 5 minutes.

--horizontal-pod-autoscaler-upscale-delay: tells the autoscaler how long to wait after a scale-up before the next scale-up may occur. The default is 3 minutes.

Note: be aware of the impact of adjusting these parameters. If they are set too long, HPA responds slowly to load changes; if too short, autoscaling "thrashes". A sketch of these cooldown windows follows.
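A sketch of the two cooldown windows; the class is illustrative, not part of any real client library:

```python
import time

class Cooldown:
    """Tracks the scale-up (3 min) and scale-down (5 min) wait windows."""

    def __init__(self, upscale_delay=180, downscale_delay=300):
        self.upscale_delay = upscale_delay       # seconds
        self.downscale_delay = downscale_delay   # seconds
        self.last_up = self.last_down = float("-inf")

    def may_scale(self, target: int, current: int) -> bool:
        now = time.monotonic()
        if target > current:                     # scale-up path
            return now - self.last_up >= self.upscale_delay
        if target < current:                     # scale-down path
            return now - self.last_down >= self.downscale_delay
        return False                             # nothing to do

    def record(self, target: int, current: int) -> None:
        now = time.monotonic()
        if target > current:
            self.last_up = now
        elif target < current:
            self.last_down = now
```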

Pod delay detection mechanism

If the metric type is CPU usage, a Pod that is starting but has not yet reached the Ready state is temporarily excluded from the target replica count.

You can set the delay for the first Ready check through the kube-controller-manager startup parameter --horizontal-pod-autoscaler-initial-readiness-delay; the default is 30s.

Another startup parameter, --horizontal-pod-autoscaler-cpu-initialization-period, sets the delay before a Pod's CPU utilization is first collected.

V. The Underlying Implementation of HPA's Custom Metrics (Based on Prometheus)

Kubernetes implements custom metrics through the Aggregator APIServer extension mechanism. The Custom Metrics APIServer is an API service that answers metric queries (an adapter for Prometheus). Once this service starts, Kubernetes exposes an API named custom.metrics.k8s.io. A request to this URL goes through the Custom Metrics APIServer to Prometheus, which runs the corresponding query; the result is then returned in a specific format. A sketch of such a query follows.
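Assuming a Prometheus adapter is already serving custom.metrics.k8s.io, the aggregated API can be queried much as before; the metric name http_requests_total is again an assumption for the example:

```python
import json
import subprocess

# Fetch a custom metric for every Pod in the default namespace through
# the custom.metrics.k8s.io aggregated API (via kubectl, not a client lib).
raw = subprocess.check_output([
    "kubectl", "get", "--raw",
    "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/"
    "pods/*/http_requests_total",
])
for item in json.loads(raw)["items"]:
    # Each item names the Pod it describes and carries the metric value.
    print(item["describedObject"]["name"], item["value"])
```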

After reading the above, do you have a better understanding of the principle and analysis of HPA in K8s? If you want to learn more, please follow the industry information channel. Thank you for your support.
