How to use HPA to realize the flexible scaling of business in TKE 07/12 Update SLTechnology News&Howtos


2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to use HPA in TKE to scale workloads automatically, and walks through a complete example.

Overview: Using HPA to Auto-scale Workloads on TKE

The Kubernetes Horizontal Pod Autoscaler (hereinafter HPA) automatically increases or decreases the number of Pod replicas based on CPU utilization, memory utilization, and other custom metrics, so that the workload's overall metric level matches the target value set by the user. This document introduces the HPA feature of Tencent Kubernetes Engine (TKE) and uses it to scale Pods horizontally.

Application scenarios

HPA's automatic scaling gives the container service a flexible, adaptive capacity. Within user-defined limits, it can quickly add Pod replicas to absorb a sudden surge in load, and when load drops it can scale back in to free computing resources for other services. The whole process is automated and needs no manual intervention, which makes it well suited to businesses with large traffic fluctuations, many services, and frequent scaling, such as e-commerce, online education, and financial services.

How it works

Horizontal Pod autoscaling is implemented by a Kubernetes API resource and a controller. The scaling workflow involves the following components:

Tip: This feature is currently in beta. Horizontal Pod autoscaling does not apply to objects that cannot be scaled, such as DaemonSets.

HPA Controller: the control component that implements the HPA scale-out and scale-in logic.

Metrics Aggregator: the metrics aggregation layer. Typically, the controller fetches metrics from a set of aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io). The metrics.k8s.io API is usually served by Metrics Server, whose community version provides basic CPU and memory metrics. Compared with the community version, TKE uses a custom Metrics Server that collects a wider range of metric types for triggering HPA, including CPU, memory, disk, network, and GPU metrics. For details, see the TKE auto-scaling metrics description.

Tip: The controller can also fetch metrics directly from Heapster. However, fetching metrics from Heapster has been deprecated since Kubernetes 1.11.
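To make the metrics.k8s.io data more concrete, here is a minimal sketch that sums the CPU and memory usage of one Pod from a PodMetrics object, such as the one returned by `kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/<pod>`. It assumes the community metrics.k8s.io/v1beta1 response shape; the quantity parser covers only the suffixes needed here, and the sample values are made up.

```python
# Minimal sketch: summing container usage from a metrics.k8s.io PodMetrics object.
# Assumption: the dict mirrors the metrics.k8s.io/v1beta1 PodMetrics shape;
# the sample values below are hypothetical.
from decimal import Decimal

# A subset of Kubernetes quantity suffixes, enough for CPU and memory usage.
_SUFFIXES = {
    "n": Decimal("1e-9"),
    "u": Decimal("1e-6"),
    "m": Decimal("1e-3"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string such as '250m' or '64Mi' to a plain number."""
    # Try the two-character binary suffixes before the one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def pod_usage(pod_metrics: dict) -> tuple[Decimal, Decimal]:
    """Return (CPU cores, memory bytes) summed over all containers of one Pod."""
    cpu = Decimal(0)
    mem = Decimal(0)
    for container in pod_metrics["containers"]:
        cpu += parse_quantity(container["usage"]["cpu"])
        mem += parse_quantity(container["usage"]["memory"])
    return cpu, mem


sample = {
    "kind": "PodMetrics",
    "apiVersion": "metrics.k8s.io/v1beta1",
    "containers": [
        {"name": "app", "usage": {"cpu": "250m", "memory": "64Mi"}},
        {"name": "sidecar", "usage": {"cpu": "5000000n", "memory": "2048Ki"}},
    ],
}
cpu, mem = pod_usage(sample)
print(cpu, mem)
```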

HPA algorithm for calculating the target replica count: see the working principle of the TKE HPA scaling algorithm. For further detail, see the algorithm details.
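As a rough illustration, the upstream Kubernetes documentation gives the core formula as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and skips scaling when the ratio is within a tolerance of 0.1. The sketch below implements only that formula for a single metric; TKE's algorithm may apply additional rules, so treat it as an approximation.

```python
# Sketch of the upstream Kubernetes HPA replica calculation for one metric.
# Assumptions: all Pods are ready and reporting, and the default upstream
# tolerance of 0.1 applies; TKE's implementation may add further rules.
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Return the replica count the HPA formula recommends for one metric."""
    ratio = current_value / target_value
    # Inside the tolerance band, keep the current count to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    return math.ceil(current_replicas * ratio)


print(desired_replicas(4, 200, 100))  # → 8 (metric at twice the target)
print(desired_replicas(2, 105, 100))  # → 2 (within tolerance, no change)
print(desired_replicas(3, 50, 100))   # → 2 (ceil(1.5))
```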

Prerequisites

You have registered a Tencent Cloud account.

You have logged in to the Tencent Cloud CCS console.

You have created a TKE cluster. For details, see Creating a cluster.

Steps

Step 1: Deploy the test workload

The workload created in this example is shown in the following figure:
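The article shows the workload only in a figure. As a sketch built on assumptions, the script below prints a Deployment and a Service named hpa-test in the default namespace (the name matches the service DNS name used in Step 3). The nginx image, port, label, and resource requests are placeholders, not values taken from the original example.

```python
# Sketch of a test workload resembling the one used in this example.
# Assumptions: the nginx image, port 80, the "app: hpa-test" label, and the
# resource requests are placeholders; only the name/namespace come from the article.
import json

NAME = "hpa-test"  # matches hpa-test.default.svc.cluster.local used in Step 3
LABELS = {"app": NAME}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": NAME, "namespace": "default"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {
                "containers": [{
                    "name": "nginx",
                    "image": "nginx",
                    "ports": [{"containerPort": 80}],
                    # Requests matter to HPA only for utilization-based targets.
                    "resources": {"requests": {"cpu": "100m", "memory": "64Mi"}},
                }],
            },
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": NAME, "namespace": "default"},
    "spec": {
        "selector": LABELS,
        "ports": [{"port": 80, "targetPort": 80}],
    },
}

# A v1 List lets a single `kubectl apply -f -` create both objects.
manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

It could be applied with something like `python3 test_workload.py | kubectl apply -f -`, where the filename is hypothetical.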

Step 2: Configure HPA

In the TKE console, bind an HPA configuration to the test workload. For details on binding and configuring HPA, see HPA procedure. This example configures a policy that triggers scale-out when the network egress bandwidth reaches 0.15 Mbps (150 Kbps).
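For readers who prefer a manifest to the console, the following sketch prints an autoscaling/v2 HorizontalPodAutoscaler with roughly the same shape as the policy above. The metric name and its Kbps unit are hypothetical placeholders: the article does not show which metric identifier the TKE console writes, and a Pods-type custom metric only works if a metrics adapter actually serves it. The maxReplicas value is also an assumption.

```python
# Sketch of an autoscaling/v2 HPA shaped like the console policy in this step.
# HYPOTHETICAL: the metric name and its unit (Kbps) are placeholders; the
# identifiers TKE actually uses are not shown in the article.
import json

EGRESS_TARGET_KBPS = 150  # 0.15 Mbps, the target used in this example

hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "hpa-test", "namespace": "default"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "hpa-test"},
        "minReplicas": 1,
        "maxReplicas": 10,  # assumption; the article does not state the upper bound
        "metrics": [{
            "type": "Pods",  # per-Pod custom metric, averaged across Pods
            "pods": {
                "metric": {"name": "hypothetical_pod_network_egress_kbps"},
                "target": {"type": "AverageValue", "averageValue": str(EGRESS_TARGET_KBPS)},
            },
        }],
    },
}

if __name__ == "__main__":
    print(json.dumps(hpa, indent=2))
```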

Step 3: Verify the function

Start a temporary Pod in the cluster to act as a simulated client for testing the HPA configuration:

kubectl run -it --image alpine hpa-test --restart=Never --rm -- /bin/sh

In the temporary Pod, run the following command to send a large number of requests to the "hpa-test" service in a short time and drive up its egress bandwidth:

# hpa-test.default.svc.cluster.local is the in-cluster DNS name of the service. Press Ctrl+C to stop the loop.
while true; do wget -q -O- hpa-test.default.svc.cluster.local; done

After the request loop starts in the test Pod, the workload's Pod-count monitoring in the following figure shows that the workload scaled out to 2 replicas at 16:21, which indicates that an HPA scale-out event was triggered.

The workload's network egress bandwidth monitoring in the following figure shows that egress bandwidth rose to about 199 Kbps at 16:21, exceeding the HPA target. This further confirms that the HPA scaling algorithm added one replica to bring the metric back toward the target, so the workload now has two replicas.
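Plugging the observed numbers into the upstream Kubernetes formula gives the same result, assuming one replica before scale-out and treating 199 Kbps as the per-Pod average. Because 199/150 ≈ 1.33 lies well outside the default 0.1 tolerance, a scale-out is expected:

```latex
\text{desiredReplicas}
= \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
= \left\lceil 1 \times \frac{199\ \text{Kbps}}{150\ \text{Kbps}} \right\rceil
= \lceil 1.33 \rceil = 2
```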

Note: The HPA scaling algorithm does not rely on the formula alone; it also weighs several other factors when deciding whether to scale out or in (see the algorithm details). The actual behavior may therefore deviate slightly from the formula.
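One of those extra factors in upstream Kubernetes: when an HPA has several metrics, each metric proposes a replica count, the largest proposal wins, and the result is clamped to minReplicas/maxReplicas. The sketch below shows only that combination step under simplifying assumptions (all Pods ready and reporting); it is not TKE's implementation.

```python
# Sketch: combining several metrics the way upstream Kubernetes HPA does.
# Assumptions: every Pod is ready and reporting; upstream's handling of
# missing metrics and not-yet-ready Pods is omitted.
import math

TOLERANCE = 0.1  # upstream default


def proposal(current_replicas, current_value, target_value):
    """Replica count proposed by a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def recommend(current_replicas, metrics, min_replicas, max_replicas):
    """metrics: list of (current_value, target_value) pairs, one per metric."""
    # Each metric proposes a replica count; the largest proposal wins.
    best = max(proposal(current_replicas, cur, tgt) for cur, tgt in metrics)
    # The winner is clamped to the HPA's configured bounds.
    return max(min_replicas, min(best, max_replicas))


# Egress bandwidth asks for 2 replicas; a second, hypothetical metric asks for 1.
print(recommend(1, [(199, 150), (40, 80)], min_replicas=1, max_replicas=10))  # → 2
```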

Next, simulate scale-in by manually stopping the request loop at around 16:24. The monitoring below shows the network egress bandwidth falling back to its pre-scale-out level. By HPA's logic, the conditions for scaling in the workload are now met.

However, the workload's Pod-count monitoring in the figure below shows that HPA scale-in was triggered only at 16:30. This is because HPA applies a default 5-minute tolerance window before scaling in, which prevents frequent scaling caused by short-term metric fluctuations. For details, see cooldown/delay support. As the figure shows, about 5 minutes after the command was stopped, HPA reduced the workload to its original replica count.
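The delay can be modeled with the upstream scale-down stabilization window, which defaults to 300 seconds: the controller scales down only to the highest replica count recommended during the window, while scale-up takes effect at once. The sketch assumes the upstream default 15-second evaluation period and simulates load stopping at t = 0; it is a simplified model, and TKE's cooldown behavior may differ in detail.

```python
# Sketch of the upstream scale-down stabilization window (default 300 s).
# Assumptions: one raw recommendation per 15 s evaluation; TKE's exact
# cooldown behavior may differ from this upstream model.
from collections import deque


class ScaleDownStabilizer:
    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self.history = deque()  # (timestamp, recommended_replicas)

    def stabilize(self, now: float, recommendation: int) -> int:
        """Record a raw recommendation and return the stabilized replica count."""
        self.history.append((now, recommendation))
        # Drop recommendations that have fallen out of the window.
        while self.history and self.history[0][0] < now - self.window:
            self.history.popleft()
        # Highest value in the window: scale-up is immediate, scale-down waits.
        return max(r for _, r in self.history)


stabilizer = ScaleDownStabilizer()
first_scale_in = None
for t in range(-60, 601, 15):  # evaluate every 15 s; load stops at t = 0
    raw = 2 if t <= 0 else 1
    if stabilizer.stabilize(t, raw) == 1 and first_scale_in is None:
        first_scale_in = t
print(first_scale_in)  # → 315
```

In this model the first scale-in lands a little over five minutes after the load stops, which is in line with the gap between roughly 16:24 and the scale-in observed in this example.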

When an HPA scaling event occurs in TKE, it is shown in the event list of the corresponding HPA instance, as in the figure below. Note that the event list has two time columns: "First occurrence time" is when an event first occurred, and "Last occurrence time" is the most recent time the same event occurred. From the "Last occurrence time" column in the figure below, the scale-out event in this example occurred at 16:21:03 and the scale-in event at 16:29:42, which matches the times seen in the workload monitoring.

In addition, the workload's event list records the number of replicas added or removed whenever HPA scales it. As the figure below shows, the workload scaling times match those in the HPA event list: replicas were added at 16:21:03 and removed at 16:29:42.
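The "first occurrence" and "last occurrence" columns come from collapsing repeated identical events into one record, similar to the firstTimestamp, lastTimestamp, and count fields of Kubernetes core/v1 Event objects. The sketch below illustrates that aggregation with made-up events; the earlier 16:10:00 scale-out is hypothetical and exists only to show a repeat.

```python
# Sketch: collapsing repeated events into first/last occurrence times.
# Modeled loosely on the firstTimestamp / lastTimestamp / count fields of
# core/v1 Event objects; the sample events are illustrative only.
from dataclasses import dataclass


@dataclass
class EventRecord:
    reason: str
    message: str
    first_occurrence: str
    last_occurrence: str
    count: int = 1


def aggregate(raw_events):
    """raw_events: iterable of (reason, message, 'HH:MM:SS') in time order."""
    records = {}
    for reason, message, ts in raw_events:
        key = (reason, message)
        if key in records:
            # Same event again: keep the first time, advance the last time.
            records[key].last_occurrence = ts
            records[key].count += 1
        else:
            records[key] = EventRecord(reason, message, ts, ts)
    return list(records.values())


events = [
    ("SuccessfulRescale", "New size: 2; reason: above target", "16:10:00"),  # hypothetical
    ("SuccessfulRescale", "New size: 2; reason: above target", "16:21:03"),
    ("SuccessfulRescale", "New size: 1; reason: below target", "16:29:42"),
]
for record in aggregate(events):
    print(record)
```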

This example demonstrates the HPA feature of TKE, using TKE's network egress bandwidth metric as the workload's scaling metric. When the workload's actual metric value exceeds the target configured in HPA, HPA uses its scaling algorithm to calculate an appropriate replica count and scales out horizontally, keeping the workload's metrics within expectations so that it runs healthily and stably. When the actual metric falls well below the configured target, HPA waits out the tolerance window, calculates an appropriate replica count, and scales in, releasing idle resources and improving resource utilization. The whole process is recorded in the HPA and workload event lists, so every scaling action is traceable.
