
Scaling a Kubernetes cluster horizontally: HPA (auto scaling)

2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

A Kubernetes cluster can scale a service out or in by changing its number of pod replicas, using the scale mechanism of a Replication Controller.

Scaling in a Kubernetes cluster can be triggered in two ways:

scale (manual scaling): see "Basic management of K8s resource objects using the command line (upgrade, rollback, expansion, reduction)";
autoscale (automatic scaling): the HPA introduced in this blog post.

Automatic scaling in Kubernetes takes two main forms:

Horizontal scaling: increasing or decreasing the number of instances;
Vertical scaling: increasing or decreasing the resources available to a single instance, such as CPU and memory.

HPA stands for Horizontal Pod Autoscaling. It dynamically grows and shrinks the number of replicas according to the current resource utilization of the pods, which relieves the pressure on each pod. CPU and memory are the resource metrics it can use directly; other signals, such as disk, have to be supplied as custom metrics. When the pod load reaches the configured threshold, new pods are created to share the load according to the scaling policy. When the pods are relatively idle, the number of replicas is reduced automatically after the idle state has held for a stable period.
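The scaling decision described above can be sketched in a few lines of shell. The formula, desired = ceil(current replicas × current utilization / target utilization), is the one given in the Kubernetes documentation; the function name desired_replicas and its argument order are my own for illustration, and the sketch leaves out details of the real controller such as its tolerance band and the handling of pods that are not ready.

```shell
#!/usr/bin/env bash
# Minimal sketch of the HPA replica calculation, using integer percentages.
# desired_replicas is a hypothetical helper, not part of kubectl.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_UTIL% TARGET_UTIL% MIN MAX
desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  # ceil(current * util / target) with integer arithmetic
  local desired=$(( (current * util + target - 1) / target ))
  # Clamp the result to the configured minimum and maximum
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

desired_replicas 1 416 50 1 10    # one pod at 416% against a 50% target -> 9
desired_replicas 10 416 50 1 10   # already at the maximum, stays capped -> 10
desired_replicas 10 0 50 1 10     # idle pods shrink back to the minimum -> 1
```

With a 50% target, a single pod running at 416% already asks for 9 replicas, and any further growth stops at the configured maximum of 10.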

Automatic scaling also needs a service that collects resource utilization statistics; the same service is what makes the kubectl top command work. Historically this was heapster, which has since been deprecated in favor of metrics-server, and the same function is covered by the metrics service (MetricsServer) of the Prometheus stack. For convenience, I deploy HPA here on an environment that already runs the Prometheus service.

To run the Prometheus service, you can refer to the third part, on deploying Prometheus, of the post "Kubernetes' three visual UIs". If you don't want to deploy Prometheus, you can deploy the heapster service separately by following its GitHub repository.

In short, before using HPA, make sure the following command works on the master node:

[root@master ~]# kubectl top node     # View node resource usage
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   1317m        65%    1383Mi          80%
node01   1237m        61%    1082Mi          62%
node02   1146m        57%    1045Mi          60%

2. Implement pod automatic expansion and reduction

1) Generate HPA controller

[root@master ~]# kubectl run php-apache --image=mirrorgooglecontainers/hpa-example --requests=cpu=200m --expose --port=80
# Run a deployment named php-apache that requests 200m of CPU and exposes port 80.
# Note: this relies on an older kubectl; from kubectl 1.18 on, "kubectl run" creates a bare pod instead of a deployment.
[root@master ~]# kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
# Scale out when the average CPU utilization of the php-apache deployment exceeds 50% of the requested CPU, keeping between 1 and 10 pods.
[root@master ~]# kubectl get svc | grep php-apache     # View the cluster IP of the php-apache svc
php-apache   ClusterIP   10.97.45.108   <none>   80/TCP   44m
[root@master ~]# kubectl get pod | grep php-apa     # Make sure the current pod is working properly
php-apache-867f97c8cb-9mpd6   1/1   Running   0   44m

2) Simulate consumption of php-apache resources and verify whether pod automatically expands and shrinks

Open several terminals (the worker nodes can be used too) and send requests to the php-apache pod in an endless loop, as follows. If your system has enough resources, you can open more terminals; here I opened a terminal on each of two nodes and requested php-apache from both at the same time:

[root@node01 ~]# while true; do wget -q -O- 10.97.45.108; done     # Repeated "OK!" output is normal
# This simulates concurrent requests from multiple users to the php-apache pod.

[root@master ~]# kubectl get hpa     # View the CPU utilization seen by the HPA
# You can also add the "-w" option to watch the CPU utilization in real time.
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   416%/50%   1         10        10         56m

[root@master ~]# kubectl get pod
# After the endless loop has run for a while, check the number of pods; the -w option watches pod changes in real time.
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-867f97c8cb-6jsjq   1/1     Running   0          4m9s
php-apache-867f97c8cb-7xd5x   1/1     Running   0          51s
php-apache-867f97c8cb-9mpd6   1/1     Running   0          56m
php-apache-867f97c8cb-dhng7   1/1     Running   0          3m8s
php-apache-867f97c8cb-qc9hr   1/1     Running   0          2m22s
php-apache-867f97c8cb-rj494   1/1     Running   0          3m38s
php-apache-867f97c8cb-sbn9n   1/1     Running   0          3m38s
php-apache-867f97c8cb-vzfbg   1/1     Running   0          4m9s
php-apache-867f97c8cb-vzfbg   1/1     Running   0          5m19s
php-apache-867f97c8cb-vzfbg   1/1     Running   0          3m39s
# The number of pods tops out at 10, because that is the maximum set earlier with --max=10.

When the endless loop is stopped, the number of pods does not drop immediately. It shrinks only after a waiting period (the scale-down stabilization window, 5 minutes by default), so that pods are not removed just before traffic surges again.
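For readers who prefer a manifest to the kubectl autoscale command, the same settings, together with the scale-down delay, can be written declaratively. This is a minimal sketch that assumes a cluster new enough to serve the autoscaling/v2 API; the file name php-apache-hpa.yaml is my own choice, and the 300-second window simply spells out the default rather than a value taken from this environment.

```shell
#!/usr/bin/env bash
# Write an HPA manifest equivalent to:
#   kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
# The file name is hypothetical; the script only writes the file, it does not apply it.
manifest=php-apache-hpa.yaml

cat > "$manifest" <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      # Wait this long after load drops before removing pods (300s is the default)
      stabilizationWindowSeconds: 300
EOF

echo "wrote $manifest; apply it with: kubectl apply -f $manifest"
```

Lowering stabilizationWindowSeconds makes the pods disappear sooner after the load stops, at the cost of more churn if traffic comes back quickly.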

At this point, HPA has automatically scaled the number of pod replicas out and back in.
