HPA realizes automatic expansion and reduction of the number of pod replicas in Kubernetes cluster 02/10 Update SLTechnology News&Howtos



A Kubernetes cluster can scale a service out or in through the scale mechanism of a Replication Controller, which makes the service elastic.

Scaling in a Kubernetes cluster falls into two categories:

Manual scaling (scale): see my earlier post on managing K8s resource objects from the command line (upgrade, rollback, scaling up, scaling down).
Automatic scaling (autoscale): the HPA introduced in this post.

Kubernetes autoscaling comes in two main forms:

Horizontal scaling: increasing or decreasing the number of instances.
Vertical scaling: increasing or decreasing the resources available to a single instance, such as CPU and memory.

I. Introduction to HPA

HPA stands for Horizontal Pod Autoscaling. It dynamically scales the number of pod replicas out and in according to the current resource utilization of the pods (such as CPU and memory), spreading the load so that no single pod is overloaded. When pod load rises above a configured threshold, the HPA creates new pods according to its scaling policy to share the load. When the pods stay relatively idle for a stable period, it automatically reduces the number of replicas.
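The scale-out and scale-in decision can be illustrated with the replica formula given in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. The Python sketch below is a simplified model, not the controller's actual code: the function name desired_replicas is hypothetical, it assumes the default 10% tolerance, and it ignores details such as unready pods and missing metrics.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation (hypothetical helper).

    Utilization values are percentages of the pods' CPU request,
    e.g. 416 for "416%" and 50 for "--cpu-percent=50".
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        # Scale proportionally to how far usage is from the target.
        desired = math.ceil(current_replicas * ratio)
    # Never go outside the --min / --max bounds of the HPA.
    return max(min_replicas, min(max_replicas, desired))
```

For example, with the 416%/50% reading shown later in this post, a single replica would be scaled to ceil(8.32) = 9 pods, while a Deployment already at 10 replicas stays at 10 because of --max=10.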

Automatic scaling also needs a service that collects and aggregates resource usage and backs the kubectl top command. Older setups used Heapster for this (Heapster has since been deprecated in favor of metrics-server); here that role is filled by the Metrics Server that comes with the Prometheus deployment. For convenience, I therefore build HPA (dynamic scaling) on top of the Prometheus services.
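The utilization figures the HPA compares against its target are percentages of each pod's CPU request, not of the node's capacity, which is why a CPU request must be set on the workload. The Python sketch below assumes the average is computed by summing usage and requests across the pods; the helper names are hypothetical, and the quantity parser handles only plain cores, millicores (m) and nanocores (n).

```python
from typing import Sequence


def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '200m', '0.5' or '250000000n' to millicores."""
    if quantity.endswith("n"):
        return int(quantity[:-1]) // 1_000_000  # nanocores -> millicores
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)  # plain cores


def average_cpu_utilization(usage: Sequence[str], requests: Sequence[str]) -> int:
    """Return total usage as an integer percentage of total requests (hypothetical helper)."""
    if len(usage) != len(requests):
        raise ValueError("need one usage and one request value per pod")
    total_request = sum(parse_cpu_millicores(r) for r in requests)
    if total_request == 0:
        raise ValueError("CPU-utilization autoscaling requires CPU requests on the pods")
    total_usage = sum(parse_cpu_millicores(u) for u in usage)
    return total_usage * 100 // total_request
```

A pod that requests 200m and uses 832m, for instance, is at 416% of its request, well above a 50% target.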

To run the Prometheus services, you can follow the third method in my earlier post on three visual UIs for Kubernetes. If you do not want to deploy Prometheus, you can deploy Heapster on its own by following the instructions on GitHub.

In short, to use HPA you must be able to run the following command successfully on the master node:

[root@master ~]# kubectl top node        # view the resource usage of each node
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   1317m        65%    1383Mi          80%
node01   1237m        61%    1082Mi          62%
node02   1146m        57%    1045Mi          60%

II. Automatically scaling pods out and in

1) Create the HPA controller

[root@master ~]# kubectl run php-apache --image=mirrorgooglecontainers/hpa-example --requests=cpu=200m --expose --port=80
# Run the php-apache workload, request 200m of CPU per pod, and expose it on port 80 through a Service
[root@master ~]# kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
# Create an HPA for the php-apache Deployment: add replicas when average CPU utilization exceeds 50% of the request, keeping between 1 and 10 replicas
[root@master ~]# kubectl get svc | grep php-apache        # view the ClusterIP of the php-apache Service
php-apache   ClusterIP   10.97.45.108   <none>   80/TCP   44m
[root@master ~]# kubectl get pod | grep php-apa           # confirm that the pod is running
php-apache-867f97c8cb-9mpd6   1/1   Running   0   44m

Note: this form of kubectl run, which creates a Deployment, relies on an older kubectl. Starting with Kubernetes 1.18, kubectl run creates only a bare Pod, so on newer clusters create the php-apache Deployment and Service from a manifest instead.

2) Simulate load on php-apache and verify that the pods scale out and in automatically

Open several terminals (node machines work too) and request the php-apache pod in an endless loop, as follows. If your system has enough resources, you can open more terminals to increase the load; here I use two node terminals to request the php-apache pod:

[root@node01 ~]# while true; do wget -q -O- 10.97.45.108; done
# Repeatedly printing OK is normal; the loop simulates concurrent requests from many users to the php-apache pod
[root@master ~]# kubectl get hpa        # check the CPU usage seen by the HPA (add -w to watch it in real time)
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   416%/50%   1         10        10         56m
[root@master ~]# kubectl get pod        # after the loop has run for a while, check the number of pods (add -w to watch changes in real time)
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-867f97c8cb-6jsjq   1/1     Running   0          4m9s
php-apache-867f97c8cb-7xd5x   1/1     Running   0          51s
php-apache-867f97c8cb-9mpd6   1/1     Running   0          56m
php-apache-867f97c8cb-dhng7   1/1     Running   0          56m
php-apache-867f97c8cb-dhng7   1/1     Running   0          3m8s
php-apache-867f97c8cb-qc9hr   1/1     Running   0          2m22s
php-apache-867f97c8cb-rj494   1/1     Running   0          3m38s
php-apache-867f97c8cb-sbn9n   1/1     Running   0          3m38s
php-apache-867f97c8cb-vzfbg   1/1     Running   0          4m9s
php-apache-867f97c8cb-vzfbg   1/1     Running   0          5m19s
php-apache-867f97c8cb-vzfbg   1/1     Running   0          3m39s
# At most 10 pods are created, because the HPA was configured with --max=10
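If you prefer one process over several terminals running the wget loop, the Python sketch below spreads requests across threads using only the standard library. generate_load and its parameters are hypothetical helpers, not part of any Kubernetes tooling, and the sketch assumes the Service IP is reachable from wherever you run it, such as a cluster node.

```python
import threading
import time
import urllib.request


def generate_load(url: str, workers: int = 4, duration_s: float = 10.0) -> int:
    """Send GET requests to url from several threads until duration_s elapses.

    Returns the total number of successful requests (hypothetical helper).
    """
    deadline = time.monotonic() + duration_s
    counts = [0] * workers  # one counter per thread, so no locking is needed

    def worker(idx: int) -> None:
        while time.monotonic() < deadline:
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    resp.read()
                    counts[idx] += 1
            except OSError:
                # Connection and HTTP errors are ignored; back off briefly and retry.
                time.sleep(0.1)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)
```

For example, generate_load("http://10.97.45.108", workers=8, duration_s=300) keeps eight request loops running for five minutes; like the wget loop, it should push the TARGETS value of kubectl get hpa above the 50% threshold.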

When the request loop is stopped, the number of pods does not drop immediately. The HPA scales in only after the load has stayed low for a while (the downscale stabilization window, 5 minutes by default), so that a renewed traffic surge does not make the replica count flap.
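The delay can be modeled as a stabilization window: the autoscaler remembers its recent replica recommendations and, when scaling down, applies the highest one still inside the window. The Python sketch below is a simplified illustration of that idea; the class ScaleDownStabilizer is hypothetical, and its 300-second default mirrors the 5-minute default of the kube-controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization.

```python
from collections import deque
from typing import Deque, Tuple


class ScaleDownStabilizer:
    """Hold back scale-in until low recommendations persist for the whole window."""

    def __init__(self, window_s: float = 300.0) -> None:
        self.window_s = window_s
        # (timestamp in seconds, recommended replica count), oldest first
        self._history: Deque[Tuple[float, int]] = deque()

    def stabilized(self, now: float, recommendation: int) -> int:
        """Record a new recommendation and return the replica count to apply."""
        self._history.append((now, recommendation))
        # Drop recommendations that are older than the window.
        while self._history and self._history[0][0] < now - self.window_s:
            self._history.popleft()
        # The highest recent recommendation wins, so a brief dip cannot shrink the Deployment.
        return max(r for _, r in self._history)
```

A recommendation of 1 replica issued 60 seconds after a recommendation of 10 is therefore held at 10 until the older value ages out of the window.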
