This article explains the scaling principles of Alibaba Cloud K8s clusters in detail; interested readers can refer to it, and I hope you find it helpful.
An important feature of an Alibaba Cloud K8s cluster is that its nodes can be added and removed dynamically. With this feature, the cluster can be expanded with new nodes when computing resources run short, and nodes can be released to save costs when resource utilization drops.
Below we discuss how scale-out and scale-in are implemented in Alibaba Cloud K8s clusters. Understanding the implementation lets us troubleshoot efficiently and locate root causes when problems occur.
Node addition principle
An Alibaba Cloud K8s cluster can add nodes in three ways: adding existing nodes, cluster expansion, and auto scaling. Adding existing nodes can in turn be done manually or automatically. The components involved in adding a node are node preparation, the Elastic Scaling Service (ESS), the cluster control plane, Cluster Autoscaler, and the scheduler.
Manually adding an existing node
Node preparation is the process of installing and configuring an ordinary ECS instance so that it becomes a K8s cluster node. It can be done with a single command, which uses curl to download the attach_node.sh script and then runs it on the ECS instance with an openapi token as an argument.
curl http:///public/pkg/run/attach//attach_node.sh | bash -s -- --openapi-token
Here the token is actually a key, and its value is the basic information of the current cluster. The Alibaba Cloud K8s control plane generates this key-value pair when it receives a request to manually add an existing node, and returns the key to the user as the token.
The significance of this token (the key) is that it allows the attach_node.sh script running on ECS to anonymously look up the cluster's basic information (the value), which is critical to node preparation.
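As a purely illustrative sketch of this key-to-value exchange (the endpoint path and response contents below are hypothetical assumptions, not the actual control-plane API):

# Hypothetical illustration only: exchange the token (key) for cluster info (value)
curl http://<control-plane-endpoint>/cluster-info/<openapi-token>
# -> would return basic cluster information such as the API server endpoint,
#    cluster CA, and the bootstrap token used later by kubeadm join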
In general, node preparation does two kinds of things: reads and writes. Reads collect data; writes configure the node.
Most of the reading and writing here is straightforward; you can read the script for the details. The only part worth special attention is how kubeadm join registers the node with the Master, a process that requires establishing mutual trust between the new node and the cluster Master.
In one direction, the bootstrap token that the new node obtains from the control plane (unlike the openapi token, this token is part of the value) was itself obtained by the control plane from the cluster Master through a trusted channel. The new node connects to the Master with this bootstrap token, and the Master establishes trust in the new node by validating that token.
In the other direction, the new node anonymously fetches cluster-info from the kube-public namespace on the Master. cluster-info contains the cluster CA certificate, along with a signature of that CA made with the cluster's bootstrap token. The new node uses the bootstrap token it obtained from the control plane to compute its own signature of the CA and compares it with the signature in cluster-info. If the two match, cluster-info and the bootstrap token come from the same cluster; and because the node trusts the control plane, it can now trust the Master.
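For reference, the upstream kubeadm form of this token-based join looks roughly as follows; a minimal sketch with placeholder values for the Master endpoint, bootstrap token, and CA hash:

# Join a node using a bootstrap token plus a pin of the cluster CA
kubeadm join <master-endpoint>:6443 \
    --token <bootstrap-token> \
    --discovery-token-ca-cert-hash sha256:<ca-cert-hash>
# The CA hash can be computed on the Master from the CA public key:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex | sed 's/^.* //'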
Automatically adding existing nodes
Automatically adding existing nodes does not require anyone to copy and paste the script into the ECS command line. Instead, the control plane uses the ECS userdata feature: it writes a script similar to the node-preparation script above into the instance's userdata, then replaces the system disk and restarts the ECS instance. When the instance boots, the script in userdata runs automatically and completes the node-addition process. You can confirm this by inspecting the node's userdata.
#!/bin/bash
mkdir -p /var/log/acs
curl http:///public/pkg/run/attach/1.12.6-aliyun.1/attach_node.sh | bash -s -- --docker-version --token --endpoint --cluster-dns > /var/log/acs/init.log
Here we can see that the parameters passed to attach_node.sh differ considerably from those in the previous section. In fact, these parameters are the contents of the value from the previous section, i.e., the basic cluster information created and maintained by the control plane. Automatically adding existing nodes simply skips the step of fetching the value through the key.
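To inspect this userdata from inside the instance itself, one option (assuming Alibaba Cloud's ECS metadata service, which is served at 100.100.100.200) is:

# Print the instance's userdata from the ECS metadata service
curl http://100.100.100.200/latest/user-data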
Cluster expansion
Cluster expansion is different from adding existing nodes: it targets situations where new instances need to be purchased. Its implementation builds on adding existing nodes and introduces the Elastic Scaling Service (ESS). ESS is responsible for creating the ECS instance from scratch, while the rest of the process still relies on the ECS userdata script to prepare the node, just as when adding an existing node.
[Figure: the control plane creates an ECS instance from scratch through ESS]
Auto scaling
The three methods above all require human intervention at some point. The essence of auto scaling is that the cluster can automatically create ECS instances and join them to the cluster when business load increases. To achieve this automation, another component, Cluster Autoscaler, is introduced. Cluster auto scaling consists of two relatively independent processes.
The first process mainly configures the specification attributes of the new nodes, including their userdata. This userdata is similar to the script used when manually adding an existing node, except that it adds labels for scenarios such as auto scaling. The attach_node.sh script sets the node's attributes based on these labels.
#!/bin/sh
curl http:///public/pkg/run/attach/1.12.6-aliyun.1/attach_node.sh | bash -s -- --openapi-token --ess true --labels k8s.io/cluster-autoscaler=true,workload_type=cpu,k8s.aliyun.com=true
The second process is the key to adding nodes automatically. It introduces the Cluster Autoscaler component, which runs inside the K8s cluster as a Pod. Conceptually, this component works like a controller: it watches Pod status, and when Pods cannot be scheduled because node resources are insufficient, it modifies the ESS scaling rules so that new nodes are added.
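To see the condition that triggers a scale-out, one can look for unschedulable Pods; a minimal sketch, where the Pod name is a placeholder and the event text illustrates the scheduler's usual wording:

# List Pods stuck in Pending, then check why the scheduler rejected one
kubectl get pods --field-selector=status.phase=Pending --all-namespaces
kubectl describe pod <pending-pod>
# Typical event that Cluster Autoscaler reacts to:
#   Warning  FailedScheduling  ...  0/3 nodes are available: 3 Insufficient cpu.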
Here is a key point: the cluster scheduler measures resource adequacy by the "booking rate" (the fraction of a node's resources requested, i.e., reserved, by Pods), not by actual "utilization". The difference is like a hotel's booking rate versus its actual occupancy rate: it is entirely possible for a room to be booked but for the guest never to check in. When auto scaling is enabled, we need to set a scale-down threshold, which is the lower bound of the "booking rate". No scale-up threshold is needed, because Autoscaler expands the cluster based on Pod scheduling status: when Pods cannot be scheduled because the "booking rate" of node resources is too high, Autoscaler scales the cluster out.
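The "booking rate" corresponds to Pod resource requests relative to a node's allocatable capacity; one way to inspect it (the node name is a placeholder and the output is illustrative):

# Show how much of a node's allocatable capacity is already requested
kubectl describe node <node-name>
# The "Allocated resources" section reports the booking rate, e.g.:
#   Resource  Requests      Limits
#   cpu       3200m (80%)   6 (150%)
#   memory    4Gi (52%)     8Gi (105%)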
Node removal principle
Unlike node addition, removing nodes from a cluster has only one entry point: remove node. Nodes added in different ways, however, are removed in slightly different ways.
First, nodes that joined through "add existing node" take three steps to remove: the control plane clears the node's ECS userdata through the ECS API; the control plane removes the node from the cluster through the K8s API; and the control plane runs the kubeadm reset command on the ECS instance through ECS InvokeCommand to clean it up.
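A hedged sketch of these three steps using the aliyun CLI and kubectl; the exact parameters below are illustrative assumptions rather than the control plane's actual calls:

# 1. Clear the instance's userdata through the ECS API
aliyun ecs ModifyInstanceAttribute --InstanceId <instance-id> --UserData ""
# 2. Remove the node from the cluster through the K8s API
kubectl delete node <node-name>
# 3. Clean up the instance via ECS InvokeCommand (cloud assistant), running:
#    kubeadm reset -f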
Second, nodes added through cluster expansion require, in addition to the steps above, breaking the relationship between ESS and the ECS instance. The control plane does this by calling the ESS API.
Finally, nodes added dynamically by Cluster Autoscaler are automatically removed and released by Cluster Autoscaler when the "booking rate" of the cluster's CPU resources falls. The trigger condition is the CPU "booking rate", which is the "Metrics" input referenced in the original article's figure.
Generally speaking, adding and removing K8s cluster nodes involves four components: Cluster Autoscaler, ESS, the control plane, and the node itself (preparation or cleanup). Depending on the scenario, we need to troubleshoot different components. Cluster Autoscaler is an ordinary Pod, so its logs are collected the same way as any other Pod's. ESS auto scaling has its own dedicated console, where we can check the logs and status of sub-instances such as scaling configurations and scaling rules. Control-plane logs can be viewed through the log feature. Finally, node preparation and cleanup can be checked by examining the execution of the corresponding scripts.
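For instance, two starting points for log collection; the autoscaler's namespace and label selector here are assumptions that may differ by deployment:

# Logs of the Cluster Autoscaler Pod (namespace/label are assumed)
kubectl -n kube-system logs -l app=cluster-autoscaler --tail=100
# Node-preparation log written by the userdata script shown earlier
cat /var/log/acs/init.log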
The above is mostly principles, but combined with hands-on problem diagnosis it should be genuinely helpful for day-to-day operations and maintenance.
That is all on the scaling principles of K8s clusters. I hope the content above is helpful and lets you learn something new. If you think the article is good, feel free to share it so more people can see it.