Kubernetes Advanced: Automatic Node Scale-Up / Scale-Down


Contents:

1. Cluster AutoScaler (cloud-vendor scale-up / scale-down)
2. Ansible automatic Node scale-up

1. Cluster AutoScaler

Scale-up: Cluster AutoScaler periodically checks whether there are sufficient resources to schedule newly created Pods; when resources are insufficient, it calls the Cloud Provider to create a new Node.

Workflow: Cluster AutoScaler periodically checks whether the resources in the cluster are sufficient. If they are not, newly created Pods stay in the Pending state waiting for resources; if no resources are freed up, they wait indefinitely and the deployed service cannot serve at full capacity. The autoscaler detects this resource shortage and calls the Cloud Provider to create a new Node.

Scale-down: Cluster AutoScaler also monitors Node resource usage periodically. When a Node's utilization stays low for a long time (below 50%), the long-underutilized Node is taken offline: its Pods are evicted and rescheduled onto other Nodes, and its virtual machine is automatically deleted from the cloud provider.
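The behavior above is controlled by Cluster AutoScaler's startup flags. A minimal sketch of the relevant flags (the values and the node-group name my-node-group are illustrative assumptions, not from this article; in practice they are set as container args in the autoscaler Deployment):

./cluster-autoscaler \
  --cloud-provider=aws \                    # or alicloud / azure, matching the links below
  --nodes=2:10:my-node-group \              # min:max:node-group-name of the cloud node group
  --scale-down-utilization-threshold=0.5 \  # the "low utilization" cutoff (~50%) mentioned above
  --scale-down-unneeded-time=10m            # how long a node must stay underutilized before removal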

Supported cloud providers:

If you run on one of these cloud vendors, you can use their component solutions; the integration has generally already been done.

Alibaba Cloud: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/alicloud/README.md

AWS: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md

Azure: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md

Summary:

The main problem auto scaling solves is the mismatch between capacity planning and actual load: when the load on a batch of machines goes up, can we expand server capacity quickly? What is really being tested is how fast you can scale up and down.

Bringing traditional elastic scaling into K8s, you run into two kinds of problems if you think it through.

1. Non-uniform machine specifications fragment the utilization percentages.

In a K8s cluster, not all machines are the same, and they do not need to be: K8s already does the resource management for us, scheduling CPU and memory against the resource pool as a whole. When scaling down with non-uniform machine specifications, it may be the small machines that get removed, which often has little effect while the big machines stay; conversely, if a big machine is removed, the remaining workloads may end up scrambling for resources because there is not much redundancy left.

2. Machine utilization cannot be judged from host metrics alone.

Before containers, you planned a server's resources and applied for memory and CPU, so scaling up and down was relatively easy: to scale up, you checked whether a server's usage had reached 80-90% and then requested a new machine; to scale down, you looked at the overall idle resources and removed a few machines. In a K8s scenario, however, the applications deployed on top do not need to pay much attention to the underlying servers or care about host-level utilization; they only care whether the resources they requested are actually available.

In K8s, capacity planning is based on request and limit values. Request is effectively a quota: every deployed application declares its resources through request. This adds another dimension beyond host utilization, the request-based utilization. When scaling down, you cannot go purely by what a node is currently consuming, because quota that has been requested, even if it is not being used, cannot simply be taken away. You therefore have to look at the overall request-based utilization of the cluster and keep a certain amount of redundancy on top of it.
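One way to see this request-based dimension is the "Allocated resources" section that kubectl describe prints for each node; a minimal sketch, assuming kubectl access to the cluster:

# Print the request/limit totals ("Allocated resources") for every node
for n in $(kubectl get nodes -o name); do
  echo "== ${n#node/} =="
  kubectl describe "$n" | grep -A 8 "Allocated resources"
done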

2. Ansible Node Scale-Up

Another approach is to scale Nodes up and down with manual intervention, which is a common self-built method.

1. Trigger the addition of a new Node: decide whether a node needs to be added.
2. Call the Ansible script to deploy components: prepare the new machine and check whether the required components are deployed on it.
3. Check that the service is available: verify that the newly added components are working normally.
4. Call the API to add the new Node to the cluster, or let the Node join automatically.
5. Observe the state of the new Node: monitor it, watch the new node's running logs and resource status.
6. Complete the Node scale-up and start receiving new Pods.

Let's simulate scaling up a node. Because there are not enough resources to schedule all the Pods, some of them end up in the Pending state.

[root@k8s-master1 ~]# kubectl run web --image=nginx --replicas=6 --requests="cpu=1,memory=256Mi"
[root@k8s-master1 ~]# kubectl get pod
NAME                  READY   STATUS    RESTARTS   AGE
web-944cddf48-6qhcl   1/1     Running   0          15m
web-944cddf48-7ldsv   1/1     Running   0          15m
web-944cddf48-7nv9p   0/1     Pending   0          2s
web-944cddf48-b299n   1/1     Running   0          15m
web-944cddf48-nsxgg   0/1     Pending   0          2s
web-944cddf48-pl4zt   1/1     Running   0          15m
web-944cddf48-t8fqt   1/1     Running   0          15m

The current state is that these Pods cannot be allocated to any node because the resource pool is insufficient, so we now need to scale up our node pool.
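To confirm why a Pod is stuck in Pending, you can look at its scheduling events (a sketch; the Pod name is taken from the listing above, and the exact event text depends on the scheduler version):

# Expect an event such as "0/3 nodes are available: 3 Insufficient cpu."
kubectl describe pod web-944cddf48-7nv9p | grep -A 10 Events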

# Add the new machine to the [newnode] group in the Ansible hosts inventory:
[newnode]
10.4.7.22 node_name=k8s-node3

[root@ansible ansible-install-k8s-master]# ansible-playbook -i hosts add-node.yml -u root -k

Check that the join request (CSR) from the new node has been received and has gone through approval:

[root@k8s-master1 ~]# kubectl get csr
NAME                                                   AGE   REQUESTOR           CONDITION
node-csr-0i7BzFaf8NyG_cdx_hqDmWg8nd4FHQOqIxKa45x3BJU   45m   kubelet-bootstrap   Approved,Issued
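If the CSR were still Pending rather than Approved,Issued (for example if the playbook did not auto-approve it, which is an assumption about the bootstrap setup), it could be approved manually:

# Approve the kubelet bootstrap CSR so the new node can obtain its certificate
kubectl certificate approve node-csr-0i7BzFaf8NyG_cdx_hqDmWg8nd4FHQOqIxKa45x3BJU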

View the Node status:

[root@k8s-master1 ~]# kubectl get node
NAME          STATUS   ROLES    AGE     VERSION
k8s-master1   Ready    <none>   7d      v1.16.0
k8s-node1     Ready    <none>   7d      v1.16.0
k8s-node2     Ready    <none>   7d      v1.16.0
k8s-node3     Ready    <none>   2m52s   v1.16.0

View a node's overall resource allocation and utilization:

[root@k8s-master1 ~]# kubectl describe node k8s-node1

Scaling down a Node

Scaling down a node evicts the Pods on it, which may affect both the business and the cluster.

To delete a node from a Kubernetes cluster, the correct process is:

1. Get the node list: kubectl get node
2. Mark the node unschedulable: kubectl cordon $node_name
3. Evict the Pods on the node: kubectl drain $node_name --ignore-daemonsets
4. Remove the node: kubectl delete node $node_name

This way we smoothly remove a k8s node; a consolidated sketch of these commands follows below.
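A minimal sketch of the removal flow, assuming kubectl admin access and that NODE is the node chosen for scale-down:

NODE=k8s-node3
kubectl get node                              # 1. confirm the node list
kubectl cordon "$NODE"                        # 2. mark it unschedulable
kubectl drain "$NODE" --ignore-daemonsets     # 3. evict its pods
kubectl delete node "$NODE"                   # 4. remove it from the cluster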

First of all, you need to know which node in the whole cluster should be deleted. With manual intervention you have to decide which node is worth scaling down: it should be one with low resource utilization and the lowest-priority workloads running on it.
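One way to pick the candidate is to compare actual usage across nodes (a sketch; kubectl top requires metrics-server, which this article does not cover installing):

# A node whose CPU/memory usage stays consistently low is a scale-down candidate
kubectl top nodes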

Then mark it unschedulable, because a new Pod could be scheduled onto it at any moment; marking it unschedulable prevents that. kubectl has a command for this (see kubectl --help):

cordon: mark a node as unschedulable

[root@k8s-master1 ~]# kubectl cordon k8s-node3
node/k8s-node3 cordoned

The node is now marked as unschedulable:

[root@k8s-master1 ~]# kubectl get node
NAME          STATUS                     ROLES    AGE    VERSION
k8s-master1   Ready                      <none>   7d1h   v1.16.0
k8s-node1     Ready                      <none>   7d1h   v1.16.0
k8s-node2     Ready                      <none>   7d1h   v1.16.0
k8s-node3     Ready,SchedulingDisabled   <none>   45m    v1.16.0

At this stage this has no effect on the Pods already running on the node.

Now evict the Pods that already exist on the node, which effectively puts the node into a maintenance window. There is also a command for this:

drain: drain a node in preparation for maintenance

Setting this state evicts the Pods on the node and prints a message that the node is unschedulable and is now being drained. An error is reported for DaemonSet Pods: the flanneld we deployed runs as a DaemonSet, so this situation occurs, and it can simply be ignored.

[root@k8s-master1 ~]# kubectl drain k8s-node3
node/k8s-node3 already cordoned
error: unable to drain node "k8s-node3", aborting command...

There are pending nodes to be drained:
 k8s-node3
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): ingress-nginx/nginx-ingress-controller-qxhj7, kube-system/kube-flannel-ds-amd64-j9w5l

So add the following flag to the command:

[root@k8s-master1 ~]# kubectl drain k8s-node3 --ignore-daemonsets
node/k8s-node3 already cordoned
WARNING: ignoring DaemonSet-managed Pods: ingress-nginx/nginx-ingress-controller-qxhj7, kube-system/kube-flannel-ds-amd64-j9w5l
evicting pod "web-944cddf48-nsxgg"
evicting pod "web-944cddf48-7nv9p"
pod/web-944cddf48-nsxgg evicted
pod/web-944cddf48-7nv9p evicted
node/k8s-node3 evicted

[root@k8s-master1 ~]# kubectl get node
NAME          STATUS                     ROLES    AGE    VERSION
k8s-master1   Ready                      <none>   7d1h   v1.16.0
k8s-node1     Ready                      <none>   7d1h   v1.16.0
k8s-node2     Ready                      <none>   7d1h   v1.16.0
k8s-node3     Ready,SchedulingDisabled   <none>   53m    v1.16.0

[root@k8s-master1 ~]# kubectl get pod
NAME                  READY   STATUS    RESTARTS   AGE
web-944cddf48-6qhcl   1/1     Running   0          127m
web-944cddf48-7ldsv   1/1     Running   0          127m
web-944cddf48-b299n   1/1     Running   0          127m
web-944cddf48-cc6n5   0/1     Pending   0          38s
web-944cddf48-pl4zt   1/1     Running   0          127m
web-944cddf48-t8fqt   1/1     Running   0          127m
web-944cddf48-vl5hg   0/1     Pending   0          38s

[root@k8s-master1 ~]# kubectl get pod -o wide
NAME                  READY   STATUS    RESTARTS   AGE    IP           NODE          NOMINATED NODE   READINESS GATES
web-944cddf48-6qhcl   1/1     Running   0          127m   10.244.0.6   k8s-node2     <none>           <none>
web-944cddf48-7ldsv   1/1     Running   0          127m   10.244.0.5   k8s-node2     <none>           <none>
web-944cddf48-b299n   1/1     Running   0          127m   10.244.0.7   k8s-node2     <none>           <none>
web-944cddf48-cc6n5   0/1     Pending   0          43s    <none>       <none>        <none>           <none>
web-944cddf48-pl4zt   1/1     Running   0          127m   10.244.2.2   k8s-master1   <none>           <none>
web-944cddf48-t8fqt   1/1     Running   0          127m   10.244.1.2   k8s-node1     <none>           <none>
web-944cddf48-vl5hg   0/1     Pending   0          43s    <none>       <none>        <none>           <none>

After the scale-down, the Pending state appears again because resources are short. Once a node is scaled down, it is the controller that keeps the number of Pod replicas at the expected value, so you have to make sure the other nodes have enough redundancy; otherwise scaling down only leaves Pods stuck in Pending, which defeats the purpose.
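A quick way to check whether the remaining nodes can still hold the workload is to compare ready and desired replicas of the Deployment (a sketch; web is the Deployment created earlier, and scaling it down is only an option, not part of this walkthrough):

# READY vs desired shows how many replicas are stuck in Pending
kubectl get deployment web

# Only if the remaining nodes genuinely cannot host all replicas
kubectl scale deployment web --replicas=5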

Then delete the k8s-node3 node

[root@k8s-master1 ~]# kubectl get node
NAME          STATUS                     ROLES    AGE    VERSION
k8s-master1   Ready                      <none>   7d2h   v1.16.0
k8s-node1     Ready                      <none>   7d2h   v1.16.0
k8s-node2     Ready                      <none>   7d2h   v1.16.0
k8s-node3     Ready,SchedulingDisabled   <none>   71m    v1.16.0

When removing the node, wait until eviction has completed and the expected replicas are running on the other nodes, and keep a policy in place so that no Pod scheduling happens on the node during its offline period.

Alternatively, you could simply shut node3 down. First make sure the other nodes have spare resources. Even if something unexpected happens, K8s has a mechanism to move the Pods on a failed node to other healthy nodes within about 5 minutes. Still, for workloads such as microservices, eviction does affect the business, since the Pods on this node have to move to other nodes, so try to do it during business troughs.
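To bound how much of the service can be disrupted while a node is drained (a sketch, not part of the original walkthrough; the selector assumes the run=web label that kubectl run attaches to the Pods), a PodDisruptionBudget can be created:

# Keep at least 4 web replicas available during voluntary evictions such as drain
kubectl create poddisruptionbudget web-pdb --selector=run=web --min-available=4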

[root@k8s-master1 ~]# kubectl delete node k8s-node3
node "k8s-node3" deleted

[root@k8s-master1 ~]# kubectl get node
NAME          STATUS   ROLES    AGE    VERSION
k8s-master1   Ready    <none>   7d2h   v1.16.0
k8s-node1     Ready    <none>   7d2h   v1.16.0
k8s-node2     Ready    <none>   7d2h   v1.16.0
