How to improve Resource Utilization by kubernetes

2025-04-06 Update From: SLTechnology News&Howtos


This article explains how to improve resource utilization in Kubernetes. The methods introduced are simple, fast, and practical, so let's walk through how Kubernetes can improve resource utilization.

Background

The rise of public cloud has greatly helped the stability, scalability, and convenience of businesses. This rent-instead-of-buy model, backed by mature technical support and guarantees, should in principle reduce costs and increase efficiency. In practice, however, moving to the cloud does not automatically make things cheaper: application development, architecture design, management, operations, and usage patterns all need to be adapted to the cloud before it really delivers cost savings. As the earlier article "Analysis of the Phenomenon of Containerized Computing Resource Utilization" in the "Kubernetes Cost Reduction and Efficiency Guide" series showed, the improvement in resource utilization after moving from IDC to the cloud is limited: even after containerization, the average node utilization is still only about 13%, so there is a long way to go.

This article looks at why CPU and memory utilization in Kubernetes clusters is usually so low, and which production-ready features on TKE can easily improve it today.

Resource waste scene

Why is resource utilization usually so low? First of all, you can take a look at the actual resource usage scenarios of several businesses:

1. More than 50% of reserved resources are commonly wasted.

The Request field in Kubernetes reserves CPU and memory for a container, guaranteeing the minimum amount of resources the container can use; this portion cannot be preempted by other containers (see the Kubernetes documentation on resource requests for details). If Request is set too low, the service cannot cope when its load rises, so users habitually set Request high to ensure reliability. In reality, though, the load is low most of the time. Taking CPU as an example, the figure below shows the relationship between the resource reservation (Request) and the actual usage (CPU_Usage) of a container in a real business: the reservation is far larger than the actual usage, and the difference between the two cannot be used by any other workload, so an oversized Request inevitably wastes a lot of resources.
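As a minimal sketch (names, image, and values are illustrative), a container's Request and Limit are declared in its Pod spec like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app              # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.25         # illustrative image
    resources:
      requests:               # guaranteed minimum; used by the scheduler
        cpu: "250m"           # 0.25 core reserved for this container
        memory: "256Mi"
      limits:                 # hard cap on what the container may use
        cpu: "500m"           # CPU is throttled above 0.5 core
        memory: "1Gi"         # exceeding this gets the container OOM-killed
```

Setting Request close to the observed steady-state usage, with Limit as headroom, is what narrows the gap between Request and Usage described above.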

How can this be solved? Today, users need to set a more reasonable Request based on the actual load and cap unbounded resource requests, so that no single business over-occupies resources; see the sections on Resource Quota and Limit Ranges below. In addition, TKE will launch a Request recommendation feature to help users intelligently narrow the gap between Request and Usage, effectively improving resource utilization while keeping the business stable.

2. Peaks and troughs in business load are common, troughs usually last longer than peaks, and the resulting waste is obvious.

Most businesses have peaks and troughs. A bus system, for example, carries more load during the day and less at night, while games usually peak on Friday night and bottom out on Sunday night. As the figure below shows, the same business needs different amounts of resources at different times; with a fixed Request, utilization is very low whenever the load is low.

In this case, dynamically adjusting the number of replicas lets you ride the peaks and troughs of the service with high resource utilization; see HPA, HPC, and CA below.

3. Different types of business lead to great differences in resource utilization

Online services usually carry high load during the day and have strict latency requirements, so they must be scheduled and run with priority, while offline computing jobs usually have looser requirements on run time and latency and can, in theory, run during the troughs of online services. In addition, some businesses are compute-intensive and consume more CPU, while others are memory-intensive and consume more memory.

As shown in the figure above, online-offline colocation can dynamically schedule offline and online services so that different types of services run in different time periods, improving resource utilization. For compute-intensive and memory-intensive services, affinity scheduling can place each service on a more suitable node. For specific methods, see the sections on colocation and affinity scheduling below.

Methods for improving resource utilization on Kubernetes

Based on a large number of real user workloads, Tencent Kubernetes Engine (TKE) has productized a series of tools to help users improve resource utilization easily and effectively. They fall into two categories: using native Kubernetes capabilities to manually partition and limit resources, and automated solutions built around business characteristics.

1. How to partition and limit resources

Imagine you are a cluster administrator and four business units share the same cluster. Your job is to keep every business stable while letting each use resources truly on demand. To effectively raise the cluster's overall utilization, you need to cap the resources each business can use and apply sensible defaults to prevent overuse.

Ideally, each business sets a reasonable Request and Limit based on its actual needs (Request reserves resources, defining the minimum a container is guaranteed; Limit caps resources, defining the maximum a container may use). This keeps containers healthy while using resources fully. In practice, users often forget to set Request and Limit at all, and teams or projects sharing a cluster tend to set them high to protect their own stability. If you use the TKE console, the following defaults are applied to every container when you create a workload. They are derived from TKE's analysis of real workloads, so they may deviate from your specific needs:

Resource        Request    Limit
CPU (core)      0.25       0.5
Memory (MiB)    256        1024

For finer-grained division and management of resources, Resource Quota and Limit Ranges at the namespace level can be set on TKE.

1.1 using Resource Quota to divide resources

If you manage a cluster shared by four businesses, you can use namespaces together with Resource Quota to isolate the businesses from one another and limit their resources.

Resource Quota sets usage quotas on a namespace's resources. A namespace is an isolated partition within a Kubernetes cluster, and a cluster usually contains several of them; Kubernetes users typically put different businesses in different namespaces. By giving each namespace its own Resource Quota, you can cap the resources a namespace may use, achieving pre-allocation and enforcement. Resource Quota mainly covers the following aspects (see the Kubernetes documentation for details):

Compute resources: the total CPU and memory Request and Limit of all containers

Storage resources: the total storage requested by all PersistentVolumeClaims

Object count: the total number of objects such as PVC/Service/ConfigMap/Deployment
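The aspects above map to fields of a ResourceQuota object. A minimal sketch (the name, namespace, and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota               # hypothetical name
  namespace: team-a                # hypothetical namespace
spec:
  hard:
    requests.cpu: "10"             # total CPU Requests across the namespace
    requests.memory: 20Gi
    limits.cpu: "20"               # total CPU Limits across the namespace
    limits.memory: 40Gi
    persistentvolumeclaims: "10"   # object-count limits
    services: "5"
    count/deployments.apps: "20"
```

Pods whose creation would push any total past these caps are rejected at admission time.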

Resource Quota usage scenario

Assign different namespaces to different projects / teams / businesses, and achieve the purpose of resource allocation by setting the Resource Quota of each namespace resource

Cap the amount of resources a namespace may use, improving cluster stability and preventing any one namespace from over-consuming resources.

Resource Quota on TKE

Resource Quota is productized on TKE: you can use it directly in the console to limit the resources a namespace may use. See the documentation for details.

1.2 use Limit Ranges to restrict resources

What if users often forget to set Request and Limit, or set them far too high? As an administrator, being able to set per-business default values and allowed ranges both reduces the work of creating workloads and prevents resource over-consumption.

Unlike Resource Quota, which constrains a namespace as a whole, Limit Ranges applies to individual containers within a namespace. It can stop users from creating containers with resource settings that are too small or too large, and supplies values when users forget to set Request and Limit. Limit Ranges mainly covers the following aspects (see the Kubernetes documentation for details):

Compute resources: sets the allowed range of CPU and memory for all containers

Storage resources: sets the range of storage that a PVC may request

Ratio setting: controls the ratio between a resource's Limit and Request

Defaults: sets a default Request/Limit for all containers; a container that does not specify its own request and limit receives the defaults
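These aspects correspond to fields of a LimitRange object. A minimal sketch with illustrative names and values:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits      # hypothetical name
  namespace: team-a           # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:           # Request applied when a container omits it
      cpu: 250m
      memory: 256Mi
    default:                  # Limit applied when a container omits it
      cpu: 500m
      memory: 1Gi
    min:                      # smallest Request a container may declare
      cpu: 100m
      memory: 128Mi
    max:                      # largest Limit a container may declare
      cpu: "2"
      memory: 4Gi
    maxLimitRequestRatio:     # Limit may be at most 4x Request
      cpu: "4"
```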

Limit Ranges usage scenario

Setting resource defaults prevents them from being forgotten, and also prevents important Pods from landing in a low QoS class and being evicted first under node pressure.

Different businesses usually run in different namespaces, and different businesses usually use different resources. Setting different Request/Limit for different namespaces can improve resource utilization.

Bound a container's resource usage from above and below, keeping the container running normally while preventing it from requesting excessive resources.

Limit Ranges on TKE

Limit Ranges is productized on TKE: you can manage a namespace's Limit Ranges directly in the console. See the documentation for details.

2. Methods for automatically improving resource utilization

The Resource Quota and Limit Ranges approaches above rely on experience and manual work, and mainly address unreasonable resource requests and allocation. How to adjust dynamically and automatically is what users care about more. The rest of this article covers three product areas in detail: elastic scaling, scheduling, and online-offline colocation.

2.1 Elastic scaling

2.1.1 Scaling toward a target metric with HPA

As resource-waste scenario 2 above showed, if your business has peaks and troughs, a fixed Request is bound to waste resources during the troughs. In such a scenario, automatically increasing the number of replicas during peaks and reducing it during troughs effectively raises overall utilization.

HPA (Horizontal Pod Autoscaler) automatically scales the number of Pod replicas in a Deployment or StatefulSet based on metrics such as CPU and memory utilization, keeping the workload stable while using resources truly on demand.
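As a sketch, an HPA that keeps a Deployment's average CPU utilization near a target looks like this (the names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                    # hypothetical name
spec:
  scaleTargetRef:                  # the workload being scaled
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # hypothetical Deployment
  minReplicas: 2                   # floor during troughs
  maxReplicas: 10                  # ceiling during peaks
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60     # add/remove replicas around 60% CPU
```

The controller grows the replica count when average utilization sits above the target and shrinks it when utilization falls below, within the min/max bounds.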

HPA usage scenario

Traffic bursts: when traffic suddenly spikes and the load is saturated, the number of Pods is automatically increased to respond in time.

Automatic scale-in: when traffic is low and the load's resource utilization drops too far, the number of Pods is automatically reduced to avoid waste.

HPA on TKE

TKE supports many auto-scaling metrics based on the Custom Metrics API, including CPU, memory, disk, network, and GPU-related metrics, covering most HPA scenarios; see the description of auto-scaling metrics for details. For more complex cases, such as scaling on business metrics like per-replica QPS, you can install prometheus-adapter; see the documentation for details.

2.1.2 Scheduled scaling with HPC

Suppose you run an e-commerce platform and plan a promotion for Singles' Day. You could rely on HPA to scale automatically, but HPA only reacts after its metrics move, and the scale-out may not be fast enough to absorb high traffic in time. For a traffic surge you can predict, scaling out the replicas in advance absorbs the burst much more effectively. HPC (HorizontalPodCronscaler) is a TKE self-developed component that adjusts the number of replicas on a schedule, scaling out ahead of time so that reactive scaling is never triggered too late with too few resources. Compared with the community CronHPA, it additionally supports:

Combined with HPA: you can turn HPA on and off regularly, making your business more flexible during peak hours.

Exception dates: business traffic is rarely perfectly regular; setting exception dates reduces manual adjustments to HPC.

One-off execution: community CronHPA rules always repeat, like CronJob; one-off execution handles big-promotion scenarios more flexibly.

HPC usage scenario

Take a game service as an example: the number of players soars from Friday night to Sunday night. Scaling the game servers out before Friday night and back to their original size after Sunday night gives players a better experience; with HPA alone, the service might suffer because scale-out is not fast enough.

HPC on TKE

HPC is productized on TKE, but you must first install it from "Component Management". HPC schedules use CronTab syntax.


2.1.3 automatically adjust the number of nodes through CA

Both HPA and HPC above scale the number of replicas at the workload level to ride traffic peaks and troughs and raise utilization. But the cluster's total resources are fixed; HPA and HPC merely leave the cluster with more spare capacity. Is there a way to release resources when the cluster is "empty" and grow its total resources when it is "full"? Since the total resources provisioned for the cluster directly determine the bill, elasticity at the cluster level is what really saves money.

CA (Cluster Autoscaler) automatically scales the number of cluster nodes up and down. It genuinely improves resource utilization and directly affects what users pay, making it the key to reducing cost and increasing efficiency.

CA usage scenario

At service peaks, add appropriate nodes in response to the surge in load.

During service troughs, release redundant nodes based on how idle their resources are.

CA on TKE

On TKE, CA is offered in the form of node pools and is recommended for use together with HPA: HPA handles scaling at the application layer, and CA handles scaling at the resource (node) layer. When HPA scale-out exhausts the cluster's overall resources, Pods go Pending, and Pending Pods trigger CA to expand the node pool, increasing the cluster's total resources. The overall scaling logic is shown in the figure below:

For more information on how to configure parameters and application scenarios, please see "manage Node like managing Pod", or refer to the official documentation of Tencent Cloud CCS.

2.2 scheduling

The Kubernetes scheduling mechanism is an efficient and elegant resource-allocation mechanism provided natively by Kubernetes; its core function is to find the most suitable node for each Pod. On TKE, scheduling bridges the application layer and the resource layer. By making good use of the scheduling capabilities Kubernetes provides and configuring policies that match business characteristics, you can also effectively improve resource utilization in the cluster.

2.2.1 Node affinity

If your business is CPU-intensive but the Kubernetes scheduler accidentally places it on a memory-optimized node, the node's CPU fills up while its memory goes almost unused, wasting a lot of resources. If you could label the node as CPU-intensive and, when creating the workload, mark it as a CPU-intensive load, the scheduler would place the load on a CPU-intensive node. Finding the most suitable node this way effectively improves resource utilization.

When you create a Pod, you can set node affinity, that is, specify which nodes the Pod should be scheduled to (the nodes are identified by Kubernetes Labels).

Node affinity usage scenario

Node affinity is very suitable for scenarios where workloads with different resource requirements run at the same time in a cluster. For example, Tencent Cloud's CVM (node) has both CPU-intensive and memory-intensive machines. If some businesses require far more CPU than memory, using an ordinary CVM machine at this time is bound to cause a great waste of memory. At this point, a batch of CPU-intensive CVM can be added to the cluster, and these Pod with high demand for CPU can be dispatched to these CVM, which can improve the overall utilization of CVM resources. Similarly, you can also manage heterogeneous nodes (such as GPU machines) in the cluster, specify the amount of GPU resources required in workloads that require GPU resources, and the scheduling mechanism will help you find the right nodes to run these workloads.
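As a sketch, pinning a CPU-bound Pod to nodes carrying a hypothetical `node-type=cpu-intensive` label (the label key, values, and names are all illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-bound-job       # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-type  # label the administrator applied to CPU-optimized nodes
            operator: In
            values: ["cpu-intensive"]
  containers:
  - name: worker
    image: busybox:1.36
    command: ["sleep", "3600"]
```

Using `preferredDuringSchedulingIgnoredDuringExecution` instead makes the placement a soft preference, so the Pod can still schedule when no matching node has capacity.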

Node Affinity on TKE

TKE provides the same affinity features as native Kubernetes; you can use them through the console or via YAML. See the documentation for details.

2.2.2 dynamic scheduler

The native Kubernetes scheduling policy tends to place Pods on nodes with the most remaining resources, for example via the default LeastRequestedPriority policy. But this allocation is static: Request does not reflect real usage, so some waste is inevitable. If the scheduler could instead schedule based on a node's actual resource utilization, that waste would be reduced to some extent.

This is what TKE's self-developed dynamic scheduler does. Its core principle is to schedule based on each node's actual load rather than its static Requests.

Usage scenario of dynamic Scheduler

Besides reducing resource waste, the dynamic scheduler can also alleviate scheduling hotspots in the cluster:

The dynamic scheduler tracks how many Pods were recently scheduled onto each node, avoiding packing too many Pods onto the same node.

The dynamic scheduler supports setting the node load threshold and filtering out the nodes that exceed the threshold in the scheduling phase.

Dynamic Scheduler on TKE

You can install and use the dynamic scheduler from the component catalog. For more on using it, see "TKE's blockbuster full-link scheduling solution" and the official documentation.

2.3 Online-offline colocation

If you run both online web services and offline computing jobs, TKE's online-offline colocation technology can dynamically schedule the two kinds of services together to improve resource utilization.

In traditional architectures, big data workloads and online services are deployed in separate resource clusters that operate independently. Yet big data workloads are mostly offline computing jobs that peak at night, exactly when online services are idle. Cloud-native technology uses container-level isolation (CPU, memory, disk IO, network IO, etc.) and Kubernetes's powerful orchestration and scheduling to deploy online and offline services together, letting offline jobs soak up resources during the idle periods of online services and improving utilization.

Colocation usage scenarios

Under the Hadoop architecture, offline and online jobs usually live in different clusters, yet online services and streaming jobs have pronounced peaks and troughs: during the troughs, large amounts of resources sit idle, wasting resources and raising costs. In a colocated cluster, dynamic scheduling shaves the peaks and fills the valleys, dispatching offline tasks into the online cluster when its utilization is low, which significantly improves utilization. However, Hadoop YARN can currently allocate only the static resources reported by NodeManager; it cannot schedule against dynamic resources and so does not support online-offline colocation well.

Colocation on TKE

Online services show clear peaks and troughs with a fairly regular pattern, and utilization is especially low at night. At those times, the big data control plane can ask the Kubernetes cluster to create resources, boosting the computing power available to big data applications.

How to balance resource utilization and stability

In enterprise operations, system stability matters as much as cost, and balancing the two is a "pain point" for many operators. To cut costs you want utilization as high as possible, but past a certain level, high load is very likely to cause OOM kills, CPU jitter, and similar problems for the business. To ease these worries on the road to cost control, TKE also provides a rescheduler to keep the cluster's load within a controllable range. The rescheduler complements the dynamic scheduler (their relationship is shown in the figure below): as its name suggests, it watches over nodes whose load has become dangerously high and gracefully evicts some of the workloads on them.

Rescheduler on TKE

You can install and use the rescheduler from the component catalog.

At this point, you should have a deeper understanding of how Kubernetes can improve resource utilization. Try putting these methods into practice.
