
What is the concept of requests and limits in Kubernetes


This article explains the concept of requests and limits in Kubernetes. The ideas involved are simple, practical, and quick to apply; if the topic interests you, read on and work through what requests and limits in Kubernetes really mean.

When deploying workloads in a K8s cluster, do you often run into the following situations?

CPU requests is often left unset, or set too low, when workloads are deployed to the cluster (which makes each node "look" as if it can accommodate more Pods).

Then, when business traffic peaks, the node's CPUs run at full load, business latency rises sharply, and sometimes the machine even slips into a "half-dead" state such as a CPU soft lockup.

Similarly, if memory requests is left unset or set too low at deployment time, you will find that some Pods keep failing and restarting.

These constantly restarting Pods usually run Java business applications, and the very same applications behave perfectly well when debugged locally.

The cluster load is not evenly distributed across nodes, and the imbalance is usually most visible for memory: memory utilization on some nodes is significantly higher than on the others.

Kubernetes is the best-known cloud native distributed container orchestration system, the de facto standard; shouldn't its scheduler guarantee an even distribution of resources?

If these problems strike during peak business hours and the machine is already hung, or can no longer even be reached over ssh, restarting the node is usually the only option left to the cluster administrator.

If you have run into situations like these and want to know how to avoid them, or if you are a Kubernetes operator or developer who wants to understand the nature of such problems, please read on patiently.

We will first analyze these problems qualitatively and present best practices for avoiding them; at the end, for readers interested in the underlying mechanism of Kubernetes requests and limits, we will dig deeper from the source-code perspective, so that you "know the what as well as the why".

Problem analysis

For scenario 1

First of all, we need to know that CPU and memory resources differ in kind. CPU is a compressible resource: the allocation and management of CPU are carried out by the Linux kernel through the Completely Fair Scheduler (CFS) and the cgroup mechanism.

Simply put, if the services in a Pod use more CPU than the Pod's CPU limits, the Pod's CPU is throttled. For Pods without limits, once the node's free CPU is exhausted, the share of CPU previously available to them gradually shrinks.
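To make the throttling concrete, here is a minimal, hypothetical container fragment (the name and image are assumptions); the comments show how the kubelet translates these values into cgroup CFS parameters:

    # Illustrative fragment of a Pod's containers list.
    containers:
    - name: app           # hypothetical name
      image: nginx:1.25   # hypothetical image
      resources:
        requests:
          cpu: 50m        # -> cpu.shares ~= 51 (cgroup v1): relative weight under contention
        limits:
          cpu: 100m       # -> cfs_quota_us = 10000 per cfs_period_us = 100000, i.e. 10% of one core

Once the container's usage reaches that quota within a period, the kernel suspends it until the next period begins, which is exactly the throttling described above.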

In either case the end result is the same: the Pod becomes less and less able to handle incoming requests, which shows up as higher application latency and slower responses.

For scenario 2

Memory is an incompressible resource: it cannot be shared between Pods and is completely exclusive, which means that once memory is exhausted or insufficient, new allocations are bound to fail. Some processes inside Pods reserve a chunk of memory up front when they initialize and start.

For example, the JVM requests a block of memory from the system when it starts. If the Pod's memory limits is set below what the JVM asks for, the allocation fails, the container is OOM-killed, and the Pod fails and restarts over and over. (Remember that requests only influences scheduling; it is limits that caps what the container may actually use.)
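To make the failure mode concrete, here is a hypothetical Pod spec that reproduces the crash loop; the image name and JVM flags are illustrative assumptions:

    apiVersion: v1
    kind: Pod
    metadata:
      name: java-oom-demo                    # hypothetical name
    spec:
      containers:
      - name: app
        image: example.com/java-app:latest   # hypothetical image
        # The JVM reserves 512Mi up front (-Xms512m), but the container is
        # capped at 256Mi, so the kernel OOM-kills it at startup and the
        # Pod ends up in CrashLoopBackOff.
        command: ["java", "-Xms512m", "-Xmx512m", "-jar", "/app.jar"]
        resources:
          requests:
            memory: 256Mi
          limits:
            memory: 256Mi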

For scenario 3

In fact, when creating a Pod, Kubernetes has to budget several kinds of resources at once, CPU and memory included, and the resource balancing it performs is a comprehensive trade-off across all of them.

On the other hand, Kubernetes's built-in scheduling algorithm does not just pick the "least allocated node"; it also takes factors such as Pod affinity into account. Moreover, scheduling is based on the requests values of resources, and the reason the imbalance is usually observed for memory is that, for most applications, memory is a scarcer resource than the rest.
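As an illustration of the affinity factor, a preferred anti-affinity rule like the sketch below (the app=demo label and the weight are assumptions) asks the scheduler to spread replicas of the same application across nodes, on top of its requests-based accounting:

    # Illustrative fragment of a Pod spec.
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: demo                        # hypothetical label
            topologyKey: kubernetes.io/hostname  # spread across distinct nodes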

Kubernetes scheduling is based on the current state: whenever a new Pod comes up for scheduling, the scheduler makes the best decision it can from its description of the cluster's resources at that moment.

But Kubernetes clusters are highly dynamic. To maintain a node, for example, we first drain it, evicting all of its Pods onto other nodes. When the maintenance is finished, those Pods do not automatically come back, because once a Pod has been bound to a node, no rescheduling is triggered.

Best practices

From the analysis above we can see that the stability of the cluster directly determines the stability of the business applications running on it, and temporary resource shortages are the main factor behind cluster instability. Once the cluster is unstable, application performance drops, and in severe cases the affected nodes become unavailable altogether.

So how can the stability of the cluster be improved?

On the one hand, we can reserve part of each node's resources through the kubelet configuration file, so that the node hosting the kubelet stays stable even when little allocatable capacity remains. This matters especially for incompressible resources such as memory and disk.
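As a sketch, a KubeletConfiguration like the following reserves capacity for the operating system and for the Kubernetes daemons and sets hard eviction thresholds; the concrete values are illustrative and should be tuned per node:

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    systemReserved:             # reserved for OS daemons (sshd, journald, ...)
      cpu: 500m
      memory: 512Mi
    kubeReserved:               # reserved for kubelet and the container runtime
      cpu: 500m
      memory: 512Mi
    evictionHard:               # evict Pods before the node itself runs dry
      memory.available: "200Mi"
      nodefs.available: "10%"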

On the other hand, cluster stability can be further improved by setting Pod QoS classes sensibly: Pods in different QoS classes receive different OOM scores. When resources run short, the cluster first kills Best-Effort Pods, then Burstable Pods, and only then Guaranteed Pods.

Therefore, if resources permit, set the Pod QoS class to Guaranteed: trading computing resources for business performance and stability reduces the time and cost of troubleshooting. And if you want to raise resource utilization further, keep core business services Guaranteed while assigning other services Burstable or Best-Effort according to their importance.
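The QoS class is derived entirely from how requests and limits are set; a minimal sketch (all values illustrative):

    # Guaranteed: every container sets requests equal to limits for CPU and memory.
    resources:
      requests: { cpu: 500m, memory: 512Mi }
      limits:   { cpu: 500m, memory: 512Mi }

    # Burstable: requests are set, but lower than limits (or only partially set).
    resources:
      requests: { cpu: 100m, memory: 128Mi }
      limits:   { cpu: 500m, memory: 512Mi }

    # Best-Effort: no requests and no limits on any container in the Pod.

The class Kubernetes actually assigned can be checked in the Pod's status.qosClass field.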

Below we take the KubeSphere platform as an example to demonstrate how to configure Pod-related resources easily and elegantly.

KubeSphere resource allocation in practice

We have seen that setting requests and limits reasonably is vital to the stability of the whole cluster. KubeSphere, as a distribution of Kubernetes, greatly lowers Kubernetes's learning curve; with its simple, clean UI you will find that effective operation and maintenance can be surprisingly easy. Next we demonstrate how to configure container resource quotas and limits on the KubeSphere platform.

Related concepts

Before the demonstration, let's review the relevant Kubernetes concepts.

A brief introduction to requests and limits

To achieve effective scheduling and full utilization of cluster resources, Kubernetes allocates resources at container granularity through requests and limits. Each container can set its own requests and limits independently, via the resources field of its container spec. Generally speaking, requests matters more at scheduling time, and limits matters more at run time.

    resources:
      requests:
        cpu: 50m
        memory: 50Mi
      limits:
        cpu: 100m
        memory: 100Mi
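For context, here is where this field sits in a complete Pod manifest (the Pod name and image are illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: resource-demo    # hypothetical name
    spec:
      containers:
      - name: app
        image: nginx:1.25    # hypothetical image
        resources:
          requests:
            cpu: 50m
            memory: 50Mi
          limits:
            cpu: 100m
            memory: 100Mi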

Requests defines the minimum amount of resources the corresponding container needs. For a Spring Boot business container, for instance, requests must cover at least what the JVM inside the container image requires. Specifying a memory requests of 10Mi for such a Pod is clearly unreasonable: the memory the JVM actually occupies (its -Xms heap alone) exceeds what Kubernetes has allotted to the Pod, the Pod blows past its memory budget, and Kubernetes keeps restarting it.

Limits defines the maximum amount of resources the container may consume, preventing runaway usage from causing resource shortages or even bringing a node down. Leaving limits unset means the container's resource usage is not capped. It is worth noting that when limits is set but requests is not, Kubernetes defaults requests to the same value as limits.
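For example, after API defaulting these two sketches end up equivalent:

    # Only limits is set...
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
    # ...which Kubernetes treats as if requests had also been set to
    # cpu: 100m and memory: 100Mi, which in turn makes the Pod Guaranteed.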

Further, the resources described by requests and limits fall into two categories: compressible resources (such as CPU) and incompressible resources (such as memory). Setting the limits parameter appropriately is particularly important for incompressible resources.

We already know that requests has a direct, visible impact on the final scheduling result. Limits, by contrast, is what Kubernetes uses, via the Linux kernel's cgroup mechanism, to constrain the resources actually granted to the process; for memory, it effectively tells the kernel when the container's process may be OOM-killed to free up space.
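As a rough sketch of that mapping (cgroup v1 naming; the byte figure is just the conversion of 100Mi):

    resources:
      limits:
        memory: 100Mi   # -> memory.limit_in_bytes = 104857600; exceeding it invokes the OOM killer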

To sum up:

For CPU: if the services in a Pod use more CPU than the limits, the Pod will not be killed but throttled. If limits is not set, the Pod can use all of the node's free CPU.

For memory: when a container in a Pod uses more memory than its limits, the process is OOM-killed by the kernel. After a container is killed by OOM, the system tends to restart the container in place or otherwise recreate the Pod, usually on the same machine.
