
GPU Management in Kubernetes


This article covers the installation and deployment of GPUs in Kubernetes, how GPUs are used, and how GPU container images work. After reading it, you should have a solid understanding of GPU management in Kubernetes.

Since 2016, the Kubernetes community has received a large number of requests from different sources to run machine learning frameworks such as TensorFlow on Kubernetes clusters. Besides the management of offline tasks such as Jobs, described in the previous article, these demands pose another big challenge: supporting the heterogeneous devices that deep learning depends on, above all Nvidia GPUs.

We can't help but ask: what are the benefits of having Kubernetes manage GPUs?

It essentially comes down to cost and efficiency. GPUs are expensive compared to CPUs: on the cloud, a single CPU usually costs a few cents per hour, while a single GPU costs roughly 10 to 30 yuan per hour, so we should find ways to raise GPU utilization.

Why use Kubernetes to manage heterogeneous resources such as GPUs?

Specifically, there are three aspects:

Faster deployment: containers avoid repeatedly setting up complex machine learning environments. Higher cluster resource utilization: cluster resources are scheduled and allocated in a unified way. Exclusive use of resources: containers isolate heterogeneous devices so that workloads do not interfere with each other.

The first point is faster deployment, avoiding time wasted on environment preparation. Container images solidify the entire deployment process so it can be reused; if you follow the machine learning field, you will find that many frameworks already provide container images. The second point is utilization: through time-sharing, GPU efficiency improves, and once the number of GPU cards reaches a certain scale, Kubernetes' unified scheduling is needed so that users can request and release resources on demand, keeping the whole GPU pool active.

At the same time, Docker's device isolation is needed to prevent processes from different applications from running on the same device and interfering with each other. This keeps the system stable while achieving high efficiency and low cost.

Containerization of GPU

Having seen the benefits of running GPU applications through Kubernetes, and knowing from the earlier articles in this series that Kubernetes is a container scheduling platform whose scheduling unit is the container, let us first look at how to run a GPU application in a container environment before learning how to use Kubernetes for it.

1. Running GPU applications in a container environment

Running a GPU application in a container environment is actually not complicated. There are two main steps:

Build a container image that supports the GPU; then run the image with Docker, mapping the GPU devices and the dependent driver libraries into the container.

2. How to prepare a GPU container image

There are two ways to prepare:

Use the official deep learning container image directly

For example, you can pull official GPU images directly from Docker Hub or the Aliyun image service. Popular machine learning frameworks such as TensorFlow, Caffe and PyTorch all provide standard images. The advantage of this approach is that it is simple, convenient, safe and reliable.

Build on top of Nvidia's official CUDA base image

Of course, if the official images do not meet your requirements, for example because you have made custom changes to the TensorFlow framework and need to recompile and build your own TensorFlow image, the best practice is to build on the official Nvidia base image rather than starting from scratch.

A typical TensorFlow example of this approach starts from the CUDA base image and builds the custom GPU image on top of it.
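As a minimal sketch of the idea (the CUDA base image tag and the pip-based install below are illustrative assumptions, not the exact content of the original example), building a custom TensorFlow GPU image might look like this:

# Write a Dockerfile that starts FROM an official Nvidia CUDA base image
# and layers the framework on top, then build the image.
cat > Dockerfile <<'EOF'
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install tensorflow
EOF
docker build -t my-tensorflow-gpu:latest .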

3. How GPU container images work

To understand how to build a GPU container image, you first need to know how a GPU application is installed on the host.

On the host, the stack is layered as follows: at the bottom, the Nvidia hardware driver is installed first; above it sits the general-purpose CUDA toolkit; and at the top are machine learning frameworks such as PyTorch and TensorFlow.

The CUDA toolkit in the upper two layers is tightly coupled to the application: when the application version changes, the corresponding CUDA version usually has to be updated as well. The Nvidia driver at the bottom, by contrast, is relatively stable and is not updated as frequently as CUDA and the applications.

At the same time, the Nvidia driver has to be compiled against the kernel source. Nvidia's GPU container solution is therefore to install the Nvidia driver on the host, leave everything from CUDA upward to the container image, and bind-mount the driver's library links into the container.

One benefit of this approach is that after installing a new Nvidia driver, you can run container images with different CUDA versions on the same node.

4. How to run a GPU program in a container

With this foundation, it is easier to understand how GPU containers work. Consider running a GPU container with Docker.

The only difference between a GPU container and an ordinary container at run time is that the host's GPU devices and the Nvidia driver libraries must be mapped into the container.
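As a minimal sketch of those two mappings done by hand (the device file names depend on how many GPUs the host has, and the driver library and nvidia-smi paths below are assumptions about a typical host layout):

# Map the GPU device files and the host driver files into the container manually.
docker run -it \
  --device /dev/nvidiactl \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia0 \
  -v /usr/lib/nvidia:/usr/lib/nvidia:ro \
  -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro \
  tensorflow/tensorflow:latest-gpu nvidia-smi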

After such a container starts, you can see inside it both the mapped GPU device files and the driver libraries that were bind-mounted in.

Usually you will use nvidia-docker to run GPU containers, and what nvidia-docker actually does is automate these two tasks. Mounting the devices is relatively simple; the real complexity lies in the driver libraries that a GPU application depends on.

For different scenarios such as deep learning and video processing, the required driver libraries are not the same. Working this out requires Nvidia's domain knowledge, which is what nvidia-docker encapsulates.
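As a sketch of the automated path (the --runtime=nvidia flag and the NVIDIA_VISIBLE_DEVICES variable come from nvidia-docker2; the image tag is illustrative), the same container can be started without listing devices and libraries by hand:

# nvidia-docker2's runtime injects the devices and driver libraries for us;
# NVIDIA_VISIBLE_DEVICES picks which GPU(s) the container may see.
docker run -it --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  tensorflow/tensorflow:latest-gpu nvidia-smi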

GPU management in Kubernetes

1. How to deploy a GPU-capable Kubernetes node

Let's first look at how to add GPU capability to a Kubernetes node, taking a CentOS node as an example.

The process has three steps:

The first step is to install the Nvidia driver.

Because the Nvidia driver requires kernel compilation, you need to install gcc and the kernel source before installing it.
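A minimal sketch of that preparation on CentOS (the driver installer file name is a placeholder for whatever version you download from Nvidia):

# Install the compiler and kernel headers/source needed to build the Nvidia
# kernel module, then run the downloaded driver installer.
yum install -y gcc kernel-devel-$(uname -r) kernel-headers-$(uname -r)
sh ./NVIDIA-Linux-x86_64-<version>.run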

The second step is to install nvidia-docker2 from the yum repository.

After installing nvidia-docker2 you need to reload Docker. You can then check that the default runtime in Docker's daemon.json has been switched to nvidia, or use the docker info command to verify that the runc in use at run time is Nvidia's.
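A sketch of this step and the two checks (this assumes the nvidia-docker yum repository has already been configured as described in Nvidia's documentation):

# Install nvidia-docker2 and restart Docker so the new runtime takes effect.
yum install -y nvidia-docker2
systemctl restart docker

# Check 1: daemon.json should declare nvidia as the default runtime.
cat /etc/docker/daemon.json

# Check 2: docker info should list the nvidia runtime / default runtime.
docker info | grep -i runtime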

The third step is to deploy the Nvidia Device Plugin.

Download the Device Plugin deployment manifest from Nvidia's git repository and deploy it with the kubectl create command.
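A sketch of that deployment (the exact path and version tag inside the NVIDIA/k8s-device-plugin repository change over time, so treat this URL as illustrative):

# Deploy the Nvidia Device Plugin DaemonSet from its manifest.
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml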

The Device Plugin is deployed as a DaemonSet. So if a Kubernetes node cannot schedule GPU applications, these are the components to start troubleshooting from: for example, check the Device Plugin logs, check whether Nvidia's runc is configured as Docker's default runc, and check whether the Nvidia driver was installed successfully.
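A few illustrative commands for those three checks (the label selector and namespace below are assumptions that depend on the manifest you deployed):

# 1. Device Plugin logs.
kubectl -n kube-system logs -l name=nvidia-device-plugin-ds

# 2. Is Nvidia's runtime configured as Docker's default?
docker info | grep -i "default runtime"

# 3. Is the driver installed and healthy on the host?
nvidia-smi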

2. Verifying the deployment result

When the GPU node is deployed successfully, the relevant GPU information appears in the node's status.

One field is the resource name of the GPU, here nvidia.com/gpu; the other is its count, which is 2 in this example, indicating that the node has two GPUs.
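You can check this from the node status, for example:

# Look for nvidia.com/gpu under the node's Capacity and Allocatable fields.
kubectl describe node <gpu-node-name> | grep -A 6 -i capacity
kubectl get node <gpu-node-name> -o yaml | grep "nvidia.com/gpu"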

3. A YAML sample for using a GPU in Kubernetes

From the user's point of view, using a GPU container in Kubernetes is very simple.

You only need to specify the number of GPUs under nvidia.com/gpu in the limits field of the Pod's resource configuration. In the example below the count is set to 1; then use the kubectl create command to deploy the GPU Pod.
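A minimal sketch of such a Pod (the Pod name, image and sleep command are illustrative):

# Create a Pod that requests one GPU via nvidia.com/gpu under resources.limits.
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda-container
    image: tensorflow/tensorflow:latest-gpu
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF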

4. View the running results

After the deployment is complete, you can log in to the container and run the nvidia-smi command to check the result. You can see that one T4 GPU card is visible in the container: one of the node's two GPU cards can already be used inside the container, while the other card is completely invisible to it and cannot be accessed. This is GPU isolation at work.
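For example, with the hypothetical gpu-pod from the previous sketch:

# Only the single GPU allocated to this Pod is visible inside the container.
kubectl exec -it gpu-pod -- nvidia-smi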

How it works

1. Managing GPU resources through extension mechanisms

Kubernetes itself manages GPU resources through its plug-in extension mechanisms; specifically, there are two independent mechanisms.

The first is Extended Resources, which allows users to define custom resource names. The resource is counted in integer units, and the goal is to support different heterogeneous devices, including RDMA, FPGA, AMD GPUs and so on, through a common pattern rather than a design specific to Nvidia GPUs.

The second is the Device Plugin Framework, which allows third-party device vendors to manage the full lifecycle of their devices out of tree. The Device Plugin Framework builds a bridge between Kubernetes and the Device Plugin module: on one side it reports device information to Kubernetes, and on the other it is responsible for device selection.

2. Reporting Extended Resources

Extended Resources are part of the Node-level API and can be used independently of a Device Plugin. To report an Extended Resource, you only need to update the status section of the Node object through a PATCH API call, and this PATCH can be done with a simple curl command. After that, the Kubernetes scheduler records that this node has that GPU type, with a corresponding resource count of 1.
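A sketch of that PATCH, following the pattern documented for advertising extended resources (run kubectl proxy first so the API server is reachable on localhost; the node name is a placeholder):

# Advertise one nvidia.com/gpu on the node by patching its status.capacity.
# The "~1" encodes the "/" of the resource name, per JSON-Patch rules.
kubectl proxy &
curl --header "Content-Type: application/json-patch+json" \
  --request PATCH \
  --data '[{"op": "add", "path": "/status/capacity/nvidia.com~1gpu", "value": "1"}]' \
  http://localhost:8001/api/v1/nodes/<node-name>/status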

Of course, if you are using a Device Plugin, you do not need to perform this PATCH yourself; just follow the Device Plugin programming model, and the Device Plugin will carry out this operation as part of device reporting.

3. Device Plugin working mechanism

Now for the working mechanism of the Device Plugin. Its overall workflow can be divided into two parts:

resource reporting when the plugin starts, and scheduling and running when a user requests the resource.

Developing a Device Plugin is quite simple. It mainly involves the two most important event methods:

ListAndWatch not only handles resource reporting but also provides a health-check mechanism. When a device becomes unhealthy, the plugin can report the ID of the unhealthy device to Kubernetes and let the Device Plugin Framework remove that device from the schedulable set.

Allocate is invoked on the Device Plugin when a container is being deployed. The key input parameter is the list of device IDs the container will use, and the return value contains the devices, volumes and environment variables the container needs when it starts.

4. Resource reporting and monitoring

Each hardware device needs to be managed by its own Device Plugin. These Device Plugins connect, as gRPC clients, to the Device Plugin Manager in the kubelet, and report to the kubelet the Unix socket they are listening on, the API version number, and the device name (such as GPU).
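On a node you can see this channel on disk: the kubelet's registration socket and each plugin's own Unix socket live under the kubelet device-plugins directory (the plugin socket file name depends on the plugin):

# kubelet.sock is the kubelet's registration endpoint; each Device Plugin
# listens on its own socket in the same directory.
ls /var/lib/kubelet/device-plugins/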

Let's walk through the whole Device Plugin resource-reporting process. Overall it has four steps; the first three take place on the node, and the fourth is the interaction between the kubelet and the api-server.

The first step is Device Plugin registration, which lets Kubernetes know which Device Plugins it should interact with, since there may be several devices on one node. As a client, the Device Plugin reports three things to the kubelet: who am I, that is, the name of the device the plugin manages, GPU or RDMA; where am I, that is, the file path of the Unix socket the plugin listens on, so the kubelet can call it; and the interaction protocol, that is, the API version number.

The second step is to start the service: the Device Plugin starts a gRPC server. From then on the Device Plugin serves the kubelet as this server, while the listening address and the API version were already provided in the first step.

Third, once the gRPC server is up, the kubelet establishes a long-lived connection to the Device Plugin's ListAndWatch to discover device IDs and device health. When the Device Plugin detects that a device is unhealthy, it proactively notifies the kubelet. If the device is idle, the kubelet removes it from the allocatable list; but if the device is already in use by a Pod, the kubelet does nothing, because killing the Pod at that point would be a very dangerous operation.

In the fourth step, the kubelet exposes these devices in the Node's status and reports the device count to the Kubernetes api-server. The scheduler can then make scheduling decisions based on this information.

Note that when the kubelet reports to the api-server, it only reports the number of GPUs. The kubelet's own Device Plugin Manager keeps the list of GPU IDs and uses it for concrete device assignment. The Kubernetes global scheduler does not know the GPU ID list; it only knows the GPU count.

This means that under the existing Device Plugin mechanism, the Kubernetes global scheduler cannot perform more sophisticated scheduling. For example, suppose you want affinity scheduling across two GPUs, where the two GPUs on the same node should communicate over NVLink rather than PCIe for better data transfer. That requirement cannot be satisfied by the current Device Plugin scheduling mechanism.

5. The process of scheduling and running Pod

When a Pod wants to use a GPU, it only needs to declare the GPU resource and its count (for example nvidia.com/gpu: 1) in the limits field under the Pod's resources, as in the earlier example. Kubernetes finds a node with enough GPUs, subtracts 1 from that node's GPU count, and completes the binding of the Pod to the node.

After the binding succeeds, the kubelet on that node creates the container. When the kubelet sees that the resource requested by the Pod's container is a GPU, it delegates to its internal Device Plugin Manager module, which picks an available GPU from the GPU ID list it holds and allocates it to the container.

At this point the kubelet issues an Allocate request to the local Device Plugin, whose parameter is the list of device IDs to be assigned to the container.

When the Device Plugin receives the AllocateRequest, it looks up the device path, driver directory and environment variables corresponding to the device IDs passed by the kubelet, and returns them to the kubelet as an AllocateResponse.

With the device path and driver directory information from the AllocateResponse, the kubelet assigns the GPU to the container: Docker creates the container as instructed by the kubelet, the GPU device appears inside the container, and the required driver directory is mounted. At that point the process of Kubernetes assigning a GPU to a Pod is complete.

2. Shortcomings of the Device Plugin mechanism

Finally, let's think about the question: is the current Device Plugin perfect?

It should be pointed out that the working mechanism and flow of the Device Plugin differ quite a bit from real scenarios in academia and industry. The biggest problem is that GPU resource scheduling is actually performed on the kubelet.

The global scheduler's involvement is very limited; as a traditional Kubernetes scheduler it only handles GPU counts. Once devices are heterogeneous and the requirement cannot be described by a simple number, for example a Pod that wants to run on two GPUs connected by NVLink, the Device Plugin cannot handle it at all.

Not to mention that in many scenarios we want the scheduler to schedule globally based on all the devices in the cluster, which the current Device Plugin cannot satisfy either.

What is even trickier is that the design and implementation of the Device Plugin leave no room for extensible parameters in APIs such as Allocate and ListAndWatch. When we have more complex device-usage requirements, there is effectively no way to express them by extending the API through the Device Plugin.

So the scenarios covered by the current Device Plugin design are actually quite narrow: it is usable, but not easy to use. This explains why vendors such as Nvidia have had little choice but to implement forked solutions on top of the upstream Kubernetes code.

3. Heterogeneous resource scheduling scheme based on community

After reading the above, do you have a better understanding of GPU management in Kubernetes? Thank you for reading.
