
How Kubernetes uses NVIDIA GPU through Device Plugins

2025-03-25 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains how Kubernetes uses NVIDIA GPUs through Device Plugins. It covers plugin registration, the Device Plugin gRPC API, the plugin workflow and exception handling.

Device Plugins

Device Plugins is a beta feature in Kubernetes 1.10. It was first introduced, as alpha, in Kubernetes 1.8. Third-party device vendors use it to connect their device resources to Kubernetes through plug-ins and to offer them to containers as Extended Resources.

With Device Plugins, users do not need to modify any Kubernetes code. Third-party device vendors develop the plug-ins, and each plug-in implements the Kubernetes Device Plugin interfaces.

Notable Device Plugin implementations currently include:

NVIDIA's GPU plug-in: NVIDIA device plugin for Kubernetes

A plug-in for high-performance, low-latency RDMA cards: RDMA device plugin for Kubernetes

A plug-in for low-latency Solarflare 10 Gigabit NICs: Solarflare Device Plugin

When a Device Plugin starts, it exposes several gRPC services. It then registers with kubelet through /var/lib/kubelet/device-plugins/kubelet.sock.

Device Plugins Registration

In versions prior to Kubernetes 1.10, DevicePlugins is disabled by default, and users must enable it through the Feature Gate.

In Kubernetes 1.10, DevicePlugins is enabled by default, and users can disable it through the Feature Gate.

When the DevicePlugins Feature Gate is enabled, kubelet exposes a Register gRPC interface. A Device Plugin registers its devices by calling this Register interface.

The Register API is described as follows:

    // pkg/kubelet/apis/deviceplugin/v1beta1/api.pb.go:440
    type RegistrationServer interface {
        Register(context.Context, *RegisterRequest) (*Empty, error)
    }

    // pkg/kubelet/apis/deviceplugin/v1beta1/api.pb.go:87
    type RegisterRequest struct {
        // Version of the API the Device Plugin was built against
        Version string `protobuf:"bytes,1,opt,name=version,proto3" json:"version,omitempty"`
        // Name of the unix socket the device plugin is listening on
        // PATH = path.Join(DevicePluginPath, endpoint)
        Endpoint string `protobuf:"bytes,2,opt,name=endpoint,proto3" json:"endpoint,omitempty"`
        // Schedulable resource name. As of now it's expected to be a DNS Label
        ResourceName string `protobuf:"bytes,3,opt,name=resource_name,json=resourceName,proto3" json:"resource_name,omitempty"`
        // Options to be communicated with Device Manager
        Options *DevicePluginOptions `protobuf:"bytes,4,opt,name=options" json:"options,omitempty"`
    }

The parameters required by RegisterRequest are as follows:

Version: there are currently two versions, v1alpha and v1beta1.

Endpoint: the name of the socket exposed by the device plugin. kubelet builds the plugin's socket path from Endpoint as path.Join(DevicePluginPath, endpoint), so the socket lives in the /var/lib/kubelet/device-plugins/ directory. For example, the NVIDIA GPU Device Plugin uses /var/lib/kubelet/device-plugins/nvidia.sock.

ResourceName: must follow the Extended Resource Naming Scheme format vendor-domain/resource, for example nvidia.com/gpu.

DevicePluginOptions: passed as an additional parameter when kubelet communicates with the device plugin. It currently has only one option, PreStartRequired. This option says whether kubelet must call the Device Plugin's PreStartContainer interface before each container starts. PreStartContainer is one of the Device Plugin Interface methods in Kubernetes 1.10. The default is false. The NVIDIA device plugin returns an empty DevicePluginOptions, so PreStartRequired is false for NVIDIA GPUs:

    // github.com/NVIDIA/k8s-device-plugin/server.go:80
    func (m *NvidiaDevicePlugin) GetDevicePluginOptions(context.Context, *pluginapi.Empty) (*pluginapi.DevicePluginOptions, error) {
        return &pluginapi.DevicePluginOptions{}, nil
    }

    // vendor/k8s.io/kubernetes/pkg/kubelet/apis/deviceplugin/v1beta1/api.pb.go:71
    type DevicePluginOptions struct {
        // Indicates if PreStartContainer call is required before each container start
        PreStartRequired bool `protobuf:"varint,1,opt,name=pre_start_required,json=preStartRequired,proto3" json:"pre_start_required,omitempty"`
    }
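The following minimal Go sketch makes the RegisterRequest fields concrete. It checks a request against the rules listed above and derives the plugin socket path the way the Endpoint comment describes: path.Join(DevicePluginPath, endpoint). RegisterRequest here is a simplified local mirror, not the generated type. validateRequest and socketPath are hypothetical helpers. Two rules in it are simplifying assumptions rather than kubelet's exact validation: requiring a dot in the vendor domain, and requiring Endpoint to be a bare file name.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// DevicePluginPath is the kubelet directory described in the article.
const DevicePluginPath = "/var/lib/kubelet/device-plugins/"

// RegisterRequest is a simplified local mirror of the v1beta1 message
// (protobuf tags and Options omitted); it is not the generated type.
type RegisterRequest struct {
	Version      string
	Endpoint     string
	ResourceName string
}

// dnsLabel approximates a DNS label: lowercase alphanumerics and '-', 1-63 chars.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$`)

// validateRequest is a hypothetical check of the fields described above.
func validateRequest(r RegisterRequest) error {
	if r.Version != "v1alpha" && r.Version != "v1beta1" {
		return fmt.Errorf("unsupported version %q", r.Version)
	}
	// Assumption: vendor-domain must contain a dot and resource must be a DNS label.
	domain, name, ok := strings.Cut(r.ResourceName, "/")
	if !ok || !strings.Contains(domain, ".") || !dnsLabel.MatchString(name) {
		return fmt.Errorf("resource name %q is not vendor-domain/resource", r.ResourceName)
	}
	// Assumption: Endpoint is a bare socket file name inside DevicePluginPath.
	if r.Endpoint == "" || strings.Contains(r.Endpoint, "/") {
		return fmt.Errorf("endpoint %q must be a bare socket file name", r.Endpoint)
	}
	return nil
}

// socketPath follows the comment PATH = path.Join(DevicePluginPath, endpoint).
func socketPath(r RegisterRequest) string {
	return path.Join(DevicePluginPath, r.Endpoint)
}

func main() {
	req := RegisterRequest{Version: "v1beta1", Endpoint: "nvidia.sock", ResourceName: "nvidia.com/gpu"}
	if err := validateRequest(req); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(socketPath(req)) // /var/lib/kubelet/device-plugins/nvidia.sock
}
```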

As mentioned earlier, there are two versions of the Device Plugin Interface: v1alpha and v1beta1. The APIs for each version are as follows.

v1alpha:

/deviceplugin.Registration/Register

    // pkg/kubelet/apis/deviceplugin/v1alpha/api.pb.go:374
    var _Registration_serviceDesc = grpc.ServiceDesc{
        ServiceName: "deviceplugin.Registration",
        HandlerType: (*RegistrationServer)(nil),
        Methods: []grpc.MethodDesc{
            {MethodName: "Register", Handler: _Registration_Register_Handler},
        },
        Streams:  []grpc.StreamDesc{},
        Metadata: "api.proto",
    }

/deviceplugin.DevicePlugin/Allocate

/deviceplugin.DevicePlugin/ListAndWatch

    // pkg/kubelet/apis/deviceplugin/v1alpha/api.pb.go:505
    var _DevicePlugin_serviceDesc = grpc.ServiceDesc{
        ServiceName: "deviceplugin.DevicePlugin",
        HandlerType: (*DevicePluginServer)(nil),
        Methods: []grpc.MethodDesc{
            {MethodName: "Allocate", Handler: _DevicePlugin_Allocate_Handler},
        },
        Streams: []grpc.StreamDesc{
            {StreamName: "ListAndWatch", Handler: _DevicePlugin_ListAndWatch_Handler, ServerStreams: true},
        },
        Metadata: "api.proto",
    }

v1beta1:

/v1beta1.Registration/Register

    // pkg/kubelet/apis/deviceplugin/v1beta1/api.pb.go:466
    var _Registration_serviceDesc = grpc.ServiceDesc{
        ServiceName: "v1beta1.Registration",
        HandlerType: (*RegistrationServer)(nil),
        Methods: []grpc.MethodDesc{
            {MethodName: "Register", Handler: _Registration_Register_Handler},
        },
        Streams:  []grpc.StreamDesc{},
        Metadata: "api.proto",
    }

/v1beta1.DevicePlugin/ListAndWatch

/v1beta1.DevicePlugin/Allocate

/v1beta1.DevicePlugin/PreStartContainer

/v1beta1.DevicePlugin/GetDevicePluginOptions

    // pkg/kubelet/apis/deviceplugin/v1beta1/api.pb.go:665
    var _DevicePlugin_serviceDesc = grpc.ServiceDesc{
        ServiceName: "v1beta1.DevicePlugin",
        HandlerType: (*DevicePluginServer)(nil),
        Methods: []grpc.MethodDesc{
            {MethodName: "GetDevicePluginOptions", Handler: _DevicePlugin_GetDevicePluginOptions_Handler},
            {MethodName: "Allocate", Handler: _DevicePlugin_Allocate_Handler},
            {MethodName: "PreStartContainer", Handler: _DevicePlugin_PreStartContainer_Handler},
        },
        Streams: []grpc.StreamDesc{
            {StreamName: "ListAndWatch", Handler: _DevicePlugin_ListAndWatch_Handler, ServerStreams: true},
        },
        Metadata: "api.proto",
    }
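The paths listed above follow gRPC's full-method-name convention, /package.Service/Method. Only the proto package changes between versions: deviceplugin in v1alpha, v1beta1 in v1beta1. This small Go sketch rebuilds the lists from a table transcribed by hand from the service descriptors. serviceMethods and fullMethodNames are illustrative names, not part of any Kubernetes package.

```go
package main

import "fmt"

// serviceMethods is transcribed by hand from the service descriptors above:
// the proto package plus the Registration and DevicePlugin methods per version.
var serviceMethods = map[string]struct {
	pkg          string
	registration []string
	devicePlugin []string
}{
	"v1alpha": {"deviceplugin", []string{"Register"}, []string{"Allocate", "ListAndWatch"}},
	"v1beta1": {"v1beta1", []string{"Register"},
		[]string{"ListAndWatch", "Allocate", "PreStartContainer", "GetDevicePluginOptions"}},
}

// fullMethodNames builds gRPC full method names of the form /package.Service/Method.
// It returns nil for an unknown version.
func fullMethodNames(version string) []string {
	s, ok := serviceMethods[version]
	if !ok {
		return nil
	}
	var names []string
	for _, m := range s.registration {
		names = append(names, fmt.Sprintf("/%s.Registration/%s", s.pkg, m))
	}
	for _, m := range s.devicePlugin {
		names = append(names, fmt.Sprintf("/%s.DevicePlugin/%s", s.pkg, m))
	}
	return names
}

func main() {
	for _, v := range []string{"v1alpha", "v1beta1"} {
		fmt.Println(v, fullMethodNames(v))
	}
}
```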

Once a Device Plugin has registered successfully, it sends kubelet the list of devices it manages through ListAndWatch. On receiving this list, kubelet updates the status of its node in etcd through the API Server.
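The sketch below gives a rough idea of what the device list becomes in the node status by counting the reported devices. It assumes that kubelet's device manager reports every device under the resource's Capacity and only Healthy devices under its Allocatable (e.g. nvidia.com/gpu). Device is a simplified mirror of the v1beta1 message, nodeCounts is a hypothetical helper, and the second device ID is made up.

```go
package main

import "fmt"

// Health values used by the v1beta1 API (see constants.go).
const (
	Healthy   = "Healthy"
	Unhealthy = "Unhealthy"
)

// Device is a simplified local mirror of the v1beta1 Device message.
type Device struct {
	ID     string
	Health string
}

// nodeCounts is hypothetical: capacity counts every reported device,
// allocatable counts only the healthy ones (assumption, see lead-in).
func nodeCounts(devs []Device) (capacity, allocatable int) {
	for _, d := range devs {
		capacity++
		if d.Health == Healthy {
			allocatable++
		}
	}
	return capacity, allocatable
}

func main() {
	devs := []Device{
		{ID: "GPU-fef8089b-4820-abfc-e83e-94318197576e", Health: Healthy},
		{ID: "GPU-hypothetical-second-id", Health: Unhealthy}, // made-up ID
	}
	c, a := nodeCounts(devs)
	fmt.Printf("nvidia.com/gpu capacity=%d allocatable=%d\n", c, a) // capacity=2 allocatable=1
}
```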

Users can then request the device in the container spec's resource requests, subject to the following restrictions:

Extended Resources can only be requested as whole numbers of devices. Fractional amounts are not allowed.

Overcommitment is not supported: requests must equal limits, so the resource QoS can only be Guaranteed.

A device cannot be shared by multiple containers.
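The first two restrictions can be expressed as a simple check. This is a hypothetical sketch, not the API server's validation code. validateDeviceRequest accepts only plain integer strings, whereas real Kubernetes resource quantities have a richer syntax.

```go
package main

import (
	"fmt"
	"strconv"
)

// validateDeviceRequest is a hypothetical check of the restrictions above:
// the amount must be a whole number, and request must equal limit because
// device resources cannot be overcommitted. Only plain integer strings are
// accepted; real Kubernetes quantities ("1", "1k", "500m", ...) are richer.
func validateDeviceRequest(request, limit string) (int64, error) {
	req, err := strconv.ParseInt(request, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("request %q is not a whole number of devices", request)
	}
	lim, err := strconv.ParseInt(limit, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("limit %q is not a whole number of devices", limit)
	}
	if req < 0 {
		return 0, fmt.Errorf("request %d must not be negative", req)
	}
	if req != lim {
		return 0, fmt.Errorf("request %d must equal limit %d (no overcommit)", req, lim)
	}
	return req, nil
}

func main() {
	cases := [][2]string{{"2", "2"}, {"0.5", "0.5"}, {"1", "2"}}
	for _, c := range cases {
		n, err := validateDeviceRequest(c[0], c[1])
		fmt.Printf("request=%s limit=%s -> n=%d err=%v\n", c[0], c[1], n, err)
	}
}
```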

Device Plugins Workflow

The workflow of Device Plugins is as follows:

Initialization: after the Device Plugin starts, it performs plugin-specific initialization to make sure its devices are in the Ready state. For NVIDIA GPUs, this means loading the NVML library.

Start the gRPC service: the plugin serves gRPC through /var/lib/kubelet/device-plugins/${Endpoint}.sock. As mentioned earlier, each API version exposes a different set of service interfaces. They are described below.

v1alpha:

ListAndWatch

Allocate

    // pkg/kubelet/apis/deviceplugin/v1alpha/api.proto
    // DevicePlugin is the service advertised by Device Plugins
    service DevicePlugin {
        // ListAndWatch returns a stream of List of Devices
        // Whenever a Device state changes or a Device disappears, ListAndWatch
        // returns the new list
        rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}

        // Allocate is called during container creation so that the Device
        // Plugin can run device specific operations and instruct Kubelet
        // of the steps to make the Device available in the container
        rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
    }

v1beta1:

ListAndWatch

Allocate

GetDevicePluginOptions

PreStartContainer

    // pkg/kubelet/apis/deviceplugin/v1beta1/api.proto
    // DevicePlugin is the service advertised by Device Plugins
    service DevicePlugin {
        // GetDevicePluginOptions returns options to be communicated with Device
        // Manager
        rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}

        // ListAndWatch returns a stream of List of Devices
        // Whenever a Device state changes or a Device disappears, ListAndWatch
        // returns the new list
        rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}

        // Allocate is called during container creation so that the Device
        // Plugin can run device specific operations and instruct Kubelet
        // of the steps to make the Device available in the container
        rpc Allocate(AllocateRequest) returns (AllocateResponse) {}

        // PreStartContainer is called, if indicated by Device Plugin during registration phase,
        // before each container start. Device plugin can run device specific operations
        // such as resetting the device before making devices available to the container
        rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
    }

Register with kubelet: the Device Plugin registers with kubelet through /var/lib/kubelet/device-plugins/kubelet.sock.

Once registration succeeds, the Device Plugin enters serving mode and handles the gRPC API calls described earlier. Each v1beta1 API is analyzed below.

ListAndWatch: watches its devices for state changes and disappearance, and sends kubelet a ListAndWatchResponse, which is the device list.

    type ListAndWatchResponse struct {
        Devices []*Device `protobuf:"bytes,1,rep,name=devices" json:"devices,omitempty"`
    }

    type Device struct {
        // A unique ID assigned by the device plugin used
        // to identify devices during the communication
        // Max length of this field is 63 characters
        ID string `protobuf:"bytes,1,opt,name=ID,json=iD,proto3" json:"ID,omitempty"`
        // Health of the device, can be healthy or unhealthy, see constants.go
        Health string `protobuf:"bytes,2,opt,name=health,proto3" json:"health,omitempty"`
    }

Here is a sample Device for a GPU:

    Device{
        ID:     "GPU-fef8089b-4820-abfc-e83e-94318197576e",
        Health: "Healthy",
    }

GetDevicePluginOptions: currently has only one field, PreStartRequired.

    type DevicePluginOptions struct {
        // Indicates if PreStartContainer call is required before each container start
        PreStartRequired bool `protobuf:"varint,1,opt,name=pre_start_required,json=preStartRequired,proto3" json:"pre_start_required,omitempty"`
    }

Allocate: the Device Plugin performs device-specific operations and returns an AllocateResponse to kubelet. kubelet passes it on to dockerd, which uses it, through nvidia-docker, to set up the devices when creating the container. The Request and Response of this interface are described below.

Allocate is expected to be called during pod creation, since an allocation failure for any container results in a pod startup failure.

Allocate allows kubelet to expose additional artifacts in a pod's environment, as directed by the plugin.

Allocate allows the Device Plugin to run device-specific operations on the requested devices.

    type AllocateRequest struct {
        ContainerRequests []*ContainerAllocateRequest `protobuf:"bytes,1,rep,name=container_requests,json=containerRequests" json:"container_requests,omitempty"`
    }

    type ContainerAllocateRequest struct {
        DevicesIDs []string `protobuf:"bytes,1,rep,name=devicesIDs" json:"devicesIDs,omitempty"`
    }

    // AllocateResponse includes the artifacts that needs to be injected into
    // a container for accessing 'deviceIDs' that were mentioned as part of
    // 'AllocateRequest'.
    // Failure Handling:
    // if Kubelet sends an allocation request for dev1 and dev2.
    // Allocation on dev1 succeeds but allocation on dev2 fails.
    // The Device plugin should send a ListAndWatch update and fail the
    // Allocation request
    type AllocateResponse struct {
        ContainerResponses []*ContainerAllocateResponse `protobuf:"bytes,1,rep,name=container_responses,json=containerResponses" json:"container_responses,omitempty"`
    }

    type ContainerAllocateResponse struct {
        // List of environment variable to be set in the container to access one or more devices.
        Envs map[string]string `protobuf:"bytes,1,rep,name=envs" json:"envs,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
        // Mounts for the container.
        Mounts []*Mount `protobuf:"bytes,2,rep,name=mounts" json:"mounts,omitempty"`
        // Devices for the container.
        Devices []*DeviceSpec `protobuf:"bytes,3,rep,name=devices" json:"devices,omitempty"`
        // Container annotations to pass to the container runtime
        Annotations map[string]string `protobuf:"bytes,4,rep,name=annotations" json:"annotations,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
    }

    // DeviceSpec specifies a host device to mount into a container.
    type DeviceSpec struct {
        // Path of the device within the container.
        ContainerPath string `protobuf:"bytes,1,opt,name=container_path,json=containerPath,proto3" json:"container_path,omitempty"`
        // Path of the device on the host.
        HostPath string `protobuf:"bytes,2,opt,name=host_path,json=hostPath,proto3" json:"host_path,omitempty"`
        // Cgroups permissions of the device, candidates are one or more of
        // * r - allows container to read from the specified device.
        // * w - allows container to write to the specified device.
        // * m - allows container to create device files that do not yet exist.
        Permissions string `protobuf:"bytes,3,opt,name=permissions,proto3" json:"permissions,omitempty"`
    }

AllocateRequest is the list of device IDs.

AllocateResponse contains:

- the Envs to inject into the container;
- the device mount information, including each device's cgroup permissions;
- custom Annotations.

PreStartContainer:

PreStartContainer is expected to be called before each container start, if the plugin requested it during the registration phase.

PreStartContainer allows kubelet to pass reinitialized devices to containers.

PreStartContainer allows the Device Plugin to run device-specific operations on the requested devices.

    type PreStartContainerRequest struct {
        DevicesIDs []string `protobuf:"bytes,1,rep,name=devicesIDs" json:"devicesIDs,omitempty"`
    }

    // PreStartContainerResponse will be sent by plugin in response to PreStartContainerRequest
    type PreStartContainerResponse struct {
    }
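The minimal Go sketch below ties the Allocate request and response together as an Allocate handler. It uses simplified local mirrors of the v1beta1 messages, with Mounts, Devices and Annotations omitted. It returns one ContainerAllocateResponse per ContainerAllocateRequest. If any requested device is unknown or unhealthy, it fails the whole call, in line with the failure-handling comment in the API; the accompanying ListAndWatch update is left out. The NVIDIA_VISIBLE_DEVICES environment variable mirrors how the NVIDIA plugin tells the NVIDIA container runtime which GPUs to expose. Treat that variable and the allocate function as assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified local mirrors of the v1beta1 Allocate messages (not generated types).
type ContainerAllocateRequest struct{ DevicesIDs []string }
type AllocateRequest struct{ ContainerRequests []*ContainerAllocateRequest }
type ContainerAllocateResponse struct{ Envs map[string]string }
type AllocateResponse struct{ ContainerResponses []*ContainerAllocateResponse }

// allocate is a hypothetical Allocate handler: one response per container,
// and the whole request fails if any device ID is unknown or unhealthy.
func allocate(healthy map[string]bool, req *AllocateRequest) (*AllocateResponse, error) {
	resp := &AllocateResponse{}
	for _, creq := range req.ContainerRequests {
		for _, id := range creq.DevicesIDs {
			if !healthy[id] {
				return nil, fmt.Errorf("invalid allocation request: unknown or unhealthy device %q", id)
			}
		}
		// Assumption: expose the granted GPUs the way the NVIDIA plugin does.
		resp.ContainerResponses = append(resp.ContainerResponses, &ContainerAllocateResponse{
			Envs: map[string]string{"NVIDIA_VISIBLE_DEVICES": strings.Join(creq.DevicesIDs, ",")},
		})
	}
	return resp, nil
}

func main() {
	id := "GPU-fef8089b-4820-abfc-e83e-94318197576e"
	resp, err := allocate(map[string]bool{id: true}, &AllocateRequest{
		ContainerRequests: []*ContainerAllocateRequest{{DevicesIDs: []string{id}}},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(resp.ContainerResponses[0].Envs)
}
```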

Exception handling

Every time kubelet starts or restarts, it deletes all socket files under /var/lib/kubelet/device-plugins.

Each Device Plugin is responsible for detecting that its own socket has been deleted. It must then re-register and recreate its socket.

What should a Device Plugin do when its socket is deleted by mistake?

Let's look at how the NVIDIA Device Plugin handles this. The relevant code is as follows:

    // github.com/NVIDIA/k8s-device-plugin/main.go:15
    func main() {
        ...
        log.Println("Starting FS watcher.")
        watcher, err := newFSWatcher(pluginapi.DevicePluginPath)
        ...
        restart := true
        var devicePlugin *NvidiaDevicePlugin

    L:
        for {
            if restart {
                if devicePlugin != nil {
                    devicePlugin.Stop()
                }

                devicePlugin = NewNvidiaDevicePlugin()
                if err := devicePlugin.Serve(); err != nil {
                    log.Println("Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?")
                    log.Printf("You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites")
                    log.Printf("You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start")
                } else {
                    restart = false
                }
            }

            select {
            case event :=
