
How to Use Docker Compose to Manage GPU Resources

This article focuses on how to use Docker Compose to manage GPU resources; interested friends may wish to take a look. The method introduced here is simple, fast, and practical. Let the editor take you through how to use Docker Compose to manage GPU resources.

With the general trend toward AI development, containerization lets us migrate environments seamlessly and greatly reduces the cost of configuring them. However, configuring CUDA inside a container and running TensorFlow there can be troublesome, so we introduce the relevant tooling here.

Reference documentation:

Enabling GPU access with Compose

Runtime options with Memory, CPUs, and GPUs

The Compose Specification

The Compose Specification - Deployment support

The Compose Specification - Build support

Using GPU resources in Compose

If the corresponding drivers and runtime are correctly installed on the host where the Docker service is deployed, and the host actually has GPU graphics cards, then those GPUs can be defined and configured in Compose.

# configuration to be installed
$ apt-get install nvidia-container-runtime

Docker 19.03 and later support the --gpus flag:

# with --gpus
$ docker run -it --rm --gpus all ubuntu nvidia-smi

# use device
$ docker run -it --rm --gpus \
    device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a \
    ubuntu nvidia-smi

# specific gpu
$ docker run -it --rm --gpus '"device=0,2"' ubuntu nvidia-smi

# set nvidia capabilities
$ docker run --gpus 'all,capabilities=utility' --rm ubuntu nvidia-smi
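
If you want to confirm that the package registered the NVIDIA runtime with the Docker daemon before trying the commands above, one quick check (a sketch; the exact output varies by host) is to look for it in docker info:

# confirm that an "nvidia" entry appears in the daemon's Runtimes list
$ docker info | grep -i runtime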

In the old (v2.3) Compose file format, a service that wants to use the host's GPU resources must be configured with the runtime parameter. Although this gives the container access to the GPUs at runtime, it does not allow control over specific properties of the GPU devices.

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
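
Even in this legacy mode, you can still narrow down which cards the container sees through the NVIDIA_VISIBLE_DEVICES environment variable understood by nvidia-container-runtime; a minimal sketch, with illustrative device indices:

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    runtime: nvidia
    environment:
      # only expose GPUs 0 and 2 to the container (indices are illustrative)
      - NVIDIA_VISIBLE_DEVICES=0,2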

In Compose v1.28.0 and later, the Compose Specification file format is used, which provides configuration properties for finer-grained control of GPU resources, so our requirements can be expressed precisely at startup. Let's go through the fields one by one.

capabilities (required)

Specifies the capabilities the device must support; multiple different capabilities can be listed; this field must be configured

man 7 capabilities

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["gpu"]

count

Specifies the number of GPUs to use; the value is of type int; mutually exclusive with the device_ids field

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["tpu"]
          count: 2

device_ids

Specifies the IDs (index or UUID) of the GPU devices to use; mutually exclusive with the count field

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["gpu"]
          device_ids: ["0", "3"]

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["gpu"]
          device_ids: ["GPU-f123d1c9-26bb-df9b-1c23-4a731f61d8c7"]

driver

Specifies the driver type of the GPU device

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["nvidia-compute"]
          driver: nvidia

options

Specifies driver-specific options as key-value pairs

deploy:
  resources:
    reservations:
      devices:
        - capabilities: ["gpu"]
          driver: gpuvendor
          options:
            virtualization: false

Having gone through the fields, let's simply write a sample file that starts a CUDA container service using one GPU device, run it, and look at the output.

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      resources:
        limits:
          cpus: "0.50"
          memory: 50M
        reservations:
          cpus: "0.25"
          memory: 20M
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, utility]
      update_config:
        parallelism: 2
        delay: 10s
        order: stop-first

Note here that if you set count: 2, you will see two graphics cards in the output below. If neither the count nor the device_ids field is set, all GPUs on the host are made available to the service by default.

# run in the foreground
$ docker-compose up
Creating network "gpu_default" with the default driver
Creating gpu_test_1 ... done
Attaching to gpu_test_1
test_1  | +-----------------------------------------------------------------------------+
test_1  | | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.1     |
test_1  | |-------------------------------+----------------------+----------------------+
test_1  | | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
test_1  | | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
test_1  | |                               |                      |               MIG M. |
test_1  | |===============================+======================+======================|
test_1  | |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
test_1  | | N/A   23C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
test_1  | |                               |                      |                  N/A |
test_1  | +-------------------------------+----------------------+----------------------+
test_1  |
test_1  | +-----------------------------------------------------------------------------+
test_1  | | Processes:                                                                  |
test_1  | |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
test_1  | |        ID   ID                                                   Usage      |
test_1  | |=============================================================================|
test_1  | |  No running processes found                                                 |
test_1  | +-----------------------------------------------------------------------------+
gpu_test_1 exited with code 0
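
Incidentally, if you would rather request every GPU explicitly instead of relying on that default, the Compose Specification also accepts all as the value of count; a minimal sketch:

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # request all GPUs on the host explicitly
              count: all
              capabilities: [gpu]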

Of course, by setting the count or device_ids field, a program inside the container can use multiple graphics card resources. You can validate this with the following deployment configuration file.

services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "3"]
              capabilities: [gpu]

As shown in the output below, both graphics cards can be used.

# run in the foreground
$ docker-compose up
...
Created TensorFlow device (/device:GPU:0 with 13970 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1b.0, compute capability: 7.5)
...
Created TensorFlow device (/device:GPU:1 with 13970 MB memory) -> physical GPU (device: 1, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
...
gpu_test_1 exited with code 0
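
As a side note, newer TensorFlow code usually enumerates GPUs with tf.config.list_physical_devices('GPU') rather than tf.test.gpu_device_name(); a variant of the same service using that call (a sketch):

services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    # print the list of physical GPUs visible to TensorFlow
    command: python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "3"]
              capabilities: [gpu]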
