2025-01-16 Update From: SLTechnology News&Howtos
In this article, the editor shares how to mount an NVIDIA graphics card in Docker to run PyTorch. Most people do not know much about this, so it is shared here for your reference; I hope you learn a lot after reading it. Let's get to it!
Host operating environment
$ uname -a
Linux CentOS 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 GNU/Linux
$ cat /usr/local/cuda/version.txt
CUDA Version 8.0.61
$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 6
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 21
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
# GPU: NVIDIA 1080ti
1. Mounting the GPU
(1) Specify device mounts when running docker
First, check the related device files:
$ ls -la /dev | grep nvidia
crw-rw-rw- 1 root root 195,   0 Nov 15 13:41 nvidia0
crw-rw-rw- 1 root root 195,   1 Nov 15 13:41 nvidia1
crw-rw-rw- 1 root root 195, 255 Nov 15 13:41 nvidiactl
crw-rw-rw- 1 root root 242,   0 Nov 15 13:41 nvidia-uvm
crw-rw-rw- 1 root root 242,   1 Nov 15 13:41 nvidia-uvm-tools
There are two graphics cards installed in this machine. I need to run PyTorch, but the official PyTorch image on Docker Hub has no GPU support, so for now I could only try pulling an Anaconda image first; this can be turned into a Dockerfile later.
$ docker run -it -d --rm --name pytorch -v /home/qiyafei/pytorch:/mnt/home --privileged=true --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia1:/dev/nvidia1 --device /dev/nvidiactl:/dev/nvidiactl okwrtdsh/anaconda3 bash
The okwrtdsh image seems to be aimed at their own lab's GPU environment; it is a bit too large, but it runs acceptably. Inside the container, you still need to install PyTorch:
$ conda install pytorch torchvision -c pytorch
torch itself ran successfully here, but it failed to load the graphics card, probably because of a driver mismatch; the driver would need to be reinstalled, so I did not pursue this approach for now.
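A quick way to confirm whether torch can actually see the card inside the container is to query torch.cuda. This is a minimal sketch; the gpu_report helper is my own naming for illustration, not something from the original article:

```python
def gpu_report(cuda_available, device_count):
    """Format a one-line status for the CUDA check."""
    if cuda_available:
        return "CUDA available: %d device(s)" % device_count
    return "CUDA not available: check --device mounts and driver versions"

try:
    import torch
    # Inside a correctly mounted container this should report the GPUs.
    print(gpu_report(torch.cuda.is_available(), torch.cuda.device_count()))
except ImportError:
    # torch not installed yet (e.g. before the conda install step)
    print(gpu_report(False, 0))
```

If this reports that CUDA is unavailable even though nvidia-smi works on the host, a driver/library version mismatch (as suspected above) is a likely cause.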
2. Using the graphics card in Docker through nvidia-docker
Details: https://github.com/NVIDIA/nvidia-docker
(1) Install nvidia-docker
nvidia-docker is actually a plug-in for the Docker engine, designed specifically for NVIDIA GPUs. The Docker engine itself does not support the NVIDIA driver, and after installing this plug-in, CUDA can be used directly in user space. Look at the picture above: it is very vivid and also shows the running mechanism of the Docker engine, namely that a container OS user space is virtualized on top of the system kernel through cgroups and namespaces. I don't know whether this runs in ring 0, but CUDA and applications can be used. (If you care about this kind of virtualization question, you can look into the implementations of Docker, KVM, etc., which are hot topics in systems courses.)
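To make the mechanism concrete, the --device flags that the plain docker run command in section 1 spelled out by hand (and that nvidia-docker effectively injects for you) can be sketched as follows. The nvidia_device_flags helper is a hypothetical illustration, not part of nvidia-docker's API:

```python
def nvidia_device_flags(gpu_indices):
    """Build the --device flags needed to expose NVIDIA devices to a container.

    /dev/nvidiactl and /dev/nvidia-uvm are always required; add one
    /dev/nvidiaN node per GPU you want the container to see.
    """
    devices = ["/dev/nvidiactl", "/dev/nvidia-uvm"]
    devices += ["/dev/nvidia%d" % i for i in gpu_indices]
    flags = []
    for dev in devices:
        flags += ["--device", "%s:%s" % (dev, dev)]
    return flags

# Example: expose GPU 1 only, as in the docker run command in section 1
print(" ".join(nvidia_device_flags([1])))
```

nvidia-docker also bind-mounts the matching user-space driver libraries into the container, which is why it avoids the driver-mismatch problem seen with the manual approach.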
Download the rpm package: https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
You can also install it by adding an apt or yum source list, but I don't have root permission, and an update can easily cause Docker to restart, which is not recommended in a shared lab environment because it may break programs that others are running (a young man at the company once ran yum update on Aliyun, resulting in part of the company's business being suspended for a morning).
$ sudo rpm -i nvidia-docker-1.0.1-1.x86_64.rpm && rm nvidia-docker-1.0.1-1.x86_64.rpm
$ sudo systemctl start nvidia-docker
(2) Container testing
We also need the official nvidia/cuda container provided by NVIDIA, which already has CUDA and cuDNN compiled and installed. You can also just run directly; any missing image will be pulled automatically.
$ docker pull nvidia/cuda
$ nvidia-docker run --rm nvidia/cuda nvidia-smi
Testing inside the container shows the NVIDIA graphics card can be used successfully:
(3) A suitable image, or a homemade Dockerfile
A suitable image: FloydHub's pytorch is recommended. Pay attention to the corresponding CUDA and cuDNN versions:
$ docker pull floydhub/pytorch:0.3.0-gpu.cuda8cudnn6-py3.22
$ nvidia-docker run -ti -d --rm floydhub/pytorch:0.3.0-gpu.cuda8cudnn6-py3.22 bash
Homemade Dockerfile
First of all, think clearly about what needs to be installed:
1. The base image must be the one officially provided by NVIDIA; it is the easiest way to get CUDA and cuDNN.
2. vim, git, lrzsz and ssh are must-haves.
3. Anaconda and PyTorch are definitely wanted.
Therefore, you need to prepare a domestic-mirror sources.list, otherwise installation is very slow:
deb-src http://archive.ubuntu.com/ubuntu xenial main restricted # Added by software-properties
deb http://mirrors.aliyun.com/ubuntu/ xenial main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial main restricted multiverse universe # Added by software-properties
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted multiverse universe # Added by software-properties
deb http://mirrors.aliyun.com/ubuntu/ xenial universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb http://mirrors.aliyun.com/ubuntu/ xenial multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse # Added by software-properties
deb http://archive.canonical.com/ubuntu xenial partner
deb-src http://archive.canonical.com/ubuntu xenial partner
deb http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted multiverse universe # Added by software-properties
deb http://mirrors.aliyun.com/ubuntu/ xenial-security universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-security multiverse
The download address for Anaconda is https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh; it is downloaded directly in the Dockerfile, as shown below:
$ vim Dockerfile

FROM nvidia/cuda
LABEL author="qyf"
ENV PYTHONIOENCODING=utf-8
RUN mv /etc/apt/sources.list /etc/apt/sources.list.bak
ADD $PWD/sources.list /etc/apt/sources.list
RUN apt-get update --fix-missing && \
    apt-get install -y vim net-tools curl wget git bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 mercurial subversion apt-transport-https software-properties-common
RUN apt-get install -y openssh-server
RUN echo 'root:passwd' | chpasswd
RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN sed -i 's/#PasswordAuthentication yes/PasswordAuthentication yes/' /etc/ssh/sshd_config
RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
    wget --quiet https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh -O ~/anaconda.sh && \
    /bin/bash ~/anaconda.sh -b -p /opt/conda && \
    rm ~/anaconda.sh
RUN apt-get install -y grep sed dpkg && \
    TINI_VERSION=`curl https://github.com/krallin/tini/releases/latest | grep -o "/v.*\"" | sed 's:^..\(.*\).$:\1:'` && \
    curl -L "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini_${TINI_VERSION}.deb" > tini.deb && \
    dpkg -i tini.deb && \
    rm tini.deb && \
    apt-get clean
ENV PATH /opt/conda/bin:$PATH
RUN conda install pytorch torchvision -c pytorch -y
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/bin/bash"]
Build the image with docker build:
$ docker build -t pytorch/cuda8 ./
The call to CUDA was successful.
3. About some bugs
Part of the Debian configuration above was copied from the anaconda image on Docker Hub; it is not strictly needed here, and the image still works after running either way. The system may then report an error:
kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1
Someone has come up with a solution, and I at least believe the stated cause: it is triggered by a TCP socket error in the kernel. Here are some thoughts, with the architecture diagram above in mind: the container running on top of nvidia-docker/docker can use the underlying graphics card (the driver clearly sits below Docker), and I guess the TCP socket is used in a similar way, while the virtualized container OS should not have permission to access the host kernel directly; at least some permissions are restricted by the kernel.
The above is the entire content of the article "How Docker mounts an NVIDIA graphics card to run PyTorch". Thank you for reading! I hope the content shared here helps you; if you want to learn more, welcome to follow the industry information channel.