Reading GPU information with py3nvml: an example analysis

Many newcomers are unclear about how to read GPU-related information with py3nvml, so this article walks through it in detail with examples; readers with this need can follow along, and hopefully you will gain something from it.
Monitoring GPU information is a common need in deep learning and other GPU workloads. A system-level monitoring tool cannot track, step by step, how memory and utilization change inside a program, while a full profiler is often too fine-grained, and its environment setup, output, and filtering are not very convenient. In such cases, a tool like py3nvml lets you analyze GPU usage over the course of a task in detail, which helps improve GPU utilization and program performance.
Technical background
As model computation grows and hardware technology advances, using GPUs has gradually become the mainstream way to implement many algorithms. To inspect GPU occupancy during a run, such as the video memory used at each step, we need more detailed tools for reading GPU information. Here we recommend py3nvml to monitor a running Python process.
General information reading
Generally speaking, nvidia-smi is commonly used to read information such as GPU utilization, video memory occupation, driver version and so on:
$ nvidia-smi
Wed Jan 12 15:52:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:03:00.0  On |                  N/A |
| 30%   39C    P8    20W / 125W |    538MiB /  7979MiB |     16%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     On   | 00000000:A6:00.0 Off |                  N/A |
| 30%   32C    P8     7W / 125W |      6MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1643      G   /usr/lib/xorg/Xorg              412MiB   |
|    0   N/A  N/A      2940      G   /usr/bin/gnome-shell             76MiB   |
|    0   N/A  N/A     47102      G   ...AAAAAAAAA= --shared-files     35MiB   |
|    0   N/A  N/A    172424      G   ...AAAAAAAAA= --shared-files     11MiB   |
|    1   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                4MiB   |
+-----------------------------------------------------------------------------+
But without a profiler, the output of the nvidia-smi command alone does not let you analyze in detail how things change while a program runs. As an aside, here is a more polished little tool that is very similar to nvidia-smi: gpustat. It can be installed and managed directly with pip:
$ python3 -m pip install gpustat
Collecting gpustat
  Downloading gpustat-0.6.0.tar.gz (78 kB)
     |████████████████████████████████| 78 kB 686 kB/s
Requirement already satisfied: six>=1.7 in /home/dechin/.local/lib/python3.8/site-packages (from gpustat) (1.16.0)
Collecting nvidia-ml-py3>=7.352.0
  Downloading nvidia-ml-py3-7.352.0.tar.gz (19 kB)
Requirement already satisfied: psutil in /home/dechin/.local/lib/python3.8/site-packages (from gpustat) (5.8.0)
Collecting blessings>=1.6
  Downloading blessings-1.7-py3-none-any.whl (18 kB)
Building wheels for collected packages: gpustat, nvidia-ml-py3
  Building wheel for gpustat (setup.py) ... done
  Created wheel for gpustat: filename=gpustat-0.6.0-py3-none-any.whl size=12617 sha256=4158e741b609c7a1bc6db07d76224db51cd7656a6f2e146e0b81185ce4e960ba
  Stored in directory: /home/dechin/.cache/pip/wheels/0d/d9/80/b6cbcdc9946c7b50ce35441cc9e7d8c5a9d066469ba99bae44
  Building wheel for nvidia-ml-py3 (setup.py) ... done
  Created wheel for nvidia-ml-py3: filename=nvidia_ml_py3-7.352.0-py3-none-any.whl size=19191 sha256=70cd8ffc92286944ad9f5dc4053709af76fc0e79928dc61b98a9819a719f1e31
  Stored in directory: /home/dechin/.cache/pip/wheels/b9/b1/68/cb4feab29709d4155310d29a421389665dcab9eb3b679b527b
Successfully built gpustat nvidia-ml-py3
Installing collected packages: nvidia-ml-py3, blessings, gpustat
Successfully installed blessings-1.7 gpustat-0.6.0 nvidia-ml-py3-7.352.0
When used, it is also very similar to nvidia-smi:
$ watch --color -n1 gpustat -cpu
The returned result is as follows:
Every 1.0s: gpustat -cpu                      ubuntu2004: Wed Jan 12 15:58:59 2022

ubuntu2004            Wed Jan 12 15:58:59 2022  470.42.01
[0] Quadro RTX 4000 | 39°C,   3 % |   537 / 7979 MB | root:Xorg/1643(412M) dechin:gnome-shell/2940(75M) dechin:slack/47102(35M) dechin:chrome/172424(11M)
[1] Quadro RTX 4000 | 32°C,   0 % |     6 / 7982 MB | root:Xorg/1643(4M)
The gpustat output includes general information such as the GPU model, utilization and video memory usage, as well as the current GPU temperature.
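If you would rather collect this same summary from inside a Python script than in a terminal, one simple option is to call the gpustat command and capture its text output. The snippet below is a minimal sketch of our own, not something shown in the original material; it only assumes that the gpustat executable installed above is on the PATH:

import subprocess

# Run the gpustat CLI installed above and capture its plain-text summary.
result = subprocess.run(["gpustat"], capture_output=True, text=True, check=True)
print(result.stdout)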
Installation and use of py3nvml
Now let's look at installing and using py3nvml, a library for viewing and monitoring GPU information in real time from Python. It, too, can be installed and managed with pip:
$ python3 -m pip install py3nvml
Collecting py3nvml
  Downloading py3nvml-0.2.7-py3-none-any.whl (55 kB)
     |████████████████████████████████| 55 kB 650 kB/s
Requirement already satisfied: xmltodict in /home/dechin/anaconda3/lib/python3.8/site-packages (from py3nvml) (0.12.0)
Installing collected packages: py3nvml
Successfully installed py3nvml-0.2.7

Binding GPU cards with py3nvml
To maximize performance, some frameworks grab all of the GPU cards in the resource pool by default when they initialize. The following example, using Jax, demonstrates this:
In [1]: import py3nvml
In [2]: from jax import numpy as jnp
In [3]: X = jnp.ones(1000000000)
In [4]: !nvidia-smi
Wed Jan 12 16:08:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:03:00.0  On |                  N/A |
| 30%   30C    P0    38W / 125W |   7245MiB /  7979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     On   | 00000000:A6:00.0 Off |                  N/A |
| 30%   35C    P0    35W / 125W |    101MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1643      G   /usr/lib/xorg/Xorg              412MiB   |
|    0   N/A  N/A      2940      G   /usr/bin/gnome-shell             75MiB   |
|    0   N/A  N/A     47102      G   ...AAAAAAAAA= --shared-files     35MiB   |
|    0   N/A  N/A    172424      G   ...AAAAAAAAA= --shared-files     11MiB   |
|    0   N/A  N/A    812125      C   /usr/local/bin/python          6705MiB   |
|    1   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                4MiB   |
|    1   N/A  N/A    812125      C   /usr/local/bin/python            93MiB   |
+-----------------------------------------------------------------------------+
Here we only allocated enough video memory to hold a single vector, yet after initialization Jax automatically occupied both local GPU cards. Following the method documented by Jax, we can configure an environment variable so that Jax sees only one of the cards and does not spread onto the other:
In [1]: import os
In [2]: os.environ["CUDA_VISIBLE_DEVICES"] = "1"
In [3]: from jax import numpy as jnp
In [4]: X = jnp.ones(1000000000)
In [5]: !nvidia-smi
Wed Jan 12 16:10:36 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:03:00.0  On |                  N/A |
| 30%   40C    P8    19W / 125W |    537MiB /  7979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     On   | 00000000:A6:00.0 Off |                  N/A |
| 30%   35C    P0    35W / 125W |   7195MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1643      G   /usr/lib/xorg/Xorg              412MiB   |
|    0   N/A  N/A      2940      G   /usr/bin/gnome-shell             75MiB   |
|    0   N/A  N/A     47102      G   ...AAAAAAAAA= --shared-files     35MiB   |
|    0   N/A  N/A    172424      G   ...AAAAAAAAA= --shared-files     11MiB   |
|    1   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                4MiB   |
|    1   N/A  N/A    813030      C   /usr/local/bin/python          7187MiB   |
+-----------------------------------------------------------------------------+
The result shows that only one GPU card is used, which achieves our goal, but doing this through environment variables is not very pythonic. py3nvml therefore provides the same feature: you can specify which GPU cards should be used to run the task:
In [1]: import py3nvml
In [2]: from jax import numpy as jnp
In [3]: py3nvml.grab_gpus(num_gpus=1, gpu_select=[1])
Out[3]: 1
In [4]: X = jnp.ones(1000000000)
In [5]: !nvidia-smi
Wed Jan 12 16:12:37 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:03:00.0  On |                  N/A |
| 30%   30C    P8    20W / 125W |    537MiB /  7979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     On   | 00000000:A6:00.0 Off |                  N/A |
| 30%   36C    P0    35W / 125W |   7195MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1643      G   /usr/lib/xorg/Xorg              412MiB   |
|    0   N/A  N/A      2940      G   /usr/bin/gnome-shell             75MiB   |
|    0   N/A  N/A     47102      G   ...AAAAAAAAA= --shared-files     35MiB   |
|    0   N/A  N/A    172424      G   ...AAAAAAAAA= --shared-files     11MiB   |
|    1   N/A  N/A      1643      G   /usr/lib/xorg/Xorg                4MiB   |
|    1   N/A  N/A    814673      C   /usr/local/bin/python          7187MiB   |
+-----------------------------------------------------------------------------+
You can see that only one GPU card is used in the result, which has the same effect as the previous step.
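As a side note, grab_gpus can also be left to pick a free card on its own instead of being given an explicit gpu_select list. The sketch below is our own illustration built around the call shown above; it assumes that grab_gpus returns the number of cards it actually grabbed (as Out[3]: 1 above suggests) and that it restricts the process by setting CUDA_VISIBLE_DEVICES, so the GPU framework should be imported only afterwards:

import os
import py3nvml

# Ask py3nvml for any one free card before the GPU framework is imported.
n = py3nvml.grab_gpus(num_gpus=1)
if n == 0:
    # Depending on the py3nvml version this case may also surface as a warning.
    raise RuntimeError("no free GPU card could be grabbed")
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))

from jax import numpy as jnp  # imported after the card has been grabbed
X = jnp.ones(1000000000)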
View free GPU
The criterion py3nvml uses to decide whether a GPU in the environment is available is that no processes are running on it; such a card is treated as a usable GPU:
In [1]: import py3nvml
In [2]: free_gpus = py3nvml.get_free_gpus()
In [3]: free_gpus
Out[3]: [True, True]
Note that system applications (such as the Xorg processes above) are not counted here; it appears that only compute processes are considered when deciding whether a card is free.
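For example, the boolean list returned by get_free_gpus can be combined with the CUDA_VISIBLE_DEVICES approach shown earlier to bind the current process to the first idle card. The helper logic below is a minimal sketch of our own and not part of py3nvml:

import os
import py3nvml

# Pick the first card reported as free and expose only that one to this
# process; fail loudly if every card is busy.
free_gpus = py3nvml.get_free_gpus()                 # e.g. [True, True]
free_ids = [i for i, is_free in enumerate(free_gpus) if is_free]
if not free_ids:
    raise RuntimeError("no free GPU card available")
os.environ["CUDA_VISIBLE_DEVICES"] = str(free_ids[0])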
Command line information acquisition
Much like nvidia-smi, py3nvml can also be used from the command line, via py3smi. It is worth mentioning that monitoring GPU usage in real time with nvidia-smi usually requires wrapping it in watch -n, whereas py3smi does not need this: py3smi -l achieves the same effect directly.
$ py3smi -l 5
Wed Jan 12 16:17:37 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI                                        Driver Version: 470.42.01 |
+---------------------------------+----------------------+--------------------+
| GPU Fan  Temp Perf Pwr:Usage/Cap|     Memory-Usage     | GPU-Util Compute M.|
+=================================+======================+====================+
|   0 30%   39C    8   19W / 125W |    537MiB /  7979MiB |       0%   Default |
|   1 30%   33C    8    7W / 125W |      6MiB /  7982MiB |       0%   Default |
+---------------------------------+----------------------+--------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU  Owner  PID  Uptime  Process Name                           Usage      |
+=============================================================================+
+-----------------------------------------------------------------------------+
A slight difference you can see is that fewer processes are listed than with nvidia-smi; system processes appear to be ignored automatically.
Check the driver version and video card model separately
py3nvml also exposes the driver version and the GPU model through dedicated functions:
In [1]: from py3nvml.py3nvml import *
In [2]: nvmlInit()
In [3]: print("DriverVersion: {}".format(nvmlSystemGetDriverVersion()))
DriverVersion: 470.42.01
In [4]: deviceCount = nvmlDeviceGetCount()
   ...: for i in range(deviceCount):
   ...:     handle = nvmlDeviceGetHandleByIndex(i)
   ...:     print("Device {}: {}".format(i, nvmlDeviceGetName(handle)))
   ...:
Device 0: Quadro RTX 4000
Device 1: Quadro RTX 4000
In [5]: nvmlShutdown()
This way we do not need to sift through text output by hand, which is more convenient in terms of flexibility and extensibility.
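Since every such query has to sit between nvmlInit() and nvmlShutdown(), it can be convenient to pair the two calls up once. The context manager below is a small wrapper of our own, not part of py3nvml:

from contextlib import contextmanager
from py3nvml.py3nvml import nvmlInit, nvmlShutdown, nvmlSystemGetDriverVersion

# Pair nvmlInit with nvmlShutdown so queries can live in a with-block and
# the library is shut down even if an exception is raised in between.
@contextmanager
def nvml_session():
    nvmlInit()
    try:
        yield
    finally:
        nvmlShutdown()

# Example usage:
with nvml_session():
    print("DriverVersion: {}".format(nvmlSystemGetDriverVersion()))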
View video memory information separately
Similarly, video memory usage is exposed through dedicated functions, so users do not need to filter it out of text output themselves, and the information is fairly detailed:
In [1]: from py3nvml.py3nvml import *
In [2]: nvmlInit()
In [3]: handle = nvmlDeviceGetHandleByIndex(0)
In [4]: info = nvmlDeviceGetMemoryInfo(handle)
In [5]: print("Total memory: {} MiB".format(info.total >> 20))
Total memory: 7979 MiB
In [6]: print("Free memory: {} MiB".format(info.free >> 20))
Free memory: 7441 MiB
In [7]: print("Used memory: {} MiB".format(info.used >> 20))
Used memory: 537 MiB
If you insert code like this into your program, you can see how the occupied video memory changes at each step.
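As a minimal sketch of what that might look like, the helper below wraps the calls demonstrated above; the log_gpu_memory function and its tags are our own invention, not part of py3nvml:

from py3nvml.py3nvml import (nvmlInit, nvmlShutdown,
                             nvmlDeviceGetHandleByIndex,
                             nvmlDeviceGetMemoryInfo)

# Print the used video memory of one card with a label, so that successive
# calls show how each step of the program changes the usage.
def log_gpu_memory(tag, index=0):
    handle = nvmlDeviceGetHandleByIndex(index)
    info = nvmlDeviceGetMemoryInfo(handle)
    print("[{}] GPU {} used memory: {} MiB".format(tag, index, info.used >> 20))

nvmlInit()
log_gpu_memory("before step")
# ... run one step of the GPU workload here ...
log_gpu_memory("after step")
nvmlShutdown()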
That concludes this example analysis of reading GPU-related information with py3nvml; hopefully you have found it helpful.