

How to install and use CUDA under Linux

2025-02-27 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains in detail how to install and use CUDA under Linux. The editor finds it very practical and shares it here for your reference; I hope you get something out of it after reading.

CUDA is a computing platform launched by graphics card manufacturer NVIDIA. It is a general-purpose parallel computing architecture that enables the GPU to solve complex computational problems.

CUDA installation steps

In general, the process for installing and using CUDA under Linux is as follows:

Install the NVIDIA Driver, i.e. the graphics card driver

Install CUDA Toolkit

Write GPU-accelerated CUDA programs using the CUDA C++ compiler or Python extension libraries

The rest of this article walks through the detailed steps for installing and using CUDA, following the process above.

Install NVIDIA Driver and CUDA Toolkit

First, check that the system has a GPU that supports CUDA programming. You can use the command

lspci | grep -i nvidia

to view the GPU model of the current system.

The operating system I use runs on a GPU-equipped virtual machine instance created with Google Cloud Compute Engine. The system version is Ubuntu 16.04 LTS, and the GPU is an NVIDIA Tesla K80. The above command outputs:

00:04.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

Note: unlike virtual machines that usually run on a local host, the virtual machine here runs directly in Google's cloud, and for it you can apply for a GPU quota and install the NVIDIA Driver and CUDA Toolkit.

Traditionally, the NVIDIA Driver and the CUDA Toolkit are installed in separate steps, but in practice we can install the CUDA Toolkit directly, and the installer will automatically install a matching NVIDIA Driver. Let's look at how to install the CUDA Toolkit.

Before installing the CUDA Toolkit, make sure that gcc and make are installed on the system. If you want to use C++ for CUDA programming, you also need to install g++. If you want to run the CUDA sample programs, you need to install the appropriate dependency libraries.

sudo apt update                       # update apt
sudo apt install gcc g++ make         # install gcc, g++ and make
sudo apt install libglu1-mesa libxi-dev libxmu-dev libglu1-mesa-dev freeglut3-dev    # install dependency libraries

Then select the system version and installation method on the CUDA Toolkit download page, and download and run the runfile.

CUDA Toolkit download page

Download CUDA Toolkit (large file):

wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run

Install the CUDA Toolkit (this takes quite a while):

sudo sh cuda_10.1.243_418.87.00_linux.run

After the CUDA Toolkit is installed, the screen will output:

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-10.1/
Samples:  Installed in /home/abneryepku/

Please make sure that
 -   PATH includes /usr/local/cuda-10.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root

This means that the NVIDIA Driver and CUDA Toolkit have been installed. The second half of the message prompts us to modify the environment variables PATH and LD_LIBRARY_PATH. Write the following in the ~/.bashrc file:

# add nvcc compiler to path
export PATH=$PATH:/usr/local/cuda-10.1/bin
# add cuBLAS, cuSPARSE, cuRAND, cuSOLVER, cuFFT to path
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64:/usr/lib/x86_64-linux-gnu

This completes the configuration of CUDA.
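The installer message above also offers an alternative to setting LD_LIBRARY_PATH: registering the library directory with the dynamic linker. A minimal sketch of that route:

echo "/usr/local/cuda-10.1/lib64" | sudo tee -a /etc/ld.so.conf
sudo ldconfig

The .bashrc route only affects your own shell sessions, while the ld.so.conf route applies system-wide.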

Notes:

1. The environment variable PATH sets the search path for executable programs, and LD_LIBRARY_PATH sets the search path for dynamic link libraries (a quick way to check that the latter takes effect is sketched after these notes).

2. cuRAND and the other CUDA dynamic libraries are located under the /usr/local/cuda-10.1/lib64 path. Prior to CUDA 10.1, cuBLAS was also under this path, but in CUDA 10.1 cuBLAS was moved to /usr/lib/x86_64-linux-gnu. You can run

sudo find / -iname libcublas*

to find the path of the cuBLAS dynamic library.

3. A CUDA Toolkit installed with Anaconda is not under the lib64 path, so the two do not conflict.
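As a quick sanity check of note 1 (a sketch; it assumes an executable named test that was compiled with -lcublas, both hypothetical at this point):

ldd ./test | grep cublas

ldd prints the path each shared library resolves to, so you can confirm that libcublas is picked up from /usr/lib/x86_64-linux-gnu as described in note 2.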

Test the sample programs

You can find some sample programs under the /usr/local/cuda-10.1/extras/demo_suite path. Among them, deviceQuery will output information about CUDA:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla K80"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    3.7
  Total amount of global memory:                 11441 MBytes (11996954624 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Max Clock rate:                            824 MHz (0.82 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z):  (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 4
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1, Device0 = Tesla K80
Result = PASS

Various characteristics of CUDA, such as texture memory, constant memory, shared memory, blocks, threads, and unified addressing, are all included in the above information. Understanding these features is the basis of CUDA C++ programming.
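To make terms like block, thread, and shared memory concrete, here is a minimal illustrative sketch (not from the CUDA samples; the kernel name blockSum and all values are made up): each block of 256 threads loads its slice of the input into shared memory, and one thread per block sums it.

#include <cstdio>
#include <cuda_runtime.h>

// each block sums its 256 input elements into one output element
__global__ void blockSum(const int *in, int *out)
{
    __shared__ int buf[256];                    // shared memory: one buffer per block
    int tid = threadIdx.x;                      // thread index within the block
    int gid = blockIdx.x * blockDim.x + tid;    // global element index

    buf[tid] = in[gid];
    __syncthreads();                            // wait until every thread has written

    if (tid == 0) {                             // one thread per block accumulates
        int s = 0;
        for (int i = 0; i < blockDim.x; ++i) s += buf[i];
        out[blockIdx.x] = s;
    }
}

int main()
{
    const int threads = 256, blocks = 4, N = threads * blocks;
    int h_in[N], h_out[blocks];
    for (int i = 0; i < N; ++i) h_in[i] = 1;    // each block should sum to 256

    int *d_in, *d_out;
    cudaMalloc(&d_in, N * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, threads>>>(d_in, d_out);

    cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < blocks; ++i) printf("block %d sum = %d\n", i, h_out[i]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

It compiles like any .cu file with nvcc, which is introduced next.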

Other programs such as bandwidthTest and vectorAdd will also test the performance of CUDA.

Configure the nvcc compiler

nvcc is the CUDA C++ compiler. It can directly compile .cu source files containing C++ syntax, and its usage is similar to gcc. nvcc is located at:

/usr/local/cuda-10.1/bin

Enter on the command line

nvcc --version

to view the version of the CUDA C++ compiler nvcc. On this machine the result is as follows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

To use nvcc to compile source files that call a CUDA library, you need to add the corresponding flag to the nvcc command. For example, the flag for cuRAND is -lcurand and the flag for cuBLAS is -lcublas. If you don't want to spell out these dynamic libraries every time you compile, you can write the following in .bashrc:

alias nvc="nvcc -std=c++11 -lcurand -lcublas"

and then use nvc to compile CUDA C++ files.
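For example, assuming a hypothetical source file named kernel.cu, the alias shortens compilation to:

nvc -o kernel kernel.cu

which expands to nvcc -std=c++11 -lcurand -lcublas -o kernel kernel.cu.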

To experience GPU programming, test a simple CUDA C++ program: the addition of two integer vectors

#include <iostream>
#include <cuda_runtime.h>

using namespace std;

__global__ void add(int *a, const int *b)
{
    int i = blockIdx.x;
    a[i] += b[i];
}

int main()
{
    const int N = 10;    // number of elements
    int *a, *b, *temp, i;

    // malloc HOST memory for temp
    temp = new int[N];

    // malloc DEVICE memory for a, b
    cudaMalloc(&a, N * sizeof(int));
    cudaMalloc(&b, N * sizeof(int));

    // set a's values: a[i] = i
    for (i = 0; i < N; i++) temp[i] = i;
    cudaMemcpy(a, temp, N * sizeof(int), cudaMemcpyHostToDevice);

    // set b's values: b[i] = 2 * i
    for (i = 0; i < N; i++) temp[i] = 2 * i;
    cudaMemcpy(b, temp, N * sizeof(int), cudaMemcpyHostToDevice);

    // launch one block per element to compute a[i] += b[i]
    add<<<N, 1>>>(a, b);

    // show a's values
    cudaMemcpy(temp, a, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (i = 0; i < N; i++) cout << temp[i] << " ";
    cout << endl;

    // free HOST and DEVICE memory
    delete[] temp;
    cudaFree(a);
    cudaFree(b);
    return 0;
}

The CUDA functions used by the above code include:

cudaMalloc: allocates GPU memory and returns a pointer to it
cudaMemcpy: copies memory between the CPU and the GPU
cudaFree: frees GPU memory given a pointer to it

Save the above code as a file test.cu, and type

nvcc -std=c++11 -o test test.cu

on the command line to generate an executable file named test. Run it, and the screen will output:

0 3 6 9 12 15 18 21 24 27

Note: the above code is only meant to test that a CUDA C++ program works and is not a reference for running efficiency.

CUDA Python programming with Numba

Numba is a Python extension library for high-performance computing. It uses a just-in-time (JIT) compilation mechanism to convert parts of Python and NumPy code into machine instructions, greatly improving the running speed of the program.
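As a quick taste of Numba's CUDA support, here is a minimal sketch (not from the original article; it assumes Numba and a working GPU, and mirrors the earlier C++ vector addition; installing the required packages is covered next):

from numba import cuda
import numpy as np

# each CUDA thread adds one element, mirroring the earlier C++ kernel
@cuda.jit
def add(a, b):
    i = cuda.grid(1)          # global thread index
    if i < a.size:
        a[i] += b[i]

N = 10
a = np.arange(N, dtype=np.int32)        # a[i] = i
b = 2 * np.arange(N, dtype=np.int32)    # b[i] = 2 * i

d_a = cuda.to_device(a)                 # copy host arrays to the GPU
d_b = cuda.to_device(b)
add[1, N](d_a, d_b)                     # launch 1 block of N threads
print(d_a.copy_to_host())               # [ 0  3  6  9 12 15 18 21 24 27]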
It is recommended to use Anaconda to manage the various Python extension libraries, including Numba, because it is much more convenient than pip. Download and install Anaconda:

wget https://repo.continuum.io/archive/Anaconda3-2019.07-Linux-x86_64.sh
sudo sh Anaconda3-2019.07-Linux-x86_64.sh

Install the CUDA Toolkit with Anaconda:

conda install -c anaconda cudatoolkit

The CUDA Toolkit installed with conda lives inside Anaconda and can only be used by Python. It and the CUDA Toolkit used for CUDA C++ are independent of each other and can exist at the same time without conflict. Note that the version of the CUDA Toolkit installed with Anaconda cannot exceed the latest CUDA version supported by the NVIDIA Driver.

Numba is one of the extension libraries that come with Anaconda. Enter numba -s on the command line to view hardware information, operating system information, the Python version, and CUDA version information. The output on this machine is:

Hardware information:

__Hardware Information__
Machine                             : x86_64
CPU Name                            : broadwell
Number of accessible CPU cores      : 4
Listed accessible CPUs cores        : 0-3
CFS restrictions                    : None
CPU Features                        : 64bit adx aes avx avx2 bmi bmi2 cmov cx16 f16c fma fsgsbase invpcid lzcnt mmx movbe pclmul popcnt prfchw rdrnd rdseed rtm sahf sse sse2 sse3 sse4.1 sse4.2 ssse3 xsave xsaveopt

Operating system information:

__OS Information__
Platform                            : Linux-4.15.0-1040-gcp-x86_64-with-debian-stretch-sid
Release                             : 4.15.0-1040-gcp
System Name                         : Linux
Version                             : #42~16.04.1-Ubuntu SMP Wed Aug 7 16:42:41 UTC 2019
OS specific info                    : debianstretch/sid
glibc info                          :

Python version:

__Python Information__
Python Compiler                     : GCC 7.3.0
Python Implementation               : CPython
Python Version                      : 3.7.3
Python Locale                       : en_US UTF-8

CUDA version:

__CUDA Information__
Found 1 CUDA devices
id 0        b'Tesla K80'        [SUPPORTED]
        compute capability: 3.7
        pci device id: 4
        pci bus id: 0
Summary:
        1/1 devices are supported
CUDA driver version                 : 10010
CUDA libraries:
Finding cublas from Conda environment
        named libcublas.so.10.2.0.168
        trying to open library...       ok
Finding cusparse from Conda environment
        named libcusparse.so.10.1.168
        trying to open library...       ok
Finding cufft from Conda environment
        named libcufft.so.10.1.168
        trying to open library...       ok
Finding curand from Conda environment
        named libcurand.so.10.1.168
        trying to open library...       ok
Finding nvvm from Conda environment
        named libnvvm.so.3.3.0
        trying to open library...       ok
Finding libdevice from Conda environment
        searching for compute_20...     ok
        searching for compute_30...     ok
        searching for compute_35...     ok
        searching for compute_50...     ok

Please refer to the official documentation for the specific use of Numba.

Use TensorFlow + CUDA for GPU-accelerated Python programming

Use conda to install the GPU version of TensorFlow:

conda install -c anaconda tensorflow-gpu

During installation you may get an environment write-permission error:

EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
Environment location: /home/abneryepku/anaconda3
uid: 1001
gid: 1002

This can be solved by modifying the folder permissions:

sudo chown -R 1001:1002 /home/abneryepku/anaconda3

After installing TensorFlow, to check whether the GPU is available, enter the Python interpreter environment and type:

import tensorflow as tf
tf.test.is_gpu_available()

to see whether a GPU is available in TensorFlow. Other extension libraries such as PyTorch can be installed in a similar way.

This is the end of the article on "How to install and use CUDA under Linux". I hope the above content is of some help to you and that you can learn more from it. If you think the article is good, please share it for more people to see.
