This article shows how to implement GPU acceleration of the lattice (grid point) algorithm with Python3. The content is easy to understand and clearly organized; I hope it helps resolve your doubts as we study "how Python3 can implement GPU acceleration of the lattice algorithm" together.
Technical background
Mathematics and physics are full of continuous function models. When we handle these problems with modern computers, we usually cannot treat the continuous model directly; in most cases it has to be converted into a discrete model and then computed numerically, for example numerical integration or numerical second derivatives (Hessian matrices). The lattice (grid point) algorithm introduced here is a typical discretization method, and discretizing space in this way can greatly reduce the amount of computation. For example, in molecular dynamics simulation, when building the neighbor list, if the grid method is not used we have to search all atoms in the whole space, compute every pairwise distance, and then decide whether each pair are neighbors. With the grid method, we first map the atoms onto discrete grid cells, and then, when building the neighbor list, we only need to check whether the atoms in the 27 adjacent cells (in three-dimensional space) satisfy the neighbor condition. In this article we mainly discuss how to implement the lattice algorithm on the GPU.
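As an illustration only (this sketch is not part of the article's code), the cell-list idea described above can be expressed as: bucket atoms by their grid cell, then for each atom scan only the 27 cells surrounding its own cell. The function name neighbor_pairs and the cutoff parameter are my own, purely illustrative choices.

# A minimal sketch of the cell-list idea, assuming each atom's grid index is already known.
from collections import defaultdict
from itertools import product
import numpy as np

def neighbor_pairs(crd, grids, cutoff):
    """Find neighbor pairs by scanning only the 27 cells around each atom's cell."""
    # Bucket atom indices by their integer grid cell (nx, ny, nz).
    cells = defaultdict(list)
    for idx, cell in enumerate(map(tuple, grids.astype(int))):
        cells[cell].append(idx)
    pairs = []
    for i, (nx, ny, nz) in enumerate(map(tuple, grids.astype(int))):
        # Only the 27 adjacent cells need to be visited, not the whole space.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((nx + dx, ny + dy, nz + dz), ()):
                if j > i and np.linalg.norm(crd[i] - crd[j]) < cutoff:
                    pairs.append((i, j))
    return pairs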
Implementation of Lattice algorithm
First, let's use an example to illustrate what we mean by placing atoms on a grid. For a system where the coordinates of all atoms are given, we want to obtain the grid cell [nx,ny,nz] that each atom falls into. Let's first look at the CPU implementation, which is a simple traversal algorithm:
# cuda_grid.py
from numba import jit
from numba import cuda
import numpy as np

def grid_by_cpu(crd, rxyz, atoms, grids):
    """Transform coordinates [x,y,z] into grids [nx,ny,nz].
    Args:
        crd(list): The 3-D coordinates of atoms.
        rxyz(list): The list includes xmin,ymin,zmin,grid_size.
        atoms(int): The total number of atoms.
        grids(list): The transformed grids matrix.
    """
    for i in range(atoms):
        grids[i][0] = int((crd[i][0]-rxyz[0])/rxyz[3])
        grids[i][1] = int((crd[i][1]-rxyz[1])/rxyz[3])
        grids[i][2] = int((crd[i][2]-rxyz[2])/rxyz[3])
    return grids

if __name__ == '__main__':
    np.random.seed(1)
    atoms = 4
    grid_size = 0.1
    crd = np.random.random((atoms, 3)).astype(np.float32)
    xmin = min(crd[:, 0])
    ymin = min(crd[:, 1])
    zmin = min(crd[:, 2])
    xmax = max(crd[:, 0])
    ymax = max(crd[:, 1])
    zmax = max(crd[:, 2])
    xgrids = int((xmax-xmin)/grid_size) + 1
    ygrids = int((ymax-ymin)/grid_size) + 1
    zgrids = int((zmax-zmin)/grid_size) + 1
    rxyz = np.array([xmin, ymin, zmin, grid_size], dtype=np.float32)

    grids = np.ones_like(crd)*(-1)
    grids = grids.astype(np.float32)
    grids_cpu = grid_by_cpu(crd, rxyz, atoms, grids)

    print(crd)
    print(grids_cpu)

    # Visualize the atoms (red dots) and the grid (black lines) in the x-y plane.
    import matplotlib.pyplot as plt
    plt.figure()
    plt.plot(crd[:, 0], crd[:, 1], 'o', color='red')
    for grid in range(ygrids+1):
        plt.plot([xmin, xmin+grid_size*xgrids],
                 [ymin+grid_size*grid, ymin+grid_size*grid], color='black')
    for grid in range(xgrids+1):
        plt.plot([xmin+grid_size*grid, xmin+grid_size*grid],
                 [ymin, ymin+grid_size*ygrids], color='black')
    plt.savefig('Atom_Grids.png')
The output is as follows:
$python3 cuda_grid.py
[[4.17021990e-01 7.20324516e-01 1.14374816e-04]
[3.02332580e-01 1.46755889e-01 9.23385918e-02]
[1.86260208e-01 3.45560730e-01 3.96767467e-01]
[5.38816750e-01 4.19194520e-01 6.85219526e-01]]
[[2. 5. 0.]
[1. 0. 0.]
[0. 1. 3.]
[3. 2. 6.]]
The two printouts above correspond to [x,y,z] and [nx,ny,nz] respectively. For example, the first atom is placed in the grid cell numbered [2, 5, 0]. To make the gridding method easier to understand, we take this three-dimensional atomic system together with its grid labels, keep only the first two dimensions, and visualize the result.
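As a quick sanity check (not part of the article's script), the grid index of the first atom can be reproduced by hand from the printed values, where xmin, ymin and zmin are the column-wise minima of the coordinate matrix:

# Reproduce the first atom's grid index from the printed coordinates.
x, y, z = 4.17021990e-01, 7.20324516e-01, 1.14374816e-04
xmin, ymin, zmin, grid_size = 1.86260208e-01, 1.46755889e-01, 1.14374816e-04, 0.1
nx = int((x - xmin) / grid_size)  # int(2.307...) -> 2
ny = int((y - ymin) / grid_size)  # int(5.735...) -> 5
nz = int((z - zmin) / grid_size)  # int(0.0)      -> 0
print(nx, ny, nz)  # 2 5 0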
We can see that the red dots are where the atoms are, and the black lines are the grid we have drawn. When there are a large number of atoms, a single grid cell may contain many atoms, so how to lay out the grid and how to choose the grid size are empirical parameters that differ between scenarios and need to be explored in practice.
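For instance, here is a minimal sketch (not from the original article) of how one might inspect how many atoms land in each cell, which can help when tuning the grid size; the function name cell_occupancy is my own and `grids` is assumed to be the (atoms, 3) array of cell indices computed above:

# Count how many atoms fall into each occupied grid cell.
import numpy as np

def cell_occupancy(grids):
    """Return each occupied cell and the number of atoms it contains."""
    cells, counts = np.unique(grids.astype(np.int32), axis=0, return_counts=True)
    return cells, counts

# Example usage: cells, counts = cell_occupancy(grids_cpu); print(counts.max())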
Acceleration of lattice algorithm
In the implementation above, we mainly rely on a for loop. Naturally, we can turn to the just-in-time compilation supported by numba and to GPU hardware acceleration. Here, let's first compare the calculation results of the three implementations:
# cuda_grid.py
from numba import jit
from numba import cuda
import numpy as np

def grid_by_cpu(crd, rxyz, atoms, grids):
    """Transform coordinates [x,y,z] into grids [nx,ny,nz].
    Args:
        crd(list): The 3-D coordinates of atoms.
        rxyz(list): The list includes xmin,ymin,zmin,grid_size.
        atoms(int): The total number of atoms.
        grids(list): The transformed grids matrix.
    """
    for i in range(atoms):
        grids[i][0] = int((crd[i][0]-rxyz[0])/rxyz[3])
        grids[i][1] = int((crd[i][1]-rxyz[1])/rxyz[3])
        grids[i][2] = int((crd[i][2]-rxyz[2])/rxyz[3])
    return grids

@jit
def grid_by_jit(crd, rxyz, atoms, grids):
    """Transform coordinates [x,y,z] into grids [nx,ny,nz].
    Args:
        crd(list): The 3-D coordinates of atoms.
        rxyz(list): The list includes xmin,ymin,zmin,grid_size.
        atoms(int): The total number of atoms.
        grids(list): The transformed grids matrix.
    """
    for i in range(atoms):
        grids[i][0] = int((crd[i][0]-rxyz[0])/rxyz[3])
        grids[i][1] = int((crd[i][1]-rxyz[1])/rxyz[3])
        grids[i][2] = int((crd[i][2]-rxyz[2])/rxyz[3])
    return grids

@cuda.jit
def grid_by_gpu(crd, rxyz, grids):
    """Transform coordinates [x,y,z] into grids [nx,ny,nz].
    Args:
        crd(list): The 3-D coordinates of atoms.
        rxyz(list): The list includes xmin,ymin,zmin,grid_size.
        grids(list): The transformed grids matrix.
    """
    # Each thread handles one (atom, dimension) element of the output matrix.
    i, j = cuda.grid(2)
    grids[i][j] = int((crd[i][j]-rxyz[j])/rxyz[3])

if __name__ == '__main__':
    np.random.seed(1)
    atoms = 4
    grid_size = 0.1
    crd = np.random.random((atoms, 3)).astype(np.float32)
    xmin = min(crd[:, 0])
    ymin = min(crd[:, 1])
    zmin = min(crd[:, 2])
    xmax = max(crd[:, 0])
    ymax = max(crd[:, 1])
    zmax = max(crd[:, 2])
    xgrids = int((xmax-xmin)/grid_size) + 1
    ygrids = int((ymax-ymin)/grid_size) + 1
    zgrids = int((zmax-zmin)/grid_size) + 1
    rxyz = np.array([xmin, ymin, zmin, grid_size], dtype=np.float32)
    crd_cuda = cuda.to_device(crd)
    rxyz_cuda = cuda.to_device(rxyz)

    grids = np.ones_like(crd)*(-1)
    grids = grids.astype(np.float32)
    grids_cpu = grid_by_cpu(crd, rxyz, atoms, grids)

    grids = np.ones_like(crd)*(-1)
    grids_jit = grid_by_jit(crd, rxyz, atoms, grids)

    grids = np.ones_like(crd)*(-1)
    grids_cuda = cuda.to_device(grids)
    # Launch atoms*3 blocks with a single thread each, one per output element.
    grid_by_gpu[(atoms, 3), (1, 1)](crd_cuda, rxyz_cuda, grids_cuda)

    print(crd)
    print(grids_cpu)
    print(grids_jit)
    print(grids_cuda.copy_to_host())
The output is as follows:
$python3 cuda_grid.py
/ home/dechin/anaconda3/lib/python3.8/site-packages/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (12) < 2 * SM count (72) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
[[4.17021990e-01 7.20324516e-01 1.14374816e-04]
[3.02332580e-01 1.46755889e-01 9.23385918e-02]
[1.86260208e-01 3.45560730e-01 3.96767467e-01]
[5.38816750e-01 4.19194520e-01 6.85219526e-01]]
[[2. 5. 0.]
[1. 0. 0.]
[0. 1. 3.]
[3. 2. 6.]]
[[2. 5. 0.]
[1. 0. 0.]
[0. 1. 3.]
[3. 2. 6.]]
[[2. 5. 0.]
[1. 0. 0.]
[0. 1. 3.]
[3. 2. 6.]]
First, note the warning message: GPU hardware acceleration only yields an obvious benefit when the computation is dense enough. For example, if we are merely adding two numbers, there is no need for a GPU at all; but if we want to add two very large arrays, the GPU becomes very valuable. Since there are only four atoms in our example, the warning reminds us that the GPU's acceleration effect cannot show up at this scale. Here we only care about correctness: the grid results obtained by the different implementations are identical, so next we can compare the speed of the several implementations.
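As a side note, here is a minimal sketch, assuming one wants to address the low-occupancy warning, of launching the same kind of kernel with larger thread blocks. The kernel name grid_by_gpu_blocked, the block size of 128, and the bounds check are my own illustrative additions, not part of the article's code:

# Sketch: fewer blocks, more threads per block, with an in-kernel bounds check.
from numba import cuda

@cuda.jit
def grid_by_gpu_blocked(crd, rxyz, grids):
    i, j = cuda.grid(2)
    # The launch grid may be padded beyond the number of atoms, so guard the indices.
    if i < crd.shape[0] and j < 3:
        grids[i][j] = int((crd[i][j] - rxyz[j]) / rxyz[3])

# Example launch (128x3 threads per block, enough blocks to cover all atoms):
# threads_per_block = (128, 3)
# blocks = ((atoms + 127) // 128, 1)
# grid_by_gpu_blocked[blocks, threads_per_block](crd_cuda, rxyz_cuda, grids_cuda)

With this launch shape, each block computes many output elements, so the number of blocks no longer has to equal the number of atoms and the occupancy warning should disappear.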
# cuda_grid.py
from numba import jit
from numba import cuda
import numpy as np

def grid_by_cpu(crd, rxyz, atoms, grids):
    """Transform coordinates [x,y,z] into grids [nx,ny,nz].
    Args:
        crd(list): The 3-D coordinates of atoms.
        rxyz(list): The list includes xmin,ymin,zmin,grid_size.
        atoms(int): The total number of atoms.
        grids(list): The transformed grids matrix.
    """
    for i in range(atoms):
        grids[i][0] = int((crd[i][0]-rxyz[0])/rxyz[3])
        grids[i][1] = int((crd[i][1]-rxyz[1])/rxyz[3])
        grids[i][2] = int((crd[i][2]-rxyz[2])/rxyz[3])
    return grids

@jit
def grid_by_jit(crd, rxyz, atoms, grids):
    """Transform coordinates [x,y,z] into grids [nx,ny,nz].
    Args:
        crd(list): The 3-D coordinates of atoms.
        rxyz(list): The list includes xmin,ymin,zmin,grid_size.
        atoms(int): The total number of atoms.
        grids(list): The transformed grids matrix.
    """
    for i in range(atoms):
        grids[i][0] = int((crd[i][0]-rxyz[0])/rxyz[3])
        grids[i][1] = int((crd[i][1]-rxyz[1])/rxyz[3])
        grids[i][2] = int((crd[i][2]-rxyz[2])/rxyz[3])
    return grids

@cuda.jit
def grid_by_gpu(crd, rxyz, grids):
    """Transform coordinates [x,y,z] into grids [nx,ny,nz].
    Args:
        crd(list): The 3-D coordinates of atoms.
        rxyz(list): The list includes xmin,ymin,zmin,grid_size.
        grids(list): The transformed grids matrix.
    """
    i, j = cuda.grid(2)
    grids[i][j] = int((crd[i][j]-rxyz[j])/rxyz[3])

if __name__ == '__main__':
    import time
    from tqdm import trange
    np.random.seed(1)
    atoms = 100000
    grid_size = 0.1
    crd = np.random.random((atoms, 3)).astype(np.float32)
    xmin = min(crd[:, 0])
    ymin = min(crd[:, 1])
    zmin = min(crd[:, 2])
    xmax = max(crd[:, 0])
    ymax = max(crd[:, 1])
    zmax = max(crd[:, 2])
    xgrids = int((xmax-xmin)/grid_size) + 1
    ygrids = int((ymax-ymin)/grid_size) + 1
    zgrids = int((zmax-zmin)/grid_size) + 1
    rxyz = np.array([xmin, ymin, zmin, grid_size], dtype=np.float32)
    crd_cuda = cuda.to_device(crd)
    rxyz_cuda = cuda.to_device(rxyz)
    cpu_time = 0
    jit_time = 0
    gpu_time = 0
    for i in trange(100):
        grids = np.ones_like(crd)*(-1)
        grids = grids.astype(np.float32)
        time0 = time.time()
        grids_cpu = grid_by_cpu(crd, rxyz, atoms, grids)
        time1 = time.time()

        grids = np.ones_like(crd)*(-1)
        time2 = time.time()
        grids_jit = grid_by_jit(crd, rxyz, atoms, grids)
        time3 = time.time()

        grids = np.ones_like(crd)*(-1)
        grids_cuda = cuda.to_device(grids)
        time4 = time.time()
        grid_by_gpu[(atoms, 3), (1, 1)](crd_cuda, rxyz_cuda, grids_cuda)
        time5 = time.time()

        # Skip the first iteration, which includes the JIT/CUDA compilation overhead.
        if i != 0:
            cpu_time += time1 - time0
            jit_time += time3 - time2
            gpu_time += time5 - time4
    print('The time cost of CPU calculation is: {}s'.format(cpu_time))
    print('The time cost of JIT calculation is: {}s'.format(jit_time))
    print('The time cost of GPU calculation is: {}s'.format(gpu_time))
The output is as follows:
$python3 cuda_grid.py
100%|██| 100/100 [00:23