What are the implementation methods of Python non-maximum suppression

Updated 2025-07-19. From: SLTechnology News&Howtos. Shulou (Shulou.com), 06/02 report.

This article walks through several ways to implement non-maximum suppression (NMS) in Python. Many people have questions about how to implement it in practice, so I have consulted a range of materials and collected simple, easy-to-use approaches, which I hope will answer those questions. Let's get started.

I. A few notes

1. A brief description of Cython:

Cython is a tool for quickly building Python extension modules. Syntactically, it is a hybrid of Python and C. When Python performance hits a bottleneck, Cython brings native C speed into a Python program without the program having to be rewritten in C, and it integrates easily with existing Python code, which greatly improves both development efficiency and execution efficiency. Cython takes care of all the intermediate steps for us.

2. A brief introduction to NMS:

NMS is used in two places in Faster R-CNN. The first is during both training and prediction: when ProposalCreator generates proposals, only some of them are needed, so NMS is used to filter them. The second is during prediction only: once the 300 classification and coordinate-offset results are obtained, NMS has to be applied to each category in turn. Why not simply take the highest-confidence box for each category? Because an image may contain more than one instance of a category. For example, if a picture contains several people, taking only the highest-confidence box would detect just one of them. With NMS, ideally every person (every instance of every category) ends up with exactly one bounding box.
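The per-category filtering described above can be sketched in plain Python. This is a minimal illustration, not the Faster R-CNN code: the names `iou` and `nms_per_class`, the detection tuples and the 0.5 threshold are all assumptions made for the example.

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2) with inclusive pixel coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def nms_per_class(detections, thresh):
    """detections: list of (label, score, box). Returns the kept detections."""
    kept = []
    for label in sorted({d[0] for d in detections}):
        # Candidates of this class only, highest score first.
        candidates = sorted((d for d in detections if d[0] == label),
                            key=lambda d: d[1], reverse=True)
        while candidates:
            best = candidates.pop(0)
            kept.append(best)
            # Drop every remaining box that overlaps the kept one too much.
            candidates = [d for d in candidates if iou(best[2], d[2]) <= thresh]
    return kept


# Hypothetical detections: two people, each detected twice, plus one dog.
detections = [
    ("person", 0.95, (10, 10, 60, 120)),
    ("person", 0.80, (12, 14, 62, 118)),    # near-duplicate of the first person
    ("person", 0.90, (200, 20, 250, 130)),
    ("person", 0.60, (204, 22, 252, 128)),  # near-duplicate of the second person
    ("dog", 0.70, (100, 80, 160, 130)),
]
result = nms_per_class(detections, thresh=0.5)
for label, score, box in result:
    print(label, score, box)
```

Taking only the top-scoring "person" box would report one person. Here both people survive, each with a single box, while the two near-duplicates are suppressed.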

II. Four implementations

1. Pure Python implementation: nms_py.py

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    Created on Mon May 7 21:45:37 2018
    @author: lps
    """
    import numpy as np

    # Each row is [x1, y1, x2, y2, score]. Four of the six sample rows are
    # garbled in the source and are left out; only the legible rows are shown.
    boxes = np.array([[230, 240, 325, 330, 0.81],
                      [220, 230, 315, 340, 0.9]])


    def py_cpu_nms(dets, thresh):
        # dets: (m, 5) array, thresh: scalar
        x1 = dets[:, 0]
        y1 = dets[:, 1]
        x2 = dets[:, 2]
        y2 = dets[:, 3]
        areas = (y2 - y1 + 1) * (x2 - x1 + 1)
        scores = dets[:, 4]
        keep = []
        index = scores.argsort()[::-1]   # indices sorted by descending score
        while index.size > 0:
            i = index[0]   # the first index always has the highest score, so keep it
            keep.append(i)
            # corners of the overlap between box i and every remaining box
            x11 = np.maximum(x1[i], x1[index[1:]])
            y11 = np.maximum(y1[i], y1[index[1:]])
            x22 = np.minimum(x2[i], x2[index[1:]])
            y22 = np.minimum(y2[i], y2[index[1:]])
            w = np.maximum(0, x22 - x11 + 1)   # width of the overlap
            h = np.maximum(0, y22 - y11 + 1)   # height of the overlap
            overlaps = w * h
            ious = overlaps / (areas[i] + areas[index[1:]] - overlaps)
            # The source is cut off after "ious"; the three lines below follow
            # the standard greedy NMS loop and are a reconstruction.
            idx = np.where(ious <= thresh)[0]
            index = index[idx + 1]   # +1 because ious was computed over index[1:]
        return keep

[...]

    cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
        return a if a >= b else b

    cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
        return a if a <= b else b

    [...] thresh:
                suppressed[j] = 1
        return keep

    import matplotlib.pyplot as plt

    def plot_bbox(dets, c='k'):   # the default colour is an assumption
        x1 = dets[:, 0]
        y1 = dets[:, 1]
        x2 = dets[:, 2]
        y2 = dets[:, 3]
        plt.plot([x1, x2], [y1, y1], c)   # top edge
        plt.plot([x1, x1], [y1, y2], c)   # left edge
        plt.plot([x1, x2], [y2, y2], c)   # bottom edge
        plt.plot([x2, x2], [y1, y2], c)   # right edge
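The overlap computation inside py_cpu_nms can be checked by hand on a single pair of boxes. The sketch below repeats the same arithmetic step by step with plain Python numbers. The two boxes are illustrative assumptions, and the intermediate names mirror those used in the function.

```python
# Two boxes as [x1, y1, x2, y2]; the values are illustrative assumptions.
box_a = [220, 230, 315, 340]
box_b = [230, 240, 325, 330]

# Corners of the intersection rectangle.
x11 = max(box_a[0], box_b[0])   # 230
y11 = max(box_a[1], box_b[1])   # 240
x22 = min(box_a[2], box_b[2])   # 315
y22 = min(box_a[3], box_b[3])   # 330

# The +1 treats the coordinates as inclusive pixel indices.
w = max(0, x22 - x11 + 1)       # 86
h = max(0, y22 - y11 + 1)       # 91
overlap = w * h                 # 7826

area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)   # 96 * 111 = 10656
area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)   # 96 * 91 = 8736
iou = overlap / (area_a + area_b - overlap)                      # 7826 / 11566

print(round(iou, 4))  # 0.6766
```

With a threshold of 0.7 the lower-scoring of these two boxes would survive, because 0.6766 does not exceed the threshold. With a threshold of 0.6 it would be suppressed.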

Here, statically typing the variables greatly improves efficiency, because these are the variables that do most of the computation. The main change is to declare them with cdef.

Then create setup2.py as above:

    from distutils.core import setup
    from Cython.Build import cythonize

    setup(
        name='nms_module',
        ext_modules=cythonize('nums_py2.pyx'),
    )

After building, copy the resulting shared library (.so) into the nms folder, modify the test script as above, and run it:

Thresh=0.7, time wastes:0.0019

Thresh=0.8, time wastes:0.0028

Thresh=0.9, time wastes:0.0036

Compared with pure Python, this is 15, 38 and 118 times faster, respectively.
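The test script itself is not reproduced in this text, so the following is only a sketch of how timings like these might be collected. Everything in it is an assumption: `simple_nms` is a stand-in for whichever implementation is being measured (the pure Python function or the compiled module), and the random boxes, the seed and the repeat count are arbitrary.

```python
import random
import time


def simple_nms(dets, thresh):
    """Stand-in greedy NMS over rows of [x1, y1, x2, y2, score]."""
    order = sorted(range(len(dets)), key=lambda i: dets[i][4], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        survivors = []
        for j in order:
            w = max(0, min(dets[i][2], dets[j][2]) - max(dets[i][0], dets[j][0]) + 1)
            h = max(0, min(dets[i][3], dets[j][3]) - max(dets[i][1], dets[j][1]) + 1)
            inter = w * h
            area_i = (dets[i][2] - dets[i][0] + 1) * (dets[i][3] - dets[i][1] + 1)
            area_j = (dets[j][2] - dets[j][0] + 1) * (dets[j][3] - dets[j][1] + 1)
            if inter / (area_i + area_j - inter) <= thresh:
                survivors.append(j)
        order = survivors
    return keep


def time_nms(nms_fn, dets, thresh, repeats=5):
    """Return (best wall-clock seconds over `repeats` runs, kept indices)."""
    best = float("inf")
    keep = None
    for _ in range(repeats):
        start = time.perf_counter()
        keep = nms_fn(dets, thresh)
        best = min(best, time.perf_counter() - start)
    return best, keep


# Hypothetical test data: 200 random boxes with random scores.
random.seed(0)
dets = []
for _ in range(200):
    x1, y1 = random.randint(0, 300), random.randint(0, 300)
    dets.append([x1, y1, x1 + random.randint(20, 100),
                 y1 + random.randint(20, 100), random.random()])

results = {}
for thresh in (0.7, 0.8, 0.9):
    results[thresh] = time_nms(simple_nms, dets, thresh)
    print("thresh=%.1f, time: %.4f s, kept %d boxes"
          % (thresh, results[thresh][0], len(results[thresh][1])))
```

Taking the best of several runs reduces noise from other processes. To compare implementations, the same `dets` and thresholds would be passed to each function in turn.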

4. Building on method 3, use the GPU: gpu_nms.pyx

    import numpy as np
    cimport numpy as np

    assert sizeof(int) == sizeof(np.int32_t)

    cdef extern from "gpu_nms.hpp":
        void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int)

    def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh,
                np.int32_t device_id=0):
        cdef int boxes_num = dets.shape[0]
        cdef int boxes_dim = dets.shape[1]
        cdef int num_out
        cdef np.ndarray[np.int32_t, ndim=1] \
            keep = np.zeros(boxes_num, dtype=np.int32)
        cdef np.ndarray[np.float32_t, ndim=1] \
            scores = dets[:, 4]
        # sort by descending score before handing the boxes to the GPU
        cdef np.ndarray[np.int_t, ndim=1] \
            order = scores.argsort()[::-1]
        cdef np.ndarray[np.float32_t, ndim=2] \
            sorted_dets = dets[order, :]
        _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id)
        keep = keep[:num_out]
        # keep indexes the sorted array; map it back to the original order
        return list(order[keep])
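The last line of the wrapper, `order[keep]`, is easy to misread: the kernel sees the boxes sorted by score, so the indices it returns refer to positions in the sorted array and must be translated back. A minimal sketch with plain lists, where the scores and the kernel output are made up for illustration:

```python
scores = [0.72, 0.80, 0.92, 0.81]  # hypothetical scores, in input order

# Equivalent of scores.argsort()[::-1]: input indices, best score first.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical result of _nms: positions within the *sorted* array.
keep_sorted = [0, 2]

# Equivalent of order[keep]: translate back to input indices.
keep_original = [order[k] for k in keep_sorted]

print(order)          # [2, 3, 1, 0]
print(keep_original)  # [2, 1]
```

Without this mapping the caller would pick rows 0 and 2 of the unsorted input, which are different boxes.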

Then create the file gpu_nms.hpp:

    void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
              int boxes_dim, float nms_overlap_thresh, int device_id);

And the file nms_kernel.cu:

    #include "gpu_nms.hpp"
    #include [...]
    #include [...]

    #define CUDA_CHECK(condition) \
      /* Code block avoids redefinition of cudaError_t error */ \
      do { \
        cudaError_t error = condition; \
        if (error != cudaSuccess) [...]

      [...] return;

      const int row_size =
          min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
      const int col_size =
          min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);

      __shared__ float block_boxes[threadsPerBlock * 5];
      if (threadIdx.x < col_size) {
        block_boxes[threadIdx.x * 5 + 0] =
            dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0];
        block_boxes[threadIdx.x * 5 + 1] =
            dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1];
        block_boxes[threadIdx.x * 5 + 2] =
            dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2];
        block_boxes[threadIdx.x * 5 + 3] =
            dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3];
        block_boxes[threadIdx.x * 5 + 4] =
            dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4];
      }
      __syncthreads();

      if (threadIdx.x < row_size) {
        const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
        const float *cur_box = dev_boxes + cur_box_idx * 5;
        int i = 0;
        unsigned long long t = 0;
        int start = 0;
        if (row_start == col_start) {
          start = threadIdx.x + 1;
        }
        for (i = start; i < col_size; i++) {
          if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) {
            t |= 1ULL [...]
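The legible part of the kernel suggests a bitmask scheme: for each box and each block of columns, a thread builds an integer `t` whose bit `i` is set when the box overlaps box `i` of that block by more than the threshold. The sketch below emulates that idea on the CPU in plain Python. It is an illustration under several assumptions, not a port of the kernel: the block size of 64 (one bit per box in an unsigned long long), the `iou` helper standing in for `devIoU`, and the whole of `reduce_masks`, since the step that turns the masks into a keep list is not shown in the text. The boxes are assumed to be sorted by descending score already.

```python
THREADS_PER_BLOCK = 64  # assumed: one bit per box in an unsigned long long


def iou(a, b):
    """Stand-in for devIoU: IoU of (x1, y1, x2, y2) boxes, +1 convention assumed."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def build_masks(boxes, thresh):
    """masks[i][c] has bit k set if box i overlaps box c*64+k by more than thresh."""
    n = len(boxes)
    col_blocks = (n + THREADS_PER_BLOCK - 1) // THREADS_PER_BLOCK
    masks = [[0] * col_blocks for _ in range(n)]
    for i in range(n):
        row_block, thread = divmod(i, THREADS_PER_BLOCK)
        for c in range(col_blocks):
            col_size = min(n - c * THREADS_PER_BLOCK, THREADS_PER_BLOCK)
            # On the diagonal block, compare only against later (lower-scored) boxes.
            start = thread + 1 if row_block == c else 0
            t = 0
            for k in range(start, col_size):
                if iou(boxes[i], boxes[c * THREADS_PER_BLOCK + k]) > thresh:
                    t |= 1 << k
            masks[i][c] = t
    return masks


def reduce_masks(masks):
    """Assumed host-side pass: keep box i unless an earlier kept box flagged it."""
    col_blocks = len(masks[0]) if masks else 0
    removed = [0] * col_blocks
    keep = []
    for i in range(len(masks)):
        block, bit = divmod(i, THREADS_PER_BLOCK)
        if not (removed[block] >> bit) & 1:
            keep.append(i)
            for c in range(col_blocks):
                removed[c] |= masks[i][c]
    return keep


# Hypothetical boxes, already sorted by descending score.
boxes = [
    (220, 230, 315, 340),
    (230, 240, 325, 330),   # overlaps box 0 with IoU of about 0.68
    (100, 100, 210, 210),
    (102, 98, 208, 212),    # overlaps box 2 with IoU of about 0.93
]
masks = build_masks(boxes, 0.6)
keep = reduce_masks(masks)
print(keep)  # [0, 2]
```

The pairwise comparisons are independent of one another, which is what makes them suitable for one thread per box. Only the final pass over the masks is sequential.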
