This article shows how to perform object detection using deep learning and OpenCV. It is intended as a practical reference.
Object detection using deep learning and OpenCV
When performing object detection with deep learning, you will typically encounter three main families of methods:
Faster R-CNNs (Ren et al., 2015)
You Only Look Once (YOLO) (Redmon et al., 2015)
Single Shot Detectors (SSD) (Liu et al., 2015)
Faster R-CNN is probably the most well-known deep learning method for object detection; however, the technique can be difficult to understand (especially for deep learning beginners), difficult to implement, and difficult to train.
In addition, even with the "faster" R-CNN implementation (where the "R" stands for "region proposal"), the algorithm can be quite slow, on the order of 7 FPS.
If pure speed is the goal, we tend to prefer YOLO, since this algorithm is much faster, handling 40-90 FPS on a Titan X GPU. The ultra-fast variant of YOLO can even reach 155 FPS.
The trade-off with YOLO is lower accuracy.
SSD, originally developed by Google researchers, strikes a balance between the two. The algorithm is also more straightforward than Faster R-CNN.
MobileNets: efficient (deep) neural networks
When building an object detection network, we usually start from an existing network architecture such as VGG or ResNet, but these architectures can be very large, on the order of 200-500 MB. Because of their size and the resulting computational cost, such architectures are not suitable for resource-constrained devices. Instead, we can use MobileNets (Howard et al., 2017), another architecture from Google researchers. These networks are called "MobileNets" because they are designed for resource-limited devices such as your smartphone. MobileNets differ from traditional CNNs in their use of depthwise separable convolution. The general idea behind depthwise separable convolution is to split the convolution into two stages:
A 3 × 3 depthwise convolution,
followed by a 1 × 1 pointwise convolution.
This allows us to substantially reduce the number of parameters in the network. The trade-off is accuracy: MobileNets are usually not as accurate as their larger counterparts, but they are far more resource-efficient.
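As a rough, illustrative sketch of the parameter savings (the layer sizes below are arbitrary examples, not values from the MobileNets paper), we can compare the weight count of a standard 3 × 3 convolution with that of a depthwise separable convolution:

# rough parameter-count comparison for one convolutional layer
# (example numbers are arbitrary; bias terms are ignored)
in_channels = 32
out_channels = 64
k = 3  # kernel size

standard = k * k * in_channels * out_channels   # 3x3 standard convolution
depthwise = k * k * in_channels                 # 3x3 depthwise convolution
pointwise = 1 * 1 * in_channels * out_channels  # 1x1 pointwise convolution
separable = depthwise + pointwise

print("standard conv parameters: ", standard)    # 18432
print("separable conv parameters:", separable)   # 2336
print("reduction factor: {:.1f}x".format(standard / separable))  # ~7.9x

As the number of output channels grows, the savings for 3 × 3 kernels approach a factor of about 9.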
Using OpenCV for object detection based on deep learning
MobileNet SSD was first trained on the COCO dataset (Common Objects in Context) and then fine-tuned on PASCAL VOC, reaching 72.7% mAP (mean average precision).
Therefore, we can detect 20 object classes in an image (21 including the background class): airplanes, bicycles, birds, boats, bottles, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorcycles, people, potted plants, sheep, sofas, trains, and TV monitors.
In this section, we will build our object detector using MobileNet SSD together with the deep neural network (dnn) module in OpenCV.
Open a new file, name it object_detection.py, and insert the following code:
import numpy as np
import cv2

if __name__ == "__main__":
    image_name = '11.jpg'
    prototxt = 'MobileNetSSD_deploy.prototxt.txt'
    model_path = 'MobileNetSSD_deploy.caffemodel'
    confidence_ta = 0.2
    # initialize the list of class labels MobileNet SSD was trained to
    # detect, then generate a set of bounding box colors for each class
    CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
               "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant",
               "sheep", "sofa", "train", "tvmonitor"]
    COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
Import the required packages.
Define the global parameters:
image_name: the path to the input image.
prototxt: the path to the Caffe prototxt file.
model_path: the path to the pre-trained model.
confidence_ta: the minimum probability threshold used to filter out weak detections; the default is 20%.
Next, we initialize the class labels and the bounding box colors.
    # load our serialized model from disk
    print("[INFO] loading model...")
    net = cv2.dnn.readNetFromCaffe(prototxt, model_path)
    # load the input image and construct an input blob for the image,
    # resizing it to a fixed 300x300 pixels
    # (note: the input to this SSD model is 300x300 pixels)
    image = cv2.imread(image_name)
    (h, w) = image.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843,
                                 (300, 300), 127.5)
    # pass the blob through the network and obtain the detections and
    # predictions
    print("[INFO] computing object detections...")
    net.setInput(blob)
    detections = net.forward()
Load the model from disk.
Read the picture.
Extract the height and width, and compute a 300 x 300 pixel blob from the image.
Put the blob into the neural network.
Compute the forward pass for the input and store the result as detections.
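Two details are worth spelling out before the detection loop. The scale factor 0.007843 passed to blobFromImage is 1/127.5, so together with the mean value of 127.5 the pixel values are roughly normalized to the range [-1, 1] before being fed to the network. For this kind of Caffe SSD model in OpenCV's dnn module, the array returned by net.forward() is typically shaped (1, 1, N, 7), where each of the N rows holds [batch_id, class_id, confidence, left, top, right, bottom] with the box coordinates normalized to [0, 1]. A minimal sketch of how one row can be unpacked (variable names here are illustrative):

# assuming `detections` comes from net.forward() as in the code above,
# its shape is usually (1, 1, N, 7) for Caffe SSD models in OpenCV dnn
print(detections.shape)              # e.g. (1, 1, 100, 7)
row = detections[0, 0, 0]            # first detection
class_id = int(row[1])               # index into CLASSES
confidence = float(row[2])           # probability in [0, 1]
left, top, right, bottom = row[3:7]  # normalized box corners in [0, 1]
# multiplying by the original width/height gives pixel coordinates,
# exactly as done with np.array([w, h, w, h]) in the loop that follows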
    # loop over the detections
    for i in np.arange(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the prediction
        confidence = detections[0, 0, i, 2]
        # filter out weak detections by ensuring the `confidence` is
        # greater than the minimum confidence
        if confidence > confidence_ta:
            # extract the index of the class label from `detections`,
            # then compute the (x, y)-coordinates of the bounding box
            # for the object
            idx = int(detections[0, 0, i, 1])
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # display the prediction
            label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
            print("[INFO] {}".format(label))
            cv2.rectangle(image, (startX, startY), (endX, endY),
                          COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(image, label, (startX, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

    # show the output image
    cv2.imshow("Output", image)
    cv2.imwrite("output.jpg", image)
    cv2.waitKey(0)
Loop over the detections; first, we extract the confidence value.
If the confidence is higher than our minimum threshold, we extract the class label index and compute the bounding box around the detected object.
Then we extract the (x, y) coordinates of the box, which we will use shortly to draw the rectangle and display the text.
Next, we build a text label containing the class name and the confidence.
Using that label, we print it to the terminal and then draw a colored rectangle around the object using the previously extracted (x, y) coordinates.
Usually you want the label to appear above the rectangle, but if there is no room, we display it just below the top of the rectangle.
Finally, the colored text is overlaid on the image using the y value we just computed.
Running result:
Using OpenCV for object detection in video
Open a new file, name it video_object_detection.py, and insert the following code:
import time

import cv2
import imutils
import numpy as np

video_name = '12.mkv'
prototxt = 'MobileNetSSD_deploy.prototxt.txt'
model_path = 'MobileNetSSD_deploy.caffemodel'
confidence_ta = 0.1
# initialize the list of class labels MobileNet SSD was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant",
           "sheep", "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(prototxt, model_path)

# initialize the video stream and the video writer
print('[INFO] starting video stream...')
vs = cv2.VideoCapture(video_name)
fps = 30            # FPS of the saved video; adjust as appropriate
size = (600, 325)   # (width, height) of the saved video
fourcc = cv2.VideoWriter_fourcc(*'XVID')
videowrite = cv2.VideoWriter('output.avi', fourcc, fps, size)
time.sleep(2.0)
Define the global parameters:
video_name: the path to the input video.
prototxt: the path to the Caffe prototxt file.
model_path: the path to the pre-trained model.
confidence_ta: the minimum probability threshold used to filter out weak detections; here it is set to 10%.
Next, we initialize the class labels and the bounding box colors.
Load the model.
Initialize the VideoCapture object.
Set up the VideoWriter object and its parameters. The value of size must match the size of the frames that are written, otherwise the video cannot be saved.
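If you do not want to hard-code size, a small sketch like the following (assuming the same video_name and resize width as in the script) reads one frame first and derives the writer size from it, so the written frames always match:

# read one probe frame to determine the size the VideoWriter should use
# (a sketch; assumes video_name, fourcc and fps are defined as above)
probe = cv2.VideoCapture(video_name)
ok, first_frame = probe.read()
probe.release()
if not ok:
    raise RuntimeError("could not read from " + video_name)
first_frame = imutils.resize(first_frame, width=1080)
(h, w) = first_frame.shape[:2]
size = (w, h)  # VideoWriter expects (width, height)
videowrite = cv2.VideoWriter('output.avi', fourcc, fps, size)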
The next step is to loop over the frames of the video and feed each one to the detector. The logic here is the same as for image detection. The code is as follows:
# loop over the frames from the video stream
while True:
    ret_val, frame = vs.read()
    if ret_val is False:
        break
    frame = imutils.resize(frame, width=1080)
    print(frame.shape)

    # grab the frame dimensions and convert the frame to a blob
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843,
                                 (300, 300), 127.5)

    # pass the blob through the network and obtain the detections and
    # predictions
    net.setInput(blob)
    detections = net.forward()

    # loop over the detections
    for i in np.arange(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the prediction
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the `confidence` is
        # greater than the minimum confidence
        if confidence > confidence_ta:
            # extract the index of the class label from `detections`,
            # then compute the (x, y)-coordinates of the bounding box
            # for the object
            idx = int(detections[0, 0, i, 1])
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # draw the prediction on the frame
            label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                          COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

    # show the output frame
    cv2.imshow("Frame", frame)
    # resize the frame to match the VideoWriter size before saving,
    # otherwise the output video cannot be written
    videowrite.write(cv2.resize(frame, size))
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

videowrite.release()

# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()

Thank you for reading! This concludes this article on how to use deep learning and OpenCV for object detection. I hope it has been helpful.