What is the detection process of the TensorFlow implementation of YOLOv2?


Many beginners are not clear about the detection process of the TensorFlow implementation of YOLOv2. To help solve this problem, this article walks through it in detail; anyone who needs it can learn from it, and hopefully you will gain something.

1. The code is explained file by file as follows:

1. model_darknet19.py: the yolo2 network model, Darknet-19.

YOLOv2 uses a new backbone model (feature extractor) called Darknet-19, which contains 19 convolution layers and 5 maxpooling layers. The design principle of Darknet-19 is consistent with that of the VGG16 model: it mainly uses 3×3 convolutions, and after each 2×2 maxpooling layer the spatial size of the feature map is halved while the number of channels is doubled.

The main features are:

(1) The fully connected (fc) layers are removed.

This greatly reduces the number of parameters in the network, and it is also understood to be what allows yolo2 to have each cell generate several bounding boxes, with each bounding box carrying its own independent set of class probabilities.

Moreover, the network downsamples by a factor of 32, which also allows it to accept images of any size, so yolo2 adds the Multi-Scale Training improvement: the input image is resized to different sizes (the paper chooses ten sizes, 320, 352, ..., 608; after 32× downsampling these correspond to feature maps from 10×10 to 19×19). Every 10 epochs the image is resized to a different size and training continues. Such a model adapts to different input sizes: a large input (608×608) gives high accuracy at low speed, while a small input (320×320) gives lower accuracy at high speed. This increases the model's robustness to inputs of different sizes; a minimal sketch of the schedule is shown below.
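As an illustration, here is a minimal TensorFlow 1.x sketch of such a schedule (the function name and the placeholder-based resize are assumptions for illustration, not the article's training code):

```python
import numpy as np
import tensorflow as tf

# Ten sizes 320, 352, ..., 608: all multiples of 32, so the 32x-downsampled
# feature map ranges from 10x10 to 19x19.
SCALES = list(range(320, 609, 32))

def pick_input_size(epoch):
    """Pick one size per 10-epoch block (the same size is reused within a block)."""
    rng = np.random.RandomState(epoch // 10)
    return int(rng.choice(SCALES))

images = tf.placeholder(tf.float32, [None, None, None, 3])
size = tf.placeholder(tf.int32, [2])            # fed with (s, s) from pick_input_size(epoch)
resized = tf.image.resize_images(images, size)  # the same graph handles every resolution
```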

(2) A BN layer is added after every convolution layer, and dropout is no longer used.

This not only speeds up the model's convergence but also provides a certain regularization effect, reducing overfitting. A minimal sketch of such a block follows.
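For illustration, one such block in TensorFlow 1.x (the function name and the leaky-ReLU slope of 0.1 are assumptions, not taken from model_darknet19.py):

```python
import tensorflow as tf

def conv_bn_leaky(x, filters, kernel_size, is_training, name):
    """One Darknet-19-style block: conv (no bias) -> batch norm -> leaky ReLU.
    No dropout anywhere; BN supplies the regularization."""
    with tf.variable_scope(name):
        x = tf.layers.conv2d(x, filters, kernel_size, padding='same', use_bias=False)
        x = tf.layers.batch_normalization(x, training=is_training)
        return tf.nn.leaky_relu(x, alpha=0.1)
```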

(3) Cross-layer connections are used to exploit Fine-Grained Features.

The input image size of YOLOv2 is 416×416. After 5 maxpooling operations (32× downsampling), a 13×13 feature map is obtained, and convolutions predict on this feature map. Small objects, however, may lose their features after 5 layers of maxpooling, so yolo2 introduces the passthrough layer: the earlier feature map has twice the spatial resolution of the later one, and the passthrough layer extracts each 2×2 local region of the earlier layer and converts it into the channel dimension. A 26×26×512 feature map, processed by the passthrough layer, becomes a new 13×13×2048 feature map, which can then be concatenated with the following 13×13×1024 feature map to form a 13×13×3072 feature map on which the convolutional prediction is made. In the actual implementation, the author borrows from the ResNet network and does not process the high-resolution feature map directly; instead an intermediate convolution layer is added that first applies 64 1×1 convolution kernels and then the passthrough, so that the 26×26×512 feature map yields a 13×13×256 feature map. This is a small implementation detail.
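The passthrough operation maps directly onto TensorFlow's tf.space_to_depth. A minimal sketch with the dimensions quoted above, using the 1×1-conv variant (tensor names are illustrative):

```python
import tensorflow as tf

fine = tf.placeholder(tf.float32, [None, 26, 26, 512])     # earlier, higher-resolution map
coarse = tf.placeholder(tf.float32, [None, 13, 13, 1024])  # later map fed to the head

squeezed = tf.layers.conv2d(fine, 64, 1, padding='same')   # ResNet-style 1x1 conv: 26x26x64
passthrough = tf.space_to_depth(squeezed, block_size=2)    # each 2x2 patch -> channels: 13x13x256
merged = tf.concat([passthrough, coarse], axis=-1)         # 13x13x1280 for the prediction conv
```

Without the intermediate 1×1 convolution, tf.space_to_depth(fine, 2) would produce the 13×13×2048 map and the 13×13×3072 concatenation described above.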

2. decode.py: decodes the raw outputs of the darknet19 network; a minimal sketch of the idea follows.
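As an illustration of what such decoding involves, here is a minimal TensorFlow 1.x sketch (function and variable names are assumptions, not necessarily those in decode.py). For each cell and anchor, the network predicts (tx, ty, tw, th, to) plus class scores; the center offsets go through a sigmoid and the sizes through an exponential:

```python
import tensorflow as tf

def decode_sketch(feats, anchors, num_classes):
    """feats: raw darknet19 output; anchors: list of (pw, ph) in grid-cell units."""
    H, W, B = 13, 13, len(anchors)                 # assumed 13x13 grid with B anchors per cell
    scale = tf.constant([W, H], tf.float32)
    feats = tf.reshape(feats, [-1, H, W, B, 5 + num_classes])
    # per-cell offsets (cx, cy) for every grid position
    cx, cy = tf.meshgrid(tf.range(W, dtype=tf.float32), tf.range(H, dtype=tf.float32))
    grid = tf.reshape(tf.stack([cx, cy], axis=-1), [1, H, W, 1, 2])
    xy = (tf.nn.sigmoid(feats[..., 0:2]) + grid) / scale                     # center in [0, 1]
    wh = tf.exp(feats[..., 2:4]) * tf.constant(anchors, tf.float32) / scale  # size in [0, 1]
    conf = tf.nn.sigmoid(feats[..., 4:5])                                    # objectness confidence
    probs = tf.nn.softmax(feats[..., 5:])                                    # class probabilities
    # convert center/size to the (xmin, ymin, xmax, ymax) corners used downstream
    boxes = tf.concat([xy - wh / 2.0, xy + wh / 2.0], axis=-1)
    return boxes, conf, probs
```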

3. utils.py: helper functions, including preprocessing the input image, filtering bounding boxes with NMS, and drawing the filtered bounding boxes.

The main point worth explaining here is the IOU computation: when yolo2 matches anchors to ground truth, the IOU considers only shape. The center points of the anchor and the ground truth are first shifted to the same position (the top-left corner of the cell), and then the corresponding IOU value is computed.

The tricky part of the IOU computation is the intersection: first determine whether the boxes intersect at all, then compute the IOU. There is a trick here: only the coordinates of the intersection's top-left and bottom-right corners are needed, and they are obtained by taking max and min:
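A minimal, self-contained sketch of that trick in plain Python (the box layout (xmin, ymin, xmax, ymax) is assumed). For the shape-only matching described above, both boxes would first be expressed with a common center, e.g. as (-w/2, -h/2, w/2, h/2):

```python
def iou(box1, box2):
    """IOU of two boxes, each given as (xmin, ymin, xmax, ymax)."""
    # top-left corner of the intersection: the larger of the two mins
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    # bottom-right corner of the intersection: the smaller of the two maxes
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    # if the boxes do not overlap, the clamped width/height are zero
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)
```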

4. main.py: the YOLO_v2 main function.

The corresponding program has three steps:

(1) feed the image into the darknet19 network to get the feature map, and decode it to obtain bounding boxes represented by (xmin, ymin, xmax, ymax), confidences, and class probabilities

(2) filter the decoded bounding boxes with NMS (a minimal sketch follows after this list)

(3) draw the filtered bounding boxes
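As a companion to step (2), a minimal greedy NMS sketch (the threshold is an assumption, and it reuses the iou() helper from the sketch above rather than the actual utils.py code):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """boxes: (N, 4) array of (xmin, ymin, xmax, ymax); scores: (N,). Returns kept indices."""
    order = np.argsort(scores)[::-1]          # process the highest-scoring boxes first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # drop every remaining box that overlaps the kept box too strongly
        mask = np.array([iou(boxes[best], boxes[i]) < iou_threshold for i in rest], dtype=bool)
        order = rest[mask]
    return keep
```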

Operating environment:

Python3 + Tensorflow1.5 + OpenCV-python3.3.1 + Numpy1.13

Both Windows and Ubuntu environments work.

Preparatory work:

Download the pretrained yolo2 detection model and put it in the yolo2_model folder:

https://pan.baidu.com/s/1ZeT5HerjQxyUZ_L9d3X52w

Document description:

1. model_darknet19.py: the yolo2 network model, Darknet-19

2. decode.py: decodes the raw outputs of the darknet19 network

3. utils.py: helper functions, including preprocessing the input image, filtering bounding boxes with NMS, and drawing the filtered bounding boxes

4. config.py: configuration file, containing the anchor sizes and the names of the 80 classes of the COCO dataset

5. main.py: the YOLO_v2 main function; the program has three steps:

(1) feed the image into the darknet19 network to get the feature map, and decode it to obtain bounding boxes represented by (xmin, ymin, xmax, ymax), confidences, and class probabilities

(2) filter the decoded bounding boxes with NMS

(3) draw the filtered bounding boxes

6. loss.py: the Yolo_v2 loss function (used during training; it is not called during prediction). It applies three matching rules:

(1) The anchor with the highest IOU matches the ground truth, and the corresponding prediction box is responsible for predicting it: errors are computed for xywh, confidence c (target value 1), and the class probabilities p.

(2) Prediction boxes whose anchor IOU is below a certain threshold: only the confidence c error (target value 0) is computed.

(3) Prediction boxes whose anchor IOU is above the threshold but is not the maximum: discarded; no error is computed. (A small sketch of these rules follows.)
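An illustrative NumPy sketch of these three rules, viewed from a single ground truth (the threshold value 0.6 and the function name are assumptions, not taken from loss.py):

```python
import numpy as np

def confidence_masks(ious, threshold=0.6):
    """ious: float array (num_anchors,), IOU of each anchor's prediction with the ground truth."""
    best = int(np.argmax(ious))
    obj_mask = np.zeros_like(ious)     # rule (1): the best anchor is responsible, so it
    obj_mask[best] = 1.0               # gets xywh, confidence (target 1), and class errors
    noobj_mask = (ious < threshold).astype(ious.dtype)  # rule (2): confidence error, target 0
    noobj_mask[best] = 0.0
    # rule (3): anchors with IOU >= threshold that are not the best appear in
    # neither mask, so they contribute no error at all
    return obj_mask, noobj_mask
```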

7. yolo2_data folder: contains the input image car.jpg to be detected, the output image detection.jpg produced by the detector, and coco_classes.txt with the 80 COCO class names

Run main.py to get the result images:

1. car.jpg: the input image to be detected

2. detection.jpg: visualization of the detection results

As you can see, compared with yolo1, yolo2's detection accuracy improves after anchors are introduced (the class confidences for car and person are much higher), and giving each bounding box its own set of class probabilities solves yolo1's problem that only one object can be detected when multiple object centers fall in the same cell (both persons on the left are detected). Overall this is a clear improvement over yolo1.

Hopefully reading the above has been helpful to you. If you want to learn more, further related articles on this topic are worth reading; thank you for your support.
