2025-04-04 Update From: SLTechnology News & Howtos
Shulou (Shulou.com), 06/01 Report
This article walks through a set of practical tricks for detection-network training and optimization, analyzing each problem and its solution in some detail, in the hope of giving readers a simple, workable approach.
A convolutional network learns the data distribution of its training set; the convolution-kernel parameters ultimately encode the network's understanding of the feature distribution in that data.
I am writing this article at the invitation of the uploader, to share some experience and techniques summed up at work. They may not transfer to other networks, and some may even be counterproductive, so treat them as a way of thinking rather than a recipe; criticism is welcome. Since all of this involves company data, sharing concrete test results is troublesome, so please focus on the line of reasoning.
1. Preprocessing tricks
The main purpose of tuning the image-preprocessing stage is to augment the input data so that the network pays more attention to the target features during training. Common methods are random rotation, cropping, and flipping; in essence they enrich the data set so the network sees more of the underlying distribution. Plenty of blog posts already cover these, so I will not repeat them. Another class of tricks superimposes information onto the image, for example adding Gaussian or salt-and-pepper noise to the input, which improves the network's detection ability under interference and poor imaging. This article focuses on this class of tricks.
1. Why does overlaying information on the original image help, without destroying the original image information?
Superimposing information on the image falls into two categories. The first is superimposed noise, whose purpose is to adapt the network to detection on poor-quality images. Measured on-device, superimposing noise can also improve, to some extent, the network's robustness to affine transformations of the target in the input image. Adding a moderate amount of Gaussian noise to the input images improved the accuracy of a yolov3-tiny-based license-plate detector by 0.5% on the Hi3516CV500.
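As a concrete sketch of the noise-superposition idea, the helpers below add Gaussian and salt-and-pepper noise to a uint8 image with numpy. The sigma and amount values are illustrative defaults, not the settings used in the license-plate experiment.

```python
import numpy as np

def add_gaussian_noise(image, sigma=10.0, seed=0):
    """Add zero-mean Gaussian noise to a uint8 image and clip back to [0, 255]."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_pepper_noise(image, amount=0.02, seed=0):
    """Flip roughly `amount` of the pixels to pure black or pure white."""
    rng = np.random.default_rng(seed)
    noisy = image.copy()
    mask = rng.random(image.shape[:2])
    noisy[mask < amount / 2] = 0          # pepper
    noisy[mask > 1 - amount / 2] = 255    # salt
    return noisy
```

In a training pipeline this would be applied randomly per sample, with the noise strength itself drawn from a small range so the network sees a spectrum of image qualities.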
The second category enhances a specific feature in the image. The purpose is to make that feature stand out, so that the network attends to it first and learns it more thoroughly. Because this method only alters the specified feature or position, and leaves the overall structure of the image unchanged, it does not destroy the readability of the image information. A typical example is using the canny operator to strengthen the edge features in the image.
2. How did these two tricks come about?
As we all know, the network learns the distribution of the data. He Kaiming's RetinaNet paper also notes that data imbalance is a major factor limiting detector performance, and focal loss was proposed precisely to make the network focus on missed and misclassified samples. So we kept asking: how can we make the network focus more on the target region, so as to surface as many mis-picked and missed samples as possible? A flower attracts bees with its fragrance and brighter appearance, so we likewise need to make the target region more conspicuous. In everyday training of anchor-based detectors, we noticed that when the labeled box is a few pixels larger than the tight bounding box of the object, both the measured accuracy and the stability of the localization box improve. So we reasoned that, for anchor-based detectors, the edges of the target object matter a great deal, which led us to enhance edges in the training data with the canny operator.
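For reference, the focal loss mentioned above can be written in a few lines. This numpy sketch is the binary form from the RetinaNet paper, with the paper's usual alpha and gamma defaults:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probability of the positive class, y: labels in {0, 1}.
    The (1 - p_t)^gamma factor down-weights easy, well-classified samples.
    """
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(np.clip(p_t, 1e-12, 1.0))
```

With gamma = 0 this reduces to alpha-weighted cross-entropy; larger gamma pushes the network harder toward the hard (missed or misclassified) samples.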
3. How to apply these tricks in actual network training?
The actual procedure is as follows:
1) Determine a good canny threshold for the input data, either by manual inspection or by automatic statistics.
2) Apply canny edge enhancement with that threshold to 10-20 batches of the training samples.
3) Enhancement method: at the pixel positions in the original image that correspond to the extracted canny edges, either boost the contrast or darken the pixels directly. The degree of darkening is controlled by a custom hyperparameter, alpha.
4) Train for several epochs on those 10-20 enhanced batches, then switch back to ordinary data for the rest of training.
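The steps above can be sketched as follows. A real pipeline would use cv2.Canny for the edge map; to stay dependency-free, this sketch approximates it with a central-difference gradient threshold, and implements the "darken edge pixels" variant of step 3, with alpha as the custom hyperparameter:

```python
import numpy as np

def gradient_edge_mask(gray, threshold=60.0):
    """Dependency-free stand-in for cv2.Canny: threshold the gradient magnitude."""
    g = gray.astype(np.float32)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = (g[:, 2:] - g[:, :-2]) / 2.0   # central difference in x
    gy[1:-1, :] = (g[2:, :] - g[:-2, :]) / 2.0   # central difference in y
    return np.hypot(gx, gy) > threshold

def enhance_edges(gray, threshold=60.0, alpha=0.5):
    """Darken edge pixels by the factor alpha (step 3 above)."""
    edges = gradient_edge_mask(gray, threshold)
    out = gray.astype(np.float32)
    out[edges] *= (1.0 - alpha)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applied to a batch, this leaves flat regions untouched and only deepens the object contours, which is why the overall readability of the image survives.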
2. Tuning model-training parameters
Having covered data augmentation before training, we turn to model-training parameters. There are actually plenty of such tricks online; people usually collect them as they go, or browse GitHub, where many training repositories are worth following. I will not claim to have mastered them all, so let me just share some of my own thoughts from actually running models.
1. BFENet, a feature-erasing network
This network comes from the ReID field. I mention it first because feature erasing is essentially similar to the noise discussed above: by masking some feature values during training, the network grows accustomed to a certain amount of noise interference, which improves performance. The technique is useful for models that must handle occluded scenes.
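A minimal sketch of the erasing operation, assuming a (C, H, W) feature map and BFE-style block erasure; the block-size ratios are illustrative:

```python
import numpy as np

def batch_feature_erase(feat, h_ratio=0.25, w_ratio=0.25, seed=0):
    """Zero out one random spatial block across all channels of a (C, H, W) map."""
    rng = np.random.default_rng(seed)
    c, h, w = feat.shape
    eh = max(1, int(h * h_ratio))
    ew = max(1, int(w * w_ratio))
    y = rng.integers(0, h - eh + 1)   # random top-left corner of the erased block
    x = rng.integers(0, w - ew + 1)
    out = feat.copy()
    out[:, y:y + eh, x:x + ew] = 0.0
    return out
```

Because the same spatial block is dropped in every channel, the network is forced to build its decision from the remaining regions, much as it must under real occlusion.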
2. Anchor adjustment
In the yolo code you have surely seen that the author runs k-means clustering on the widths and heights of the labeled targets in your data set, and uses the resulting clusters as the anchors for that data set. My experience here: people have asked me why, when training a single-class detector and recalculating the 6 or 9 anchors from scratch, the training loss looks fine, yet at test time the model detects nothing. My conclusion is that anchors should be fine-tuned from the model author's default anchors, rather than recalculated wholesale.
Reason: as we all know, yolov3 outputs three feature maps, corresponding to small, medium, and large targets. Suppose the targets you want to detect occupy a relatively large fraction of the image, so the clustered boxes are also large. In actual training, however, a large target is not necessarily assigned to the output layer yolov3 originally designed for large targets; it may well be emitted by the medium-target layer, and if the anchors there are designed too large, the trained network fails to converge and targets go undetected.
Solution: when designing anchors, first gather statistics on the distribution of the target boxes and cluster them, then replace or nudge only those of the original 9 anchors that are close to your clustered values, and train. If the boxes are still not tight enough, fine-tune a few more. The core idea: the anchor set should sparsely cover the full space of plausible boxes, not just your current data set.
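The clustering step can be sketched as follows, using 1 - IoU as the distance as in the yolo source; the quantile-based initialization here is a deterministic stand-in for the usual random init. As argued above, the output should be used to nudge the default anchors, not replace them wholesale.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) and (K, 2) width/height pairs, as if centers coincide."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k=9, iters=50):
    """Cluster labeled box sizes with 1 - IoU as the distance, yolo-style."""
    order = np.argsort(boxes[:, 0] * boxes[:, 1])
    # deterministic quantile initialization for reproducibility
    init = order[np.linspace(0, len(boxes) - 1, k).astype(int)]
    anchors = boxes[init].astype(np.float64)
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        for j in range(k):
            members = boxes[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sorted by area
```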
3. Post-processing optimization
Strictly speaking, post-processing optimization is a deployment trick rather than a network-training trick. For example, HiSilicon's NPU limits the maximum pooling-kernel size at deployment, so it is best to replace a large pooling layer with several consecutive small ones already during training. Although the two are conceptually similar, converting only at deployment time (i.e., training with one large pooling layer, then splitting it into several small ones when converting for the board) cost about 0.3% accuracy in my measurements.
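As a quick numpy sanity check on the decomposition itself: a 4x4 stride-4 max pool is numerically identical to two stacked 2x2 stride-2 max pools when the sizes divide evenly, so the accuracy cost reported above presumably comes from converting only at deployment time (surrounding layers having been trained against the single large pool), not from the decomposition being lossy.

```python
import numpy as np

def max_pool(x, k):
    """Non-overlapping k x k max pooling (stride k) on a 2-D array."""
    h, w = x.shape
    x = x[:h - h % k, :w - w % k]          # trim so k divides both dimensions
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

x = np.arange(64, dtype=np.float32).reshape(8, 8)
one_big = max_pool(x, 4)                    # one 4x4, stride-4 pool
two_small = max_pool(max_pool(x, 2), 2)     # two stacked 2x2, stride-2 pools
```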
There is also the nms stage. A student asked me about a data set with occlusion: when two targets are close together, nms deletes the occluded small target. The tip shared here is to also consider the distance between the center points of the two boxes when computing nms: if the centers are farther apart than a set threshold, skip suppression for that pair. This prevents nms from arbitrarily deleting valid detection bboxes.
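The center-distance exemption can be sketched as a small change to greedy nms: a pair is suppressed only if it both overlaps beyond the IoU threshold and has centers closer than a chosen distance. The threshold values here are illustrative.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms_with_center_distance(boxes, scores, iou_thr=0.5, min_center_dist=np.inf):
    """Greedy NMS that spares pairs whose centers are >= min_center_dist apart."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        dist = np.hypot(cx[rest] - cx[i], cy[rest] - cy[i])
        # suppress only boxes that BOTH overlap heavily AND sit close to the kept box
        suppress = (iou(boxes[i], boxes[rest]) > iou_thr) & (dist < min_center_dist)
        order = rest[~suppress]
    return keep
```

With min_center_dist left at infinity this is ordinary greedy nms; lowering it protects occluded neighbors whose centers are clearly separate.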
4. A trick for training on large data sets
A student once asked me why the same model quickly reached 97% mAP when trained on a small data set, but got stuck at 93% after training on a 3-million-sample data set. One technique is a form of warm-up: when training a model on a big data set, first train it on a subset of that data, then use the result as the pre-trained model, increase the batch_size, and continue training on the full data set. At the very least, this avoids getting stuck at the 93% plateau.
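A toy numpy illustration of the two-stage idea: pre-train on a subset with a small batch, then continue on the full set with a larger batch, warm-started from the subset model. The linear-regression setup and all hyperparameters are illustrative stand-ins for a real detector.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))     # the "large" data set
true_w = np.arange(1.0, 6.0)
y = X @ true_w                     # noise-free labels for the toy problem

def sgd(X, y, w, lr, batch, epochs, seed):
    """Plain minibatch SGD on squared error."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for s in range(0, len(X), batch):
            b = idx[s:s + batch]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w = w - lr * grad
    return w

# Stage 1: "pre-train" on a subset of the large data set with a small batch.
w1 = sgd(X[:200], y[:200], np.zeros(5), lr=0.05, batch=16, epochs=5, seed=1)
# Stage 2: continue on the full set with a larger batch, warm-started from w1.
w2 = sgd(X, y, w1, lr=0.05, batch=128, epochs=5, seed=2)
```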
5. Manual learning-rate correction
During training we usually schedule learning-rate decay, and there are many ways to drive it: by iteration step, by the current loss value, or by the gap between the training-set loss and the test-set loss. The tip here concerns when the learning rate can be adjusted automatically, and when it needs to be adjusted by hand.
When training a model, we generally watch the loss curve. The sparsity of the data set shows up as oscillation in the curve; isolated jump points usually indicate bad data (mislabeled samples). When the loss curve oscillates, steps down, and then oscillates around another value, pay attention: this usually means the data was not well mixed when the data set was shuffled. At that point, stop training and record the current state, lower the learning rate, and continue; once the curve returns to the earlier oscillation band, restore the original learning rate and resume training.
The point of this procedure is to avoid injecting excessive noise into the parameters. There are two kinds of noise: one is wrong data, such as background regions, or things that look like the target but are not; and in multi-class training, from the point of view of each class, the remaining classes also act as a kind of noise. So either get the data set truly clean (which is hard; I have never read an article that really explains what a properly cleaned training set looks like), or increase the batch size, or watch for it during training.
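The stop / lower-LR / restore routine described above can be sketched as a small controller. This is a hypothetical helper, not the author's code; the patience and factor values are illustrative.

```python
class ManualLRController:
    """Halve the LR when the loss plateaus; let the operator restore it later."""

    def __init__(self, base_lr, patience=5, factor=0.5, eps=1e-4):
        self.base_lr = base_lr
        self.lr = base_lr
        self.best = float("inf")
        self.bad_steps = 0
        self.patience = patience
        self.factor = factor
        self.eps = eps

    def step(self, loss):
        """Feed the current loss; returns the LR to use for the next iteration."""
        if loss < self.best - self.eps:
            self.best = loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps >= self.patience:
                self.lr *= self.factor   # plateau detected: lower the LR
                self.bad_steps = 0
        return self.lr

    def restore(self):
        """Manual call once the loss resumes its previous oscillation band."""
        self.lr = self.base_lr
```

The automatic part (plateau detection) mirrors a reduce-on-plateau scheduler; the restore() call is the manual judgment the section argues cannot be fully automated.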
That is my answer on practical tricks for detection and optimization. I hope the above is of some help; if you still have unresolved doubts, you can follow the industry information channel for more related knowledge.