This article describes how we improved YOLOv3 for infrared small target detection. The content is fairly detailed; we hope it is useful to anyone working on similar problems.
1. Infrared small target detection
In infrared small target detection, the targets are small and easily confused with other objects, which makes the task challenging.
This is essentially a small object detection problem, so many techniques developed for small object detection can be borrowed here.
The data comes from @ Xiaowu
Another feature of this dataset is that it is split by background: all images are used for detecting small infrared targets, but the backgrounds differ. We collected statistics on the dataset and summarized its characteristics by manual review, as shown in the following table:
| Background | Count | Characteristics | Difficulty | Test mAP / F1 | Suggestion |
|---|---|---|---|---|---|
| trees | 581 | clean background, obvious targets | low | 0.99 / 0.97 | - |
| cloudless_sky | 1320 | clean background, obvious targets | low | 0.98 / 0.99 | - |
| architecture | 506 | background varies greatly, target shape varies greatly | fairly high | 0.92 / 0.96 | focal loss |
| continuous_cloud_sky | 878 | fairly clean background, target shape changes little, but individual targets can be confused with clouds in the background | medium | 0.93 / 0.95 | focal loss |
| complex_cloud | 561 | target shape basically unchanged, but the background strongly interferes with locating the target | hard | 0.85 / 0.89 | focal loss |
| sea | 17 | clean background, obvious targets, very few images | medium | 0.87 / 0.88 | generate high-quality new samples; convert into simple samples (Mixup) |
| sea_sky | 45 | background varies greatly, number of targets per image varies greatly, some are dense, few images | hard | 0.68 / 0.77 | paste strategy |
From these results we can see that the background has a large impact on detection performance; the last column gives targeted suggestions for follow-up work.
2. Experimental process
First of all, we used the "U version" of YOLOv3: https://github.com/ultralytics/yolov3. At that time neither YOLOv4/v5 nor PPYOLO was available. There was also an e-book, "Learning YOLOv3 from Scratch", written while working on this project, and the article on adding attention mechanisms to YOLOv3 was very popular (you can get a lot of articles out of a project like this, which matters for graduating :)
The code and modifications of our project can be found at: https://github.com/GiantPandaCV/yolov3-point
The first step is converting the dataset into VOC format. A previous article described in detail how to convert a dataset into standard VOC format and then from VOC format into the U-version format. At the time I was working on several projects that needed YOLOv3, and each conversion required calling four or five separate scripts, which was tedious, so I spent some time building a one-click script library that converts VOC to the U-version YOLOv3 format: https://github.com/pprp/voc2007_for_yolo_torch.
At this point the project could run, and we then adjusted many details.
2.1 modify Anchor
The anchors of the infrared small target dataset are very different from those of COCO. To converge better and faster, we used a set of scripts put together by BBuf to compute the anchors.
# coding=utf-8
import glob
import xml.etree.ElementTree as ET

import numpy as np


def iou(box, clusters):
    """
    Compute the IoU between one ground-truth box and k anchor boxes.
    box: tuple or array holding the width and height of the ground truth.
    clusters: numpy array of shape (k, 2), where k is the number of anchor clusters.
    Returns: IoU between the ground truth and each anchor box.
    """
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")

    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)
    return iou_


def avg_iou(boxes, clusters):
    """
    Compute the mean of the best IoU between each ground truth and the k anchors.
    """
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])


def kmeans(boxes, k, dist=np.median):
    """
    K-means clustering using the IoU metric.
    boxes: ground-truth boxes of shape (r, 2), where r is the number of ground truths.
    k: number of anchors.
    dist: aggregation function used to update cluster centers.
    Returns: k anchor boxes of shape (k, 2).
    """
    # r mentioned above
    rows = boxes.shape[0]
    # distance array: distance between each ground truth and the k anchors
    distances = np.empty((rows, k))
    # index of the anchor closest to each ground truth in the previous iteration
    last_clusters = np.zeros((rows,))
    # set the random seed
    np.random.seed()

    # initialize the cluster centers: randomly pick k boxes out of the r ground truths
    clusters = boxes[np.random.choice(rows, k, replace=False)]

    # start clustering
    while True:
        # distance between each ground truth and the k anchors, computed as 1 - IoU(box, anchor)
        for row in range(rows):
            distances[row] = 1 - iou(boxes[row], clusters)
        # for each ground truth, pick the anchor with the smallest distance and store its index
        nearest_clusters = np.argmin(distances, axis=1)
        # if the nearest anchor of every ground truth did not change, clustering has converged
        if (last_clusters == nearest_clusters).all():
            break
        # update each cluster center to the aggregate of the ground-truth boxes assigned to it
        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)
        # remember the nearest anchor index of each ground truth
        last_clusters = nearest_clusters
    return clusters


# load your own dataset; only the xml files produced by labelImg are needed
def load_dataset(path):
    dataset = []
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)
        # image height
        height = int(tree.findtext("./size/height"))
        # image width
        width = int(tree.findtext("./size/width"))

        for obj in tree.iter("object"):
            # normalized coordinates
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height

            xmin = np.float64(xmin)
            ymin = np.float64(ymin)
            xmax = np.float64(xmax)
            ymax = np.float64(ymax)
            if xmax == xmin or ymax == ymin:
                print(xml_file)
            # store the width and height of each box; running kmeans on these gives the anchors
            dataset.append([xmax - xmin, ymax - ymin])
    return np.array(dataset)


if __name__ == '__main__':
    ANNOTATIONS_PATH = "F:\\Annotations"  # folder containing the xml files
    CLUSTERS = 9  # number of clusters, i.e. number of anchors
    INPUTDIM = 416  # network input size

    data = load_dataset(ANNOTATIONS_PATH)
    out = kmeans(data, k=CLUSTERS)
    print('Boxes:')
    print(np.array(out) * INPUTDIM)

    print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
    final_anchors = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
    print("Before Sort Ratios:\n {}".format(final_anchors))
    print("After Sort Ratios:\n {}".format(sorted(final_anchors)))
Reading through the script answers a question many people asked at the time: does the anchor depend on the input resolution of the image? The result returned by the kmeans function is normalized to 0-1, and the anchor output is then multiplied by the input resolution, so yes, the anchors are tied to the input resolution of the image.
In addition, the U version also provides its own anchor calculation, as follows:
# excerpt from utils/utils.py in the U version; it relies on numpy (np), torch,
# tqdm and wh_iou being imported at module level in that file
def kmean_anchors(path='./2007_train.txt', n=5, img_size=(416, 416)):
    # from utils.utils import *; _ = kmean_anchors()
    # Produces a list of target kmeans suitable for use in *.cfg files
    from utils.datasets import LoadImagesAndLabels
    thr = 0.20  # IoU threshold

    def print_results(thr, wh, k):
        k = k[np.argsort(k.prod(1))]  # sort small to large
        iou = wh_iou(torch.Tensor(wh), torch.Tensor(k))
        max_iou, min_iou = iou.max(1)[0], iou.min(1)[0]
        bpr, aat = (max_iou > thr).float().mean(), (
            iou > thr).float().mean() * n  # best possible recall, anch > thr
        print('%.2f iou_thr: %.3f best possible recall, %.2f anchors > thr' %
              (thr, bpr, aat))
        print(
            'kmeans anchors (n=%g, img_size=%s, IoU=%.3f/%.3f/%.3f-min/mean/best): '
            % (n, img_size, min_iou.mean(), iou.mean(), max_iou.mean()),
            end='')
        for i, x in enumerate(k):
            print('%i,%i' % (round(x[0]), round(x[1])),
                  end=', ' if i < len(k) - 1 else '\n')  # use in *.cfg
        return k

    def fitness(thr, wh, k):  # mutation fitness
        iou = wh_iou(wh, torch.Tensor(k)).max(1)[0]  # max iou
        bpr = (iou > thr).float().mean()  # best possible recall
        return iou.mean() * bpr  # product

    # Get label wh
    wh = []
    dataset = LoadImagesAndLabels(path,
                                  augment=True,
                                  rect=True,
                                  cache_labels=True)
    nr = 1 if img_size[0] == img_size[1] else 10  # number augmentation repetitions
    for s, l in zip(dataset.shapes, dataset.labels):
        wh.append(l[:, 3:5] *
                  (s / s.max()))  # image normalized to letterbox normalized wh
    wh = np.concatenate(wh, 0).repeat(nr, axis=0)  # augment 10x
    wh *= np.random.uniform(img_size[0], img_size[1],
                            size=(wh.shape[0],
                                  1))  # normalized to pixels (multi-scale)

    # Darknet yolov3.cfg anchors
    use_darknet = False
    if use_darknet:
        k = np.array([[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                      [59, 119], [116, 90], [156, 198], [373, 326]])
    else:
        # Kmeans calculation
        from scipy.cluster.vq import kmeans
        print('Running kmeans for %g anchors on %g points...' % (n, len(wh)))
        s = wh.std(0)  # sigmas for whitening
        k, dist = kmeans(wh / s, n, iter=30)  # points, mean distance
        k *= s
    k = print_results(thr, wh, k)

    # Evolve
    wh = torch.Tensor(wh)
    f, ng = fitness(thr, wh, k), 2000  # fitness, generations
    for _ in tqdm(range(ng), desc='Evolving anchors'):
        kg = (
            k.copy() *
            (1 + np.random.random() * np.random.randn(*k.shape) * 0.30)).clip(
                min=2.0)
        fg = fitness(thr, wh, kg)
        if fg > f:
            f, k = fg, kg.copy()
            print_results(thr, wh, k)
    k = print_results(thr, wh, k)

    return k
This is similar to the method from the hyperparameter search article: a genetic-algorithm-style search that screens anchors generation by generation. We have not compared the two approaches; if you are interested, try both and compare the results.
We clustered the anchors with three different counts (a small sketch after the lists shows how these values can be written into a cfg):
3 anchor:
13, 18, 16, 22, 19, 25
6 anchor:
12,17, 14,17, 15,19, 15,21, 13,20, 19,24
9 anchor:
10,16, 12,17, 13,20, 13,22, 15,18, 15,20, 15,23, 18,23, 21,26
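For reference, here is a minimal Python sketch of how the clustered values could be written back into a cfg file. This is illustrative only, not the project's actual tooling; the file paths are hypothetical, and it assumes every [yolo] block should receive the same anchor list. Note that when the number of anchors changes, the mask and num fields of each [yolo] block also have to be updated by hand.

# Hypothetical helper (not the project's script): overwrite every "anchors=" line
# of a darknet cfg with the 6-anchor set clustered above.
anchors = "12,17, 14,17, 15,19, 15,21, 13,20, 19,24"

def set_cfg_anchors(cfg_in, cfg_out, anchors):
    with open(cfg_in) as f:
        lines = f.readlines()
    with open(cfg_out, 'w') as f:
        for line in lines:
            if line.strip().startswith('anchors'):
                f.write('anchors = %s\n' % anchors)  # replace the old anchor list
            else:
                f.write(line)

set_cfg_anchors('cfg/yolov3-tiny.cfg', 'cfg/yolov3-tiny-6a.cfg', anchors)  # example paths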
2.2 Building Baseline
Because the dataset is single-class and relatively simple compared with VOC and similar datasets, we did not intend to use a deep backbone such as Darknet53. The baseline is a YOLOv3-tiny model; with the original anchors it reaches mAP@0.5=93.2% on the validation set and mAP@0.5=0.869 on the test set.
Then we changed the anchors, replacing the originals with the new ones obtained in the previous section; the modified model is yolov3-tiny-6a:
| Experiment | Model | P | R | mAP@0.5 | F1 | Dataset |
|---|---|---|---|---|---|---|
| baseline | yolov3-tiny (original anchors) | 0.982 | 0.939 | 0.932 | 0.96 | valid |
| baseline | yolov3-tiny (original anchors) | 0.96 | 0.873 | 0.869 | 0.914 | test |
| 6a | yolov3-tiny-6a | 0.973 | 0.98 | 0.984 | 0.977 | valid |
| 6a | yolov3-tiny-6a | 0.936 | 0.925 | 0.915 | 0.931 | test |
Almost all metrics improved, which shows that introducing the clustered anchor priors is worthwhile.
2.3 partial improvements to the dataset
As analyzed above, the background clearly affects detection results, so we tried several methods to improve the dataset.
The first one: oversampling
Counting the images per background reveals a serious imbalance: there are only 17 images with a sea background, while cloudless_sky, the largest class, has 1300+. The cloudless_sky background is very simple and the sea background is harder, so with such an imbalance the trained model is likely to work well on cloudless_sky images but poorly on other backgrounds.
So the first thing to try is oversampling. Oversampling here may differ from its meaning elsewhere: it simply means expanding the under-represented backgrounds by copying their images.
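As a reference, here is a minimal sketch of the kind of copy-based oversampling meant here. The paths, the background-to-file mapping, the target count, and the assumption of YOLO-style .txt label files are all illustrative, not the project's actual script:

# Minimal oversampling sketch: duplicate the image/label pairs of under-represented
# backgrounds so every background has a comparable image count.
import os
import shutil

def oversample(img_dir, label_dir, files_by_background, target_count=500):
    """files_by_background: dict mapping background name -> list of image filenames."""
    for background, files in files_by_background.items():
        if len(files) >= target_count:
            continue  # already enough samples for this background
        copies_needed = target_count - len(files)
        for i in range(copies_needed):
            src = files[i % len(files)]              # cycle through the rare images
            stem, ext = os.path.splitext(src)
            dst = "{}_os{}{}".format(stem, i, ext)   # e.g. sea_0001_os0.jpg
            shutil.copy(os.path.join(img_dir, src), os.path.join(img_dir, dst))
            # copy the matching label file so the duplicate keeps its annotations
            shutil.copy(os.path.join(label_dir, stem + ".txt"),
                        os.path.join(label_dir, "{}_os{}.txt".format(stem, i)))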
| Experiment | Model | P | R | mAP@0.5 | F1 | Dataset |
|---|---|---|---|---|---|---|
| baseline (os) | yolov3-tiny (original anchors) | 0.985 | 0.971 | 0.973 | 0.978 | valid |
| baseline (os) | yolov3-tiny (original anchors) | 0.936 | 0.871 | 0.86 | 0.902 | test |
| baseline | yolov3-tiny (original anchors) | 0.982 | 0.939 | 0.932 | 0.96 | valid |
| baseline | yolov3-tiny (original anchors) | 0.96 | 0.873 | 0.869 | 0.914 | test |
Unfortunately, the experimental results do not support the idea; let's analyze why. (PS: "os" stands for oversample.)
We then ran a per-background test; the results are as follows:
Per-background test after balancing
| Background | Num | Model | P | R | mAP | F1 |
|---|---|---|---|---|---|---|
| trees | 506 | yolov3-tiny-6a | 0.924 | 0.996 | 0.981 | 0.959 |
| sea_sky | 495 | yolov3-tiny-6a | 0.927 | 0.978 | 0.771 | 0.85 |
| sea | 510 | yolov3-tiny-6a | 0.923 | 0.935 | 0.893 | 0.929 |
| continuous_cloud_sky | 878 | yolov3-tiny-6a | 0.957 | 0.95 | 0.933 | 0.953 |
| complex_cloud | 561 | yolov3-tiny-6a | 0.943 | 0.833 | 0.831 | 0.885 |
| cloudless_sky | 1320 | yolov3-tiny-6a | 0.993 | 0.981 | 0.984 | 0.987 |
| architecture | 506 | yolov3-tiny-6a | 0.959 | 0.952 | 0.941 | 0.955 |
From the per-background results, the sea background did improve: its mAP rose about 2 points. However, mAP on backgrounds such as complex_cloud dropped. In short, backgrounds with little training data improved, while backgrounds with plenty of data were slightly reduced or unchanged.
The second: copy small targets to random positions in the image
Modified version address: https://github.com/pprp/SimpleCVReproduction/tree/master/SmallObjectAugmentation
The idea is to first crop out all the small targets and set them aside. These small targets are then pasted back onto the image at new positions, with the constraints that the overlap between any pair of boxes must stay below a threshold and the pasted box must not cross the image boundary. A minimal sketch is shown below.
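Here is a minimal sketch of this copy-paste augmentation, under the assumption that boxes are integer pixel xyxy coordinates and images are HxWxC numpy arrays; the repository linked above differs in its details:

# Minimal copy-paste augmentation sketch (illustrative, not the linked repo's code).
import numpy as np

def box_iou(a, b):
    """IoU between two xyxy boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def paste_small_objects(img, boxes, copies=2, iou_thr=0.0, max_tries=50):
    """Crop each ground-truth box and paste it at random valid positions."""
    h, w = img.shape[:2]
    new_boxes = list(boxes)
    for box in boxes:
        patch = img[box[1]:box[3], box[0]:box[2]].copy()
        ph, pw = patch.shape[:2]
        for _ in range(copies):
            for _ in range(max_tries):
                # random top-left corner that keeps the patch inside the image
                x = np.random.randint(0, w - pw)
                y = np.random.randint(0, h - ph)
                cand = [x, y, x + pw, y + ph]
                # reject positions that overlap existing or already-pasted boxes
                if all(box_iou(cand, b) <= iou_thr for b in new_boxes):
                    img[y:y + ph, x:x + pw] = patch
                    new_boxes.append(cand)
                    break
    return img, np.array(new_boxes)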
The effect is shown below (this is a schematic diagram; the number of copies is exaggerated):
(Figure: augmentation result)
This practice comes from the then fairly recent paper "Augmentation for Small Object Detection", whose best results were obtained by copying each object one or two times. We tried copying once, twice, three times, and more in this project, but the results were unsatisfactory, bad enough that we did not record them. (The paper reports that its best combination is original image + augmented image, with about a 1% improvement.)
2.4 modify Backbone
A question group members often ask about modifying the backbone: after changing the backbone network, the pretrained weights can no longer be loaded. What can you do?
There are several ways to do this:
Simply don't load them and train from scratch; for simple problems (such as infrared small targets) the retrained model converges to results no worse than with pretrained weights.
If you do not want to change the loading code, only modify the part after the backbone and before the YOLO head (for example, that is where SPP sits in this project).
If you are comfortable with the code, modify the weight-loading logic to skip your newly added modules so the rest still loads (the author has not tried this, so don't blame me; a rough sketch is given below).
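For the third option, a rough, untested sketch of partial weight loading in PyTorch might look like the following; the function name and checkpoint layout are assumptions, not the repo's actual loading code:

# Rough sketch: load only the pretrained tensors whose names and shapes still
# match after new modules have been inserted into the model.
import torch

def load_partial_weights(model, ckpt_path):
    state = torch.load(ckpt_path, map_location='cpu')
    pretrained = state.get('model', state)           # raw state_dict or wrapped checkpoint
    own = model.state_dict()
    # keep only entries that exist in the new model with an identical shape
    matched = {k: v for k, v in pretrained.items()
               if k in own and own[k].shape == v.shape}
    own.update(matched)
    model.load_state_dict(own)
    print('loaded %d/%d tensors from %s' % (len(matched), len(own), ckpt_path))
    return model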
We approached backbone modification from several directions: attention modules, plug-and-play modules, modifying the FPN, modifying the activation function, replacing the backbone with a mature network, and the SPP series.
1. Attention module
Code walkthroughs of most of the attention modules used in this project have been published on our official WeChat account; have a look if you are interested. Because this project accumulated so many attention modules, the author later compiled them into the e-book "Plug-and-Play Modules in Convolutional Neural Networks"; the module collection is still being updated at https://github.com/pprp/SimpleCVReproduction
The modules we experimented with included SE, CBAM and others. Because the baseline was already fairly strong, the results were not ideal (you cannot expect a percentage-point gain just by inserting an attention module; more tuning is needed to beat the original numbers). According to feedback from group members, directly inserting SE has a relatively high success rate. In one object detection competition, the author saw a strong team add a CBAM to each of the three FPN branches of YOLOv3 and ultimately beat Cascade R-CNN and other models to win the championship.
2. Plug-and-play modules
Attention modules are themselves plug-and-play modules; this part refers to the non-attention ones, such as FFM, ASPP, PPM, dilated conv, SPP, FRB, CornerPool, DwConv, ACNet and so on. The results were acceptable but did not exceed the best results at the time.
3. Modify FPN
The FPN work took a long time before we produced dt-6a-bifpn (dt for dim target, i.e. infrared targets; 6a for 6 anchors). Disappointingly, BiFPN did not work well here and was even worse on the test set. It may be that something is wrong with the cfg implementation; feedback is welcome.
As everyone knows, changing the network structure by editing cfg files is painful, so we recommend a visualization tool:
https://lutzroeder.github.io/netron/
In addition, to locate layer numbers within the cfg, the author wrote a simple script that annotates each layer with its index.
import os
import shutil

cfg_path = "./cfg/yolov3-dwconv-cbam.cfg"
save_path = "./cfg/preprocess_cfg/"

new_save_name = os.path.join(save_path, os.path.basename(cfg_path))

f = open(cfg_path, 'r')
lines = f.readlines()

# remove lines that start with '#' (the comments section)
# lines = [x for x in lines if x and not x.startswith('#')]
# lines = [x.rstrip().lstrip() for x in lines]

lines_nums = []
layers_nums = []

layer_cnt = -1

for num, line in enumerate(lines):
    if line.startswith('['):
        layer_cnt += 1
        layers_nums.append(layer_cnt)
        lines_nums.append(num + layer_cnt)
        print(line)
        # s = s.join("")
        # s = s.join(line)

for i, num in enumerate(layers_nums):
    print(lines_nums[i], num)
    lines.insert(lines_nums[i] - 1, '# layer-%d\n' % (num - 1))

fo = open(new_save_name, 'w')
fo.write(''.join(lines))
fo.close()
f.close()
We also tried using one, two, and three YOLO heads. The result was 3 > 2 > 1, but two and three heads performed almost identically, differing only in the third decimal place, so we chose two YOLO heads.
4. Modify activation function
YOLO uses leaky ReLU as its activation function by default; we tried replacing it with Mish. The results did not improve, so this was a dead end.
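For reference, a minimal PyTorch sketch of Mish, i.e. mish(x) = x * tanh(softplus(x)); the exact module used in the experiments may differ:

# Minimal Mish activation sketch (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))  # softplus(x) = ln(1 + e^x)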
5. Replace backbone with mature network
Here we tried network structures such as ResNet10 (a third-party implementation), DenseNet, a DenseNet modified by BBuf, ENet, VoVNet (modified by ourselves), csresnext50-panet (provided at the time by the AB version of Darknet), and PRN (not very useful).
The strongest network so far is dense-v3-tiny-spp, i.e. the backbone modified by BBuf combined with the original-style SPP. It clearly beats the other models, reaching mAP@0.5=0.932 and F1=0.951 on the test set.
6. SPP series
On this topic: the three of us read a lot of papers and tried a lot of tricks, most of which were ineffective, but the module that never disappointed was SPP. We studied SPP in some depth; it is also covered in "Various Pooling Operations in Convolutional Neural Networks".
SPP was proposed in SPPNet, an early work that followed R-CNN and addressed two problems: repeated convolution computation and the fixed-size output requirement. The method is shown in the figure below:
Candidate windows are obtained with selective search, the corresponding regions on the feature map are pooled, and the pooled features are then classified.
In essence, SPP is a combination of several spatial pooling operations: for each desired output scale it uses a different window size and stride, so the output size is fixed regardless of the input size. At the same time, it fuses features at several scales from the pyramid and extracts richer semantic information. It is commonly used in multi-scale training and in the RPN of object detectors. A small sketch of this classic fixed-bin SPP is given below.
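To make the idea concrete, here is a minimal sketch of the classic SPPNet-style pooling with adaptive bins; this is an illustration, not code from this project:

# Classic SPPNet-style pooling sketch: each pyramid level pools the feature map
# into a fixed grid of bins, so the concatenated vector length is the same for
# any input size.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassicSPP(nn.Module):
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels  # 1x1, 2x2 and 4x4 bins -> 21 bins per channel

    def forward(self, x):
        n, c = x.shape[:2]
        pooled = [F.adaptive_max_pool2d(x, output_size=l).view(n, -1)
                  for l in self.levels]
        return torch.cat(pooled, dim=1)  # fixed length: c * sum(l*l for l in levels)

# usage: a 13x13 and a 26x26 feature map both produce vectors of the same length
spp = ClassicSPP()
print(spp(torch.randn(1, 256, 13, 13)).shape)  # torch.Size([1, 5376])
print(spp(torch.randn(1, 256, 26, 26)).shape)  # torch.Size([1, 5376])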
In YOLOv3 there is a network configuration called yolov3-spp.cfg, which tends to achieve higher accuracy than yolov3.cfg itself. The relevant part of the cfg is as follows:
# SPP #
[maxpool]
stride=1
size=5

[route]
layers=-2

[maxpool]
stride=1
size=9

[route]
layers=-4

[maxpool]
stride=1
size=13

[route]
layers=-1,-3,-5,-6
# End SPP #
The SPP here is a variant of the original SPPNet version: several max-pooling layers with different kernel sizes (and stride 1) are applied to the same feature map, and their outputs are concatenated with the original feature map to form a new feature combination. A sketch of this YOLO-style SPP is given below.
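The cfg above corresponds roughly to the following PyTorch sketch; this is an illustration of the structure, not code extracted from the repository:

# YOLO-style SPP sketch: stride-1 max pools with kernel sizes 5, 9 and 13 keep the
# spatial size unchanged, and their outputs are concatenated with the input along
# the channel dimension.
import torch
import torch.nn as nn

class YoloSPP(nn.Module):
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
             for k in kernel_sizes])

    def forward(self, x):
        # output channels = in_channels * (len(kernel_sizes) + 1)
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

# usage: a 13x13 feature map keeps its spatial size, channels go 256 -> 1024
print(YoloSPP()(torch.randn(1, 256, 13, 13)).shape)  # torch.Size([1, 1024, 13, 13])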
Let's take a look at the official comparison of yolov3 and yolov3-spp on the COCO dataset:
You can see that with almost no increase in FLOPS, YOLOv3-SPP is nearly 3 mAP points higher than YOLOv3 at 608 input.
Why is SPP effective?
From the receptive-field perspective: when computing receptive fields, max pooling has a large influence that depends mainly on its kernel size, so the very large max-pool kernels in SPP greatly enlarge the model's receptive field. The author has not computed the receptive field of the Darknet53 backbone in detail, but SPP is probably effective on COCO because the backbone's receptive field is not large enough on its own. The second perspective is attention, inspired by CSDN@Xiaoleng, who writes in his article:
The reason detection improves: the SPP module fuses local and global features (which is why the largest pooling kernel of the spatial pyramid pooling structure should be as close as possible to the size of the feature map being pooled), enriching the expressive power of the final feature map and thereby improving mAP.
Many attention mechanisms are designed to capture long-range dependencies; using a pooling kernel size close to the feature map size achieves something similar at very small computational cost. Also, if you use an SPP module, it is probably unnecessary to stack other spatial attention modules such as an SK block after it, because they work in a similar way and would be somewhat redundant.
In this experiment we did arrive at one important conclusion:
SPP is effective, and its pooling size should be set close to the size of the feature map at that layer.
Let's take a look at the results of the experiment.
SPP series experiments
| Experiment | Model | P | R | mAP | F1 | Dataset |
|---|---|---|---|---|---|---|
| baseline | dt-6a-spp | 0.99 | 0.983 | 0.984 | 0.987 | valid |
| baseline | dt-6a-spp | 0.955 | 0.948 | 0.929 | 0.951 | test |
| direct connection + 5x5 | dt-6a-spp-5 | 0.978 | 0.983 | 0.981 | 0.98 | valid |
| direct connection + 5x5 | dt-6a-spp-5 | 0.933 | 0.93 | 0.914 | 0.932 | test |
| direct connection + 9x9 | dt-6a-spp-9 | 0.99 | 0.983 | 0.982 | 0.987 | valid |
| direct connection + 9x9 | dt-6a-spp-9 | 0.939 | 0.923 | 0.904 | 0.931 | test |
| direct connection + 13x13 | dt-6a-spp-13 | 0.995 | 0.983 | 0.983 | 0.989 | valid |
| direct connection + 13x13 | dt-6a-spp-13 | 0.959 | 0.941 | 0.93 | 0.95 | test |
| direct connection + 5x5 + 9x9 | dt-6a-spp-5-9 | 0.988 | 0.988 | 0.981 | 0.988 | valid |
| direct connection + 5x5 + 9x9 | dt-6a-spp-5-9 | 0.937 | 0.936 | 0.91 | 0.936 | test |
| direct connection + 5x5 + 13x13 | dt-6a-spp-5-13 | 0.993 | 0.988 | 0.985 | 0.99 | valid |
| direct connection + 5x5 + 13x13 | dt-6a-spp-5-13 | 0.936 | 0.939 | 0.91 | 0.938 | test |
| direct connection + 9x9 + 13x13 | dt-6a-spp-9-13 | 0.981 | 0.985 | 0.983 | 0.983 | valid |
| direct connection + 9x9 + 13x13 | dt-6a-spp-9-13 | 0.925 | 0.934 | 0.907 | 0.93 | test |
The feature map at this point is 13x13, and the results show that using a single 13x13 pooling directly is almost as good as the full SPP while requiring less computation.
2.5 modify Loss
For the loss we tried focal loss, but whether we kept the defaults or tuned its two parameters (alpha and gamma) ourselves, the network either failed to converge or converged very slowly, so at the time we opened an issue with the author: https://github.com/ultralytics/yolov3/issues/811 (A minimal sketch of the loss we mean is shown below.)
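For context, a minimal sketch of the binary focal loss being discussed, written here in PyTorch as an illustration; it is not the repository's implementation:

# Binary focal loss sketch: alpha balances positives/negatives, gamma down-weights
# easy examples.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits, targets: tensors of the same shape; targets are 0/1."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()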
Glenn Jocher said that if it does not work well, just don't use it :(
(Figure: the author's reply)
BBuf also studied it for a long time and found that focal loss can be enabled in Darknet, but the effect is mediocre, so in the end focal loss was another dead end. We also tried adjusting ignore_thresh together with focal loss; the results are as follows (these experiments were run with the AB version of Darknet):
| State | Model | P | R | mAP | F1 | Dataset |
|---|---|---|---|---|---|---|
| ignore=0.7 | dt-6a-spp-fl | 0.97 | 0.97 | 0.9755 | 0.97 | valid |
| ignore=0.7 | dt-6a-spp-fl | 0.96 | 0.93 | 0.9294 | 0.94 | test |
| ignore=0.3 | dt-6a-spp-fl | 0.95 | 0.99 | 0.9874 | 0.97 | valid |
| ignore=0.3 | dt-6a-spp-fl | 0.89 | 0.92 | 0.9103 | 0.90 | test |

3. Empirical summary
During these experiments the discussions with BBuf produced a lot of insights and takeaways, which we share here. (Some of the conclusions may not be rigorous; if you are interested, run your own comparison experiments.)
The SPP layer is effective, and its pooling size works best when it is close to the feature map size at that layer.
When YOLOv3, YOLOv3-SPP and YOLOv3-tiny detect the same object, YOLOv3-tiny gives a lower confidence than the other two (intuitively, YOLOv3-tiny has less capacity, so it is less "sure" of itself; this is only a hunch).
Personally, I feel that concatenation is a softer fusion than addition and works better for small targets; in this experiment, DenseNet as the backbone gave the best results.
Multi-scale training is not covered in this article. It helps on problems with a wide range of object scales, such as VOC, but is counterproductive on datasets with a single scale; the targets in this infrared small target dataset are uniformly very small.
Anchors have a large influence on the model; unreasonable anchor priors cause more mismatches and reduce recall.
When discussing with friends at the time, I suggested that shallow features are more useful for small targets, so the FPN should not simply add or concatenate the two levels, but combine them in some ratio: for small targets, give more weight to shallow features, and the opposite for large targets. Later reading showed that ASFF already realizes this idea, and does so quite thoroughly.
The Upsample layer in PyTorch is not reproducible (non-deterministic).
If you have spare GPUs, try hyperparameter evolution.
The above is only part of the whole experimental process; we also hit many difficulties later on. We wanted to make the model more lightweight, but for various reasons did not continue. One lesson worth summing up: keep good experiment notes and backups of the modified datasets, the trained weights, and the changes made at each point. Looking back now at the old experiment records and cfg files, we can no longer remember where some models were changed, or the notes are not detailed enough and the records are too messy.
That concludes our sharing on how to improve YOLOv3 for infrared small target detection. We hope the content is helpful; if you found the article useful, feel free to share it with others.