Thoughts on how to carry out mAP calculation

2025-01-17 Update From: SLTechnology News&Howtos

This article analyzes how mAP is calculated, where the metric falls short, and how it can be improved.

1. Basic requirements

Intuitively, a good object detection network has the following properties:

It detects all the targets in the image: few missed detections.

It does not detect background as a target: few false detections.

The predicted category matches reality: accurate classification.

The bounding box fits the edges of the object closely: accurate localization.

It meets the required running efficiency: fast computation.

The following figure shows part of the model list taken from the Model Zoo of the TensorFlow Object Detection API.

Computation speed is reflected by the Speed column. The other factors are reflected together by a single metric, mAP (mean Average Precision).

Both "mean" and "average" denote averaging, so taken literally, mAP is averaged along at least two dimensions.

2. mAP calculation

The calculation of mAP can be roughly broken down into the following steps:

Stage | Output | Key variable
For a single target | TP, FP, FN | IOU (intersection over union)
For a single category | PR-Curve, AP | Confidence
For the full test set | mAP |

2.1 Outputting detection results

For different types of object detection networks, the raw output of model inference may take various forms.

First, the targets must be decoded into a standardized list of targets, each containing at least:

Category

2D bounding box (BBox)

Confidence

Confidence is handled differently here than at inference time, where a confidence threshold is applied: there is no fixed threshold, and anything that produces a response in the output channel is emitted as a target.

The meaning of the confidence value differs between types of networks, so it is impossible to set a unified threshold.
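As a minimal sketch of such a standardized target list, the structure below holds the three required fields. The class name `Detection`, the field names, and the `(x1, y1, x2, y2)` box convention are assumptions made for illustration; they do not come from any particular framework.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """One decoded target (hypothetical structure, for illustration only)."""
    category: str                            # predicted class label
    bbox: Tuple[float, float, float, float]  # 2D box as (x1, y1, x2, y2) in pixels
    confidence: float                        # raw score; no threshold is applied here


def sort_by_confidence(detections: List[Detection]) -> List[Detection]:
    """Return the detections ordered from highest to lowest confidence."""
    return sorted(detections, key=lambda d: d.confidence, reverse=True)


detections = [
    Detection("car", (10.0, 20.0, 110.0, 80.0), 0.92),
    Detection("pedestrian", (200.0, 40.0, 230.0, 120.0), 0.08),  # low score is still kept
    Detection("car", (300.0, 50.0, 380.0, 100.0), 0.55),
]
ranked = sort_by_confidence(detections)
```

Keeping the low-confidence detection in the list is deliberate: the later steps sweep over all confidence values rather than cutting at one threshold.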

2.2 For a single target

For a single target, the problem reduces to judging whether one detection result is correct.

First, GT (Ground Truth) and Predictions are grouped according to their respective categories.

Within each category, two sets of data, GT and Predictions, are matched based on IOU.
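The IOU-based matching within one category can be sketched as follows. This is a simplified greedy scheme under stated assumptions: boxes are `(x1, y1, x2, y2)`, predictions are visited in descending confidence, and each GT box can be matched at most once. The function names `iou` and `match_one_class` are hypothetical, and the VOC and COCO reference implementations differ in details such as how duplicates and "difficult" or "crowd" boxes are handled.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_one_class(preds: List[Tuple[Box, float]], gts: List[Box],
                    iou_thr: float = 0.5) -> Tuple[List[bool], int]:
    """Greedily match (box, confidence) predictions of one class to GT boxes.

    Returns one flag per prediction in descending-confidence order
    (True = matched, False = unmatched) and the number of unmatched GT boxes.
    """
    order = sorted(range(len(preds)), key=lambda i: preds[i][1], reverse=True)
    used = [False] * len(gts)
    flags = []
    for i in order:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if used[j]:
                continue  # this GT was already claimed by a higher-confidence prediction
            v = iou(preds[i][0], gt)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= iou_thr:
            used[best_j] = True
            flags.append(True)
        else:
            flags.append(False)
    return flags, used.count(False)


gts = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
preds = [
    ((0.0, 0.0, 10.0, 9.0), 0.9),     # overlaps the first GT with IOU 0.9
    ((0.0, 0.0, 10.0, 10.0), 0.8),    # duplicate on the first GT, which is already used
    ((50.0, 50.0, 60.0, 60.0), 0.7),  # overlaps nothing
]
flags, unmatched_gt = match_one_class(preds, gts)
```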

The matching results may be as follows:

TP (True Positive): a correct detection, with IOU ≥ threshold. That is, the predicted target is classified correctly and its box overlaps the GT closely enough.

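FP (False Positive): a wrong detection that does not match any GT target, or whose IOU < threshold. That is, the predicted target is classified incorrectly, or its box does not overlap the GT closely enough.

FN (False Negative): a missed GT, that is, a Ground Truth that was not matched by any detected target.

2.3 For a single category

After every target in the whole test set has been judged, Precision and Recall are computed per category:

Precision: the model's ability to find only correct targets. Precision = TP / (TP + FP), where TP + FP is all the targets the model found.

Recall: the model's ability to find all the targets. Recall = TP / (TP + FN), where TP + FN is all the GTs.

In the single-target step, only TP needs to be recorded. Among all detected targets, anything that is not a TP is an FP; among all GTs, anything not matched by a TP is an FN.

When different Confidence thresholds are set, the number of detected targets that are output differs, and so do the resulting Precision and Recall.

A series of thresholds gives a series of Precision and Recall values; connecting them gives the PR curve.

In practice, all targets are sorted by confidence from high to low, one target is accumulated at each step, and the current P-R values are computed.

An example PR curve is shown in the figure above.

The PR curve is a polyline for the following reason. At each accumulation step:

if the accumulated target is an FP, Recall is unchanged and Precision decreases, which corresponds to a vertical downward segment in the figure;

if the accumulated target is a TP, Recall increases and Precision increases (or stays at 1 if it was already 1), which corresponds to a segment running toward the upper right.

2.4 For the full set

At this point, a PR curve has been computed for each category.

The area enclosed by the PR curve and the x-axis is the AP value of the category. "Average" refers to averaging the results over different Confidence thresholds.

The mean of the AP values of all categories is the mAP. "mean" refers to averaging the results across categories.

3. Existing problems and ideas for improvement

3.1 Problems

(1) The requirements of few false/missed detections, accurate classification, and accurate localization are not carried through the whole mAP calculation to the final result.

First, targets are split and processed by category.

In the processing of a single target, IOU is used as the matching criterion and each detected target is classified as either TP or FP. Once the IOU threshold is chosen, the TP/FP assignment is fixed. In the subsequent steps, targets are abstracted into two classes, correct and wrong, and the degree of correctness or error is ignored.

The figure below illustrates the limiting case at IOU = 0.5: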

Similar IOU values may actually represent different situations:
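A small worked example makes this concrete. The three predicted boxes below are invented for illustration (boxes are `(x1, y1, x2, y2)`), and each has an IOU of exactly 0.5 with the same GT box even though the errors are of very different kinds.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


gt = (0.0, 0.0, 3.0, 3.0)
cases = {
    "same size, shifted sideways": (1.0, 0.0, 4.0, 3.0),
    "covers only half of the object": (0.0, 0.0, 3.0, 1.5),
    "twice as large as the object": (0.0, 0.0, 6.0, 3.0),
}
ious = {name: iou(box, gt) for name, box in cases.items()}
for name, value in ious.items():
    print(f"{name}: IOU = {value}")
```

With a threshold of 0.5 all three would count as TP, and the metric would not distinguish between them.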

When all targets are processed together, what is mainly examined is the ability to detect correct targets under different Confidence thresholds.

As a result, the requirements of few false/missed detections, accurate classification, and accurate localization are handled serially, one stage at a time, rather than jointly.

(2) The problems are varied, and with only a single metric there is no way to tell where the current performance bottleneck lies.

From the mAP value we can only get a rough idea of the network's overall performance; it is hard to analyze where the problem lies.

To give a few examples:

Suppose the network's output boxes fit well and, at a suitable Confidence threshold, precision and recall are fairly balanced, but the category of many targets is judged wrongly. Because targets are first grouped by predicted category, a wrong category keeps mAP low no matter how good localization, precision, and recall are. Yet the result alone does not show that the problem lies in the category judgment.

Take the Faster-RCNN family as an example: if the RPN part performs very well but the RCNN part performs very poorly, this cannot be determined from mAP alone.

Suppose two networks perform similarly in other respects but differ in the localization accuracy of their output boxes. For most targets judged as TP, one network's IOU values are very high and its boxes are very close to the GT, while the other network's IOU values only just exceed the threshold. In theory the two networks yield similar mAP values, but their actual performance differs.

(3) The focus of the mAP metric is not fully consistent with that of practical applications.

mAP accumulates the PR values at all Confidence values. In actual use, a Confidence threshold is set and targets below it are discarded, yet these targets still contribute to the mAP statistics. Some score-boosting tricks used in competitions exploit the effect of these low-confidence detections on mAP.

In addition, some points of concern in ADAS applications (especially vehicle detection) are not well reflected by mAP. For example:

Nearby targets straight ahead matter most, while distant targets off to the side receive relatively little attention.

The lower edge and the width of the target box matter, but the upper edge does not.

The stability and continuity of the detection results for the same target across consecutive frames matter.

Different kinds of misclassification differ in severity (for example, a truck misclassified as a passenger car is not a big problem, but a pedestrian misclassified as a vehicle is more serious).

3.2 Improvements

(1) Examine performance under different IOU thresholds.

In the VOC-style mAP calculation, only a single threshold, IOU = 0.5, is used.

The MS-COCO standard improves on this by taking 10 thresholds at equal intervals of 0.05 from 0.5 to 0.95.

AP: the mean of the mAP values calculated at all 10 IOU thresholds (the primary metric)

AP@.5IOU: the mAP value when the threshold is 0.5 (equivalent to VOC mAP)

AP@.75IOU: the mAP value when the threshold is 0.75

In addition, statistics are broken down by target size:

AP (small): AP over all 10 IOU thresholds for targets with a pixel area less than 32^2

AP (medium): AP over all 10 IOU thresholds for targets with a pixel area between 32^2 and 96^2

AP (large): AP over all 10 IOU thresholds for targets with a pixel area greater than 96^2

In addition, there is a series of metrics related to AR (Average Recall).

It can be seen that COCO mAP evaluates detection performance more comprehensively and mitigates the problem of relying on a single IOU threshold.
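The averaging over IOU thresholds can be sketched for a single category as follows. This is a simplified illustration, not the COCO reference implementation: it works on one image, uses the hypothetical function `average_precision`, and takes AP as the area under the raw PR curve, whereas VOC and COCO use interpolated precision.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[Box, float]], gts: List[Box],
                      iou_thr: float) -> float:
    """AP of one category: area under the uninterpolated PR curve."""
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    used = [False] * len(gts)
    tp = 0
    ap = 0.0
    for rank, (box, _conf) in enumerate(ranked, start=1):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not used[j]:
                v = iou(box, gt)
                if v > best_iou:
                    best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= iou_thr:
            used[best_j] = True
            tp += 1
            # Each TP raises recall by 1/len(gts); add precision times that step.
            ap += (tp / rank) / len(gts)
    return ap


gts = [(0.0, 0.0, 10.0, 10.0), (20.0, 0.0, 30.0, 10.0)]
preds = [
    ((0.0, 0.0, 10.0, 10.0), 0.9),    # IOU 1.0 with the first GT
    ((20.0, 0.0, 30.0, 7.2), 0.8),    # IOU 0.72 with the second GT
    ((50.0, 50.0, 60.0, 60.0), 0.7),  # overlaps nothing
]

# Ten thresholds: 0.50, 0.55, ..., 0.95
thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
ap_per_threshold = [average_precision(preds, gts, t) for t in thresholds]
ap_coco_style = sum(ap_per_threshold) / len(thresholds)
print(ap_per_threshold, ap_coco_style)
```

In this example the second prediction stops counting as a TP once the threshold exceeds 0.72, so the averaged value falls below the AP at IOU = 0.5. That is how the stricter thresholds reward tighter boxes.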

COCO mAP can be computed directly with pycocotools: provide the detection results in the prescribed format and the calculation is completed automatically.

(2) Unify training and evaluation metrics

During network training, the localization branch is usually optimized with an IOU loss, which has since been upgraded to DIOU or CIOU.

During testing, the IOU metric can likewise be replaced with a DIOU- or CIOU-like version, giving a more reasonable evaluation and unifying training and evaluation.
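As a sketch of what a DIOU-like matching score could look like, the function below computes DIOU as IOU minus the squared distance between box centers, normalized by the squared diagonal of the smallest box enclosing both. Using this value as an evaluation score is the idea described above, not an established benchmark protocol, and a threshold chosen for plain IOU would need to be re-chosen for it.

```python
def diou(a, b):
    """Distance-IOU of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou_value = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
         + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2

    return iou_value - rho2 / c2 if c2 > 0 else iou_value


gt = (0.0, 0.0, 4.0, 4.0)
centered = (1.0, 1.0, 3.0, 3.0)  # IOU 0.25, same center as the GT
corner = (0.0, 0.0, 2.0, 2.0)    # IOU 0.25, pushed into one corner

score_centered = diou(centered, gt)
score_corner = diou(corner, gt)
print(score_centered, score_corner)
```

Both predictions have the same IOU with the GT, but the off-center one receives a lower DIOU, so the score separates cases that plain IOU treats as equal.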

(3) Design more custom indicators

mAP is a benchmark, a standard exercise that allows different networks to be compared, including open source models, models developed by external teams, and so on.

In addition, based on the aspects of model performance we care about, some additional metrics can be designed. These include:

Intermediate metrics that can be split out of the mAP calculation

Metrics not covered by mAP
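One possible set of intermediate metrics is sketched below. The input format, the function name `diagnostic_indicators`, and the three metric names are all hypothetical: the sketch assumes predictions were first matched to GT boxes without regard to category, giving `(predicted_class, gt_class, iou)` triples, so that localization and classification can be examined separately.

```python
from typing import Dict, List, Tuple


def diagnostic_indicators(pairs: List[Tuple[str, str, float]],
                          iou_thr: float = 0.5) -> Dict[str, float]:
    """Split matched (predicted_class, gt_class, iou) pairs into separate metrics."""
    localized = [p for p in pairs if p[2] >= iou_thr]  # box is good enough
    correct = [p for p in localized if p[0] == p[1]]   # box and class both right
    return {
        # Share of matched pairs whose box passes the IOU threshold
        "localization_rate": len(localized) / len(pairs) if pairs else 0.0,
        # Among well-localized pairs, share with the right category
        "class_accuracy_given_localized":
            len(correct) / len(localized) if localized else 0.0,
        # How tight the boxes of the true positives are on average
        "mean_iou_of_tp":
            sum(p[2] for p in correct) / len(correct) if correct else 0.0,
    }


pairs = [
    ("car", "car", 0.9),
    ("car", "truck", 0.8),               # well localized, wrong category
    ("pedestrian", "pedestrian", 0.6),
    ("car", "car", 0.3),                 # right category, poorly localized
]
indicators = diagnostic_indicators(pairs)
print(indicators)
```

A low class accuracy with a high localization rate would point at the category judgment, and a mean IOU of TPs close to the threshold would point at loose boxes. mAP alone cannot separate these two cases.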

(4) Threshold selection when using the model

When the model is used for inference, decoding the targets involves choosing a Confidence threshold. The usual approach is one-size-fits-all: a single unified threshold is chosen.

During the calculation of mAP, the PR curve for each category is output. A typical PR curve is shown below:

From the PR curve, we can not only find the optimal point in the mathematical sense, but also select a trade-off value according to the different tolerance for false detections and missed detections.

According to the different conditions of each category, different Confidence thresholds can be selected to optimize the detection results of each category.

In practice, we can first determine the category from the class channel, and then filter the detected targets using the conf channel and the threshold chosen for that category.

For cases where the categories differ greatly in nature or are severely imbalanced, this can yield a better output.

For example, in ADAS applications, there may be differences in the recognition requirements for vehicle targets, pedestrian targets and traffic sign targets. It is a more reasonable choice to customize the threshold according to their respective PR curves.
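Choosing a per-category threshold from the PR curve can be sketched as follows. The input is assumed to be a list of `(confidence, is_tp)` flags for one category, as produced by the matching step; the function names and the two selection rules (best F1, and maximum recall under a minimum precision) are illustrative choices rather than a prescribed method.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (confidence, precision, recall)


def pr_points(scored: List[Tuple[float, bool]], num_gt: int) -> List[Point]:
    """Accumulate detections from high to low confidence, one at a time."""
    points = []
    tp = 0
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    for rank, (conf, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        points.append((conf, tp / rank, tp / num_gt))
    return points


def threshold_for_best_f1(points: List[Point]) -> Optional[float]:
    """Confidence at the operating point with the highest F1 score."""
    def f1(p: Point) -> float:
        return 2 * p[1] * p[2] / (p[1] + p[2]) if p[1] + p[2] > 0 else 0.0
    return max(points, key=f1)[0] if points else None


def threshold_for_precision(points: List[Point],
                            min_precision: float) -> Optional[float]:
    """Lowest confidence that still keeps precision >= min_precision, or None."""
    ok = [p for p in points if p[1] >= min_precision]
    return min(ok, key=lambda p: p[0])[0] if ok else None


# Invented flags for one category with 4 GT boxes
scored = [(0.9, True), (0.8, True), (0.7, False),
          (0.6, True), (0.5, False), (0.4, False)]
points = pr_points(scored, num_gt=4)

balanced = threshold_for_best_f1(points)
strict = threshold_for_precision(points, 0.9)   # few false detections tolerated
relaxed = threshold_for_precision(points, 0.7)  # missed detections are costlier
print(balanced, strict, relaxed)
```

Running the same selection separately for each category, each with its own tolerance, gives the per-category thresholds described above.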


© 2024 shulou.com SLNews company. All rights reserved.
