2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces how to achieve accurate oriented target detection in complex scenes with PIoU Loss. The content is quite detailed; interested readers can use it for reference, and I hope it will be helpful to you.
Abstract
Target detection with oriented bounding boxes (OBB) can better localize rotated, tilted targets by reducing overlap with background regions. Most existing OBB methods are built by adding an extra angle dimension (optimized with a distance loss) on top of horizontal-bounding-box (HBB) detectors. However, because the distance loss only minimizes the angle error of the OBB and is only loosely correlated with IoU, it is insensitive to targets with large aspect ratios. This paper therefore proposes a new loss, the Pixels-IoU (PIoU) loss, which exploits both the angle and the IoU to achieve more accurate OBB regression. PIoU loss is derived from IoU measured at the pixel level; it is simple and applies to both horizontal and oriented bounding boxes. To demonstrate its effectiveness, the paper evaluates PIoU loss in both anchor-based and anchor-free frameworks. The experimental results show that PIoU loss significantly improves the performance of OBB detectors, especially for targets with high aspect ratios and complex backgrounds. In addition, since existing evaluation datasets contain few high-aspect-ratio targets, a new dataset, Retail50K, is introduced to encourage the application of OBB detectors to more complex environments.
OBB (oriented bounding box): a target box whose tilt angle is unrestricted
HBB (horizontal bounding box): a horizontal target box whose tilt angle defaults to 0
PIoU (Pixels-IoU loss): approximates the intersection and union areas of two boxes by accumulating pixels instead of computing them from coordinates
Problems addressed: recognition of non-horizontal boxes (compared with ordinary horizontal target boxes), targets with large aspect ratios (compare the figure below with the example images of the Retail50K dataset), and OBB recognition against complex backgrounds (compare the sparse, simple backgrounds in the figure below).
This paper also presents a new dataset: Retail50K (supermarket retail shelf images), with complex backgrounds (various beverage bottles, etc.) and OBB targets (non-horizontal boxes with large aspect ratios).
Related work
Training a detector with rotation invariance based on SSD
Training a rotation detector based on Faster RCNN
Designing an RoI transformer to learn rotation-invariant features, converting HBBs to OBBs
Extracting OBB candidate boxes with a generative model and selecting among them by local maximum likelihood
Limitation of existing work: these methods target remote-sensing aerial images, where the background is simple and the objects do not have large aspect ratios.
Pixels-IoU (PIoU) Loss:
Compared with conventional losses, an OBB (non-horizontal box) has an extra tilt-angle dimension, so common regression losses cannot be applied to it directly.
Starting from IoU: computing IoU requires the intersection and union of two boxes. Since an image is composed of pixels, can the intersection and union regions be approximated by the number of pixels inside them?
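To make the idea concrete, here is a toy sketch (not the paper's code) that approximates the IoU of two axis-aligned boxes by counting the integer pixel centers inside each; the grid size is an arbitrary assumption:

```python
# Toy illustration of pixel-counting IoU for two axis-aligned boxes.
# A pixel center counts toward a box if it lies inside that box.
def pixel_iou(box_a, box_b, grid=100):
    """box = (x_min, y_min, x_max, y_max); grid bounds the scan region."""
    def inside(px, py, box):
        x0, y0, x1, y1 = box
        return x0 <= px <= x1 and y0 <= py <= y1

    inter = union = 0
    for px in range(grid):
        for py in range(grid):
            a = inside(px, py, box_a)
            b = inside(px, py, box_b)
            inter += a and b   # pixel inside both boxes
            union += a or b    # pixel inside at least one box
    return inter / union if union else 0.0
```

The same counting idea extends to oriented boxes once the inside/outside test accounts for the tilt angle, which is exactly what the binary constraint below provides.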
As shown in the figure above, the green point p_ij is a pixel of the image, c is the center of the OBB, t_ij is the foot of the perpendicular from p to the box's center line, the distance from p to t is denoted d_ij^h, and the distance from c to t is denoted d_ij^w.
The author proposes a binary constraint to determine whether the pixel p lies inside the OBB:
The distances d^h and d^w determine whether p is inside the box.
Here θ denotes the tilt angle of the box; the geometric relationships are shown in the following figure:
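Under the usual rotated-box geometry, d^w and d^h can be obtained by projecting p - c onto the box's own axes. The following sketch illustrates this membership test (sign conventions may differ from the paper's figure):

```python
import math

# Illustrative sketch of the binary constraint: rotate the offset p - c
# into the box's coordinate frame, then compare against the half extents.
def relative_distances(p, c, theta):
    dx, dy = p[0] - c[0], p[1] - c[1]
    d_w = abs(dx * math.cos(theta) + dy * math.sin(theta))    # along width axis
    d_h = abs(-dx * math.sin(theta) + dy * math.cos(theta))   # along height axis
    return d_w, d_h

def inside_obb(p, c, w, h, theta):
    # Binary constraint: p is inside iff both projected distances
    # fall within the box's half-width and half-height.
    d_w, d_h = relative_distances(p, c, theta)
    return d_w <= w / 2 and d_h <= h / 2
```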
Computing the intersection and union of two boxes by accumulating pixels
Since the binary constraint above is discontinuous and non-differentiable, the author converts it into the product of two kernel functions (the kernel method):
k is an adjustable coefficient that controls the sensitivity to the target pixel p.
The resulting function F is continuous and differentiable while preserving the correct value trend.
As shown above, the kernel tends to 1 when the pixel p is close to the box center c and to 0 when it is far away, which approximately reflects the probability distribution of the pixel relative to the box.
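A minimal sketch of such a kernel, assuming a logistic form 1 - 1/(1 + e^(-k(d - s))) and an arbitrary k = 10 (both assumptions for illustration); F(p|b) is the product of the width and height kernels:

```python
import math

# Soft indicator: close to 1 when d < s (pixel within the half-extent s),
# close to 0 when d > s; k controls how sharp the transition is.
def kernel(d, s, k=10.0):
    return 1.0 - 1.0 / (1.0 + math.exp(-k * (d - s)))

def soft_membership(d_w, d_h, w, h, k=10.0):
    # F(p | b) = K(d_w, w/2) * K(d_h, h/2): differentiable stand-in
    # for the binary inside/outside test.
    return kernel(d_w, w / 2, k) * kernel(d_h, h / 2, k)
```

Unlike the hard test, this F varies smoothly across the box boundary, so gradients can flow back to the box parameters during training.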
The intersection and union of two boxes are then computed as follows:
To reduce the amount of computation, the above can be simplified using the relation between the box's width w and height h:
Finally, we get the calculation form of PIoU:
Denote (b, b') as a positive pair, where b is a prediction box based on a positive anchor (an anchor that matches a ground-truth box with IoU above 0.5) and b' is the matched ground-truth box. M is the number of positive pairs.
Then the Loss of PIoU can be expressed as:
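Combining the pieces, a self-contained sketch of the full computation might look like this (the grid resolution, k = 10, and the box encoding (cx, cy, w, h, θ) are illustrative assumptions, not the paper's exact implementation):

```python
import math

def kernel(d, s, k=10.0):
    # Soft indicator: ~1 when d < s, ~0 when d > s.
    return 1.0 - 1.0 / (1.0 + math.exp(-k * (d - s)))

def soft_f(px, py, box, k=10.0):
    # F(p | b): soft membership of pixel (px, py) in an oriented box
    # encoded as (cx, cy, w, h, theta).
    cx, cy, w, h, theta = box
    dx, dy = px - cx, py - cy
    d_w = abs(dx * math.cos(theta) + dy * math.sin(theta))
    d_h = abs(-dx * math.sin(theta) + dy * math.cos(theta))
    return kernel(d_w, w / 2, k) * kernel(d_h, h / 2, k)

def piou(b, b_gt, grid=40, k=10.0):
    # Accumulate soft memberships over a pixel grid to approximate
    # the intersection and union areas, then take their ratio.
    inter = union = 0.0
    for ix in range(grid):
        for iy in range(grid):
            fa = soft_f(ix + 0.5, iy + 0.5, b, k)
            fb = soft_f(ix + 0.5, iy + 0.5, b_gt, k)
            inter += fa * fb
            union += fa + fb - fa * fb
    return inter / max(union, 1e-9)

def piou_loss(pairs):
    # L = -(1/M) * sum over positive pairs of ln PIoU(b, b').
    return -sum(math.log(max(piou(b, bg), 1e-9)) for b, bg in pairs) / len(pairs)
```

A well-aligned pair yields PIoU near 1 and a loss near 0, while disjoint boxes yield PIoU near 0 and a large loss, which is the behavior the regression target needs.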
Dataset Retail50K
The dataset is built from supermarket photos contributed voluntarily from different countries and regions; the only annotated category is the shelf layer.
Example figure:
It has the following characteristics:
Complex backgrounds: shelf layers may be occluded by price tags or promotional strips, and the shelves hold all kinds of drinks, snacks, and so on.
Large aspect ratios: most shelf layers are long and narrow.
Practical value: it can be used for retail shelf-label detection, automatic shelf layering, shelf-layer and image tilt-angle estimation, and so on.
The figure above illustrates the distribution of aspect ratio, tilt angle and the number of instances in the dataset.
Experimental results
Experiments on the adjustable parameter k in the kernel function:
On the DOTA dataset, the effect of PIoU Loss is compared:
(DOTA is an aerial remote-sensing dataset of ground objects with tilt angles.)
Here, HPIoU is the simplified version that uses the width-height relation to reduce computation; its accuracy drops slightly, but it saves time.
Test results on the PASCAL dataset:
The test of PIoU on the Retail50K dataset is as follows:
Comparative results of PIoU and SmoothL1 losses: in the figure below, the red boxes are produced with PIoU, which is visibly better.
This concludes the discussion of how PIoU Loss achieves accurate oriented target detection in complex scenes. I hope the above content is helpful and that you learned something new. If you found the article useful, feel free to share it with others.