2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article is reproduced from Xin Zhiyuan
[Introduction] The latest paper from Kaiming He's team at FAIR proposes Panoptic FPN, a network for the task of panoptic segmentation. It combines FCN and Mask R-CNN, which are used for semantic segmentation and instance segmentation respectively, into a single design. The method may become a strong baseline for panoptic segmentation research.
Kaiming He and colleagues at Facebook AI Research (FAIR) have released their latest paper on arXiv: Panoptic Feature Pyramid Networks.
The paper focuses on panoptic segmentation. At the architecture level, it combines FCN and Mask R-CNN, which are used for semantic segmentation and instance segmentation respectively, into a single network: Panoptic FPN.
The authors report that, in their experiments, Panoptic FPN is effective for both semantic segmentation and instance segmentation, and is both robust and accurate. Given its effectiveness and conceptual simplicity, the method is expected to serve as a strong baseline and a basis for future work on panoptic segmentation.
Although the idea is conceptually simple, designing a single network that achieves high accuracy on both tasks is challenging, because the top-performing methods for the two tasks differ in many ways.
Panoptic FPN results on COCO and Cityscapes
For semantic segmentation, FCNs enhanced with dilated convolutions are the top performers. For instance segmentation, the region-based Mask R-CNN with a Feature Pyramid Network (FPN) backbone has been the basis of all top entries in recent recognition challenges.
Considering the architectural differences of these methods, it may be necessary to sacrifice accuracy in instance segmentation or semantic segmentation when designing a single network for these two tasks. However, the FAIR team proposes a simple, flexible, and efficient architecture that uses a single network that generates both region-based output (instance segmentation) and dense-pixel output (semantic segmentation) to ensure the accuracy of both tasks.
When trained for each task separately, the method achieves good results for both instance segmentation and semantic segmentation on COCO and Cityscapes. Its instance segmentation accuracy is close to that of Mask R-CNN, and its semantic segmentation accuracy is similar to that of the latest DeepLabV3+.
Next, we will introduce the architecture and experimental results of Panoptic FPN in detail.
Panoptic Feature Pyramid Network (Panoptic FPN)
Panoptic FPN is a simple, single-network baseline that aims for top performance on instance segmentation, semantic segmentation, and their joint task: panoptic segmentation.
The design principle is to start from Mask R-CNN with FPN and make minimal changes so that it also generates a dense-pixel output for semantic segmentation (see figure 1).
Model architecture
Figure 1: The architecture of Panoptic FPN
(a) feature pyramid network (b) instance segmentation branch (c) semantic segmentation branch
Feature Pyramid Network: first, a brief look at FPN. FPN takes a standard network with features at multiple spatial resolutions (such as ResNet) and adds a top-down pathway with lateral connections, as shown in figure 1a. The top-down pathway starts at the deepest layer of the network and progressively upsamples it, adding in transformed versions of the higher-resolution features from the bottom-up pathway. FPN generates a pyramid, typically with scales from 1/32 to 1/4 of the input resolution, where each pyramid level has the same channel dimension (256 by default).
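The top-down merge described above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: it works on single-channel 2D lists, uses nearest-neighbor upsampling, and treats the lateral connection as an identity, whereas a real FPN applies learned convolutions at each step. The function names `upsample2x`, `add_maps`, and `top_down` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D list (toy stand-in for real upsampling)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def add_maps(a, b):
    """Element-wise sum of two 2D lists with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(bottom_up):
    """Build a pyramid from bottom-up maps ordered fine -> coarse.

    Each level is assumed to be exactly 2x the size of the next coarser one.
    Lateral connections are identities here (an assumption for brevity).
    """
    pyramid = [bottom_up[-1]]                      # start at the deepest level
    for lateral in reversed(bottom_up[:-1]):
        merged = add_maps(lateral, upsample2x(pyramid[0]))
        pyramid.insert(0, merged)                  # keep fine -> coarse order
    return pyramid


# Three toy levels: 4x4, 2x2, 1x1
fine = [[1] * 4 for _ in range(4)]
mid = [[10] * 2 for _ in range(2)]
coarse = [[100]]
levels = top_down([fine, mid, coarse])
```

Each output level has the same spatial size as its bottom-up input but also carries information from every coarser level, which is what makes the pyramid semantically rich at high resolution.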
Instance segmentation branch: the design of FPN, and in particular the use of the same channel dimension for all pyramid levels, makes it easy to attach a region-based object detector such as Faster R-CNN. To output instance segmentations, we use Mask R-CNN, which extends Faster R-CNN by adding an FCN branch that predicts a binary segmentation mask for each candidate region, as shown in figure 1b.
Backbone architectures for increasing feature resolution
Panoptic FPN: as mentioned earlier, our approach is to modify Mask R-CNN with FPN so that it also produces pixel-level semantic segmentation predictions. To make accurate predictions, however, the features used for this task should:
(1) have suitably high resolution to capture fine structures;
(2) encode sufficiently rich semantics to accurately predict class labels; and
(3) capture multi-scale information to predict stuff regions at multiple resolutions.
Although FPN was designed for object detection, these requirements (high-resolution, rich, multi-scale features) are exactly the characteristics of FPN.
Therefore, we propose attaching a simple and fast semantic segmentation branch to FPN.
Figure 3: semantic segmentation branch
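This article does not spell out how the branch works, so the sketch below relies on an assumption taken from the paper's description: every pyramid level is upsampled in 2x stages until it reaches the 1/4 scale, and the results are summed element-wise. The toy keeps only that upsample-and-sum skeleton on single-channel 2D lists; the convolution, normalization, and activation layers of the real branch are omitted, and all function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D list (toy stand-in for bilinear upsampling)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def stages_to_target(stride, target_stride=4):
    """Number of 2x upsampling stages needed to go from `stride` to `target_stride`.

    Assumes stride / target_stride is a power of two (e.g. 32 -> 3 stages, 4 -> 0).
    """
    return (stride // target_stride).bit_length() - 1


def merge_levels(levels, target_stride=4):
    """Upsample every level to the target stride and sum them element-wise.

    `levels` maps a stride (4, 8, 16, 32 assumed) to a 2D list for that level.
    """
    total = None
    for stride, fmap in levels.items():
        for _ in range(stages_to_target(stride, target_stride)):
            fmap = upsample2x(fmap)
        if total is None:
            total = fmap
        else:
            total = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(total, fmap)]
    return total


# Toy pyramid for a 32x32 input: 8x8, 4x4, 2x2, and 1x1 maps
pyramid = {
    4: [[1] * 8 for _ in range(8)],
    8: [[2] * 4 for _ in range(4)],
    16: [[3] * 2 for _ in range(2)],
    32: [[4]],
}
merged = merge_levels(pyramid)
```

In a full model, a final per-pixel classifier and one more upsampling step back to the input resolution would follow the merged map.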
Experiments and results
Our goal is to show that our method, Panoptic FPN, can serve as a simple and effective single-network baseline for instance segmentation, semantic segmentation, and their joint task, panoptic segmentation.
Therefore, we start by testing our approach on semantic segmentation alone (we call this single-task variant Semantic FPN). Surprisingly, this simple model achieves competitive semantic segmentation results on the COCO and Cityscapes datasets.
Next, we analyze the integration of the semantic segmentation branch with Mask R-CNN, as well as the effect of joint training. Finally, we show panoptic segmentation results, again on the COCO and Cityscapes datasets. Qualitative results are shown in figures 2 and 6.
Semantic FPN
Table 1: Semantic FPN results
Cityscapes dataset:
We first compare the baseline Semantic FPN with existing semantic segmentation methods on the Cityscapes dataset in Table 1a. Although our method is a minimal extension of FPN, it achieves strong results compared with heavily engineered systems such as DeepLabV3+ [12].
In our baseline, we deliberately avoid orthogonal architectural improvements, such as Non-local or SE, which may yield further gains. In terms of computation and memory, Semantic FPN is lighter than typical dilated-convolution models and produces higher-resolution features (see figure 4).
Figure 4
COCO dataset:
An early version of our approach won the COCO-Stuff challenge in 2017. The results are shown in Table 1b.
Multitask training
Our approach performs very well on each task individually. For semantic segmentation, the results in the previous section show this; for instance segmentation, it is known because the method is based on Mask R-CNN. But can we train the two tasks together in a multi-task setting?
To combine our semantic segmentation branch with the instance segmentation branch in Mask R-CNN, we need to determine how to train a single, unified network. Previous studies have shown that multi-task training is often challenging and may degrade results. We likewise observe that, for semantic or instance segmentation, adding the auxiliary task can reduce accuracy compared with the single-task baseline.
Table 2: Multi-task training
In Table 2, the ResNet-50-FPN results show that a simple scaling weight on the semantic segmentation loss, λs, or on the instance segmentation loss, λi, can give a re-weighting that improves on the single-task baselines. Specifically, adding the semantic segmentation branch with a properly chosen λs improves instance segmentation, and vice versa. This can be exploited to improve single-task results. However, our main goal is to solve both tasks at the same time, which is discussed in the next section.
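The loss re-weighting can be illustrated with a small helper. This is a minimal sketch under stated assumptions: the instance loss is taken to be the sum of the usual Mask R-CNN classification, box, and mask terms, the function name `joint_loss` is hypothetical, and the default weights are placeholders rather than the values tuned in the paper.

```python
def joint_loss(loss_cls, loss_box, loss_mask, loss_sem, lambda_i=1.0, lambda_s=1.0):
    """Weighted sum of instance and semantic segmentation losses.

    lambda_i scales the instance segmentation loss (classification + box + mask),
    lambda_s scales the semantic segmentation loss. Defaults are placeholders.
    """
    instance_loss = loss_cls + loss_box + loss_mask
    return lambda_i * instance_loss + lambda_s * loss_sem


# Example: down-weight the semantic term relative to the instance term
total = joint_loss(1.0, 2.0, 3.0, 4.0, lambda_i=1.0, lambda_s=0.5)
```

Setting one weight to zero recovers single-task training of the other branch, which is why the same network can be used for the single-task baselines.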
Panoptic FPN
We now test Panoptic FPN on the joint task of panoptic segmentation, where the network must jointly and accurately output stuff and thing segmentations.
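To make "jointly output stuff and thing segmentations" concrete, here is one way the two outputs could be merged into a single label per pixel. The article does not describe the merging step, so the rules below are assumptions modeled on the paper's heuristic: higher-scoring instances win overlaps, instances take priority over stuff, and small stuff regions are dropped. The name `merge_panoptic`, the input layout, and the `min_stuff_area` parameter are all hypothetical.

```python
def merge_panoptic(semantic, instances, min_stuff_area=0):
    """Merge semantic and instance predictions into one label per pixel.

    semantic:  2D list of stuff class ids.
    instances: list of (score, class_id, mask), mask being a 2D list of 0/1.
    Returns a 2D list of ("thing", class_id, instance_index),
    ("stuff", class_id, None), or None for unlabeled pixels.
    """
    h, w = len(semantic), len(semantic[0])
    out = [[None] * w for _ in range(h)]

    # Higher-scoring instances claim their pixels first.
    ordered = sorted(enumerate(instances), key=lambda item: -item[1][0])
    for inst_id, (_score, cls, mask) in ordered:
        for y in range(h):
            for x in range(w):
                if mask[y][x] and out[y][x] is None:
                    out[y][x] = ("thing", cls, inst_id)

    # Count the remaining area of each stuff class.
    stuff_area = {}
    for y in range(h):
        for x in range(w):
            if out[y][x] is None:
                cls = semantic[y][x]
                stuff_area[cls] = stuff_area.get(cls, 0) + 1

    # Unclaimed pixels fall back to stuff, unless the region is too small.
    for y in range(h):
        for x in range(w):
            if out[y][x] is None and stuff_area[semantic[y][x]] >= min_stuff_area:
                out[y][x] = ("stuff", semantic[y][x], None)
    return out


semantic = [[7, 7, 7], [7, 7, 7]]
instances = [
    (0.9, 1, [[1, 1, 0], [0, 0, 0]]),  # higher score, wins the overlap at (0, 1)
    (0.5, 2, [[0, 1, 1], [0, 0, 0]]),
]
panoptic = merge_panoptic(semantic, instances)
```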
Table 3: Panoptic FPN results
Main results: in Table 3a, we compare two separately trained networks against Panoptic FPN with the same backbone. Panoptic FPN achieves comparable performance with roughly half the computation.
We also balance the computational budget by comparing Panoptic FPN with an R101-FPN backbone against two separate networks with R50-FPN backbones (R50-FPN×2), as shown in Table 3b. With roughly the same computational budget, Panoptic FPN is significantly better than the two separate networks.
To sum up, these results show that the joint approach is beneficial, and that the proposed Panoptic FPN can serve as a solid baseline for the joint task.
Paper address:
https://arxiv.org/pdf/1901.02446.pdf