What are the relevant knowledge points of PANet 04/23 Update SLTechnology News&Howtos

2025-04-23 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces the key knowledge points of PANet. Many people have questions about them in practice, so the editor consulted a range of materials and organized them into a simple, easy-to-follow summary. I hope it helps answer the question "what are the relevant knowledge points of PANet?" Let's get started.

First, overall description of PANet

PANet is an improved network based on Mask R-CNN. The three improvements are as follows:

The original Mask R-CNN does not make good use of low-level information. High-level feature maps respond to objects as a whole, while lower-level feature maps respond to local textures and patterns, and this low-level information helps locate objects more accurately. PANet therefore adds Bottom-up Path Augmentation (b in the overall structure diagram), which propagates information from the low levels up to the high levels and also reduces the number of convolution layers that information must pass through between the low and high levels.

The original RoI pooling extracts the features of each proposal from only one level. PANet instead uses Adaptive Feature Pooling (AFP, c in the overall structure diagram), which pools each RoI from multiple levels at the same time and fuses the multi-level information for prediction.

The final mask prediction branch combines an FCN prediction with a fully-connected prediction: the former focuses on local detail, the latter on the global context of the proposal, and combining the two improves the quality of the final mask.

The following figure shows the overall structure of PANet:

The green dashed line shows that, once path b is added, information passes through only a few convolution layers between the low-level features near the input and the high-level features. Along the red dashed line, by contrast, it may have to pass through hundreds of convolution layers (ResNet Block1~4 of the backbone). The shorter path lets information flow more smoothly.

Second, each key point is described in detail

1. Bottom-up Path Augmentation

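In the overall structure diagram, N2 is simply P2, and N3~N5 are computed by the convolution building block shown below. Each block takes a higher-resolution map N_i and a coarser map P_{i+1}: N_i passes through a 3 × 3 convolution with stride 2 to halve its spatial size, the result is added element-wise to P_{i+1} through a lateral connection, and the sum goes through another 3 × 3 convolution to produce N_{i+1}. Each convolution is followed by a ReLU.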
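To make the data flow of this building block concrete, here is a minimal pure-Python sketch of the bottom-up path: N2 = P2 and N_{i+1} = ReLU(conv3×3(P_{i+1} + ReLU(stride-2 conv3×3(N_i)))). It is a simplification under stated assumptions: every feature map has a single channel and is a nested list, the two 3 × 3 kernels are toy values supplied by the caller rather than learned weights, the same two kernels are reused at every level purely for brevity (the sketch makes no claim about weight sharing in PANet), and zero padding of 1 is assumed so that the stride-2 convolution halves the spatial size. The names `conv3x3` and `bottom_up_path` are hypothetical.

```python
from typing import List

Map = List[List[float]]  # single-channel feature map (simplifying assumption)


def conv3x3(x: Map, kernel: Map, stride: int = 1) -> Map:
    """3x3 cross-correlation with zero padding 1 and the given stride."""
    h, w = len(x), len(x[0])
    out_h = (h + 2 - 3) // stride + 1
    out_w = (w + 2 - 3) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for oi in range(out_h):
        for oj in range(out_w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    # -1 shifts the window to account for the zero padding.
                    i, j = oi * stride + ki - 1, oj * stride + kj - 1
                    if 0 <= i < h and 0 <= j < w:
                        acc += x[i][j] * kernel[ki][kj]
            out[oi][oj] = acc
    return out


def relu(x: Map) -> Map:
    return [[max(0.0, v) for v in row] for row in x]


def add(a: Map, b: Map) -> Map:
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def bottom_up_path(p_levels: List[Map], k_down: Map, k_fuse: Map) -> List[Map]:
    """Build [N2, N3, N4, N5] from [P2, P3, P4, P5]."""
    n_levels = [p_levels[0]]  # N2 is simply P2
    for p_next in p_levels[1:]:
        down = relu(conv3x3(n_levels[-1], k_down, stride=2))  # halve N_i
        fused = add(p_next, down)  # lateral connection with P_{i+1}
        n_levels.append(relu(conv3x3(fused, k_fuse)))  # produces N_{i+1}
    return n_levels


avg = [[1.0 / 9] * 3 for _ in range(3)]  # toy 3x3 averaging kernel
P = [[[1.0] * s for _ in range(s)] for s in (8, 4, 2, 1)]  # toy P2..P5
N = bottom_up_path(P, avg, avg)
print([len(n) for n in N])  # [8, 4, 2, 1]
```

With P2~P5 of sizes 8, 4, 2 and 1, the outputs N2~N5 keep those sizes, because each stride-2 convolution shrinks N_i to exactly the size of P_{i+1} before the element-wise addition.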

2. Adaptive Feature Pooling

In FPN, objects of different sizes are assigned to different levels: the smallest go to P2 and the largest to P5. This method is simple and effective, but it does not necessarily give the best results; for example, two objects whose sizes differ by only 10 pixels may be assigned to different levels. To get better results, PANet uses, for every proposal, the features in the corresponding region of all of N2~N5 (the gray areas in b of the structure diagram). The procedure is as follows:

1. Use RoIAlign to extract four feature maps of the same shape, one from each level.

2. Fuse the four groups of features with an element-wise operation: sum, max, or product.

3. Use the fused feature map for classification, bbox prediction and mask prediction.
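The contrast between FPN's single-level assignment and AFP's multi-level fusion can be sketched in a few lines of Python. The level rule below is the heuristic from the FPN paper, k = ⌊k0 + log2(√(wh)/224)⌋ with k0 = 4, clamped to P2~P5; treat those constants as an assumption here, since this article does not state them. RoIAlign itself is not implemented: the sketch assumes the four per-level RoI features have already been extracted and flattened to equal-length lists. `fpn_level` and `fuse` are hypothetical names.

```python
import math
from typing import List


def fpn_level(w: float, h: float, k0: int = 4, canonical: float = 224.0) -> int:
    """FPN-style single-level assignment, clamped to the P2..P5 range."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return min(5, max(2, k))


def fuse(level_feats: List[List[float]], op: str = "max") -> List[float]:
    """Element-wise fusion of same-shaped (flattened) RoI features from N2..N5."""
    columns = list(zip(*level_feats))  # one tuple of per-level values per element
    if op == "max":
        return [max(vals) for vals in columns]
    if op == "sum":
        return [sum(vals) for vals in columns]
    if op == "product":
        return [math.prod(vals) for vals in columns]
    raise ValueError(f"unknown fusion op: {op}")


print(fpn_level(220, 220), fpn_level(230, 230))  # 3 4
print(fuse([[1.0, 5.0], [3.0, 2.0], [0.0, 4.0], [2.0, 1.0]], "max"))  # [3.0, 5.0]
```

Two square boxes of side 220 and 230 pixels land on different FPN levels, which is exactly the boundary effect described above; AFP sidesteps it by fusing the features of all four levels for every proposal.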

The procedure differs slightly between the bbox branch and the mask branch:

Bbox branch

As shown in the following figure, AFP in the bbox branch works as follows:

1. First use RoIAlign to obtain 4 feature maps of the same size.

2. Pass each of the 4 feature maps separately through the same fully connected layer.

3. Fuse the 4 groups of features.

4. Apply a further fully connected layer to the fused features to compute the classification and bbox regression results.
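The four bbox-branch steps can be sketched with plain Python lists for features and weights. Following the article, a single per-level fc layer is shared across the four levels; max is assumed as the fusion operation. The layer names `fc1`/`fc2`, the function names, and the convention of splitting the final output into class scores followed by box values are hypothetical choices for this example, and the weights are caller-supplied toy values.

```python
from typing import List, Tuple

Vec = List[float]
Layer = Tuple[List[List[float]], Vec]  # (weight matrix, bias vector)


def fc(x: Vec, layer: Layer) -> Vec:
    """Fully connected layer: y = W x + b."""
    weight, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def bbox_branch_afp(level_feats: List[Vec], fc1: Layer, fc2: Layer,
                    num_classes: int) -> Tuple[Vec, Vec]:
    # Step 1 (assumed done): level_feats holds the 4 RoIAlign outputs, flattened.
    # Step 2: the same fc1 weights are applied to every level separately.
    adapted = [fc(f, fc1) for f in level_feats]
    # Step 3: element-wise max fusion across levels.
    fused = [max(vals) for vals in zip(*adapted)]
    # Step 4: one more fc layer; its output is split into class scores and box values.
    out = fc(fused, fc2)
    return out[:num_classes], out[num_classes:]
```

Because fc1 runs before fusion, each level's features are first adapted by the same learned mapping, so the max is taken over comparable quantities rather than over raw features from different depths of the network.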

Mask branch

The mask branch has four convolution layers, and feature fusion is performed after conv1. The computation is as follows:

1. First use RoIAlign to obtain 4 feature maps of the same size.

2. Apply conv1 to each of the 4 feature maps separately.

3. Fuse the 4 groups of features.

4. Run the remaining layers of the mask branch on the fused features to obtain the final mask prediction.
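For the mask branch the only structural difference is where fusion happens: after conv1 instead of after an fc layer. The sketch below keeps that ordering explicit and leaves the layers themselves as caller-supplied functions. `mask_branch_afp`, `fuse_max`, and the toy stand-ins `double` (playing the role of conv1) and `threshold` (playing the role of the remaining layers) are hypothetical and are not the real convolution and deconvolution layers.

```python
from typing import Callable, List

Map = List[List[float]]


def fuse_max(maps: List[Map]) -> Map:
    """Element-wise max over same-shaped 2D maps."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def mask_branch_afp(level_feats: List[Map], conv1: Callable[[Map], Map],
                    rest: Callable[[Map], Map]) -> Map:
    # Step 1 (assumed done): level_feats holds the 4 RoIAlign outputs.
    # Step 2: conv1 runs on each level's features separately.
    adapted = [conv1(f) for f in level_feats]
    # Step 3: fuse the adapted maps (max fusion here).
    fused = fuse_max(adapted)
    # Step 4: the remaining mask layers run once, on the fused map only.
    return rest(fused)


def double(m: Map) -> Map:
    """Toy stand-in for conv1: a 1x1 'convolution' with weight 2."""
    return [[2.0 * v for v in row] for row in m]


def threshold(m: Map) -> Map:
    """Toy stand-in for the remaining layers: binarize at 3.0."""
    return [[1.0 if v > 3.0 else 0.0 for v in row] for row in m]


levels = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 2.0], [0.0, 0.0]],
          [[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
print(mask_branch_afp(levels, double, threshold))  # [[0.0, 1.0], [0.0, 0.0]]
```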

AFP has two further details:

Which fusion operation should be used? Experiments show that max fusion works best.

Is AFP really effective? With max as the fusion function, the paper checks which level each fused feature element comes from. For proposals that FPN would assign to N2~N5 (level1~4 below), most of the features selected by max do not come from the assigned level; for example, proposals assigned to level4 (N5) take only 40% of their features from level4. In other words, through AFP the model makes combined use of the features of N2~N5 when predicting, and the final experiments show that this brings a clear improvement.
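One way to compute this kind of statistic is to record, for every element of the max-fused feature, which level supplied the winning value, and then count. The sketch below does that for flattened per-level features; it illustrates the idea and is not the paper's measurement code. `max_source_fractions` is a hypothetical name, and ties are credited to the lowest level index.

```python
from collections import Counter
from typing import Dict, List


def max_source_fractions(level_feats: List[List[float]]) -> Dict[int, float]:
    """Fraction of max-fused elements whose winning value came from each level."""
    n_elems = len(level_feats[0])
    winners = Counter(
        # index of the level holding the maximum; ties go to the lowest index
        max(range(len(vals)), key=vals.__getitem__)
        for vals in zip(*level_feats)
    )
    return {lvl: winners[lvl] / n_elems for lvl in range(len(level_feats))}


feats = [[0.9, 0.1, 0.2, 0.3, 0.8],   # level1 (N2)
         [0.1, 0.5, 0.1, 0.2, 0.1],   # level2 (N3)
         [0.2, 0.2, 0.7, 0.1, 0.3],   # level3 (N4)
         [0.3, 0.4, 0.1, 0.6, 0.9]]   # level4 (N5)
print(max_source_fractions(feats))  # {0: 0.2, 1: 0.2, 2: 0.2, 3: 0.4}
```

Averaging these fractions over all proposals that FPN would send to a given level gives a per-level breakdown like the one described above.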

3. Fusing the FCN and fully-connected predictions in the mask branch

To fuse the FCN and fully-connected predictions, two choices must be made: which layer feeds the fully-connected path, and how the two outputs are combined. Experiments show that using conv3 as the input of the fully-connected path and sum as the fusion function works best.
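The sum fusion can be sketched as follows, assuming the design described in the PANet paper as I understand it: the fully-connected path (fed from conv3) predicts a single class-agnostic foreground/background mask as a flat vector, which is reshaped to the mask resolution and added to each class's FCN mask logits. The logits below are toy nested lists, and `fuse_mask_logits` is a hypothetical name.

```python
from typing import List

Map = List[List[float]]


def fuse_mask_logits(fcn_logits: List[Map], fc_vector: List[float]) -> List[Map]:
    """Sum-fuse per-class FCN mask logits with one class-agnostic fc prediction."""
    h, w = len(fcn_logits[0]), len(fcn_logits[0][0])
    if len(fc_vector) != h * w:
        raise ValueError("fc output must have h * w elements")
    # Reshape the flat fc output to the mask resolution (28 x 28 in the paper).
    fc_map = [fc_vector[r * w:(r + 1) * w] for r in range(h)]
    # Add the same fc map to every class's FCN logits.
    return [
        [[a + b for a, b in zip(row_c, row_f)] for row_c, row_f in zip(cls_map, fc_map)]
        for cls_map in fcn_logits
    ]


fcn = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 classes, 2 x 2 masks
print(fuse_mask_logits(fcn, [10.0, 20.0, 30.0, 40.0]))
# [[[11.0, 22.0], [33.0, 44.0]], [[10.0, 20.0], [30.0, 40.0]]]
```

The FCN path keeps per-pixel, per-class detail, while the fc path, which sees the whole RoI at once, adds a shared shape prior; summing lets each correct the other.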

4. Other details

For multi-scale training, the longer edge is set to 1400 and the other edge ranges from 400 to 1400.

Multi-GPU synchronized BN computes the mean and variance over all samples of the batch across every GPU and updates them together, rather than computing separate statistics on each GPU's share of the batch.

Heavier head: like RetinaNet, it uses 4 consecutive 3 × 3 convolutions instead of fc layers, except that here the box classification and box regression branches share these parameters.
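Synchronized BN comes down to computing batch statistics over the union of all GPUs' samples. A minimal sketch, assuming each GPU holds a list of scalar activations for one channel: every device reports (count, sum, sum of squares), the triples are added together (a real implementation would do this with an all-reduce across GPUs; here a plain loop stands in for it), and the global mean and biased variance are derived from the totals. `local_stats` and `sync_bn_stats` are hypothetical names.

```python
from typing import List, Tuple


def local_stats(xs: List[float]) -> Tuple[int, float, float]:
    """Per-GPU partial statistics: (count, sum, sum of squares)."""
    return len(xs), sum(xs), sum(x * x for x in xs)


def sync_bn_stats(per_gpu: List[List[float]]) -> Tuple[float, float]:
    """Mean and biased variance over all GPUs' samples, as if pooled into one batch."""
    n = s = sq = 0.0
    for xs in per_gpu:  # stand-in for an all-reduce of the partial statistics
        c, a, b = local_stats(xs)
        n += c
        s += a
        sq += b
    mean = s / n
    var = sq / n - mean * mean  # E[x^2] - E[x]^2
    return mean, var


mean, var = sync_bn_stats([[1.0, 2.0], [3.0, 4.0, 5.0, 6.0]])
print(mean, round(var, 4))  # 3.5 2.9167
```

Here the synchronized mean is 3.5, whereas the two GPUs on their own would normalize with means of 1.5 and 4.5. The E[x²] − E[x]² form is used for brevity; it can lose precision when the mean is large relative to the spread, so production code often uses a more numerically stable formulation.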

Multi-scale training & multi-GPU synchronized BN: both techniques help the network converge better and generalize better.

Bottom-up Path Augmentation: with or without adaptive feature pooling, the bottom-up augmentation path improves mask prediction performance, which confirms the value of low-level feature information.

Adaptive Feature Pooling: with or without the bottom-up augmentation path, adaptive feature pooling consistently improves performance, showing that features from the other levels are useful for the final prediction.

Fully-connected Fusion: fusing the fully-connected prediction improves the quality of mask predictions at all scales.

Heavier Head: very effective for bbox prediction, but only of modest help for mask prediction.

Third, PANet results
