
What are the four modules of Faster R-CNN?


What are the four modules of Faster R-CNN? Many newcomers are unsure about this, so this article walks through the four modules one by one. I hope it helps you answer the question for yourself.

Faster R-CNN is the third model in the R-CNN series: R-CNN was proposed by Girshick in 2013, Fast R-CNN by Girshick in 2015, and Faster R-CNN by Ren et al. in 2015.

Faster R-CNN is one of the earlier two-stage networks in object detection. Its network architecture is shown in the following figure:

You can see that it can be roughly divided into four parts:

Conv Layers: a convolutional neural network used to extract features from the image and produce a feature map.
RPN: a network used to propose Regions of Interest (RoIs).
RoI Pooling: combines the RoIs with the feature map and resizes each region into a fixed-size feature.
Classifier: determines which category each RoI belongs to.

1. Conv Layers

In Conv Layers, the input image is convolved and pooled to extract image features; the output is a feature map. In Faster R-CNN, the image is first resized to a fixed size and then passed through the 13 convolution layers, 13 ReLU layers, and 4 max-pooling layers of VGG16. (VGG16 originally downsamples five times; here the fifth pooling layer and everything after it are discarded, and the remaining layers are used as the Conv Layers feature extractor.)

Unlike YOLOv3, Faster R-CNN downsamples the image to 1/16 of the original resolution (YOLOv3 downsamples to 1/32). Its feature map therefore has a higher resolution than that of YOLOv3's backbone, which helps explain why Faster R-CNN does better than YOLOv3 at detecting small objects.
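As a concrete illustration (a minimal sketch, not the paper's original code), such a backbone can be built from torchvision's VGG16 by dropping its final max-pooling layer; the 600x800 input below is just one example of the fixed resize.

```python
# Minimal sketch: build the Conv Layers backbone from torchvision's VGG16 by
# dropping its last max-pool, leaving 13 conv + 13 ReLU + 4 max-pool layers
# (overall stride 16).
import torch
import torchvision

vgg16 = torchvision.models.vgg16(weights=None)  # load pretrained weights in practice
backbone = torch.nn.Sequential(*list(vgg16.features.children())[:-1])  # drop final MaxPool

image = torch.randn(1, 3, 600, 800)   # an image resized to a fixed size (example values)
feature_map = backbone(image)
print(feature_map.shape)              # torch.Size([1, 512, 37, 50]) -> 1/16 resolution
```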

2. Region Proposal Network

Referred to as the RPN for short, this network is used to propose candidate regions (Regions of Interest). Its input is the feature map obtained by passing the original image through the Conv Layers.

The RPN takes the feature map as input, applies a 3x3 convolution that produces a 512-channel intermediate feature, and then splits into two branches:

One branch computes the probability that the corresponding anchor is foreground or background, where targets count as foreground.

The other branch computes the bounding-box offsets for the corresponding anchor, giving the location of its target.

Through the RPN we therefore know, for each anchor, whether it contains a target and, if it does, where that target is located.
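To make the two branches concrete, here is a minimal PyTorch sketch of an RPN head (a simplified stand-in, not the original implementation). The 512 intermediate channels follow the text above; the choice of 9 anchors per location follows the Faster R-CNN paper.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sketch of the RPN head: a 3x3 conv followed by two 1x1 conv branches."""
    def __init__(self, in_channels=512, mid_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1)
        # branch 1: foreground/background scores for each of the k anchors
        self.cls_logits = nn.Conv2d(mid_channels, num_anchors * 2, kernel_size=1)
        # branch 2: 4 bounding-box offsets (dx, dy, dw, dh) for each anchor
        self.bbox_deltas = nn.Conv2d(mid_channels, num_anchors * 4, kernel_size=1)

    def forward(self, feature_map):
        t = torch.relu(self.conv(feature_map))
        return self.cls_logits(t), self.bbox_deltas(t)

# usage on the 1/16-resolution feature map from the backbone
scores, deltas = RPNHead()(torch.randn(1, 512, 37, 50))
print(scores.shape, deltas.shape)  # [1, 18, 37, 50] and [1, 36, 37, 50]
```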

Compare RPN with YOLOv3:

YOLOv3 is said to borrow ideas from the RPN. Here is a comparison of the two:

RPN: split into two branches, one predicting the target box and the other predicting foreground vs. background. The two tasks are handled separately; the foreground/background branch only decides whether an anchor contains a target and does not classify it. In addition, the anchors are set from hand-chosen priors.

YOLOv3: treats the whole problem as a regression problem and directly predicts the target category and coordinates. Its anchors are obtained by IoU-based clustering.

Difference: the anchor settings and the details of matching ground truth to anchors are not the same.

Connection: both assign multiple anchors to each point on the final feature map (at 1/16 or 1/32 of the input resolution) and then match them against the ground truth. Although the implementations differ considerably, the underlying idea is shared.
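To show what "multiple anchors at each point on the final feature map" looks like in practice, here is a hedged NumPy sketch of anchor generation. The scales and aspect ratios below are hand-chosen priors in the spirit of the Faster R-CNN paper, not values taken from this article.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Sketch: place len(scales) * len(ratios) anchors, centered on every
    feature-map cell, in input-image coordinates (x1, y1, x2, y2)."""
    base = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)   # boxes of roughly equal area
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                           # (9, 4) anchors around the origin

    cx = (np.arange(feat_w) + 0.5) * stride         # anchor centers in image space
    cy = (np.arange(feat_h) + 0.5) * stride
    shift_x, shift_y = np.meshgrid(cx, cy)
    shifts = np.stack([shift_x, shift_y, shift_x, shift_y], axis=-1).reshape(-1, 1, 4)
    return (shifts + base).reshape(-1, 4)           # (feat_h * feat_w * 9, 4)

anchors = generate_anchors(37, 50)                  # feature map from a 600x800 image
print(anchors.shape)                                # (16650, 4)
```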

3. ROI Pooling

Here's an example from deepsense.ai:

The inputs to RoI Pooling are the feature map and the RoIs:

Suppose the feature map is an 8x8 grid of values, as in the deepsense.ai example:

One of the RoIs provided by the RPN has upper-left corner (0, 3) and lower-right corner (7, 8).

The portion of the feature map covered by the RoI is then divided into a 2x2 grid of blocks:

A max-pooling-like operation is then applied to each block, giving a fixed 2x2 result; a small sketch of the whole computation follows.
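Here is a small NumPy sketch of the same kind of example (the feature-map values are random stand-ins for the grid in the figure, and the coordinate handling is simplified): slice out the RoI, split it into a 2x2 grid, and take the max of each block.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.random((8, 8)).round(2)   # stand-in for the 8x8 example grid

# RoI from the RPN in feature-map coordinates: upper-left (0, 3), lower-right (7, 8)
x1, y1, x2, y2 = 0, 3, 7, 8
roi = feature_map[y1:y2, x1:x2]             # 5 rows x 7 columns

def roi_max_pool(region, out_h=2, out_w=2):
    """Split the region into out_h x out_w blocks and take the max of each block."""
    rows = np.array_split(np.arange(region.shape[0]), out_h)
    cols = np.array_split(np.arange(region.shape[1]), out_w)
    return np.array([[region[np.ix_(r, c)].max() for c in cols] for r in rows])

print(roi_max_pool(roi))                    # a fixed 2x2 output, whatever the RoI size
```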

This is the complete RoI pooling operation. Why is it done this way?

In the RPN phase we learn whether the image contains targets and, if so, where they are. The only missing information is which category each target belongs to (the RPN can only tell us that a region is foreground; it cannot give a specific category).

If we want to know which category a target belongs to, the simplest idea is to crop the image region inside the box and feed it to a CNN for classification, obtaining the final category. This brings us to the last module: classification.

4. Classification

After RoI Pooling, every RoI has been turned into a feature of the same size, which is then fed into two branches: one branch performs classification and the other performs bounding-box regression.

The classification branch is easy to understand: it computes which category the RoI belongs to. The bounding-box regression branch adjusts the boxes predicted by the RPN so that the final boxes are more accurate.
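A minimal sketch of this final head (assuming a 7x7 pooled size, 512 channels as with a VGG backbone, and 21 classes as in Pascal VOC, i.e. 20 object classes plus background):

```python
import torch
import torch.nn as nn

class RoIHead(nn.Module):
    """Sketch of the final stage: classify each RoI and refine its box."""
    def __init__(self, num_classes=21, pooled=7, channels=512, hidden=4096):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels * pooled * pooled, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
        )
        self.cls_score = nn.Linear(hidden, num_classes)      # which category (incl. background)
        self.bbox_pred = nn.Linear(hidden, num_classes * 4)  # per-class box refinement

    def forward(self, pooled_rois):                          # (num_rois, 512, 7, 7)
        x = self.fc(pooled_rois.flatten(start_dim=1))
        return self.cls_score(x), self.bbox_pred(x)

scores, boxes = RoIHead()(torch.randn(128, 512, 7, 7))
print(scores.shape, boxes.shape)                             # [128, 21] and [128, 84]
```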

After reading the above, have you mastered what the four modules of Faster R-CNN are? If you want to learn more skills or find out more, you are welcome to follow the industry information channel. Thank you for reading!
