SLTechnology News & Howtos — Shulou (Shulou.com), updated 2025-01-19
Today I will talk to you about how to reproduce dynamic convolution in PyTorch. Many people may not know much about it, so I have summarized the following content for you; I hope you can get something out of this article.
A brief review
The main idea of the paper is to improve traditional convolution so that each layer's convolution parameters vary with the input at inference time, instead of being fixed for every input as in traditional convolution.
At inference time: the parameters in the red box are fixed, while the parameters in the yellow box change with the input data.
For a feature map generated during convolution, we first apply a few operations to it to produce K attention weights π_1, ..., π_K, and then take a weighted (linear) sum of the K convolution kernel parameters, so that the effective kernel changes as the input changes. (You can read other explanatory articles for the theory; this article focuses on how to write the code.)
Below is a simple version of the attention code. It outputs weight parameters of size [batch_size, K], where K is the number of convolution kernels to be summed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class attention2d(nn.Module):
    def __init__(self, in_planes, K):
        super(attention2d, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)   # global average pooling
        self.fc1 = nn.Conv2d(in_planes, K, 1)    # 1x1 convolutions acting as fully connected layers
        self.fc2 = nn.Conv2d(K, K, 1)

    def forward(self, x):
        x = self.avgpool(x)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x).view(x.size(0), -1)      # [batch_size, K]
        return F.softmax(x, dim=1)               # each row sums to 1
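To sanity-check the module, we can push a random input through it and verify that each row of the output is a probability distribution over the K kernels (the sizes below are arbitrary; the attention2d definition is repeated so the snippet runs standalone):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class attention2d(nn.Module):   # same module as above, repeated for a standalone snippet
    def __init__(self, in_planes, K):
        super(attention2d, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Conv2d(in_planes, K, 1)
        self.fc2 = nn.Conv2d(K, K, 1)

    def forward(self, x):
        x = self.avgpool(x)
        x = F.relu(self.fc1(x))
        x = self.fc2(x).view(x.size(0), -1)
        return F.softmax(x, dim=1)

att = attention2d(in_planes=16, K=4)
weights = att(torch.randn(8, 16, 32, 32))   # a batch of 8 feature maps
print(weights.shape)                        # torch.Size([8, 4])
```

Each of the 8 rows sums to 1, so it can be used directly as mixing coefficients over the 4 kernels.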
The paper aggregates the kernels with the following formulas:

    W~(x) = Σ_k π_k(x) · W_k,   b~(x) = Σ_k π_k(x) · b_k
    y = g(W~(x) * x + b~(x))

where x is the input and y is the output. You can see that two operations are performed: one computes the attention weights π_k(x) (used to generate the dynamic convolution kernel), and one performs the convolution itself.
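The aggregation step by itself can be sketched in a few lines (the sizes here are illustrative): the dynamic kernel is simply an attention-weighted sum of K static kernels.

```python
import torch

K, out_planes, in_planes, k = 4, 8, 16, 3
kernels = torch.randn(K, out_planes, in_planes, k, k)   # K static kernels W_1..W_K
pi = torch.softmax(torch.randn(K), dim=0)               # attention weights for one input

# weighted sum over the K kernels -> one input-dependent kernel
dynamic_kernel = (pi.view(K, 1, 1, 1, 1) * kernels).sum(dim=0)
print(dynamic_kernel.shape)   # torch.Size([8, 16, 3, 3])
```

The result has the same shape as a single ordinary kernel, which is what lets it be fed to a normal convolution afterwards.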
However, if you directly sum the kernels per sample when writing the code, problems arise. Next, let's review how PyTorch stores convolution parameters, then describe the problem, and then explain how to solve it with grouped convolution.
How convolution is implemented in PyTorch
I will review PyTorch's convolution from a dimensional perspective (you can also work it out by hand; the key points are: input dimensions, output dimensions, normal kernel parameter dimensions, grouped-convolution dimensions, dynamic-convolution dimensions, and the attention module's output dimensions).
Input: the input data has dimensions [batch_size, in_planes, H, W].
Output: the output has dimensions [batch_size, out_planes, H', W'].
Convolution kernel: a normal kernel's parameters have dimensions [out_planes, in_planes, kernel_size, kernel_size]. (In PyTorch, a 2d convolution weight must have exactly this layout.)
Note that the kernel parameter dimensions contain no batch_size. For normal convolution, every input sample uses the same kernel, so the number of kernels is independent of how much data is fed in one forward pass (a layer needs only one set of kernel parameters).
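These layouts can be checked directly on an nn.Conv2d instance (the channel counts below are arbitrary):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=8, kernel_size=3, padding=1)
x = torch.randn(4, 16, 32, 32)   # [batch_size, in_planes, H, W]
y = conv(x)

# [out_planes, in_planes, k, k] -- note: no batch dimension in the weight
print(conv.weight.shape)   # torch.Size([8, 16, 3, 3])
# [batch_size, out_planes, H', W']
print(y.shape)             # torch.Size([4, 8, 32, 32])
```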
Problems that may arise
Here is the problem that appears when implementing dynamic convolution with batch_size greater than 1.
The attention module's final softmax output in the figure has dimensions [batch_size, K]; at first glance it seems we could simply .view it into a broadcastable shape and apply it to the kernel parameters (forming the dynamic kernel).
The problem is this: with normal convolution, all samples in a batch share the same kernel parameters, so only one set of network parameters is needed; with dynamic convolution, each input sample uses a different kernel, so each sample needs its own set of parameters — and that does not fit PyTorch's convolution weight format, so it raises an error.
Look at the dimensions: [batch_size, K] × [K, out_planes × in_planes × k × k] generates a dynamic kernel of size [batch_size, out_planes, in_planes, k, k]. This 5-D weight violates PyTorch's requirement and cannot participate in the convolution directly (write the code this way once and see for yourself — just reading about it may not make the problem concrete). The simplest workaround is setting batch_size to 1, which avoids the error — but it is slow!
In short, batch_size greater than 1 makes the intermediate kernel parameters non-conforming.
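A quick way to see the failure (shapes illustrative): F.conv2d only accepts a 4-D weight, so a per-sample 5-D kernel is rejected at runtime.

```python
import torch
import torch.nn.functional as F

batch_size, in_planes, out_planes, k = 4, 16, 8, 3
x = torch.randn(batch_size, in_planes, 32, 32)
# 5-D weight: one [out, in, k, k] kernel per sample -- not a legal conv2d weight
per_sample_kernels = torch.randn(batch_size, out_planes, in_planes, k, k)

try:
    F.conv2d(x, per_sample_kernels)
    failed = False
except RuntimeError:
    failed = True
print(failed)   # True
```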
Grouped convolution, and how to use it to implement dynamic convolution with batch_size greater than 1
Grouped convolution in one sentence: the input channels are split into several groups, each group is convolved separately, and the results are concatenated.
In more detail: for input data of dimension [batch_size, 10, H, W] with groups = 2, grouped convolution splits the input into two tensors (other splits are possible) of dimension [batch_size, 5, H, W] (5 × 2 = 10 channels), convolves each with its own kernels, and concatenates the results — so a grouped convolution with groups = 2 can be regarded as two normal convolutions. (If you still don't understand grouped convolution, read other articles and take a closer look.)
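This view of grouped convolution is easy to verify (channel counts arbitrary): a groups=2 convolution matches two normal convolutions applied to the channel halves and concatenated.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 10, 16, 16)      # [batch, 10, H, W]
weight = torch.randn(6, 5, 3, 3)    # groups=2: each kernel sees 10/2 = 5 input channels

grouped = F.conv2d(x, weight, padding=1, groups=2)

# equivalent: convolve each 5-channel half with its half of the kernels, then concatenate
half1 = F.conv2d(x[:, :5], weight[:3], padding=1)
half2 = F.conv2d(x[:, 5:], weight[3:], padding=1)
manual = torch.cat([half1, half2], dim=1)

print(torch.allclose(grouped, manual, atol=1e-4))   # True
```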
A clever conversion: the above turns one grouped convolution into several normal convolutions. Thinking backwards — if we make batch_size become 1, can we turn many normal convolutions into one grouped convolution?
We treat batch_size as the groups of a grouped convolution, so the input dimensions directly become [1, batch_size × in_planes, H, W].
That is: if you change the input data from [batch_size, in_planes, H, W] to [1, batch_size × in_planes, H, W], you can use grouped convolution to solve the problem!
Describing the implementation in detail: view the input data as [1, batch_size × in_planes, H, W] (grouped-convolution style). The weight parameter is flattened to [K, out_planes × in_planes × k × k], and the attention module produces a tensor of dimension [batch_size, K]. An ordinary matrix multiplication, [batch_size, K] × [K, out_planes × in_planes × k × k], directly generates the dynamic convolution parameters, with dimension [batch_size, out_planes × in_planes × k × k]; these are reshaped to [batch_size × out_planes, in_planes, k, k] and used as the weight of a grouped convolution with groups = batch_size. With input [1, batch_size × in_planes, H, W] and this dynamic kernel, the operation is exactly a grouped convolution — problem solved.
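The trick can be verified in isolation (shapes illustrative): convolving each sample with its own kernel in a Python loop gives the same result as one grouped convolution with groups = batch_size.

```python
import torch
import torch.nn.functional as F

batch_size, in_planes, out_planes, k = 4, 6, 8, 3
x = torch.randn(batch_size, in_planes, 16, 16)
kernels = torch.randn(batch_size, out_planes, in_planes, k, k)   # one kernel per sample

# loop version: a separate normal convolution per sample
loop_out = torch.cat([F.conv2d(x[i:i + 1], kernels[i], padding=1)
                      for i in range(batch_size)])

# grouped version: fold the batch into the channel dimension
x_flat = x.view(1, batch_size * in_planes, 16, 16)
w_flat = kernels.view(batch_size * out_planes, in_planes, k, k)
group_out = F.conv2d(x_flat, w_flat, padding=1, groups=batch_size)
group_out = group_out.view(batch_size, out_planes, 16, 16)

print(torch.allclose(loop_out, group_out, atol=1e-4))   # True
```

The grouped version does the same arithmetic in a single kernel launch, which is why it is much faster than looping with batch_size = 1.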
The specific code is as follows:
class Dynamic_conv2d(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0,
                 dilation=1, groups=1, bias=True, K=4):
        super(Dynamic_conv2d, self).__init__()
        assert in_planes % groups == 0
        self.in_planes = in_planes
        self.out_planes = out_planes
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.K = K
        self.attention = attention2d(in_planes, K)
        # K sets of kernel parameters; torch.Tensor() would leave the memory
        # uninitialized, so initialize explicitly
        self.weight = nn.Parameter(torch.randn(K, out_planes, in_planes // groups,
                                               kernel_size, kernel_size),
                                   requires_grad=True)
        if bias:
            self.bias = nn.Parameter(torch.zeros(K, out_planes))
        else:
            self.bias = None

    def forward(self, x):
        # treat the batch as the groups of a grouped convolution, because each
        # sample of dynamic convolution uses its own weights
        softmax_attention = self.attention(x)        # [batch_size, K]
        batch_size, in_planes, height, width = x.size()
        x = x.view(1, -1, height, width)             # fold the batch into the channels for grouped convolution
        weight = self.weight.view(self.K, -1)        # [K, out_planes*in_planes//groups*k*k]
        # generate batch_size sets of kernel parameters (each set is different)
        aggregate_weight = torch.mm(softmax_attention, weight).view(
            batch_size * self.out_planes, self.in_planes // self.groups,
            self.kernel_size, self.kernel_size)
        if self.bias is not None:
            aggregate_bias = torch.mm(softmax_attention, self.bias).view(-1)
            output = F.conv2d(x, weight=aggregate_weight, bias=aggregate_bias,
                              stride=self.stride, padding=self.padding,
                              dilation=self.dilation,
                              groups=self.groups * batch_size)
        else:
            output = F.conv2d(x, weight=aggregate_weight, bias=None,
                              stride=self.stride, padding=self.padding,
                              dilation=self.dilation,
                              groups=self.groups * batch_size)
        output = output.view(batch_size, self.out_planes,
                             output.size(-2), output.size(-1))
        return output
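As a smoke test of the whole layer (the sizes below are arbitrary; both classes are repeated from above so the snippet runs standalone):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class attention2d(nn.Module):
    def __init__(self, in_planes, K):
        super(attention2d, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Conv2d(in_planes, K, 1)
        self.fc2 = nn.Conv2d(K, K, 1)

    def forward(self, x):
        x = F.relu(self.fc1(self.avgpool(x)))
        return F.softmax(self.fc2(x).view(x.size(0), -1), dim=1)

class Dynamic_conv2d(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0,
                 dilation=1, groups=1, bias=True, K=4):
        super(Dynamic_conv2d, self).__init__()
        assert in_planes % groups == 0
        self.in_planes, self.out_planes = in_planes, out_planes
        self.kernel_size, self.stride, self.padding = kernel_size, stride, padding
        self.dilation, self.groups, self.K = dilation, groups, K
        self.attention = attention2d(in_planes, K)
        self.weight = nn.Parameter(torch.randn(K, out_planes, in_planes // groups,
                                               kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(K, out_planes)) if bias else None

    def forward(self, x):
        softmax_attention = self.attention(x)        # [batch_size, K]
        batch_size, _, height, width = x.size()
        x = x.view(1, -1, height, width)             # fold batch into channels
        weight = self.weight.view(self.K, -1)
        aggregate_weight = torch.mm(softmax_attention, weight).view(
            batch_size * self.out_planes, self.in_planes // self.groups,
            self.kernel_size, self.kernel_size)
        bias = (torch.mm(softmax_attention, self.bias).view(-1)
                if self.bias is not None else None)
        output = F.conv2d(x, aggregate_weight, bias, stride=self.stride,
                          padding=self.padding, dilation=self.dilation,
                          groups=self.groups * batch_size)
        return output.view(batch_size, self.out_planes,
                           output.size(-2), output.size(-1))

layer = Dynamic_conv2d(in_planes=16, out_planes=32, kernel_size=3, padding=1)
y = layer(torch.randn(4, 16, 28, 28))
print(y.shape)   # torch.Size([4, 32, 28, 28])
```

The layer is a drop-in replacement for nn.Conv2d with matching input/output shapes; only the extra K hyperparameter (number of kernels to mix) is new.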
After reading the above, do you have a better understanding of how to implement dynamic convolution in PyTorch? If you want to know more, please follow the industry information channel. Thank you for your support.