Example Analysis of Freely Loading and Freezing Partial Model Parameters (Including BatchNorm2d Parameters) in PyTorch


This article presents an example analysis of freely loading partial model parameters and freezing them in PyTorch. The content is quite detailed; interested readers can use it as a reference, and I hope it helps you.

PyTorch's load and load_state_dict methods can only read parameter files in a relatively rigid way: they require every key of the loaded state_dict to match a key of model.state_dict() exactly.
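
As a minimal sketch of this strict behavior (the checkpoint file name here is a placeholder), load_state_dict raises a RuntimeError on any key mismatch; newer versions of PyTorch also accept strict=False, which silently skips non-matching keys, but that only helps when the overlapping keys share identical names:

import torch

net = Net()
state_dict = torch.load("checkpoint.pth")  # hypothetical checkpoint file

# Strict loading: raises RuntimeError if any key is missing or unexpected.
net.load_state_dict(state_dict)

# Lenient loading: ignores keys that do not match, but renamed layers
# still receive no pre-trained parameters at all.
net.load_state_dict(state_dict, strict=False)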

However, in transfer learning we may need only part of a pre-trained network, want to combine several networks into one, or want to split apart the Sequential of a pre-trained model to get the output of an intermediate layer. In these cases the traditional load method does not work well.

For example, if we want to use only the first seven convolution blocks of MobileNet and freeze those layers, while connecting the remaining part to another structure or rewriting it into an FCN structure, the traditional method fails.

The most general approach is to build a dictionary whose keys match those of the network we created ourselves, fill in its values with the desired parameters from various pre-trained networks, and then load this new state_dict. At present this is the only method I can think of that handles more complex network transformations.

Most results from searching online for "load part of the model" or "freeze part of the model" only change the FC layer and are not really usable. I also stepped into a few pitfalls while learning to write my own state_dict, so I am posting this as a record.

I. Loading partial pre-trained parameters

Let's look at MobileNet's structure first.

(Source: GitHub, with the pre-trained model mobilenet_sgd_rmsprop_69.526.tar)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        def conv_bn(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True)
            )

        def conv_dw(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                nn.BatchNorm2d(inp),
                nn.ReLU(inplace=True),
                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True),
            )

        self.model = nn.Sequential(
            conv_bn(   3,   32, 2),
            conv_dw(  32,   64, 1),
            conv_dw(  64,  128, 2),
            conv_dw( 128,  128, 1),
            conv_dw( 128,  256, 2),
            conv_dw( 256,  256, 1),
            conv_dw( 256,  512, 2),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512,  512, 1),
            conv_dw( 512, 1024, 2),
            conv_dw(1024, 1024, 1),
            nn.AvgPool2d(7),
        )
        self.fc = nn.Linear(1024, 1000)

    def forward(self, x):
        x = self.model(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        return x

We only need the first seven convolution blocks, and to make the later concatenate operation easier, we split the Sequential apart as follows:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        def conv_bn(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True)
            )

        def conv_dw(inp, oup, stride):
            return nn.Sequential(
                nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
                nn.BatchNorm2d(inp),
                nn.ReLU(inplace=True),
                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
                nn.ReLU(inplace=True),
            )

        self.conv1 = conv_bn(  3,  32, 2)
        self.conv2 = conv_dw( 32,  64, 1)
        self.conv3 = conv_dw( 64, 128, 2)
        self.conv4 = conv_dw(128, 128, 1)
        self.conv5 = conv_dw(128, 256, 2)
        self.conv6 = conv_dw(256, 256, 1)
        self.conv7 = conv_dw(256, 512, 2)

        # The remaining layers are not needed.
        # If you do want them, you can keep the following structure:
        '''
        self.features = nn.Sequential(
            conv_dw(512,  512, 1),
            conv_dw(512,  512, 1),
            conv_dw(512,  512, 1),
            conv_dw(512,  512, 1),
            conv_dw(512,  512, 1),
            conv_dw(512, 1024, 2),
            conv_dw(1024, 1024, 1),
            nn.AvgPool2d(7),
        )
        self.fc = nn.Linear(1024, 1000)
        '''

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x3 = self.conv3(x2)
        x4 = self.conv4(x3)
        x5 = self.conv5(x4)
        x6 = self.conv6(x5)
        x7 = self.conv7(x6)
        # x8 = self.features(x7)
        # out = self.fc(x8)
        return (x1, x2, x3, x4, x5, x6, x7)

Let's create a net with the modified structure and see how its state_dict differs from the state_dict in our pre-trained file.

net = Net()
# My machine has no GPU, but the checkpoint's parameters are CUDA tensors
# trained on GPU, so they must be mapped to CPU storage when loading.
dict_trained = torch.load("mobilenet_sgd_rmsprop_69.526.tar",
                          map_location=lambda storage, loc: storage)["state_dict"]
dict_new = net.state_dict().copy()

new_list = list(net.state_dict().keys())
trained_list = list(dict_trained.keys())
print("new_state_dict size: {} trained state_dict size: {}".format(len(new_list), len(trained_list)))
print("New state_dict first 10 parameter names")
print(new_list[:10])
print("trained state_dict first 10 parameter names")
print(trained_list[:10])
print(type(dict_new))
print(type(dict_trained))

The output is as follows:

After we truncate the network roughly in half, the number of parameters drops from 137 to 65. As the first ten parameter names show, the names change but the order does not. The data type of a state_dict is OrderedDict, which can be manipulated with the usual dict operations.

new_state_dict size: 65 trained state_dict size: 137

New state_dict first 10 parameter names

['conv1.0.weight', 'conv1.1.weight', 'conv1.1.bias', 'conv1.1.running_mean', 'conv1.1.running_var', 'conv2.0.weight', 'conv2.1.weight', 'conv2.1.bias', 'conv2.1.running_mean', 'conv2.1.running_var']

trained state_dict first 10 parameter names

['module.model.0.0.weight', 'module.model.0.1.weight', 'module.model.0.1.bias', 'module.model.0.1.running_mean', 'module.model.0.1.running_var', 'module.model.1.0.weight', 'module.model.1.1.weight', 'module.model.1.1.bias', 'module.model.1.1.running_mean', 'module.model.1.1.running_var']

We see that by building a dictionary whose keys match those of the network we created, and filling in the desired parameters from the pre-trained network under the new keys, we obtain a new state_dict that we can load. This is the most universal method for all kinds of network changes.

for i in range(65):
    dict_new[new_list[i]] = dict_trained[trained_list[i]]

net.load_state_dict(dict_new)
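
As a quick sanity check (a sketch, not part of the original post), we can verify that a copied tensor is now identical in both dictionaries:

# Compare the first copied parameter against its pre-trained source.
print(torch.equal(net.state_dict()[new_list[0]],
                  dict_trained[trained_list[0]]))  # expected: True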

There are other cases. For example, if we only add a few layers at the end without changing the names and structure of the original layers, we can use the following simpler method:

loaded_dict = {k: v for k, v in loaded_dict.items() if k in model.state_dict()}
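
For completeness, here is a minimal self-contained sketch of this pattern (the checkpoint path and model class are placeholders); it merges the overlapping keys into the model's own state_dict, so the newly added layers keep their fresh initialization:

import torch

model = Net()                               # hypothetical model with extra layers added at the end
loaded_dict = torch.load("pretrained.pth")  # hypothetical checkpoint path

model_dict = model.state_dict()
# Keep only the pre-trained entries whose names also exist in our model.
loaded_dict = {k: v for k, v in loaded_dict.items() if k in model_dict}
# Overwrite the matching entries; the added layers keep their default init.
model_dict.update(loaded_dict)
model.load_state_dict(model_dict)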

II. Freezing the parameters of these layers

There are many ways to do this; here we use the freezing method that corresponds to the loading method above, sketched below.
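
The original text does not spell out the freezing code itself, so here is a minimal sketch under the setup above: set requires_grad = False on the seven convolution blocks we loaded, while anything you attach later stays trainable:

# Freeze the seven convolution blocks loaded from the pre-trained network.
for layer in [net.conv1, net.conv2, net.conv3, net.conv4,
              net.conv5, net.conv6, net.conv7]:
    for param in layer.parameters():
        param.requires_grad = False

# Note: BatchNorm2d layers still update running_mean / running_var while the
# model is in train() mode; call .eval() on these layers as well if you want
# those statistics frozen too.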

If you run into problems with the freezing above, I recommend looking at https://discuss.pytorch.org/t/how-the-pytorch-freeze-network-in-some-layers-only-the-rest-of-the-training/7088 or https://discuss.pytorch.org/t/correct-way-to-freeze-layers/26714

Correspondingly, during training the optimizer should only update the parameters with requires_grad = True, so:

optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=lr)

That's all for this example analysis of freely loading partial model parameters and freezing them in PyTorch. I hope the content above is helpful to everyone; if you think the article is good, feel free to share it so that more people can see it.
