This article walks through a series of CNN experiments on MNIST implemented in PyTorch: data preprocessing, a baseline network, and the effects of Batch Normalization, Dropout, stacked 3x3 convolutions, channel width, training-set size, random seeds, and the choice of activation function.
Tools
Open source deep learning library: PyTorch
Dataset: MNIST
Implementation
Initial requirement
First, build a basic BASE network in PyTorch with the following code:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1), padding=0)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1), padding=0)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)   # 28 -> 24 -> 12 -> 8 -> 4 after two conv+pool blocks
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.max_pool2d(self.conv1(x), 2)
        x = F.max_pool2d(self.conv2(x), 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x)
See base.py for this part of the code.
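The class above is only the network itself; base.py also needs a training and evaluation loop to produce the numbers reported below. Here is a minimal sketch, assuming SGD and DataLoader objects named train_loader and test_loader (these names and hyperparameters are assumptions, not taken from base.py):

import torch
import torch.nn.functional as F
import torch.optim as optim

def train_and_test(model, train_loader, test_loader, epochs=20, lr=0.01, momentum=0.9):
    # NLL loss pairs with the log_softmax output of Net.forward
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    for epoch in range(epochs):
        model.train()
        for data, target in train_loader:
            optimizer.zero_grad()
            loss = F.nll_loss(model(data), target)
            loss.backward()
            optimizer.step()

    # evaluation: average loss and accuracy over the test set
    model.eval()
    test_loss, correct = 0.0, 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            correct += output.argmax(dim=1).eq(target).sum().item()
    n = len(test_loader.dataset)
    print('Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.1f}%)'.format(
        test_loss / n, correct, n, 100.0 * correct / n))

# usage: train_and_test(Net(), train_loader, test_loader)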
Question A: Preprocessing
The MNIST dataset has to be read according to its file format and transformed into a form suitable for processing. The reading code here follows the approach of BigDL Python Support; with the data format described on the MNIST home page, the files can be parsed quickly. The key piece is a function that reads a 32-bit integer:
def _read32(bytestream):
    # read 4 bytes as a big-endian (MSB first) unsigned 32-bit integer
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    return numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
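As a rough illustration of how _read32 fits into the reader (the actual code in preprocessing.py may differ), the sketch below parses a gzipped MNIST image file; the magic-number check follows the IDX format documented on the MNIST home page:

import gzip
import numpy

def read_mnist_images(path):
    # IDX image files start with four big-endian 32-bit integers:
    # magic number (2051), image count, rows, columns; pixel bytes follow.
    with gzip.open(path, 'rb') as bytestream:
        magic = _read32(bytestream)
        if magic != 2051:
            raise ValueError('Invalid magic number %d in %s' % (magic, path))
        num_images = int(_read32(bytestream))
        rows = int(_read32(bytestream))
        cols = int(_read32(bytestream))
        buf = bytestream.read(rows * cols * num_images)
        data = numpy.frombuffer(buf, dtype=numpy.uint8)
        return data.reshape(num_images, 1, rows, cols)   # (N, 1, 28, 28)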
Reading yields a tensor of shape (N, 1, 28, 28) in which each pixel is a value from 0 to 255. First divide every value by 255 to bring it into the range 0-1, then normalize: the mean and variance of the training and test sets are known, so this can be applied directly. Note that those statistics are computed on the already scaled data; at first the division by 255 was skipped, which made the forward outputs and gradients wildly off before the problem was traced back to this step.
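A minimal sketch of that scale-then-normalize step, assuming the commonly quoted MNIST training-set statistics (mean 0.1307, std 0.3081); the exact values used in preprocessing.py may differ:

import torch

def preprocess(images, mean=0.1307, std=0.3081):
    # images: (N, 1, 28, 28) array of raw pixel values in 0-255
    x = torch.tensor(images, dtype=torch.float32) / 255.0   # scale to 0-1 first
    return (x - mean) / std                                  # then normalize with the known statistics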
See preprocessing.py for this part of the code.
Question B: BASE Model
Set the random seed to 0, learn the parameters on the first 10000 training samples, and look at the test-set error rate after 20 epochs. The result is:
Test set: Average loss: 0.0014, Accuracy: 9732/10000 (97.3%)
As you can see, the accuracy of the BASE model is not particularly high.
Question C: Batch Normalization vs. BASE
Add a Batch Normalization layer after the convolution layer of each of the first three blocks, modifying the network structure slightly as follows:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1), padding=0)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1), padding=0)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)
        self.bn1 = nn.BatchNorm2d(20)
        self.bn2 = nn.BatchNorm2d(50)
        self.bn3 = nn.BatchNorm1d(500)

    def forward(self, x):
        x = self.conv1(x)
        x = F.max_pool2d(self.bn1(x), 2)
        x = self.conv2(x)
        x = F.max_pool2d(self.bn2(x), 2)
        x = x.view(-1, 4 * 4 * 50)
        x = self.fc1(x)
        x = F.relu(self.bn3(x))
        x = self.fc2(x)
        return F.log_softmax(x)
Running with the same parameters, the result after adding BN is:
Test set: Average loss: 0.0009, Accuracy: 9817/10000 (98.2%)
A clear improvement can be seen.
For more information about Batch Normalization, see [2], [5].
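As an aside (not part of the article's code), the sketch below checks what the inserted nn.BatchNorm2d layer computes in training mode: each channel is normalized by the batch mean and (biased) variance, then scaled and shifted by the learnable parameters:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 20, 12, 12)            # a batch of 20-channel feature maps
bn = nn.BatchNorm2d(20)

# manual batch normalization over (N, H, W) for each channel
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)
manual = manual * bn.weight.view(1, -1, 1, 1) + bn.bias.view(1, -1, 1, 1)

print(torch.allclose(bn(x), manual, atol=1e-5))   # expected: True (up to numerical error)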
Question D: Dropout Layer
After adding a Dropout layer (p=0.5) after the fc2 layer, the results for the BASE and BN models are:
BASE: Test set: Average loss: 0.0011, Accuracy: 9769/10000 (97.7%)
BN: Test set: Average loss: 0.0014, Accuracy: 9789/10000 (97.9%)
Dropout improves the BASE model to some extent, but on the BN model its effect is not only unclear, the accuracy actually drops.
The reason may be that the BN model already has a regularizing effect of its own, so adding a Dropout layer on top of it is not only unnecessary but may even hurt the results.
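For completeness, here is a sketch of the BASE network with the Dropout layer inserted as described above (the placement after fc2 follows the wording of this section; the original scripts are not shown in the article, so treat the exact placement as an assumption):

import torch.nn as nn
import torch.nn.functional as F

class NetDrop(nn.Module):
    # BASE network plus Dropout(p=0.5) applied after fc2, as described above
    def __init__(self):
        super(NetDrop, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1), padding=0)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1), padding=0)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        x = F.max_pool2d(self.conv1(x), 2)
        x = F.max_pool2d(self.conv2(x), 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.drop(self.fc2(x))
        return F.log_softmax(x, dim=1)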
Question E: SK Model
SK model: stacking two 3x3 conv. layers to replace the 5x5 conv. layer.
After such a change, the SK model is built as follows:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1_1 = nn.Conv2d(1, 20, kernel_size=(3, 3), stride=(1, 1), padding=0)
        self.conv1_2 = nn.Conv2d(20, 20, kernel_size=(3, 3), stride=(1, 1), padding=0)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=(3, 3), stride=(1, 1), padding=0)
        self.fc1 = nn.Linear(5 * 5 * 50, 500)   # 28 -> 26 -> 24 -> 12 -> 10 -> 5 through the conv/pool stack
        self.fc2 = nn.Linear(500, 10)
        self.bn1_1 = nn.BatchNorm2d(20)
        self.bn1_2 = nn.BatchNorm2d(20)
        self.bn2 = nn.BatchNorm2d(50)
        self.bn3 = nn.BatchNorm1d(500)
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        x = F.relu(self.bn1_1(self.conv1_1(x)))
        x = F.relu(self.bn1_2(self.conv1_2(x)))
        x = F.max_pool2d(x, 2)
        x = self.conv2(x)
        x = F.max_pool2d(self.bn2(x), 2)
        x = x.view(-1, 5 * 5 * 50)
        x = self.fc1(x)
        x = F.relu(self.bn3(x))
        x = self.fc2(x)
        return F.log_softmax(x)
After 20 epochs, the results are as follows:
SK: Test set: Average loss: 0.0008, Accuracy: 9848/10000 (98.5%)
The accuracy of the test set has been slightly improved.
Here two 3x3 convolution kernels replace the larger 5x5 kernel, reducing the number of weights per filter position from 5x5=25 to 2x3x3=18. In practice this speeds up the computation, and the ReLU between the small convolution layers also helps.
This method is used in VGG.
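The 25-versus-18 count above is per input-output channel pair; the short check below (illustrative only, bias terms excluded) confirms the same ratio for full layers, while both versions cover a 5x5 receptive field:

import torch.nn as nn

five = nn.Conv2d(20, 20, kernel_size=5, bias=False)
three_a = nn.Conv2d(20, 20, kernel_size=3, bias=False)
three_b = nn.Conv2d(20, 20, kernel_size=3, bias=False)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(count(five))                       # 20*20*5*5 = 10000
print(count(three_a) + count(three_b))   # 2*20*20*3*3 = 7200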
Question F: Change Number of Channels
The number of feature maps is scaled by a multiplier and the runs are driven by a shell script, giving the following results:
SK0.2: 97.7%
SK0.5: 98.2%
SK1: 98.5%
SK1.5: 98.6%
SK2: 98.5% (max 98.7%)
The final accuracy basically improves as the number of feature maps grows (first-layer channel counts of 4, 10, 20, 30, and 40 for the five multipliers). To a certain extent this shows that increasing the number of feature maps amounts to extracting more features, and extracting more features helps improve accuracy.
See SK_s.py and runSK.sh for this part of the code.
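One plausible way to wire that multiplier into the SK model (SK_s.py may well parameterize it differently) is to scale the base channel counts 20 and 50 by a factor k, e.g. k=0.2 for SK0.2:

import torch.nn as nn
import torch.nn.functional as F

class SKNet(nn.Module):
    def __init__(self, k=1.0):
        super(SKNet, self).__init__()
        c1, c2 = int(20 * k), int(50 * k)   # e.g. k=0.2 gives 4 and 10 channels
        self.conv1_1 = nn.Conv2d(1, c1, kernel_size=3)
        self.conv1_2 = nn.Conv2d(c1, c1, kernel_size=3)
        self.conv2 = nn.Conv2d(c1, c2, kernel_size=3)
        self.fc1 = nn.Linear(5 * 5 * c2, 500)
        self.fc2 = nn.Linear(500, 10)
        self.bn1_1 = nn.BatchNorm2d(c1)
        self.bn1_2 = nn.BatchNorm2d(c1)
        self.bn2 = nn.BatchNorm2d(c2)
        self.bn3 = nn.BatchNorm1d(500)

    def forward(self, x):
        x = F.relu(self.bn1_1(self.conv1_1(x)))
        x = F.relu(self.bn1_2(self.conv1_2(x)))
        x = F.max_pool2d(x, 2)
        x = F.max_pool2d(self.bn2(self.conv2(x)), 2)
        x = x.view(x.size(0), -1)           # flatten to (N, 5*5*c2)
        x = F.relu(self.bn3(self.fc1(x)))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)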
Question G: Use Different Training Set Sizes
This experiment is also driven by a script, which adds the parameter:
parser.add_argument('--usedatasize', type=int, default=60000, metavar='SZ',
                    help='use how many training data to train network')
It indicates how much training data to use; the last usedatasize samples are taken from the end of the training set.
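A hedged sketch of how the option can be applied (the helper name take_last and the array names are mine, not from SK_s.py):

def take_last(train_images, train_labels, usedatasize):
    # keep only the last `usedatasize` samples of the 60000-sample training set
    return train_images[-usedatasize:], train_labels[-usedatasize:]

# e.g. images, labels = take_last(train_images, train_labels, args.usedatasize)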
For this part of the program, see SK_s.py and runTrainingSize.sh.
The result of the operation is as follows:
1000: 92.0%
2000: 94.3%
5000: 95.5%
10000: 96.6%
20000: 98.4%
60000: 99.1%
From this it is obvious that the more data, the higher the resulting accuracy.
Too little data cannot accurately reflect the overall distribution and makes overfitting easy; beyond a certain point, adding more data brings diminishing returns. Most of the time, though, the real complaint is having too little data, and obtaining more is difficult.
Question H: Use Different Training Sets
This is also done via script; see SK_0.2.py and diffTrainingSets.sh for this part of the program.
The running results are as follows:
0-10000: 98.0%
10000-20000: 97.8%
20000-30000: 97.8%
30000-40000: 97.4%
40000-50000: 97.5%
50000-60000: 97.7%
It can be seen that networks trained on different subsets of the training samples differ somewhat; the differences are not large, but they do indicate some instability in the results.
Question I: Random Seed Effects
This is done via the runSeed.sh script, using all 60000 training samples.
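For reference, a minimal sketch of fixing the relevant seeds before training; the exact calls made by the article's scripts are not shown, so treat this as an assumption:

import random
import numpy
import torch

def set_seed(seed):
    # fix the RNGs that influence weight initialization and data shuffling
    random.seed(seed)
    numpy.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)   # harmless when CUDA is unavailable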
The result of the operation is as follows:
Seed 0: 98.9%
Seed 1: 99.0%
Seed 12: 99.1%
Seed 123: 99.0%
Seed 1234: 99.1%
Seed 12345: 99.0%
Seed 123456: 98.9%
In fact, when using the entire training set, the seed setting of the random number generator has little effect on the results.
Question J: ReLU or Sigmoid?
After replacing every ReLU with Sigmoid, we train on all 60000 training samples. The comparison is:
ReLU SK_0.2: 99.0%
Sigmoid SK_0.2: 98.6%
It can be seen that ReLU activation units work better than Sigmoid units when training this CNN. The reason probably lies in the difference between the two mechanisms: when the input to a sigmoid is very large or very small, its output saturates near 0 or 1, so the gradient is almost zero over much of the input range and the weights are hardly updated. ReLU, although it has its own costs, noticeably accelerates convergence and has no gradient-saturation problem.
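The saturation argument can be checked numerically. The short sketch below (illustrative only) compares the gradients of sigmoid and ReLU at extreme inputs:

import torch
import torch.nn.functional as F

x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)   # roughly [4.5e-05, 0.25, 4.5e-05]: almost zero at the extremes

y = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
F.relu(y).sum().backward()
print(y.grad)   # [0., 0., 1.]: the gradient stays 1 wherever the unit is active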
This concludes the study of "using PyTorch for CNN analysis". Theory works best when paired with practice, so try the experiments yourself.