How to implement simple handwritten digit recognition in PyTorch


2025-01-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article walks through the implementation of simple handwritten digit recognition in PyTorch. The process is very practical, so it is shared here for study. I hope you get something out of it.

I. Package import and download of the required data

The torchvision package mainly handles data processing, import, and preview, so if you need to deal with computer vision problems, you can use the many classes it provides to do the corresponding work.

The code begins with two imports:

import torch
from torchvision import datasets, transforms  # torchvision handles data processing, import, and preview

torchvision.datasets: the training set and test set of a dataset are downloaded with torchvision.datasets followed by the name of the dataset, such as MNIST in this example.

The code to download the dataset is as follows:

# transform is defined in section II and must be created before these calls.
data_train = datasets.MNIST(root="./data/", transform=transform, train=True, download=True)
data_test = datasets.MNIST(root="./data/", transform=transform, train=False, download=False)

① root specifies where the dataset is stored after download; here it is the data folder under the current working directory.

② transform specifies which transformations are applied to the data when the dataset is imported.

③ train specifies which part of the dataset is loaded after download (if set to True, the training set is loaded; if set to False, the test set is loaded).

About the changes introduced to the dataset

I made a slight change here. The MNIST training set contains 60,000 images and the test set 10,000, which is a large amount of data, and a normally configured computer needs a long time to process all of it. My run took a whole morning (on a normally configured student computer), so here I take only 1,000 images from the training set and 1,000 from the test set for training and testing. The accuracy will be lower and will deviate more, but it is sufficient and saves a great deal of time. The code is as follows:

from torch.utils.data import random_split

data_train, _ = random_split(dataset=data_train, lengths=[1000, 59000], generator=torch.Generator().manual_seed(0))
data_test, _ = random_split(dataset=data_test, lengths=[1000, 9000], generator=torch.Generator().manual_seed(0))

I call the random_split function from torch.utils.data to cut the dataset, which reduces the amount of data and speeds up the run.
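The splitting step can be tried without downloading anything. The following is a minimal sketch, assuming a small fake dataset in place of MNIST; the names `full_set` and `take_subset` are hypothetical and exist only for this illustration. It shows that the two lengths must add up to the dataset size and that a fixed seed always selects the same samples.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for MNIST: 100 fake 1x28x28 images with labels 0-9.
fake_images = torch.zeros(100, 1, 28, 28)
fake_labels = torch.arange(100) % 10
full_set = TensorDataset(fake_images, fake_labels)

def take_subset(dataset, n, seed=0):
    """Return a random subset of n samples; the remainder is discarded."""
    generator = torch.Generator().manual_seed(seed)
    subset, _ = random_split(dataset, [n, len(dataset) - n], generator=generator)
    return subset

small_a = take_subset(full_set, 10)
small_b = take_subset(full_set, 10)
print(len(small_a))                         # 10
print(small_a.indices == small_b.indices)   # True: same seed, same samples
```

Computing the second length as `len(dataset) - n` avoids hard-coding the dataset size.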

II. Data processing and transformation

torchvision.transforms provides a rich set of classes for transforming the loaded data. A large part of the datasets processed in computer vision consists of pictures, while PyTorch actually computes with variables of the Tensor data type, so the first problem to solve is data type conversion.

The code for transforming the data is as follows:

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])

We can think of the torchvision.transforms.Compose class in the code above as a container that combines several data transformations. The parameter passed in is a list, and the elements of the list are applied to the loaded data in order. In this example:

① ToTensor converts the data type to Tensor and scales the pixel values to the range [0, 1].

② Normalize standardizes the data with a mean and a standard deviation (std) of 0.5, that is, (x - 0.5) / 0.5, which maps the values to the range [-1, 1].
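The arithmetic behind the normalization can be checked by hand. This is a small sketch, assuming a single channel with mean 0.5 and std 0.5 as in this article; the helper names `normalize` and `denormalize` are made up for the illustration and are not part of torchvision.

```python
import torch

def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic as Normalize for a single channel: (x - mean) / std."""
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, useful when previewing images: y * std + mean."""
    return y * std + mean

pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])   # values in [0, 1], as after ToTensor
scaled = normalize(pixels)
print(scaled)                # values -1.0, -0.5, 0.0, 1.0
print(denormalize(scaled))   # back to the original values
```

The inverse mapping is the same `img * std + mean` step used when an image is prepared for display.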

III. Data loading and preview

After the data has been downloaded and transformed, we still need to load it.

We can think of the previous steps as the processing of the pictures. Once the processing is complete, the pictures have to be packaged and sent to the model for training, and loading is this packaging process.

The code snippet is as follows:

data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=4, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=4, shuffle=True)

The data is loaded with the torch.utils.data.DataLoader class, whose parameters are:

① batch_size sets the number of images in each package. The value in the code is 4 (a value such as 64 is also common; if the computer configuration is not very high, it can be reduced, and here I set it to 4).

② dataset specifies the dataset to load.

③ shuffle, when set to True, shuffles the data randomly before it is packaged during loading.
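The effect of batch_size is easiest to see on a toy dataset. The sketch below assumes 1,000 fake images in place of the MNIST subset (the variable names are illustrative only) and prints the shape of one batch and the number of batches per epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the 1,000-image subset: fake 1x28x28 images.
images = torch.zeros(1000, 1, 28, 28)
labels = torch.arange(1000) % 10
loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)

batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)   # torch.Size([4, 1, 28, 28])
print(batch_labels.shape)   # torch.Size([4])
print(len(loader))          # 250 batches per epoch (1000 / 4)
```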

After the loading is complete, we can select the data from one of the batches for preview. The code for data preview is as follows:

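import torchvision
import matplotlib.pyplot as plt

images, labels = next(iter(data_loader_train))   # fetch one batch
img = torchvision.utils.make_grid(images)        # tile the batch into a single image
img = img.numpy().transpose(1, 2, 0)             # (channel, height, width) -> (height, width, channel)
std = [0.5]
mean = [0.5]
img = img * std + mean                           # undo the normalization
print([labels[i] for i in range(4)])
plt.imshow(img)
plt.show()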

iter and next are used in the code above to get one batch of picture data (images) and the corresponding picture labels (labels).

The make_grid function in torchvision.utils then arranges the images of the batch into a grid.

The parameter passed to torchvision.utils.make_grid is the loaded data of one batch, which is 4-dimensional. The dimensions are batch_size, channel, height, and width, corresponding to the number of images in the batch, the number of color channels of each image, and the height and width of each image.

After torchvision.utils.make_grid, the dimensions of the image become (channel, height, width). All the images in the batch are merged into one, so the height and width are no longer the same as before; single-channel images such as MNIST are also expanded to three channels.

If we want Matplotlib to display the data as a normal image, the data must first be an array, and the dimensions of this array must be (height, width, channel), that is, the color channels come last.

So we use numpy and transpose to convert the data type and swap the dimensions, after which Matplotlib can draw the image correctly.
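The dimension swap can be illustrated with NumPy alone. This is a sketch under the assumption of a three-channel grid; the shape (3, 32, 122) is chosen only for illustration and the array names are hypothetical.

```python
import numpy as np

# Hypothetical grid in (channel, height, width) order.
grid_chw = np.zeros((3, 32, 122), dtype=np.float32)

# Move the channel axis to the end: (height, width, channel).
grid_hwc = grid_chw.transpose(1, 2, 0)

# Undo a normalization with mean 0.5 and std 0.5 so the values return to [0, 1].
grid_hwc = grid_hwc * 0.5 + 0.5

print(grid_chw.shape)   # (3, 32, 122)
print(grid_hwc.shape)   # (32, 122, 3)
```

An array in this (height, width, channel) layout is what plt.imshow expects for color images.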

In the data preview code, we first print the labels of all the data in the batch and then display all the image data in the batch. The printout therefore shows the labels of the four pictures first, followed by the preview of the four pictures.

If you run the code in PyCharm, plt.show() must be added, otherwise the image will not be displayed.

plt.show()

IV. Model building and parameter optimization

After successfully loading the data, we can begin to write the code for building the convolutional neural network and optimizing its parameters.

The convolutional layer is built with the torch.nn.Conv2d class.

The activation layer is built with the torch.nn.ReLU class.

The pooling layer is built with the torch.nn.MaxPool2d class.

The fully connected layer is built with the torch.nn.Linear class.

The code to build the convolutional neural network model is as follows:

class Model(torch.nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(stride=2, kernel_size=2),
        )
        self.dense = torch.nn.Sequential(
            torch.nn.Linear(14 * 14 * 128, 1024),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1024, 10),
        )

    def forward(self, x):
        x = self.conv1(x)                # convolution processing
        x = x.view(-1, 14 * 14 * 128)    # flattening the parameters
        x = self.dense(x)
        return x

We choose to build a convolutional neural network model with a simplified structure: two convolutional layers, one max pooling layer, and two fully connected layers.

torch.nn.Conv2d(): used to build the convolutional layers of the network. The main input parameters are the number of input channels, the number of output channels, the size of the convolution kernel, the stride of the convolution kernel, and the padding value. Among them:

The number of input channels is an integer that gives the number of channels of the input data.

The number of output channels is also an integer and gives the number of channels of the output data.

The convolution kernel size is an integer that determines the size of the convolution kernel.

The stride is an integer that determines how far the convolution kernel moves at each step.

The padding value is an integer. A value of 0 means that no pixels are added at the boundary. A value greater than 0 adds that many layers of pixels around the boundary.

torch.nn.MaxPool2d(): used to implement the max pooling layer of the network. The main input parameters are the size of the pooling window, the stride of the pooling window, and the padding value.

Similarly:

The pooling window size is an integer that determines the size of the pooling window.

The pooling window stride is also an integer and determines how far the pooling window moves at each step.

The padding value has the same usage and meaning as the padding value defined in torch.nn.Conv2d.
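How kernel size, stride, and padding determine the size of the output can be worked out with the standard output-size formula for convolution and pooling layers (dilation 1). The helper `out_size` below is a hypothetical name used only for this sketch, which traces a 28 x 28 MNIST image through the layers described above.

```python
def out_size(size, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                                   # MNIST images are 28 x 28
size = out_size(size, kernel_size=3, stride=1, padding=1)   # first convolution
size = out_size(size, kernel_size=3, stride=1, padding=1)   # second convolution
print(size)                                                 # 28
size = out_size(size, kernel_size=2, stride=2)              # max pooling
print(size)                                                 # 14
print(size * size * 128)                                    # 25088 = 14 * 14 * 128
```

The last number is the input size that the first fully connected layer has to accept.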

torch.nn.Dropout(): the torch.nn.Dropout class is used to prevent the convolutional neural network from over-fitting during training. Put simply, during model training it sets some of the values passed between two adjacent layers to zero with a given random probability, which reduces the neural connections between those layers.
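The behaviour of Dropout can be observed directly. The following sketch, which assumes p=0.5 and an input of ones chosen for illustration, shows that values are zeroed only in training mode and that the surviving values are scaled by 1 / (1 - p); in evaluation mode the layer passes its input through unchanged.

```python
import torch

torch.manual_seed(0)
dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()                  # training mode: dropout is active
y_train = dropout(x)
kept = y_train != 0
print(y_train[kept].unique())    # survivors are scaled to 1 / (1 - p) = 2.0
print(kept.float().mean())       # roughly half of the values survive

dropout.eval()                   # evaluation mode: dropout does nothing
y_eval = dropout(x)
print(torch.equal(y_eval, x))    # True
```

For a whole model, calling model.train() and model.eval() switches every Dropout layer in the same way.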

The contents of the forward propagation function forward:

First, convolution processing is carried out by self.conv1. Then x.view(-1, 14 * 14 * 128) flattens the result, because the next layer is fully connected: without flattening, the dimensions of the data actually fed to the fully connected layer would not match the input dimensions it was defined with, and the program would report an error. Finally, the classification is carried out by the fully connected layers defined in self.dense.
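The flattening step can be checked in isolation. This is a minimal sketch, assuming a fake batch of four feature maps with 128 channels of size 14 x 14, as produced by the convolution part of the model.

```python
import torch

features = torch.zeros(4, 128, 14, 14)     # (batch_size, channel, height, width)
flat = features.view(-1, 14 * 14 * 128)    # -1 lets PyTorch infer the batch size
print(flat.shape)                          # torch.Size([4, 25088])

dense = torch.nn.Linear(14 * 14 * 128, 1024)
print(dense(flat).shape)                   # torch.Size([4, 1024])
```

Passing the unflattened `features` tensor to the Linear layer would fail, because its last dimension is 14 rather than 25088.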

After writing the code that builds the convolutional neural network model, we can begin to train the model and optimize its parameters. First, define the loss function and the optimization function to use before training:

model = Model()
cost = torch.nn.CrossEntropyLoss()                  # loss function: cross entropy
optimizer = torch.optim.Adam(model.parameters())    # optimization function: Adam adaptive optimization algorithm
# The parameters to be optimized are all the parameters generated in Model.
# Because the learning rate is not defined, the default value is used.
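It may help to see what the cross entropy loss expects as input. The sketch below uses made-up scores for two samples and three classes: the loss takes the raw model outputs (no softmax applied beforehand) and integer class labels, and its value matches the mean negative log-softmax of the true class computed by hand.

```python
import torch

cost = torch.nn.CrossEntropyLoss()

# Hypothetical raw scores (logits) for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])
targets = torch.tensor([0, 2])             # integer class indices, not one-hot

loss = cost(logits, targets)

# The same value computed by hand: mean negative log-softmax of the true class.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()
print(loss.item(), manual.item())
```

This is why the model in this article ends with a plain Linear layer and no softmax.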

Finally, the code for model training and parameter optimization of the convolutional neural network model is as follows:

epochs_n = 5
for epoch in range(epochs_n):
    running_loss = 0.0
    running_correct = 0
    print("Epoch {}/{}".format(epoch, epochs_n))
    print("-" * 10)

    model.train()                            # enable dropout for training
    for data in data_loader_train:
        X_train, y_train = data
        outputs = model(X_train)
        _, pred = torch.max(outputs.data, 1)
        optimizer.zero_grad()
        loss = cost(outputs, y_train)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        running_correct += torch.sum(pred == y_train).item()

    testing_correct = 0
    model.eval()                             # disable dropout for testing
    with torch.no_grad():                    # no gradients are needed for testing
        for data in data_loader_test:
            X_test, y_test = data
            outputs = model(X_test)
            _, pred = torch.max(outputs, 1)
            testing_correct += torch.sum(pred == y_test).item()

    print("Loss is: {:.4f}, Train Accuracy is: {:.4f}%, Test Accuracy is: {:.4f}%".format(
        running_loss / len(data_train),
        100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))

Changes to the model building

Here I changed the model above to shorten the running time considerably, at the cost of some training accuracy.

The reasoning is that the convolutional layers do not require much computation, while the fully connected layers require relatively more, so both the number of fully connected parameters and the size of the image feature map are reduced.

class Model(torch.nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(),
            # torch.nn.MaxPool2d(stride=2, kernel_size=2),
        )
        self.dense = torch.nn.Sequential(
            # torch.nn.Linear(14 * 14 * 128, 1024),
            torch.nn.Linear(7 * 7 * 128, 512),
            torch.nn.ReLU(),
            # torch.nn.Dropout(p=0.5),
            torch.nn.Dropout(p=0.8),
            torch.nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.conv1(x)                  # convolution processing
        # x = x.view(-1, 14 * 14 * 128)    # flattening the parameters
        x = x.view(-1, 7 * 7 * 128)        # flattening the parameters
        x = self.dense(x)
        return x
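To get a feeling for how much the change shrinks the fully connected part, the parameters can be counted by hand. The helper `linear_params` is a hypothetical name for this sketch; it counts the weights and biases of a fully connected layer from its input and output sizes.

```python
def linear_params(in_features, out_features):
    """Number of weights plus biases in a fully connected layer."""
    return in_features * out_features + out_features

original = linear_params(14 * 14 * 128, 1024) + linear_params(1024, 10)
reduced = linear_params(7 * 7 * 128, 512) + linear_params(512, 10)
print(original)   # 25701386
print(reduced)    # 3216906
```

The modified fully connected part has roughly one eighth of the parameters of the original one.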

To verify whether the trained model is as accurate as the results suggest, the best way is to pick some pictures from the test set at random, use the trained model to predict them, compare the predictions with the real values, and visualize the results. The test code is as follows:

# Requires: import torchvision and import matplotlib.pyplot as plt
model.eval()                                  # disable dropout for prediction
X_test, y_test = next(iter(data_loader_test))
pred = model(X_test)
_, pred = torch.max(pred, 1)

print("Predict Label is:", [i for i in pred.data])
print("Real Label is:", [i for i in y_test])

img = torchvision.utils.make_grid(X_test)
img = img.numpy().transpose(1, 2, 0)
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean
plt.imshow(img)
plt.show()

Remember to add plt.show() at the end.
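The prediction step relies on torch.max with a dimension argument, which returns both the maximum values and their positions; the positions along dimension 1 are the predicted digits. A small sketch with made-up outputs for three samples:

```python
import torch

# Hypothetical model outputs for 3 samples and 10 digit classes.
outputs = torch.zeros(3, 10)
outputs[0, 7] = 5.0
outputs[1, 2] = 4.0
outputs[2, 9] = 3.0

values, pred = torch.max(outputs, 1)   # maximum along dimension 1 (the classes)
print(pred.tolist())                   # [7, 2, 9]

labels = torch.tensor([7, 2, 4])
correct = torch.sum(pred == labels).item()
print(correct)                         # 2
```

Summing the element-wise comparison is the same counting used for the accuracy figures during training and testing.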

In the printed output, the first result is the predicted value of our trained model, and the second result is the real value of the four test samples. The test images are then displayed so that they can be compared with the labels.

In the author's run, the predicted results of the model were completely consistent with the real results for the visualized test pictures. If you want to visualize more test pictures at once, you only need to set batch_size larger, but the program will then run slightly slower.

General code:

import torch
import numpy
import torchvision
import matplotlib.pyplot as plt
from torchvision import datasets, transforms  # torchvision handles data processing, import, and preview
from torch.utils.data import random_split

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])

data_train = datasets.MNIST(root="./data/", transform=transform, train=True, download=True)
data_test = datasets.MNIST(root="./data/", transform=transform, train=False, download=False)

data_train, _ = random_split(dataset=data_train, lengths=[1000, 59000], generator=torch.Generator().manual_seed(0))
data_test, _ = random_split(dataset=data_test, lengths=[1000, 9000], generator=torch.Generator().manual_seed(0))

data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=4, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=4, shuffle=True)

# images, labels = next(iter(data_loader_train))
#
# img = torchvision.utils.make_grid(images)
# img = img.numpy().transpose(1, 2, 0)
#
# std = [0.5]
# mean = [0.5]
# img = img * std + mean
#
# print([labels[i] for i in range(4)])
# plt.imshow(img)
# plt.show()

# class Model(torch.nn.Module):
#
#     def __init__(self):
#         super(Model, self).__init__()
#         self.conv1 = torch.nn.Sequential(
#             torch.nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
#             torch.nn.ReLU(),
#             torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
#             torch.nn.ReLU(),
#             torch.nn.MaxPool2d(stride=2, kernel_size=2),
#         )
#         self.dense = torch.nn.Sequential(
#             torch.nn.Linear(14 * 14 * 128, 1024),
#             torch.nn.ReLU(),
#             torch.nn.Dropout(p=0.5),
#             torch.nn.Linear(1024, 10),
#         )
#
#     def forward(self, x):
#         x = self.conv1(x)                # convolution processing
#         x = x.view(-1, 14 * 14 * 128)    # flattening the parameters
#         x = self.dense(x)
#         return x


class Model(torch.nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(),
            # torch.nn.MaxPool2d(stride=2, kernel_size=2),
        )
        self.dense = torch.nn.Sequential(
            # torch.nn.Linear(14 * 14 * 128, 1024),
            torch.nn.Linear(7 * 7 * 128, 512),
            torch.nn.ReLU(),
            # torch.nn.Dropout(p=0.5),
            torch.nn.Dropout(p=0.8),
            torch.nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.conv1(x)                  # convolution processing
        # x = x.view(-1, 14 * 14 * 128)    # flattening the parameters
        x = x.view(-1, 7 * 7 * 128)        # flattening the parameters
        x = self.dense(x)
        return x


model = Model()
cost = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

epochs_n = 5
for epoch in range(epochs_n):
    running_loss = 0.0
    running_correct = 0
    print("Epoch {}/{}".format(epoch, epochs_n))
    print("-" * 10)

    model.train()                            # enable dropout for training
    for data in data_loader_train:
        X_train, y_train = data
        outputs = model(X_train)
        _, pred = torch.max(outputs.data, 1)
        optimizer.zero_grad()
        loss = cost(outputs, y_train)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        running_correct += torch.sum(pred == y_train).item()

    testing_correct = 0
    model.eval()                             # disable dropout for testing
    with torch.no_grad():
        for data in data_loader_test:
            X_test, y_test = data
            outputs = model(X_test)
            _, pred = torch.max(outputs, 1)
            testing_correct += torch.sum(pred == y_test).item()

    print("Loss is: {:.4f}, Train Accuracy is: {:.4f}%, Test Accuracy is: {:.4f}%".format(
        running_loss / len(data_train),
        100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))

model.eval()
X_test, y_test = next(iter(data_loader_test))
pred = model(X_test)
_, pred = torch.max(pred, 1)

print("Predict Label is:", [i for i in pred.data])
print("Real Label is:", [i for i in y_test])

img = torchvision.utils.make_grid(X_test)
img = img.numpy().transpose(1, 2, 0)
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean
plt.imshow(img)
plt.show()

Finally, this kind of code needs a long time to run, so it is normal not to see results for a while. Try not to interrupt the program. If you want to check whether the program is still running:

epochs_n = 5
for epoch in range(epochs_n):
    running_loss = 0.0
    running_correct = 0
    print("Epoch {}/{}".format(epoch, epochs_n))
    print("-" * 10)

    step = 0                                 # counts the batches processed in this epoch
    model.train()
    for data in data_loader_train:
        step += 1
        print(step)
        X_train, y_train = data
        outputs = model(X_train)
        _, pred = torch.max(outputs.data, 1)
        optimizer.zero_grad()
        loss = cost(outputs, y_train)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        running_correct += torch.sum(pred == y_train).item()

    testing_correct = 0
    model.eval()
    with torch.no_grad():
        for data in data_loader_test:
            X_test, y_test = data
            outputs = model(X_test)
            _, pred = torch.max(outputs, 1)
            testing_correct += torch.sum(pred == y_test).item()

    print("Loss is: {:.4f}, Train Accuracy is: {:.4f}%, Test Accuracy is: {:.4f}%".format(
        running_loss / len(data_train),
        100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))

Here an integer counter, step, is added, and you can tell whether the program is still running by watching whether it keeps increasing. The counter is not called iter, because that name would shadow the built-in iter function used in the test code.
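Printing a number for every batch produces a lot of output. One possible variation, sketched here under the assumption that a report every 50 batches is enough, uses enumerate and prints only occasionally. The function `train_one_epoch` and the list standing in for the data loader are hypothetical and contain no real training.

```python
def train_one_epoch(batches, report_every=50):
    """Loop over batches and report progress every `report_every` steps."""
    reports = []
    for step, batch in enumerate(batches, start=1):
        # ... the forward pass, backward pass, and optimizer step go here ...
        if step % report_every == 0:
            reports.append("step {} / {}".format(step, len(batches)))
            print(reports[-1])
    return reports

# Hypothetical stand-in for the training loader: 250 batches (1,000 images, batch size 4).
reports = train_one_epoch(list(range(250)))
```

With 250 batches this prints five progress lines per epoch instead of 250.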

The above is the implementation process of simple handwritten digit recognition in PyTorch. Some of these knowledge points may come up in daily work, and I hope you can learn more from this article.

