
How to recognize objects with PyTorch

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--


Keras is a great library that provides a simple API for building neural networks, but the recent excitement about PyTorch finally got me interested in exploring the library. I may be a "blind follower of hype", but the adoption by researchers and the endorsement of fast.ai have convinced me that there must be something worth exploring in this new entry to deep learning.

Since the best way to learn a new technology is to use it to solve a problem, my work in learning PyTorch began with a simple project: using a pre-trained convolutional neural network for an object recognition task. We will see how to use PyTorch to achieve this goal and learn some important concepts about the library and about transfer learning along the way.

While PyTorch may not be for everyone, it's hard to tell which deep learning library will stand out at this point, and being able to learn and use different tools quickly is critical to becoming a data scientist.

The complete code for the project is provided as a Jupyter Notebook on GitHub (https://github.com/WillKoehrsen/pytorch_challenge/blob/master/Transfer%20Learning%20in%20PyTorch.ipynb). This project originated from my participation in the Udacity PyTorch Scholarship Challenge (https://www.udacity.com/facebook-pytorch-scholarship).

Predictions from the trained network

Transfer learning method

Our task is to train a convolutional neural network (CNN) that can recognize the objects in images. We will use the Caltech 101 dataset (http://www.vision.caltech.edu/Image_Datasets/Caltech101/), which contains images in 101 categories. Most categories have only about 50 images, which is usually not enough for a neural network to learn to high accuracy. Therefore, instead of building and training a CNN from scratch, we will use a pre-built and pre-trained model and apply transfer learning.

The basic premise of transfer learning is simple: take a model trained on a large dataset and transfer its knowledge to a smaller dataset. For object recognition with a CNN, we freeze the early convolutional layers of the network and train only the last few layers, which make the prediction. The idea is that the convolutional layers extract general, low-level features that apply across images (such as edges, patterns, and gradients), while the later layers identify specific features within an image, such as eyes or wheels.

Therefore, we can take a network trained on unrelated categories in a large dataset (usually Imagenet, http://www.image-net.org/) and apply it to our own problem, because images share common low-level features. The images in the Caltech 101 dataset are very similar to those in the Imagenet dataset, so the knowledge a model learns on Imagenet should transfer easily to this task.

The idea behind transfer learning

The following is an overview of transfer learning for object recognition:

Load a pre-trained CNN model trained on a large dataset

Freeze the parameters (weights) in the model's lower convolutional layers

Add a custom classifier with several layers of trainable parameters to the model

Train the classifier layers on the training data available for the task

Fine-tune the hyperparameters and unfreeze more layers as needed

This approach has proven successful across a wide range of domains. It is a great tool to have, and it is usually the first approach to try when faced with a new image recognition problem.

Data setup

In any data science problem, formatting the data correctly can determine the success or failure of the project. Fortunately, the Caltech 101 images are clean and stored in the correct format. If we set up the data directories correctly, PyTorch can easily associate the correct label with each class. I split the data into training, validation, and test sets of 50%, 25%, and 25%, respectively, and then arranged the directories so that each split has its own folder containing one subfolder per class.
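As a rough sketch of such a split (not the notebook's actual code), the following uses only the standard library. The function name `split_into_dirs` and the split folder names `train`, `valid`, and `test` are my own assumptions for illustration.

```python
import random
import shutil
from pathlib import Path


def split_into_dirs(source_dir, target_dir, fractions=(0.5, 0.25, 0.25), seed=0):
    """Copy source_dir/<class>/<file> into target_dir/<split>/<class>/<file>."""
    split_names = ("train", "valid", "test")  # hypothetical folder names
    rng = random.Random(seed)
    for class_dir in sorted(p for p in Path(source_dir).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * fractions[0])
        n_valid = int(len(files) * fractions[1])
        chunks = (
            files[:n_train],
            files[n_train:n_train + n_valid],
            files[n_train + n_valid:],  # whatever remains goes to the test split
        )
        for split, chunk in zip(split_names, chunks):
            out_dir = Path(target_dir) / split / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in chunk:
                shutil.copy2(f, out_dir / f.name)
```

Splitting within each class, rather than over the whole dataset at once, keeps every category represented in all three sets.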

Number of training images by category (I use the terms classes and categories interchangeably):

Number of training images by category

We expect the model to do better on classes with more examples, because it can better learn to map features to labels. To deal with the limited number of training examples, we will use data augmentation during training.

As another bit of data exploration, we can also look at the distribution of image sizes.

Average image size by category (in pixels)

Imagenet models require an input size of 224 x 224, so one of the preprocessing steps will be to resize the images. Preprocessing is also where we implement data augmentation for the training data.

Data augmentation

The idea of data augmentation is to artificially increase the number of training images the model sees by applying random transformations to the images. For example, we can randomly rotate or crop the images or flip them horizontally. We want our model to distinguish objects regardless of their orientation, and data augmentation can make the model invariant to such transformations of the input data.

No matter which direction the elephant is facing, it is still an elephant!

Image transformations of training data

Augmentation is usually done only during training (although test-time augmentation is possible in the fast.ai library). In each epoch (one pass through all the training images), a different random transformation is applied to each training image. This means that if we iterate through the data 20 times, our model will see 20 slightly different versions of each image. The overall result should be a model that learns the objects themselves, rather than how they are presented or artifacts in the image.

Image preprocessing

This is the most important step in working with image data. During image preprocessing, we prepare the images for the network and apply data augmentation to the training set. Each model has different input requirements, but if we read up on what the Imagenet models need, we find that our images must be 224 x 224 and normalized to a range.

To process images in PyTorch, we use transforms, which are simple operations applied to arrays. The validation (and testing) transforms are as follows:

Resize

Center crop to 224 x 224

Convert to a tensor

Normalize with mean and standard deviation

The end result of passing an image through these transforms is a tensor that can go into our network. The training transforms are similar, with the addition of random augmentation.

First, we define training and validation transformations:

Then we create the datasets and DataLoaders. ImageFolder creates a dataset, and PyTorch will automatically associate each image with the correct label, provided our directories are set up as described above. The dataset is then passed to a DataLoader, an iterator that yields batches of images and labels.

We can see the iterative behavior of a DataLoader as follows:
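A small sketch of pulling one batch from a DataLoader follows. Random tensors stand in for preprocessed images here, and the batch size of 8 is an arbitrary choice.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random tensors stand in for preprocessed images; labels are arbitrary integers
fake_images = torch.randn(32, 3, 224, 224)
fake_labels = torch.randint(0, 100, (32,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=8, shuffle=True)

# A DataLoader is an iterable: iter() gives an iterator, next() gives one batch
features, labels = next(iter(loader))
print(features.shape)  # torch.Size([8, 3, 224, 224])
print(labels.shape)    # torch.Size([8])
```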

The shape of a batch is (batch_size, color_channels, height, width). During training, validation, and eventually testing, we iterate through the DataLoaders, with one pass through the complete dataset making up one epoch. In every epoch, the training DataLoader applies a slightly different random transformation to the images to augment the training data.

Pre-trained models for image recognition

With our data in shape, we next turn our attention to the model. For this, we will use a pre-trained convolutional neural network. PyTorch has a number of models that have already been trained on millions of images from 1000 classes in Imagenet. The complete list of models can be seen here (https://pytorch.org/docs/stable/torchvision/models.html). The performance of these models on Imagenet is as follows:

Pre-trained models in PyTorch and their performance on Imagenet

For this implementation, we will use VGG-16. Although it does not record the lowest error, I found that it worked well for the task and was quicker to train than other models. The process for using a pre-trained model is well established:

Load pre-trained weights from a network trained on a large dataset

Freeze all the weights in the lower (convolutional) layers: the layers to freeze are adjusted according to the similarity between the new task and the original dataset

Replace the upper layers of the network with a custom classifier: the number of outputs must be set equal to the number of classes

Train only the custom classifier layers for the task, thereby optimizing the model for the smaller dataset

Loading a pre-trained model in PyTorch is simple:

This model has more than 130 million parameters, but we will train only the last few fully connected layers. First, we freeze all of the model's weights:
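Freezing comes down to switching off gradient tracking on each parameter. The sketch below uses a small stand-in network so that it runs quickly; the helper name `freeze_parameters` is an assumption, and the same call would apply to VGG-16.

```python
import torch.nn as nn


def freeze_parameters(model):
    """Turn off gradient tracking for every parameter currently in the model."""
    for param in model.parameters():
        param.requires_grad = False
    return model


# Small stand-in network; with VGG-16 the call is the same
demo = freeze_parameters(nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Linear(8, 2)))
print(any(p.requires_grad for p in demo.parameters()))  # False
```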

Then we use the following layers to add our own custom classifier:

Fully connected with ReLU activation, shape = (n_inputs, 256)

Dropout with a 40% chance of dropping

Fully connected with log softmax output, shape = (256, n_classes)
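The three layers listed above can be sketched as an `nn.Sequential`. The helper name `make_classifier` is hypothetical, and 4096 is the input width of the final fully connected layer in torchvision's VGG-16.

```python
import torch
import torch.nn as nn


def make_classifier(n_inputs, n_classes):
    """Fully connected + ReLU, dropout, then fully connected + log softmax."""
    return nn.Sequential(
        nn.Linear(n_inputs, 256),
        nn.ReLU(),
        nn.Dropout(0.4),
        nn.Linear(256, n_classes),
        nn.LogSoftmax(dim=1),
    )


# 4096 is the input width of VGG-16's final fully connected layer
classifier = make_classifier(n_inputs=4096, n_classes=100)
log_probs = classifier(torch.randn(2, 4096))
print(log_probs.shape)  # torch.Size([2, 100])
```

With torchvision's VGG-16, the head could then be swapped in with something like `model.classifier[6] = make_classifier(model.classifier[6].in_features, n_classes)`, since index 6 is the last layer of its classifier.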

When extra layers are added to the model, they are trainable by default (requires_grad = True). For VGG-16, we change only the last of the original fully connected layers. All of the weights in the convolutional layers and in the earlier fully connected layers remain untrainable.

The final output of the network is the log probability of each of the 100 classes in our dataset. The model has 135 million parameters, of which just over 1 million will be trained.
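The "just over 1 million" figure can be checked by counting the parameters of the custom classifier alone, assuming it takes 4096 inputs (the width of VGG-16's last fully connected layer) and produces 100 outputs. The helper name `count_parameters` is my own.

```python
import torch.nn as nn


def count_parameters(model):
    """Return (total, trainable) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


head = nn.Sequential(
    nn.Linear(4096, 256), nn.ReLU(), nn.Dropout(0.4),
    nn.Linear(256, 100), nn.LogSoftmax(dim=1),
)
total, trainable = count_parameters(head)
print(trainable)  # 1074532 = (4096*256 + 256) + (256*100 + 100)
```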

Moving the model to GPU(s)

One of the best aspects of PyTorch is that you can easily move different parts of the model to one or more GPUs (https://pytorch.org/docs/stable/notes/cuda.html) to make the most of your hardware. Because I trained on 2 GPUs, I first moved the model to cuda and then created a DataParallel model distributed over the GPUs:
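A sketch of that pattern, written so that it also runs on a machine without a GPU, might look like this. The helper name `to_available_devices` and the tiny linear model are placeholders.

```python
import torch
import torch.nn as nn


def to_available_devices(model):
    """Move the model to the GPU if there is one; wrap it for multi-GPU use."""
    if torch.cuda.is_available():
        model = model.to("cuda")
        if torch.cuda.device_count() > 1:
            model = nn.DataParallel(model)  # splits each batch across the GPUs
    return model


model = to_available_devices(nn.Linear(10, 2))
```

Input batches then need to be moved to the same device as the model before the forward pass.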

(This notebook should be run on a GPU so that it completes in a reasonable amount of time. The speedup over a CPU can easily be 10x or more.)

Training loss and optimization

The training loss (the error or difference between predictions and true values) is the negative log likelihood (NLL: https://ljvmiranda921.github.io/notebook/2017/08/13/softmax-and-the-negative-log-likelihood/). The NLL loss in PyTorch expects log probabilities, so we pass in the raw output from the model's last layer, which is the log softmax. PyTorch uses automatic differentiation, which means that tensors keep track not only of their values but also of every operation (multiplication, addition, activation, and so on) that contributed to them. This means that we can compute the gradient of any tensor in the network with respect to any prior tensor.

In practice, this means that the loss tracks not only the error but also the contribution of each weight and bias in the model to that error. After we calculate the loss, we can find the gradient of the loss with respect to each model parameter, a process called back propagation. Once we have the gradients, we use them to update the parameters with the optimizer.
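A toy example of automatic differentiation, with a single scalar standing in for a weight and a made-up expression standing in for the loss:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)  # a single "weight"
loss = w ** 2 + 2 * w                      # operations are recorded as they run
loss.backward()                            # back propagation: fills in w.grad
print(w.grad)  # tensor(8.), since d(loss)/dw = 2*w + 2 = 8 at w = 3
```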

The optimizer is Adam (https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/), an efficient variant of gradient descent that usually does not require manual tuning of the learning rate. During training, the optimizer uses the gradients of the loss to try to reduce the error of the model output by adjusting the parameters ("optimization"). Only the parameters we added in the custom classifier are optimized.

The loss and optimizer initialization are as follows:
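A minimal sketch of that initialization is given here, using a small stand-in model with one frozen layer. Passing only the trainable parameters to Adam is one reasonable choice that I am assuming, not necessarily what the notebook does.

```python
import torch.nn as nn
from torch import optim

# Stand-in model: one frozen layer followed by a trainable log-softmax classifier
model = nn.Sequential(nn.Linear(20, 8), nn.ReLU(), nn.Linear(8, 3), nn.LogSoftmax(dim=1))
for param in model[0].parameters():
    param.requires_grad = False

criterion = nn.NLLLoss()  # expects log probabilities and integer class labels
# Hand the optimizer only the parameters that are still trainable
optimizer = optim.Adam(p for p in model.parameters() if p.requires_grad)
```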

With the pre-trained model, the custom classifier, the loss, the optimizer, and most importantly the data, we are ready for training.

Training

Model training in PyTorch is a little more hands-on than in Keras, because we have to do the back propagation and parameter update steps ourselves. The main loop iterates over a number of epochs, and in each epoch we iterate through the training DataLoader. The DataLoader yields one batch of data and targets, which we pass through the model. After each training batch, we calculate the loss, back propagate the gradients of the loss with respect to the model parameters, and then update the parameters with the optimizer.

I encourage you to look at the notebook (https://github.com/WillKoehrsen/pytorch_challenge/blob/master/Transfer%20Learning%20in%20PyTorch.ipynb) for the complete training details, but the basic pseudo code is as follows:
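The loop described above can be sketched as a single-epoch function. This is a simplified illustration rather than the notebook's code: the toy data, the tiny model, the learning rate, and the name `train_one_epoch` are all assumptions chosen so the example runs anywhere.

```python
import torch
import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer):
    """Run one pass over the training data and return the average loss."""
    model.train()
    running_loss = 0.0
    for data, targets in loader:
        optimizer.zero_grad()              # clear gradients from the last batch
        output = model(data)               # forward pass: log probabilities
        loss = criterion(output, targets)
        loss.backward()                    # gradients of the loss w.r.t. parameters
        optimizer.step()                   # parameter update
        running_loss += loss.item() * data.size(0)
    return running_loss / len(loader.dataset)


# Toy data and model so the loop can run anywhere
torch.manual_seed(0)
X, y = torch.randn(64, 20), torch.randint(0, 3, (64,))
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)
model = nn.Sequential(nn.Linear(20, 3), nn.LogSoftmax(dim=1))
optimizer = optim.Adam(model.parameters(), lr=0.01)
losses = [train_one_epoch(model, loader, nn.NLLLoss(), optimizer) for _ in range(5)]
```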

We can keep iterating through the data for a given number of epochs. However, one problem with this approach is that our model will eventually overfit to the training data. To prevent this, we use our validation data and early stopping.

Early stopping

Early stopping (https://en.wikipedia.org/wiki/Early_stopping) means halting training when the validation loss has not decreased for a number of epochs. As we continue training, the training loss will only decrease, but the validation loss will eventually reach a minimum and then plateau or start to increase. Ideally, we want to stop training when the validation loss is at its minimum, in the hope that this model will generalize best to the test data. When using early stopping, we save the parameters in every epoch in which the validation loss decreases, so that we can later retrieve those with the best validation performance.

We implement early stopping by iterating through the validation DataLoader at the end of each training epoch. We calculate the validation loss and compare it with the lowest validation loss so far. If the loss is the lowest so far, we save the model. If the loss has not improved for a certain number of epochs, we halt training and return the best model, which has been saved to disk.

Again, the complete code is in the notebook, but the pseudo code is:
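A compact sketch of training with early stopping follows. It is my own simplification, not the notebook's code: the function names `validation_loss` and `fit`, the patience parameter `max_epochs_stop`, the checkpoint file name, and the toy data are all assumptions.

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader, TensorDataset


def validation_loss(model, loader, criterion):
    """Average loss over a DataLoader, without tracking gradients."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for data, targets in loader:
            total += criterion(model(data), targets).item() * data.size(0)
    return total / len(loader.dataset)


def fit(model, train_loader, valid_loader, criterion, optimizer,
        save_path, n_epochs=20, max_epochs_stop=3):
    """Train with early stopping; return the best validation loss seen."""
    best_loss, epochs_no_improve = float("inf"), 0
    for epoch in range(n_epochs):
        model.train()
        for data, targets in train_loader:
            optimizer.zero_grad()
            criterion(model(data), targets).backward()
            optimizer.step()
        loss = validation_loss(model, valid_loader, criterion)
        if loss < best_loss:
            best_loss, epochs_no_improve = loss, 0
            torch.save(model.state_dict(), save_path)  # checkpoint the best weights
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= max_epochs_stop:
                break                                  # patience exhausted
    model.load_state_dict(torch.load(save_path))       # restore the best checkpoint
    return best_loss


# Toy data and model so the sketch can run anywhere
torch.manual_seed(0)
X, y = torch.randn(96, 20), torch.randint(0, 3, (96,))
train_loader = DataLoader(TensorDataset(X[:64], y[:64]), batch_size=16, shuffle=True)
valid_loader = DataLoader(TensorDataset(X[64:], y[64:]), batch_size=16)
model = nn.Sequential(nn.Linear(20, 3), nn.LogSoftmax(dim=1))
save_path = os.path.join(tempfile.mkdtemp(), "best_model.pt")
best = fit(model, train_loader, valid_loader, nn.NLLLoss(),
           optim.Adam(model.parameters(), lr=0.01), save_path)
```

Because the best weights are reloaded before returning, the model that comes back is the one with the lowest validation loss, not the one from the final epoch.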

To see the benefits of early stopping, we can look at the training curves showing the training and validation losses and accuracy:

Negative log likelihood and accuracy training curves

As expected, the training loss only continues to decrease with further training. The validation loss, on the other hand, reaches a minimum and plateaus. At a certain point, further training yields no return (or even a negative one). Our model will only start to memorize the training data and will not be able to generalize to the test data.

Without early stopping, our model would train for longer than necessary and would overfit to the training data.

Another thing we can see from the training curves is that our model is not overfitting greatly. There is always some overfitting, but the dropout after the first trainable fully connected layer keeps the training and validation losses from diverging too much.

Making predictions: inference

In the notebook I cover some boring but necessary details of saving and loading PyTorch models, but here we will move on to the best part: making predictions on new images. We know that our model does well on the training and even the validation data, but the ultimate test is how it performs on a hold-out test set that it has never seen before. We held back 25% of the data to determine whether our model can generalize to new data.

Predicting with the trained model is simple. We use the same syntax as for training and validation:
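A sketch of a prediction pass is shown here, with a small stand-in model and a random batch in place of the trained network and real images:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 5), nn.LogSoftmax(dim=1))  # stand-in for the trained network
batch = torch.randn(4, 20)                                     # stand-in for a batch of images

model.eval()                # switch off dropout for inference
with torch.no_grad():       # no gradients are needed when predicting
    log_probs = model(batch)
    probs = torch.exp(log_probs)  # log probabilities -> probabilities

print(probs.shape)  # torch.Size([4, 5]), i.e. (batch_size, n_classes)
```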

The shape of our probabilities is (batch_size, n_classes), because we have a probability for each class. We can compute the accuracy by finding the class with the highest probability for each example and comparing it with the label:
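With made-up probabilities for three examples and three classes, that comparison could be sketched as:

```python
import torch

probs = torch.tensor([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6],
                      [0.5, 0.4, 0.1]])
targets = torch.tensor([0, 2, 1])

_, pred = torch.max(probs, dim=1)   # index of the most probable class per row
correct = pred.eq(targets)          # True where the prediction matches the label
accuracy = correct.float().mean().item()
print(pred)      # tensor([0, 2, 0])
print(accuracy)  # two of the three predictions are correct
```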

When diagnosing (https://www.coursera.org/lecture/machine-learning/model-selection-and-train-validation-test-sets-QGKbr) a network used for object recognition, it is helpful to look at both the overall performance on the test set and individual predictions.

Model results

Here are two predictions of the model:

These are pretty easy, so I am glad the model has no trouble with them!

We do not want to focus only on the correct predictions, and we will look at some wrong outputs shortly. For now, let's evaluate performance on the entire test set. To do this, we iterate over the test DataLoader and calculate the loss and accuracy for every example.

Convolutional neural networks for object recognition are generally measured in terms of top-k accuracy (https://stats.stackexchange.com/questions/95391/what-is-the-definition-of-top-n-accuracy). This refers to whether the true class is among the k most likely predicted classes. For example, top-5 accuracy is the percentage of examples for which the correct class is among the 5 highest-probability predictions. You can get the k most likely probabilities and classes from a PyTorch tensor as follows:
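A small sketch with made-up probabilities, using the tensor method `topk` with k = 2:

```python
import torch

probs = torch.tensor([[0.10, 0.60, 0.25, 0.05],
                      [0.40, 0.05, 0.20, 0.35]])
targets = torch.tensor([2, 1])

topk_probs, topk_classes = probs.topk(2, dim=1)
print(topk_classes)  # rows: [1, 2] and [0, 3]

# A prediction counts as correct if the true class is anywhere in the top k
hits = (topk_classes == targets.unsqueeze(1)).any(dim=1)
print(hits.float().mean().item())  # 0.5
```

Here the first example counts as a top-2 hit (class 2 is the second most likely), while the second does not.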

To evaluate the model on the entire test set, we calculate the metrics:
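One way to compute these metrics over a whole DataLoader is sketched below. The function name `evaluate` and its return format are my own choices, and the demonstration data is constructed so that every prediction is trivially correct.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, topk=(1, 5)):
    """Return the average loss and the top-k accuracies over a DataLoader."""
    model.eval()
    total_loss, n = 0.0, 0
    hits = {k: 0 for k in topk}
    with torch.no_grad():
        for data, targets in loader:
            log_probs = model(data)
            total_loss += criterion(log_probs, targets).item() * data.size(0)
            _, ranked = log_probs.topk(max(topk), dim=1)  # classes by descending score
            matches = ranked.eq(targets.unsqueeze(1))     # (batch, max k) booleans
            for k in topk:
                hits[k] += matches[:, :k].any(dim=1).sum().item()
            n += data.size(0)
    return total_loss / n, {k: hits[k] / n for k in topk}


# Demonstration: scaled one-hot "logits" through a log softmax are always right
data, targets = torch.eye(10) * 5, torch.arange(10)
loader = DataLoader(TensorDataset(data, targets), batch_size=4)
loss, acc = evaluate(nn.LogSoftmax(dim=1), loader, nn.NLLLoss())
```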

These compare favorably with the nearly 90% top-1 accuracy on the validation data. Overall, we conclude that our pre-trained model was able to successfully transfer its knowledge from Imagenet to our smaller dataset.

Model investigation

Although the model performs well, there are probably still steps we can take to make it better. Often, the best way to figure out how to improve a model is to investigate its errors. (Note: this is also an effective way to improve yourself.)

Our model is not very good at identifying crocodiles, so let's take a look at some test predictions in this category:

Given the subtle distinction between the crocodile and crocodile head categories, and the difficulty of the second image, I would say that our model is not entirely unreasonable in these predictions. The ultimate goal of image recognition is to exceed human capabilities, and our model is nearly there!

Finally, we expect the model to perform better on categories with more images, so we can plot the accuracy in each category against the number of training images in that category:

There seems to be a positive correlation between the number of training images and the top-1 test accuracy. This suggests that more training data augmentation could help, or even that we should use test-time augmentation. We could also try a different pre-trained model or build another custom classifier. At the moment, deep learning is still an empirical field, which means experimentation is often required!

Conclusion

Although there are easier-to-use deep learning libraries, PyTorch has the advantages of speed, control over every aspect of model architecture and training, efficient back propagation through automatic differentiation of tensors, and code that is easy to debug because of the dynamic nature of PyTorch graphs. I'm not sure there are compelling arguments for using PyTorch instead of a library with a gentler learning curve (such as Keras) for production code or your own projects, but it's helpful to know how to use different options.

Through this project, we have seen the basics of using PyTorch and the concept of transfer learning, which is an effective method for object recognition. Instead of training a model from scratch, we can take an existing architecture that has been trained on a large dataset and tune it to our task. This reduces training time and usually leads to better overall performance. The outcome of this project is some knowledge of transfer learning and of PyTorch that we can build on to create more complex applications.

We do live in an incredible era of deep learning, and anyone can build a deep learning model with easily available resources! Now is the time to make better use of these resources by building your own projects.
