
What are the six techniques for efficient use of PyTorch?


This article covers six practical techniques for the efficient use of PyTorch. I hope you can take something useful away from it; without further ado, let's take a look.

Introduction

It is often not enough to report the Top-1 accuracy of the model.

Convert a train.py script into a powerful pipeline with some additional features

The ultimate goal of every deep learning project is to bring value to the product. Of course, we want the best model. What counts as "best" depends on the specific use case, and I will leave that discussion outside of this article. Here I want to talk about how to get the best model out of your train.py script.

We will introduce the following techniques:

Recommendation 1 - Use a high-level training framework from the PyTorch ecosystem

PyTorch provides great flexibility and freedom when writing a training loop from scratch. In theory, this offers unlimited possibilities for writing any training logic. In practice, you rarely write a training loop from scratch for training CycleGAN, distilling BERT, or 3D object detection.

Writing a complete training loop from scratch is a good way to learn the fundamentals of PyTorch. However, I strongly recommend switching to a high-level framework once you have that knowledge. There are many choices: Catalyst, PyTorch-Lightning, Fast.AI, Ignite, and others. High-level frameworks save you time in the following ways:

Provide a well-tested training loop
Support configuration files
Support multi-GPU and distributed training
Manage checkpoints and experiments
Automatically record training progress

It takes some time to get the most out of these high-level libraries. However, this one-off investment pays off in the long run.
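As a rough illustration only, here is a minimal sketch of what a training setup can look like with PyTorch-Lightning; the exact Trainer arguments and module hooks vary between versions, so treat the details as assumptions rather than a recipe:

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def training_step(self, batch, batch_idx):
        # The framework takes care of device placement, logging and checkpointing
        x, y = batch
        loss = F.cross_entropy(self.model(x), y)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# trainer = pl.Trainer(max_epochs=10)
# trainer.fit(LitClassifier(my_model), train_loader, valid_loader)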

Advantages

A smaller training pipeline means less code and less chance of error. Experiments are easy to manage. Distributed and mixed-precision training are simplified.

Disadvantages

In general, when using a high-level framework, we have to write code following framework-specific design principles and conventions. It also takes time to learn an additional framework.

Show me the metrics

Recommendation 2 - Look at other metrics during training

Almost every quick-start example project for classifying images on MNIST, CIFAR, or even ImageNet has one thing in common: they report a minimal set of metrics during and after training. Typically this includes Top-1 and Top-5 accuracy, error rate, and training/validation loss, and that's all. Although these metrics are essential, they are only the tip of the iceberg!

Modern image classification models have tens of millions of parameters. Do you really want to evaluate them with a single scalar value?

The CNN classification model with the best Top-1 accuracy may not be the one that generalizes best. Depending on your domain and requirements, you may want to save the model with the lowest false-positive/false-negative rate, or the model with the highest average precision.

Let me give you some advice on what data you can record during training:

Grad-CAM heat-maps - see which parts of the image contribute the most to a particular class.

Visualizing Grad-CAM heat-maps helps identify whether the model makes predictions based on real pathology or on image artifacts.

Confusion matrix - shows which pairs of classes are the most challenging for your model.

The confusion matrix reveals how often the model misclassifies particular classes.

Distribution of predictions - helps you find the optimal decision boundary.

The distribution of the model's negative and positive predictions shows whether there is a large amount of data the model cannot classify with confidence.

Minimum/average/maximum gradient values across all layers - let you identify vanishing or exploding gradients and poorly initialized layers in the model. A minimal sketch of collecting these gradient statistics follows below.
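As an illustration of this last point, here is a minimal sketch (not from the original article) of collecting per-layer gradient statistics; the grad_stats helper is a name made up for this example:

import torch

def grad_stats(model: torch.nn.Module):
    # Call after loss.backward() and before optimizer.step():
    # collects min/mean/max of the absolute gradients per parameter tensor,
    # which can then be logged to spot vanishing/exploding gradients.
    stats = {}
    for name, param in model.named_parameters():
        if param.grad is not None:
            g = param.grad.detach().abs()
            stats[name] = (g.min().item(), g.mean().item(), g.max().item())
    return stats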

Use dashboard tools to monitor training

Recommendation 3 - Use TensorBoard or another solution to monitor training progress

When training a model, the last thing you want to do is stare at console output. A powerful dashboard where you can see all the metrics at once is a much more effective way to check training results.

TensorBoard lets you quickly inspect and compare your training runs

TensorBoard is the gold standard for a small number of experiments and non-distributed environments. PyTorch has supported it natively since version 1.3 and provides a rich set of features to manage experiments. There are also more advanced cloud-based solutions, such as Weights & Biases, Alchemy (https://github.com/catalyst-team/alchemy), and TensorBoard.dev, which make it easier to monitor and compare training runs across multiple machines.

When using Tensorboard, I usually record a set of metrics like this:

Learning rate and other optimization parameters that may change (momentum, weight decay, etc.)
Time spent in data preprocessing and inside the model
Training and validation losses (per-batch and per-epoch averages)
Training and validation metrics
Final values of the training session's hyperparameters
Confusion matrix, Precision-Recall curve, AUC (if applicable)
Visualization of model predictions (if applicable)
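As a point of reference, here is a minimal sketch of logging a few of these values with the SummaryWriter that ships with PyTorch (torch.utils.tensorboard); train_one_epoch, validate, and the surrounding variables are placeholders for your own training code:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/experiment_1")  # log directory name is arbitrary

for epoch in range(num_epochs):
    # train_one_epoch / validate stand in for your own training and validation loops
    train_loss = train_one_epoch(model, train_loader, optimizer)
    val_loss, val_acc = validate(model, valid_loader)

    writer.add_scalar("loss/train", train_loss, epoch)
    writer.add_scalar("loss/valid", val_loss, epoch)
    writer.add_scalar("metrics/val_accuracy", val_acc, epoch)
    writer.add_scalar("optim/lr", optimizer.param_groups[0]["lr"], epoch)

writer.close()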

It is very important to look at the model's predictions visually. Sometimes the training data are noisy; sometimes the model overfits image artifacts. By visualizing the best and worst batches (based on loss or the metric you are interested in), you can gain valuable insight into the cases where your model performs well and where it performs badly.

Recommendation 4 - Visualize the best and worst batches in each epoch. It may give you valuable insight.

Catalyst user tip: here is an example of using visualization callbacks: https://github.com/BloodAxe/Catalyst-Inria-Segmentation-Example/blob/master/fit_predict.py#L258

For example, in the Global Wheat Detection challenge, we need to detect wheat heads in images. By visualizing the images of the best batch (based on the mAP metric), we can see that the model is nearly perfect at finding small objects.

The visualization of the best model predictions shows that the model performs well on small objects.

On the contrary, when we look at the first sample of the worst batch, we see that the model struggles to make accurate predictions for large objects. Visual analysis like this provides valuable insight for any data scientist.

The visualization of the worst model predictions reveals that the model performs poorly on large objects.

Looking at the worst batch can also help you find errors in data labels. In general, mislabeled samples have higher loss, so they end up in the worst batch. By visually checking the worst batch in each epoch, you can eliminate these errors:

An example of a labeling error. Green pixels represent true positives and red pixels represent false negatives. In this example, the ground-truth mask is marked where the object doesn't actually exist.
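As a rough illustration of this workflow, here is a minimal sketch (hypothetical, not from the article's code) of ranking validation samples by per-sample loss; it assumes a classification model and a non-shuffled DataLoader:

import torch
import torch.nn.functional as F

@torch.no_grad()
def worst_sample_indices(model, loader, k=8, device="cuda"):
    # Rank samples by per-sample cross-entropy loss; the highest-loss samples
    # are good candidates for visual inspection and label-error hunting.
    # Assumes the loader is NOT shuffled, so dataset order is preserved.
    model.eval()
    losses = []
    for x, y in loader:
        logits = model(x.to(device))
        losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    losses = torch.cat(losses)
    topk = torch.topk(losses, k=min(k, len(losses)))
    return topk.indices.tolist()  # positions in the (non-shuffled) dataset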

Use Dict as the return value for Dataset and Model

Recommendation 5 - If your model returns more than one value, use a Dict to return the results, not a tuple

In complex models, it is not uncommon to return multiple outputs. For example, object detection models usually return bounding boxes and their labels; in image segmentation CNNs we often return intermediate masks for deep supervision; and multi-task learning has also become very common recently.

In many open source implementations, I often see things like this:

# Bad practice, don't return a tuple
class RetinaNet(nn.Module):
    ...

    def forward(self, image):
        x = self.encoder(image)
        x = self.decoder(x)
        bboxes, scores = self.head(x)
        return bboxes, scores

    ...

In my opinion, this is a bad way to return results from the model. Here is the alternative I recommend:

class RetinaNet(nn.Module):
    RETINA_NET_OUTPUT_BBOXES = "bboxes"
    RETINA_NET_OUTPUT_SCORES = "scores"

    ...

    def forward(self, image):
        x = self.encoder(image)
        x = self.decoder(x)
        bboxes, scores = self.head(x)
        return {RETINA_NET_OUTPUT_BBOXES: bboxes,
                RETINA_NET_OUTPUT_SCORES: scores}

    ...

To some extent, this suggestion resonates with the Zen of Python: "Explicit is better than implicit." Following this rule will make your code clearer and easier to maintain.

So why do I think the second option is better? There are several reasons:

The return values have explicit names associated with them. You don't need to remember the exact order of the elements in a tuple. If you need a specific element of the returned dictionary, you can access it by name. And adding a new output to the model does not break existing code.

With a Dict, you can even change the behavior of the model to return additional outputs on demand. For example, here is a short snippet that demonstrates how to return several "primary" outputs and two "secondary" outputs for metric learning:

# https://github.com/BloodAxe/Kaggle-2020-Alaska2/blob/master/alaska2/models/timm.py#L104
def forward(self, **kwargs):
    x = kwargs[self.input_key]
    x = self.rgb_bn(x)
    x = self.encoder.forward_features(x)
    embedding = self.pool(x)
    result = {
        OUTPUT_PRED_MODIFICATION_FLAG: self.flag_classifier(self.drop(embedding)),
        OUTPUT_PRED_MODIFICATION_TYPE: self.type_classifier(self.drop(embedding)),
    }
    if self.need_embedding:
        result[OUTPUT_PRED_EMBEDDING] = embedding
    if self.arc_margin is not None:
        result[OUTPUT_PRED_EMBEDDING_ARC_MARGIN] = self.arc_margin(embedding)
    return result

The same advice applies to the Dataset class. For a CIFAR-10 toy example, it is fine to return the image and its corresponding label as a tuple. But when dealing with multi-task or multi-input models, you want to return samples of type Dict from your dataset:

# https://github.com/BloodAxe/Kaggle-2020-Alaska2/blob/master/alaska2/dataset.py#L373
class TrainingValidationDataset(Dataset):
    def __init__(
        self,
        images: Union[List, np.ndarray],
        targets: Optional[Union[List, np.ndarray]],
        quality: Union[List, np.ndarray],
        bits: Optional[Union[List, np.ndarray]],
        transform: Union[A.Compose, A.BasicTransform],
        features: List[str],
    ):
        """
        :param obliterate - Augmentation that destroys embedding.
        """
        if targets is not None:
            if len(images) != len(targets):
                raise ValueError(f"Size of images and targets does not match: {len(images)} {len(targets)}")

        self.images = images
        self.targets = targets
        self.transform = transform
        self.features = features
        self.quality = quality
        self.bits = bits

    def __len__(self):
        return len(self.images)

    def __repr__(self):
        return (
            f"TrainingValidationDataset(len={len(self)}, targets_hist={np.bincount(self.targets)}, "
            f"qf={np.bincount(self.quality)}, features={self.features})"
        )

    def __getitem__(self, index):
        image_fname = self.images[index]
        try:
            image = cv2.imread(image_fname)
            if image is None:
                raise FileNotFoundError(image_fname)
        except Exception as e:
            print("Cannot read image", image_fname, "at index", index)
            print(e)

        qf = self.quality[index]
        data = {}
        data["image"] = image
        data.update(compute_features(image, image_fname, self.features))

        data = self.transform(**data)

        sample = {INPUT_IMAGE_ID_KEY: os.path.basename(self.images[index]), INPUT_IMAGE_QF_KEY: int(qf)}

        if self.bits is not None:
            # OK
            sample[INPUT_TRUE_PAYLOAD_BITS] = torch.tensor(self.bits[index], dtype=torch.float32)

        if self.targets is not None:
            target = int(self.targets[index])
            sample[INPUT_TRUE_MODIFICATION_TYPE] = target
            sample[INPUT_TRUE_MODIFICATION_FLAG] = torch.tensor([target > 0]).float()

        for key, value in data.items():
            if key in self.features:
                sample[key] = tensor_from_rgb_image(value)

        return sample

When you have dictionaries in your code, you can use named constants everywhere to refer to inputs and outputs. Following this rule makes your training pipeline very clear and easy to follow:

# https://github.com/BloodAxe/Kaggle-2020-Alaska2
callbacks += [
    CriterionCallback(
        input_key=INPUT_TRUE_MODIFICATION_FLAG,
        output_key=OUTPUT_PRED_MODIFICATION_FLAG,
        criterion_key="bce",
    ),
    CriterionCallback(
        input_key=INPUT_TRUE_MODIFICATION_TYPE,
        output_key=OUTPUT_PRED_MODIFICATION_TYPE,
        criterion_key="ce",
    ),
    CompetitionMetricCallback(
        input_key=INPUT_TRUE_MODIFICATION_FLAG,
        output_key=OUTPUT_PRED_MODIFICATION_FLAG,
        prefix="auc",
        output_activation=binary_logits_to_probas,
        class_names=class_names,
    ),
    OutputDistributionCallback(
        input_key=INPUT_TRUE_MODIFICATION_FLAG,
        output_key=OUTPUT_PRED_MODIFICATION_FLAG,
        output_activation=binary_logits_to_probas,
        prefix="distribution/binary",
    ),
    BestMetricCheckpointCallback(
        target_metric="auc",
        target_metric_minimize=False,
        save_n_best=3,
    ),
]

Detecting anomalies in training

Just as humans can read text containing many errors, deep learning models can also learn "something reasonable" when there are mistakes in the training pipeline. As a developer, you are responsible for hunting down these anomalies and reasoning about where they come from.

Recommendation 6-use torch.autograd.detect_anomaly () to find arithmetic anomalies during training

If you see NaN or Inf values in the loss or metrics during training, an alarm should go off in your head. It is an indicator of a problem in your pipeline. Typically, it may be caused by the following reasons:

Poor initialization of the model or specific layers (you can check which layers by observing the gradients)
Mathematically incorrect operations (torch.sqrt() of a negative number, torch.log() of a non-positive number, etc.)
Improper use of reductions like torch.mean() and torch.sum() (the mean of a zero-sized tensor gives NaN, and the sum of a large tensor can easily overflow)
Using x.sigmoid() in the loss (if you need probabilities in the loss function, a better way is x.sigmoid().clamp(eps, 1 - eps) to prevent vanishing gradients)
A too-low epsilon value in Adam-like optimizers
Not using dynamic loss scaling when training with fp16
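As a small, self-contained example of the sigmoid clamping trick mentioned in the list above (eps is just an arbitrary small constant here):

import torch

eps = 1e-6
x = torch.randn(4, requires_grad=True)
# Clamping keeps probabilities away from exactly 0 and 1, so log() stays finite
# and gradients do not vanish at the saturated ends of the sigmoid.
probs = x.sigmoid().clamp(eps, 1 - eps)
loss = -torch.log(probs).mean()
loss.backward()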

To find out exactly where the NaN/Inf first appears in your code, PyTorch provides an easy-to-use tool, torch.autograd.detect_anomaly():

import torch

def main():
    # Enable anomaly detection globally for the whole training run
    torch.autograd.set_detect_anomaly(True)
    ...
    # Rest of the training code

# OR

class MyNumericallyUnstableLoss(nn.Module):
    def forward(self, input, target):
        # Enable anomaly detection only around the suspicious computation
        with torch.autograd.set_detect_anomaly(True):
            loss = input * target
            return loss

Use it for debugging purposes only and disable it otherwise; anomaly detection adds computational overhead and slows training down by about 10-15%.

These are the six techniques for efficient use of PyTorch. Some of these points are things you are likely to see or use in your daily work, and I hope you have learned something new from this article.
