How do we diagnose a CNN from its training and validation loss curves? This article analyzes the question in detail and walks through the corresponding solutions, in the hope of helping readers facing the same problem find a simple, workable approach.
How to Debug
Let's start with something simple. Setting debugging aside for a moment, what can we do to improve training accuracy?
Andrew Ng (Wu Enda) suggests the following:
Collect more data
Make the network deeper
Adopt new methods within the neural network
Train longer (more iterations)
Change the batch size
Try regularization techniques (e.g., weight decay)
Trade off the bias and variance of the result
Use more GPUs
The methods above overlap with the neural-network training tricks summarized earlier; they are general and not tied to any particular network. When it comes to our own task, however, we need to debug the task at hand individually to find the problem.
So how do we debug? By analogy with programming: the hyperparameters of a neural network are like our code, and the network's output information is like the result of running that code.
Hyperparameters
Hyperparameters are the essential variables of neural network training. Common ones include the following (a configuration sketch follows the list):
Learning rate (and how to set it)
Batch size
Weight decay coefficient
Dropout rate
Choice of optimizer
Whether to use batch normalization
Network architecture (e.g., number of layers, convolution kernel sizes)
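As a concrete reference, here is a minimal PyTorch sketch that makes each of these hyperparameters explicit; the specific values are illustrative assumptions, not recommendations:

import torch
import torch.nn as nn

# Illustrative hyperparameter choices -- tune per task.
learning_rate = 1e-3
batch_size = 64
weight_decay = 1e-4   # L2 regularization strength
dropout_rate = 0.5

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),           # whether to use batch normalization
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(dropout_rate),
    nn.Linear(16 * 16 * 16, 10),  # assumes 32x32 RGB input, 10 classes
)

# Choice of optimizer; Adam and SGD with momentum are common starting points.
optimizer = torch.optim.Adam(model.parameters(),
                             lr=learning_rate,
                             weight_decay=weight_decay)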
So how do we tune these parameters? By observing the network's output information and adjusting accordingly, and the sharpest tool for observing that output is visualization.
Visualization
It is very important to observe how various quantities change during training. First and foremost is the loss curve.
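A minimal sketch of how one might record and plot the two curves per epoch; train_one_epoch and evaluate_loss below are hypothetical helpers standing in for your own training and validation loops:

import matplotlib.pyplot as plt

num_epochs = 30
train_losses, val_losses = [], []
for epoch in range(num_epochs):
    # Hypothetical helpers: each returns the mean loss over its loader.
    train_losses.append(train_one_epoch(model, train_loader))
    val_losses.append(evaluate_loss(model, val_loader))

plt.plot(train_losses, label="train loss")
plt.plot(val_losses, label="val loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()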
The figure above shows a fairly "ideal" loss curve: it drops sharply at the start of training, indicating that the learning rate is appropriate and gradient descent is proceeding. After a certain stage the curve flattens out, and the loss no longer changes as dramatically as at the beginning. The jitter in the curve comes from the batch size: the larger the batch size, the smaller the jitter, since each batch averages over a different set of individual samples.
The next figure is also a correct loss curve. Although the trend is not very pronounced, you can still see the curve slowly declining; this is in fact the fine-tuning stage. Following on from the previous figure, the loss here is already very small, and despite the jitter the overall trend is right.
So what does a problematic loss curve look like? Borrowing a slide from CS231n:
In the figure above, the curve in the upper left shows that essentially nothing is being learned (the loss may be stuck and unable to descend properly); the second is a typical overfitting pattern; the third is more serious overfitting; in the fourth the loss never stabilizes, probably because training was too short; the fifth converges only after a long stretch of iterations, a clear sign that the initial weights were too small; and in the last one, the longer we train the larger the loss grows, which most likely means the optimization is effectively performing gradient ascent rather than descent.
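If slow early convergence does point to too-small initial weights, one common remedy (a sketch, not the article's own code) is Kaiming initialization for ReLU networks:

import torch.nn as nn

def init_weights(m):
    # Kaiming (He) initialization is scaled for ReLU activations,
    # avoiding the vanishing signal that tiny random weights cause.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_weights)  # assumes `model` is an existing nn.Module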
The figure above shows more failure modes. First and second on the left: the data set was not shuffled, i.e., the data are read in the same order every epoch. First on the right: the curve suddenly disappears mid-training. Why? Because we hit a NaN value (not shown in the figure); this is usually due to the model setup, and it is a problem worth watching for. The last figure shows that too small a validation split leads to statistically noisy estimates; a validation ratio of around 0.2 works better.
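To catch that NaN the moment it appears rather than watching the curve vanish, a simple guard in the training loop helps (a sketch; torch.isnan is standard PyTorch):

import torch

def safe_backward(loss, step):
    # Fail fast instead of silently backpropagating NaN gradients.
    if torch.isnan(loss):
        raise RuntimeError(f"loss became NaN at step {step}; check the "
                           "learning rate, the loss function, and the data")
    loss.backward()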
The plot on the left shows five training runs (five curves) in which the network found it "very difficult" to converge. Why? The reason is simple: most likely a nonlinear activation (such as ReLU) was placed just before the softmax layer. Softmax expects both negative and positive inputs (a very negative input should yield a small softmax output, a positive input a larger one), but ReLU outputs only zero or positive values, so feeding its output into softmax throws away information and makes learning extremely difficult.
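In code the fix is simply to let the classifier head emit raw logits. A minimal sketch (in PyTorch, nn.CrossEntropyLoss applies log-softmax internally, so no activation belongs before it):

import torch.nn as nn

# Problematic: ReLU clamps every negative logit to zero before softmax,
# discarding exactly the sign information softmax relies on.
# head = nn.Sequential(nn.Linear(128, 10), nn.ReLU())

# Correct: output raw, signed logits from the final linear layer.
head = nn.Sequential(nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()  # log-softmax + NLL applied internally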
To sum up, if your network design has no obvious mistake but the loss curve still looks strange, it is very likely that:
The loss function is being used incorrectly
The training data may be loaded incorrectly
The optimizer may be misconfigured
Some other hyperparameter may be set badly
In a word, the loss curve is a sharp tool for spotting problems in a neural network. Watching how it evolves during training is essential, and the sooner the better!
Regularization
Besides the loss curve, the accuracy curve also deserves close attention. It not only tells us whether the network is heading in the right direction, but also lets us relate loss to accuracy, because the evaluation metric and the loss function for a task are often different. A typical example:
The IoU evaluation metric versus the Dice loss function in image segmentation.
"Dice" is an a metric for model evaluation equal to intersection (A, B) / (A, B). Similar to IoU (IoU = intersection (Amagin B) / union (Amam B)), it is used to assess a quality of image segmentation models. "Accuracy" is not really good for this task. For example, in this competition, you can quite easily get 99.9% accuracy of predicted pixels, but the performance of the models may be not as great as you think. Meanwhile, such metrics as dice or IoU can describe your models reasonably well, therefore they are most often used to asses publicly available models. The implementation of the metric used for score evaluation in this competition takes some time and requires additional post-processing, such as mask splitting. Therefore, it is not so common for quick model evaluation. Also, sometimes "soft dice" (dice with multiplication instead of intersection) is used as a loss function during training of image segmentation models.
Of course, two important hyperparameters, dropout and weight decay, have effects that are not obvious from the loss curve alone. Only by plotting the specific evaluation metric, fixing a baseline, and then comparing runs can we judge whether dropout or weight decay is needed.
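One way to run that comparison is a small ablation; build_model, train, and validate below are hypothetical helpers standing in for your own pipeline:

import torch

results = {}
for dropout, weight_decay in [(0.0, 0.0), (0.5, 1e-4)]:
    model = build_model(dropout=dropout)          # hypothetical helper
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 weight_decay=weight_decay)
    train(model, optimizer)                       # hypothetical training loop
    results[(dropout, weight_decay)] = validate(model)  # returns the metric

print(results)  # keep the setting whose evaluation curve comes out best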
Normalization and batch normalization
Normalization is by now arguably a standard part of training neural networks, whether by standardizing the input data or by adding batch normalization layers to the network.
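A sketch of both forms; the mean/std values below are the widely used ImageNet statistics, an assumption here rather than anything task-specific:

import torch.nn as nn
from torchvision import transforms

# Standardize the input data itself...
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

# ...and/or normalize activations inside the network.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),   # batch normalization layer
    nn.ReLU(),
)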
But normalization techniques are usually used only for classification (and applications derived from it); they are not suitable for tasks that are sensitive to the scale of the input image, such as style transfer and other generative tasks. Don't ask why; the results will give you the answer.
That is the answer to how to diagnose a CNN from its training and validation loss curves. I hope the content above is of some help to you.