

What is R language label smoothing?


This article mainly introduces what label smoothing is. Many people have questions about label smoothing in everyday work, so the editor has gone through a variety of sources and put together a simple, easy-to-follow explanation. I hope it helps answer the question "what is label smoothing?" Now, please follow along and study it!

What is label smoothing?

Label smoothing is a modification of the loss function that has proved to be a very effective technique for training deep learning networks. It improves accuracy in image classification, translation, and even speech recognition. Our team used it to break a number of FastAI leaderboard records:

Label smoothing being called in our FastAI training code

The simple explanation is that it adjusts the neural network's training target from "1" to "1 minus a label smoothing adjustment", which means the network is trained to be slightly less confident in its answers. The adjustment value is usually 0.1 by default, so the target answer becomes 0.9 (1 - 0.1) instead of 1.

For example, suppose we want to classify images as dogs or cats. If we see a picture of a dog, we train the network (via cross-entropy loss) to move toward 1 for "dog" and 0 for "cat". If it is a cat, we train in the opposite direction: 1 for "cat" and 0 for "dog". In other words, the targets are binary, or "hard", answers.

However, neural networks have a bad habit of becoming "overconfident" in their predictions during training, which can hurt their ability to generalize and perform well on new, unseen data. In addition, large datasets often contain mislabeled examples, which means the network should be inherently somewhat skeptical of the "correct" answer, to limit how strongly it models extreme cases built around incorrect labels.

So what label smoothing does is train the network to move toward a "1 minus adjustment" target, with the remaining adjustment distributed across the other classes, making it less confident in its answer rather than pushing all the way to 1.

For our binary cat/dog example, label smoothing of 0.1 means the target answer becomes 0.90 (90% sure this is an image of a dog) and 0.10 (10% sure it is a cat), instead of the earlier push toward 1 or 0. Because the network is left a little uncertain, this acts as a form of regularization and improves its ability to predict on new data.
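As a concrete illustration (a minimal NumPy sketch, not code from the article), this is how hard integer labels could be turned into smoothed targets following the description above, with the smoothing mass split among the non-target classes:

```python
import numpy as np

def smooth_labels(targets, num_classes, eps=0.1):
    """Turn integer class labels into smoothed target distributions.

    The true class gets 1 - eps; the remaining eps is split evenly among
    the other classes (some variants spread eps over all classes instead).
    """
    smoothed = np.full((len(targets), num_classes), eps / (num_classes - 1))
    smoothed[np.arange(len(targets)), targets] = 1.0 - eps
    return smoothed

# Binary cat/dog example: class 0 = dog, class 1 = cat
print(smooth_labels(np.array([0, 1]), num_classes=2, eps=0.1))
# [[0.9 0.1]
#  [0.1 0.9]]
```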

As you can see, reading label smoothing in code often makes it easier to understand than the usual mathematical notation (from the FastAI GitHub). ε is the label smoothing adjustment factor:

FastAI implementation of label smoothing
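The original figure showing the FastAI code is not reproduced here. Below is a minimal PyTorch sketch of a label-smoothing cross-entropy loss in the same spirit; the class name and details are illustrative, and the actual FastAI implementation may differ:

```python
import torch
from torch import nn
import torch.nn.functional as F

class LabelSmoothingCrossEntropy(nn.Module):
    """Cross-entropy loss with label smoothing (eps is the ε adjustment factor)."""
    def __init__(self, eps: float = 0.1):
        super().__init__()
        self.eps = eps

    def forward(self, output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        log_preds = F.log_softmax(output, dim=-1)
        # Loss against a uniform distribution over all classes (the "smoothing" part)...
        smooth_loss = -log_preds.mean(dim=-1)
        # ...blended with the usual negative log-likelihood of the true class.
        nll = F.nll_loss(log_preds, target, reduction="none")
        return ((1 - self.eps) * nll + self.eps * smooth_loss).mean()
```

Note that this variant spreads ε over all classes (including the true one), so the effective target on the true class is 1 - ε + ε/K; the simpler description above spreads ε over only the incorrect classes.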

Effect of label smoothing on neural networks

Now let's get to the core of the article: visualizing the impact of label smoothing on how a neural network handles classification.

First, an AlexNet trained to classify "airplane", "car" and "bird":

Left: training without label smoothing. Right: training with label smoothing.

Performance on the validation set:

As you can see, label smoothing forces the classifications into tighter groupings, with the clusters spaced more equidistantly from one another.

The ResNet example with "beaver", "dolphin" and "otter" is even more illustrative:

ResNet trained to classify 3 image categories. Note the huge difference in cluster compactness.

ResNet validation-set results: label smoothing improves the final accuracy. Note that during training, label smoothing drives the activations into tight clusters, while on the validation set they spread out around the cluster centers, covering the full range of prediction confidence.

As the images show, label smoothing results in tighter clustering of the final activations and greater separation between categories.
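The article does not include the code behind these plots, but a rough sketch of this kind of visualization (projecting penultimate-layer activations onto the plane spanned by the classifier weight templates of three chosen classes) might look like the following; the inputs `acts`, `templates` and `labels` are placeholders you would extract from your own trained model:

```python
import numpy as np

def project_activations(acts, templates):
    """Project penultimate-layer activations onto the plane through the
    weight templates of three classes (Gram-Schmidt basis of that plane)."""
    t0, t1, t2 = templates                 # rows of the final layer's weight matrix
    b1 = t1 - t0
    b1 = b1 / np.linalg.norm(b1)
    b2 = t2 - t0
    b2 = b2 - b1 * (b2 @ b1)               # remove the component along b1
    b2 = b2 / np.linalg.norm(b2)
    centered = acts - t0
    return np.stack([centered @ b1, centered @ b2], axis=1)

# acts: (n_examples, d) penultimate activations for examples of the 3 classes
# templates: (3, d) classifier weight rows for those classes
# xy = project_activations(acts, templates)
# xy can then be scatter-plotted, coloured by class label, to show the clusters.
```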

This is the main reason label smoothing produces better regularized, more robust neural networks that tend to generalize better to future data. But beyond producing better-centered activations, there are additional benefits.

Implicit network calibration from label smoothing

In the paper, building on the visualizations above, Hinton and colleagues show that label smoothing automatically calibrates the network, reducing its calibration error without any manual temperature adjustment.

Previous work (Guo et al.) showed that neural networks are often overconfident and poorly calibrated relative to their true accuracy. To demonstrate this, Guo et al. developed a calibration metric called ECE (Expected Calibration Error). Using this measure, they could tune a trained network with a learned modifier called temperature scaling, bringing its confidence better in line with its true accuracy (lowering ECE) and improving final accuracy. (Temperature scaling is performed by dividing the final logits by a temperature scalar before passing them to the softmax function.)
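As a rough illustration (a NumPy sketch, not code from the article or from Guo et al.), temperature scaling and ECE can be computed along these lines; the bin count and the tuning of T on a held-out set are assumptions:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature scaling: divide the logits by T before the softmax.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: average |accuracy - confidence| over confidence bins,
    weighted by the fraction of samples falling in each bin."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

# probs = softmax(logits, T=1.5)   # logits from a trained model, T tuned on a validation set
# print(expected_calibration_error(probs, labels))
```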

The paper shows several examples, but the clearest is a ResNet trained on ImageNet with and without label smoothing, comparing both networks against a temperature-scaled network.

Compared with the uncalibrated network, label smoothing greatly improves the match between confidence and accuracy. The result is almost the same as manual adjustment with temperature scaling.

As you can see, the network trained with label smoothing has a better ECE (Expected Calibration Error) and, put simply, a confidence level that better matches its actual accuracy.

In short, the label-smoothed network is not "overconfident" and should therefore generalize and perform better on real-world data.

Knowledge distillation (when not to use label smoothing)

The last part of the paper discusses the finding that, although label smoothing produces improved networks for a wide variety of tasks, it should not be used if the final model will serve as a teacher for other "student" networks.

The authors note that although training with label smoothing improves the teacher's final accuracy, such a teacher transfers less knowledge to the student network (trained without label smoothing) than a teacher trained on "hard" targets does.

Label smoothing "erases" some of the detail that is retained during hard-target training. This generalization benefits the teacher network's own performance, but leaves it with less information to pass on to a student network.
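For context (a minimal PyTorch sketch, not code from the paper), knowledge distillation typically trains the student against the teacher's softened output distribution, so a teacher whose logits carry less inter-class detail has less to transfer:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend of soft-target KL (toward the teacher) and hard-target cross-entropy."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL between softened distributions; the T**2 factor keeps gradient scales comparable.
    soft_loss = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T ** 2)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```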

The reason a label-smoothed model makes a poor teacher can be seen, more or less, in the earlier visualizations. By forcing the final classifications into much tighter clusters, the network discards detail and focuses on the core differences between classes.

This "rounding off" helps the network handle unseen data better. However, the lost information ends up hurting its ability to teach new student models.

So a more accurate teacher is not necessarily better at distilling its knowledge into a student network.

At this point, our study of "what is label smoothing" is complete. I hope it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it! If you want to continue learning more related knowledge, please keep following the site; the editor will keep working hard to bring you more practical articles!
