
What are the generalization and convergence problems that Adam is criticized for?

2025-01-29 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

What are the generalization and convergence problems that Adam is criticized for? This article analyzes both problems and walks through the usual remedies, in the hope of giving readers who run into them a simple and workable approach.

Adam is the most commonly used optimizer. It converges quickly and its hyperparameters are easy to tune, but it is also frequently criticized for its generalization and convergence problems.

As a result, many experienced practitioners still use the traditional SGD+momentum optimizer in their code.

Let's go through the following topic informally to broaden our understanding:

The generalization and convergence problems that Adam is criticized for

1 What is Adam?

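Anyone with some machine learning background will be familiar with the Adam optimizer. Roughly speaking, it combines ideas from Momentum, Adagrad, and RMSProp: a running average of the gradients (momentum) plus a per-parameter step size scaled by a running average of the squared gradients.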
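To make that combination concrete, here is a minimal sketch of one Adam update for a single scalar parameter, written in plain Python. The function name `adam_step` and the toy objective f(x) = x^2 are assumptions made for illustration; the default hyperparameters are the commonly quoted ones, not something this article specifies.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    # Momentum part: exponential moving average of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    # RMSProp-style part: exponential moving average of the squared gradient.
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction, since m and v start at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage (hypothetical objective): minimise f(x) = x^2, whose gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.01)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude. This is one reason Adam needs little tuning.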

[If needed, a later post can walk through the algorithms behind the various optimizers in simple terms.]

Both Adam and SGDM (SGD with momentum) rely on momentum, which suggests that momentum is a sound design.

2 Two complaints about Adam

2.1 Generalization problem

When discussing model generalization, we hope that the minimum the model converges to lies in a relatively flat region rather than a steep one. The following picture shows why:

The convergence point on the left is relatively flat, while the one on the right is very steep and sharp. Although the training set and the test set are supposed to follow the same distribution, there will still be slight differences between them.

At a sharp convergence point the training loss may be small, but the test loss is likely to be large, because a small shift in the loss surface changes the loss a lot. This does not happen at a flat convergence point. This is the generalization problem, and it is sometimes regarded as a form of overfitting.
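The flat-versus-sharp argument can be checked with a small numeric sketch. Everything here is a hypothetical toy: two quadratic basins with different curvature stand in for the two convergence points, and the train/test mismatch is modelled as a small sideways shift of the loss curve.

```python
def flat_loss(w):
    # Low curvature: a wide, gentle basin around w = 0.
    return 0.5 * w ** 2

def sharp_loss(w):
    # High curvature: a narrow, steep basin around w = 0.
    return 50.0 * w ** 2

# Hypothetical small mismatch between the training and test loss surfaces.
shift = 0.1

# Both basins have training loss 0 at w = 0.
# Model the test loss as the training loss shifted sideways by `shift`.
flat_test = flat_loss(0.0 - shift)
sharp_test = sharp_loss(0.0 - shift)

print("flat basin test loss: ", flat_test)
print("sharp basin test loss:", sharp_test)
```

With the same shift, the test loss in the sharp basin is about 100 times the test loss in the flat basin, even though both have identical training loss.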

However, we cannot directly prove that Adam always ends up in a sharp minimum. Many papers only report, to varying degrees, that Adam tends to show a larger error at test time.

In the picture here, Adam converges fastest on the training set, yet its performance on the test set is not very good.

2.2 Convergence problem

Adam can fail to converge in some cases. The best-known criticism on this point is the paper "On the Convergence of Adam and Beyond", a best paper at ICLR 2018.

Most practitioners rarely run into this problem, however. The generalization problem, by contrast, is the one that matters in practice.

3 Improving Adam

3.1 Learning rate scheduling

One learning rate scheduling scheme that experienced practitioners often use is warm-up + decay.

[warm-up]: do not start with a high learning rate. Instead, increase it slowly from a low value up to the base learning rate, so the learning rate goes from small to large.

[decay]: as the number of optimization steps increases, gradually reduce the learning rate.
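A warm-up + decay schedule can be sketched as a function of the step count. The article does not say which warm-up or decay shape to use, so the linear warm-up and cosine decay below, along with the name `warmup_cosine_lr` and all default values, are assumptions chosen for illustration.

```python
import math

def warmup_cosine_lr(step, base_lr=1e-3, warmup_steps=1000, total_steps=10000):
    """Learning rate at a 0-based step: linear warm-up, then cosine decay."""
    if step < warmup_steps:
        # Warm-up: grow linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay: fraction of the post-warm-up phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay from base_lr down to 0 (an assumed choice of decay shape).
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Sample the schedule at a few steps to see the small -> large -> small pattern.
samples = [warmup_cosine_lr(s) for s in (0, 500, 999, 1000, 5500, 10000)]
```

In a training loop, the returned value would be assigned to the optimizer's learning rate before each update. Other decay shapes, such as step or inverse square root decay, fit the same pattern.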

The decay part is very common and widely used. Warm-up is less intuitive, though it already appears in the ResNet paper.

3.2 RAdam

RAdam proposes an effective strategy for the warm-up phase.
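The article does not spell out what that strategy is. As a hedged sketch based on my reading of the RAdam paper, the central piece is a rectification factor that scales the adaptive step and is skipped during the first few steps, which gives a warm-up-like effect without a hand-tuned schedule. The function name `radam_rectification` is hypothetical, and the threshold of 4 follows the paper as I understand it (some implementations use 5).

```python
import math

def radam_rectification(t, beta2=0.999):
    """Rectification factor r_t at 1-based step t, or None if the step is too early."""
    # Maximum length of the approximated simple moving average.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** t
    # Length of the approximated simple moving average at step t.
    rho_t = rho_inf - 2.0 * t * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        # Variance estimate not yet tractable: the optimizer would fall back
        # to a momentum-only update without the adaptive denominator.
        return None
    return math.sqrt(
        ((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )

# The factor starts small and approaches 1 as training proceeds.
factors = [radam_rectification(t) for t in (1, 10, 1000, 100000)]
```

The factor multiplies the usual Adam step, so early updates are damped automatically while the second-moment estimate is still unreliable.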

That covers the generalization and convergence problems that Adam is criticized for. I hope the content above is of some help to you.
