This article shares how to use optimizers. The editor finds it quite practical and shares it here in the hope that you will gain something from reading it.
First, an overview of optimizers
In the field of machine learning there is a group of alchemists whose daily routine goes like this:
Gather the herbs (data), set up the Eight Trigrams furnace (model), light the Liuwei true fire (optimization algorithm), then fan the flames and wait for the elixir to come out.
However, as any cook knows, the same ingredients and the same recipe can taste very different under different heat. Too low a flame leaves the dish undercooked, too high a flame burns it, and uneven heat leaves it half raw and half scorched.
The same is true of machine learning: the choice of optimization algorithm directly affects the performance of the final model. When the results are poor, the culprit may not be the features or the model design but the optimization algorithm.
Deep learning optimization algorithms have roughly followed the development path SGD -> SGDM -> NAG -> Adagrad -> Adadelta (RMSprop) -> Adam -> Nadam.
For the average novice alchemist, simply using Adam with its default parameters is fine.
Some alchemists who like to write papers, chasing better evaluation metrics, may prefer to use the Adam optimizer for rapid descent in the early stage and then fine-tune the optimizer parameters in the later stage to obtain better results.
In addition, there are some cutting-edge optimization algorithms that are said to outperform Adam, such as LazyAdam, Look-ahead, RAdam and Ranger.
Second, how to use an optimizer
The optimizer is mainly used through the apply_gradients method, which takes variables and their corresponding gradients and applies an update step to those variables, or through the minimize method, which directly performs iterative optimization of an objective function.
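A minimal sketch of both call styles, assuming TensorFlow 2.x; the toy variable and loss are chosen purely for illustration:

```python
import tensorflow as tf

# Toy setup: minimize (x - 1)^2 with plain SGD.
x = tf.Variable(0.0, name="x")
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# 1) apply_gradients: compute the gradients yourself, then hand
#    (gradient, variable) pairs to the optimizer.
with tf.GradientTape() as tape:
    loss = tf.square(x - 1.0)
grads = tape.gradient(loss, [x])
optimizer.apply_gradients(zip(grads, [x]))

# 2) minimize: pass a zero-argument loss callable and the variable list;
#    the optimizer computes the gradients and applies the update itself.
optimizer.minimize(lambda: tf.square(x - 1.0), var_list=[x])
```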
Of course, the more common usage is to pass the optimizer into a keras Model at compile time and then iteratively optimize the loss by calling model.fit.
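A hedged sketch of that workflow, assuming tf.keras; the toy dataset and model architecture below are made up purely for illustration:

```python
import numpy as np
import tensorflow as tf

# Illustrative toy regression data (not from the article).
X = np.random.rand(1000, 8).astype("float32")
y = X @ np.ones((8, 1), dtype="float32") + 0.1

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])

# Pass the optimizer (an instance, or a string such as "adam") at compile
# time; model.fit then drives the iterative optimization of the loss.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mse")
model.fit(X, y, batch_size=32, epochs=3, verbose=0)
```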
When an optimizer is initialized, it creates a variable optimizer.iterations to record the number of iteration steps. So, like tf.Variable, an optimizer generally needs to be created outside of @tf.function.
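For example, a minimal sketch (assuming TensorFlow 2.x) of creating the optimizer outside a @tf.function and reading optimizer.iterations inside the training step:

```python
import tensorflow as tf

# Create the variable and the optimizer OUTSIDE the @tf.function,
# just as the text above recommends for tf.Variable.
x = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

@tf.function
def train_step():
    with tf.GradientTape() as tape:
        loss = tf.square(x - 1.0)
    grads = tape.gradient(loss, [x])
    optimizer.apply_gradients(zip(grads, [x]))
    # optimizer.iterations was created at initialization and counts update steps.
    return optimizer.iterations

for _ in range(3):
    step = train_step()
print(int(step))  # -> 3
```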
Third, built-in optimizers
As noted above, deep learning optimization algorithms have roughly followed the development path SGD -> SGDM -> NAG -> Adagrad -> Adadelta (RMSprop) -> Adam -> Nadam.
In the keras.optimizers submodule, they basically all have corresponding class implementations; a sketch of the constructors follows the list below.
SGD: with its default parameters it is plain SGD; setting the momentum parameter to a non-zero value turns it into SGDM, which takes first-order momentum into account; additionally setting nesterov=True turns it into NAG (Nesterov Accelerated Gradient), which computes the gradient at the position one step ahead.
Adagrad: takes second-order momentum into account and uses a different learning rate for each parameter, i.e. an adaptive learning rate. Its drawback is that the learning rate decreases monotonically, so in later stages learning may become too slow or even stop prematurely.
RMSprop: takes second-order momentum into account with per-parameter adaptive learning rates. It improves on Adagrad by using exponential smoothing, so that only the second-order momentum within a recent window is considered.
Adadelta: takes second-order momentum into account; similar to RMSprop, but more complex and somewhat more adaptive.
Adam: takes both first-order and second-order momentum into account; it can be seen as adding Momentum on top of RMSprop.
Nadam: further incorporates Nesterov acceleration on top of Adam.
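A minimal sketch of how the variants described above map onto keras.optimizers constructors; the hyperparameter values are illustrative choices, not recommendations:

```python
import tensorflow as tf

sgd      = tf.keras.optimizers.SGD(learning_rate=0.01)                # pure SGD
sgdm     = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # SGDM
nag      = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9,
                                   nesterov=True)                     # NAG
adagrad  = tf.keras.optimizers.Adagrad(learning_rate=0.01)
rmsprop  = tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)
adadelta = tf.keras.optimizers.Adadelta(learning_rate=1.0)
adam     = tf.keras.optimizers.Adam(learning_rate=0.001)
nadam    = tf.keras.optimizers.Nadam(learning_rate=0.001)
```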
The above is how to use optimizers. The editor believes these include points you may see or use in daily work, and hopes this article helps you learn more.