This article introduces how to use Python to understand artificial intelligence optimization algorithms. The content is quite detailed; interested readers are welcome to follow along, and I hope you find it helpful.
Overview
Gradient descent is one of the most popular optimization algorithms for neural networks. In general, we want to find the weights and biases that minimize the error function. The gradient descent algorithm updates the parameters iteratively so as to minimize the error of the whole network.
Gradient descent is an iterative method that can be used to solve least squares problems (both linear and nonlinear). It is one of the most commonly used methods for solving for the model parameters of machine learning algorithms, that is, for unconstrained optimization problems; another commonly used method is the least squares method. When minimizing a loss function, gradient descent can be used to iterate step by step toward the minimum of the loss function and the corresponding model parameters. Conversely, if we need to maximize the loss function, we iterate with gradient ascent instead. In machine learning, two variants have been developed on top of the basic gradient descent method: stochastic gradient descent and batch gradient descent.
The algorithm iteratively updates the weight parameters along the gradient of the loss function until the minimum is reached. In other words, we go downhill along the slope of the loss function until we reach the valley. The basic idea is roughly shown in Figure 3.8: if the partial derivative is negative, the weight increases (the left part of the figure), and if the partial derivative is positive, the weight decreases (the right part of the figure). The learning rate parameter determines the size of each step toward the minimum, and therefore how many iterations are needed.
Figure 3.8 Basic idea of stochastic gradient minimization
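As a minimal illustration of this update rule (a sketch of my own, not code from the original text; the function and variable names are hypothetical), a single gradient descent step can be written as:

# One gradient descent step: move the weight against the sign of the partial derivative.
# If dL_dw is negative the weight increases; if it is positive the weight decreases,
# matching the behaviour described around Figure 3.8.
def gradient_step(w, dL_dw, learning_rate):
    return w - learning_rate * dL_dw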
Error surface
Finding the global optimum while avoiding local minima is a challenge, because the error surface has many peaks and valleys, as shown in Figure 3.9. The error surface may be highly curved in some directions but flat in others, which makes the optimization process very complicated. To prevent the network from getting stuck in a local minimum, a momentum parameter is usually specified.
Figure 3.9 Complex error surface of a typical optimization problem
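As a rough sketch of how such a momentum term is commonly implemented (this assumes a typical velocity-based formulation and is not code from the original text; all names are hypothetical):

# Momentum-based update (illustrative only).
def momentum_step(w, grad_w, velocity, learning_rate=0.01, momentum=0.9):
    velocity = momentum * velocity - learning_rate * grad_w  # accumulate past gradient directions
    w = w + velocity  # the accumulated velocity helps carry the weight past shallow local minima
    return w, velocity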
I found early on that backpropagation with gradient descent often converges very slowly, or not at all. When I wrote my first neural network, I used the backpropagation algorithm on a very small data set, and it still took more than three days for the network to converge to a solution. Fortunately, I took some measures to speed up the process.
Note that although learning with backpropagation is relatively slow, the network, as a feedforward algorithm, is quite fast in the prediction or classification stage.
Stochastic gradient descent
The traditional gradient descent algorithm uses the entire data set to compute the gradient at each iteration. For large data sets this causes redundant computation, because the gradients of very similar samples are recomputed before each parameter update. Stochastic gradient descent (SGD) is an approximation of the true gradient: at each iteration it randomly selects one sample and updates the parameters by moving along the gradient for that sample. It therefore follows a zigzag path toward the minimum. Because of this reduced redundancy, it tends to converge to a solution faster than traditional gradient descent.
Note that a very nice theoretical property of stochastic gradient descent is that if the loss function is convex, it can find the global minimum.
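To make the contrast with batch gradient descent concrete, here is a minimal sketch of my own (using a made-up one-parameter least-squares model; none of this appears in the original text):

import random

# Hypothetical data: fit y = w * x by minimizing the squared error.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]

def batch_step(w, lr=0.01):
    # Batch gradient descent: average the gradient over the whole data set.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def sgd_step(w, lr=0.01):
    # Stochastic gradient descent: use one randomly chosen sample per update.
    x, y = random.choice(data)
    grad = 2 * (w * x - y) * x
    return w - lr * grad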
Code practice
Enough theory; let's get down to some real code.
One-dimensional problem
Suppose the objective function we need to minimize is:
f(x) = x² + 1
It is obvious at a glance that its minimum is attained at x = 0, but here we will find it with Python code implementing the gradient descent method.
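A quick note on what to expect from the script below: with grad(x) = 2x and learning_rate = 0.2, each update computes x ← x − 0.2 · 2x = 0.6x, so starting from x = 10 the value shrinks geometrically toward 0, and the loop stops after roughly 33 iterations, once |2x| falls below the 1e-6 precision threshold.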
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Example of the gradient descent method for a one-dimensional problem."""


def func_1d(x):
    """Objective function.

    :param x: independent variable, a scalar
    :return: dependent variable, a scalar
    """
    return x ** 2 + 1


def grad_1d(x):
    """Gradient of the objective function.

    :param x: independent variable, a scalar
    :return: dependent variable, a scalar
    """
    return x * 2


def gradient_descent_1d(grad, cur_x=0.1, learning_rate=0.01, precision=0.0001, max_iters=10000):
    """Gradient descent for a one-dimensional problem.

    :param grad: gradient of the objective function
    :param cur_x: current value of x; an initial value can be supplied via this parameter
    :param learning_rate: learning rate, i.e. the step size
    :param precision: convergence precision
    :param max_iters: maximum number of iterations
    :return: local minimum x*
    """
    for i in range(max_iters):
        grad_cur = grad(cur_x)
        if abs(grad_cur) < precision:
            break  # when the gradient approaches 0, the iteration is regarded as converged
        cur_x = cur_x - grad_cur * learning_rate
        print("Iteration", i, ": x =", cur_x)

    print("Local minimum x =", cur_x)
    return cur_x


if __name__ == '__main__':
    gradient_descent_1d(grad_1d, cur_x=10, learning_rate=0.2, precision=0.000001, max_iters=10000)

That's all on how to use Python to understand artificial intelligence optimization algorithms. I hope the above content is helpful to you.