2025-04-04 Update From: SLTechnology News & Howtos
Shulou (Shulou.com) 05/31 Report--
This article presents an example-based analysis of activation functions. The content is easy to understand and clearly organized; I hope it helps resolve your doubts. Below, let the editor lead you through the study of activation functions.
I. The Past and Present of Activation Functions
As early as 1958, the American psychologist Frank Rosenblatt proposed the first neuron model that could learn its weights automatically, called the perceptron. Its model is as follows:
As the figure shows, it is a simple one-layer network whose activation function is the step function (this was also the earliest activation function to be used).
So the perceptron model can be written as:

y = H(w · x + b)

where the activation function H is the step function:

H(z) = 1 if z ≥ 0, else H(z) = 0

That is, when w · x + b ≥ 0 the output is 1, representing class 1; otherwise the output is 0. Then, through the step-by-step iterations of the perceptron convergence algorithm, the parameters w and b are optimized, realizing the most primitive binary classification model.
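The perceptron convergence rule described above can be sketched in a few lines of Python. This is a minimal illustration under assumed settings: the AND-style toy dataset, learning rate, and epoch count are my own choices, not taken from the article.

```python
# A minimal sketch of the perceptron convergence algorithm on a toy,
# linearly separable (AND-like) dataset.

def step(z):
    """The original step activation: outputs 1 when z >= 0, else 0."""
    return 1 if z >= 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Nudge w and b toward each misclassified sample, step by step."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = y - step(z)              # 0 if correct, +1/-1 if wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# AND-like labels: linearly separable, so the algorithm converges.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
preds = [step(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]
```

Note that the update uses only the classification error, not a gradient; this is exactly why the non-differentiable step function was not an obstacle for the perceptron itself.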
Some students may ask: why not use the gradient descent algorithm?
Indeed, readers familiar with the history of artificial intelligence will know that the backpropagation algorithm had not yet been proposed at that time, so there was certainly no gradient descent either. But why didn't the clever Rosenblatt think of gradient descent? In fact, there was a reason of force majeure, and that reason is closely tied to the activation function!
At that time, the most commonly used activation functions were the step function and the sign function. Let's take an intuitive look at their graphs:
On the left is the step function, and on the right is the sign function. The two share one feature: they are discontinuous at x = 0, and their derivative everywhere else is 0, which makes it impossible to optimize the parameters with the gradient descent algorithm.
Note: if the gradient is zero, gradient descent naturally has no effect; this is also the vanishing-gradient phenomenon (gradient dispersion) that we will discuss later.
The non-differentiability of the perceptron model severely limited its potential, so that it could only solve extremely simple tasks. Therefore, modern deep learning builds on the perceptron by replacing the discontinuous step activation with other smooth, continuous activation functions, making the model differentiable.
In fact, the core structure of modern large-scale deep networks is not very different from the perceptron; their expressive power comes from stacking many network layers.
So, here are some of the most commonly used smooth, continuous activation functions.
II. Activation Functions You Must Know
1. Sigmoid
The Sigmoid function, also known as the Logistic function, is defined as:

σ(x) = 1 / (1 + e^(−x))
The Sigmoid function is continuously differentiable, as shown in the figure below, so unlike the step function, the gradient descent algorithm can be applied to it directly.
It is therefore widely used for training network parameters.
As an activation function, it maps the input into the interval (0, 1), so its output can be interpreted as a probability; it is often used to represent the class probability in classification problems.
Differentiating σ(x):

σ'(x) = σ(x) · (1 − σ(x))

Derivative properties: the maximum derivative is 0.25, attained at x = 0; as the input tends to positive or negative infinity, the gradient tends to 0 and vanishing gradients occur.
Advantages: smooth and easy to differentiate.
Disadvantages: requires exponential computation, which is relatively expensive; prone to vanishing gradients.
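A short Python sketch of the sigmoid and its derivative, illustrating the 0.25 maximum at x = 0 and the vanishing gradient for large inputs (the function names here are my own):

```python
import math

def sigmoid(x):
    """Sigmoid: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative s(x) * (1 - s(x)); it peaks at 0.25 when x = 0
    and tends to 0 as |x| grows -- the vanishing-gradient regime."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

For example, `sigmoid_grad(10)` is already below 1e-3, which is why deep stacks of sigmoid layers are hard to train.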
2. Tanh
The Tanh function is defined as:

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = 2σ(2x) − 1

As you can see, the tanh activation function can be obtained by scaling and translating the Sigmoid function. The function curve is shown below:
The Tanh function maps the input x into the interval [−1, 1]. It is an improved version of the Sigmoid function: its output takes both positive and negative values, it is symmetric about 0 (zero-centered), it converges faster, and it is less prone to oscillation of the loss value. However, it still does not solve the vanishing-gradient problem, and its computational cost remains large.
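A small Python check (with illustrative names of my own) that tanh really is a scaled and shifted sigmoid, tanh(x) = 2σ(2x) − 1:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh obtained by scaling and shifting the sigmoid:
    tanh(x) = 2 * sigmoid(2x) - 1, a zero-centered map into (-1, 1)."""
    return 2.0 * sigmoid(2.0 * x) - 1.0
```

Comparing against `math.tanh` confirms the identity numerically for any input.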
3. ReLU
Before the ReLU (Rectified Linear Unit) activation function was proposed, the Sigmoid function was usually the default choice of activation function for neural networks. However, when the input to the Sigmoid function is very large or very small, its gradient is close to 0, a phenomenon known as the vanishing gradient; the network parameters then fail to update for long stretches, making deeper models difficult to train. The 8-layer AlexNet proposed in 2012 used the ReLU activation function, which made a network of that depth trainable. The ReLU function is defined as:

ReLU(x) = max(0, x)
ReLU suppresses all values less than 0 to 0, while positive values are output directly; this unilateral inhibition is inspired by biology. The function curve is as follows:
Advantages: (1) training converges quickly and vanishing gradients are mitigated, because the gradient for inputs greater than 0 is always 1 during backpropagation; (2) sparsity: it mimics the low firing rate of biological neurons, since ReLU propagates information only when its input is greater than 0, which improves network performance.
Disadvantages: when the input is less than 0, even a large incoming gradient is cut off abruptly (the "dying ReLU" problem).
The design of the ReLU function comes from neuroscience. It is very cheap to compute and has excellent gradient properties; it has proven highly effective across a large number of deep learning applications and is one of the most widely used activation functions.
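The two gradient regimes described above can be made concrete in a few lines of Python (helper names are my own):

```python
def relu(x):
    """ReLU: pass positive inputs through, clamp negatives to 0."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient is exactly 1 for positive inputs -- so it neither
    shrinks nor explodes through the layers -- and 0 for negative
    inputs, which is the source of the dying-ReLU problem."""
    return 1.0 if x > 0 else 0.0
```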
4. Leaky ReLU
The gradient of the ReLU function is always 0 when the input x < 0, which may also cause vanishing gradients. To overcome this problem, Leaky ReLU was proposed, defined as follows:

LeakyReLU(x) = x if x ≥ 0, else p · x

where p is a small hyperparameter set by the user, such as 0.02. When p = 0, the Leaky ReLU function degenerates into the ReLU function; when p ≠ 0, inputs with x < 0 still receive a small gradient, thus avoiding the vanishing-gradient problem.
The function curve is as follows:
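A minimal Python sketch of Leaky ReLU, including the p = 0 special case that recovers plain ReLU (the default p = 0.02 follows the article's example value):

```python
def leaky_relu(x, p=0.02):
    """Leaky ReLU: a small slope p on the negative side keeps a
    nonzero gradient there; p = 0 degenerates into plain ReLU."""
    return x if x >= 0 else p * x
```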
5. Softmax
The Softmax function is defined as:

softmax(z)_i = e^(z_i) / Σ_j e^(z_j)

The Softmax function not only maps every output value into the interval [0, 1], but also guarantees that all the output values sum to 1.
In the example shown in the figure below, the output-layer logits are [2.0, 1.0, 0.1]; after applying the Softmax function, the output becomes [0.7, 0.2, 0.1]. Each value represents the probability that the current sample belongs to the corresponding class, and the probabilities sum to 1. Through the Softmax function, the raw outputs of the output layer can be converted into class probabilities, which is why it is used so frequently in multi-class classification problems.
In addition, when the Softmax function is used for multi-class classification with cross-entropy as the loss function, computing the gradient becomes very convenient (the gradient of the loss with respect to the logits is simply the predicted probabilities minus the one-hot labels), which greatly reduces the computational complexity of each iteration during network training.
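The worked example above can be reproduced with a short Python sketch. The max-subtraction step is the standard numerical-stability trick and is my addition; it does not change the result:

```python
import math

def softmax(logits):
    """Softmax: exponentiate and normalize so the outputs lie in
    (0, 1) and sum to 1. Subtracting the max logit first avoids
    overflow without changing the resulting probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The article's example logits: rounds to [0.7, 0.2, 0.1].
probs = softmax([2.0, 1.0, 0.1])
```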
The above is the full content of "Example Analysis of Activation Functions". Thank you for reading! I believe you now have some understanding of the topic, and I hope the shared content helps you. If you want to learn more, welcome to follow the industry information channel!