

What does neural network generalization mean?

2025-02-27 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains what neural network generalization means. Many people have questions about this topic in their day-to-day work, so the editor has consulted a range of materials and put together a simple, practical explanation. I hope it helps resolve your doubts about "what does neural network generalization mean?" Please follow along with the editor to find out!

Whenever we train our own neural network, we need to pay attention to a problem called neural network generalization. In essence, this is a measure of how well our model learns from the given data and applies what it has learned to new data.

When training a neural network, part of the data is used for training and part is held back to check the network's performance. If the neural network performs well on the data it was not trained on, we can say it has generalized well from the given data. Let's understand this through an example.

Suppose we are training a neural network that should tell us whether there is a dog in a given image, and suppose we have pictures of many dogs, each belonging to one of 12 breeds. I will use all the images of 10 of the breeds for training and hold back the images of the remaining 2 breeds for now.

Now, before we get into deep learning, let's look at this from a human perspective. Consider a person who has never seen a dog in their life (just as an example). We show this person the 10 breeds and tell them that these are dogs. If we then show them the other two breeds, can they tell that these are also dogs? Hopefully they can: ten breeds should be enough to understand and identify the distinguishing characteristics of a dog. This concept of learning from some data and correctly applying the acquired knowledge to other data is called generalization.

Back to deep learning, our goal is to make the neural network learn as effectively as possible from the given data. If we succeed in making the neural network understand that the other two breeds are still dogs, then we have trained a well-generalized neural network that will perform well in the real world.

This is easier said than done, and training neural networks that generalize well is one of the most frustrating tasks for deep learning practitioners. The culprit is a phenomenon called overfitting. If the neural network trains on 10 breeds of dogs and then refuses to classify the other two breeds as dogs, it has overfit the training data. This means the neural network has memorized the 10 breeds and treats only them as dogs; as a result, it fails to form a general understanding of what a dog looks like. How to address this problem while training a neural network is what we want to discuss in this article.

In practice, we don't usually have the freedom to split the data on a basis like breed. Instead, we simply split all the data: a portion, usually a large one (about 80-90%), is used to train the model, and the rest is used to test it. Our goal is to ensure that performance on the test data is roughly the same as on the training data, and we measure this performance with metrics such as loss and accuracy. A minimal sketch of such a split is shown below.
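As a minimal sketch (assuming scikit-learn is available, with dummy arrays standing in for the dog images and labels), an 80/20 split might look like this:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Dummy stand-ins for the dog dataset: 1,000 flattened 64x64 images and labels.
    images = np.random.rand(1000, 64 * 64)
    labels = np.random.randint(0, 2, size=1000)

    # Hold back 20% of the data to check how well the model generalizes.
    X_train, X_test, y_train, y_test = train_test_split(
        images, labels, test_size=0.2, random_state=42)

    print(X_train.shape, X_test.shape)  # (800, 4096) (200, 4096)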

We can control some aspects of the neural network to prevent overfitting. Let's introduce them one by one. The first is the number of parameters.

Number of parameters

In a neural network, the number of parameters essentially refers to the number of weights, which is proportional to the number of layers and the number of neurons per layer. The relationship between the number of parameters and overfitting is straightforward: the more parameters, the more easily the network overfits.

We need to think about this in terms of complexity. A very complex dataset requires a very complex function to understand and represent it. Mathematically, we can associate complexity with nonlinearity. Let's recall the formula of a (three-layer) neural network:

Y = f(W3 · f(W2 · f(W1 · X)))

Here, W1, W2 and W3 are the weight matrices of the neural network. What we need to pay attention to is the activation function f in the equation, which is applied at each layer. Because of these activation functions, each layer is nonlinearly connected to the next.

The output of the first layer is f(W1 · X) (call it L1), and the output of the second layer is f(W2 · L1). As you can see, the output of the second layer is a nonlinear function of the output of the first layer, thanks to the activation function f. Therefore, the final value Y at the end of the network is nonlinear in the input X to a degree that depends on the number of layers in the network.
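As a small self-contained illustration (using NumPy, with arbitrary toy dimensions), the nested nonlinear composition described above could be written as:

    import numpy as np

    def relu(z):
        return np.maximum(0, z)  # the activation function f

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(8, 4))   # toy weight matrices with arbitrary sizes
    W2 = rng.normal(size=(8, 8))
    W3 = rng.normal(size=(1, 8))

    X = rng.normal(size=(4,))      # a single input vector
    L1 = relu(W1 @ X)              # first-layer output: f(W1 . X)
    L2 = relu(W2 @ L1)             # second-layer output: f(W2 . L1)
    Y = W3 @ L2                    # final output, a nonlinear function of X
    print(Y)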

The more layers there are, the more activation functions break the linearity between layers, and the more nonlinear the network becomes overall.

Because of this relationship, we can say that a neural network with more layers and more nodes per layer is a more complex one. Therefore, we need to adjust the number of parameters to match the complexity of the data. There is no definitive method for this other than running repeated experiments and comparing the results.

In a given experiment, if the test metrics are much worse than the training metrics, the model has overfit, which means the neural network has too many parameters for the given data. In other words, the network is too complex for the data and needs to be simplified. If the test metrics are roughly the same as the training metrics, the model has generalized, but that does not mean we have reached the network's maximum potential: increasing the parameters may improve performance, but it may also cause overfitting. Therefore, we keep tuning the number of parameters to balance performance against generalization.

We need to match the complexity of the neural network to the complexity of the data. If the network is too complex, it will start to memorize the training data rather than form a general understanding of it, resulting in overfitting.
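As a rough illustration (assuming PyTorch, with arbitrarily chosen layer sizes), the number of parameters of two candidate networks can be compared like this:

    import torch.nn as nn

    def count_parameters(model):
        # Total number of trainable weights and biases in the model.
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    # Two candidate networks for the same 4096-dimensional input,
    # with arbitrarily chosen hidden-layer sizes.
    small = nn.Sequential(nn.Linear(4096, 32), nn.ReLU(), nn.Linear(32, 1))
    large = nn.Sequential(nn.Linear(4096, 512), nn.ReLU(),
                          nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 1))

    print(count_parameters(small))  # about 131 thousand parameters
    print(count_parameters(large))  # about 2.4 million parameters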

Usually, deep learning practitioners do this by first training a neural network with enough parameters to make the model overfit. So, initially, we aim for a model that fits the training data very well. Then we iteratively reduce the number of parameters until the model stops overfitting; this can be regarded as the best network for the data. Another technique we can use to prevent overfitting is dropout neurons.

Dropout neurons

Adding dropout neurons is one of the most popular and effective ways to reduce overfitting in neural networks. Basically, every neuron in the network has a certain probability of dropping out of the network entirely. This means that, at any given moment, some neurons are not connected to any other neurons in the network.

At each step of the training process, a different group of neurons is dropped at random. Therefore, we can say that at every step we are effectively training a sub-network that is a subset of the original neural network, and because of the randomness of the dropout, this sub-network changes every time.

In effect, when we train a neural network with dropout, we are training many smaller sub-networks, and because their weights are all part of the original network, the final weights of the network can be regarded as an average over all of these sub-networks.
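As a minimal sketch (assuming PyTorch), dropout can be added as a layer; note that it is only active in training mode and is disabled in evaluation mode:

    import torch
    import torch.nn as nn

    # A small classifier with dropout after the hidden layer (sizes are arbitrary).
    model = nn.Sequential(
        nn.Linear(4096, 256),
        nn.ReLU(),
        nn.Dropout(p=0.5),   # each hidden activation is zeroed with probability 0.5
        nn.Linear(256, 2),
    )

    x = torch.randn(8, 4096)   # a dummy batch of 8 flattened images

    model.train()              # training mode: dropout is active
    train_out = model(x)

    model.eval()               # evaluation mode: dropout is disabled
    test_out = model(x)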

This is how dropout neurons work in neural networks, but why does dropout prevent overfitting? There are two main reasons for this.

The first reason is that dropout promotes neuron independence. Because the neurons surrounding a particular neuron may or may not be present at any given moment, that neuron cannot rely on its neighbours and is forced to learn more independently during training.

The second reason is that dropout essentially trains many smaller neural networks at once. In general, if we train multiple models and average their weights, performance usually improves because each network's independent learning accumulates. However, this is an expensive process, since we would need to define multiple neural networks and train them separately. With dropout we achieve the same effect while only needing a single neural network, from which the various possible sub-network configurations are trained.

Training multiple neural networks and aggregating what they have learned is called "ensembling", and it usually improves performance. Dropout essentially achieves this with only a single neural network.

The next technique to reduce overfitting is weight regularization.

Weight regularization

When training a neural network, the values of some weights can become very large. This happens because these weights focus on certain characteristics of the training data, which causes their values to keep growing throughout training. As a result, the network overfits the training data.

We do not need weights to keep growing in order to capture a specific pattern; it is enough for them to be somewhat larger than the other weights. However, during training, as the network passes over the data for many iterations, the weight values can keep increasing until they become very large, which is unnecessary.

Another reason excessively large weights are bad for a neural network is the increased sensitivity of the output to the input. With very large weights, even a small change in the input can change the output a lot, whereas a neural network should produce essentially the same output for similar inputs. With huge weights, two very similar inputs can produce very different outputs, which leads to many wrong predictions on the test data and reduces the generalization of the neural network.

The general rule is that the larger the weights in a neural network, the more complex the network is. Therefore, networks with larger weights usually tend to overfit.

So, basically, we need to limit the growth of the weights so that they do not become too large, but how exactly? The neural network tries to minimize its loss during training, so we can include a fraction of the weights in the loss function; that way the weights are also minimized during training, while reducing the original loss still takes priority.

There are two ways to do this, called L1 and L2 regularization. In L1, we add a small fraction of the sum of the absolute values of all the weights in the network; in L2, we add a small fraction of the sum of the squares of all the weights. We simply add this expression to the overall loss function of the neural network. The formulas are as follows:

L1: Loss_total = Loss + λ * Σ|w|
L2: Loss_total = Loss + λ * Σ w²

Here, lambda is a coefficient that lets us control how strongly the weights are penalized. We simply add the L1 or L2 term to the loss function of the neural network, so the network also tries to minimize that term. Because the size of the weights is now part of the loss, and the network always tries to minimize its loss, adding L1 or L2 regularization limits the growth of the weights. Let's highlight some of the differences between L1 and L2.
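As a minimal sketch (assuming PyTorch and a tiny stand-in model), the L1 or L2 term can be added to the loss by hand like this:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)          # a tiny stand-in model
    criterion = nn.MSELoss()
    lam = 1e-4                        # the lambda coefficient

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    base_loss = criterion(model(x), y)

    # L1 term: lambda times the sum of the absolute values of all weights.
    l1_term = lam * sum(p.abs().sum() for p in model.parameters())

    # L2 term: lambda times the sum of the squares of all weights.
    l2_term = lam * sum(p.pow(2).sum() for p in model.parameters())

    loss = base_loss + l1_term        # or base_loss + l2_term for L2
    loss.backward()

In practice, L2 regularization is often applied through the optimizer's weight_decay argument rather than being added to the loss by hand.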

With L1 regularization, the weights are not just reduced; L1 tends to push them all the way to zero, so the unimportant weights that contribute little to the network eventually become exactly zero. With L2, on the other hand, because squaring a value smaller than 1 makes it even smaller, the weights are not pushed to zero but only to small values. As a result, the unimportant weights end up much smaller than the others, but not zero.

This covers the important ways to prevent overfitting. In deep learning, we usually combine these methods to improve both the performance of the neural network and the generalization of the model.

At this point, the study of "what does neural network generalization mean" is complete. I hope it has resolved your doubts. Combining theory with practice is the best way to learn, so go and try it! If you want to keep learning more related knowledge, please continue to follow the website; the editor will keep working hard to bring you more practical articles!
