This article looks at how the batch size used in deep learning affects the training results. It is quite practical, so I am sharing it here; I hope you get something out of it.
Talking about Batch_Size in Deep Learning
Batch_Size (batch size) is an important hyperparameter in machine learning, and choosing it involves a number of trade-offs.
First of all, why do you need the parameter Batch_Size?
The choice of batch first determines the direction of descent. If the dataset is small, the full dataset can be used (Full Batch Learning), which has at least two advantages. First, the direction determined by the full dataset better represents the sample population, and therefore points more accurately toward the extremum. Second, because the gradient magnitudes of different weights differ enormously, choosing a single global learning rate is very difficult; with Full Batch Learning, Rprop can be used to update each weight individually based only on the sign of its gradient.
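To make that last point concrete, here is a minimal NumPy sketch of a sign-only, per-weight update in the spirit of Rprop (a simplified variant without the backtracking of full Rprop); the function name and hyperparameter values are illustrative, not taken from the article.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    # Grow the per-weight step when consecutive gradients agree in sign,
    # shrink it when they disagree; the gradient magnitude itself is ignored.
    agree = grad * prev_grad
    step = np.where(agree > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(agree < 0, np.maximum(step * eta_minus, step_min), step)
    # Move each weight by its own step size, opposite the sign of its gradient.
    return w - np.sign(grad) * step, step
```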
For larger datasets, these two advantages turn into two disadvantages. First, as datasets grow massively while memory remains limited, loading all the data at once becomes less and less feasible. Second, when iterating Rprop-style over batches, sampling differences between batches cause the gradient corrections to cancel each other out, so no effective correction is made. Only later did RMSProp emerge as a compromise.
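For contrast, a minimal sketch of the RMSProp idea (again with an illustrative function name and default hyperparameters): instead of reacting to each batch's gradient sign, it divides by a running average of squared gradients, which smooths out batch-to-batch sampling noise.

```python
import numpy as np

def rmsprop_step(w, grad, sq_avg, lr=1e-3, rho=0.9, eps=1e-8):
    # Running average of squared gradients, accumulated across mini-batches.
    sq_avg = rho * sq_avg + (1.0 - rho) * grad ** 2
    # Per-weight effective step: larger where past gradients were small, and vice versa.
    return w - lr * grad / (np.sqrt(sq_avg) + eps), sq_avg
```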
Since Full Batch Learning is not suitable for large datasets, how about going to the other extreme?
The other extreme is to train on only one sample at a time, i.e. Batch_Size = 1. This is known as online learning (Online Learning). For a linear neuron, the error surface of the mean-squared-error cost function is a paraboloid whose cross-sections are ellipses; for multi-layer, nonlinear networks it is still approximately a paraboloid locally. With online learning, each update follows the gradient direction of its own individual sample, so the updates pull in different directions and convergence is hard to achieve.
(Figure: optimization diagram)
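A minimal sketch of what Batch_Size = 1 looks like in practice, using a linear neuron with a mean-squared-error cost as in the paragraph above; the synthetic data and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))            # 1000 samples, 10 features
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(10)
lr = 0.01
for x_i, y_i in zip(X, y):                 # online learning: one sample per update
    err = x_i @ w - y_i                    # prediction error on this single sample
    w -= lr * err * x_i                    # gradient of 0.5 * err**2 w.r.t. w
```

Each update follows only its own sample's gradient, which is why the trajectory zig-zags as described above.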
Can I choose a moderate Batch_Size value?
Of course. This is mini-batch gradient descent (Mini-batch Learning). If the dataset is sufficient, the gradient computed from training on half of the data (or even far less) is almost the same as the gradient computed from all of the data.
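A minimal sketch of mini-batch gradient descent for the same linear/MSE setup; `batch_size` interpolates between online learning (1) and Full Batch Learning (the whole dataset). The function name and defaults are illustrative.

```python
import numpy as np

def minibatch_gd(X, y, lr=0.01, batch_size=32, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)                  # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            err = X[b] @ w - y[b]
            w -= lr * (X[b].T @ err) / len(b)     # average gradient over the batch
    return w
```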
Within a reasonable range, what are the benefits of increasing Batch_Size?
- Memory utilization improves, and large matrix multiplications parallelize more efficiently.
- The number of iterations needed to run one epoch (one pass over the full dataset) decreases, so the same amount of data is processed faster (see the small arithmetic sketch below).
- Within a certain range, the larger the Batch_Size, the more accurate the descent direction and the smaller the oscillation during training.
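As a concrete illustration of the second point, the iteration count for one epoch is simply the dataset size divided by the batch size; for MNIST's 60,000 training images:

```python
import math

for batch_size in [1, 32, 128, 1024, 60000]:
    iters = math.ceil(60000 / batch_size)   # iterations per epoch
    print(batch_size, iters)                # 1 -> 60000, 32 -> 1875, 128 -> 469, 1024 -> 59, 60000 -> 1
```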
What is the harm of blindly increasing Batch_Size?
- Memory utilization goes up, but memory capacity may no longer be able to hold a batch.
- The number of iterations needed to run one epoch (the full dataset) decreases, but reaching the same accuracy takes far more total time, so the corrections to the parameters appear to proceed more slowly.
- Once Batch_Size grows beyond a certain point, the descent direction it determines barely changes any further.
How does adjusting Batch_Size affect the training results in practice?
Here I run LeNet on the MNIST dataset. MNIST is a standard handwritten-digit benchmark, and I use the Theano framework, a Python deep learning library: it is easy to install (just a few commands), easy to debug (it ships with a Profile mode), runs on both GPU and CPU, has fairly complete official tutorials, and supports a rich set of modules (besides CNNs it also supports RBM / DBN / LSTM / RBM-RNN / SdA / MLPs). On top of it sits the Keras wrapper, which supports newer structures such as GRU / JZS1, JZS2, JZS3 and optimizers such as Adagrad / Adadelta / RMSprop / Adam.
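For readers who want to repeat this kind of experiment, here is a minimal sketch of the same sweep: a small LeNet-style CNN on MNIST trained at several batch sizes. It uses today's Keras API (tf.keras) rather than the Theano-era setup described above, and the architecture, epoch count, and batch sizes are illustrative, not the author's original script.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

def make_lenet():
    # A small LeNet-style CNN; layer sizes are illustrative.
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(6, 5, activation="tanh"),
        layers.AveragePooling2D(),
        layers.Conv2D(16, 5, activation="tanh"),
        layers.AveragePooling2D(),
        layers.Flatten(),
        layers.Dense(120, activation="tanh"),
        layers.Dense(84, activation="tanh"),
        layers.Dense(10, activation="softmax"),
    ])

for batch_size in [1, 32, 128, 1024]:
    model = make_lenet()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, batch_size=batch_size, epochs=5,
                        validation_data=(x_test, y_test), verbose=0)
    print(batch_size, round(history.history["val_accuracy"][-1], 4))
```

Note that batch_size = 1 over the full training set is very slow; for a quick run, subsample x_train and y_train first.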
(Figure: the network used in the experiment and the experimental results)
The results are shown in the figure above, with absolute times normalized. They confirm the analysis above:
- When Batch_Size is too small, the algorithm does not converge within 200 epochs.
- As Batch_Size increases, the same amount of data is processed faster.
- As Batch_Size increases, more and more epochs are needed to reach the same accuracy.
- Because of the tension between the two factors above, there is some Batch_Size at which the total training time is optimal.
- Because the final convergence falls into different local extrema, there is also some Batch_Size at which the final convergence accuracy is optimal.

That is how the batch size in deep learning affects the learning results. I hope you were able to learn something from this article.