This article introduces the GoogLeNet network in CNNs in detail, covering the Inception module and the overall architecture step by step. We hope it helps resolve any doubts you have about this network.
The GoogLeNet network was developed by Christian Szegedy and his colleagues at Google Research. It won the ILSVRC classification challenge in 2014, bringing the top-5 error rate below 7%. A large part of the reason is that this network is much deeper than the CNNs introduced earlier. Yet despite the extra layers, it has far fewer weight parameters: about 10 times fewer than the AlexNet network we studied earlier. Why?
1. Inception Module
That is because much of the network is built from repeated copies of the Inception module shown in the figure below. Let's take a closer look at it:
The notation "3×3 + 1(S)" means that the layer's convolution kernel is 3×3, its stride is 1, and "S" stands for SAME padding. As the figure shows, the input signal is first copied and fed into four different branches, all of which use the ReLU activation function. Note that the convolution layers use kernels of size 1×1, 3×3, and 5×5 respectively, which helps capture patterns at different scales. Every layer uses SAME padding, so each output has the same width and height as the input; this is what makes the final Depth Concat possible (feature maps of different sizes could not be stacked). In TensorFlow, the Depth Concat is implemented with the concat() function, with the axis parameter set to 3 (the depth axis).
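To make this concrete, here is a minimal sketch of such a module using the TensorFlow Keras functional API. The helper name inception_module and its filter-count parameters are assumptions made for illustration, not names from the article or the paper; Concatenate(axis=3) plays the role of the concat() call mentioned above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, f_pool):
    """Sketch of an Inception module; parameter names are illustrative."""
    # Branch 1: a plain 1x1 convolution
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    # Branch 2: 1x1 bottleneck, then a 3x3 convolution
    b2 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b2)
    # Branch 3: 1x1 bottleneck, then a 5x5 convolution
    b3 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b3)
    # Branch 4: 3x3 max pooling (stride 1), then a 1x1 convolution
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(f_pool, 1, padding="same", activation="relu")(b4)
    # Depth Concat: stack the four branches along the depth axis (axis=3)
    return layers.Concatenate(axis=3)([b1, b2, b3, b4])
```

Because every branch uses SAME padding and stride 1, all four outputs keep the input's width and height, so stacking them along axis 3 is valid; at a lower level, tf.concat([b1, b2, b3, b4], axis=3) does the same thing.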
You may wonder why some of the convolution kernels are 1×1. Since such a kernel covers only a single pixel, these layers cannot capture any spatial features. In fact, they serve two purposes:
The first is dimensionality reduction: these layers output fewer feature maps than they receive, which is why they are also called bottleneck layers. This is particularly effective right before the 3×3 and 5×5 convolution layers, where it greatly reduces the number of trainable weights.
Second, each bottleneck pair ([1×1, 3×3] and [1×1, 5×5]) acts like a single, more powerful convolution layer, able to capture more complex patterns. A single convolution layer is like a simple linear classifier sliding across the image; the pair is equivalent to a two-layer neural network sliding across the image.
The number of convolution kernels in each convolution layer is a hyperparameter, which means a single Inception module has six hyperparameters to tune.
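A quick back-of-the-envelope calculation shows how much a bottleneck saves. The channel counts below (256 in, 64 in the bottleneck, 256 out) are assumed purely for illustration:

```python
# Weight counts (ignoring biases) for producing 256 feature maps from a
# 256-channel input with a 3x3 convolution, with and without a 1x1
# bottleneck that first reduces the depth to 64. Channel counts are
# illustrative assumptions, not values from the paper.
direct = 3 * 3 * 256 * 256                         # 589,824 weights
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 256   # 163,840 weights
print(direct / bottleneck)                         # roughly 3.6x fewer weights
```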
2. GoogLeNet Composition
Now let's look at the composition of GoogLeNet, shown in the figure below. The figure contains nine inception modules (marked with a spiral icon). The six numbers on each inception module correspond to the number of output feature maps of each convolution layer in the module diagram above. All convolution layers in the figure use the ReLU activation function.
Let's walk through this network (a condensed code sketch follows the list):
To reduce the computational load, the first two layers use stride 2, so the image's width and height are each divided by 4, shrinking its area to 1/16 of the original.
Then a local response normalization layer (covered in a previous article) ensures that the preceding layers learn a wide variety of features.
Next come two convolution layers that act as a bottleneck pair; as explained earlier, the pair can be thought of as a single smarter convolution layer.
This is followed by another local response normalization layer, again to ensure that diverse features are learned.
Then a max pooling layer with stride 2 shrinks the image, reducing the computational load.
Then come the nine inception modules, with two max pooling layers interleaved to reduce dimensionality and speed up the network.
Next, an average pooling layer with VALID padding outputs 1×1 feature maps: this is called global average pooling. It effectively forces the preceding layers to produce features that are meaningful across the whole image, since anything else gets washed out by the averaging. As a result, few fully connected layers are needed at the end; there is no need for several of them, as in AlexNet.
Finally, there is dropout regularization, then a fully connected layer with a softmax activation function to produce the output.
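Putting the walkthrough together, below is a condensed sketch of the network in Keras, reusing the inception_module() helper defined earlier. Treat it as an assumption-laden outline: only the first two modules' filter counts are written out (they follow the table in the original paper), the remaining seven modules and one max pooling layer are elided, and tf.nn.local_response_normalization stands in for the LRN steps.

```python
inputs = tf.keras.Input(shape=(224, 224, 3))
# Stem: two stride-2 stages divide width and height by 4 (area by 16)
x = layers.Conv2D(64, 7, strides=2, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
x = layers.Lambda(tf.nn.local_response_normalization)(x)         # first LRN
x = layers.Conv2D(64, 1, padding="same", activation="relu")(x)   # bottleneck pair
x = layers.Conv2D(192, 3, padding="same", activation="relu")(x)
x = layers.Lambda(tf.nn.local_response_normalization)(x)         # second LRN
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
# Nine inception modules with max pooling interleaved (only the first
# two modules shown; their filter counts follow the paper)
x = inception_module(x, 64, 96, 128, 16, 32, 32)
x = inception_module(x, 128, 128, 192, 32, 96, 64)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
# ... seven more inception modules and one more max pooling layer elided ...
# Head: global average pooling, dropout, and the softmax classifier
x = layers.GlobalAveragePooling2D()(x)       # the "global average pooling" step
x = layers.Dropout(0.4)(x)                   # 40% dropout, per the paper
outputs = layers.Dense(1000, activation="softmax")(x)  # 1000 ILSVRC classes
model = tf.keras.Model(inputs, outputs)
```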
The figure above is slightly simplified. The original GoogLeNet also includes two auxiliary classifiers on top of the third and sixth inception modules, each composed of an average pooling layer, a convolution layer, two fully connected layers, and a softmax activation layer. During training, their loss (scaled down by 70%, i.e. multiplied by 0.3) was added to the overall loss of the network, to fight the vanishing gradient problem and to regularize the network. Their effect, however, turned out to be relatively minor.
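For completeness, here is a sketch of one auxiliary classifier head. The layer sizes follow the original paper's description (5×5 average pooling with stride 3, a 1×1 convolution with 128 filters, a 1024-unit fully connected layer, 70% dropout, then softmax), and the 0.3 loss weight corresponds to the "scaled down by 70%" mentioned above.

```python
def auxiliary_classifier(x, num_classes=1000):
    """Sketch of a GoogLeNet auxiliary classifier; sizes follow the paper."""
    x = layers.AveragePooling2D(5, strides=3)(x)                     # average pooling
    x = layers.Conv2D(128, 1, padding="same", activation="relu")(x)  # 1x1 convolution
    x = layers.Flatten()(x)
    x = layers.Dense(1024, activation="relu")(x)                     # first FC layer
    x = layers.Dropout(0.7)(x)                                       # 70% dropout
    return layers.Dense(num_classes, activation="softmax")(x)        # second FC + softmax

# In a Keras model with three outputs (main head plus two auxiliary heads),
# the 0.3 weighting could be expressed at compile time, for example:
# model.compile(loss="categorical_crossentropy", loss_weights=[1.0, 0.3, 0.3])
```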
This concludes our introduction to GoogLeNet in CNNs. To truly master these ideas, you will still need to practice and apply them yourself.