

What is the evolution of GoogLeNet Inception from v1 to v4

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 Report --

In this issue, the editor brings you a look at the evolution of GoogLeNet Inception from v1 to v4. The article is rich in content and analyzes the topic from a professional point of view. I hope you get something out of it after reading.

"GoogleNet and VGG are the first and second places in the ImageNet Challenge. The common feature is that both networks are at a deeper level."

0 Overview

GoogLeNet and VGG took first and second place in the 2014 ImageNet Challenge. What the two networks have in common is that both go deeper. But:

VGG inherits some of the framework structure of LeNet and AlexNet, while GoogLeNet makes a bolder attempt: although it is 22 layers deep, it has only about 1/12 the parameters of AlexNet, while VGG has roughly three times as many parameters as AlexNet. GoogLeNet is thus the better structure when memory and computing resources are limited, and its performance is also superior, handily beating VGG.

1 Inception v1

In short, Inception is the core of GoogLeNet, and one reason GoogLeNet is excellent and fast to run is precisely Inception. The idea is to design a sparse network structure that can still produce dense data: the three common convolution kernels in CNNs (1x1, 3x3, 5x5) are stacked in parallel with a pooling operation, which increases the width of the network on the one hand and strengthens its robustness to scale on the other. The idea behind this original (naive) version is good, but the amount of computation is too large, so the authors inserted 1x1 convolutions to reduce the number of channels before the 3x3 and 5x5 convolution layers. The v1 module therefore has four parallel branches: 1x1, 1x1 then 3x3, 1x1 then 5x5, and 3x3 max-pooling followed by a 1x1 projection, all concatenated along the channel axis.
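Below is a minimal PyTorch sketch of such a module (PyTorch, the class name InceptionV1Block, and the exact channel counts are illustrative choices, not taken from this article; the counts happen to follow GoogLeNet's inception(3a) stage):

```python
import torch
import torch.nn as nn

class InceptionV1Block(nn.Module):
    """Dimension-reduced Inception v1 module: four parallel branches
    concatenated along the channel axis."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        # Branch 1: plain 1x1 convolution
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 channel reduction, then 3x3
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 channel reduction, then 5x5
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max-pool, then 1x1 projection
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Every branch preserves the spatial size, so the outputs concatenate
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Channel counts as in inception(3a): output has 64+128+32+32 = 256 channels
block = InceptionV1Block(192, 64, 96, 128, 16, 32, 32)
print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```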

[What is the use of the 1x1 convolution kernel?]

The main purposes of the 1x1 convolution are dimensionality reduction and adding rectified linear activation (ReLU). For example, suppose the output of the previous layer is 100x100x128. Passing it through a 5x5 convolution layer with 256 channels (stride=1, pad=2) gives an output of 100x100x256, and that convolution layer has 128x5x5x256 = 819,200 parameters. If the output of the previous layer first passes through a 1x1 convolution layer with 32 channels and then through a 5x5 convolution layer with 256 outputs, the output is still 100x100x256, but the number of convolution parameters drops to 128x1x1x32 + 32x5x5x256 = 208,896, roughly a quarter of the original.
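The arithmetic can be checked directly (weights only, biases ignored, as in the text above):

```python
# Direct 5x5 convolution from 128 to 256 channels
direct = 128 * 5 * 5 * 256
# 1x1 reduction from 128 to 32 channels, then 5x5 from 32 to 256 channels
reduced = 128 * 1 * 1 * 32 + 32 * 5 * 5 * 256
print(direct, reduced, round(direct / reduced, 2))  # 819200 208896 3.92
```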

[Why is there a pooling layer in it?]

Generally speaking, there are two ways to make the feature map smaller: pool first and then apply the Inception module, which loses features (a representational bottleneck), or apply the Inception module first and then pool, which requires a large amount of computation. To preserve features and reduce computation at the same time, the network is changed to use two parallel modules: a strided convolution and a pooling operation run in parallel, and their outputs are then merged, as sketched below.
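A minimal sketch of this parallel reduction, assuming PyTorch (the class name ReductionBlock and the channel counts are illustrative, not the paper's exact values):

```python
import torch
import torch.nn as nn

class ReductionBlock(nn.Module):
    """Parallel downsampling: a stride-2 convolution branch and a stride-2
    pooling branch run side by side; their outputs are concatenated, so the
    grid is halved without pooling away features first."""
    def __init__(self, in_ch, conv_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, conv_ch, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)

    def forward(self, x):
        return torch.cat([self.conv(x), self.pool(x)], dim=1)

block = ReductionBlock(320, 320)
print(block(torch.randn(1, 320, 28, 28)).shape)  # torch.Size([1, 640, 14, 14])
```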

2 Inception v2

The designers reasoned that simply stacking layers can improve accuracy, but it degrades computational efficiency. How can the expressive power of the network be improved without adding too much computation?

[Convolution decomposition (Factorizing Convolutions)]

A large convolution kernel brings a larger receptive field, but it also means more parameters: a size=5 kernel has 25 weights, a size=3 kernel only 9. The GoogLeNet team proposed that a single 5x5 convolution layer can be replaced by a small network of two consecutive size=3 convolution layers, and a large number of experiments proved that this scheme causes no loss of expressiveness. Going further, the team considered nx1 kernels: any nxn convolution can be replaced by an nx1 convolution followed by a 1xn convolution. However, the team found that this decomposition does not work well in the early layers of the network, but works well on the medium-sized feature maps in the middle. Both factorizations are sketched below.
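The two factorizations written out, assuming PyTorch with illustrative 64-in/64-out channels:

```python
import torch.nn as nn

# A single 5x5 vs. two stacked 3x3: both cover a 5x5 receptive field,
# but the stack has 2*9 = 18 weights per channel pair instead of 25
five_by_five = nn.Conv2d(64, 64, 5, padding=2)
two_three_by_three = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))

# Asymmetric factorization: nxn as nx1 followed by 1xn (here n = 3),
# cutting the weights from 9 to 3+3 = 6 per channel pair
asymmetric = nn.Sequential(
    nn.Conv2d(64, 64, (3, 1), padding=(1, 0)), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, (1, 3), padding=(0, 1)), nn.ReLU(inplace=True))

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(five_by_five), count(two_three_by_three), count(asymmetric))
# 102464 73856 24704
```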


The team updated the structure of the Inception module in the network accordingly:

Figure 5 is the original v1 module; Figure 6 is the version with the 5x5 replaced by two 3x3 convolutions; Figure 7 is the version with the 1xn and nx1 factorization.

3 Inception v3

The most important improvement is factorization (Factorization): the 7x7 convolution is decomposed into two one-dimensional convolutions (1x7 and 7x1), and the same is done for 3x3 (1x3 and 3x1). The advantage is that this not only speeds up computation but also, by splitting one convolution into two, deepens the network and increases its nonlinearity (a ReLU follows each added layer). The network input also changes from 224x224 to 299x299.
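For concreteness, here is the 7x7 factorization in isolation, assuming PyTorch (the 192 channels and 17x17 grid are illustrative; v3 applies this on mid-sized feature maps):

```python
import torch
import torch.nn as nn

# 7x7 factorized into 1x7 then 7x1: spatial size is preserved, an extra
# ReLU adds nonlinearity, and weights drop from 49 to 7+7 = 14 per channel pair
factored_7x7 = nn.Sequential(
    nn.Conv2d(192, 192, (1, 7), padding=(0, 3)), nn.ReLU(inplace=True),
    nn.Conv2d(192, 192, (7, 1), padding=(3, 0)), nn.ReLU(inplace=True))

print(factored_7x7(torch.randn(1, 192, 17, 17)).shape)  # torch.Size([1, 192, 17, 17])
```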

4 Inception v4

Inception v4 studies the combination of the Inception module with residual connections. The ResNet structure greatly deepens a network while also greatly improving training speed. In short, Inception v4 uses residual connections (Residual Connection) to improve v3, yielding the Inception-ResNet-v1, Inception-ResNet-v2, and Inception-v4 networks. Briefly, the residual structure adds a shortcut so that the input bypasses the convolutional transform and is added directly to its output.
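A minimal sketch of an Inception-style module wrapped in a residual shortcut, assuming PyTorch (the class name, branch sizes, and the 0.2 residual scaling are illustrative; the paper scales residuals by a small factor to stabilize training):

```python
import torch
import torch.nn as nn

class InceptionResidualBlock(nn.Module):
    """Residual Inception block: output = ReLU(x + scale * inception(x)).
    A final 1x1 convolution maps the concatenated branches back to the
    input width so the elementwise addition lines up."""
    def __init__(self, ch):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(ch, 32, 1), nn.ReLU(inplace=True))
        self.b2 = nn.Sequential(
            nn.Conv2d(ch, 32, 1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.project = nn.Conv2d(64, ch, 1)  # back to ch channels, no ReLU

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b2(x)], dim=1)
        return torch.relu(x + 0.2 * self.project(y))  # scaled residual sum

# Stacking about twenty such modules gives an Inception-ResNet-style trunk
trunk = nn.Sequential(*[InceptionResidualBlock(256) for _ in range(20)])
print(trunk(torch.randn(1, 256, 17, 17)).shape)  # torch.Size([1, 256, 17, 17])
```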

Combining the two, each Inception module is wrapped in such a residual shortcut, and passing through about twenty similar modules yields the complete Inception-ResNet network.

This is the evolution of GoogLeNet Inception from v1 to v4 shared by the editor. If you happen to have similar doubts, the analysis above should help you understand. If you want to know more, you are welcome to follow the industry information channel.
