The editor shares here what the AlexNet network structure refers to. Most people are not very familiar with it, so this article is offered for your reference; I hope you learn something from reading it.
In brief: the first layer, convolution layer 1, has 96 convolution kernels; the second layer, convolution layer 2, has 256 kernels; the third layer, convolution layer 3, takes the output of the second layer as input and has 384 kernels; the fourth layer, convolution layer 4, takes the output of the third layer and has 384 kernels; the fifth layer, convolution layer 5, takes the output of the fourth layer and has 256 kernels.
The AlexNet network was designed by Alex Krizhevsky, a student of Hinton, and won the 2012 ImageNet competition. After that year, more and deeper neural networks were proposed, such as the excellent VGG and GoogLeNet. The official reference model reaches a top-1 accuracy of 57.1% and a top-5 accuracy of 80.2%, which was already excellent compared with traditional machine-learning classification algorithms.
Analysis of network structure
The figure above shows the network structure of AlexNet in Caffe. Because the model was trained on two GPU servers, the diagram contains two parallel flows. AlexNet's network model is interpreted layer by layer as follows:
The first layer: convolution layer 1. The input is a 224 × 224 × 3 image; there are 96 convolution kernels (48 on each of the two GPUs in the original setup); the kernel size is 11 × 11 × 3; stride = 4 (the step size) and pad = 0 (no edge padding). The size of the convolution output is width = (224 + 2 * padding - kernel_size) / stride + 1 = 54, height = (224 + 2 * padding - kernel_size) / stride + 1 = 54, depth = 96. This is followed by local response normalization (LRN) and then max pooling with pool_size = (3, 3), stride = 2, pad = 0, which gives the feature map output of the first convolution layer.

The second layer: convolution layer 2. The input is the feature map of the first layer; there are 256 convolution kernels (128 per GPU); the kernel size is 5 × 5 × 48; pad = 2, stride = 1. LRN is applied, followed by max pooling with pool_size = (3, 3), stride = 2.

The third layer: convolution layer 3. The input is the output of the second layer; there are 384 kernels; kernel_size = 3 × 3 × 256, padding = 1. The third layer has no LRN and no pooling.

The fourth layer: convolution layer 4. The input is the output of the third layer; there are 384 kernels; kernel_size = 3 × 3, padding = 1. Like the third layer, it has no LRN and no pooling.

The fifth layer: convolution layer 5. The input is the output of the fourth layer; there are 256 kernels; kernel_size = 3 × 3, padding = 1, followed directly by max pooling with pool_size = (3, 3), stride = 2.

Layers 6 and 7 are fully connected layers with 4096 neurons each, and the final softmax output has 1000 units, because, as mentioned above, the ImageNet competition has 1000 classes. ReLU and Dropout are used in the fully connected layers.
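For reference only, the layer stack described above can be sketched roughly as follows in PyTorch (this is an assumed single-GPU sketch, not the original two-GPU Caffe implementation; a 227 × 227 input crop is assumed here so that the spatial sizes come out to whole numbers):

# A minimal single-GPU sketch of the layers listed above (assumed PyTorch code,
# not the original Caffe model). A 227x227 input crop is assumed.
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),                 # conv1, 96 kernels
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),                      # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),               # conv2, 256 kernels
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),              # conv3, no LRN or pooling
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),              # conv4, no LRN or pooling
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),              # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),    # fc6
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),           # fc7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),    # fc8, 1000 ImageNet classes
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

logits = AlexNetSketch()(torch.randn(1, 3, 227, 227))
print(logits.shape)    # torch.Size([1, 1000])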
The network structure diagram can be drawn with Caffe's own drawing tool (caffe/python/draw_net.py) and the train_val.prototxt under the caffe/models/bvlc_alexnet/ directory, as follows:
python3 draw_net.py --rankdir TB ../models/bvlc_alexnet/train_val.prototxt AlexNet_structure.jpg
Innovations of the algorithm
The main innovations are as follows: (1) ReLU was successfully used as the activation function of the CNN. In deeper networks its effect is better than Sigmoid, and it avoids the vanishing-gradient problem that Sigmoid suffers from in deep networks. Although the ReLU activation function had been proposed long before, it was not widely adopted until AlexNet appeared.
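A small numerical illustration of this point (assumed PyTorch code, not from the original paper): the Sigmoid gradient is at most 0.25 and shrinks towards zero for large inputs, while the ReLU gradient is exactly 1 for any positive input:

import torch

x = torch.tensor([-3.0, -0.5, 0.5, 3.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)   # all gradients <= 0.25, close to 0 for large |x|

x = torch.tensor([-3.0, -0.5, 0.5, 3.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)   # 0 for negative inputs, exactly 1 for positive inputs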
(2) Dropout was used to randomly ignore some neurons during training in order to avoid overfitting. Although Dropout is discussed in a separate paper, AlexNet put it into practice and proved its effect. In AlexNet, Dropout is mainly applied to the last few fully connected layers.
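As an illustration only (assumed PyTorch code), Dropout on fully connected layers behaves like this: neurons are dropped at random during training and the layer is left untouched at test time:

import torch
import torch.nn as nn

fc = nn.Sequential(
    nn.Linear(9216, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),        # randomly zeroes half of the activations while training
    nn.Linear(4096, 4096),
)

x = torch.randn(4, 9216)
fc.train()
y_train = fc(x)               # dropout active: a random subset of neurons is ignored
fc.eval()
y_eval = fc(x)                # dropout disabled at test time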
(3) Overlapping max pooling is used in the CNN. Previously, average pooling was widely used in CNNs, whereas AlexNet uses max pooling throughout to avoid the blurring effect of average pooling. In addition, AlexNet sets the stride smaller than the pooling kernel size, so the outputs of neighboring pooling windows overlap, which improves the richness of the features.
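For comparison (assumed PyTorch code), a 3 × 3 pooling window with stride 2 makes neighboring windows overlap, whereas a 2 × 2 window with stride 2 does not:

import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)

overlapping = nn.MaxPool2d(kernel_size=3, stride=2)        # window larger than stride: windows overlap
non_overlapping = nn.MaxPool2d(kernel_size=2, stride=2)    # window equals stride: no overlap

print(overlapping(x).shape)      # torch.Size([1, 96, 27, 27]); adjacent 3x3 windows share a row/column
print(non_overlapping(x).shape)  # torch.Size([1, 96, 27, 27]); the 2x2 windows do not overlap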
(4) The LRN layer is proposed. It creates a competition mechanism among the activities of local neurons, so that neurons with larger responses become relatively larger while neurons with smaller responses are suppressed, which enhances the generalization ability of the model.
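PyTorch exposes this operation as nn.LocalResponseNorm; a minimal sketch (with the commonly cited AlexNet hyperparameter values, given here only for illustration):

import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)   # commonly cited AlexNet values
x = torch.randn(1, 96, 27, 27)
y = lrn(x)          # each activation is divided by a factor that grows with the squared
print(y.shape)      # responses of its neighboring channels -> torch.Size([1, 96, 27, 27])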
(5) Multi-GPU training can increase the scale of network training.
(6) Data augmentation on the millions of ImageNet input images. AlexNet uses Data Augmentation in three ways (a short sketch follows the list below):
Translation transform (crop)
Reflection transform (flip)
Lighting and color transform (color jittering). During training the image is randomly translated (cropped) and then flipped horizontally. At test time, five crops are taken (upper left, upper right, lower left, lower right and center), each is also flipped, and the predictions are averaged.
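A minimal sketch of these augmentations using torchvision transforms (the specific transform names and values here are an assumption for illustration, not the original AlexNet pipeline):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(227),                  # translation via random crops
    transforms.RandomHorizontalFlip(p=0.5),      # reflection
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),   # color/lighting jitter
    transforms.ToTensor(),
])

test_crops = transforms.TenCrop(227)             # four corners + center, plus their horizontal flips,
                                                 # whose predictions are then averaged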
It is summarized as follows:
Use the ReLU activation function
Dropout is proposed to prevent overfitting.
Enlarge the dataset with data augmentation (Data augmentation)
Horizontal flipping, random cropping, translation, color transformation, lighting transformation, etc.
Use multiple GPUs for training
Split the output of the previous layer into two parts along the channel dimension and send them to two GPUs respectively; for example, the 27 × 27 × 96 feature map output by the previous layer is divided into two groups of 27 × 27 × 48, which are processed on two different GPUs (see the sketch after this list)
Use local response normalization (LRN)
Use overlapping pooling (3x3 pooling kernels with stride 2).
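The channel split mentioned above can be made concrete with a short snippet (assumed PyTorch code, shown only to illustrate splitting 27 × 27 × 96 into two 27 × 27 × 48 halves):

import torch

x = torch.randn(1, 96, 27, 27)                # N x C x H x W feature map from the previous layer
half_a, half_b = torch.split(x, 48, dim=1)    # two 1 x 48 x 27 x 27 halves, one per GPU in the original design
print(half_a.shape, half_b.shape)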
Training under the Caffe framework
Prepare the dataset, modify the AlexNet network's train.prototxt, configure the solver and deploy.prototxt files, and create a new train.sh script to start the training.
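A minimal sketch of starting training through the pycaffe interface (the prototxt path is a placeholder for your own file; launching the caffe command-line tool from a train.sh script works just as well):

# Minimal pycaffe sketch; the solver path is a placeholder.
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('models/bvlc_alexnet/solver.prototxt')
solver.solve()   # runs training according to the solver configuration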
The above is the full content of the article "What does the AlexNet network structure refer to?" Thank you for reading! I hope the content shared here is helpful; if you want to learn more, you are welcome to follow the industry information channel.