2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces what the network layers of a CNN are. The content is quite detailed; interested readers can refer to it, and I hope it will be helpful to you.
A convolutional neural network (CNN) consists of inputs (Inputs), convolution layers (Convolution layer), activation layers (Activation), pooling layers (Pooling layer), and fully connected layers (Fully Connected, FC). This means a CNN can contain these kinds of layers, but the number of each kind of layer (Layer) can in theory be as large as you want. This is how the later famous network architectures such as AlexNet, GoogLeNet, and ResNet came about; later I will pick one or two of them to introduce. The main difference between them is the depth, that is, the number of layers. Generally speaking, the deeper the network, the more features it can learn; but the more layers there are, the more data and the more computing resources are needed. In recent years, the emergence of large publicly available datasets (e.g. ImageNet) and ever more powerful GPUs (NVIDIA) has pushed networks toward greater depth.
Today I will use another concrete example: through the picture below, I will introduce the details of the network layers involved in a CNN one by one.
Figure 1: detail diagram of a convolutional neural network (from the web)
Let's go through it from left to right: first a grayscale image of size 31x39x1, then four convolution layers, three pooling layers, one FC layer, and finally a SoftMax layer for classification. A color image would also work; the only difference from a grayscale image is the initial number of channels (channel number), which in fact does not affect anything, because after convolution the feature maps (Feature Maps) themselves are the channels.
1. Inputs -> Convolutional layer 1 -> Max-Pooling 1
The input is a picture; after a series of processing and calculation on it, we can finally tell what it is, for example, that it is a picture of a person's head.
Now let's look at the details. In the middle of the picture there is a convolution kernel of size 4x4x1, and Convolutional layer 1 produces Feature Maps of size 28x36x20. There is a formula mentioned earlier: output size = input size - kernel size + 1, so here (31-4+1) x (39-4+1) = 28x36. If the size of the image or of the convolution kernel changes, the calculation is still the same. No edge padding is used here, that is, the Zero Padding mentioned earlier; we will cover that in detail later. Why 20 Feature Maps? Because 20 convolution kernels of size 4x4 are used to convolve the image. Why 20? Because the more feature maps, the more features can be learned. Could a larger number be used? Yes, but there is no guarantee that the result will be better.
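The output-size formula above can be sketched as a small helper. Only the 31x39 input and the 4x4 kernel come from Figure 1; the `padding` and `stride` parameters are the usual generalization of the same formula:

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    # valid convolution: output = (input - kernel + 2*padding) // stride + 1
    return (input_size - kernel_size + 2 * padding) // stride + 1

# 31x39 input, 4x4 kernel, no zero padding, stride 1 (values from Figure 1)
print(conv_output_size(31, 4), conv_output_size(39, 4))  # 28 36
```

With Zero Padding of 1 the output would instead be 30x38, which is why padding is used when you want to keep the spatial size from shrinking.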
Again, deep learning is a highly experimental discipline: the parameters are chosen for a specific problem and finally settled through many experiments. So doing deep learning is a time-consuming and resource-consuming task. I still remember when I first started learning deep learning, someone told me it was not advisable to do it alone and that it would be better to have a team, because both the hardware and the software are a challenge. I dare say many university labs in China do not have the relevant computing resources. At the time I did not understand why I could not do it alone, so I naively persisted. When I think back on that process, it brings tears to my eyes! If you are in the same situation I was in, I suggest you keep going; you can do even better! Lay a good foundation and the problems will be solved step by step. That's all for the convolution layer; now let's look at Pooling.
As you may have noticed, the picture uses Max-Pooling, also called maximum pooling. Is there any other kind of Pooling? Yes. There is another, less common than Max, called mean pooling (Mean-Pooling). You can see the difference from the names: Max-Pooling takes the largest value in the pooling window (Pooling Window), while Mean-Pooling takes the average. The pooling window shown above has a pooling size (Pooling size) of 2x2, that is, there are four values in each window, and max takes out the largest one. Don't think about the whole picture; just look at one 2x2 patch. After pooling, a 2x2 patch that takes its maximum leaves only one number, i.e. a 1x1 output, so for the whole picture both the height and the width become half of their original size. So the question is: why do Pooling at all? Can we skip it?
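A minimal pure-Python sketch of non-overlapping 2x2 pooling on a toy 4x4 "image" (the values are made up for illustration), showing both the max and mean variants and the halving of height and width:

```python
def pool2d(img, size=2, mode="max"):
    # non-overlapping pooling: each size x size window is reduced to one value,
    # so height and width both shrink by a factor of `size`
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h, size):
        row = []
        for j in range(0, w, size):
            window = [img[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

img = [[1, 5, 2, 0],
       [3, 4, 1, 1],
       [0, 2, 7, 6],
       [1, 1, 3, 8]]
print(pool2d(img))               # [[5, 2], [2, 8]]
print(pool2d(img, mode="mean"))  # [[3.25, 1.0], [1.0, 6.0]]
```

Notice the 4x4 input becomes 2x2: exactly the halving described above.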
Generally speaking, Pooling reduces the size of the input and accelerates the training of the model. Does this lose information? Certainly, but the loss is almost negligible compared with the savings in training time and the final accuracy. Why can it be ignored? Because images have a kind of "static" property: to put it simply, adjacent regions of an image have almost the same characteristics. Think about pictures in real life and you will see what I mean.
Having said all that, I have not yet mentioned Activation. Activation is generally added after the convolution, and its role is, literally, to activate. How to understand activation? From a neurological point of view, some neurons may "die" after the convolution, so we activate them to bring them back to life. All right, we are dealing with pictures, which are obviously inanimate, so from a mathematical point of view: Activation restores some lost values to make up for the information loss caused by the convolution. There are many activation methods, and different methods correspond to different activation functions (Activation function). Common ones mentioned before include softmax, relu, leaky relu, tanh, sigmoid and so on. What's the difference? Look them up yourself; it's not difficult, just a few mathematical formulas. I'll give one example rather than go through each. Take ReLU, whose formula is
f(x) = max(0, x)
Isn't that easy? ReLU compares the input x with 0 and takes the larger value. To put it bluntly, it keeps positive numbers and turns negative numbers into 0. I will not draw the function's graph here, but you should take a look at it yourself.
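A one-line sketch of ReLU, directly from the formula above:

```python
def relu(x):
    # keep positives, zero out negatives: f(x) = max(0, x)
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.0, 3.0]])  # [0.0, 0.0, 0.0, 1.0, 3.0]
```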
Figure 2: plot of the ReLU function
2. Convolutional layer 4 -> Deep hidden identity features
Simply repeat the previous block, Inputs -> Convolutional layer 1 -> Max-Pooling 1; the calculation is exactly the same. In fact you can repeat it three, four, or even five times; that is all fine. The important thing to note is that no matter which framework you use (TensorFlow, Keras, Caffe, or anything else), pay attention to the parameter settings to make sure they match: if they do not, the output of one Layer cannot be used as the input of the next. To put it bluntly, the shapes have to match. If you don't understand, try writing n such loops yourself, regardless of the final classification result (you decide n), and then you will understand. I ask you to write it yourself with a heavy heart!
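The shape matching described above can be sketched by propagating the input shape through the stack. The 31x39x1 input, the 4x4 kernel of the first layer, and the final 1x2x80 output come from Figure 1; the intermediate kernel sizes and filter counts are guesses chosen to make the shapes work out, not values stated in the article:

```python
def conv(shape, k, n_filters):
    # valid convolution: spatial dims shrink by k - 1, channels become n_filters
    h, w, _ = shape
    return (h - k + 1, w - k + 1, n_filters)

def pool(shape, size=2):
    # non-overlapping pooling: spatial dims must divide evenly, i.e. layers must match
    h, w, c = shape
    assert h % size == 0 and w % size == 0, "layer shapes do not match"
    return (h // size, w // size, c)

shape = (31, 39, 1)               # grayscale input from Figure 1
shape = pool(conv(shape, 4, 20))  # conv1 (4x4, 20 maps) + pool1 -> (14, 18, 20)
shape = pool(conv(shape, 3, 40))  # conv2 + pool2 (kernel size 3 is a guess)
shape = pool(conv(shape, 3, 60))  # conv3 + pool3 (kernel size 3 is a guess)
shape = conv(shape, 2, 80)        # conv4 (kernel size 2 is a guess)
print(shape)  # (1, 2, 80)
```

If any kernel or pooling size is off, the `assert` fires: that is exactly the "parameters must match" problem you hit in a real framework.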
By Convolutional layer 4, the original image has become 80 Feature Maps of size 1x2. The last FC is a little special in that it is connected partly to Convolutional layer 4 and partly to Max-Pooling layer 3. Setting that aside for now, suppose it follows Convolutional layer 4: after the FC, the maps become a feature vector of 80x2x1 = 160 dimensions. The deep hidden identity features are the deep features of the input image obtained from the preceding series of elaborate (messy) calculations. At this point you should understand what FC is: to put it simply, it densely connects (Dense) all the features of the previous layer.
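The flattening step (80x2x1 = 160) can be checked with a tiny sketch; the zero values are placeholders standing in for real feature-map activations:

```python
# 80 feature maps of spatial size 1x2, stored as a nested list of shape (1, 2, 80)
maps = [[[0.0] * 80 for _ in range(2)] for _ in range(1)]

# flatten everything into the vector that feeds the FC layer
flat = [v for row in maps for cell in row for v in cell]
print(len(flat))  # 1 * 2 * 80 = 160
```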
As a matter of fact, there are many pitfalls around features, and feature selection is an important topic in machine learning in its own right; let's talk about it another time. The last Soft-max layer performs a multi-class classification on the FC output. The soft-max classifier is a generalization of sigmoid: sigmoid generally solves two-class problems, while soft-max handles multiple classes. For details, please follow up for future posts!
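A minimal sketch of sigmoid and soft-max, showing the generalization claim above: with two classes, soft-max over the scores [z, 0] reduces exactly to sigmoid(z):

```python
import math

def sigmoid(x):
    # binary case: probability of the positive class
    return 1.0 / (1.0 + math.exp(-x))

def softmax(zs):
    # multi-class generalization: shift by the max for numerical stability
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

z = 1.5
print(softmax([z, 0.0])[0])  # same value as sigmoid(z)
print(sigmoid(z))
```

The outputs of soft-max always sum to 1, which is what lets them be read as class probabilities.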
That's all on the CNN network layers. I hope the above content can be of some help to you and help you learn more. If you think the article is good, you can share it for more people to see.
© 2024 shulou.com SLNews company. All rights reserved.