Many people who are new to this do not know how to reproduce a Capsule Network with PaddlePaddle (飞桨), so this article summarizes the approach; I hope it helps you.
Let's explore the structure and principles of the Capsule Network together and reproduce it with PaddlePaddle.
Download and installation commands:

## CPU version installation command
pip install -f https://paddlepaddle.org.cn/pip/oschina/cpu paddlepaddle

## GPU version installation command
pip install -f https://paddlepaddle.org.cn/pip/oschina/gpu paddlepaddle-gpu
The deficiencies of convolutional neural networks
Although convolutional neural networks (CNNs) perform very well, they cannot accurately extract features from transformed images, for example images that have been rotated or whose elements have been translated.
For example, if you rotate the letter R in the image below, a CNN will mistakenly treat the three R's as three different letters.
This leads to the concept of pose. A pose captures the relative relationships between objects and is expressed numerically as a 4-dimensional pose matrix. Relationships between three-dimensional objects can be expressed by poses, which are in essence translations and rotations.
For humans it is easy to recognize that the image below shows the Statue of Liberty, even though the pictures are taken from different angles, because human recognition of images does not depend on the viewing angle. Even if you have never seen pictures exactly like these before, you can still tell immediately that they show the Statue of Liberty.
In addition, an artificial neuron outputs a single scalar as its result, whereas a capsule outputs a vector. A CNN uses convolutional layers to obtain feature maps and relies on max pooling to achieve viewpoint invariance in the neuron activations. The fatal drawback of max pooling, however, is that valuable information is lost and the relative spatial relationships between features are not preserved. In a capsule network, the important information about a feature's state is encapsulated by the capsule in vector form.
How a capsule works
Let's compare a capsule with an artificial neuron. In the table below, Vector denotes a vector, scalar denotes a scalar, and Operation compares how the two work.
A capsule performs four steps: (1) matrix multiplication of the input vectors, (2) scalar weighting of the input vectors, (3) summation of the weighted input vectors, and (4) a vector-to-vector nonlinearity. The following is a detailed analysis of how these four steps are implemented:
A low-level capsule weights its output vector and sends it to the high-level capsules, and each high-level capsule receives vectors from the low-level capsules. All of these inputs are represented by the red and blue dots. Where these dots cluster together, the predictions of the low-level capsules are close to each other.
For example, both capsule J and capsule K contain a cluster of red dots, because the predictions of the corresponding low-level capsules are very close. After the output of a low-level capsule is multiplied by the corresponding matrix W, its prediction falls far away from the red cluster in capsule J but on the edge of the red cluster in capsule K (the clusters represent the predictions received by the high-level capsule). The low-level capsule has a mechanism to measure which high-level capsule accepts its output better and adjusts the weights accordingly, so that the weight C for capsule K becomes higher and the weight C for capsule J becomes lower.
Regarding these weights, note the following (a small numerical sketch follows the list):
1. The weights are non-negative scalars.
2. For each low-level capsule i, all of its weights sum to 1 (they are produced by a softmax function).
3. For each low-level capsule i, the number of weights equals the number of high-level capsules.
4. The values of these weights are determined by the iterative dynamic routing algorithm.
For each low-level capsule i, its weights define a probability distribution over the high-level capsules j that receive its output.
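To make points 1 and 2 concrete, here is a minimal NumPy sketch (illustrative only, not part of the article's code) of how the routing logits b_i of one low-level capsule are turned into coupling coefficients c_i that are non-negative and sum to 1:

import numpy as np

# hypothetical routing logits of one low-level capsule towards 10 high-level capsules
b_i = np.zeros(10)                       # the routing algorithm initializes them to zero
c_i = np.exp(b_i) / np.exp(b_i).sum()    # softmax: non-negative weights that sum to 1
print(c_i.sum(), c_i)                    # 1.0, and every weight equals 0.1 before any routing update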
3. Sum of weighted input vectors
This step combines the inputs, just as in an ordinary artificial neural network, except that it is a sum of vectors rather than a sum of scalars.
4. Nonlinear transformation from vector to vector
Another major innovation of CapsNet is its novel nonlinear activation function, which takes a vector as input and compresses its length to below 1 without changing its direction.
The implementation code is as follows:
def squash(self, vector):
    '''
    Vector squashing function, analogous to an activation function (vector normalization).
    Args:
        vector: a 4-D tensor [batch_size, vector_num, vector_units_num, 1]
    Returns:
        a tensor with the same shape as the input vector; the larger the input
        vector's length, the closer the output length |v| is to 1
    '''
    vec_abs = fluid.layers.sqrt(fluid.layers.reduce_sum(
        fluid.layers.square(vector), dim=2, keep_dim=True))  # |v| of each capsule vector
    scalar_factor = fluid.layers.square(vec_abs) / (1 + fluid.layers.square(vec_abs))
    vec_squashed = scalar_factor * fluid.layers.elementwise_div(vector, vec_abs)
    return vec_squashed
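As a quick numerical check of this behaviour (a NumPy sketch of the same formula, not the article's code): a long vector keeps its direction and its length is pushed close to 1, while a short vector is shrunk towards 0.

import numpy as np

def squash_np(s):
    norm = np.linalg.norm(s)
    return (norm ** 2 / (1 + norm ** 2)) * s / norm

print(np.linalg.norm(squash_np(np.array([10.0, 0.0]))))  # ~0.990  (long vector -> length near 1)
print(np.linalg.norm(squash_np(np.array([0.1, 0.0]))))   # ~0.0099 (short vector -> length near 0)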
Dynamic routing between capsules (the essence)
A low-level capsule sends its output to the high-level capsule that "agrees" with it. This is the essence of the dynamic routing algorithm.
▲ Pseudocode of the dynamic routing algorithm between capsules
The first line of the pseudocode lists the inputs of the algorithm: the prediction vectors û (the low-level input vectors after the matrix multiplication) and the number of routing iterations r. The last line gives the output of the algorithm: the vector v_j of the high-level capsule.
The b_ij in line 2 is a temporary variable that stores the weights from the low-level vectors to the high-level capsules. Its values are updated over the iterations, and at the start of each iteration they are converted into c_ij by a softmax. At the start of the dynamic routing algorithm, b_ij is initialized to zero (after the softmax this yields non-zero c_ij with equal weights).
Line 3 indicates that the steps in lines 4-7 will be repeated r times (routing iterations).
Line 4 computes, for each low-level capsule vector, the weights towards all high-level capsules: the values b_i are converted by the softmax into non-negative weights c_i whose elements sum to 1.
If it is the first iteration, the values of all coefficients cij will be equal. For example, if we had 8 low-level capsules and 10 high-level capsules, then the weight of all cij would be equal to 0.1. This initialization maximizes uncertainty: low-level capsules do not know which high-level capsule their output is most suitable for. Of course, as this process repeats, these uniform distributions will change.
Line 5 is where the high-level capsules come in. In this step the input vectors are weighted by the routing coefficients determined in the previous step and summed, giving the vector s_j. Line 6 then applies the squash nonlinearity to s_j, producing the output vector v_j of the high-level capsule.
Line 7 updates the weights; this is the essence of the routing algorithm. We take the dot product of each high-level capsule's vector v_j with the original input vector û from the low-level capsule (the dot product measures the similarity between the capsule's input and output, see the illustration below) and add the result to the previous weight b_ij. This has the effect that a low-level capsule sends its output to the high-level capsule whose output is similar to its prediction, which is exactly the similarity between the vectors. After this step the algorithm goes back to step 3 and repeats the process r times.
▲ The dot product (inner product) of two vectors measures their similarity.
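To make the whole iteration concrete, here is a minimal NumPy sketch of the routing loop for a single sample (shapes and names are illustrative assumptions, not the article's implementation; the PaddlePaddle version follows below):

import numpy as np

def squash(s):
    # squash along the last axis: long vectors -> length near 1, short vectors -> near 0
    norm = np.linalg.norm(s, axis=-1, keepdims=True)
    return (norm ** 2 / (1 + norm ** 2)) * s / (norm + 1e-8)

def dynamic_routing(u_hat, r=3):
    # u_hat: predictions of the low-level capsules for every high-level capsule,
    #        shape (num_low, num_high, dim_high)
    num_low, num_high, _ = u_hat.shape
    b = np.zeros((num_low, num_high))                          # line 2: logits start at zero
    for _ in range(r):                                         # line 3: r routing iterations
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # line 4: softmax over high-level capsules
        s = (c[:, :, None] * u_hat).sum(axis=0)                # line 5: weighted sum of predictions
        v = squash(s)                                          # line 6: squash to get output vectors
        b += (u_hat * v[None, :, :]).sum(axis=-1)              # line 7: agreement (dot product) update
    return v

v = dynamic_routing(np.random.randn(1152, 10, 16))
print(v.shape)  # (10, 16): one output vector per high-level capsule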
After the r iterations we have computed the outputs of all the high-level capsules and established the correct routing weights. Below is the capsule layer implemented according to the above principles:
class Capsule_Layer(fluid.dygraph.Layer):
    def __init__(self, pre_cap_num, pre_vector_units_num, cap_num, vector_units_num):
        '''
        Implementation of the capsule layer; it can be used directly.
        Args:
            pre_vector_units_num (int): dimension of the input vectors
            vector_units_num (int): dimension of the output vectors
            pre_cap_num (int): number of input capsules
            cap_num (int): number of output capsules
            routing_iters (int): number of routing iterations, 3 is recommended
        Notes:
            the number of capsules and the vector dimensions affect performance
            and are the main hyperparameters to tune
        '''
        super(Capsule_Layer, self).__init__()
        self.routing_iters = 3
        self.pre_cap_num = pre_cap_num
        self.cap_num = cap_num
        self.pre_vector_units_num = pre_vector_units_num
        for j in range(self.cap_num):
            self.add_sublayer('u_hat_w' + str(j), fluid.dygraph.Linear(
                input_dim=pre_vector_units_num, output_dim=vector_units_num))

    def squash(self, vector):
        '''
        Vector squashing function, analogous to an activation function (vector normalization).
        Args:
            vector: a 4-D tensor [batch_size, vector_num, vector_units_num, 1]
        Returns:
            a tensor with the same shape as the input vector; the larger the input
            vector's length, the closer the output length |v| is to 1
        '''
        vec_abs = fluid.layers.sqrt(fluid.layers.reduce_sum(
            fluid.layers.square(vector), dim=2, keep_dim=True))
        scalar_factor = fluid.layers.square(vec_abs) / (1 + fluid.layers.square(vec_abs))
        vec_squashed = scalar_factor * fluid.layers.elementwise_div(vector, vec_abs)
        return vec_squashed

    def capsule(self, x, B_ij, j, pre_cap_num):
        '''
        This is the essence of the dynamic routing algorithm.
        Args:
            x: input vectors, a 4-D tensor, shape=(batch_size, pre_cap_num, pre_vector_units_num, 1)
            B_ij: routing assignment weights, shape=(1, pre_cap_num, cap_num, 1);
                  the weights of the j-th group are split out to compute capsule j
            j: index of the capsule currently being routed
            pre_cap_num: number of input capsules
        Returns:
            v_j: output 4-D tensor (a single capsule)
            B_ij: the routing weights, concatenated back together after routing
        Notes:
            uppercase B_ij (all routing weights) and lowercase b_j (the slice for capsule j)
            are different variables
        '''
        x = fluid.layers.reshape(x, (x.shape[0], pre_cap_num, -1))
        u_hat = getattr(self, 'u_hat_w' + str(j))(x)
        u_hat = fluid.layers.reshape(u_hat, (x.shape[0], pre_cap_num, -1, 1))
        shape_list = B_ij.shape  # e.g. (1, 1152, 10, 1)
        for i in range(self.routing_iters):
            C_ij = fluid.layers.softmax(B_ij, axis=2)
            # split out the weights belonging to capsule j
            b_blocks = fluid.layers.split(B_ij, shape_list[2], dim=2)
            c_blocks = fluid.layers.split(C_ij, shape_list[2], dim=2)
            b_j, c_j = b_blocks[j], c_blocks[j]
            v_j = fluid.layers.elementwise_mul(u_hat, c_j)
            v_j = fluid.layers.reduce_sum(v_j, dim=1, keep_dim=True)
            v_j = self.squash(v_j)
            v_j_expand = fluid.layers.expand(v_j, (1, pre_cap_num, 1, 1))
            # agreement (dot product) between the predictions u_hat and the output v_j
            u_v_produce = fluid.layers.elementwise_mul(u_hat, v_j_expand)
            u_v_produce = fluid.layers.reduce_sum(u_v_produce, dim=2, keep_dim=True)
            b_j += fluid.layers.reduce_sum(u_v_produce, dim=0, keep_dim=True)
            # concatenate the routing weights back together
            b_blocks[j] = b_j
            B_ij = fluid.layers.concat(b_blocks, axis=2)
        return v_j, B_ij

    def forward(self, x):
        '''
        Args:
            x: shape=(batch_size, pre_caps_num, vector_units_num, 1) or (batch_size, C, H, W);
               an input of shape=(batch_size, C, H, W) is vectorized into
               shape=(batch_size, pre_caps_num, vector_units_num, 1),
               satisfying C*H*W = vector_units_num*caps_num, where C >= caps_num
        Returns:
            capsules: a list containing caps_num capsules
        '''
        if x.shape[3] != 1:
            x = fluid.layers.reshape(x, (x.shape[0], self.pre_cap_num, -1))
            temp_x = fluid.layers.split(x, self.pre_vector_units_num, dim=2)
            temp_x = fluid.layers.concat(temp_x, axis=1)
            x = fluid.layers.reshape(temp_x, (x.shape[0], self.pre_cap_num, -1, 1))
            x = self.squash(x)
        # routing weights, initialized uniformly to 1/cap_num
        B_ij = fluid.layers.ones((1, x.shape[1], self.cap_num, 1), dtype='float32') / self.cap_num
        capsules = []
        for j in range(self.cap_num):
            cap_j, B_ij = self.capsule(x, B_ij, j, self.pre_cap_num)
            capsules.append(cap_j)
        return capsules
Loss function
A 10-dimensional one-hot vector is used as the label; it consists of nine zeros and a single one (at the position of the correct class). In the loss function, the coefficient T_c of the output capsule corresponding to the correct label is 1.
If the correct label is 9, then T_c is 1 for the loss term of the capsule corresponding to class 9 and 0 for the remaining nine capsules. When T_c is 1, the coefficient of the right-hand term in the formula is zero, so the loss of the correct output item only involves the left-hand term; when T_c is 0, the coefficient of the left-hand term is zero and the coefficient of the right-hand term is 1, so the loss of a wrong output item only involves the right-hand term.
|v| is the length of a capsule's output vector, and to some extent it represents the class probability. A margin m is introduced to measure whether this probability is appropriate (m_plus for the correct class and m_det for the wrong classes in the code below). The right-hand term of the formula also includes a coefficient λ to ensure numerical stability during training (λ is fixed at 0.5). Both terms are squared so that the loss function has an L2 form.
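Written out explicitly (following the standard CapsNet margin loss; the thresholds m^+ = 0.9 and m^- = 0.1 are the commonly used values and are an assumption here, since the article does not state them), the loss for class c is:

L_c = T_c \max(0,\ m^+ - \lVert v_c \rVert)^2 + \lambda (1 - T_c) \max(0,\ \lVert v_c \rVert - m^-)^2, \qquad \lambda = 0.5

and the total margin loss is obtained by summing (or averaging) L_c over the 10 classes.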
def get_loss_v(self, label):
    '''
    Compute the margin loss.
    Args:
        label: one-hot labels, shape=(32, 10)
    Notes:
        ReLU is used here to zero out negative values.
        m_plus: if the probability (|v|) of the correct output item is greater than this value,
                its loss is 0; the closer |v| is to it, the smaller the loss
        m_det:  if the probability (|v|) of a wrong output item is less than this value,
                its loss is 0; the closer |v| is to it, the smaller the loss
        (|v| is the length of the capsule's output vector)
    '''
    # Left term: although m_plus is a single value, broadcasting lets us subtract it from the (32, 10) tensor
    max_l = fluid.layers.relu(train_params['m_plus'] - self.output_caps_v_lenth)
    # square and reshape
    max_l = fluid.layers.reshape(fluid.layers.square(max_l),
                                 (train_params['batch_size'], -1))  # (32, 10)
    # the right term is computed in the same way
    max_r = fluid.layers.relu(self.output_caps_v_lenth - train_params['m_det'])
    max_r = fluid.layers.reshape(fluid.layers.square(max_r),
                                 (train_params['batch_size'], -1))  # (32, 10)
    # when merging the two terms, T_c is applied by element-wise multiplication with the one-hot labels
    margin_loss = fluid.layers.elementwise_mul(label, max_l) \
        + fluid.layers.elementwise_mul(1 - label, max_r) * train_params['lambda_val']
    self.margin_loss = fluid.layers.reduce_mean(margin_loss, dim=1)
Encoder
The complete network structure is divided into encoder and decoder. Let's take a look at the encoder first.
1. The input picture (28x28) first passes through a 1x256x9x9 convolution layer, giving 256 feature maps of size 20x20.
2. Then 8 groups of 256x32x9x9 convolutions (stride=2) are applied, giving 8 groups of 32x6x6 feature maps.
3. The obtained feature maps are vectorized into 10 capsules, and the lengths of the 10 capsules' output vectors are the probabilities of the individual classes.
class Capconv_Net(fluid.dygraph.Layer):
    def __init__(self):
        super(Capconv_Net, self).__init__()
        self.add_sublayer('conv0', fluid.dygraph.Conv2D(
            num_channels=1, num_filters=256, filter_size=9, padding=0, stride=1, act='relu'))
        for i in range(8):
            self.add_sublayer('conv_vector_' + str(i), fluid.dygraph.Conv2D(
                num_channels=256, num_filters=32, filter_size=9, stride=2, padding=0, act='relu'))

    def forward(self, x, v_units_num):
        x = getattr(self, 'conv0')(x)
        capsules = []
        for i in range(v_units_num):
            temp_x = getattr(self, 'conv_vector_' + str(i))(x)
            capsules.append(fluid.layers.reshape(temp_x, (train_params['batch_size'], -1, 1, 1)))
        x = fluid.layers.concat(capsules, axis=2)
        # squash is the same vector-squashing function defined above
        x = self.squash(x)
        return x
From the implementation code it is not hard to see that converting the feature maps into vectors really means flattening each group of two-dimensional maps into a one-dimensional array (the multiple two-dimensional maps in a group are flattened and concatenated one after another), and then stacking the one-dimensional arrays of all groups along a new dimension to form the vectors (see the figure below). Following this idea, I reduced the 8 separate convolutions to a single convolution; this removes the loop and vectorizes the result directly with the split and concat methods, which speeds up training. A sketch of that variant follows.
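A minimal sketch of that optimization (shapes and the single 256-to-256 convolution are assumptions for illustration, not the article's exact code):

import numpy as np
import paddle.fluid as fluid

with fluid.dygraph.guard():
    # feature maps after the first 256-channel 9x9 convolution: (batch, 256, 20, 20)
    feat = fluid.dygraph.to_variable(np.random.randn(2, 256, 20, 20).astype('float32'))
    # one 256->256 convolution with stride 2 replaces the 8 separate 256->32 convolutions
    conv_vector = fluid.dygraph.Conv2D(num_channels=256, num_filters=256,
                                       filter_size=9, stride=2, padding=0, act='relu')
    x = conv_vector(feat)                               # (batch, 256, 6, 6)
    groups = fluid.layers.split(x, 8, dim=1)            # 8 groups of 32 channels each
    groups = [fluid.layers.reshape(g, (g.shape[0], -1, 1, 1)) for g in groups]  # each (batch, 1152, 1, 1)
    x = fluid.layers.concat(groups, axis=2)             # (batch, 1152, 8, 1): 1152 primary capsules of dim 8
    print(x.shape)                                      # followed by squash(x), as in the original forward()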
Decoder
The decoder takes the 16-dimensional vector from the correct capsule and passes it through three fully connected layers to produce 784 pixel values; it learns to reconstruct a 28x28 image. The reconstruction loss is the Euclidean distance between the reconstructed image and the input image.
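The article does not list the decoder code; here is a minimal sketch following the common CapsNet decoder layout (the hidden sizes 512 and 1024 are assumptions, not taken from the article):

import paddle.fluid as fluid

class Decoder(fluid.dygraph.Layer):
    def __init__(self):
        super(Decoder, self).__init__()
        # three fully connected layers: 16 -> 512 -> 1024 -> 784
        self.fc1 = fluid.dygraph.Linear(input_dim=16, output_dim=512, act='relu')
        self.fc2 = fluid.dygraph.Linear(input_dim=512, output_dim=1024, act='relu')
        self.fc3 = fluid.dygraph.Linear(input_dim=1024, output_dim=784, act='sigmoid')

    def forward(self, v_correct):
        # v_correct: (batch, 16) output vector of the capsule of the correct class
        x = self.fc3(self.fc2(self.fc1(v_correct)))
        return fluid.layers.reshape(x, (-1, 1, 28, 28))   # reconstructed 28x28 image

The reconstruction loss can then be computed, for example, as fluid.layers.reduce_mean(fluid.layers.square(reconstruction - original_image)), i.e. the squared Euclidean distance mentioned above; in the original CapsNet paper this term is scaled by a small factor so that it does not dominate the margin loss.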
The pictures below show images reconstructed by the network I trained myself: the top row shows the original input images and the bottom row shows the network's reconstructions.
Let's play with it a bit more: halfway through training, all the images are transposed (which can be understood as flipping the image horizontally and vertically plus rotating it, i.e. changing the pose). The experimental results are as follows.
After reading the above, have you mastered how to reproduce a Capsule Network with PaddlePaddle? Thank you for reading!