Getting Started with CNNs in TensorFlow: Handwritten Digit Recognition

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article introduces handwritten digit recognition as an introduction to CNNs in TensorFlow. The content is quite detailed; interested readers can use it for reference, and I hope you find it helpful.

One of the most exciting areas of deep learning is computer vision. With convolutional neural networks, we have been able to create self-driving car systems, face detection systems, automatic medical image analysis, and more. In this article I will show you the basic principles of convolutional neural networks and how to build your own system for classifying handwritten digits.


The workings of convolutional neural networks appear to replicate the biology of the human brain. As early as 1959, David Hubel and Torsten Wiesel studied cats and monkeys, and their research revealed how the animal visual cortex functions. They found that many neurons have a small local receptive field, that is, they respond only to a small, limited area of the entire visual field. They found that some neurons respond to particular patterns, such as horizontal lines, vertical lines, and circles. They also found that other neurons have larger receptive fields and are stimulated by more complex patterns that combine the information collected by lower-level neurons. These findings laid the foundation for what we now call convolutional neural networks. Next, we introduce the components of a convolutional neural network one by one.

1. Convolutional layer

Each convolutional layer in a convolutional neural network is composed of several convolution units, and the parameters of each unit are optimized by the backpropagation algorithm. The purpose of the convolution operation is to extract different features of the input: the first convolutional layer may only extract low-level features such as edges, lines, and corners, while deeper layers can iteratively extract more complex features from those low-level ones. You can picture each filter as a window that slides over the image and detects an attribute. The number of pixels the filter moves across the image is called the stride: a stride of 1 means the filter moves one pixel at a time, while a stride of 2 skips forward 2 pixels.

In the example above we see a vertical-line detector. The original image is 6x6 and is scanned with a 3x3 filter with a stride of 1, producing a 4x4 output. The filter is only interested in the left and right columns of its field of view: at each position, the pixels under the window are multiplied elementwise by the 3x3 filter and summed, giving a single value in the output feature map. The filter then moves one step to the right and the calculation repeats. This process continues until the 4x4 grid is filled. After that, the next feature map computes its own values using its own unique filter/kernel matrix.
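To make the sliding-window arithmetic concrete, here is a minimal pure-Python sketch of the operation just described: a 6x6 image scanned by a 3x3 vertical-line filter with a stride of 1, producing a 4x4 feature map. The image and filter values are hypothetical stand-ins, not the figures from the original example.

```python
def conv2d(image, kernel, stride=1):
    """Valid 2D cross-correlation: slide the kernel over the image,
    multiply elementwise, and sum each window."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                        for a in range(k) for b in range(k))
            row.append(total)
        out.append(row)
    return out

# A 6x6 image with a single bright vertical line in column 2.
image = [[0, 0, 1, 0, 0, 0] for _ in range(6)]

# A vertical-line detector: bright center column, dark flanks.
kernel = [[-1, 1, -1],
          [-1, 1, -1],
          [-1, 1, -1]]

feature_map = conv2d(image, kernel)
print(feature_map[0])  # [-3, 3, -3, 0] (strongest response over the line)
```

The filter fires most strongly where its bright center column lines up with the bright line in the image, which is exactly the "pattern detector" behavior described above.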

2. Pooling layer

The goal of the pooling layer is to further reduce dimensionality by aggregating (subsampling) the values collected by the convolutional layer. Besides providing some regularization to help the model avoid overfitting, this also reduces the amount of computation. Pooling follows the same sliding-window idea as the convolutional layer, but instead of computing a weighted sum of all the values, it takes the maximum or the average of its inputs. These are called max pooling and average pooling, respectively.
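A minimal sketch of max and average pooling under that same sliding-window idea, assuming a 2x2 window with a stride of 2 over a hypothetical 4x4 feature map:

```python
def pool2d(fmap, k=2, stride=2, mode="max"):
    """Max or average pooling over k x k windows."""
    out_size = (len(fmap) - k) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            window = [fmap[i * stride + a][j * stride + b]
                      for a in range(k) for b in range(k)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

# A hypothetical 4x4 feature map.
fmap = [[1, 3, 2, 0],
        [4, 6, 5, 1],
        [7, 2, 8, 3],
        [0, 1, 9, 4]]

print(pool2d(fmap))              # [[6, 5], [7, 9]]
print(pool2d(fmap, mode="avg"))  # [[3.5, 2.0], [2.5, 6.0]]
```

Note how the 4x4 input shrinks to 2x2 while keeping either the strongest or the average activation of each region.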

These two components are the key building blocks of a convolutional network. You then typically repeat this recipe, further reducing the size of the feature maps while increasing their depth. Each feature map specializes in recognizing its own unique shape. At the end of the convolutions, a fully connected layer with an activation function such as ReLU or SELU is placed to reshape the output to a size appropriate for the classifier. For example, if your final convolutional layer outputs a 3x3x128 matrix but you are only predicting 10 classes, you need to reshape it into a 1x1152 vector and gradually reduce its size before feeding it into the classifier. The fully connected layers also learn their own features, as in a typical deep neural network.

Now let's look at the implementation in TensorFlow on the MNIST handwritten digit dataset. First we load our libraries. Using fetch_mldata in sklearn, we load the MNIST dataset and assign the images and labels to the X and y variables (note that recent versions of scikit-learn have removed fetch_mldata in favor of fetch_openml). Then we create our training/test sets. Finally, we look at a few examples to understand the task.
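The loading code itself is not reproduced in this copy, so the following is a sketch of just the train/test split step, using a small synthetic stand-in for the downloaded MNIST arrays (a real run would assign the 70,000 images and labels fetched via scikit-learn to X and y):

```python
import random

# Hypothetical stand-in for the MNIST arrays: 100 samples of 784 pixels.
X = [[0] * 784 for _ in range(100)]
y = [i % 10 for i in range(100)]

random.seed(42)
idx = list(range(len(X)))
random.shuffle(idx)

# MNIST's conventional split is 60,000 train / 10,000 test (a 6:1 ratio).
split = int(len(X) * 6 / 7)
train_idx, test_idx = idx[:split], idx[split:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

print(len(X_train), len(X_test))  # 85 15
```

Shuffling before splitting matters because the raw MNIST arrays are ordered, and an unshuffled split would bias the class distribution of the test set.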

Next we apply some data augmentation, a reliable way to improve model performance. By creating slight variations of the training images, you add a form of regularization to the model. We will use SciPy's ndimage module to shift the images one pixel to the right, left, up, and down. This not only provides a greater variety of examples, it also greatly increases the size of our training set.
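As a sketch of what the shift augmentation does, here is a pure-Python equivalent of a one-pixel integer shift with zero fill (scipy.ndimage.shift behaves this way with cval=0 for whole-pixel shifts); the 2x2 image is a toy stand-in for a 28x28 digit:

```python
def shift_image(img, dx, dy, fill=0):
    """Shift a 2D image dx pixels right and dy pixels down, filling the
    vacated pixels with `fill` (what scipy.ndimage.shift does with
    cval=0 for whole-pixel shifts)."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            ni, nj = i + dy, j + dx
            if 0 <= ni < h and 0 <= nj < w:
                out[ni][nj] = img[i][j]
    return out

img = [[1, 2],
       [3, 4]]
print(shift_image(img, 1, 0))  # one pixel right: [[0, 1], [0, 3]]
print(shift_image(img, 0, 1))  # one pixel down:  [[0, 0], [1, 2]]
```

Applying all four one-pixel shifts to every training image quintuples the training set, as described above.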

The last data augmentation I'll show is creating horizontal flips of the images with the cv2 library. We also need to create labels for these flipped images, which is as simple as copying the original labels.

(Setting flipCode = 0 would instead produce a vertical flip.)
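A horizontal flip is just a left-right reversal of each row, which is what cv2.flip(img, 1) computes; this pure-Python sketch with a toy 2x2 image also shows the label copying mentioned above:

```python
def hflip(img):
    """Left-right flip of a 2D image, like cv2.flip(img, 1)."""
    return [row[::-1] for row in img]

# Toy 2x2 "images" and labels.
images = [[[1, 2],
           [3, 4]]]
labels = [7]

flipped = [hflip(im) for im in images]
aug_images = images + flipped
aug_labels = labels + labels[:]   # labels for flips are simple copies

print(flipped[0])   # [[2, 1], [4, 3]]
print(aug_labels)   # [7, 7]
```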

Next we create a helper function that feeds random mini-batches to our neural network. Because of the nature of convolutional layers, they require a large amount of memory during the forward and backward propagation steps. Consider a layer with 4x4 filters that outputs 128 feature maps, with stride 1 and SAME padding, over an RGB input of size 299x299. The number of parameters is (4x4x3 + 1) x 128 = 6,272. Now consider that each of the 128 feature maps computes 299x299 neurons, and each of these neurons computes a weighted sum of 4x4x3 inputs. That means 4x4x3x299x299x128 = 549,279,744 multiplications for just one training example.
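A minimal sketch of such a mini-batch helper (the name shuffle_batch is a common choice, not necessarily the article's), along with a check of the parameter count computed above:

```python
import random

def shuffle_batch(X, y, batch_size):
    """Yield one epoch of random mini-batches."""
    idx = list(range(len(X)))
    random.shuffle(idx)
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        yield [X[i] for i in batch], [y[i] for i in batch]

# Toy data: 10 samples, binary labels.
X = list(range(10))
y = [v % 2 for v in X]

batches = list(shuffle_batch(X, y, batch_size=4))
print([len(xb) for xb, _ in batches])  # [4, 4, 2]

# The parameter count quoted above: (4x4 filter x 3 channels + 1 bias)
# for each of the 128 feature maps.
assert (4 * 4 * 3 + 1) * 128 == 6272
```

Streaming small random batches keeps memory bounded regardless of training-set size, which is exactly why the helper is needed here.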

Now we begin building our network architecture. First we create placeholders for our training features and labels. We need to reshape the features into a (-1, 28, 28, 1) matrix, because TensorFlow's conv2d layer requires a 4-dimensional input. We set the first dimension to None to allow any batch size to be fed to the placeholder.
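The -1 in the reshape target lets TensorFlow infer the batch dimension from the amount of data. A small sketch of that inference rule:

```python
def infer_batch_dim(n_values, shape):
    """Infer the -1 dimension of a reshape target from the data size,
    the way TensorFlow's reshape does."""
    known = 1
    for d in shape:
        if d != -1:
            known *= d
    assert n_values % known == 0, "shape does not evenly divide the data"
    return n_values // known

# 32 flattened MNIST images of 784 pixels each, reshaped to (-1, 28, 28, 1):
print(infer_batch_dim(32 * 784, (-1, 28, 28, 1)))  # 32
```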

Then we design our convolutional layers. I took inspiration from the LeNet-5 architecture (pioneered by Yann LeCun), which is known for its success at classifying handwritten digits. I suggest you study LeNet-5 and other proven models to get a feel for which convolutional networks suit which tasks. Link: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf.

The first convolutional layer of our model consists of 12 feature maps, using a 3x3 filter with a stride of 1. We choose SAME padding, which maintains the size of the image by adding zero padding around the input, so starting from a 28x28x1 image the convolution outputs 28x28x12. We then apply a max pooling layer with a 3x3 filter and a stride of 2, which outputs a 13x13x12 matrix. We pass this matrix to the second convolutional layer, which has a depth of 16, a 3x3 filter, stride 1, and SAME padding, followed by the same pooling, which outputs a 6x6x16 matrix. You can see that we are shrinking the spatial dimensions of the feature maps while making them deeper. Next, two dense layers with the SELU activation function each reduce the number of inputs by roughly half, until we finally feed them into our logits.
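The layer sizes quoted above can be checked with the standard output-size formulas TensorFlow uses (ceil(n / stride) for SAME padding, ceil((n - k + 1) / stride) for VALID); this sketch assumes the pooling layers use VALID padding:

```python
import math

def conv_out(size, k, stride, padding):
    """Spatial output size of a conv/pool layer (TensorFlow convention)."""
    if padding == "SAME":
        return math.ceil(size / stride)
    return math.ceil((size - k + 1) / stride)  # VALID

sizes = [28]                                      # 28x28x1 input
sizes.append(conv_out(sizes[-1], 3, 1, "SAME"))   # conv1 (12 maps) -> 28
sizes.append(conv_out(sizes[-1], 3, 2, "VALID"))  # 3x3 max pool    -> 13
sizes.append(conv_out(sizes[-1], 3, 1, "SAME"))   # conv2 (16 maps) -> 13
sizes.append(conv_out(sizes[-1], 3, 2, "VALID"))  # 3x3 max pool    -> 6
print(sizes)  # [28, 28, 13, 13, 6]
```

The chain reproduces the 28x28x12, 13x13x12, and 6x6x16 shapes stated in the text.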

Then we create our loss function, in this case softmax cross-entropy, which outputs multi-class probabilities. You can think of cross-entropy as a measure of the distance between probability distributions over the classes. We choose AdamOptimizer (adaptive moment estimation), which automatically adjusts its learning rate as the gradient descends. Finally, we create a method to evaluate the results: TensorFlow's in_top_k function checks whether the true class is among the highest-scoring logits, and our accuracy variable then outputs a value between 0 and 1.
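For intuition, here is a minimal sketch of what softmax cross-entropy computes for a single example, with hypothetical logits (TensorFlow provides fused, numerically hardened versions of this; the sketch is illustration only):

```python
import math

def softmax(logits):
    """Stabilized softmax: exponentiate and normalize."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 1.0, 0.1]
# A correct, confident prediction yields a small loss; a wrong one, a large loss.
print(cross_entropy(logits, 0) < cross_entropy(logits, 2))  # True
```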

Now that we are ready for the training phase, let's see how our model performs.

At epoch 19 our accuracy was 0.9907. This is already better than any classical machine learning algorithm, so convolution has taken the lead. Now let's try our shift/flip functions and add two new elements to our network: dropout and batch normalization.

We use placeholder_with_default nodes to modify our existing placeholders; these hold the flags consumed by the batch normalization and dropout layers. During training we set these values to True, and during testing we turn them off by setting them to False.

Batch normalization simply standardizes the data of each batch; we specify a momentum of 0.9 for its running statistics. Dropout, our other regularizer, completely shuts off randomly chosen nodes during training. This forces the remaining nodes to pick up the slack, improving their effectiveness. Imagine a company that decides to randomly pick 50 employees each week to stay at home: the rest of the staff have to handle the extra work effectively and improve their skills in other areas.
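A sketch of the batch standardization step with a running-statistics momentum of 0.9, as described above (a single scalar feature rather than per-channel statistics, to keep it short; the momentum update rule shown is the convention used by tf.layers.batch_normalization):

```python
def batch_norm(batch, running_mean, running_var, momentum=0.9, eps=1e-5):
    """Standardize one batch of a single feature and update the running
    statistics with the given momentum (the tf.layers convention)."""
    m = sum(batch) / len(batch)
    v = sum((x - m) ** 2 for x in batch) / len(batch)
    normed = [(x - m) / (v + eps) ** 0.5 for x in batch]
    running_mean = momentum * running_mean + (1 - momentum) * m
    running_var = momentum * running_var + (1 - momentum) * v
    return normed, running_mean, running_var

normed, rm, rv = batch_norm([1.0, 2.0, 3.0], running_mean=0.0, running_var=1.0)
print(round(sum(normed), 6))  # 0.0 (zero mean after standardization)
```

The running statistics, not the per-batch ones, are what the layer uses at test time, which is why the training flag from the placeholders matters.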

Then we create our loss function, training step, and evaluation step, and make a few changes to the execution phase. The calculations performed by batch normalization are saved as update operations during each iteration. To access them, we assign a variable extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS). During the training operation, we feed it to sess.run as a list item alongside training_op. Finally, when making validation/test predictions, we assign False to the placeholder via feed_dict: we don't want any randomization in the prediction phase. To get the output, we run the logits operation on our test set. Let's see how the model behaves after adding regularization/normalization while training on the augmented data.

At epoch 29 we achieved 99.5% accuracy on the 10,000-digit test set. As you can see, the model reached 99% accuracy by the second epoch, compared with the 16th epoch in the previous model. Although 0.05% may not sound like much, it is a significant improvement when dealing with large amounts of data. Finally, I'll show you how to produce predictions by applying np.argmax to the logits output.
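Taking the index of the largest value in each row of logits is all np.argmax(logits, axis=1) does; here is a pure-Python sketch with hypothetical logits for three test images:

```python
def argmax(row):
    """Index of the largest value, i.e. what np.argmax computes per row."""
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical logits for three test images over the 10 digit classes.
logits = [
    [0.1, 0.2, 3.5, 0.0, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3],
    [2.9, 0.1, 0.0, 0.4, 0.2, 0.1, 0.0, 0.3, 0.1, 0.2],
    [0.0, 0.1, 0.2, 0.1, 0.0, 0.1, 0.3, 4.2, 0.1, 0.0],
]
predictions = [argmax(row) for row in logits]
print(predictions)  # [2, 0, 7]
```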

That's all for this introduction to CNNs in TensorFlow through handwritten digit recognition. I hope the content above was helpful and taught you something new. If you found the article useful, please share it so more people can see it.
