2025-01-22 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
01 Introduction
When Yann LeCun published his work on a new neural network architecture, the convolutional neural network (CNN), it went largely unnoticed. Fourteen years later, at the 2012 ImageNet computer vision competition, a team of researchers from the University of Toronto brought CNNs into the public eye: classifying millions of images across thousands of categories, their network achieved an error rate of only 15.8%. Fast forward to today, and state-of-the-art convolutional neural networks achieve accuracy that exceeds human-level performance.
Figure: error rate on the ImageNet dataset.
Inspired by these promising results, I wanted to understand what CNNs do and why they perform so well. To fully understand this advance in artificial intelligence, I built a convolutional neural network from scratch in NumPy. After completing the project, I feel that convolutional neural networks appear far more complex on the surface than they actually are.
02 Challenges
CNNs are known for their ability to recognize patterns in images, so the task of the network described in this article is image classification. One of the most common benchmarks for measuring the performance of computer vision algorithms is training on the MNIST handwritten digit database, which contains 70,000 handwritten digits and their corresponding labels. The goal is to train a CNN to label handwritten digits (from 0 to 9) as accurately as possible.
Let's review the components that make up the network and how they connect to form predictions from the input data. After explaining each component, we will implement its functionality. In the final part of this article, we will use NumPy to program and train each part of the network. Without further ado, let's get started.
03 How a Convolutional Neural Network Learns
Convolutional layer (Convolutions)
A CNN uses filters (also known as kernels) to detect features, such as edges, in an image. A filter is simply a matrix of values, called weights, that is trained to detect a specific feature. The filter moves over each part of the image to check whether the feature it is trying to detect is present. To produce a value representing how confidently a particular feature is present, the filter performs a convolution operation: an element-wise multiplication of two matrices followed by a sum.
When the feature is present in a part of the image, convolving the filter with that part of the image yields a large real number. If the feature is absent, the resulting value is low.
In the following example, a filter responsible for detecting a right-leaning curve is passed over part of the image. Because that part of the image contains the same curve the filter is looking for, the result of the convolution operation is a large number (6600).
However, when the same filter passes over a part of the image with a quite different set of edges, the convolution output is very small, meaning no strong right-leaning curve is present.
For the convolutional neural network to learn filter values that detect features in the input data, the filter outputs must be passed through a nonlinear mapping: the result of convolving the filter with the input image is summed with a bias term and passed through a nonlinear activation function. The purpose of the activation function is to introduce nonlinearity into our network. Since the relationship we are modeling is nonlinear (it is not feasible to linearly model the pixels that form a handwritten digit), our model needs to take this into account.
Main points of the code:
With NumPy, we can program the convolution operation easily. The convolution function uses a for loop to convolve all the filters over the image. Within each iteration of the for loop, two while loops slide the filter across the image. At each step, the filter is multiplied element-wise (*) with a patch of the input image. The results of this element-wise multiplication are then summed with NumPy's sum method to obtain a single value, to which a bias term is added.
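The loop structure described above can be sketched as follows. This is a minimal illustration, not the author's exact code: the function name, argument shapes, and stride handling are my own assumptions.

```python
import numpy as np

def convolve(image, filters, bias, stride=1):
    """Slide each filter over the image; at every position, multiply
    element-wise, sum the products, and add the filter's bias term.

    image:   (n_channels, in_dim, in_dim)
    filters: (n_filters, n_channels, f, f)
    bias:    (n_filters,)
    """
    n_f, _, f, _ = filters.shape
    in_dim = image.shape[1]
    out_dim = (in_dim - f) // stride + 1          # output spatial size
    out = np.zeros((n_f, out_dim, out_dim))

    for curr_f in range(n_f):                     # one pass per filter
        curr_y = out_y = 0
        while curr_y + f <= in_dim:               # slide vertically
            curr_x = out_x = 0
            while curr_x + f <= in_dim:           # slide horizontally
                patch = image[:, curr_y:curr_y + f, curr_x:curr_x + f]
                out[curr_f, out_y, out_x] = np.sum(filters[curr_f] * patch) + bias[curr_f]
                curr_x += stride
                out_x += 1
            curr_y += stride
            out_y += 1
    return out
```

Note that the element-wise product `filters[curr_f] * patch` covers all input channels at once, so each output value summarizes the full depth of the input at that position.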
Downsampling
In order to speed up the training process and reduce the network's memory consumption, we try to reduce redundancy in the input features. There are several ways to downsample an image, but in this article we will focus on the most common one: max pooling.
In max pooling, a window passes over the image with a set stride (how many units it moves at a time). At each step, the maximum value within the window is pooled into an output matrix, hence the name max pooling.
In the following image, a window of size f = 2 passes over the image with a stride of 2. f denotes the size of the max-pooling window (the red box), and s denotes the number of units the window moves in the x and y directions. At each step, the maximum value within the window is selected:
Max pooling greatly reduces the size of the representation, thereby reducing the memory required and the number of operations performed later in the network.
Main points of the code:
The max-pooling operation boils down to one for loop and several while loops. The for loop traverses each channel of the input image, and the while loops slide the window over each part of the image. At each step, we use NumPy's max method to obtain the maximum value.
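A minimal sketch of that loop structure might look like this (the function name and default window size f = 2 with stride s = 2 are assumptions matching the figure above, not the author's exact code):

```python
import numpy as np

def maxpool(image, f=2, s=2):
    """Slide an f x f window over each channel with stride s,
    keeping only the maximum value in every window.

    image: (n_channels, h, w)
    """
    n_c, h_prev, w_prev = image.shape
    h = (h_prev - f) // s + 1                 # pooled height
    w = (w_prev - f) // s + 1                 # pooled width
    out = np.zeros((n_c, h, w))

    for c in range(n_c):                      # loop over channels
        curr_y = out_y = 0
        while curr_y + f <= h_prev:           # slide vertically
            curr_x = out_x = 0
            while curr_x + f <= w_prev:       # slide horizontally
                window = image[c, curr_y:curr_y + f, curr_x:curr_x + f]
                out[c, out_y, out_x] = np.max(window)
                curr_x += s
                out_x += 1
            curr_y += s
            out_y += 1
    return out
```

With f = 2 and s = 2, each channel shrinks to a quarter of its original area, which is where the memory and compute savings come from.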
Fully connected layer
In the fully connected operation of a neural network, the input representation is flattened into a feature vector and passed through a network of neurons to predict the output probabilities.
The rows of the downsampled feature maps are joined together to form a long feature vector. If there are multiple input layers, their rows are concatenated to form an even longer feature vector.
The feature vector is then passed through several dense layers. In each dense layer, the feature vector is multiplied by the layer's weights, its bias is added, and the result is passed through a nonlinearity. The following figure shows the fully connected operation and the dense layers:
Main points of the code:
NumPy makes it very easy to program the fully connected layer of a CNN. In fact, the flattening step can be done in a single line of code using NumPy's reshape method.
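The flatten-then-dense pattern can be sketched as below. The function names and the choice of ReLU as the nonlinearity are my own assumptions for illustration; the source does not specify which activation its dense layers use.

```python
import numpy as np

def flatten(feature_maps):
    # Collapse (n_channels, h, w) feature maps into one column vector:
    # this is the single reshape call mentioned above.
    n_c, h, w = feature_maps.shape
    return feature_maps.reshape((n_c * h * w, 1))

def dense(x, weight, bias):
    # One dense layer: multiply by the weights, add the bias,
    # then apply a nonlinearity (ReLU assumed here).
    z = weight.dot(x) + bias
    return np.maximum(0, z)
```

For example, `dense(flatten(pooled), w, b)` takes the pooled feature maps all the way to a hidden-layer activation in two calls.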
Output layer
The output layer of the CNN is responsible for producing a probability for each class (each digit) given the input image. To obtain these probabilities, we initialize the final dense layer with the same number of neurons as there are classes. The output of this dense layer is then passed through the Softmax activation function, which maps the final dense layer's outputs to a vector whose elements sum to 1.
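A minimal Softmax sketch is shown below; the max-subtraction trick is a standard numerical-stability detail I have added, not something stated in the source.

```python
import numpy as np

def softmax(z):
    # Shift by the maximum so the exponentials cannot overflow,
    # then normalize so the outputs sum to 1 (a probability vector).
    e = np.exp(z - np.max(z))
    return e / np.sum(e)
```

Subtracting `np.max(z)` changes nothing mathematically (it cancels in the ratio) but keeps `np.exp` well-behaved for large logits.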
After training, the network's average accuracy on the test set is 98%, which I think is quite good. When I extended training by 2-3 epochs, I found that test-set performance degraded. I speculate that around the third to fourth epoch, the network begins to overfit the training set and no longer generalizes.
That's all for how to build a convolutional neural network with NumPy for handwritten digit recognition. I hope the above content was helpful and taught you something new. If you found this article useful, please share it so more people can see it.