
How to realize MNIST handwriting recognition in Python


This article introduces how to implement MNIST handwriting recognition in Python. It is quite detailed and has reference value; interested readers are encouraged to read on.

1. A brief description of the experiment

1.1 Experimental environment

The software and hardware environment used in this experiment: under the Windows operating system, the Keras deep learning framework on a TensorFlow backend is used to train and test on MNIST.

We use the Keras deep learning framework. Keras is a Python library designed for straightforward assembly of neural networks, with a large number of prepackaged network types, including 2D and 3D convolutional networks, recurrent networks such as LSTMs, and broader general-purpose networks. Building a network with Keras is direct: its API is designed around layers, so network construction is quite intuitive. For these reasons we chose Keras, a framework that emphasizes user-friendliness, modularity, and extensibility.

1.2 Introduction to the MNIST dataset

MNIST (see its official website) is a very famous handwritten digit recognition dataset. It consists of images of handwritten digits together with their corresponding labels.

The MNIST dataset is divided into training images and test images: 60000 training images and 10000 test images. Each image represents a digit from 0 to 9 and is stored as a 28x28 matrix of pixels.

The dataset is distributed as four files:

train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
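As a minimal loading sketch, Keras ships a built-in loader that fetches the same data, so the files above do not need to be downloaded by hand (this assumes a TensorFlow/Keras installation):

from tensorflow.keras.datasets import mnist

# Downloads the data on first use and returns NumPy arrays.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)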

1.3 Data preprocessing

In the data preprocessing stage the images are normalized: we scale the pixel values down to the range 0 to 1 before feeding them to the neural network model. To do this, convert the image data from integers to floating-point numbers and then divide by 255; this makes training easier. Be sure to preprocess the training set and the test set in the same way.
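A minimal sketch of this step, assuming x_train and x_test as loaded above; the reshape adds the single channel dimension the CNN below expects (the MLP variant would flatten each image to 784 values instead):

# Convert pixels from integers in [0, 255] to floats in [0, 1],
# applying the same transformation to the training and test sets.
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0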

After that, the labels are one-hot encoded: each value of a discrete feature is mapped to a point in Euclidean space. Common methods for computing distance or similarity between features in machine learning algorithms are based on Euclidean space, so one-hot encoding discrete features makes distance calculations between them more reasonable.
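A sketch of the label encoding, using the to_categorical helper from Keras:

from tensorflow.keras.utils import to_categorical

# e.g. label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)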

2. The core code of the experiment

(1) MLP perceptron

# Build MLP (imports shown for completeness)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(units=256, input_dim=784, kernel_initializer='normal', activation='relu'))
model.add(Dense(units=128, kernel_initializer='normal', activation='relu'))
model.add(Dense(units=64, kernel_initializer='normal', activation='relu'))
model.add(Dense(units=10, kernel_initializer='normal', activation='softmax'))
model.summary()

(2) CNN convolutional neural network

# Build LeNet-5
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten

model = Sequential()
model.add(Conv2D(filters=6, kernel_size=(5, 5), padding='valid',
                 input_shape=(28, 28, 1), activation='relu'))                           # C1
model.add(MaxPooling2D(pool_size=(2, 2)))                                              # S2
model.add(Conv2D(filters=16, kernel_size=(5, 5), padding='valid', activation='relu'))  # C3
model.add(MaxPooling2D(pool_size=(2, 2)))                                              # S4
model.add(Flatten())
model.add(Dense(120, activation='tanh'))                                               # C5
model.add(Dense(84, activation='tanh'))                                                # F6
model.add(Dense(10, activation='softmax'))                                             # output
model.summary()

Model interpretation

During model training we use the LeNet-5 convolutional neural network structure, which is interpreted layer by layer below.

The first layer: convolution layer

The input of this layer is the raw image pixels; the classic LeNet-5 model accepts a 32x32x1 input (with our 28x28 MNIST images fed in directly, as in the code above, every size below shrinks accordingly). The first convolution layer uses 5x5 filters with depth (number of kernels) 6, no zero padding, and stride 1. Since no zero padding is used, the output size of this layer is 32-5+1=28, with depth 6. The layer has 5x5x1x6+6=156 trainable parameters, 6 of which are biases. Because the next layer's node matrix has 28x28x6=4704 nodes (neurons) and each node is connected to 5x5=25 nodes of the current layer, this convolution layer has 28x28x6x(5x5+1)=122304 connections in total.

The second layer: pooling layer

The input of this layer is the output of the first layer, a 28x28x6 node matrix (4704 nodes). This layer uses a 2x2 filter with a stride of 2 in both height and width, so the output matrix of this layer is 14x14x6. The subsampling filter used in the original LeNet-5 model differs slightly from the max pooling used here; we won't cover the difference in detail.

The third layer: convolution layer

The input matrix size of this layer is 14x14x6; the filter size is 5x5 and the depth is 16, with no zero padding and a stride of 1. The output matrix size of this layer is 10x10x16. As a standard convolution layer, this layer has 5x5x6x16+16=2416 trainable parameters and 10x10x16x(5x5+1)=41600 connections.

The fourth layer: pooling layer

The input matrix size of this layer is 10x10x16; the filter size is 2x2 and the stride is 2, so the output matrix size of this layer is 5x5x16.

The fifth layer: fully connected layer

The input matrix size of this layer is 5x5x16. Flattening this matrix into a vector makes it exactly the input of a fully connected layer. The layer has 120 output nodes, for a total of 5x5x16x120+120=48120 parameters.

The sixth layer: fully connected layer

This layer has 120 input nodes and 84 output nodes, for a total of 120x84+84=10164 parameters.

The seventh layer: fully connected layer

The structure of the final output layer in the original LeNet-5 model differs from a fully connected layer, but here we approximate it with one. The layer has 84 input nodes and 10 output nodes, for a total of 84x10+10=850 parameters.
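As a quick sanity check, the per-layer counts derived above can be recomputed in a few lines. Note that these figures follow the classic 32x32-input walkthrough; calling model.summary() on the 28x28-input Keras model built earlier will show slightly different numbers for the dense layers:

# Re-derive the trainable parameter counts (pooling layers add none).
c1 = 5*5*1*6 + 6         # 156
c3 = 5*5*6*16 + 16       # 2416
c5 = 5*5*16*120 + 120    # 48120
f6 = 120*84 + 84         # 10164
out = 84*10 + 10         # 850
print(c1 + c3 + c5 + f6 + out)  # 61706 trainable parameters in total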

Training process

Training begins once the initial parameters are set, and each run requires fine-tuning the parameters to obtain better results. After many attempts, the final parameters were set as follows (a minimal training sketch follows the list):

Optimizer: Adam

Number of training epochs: 10

Batch size: 500
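A minimal training sketch with these parameters; model, x_train, and y_train are assumed from the previous sections, and the validation_split value is illustrative:

model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    validation_split=0.2,  # hold out part of the training data for validation
                    epochs=10, batch_size=500, verbose=2)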

With the above parameters, the LeNet-5 convolutional neural network was trained on the MNIST dataset for 10 epochs and reached 95% accuracy on the training set.

3. Result analysis and summary

3.1 Model testing and result analysis

To verify the robustness of the model, the model that performed best on the validation set was saved under the above optimal parameters and then evaluated on the test set, where the final accuracy is 95.13%.
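A sketch of this save-the-best-then-test workflow, assuming the arrays from earlier; the file name best_model.h5 is illustrative (older Keras versions monitor 'val_acc' rather than 'val_accuracy'):

from tensorflow.keras.callbacks import ModelCheckpoint

# Keep only the weights that score best on the validation set.
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                             save_best_only=True)
model.fit(x_train, y_train, validation_split=0.2,
          epochs=10, batch_size=500, callbacks=[checkpoint])

# Final evaluation on the held-out test set.
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print('test accuracy:', test_acc)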

To better analyze our results, a confusion matrix is used to evaluate the model's performance. Before evaluating the model, let's review a few metrics.

TP (True Positive): positive samples predicted as positive (here, the true label is 0 and the prediction is also 0);
FN (False Negative): positive samples predicted as negative (the true label is 0 but the prediction is 1);
FP (False Positive): negative samples predicted as positive (the true label is 1 but the prediction is 0);
TN (True Negative): negative samples predicted as negative (the true label is 1 and the prediction is also 1).

These quantities define the confusion matrix:

A confusion matrix is a summary table that analyzes the predictions of a classification model in machine learning: it tallies the records in the dataset, in matrix form, by their true category and by the category the model predicts. The rows of the matrix represent the true values and the columns represent the predicted values. Taking this case as an example, the matrix can be computed and displayed as follows.
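A minimal sketch of computing it with scikit-learn, assuming the trained model and the one-hot encoded test labels from earlier:

import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(x_test), axis=1)  # predicted digit per image
y_true = np.argmax(y_test, axis=1)                 # undo the one-hot encoding
cm = confusion_matrix(y_true, y_pred)              # rows: true class, columns: predicted class
print(cm)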

3.2 Comparison of results

For comparison, a four-layer fully connected model (the MLP shown in Section 2) was trained on the same data, and its results were compared with those of the CNN.

In short, the results show that after continuous parameter tuning we trained a model with a classification accuracy of about 95%, and the experiments demonstrate the model's robustness.

3.3 Prediction of the model

As a final step, the trained model can be used to predict a single image.
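A minimal sketch, assuming the preprocessed x_test and trained model from the previous sections:

import numpy as np

img = x_test[0].reshape(1, 28, 28, 1)        # a batch containing a single image
probs = model.predict(img)                   # ten class probabilities
print('predicted digit:', np.argmax(probs))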

The above is all of the content of the article "How to realize MNIST handwriting recognition in Python". Thank you for reading, and I hope it helps!
