Juedaiyuer MNIST machine learning


MNIST is an entry-level computer vision dataset consisting of images of handwritten digits.

1. MNIST dataset

MNIST sounds rather fancy, doesn't it? So what exactly is it?

The (classic) MNIST dataset is used for the handwritten digit classification problem.

The official home of the MNIST dataset is Yann LeCun's website.

TensorFlow ships Python code that automatically downloads and reads this dataset. It lives in tensorflow/examples/tutorials/mnist/input_data.py:

"""Functions for downloading and reading MNIST data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import gzip
import os
import tempfile

import numpy
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets

Importing the dataset in your own project:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

2. InteractiveSession for TensorFlow

Before using TensorFlow, import it first:

import tensorflow as tf
sess = tf.InteractiveSession()

3. Computational graph

To perform efficient numerical computation in Python, we typically use libraries like NumPy, which push expensive operations such as matrix multiplication outside the Python interpreter, into highly efficient code implemented in another language.

Unfortunately, switching back into Python after every operation still incurs considerable overhead. The overhead is even worse if you want to run computations on a GPU or in a distributed setting, where much of the cost comes from transferring data.

TensorFlow also does its heavy lifting outside Python, but it goes a step further to avoid this overhead. Instead of running a single expensive operation independently outside Python, TensorFlow lets us describe a graph of interacting operations and then run the whole graph outside Python. This is similar to the approach taken by Theano and Torch.

The role of the Python code, then, is to build this graph that runs externally and to dictate which parts of the graph should be run. See the Computation Graph section of Basic Usage for more detail.
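As a minimal sketch of this idea (assuming TensorFlow 1.x, the version used throughout this article, with made-up constants): building the graph computes nothing; only running it in a session executes the operations.

import tensorflow as tf

# Building the graph: these lines only describe operations.
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b  # a graph node, not a number; nothing has run yet

# Running the graph: the multiplication executes outside Python here.
sess = tf.Session()
print(sess.run(c))  # 6.0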

4. Implementing the softmax regression model

4.1 Placeholders

We start building the computation graph by creating nodes for the input images x and the target output classes y_.

These interacting operations are described by manipulating symbolic variables, which can be created as follows:

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

x is not a specific value but a placeholder, a value we will feed in when we ask TensorFlow to run a computation. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector, so we represent them as a 2-dimensional tensor of floating-point numbers with shape [None, 784]. (Here None means the first dimension of the tensor can have any length.)

The target output y_ is also a 2-dimensional tensor, where each row is a 10-dimensional one-hot vector representing the class of the corresponding MNIST image.

Although the shape argument to placeholder is optional, supplying it lets TensorFlow automatically catch bugs caused by inconsistent tensor dimensions.
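Here is a small sketch (with hypothetical zero-filled data) of how a placeholder is filled in at run time, and how the declared shape catches mismatched input:

import numpy as np
import tensorflow as tf

x = tf.placeholder("float", shape=[None, 784])
doubled = x * 2.0

sess = tf.Session()
batch = np.zeros((5, 784))  # 5 fake flattened images
print(sess.run(doubled, feed_dict={x: batch}).shape)  # (5, 784)

# Feeding an array whose second dimension is not 784, e.g. np.zeros((5, 100)),
# raises an error because it conflicts with the declared shape.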

4.2 Variables

Our model also needs weights and biases. We could of course treat these as additional inputs (using placeholders), but TensorFlow has a better way to represent them: Variable. A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used in the computation and even modified by it. In machine learning applications, the model parameters are generally held in Variables.

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

We pass the initial value when we call tf.Variable. In this example we initialize both W and b to zeros. W is a 784x10 matrix (because we have 784 input features and 10 outputs), and b is a 10-dimensional vector (because we have 10 classes).

Variables must be initialized within a session before they can be used. This initialization step takes the initial values we specified (all zeros here) and assigns them to each variable; it can be done for all variables at once. (Later TensorFlow 1.x releases spell this sess.run(tf.global_variables_initializer()), which the complete code in section 8 uses.)

sess.run(tf.initialize_all_variables())

5. Class prediction

Now we can implement our model. It takes only one line of code to compute the softmax probability for each class:

y = tf.nn.softmax(tf.matmul(x,W) + b)

tf.matmul(x, W) multiplies x by W; it corresponds to Wx in the earlier equation, where x is a 2-dimensional tensor holding multiple inputs. We then add b and feed the sum into tf.nn.softmax.
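As a quick illustration of what this line computes (a NumPy sketch with made-up scores, not part of the tutorial's code), softmax turns the raw scores Wx + b into probabilities that sum to 1:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([1.0, 2.0, 0.5])  # hypothetical values of matmul(x, W) + b
probs = softmax(scores)
print(probs)        # approximately [0.231, 0.629, 0.140]
print(probs.sum())  # 1.0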

To train the model, we first need to define a metric for how good it is. Actually, in machine learning we usually define a metric for how bad a model is, called the cost or loss, and then try to minimize it. The two views are equivalent.

A very common, very elegant cost function is cross-entropy. Cross-entropy originated in information theory, in the study of information compression codes, but it has since become an important tool in fields ranging from game theory to machine learning. It is defined as:

H_{y'}(y) = -\sum_i y'_i \log(y_i)

where y is our predicted probability distribution and y' is the true distribution (the one-hot label vector). Roughly speaking, cross-entropy measures how inefficient our predictions are at describing the truth.

The formula translates directly into a loss function to minimize during training: the cross-entropy between the target classes and the predicted classes.

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

First, tf.log computes the logarithm of each element of y. Next, we multiply each element of y_ by the corresponding element of tf.log(y). Finally, tf.reduce_sum sums all the elements of the tensor. (Note that the cross-entropy here is not a measure of a single prediction/truth pair but the sum of the cross-entropies over every image in the minibatch. Performance over a whole minibatch describes the model better than performance on a single data point.)
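To see what this computes for a single image, here is a NumPy sketch with hypothetical values, assuming a one-hot label for the digit 2:

import numpy as np

# Hypothetical predicted distribution over the 10 digits (sums to 1)
y = np.array([0.02, 0.02, 0.70, 0.02, 0.02, 0.05, 0.05, 0.05, 0.05, 0.02])
# One-hot true label: the image is a 2
y_ = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

# -sum(y_ * log(y)) keeps only the log-probability of the true class
print(-np.sum(y_ * np.log(y)))  # -log(0.70), about 0.357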

6. Training the model

Now that we have defined what we want the model to do, training it with TensorFlow is easy. Because TensorFlow has a graph describing each of your computational units, it can automatically use the backpropagation algorithm to determine efficiently how your variables affect the cost you want to minimize. It then applies the optimization algorithm of your choice, repeatedly modifying the variables to reduce the cost.

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Here we ask TensorFlow to minimize cross-entropy using the gradient descent algorithm with a learning rate of 0.01. Gradient descent is a simple procedure: TensorFlow just shifts each variable a little bit in the direction that reduces the cost. TensorFlow also offers many other optimization algorithms; switching to one of them is as simple as tweaking one line of code.
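For instance, swapping in a different built-in optimizer really is a one-line change. (Adam with a learning rate of 1e-4 below is just an illustrative choice, not what this tutorial uses.)

# Same graph and loss; only the optimizer line changes.
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)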

What TensorFlow actually does when you call minimize is add, behind the scenes, a series of new operations to the graph describing your computation, implementing backpropagation and gradient descent. It then hands back a single operation that, when run, performs one step of gradient descent training, nudging the variables and steadily reducing the cost.

Now we can start training the model. Here we run the training step 1000 times!

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(50)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

An equivalent way to write this:

for i in range(1000):
    batch = mnist.train.next_batch(50)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})

At each step of the loop we grab a random batch of 50 data points from the training set, then run train_step, feeding the batch in place of the placeholders.

Training on small random batches of data is called stochastic training, here specifically stochastic gradient descent. Ideally we would use all of our data at every step of training, because that gives a better picture of the whole dataset, but it is computationally expensive. Instead we use a different subset of the data at each step, as sketched below; this is cheap while still capturing the overall characteristics of the dataset.
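As a small sketch of this (using the input_data reader loaded earlier), each call to next_batch hands back the next slice of a shuffled pass over the training set, so successive steps see different data:

batch_xs, batch_ys = mnist.train.next_batch(50)
print(batch_xs.shape, batch_ys.shape)  # (50, 784) (50, 10)
# Each call draws 50 fresh examples from the 55,000 training images,
# so every training step works with a different random subset.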

7. Evaluating our model

First let's find the labels we predicted correctly. tf.argmax is an extremely useful function that returns the index of the largest entry of a tensor along a given axis. Since the label vectors consist of 0s and 1s, the index of the maximum value 1 is the class label. For example, tf.argmax(y, 1) is the class our model considers most likely for each input, while tf.argmax(y_, 1) is the true label. We can use tf.equal to check whether the prediction matches the true label (matching indices mean a correct prediction).

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

This line of code gives us a list of Booleans. To determine the fraction of correct predictions, we cast the Booleans to floating-point numbers and take the mean. For example, [True, False, True, True] becomes [1, 0, 1, 1], which averages to 0.75.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

Finally, we calculate the accuracy of the learned model on the test dataset.

print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
# Output: 0.9092

8. Complete code

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist.train.images.shape, mnist.train.labels.shape)
print(mnist.test.images.shape, mnist.test.labels.shape)
print(mnist.validation.images.shape, mnist.validation.labels.shape)

import tensorflow as tf
sess = tf.InteractiveSession()

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

tf.global_variables_initializer().run()
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_step.run({x: batch_xs, y_: batch_ys})

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))
