The method of reading mnist data set by python 07/03 Update SLTechnology News&Howtos


2025-07-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains how to read the MNIST dataset with Python, both from the original .gz files and from the decompressed .ubyte files.

Introduction to dataset format

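This format is widely documented online, so only a brief introduction is given here. The MNIST dataset, as usually downloaded, consists of four files:

t10k-images-idx3-ubyte.gz
t10k-labels-idx1-ubyte.gz
train-images-idx3-ubyte.gz
train-labels-idx1-ubyte.gz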

The first two files are the test-set images and labels, containing 10,000 samples; the last two are the training-set images and labels, containing 60,000 samples. The .gz extension marks a gzip-compressed file; decompressing it yields a binary file in .ubyte format.

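The image above shows the storage format of the training-set label and image files. Both files begin with a magic number followed by the number of items (labels or images); the second field is the useful one, since it gives the number of samples stored in the file. The image file additionally stores the number of rows and columns (28 each) before the pixel data. Also note the data widths: the header fields are 32-bit big-endian integers, while each label and each pixel is an unsigned 8-bit integer.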
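As a small sketch of this header layout, the snippet below packs a fake header with struct and parses it back. It assumes the standard IDX convention (magic number 2049 for label files and 2051 for image files, with the lowest byte of the magic number giving the number of dimensions); parse_idx_header is a hypothetical helper written for illustration, not part of any library.

```python
import struct


def parse_idx_header(data):
    """Hypothetical helper: return (magic, dims) from the start of IDX bytes."""
    # First 4 bytes: big-endian unsigned 32-bit magic number
    (magic,) = struct.unpack('>I', data[:4])
    # Assumed IDX convention: the low byte is the number of dimensions
    # (1 for a label file, 3 for an image file)
    ndim = magic & 0xFF
    # Each dimension size is another big-endian unsigned 32-bit integer
    dims = struct.unpack('>' + 'I' * ndim, data[4:4 + 4 * ndim])
    return magic, dims


# Fake image-file header: 60000 images of 28x28 pixels
header = struct.pack('>IIII', 2051, 60000, 28, 28)
print(parse_idx_header(header))  # (2051, (60000, 28, 28))
```

The same parser handles a label header, where only one size (the sample count) follows the magic number.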

Reading the .gz files

The gzip module is required (the code also uses os, struct and numpy).

The code to read the training set is as follows:

    import os
    import gzip
    import struct
    import numpy as np

    def load_mnist_train(path, kind='train'):
        '''path: directory containing the dataset
        kind: 'train' reads the training set'''
        labels_path = os.path.join(path, '%s-labels-idx1-ubyte.gz' % kind)
        images_path = os.path.join(path, '%s-images-idx3-ubyte.gz' % kind)
        # Open the compressed files with gzip
        with gzip.open(labels_path, 'rb') as lbpath:
            # struct.unpack reads the first two header fields: '>' means
            # big-endian (high-order byte first) and 'I' is an unsigned
            # 32-bit integer. lbpath.read(8) reads 8 bytes, so the two
            # values are the magic number and the number of samples
            magic, n = struct.unpack('>II', lbpath.read(8))
            # Interpret all remaining bytes as uint8 labels (np.frombuffer
            # replaces np.fromstring, which is deprecated for binary data)
            labels = np.frombuffer(lbpath.read(), dtype=np.uint8)
        with gzip.open(images_path, 'rb') as imgpath:
            # Image header: magic number, count, rows, columns
            magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
            # One row of 28*28 = 784 pixels per image
            images = np.frombuffer(imgpath.read(), dtype=np.uint8).reshape(len(labels), 784)
        return images, labels

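Reading the test set works the same way: the test files use the prefix t10k instead of train, so calling load_mnist_train(path, kind='t10k') reads them.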
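A slightly more defensive variant might look like the sketch below. It is an illustration under assumptions, not the article's code: load_mnist is a hypothetical name, the expected magic numbers 2049 and 2051 come from the standard IDX convention, and the image size is taken from the header instead of being hard-coded as 784.

```python
import gzip
import os
import struct

import numpy as np


def load_mnist(path, kind='train'):
    """Hypothetical loader: read <kind>-*-ubyte.gz files, kind 'train' or 't10k'."""
    labels_path = os.path.join(path, '%s-labels-idx1-ubyte.gz' % kind)
    images_path = os.path.join(path, '%s-images-idx3-ubyte.gz' % kind)
    with gzip.open(labels_path, 'rb') as f:
        magic, n = struct.unpack('>II', f.read(8))
        if magic != 2049:  # assumed magic number of an IDX label file
            raise ValueError('unexpected label magic number %d' % magic)
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    with gzip.open(images_path, 'rb') as f:
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        if magic != 2051:  # assumed magic number of an IDX image file
            raise ValueError('unexpected image magic number %d' % magic)
        # Image size comes from the header rather than a fixed 784
        images = np.frombuffer(f.read(), dtype=np.uint8).reshape(num, rows * cols)
    # The header counts and the number of labels actually read must agree
    if not (n == num == len(labels)):
        raise ValueError('label count %d does not match image count %d' % (n, num))
    return images, labels
```

Usage would then be, for a hypothetical directory data/mnist: X_train, y_train = load_mnist('data/mnist', kind='train') and X_test, y_test = load_mnist('data/mnist', kind='t10k').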

Reading the uncompressed files

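If you decompress the four files locally, you get plain .ubyte files, and the reading code changes: the files are opened with the built-in open instead of gzip.open, and np.fromfile can read the remaining data directly from the open file.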

    import os
    import struct
    import numpy as np

    def load_mnist_train(path, kind='train'):
        '''path: directory containing the dataset
        kind: 'train' reads the training set'''
        labels_path = os.path.join(path, '%s-labels-idx1-ubyte' % kind)
        images_path = os.path.join(path, '%s-images-idx3-ubyte' % kind)
        # The files are no longer compressed, so use the built-in open
        with open(labels_path, 'rb') as lbpath:
            # '>II': two big-endian unsigned 32-bit integers taken from the
            # first 8 bytes, i.e. the magic number and the number of samples
            magic, n = struct.unpack('>II', lbpath.read(8))
            # np.fromfile reads the remaining bytes from the current position
            labels = np.fromfile(lbpath, dtype=np.uint8)
        # The image file is uncompressed too, so plain open applies here as well
        with open(images_path, 'rb') as imgpath:
            magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
            images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), 784)
        return images, labels
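If you want one function for both layouts, a possible sketch is to pick the opener from the file extension. Because np.fromfile is meant for plain files on disk, this sketch reads the bytes and uses np.frombuffer, which works with either kind of file object. open_idx and read_idx are hypothetical names, and the dimension count is read from the low byte of the magic number, per the standard IDX convention.

```python
import gzip
import struct

import numpy as np


def open_idx(filename):
    """Hypothetical helper: open an IDX file, decompressing it if it ends in .gz."""
    opener = gzip.open if filename.endswith('.gz') else open
    return opener(filename, 'rb')


def read_idx(filename):
    """Hypothetical helper: read a label (idx1) or image (idx3) file into an array."""
    with open_idx(filename) as f:
        (magic,) = struct.unpack('>I', f.read(4))
        ndim = magic & 0xFF  # assumed: 1 for labels, 3 for images
        dims = struct.unpack('>' + 'I' * ndim, f.read(4 * ndim))
        # frombuffer works on bytes from either a gzip or a plain file object
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(dims)
```

For an image file this returns an array of shape (n, 28, 28); images.reshape(len(images), -1) gives the 784-column layout used above.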

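After reading, check the shapes of images and labels to confirm the data was read correctly: for the training set, images should have shape (60000, 784) and labels shape (60000,); for the test set, both should have 10000 rows. The sample count n from the label header should also equal len(labels).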
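These checks can be collected into a small function, sketched below. check_mnist is a hypothetical helper; it returns a list of problems (empty if everything looks right) rather than using assert, so the checks still run when Python is started with -O.

```python
import numpy as np


def check_mnist(images, labels, expected_count):
    """Hypothetical helper: return a list of problems found in loaded MNIST arrays."""
    problems = []
    # One row of 28*28 = 784 pixels per sample
    if images.shape != (expected_count, 784):
        problems.append('images.shape is %r, expected %r'
                        % (images.shape, (expected_count, 784)))
    # Exactly one label per sample
    if labels.shape != (expected_count,):
        problems.append('labels.shape is %r, expected %r'
                        % (labels.shape, (expected_count,)))
    # Digit labels must lie in 0..9 (uint8 cannot be negative)
    if labels.size and labels.max() > 9:
        problems.append('label values outside 0-9')
    return problems
```

For example, check_mnist(images, labels, 60000) should return an empty list for the training set, and check_mnist(images, labels, 10000) for the test set.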
