
How to use pytorch to read datasets

2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article summarizes how to read datasets with PyTorch, with detailed steps for each case, and is intended as a practical reference.

Reading datasets with PyTorch

There are generally three situations when reading datasets with PyTorch.

The first kind

Reading an official dataset, such as ImageNet, CIFAR10, or MNIST.

These datasets can be loaded by calling torchvision.datasets.XXXX(). For example, to read the MNIST dataset:

    import torch
    import torch.nn as nn
    import torch.utils.data as Data
    import torchvision

    train_data = torchvision.datasets.MNIST(
        root='./mnist/',
        train=True,  # this is training data
        # Converts a PIL.Image or numpy.ndarray to a torch.FloatTensor of
        # shape (C x H x W) and normalizes it to the range [0.0, 1.0]
        transform=torchvision.transforms.ToTensor(),
        download=True,
    )

With download=True, this downloads the MNIST dataset from the Internet if it is not already present under root, and then loads it from the saved files.

A DataLoader object can then be defined directly, and training can begin. BATCH_SIZE and EPOCH are constants you define yourself.

    train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

    for epoch in range(EPOCH):
        for step, (b_x, b_y) in enumerate(train_loader):  # gives batch data, normalize x when iterate train_loader
            XXXX
            XXXX

The second kind

This case is more common and targets image classification.

It applies to multi-class image classification where the pictures are stored in the following layout:

root path / category (label) / picture

With this layout, the root path contains many folders, each folder stores the pictures of one class, and the folder name identifies the class. For example, if the root directory is learn_pytorch, each folder below it represents one class. The class names can be chosen freely; during loading they are automatically mapped to the indices 0, 1, 2, 3, … (in sorted order of the folder names).
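As a rough illustration of that mapping, the sketch below builds a throwaway directory tree and derives class indices from the sorted folder names, which is the ordering ImageFolder uses. The folder names (cat, dog, bird) and the helper folder_to_index are made up for this example, and it relies only on the standard library rather than on torchvision.

```python
import os
import tempfile


def folder_to_index(root):
    """Map each sub-folder of `root` to an integer label, in sorted name order."""
    classes = sorted(
        entry.name for entry in os.scandir(root) if entry.is_dir()
    )
    return {name: index for index, name in enumerate(classes)}


# Build a tiny example tree: root / class name / picture (hypothetical names).
with tempfile.TemporaryDirectory() as root:
    for class_name in ("dog", "cat", "bird"):
        os.makedirs(os.path.join(root, class_name))
        # Empty placeholder files stand in for real pictures.
        open(os.path.join(root, class_name, "0001.jpg"), "w").close()

    mapping = folder_to_index(root)

print(mapping)
```

The order in which the folders were created does not matter; only their names decide which index each class receives.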

Once the pictures are stored in this layout, they can be read directly with the ImageFolder class. ImageFolder is a derived class of Dataset, defined specifically for reading pictures stored in this layout, and it is provided by the torchvision library.

It can then be passed to a DataLoader as its dataset, like this:

    from torchvision import transforms
    from torchvision.datasets import ImageFolder

    data_transform = transforms.Compose([
        transforms.ToTensor(),
        # ImageFolder loads RGB images, so one mean/std value per channel is needed
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    ])
    dataset = ImageFolder("/home/xxx/learn_pytorch/", transform=data_transform)
    train_loader = Data.DataLoader(dataset=dataset, batch_size=BATCH_SIZE, shuffle=True)

Its constructor takes two main parameters: a root directory and a transform. Because each picture is read as a PIL image, ToTensor() is essential. transforms.Compose combines several operations into a single transform argument, which makes data augmentation convenient. The example above first converts to a tensor and then normalizes, without any augmentation. To augment the data, add operations such as random cropping and flipping, like the ones below.

    transforms.RandomResizedCrop(size)  # named RandomSizedCrop in old torchvision releases; size is the output size
    transforms.RandomHorizontalFlip()

Another question is which label each folder name is mapped to. This can be checked directly through the class_to_idx attribute of the dataset object.

For the dataset object created by ImageFolder, the first index selects a sample; within that sample, element 0 is the image tensor and element 1 is the label.

The next step is to build and train a model.

The training process is the same as in the first kind.

The third kind

This case is the most general. It applies when the task is not a classification problem, or when the label cannot be derived simply from a folder name.

The idea is to define a derived class of Dataset yourself, along with the data processing and data augmentation it needs. These operations can be defined as classes with a __call__() method.

The implementation process is as follows:

First

Define a derived class of Dataset that overrides two magic methods, __len__() and __getitem__().

__len__() is called when len(object) is evaluated. It is overridden to return the size of the dataset.

__getitem__() makes the object indexable (object[idx]), and once it is defined the object can also be traversed with a for statement. It is overridden to return one sample of the dataset for each index.
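To make the role of the two methods concrete, here is a minimal sketch in plain Python. SquaresDataset is a toy class invented for this illustration; it does not inherit from torch.utils.data.Dataset, it only shows which built-in operation triggers which method.

```python
class SquaresDataset:
    """A toy dataset whose i-th sample is the pair (i, i * i)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Called by len(dataset).
        return self.size

    def __getitem__(self, idx):
        # Called by dataset[idx]; raising IndexError is what ends a for loop.
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        return idx, idx * idx


dataset = SquaresDataset(4)
size = len(dataset)                        # uses __len__
third = dataset[2]                         # uses __getitem__
samples = [sample for sample in dataset]   # the for loop falls back to __getitem__

print(size, third, samples)
```

A Dataset subclass follows the same pattern, with __getitem__ loading and returning a real sample instead of a computed pair.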

Now define a derived class

    import os

    import pandas as pd
    from skimage import io
    from torch.utils.data import Dataset


    class FaceLandmarksDataset(Dataset):
        """Face Landmarks dataset."""

        def __init__(self, csv_file, root_dir, transform=None):
            """
            Args:
                csv_file (string): Path to the csv file with annotations.
                root_dir (string): Directory with all the images.
                transform (callable, optional): Optional transform to be applied
                    on a sample.
            """
            self.landmarks_frame = pd.read_csv(csv_file)
            self.root_dir = root_dir
            self.transform = transform

        def __len__(self):
            return len(self.landmarks_frame)

        def __getitem__(self, idx):
            img_name = os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0])
            image = io.imread(img_name)
            # to_numpy() replaces as_matrix(), which was removed in pandas 1.0
            landmarks = self.landmarks_frame.iloc[idx, 1:].to_numpy()
            landmarks = landmarks.astype('float').reshape(-1, 2)
            sample = {'image': image, 'landmarks': landmarks}

            if self.transform:
                sample = self.transform(sample)

            return sample

The constructor sets up some attributes, such as reading the table that describes the entire dataset, and __len__ returns the number of samples. __getitem__ returns one sample of the dataset. The return value can be a list containing the training sample and its label, or a dictionary, depending on how it is used later (the only difference is whether the parts are accessed by number or by key).

In addition, a Dataset generally accepts a transform to apply to each sample. If you do not want to augment the data, pass just a ToTensor (the data must be converted to tensors before training). If you do want augmentation, add further transform classes (ToTensor and the various augmentation operations are each a class from which you create an object), and chain them together with transforms.Compose. The transform above defaults to None, in which case samples are returned directly without any processing.

Then instantiate the class so that it can be passed to a DataLoader:

    face_dataset = FaceLandmarksDataset(csv_file='faces/face_landmarks.csv',
                                        root_dir='faces/')

Looking at this object: the arguments given when creating it are the ones required by the __init__ constructor, and __getitem__ is called automatically whenever the object is indexed. For example:

    for i in range(len(face_dataset)):
        sample = face_dataset[i]
        print(sample['image'])
        print(i, sample['image'].shape, sample['landmarks'].shape)

You can see that a dictionary is returned for each index.

A DataLoader could now be defined to iterate over the dataset in batches, but not yet: the samples must be converted to tensors before they can be fed to a model for training.

So the next step is to supply the transform of the Dataset class. So far it has been None, meaning no processing, so what comes out is still an image array. At the very least, ToTensor has to be implemented.

The magic method __call__() is the main tool for implementing the ToTensor class.

__call__() is special in that it makes the object itself callable. The object can be followed by parentheses and arguments, and __call__() is then invoked automatically.
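A minimal sketch of that mechanism, in plain Python: Scale is a made-up class for this illustration, not part of PyTorch. Its parameter is fixed when the object is created, and the object is then used like a function.

```python
class Scale:
    """Multiply every value in a sample by a fixed factor."""

    def __init__(self, factor):
        # Parameters are fixed once, when the object is created.
        self.factor = factor

    def __call__(self, sample):
        # Runs whenever the object is used like a function: scale(sample).
        return [value * self.factor for value in sample]


scale = Scale(2)
result = scale([1, 2, 3])       # equivalent to scale.__call__([1, 2, 3])
is_callable = callable(scale)

print(result, is_callable)
```

Transform classes follow this same split: configuration goes into __init__, and the per-sample work goes into __call__.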

The ToTensor class is implemented as follows. Note that NumPy images and torch image tensors order their axes differently: NumPy puts the channel axis last (H x W x C), while torch puts it first (C x H x W), so the axes also need to be swapped.

    class ToTensor(object):
        """Convert ndarrays in sample to Tensors."""

        def __call__(self, sample):
            image, landmarks = sample['image'], sample['landmarks']

            # swap color axis because
            # numpy image: H x W x C
            # torch image: C x H x W
            image = image.transpose((2, 0, 1))
            return {'image': torch.from_numpy(image),
                    'landmarks': torch.from_numpy(landmarks)}
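The axis swap on its own can be checked with a small NumPy-only sketch. The array sizes below (height 4, width 6) are arbitrary values chosen for the illustration.

```python
import numpy as np

# A fake RGB image in NumPy layout: height 4, width 6, 3 channels.
image_hwc = np.zeros((4, 6, 3), dtype=np.float32)
image_hwc[..., 0] = 1.0   # fill the first channel with ones

# Move the channel axis to the front: (H, W, C) -> (C, H, W).
image_chw = image_hwc.transpose((2, 0, 1))

print(image_hwc.shape)
print(image_chw.shape)
```

After the transpose, image_chw[0] is the whole first channel as a 4 x 6 plane, which is the layout torch convolution layers expect.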

To use it, first create an object; calling object(argument) then invokes __call__() automatically.

Next come the implementations of two data augmentation classes. They follow the same pattern: the argument of __call__() is sample, that is, one sample from the dataset.

    import numpy as np
    from skimage import transform


    class Rescale(object):
        """Rescale the image in a sample to a given size.

        Args:
            output_size (tuple or int): Desired output size. If tuple, output is
                matched to output_size. If int, smaller of image edges is matched
                to output_size keeping aspect ratio the same.
        """

        def __init__(self, output_size):
            assert isinstance(output_size, (int, tuple))
            self.output_size = output_size

        def __call__(self, sample):
            image, landmarks = sample['image'], sample['landmarks']

            h, w = image.shape[:2]
            if isinstance(self.output_size, int):
                if h > w:
                    new_h, new_w = self.output_size * h / w, self.output_size
                else:
                    new_h, new_w = self.output_size, self.output_size * w / h
            else:
                new_h, new_w = self.output_size

            new_h, new_w = int(new_h), int(new_w)

            img = transform.resize(image, (new_h, new_w))

            # h and w are swapped for landmarks because for images,
            # x and y axes are axis 1 and 0 respectively
            landmarks = landmarks * [new_w / w, new_h / h]

            return {'image': img, 'landmarks': landmarks}


    class RandomCrop(object):
        """Crop randomly the image in a sample.

        Args:
            output_size (tuple or int): Desired output size. If int, square crop
                is made.
        """

        def __init__(self, output_size):
            assert isinstance(output_size, (int, tuple))
            if isinstance(output_size, int):
                self.output_size = (output_size, output_size)
            else:
                assert len(output_size) == 2
                self.output_size = output_size

        def __call__(self, sample):
            image, landmarks = sample['image'], sample['landmarks']

            h, w = image.shape[:2]
            new_h, new_w = self.output_size

            # + 1 so that an image exactly as large as the crop is still accepted
            top = np.random.randint(0, h - new_h + 1)
            left = np.random.randint(0, w - new_w + 1)

            image = image[top: top + new_h,
                          left: left + new_w]

            landmarks = landmarks - [left, top]

            return {'image': image, 'landmarks': landmarks}

Both follow the same pattern: the constructor takes its parameters when the object is created, and the object is then applied to a sample by calling it directly, which invokes __call__().
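The size arithmetic inside Rescale can be followed with concrete numbers. The sketch below repeats only that arithmetic in plain Python; the helper names rescaled_size and rescale_landmarks, the image size, and the landmark points are all invented for the illustration.

```python
def rescaled_size(h, w, output_size):
    """Return (new_h, new_w) following the rule used by Rescale."""
    if isinstance(output_size, int):
        # Match the shorter edge to output_size and keep the aspect ratio.
        if h > w:
            new_h, new_w = output_size * h / w, output_size
        else:
            new_h, new_w = output_size, output_size * w / h
    else:
        new_h, new_w = output_size
    return int(new_h), int(new_w)


def rescale_landmarks(landmarks, h, w, new_h, new_w):
    """Scale (x, y) points: x follows the width, y follows the height."""
    return [(x * new_w / w, y * new_h / h) for x, y in landmarks]


h, w = 100, 200                       # a landscape image
new_h, new_w = rescaled_size(h, w, 50)
points = rescale_landmarks([(20, 10), (200, 100)], h, w, new_h, new_w)

print(new_h, new_w)
print(points)
```

Because the shorter edge (the height) is matched to 50, the width is halved as well, and every landmark coordinate shrinks by the same factor as the axis it lies on.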

They are used as follows:

    from torchvision import transforms

    transformed_dataset = FaceLandmarksDataset(csv_file='faces/face_landmarks.csv',
                                               root_dir='faces/',
                                               transform=transforms.Compose([
                                                   Rescale(256),
                                                   RandomCrop(224),
                                                   ToTensor()
                                               ]))

    for i in range(len(transformed_dataset)):
        sample = transformed_dataset[i]

        print(i, sample['image'].size(), sample['landmarks'].size())

        if i == 3:
            break

To analyze this: first an object of the custom Dataset class is created, with the transform argument set to a composition of the three transform classes defined above. Now look back at the definition of the class:

    self.transform = transform

This stores the object that combines the three classes.

    if self.transform:
        sample = self.transform(sample)

Here the stored object is called directly, which invokes the __call__() methods of the three classes in order and returns the processed sample.
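The chaining itself is simple enough to sketch in a few lines. MiniCompose below is a simplified stand-in written for this illustration, not the actual source of transforms.Compose, and AddOne and Double are toy transforms that work on numbers instead of image samples.

```python
class MiniCompose:
    """Simplified stand-in for transforms.Compose: apply callables in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, sample):
        # Feed the output of each transform into the next one.
        for transform in self.transforms:
            sample = transform(sample)
        return sample


class AddOne:
    def __call__(self, sample):
        return sample + 1


class Double:
    def __call__(self, sample):
        return sample * 2


add_then_double = MiniCompose([AddOne(), Double()])
double_then_add = MiniCompose([Double(), AddOne()])

print(add_then_double(3))   # (3 + 1) * 2
print(double_then_add(3))   # 3 * 2 + 1
```

The two results differ, which is why the order of the list matters: in the face landmarks example, Rescale must run before RandomCrop, and ToTensor comes last because the other two operate on NumPy arrays.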

Finally, we can iterate over the data for training.

    dataloader = DataLoader(transformed_dataset, batch_size=4, shuffle=True, num_workers=4)

Define a DataLoader object; the rest of the usage is the same as in the second kind, training in a double loop. One thing to know about DataLoader: each time you iterate over it, it returns data in the same form as the return value of the Dataset object, but with an extra dimension of size batch_size in front. In other words, the DataLoader takes batch_size samples from the dataset each time and stacks them. The stacking happens inside the list or dictionary, so each iteration still yields a list or dictionary whose entries are batched.
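The batching behavior can be observed without any image files. In the sketch below, ToyDictDataset is a hypothetical dataset that returns fixed-size fake tensors in a dictionary shaped like the one FaceLandmarksDataset produces; the sizes (10 samples, 3 x 8 x 8 images, 5 landmarks) are arbitrary, and the default collation of DataLoader is assumed.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDictDataset(Dataset):
    """Ten fake samples, each a dict with an image and its landmarks."""

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        return {
            'image': torch.zeros(3, 8, 8),                 # C x H x W
            'landmarks': torch.full((5, 2), float(idx)),   # 5 (x, y) points
        }


loader = DataLoader(ToyDictDataset(), batch_size=4, shuffle=False)

batch_shapes = []
for batch in loader:
    # Each batch is still a dict; every value gained a leading batch dimension.
    batch_shapes.append(
        (tuple(batch['image'].shape), tuple(batch['landmarks'].shape))
    )

print(batch_shapes)
```

With 10 samples and batch_size=4, the last batch holds only the 2 remaining samples; passing drop_last=True to DataLoader discards such a partial batch.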

PyTorch learning record

This is a simple model I put together as a quick test.

    import os
    import torch
    import torch.nn as nn
    import torch.utils.data as Data
    import torchvision
    import matplotlib.pyplot as plt
    from torchvision import transforms
    from torchvision.datasets import ImageFolder
    %matplotlib inline

    # define several parameters
    EPOCH = 20
    BATCH_SIZE = 4
    LR = 0.001  # assumed value; the source is truncated after "0.00"

    # read data
    data_transform = transforms.Compose([
        transforms.ToTensor(),
        # one mean/std value per RGB channel
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])
    dataset = ImageFolder("/home/xxx/learn_pytorch/", transform=data_transform)
    print(dataset[0][0].size())
    print(dataset.class_to_idx)

    # define the DataLoader
    train_loader = Data.DataLoader(dataset=dataset, batch_size=BATCH_SIZE, shuffle=True)

    # Define the model class, a subclass of nn.Module. The idea is to first define
    # each layer as an attribute of the model class, and then define a member
    # function forward() as the forward pass that connects the layers into the
    # whole model.
    class CNN(nn.Module):
        def __init__(self):
            super(CNN, self).__init__()
            self.conv1 = nn.Sequential(
                # The arguments of this layer are lost in the source; 3 input
                # channels (RGB), 16 output channels (required by conv2) and the
                # same kernel settings as the later layers are assumed.
                nn.Conv2d(3, 16, 5, 1, 2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2),
            )
            self.conv2 = nn.Sequential(
                nn.Conv2d(16, 32, 5, 1, 2),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.conv3 = nn.Sequential(
                nn.Conv2d(32, 64, 5, 1, 2),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.conv4 = nn.Sequential(
                nn.Conv2d(64, 128, 5, 1, 2),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.out1 = nn.Sequential(
                # 128 channels x 16 x 30 feature map; depends on the input image size
                nn.Linear(128 * 16 * 30, 1000),
                nn.ReLU(),
            )
            self.out2 = nn.Sequential(
                nn.Linear(1000, 100),  # input size must match the 1000 outputs of out1
                nn.ReLU(),
            )
            self.out3 = nn.Sequential(
                nn.Linear(100, 4),
            )

        def forward(self, x):
            x = self.conv1(x)
            x = self.conv2(x)
            x = self.conv3(x)
            x = self.conv4(x)
            x = x.view(x.size(0), -1)  # flatten the output of conv4 to (batch_size, features)
            x = self.out1(x)
            x = self.out2(x)
            output = self.out3(x)
            return output, x  # return x for visualization

    # if you train on the GPU, put the model and the tensors on the GPU
    cnn = CNN().cuda()
    print(cnn)

    # define the optimizer object and the loss function
    optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)  # optimize all cnn parameters
    loss_func = nn.CrossEntropyLoss()  # the target label is not one-hotted

    # The double loop starts training: the outer loop runs over the epochs, the
    # inner loop reads batch_size samples at a time and trains on them.
    for epoch in range(EPOCH):
        accy_count = 0
        for step, (b_x, b_y) in enumerate(train_loader):
            output = cnn(b_x.cuda())[0]
            loss = loss_func(output, b_y.cuda())  # calculate the loss
            optimizer.zero_grad()  # clear the gradients
            loss.backward()  # compute the gradients
            optimizer.step()  # apply the gradients

            output_index = torch.max(output, 1)[1].cpu().data.numpy()
            accy_count += float((output_index == b_y.data.numpy()).astype(int).sum())
        # divide by the number of samples, which stays correct when the last batch is smaller
        accuracy = accy_count / len(dataset)
        print("Epoch:", epoch, "accuracy is:", accuracy)

Note

When training on a GPU, put both the model and the tensors on the GPU by appending .cuda(). For example, when creating the model object: cnn.cuda()

And when feeding the model and computing the loss: b_x.cuda(), b_y.cuda()

To convert a tensor a to NumPy: a.data.numpy()

If the tensor is on the GPU, move it to the CPU first: a.cpu().data.numpy()

nn.CrossEntropyLoss() is a common pitfall. It already applies log-softmax internally before computing the negative log-likelihood loss, so do not add a softmax at the end of the model when using it. Otherwise the loss can only take values in a narrow range and will not go down.

The input images are passed to the model as a four-dimensional tensor of shape (batch_size, Nc, H, W).
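The CrossEntropyLoss note can be checked numerically. The sketch below uses random logits for 4 samples and 3 classes (arbitrary values for the illustration) and compares the loss on raw logits with the explicit two-step computation, then shows what happens when an extra softmax is applied first.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)             # raw model outputs: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])   # class indices, not one-hot vectors

# CrossEntropyLoss applied directly to the raw logits.
loss_direct = nn.CrossEntropyLoss()(logits, targets)

# The same value computed in two explicit steps.
loss_two_step = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# An extra softmax in front squeezes the inputs into [0, 1] before the loss.
loss_double_softmax = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), targets)

print(loss_direct.item(), loss_two_step.item(), loss_double_softmax.item())
```

The first two values agree. With the extra softmax, the loss for three classes is confined to roughly 0.55 to 1.55 whatever the model outputs, which matches the symptom described above of a loss that stays within a few values and cannot go down.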
