This article walks through how to build neural networks quickly and accurately with PyTorch. It is very hands-on, so I hope you will take something useful away from it. Let's dive in.
You have probably seen the PyTorch-versus-TensorFlow debate on social media many times. The popularity of these two frameworks has driven the rise of deep learning in recent years. Both have staunch supporters, but over the past year a clear winner has begun to emerge.
PyTorch was one of the most popular frameworks of 2018. It has quickly become the preferred deep learning framework of researchers in both academia and industry. After using PyTorch for the past few weeks, I can say it is a very flexible and easy-to-use deep learning library.
We will not only cover the theory, but also work through four different use cases to see how PyTorch performs. Building deep learning models has never been this much fun!
What is PyTorch?
Before delving into the implementation of PyTorch, let's take a look at what PyTorch is and why it has become so popular recently.
PyTorch is a Python-based scientific computing package, similar to NumPy but with GPU acceleration. It is also a deep learning framework that provides maximum flexibility and speed for building and implementing deep neural network architectures.
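For instance, here is a minimal sketch (the array values are arbitrary) of how a NumPy array becomes a tensor, moves onto a GPU when one is available, and comes back:

import numpy as np
import torch

## create a NumPy array and convert it into a PyTorch tensor
a = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.from_numpy(a)

## the same operations work on both, but a tensor can also live on a GPU
if torch.cuda.is_available():
    t = t.to("cuda")

print(t * 2)            ## elementwise math, just like NumPy
print(t.cpu().numpy())  ## and back to a NumPy array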
The recently released PyTorch 1.0 helps researchers address four major challenges:
Extensive rework
Time-consuming training
Inflexibility of the Python language
Slow scaling
In essence, PyTorch differs from other deep learning frameworks in two ways:
Imperative programming
Dynamic computation graphs
Imperative programming: PyTorch performs computations as it steps through each line of code, much the way an ordinary Python program executes. This concept is called imperative programming, and its greatest advantage is that it lets you debug code and program logic dynamically.
Dynamic computation graphs: PyTorch is a "define-by-run" framework, which means the computation graph structure (the neural network architecture) is generated at run time. The main advantage of this property is that it provides a flexible programming interface at runtime, making it easy to build and modify networks on the fly. In PyTorch a new computation graph is defined on every forward pass, in sharp contrast to the static graphs used by TensorFlow.
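As a small illustration (toy code, not from the original article), ordinary Python control flow can decide how deep the computation goes on each pass, and autograd still traces it:

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
## a plain Python loop becomes part of the graph for this particular run
while y.norm() < 100:
    y = y * 2
y.sum().backward()  ## gradients flow through however many iterations actually ran
print(x.grad)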
PyTorch 1.0 also ships with an important feature called torch.jit, a high-level compiler that lets users separate a model from its code. It additionally supports efficient model optimization on custom hardware such as GPUs or TPUs.
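As a rough sketch of what torch.jit offers (the module and file names here are made up for illustration):

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

## compile the model; the scripted version no longer needs the Python source
scripted = torch.jit.script(TinyNet())
scripted.save("tiny_net.pt")
loaded = torch.jit.load("tiny_net.pt")
print(loaded(torch.randn(1, 4)))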
Building a Neural Network with PyTorch
Let's understand PyTorch through an actual case. It's good to study theory, but it won't be of much use if you don't put it into practice.
A neural network implemented in PyTorch looks almost exactly like one implemented in NumPy. The goal of this section is to demonstrate that equivalence. To do so, let's create a simple three-layer network with five nodes in the input layer, three in the hidden layer, and one in the output layer. We will use a single training example with five features and one target.
import torch

n_input, n_hidden, n_output = 5, 3, 1
The first step is to initialize the parameters. Here the weight and bias parameters of each layer are initialized as tensors. Tensors are the basic data structure of PyTorch, used to build all types of neural networks. They can be regarded as generalizations of arrays and matrices; in other words, tensors are N-dimensional matrices.
## initialize tensors for inputs and outputs
x = torch.randn(1, n_input)
y = torch.randn(1, n_output)

## initialize tensor variables for weights
w1 = torch.randn(n_input, n_hidden)   # weights for hidden layer
w2 = torch.randn(n_hidden, n_output)  # weights for output layer

## initialize tensor variables for bias terms
b1 = torch.randn(1, n_hidden)   # bias for hidden layer
b2 = torch.randn(1, n_output)   # bias for output layer
After the parameters are initialized, the neural network can be defined and trained in four key steps:
Forward propagation
Loss calculation
Back propagation
Update parameters
Let's look at each step in more detail.
Forward propagation: in this step, activations are computed at every layer using the two formulas below. These activations flow from the input layer to the output layer to generate the final output.
1. z = weight * input + bias
2. a = activation_function(z)
The following code block shows how to write these steps in PyTorch. Note that most functions, such as exponents and matrix multiplication, are similar to those in NumPy.
## sigmoid activation function using pytorch
def sigmoid_activation(z):
    return 1 / (1 + torch.exp(-z))

## activation of hidden layer
z1 = torch.mm(x, w1) + b1
a1 = sigmoid_activation(z1)

## activation (output) of final layer
z2 = torch.mm(a1, w2) + b2
output = sigmoid_activation(z2)
Loss calculation: this step calculates the error (also known as loss) in the output layer. A simple loss function can be used to measure the difference between actual and predicted values. Later, we will look at the different types of loss functions available in PyTorch.
loss = y - output
Back propagation: the purpose of this step is to minimize the error at the output layer by making marginal changes to the weights and biases, where the changes are computed from the derivative of the error term.
Following the chain rule of calculus, these incremental changes are passed back to the hidden layer, and its weights and biases are adjusted accordingly. It is this adjustment of the weights and biases that minimizes the error.
## function to calculate the derivative of the sigmoid activation
def sigmoid_delta(x):
    return x * (1 - x)

## compute derivative of error terms
delta_output = sigmoid_delta(output)
delta_hidden = sigmoid_delta(a1)

## backpass the changes to previous layers
d_outp = loss * delta_output
loss_h = torch.mm(d_outp, w2.t())
d_hidn = loss_h * delta_hidden
Update parameters: the final step is to update the weights and biases using the incremental changes received from back propagation.
learning_rate = 0.1
w2 += torch.mm(a1.t(), d_outp) * learning_rate
w1 += torch.mm(x.t(), d_hidn) * learning_rate
b2 += d_outp.sum() * learning_rate
b1 += d_hidn.sum() * learning_rate
When these steps are repeated over many epochs with a large number of training examples, the loss is driven down to a minimum. Once the final weights and biases are obtained, they can be used to make predictions on unseen data.
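Putting the four steps together, a minimal sketch of such a loop looks like this (assuming the tensors x, y, w1, w2, b1, b2 and the helper functions defined above; the epoch count is arbitrary):

learning_rate = 0.1
for epoch in range(100):
    ## 1. forward propagation
    z1 = torch.mm(x, w1) + b1
    a1 = sigmoid_activation(z1)
    z2 = torch.mm(a1, w2) + b2
    output = sigmoid_activation(z2)

    ## 2. loss calculation
    loss = y - output

    ## 3. back propagation
    d_outp = loss * sigmoid_delta(output)
    loss_h = torch.mm(d_outp, w2.t())
    d_hidn = loss_h * sigmoid_delta(a1)

    ## 4. update parameters
    w2 += torch.mm(a1.t(), d_outp) * learning_rate
    w1 += torch.mm(x.t(), d_hidn) * learning_rate
    b2 += d_outp.sum() * learning_rate
    b1 += d_hidn.sum() * learning_rate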
Use case 1: handwritten digit classification
In the previous section, we saw a simple use case for writing a neural network in PyTorch. In this section, we will build and train neural networks using the utility packages provided by PyTorch (nn, autograd, optim, torchvision, torchtext, and so on).
These packages make it easy to define and manage neural networks. In this use case, we will build a multilayer perceptron (MLP) as a handwritten digit classifier, using the MNIST dataset from the torchvision package.
As with any project you're going to work on, the first step is data preprocessing: you need to convert the original data set into a tensor and normalize it within a fixed range. The torchvision package provides a utility called transforms that allows you to combine different transformations together.
from torchvision import transforms

_tasks = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # MNIST images have a single channel
])
The first transformation converts the raw data into tensors, and the second normalizes them according to:

x_normalized = (x - mean) / std

The value 0.5 is used for both the mean and the standard deviation; MNIST images have a single grayscale channel, so one value suffices for each.
from torchvision.datasets import MNIST

## load the MNIST dataset and apply transformations
mnist = MNIST("data", download=True, train=True, transform=_tasks)
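As a quick, optional sanity check (not part of the original walkthrough), you can confirm that a transformed sample has the expected shape and value range:

img, label = mnist[0]
print(img.shape)             ## torch.Size([1, 28, 28])
print(img.min(), img.max())  ## values fall in [-1, 1] after normalization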
Another excellent PyTorch utility is the DataLoader iterator, which handles batching, shuffling, and loading data in parallel across multiple worker processes. To evaluate the model, we split the dataset into a training set and a validation set.
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

## create training and validation split
split = int(0.8 * len(mnist))
index_list = list(range(len(mnist)))
train_idx, valid_idx = index_list[:split], index_list[split:]

## create sampler objects using SubsetRandomSampler
tr_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(valid_idx)

## create iterator objects for train and valid datasets
trainloader = DataLoader(mnist, batch_size=256, sampler=tr_sampler)
validloader = DataLoader(mnist, batch_size=256, sampler=val_sampler)
A neural network architecture in PyTorch is defined as a class that inherits from nn.Module, the base class of the nn package. Inheriting from nn.Module makes it easy to implement, access, and call the many methods it provides; the layers are defined in the class constructor, and the forward pass is defined in the forward method.
We will define a network with the layer configuration [784, 128, 10]: 784 nodes in the input layer (28 x 28 pixels), 128 in the hidden layer, and 10 in the output layer. In the forward function, we apply a sigmoid activation to the hidden layer (here via torch.sigmoid).
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(784, 128)
        self.output = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten the 28 x 28 images
        x = torch.sigmoid(self.hidden(x))
        x = self.output(x)
        return x

model = Model()
Define the loss function and the optimizer using the nn and optim packages:
from torch import optim

loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-6, momentum=0.9, nesterov=True)
Now that you are ready to start training the model, the core steps are the same as in the previous section: forward propagation, loss calculation, back propagation, and updating parameters.
import numpy as np

for epoch in range(1, 11):  ## run the model for 10 epochs
    train_loss, valid_loss = [], []

    ## training part
    model.train()
    for data, target in trainloader:
        optimizer.zero_grad()
        ## 1. forward propagation
        output = model(data)
        ## 2. loss calculation
        loss = loss_function(output, target)
        ## 3. backward propagation
        loss.backward()
        ## 4. weight optimization
        optimizer.step()
        train_loss.append(loss.item())

    ## evaluation part
    model.eval()
    for data, target in validloader:
        output = model(data)
        loss = loss_function(output, target)
        valid_loss.append(loss.item())

    print("Epoch:", epoch, "Training Loss:", np.mean(train_loss), "Valid Loss:", np.mean(valid_loss))
> Epoch: 1 Training Loss: 0.645777 Valid Loss: 0.344971
> Epoch: 2 Training Loss: 0.320241 Valid Loss: 0.299313
> Epoch: 3 Training Loss: 0.278429 Valid Loss: 0.269018
> Epoch: 4 Training Loss: 0.246289 Valid Loss: 0.237785
> Epoch: 5 Training Loss: 0.217010 Valid Loss: 0.217133
> Epoch: 6 Training Loss: 0.193017 Valid Loss: 0.206074
> Epoch: 7 Training Loss: 0.174385 Valid Loss: 0.180163
> Epoch: 8 Training Loss: 0.157574 Valid Loss: 0.170064
> Epoch: 9 Training Loss: 0.144316 Valid Loss: 0.162660
> Epoch: 10 Training Loss: 0.133053 Valid Loss: 0.152957
Once the model has been trained, predictions can be made on the validation data.
## dataloader for validation dataset
dataiter = iter(validloader)
data, labels = next(dataiter)

output = model(data)
_, preds_tensor = torch.max(output, 1)
preds = np.squeeze(preds_tensor.numpy())

print("Actual:", labels[:10])
print("Predicted:", preds[:10])
> Actual: [0 1 1 1 2 2 8 8 2 8]
> Predicted: [0 1 1 1 2 2 8 8 2 8]
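Eyeballing ten predictions only goes so far; a short sketch like the following (my addition, not from the original code) computes accuracy over the entire validation split:

correct, total = 0, 0
model.eval()
with torch.no_grad():
    for data, target in validloader:
        output = model(data)
        _, preds = torch.max(output, 1)
        correct += (preds == target).sum().item()
        total += target.size(0)
print("Validation accuracy:", correct / total)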
Use case 2: object image classification
Now let's go one step further.
In this use case, we will build a convolutional neural network (CNN) in PyTorch to classify object images from the popular CIFAR-10 dataset, also included in the torchvision package. The process of defining and training the model is the same as in the previous use case, except that the network gains some additional layers.
Load and convert the dataset:
## load the dataset
from torchvision.datasets import CIFAR10

_tasks = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # CIFAR-10 images have three channels
])
cifar = CIFAR10('data', train=True, download=True, transform=_tasks)

## create training and validation split
split = int(0.8 * len(cifar))
index_list = list(range(len(cifar)))
train_idx, valid_idx = index_list[:split], index_list[split:]

## create training and validation sampler objects
tr_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(valid_idx)

## create iterator objects for train and valid datasets
trainloader = DataLoader(cifar, batch_size=256, sampler=tr_sampler)
validloader = DataLoader(cifar, batch_size=256, sampler=val_sampler)
We will create three convolutional layers for low-level feature extraction, three max pooling layers to retain the strongest activations, and two linear layers for classification.
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        ## define the layers
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.linear1 = nn.Linear(1024, 512)
        self.linear2 = nn.Linear(512, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 1024)  ## reshaping: 64 channels x 4 x 4
        x = F.relu(self.linear1(x))
        x = self.linear2(x)
        return x

model = Model()
Define loss functions and optimizers:
import torch.optim as optim

loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-6, momentum=0.9, nesterov=True)
## run for 30 epochs
for epoch in range(1, 31):
    train_loss, valid_loss = [], []

    ## training part
    model.train()
    for data, target in trainloader:
        optimizer.zero_grad()
        output = model(data)
        loss = loss_function(output, target)
        loss.backward()
        optimizer.step()
        train_loss.append(loss.item())

    ## evaluation part
    model.eval()
    for data, target in validloader:
        output = model(data)
        loss = loss_function(output, target)
        valid_loss.append(loss.item())

    print("Epoch:", epoch, "Training Loss:", np.mean(train_loss), "Valid Loss:", np.mean(valid_loss))
Once the model has been trained, predictions can be made on the validation data.
## dataloader for validation dataset
dataiter = iter(validloader)
data, labels = next(dataiter)

output = model(data)
_, preds_tensor = torch.max(output, 1)
preds = np.squeeze(preds_tensor.numpy())

print("Actual:", labels[:10])
print("Predicted:", preds[:10])
Actual: ['truck', 'horse', 'bird', 'truck', 'ship', 'bird', 'deer', 'bird']
Predicted: ['truck', 'automobile', 'automobile', 'horse', 'bird', 'airplane', 'ship', 'bird', 'deer', 'bird']
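Note that the model itself outputs integer class indices; the names above come from the standard CIFAR-10 label order. A small helper (my addition, for readability) performs that mapping:

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
print("Actual:   ", [classes[i] for i in labels[:10]])
print("Predicted:", [classes[i] for i in preds[:10]])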
Use case 3: sentiment text classification
We will now move from computer vision to natural language processing, to demonstrate PyTorch's applications across different domains.
In this section, we will use PyTorch with RNN (recurrent neural network) and LSTM (long short-term memory) layers for a text classification task. First, load the dataset, which contains two fields (text and target). The target contains two classes, class1 and class2, and our task is to classify each text into one of these classes.
You can download the dataset in the link below.
https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2019/01/train.csv
import pandas as pd

train = pd.read_csv("train.csv")
x_train = train["text"].values
y_train = train["target"].values
It is strongly recommended to set random seeds before you start coding. This ensures the results you see match mine, which is very useful (and reassuring) when learning new concepts.
np.random.seed(123)
torch.manual_seed(123)
torch.cuda.manual_seed(123)
torch.backends.cudnn.deterministic = True
In the preprocessing step, the text data is first converted into sequences of tokens, which can then be passed to an embedding layer. I will use the utilities in the Keras package for preprocessing; the torchtext package works just as well.
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

## create tokens
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(x_train)
word_index = tokenizer.word_index

## convert texts to padded sequences
x_train = tokenizer.texts_to_sequences(x_train)
x_train = pad_sequences(x_train, maxlen=70)
Next, the tokens need to be converted into vectors. For this we use pre-trained GloVe word embeddings: we load them and build an embedding matrix containing a word vector for every word in the vocabulary.
GloVe:
https://github.com/stanfordnlp/GloVe
EMBEDDING_FILE = 'glove.840B.300d.txt'

embeddings_index = {}
for i, line in enumerate(open(EMBEDDING_FILE)):
    val = line.split()
    embeddings_index[val[0]] = np.asarray(val[1:], dtype='float32')

embedding_matrix = np.zeros((len(word_index) + 1, 300))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
Define the model architecture using an embedding layer and an LSTM layer:
max_features = len(word_index) + 1  # vocabulary size
embed_size = 300                    # GloVe vector dimension

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        ## embedding layer, initialized with the pre-trained GloVe weights
        self.embedding = nn.Embedding(max_features, embed_size)
        et = torch.tensor(embedding_matrix, dtype=torch.float32)
        self.embedding.weight = nn.Parameter(et)
        self.embedding.weight.requires_grad = False
        self.embedding_dropout = nn.Dropout2d(0.1)  # defined as in the original; not applied in this simplified forward
        self.lstm = nn.LSTM(300, 40, batch_first=True)  # batch dimension first
        self.linear = nn.Linear(40, 16)
        self.out = nn.Linear(16, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        h_embedding = self.embedding(x)
        h_lstm, _ = self.lstm(h_embedding)
        max_pool, _ = torch.max(h_lstm, 1)  ## max pooling over the sequence dimension
        linear = self.relu(self.linear(max_pool))
        out = self.out(linear)
        return out

model = Model()
Create training and validation sets:
from torch.utils.data import TensorDataset

## create training and validation split
split_size = int(0.8 * len(x_train))
index_list = list(range(len(x_train)))
train_idx, valid_idx = index_list[:split_size], index_list[split_size:]

## create iterator objects for train and valid datasets
x_tr = torch.tensor(x_train[train_idx], dtype=torch.long)
y_tr = torch.tensor(y_train[train_idx], dtype=torch.float32)
train_ds = TensorDataset(x_tr, y_tr)
trainloader = DataLoader(train_ds, batch_size=128)

x_val = torch.tensor(x_train[valid_idx], dtype=torch.long)
y_val = torch.tensor(y_train[valid_idx], dtype=torch.float32)
valid_ds = TensorDataset(x_val, y_val)
validloader = DataLoader(valid_ds, batch_size=128)
Define the loss function and the optimizer:

loss_function = nn.BCEWithLogitsLoss(reduction='mean')
optimizer = optim.Adam(model.parameters())
Train the model:

## run for 10 epochs
for epoch in range(1, 11):
    train_loss, valid_loss = [], []

    ## training part
    model.train()
    for data, target in trainloader:
        optimizer.zero_grad()
        output = model(data)
        loss = loss_function(output, target.view(-1, 1))
        loss.backward()
        optimizer.step()
        train_loss.append(loss.item())

    ## evaluation part
    model.eval()
    for data, target in validloader:
        output = model(data)
        loss = loss_function(output, target.view(-1, 1))
        valid_loss.append(loss.item())
Finally, obtain the predictions:

dataiter = iter(validloader)
data, labels = next(dataiter)

output = model(data)
## the model emits a single logit per example, so threshold the sigmoid at 0.5
preds = (torch.sigmoid(output) > 0.5).int().view(-1).numpy()

print("Actual:", labels[:10].int().numpy())
print("Predicted:", preds[:10])
Actual: [0 1 1 1 0 0 0]
Predicted: [0 1 1 1 0 0]
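To score a brand-new piece of raw text end to end, a sketch like this works (the helper name predict_sentiment is mine; it assumes the tokenizer and model defined above):

def predict_sentiment(text):
    seq = tokenizer.texts_to_sequences([text])
    seq = pad_sequences(seq, maxlen=70)
    x = torch.tensor(seq, dtype=torch.long)
    model.eval()
    with torch.no_grad():
        prob = torch.sigmoid(model(x)).item()
    return int(prob > 0.5), prob

label, prob = predict_sentiment("what wonderful news")
print(label, prob)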
Use case 4: image style transfer
Let's look at a final use case: image style transfer. This is one of the most creative projects I have worked on, and I hope you have fun with it too. The basic idea behind style transfer is:
Get objects / content from an image
Get a style / texture from another image
Generate the final image of the mixture of the two
The concept was introduced in the paper "Image Style Transfer Using Convolutional Neural Networks". An example of style transfer:
That's great, isn't it? Let's see how it is implemented in PyTorch. This process consists of six steps:
Extract low-level features from the two input images. This can be done with a pre-trained deep learning model such as VGG19.
from torchvision import models

## get the features portion from VGG19
vgg = models.vgg19(pretrained=True).features

## freeze all VGG parameters
for param in vgg.parameters():
    param.requires_grad_(False)

## check if GPU is available
device = torch.device("cpu")
if torch.cuda.is_available():
    device = torch.device("cuda")
vgg.to(device)
Load the two images onto the device and extract their features from VGG. The following transformations are applied as well: resizing the tensors and normalizing the values.
from PIL import Image
from torchvision import transforms as tf

def transformation(img):
    tasks = tf.Compose([tf.Resize(400), tf.ToTensor(),
                        tf.Normalize((0.44, 0.44, 0.44), (0.22, 0.22, 0.22))])
    img = tasks(img)[:3, :, :].unsqueeze(0)
    return img

img1 = Image.open("image1.jpg").convert('RGB')
img2 = Image.open("image2.jpg").convert('RGB')

img1 = transformation(img1).to(device)
img2 = transformation(img2).to(device)
Now, we need to get the relevant features of these two images. From the first image, we need to extract the content or features related to the existing object; from the second image, we need to extract the features related to style and texture.
Object-related features: in the original paper, the authors observe that the earlier layers of the network carry more valuable object and content information, because the feature space grows more abstract at higher levels and pixel-level detail is often lost there.
Style-related features: to obtain style and texture information from the second image, the authors use the correlations between feature maps at several different layers, as explained in detail in point 4 below.
Before getting there, let's look at the structure of a typical VGG19 model:
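The figure from the original article is not reproduced here, but the same information can be printed directly from the loaded model; the numbered entries are what the layer indices below refer to:

print(vgg)
## the printout lists the numbered layers of vgg19.features, for example:
## (0): Conv2d(3, 64, ...)     -> conv1_1
## (5): Conv2d(64, 128, ...)   -> conv2_1
## (10): Conv2d(128, 256, ...) -> conv3_1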
The object information is extracted with the conv4_2 layer, located in the fourth convolutional block at a depth of 512. For the style representation, the layers used are the first convolutional layer of each block, that is, conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1. The choice of these layers is purely empirical on the authors' part, and I will simply reproduce their settings here.
def get_features(image, model):
    layers = {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1',
              '19': 'conv4_1', '21': 'conv4_2', '28': 'conv5_1'}
    x = image
    features = {}
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features

img1_features = get_features(img1, vgg)
img2_features = get_features(img2, vgg)
As mentioned earlier, the authors use correlations across layers to capture style-related features. These correlations are given by the Gram matrix G, where each entry G(i, j) is the inner product between the vectorized feature maps i and j of a layer.
def correlation_matrix(tensor):
    _, d, h, w = tensor.size()
    tensor = tensor.view(d, h * w)
    correlation = torch.mm(tensor, tensor.t())  ## Gram matrix
    return correlation

correlations = {l: correlation_matrix(img2_features[l]) for l in img2_features}
Finally, these features and correlations are used for the style transformation. To transfer style from one image to the other, set a weight for each layer used for the style features. As mentioned above, the earlier layers provide more information, so they receive higher weights. Also define the optimizer and the target image, which is a copy of image 1.
weights = {'conv1_1': 1.0, 'conv2_1': 0.8, 'conv3_1': 0.25,
           'conv4_1': 0.21, 'conv5_1': 0.18}

target = img1.clone().requires_grad_(True).to(device)
optimizer = optim.Adam([target], lr=0.003)
Start the loss-minimization process: run the loop for a large number of steps, computing the losses for content-feature extraction and style-feature extraction. Minimizing this loss updates the target image step by step. After enough iterations, the stylized image emerges.
for ii in range(1, 2001):
    ## calculate the content loss (from image 1 and target)
    target_features = get_features(target, vgg)
    loss = target_features['conv4_2'] - img1_features['conv4_2']
    content_loss = torch.mean(loss ** 2)

    ## calculate the style loss (from image 2 and target)
    style_loss = 0
    for layer in weights:
        target_feature = target_features[layer]
        target_corr = correlation_matrix(target_feature)
        style_corr = correlations[layer]
        layer_loss = torch.mean((target_corr - style_corr) ** 2)
        layer_loss *= weights[layer]
        _, d, h, w = target_feature.shape
        style_loss += layer_loss / (d * h * w)

    total_loss = 1e6 * style_loss + content_loss

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
Finally, we can look at the result. I ran only a small number of iterations here; you can run up to 3,000 iterations if you have enough compute.
import matplotlib.pyplot as plt

def tensor_to_image(tensor):
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1, 2, 0)
    image = image * np.array((0.22, 0.22, 0.22)) + np.array((0.44, 0.44, 0.44))  ## undo normalization
    image = image.clip(0, 1)
    return image

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
ax1.imshow(tensor_to_image(img1))
ax2.imshow(tensor_to_image(target))
plt.show()

That is how to build neural networks quickly and accurately with PyTorch. I hope this article covered a few points you can see or use in your daily work, and that you learned something new from it.