This article looks at the practical differences between PyTorch and TensorFlow by building and training the same simple model in both frameworks.
The data science community is a vibrant and collaborative space. We learn from each other's publications, debate ideas on forums and online channels, and share lots and lots of code. A natural side effect of this collaborative spirit is that you are very likely to run into unfamiliar tools used by your colleagues. Because we do not work in a vacuum, it often makes sense to become familiar with several languages and libraries in a given subject area in order to collaborate and learn as effectively as possible.
It is not surprising, then, that many data scientists and machine learning engineers carry two popular machine learning frameworks in their toolboxes: TensorFlow and PyTorch. These frameworks, both with Python APIs, share many similarities but also diverge in meaningful ways. Those differences, such as how they handle their APIs, load data, and support specialized domains, can make alternating between the two frameworks cumbersome and inefficient, which is a real problem given how widespread both tools are.
This article therefore aims to illustrate the differences between PyTorch and TensorFlow by focusing on the basics of creating and training two simple models. In particular, we will show how to build dynamic subclassed models with the Module API from PyTorch 1.x and the Keras model subclassing API from TensorFlow 2.x, and we will look at how both frameworks use automatic differentiation to provide a very simple implementation of gradient descent.
But first, the data
Because we are focusing on the core automatic differentiation / autograd capability (that is, the frameworks' ability to automatically compute the derivative of a function and apply the resulting gradients to some parameters in order to perform gradient descent on those parameters), we can start with the simplest possible model: a linear regression. We can use the NumPy library to generate some linear data with a little random noise, and then run our models on that toy dataset.
import numpy as np
import matplotlib.pyplot as plt

def generate_data(m=0.1, b=0.3, n=200):  # example slope/intercept; n=200 matches the reshapes used later
    x = np.random.uniform(-10, 10, n)
    noise = np.random.normal(0, 0.15, n)
    y = (m * x + b) + noise
    return x.astype(np.float32), y.astype(np.float32)

x, y = generate_data()
plt.figure(figsize=(12, 5))
ax = plt.subplot(111)
ax.scatter(x, y, c="b", label="samples")

Model
Once we have the data, we can implement a regression model from raw code in both TensorFlow and PyTorch. For simplicity, we will not use any layers or activations at first; we only define two tensors, w and b, representing the weight and bias of the linear model y = wx + b.
As you can see, apart from a few differences in API names, the class definitions of the two models are almost identical. The most important difference is that PyTorch requires an explicit Parameter object to define the weight and bias tensors to be captured by the graph, whereas TensorFlow is able to capture the same parameters automatically. In fact, PyTorch Parameters are Tensor subclasses with a special behaviour when used with the Module API: they are automatically added to the module's parameter list, and therefore show up in the parameters() iterator.
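As a quick illustration of this registration behaviour, here is a standalone sketch (TinyModule is a throwaway example class, not code from the original article): assigning a torch.nn.Parameter as a module attribute is enough for it to appear in parameters(), while a plain tensor attribute is not registered.

import torch

class TinyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning a Parameter as an attribute registers it with the module.
        self.w = torch.nn.Parameter(torch.zeros(1))
        # A plain tensor attribute is not registered as a parameter.
        self.not_a_param = torch.zeros(1)

print([name for name, _ in TinyModule().named_parameters()])  # ['w']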
Both frameworks extract everything needed to generate a graph from such class definitions and their execution methods (__call__ or forward), and compute the gradients required to implement backpropagation, as shown below.
TensorFlow dynamic model

import tensorflow as tf

class LinearRegressionKeras(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.uniform(shape=[1], minval=-0.1, maxval=0.1))
        self.b = tf.Variable(tf.random.uniform(shape=[1], minval=-0.1, maxval=0.1))

    def __call__(self, x):
        return x * self.w + self.b

PyTorch dynamic model

import torch

class LinearRegressionPyTorch(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.Tensor(1, 1).uniform_(-0.1, 0.1))
        self.b = torch.nn.Parameter(torch.Tensor(1).uniform_(-0.1, 0.1))

    def forward(self, x):
        return x @ self.w + self.b

Building the training loop, backpropagation and the optimizer
With these simple TensorFlow and PyTorch models in place, the next step is to implement the loss function, which in this case is just the mean squared error. We can then instantiate the model classes and run the training loop for a few epochs.
Again, since we are focusing on the core automatic differentiation / autograd capability, the goal here is to build custom training loops using the automatic differentiation implementations specific to TensorFlow and PyTorch. These implementations compute the gradients of the simple linear function and manually optimize the weight and bias parameters with a naive gradient descent optimizer, essentially minimizing, at each point, the loss between the actual values and the predictions produced by the differentiable function.
For the TensorFlow training loop, I explicitly use the GradientTape API to track the forward execution of the model and the step-by-step loss calculation, and I use the tape's gradients to optimize the weight and bias parameters. PyTorch offers a more "magical" autograd approach: it implicitly captures any operation on the parameter tensors and provides the gradients needed to optimize the weights and biases without requiring another API call. Once I have the weight and bias gradients, implementing a custom gradient descent step in both PyTorch and TensorFlow is as simple as subtracting from the weight and bias parameters their gradients multiplied by a constant learning rate.
Note that because PyTorch applies autograd implicitly, the parameter update after backpropagation must be wrapped in the no_grad context, which tells PyTorch not to compute gradients for the update operations on the weight and bias parameters. We also need to explicitly zero the gradients that were automatically computed during the forward/backward pass, to prevent PyTorch from accumulating them across batches and loop iterations.
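To see why that zeroing step matters, here is a standalone sketch (not part of the article's training loop) showing that PyTorch accumulates gradients across successive backward() calls by default:

import torch

w = torch.tensor([2.0], requires_grad=True)

(w * 3).backward()
print(w.grad)       # tensor([3.])

(w * 3).backward()  # without zeroing, the new gradient is added to the old one
print(w.grad)       # tensor([6.])

w.grad.zero_()      # reset in place before the next iteration
print(w.grad)       # tensor([0.])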
TensorFlow training loop

# Assumed hyperparameters; the original values were not preserved.
epochs = 100
learning_rate = 0.01

def squared_error(y_pred, y_true):
    return tf.reduce_mean(tf.square(y_pred - y_true))

tf_model = LinearRegressionKeras()
[w, b] = tf_model.trainable_variables

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        predictions = tf_model(x)
        loss = squared_error(predictions, y)
    w_grad, b_grad = tape.gradient(loss, tf_model.trainable_variables)
    w.assign(w - w_grad * learning_rate)
    b.assign(b - b_grad * learning_rate)
    if epoch % 20 == 0:
        print(f"Epoch {epoch}: Loss {loss.numpy()}")

PyTorch training loop

def squared_error(y_pred, y_true):
    return torch.mean(torch.square(y_pred - y_true))

# Assumed: convert the NumPy data to torch tensors of shape (n, 1).
inputs = torch.from_numpy(x).reshape(-1, 1)
labels = torch.from_numpy(y).reshape(-1, 1)

torch_model = LinearRegressionPyTorch()
[w, b] = torch_model.parameters()

for epoch in range(epochs):
    y_pred = torch_model(inputs)
    loss = squared_error(y_pred, labels)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * learning_rate
        b -= b.grad * learning_rate
        w.grad.zero_()
        b.grad.zero_()
    if epoch % 20 == 0:
        print(f"Epoch {epoch}: Loss {loss.data}")

Reusing available layers in PyTorch and TensorFlow models
Now that I have shown how to implement a linear regression model from raw code in both PyTorch and TensorFlow, we can look at how to re-implement the same model using the Dense and Linear layers from the TensorFlow and PyTorch libraries.
TensorFlow and PyTorch dynamic models with existing layers
You will notice in the model initialization methods that we replace the explicit declarations of the w and b parameters with a Dense layer in TensorFlow and a Linear layer in PyTorch. Both layers implement linear regression, and we instruct them to use a single weight and a single bias parameter in place of the explicit w and b used previously. Internally, the Dense and Linear implementations use the same tensor declarations we used before (tf.Variable and nn.Parameter, respectively) to allocate these tensors and associate them with the model's parameter list.
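As a small sanity check of that claim, the following standalone sketch (not from the original article) inspects the variables the two layers create internally:

import tensorflow as tf
import torch

dense = tf.keras.layers.Dense(1, activation=None)
dense.build(input_shape=(None, 1))  # force the layer to create its variables
print(isinstance(dense.kernel, tf.Variable))          # True
print(isinstance(dense.bias, tf.Variable))            # True

linear = torch.nn.Linear(1, 1)
print(isinstance(linear.weight, torch.nn.Parameter))  # True
print(isinstance(linear.bias, torch.nn.Parameter))    # True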
We also update the call/forward methods of these new model classes so that the manual linear regression calculation is replaced by a call to the Dense/Linear layer.
class LinearRegressionKeras(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.linear = tf.keras.layers.Dense(1, activation=None)  # , input_shape=[1]

    def call(self, x):
        return self.linear(x)

class LinearRegressionPyTorch(torch.nn.Module):
    def __init__(self):
        super(LinearRegressionPyTorch, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

Training with available optimizers and loss functions
Now that we have re-implemented our TensorFlow and PyTorch models using existing layers, we can focus on building a more robust training loop. Instead of our previous naive implementation, we will use the native optimizers and loss functions these libraries provide.
We will keep using the automatic differentiation / autograd capability observed earlier, but this time with a standard stochastic gradient descent (SGD) optimizer implementation and standard loss functions.
TensorFlow training loop with the easy fit() method
In TensorFlow, fit() is a very powerful, high-level method for training a model. It allows us to replace the manual training loop with a single call that takes care of the hyperparameters. Before calling fit(), we compile the model class with the compile() method, passing it the gradient descent optimizer and the loss function to use for training.
You will notice that here we reuse as many methods from the TensorFlow library as possible. In particular, we pass the standard stochastic gradient descent (SGD) optimizer and the standard mean absolute error loss function (mean_absolute_error) to the compile method. Once the model is compiled, we can finally call fit() to fully train it, passing the data (x and y), the number of epochs, and the batch size to use in each epoch, as sketched below.
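A minimal sketch of that compile()/fit() sequence follows; the loss string, the reshape of x, and the batch size are assumptions for illustration rather than values taken from the original article.

# Assumed sketch of the compile()/fit() training described above.
tf_model_fit = LinearRegressionKeras()
tf_model_fit.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
    loss="mean_absolute_error")
tf_model_fit.fit(tf.reshape(x, [200, 1]), y, epochs=epochs, batch_size=len(x))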
TensorFlow training loop with a custom loop and the SGD optimizer
In the following code snippet, we implement another custom training loop for our model, this time reusing as much as possible of the loss functions and optimizers provided by the TensorFlow library. You will notice that our previous custom Python loss function is replaced by the tf.losses.mse() method, and that we initialize a tf.keras.optimizers.SGD() optimizer instead of updating the model parameters manually with the gradients. Calling optimizer.apply_gradients() with a list of gradient-and-variable pairs updates the model parameters with the gradients.
tf_model_train_loop = LinearRegressionKeras()
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)

for epoch in range(epochs * 3):
    x_batch = tf.reshape(x, [200, 1])
    with tf.GradientTape() as tape:
        y_pred = tf_model_train_loop(x_batch)
        y_pred = tf.reshape(y_pred, [200])  # flatten back to match the shape of y
        loss = tf.losses.mse(y_pred, y)
    grads = tape.gradient(loss, tf_model_train_loop.variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, tf_model_train_loop.variables))
    if epoch % 20 == 0:
        print(f"Epoch {epoch}: Loss {loss.numpy()}")

PyTorch training loop with a custom loop and the SGD optimizer
Like the previous TensorFlow snippet, the following code implements the PyTorch training loop for the new model by reusing the loss functions and optimizers provided by the PyTorch library. You will notice that we replace our previous custom Python loss function with the nn.MSELoss() method and initialize a standard optim.SGD() optimizer with the list of the model's learnable parameters. As mentioned earlier, we instruct PyTorch to obtain the gradients associated with each parameter tensor from the loss backpropagation (loss.backward()), and we can then easily update all the parameters associated with gradients by calling the standard optimizer's step() method. The way PyTorch automatically associates tensors with their gradients allows the optimizer to retrieve them and update them with the configured learning rate.
torch_model = LinearRegressionPyTorch()
criterion = torch.nn.MSELoss(reduction='mean')
optimizer = torch.optim.SGD(torch_model.parameters(), lr=learning_rate)

for epoch in range(epochs * 3):
    y_pred = torch_model(inputs)
    loss = criterion(y_pred, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 20 == 0:
        print(f"Epoch {epoch}: Loss {loss.data}")
As we have seen, the TensorFlow and PyTorch automatic differentiation and dynamic subclassing APIs are very similar, even when using the standard SGD and MSE implementations. Naturally, both models also give us very similar results.
In the following code snippet, we use TensorFlow's trainable_variables and PyTorch's parameters() methods to get access to the models' parameters and plot the linear functions we have learned.
[w_tf, b_tf] = tf_model_fit.trainable_variables
[w2_tf, b2_tf] = tf_model_train_loop.trainable_variables
[w_torch, b_torch] = torch_model.parameters()

w_tf = tf.reshape(w_tf, [1])
w2_tf = tf.reshape(w2_tf, [1])

with torch.no_grad():
    plt.figure(figsize=(12, 5))
    ax = plt.subplot(111)
    ax.scatter(x, y, c="b", label="samples")
    ax.plot(x, w_tf * x + b_tf, "r", linewidth=5.0, label="tensorflow fit")
    ax.plot(x, w2_tf * x + b2_tf, "y", linewidth=5.0, label="tensorflow train loop")
    ax.plot(x, w_torch * inputs + b_torch, "c", linewidth=5.0, label="pytorch")
    ax.legend()
    plt.xlabel("x1")
    plt.ylabel("y", rotation=0)
Conclusion
Both PyTorch and the new TensorFlow 2.x support dynamic graphs and the core automatic differentiation functionality needed to extract the gradients of all the parameters used in a graph. You can easily implement a training loop in Python with any loss function and gradient descent optimizer. To focus on the real core differences between the two frameworks, we simplified the example above by implementing our own simple MSE and naive SGD.
However, I strongly recommend reusing the optimized and specialized code available in these libraries before implementing anything naive yourself.
The following table summarizes the differences noted in the sample code above; I hope it serves as a useful reference when switching between the two frameworks.