How to deploy the Pytorch deep learning model to the production environment

Shulou(Shulou.com)06/01 Report--
In this issue, the editor shows how to deploy a Pytorch deep learning model to a production environment. The article is rich in content and analyzes the topic from a professional point of view; I hope you find something useful in it.
Pytorch model deployment preparation
Pytorch and TensorFlow are currently the two most widely used deep learning frameworks. In the previous article, "Automatically deploying the deep neural network model TensorFlow (Keras) to the production environment", we introduced how to deploy TensorFlow models automatically through DaaS (Deployment-as-a-Service), AutoDeployAI's AI model deployment and management system. In this article, we introduce how to deploy Pytorch deep neural network models automatically through DaaS. As before, we need to:
Install Python DaaS-Client
Initialize DaasClient
Create a project
For the complete code, please refer to the notebook deploy-pytorch.ipynb on Github. A minimal sketch of these preparation steps is shown below.
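As a hedged illustration of the three steps above (the DaasClient constructor arguments and the helper names project_exists, create_project and set_project are assumptions based on typical DaaS-Client examples, and the URL is a placeholder; verify them against your DaaS-Client version):

# Hypothetical sketch of the preparation steps; names and URL are assumptions.
# pip install daas-client
from daas_client import DaasClient

client = DaasClient('https://192.168.64.7:30931',  # placeholder DaaS server URL
                    'username', 'password')

project = 'deployment-test'
if not client.project_exists(project):
    client.create_project(project, 'deployment-test', 'A Pytorch deployment test project')
client.set_project(project)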
Pytorch Custom Runtime
DaaS is an automatic AI model deployment system based on Kubernetes. Models run in Docker containers, which are called runtimes in DaaS. There are two different types of runtime: the network service runtime (Environment) and the task runtime (Worker). An Environment is used to create network services (Web Service), while a Worker is used to execute tasks (Job), such as model evaluation and batch prediction. DaaS ships with four default runtimes, based on Python 2.7 and Python 3.7 for both Environment and Worker, which include most commonly used machine learning and deep learning libraries. However, to limit the size of the Docker images (Image), the Pytorch library is not included for the time being.
DaaS provides a custom runtime feature that allows users to register custom Docker images as runtimes, to meet the needs of users working with different model types and versions. Taking the deployment of the Pytorch model as an example, let's describe in detail how to create a custom runtime:
1. Build a Docker image:
In general, there are two ways to create an image: building from a Dockerfile (docker build) or committing a container (docker commit). Here we use the first method. Either way, a base image must be selected; to simplify the build, we choose the official Pytorch image pytorch/pytorch:1.5.1-cuda10.1-cudnn7-runtime.
To create a network service runtime, some basic web service libraries need to be installed in addition to the libraries the model depends on. For the complete list, please refer to requirements-service.txt. Download requirements-service.txt to the current directory and create a Dockerfile:
FROM pytorch/pytorch:1.5.1-cuda10.1-cudnn7-runtime
RUN mkdir -p /daas
WORKDIR /daas
COPY requirements-service.txt /daas
RUN pip install -r requirements-service.txt && rm -rf /root/.cache/pip
Build the Image:
docker build -f Dockerfile -t pytorch:1.0 .

2. Push the Docker image to Kubernetes:
The built Docker image must be pushed to a location accessible by the Kubernetes environment where DaaS is installed. Different Kubernetes environments have different image access mechanisms, such as local images and private or public image registries (Image Registry). Take DaaS-MicroK8s as an example, which uses the MicroK8s local images cache (Local Images Cache):
docker save pytorch:1.0 > pytorch.tar
microk8s ctr image import pytorch.tar

3. Create the Pytorch runtime:
After logging in to the DaaS web page, click the top menu Environments / Runtime definitions. The page lists all valid runtimes, including the four runtimes that ship with DaaS:
Click the Create runtime button and create an Environment runtime based on the pytorch:1.0 image:
Default deployment of the Pytorch model
Train the Pytorch model
We use the MNIST dataset from torchvision to recognize the digits entered by users. The following code is based on the official example: Image classification (MNIST) using Convnets.
First, define a parameterless function that returns an instance of the user-defined model class (a subclass of torch.nn.Module). The function must contain all of its dependencies and be runnable independently, that is, it must contain the imported third-party libraries and the classes, functions and variables it defines. This is the key to automating the deployment of Pytorch models.
# Define a function to create an instance of the Net class
def create_net():
    import torch
    import torch.nn as nn  # PyTorch's module wrapper
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout2d(0.25)
            self.dropout2 = nn.Dropout2d(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)
            output = F.log_softmax(x, dim=1)
            return output

    return Net()
To train the model quickly, we set epochs=3:
import torch
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR

def train(model, device, train_loader, optimizer, epoch, log_interval):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

use_cuda = torch.cuda.is_available()
batch_size = 64
test_batch_size = 1000
seed = 1234567
lr = 1.0
gamma = 0.7
log_interval = 10
epochs = 3

torch.manual_seed(seed)
device = torch.device("cuda" if use_cuda else "cpu")

kwargs = {'batch_size': batch_size}
if use_cuda:
    kwargs.update({'num_workers': 1, 'pin_memory': True, 'shuffle': True})

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))])
dataset1 = datasets.MNIST('./data', train=True, download=True, transform=transform)
dataset2 = datasets.MNIST('./data', train=False, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset1, **kwargs)
test_loader = torch.utils.data.DataLoader(dataset2, **kwargs)

model = create_net().to(device)
optimizer = optim.Adadelta(model.parameters(), lr=lr)
scheduler = StepLR(optimizer, step_size=1, gamma=gamma)
for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch, log_interval)
    test(model, device, test_loader)
    scheduler.step()

Publish the Pytorch model
After the model is trained successfully, it is published to the DaaS server through the client's publish function. By providing the test datasets x_test and y_test, DaaS automatically detects the model's input data format (type and shape) and mining function (classification or regression), evaluates the model, and stores the first row of x_test as sample data to facilitate model testing. The parameter source_object is set to the create_net function defined above, whose source code is automatically stored in the DaaS system.
from pprint import pprint

batch_idx, (x_test, y_test) = next(enumerate(test_loader))

# Publish the built model into DaaS
publish_resp = client.publish(model,
                              name='pytorch-mnist',
                              x_test=x_test,
                              y_test=y_test,
                              source_object=create_net,
                              description='A Pytorch MNIST classification model')
pprint(publish_resp)
The results are as follows:
{'model_name': 'pytorch-mnist', 'model_version': '1'}

Test the Pytorch model
Call the test function and specify runtime as the previously created pytorch:
test_resp = client.test(publish_resp['model_name'],
                        model_version=publish_resp['model_version'],
                        runtime='pytorch')
pprint(test_resp)
The return value test_resp is a dictionary that records the information of the test API, as follows:
The runtime "pytorch" is startingWaiting for it becomes available... {'access_token':' eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOjEwMDAsInVzZXJuYW1lIjoiYWRtaW4iLCJyb2xlIjoiYWRtaW4iLCJleHAiOjE1OTYwNzkyNzksImlhdCI6MTU5NjAzMjQ3OX0.kLO5R-yiTY6xOo14sAxZGwetQqiq5hDfPs5WZ7epSkDWKeDvyLkVP4VzWQxxlPyUX6SgGeCx0pq-of6SYVLPcOmR54a6W7b4ZfKgllKrssdMqaStclv0S2OFHeVXDIoy4cyoB99MjNaXOc6FCbNB4rae0ufu-eZLLYGlHbvV_c3mJtIIBvMZvonU1WCz6KDU2fEyDOt4hXsqzW4k7IvhyDP2geHWrkk0Jqcob8qag4qCYrNHLWRs8RJXBVXJ1Y9Z5PdhP6CGwt5Qtyf017s7L_BQW3_V9Wq-_qv3_TwcWEyCBTQ45RcCLoqzA-dlCbYgd8seurnI3HlYJZPOcrVY5w', 'endpoint_url':' https://192.168.64.7/api/v1/test/deployment-test/pytorch/test', 'payload': {' args': {'Xcow: [{' tensor_input': [...], [...],...]]}]] 'model_name': 'pytorch-mnist',' model_version':'1'}
tensor_input is a nested array with shape (1, 1, 28, 28); the complete data values are not listed above.
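If you want to construct the request payload yourself instead of reusing test_resp['payload'], here is a minimal sketch; the field names X, tensor_input, model_name and model_version simply mirror the payload shown above:

# Build the payload by hand from the first test sample; x_test comes from
# the training section above, and x_test[[0]] has shape (1, 1, 28, 28).
payload = {
    'args': {
        'X': [{'tensor_input': x_test[[0]].numpy().tolist()}],
        'model_name': 'pytorch-mnist',
        'model_version': '1',
    }
}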
Use the requests library to invoke the test API:
import requests

response = requests.post(test_resp['endpoint_url'],
                         headers={'Authorization': 'Bearer {token}'.format(token=test_resp['access_token'])},
                         json=test_resp['payload'],
                         verify=False)
pprint(response.json())
Return the result:
{'result': [{'tensor_output': [[-21.444242477416992,
     -20.39040756225586,
     -17.134702682495117,
     -16.960391998291016,
     -20.394105911254883,
     -22.380189895629883,
     -29.211040496826172,
     -1.311301275563892e-06,
     -20.16324234008789,
     -13.592040061950684]]}],
 'stderr': [],
 'stdout': []}
In addition to the predicted values, the test results include the model's standard output and standard error logs, which makes it easy for users to view and debug.
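For example, after the requests call above, the logs can be read from the response (a minimal sketch using the stdout and stderr keys shown in the result):

resp = response.json()
print(resp['stdout'])  # standard output captured during the prediction call
print(resp['stderr'])  # standard error output, useful for debugging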
Verify the test results
Compare the prediction with the result of the local model:
import numpy as np

desired = model(x_test[[0]]).detach().numpy()
actual = response.json()['result'][0]['tensor_output']
np.testing.assert_almost_equal(actual, desired)

Formally deploy the Pytorch model
After the test succeeds, you can create a formal model deployment. Similar to the test API, you need to specify runtime as the pytorch runtime created earlier. To improve the performance and stability of the deployment, you can also specify the number of CPU cores, the memory size, and the number of deployment replicas for the runtime, all of which can be set through parameters of the deploy function, as sketched below.
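As an illustration only, such a call might look like the following sketch; the resource parameter names cpu, memory and replicas are assumptions, not verified against the DaaS-Client API, so check the client's documentation for the actual names:

# Hypothetical sketch: the cpu, memory and replicas parameter names below
# are assumptions, not confirmed DaaS-Client API.
deploy_resp = client.deploy(model_name=publish_resp['model_name'],
                            deployment_name=publish_resp['model_name'] + '-svc',
                            model_version=publish_resp['model_version'],
                            runtime='pytorch',
                            cpu=2.0,        # CPU cores per replica (assumed name)
                            memory='2Gi',   # memory per replica (assumed name)
                            replicas=2)     # number of replicas (assumed name)

The basic deployment used in this article specifies only the runtime: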
deploy_resp = client.deploy(model_name=publish_resp['model_name'],
                            deployment_name=publish_resp['model_name'] + '-svc',
                            model_version=publish_resp['model_version'],
                            runtime='pytorch')
pprint(deploy_resp)
Return the result:
The deployment "pytorch-mnist-svc" created successfullyWaiting for it becomes available... {'access_token':' eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOjEwMDAsInVzZXJuYW1lIjoiYWRtaW4iLCJyb2xlIjoiYWRtaW4iLCJwcm9qZWN0TmFtZSI6Ilx1OTBlOFx1N2Y3Mlx1NmQ0Ylx1OGJkNSIsInByb2plY3RMYWJlbCI6ImRlcGxveW1lbnQtdGVzdCIsImlhdCI6MTU5NjAyODU2N30.iBGyYxCjD5mB_o2IbMkSKRlx9YVvfE3Ih-6LOE-cmp9VoDde-t3JLcDdS3Fg7vyVSIbre6XmYDQ_6IDjzy8XEOzxuxxdhwFPnW8Si1P-fbln5HkPhbDukImShM5ZAcfmD6fNWbz2S0JIgs8rM15d1WKGTC3n9yaXiVumWV1lTKImhl1tBF4ay_6YdCqKmLsrLX6UqbcZA5ZTqHaAG76xgK9vSo1aOOstKLTcloEkswpuMtkYo6ByouLznqQ_yklAYTthdrKX623OJdO3__DOkULq8E-am_c6R7FtyRvYwr4O5BKeHjKCxY6pHmc6PI4Yyyd_TJUTbNPX9fPxhZ4CRg', 'endpoint_url':' https://192.168.64.7/api/v1/svc/deployment-test/pytorch-mnist-svc/predict', 'payload': {' args': {'Xcow: [{' tensor_input': [...], [...] ...]}]}
Use the requests library to call the formal deployment API:
response = requests.post(deploy_resp['endpoint_url'],
                         headers={'Authorization': 'Bearer {token}'.format(token=deploy_resp['access_token'])},
                         json=deploy_resp['payload'],
                         verify=False)
pprint(response.json())
The results are as follows:
{'result': [{'tensor_output': [[-21.444242477416992,
     -20.39040756225586,
     -17.134702682495117,
     -16.960391998291016,
     -20.394105911254883,
     -22.380189895629883,
     -29.211040496826172,
     -1.311301275563892e-06,
     -20.16324234008789,
     -13.592040061950684]]}]}
The formal deployment returns the same result as the test. In addition to using the DaaS-Client program, model testing and model deployment can also be done through the DaaS web interface, which we won't repeat here.
Custom deployment of the Pytorch model
In the default model deployment above, the input of the model is a tensor (Tensor) of shape (1, 28, 28) and the output is a tensor of shape (10). When a client calls the deployment REST API, it must therefore perform data preprocessing and result postprocessing itself: reading the image file, converting it into the expected tensor format, applying the same data transforms used during model training (for example, the normalization operation Normalize above), and computing the final recognized digit from the output tensor.
To reduce the burden on the client, we would like all of these operations to be performed on the deployment server, so that the client submits an image directly and the server returns the final recognized digit. In DaaS, this can be achieved through the custom model deployment feature, which allows users to freely add arbitrary data preprocessing and postprocessing operations. Let's describe in detail how to create a custom deployment for the Pytorch model above.
Log in to the DaaS Web client to view the pytorch-mnist model information:
Switch to the real-time prediction tab and click the command Generate custom real-time prediction script to generate the predefined script:
We can see that the content of the create_net function is automatically written into the generated prediction script. Click the command Advanced settings and select pytorch as the network service runtime:
Click the command Test as API, switch to the test page, and modify the preprocess_files function to add the same image processing operations used during model training:
def preprocess_files(args):
    """preprocess the uploaded files"""
    files = args.get('files')
    if files is not None:
        # get the first record object in X if it's present
        if 'X' in args:
            record = args['X'][0]
        else:
            record = {}
            args['X'] = [record]

        from torchvision import transforms
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))])

        import numpy as np
        from PIL import Image
        for key, file in files.items():
            img = Image.open(file)
            normed = transform(img)
            record[key] = normed.numpy()

    return args
When finished, enter the function name predict, select the form-based request body, enter the field name tensor_input, select the type file, upload the test image test.png (the same data used in the test above), and click Submit. The response panel on the right displays the prediction result:
As you can see, the result is the same as the default deployment output. Continue to modify the postprocess function to:
def postprocess(result):
    """postprocess the predicted results"""
    import numpy as np
    return [int(np.argmax(np.array(result).squeeze(), axis=0))]
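Applied to the tensor output shown earlier, this function returns the recognized digit 7, since index 7 holds the largest log-probability. A quick local check (values rounded from the test response above):

# Local sanity check of postprocess; index 7 has the largest log-probability.
print(postprocess([[-21.44, -20.39, -17.13, -16.96, -20.39, -22.38,
                    -29.21, -1.31e-06, -20.16, -13.59]]))  # -> [7]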
Resubmit, and the right response page displays the result as follows:
After the test is complete, you can create a formal deployment. Switch to the deployment tab, click the command Add network service, enter the service name pytorch-mnist-custom-svc, select pytorch as the network service runtime, keep the default values for the other options, and click Create. On the deployment page, click the test tab, which is similar to the script test interface above: enter the function name predict, select the form-based request body, enter the field name tensor_input, select the type file, upload the test image, and click Submit:
At this point the formal deployment has been tested and created, and any client program can invoke the deployment service. Clicking the Generate code command in the interface above shows how to invoke the service through the curl command.
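Since the generated curl command is only shown in the DaaS interface, here is a hedged requests equivalent. The endpoint URL is inferred from the URL pattern of pytorch-mnist-svc above, and the multipart field name tensor_input mirrors the form field configured in the test page; treat both as assumptions:

# Hypothetical sketch: endpoint URL and multipart field name are assumptions
# inferred from the deployment shown earlier, not generated by DaaS.
import requests

access_token = '...'  # bearer token for this deployment, from the DaaS interface
url = 'https://192.168.64.7/api/v1/svc/deployment-test/pytorch-mnist-custom-svc/predict'
response = requests.post(url,
                         headers={'Authorization': 'Bearer {token}'.format(token=access_token)},
                         files={'tensor_input': open('test.png', 'rb')},  # upload the raw image
                         verify=False)
print(response.json())  # expected to contain the recognized digit, e.g. [7]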
Deploy the Pytorch model through ONNX
In addition to the native deployment above, the Pytorch library itself supports exporting to the ONNX format, so deploying the Pytorch model through ONNX is another option. The advantage of ONNX deployment is that the model no longer depends on the Pytorch library, so there is no need to create the pytorch runtime above. Instead, you can use the default Python 3.7 runtime that ships with DaaS, which includes a CPU version of ONNX Runtime to support ONNX model prediction.
Convert the Pytorch model to ONNX:

# Export the model
torch.onnx.export(model,                     # model being run
                  x_test[[0]],               # model input (or a tuple for multiple inputs)
                  'mnist.onnx',              # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['tensor_input'],    # the model's input names
                  output_names=['tensor_output'],  # the model's output names
                  dynamic_axes={'tensor_input': {0: 'batch_size'},     # variable length axes
                                'tensor_output': {0: 'batch_size'}})

Publish the ONNX model:

publish_resp = client.publish('mnist.onnx',
                              name='pytorch-mnist-onnx',
                              x_test=x_test,
                              y_test=y_test,
                              description='A Pytorch MNIST classification model in ONNX')
pprint(publish_resp)
The results are as follows:
{'model_name': 'pytorch-mnist-onnx', 'model_version': '1'}

Test the ONNX model
Above, we tested the model through the client's test function; here we use another way, testing the model in the DaaS web page. Log in to the DaaS web client, go to the pytorch-mnist-onnx model page, and switch to the test tab. We can see that DaaS has automatically stored a piece of test data. Click the Submit command to test with this data, as shown below:
We see that the test results of the ONNX model and the native Pytorch model are consistent.
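You can also verify the exported model locally with the onnxruntime package (a minimal sketch; it assumes onnxruntime is installed and reuses x_test from the training section above):

# Local check of mnist.onnx with ONNX Runtime, independent of DaaS.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('mnist.onnx')
outputs = sess.run(['tensor_output'],
                   {'tensor_input': x_test[[0]].numpy().astype(np.float32)})
print(int(np.argmax(outputs[0])))  # expected to match the digit above, e.g. 7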
Default and custom deployment of the ONNX model
For how to create default and custom deployments for an ONNX model in the DaaS web interface, please refer to the article "Deploying Deep Learning and traditional Machine Learning models using ONNX". The process is the same, so we won't repeat it here.
Try DaaS (Deployment-as-a-Service)
In this article, we introduced how to deploy Pytorch models natively in DaaS. The whole process is very simple: for a default deployment, a few API calls are enough to deploy the model; for a custom deployment, DaaS provides a convenient testing interface where you can modify and test the prediction script at any time, then create a formal deployment once debugging succeeds. In real-world deployments, to achieve higher prediction performance, users may need to further customize the prediction script, for example with better data processing or by using a GPU. DaaS provides an easy-to-use deployment framework that users can customize and extend freely.
If you want to experience the DaaS model automated deployment system, either through our cloud SaaS service or an on-premises deployment, please send an email to autodeploy.ai#outlook.com (replace # with @) and describe your model deployment requirements.
The above is how to deploy the Pytorch deep learning model to the production environment, as shared by the editor. If you have similar questions, the analysis above may help you understand the process.