This article explains how to use WebAssembly to improve the speed and portability of model deployment. The material is straightforward and easy to follow, so let's work through it step by step.
Model training
To illustrate the difference between model training and deployment, let's first simulate some data. The following code generates 1000 observations from a simple logistic model:
import numpy as np
np.random.seed(66)  # Set seed for replication

# Simulate data generating process
n = 1000  # 1000 observations
x1 = np.random.uniform(-2, 2, n)  # x1 & x2 between -2 and 2
x2 = np.random.uniform(-2, 2, n)
p = 1 / (1 + np.exp(-1 * (.75 + 1.5 * x1 - .5 * x2)))  # Implement DGP
y = np.random.binomial(1, p, n)  # Draw outcomes

# Create dataset and print first few lines:
data = np.column_stack((x1, x2, y))
print(data[:10])
After generating the data, we can focus on fitting the model. We simply use sklearn's LogisticRegression():
from sklearn.linear_model import LogisticRegression
mod = LogisticRegression().fit(data[:, [0, 1]], np.ravel(data[:, [2]]))
Take a closer look.
At this point, it is useful to pause and briefly consider what is going on under the hood. Like many other interesting ML models, logistic regression models are trained iteratively. To train the model, sklearn (or any other package providing similar functionality) has to implement the following functions:
1. Some kind of scoring function that indicates how well the model fits. This might be an error function or a (log-)likelihood function.
2. A function that updates the parameters of the fitted model from one iteration to the next.
The training process effectively reuses these two functions: initially, the model parameters are instantiated (often randomly). Next, the model's score is checked. If the fit is deemed insufficient (frequently because the score has improved compared to the previous iteration, suggesting further gains are possible), the model parameters are updated and the process is repeated.
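To make that loop concrete, here is a schematic sketch in Python. It is illustrative only: it assumes plain gradient ascent on the log-likelihood of the logistic model above, whereas sklearn's actual solver (lbfgs by default) is considerably more sophisticated.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, y, lr=0.1, max_iter=1000, tol=1e-6):
    X = np.column_stack((np.ones(len(X)), X))  # prepend an intercept column
    b = np.zeros(X.shape[1])                   # initialize the parameters
    prev_score = -np.inf
    for i in range(max_iter):
        p = np.clip(sigmoid(X @ b), 1e-12, 1 - 1e-12)
        # Scoring function: the log-likelihood of the current fit
        score = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        if score - prev_score < tol:           # no meaningful improvement: stop
            break
        b += lr * X.T @ (y - p) / len(y)       # update function: gradient ascent step
        prev_score = score
    return b, i

b_hat, iterations = train(data[:, :2], data[:, 2])

The point is not the specific optimizer, but the shape of the process: score, check, update, repeat.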
Even for this simple model, sklearn has to iterate over the dataset several times. The following code prints the number of iterations:
# Print the number of iterations
print(f'The number of iterations is: {mod.n_iter_}.')
So, to train the model we need access to the data, several utility functions, and multiple iterations / passes over the dataset. In general, training is computationally demanding, which explains why, for complex models, we resort to parallel computing and GPU or NPU acceleration to finish in a reasonable time. Fortunately, the fairly complex logic needed to train models has been abstracted away by the various ML libraries we use.
Generating predictions
Compare that with generating predictions from an already-fitted model (often called inference, but since that term has a different meaning in statistics, I find it confusing and stick with prediction). To generate predictions in this case, all we actually need are the logistic regression function (the same mathematical function used to generate the data in the example above) and the three parameters of the fitted model. These are easy to retrieve:
b = np.concatenate((mod.intercept_, mod.coef_.flatten()))
print(b)
The parameters end up relatively close to the values we used for data generation: [0.84576563 1.39541631 -0.47393112].
Moreover, in most deployment scenarios we usually end up evaluating the model with only a single input: in this case, a numeric vector of length 2. If we want to deploy the model, we do not need the fitting functions, the data, or the iterations. To generate predictions, we only need a simple and efficient implementation of the mathematical function involved, as sketched below.
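Here is a minimal sketch of such a direct implementation. It reuses the parameter vector b retrieved above; the input values are purely illustrative.

import numpy as np

def predict(x1, x2, b):
    # The logistic function, evaluated with the fitted intercept b[0]
    # and coefficients b[1], b[2]:
    return 1 / (1 + np.exp(-(b[0] + b[1] * x1 + b[2] * x2)))

# Generate a prediction for a single input vector of length 2:
print(predict(0.5, -1.2, b))

This, in essence, is everything a deployed version of this model needs to compute.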
Deploying models on edge devices
"so?" You might ask. When modern model training tools abstract all these details, why care about the details involved in training and prediction? Well, because when you want to deploy the model effectively (for example, when you need the model to run quickly on small devices), you can better take advantage of device differences.
For the sake of argument, compare the following two approaches to model deployment (that is, putting a trained model into production so that its predictions can be used):
Deploy sklearn as a REST service in a Docker container: this approach is simple and frequently used. We spin up a Docker image containing Python and the tools used for training: for the example above, sklearn. Next, we create a REST API service that calls the fitted model's mod.predict() function to generate results (see the sketch after this list).
Scailable WebAssembly deployment: alternatively, you can convert the fitted model to WebAssembly (using a service like the one Scailable provides) and deploy a .wasm binary, containing only the logic needed to generate predictions, in a minimal WebAssembly runtime. The automatically generated binary contains only the necessary logistic function and the estimated parameters. The binary might be deployed on a server and consumed via REST calls just as before, but it is compatible with any available runtime and can run on almost any edge device.
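To make the first option concrete, here is a minimal sketch of such a REST service. The choice of Flask, the joblib model file, and the endpoint name are illustrative assumptions, not part of the original setup; any equivalent web framework would do.

# Minimal sketch of deployment option 1: a REST endpoint around mod.predict().
# Assumptions (illustrative): Flask as the web framework, and the fitted
# model saved beforehand via joblib.dump(mod, "mod.joblib").
import joblib
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
mod = joblib.load("mod.joblib")  # the fitted sklearn model from above

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"x1": 0.5, "x2": -1.2}
    x = np.array([[payload["x1"], payload["x2"]]])
    return jsonify({"prediction": int(mod.predict(x)[0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Note that this container has to ship Python, sklearn, numpy, and the web framework just to evaluate one logistic function.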
Clearly, the first deployment process is close to "what we know" as data scientists. It is very convenient to use our usual tools directly, and in many ways it works: we can generate predictions with calls to the REST endpoint. The second solution is further from our standard practice and is useless for model training (that is, there is no "WebAssembly package to train models"). Nevertheless, we think it should be preferred: the second setup exploits the differences between training and prediction, making model deployment better in several ways:
Memory footprint: the first option above requires at least a 75 MB container (and getting a container that small already takes serious engineering; containers closer to 1 GB are more common). In this case the stored model itself is tiny (~2 KB), so the container accounts for the largest part of the deployment's memory footprint (note that this need not hold for, e.g., large neural networks). In contrast, a WebAssembly runtime can be brought down to less than 64 KB. The WebAssembly binary itself is larger than the stored sklearn model (~50 KB), but it now contains everything necessary to generate predictions. So, while the first deployment option consumes at least 75 MB, the second takes up less than 0.1 MB.
Speed: consuming a REST endpoint running in a Docker container that spins up everything needed for training does not compare favorably, in execution time, to an efficient WebAssembly deployment. Speed comparisons across different models exist, but, needless to say, exploiting the differences between training and prediction, and putting only what is strictly needed for prediction into production, can speed up the generation of these predictions by an order of magnitude.
So: a smaller memory footprint and faster execution. This matters for several reasons; one is that we might want to deploy models efficiently without wasting energy on every single prediction. But a small footprint and fast execution are also attractive precisely because they are what we need when putting models into production at the edge: good luck deploying your Docker container on, for example, an ESP32 MCU board. With WebAssembly it is a piece of cake, as the host-side sketch below suggests.
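For contrast with the REST sketch above, here is a hedged sketch of consuming such a .wasm binary from a minimal host, using the wasmtime Python bindings. The file name "model.wasm" and the exported predict(x1, x2) function are illustrative assumptions; a real Scailable-generated binary defines its own I/O convention.

# Sketch of loading a compiled model binary in a minimal WebAssembly runtime.
# Assumptions (illustrative): the module exports predict(f64, f64) -> f64.
from wasmtime import Engine, Store, Module, Instance

engine = Engine()
store = Store(engine)
module = Module.from_file(engine, "model.wasm")
instance = Instance(store, module, [])  # no imports needed for pure math
predict = instance.exports(store)["predict"]

print(predict(store, 0.5, -1.2))  # prediction for a single input vector

The same binary can run under an embedded runtime on a microcontroller, which is precisely the portability argument made above.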
Thank you for reading. That covers how to use WebAssembly to improve the speed and portability of model deployment; hopefully you now have a deeper understanding of the idea.