How to Build Serverless AI Applications
This article shares how to build Serverless AI applications; the approach is quite practical, and I hope you get something out of reading it.
It introduces a typical application scenario of Function Compute: AI Model Serving.
Function Compute and AI inference
Serverless offerings fall into FaaS and BaaS. Alibaba Cloud Function Compute is a FaaS: an event-driven, fully managed compute service. With Function Compute you do not manage infrastructure such as servers; you just write your code and upload it. Function Compute prepares the computing resources, runs your code elastically and reliably, and provides log query, performance monitoring, alerting, and other features.
With Function Compute you can quickly build any type of application or service without management or operations work. Moreover, you pay only for the resources actually consumed while your code runs; if the code is not running, there is no cost.
The figure above shows the workflow of a complete machine learning project.
The workflow can be divided into three parts:
First, preprocess the raw data.
Then train models on the processed data, running multiple rounds with different parameters and algorithms to produce several candidate models.
Finally, choose the most appropriate model and deploy it.
Of these three steps, the first two (data processing and model training) are mainly done by data scientists, while the final step is usually the application developer's job. Once an AI model is deployed as a production application, it must address application-level concerns such as availability, reliability, and scalability.
Function Compute is an event-triggered compute service: events from upstream services such as OSS, MNS, and API Gateway trigger the function to run and process data from those upstream sources.
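As a sketch of what this looks like in code, here is a minimal event-triggered handler for Function Compute's Python runtime. The OSS event fields below follow the commonly documented trigger payload shape; treat them as assumptions to verify against the event your own trigger actually delivers:

```python
import json

def handler(event, context):
    # Function Compute passes the trigger event in as a JSON payload.
    evt = json.loads(event)
    # An OSS trigger delivers one record per object event.
    for record in evt.get("events", []):
        bucket = record["oss"]["bucket"]["name"]
        key = record["oss"]["object"]["key"]
        # Downstream, this is where you would fetch the object and preprocess it.
        print("preprocessing %s/%s" % (bucket, key))
    return "done"
```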
Another Function Compute scenario well suited to machine learning is AI Model Serving. The figure above shows an image classification example: a model trained with TensorFlow is deployed as a function, and when a new image is sent to the function over HTTP, the function returns its classification of the image.
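A hedged sketch of such a model-serving function is below. Function Compute's Python HTTP triggers use a WSGI-style entry point; `classify_image` is a hypothetical stand-in for the actual TensorFlow inference call:

```python
import json

def classify_image(image_bytes):
    # Hypothetical placeholder: the real application would feed the bytes
    # through the trained TensorFlow model here.
    return {"label": "cat", "confidence": 0.97}

def handler(environ, start_response):
    # Read the raw image posted over HTTP.
    try:
        length = int(environ.get("CONTENT_LENGTH", 0))
    except (TypeError, ValueError):
        length = 0
    image_bytes = environ["wsgi.input"].read(length)
    result = classify_image(image_bytes)
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(result).encode("utf-8")]
```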
Building an application that automatically generates five-character quatrains in three steps
Now for the second part: quickly deploying an AI application. (More details: https://developer.aliyun.com/article/741406)
Function Compute supports flexible ways of calling and managing functions, which fall into three categories: GUI, CLI, and SDK:
GUI: the web console and web IDE, as well as local plug-ins
CLI: we recommend the Funcraft tool. Funcraft is an engineering-oriented development tool providing template wizards, local development and debugging, deployment, and operations features; fcli focuses on operating resources already deployed in the cloud.
On the SDK side, Function Compute covers the common languages.
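For instance, a deployed function can be invoked from Python with the fc2 SDK package. The endpoint and credentials below are placeholders, and the exact client options may vary by SDK version, so take this as a sketch:

```python
import fc2

client = fc2.Client(
    endpoint="https://<account-id>.<region>.fc.aliyuncs.com",  # placeholder
    accessKeyID="<access-key-id>",
    accessKeySecret="<access-key-secret>",
)
# Synchronously invoke a deployed function with a JSON payload.
resp = client.invoke_function("my-service", "my-function",
                              payload=b'{"text": "hello"}')
print(resp.data)  # the function's return value
```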
The picture above shows the working principle of the "write poems for you" application.
The first step is training the model: roughly 70,000 lines of five-character verse are trained into a model with TensorFlow's CharRNN. Next, the TensorFlow pip package is installed through the Funcraft tool and the invocation code is written in Python. template.yml is a ROS description file that describes the function declaratively, and the function is then deployed with fun deploy.
Because the model and the TensorFlow libraries exceed Function Compute's 50 MB limit on deployment packages, the Funcraft tool automatically deploys these two parts to a NAS file system (prompting you to create one automatically if none exists). At runtime the function then accesses the model and library files on NAS as if they were on the local file system.
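At runtime this is plain file access. The sketch below assumes the NAS mount point /mnt/auto (what Funcraft's automatic NAS configuration typically uses) and a hypothetical model directory; loading at module scope means warm invocations reuse the already-loaded model:

```python
import os

NAS_ROOT = "/mnt/auto"                        # NAS mounted as a local directory
MODEL_DIR = os.path.join(NAS_ROOT, "model")   # hypothetical model location

def load_model():
    # Placeholder for restoring the CharRNN checkpoint with TensorFlow.
    path = os.path.join(MODEL_DIR, "checkpoint")  # hypothetical file name
    with open(path, "rb") as f:
        return f.read()

MODEL = load_model()  # runs once per container, shared across warm invocations

def handler(event, context):
    # The real application would sample a five-character quatrain from MODEL here.
    return "poem"
```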
Because function instances scale automatically with the number of requests and are billed per call, an AI service brought online this quickly is also a highly available service in real scenarios.
To build the service, first install git and the Funcraft tool. Funcraft is an open source GitHub project; installation files and guides for each platform are at https://github.com/alibaba/funcraft.
Then, by performing the three steps shown in the figure above, you can quickly deploy the "write poems for you" application to the Function Compute platform.
Advantages of Function Compute in AI scenarios
In the last part, we summarize the advantages of Function Compute in AI scenarios by comparing it with traditional architectures along several dimensions.
First, let's look at Function Compute's engineering efficiency:
Compared with self-built services (ECS or K8s clusters), Function Compute requires no maintenance of infrastructure (hosts, networks, storage, etc.)
In terms of development efficiency, tooling built in recent years, such as Funcraft and the VSCode plug-in, has addressed a series of pain points in constructing, developing, debugging, packaging, and deploying applications. Users can get started smoothly and use templates to deploy applications quickly and build on them.
In terms of learning cost, because Function Compute hides the details of distributed applications, we only need to concentrate on implementing a single application and can focus more on the business.
Compared with ECS and container services, Function Compute is more elastic: it scales at 100-millisecond granularity and copes better with real-time business fluctuations. It also provides a fine-grained, out-of-the-box monitoring and alerting module.
The picture above shows some of the monitoring charts; users can see the application's monitoring status directly through the visual interface.
Next, a comparative availability experiment, assuming the three AI scenarios in the figure above:
The first is a latency-sensitive application deployed on ECS
The second is a cost-sensitive application deployed on ECS
The third is the FC scheme, which also enjoys a small advantage here because FC provides MKL acceleration by default.
In scenario 1, many timeouts and 5xx errors occur because scale-out is too slow. In theory, scaling out takes 4 minutes or more: triggering the alarm takes 3 × 1 min, and purchasing and starting an ECS instance takes 1-5 min, so in total 3 × 1 min + (1-5 min) > 4 min. Scale-in is even slower, since merely triggering its alarm takes 15 × 1 min. You can of course shorten the alarm intervals, but cloud monitoring is always minute-granular, and if the values are set too small, ECS instances get purchased and released frequently, which is not a recommended practice; that is why the official recommendation is 3 minutes for scale-out and 15 minutes for scale-in.
In scenario 2, a sudden rise in load still causes 5xx errors because scale-out is not timely. The instance-count chart also shows that this lag of minute-level scaling can seriously hurt the user experience in such a scenario.
In scenario 3, the load and its swings are clearly larger than in the self-built ECS + SLB + ESS schemes, yet there are no errors, and the response time stays roughly stable at 200-300 ms, apart from a few latency spikes caused by cold starts. The largest spike does not exceed 2 s. MKL acceleration keeps single-invocation time short, and rapid elastic scaling lets the service follow sharp rises and falls in load quickly, improving resource utilization.
There are two ways to tackle the cold-start latency spikes: shortening the startup time itself, and pre-starting instances (warming up).
First, shortening the function's startup time. Startup time splits into two parts: the platform side, covering code download, container startup, runtime initialization, and code initialization; and the user side, i.e. the user's own code, which is often hard to optimize generically because every business differs.
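One concrete lever on the code-initialization part is Function Compute's optional initializer entry point, which runs once when an instance starts rather than on every request. A minimal sketch, with a sleep standing in for slow model loading:

```python
import time

MODEL = None

def load_model():
    time.sleep(2)      # stand-in for expensive model loading
    return object()

def initializer(context):
    # Configured as the function's Initializer; runs once per instance,
    # before any invocation is handled.
    global MODEL
    MODEL = load_model()

def handler(event, context):
    # By the time requests arrive, MODEL is already warm.
    return b"ok"
```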
As for pre-starting functions: in Function Compute 1.0, users were advised to use a Time Trigger to invoke the function periodically so that it would not be reclaimed. Function Compute 2.0 introduces a reserved (provisioned) mode for functions: with reserved instances, the user can keep function instances alive without their being reclaimed.
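A minimal sketch of that 1.0-era keep-warm pattern follows: a time trigger invokes the function periodically, and the handler returns early on those pings. The trigger field checked here is an assumption about the time-trigger payload shape; verify it against the event your trigger actually delivers:

```python
import json

def handler(event, context):
    evt = json.loads(event)
    # Heuristic: time-trigger events carry a triggerName field.
    if evt.get("triggerName"):
        return "warm"              # keep-alive ping, skip the real work
    # Normal invocation path.
    return "result for %s" % evt
```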
Let's run a set of comparative experiments on reserved mode.
Scenario 1: when requests arrive, there are no reserved resources, so every request is served on demand. Each time the load rises there are many cold starts, hence many latency spikes, reaching 20 s+.
Scenario 2: when requests arrive, the reserved resources are sufficient, so all requests are scheduled onto reserved instances. There are no cold starts, so the requests show no spikes.
Scenario 3: when requests arrive, they are preferentially dispatched to reserved instances, so at first there are no cold starts and no spikes. Later, as the test load keeps growing (peak TPS reaches 1184), the reserved instances can no longer satisfy all invocations, and Function Compute automatically scales out on-demand instances for function execution. Those calls go through a cold start, and as the figure above shows, the function's maximum latency even reaches 32 s. If the web API is latency-sensitive, such latency is unacceptable.
The four small charts in the figure above show the relationship between resource utilization and cost in the different scenarios.
Figure 1: ECS instances reserved for the peak; resource utilization below 30%
Figure 3: cost-sensitive; at the expense of some responsiveness, resource utilization below 70%
Figure 4: reserved mode plus subscription (prepaid) for the base load, with the elastic part using the function's pay-as-you-go mode; resource utilization can exceed 80%.
Costing out the four cases above shows that Function Compute's mixed payment mode ends up cheapest.
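To make that comparison concrete, here is a toy cost model under made-up unit prices and a made-up traffic profile; all numbers are hypothetical, and only the structure of the comparison matters:

```python
HOURS = 24
BASE_QPS, PEAK_QPS, PEAK_HOURS = 20, 100, 2   # hypothetical daily traffic profile
ECS_HOURLY = 0.50          # hypothetical ECS price per instance-hour
QPS_PER_ECS = 20           # hypothetical capacity of one ECS instance
FC_PER_REQ = 0.00002       # hypothetical pay-as-you-go price per request
FC_RESERVED_HOURLY = 0.30  # hypothetical reserved-instance price per hour

requests = BASE_QPS * 3600 * HOURS + (PEAK_QPS - BASE_QPS) * 3600 * PEAK_HOURS

# Case 1: ECS provisioned for the peak all day.
ecs_peak = ECS_HOURLY * (PEAK_QPS / QPS_PER_ECS) * HOURS
# Pure pay-as-you-go: every request billed on demand.
fc_on_demand = FC_PER_REQ * requests
# Mixed mode: a reserved instance covers the base load, spikes go on demand.
fc_mixed = (FC_RESERVED_HOURLY * HOURS
            + FC_PER_REQ * (PEAK_QPS - BASE_QPS) * 3600 * PEAK_HOURS)

print(ecs_peak, fc_on_demand, fc_mixed)  # 60.0, 46.08, 18.72: mixed is cheapest
```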
That is how to build Serverless AI applications. Some of these points may well come up in everyday work, and I hope this article has helped you learn more.