

Behind the Noise: The Concepts and Challenges of Serverless

2025-04-02 Update From: SLTechnology News & Howtos


Shulou (Shulou.com) 06/02 Report --

Author | Xu Xiaobin, senior technical expert at Alibaba Cloud, currently in charge of building the Serverless R&D and operations platform for Alibaba Group. He is the author of Maven in Practice and a former maintainer of the Maven Central repository.

Introduction: As the head of the Serverless R&D and operations platform of Alibaba Group, the author analyzes, from the perspective of application architecture, why Serverless fascinates so many people and what its core concepts are, and summarizes some of the problems that adopting Serverless will inevitably bring.

Preface

In "The Sound and the Fury of Serverless", I drew an analogy to the state of Serverless in the industry today. The metaphor goes like this:

Serverless is like teenage sex: Everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.

Although half a year has passed since I wrote that article, in my opinion this state has not changed much. Many front-line developers and managers still have a very one-sided, sometimes even wrong, understanding of Serverless technology. Without an understanding of how application architecture has evolved, of the capabilities of cloud infrastructure, and of the risks involved, blindly chasing new technology may not only fail to deliver business value and waste energy, but also introduce unnecessary technical risk.

This article attempts, from the perspective of application architecture, to analyze why Serverless fascinates so many people and what its core concepts are, and, drawing on my own hands-on experience, to summarize some of the problems that adopting Serverless will inevitably raise.

The evolution of application architecture

To better understand Serverless, let's first review the evolution of application architecture. More than a decade ago, the mainstream application architecture was the monolith, deployed as a single server plus a database. Under this architecture, operators carefully maintained the server to ensure the availability of the service. As the business grows, this simplest of architectures soon faces two problems. First, there is only one server: if it fails, through hardware damage for example, the whole service becomes unavailable. Second, as business volume increases, the resources of one server can no longer carry all the traffic. The most direct way to solve these two problems is to add a load balancer at the traffic entrance and deploy the monolith to multiple servers at the same time. This removes the server as a single point of failure and gives the monolith the ability to scale horizontally.

As the business grows further, more developers join the team to build features on the monolith. Because the code in a monolith has no clear physical boundaries, all kinds of conflicts soon appear, requiring manual coordination and a large number of conflict-merge operations, and R&D efficiency plummets. At this point we begin to split the monolith into microservices that can be independently developed, tested, and deployed, with the services communicating through APIs such as HTTP, gRPC, or Dubbo. A microservice architecture split along the Bounded Contexts of domain-driven design can greatly improve the R&D efficiency of medium and large teams. If you want to know more about Bounded Contexts, I recommend reading books on domain-driven design.

As applications evolve from a monolithic to a microservice architecture, distribution becomes the default option from a physical point of view, and application architects have to face the new challenges it brings. In this process everyone starts to use distributed services and frameworks: the cache service Redis, the configuration service ACM, the state-coordination service ZooKeeper, the message service Kafka, communication frameworks such as gRPC or Dubbo, and distributed tracing systems. Besides the challenges of the distributed environment itself, a microservice architecture also brings new operational burdens. Where developers previously operated one application, they may now operate ten or more, so the workload of security-patch upgrades, capacity assessment, fault diagnosis, and similar chores grows exponentially. This is when standards for application distribution, life cycle, and observability, together with automated elasticity and other capabilities, become critically important.

Now let's talk about the term "cloud native". A simple test of whether an architecture is cloud native is whether the architecture grew up on the cloud. "Growing up on the cloud" does not simply mean using cloud IaaS-layer services, such as basic compute and storage like ECS and OSS; it means using the distributed services on the cloud, such as Redis and Kafka, that directly shape the business architecture. As mentioned earlier, distributed services are a necessity under a microservice architecture. In the past everyone developed such services themselves, or operated them based on open-source versions; in the cloud-native era, businesses use cloud services directly.

Two other technologies that must be mentioned are Docker and Kubernetes. The former standardizes application distribution: whether an application is written in Spring Boot or Node.js, it is distributed as a container image. The latter, building on the former, defines a standard for the application life cycle: an application follows a unified standard from startup, to going online, to health checks, to going offline. With standards for application distribution and life cycle, the cloud can provide standardized application hosting services, including version management, release, post-release observation, self-healing, and so on. For example, for a stateless application, the failure of an underlying physical node does not affect development at all, because the hosting service can automatically complete the migration based on the standardized life cycle: it takes the application containers on the failed node offline and starts the same number of containers on a new node. We see that cloud native releases a further dividend of value.

On this basis, because the application hosting service can perceive the application's runtime data, such as business traffic concurrency, CPU load, and memory footprint, the business can configure scaling rules based on these metrics, and the platform executes those rules, increasing or decreasing the number of containers according to actual traffic. This is the most basic form of auto scaling. It helps users avoid holding idle resources during business troughs, saving costs and improving operational efficiency.
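A metric-driven scaling rule of this kind can be sketched in a few lines. This is a toy illustration, not any specific platform's API; the metric names and thresholds are assumptions, and the formula simply mirrors the proportional style of the Kubernetes Horizontal Pod Autoscaler, `desired = ceil(current * observed / target)`, clamped to a configured range.

```python
import math

def desired_replicas(current: int, observed_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Proportional scaling rule: grow or shrink the container count so the
    observed load per container approaches the configured target."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    desired = math.ceil(current * observed_load / target_load)
    return max(min_replicas, min(max_replicas, desired))

# Peak: 4 containers at 90% CPU against a 50% target -> scale out to 8.
print(desired_replicas(4, 0.90, 0.50))  # 8
# Trough: 4 containers at 10% CPU -> scale in to 1, freeing idle resources.
print(desired_replicas(4, 0.10, 0.50))  # 1
```

The business only declares the target; deciding when and by how much to scale is the platform's job.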

Throughout this architectural evolution, developers and operators have gradually shifted their attention away from machines, hoping that machines will increasingly be managed by platforms rather than by people. That is a very simple way to understand Serverless.

Core concepts of Serverless

We all know that although it is called Serverless, servers cannot really disappear; the "less" in Serverless means that developers do not need to care about them. This is like modern programming languages such as Java and Python, in which developers do not have to allocate and release memory manually. The memory is still there; it is just left to the garbage collector to manage. Calling a platform that manages your servers for you "Serverless" is like calling Java and Python "Memoryless" languages.

If we look at today's cloud era, Serverless cannot be narrowly understood as not caring about servers. In addition to the basic compute, network, and storage resources a server provides, resources on the cloud also include many kinds of higher-level resources, such as databases, caches, and messaging.

In February 2019, UC Berkeley published a paper entitled "Cloud Programming Simplified: A Berkeley View on Serverless Computing", which contains a very clear and vivid metaphor, described as follows:

In the context of the cloud, Serverful computing is like programming in low-level assembly language, while Serverless computing is like programming in a high-level language such as Python. For a simple expression such as c = a + b, in assembly you must first select several registers, load the values into them, perform the arithmetic, and then store the result. This is just like Serverful computing in today's cloud environment: developers must first allocate or find available resources, then load code and data, then perform the computation, store the results, and finally manage the release of the resources.

The so-called Serverful computing in the paper is the mainstream way we use the cloud today, but it should not be how we use the cloud in the future. I think the vision of Serverless should be "write locally, compile to the cloud": the code cares only about business logic, while tools and the cloud manage resources. Now that we have a general but abstract idea of Serverless, let me elaborate on the main features of a Serverless platform.

First: no need to worry about servers

Managing one or two servers may not be much trouble, but managing thousands or tens of thousands is not so easy. Any server may fail, and automatically identifying failures and replacing problematic instances is a necessary capability of a Serverless platform. In addition, operating-system security-patch upgrades need to complete automatically without affecting the business; logging and monitoring need to be on by default; the system's security policies need to be configured automatically to avoid risk; and when resources are insufficient, the platform needs to allocate resources automatically and install the relevant code and configuration, and so on.
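The fault-removal-and-replacement part of this can be pictured as a tiny reconciliation loop. The sketch below is purely illustrative, with made-up instance IDs and a boolean standing in for a real health check; the point is only the shape of the logic a platform runs so that people do not have to.

```python
def reconcile(instances: dict[str, bool], desired: int) -> list[str]:
    """instances maps instance-id -> healthy?. Drop failed instances and
    start replacements until the fleet is back at the desired count."""
    healthy = [iid for iid, ok in instances.items() if ok]
    replacements = 0
    while len(healthy) < desired:
        replacements += 1
        healthy.append(f"new-{replacements}")  # stand-in for launching a container
    return healthy

fleet = {"i-1": True, "i-2": False, "i-3": True}
print(reconcile(fleet, 3))  # ['i-1', 'i-3', 'new-1']
```

A real platform runs this loop continuously, which is why an individual server failure becomes invisible to the business.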

Second: automatic elasticity

Today's Internet applications are designed to be scalable. When the business has obvious peaks and troughs, or temporary capacity requirements (such as marketing campaigns), the Serverless platform should deliver automatic elasticity in a timely and stable manner. To achieve this, the platform needs very strong resource-scheduling capability and a very keen perception of application metrics such as load and concurrency.

Third: pay for actual resource usage

Serverful uses cloud resources by occupation rather than by usage. If a user buys three ECS instances on the cloud, then no matter how much CPU and memory is actually used, the user pays the full cost of all three. In Serverless mode, users pay for the resources they actually use. For example, if a request actually uses a 1-core 2 GB resource for 100 ms, the user pays only that specification's unit price multiplied by the time (100 ms). Similarly, users of a Serverless database pay only for the resources actually consumed by queries, plus the resources used for data storage.
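The difference is easy to see in a back-of-the-envelope calculation. The unit prices below are made up for illustration only (they are not Alibaba Cloud prices); the contrast between paying for occupation and paying for usage is what matters.

```python
HOURS_PER_MONTH = 730  # average hours in a month

def serverful_cost(instances: int, price_per_hour: float) -> float:
    """Pay for occupation: every provisioned instance, all month long."""
    return instances * price_per_hour * HOURS_PER_MONTH

def serverless_cost(requests: int, ms_per_request: float,
                    price_per_core_second: float, cores: float = 1.0) -> float:
    """Pay for usage: core-seconds actually consumed serving requests."""
    seconds = requests * ms_per_request / 1000.0
    return cores * seconds * price_per_core_second

# 3 instances held all month at a hypothetical $0.05/hour,
# vs 1 million requests of 100 ms each on a 1-core runtime.
print(round(serverful_cost(3, 0.05), 2))                   # 109.5
print(round(serverless_cost(1_000_000, 100, 0.00002), 2))  # 2.0
```

With spiky traffic the gap widens further, since the Serverful fleet must be sized for the peak while the Serverless bill follows actual load.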

Fourth: less code, faster delivery

Code based on a Serverless architecture usually makes heavy use of backend services, moving data, state management, and similar concerns out of the code; the more thorough FaaS architecture also hands the code's runtime over to the platform to manage. This means that for the same application there is much less code in Serverless mode than in Serverful mode, so distribution and startup are both faster. Serverless platforms also usually provide very mature features for code build and release, version switching, and so on, further improving delivery speed.
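In the FaaS extreme, the entire deliverable can shrink to a handler. The event shape and signature below are a generic illustration, not any particular vendor's API; the point is that the server, framework, and HTTP plumbing all belong to the platform.

```python
def handler(event: dict) -> dict:
    """All the business logic we actually deploy: no server, no framework,
    no runtime management. The platform invokes this per request."""
    name = event.get("name", "world")
    return {"status": 200, "body": f"hello, {name}"}

print(handler({"name": "serverless"}))  # {'status': 200, 'body': 'hello, serverless'}
```

Less code to build and ship is also why the distribution-to-startup path gets faster.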

The Challenges of Realizing Serverless

I have said a lot about the benefits of Serverless, but implementing Serverless at scale in mainstream scenarios is not easy; there are many challenges. Let me analyze them in detail.

Challenge 1: making business applications lightweight is hard

Full automatic elasticity, with payment based on actual resource usage, means the platform must be able to scale out business instances in seconds or even milliseconds. This challenges the infrastructure and places high demands on the business, especially on larger business applications. If it takes ten minutes to distribute and start an application, automatic elasticity can hardly keep up with changes in business traffic. There are many ways to attack this problem. Microservices split giant applications into smaller ones, and FaaS goes further, using a new application architecture that splits applications into even finer-grained functions. The drawback, of course, is that this requires a major transformation of the business. For the Java language, the modules introduced in Java 9 and GraalVM's native-image technology can help Java applications slim down and reduce startup time.

Challenge 2: inadequate responsiveness of the infrastructure

Once Serverless application or function instances can scale out in seconds or milliseconds, the related infrastructure quickly comes under tremendous pressure. The most common examples are service discovery and log monitoring systems: the rate of instance change across the cluster may jump from a few times per hour to a few times per second. If the response capability of these systems cannot keep up with the speed of instance change, the whole experience suffers greatly. For the business, a container instance may scale out in 2 seconds, yet service discovery still takes 10 seconds to finish synchronizing.
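The arithmetic behind this complaint is simple: new capacity only serves traffic once every step of the pipeline has completed, so the slowest step dominates. The step names and timings below are the hypothetical numbers from the paragraph above (plus an assumed scheduling step), not measurements.

```python
def time_to_serve(steps: dict[str, float]) -> float:
    """Sequential pipeline: total seconds before a new instance takes traffic."""
    return sum(steps.values())

steps = {"schedule": 0.5, "container_start": 2.0, "discovery_sync": 10.0}
print(time_to_serve(steps))       # 12.5
print(max(steps, key=steps.get))  # discovery_sync -- the real bottleneck
```

Shaving container startup from 2 s to 200 ms barely helps here; it is the discovery sync that must be made faster.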

Challenge 3: the business-process life cycle is inconsistent with the container's

A Serverless platform relies on a standardized application life cycle to achieve features such as fully automatic container migration and application self-healing. In a system based on standard containers and Kubernetes, the life cycle the platform can control is that of the container. It is therefore necessary for the business to keep the life cycle of the business process consistent with that of the container, including specifications for start, stop, readiness probes, liveness probes, and so on. In practice, although many businesses have moved into containers, a container often holds not only the main business process but also many auxiliary processes, which leads to inconsistencies between the life cycles of the business process and the container.
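The probe semantics the platform depends on can be summarized in a small sketch. This is a minimal illustration under assumed conventions: the liveness/readiness split follows the standard Kubernetes meaning (liveness failure triggers a restart, readiness failure removes the instance from load balancing), while the `App` state flags are stand-ins for real checks.

```python
class App:
    def __init__(self):
        self.started = False         # main business process came up
        self.deps_connected = False  # e.g. DB/cache connections established

    def liveness(self) -> int:
        """200 while the main process is alive; failure -> platform restarts
        the container."""
        return 200 if self.started else 500

    def readiness(self) -> int:
        """200 only when traffic can actually be served; failure -> platform
        removes the instance from the load balancer, without restarting it."""
        return 200 if (self.started and self.deps_connected) else 503

app = App()
app.started = True
print(app.liveness(), app.readiness())  # 200 503  (alive, but not yet ready)
app.deps_connected = True
print(app.liveness(), app.readiness())  # 200 200
```

If auxiliary processes in the container die without these probes noticing, the platform's view of the life cycle and reality diverge, which is exactly the inconsistency described above.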

Challenge 4: observability needs to improve

In Serverful mode, if anything goes wrong in the production environment, the server does not disappear; users naturally want to log in to it, run Linux commands, search logs, analyze processes, and even dump memory to analyze the problem. In Serverless mode, we say that users no longer need to care about servers, meaning they cannot see them by default. So what happens when the system misbehaves and the platform cannot self-heal? Users still need a rich set of diagnostic tools that give a comprehensive view of traffic, system metrics, dependent services, and more, so that problems can be diagnosed quickly and accurately. As long as overall observability around the Serverless model is insufficient, users will certainly not feel at ease with it.

Challenge 5: the mindset of developers and operators needs to change

Almost every developer, when deploying an application for the first time in their career, works in terms of one server, or one IP, and this is a deeply ingrained habit. Today we still see many applications that are stateful and cannot have their instances replaced automatically; many change and deployment behaviors bound to IPs, such as choosing a specific machine for beta testing; and many release systems that do not replace instances during a rolling update, with the surrounding ops systems built on that assumption. As Serverless gradually takes hold, developers need to change their way of thinking, grow accustomed to the idea that "the IP may change at any time", and operate and maintain their systems more from the perspective of service versions and traffic.

Summary

Let's return to the wonderful metaphor in the "Cloud Programming Simplified: A Berkeley View on Serverless Computing" paper: today we use the cloud as if we were writing code in assembly language. I believe this will continue to change; ideally, 100% of the package a user delivers to the platform for deployment should be code that describes the business. Although we are still far from that, we can see many technologies, such as Service Mesh, Dapr (dapr.io), and Cloudstate (cloudstate.io), stripping the plumbing that a distributed architecture requires out of the business runtime and handing it to the platform to manage. This trend has become increasingly clear and strong over the last year; Bilgin Ibryam summed it up well in the article "Multi-Runtime Microservices Architecture", which I recommend reading.

In this article we have seen that the evolution toward Serverless places new requirements on application architecture, continuous delivery, service governance, and operations monitoring. Serverless will in fact also place higher responsiveness requirements on lower-level infrastructure such as compute, storage, and networking. It is therefore a fairly thorough technological evolution that runs through the application, platform, and infrastructure layers, and I find it very exciting to have the honor of taking part in it.

So that more developers can enjoy the dividends Serverless brings, we have gathered more than ten technical experts in Alibaba's Serverless field to create an open Serverless course well suited to developers getting started, helping you easily embrace Serverless, the new paradigm of cloud computing.

Click to watch the course for free: https://developer.aliyun.com/learning/roadmap/serverless

"Alibaba Cloud Native focus on micro services, Serverless, containers, Service Mesh and other technology areas, focus on cloud native popular technology trends, cloud native large-scale landing practice, to be the official account of cloud native developers."
