
"subtract" K8s API: challenge and practice of Alibaba Cloud Native Application Management


Author | Sun Jianbo (Tianyuan) Alibaba technical expert

This article is based on a community sharing session held on November 21.

As early as 2011, Alibaba began containerizing its internal applications, first building containers on LXC technology, later switching to Docker, and developing a large-scale orchestration and scheduling system along the way. In 2018, our team began promoting "lightweight containerization" on top of the K8s system, while investing engineering effort alongside the open source community to solve many scale and performance problems. In doing so, we gradually upgraded the previous "VM-like" operations workflow, and Alibaba's application infrastructure as a whole, to the cloud native technology stack.

By 2019, this Kubernetes infrastructure had covered every part of the Alibaba economy, with large-scale adoption across many flagship Internet scenarios, including core e-commerce, logistics, finance, food delivery, search, computing, AI, and more. It has gradually become one of the main forces supporting Internet-scale promotions such as 618 and Double 11.

Today, Alibaba and Ant Financial Services Group run dozens of super-large-scale K8s clusters. The largest has about 10,000 machine nodes, and that is not actually the capacity ceiling; each cluster serves tens of thousands of applications. On Alibaba Cloud's Kubernetes service (ACK), we also maintain K8s clusters serving tens of thousands of users, which is second to none worldwide in both scale and technical challenge.

New challenges facing our Kubernetes

While infrastructure problems such as scale and performance were gradually being solved, in the process of rolling out Kubernetes at large scale we found that many unexpected challenges remain in this system. That is the theme of today's sharing.

The first is that there is no concept of "application" in the API of K8s.

Moreover, the design of the Kubernetes API mixes together the concerns of developers, operators, and infrastructure. As a result, developers find K8s too complex, operators find it messy, scattered, and hard to manage, and only the infrastructure team (that is, our team) finds Kubernetes easy to use. Yet it is hard even for the infrastructure team to explain to developers and operators what the value of Kubernetes is.

Let's look at a practical example.


For example, if replicas in the figure above is 3, how is a developer supposed to know how many instances this application actually needs? If an operator wants to change the replica count, do they dare to? Are they even allowed to? And if replicas is at least understandable, fields like shareProcessNamespace are pure torment: a developer can only guess from the name that it has something to do with sharing the container process namespace. What is the impact of setting it on this application? Are there security implications?

Within Alibaba, many PaaS platforms allow developers to fill in only a handful of Deployment fields. Why so few? Is the platform not capable enough? No; the essential reason is that business developers simply do not want to understand all those fields.

By exposing only a few fields, such a PaaS platform shields business developers from these painful questions. But on the other hand, does masking a large number of fields really solve the problem? If so, how does the infrastructure capability of the whole organization evolve? How are the demands of application developers and application operators passed down to the infrastructure?

In the final analysis, Kubernetes is a "Platform for Platforms" project, designed for infrastructure engineers to build other platforms on top of (such as PaaS or Serverless), rather than for developers and operators to use directly. From this point of view, the Kubernetes API is comparable to Linux kernel system calls: it is simply not at the same level as the tools developers and operators really want to use (the userspace tools). You can't ask someone who writes Java web applications to call Linux kernel system calls directly every day and expect them to thank you for it.

Second, K8s is extremely flexible: there are too many plug-ins, and a great many Controllers and Operators written by all kinds of people.

This flexibility makes it easy for our team to develop new capabilities, but it makes it very difficult for application operators to manage those capabilities of K8s. For example, different operational capabilities in the same environment may actually conflict with each other.

Let's take a look at an example. The infrastructure team recently launched a new plug-in called CronHPA. A specific Spec is shown below.

As the infrastructure team, we find this K8s plug-in simple and its CRD easy to understand. The CronHPA above says, roughly: from 6 a.m. to 7 p.m., keep at least 20 and at most 25 instances; from 7 p.m. until 6 a.m. the next morning, keep between 1 and 9. Within each window, the actual instance count is measured and adjusted according to the CPU metric.
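The original spec screenshot is not reproduced in this text version. As a rough illustration of the behavior just described, a CronHPA instance might look like the following sketch (the API group, kind, and field names here are hypothetical, not the actual plug-in's API):

```yaml
# Hypothetical CronHPA spec -- field names are illustrative only.
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: web-app-cronhpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  jobs:
  - name: daytime            # 06:00 - 19:00
    schedule: "0 0 6 * * *"
    minReplicas: 20
    maxReplicas: 25
  - name: nighttime          # 19:00 - 06:00 next morning
    schedule: "0 0 19 * * *"
    minReplicas: 1
    maxReplicas: 9
  metric:                    # within each window, scale on CPU
    name: cpu
    targetAverageUtilization: 50
```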

However, shortly after we happily launched the plug-in, the application operators began to complain to us:

"How on earth is this capability supposed to be used? Where is its manual? Should I read the CRD or the documentation? How do I even know whether this plug-in is installed in a cluster?" One of our operators accidentally bound both CronHPA and HPA to the same application, and only then discovered that the application behaved erratically. "Why does K8s wait for this kind of conflict to blow up? Can't you design a mechanism that automatically checks whether these plug-ins conflict with one another?" In fact, we did do exactly that later; the solution was to add more than twenty Admission Hooks to our K8s.

Third, a point that has been particularly painful for our team since Alibaba moved to the cloud.

The application delivery scenarios we must handle include not only the public cloud but also proprietary clouds, hybrid clouds, IoT, and other complex environments. Across these environments, the APIs of the various cloud services are not even unified, so we need a dedicated delivery team to bridge the gaps and integrate with each environment one by one just to deliver an application. For them it is a very painful job: "Wasn't Docker supposed to give us 'package once, run anywhere'?" To put it bluntly, K8s has no unified, platform-independent way to describe an application.

Alibaba's solution

In 2019, our team began to think about how to solve the above application management and delivery-related problems through technical means, and has achieved some results so far.

Before explaining Alibaba's solutions to the problems above, it is worth introducing the theoretical basis on which we advance all of this work. Here we mainly follow the "application delivery layered model" advocated by CNCF, shown in the figure below:

The basic assumption of this model is that Kubernetes itself does not provide a complete application management system. In other words, a K8s-based application management system is not an out-of-the-box capability; the infrastructure team has to build it on top of the cloud native community and ecosystem, which requires introducing many open source projects and capabilities.

An important role of this model is that it classifies these projects and capabilities, and their collaborative relationships, very clearly.

For example, Helm sits at the top of the application management system, layer 1; YAML management tools such as Kustomize and packaging formats such as CNAB correspond to layer 1.5.

Then there are application delivery projects such as Tekton, Flagger, and Keptn, covering release and deployment processes, configuration management, and so on. Currently popular here is GitOps-based management, which uses git as the source of truth and manages everything declaratively and transparently toward a final state, making integration convenient. These correspond to layer 2.

Next come Operators and the various workload objects of K8s (Deployment, StatefulSet, etc.). Concretely, these are the components that manage an application's instances: if three instances are required and one dies, they automatically pull up a replacement, and they also provide self-healing, scaling, and similar capabilities. They correspond to layer 3.

The last layer is the platform layer, including all the underlying core functions, responsible for managing workload containers, encapsulating infrastructure capabilities, providing API for various workloads docking underlying infrastructure, and so on.

Through close cooperation, these layers together build an efficient and concise application management and delivery system. Within this model, Alibaba open-sourced the OpenKruise project, at layer 3, at KubeCon this year. More recently, we have been working with partners such as Microsoft, and with the broader ecosystem, to push forward the layer-1 work on "application definition" together with the whole community.

How exactly should application definition be done?

Many attempts at application definition have been made, both in the open source community and within Alibaba. For example, as I mentioned at the beginning, Docker solved the problem of single-machine application delivery: the Docker image defines a single-machine application very well.

We also tried to define applications around Kubernetes using Helm and the Application CRD. But today's cloud native applications increasingly depend on cloud resources: a database depends on RDS, traffic ingress depends on SLB. Helm and the Application CRD simply bundle K8s APIs together and cannot describe these cloud resource dependencies. When we used CRDs to describe cloud resource dependencies, it was essentially freestyle, with no proper specification or constraints. Users, developers, operators, and platform resource providers had no shared consensus, so naturally they could not collaborate or reuse anything.

Moreover, since these are simple combinations of K8s APIs, the problem that the K8s API itself is "not designed for application developers and operators" remains, which is not the direction we want for an "application definition". In addition, although the Application CRD is a K8s community project, it clearly lacks community activity, and most of its changes date back a year.

After all these attempts, we found that "application definition" is genuinely missing from the cloud native community. This is why many teams within Alibaba began designing their own "application definitions". Simply put, such a design describes everything about the application itself — images, startup parameters, dependent cloud resources, and so on — organizes it by category, and finally renders a single configuration file through a template. That file contains thousands of fields and fully describes the application. It looks something like this:

Besides the basic Deployment description fields, such an in-house application definition usually includes declarations of cloud resources — which ECS instance type to use, how to renew it, which disk and what specification — and a series of similar descriptions. These resource definitions are one large chunk, kept as concise as possible in the example above. Another large chunk describes operational capabilities, such as auto scaling, traffic switching, grayscale release, and monitoring, each involving a series of rules.
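The original configuration file is not reproduced here. A heavily abbreviated, hypothetical sketch of such an all-in-one definition might look like the following (every field name is illustrative, not any real internal format):

```yaml
# Hypothetical all-in-one internal application definition (abbreviated).
application:
  name: my-shop
  image: registry.example.com/my-shop:1.4.2
  args: ["--port=8080"]
  replicas: 3
  cloudResources:            # declarations of cloud resources
    ecs:
      instanceType: ecs.g6.large
      chargeType: PrePaid    # how to renew
      disk: { category: cloud_ssd, size: 200 }
    rds:
      engine: MySQL
      version: "5.7"
  operations:                # operational capabilities and their rules
    autoscaling: { min: 3, max: 10, cpuTarget: 60 }
    trafficShifting: { canaryPercent: 5 }
    monitoring: { dashboard: default }
  # ... in reality, thousands more fields follow
```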

It is not hard to see, however, that this way all the configuration ends up stacked in one file, which has the same problem as the all-in-one K8s API, or an even worse one. Moreover, these application definitions become black boxes: nothing outside the project that produced them can reuse them, so multi-party collaboration and reuse are impossible.

After learning these lessons, our team decided to design a new application definition from another direction.

Specifically, whereas other "application definitions" add to and integrate the K8s API, we believe a really good application definition should "subtract" from it. More precisely, it should subtract its way down to exposing only the APIs developers really care about, while encapsulating the APIs that operators and the platform care about.

In other words, the K8s API chose to mix everyone's concerns together for the convenience of infrastructure engineers. When those infrastructure engineers want to serve higher-level application developers and operators on top of K8s, they should consider sorting those concerns back out, so that each participant in application management gets their own subset of the API.

So we began to add a thin layer of abstraction on top of the K8s API, splitting and classifying the original API according to the real logic of collaboration, and then exposing the pieces to developers and operators separately. The principle is: the API the developer gets must be written from the developer's perspective, with no infrastructure concepts at all, while the API the operator gets must be a modular, declarative description of K8s capabilities. Ideally, the operator (or the platform) can then combine API objects from both sides — for example, application A + Autoscaler X, or application B + Ingress Y — and the resulting combination of description objects is a complete description of the "application".

Open Application Model (OAM)

In communicating with the community to validate this idea, we found that it coincided with the thinking at the time of the Microsoft teams of Brendan Burns (co-founder of the Kubernetes project) and Matt Butcher (creator of the Helm project). After several face-to-face conversations, we quickly decided to build the project together and open-source it, to push this meaningful work forward with the ecosystem of the whole community.

On October 17 this year, Alibaba Cloud's Xiaoxie and Microsoft cloud CTO Mark jointly announced the open-sourcing of this project. Its official name is the Open Application Model (OAM), and we also announced Rudr, the corresponding K8s implementation of OAM.

Specifically, when designing OAM, we hope that this application definition should solve three problems of traditional application definition:

First, there must be no runtime lock-in. A single set of application definitions must run unmodified in different runtime environments, whether K8s-based or not. This is the key to solving the application delivery problems we encountered — real "define once, run anywhere".

Second, the application definition must distinguish between user roles rather than perpetuating K8s's all-in-one API. We have learned well that the application developers we serve have a hard enough time already; they do not want to care about operations concepts or the underlying concepts of K8s, and we should not make their lives worse.

Finally, the application definition must not describe everything in one YAML. Once all the information is coupled together, the application description and the operations description get mixed, the definition becomes more complex, and it becomes completely unreusable. We want the descriptions of these different domains to be separated, so the platform can freely mix and match them.

Under this idea, the final application definition we designed is mainly divided into three large blocks:

The first part is the description of the application component, including how the application component runs and the various resources on which the component depends. This part is written by the developer.

The second part is the description of operational capabilities — strategies such as how the application scales, how it is accessed, and how it is upgraded. This part is written by the operator.

The third part is a configuration file that combines the above description files. For example: "an application has two components, component A requires operation and maintenance capability X and capability Y, and component B requires operation and maintenance capability X." So this configuration file is actually the final "application". This configuration file is also written by the operation and maintenance staff and submitted to the platform to run. Of course, the platform can also generate this file automatically.

Let's look at examples of what the YAML files for these three parts look like, and how they work together.

Note: if you want to try this process for yourself, you only need to install the Rudr project in a K8s cluster.

Part I: Component

First of all, we can see that Component defines things that developers care about, and there are no concepts related to operation and maintenance.

Its Spec is mainly divided into two parts:

The first block is the application description, including the workloadType field, which indicates what kind of Workload runs this application. Our design provides six default workload types — Server, Worker, Job, and a singleton variant of each — and the set of workloads is extensible. Server, for example, represents a workload that scales automatically and exposes a port for access. Then come the container image, startup parameters, and so on, covering the complete OCI spec.

The second block, parameters, holds extensible runtime parameters such as environment variables and port numbers. Their distinguishing feature is that although they are defined by the developer, they may all be overridden later by the operator. The key point here is that separation of concerns does not mean total separation: the parameters list lets the developer tell the operator which parameters may be overridden later, so developers can express their requirements to operators — which parameters to use and what each parameter means.
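The Component YAML appeared as a screenshot in the original sharing. A sketch reconstructed along the lines of the Rudr tutorial's helloworld example is shown below; the field names follow the v1alpha1-era OAM spec and may differ in later versions:

```yaml
# Sketch of an OAM v1alpha1 ComponentSchematic, modeled on the
# Rudr tutorial's helloworld example.
apiVersion: core.oam.dev/v1alpha1
kind: ComponentSchematic
metadata:
  name: helloworld-python-v1
spec:
  workloadType: core.oam.dev/v1alpha1.Server   # auto-scalable, has a port
  containers:
  - name: foo
    image: oamdev/helloworld-python:v1
    env:
    - name: TARGET
      fromParam: target       # wired to an overridable parameter
    - name: PORT
      fromParam: port
    ports:
    - name: http
      containerPort: 9999
      protocol: TCP
  parameters:                  # declared by the developer,
  - name: target               # overridable later by the operator
    type: string
    default: World
  - name: port
    type: string
    default: "9999"
```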

A Component like this can be installed directly into K8s via kubectl.

Then we can use the kubectl tool to see which components have been installed:

So our current K8s cluster supports two "application components". It is worth pointing out that beyond the built-in components, developers are free to define their own components and register them: the workloadType in a Component spec can be extended at will, just like the CRD mechanism in K8s.

Part II: Trait

Having covered the API developers use, let's look at what the operators' API looks like.

In designing the definition of an application's operational capabilities, we focused on how those capabilities are discovered and managed.

To this end, we designed a concept called Trait. A Trait — a "characteristic" of the application — is a declarative description of an operational capability. With command-line tools, we can discover which Traits (operational capabilities) a system supports.

It is then very simple for an operator to check how a specific operational capability should be used:

As you can see, in the definition of a Trait, an operator can clearly see which workload types the capability applies to and which parameters it accepts — which are required, which are optional, and what each parameter does. You will also find that in the OAM system, APIs such as Component and Trait are schemas: they carry the complete set of fields of the object, making them the best way to answer "what can this object actually do?" (the infrastructure team's documentation is rarely well written anyway).
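The Trait definition shown on screen is not reproduced here. As a sketch, a Trait in the v1alpha1-era Rudr implementation looked roughly like the following, modeled on its manual-scaler trait (the exact fields are an assumption from that era of the spec and may have changed since):

```yaml
# Sketch of an OAM v1alpha1 Trait definition, modeled on Rudr's
# manual-scaler. appliesTo lists compatible workload types; properties
# is a JSON Schema describing the trait's parameters.
apiVersion: core.oam.dev/v1alpha1
kind: Trait
metadata:
  name: manual-scaler
spec:
  appliesTo:
  - core.oam.dev/v1alpha1.Server
  - core.oam.dev/v1alpha1.Worker
  properties: |
    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "required": ["replicaCount"],
      "properties": {
        "replicaCount": {
          "type": "integer",
          "description": "the target number of replicas for the component",
          "minimum": 0
        }
      }
    }
```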

The Trait above can likewise be installed into the cluster with kubectl apply.

Since Component and Trait are both Schema, how can they be instantiated into applications?

Part III: Application Configuration

In the OAM system, Application Configuration is the operation object for the operation and maintenance personnel (or the system itself) to perform actions such as application deployment. In Application Configuration, operators can bind Trait to Component for execution.

In Application Configuration YAML, operators can assemble Component and Trait to get an "application" that can be deployed:

Here we can see that the application instantiated by the operator includes a Component called helloworld-python-v1, with two parameters: the environment variable target, and port. Note that these two values are supplied by the operator, overriding the two overridable parameters the developer declared in the original Component YAML.

This Component is also bound to two operational capabilities: one for horizontal scaling, and one for Ingress domain access.
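Pulling the pieces together, an Application Configuration along the lines just described might look like the following sketch, modeled on the Rudr tutorial with v1alpha1-era field names (the trait names and property values here are illustrative):

```yaml
# Sketch of an OAM v1alpha1 ApplicationConfiguration binding the
# helloworld-python-v1 Component to two traits.
apiVersion: core.oam.dev/v1alpha1
kind: ApplicationConfiguration
metadata:
  name: first-app
spec:
  components:
  - componentName: helloworld-python-v1
    instanceName: first-app-helloworld
    parameterValues:           # operator overrides of developer parameters
    - name: target
      value: Rudr
    - name: port
      value: "9999"
    traits:
    - name: auto-scaler        # horizontal scaling
      properties:
        minimum: 1
        maximum: 5
    - name: ingress            # domain name access
      properties:
        hostname: example.com
        path: /
        servicePort: 9999
```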

Operation and maintenance personnel can deploy such an application through kubectl:

At this point you can see in K8s that the OAM plug-in has automatically created the corresponding Deployment for you.

At the same time, the Ingress required by this application is automatically created:

This is the Rudr plug-in mentioned earlier at work: given an OAM Application Configuration file, it identifies the Components and Traits inside, maps them to resources on K8s and creates them, and then manages the lifecycle of those K8s resources according to the OAM configuration. And since the OAM definition is platform-independent, a Rudr implementation can also wire in external resources beyond K8s's own.

OAM YAML file = a self-contained software installation package

Finally, we can instantiate an OAM application by assembling reusable OAM modules like Lego bricks. More importantly, the OAM application description file is completely self-contained: through the OAM YAML, a software distributor can fully track all the resources and dependencies a piece of software needs in order to run.

As a result, a single OAM configuration file is all we need to run an application quickly, at any time, in different runtime environments, delivering this self-contained application description intact to any environment.

This not only solves the software delivery problems mentioned earlier, but also allows more non-K8s platforms such as IoT, game distribution, mixed environment software delivery and other scenarios to enjoy cloud native application management.

Finally

OAM is an application definition model that belongs entirely to the community, and we very much hope that everyone will be involved.


On the one hand, if you have a scenario OAM cannot satisfy, you are welcome to file an issue in the community describing your case.

On the other hand, the OAM model is also actively being integrated with various cloud vendors and open source projects.

We look forward to working with you to build this new application management ecology.

Q & A

Q1: There are currently no management objects in the OAM spec that belong to the Infra Operator (to elaborate: Component is for the App Developer, and Trait and ApplicationConfiguration are for the App Operator — which object is for the Infra Operator?)

A1: OAM itself is a tool in the hands of infrastructure operators, alongside a series of platform-level open source projects such as Kubernetes and Terraform. Infrastructure operators build OAM implementations out of these projects (Rudr, for example, is based on Kubernetes). So the implementation layer of OAM is what infrastructure operators provide; they do not need an additional object in order to use OAM.

Q2: What is the division of labor between the OAM Controller and the admission controller?

A2: The admission controller in the OAM project transforms and validates the spec, exactly like an admission controller in K8s. Its current functions include evaluating the [fromVariable(VAR)] function in the spec, and validating that CRs such as AppConfig, Component, Trait, and Scope conform to the specification and are legal. The OAM Controller — currently the open source Rudr project — is the implementation layer of OAM as a whole: it interprets the OAM spec and turns it into real resources, which can be existing K8s resources or cloud resources such as RDS on Alibaba Cloud. Rudr is currently written in Rust; since most of the K8s ecosystem is written in Go, we will also open-source an OAM framework in Go for quickly building OAM implementation layers like Rudr.

Q3: When do you plan to open-source the Go OAM framework?

A3: We need a little more time to polish the OAM framework and adapt it to everyone's scenarios, but it should be available soon.

Q4: How does Alibaba reduce the complexity of K8s to meet the common needs of operations and development? A K8s user may be either a developer or an operator.

A4: In most of the scenarios we encounter, it is clear which concerns belong to operators and which to developers. OAM reduces the complexity of K8s mainly by separating concerns and subtracting from the K8s API, so that each party has as little as possible to pay attention to. If you have a scenario that cannot be separated this way, we would be very interested — please bring us the case to discuss. On the other hand, we are not hiding K8s: the OAM spec is extensible enough to expose the original capabilities of K8s to users.

Q5: As I understand it, OAM adds an abstraction layer for different applications on top of K8s. Many of our applications are currently packaged as Helm charts; if we switch to OAM, what should we pay attention to?

A5: We have actually been promoting the use of Helm in China since the first half of the year, including providing Alibaba's Helm mirror site (https://developer.aliyun.com/hub), so OAM and Helm are complementary. Simply put, OAM describes what goes in the templates folder of a Helm chart, while Helm is a tool for parameterized rendering (template) and packaging (chart) of OAM. If you switch to OAM, you don't need to change the way you use Helm; just change the specs in the templates to OAM specs.

Q6: Has Rudr been used in practice, and how well does it work? Is there more information about Rudr's architecture?

A6: Rudr has been usable all along. If you can't get it working, please file an issue; if there is information you want or questions you have, file an issue too — we are also improving the documentation. All the current materials are here:

https://github.com/oam-dev/rudr/tree/master/docs

Q7: We have been packaging our applications with Helm to do GitOps, with one generic chart and different values.yaml files to achieve reuse. After this sharing I'm looking forward to OAM — and of course OpenKruise.

A7: OpenKruise is already open source. You can follow https://github.com/openkruise/kruise; we are iterating on it all the time.

Q8: What companies are using OAM? What is the feedback from real-world use?

A8: OAM was released only about a month ago, and we haven't had time to survey which companies are already using it. Both Alibaba and Microsoft use it internally, and both have external products that use OAM. Among the users we have been in contact with, both in the community and inside Alibaba, there is strong agreement with OAM's separation of concerns, and they are actively adopting it.

