Containers in 2019: Carrying "Application-Centric" Through to the End

Author | Zhang Lei, Senior Technical Expert at Alibaba Cloud (Aliyun), CNCF Ambassador, senior Kubernetes project member and co-maintainer

Preface

In 2019, KubeCon, the ecosystem's flagship conference, attracted an unprecedented 12,000 attendees in San Diego, and the conference's sponsor list filled a poster nearly ten meters long.

In this year, Kubernetes finally became a widely recognized industry standard for infrastructure, a standardization effort capped by the heavyweight investment of AWS.

In this year, with the continued push from leading participants in the community, "scale" and "performance" finally became key words for the Kubernetes project. This not only truly cleared the last mile of Kubernetes's large-scale adoption in enterprise production environments, but also made Kubernetes, for the first time, a genuine technical protagonist in top Internet-scale scenarios such as "Double 11".

At the turn of a new decade, do you know what new changes this seemingly calm cloud native technology ecosystem is incubating and undergoing?

Scale: the new calling card of the Kubernetes project

If you had to nominate the important milestones in the evolution of cloud native technology in 2019, then "scale" would surely be one of the keywords that cannot be left out.

This was a matter of design focus: for a long time before 2019, the Kubernetes project did not treat "scale" as a core priority of its evolution. The main reason is that Kubernetes is designed as "application-centric" infrastructure, so compared with the resource-efficiency concerns of traditional job scheduling and resource management projects (such as Mesos and Yarn), the core competitiveness of Kubernetes has always lain in building higher-level application infrastructure: workload description, service discovery, container design patterns, and so on. The other reason, of course, is that Kubernetes service providers such as GKE had limited demand for scale and performance.

This state of affairs was broken in early 2019 by the heavyweight investment of top Internet companies in the Kubernetes community. In fact, the scale and performance of Kubernetes itself is not an "unsolvable" problem; rather, the community had long lacked both large-scale scenarios and dedicated engineering effort, and therefore could not discover, diagnose, and fix the performance bottlenecks across the entire container orchestration and management stack.

Along this critical path, the scale problem of Kubernetes can be divided into three problem domains: the "data plane" represented by etcd, the "control plane" represented by kube-apiserver, and the "production/consumption plane" composed of kubelet and the various Controllers. Driven by real scenarios, the Kubernetes and etcd communities have made many optimizations around these three domains over the past year, for example:

Data plane

By optimizing the data structures and algorithms of etcd's underlying database, the random-write performance of etcd at the scale of one million key-value pairs improved 24-fold.

Control plane

A Bookmark mechanism was added to kube-apiserver, reducing the events that must be resynchronized after an APIServer restart to 3% of the original volume, a performance improvement of dozens of times (see the sketch after this list).

Production / consumption plane

The heartbeat mechanism between Kubernetes nodes and the APIServer was changed from "periodic sending" to "on-demand sending", which greatly reduces the pressure kubelet places on the APIServer in large-scale scenarios and significantly raises the ceiling on the number of nodes Kubernetes can support.
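To make the control-plane item above concrete, here is a minimal client-go sketch of a bookmark-enabled watch. The kubeconfig path and the watched resource are placeholders, and the exact wire behavior depends on your Kubernetes and client-go versions; treat this as an illustration rather than a reference implementation.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from a local kubeconfig (path is a placeholder).
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// AllowWatchBookmarks asks the APIServer to periodically emit BOOKMARK
	// events that carry only a resourceVersion. A client that persists this
	// version can resume its watch after a restart without relisting and
	// replaying every intermediate event -- the optimization described above.
	w, err := clientset.CoreV1().Pods("default").Watch(context.TODO(),
		metav1.ListOptions{AllowWatchBookmarks: true})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		if ev.Type == watch.Bookmark {
			// Persist this version; pass it back via ListOptions.ResourceVersion
			// on reconnect to avoid a full resync.
			if obj, ok := ev.Object.(metav1.Object); ok {
				fmt.Println("bookmark at resourceVersion:", obj.GetResourceVersion())
			}
		}
	}
}
```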

In addition, scale and performance optimization practices around these three problem domains, fully validated in production in concrete large-scale Kubernetes deployments, also surfaced in technical talks at venues such as KubeCon: for example, how to make multiple kube-apiserver instances handle production/consumption requests more evenly and avoid performance hotspots, or how to set up active/standby Controllers so that a Controller upgrade does not require resynchronizing large amounts of data, reducing the performance impact on the APIServer during Controller recovery, and so on.

Almost all of these "full-stack optimization" efforts around Kubernetes scale and performance came from real Internet-scale scenarios, and they ultimately landed thanks to collaboration within the top open source community and the joint efforts of all participants. The gradual resolution of scale and performance problems has not only given the Kubernetes project solid footing, but is also rapidly reshaping the basic landscape of the entire infrastructure field.

In May 2019, Twitter officially announced at its San Francisco headquarters that its infrastructure would shift from Mesos to Kubernetes. The news landed like a bombshell in a then rather quiet technical community, and speculation was rife.

In fact, as early as a year before, Twitter engineers had become important members of, and regular speakers at, the Bay Area "Cluster Mgmt at Web Scale" (CMWS) group. CMWS is a closed-door organization dedicated to cluster management in large-scale scenarios, and its founding members include many of the world's top technology companies, such as Alibaba, Pinterest, LinkedIn, Netflix, Google, Uber, Facebook, and Apple. Its members hold a closed-door meetup every month for in-depth technical sharing and exchange around specific topics, helping them land the Kubernetes technology stack faster and better in Internet-scale scenarios. It is well known that Bay Area Internet companies have long been heavy users of the Mesos project, precisely for scale and performance reasons; this shift by Twitter is really just the move of one CMWS member.

The essence of cloud native, and its misunderstandings

Despite great progress along the way, even in 2019 many people still harbored doubts and even misunderstandings about "cloud native". This is probably why we keep hearing different definitions of cloud native on different occasions. Some say cloud native is Kubernetes and containers; some say cloud native means "elastic and scalable"; some say cloud native is Serverless; and eventually some simply conclude that cloud native is a "Hamlet", understood differently by everyone.

In fact, ever since the term was "borrowed" by CNCF and the Kubernetes technology ecosystem, the meaning and connotation of cloud native have been quite definite. In this ecosystem, the essence of cloud native is a collection of best practices; more specifically, cloud native offers practitioners a low-mental-burden path to maximize the capability and value of the cloud in a scalable and replicable way.

So cloud native does not refer to an open source project or a particular technology; it is a set of ideas guiding the architectural design of software and infrastructure. The key point is that applications and application infrastructure built on these ideas will naturally integrate with the "cloud" and unleash the cloud's full capability and value.

This way of thinking, in one phrase, is "application-centric".

Precisely because it is application-centric, the cloud native technology system puts great emphasis on making infrastructure better match applications and on "delivering" infrastructure capabilities to applications more efficiently, rather than the other way around. Accordingly, the open source projects that play key roles in the cloud native ecosystem, such as Kubernetes, Docker, and Operator, are the technical means by which this idea lands.

Application-centricity is the main thread guiding the vigorous development of the entire cloud native ecosystem and the Kubernetes project.

The "sinking" of application infrastructure capabilities, and Service Mesh

With this main thread in hand, re-examining the technological evolution of the cloud native ecosystem in 2019 becomes a little clearer.

You may have noticed that this evolution of infrastructure, represented by Kubernetes, has been accompanied throughout by an important keyword: the "sinking" of application infrastructure capabilities.

In the past, the infrastructure capabilities needed to write an application, such as databases, distributed locks, service registration/discovery, and message services, were typically obtained by importing a middleware library. That library is, in effect, service-access code written for you by a dedicated middleware team, so that you can learn and use these infrastructure capabilities at minimal cost without understanding the details of each capability. This is a simple application of "separation of concerns". More precisely, though, middleware arose not merely so that "professionals do professional work", but because in the past infrastructure capabilities were neither strong nor standardized. Without middleware to hide those details and unify the access methods, business developers would be forced to learn countless obscure infrastructure APIs and invocation methods, which is clearly unacceptable to developers for whom "productivity is everything".

The evolution of infrastructure itself, however, has gone hand in hand with the rapid rise of cloud computing and open source communities. Today's modern infrastructure system, centered on the cloud and backed by open source, has completely broken the old situation in which enterprise-grade infrastructure capabilities were of uneven quality, or could only be provided by a few global giants.

This change marks the beginning of cloud native technology reshaping the traditional application middleware landscape. More specifically, the infrastructure capabilities originally provided and encapsulated by application middleware are now being "pulled down" by the Kubernetes project from the application layer into the infrastructure layer, that is, into Kubernetes itself. Note that Kubernetes is not the direct provider of these capabilities; its role is to "expose" lower-level infrastructure capabilities to users through the declarative API and the controller pattern. The capabilities themselves come either from the "cloud" (such as the PolarDB database service) or from open source projects in the ecosystem (such as Prometheus and CoreDNS).
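To illustrate the "declarative API plus controller pattern" just mentioned, here is a minimal, self-contained Go sketch. Every type in it is a toy stand-in: a real controller runs the same observe/compare/act loop, only against the APIServer instead of an in-memory object.

```go
package main

import (
	"fmt"
	"time"
)

// Spec is the desired state a user declares.
type Spec struct{ Replicas int }

// Status is the actual state the system observes.
type Status struct{ Replicas int }

// Object pairs the two, the way a Kubernetes API object does.
type Object struct {
	Spec   Spec
	Status Status
}

// reconcile performs one convergence step: compare desired with actual,
// then act to close the gap. Real controllers have exactly this shape.
func reconcile(obj *Object) {
	diff := obj.Spec.Replicas - obj.Status.Replicas
	switch {
	case diff > 0:
		fmt.Printf("scaling up: creating %d replica(s)\n", diff)
		obj.Status.Replicas += diff
	case diff < 0:
		fmt.Printf("scaling down: deleting %d replica(s)\n", -diff)
		obj.Status.Replicas += diff
	default:
		fmt.Println("in sync: nothing to do")
	}
}

func main() {
	obj := &Object{Spec: Spec{Replicas: 3}}
	// A level-triggered, idempotent control loop.
	for i := 0; i < 3; i++ {
		reconcile(obj)
		time.Sleep(10 * time.Millisecond)
	}
}
```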

This is the fundamental reason why CNCF could quickly grow an enormous ecosystem of hundreds of open source projects from a seed like Kubernetes: Kubernetes was never merely a platform or resource management project; it is a full-fledged "access layer", the true "operating system" of the cloud native era.

But why is it that only Kubernetes can do this?

Because Kubernetes is the first open source infrastructure project that truly tries to be "application-centric".

Being application-centric, Kubernetes made the declarative API, rather than scheduling and resource management, its foundation from day one. The greatest value of a declarative API lies in "leaving simplicity to the user and complexity to yourself": a user of Kubernetes's declarative API only ever needs to care about and declare the application's final state, not the configuration and implementation details of underlying infrastructure such as cloud disks or Nginx. Note that the "final state" here covers not only the state of the application itself, but also all the underlying infrastructure capabilities the application depends on, such as routing policies, access policies, and storage requirements.
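As a concrete illustration of "declaring the final state", here is a minimal client-go sketch that submits a Deployment with a desired replica count and image. The namespace, names, and in-cluster configuration are placeholder assumptions; the point is that the user states the "what" and the controllers handle the "how".

```go
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	// Assumes we run inside a cluster; use clientcmd with a kubeconfig otherwise.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	labels := map[string]string{"app": "demo"}
	deploy := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "demo"},
		Spec: appsv1.DeploymentSpec{
			// The declared final state: 10 replicas of this image.
			Replicas: int32Ptr(10),
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "web",
						Image: "nginx:1.17",
					}},
				},
			},
		},
	}

	// Submit the desired state; Kubernetes controllers converge to it.
	_, err = clientset.AppsV1().Deployments("default").Create(
		context.TODO(), deploy, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}
```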

This is the concrete embodiment of being "application-centric".

So rather than making middleware disappear, Kubernetes has turned itself into a "declarative", "language-agnostic" middleware, and this is what the "sinking" of application infrastructure capabilities really means.

The "sinking" of application infrastructure capabilities has actually been accompanied by the development of the entire cloud native technology system and Kubernetes projects. For example, the earliest capabilities of application replica management, service discovery and distributed collaboration provided by Kubernetes are actually "sinking" the most urgent requirements for building distributed applications into the infrastructure through the Replication Controller,kube-proxy architecture and etcd. Service Mesh, in fact, goes a step further, sinking the vital part of "service-to-service traffic governance" in traditional middleware. Of course, this means that Service Mesh doesn't really have to rely on Sidecar: as long as it can unwittingly intercept traffic between services (such as API Gateway and lookaside patterns).

As underlying infrastructure capabilities improve and strengthen, more and more capabilities will be "sunk" in various ways, and in this process the emergence of CRD + Operator has played a key enabling role. CRD + Operator effectively exposes the engine driving the Kubernetes declarative API to the outside world, so that any developer of an infrastructure "capability" can easily plug that capability into Kubernetes. This also reflects the essential difference between an Operator and a plain custom Controller: an Operator is a special custom Controller whose author must be a domain expert in the corresponding "capability", for example a TiDB developer, rather than a K8s expert. Unfortunately, the current development of the Operator Framework does not reflect this deeper meaning: it exposes far too many K8s Controller details to Operator developers, which is a mistake.
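Below is a hypothetical kubebuilder-style sketch of such a capability: a domain expert exposes a "Database" capability through the declarative API, and a matching Operator converges status toward spec. All the names and fields here are invented for illustration.

```go
// Package v1alpha1 sketches a hypothetical CRD type; every field is illustrative.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DatabaseSpec declares the desired state of a managed database.
type DatabaseSpec struct {
	// Engine is the database engine, e.g. "mysql" or "postgres".
	Engine string `json:"engine"`
	// Version is the engine version to provision.
	Version string `json:"version"`
	// Replicas is the desired number of database instances.
	Replicas int32 `json:"replicas"`
	// StorageGiB is the requested volume size per instance.
	StorageGiB int32 `json:"storageGiB"`
}

// DatabaseStatus reports the observed state.
type DatabaseStatus struct {
	ReadyReplicas int32  `json:"readyReplicas"`
	Endpoint      string `json:"endpoint,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// Database is the schema for the hypothetical databases API. A matching
// Operator (a special custom Controller, written by a database expert)
// watches these objects and provisions real instances so that status
// converges toward spec.
type Database struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   DatabaseSpec   `json:"spec,omitempty"`
	Status DatabaseStatus `json:"status,omitempty"`
}
```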

In 2019 the Service Mesh ecosystem made great progress, moving from Istio's initial absolute dominance toward something closer to a contest of many players. After all, the "middleware" ecosystem could hardly ever be completely dominated by one party, and Google's announcement that, "as always", the Istio project would not be donated to any open source foundation only added fuel to the trend. In fact, as the focal point of this wave of "sinking" application infrastructure capabilities, the Istio project is already generating more and more "local conflicts" with the Kubernetes project itself (for example, its relationship with the kube-proxy system). In the future there may well be more debate over whether a given capability of this "declarative middleware" should be provided by Kubernetes core or by Mesh plug-ins, and along the way we will see Istio "intrude" further into Kubernetes's network and even container runtime layers, making the infrastructure ever more complex and ever more like "black magic".

The main theme: declarative application infrastructure

It is thus an indisputable fact that the Kubernetes project is becoming more and more complex, not simpler.

More precisely, "declarative infrastructure" is the foundation that allows ever more "complexity" to sink into the infrastructure. Whether it is the effort to "democratize" Kubernetes through plug-ins and interfaces, the container design patterns, or the Mesh system, these exciting developments all end up making Kubernetes itself more complex. The advantage of the declarative API is that it can at least ensure that, while the complexity of the infrastructure grows exponentially, the complexity of the user interface grows only linearly; otherwise, today's Kubernetes might long ago have repeated OpenStack's disaster and been abandoned.

"complexity" is an inherent trait rather than a weakness of any infrastructure project. Today's Linux Kernel must be more than several orders of magnitude more complex than the first edition in 1991; today's Linux Kernel developers will not be able to know every Module as well as they did a decade ago. This is the inevitable result of the evolution of infrastructure projects.

However, the complexity of the infrastructure itself does not mean that every user of that infrastructure must bear all of it. It is as if every one of us "uses" the Linux Kernel, yet we never complain that "the Linux Kernel is too complicated": most of the time, we are not even aware that it exists.

To better explain the "complexity" of Kubernetes, we first need to describe the three groups of technical practitioners the current Kubernetes system serves:

Infrastructure engineers

In a company they are often called the "Kubernetes team" or the "container team". They are responsible for deploying and maintaining Kubernetes and the container projects, doing secondary development, and developing new features and plug-ins to expose more infrastructure capabilities. They are the experts in the Kubernetes and container field, and the backbone of this ecosystem.

Operations engineers (including ops developers and SRE)

They, together with the tools and pipelines they build, are the cornerstone that keeps critical business running stably and correctly, while the Kubernetes project is the core foundation supplying them with operations capabilities. Ideally, operations engineers are the most important users of the Kubernetes project.

Business developers

Among current Kubernetes users, business developers make up a very small share. They are not interested in infrastructure as such, but they may be attracted by Kubernetes's "declarative middleware" capabilities and gradually come to rely on the infrastructure primitives Kubernetes provides to write and deploy applications.

These three user groups may overlap differently in different companies, but Kubernetes's inherent "complexity" has a profound impact on all three. In 2019, the cloud native ecosystem set about solving the problems this complexity causes for each of these users in turn.

Serverless Infrastructure in the ascendant

Naturally, the first thing the Kubernetes ecosystem set out to address was the complexity faced by infrastructure engineers. The declarative API has already helped a great deal here, but deploying and operating Kubernetes itself remains a challenge. This is exactly where projects like kubeadm and kops, and managed Kubernetes offerings such as GKE, ACK, and EKS, deliver their value.

On the other hand, while helping infrastructure engineers tame this complexity, some public cloud providers gradually noticed a fact: Kubernetes engineers do not actually want to care about the details and features of the lower-level infrastructure such as network, storage, and hosts. It is like an electrical engineer: why should he care about what goes on inside the power plant?

So many public cloud providers launched Serverless Kubernetes services. Such a service keeps the Kubernetes Control Plane but removes the kubelet component and the corresponding network, storage, and other IaaS-level concepts, then uses the Virtual Kubelet project to connect the Kubernetes Control Plane directly to a "Serverless Container" service that fronts the IaaS. A so-called Serverless Container exposes virtual machines and the corresponding network and storage dependencies through a container API, for example Microsoft's ACI, AWS's Fargate, and Aliyun's ECI. From the user's point of view, the Node in Kubernetes has simply disappeared, which is why this kind of service is also called "Nodeless".
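The following is a simplified, hypothetical sketch of the Virtual Kubelet idea: a provider that forwards Pod lifecycle operations to a cloud "Serverless Container" API instead of running containers locally. The real virtual-kubelet provider interfaces differ in their details; the interface, endpoint, and names below are illustrative.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PodLifecycle is an illustrative contract a provider implements so the
// control plane can schedule Pods onto a virtual node.
type PodLifecycle interface {
	CreatePod(ctx context.Context, pod *corev1.Pod) error
	DeletePod(ctx context.Context, pod *corev1.Pod) error
	GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)
}

// serverlessProvider backs each Pod with a cloud Serverless Container
// instance (think ACI / Fargate / ECI). The endpoint is a placeholder.
type serverlessProvider struct {
	apiEndpoint string
}

func (p *serverlessProvider) CreatePod(ctx context.Context, pod *corev1.Pod) error {
	// A real provider would call the cloud API here, translating
	// pod.Spec's containers, CPU, and memory into a container group.
	fmt.Printf("POST %s/containerGroups for pod %s/%s\n",
		p.apiEndpoint, pod.Namespace, pod.Name)
	return nil
}

func (p *serverlessProvider) DeletePod(ctx context.Context, pod *corev1.Pod) error {
	fmt.Printf("DELETE %s/containerGroups/%s-%s\n",
		p.apiEndpoint, pod.Namespace, pod.Name)
	return nil
}

func (p *serverlessProvider) GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error) {
	// A real provider maps the cloud instance's state back to a PodStatus.
	return &corev1.PodStatus{Phase: corev1.PodRunning}, nil
}

func main() {
	var p PodLifecycle = &serverlessProvider{apiEndpoint: "https://example.invalid"}
	pod := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Namespace: "default", Name: "demo"}}
	_ = p.CreatePod(context.Background(), pod)
}
```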

Nodeless strips out the most troublesome node and resource management parts of Kubernetes, so it is extremely simple to use: users need not worry about underlying resources at all, which is both labor-saving and elastic. The corresponding tradeoff is that, without kubelet, the functional completeness of Kubernetes is reduced.

After AWS announced the EKS on Fargate service at re:Invent at the end of 2019, it quickly drew a huge response across the industry. The design of EKS on Fargate is similar to Serverless Kubernetes; the main difference is that it does not use Virtual Kubelet to remove the concept of Node outright, but still uses kubelet, which requests EC2 virtual machines from the Fargate service to serve as Pods, thereby better preserving Kubernetes's functional completeness. We might call this approach Serverless Infrastructure: a Kubernetes that does not care about the details of the underlying infrastructure.

In fact, Alibaba proposed the Virtual Cluster architecture in the Kubernetes community in mid-2019. Its core idea is that on top of one "base Kubernetes cluster", any number of complete Kubernetes clusters can be "virtualized" for different tenants. As you can see, this idea coincides with the design of EKS on Fargate but goes a step further: Fargate currently isolates the container runtime through EC2 virtual machines and the corresponding VM management system, whereas Virtual Cluster lets all tenants securely share the same hosts through KataContainers, greatly improving resource utilization (which is the greatest charm of KataContainers). I suspect, though, that EKS on Fargate will also gradually move toward the Virtual Cluster architecture by way of Firecracker. Another commercial product similar to Virtual Cluster is VMware's Project Pacific, which "hacks" kubelet to "virtualize" tenant Kubernetes clusters on top of vSphere.

As the open source counterpart of EKS on Fargate and Project Pacific, Virtual Cluster is currently a core incubation project of the Kubernetes upstream multi-tenancy working group and has gone through multiple PoCs; it is well worth watching.

Building an "application-centric" operations system

From Nodeless to Virtual Cluster, the cloud native ecosystem of 2019 went all out to solve the problems of infrastructure engineers. Yet the more pressing problems faced by operations engineers and business developers seem, so far, to have been largely ignored. Mind you, they are the main group complaining that "Kubernetes is complex".

The essential problem here is that the Kubernetes project positions itself as "The Platform for Platform", so its core functional primitives serve infrastructure engineers, not operations or business developers; its declarative API design and CRD/Operator architecture are likewise meant to make it easy for infrastructure engineers to plug in and build new infrastructure capabilities. As a result, there is a clear mismatch between operations engineers and business developers, the users and ultimate beneficiaries of these capabilities, and Kubernetes's core positioning, as well as a huge gap between existing operations systems and the Kubernetes system.

To solve this problem, many companies and organizations adopt a "PaaS-ification" approach when landing Kubernetes: they build a PaaS layer on top of Kubernetes and use the PaaS API (or UI) to separate Kubernetes from business operations and business development. The benefit is that Kubernetes's infrastructure capabilities truly become "infrastructure": what business operations and developers actually learn and use is the PaaS, so the "complexity" problem of Kubernetes is solved.

In essence, however, this approach runs against Kubernetes's "application-centric" design. Once Kubernetes is reduced to an "IaaS-like infrastructure", its declarative API, container design patterns, and controller model simply cannot exert their original strength, and it becomes hard to connect to the broader ecosystem. In the Kubernetes world, the capabilities traditional PaaS provides, such as CI/CD, application packaging and hosting, and release and scaling, can now be deployed into Kubernetes as plug-ins, one CRD and Controller at a time, in a process quite similar to the "sinking" of application infrastructure.

In this process, of course, these plug-ins become the key to connecting and integrating Kubernetes with existing operations systems: how to upgrade applications in place, how to pin IPs, how to prevent Pods from being evicted arbitrarily, how to manage and release Sidecar containers uniformly according to the company's operations standards, and so on. The release of OpenKruise, a custom-workload open source project, at KubeCon Shanghai in 2019 is a typical case of successfully connecting Kubernetes with an existing operations system. The bottom line: in a modern cloud native technology system, "operations capabilities" can become part of the Kubernetes infrastructure directly, through the declarative API and the controller model, so in most cases there is no need for a separate operations PaaS.
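As a hedged sketch of such a plug-in in use, the snippet below submits a CloneSet-like object, modeled loosely on OpenKruise's in-place-update workload, through the dynamic client. The group, version, and field names should be checked against the project itself, and required fields such as selector and template are omitted for brevity.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(config)

	// A CloneSet-like workload that encodes an operations policy --
	// "upgrade in place when possible" -- directly in its declarative spec.
	// Selector and pod template are omitted here for brevity; a real object
	// needs them, so consult the plug-in's own documentation.
	obj := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "apps.kruise.io/v1alpha1",
		"kind":       "CloneSet",
		"metadata":   map[string]interface{}{"name": "demo"},
		"spec": map[string]interface{}{
			"replicas": int64(3),
			"updateStrategy": map[string]interface{}{
				"type": "InPlaceIfPossible",
			},
		},
	}}

	gvr := schema.GroupVersionResource{
		Group: "apps.kruise.io", Version: "v1alpha1", Resource: "clonesets",
	}
	_, err = client.Resource(gvr).Namespace("default").Create(
		context.TODO(), obj, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}
```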

However, even once "operations capabilities" have been made cloud native in this way, the "application-centric" infrastructure is still only in its infancy.

Carrying "application-centric" through to the end

Unlike traditional middleware, which starts from the perspective of business development, the "sinking" revolution of cloud native, and of application infrastructure as a whole, proceeds bottom-up: it began with container infrastructure ideas at an even lower level than "cloud computing" itself, and only then revealed, layer by layer, the "application-centric" design philosophy. Given the inherently high barrier of infrastructure and the pace at which declarative application management theory has been absorbed, it was not until 2019 that the community's understanding of the Kubernetes system rose from "IaaS-like infrastructure" and "resource management and scheduling" to the dimension of "application-centric operations capability".

Of course, this "application-centric" technological revolution will not come to an abrupt halt at the "operations" station. So what happens next?

In fact, the declarative API and the controller pattern are only the means of connecting underlying infrastructure capabilities and operations capabilities into Kubernetes, not the end. What Kubernetes really hopes to achieve by building an "application-centric" infrastructure is to make the "application" the absolute protagonist, so that the infrastructure is built and operates around the "application", not the other way around.

Specifically, in the next generation of "application-centric" infrastructure, business developers will no longer use the declarative API to define concrete operational configuration (for example, "the replica count is 10"), but rather intents such as "the maximum latency my application expects is X ms". Everything after that is handed to an application infrastructure like Kubernetes, which ensures that actual state matches expected state; setting the replica count is then just one step among all the automated operations that follow. Only when Kubernetes lets developers define applications from their own perspective, instead of defining Kubernetes API objects, can the problems caused by the "mismatch" among Kubernetes users be fundamentally solved.

For business operations, the next generation of "application-centric" infrastructure must encapsulate and re-expose operations capabilities from the application's perspective, rather than making operators learn and operate the various underlying infrastructure plug-ins directly. For example, for the Kubernetes network plug-in kube-ovn, operators would no longer be handed kube-ovn's own usage documents or CRDs, but a series of declarative descriptions of the operations capabilities the plug-in can provide, such as a "dynamic QoS configuration CRD" and a "subnet isolation configuration CRD". Operators then only need to bind concrete dynamic QoS and subnet isolation policy values to an application through these declarative APIs; everything else is left to Kubernetes.
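To make this concrete, here is a hypothetical sketch of what such a "dynamic QoS configuration CRD" might look like as a Go type. Every name is illustrative, not a real kube-ovn or Kubernetes API; the point is that the operator declares a policy bound to an application, and a controller translates it into the plug-in's own configuration.

```go
// Package v1alpha1 sketches a hypothetical operations-capability CRD.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DynamicQoSSpec describes the bandwidth guarantees an operator wants for
// one application, independent of which network plug-in implements them.
type DynamicQoSSpec struct {
	// TargetApplication names the application this policy binds to.
	TargetApplication string `json:"targetApplication"`
	// IngressMbps and EgressMbps are the declared bandwidth limits.
	IngressMbps int32 `json:"ingressMbps"`
	EgressMbps  int32 `json:"egressMbps"`
}

// +kubebuilder:object:root=true

// DynamicQoS is consumed by a controller that translates the policy into
// the underlying plug-in's configuration (kube-ovn settings, for example),
// so operators never touch the plug-in's own CRDs or documents.
type DynamicQoS struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              DynamicQoSSpec `json:"spec,omitempty"`
}
```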

In this way, as the underlying infrastructure layer gradually becomes Serverless and more and more operations capabilities are connected into the Kubernetes system, the next stop of cloud native will continue in the direction of carrying "application-centric" through to the end.

In April 2019, Google Cloud released the Cloud Run service, presenting the concept of Serverless Application to the public for the first time. In September 2019, CNCF announced the establishment of the App Delivery SIG, making "application delivery" a first-class citizen of the cloud native ecosystem. In October 2019, Microsoft released Dapr, a Sidecar-based application middleware project, making "localhost-oriented programming" a reality.

Also in October 2019, Alibaba and Microsoft jointly released the Open Application Model (OAM) project, giving for the first time a complete description of, and construction specification for, "application-centric" infrastructure. Across 2019 we could already feel a new cloud computing revolution rising. At its end, any software that conforms to the "application-centric" specification will be able to describe all of its own attributes, along with all the operations capabilities and external dependencies it needs to run, from the developer's perspective rather than the infrastructure's.

Such an application can be inherently a Serverless Application: it can be "automatically matched" to any "application-centric" runtime platform that satisfies its needs, and we no longer need to care about any infrastructure details or differences.

Summary

In 2019, as the scale problems of the Kubernetes technology system were gradually solved, the whole cloud native ecosystem began actively thinking about and exploring how to reach the final vision of the cloud native idea: "application-centric". From Serverless Infrastructure to Serverless Application, we are watching the cloud native technology stack rapidly converge toward the "application". We believe that in the near future, for an application to run anywhere in the world, the only things it will need to do are declare "what I am" and "what I want". At that point, all the infrastructure concepts, including Kubernetes, Istio, and Knative, will fade from view: just like the Linux Kernel today.

We'll see.

"Alibaba Cloud Native focus on micro-services, Serverless, containers, Service Mesh and other technology areas, focus on cloud native popular technology trends, cloud native large-scale landing practice, to be the best understanding of cloud native developers of the technology circle."
