Who am I
My name is Zhang Lei. I am a senior member of the Kubernetes community and a project maintainer. I work on upstream development in the Kubernetes and Kata Containers communities, where I initiated core features such as container-image affinity scheduling and equivalence-class-based scheduling optimization, and took part in designing and developing foundational features such as the Container Runtime Interface and secure container sandboxes. As one of its main developers and maintainers, I experienced first-hand the birth and rise of the Serverless Container concept.
In my spare time, I initiated and organized the writing of the book Docker: Container and Container Cloud, which has been well received by readers looking to go deeper into container technology. I have taken part in, and lived through, container technology's whole journey from fledgling novelty to settled dust.
At the end of this article, there is a guide to a free bundle of Kubernetes knowledge resources.
From Container to Container Cloud
I once mentioned that a "container" is in essence an isolated environment for processes, built from Linux Namespaces, Linux Cgroups, and rootfs.
So a running Linux container can actually be viewed as two parts:
A set of rootfs jointly mounted at /var/lib/docker/aufs/mnt, which we call the "Container Image"; this is the static view of the container.
An isolated environment made up of Namespaces and Cgroups, which we call the "Container Runtime"; this is the dynamic view of the container.
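As a quick illustration, here is a minimal sketch of inspecting both views on the host (an assumption for illustration: a Docker host using the aufs storage driver; the IDs are placeholders, not values from this article):
# The static view: the container's rootfs, mounted under the aufs driver
$ ls /var/lib/docker/aufs/mnt/<container-id>
# The dynamic view: the Namespace handles of the container's init process
$ ls -l /proc/<container-pid>/ns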
As developers, we do not care about differences in the container runtime, because throughout the "develop-test-release" workflow it is the container image, not the container runtime, that truly carries the container's information.
This important assumption is the main reason why, shortly after the Docker project's success, the container technology circle quickly moved toward the "superstructure" of container orchestration: as a cloud service provider or infrastructure provider, as long as I can run the Docker image a user submits as a container, I become a load-bearing point on this lively map of the container ecosystem, and the value of the entire container technology stack settles on my node.
More importantly, as long as I trace back from my load-bearing point to the makers and users of Docker images, I have room to operate and profit at every service node along the path, such as CI/CD, monitoring, security, networking, and storage. This logic is an important reason why all cloud computing providers are so keen on container technology: through container images, they can connect directly with potential users, that is, developers.
From one developer and a single container image to countless developers and enormous container clusters, container technology made the leap from "container" to "container cloud", showing that it had truly won the recognition of the market and the ecosystem.
With that, containers leapt from a gadget in a developer's hands to the undisputed protagonist of cloud computing, and "container orchestration" technology, which defines how containers are organized and managed, deservedly took the top spot in the container technology field.
Among the orchestration tools, the most representative are Docker's own Compose+Swarm combination and the Kubernetes project jointly led by Google and RedHat.
Design and Architecture of Kubernetes Project
I'd like to talk to you about the design and architecture of the Kubernetes project.
Unlike many infrastructure fields, where engineering practice comes before methodology, the Kubernetes project's theoretical foundation ran well ahead of its engineering practice, thanks to the Borg paper published by Google in April 2015.
The Borg system has long been regarded as Google's most powerful internal "secret weapon". That is a bit of an exaggeration, but not an empty boast: compared with relatively higher-level projects such as Spanner and BigTable, Borg carries the core responsibility for Google's entire infrastructure. In the infrastructure papers Google has published, the Borg project sits at the very bottom of the technology stack.
Image source: Malte Schwarzkopf. "Operating system support for warehouse-scale computing". PhD thesis. University of Cambridge Computer Laboratory (to appear), 2015, Chapter 2.
The figure above comes from the doctoral thesis of the first author of Google's Omega paper. In it you can find well-known projects such as MapReduce and BigTable, as well as Borg and its successor Omega at the bottom of the technology stack.
In this sense, Borg could be called the Google project least likely to be open-sourced. Fortunately, thanks to the popularity of the Docker project and container technology, it finally met the open source community in another form: the Kubernetes project.
Therefore, compared with the "small-time" Docker company and the "new wine in old bottles" Mesos community, the Kubernetes project was lucky enough to stand, from the very beginning, at a height difficult for others to reach: during its growth, nearly every core feature it proposed was born out of the design and experience of the Borg/Omega systems. More importantly, as these features landed in the open source community, they were greatly improved through the community's joint efforts, fixing many defects and problems that had lingered in the Borg system.
So although it was criticized as "pie in the sky" when first released, the community, once it gradually recognized the "immaturity" of the Docker technology stack and the "aging" of the Mesos community, soon understood that the Kubernetes project, guided by the Borg system, embodies a unique forward-looking design and completeness, and these are the core values that let an open source project survive in the infrastructure field.
To better understand these two qualities, let's start with the top-level design of Kubernetes.
Problems to be solved by the Kubernetes project
Orchestration? Scheduling? Container cloud? Or cluster management?
In fact, there is no fixed answer to this question yet, because at different stages of its development, Kubernetes has needed to focus on solving different problems.
However, for most users, what they expect from the Kubernetes project is definite: now that I have an application's container image, please help me run this application on a given cluster.
Beyond that, I also hope Kubernetes can give me a set of operation and maintenance capabilities around the application, such as routing gateways, horizontal scaling, monitoring, backup, and disaster recovery.
Wait, don't these functions sound familiar? Isn't this exactly what classic PaaS projects (e.g., Cloud Foundry) offer?
Moreover, with Docker around, I don't need Kubernetes or a PaaS at all: using Docker's own Compose+Swarm projects, I can easily DIY these functions!
So, if the Kubernetes project only went as far as pulling user images, running containers, and providing common operation and maintenance functions, it would struggle to compete with the "native" Docker Swarm project, let alone gain any advantage over classic PaaS projects.
In fact, while defining its core functions, the Kubernetes project found its footing within just a few months by relying on the theoretical advantage of the Borg project, and then settled on the overall architecture shown in the figure below:
From this architecture we can see that the Kubernetes project, much like its prototype Borg, is composed of Master and Node nodes, which correspond to control nodes and compute nodes respectively.
The control node, namely the Master node, is made up of three closely cooperating independent components: kube-apiserver, responsible for API services; kube-scheduler, responsible for scheduling; and kube-controller-manager, responsible for container orchestration. The persistent data of the entire cluster is handled by kube-apiserver and saved in Etcd.
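As a quick check (a hedged example: it assumes a cluster set up with kubeadm, where these Master components run as static Pods in the kube-system namespace), you can list them with:
$ kubectl get pods -n kube-system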
The core part of the computing node is a component called kubelet.
Kubelet component
In the Kubernetes project, kubelet is primarily responsible for dealing with the container runtime (such as the Docker project). This interaction relies on a remote call interface called CRI (Container Runtime Interface), which defines the core operations of a container runtime, for example, all the parameters needed to start a container.
This is why the Kubernetes project does not care what container runtime you deploy or what technology it is implemented with: as long as your container runtime can run standard container images, it can be plugged into the Kubernetes project by implementing CRI. A concrete container runtime, such as the Docker project, generally interacts with the underlying Linux operating system through the OCI container runtime specification, that is, by translating CRI requests into calls to the Linux operating system (manipulating Linux Namespaces, Cgroups, and so on).
In addition, kubelet interacts with plug-ins called Device Plugins through the gRPC protocol. This kind of plug-in is the main component through which Kubernetes manages physical devices on the host, such as GPUs, and it is a feature worth attention for machine-learning training, high-performance jobs, and other work built on the Kubernetes project.
Another important function of kubelet is to invoke network plug-ins and storage plug-ins to configure networking and persistent storage for containers. The interfaces through which these two kinds of plug-ins interact with kubelet are CNI (Container Networking Interface) and CSI (Container Storage Interface) respectively.
In fact, the odd name kubelet comes from Borglet, its counterpart component in the Borg project. However, if you read the Borg paper, you will find that this naming is perhaps the only similarity between the kubelet and Borglet components. The Borg project does not support the container technology we are discussing here; it simply uses Linux Cgroups to constrain processes.
This means that "container images" like Docker's do not exist in Borg, so the Borglet component naturally never needs to consider how to interact with Docker or manage container images as kubelet does, nor does it need to support interfaces such as CRI, CNI, and CSI.
It is fair to say that kubelet was reimplemented from scratch to realize the Kubernetes project's container management capabilities, and it has no direct lineage from Borg.
Note: although Docker is not used, Google does use a package management tool called Midas Package Manager (MPM), which can partially play the role of Docker images.
The guiding role of Borg
What, then, is Borg's guiding role in the Kubernetes project? The answer is: the Master node.
Although the Borg and Kubernetes projects differ in the implementation details of the Master node, their starting points are highly consistent: how to orchestrate, manage, and schedule the jobs users submit?
The Borg team could therefore treat the Docker image as simply a new way of packaging applications, which allowed their past experience in large-scale job management and orchestration to be applied directly to the Kubernetes project.
The main expression of this experience is that, from the very beginning, the Kubernetes project did not treat Docker as the core of the whole architecture, as the various "container cloud" projects of the same period did, but merely as a low-level container runtime implementation.
The problem the Kubernetes project chose to focus on comes from a very important insight the Borg researchers mention in their paper:
The tasks running in a large-scale cluster actually stand in all kinds of relationships to one another. Handling these relationships is the most difficult part of a job orchestration and management system.
And that is exactly the case.
In fact, relationships between tasks appear everywhere in everyday technical scenarios. For example: the access relationship between a Web application and its database; the proxy relationship between a load balancer and its back-end services; the invocation relationship between a portal application and its authorization component.
Furthermore, such relationships can exist even between different functional units belonging to the same service, for example the file-exchange relationship between a Web application and its log collection component.
Before container technology became widespread, traditional virtual machine environments handled these relationships in a rather "coarse-grained" way. You would often find many unrelated applications deployed in the same virtual machine simply because they occasionally make a few HTTP requests to each other. More often, once an application was deployed in a virtual machine, you had to manually maintain a number of daemons (Daemon) collaborating with it to handle log collection, disaster recovery, data backup, and other auxiliary work.
With the arrival of container technology, however, it is easy to see that containers have a unique "fine-grained" advantage when dividing up "functional units": after all, a container is in essence just a process. In other words, the applications, components, and daemons that used to crowd into one virtual machine can each be built into its own image and run in its own container. They do not interfere with one another, have their own resource quotas, and can be scheduled onto any machine in the cluster. This is the ideal working state for a PaaS system, and also a prerequisite for the so-called "microservices" idea to land.
Of course, if all you achieve is "encapsulating microservices and scheduling single containers", the Docker Swarm project is more than sufficient. Add the Compose project and you can even handle some simple dependencies, such as a "Web container" and the database "DB container" it accesses.
In the Compose project, you can define a "link" between two such containers, and the Docker project takes responsibility for maintaining this "link" relationship. Concretely, Docker injects information such as the IP address and port of the DB container into the Web container as environment variables for the application process to use, for example:
DB_NAME=/web/db
DB_PORT=tcp://172.17.0.5:5432
DB_PORT_5432_TCP=tcp://172.17.0.5:5432
DB_PORT_5432_TCP_PROTO=tcp
DB_PORT_5432_TCP_PORT=5432
DB_PORT_5432_TCP_ADDR=172.17.0.5
When the DB container changes (for example, after an image update or a migration to another host), the values of these environment variables are automatically updated by the Docker project. This is a typical example of a platform project automatically handling relationships between containers.
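For illustration, here is a minimal sketch of how such a "link" might be declared in a Compose file (version 1 format; the service names and the Web image are assumptions, not from the original example):
web:
  image: my-web-app        # hypothetical Web application image
  links:
    - db                   # Docker injects the DB_* variables shown above
db:
  image: postgres:9.4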
But what if we now need this project to handle all the kinds of relationships listed earlier, and even to support more kinds of relationships that may emerge in the future?
At that point, "link", a solution designed for a single use case, is far too simplistic. If you have done architecture work, you will know this well: once you pursue generality in a project, you must design from the top down.
Therefore, the main design idea of the Kubernetes project is to define the various relationships between tasks in a unified way from a broader perspective, while leaving room to support more kinds of relationships in the future.
For example, the Kubernetes project classified the "access" relationships between containers and first distilled out a very common kind of "intimate" relationship: applications that need to interact and access each other very frequently, or that exchange information directly through local files.
In a conventional environment, such applications would often be deployed directly on the same machine, communicating via Localhost and exchanging files through local disk directories. In the Kubernetes project, such containers are grouped into a "Pod"; the containers inside a Pod share the same Network Namespace and the same set of data volumes, and can thus exchange information efficiently.
Pod is the most basic object in the Kubernetes project; it descends from a design called Alloc in Google's Borg paper. We will discuss Pod in more detail in later chapters.
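As an illustrative sketch (the Pod name, container names, and the log-collector image are assumptions), a Pod in which a Web server and a log collector exchange files through a shared volume might look like this:
apiVersion: v1
kind: Pod
metadata:
  name: web-with-log-collector
spec:
  volumes:
  - name: shared-logs            # a data volume shared by both containers
    emptyDir: {}
  containers:
  - name: web
    image: nginx:1.7.9
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx  # nginx writes its logs here
  - name: log-collector
    image: fluent/fluentd:v1.3   # hypothetical log-collector image
    volumeMounts:
    - name: shared-logs
      mountPath: /logs           # the collector reads the same files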
For another, even more common kind of requirement, such as the access relationship between a Web application and a database, the Kubernetes project provides an object called "Service". Two applications like these are often deliberately not deployed on the same machine, so that even if the machine hosting the Web application goes down, the database is completely unaffected. But we know that a container's IP address and other details are not fixed, so how does the Web application find the database container's Pod?
The Kubernetes project's approach is to bind a Service to such a Pod; the IP address and other information a Service declares remain stable for its whole lifetime. The main role of this Service is to act as a portal for the Pod, exposing a fixed network address to the outside world on the Pod's behalf.
In this way, the Web application's Pod only needs to care about the Service of the database Pod. As you can imagine, it is the Kubernetes project's responsibility to automatically update and maintain the IP addresses, ports, and other details of the Pods that the Service actually proxies behind the scenes.
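A minimal sketch of such a Service (the name, label, and port are illustrative assumptions):
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  selector:
    app: db          # routes traffic to Pods carrying this label
  ports:
  - protocol: TCP
    port: 5432       # the fixed port the Service exposes
    targetPort: 5432 # the port the database container listens on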
In this fashion, by expanding outward from containers and Pods into real technical scenarios, we can sketch out the following "panorama" of the Kubernetes project's core functions.
Following the clues in this diagram: starting from the most basic notion of a container, we first met the problem of "close collaboration" between containers and so extended to the Pod; with the Pod, we wanted to start multiple application instances at once, and so needed Deployment, the multi-instance manager for Pods; and once we had a group of identical Pods, we needed to reach them through a fixed IP address and port in a load-balanced way, and so we had Service.
But what if, besides an "access relationship" between two different Pods, authorization information must also be presented when access is initiated? The most typical example: a Web application needs Credential information (a database username and password) when accessing the database. How is such a relationship handled in Kubernetes?
For this, the Kubernetes project provides an object called Secret, which is in fact key-value data stored in Etcd. You store the Credential information in Etcd as a Secret, and when the Pod you designate (for example, the Web application's Pod) starts, Kubernetes automatically mounts the Secret's data into the container as a Volume. The Web application can then access the database.
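A minimal sketch of such a Secret, and of the Pod spec that mounts it as a Volume (all names and values are illustrative; the data fields hold base64-encoded strings):
apiVersion: v1
kind: Secret
metadata:
  name: db-credential
type: Opaque
data:
  username: YWRtaW4=       # base64 of "admin"
  password: cGFzc3dvcmQ=   # base64 of "password"
And in the Web application's Pod spec:
spec:
  containers:
  - name: web
    image: my-web-app               # hypothetical image
    volumeMounts:
    - name: db-cred
      mountPath: /etc/db-credential # the Secret's data appears as files here
      readOnly: true
  volumes:
  - name: db-cred
    secret:
      secretName: db-credential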
Besides the relationships between applications, the form in which an application runs is the second important factor that affects how it should be orchestrated.
To this end, Kubernetes defines new objects based on Pod that extend and improve it. For example: Job describes Pods that run once and exit (such as big-data tasks); DaemonSet describes a daemon service that must run exactly one copy on every host; CronJob describes scheduled tasks; and so on.
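For instance, here is a minimal sketch of a CronJob (the name, schedule, and image are illustrative; the exact apiVersion for CronJob varies across Kubernetes releases):
apiVersion: batch/v1beta1      # may differ depending on your Kubernetes version
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"        # run every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: report-generator:latest   # hypothetical image
          restartPolicy: OnFailure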
Such is the chief way the Kubernetes project defines the relationships and forms between containers.
As you can see, the Kubernetes project does not, like other projects, create a command for every management function and then implement the logic inside the project. That approach can indeed solve the problems at hand, but as more problems arrive it tends to run out of steam.
By contrast, in the Kubernetes project, the preferred usage is:
First, describe the application you are trying to manage through an "orchestration object", such as Pod, Job, or CronJob.
Then, define some "service objects" for it, such as Service, Secret, or Horizontal Pod Autoscaler (an automatic horizontal scaler). These objects take responsibility for concrete platform-level functions.
This way of using the system is called the "declarative API". The "orchestration objects" and "service objects" corresponding to this API are the API objects (API Object) of the Kubernetes project.
This is the core design concept of Kubernetes, and it is also the key technical point that I will focus on next.
How to start a containerized task
Suppose, for example, that I have made an Nginx container image and want the platform to start it for me. Moreover, I want the platform to run two identical copies of Nginx, serving traffic in a load-balanced manner.
If you were to DIY this, you might need to start two virtual machines, install Nginx on each, and then use keepalived to set up a virtual IP for the two machines.
But with the Kubernetes project, all you need to do is write a YAML file like the following (named, say, nginx-deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
In this YAML file we define a Deployment object. Its body (the spec.template part) is a Pod that uses the Nginx image, and the number of replicas of this Pod is 2 (replicas=2).
Then execute:
$ kubectl create -f nginx-deployment.yaml
In this way, two identical copies of the Nginx container are started.
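To verify, you can list the Pods the Deployment created, filtering by the label from the YAML above:
$ kubectl get pods -l app=nginx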
You may wonder, though: to accomplish the same thing, a Kubernetes user seems to have no less work to do.
Later on, I will walk you through the benefits of the Kubernetes project's "declarative API" and the powerful orchestration capabilities built on top of it.
Easter egg: get the Kubernetes skill map for free
Follow this official account and reply "K8S" to receive the "Kubernetes Skill Map" produced by Kubernetes project maintainer Zhang Lei and the author of the Etcd project, a senior technical expert in Alibaba's System Software Division.