
Practice of a Kubernetes Container Cloud Platform


Kubernetes is an open source container orchestration engine from Google that supports automated deployment, large-scale scaling, and containerized application management. With the rapid rise of cloud native technology, Kubernetes has become the de facto standard for container platforms; it is increasingly favored by enterprises and ever more widely used in production.

Since 2016, the construction of our container platform has gone through roughly three stages: exploration and pre-research, system construction, and platform rollout.

Below I share the construction of our container cloud platform from the angles of Kubernetes networking, storage, cluster management, and monitoring and operations, hoping to offer some food for thought.

I. Kubernetes Networking

As container networking has developed, it has settled into a two-camp pattern: Docker's CNM and the CNI backed by Google, CoreOS, and Kubernetes. First, be clear that CNM and CNI are not network implementations; they are network specifications and network systems. From a developer's perspective they are a set of interfaces: whether Flannel or Calico sits underneath is not their concern. What CNM and CNI care about is network management.

According to our survey of network requirements, business teams mainly care about the following points: 1. the container network is reachable from the physical network; 2. the faster the better; 3. the fewer changes the better; 4. the fewer risk points the better.

Container network schemes can be analyzed along three dimensions: protocol-stack layer, traversal form, and isolation method.

Protocol-stack layer: Layer 2 is easy to understand; it is common in traditional data centers and virtualization scenarios and is based on bridging with ARP and MAC learning. Its biggest defect is broadcast, because layer-2 broadcast limits the number of nodes. Layer 3 (pure routed forwarding) is generally based on BGP, with each node independently learning the routing state of the whole data center. Its biggest advantage is IP reachability: as long as the network runs on IP, it can be traversed, so it scales very well and has good order-of-magnitude scalability. In real deployments, however, most enterprise networks are tightly controlled: some enterprises do not open BGP to developers for security reasons, and other networks do not run BGP at all, which limits this option. Layer 2 plus layer 3 has the advantage of solving both the scaling problems of layer 2 and the restrictions on layer 3; it is especially useful in cloud VPC scenarios, where the VPC's cross-node layer-3 forwarding capability can be exploited.

Traversal form:

This depends heavily on the actual deployment environment. There are two traversal forms: Underlay and Overlay.

Underlay: generally used when the underlying network is well controlled. Put simply, whether the hosts are bare metal or virtual machines, as long as the whole network is under our control, the container network can pass straight through it; that is Underlay.

Overlay: common in cloud scenarios. Beneath the overlay is a controlled VPC network, which will not allow IP or MAC addresses outside its jurisdiction to traverse. When we run into that, we use an Overlay.

An Overlay network virtualizes and pools the physical network, which is key to converging cloud and network. Combining an Overlay with SDN, with the SDN controller serving as the control plane of the Overlay network, makes it easier to integrate network and compute components, and is an ideal choice when transforming the network into a cloud platform service.

Isolation method:

Isolation methods usually divide into VLAN and VXLAN:

VLAN: VLAN is used heavily in data centers, but it has a real problem: the total number of tenants is limited. As is well known, the 12-bit VLAN ID allows only about 4,000 VLANs.

VXLAN: VXLAN is the more mainstream isolation method today, because it scales better (a 24-bit VNI) and, being carried over IP, traverses well.

We analyzed several common Kubernetes network components (Calico, Contiv, Flannel, OpenShift SDN, custom routing) in traditional data center networks and cloud VPC networks along the dimensions of protocol-stack layer, traversal form, and isolation method, and used a wiring diagram to express the relationships among them.

First, whether in a traditional data center network or a cloud VPC network, the Overlay scheme is universally applicable; it is probably used more in cloud scenarios because of its good penetration.

In the picture above, the solid red lines point to the traditional data center network, which deserves emphasis: the Underlay + layer-3 scheme is very popular there, its performance is considerable, and it fits many scenarios.

The green dotted lines point to the cloud VPC network: the Underlay + layer-3 scheme can also be used there, but only with restrictions. "Restricted use" means exactly that: it can work, but not every vendor lets you use it, because each cloud vendor defines its own network protections differently. For example, Calico's BGP works easily on AWS but is not allowed on Azure, because Azure's VPC does not permit IP addresses it does not control.

The solid yellow lines point to the cloud VPC network: Overlay + layer 2 or layer 3 is more common in cloud scenarios. Beneath the Overlay is a controlled VPC network, which makes management and control more convenient.

Of course, cloud VPC scenarios also have some problems of their own, as shown in the following figure.

Next, let's talk about network isolation between tenants.

Kubernetes introduced the NetworkPolicy mechanism in version 1.3, through which inbound and outbound access policies between Pods can be enforced.

Network policies apply to groups of Pods identified by shared labels, so labels can simulate traditional segmented networks: front-end and back-end Pods are identified by specific "segment" labels, and policies control traffic between these segments, and even from external sources. Not every network backend supports policies, though; Flannel, for example, does not. Many vendors are strengthening their work in this area and new solutions keep appearing, so we will not list them one by one.
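
As a minimal sketch of such a segment policy (the namespace, labels, and port below are hypothetical, not taken from the platform described here), a NetworkPolicy that lets only front-end Pods reach back-end Pods might look like this:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend   # hypothetical policy name
  namespace: demo                # hypothetical tenant namespace
spec:
  podSelector:
    matchLabels:
      tier: backend              # the "segment" label on the protected Pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: frontend     # only Pods carrying the front-end segment label
      ports:
        - protocol: TCP
          port: 8080             # assumed back-end service port
```

A policy like this only takes effect when the network backend enforces NetworkPolicy, which, as noted above, Flannel does not.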

Then there is managing the cluster boundary with Ingress.

Ingress first appeared in Kubernetes 1.2. Container applications expose themselves as Services by default, but a Service is only reachable inside the cluster; exposing it through an Ingress makes it available to clients outside the cluster.
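
For illustration, here is a minimal Ingress that publishes an in-cluster Service to external clients; the hostname, Service name, and port are hypothetical, and the manifest uses the current networking.k8s.io/v1 API rather than the 1.2-era original:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress              # hypothetical name
spec:
  rules:
    - host: web.example.com      # hypothetical external hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-svc    # the in-cluster Service being exposed
                port:
                  number: 80
```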

Here is a comparison of common Ingress Controllers, as shown in the table below.

We can see that Nginx does better on performance and functionality as well as community activity, and it is the most used in practice.

II. Kubernetes Storage

K8s was originally used to manage stateless services, but as more and more applications migrate to the K8s platform, managing storage resources has become a very important function.

The use of storage in Kubernetes focuses on the following aspects:

Reading basic configuration files and managing passwords and keys; persisting state and accessing data; sharing data between different services or applications. These break down into roughly the scenarios shown in the figure.
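
A minimal sketch of the first scenario, mounting configuration and a secret into a container (all names and the image are hypothetical, and the ConfigMap and Secret are assumed to exist already):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-demo                 # hypothetical Pod name
spec:
  containers:
    - name: app
      image: busybox:1.36        # hypothetical image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: app-config
          mountPath: /etc/app    # configuration files appear here
        - name: app-secret
          mountPath: /etc/keys   # decoded secret keys appear here
  volumes:
    - name: app-config
      configMap:
        name: app-config         # pre-created ConfigMap
    - name: app-secret
      secret:
        secretName: app-secret   # pre-created Secret
```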

Kubernetes storage is designed to follow the Kubernetes philosophy of declarative architecture. At the same time, to be compatible with as many storage platforms as possible, Kubernetes connects to different storage systems through in-tree plugins, which users employ to provide storage services to containers according to their business needs; it is also compatible with user-customized plugins via FlexVolume and CSI. Compared with Docker volumes, this supports richer and more diverse storage functionality.

Kubernetes storage plugin analysis:

1. In-tree plugins: the storage code is compiled into Kubernetes itself and is too tightly coupled.

2. FlexVolume: the storage plugin is installed on the host and requires root permission on the host.

3. CSI specification: completely decouples the storage code from Kubernetes (version 1.10 or above, using CSI attacher 0.2.0).

The CSI specification greatly simplifies the development, maintenance, and integration of plugins, and its prospects are good.

Kubernetes uses two resources to manage storage:

PersistentVolume (PV): a description of a piece of storage added by an administrator. It is a global resource and includes the storage's type, size, access mode, and so on. Its lifecycle is independent of any Pod; destroying a Pod that uses a PV has no effect on the PV.

PersistentVolumeClaim (PVC): a namespaced resource describing a request for a PV. The request includes storage size, access mode, and so on.

A PV can be seen as an available storage resource, and a PVC as a demand for storage; a PVC automatically binds a suitable PV to a Pod according to the Pod's requirements. The relationship between PV and PVC follows the lifecycle shown in the following figure.

PV provisioning can be static or dynamic. The static mode manages NFS, FC, iSCSI, and the like; the dynamic mode manages GlusterFS, Cinder, Ceph RBD, vSphere, ScaleIO, AWS, Azure, and so on. In the static mode administrators must create and manage PVs; in the dynamic mode the system automatically generates a PV and binds it to the PVC.
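
A minimal sketch of the static model (the NFS server address, path, and sizes are hypothetical): the administrator creates the PV, a user submits the PVC, and Kubernetes binds them:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-demo              # created by the administrator
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.10            # hypothetical NFS server
    path: /exports/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-nfs-demo             # created by the user in their namespace
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi              # Kubernetes binds the claim to a matching PV
```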

A brief supplement on image management in Kubernetes: production holds many images of different versions and applications, so image management is also an important part of the platform.

Multi-tenant permission management for images:

1. The images of different tenants should be isolated from each other.

2. Different tenants have different permissions on the image, such as read-write, read-only, upload and download permissions.

3. The image registry provides functions such as querying, updating, and deleting images.

For image management across regions and multiple data centers, pay attention to remote replication between registries:

1. In a multi-data-center or cross-region, multi-site environment, improving image download efficiency across regions requires at least two tiers of registries: a primary registry and regional sub-registries.

2. Near-real-time incremental synchronization between registries.

III. Kubernetes Cluster Management

In production systems, managing multiple Kubernetes clusters mainly involves:

1. Service operation and maintenance

2. Centralized configuration

3. Capacity expansion and upgrading

4. Resource quotas (see the sketch below)
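
As a sketch of the resource-quota item above (the namespace and limits are hypothetical), a ResourceQuota that caps one tenant's namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota               # hypothetical quota name
  namespace: team-a              # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "20"           # total CPU the namespace may request
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"                  # maximum number of Pods in the namespace
```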

First, scheduling management across clusters.

1. Scheduling strategies in Kubernetes fall roughly into two kinds: global scheduling strategies and runtime scheduling strategies.

2. Node isolation and recovery; node expansion; dynamic scaling of Pods.

3. Affinity enables co-located deployment, strengthening the network so that communication takes the nearest route and network loss is reduced; anti-affinity is mainly for high reliability, spreading instances as widely as possible (see the sketch after this list).

4. Micro-service dependencies, defining the startup order.

5. Applications from different departments are not mixed on the same nodes.

6. API gateways and GPU nodes are dedicated to exclusive applications.
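
As the sketch promised in item 3 (the app name and image are hypothetical), a Deployment using required pod anti-affinity so that replicas spread across nodes for reliability:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical application
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:         # never co-schedule two replicas on one node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web
          image: nginx:1.25      # hypothetical image
```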

Application auto-scaling in multi-cluster management:

1. Manual scaling: for when changes in business volume are known in advance.

2. Autoscaling on CPU utilization: the HPA controller was introduced in v1.1; Pods must set CPU resource requests (see the sketch after this list).

3. Autoscaling on custom business metrics: v1.7 redesigned HPA and added new components, collectively called HPA v2.
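
As the sketch referenced in item 2 (the target Deployment and thresholds are hypothetical), a CPU-based HorizontalPodAutoscaler in the original autoscaling/v1 form:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                  # hypothetical HPA name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70  # scale out when average CPU passes 70% of requests
```

For this to work, the target Pods must declare CPU requests, as noted above.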

In practice HPA still has many imperfections; many vendors use their own monitoring systems to watch business metrics and drive automatic scaling.

Kubernetes multi-cluster tuning:

There are three main difficulties:

The first is how to allocate resources: when a user chooses multi-cluster deployment, the system decides how many containers each cluster gets based on each cluster's resource usage, while guaranteeing each cluster at least one container. When clusters scale automatically, containers are also created and reclaimed in this proportion.

The second is failure migration. The cluster controller mainly handles automatic scaling across clusters and container migration when a cluster fails. The controller probes the cluster's nodes periodically; repeated failures trigger migration of that cluster's containers so the service keeps running reliably.

The third is interconnecting network and storage. Because the network between data centers must be interconnected, we adopted a VXLAN network scheme, and storage is likewise interconnected over dedicated lines. The container image registry is Harbor, with synchronization policies configured among the clusters; each cluster has its own DNS resolution pointing at a different registry.

Next, let's talk about high availability for the Master nodes of a Kubernetes cluster. The core of a Kubernetes cluster is its master node, but by default there is only one: if the master fails, the cluster is paralyzed, and cluster management and Pod scheduling stop. Hence a one-active, multiple-standby architecture covering the master components, etcd, and so on, from which a highly available design can be built.

It is also worth understanding the Federation architecture for cluster federation.

In a cloud computing environment, services range from near to far as follows: same host (Host/Node), cross-host within an availability zone (Availability Zone), cross-AZ within a region (Region), cross-region within one service provider (Cloud Service Provider), and cross-cloud-platform. Kubernetes was designed as a single cluster within a single region, because intra-region network performance can satisfy Kubernetes scheduling and compute-storage connectivity. Cluster Federation was designed to provide Kubernetes clusters across regions and across service providers, for high business availability.

Federation was introduced in version 1.3; the federation/v1beta1 API extends the functionality on the basis of DNS service discovery. Using DNS, Pods can resolve services transparently across clusters.

Version 1.6 added cascading deletion of federated resources; version 1.8 claims support for 5,000-node clusters; Cluster Federation v2 is in progress.

The current problems are:

1. Increased network bandwidth and cost.

2. Weakened isolation between clusters.

3. Immaturity: no formal production use yet.

IV. Kubernetes Monitoring and Operations

For a monitoring system, the common dimensions are resource monitoring and application monitoring. Resource monitoring means the resource utilization of nodes and applications, which in container scenarios extends to node, cluster, and Pod resource utilization. Application monitoring means the application's internal metrics; for example, we count online users in real time and expose the number through a port to enable business-level monitoring and alerting. So which entities do the monitoring targets break into in Kubernetes?

System component

The components built into a Kubernetes cluster, including apiserver, controller-manager, etcd, and so on.

Static resource entity

Mainly refers to the resource state of the node, kernel events, and so on.

Dynamic resource entity

Mainly refers to the entities in Kubernetes that abstract workloads, such as Deployment, DaemonSet, Pod, and so on.

Custom application

Mainly refers to applications' internal monitoring data and custom metrics.

A comparison of monitoring solutions for different container clouds:

About Prometheus monitoring:

Two main points deserve attention:

1. Encapsulation of the query API

2. Distribution of configuration files

With Prometheus, a powerful open source monitoring system, all we needed to build ourselves was the encapsulation of the query API and the distribution of configuration files. The query API encapsulation is unremarkable: the front end calls our own server, our server calls Prometheus's HTTP API to fetch the raw data, assembles it, and returns it to the front end. The configuration files comprise three parts: the alert definitions, the Alertmanager configuration, and the Prometheus configuration. There is no room to expand on them here; the official documentation covers the details. You can also pair Prometheus with Grafana to build the monitoring system, which makes visualization richer and quicker.
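
As an illustrative sketch of the first configuration part, an alert definition file of the kind we distribute (the group name, threshold, and duration are hypothetical; the cAdvisor label carrying the Pod name varies across versions):

```yaml
# alert-rules.yml -- referenced from rule_files in prometheus.yml
groups:
  - name: container-cloud.rules  # hypothetical rule group
    rules:
      - alert: PodHighCpu
        expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) > 0.9  # > 0.9 cores sustained
        for: 5m                  # must hold for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} has used more than 0.9 CPU cores for 5 minutes"
```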

Operations thinking: integrating development and operations

Operations thinking: high availability

OCP platform:

1. Load-balancing Router HA cluster: 2 nodes.

2. EFK HA cluster: 3 Elasticsearch nodes + n Fluentd nodes.

3. Image registry HA cluster: 2 registries.

Micro-service architecture:

1. Registry (Eureka) HA cluster: 3 nodes.

2. Configuration center HA cluster: 3 nodes.

3. Gateway HA cluster: 2 nodes.

4. All key micro-services run as HA clusters.

Operations thinking: high concurrency

OCP platform:

1. Configure autoscaling for back-end micro-services (Pods); Kubernetes autoscaling plus second-level Docker container startup can keep up with continuous user growth.

2. Reserve 20% of resources in advance; when high concurrency hits, resources can be expanded urgently.

Micro-service architecture:

Increase capacity for micro-services on critical paths to improve the concurrent response of the main services; degrade or even switch off non-critical-path micro-services through circuit breaking and rate limiting. The circuit-breaker mechanism improves fault tolerance under high concurrency on the container cloud, preventing cascading failures and the micro-service avalanche effect and improving system availability.

Middleware:

1. In addition to the clusters in service, add cold-standby clusters in advance.

2. When a high-concurrency scenario is imminent, scale out horizontally at short notice.

Performance stress testing and optimization is another topic, but for lack of time we will not cover it here.

Finally, a summary of the path to a container cloud.

1. Business level: because large enterprises demand stability and business continuity, the containerization path must run from edge businesses to core businesses and from simple applications to complex ones. Concretely, consider migrating the Web front end first and move back-end services last.

2. Technical level: native Docker is still weak in service discovery, load balancing, container lifecycle management, inter-container networking, storage, and so on. The open source solutions and commercial editions from third-party vendors each have their own characteristics; whichever product is chosen, reliability and flexibility are two factors that must be weighed carefully.

3. Cost-effectiveness: weigh the cost of containerization against its future benefits.

4. Capacity of existing hardware: containerization is not a panacea. Some services that need very high concurrent throughput run directly on bare metal and gain performance through system tuning; containerization may not be the best choice for them.

5. Keep iterating: keep learning and embrace change, so that you can spot the platform's shortcomings and iterate toward a better product.

In production practice, only by laying a solid foundation, continuously improving the products built on the container cloud platform, and growing the ecosystem can we stay in control of the future.
