
What is the practice of development and deployment based on KubeSphere in production environment

2025-01-19 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article analyzes and explains KubeSphere-based development and deployment practice in a production environment, in the hope of helping readers facing the same problem find a simpler, more feasible approach.

Background

Zhongtong Logistics is an express delivery enterprise in the first tier of the domestic market in both business scale and growth rate. In 2019, Zhongtong's various systems were generating hundreds of millions of data streams across thousands of physical and virtual machines running countless online microservices. Managing at that scale made further business growth unsustainable, so Zhongtong started its cloud transformation. In the process, Zhongtong chose KubeSphere as the foundation for ZKE, Zhongtong's container management platform.

Business status and five difficulties

First, let me introduce the current state of Zhongtong's business.

The picture above shows our data from 2019, when we began the transformation: hundreds of millions of data streams generated by various systems, thousands of physical and virtual machines, and countless online microservices. By the third quarter of 2020, Zhongtong Express's market share had grown to 20.8%, basically leading the industry. Managing at that scale was becoming unsustainable as the business grew, so we urgently needed to transform.

The difficulties we face in 2019 are roughly as follows:

1. The same project needs multiple versions and environments

As our projects iterate, many versions of the same project coexist. Provisioning resources the old way, with virtual machines, could no longer keep up with demand.

2. Fast project iteration demands rapid environment initialization

Our version iteration is very fast, sometimes as fast as one iteration per week.

3. Applying for resources is cumbersome and environment initialization is complex

In 2019 we still requested resources in the traditional way, filing work orders and waiting for the environment to be initialized and delivered. Testing was therefore painful: testers had to apply for resources first and then release them again after testing.

4. The utilization rate of virtual machine resources is low, and there are many zombie machines.

Some resources turn into zombie machines as people change roles or leave, and there are very many of them, especially in the development and test environments.

5. Poor horizontal scalability

During "618" or "Double 11" promotions, resources are very scarce, especially for key core services. Our previous approach was to prepare resources in advance and recycle them after the event was over, which is a very backward way of working.

How to carry out cloud transformation?

Through research, we concluded that cloud transformation can be divided into three steps: cloud machine room, cloud-ready, and cloud-native.

At that time our microservices were already fairly advanced, using the Dubbo framework, and the microservice transformation was complete, but deployment was very traditional: services were started on virtual machines via Salt, which runs into many problems under high concurrency. After evaluation, we therefore urgently needed to rebuild the IaaS and container layers.

By the time we got involved, Zhongtong's business as a whole was already very large, and we had a very mature DevOps team covering the CI/CD requirements for releases. So our work could focus on building the IaaS and Kubernetes layers.

Why we chose KubeSphere

During selection, KubeSphere was the first platform we came into contact with. I found KubeSphere through a search, tried it out, and found the interface and experience very good. After a week-long trial we decided to use KubeSphere as the foundation for ZKE, Zhongtong's container management platform. As far as I remember, we started on KubeSphere 2.0. At the same time, influenced by KubeSphere, we quickly reached a cooperation agreement with QingCloud and used QingCloud's private cloud products to build Zhongtong Logistics' IaaS, with KubeSphere on top as the container PaaS platform hosting the microservices.

Construction direction

Based on the situation at the time, we laid out the overall construction direction. As shown in the figure below, stateless services run on the KubeSphere container management platform, which also gives us visual management of Kubernetes and the infrastructure resources, while the IaaS layer provides stateful services such as middleware.

The following picture will look familiar to everyone. Our results in the first three areas are very good, so I won't dwell on them; I'll focus on microservices. We tried Istio at the time and found it heavy and costly to adopt. Since our microservice stack is already fairly advanced, we are not using service mesh for now; we may try it on Java projects in the future.

Multi-tenant large cluster or single tenant small cluster?

With selection complete, we began construction. The first problem was very thorny: should we build one large multi-tenant cluster, or split into multiple single-tenant small clusters?

After communicating with the KubeSphere team and fully evaluating our company's needs, we decided to adopt multiple small clusters for the time being, split by business scenario (such as middle-platform business and scanning business) or resource type (such as big data and edge). We cut over to several small clusters and use our existing DevOps platform for CI/CD, with the KubeSphere container management platform serving mainly as container support, letting users view logs, deploy, restructure, and so on.

We designed for multiple clusters from the start, using KubeSphere 2.0 as the blueprint for the transformation. In the development, test, and production environments we deploy one set of KubeSphere per cluster, although some common components, such as monitoring and logging, are factored out.

During integration the KubeSphere team gave us a lot of help. KubeSphere 2.0 only supported LDAP integration, and OAuth support was planned for 3.0, so the KubeSphere team helped us backport it to 2.0 on a separate branch. Our company's internal OAuth authentication also has custom parameters, so after some development work of our own, QR-code scan login was quickly integrated.

Practice of secondary development based on KubeSphere

The following describes the custom development we did from the summer of 2019 to October 2020 to integrate KubeSphere with our business scenarios.

1. Overcommit settings

We use overcommit ratios: once you set the Limit, we immediately calculate the Request for you and fill it in. In current production, the CPU overcommit ratio is 10 and the memory ratio is about 1.5.
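The Limit-to-Request derivation can be sketched as a small helper. This is an illustrative reconstruction, not Zhongtong's actual code; the function name and units are assumptions, while the ratios (CPU 10, memory 1.5) come from the text.

```python
def request_from_limit(limit_cpu_millicores, limit_mem_mib,
                       cpu_ratio=10.0, mem_ratio=1.5):
    """Derive container Requests from user-set Limits via overcommit ratios.

    cpu_ratio=10 and mem_ratio=1.5 are the production ratios mentioned in
    the text; the helper itself is a hypothetical illustration.
    """
    return {
        "requests": {
            "cpu": f"{int(limit_cpu_millicores / cpu_ratio)}m",
            "memory": f"{int(limit_mem_mib / mem_ratio)}Mi",
        },
        "limits": {
            "cpu": f"{limit_cpu_millicores}m",
            "memory": f"{limit_mem_mib}Mi",
        },
    }

# A container limited to 2 CPUs and 3072 MiB of memory gets much smaller
# requests, letting many such containers be packed onto one node.
resources = request_from_limit(2000, 3072)
```

Because the scheduler bin-packs on Requests, a high CPU ratio like 10 trades tight packing for possible CPU throttling at peak, which is why pairing this with HPA (below) matters.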

2.GPU cluster monitoring

Our use here is still fairly basic: we just measure usage and display separate monitoring data for the GPU cluster.

3. HPA (horizontal pod autoscaling)

We actually had very high expectations for horizontal scaling when we adopted KubeSphere. KubeSphere exposes horizontal scaling inside resource configuration, so we pulled it out as a separate setting. Combined with the overcommit settings, horizontal scaling lets us gauge the overcommit ratio well.

Many core businesses have achieved good results through HPA configured in the KubeSphere interface, and operations now rarely needs to intervene. It responds very quickly in emergency scenarios, for example when an upstream MQ consumption backlog means replicas must be scaled out immediately.
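An HPA of the kind described above boils down to a small manifest. This is a generic `autoscaling/v2` sketch (names and thresholds are hypothetical), not the exact object KubeSphere generates:

```python
def hpa_manifest(name, namespace, min_replicas, max_replicas, cpu_percent):
    """Build an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict.

    Targets a Deployment of the same name and scales on average CPU
    utilization; all argument values here are illustrative.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_percent},
                },
            }],
        },
    }

# e.g. keep 2-20 replicas, scaling when average CPU passes 70% of Requests
hpa = hpa_manifest("order-svc", "prod", 2, 20, 70)
```

Note that utilization is measured against Requests, so with the 10x CPU overcommit above, a 70% target fires well before the Limit is reached.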

4. Batch restart

In extreme cases you may have to restart a large number of Deployments at once. We extracted this into a small module: with one click on the KubeSphere platform, the Deployments under a project (namespace) or a cluster can be restarted immediately, giving a fast response.
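One common way to implement such a batch restart (the same mechanism `kubectl rollout restart` uses) is to bump a pod-template annotation on each Deployment, which triggers a fresh rollout. The text does not say this is Zhongtong's exact implementation; the sketch below just builds the patch bodies:

```python
from datetime import datetime, timezone

RESTART_ANNOTATION = "kubectl.kubernetes.io/restartedAt"

def rolling_restart_patches(deployment_names, now=None):
    """Build one strategic-merge patch per Deployment that changes a
    pod-template annotation, forcing a rolling restart of its pods."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        name: {"spec": {"template": {"metadata": {
            "annotations": {RESTART_ANNOTATION: stamp}}}}}
        for name in deployment_names
    }

# Patches for every Deployment in a project; each would be sent via the
# apps/v1 PATCH API (or client-go / the Python kubernetes client).
patches = rolling_restart_patches(["order-svc", "scan-svc"])
```

Because only the template annotation changes, the restart honors each Deployment's normal rolling-update strategy rather than killing all pods at once.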

5. Container affinity

For container affinity we mainly implemented soft anti-affinity, because some of our applications' resource usage is mutually exclusive, for example, they are all CPU-intensive. So we made a simple modification and added some affinity settings.
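"Soft" anti-affinity in Kubernetes terms is `preferredDuringSchedulingIgnoredDuringExecution`: the scheduler tries to spread matching pods across nodes but will still co-locate them when it must. A minimal sketch of such an affinity block (label names are hypothetical):

```python
def soft_anti_affinity(label_key, label_value, weight=100):
    """Pod-spec `affinity` fragment: prefer not to schedule two pods
    carrying the given label onto the same node, without hard-failing
    scheduling when no other node is available."""
    return {"podAntiAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [{
            "weight": weight,  # 1-100; higher = stronger preference
            "podAffinityTerm": {
                "labelSelector": {"matchLabels": {label_key: label_value}},
                "topologyKey": "kubernetes.io/hostname",
            },
        }]
    }}

# Spread CPU-heavy replicas of one app across nodes where possible
affinity = soft_anti_affinity("app", "cpu-heavy-svc")
```

The soft form fits the use case in the text: it avoids stacking mutually exclusive CPU-hungry pods without blocking scheduling during scale-out spikes.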

6. Scheduling strategy

For scheduling policy we originally planned to work through YAML, since it involves more sensitive background data, but in the end we did it through KubeSphere's advanced settings page. We added some page elements to configure designated hosts, designated host groups, and dedicated hosts in the form of table rows. The features we use most now are designated host groups and dedicated hosts.

To briefly introduce the dedicated-host use case: our core business runs from the evening until 6:00 in the morning, and the services are relatively idle during this period, so it is very suitable for running big-data applications. We clear these machines out as dedicated hosts so that big-data workloads don't spread across the whole cluster; in practice we just apply taints to a few nodes.
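The taint-based dedicated-host pattern usually pairs a node taint with a matching toleration plus a node selector on the workload. A sketch under assumed names (the `dedicated` taint key and `node-group` label are illustrative, not Zhongtong's real ones):

```python
def dedicated_host_spec(group_label, group_value, taint_key="dedicated"):
    """Pod-spec fragment for pinning a workload to a dedicated host group.

    Assumes the nodes were prepared roughly like:
      kubectl taint nodes <node> dedicated=<group>:NoSchedule
      kubectl label nodes <node> <group_label>=<group_value>
    The taint keeps everyone else off; the nodeSelector keeps this
    workload on.
    """
    return {
        "nodeSelector": {group_label: group_value},
        "tolerations": [{
            "key": taint_key,
            "operator": "Equal",
            "value": group_value,
            "effect": "NoSchedule",
        }],
    }

# Big-data jobs land only on the tainted "bigdata" host group
spec = dedicated_host_spec("node-group", "bigdata")
```

Both halves are needed: the toleration alone would merely *allow* the tainted nodes, and the selector alone would be blocked by the taint.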

7. Gateway

KubeSphere has the concept of an independent gateway: each project gets its own gateway. Independent gateways meet our production needs (in production we want independent gateways), but in development and testing we need a generic gateway so that services can be exposed more quickly. So we built a generic gateway alongside the independent gateways, and all development and test domain names come in directly through a wildcard domain. With a little configuration and orchestration through the KubeSphere interface, our services can basically be accessed directly.
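A wildcard-domain setup like this typically means one DNS record (`*.dev.example.com`) pointing at the shared gateway, with each service claiming a host derived from its name and namespace. This sketch builds a standard `networking.k8s.io/v1` Ingress; the domain and naming scheme are placeholders, not Zhongtong's actual configuration:

```python
def dev_ingress(service, namespace, wildcard_domain="dev.example.com", port=80):
    """Ingress exposing a dev/test service under a shared wildcard domain,
    as <service>.<namespace>.<wildcard_domain>. All names are illustrative."""
    host = f"{service}.{namespace}.{wildcard_domain}"
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {"rules": [{
            "host": host,
            "http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": service,
                                        "port": {"number": port}}},
            }]},
        }]},
    }

# Reachable immediately at order-svc.dev-ns.dev.example.com, no per-service DNS
ingress = dev_ingress("order-svc", "dev-ns")
```

The convention makes every newly deployed dev service routable without touching DNS, which is the "faster response" the generic gateway is for.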

8. Log collection

We started by collecting logs the official way, through Fluent Bit, but found that as more and more business came online, Fluent Bit would often crash, perhaps because of defects in our resource tuning, or because the parameters as a whole had not been adjusted. So we decided to enable sidecars for log collection: each Java service gets a separate sidecar that pushes its logs through a small agent, such as Logkit, to a center like Elasticsearch. In the development and test environments we also use a Fluent agent to collect logs. In addition, some production scenarios must guarantee log integrity, so there we further persist logs to disk. All container logs are collected in the four ways shown in the figure below.
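The sidecar pattern described here is usually wired up with a shared `emptyDir` volume: the app writes log files into it and the sidecar tails and ships them. A minimal sketch (image names and the mount path are hypothetical; the shipper stands in for a Logkit-style agent):

```python
def pod_with_log_sidecar(app_image, shipper_image, log_dir="/var/log/app"):
    """Pod-spec fragment where the app and a log-shipper sidecar share an
    emptyDir volume; the sidecar forwards files from log_dir to a log
    center such as Elasticsearch. Images are illustrative placeholders."""
    volume = {"name": "app-logs", "emptyDir": {}}
    mount = {"name": "app-logs", "mountPath": log_dir}
    return {
        "volumes": [volume],
        "containers": [
            # The Java service writes its log files into the shared volume
            {"name": "app", "image": app_image, "volumeMounts": [mount]},
            # The sidecar reads the same directory and pushes logs out
            {"name": "log-shipper", "image": shipper_image,
             "volumeMounts": [mount]},
        ],
    }

pod = pod_with_log_sidecar("registry.example/app:1.0",
                           "registry.example/logkit:1.0")
```

Compared with a node-level Fluent Bit DaemonSet, the per-pod sidecar isolates failures to one service, which matches the stability concern in the text, at the cost of one extra container per pod.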

9. Event tracking

We took Alibaba Cloud's open-source kube-eventer and adapted it, adding event tracking to KubeSphere. Events can be configured and sent to our DingTalk groups; in production especially, the business changes people care about most can be routed to a custom DingTalk group.
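kube-eventer routes events through sink URIs passed as container args; the DingTalk sink points at a robot webhook. The sketch below assembles such args, but verify the exact URI parameters against the kube-eventer version you deploy, and note that Zhongtong's adapted fork may differ:

```python
def kube_eventer_args(access_token, level="Warning"):
    """Container args for a kube-eventer Deployment with a DingTalk sink.

    The sink URI shape follows kube-eventer's documented convention
    (dingtalk:<robot webhook>?access_token=...&level=...); the token
    here is a placeholder, and supported query params vary by version.
    """
    sink = ("dingtalk:https://oapi.dingtalk.com/robot/send"
            f"?access_token={access_token}&level={level}")
    return [
        "--source=kubernetes:https://kubernetes.default",
        f"--sink={sink}",
    ]

# Only Warning-and-above events reach the on-call DingTalk group
args = kube_eventer_args("YOUR_ROBOT_TOKEN")
```

Filtering at the sink (`level=Warning`) keeps routine events out of the chat group so that the notifications that do arrive signal real change, which is the production use case described above.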

Future planning

Next we plan to roll this out to production at scale, and we have some ideas we would like to discuss with the community.

1. Service market

In the KubeSphere console we see our microservices as table rows, but we can't see the relationships between them. We hope to show them graphically and visually present their key indicators, such as events, logs, and anomalies, so that we can operate visually. We are planning this now and expect to build it ourselves next year.

The idea is that anyone, operator or developer, can see what our service architecture is, which middleware and databases we currently depend on, and the current status of the services: for example, which services are down, or which services currently have hidden problems.

2. Global Pods

The second picture is called Global Pods; on the KubeSphere side it would probably be called a heat map. We hope to see the current status of all Pods from the perspective of the entire cluster, including their color changes and resource status.

3. Edge computing

The planning for the edge computing part is shared by my colleague Wang Wenhu.

For combining edge computing with containers, we finally chose KubeEdge after research. The scenarios at Zhongtong suitable for edge computing include:

Upload of transit-center express scan data. After express parcels are scanned at each transit center, the data is first processed by services deployed at that center, and the processed data is then uploaded to the data center. Today these services are distributed to the nearly 100 transit centers through automated remote scripts, and each release requires about five person-days. An edge management scheme can greatly reduce the release manpower and operation and maintenance costs, and release strategies can be flexibly customized using the Operator development model recommended by the Kubernetes community.

Automatic identification of rough parcel handling by operators. To reduce the express damage rate, Zhongtong installs cameras at transit centers and branch outlets to monitor operators' daily work. The footage is sent to a local GPU box for image processing, and the processed data is transmitted to the data center. Currently, applications on the GPU boxes are released by logging in manually, which is very inefficient; boxes often drop offline, and a problem may go unnoticed for a long time. The KubeEdge edge scheme can also solve these release and node-monitoring problems.

Smart-park projects at each center are also landing. As the company rolls them out, there will be many edge scenarios where container technology can help solve current pain points.

That is the answer to the question of development and deployment practice based on KubeSphere in a production environment. I hope the above content is of some help to you.
