Practical information | Boyun's practice of deploying an OVS-based container network plug-in in financial enterprises

Updated 2025-01-17. Source: SLTechnology News&Howtos

This article is compiled from a talk that Boyun gave in the DockerOne community WeChat group.

Over the past few years, Boyun has run into many pain points while helping enterprises deploy container cloud platforms, and networking is a typical one. Today I am glad to discuss this topic and introduce our self-developed, OVS-based CNI plug-in, known internally as the fabric project.

01

Network requirements for deploying a container platform

Docker became popular among developers around 2013, and Kubernetes has since become the de facto container orchestration engine. Containers, microservices, and DevOps reinforce one another, and production deployments of container cloud platforms keep growing in number. Since 2018 in particular, more and more enterprises have been considering how to use container cloud platforms to support production workloads and ultimately improve productivity.

Unlike development and test environments, a production environment imposes a full set of requirements on any platform or system before it can go live. Security, monitoring, process, integration with existing systems, service exposure, and similar requirements must all be met, or the platform cannot go online. Under the strict regulatory requirements of traditional financial enterprises in particular, a container cloud platform runs into many problems during deployment. The most typical one is the platform's network, which must satisfy business teams, operations staff, security staff, and network staff at the same time.

Most container cloud platforms are now built on Kubernetes, many CNI plug-ins are available, and each enterprise's existing network imposes very different requirements, so a single network model that fits every customer scenario is almost impossible. Mainstream, mature CNI plug-ins such as Calico perform well in scalability and stability, but they are hard to deploy in traditional financial enterprises, and compromises on various requirements are often necessary.

We have talked in depth with many customers. Their needs differ, but the main demands can be summarized as follows:

In mainstream Layer 2 data centers, limited by hardware capabilities or management complexity, most customers do not want to introduce Layer 3 routing concepts such as BGP.

Enterprise business systems are often deployed both inside and outside the container cloud platform, and customers want the networks inside and outside the platform to reach each other directly.

An IP address is the identity of a business workload, so customers want fixed IP addresses that can be managed and audited.

The management network and the data network must be separated.

Network isolation is required, combining the strong security of hardware isolation with the flexibility of software isolation.

The network model should be as simple as possible, easy to control, and easy to debug.

High-performance network throughput with low jitter.

Other advanced features such as bidirectional rate limiting, DPDK, and overlay networking.

After extensive research on the mainstream CNI plug-ins on the market, we found that none of them supports these "localized" requirements satisfactorily. The main gaps are:

Network model differences (Layer 3 versus Layer 2). The model has to fit the existing network environment, security policies, and so on. There are of course many Layer 2 solutions as well; OVS offers advanced features such as flow tables that suit today's cloud environments.

The cloud-native concept. Mainstream CNI plug-ins fit the cloud-native philosophy well, but customers' actual needs are in some respects "anti-cloud-native", and there is generally little practical experience in balancing the two.

Simple, stable, and reliable. This is a very important consideration. Both vendors and enterprises must have staff who can understand and control the network model. The network is the bottom layer of the cloud platform, so a failure there has a very wide impact.

After a comprehensive analysis of the core network requirements and the state of the community, we decided to launch the BeyondFabric project. It is now one of the two network models (Calico and BeyondFabric) supported by the Boyun Container Cloud Platform, and it supports the stable operation of production systems at many enterprises.

02

BeyondFabric

BeyondFabric is our self-developed Kubernetes CNI plug-in. It uses etcd as its data store and has complete built-in IPAM, so it meets the core customer requirements described above. Because BeyondFabric is designed around a Layer 2 network and optimized for specific requirements, it performs well in some scenarios, especially in the data centers of financial enterprises in China that attach great importance to security. The same design means BeyondFabric is not suitable for every scenario, and the choice of CNI plug-in should be evaluated against your own situation. (No single CNI plug-in meets the requirements of all scenarios.)

Fabric classic mode diagram

The concept diagram of fabric shows the network topology of the cloud platform at a glance. An OVS instance is installed on each compute node and used as a pure virtual switch. Each container connects to an OVS port through a veth pair and so obtains a network identity in the physical environment. At the network level, containers, virtual machines, and physical machines are completely equal, and both network administrators and business staff can understand the topology easily. This simplified deployment model, which is also the most widely used one, contains no complex logic such as controllers and provides a simple, efficient, and stable network environment.
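The wiring described above (a veth pair with one end in the container and the other end plugged into an OVS port) can be sketched with standard `ip` and `ovs-vsctl` commands. This is only an illustration of the general technique, not fabric's actual code: the function name, the bridge name `br0`, the interface names, the namespace name, and the addresses are all assumptions, and the commands are built as lists and printed rather than executed.

```python
import shlex


def attach_container_commands(bridge, host_if, cont_if, netns, cidr, gateway):
    """Return the commands that connect one container to an OVS bridge.

    Hypothetical helper: every name passed in is chosen by the caller.
    """
    in_ns = ["ip", "netns", "exec", netns]
    return [
        # Create the veth pair; both ends start in the host namespace.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", cont_if],
        # Move the container end into the container's network namespace.
        ["ip", "link", "set", cont_if, "netns", netns],
        # Plug the host end into the OVS bridge as an ordinary switch port.
        ["ovs-vsctl", "add-port", bridge, host_if],
        ["ip", "link", "set", host_if, "up"],
        # Give the container an address from the physical network.
        in_ns + ["ip", "addr", "add", cidr, "dev", cont_if],
        in_ns + ["ip", "link", "set", cont_if, "up"],
        in_ns + ["ip", "route", "add", "default", "via", gateway],
    ]


# Illustrative values only.
commands = attach_container_commands(
    bridge="br0", host_if="veth-pod1", cont_if="eth0",
    netns="pod1", cidr="192.168.10.25/24", gateway="192.168.10.1",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Because the bridge acts as a plain Layer 2 switch here, nothing else is needed for the container to appear on the physical network as a peer of virtual machines and physical hosts.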

Fabric component diagram

Fabric is the OVS-based CNI plug-in itself. Its job is to set up the network and assign IP addresses for pods.

Fabric-ctl is responsible for network and IP address management. It exposes these capabilities through a RESTful API, for example creating networks, editing networks, and looking up IP addresses. Fabric-ctl itself is stateless; all state is stored in etcd.

Fabric-admin is mainly for platform administrators or BOC operations staff and makes it easy to view and manage networks and IPAM. Its command-line format is modeled on kubectl.

In classic networking mode, OVS serves as a basic virtual switch, which keeps things very simple. If you use an isolation policy such as NetworkPolicy, a distributed controller has to be introduced on each node.
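The division of labor described for fabric-ctl (a stateless manager, with every allocation recorded in etcd) can be illustrated with a small IPAM sketch. It is not fabric's implementation: the class name, the key layout, and the use of a plain dictionary in place of etcd are all assumptions made for the example. A real deployment would need an etcd transaction (compare-and-swap) to make each claim atomic.

```python
import ipaddress


class InMemoryIpam:
    """Hypothetical IPAM sketch; `store` stands in for etcd (key -> owner)."""

    def __init__(self, cidr, gateway, store=None):
        self.network = ipaddress.ip_network(cidr)
        self.gateway = ipaddress.ip_address(gateway)
        self.store = {} if store is None else store

    def _key(self, ip):
        # Assumed key layout, chosen only for this example.
        return f"/ipam/{self.network}/{ip}"

    def allocate(self, owner, fixed_ip=None):
        """Claim a specific address if requested, else the first free one."""
        if fixed_ip:
            candidates = [ipaddress.ip_address(fixed_ip)]
        else:
            candidates = self.network.hosts()
        for ip in candidates:
            if ip == self.gateway or ip not in self.network:
                continue
            key = self._key(ip)
            if key not in self.store:
                self.store[key] = owner
                return str(ip)
        raise RuntimeError("no free address available")

    def release(self, ip):
        self.store.pop(self._key(ipaddress.ip_address(ip)), None)


# Two manager instances sharing one store behave like stateless replicas.
shared = {}
first = InMemoryIpam("10.0.0.0/29", "10.0.0.1", shared)
second = InMemoryIpam("10.0.0.0/29", "10.0.0.1", shared)
print(first.allocate("pod-a"))              # first free host address
print(second.allocate("pod-b"))             # sees pod-a's claim via the store
print(second.allocate("pod-c", "10.0.0.5")) # explicit fixed address
```

Keeping all state in the store is what lets any replica answer any request, and it also gives administrators a single place to audit which workload holds which address.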

Network management capabilities

Beyond implementing the CNI protocol, the fabric project provides additional network management capabilities through RESTful APIs and annotations. Once integrated into the platform interface, administrators can conveniently add networks, view networks, view IP address usage, and assign fixed IP addresses, as in the figures below.

Adding a network

Viewing networks

Viewing IP address usage

Fixed IP address

Maturity

The maturity of the fabric project rests on several things. Fabric was designed from the start around a simple network model and mature components: there are no extra complex components, and even in policy control or overlay scenarios only a distributed controller is added on each node. Beyond that, we have also done the following work.

fabric-admin

Abnormal conditions at the software or hardware level, such as bugs in kubelet or fabric or environmental problems like hardware damage, can affect normal operation to varying degrees. For this reason we provide the fabric-admin tool, located in the /opt/cni/bin directory. Its role is similar to that of fsck for a file system: it safeguards runtime management. Its command-line format follows kubectl conventions, so it is friendly to users familiar with Kubernetes.

For example, you can view the IP occupancy of a pod (example output truncated):

fabric-admin also supports a variety of other runtime management capabilities. Running it with --help prints the following:

Just as fsck is an important sign of a file system's maturity, fabric-admin is a strong guarantee of the BeyondFabric project's maturity. (Powerful as fabric-admin is, it has never actually been needed in customers' live network environments, which indirectly illustrates the maturity of the fabric project.)

Kubernetes community CNI test suite

Because the fabric project fully conforms to the CNI protocol specification, it can be tested with any CNI test tool. Our testing team combined the CNI test tools provided by the community with Kubernetes Job objects to run long and rigorous tests on BeyondFabric, and the results showed that the fabric project is ready for production use.

Multiple platform support

In private clouds, container cloud platforms generally run on physical machines or in virtualized environments such as VMware or OpenStack. Fabric supports all of these deployment environments. Where the network environment is complex and hard to change, fabric's overlay mode significantly reduces dependence on the environment.

Multiple production deployments

The Boyun Container Cloud Platform has several production deployments based on fabric, and they run well in terms of manageability, stability, and performance.

BeyondFabric Performance

Let's look at fabric performance. Because fabric uses stable and reliable OVS as its basic building block, its performance loss should in principle be very small, and our performance tests on a 10G network in a physical environment confirm this. Pod-to-pod and pod-to-node performance is only about 5% below node-to-node performance.

Boyun Container Cloud Platform network model selection recommendations

Next, the selection recommendations. While customers deploy the container cloud platform, we talk with them extensively, and one topic is always the network model, which involves business teams, security staff, network staff, and operations staff. The specific recommendations are shown in the table below, and the final choice is made jointly by everyone involved.

Summary of the fabric project

To summarize briefly: the fabric project resolves several major pain points enterprises face when deploying a container cloud platform, and its classic network mode satisfies the demands of the various functional departments well. Still, no network solution can meet every demand, and fabric has inherent shortcomings. Classic network mode requires real IP addresses from the customer's network. Containerized environments consume these addresses very quickly, so network resources must be provisioned in advance according to business needs, yet some customers are short of IPv4 addresses. VXLAN support and wider adoption of IPv6 will ease this considerably. At its core fabric is a Layer 2 solution, so it inevitably faces scalability limits. Our current approach is to map real network partitions onto a partition concept and to scale the Kubernetes cluster by adding partitions.
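The point about IPv4 consumption in classic mode is easy to quantify when planning subnets in advance. The sketch below uses Python's standard `ipaddress` module to count how many pod addresses a set of subnets can supply. The subnets and the number of reserved addresses per subnet are made-up planning inputs, not figures from the talk.

```python
import ipaddress


def usable_pod_addresses(cidr, reserved=1):
    """Addresses left for pods after removing the network and broadcast
    addresses (for prefixes shorter than /31) and `reserved` addresses,
    for example one for the gateway."""
    net = ipaddress.ip_network(cidr)
    hosts = net.num_addresses - 2 if net.prefixlen < 31 else net.num_addresses
    return max(hosts - reserved, 0)


# Hypothetical subnets set aside for one cluster partition.
planned = ["10.20.0.0/24", "10.20.4.0/22"]
capacity = {cidr: usable_pod_addresses(cidr) for cidr in planned}
for cidr, count in capacity.items():
    print(f"{cidr}: {count} pod addresses")
print("total:", sum(capacity.values()))
```

Every pod consumes one real address in this mode, so the total is a hard ceiling on the number of pods the partition can run at once.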

Q: How is fixed IP implemented in IPAM? Is the IP associated with the Pod UID?

A: After the administrator enters the network information, fabric stores all IP addresses in etcd for unified management. Fixed IP is currently implemented by adding an annotation to a workload object such as a Deployment. The IP is not associated with the Pod UID.
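A rough sketch of the annotation approach described in this answer: the workload carries a list of addresses, and each new pod takes the first address not currently in use, so the binding follows the workload rather than any Pod UID. The annotation key, the comma-separated format, and both function names are hypothetical; the talk does not give fabric's real annotation schema.

```python
# Hypothetical annotation key; fabric's real key is not given in the talk.
FIXED_IP_ANNOTATION = "fabric.example/fixed-ips"


def fixed_ips_for(workload):
    """Read the comma-separated fixed IP list from a workload manifest."""
    annotations = workload.get("metadata", {}).get("annotations", {})
    raw = annotations.get(FIXED_IP_ANNOTATION, "")
    return [ip.strip() for ip in raw.split(",") if ip.strip()]


def pick_ip(workload, in_use):
    """Return the first fixed IP of the workload that no pod holds yet."""
    for ip in fixed_ips_for(workload):
        if ip not in in_use:
            return ip
    return None  # every fixed address is already taken


deployment = {
    "kind": "Deployment",
    "metadata": {
        "name": "payment-api",
        "annotations": {FIXED_IP_ANNOTATION: "10.0.0.10, 10.0.0.11"},
    },
}
print(pick_ip(deployment, in_use=set()))
print(pick_ip(deployment, in_use={"10.0.0.10"}))
```

Because the lookup is keyed on the workload, a pod that is rescheduled and comes back with a new UID can take over the address its predecessor released.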

Q: Do the Layer 3 and Layer 2 networks mentioned here refer to Layer 3 and Layer 2 of the seven-layer (OSI) model?

A: Yes. For example, switches work at Layer 2 and routers work at Layer 3.

Q: How is load balancing for Services implemented?

A: Load balancing for external traffic entering the cluster is handled by a separate component, the ingress controller, not by the CNI plug-in. Load balancing for Kubernetes Services is implemented through iptables. The fabric project also adds some iptables rules, mainly for cross-node SNAT.

Q: Does it support rate limiting?

A: Ingress and egress rate limiting are both supported. Adding an annotation to the container applies the rate limit to that container.

Q: Is there a comparison with Contiv?

A: We compared them during the selection stage, which was fairly early. At that time Contiv did not seem mature, so we did not study it in depth.

Q: Is there a good way to learn these network solutions?

A: Networking is very complex, but the fundamentals do not change. The term "container network" has become popular in recent years because networking has met new challenges in container environments, yet the underlying concepts have long been mature. So first learn the basics of networking, and then look at the pain points caused by the rapid elasticity of container environments.

Q: How is the TC rate limiting implemented?

A: It was implemented a long time ago, back when we were focused on supporting Calico, so some of the details are hazy. Basically it is implemented with Linux tc. Because the connection is essentially a veth pair, the limit can be applied on the host-side end of the veth. The basic commands can be found by looking up the tc mechanism. We did find that the limit was not accurate at first, and eventually solved that by tuning parameters, bringing the error down to a few percent.
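As a rough illustration of rate limiting on the host-side veth with Linux tc, the sketch below builds two common forms of tc command: a token bucket filter (tbf) for traffic leaving the host-side veth toward the container, and an ingress policer for traffic arriving from the container. This is a generic tc technique, not fabric's actual rules. The function names, the device name, and the default burst and latency values are assumptions, and as the answer notes, parameters such as burst usually need tuning before the measured rate is accurate.

```python
import shlex


def limit_to_container(dev, rate_mbit, burst_kb=32, latency_ms=400):
    """tbf on the host-side veth: shapes traffic sent toward the container."""
    return [
        "tc", "qdisc", "add", "dev", dev, "root", "tbf",
        "rate", f"{rate_mbit}mbit",
        "burst", f"{burst_kb}kb",
        "latency", f"{latency_ms}ms",
    ]


def limit_from_container(dev, rate_mbit, burst_kb=32):
    """Ingress policer on the host-side veth: drops traffic from the
    container that exceeds the configured rate."""
    return [
        ["tc", "qdisc", "add", "dev", dev, "handle", "ffff:", "ingress"],
        ["tc", "filter", "add", "dev", dev, "parent", "ffff:",
         "protocol", "all", "prio", "1", "u32", "match", "u32", "0", "0",
         "police", "rate", f"{rate_mbit}mbit", "burst", f"{burst_kb}kb",
         "drop", "flowid", ":1"],
    ]


# Illustrative device name and rates; commands are printed, not executed.
print(shlex.join(limit_to_container("veth-pod1", 100)))
for cmd in limit_from_container("veth-pod1", 50):
    print(shlex.join(cmd))
```

Shaping is used in one direction and policing in the other because a qdisc can only queue packets on egress; on ingress the host can only accept or drop what has already arrived.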

Q: Is there a comparison with Kube-OVN?

A: Kube-OVN is an open-source product from a peer vendor, and I am familiar with it. Kube-OVN and fabric are both built on OVS, both support overlay and underlay modes, and both implement the CNI protocol. The differences are still fairly large, though. OVN is closely tied to OpenStack, whose network model is very heavy, with many concepts and components, and OVN is also trying to unify the network models of Kubernetes and OpenStack. As a result, some capabilities in Kube-OVN, such as load balancing and DNS, fall outside the scope of the CNI spec and already have implementations in the community. Fabric is much simpler. It is a standard CNI implementation with a very clear network model: containers integrate directly into the existing network environment, the enterprise's network administrators can generally keep it under control, and compatibility with existing systems such as security supervision is relatively good.

Networking is very complex, and it is hard for one unified model to cover every scenario. Personally, I think this is where the Kubernetes community was smart: it abstracted out these complex, non-standard pieces and left them to third parties. Because the CNI protocol is simple and networking is complex, many CNI plug-ins now compete in the market, each with its own strengths. I think this is a very healthy situation. Which CNI plug-in to use depends on the circumstances of the enterprise itself.
