What is Kubernetes Cluster Network


This article explains what the Kubernetes cluster network is. The material is simple and practical, so let's walk through it step by step.

In Kubernetes, the network is essential: containers must be able to reach one another. Kubernetes does not implement the container network itself; instead it delegates this to pluggable network implementations. Any container network must satisfy the following basic principles:

Pods can communicate with each other directly, no matter which node they run on, without NAT.

Nodes and Pods can communicate with each other, and Pods can reach any network without restriction.

Each Pod has its own network stack: the address a Pod sees for itself is the same address others see for it, and all containers in the same Pod share that network stack.

Basic Container Network

The network stack of a Linux container is isolated in its own network namespace. A network namespace includes the network interfaces, the loopback device, the routing table, and the iptables rules; together these form the environment in which a service process sends and receives requests. A container network cannot be built without the following Linux networking features (a short hands-on sketch follows the list):

Network namespaces: each namespace holds its own isolated network protocol stack; by default, different namespaces cannot communicate with each other.

Veth pair: a veth device pair connects two different network namespaces. It always appears as a pair of virtual network interfaces (veth peers), and data sent into one end always comes out the other end.

iptables/Netfilter: Netfilter runs in the kernel and executes the hooked rules (filtering, modifying, dropping packets, and so on); iptables is a user-space tool that maintains Netfilter's rule tables in the kernel. Together they provide the flexible packet-processing machinery of the Linux network stack.

Bridge: a bridge is a layer 2 virtual network device, similar to a switch. Its main job is to forward data frames to the right bridge port based on learned MAC addresses.

Routing: Linux includes full routing support. When the IP layer sends or forwards a packet, it consults the routing table to decide where to send it.
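To make these building blocks concrete, here is a minimal hands-on sketch (run as root; the names ns1, veth-ns1, veth-br, br0 and the 172.18.0.0/24 addresses are made up for illustration) that wires a namespace, a veth pair, and a bridge together with iproute2:

# create a network namespace and a veth pair
$ ip netns add ns1
$ ip link add veth-ns1 type veth peer name veth-br
# move one end into the namespace and assign it an address
$ ip link set veth-ns1 netns ns1
$ ip netns exec ns1 ip addr add 172.18.0.2/24 dev veth-ns1
$ ip netns exec ns1 ip link set veth-ns1 up
# create a bridge, attach the host-side end, and give the bridge an address
$ ip link add br0 type bridge
$ ip link set veth-br master br0
$ ip link set veth-br up
$ ip link set br0 up
$ ip addr add 172.18.0.1/24 dev br0
# the namespace should now reach the bridge address
$ ip netns exec ns1 ping -c 1 172.18.0.1

This is essentially what Docker and the CNI plug-ins described below do for every container, just automated.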

Given the above, how do containers on the same host communicate with each other?

We can think of two containers as two hosts connected by a network cable. If we want multiple hosts to communicate, we connect them through a switch; in Linux, the bridge plays that role and forwards the data.

For containers, this is implemented with the docker0 bridge: containers attached to docker0 can communicate with each other through it. To attach a container to docker0 we also need a virtual "network cable", a veth pair, connecting the container to the bridge.

We start a container:

$ docker run -d --name c1 hub.pri.ibanyu.com/devops/alpine:v3.8 /bin/sh

Then check the network devices inside the container:

$ docker exec -it c1 /bin/sh
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1172 (1.1 KiB)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0

You can see an eth0 interface, which is one end of the veth pair. Running route -n shows the routing table inside the container: eth0 is the default route interface, and all traffic for the 172.17.0.0/16 segment is also sent out through eth0.

Now let's look at the other end of the veth pair by inspecting the host's network devices:

$ ifconfig
docker0: flags=4163  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:6aff:fe46:93d2  prefixlen 64  scopeid 0x20
        ether 02:42:6a:46:93:d2  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 656 (656.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth0: flags=4163  mtu 1500
        inet 10.100.0.2  netmask 255.255.255.0  broadcast 10.100.0.255
        inet6 fe80::5400:2ff:fea3:4b44  prefixlen 64  scopeid 0x20
        ether 56:00:02:a3:4b:44  txqueuelen 1000  (Ethernet)
        RX packets 7788093  bytes 9899954680 (9.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5512037  bytes 9512685850 (8.8 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 32  bytes 2592 (2.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 32  bytes 2592 (2.5 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

veth30b3dac: flags=4163  mtu 1500
        inet6 fe80::30e2:9cff:fe45:329  prefixlen 64  scopeid 0x20
        ether 32:e2:9c:45:03:29  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

We can see that the other end of the container's veth pair is a virtual interface on the host named veth30b3dac, and brctl shows that this interface is attached to docker0:

# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02426a4693d2       no              veth30b3dac
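As an optional cross-check (a small sketch, assuming the veth driver's ethtool statistics and the /sys interface are available), you can confirm that veth30b3dac really is the peer of the container's eth0 by comparing interface indexes:

# the peer_ifindex reported on the host should equal eth0's ifindex inside the container
$ ethtool -S veth30b3dac | grep peer_ifindex
$ docker exec -it c1 cat /sys/class/net/eth0/ifindex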

Then we start a second container and check whether the first container can ping it:

$ docker run -d --name c2 -it hub.pri.ibanyu.com/devops/alpine:v3.8 /bin/sh
$ docker exec -it c1 /bin/sh
/ # ping 172.17.0.3
PING 172.17.0.3 (172.17.0.3): 56 data bytes
64 bytes from 172.17.0.3: seq=0 ttl=64 time=0.291 ms
64 bytes from 172.17.0.3: seq=1 ttl=64 time=0.129 ms
64 bytes from 172.17.0.3: seq=2 ttl=64 time=0.142 ms
64 bytes from 172.17.0.3: seq=3 ttl=64 time=0.169 ms
64 bytes from 172.17.0.3: seq=4 ttl=64 time=0.194 ms
^C
--- 172.17.0.3 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.129/0.185/0.291 ms

As you can see, the ping works because the destination IP 172.17.0.3 matches the second rule in the container's routing table. The gateway there is 0.0.0.0, meaning it is a directly connected route, so the packet is forwarded to the destination at layer 2.

To reach 172.17.0.3 at layer 2 we need its MAC address, so the first container sends an ARP broadcast to resolve the IP. The other end of its veth pair sits on the docker0 bridge, which floods the broadcast to all the veth interfaces attached to it; the container that owns the IP replies to the ARP request, and the bridge relays the reply back to the first container.
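After the ping above, the result of this resolution can be seen in the first container's ARP cache (an illustrative check; the MAC address shown is a placeholder, not taken from the article):

$ docker exec -it c1 /bin/sh
/ # arp -n
? (172.17.0.3) at 02:42:ac:11:00:03 [ether]  on eth0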

This is how different containers on the same host communicate with each other through docker0.

In other words, a container process confined to its own network namespace exchanges data with other namespaces by means of veth pair devices and the host bridge.

Similarly, when you access a container's IP address from the host, the request packet first reaches the docker0 bridge according to the host's routing rules, is then forwarded to the corresponding veth device, and finally appears inside the container.
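The host routing rule in question is the directly connected route that Docker installs for the bridge subnet; it can be confirmed on the host like this (the output shown is typical, not taken from the article):

$ ip route | grep docker0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1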

Cross-host network communication

Under Docker's default configuration, containers on different hosts cannot reach each other by IP address, and many network solutions have emerged in the community to solve this. To standardize how the network is wired up, Kubernetes uses CNI, the Container Network Interface: the standard API through which Kubernetes invokes a network implementation. Kubelet calls different network plug-ins through this API to apply the desired network configuration, and a CNI plug-in is simply an implementation of this set of interfaces. Current implementations include flannel, calico, weave, contiv, and so on.
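For reference, kubelet reads the CNI configuration from /etc/cni/net.d/ on each node. A minimal bridge-type configuration might look like the sketch below (the file name, network name, and subnet are illustrative assumptions):

$ cat /etc/cni/net.d/10-mynet.conf
{
    "cniVersion": "0.3.1",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.244.0.0/24",
        "routes": [ { "dst": "0.0.0.0/0" } ]
    }
}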

In fact, container communication with CNI works the same way as the basic network described above, except that CNI maintains its own bridge in place of docker0. This bridge is called the CNI bridge, and its device name on the host is cni0 by default. The design idea is that after Kubernetes starts a Pod's Infra (pause) container, it calls the CNI plug-in directly to configure that container's network namespace with the expected network stack.

CNI plug-ins implement the network in three general modes (a flannel backend example follows the list):

Overlay mode is based on tunneling: the container network is completely independent of the host network. When containers communicate across hosts, the container packet is encapsulated inside the underlay network, carried to the target machine, then decapsulated and delivered to the target container. It does not depend on the underlying network implementation. Plug-ins implementing it include flannel (UDP, vxlan), calico (IPIP), and so on.

In layer 3 routing mode, containers and hosts also belong to different network segments. Containers communicate mainly via routing tables, with no tunnel encapsulation between hosts, but this requires the hosts to be in the same layer 2 LAN. Plug-ins implementing it include flannel (host-gw), calico (BGP), and so on.

Underlay mode uses the underlying network itself for interconnection. The container network and the host network are still different segments, but they sit in the same network plane as equals: the whole network is routable at layer 3, with no layer 2 restriction, but it depends on support from the underlying network. Plug-ins implementing it include calico (BGP) and so on.
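As a concrete example of where the mode is chosen, flannel's backend type is set in its net-conf.json (commonly stored in the kube-flannel-cfg ConfigMap); "vxlan" selects overlay mode and "host-gw" selects route mode. The values below are an illustrative sketch, not taken from this article:

$ kubectl -n kube-system get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}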

Let's take a look at one implementation of the route mode, flannel host-gw.

When container-1 on node1 sends data to container-2 on node2, the following routing rule on node1 is matched:

10.244.1.0/24 via 10.168.0.3 dev eth0

For IP packets destined for the 10.244.1.0/24 segment, the next hop is 10.168.0.3 (node2), reached through the local eth0 device. After the packet arrives at 10.168.0.3, node2's routing table forwards it to the cni bridge and from there into container-2.

This is how host-gw works: on every node, the next hop for each Pod subnet is set to the IP of the node hosting that subnet. The mapping between Pod subnets and node IPs is stored in etcd or in the Kubernetes API, and flannel only needs to watch this data for changes to update the routing tables dynamically.

The biggest advantage of this model is that it avoids the performance loss of encapsulating and decapsulating packets. The disadvantage is also clear: when the container's IP packet leaves for the next hop, it must be framed at layer 2 and delivered directly to that next hop. If the nodes are not on the same layer 2 LAN, the frame is handed to a layer 3 gateway that knows nothing about the destination container network (unless you statically configure Pod subnet routes on every gateway). Therefore flannel host-gw requires all cluster hosts to be layer 2 interconnected.

To work around the layer 2 interconnection requirement, the network solution provided by calico is a better fit. Calico's layer 3 network model is similar to flannel host-gw: routing rules of the following form are added on each host:

<destination container subnet> via <gateway IP> dev eth0

The gateway IP differs by scenario: if the destination host is reachable at layer 2, it is the IP of the host where the destination container lives; if the hosts are in different layer 3 LANs, it is the host's gateway (the switch or router address).

Unlike flannel, which builds local routing information from data stored in Kubernetes or etcd, calico distributes routing information across the cluster using the BGP dynamic routing protocol.

BGP (Border Gateway Protocol) is natively supported by Linux and is designed to exchange routing information between different autonomous systems in large networks and data centers. For our purposes it is enough to understand BGP as a protocol for synchronizing and sharing node routing information across a large network; it can take over the job flannel does of maintaining the host routing tables.

Calico consists of the following main components:

Calico CNI plug-in: the interface to Kubernetes, invoked by kubelet.

Felix: maintains the routing rules and the FIB (forwarding information base) on the host.

BIRD: the BGP client that distributes routing information, acting much like a router.

Confd: configuration management component.

In addition, calico differs from flannel host-gw in that it does not create a bridge device; instead it steers each Pod's traffic with per-Pod routing rules on the host.

Calico's CNI plug-in creates a veth pair for each container and connects the other end to the host network namespace. Because there is no bridge, the plug-in must also install a host routing rule for each container's veth device so that incoming IP packets can be delivered. The rule looks like this:

10.92.77.163 dev cali93a8a799fe1 scope link

This says that IP packets destined for 10.92.77.163 should be sent out through the cali93a8a799fe1 device, which delivers them into the corresponding container.

With such a veth pair, an IP packet sent by the container reaches the host through the veth device; the host then forwards it to the correct gateway (10.100.1.3) according to the next-hop address in its routing rule, the packet reaches the destination host, and from there the destination container. The corresponding host route to a remote Pod subnet looks like this:

10.92.160.0/23 via 10.106.65.2 dev bond0 proto bird

These routing rules are maintained by Felix, while the routing information itself is distributed by calico's BIRD component over BGP. Calico effectively treats every node in the cluster as a border router: together they form a fully interconnected network and exchange routes with one another over BGP. These nodes are called BGP peers.
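If the calicoctl tool is installed on a node, these BGP sessions can be inspected directly; the command is part of the standard calico tooling, but the peer addresses and timestamps below are only illustrative:

$ calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.100.1.2   | node-to-node mesh | up    | 08:30:12 | Established |
+--------------+-------------------+-------+----------+-------------+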

Note that calico's default mode is node-to-node mesh: the BGP client on each host exchanges routes with the BGP clients on all other nodes in the cluster. As the number of nodes N grows, the number of connections grows on the order of N squared, which puts considerable pressure on the cluster network itself.

Therefore this mode is generally recommended for clusters of up to roughly 50 nodes; beyond that, the RR (Route Reflector) mode is recommended. In RR mode, calico designates a few nodes as route reflectors: they establish BGP sessions with every node's BGP client and learn all the routes in the cluster, while the other nodes only need to exchange routes with the RR nodes. This greatly reduces the number of connections; for the stability of the cluster network, at least two RRs are recommended.
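A rough sketch of enabling RR mode with calico's BGPConfiguration and BGPPeer resources follows; the AS number and the route-reflector label are illustrative assumptions, and the nodes acting as reflectors would additionally need a routeReflectorClusterID set:

$ calicoctl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false   # turn off the full node-to-node mesh
  asNumber: 64512
EOF

$ calicoctl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-route-reflectors
spec:
  nodeSelector: all()
  peerSelector: route-reflector == 'true'
EOF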

The working principle above still relies on layer 2 connectivity. Suppose we have two hosts: one is 10.100.0.2/24 with container network 10.92.204.0/24, the other is 10.100.1.2/24 with container network 10.92.203.0/24. Since the two machines are not in the same layer 2 segment, they need layer 3 routing to communicate, and calico would generate a routing rule like the following on the first node:

10.92.203.0/23 via 10.100.1.2 dev eth0 proto bird

Here the problem arises: 10.100.1.2 is not in the same subnet as 10.100.0.2, so layer 2 communication is impossible. This is where calico's IPIP mode comes in: when hosts are not in the same layer 2 network, the traffic is encapsulated in an overlay before being sent out.

In IPIP mode, for destinations that are not layer 2 reachable, calico adds a routing rule like this on the node:

10.92.203.0/24 via 10.100.1.2 dev tunnel0

Although the next hop is still the node's IP address, the exit device is now tunnel0, an IP tunnel device implemented by the Linux kernel's IPIP driver. The container's IP packet is encapsulated directly inside a host-network IP packet; when it arrives at node2, the IPIP driver strips the outer header to recover the original container packet, which the routing rules then deliver to the veth device and on to the destination container.
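In calico, IPIP behaviour is configured per IP pool. A sketch of an IPPool that only encapsulates traffic crossing subnet boundaries might look like this (the pool name and CIDR are illustrative assumptions):

$ calicoctl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.92.0.0/16
  ipipMode: CrossSubnet   # "Always" tunnels all traffic; "Never" disables IPIP
  natOutgoing: true
EOF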

Although this solves communication across layer 2 boundaries, encapsulation and decapsulation still cost performance. If calico could also teach the routers between the hosts the container routes, the hosts could communicate directly at layer 3. For example, the router would hold a route like:

10.92.203.0/24 via 10.100.1.2 dev interface1

and node1 would add the following route:

10.92.203.0/24 via 10.100.1.1 dev eth0

Then an IP packet sent by a container on node1 is forwarded to the gateway router 10.100.1.1 according to the local routing table; the router looks at the destination IP, finds the next hop in its own routing table, sends the packet to node2, and it finally reaches the destination container. This scheme relies on an underlay network: as long as the underlying network supports BGP, the routers can establish an eBGP session with our RR nodes and exchange the cluster's routing information.

These are several network solutions commonly used with Kubernetes. In public cloud scenarios the cloud vendor's own solution or flannel host-gw is usually the easier choice, while in a private physical data center calico is often the better fit. Choose the network scheme that matches your actual environment.

At this point, I believe you have a deeper understanding of what the Kubernetes cluster network is. You might as well try it out in practice.
