Introduction and deployment of K8s Network component Flannel

2025-04-05 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Flannel

Flannel is a network component that assigns a different subnet to each Node and enables cross-machine communication between containers, thereby providing connectivity at the Kubernetes cluster level. The component runs on every Node and depends on etcd and docker.

The main problem solved by the k8s network components flannel and calico is container network communication between K8s nodes. How does flannel guarantee that the IP of every pod is unique? Like most network components, it assigns a unique subnet to each Node: node1 gets one subnet, node2 gets another, which can be understood as putting each node on a different network segment or VLAN. Flannel therefore defines one large subnet in advance and carves a smaller subnet out of it for each node. This information is stored in etcd, where each subnet is recorded together with the node it is bound to, so that subsequent packet transmission can use it. Flannel also starts a daemon on each node, whose main jobs are maintaining the local routing rules and keeping the information in etcd up to date.
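The per-node allocation described above can be sketched as follows. This is only an illustration with hypothetical node names; the real assignment is done by flanneld and recorded in etcd:

```shell
# Sketch: how flannel carves per-node /24 subnets out of one large /16
# (the "Network" value from net-conf.json). Node names are made up.
NETWORK="10.244"   # the large subnet 10.244.0.0/16
allocations=""
for i in 0 1 2; do
  # each node is handed the next free /24; flannel stores the binding in etcd
  allocations="${allocations}node$((i+1))=${NETWORK}.${i}.0/24 "
done
echo "$allocations"
```

Each `/24` then supplies the pod IPs on that node, which is why pod IPs never collide across nodes.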

1. Flannel deployment

https://github.com/coreos/flannel

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

After deployment, a pod is launched on each node in the form of a DaemonSet, running a flannel daemon. It is mainly responsible for setting up the local routing table and maintaining the data in etcd, reporting the local subnet to etcd, so this daemon is very important.

You can set the large subnet and the backend mode in the flannel configuration file.

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {"Type": "vxlan"}
  }

After this configuration is applied, it is placed in the cni directory. Because flannel uses bridge mode to carry packets between containers on the same node and the host, the per-node subnet information is not written in this configuration file but in /var/run/flannel/subnet.env. The IP assigned to the device can also be seen with ip a; each node is assigned one subnet, and the network interface device is cni0. Splitting the /16 into /24 subnets yields up to 256 node subnets, each holding roughly 254 pod addresses.

[root@k8s-node2 ~]# cat /var/run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
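The file is consumed as plain shell-style variables. A small sketch of how a script would read it, using a copy under /tmp so it runs anywhere:

```shell
# Reproduce a subnet.env like the one above (values copied from the output)
# and source it the way the flannel CNI plugin's helper scripts would.
cat > /tmp/subnet.env <<'EOF'
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
. /tmp/subnet.env
echo "node subnet: $FLANNEL_SUBNET (mtu $FLANNEL_MTU)"
```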

There is also a set of CNI binaries in /opt/cni/bin. kubelet calls this binary interface to create the network for each pod it creates, taking the pod IP from the configured subnet.

Modifying the configuration means setting the subnet and working mode in advance. In addition, this network must not conflict with K8s' own internal networks, otherwise network failures will result.

2. Working mode and principle of Flannel.

Flannel supports multiple data forwarding methods:

UDP: the earliest supported method, which has been deprecated because of its worst performance.

VXLAN: an Overlay Network scheme, in which the source packet is encapsulated inside another network packet for routing, forwarding and communication.

This is network virtualization: there is an original packet with a source IP and a destination IP, but in some cases this packet cannot reach the destination address directly, so it is encapsulated and carried over the physical Ethernet network to the destination. This is an overlay network in which two kinds of packets coexist, which is also called the tunnel scheme.
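The nesting can be pictured as one packet inside another. All addresses below are made up for illustration; 8472 is the UDP port the Linux VXLAN driver conventionally uses:

```shell
# Conceptual sketch: the inner (pod-to-pod) packet becomes the payload of an
# outer (host-to-host) UDP packet. Field values are hypothetical.
inner="src=10.244.0.45 dst=10.244.1.20 data=HTTP"
outer="src=10.4.7.21 dst=10.4.7.12 udp_dport=8472 vxlan=[${inner}]"
echo "$outer"
```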

Host-GW: through the flanneld agent process on each node, Flannel writes the routing information of the container network into the routing table of every host, so that all hosts hold the routing data of the entire container network. A node therefore knows, when a packet arrives, which machine to forward it to purely via routing tables. This is also called the routing scheme.

VXLAN

VXLAN is the default backend when deploying with kubeadm.

Specifying the Pod network segment with a kubeadm deployment:

kubeadm init --pod-network-cidr=10.244.0.0/16

But if you deploy with binaries, you have to enable CNI support yourself; k8s clusters deployed with ansible usually enable it by default.

Binary deployment assignment

cat /opt/kubernetes/cfg/kube-controller-manager.conf
--allocate-node-cidrs=true \      # allow nodes to be assigned a CIDR automatically
--cluster-cidr=10.244.0.0/16 \    # the pod network segment, matching flannel's Network

In addition, CNI must be enabled in the kubelet configuration file on each Node.

[root@k8s-node1 ~]# cat /opt/kubernetes/cfg/kubelet.conf
--network-plugin=cni \

In this way, the network can be configured for K8s according to the standard of cni.

kube-flannel.yml:

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {"Type": "vxlan"}
  }

Suppose a container on node 1 communicates with a container on node 2: this is cross-host communication. Local communication on one node is solved by a bridge using layer-2 transmission, like the native docker network; the hard part is packet transmission between the two nodes.

Flannel guarantees that every pod IP is unique by assigning a distinct subnet to each node.

You can see that flannel works per host: it creates a separate subnet for each node and assigns pod IPs out of it.

[root@k8s-master1 ~]# kubectl get pod -n kube-system -o wide
kube-flannel-ds-amd64-4jjmm   1/1   Running   0   14d   10.4.7.11   k8s-master1
kube-flannel-ds-amd64-9f9vq   1/1   Running   0   14d   10.4.7.21   k8s-node2
kube-flannel-ds-amd64-gcf9s   1/1   Running   0   14d   10.4.7.12   k8s-node1

In order to tunnel over the layer-2 network, VXLAN sets up a special network device on each host to act as the two ends of the tunnel. This device is called a VTEP, or VXLAN Tunnel End Point. The flannel.1 device is exactly the VTEP device required by VXLAN.

How does vxlan work?

VXLAN is a tunnel technology supported by Linux, providing point-to-point, end-to-end communication between devices. Its implementation rests on a VTEP device that encapsulates and decapsulates packets; in flannel this is the flannel.1 virtual network interface, which performs the VXLAN encapsulation and decapsulation.

Now, pod1 on node1 is 1.10 and needs to communicate with pod2 on node2 at 2.10; they are not on the same network. When the packet is sent, it first leaves the pod1 container through its network card eth0, which is connected to a veth pair. This works like a network cable: eth0 is one end and the veth is the other.

The veth end is on the host, so the host receives the container's packet, which then reaches the cni0 bridge. The bridge acts like a layer-2 switch, and all containers are attached to it. After installing bridge-utils (yum -y install bridge-utils) you can check whether the other end of each veth is attached to the cni0 bridge, which was created by flannel. The bridge also has its own MAC address and IP, as you can see:

[root@k8s-node2 ~]# brctl show cni0
bridge name   bridge id           STP enabled   interfaces
cni0          8000.4a025e87aa87   no            veth08925d5a
                                                veth3591a36f
                                                veth776a1e86
                                                veth718beeac
                                                veth81dadcbd
                                                veth8a96f11c
                                                veth8c90fdb6
                                                veth8f350182
                                                veth90818f0b
                                                vetha471152b

These veth interfaces were attached to the bridge by flannel when the pods were created. Each interface is equivalent to a switch port and is a virtual network card on the host. For local traffic, a packet goes straight to this bridge, sends an ARP broadcast, finds the destination MAC and gets a response; cni0 behaves as a layer-2 switch doing the flooding, so containers on the same node communicate directly over the bridge. But when the destination address, say 2.10, is not on this bridge, the current node does not know where the 2.10 pod is, so the packet must consult the routing table: traffic without a more specific route goes to the default gateway. Flannel therefore generates a number of routes on the host, visible with ip route.

[root@k8s-node2 ~]# ip route
default via 10.4.7.1 dev eth0 proto static metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1

The docker0 route in this table is generated by docker but is not used here: when flannel is deployed, it uses its own bridge by default. The principle is the same as docker's; flannel simply uses its own bridge to make it easier to process packets itself.

The routing table records where the destination address 2.10 (pod2) lives, so the packet is sent to the flannel.1 device according to this routing table. Flannel is in vxlan mode here, and vxlan must encapsulate and decapsulate the data, so flannel hands the packet to the vxlan driver, which is a kernel-level driver. Because vxlan itself works at layer 2, encapsulating the packet also requires the MAC address of the destination.

You can check the MAC addresses with ip neigh show dev flannel.1:

[root@k8s-node2 ~]# ip neigh show dev flannel.1
10.244.2.0 lladdr ea:ca:d6:62:be:21 PERMANENT
10.244.1.0 lladdr 4e:e3:fa:5f:d2:34 PERMANENT

The vxlan implementation behind flannel.1 has a VTEP device to encapsulate and decapsulate the packet. Because encapsulation happens at layer 2, the destination MAC address must be known; flannel provides it to the VTEP, since flanneld stores the gateway corresponding to the next hop. That gateway is certainly not local: once the destination MAC address is obtained, the packet is encapsulated into a complete layer-2 frame. This frame has no practical meaning for the host network, because sent as-is it could not leave the node; at layer 2 alone it would never reach the other node, since the two pods are in different subnets and cannot communicate without routing. So next the Linux kernel further encapsulates this frame into an ordinary host data frame by wrapping a layer of UDP around it, so that the packet can be carried directly to the host of the destination container.

VXLAN uses the UDP protocol: the original message is placed inside a UDP packet whose outer source IP and destination IP are those of the hosts.
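This extra wrapping is also why FLANNEL_MTU was 1450 in subnet.env above: the outer headers consume part of the physical 1500-byte MTU. A quick back-of-envelope check, assuming IPv4 headers without options:

```shell
# VXLAN overhead on a 1500-byte Ethernet network:
# outer IPv4 header (20) + outer UDP header (8) + VXLAN header (8)
# + inner Ethernet header (14) = 50 bytes
PHYS_MTU=1500
OVERHEAD=$((20 + 8 + 8 + 14))
echo "flannel.1 MTU: $((PHYS_MTU - OVERHEAD))"
```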

[root@k8s-master1 ~]# bridge fdb show dev flannel.1
a6:a4:e5:5d:19:9b dst 10.4.7.21 self permanent
ea:ca:d6:62:be:21 dst 10.4.7.12 self permanent

You can see that each flannel.1 MAC address shown above corresponds to a host IP; that is the destination to which the UDP packet is to be sent, and this destination IP is used for the encapsulation.
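Putting the two tables together: flanneld pre-populates an ARP entry (remote subnet to VTEP MAC) and an FDB entry (VTEP MAC to host IP). A sketch of the two lookups, with values copied from the outputs above and the tables inlined as text:

```shell
# Step 1: remote pod subnet -> destination VTEP MAC (what `ip neigh` showed).
# Step 2: VTEP MAC -> destination host IP (what `bridge fdb` showed).
subnet="10.244.2.0"
mac=$(printf '10.244.2.0 ea:ca:d6:62:be:21\n10.244.1.0 4e:e3:fa:5f:d2:34\n' \
      | awk -v s="$subnet" '$1 == s {print $2}')
host=$(printf 'a6:a4:e5:5d:19:9b 10.4.7.21\nea:ca:d6:62:be:21 10.4.7.12\n' \
       | awk -v m="$mac" '$1 == m {print $2}')
echo "inner dst MAC: $mac -> outer dst IP: $host"
```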

All of this is known to flannel, which is exactly why flanneld maintains the data in etcd and guards the local routing rules: the data is written to etcd by flannel, and each node keeps a copy, so the MAC address is resolved from that data. The result is a complete packet: a UDP packet encapsulated by vxlan, containing the two inner IPs. This UDP packet can be sent directly to the node2 host, because the hosts are on the same network segment, so at that point the transmission problem is solved. After the destination host receives the UDP packet, it unpacks it and takes out the original packet. The packet carries a vxlan mark: since flannel's traffic is handled by the VTEP, a vxlan header is stamped onto the packet, identifying it as a vxlan packet, together with a VNI number. The VNI distinguishes the point-to-point vxlan tunnels, numbering them separately; flannel references this internal number to confirm that the packet is correct. The packet is then given to the flannel.1 device, which processes it: looking at the source IP and destination IP, it finds that the destination belongs to the cni0 bridge, matches the corresponding routing table entry, and forwards the packet to the cni0 bridge. From cni0 onwards it is the same as before: the bridge acts as a layer-2 switch, broadcasts an ARP, finds that the destination is on this bridge, and forwards the packet.

From this it is clear that when vxlan carries traffic over an overlay network, the packing and unpacking of packets costs a lot of performance.

Summary:

Container routing: the packet leaves the container according to its routing table:

/ # ip route
default via 10.244.0.1 dev eth0
10.244.0.0/24 dev eth0 scope link src 10.244.0.45
10.244.0.0/16 via 10.244.0.1 dev eth0

Host routing: the packet enters the host virtual network card cni0 and is forwarded to the flannel.1 virtual network card according to the routing table, that is, it arrives at the entrance of the tunnel:

ip route
default via 192.168.31.1 dev ens33 proto static metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink

VXLAN packaging: to form a layer-2 network between the VTEP devices, the destination MAC address must be known. Where does this MAC address come from? After the flanneld process starts, it automatically adds ARP records for the other nodes, which can be viewed with the ip command:

ip neigh show dev flannel.1
10.244.1.0 lladdr ca:2a:a4:59:b6:55 PERMANENT
10.244.2.0 lladdr d2:d0:1b:a7:a9:cd PERMANENT

Secondary packaging: once the destination MAC address is known and the layer-2 data frame (carrying the container source IP and destination IP) is built, this frame means nothing to the host network. The Linux kernel therefore further encapsulates it into an ordinary data frame of the host network, so that it can carry the inner frame and be transmitted through the host's eth0 network card.

Packaging into a UDP packet and sending it out: can the UDP packet be sent directly now? Not yet. So far we only know the MAC address of the flannel.1 device on the other end, but we do not know which host IP it corresponds to.

The flanneld process also maintains a forwarding database called FDB, which can be viewed through the bridge fdb command:

bridge fdb show dev flannel.1
d2:d0:1b:a7:a9:cd dst 192.168.31.61 self permanent
ca:2a:a4:59:b6:55 dst 192.168.31.63 self permanent

You can see that the MAC address of the other flannel.1 used above corresponds to the host IP, which is the destination to which the UDP is to be sent. Use this destination IP for encapsulation.

The packet arrives at the destination host: it is received on the destination node's eth0 network card, which finds that it is a VXLAN packet and hands it to the flannel.1 device. The flannel.1 device further unpacks it, takes out the original layer-2 data frame, sends an ARP request, and forwards the packet to the container via the cni0 bridge.

Host-GW

Host-gw mode is much simpler than vxlan: it adds routes directly, using the destination host as a gateway, and routes the original packets unmodified.

Switch to host-gw mode and the first part of forwarding is the same: the pod1 container's network card connects via a veth to the host and reaches the cni0 bridge, which is equivalent to a layer-2 switch. Once the packet reaches the cni bridge it has arrived at the host, and the host's network protocol stack decides, according to the routing table, which gateway to forward to. Because the destination IP is in a different network segment, the packet must go by the routing table, which determines that a packet for destination 2.10 is forwarded to its next hop, the gateway 31.63, out through the interface; in other words, it follows the host network directly, since the packet is now processed by the host. The host re-encapsulates the packet toward 31.63, determines that the next hop 31.63 is in the same subnet, and uses layer-2 transmission, which requires the destination MAC address. If the local host does not know the MAC of 31.63, it sends an ARP broadcast and learns the other side's MAC. After the layer-2 transmission reaches 31.63, the receiving host consults its routing table, passes the packet to the cni bridge, and layer 2 forwards it to the container.

The two most important points are that host-gw treats each node as a gateway: it adds the other nodes as gateways, so when a packet arrives at a node it is sent to the next hop according to the routing table, that is, a node IP on the same network segment, and the data is forwarded directly to the other node at layer 2. The other node then applies its own rule, forwarding to the cni bridge based on the destination address, and the cni bridge forwards to the container at layer 2. One direction is the inflow of data, when a packet arriving at this node must be delivered onward; the other is the outflow, when a packet leaving this node must be forwarded to the right node. Both sets of routes are maintained by flannel.

The limitation is that all nodes must be reachable at layer 2, otherwise the next hop cannot be reached. But its performance is much higher than vxlan's, with no packing and unpacking of packets; it is close to the native network, and the performance is the best.


kube-flannel.yml:

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {"Type": "host-gw"}
  }

As the name suggests, host-gw uses the destination host as a gateway and routes the original packet directly.

Switching from vxlan to host-gw mode requires rebuilding; afterwards you can see the routing table change. The switch briefly disrupts the network, so it is usually done during off-peak hours.

The previous routing table forwarded through the flannel.1 device; with host-gw, flannel.1 is no longer needed as a device, and no vxlan packing or unpacking takes place.

When you set flannel to use host-gw mode, flanneld will create the routing table of the node on the host:

ip route
default via 192.168.31.1 dev ens33 proto static metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 192.168.31.63 dev ens33
10.244.2.0/24 via 192.168.31.61 dev ens33
192.168.31.0/24 dev ens33 proto kernel scope link src 192.168.31.62 metric 100

An IP packet whose destination belongs to the 10.244.1.0/24 network segment should be sent through the local ens33 device (dev ens33), and its next-hop address is 192.168.31.63 (via 192.168.31.63).
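The next-hop selection the host performs can be mimicked with a small function; the route values come from the table above, and the simple prefix match stands in for the kernel's real longest-prefix matching:

```shell
# Sketch: pick the next hop for a destination IP the way the host-gw routing
# table above does (simplified prefix match, not a real LPM implementation).
route_for() {
  case "$1" in
    10.244.0.*)   echo "dev cni0 (local pod subnet)" ;;
    10.244.1.*)   echo "via 192.168.31.63 dev ens33" ;;
    10.244.2.*)   echo "via 192.168.31.61 dev ens33" ;;
    192.168.31.*) echo "dev ens33 (same segment)" ;;
    *)            echo "default via 192.168.31.1" ;;
  esac
}
route_for 10.244.1.20
```

For the container-2 address 10.244.1.20 mentioned below, this picks the 192.168.31.63 next hop, matching the walkthrough.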

Once the next-hop address is configured, when the IP packet is encapsulated into a frame from the network layer into the link layer, the eth0 device uses the MAC address corresponding to the next-hop address as the destination MAC address of the data frame.

When the Node 2 kernel network stack gets the IP packet from the layer 2 data frame, it will "see" that the destination IP address of the IP packet is 10.244.1.20, that is, the IP address of container-2. At this point, according to the routing table on Node 2, the destination address matches to the second routing rule (that is, the routing rule corresponding to 10.244.1.0), thus entering the cni0 bridge and then into the container-2.
