2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how Flannel works in Kubernetes and walks through the source code that implements it.
Flannel is an open source CNI network plug-in from CoreOS. The diagram on the flannel official website shows a packet being encapsulated, transmitted, and unpacked. In that diagram, docker0 on the two machines sits in different subnets: 10.1.20.1/24 and 10.1.15.1/24. When the Web App Frontend1 pod (10.1.15.2) connects to the Backend Service2 pod (10.1.20.3) on the other host, the network packet is sent from host 192.168.0.100 to host 192.168.0.200. The inner container packet is encapsulated in a UDP packet on the host, and the outer layer carries the hosts' IP and MAC addresses. This is a classic overlay network: a container IP is an internal address that cannot be reached from another host, so container traffic has to be carried over the host network.
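The encapsulation step can be sketched in a few lines of Go. This is only an illustration, not flannel's code: it uses the 8-byte VXLAN header layout from RFC 7348 as the example encapsulation, and the function names encapVXLAN and decapVXLAN as well as the string standing in for the inner frame are made up for this sketch. In a real overlay the result would become the payload of a UDP packet sent from 192.168.0.100 to 192.168.0.200.

```go
package main

import (
	"errors"
	"fmt"
)

const vxlanHeaderLen = 8

// encapVXLAN prepends an 8-byte VXLAN header (RFC 7348 layout) to an inner
// frame. Hypothetical helper for illustration only.
func encapVXLAN(vni uint32, inner []byte) []byte {
	hdr := make([]byte, vxlanHeaderLen, vxlanHeaderLen+len(inner))
	hdr[0] = 0x08 // "I" flag: the VNI field is valid
	hdr[4] = byte(vni >> 16)
	hdr[5] = byte(vni >> 8)
	hdr[6] = byte(vni)
	return append(hdr, inner...)
}

// decapVXLAN strips the header again and returns the VNI and the inner frame.
func decapVXLAN(pkt []byte) (uint32, []byte, error) {
	if len(pkt) < vxlanHeaderLen || pkt[0]&0x08 == 0 {
		return 0, nil, errors.New("not a valid VXLAN packet")
	}
	vni := uint32(pkt[4])<<16 | uint32(pkt[5])<<8 | uint32(pkt[6])
	return vni, pkt[vxlanHeaderLen:], nil
}

func main() {
	// Stand-in for the Ethernet frame leaving the frontend pod.
	inner := []byte("frame from 10.1.15.2 to 10.1.20.3")
	pkt := encapVXLAN(1, inner)
	vni, payload, err := decapVXLAN(pkt)
	fmt.Println(len(pkt)-len(inner), vni, string(payload), err)
}
```

The receiving host performs the reverse step and hands the inner frame to its own bridge, which is why the containers never see the outer addresses.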
Flannel supports a variety of network modes; the commonly used ones are vxlan, UDP, host-gw, ipip, gce, and Aliyun. The difference between vxlan and UDP is that vxlan encapsulates packets in the kernel, while UDP encapsulates them in the flanneld user-space program, so UDP performs somewhat worse. host-gw is a host gateway mode: the gateway for traffic to containers on another host is set to the address of that host's network card. This is very similar to Calico, except that Calico advertises routes through BGP while host-gw distributes them through the central etcd. host-gw is therefore a direct-routing mode that does not need to encapsulate and unpack packets through an overlay, so its performance is relatively high. Its biggest disadvantage is that all hosts must be on the same layer 2 network, because the next hop of each route has to be in the neighbor table; otherwise packets cannot be delivered.
In production environments vxlan is the most commonly used mode, so we first look at how it works and then trace the implementation through the source code.
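The layer 2 restriction of host-gw can be approximated with a simple check. The sketch below is an assumption-laden illustration rather than flannel's logic: it treats "the peer is inside the subnet of my own interface" as a hint that the peer can be used directly as a next hop, and the /24 prefix on the host address is assumed because the text only gives the host IPs. The function name sameSubnet is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// sameSubnet reports whether peer falls inside the subnet of the local
// interface address given in CIDR form. Sharing a subnet is used here as a
// stand-in for "on the same layer 2 network"; it is a hint, not a proof.
func sameSubnet(localCIDR, peer string) (bool, error) {
	_, subnet, err := net.ParseCIDR(localCIDR)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(peer)
	if ip == nil {
		return false, fmt.Errorf("invalid peer address %q", peer)
	}
	return subnet.Contains(ip), nil
}

func main() {
	// 192.168.0.100/24 is an assumed interface address for the first host.
	for _, peer := range []string{"192.168.0.200", "192.168.1.200"} {
		ok, err := sameSubnet("192.168.0.100/24", peer)
		fmt.Println(peer, ok, err)
	}
}
```

A peer that fails this kind of check would need an overlay mode such as vxlan instead of a direct route.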
The installation process is very simple and consists of two steps:
Step 1: install flannel
Run yum install flannel, or start it as a Kubernetes DaemonSet, and configure the etcd address used by flannel.
Step 2: configure the cluster network
    curl -L http://etcdurl:2379/v2/keys/flannel/network/config -XPUT -d value="{\"Network\":\"172.16.0.0/16\",\"SubnetLen\":24,\"Backend\":{\"Type\":\"vxlan\",\"VNI\":1}}"
Then start the flanneld program on each node.
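To get a feel for what this configuration means, the following Go sketch parses the same JSON value and computes how many per-host subnets of length SubnetLen fit into Network. The struct and the function subnetCount are hypothetical names for this example, not flannel types, and the result is only an upper bound, since flannel may exclude part of the range.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// networkConfig mirrors the keys of the JSON value written to etcd above.
// Illustrative type, not the one used inside flannel.
type networkConfig struct {
	Network   string
	SubnetLen int
	Backend   struct {
		Type string
		VNI  int
	}
}

// subnetCount returns how many subnets of length SubnetLen fit into Network.
func subnetCount(raw string) (int, error) {
	var cfg networkConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return 0, err
	}
	_, network, err := net.ParseCIDR(cfg.Network)
	if err != nil {
		return 0, err
	}
	ones, bits := network.Mask.Size()
	if bits != 32 || cfg.SubnetLen < ones || cfg.SubnetLen > bits {
		return 0, fmt.Errorf("SubnetLen %d does not fit inside %s", cfg.SubnetLen, cfg.Network)
	}
	return 1 << (cfg.SubnetLen - ones), nil
}

func main() {
	raw := `{"Network":"172.16.0.0/16","SubnetLen":24,"Backend":{"Type":"vxlan","VNI":1}}`
	n, err := subnetCount(raw)
	fmt.Println(n, err)
}
```

With a /16 network and /24 subnets there is room for at most 256 hosts, each with roughly 250 container addresses.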
I. Working principle
1. How to assign the container address
When a Docker container starts, it is assigned an IP address from docker0. Flannel assigns each machine an IP segment and configures it on docker0, and a container that starts afterwards picks an unoccupied IP in that segment. So how does flannel modify the docker0 IP address range?
First take a look at flannel's startup file, /usr/lib/systemd/system/flanneld.service:
    [Service]
    Type=notify
    EnvironmentFile=/etc/sysconfig/flanneld
    ExecStart=/usr/bin/flanneld-start $FLANNEL_OPTIONS
    ExecStartPost=/opt/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
The file specifies the environment file for flannel, the startup script, and mk-docker-opts.sh, the script that ExecStartPost runs after startup. The purpose of this script is to generate /run/flannel/docker, whose contents are as follows:
    DOCKER_OPT_BIP="--bip=10.251.81.1/24"
    DOCKER_OPT_IPMASQ="--ip-masq=false"
    DOCKER_OPT_MTU="--mtu=1450"
    DOCKER_NETWORK_OPTIONS=" --bip=10.251.81.1/24 --ip-masq=false --mtu=1450"
This file is in turn referenced by the docker startup file, /usr/lib/systemd/system/docker.service:
    [Service]
    Type=notify
    NotifyAccess=all
    EnvironmentFile=-/run/flannel/docker
    EnvironmentFile=-/etc/sysconfig/docker
In this way, the address of the docker0 bridge is set.
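What the helper script does can be mimicked in Go. The sketch below is not a port of mk-docker-opts.sh; it only turns FLANNEL_* variables into docker options in the same spirit. The function names are hypothetical, and the inversion of the masquerade flag is an assumption based on the sample values in this article (FLANNEL_IPMASQ=true appears alongside --ip-masq=false).

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnv reads KEY=VALUE lines, such as those flannel writes to its
// subnet file, into a map. Lines without "=" are ignored.
func parseEnv(content string) map[string]string {
	env := make(map[string]string)
	for _, line := range strings.Split(content, "\n") {
		parts := strings.SplitN(strings.TrimSpace(line), "=", 2)
		if len(parts) == 2 {
			env[parts[0]] = parts[1]
		}
	}
	return env
}

// dockerNetworkOptions builds the option string passed to the docker daemon.
func dockerNetworkOptions(env map[string]string) string {
	var opts []string
	if v, ok := env["FLANNEL_SUBNET"]; ok {
		opts = append(opts, "--bip="+v)
	}
	if v, ok := env["FLANNEL_IPMASQ"]; ok {
		// Assumption: when flannel already masquerades, docker must not.
		opts = append(opts, fmt.Sprintf("--ip-masq=%t", v != "true"))
	}
	if v, ok := env["FLANNEL_MTU"]; ok {
		opts = append(opts, "--mtu="+v)
	}
	return strings.Join(opts, " ")
}

func main() {
	env := parseEnv("FLANNEL_SUBNET=10.254.44.1/24\nFLANNEL_MTU=1450\nFLANNEL_IPMASQ=true")
	fmt.Println(dockerNetworkOptions(env))
}
```

The resulting string plays the role of DOCKER_NETWORK_OPTIONS, which docker.service picks up through its EnvironmentFile lines.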
In the development environment, there are three machines with the following network segments assigned:
Host-139.245 10.254.44.1/24
Host-139.246 10.254.60.1/24
Host-139.247 10.254.50.1/24
2. How the container communicates
The above describes how each container gets its IP, so how do containers on different hosts communicate? We use the most common mode, vxlan, as the example. There are three key pieces: a route, an ARP entry, and an FDB entry. We analyze the role of each in the order a packet leaves the container. First, packets coming out of the container go through docker0. Are they then sent directly over the host network or forwarded through vxlan encapsulation? This is decided by the routes on each machine.
    # ip route show dev flannel.1
    10.254.50.0/24 via 10.254.50.0 onlink
    10.254.60.0/24 via 10.254.60.0 onlink
Each host has a route to the other two machines. These are onlink routes: the onlink parameter forces the gateway to be treated as "on the link" (although no link-layer route to it exists); otherwise Linux would refuse to add a route whose gateway is in a different network segment. With these routes in place, a packet addressed to a container on another host is handed to the flannel.1 device for processing.
flannel.1 is a virtual network device that encapsulates the packet, which raises a question: what is the MAC address of this gateway? Because the gateway was set through onlink, flannel installs this MAC address itself, and it can be seen in the ARP table.
    # ip neigh show dev flannel.1
    10.254.50.0 lladdr ba:10:0e:7b:74:89 PERMANENT
    10.254.60.0 lladdr 92:f3:c8:b2:6e:f0 PERMANENT
With the gateway's MAC address known, the inner packet can be encapsulated.
One last question remains: what is the destination IP of the outgoing packet? In other words, which machine should the encapsulated packet be sent to? Surely not every packet has to be broadcast. The default vxlan implementation does learn the destination by broadcasting the first time, but flannel once again uses a hack and writes the forwarding entries (FDB) directly.
    # bridge fdb show dev flannel.1
    92:f3:c8:b2:6e:f0 dst 10.100.139.246 self permanent
    ba:10:0e:7b:74:89 dst 10.100.139.247 self permanent
In this way, the destination IP for forwarding can be looked up from the MAC address.
It is also important to note that both the ARP entries and the FDB entries are permanent, which indicates that the records are written and maintained manually. The traditional way for ARP to learn a neighbor is by broadcasting. When the peer answers the ARP request, it is marked as reachable. Once the reachable timeout has passed, the entry is marked as stale; it then moves through the delay and probe states, and if probing fails it is marked as failed. This ARP background matters because older versions of flannel did not use the method above. They used temporary ARP entries, installed in the reachable state, which means that if flanneld was down for longer than the reachable timeout, the container network on that machine was interrupted. Let's briefly review how the previous version (0.7.x) worked. To obtain the peer's MAC address, the kernel first sends ARP requests itself. After it has tried the number of times set in
    /proc/sys/net/ipv4/neigh/$NIC/ucast_solicit
without an answer, it sends the ARP request to user space, up to the number of times set in
    /proc/sys/net/ipv4/neigh/$NIC/app_solicit
Previous versions of flannel took advantage of this feature by setting
    # cat /proc/sys/net/ipv4/neigh/flannel.1/app_solicit
    3
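The practical difference between permanent entries and the temporary entries of 0.7.x can be illustrated with a small model. This Go sketch is a deliberate simplification, not kernel behavior: it collapses the stale, delay, and probe states into a single step, and the type neighEntry, the function stateOf, and the 30-second timeout are assumptions chosen for the example.

```go
package main

import "fmt"

// neighEntry is a simplified neighbor table entry.
type neighEntry struct {
	permanent  bool // written with the PERMANENT flag, as in the tables above
	ageSeconds int  // time since the entry was last confirmed
}

// stateOf gives a coarse view of the entry's state. The real kernel passes
// through STALE, DELAY and PROBE before FAILED; that path is collapsed here.
func stateOf(e neighEntry, reachableSeconds int, flanneldUp bool) string {
	switch {
	case e.permanent:
		return "PERMANENT"
	case e.ageSeconds <= reachableSeconds:
		return "REACHABLE"
	case flanneldUp:
		return "REACHABLE" // re-resolved by the user-space daemon
	default:
		return "FAILED"
	}
}

func main() {
	const reachable = 30 // assumed timeout in seconds
	old := neighEntry{ageSeconds: 60}
	fmt.Println(stateOf(old, reachable, true))
	fmt.Println(stateOf(old, reachable, false))
	fmt.Println(stateOf(neighEntry{permanent: true, ageSeconds: 60}, reachable, false))
}
```

Only the temporary entry depends on the daemon being alive once the timeout has passed, which is the weakness the permanent entries remove.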
As a result, flanneld receives the L3MISS notification that the kernel sends to user space, looks up the MAC address for the IP with the help of etcd, and installs it as a reachable entry. It follows from this analysis that if the flanneld program exits, communication between containers is interrupted, which is worth keeping in mind.
The startup process of Flannel is shown in the following figure:
On startup, flannel runs newSubnetManager and uses it to create the backing data store. Two backends are currently supported. The default is etcd; if flannel is started with the "kube-subnet-mgr" parameter, the Kubernetes API is used to store the data.
The specific code is as follows:
    func newSubnetManager() (subnet.Manager, error) {
        if opts.kubeSubnetMgr {
            return kube.NewSubnetManager(opts.kubeApiUrl, opts.kubeConfigFile)
        }

        cfg := &etcdv2.EtcdConfig{
            Endpoints: strings.Split(opts.etcdEndpoints, ","),
            Keyfile:   opts.etcdKeyfile,
            Certfile:  opts.etcdCertfile,
            CAFile:    opts.etcdCAFile,
            Prefix:    opts.etcdPrefix,
            Username:  opts.etcdUsername,
            Password:  opts.etcdPassword,
        }

        // Attempt to renew the lease for the subnet specified in the subnetFile
        prevSubnet := ReadCIDRFromSubnetFile(opts.subnetFile, "FLANNEL_SUBNET")

        return etcdv2.NewLocalManager(cfg, prevSubnet)
    }
Through the SubnetManager, together with the etcd data configured during deployment as described above, flannel obtains the network configuration, mainly the backend and the network segment. If the backend is vxlan, the corresponding network manager is created through NewManager. A simple factory pattern is used here. Each network mode first registers itself through init.
For vxlan:
    func init() {
        backend.Register("vxlan", New)
    }
For udp:
    func init() {
        backend.Register("udp", New)
    }
The other modes work the same way: each constructor is registered in a map, and the network manager that matches the network mode configured in etcd is enabled.
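The registration mechanism is a common Go idiom and can be shown in isolation. The sketch below is a self-contained imitation of the pattern, not flannel's backend package: the Backend interface, the constructor signature, and GetBackend are simplified, hypothetical stand-ins.

```go
package main

import "fmt"

// Backend is a minimal stand-in for a network mode implementation.
type Backend interface {
	Name() string
}

type constructor func() (Backend, error)

// constructors maps a backend type name to the function that builds it.
var constructors = map[string]constructor{}

// Register is called from init functions so that every backend adds itself
// to the map before main runs.
func Register(name string, ctor constructor) {
	constructors[name] = ctor
}

type vxlanBackend struct{}

func (vxlanBackend) Name() string { return "vxlan" }

type udpBackend struct{}

func (udpBackend) Name() string { return "udp" }

func init() {
	Register("vxlan", func() (Backend, error) { return vxlanBackend{}, nil })
	Register("udp", func() (Backend, error) { return udpBackend{}, nil })
}

// GetBackend builds the backend whose type was read from the configuration.
func GetBackend(backendType string) (Backend, error) {
	ctor, ok := constructors[backendType]
	if !ok {
		return nil, fmt.Errorf("unknown backend type: %s", backendType)
	}
	return ctor()
}

func main() {
	b, err := GetBackend("vxlan")
	fmt.Println(b.Name(), err)
}
```

Adding a new mode then only requires a new init function; the code that selects the backend from the configuration does not change.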
3. Register the network
In RegisterNetwork, the network device named flannel.N is created first, where N is the vxlanID; the default vxlanID is 1. Then a lease is registered with etcd and the corresponding network segment is obtained. There is a detail here: an old version of flannel obtained a new network segment every time it started, whereas a new version traverses the leases already registered in etcd, finds the previously assigned network segment, and continues to use it.
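The lease reuse described here can be sketched as follows. This is a hedged illustration of the idea only: matching an existing lease by the host's public IP, the lease type, the function acquireSubnet, and the pool of candidate subnets are all assumptions made for the example, not a description of flannel's actual data structures.

```go
package main

import (
	"errors"
	"fmt"
)

// lease records which host currently owns which subnet.
type lease struct {
	publicIP string
	subnet   string
}

// acquireSubnet reuses the subnet already leased to publicIP if there is
// one; otherwise it hands out the first subnet in pool that is not taken.
// The second return value reports whether an old lease was reused.
func acquireSubnet(leases []lease, publicIP string, pool []string) (string, bool, error) {
	taken := make(map[string]bool, len(leases))
	for _, l := range leases {
		if l.publicIP == publicIP {
			return l.subnet, true, nil
		}
		taken[l.subnet] = true
	}
	for _, s := range pool {
		if !taken[s] {
			return s, false, nil
		}
	}
	return "", false, errors.New("no free subnet left in the pool")
}

func main() {
	leases := []lease{
		{"10.100.139.245", "10.254.44.0/24"},
		{"10.100.139.246", "10.254.60.0/24"},
	}
	pool := []string{"10.254.44.0/24", "10.254.50.0/24", "10.254.60.0/24"}
	fmt.Println(acquireSubnet(leases, "10.100.139.245", pool)) // restarted host
	fmt.Println(acquireSubnet(leases, "10.100.139.247", pool)) // new host
}
```

Keeping the old segment across restarts means docker0 and the running containers do not have to be renumbered.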
Finally, the local subnet file is written through WriteSubnetFile:
    # cat /run/flannel/subnet.env
    FLANNEL_NETWORK=10.254.0.0/16
    FLANNEL_SUBNET=10.254.44.1/24
    FLANNEL_MTU=1450
    FLANNEL_IPMASQ=true
This file is used to set up the docker network. Careful readers may notice that the MTU here is not the 1500 of standard Ethernet, because the outer vxlan encapsulation takes up 50 bytes.
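The 50 bytes can be accounted for header by header. The Go sketch below assumes an IPv4 outer header without options and a 1500-byte MTU on the physical link; the constant and function names are made up for this example.

```go
package main

import "fmt"

// Bytes that VXLAN encapsulation adds inside the outer IP packet.
const (
	outerIPv4Header = 20 // IPv4 header without options (assumed)
	outerUDPHeader  = 8
	vxlanHeader     = 8
	innerEthernet   = 14 // Ethernet header of the encapsulated frame
)

// vxlanMTU returns the MTU left for the inner IP packet on a link whose own
// MTU is linkMTU.
func vxlanMTU(linkMTU int) int {
	overhead := outerIPv4Header + outerUDPHeader + vxlanHeader + innerEthernet
	return linkMTU - overhead
}

func main() {
	fmt.Println(vxlanMTU(1500)) // matches FLANNEL_MTU above
}
```

If the physical network uses a different MTU, the same subtraction gives the value the overlay device and docker0 need.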
Of course, after flannel starts it also needs to keep watching the data in etcd, so that when flannel nodes join or change, the other nodes can update the three tables above dynamically. The main processing is all in handleSubnetEvents.
    func (nw *network) handleSubnetEvents(batch []subnet.Event) {
        ...
        switch event.Type {
        // a new network segment is added (a new host joins)
        case subnet.EventAdded:
            ...
            // update the routing table
            if err := netlink.RouteReplace(&directRoute); err != nil {
                log.Errorf("Error adding route to %v via %v: %v", sn, attrs.PublicIP, err)
                continue
            }
            // add the ARP entry
            log.V(2).Infof("adding subnet: %s PublicIP: %s VtepMAC: %s", sn, attrs.PublicIP, net.HardwareAddr(vxlanAttrs.VtepMAC))
            if err := nw.dev.AddARP(neighbor{IP: sn.IP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                log.Error("AddARP failed: ", err)
                continue
            }
            // add the FDB entry
            if err := nw.dev.AddFDB(neighbor{IP: attrs.PublicIP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                log.Error("AddFDB failed: ", err)
                if err := nw.dev.DelARP(neighbor{IP: event.Lease.Subnet.IP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                    log.Error("DelARP failed: ", err)
                }
                continue
            }
        // a network segment is removed
        case subnet.EventRemoved:
            // delete the route
            if err := netlink.RouteDel(&directRoute); err != nil {
                log.Errorf("Error deleting route to %v via %v: %v", sn, attrs.PublicIP, err)
            } else {
                log.V(2).Infof("removing subnet: %s PublicIP: %s VtepMAC: %s", sn, attrs.PublicIP, net.HardwareAddr(vxlanAttrs.VtepMAC))
                // delete the ARP entry
                if err := nw.dev.DelARP(neighbor{IP: sn.IP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                    log.Error("DelARP failed: ", err)
                }
                // delete the FDB entry
                if err := nw.dev.DelFDB(neighbor{IP: attrs.PublicIP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
                    log.Error("DelFDB failed: ", err)
                }
                if err := netlink.RouteDel(&vxlanRoute); err != nil {
                    log.Errorf("failed to delete vxlanRoute (%s -> %s): %v", vxlanRoute.Dst, vxlanRoute.Gw, err)
                }
            }
        default:
            log.Error("internal error: unknown event type: ", int(event.Type))
        }
        ...
    }
In this way, the addition or removal of any host in the flannel network is noticed by the other nodes, which then update their local kernel forwarding tables.
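The effect of these events on the three tables can be modeled without touching the kernel. The Go sketch below replaces the netlink calls with plain maps; the event and kernelTables types and their field names are hypothetical and only mirror the roles of the route, ARP, and FDB entries in the excerpt above.

```go
package main

import "fmt"

type eventType int

const (
	eventAdded eventType = iota
	eventRemoved
)

// event carries what a node learns about a peer's lease.
type event struct {
	typ      eventType
	subnet   string // peer's container segment, e.g. 10.254.50.0/24
	gateway  string // network address of the segment, used as next hop
	vtepMAC  string // MAC of the peer's flannel.1 device
	publicIP string // IP of the peer host
}

// kernelTables is an in-memory stand-in for the three kernel tables.
type kernelTables struct {
	routes map[string]string // subnet -> gateway
	arp    map[string]string // gateway -> vtepMAC
	fdb    map[string]string // vtepMAC -> publicIP
}

func newKernelTables() *kernelTables {
	return &kernelTables{
		routes: map[string]string{},
		arp:    map[string]string{},
		fdb:    map[string]string{},
	}
}

// handle applies a batch of events to all three tables.
func (k *kernelTables) handle(batch []event) {
	for _, ev := range batch {
		switch ev.typ {
		case eventAdded:
			k.routes[ev.subnet] = ev.gateway
			k.arp[ev.gateway] = ev.vtepMAC
			k.fdb[ev.vtepMAC] = ev.publicIP
		case eventRemoved:
			delete(k.routes, ev.subnet)
			delete(k.arp, ev.gateway)
			delete(k.fdb, ev.vtepMAC)
		}
	}
}

func main() {
	k := newKernelTables()
	peer := event{eventAdded, "10.254.50.0/24", "10.254.50.0", "ba:10:0e:7b:74:89", "10.100.139.247"}
	k.handle([]event{peer})
	fmt.Println(k.routes, k.arp, k.fdb)
}
```

Unlike this model, the real handler has to deal with partial failures, which is why the excerpt removes the ARP entry again when adding the FDB entry fails.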
That concludes this analysis of how Flannel works in Kubernetes and how its source code implements it.