
What are the implementation principles of mainstream Docker networks?


This article introduces the implementation principles of the mainstream Docker container networks. It should be a useful reference for interested readers.

1. A brief introduction to container networks

Container network mainly solves two core problems: one is the IP address allocation of containers, and the other is the communication between containers. This paper focuses on the second problem and mainly studies the cross-host communication of containers.

The simplest way to achieve cross-host container communication is to use the host network directly. In that case the container's IP is the host's IP, and the container reuses the host's network protocol stack and the underlay network; since the hosts can already communicate with each other, the containers naturally can too. The most obvious problem, however, is port conflicts.

Therefore, containers are usually configured with their own IP addresses, distinct from the host's. Because the devices in the underlay plane, such as switches and routers, are completely unaware of these container IPs, container IPs cannot be routed directly, and cross-host communication does not work out of the box.

To solve the above problem, there are two main ways to realize container cross-host communication:

Idea 1: modify the configuration of the underlying network devices so that they manage the container network's IP addresses, adjust router gateways, etc.; this is usually combined with SDN.

Idea 2: do not modify the underlying network device configuration at all and reuse the existing underlay network; cross-host container communication is then solved mainly in the following two ways (a rough command-level sketch follows the two items):

Overlay tunnel transmission. Encapsulate the container's packet inside layer 3 or layer 4 of the host network, transport it to the target host over the existing network using IP or TCP/UDP, then decapsulate it and forward it to the container. Common overlay tunnels include Vxlan, ipip, etc.; mainstream container networks built on Overlay technology include Flannel, Weave and so on.

Modify host routes. The container network is added to the host routing table, the host acts as the container's gateway, and routing rules forward traffic to the designated host, achieving layer-3 container interworking. Container networks using routing include Flannel host-gw, Calico and so on.
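As a rough command-level sketch of these two approaches (illustrative device names, VNI and addresses, configured by hand rather than by any particular container network), the underlying Linux primitives look roughly like this:

# approach 1: a point-to-point Vxlan tunnel between host A (192.168.1.68) and host B (192.168.1.254)
ip link add vxlan100 type vxlan id 100 local 192.168.1.68 remote 192.168.1.254 dstport 4789 dev eth0
ip addr add 10.20.0.1/24 dev vxlan100
ip link set vxlan100 up
# approach 2: no tunnel, just tell host A that host B is the gateway for B's container subnet
ip route add 10.20.1.0/24 via 192.168.1.254 dev eth0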

Next, this paper will introduce in detail the implementation principle of the current mainstream container network.

Before starting the body content, introduce two scripts that will be used all the time:

The first script is docker_netns.sh:

#!/bin/bash

NAMESPACE=$1

if [[ -z $NAMESPACE ]]; then
    # no argument: list all Docker network namespaces
    ls -1 /var/run/docker/netns/
    exit 0
fi

NAMESPACE_FILE=/var/run/docker/netns/${NAMESPACE}

if [[ ! -f $NAMESPACE_FILE ]]; then
    # not a namespace id: treat the argument as a container id/name and look up its sandbox
    NAMESPACE_FILE=$(docker inspect -f "{{.NetworkSettings.SandboxKey}}" $NAMESPACE 2>/dev/null)
fi

if [[ ! -f $NAMESPACE_FILE ]]; then
    echo "Cannot open network namespace '$NAMESPACE': No such file or directory"
    exit 1
fi

shift

if [[ $# -lt 1 ]]; then
    echo "No command specified"
    exit 1
fi

# run the given command inside the container's network namespace
nsenter --net=${NAMESPACE_FILE} $@

The script quickly enters the container's network namespace and executes the corresponding shell command by specifying the container id, name, or namespace.

If no parameters are specified, all network namespaces associated with the Docker container are enumerated.

# ./docker_netns.sh          # list namespaces
4-a4a048ac67
abe31dbbc394
default

# ./docker_netns.sh busybox ip addr          # enter the busybox namespace
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
354: eth0@if355: mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:c0:a8:64:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.100.2/24 brd 192.168.100.255 scope global eth0
       valid_lft forever preferred_lft forever
356: eth2@if357: mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth2
       valid_lft forever preferred_lft forever

The other script is find_links.sh:

#!/bin/bash

DOCKER_NETNS_SCRIPT=./docker_netns.sh
IFINDEX=$1

if [[ -z $IFINDEX ]]; then
    # no ifindex given: list link devices of all namespaces
    for namespace in $($DOCKER_NETNS_SCRIPT); do
        printf "\e[1;31m%s: \e[0m\n" $namespace
        $DOCKER_NETNS_SCRIPT $namespace ip -c -o link
        printf "\n"
    done
else
    # search every namespace for a link whose ifindex matches
    for namespace in $($DOCKER_NETNS_SCRIPT); do
        if $DOCKER_NETNS_SCRIPT $namespace ip -c -o link | grep -Pq "^$IFINDEX: "; then
            printf "\e[1;31m%s: \e[0m\n" $namespace
            $DOCKER_NETNS_SCRIPT $namespace ip -c -o link | grep -P "^$IFINDEX: "
            printf "\n"
        fi
    done
fi

The script looks up the namespace where the virtual network device is located based on ifindex:

# ./find_links.sh 354
abe31dbbc394:
354: eth0@if355: mtu 1450 qdisc noqueue state UP mode DEFAULT group default link/ether 02:42:c0:a8:64:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0

The purpose of this script is to make it easy to find the namespace location of the opposite end of veth. If ifindex is not specified, all link devices for namespaces are listed.

2. Docker native Overlay

Laurent Bernaille introduced the implementation principle of the Docker native Overlay network in detail at DockerCon 2017, and also wrote three very practical articles that analyze the implementation step by step and finally show how to build a Docker overlay network by hand from scratch. The three articles are:

Deep dive into docker overlay networks part 1

Deep dive into docker overlay networks part 2

Deep dive into docker overlay networks part 3

Interested readers are encouraged to read them; this section also draws heavily on the content of those three articles.

2.1 Overlay network environment

The test uses two Node nodes:

Node name    Host IP
node-1       192.168.1.68
node-2       192.168.1.254

First create an overlay network:

docker network create -d overlay --subnet 10.20.0.0/16 overlay

Create a busybox container on each of the two nodes:

docker run -d --name busybox --net overlay busybox sleep 36000

The container list is as follows:

Node name    Host IP          Container IP
node-1       192.168.1.68     10.20.0.3/16
node-2       192.168.1.254    10.20.0.2/16

We find that each container has two IPs: eth0 (10.20.0.x/16) belongs to the Overlay network we created, and the two containers can reach each other through it. The eth2 address is 172.18.0.2 on both containers even though they are on different nodes, so 172.18.0.0/16 obviously cannot be used for cross-host communication and only serves single-node container communication.

2.2 North-South container flow

The north-south traffic here mainly refers to the traffic that the container communicates with the outside, such as the container accessing the Internet.

Let's look at the route of the container:

# docker exec busybox-node-1 ip r
default via 172.18.0.1 dev eth2
10.20.0.0/16 dev eth0 scope link src 10.20.0.3
172.18.0.0/16 dev eth2 scope link src 172.18.0.2

From this, we can see that the default gateway of the container is 172.18.0.1, which means that the container goes out through eth2:

# docker exec busybox-node-1 ip link show eth2
77: eth2@if78: mtu 1500 qdisc noqueue
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff
# ./find_links.sh 78
default:
78: vethf2de5d4@if77: mtu 1500 qdisc noqueue master docker_gwbridge state UP mode DEFAULT group default link/ether 2e:6a:94:6a:09:c5 brd ff:ff:ff:ff:ff:ff link-netnsid 1

Using the find_links.sh script, we find the link with ifindex 78 in the default namespace; the master of that link is docker_gwbridge, which means the device is attached to the docker_gwbridge bridge.

# brctl show
bridge name        bridge id            STP enabled    interfaces
docker0            8000.02427406ba1a    no
docker_gwbridge    8000.0242bb868ca3    no             vethf2de5d4

172.18.0.1 is the IP of bridge docker_gwbridge, which means that docker_gwbridge is the gateway to all containers of that node.

Since the container IP is in the private range 172.18.0.0/16 and cannot reach the public network directly, the container IP must be translated to the host IP via NAT. Check the iptables nat table as follows:

# iptables-save -t nat | grep -- '-A POSTROUTING'
-A POSTROUTING -s 172.18.0.0/16 ! -o docker_gwbridge -j MASQUERADE

This verifies that the container is going out through NAT.

We find that the container's north-south traffic actually uses Docker's most native bridge network model, only with docker0 replaced by docker_gwbridge. If the container does not need Internet access, you can specify the --internal parameter when creating the Overlay network; in that case the container only gets the Overlay network NIC and no eth2 is created.
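For example (an illustrative network name and subnet, not from the test environment above), an internal Overlay network can be created like this:

docker network create -d overlay --internal --subnet 10.30.0.0/16 overlay-internal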

2.3 East-west flow of the container

Container east-west traffic refers to the communication between containers, especially between containers across hosts.

Obviously the container communicates with other containers through eth0:

# docker exec busybox-node-1 ip link show eth0
75: eth0@if76: mtu 1450 qdisc noqueue
    link/ether 02:42:0a:14:00:03 brd ff:ff:ff:ff:ff:ff

# ./find_links.sh 76
1-19c5d1a7ef:
76: veth0@if75: mtu 1450 qdisc noqueue master br0 state UP mode DEFAULT group default link/ether 6a:ce:89:a2:89:4a brd ff:ff:ff:ff:ff:ff link-netnsid 1

eth0's peer device has ifindex 76. Using the find_links.sh script we find that ifindex 76 lives in the 1-19c5d1a7ef namespace, is named veth0, and has br0 as its master, so veth0 is attached to the br0 bridge.

Through the docker_netns.sh script, you can quickly enter the specified namespace to execute commands:

# ./docker_netns.sh 1-19c5d1a7ef ip link show veth0
76: veth0@if75: mtu 1450 qdisc noqueue master br0 state UP mode DEFAULT group default
    link/ether 6a:ce:89:a2:89:4a brd ff:ff:ff:ff:ff:ff link-netnsid 1

# ./docker_netns.sh 1-19c5d1a7ef brctl show
bridge name    bridge id            STP enabled    interfaces
br0            8000.6ace89a2894a    no             veth0
                                                   vxlan0

It can be seen that in addition to veth0, the bridge also has vxlan0 attached:

# ./docker_netns.sh 1-19c5d1a7ef ip -c -d link show vxlan0
74: vxlan0: mtu 1450 qdisc noqueue master br0 state UNKNOWN mode DEFAULT group default
    link/ether 96:9d:64:39:76:4e brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 1
    vxlan id 256 srcport 0 0 dstport 4789 proxy l2miss l3miss ttl inherit ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
    ...

Vxlan0 is a VxLan virtual network device, so it can be inferred that Docker Overlay communicates across hosts through vxlan tunnels. Here is a direct reference to the diagram of Deep dive into docker overlay networks part 1:

In the figure, 192.168.0.0/16 corresponds to the 10.20.0.0/16 network segment used earlier.

2.4 ARP proxy

As mentioned earlier, although the two containers on different hosts communicate through the Overlay, the containers themselves are unaware of it; they simply see each other as being on the same layer-2 network (the same subnet). We know that layer 2 identifies peers by MAC address and learns the IP-to-MAC mapping through ARP broadcasts. In theory there is no problem broadcasting ARP packets through the Vxlan tunnel; the problem is that this scheme produces far too many broadcast packets, and the cost of broadcasting is very high.

Like OpenStack Neutron's L2 Population principle, Docker also solves the ARP broadcast problem through ARP proxy + static configuration. We know that although the underlying layer of Linux cannot know what the MAC address of the target IP is except through self-learning, it is easy for applications to obtain this information. For example, Port information is stored in Neutron's database, and IP and MAC addresses are stored in Port. Docker also saves endpoint information to the KV database, such as etcd:

With this data, it is possible to statically configure the IP and MAC address tables (neigh tables) instead of using ARP broadcasts. So vxlan0 is also responsible for the ARP proxy for the local container:

# ./docker_netns.sh 2-19c5d1a7ef ip -d -o link show vxlan0 | grep proxy_arp

When vxlan0 acts as the ARP proxy, it replies directly from the local neigh table, and that neigh table is configured statically by Docker. You can view the neigh table of the Overlay network namespace:

# ./docker_netns.sh 3-19c5d1a7ef ip neigh
10.20.0.3 dev vxlan0 lladdr 02:42:0a:14:00:03 PERMANENT
10.20.0.4 dev vxlan0 lladdr 02:42:0a:14:00:04 PERMANENT

The PERMANENT description in the record is statically configured rather than learned, and IP 10.20.0.3 and 10.20.0.4 are the IP addresses of the other two containers.
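As a rough sketch of what this static configuration amounts to (illustrative values; Docker itself programs these entries over netlink rather than via the CLI), the equivalent manual commands would be:

# inside the overlay network namespace: a permanent ARP entry for a remote container
ip neigh replace 10.20.0.4 lladdr 02:42:0a:14:00:04 dev vxlan0 nud permanent
# the proxy behaviour itself comes from the proxy flag given when the vxlan device is created:
ip link add vxlan0 type vxlan id 256 dstport 4789 proxy l2miss l3miss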

Whenever a new container is created, Docker notifies the node to update the local neigh ARP table through the Serf and Gossip protocols.

2.5 static configuration of VTEP table

The ARP proxy described above is a layer-2 concern. The container packet is ultimately transmitted through the Vxlan tunnel, so the next natural question is: which node should the packet be sent to? If there were only two nodes, you could specify the local IP and the peer (remote) IP when creating the vxlan tunnel to establish point-to-point communication, but obviously a cluster is rarely limited to two nodes.

The physical interface that the Vxlan traffic leaves through is called the VTEP (VXLAN Tunnel Endpoint); it has a routable IP, which becomes the outer IP after Vxlan encapsulation. The VTEP table determines which remote VTEP a packet should be sent to:

Container MAC address    Vxlan ID    Remote VTEP
02:42:0a:14:00:03        256         192.168.1.254
02:42:0a:14:00:04        256         192.168.1.245
...

The VTEP table, like the ARP table, can also be learned through broadcast flooding, but it is clear that there are also performance problems, and in fact this scheme is rarely used. In hardware SDN, BGP EVPN technology is usually used to realize the control plane of Vxlan.

Docker's solution is similar to the ARP case: it fills the VTEP table with static entries. We can view the forwarding database (fdb) of the container network namespace:

# ./docker_netns.sh 3-19c5d1a7ef bridge fdb
...
02:42:0a:14:00:04 dev vxlan0 dst 192.168.1.245 link-netnsid 0 self permanent
02:42:0a:14:00:03 dev vxlan0 dst 192.168.1.254 link-netnsid 0 self permanent
...

It can be seen that the peer VTEP address of the MAC address 02:42:0a:14:00:04 is 192.168.1.245, while the peer VTEP address of 02:42:0a:14:00:03 is 192.168.1.254. Both records are permanent, that is, statically configured, but the data source is still the KV database, and the locator in endpoint is the node IP of the container.
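A minimal sketch of such a static VTEP entry (illustrative values, added by hand; Docker derives the real ones from the KV store):

# inside the overlay network namespace: frames for this container MAC are tunnelled to the given remote VTEP
bridge fdb append 02:42:0a:14:00:04 dev vxlan0 dst 192.168.1.245 self permanent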

2.6 Summary

With the Docker native Overlay network, a container gets two virtual NICs by default: one leaves the container through a bridge and NAT and is responsible for north-south traffic, while the other communicates with containers on other hosts through Vxlan. To reduce broadcasts, Docker statically configures the ARP and FDB tables from the KV store. Events such as container creation and deletion are propagated through Serf and the Gossip protocol so that every node updates its ARP and FDB tables.

3. Weave similar to Docker Overlay

Weave is a container network solution provided by weaveworks, which is similar to Docker native Overlay network in implementation.

Initialize the three nodes 192.168.1.68, 192.168.1.254, 192.168.1.245 as follows:

weave launch --ipalloc-range 172.111.222.0/24 192.168.1.68 192.168.1.254 192.168.1.245

Start the container on three nodes:

# node-1
docker run -d --name busybox-node-1 --net weave busybox sleep 3600
# node-2
docker run -d --name busybox-node-2 --net weave busybox sleep 3600
# node-3
docker run -d --name busybox-node-3 --net weave busybox sleep 3600

In the container we ping each other:

From the results, it is found that Weave realizes communication across host containers. In addition, our container has two virtual network cards, one is Docker native bridge network card eth0, which is used for north-south communication, and the other is Weave attached virtual network card ethwe0, which is used for container cross-host communication.

Also check the route of the container:

# docker exec -t -i busybox-node-$NODE ip r
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 scope link src 172.18.0.2
172.111.222.0/24 dev ethwe0 scope link src 172.111.222.128
224.0.0.0/4 dev ethwe0 scope link

224.0.0.0/4 is a multicast range, so we can see that Weave supports multicast. Refer to Container Multicast Networking: Docker & Kubernetes | Weaveworks.

Here we only look at ethwe0. The VETH peer of the first container's ethwe0 has ifindex 14:

# ./find_links.sh 14
default:
14: vethwl816281577@if13: mtu 1376 qdisc noqueue master weave state UP mode DEFAULT group default link/ether de:12:50:59:f0:d9 brd ff:ff:ff:ff:ff:ff link-netnsid 0

It can be seen that the opposite end of ethwe0 is under default namespace, and the name is vethwl816281577. The virtual Nic is bridged to weave bridge:

# brctl show weave
bridge name    bridge id            STP enabled    interfaces
weave          8000.d2939d07704b    no             vethwe-bridge
                                                   vethwl816281577

In addition to vethwl816281577, there is also vethwe-bridge under weave bridge:

# ip link show vethwe-bridge
9: vethwe-bridge@vethwe-datapath: mtu 1376 qdisc noqueue master weave state UP mode DEFAULT group default
    link/ether 0e:ee:97:bd:f6:25 brd ff:ff:ff:ff:ff:ff

It can be seen that vethwe-bridge and vethwe-datapath are a VETH pair, so let's look at the peer vethwe-datapath:

# ip -d link show vethwe-datapath
8: vethwe-datapath@vethwe-bridge: mtu 1376 qdisc noqueue master datapath state UP mode DEFAULT group default
    link/ether f6:74:e9:0b:30:6d brd ff:ff:ff:ff:ff:ff promiscuity 1
    veth
    openvswitch_slave addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

The master of vethwe-datapath is datapath, and the openvswitch_slave attribute suggests that datapath is an openvswitch bridge, with vethwe-datapath attached to it as one of its ports.

To verify, check through ovs-vsctl:

# ovs-vsctl show
96548648-a6df-4182-98da-541229ef7b63
    ovs_version: "2.9.2"

With ovs-vsctl, however, no datapath bridge shows up. The official documentation (Fastdp: how it works) explains that, to improve network performance, Weave does not go through user-space OVS but manipulates the kernel datapath directly. Use the ovs-dpctl command to view the kernel datapath:

# ovs-dpctl show
system@datapath:
  lookups: hit:109 missed:1508 lost:3
  flows: 1
  masks: hit:1377 total:1 hit/pkt:0.85
  port 0: datapath (internal)
  port 1: vethwe-datapath
  port 2: vxlan-6784 (vxlan: packet_type=ptap)

It can be seen that datapath behaves like an OVS bridge device responsible for data exchange, and it contains three ports:

Port 0: datapath (internal)

Port 1: vethwe-datapath

Port 2: vxlan-6784

In addition to vethwe-datapath, there is also a vxlan-6784, which is known by its name as a vxlan:

# ip -d link show vxlan-6784
10: vxlan-6784: mtu 65535 qdisc noqueue master datapath state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether d2:21:db:c1:9b:28 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 0 srcport 0 0 dstport 6784 nolearning ttl inherit ageing 300 udpcsum noudp6zerocsumtx udp6zerocsumrx external
    openvswitch_slave addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Finally, the network traffic diagram of Weave is as follows:

4. Simple and elegant Flannel

4.1 A brief introduction to Flannel

Flannel network is one of the most mainstream container networks at present, which supports both overlay (such as vxlan) and routing (such as host-gw).

Flannel differs from Weave and the Docker native overlay network in that, with those, all nodes share one subnet, whereas Flannel is usually initialized with a 16-bit network and then assigns a separate 24-bit subnet to each Node. Because the Nodes are all in different subnets, cross-node communication is essentially layer-3 communication, so there is no layer-2 ARP broadcast problem.

In addition, I think the reason why Flannel is considered very simple and elegant is that unlike Weave and Docker Overlay networks, which need to add a network card specifically for Overlay network communication inside the container, Flannel uses Docker's most native bridge network, which hardly changes the original Docker network model except for the need to configure subnet (bip) for each Node.

4.2 Flannel Overlay network

Let's first take the Flannel Overlay network model as an example. The host IPs of the three nodes and the subnets assigned by Flannel are as follows:

Node name    Host IP          Assigned subnet
node-1       192.168.1.68     40.15.43.0/24
node-2       192.168.1.254    40.15.26.0/24
node-3       192.168.1.245    40.15.56.0/24

Create a busybox container in three Node environments with integrated Flannel networks:

docker run -d --name busybox busybox:latest sleep 36000

The container list is as follows:

Node name    Host IP          Container IP
node-1       192.168.1.68     40.15.43.2/24
node-2       192.168.1.254    40.15.26.2/24
node-3       192.168.1.245    40.15.56.2/24

View the network devices of the container namespace:

# ./docker_netns.sh busybox ip -d -c link
416: eth0@if417: mtu 8951 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:28:0f:2b:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Just like the Docker bridge network, the container has only one NIC, eth0. eth0 is a veth device, and the ifindex of its peer is 417.

Let's take a look at the link information for ifindex 417:

# ./find_links.sh 417
default:
417: veth2cfe340@if416: mtu 8951 qdisc noqueue master docker0 state UP mode DEFAULT group default link/ether 26:bd:de:86:21:78 brd ff:ff:ff:ff:ff:ff link-netnsid 0

It can be seen that ifindex 417 is under default namespace, the name is veth2cfe340 and the master is docker0, so it is hung under the bridge of docker0.

# brctl show
bridge name        bridge id            STP enabled    interfaces
docker0            8000.0242d6f8613e    no             veth2cfe340
                                                       vethd1fae9d
docker_gwbridge    8000.024257f32054    no

It's no different from Docker's native bridge network, so how does it solve cross-host communication?

Cross-host communication must use either Overlay tunnel encapsulation or static routing. Since docker0 shows no trace of an overlay, it must be achieved through routing.

Take a look at local routes as follows:

# ip r
default via 192.168.1.1 dev eth0 proto dhcp src 192.168.1.68 metric 100
40.15.26.0/24 via 40.15.26.0 dev flannel.1 onlink
40.15.43.0/24 dev docker0 proto kernel scope link src 40.15.43.1
40.15.56.0/24 via 40.15.56.0 dev flannel.1 onlink
...

We only care about the routes starting with 40.15, ignoring the rest. All of them are forwarded to flannel.1 except 40.15.43.0/24, which goes directly through docker0. 40.15.43.0/24 is the subnet of the local docker0, so containers on the same host communicate directly through docker0.

Let's look at the device type of flannel.1:

413: flannel.1: mtu 8951 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 0e:08:23:57:14:9a brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 1 local 192.168.1.68 dev eth0 srcport 0 0 dstport 8472 nolearning ttl inherit ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

It can be seen that flannel.1 is a Linux Vxlan device, where .1 is the VNI value, and the default is 1 if not specified.

Since ARP proxying is not needed here, the proxy parameter is absent. And because container communication within a node happens inside one subnet, ARP can be learned directly by the containers themselves without the Vxlan device having to learn anything, hence the nolearning parameter.

How does flannel.1 know the peer VTEP address? We still take a look at the forwarding table fdb:

# bridge fdb | grep flannel.1
4e:55:ee:0a:90:38 dev flannel.1 dst 192.168.1.245 self permanent
da:17:1b:07:d3:70 dev flannel.1 dst 192.168.1.254 self permanent

Among them, 192.168.1.245 and 192.168.1.254 happen to be the IP of the other two Node, that is, the VTEP address, while 4e:55:ee:0a:90:38 and da:17:1b:07:d3:70 are the MAC addresses of the peer flannel.1 devices, and since they are permanent tables, it can be inferred that they were added statically by flannel, and this information can obviously be obtained from etcd:

# for subnet in $(etcdctl ls /coreos.com/network/subnets); do etcdctl get $subnet; done
{"PublicIP":"192.168.1.68","BackendType":"vxlan","BackendData":{"VtepMAC":"0e:08:23:57:14:9a"}}
{"PublicIP":"192.168.1.254","BackendType":"vxlan","BackendData":{"VtepMAC":"da:17:1b:07:d3:70"}}
{"PublicIP":"192.168.1.245","BackendType":"vxlan","BackendData":{"VtepMAC":"4e:55:ee:0a:90:38"}}

Therefore, the principle of Flannel's Overlay network implementation is simplified as shown in the figure:

It can be seen that Flannel only needs to adjust the static routes and fdb entries when a Node is added or removed. Creating or deleting containers requires no intervention from Flannel at all; in fact, Flannel does not even need to know whether containers are created or deleted.

4.3 Flannel host-gw network

As described earlier, Flannel implements cross-host communication through Vxlan. In fact, Flannel supports different backend, in which backend type is specified as host-gw to support container cross-host communication through static routes. In this case, each Node is equivalent to a router and acts as the gateway of the container and is responsible for routing and forwarding the container.

It should be noted that if you use AWS EC2 with a Flannel host-gw network, you need to disable the source/dest check (MAC address spoofing check) on the instances, as shown in the figure:

Using OpenStack, it is best to disable the port security feature of Neutron.

Similarly, we create the busybox container on each of the three nodes, and the result is as follows:

Node name    Host IP          Container IP
node-1       192.168.1.68     40.15.43.2/24
node-2       192.168.1.254    40.15.26.2/24
node-3       192.168.1.245    40.15.56.2/24

Let's look at the local route for 192.168.1.68:

# ip r
default via 192.168.1.1 dev eth0 proto dhcp src 192.168.1.68 metric 100
40.15.26.0/24 via 192.168.1.254 dev eth0
40.15.43.0/24 dev docker0 proto kernel scope link src 40.15.43.1
40.15.56.0/24 via 192.168.1.245 dev eth0
...

We only care about the 40.15 routes. The next hop of 40.15.26.0/24 is 192.168.1.254, which is exactly node-2's IP; 40.15.43.0/24 goes through the local docker0 because it is this node's own subnet; and the next hop of 40.15.56.0/24 is 192.168.1.245, which is node-3's IP. It can be seen that Flannel implements cross-host container communication by configuring static routes, with each Node acting as a router.
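A rough sketch of what flanneld with the host-gw backend effectively programs on node-1 (using the subnets above; flanneld adds these routes itself, the commands are only illustrative):

# route node-2's container subnet via node-2's host IP
ip route add 40.15.26.0/24 via 192.168.1.254 dev eth0
# route node-3's container subnet via node-3's host IP
ip route add 40.15.56.0/24 via 192.168.1.245 dev eth0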

The host-gw approach performs better than Overlay because there is no vxlan encapsulation/decapsulation overhead; packets are simply routed. However, precisely because it relies on routing with each Node acting as the container gateway, all Nodes must be in the same layer-2 subnet; across subnets the link layer is not directly reachable, routing breaks down, and host-gw cannot work.

4.4 Flannel uses cloud platform routing to achieve cross-host communication

The host-gw backend described earlier achieves cross-host container communication by modifying the hosts' routing tables. Of course, modifying the routes on the hosts' gateway instead would also work, especially when combined with SDN.

At present, many cloud platforms implement custom routing table features, such as OpenStack and AWS. Flannel uses these features to implement VPC backends for many public clouds, directly calling the cloud platform API to modify routing tables and thus achieve cross-host container communication; Alibaba Cloud, AWS and Google Cloud are supported, for example. Unfortunately, an official OpenStack Neutron backend has not been implemented yet.

Taking AWS as an example, the following four EC2 virtual machines are created:

Node-1: 192.168.1.68/24

Node-2: 192.168.1.254/24

Node-3: 192.168.1.245/24

Node-4: 192.168.0.33/24

Notice that the fourth one is not on the same subnet as the other three.

All of the EC2 instances are associated with flannel-role; flannel-role is attached to flannel-policy, whose permissions are as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:CreateRoute",
                "ec2:DeleteRoute",
                "ec2:ModifyInstanceAttribute",
                "ec2:DescribeRouteTables",
                "ec2:ReplaceRoute"
            ],
            "Resource": "*"
        }
    ]
}

That is, the EC2 instance needs to have permissions such as modifying the routing table.

It always puzzled me how an AWS role is linked to an EC2 virtual machine, in other words, how the AWS API can be called directly without configuring authentication information such as a Key and Secret. From the --debug output of awscli, you can see that awscli first obtains the role information from the metadata service and then obtains the role's Key and Secret:

About how AWS knows which EC2 instance is calling metadata, you can refer to the previous article on how to get metadata by OpenStack virtual machine.

In addition, the Source/Dest Check (MAC address spoofing check) is disabled for all EC2 instances, the security group allows the flannel IP range 40.15.0.0/16 to pass, and the following iptables rules are added:

iptables -I FORWARD --dest 40.15.0.0/16 -j ACCEPT
iptables -I FORWARD --src 40.15.0.0/16 -j ACCEPT

The flannel configuration is as follows:

# etcdctl get /coreos.com/network/config | jq .
{
  "Network": "40.15.0.0/16",
  "Backend": {
    "Type": "aws-vpc"
  }
}

Start flannel and automatically assign 24-bit subnets to each Node. The network segment is as follows:

Node name    Host IP          Assigned subnet
node-1       192.168.1.68     40.15.16.0/24
node-2       192.168.1.254    40.15.64.0/24
node-3       192.168.1.245    40.15.13.0/24
node-4       192.168.0.33     40.15.83.0/24

Let's view the routing tables associated with node-1, node-2 and node-3 as shown in the figure:

The routing table associated with node-4 is shown in the figure:

Thus, for each additional Flannel node, Flannel will call AWS API to add a record to the routing table associated with the subnet of the EC2 instance. Destination is the Flannel subnet assigned to the node, and Target is the primary network card of the EC2 instance.

Create a busybox container on each of the four nodes. The container IP is as follows:

Node name    Host IP          Container IP
node-1       192.168.1.68     40.15.16.2/24
node-2       192.168.1.254    40.15.64.2/24
node-3       192.168.1.245    40.15.13.2/24
node-4       192.168.0.33     40.15.83.2/24

Containers for all nodes ping node-4, as shown in the figure:

We find that all nodes can ping the node-4 container. However, the node-4 container cannot ping the other containers:

This is because each Node by default only adds the route record for its own subnet. Node-4 has no routing information for node-1 ~ node-3, so they are unreachable from it.

Some people may ask: node-1 ~ node-3 do not have a route to node-4 either, so why can they ping node-4's container? This is because, in my environment, the routing table associated with the node-1 ~ node-3 subnet points to a NAT gateway, node-4's subnet uses the Internet gateway, and the NAT gateway happens to sit in the subnet associated with node-4's routing table. So although node-1 ~ node-3 cannot find node-4's route in their own routing table, the traffic reaches the NAT gateway, and the routing table of the NAT gateway's subnet does contain node-4's route, so the ping succeeds; node-4, on the other hand, cannot find the routes of node-1 ~ node-3.

The above is only the default behavior. Flannel can be told, through the RouteTableID parameter, which routing tables a Node should update; we only need to list the routing tables of the two subnets:

# etcdctl get /coreos.com/network/config | jq .
{
  "Network": "40.15.0.0/16",
  "Backend": {
    "Type": "aws-vpc",
    "RouteTableID": [
      "rtb-0686cdc9012674692",
      "rtb-054dfd5f3e47102ae"
    ]
  }
}

Restart the Flannel service and view the two routing tables again:

We found that Flannel subnet routes for node1 ~ node4 were added to both routing tables.

At this point, the containers of the four nodes can ping each other.

We find that aws-vpc solves the cross-subnet problem that host-gw cannot handle, and the Flannel documentation also recommends using aws-vpc instead of an overlay on AWS for better performance:

When running within an Amazon VPC, we recommend using the aws-vpc backend which, instead of using encapsulation, manipulates IP routes to achieve maximum performance. Because of this, a separate flannel interface is not created.

The biggest advantage of using flannel AWS-VPC backend is that the AWS knows about that IP. That makes it possible to set up ELB to route directly to that container.

In addition, because the route is added to the host gateway, as long as the routing table is associated, the EC2 instance can directly ping the container from the outside. In other words, EC2 virtual machines on the same subnet can directly communicate with the container.

It should be noted, however, that an AWS routing table supports up to 50 routing rules by default, which limits the number of Flannel nodes; it is not clear whether AWS can raise this quota. In addition, the latest Flannel version, v0.10.0, seems to have a problem with aws-vpc support; it is recommended to use Flannel v0.8.0 until the problem is officially fixed.

5. Calico, the one with the most cool tech

5.1 Calico environment configuration

Calico is similar to Flannel host-gw in that it achieves cross-host communication through routing, except that Flannel adds static host routes one by one through the flanneld process, whereas Calico distributes and learns routing rules between nodes through BGP.

The implementation principle of BGP is not described in detail here, but only how the container communicates.

A 3-node calico cluster is created, and the ip pool configuration is as follows:

# calicoctl get ipPool -o yaml
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 197.19.0.0/16
  spec:
    ipip:
      enabled: true
      mode: cross-subnet
    nat-outgoing: true
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: fd80:24e2:f998:72d6::/64
  spec: {}

The ip assigned by Calico is as follows:

# for host in $(etcdctl --endpoints $ENDPOINTS ls /calico/ipam/v2/host/); do
>     etcdctl --endpoints $ENDPOINTS ls $host/ipv4/block | awk -F '/' '{sub(/-/, "/", $NF)}{print $6, $NF}'
> done | sort
int32bit-docker-1 197.19.38.128/26
int32bit-docker-2 197.19.186.192/26
int32bit-docker-3 197.19.26.0/26

It can be seen that Calico, like Flannel, assigns a subnet to each node, except that Flannel is divided into 24-bit subnets by default, while Calico is divided into 26-bit subnets.

Create a busybox container with three nodes:

Node name    Host IP          Container IP
node-1       192.168.1.68     197.19.38.136
node-2       192.168.1.254    197.19.186.197
node-3       192.168.1.245    197.19.26.5

It's no problem to communicate with each other by ping.

5.2 Calico container internal network

Let's look at the link devices and routes of the container:

# ./docker_netns.sh busybox ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
14: cali0@if15: mtu 1500 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 197.19.38.136/32 brd 197.19.38.136 scope global cali0
       valid_lft forever preferred_lft forever

# ./docker_netns.sh busybox ip r
default via 169.254.1.1 dev cali0
169.254.1.1 dev cali0 scope link

A few things here strike me as amazing:

The MAC address of all containers is ee:ee:ee:ee:ee:ee

The gateway address is 169.254.1.1, but I searched all the namespaces and couldn't find the IP.

These two questions are recorded in Calico's official faq # 1 Why do all cali* interfaces have the MAC address ee:ee:ee:ee:ee:ee? and # 2 Why can't I see the 169.254.1.1 address mentioned above on my host?.

For the first question, the Calico maintainers note that not every kernel reliably auto-assigns MAC addresses, so Calico simply sets the MAC itself. Since Calico communicates entirely through layer-3 routing, the MAC address does not matter, so ee:ee:ee:ee:ee:ee is used everywhere.

For the second question, recall the previous network models: most of them connect the container NIC through a VETH pair to a bridge device, and that bridge usually acts as the container gateway, which means an extra virtual interface and IP configured on the host. Calico believes the container network should not affect the host network, so the host-side end of the container's VETH is placed directly in the default namespace without attaching to any bridge. The container's gateway is in fact fake: the gateway behavior is simulated through proxy_arp, which answers ARP with the host-side MAC address, so it does not matter what the gateway IP is. Calico therefore simply picks a link-local address, which also saves one IP in the container network. We can capture packets to see the ARP exchange:

You can see that calia2656637189, the host-side peer of the container NIC, replies to the ARP request directly, so packets leaving the container toward the gateway carry the destination MAC 06:66:26:8e:b2:67, i.e. the MAC address of the pseudo gateway.

One might ask: what about containers on the same host communicating with each other? They would seem to be in the same subnet, and all containers have the same MAC address, so how can layer-2 communication work? A closer look at the IP mask configured in the container shows that it is 32 bits, which means the container is not in the same subnet as anything; there is no direct layer-2 link communication at all.
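A heavily simplified, hand-made sketch of this trick (hypothetical veth names and netns handle; the real setup is done by Calico's Felix and also involves IP forwarding and a few more sysctls):

# host side: create a veth pair and move one end into the container namespace
ip link add cali0 type veth peer name caliXYZ
ip link set cali0 netns mycontainer
# container side: a /32 address and a default route via a link-local "fake" gateway
ip netns exec mycontainer ip addr add 197.19.38.136/32 dev cali0
ip netns exec mycontainer ip link set cali0 up
ip netns exec mycontainer ip route add 169.254.1.1 dev cali0 scope link
ip netns exec mycontainer ip route add default via 169.254.1.1 dev cali0
# host side: answer ARP for the fake gateway and route the /32 straight to the veth peer
echo 1 > /proc/sys/net/ipv4/conf/caliXYZ/proxy_arp
ip link set caliXYZ up
ip route add 197.19.38.136/32 dev caliXYZ scope link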

5.3 Calico host routing

It was mentioned earlier that Calico communicates across hosts through BGP dynamic routing. We look at the host routes as follows, where 197.19.38.139 and 197.19.38.140 are the two container IP on this machine:

# ip r | grep 197.19
197.19.26.0/26 via 192.168.1.245 dev eth0 proto bird
blackhole 197.19.38.128/26 proto bird
197.19.38.139 dev calia2656637189 scope link
197.19.38.140 dev calie889861df72 scope link
197.19.186.192/26 via 192.168.1.254 dev eth0 proto bird

We find that cross-host communication works exactly like Flannel host-gw: the next hop points directly at the host IP, with the host acting as the container's gateway. The difference is what happens after the packet reaches the host: Flannel routes the traffic to a bridge device, which forwards it to the container, whereas Calico generates a specific route for each container IP, pointing directly at the host-side peer of the container's NIC. With a large number of containers, the number of host routing rules therefore keeps growing, which is why route reflectors exist; that topic is not discussed here.

There is also a blackhole route: if a destination IP falls within the container subnet 197.19.38.128/26 assigned to this host but does not belong to any existing container, it is considered invalid and the packet is dropped directly.
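For illustration (mirroring the entry above, normally added by BIRD/Calico rather than by hand), such a route can be created like this; anything in the /26 not covered by a more specific per-container /32 route is then silently dropped:

ip route add blackhole 197.19.38.128/26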

5.4 Calico multiple network support

Multiple Calico networks can be created simultaneously on the same cluster:

# docker network ls | grep calico
ad7ca8babf01    calico-net-1    calico    global
5eaf3984f69d    calico-net-2    calico    global

We use another Calico network calico-net-2 to create a container:

docker run -d --name busybox-3 --net calico-net-2 busybox sleep 36000

# docker exec busybox-3 ip a
1: lo: mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: mtu 1480 qdisc noop qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
24: cali0@if25: mtu 1500 qdisc noqueue
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff
    inet 197.19.38.141/32 brd 197.19.38.141 scope global cali0
       valid_lft forever preferred_lft forever

# ip r | grep 197.19
197.19.26.0/26 via 192.168.1.245 dev eth0 proto bird
blackhole 197.19.38.128/26 proto bird
197.19.38.139 dev calia2656637189 scope link
197.19.38.140 dev calie889861df72 scope link
197.19.38.141 dev calib12b038e611 scope link
197.19.186.192/26 via 192.168.1.254 dev eth0 proto bird

We find that, on the same host, containers belonging to different networks are assigned IPs from the same subnet. Does that mean they can communicate with each other?

Testing shows that although the two containers in different networks are assigned IPs from the same subnet, they are in fact isolated from each other.

With an overlay network such as vxlan, it is easy to guess how isolation is achieved: simply use a different VNI. But Calico does not use an overlay; it routes directly, and the subnets of different networks overlap, so how does it achieve isolation?

To achieve isolation on the same subnet, we guess that the only way to achieve it is logical isolation, that is, through a local firewall such as iptables.

Looking at the iptables rules generated by Calico, they are quite complex, with all kinds of packet marks. Since the decision to accept or drop packets is usually implemented in the filter table, and packets that are not destined for the host itself go through the FORWARD chain, we look directly at the FORWARD chain of the filter table.

# iptables-save -t filter | grep -- '-A FORWARD'
-A FORWARD -m comment --comment "cali:wUHhoiAYhphO9Mso" -j cali-FORWARD
...

Calico attaches the cali-FORWARD subchain to the FORWARD chain. The string in the comment, cali:wUHhoiAYhphO9Mso, looks random; I do not know what it means.

# iptables-save -t filter | grep -- '-A cali-FORWARD'
-A cali-FORWARD -i cali+ -m comment --comment "cali:X3vB2lGcBrfkYquC" -j cali-from-wl-dispatch
-A cali-FORWARD -o cali+ -m comment --comment "cali:UtJ9FnhBnFbyQMvU" -j cali-to-wl-dispatch
-A cali-FORWARD -i cali+ -m comment --comment "cali:Tt19HcSdA5YIGSsw" -j ACCEPT
-A cali-FORWARD -o cali+ -m comment --comment "cali:9LzfFCvnpC5_MYXm" -j ACCEPT
...

cali+ matches all network interfaces prefixed with cali, i.e. the host-side peers of the container NICs. Since we only care about traffic going toward the container, i.e. traffic leaving through a caliXXX interface, we only look at the rules matching -o cali+. From the above, all traffic going out through cali+ is redirected to the cali-to-wl-dispatch subchain, where wl is short for workload, and a workload is a container.

# iptables-save -t filter | grep -- '-A cali-to-wl-dispatch'
-A cali-to-wl-dispatch -o calia2656637189 -m comment --comment "cali:TFwr8sfMnFH3BUla" -g cali-tw-calia2656637189
-A cali-to-wl-dispatch -o calib12b038e611 -m comment --comment "cali:ZbRb0ozg-GGeUfRA" -g cali-tw-calib12b038e611
-A cali-to-wl-dispatch -o calie889861df72 -m comment --comment "cali:5OoGv50NzX0sKdMg" -g cali-tw-calie889861df72
-A cali-to-wl-dispatch -m comment --comment "cali:RvicCiwAy9cIEAKA" -m comment --comment "Unknown interface" -j DROP

As the name suggests, cali-to-wl-dispatch is responsible for dispatching traffic: it sends traffic to a specific processing subchain according to its egress interface, traffic out of interface X to cali-tw-X, out of Y to cali-tw-Y, and so on, where tw is short for "to workload".

Suppose the traffic is destined for the busybox container 197.19.38.139, whose host-side virtual device is calia2656637189; the subchain jumped to is then cali-tw-calia2656637189:

# iptables-save -t filter | grep -- '-A cali-tw-calia2656637189'
-A cali-tw-calia2656637189 -m comment --comment "cali:259EHpBvnovN8_q6" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A cali-tw-calia2656637189 -m comment --comment "cali:YLokMEiVkZggfg9R" -m conntrack --ctstate INVALID -j DROP
-A cali-tw-calia2656637189 -m comment --comment "cali:pp8a6fGxqaALtRK5" -j MARK --set-xmark 0x0/0x1000000
-A cali-tw-calia2656637189 -m comment --comment "cali:bgw2sCtlIfZjhXLA" -j cali-pri-calico-net-1
-A cali-tw-calia2656637189 -m comment --comment "cali:1Z2NvhoS27pP03Ll" -m comment --comment "Return if profile accepted" -m mark --mark 0x1000000/0x1000000 -j RETURN
-A cali-tw-calia2656637189 -m comment --comment "cali:mPb8hORsTXeVt7yC" -m comment --comment "Drop if no profiles matched" -j DROP

Among them, the first and second rules have been introduced in the principle of OpenStack security group implementation, and will not be repeated.

Note that the third rule uses --set-xmark rather than --set-mark. Why not --set-mark? Because --set-mark overwrites the whole mark value. --set-xmark value/mask means X = (X & ~mask) ^ value, so --set-xmark 0x0/0x1000000 resets the 25th bit of the mark to zero and leaves the other bits unchanged.
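A quick sanity check of that formula with shell arithmetic (illustrative mark values):

# X = (X & ~mask) ^ value; with value=0x0 and mask=0x1000000, bit 25 is cleared and the rest kept
printf '0x%x\n' $(( (0x3000000 & ~0x1000000) ^ 0x0 ))   # prints 0x2000000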

I have not found the meaning of this mark bit in the official documentation, but I did find relevant information in an article on the principles, networking modes and usage of the Calico network:

Each node uses a total of three mark bits, corresponding to the mask 0x7000000:

0x1000000: the packet processing action; 1 means accept, the default 0 means reject.

0x2000000: whether the packet has passed the policy rule check; 1 means it has passed.

0x4000000: the packet source; 1 indicates it comes from a host-endpoint.

That is, bit 25 indicates the processing action for the packet: 1 means accept, 0 means reject. Rules 5 and 6 also reflect the meaning of bit 25: a packet matching 0x1000000/0x1000000 RETURNs directly, while a mismatch is DROPped.

So rule 3 clears bit 25 so that the packet can be re-evaluated. Who evaluates it? That is the purpose of rule 4: it jumps to the subchain of the network that the virtual device cali-XXX belongs to, and since calia2656637189 belongs to calico-net-1, the packet is handed to the cali-pri-calico-net-1 subchain.

Let's look at the rules of cali-pri-calico-net-1:

# iptables-save -t filter | grep -- '-A cali-pri-calico-net-1'
-A cali-pri-calico-net-1 -m comment --comment "cali:Gvse2HBGxQ9omCdo" -m set --match-set cali4-s:VFoIKKR-LOG_UuTlYqcKubo src -j MARK --set-xmark 0x1000000/0x1000000
-A cali-pri-calico-net-1 -m comment --comment "cali:0vZpvvDd_5bT7g_k" -m mark --mark 0x1000000/0x1000000 -j RETURN

The rule is simple: if the source IP is in the ipset cali4-s:VFoIKKR-LOG_UuTlYqcKubo, set bit 25 of the mark to 1 and RETURN; otherwise, if the IP is not in the ipset, the packet is DROPped (the default behavior of the calling chain is DROP).

# ipset list cali4-s:VFoIKKR-LOG_UuTlYqcKubo
Name: cali4-s:VFoIKKR-LOG_UuTlYqcKubo
Type: hash:ip
Revision: 4
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 280
References: 1
Number of entries: 4
Members:
197.19.38.143
197.19.26.7
197.19.186.199
197.19.38.144

Finally the truth is revealed: Calico achieves multi-network isolation through iptables + ipset. IPs belonging to the same network are added to the same ipset, and IPs of different networks go into different ipsets. The set module of iptables then matches against the ipset: if the source IP is in the specified ipset the packet is allowed through, otherwise it is DROPped.
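A minimal hand-made sketch of the same mechanism (hypothetical set name and much simpler rules than Calico's actual generated chains):

# one ipset per network, holding that network's container IPs
ipset create net-1-members hash:ip
ipset add net-1-members 197.19.38.143
ipset add net-1-members 197.19.26.7
# only allow traffic into a net-1 workload if the source is also a net-1 member
iptables -A FORWARD -o calia2656637189 -m set --match-set net-1-members src -j ACCEPT
iptables -A FORWARD -o calia2656637189 -j DROP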

5.5 Calico cross-segment communication

We know that Flannel host-gw does not support Node hosts on different network segments. Does Calico? To find out, I added a node-4 (192.168.0.33/24), which is obviously not on the same subnet as the other three.

Start a busybox in the new Node:

# docker run -d --name busybox-node-4 --net calico-net-1 busybox sleep 36000
# docker exec busybox-node-4 ping -c 1 -w 1 197.19.38.144
PING 197.19.38.144 (197.19.38.144): 56 data bytes
64 bytes from 197.19.38.144: seq=0 ttl=62 time=0.539 ms

--- 197.19.38.144 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.539/0.539/0.539 ms

This verifies that cross-subnet container communication works fine.

View node-1 routes:

# ip r | grep 197.19
197.19.26.0/26 via 192.168.1.245 dev eth0 proto bird
blackhole 197.19.38.128/26 proto bird
197.19.38.142 dev cali459cc263d36 scope link
197.19.38.143 dev cali6d0015b0c71 scope link
197.19.38.144 dev calic8e5fab61b1 scope link
197.19.65.128/26 via 192.168.0.33 dev tunl0 proto bird onlink
197.19.186.192/26 via 192.168.1.254 dev eth0 proto bird

Unlike the other routes, 197.19.65.128/26 goes out via tunl0:

# ip -d link show tunl0
5: tunl0@NONE: mtu 1440 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0
    ipip any remote any local any ttl inherit nopmtudisc addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
# ip -d tunnel show
tunl0: any/ip remote any local any ttl inherit nopmtudisc

It can be seen that when nodes are on different network segments, Calico transmits the traffic through an ipip tunnel, which is effectively an overlay.
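For reference, the same effect can be produced by hand roughly as follows (illustrative; Calico's ipip cross-subnet mode programs this automatically, as the route above shows):

# load the ipip module (this creates the tunl0 device) and bring it up
modprobe ipip
ip link set tunl0 up
# route the remote node's container subnet through tunl0; the onlink next hop becomes the outer IP
ip route add 197.19.65.128/26 via 192.168.0.33 dev tunl0 onlink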

Compared with Flannel host-gw, besides the difference between static routes and BGP dynamic routing, Calico also supports multiple isolated networks through iptables + ipset, and handles cross-subnet node communication through the ipip tunnel.

In addition, some workloads or Pods require a fixed IP, for example so that a Pod migrated from one node to another keeps its IP; this can leave the container IP outside the subnet assigned to that Node. Calico handles this by adding a 32-bit host route, something Flannel does not support.

Therefore, relatively speaking, Calico offers more features, but it is also much more complex than Flannel and harder to operate and maintain; untangling its pile of iptables rules alone is no small task.

6. Kuryr integrated with OpenStack network

Kuryr is a relatively new project in OpenStack, and its goal is "Bridge between container framework networking and storage models to OpenStack networking and storage abstractions.", that is, to realize the network integration of container and OpenStack. This scheme realizes the same network function and interworking as virtual machine and bare metal, such as multi-tenant, security group and so on.

The network model is basically the same as for virtual machines, except that a virtual machine is attached through a TAP device, while a container is connected to its namespace through a VETH pair:

   vm             container         whatever
    |                 |                 |
  tapX              tapY              tapZ
    |                 |                 |
    |                 |                 |
  qbrX              qbrY              qbrZ
    |                 |                 |
 -------------------------------------------
 |               br-int (OVS)              |
 -------------------------------------------
                      |
 -------------------------------------------
 |               br-tun (OVS)              |
 -------------------------------------------

Report