
(7) Docker network and overlay cross-host communication


How do containers communicate with the host, with other containers on the same host, and with containers on other hosts? This is what the Docker network is for.

In the previous introduction, we set the ports exposed by a container through the EXPOSE instruction in the Dockerfile, and used -p in docker run to map a host port to a container port. This is only the simplest form of communication between the host and the container. Likewise, other containers can reach the container via the host's IP:PORT, but this has problems: the application has to hard-code the IP, and the container's IP changes every time it restarts. In production these should obviously be decoupled as much as possible. Let's first take a look at how the Docker network is put together.

View network settings

Starting the docker service creates docker0, a virtual bridge (a multi-port virtual switch). Starting a container produces a veth* device, which corresponds to the eth0 virtual network card inside the container. The veth* end can be understood as a port on the bridge, so it only has a MAC address and no IP address, just as a port on a layer-2 switch has no IP. The other end of this veth pair is plugged into the container as its network card, which has both an IP and a MAC (you can think of it as a network cable connecting the container to docker0; the two ends form a pair, and the interface index shown on the host side, 459 in the screenshot, points to the peer network card inside the container). The bridge then connects to other physical or virtual network cards at the kernel layer.

Let's take a look at the network-related parameters in the docker run command:

--dns=IP                      # specify the DNS server
--dns-search=DOMAIN           # specify the DNS search domain
-h HOSTNAME                   # set the container's host name
--link=container-name:alias   # link to the specified container at startup so the two containers can reach each other
-p                            # map a host port to a container port
--net=bridge                  # the default: create a separate network namespace for the container, assign a network card and IP address, and attach the container to the docker0 virtual bridge via a veth interface
--net=none                    # create an independent network namespace for the container but do no network setup; the container has no network card and no IP
--net=host                    # the container shares the host's network settings; the network information seen inside is the same as on the host, i.e. no independent network namespace is created
--net=user_defined_network    # use a network created with docker network create; containers on the same network can see each other, similar to how vmware lets you create multiple networks such as vmnet1, vmnet2, etc.
--net=container:NAME_or_ID    # the container shares the network namespace of the specified container
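For example, a minimal sketch of how some of these options are typically combined (the container names web01/web02 and the nginx/busybox images are just assumptions for illustration):

docker run -d --name web01 -p 8080:80 nginx            # default bridge network, host port 8080 mapped to container port 80
docker run -d --name web02 --net=host nginx            # shares the host's network namespace
docker run -it --net=container:web01 busybox ip addr   # shares web01's network namespace, so it sees web01's IP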

DNS configuration of the container:

The hostname and DNS settings in the container are maintained through the /etc/resolv.conf, /etc/hostname, and /etc/hosts files, as shown below:

When a container is created, its /etc/resolv.conf is by default the same as the host's; /etc/hosts by default contains only a record for the container itself; /etc/hostname records the container's own hostname. You can modify these three files directly inside the container, but the changes are lost once the container restarts.

So when running docker run, use --dns=IP to append an additional DNS server address to the container's /etc/resolv.conf file, and --hostname=HOSTNAME to specify the host name.
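A quick sketch, assuming the public 8.8.8.8 resolver and a hypothetical search domain, just to see the resulting /etc/resolv.conf and hostname:

docker run -it --dns=8.8.8.8 --dns-search=example.com --hostname=web01 busybox sh -c "cat /etc/resolv.conf; hostname"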

Access control of the container:

If you want the outside to reach a Docker container, or a Docker container to reach the external network, the following kernel parameter needs to be 1, which means IPv4 forwarding is enabled:

sysctl net.ipv4.ip_forward
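To check it and, if needed, turn it on (a sketch; the last line just makes the setting persistent across reboots):

sysctl net.ipv4.ip_forward                            # shows net.ipv4.ip_forward = 1 when forwarding is enabled
sysctl -w net.ipv4.ip_forward=1                       # enable it for the running kernel
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf    # persist the setting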

In addition, we need to pay attention to a few points:

The container can access the external network by default, but the external network cannot access the container.

Outbound access from the container to the external network goes through SNAT.

External access to the container also depends on the firewalls on the host and in the container. If access is to be allowed, add the -P or -p parameter to docker run to specify the host-to-container port mapping.

Traffic enters the host NIC and first hits the PREROUTING chain, which steers it into the DOCKER chain; there, DNAT rewrites traffic destined for the mapped host port (32768 in this example) to the container's address and port.
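You can inspect these rules yourself; a sketch, assuming a container whose port has been published with -p or -P:

iptables -t nat -nvL PREROUTING   # traffic is jumped into the DOCKER chain here
iptables -t nat -nvL DOCKER       # the DNAT rule maps the host port to the container's IP and port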

Libnetwork

This is Docker's pluggable network framework. The model is very simple and is based on three elements:

Sandbox: represents a container; it can also be understood as a network namespace

Endpoint (access point): represents an interface that a container can be attached to, and it is assigned an IP address

Network: a subnet that can connect multiple endpoints

First, the driver registers itself with the network controller; the controller creates the network, then creates endpoints on the network, and finally connects the container to an endpoint. Deletion works in reverse: first detach the container from the endpoint, then delete the endpoint, and finally delete the network. Currently four types of drivers are supported:

Null: provides no network service. A container connected to an endpoint of this type has no network connectivity.

Bridge: like the default docker0, it is implemented with the traditional Linux bridge and iptables, and the container can communicate with the host and other hosts through NAT.

Overlay: uses VXLAN tunneling to achieve cross-host communication; this is a concept from software-defined networking. SDN has many implementations besides VXLAN; another common one is GRE. These are all tunnels: GRE establishes a layer-2 point-to-point tunnel, while VXLAN can be seen as an upgrade of VLAN, because a traditional VLAN distinguishes networks by a tag limited to at most 4096 VLANs, whereas the VXLAN tag is 24 bits, which greatly increases the number of networks; in addition, it is L2 over UDP, so it can cross layer 3. This kind of SDN is widely used on public cloud platforms because of its scale and multi-tenancy.

Host: in host mode, the container uses the host's network namespace, that is, it communicates with the host's network card and IP (as the name implies, they share the same IP). However, there is no independent network protocol stack; the container competes with the host for the protocol stack, and ports already used by services on the host cannot be used in the container.

Remote: an extension type, reserved for other scenarios.

From the above description, you can see that these container network types are very similar to those in virtualization. Now let's go over the common commands.

List the networks:

docker network ls [options]   # -f driver=NAME lists only networks of the given driver type

Create a network:

docker network create [options] NETWORK-NAME
# -d                    driver type
# --gateway IP          gateway address
# --internal            prohibit external access to this network
# --ip-range IP         range of IP addresses to assign
# --subnet VALUE        set the subnet (CIDR)
# --ipam-driver STRING  IP address management plug-in type
# --ipam-opt VALUE      options for the IP address management plug-in
# --ipv6                enable IPv6 support
# --label VALUE         add label information to the network
# -o VALUE              network driver options

Look at the following example

docker network create -d bridge --gateway 172.16.200.254 --subnet 172.16.200.0/24 vmnet01

Here I created a 172.16.200.0/24 network named vmnet01 using bridge mode and specified the IP of the default gateway.

Note: if you do not specify the network driver type with -d, the default is bridge.

View internal details
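For example, the internal details of the vmnet01 network created above (subnet, gateway, connected containers) can be printed with:

docker network inspect vmnet01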

Some options for the bridge driver:

com.docker.network.bridge.name: the name of the bridge as seen in ifconfig. It is recommended to set this option, otherwise the bridge is hard to identify when there are many network cards. The default docker0 is the alias of the default bridge. Configuring an IP for docker0 is equivalent to configuring a management IP for the virtual switch, and this IP is also the container gateway address.

com.docker.network.bridge.enable_ip_masquerade: whether to enable ip masquerade. This is address masquerading, similar to SNAT but with a difference: it replaces the source IP with the IP of whichever network card actually sends the data out. A container's IP changes every time it starts, so how can we keep using the same address or domain name to reach it? Modifying iptables by hand on every container start is not workable; with ip masquerade, whenever the container's IP changes, the new IP is picked up automatically and the iptables rules are updated.

com.docker.network.bridge.enable_icc: whether the bridge allows communication between containers. The icc of the default bridge docker0 is true, so inter-container communication is allowed by default. A container created on or connected to a bridge uses the parameters configured on that bridge. If icc on the bridge is false, containers cannot talk to each other directly, though communication between containers can still be set up through container links.

com.docker.network.bridge.host_binding_ipv4: the default IP used when binding container ports. The default bridge docker0 is configured to accept traffic from all of the host's network interfaces.

com.docker.network.driver.mtu: sets the MTU of the container.

com.docker.network.bridge.default_bridge: whether this is the default bridge. Containers use the default bridge when created, unless you use the --net parameter to choose which bridge to connect to.

For example: docker network create -d bridge -o "com.docker.network.bridge.name"="XX" NETWORK-NAME

Note: if you create a bridge network without specifying a subnet, it will use the default address pools (172.16.0.0/16, then 172.17.0.0/16, 172.18.0.0/16, and so on). An IP is also set on the bridge, usually the first address of the subnet. The bridge name seen in ifconfig differs from the network name you created and looks random, which is why the -o "com.docker.network.bridge.name" option above is useful for giving it a recognizable name. In fact, creating a network with the docker command just creates a Linux bridge in the background.
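A sketch that puts these options together, using a hypothetical network vmnet02 with its bridge named br-vmnet02:

docker network create -d bridge -o "com.docker.network.bridge.name"="br-vmnet02" --subnet 172.16.201.0/24 --gateway 172.16.201.254 vmnet02
ip addr show br-vmnet02    # the bridge now appears under a recognizable name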

Connect the container to the network:

This command connects a running container to the network. When creating a container, use the --net parameter of docker run to specify the network; if it is not given, the default bridge is used.

docker network connect [options] NETWORK-NAME CONTAINER
# --ip IP                manually assign an address to the container; assigned automatically if omitted
# --alias VALUE          add an alias for the container
# --link VALUE           add a link to another container
# --link-local-ip VALUE  add a link-local address for the container

Looking at the example below, let's connect a running container to the network we created above.

docker network connect vmnet01 jspSrv01

Connecting to other networks does not affect access to the external network.

Disconnect the container from the network:

docker network disconnect [options] NETWORK-NAME CONTAINER
# -f   force the container to be disconnected from the network

In the following example, let's disconnect the container from vmnet01:

docker network disconnect -f vmnet01 jspSrv01

After disconnecting, you will find that the container still has network information; it has simply gone back to the default bridge.

Delete a network:

A network can only be deleted when it no longer has any endpoints attached.

docker network rm NETWORK-NAME

View the inside of the network:

docker network inspect [options] NETWORK-NAME
# -f STRING   format the output with the given template
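For instance, to print just the IPAM section (subnet and gateway) of the vmnet01 network created earlier:

docker network inspect -f '{{json .IPAM.Config}}' vmnet01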

Cross-host network communication:

There are several ways to communicate across hosts:

The container uses host mode and directly uses the host's IP, but ports are prone to conflicts, so the usage scenarios are limited.

Port mapping, which we have been using so far: external access is achieved through a bridge-mode network and DNAT, but it lacks flexibility.

Direct routing: you can use the default docker0 or build a new bridge, and add static routes on the Docker hosts. The problem with this solution is that, although it crosses hosts, containers on different hosts must be connected to the same bridge, which means the IP segment is the same, so it has great limitations.

Use an SDN approach, such as Flannel, or the Overlay network and Open vSwitch supported natively since Docker 1.9.

Next, we will implement this with the overlay driver that comes with Docker. Note that an overlay network has two modes, swarm mode and non-swarm mode; when used in non-swarm mode, it needs the help of a third-party service-discovery component.

The network driver type Docker needs for cross-host communication is overlay, but it also requires a key-value store for service discovery and configuration sharing, such as ZooKeeper, Doozerd, Etcd, or Consul. ZooKeeper, Doozerd, and Etcd have similar architectures and provide only raw key-value storage, requiring developers to build their own service discovery on top, whereas Consul has service discovery built in: users just register services and perform discovery through the DNS or HTTP interface, and it also includes health checking. We use Consul as the service discovery tool here. First, the environment:

Computer name    IP                                            Role
dockerothsrv     Eth0: 192.168.124.139  Eth2: 172.16.100.10    Docker private repository, Consul service
Docker01         Eth0: 192.168.124.138  Eth2: 172.16.100.20    Docker container server
Docker02         Eth0: 192.168.124.141  Eth2: 172.16.100.30    Docker container server

Create a Consul service:

You can set this up directly on the system or run it as a container; since we are talking about cross-host communication after all, we do it here by running a container, and we install it on the dockerothsrv server.

docker run -d -p 8500:8500 -h consul progrium/consul -server -bootstrap
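To confirm that Consul is up, you can query its HTTP API from any host (a sketch, assuming port 8500 is published as above and using dockerothsrv's address from the environment table):

curl http://192.168.124.139:8500/v1/catalog/nodes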

Configure the docker host:

Here you need to modify the dockerd configuration. Add the settings below on both Docker servers:

The parameters mean the following. --cluster-store points to the address of the key-value store, i.e. where this host registers itself in Consul; the data has to be stored in key-value form. --cluster-advertise is a combination of a host network interface or IP address plus a port: it is the address under which this host's dockerd instance is advertised in the Consul cluster, the value remote dockerd services use to connect to this dockerd, and the port through which the Docker01 and Docker02 servers interconnect. It can be written as IP:PORT or INTERFACE:PORT. The port is the one specified when your dockerd service runs as a daemon; here, enter the 5555 that follows tcp://0.0.0.0 above. In some documents you will see ports like 2375 or 2376, because that is what the official Docker documentation uses: dockerd listens only on a local socket by default and does not accept remote network connections, so if needed you specify IP:PORT with the -H tcp option, where 2375 is the unencrypted port and 2376 the encrypted (TLS) port.
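A sketch of what the dockerd options described above might look like in this environment (the 5555 listening port is the one mentioned in the text, and the Consul address is dockerothsrv's Eth0 from the environment table; adjust to your own setup), for example in the ExecStart line of the docker unit file:

/usr/bin/dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:5555 --cluster-store=consul://192.168.124.139:8500 --cluster-advertise=eth0:5555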

Restart the dockerd service

systemctl restart docker

Check it out
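A quick way to verify the settings took effect (a sketch; on engine versions of this era docker info reports the cluster store configuration):

ps -ef | grep dockerd    # the --cluster-store / --cluster-advertise flags should appear
docker info              # look for the Cluster Store and Cluster Advertise lines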

Create a network of the overlay driver type:

docker network create -d overlay oNet

You only need to create it on one host; it can then be seen on the other host as well. As shown below:

Description: the docker_gwbridge network is a local bridge. It is created automatically in two cases: one is when you initialize or join a swarm cluster; the other is when a container is connected to an overlay network, as here.

Test:

Create two new containers attached to the oNet network.

Docker01

Docker02
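A minimal sketch of the test containers, assuming a busybox image and the hypothetical names test01/test02:

# on Docker01
docker run -itd --name test01 --net oNet busybox
# on Docker02
docker run -itd --name test02 --net oNet busybox
# then, from test01, ping test02's overlay IP (see docker network inspect oNet)
docker exec test01 ping -c 3 <IP-of-test02>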

Check the details of the network

The two containers ping each other.

# turn off the firewall on both hosts, otherwise the ping will not get through
systemctl stop firewalld

Principles of cross-host communication:

Since we are going to examine the network namespaces next, and Docker does not create its network namespaces under /var/run/netns, we link Docker's network namespace directory there so that we can view them with the ip command:

ln -s /var/run/docker/netns/ /var/run/netns   # then view the network namespaces via ip netns

Use the following commands to check which network card in the container is paired with which network card on the host:

ip addr
docker exec aaa ethtool -S eth0   # likewise for the container's other network card (eth2)

We find that eth0 in the container, the card on the 10.0.0.0 network segment, is not paired with any virtual network card on the host, so where is its peer? Meanwhile eth2 in the container is paired with interface 496, and 496 is attached to the docker_gwbridge bridge, but that bridge is 172.18.0.0/16, so how does the 10.0.0.0 segment communicate? Let's take a look at the network namespaces.

As for which namespace it is, you have to check them one by one. The 10.0.0.0 segment is in the 4-cd7565b196 namespace; eth0 in the container is paired with interface 494 in this namespace, and 494 is connected to the br0 bridge. vxlan1 is the VXLAN tunnel endpoint (VTEP): it sits at the edge of the VXLAN network and handles encapsulation and decapsulation of VXLAN messages, including ARP request messages and ordinary VXLAN data packets. After a packet is encapsulated, it is sent to the VTEP at the other end, which receives and decapsulates it.

You can see the routing for this namespace in the following figure.
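The same routing table can also be printed from the command line, using the namespace name from the command below:

ip netns exec 4-cd7565b196 ip route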

You can see the VXLAN ID of vxlan1 with the following command. Note that each network you create gets its own separate network namespace.

ip netns exec 4-cd7565b196 ip -d link show vxlan1

The X in vxlanX differs between the two hosts, but the VXLAN ID of the vxlan devices in the same overlay network is the same. Take a look at the diagram:

The communication process is as follows:

Container 01 on host A pings 10.0.0.4. The packet leaves through the container's eth0, and the routing table says to send it to br0; br0 is equivalent to a virtual switch. If the target container is on the same host, they communicate directly through br0; if not, the packet goes through the vxlan device.

When br0 receives the request, it hands it over to vxlan1. You can run the ping now and then observe the neighbor entries with the following command:

ip netns exec <network-namespace> ip neigh

The MAC address table (learned by the docker daemon from the Consul database via the gossip protocol) is stored on the vxlan device, and the encapsulated packet is sent out through eth2 of host A.
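That forwarding table can be dumped directly from the vxlan device; a sketch using the same namespace as above, where the dst column shows the remote VTEP (host) address for each container MAC:

ip netns exec 4-cd7565b196 bridge fdb show dev vxlan1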

When the packet arrives at host B, it is recognized as a VXLAN packet and handed to the vxlan device with the same VXLAN ID.

The vxlan device on host B decapsulates the packet and hands it to br0, and br0 completes the final delivery according to its MAC table.

Shortcomings of overlay:

Since the overlay network and the host's default network are not in the same network, in order to solve communication with the host, Docker adds an extra network card for both the host and the container: docker_gwbridge on the host and eth2 in the container, and these two are in the same IP segment. This is inconvenient to use, however: one set of IPs is used between containers and another set is used to reach containers from outside.

Exposing container services externally still has to go through port binding; the outside world cannot reach the container directly by its IP.

Overlay must rely on the docker daemon and the key-value database to communicate.

The performance loss of Docker's native overlay is relatively large, so it is not recommended for production.

No matter what plan you take, you need to face some problems:

Cross-host communication (containers on different hosts should be able to communicate with each other and be accessible within the entire IDC environment)

Container drift

Cross-host container IP allocation to avoid IP conflicts

Network performance comparison:

It can be seen that Bridge loses about 10% of performance, Docker's native overlay loses the most, and Calico's overlay performance is almost as good as Bridge.

Extended knowledge:

Why do you need service discovery?

Suppose you write a program that calls a service. In the traditional setup the service runs on some other server, so you first need that server's IP and the port the service uses. Whether the server is a physical machine or a virtual machine, its IP address is relatively static; you just write this information into the code or into a configuration file the program reads. Now the situation is different, especially in cloud-computing microservice architectures using container technology: the service runs in a container, and whether the container sits on a physical machine or in a virtual machine, obtaining its address is troublesome, because the network address of the container or service is assigned dynamically. Moreover, because service capacity scales dynamically (the number of running container instances changes), static addresses cannot be used, so a new approach, service discovery, is needed to invoke the service.

There are two modes of service discovery, client-side discovery and server-side discovery; server-side discovery is more commonly used because, compared with client-side discovery, it pulls the Service Registry out as a separate component. The Service Registry is where services register: a service instance registers there and deregisters when it is no longer in use. The client sends a request to the service through a load balancer, which looks up an available service instance in the Service Registry and forwards the request to it. All service discovery tools have the Service Registry role at their core: it is a key-value database containing the network addresses of the service instances. In general, the Service Registry must be highly available and always up to date.

Registration of service discovery: self-registration and third-party registration

Self-registration: the service instance registers and deregisters itself in the Service Registry, and sends heartbeat information to the registry if necessary. The advantage is that it is simple and requires no other components; the disadvantage is that the service instance is coupled to the Service Registry, and developers must write registration code for each service.

Third-party registration: as the name implies, registration and deregistration are not done by the service instance itself but by a service registrar, which monitors and tracks running service instances and registers them automatically when it finds a service available. The advantage is decoupling: registration is handled by one unified component and developers do not have to write it separately; the disadvantage is that the deployment environment must include a registrar, and if it does not, you have to set one up yourself.

What is Overlay?

An overlay is a network built on top of another network, in which one packet or frame is encapsulated inside another packet or frame; the encapsulated packet is decapsulated when it is forwarded to the other end of the tunnel. VXLAN and GRE/NVGRE are overlay network technologies of the layer-2 overlay kind. The idea is to carry Ethernet packets over some kind of tunnel; what differs is the way or protocol used to build the tunnel. Tunnels can be layer 2 or layer 3: common layer-2 tunnels are PPTP and L2TP, and common layer-3 tunnels are GRE, IPsec, and so on.

An overlay builds a virtual network on top of the existing network, and the upper-layer applications only deal with the virtual network.
