
How to understand the Kubernetes Container Network Model

2025-04-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to understand the Kubernetes container network model.

1. Background

Compute, storage and networking are the three basic services of the cloud era, and Kubernetes, as a new generation of infrastructure, is no exception. Of the three, networking is the hardest to master and the most prone to problems. This article briefly walks through the Kubernetes network traffic model, hoping to give beginners some ideas. First, take a look at the overall Kubernetes model:

Several addresses involved in the container network:

Node IP: the address of the physical machine.

Pod IP: the smallest deployment unit in Kubernetes is the Pod, and a Pod may contain one or more containers. Simply put, containers do not have their own separate addresses; they share the address and port range of the Pod.

Cluster IP: the IP address of a Service. It cannot be pinged from the external network, because it is a virtual IP address and no network device answers for it. Internally, iptables rules redirect traffic to a local port and then balance it across the backend Pods. It can only be accessed from inside the Kubernetes cluster.

Public IP: the IP assigned to a Service object from the Cluster IP range can only be accessed internally, which suits a Service that acts as an internal tier of an application. If the Service is a front-end service that must serve customers outside the cluster, we need to provide a public IP for it.
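
To make the distinction between these address types concrete, here is a minimal Python sketch that classifies an IP by the range it falls in. The Pod range 10.244.0.0/16 is assumed from the Pod addresses used later in this article, and the Service range 10.96.0.0/12 is a hypothetical value chosen only for illustration; a real cluster may use different ranges.

```python
import ipaddress

# Assumed example ranges: the Pod range matches the Pod addresses used later
# in this article; the Service range is a hypothetical value for illustration.
POD_CIDR = ipaddress.ip_network("10.244.0.0/16")
SERVICE_CIDR = ipaddress.ip_network("10.96.0.0/12")


def classify(address: str) -> str:
    """Return which cluster address range an IP belongs to."""
    ip = ipaddress.ip_address(address)
    if ip in POD_CIDR:
        return "pod"
    if ip in SERVICE_CIDR:
        return "cluster-ip"
    # Node IPs and public IPs live outside the two cluster-internal ranges.
    return "other"


if __name__ == "__main__":
    for addr in ("10.244.1.16", "10.96.0.10", "192.168.1.20"):
        print(addr, classify(addr))
```

Anything classified as "pod" or "cluster-ip" here is only meaningful inside the cluster, which is why a front-end Service needs an additional public address.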

2. Container network traffic model

A container network needs to handle at least the following communication scenarios:

① Communication between containers within a Pod

② Communication between Pods on the same host

③ Communication between Pods on different hosts

④ Service Cluster IP inside the cluster and external access

The implementation of each is described below.

2.1 Communication between containers in POD

Containers in a Pod can communicate with each other through "localhost". They use the same network namespace, and for the containers the hostname is the name of the Pod. All containers in a Pod share the same IP address and port space, so each container that needs to receive connections must be assigned a different port. In other words, applications in a Pod need to coordinate port usage themselves. The experiment is as follows: first, we create a Pod that contains two containers with the following container parameters:

View:

You can see that the containers share the Pod's address. Do they also share the same port resources? A simple experiment answers this: first, listen on a port in Container 1:

Then check whether the port is occupied in Container 2:

The ports are visibly shared as well. Put simply, think of a Pod as a small system and its containers as different processes in that system.

Internal implementation: the containers in a Pod actually share the same network namespace, and therefore the same IP and port space. The namespace is provided by a small container called pause. Whenever a Pod is created, a pause container is created first, and the other containers in the Pod then communicate with the outside of the Pod by sharing the pause container's network stack. All containers in the Pod therefore see the same view of the network, and the address we see inside a container, that is, the Pod address, is actually the IP address of the pause container. The overall model is as follows:

When we look at the Pod created earlier on the node, we can see the pause container:

This mode, in which a newly created container shares a network namespace with an existing container (pause) rather than with the host, is what we usually call container mode.
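
The port sharing described in this section can be imitated on any machine, since two sockets in one process also live in a single network namespace. The following Python sketch is only an analogy under that assumption (it does not talk to Kubernetes): the second listener plays the role of Container 2 trying to bind a port that Container 1 already holds.

```python
import errno
import socket


def second_bind_conflicts() -> bool:
    """Bind two listeners to one port inside a single network namespace.

    Returns True when the second bind is rejected with EADDRINUSE, which is
    what a second container in the same Pod would also observe.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("port conflict detected:", second_bind_conflicts())
```

This is why applications in the same Pod have to agree on distinct listening ports.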

2.2 Communication between Pods on the same host

Each Pod on a node has its own network namespace. How do Pods on the same host communicate? We could set up a veth pair between two Pods, but with many Pods, creating veth pairs pairwise becomes very cumbersome: with N Pods we would need N(N-1)/2 veth pairs, which scales poorly. If we instead connect the veth pairs to a centralized forwarding point, unified forwarding becomes convenient. This centralized forwarding point is what we usually call a bridge. As follows (for simplicity, pause is ignored here):
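
As a quick check of the scaling argument in the paragraph above, this small Python sketch compares the number of veth pairs needed for pairwise connections with the number needed when every Pod attaches to one bridge. The function names are hypothetical and exist only for this illustration.

```python
def full_mesh_veth_pairs(pods: int) -> int:
    """Veth pairs needed when every Pod is wired directly to every other Pod."""
    return pods * (pods - 1) // 2


def bridge_veth_pairs(pods: int) -> int:
    """Veth pairs needed when each Pod has a single link to a shared bridge."""
    return pods


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, full_mesh_veth_pairs(n), bridge_veth_pairs(n))
```

The pairwise count grows quadratically while the bridge count grows linearly, which is the reason for the centralized forwarding point.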

Taking our test environment as an example again, the created pod1 and pod2 have addresses 10.244.1.16 and 10.244.1.18 respectively, and both are located on node1.

View the network namespaces on the node:

These two namespaces correspond to the two Pods above. Query the interfaces in each namespace:

You can see that the address highlighted in red is the Pod's IP address. Now that each namespace has been matched to its Pod address, how do we identify the other end of the virtual interface in these two namespaces? A fairly intuitive method is as follows: an interface such as 3: eth0@if7 means that the local interface index is 3 and the peer interface index is 7. Let's look at the veth ports in the default namespace (the one we normally see by default):

7: veth4b416eb5@if3: this is exactly the interface with index 7, and it is the other end of the veth pair.
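
The matching rule just described (local index on one side equals the peer index on the other) can be sketched in Python as follows. The helper names are hypothetical, and the sample lines are shortened stand-ins for interface listings that use the "index: name@ifPEER" form quoted above.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches the "3: eth0@if7" prefix: local index, interface name, peer index.
LINK_RE = re.compile(r"^(\d+):\s+([^@:\s]+)@if(\d+)")


def parse_link(line: str) -> Optional[Tuple[int, str, int]]:
    """Return (local_index, name, peer_index), or None for non-veth lines."""
    match = LINK_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2), int(match.group(3))


def find_host_peer(pod_line: str, host_lines: Iterable[str]) -> Optional[str]:
    """Find the host-side interface that is the other end of a Pod's veth."""
    pod = parse_link(pod_line)
    if pod is None:
        return None
    pod_index, _, peer_index = pod
    for line in host_lines:
        host = parse_link(line)
        # Both directions must agree: host index == Pod's peer index and
        # host's peer index == Pod's own index.
        if host and host[0] == peer_index and host[2] == pod_index:
            return host[1]
    return None


if __name__ == "__main__":
    host = ["1: lo: <LOOPBACK,UP>", "7: veth4b416eb5@if3: <BROADCAST,UP>"]
    print(find_host_peer("3: eth0@if7: <BROADCAST,UP>", host))
```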

2.3 Inter-host POD communication

Put simply, there are only two schemes for connecting two endpoints on a network. One is direct underlay connectivity, where both sides need each other's routing information and that information must also exist along the underlay path. The other is the overlay scheme, which achieves connectivity through tunneling and only requires the underlay to provide reachability between hosts. The former is represented by Calico (direct mode) and Macvlan; the latter includes Overlay, OVS, Flannel and Weave. We introduce the representative Flannel and Calico plug-ins.

2.3.1 Flannel

The overall communication process is as follows:

Communication process

2.3.1.1 address assignment

When flanneld starts for the first time, it obtains the configured Pod network segment from etcd, assigns an unused address range to this node, and then creates a flannel.1 network interface (the name may differ, for example flannel1). Flannel writes the Pod subnet assigned to the node into the /run/flannel/docker file (the file name differs between K8s versions), and Docker uses the environment variables in this file to configure the docker0 bridge, so that this address range belongs to this node.

View the address range assigned by flannel to docker:

This indicates that the Pod addresses created on this node are all assigned from 10.244.1.1/24, such as the following two Pods on node1.
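
The idea of handing each node an unused slice of the Pod network can be sketched in Python as below. This is a simplified illustration, not flanneld's actual allocation code: the cluster range 10.244.0.0/16 and the /24 size are assumed from the addresses in this article, and the set of already used subnets stands in for the state that flanneld keeps in etcd.

```python
import ipaddress
from typing import Set

# Assumed Pod network; each node receives one /24 slice of it.
CLUSTER_NETWORK = ipaddress.ip_network("10.244.0.0/16")
NODE_PREFIX = 24


def allocate_subnet(used: Set[ipaddress.IPv4Network]) -> ipaddress.IPv4Network:
    """Return the first /24 of the cluster network that no node owns yet."""
    for subnet in CLUSTER_NETWORK.subnets(new_prefix=NODE_PREFIX):
        if subnet not in used:
            return subnet
    raise RuntimeError("no free node subnet left in the cluster network")


if __name__ == "__main__":
    used = {ipaddress.ip_network("10.244.0.0/24")}  # already taken by a node
    node1_subnet = allocate_subnet(used)
    print(node1_subnet)
    print(ipaddress.ip_address("10.244.1.16") in node1_subnet)
```

With one subnet already in use, the next node gets 10.244.1.0/24, and both Pod addresses from the test environment fall inside it.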

2.3.1.2 route distribution

On each host, Flannel runs a daemon called flanneld, which installs routes in the kernel routing table. The routing table of node1 is as follows:

You can see that the route for node2 matches 10.244.2.0 and that its outgoing interface is flannel.1 (the number after the interface name flannel may differ). flannel.1 is a tunnel port created by flanneld. This raises a question: how is the destination of the tunnel determined? flanneld keeps a mapping between container subnets and physical nodes, and this information is stored in etcd. The flanneld process determines the outer encapsulation of the tunnel by reading this mapping from etcd.
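
The two lookups described above (which interface a packet leaves through, and which physical node the tunnel should point at) can be sketched in Python as follows. The route entries mirror the subnets in this article, while the default route, the eth0 name and the node address 192.0.2.12 are hypothetical placeholders; the dictionary stands in for the mapping that flanneld reads from etcd.

```python
import ipaddress
from typing import Optional, Tuple

# Simplified routing table of node1: (destination network, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("10.244.1.0/24"), "cni0"),       # local Pods
    (ipaddress.ip_network("10.244.2.0/24"), "flannel.1"),  # Pods on node2
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),           # assumed default route
]

# Hypothetical subnet-to-node mapping; 192.0.2.12 is a documentation address.
SUBNET_TO_NODE = {ipaddress.ip_network("10.244.2.0/24"): "192.0.2.12"}


def lookup(dst: str) -> Optional[Tuple[ipaddress.IPv4Network, str]]:
    """Longest-prefix match of a destination against the routing table."""
    ip = ipaddress.ip_address(dst)
    matches = [(net, dev) for net, dev in ROUTES if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda route: route[0].prefixlen)


def tunnel_endpoint(dst: str) -> Optional[str]:
    """Return the outer destination IP if the packet must enter the tunnel."""
    route = lookup(dst)
    if route is None or route[1] != "flannel.1":
        return None
    return SUBNET_TO_NODE.get(route[0])


if __name__ == "__main__":
    print(lookup("10.244.2.5"), tunnel_endpoint("10.244.2.5"))
    print(lookup("10.244.1.18"), tunnel_endpoint("10.244.1.18"))
```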

2.3.1.3 data plane encapsulation

After Flannel knows the outer encapsulation address, it encapsulates the packet. The source is the node's own physical IP address, and the destination is the peer's physical IP address with UDP port 8472 (with the UDP backend, the default destination port is 8285, as mentioned below). The peer only needs to listen on that port. When the port receives a packet, the packet is handed to the flannel interface for decapsulation, and the local routing table is then queried:

You can see that the route for the destination address goes out through cni0.

Flannel supports three different backend implementations:

host-gw: requires the two hosts to be on the same network segment. Routing across segments is not supported, so it is not suitable for large-scale deployments.

UDP: not recommended unless the kernel does not support VXLAN or you are debugging; it is currently deprecated.

VXLAN: VXLAN encapsulation. Flannel uses VXLAN to build an interconnected Pod network across the nodes, using UDP port 8472 (this port must be opened, for example on public clouds such as AWS).

Let's capture packets on the node to verify this:

(Note: on Linux, the UDP destination port in Flannel's VXLAN encapsulation is 8472, whereas standard VXLAN packets are identified by destination port 4789, so you need to manually tell the analyzer to decode the traffic as VXLAN; otherwise the inner packet cannot be recognized.)
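
To show what sits between the outer UDP header and the inner frame, here is a minimal Python sketch that builds and parses the 8-byte VXLAN header (flags, reserved bits, 24-bit VNI, reserved byte) as laid out in RFC 7348. The VNI value 1 used in the example is an assumption for illustration, and the sketch handles only the VXLAN header itself, not the surrounding Ethernet, IP or UDP layers.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid value
FLANNEL_LINUX_PORT = 8472     # destination port mentioned in this article
IANA_VXLAN_PORT = 4789        # standard VXLAN destination port


def build_vxlan_header(vni: int) -> bytes:
    """Pack flags (8 bits), reserved (24), VNI (24) and reserved (8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(data: bytes) -> int:
    """Return the VNI from the first 8 bytes of a UDP payload."""
    if len(data) < 8:
        raise ValueError("VXLAN header needs at least 8 bytes")
    word0, word1 = struct.unpack("!II", data[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word1 >> 8


if __name__ == "__main__":
    header = build_vxlan_header(1)  # VNI 1 is an assumed example value
    print(header.hex(), parse_vxlan_header(header))
```

The header layout is the same on either port; only the UDP destination port differs, which is why a capture on port 8472 has to be decoded as VXLAN explicitly.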

2.3.2 Calico

Calico supports three routing modes:

Direct: packets are routed and forwarded without encapsulation.

IP-in-IP: Calico's default routing mode, with IPIP encapsulation on the data plane.

VXLAN: VXLAN encapsulation.

This article mainly covers Direct mode, which uses a software router to establish BGP sessions and advertise the container network segments, so that all nodes and network devices have routes to each other and traffic is forwarded directly over the underlay. The overall structure of the Calico implementation is as follows:

The components are:

Felix: the Calico agent. It runs on each node and sets up network information for containers: IP addresses, routing rules, iptables rules and so on.

BIRD: the BGP client. It listens for the routing information injected by Felix on the host and then advertises it to other host nodes via BGP to achieve network connectivity.

BGP Route Reflector: BGP peers can be set up in several ways. By default, BGP peers are established between nodes, which has the same problem as traditional iBGP peering: it requires n(n-1)/2 neighbor relationships. You can therefore also deploy RR reflectors (the structure in the figure above), with the nodes peering with the RR. Nodes can of course also peer with the ToR switch. For a detailed network discussion, refer to the official website:

https://docs.projectcalico.org/reference/architecture/design/l3-interconnect-fabric

calicoctl: the Calico command-line management tool.

There is no fixed standard for choosing a peering method. It needs to fit the overall network plan, as long as the container network is correctly advertised to the physical network.

The data communication process is as follows: a packet leaves the container through one end of the veth pair and reaches the virtual network interface whose name starts with cali on the host, then enters the host's network protocol stack, where the routing table is consulted for forwarding. Because the local node establishes a BGP neighbor relationship with the RR through BIRD, the local container addresses are sent to the RR and reflected to the other nodes in the network. Likewise, the network addresses of other nodes are delivered to the local node, then managed by the Felix process and installed into the routing table. Packets are forwarded normally once they match a route (in practice there are also complex iptables rules, which are not covered here).

Let's learn through a simple experiment:

The installation process is not covered here. Please refer to the official website https://www.projectcalico.org/ for installation and deployment.

The BGP configuration on the nodes is as follows:

To simplify the experiment, we use another machine running FRR as the RR (for FRR, see https://frrouting.org/). The RR configuration is as follows:

In this way, all nodes establish BGP neighbor relationships with the RR. Check the neighbor status as follows:

Let's create two new Pods, one on each of the two nodes:

By default, when the first container appears on a node, Calico assigns a subnet (with a /26 mask) to that node, and subsequent Pods on the node get their IP addresses from this subnet, which has the advantage of reducing the size of the routing tables on the nodes. Enter the container and check its routes: we find that the gateway address is 169.254.1.1.
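
The routing-table saving from per-node /26 blocks can be illustrated with a short Python sketch. The block 192.168.23.128/26 is the one that appears in the routing example of this section; the individual Pod addresses and the helper name are hypothetical.

```python
import ipaddress
from typing import Iterable, Set


def remote_routes(pod_ips: Iterable[str], prefix: int = 26) -> Set[ipaddress.IPv4Network]:
    """Collapse Pod addresses into the blocks other nodes need routes for."""
    return {ipaddress.ip_interface(f"{ip}/{prefix}").network for ip in pod_ips}


if __name__ == "__main__":
    block = ipaddress.ip_network("192.168.23.128/26")
    print(block.num_addresses)  # size of one /26 block

    # Hypothetical Pods on the same node, all inside the block above.
    pods = ["192.168.23.129", "192.168.23.130", "192.168.23.190"]
    print(remote_routes(pods))
```

Three Pods on the same node collapse into a single /26 route on every other node, instead of three host routes.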

In fact, in a Calico network the container gateway is always 169.254.1.1. This address does not exist on the actual network; it is answered by proxy ARP with the MAC address ee:ee:ee:ee:ee:ee. When we create a Pod, the system adds a virtual network interface whose name starts with cali on the corresponding node. It is the other end of the veth pair (the local end is the container's eth0), and its MAC address is the one resolved for 169.254.1.1 above.

Now that the packet has entered the default namespace, check the routing table there:

Here 192.168.23.128/26 is the address space on node2. The route is sent by BIRD on node2 to the RR, reflected by the RR to BIRD on node1, and then managed and installed into the routing table by Felix. We can confirm this further by capturing packets on node1:

At the same time, because of Calico's proxy mode, communication between different Pods on the same node is also special: it too is implemented through layer-3 forwarding. For example, the two Pod addresses on node2 each have a /32 entry in the routing table, whose next-hop interface is one end of a veth pair; the other end is the corresponding interface inside the Pod.

This differs from Flannel, which uses a bridge.

2.3.3 Summary

Here we make a simple comparison between flannel and calico from a network perspective:

Generally speaking, Calico is preferred when performance matters and network policy requirements are high; otherwise, Flannel is the better choice.

2.4 Service and external communication

The implementation of Service and external communication involves a lot of iptables forwarding principles, which are not covered in detail here for reasons of space. A brief introduction follows:

Pod and Service communication: Pods can communicate directly by IP address, but only if a Pod knows the other side's IP. In a Kubernetes cluster, Pods may be destroyed and created frequently, which means that Pod IPs are not fixed. To solve this problem, Service provides an abstraction layer for accessing Pods. No matter how the backend Pods change, the Service acts as a stable front end. Service also provides high availability and load balancing, and is responsible for forwarding requests to the correct Pod.

External communication: both Pod IPs and Service Cluster IPs are visible only inside the Kubernetes cluster. To the world outside the cluster, these IPs are private. Kubernetes provides two ways for the outside world to communicate with Pods:

NodePort: the Service is exposed on a static port of the cluster nodes, and the outside can access the Service through:

LoadBalancer: the Service is exposed through a load balancer provided by the cloud provider, which is responsible for directing the load balancer's traffic to the Service. Currently supported cloud providers include GCP, AWS, Azure and so on.
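
As a small illustration of how sequential iptables-style rules can spread Service traffic evenly over backend Pods, the Python sketch below assumes a cascade in which rule i of n matches with probability 1/(n-i) and the last rule always matches. This is a simplified model written for this article's Service discussion, not a dump of real rules from a cluster, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import List


def cascade_probabilities(backends: int) -> List[Fraction]:
    """Per-rule match probabilities: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, backends - i) for i in range(backends)]


def effective_share(backends: int) -> List[Fraction]:
    """Overall share of traffic each backend receives from the cascade."""
    shares = []
    remaining = Fraction(1)  # traffic that fell through the earlier rules
    for probability in cascade_probabilities(backends):
        shares.append(remaining * probability)
        remaining *= 1 - probability
    return shares


if __name__ == "__main__":
    print(cascade_probabilities(3))
    print(effective_share(3))
```

Although the per-rule probabilities differ, every backend ends up with the same overall share, which is the load-balancing behaviour a Service is expected to provide.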

