
[Kubernetes Series] Part 11: Analysis of Network Principles (Part Two)


3. Overlay Network

An overlay network is a technique that wraps one network packet inside another for routing, forwarding, and communication between nodes. Overlay networks are not required by default, but they help in specific scenarios: when we don't have enough IP address space, when the network can't handle the extra routes, or when we need additional management features that an overlay provides. A common case is a limit on how many routes a cloud provider's route table can hold. For example, an AWS route table supports up to 50 routes before network performance is affected, so if we have more than 50 Kubernetes nodes, a route per node no longer fits and an overlay network can help.

In essence, an overlay encapsulates packets in an extra layer so they can travel across nodes over the local network. You may not want to use an overlay network, because encapsulating and decapsulating every packet adds latency and complexity. Often it is unnecessary, so we should use it only when we know why we need it.

To understand how traffic flows in an overlay network, let's take Flannel, an open-source project by CoreOS, as an example. Flannel gives containers a virtual network by assigning a subnet to each host. It is based on Linux TUN/TAP, uses UDP to encapsulate IP packets to create the overlay, and keeps track of the network allocation in etcd.
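To make this concrete, here is a sketch of where Flannel keeps that state, assuming its default etcd key and the standard subnet.env file; the CIDR values are made-up examples:

```bash
# Flannel's cluster-wide network config lives under a single etcd key
# (default path shown; CIDRs are illustrative, backend may be udp or vxlan)
etcdctl get /coreos.com/network/config
# {"Network": "10.244.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}

# Each node's flanneld leases one subnet from that range and records it locally
cat /run/flannel/subnet.env
# FLANNEL_NETWORK=10.244.0.0/16
# FLANNEL_SUBNET=10.244.1.1/24
# FLANNEL_MTU=1450
# FLANNEL_IPMASQ=true
```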


(Figure: Kubernetes node with route table; cross-node Pod-to-Pod communication over the Flannel overlay network)

Here we notice that the setup is the same as the one we saw before, except that a virtual Ethernet device called flannel0 has been added to the root netns. It is an implementation of Virtual Extensible LAN (VXLAN), but on Linux it is just another network interface.

The flow of a packet from pod1 to pod4 (on different nodes) looks like this:

1. It leaves pod1's netns through eth0 and enters the root netns through vethxxx.

2. It is then passed on to cbr0, which sends an ARP request to find the destination address.

3. Data encapsulation

* a. Since no Pod on this node has pod4's IP address, the bridge sends the packet to flannel0, because the node's route table maps the destination Pod subnet to flannel0.

* b. The flanneld daemon talks to the Kubernetes apiserver or the underlying etcd, so it knows all the Pod IPs and which node each one lives on, and it maintains a mapping (in user space) between Pod IPs and node IPs. flannel0 takes this packet and wraps it in a UDP packet whose outer header carries the corresponding node IPs as source and destination, then sends it to a specific VXLAN port (usually 8472).

(Figure: packet-in-packet encapsulation; notice the packet is encapsulated from 3c to 6b in the previous diagram)

Although this mapping happens in user space, the actual encapsulation and data flow happen in kernel space, so it is still quite fast.

* c. The encapsulated packet is sent out through eth0, since it is now routed traffic between nodes.

4. The packet leaves the node with the node IP information as the source and destination addresses.

5. The cloud provider's route table already knows how to route traffic between nodes, so the packet is sent to the destination node, node2.

6. Data decapsulation

* a. The packet arrives at node2's eth0 interface, and since the destination port is the special VXLAN port, the kernel hands the packet to flannel0.

* b. flannel0 decapsulates the packet and sends it into the root namespace. From here on, the path of the packet is the same as the non-overlay path we saw earlier in Part 1.

* c. Because IP forwarding is enabled, the kernel forwards the packet to cbr0 according to the route table.

7. The bridge gets the packet, sends an ARP request, and finds that the target IP belongs to vethyyy.

8. The packet crosses the veth pair to reach pod4.
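To observe this path on a live node, the route table and the flannel interface tell most of the story. A sketch of what to look for, assuming the VXLAN backend (which names the device flannel.1; the older UDP backend uses flannel0) and the example CIDRs from before:

```bash
# Other nodes' Pod subnets are routed into the flannel device (encapsulation),
# while the node's own Pod subnet goes straight to the local bridge
ip route
# 10.244.0.0/16 dev flannel.1   <- remote Pods: encapsulate and send to the owning node
# 10.244.1.0/24 dev cbr0        <- local Pods: plain L2 via the bridge

# VXLAN details of the flannel interface, including the UDP port (8472)
ip -d link show flannel.1
```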

This is how overlay networks in Kubernetes work, though implementations differ slightly. A common misconception is that we have to use an overlay network with Kubernetes. The truth is, it depends entirely on the specific scenario, so make sure you use one only when you really need it.

4. Dynamic cluster

Because Kubernetes (and distributed systems more generally) is constantly changing by nature, its Pods (and their IPs) change all the time. The causes range from rolling updates and scaling to unpredictable Pod or node crashes. This makes Pod IPs unsuitable for direct communication.

Enter Kubernetes Service: a virtual IP backed by a set of Pod IPs as Endpoints (identified by a label selector). It acts as a virtual load balancer whose IP stays the same while the backend Pod IPs may change constantly.

(Figure: label selector in a Kubernetes Service object)
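A minimal Service manifest makes that selector relationship concrete; the name, label, and ports below are hypothetical:

```bash
# Hypothetical Service: Pods labeled app=backend become the Endpoints
# behind the stable virtual IP (ClusterIP)
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    app: backend        # label selector that picks the backend Pods
  ports:
  - port: 80            # port exposed on the Service virtual IP
    targetPort: 8080    # port the Pods actually listen on
EOF
```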

The entire virtual IP is actually implemented as a set of iptables rules (recent versions can use IPVS instead, but that is another discussion) managed by the Kubernetes component kube-proxy. The name is misleading nowadays: it really was a proxy before v1.0, but because that implementation kept copying data between kernel space and user space, it was resource-intensive and slow. Now it is just a controller which, like many other Kubernetes controllers, watches the api server for endpoint changes and updates the iptables rules accordingly.

(Figure: iptables DNAT)

With these iptables rules, whenever a packet is destined for a Service IP, a DNAT (Destination Network Address Translation) is performed: the destination IP is changed from the Service IP to one of the Endpoints (a Pod IP) chosen at random by iptables. This spreads the load evenly across the backend Pods.
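The rules kube-proxy writes follow a recognizable pattern: a KUBE-SERVICES match on the Service IP jumps to a per-service KUBE-SVC chain, which picks one KUBE-SEP (service endpoint) chain at random and performs the DNAT there. A simplified sketch; the IPs and chain suffixes are made up:

```bash
# Dump the nat table rules kube-proxy manages (output simplified and illustrative)
iptables -t nat -S | grep KUBE
# -A KUBE-SERVICES -d 10.96.0.100/32 -p tcp --dport 80 -j KUBE-SVC-ABCDEF
# with two endpoints, the first rule matches with probability 0.5, else fall through:
# -A KUBE-SVC-ABCDEF -m statistic --mode random --probability 0.5 -j KUBE-SEP-AAAAAA
# -A KUBE-SVC-ABCDEF -j KUBE-SEP-BBBBBB
# -A KUBE-SEP-AAAAAA -p tcp -j DNAT --to-destination 10.244.1.5:8080
# -A KUBE-SEP-BBBBBB -p tcp -j DNAT --to-destination 10.244.2.7:8080
```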

(Figure: 5-tuple entries in the conntrack table)

When this DNAT happens, the translation is stored in conntrack, the Linux connection tracking table, which records the complete 5-tuple of the translated connection (protocol, srcIP, srcPort, dstIP, dstPort). When the response comes back, the kernel can reverse the DNAT (un-DNAT), changing the source IP from the Pod IP back to the Service IP. This way, the client never has to care about how the packet flow is handled behind the scenes.
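Those translated 5-tuples can be inspected directly with conntrack-tools; the entry below is an illustrative sketch using the example addresses from before:

```bash
# Each DNATed connection stores both the original tuple (to the Service IP)
# and the reply tuple (from the chosen Pod), which is what enables un-DNAT
conntrack -L | grep 10.96.0.100
# tcp 6 ... ESTABLISHED src=10.244.1.9 dst=10.96.0.100 sport=43210 dport=80 \
#   src=10.244.2.7 dst=10.244.1.9 sport=8080 dport=43210 ...
```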

So by using a Kubernetes Service we can use the same port without any conflicts (because we can remap ports to Endpoints), and service discovery becomes very easy: we can use the internal DNS and simply hard-code the service hostname, or we can use the service host and port environment variables that Kubernetes provides.

Tip: the second approach saves unnecessary DNS calls, but because environment variables are set only at Pod creation time (they do not include services created later), using DNS for service name resolution is recommended.
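Inside any Pod created after the Service, both discovery mechanisms are visible; the backend name continues the hypothetical example above:

```bash
# Kubernetes injects {SVCNAME}_SERVICE_HOST / {SVCNAME}_SERVICE_PORT at Pod
# creation time, so Services created later are missing from the environment
env | grep BACKEND_SERVICE
# BACKEND_SERVICE_HOST=10.96.0.100
# BACKEND_SERVICE_PORT=80

# The DNS name always reflects the current state, which is why it is preferred
nslookup backend.default.svc.cluster.local
```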

4.1 Outbound traffic

The Kubernetes Services we've discussed so far work within a cluster. In most practical situations, however, applications also need to reach some external API or website.

Typically, a node has both a private IP and a public IP. For Internet access, there is some kind of 1:1 NAT between the public and private IP, especially in cloud environments.

For normal traffic from a node to some external IP, the source IP is changed from the node's private IP to its public IP on outbound packets, and the reverse happens on inbound response packets. However, when a Pod opens a connection to an external IP, the source IP is the Pod IP, and the cloud provider's NAT mechanism doesn't know about that IP. It will therefore drop packets whose source IP is anything other than the node IP.

You guessed it: we use even more iptables! These rules, also added by kube-proxy, perform SNAT (Source Network Address Translation), also known as IP MASQUERADE (IP masquerading). This tells the kernel to use the IP of the network interface the packet goes out on instead of the source Pod IP, while keeping a conntrack entry so the SNAT can be reversed for replies.
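As a sketch, a simplified version of the masquerade rule (this form resembles what Flannel's --ip-masq option installs; the CIDR is the example cluster range from before):

```bash
# SNAT traffic leaving the Pod network for any destination outside it,
# rewriting the source IP to that of the outgoing (node) interface.
# conntrack keeps the mapping so replies can be un-SNATed.
iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE
```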

4.2 Inbound traffic

So far so good. Pods can talk to each other and access the Internet. But we are still missing a key piece: serving user request traffic. There are currently two main ways to do this:

* NodePort / cloud load balancer (L4: IP and port)

Setting the service type to NodePort assigns the service a nodePort from a default range of 30000-32767. This nodePort is open on every node, even nodes where no matching Pod is running. Inbound traffic on the nodePort is again sent to one of the Pods via iptables (and that Pod may well be on another node!). See the manifest sketch after this list.

The LoadBalancer service type in a cloud environment additionally creates a cloud load balancer (such as an ELB) in front of all the nodes, hitting the same nodePort.

* Ingress (L7: HTTP/TCP)

Many different tools, such as Nginx, Traefik, and HAProxy, maintain a mapping of HTTP hostnames/paths to their respective backends. Usually this is a single traffic entry point sitting behind a load balancer and nodePort; its advantage is that one entry point can handle inbound traffic for all services, instead of needing multiple nodePorts and load balancers.
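Manifest sketches for both options, continuing the hypothetical backend Service; the hostname and ports are made up, and the Ingress assumes an ingress controller (such as one of the tools named above) is already running:

```bash
kubectl apply -f - <<'EOF'
# Option 1: NodePort - the same port opens on every node
apiVersion: v1
kind: Service
metadata:
  name: backend-nodeport
spec:
  type: NodePort          # type: LoadBalancer would additionally provision e.g. an ELB
  selector:
    app: backend
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080       # must fall within the 30000-32767 range
---
# Option 2: Ingress - one L7 entry point routing by host/path to many Services
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-ingress
spec:
  rules:
  - host: backend.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: backend
            port:
              number: 80
EOF
```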

4.3 Network Policy

Think of it as security groups/ACLs for Pods. NetworkPolicy rules allow or deny traffic between Pods. The exact implementation depends on the network layer/CNI, but most of them just use iptables.
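A small NetworkPolicy sketch in the same hypothetical setup: only Pods labeled app=frontend may reach the backend Pods, and only on port 8080 (enforcement requires a CNI that implements NetworkPolicy):

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:            # the Pods this policy protects
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:        # only traffic from these Pods is allowed in
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
EOF
```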

Concluding remarks

That's it for now. In the previous part we studied the foundations of Kubernetes networking; in this part we looked at how overlay networks work, how the Service abstraction operates within a dynamic cluster and makes service discovery easy, how outbound and inbound traffic flows, and how network policies provide security within the cluster.

References

An Illustrated Guide to Kubernetes Networking, Part 1

An Illustrated Guide to Kubernetes Networking, Part 2

An Illustrated Guide to Kubernetes Networking, Part 3
