Kubernetes's native Service load balancing is based on Iptables, and its rule chains grow linearly with the number of Services, which severely hurts Service performance in large-scale scenarios. This article shares Huawei Cloud's exploration and practice in optimizing Kubernetes Service performance.
As a business grows, traffic peaks in different business areas usually arrive in different time windows, and handling them often requires a large amount of extra network resources as a safeguard. Today we share some optimization practices around Kubernetes Service, which is a network-related topic.
Service Mechanism of Kubernetes
First, let's take a look at the Service in Kubernetes. Before Kubernetes, once we had a container network, the most direct way to access an application was for the client to reach a backend container directly. This approach is the most intuitive, but its problems are obvious: when the application has multiple backend containers, how do we load balance? How do we keep sessions sticky? What happens when a container is rescheduled and its IP changes? How do we run the corresponding health checks? And what if we want to use a domain name as the access entry? These are exactly the problems that Kubernetes's Service was introduced to solve.
01 Kubernetes Service and Endpoints
This diagram shows the relationship between Service and several other objects. First is the Service itself, which stores the access entry information of a service (such as its IP and port) and can simply be understood as a load balancer built into Kubernetes, whose job is to balance traffic across multiple Pods.
The figure shows a Service fronting two Pods deployed by a Replication Controller. We know that an RC is associated with its Pods through a label selector, and the same is true for a Service, which matches the Pods it load balances through its Selector. There is actually another object in between, called Endpoints. Why is it needed? Because in practice a newly created Pod cannot necessarily serve traffic immediately, and a Pod may be deleted or fall into some other bad state; in all of these cases we want client requests not to be dispatched to Pods that cannot serve them. Endpoints exists to expose only the Pods that can actually provide the service. Each address in an Endpoints object also corresponds to an internal Kubernetes domain name, through which the specific Pod can be accessed directly.
Let's look at the definitions of Service and Endpoints. Note that a Service has a ClusterIP field, which can simply be understood as a virtual IP; resolving the Service's domain name normally returns this ClusterIP. It is also worth noting that Service supports port mapping, that is, the port a Service exposes does not have to be the same as the container port.
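As a minimal, hypothetical example (the name, labels, ClusterIP and ports below are illustrative, not taken from this article), a Service with a selector and a port mapping can be created like this:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: web                  # hypothetical Service name
spec:
  selector:
    app: web                 # load balance across Pods labeled app=web
  clusterIP: 10.96.0.100     # the virtual IP; normally auto-allocated if omitted
  ports:
  - port: 80                 # port exposed on the ClusterIP
    targetPort: 8080         # container port; need not equal the Service port
EOF

# Because the Service has a selector, the Endpoints object is maintained
# automatically; it lists the ready Pod IPs behind the Service:
kubectl get endpoints web -o yaml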
02 Service internal logic
Having introduced the relationship among Service, Pods and Endpoints, let's look at the internal logic of Service. The Endpoints controller watches for changes to Service and Pod objects and maintains the corresponding Endpoints information. Then, on each node, kube-proxy maintains local routing rules according to the Service and Endpoints objects.
In fact, whenever an Endpoints object changes (that is, whenever a Service or its associated Pods change state), kube-proxy refreshes the corresponding rules on every node. So this behaves more like a client-side load balancer: when a Pod accesses another service, the request is steered to its destination Pod by local routing rules before it even leaves the node.
Load balancing based on Iptables
OK, let's take a look at how the Iptables mode is implemented.
Iptables consists of two main parts: a command-line tool that lives in user space, and a kernel part, which in essence is a wrapper around the Netfilter kernel framework. A characteristic of Iptables is that it supports a rich set of operations.
This is a flowchart of how Iptables processes a network packet; you can see that each packet passes through several hook points in order. First comes PREROUTING, where a routing decision determines whether the packet is destined for a local process or for another machine. If it is for another machine, the packet goes through the FORWARD chain, another routing decision determines where to forward it, and it finally leaves through POSTROUTING. If it is destined for the local host, it comes in through the INPUT chain and is handled by the local process; the reply packet that process generates then goes out through OUTPUT and finally POSTROUTING.
01 Traffic forwarding and load balancing with Iptables
We know Iptables mainly as a firewall tool, so how does it handle traffic forwarding, load balancing and even session persistence? This is shown in the following figure:
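In terms of concrete rules, the building blocks look roughly like the stand-alone commands below (the addresses, ports and timeout are made up for illustration, and kube-proxy generates equivalent rules inside its own custom chains rather than directly in PREROUTING):

# Forwarding: DNAT rewrites the virtual IP and port to a concrete backend.
iptables -t nat -A PREROUTING -d 10.96.0.100 -p tcp --dport 80 \
  -j DNAT --to-destination 172.17.1.2:8080

# Load balancing: the statistic match picks a backend at random; traffic that
# misses the 50% rule falls through to the second backend.
iptables -t nat -A PREROUTING -d 10.96.0.101 -p tcp --dport 80 \
  -m statistic --mode random --probability 0.5 \
  -j DNAT --to-destination 172.17.1.2:8080
iptables -t nat -A PREROUTING -d 10.96.0.101 -p tcp --dport 80 \
  -j DNAT --to-destination 172.17.2.3:8080

# Session persistence: the recent match keeps a source IP on the backend it was
# first assigned to (the matching --set rule, added where the backend is first
# chosen, is omitted here for brevity).
iptables -t nat -A PREROUTING -d 10.96.0.102 -p tcp --dport 80 \
  -m recent --name backend-a --rcheck --seconds 600 --rsource \
  -j DNAT --to-destination 172.17.1.2:8080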
02 An example of how Kubernetes uses Iptables
So how does Kubernetes use Iptables to implement load balancing? Let's look at a concrete example. In Kubernetes, the Iptables chain from VIP to RIP is: PREROUTING/OUTPUT (depending on whether the traffic comes from a local process or an external machine) -> KUBE-SERVICES (the entry point of all Kubernetes custom chains) -> KUBE-SVC-XXX (the trailing hash is generated from the Service's virtual IP) -> KUBE-SEP-XXX (the trailing hash is generated from the actual IP of a backend Pod).
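Put together, the generated rules for a Service with two backend Pods look roughly like the commands below (the chain hashes, IPs and ports are invented here, and the comment and masquerade-mark rules that real kube-proxy rules carry are omitted):

# custom chains
iptables -t nat -N KUBE-SERVICES
iptables -t nat -N KUBE-SVC-EXAMPLEHASH0001
iptables -t nat -N KUBE-SEP-EXAMPLEHASH0002
iptables -t nat -N KUBE-SEP-EXAMPLEHASH0003
# entry points: all Service traffic is steered into KUBE-SERVICES
iptables -t nat -A PREROUTING -j KUBE-SERVICES
iptables -t nat -A OUTPUT -j KUBE-SERVICES
# per-Service chain, selected by the virtual IP and port
iptables -t nat -A KUBE-SERVICES -d 10.96.0.100/32 -p tcp --dport 80 \
  -j KUBE-SVC-EXAMPLEHASH0001
# per-endpoint chains, chosen with random probabilities
iptables -t nat -A KUBE-SVC-EXAMPLEHASH0001 -m statistic --mode random \
  --probability 0.5 -j KUBE-SEP-EXAMPLEHASH0002
iptables -t nat -A KUBE-SVC-EXAMPLEHASH0001 -j KUBE-SEP-EXAMPLEHASH0003
# each KUBE-SEP chain DNATs to one backend Pod
iptables -t nat -A KUBE-SEP-EXAMPLEHASH0002 -p tcp -j DNAT --to-destination 172.17.1.2:8080
iptables -t nat -A KUBE-SEP-EXAMPLEHASH0003 -p tcp -j DNAT --to-destination 172.17.2.3:8080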
Current problems in Iptables implementation
01 Problems of load balancing with Iptables
So what are the main shortcomings of Iptables for load balancing? At first we only analyzed it in principle, but once we measured it in a large-scale scenario the problems became very obvious.
The first is latency: both rule-matching latency and rule-update latency. As the example above shows, the virtual IP of each Kubernetes Service corresponds to a chain under KUBE-SERVICES. Iptables rule matching is linear, with a time complexity of O(N). Rule updates are not incremental: even adding or deleting a single rule rewrites the whole Netfilter rule table.
The second is scalability. When the system holds a large number of Iptables rules, updates become very slow. Moreover, because the full-table commit is protected by a kernel lock, concurrent updates can only wait for the lock.
Finally, there is availability. When a service is scaled out or in, the Iptables rule refresh breaks existing connections and makes the service temporarily unavailable.
02 Iptables rule matching delay
The figure above shows that Service access latency increases as the number of rules grows. In practice, however, this is acceptable, because the worst latency is about 8000 us (8 ms), which shows that the real performance bottleneck is not here.
03 Iptables rule update delay
So where is the delay in updating the rules of Iptables?
First of all, Iptables rule updates are full updates, even with --noflush (--noflush only guarantees that existing rule chains are not deleted when running iptables-restore).
In addition, kube-proxy periodically refreshes the Iptables state: it first dumps the current rules with iptables-save, updates some of them, and finally writes the whole table back to the kernel with iptables-restore. Once the number of rules reaches a certain level, this process becomes very slow.
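Sketched as shell commands, the cycle looks roughly like this (illustrative only, not kube-proxy's actual code):

# dump the current nat table
iptables-save -t nat > /tmp/kube-nat.rules
# ... edit the dump: add or remove KUBE-SVC-*/KUBE-SEP-* rules as Endpoints change ...
# commit the whole edited table back to the kernel in one shot
iptables-restore --noflush < /tmp/kube-nat.rules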
There are many reasons for such high latency, with some differences between kernel versions. The latency is also closely related to the system's current memory usage, because Iptables updates the Netfilter rule table as a whole, and allocating a large block of kernel memory (> 128 MB) in one go incurs a larger delay.
04 Iptables periodic refresh causes TPS jitter
The figure above shows that under a highly concurrent LoadRunner stress test, kube-proxy's periodic Iptables refresh disconnects backend connections and causes periodic TPS fluctuations.
K8S Scalability
This places a severe limit on the performance of the Kubernetes data plane. The community's control plane already scaled to 5000 nodes last year, while the data plane has no published figure because there is no authoritative definition for it.
We evaluated multiple scenarios and found that the number of Services can easily reach tens of thousands, so optimization is clearly necessary. Two optimization schemes came to mind first at the time:
Organize the Iptables rules into a tree structure, so that matching and rule updates become tree operations, optimizing both kinds of latency.
Use IPVS; its benefits are discussed later.
An example of organizing Iptables rules using a tree structure is as follows:
In this example, the root of the tree is a /16 prefix, its two child nodes are /24 prefixes, and each virtual IP hangs as a leaf under the tree node for its network segment. This reduces the rule-matching latency from O(N) to O(N^(1/M)), where M is the height of the tree. The cost is that the Iptables rules become more complex.
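A hedged sketch of that idea (all chain names and addresses below are invented): a first-level chain dispatches on a /16 prefix, second-level chains dispatch on /24 prefixes, and the leaf chains hold the per-VIP rules.

iptables -t nat -N SVC-10-96
iptables -t nat -N SVC-10-96-0
iptables -t nat -N SVC-10-96-1
# root: dispatch by /16 prefix
iptables -t nat -A PREROUTING -d 10.96.0.0/16 -j SVC-10-96
# second level: dispatch by /24 prefix
iptables -t nat -A SVC-10-96 -d 10.96.0.0/24 -j SVC-10-96-0
iptables -t nat -A SVC-10-96 -d 10.96.1.0/24 -j SVC-10-96-1
# leaves: per-VIP rules, for example the DNAT for one Service
iptables -t nat -A SVC-10-96-0 -d 10.96.0.100/32 -p tcp --dport 80 \
  -j DNAT --to-destination 172.17.1.2:8080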
Implementation of Service load balancing with IPVS
01 What is IPVS
A transport-layer (L4) load balancer; it is the load balancing implementation of LVS (Linux Virtual Server).
Also based on Netfilter, but it uses hash tables.
Supports TCP, UDP and SCTP, over both IPv4 and IPv6.
Supports many load balancing algorithms, such as rr, wrr, lc, wlc, sh, dh, lblc...
Supports session persistence through the persistent connection scheduling option.
02 Three forwarding modes of IPVS
IPVS has three forwarding modes: DR, tunnel and NAT.
DR mode works at L2 and forwards by MAC address, which makes it the fastest: the request is forwarded to the backend server through the IPVS director, and the response is sent back directly to the client. The drawback is that it does not support port mapping, so unfortunately this mode is ruled out.
Tunnel mode encapsulates IP packets inside IP packets. After receiving a tunneled packet, the backend server first strips the outer IP header and then sends the response directly back to the client. Tunnel mode does not support port mapping either, so it is ruled out as well.
NAT mode supports port mapping, and unlike the previous two modes it requires the return traffic to pass through the IPVS director. Note that the kernel's native IPVS only does DNAT, not SNAT.
03 Using IPVS for traffic forwarding
You only need to go through the following simple steps to use IPVS for traffic forwarding.
Bind VIP
Because IPVS's DNAT hook is registered on the INPUT chain, the kernel must first recognize the VIP as a local IP. There are at least three ways to bind the VIP:
1. Create a dummy network card and bind it, as shown below.
# ip link add dev dummy0 type dummy
# ip addr add 192.168.2.2/32 dev dummy0
2. Add the VIP directly to the local routing table.
# ip route add to local 192.168.2.2/32 dev eth0 proto kernel
3. Add an IP alias to the local network interface.
# ifconfig eth0:1 192.168.2.2 netmask 255.255.255.255 up
Create an IPVS virtual server for this virtual IP
# ipvsadm -A -t 192.168.60.200:80 -s rr -p 600
In the example above, the IPVS virtual server's address is 192.168.60.200:80, the scheduling algorithm is round robin (rr), and the session persistence timeout is 600 s.
Create a corresponding real server for this IPVS service
# ipvsadm -a -t 192.168.60.200:80 -r 172.17.1.2:80 -m
# ipvsadm -a -t 192.168.60.200:80 -r 172.17.2.3:80 -m
In the example above, two real servers, 172.17.1.2:80 and 172.17.2.3:80, are added to the IPVS virtual server 192.168.60.200:80 (the -m flag selects NAT/masquerading forwarding).
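The resulting configuration can then be listed in numeric form to verify it:
# ipvsadm -Ln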
Iptables vs. IPVS
01 Iptables vs. IPVS rule addition delay
By looking at the image above, it is easy to see:
The delay of adding an Iptables rule grows exponentially as the number of existing rules increases.
When the number of Services in the cluster reaches 20,000, the delay of adding a single rule grows from 50 us to 5 hours.
The delay of adding an IPVS rule, on the other hand, always stays within 100 us and is almost unaffected by the number of existing rules; the small variation can even be treated as measurement noise.
02 Iptables vs. IPVS network bandwidth
This is the network bandwidth measured with iperf in the two modes. You can see the difference in bandwidth between the first Service and the last Service created in Iptables mode: the bandwidth of the last Service is significantly lower than that of the first, and as the number of Services grows the gap becomes more and more obvious.
In IPVS mode, the overall bandwidth is higher than in Iptables mode. When the number of Services in the cluster reaches 25,000, the bandwidth in Iptables mode drops essentially to zero, while services in IPVS mode still keep about half of their earlier bandwidth and remain normally accessible.
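A rough way to reproduce this kind of comparison (the ClusterIPs below are placeholders and assume each Service fronts an iperf server on port 5001) is to point the iperf client at the first and the last Service's virtual IP from inside the cluster:

iperf -c 10.96.1.1 -p 5001 -t 30     # stands in for the first Service's ClusterIP
iperf -c 10.96.99.250 -p 5001 -t 30  # stands in for the last Service's ClusterIP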
03 Iptables vs. IPVS CPU/memory consumption
Clearly, IPVS's CPU and memory consumption are both far lower than those of Iptables.
Community status of this feature
This feature went Alpha in Kubernetes 1.8 and Beta in 1.9, where most of the problems were fixed; it is now fairly stable and highly recommended. The feature is mainly maintained by our Huawei Cloud K8S open source team; if you run into problems while using it, you are welcome to report them to the community or to us.
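For reference, a minimal sketch of switching kube-proxy to IPVS mode (exact configuration depends on how kube-proxy is deployed in a given cluster; the flags below are the upstream kube-proxy flags):

# make sure the IPVS kernel modules are available
for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh; do modprobe "$m"; done
# run kube-proxy with the ipvs proxy mode and a scheduler, e.g. round robin
kube-proxy --proxy-mode=ipvs --ipvs-scheduler=rr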
The cloud native era has arrived, and with Kubernetes Huawei Cloud has taken the first step in building cloud native infrastructure. There are still many challenges to tackle in practice, but I believe that with continued investment in technology they will be solved one by one.