This article looks at TKE's network load balancing based on elastic network interfaces (ENIs) connected directly to Pods: why the traditional model falls short, how the new model differs, and how to use it.
Preface
Kubernetes provides two native resources at the cluster access layer, Service and Ingress, which are responsible for layer-4 and layer-7 network access configuration respectively.
The traditional practice is to create an Ingress or a LoadBalancer-type Service bound to a Tencent Cloud load balancer to expose services externally. This routes user traffic to a NodePort on a cluster node, from which the KubeProxy component forwards it into the container network; the approach is limited in both performance and feature support.
To solve this, the TKE container team provides a new network model for self-deployed and managed clusters on Tencent Cloud: connecting the load balancer directly to Pods through ENIs, which greatly improves performance and business capabilities.
This article starts from the problems of the traditional model, compares the new model with the old one, and ends with guidelines for using the new directly connected model.
Problems and challenges of the traditional model
Performance and features
Within the cluster, KubeProxy forwards traffic arriving at a NodePort into the cluster network via NAT. This NAT forwarding brings the following problems:
NAT forwarding imposes a performance cost on each request:
The NAT operation itself carries a performance penalty.
The NAT destination may live on another node, so traffic may be forwarded across nodes within the container network.
NAT forwarding rewrites the request's source IP, so the backend cannot obtain the real client IP.
When load-balanced traffic concentrates on a few NodePorts, heavy SNAT there can exhaust source ports, producing abnormal traffic, and can cause conntrack insertion conflicts, resulting in packet loss and degraded performance.
KubeProxy's forwarding is random and cannot support session persistence.
Each NodePort also acts as an independent load balancer in its own right; because load balancing cannot converge at a single point, global load balancing is hard to achieve.
To address these problems, our earlier advice to users was mainly to use Local forwarding to avoid KubeProxy's NAT. However, because forwarding is still random, session persistence remains unsupported when multiple replicas run on one node. Moreover, Local forwarding is prone to brief service interruptions during rolling updates, which places higher demands on the business's rolling update strategy and graceful shutdown. We had good reason to look for a better solution.
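As context, here is a minimal sketch (with an illustrative Service name) of the Local forwarding configuration mentioned above: setting externalTrafficPolicy: Local makes kube-proxy forward only to Pods on the node that received the traffic, avoiding cross-node NAT and preserving the client source IP.

```yaml
# Minimal sketch of the Local forwarding workaround; the name is illustrative.
apiVersion: v1
kind: Service
metadata:
  name: nginx-service-local
spec:
  type: LoadBalancer
  # Only forward to Pods on the receiving node; no cross-node NAT,
  # and the client source IP is preserved.
  externalTrafficPolicy: Local
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
```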
Business availability
When services are accessed through NodePort, the NodePort design is highly fault-tolerant. The load balancer binds the NodePorts of all cluster nodes as its backend, and a request arriving at any node is randomly distributed to some workload in the cluster. This means that if some NodePorts or some Pods become unavailable, traffic to the service is unaffected.
With Local access, as with direct connection of the cloud load balancer backend to user Pods, a rapid rolling update may outpace the load balancer's ability to bind new Pods in time, leaving the backend pool at the service entry severely short or even empty. Keeping the access-layer load balancer in a healthy state during service updates is therefore essential to the safety and stability of rolling updates.
Control plane performance of load balancing
The load balancer's control plane interfaces include creating, deleting, and modifying layer-4 and layer-7 listeners, creating and deleting layer-7 rules, and binding backends to each listener or rule. Most of these interfaces are asynchronous, requiring the caller to poll for the result, and calls take relatively long. When a user's cluster is large, synchronizing the large number of access-layer resources puts significant latency pressure on the components.
Comparison of performance between old and new models
Pod direct connection mode has been launched in Tencent TKE, together with control plane optimizations for the load balancer. Across the whole synchronization process, we focused on optimizing batch calls and backend instance queries, the two places where remote calls are most frequent. **After the optimization, control plane performance in typical Ingress scenarios improved by about 95%-97% over the previous version.** Currently, synchronization time is spent mainly waiting on asynchronous interfaces. Two typical scenarios were measured:
A sudden increase in backend nodes (the cluster scale-out scenario)
A sudden increase in layer-7 rules (the scenario where a business is deployed to the cluster for the first time)
Beyond hard-core work like control plane performance optimization, the load balancer's ability to reach Pods in the container network directly is the most important part of the component's business capability: it avoids the performance loss of NAT forwarding as well as NAT's impact on business functionality inside the cluster. When the project started, however, direct access to the container network was not yet well supported. Therefore, in the first phase we targeted the VPC-CNI network mode, where each Pod has an elastic network interface as its entrance that the load balancer can bind directly, letting the cloud load balancer reach the container network backend without intermediaries. A solution based on cloud networking exists today, and a direct connection solution closer to the cluster network will be followed up later.
Now that direct access is possible, how can availability be guaranteed during rolling updates? We found an official feature: ReadinessGate, provided since Kubernetes 1.12, which is mainly used to control when a Pod counts as ready. By default, a Pod has the conditions PodScheduled, Initialized, and ContainersReady, and only when all of them are ready does the Pod's Ready condition pass. In cloud-native scenarios, however, a Pod's readiness may need to depend on other state. ReadinessGate provides a mechanism to add a gate to the Pod's readiness judgment that a third party can evaluate and control, associating the Pod's status with that third party.
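To make the mechanism concrete, below is a minimal sketch of a Pod spec carrying a ReadinessGate. The condition type is hypothetical, chosen for illustration; it is not necessarily the condition TKE's access-layer component registers.

```yaml
# Sketch: a Pod that is not Ready until a third-party controller sets the
# custom condition below to "True" in status.conditions.
apiVersion: v1
kind: Pod
metadata:
  name: readiness-gate-demo        # illustrative name
spec:
  readinessGates:
  - conditionType: example.com/lb-backend-bound   # hypothetical condition type
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
```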
Comparison of load-balanced traffic
Traditional NodePort model
Detailed request flow:
Request traffic enters the load balancer.
The load balancer forwards the request to the NodePort of some node.
KubeProxy performs NAT on the traffic from the NodePort, rewriting the destination address to a randomly chosen Pod (see the inspection sketch after these steps).
The request enters the container network and is routed, by Pod address, to the corresponding node.
The request arrives at the node where the Pod lives and is forwarded to the Pod.
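As a hedged illustration of the NAT step, if kube-proxy runs in iptables mode you can inspect the rules it programs on any node; the chain names below are kube-proxy's defaults.

```shell
# List the NodePort entry rules kube-proxy maintains (iptables mode).
iptables -t nat -L KUBE-NODEPORTS -n
# Each NodePort rule jumps to a per-Service KUBE-SVC-* chain, which picks
# a random per-endpoint KUBE-SEP-* chain that DNATs to a Pod IP: the
# random NAT-based forwarding described in the steps above.
iptables -t nat -L KUBE-SERVICES -n
```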
New Pod direct connection model
Detailed request flow:
Request traffic enters the load balancer.
The load balancer forwards the request directly to the elastic network interface of a Pod.
Differences between direct connection and Local access
The two access methods appear to achieve the same effect, but they still differ in the details.
Performance differs little. With Local access enabled, traffic undergoes no NAT and is not forwarded across nodes; the only addition is one extra route into the container network.
Without NAT, the source IP is obtained correctly. Session persistence, however, may still misbehave: when a node hosts multiple Pods, which Pod receives the traffic is random, and this mechanism can break session persistence (see the sketch below).
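For completeness, Kubernetes' built-in session persistence knob is sketched below (the Service name is illustrative). Note that kube-proxy keys affinity on the client IP it sees, and the cloud load balancer may still route the same client to different nodes, so this alone does not guarantee end-to-end persistence in the NodePort model.

```yaml
# Sketch: client-IP session affinity at the kube-proxy level.
apiVersion: v1
kind: Service
metadata:
  name: nginx-service-sticky       # illustrative name
spec:
  type: LoadBalancer
  selector:
    app: nginx
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800        # affinity window (3 hours)
  ports:
  - port: 80
    targetPort: 80
```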
The introduction of ReadinessGate
Two details can now be answered:
Why must the cluster version be higher than 1.12?
Why does READINESS GATES appear in the output of kubectl get pod -o wide?
This comes down to rolling updates. When a user starts a rolling update of an application, Kubernetes rolls Pods according to the update strategy. However, the signal it uses to decide that a batch of Pods has launched includes only the Pods' own status, not whether they have been configured on the load balancer and passed its health check. When access-layer components are under high load and cannot bind these Pods in time, Pods that rolled "successfully" may not actually be serving traffic, causing a service interruption. To associate rolling updates with the load balancer's backend state, the TKE access-layer component adopts ReadinessGate, the feature introduced in Kubernetes 1.12. Only after the TKE access-layer component confirms that the backend binding succeeded and the health check passed does it set the ReadinessGate condition, letting the Pod reach the Ready state and allowing the workload's rolling update to proceed.
Details of using ReadinessGate in a cluster
Kubernetes clusters provide a service registration mechanism: you register your service with the cluster as a MutatingWebhookConfigurations resource, and the cluster notifies you at the configured callback path whenever a Pod is created, letting you perform operations on the Pod before it is created. In this case, that operation adds the ReadinessGate to the Pod. The only caveat is that the callback must go over HTTPS, so the standard setup requires configuring, in the MutatingWebhookConfigurations, the CA that issues the certificate, and installing the CA-issued certificate on the server.
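Below is a hedged sketch of such a registration; every name, path, and the CA bundle are illustrative placeholders, not the resources TKE actually installs.

```yaml
# Sketch: register a mutating webhook that can inject a ReadinessGate into
# Pods at creation time. All names and paths are hypothetical.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: readiness-gate-injector            # hypothetical name
webhooks:
- name: inject.readiness-gate.example.com  # hypothetical webhook name
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      namespace: kube-system
      name: readiness-gate-injector        # hypothetical Service fronting the HTTPS callback
      path: /mutate
    caBundle: "<base64-encoded-CA-certificate>"   # CA that issued the server certificate
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
```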
Disaster recovery for the ReadinessGate mechanism
The service registration or certificates in a user's cluster may be deleted by users, even though such system component resources should not be modified or destroyed. As users explore or mis-operate clusters, this kind of problem inevitably appears. The access-layer component therefore checks the integrity of these resources at startup and rebuilds them if their integrity has been broken, enhancing the robustness of the system.
Comparison of QPS and network delay
Both direct connection and NodePort are access-layer solutions for serving applications; ultimately, it is the user's deployed workload that does the work, and workload capacity directly determines QPS and similar business metrics. We therefore focused on comparative tests of network-link latency for the two access-layer schemes under low workload pressure. Direct connection trims roughly 10% off the time spent on the access-layer network link, and monitoring during the tests also showed that direct connection mode removed a large amount of traffic from the VPC network. In the test scenario, the cluster was gradually scaled from 20 nodes to 80 nodes, and network latency was measured with the wrk tool; QPS and network latency were compared between the directly connected and NodePort scenarios.
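For reference, a typical wrk invocation for this kind of measurement might look like the following; the thread count, connection count, and duration are assumptions, not the original test parameters, and the address is the example Service's external IP from later in this article.

```shell
# Illustrative latency/QPS measurement; tune -t/-c/-d for your environment.
wrk -t8 -c200 -d60s --latency http://150.158.221.31/
```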
Some design thoughts on KubeProxy
KubeProxy's shortcomings are as obvious as described above. But building on the capabilities of cloud load balancers and VPC networks, we can offer more platform-native access-layer solutions. That does not mean KubeProxy is poorly designed or ineffective: its access-layer design is highly universal and fault-tolerant, basically suitable for clusters in every business scenario, which is exactly right for an official component.
Guidelines for using the new model
Prerequisites
The Kubernetes cluster version must be higher than 1.12 (see the verification sketch after this list).
The cluster network mode must have VPC-CNI elastic network interface mode enabled.
The workload behind a directly connected Service must use the VPC-CNI elastic network interface mode.
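A quick way to sanity-check these prerequisites from kubectl, sketched under the assumption that your workload is already deployed (`<pod-name>` is a placeholder):

```shell
# Check the cluster's Kubernetes version (must be higher than 1.12).
kubectl version
# Spot-check that a workload Pod carries the VPC-CNI ENI annotation.
kubectl get pod <pod-name> -o yaml | grep tke.cloud.tencent.com/networks
```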
Console operation guide
Log in to the CCS console.
Following the console steps for creating a Service, go to the "Create Service" page and set the Service parameters according to your needs.
The key parameters need to be set as follows:
Service access method: select "Public network access" or "Private network access within VPC".
Network mode: check "Load balancer direct connection to Pod mode".
Workload binding: select "Reference workload" and, in the pop-up window, choose a backend workload in VPC-CNI mode.
Click **Create Service** to complete the creation.
Operation instructions for Kubectl
Workload example: nginx-deployment-eni.yaml
Note that tke.cloud.tencent.com/networks: tke-route-eni is declared in spec.template.metadata.annotations, which puts the workload in VPC-CNI elastic network interface mode.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment-eni
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      annotations:
        tke.cloud.tencent.com/networks: tke-route-eni
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.7.9
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
```
Service example: nginx-service-eni.yaml

> Note: `service.cloud.tencent.com/direct-access: "true"` is declared in `metadata.annotations`. When synchronizing the load balancer, the Service will use the directly connected configuration for its backend.

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.cloud.tencent.com/direct-access: "true"
  labels:
    app: nginx
  name: nginx-service-eni
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: 80-80-no
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  sessionAffinity: None
  type: LoadBalancer
```

Deploy the above to the cluster

> Note: you first need to connect to the cluster (creating one first if none exists); kubectl can be configured to connect to the cluster by referring to the help documentation.

```shell
➜ ~ kubectl apply -f nginx-deployment-eni.yaml
deployment.apps/nginx-deployment-eni created
➜ ~ kubectl apply -f nginx-service-eni.yaml
service/nginx-service-eni configured
➜ ~ kubectl get pod -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP               NODE          NOMINATED NODE   READINESS GATES
nginx-deployment-eni-bb7544db8-6ljkm   1/1     Running   0          24s   172.17.160.191   172.17.0.3    <none>           1/1
nginx-deployment-eni-bb7544db8-xqqtv   1/1     Running   0          24s   172.17.160.190   172.17.0.46   <none>           1/1
nginx-deployment-eni-bb7544db8-zk2cx   1/1     Running   0          24s   172.17.160.189   172.17.0.9    <none>           1/1
➜ ~ kubectl get service -o wide
NAME                TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE    SELECTOR
kubernetes          ClusterIP      10.187.252.1    <none>           443/TCP        6d4h   <none>
nginx-service-eni   LoadBalancer   10.187.254.62   150.158.221.31   80:32693/TCP   6d1h   app=nginx
```