
Research on the ARP Overflow Problem in the TKE Container Network and Its Solution

2025-04-03 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article explores the ARP overflow problem in the TKE container network and its solution, covering the kernel's ARP cache aging and garbage collection mechanisms along the way.

1. Problem background

1.1 Problem description

Recently, pods in an internal customer's cluster, running in TKE VPC-CNI independent network card mode, lost connectivity with one another: an affected pod could not ping any other pod or node.

The dmesg kernel log showed the following error message: neighbour: arp_cache: neighbor table overflow! (the log screenshot was captured during a later reproduction)

The cluster is also relatively large, with about 1000 nodes and 30000 pods. It is therefore reasonable to suspect that the scale of the cluster produces too many ARP entries, which in turn causes the ARP overflow.

1.2 Terminology

TKE: Tencent Kubernetes Engine, Tencent Cloud's container service (CCS). Built on native Kubernetes, it provides a container-centric, highly scalable, high-performance container management service.

VPC-CNI mode: a container network capability that the container service TKE implements on top of CNI and VPC ENI.

Pod: the basic resource management unit of Kubernetes. A Pod has an independent network namespace and can contain multiple containers.

2. Preliminary analysis of the problem

As the error message shows, the root cause of this problem is that the ARP cache table is full. This involves the kernel's ARP cache garbage collection mechanism: when there are too many ARP entries and none of them can be reclaimed, new entries cannot be inserted.

As a result, the hardware (MAC) address of the destination cannot be resolved when a network packet is sent, so the packet cannot be sent.

So what exactly prevents new entries from being inserted? To answer this question, we need to take an in-depth look at the ARP cache aging and garbage collection mechanisms.

3. ARP cache aging recovery mechanism

3.1 ARP cache table entry state machine

The figure above shows the full life cycle of an ARP entry and its state machine.

When sending a TCP/IP network packet, the network stack needs the MAC address of the peer in order to wrap the packet in a layer-2 data unit, the frame, so that it can be transmitted on the network. For an IP address in a different broadcast domain, the peer MAC address is that of the gateway, and the sender hands the packet to the gateway for forwarding. For an IP address in the same broadcast domain, the peer MAC address is the one that belongs to that IP address.

Finding the MAC address for a given IP address is the main job of the ARP protocol. The protocol's working process is not described here. Once the MAC address for an IP address has been resolved through ARP, the mapping is stored locally for a period of time to reduce the frequency of ARP traffic and speed up packet transmission. This mapping is the ARP cache table entry. Its state machine, or full life cycle, can be described as follows:
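The choice between resolving the peer itself and resolving the gateway can be sketched in a few lines of Python. This is only an illustration: the function name `arp_target` and the addresses are hypothetical, and the sketch assumes that "same broadcast domain" can be approximated by "same IP subnet".

```python
import ipaddress


def arp_target(dst_ip: str, local_net: str, gateway_ip: str) -> str:
    """Return the IP address whose MAC the sender has to resolve via ARP.

    Assumption for this sketch: the local subnet stands in for the
    broadcast domain described in the text.
    """
    dst = ipaddress.ip_address(dst_ip)
    net = ipaddress.ip_network(local_net)
    # Same subnet: resolve the destination itself.
    # Different subnet: resolve the gateway, which forwards the packet.
    return dst_ip if dst in net else gateway_ip


# Hypothetical addresses, for illustration only.
print(arp_target("10.0.1.23", "10.0.1.0/24", "10.0.1.1"))  # same subnet
print(arp_target("10.0.9.8", "10.0.1.0/24", "10.0.1.1"))   # via gateway
```

In a flat network where many pods share one subnet, almost every peer falls into the first case, so each peer a pod talks to needs its own ARP entry.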

Initially, when a network packet is sent, the kernel protocol stack needs to find the MAC address that corresponds to the destination IP address. If there is no hit in the ARP cache, a new entry in the Incomplete state is inserted. While in the Incomplete state, the kernel tries to send ARP requests asking for the MAC address of that IP address.

If an ARP reply is received, the entry becomes Reachable.

If no reply is received after a certain number of attempts, the entry becomes Failed.

When a Reachable entry times out, it moves to the Stale state. The reachability of a Stale entry is no longer considered confirmed.

If a Stale entry is referenced to send a packet, the entry moves to the Delay state.

An entry in the Delay state is likewise unconfirmed, but if a local confirmation (explained below) is received before the Delay state expires, the entry returns to the Reachable state.

When the Delay state expires, the entry moves to the Probe state, which is similar to the Incomplete state.

When the Stale state expires, the entry is reclaimed and deleted the next time garbage collection runs.
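The transitions listed above can be written down as a small lookup table. The following Python sketch only mirrors the description in this article and is not the kernel's implementation. The event names are hypothetical labels, and the transitions out of Probe are an assumption based on the remark that Probe is similar to Incomplete.

```python
from enum import Enum


class State(Enum):
    INCOMPLETE = "Incomplete"
    REACHABLE = "Reachable"
    STALE = "Stale"
    DELAY = "Delay"
    PROBE = "Probe"
    FAILED = "Failed"


# (current state, event) -> next state. Event names are made up for this sketch.
TRANSITIONS = {
    (State.INCOMPLETE, "arp_reply"): State.REACHABLE,
    (State.INCOMPLETE, "retries_exhausted"): State.FAILED,
    (State.REACHABLE, "timeout"): State.STALE,
    (State.STALE, "used_to_send"): State.DELAY,
    (State.DELAY, "local_confirmation"): State.REACHABLE,
    (State.DELAY, "timeout"): State.PROBE,
    # Assumed by analogy with Incomplete:
    (State.PROBE, "arp_reply"): State.REACHABLE,
    (State.PROBE, "retries_exhausted"): State.FAILED,
}


def step(state: State, event: str) -> State:
    """Apply one event; events with no listed transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, start: State = State.INCOMPLETE) -> State:
    """Feed a sequence of events to a new entry and return its final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state


# A resolved entry that ages, is used again, and is not confirmed in time.
print(run(["arp_reply", "timeout", "used_to_send", "timeout"]).value)
```

Deletion of expired Stale entries and of Failed entries is left out of the table because it is done by garbage collection rather than by a state change.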

You can view the ARP table entries and their states in the current network namespace with the following command:

ip neigh

Local confirmation: the machine has received a network packet whose source MAC address matches the entry. Such a packet shows that the "last hop" of that communication was the machine with this MAC address. Since the packet was received, the MAC address is reachable, so the entry can move back to the Reachable state. Through this mechanism, the kernel reduces the amount of ARP traffic it needs.
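To see how many entries of the ip neigh listing sit in each state, the output can be tallied with a short script. This is a minimal sketch that assumes the state is the last whitespace-separated field of each line, as in typical iproute2 output. The sample lines are invented for illustration and are not taken from the affected cluster.

```python
from collections import Counter


def count_states(ip_neigh_output: str) -> Counter:
    """Count neighbour entries per state in the text printed by `ip neigh`."""
    counts = Counter()
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        if fields:
            # Assumption: the state (REACHABLE, STALE, ...) is the last field.
            counts[fields[-1]] += 1
    return counts


# Invented sample output, for illustration only.
SAMPLE = """\
10.0.0.1 dev eth0 lladdr 52:54:00:aa:bb:01 REACHABLE
10.0.0.7 dev eth0 lladdr 52:54:00:aa:bb:07 STALE
10.0.0.9 dev eth0 FAILED
10.0.0.12 dev eth0 lladdr 52:54:00:aa:bb:0c STALE
"""

counts = count_states(SAMPLE)
print(sum(counts.values()), dict(counts))
```

On a real node the input could come from `subprocess.run(["ip", "neigh"], capture_output=True, text=True).stdout`. The total is the number to compare against the garbage collection thresholds.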

3.2 Kernel parameters involved

The following is a list of the main kernel parameters involved in this mechanism:

Parameter: meaning (default value)

/proc/sys/net/ipv4/neigh/default/base_reachable_time: base expiration time of the Reachable state. The expiration time of each entry lies between [1/2 base_reachable_time, 3/2 base_reachable_time] (30 seconds)

/proc/sys/net/ipv4/neigh/default/base_reachable_time_ms: base expiration time of the Reachable state, as above, expressed in milliseconds (30 seconds)

/proc/sys/net/ipv4/neigh/default/gc_stale_time: expiration time of the Stale state (60 seconds)

/proc/sys/net/ipv4/neigh/default/delay_first_probe_time: time after which the Delay state expires and the entry moves to Probe (5 seconds)

/proc/sys/net/ipv4/neigh/default/gc_interval: interval at which gc is started (30 seconds)

/proc/sys/net/ipv4/neigh/default/gc_thresh1: below this number of entries, gc will not start (2048)

/proc/sys/net/ipv4/neigh/default/gc_thresh2: soft limit on the maximum number of ARP table records. The number of entries is allowed to exceed this value for 5 seconds

/proc/sys/net/ipv4/neigh/default/gc_thresh3: hard limit on the maximum number of ARP table records. Above this number, gc starts immediately and reclaims entries forcibly (8192)
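Read together, the three threshold parameters divide the entry count into ranges with different garbage collection behavior. The Python sketch below is one reading of the parameter descriptions, not the kernel's code. The exact boundary comparisons are an assumption, and the gc_thresh2 value of 4096 used in the example is a placeholder chosen for illustration.

```python
def gc_behaviour(entries: int, thresh1: int, thresh2: int, thresh3: int) -> str:
    """Classify the expected gc behaviour for a given number of ARP entries.

    Boundary handling (< versus <=) is an assumption made for this sketch.
    """
    if entries < thresh1:
        return "gc does not start"
    if entries <= thresh2:
        return "gc runs periodically (gc_interval)"
    if entries <= thresh3:
        return "gc runs once the soft limit has been exceeded for 5 seconds"
    return "gc starts immediately and reclaims forcibly"


# thresh1 and thresh3 follow the defaults listed above; thresh2 is a placeholder.
for n in (1000, 3000, 6000, 9000):
    print(n, "->", gc_behaviour(n, thresh1=2048, thresh2=4096, thresh3=8192))
```

If forced collection finds nothing it can reclaim while the table is above the hard limit, new entries cannot be inserted, which matches the neighbor table overflow message seen in dmesg.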

The gc-related kernel parameters apply to all interfaces. The expiration time settings, however, apply to individual network cards (interfaces), and the values under default only take effect for newly created interface devices.
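To compare the values under default with those of a specific interface, the files can be read directly. The following Python sketch assumes the /proc layout named in the parameter list. The helper name `read_neigh_params` and the interface name eth0 are hypothetical. A parameter that is missing or unreadable for a given scope is reported as None.

```python
from pathlib import Path

NEIGH_ROOT = Path("/proc/sys/net/ipv4/neigh")

PARAMS = (
    "base_reachable_time_ms",
    "gc_stale_time",
    "delay_first_probe_time",
    "gc_interval",
    "gc_thresh1",
    "gc_thresh2",
    "gc_thresh3",
)


def read_neigh_params(scope: str = "default", names=PARAMS, root=NEIGH_ROOT) -> dict:
    """Read neighbour-table parameters for `scope` ("default" or an interface name)."""
    values = {}
    for name in names:
        path = Path(root) / scope / name
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing, unreadable, or not a plain integer.
            values[name] = None
    return values


print(read_neigh_params("default"))
print(read_neigh_params("eth0"))  # "eth0" is a hypothetical interface name
```

Running it inside a pod's network namespace and again on the node is one way to check which scope a changed value actually reached.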

3.3 ARP cache garbage collection mechanism

From the state machine of the cache entries, we know that not every entry is reclaimed: only entries whose Stale state has expired and entries in the Failed state may be reclaimed. In addition, garbage collection of ARP cache entries has to be triggered, so entries that are eligible for collection are not necessarily collected immediately. Depending on the number of ARP entries, there are four rules for starting garbage collection:

Number of ARP entries < gc_thresh1: garbage collection does not start.

gc_thresh1 <= number of ARP entries ...

... > /proc/sys/net/ipv4/neigh/default/gc_thresh1

echo 16384 > /proc/sys/net/ipv4/neigh/default/gc_thresh2

echo 32768 > /proc/sys/net/ipv4/neigh/default/gc_thresh3

6. Summary

Once the ARP cache is full, Pods lose network connectivity. The problem looks simple at first glance, but the ARP cache aging and garbage collection mechanisms behind it are fairly complicated. Although the author consulted a lot of material, questions such as "does the garbage collection threshold apply to the cumulative number of ARP entries across all namespaces or to each namespace separately", "which entries does garbage collection reclaim", and "how does the table behave when it is full" remained unclear. The author therefore verified the specific behavior through several small experiments. Compared with reading obscure kernel source code directly, experimentation can be a shortcut to studying a problem and understanding a mechanism. I hope this helps readers.
