An Example Analysis of How Linux Implements ARP Cache Aging

Updated 2025-04-24. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report

This article explains, by way of example, how Linux implements the aging time of the ARP cache. It is fairly detailed and should be a useful reference.

I. The problem

ARP is the link-layer address resolution protocol: it uses an IP address as the key to look up the MAC address of the host that holds that IP address. I will not go into the protocol details here; see the RFC (RFC 826) or a textbook. I am writing this article partly as notes for myself and partly to give others some ideas. Specifically, I ran into two problems:

1. A system that uses keepalived for hot backup needs a virtual IP address, and which machine owns the virtual IP is determined by the master/backup roles in the hot-backup group. So when a machine becomes master and takes over the virtual IP, it must broadcast a gratuitous ARP. At first people thought this was unnecessary, because the hot-backup group seemed to work fine without it, but experience has shown that it is necessary.

2. ARP cache entries have an aging time, but Linux offers no obvious way to set it. So how do you set the aging time?

II. Background before answering the questions

The ARP specification only describes the details of address resolution; it does not specify how a protocol stack implementation should maintain the ARP cache. The cache needs an expiration time because ARP keeps no state for a mapping and performs no authentication, so the protocol cannot guarantee that a mapping stays correct forever; it can only guarantee that the mapping was valid for some period after the ARP reply was received. This also opens the door to ARP spoofing, which this article does not discuss.

Devices such as Cisco routers or VRP-based Huawei devices have an explicit command for configuring the ARP cache expiration time, but Linux has no such setting, or at least no such direct one. Linux users know that system behavior is usually configured with the sysctl tool, through the sys interface under procfs. However, after a long search that finally leads to /proc/sys/net/ipv4/neigh/ethX, the many files in that directory are confusing, and even the Linux kernel documentation does not make their exact meaning clear. A mature system like Linux must have a way to configure the ARP cache expiration time, but how do you actually do it? The answer starts with the ARP state machine that Linux implements.

If you have read "Understanding Linux Network Internals" and really understood it, this article will tell you little that is new. Many people have not read that book, though, so the content here still has some value.

The Linux protocol stack maintains a state machine for each ARP cache entry. Before looking at the specific behavior, take a look at the following diagram (adapted from Figure 26-13 in Chapter 26 of "Understanding Linux Network Internals"):

[Figure: ARP cache state machine in Linux]

In the figure above, only an ARP cache entry in the reachable state is usable for outgoing packets; an entry in the stale state is effectively not usable. If something wants to send a packet at that point, the address has to be resolved again, which conventionally means sending a new ARP request. That is not necessarily what happens, though, because Linux adds an "event point" to ARP so that the cache can be maintained "without sending ARP requests", and this measure is very effective in practice. It is ARP's "confirmation" mechanism: if a packet from a neighbor arrives at the local machine, the neighbor that was the packet's previous hop can be confirmed as valid. Why can only packets delivered to the local machine confirm the previous-hop neighbor? Because Linux does not want to add work to IP-layer processing, that is, it does not want to change the original semantics of the IP layer.

Linux keeps the stale state in order to preserve the neighbour structure, so that only individual fields are modified or filled in when the state changes. A simple implementation could keep only the reachable state and delete the ARP cache entry when it expires. Linux merely adds many optimizations on top of that, and trying to puzzle out every one of them is a painful exercise.

III. How Linux maintains the stale state

In the ARP state machine implemented by Linux, the most complex state is stale. An ARP cache entry in this state faces a life-or-death choice, and the decision maker is locally generated traffic. If a locally sent packet uses a stale entry, the state machine moves to the delay state. If nobody has used the neighbor by the time the garbage collection timer expires, the entry may be deleted. Whether it actually is deleted depends on whether anything else still references it, and the key thing to look at is the routing cache. Although the routing cache is a layer 3 concept, it holds a reference to the ARP cache entry of the route's next hop. In this sense, the Linux routing cache is really a forwarding table rather than a routing table.

If an outgoing packet uses the entry, its ARP state machine enters the delay state. In the delay state, Linux does not send an ARP request as long as a "local" confirmation arrives (a packet received locally whose previous hop is the neighbor). If no local confirmation arrives, Linux sends a real ARP request and enters the probe state. So every state from stale onward exists purely as an optimization, and a stale ARP cache entry is a cache of a cache. If Linux simply deleted reachable entries when they expired, the semantics would be the same, and the implementation would be much easier to read and understand.

To repeat: when reachable expires, the entry enters the stale state instead of being deleted, in order to keep the neighbour structure and save memory and CPU. An entry that has entered the stale state is effectively not usable. To become usable again, either it is locally confirmed before the delay-state timer expires (for example, TCP receives a packet), or the delay state expires, the entry enters the probe state, and the ARP request is answered. Otherwise it will still be deleted.
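
The transitions described in this section can be sketched as a small table-free state function. This is a simplified model for illustration, not kernel code: the enum values borrow the kernel's NUD_* names, but the event names and the function nud_next are hypothetical, and details such as probe retries are collapsed into a single timer event.

```c
#include <assert.h>

/* Simplified neighbour states. The names mirror the kernel's NUD_* states,
 * but this enum and nud_next() below are illustrative only. */
enum nud_state {
    NUD_REACHABLE,
    NUD_STALE,
    NUD_DELAY,
    NUD_PROBE,
    NUD_FAILED
};

/* Hypothetical events that drive the model. */
enum nud_event {
    EV_TIMER_EXPIRED,  /* the timer of the current state ran out           */
    EV_PACKET_SENT,    /* a locally sent packet wants to use the entry     */
    EV_LOCAL_CONFIRM,  /* an upper layer (e.g. TCP) received traffic whose
                          previous hop is this neighbour                   */
    EV_ARP_REPLY       /* a unicast ARP reply arrived                      */
};

/* Return the state that follows `s` after event `ev`.
 * Combinations not described in the text leave the state unchanged. */
enum nud_state nud_next(enum nud_state s, enum nud_event ev)
{
    assert((int)s >= NUD_REACHABLE && (int)s <= NUD_FAILED);

    /* A real ARP reply always proves reachability. */
    if (ev == EV_ARP_REPLY)
        return NUD_REACHABLE;

    switch (s) {
    case NUD_REACHABLE:
        /* reachable expires: keep the entry, but mark it stale */
        return ev == EV_TIMER_EXPIRED ? NUD_STALE : s;
    case NUD_STALE:
        /* a local sender uses the stale entry: wait for a confirmation */
        return ev == EV_PACKET_SENT ? NUD_DELAY : s;
    case NUD_DELAY:
        if (ev == EV_LOCAL_CONFIRM)
            return NUD_REACHABLE;   /* confirmed without any ARP request */
        if (ev == EV_TIMER_EXPIRED)
            return NUD_PROBE;       /* no confirmation: send a real request */
        return s;
    case NUD_PROBE:
        /* no reply before the probe timer ran out */
        return ev == EV_TIMER_EXPIRED ? NUD_FAILED : s;
    default:
        return s;
    }
}
```

Starting from NUD_REACHABLE, the event sequence timer expiry, packet sent, local confirmation returns to NUD_REACHABLE without an ARP request ever being modelled, which is the optimization the text describes.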

IV. Key points of the Linux ARP cache implementation

Walking through source code line by line in a blog post is something I did in my early days, and I will not spend space on it now. You only need to know the few timers that Linux maintains in its ARP implementation.

1. Reachable state timer

The timer is started whenever an ARP reply arrives, or whenever something else proves that the neighbor behind the ARP entry is really reachable. When it expires, the ARP cache entry moves to the next state, according to the configured time.

2. Garbage collection timer

This timer runs periodically. Its next expiry is determined by the configured base_reachable_time. See the following code:

    static void neigh_periodic_timer(unsigned long arg)
    {
        ...
        /* reachable_time is re-randomized every 5 minutes */
        if (time_after(now, tbl->last_rand + 300 * HZ)) {
            struct neigh_parms *p;
            tbl->last_rand = now;
            for (p = &tbl->parms; p; p = p->next)
                p->reachable_time =
                    neigh_rand_reach_time(p->base_reachable_time);
        }
        ...
        /* Cycle through all hash buckets every base_reachable_time/2 ticks.
         * ARP entry timeouts range from 1/2 base_reachable_time to 3/2
         * base_reachable_time.
         */
        expire = tbl->parms.base_reachable_time >> 1;
        expire /= (tbl->hash_mask + 1);
        if (!expire)
            expire = 1;
        /* the next expiry depends entirely on base_reachable_time */
        mod_timer(&tbl->gc_timer, now + expire);
        ...
    }

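
The arithmetic in this excerpt can be tried outside the kernel. The following is a minimal sketch, assuming times are plain tick counts; gc_timer_interval, reachable_time_min and reachable_time_max are hypothetical helper names, and the two bounds simply restate the range given in the kernel comment rather than reproduce neigh_rand_reach_time itself.

```c
#include <assert.h>

/* Ticks until the garbage collection timer fires again, following the
 * excerpt: half of base_reachable_time, divided across all hash buckets,
 * and never less than one tick. */
unsigned long gc_timer_interval(unsigned long base_reachable_time,
                                unsigned long hash_mask)
{
    /* hash_mask + 1 is the bucket count; it must not wrap around to 0 */
    assert(hash_mask + 1 != 0);

    unsigned long expire = base_reachable_time >> 1;
    expire /= (hash_mask + 1);
    if (!expire)
        expire = 1;
    return expire;
}

/* Lower bound of the randomized reachable_time: 1/2 base_reachable_time. */
unsigned long reachable_time_min(unsigned long base_reachable_time)
{
    return base_reachable_time / 2;
}

/* Upper bound of the randomized reachable_time: 3/2 base_reachable_time. */
unsigned long reachable_time_max(unsigned long base_reachable_time)
{
    return base_reachable_time / 2 + base_reachable_time;
}
```

For example, assuming HZ is 1000 and base_reachable_time is 30 seconds (30000 ticks), a table with 8 buckets (hash_mask of 7) has its timer fire every 1875 ticks, so that all buckets are visited once per 15 seconds, and the randomized reachable_time falls between 15 and 45 seconds.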

Once this timer expires, the neigh_periodic_timer callback runs. The part elided above (the "...") contains the following logic:

    /* n->used may have moved forward because of the "local confirmation" mechanism */
    if (atomic_read(&n->refcnt) == 1 &&
        (state == NUD_FAILED ||
         time_after(now, n->used + n->parms->gc_staletime))) {
        *np = n->next;
        n->dead = 1;
        write_unlock(&n->lock);
        neigh_release(n);
        continue;
    }


If a stale entry is not deleted promptly in your experiment, try the following command:

    ip route flush cache

Then look at the output of ip neigh ls all. Do not expect the entry to disappear immediately, because the garbage collection timer may not have expired yet, but it will be deleted after a short while.
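
Why flushing the routing cache helps can be seen from the deletion condition alone. The sketch below restates that condition in user-space C; struct entry, enum entry_state and gc_may_delete are hypothetical names, and checks the kernel performs elsewhere in the loop are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a neighbour entry. */
enum entry_state { ENTRY_STALE, ENTRY_FAILED, ENTRY_OTHER };

struct entry {
    int refcnt;              /* 1 means only the neighbour table holds it */
    enum entry_state state;
    unsigned long used;      /* tick at which the entry was last used     */
};

/* Mirror of the condition in the excerpt: the entry may be deleted only if
 * nobody else references it AND it either failed or has been unused for
 * longer than gc_staletime. */
bool gc_may_delete(const struct entry *e, unsigned long now,
                   unsigned long gc_staletime)
{
    assert(e != NULL);

    if (e->refcnt != 1)
        return false;        /* e.g. a routing cache entry still holds it */
    if (e->state == ENTRY_FAILED)
        return true;
    /* Same idea as time_after(now, used + gc_staletime): the signed
     * difference keeps the comparison valid across counter wraparound
     * on common platforms. */
    return (long)((e->used + gc_staletime) - now) < 0;
}
```

An entry that is long past gc_staletime but has a reference count of 2 is never eligible. Once the extra reference is dropped, which is what flushing the routing cache achieves, the same entry is removed on the next pass of the timer.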

V. Solution to the first problem

In a group that runs keepalived for VRRP-based hot backup, many people think a host does not need to re-announce the binding between its MAC address and the virtual IP address when it enters the master state. This is fundamentally wrong. If nothing goes wrong, that is luck: the default ARP timeout on most routers is fairly short, but we cannot rely on that configuration. Consider the following illustration:

If a switchover occurs and the ARP cache timeout on the router is 1 hour, traffic in one direction will be broken for nearly an hour: packets from the router keep going to the original master, which no longer holds the virtual IP address. (This assumes that the hosts in the group do not send data through the router and thereby trigger a "local confirmation"; after all, I do not know whether the router runs Linux.)

Therefore, so that traffic no longer depends on the router's configuration, a host must explicitly announce the binding between the virtual IP address and its own MAC address when it switches to master under VRRP. On Linux the convenient tool is arping:

    arping -I ethX -S 1.1.1.1 -B -c 1

With this command, the master host with IP address 1.1.1.1 broadcasts to the whole network an ARP request whose target IP address is 255.255.255.255. Assuming the router runs Linux, on receiving the ARP request it updates its local ARP cache entry (if one exists) for the source IP address. The catch is that the updated entry is put in the stale state. This is simply how Linux handles ARP, as the code at the end of the arp_process function shows:

    if (arp->ar_op != htons(ARPOP_REPLY) ||
        skb->pkt_type != PACKET_HOST)
        state = NUD_STALE;
    neigh_update(n, sha, state, override ? NEIGH_UPDATE_F_OVERRIDE : 0);


Thus the MAC address mapping returns to the reachable state only when a packet is actually sent with 1.1.1.1 as its next hop, either through the "local confirmation" mechanism or by really sending an ARP request.

Correction: after reading the keepalived source code, I found this worry unnecessary. keepalived is mature and does not make such a basic mistake: after a host switches to master, keepalived actively sends gratuitous ARPs. The relevant keepalived code is as follows:

    static void
    vrrp_send_update(vrrp_rt * vrrp, ip_address * ipaddress, int idx)
    {
        char *msg;
        char addr_str[41];

        if (!IP_IS6(ipaddress)) {
            /* IPv4: announce the address with gratuitous ARP */
            msg = "gratuitous ARPs";
            inet_ntop(AF_INET, &ipaddress->u.sin.sin_addr, addr_str, 41);
            send_gratuitous_arp(ipaddress);
        } else {
            /* IPv6: the equivalent is an unsolicited Neighbour Advertisement */
            msg = "Unsolicited Neighbour Adverts";
            inet_ntop(AF_INET6, &ipaddress->u.sin6_addr, addr_str, 41);
            ndisc_send_unsolicited_na(ipaddress);
        }

        if (0 == idx && debug & 32) {
            log_message(LOG_INFO, "VRRP_Instance(%s) Sending %s on %s for %s",
                        vrrp->iname, msg, IF_NAME(ipaddress->ifp), addr_str);
        }
    }

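
To make "gratuitous ARP" concrete, here is a minimal sketch that fills a buffer with such a frame for Ethernet and IPv4. It is not keepalived's send_gratuitous_arp: the function name build_gratuitous_arp and the constant GARP_FRAME_LEN are hypothetical, the frame is built as an ARP request whose sender and target IP are both the announced address, and actually transmitting it (for example through a packet socket) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GARP_FRAME_LEN 42   /* 14-byte Ethernet header + 28-byte ARP payload */

/* Fill `frame` with a gratuitous ARP request announcing that `ip` is at
 * `mac`. Returns the number of bytes written. */
size_t build_gratuitous_arp(uint8_t frame[GARP_FRAME_LEN],
                            const uint8_t mac[6], const uint8_t ip[4])
{
    assert(frame != NULL && mac != NULL && ip != NULL);

    uint8_t *p = frame;

    /* Ethernet header */
    memset(p, 0xff, 6);  p += 6;     /* destination: broadcast            */
    memcpy(p, mac, 6);   p += 6;     /* source: the announcing host       */
    *p++ = 0x08; *p++ = 0x06;        /* EtherType 0x0806: ARP             */

    /* ARP payload */
    *p++ = 0x00; *p++ = 0x01;        /* hardware type 1: Ethernet         */
    *p++ = 0x08; *p++ = 0x00;        /* protocol type 0x0800: IPv4        */
    *p++ = 6;                        /* hardware address length           */
    *p++ = 4;                        /* protocol address length           */
    *p++ = 0x00; *p++ = 0x01;        /* opcode 1: request                 */
    memcpy(p, mac, 6);   p += 6;     /* sender hardware address           */
    memcpy(p, ip, 4);    p += 4;     /* sender protocol address           */
    memset(p, 0x00, 6);  p += 6;     /* target hardware address: unused
                                        in an announcement, zeroed here   */
    memcpy(p, ip, 4);    p += 4;     /* target IP equals the sender IP    */

    return (size_t)(p - frame);
}
```

A receiver that already has a cache entry for the announced IP address learns the new MAC address from the sender fields. Note that this differs from the arping command shown earlier, where the target IP address is 255.255.255.255; in both cases it is the sender fields that update the router's entry.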

VI. Solution to the second problem

After all this preamble, how do you actually set the aging time of the ARP cache on Linux?

There are several files under the /proc/sys/net/ipv4/neigh/ethX directory. Which one is the aging time of the ARP cache? Put bluntly, it is base_reachable_time. Everything else is there to optimize behavior. For example, gc_stale_time is the lifetime of the "cache of ARP cache entries", that is, the lifetime of a cache of a cache. During that time, if the neighbor is needed, the data recorded in the entry can be used directly as the content of an ARP request, or the entry can be set straight back to reachable after a "local confirmation", instead of going through the slow path of route lookup, ARP lookup, ARP neighbor creation, and ARP resolution.

By default, the reachable state times out after 30 seconds, after which the ARP cache entry changes to stale. At that point you can regard the entry as expired; the Linux implementation just has not deleted it yet. Once a further gc_stale_time has passed, the entry is deleted. After an entry leaves the reachable state, the garbage collector is responsible for enforcing "delete the entry only after gc_stale_time has passed". The next expiry of the garbage collection timer is calculated from base_reachable_time, in neigh_periodic_timer:

    if (time_after(now, tbl->last_rand + 300 * HZ)) {
        struct neigh_parms *p;
        tbl->last_rand = now;
        for (p = &tbl->parms; p; p = p->next)
            /* the randomization matters: it prevents ARP resolution
             * storms caused by "resonance" between hosts */
            p->reachable_time = neigh_rand_reach_time(p->base_reachable_time);
    }
    ...
    expire = tbl->parms.base_reachable_time >> 1;
    expire /= (tbl->hash_mask + 1);
    if (!expire)
        expire = 1;
    mod_timer(&tbl->gc_timer, now + expire);


There it is. The kernel's own comments, written by thoughtful people, make this easy to understand. To make the behavior concrete, we designed the following two scenarios:

1. Use iptables to drop everything received locally, which blocks ARP local confirmation. Use sysctl to set base_reachable_time to 5 seconds and gc_stale_time to 5 seconds.

2. Remove the iptables drop policy, and use TCP to download a very large file from an external network or make continuous short connections. Use sysctl to set base_reachable_time to 5 seconds and gc_stale_time to 5 seconds.

In both scenarios, ping the default gateway of the local LAN and then quickly stop the ping with Ctrl-C. You can see the ARP entry of the default gateway with ip neigh show all. In scenario 1, the entry becomes stale within about 5 seconds and then stays that way. If you ping again, the entry first becomes delay, then probe, then reachable, and within 5 seconds it becomes stale again. In scenario 2, the ARP entry keeps alternating between reachable and delay. Together these illustrate the ARP state machine in Linux. So why, in scenario 1, is the entry not deleted long after it becomes stale? Because a routing cache entry is still using it; after you flush the routing cache, the ARP entry is deleted quickly.
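
The sysctl settings used in both scenarios correspond to files under /proc/sys/net/ipv4/neigh/. As a minimal sketch of doing the same from a program, assuming the interface is called eth0 and the process runs as root, the helpers below build the file path and write a value in seconds; neigh_param_path and neigh_param_set are hypothetical names, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path of a per-interface neighbour parameter, e.g.
 * /proc/sys/net/ipv4/neigh/eth0/base_reachable_time.
 * Returns 0 on success, -1 if `buf` is too small. */
int neigh_param_path(char *buf, size_t len,
                     const char *ifname, const char *param)
{
    assert(buf != NULL && ifname != NULL && param != NULL);
    assert(strlen(ifname) > 0 && strlen(param) > 0);

    int n = snprintf(buf, len, "/proc/sys/net/ipv4/neigh/%s/%s",
                     ifname, param);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write `seconds` to the given parameter. Needs root privileges.
 * Returns 0 on success, -1 on any error. */
int neigh_param_set(const char *ifname, const char *param, unsigned seconds)
{
    char path[256];

    if (neigh_param_path(path, sizeof path, ifname, param) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%u\n", seconds) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling neigh_param_set("eth0", "base_reachable_time", 5) and neigh_param_set("eth0", "gc_stale_time", 5) would reproduce the settings of the two scenarios; the corresponding sysctl keys are net.ipv4.neigh.eth0.base_reachable_time and net.ipv4.neigh.eth0.gc_stale_time.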
