
How to solve the delay problem caused by TCP's delayed acknowledgement (Delayed Ack) mechanism under Linux


This article shows how to solve the delay problem caused by TCP's delayed acknowledgement (Delayed Ack) mechanism under Linux. The content is concise and easy to follow, and hopefully the detailed walkthrough below will give you something to take away.

Case 1: a colleague wrote a quick stress test program. Its logic is: every second, send N 132-byte packets in a row, then receive the N 132-byte packets echoed back by the backend service. The code, simplified, looks like this:

char sndBuf[132];
char rcvBuf[132];
while (1) {
    for (int i = 0; i < N; i++) {
        send(fd, sndBuf, sizeof(sndBuf), 0);
        ...
    }
    for (int i = 0; i < N; i++) {
        recv(fd, rcvBuf, sizeof(rcvBuf), 0);
        ...
    }
    sleep(1);
}

In actual testing we found that when N >= 3, from the second second onwards, the third recv call in each round always blocked for about 40 milliseconds. Yet when we analysed the server-side logs, every request was handled on the server in less than 2 ms.

The troubleshooting went as follows. We first tried to trace the client process with strace, but strangely, as soon as strace attached to the process, all sends and receives became normal and nothing blocked; once strace exited, the problem reappeared. A colleague pointed out that strace very likely changes something in the program or the system (this has still not been fully explained), so we turned to tcpdump instead. The capture showed that after the server echoed its response packet, the client did not ACK the data immediately, but waited nearly 40 milliseconds before acknowledging. After some searching and a look at "TCP/IP Illustrated, Volume 1: The Protocols", we learned that this is TCP's delayed acknowledgement (Delayed Ack) mechanism.

The fix is as follows: after the recv system call, call setsockopt once to set TCP_QUICKACK. The final code:

char sndBuf[132];
char rcvBuf[132];
while (1) {
    for (int i = 0; i < N; i++) {
        send(fd, sndBuf, 132, 0);
        ...
    }
    for (int i = 0; i < N; i++) {
        recv(fd, rcvBuf, 132, 0);
        setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, (int[]){1}, sizeof(int));
    }
    sleep(1);
}

Case 2: while performance-testing the in-memory CDKEY version of the marketing platform, we found that the request latency distribution was abnormal: about 90% of requests finished within 2 ms, while roughly 10% consistently took between 38 and 42 ms, a suspiciously regular number: 40 ms. Having been through Case 1, we guessed that this latency was also caused by the delayed acknowledgement mechanism; a quick packet capture confirmed it, and setting the TCP_QUICKACK option solved the latency problem.

The Delayed Ack Mechanism

Chapter 19 of "TCP/IP Illustrated, Volume 1: The Protocols" describes the principle in detail: when handling interactive data flows (Interactive Data Flow, as opposed to Bulk Data Flow; telnet and rlogin are typical interactive flows), TCP uses the Delayed Ack mechanism together with the Nagle algorithm to reduce the number of small segments.

The book already explains both mechanisms clearly, so their principles are not repeated here. The rest of this article explains TCP's delayed acknowledgement mechanism by looking at how TCP/IP is implemented under Linux.

1. Why does TCP delayed acknowledgement cause delay?

Delayed acknowledgement on its own does not delay a request (at first we assumed recv could not return until the ACK packet had been sent out). In general, latency only grows when the mechanism interacts with the Nagle algorithm or with congestion control (slow start or congestion avoidance). Let's look in detail at how they interact.

Delayed Acknowledgement and the Nagle Algorithm

First, the rules of the Nagle algorithm (see the comments on the tcp_nagle_check function in tcp_output.c):

1) If the packet length reaches the MSS, it may be sent;

2) If the packet contains a FIN, it may be sent;

3) If the TCP_NODELAY option is set, it may be sent;

4) If the TCP_CORK option is not set, it may be sent once all previously sent packets have been acknowledged, or once all previously sent small packets (shorter than the MSS) have been acknowledged.

Rule 4) means that a TCP connection may have at most one unacknowledged small packet outstanding; no further small packets may be sent until the acknowledgement for that segment arrives. So if the acknowledgement of a small segment is delayed (by 40 ms in our cases), the sending of subsequent small segments is delayed accordingly. In other words, delayed acknowledgement does not hurt the packet whose acknowledgement is delayed, but the response packets that follow it. The tcpdump capture below shows this:

1  00:44:37.878027 IP 171.24.38.136.44792 > 175.24.11.18.9877: S 3512052379:3512052379(0) win 5840
2  00:44:37.878045 IP 175.24.11.18.9877 > 171.24.38.136.44792: S 3581620571:3581620571(0) ack 3512052380 win 5792
3  00:44:37.879080 IP 171.24.38.136.44792 > 175.24.11.18.9877: . ack 1 win 46
4  00:44:38.885325 IP 171.24.38.136.44792 > 175.24.11.18.9877: P 1321:1453(132) ack 1321 win 86
5  00:44:38.886037 IP 175.24.11.18.9877 > 171.24.38.136.44792: P 1321:1453(132) ack 1453 win 2310
6  00:44:38.887174 IP 171.24.38.136.44792 > 175.24.11.18.9877: P 1453:2641(1188) ack 1453 win 102
7  00:44:38.887888 IP 175.24.11.18.9877 > 171.24.38.136.44792: P 1453:2476(1023) ack 2641 win 2904
8  00:44:38.925270 IP 171.24.38.136.44792 > 175.24.11.18.9877: . ack 2476 win 118
9  00:44:38.925276 IP 175.24.11.18.9877 > 171.24.38.136.44792: P 2476:2641(165) ack 2641 win 2904
10 00:44:38.926328 IP 171.24.38.136.44792 > 175.24.11.18.9877: . ack 2641 win 134

From the tcpdump capture above, the 8th packet (the client's ACK) is the delayed acknowledgement, while the data of the 9th packet had long been sitting in the TCP send buffer on the server side (175.24.11.18); the application-layer send had already returned. Under the Nagle algorithm, however, the 9th packet could not go out until the ACK for the 7th packet (which is smaller than the MSS) arrived.

Delayed Acknowledgement and Congestion Control

We first use the TCP_NODELAY option to turn off the Nagle algorithm, and then analyze the interaction between delayed acknowledgement and TCP congestion control.
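
For reference, turning off Nagle is a single setsockopt call. Below is a minimal sketch (assuming fd is an already-connected TCP socket); it is illustrative only and not taken from the original test program:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Disable the Nagle algorithm on a connected socket fd. */
static int disable_nagle(int fd)
{
    int one = 1;

    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}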

Slow start: the TCP sender maintains a congestion window, written cwnd. When a TCP connection is established, cwnd is initialized to 1 segment, and each time an ACK is received it is increased by 1 segment. The sender uses the minimum of the congestion window and the advertised window (which corresponds to the sliding-window mechanism) as the upper bound on how much it may send; the congestion window is flow control imposed by the sender, while the advertised window is flow control imposed by the receiver. The sender starts by transmitting 1 segment; when its ACK arrives, cwnd grows from 1 to 2, so 2 segments may be sent. When the ACKs for those 2 segments arrive, cwnd grows to 4. This is exponential growth: in the first RTT one packet is sent and its ACK increases cwnd by 1; in the second RTT two packets can be sent, and each of the two ACKs received increases cwnd by 1, bringing it to 4.
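
As a rough illustration of the textbook doubling just described, here is a toy sketch; it assumes one ACK per segment, no delayed ACKs and no losses, which (as the next paragraph notes) is not exactly what Linux does:

#include <stdio.h>

int main(void)
{
    unsigned cwnd = 1;  /* congestion window, in segments */

    for (int rtt = 1; rtt <= 4; rtt++) {
        printf("RTT %d: may send %u segment(s)\n", rtt, cwnd);
        cwnd += cwnd;   /* one ACK per segment sent, each ACK adds 1 segment */
    }
    return 0;
}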

In the Linux implementation, cwnd is not increased by 1 for every ACK packet received: if, when an ACK arrives, there are no other packets waiting to be acknowledged, cwnd is not increased.

Using the test code from Case 1, in actual testing cwnd started from an initial value of 2 and eventually settled at 3 segments. The tcpdump results are as follows:

1  16:46:14.288604 IP 178.14.5.3.1913 > 178.14.5.4.20001: S 1324697951:1324697951(0) win 5840
2  16:46:14.… IP 178.14.5.4.20001 > 178.14.5.3.1913: S 2866427156:2866427156(0) ack 1324697952 win 5792
3  16:46:14.… IP 178.14.5.3.1913 > 178.14.5.4.20001: . ack 1 win 1460
4  16:46:15.327493 IP 178.14.5.3.1913 > 178.14.5.4.20001: P 1321:1453(132) ack 1321 win 4140
5  16:46:15.329749 IP 178.14.5.4.20001 > 178.14.5.3.1913: P 1321:1453(132) ack 1453 win 2904
6  16:46:15.330001 IP 178.14.5.3.1913 > 178.14.5.4.20001: P 1453:2641(1188) ack 1453 win 4140
7  16:46:15.… IP 178.14.5.4.20001 > 178.14.5.3.1913: P 1453:1585(132) ack 2641 win 3498
8  16:46:15.337629 IP 178.14.5.4.20001 > 178.14.5.3.1913: P 1585:1717(132) ack 2641 win 3498
9  16:46:15.340035 IP 178.14.5.4.20001 > 178.14.5.3.1913: P 1717:1849(132) ack 2641 win 3498
10 16:46:15.371416 IP 178.14.5.3.1913 > 178.14.5.4.20001: . ack 1849 win 4140
11 16:46:15.371461 IP 178.14.5.4.20001 > 178.14.5.3.1913: P 1849:2641(792) ack 2641 win 3498
12 16:46:15.371581 IP 178.14.5.3.1913 > 178.14.5.4.20001: . ack 2641 win 4536

The trace above was captured with TCP_NODELAY set, after cwnd had grown to 3. Once the 7th, 8th and 9th packets have been sent, the server is limited by the size of the congestion window: even though there is more data in its TCP send buffer, the 11th packet cannot go out until the 10th packet (the client's ACK) arrives, and the 10th packet is clearly delayed by about 40 ms.

Note: the TCP_INFO option of getsockopt (man 7 tcp) allows you to view the details of the TCP connection, such as the current congestion window size, MSS, etc.
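
For example, a minimal sketch of reading TCP_INFO on Linux (assuming fd is a connected TCP socket; the field names come from struct tcp_info in <netinet/tcp.h>):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Print a few fields of the connection's current TCP state. */
static void dump_tcp_info(int fd)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0) {
        perror("getsockopt(TCP_INFO)");
        return;
    }
    printf("cwnd=%u segments, snd_mss=%u bytes, rtt=%u us\n",
           info.tcpi_snd_cwnd, info.tcpi_snd_mss, info.tcpi_rtt);
}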

2. Why 40 ms? Can this time be adjusted?

First of all, the official Red Hat documentation has the following description:

When sending small messages, some applications may experience delays because of TCP's Delayed Ack mechanism, whose value defaults to 40 ms. The system-wide minimum delayed acknowledgement time can be adjusted by modifying tcp_delack_min. For example:

# echo 1 > /proc/sys/net/ipv4/tcp_delack_min

That is, the intent is to set the minimum delayed acknowledgement timeout to 1 ms.

However, this option does not exist on either Slackware or SUSE systems, which means the 40 ms minimum cannot be adjusted through configuration on those systems.

There is a macro definition in linux-2.6.39.1/include/net/tcp.h as follows:

#define TCP_DELACK_MIN ((unsigned)(HZ/25)) /* minimal time to delay before sending an ACK */

Note: the Linux kernel raises a timer interrupt (IRQ 0) at a fixed interval, and HZ defines how many timer interrupts occur per second. For example, HZ = 1000 means 1000 timer interrupts per second. HZ can be chosen when the kernel is compiled. On the systems running on our existing servers, HZ is 250.

Thus the minimum delayed acknowledgement time works out to 40 ms.
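
A small illustration of the arithmetic (assuming HZ = 250, as stated above):

#include <stdio.h>

int main(void)
{
    const unsigned hz = 250;                   /* timer interrupts per second, as above */
    const unsigned delack_min_ticks = hz / 25; /* TCP_DELACK_MIN, in ticks (jiffies) */
    const unsigned ms_per_tick = 1000 / hz;    /* 4 ms per tick when HZ = 250 */

    /* Prints: TCP_DELACK_MIN = 10 ticks = 40 ms */
    printf("TCP_DELACK_MIN = %u ticks = %u ms\n",
           delack_min_ticks, delack_min_ticks * ms_per_tick);
    return 0;
}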

The delayed acknowledgement time of a TCP connection is generally initialized to this minimum value of 40 ms and is then adjusted continuously, based on parameters such as the connection's retransmission timeout (RTO) and the interval between the previous received packet and the current one. For details of the adjustment algorithm, see the tcp_event_data_recv function in linux-2.6.39.1/net/ipv4/tcp_input.c, around line 564.

3. Why does TCP_QUICKACK need to be reset after each call to recv?

In man 7 tcp, it is described as follows:

TCP_QUICKACK: "Enable quickack mode if set or disable quickack mode if cleared. In quickack mode, acks are sent immediately, rather than delayed if needed in accordance to normal TCP operation. This flag is not permanent, it only enables a switch to or from quickack mode. Subsequent operation of the TCP protocol will once again enter/leave quickack mode depending on internal protocol processing and factors such as delayed ack timeouts occurring and data transfer. This option should not be used in code intended to be portable."

The manual clearly states that TCP_QUICKACK is not permanent. So what does the implementation actually do? Look at how the setsockopt function handles the TCP_QUICKACK option:

case TCP_QUICKACK:
    if (!val) {
        icsk->icsk_ack.pingpong = 1;
    } else {
        icsk->icsk_ack.pingpong = 0;
        if ((1 << sk->sk_state) &
            (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT) &&
            inet_csk_ack_scheduled(sk)) {
            icsk->icsk_ack.pending |= ICSK_ACK_PUSHED;
            tcp_cleanup_rbuf(sk, 1);
            if (!(val & 1))
                icsk->icsk_ack.pingpong = 1;
        }
    }
    break;

In fact, a socket under Linux carries a pingpong attribute that indicates whether the current connection is an interactive data stream. If its value is 1, the connection is treated as interactive and delayed acknowledgement is used. But the value of pingpong changes dynamically. For example, when a TCP connection sends a packet, the following function is executed (linux-2.6.39.1/net/ipv4/tcp_output.c, line 156):

/* Congestion state accounting after a packet has been sent. */
static void tcp_event_data_sent(struct tcp_sock *tp,
                                struct sk_buff *skb, struct sock *sk)
{
    ...
    tp->lsndtime = now;
    /* If it is a reply for ato after last received
     * packet, enter pingpong mode.
     */
    if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
        icsk->icsk_ack.pingpong = 1;
}

The last two lines show that if the interval between the current time and the time the last packet was received is smaller than the computed delayed acknowledgement timeout (ato), the connection re-enters interactive (pingpong) mode. Put another way: whenever delayed acknowledgement appears to be taking effect, the connection automatically switches back into interactive mode.

From the above analysis, we can see that the TCP_QUICKACK option needs to be reset after each call to recv.
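
A minimal sketch of that pattern (a hypothetical wrapper, not taken from the original program), which re-arms TCP_QUICKACK after every receive:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Receive data, then immediately re-enable quick ACKs, since the kernel
 * may have fallen back into delayed-ACK (pingpong) mode in the meantime. */
static ssize_t recv_quickack(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    int one = 1;

    /* Best effort: even if this fails, the recv result above still stands. */
    setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
    return n;
}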

4. Why are acknowledgements not delayed for all packets?

In the TCP implementation, the function tcp_in_quickack_mode (linux-2.6.39.1/net/ipv4/tcp_input.c, Line 197) is used to determine whether the ACK needs to be sent immediately. The function is implemented as follows:

/* Send ACKs quickly, if "quick" count is not exhausted
 * and the session is not interactive.
 */
static inline int tcp_in_quickack_mode(const struct sock *sk)
{
    const struct inet_connection_sock *icsk = inet_csk(sk);
    return icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong;
}

Two conditions must both hold for the connection to be in quickack mode:

Pingpong is set to 0.

The quick acknowledgement counter (quick) must be non-zero.

The value of pingpong was described above. The comment on the quick field in the code reads "scheduled number of quick acks", i.e. the number of packets that will be acknowledged immediately. Each time the connection enters quickack mode, quick is initialized to the receive window divided by twice the MSS (linux-2.6.39.1/net/ipv4/tcp_input.c, line 174); for example, with a 64 KB receive window and an MSS of 1448 bytes that is roughly 22. Each time an ACK packet is sent, quick is decremented by 1.

5. About the TCP_CORK option

The TCP_CORK option, like TCP_NODELAY, controls the Nagle algorithm.

Turning on the TCP_NODELAY option means that a packet is sent immediately no matter how small it is, without being held back by the Nagle algorithm (it is, of course, still subject to the congestion window, as the trace above showed).

If you compare a TCP connection to a pipe, the TCP_CORK option acts like a plug. To set the TCP_CORK option is to plug the pipe with a plug, while to cancel the TCP_CORK option is to unplug the plug. For example, the following code:

int on = 1;
setsockopt(sockfd, SOL_TCP, TCP_CORK, &on, sizeof(on)); /* set TCP_CORK */
write(sockfd, ...);    /* e.g., http header */
sendfile(sockfd, ...); /* e.g., http body */
on = 0;
setsockopt(sockfd, SOL_TCP, TCP_CORK, &on, sizeof(on)); /* unset TCP_CORK */

When the TCP_CORK option is set, the connection does not send small packets: data goes out only when it amounts to an MSS-sized segment. When the data transfer is finished, the option usually needs to be cleared so that any remaining data smaller than the MSS is sent out promptly. If the application knows that several pieces of data can be sent together (such as the header and body of an HTTP response), it is advisable to set the TCP_CORK option so that no extra delay is introduced between them. Web servers and file servers commonly use this option to improve performance and throughput.

The well-known high-performance web server Nginx enables the TCP_CORK option when it uses sendfile mode: set tcp_nopush to on in the nginx.conf configuration file. (TCP_NOPUSH and TCP_CORK provide similar functionality; NOPUSH is the BSD implementation and CORK is the Linux one.) In addition, to reduce system calls in pursuit of maximum performance, for short connections (where the connection is closed right after the data has been sent, Keep-Alive HTTP persistent connections excepted) Nginx does not clear the TCP_CORK option with another setsockopt call, because closing the connection automatically cancels TCP_CORK and flushes out the remaining data.

The above is how to solve the delay problem caused by TCP's delayed acknowledgement (Delayed Ack) mechanism under Linux. Hopefully you have picked up some useful knowledge or skills along the way.
