
How to understand keepalive and time_wait in TCP

2025-07-08 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains keepalive and time_wait in TCP in detail. Interested readers can use it as a reference; I hope you find it helpful.

TCP is a stateful communication protocol. Stateful here means that both sides of the communication maintain the state of the connection.

1. TCP keepalive

Let's briefly review how a TCP connection is established and torn down. (Here we consider only the main flow; packet loss, congestion, windows, failed retries, and so on will be discussed in detail later.)

First, the client sends a SYN (Synchronize Sequence Numbers) packet to the server, telling the server that it wants to connect. The SYN packet mainly carries the client's sequence number (seq). The server replies with SYN+ACK: the SYN part is like the client's, except that it carries the server's sequence number, and the ACK part acknowledges the client's SYN and accepts the connection. Finally, the client sends an ACK to acknowledge the server's SYN. At this point the connection between client and server is established. The whole process is called the three-way handshake.

After the connection is established, the client or the server can send data over the socket, and the peer acknowledges the data it receives with an ACK.

After the data exchange is finished, the client usually sends a FIN packet to tell the other end that it is going to disconnect. The other end first acknowledges the FIN with an ACK, and then sends its own FIN to tell the client that it has closed as well. Finally, the client replies with an ACK to confirm that the connection is terminated. The whole process is called the four-way wave.

TCP's performance is often criticized. In addition to the extra TCP and IP headers, it needs a three-way handshake to establish a connection and a four-way wave to close it. If only a small amount of data is sent, the useful payload is a very small share of what is actually transmitted.

Is it possible to establish a connection once and keep reusing it? Yes, but this leads to another problem: what if connections are never released and the ports are exhausted? This brings us to the first topic of today's discussion, TCP keepalive. With TCP keepalive, a connection is kept open after it is established instead of being closed as soon as the data transfer completes, and the keepalive mechanism probes whether the connection is still alive.

Linux controls keepalive with three parameters: the keepalive time net.ipv4.tcp_keepalive_time, the keepalive interval net.ipv4.tcp_keepalive_intvl, and the number of keepalive probes net.ipv4.tcp_keepalive_probes. The defaults are 7200 seconds (2 hours), 75 seconds, and 9 probes, respectively. With TCP's own keepalive mechanism and these defaults, it takes at least 2 hours plus 9 x 75 seconds, that is 2 hours 11 minutes 15 seconds, for a Linux system to drop a dead connection. For example, after logging in to a server over SSH, you can see that the keepalive timer of that TCP connection is 2 hours; a probe packet is sent 2 hours later to check whether the peer is still connected.

We are discussing TCP keepalive because a leaked TCP connection was found on a server:

    # ll /proc/11516/fd/10
    lrwx------ 1 root root 64 Jan 3 19:04 /proc/11516/fd/10 -> socket:[1241854730]
    # date
    Sun Jan 5 17:39:51 CST 2020

The connection was established two days earlier, but the peer had already gone away (an abnormal disconnect). The connection was never released because an older version of Go was in use (there was a problem prior to 1.9).

To solve this kind of problem, we can use TCP's keepalive mechanism. Newer versions of Go support setting the keepalive time when a connection is established. First, look at the DialContext method in the net package, which establishes TCP connections:

    if tc, ok := c.(*TCPConn); ok && d.KeepAlive >= 0 {
        setKeepAlive(tc.fd, true)
        ka := d.KeepAlive
        if d.KeepAlive == 0 {
            ka = defaultTCPKeepAlive
        }
        setKeepAlivePeriod(tc.fd, ka)
        testHookSetKeepAlive(ka)
    }

defaultTCPKeepAlive is 15s. For an HTTP connection made with the default client, the keepalive time is set to 30s:

    var DefaultTransport RoundTripper = &Transport{
        Proxy: ProxyFromEnvironment,
        DialContext: (&net.Dialer{
            Timeout:   30 * time.Second,
            KeepAlive: 30 * time.Second,
            DualStack: true,
        }).DialContext,
        ForceAttemptHTTP2:     true,
        MaxIdleConns:          100,
        IdleConnTimeout:       90 * time.Second,
        TLSHandshakeTimeout:   10 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
    }

Let's test the default client's keepalive with a simple demo:

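    package main

    import (
        "fmt"
        "io/ioutil"
        "net/http"
        "sync"
        "time"
    )

    func main() {
        wg := &sync.WaitGroup{}
        c := http.DefaultClient
        // Two goroutines keep requesting the same server through the default client.
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for {
                    r, err := c.Get("http://10.143.135.95:8080")
                    if err != nil {
                        fmt.Println(err)
                        return
                    }
                    // Drain and close the body so the connection can be reused.
                    _, err = ioutil.ReadAll(r.Body)
                    r.Body.Close()
                    if err != nil {
                        fmt.Println(err)
                        return
                    }
                    time.Sleep(30 * time.Millisecond)
                }
            }()
        }
        wg.Wait()
    }

After running the program, you can inspect the connections. The keepalive timer is initially set to 30s.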

The timer then counts down continuously; after it reaches 0, it is reset to 30s.

The whole process can be observed by capturing packets with tcpdump.

    # tcpdump -i bond0 port 35832 -nvv -A

In practice, many applications do not rely on TCP's keepalive mechanism for liveness detection, because the default detection time of more than two hours is unacceptable for many real-time systems. The usual approach is periodic checking at the application layer: the application sends heartbeats from time to time using a PING-PONG mechanism (one round trip, like a rally in ping-pong), for example websocket ping/pong.

2. TCP time_wait

The second topic I'd like to share with you is TCP's time_wait state.

Why is the time_wait state needed? Why not go straight to the closed state? Entering the closed state directly would free resources for new connections sooner, instead of having to wait for 2MSL (60 seconds on Linux).

There are two reasons:

The first is to guard against stray packets from an old connection. As shown in the following figure, suppose the third packet of the first connection is delayed by a fault in the underlying network. If a new connection is established before the late packet arrives, the late packet is accepted by the new connection and corrupts the data it receives.

The second reason is even simpler: if the last ACK is lost, the other party stays in the last_ack state. If a new connection is initiated at that point, the other party answers with an RST packet to reject the request, so the new connection cannot be established.

The time_wait state is designed for these purposes. Under high concurrency, however, it helps if connections in time_wait can be reused. time_wait reuse means that a connection in the time_wait state can go from time_wait back to established and be used again. The Linux kernel controls whether time_wait reuse is enabled through the net.ipv4.tcp_tw_reuse parameter.
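Before enabling reuse, it can be useful to check how many sockets are actually sitting in time_wait. The following is a minimal sketch, assuming a Linux host where /proc/net/tcp is readable; countTimeWait is a name chosen for illustration, and the file only lists IPv4 sockets. In that file the state column (st) holds the hexadecimal code 06 for time_wait.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// countTimeWait counts the rows of /proc/net/tcp-formatted text whose state
// column ("st", the fourth field) is 06, the kernel's code for time_wait.
func countTimeWait(procNetTCP string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(procNetTCP))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Columns: sl, local_address, rem_address, st, ...
		if len(f) > 3 && f[3] == "06" {
			n++
		}
	}
	return n
}

func main() {
	data, err := os.ReadFile("/proc/net/tcp") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/net/tcp:", err)
		return
	}
	fmt.Println("time_wait sockets:", countTimeWait(string(data)))
}
```

If the count stays small compared with the available local port range, there is little reason to change net.ipv4.tcp_tw_reuse at all.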

Readers may wonder: wasn't time_wait designed to solve the two problems above? Won't reusing such connections directly bring those problems back? To answer that, we first need to introduce TCP timestamps, which Linux enables by default (net.ipv4.tcp_timestamps = 1).

With timestamps enabled, the first problem (the stray packet) is handled because the late packet carries a timestamp that is too old and is discarded directly, so the data of the new connection is not disturbed. For the second problem, with reuse enabled, when the other party is in the last_ack state it answers the new SYN with a FIN,ACK packet; the client then sends an RST to the server, which closes the stale connection, so the client can send the SYN again and establish the new connection.
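The idea behind the timestamp check can be illustrated with a few lines of code. This is a simplified sketch of the comparison only, not the kernel's implementation: olderThan and acceptSegment are hypothetical names, and the timestamp values are made up. The signed subtraction keeps the comparison correct when the 32-bit timestamp counter wraps around.

```go
package main

import "fmt"

// olderThan reports whether 32-bit timestamp a is older than b. Converting
// the difference to int32 keeps the result correct across counter wraparound.
func olderThan(a, b uint32) bool {
	return int32(a-b) < 0
}

// acceptSegment (hypothetical helper) rejects a segment whose timestamp is
// older than the most recent one already seen from the same peer.
func acceptSegment(lastSeen, segTS uint32) bool {
	return !olderThan(segTS, lastSeen)
}

func main() {
	lastSeen := uint32(5000) // latest timestamp recorded for this address pair

	fmt.Println(acceptSegment(lastSeen, 1200)) // false: delayed packet from the old connection
	fmt.Println(acceptSegment(lastSeen, 5300)) // true: packet from the new connection
}
```

A late packet from the old connection necessarily carries a timestamp from before the connection was reused, which is why it fails the check.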

Finally, a reminder: before Linux kernel 4.12 there was, besides tcp_tw_reuse, another parameter called tcp_tw_recycle, which forcibly reclaims connections in the time_wait state. It causes packet loss in NAT environments, so enabling it is not recommended.

That's all for this introduction to keepalive and time_wait in TCP. I hope the content above is helpful. If you found the article useful, feel free to share it so more people can see it.
