Talk about TCP long connections and heartbeats


Updated: 2025-07-01. Source: SLTechnology News&Howtos


1 Preface

For many Java programmers, knowledge of TCP may stop at the three-way handshake and the four-way close. I think there are two main reasons: first, TCP itself is somewhat abstract (compared with HTTP at the application layer); second, developers who do not work on frameworks rarely need to touch the details of TCP. In fact, I do not fully understand many of those details myself. This article mainly addresses the questions about long connections and heartbeats that some people raised in a WeChat discussion group.

In Java, TCP communication most likely involves Socket or Netty, so this article borrows some of their APIs and configuration parameters to help with the explanation.

2 Long connections and short connections

TCP itself does not distinguish between long and short connections. Whether a connection is long or short depends entirely on how we use it.

Short connection: a Socket is created for each communication, and socket.close() is called as soon as that communication ends. This is the usual meaning of a short connection. Its advantage is that it is simple to manage: every connection that exists is a usable connection, so no extra control mechanism is needed.

Long connection: the connection is not closed after each communication, so it can be reused. Its advantage is that it saves the time needed to establish a connection.

The advantages of one are the disadvantages of the other. If we want simplicity and do not need high performance, short connections are a good fit, and we do not have to manage connection state. If we want performance and use long connections, we have to deal with a range of problems, such as maintaining end-to-end connections and keeping them alive.

Long connections are also often used to push data. We usually think of communication in terms of the request/response model, but because TCP is full duplex, it can also carry two-way communication. With a long connection, a push model is easy to implement.

There is not much more to say about short connections, so the rest of the article focuses on long connections. Pure theory is a bit dry, so I will use some practices from Dubbo, an RPC framework, to discuss TCP.
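To make the two usage patterns concrete, here is a minimal sketch using only java.net.Socket. The class ConnectionStyles, the nested LineClient, and the toy line-based echo protocol are all invented for this illustration; they are not taken from Dubbo or Netty.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class ConnectionStyles {

    /** Client side of one TCP connection speaking a toy line-based echo protocol (hypothetical). */
    static final class LineClient implements Closeable {
        private final Socket socket;
        private final BufferedReader in;
        private final PrintWriter out;

        LineClient(int port) throws IOException {
            socket = new Socket("127.0.0.1", port);
            in = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            out = new PrintWriter(new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
        }

        String request(String msg) throws IOException {
            out.println(msg);      // autoflush sends the line immediately
            return in.readLine();  // block until the echoed line arrives
        }

        @Override
        public void close() throws IOException {
            socket.close();
        }
    }

    /** Short connection: create a socket, exchange one message, close it. */
    static String shortRequest(int port, String msg) throws IOException {
        try (LineClient client = new LineClient(port)) {
            return client.request(msg);
        }
    }

    /** Starts a tiny echo server on a free local port; daemon threads serve each client. */
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0); // port 0: let the OS choose a free port
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket peer = server.accept();
                    Thread worker = new Thread(() -> echo(peer));
                    worker.setDaemon(true);
                    worker.start();
                }
            } catch (IOException closed) {
                // accept() fails once the server socket is closed; stop accepting
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    private static void echo(Socket peer) {
        try (Socket s = peer;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("echo:" + line);
            }
        } catch (IOException ignored) {
            // the client went away; try-with-resources already closed the socket
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startEchoServer()) {
            int port = server.getLocalPort();
            // Short connections: a new socket (and a new handshake) per request.
            System.out.println(shortRequest(port, "a"));
            System.out.println(shortRequest(port, "b"));
            // Long connection: one socket reused for several requests.
            try (LineClient client = new LineClient(port)) {
                System.out.println(client.request("c"));
                System.out.println(client.request("d"));
            }
        }
    }
}
```

The short-connection path never has to ask whether its socket is still usable, because it has just created it. The long-connection path skips the repeated handshakes, but from the second request on it simply trusts that the connection is still alive.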

3 Long connections in a service governance framework

As mentioned earlier, a framework that pursues performance will inevitably choose long connections, so Dubbo is a good way to understand TCP. We start two Dubbo applications: a server that listens on local port 20880 (the default port of the Dubbo protocol) and a client that sends requests in a loop. Run lsof -i:20880 to see how the port is used:

*:20880 (LISTEN) shows that Dubbo is listening on local port 20880 and handles the requests sent to it.

The last two lines show the requests being sent, which confirms that TCP is a two-way communication process. Because both Dubbo applications run on the same machine, you can see local port 53078 communicating with port 20880. We did not set the client port 53078 manually; it was assigned at random. This also shows that even the side that sends requests needs to occupy a port.

A word about the FD column: it is the file descriptor (file handle). Every new connection occupies a new file descriptor, so if you see a "Too many open files" exception while using TCP, check whether you are creating too many connections without closing them. Careful readers will also notice another benefit of long connections: they occupy fewer file descriptors.

4 Maintaining long connections

Because the services a client calls may be spread across several servers, the client naturally needs to establish a long connection with each peer. The first problem we run into when using long connections is how to maintain them.

// Client
public class NettyHandler extends SimpleChannelHandler {
    private final Map<String, Channel> channels = new ConcurrentHashMap<String, Channel>(); // <ip:port, channel>
}

// Server
public class NettyServer extends AbstractServer implements Server {
    private Map<String, Channel> channels; // <ip:port, channel>
}

In Dubbo, both the client and the server maintain end-to-end long connections keyed by ip:port, and Channel is the abstraction of a connection. Here we focus on the long connections in NettyHandler. The fact that the server also keeps a collection of long connections is a Dubbo design choice that we will come back to later.
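The idea of keeping one connection per ip:port key can be sketched independently of Dubbo. ChannelRegistry, its connector function, and its method names below are hypothetical and exist only for this example; the type parameter C stands in for whatever connection abstraction a framework uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical registry that keeps at most one connection per "ip:port" key. */
class ChannelRegistry<C> {
    private final Map<String, C> channels = new ConcurrentHashMap<>();
    private final Function<String, C> connector; // creates a connection for an "ip:port" address

    ChannelRegistry(Function<String, C> connector) {
        this.connector = connector;
    }

    static String key(String ip, int port) {
        return ip + ":" + port;
    }

    /** Returns the existing connection for the peer, or creates and stores a new one. */
    C getOrConnect(String ip, int port) {
        return channels.computeIfAbsent(key(ip, port), connector);
    }

    /** Forgets the connection, for example after it was found to be dead. */
    C remove(String ip, int port) {
        return channels.remove(key(ip, port));
    }

    int size() {
        return channels.size();
    }

    public static void main(String[] args) {
        // A String stands in for a real connection object in this demo.
        ChannelRegistry<String> registry = new ChannelRegistry<>(address -> "channel to " + address);
        String first = registry.getOrConnect("10.0.0.1", 20880);
        String again = registry.getOrConnect("10.0.0.1", 20880);
        System.out.println(first == again);    // the same object is reused
        System.out.println(registry.size());
    }
}
```

computeIfAbsent makes "look up or create" a single atomic step on a ConcurrentHashMap, so two threads asking for the same peer at the same time do not open two connections.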

5 Keeping connections alive

This topic takes more discussion and touches on more points. First, we need to be clear about why connections need to be kept alive at all. After the two sides have established a connection, the link may become unusable because of network problems, and then the long connection cannot be used. Note that seeing a connection in the ESTABLISHED state with tools such as netstat or lsof is not very reliable, because the connection may already be dead without the system noticing, not to mention the trickier case of a connection that only appears to be alive. Making sure that long connections are actually usable takes real technical effort.

6 Keeping connections alive: KeepAlive

The first thing that comes to mind is the KeepAlive mechanism of TCP. KeepAlive is not part of the core TCP specification (RFC 1122 describes it as an optional feature), but most operating systems implement it. Once KeepAlive is enabled, if no data is transferred on the link for a certain period (7200 s by default, parameter tcp_keepalive_time), the TCP layer sends a KeepAlive probe to check whether the connection is still usable. If a probe goes unanswered, it is retried (9 times by default on Linux, parameter tcp_keepalive_probes) at an interval of 75 s (parameter tcp_keepalive_intvl). Once all probes have failed, the connection is considered unavailable.

Enable KeepAlive in Netty:

bootstrap.option(ChannelOption.SO_KEEPALIVE, true)
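Outside Netty, the same switch on a plain java.net.Socket is setKeepAlive(true), which sets the SO_KEEPALIVE socket option. The sketch below is only an illustration; the class name KeepAliveSwitch and the helper connectWithKeepAlive are made up for it.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

class KeepAliveSwitch {

    /** Opens a connection and turns on SO_KEEPALIVE for it (hypothetical helper). */
    static Socket connectWithKeepAlive(String host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        // Only the on/off switch is set here; how long the connection must be idle
        // and how often probes are sent is decided by the operating system settings.
        socket.setKeepAlive(true);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // A local listening socket is enough for the TCP handshake to complete.
        try (ServerSocket server = new ServerSocket(0);
             Socket socket = connectWithKeepAlive("127.0.0.1", server.getLocalPort())) {
            System.out.println("SO_KEEPALIVE enabled: " + socket.getKeepAlive());
        }
    }
}
```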

To set the KeepAlive parameters on Linux, edit the /etc/sysctl.conf file:

net.ipv4.tcp_keepalive_time=90
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=2
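As a rough worked example, the time an idle dead connection can go unnoticed is about tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes. The small sketch below evaluates that for the values above; the class and method names are invented for the illustration, and the result is an approximation rather than a kernel guarantee.

```java
class KeepAliveMath {

    /**
     * Approximate time in seconds before an idle dead connection is declared broken:
     * the idle period before the first probe plus one interval per unanswered probe.
     */
    static long worstCaseDetectionSeconds(long keepaliveTime, long keepaliveIntvl, int keepaliveProbes) {
        return keepaliveTime + keepaliveIntvl * keepaliveProbes;
    }

    public static void main(String[] args) {
        // Values from the sysctl.conf example: 90 + 15 * 2
        System.out.println(worstCaseDetectionSeconds(90, 15, 2) + " s"); // prints "120 s"
    }
}
```

With these settings an idle broken link is noticed after about two minutes, compared with more than two hours when tcp_keepalive_time is left at 7200 s.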

The KeepAlive mechanism ensures the availability of the connection at the network level, but at the application framework level we think this is not enough, mainly for three reasons:

The KeepAlive switch is turned on by the application, but the specific parameters (such as the number of probes and the probe interval) are set at the operating-system level in /etc/sysctl.conf, which is not flexible enough for applications.

KeepAlive probing only works while the link is idle. What happens if data is being sent, the physical link is down, and the operating system still reports the connection as ESTABLISHED? In that case TCP's retransmission mechanism takes over. Keep in mind that TCP timeout retransmission uses exponential backoff by default, so this is also a long process.

KeepAlive itself is network-oriented, not application-oriented. A connection may be unusable because of a GC problem in the application or high system load while the network is still perfectly fine. At that point the application is no longer responsive, so the connection should naturally be considered unavailable.

So keeping connections alive at the application level is a must.

7 Keeping connections alive: application-layer heartbeat

We finally reach the heartbeat mentioned in the title, which is another TCP-related point this article wants to emphasize. As the previous section explained, KeepAlive at the network level is not enough to guarantee connection availability at the application level. This section discusses how a heartbeat at the application layer keeps connections alive.

How should we understand an application-layer heartbeat? Put simply, the client starts a scheduled task that sends requests to the connected peer (these are special heartbeat requests), and the server handles them specially and returns a response. If heartbeats keep going unanswered, the client considers the connection unavailable and actively disconnects. Service governance frameworks differ in their strategies for heartbeats, connection establishment, disconnection, and blacklisting, but most of them implement a heartbeat at the application layer, and Dubbo is no exception.
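The decision that such a scheduled task makes on every tick can be sketched as a small state holder. This is not Dubbo's implementation: HeartbeatMonitor, its Action enum, and the interval and timeout parameters are assumptions made for this example, and time is passed in explicitly so that the logic stays deterministic.

```java
/** Hypothetical per-connection heartbeat bookkeeping; a scheduled task would call check() periodically. */
class HeartbeatMonitor {

    enum Action { NONE, SEND_HEARTBEAT, DISCONNECT }

    private final long heartbeatIntervalMillis; // send a heartbeat after this much silence
    private final long heartbeatTimeoutMillis;  // give up after this long without reading anything
    private volatile long lastReadMillis;
    private volatile long lastWriteMillis;

    HeartbeatMonitor(long heartbeatIntervalMillis, long heartbeatTimeoutMillis, long nowMillis) {
        this.heartbeatIntervalMillis = heartbeatIntervalMillis;
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
        this.lastReadMillis = nowMillis;
        this.lastWriteMillis = nowMillis;
    }

    /** Call whenever anything (a normal response or a heartbeat response) is read from the peer. */
    void onRead(long nowMillis) {
        lastReadMillis = nowMillis;
    }

    /** Call whenever anything (a normal request or a heartbeat request) is written to the peer. */
    void onWrite(long nowMillis) {
        lastWriteMillis = nowMillis;
    }

    /** Decides what the scheduled task should do at the given time. */
    Action check(long nowMillis) {
        if (nowMillis - lastReadMillis >= heartbeatTimeoutMillis) {
            return Action.DISCONNECT;      // nothing has come back for too long
        }
        if (nowMillis - lastReadMillis >= heartbeatIntervalMillis
                || nowMillis - lastWriteMillis >= heartbeatIntervalMillis) {
            return Action.SEND_HEARTBEAT;  // the connection has been quiet; probe the peer
        }
        return Action.NONE;                // recent traffic already proves the peer is alive
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(1000, 3000, 0);
        System.out.println(monitor.check(500));   // NONE
        System.out.println(monitor.check(1000));  // SEND_HEARTBEAT
        monitor.onWrite(1000);                    // heartbeat request sent
        monitor.onRead(1100);                     // heartbeat response received
        System.out.println(monitor.check(1500));  // NONE
        System.out.println(monitor.check(4100));  // DISCONNECT: silent since 1100
    }
}
```

In a real client, check() would typically be driven by something like a ScheduledExecutorService, with System.currentTimeMillis() as the clock, and DISCONNECT would trigger whatever close or reconnect policy the framework uses.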

8 Design details of the application-layer heartbeat

Dubbo, for example, supports an application-layer heartbeat. Both the client and the server start a HeartBeatTask: the client starts it in HeaderExchangeClient, and the server starts it in HeaderExchangeServer. Earlier we left a question open: why does Dubbo also maintain a Map of connections on the server side? Mainly to serve the heartbeat. When the heartbeat task finds that a connection is unavailable, it branches depending on whether it is running on the client or the server: the client reconnects, while the server simply closes the connection.

// HeartBeatTask
if (channel instanceof Client) {
    ((Client) channel).reconnect();
} else {
    channel.close();
}

Readers familiar with other RPC frameworks will find that heartbeat mechanisms really do differ a lot between frameworks. Heartbeat design is also tied to connection creation, reconnection, and blacklisting, and has to be analyzed framework by framework.

Besides the scheduled task, heartbeats also need support at the protocol level. For the simplest example, look at nginx health checks; the Dubbo protocol likewise needs to support heartbeats. If heartbeat requests were treated as normal traffic, they would put extra load on the server and interfere with rate limiting, among many other problems.

Flag is the flag byte of the Dubbo protocol, 8 bits in total. The lower four bits indicate the serialization tool used for the message body (hessian by default). In the upper four bits, the first bit set to 1 marks a request, the second bit set to 1 marks two-way transmission (a response is expected), and the third bit set to 1 marks a heartbeat event.
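Reading the three high bits described above comes down to a few bit masks. The sketch below follows the layout given in the text; the class name FlagBits, its constants, and the sample flag values are chosen for this illustration and are not copied from the Dubbo source.

```java
/** Illustrative decoding of the three high bits of the flag byte described above. */
class FlagBits {
    static final int FLAG_REQUEST = 0x80; // first (highest) bit: this message is a request
    static final int FLAG_TWOWAY  = 0x40; // second bit: a response is expected
    static final int FLAG_EVENT   = 0x20; // third bit: heartbeat event

    static boolean isRequest(byte flag) {
        return (flag & FLAG_REQUEST) != 0;
    }

    static boolean isTwoWay(byte flag) {
        return (flag & FLAG_TWOWAY) != 0;
    }

    static boolean isHeartbeatEvent(byte flag) {
        return (flag & FLAG_EVENT) != 0;
    }

    public static void main(String[] args) {
        // The low bits (0x02) stand for an arbitrary serialization id in this demo.
        byte heartbeat = (byte) (FLAG_REQUEST | FLAG_TWOWAY | FLAG_EVENT | 0x02); // 0xE2
        byte ordinary  = (byte) (FLAG_REQUEST | FLAG_TWOWAY | 0x02);              // 0xC2
        System.out.println(isHeartbeatEvent(heartbeat)); // true
        System.out.println(isHeartbeatEvent(ordinary));  // false
    }
}
```

A server that checks the event bit before dispatching can answer heartbeats right away and keep them out of the path that ordinary requests take.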

Heartbeat requests should be treated differently from ordinary requests.

9 Note the difference between TCP KeepAlive and HTTP Keep-Alive

HTTP Keep-Alive is meant for connection reuse: request and response data are transmitted serially over the same connection.

TCP KeepAlive is meant for keeping connections alive: it acts as a heartbeat and detects connection errors.

They are two entirely different concepts.

10 Common KeepAlive exceptions

Applications that enable TCP KeepAlive can generally catch the following kinds of errors:

ETIMEDOUT (timeout). If no ACK has been received after the keepalive probes have been sent (tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes), the socket is closed. In Java: java.io.IOException: Connection timed out

EHOSTUNREACH (host unreachable). This error is reported to the upper-layer application via ICMP. In Java: java.io.IOException: No route to host

Connection reset. The peer may have crashed and restarted; when it then receives a packet for a connection it no longer knows about, all it can do is answer with a reset. In Java: java.io.IOException: Connection reset by peer

11 Summary

There are three practical ways to use KeepAlive:

Solution 1: use TCP KeepAlive. By default the KeepAlive period is 2 hours; leaving it unchanged counts as misuse and wastes resources, because the kernel starts a keep-alive timer for every connection, so N connections mean N timers. The advantages are clear:

Liveness detection happens at the TCP protocol layer, and the kernel does it automatically for upper-layer applications.

A kernel-level timer is more efficient than one in the application.

The application only has to handle sending and receiving data and notifications of connection errors.

The packets are more compact.

Solution 2: turn off TCP KeepAlive and rely entirely on an application-layer heartbeat. The heartbeat is controlled by the application, which is more flexible. For example, the heartbeat period can be set at the application level and adapted to private protocols.

Solution 3: use the business heartbeat and TCP KeepAlive together so that they complement each other. The TCP probing period and the application heartbeat period should be coordinated and should not differ too much; otherwise the desired effect will not be achieved.

Each framework is designed differently. For example, Dubbo uses solution 3, while Alibaba's internal HSF framework does not enable TCP KeepAlive and relies only on the application heartbeat. Like the heartbeat strategy, this depends on the overall design of the framework.
