How to troubleshoot when there are too many TIME_WAIT states in the server

2025-04-04 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains how to troubleshoot a server that has accumulated an excessive number of connections in the TIME_WAIT state. The walkthrough is detailed but easy to follow, and should serve as a useful reference.

I. Overview

(1) Phenomenon

The server shows two symptoms. First, the number of established TCP connections is small, no more than 10, yet there are about 2,000 connections in the TIME_WAIT state. Second, the server's resources should be nearly idle given how little user traffic it receives, yet CPU usage hovers around 40% and does not drop even at night. Several web projects run on the machine, each isolated in its own Docker container.

(2) Relevant knowledge

A TCP connection is established with a three-way handshake and closed with a four-way wave (FIN exchange).

In the three-way handshake: first, the active end sends a SYN to the passive (listening) end and enters the SYN-SENT state; second, the passive end replies with an ACK plus its own SYN; third, the active end sends an ACK confirming the passive end's SYN. Both sides then enter the ESTABLISHED state, meaning the connection is up.

In the four-way close: first, the active side initiates the disconnect by sending a FIN and enters FIN-WAIT-1; second, on receiving the FIN, the passive side enters CLOSE-WAIT and promptly sends an ACK back, at which point the active side moves to FIN-WAIT-2; third, once the passive side's application has finished closing, it sends its own FIN and enters LAST-ACK; fourth, the active side receives that FIN, sends a final ACK, and immediately enters TIME-WAIT, where it waits before the socket is reclaimed.

In other words, whichever side ends up in TIME-WAIT is the side that initiated the close. That rules out users frequently closing web pages: the server itself is actively disconnecting, and the connections in TIME-WAIT are not being reclaimed fast enough.
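The state distribution described above is easy to tally with an awk one-liner. As a sketch, here it runs on a few hypothetical netstat-style lines (the addresses are made up) instead of live `netstat -tan` output, so the expected counts are clear:

```shell
# Hypothetical lines standing in for `netstat -tan` output.
netstat_sample='tcp 0 0 10.25.20.251:6601 172.17.0.5:51000 TIME_WAIT
tcp 0 0 10.25.20.251:6601 172.17.0.5:51001 TIME_WAIT
tcp 0 0 10.25.20.251:22 192.168.1.5:40000 ESTABLISHED'
# Column 6 is the connection state; count occurrences of each state.
echo "$netstat_sample" | awk '{print $6}' | sort | uniq -c | sort -rn
```

On a live system, replace the sample with `netstat -tan | awk 'NR>2 {print $6}' | sort | uniq -c` (skipping the two header lines), or use `ss -tan` on modern distributions.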

II. Hypotheses

(1) Network

On the network side, the possibilities are that the network itself is unreliable, or that the server is under attack.

(2) Application

Misconfigured middleware parameters could be causing the middleware to drop connections, an application error could be triggering active disconnects, or the application itself could be consuming too many resources.
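One concrete middleware example (an assumption for illustration, not a finding of this investigation): by default, nginx speaks HTTP/1.0 to its upstreams and closes the connection after every proxied request, which piles TIME_WAIT sockets up on the proxy. Enabling upstream keepalive avoids that. The upstream name and address below are hypothetical:

```nginx
upstream api_backend {              # hypothetical name and backend address
    server 10.25.20.251:6601;
    keepalive 32;                   # pool of idle connections kept open to the backend
}

server {
    location /api/ {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;         # upstream keep-alive requires HTTP/1.1
        proxy_set_header Connection ""; # drop the default "Connection: close"
    }
}
```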

III. Investigation

This server hosts three projects, each on a LANMP stack (Linux + Apache + Nginx + MySQL + PHP). The complication is that several projects share the server, each behind the same reverse proxy; the saving grace is that each backend runs in its own Docker container, keeping them separate.

(1) IPs in the TCP connections

1. The following figure shows the IPs of the containers.

Command:

for i in $(docker ps | awk 'NR>1 {print $NF}'); do echo -e "$i \c"; docker inspect --format '{{.NetworkSettings.IPAddress}}' "$i"; done

2. The following figure shows the local IP in the connection

Command:

netstat -tn | grep TIME_WAIT | awk '{print $4}' | sort | uniq -c | sort -nr | head

The first entry in the list is our local IP; port 6601 is the listening port of the API project. Among the connections in TIME_WAIT, the API project's backend receives the most requests, and the reverse proxy has presumably been hit heavily as well.
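As a sketch of what that pipeline reports, here it is applied to a few hypothetical TIME_WAIT lines, grouping connections by local address (column 4 of netstat output):

```shell
# Hypothetical TIME_WAIT lines; column 4 is the local address:port.
tw_sample='tcp 0 0 10.25.20.251:6601 172.17.0.4:50000 TIME_WAIT
tcp 0 0 10.25.20.251:6601 172.17.0.4:50001 TIME_WAIT
tcp 0 0 192.168.42.32:443 192.168.42.1:40000 TIME_WAIT'
echo "$tw_sample" | awk '{print $4}' | sort | uniq -c | sort -nr
```

The busiest local address:port sorts to the top, which is exactly how the API backend on port 6601 stood out here.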

3. The following figure shows the remote IPs connecting to the local API project.

Command:

netstat -ant | grep 10.25.20.251:6601

As the figure shows, all the requests to the API backend come from nginx's IP, which makes sense: the nginx reverse proxy is the entry point. Next, let's see who is making the requests to nginx.

4. The following figure shows the remote IPs in the connections.

Command:

netstat -tn | awk '{print $5}' | sort | uniq -c | sort -nr | head

There are about 600 requests to the API and 300 to nginx, which suggests that of all the TIME-WAIT connections, one part is traffic into nginx and the other part is nginx forwarding to the API.

5. The following figure checks who is making the requests to the web front end (nginx on port 443).

Command:

netstat -ant | grep 192.168.42.32:443

The requests turn out to come from 192.168.42.1. That address is the IP of Docker's virtual bridge interface, which acts as the gateway for all the containers. In other words, these requests are being issued by the containers themselves, though we cannot yet tell which one.

To sum up, we can rule out a network problem. The Apache middleware parameters have not changed, and since so many of the requests target the front-end nginx, the problem is not in the requests reaching Apache. That leaves a code error to consider.

(2) Containers on the host

1. The relationship between the application and the network

Perhaps the TIME-WAIT problem comes from a backend program issuing requests on its own. Here, apache is the backend container of the main project and apache-api is the backend of the API project. The rise in CPU consumed by the web server suggests that the containers' resource usage is driven by exactly these requests. Let's tail the API project's access log.
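Tailing the log and counting entries per timestamp gives a rough requests-per-second estimate. Here is a sketch on hypothetical nginx-style log lines (field 4 is the timestamp at one-second resolution, so counting duplicates of it counts requests per second):

```shell
# Hypothetical access-log lines; field 4 is "[day/month/year:HH:MM:SS".
log_sample='172.17.0.1 - - [06/Jan/2020:10:00:01 +0800] "GET /api/token HTTP/1.1" 200 51
172.17.0.1 - - [06/Jan/2020:10:00:01 +0800] "GET /api/token HTTP/1.1" 200 51
172.17.0.1 - - [06/Jan/2020:10:00:02 +0800] "GET /api/token HTTP/1.1" 200 51'
echo "$log_sample" | awk '{print $4}' | sort | uniq -c
```

In live use, something like `tail -1000 access.log | awk '{print $4}' | sort | uniq -c | tail` shows the recent per-second rate; the exact field position depends on your log_format.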

Real-time monitoring shows the API being requested about 12 times per second. Given the nature of the business and the state of Docker, we can conclude that the internal resource consumption is caused by the main project requesting the API in a loop. Each time the API returns an access_token in its response data, the connection is closed with a disconnect signal, which matches the logic above; it follows that the TIME_WAIT connections are produced by these same requests. And the TIME_WAIT sockets are in fact being recycled: they only look persistent because new ones are constantly generated.
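The real fix is to stop the main project's request loop, but if the sheer volume of TIME_WAIT sockets ever becomes a problem in itself, a few kernel tunables are commonly adjusted. This is a sketch for /etc/sysctl.conf; verify the defaults on your kernel before changing them:

```conf
# /etc/sysctl.conf -- apply with `sysctl -p`
net.ipv4.tcp_tw_reuse = 1           # reuse TIME_WAIT sockets for new outbound connections
net.ipv4.tcp_fin_timeout = 30       # shorten the FIN-WAIT-2 timeout
net.ipv4.tcp_max_tw_buckets = 5000  # cap the total number of TIME_WAIT sockets
# Note: net.ipv4.tcp_tw_recycle was removed in Linux 4.12 and should not be used.
```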

That concludes this walkthrough of troubleshooting too many TIME_WAIT states on a server. Thanks for reading; hopefully it gives you a clearer picture of how to investigate this kind of problem.
