
Example Analysis of close_wait problem

2025-01-19 Update From: SLTechnology News&Howtos (Shulou.com)

This article walks through an example analysis of a close_wait problem in detail. I hope it is helpful to anyone interested.

Preface

A merchant once reported problems accessing our service, with many timeouts. When we logged in to the server to investigate, the shell reported "-bash: fork: retry: Resource temporarily unavailable", and checking the system's TCP connections showed a large number of them in CLOSE_WAIT.

Problem description

The symptoms were:

Merchants could not connect to our service.

Logging in to the server with Xshell reported -bash: fork: retry: Resource temporarily unavailable.

Checking the system's TCP connections showed a large number in CLOSE_WAIT.

The application process was still alive and its log was being written normally.

I analyzed the problem from the outside in: network -> system -> application.

1. The network

Merchants being unable to reach our service could point to many kinds of network problems, so I contacted colleagues in the network department and asked whether merchants had any trouble reaching our network during this period.

2. The system

2.1 Resource temporarily unavailable

To understand "bash: fork: retry: Resource temporarily unavailable", I looked up the error and found the following:

The error can be caused by resource limits, imposed either by the system as a whole or on the individual user. Resource limits can be viewed with ulimit -a, and ulimit -u prints the maximum number of user processes. When that maximum is reached, fork cannot create a new process and bash prints the error above. Insufficient memory (including swap) can also cause it.

Running ulimit -u showed that the maximum number of user processes was 1024.
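The same limit can be checked from code. Below is a minimal Python sketch, assuming a Linux system; the function names are hypothetical. It reads RLIMIT_NPROC, which is the value ulimit -u reports, through the standard resource module. It then counts the threads currently owned by the user by reading the Uid: and Threads: fields of /proc/<pid>/status. On systems without a Linux-style /proc, only the limit part applies.

```python
import os
import resource


def nproc_limits():
    """Return (soft, hard) for RLIMIT_NPROC, i.e. what `ulimit -u` and `ulimit -Hu` report."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def count_user_threads(uid=None, proc_root="/proc"):
    """Count threads owned by `uid` (default: our real UID) via /proc/<pid>/status. Linux only."""
    uid = os.getuid() if uid is None else uid
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited during the scan, or its status is unreadable
        if "Uid" not in fields or "Threads" not in fields:
            continue
        if int(fields["Uid"].split()[0]) == uid:  # first Uid column is the real UID
            total += int(fields["Threads"])
    return total


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"RLIMIT_NPROC soft={soft} hard={hard}, threads owned by this user={count_user_threads()}")
```

If the thread count is close to the soft limit, new forks and new threads will start failing with the error above.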

2.2 A large number of TCP CLOSE_WAIT connections

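CLOSE_WAIT is an intermediate state on the server side (the passive closer) while a connection is being closed: the peer's FIN has arrived, but the local application has not yet closed its socket. Let's first review how TCP releases a connection.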
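Tools such as netstat or ss are the usual way to count CLOSE_WAIT sockets (for example, ss -tan state close-wait). On Linux the underlying data is in /proc/net/tcp, where the st column holds each socket's state as a hex code, and 08 means CLOSE_WAIT. The following is a minimal Python sketch that assumes that file format; count_tcp_states is a hypothetical helper.

```python
from collections import Counter

# Hex state codes used by Linux in /proc/net/tcp (from the kernel's tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Tally connection states from the contents of /proc/net/tcp (or /proc/net/tcp6)."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) > 3:
            # columns: sl, local_address, rem_address, st, ...
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            print(count_tcp_states(f.read())["CLOSE_WAIT"], "sockets in CLOSE_WAIT (IPv4)")
    except FileNotFoundError:
        print("/proc/net/tcp not available on this system")
```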

2.2.1 How TCP releases a connection

Figure: TCP connection release process

Once a TCP connection has been established and data transfer is finished, either side may release the connection. Assume A and B are both in the ESTABLISHED state. A's application process first sends a connection release segment to its TCP, stops sending data, and actively closes the TCP connection. A sets the FIN bit in the release segment's header to 1, with sequence number seq=u, equal to the sequence number of the last byte of data previously sent plus 1. A then enters FIN-WAIT-1 and waits for B's acknowledgment. Note that TCP specifies that a FIN segment consumes one sequence number even if it carries no data.

On receiving the release segment, B sends an acknowledgment with ack=u+1 and its own sequence number seq=v, equal to the sequence number of the last byte of data B previously sent plus 1, and then enters CLOSE-WAIT. B's TCP should notify the higher-level application process. The connection from A to B is now released, and TCP is half-closed: A has no more data to send, but if B sends data, A must still receive it. In other words, the connection from B to A is not yet closed. This state may last for some time.

After receiving B's acknowledgment, A enters FIN-WAIT-2 and waits for B's connection release segment.

When B has no more data to send to A, its application tells TCP to release the connection. B's release segment must set FIN=1. Assume B's sequence number is now w (B may have sent more data while half-closed). B must also repeat the acknowledgment number it sent last time, ack=u+1. B then enters LAST-ACK and waits for A's acknowledgment.

After receiving B's release segment, A must acknowledge it. In the acknowledgment A sets ACK=1 and ack=w+1, and its own sequence number is seq=u+1 (per the TCP standard, the FIN segment it sent earlier consumed one sequence number). A then enters TIME-WAIT. Note that the TCP connection has not yet been released at this point: A must wait for the time set by its timer, 2MSL, before entering CLOSED. MSL is the Maximum Segment Lifetime, and RFC 793 recommends 2 minutes. That value is an engineering choice, and for today's networks MSL=2 minutes may be too long, so TCP allows implementations to use a smaller MSL as appropriate. With MSL=2 minutes, A therefore waits 4 minutes after entering TIME-WAIT before it reaches CLOSED and can establish the next new connection. The TCP connection ends when A deletes the corresponding transmission control block (TCB).

As soon as B receives A's acknowledgment, it enters CLOSED, and B ends the TCP connection after deleting its own TCB. Note that B ends the TCP connection earlier than A.

The connection release above is a four-way handshake, which can also be viewed as two two-way handshakes.
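The release sequence can be replayed as a small model to keep the states and sequence numbers straight. The following Python sketch is an illustration, not real TCP. The names Event and four_way_close are hypothetical, the values of u, v and w are arbitrary, and MSL defaults to the 120 s that RFC 793 recommends. Real stacks often differ: Linux, for example, uses a fixed 60 s TIME_WAIT.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    side: str                   # "A" (active closer) or "B" (passive closer)
    action: str                 # what happens at this step
    new_state: str              # state that side enters afterwards
    seq: Optional[int] = None   # sequence number of a sent segment
    ack: Optional[int] = None   # acknowledgment number of a sent segment


def four_way_close(u: int, v: int, w: int, msl_seconds: int = 120) -> List[Event]:
    """Replay the connection release described above, step by step."""
    return [
        # A's FIN carries seq=u and consumes one sequence number even without data.
        Event("A", "send FIN", "FIN-WAIT-1", seq=u),
        # B acknowledges the FIN; the connection is now half-closed.
        Event("B", "send ACK", "CLOSE-WAIT", seq=v, ack=u + 1),
        Event("A", "receive ACK", "FIN-WAIT-2"),
        # B's application closes the socket; B repeats ack=u+1 (w >= v if B sent more data).
        Event("B", "send FIN+ACK", "LAST-ACK", seq=w, ack=u + 1),
        # A's seq is u+1 because its FIN used sequence number u.
        Event("A", "send ACK", "TIME-WAIT", seq=u + 1, ack=w + 1),
        Event("B", "receive ACK", "CLOSED"),
        Event("A", f"wait 2MSL = {2 * msl_seconds} s", "CLOSED"),
    ]


if __name__ == "__main__":
    for e in four_way_close(u=1000, v=5000, w=5200):
        print(f"{e.side}: {e.action:22} -> {e.new_state:11} seq={e.seq} ack={e.ack}")
```

The model makes the key point visible: B stays in CLOSE-WAIT between its ACK and its own FIN, and only B's application closing the socket moves it on.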

2.2.2 Reasons for a large number of CLOSE-WAIT connections

In CLOSE-WAIT, B already knows that A will send no more data, so B's application should tell TCP to release the connection within a reasonable time. The question, then, is why B's application delayed doing so: had the application hung, or had its resources reached a limit so that it could not tell TCP to release the connection? Recall the "resource temporarily unavailable" error from earlier. Could it be that B's application wanted to tell TCP to release the connection but could not, because no system resources were available?
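From the application's side, CLOSE-WAIT ends only when the code handling the socket notices EOF and calls close(). The following minimal Python sketch uses the standard socket module; handle_connection is a hypothetical name. It shows the pattern for B, the passive closer: recv() returning b"" means A's FIN has arrived, and close() sends B's own FIN. If a handler never reaches close(), or if no thread is free to run it, the socket stays in CLOSE-WAIT. The latter matches the hypothesis above.

```python
import socket


def handle_connection(conn: socket.socket) -> bytes:
    """Read until the peer closes, then close our side too.

    recv() returning b"" means the peer's FIN has arrived and this socket is in
    CLOSE_WAIT; it stays there until close() is called.
    """
    received = b""
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:          # EOF: the peer sent FIN
                break
            received += chunk
    finally:
        conn.close()               # sends our FIN: CLOSE_WAIT -> LAST_ACK -> CLOSED
    return received


if __name__ == "__main__":
    with socket.create_server(("127.0.0.1", 0)) as server:
        client = socket.create_connection(server.getsockname())
        conn, _ = server.accept()
        client.sendall(b"hello")
        client.close()                  # the client is the active closer (A)
        print(handle_connection(conn))  # b'hello'
```

Putting close() in a finally block ensures it runs even when processing raises an exception. Skipping it is a common source of leaked CLOSE_WAIT sockets.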

We examined the applications deployed on the server and found that there were many of them, each handling TCP requests with a thread pool of size 100. On Linux, the ulimit -u limit (RLIMIT_NPROC) counts threads, not just processes, for each user. At business peaks, the total could therefore easily reach the limit of 1024 and trigger the chain of problems described above.
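The reasoning above is simple arithmetic: when every pool is full, the user's task count is roughly the sum of the pool sizes plus whatever other threads exist. The following is a minimal Python sketch. thread_budget is a hypothetical helper, and the application count and "other threads" figure in the usage are made-up illustrations, since the article does not give them.

```python
def thread_budget(pool_sizes, other_threads, nproc_limit=1024):
    """Estimate peak tasks for one user when every request pool is full.

    pool_sizes: request-handling thread-pool size of each application
    other_threads: threads that exist anyway (GC, timers, shells, ...), an estimate
    Returns (peak_tasks, headroom). Negative headroom means fork() and thread
    creation start failing with EAGAIN ("Resource temporarily unavailable").
    """
    peak = sum(pool_sizes) + other_threads
    return peak, nproc_limit - peak


if __name__ == "__main__":
    # Hypothetical: 10 applications with 100-thread pools plus 60 other threads.
    print(thread_budget([100] * 10, other_threads=60))  # (1060, -36)
```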

3. The application

The thread pool the application configured for handling TCP requests had 100 threads, and many applications were deployed on the server at the time.

Solution

Increase the system's limit on the number of user processes.

Migrate less important applications to other servers to reduce the load on this server.
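A persistent increase is normally configured outside the application. Common options are an nproc entry in /etc/security/limits.conf (read by pam_limits, for example a hypothetical line "appuser soft nproc 65535") or LimitNPROC= in a systemd unit. A process can also raise its own soft limit up to the hard limit at startup. The following is a minimal Python sketch of that last option, using the standard resource module; raise_nproc_soft_limit is a hypothetical name.

```python
import resource


def raise_nproc_soft_limit(target: int):
    """Raise this process's soft RLIMIT_NPROC toward `target` (a concrete number).

    Unprivileged processes may raise the soft limit only up to the hard limit;
    raising the hard limit itself requires root or CAP_SYS_RESOURCE. The new value
    applies to this process and children started afterwards. It never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NPROC, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    print("RLIMIT_NPROC (soft, hard):", raise_nproc_soft_limit(4096))
```

Raising the limit only buys headroom. The migration step addresses the underlying load.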

Thinking

Why did this not happen before? Why did this business peak exhaust system resources? What are the system's actual throughput and TPS? Has our business processing capacity dropped compared with before?
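One way to connect these questions to thread-pool sizing is Little's law, a standard queueing result that the article itself does not use. It states that the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. With one thread per in-flight request, that is the number of busy threads needed. The following is a minimal Python sketch; threads_needed is a hypothetical helper, and the numbers in the usage are illustrative.

```python
import math


def threads_needed(tps: float, avg_latency_s: float) -> int:
    """Little's law: in-flight requests = arrival rate x time each request holds a thread."""
    return math.ceil(tps * avg_latency_s)


if __name__ == "__main__":
    print(threads_needed(400, 0.25))  # 100 busy threads at 400 TPS and 250 ms latency
    print(threads_needed(400, 0.5))   # 200: same TPS, latency doubled
```

Under this model, a fixed pool can overflow at a business peak either because TPS rises or because each request takes longer, for example when a downstream dependency slows down. This is why measured TPS and latency matter for the questions above.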

Our quality-control process lacks load testing.

Postscript

With the spread of 5G, network speeds have greatly improved, and network knowledge has become an essential skill.

This concludes the example analysis of the close_wait problem. I hope the content above is helpful to you.
