2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article shows how to analyze abnormal closure of TCP connections, based on a series of tests of how a connection behaves when one side closes, exits, or crashes.
We studied and tested the various ways a TCP connection can be disconnected or fail, in order to analyze the causes and scenarios of dropped connections in network applications (such as tconnd), to help the team analyze and locate abnormal disconnection problems, and to serve as a reference for developers and testers who work with TCP.
Every game suffers from dropped connections to some degree, and in some game projects the drop rate is relatively high. Network access for the self-developed Interactive Entertainment games now basically goes through the tconnd and ProtocalHandler components, so we took part in analyzing the causes of the disconnections.
While investigating the dropped-connection problem of Project A, tconnd added a pipeline log for each connection and ProtocalHandler added a QoS report log for each connection. These logs record the reason for each disconnection and the related statistics, including the underlying TCP error code when a connection was closed abnormally.
Statistical analysis of tconnd's pipeline log and ProtocalHandler's QoS log showed that when a connection is broken abnormally, the TCP error code is mostly "104: Connection reset by peer" (under Linux) or "10054: An existing connection was forcibly closed by the remote host" (under Windows). The error text says that the connection was reset by the peer, but under what circumstances does this happen? To find out, we ran further tests on the different ways a TCP connection can be shut down.
I. Research and Testing of Abnormal TCP Shutdown
1. Server only receives (Recv) messages and does not Send
1.1 Test method
After accepting the TCP connection from the client, the server program sleeps for a few seconds. The client program sends a large number of messages to the peer immediately after connecting and then performs the action under test (exit or wait). The server program starts to Recv messages after the sleep.
Note: the server program was tested in both Linux and Windows versions, but only the Windows version of the client was tested. With a Linux client, the results of some cases will be different.
1.2 Test cases
While the client program is running normally, unplug the network cable and then kill the client program.
Purpose: to simulate a client machine crash, a sudden system restart, a loose network cable, or a network failure.
Conclusion: in this case, the server program does not detect any anomaly and only disconnects the TCP connection after waiting for a "timeout".
After sending a large number of packets, the client program closes the Socket normally and then exits the process (or keeps running).
Purpose: to simulate the normal exit of the client after it has sent its messages.
Conclusion: in this case, the server program successfully receives all the messages and finally receives the "peer shutdown" notification (Recv returns zero).
After sending a large number of packets, the client program exits the process directly without closing the Socket.
Purpose: to simulate a client program that exits and forgets to close the Socket (for example, the process is ended through the close icon of the Windows window and the closing event is not handled to perform a normal exit).
Conclusion: in this case, the server program receives some of the TCP messages and then gets a "104: Connection reset by peer" (under Linux) or "10054: An existing connection was forcibly closed by the remote host" (under Windows) error.
The client program is killed while it is still sending a large number of packets.
Purpose: to simulate a client program that crashes or is terminated abnormally (for example, with "kill -9" under Linux or through the Windows Task Manager).
Conclusion: in this case, the server program quickly gets a "104: Connection reset by peer" (under Linux) or "10054: An existing connection was forcibly closed by the remote host" (under Windows) error.
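The three outcomes seen by the server in these cases (normal shutdown, reset, and silence until a timeout) can be reproduced on one machine. The following is a minimal sketch, not the original test programs: it assumes a Linux host and the loopback interface, `recv_outcome` is a hypothetical helper name, the abnormal exit is approximated by an abortive close through `SO_LINGER`, and the unplugged cable is approximated by a client that simply stays silent.

```python
import socket
import struct
import time


def recv_outcome(client_action: str) -> str:
    """Run one loopback connection and report what the server's recv() sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    server.settimeout(0.5)  # stand-in for the server's own idle timeout
    try:
        if client_action == "close":
            client.close()  # orderly close: the peer sees a FIN
        elif client_action == "abort":
            # Assumption: Linux layout of struct linger (two ints).
            # l_onoff=1, l_linger=0 makes close() send a RST instead of a FIN.
            client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                              struct.pack("ii", 1, 0))
            client.close()
        # "silent": the client does nothing, like a peer that lost its network
        time.sleep(0.1)  # give the FIN or RST time to arrive
        try:
            data = server.recv(4096)
        except ConnectionResetError:  # errno 104 on Linux, 10054 on Windows
            return "reset"
        except socket.timeout:
            return "timeout"
        return "eof" if data == b"" else "data"
    finally:
        client.close()
        server.close()
        listener.close()
```

Calling `recv_outcome("close")`, `recv_outcome("abort")`, and `recv_outcome("silent")` shows the three branches a server-side receive loop has to handle separately.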
2. Server receives (Recv) messages and sends (Send) replies
2.1 Test method
After accepting the TCP connection from the client, the server program sleeps for a few seconds. The client program sends a large number of messages to the peer immediately after connecting and then performs the action under test (exit or wait). After the sleep, the server program starts to Recv messages and Send replies.
Note: the server program was tested in both Linux and Windows versions, but only the Windows version of the client was tested. With a Linux client, the results of some cases may be different.
2.2 Test results
After sending a large number of packets, the client program closes the Socket normally and then exits the process (or keeps running).
Purpose: to simulate the server sending a message to the client after the client has closed the Socket normally but before the server has noticed that the TCP connection is closed.
Conclusion: in this case, after receiving and sending some TCP messages, the server program gets a "32: Broken pipe" (under Linux) or "10053: An established connection was aborted by the software in your host machine" (under Windows) error when it calls Send.
After sending a large number of packets, the client program exits directly without closing the Socket, or the process is killed.
Purpose: to simulate a client program that exits and forgets to close the Socket, or that crashes or is terminated abnormally.
Conclusion: in this case, the server program gets a "104: Connection reset by peer" (under Linux) or "10054: An existing connection was forcibly closed by the remote host" (under Windows) error when it calls Recv or Send.
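The "Broken pipe" case can be sketched in a few lines. This is an illustration under assumptions rather than the original test: it assumes Linux and the loopback interface, `send_after_peer_close` is a hypothetical name, and the short sleeps only give the peer's RST time to come back. Python ignores SIGPIPE by default, so the failure surfaces as an exception instead of terminating the process.

```python
import socket
import time


def send_after_peer_close(max_sends: int = 20) -> list:
    """Keep sending to a peer that has already closed; record each result."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    results = []
    try:
        client.close()   # the peer closes normally; the server never checks
        time.sleep(0.1)  # let the FIN arrive
        for _ in range(max_sends):
            try:
                server.send(b"reply")
                results.append("ok")  # accepted by the local TCP stack
            except (BrokenPipeError, ConnectionResetError) as exc:
                results.append(type(exc).__name__)
                break
            time.sleep(0.05)  # give the peer's RST time to arrive
    finally:
        server.close()
        listener.close()
    return results
```

The first Send is accepted locally because the server's TCP stack does not yet know that the peer will refuse the data; the error only appears on a later call, after the peer has answered with a RST.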
3. Effect and summary
3.1 Summary
TCP reports many kinds of network anomalies (especially error 104 under Linux or error 10054 under Windows). Some come from factors outside our control, such as the network itself, intermediate routers, or NIC drivers. Listed here are the ones that the application itself may cause. These are the situations we need to study and solve further, especially those caused by program crashes:
When a process with a TCP connection exits without closing the Socket, crashes, or is terminated abnormally (Windows client), the peer process of the connection gets a "104: Connection reset by peer" (under Linux) or "10054: An existing connection was forcibly closed by the remote host" (under Windows) error.
When the machine running a process with a TCP connection crashes, the system suddenly restarts, the network cable comes loose, or the network goes down (Windows client), the peer process of the connection may not detect any exception and only disconnects the TCP connection after waiting for a "timeout".
When a process closes the Socket normally and the peer process still sends a message on the connection before it has detected the TCP shutdown event (Windows client), the peer gets a "32: Broken pipe" (under Linux) or "10053: An established connection was aborted by the software in your host machine" (under Windows) error when it calls Send.
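When reading disconnect logs, the summary above can be applied as a simple lookup from the recorded error code to the most likely application-side cause. The sketch below is only an illustration: `likely_cause` and the table are hypothetical names, the table covers nothing beyond the three situations tested here, and network-side causes can produce the same codes.

```python
from typing import Optional

RESET = "peer exited without closing the socket, crashed, or was killed"
ABORT = "local side sent to a peer that had already closed the socket"
SILENT = ("no error code: peer machine or network path may be down; "
          "only a timeout will notice")

# (platform, error code) -> likely application-side cause from the tests above
LIKELY_CAUSE = {
    ("linux", 104): RESET,      # Connection reset by peer
    ("windows", 10054): RESET,  # forcibly closed by the remote host
    ("linux", 32): ABORT,       # Broken pipe
    ("windows", 10053): ABORT,  # aborted by the software in your host machine
}


def likely_cause(platform: str, code: Optional[int]) -> str:
    """Map a logged TCP error code to the cause suggested by the tests."""
    if code is None:
        return SILENT
    return LIKELY_CAUSE.get((platform.lower(), code),
                            "not covered by these tests")
```

For example, `likely_cause("linux", 104)` and `likely_cause("windows", 10054)` return the same description, which matches how the two codes are treated as one case throughout this article.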
3.2 Effect
For the dropped-connection problem of Project A, questionnaires and contact with individual players showed that most disconnections happened when the client program exited directly, so we pushed the project team to implement QoS reporting in the client. The statistics reported by the client's QoS showed that the client crash rate was relatively high and accounted for a large percentage of all disconnections. Other causes also exist, but their share is relatively small.
Therefore, Project A should first fix the client crashes. Doing so will resolve most of the dropped connections.
II. Further Research and Testing of Abnormal TCP Shutdown
1. Background
Project B games have a high percentage of dropped connections during cross-server jumps. Analysis of the ProtocalHandler and tconnd logs showed the following pattern: tconnd closed the Socket immediately after sending the cross-server jump message, the client process received Windows error 10054 before it received the jump message, and the client then tried to reconnect and failed.
Project B implements the cross-server jump by having GameSvr send the jump command to the client together with a Stop request. That is, tconnd closes the current Socket connection immediately after forwarding the jump message to the client. The Project B client also reports messages to the server periodically. How can this cause the client to receive a 10054 error? To answer this, we ran further scenario tests and analysis on TCP connections.
2. Further tests of TCP anomalies
2.1 Test method
The client program and the server program establish a TCP connection. The server program closes the Socket with or without unread messages in its TCP buffer, and the client continues to Send and Recv messages after the peer Socket has been closed.
Note: only the Linux version of the server was tested, but both the Windows and Linux versions of the client were tested.
2.2 Test results
The server has already closed the Socket, and the client then sends data.
Purpose: to test what happens when the TCP peer process has closed the Socket and the local process, which has not yet detected the closure, continues to send messages to the peer.
Conclusion: the first packet is sent successfully, but the second packet fails with the error "10053: An established connection was aborted by the software in your host machine" (under Windows) or "32: Broken pipe", together with a SIGPIPE signal (under Linux).
The server sends data and then closes the Socket. The client sends one more packet and then receives.
Purpose: to test what happens when the TCP peer process closes the Socket after sending data, and the local process sends one packet on the closed connection and then receives.
Conclusion: the client sends the first packet successfully (which causes the server to send a RST packet). When the client then calls Recv, the Windows and Linux programs behave differently:
Windows client program: Recv fails with the error "10053: An established connection was aborted by the software in your host machine".
Linux client program: receives all the message packets normally and finally receives the normal peer shutdown notification (unlike under Windows, the RST does not surface ahead of the data).
The server closes the Socket while there is still unread data in its TCP receive buffer, and the client then receives.
Purpose: to test whether the peer process sees a normal shutdown when the Socket is closed while its receive buffer still holds unread data.
Conclusion: in this case, the server sends a RST packet to the peer instead of the normal FIN packet (confirmed by packet capture). This causes the client to get "10054: An existing connection was forcibly closed by the remote host" (under Windows) or "104: Connection reset by peer" (under Linux) early (the RST packet takes effect before the normal data packets are received).
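The last case is the least intuitive one, so a small reproduction may help. The following is a minimal sketch assuming Linux and the loopback interface. `client_sees` is a hypothetical helper, and the "JUMP" and "periodic report" payloads are placeholders for the jump message and the client's periodic report. It reports only how the connection ends from the client's point of view. Whether the bytes sent before the close are still readable ahead of the error can differ between platforms, so the sketch returns them without making any claim about them.

```python
import socket
import time


def client_sees(unread_on_server: bool):
    """Server sends a message and closes; return (ending, bytes received)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    client.settimeout(2.0)
    received = b""
    try:
        if unread_on_server:
            client.sendall(b"periodic report")  # the server never reads this
            time.sleep(0.1)  # let it reach the server's receive buffer
        server.sendall(b"JUMP")
        server.close()       # close right after sending, as in the scenario
        time.sleep(0.1)
        while True:
            try:
                chunk = client.recv(4096)
            except ConnectionResetError:
                return "reset", received  # the close was turned into a RST
            except socket.timeout:
                return "timeout", received
            if chunk == b"":
                return "eof", received    # normal FIN
            received += chunk
    finally:
        client.close()
        listener.close()
```

With `unread_on_server=False` the client sees a normal end of stream; with `unread_on_server=True` the same server code ends the connection with a reset, even though the server called an ordinary close.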
3. Effect and summary
3.1 Summary
Some seemingly normal behaviors of TCP applications can also cause the peer to see an exception. For example, closing the Socket while there is still unread data in the TCP receive buffer causes the peer to see an abnormal shutdown instead of a normal one. Conversely, an abnormal shutdown detected at the TCP level does not necessarily indicate a business problem, because the business flow may well have ended normally. The main conclusions of this test are:
When the peer process of a TCP connection has closed the Socket and the local process sends data again, the first packet is sent successfully (but it causes the peer to send a RST packet). Any data sent after that fails with the error "10053: An established connection was aborted by the software in your host machine" (under Windows) or "32: Broken pipe", together with a SIGPIPE signal (under Linux). If the local process then receives, a 10053 error is reported under Windows and a normal shutdown notification is received under Linux.
If the Socket is closed while there is still unread data in the local receive buffer of the TCP connection, the local TCP sends a RST packet to the peer instead of the normal FIN packet. This causes the peer process to get "10054: An existing connection was forcibly closed by the remote host" (under Windows) or "104: Connection reset by peer" (under Linux) early (the RST packet takes effect before the normal data packets are received).
3.2 Effect
A considerable share of the dropped connections in Project B's cross-server jump occur because tconnd closes the Socket immediately after forwarding the jump message to the client, while the client is sending a packet to tconnd:
The first case: when tconnd closes the Socket, there is an unread message in its TCP receive buffer, so the TCP stack of the tconnd process sends a RST packet to the client instead of the FIN packet of a normal shutdown. The client program therefore sees the RST early (before the normal data) instead of first receiving the cross-server jump message and then the normal end of the connection. The client treats this as an abnormal network disconnection and tries to reconnect, but the previous connection was closed actively by tconnd, so the reconnection cannot succeed and the player is dropped.
The second case: after tconnd has closed the Socket, the client sends a message to tconnd before it has received the jump message and detected the TCP shutdown. The client program then sees an abnormal disconnection, tries to reconnect, and fails.
Finally, after discussion with the Project B team and after most of the cross-server jump flows were improved, the drop rate fell significantly. A certain proportion of dropped connections remains, but these should have other causes (network anomalies have many causes and cannot be completely avoided in the current domestic network environment).
Normally, an application sends its data on the TCP Socket and then closes the Socket. Everyone assumes this is safe: the peer should receive the data correctly and then receive the TCP shutdown notification. In some cases this is not what happens. If the Socket is closed while the local side still has unread data in its receive buffer, the peer sees an abnormal shutdown caused by a RST. If the peer sends another message after the local side has closed the Socket, the peer also ends up with an abnormal shutdown. And if the SO_LINGER option is set to avoid TIME_WAIT, the connection is aborted prematurely and the peer again sees a RST abnormal shutdown.
The business flow also matters for whether connections are dropped, especially the flow for closing a connection. For example, the cross-server jump problem of Project B described above was largely caused by GameSvr requesting that the connection be closed. We recommend that the closing flow of each game project (including closing the original connection during a cross-server jump) be initiated by the client, which avoids the problems above to a certain extent (when the client initiates the close, the business flow has normally ended and the server will no longer send messages to the client).
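Where the server side does have to initiate the close, one common way to avoid turning it into a RST is to half-close the connection and drain whatever the peer is still sending before calling close. The sketch below illustrates that pattern under assumptions. It is not how tconnd is implemented, `graceful_close` and `demo` are hypothetical names, the payloads are placeholders, and it was written with Linux loopback behavior in mind.

```python
import socket
import threading
import time


def graceful_close(sock: socket.socket, timeout: float = 1.0) -> None:
    """Half-close, drain what the peer still sends, then close."""
    sock.shutdown(socket.SHUT_WR)  # FIN goes out after the data already sent
    sock.settimeout(timeout)
    try:
        while sock.recv(4096):     # discard data until the peer closes (b"")
            pass
    except OSError:
        pass                       # timeout or reset: stop waiting
    finally:
        sock.close()


def demo():
    """Return (bytes the client received, how its connection ended)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    result = {}

    def client_main(addr):
        received = b""
        with socket.create_connection(addr) as conn:
            conn.settimeout(2.0)
            conn.sendall(b"periodic report")  # arrives unread at the server
            try:
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:
                        result["ending"] = "eof"
                        break
                    received += chunk
            except OSError as exc:
                result["ending"] = type(exc).__name__
        result["received"] = received

    worker = threading.Thread(target=client_main,
                              args=(listener.getsockname(),))
    worker.start()
    server, _ = listener.accept()
    time.sleep(0.1)                # let the client's report arrive
    server.sendall(b"JUMP")
    graceful_close(server)         # instead of calling close() immediately
    worker.join()
    listener.close()
    return result["received"], result["ending"]
```

In this sketch the client receives the full message followed by a normal end of stream, even though its own report was sitting unread in the server's receive buffer when the server decided to close. The drain timeout bounds how long the server waits for a client that never closes its side.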
Programs receive many network anomalies (most often error 104 under Linux and errors 10054/10053 under Windows). Some come from the network itself and some from improper use by the application. On the network side there are cross-network problems between operators, problems with intermediate routers, problems with the player's hardware (such as network cards and their drivers), the operating system, antivirus software, firewalls and other software, and problems with the player's own Internet access devices and routers. Even so, client program crashes may account for a high proportion of dropped connections, which deserves our attention and improvement. It is also worth noting that with some TP-LINK routers, when a UDP packet is larger than the router's MTU, the user's machine is cut off from the network and the connection is dropped (this has already happened to individual players of some projects).