2025-01-16 Update — SLTechnology News & Howtos (shulou.com)
This article walks through a set of common questions about TCP on Linux. The explanations aim to be simple, clear, and easy to follow.
What problem does TCP solve?
TCP, the Transmission Control Protocol, is exactly what its name says: a protocol for controlling transmission. The key word is control.
Control of what?
Reliability, in-order delivery, and end-to-end flow control. Is that enough? Not quite. TCP also needs to be smarter than that, so it adds congestion control, which takes the state of the network as a whole into account.
Think of road traffic: it only flows smoothly when every driver plays by the rules.
Why is control implemented in TCP rather than in the IP layer?
We know the network is built in layers, and that basic communication can already be accomplished with just the link layer and the IP layer.
The link layer is indispensable: our machines are physically connected by links, and IP then acts as an address, so through IP two hosts can find each other and communicate.
So why add a TCP layer on top? Couldn't the control logic simply live in the IP layer?
The reason control is pulled out into a separate TCP layer is that far more devices participate at the IP layer. A piece of data traveling across the network passes through many intermediate devices, each of which does IP-based forwarding.
If the IP layer implemented control, every one of those devices would have to care about reliability, ordering, and so on. Wouldn't overall transmission efficiency drop dramatically?
For example, suppose A wants to send a box of building blocks to F, but cannot hand it over directly: it has to pass through relay stations B, C, D, and E. There are two ways to do this:
Either B, C, D, and E each open the package, carefully check that nothing is wrong, repack it, and pass it along until it reaches F. Or B, C, D, and E simply forward whatever package arrives, and F alone checks the blocks at the end.
Which is more efficient? Obviously the second: the forwarding devices don't need to care about any of this, they just forward!
So the control logic is separated out into the TCP layer and handled by the actual receiver, which keeps the network's overall transmission efficiency high.
What exactly is a connection?
Now that we know why the TCP layer exists and what it is mainly for, let's look at how it works.
We all know TCP is connection-oriented. But what exactly is this connection? Is there really a wire strung from one end to the other?
No. The so-called connection is simply state maintained by both parties. Each exchange of messages updates that state, which makes it look as if an actual line joins the two ends.
The TCP header
Before going further, we need to look at the TCP header format, which is basic but important.
I won't explain every field one by one; let's focus on the key ones.
First, notice that the TCP header carries only port numbers, not IP addresses; the addresses live in the IP header.
Seq is the Sequence Number, used to put data back in order.
ACK is the Acknowledgement Number, used to deal with packet loss by telling the sender which bytes have been received.
The flags are the TCP flag bits (SYN, ACK, FIN, RST, and so on), which mark the type of the segment and drive TCP's state transitions.
Window is the sliding window (Sliding Window), used for flow control.
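To make the field layout concrete, here is a minimal sketch (mine, not from the original article) that unpacks the fixed 20-byte portion of a TCP header with Python's struct module; it covers exactly the fields discussed above.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Parse the fixed 20-byte portion of a TCP header (options ignored)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,                                # Sequence Number: orders the byte stream
        "ack": ack,                                # Acknowledgement Number: next byte expected
        "data_offset": (offset_flags >> 12) * 4,   # header length in bytes
        "flags": offset_flags & 0x01FF,            # SYN/ACK/FIN/RST... flag bits
        "window": window,                          # advertised window for flow control
    }

# A hand-built SYN segment: port 12345 -> 80, seq=1000, SYN bit set, window 65535
hdr = struct.pack("!HHIIHHHH", 12345, 80, 1000, 0, (5 << 12) | 0x002, 65535, 0, 0)
info = parse_tcp_header(hdr)
```

Note that the 4-byte source and destination IPs are nowhere in this layout; they belong to the IP header one layer down.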
Three-way handshake
With the key header fields defined, let's look at the three-way handshake.
The three-way handshake is a cliché, but do you really get it, or are you just skimming the surface? Can you reason beyond it?
Let's first walk through the familiar exchange.
Why shake hands at all? Mainly to initialize the sequence numbers. SYN stands for Synchronize Sequence Numbers, and its purpose is to make sure data can be put back in sequence after transmission.
Saying the handshake also verifies that both sides can send and receive is not wrong, but to me the essential point is synchronizing the sequence numbers.
Then why three times? Put the two of us in the roles. First I tell you my initial sequence number, and you tell me you received it.
Then you tell me your initial sequence number, and I tell you I received it.
That sounds like four messages. Strictly it would be, except the two middle steps can be merged: when you acknowledge my initial sequence number, you can announce yours in the same message.
So four handshakes collapse into three.
But have you considered the case where we both speak at the same time, each announcing our own initial sequence number, and then each acknowledging the other's? Isn't that a four-message handshake?
Picture the exchange (a simultaneous open):
It really does take four segments. Whether this can happen depends on the implementation; some stacks disallow it. But it doesn't change the conclusion: the point of the handshake is to synchronize the initial sequence numbers, and that goal is still achieved.
The initial sequence number (ISN)
Have you ever wondered what the ISN should be set to? Should the code just start it from zero?
Imagine hard-coding a value such as 0. Suppose a connection is established and the client has already sent many packets, say twenty of them. Then the network drops, the client reconnects using the same port, and its sequence numbers start from 0 again, while the server sends back an ACK for the twentieth packet of the old connection. Wouldn't the client be thoroughly confused?
So RFC 793 says the ISN should be tied to a notional clock that increments by one every 4 microseconds and wraps back to zero after 2^32, which means an ISN wraparound takes roughly four and a half hours.
The ISN is therefore an incrementing value, and real implementations also mix in some randomness so that attackers cannot guess it.
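As a rough sketch of that scheme: the helpers below model the RFC 793 4-microsecond clock plus a random offset. The function names and the 16-bit offset are illustrative; real kernels use a more careful keyed hash, not this.

```python
import time, random

FOUR_MICROSECONDS = 4e-6

def rfc793_isn(now=None) -> int:
    """ISN from a clock that ticks once every 4 microseconds, modulo 2**32."""
    if now is None:
        now = time.time()
    return int(now / FOUR_MICROSECONDS) % (2 ** 32)

def hardened_isn() -> int:
    """Real stacks add per-connection randomness so ISNs can't be guessed.

    The 16-bit random offset here is purely illustrative.
    """
    return (rfc793_isn() + random.getrandbits(16)) % (2 ** 32)
```

At 1 tick per 4 µs, the counter wraps after 2^32 * 4 µs ≈ 4.77 hours, matching the "about four and a half hours" figure above.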
What happens when a SYN times out?
Say the client sends a SYN to the server and then dies, so the server's SYN+ACK never gets a reply. What should the server do?
The only sensible answer is to retry, but not in a tight loop. If the client has gone offline, you have to give it time to recover, so you back off and retry progressively more slowly.
On Linux the default is 5 retries, with exponentially growing intervals of 1s, 2s, 4s, 8s, and 16s. After the fifth retry the server still has to wait 32s to learn its outcome, so it takes 63 seconds in total before the attempt is abandoned.
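The 63-second figure is just the sum of those intervals plus the final wait; a quick sketch (the helper name is mine):

```python
def syn_retry_timeline(retries: int = 5, first_interval: int = 1) -> int:
    """Total seconds before Linux gives up on an unanswered SYN+ACK.

    Intervals double each attempt (1s, 2s, 4s, 8s, 16s by default), and
    after the last retry there is one more doubled wait to learn its fate.
    """
    total, interval = 0, first_interval
    for _ in range(retries):
        total += interval
        interval *= 2
    return total + interval  # final wait after the last retry

assert syn_retry_timeline() == 63  # 1 + 2 + 4 + 8 + 16 + 32
```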
SYN flood attacks
Notice that a SYN timeout keeps the server busy for 63 seconds, meaning the server must hold resources for that half-open connection the whole time. Attackers can therefore direct a large number of clients to send SYNs to the server and never answer the SYN+ACK.
The server's SYN queue fills up, and it can no longer handle legitimate connection requests.
So what do we do?
You can enable tcp_syncookies, which removes the need for the SYN queue.
Once the SYN queue is full, TCP derives a special sequence number (the cookie) from the source IP and port, destination IP and port, the SYN's sequence number, a timestamp, and so on. A legitimate client will echo that number back, and the server can then establish the connection from the cookie alone.
Alternatively, lower tcp_synack_retries to reduce the number of retries, raise tcp_max_syn_backlog to enlarge the SYN queue, or set tcp_abort_on_overflow to reject connections outright when the SYN queue is full.
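If you want to inspect how these knobs are currently set on a Linux box, a small sketch that reads them from /proc/sys (the paths are the standard sysctl locations; on other operating systems they simply won't exist, and the helper reports None):

```python
from pathlib import Path

# The knobs discussed above; values vary by distribution and tuning.
KNOBS = [
    "net/ipv4/tcp_syncookies",
    "net/ipv4/tcp_synack_retries",
    "net/ipv4/tcp_max_syn_backlog",
    "net/ipv4/tcp_abort_on_overflow",
]

def read_sysctls(base: str = "/proc/sys") -> dict:
    """Return each knob's current value as a string, or None if absent."""
    values = {}
    for knob in KNOBS:
        path = Path(base) / knob
        values[knob] = path.read_text().strip() if path.exists() else None
    return values
```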
Why four waves to close?
The four-way wave is the three-way handshake's counterpart; together they are TCP's A-list celebrities. Let's review the familiar sequence.
Why four? Because TCP is full-duplex: each direction has to be shut down separately, so each side sends its own FIN and the other side answers with an ACK.
It's as if I tell you I've finished sending data, and you reply that you heard me. Then you tell me you've finished sending data, and I reply that I heard you.
Hence it looks like four messages.
From the state diagram, the active closer goes FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT, while the passive closer goes CLOSE_WAIT → LAST_ACK.
Do the four-wave states have to change this way?
Not necessarily. Consider another case, the simultaneous close:
If both sides initiate the disconnect at the same time, each of them is an active closer, and each goes FIN_WAIT_1 → CLOSING → TIME_WAIT.
Does it always take four waves?
Suppose the client has no more data for the server, so it sends a FIN meaning "I'm done sending." If the server still has data for the client, it first replies with an ACK and keeps sending.
When the server's data is exhausted, it sends its own FIN to the client and waits for the client's ACK. That is the four-wave case.
But suppose that when the client sends its FIN, the server also has nothing left to send. The server can then bundle its ACK and its FIN into one segment and wait for the client's ACK. Doesn't that make it three waves?
Why is there a TIME_WAIT state?
After the active closer receives the peer's FIN and replies with the final ACK, it does not go straight to CLOSED; it waits for 2MSL first.
MSL is the Maximum Segment Lifetime, the longest a segment may live in the network. RFC 793 defines MSL as 2 minutes; Linux actually uses 30s, so there 2MSL is one minute.
Why wait 2MSL?
First, in case the passive closer never receives that final ACK. If the ACK is lost in the network, the passive side will resend its FIN, and it would be awkward if the active closer had already moved to CLOSED. So it lingers a while.
Second, suppose the connection is torn down and a new one immediately reuses the exact same five-tuple, with sequence numbers that happen to fall in the valid range. The probability is tiny, but theoretically possible, and the new connection could then be disturbed by stale segments left over from the old one. TIME_WAIT gives those stragglers time to die out.
What problems does waiting 2MSL cause?
If the server actively closes a large number of connections, it consumes a lot of resources that are not released until 2MSL expires.
If the client actively closes a large number of connections, the ports stuck in 2MSL stay occupied, and there are only 65535 ports. If they run out, no new connection can be initiated. I think this is unlikely in practice, though; how many simultaneous connections do you really need?
How do we mitigate the 2MSL problems?
Option one is fast recycling, i.e. reclaiming the socket without waiting the full 2MSL. The Linux parameter is tcp_tw_recycle, which requires tcp_timestamps (on by default).
But we have just analyzed why the 2MSL wait exists; if you cut it short, the problems described above can bite you.
So enabling it is not recommended, and the parameter was removed entirely in Linux 4.12.
A friend ran into exactly this in a group chat not long ago.
When we dug in, sure enough, NAT was involved.
The symptom was that requests for static resources on the server would occasionally hang for roughly 20 to 60 seconds before getting a response; packet captures showed three consecutive SYNs from the requester with no reply.
Think of a campus network: many machines share one public IP. With tcp_tw_recycle enabled (and tcp_timestamps also on), connection requests from the same source IP must carry monotonically increasing timestamps within a 60-second window, or their packets are treated as stale and dropped.
With that many machines behind one IP, you cannot guarantee consistent timestamps, so things break.
Hence: not recommended.
Option two is reuse: enabling tcp_tw_reuse, which likewise requires tcp_timestamps.
The catch is that tcp_tw_reuse applies to the initiator of a new connection, while a server is mostly the passive receiver of connections.
With tcp_tw_reuse, an initiator can reuse a socket that has been in TIME_WAIT for more than 1 second, so it does nothing to relieve pressure on the server side.
It only helps the side that initiates connections and is itself sitting in TIME_WAIT.
There is also SO_REUSEADDR, which some people confuse with tcp_tw_reuse. First, tcp_tw_reuse is a kernel option, while SO_REUSEADDR is a user-space socket option.
SO_REUSEADDR mainly matters when starting a service: if the port you want is occupied by a connection in the TIME_WAIT state, you can still bind to it; if the occupant is in any other state, you get Address already in use.
So neither option really solves the server's problem, and both tcp_tw_reuse and tcp_tw_recycle actually violate the TCP specification: the protocol says to wait out TIME_WAIT, and these knobs quietly break that promise.
You could also shorten the MSL, which is not very safe, or tune tcp_max_tw_buckets, which caps the number of sockets in TIME_WAIT; but its default of 180000 is already huge, and it is really meant as a defense against DDoS attacks.
So my advice is: don't have the server actively close connections; push the active close to the client. A server serves many clients, and its resources are the more valuable ones.
Attacking yourself
There is also a rather cheeky solution, which is essentially to attack yourself.
Sockets have an option called IP_TRANSPARENT, which allows binding to a non-local address. The server records the IP and port of each connection it holds, for example somewhere on local disk.
Then you run a helper service. When the server's resources get tight, after a configured delay it hands the IP and port of each peer stuck in TIME_WAIT over to that helper.
The helper uses IP_TRANSPARENT to impersonate the old client and sends a request to the server. The server's ACK in response goes to the real client, which has already closed the connection and so answers "what are you on about" with a RST, and on receiving the RST the server terminates the connection immediately.
What problem does timeout retransmission solve?
We said earlier that TCP must provide reliable transmission over an unreliable network. If a transmitted packet is lost but delivery must still be guaranteed, the packet has to be retransmitted.
TCP's reliability rests on the acknowledgement number. For instance, if I send you packets 1, 2, 3 and 4 and you answer "I want 5 now," that tells me you've received the first four. That's the mechanism.
Note, however, that SeqNum and ACK both count bytes, and ACKs are cumulative: if you have received 1, 2 and 4 but not 3, you cannot ACK 5, because replying 5 would tell the sender you had everything before 5.
All you can do is acknowledge up to the highest contiguous byte received, i.e. reply 3.
The sender, meanwhile, doesn't know whether packets 3 and 4 are merely late or actually lost, so it has to wait, and how long to wait is a subtle question.
Wait too briefly and the ACK may already be on its way, making the retransmission a waste of resources. Wait too long and the receiver sits there fuming: where is my packet?
So the timeout before retransmission is critical. How do we pick it? Sharp readers will immediately think: just estimate the normal round-trip time and wait about that long.
That round-trip time is the RTT (Round Trip Time), and from it we derive the retransmission timeout, the RTO (Retransmission Timeout).
The RTO clearly has to be based on the RTT, but how do we compute it? First sample the RTT, then take a weighted moving average to derive the RTO.
RFC 793 defines it as follows:
1. Sample the RTT.
2. SRTT = (ALPHA * SRTT) + ((1 - ALPHA) * RTT)
3. RTO = min[UBOUND, max[LBOUND, BETA * SRTT]]
ALPHA is a smoothing factor between 0.8 and 0.9, UBOUND is the upper bound on the timeout (1 minute), LBOUND is the lower bound (1 second), and BETA is a delay variance factor between 1.3 and 2.0.
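Here is a direct transcription of that formula into Python; the default constants are picked from the ranges above for illustration, and times are in seconds.

```python
def rfc793_rto(samples, alpha=0.85, beta=1.5, lbound=1.0, ubound=60.0):
    """Classic RFC 793 smoothed-RTT estimator.

    SRTT = ALPHA*SRTT + (1-ALPHA)*RTT, folded over the samples,
    then RTO = min(UBOUND, max(LBOUND, BETA*SRTT)).
    """
    srtt = samples[0]
    for rtt in samples[1:]:
        srtt = alpha * srtt + (1 - alpha) * rtt
    return min(ubound, max(lbound, beta * srtt))

# Steady 200 ms round trips: BETA*SRTT = 0.3 s, clamped up to the 1 s floor.
rto = rfc793_rto([0.2, 0.2, 0.2])
```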
But there's another question: when sampling the RTT of a packet that was retransmitted, do we measure from the original send to the ACK, or from the retransmission to the ACK?
One choice gives a sample that is too long, the other too short, and the real trouble is that you can't tell which transmission the ACK is answering.
So what do we do? Simple: don't sample retransmitted round trips at all. If we can't tell which copy an ACK belongs to, ignore it and only sample the clean, untransmitted exchanges.
This is the Karn/Partridge algorithm: RTT samples from retransmitted segments are discarded.
But never sampling retransmissions has its own problem. Suppose the network suddenly degrades badly; if you ignore all the retransmissions, the RTO stays calibrated to the old, healthy RTT, the timeout fires too early, and the resulting storm of retransmissions piles even more load onto an already struggling network.
Karn's fix is blunt: on every retransmission, simply double the current RTO. That simple, that crude.
Averaging like this, however, smooths away sudden large swings, so there is a further algorithm: Jacobson/Karels.
It combines the latest RTT sample with the smoothed SRTT, and crucially also tracks the variance, to compute a more appropriate RTO.
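For completeness, here is the widely published Jacobson/Karels update, as later standardized in RFC 6298 (gains of 1/8 and 1/4, RTO = SRTT + 4·RTTVAR), sketched in Python. The class name and the explicit 1-second floor are my choices.

```python
class JacobsonKarels:
    """RTO estimator tracking both the mean RTT and its variance."""

    def __init__(self, first_rtt):
        self.srtt = first_rtt
        self.rttvar = first_rtt / 2
        self.rto = max(1.0, self.srtt + 4 * self.rttvar)

    def update(self, rtt, alpha=1/8, beta=1/4):
        # Order matters: RTTVAR is updated against the *old* SRTT.
        self.rttvar = (1 - beta) * self.rttvar + beta * abs(rtt - self.srtt)
        self.srtt = (1 - alpha) * self.srtt + alpha * rtt
        self.rto = max(1.0, self.srtt + 4 * self.rttvar)
```

Because the variance term feeds straight into the RTO, a sudden jittery RTT immediately inflates the timeout instead of being averaged away.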
Why do we need fast retransmit?
Timeout retransmission is time-driven. If the network really is in bad shape, waiting for the timeout is fine; but if the network is healthy and a packet just happened to get dropped, there is no need to wait that long.
So TCP introduces a data-driven retransmission called fast retransmit: if the sender receives the same acknowledgement number three times in a row, it retransmits the corresponding data immediately.
Receiving three duplicate ACKs proves the network path is working and that the packet was genuinely lost, so the sender resends right away instead of waiting for the timer.
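The trigger condition is easy to model: count consecutive duplicate ACKs and fire on the third. A toy sketch (class and constant names are mine):

```python
DUP_ACK_THRESHOLD = 3  # three *duplicate* ACKs trigger fast retransmit

class FastRetransmitDetector:
    """Counts duplicate ACKs; fires when the same ack number repeats 3 times."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack: int) -> bool:
        """Return True when fast retransmit of the data at `ack` should fire."""
        if ack == self.last_ack:
            self.dup_count += 1
        else:
            self.last_ack, self.dup_count = ack, 0  # a new ack resets the count
        return self.dup_count == DUP_ACK_THRESHOLD
```

Feeding it ACK 2 four times (the original plus three duplicates) fires on the fourth call.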
It seems perfect, but consider this: I send packets 1, 2, 3 and 4, and the receiver gets 1, 3 and 4 but not 2. Whether by timeout or by fast retransmit, the receiver keeps ACKing 2.
Should I retransmit 2, 3 and 4, or just 2?
What problem does SACK solve?
SACK, Selective Acknowledgment, was introduced to solve exactly this: the sender not knowing which data to retransmit.
With SACK, the receiver reports back the ranges of data it has already received, so the sender knows which segments got through and can selectively resend only what was lost.
For example, suppose the cumulative ACK says "give me 5500 next," while the SACK blocks keep updating: I have 6000-6500, then 6000-7000, then 6000-7500. The sender can then see plainly that the 5500-5999 range was lost, and retransmits just that.
SACK also handles multiple discontiguous ranges, e.g. SACK 0-500, 1000-1500, 2000-2500, meaning precisely those segments have arrived.
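Given the cumulative ACK and the SACK blocks, the sender can compute exactly which holes to retransmit. A small illustrative helper (ranges are half-open byte intervals; the function name is mine):

```python
def missing_ranges(cum_ack, sack_blocks, highest_sent):
    """Byte ranges the receiver is still missing.

    cum_ack: everything below this is acknowledged.
    sack_blocks: (start, end) ranges received out of order.
    highest_sent: one past the last byte the sender has transmitted.
    """
    holes, cursor = [], cum_ack
    for start, end in sorted(sack_blocks):
        if start > cursor:
            holes.append((cursor, start))   # gap before this SACK block
        cursor = max(cursor, end)
    if cursor < highest_sent:
        holes.append((cursor, highest_sent))  # tail not yet acknowledged
    return holes

# Receiver has everything up to 5500, plus 6000-7500 via SACK; 8000 bytes sent.
holes = missing_ranges(5500, [(6000, 7500)], 8000)
```

For the example in the text this yields the 5500-6000 hole (plus the still-in-flight tail), which is precisely what gets retransmitted.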
What is D-SACK?
D-SACK is an extension of SACK that uses the first SACK block to report duplicate receipt of data. If the range described in the first block is already covered by the cumulative ACK, the data must have arrived twice: for example, I have ACKed up to 6000, yet you send me SACK 5000-5500?
In short, the first block is compared against the cumulative ACK to detect duplicates. The Linux parameter is tcp_dsack, enabled by default since Linux 2.4.
What's the use of knowing something arrived twice?
1. A duplicate means the peer did receive the earlier copy, so it was the returning ACK that got lost.
2. Or the packets were reordered: a packet sent earlier arrived later.
3. Or the sender was too hasty and the RTO is too small.
4. Or the data was duplicated somewhere along the path.
What is the sliding window for?
We have sequence numbers and retransmission, but that is not enough: the sending rate also has to adapt to conditions, because the network is complex and changeable, sometimes congested and sometimes clear.
The sender needs to know the receiver's situation so it can pace itself and not overwhelm the receiver.
So TCP uses a sliding window for flow control: the receiver tells the sender how much more data it can accept, and the sender sizes its transmissions accordingly.
Consider the window as maintained by the sender (outlined in black in the usual diagram).
In that diagram, #1 is data already sent and ACKed, #2 is data sent but not yet ACKed, #3 is data within the window that can be sent but hasn't been, and #4 is data that cannot be sent yet.
When the ACK for byte 36 arrives and bytes 46-51 are sent out, the window slides to the right.
The TCP/IP Guide has a particularly clear diagram of the whole mechanism, worth a look.
What if the receiver keeps advertising a window of 0?
As described above, the sender regulates how much it may send by the receiver's advertised window. If the receiver keeps answering 0, the sender must stand still.
Think about it: everything the sender transmitted has been ACKed, but the advertised window is 0. The sender dare not transmit, yet it can't wait forever. When will the window ever change?
So TCP has Zero Window Probing: once the sender learns the window is 0, it periodically probes the receiver by sending ZWP packets to ask whether the window has opened.
Concretely, it probes several times with an interval between attempts, and after enough failures it can simply send a RST and give up.
What if the receiver advertises a tiny window each time?
Imagine the receiver announcing, each time, that it can accept just one more byte. Should the sender oblige?
The TCP + IP headers alone are 40 bytes, so shipping one byte of payload is wildly uneconomical; doing it anyway, over and over, is called the Silly Window syndrome.
There are two remedies: either the sender holds off until it has fattened up a decent chunk of data before sending, or the receiver polices itself, advertising a window of 0 while its free space is below a threshold and announcing a real window only once it has recovered.
The sender-side scheme is Nagle's algorithm; its logic is easiest to see in code.
Put simply: if the data ready to send and the peer's window are both at least one MSS, send immediately; otherwise send only after the ACK for the previously sent data has come back, and keep accumulating data until then.
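That decision rule can be sketched in a few lines (the MSS value and function name are illustrative; real stacks track more state than this):

```python
MSS = 1460  # a typical maximum segment size in bytes

def nagle_should_send(pending_bytes: int, window: int, unacked_data: bool) -> bool:
    """Nagle's rule: send at once if a full MSS fits in the window;
    otherwise send only when nothing previously sent is awaiting an ACK."""
    if pending_bytes >= MSS and window >= MSS:
        return True          # a full segment is worth sending immediately
    return not unacked_data  # small data waits until the pipe is drained

assert nagle_should_send(2000, 4000, unacked_data=True) is True
assert nagle_should_send(10, 4000, unacked_data=True) is False   # buffer the dribble
assert nagle_should_send(10, 4000, unacked_data=False) is True   # nothing in flight
```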
The receiver-side scheme is David D. Clark's: if free buffer space drops below a threshold, advertise window 0 so the sender stops; once free space reaches an MSS, or half the receive buffer has been freed, advertise a normal window again.
Speaking of Nagle, we must also mention delayed acknowledgement. Nagle's algorithm waits for the receiver's ACK before sending more small data, while delayed ACK postpones sending acknowledgments, waiting either to piggyback the ACK on returning data or for a short timer to expire.
Enable both and each side ends up waiting on the other, producing serious delays, so the two should not be used together.
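In practice, applications that cannot tolerate this interaction usually disable Nagle on the socket with the standard TCP_NODELAY option:

```python
import socket

def disable_nagle(sock: socket.socket) -> None:
    """Turn off Nagle's algorithm on a TCP socket via TCP_NODELAY,
    the usual escape hatch when Nagle and delayed ACK would stall each other."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
disable_nagle(sock)
```

Latency-sensitive request/response protocols commonly set this; bulk-transfer workloads usually leave Nagle on.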
Why congestion control when there is already a sliding window?
As mentioned at the start, congestion control exists because TCP cares not only about the two endpoints but also about the network as a whole; the road stays clear only if everyone obeys the rules.
Consider retransmission again. Ignoring the overall state of the network, the logic is simply: no ACK from the peer, so retransmit, no questions asked.
If the network is in terrible shape and every connection blindly retransmits, doesn't the network get even worse and even more congested?
And the more congested it gets, the more everyone retransmits. Full speed ahead, and then it all collapses.
So congestion control is needed to prevent exactly this situation.
How does congestion control work?
The main phases are:
1. Slow start: feel out the path.
2. Congestion avoidance: nearing capacity, ease off.
3. On congestion: fast retransmit / fast recovery.
Slow start is the new driver easing onto the road: initialize cwnd (the Congestion Window) to 1, then increment cwnd on every ACK received, which works out to cwnd = 2 * cwnd per RTT.
Per ACK the growth looks linear, but per RTT it is exponential.
Once cwnd reaches a threshold, ssthresh (the slow start threshold), we enter the congestion avoidance phase.
In this phase, cwnd = cwnd + 1/cwnd for every ACK received, i.e. cwnd++ per RTT.
In other words, growth becomes linear.
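The two growth regimes are easy to simulate at per-RTT granularity (function name and the default threshold are illustrative):

```python
def cwnd_growth(rtts: int, ssthresh: int = 16, cwnd: int = 1):
    """cwnd per RTT: doubles during slow start, then +1 in congestion avoidance."""
    history = [cwnd]
    for _ in range(rtts):
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1
        history.append(cwnd)
    return history

# 1, 2, 4, 8, 16 (exponential), then 17, 18, 19 (linear)
growth = cwnd_growth(7)
```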
cwnd keeps growing until packet loss occurs. As analyzed earlier, there are two kinds of retransmission: timeout retransmission and fast retransmit.
A timeout retransmission means things are fairly bad, so ssthresh is set to half the current cwnd, cwnd is reset to 1, and we drop back into slow start.
For a fast retransmit there are two classic behaviors. TCP Tahoe treats it the same way as a timeout retransmission.
TCP Reno instead sets cwnd = cwnd / 2 and then sets ssthresh to that new cwnd.
Reno then enters fast recovery: set cwnd = cwnd + 3 (for the three duplicate ACKs) and retransmit the segment the duplicate ACKs point at; each further duplicate ACK does cwnd++, and when a normal ACK arrives, cwnd is set back to ssthresh and we return to congestion avoidance.
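Reno's reactions to the two loss signals can be summarized in a toy function (segment-counted; the floor of 2 segments on ssthresh is an illustrative convention, matching the common max(cwnd/2, 2) rule):

```python
def on_loss(cwnd: int, kind: str):
    """Reno-style reaction to loss; returns (ssthresh, new_cwnd) in segments."""
    if kind == "timeout":            # bad sign: collapse back to slow start
        return max(cwnd // 2, 2), 1
    if kind == "fast_retransmit":    # milder: halve, then fast-recover
        half = max(cwnd // 2, 2)
        return half, half + 3        # +3 credits the three duplicate ACKs
    raise ValueError(kind)

assert on_loss(20, "timeout") == (10, 1)
assert on_loss(20, "fast_retransmit") == (10, 13)
```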
Notice that fast recovery retransmits only the one indicated segment. If many packets were lost, the rest can only wait for timeout retransmissions, each of which halves ssthresh and collapses cwnd, so repeated timeouts make the window fall off a cliff.
Hence New Reno: the "New" improves fast recovery in the absence of SACK. It checks whether the ACK answering the retransmitted segment is the highest ACK possible for everything sent so far. For example, if you sent 1, 2, 3, 4 and the peer was missing only 2 (it had 3 and 4), then after you resend 2 the returning ACK must be 5, showing that 2 was the only loss.
If the ACK is lower, other packets were lost too, and New Reno keeps retransmitting them, staying in fast recovery until all outstanding data is ACKed.
Put simply, it doesn't wrap up until every packet is accounted for.
There is also FACK, which uses SACK information to drive congestion control during retransmission. Compared with New Reno above, it has SACK data available, so it doesn't have to probe for holes one at a time.
What other congestion control algorithms are there?
Plenty; Wikipedia lists a long roster of them.
Thank you for reading. That wraps up the TCP problems of Linux; hopefully you now have a deeper understanding, and as always, the specifics are best verified in practice.