Discussing Linux Kernel Parameter Optimization from the TCP/IP Protocol

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article discusses Linux kernel parameter optimization starting from the TCP/IP protocol. It first walks through the relevant protocol behavior and then through the kernel parameters that follow from it.

When hardware resources are limited, many engineers ask how to get the most out of a server and raise its concurrent processing capacity. Besides tuning the configuration of service software such as Nginx, PHP-FPM, MySQL and Redis, we can also improve server performance by modifying the TCP-related parameters of the Linux kernel.

Before optimizing the Linux kernel parameters, we need to understand the TCP/IP protocol, which is the theoretical basis for the optimization.

TCP/IP protocol

TCP/IP is a very complex protocol suite, and it is not easy to master completely. As basic knowledge, however, we must at least know the logic of TCP's three-way handshake and four waves (the four-step connection teardown).

Three-way handshake

The so-called three-way handshake means that when establishing a TCP connection, the client and server need to send a total of three packets to confirm the establishment of the connection. In socket programming, this process is triggered by the client executing connect.

Three-way handshake flowchart:

Three-way handshake process

The first handshake: the client sets the flag bit SYN to 1, chooses a random initial sequence number seq=J, and sends the packet to the server. The client enters the SYN_SENT state and waits for the server to confirm.

The second handshake: after the server receives the packet, the flag bit SYN=1 tells it that the client is requesting a connection. The server sets the flag bits SYN and ACK to 1, sets ack=J+1, chooses a random initial sequence number seq=K, and sends the packet to the client to confirm the connection request. The server enters the SYN_RCVD state.

The third handshake: after receiving the confirmation, the client checks whether ack is J+1 and whether the ACK flag is 1. If so, it sets the flag bit ACK to 1, sets ack=K+1, and sends the packet to the server. The server checks whether ack is K+1 and whether the ACK flag is 1. If so, the connection is established, the client and the server both enter the ESTABLISHED state, the three-way handshake is complete, and the two sides can start to transfer data.
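The handshake itself is carried out by the kernel, so application code only sees its two entry points. The following is a minimal sketch, assuming a loopback connection on a port chosen by the kernel; the function name handshake_demo is made up for illustration. It shows that connect on the client and accept on the server are the calls around which the three packets are exchanged.

```python
import socket
import threading


def handshake_demo() -> tuple:
    """Open one loopback TCP connection; return (client address, peer seen by server)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    server.listen(1)                # the socket is now in the LISTEN state
    port = server.getsockname()[1]

    accepted = {}

    def accept_one() -> None:
        # accept() hands back a connection whose handshake the kernel has completed
        conn, peer = server.accept()
        accepted["peer"] = peer
        conn.close()

    worker = threading.Thread(target=accept_one)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # connect() sends the SYN and returns once the SYN+ACK has arrived
    # and the client has entered the ESTABLISHED state
    client.connect(("127.0.0.1", port))
    local = client.getsockname()

    worker.join()
    client.close()
    server.close()
    return local, accepted["peer"]


if __name__ == "__main__":
    local_addr, peer_addr = handshake_demo()
    print("client address:", local_addr)
    print("server saw peer:", peer_addr)
```

The address the server reports for its peer matches the client's own local address, which confirms that both sides are talking about the same established connection.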

Four waves

The TCP connection is terminated with four waves, which means that when a TCP connection is disconnected, the client and server need to send a total of 4 packets to confirm the disconnection. In socket programming, this process is triggered by the client or server side executing close.

Because a TCP connection is full-duplex, each direction must be closed separately. When one party has finished sending its data, it sends a FIN to terminate the connection in that direction. Receiving a FIN only means that no more data will arrive from that direction; data can still be sent the other way until a FIN is sent in that direction as well. The party that closes first performs an active close, while the other party performs a passive close.

Flow chart of four waves:

Four-wave process

The end that initiates the disconnect can be either the client or the server. The description below assumes the client closes first.

The first wave: the client sends a FIN with seq=M to close the data transfer from the client to the server, and the client enters the FIN_WAIT_1 state. This means "I, the client, have no more data to send to you". If the server still has data to send, it does not need to close the connection in a hurry and can continue to send.

The second wave: after receiving the FIN, the server first sends ack=M+1 to tell the client: I have received your request, but I am not ready yet, so please keep waiting for my message. The server enters the CLOSE_WAIT state. The client enters the FIN_WAIT_2 state and continues to wait for the server's FIN.

The third wave: when the server has finished sending its data, it sends a FIN with seq=N to the client, telling the client: I have finished sending data and am ready to close the connection. The server enters the LAST_ACK state.

The fourth wave: after receiving the FIN with seq=N, the client knows that the connection can be closed, but it does not trust the network and worries that the server may never learn that it can close. It therefore sends ack=N+1 and enters the TIME_WAIT state. If the server does not receive the ACK, it retransmits the FIN. When the server receives the ACK, it knows it can close the connection. If the client receives no retransmitted FIN during 2MSL, it concludes that the server has closed normally and closes the connection as well. This completes the four waves.
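The point that each direction is closed separately can be observed from user space. The following is a minimal sketch, assuming a loopback connection; the function name half_close_demo and the payloads are made up for illustration. The client sends its FIN with shutdown(SHUT_WR), and the server, after seeing the end of the client's data, still sends a reply before sending its own FIN.

```python
import socket
import threading


def half_close_demo() -> bytes:
    """Client closes its sending direction, then still receives the server's reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _ = server.accept()
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:            # empty read: the client's FIN has arrived
                break
            chunks.append(data)
        # The server-to-client direction is still open, so a reply can be sent.
        conn.sendall(b"got " + b"".join(chunks))
        conn.close()                # the server now sends its own FIN

    worker = threading.Thread(target=serve)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)  # first wave: "I have no more data to send"
    reply = b""
    while True:
        data = client.recv(1024)
        if not data:                # the server's FIN has arrived
            break
        reply += data
    client.close()
    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(half_close_demo())
```

Because the client closed first, it is the client's side of this connection that passes through TIME_WAIT afterwards.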

Sequence numbers and acknowledgements

TCP is a highly reliable communication protocol, and it achieves this reliability through sequence numbers and acknowledgements. There are several key points:

When the sender's data reaches the receiving host, the receiving host returns a notification that the data has been received. This notification is called an acknowledgement (ACK). After sending data, the sender waits for the acknowledgement from the other side. If an acknowledgement arrives, the data has successfully reached the other end. Otherwise, the data has very likely been lost.

If the sender does not receive an acknowledgement within a certain period of time, it assumes that the data has been lost and sends it again. As a result, even if packet loss occurs, the data can still be guaranteed to reach the other end, and reliable transmission is achieved.

Not receiving an acknowledgement does not necessarily mean that the data was lost. It is also possible that the other party received the data but the acknowledgement was lost on the way back. In this case the sender also concludes that the data has not reached its destination and sends it again.

In addition, an acknowledgement may be delayed for other reasons, and it is not uncommon for it to arrive after the source host has already retransmitted the data. The source host simply retransmits according to the mechanism.

It is not acceptable for the target host to deliver the same data repeatedly. To provide reliable transmission for upper-layer applications, the target host must discard duplicate packets. This is why sequence numbers are introduced.

A sequence number numbers each byte (8-bit byte) of the sent data in order. The receiver reads the sequence number in the TCP header of the received segment and the length of its data, and sends back the sequence number it expects to receive next as the acknowledgement. Through the sequence number and the acknowledgement number, TCP can tell whether data has already been received and whether it still needs to be received, and in this way achieves reliable transmission.
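The relationship between sequence number, data length and acknowledgement number can be written down in a few lines. The following is a simplified sketch, not how a real TCP stack is implemented: the Receiver class and next_ack function are hypothetical names, and the receiver here accepts only the segment it expects and discards everything else, whereas a real stack also buffers out-of-order data.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def next_ack(seq: int, payload_len: int) -> int:
    """Acknowledgement number for a segment: the next byte the receiver expects."""
    return (seq + payload_len) % SEQ_SPACE


class Receiver:
    """Toy in-order receiver that discards duplicates (illustration only)."""

    def __init__(self, initial_seq: int) -> None:
        self.expected = initial_seq
        self.data = b""

    def on_segment(self, seq: int, payload: bytes) -> int:
        if seq == self.expected:
            # New in-order data: deliver it and advance the expected sequence number.
            self.data += payload
            self.expected = next_ack(seq, len(payload))
        # Duplicate or unexpected segment: drop it and repeat the same ack.
        return self.expected


if __name__ == "__main__":
    receiver = Receiver(initial_seq=1000)
    print(receiver.on_segment(1000, b"abcde"))  # first delivery
    print(receiver.on_segment(1000, b"abcde"))  # retransmitted duplicate
    print(receiver.data)
```

A retransmitted segment produces the same acknowledgement number as before and does not change the delivered data, which is exactly the duplicate suppression described above.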

The retransmission timeout is the time the sender waits for an acknowledgement before resending the data. If no acknowledgement is received within this time, the sender resends the data. Ideally, it is the minimum time that still guarantees that "the acknowledgement will be returned within that time".

TCP is required to provide high-performance communication in any network environment, and it must maintain this property however network congestion changes. For this reason, it calculates the round-trip time and its deviation each time a packet is sent. The retransmission timeout is set to a value slightly larger than the sum of the round-trip time and the deviation.
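The article does not give the exact formula. One widely used formulation is the one in RFC 6298, which keeps a smoothed round-trip time and a round-trip time variation and sets the timeout to the smoothed value plus four times the variation. The sketch below follows that formulation under simplifying assumptions: the class name RtoEstimator is made up, and the clock granularity term and the minimum and maximum bounds that real stacks apply are left out.

```python
from typing import Optional


class RtoEstimator:
    """RFC 6298 style retransmission timeout estimator (simplified sketch)."""

    ALPHA = 1 / 8   # weight of a new sample in the smoothed RTT
    BETA = 1 / 4    # weight of a new sample in the RTT variation
    K = 4           # multiplier applied to the variation

    def __init__(self) -> None:
        self.srtt: Optional[float] = None    # smoothed round-trip time
        self.rttvar: Optional[float] = None  # round-trip time variation

    def update(self, rtt: float) -> float:
        """Feed one RTT measurement (seconds) and return the new timeout."""
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variation is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.srtt + self.K * self.rttvar


if __name__ == "__main__":
    estimator = RtoEstimator()
    for sample in (0.100, 0.100, 0.300, 0.100):
        print(f"rtt={sample:.3f}s -> rto={estimator.update(sample):.3f}s")
```

A stable round-trip time lets the variation shrink and the timeout approach the round-trip time, while a single slow sample widens the margin again.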

If no acknowledgement is received after the data is retransmitted, it is sent again. Each time, the waiting time for an acknowledgement grows exponentially: it is doubled, then quadrupled, and so on.

In addition, the data is not retransmitted indefinitely. If no acknowledgement is returned after a certain number of retransmissions, TCP concludes that something is wrong with the network or the peer host, forcibly closes the connection, and notifies the application that the communication was forcibly terminated.

Defects in TCP/IP protocol

After learning about the TCP/IP protocol, we will find several problems:

In the three-way handshake, if the client goes away or does not respond to the SYN+ACK packet sent back by the server after initiating the first handshake, the server keeps retransmitting that packet until it times out. This is how SYN Flood attacks work.

In the four waves, the side that actively closes the connection stays in the TIME_WAIT state for 2MSL. MSL is the maximum segment lifetime, the longest time a segment can survive in the network; beyond this time it is discarded (the TIME_WAIT state generally lasts 1-4 minutes). The 2MSL wait ensures that segments from the old connection cannot affect a new connection. The resources occupied by a connection in TIME_WAIT are not released by the kernel until the state expires, so a server should avoid closing connections actively where possible, to reduce the resources wasted on TIME_WAIT. If our server is a load balancer and an upstream server is unresponsive for a long time, the load balancer actively closes the connection, which leads to an accumulation of TIME_WAIT connections in high-concurrency scenarios.

In the four waves, if the client does not return an ACK after receiving the server's FIN, the server keeps retransmitting the FIN, so connections accumulate on the server in the LAST_ACK state. (CLOSE_WAIT, by contrast, accumulates when the server application has received a FIN but never calls close.)
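Whether such states are really piling up on a host can be checked by counting connections per state. The sketch below is one way to do that on Linux, assuming the text format of /proc/net/tcp, where the fourth column holds the state as a hexadecimal code; the function name count_tcp_states is made up for illustration, and the function takes the file content as a string so it can be tried on sample data.

```python
from collections import Counter

# State codes used in the "st" column of /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text: str) -> Counter:
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts: Counter = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

On a Linux host the function could be fed the content of /proc/net/tcp (and /proc/net/tcp6 for IPv6 sockets); a steadily growing TIME_WAIT, CLOSE_WAIT or LAST_ACK count points to the corresponding problem described above.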

SYN Flood attack

The SYN Flood attack is one of the most common DDoS attacks on today's networks, and also one of the most classic denial-of-service attacks. It exploits a weakness in how TCP connection setup is implemented: by sending a large number of attack packets with forged source addresses to the port where a network service listens, it can fill the half-open connection queue on the target server and thereby prevent legitimate users from connecting.

Principle of the SYN Flood attack

The attacker first forges a source address and sends a SYN request to the server (can I establish a connection?), and the server responds with a SYN+ACK (yes, please confirm). The host that really owns that IP address knows that it never sent a request and does not respond. Receiving no response, the server retries 3-5 times and waits for a SYN timeout (usually 30 seconds to 2 minutes) before dropping the connection.

If an attacker sends a large number of SYN requests with forged source addresses, the server spends a lot of resources on these half-open connections. Storing and traversing the list alone consumes a lot of CPU time and memory, not to mention the constant SYN+ACK retries to the addresses in the list. TCP is a reliable protocol, so the packet is retransmitted. By default, the number of retries is 5, and the retry interval doubles each time starting from 1s: 1s + 2s + 4s + 8s + 16s = 31s. After the fifth retry, the server has to wait another 32s before it knows that the fifth retry has also timed out, so the total is 31 + 32 = 63s.

One forged SYN packet therefore occupies a slot in the half-open connection queue for 63 seconds, while the half-open connection queue defaults to a length of 1024. Without any protection, about 16 forged SYN packets per second (1024 / 63) already keep the queue full, so sending 20 per second is enough to overflow it. Real connections can then no longer be established and normal requests get no response. The end result is that the server has no capacity left to handle normal connection requests: a denial of service.
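The arithmetic above can be reproduced with a short calculation. This is only a sketch of the numbers given in the text; the function names syn_hold_time and min_flood_rate are made up, and the initial timeout of 1 second and the doubling rule are taken from the description above.

```python
def syn_hold_time(retries: int = 5, initial_timeout: float = 1.0) -> float:
    """Seconds a half-open connection is kept when the client never answers.

    The server waits once after the first SYN+ACK and once after each retry,
    doubling the wait every time: 1 + 2 + 4 + ... for (retries + 1) waits.
    """
    return sum(initial_timeout * 2 ** i for i in range(retries + 1))


def min_flood_rate(queue_len: int, hold_seconds: float) -> float:
    """Forged SYN packets per second needed to keep the queue permanently full."""
    return queue_len / hold_seconds


if __name__ == "__main__":
    hold = syn_hold_time(retries=5)
    print("hold time with 5 retries:", hold, "s")
    print("SYN/s to fill a queue of 1024:", round(min_flood_rate(1024, hold), 2))
    print("hold time with 1 retry:", syn_hold_time(retries=1), "s")
```

With 5 retries the hold time is 63 seconds. With a single retry it drops to 3 seconds, which is why lowering the SYN+ACK retry count reduces the time each forged request occupies a queue slot.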

Kernel TCP parameter optimization

Edit the file /etc/sysctl.conf and add the following:

net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_keepalive_time = 600
net.ipv4.ip_local_port_range = 4000 65000
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.route.gc_timeout = 100
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_orphans = 16384

Then execute sysctl -p to make the parameters take effect.
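Before and after applying the file, it can be useful to compare the configured values with what the kernel currently reports. The sketch below is one possible helper, not part of the sysctl tool: the function names are made up, and it relies on the convention that a key such as net.ipv4.tcp_syncookies corresponds to the path /proc/sys/net/ipv4/tcp_syncookies.

```python
def parse_sysctl_conf(text: str) -> dict:
    """Parse 'key = value' lines from sysctl.conf text into a dictionary."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' or ';' are ignored.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key: str) -> str:
    """Map a sysctl key to its file under /proc/sys."""
    return "/proc/sys/" + key.replace(".", "/")


def read_current(key: str) -> str:
    """Read the value the running kernel reports for a key (Linux only)."""
    with open(sysctl_path(key)) as handle:
        return " ".join(handle.read().split())   # normalize tabs and newlines
```

A comparison could then loop over the parsed settings and call read_current for each key, reporting keys whose file does not exist on the running kernel instead of failing.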

Function description:

net.ipv4.tcp_fin_timeout: when the socket is closed by the local end, this parameter determines how long it remains in the FIN-WAIT-2 state. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_fin_timeout, and the default value is 60 seconds.

net.ipv4.tcp_tw_reuse: turns on reuse, which allows TIME-WAIT sockets to be reused for new TCP connections. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_tw_reuse, and the default value is 0, which means off.

net.ipv4.tcp_tw_recycle: enables fast recycling of TIME-WAIT sockets. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_tw_recycle, and the default value is 0, which means off. Note: the reuse and recycle parameters are set to prevent an excessive number of TIME_WAIT connections on business servers such as Web and Squid servers in production. Be aware that tcp_tw_recycle is known to break connections from clients behind NAT and was removed in Linux 4.12, so on newer kernels this setting no longer exists and the line should be left out.

net.ipv4.tcp_syncookies: enables the SYN Cookies function. When the SYN waiting queue overflows, cookies are used to handle new requests, which protects against a small number of SYN attacks. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_syncookies, and the default value is 1, so this line may be omitted.

net.ipv4.tcp_keepalive_time: when keepalive is enabled, how long a connection must be idle before TCP starts sending keepalive probes. The default is 2 hours, and it is recommended to change it to 10 minutes. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_keepalive_time, and the default value is 7200 seconds.

net.ipv4.ip_local_port_range: sets the range of local ports the system is allowed to open, that is, the range of ports used for outgoing connections. This parameter corresponds to the system path /proc/sys/net/ipv4/ip_local_port_range, and the default value is 32768 61000.

net.ipv4.tcp_max_syn_backlog: the length of the SYN queue, which defaults to 1024. It is recommended to increase the queue length to 8192 or more to accommodate more connections waiting to be established. This parameter is the maximum number of connection requests the server records that have not yet been acknowledged by the client. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_max_syn_backlog.

net.ipv4.tcp_max_tw_buckets: the maximum number of TIME_WAIT sockets the system keeps at the same time. If this value is exceeded, TIME_WAIT sockets are cleared immediately and a warning message is printed. The default is 180000. It can be lowered to 5000-30000 for Apache, Nginx and similar servers, and set larger for servers that do not run business applications, such as LVS and Squid. Capping the number of TIME_WAIT sockets prevents a Squid server from being overwhelmed by them. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_max_tw_buckets.

net.ipv4.tcp_synack_retries: the number of SYN+ACK packets the kernel sends before it abandons the connection. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_synack_retries, and the default value is 5.

net.ipv4.tcp_syn_retries: the number of SYN packets the kernel sends before it gives up establishing a connection. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_syn_retries, and the default value is 5.

net.ipv4.tcp_max_orphans: the maximum number of TCP sockets in the system that are not associated with any user file handle. If this value is exceeded, orphaned connections are reset immediately and a warning message is printed. This limit exists only to prevent simple DoS attacks. Do not rely on it too much or lower it artificially; more often it should be increased. This parameter corresponds to the system path /proc/sys/net/ipv4/tcp_max_orphans, and the default value is 65536.

net.core.somaxconn: caps the backlog passed to listen, that is, the maximum number of fully established connections that can wait in a listening socket's accept queue. The default value is 128. Under highly concurrent requests, the default value may cause connection timeouts or retransmissions, so this value needs to be adjusted to the number of concurrent requests. This parameter corresponds to the system path /proc/sys/net/core/somaxconn.

net.core.netdev_max_backlog: the maximum number of packets allowed to queue when a network interface receives packets faster than the kernel can process them. This parameter corresponds to the system path /proc/sys/net/core/netdev_max_backlog, and the default value is 1000.

That concludes this discussion of Linux kernel parameter optimization from the perspective of the TCP/IP protocol. I hope the content above is of some help to you.
