Shulou (Shulou.com), SLTechnology News&Howtos, 2025-01-30
How to Analyze the Socket and TCP Connection Process

Many newcomers are unclear about how sockets relate to the TCP connection process. To help with that, this article explains the relevant system calls in detail; readers who need this background will hopefully take something away from it.
1. Background
1. The complete socket format is {protocol, src_addr, src_port, dest_addr, dest_port}.
This is often called the socket five-tuple. protocol specifies TCP or UDP; the other four fields specify the source address, source port, destination address, and destination port. But where do these values come from?
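As a sketch of where the fields come from, the following Python snippet (the socket module mirrors the C socket API one-to-one; the loopback setup here is purely illustrative) establishes a TCP connection over 127.0.0.1 and reads the four address fields back. The protocol field is TCP by construction:

```python
import socket

# Server side: create, bind (port 0 = kernel picks a free port), listen.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
srv_addr = srv.getsockname()

# Client side: connect() fills in the remaining fields of the five-tuple.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv_addr)
conn, _ = srv.accept()

src_addr, src_port = cli.getsockname()    # source address and (ephemeral) port
dest_addr, dest_port = cli.getpeername()  # destination address and port
five_tuple = ("TCP", src_addr, src_port, dest_addr, dest_port)
print(five_tuple)

for s in (conn, cli, srv):
    s.close()
```

Note that the client never called bind(): the kernel assigned the ephemeral source port automatically during connect().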
2. The TCP protocol stack maintains two socket buffers per connection: the send buffer and the recv buffer.
Data to be sent over a TCP connection is first copied into the send buffer, either from the app buffer of a user-space process or from a kernel buffer. The copy is performed by the send() function. Since the write() function can also be used to write data, this step is often simply called writing data, and the send buffer is correspondingly nicknamed the write buffer; send(), however, is the more socket-specific of the two.
Eventually the data leaves through the network card, so the data in the send buffer must be copied to the card. Because one end is memory and the other is a device, the copy can be done by DMA without involving the CPU. In other words, DMA copies the data from the send buffer to the network card, which transmits it across the network to the other end of the TCP connection: the receiver.
When receiving data over a TCP connection, the data first arrives through the network card and is copied into the recv buffer, again by DMA. The recv() function then copies the data from the recv buffer into the app buffer of the user-space process.
3. Two types of sockets: listening sockets and connected sockets.
When the service process reads its configuration file, it parses out the address and port, creates the listening socket with socket(), and binds it to that address and port with bind(). The process/thread can then listen on the port (strictly speaking, on the listening socket) with listen().
A connected socket is the socket returned by accept() after a TCP connection request has been received and the three-way handshake completed. Subsequent processes/threads communicate with the client through this connected socket.
To distinguish the two socket descriptors returned by socket() and accept(), they are conventionally named listenfd (listening socket) and connfd (connected socket).
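A minimal sketch of the two descriptors, again in Python for runnability (the listenfd/connfd names follow the convention just described; the in-process client exists only so the example is self-contained):

```python
import socket

# listenfd: the listening socket, prepared by socket() + bind() + listen().
listenfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listenfd.bind(("127.0.0.1", 0))
listenfd.listen(5)

# A client connects (in-process here, just for illustration).
client = socket.create_connection(listenfd.getsockname())

# connfd: the connected socket returned by accept(); listenfd keeps listening.
connfd, peer = listenfd.accept()
client.sendall(b"hello")
data = connfd.recv(1024)   # communication happens on connfd, not listenfd

connfd.close(); client.close(); listenfd.close()
```

After accept() returns, listenfd is still in the LISTEN state and could accept further clients; only connfd is tied to this particular five-tuple.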
The following sections explain what each of these functions does; analyzing them also walks through how connections are established and torn down.
2. The connection process in detail
2.1 The socket() function
The socket() function creates a socket file descriptor, sockfd, for communication (socket() creates an endpoint for communication and returns a descriptor). This descriptor can later be used as the binding target of the bind() function.
2.2 The bind() function
After parsing the configuration file, the server knows the address and port it wants to listen on. Given the sockfd created by socket(), bind() binds the socket to that "addr:port" combination. A socket bound to a port can then be used as the listening target of the listen() function.
A socket bound to an address and port has a source address and source port (from the server's own point of view). Together with the protocol type specified in the configuration file, that fills in three fields of the five-tuple:
{protocol, src_addr, src_port}
However, many service programs can be configured to listen on multiple addresses and ports, to run multiple instances. This is implemented by creating and binding multiple sockets through repeated socket() + bind() system calls.
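That multiple-instance pattern can be sketched as follows: one socket() + bind() + listen() per addr:port to be served (ports here are kernel-chosen so the example runs anywhere; a real config file would name fixed ports):

```python
import socket

def make_listener(addr, port):
    """One socket() + bind() + listen() for each addr:port in the config."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((addr, port))
    s.listen(128)
    return s

# Two listening sockets, as if the config file named two addr:port pairs.
listeners = [make_listener("127.0.0.1", 0), make_listener("127.0.0.1", 0)]
ports = [s.getsockname()[1] for s in listeners]
print(ports)  # two distinct ports, one socket each

for s in listeners:
    s.close()
```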
2.3 The listen() and connect() functions
As the name implies, listen() listens on a socket that has been bound to addr+port by bind(). After listen() is called, the socket changes from the CLOSED state to the LISTEN state and can serve as a window for incoming TCP connections.
The connect() function initiates a connection request to a listening socket, i.e. it starts the TCP three-way handshake. So it is the connection initiator (for example, the client) that calls connect(); before doing so, the initiator must also create a sockfd of its own, usually bound to a random ephemeral port. Since connect() targets a listening socket, it naturally carries the destination of the connection: the destination address and port, which are the address and port bound to the server's listening socket. It also carries its own address and port, which become the source address and source port of the connection request from the server's point of view. As a result, the sockets on both ends of the TCP connection now have the complete five-tuple.
2.3.1 A deeper look at listen()
Let's talk more about listen(). If multiple addresses+ports are listened on, i.e. multiple sockets need watching, the process/thread responsible for listening uses select() or poll() to monitor them (epoll() works too). When only one socket is monitored, select() or poll() simply watches that single descriptor.
Whether select() or poll() is used (set aside epoll's different mechanics), the listening process/thread blocks in select()/poll() while it waits. When data (a SYN segment) is written to the sockfd it watches (that is, into the socket's recv buffer), the listener is woken up: it copies the SYN data into the app buffer it manages in user space for processing, and sends back a SYN+ACK, which must likewise be copied from the app buffer into the send buffer (via send()) and then to the network card for transmission. At this point a new entry for the connection is created in the incomplete-connection queue, with state SYN_RECV. The listener then goes back to watching the listenfd with select()/poll() and is not woken until data arrives on it again.
If the newly arrived data is the ACK, it is copied into the app buffer for processing, and the corresponding entry is moved from the incomplete-connection queue to the completed-connection queue and set to ESTABLISHED. If it is not an ACK, it must be another SYN, i.e. a new connection request, which is handled as above and placed in the incomplete-connection queue. This is the loop in which the listener handles TCP connection setup.
In other words, listen() maintains two queues: the incomplete-connection queue and the completed-connection queue. When the listener receives a SYN from a client and replies with SYN+ACK, an entry for that client is appended to the incomplete-connection queue with state SYN_RECV. This entry must contain the client's address and port information (likely in hashed form; the exact representation is an implementation detail). When the server later receives the client's ACK, the listener thread determines, by analyzing the segment, which entry in the incomplete-connection queue it answers, moves that entry to the completed-connection queue, and sets its state to ESTABLISHED.
When the incomplete-connection queue is full, the listener blocks and stops receiving new connection requests, waiting in select()/poll() for the queues to become writable again. When the completed-connection queue is full, the listener likewise receives no new connection requests, and moves from the incomplete queue into the completed queue are blocked. Before Linux 2.2, the backlog parameter of listen() set the maximum combined length of the two queues; since Linux 2.2 it bounds only the completed queue, while /proc/sys/net/ipv4/tcp_max_syn_backlog sets the maximum length of the incomplete queue. /proc/sys/net/core/somaxconn is the hard cap on the completed queue's length and defaults to 128; if backlog is larger than somaxconn, it is silently truncated to that value.
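The backlog argument and its somaxconn cap can be sketched like this (the /proc path is Linux-specific, and the default cap varies by kernel version, so the value is read rather than assumed):

```python
import socket

# Read the hard cap on the completed-connection queue (Linux only).
try:
    with open("/proc/sys/net/core/somaxconn") as f:
        somaxconn = int(f.read())
except OSError:
    somaxconn = None  # not on Linux; the cap still exists in principle

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
# Since Linux 2.2 this backlog bounds only the completed queue; if it is
# larger than somaxconn, the kernel silently truncates it to that value.
s.listen(1024)
print("somaxconn =", somaxconn)
s.close()
```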
When a connection in the completed queue is accept()ed, the TCP connection is established, and it uses its own socket buffers to exchange data with the client. Both the connected socket's buffers and the listening socket's buffers store data received and sent over TCP, but their contents differ: the listening socket's buffers hold only the SYN and ACK data of connection requests, while the buffers of an established TCP connection mainly hold the "real" data exchanged between the two ends, such as response data built by the server and HTTP request data sent by the client.
The Send-Q and Recv-Q columns of the netstat command show socket-buffer-related values. Here is the explanation from man netstat:
Recv-Q
    Established: The count of bytes not copied by the user program connected to this socket.
    Listening: Since Kernel 2.6.18 this column contains the current syn backlog.
Send-Q
    Established: The count of bytes not acknowledged by the remote host.
    Listening: Since Kernel 2.6.18 this column contains the maximum size of the syn backlog.
For sockets in the LISTEN state, Recv-Q shows the current syn backlog, i.e. the number of connections currently in the completed queue, and Send-Q shows the maximum syn backlog, i.e. the maximum length of the completed-connection queue.
For established TCP connections, Recv-Q is the amount of data in the recv buffer not yet copied by the user process, and Send-Q is the amount of data for which the remote host has not yet returned an ACK. The two states are distinguished because listening sockets and established-connection sockets use their socket buffers differently: for a listening socket the queue lengths matter, while for an established connection the amount of data sent and received matters.
[root@xuexi ~]# netstat -tnl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN
tcp6       0      0 :::80                   :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 ::1:25                  :::*                    LISTEN
[root@xuexi ~]# ss -tnl
State      Recv-Q Send-Q      Local Address:Port        Peer Address:Port
LISTEN     0      128                     *:22                     *:*
LISTEN     0      100             127.0.0.1:25                     *:*
LISTEN     0      128                    :::80                    :::*
LISTEN     0      128                    :::22                    :::*
LISTEN     0      100                   ::1:25                    :::*
Note that for sockets in the LISTEN state, netstat's Send-Q differs from ss's Send-Q column: netstat does not show the maximum length of the completed queue at all. So when checking whether there is room in the queue for new TCP connection requests, prefer the ss command over netstat whenever possible.
2.3.2 The impact of SYN flood
If the listener does not receive the client's ACK after sending SYN+ACK, the timeout set in select()/poll() wakes it up and it retransmits the SYN+ACK, in case the segment was lost somewhere in the vast network. But this retransmission creates a problem: if the client forged its source address when calling connect(), the listener's SYN+ACK can never reach the other host, so no ACK will ever arrive and the listener keeps retransmitting. Each timeout wakeup, and each copy of the SYN+ACK back into the send buffer, consumes CPU; only the final copy from the send buffer to the network card is a DMA copy that needs no CPU. If the client is an attacker sending tens of thousands of such SYNs, the listener can be driven nearly to a standstill and the network badly congested. This is the SYN flood attack.
There are many mitigations for SYN flood: shrinking the maximum lengths of the two queues maintained by listen(), reducing the number of SYN+ACK retransmissions, increasing the retransmission interval, shortening the timeout for waiting on the ACK, enabling syncookies, and so on. But directly tuning TCP options rarely strikes a good balance between performance and protection, so filtering packets before they ever reach the listener thread is extremely important.
2.4 The accept() function
The accept() function takes the first entry from the completed-connection queue (removing it from the queue) and generates a new socket descriptor for it, say connfd, for the rest of the connection's life. With this connected socket, a worker process/thread (call it the worker) can exchange data with the client, while the listening socket (sockfd) mentioned earlier remains under the listener's watch.
For example, in httpd's prefork mode, each child process is both listener and worker. When a client initiates a connection request, one child process accepts it and gives up the listening socket, so that other child processes can listen on it. After this back and forth, a new connected socket is finally produced by accept(), and the child can focus on interacting with that client through it, possibly blocking or sleeping many times on various I/O waits along the way. This is quite inefficient: considering only the interval from the child receiving the SYN to finally producing the connected socket, the child blocks again and again. The listening socket can be set to non-blocking I/O mode instead, but then it must constantly check its status.
Compare the worker/event processing model, where each child process uses one dedicated listener thread and N worker threads. The listener thread is solely responsible for listening and for creating new connected socket descriptors, which it places into Apache's socket queue. Listener and worker are thus decoupled, and workers can keep working while listening goes on. Looking at the listening stage alone, though, worker/event mode is not inherently faster than prefork.
When the listener makes the accept() system call and the completed-connection queue is empty, the listener blocks. The socket can instead be set to non-blocking mode, in which case accept() returns an EWOULDBLOCK or EAGAIN error when nothing is available. select(), poll(), or epoll can be used to wait for readable events on the completed-connection queue; alternatively, the socket can be put into signal-driven I/O mode, so that a new entry in the completed queue notifies the listener, which then calls accept() to handle it.
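The non-blocking pattern just described can be sketched as follows: set the listening socket non-blocking, observe the EWOULDBLOCK/EAGAIN error (surfaced as BlockingIOError in Python) while the completed queue is empty, then use select() to wait for it to become readable:

```python
import select
import socket

listenfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listenfd.bind(("127.0.0.1", 0))
listenfd.listen(128)
listenfd.setblocking(False)

# Completed queue is empty: accept() fails with EWOULDBLOCK/EAGAIN
# instead of blocking the listener.
try:
    listenfd.accept()
    queue_was_empty = False
except BlockingIOError:
    queue_was_empty = True

# A client connects; select() reports listenfd readable, and accept() succeeds.
client = socket.create_connection(listenfd.getsockname())
readable, _, _ = select.select([listenfd], [], [], 5.0)
connfd, _ = readable[0].accept()

connfd.close(); client.close(); listenfd.close()
```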
One often hears about synchronous versus asynchronous connection handling. How exactly do they differ? With synchronous handling, from the moment the listener sees a client's SYN until the connected socket is established and the data exchange with that client ends, no connection requests from other clients are taken in between; the listener is occupied until that client's connection is closed. Put another way, synchronous handling keeps the socket buffer and app buffer data in step. Typically, when connections are handled synchronously, the listener and the worker are the same process, as in httpd's prefork model.
Asynchronous handling, by contrast, can take on other connection requests at any stage of connection establishment and data exchange. It is usually used when listener and worker are not the same process/thread, as in httpd's event model. Note that although httpd's worker model also separates listener and worker threads, it still handles connections synchronously: once the listener accepts a request and creates the connected socket, it immediately hands the socket to a worker thread, and that worker thread serves only that client until the connection closes. Even the event model's asynchrony applies only to special connections: a worker thread can hand a connection in a keep-alive state back to the listener thread's custody, while ordinary connections are still handled as in the synchronous case, so httpd event's "async" is really pseudo-asynchronous. To put it loosely: synchronous means one process/thread handling one connection, and asynchronous means one process/thread handling multiple connections.
2.5 The send() and recv() functions
The send() function copies data from the app buffer into the send buffer (the source can, of course, also be the kernel's own buffer), and recv() copies data from the recv buffer into the app buffer. There is nothing wrong with using write() and read() in their place, but send()/recv() are more socket-specific.
Both functions involve the socket buffers, and when calling send() or recv() one must consider whether the source buffer has data to copy and whether the destination buffer has room to write. If either condition is unmet, a process/thread calling send()/recv() blocks (assuming the socket uses the blocking I/O model). With the socket in non-blocking mode, calling send()/recv() while the buffer condition is unmet makes the call return immediately with the error EWOULDBLOCK or EAGAIN. Alternatively, select()/poll()/epoll can monitor the corresponding socket descriptor and send()/recv() can be called once the condition holds; or the socket can use the signal-driven I/O or asynchronous I/O models, so that send()/recv() need not be called until the data is ready and copied.
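The blocking versus non-blocking behavior of recv() can be sketched like this: on a non-blocking socket, recv() with an empty recv buffer fails immediately with EWOULDBLOCK/EAGAIN (BlockingIOError in Python) rather than putting the caller to sleep, and succeeds once data has arrived:

```python
import select
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()
conn.setblocking(False)

# recv buffer empty: non-blocking recv() returns EWOULDBLOCK/EAGAIN.
try:
    conn.recv(1024)
    got_eagain = False
except BlockingIOError:
    got_eagain = True

# Once data sits in the recv buffer, recv() copies it into the app buffer.
cli.sendall(b"payload")
select.select([conn], [], [], 5.0)   # wait until the recv buffer has data
data = conn.recv(1024)

conn.close(); cli.close(); srv.close()
```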
2.6 The close() and shutdown() functions
The general-purpose close() function closes a file descriptor, which of course includes connection-oriented network socket descriptors. When close() is called, the kernel attempts to send whatever data remains in the send buffer. But close() merely decrements the socket's reference count by 1, much as rm removes only one hard-link count of a file; only when all references to the socket have been closed is the socket descriptor actually shut down and the four-way teardown begun. For concurrent servers in which parent and child processes share a socket, calling close() in the child does not really close the socket, because the parent's reference is still open; if the parent never calls close(), the socket stays open and the four-way teardown can never start.
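That reference-count behavior can be illustrated without fork() by duplicating the descriptor with os.dup(), which shares the underlying socket exactly the way a fork()ed child would: closing one reference leaves the connection usable until the last reference is closed.

```python
import os
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

# Duplicate the client's descriptor: two references to one socket,
# just as a fork()ed child process would hold.
dup_fd = os.dup(cli.fileno())
dup_sock = socket.socket(fileno=dup_fd)

cli.close()                        # refcount 2 -> 1: no FIN yet
dup_sock.sendall(b"still open")    # the surviving reference still works
data = conn.recv(1024)

dup_sock.close()                   # refcount 1 -> 0: teardown can begin
conn.close(); srv.close()
```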
The shutdown() function is designed specifically to shut down a network socket's connection. Unlike close(), which decrements a reference count, it shuts the connection down directly, triggering the four-way teardown regardless of other references. Three modes can be specified:
1. Shut down writing (SHUT_WR). No more data can be written into the send buffer; data already in the send buffer is still sent until drained.
2. Shut down reading (SHUT_RD). Data can no longer be read from the recv buffer; data already in the recv buffer can only be discarded.
3. Shut down both (SHUT_RDWR). Neither reading nor writing is possible; data already in the send buffer is sent until drained, while data already in the recv buffer is discarded.
Whether via shutdown() or close(), once the close actually takes effect a FIN is sent as part of the four-way teardown.
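A sketch of closing only the write side: after shutdown(SHUT_WR) the local end sends a FIN, the peer's recv() eventually returns end-of-file (empty bytes), yet the peer can still send data back through the half-open connection:

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

cli.sendall(b"request")
cli.shutdown(socket.SHUT_WR)      # FIN sent; no more writes from this side

chunks = []
while True:
    buf = conn.recv(1024)
    if not buf:                   # empty read = the peer's FIN reached us
        break
    chunks.append(buf)
request = b"".join(chunks)

conn.sendall(b"response")         # cli's read side is still open
response = cli.recv(1024)

conn.close(); cli.close(); srv.close()
```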
3. Address/port reuse
Normally an addr+port can be bound by only one socket; in other words, addr+port cannot be reused, and different sockets must bind to different addr+port combinations. For example, to run two sshd instances, the configuration files of the two instances must not specify the same addr+port. Likewise, two web virtual hosts normally cannot be configured with the same addr+port, unless they are name-based: name-based virtual hosts can share an addr+port because the HTTP request message carries the hostname. In that case connection requests still arrive through the same listening socket, and httpd's worker process/thread dispatches each connection to the matching host.
That is the normal case; the exceptions are address reuse and port reuse, together called socket reuse. The current Linux kernel already provides the socket option SO_REUSEADDR for address reuse and the socket option SO_REUSEPORT for port reuse. With the port-reuse option set, binding the socket no longer raises an error. Moreover, once an instance has bound the same addr+port with two sockets (more are possible; two is the example here), two listening processes/threads can listen on them simultaneously, and incoming client connections are distributed between them by round-robin balancing.
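A sketch of port reuse, assuming a Linux 3.9+ kernel where SO_REUSEPORT is available (the option is guarded with hasattr because it does not exist on every platform): two independent sockets bind and listen on the same addr:port without EADDRINUSE, and the kernel load-balances incoming connections between them.

```python
import socket

def reuseport_listener(addr, port):
    """A listening socket with SO_REUSEPORT set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((addr, port))
    s.listen(128)
    return s

if hasattr(socket, "SO_REUSEPORT"):            # Linux 3.9+, some BSDs
    a = reuseport_listener("127.0.0.1", 0)
    port = a.getsockname()[1]
    b = reuseport_listener("127.0.0.1", port)  # same addr:port, no EADDRINUSE
    bound_same_port = (b.getsockname()[1] == port)
    a.close(); b.close()
else:
    bound_same_port = None  # option unavailable on this platform
print("bound two listeners on one port:", bound_same_port)
```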
From the listening process/thread's point of view, each reused socket is called a listener bucket; that is, each listening socket is a listening bucket.
Take httpd's worker or event model as an example, suppose that there are currently three child processes, each with one listener thread and N worker threads.
Without address reuse, the listening threads contend for the right to listen: at any moment only one listening thread can hold the listening socket (by acquiring the mutex). When that thread accepts a request, it gives up its listening role, the other listening threads compete for it, and again only one wins.
With address and port reuse, multiple sockets can be bound to the same addr+port. For example, as the figure shows, with one more listening bucket there are two sockets, so two listening threads can listen at the same time; when one of them accepts a request, it yields its slot and the remaining listening threads compete for it.
If one more socket is bound again, none of the three listening threads ever needs to give up listening; all of them can listen continuously.
This looks like a clear performance win: it reduces contention for the listening mutex, avoids the starvation problem, and, because the load is balanced, eases the pressure on each listening thread. In practice, however, each listening thread's listening consumes CPU. With only a single-core CPU, reuse shows no advantage and may even reduce performance because of the cost of switching between listening threads. So to use port reuse, consider whether the listening processes/threads can be isolated on their own CPU cores; whether to reuse, and how many times, depends on the number of CPU cores and on whether the processes are pinned to them.