

Understanding TCP, HTTP, Socket, and Socket Connection Pools

2025-03-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

As developers, we often hear terms like HTTP protocol, TCP/IP protocol, UDP protocol, Socket, Socket persistent connections, and Socket connection pools. However, not everyone understands the relationships, differences, and principles behind them. This article starts from the basics of network protocols and works up to Socket connection pools, explaining the relationships between them step by step.

Seven-layer network model

First, let's start with the layered model of network communication: the seven-layer model, also known as the OSI (Open System Interconnection) model. From bottom to top, the layers are the physical layer, data link layer, network layer, transport layer, session layer, presentation layer, and application layer. Everything about communication builds on this model, and each layer has its own corresponding protocols and hardware.


From this model we can see that the IP protocol corresponds to the network layer, the TCP and UDP protocols correspond to the transport layer, and the HTTP protocol corresponds to the application layer. The OSI model has no layer called Socket, so what is a Socket? We will describe it in detail later with code.

TCP and UDP connection

At the transport layer, TCP and UDP are the protocols we encounter most. People say that TCP is reliable while UDP is not, and that UDP transmits faster than TCP. Why? Let's start with how a TCP connection is established, and then explain the differences between UDP and TCP.

TCP's three-way handshake and four-way teardown

We know that establishing a TCP connection requires a three-way handshake, while closing one takes four waves. What do these handshakes and waves actually do, and how do they work?

First handshake: establishing the connection. The client sends a connection request segment, setting SYN=1 and Sequence Number=x; the client then enters the SYN_SENT state and waits for confirmation from the server.

Second handshake: when the server receives the client's SYN segment, it must acknowledge it, setting the Acknowledgment Number to x+1 (the client's Sequence Number + 1). At the same time, it sends its own SYN request, with SYN=1 and Sequence Number=y. The server puts all of this into one segment (the SYN+ACK segment), sends it to the client, and enters the SYN_RECV state.

Third handshake: the client receives the server's SYN+ACK segment, sets the Acknowledgment Number to y+1, and sends an ACK segment to the server. Once this segment is sent, both the client and the server enter the ESTABLISHED state, completing the TCP three-way handshake.

After completing the three-way handshake, the client and server can begin to transmit data. That is the general picture of the TCP three-way handshake. At the end of the communication, the client and server disconnect, which requires four waves of confirmation.

First wave: host 1 (either the client or the server) sets the Sequence Number and Acknowledgment Number and sends a FIN segment to host 2; host 1 then enters the FIN_WAIT_1 state. This means host 1 has no more data to send to host 2.

Second wave: host 2 receives the FIN segment from host 1 and returns an ACK segment whose Acknowledgment Number is host 1's Sequence Number + 1; host 1 enters the FIN_WAIT_2 state. Host 2 is telling host 1: "I agree to your close request."

Third wave: host 2 sends a FIN segment to host 1, requesting to close the connection, and host 2 enters the LAST_ACK state.

Fourth wave: host 1 receives the FIN segment from host 2 and sends an ACK segment back to host 2; host 1 then enters the TIME_WAIT state. Host 2 closes the connection after receiving host 1's ACK. If host 1 receives no further segments after waiting 2MSL, it concludes that the other side has closed normally, and host 1 can then close the connection as well.

You can see that setting up and closing a TCP connection takes at least 7 exchanges (3 for the handshake, 4 for the teardown), not counting the data transfer itself, while UDP needs neither the three-way handshake nor the four-way teardown.

The difference between TCP and UDP

TCP is connection-oriented. Although the unreliable, unstable nature of the network means that no number of handshakes can fully guarantee a connection, TCP's three-way handshake ensures the reliability of the connection at a minimum (and, in practice, to a large extent). UDP, by contrast, is not connection-oriented: it does not establish a connection before transmitting data and does not acknowledge the data it receives. The sender cannot know whether the data arrives correctly, and there is no retransmission, so UDP is a connectionless, unreliable transport protocol.

Because of the characteristics above, UDP has lower overhead and a higher data transmission rate; since sends and receives need not be acknowledged, UDP also has better real-time performance. Knowing the difference between TCP and UDP, it is not hard to understand why MSN, which transfers files over TCP, is slower than QQ, which uses UDP. This does not mean QQ's communication is unreliable, because programmers can verify UDP sends and receives manually, for example by numbering each packet on the sending side and verifying the numbers on the receiving side. Even so, because UDP does not use anything like TCP's "three-way handshake" in its underlying protocol, it achieves transmission efficiency that TCP cannot match.

Common questions

We often hear some questions about the transport layer.

1. What is the maximum number of concurrent connections a TCP server can handle?

There is a common misconception about the maximum number of concurrent connections to a TCP server: "because the upper limit of the port number is 65535, the theoretical maximum number of concurrent connections a TCP server can carry is 65535." First, understand what identifies a TCP connection: client IP, client port, server IP, and server port. For a TCP server process, the number of clients it can serve at the same time is therefore not limited by the number of available ports. Theoretically, the number of connections a single server port can establish is (the number of IP addresses in the world) x (the number of ports per machine). The actual number of concurrent connections is limited by the number of files Linux can open, which is configurable and can be very large, so in practice the limit is system performance. Check the maximum number of file handles with ulimit -n, and raise it with ulimit -n xxx, where xxx is the number of files you want to be able to open. You can also modify the system parameters.

2. Why must the TIME_WAIT state wait 2MSL before returning to the CLOSED state?

Although both parties have agreed to close the connection, and all four teardown segments have been coordinated and sent, you might expect both sides to go directly to the CLOSED state (just as SYN_SENT goes to ESTABLISHED). However, because we must assume the network is unreliable, you cannot guarantee that the last ACK you send will be received by the other side. The peer in the LAST_ACK state, not having received that ACK, may time out and resend its FIN. The purpose of the TIME_WAIT state, then, is to be able to resend an ACK that may have been lost.

3. What problems arise because the TIME_WAIT state must wait 2MSL before returning to the CLOSED state?

After the two sides establish a TCP connection, the side that actively closes the connection enters the TIME_WAIT state, which lasts two MSL periods, i.e. about 1-4 minutes (4 minutes on Windows). The side entering TIME_WAIT is usually the client, and each connection in the TIME_WAIT state occupies a local port. A machine has at most 65536 port numbers. If a stress test on a single machine simulates tens of thousands of client requests communicating with the server over short-lived connections in a loop, the machine will accumulate on the order of 4000 TIME_WAIT sockets, and subsequent short connections will fail with "address already in use: connect". If you use Nginx as a reverse proxy, you also need to consider the TIME_WAIT state. When the system shows a large number of connections in the TIME_WAIT state, this can be alleviated by tuning kernel parameters.

Edit the kernel parameter file (typically /etc/sysctl.conf) and add the following:
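For example, matching the parameter descriptions below (the tcp_fin_timeout value of 30 is illustrative):

```
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
```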

Then execute /sbin/sysctl -p to make the parameters take effect.

net.ipv4.tcp_syncookies = 1 enables SYN Cookies: when the SYN wait queue overflows, cookies are used to handle the connections, which protects against small-scale SYN flood attacks. The default is 0 (disabled).

net.ipv4.tcp_tw_reuse = 1 enables reuse, allowing TIME-WAIT sockets to be reused for new TCP connections. The default is 0 (disabled).

net.ipv4.tcp_tw_recycle = 1 enables fast recycling of TIME-WAIT sockets in TCP connections. The default is 0 (disabled). (Note that tcp_tw_recycle is known to break clients behind NAT and was removed from the Linux kernel in 4.12.)

net.ipv4.tcp_fin_timeout changes the system default for how long a socket stays in the FIN-WAIT-2 state before the connection is released.

HTTP protocol

There is an easy-to-understand description on the web of the relationship between TCP/IP and HTTP: "when we transmit data, we could use only the (transport-layer) TCP/IP protocol, but in that case, without an application layer, we would not be able to interpret the data content. If you want the transmitted data to be meaningful, you must use an application-layer protocol." There are many application-layer protocols, such as HTTP, FTP, and TELNET, and you can also define your own.

The HTTP protocol, namely the Hypertext Transfer Protocol, is the foundation of the Web and one of the protocols most commonly used by mobile apps. The Web uses HTTP as its application-layer protocol to encapsulate HTTP text messages, and then uses TCP/IP as the transport-layer protocol to send them across the network.

Because HTTP actively releases the connection at the end of each request, an HTTP connection is a "short connection". To keep a client "online", it must keep making requests to the server. The usual practice is: even when no data is needed immediately, the client periodically sends a "keep connected" request to the server, and the server replies to acknowledge that it knows the client is "online". If the server receives no request from a client for a long time, the client is considered "offline"; if the client receives no reply from the server for a long time, the network is considered disconnected.

Here is a simple HTTP POST request with application/json content:
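A minimal raw HTTP POST carrying application/json might look like this (the path, host, and body are illustrative):

```
POST /login HTTP/1.1
Host: example.com
Content-Type: application/json
Content-Length: 31

{"username":"u","password":"p"}
```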

About Socket (socket)

Now we know that TCP/IP is just a protocol stack. Like an operating system's runtime mechanisms, it must be concretely implemented and must expose interfaces for external operation. Just as an operating system provides standard programming interfaces, such as the Win32 API, TCP/IP also provides a programming interface: the Socket API. Note that Sockets are not inherently tied to TCP/IP: the Socket programming interface was designed so that it could also accommodate other network protocols. Sockets simply make the TCP/IP protocol stack more convenient to use, abstracting it into a few basic function interfaces such as create, listen, accept, connect, read, and write.

Different languages have corresponding libraries for building Socket servers and clients. Here is an example of how Node.js creates a server and a client:

Server:

The server listens on port 9000.

Next, the command line can be used to send an HTTP request with curl and to open a raw connection with telnet. Notice that curl completes one request/response exchange and then closes the connection.

Client

Socket persistent connection

A so-called persistent (long) connection means that multiple packets can be sent continuously over one TCP connection. While the connection is held open but no data is being sent, both sides need to send probe packets (heartbeat packets) to keep it alive, and generally must manage this themselves. A short connection means that a TCP connection is established when the two sides have data to exchange and is torn down as soon as the transfer completes. For example, HTTP simply connects, issues a request, and closes; the process is brief, and if the server receives no request within some period, it can close the connection. A long connection is simply "long" relative to the usual short connection: the client and server maintain the connection state for an extended time.

The usual steps for short connection are:

Connect → data transfer → close the connection

A long connection is usually:

Connect → data transmission → keep connection alive (heartbeat) → data transmission → keep connection alive (heartbeat) → ... → close the connection

When should long connections and short connections be used?

Long connections are mostly used for frequent, point-to-point communication where the number of connections is not too large. Each TCP connection requires a three-way handshake, which takes time; if every operation had to connect first, processing would be much slower. With a long connection, the socket stays open after each operation, and the next request is sent directly without establishing a new TCP connection. Database connections, for example, use long connections; using short connections for frequent communication would cause Socket errors, and frequently creating Sockets also wastes resources.

What is a heartbeat, and why is it needed?

A heartbeat packet is a self-defined message with which the client and server periodically notify each other of their status. It is sent at a fixed interval, similar to a heartbeat, hence the name. Sockets are used to send and receive data over the network, but if a socket has been broken (for example, one side has dropped offline), sending and receiving will inevitably fail. How can you tell whether a socket is still usable? You need a heartbeat mechanism. TCP actually implements one for us: keep-alive. If you enable it, TCP will send keep-alive probes at a configured interval, and these messages do not affect your own protocol. You can also define your own: a "heartbeat" is simply a custom structure (heartbeat packet or heartbeat frame) sent periodically to let the peer know you are "online", ensuring the link is still valid.

Implementation:

Server:


Client code:


Defining your own protocol

If you want the transmitted data to be meaningful, you must use an application-layer protocol, such as HTTP, MQTT, or Dubbo. To define your own application-layer protocol on top of TCP, several problems must be solved:

Definition and processing of heartbeat packet format

Definition of the message header: when sending data, the header is sent first, carrying the length of the data to follow, so the receiver can parse the body out of the byte stream.

The serialization format of the packet body, e.g. JSON or another format.

Let's define our own protocol and write a server and client that use it:

Define the header format as length:000000000xxxx, where xxxx is the length of the data, zero-padded so that the total header length is 20 characters. (This is just an example and is not rigorous.)

Data serialization method: JSON.
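The framing logic above can be sketched as a pair of helper functions (the names encodeFrame and decodeFrame are my own, not from the original article):

```javascript
// Framing helpers for the header format above:
// "length:" + 13-digit zero-padded body length = 20-character header.
const HEADER_PREFIX = 'length:';
const HEADER_LENGTH = 20;

function encodeFrame(obj) {
  const body = JSON.stringify(obj);                       // JSON serialization
  const len = String(Buffer.byteLength(body)).padStart(
    HEADER_LENGTH - HEADER_PREFIX.length, '0');
  return HEADER_PREFIX + len + body;                      // header + body
}

function decodeFrame(buffer) {
  // Returns null until one complete header + body has arrived.
  if (buffer.length < HEADER_LENGTH) return null;
  const header = buffer.slice(0, HEADER_LENGTH).toString();
  const bodyLen = parseInt(header.slice(HEADER_PREFIX.length), 10);
  if (buffer.length < HEADER_LENGTH + bodyLen) return null;
  const body = buffer.slice(HEADER_LENGTH, HEADER_LENGTH + bodyLen).toString();
  return { value: JSON.parse(body), rest: buffer.slice(HEADER_LENGTH + bodyLen) };
}

const frame = encodeFrame({ cmd: 'echo', data: 'hi' });
console.log(frame);  // length:0000000000026{"cmd":"echo","data":"hi"}
console.log(decodeFrame(Buffer.from(frame)).value);
```

decodeFrame returns any leftover bytes in rest, so a receiver can loop over a buffer that contains several back-to-back frames.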

Server:


Client


Here we can see that a single client handles one request at a time well. But imagine this scenario: if the same client issues several requests to the server concurrently over one connection, sending header data and content data multiple times, the data arriving at the server's data event becomes hard to attribute to the right request. For example, if two headers reach the server at the same time, the server may ignore one of them, and the content data that follows may not correspond to its header. So if you want to reuse long connections while handling highly concurrent requests to the server, you need a connection pool.

Socket connection pool

What is a Socket connection pool? Think of a pool as a collection of resources: a Socket connection pool is a collection that maintains a certain number of persistent Socket connections. It automatically checks the validity of those connections, evicts invalid ones, and replenishes the pool. At the code level, it is simply a class that implements this behavior. A connection pool generally contains the following attributes:

A queue of idle, available persistent connections

A queue of persistent connections currently in use

A queue of requests waiting for an idle persistent connection

A function to evict invalid persistent connections

Configuration of the pool size (number of persistent connection resources)

A function to create new persistent connection resources

The flow: when a request arrives, first try to obtain a persistent connection from the resource pool. If the idle queue has one, take that Socket and move it to the in-use queue. If the idle queue is empty and the in-use queue is shorter than the configured pool size, create a new long connection and add it to the in-use queue. If the in-use queue has reached the configured pool size, the request joins the waiting queue. When an in-use Socket finishes its request, it moves from the in-use queue back to the idle queue and, if any requests are waiting, the first waiter is woken to take the freed resource.
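The flow above can be sketched as a minimal pool class (SimplePool and its field names are my own illustration, not a real library):

```javascript
// A minimal connection-pool sketch implementing the flow described above.
class SimplePool {
  constructor({ max, create }) {
    this.max = max;            // configured pool size
    this.create = create;      // factory for new long connections
    this.idle = [];            // idle, available connections
    this.running = new Set();  // connections currently serving requests
    this.waiting = [];         // acquire() callers waiting for a free connection
  }

  acquire() {
    return new Promise((resolve) => {
      if (this.idle.length > 0) {
        const conn = this.idle.pop();     // reuse an idle connection
        this.running.add(conn);
        resolve(conn);
      } else if (this.running.size < this.max) {
        const conn = this.create();       // pool not full: create a new one
        this.running.add(conn);
        resolve(conn);
      } else {
        this.waiting.push(resolve);       // pool exhausted: wait for a release
      }
    });
  }

  release(conn) {
    this.running.delete(conn);
    if (this.waiting.length > 0) {
      const next = this.waiting.shift();  // hand the connection to a waiter
      this.running.add(conn);
      next(conn);
    } else {
      this.idle.push(conn);               // back to the idle queue
    }
  }
}

// Demo with plain objects standing in for Socket connections.
let id = 0;
const pool = new SimplePool({ max: 2, create: () => ({ id: ++id }) });
pool.acquire().then((c) => console.log('got connection', c.id));
```

A real pool would additionally validate connections on release and evict broken ones, as described above.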

The following is a brief introduction to generic-pool, a general-purpose connection pooling module for Node.js.

Main file directory structure

Initialize connection pool

Use connection pooling

In the connection pool usage below, the protocol used is the one we defined earlier.

In the log output, we can see that the first two requests each establish a new Socket connection (socket_pool 127.0.0.1 9000 connect). When two more requests are issued after the timer fires, no new Socket connection is established; the Socket connection resources are obtained directly from the connection pool.

Source code analysis

The main code is located in Pool.js in the lib folder.

Constructor:

lib/Pool.js

You can see that it contains the idle resource queue, the in-use resource queue, and the waiting request queue mentioned earlier.

Let's look at the Pool.acquire method.

lib/Pool.js

The code above works its way down through these structures until it finally obtains a long-connection resource; you can explore the rest of the code on your own.

About the author: 6 years of server-side development experience, responsible for evolving and deploying the technical architecture of a startup project from zero to high-concurrency traffic, with accumulated experience in platform development and high-concurrency, high-availability systems. Currently responsible for the architecture and development of the gateway layer in the company's microservice framework.
