How to see the Listen and connection queue of Socket TCP from the Linux source code

2025-04-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Many newcomers are not very clear about how the Socket TCP listen call and its connection queues look from the Linux source code. To help solve this problem, the editor below explains it in detail. Anyone with this need is welcome to come and learn; I hope you gain something from it.

From the Linux source code: Socket (TCP) listen and the connection queues. Preface: the author has always felt that knowing every line of code from the application, through the framework, and down to the operating system is an exciting thing. Today, the author looks from the perspective of the Linux source code at what a server-side Socket does during listen (based on the Linux 3.10 kernel). Because the backlog parameter of listen is related to the semi-connection hash table and the full connection queue, those are also covered in this post.

listen is required for a server-side Socket

As we all know, establishing a server-side Socket requires four steps: socket, bind, listen, and accept. Today I will focus on the listen step.

The code is as follows:

```c
void start_server() {
    // server fd
    int sockfd_server;
    // accept fd
    int sockfd;
    int call_err;
    struct sockaddr_in sock_addr;
    ...
    call_err = bind(sockfd_server, (struct sockaddr *)(&sock_addr), sizeof(sock_addr));
    if (call_err == -1) {
        fprintf(stdout, "bind error!\n");
        exit(1);
    }
    // this is our focus today: listen
    call_err = listen(sockfd_server, MAX_BACK_LOG);
    if (call_err == -1) {
        fprintf(stdout, "listen error!\n");
        exit(1);
    }
}
```

First, we create a socket through the socket system call, specifying SOCK_STREAM with the last parameter 0, which creates an ordinary TCP Socket. Here, we directly give the ops, that is, the operation functions, corresponding to a TCP Socket.

If you want to know where the structure in the picture above comes from, you can take a look at my previous blog:

https://my.oschina.net/alchemystar/blog/1791017

Now that we have reached the listen system call, let's go directly into it.

```c
#include <sys/socket.h>
// returns 0 on success; on error returns -1 and sets errno to the error code
int listen(int sockfd, int backlog);
```

Note that the listen called here is the one wrapped by glibc's INLINE_SYSCALL, which normalizes the return value to 0 or -1 and stores the absolute value of the error code in errno. The backlog here is a very important parameter; if it is not set well, it is a very well-hidden pit.

Java developers mostly use off-the-shelf frameworks, and Java's own default backlog size is only 50. This leads to some subtle phenomena, which will be explained in this article.

Next, let's go down into the Linux kernel call stack.

```c
listen
 |-> INLINE_SYSCALL(listen, ...)
      |-> SYSCALL_DEFINE2(listen, int, fd, int, backlog)
           /* check whether the descriptor fd exists; if not, return -EBADF */
           |-> sockfd_lookup_light
           /* the passed-in backlog may not exceed /proc/sys/net/core/somaxconn */
           |-> if ((unsigned int)backlog > somaxconn)
                   backlog = somaxconn;
           |-> sock->ops->listen(sock, backlog)  /* inet_listen */
```

It is worth noting that the kernel adjusts the backlog value we pass in so that it cannot exceed the somaxconn kernel parameter.

inet_listen

The next step is the core function, inet_listen.

```c
int inet_listen(struct socket *sock, int backlog)
{
    ...
    /* Really, if the socket is already in listen state
     * we can only allow the backlog to be adjusted.
     */
    if (old_state != TCP_LISTEN) {
        if ((sysctl_tcp_fastopen & TFO_SERVER_ENABLE) != 0 &&
            inet_csk(sk)->icsk_accept_queue.fastopenq == NULL) {
            // fastopen logic
            if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT1) != 0)
                err = fastopen_init_queue(sk, backlog);
            else if ((sysctl_tcp_fastopen & TFO_SERVER_WO_SOCKOPT2) != 0)
                err = fastopen_init_queue(sk,
                          ((uint)sysctl_tcp_fastopen) >> 16);
            else
                err = 0;
            if (err)
                goto out;
        }
        err = inet_csk_listen_start(sk, backlog);
        if (err)
            goto out;
    }
    sk->sk_max_ack_backlog = backlog;
    ...
}
```

The first interesting thing about this code is that the listen system call can be invoked repeatedly! A second call can only change the backlog queue length (although this doesn't seem very useful).

First, let's look at the logic other than fastopen (fastopen will be discussed in more detail in a separate article later). The final call is inet_csk_listen_start.

```c
int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
{
    ...
    // the nr_table_entries here is the adjusted backlog;
    // inside this function it is further clamped by the logic
    // nr_table_entries = min(backlog, sysctl_max_syn_backlog)
    int rc = reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries);
    ...
    inet_csk_delack_init(sk);
    // set the socket to listen state
    sk->sk_state = TCP_LISTEN;
    // check the port number
    if (!sk->sk_prot->get_port(sk, inet->inet_num)) {
        // clear the dst cache
        sk_dst_reset(sk);
        // chain the current sock into listening_hash, so that when a SYN
        // arrives, the listening sock can be found via __inet_lookup_listener
        sk->sk_prot->hash(sk);
        return 0;
    }
    sk->sk_state = TCP_CLOSE;
    __reqsk_queue_destroy(&icsk->icsk_accept_queue);
    // the port is already occupied; return the error code -EADDRINUSE
    return -EADDRINUSE;
}
```

The most important call here is sk->sk_prot->hash(sk), which links the current sock into the global listen hash table so that the corresponding listening sock can be found when a SYN packet arrives. As shown in the following figure:

As shown in the figure, if SO_REUSEPORT is enabled, different Sockets can listen on the same port, which lets the kernel load-balance connections among them. After Nginx 1.9.1 enabled this option, its benchmarked performance reached 3 times the original!

The semi-connection hash table and the full connection queue

The materials the author read at the beginning say that TCP has two connection queues: one sync_queue and one accept_queue. But after carefully reading the source code, the author found that this is not accurate. The sync_queue is actually a hash table, syn_table. The other queue is icsk_accept_queue.

So in this article it is called reqsk_queue (short for request_socket_queue). Here, the author first shows when the two queues come into play during the three-way handshake. As shown in the following figure:

Of course, in addition to the qlen and sk_ack_backlog counters mentioned above, there is also qlen_young, which serves the following purpose:

qlen_young: counts the request_socks that have just arrived, whose SYN_ACK has not yet been retransmitted by the SYN_ACK retransmission timer, and which have not completed the three-way handshake.

As shown in the following figure:

As for the retransmission timer for SYN_ACK, the code in the kernel is as follows:

```c
static void tcp_synack_timer(struct sock *sk)
{
    inet_csk_reqsk_queue_prune(sk, TCP_SYNQ_INTERVAL,
                               TCP_TIMEOUT_INIT, TCP_RTO_MAX);
}
```

This timer runs every 200ms (TCP_SYNQ_INTERVAL) while the semi-connection queue is not empty. For reasons of space, the author will not discuss it further here.

Why is there a semi-connection queue?

Because of the characteristics of the TCP protocol, there are network attacks such as half-open (semi-connection) floods: the attacker keeps sending SYN packets and never responds to the SYN_ACK. If each SYN made the kernel build a full, expensive sock, it would be easy to run out of memory. Therefore, before the three-way handshake succeeds, the kernel allocates only a request_sock that occupies very little memory, and, together with the syn_cookie mechanism, does its best to resist this kind of half-open attack.

Limits on the semi-connection hash table and the full connection queue

Because the normal socks that take up a lot of memory are stored in the full connection queue, the kernel imposes a maximum length limit on it. This limit is the minimum of the following three values:

1. the backlog passed in the listen system call
2. /proc/sys/net/ipv4/tcp_max_syn_backlog
3. /proc/sys/net/core/somaxconn

that is, min(backlog, tcp_max_syn_backlog, somaxconn).

If somaxconn is exceeded, the connection will be discarded by the kernel, as shown in the following figure:

Discarding connections in this case leads to a strange phenomenon: when tcp_abort_on_overflow is not set, the client is not aware of it, and only discovers that the peer has discarded the connection at the time of its first read or write call.

So, how do we let the client perceive this situation? We can set tcp_abort_on_overflow.

```shell
echo '1' > /proc/sys/net/ipv4/tcp_abort_on_overflow
```

After setting up, as shown in the following figure:

Of course, the most direct thing is to turn up the backlog!

```shell
# in the application: listen(fd, 2048)
echo '2048' > /proc/sys/net/ipv4/tcp_max_syn_backlog
echo '2048' > /proc/sys/net/core/somaxconn
```

The influence of backlog on the semi-connection queue

This backlog also affects the semi-connection queue, as shown in the following code:

```c
/* TW buckets are converted to open requests without
 * limitations, they conserve resources and peer is
 * evidently real one.
 */
// when SYN cookies are enabled, if the semi-connection queue is full,
// send a cookie; otherwise drop
if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
    want_cookie = tcp_syn_flood_action(sk, skb, "TCP");
    if (!want_cookie)
        goto drop;
}

/* Accept backlog is full. If we have already queued enough
 * of warm entries in syn queue, drop request. It is better than
 * clogging syn queue with openreqs with exponentially increasing
 * timeout.
 */
// when the full connection queue is full and there are young acks, drop directly
if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) {
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
    goto drop;
}
```

We often see this in dmesg:

Possible SYN flooding on port 8080

This is because the kernel switches to cookie checking when the semi-connection queue is full.
