Example Analysis of reuseport Evolution in Linux Kernel

2025-03-31 Update From: SLTechnology News&Howtos


This article walks through an example analysis of how reuseport evolved in the Linux kernel.

The SO_REUSEPORT option was introduced in Linux 3.9. A similar option, SO_REUSEADDR, existed before that.

If the difference and connection between the two are unclear, search for "How do SO_REUSEADDR and SO_REUSEPORT differ?"

If you would rather not read it, the next section gives a short summary.

What are SO_REUSEADDR and SO_REUSEPORT?

TCP/UDP uniquely identifies a connection by a five-tuple.

At any moment, no two connections may have exactly the same five-tuple; otherwise, when a packet arrives, the protocol stack cannot tell which connection it belongs to.

Five-tuple: {protocol, source address, source port, destination address, destination port}

Within the five-tuple, the protocol is determined when the socket is created, the source address and source port at bind(), and the destination address and destination port at connect().

Of course, bind() and connect() do not have to be called explicitly in some cases, but that is beyond the scope of this article.

So, if the SO_REUSEADDR and SO_REUSEPORT options are set on a socket, when do they take effect?

The answer is bind(), which is when the source address and source port are determined.
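To make the three steps concrete, here is a minimal userspace sketch, assuming a Linux/POSIX system with a loopback interface. The helper name fill_five_tuple and the destination port 9 are made up for illustration. UDP is used so that connect() only records the peer and sends nothing.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: builds a UDP socket step by step and reports the
 * local and remote halves of its five-tuple. Returns 0 on success. */
int fill_five_tuple(struct sockaddr_in *local, struct sockaddr_in *remote)
{
    struct sockaddr_in src, dst;
    socklen_t len;

    /* socket(): fixes the protocol (UDP over IPv4 here). */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* bind(): fixes the source address and port (port 0 = kernel picks). */
    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    src.sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
        goto fail;

    /* connect(): fixes the destination address and port (arbitrary here). */
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dst.sin_port = htons(9);
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
        goto fail;

    /* Read back both halves of the tuple. */
    len = sizeof(*local);
    if (getsockname(fd, (struct sockaddr *)local, &len) < 0)
        goto fail;
    len = sizeof(*remote);
    if (getpeername(fd, (struct sockaddr *)remote, &len) < 0)
        goto fail;

    close(fd);
    return 0;

fail:
    close(fd);
    return -1;
}
```

Any socket option that influences the source address and port therefore has to be set between the socket() and bind() calls.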

Operating system kernels differ somewhat in how they treat SO_REUSEADDR and SO_REUSEPORT, but all of the implementations originate from BSD.

Therefore, the discussion below takes the BSD implementation as the reference.

SO_REUSEADDR

Suppose socketA is to be bound to A:X and socketB to B:Y (ignoring X = 0 or Y = 0, because 0 means the kernel allocates a port automatically, which never conflicts).

When X != Y, both bind() calls succeed regardless of the relationship between A and B. But if X == Y, the results are as follows:

SO_REUSEADDR    socketA           socketB           Result
------------------------------------------------------------------------
ON/OFF          192.168.0.1:21    192.168.0.1:21    Error (EADDRINUSE)
ON/OFF          192.168.0.1:21    10.0.0.1:21       OK
ON/OFF          10.0.0.1:21       192.168.0.1:21    OK
OFF             0.0.0.0:21        192.168.1.0:21    Error (EADDRINUSE)
OFF             192.168.1.0:21    0.0.0.0:21        Error (EADDRINUSE)
ON              0.0.0.0:21        192.168.1.0:21    OK
ON              192.168.1.0:21    0.0.0.0:21        OK
ON/OFF          0.0.0.0:21        0.0.0.0:21        Error (EADDRINUSE)

The first column indicates whether SO_REUSEADDR is set, and the last column indicates whether the socket bound second binds successfully.

Note: the flag here refers to the socket bound second; whether the first socket set it does not matter.

As you can see, in the BSD implementation SO_REUSEADDR allows one socket bound to the wildcard address (0.0.0.0) and another bound to a specific address (192.168.1.0) to share the same port.
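The rule in the table can be condensed into a small predicate. This is only a model of the BSD behaviour described above, not kernel code. The names bsd_bind_ok, IPV4, and ADDR_ANY are invented for this sketch, and TIME_WAIT is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_ANY 0u  /* 0.0.0.0, the wildcard address */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Model of the table: socket A is already bound to addr_a:port and socket B
 * now tries addr_b on the same port. Returns true if B's bind() would
 * succeed. Only B's SO_REUSEADDR flag matters. */
bool bsd_bind_ok(uint32_t addr_a, uint32_t addr_b, bool b_reuseaddr)
{
    if (addr_a == addr_b)
        return false;        /* identical address:port always conflicts */
    if (addr_a == ADDR_ANY || addr_b == ADDR_ANY)
        return b_reuseaddr;  /* wildcard vs. specific: needs the flag */
    return true;             /* two different specific addresses */
}
```

For instance, bsd_bind_ok(ADDR_ANY, IPV4(192, 168, 1, 0), true) corresponds to the row where 0.0.0.0:21 is bound first and 192.168.1.0:21 second with the option ON.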

SO_REUSEADDR has another use case: TCP has a TIME_WAIT state, which is the final state on the side that actively closes the connection.

Suppose socketA binds to A:X and, after finishing its TCP communication, actively calls close() and enters TIME_WAIT. If socketB then also binds to A:X, it gets an EADDRINUSE error as well, but if socketB sets SO_REUSEADDR, the bind succeeds.
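In practice this is why servers usually set SO_REUSEADDR before binding. The following is a minimal sketch assuming Linux/POSIX sockets. The helper name listen_with_reuseaddr, the loopback address, and the backlog of 16 are illustrative choices, not taken from the article.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket on 127.0.0.1:port
 * with SO_REUSEADDR set, or -1 on error. */
int listen_with_reuseaddr(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Set the option before bind(): that is when it is consulted. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}
```

A restarted server that calls this helper can rebind its port even while connections from the previous run are still in TIME_WAIT.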

SO_REUSEPORT

Once SO_REUSEADDR is understood, SO_REUSEPORT is easy: it allows two sockets to bind to exactly the same source address and port.

SO_REUSEPORT    socketA           socketB           Result
------------------------------------------------------------------------
ON              192.168.0.1:21    192.168.0.1:21    OK

Note that all of the results above are for BSD. The Linux kernel differs in a few ways, for example:

SO_REUSEPORT is supported only from version 3.9 onward.

Once a TCP socket acting as a server has been bound to a specific port and has started to LISTEN, SO_REUSEADDR has no effect on that port, even if it was set beforehand. On this point Linux is stricter than BSD.

SO_REUSEADDR    socketA           socketB           Result
------------------------------------------------------------------------
ON/OFF          192.168.0.1:21    0.0.0.0:21        Error (EADDRINUSE)

Before version 3.9, for a socket acting as a client, the SO_REUSEADDR option had the effect that SO_REUSEPORT has in BSD. On this point Linux is more relaxed than BSD.

SO_REUSEADDR    socketA               socketB               Result
------------------------------------------------------------------------
ON              192.168.0.2:55555     192.168.0.2:55555     OK

Evolution of reuseport in Linux

First, how things worked before 3.9: the kernel socket uses the skc_reuse field to record whether SO_REUSEADDR is set.

    struct sock_common {
        /* omitted */
        unsigned char skc_reuse;
        /* omitted */
    };

    int sock_setsockopt(struct socket *sock, int level, int optname, ...
    {
        ...
        case SO_REUSEADDR:
            sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
            break;
    }

inet_bind_bucket represents a bound port.

    struct inet_bind_bucket {
        /* omitted */
        unsigned short port;
        signed short fastreuse;
        int num_owners;
        struct hlist_node node;
        struct hlist_head owners;
    };

The fastreuse field in the structure above indicates whether the port supports sharing, and all sockets that share the port are linked on the owners member. When the user calls bind(), the kernel binds the port with inet_csk_get_port() for TCP and udp_v4_get_port() for UDP.

    /* inet_connection_sock.c: inet_csk_get_port() */
    tb_found:
        if (!hlist_empty(&tb->owners)) {
            ...
            if (tb->fastreuse > 0 &&
                sk->sk_reuse && sk->sk_state != TCP_LISTEN &&
                smallest_size == -1) {
                goto success;

Therefore, when the port supports sharing, and the socket has SO_REUSEADDR set and is not in the LISTEN state, this bind() succeeds.

3.9

The 3.9 kernel adds support for SO_REUSEPORT, so multiple listeners can be bound to the same IP and port.

When the server then receives a SYN from a client, it selects one of these sockets to respond.

As for the implementation, version 3.9 extends sock_common by splitting the original skc_reuse field into two bit-fields.

     struct sock_common {
         unsigned short skc_family;
         volatile unsigned char skc_state;
    -    unsigned char skc_reuse;
    +    unsigned char skc_reuse:4;
    +    unsigned char skc_reuseport:4;

    @@ int sock_setsockopt(struct socket *sock, int level, int optname,
         case SO_REUSEADDR:
             sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
             break;
    +    case SO_REUSEPORT:
    +        sk->sk_reuseport = valbool;
    +        break;

Then the inet_bind_bucket is extended accordingly.

     struct inet_bind_bucket {
         /* omitted */
         unsigned short port;
    -    signed short fastreuse;
    +    signed char fastreuse;
    +    signed char fastreuseport;
    +    kuid_t fastuid;

When binding a port, an additional pass condition for reuseport was added.

     /* inet_connection_sock.c: inet_csk_get_port() */
     tb_found:
         if (sk->sk_reuse == SK_FORCE_REUSE)
             goto success;
    -    if (tb->fastreuse > 0 &&
    -        sk->sk_reuse && sk->sk_state != TCP_LISTEN &&
    +    if (((tb->fastreuse > 0 &&
    +          sk->sk_reuse && sk->sk_state != TCP_LISTEN) ||
    +         (tb->fastreuseport > 0 &&
    +          sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
             smallest_size == -1) {
             goto success;
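The combined condition is easier to read as a standalone predicate. Below is a userspace model of it, not the kernel code: the struct and function names are invented, uids are plain integers, and the SK_FORCE_REUSE and smallest_size checks are left out. A false result only means the fast path is not taken, not that bind() fails.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures; only the fields used by
 * the check are modeled. */
struct bucket_model {
    int fastreuse;        /* > 0: port is shareable via SO_REUSEADDR */
    int fastreuseport;    /* > 0: port is shareable via SO_REUSEPORT */
    unsigned int fastuid; /* uid recorded for the port */
};

struct sock_model {
    bool reuse;           /* SO_REUSEADDR set */
    bool reuseport;       /* SO_REUSEPORT set */
    bool listening;       /* socket is in TCP_LISTEN */
    unsigned int uid;     /* uid of the socket's owner */
};

/* Returns true if the bind would take the "goto success" fast path. */
bool bind_fast_path_ok(const struct bucket_model *tb,
                       const struct sock_model *sk)
{
    /* Pre-3.9 condition: SO_REUSEADDR on a non-listening socket. */
    bool by_reuseaddr = tb->fastreuse > 0 && sk->reuse && !sk->listening;

    /* Condition added in 3.9: SO_REUSEPORT with a matching uid. */
    bool by_reuseport = tb->fastreuseport > 0 && sk->reuseport &&
                        tb->fastuid == sk->uid;

    return by_reuseaddr || by_reuseport;
}
```

The uid comparison is what keeps one user from attaching a socket to a port that another user's sockets already share.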

When a client's SYN arrives, the server first locates a hash collision chain based on the local port (the destination port of the SYN), then traverses all sockets on that chain and scores each one by how well it matches the four-tuple.

If reuseport is enabled, several sockets may share the highest score, and the kernel pseudo-randomly selects one of them for subsequent processing.

    /* inet_hashtables.c */
    struct sock *__inet_lookup_listener(struct ...)
    {
        struct sock *sk, *result;
        unsigned int hash = inet_lhashfn(net, hnum);
        /* find the hash collision chain based on the local port */
        struct inet_listen_hashbucket *ilb = &hashinfo->listening_hash[hash];

        /* code omitted */
        result = NULL;
        hiscore = 0;
        sk_nulls_for_each_rcu(sk, node, &ilb->head) {
            score = compute_score(sk, net, hnum, daddr, dif);
            if (score > hiscore) {
                result = sk;
                hiscore = score;
                reuseport = sk->sk_reuseport;
                if (reuseport) {
                    phash = inet_ehashfn(net, daddr, hnum, saddr, sport);
                    matches = 1; /* with reuseport, count how many sockets qualify */
                }
            } else if (score == hiscore && reuseport) {
                matches++;
                if (reciprocal_scale(phash, matches) == 0)
                    result = sk;
                phash = next_pseudo_random32(phash);
            }
        }
        /*
         * if the nulls value we got at the end of this lookup is
         * not the expected one, we must restart lookup.
         * We probably met an item that was moved to another chain.
         */
        return result;
    }
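The selection loop can be tried outside the kernel. The sketch below replays it over an array of precomputed scores. The two helpers use the same arithmetic as the kernel functions of the same names, while listener_model, pick_listener, and passing phash in as a parameter (instead of computing it with inet_ehashfn) are simplifications made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Maps val from [0, 2^32) onto [0, ep_ro) without a division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
    return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Linear congruential step used to vary the hash between candidates. */
static uint32_t next_pseudo_random32(uint32_t seed)
{
    return seed * 1664525u + 1013904223u;
}

struct listener_model {
    int score;      /* precomputed match score, <= 0 means no match */
    int reuseport;  /* nonzero if SO_REUSEPORT is set */
};

/* Returns the index of the chosen listener, or -1 if none scores > 0. */
int pick_listener(const struct listener_model *ls, size_t n, uint32_t phash)
{
    int result = -1, hiscore = 0, reuseport = 0;
    uint32_t matches = 0;

    for (size_t i = 0; i < n; i++) {
        if (ls[i].score > hiscore) {
            /* New best score: restart the candidate count. */
            result = (int)i;
            hiscore = ls[i].score;
            reuseport = ls[i].reuseport;
            if (reuseport)
                matches = 1;
        } else if (ls[i].score == hiscore && reuseport) {
            /* Tie: the k-th candidate replaces the pick with chance ~1/k. */
            matches++;
            if (reciprocal_scale(phash, matches) == 0)
                result = (int)i;
            phash = next_pseudo_random32(phash);
        }
    }
    return result;
}
```

Because every socket on the chain is visited even when the winner is found early, the cost of a lookup grows with the length of the chain.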

For example, suppose the kernel has four hash collision chains for listening sockets, and the user then starts four servers: A, B, C, and D. Their listening addresses and ports are shown in the following figure, and A and B enable SO_REUSEPORT.

The collision chain is keyed by port, so A, B, and D are attached to the same collision chain.

If a SYN arrives from the peer at this point, the kernel traverses listening_hash[0] and scores the sockets on that chain. Because B listens on the exact address, B scores higher than A, and the kernel finally selects socket B for subsequent processing.
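Why an exact address beats the wildcard can be shown with a toy scoring function. This is a deliberately simplified model: the name score_listener, the struct, and the weights 1 and 4 are assumptions made for this sketch and are not the kernel's compute_score().

```c
#include <stdint.h>

#define ADDR_ANY 0u  /* 0.0.0.0, the wildcard address */

struct listener_addr {
    uint32_t rcv_saddr;  /* bound local address, ADDR_ANY for the wildcard */
    uint16_t port;       /* bound local port */
};

/* Scores a listener against the destination of an incoming SYN.
 * Returns -1 if the listener cannot accept it at all. */
int score_listener(const struct listener_addr *l, uint32_t daddr,
                   uint16_t dport)
{
    int score;

    if (l->port != dport)
        return -1;            /* wrong port: never a candidate */

    score = 1;                /* port matches */
    if (l->rcv_saddr != ADDR_ANY) {
        if (l->rcv_saddr != daddr)
            return -1;        /* bound to a different specific address */
        score += 4;           /* exact address match outranks the wildcard */
    }
    return score;
}
```

With a wildcard listener and an exact-address listener on the same port, a SYN to the exact address scores 5 for the latter and 1 for the former under these assumed weights, so the exact listener is chosen.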

4.5

As the example above shows, when a SYN arrives the kernel must traverse an entire hash collision chain and score every socket on it, which is somewhat wasteful.

So, in version 4.5, the kernel introduced reuseport groups, which gather sockets that are bound to the same IP and port and have SO_REUSEPORT set into one group.

    --- a/include/net/sock.h
    +++ b/include/net/sock.h
    @@ -318,6 +318,7 @@ struct cg_proto;
      * @sk_error_report: callback to indicate errors (e.g. %MSG_ERRQUEUE)
      * @sk_backlog_rcv: callback to process the backlog
      * @sk_destruct: called at sock freeing time, i.e. when all refcnt == 0
    + * @sk_reuseport_cb: reuseport group container
      */
     struct sock {
         /*
    @@ -453,6 +454,7 @@ struct sock {
         int (*sk_backlog_rcv)(struct sock *sk,
                               struct sk_buff *skb);
         void (*sk_destruct)(struct sock *sk);
    +    struct sock_reuseport __rcu *sk_reuseport_cb;
     };

In version 4.5 this feature supports only UDP; TCP support was added by a patch in version 4.6.

With this change, the kernel no longer has to traverse the entire collision chain when looking up a listening socket. As soon as it finds a qualifying socket that has SO_REUSEPORT set, it goes directly to the reuseport group the socket belongs to and picks one member for subsequent processing.

    @@ -215,6 +217,7 @@ struct sock *__inet_lookup_listener(struct net *net,
         unsigned int hash = inet_lhashfn(net, hnum);
         struct inet_listen_hashbucket *ilb = &hashinfo->listening_hash[hash];
         int score, hiscore, matches = 0, reuseport = 0;
    +    bool select_ok = true;
         u32 phash = 0;

         rcu_read_lock();
    @@ -230,6 +233,15 @@ begin:
                 if (reuseport) {
                     phash = inet_ehashfn(net, daddr, hnum,
                                          saddr, sport);
    +                if (select_ok) {
    +                    struct sock *sk2;
    +                    sk2 = reuseport_select_sock(sk, phash,
    +                                                skb, doff);
    +                    if (sk2) {
    +                        result = sk2;
    +                        goto found;
    +                    }
    +                }
                     matches = 1;
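The core of the group lookup is a single array index derived from the hash. The following userspace model shows only that step. The struct, the GROUP_MAX capacity, and the use of integer ids in place of struct sock pointers are assumptions for this sketch, and everything else the kernel's reuseport_select_sock() does is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define GROUP_MAX 8  /* arbitrary capacity for this sketch */

struct reuseport_group_model {
    size_t num_socks;       /* number of members in the group */
    int socks[GROUP_MAX];   /* socket ids standing in for struct sock * */
};

/* Maps val from [0, 2^32) onto [0, ep_ro) without a division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
    return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Returns the chosen socket id, or -1 if the group is empty. */
int group_select(const struct reuseport_group_model *g, uint32_t hash)
{
    if (g->num_socks == 0)
        return -1;
    /* One multiplication and one array access, independent of chain length. */
    return g->socks[reciprocal_scale(hash, (uint32_t)g->num_socks)];
}
```

Because the hash is computed from the connection's addresses and ports, packets with the same four-tuple map to the same group member as long as the group membership does not change.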
