This article comes from the WeChat official account "Developing Internal Skills Practice" (ID: kfngxl). Author: Zhang Yanfei (allen).
Hello, everyone. I'm Brother Fei!
In earlier articles we dissected how Linux receives a network packet and how it sends one, so the whole in-kernel send and receive path should now be clear.
While Brother Fei was still feeling pleased with those two articles, a reader sent in a question: "Brother Fei, how does local network IO over 127.0.0.1 work?" Indeed, that is something we have not covered before.
Local network IO is used very widely today. In PHP deployments, Nginx and php-fpm usually talk to each other over 127.0.0.1, and in microservices the sidecar pattern generates more and more local network requests. So a deep understanding of this topic is genuinely useful in practice. Thanks to @Wenwu for suggesting it.
Today, let's figure out how 127.0.0.1 network IO works! To make the discussion easier, I split it into two questions:
Does local network IO over 127.0.0.1 go through the network card?
Compared with communication over an external network, how does the in-kernel send/receive path differ?
With the groundwork laid, the dissection officially begins!
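Before diving into the kernel, here is a minimal user-space sketch of the scenario being analyzed (my own illustration, not from the original article): a client that sends one message to 127.0.0.1 with the send system call, assuming something is listening on the hypothetical port 8080. Everything discussed below happens in the kernel underneath calls like these.

// Minimal sketch: send one message over loopback (assumes a listener on port 8080)
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);                 /* hypothetical port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    const char *msg = "hello over loopback";
    send(fd, msg, strlen(msg), 0);               /* enters the kernel send path */

    close(fd);
    return 0;
}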
1. Cross-machine network communication process

Before we get into the local communication process, let's first review cross-machine network communication.
1.1 Cross-machine data transmission

Sending starts from the send system call and ends when the NIC puts the data on the wire. The overall process is as follows:
In this figure, user data is copied into kernel space, processed by the protocol stack, and placed into the RingBuffer, from which the NIC driver actually sends it out. When the send completes, the NIC notifies the CPU with a hard interrupt, and the RingBuffer entries are then cleaned up.
However, the above diagram does not show the kernel components and source code very well, so let's look at it again from the perspective of the code.
When the NIC finishes transmitting, it raises a hard interrupt to notify the CPU; the handler for this hard interrupt frees the memory that was used in the RingBuffer.
1.2 Cross-machine data reception

When the packet arrives at the other machine, the Linux packet-receiving process begins.
When the NIC receives the data, it raises an interrupt to notify the CPU that data has arrived. The CPU then calls the interrupt handler registered by the network driver, which triggers a soft interrupt. Ksoftirqd notices the pending soft interrupt and starts polling to receive packets; each received packet is handed to the protocol stack layer by layer. When the protocol stack finishes processing and the data has been placed on the socket's receive queue, the user process is woken up (assuming blocking mode).
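For completeness, here is the matching receive side (again my own sketch, not from the original article): a server on 127.0.0.1 that blocks in recv until the protocol stack puts data on its receive queue and wakes it up. The port 8080 matches the sender sketch above.

// Minimal sketch: block in recv on 127.0.0.1:8080 until data arrives
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);                 /* hypothetical port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 16);

    int cfd = accept(lfd, NULL, NULL);           /* blocks until a client connects */

    char buf[128];
    /* recv blocks until the protocol stack fills the receive queue
       and wakes this process up */
    ssize_t n = recv(cfd, buf, sizeof(buf) - 1, 0);
    if (n > 0) {
        buf[n] = '\0';
        printf("received: %s\n", buf);
    }

    close(cfd);
    close(lfd);
    return 0;
}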
Let's look at it again from the perspective of kernel components and source code.
1.3 Cross-machine network communication summary
2. Local sending process

In the first section we saw the whole cross-machine sending process.
Local network IO differs from it in a few places. To keep the focus, we will not repeat the overall flow and will only cover what differs from the cross-machine logic. There are two differences: routing and the driver.
2.1 Network-layer routing

When data sent into the protocol stack reaches the network layer, the entry function is ip_queue_xmit. Routing is performed at the network layer; after routing, some IP headers are set, some netfilter filtering is done, and the packet is handed over to the neighbor subsystem.
What is special about local network IO is that the matching route is found in the local routing table, and the corresponding device is the loopback NIC, the familiar lo.
Let's look at the routing-related work in the network layer in detail, starting from the network-layer entry function ip_queue_xmit.
// file: net/ipv4/ip_output.c
int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl)
{
    // Check whether the socket already has a cached route
    rt = (struct rtable *)__sk_dst_check(sk, 0);
    if (rt == NULL) {
        // Expand the search:
        // look up a routing entry and cache it in the socket
        rt = ip_route_output_ports(...);
        sk_setup_caps(sk, &rt->dst);
    }
}

The function that looks up the routing entry is ip_route_output_ports, which in turn calls ip_route_output_flow, __ip_route_output_key, and fib_lookup. We skip the intermediate calls and look directly at the key code of fib_lookup.
// file: include/net/ip_fib.h
static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
                             struct fib_result *res)
{
    struct fib_table *table;

    table = fib_get_table(net, RT_TABLE_LOCAL);
    if (!fib_table_lookup(table, flp, res, FIB_LOOKUP_NOREF))
        return 0;

    table = fib_get_table(net, RT_TABLE_MAIN);
    if (!fib_table_lookup(table, flp, res, FIB_LOOKUP_NOREF))
        return 0;

    return -ENETUNREACH;
}

fib_lookup queries both the local and the main routing tables, local first and main second. On Linux these two tables can be viewed with the ip command; here we only look at the local routing table (because local network IO finds its match in this table and stops there).
# ip route list table local
local 10.143.x.y dev eth0 proto kernel scope host src 10.143.x.y
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1

As the output shows, a route with destination 127.0.0.1 is found in the local routing table. The work of fib_lookup is done; we return to __ip_route_output_key and continue.
// file: net/ipv4/route.c
struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
{
    if (fib_lookup(net, fl4, &res)) {
    }
    if (res.type == RTN_LOCAL) {
        dev_out = net->loopback_dev;
    }

    rth = __mkroute_output(&res, fl4, orig_oif, dev_out, flags);
    return rth;
}

For requests destined to the local machine, the device is always net->loopback_dev, that is, the lo virtual NIC.
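An indirect way to observe this routing decision from user space (my own illustration, not part of the original article): connect() a UDP socket to 127.0.0.1, which only performs the route lookup and source-address selection without sending any packet, then ask the kernel which local address it chose. It reports 127.0.0.1, the src of the local-table route shown above.

// Sketch: let the kernel run the route lookup for 127.0.0.1 and print the chosen source address
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);                     /* any port; no packet is sent */
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

    /* connect() on a UDP socket only runs the routing lookup and
       binds a source address; nothing is transmitted */
    connect(fd, (struct sockaddr *)&dst, sizeof(dst));

    struct sockaddr_in local;
    socklen_t len = sizeof(local);
    getsockname(fd, (struct sockaddr *)&local, &len);

    char ip[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &local.sin_addr, ip, sizeof(ip));
    printf("kernel-chosen source address: %s\n", ip);   /* expect 127.0.0.1 */

    close(fd);
    return 0;
}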
The rest of the network layer processing is the same as for cross-machine network IO: the packet eventually passes through ip_finish_output and enters dst_neigh_output, the entry function of the neighbor subsystem.
Does local network IO need IP fragmentation? In principle, yes: it goes through the same ip_finish_output function as normal network-layer processing, and fragmentation is still performed there if the skb is larger than the MTU. The difference is that lo's MTU is much larger than Ethernet's: ifconfig shows that a typical NIC has an MTU of 1500, while the lo virtual interface can have 65535.
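To check those MTU values without ifconfig, here is a small sketch (my own, not from the original article) that queries an interface's MTU via the SIOCGIFMTU ioctl. On a typical machine it prints around 1500 for the Ethernet NIC and a much larger value for lo; the name "eth0" is an assumption and may differ on your system.

// Sketch: read interface MTUs with the SIOCGIFMTU ioctl
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

static int get_mtu(const char *ifname)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);     /* any socket works for this ioctl */

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    if (ioctl(fd, SIOCGIFMTU, &ifr) < 0) {
        perror(ifname);
        close(fd);
        return -1;
    }
    close(fd);
    return ifr.ifr_mtu;
}

int main(void)
{
    printf("lo   MTU: %d\n", get_mtu("lo"));
    printf("eth0 MTU: %d\n", get_mtu("eth0"));   /* interface name may differ */
    return 0;
}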
After processing in the neighbor subsystem, the packet enters the network device subsystem (entry function dev_queue_xmit).
2.2 Network device subsystem

The entry function of the network device subsystem is dev_queue_xmit. Recall from the cross-machine sending process that, for physical devices with real queues, this function performs a series of fairly complex queueing operations before dev_hard_start_xmit is called and the packet is handed from there to the driver. Along the way a soft interrupt may even be triggered to do the sending, as shown in the figure:
But for a loopback device that is up (q->enqueue evaluates to false), things are much simpler: there is no queue to worry about, so dev_hard_start_xmit is entered directly, and from there the send callback loopback_xmit in the loopback device's "driver" is invoked to "send" the skb.
Let's look at the detailed process, starting from dev_queue_xmit, the entry of the network device subsystem.
// file: net/core/dev.c
int dev_queue_xmit(struct sk_buff *skb)
{
    q = rcu_dereference_bh(txq->qdisc);
    if (q->enqueue) {  // false for the loopback device
        rc = __dev_xmit_skb(skb, q, dev, txq);
        goto out;
    }

    // Start sending directly
    if (dev->flags & IFF_UP) {
        dev_hard_start_xmit(skb, dev, txq);
    }
}

In dev_hard_start_xmit, the device driver's operation function is still called.
// file: net/core/dev.c
int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
                        struct netdev_queue *txq)
{
    // Get the device driver's set of callback functions, ops
    const struct net_device_ops *ops = dev->netdev_ops;

    // Call the driver's ndo_start_xmit to send
    rc = ops->ndo_start_xmit(skb, dev);
}

For a real igb NIC, the driver code lives in drivers/net/ethernet/intel/igb/igb_main.c. Following the same path, the "driver" code of the loopback device turns out to be in drivers/net/loopback.c. In drivers/net/loopback.c:
// file: drivers/net/loopback.c
static const struct net_device_ops loopback_ops = {
    .ndo_init        = loopback_dev_init,
    .ndo_start_xmit  = loopback_xmit,
    .ndo_get_stats64 = loopback_get_stats64,
};

So the ndo_start_xmit call in dev_hard_start_xmit actually invokes loopback_xmit in the loopback "driver". Why put "driver" in quotation marks? Because loopback is a purely software virtual interface; there is no real driver, and its workflow is roughly as shown in the figure.
Let's take a look at the detailed code.
// file: drivers/net/loopback.c
static netdev_tx_t loopback_xmit(struct sk_buff *skb, struct net_device *dev)
{
    // Strip the skb's association with the original socket
    skb_orphan(skb);

    // Call netif_rx
    if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
    }
}

First, skb_orphan removes (strips off) the socket pointer on the skb.
Note that in the local network IO sending path, the skb used below the transport layer does not need to be freed; it is handed directly to the receiving side, which saves a little work. Unfortunately the transport-layer skb cannot be saved in the same way; it still has to be allocated and freed frequently.
netif_rx is then called, and execution eventually reaches enqueue_to_backlog (netif_rx -> netif_rx_internal -> enqueue_to_backlog).
// file: net/core/dev.c
static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
                              unsigned int *qtail)
{
    sd = &per_cpu(softnet_data, cpu);
    __skb_queue_tail(&sd->input_pkt_queue, skb);
    ____napi_schedule(sd, &sd->backlog);
}

enqueue_to_backlog inserts the skb to be sent into the softnet_data->input_pkt_queue queue and calls ____napi_schedule to trigger a soft interrupt.
// file: net/core/dev.c
static inline void ____napi_schedule(struct softnet_data *sd,
                                     struct napi_struct *napi)
{
    list_add_tail(&napi->poll_list, &sd->poll_list);
    __raise_softirq_irqoff(NET_RX_SOFTIRQ);
}

Once the soft interrupt has been raised, the sending process is complete.
3. Local receiving process

When receiving cross-machine network packets, a hard interrupt has to fire before the soft interrupt can be triggered. For local network IO, since the packet never really passes through the NIC, the NIC's actual transmission and the hard interrupt are both skipped: processing starts directly from the soft interrupt, and after process_backlog the packet is handed to the protocol stack. The general flow is as shown in the figure.
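One way to observe this from user space (my own illustration, not from the original article): generate some traffic over 127.0.0.1 and watch the NET_RX row of /proc/softirqs climb while the NIC's hard-interrupt counters in /proc/interrupts stay flat. A minimal sketch that prints that row:

// Sketch: print the per-CPU NET_RX soft-interrupt counters
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("/proc/softirqs", "r");
    if (!fp) {
        perror("/proc/softirqs");
        return 1;
    }

    char line[1024];
    while (fgets(line, sizeof(line), fp)) {
        /* keep only the NET_RX row, which counts receive soft interrupts per CPU */
        if (strstr(line, "NET_RX"))
            fputs(line, stdout);
    }

    fclose(fp);
    return 0;
}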
Next, let's look at the process in more detail.
Once the soft interrupt fires, execution enters net_rx_action, the handler corresponding to NET_RX_SOFTIRQ.
// file: net/core/dev.c
static void net_rx_action(struct softirq_action *h)
{
    while (!list_empty(&sd->poll_list)) {
        work = n->poll(n, weight);
    }
}

Recall that for the igb NIC, poll actually calls the igb_poll function. So what is the poll function of the loopback NIC? Since it is struct softnet_data objects that sit on poll_list, we find the clue in net_dev_init.
// file: net/core/dev.c
static int __init net_dev_init(void)
{
    for_each_possible_cpu(i) {
        sd->backlog.poll = process_backlog;
    }
}

So during initialization the default poll of struct softnet_data is set to the process_backlog function. Let's see what it does.
static int process_backlog(struct napi_struct *napi, int quota)
{
    while (...) {
        while ((skb = __skb_dequeue(&sd->process_queue))) {
            __netif_receive_skb(skb);
        }

        // skb_queue_splice_tail_init() splices list a onto the tail of list b,
        // forming a new list b, and leaves the original list a empty
        qlen = skb_queue_len(&sd->input_pkt_queue);
        if (qlen)
            skb_queue_splice_tail_init(&sd->input_pkt_queue,
                                       &sd->process_queue);
    }
}

Look first at the call to skb_queue_splice_tail_init. Without going into its source, its job is simply to splice the skbs on sd->input_pkt_queue onto the sd->process_queue list.
Then look at __skb_dequeue: it takes packets off sd->process_queue for processing. This matches the end of the sending process above: the send path put the packet onto input_pkt_queue, and the receive path takes the skb back out of that queue.
Finally, __netif_receive_skb is called to hand the skb (the data) to the protocol stack. From this point on the call path is the same as for cross-machine network IO.
The call chain into the protocol stack is __netif_receive_skb => __netif_receive_skb_core => deliver_skb, which then delivers the packet into ip_rcv.
After the network layer comes the transport layer, which finally wakes up the user process; we won't expand on that here.
4. Summary of local network IO

Let's summarize the kernel execution flow of local network IO.
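Collecting the functions walked through above (intermediate steps omitted), the local send and receive paths look roughly like this:

Send:  send system call -> protocol stack -> ip_queue_xmit (route found in the
       local table, device = lo) -> ip_finish_output -> neighbor subsystem
       (dst_neigh_output) -> dev_queue_xmit -> dev_hard_start_xmit
       -> loopback_xmit -> netif_rx -> enqueue_to_backlog
       -> raise NET_RX_SOFTIRQ (no NIC, no hard interrupt)

Recv:  net_rx_action -> process_backlog -> __netif_receive_skb
       -> __netif_receive_skb_core -> deliver_skb -> ip_rcv
       -> transport layer -> wake up the user process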
Recall that the flow of cross-machine network IO is:
We can now come back to the two opening questions.
1) Does local network IO over 127.0.0.1 go through the network card?
From the walkthrough in this article we can conclude that it does not go through the network card. Even if the NIC is unplugged, communication over 127.0.0.1 still works normally.
2) What path does the packet take in the kernel, and how does it differ from the path taken when sending to an external network?
Overall, local network IO does save some overhead compared with cross-machine IO: the data does not have to go through a real driver's RingBuffer; the skb is handed directly (via a soft interrupt) to the receiving side's protocol stack. But none of the other kernel components are skipped: system calls, the protocol stack (transport layer, network layer, and so on), the network device subsystem, and the neighbor subsystem are all still traversed, and even the "driver" still runs (although for the loopback device it is a purely software, virtual one). So don't assume that local network IO has no overhead at all.
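To get a feel for that remaining overhead, one can time loopback round trips from user space. The sketch below is my own illustration (not from the original article); it assumes an echo server is already listening on the hypothetical port 8080 of 127.0.0.1. Each round trip still pays for system calls on both ends and full trips through the protocol stack.

// Sketch: measure the average loopback round-trip time against an echo server
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);                 /* hypothetical echo server */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    char buf[64] = "ping";
    struct timespec t0, t1;
    const int rounds = 10000;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < rounds; i++) {
        send(fd, buf, 4, 0);
        recv(fd, buf, sizeof(buf), 0);           /* wait for the echoed reply */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("average loopback round trip: %.0f ns\n", ns / rounds);

    close(fd);
    return 0;
}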
Finally, some companies in the industry already use eBPF to accelerate communication between the sidecar proxy and the local process in the Istio architecture. By introducing eBPF, the overhead of the kernel protocol stack can be bypassed; it works roughly as follows.
See: https://cloud.tencent.com/developer/article/1671568