This article comes from the WeChat official account "Developing Internal Skills Practice" (ID: kfngxl). Author: Zhang Yanfei (allen).
Processes are expensive on Linux: creation aside, even a single context switch costs a few microseconds. So to serve a large number of users efficiently, one process has to handle many TCP connections at the same time. Suppose a process maintains 10,000 connections: how does it find out which connections have data to read and which are ready to write?
Of course, we could loop over all the connections looking for IO events, but that approach is far too inefficient. What we want is a mechanism that, when an IO event occurs on any one of many connections, finds it directly and quickly. In fact, the Linux kernel has already done this for us: it is the IO multiplexing mechanism we are all familiar with. The "multiplexing" here refers to multiplexing the process itself across many connections.
The multiplexing schemes on Linux are select, poll, and epoll. Of the three, epoll has the best performance and supports the largest number of concurrent connections. So today we take epoll apart and reveal in depth how the kernel implements multiplexed IO management.
To facilitate the discussion, here is a simple example of using epoll (just an illustration; real code would not be written this way):
```c
int main()
{
    listen(lfd, ...);
    cfd1 = accept(...);
    cfd2 = accept(...);

    efd = epoll_create(...);
    epoll_ctl(efd, EPOLL_CTL_ADD, cfd1, ...);
    epoll_ctl(efd, EPOLL_CTL_ADD, cfd2, ...);
    epoll_wait(efd, ...);
}
```

The epoll-related functions here are as follows:
- epoll_create: creates an epoll object
- epoll_ctl: adds the connections to manage to the epoll object
- epoll_wait: waits for IO events on the connections it manages
With the help of this demo, we will unfold a deep disassembly of how epoll works. I believe that once you understand this article, your command of epoll will be much stronger!

Friendly reminder: this is a long read of roughly ten thousand words, so proceed with care!
1. accept creates a new socket

Let's start directly with the server-side accept. After accept, the process creates a new socket dedicated to communicating with the corresponding client and puts it in the current process's list of open files.
The more detailed structure of such a connected socket kernel object is shown below.
Next, let's look at the source code that creates the socket kernel object when a connection is received. The system call code for accept lives in the source file net/socket.c.
```c
// file: net/socket.c
SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
                int __user *, upeer_addrlen, int, flags)
{
    struct socket *sock, *newsock;

    // find the socket from the fd
    sock = sockfd_lookup_light(fd, &err, &fput_needed);

    // 1.1 allocate and initialize a new socket
    newsock = sock_alloc();
    newsock->type = sock->type;
    newsock->ops = sock->ops;

    // 1.2 allocate a new file object and attach it to the new socket
    newfile = sock_alloc_file(newsock, flags, sock->sk->sk_prot_creator->name);
    ...

    // 1.3 accept the connection
    err = sock->ops->accept(sock, newsock, sock->file->f_flags);

    // 1.4 add the new file to the current process's open file list
    fd_install(newfd, newfile);
    ...
}
```

1.1 Initializing the struct socket object

In the source above, sock_alloc is first called to allocate a struct socket object. Then the protocol operation function set (ops) of the listening socket is assigned to the new socket. (All sockets under the AF_INET protocol family share the same ops methods, so they can simply be copied here.)
The definition of inet_stream_ops is as follows
```c
// file: net/ipv4/af_inet.c
const struct proto_ops inet_stream_ops = {
    ...
    .accept  = inet_accept,
    .listen  = inet_listen,
    .sendmsg = inet_sendmsg,
    .recvmsg = inet_recvmsg,
    ...
};
```

1.2 Allocating the file object for the new socket

The struct socket object has an important member: a pointer to a file kernel object. This pointer is NULL at initialization. In the accept path, sock_alloc_file is called to allocate memory and initialize it, and the new file object is then set on sock->file.
Let's take a look at the implementation process of sock_alloc_file:
```c
struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
{
    struct file *file;

    file = alloc_file(&path, FMODE_READ | FMODE_WRITE, &socket_file_ops);
    ...
    sock->file = file;
}
```

sock_alloc_file in turn calls alloc_file. Notice that in the alloc_file method, the socket_file_ops function set is assigned to the new file->f_op.
```c
// file: fs/file_table.c
struct file *alloc_file(struct path *path, fmode_t mode,
                        const struct file_operations *fop)
{
    struct file *file;
    ...
    file->f_op = fop;
    ...
}
```

socket_file_ops is defined as follows:
```c
// file: net/socket.c
static const struct file_operations socket_file_ops = {
    .aio_read  = sock_aio_read,
    .aio_write = sock_aio_write,
    .poll      = sock_poll,
    .release   = sock_close,
    ...
};
```

Here you can see that file->f_op->poll on the new socket created in accept points to sock_poll. This function will be called later; we'll come back to it.
In fact, the file object also contains a socket pointer that points back to the socket object.
Besides the file object pointer, the socket kernel object has another core member: sk.
```c
// file: include/linux/net.h
struct socket {
    struct file *file;
    struct sock *sk;
};
```

struct sock is a very large data structure and is the core kernel object of the socket. Core data structures such as the send queue, receive queue, and wait queue all live here. Its definition, in include/net/sock.h, is too long to show in full.
1.3 Accepting the connection

In the source code of accept:
```c
// file: net/socket.c
SYSCALL_DEFINE4(accept4, ...)
{
    ...
    // 1.3 accept the connection
    err = sock->ops->accept(sock, newsock, sock->file->f_flags);
    ...
}
```

The method behind sock->ops->accept is inet_accept. When it executes, it takes an already-created sock directly off the handshake queue. The full creation of a sock object involves the three-way handshake, which is complex and beyond our scope here. Let's just look at one function used during struct sock initialization:
```c
void sock_init_data(struct socket *sock, struct sock *sk)
{
    sk->sk_wq = NULL;
    sk->sk_data_ready = sock_def_readable;
}
```

Here the sk_data_ready function pointer of the sock object is set to sock_def_readable. Just remember this; you'll need it later.
1.4 Adding the new file to the current process's open file list

Once the key kernel objects file, socket, and sock have been created, the only thing left to do is hang the file on the current process's open file list.
```c
// file: fs/file.c
void fd_install(unsigned int fd, struct file *file)
{
    __fd_install(current->files, fd, file);
}

void __fd_install(struct files_struct *files, unsigned int fd,
                  struct file *file)
{
    ...
    fdt = files_fdtable(files);
    BUG_ON(fdt->fd[fd] != NULL);
    rcu_assign_pointer(fdt->fd[fd], file);
}
```

2. The epoll_create implementation

When a user process calls epoll_create, the kernel creates a struct eventpoll kernel object and also associates it with the current process's list of open files.
The more detailed structure of the struct eventpoll object is as follows (again, only the members relevant to today's topic are listed).
The source code for epoll_create is relatively simple; it lives in fs/eventpoll.c.
```c
// file: fs/eventpoll.c
SYSCALL_DEFINE1(epoll_create1, int, flags)
{
    struct eventpoll *ep = NULL;

    // create an eventpoll object
    error = ep_alloc(&ep);
    ...
}
```

struct eventpoll is defined in the same source file.
```c
// file: fs/eventpoll.c
struct eventpoll {
    // wait queue used by sys_epoll_wait
    wait_queue_head_t wq;

    // ready descriptors are put here
    struct list_head rdllist;

    // each eventpoll object contains a red-black tree
    struct rb_root rbr;
    ...
};
```

The meaning of several members of this structure is as follows:
- wq: the wait queue. When data becomes ready in the soft interrupt, wq is used to find the user process blocked on this epoll object.
- rbr: a red-black tree. To support efficient lookup, insertion, and deletion across a large number of connections, eventpoll uses a red-black tree internally to manage every socket the user process has added.
- rdllist: a linked list of ready descriptors. When a connection becomes ready, the kernel puts it on rdllist, so the application process only has to check this list to find ready connections, instead of traversing the whole tree.
Of course, after this structure is allocated, a little initialization is needed, which happens in ep_alloc.
```c
// file: fs/eventpoll.c
static int ep_alloc(struct eventpoll **pep)
{
    struct eventpoll *ep;

    // allocate the eventpoll object
    ep = kzalloc(sizeof(*ep), GFP_KERNEL);

    // initialize the wait queue head
    init_waitqueue_head(&ep->wq);

    // initialize the ready list
    INIT_LIST_HEAD(&ep->rdllist);

    // initialize the red-black tree root
    ep->rbr = RB_ROOT;
    ...
}
```

At this point these members have only been allocated and initialized; they haven't been used yet. They will come into play below.
3. epoll_ctl adds a socket

Understanding this step is the key to understanding all of epoll.
For simplicity, we only consider using EPOLL_CTL_ADD to add socket, ignoring deletions and updates.
Suppose we now have several client socket connections and an epoll kernel object already created. When each socket is registered with epoll_ctl, the kernel does three things:
1. Allocate a red-black tree node object, an epitem
2. Add a wait item to the socket's wait queue, with ep_poll_callback as its callback function
3. Insert the epitem into the epoll object's red-black tree
After adding two sockets through epoll_ctl, the resulting kernel data structures in the process look roughly as follows:
Let's take a closer look at how socket is added to the epoll object and find the source code of epoll_ctl.
```c
// file: fs/eventpoll.c
SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
                struct epoll_event __user *, event)
{
    struct eventpoll *ep;
    struct file *file, *tfile;

    // find the eventpoll kernel object from epfd
    file = fget(epfd);
    ep = file->private_data;

    // find the socket's file kernel object from its fd
    tfile = fget(fd);

    switch (op) {
    case EPOLL_CTL_ADD:
        if (!epi) {
            epds.events |= POLLERR | POLLHUP;
            error = ep_insert(ep, &epds, tfile, fd);
        } else
            error = -EEXIST;
        clear_tfile_check_list();
        break;
    }
    ...
}
```

epoll_ctl first looks up the eventpoll and socket kernel objects from the fds passed in. For the EPOLL_CTL_ADD operation, it then executes the ep_insert function. All of the registration work is done in that function.
```c
// file: fs/eventpoll.c
static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
                     struct file *tfile, int fd)
{
    // 3.1 allocate and initialize an epitem
    struct epitem *epi;
    if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
        return -ENOMEM;

    // initialize the allocated epi
    // epi->ffd holds the fd number and the struct file object address
    INIT_LIST_HEAD(&epi->pwqlist);
    epi->ep = ep;
    ep_set_ffd(&epi->ffd, tfile, fd);

    // 3.2 set up the socket's wait queue
    // define and initialize an ep_pqueue object
    struct ep_pqueue epq;
    epq.epi = epi;
    init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);

    // call ep_ptable_queue_proc to register the callback;
    // the function actually installed is ep_poll_callback
    revents = ep_item_poll(epi, &epq.pt);
    ...

    // 3.3 insert the epi into the eventpoll object's red-black tree
    ep_rbtree_insert(ep, epi);
    ...
}
```

3.1 Allocating and initializing the epitem

For every socket, an epitem is allocated when epoll_ctl is called. Its main members are as follows:
```c
// file: fs/eventpoll.c
struct epitem {
    // red-black tree node
    struct rb_node rbn;

    // socket file descriptor information
    struct epoll_filefd ffd;

    // the eventpoll object this epitem belongs to
    struct eventpoll *ep;

    // the wait queue(s) this epitem is attached to
    struct list_head pwqlist;
};
```

During epitem initialization, the line epi->ep = ep points its ep pointer at the eventpoll object, and epitem->ffd is filled in with the file and fd of the socket being added.
The ep_set_ffd function used is as follows.
```c
static inline void ep_set_ffd(struct epoll_filefd *ffd,
                              struct file *file, int fd)
{
    ffd->file = file;
    ffd->fd = fd;
}
```

3.2 Setting up the socket's wait queue

After the epitem is created and initialized, the second thing ep_insert does is set up a wait item on the socket object's wait queue, registering ep_poll_callback (in fs/eventpoll.c) as the callback to run when data is ready.
The source code here is a bit winding; if you run out of patience, skip ahead to the "knock on the blackboard" paragraph below. First, let's look at ep_item_poll.
```c
static inline unsigned int ep_item_poll(struct epitem *epi, poll_table *pt)
{
    pt->_key = epi->event.events;
    return epi->ffd.file->f_op->poll(epi->ffd.file, pt) & epi->event.events;
}
```

You can see that this calls file->f_op->poll on the socket's file. From the socket structure diagram in section 1, we know this function is actually sock_poll.
```c
static unsigned int sock_poll(struct file *file, poll_table *wait)
{
    ...
    return sock->ops->poll(file, sock, wait);
}
```

Looking back at the socket structure diagram in section 1 again, sock->ops->poll actually points to tcp_poll.
```c
// file: net/ipv4/tcp.c
unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
{
    struct sock *sk = sock->sk;

    sock_poll_wait(file, sk_sleep(sk), wait);
    ...
}
```

sk_sleep is called to produce the second argument to sock_poll_wait. It returns the wait queue head (wait_queue_head_t) of the sock object; this is where the wait item will be inserted shortly. Note carefully: this is the socket's wait queue, not the epoll object's. Here is the sk_sleep source:
```c
// file: include/net/sock.h
static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
    BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0);
    return &rcu_dereference_raw(sk->sk_wq)->wait;
}
```

Now we really enter sock_poll_wait.
```c
static inline void sock_poll_wait(struct file *filp,
                                  wait_queue_head_t *wait_address,
                                  poll_table *p)
{
    poll_wait(filp, wait_address, p);
}

static inline void poll_wait(struct file *filp,
                             wait_queue_head_t *wait_address, poll_table *p)
{
    if (p && p->_qproc && wait_address)
        p->_qproc(filp, wait_address, p);
}
```

_qproc here is a function pointer; it was set to the ep_ptable_queue_proc function in the earlier init_poll_funcptr call.
```c
static int ep_insert(...)
{
    ...
    init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);
    ...
}

// file: include/linux/poll.h
static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
{
    pt->_qproc = qproc;
    pt->_key = ~0UL;
}
```

Knock on the blackboard! After a long journey, we have finally come to the point: in the ep_ptable_queue_proc function, a new wait queue item is created and its callback is registered as the ep_poll_callback function. The wait item is then added to the socket's wait queue.
```c
// file: fs/eventpoll.c
static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
                                 poll_table *pt)
{
    struct eppoll_entry *pwq;

    if (epi->nwait >= 0 && (pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL))) {
        // initialize the callback
        init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);

        // put ep_poll_callback on the socket's wait queue whead
        // (note: not epoll's wait queue)
        add_wait_queue(whead, &pwq->wait);
    }
}
```

In the earlier article on synchronous blocking network IO and the blocking recvfrom system call, the wait item's private member was set to the current user process descriptor (current), because the user process had to be woken when data was ready. Here the socket is managed by epoll, and no process needs to be woken when an individual socket becomes ready, so q->private is simply set to NULL; it serves no purpose.
```c
// file: include/linux/wait.h
static inline void init_waitqueue_func_entry(wait_queue_t *q,
                                             wait_queue_func_t func)
{
    q->flags = 0;
    q->private = NULL;

    // ep_poll_callback is registered on the wait_queue_t object;
    // q->func is invoked when data arrives
    q->func = func;
}
```

As noted above, only the callback q->func is set in the wait queue item. In section 5 we will see that after the soft interrupt puts data on the socket's receive queue, it calls back through the registered ep_poll_callback function, which in turn notifies the epoll object.
3.3 Inserting into the red-black tree

After the epitem object is set up, it is inserted into the red-black tree. A schematic of the red-black tree of an epoll holding several socket descriptors looks like this:
Let's pause here on why a red-black tree is used. Many people say it's simply for efficiency, but that explanation is incomplete: for pure lookup efficiency, a tree cannot compete with a hash table. A more reasonable explanation is that epoll needs to be balanced across lookup efficiency, insertion efficiency, deletion efficiency, and memory overhead, and the data structure that best fits this combination of requirements turns out to be the red-black tree.
4. epoll_wait waits for events

What epoll_wait does is not complicated. When called, it checks whether there is anything on the eventpoll->rdllist ready list. If there is, it simply returns. If there is nothing, it creates a wait queue item, adds it to the eventpoll's wait queue, and blocks itself.
Note: epoll_ctl also created a wait queue item when adding a socket. The difference is that the item created here hangs on the epoll object, while the earlier one hangs on the socket object.
The source code is as follows:
```c
// file: fs/eventpoll.c
SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
                int, maxevents, int, timeout)
{
    ...
    error = ep_poll(ep, events, maxevents, timeout);
}

static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
                   int maxevents, long timeout)
{
    wait_queue_t wait;
    ...

fetch_events:
    // 4.1 check whether any events are ready
    if (!ep_events_available(ep)) {
        // 4.2 define a wait item and associate the current process
        init_waitqueue_entry(&wait, current);

        // 4.3 add the wait item to the epoll->wq list
        __add_wait_queue_exclusive(&ep->wq, &wait);

        for (;;) {
            ...
            // 4.4 let the CPU go to sleep
            if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
                timed_out = 1;
            ...
        }
    }
    ...
}
```

4.1 Checking the ready list

First, ep_events_available is called to check whether there are any events on the ready list.
```c
// file: fs/eventpoll.c
static inline int ep_events_available(struct eventpoll *ep)
{
    return !list_empty(&ep->rdllist) || ep->ovflist != EP_UNACTIVE_PTR;
}
```

4.2 Defining a wait item and attaching the current process

Assume no events are ready. Execution then proceeds to init_waitqueue_entry, which defines a wait item with current (the current process) attached to it.
Yes: when there are no IO events, epoll blocks the current process too. That is reasonable, because with nothing to do there is no point occupying the CPU. Many articles online have the bad habit of discussing concepts like "blocking" and "non-blocking" without ever naming the subject, which leaves readers lost in the fog. In the case of epoll: the epoll_wait call itself blocks, but the sockets it manages are generally set non-blocking. These concepts are only meaningful once the subject is named.
```c
// file: include/linux/wait.h
static inline void init_waitqueue_entry(wait_queue_t *q, struct task_struct *p)
{
    q->flags = 0;
    q->private = p;
    q->func = default_wake_function;
}
```

Note the callback function registered here: default_wake_function. It will be called later, in section 5, when data arrives.
4.3 Adding to the epoll's wait queue

```c
static inline void __add_wait_queue_exclusive(wait_queue_head_t *q,
                                              wait_queue_t *wait)
{
    wait->flags |= WQ_FLAG_EXCLUSIVE;
    __add_wait_queue(q, wait);
}
```

Here, the wait item defined in the previous step is added to the wait queue of the epoll object.
4.4 Yielding the CPU and going to sleep

set_current_state sets the current process to interruptible, and schedule_hrtimeout_range is called to give up the CPU and go to sleep.
```c
// file: kernel/hrtimer.c
int __sched schedule_hrtimeout_range(ktime_t *expires, unsigned long delta,
                                     const enum hrtimer_mode mode)
{
    return schedule_hrtimeout_range_clock(expires, delta, mode,
                                          CLOCK_MONOTONIC);
}

int __sched schedule_hrtimeout_range_clock(...)
{
    ...
    schedule();
    ...
}
```

schedule then picks the next process to run:
```c
// file: kernel/sched/core.c
static void __sched __schedule(void)
{
    ...
    next = pick_next_task(rq);
    ...
    context_switch(rq, prev, next);
    ...
}
```

5. Data arrives

When epoll_ctl executed earlier, the kernel added a wait queue item to each socket; when epoll_wait ran, it added another wait queue item to the eventpoll object. Before discussing data reception, let's summarize what these items contain:
- socket->sock->sk_data_ready is set to the ready handler sock_def_readable.
- In the socket's wait queue item, the callback is ep_poll_callback; its private member is unused and set to NULL.
- In the eventpoll's wait queue item, the callback is default_wake_function; its private member points to the user process waiting for events.
In this section, we will see how the soft interrupt enters each callback function in turn after the data is processed, and finally notifies the user process.
5.1 Data is received into the receive queue

How the soft interrupt handles network frames is not covered here, to keep this article from getting too bloated; if you're interested, see the earlier article "Illustrating the receiving process of Linux network packets". Today we start directly from tcp_v4_rcv, the entry function of the TCP protocol stack.
```c
// file: net/ipv4/tcp_ipv4.c
int tcp_v4_rcv(struct sk_buff *skb)
{
    ...
    th = tcp_hdr(skb);   // get the tcp header
    iph = ip_hdr(skb);   // get the ip header

    // find the corresponding socket from the ip/port info in the headers
    sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
    ...

    // the socket is not locked by a user process
    if (!sock_owned_by_user(sk)) {
        if (!tcp_prequeue(sk, skb))
            ret = tcp_v4_do_rcv(sk, skb);
    }
    ...
}
```

tcp_v4_rcv first looks up the corresponding socket from the source and dest information in the received packet's headers. Having found it, we go straight into the main receive function, tcp_v4_do_rcv.
```c
// file: net/ipv4/tcp_ipv4.c
int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
{
    if (sk->sk_state == TCP_ESTABLISHED) {
        // process the data
        if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len)) {
            rsk = sk;
            goto reset;
        }
        return 0;
    }

    // handling of packets in non-ESTABLISHED states
    ...
}
```

We assume we're handling a packet in the ESTABLISHED state, so we go into the tcp_rcv_established function.
```c
// file: net/ipv4/tcp_input.c
int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
                        const struct tcphdr *th, unsigned int len)
{
    ...
    // put the received data on the queue
    eaten = tcp_queue_rcv(sk, skb, tcp_header_len, &fragstolen);

    // data ready: wake whatever is waiting on this socket
    sk->sk_data_ready(sk, 0);
    ...
}
```

tcp_rcv_established calls the tcp_queue_rcv function to place the received data on the socket's receive queue,
As shown in the following source code
```c
// file: net/ipv4/tcp_input.c
static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb,
                                      int hdrlen, bool *fragstolen)
{
    ...
    // append the received data to the tail of the socket's receive queue
    if (!eaten) {
        __skb_queue_tail(&sk->sk_receive_queue, skb);
        skb_set_owner_r(skb, sk);
    }
    return eaten;
}
```

5.2 Finding the ready callback

After tcp_queue_rcv has enqueued the data, sk_data_ready is called to wake whatever is waiting on the socket. This is another function pointer. Recall from section 1 that sock_init_data, called while accept was creating the socket, set sk_data_ready to the sock_def_readable function. It is the default data-ready handler.
When data becomes ready on the socket, the kernel uses sock_def_readable as the entry point to find the ep_poll_callback that epoll_ctl installed on the socket.
Let's take a closer look at the details:
```c
// file: net/core/sock.c
static void sock_def_readable(struct sock *sk, int len)
{
    struct socket_wq *wq;

    rcu_read_lock();
    wq = rcu_dereference(sk->sk_wq);

    // a misleading name: this only checks that the wait queue is
    // non-empty; no process need actually be blocked
    if (wq_has_sleeper(wq))
        // run the callbacks on the wait queue items
        wake_up_interruptible_sync_poll(&wq->wait,
                                        POLLIN | POLLPRI | POLLRDBAND);
    sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
    rcu_read_unlock();
}
```

The function names here are genuinely confusing:
- wq_has_sleeper: for a plain recvfrom system call, it really does check whether a process is blocked. But for a socket managed by epoll, it only checks that the wait queue is non-empty; there may be no blocked process at all.
- wake_up_interruptible_sync_poll: this merely invokes the callback installed on the socket's wait queue item; it does not necessarily wake any process.
Let's focus on wake_up_interruptible_sync_poll and see how the kernel finds the callback registered in the wait queue item.
```c
// file: include/linux/wait.h
#define wake_up_interruptible_sync_poll(x, m) \
    __wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *)(m))

// file: kernel/sched/core.c
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
                        int nr_exclusive, void *key)
{
    ...
    __wake_up_common(q, mode, nr_exclusive, wake_flags, key);
    ...
}
```

We then enter __wake_up_common.
```c
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                             int nr_exclusive, int wake_flags, void *key)
{
    wait_queue_t *curr, *next;

    list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
        unsigned flags = curr->flags;

        if (curr->func(curr, mode, wake_flags, key) &&
            (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
}
```

__wake_up_common walks the wait queue, picks each registered item curr, and invokes its curr->func. Recall that ep_insert set this func to ep_poll_callback.
5.3 Executing the socket's ready callback

The previous step located ep_poll_callback, the function registered on the socket's wait queue item; the soft interrupt now calls it.
```c
// file: fs/eventpoll.c
static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync,
                            void *key)
{
    // get the epitem corresponding to this wait item
    struct epitem *epi = ep_item_from_wait(wait);

    // get the eventpoll the epitem belongs to
    struct eventpoll *ep = epi->ep;

    // 1. add the epitem to the eventpoll's ready list
    list_add_tail(&epi->rdllink, &ep->rdllist);

    // 2. check whether anything is on eventpoll's wait queue
    if (waitqueue_active(&ep->wq))
        wake_up_locked(&ep->wq);
    ...
}
```

In ep_poll_callback, the epitem can be recovered from the wait queue item (via the containing structure), and from the epitem the eventpoll object can be found as well.
The first thing it does is add its own epitem to the epoll's ready list.

It then checks whether there is a wait item on the eventpoll object's wait queue (this is set up when epoll_wait executes).

If no one is waiting, the soft interrupt's work is done here. If there is a wait item, it looks up the callback installed in it and the call chain is wake_up_locked() => __wake_up_locked() => __wake_up_common.
```c
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                             int nr_exclusive, int wake_flags, void *key)
{
    wait_queue_t *curr, *next;

    list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
        unsigned flags = curr->flags;

        if (curr->func(curr, mode, wake_flags, key) &&
            (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
}
```

Again in __wake_up_common, curr->func is invoked. This time the func is default_wake_function, which was installed when epoll_wait was called.
5.4 Executing the epoll readiness notification

default_wake_function finds the process descriptor in the wait queue item and wakes it up.
The source code is as follows:
```c
// file: kernel/sched/core.c
int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
                          void *key)
{
    return try_to_wake_up(curr->private, mode, wake_flags);
}
```

The wait item's curr->private pointer is the process that blocked while waiting on the epoll object.
Waking it pushes the epoll_wait process onto a runnable queue, where it waits for the kernel to reschedule it. When the process runs again, it resumes from the schedule() call where it stopped.
After the process wakes up, it continues executing from where it paused inside epoll_wait, returning the ready events in rdllist to the user process:
```c
// file: fs/eventpoll.c
static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
                   int maxevents, long timeout)
{
    ...
    __remove_wait_queue(&ep->wq, &wait);
    set_current_state(TASK_RUNNING);
    ...

check_events:
    // return the ready events to the user process
    ep_send_events(ep, events, maxevents);
    ...
}
```

From the user's point of view, epoll_wait simply waited a little longer; the process just executed sequentially.
Summary

Let's use a picture to summarize the whole working journey of epoll.
The callback functions along the soft interrupt path, in order:

- sock_def_readable: set on the sock object when it is initialized
- ep_poll_callback: added to the socket's wait queue by epoll_ctl
- default_wake_function: added to the epoll's wait queue by epoll_wait
To sum up, the kernel code behind the epoll-related functions runs in two contexts:

- User-process kernel mode. When a function such as epoll_wait is called, the process traps into the kernel to execute it. This code is responsible for checking the receive queues, and for blocking the current process and giving up the CPU.
- Hard/soft interrupt context. In this path the kernel receives packets from the network card, processes them, and places them on the socket's receive queue. For epoll, it then finds the epitem associated with the socket, adds it to the epoll object's ready list, and checks whether any process is blocked on the epoll; if so, it wakes that process up.
To cover every detail, this article walked through many flows, including the blocking path. In practice, though, as long as there is enough work to do, epoll_wait does not block the process at all: the user process keeps working until there is truly nothing left to do, and only then does epoll_wait give up the CPU. This is where epoll's efficiency comes from!