

Do-it-yourself Epoll

2025-01-21 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02 Report

Epoll is the Linux mechanism for managing IO multiplexing and an essential component of high-performance network IO on the Linux platform. For the kernel implementation, see fs/eventpoll.c.

Why implement epoll yourself? I plan to build a user-mode protocol stack that runs in a single thread: https://github.com/wangbojing/NtyTcp. As for why a user-mode protocol stack is worth building, look up the C10M problem.

Because the protocol stack runs in user mode, we also have to manage high-performance network IO ourselves, so epoll has to be implemented in user space as well. Code: https://github.com/wangbojing/NtyTcp/blob/master/src/nty_epoll_rb.c

Before implementing epoll, you need a good understanding of how kernel epoll works. Kernel epoll can be understood from four aspects:

1. Epoll data structures: an rbtree stores all monitored IO, and a ready queue stores the IO that is ready.

2. Epoll thread safety: running correctly on SMP and preventing deadlock.

3. Epoll kernel callbacks.

4. LT (level trigger) and ET (edge trigger) in epoll.

Let's implement epoll from these four aspects.

I. Epoll data structures

Epoll mainly consists of two structures: eventpoll and epitem. An epitem corresponds to each monitored IO; for example, an epoll_ctl EPOLL_CTL_ADD operation creates an epitem. An eventpoll corresponds to each epoll instance; for example, epoll_create creates an eventpoll.

Definition of Epitem

Definition of Eventpoll
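One plausible layout for the two structures, as a C sketch. Only rdy, sockid, rdnum and the split into a spinlock, a mutex and a condition variable come from this article; the other field names (events, revents, rd_prev, rd_next, rbr, rbcnt, cdmtx) and the helpers ep_create/ep_destroy are hypothetical. The ready list is a hand-rolled doubly linked list, and the tree root is kept as an opaque pointer.

```c
#define _GNU_SOURCE            /* expose pthread spinlocks under strict -std modes */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* One per monitored socket (created by EPOLL_CTL_ADD). */
struct epitem {
    int sockid;                /* key in the tree of all monitored IO */
    uint32_t events;           /* interest set passed to epoll_ctl */
    uint32_t revents;          /* events reported by the protocol stack */
    int rdy;                   /* 1 while the item is on the ready list */
    struct epitem *rd_prev;    /* ready-list links (hypothetical names) */
    struct epitem *rd_next;
};

/* One per epoll instance (created by epoll_create). */
struct eventpoll {
    void *rbr;                 /* root of the tree of all epitems */
    int rbcnt;                 /* number of epitems in the tree */
    struct epitem *rd_head;    /* ready list: head and tail */
    struct epitem *rd_tail;
    int rdnum;                 /* number of epitems on the ready list */
    pthread_mutex_t mtx;       /* protects rbr and rbcnt */
    pthread_spinlock_t lock;   /* protects the ready list and rdnum */
    pthread_mutex_t cdmtx;     /* paired with cond */
    pthread_cond_t cond;       /* epoll_wait blocks on this */
};

/* Roughly what epoll_create has to do. Returns NULL on allocation failure. */
struct eventpoll *ep_create(void) {
    struct eventpoll *ep = calloc(1, sizeof *ep);   /* empty tree, empty ready list */
    if (ep == NULL)
        return NULL;
    pthread_mutex_init(&ep->mtx, NULL);
    pthread_spin_init(&ep->lock, PTHREAD_PROCESS_PRIVATE);
    pthread_mutex_init(&ep->cdmtx, NULL);
    pthread_cond_init(&ep->cond, NULL);
    return ep;
}

/* Tear down the locks and free the instance (epitems are assumed gone). */
void ep_destroy(struct eventpoll *ep) {
    pthread_cond_destroy(&ep->cond);
    pthread_mutex_destroy(&ep->cdmtx);
    pthread_spin_destroy(&ep->lock);
    pthread_mutex_destroy(&ep->mtx);
    free(ep);
}
```

The mutex guards the tree and the spinlock guards the ready list, matching the locking scheme discussed in the next section.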

The data structure is shown in the following figure.

The list stores ready IO. As with the data structures in general, we discuss it in terms of insertion and removal. When is data inserted into the list? When an IO becomes ready, the callback function epoll_event_callback runs and adds the epitem to the list.

When is data removed from the list? When epoll_wait wakes up and runs again, it copies the epitems on the list one by one into its events parameter.

The rbtree stores all monitored IO and makes it fast to look up an IO by its fd. It is also discussed in terms of insertion and removal.

When is an item added to the rbtree? When the application performs an epoll_ctl EPOLL_CTL_ADD operation, the epitem is inserted into the rbtree. When is it deleted? When the application performs an epoll_ctl EPOLL_CTL_DEL operation, the epitem is removed from the rbtree.

How can operations on the list and the rbtree be made thread-safe, work correctly on SMP, and avoid deadlock?

II. Epoll locking mechanism

Epoll needs lock protection in three places: list operations, rbtree operations, and the wait in epoll_wait.

The list uses a spinlock, the finest-grained lock, so that it can be manipulated quickly when items are added under SMP.

List add

Line 346: acquire the spinlock.

Line 347: set the epitem's rdy to 1, meaning the epitem is already on the ready queue. If the same item is triggered again later, only its event needs to be updated.

Line 348: add the epitem to the list.

Line 349: increment the eventpoll's rdnum field by 1.

Line 350: release the spinlock.
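A minimal C sketch of the five steps above. It assumes the callback has already found the epitem (the rbtree lookup by sockid is skipped) and that the ready list is a hand-rolled doubly linked list; the function name ep_ready_add and the fields other than rdy and rdnum are hypothetical. When the item is already queued, the sketch ORs the new event bits into it, which is one way to "only update the event". Waking a blocked epoll_wait is left out.

```c
#define _GNU_SOURCE                /* expose pthread spinlocks under strict -std modes */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct epitem {
    int sockid;
    uint32_t revents;              /* events reported by the stack */
    int rdy;                       /* 1 while on the ready list */
    struct epitem *rd_prev, *rd_next;
};

struct eventpoll {
    pthread_spinlock_t lock;       /* protects rd_head, rd_tail, rdnum */
    struct epitem *rd_head, *rd_tail;
    int rdnum;
};

/* Ready-list insertion, called from the protocol stack's callback. */
void ep_ready_add(struct eventpoll *ep, struct epitem *epi, uint32_t event) {
    pthread_spin_lock(&ep->lock);              /* acquire the spinlock */
    if (epi->rdy) {                            /* already queued: just update the event */
        epi->revents |= event;
        pthread_spin_unlock(&ep->lock);
        return;
    }
    epi->rdy = 1;                              /* mark as being on the ready list */
    epi->revents = event;
    epi->rd_next = NULL;                       /* append at the tail (FIFO order) */
    epi->rd_prev = ep->rd_tail;
    if (ep->rd_tail != NULL)
        ep->rd_tail->rd_next = epi;
    else
        ep->rd_head = epi;
    ep->rd_tail = epi;
    ep->rdnum++;                               /* one more ready item */
    pthread_spin_unlock(&ep->lock);            /* release the spinlock */
}
```

The critical section only touches a few pointers and a counter, which is why a spinlock rather than a sleeping lock is a reasonable fit.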

List deletion

Line 301: acquire the spinlock.

Line 304: take the smaller of rdnum and maxevents so that the events array does not overflow.

Line 307: loop while the list is not empty.

Line 309: get the first node of the list.

Line 310: remove the first node from the list.

Line 311: set the epitem's rdy field to 0 to indicate that it is no longer on the ready queue.

Line 313: copy the epitem's event into the user-space events array.

Line 316: increment the copy count by 1.

Line 317: decrement the eventpoll's rdnum by 1.

A spinlock is used here to handle contention between cores on an SMP system; a sleeping lock is not suitable for such a short critical section.
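The removal steps can be sketched the same way. Assumptions: a hypothetical user_epoll_event struct stands in for the user-space events array, the ready list is the same hand-rolled doubly linked list, and ep_ready_drain is a made-up name. The whole loop runs under the spinlock, which stays short because it only unlinks nodes and copies two fields.

```c
#define _GNU_SOURCE                /* expose pthread spinlocks under strict -std modes */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the user-space epoll_event (hypothetical layout). */
struct user_epoll_event {
    uint32_t events;
    int sockid;
};

struct epitem {
    int sockid;
    uint32_t revents;
    int rdy;
    struct epitem *rd_prev, *rd_next;
};

struct eventpoll {
    pthread_spinlock_t lock;
    struct epitem *rd_head, *rd_tail;
    int rdnum;
};

/* Copy up to maxevents ready items into events[]; returns how many were copied. */
int ep_ready_drain(struct eventpoll *ep, struct user_epoll_event *events, int maxevents) {
    int i = 0;
    pthread_spin_lock(&ep->lock);                        /* acquire the spinlock */
    int cnt = ep->rdnum < maxevents ? ep->rdnum : maxevents;   /* never overflow events[] */
    while (cnt-- > 0 && ep->rd_head != NULL) {           /* list must not be empty */
        struct epitem *epi = ep->rd_head;                /* first node */
        ep->rd_head = epi->rd_next;                      /* unlink it */
        if (ep->rd_head != NULL)
            ep->rd_head->rd_prev = NULL;
        else
            ep->rd_tail = NULL;
        epi->rd_next = epi->rd_prev = NULL;
        epi->rdy = 0;                                    /* no longer on the ready list */
        events[i].events = epi->revents;                 /* copy out to the caller */
        events[i].sockid = epi->sockid;
        i++;                                             /* one more copied */
        ep->rdnum--;                                     /* one fewer ready */
    }
    pthread_spin_unlock(&ep->lock);                      /* release the spinlock */
    return i;
}
```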

Rbtree addition

Line 149: acquire the mutex.

Line 153: check whether an epitem for sockid already exists. If it does, it cannot be added again; if it does not, it can be added.

Line 160: allocate the epitem.

Line 167: assign the sockid.

Line 168: store the requested events in the epitem's event field.

Line 170: insert the epitem into the rbtree.

Line 173: release the mutex.
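A sketch of the EPOLL_CTL_ADD path. Standard C has no red-black tree, so this swaps in POSIX tsearch()/tfind() from <search.h> as the ordered index (glibc happens to implement them as a red-black tree; POSIX does not require that). The function name ep_ctl_add, the comparator epi_cmp and the struct layouts are assumptions.

```c
#define _GNU_SOURCE                /* tsearch() and pthread under strict -std modes */
#include <pthread.h>
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

struct epitem {
    int sockid;                    /* tree key */
    uint32_t events;               /* interest set */
};

struct eventpoll {
    pthread_mutex_t mtx;           /* protects rbr and rbcnt */
    void *rbr;                     /* tsearch() root, stand-in for the rbtree */
    int rbcnt;
};

/* Order epitems by sockid. */
static int epi_cmp(const void *a, const void *b) {
    int x = ((const struct epitem *)a)->sockid;
    int y = ((const struct epitem *)b)->sockid;
    return (x > y) - (x < y);
}

/* EPOLL_CTL_ADD: 0 on success, -1 if sockid is already registered or memory runs out. */
int ep_ctl_add(struct eventpoll *ep, int sockid, uint32_t events) {
    pthread_mutex_lock(&ep->mtx);                      /* acquire the mutex */
    struct epitem key = { .sockid = sockid };
    if (tfind(&key, &ep->rbr, epi_cmp) != NULL) {      /* already present: refuse */
        pthread_mutex_unlock(&ep->mtx);
        return -1;
    }
    struct epitem *epi = calloc(1, sizeof *epi);       /* allocate the epitem */
    if (epi == NULL) {
        pthread_mutex_unlock(&ep->mtx);
        return -1;
    }
    epi->sockid = sockid;                              /* set the key */
    epi->events = events;                              /* record the interest set */
    if (tsearch(epi, &ep->rbr, epi_cmp) == NULL) {     /* insert into the tree */
        free(epi);
        pthread_mutex_unlock(&ep->mtx);
        return -1;
    }
    ep->rbcnt++;
    pthread_mutex_unlock(&ep->mtx);                    /* release the mutex */
    return 0;
}
```

A mutex rather than a spinlock fits here because allocation happens inside the critical section.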

Rbtree deletion

Line 177: acquire the mutex.

Line 181: remove the node for sockid from the rbtree; if it does not exist, return -1.

Line 188: free the epitem.

Line 190: release the mutex.
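A matching sketch of EPOLL_CTL_DEL, again with tfind()/tdelete() standing in for the rbtree. It adds one step the list above does not mention: if the epitem is still on the ready list, it is unlinked under the spinlock before free(), so that epoll_wait never touches freed memory. The mutex is always taken before the spinlock, and as long as no other path takes them in the reverse order, this cannot deadlock. All names are hypothetical.

```c
#define _GNU_SOURCE                /* tsearch() and pthread spinlocks under strict -std modes */
#include <pthread.h>
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

struct epitem {
    int sockid;
    int rdy;                                   /* 1 while on the ready list */
    struct epitem *rd_prev, *rd_next;
};

struct eventpoll {
    pthread_mutex_t mtx;                       /* tree lock */
    void *rbr;                                 /* tsearch() root */
    int rbcnt;
    pthread_spinlock_t lock;                   /* ready-list lock */
    struct epitem *rd_head, *rd_tail;
    int rdnum;
};

static int epi_cmp(const void *a, const void *b) {
    int x = ((const struct epitem *)a)->sockid;
    int y = ((const struct epitem *)b)->sockid;
    return (x > y) - (x < y);
}

/* EPOLL_CTL_DEL: 0 on success, -1 if sockid is not registered. */
int ep_ctl_del(struct eventpoll *ep, int sockid) {
    pthread_mutex_lock(&ep->mtx);                      /* acquire the mutex */
    struct epitem key = { .sockid = sockid };
    void *node = tfind(&key, &ep->rbr, epi_cmp);
    if (node == NULL) {                                /* not in the tree */
        pthread_mutex_unlock(&ep->mtx);
        return -1;
    }
    struct epitem *epi = *(struct epitem **)node;
    tdelete(&key, &ep->rbr, epi_cmp);                  /* remove from the tree */
    ep->rbcnt--;

    pthread_spin_lock(&ep->lock);                      /* added step: leave the ready list */
    if (epi->rdy) {
        if (epi->rd_prev) epi->rd_prev->rd_next = epi->rd_next;
        else ep->rd_head = epi->rd_next;
        if (epi->rd_next) epi->rd_next->rd_prev = epi->rd_prev;
        else ep->rd_tail = epi->rd_prev;
        ep->rdnum--;
    }
    pthread_spin_unlock(&ep->lock);

    free(epi);                                         /* free the epitem */
    pthread_mutex_unlock(&ep->mtx);                    /* release the mutex */
    return 0;
}
```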

Blocking in epoll_wait

epoll_wait blocks using pthread_cond_wait; for the implementation details, see:

https://github.com/wangbojing/NtyTcp/blob/master/src/nty_epoll_rb.c
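A minimal sketch of the blocking part, assuming a condition variable cond paired with a mutex cdmtx (hypothetical names) and the timeout convention of Linux epoll_wait: negative waits forever, 0 returns immediately, a positive value is a limit in milliseconds. The producer (the protocol stack's callback) signals under cdmtx after queuing an item; because the waiter checks rdnum while holding cdmtx, that wakeup cannot be lost. Draining the ready list afterwards is not shown.

```c
#define _GNU_SOURCE                 /* pthread spinlocks, clock_gettime under strict -std */
#include <errno.h>
#include <pthread.h>
#include <time.h>

struct eventpoll {
    pthread_spinlock_t lock;        /* protects rdnum (and the ready list) */
    int rdnum;                      /* number of ready epitems */
    pthread_mutex_t cdmtx;          /* paired with cond */
    pthread_cond_t cond;            /* epoll_wait sleeps here */
};

static int ready_count(struct eventpoll *ep) {
    pthread_spin_lock(&ep->lock);
    int n = ep->rdnum;
    pthread_spin_unlock(&ep->lock);
    return n;
}

/* Producer side: call after an epitem has been put on the ready list. */
void ep_notify(struct eventpoll *ep) {
    pthread_mutex_lock(&ep->cdmtx);
    pthread_cond_signal(&ep->cond);
    pthread_mutex_unlock(&ep->cdmtx);
}

/* Blocking part of epoll_wait: returns the number of ready items, 0 on timeout. */
int ep_wait_ready(struct eventpoll *ep, int timeout_ms) {
    struct timespec deadline = {0, 0};
    if (timeout_ms > 0) {
        clock_gettime(CLOCK_REALTIME, &deadline);   /* default clock of a condvar */
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
    }
    pthread_mutex_lock(&ep->cdmtx);
    int n;
    while ((n = ready_count(ep)) == 0 && timeout_ms != 0) {
        if (timeout_ms < 0) {
            pthread_cond_wait(&ep->cond, &ep->cdmtx);   /* loop re-checks: spurious wakeups */
        } else if (pthread_cond_timedwait(&ep->cond, &ep->cdmtx, &deadline) == ETIMEDOUT) {
            n = ready_count(ep);                        /* one last look before giving up */
            break;
        }
    }
    pthread_mutex_unlock(&ep->cdmtx);
    return n;
}
```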

III. Epoll callback

When are the epoll callbacks executed? This has to be explained together with the TCP protocol stack. The timing diagram of the TCP stack is shown in the following figure; the epoll callbacks are invoked from the stack at the points numbered 1, 2, 3 and 4. The TCP stack implementation itself will be described in a separate article. The four points are described in detail below.

Number 1: the TCP three-way handshake. When the peer's final ACK arrives for a socket in the SYN_RCVD state, the connection is complete, so the event on the listening socket must be set to EPOLLIN. The application can then call accept() to obtain the new socket and read its data.

Number 2: in the ESTABLISHED state, after data is received, the socket's event must be set to EPOLLIN.

Number 3: in the ESTABLISHED state, when a FIN is received, the socket enters CLOSE_WAIT. The socket's event must be set to EPOLLIN so the application reads the disconnect.

Number 4: check the socket's send status; once the window allows it (cwnd > 0), data can be sent, so the socket's event must be set to EPOLLOUT.

Adding the epoll callback at these four points lets epoll receive IO events correctly.
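The four trigger points amount to a mapping from stack event to epoll event. A sketch with hypothetical hook names; the bit values mirror Linux EPOLLIN (0x001) and EPOLLOUT (0x004) so that it does not depend on <sys/epoll.h>.

```c
#include <stdint.h>

/* Bit values mirror Linux EPOLLIN and EPOLLOUT. */
#define UEP_IN  0x001u
#define UEP_OUT 0x004u

/* The four callback points in the TCP stack (hypothetical names). */
enum tcp_hook {
    HOOK_HANDSHAKE_DONE,   /* 1: final ACK received, connection can be accepted */
    HOOK_DATA_RECEIVED,    /* 2: payload received in ESTABLISHED */
    HOOK_FIN_RECEIVED,     /* 3: FIN received, socket moves to CLOSE_WAIT */
    HOOK_SEND_READY        /* 4: window allows sending */
};

/* Event the stack hands to the epoll callback at each point. */
uint32_t hook_event(enum tcp_hook hook) {
    switch (hook) {
    case HOOK_HANDSHAKE_DONE:   /* on the listening socket: accept() can proceed */
    case HOOK_DATA_RECEIVED:    /* a read will return data */
    case HOOK_FIN_RECEIVED:     /* a read will report the disconnect */
        return UEP_IN;
    case HOOK_SEND_READY:       /* a send can make progress */
        return UEP_OUT;
    }
    return 0;                   /* unknown hook: no event */
}
```

In the stack, each point would then pass hook_event(...) for the socket to epoll_event_callback, the callback named earlier in this article.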

IV. LT and ET

LT (level trigger) and ET (edge trigger) are concepts borrowed from electronic signals. If they are unfamiliar, see man epoll. They are illustrated in the following figure:

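For example, with event = EPOLLIN | EPOLLLT, the event is EPOLLIN with level triggering: the epoll callback can be called again and again for as long as the event is EPOLLIN. (In Linux epoll, level triggering is the default and there is no EPOLLLT flag; here the name simply marks the mode.)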

For example, with event = EPOLLIN | EPOLLET, the event fires when the state changes, say from EPOLLOUT to EPOLLIN. Such a change happens only once, so the epoll callback is called only once. Putting the two modes together: when the epoll callback path runs, if the item is EPOLLET (edge triggered), the current event is compared with the previous one and the callback is invoked only if it has changed; if it is EPOLLLT (level triggered), the callback is invoked whenever the event is EPOLLIN.
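The rule in the previous paragraph can be sketched as a small decision function. Assumptions: hypothetical names, bit values mirroring Linux EPOLLIN, EPOLLOUT and EPOLLET, and per-item state that remembers which wanted bits were ready at the last check. It models the rule as described in this article (ET compares with the previous event), not the kernel's exact semantics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirror Linux EPOLLIN, EPOLLOUT and EPOLLET. */
#define UEP_IN  0x001u
#define UEP_OUT 0x004u
#define UEP_ET  (1u << 31)

struct trigger {
    uint32_t interest;   /* events from epoll_ctl, optionally | UEP_ET */
    uint32_t last;       /* wanted bits that were ready at the previous check */
};

/* Returns true if the callback should be invoked for the readiness 'now'. */
bool should_notify(struct trigger *t, uint32_t now) {
    uint32_t wanted = t->interest & ~UEP_ET;
    uint32_t ready = now & wanted;
    bool fire;
    if (t->interest & UEP_ET)
        fire = (ready & ~t->last) != 0;   /* edge: a wanted bit went from 0 to 1 */
    else
        fire = ready != 0;                /* level: fire whenever it is set */
    t->last = ready;                      /* remember for the next comparison */
    return fire;
}
```

With LT the same readable state is reported on every check; with ET it is reported once, and again only after the state has dropped (for example to EPOLLOUT) and come back to EPOLLIN.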
