What are the differences between the I/O models and select, poll, epoll, and kqueue?

This article walks through the five I/O models and then compares select, poll, epoll, and kqueue. The five models are:
blocking I/O
nonblocking I/O
I/O multiplexing (select and poll)
signal driven I/O (SIGIO)
asynchronous I/O (the POSIX aio_ functions). The defining feature of the asynchronous I/O model is that notification arrives only when the operation is complete.
Whether a model blocks depends on how the I/O exchange between kernel and user space is implemented.
Asynchronous blocking is based on select: the select call itself blocks, but its advantage is that it can monitor many file descriptors at the same time.
Asynchronous non-blocking I/O notifies the process only on completion: the user process initiates an I/O operation and returns immediately; when the operation actually finishes, the application receives a completion notification and only needs to process the data. It never performs the actual read or write itself, because the kernel has already done the real I/O.
1 blocking I/O
This needs little explanation: the ordinary blocking socket. Its call sequence is worth spelling out, since the examples below refer back to it. The kernel goes through two phases: waiting for the data to arrive, then copying the data from kernel space to user space. recvfrom() does not return until that final copy completes, and the process is blocked the whole time.
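A minimal sketch of this model in C, assuming a hypothetical UDP receiver (the port number and buffer size are illustrative, not from the original article):

```c
/* Blocking I/O: recvfrom() blocks through both kernel phases,
 * waiting for a datagram and then copying it into buf. */
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);          /* illustrative port */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    char buf[1024];
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    printf("received %zd bytes\n", n);    /* reached only after the copy */
    return 0;
}
```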
2 nonblocking I/O:
In contrast to blocking I/O, a non-blocking socket's recvfrom() returns immediately with an error (EWOULDBLOCK) when no data is ready.
As you can see, using it directly amounts to polling: the application keeps retrying the call until the kernel buffer finally has data.
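A sketch of that polling loop, assuming fd is an already-created socket (the sleep interval is arbitrary):

```c
/* Non-blocking I/O: set O_NONBLOCK, then busy-poll until data arrives. */
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

void poll_until_ready(int fd, char *buf, size_t len) {
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);   /* switch to non-blocking */

    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0)
            break;                            /* data copied: done */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            break;                            /* real error */
        usleep(1000);                         /* no data yet: try again */
    }
}
```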
3 I/O multiplexing (select and poll)
The most common I/O multiplexing model is select.
select blocks first and returns only when at least one socket is active. Compared with blocking I/O, select costs two system calls per operation (select plus the actual read), but a single select can handle many sockets.
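A sketch of a select loop in C, assuming fds is an array of nfds already-connected sockets:

```c
/* I/O multiplexing with select(): watch several sockets at once
 * and read from whichever becomes readable. */
#include <sys/select.h>
#include <sys/socket.h>

void select_loop(int *fds, int nfds) {
    char buf[1024];
    for (;;) {
        fd_set rset;
        FD_ZERO(&rset);
        int maxfd = -1;
        for (int i = 0; i < nfds; i++) {   /* rebuild the set every pass */
            FD_SET(fds[i], &rset);
            if (fds[i] > maxfd) maxfd = fds[i];
        }
        /* First system call: block until at least one fd is active. */
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) <= 0)
            continue;
        for (int i = 0; i < nfds; i++)     /* the scan select requires */
            if (FD_ISSET(fds[i], &rset))
                recv(fds[i], buf, sizeof(buf), 0);  /* second system call */
    }
}
```

Note that the fd_set must be rebuilt before every call, because select() overwrites it with the result.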
4 signal driven I/O (SIGIO)
Supported only on UNIX-like systems; readers who are interested can look into it further.
Compared with I/O multiplexing (select and poll), its advantage is that it eliminates select's blocking and polling: when a socket becomes active, the registered handler deals with it.
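A sketch of the setup, assuming fd is an existing socket (note that O_ASYNC is a Linux/BSD extension rather than strict POSIX):

```c
/* Signal-driven I/O: ask the kernel to send SIGIO when the socket
 * becomes readable, and handle the read in the signal handler. */
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

static int g_fd;                         /* the watched socket */

static void on_sigio(int signo) {
    char buf[1024];
    (void)signo;
    recv(g_fd, buf, sizeof(buf), 0);     /* data is ready: no waiting */
}

void enable_sigio(int fd) {
    g_fd = fd;
    signal(SIGIO, on_sigio);             /* register the handler */
    fcntl(fd, F_SETOWN, getpid());       /* route SIGIO to this process */
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_ASYNC); /* enable signal-driven mode */
}
```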
5 asynchronous I/O (the POSIX aio_functions)
Few *nix systems support it; IOCP on Windows is this model.
It is a completely asynchronous mechanism: all four models above block at least while the kernel copies data to the application. This model notifies the application only when the copy is complete, so it is purely asynchronous. Windows completion ports appear to be the only mainstream implementation of this model, and their efficiency is excellent.
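A sketch using the POSIX aio_ functions mentioned above (on Linux this typically links with -lrt; the signal choice is illustrative):

```c
/* POSIX asynchronous I/O: queue a read and receive a signal only
 * after the kernel has finished copying data into our buffer. */
#include <aio.h>
#include <signal.h>
#include <string.h>

static char buf[1024];

void queue_async_read(int fd) {
    static struct aiocb cb;              /* must outlive the operation */
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    cb.aio_sigevent.sigev_signo = SIGUSR1;   /* notify on completion */
    aio_read(&cb);                       /* returns immediately */
    /* ... the process runs on; SIGUSR1 arrives when the data is in buf */
}
```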
6 Below is a comparison of the above five models
It can be seen that the later a model appears in the list, the less it blocks; in theory, the later models are the most efficient.
=============
The comparison of the five models is fairly clear; what remains is to slot select, epoll, iocp, and kqueue into that numbering.
select and IOCP correspond to the third and fifth models, respectively. What about epoll and kqueue? Strictly speaking they belong to the same model as select, but they are more advanced and can be regarded as having some traits of the fourth model, such as a callback mechanism.
Why are epoll and kqueue superior to select?
The answer is that they do not poll; they replaced polling with callbacks. When there are many sockets, every call to select() must complete its work by scanning all FD_SETSIZE sockets regardless of which are active, which wastes a great deal of CPU time. If instead a callback can be registered per socket, to run automatically when that socket becomes active, the polling disappears. That is exactly what epoll and kqueue do.
Windows or *nix (IOCP or kqueue/epoll)?
IOCP on Windows is indeed excellent, and very few systems support true asynchronous I/O, but because of the platform's own limitations, large servers still run on UNIX-like systems. And as mentioned above, kqueue/epoll block for just one more step than IOCP, the kernel-to-application data copy, and therefore do not count as asynchronous I/O. That small extra block is insignificant, though, and kqueue and epoll are already excellent.
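A sketch of the kqueue side on BSD/macOS, assuming fd is an existing socket:

```c
/* kqueue (BSD/macOS): register interest in a descriptor once;
 * the kernel then delivers only the active events. */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <sys/socket.h>

void kqueue_loop(int fd) {
    char buf[1024];
    int kq = kqueue();
    struct kevent change, event;
    /* Register once: no per-call rescanning as with select(). */
    EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
    kevent(kq, &change, 1, NULL, 0, NULL);
    for (;;) {
        /* Block until the kernel reports an active descriptor. */
        int n = kevent(kq, NULL, 0, &event, 1, NULL);
        if (n > 0 && event.filter == EVFILT_READ)
            recv((int)event.ident, buf, sizeof(buf), 0);
    }
}
```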
Providing a consistent interface: I/O design patterns
In fact, whatever the underlying model, a layer can be abstracted on top to provide a consistent interface. Well-known examples are ACE and libevent (based on the Reactor pattern); both are cross-platform, automatically choose the optimal I/O multiplexing mechanism, and let the user simply call the interface. Two design patterns apply here: Reactor and Proactor (see the comparison "Reactor mode vs. Proactor mode"). libevent follows the Reactor model, while ACE provides a Proactor model. In essence, both are encapsulations of the various I/O multiplexing mechanisms.
What is Java NIO?
It now seems clear that current Java NIO is, in essence, the select() model, as can be seen by inspecting /jre/bin/nio.dll. As for why Java servers still perform well, I do not know; perhaps they are simply well designed.
=============
To summarize some highlights:
Only IOCP is asynchronous I/O; all the other mechanisms block more or less at some point.
select is inefficient because it must poll every time. But inefficiency is relative: it depends on the situation and can be mitigated by good design.
epoll, kqueue, and select follow the Reactor pattern; IOCP follows the Proactor pattern.
The Java nio package is a select model.
Differences between epoll and select
1. Use multiple processes or multiple threads. This approach adds complexity and requires considerable overhead to create and maintain processes and threads. (The Apache server uses the child-process model, whose advantage is that it isolates users.) This is synchronous blocking I/O.
2. A better approach is I/O multiplexing: build a list of descriptors (a queue, in epoll's case), then call a function that does not return until one of those descriptors is ready, telling the process which I/O is ready. Both select and epoll are solutions for multiplexed I/O; select is in the POSIX standard, while epoll is specific to Linux.
There are three main differences (advantages of epoll over select):
1. select is limited in the number of handles it can watch. The linux/posix_types.h header contains the declaration #define __FD_SETSIZE 1024, meaning select can monitor at most 1024 fds at the same time. epoll has no such limit; it is bounded only by the maximum number of open file handles.
2. epoll's biggest advantage is that its efficiency does not fall as the number of fds grows. select polls over a data structure similar to an array, whereas epoll maintains a ready queue and simply checks whether that queue is empty. epoll only operates on "active" sockets: in the kernel implementation, epoll attaches a callback to each fd, and only an active socket invokes its callback (adding the handle to the queue); idle handles do not. In this respect epoll implements a "pseudo" AIO. However, if most of the I/O is active and every port is heavily utilized, epoll may be no more efficient than select (perhaps because of the cost of maintaining the queue).
3. mmap is used to speed up message passing between kernel and user space. Whether with select, poll, or epoll, the kernel must deliver fd notifications to user space, and avoiding unnecessary memory copies is very important; here, epoll is implemented by having the kernel and user space mmap the same block of memory.
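A sketch of the resulting epoll loop on Linux, assuming fds is an array of nfds existing sockets:

```c
/* epoll: register each fd once with epoll_ctl(); epoll_wait() then
 * returns only the active descriptors, so there is no full scan. */
#include <sys/epoll.h>
#include <sys/socket.h>

#define MAX_EVENTS 64

void epoll_loop(int *fds, int nfds) {
    char buf[1024];
    struct epoll_event ev, events[MAX_EVENTS];
    int epfd = epoll_create1(0);
    for (int i = 0; i < nfds; i++) {
        ev.events = EPOLLIN;                  /* readable */
        ev.data.fd = fds[i];
        epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev);  /* register once */
    }
    for (;;) {
        /* Only active fds come back: cost does not grow with nfds. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++)
            recv(events[i].data.fd, buf, sizeof(buf), 0);
    }
}
```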
About epoll's working modes, ET and LT
epoll works in one of two modes.
ET (edge-triggered): you are notified only when the status changes, i.e. epoll_wait returns once per event; for a given event there is only one notification. ET supports only non-blocking sockets.
LT (level-triggered, the default): similar to select/poll, you keep being notified as long as unhandled events remain. Calling the epoll interface in LT mode is equivalent to a faster poll. LT supports both blocking and non-blocking sockets.
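Because ET delivers only one notification per state change, the handler must drain the socket before waiting again. A sketch, assuming the fd was registered with EPOLLIN | EPOLLET and is non-blocking:

```c
/* Edge-triggered handling: read until EAGAIN, or the remaining
 * buffered data will never trigger another notification. */
#include <errno.h>
#include <sys/socket.h>

void on_readable_et(int fd) {
    char buf[1024];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0)
            continue;              /* keep reading: more may be buffered */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                 /* drained: safe to epoll_wait again */
        break;                     /* peer closed or real error */
    }
}
```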
Three Linux concurrent network programming models
1 Apache model (PPC, Process Per Connection): assign one process per connection. The time and space the host spends on each connection are costly, and as connections increase, so does the cost of the many inter-process switches. It is difficult to cope with large numbers of concurrent client connections this way.
2 TPC model (Thread Per Connection): one thread per connection. Similar to PPC.
3 select model: I/O multiplexing.
3.1 Each connection corresponds to one descriptor. The select model is limited by FD_SETSIZE, i.e. the maximum size of a descriptor set (1024 on linux 2.6.35). In fact, the number of descriptors a Linux process can open is limited only by memory, but the select system call was designed around FD_SETSIZE. That value can be changed by recompiling the kernel, yet this does not cure the problem: against millions of concurrent user connections, raising it is still a drop in the bucket.
3.2 select scans a set of file descriptors on each call, whose size is passed as its first parameter. As the number of descriptors each process can open grows, scanning efficiency falls.
3.3 The kernel uses memory copies to pass information about file descriptors to user space.
4 poll model: I/O multiplexing. poll is not constrained by FD_SETSIZE, because the size of the descriptor set the kernel scans is specified by the caller (poll's second parameter); see the sketch after this list. But the scanning-efficiency and memory-copy issues remain.
5 pselect model: I/O multiplexing. Same as select.
6 epoll model:
6.1) No limit on the number of file descriptors; it is related only to memory size.
6.2) When epoll returns, it already knows exactly which socket fds had which events, so there is no need to test them one by one as select does.
6.3) Kernel-to-user-space delivery uses shared memory.
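A sketch of the poll interface mentioned in item 4, assuming pfds is a caller-supplied array of nfds entries:

```c
/* poll(): like select, but the watched set is a caller-supplied
 * array, so it is not capped by FD_SETSIZE. */
#include <poll.h>
#include <sys/socket.h>

void poll_loop(struct pollfd *pfds, nfds_t nfds) {
    char buf[1024];
    for (nfds_t i = 0; i < nfds; i++)
        pfds[i].events = POLLIN;          /* interested in readability */
    for (;;) {
        poll(pfds, nfds, -1);             /* still scans the whole array */
        for (nfds_t i = 0; i < nfds; i++)
            if (pfds[i].revents & POLLIN)
                recv(pfds[i].fd, buf, sizeof(buf), 0);
    }
}
```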
IV: FAQ
1. A single epoll does not solve every problem, especially when each of your operations is time-consuming, because epoll dispatches events serially. You need a thread pool on top of it to maximize performance.
2. If an fd is registered with two epoll instances, both instances will report the event when it occurs.
3. If an fd registered with epoll is closed, it is automatically removed from the epoll interest list.
4. If multiple events trigger on one epoll at the same time, they are returned together.
5. epoll_wait always listens for error and hang-up events (EPOLLERR, EPOLLHUP), so there is no need to add them to the requested events.
6. To prevent one fd with a large amount of I/O from starving the others in ET mode, the Linux documentation suggests adding a ready flag to the structure associated with each fd: after an event fires, have the epoll_wait loop only set the flag, then service the ready fd list round-robin below (see the sketch after this list).
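A sketch of that ready-list pattern, assuming a hypothetical conn_t registered with epoll via data.ptr:

```c
/* Ready-list pattern for ET mode: the event loop only flags fds;
 * a separate round-robin pass services them so none starves. */
#include <sys/epoll.h>

#define MAX_EVENTS 64

typedef struct conn {
    int fd;
    int ready;          /* set by the event loop, cleared when serviced */
} conn_t;

void mark_ready(int epfd) {
    struct epoll_event events[MAX_EVENTS];
    /* Timeout 0: just collect whatever is pending; a real loop would
     * block here only when the ready list is empty. */
    int n = epoll_wait(epfd, events, MAX_EVENTS, 0);
    for (int i = 0; i < n; i++) {
        conn_t *c = (conn_t *)events[i].data.ptr;  /* set at registration */
        c->ready = 1;   /* do not read here: just flag it */
    }
}

void service_round_robin(conn_t *conns, int nconns) {
    for (int i = 0; i < nconns; i++)
        if (conns[i].ready) {
            /* Read a bounded amount from conns[i].fd here, then clear
             * the flag (or keep it set if data remains). */
            conns[i].ready = 0;
        }
}
```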
Having read this article, you should now have a solid sense of the I/O models and the differences between select, poll, epoll, and kqueue. Thank you for reading!