Network programming: I/O multiplexing

2025-04-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

I/O multiplexing is a common technique in multithreaded or multiprocess programming. It is mainly supported by three functions: select, poll, and epoll. This article covers select and epoll in detail and poll more briefly.

Select function

This function allows a process to instruct the kernel to wait for any one of multiple events to occur, and to wake the process only when one or more of those events occurs or when a specified amount of time has passed.

We call select to tell the kernel which descriptors we are interested in (for reading, writing, or an exception condition) and how long to wait. The descriptors we are interested in are not limited to sockets; any descriptor can be tested using select.

Function prototype:

    #include <sys/select.h>
    #include <sys/time.h>

    int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, const struct timeval *timeout);

Returns: the number of ready descriptors, 0 on timeout, -1 on error.

The last parameter, timeout, tells the kernel how long to wait for one of the specified descriptors to become ready. There are three possibilities for this parameter:

Wait forever: return only when one of the specified descriptors is ready for I/O. To do this, pass timeout as a null pointer.

Wait up to a fixed amount of time: return when one of the specified descriptors is ready for I/O, but do not wait beyond the number of seconds and microseconds specified in the timeval structure pointed to by this parameter.

Do not wait at all: check the descriptors and return immediately, which is called polling. To do this, the parameter must point to a timeval structure whose seconds and microseconds are both set to 0.
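The three timeout cases can be sketched as follows. This is a minimal illustration, not a definitive implementation: wait_readable and timeout_demo are hypothetical names, the millisecond convention is an assumption made for the example, and a pipe stands in for a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable.
 *   timeout_ms < 0  : wait forever (timeout is a null pointer)
 *   timeout_ms == 0 : do not wait at all (timeval set to 0, i.e. polling)
 *   timeout_ms > 0  : wait at most that many milliseconds
 * Returns select's result: 1 if ready, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readset;
    struct timeval tv;
    struct timeval *timeout = NULL;           /* null pointer: wait forever */

    assert(fd >= 0 && fd < FD_SETSIZE);

    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        timeout = &tv;                        /* {0, 0} means check and return */
    }

    FD_ZERO(&readset);
    FD_SET(fd, &readset);
    return select(fd + 1, &readset, NULL, NULL, timeout);
}

/* Self-check on a pipe; returns 1 if all three modes behave as described. */
int timeout_demo(void)
{
    int p[2];
    int ok;

    if (pipe(p) != 0)
        return 0;
    ok = wait_readable(p[0], 0) == 0;         /* empty pipe, polling: timeout */
    ok = ok && wait_readable(p[0], 20) == 0;  /* empty pipe, 20 ms: timeout */
    ok = ok && write(p[1], "x", 1) == 1;
    ok = ok && wait_readable(p[0], -1) == 1;  /* data present: returns at once */
    close(p[0]);
    close(p[1]);
    return ok;
}
```

Calling timeout_demo() from main exercises all three cases; note that some systems modify the timeval structure on return, so it is rebuilt on every call here.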

The three middle parameters, readset, writeset, and exceptset, specify the descriptors that we want the kernel to test for reading, writing, and exception conditions.

How to specify one or more descriptor values for each of these three parameters is a design problem. select uses descriptor sets, typically an array of integers in which each bit of each integer corresponds to a descriptor. For example, with 32-bit integers, the first element of the array corresponds to descriptors 0 through 31, the second element corresponds to descriptors 32 through 63, and so on. All of these implementation details are irrelevant to the application and are hidden in a data type called fd_set and the following four macros:

    void FD_ZERO(fd_set *fdset);           // clear all bits in fdset
    void FD_SET(int fd, fd_set *fdset);    // turn on the bit for fd in fdset
    void FD_CLR(int fd, fd_set *fdset);    // turn off the bit for fd in fdset
    int  FD_ISSET(int fd, fd_set *fdset);  // is the bit for fd on in fdset?

We allocate a descriptor set of the fd_set data type and use these macros to set and test the bits in the set. We can also copy it to another descriptor set with an ordinary C assignment statement.
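A short sketch of the four macros and of copying a set by assignment. The names add_fd and fdset_demo are hypothetical, and the descriptor numbers are arbitrary values chosen for illustration (nothing is opened or read).

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical guard: FD_SET on a descriptor >= FD_SETSIZE is undefined. */
static void add_fd(int fd, fd_set *set)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_SET(fd, set);
}

/* Returns a small bit pattern recording what each FD_ISSET test saw. */
int fdset_demo(void)
{
    fd_set rset, copy;
    int result = 0;

    FD_ZERO(&rset);            /* always initialize a set before using it */
    add_fd(1, &rset);
    add_fd(4, &rset);
    add_fd(5, &rset);

    copy = rset;               /* plain C assignment copies the whole set */
    FD_CLR(4, &rset);          /* affects rset only, not the copy */

    if (FD_ISSET(1, &rset)) result |= 1;   /* still set in rset */
    if (FD_ISSET(4, &rset)) result |= 2;   /* cleared in rset */
    if (FD_ISSET(4, &copy)) result |= 4;   /* still set in the copy */
    if (FD_ISSET(2, &copy)) result |= 8;   /* never set anywhere */
    return result;             /* bits 1 and 4 are on: 5 */
}
```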

Note: the scheme discussed earlier, in which each descriptor occupies one bit in an array of integers, is only one possible way to implement select.

The maxfdp1 parameter specifies the number of descriptors to be tested. Its value is the maximum descriptor to be tested plus 1, so descriptors 0, 1, 2, and so on up to maxfdp1 - 1 are tested.

The select function modifies the descriptor sets pointed to by readset, writeset, and exceptset, so all three are value-result parameters. After the function returns, we use the FD_ISSET macro to test specific descriptors in the fd_set. Any bit that corresponds to a descriptor that is not ready is cleared to 0. For this reason, every time we call select again, we have to turn on all the bits of interest in each descriptor set once more.
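One common way to deal with the value-result behavior is to keep a master set and hand select a fresh copy on each call. The sketch below assumes exactly two descriptors and uses pipes in the self-check; drain_two and drain_two_demo are hypothetical names, and EINTR handling is omitted for brevity.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from two descriptors until both reach EOF.
 * Returns the total number of bytes read, or -1 on error.
 */
long drain_two(int fd_a, int fd_b)
{
    int fds[2] = { fd_a, fd_b };
    int maxfd = fd_a > fd_b ? fd_a : fd_b;
    int open_count = 2;
    long total = 0;
    fd_set master, ready;
    char buf[256];

    assert(fd_a >= 0 && fd_b >= 0 && fd_a != fd_b && maxfd < FD_SETSIZE);

    FD_ZERO(&master);
    FD_SET(fd_a, &master);
    FD_SET(fd_b, &master);

    while (open_count > 0) {
        ready = master;        /* select overwrites its argument: use a copy */
        if (select(maxfd + 1, &ready, NULL, NULL, NULL) < 0)
            return -1;
        for (int i = 0; i < 2; ++i) {
            if (!FD_ISSET(fds[i], &ready))
                continue;      /* bit was cleared: not ready this round */
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n < 0)
                return -1;
            if (n == 0) {      /* EOF: stop watching this descriptor */
                FD_CLR(fds[i], &master);
                --open_count;
            } else {
                total += n;
            }
        }
    }
    return total;
}

/* Self-check: two pipes holding 3 and 2 bytes, writers closed. */
long drain_two_demo(void)
{
    int a[2], b[2];
    long total;

    if (pipe(a) != 0 || pipe(b) != 0)
        return -1;
    if (write(a[1], "abc", 3) != 3 || write(b[1], "de", 2) != 2)
        return -1;
    close(a[1]);
    close(b[1]);
    total = drain_two(a[0], b[0]);
    close(a[0]);
    close(b[0]);
    return total;
}
```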

Conditions under which select reports a socket as ready

When any of the following four conditions are met, a socket is ready to read:

The number of bytes of data in the socket receive buffer is greater than or equal to the current low-water mark for the socket receive buffer. A read on such a socket will not block and will return a value greater than 0 (that is, the data that is ready to be read). We set this low-water mark with the SO_RCVLOWAT socket option. For TCP and UDP sockets, the default value is 1.

The read half of the connection is closed (that is, a TCP connection that has received a FIN). A read on such a socket will not block and will return 0 (that is, EOF).

The socket is a listening socket and the number of completed connections is not 0.

A socket error is pending. A read on such a socket will not block and will return -1 (that is, an error), with errno set to the specific error condition. These pending errors can also be fetched and cleared by calling getsockopt with the SO_ERROR socket option.
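The second condition (readable at EOF) can be checked with a small experiment. This sketch assumes a Unix-domain socketpair as a stand-in for a TCP connection whose peer has closed; readable_now and eof_is_readable_demo are hypothetical names.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is readable right now, 0 if not, -1 on error. */
int readable_now(int fd)
{
    fd_set readset;
    struct timeval zero = { 0, 0 };    /* check and return without waiting */

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readset);
    FD_SET(fd, &readset);
    return select(fd + 1, &readset, NULL, NULL, &zero);
}

/*
 * Returns 1 if the socket is not readable while idle, becomes readable
 * once the peer closes, and the following read returns 0 (EOF).
 */
int eof_is_readable_demo(void)
{
    int sv[2];
    char c;
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    ok = readable_now(sv[0]) == 0;          /* nothing to read yet */
    close(sv[1]);                           /* peer closes its end */
    ok = ok && readable_now(sv[0]) == 1;    /* reported as readable */
    ok = ok && read(sv[0], &c, 1) == 0;     /* and the read sees EOF */
    close(sv[0]);
    return ok;
}
```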

When any of the following four conditions are met, a socket is ready to write:

The number of bytes of available space in the socket send buffer is greater than or equal to the current low-water mark for the socket send buffer, and either the socket is connected or the socket does not require a connection (such as a UDP socket). This means that if we set such a socket to non-blocking, a write will not block and will return a positive value (such as the number of bytes accepted by the transport layer). We set this low-water mark with the SO_SNDLOWAT socket option. For TCP and UDP sockets, the default value is normally 2048, although this is implementation-dependent.

The write half of the connection is closed. A write on such a socket will generate a SIGPIPE signal.

A socket using a non-blocking connect has completed the connection, or the connect has failed.

A socket error is pending. A write on such a socket will not block and will return -1 (that is, an error), with errno set to the specific error condition. These pending errors can also be fetched and cleared by calling getsockopt with the SO_ERROR socket option.
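Because a socket becomes writable both when a non-blocking connect succeeds and when it fails, writability alone does not say which happened; SO_ERROR does. A minimal sketch, assuming the caller has already started a non-blocking connect: finish_connect and finish_connect_demo are hypothetical names, and the self-check uses an already connected socketpair rather than a real connect.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper for use after a non-blocking connect() has started.
 * Returns 0 if connected, a positive errno value if the connect failed,
 * and -1 on timeout or if select/getsockopt themselves fail.
 */
int finish_connect(int fd, long timeout_ms)
{
    fd_set writeset;
    struct timeval tv;
    int err = 0;
    socklen_t len = sizeof err;

    assert(fd >= 0 && fd < FD_SETSIZE && timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    FD_ZERO(&writeset);
    FD_SET(fd, &writeset);
    if (select(fd + 1, NULL, &writeset, NULL, &tv) <= 0)
        return -1;

    /* Writable means "finished", not "succeeded": SO_ERROR tells which. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;                /* reading SO_ERROR also clears it */
}

/* Self-check on a connected socketpair: writable, no pending error. */
int finish_connect_demo(void)
{
    int sv[2];
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -2;
    rc = finish_connect(sv[0], 100);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```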

If a socket has out-of-band data or is still at the out-of-band mark, it has an exception condition pending.

Note: when an error occurs on a socket, select marks it as both readable and writable.

The purpose of the receive and send low-water marks is to give the application control over how much data must be available for reading, or how much space must be available for writing, before select reports the socket as readable or writable.

Any UDP socket is always writable as long as its send low-water mark is less than or equal to its send buffer size (which should always be the default relationship), because a UDP socket does not require a connection.
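The claim about UDP sockets can be observed directly: a freshly created, unconnected UDP socket is already reported as writable. This is a sketch under the assumption that the process is allowed to create an AF_INET datagram socket; writable_now and udp_writable_demo are hypothetical names.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is writable right now, 0 if not, -1 on error. */
int writable_now(int fd)
{
    fd_set writeset;
    struct timeval zero = { 0, 0 };    /* check and return without waiting */

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&writeset);
    FD_SET(fd, &writeset);
    return select(fd + 1, NULL, &writeset, NULL, &zero);
}

/* Creates an unconnected UDP socket and asks select whether it is writable. */
int udp_writable_demo(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int rc;

    if (fd < 0)
        return -1;
    rc = writable_now(fd);     /* no connect() and no bind() were needed */
    close(fd);
    return rc;
}
```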

Poll function

Function prototype:

    #include <poll.h>

    int poll(struct pollfd *fdarray, unsigned long nfds, int timeout);

Returns: the number of ready descriptors, 0 on timeout, -1 on error.

The first parameter is a pointer to the first element of an array of structures. Each array element is a pollfd structure that specifies the conditions to be tested for a given descriptor, fd.

    struct pollfd {
        int   fd;       // descriptor to check
        short events;   // events of interest on fd
        short revents;  // events that occurred on fd
    };

The conditions to be tested are specified by the events member, and the function returns the status of the descriptor in the corresponding revents member. (Each descriptor has two variables, one for the value passed in and one for the result, which avoids the use of value-result parameters.)
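A minimal sketch of the events/revents split, using a pipe in place of a socket. watch_fd and poll_demo are hypothetical names; the point is that the same pollfd can be passed to poll again without being rebuilt, because the kernel writes only to revents.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: build a pollfd entry. poll silently ignores negative
   descriptors, so this sketch rejects them up front. */
static struct pollfd watch_fd(int fd, short events)
{
    struct pollfd pfd;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = events;       /* input: what we ask about */
    pfd.revents = 0;           /* output: filled in by poll */
    return pfd;
}

/* Returns 1 if poll behaves as described on a pipe. */
int poll_demo(void)
{
    int p[2];
    int ok;
    struct pollfd pfd;

    if (pipe(p) != 0)
        return 0;
    pfd = watch_fd(p[0], POLLIN);

    ok = poll(&pfd, 1, 0) == 0 && pfd.revents == 0;   /* nothing to read yet */
    ok = ok && write(p[1], "x", 1) == 1;
    /* events was left untouched by the first call, so pfd is reused as-is */
    ok = ok && poll(&pfd, 1, -1) == 1 && (pfd.revents & POLLIN) != 0;
    ok = ok && pfd.events == POLLIN;
    close(p[0]);
    close(p[1]);
    return ok;
}
```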

Poll event

Epoll function

epoll is an I/O multiplexing mechanism specific to Linux. It differs considerably from select and poll in both implementation and use.

First, epoll uses a set of functions to accomplish the task, rather than a single function.

Second, epoll places events on file descriptors that users care about in an event table in the kernel, eliminating the need to repeatedly pass in file descriptor sets or event sets for each call, as select and poll do.

However, epoll needs an additional file descriptor to uniquely identify this event table in the kernel.

The epoll file descriptor is created as follows:

    #include <sys/epoll.h>

    int epoll_create(int size);

The size parameter no longer has any real effect; it only gives the kernel a hint about how large the event table needs to be (since Linux 2.6.8 it is ignored, but it must still be greater than 0). The file descriptor returned by this function is used as the first parameter of all other epoll system calls to specify which kernel event table to access.

The following function is used to manipulate epoll's kernel event table:

    #include <sys/epoll.h>

    int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

Returns: 0 on success; -1 on failure, with errno set.

The fd parameter is the file descriptor to operate on, and the op parameter specifies the type of operation. There are three types of operations:

EPOLL_CTL_ADD: register events for fd in the event table

EPOLL_CTL_MOD: modify the events registered for fd

EPOLL_CTL_DEL: delete the events registered for fd

The event parameter specifies the events and is a pointer to an epoll_event structure, which is defined as follows:

    struct epoll_event {
        __uint32_t   events;  // epoll events
        epoll_data_t data;    // user data
    };

The events member describes the event types. The event types supported by epoll are basically the same as those of poll. The macro for an epoll event type is the corresponding poll macro prefixed with "E"; for example, the data-readable event of epoll is EPOLLIN.

epoll also has two additional event types, EPOLLET and EPOLLONESHOT. They are critical to the efficient operation of epoll.

The data member stores user data and is a union:

    typedef union epoll_data {
        void    *ptr;
        int      fd;
        uint32_t u32;
        uint64_t u64;
    } epoll_data_t;

The most frequently used of these four members is fd, which specifies the target file descriptor to which the event belongs.
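The three operations and the data.fd member can be sketched together. This assumes Linux; epoll_update and epoll_ctl_demo are hypothetical names, and a pipe stands in for a socket. The self-check also shows two failure cases documented for epoll_ctl: adding a descriptor twice (EEXIST) and deleting one that is not registered (ENOENT).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical wrapper: apply op (EPOLL_CTL_ADD/MOD/DEL) to fd. */
int epoll_update(int epfd, int op, int fd, uint32_t events)
{
    struct epoll_event ev;

    assert(epfd >= 0 && fd >= 0);
    ev.events = events;
    ev.data.fd = fd;           /* data is a union: store fd or ptr, not both */
    return epoll_ctl(epfd, op, fd, &ev);
}

/* Returns how many of the five steps behaved as expected, or -1 on setup failure. */
int epoll_ctl_demo(void)
{
    int p[2];
    int passed = 0;
    int epfd = epoll_create(1);    /* size is only a hint but must be > 0 */

    if (epfd < 0)
        return -1;
    if (pipe(p) != 0) {
        close(epfd);
        return -1;
    }
    passed += (epoll_update(epfd, EPOLL_CTL_ADD, p[0], EPOLLIN) == 0);
    passed += (epoll_update(epfd, EPOLL_CTL_ADD, p[0], EPOLLIN) == -1
               && errno == EEXIST);            /* already registered */
    passed += (epoll_update(epfd, EPOLL_CTL_MOD, p[0], EPOLLIN | EPOLLET) == 0);
    passed += (epoll_update(epfd, EPOLL_CTL_DEL, p[0], 0) == 0);
    passed += (epoll_update(epfd, EPOLL_CTL_DEL, p[0], 0) == -1
               && errno == ENOENT);            /* no longer registered */
    close(p[0]);
    close(p[1]);
    close(epfd);
    return passed;
}
```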

The main interface of the epoll family of system calls is the epoll_wait function, which waits up to a timeout period for events on a set of file descriptors. The prototype is as follows:

    #include <sys/epoll.h>

    int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Returns: the number of ready file descriptors on success; -1 on failure, with errno set.

The maxevents parameter specifies the maximum number of events that can be returned in one call, and it must be greater than 0.

If epoll_wait detects events, it copies all ready events from the kernel event table (specified by the epfd parameter) into the array pointed to by its second parameter, events. This array is used only to output the ready events detected by epoll_wait, unlike the array parameters of select and poll, which are used both to pass in the events registered by the user and to output the ready events detected by the kernel. This greatly improves the efficiency with which the application can find the ready file descriptors.

The following code shows the difference in usage between poll and epoll:

    // How to find the ready file descriptors returned by poll.
    // fds is assumed to hold MAX_EVENT_NUMBER registered pollfd entries.
    int ret = poll(fds, MAX_EVENT_NUMBER, -1);
    // We must traverse all registered file descriptors to find the ready ones.
    for (int i = 0; i < MAX_EVENT_NUMBER; ++i) {
        if (fds[i].revents & POLLIN) {   // is descriptor i ready?
            int sockfd = fds[i].fd;
            // handle sockfd
        }
    }

    // How to find the ready file descriptors returned by epoll_wait.
    int ret = epoll_wait(epollfd, events, MAX_EVENT_NUMBER, -1);
    // Only the ret ready entries need to be traversed.
    for (int i = 0; i < ret; ++i) {
        int sockfd = events[i].data.fd;
        // sockfd is definitely ready: handle it directly
    }
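For reference, the epoll half of that comparison can be turned into a complete function. This is a sketch assuming Linux, with pipes standing in for sockets; first_ready and first_ready_demo are hypothetical names, and MAX_EVENT_NUMBER is given an arbitrary small value here.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENT_NUMBER 8     /* arbitrary size chosen for this sketch */

/*
 * Registers both descriptors for EPOLLIN, waits once, and returns the
 * descriptor of the first ready event (-1 on error or timeout).
 */
int first_ready(int fd_a, int fd_b, int timeout_ms)
{
    struct epoll_event ev, events[MAX_EVENT_NUMBER];
    int fds[2] = { fd_a, fd_b };
    int result = -1;
    int ret;
    int epollfd;

    assert(fd_a >= 0 && fd_b >= 0);
    epollfd = epoll_create(1);
    if (epollfd < 0)
        return -1;
    for (int i = 0; i < 2; ++i) {
        ev.events = EPOLLIN;
        ev.data.fd = fds[i];
        if (epoll_ctl(epollfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epollfd);
            return -1;
        }
    }
    ret = epoll_wait(epollfd, events, MAX_EVENT_NUMBER, timeout_ms);
    /* Only the first ret entries are valid, and every one of them is ready. */
    for (int i = 0; i < ret; ++i) {
        if (events[i].events & EPOLLIN) {
            result = events[i].data.fd;
            break;
        }
    }
    close(epollfd);
    return result;
}

/* Returns 1 if only the pipe that received data is reported. */
int first_ready_demo(void)
{
    int a[2], b[2];
    int ok;

    if (pipe(a) != 0 || pipe(b) != 0)
        return 0;
    ok = first_ready(a[0], b[0], 0) == -1;          /* neither has data */
    ok = ok && write(b[1], "x", 1) == 1;
    ok = ok && first_ready(a[0], b[0], 1000) == b[0];
    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return ok;
}
```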

LT and ET mode

LT (Level Trigger) mode: this is the default working mode, in which epoll behaves like a more efficient poll. For a file descriptor in LT mode, when epoll_wait detects an event on it and notifies the application, the application does not have to handle the event immediately. The next time the application calls epoll_wait, the event is reported to the application again, until it is handled.

ET (Edge Trigger) mode: for a file descriptor in ET mode, when epoll_wait detects an event on it and notifies the application, the application must handle the event immediately, because subsequent epoll_wait calls will not report that event to the application again.

ET mode greatly reduces the number of times the same epoll event is reported repeatedly, so it can be more efficient than LT mode.
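The difference between the two modes can be observed by calling epoll_wait twice without consuming the data in between. A minimal sketch assuming Linux, with a pipe in place of a socket; second_wait_result is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Registers the read end of a pipe with EPOLLIN | mode, writes one byte,
 * and calls epoll_wait twice without reading the byte.
 * mode is 0 for LT or EPOLLET for ET.
 * Returns the result of the second epoll_wait, or -1 on error.
 */
int second_wait_result(uint32_t mode)
{
    struct epoll_event ev, out;
    int p[2];
    int second = -1;
    int epfd;

    assert(mode == 0 || mode == EPOLLET);
    epfd = epoll_create(1);
    if (epfd < 0)
        return -1;
    if (pipe(p) != 0) {
        close(epfd);
        return -1;
    }

    ev.events = EPOLLIN | mode;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1 &&
        epoll_wait(epfd, &out, 1, 0) == 1) {
        /* The byte is still unread when we ask again. */
        second = epoll_wait(epfd, &out, 1, 0);
    }
    close(p[0]);
    close(p[1]);
    close(epfd);
    return second;
}
```

With mode 0 (LT) the second call reports the unread data again; with EPOLLET it reports nothing, which is why an ET handler has to consume everything when it is notified.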

Every file descriptor that uses ET mode should be non-blocking. If the file descriptor is blocking, a read or write operation can stay blocked forever (starve), because no subsequent event will arrive to wake it up.
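In practice, an ET handler usually switches the descriptor to non-blocking mode and reads until the kernel reports that nothing is left. The sketch below shows only that part; set_nonblocking, drain, and et_drain_demo are hypothetical names, and a pipe is used in the self-check.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * After an edge-triggered EPOLLIN, read until nothing is left
 * (EAGAIN/EWOULDBLOCK) or EOF is reached.
 * Returns the number of bytes consumed, or -1 on a real error.
 */
long drain(int fd)
{
    char buf[1024];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;        /* a real handler would process buf here */
            continue;
        }
        if (n == 0)
            break;             /* EOF */
        if (errno == EINTR)
            continue;          /* interrupted: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;             /* drained: wait for the next edge */
        return -1;
    }
    return total;
}

/* Self-check: 5000 bytes written to a pipe are consumed in several reads. */
long et_drain_demo(void)
{
    int p[2];
    char chunk[5000] = { 0 };
    long total;

    if (pipe(p) != 0)
        return -1;
    if (set_nonblocking(p[0]) != 0 ||
        write(p[1], chunk, sizeof chunk) != (ssize_t)sizeof chunk) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    total = drain(p[0]);       /* returns instead of blocking once empty */
    close(p[0]);
    close(p[1]);
    return total;
}
```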

EPOLLONESHOT event

Even with ET mode, an event on a socket can still be triggered multiple times. This causes a problem in concurrent programs. For example, after reading the data on a socket, a thread (or process) begins to process the data, and in the process of data processing, there is new data to read on the socket (EPOLLIN is triggered again), and another thread is awakened to read the new data. So there is a scene where two threads operate on a socket at the same time. This is certainly not what we expected. What we expect is that a socket connection is processed by only one thread at any one time.

For a file descriptor registered with the EPOLLONESHOT event, the operating system triggers at most one of the readable, writable, or exception events registered on it, and only once, unless we use the epoll_ctl function to reset the EPOLLONESHOT event on that file descriptor. In this way, while one thread is handling a socket, no other thread has a chance to operate on it. On the other hand, once a thread has finished processing a socket registered with EPOLLONESHOT, it should immediately reset the EPOLLONESHOT event on that socket, so that the next time the socket becomes readable its EPOLLIN event can be triggered and other worker threads get a chance to continue processing the socket.
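The reset step can be sketched with EPOLL_CTL_MOD. This assumes Linux and, to keep the example small, registers EPOLLIN | EPOLLONESHOT without EPOLLET and runs in a single thread; rearm_oneshot and oneshot_demo are hypothetical names, and a pipe stands in for a socket.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Re-enable a one-shot registration so the next event on fd is reported. */
int rearm_oneshot(int epfd, int fd)
{
    struct epoll_event ev;

    assert(epfd >= 0 && fd >= 0);
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/*
 * Returns a three-digit number abc, each digit an epoll_wait result:
 *   a: after the first write, b: after a second write without re-arming,
 *   c: after re-arming. Returns -1 on setup failure.
 */
int oneshot_demo(void)
{
    struct epoll_event ev, out;
    int p[2];
    int a, b, c;
    int result = -1;
    int epfd = epoll_create(1);

    if (epfd < 0)
        return -1;
    if (pipe(p) != 0) {
        close(epfd);
        return -1;
    }
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        a = epoll_wait(epfd, &out, 1, 0);       /* reported once */
        if (write(p[1], "y", 1) == 1) {
            b = epoll_wait(epfd, &out, 1, 0);   /* disabled: not reported */
            if (rearm_oneshot(epfd, p[0]) == 0) {
                c = epoll_wait(epfd, &out, 1, 0);   /* reported again */
                result = a * 100 + b * 10 + c;
            }
        }
    }
    close(p[0]);
    close(p[1]);
    close(epfd);
    return result;
}
```

In a multithreaded server, the worker that finishes with a socket would call a function like rearm_oneshot as its last step, so that exactly one thread owns the socket between notifications.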
