2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to understand the IO models of the Linux operating system: the basic concepts, the process states involved, the main IO models, and the differences between select, poll and epoll.
IO
IO (Input/Output) is the reading (receiving) or writing (sending) of data. In a user process, a complete IO operation usually has two stages: one between user process space and kernel space, and one between kernel space and device space (disk, network, etc.). There are three kinds of IO: memory IO, network IO and disk IO; when we say IO we usually mean the latter two.
A process in Linux cannot operate on an I/O device directly. It must issue a system call asking the kernel to carry out the I/O on its behalf; the kernel maintains a buffer for each I/O device.
For an input operation, after the process makes the IO system call, the kernel first checks whether the buffer already holds the requested data. If it does, the data is copied directly to the process space. If not, the kernel reads from the device, and because device IO is generally slow, the process has to wait.
Therefore, for a network input operation, there are usually two different phases:
(1) Wait for the network data to arrive at the NIC and be read into the kernel buffer; at that point the data is ready.
(2) Copy the data from the kernel buffer to the process space.
Key concept understanding
Synchronous: the caller makes a call, and the call returns only once the result is available.
Asynchronous: the call returns immediately after it is initiated, without the result; the caller later asks the callee for the result, or the callee delivers it through a callback function.
Blocking: the current thread is suspended until the result of the call is returned. The calling thread does not continue until it has the result.
Non-blocking: the call does not block the current thread even when the result is not immediately available; it returns at once.
Synchronous calls can be either blocking or non-blocking.
Blocking and non-blocking describe how the caller waits for the result (blocking: I'm not leaving until I get the result).
Process states
Understanding process state transitions
Ready state -> running state: when a process in the ready state is scheduled, it is given the CPU (a CPU time slice is dispatched to it), so it changes from the ready state to the running state.
Running state -> ready state: a running process has to give up the CPU when its time slice runs out, so it changes from the running state to the ready state. In addition, in a preemptive operating system, when a higher-priority process becomes ready, the scheduler moves the executing process to the ready state so that the higher-priority process can run.
Running state -> blocked state: when a process requests the use or allocation of a resource (such as a peripheral) or waits for an event (such as the completion of an I/O operation), it moves from the running state to the blocked state. The process asks the operating system for the service through a system call, which is a special kind of call in which a running user-mode program invokes kernel code of the operating system.
Blocked state -> ready state: when the event the process is waiting for occurs, such as the end of an I/O operation or the completion of an interrupt, the interrupt handler must change the state of the corresponding process from blocked to ready.
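The four transitions above can be written down as a small lookup table. The sketch below is only an illustration of that model; the names State, TRANSITIONS, can_transition and first_illegal_step are made up for this example and do not correspond to any kernel API.

```python
from enum import Enum


class State(Enum):
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"


# The four transitions listed in the text, with the reason for each one.
TRANSITIONS = {
    (State.READY, State.RUNNING): "scheduled and given a CPU time slice",
    (State.RUNNING, State.READY): "time slice used up, or preempted",
    (State.RUNNING, State.BLOCKED): "waits for a resource or an I/O event",
    (State.BLOCKED, State.READY): "the awaited event has occurred",
}


def can_transition(src, dst):
    """Return True if the model allows moving from src to dst."""
    return (src, dst) in TRANSITIONS


def first_illegal_step(path):
    """Return the first (src, dst) pair in path that is not allowed, or None."""
    for src, dst in zip(path, path[1:]):
        if not can_transition(src, dst):
            return (src, dst)
    return None


if __name__ == "__main__":
    lifecycle = [State.READY, State.RUNNING, State.BLOCKED, State.READY]
    print(first_illegal_step(lifecycle))
```

Note that the table has no entry from blocked straight to running: a process whose I/O has finished must first go back to the ready state and wait to be scheduled.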
Understanding the IO models from the operating system level
Blocking IO model:
Summary: the process blocks until the data has been copied. The application calls an IO function, which makes the application block and wait for the data to be ready; if the data is not ready, it keeps waiting. Once the data is ready and has been copied from the kernel to user space, the IO function returns a success indication. Most of us first meet network programming through interfaces such as listen(), send() and recv(), with which a server / client model is easy to build.
Blocking I/O model diagram: how the kernel waits for and copies data when recv() / recvfrom() is called.
When recv() is called, the system first checks whether any data is ready. If not, the call waits. When the data is ready, it is copied from the system buffer to user space and the function returns. In a socket application, the data may not yet be available when recv() is called, so recv() stays in a waiting state.
Blocking mode creates a big problem for network programming: while a call such as send() is blocked, the thread cannot perform any other operation or respond to any other network request. This makes it hard to write network programs that serve multiple clients or carry multiple pieces of business logic. One common answer is multithreading.
The simplest solution for multi-client network applications is to use multiple threads (or multiple processes) on the server side, with a separate thread (or process) for each connection, so that blocking on one connection does not affect the others.
There is no fixed rule for choosing between multiple processes and multiple threads. Traditionally, processes are much more expensive than threads, so multiple processes are not recommended when many clients must be served at the same time; if a single service task consumes a lot of CPU, for example large-scale or long-running computation or file access, processes are the safer choice.
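The thread-per-connection approach can be sketched with Python's standard socket and threading modules. This is a minimal illustration, not a production server: the echo behaviour, the function names handle_client, serve and demo_threads, and the loopback address are all assumptions made for the example.

```python
import socket
import threading


def handle_client(conn):
    """Serve one connection; recv() blocks only the thread that calls it."""
    with conn:
        while True:
            data = conn.recv(1024)      # blocks until data arrives or the peer closes
            if not data:                # empty bytes: the peer closed the connection
                break
            conn.sendall(data)          # echo the data back


def serve(listener, max_clients):
    """Accept max_clients connections and give each one its own thread."""
    threads = []
    for _ in range(max_clients):
        conn, _addr = listener.accept()  # blocks until a client connects
        t = threading.Thread(target=handle_client, args=(conn,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()


def demo_threads():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, 2))
    server.start()

    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(2)]
    replies = []
    # The first client stays silent, yet the second one is still served.
    clients[1].sendall(b"second")
    replies.append(clients[1].recv(1024))
    clients[0].sendall(b"first")
    replies.append(clients[0].recv(1024))
    for c in clients:
        c.close()
    server.join()
    listener.close()
    return replies


if __name__ == "__main__":
    print(demo_threads())
```

The idle first connection keeps its own thread parked in recv(), which is exactly the cost of this model: one blocked thread per connection, whether or not it has work to do.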
Non-blocking IO model
Introduction: with non-blocking IO, the process calls the IO function repeatedly (many system calls, each returning immediately); the process is still blocked while the data is being copied. Setting a socket to non-blocking tells the kernel: when the requested I/O operation cannot be completed, do not put the process to sleep, return an error instead. The I/O function is then called again and again to test whether the data is ready, until it is. This continuous testing consumes a lot of CPU time.
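A rough sketch of this polling loop in Python follows. The socket pair, the 50 ms delay before data is sent, and the names poll_until_ready and demo_nonblocking are assumptions chosen for illustration; BlockingIOError is how Python reports the EAGAIN / EWOULDBLOCK error returned by the kernel.

```python
import socket
import threading
import time


def poll_until_ready(sock, timeout=5.0):
    """Call recv() repeatedly on a non-blocking socket until data is available.

    Returns (data, misses), where misses counts the calls that found nothing.
    """
    sock.setblocking(False)              # ask the kernel to return an error, not sleep
    deadline = time.monotonic() + timeout
    misses = 0
    while time.monotonic() < deadline:
        try:
            return sock.recv(1024), misses
        except BlockingIOError:          # data not ready yet: test again
            misses += 1
    raise TimeoutError("no data arrived before the deadline")


def demo_nonblocking():
    a, b = socket.socketpair()
    # The peer sends its data a little later, from a timer thread.
    timer = threading.Timer(0.05, b.sendall, args=(b"ping",))
    timer.start()
    data, misses = poll_until_ready(a)
    timer.join()
    a.close()
    b.close()
    return data, misses


if __name__ == "__main__":
    print(demo_nonblocking())
```

The misses counter makes the cost visible: every failed recv() is a system call that did no useful work, which is where the CPU time goes.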
IO multiplexing model:
Introduction: IO multiplexing is what we mean by select, poll and epoll; some sources also call it event-driven IO. Its advantage is that a single process can handle the IO of many network connections at the same time. The basic principle is that the select, poll or epoll function keeps polling all the sockets it is responsible for and notifies the user process when data arrives on any of them.
When the user process calls select, the whole process blocks. Meanwhile the kernel monitors all the sockets that select is responsible for, and select returns as soon as the data in any of them is ready. The user process then issues the read operation to copy the data from the kernel to the user process.
The characteristic of I/O multiplexing is therefore that one process can wait on multiple file descriptors at once through a single mechanism: as soon as any of these file descriptors (socket descriptors) becomes ready for reading, the select() function returns.
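As a minimal sketch of one process waiting on several descriptors, the example below uses select.select from Python's standard library, which wraps the select() system call. The three socket pairs and the names wait_for_readable and demo_select are assumptions made for the illustration.

```python
import select
import socket


def wait_for_readable(socks, timeout=1.0):
    """Block until at least one socket is readable, or the timeout expires."""
    readable, _writable, _exceptional = select.select(socks, [], [], timeout)
    return readable


def demo_select():
    # Three independent connections, all watched by a single thread.
    pairs = [socket.socketpair() for _ in range(3)]
    receivers = [receiver for receiver, _sender in pairs]

    pairs[1][1].sendall(b"hello")        # only connection 1 receives data

    ready = wait_for_readable(receivers)
    # The read itself happens after select returns: wait first, copy second.
    result = [(receivers.index(s), s.recv(1024)) for s in ready]

    for receiver, sender in pairs:
        receiver.close()
        sender.close()
    return result


if __name__ == "__main__":
    print(demo_select())
```

The call blocks once for all three sockets, and the following recv() does not have to wait because select has already reported that the data is ready.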
Asynchronous IO model
Summary: after the user process initiates the read operation, it can immediately start doing other things. From the kernel's point of view, when it receives an asynchronous read it returns at once, so the user process is never blocked. The kernel then waits for the data to be ready and copies it to the user's memory; when all of this is done, the kernel sends a signal to the user process to tell it that the read operation is complete.
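Python's standard library does not expose kernel asynchronous IO directly, so the sketch below only imitates the programming model from the caller's side: start the read, carry on with other work, and be notified on completion. It uses asyncio, which is itself built on IO multiplexing rather than on kernel asynchronous IO; that substitution, the socket pair and the name demo_async are all assumptions of this example.

```python
import asyncio
import socket


async def demo_async():
    a, b = socket.socketpair()
    reader, writer = await asyncio.open_connection(sock=a)
    log = []

    # Initiate the read. create_task returns at once; nothing is waited for here.
    read_task = asyncio.create_task(reader.read(1024))
    # Completion notification, playing the role of the signal in the text.
    read_task.add_done_callback(lambda task: log.append("read complete"))

    log.append("doing other work")       # the caller is free in the meantime
    await asyncio.sleep(0)               # let the event loop start the read

    b.sendall(b"payload")                # the data arrives later
    data = await read_task               # collect the finished result

    writer.close()
    await writer.wait_closed()
    b.close()
    return log, data


if __name__ == "__main__":
    print(asyncio.run(demo_async()))
```

The point to compare with the earlier models is the order of events: the caller records its other work before the read has produced anything, and learns about completion through the callback rather than by testing the socket.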
Distinguishing select, poll and epoll in IO multiplexing
select
int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
The file descriptors monitored by select fall into three categories: readfds, writefds and exceptfds. After the call, select blocks until a descriptor is ready (readable, writable, or in an exceptional condition) or the timeout expires, and then returns. timeout specifies how long to wait: a timeval of zero makes select return immediately, and NULL makes it wait indefinitely. When select returns, the ready descriptors are found by traversing the fd_set.
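The three descriptor sets and the timeout can be tried out with Python's select.select, whose four arguments mirror readfds, writefds, exceptfds and timeout (a timeout of 0 returns immediately, None waits indefinitely). This is a sketch under those assumptions; the names snapshot and demo_select_sets are invented for the example.

```python
import select
import socket


def snapshot(sock):
    """Check one socket in all three sets without waiting (timeout 0)."""
    readable, writable, exceptional = select.select([sock], [sock], [sock], 0)
    return {
        "readable": bool(readable),
        "writable": bool(writable),
        "exceptional": bool(exceptional),
    }


def demo_select_sets():
    a, b = socket.socketpair()
    before = snapshot(a)                 # nothing has been sent to a yet
    b.sendall(b"x")
    after = snapshot(a)                  # one byte is now waiting in a's buffer
    a.close()
    b.close()
    return before, after


if __name__ == "__main__":
    print(demo_select_sets())
```

A fresh socket is writable from the start because its send buffer is empty, and it becomes readable only once the peer has sent something.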
poll
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
Unlike select, which uses three bitmaps to represent three fd_sets, poll takes a pointer to an array of pollfd structures. The number of pollfd entries has no fixed maximum (although performance degrades when there are too many). As with select, after poll returns you have to traverse the pollfd array to find the ready descriptors.
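A small sketch of the same wait using poll, through Python's select.poll object (available on Linux and most Unix systems, not on Windows). Each register() call corresponds to one pollfd entry; the socket pairs and the names poll_ready and demo_poll are assumptions for the illustration.

```python
import select
import socket


def poll_ready(socks, timeout_ms=1000):
    """Return the indexes of the sockets that poll reports as readable."""
    poller = select.poll()
    index_by_fd = {}
    for i, s in enumerate(socks):
        poller.register(s.fileno(), select.POLLIN)   # one pollfd entry per socket
        index_by_fd[s.fileno()] = i
    events = poller.poll(timeout_ms)                 # list of (fd, event mask) pairs
    return sorted(index_by_fd[fd] for fd, mask in events if mask & select.POLLIN)


def demo_poll():
    pairs = [socket.socketpair() for _ in range(3)]
    receivers = [receiver for receiver, _sender in pairs]
    pairs[2][1].sendall(b"data")         # only connection 2 receives data
    ready = poll_ready(receivers)
    for receiver, sender in pairs:
        receiver.close()
        sender.close()
    return ready


if __name__ == "__main__":
    print(demo_poll())
```

Note that the timeout passed to poll is in milliseconds, unlike the seconds used by select.select.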
epoll
epoll calls epoll_create to create an instance, epoll_ctl to add or delete monitored file descriptors, and epoll_wait to block until a file descriptor is ready; the ready file descriptors and their events are returned through the epoll_event parameter.
The epoll workflow uses three APIs:
int epoll_create(int size);
Creates an epoll handle. size tells the kernel roughly how many descriptors will be monitored (since Linux 2.6.8 the value is ignored, but it must be greater than zero). The call returns an epoll-specific file descriptor; in effect it asks the kernel for space in which to record whether, and which, events occur on the socket fds you care about.
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Controls the events on an epoll file descriptor: registration, modification and deletion. The parameter epfd is the epoll-specific file descriptor created by epoll_create().
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for I/O events to occur and returns the number of events that occurred. Parameters:
epfd: the epoll-specific file descriptor generated by epoll_create()
events: the array that receives the events to be processed
maxevents: the maximum number of events that can be returned per call
timeout: how long to wait for an I/O event, in milliseconds
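The create / ctl / wait sequence can be sketched with Python's select.epoll, which is available on Linux only. In that wrapper, select.epoll() corresponds to epoll_create, register() and unregister() to epoll_ctl, and poll() to epoll_wait (with the timeout given in seconds rather than milliseconds). The socket pairs and the names collect_ready and demo_epoll are assumptions for the illustration.

```python
import select
import socket


def collect_ready(socks, timeout=1.0):
    """Wait once with epoll and read from every socket that became readable.

    Returns a dict mapping the socket's index in socks to the bytes received.
    """
    ep = select.epoll()                               # epoll_create
    index_by_fd = {s.fileno(): i for i, s in enumerate(socks)}
    try:
        for fd in index_by_fd:
            ep.register(fd, select.EPOLLIN)           # epoll_ctl with EPOLL_CTL_ADD
        # epoll_wait: only the ready descriptors come back, as (fd, event mask).
        events = ep.poll(timeout, maxevents=max(1, len(socks)))
        result = {}
        for fd, mask in events:
            if mask & select.EPOLLIN:
                i = index_by_fd[fd]
                result[i] = socks[i].recv(1024)
            ep.unregister(fd)                         # epoll_ctl with EPOLL_CTL_DEL
        return result
    finally:
        ep.close()


def demo_epoll():
    pairs = [socket.socketpair() for _ in range(3)]
    receivers = [receiver for receiver, _sender in pairs]
    pairs[0][1].sendall(b"a")
    pairs[2][1].sendall(b"c")            # connection 1 stays silent
    result = collect_ready(receivers)
    for receiver, sender in pairs:
        receiver.close()
        sender.close()
    return result


if __name__ == "__main__":
    print(demo_epoll())
```

Unlike the select and poll versions, the caller never scans the silent connection: the list returned by the wait contains only descriptors that are ready.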
Summary of differences
(1) With select and poll, the implementation has to poll the entire fd collection by itself until a device is ready, possibly alternating between sleeping and waking several times. epoll also has to call epoll_wait to check the ready list, and may likewise sleep and wake several times, but when a device becomes ready the kernel calls a callback function that puts the ready fd into the ready list and wakes the process sleeping in epoll_wait. Both alternate between sleeping and waking, but select and poll traverse the whole fd collection each time they wake, whereas epoll only has to check whether the ready list is empty, which saves a lot of CPU time. This is the performance gain brought by the callback mechanism.
(2) Every call to select or poll copies the fd collection from user mode to kernel mode. epoll registers each fd once through epoll_ctl and keeps the monitored set inside the kernel, so epoll_wait does not copy the whole collection again; only the ready events are copied out to user space. (It is often said that epoll avoids copying by mapping kernel space and user space onto the same memory with mmap, but the Linux implementation does not do this.)
Application example
Tornado:
Uses a single-threaded approach, which avoids the performance overhead of thread switching and the thread-safety problems of some function interfaces.
Supports an asynchronous, non-blocking network IO model, which keeps the main process from blocking and waiting.
Tornado's IOLoop module is the core of its asynchronous mechanism. It holds a set of open file descriptors and a handler for each descriptor; underneath, it is a wrapper around select, poll, epoll and so on (so it is essentially IO multiplexing).
Django
In its traditional WSGI deployment, Django does not use asynchrony; concurrency comes from running a multi-process WSGI server (such as uWSGI), which is the common practice with WSGI.
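The idea of "a set of file descriptors with a handler for each" can be sketched with Python's standard selectors module, which picks epoll on Linux. The MiniLoop class below is a hypothetical, much-simplified stand-in written for this illustration; it is not Tornado's code and does not reproduce Tornado's API.

```python
import selectors
import socket


class MiniLoop:
    """Hypothetical single-threaded event loop: one handler per registered socket."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()   # epoll on Linux
        self._running = False

    def add_handler(self, sock, handler):
        # The handler is stored as the data attached to the registration.
        self._selector.register(sock, selectors.EVENT_READ, handler)

    def remove_handler(self, sock):
        self._selector.unregister(sock)

    def stop(self):
        self._running = False

    def start(self):
        self._running = True
        # Run until stopped or until no descriptors are left to watch.
        while self._running and self._selector.get_map():
            for key, _events in self._selector.select(timeout=1.0):
                key.data(key.fileobj)                  # dispatch to the stored handler

    def close(self):
        self._selector.close()


def demo_loop():
    loop = MiniLoop()
    a, b = socket.socketpair()
    received = []

    def on_readable(sock):
        received.append(sock.recv(1024))
        loop.remove_handler(sock)
        loop.stop()

    loop.add_handler(a, on_readable)
    b.sendall(b"event")
    loop.start()
    loop.close()
    a.close()
    b.close()
    return received


if __name__ == "__main__":
    print(demo_loop())
```

Everything runs on one thread: the only place the program waits is the select call inside start(), and handlers are expected to return quickly so that other descriptors are not starved.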
What should Linux backend server developers know about IO?
Network IO is the blood vessel of network communication, and data is the blood: the blood cannot flow without the vessels.
That is how to understand the IO models of the Linux operating system. If you have had similar questions, the analysis above may help clear them up.
© 2024 shulou.com SLNews company. All rights reserved.