2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article introduces the underlying principles of Java NIO. It first covers how IO reads and writes work at the operating system level, then walks through the four main IO models and explains which of them Java NIO is built on.
I. The principle of Java IO reading and writing
Whether it is reading and writing a socket or a file, and whether it is application development at the Java level or low-level development on a Linux system, it all comes down to processing input and output, referred to as IO reads and writes. The principle and the processing flow are the same; only the parameters differ.
To read and write IO, a user program basically relies on two system calls, read and write. Different operating systems may name them differently, but their function is the same.
First, one basic point: the read system call does not read data directly from the physical device into memory, and the write system call does not write data directly to the physical device.
The read system call copies data from the kernel buffer to the process buffer, while the write system call copies data from the process buffer to the kernel buffer. Neither call is responsible for exchanging data between the kernel buffer and the device (disk or network card). That underlying exchange is done by the operating system kernel.
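The point about write and read only moving data to and from the kernel buffer can be sketched in Java as follows. This is a minimal illustration, not a measurement: the class name KernelBufferDemo and the method writeThenRead are made up for this example, and the call to FileChannel.force is included only to show that flushing to the device is a separate, explicit request.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: names are illustrative only.
final class KernelBufferDemo {

    // Writes text to a temporary file, asks the kernel to flush it, then reads it back.
    static String writeThenRead(String text) throws IOException {
        Path file = Files.createTempFile("kernel-buffer-demo", ".txt");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer out = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            // write: copies bytes from the process buffer towards the kernel buffer.
            while (out.hasRemaining()) {
                ch.write(out);
            }
            // force: a separate request asking the kernel to flush its data to the device.
            ch.force(true);

            // read: copies bytes from the kernel buffer into a process buffer.
            ByteBuffer in = ByteBuffer.allocate(out.capacity());
            ch.position(0);
            while (in.hasRemaining() && ch.read(in) != -1) {
                // keep reading until the buffer is full or end of file is reached
            }
            in.flip();
            return StandardCharsets.UTF_8.decode(in).toString();
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThenRead("hello kernel buffer"));
    }
}
```

The data would read back correctly even without the force call, because the read is served from the kernel's buffer rather than from the device.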
1. Kernel buffer and process buffer
The purpose of a buffer is to reduce the number of expensive system IO calls. A system call has to save the data and state of the running process before it enters the kernel and restore them when the call returns. Buffers exist to reduce this time and performance cost.
With buffers in place, read copies data from the kernel buffer to the process buffer, and write copies data from the process buffer to the kernel buffer. The kernel waits until a buffer holds a certain amount of data before performing the actual device IO, which improves performance. When to read from or write to the device is decided by the kernel, and the user program does not need to care.
In Linux, the kernel has its own buffer, called the kernel buffer. Each process also has its own independent buffer, called the process buffer.
Therefore, in most cases the IO reads and writes of a user program do not perform actual device IO; they only read and write the program's own process buffer, which is exchanged with the kernel buffer.
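The effect of a process-side buffer can be illustrated with java.io.BufferedOutputStream. The sketch below is only an analogy: CountingSink is a hypothetical stand-in for the boundary that each real write would have to cross, and the names ProcessBufferDemo and callsReachingSink are invented for this example.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class: names are illustrative only.
final class ProcessBufferDemo {

    // Stand-in for the expensive side of a write: counts how often it is actually called.
    static final class CountingSink extends OutputStream {
        int calls;
        long bytes;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    // Writes `total` single bytes through a buffer of `bufferSize` bytes and
    // returns how many write calls reached the underlying sink.
    static int callsReachingSink(int total, int bufferSize) throws IOException {
        CountingSink sink = new CountingSink();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < total; i++) {
                out.write(i); // lands in the in-process buffer until it fills up
            }
        } // close() flushes whatever is still buffered
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("calls reaching the sink: " + callsReachingSink(4096, 1024));
    }
}
```

With 4096 one-byte writes and a 1024-byte buffer, only a handful of calls reach the sink instead of 4096, which is the saving the buffer is meant to provide.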
2. The underlying process of Java IO read and write
As described above, read copies data from the kernel buffer to the process buffer, and write copies data from the process buffer to the kernel buffer. Neither is equivalent to exchanging data between the kernel buffer and the disk or network card.
First, look at the typical process of a Java server handling a network request:
(1) Client request: Linux reads the request data sent by the client through the network card and places it in the kernel buffer.
(2) Get request data: the server reads the data from the kernel buffer into the Java process buffer.
(3) Server-side business processing: the Java server handles the client request in its own user space.
(4) Data returned by the server: the response constructed by the Java server is written from the user buffer to the kernel buffer.
(5) Send to the client: the Linux kernel writes the data in the kernel buffer to the network card through network I/O, and the network card sends the data to the target client through the underlying communication protocol.
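The five steps above can be sketched with a single request over the loopback interface. This is a simplified illustration under several assumptions: the class RequestFlowDemo, its methods, and the "echo: " response format are invented here, and the server performs only one read, which is enough for a short message but not for arbitrary requests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: names and the response format are illustrative only.
final class RequestFlowDemo {

    // Serves exactly one short request on the given server socket.
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket conn = server.accept()) {
            InputStream in = conn.getInputStream();
            OutputStream out = conn.getOutputStream();
            byte[] buf = new byte[1024];
            // Steps (1)-(2): the kernel has buffered the request; read copies it
            // from the kernel buffer into this process buffer.
            int n = in.read(buf);
            if (n == -1) {
                return; // client closed without sending anything
            }
            // Step (3): business processing in user space.
            String request = new String(buf, 0, n, StandardCharsets.UTF_8);
            byte[] response = ("echo: " + request).getBytes(StandardCharsets.UTF_8);
            // Step (4): write copies the response from the process buffer to the kernel buffer.
            out.write(response);
            out.flush();
            // Step (5): the kernel hands the data to the network stack on its own schedule.
        }
    }

    // Starts a one-shot server on an ephemeral loopback port and sends it one request.
    static String roundTrip(String request) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.getOutputStream().write(request.getBytes(StandardCharsets.UTF_8));
                client.getOutputStream().flush();
                // Read until the server closes the connection.
                byte[] all = client.getInputStream().readAllBytes();
                serverThread.join();
                return new String(all, StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));
    }
}
```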
II. Four main IO models
Server-side programming often requires the construction of high-performance IO models. There are four common IO models:
1. Synchronous blocking IO (Blocking IO)
First, blocking versus non-blocking:
Blocking IO means that the kernel IO operation must complete before control returns to user space. Blocking describes the execution state of the user-space program: it waits until the IO operation is finished. The traditional IO model is synchronous blocking IO. In Java, a socket is created in blocking mode by default.
Second, synchronous versus asynchronous:
Synchronous and asynchronous describe how a call is initiated between user space and kernel space. In synchronous IO, the user-space thread actively initiates the IO request and the kernel is the passive receiver. In asynchronous IO the roles are reversed: the kernel is the active party that initiates the notification, and the user thread is the passive recipient.
2. Synchronous non-blocking IO (Non-blocking IO)
Non-blocking IO means that the user program does not wait for the kernel IO operation to complete. The kernel immediately returns a status value, so the thread returns to user space at once and carries on with its own work.
To put it simply: blocking means that the calling thread keeps waiting and does nothing else; non-blocking means that the calling thread gets a status back and returns immediately, performing the IO if it is possible and moving on if it is not.
Non-blocking IO requires the socket to be set to non-blocking mode (O_NONBLOCK on Linux).
To emphasize, the NIO (synchronous non-blocking IO) model mentioned here is not Java's NIO (New IO) library.
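In Java, the blocking mode of a channel can be inspected and changed through the java.nio.channels API. The following minimal sketch (the class name BlockingModeDemo and the method modes are invented for this example) shows that a newly opened SocketChannel starts in blocking mode and is switched with configureBlocking(false).

```java
import java.io.IOException;
import java.nio.channels.SocketChannel;

// Hypothetical demo class: names are illustrative only.
final class BlockingModeDemo {

    // Returns { mode right after open, mode after configureBlocking(false) }.
    static boolean[] modes() throws IOException {
        try (SocketChannel ch = SocketChannel.open()) {
            boolean initially = ch.isBlocking(); // new channels start in blocking mode
            ch.configureBlocking(false);         // switch this channel to non-blocking mode
            return new boolean[] { initially, ch.isBlocking() };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] m = modes();
        System.out.println("blocking after open: " + m[0]);
        System.out.println("blocking after configureBlocking(false): " + m[1]);
    }
}
```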
3. IO Multiplexing (IO Multiplexing)
This is the model behind the classic Reactor design pattern, and it is sometimes called asynchronous blocking IO. Selector in Java and epoll in Linux both follow this model.
4. Asynchronous IO (Asynchronous IO)
In asynchronous IO, the direction of the call between user space and kernel space is reversed: the user-space thread becomes the passive receiver, and the kernel is the active caller.
This is similar to the callback pattern that is typical in Java: the user-space thread registers callback functions for various IO events with the kernel, and the kernel invokes them.
III. Synchronous blocking IO (Blocking IO)
In a Java process on Linux, every socket is blocking by default. In the blocking I/O model, the application is blocked from the moment the IO system call starts until it returns. After a successful return, the application process begins to handle the data now held in its user-space buffer.
For example, issue a read system call on a blocking socket. The process goes roughly like this:
(1) When the user thread calls read, the kernel begins the first phase of IO: preparing the data. Often the data has not arrived yet (for example, a complete socket packet has not been received), so the kernel has to wait for enough data to arrive.
(2) Once the data is ready, the kernel copies it from the kernel buffer to the user buffer (user memory) and then returns the result.
(3) The user thread is blocked from the moment it issues the read system call. Only when the kernel returns the result does the thread leave the blocked state and resume running.
Therefore, the defining characteristic of blocking IO is that the user thread is blocked during both phases of the kernel's IO work.
Advantages of BIO:
The program is simple. While it waits for data, the user thread is suspended and consumes almost no CPU.
Disadvantages of BIO:
In general, each connection gets its own thread; that is, one thread maintains the reads and writes of one established connection. With low concurrency this is not a problem. In high-concurrency scenarios, however, a large number of threads is needed to maintain a large number of network connections, and the cost in memory and thread switching becomes very high. The BIO model is therefore basically unusable in high-concurrency scenarios.
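The thread-per-connection arrangement described above can be sketched as follows. This is a toy illustration rather than a production server: the class ThreadPerConnectionServer, its methods, and the upper-casing "business logic" are assumptions made for this example, and the server stops after a fixed number of connections so that the demo terminates.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class: names and behaviour are illustrative only.
final class ThreadPerConnectionServer {

    // Accepts `connections` clients and gives each one its own handler thread.
    static void serve(ServerSocket server, int connections) throws IOException {
        for (int i = 0; i < connections; i++) {
            Socket conn = server.accept();           // blocks until a client connects
            new Thread(() -> handle(conn)).start();  // one thread per connection
        }
    }

    // Replies to every line with its upper-case form until the client disconnects.
    static void handle(Socket conn) {
        try (Socket c = conn;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(c.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) { // this thread blocks here between lines
                out.println(line.toUpperCase(Locale.ROOT));
            }
        } catch (IOException e) {
            // the connection failed; this handler thread simply ends
        }
    }

    // Opens `clients` connections at once, then exchanges one line on each.
    static List<String> demo(int clients) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Thread acceptor = new Thread(() -> {
                try {
                    serve(server, clients);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            acceptor.start();
            List<Socket> sockets = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                sockets.add(new Socket(loopback, server.getLocalPort()));
            }
            List<String> replies = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                try (Socket s = sockets.get(i)) {
                    PrintWriter out = new PrintWriter(
                            new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                    out.println("client-" + i);
                    replies.add(in.readLine());
                }
            }
            acceptor.join();
            return replies;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo(3));
    }
}
```

Each open connection occupies a thread even while it is idle, which is where the memory and context-switching cost comes from as the number of connections grows.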
IV. Synchronous non-blocking NIO (Non-blocking IO)
On Linux, a socket can be switched to non-blocking mode through a socket setting. In the NIO model, once an application issues an IO system call, one of two things happens:
If there is no data in the kernel buffer, the system call returns immediately with a failure status.
If there is data in the kernel buffer, the call blocks while the data is copied from the kernel buffer to the user process buffer. When the copy completes, the system call returns successfully, and the application process begins to handle the data in its user-space buffer.
For example, issue a read system call on a non-blocking socket. The process looks like this:
While the kernel data is not ready, each IO request from the user thread returns immediately, so the user thread has to keep issuing IO system calls.
Once the data has arrived in the kernel, the next system call from the user thread blocks. The kernel copies the data from the kernel buffer to the user buffer (user memory) and returns the result.
The user thread then leaves the blocked state and resumes running. After many attempts, it has finally read the data and continues to execute.
Characteristics of NIO: the application thread has to keep issuing I/O system calls to poll whether the data is ready, and if it is not, it keeps polling until a call succeeds.
Advantage of NIO: every IO system call returns immediately while the kernel is still waiting for data. The user thread does not block, so responsiveness is good.
Disadvantage of NIO: the IO system call has to be issued over and over. This constant polling of the kernel takes up a lot of CPU time and makes poor use of system resources.
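The polling behaviour can be observed with a non-blocking SocketChannel. In Java the "nothing available yet" case shows up as read returning 0 rather than as an error code. The sketch below uses invented names (NonBlockingPollDemo, Result, poll) and assumes a short message that fits into the 1024-byte buffer.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: names are illustrative only.
final class NonBlockingPollDemo {

    static final class Result {
        final int firstRead; // return value of the read issued before any data existed
        final int polls;     // number of read calls needed after the data was sent
        final String text;   // the data finally received

        Result(int firstRead, int polls, String text) {
            this.firstRead = firstRead;
            this.polls = polls;
            this.text = text;
        }
    }

    static Result poll(String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1024);
        if (payload.length > buf.capacity()) {
            throw new IllegalArgumentException("message too long for this demo");
        }
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel writer = SocketChannel.open(server.getLocalAddress());
                 SocketChannel reader = server.accept()) {
                reader.configureBlocking(false);

                // Nothing has been sent yet: the call returns at once with 0 bytes.
                int firstRead = reader.read(buf);

                writer.write(ByteBuffer.wrap(payload));

                // Busy polling: keep issuing reads until the whole message has arrived.
                int polls = 0;
                while (buf.position() < payload.length) {
                    polls++;
                    if (reader.read(buf) == -1) {
                        break; // peer closed the connection
                    }
                }
                buf.flip();
                return new Result(firstRead, polls, StandardCharsets.UTF_8.decode(buf).toString());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Result r = poll("hello");
        System.out.println("first read returned " + r.firstRead
                + ", then " + r.polls + " poll(s) delivered: " + r.text);
    }
}
```

The loop spins without pausing, which is exactly the CPU cost that the disadvantage above refers to.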
In short, the NIO model is also unusable in high-concurrency scenarios, and web servers generally do not use it. It is rarely used directly; instead, its non-blocking behaviour is used as a building block in other IO models. Java development does not normally use this model on its own.
Again, Java NIO (New IO) does not correspond to the NIO model described here but to another model, the IO multiplexing model.
V. IO multiplexing model (I/O multiplexing)
How can the polling problem of the synchronous non-blocking NIO model be avoided? The answer is the IO multiplexing model.
In the IO multiplexing model, a new kind of system call lets one process monitor multiple file descriptors. Once a descriptor is ready (generally, its kernel buffer is readable or writable), the kernel informs the program so that it can issue the corresponding IO system call.
System calls that support IO multiplexing include select, epoll and others. The select system call is supported on almost all operating systems and has good cross-platform characteristics. epoll was introduced with the Linux 2.6 kernel and is Linux's enhanced alternative to select.
The basic principle of the IO multiplexing model is the select/epoll system call. A single thread keeps querying the hundreds or thousands of socket connections that select/epoll is responsible for, and when data arrives on one or more of them, the call returns the connections that are ready to be read or written. The benefit is obvious: a single select/epoll system call can find one, or even hundreds, of connections that are ready for IO.
For example, issue a read operation under the IO multiplexing model. The process looks like this:
In this model, the first step is not the read system call but the select/epoll system call. The precondition is that the target network connection has been registered in advance in the list of sockets that select/epoll monitors. Then the read flow of the IO multiplexing model can start.
(1) Issue a select/epoll system call to query which connections can be read. The kernel checks every socket in the monitored list, and as soon as the data in any socket is ready, select returns.
While the user thread is inside the select call, the whole thread is blocked.
(2) After the user thread has obtained the ready connection, it issues the read system call and blocks. The kernel copies the data from the kernel buffer to the user buffer (user memory) and returns the result.
(3) The user thread leaves the blocked state; it has now read the data and continues to execute.
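The select-then-read flow above maps onto Java's Selector API roughly as shown below. This is a minimal single-client echo sketch: the class SelectorEchoDemo and its methods are invented for this example, the loop stops after the first client disconnects so that the demo terminates, and error handling is kept to a minimum.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical demo class: names are illustrative only.
final class SelectorEchoDemo {

    // Runs the select loop until one client has been served and has disconnected.
    static void serveOneClient(Selector selector, ServerSocketChannel server) throws IOException {
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT); // registration precondition
        ByteBuffer buf = ByteBuffer.allocate(1024);
        boolean done = false;
        while (!done) {
            selector.select(); // step (1): blocks until at least one channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    if (client != null) {
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    int n = client.read(buf); // step (2): data is ready, copy it to user space
                    if (n == -1) {
                        client.close();
                        done = true;
                    } else {
                        buf.flip();
                        while (buf.hasRemaining()) {
                            client.write(buf); // step (3): continue with the data (echo it)
                        }
                    }
                }
            }
        }
    }

    // Starts the select loop on a background thread and echoes one message through it.
    static String echo(String message) throws Exception {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketAddress address = server.getLocalAddress();
            Thread loop = new Thread(() -> {
                try {
                    serveOneClient(selector, server);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            loop.start();
            String result;
            try (SocketChannel client = SocketChannel.open(address)) {
                byte[] payload = message.getBytes(StandardCharsets.UTF_8);
                client.write(ByteBuffer.wrap(payload));
                ByteBuffer reply = ByteBuffer.allocate(payload.length);
                while (reply.hasRemaining() && client.read(reply) != -1) {
                    // keep reading until the full echo has arrived
                }
                reply.flip();
                result = StandardCharsets.UTF_8.decode(reply).toString();
            }
            loop.join();
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echo("hello selector"));
    }
}
```

A single thread runs the whole loop, and the same loop would serve many registered connections if it did not stop after the first disconnect.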
Features of multiplexing IO:
The IO multiplexing model is built on the multiplexing system calls select/epoll provided by the operating system kernel. It requires two system calls: a select/epoll query call and a read call for the IO itself.
Like the NIO model, multiplexing IO involves polling. The thread responsible for the select/epoll query has to call select/epoll repeatedly to find the connections that are ready for IO.
The multiplexing IO model is also related to the NIO model described earlier: each socket that can be queried is generally set to non-blocking mode. Apart from that setting, this is largely transparent to the user program.
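In Java's Selector API, the link to non-blocking mode is enforced at registration time: registering a channel that is still in blocking mode throws IllegalBlockingModeException. The following small sketch demonstrates this; the class RegisterModeDemo and the method tryRegister are names chosen for this example.

```java
import java.io.IOException;
import java.nio.channels.IllegalBlockingModeException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical demo class: names are illustrative only.
final class RegisterModeDemo {

    // Returns { blocking-mode registration was rejected, non-blocking registration succeeded }.
    static boolean[] tryRegister() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            boolean rejected;
            try {
                // The channel is still in its default blocking mode here.
                server.register(selector, SelectionKey.OP_ACCEPT);
                rejected = false;
            } catch (IllegalBlockingModeException e) {
                rejected = true;
            }
            // After switching to non-blocking mode the same registration is accepted.
            server.configureBlocking(false);
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);
            return new boolean[] { rejected, key.isValid() };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = tryRegister();
        System.out.println("blocking channel rejected: " + r[0]);
        System.out.println("non-blocking channel registered: " + r[1]);
    }
}
```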
Advantages of multiplexing IO:
With select/epoll, a single thread can handle thousands of connections at the same time. Compared with dedicating one thread to each connection, the biggest advantage of I/O multiplexing is that the system does not have to create and maintain a large number of threads, which greatly reduces overhead.
Java's NIO (New IO) technology uses the IO multiplexing model. On Linux, it uses the epoll system call.
Disadvantages of multiplexing IO:
In essence, the select/epoll system calls are still synchronous IO and blocking IO. After a read or write event is ready, the thread itself is responsible for performing the read or write, and that read or write blocks.
How can the thread be unblocked completely? That is the asynchronous IO model.
VI. Asynchronous IO Model (asynchronous IO)
How can efficiency be improved further and the last bit of blocking removed? The answer is the asynchronous IO model (Asynchronous I/O), abbreviated as AIO.
The basic flow of AIO is: the user thread tells the kernel to start an IO operation through a system call and returns immediately. After the entire IO operation (both data preparation and data copying) is complete, the kernel notifies the user program, which then carries out its subsequent business logic.
Data preparation in the kernel means reading the data from the physical network device (network card) into the kernel buffer; data copying in the kernel means copying the data from the kernel buffer to the buffer in the user program's space.
(1) After the user thread issues the read system call, it can immediately go on to do something else; the user thread does not block.
(2) The kernel begins the first phase of IO: preparing the data. Once the data is ready, the kernel copies it from the kernel buffer to the user buffer (user memory).
(3) The kernel sends a signal to the user thread, or invokes the callback interface that the user thread registered, to tell it that the read operation is complete.
(4) The user thread reads the data in the user buffer and completes the subsequent business operation.
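From the programmer's point of view, this callback style is what Java's AsynchronousSocketChannel and CompletionHandler expose. The sketch below only illustrates the programming model: the class AsyncReadDemo and the method receive are invented names, the message is assumed to be short and non-empty, and how the JDK implements these channels underneath is platform-dependent and not shown here.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: names are illustrative only.
final class AsyncReadDemo {

    // Sends a short, non-empty message over loopback and receives it through a callback.
    static String receive(String message) throws Exception {
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (AsynchronousSocketChannel sender = AsynchronousSocketChannel.open()) {
                sender.connect(server.getLocalAddress()).get(5, TimeUnit.SECONDS);
                try (AsynchronousSocketChannel receiver = server.accept().get(5, TimeUnit.SECONDS)) {
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    CompletableFuture<String> done = new CompletableFuture<>();

                    // Step (1): start the read; this call returns without waiting for data.
                    receiver.read(buf, null, new CompletionHandler<Integer, Void>() {
                        @Override
                        public void completed(Integer bytesRead, Void attachment) {
                            // Steps (3)-(4): invoked once the data is already in buf.
                            buf.flip();
                            done.complete(StandardCharsets.UTF_8.decode(buf).toString());
                        }

                        @Override
                        public void failed(Throwable exc, Void attachment) {
                            done.completeExceptionally(exc);
                        }
                    });

                    // The calling thread is free to do other work; here it sends the data.
                    sender.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)))
                          .get(5, TimeUnit.SECONDS);

                    return done.get(5, TimeUnit.SECONDS);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receive("hello async"));
    }
}
```

The calling thread never issues a blocking read of its own; it only waits on the future at the end so that the demo can return the result.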
Features of the asynchronous IO model:
The user thread is not blocked during either phase, neither while the kernel waits for data nor while it copies data. The user thread has to receive the completion event for the IO operation from the kernel, or register a completion callback with the operating system kernel. For this reason asynchronous IO is sometimes loosely called signal-driven IO, although classic Unix references treat signal-driven IO as a separate model in which the kernel only signals that data is ready and the process still performs the copy itself.
Disadvantages of the asynchronous IO model:
Events have to be registered and delivered, which requires extensive support and a lot of work from the underlying operating system.
At present, true asynchronous I/O is implemented through IOCP on Windows. However, in the current industry landscape, Windows is rarely used as the server operating system for highly concurrent applications with millions of connections.
On Linux, the asynchronous IO model was introduced in version 2.6 of the kernel and is still not mature. This is why high-concurrency network programming on Linux is mostly built on the IO multiplexing model.
This concludes the overview of the underlying principles of Java NIO. Combining theory with practice is the best way to learn, so try these models out for yourself.
© 2024 shulou.com SLNews company. All rights reserved.