Updated 2025-01-15. From: SLTechnology News&Howtos (Shulou.com), Development
This article compares the Java IO model with the Java network programming model. It first walks through the five IO models of the UNIX environment and then maps them onto the BIO, NIO and AIO programming models in Java.
Introduction to the IO models
Author: cooffeelis
Link: https://www.jianshu.com/p/511b9cffbdac
Source: Jianshu
The copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.
Five commonly used IO models:
Blocking IO
Nonblocking IO
IO multiplexing
Signal driven IO
Asynchronous IO
Let's first look at the objects and steps involved when IO takes place.
A network IO operation (we use read as the example here) involves two system objects:
One is the process (or thread) that calls the IO
The other is the system kernel
When a read operation occurs, it goes through two phases:
Waiting for the data to be ready: for example, accept() waits for a connection and recv() waits for data to arrive.
Copying the data from the kernel to the process: for example, once accept() has received a connection request and the data sent over that connection has arrived in the kernel, recv() copies it from the kernel buffer into the process's user space.
For a socket stream, the data likewise passes through two phases:
The first step usually involves waiting for packets to arrive from the network and copying them into a buffer in the kernel.
The second step copies the data from the kernel buffer into the application process's buffer.
It is important to remember these two phases, because the IO models differ in how they behave in each of them.
Blocking I/O (blocking IO)
On Linux, all sockets are blocking by default. A typical read flow looks like this:
Blocking IO flow
When the user process calls the recvfrom system call, the kernel begins the first phase of IO: preparing the data. For network IO, the data often has not arrived yet (for example, a complete UDP packet has not been received), so the kernel has to wait for enough data to arrive and be copied into a kernel buffer. On the user side, the whole process is blocked during this wait (by the process's own choice, of course). Once the data is ready, the kernel copies it from the kernel buffer to user memory and returns the result; only then does the user process leave the blocked state and start running again.
Therefore, the characteristic of blocking IO is that the process is blocked in both phases of the IO operation.
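The blocking flow described above can be observed from Java with the classic java.net socket API. The following is only a minimal sketch: the class name BlockingReadDemo, the loopback setup and the 200 ms delay are illustrative assumptions, not part of the original article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class BlockingReadDemo {
    /** Sends one short message over loopback and returns what the blocking read received. */
    static String readOnce(String message) throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Hypothetical peer: connects, waits 200 ms, then sends the message.
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket(server.getInetAddress(), server.getLocalPort())) {
                    Thread.sleep(200);
                    s.getOutputStream().write(message.getBytes(StandardCharsets.UTF_8));
                } catch (IOException | InterruptedException e) {
                    throw new IllegalStateException(e);
                }
            });
            sender.start();
            try (Socket client = server.accept()) {   // blocks until a connection arrives
                InputStream in = client.getInputStream();
                byte[] buf = new byte[1024];
                // Blocks while the data is awaited AND while it is copied to user space.
                int n = in.read(buf);
                sender.join();
                return new String(buf, 0, n, StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readOnce("hello"));
    }
}
```

The calling thread can do nothing else between calling read and getting the bytes back, which is exactly the cost the later models try to avoid.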
Non-blocking I/O (nonblocking IO)
On Linux, a socket can be made non-blocking by setting an option on it. A read on a non-blocking socket looks like this:
Non-blocking IO flow
When the user process issues a read and the data in the kernel is not ready, the kernel does not block the user process but immediately returns an error. From the user process's point of view, the read returns a result at once, with no waiting. When the process sees that the result is an error, it knows the data is not ready, so it can issue the read again. Once the data in the kernel is ready and the kernel receives the system call from the user process again, it immediately copies the data into user memory and returns.
Therefore, the characteristic of non-blocking IO is that the user process has to keep actively asking the kernel whether the data is ready.
Note that non-blocking IO is non-blocking only while waiting for the data. Once the data has arrived, the recvfrom call that fetches it still blocks the process synchronously while the data is copied from the kernel to user space, as the copy step in the figure shows.
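In Java the same polling pattern can be sketched with a SocketChannel in non-blocking mode, where read returns 0 instead of an error when nothing is available. The class name NonBlockingReadDemo, the loopback pair and the 1 ms sleep between polls are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

final class NonBlockingReadDemo {
    /** Returns {result of the first read, bytes finally received, number of empty polls}. */
    static int[] pollUntilData() throws IOException, InterruptedException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel sender = SocketChannel.open(server.getLocalAddress());
                 SocketChannel receiver = server.accept()) {
                receiver.configureBlocking(false);
                ByteBuffer buf = ByteBuffer.allocate(64);

                // Nothing has been sent yet, so the call returns 0 immediately.
                int first = receiver.read(buf);

                sender.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
                int emptyPolls = 0;
                while (receiver.read(buf) == 0) {   // keep asking until data is there
                    emptyPolls++;
                    Thread.sleep(1);
                }
                return new int[] {first, buf.position(), emptyPolls};
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = pollUntilData();
        System.out.println("first read: " + r[0] + ", bytes: " + r[1] + ", empty polls: " + r[2]);
    }
}
```

The loop makes the trade-off visible: the thread is never parked inside read, but it pays for that with repeated system calls.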
I/O Multiplexing (IO multiplexing)
IO multiplexing is what we usually call select, poll and epoll; in some places this style of IO is also called event-driven IO. The advantage of select/epoll is that a single process can handle the IO of many network connections at the same time. The basic principle is that the select, poll or epoll function keeps polling all the sockets it is responsible for and notifies the user process when data arrives on any of them.
I/O multiplexing flow
This diagram is not very different from the blocking IO one. In fact, it is slightly worse, because two system calls are needed here (select and recvfrom), whereas blocking IO needs only one (recvfrom). The advantage of select, however, is that it can handle multiple connections at the same time.
Therefore, if the number of connections is not very high, a web server using select/epoll does not necessarily perform better than one using multi-threading + blocking IO, and its latency may even be higher. The advantage of select/epoll is not that it handles a single connection faster, but that it can handle more connections.
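As a rough illustration of one thread waiting on several channels at once, the sketch below registers a few java.nio Pipe sources with a single Selector. Pipes stand in for sockets purely to keep the example self-contained; the class name MultiplexDemo and the index attachment are assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

final class MultiplexDemo {
    /** Watches `count` pipes with one selector, writes to one, returns the indexes reported ready. */
    static List<Integer> readyAfterWrite(int count, int writeTo) throws IOException {
        List<Pipe> pipes = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < count; i++) {
                Pipe p = Pipe.open();
                pipes.add(p);
                p.source().configureBlocking(false);
                // Attach the index so the ready channel can be identified later.
                p.source().register(selector, SelectionKey.OP_READ, i);
            }
            pipes.get(writeTo).sink().write(ByteBuffer.wrap(new byte[] {42}));

            List<Integer> ready = new ArrayList<>();
            selector.select(1000);   // a single call waits on all registered pipes
            for (SelectionKey key : selector.selectedKeys()) {
                ready.add((Integer) key.attachment());
            }
            return ready;
        } finally {
            for (Pipe p : pipes) {
                p.sink().close();
                p.source().close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readyAfterWrite(3, 1));
    }
}
```

Only the pipe that received data is reported; the actual read would still be a second call, matching the two-system-call cost noted above.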
At present, the main implementations of IO multiplexing are select, poll and epoll.
The principles of select and poll are basically the same:
Register the fds to monitor (it is best to create them as non-blocking)
Each call checks the status of these fds and returns when one or more of them are ready
The returned result includes both the fds that are ready and those that are not.
Compared with select, poll removes the limit on the number of file descriptors a single process can monitor: select is limited by FD_SETSIZE, and raising the limit means changing this macro and recompiling, whereas poll passes the events of interest to the kernel through a pollfd array, which avoids the limit.
In addition, a major disadvantage of both select and poll is that the array containing all the fds is copied as a whole between user space and kernel space on every call, so the overhead grows linearly with the number of fds.
select and poll resemble ordering food at a restaurant: every time you ask, the boss checks all the dishes you ordered and reports on each one, which is inefficient when many dishes stay unready for a long time. So the boss, growing impatient, tells the cook to report each dish as soon as it is finished. From then on, whenever you ask, he directly tells you which dishes are ready and you collect them. This is the event-driven IO readiness notification approach, epoll.
The emergence of epoll addresses the shortcomings of select and poll:
It is event-driven, which avoids scanning all fds on every call.
epoll_wait returns only the fds that are ready.
epoll keeps the set of registered fds inside the kernel (each fd is added once through epoll_ctl), so the whole set is not copied between user space and kernel space on every call.
The maximum number of fds for epoll is the operating system's maximum number of file handles, which generally depends on memory and is usually much larger than 1024.
At present, epoll is the most efficient IO multiplexing mechanism on Linux 2.6 and later, and it is the IO implementation used by Nginx and Node. On FreeBSD, kqueue is a similar IO multiplexing mechanism.
IO multiplexing also has the concepts of level-triggered and edge-triggered notification:
Level-triggered: if a ready fd has not been handled by the user process, it is returned again on the next query. This is how select and poll behave.
Edge-triggered: a ready fd is reported only once, when it becomes ready; whether or not it has been handled, it is not returned again on the next query. In theory this performs better, but it is considerably harder to use correctly, because any missed event can lead to request-processing errors. epoll is level-triggered by default, and edge-triggered mode can be enabled with the EPOLLET flag.
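Level-triggered behaviour can be observed through Java's Selector, which to my knowledge reports readiness in a level-triggered fashion on the common platforms. The sketch below uses a Pipe instead of a socket; the class name LevelTriggerDemo is an assumption.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class LevelTriggerDemo {
    /** Returns the number of ready keys for three consecutive selects. */
    static int[] selectCounts() throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open()) {
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));

            int first = selector.select(1000);   // data has arrived
            selector.selectedKeys().clear();     // acknowledge the key but do NOT read
            int second = selector.selectNow();   // data still unread, so reported again
            selector.selectedKeys().clear();
            pipe.source().read(ByteBuffer.allocate(8));   // drain the pipe
            int third = selector.selectNow();    // nothing left to report
            return new int[] {first, second, third};
        } finally {
            pipe.sink().close();
            pipe.source().close();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] c = selectCounts();
        System.out.println(c[0] + " " + c[1] + " " + c[2]);
    }
}
```

With edge-triggered notification the second query would report nothing, which is why edge-triggered code must drain a channel completely each time it is notified.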
Comments:
The characteristic of I/O multiplexing is that one process can wait on multiple file descriptors at the same time through a single mechanism; as soon as any one of these file descriptors (socket descriptors) becomes read-ready, the select() function returns.
Therefore, IO multiplexing by itself does not provide concurrency, because only one process or thread is working at any moment. It improves efficiency because select/epoll puts the incoming sockets on a watch list and returns to the process as soon as any of them has data to read or write, even when it is watching a large number of sockets. This is more efficient than taking sockets one at a time and blocking on each until it has been dealt with.
Of course, a multi-threaded / multi-process model is also possible, with one process or thread per connection, but the memory consumed and the process-switching overhead use up more system resources.
So we can combine IO multiplexing with multi-processing / multi-threading to achieve high-performance concurrency: IO multiplexing makes receiving socket notifications efficient, and once a request has been received it is handed to a process pool / thread pool to run the business logic.
Signal-driven IO
With the restaurant approach above, you still have to ask about the food each time. So, growing impatient yourself, you tell the boss: let me know when each dish is ready. Then you sit at the table and do your own thing. You can even leave your phone number with the boss and go out, and he will text you when the food is ready. This is similar to the signal-driven IO model.
The process is as follows:
Enable signal-driven IO on the socket
Install a signal handler through the sigaction system call (non-blocking; it returns immediately)
When the data is ready, a SIGIO signal is generated and the application is notified through the signal handler so that it can read the data.
This IO mode has a major problem: the signal queue in Linux is limited in length, and once that limit is exceeded, data can no longer be read.
Asynchronous non-blocking I/O (asynchronous IO)
Asynchronous IO is rarely used on Linux. Let's take a look at its flow:
Asynchronous IO flow
After the user process initiates the read operation, it can immediately start doing other things. From the kernel's perspective, when it receives an asynchronous read it returns immediately, so the user process is never blocked. The kernel then waits for the data to be ready and copies it into user memory; when all of this is done, the kernel sends a signal to the user process to tell it that the read operation is complete.
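From the caller's side, this "start the read, get notified on completion" shape looks roughly like the following Java sketch built on AsynchronousFileChannel. It only mirrors the programming pattern: how the JDK carries out the read underneath is platform dependent and may not use kernel-level asynchronous IO at all. The class name AsyncFileReadDemo and the temporary file are assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

final class AsyncFileReadDemo {
    /** Writes `content` to a scratch file and reads it back through an asynchronous read. */
    static String readAsync(String content) throws IOException {
        Path file = Files.createTempFile("aio-demo", ".txt");   // hypothetical scratch file
        try {
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            try (AsynchronousFileChannel ch =
                         AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(1024);
                CompletableFuture<Integer> done = new CompletableFuture<>();
                ch.read(buf, 0, null, new CompletionHandler<Integer, Void>() {
                    @Override public void completed(Integer n, Void att) {
                        done.complete(n);   // the data is already in buf at this point
                    }
                    @Override public void failed(Throwable exc, Void att) {
                        done.completeExceptionally(exc);
                    }
                });
                // read() has already returned here; the caller could do other work.
                int n = done.join();   // waiting only so the demo can return the result
                return new String(buf.array(), 0, n, StandardCharsets.UTF_8);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAsync("data is ready"));
    }
}
```

The key difference from the earlier models is that when the handler runs, the copy into the user buffer has already happened.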
The difference and relation between blocking IO, non-blocking IO, synchronous IO and Asynchronous IO
Blocking IO VS non-blocking IO:
Concept:
Blocking and non-blocking are concerned with the state of the program while waiting for the result of the call (message, return value).
A blocking call means that the current thread is suspended until the result of the call comes back; the calling thread resumes only once it has the result. A non-blocking call means that the call does not block the current thread even when the result is not immediately available.
Example: you phone the bookstore owner to ask whether he has a book called "Distributed Systems". With a blocking call, you "suspend" yourself until you get the answer. With a non-blocking call, you go off and do your own thing whether or not the owner has answered, although you do have to check back every few minutes to see whether he has a result. Blocking and non-blocking here have nothing to do with synchronous and asynchronous; they are unrelated to how the owner gives you the answer.
Analysis:
Blocking IO blocks the calling process until the operation is complete, whereas non-blocking IO returns immediately even while the kernel is still preparing the data.
Synchronous IO VS Asynchronous IO:
Concept:
Synchronous and asynchronous are concerned with the message communication mechanism (synchronous communication / asynchronous communication). Synchronous means that when a call is made, it does not return until the result is available; once the call returns, you have the return value. In other words, the caller actively waits for the result of the call. Asynchronous is the opposite: the call returns straight away, without a result. In other words, when an asynchronous call is made, the caller does not get the result immediately; instead, after the call has been issued, the callee informs the caller through a status or a notification, or handles the result through a callback function.
Node.js is a typical asynchronous programming model. A popular example: you phone the bookstore owner to ask whether he has a book called "Distributed Systems". With a synchronous communication mechanism, the owner says "hold on, I'll check", starts looking, and when he is done (it could take 5 seconds or a whole day) tells you the result (returns the result). With an asynchronous communication mechanism, the owner tells you he will check and call you back when he is done, then hangs up (returns no result). When he has finished checking, he calls you. Here the owner delivers the result by "calling back".
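The bookstore example can be mirrored in a few lines of Java using CompletableFuture for the "call back" case. Everything here is illustrative: hasBook is a hypothetical stand-in for the owner checking the shelves, and the 100 ms delay is arbitrary.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

final class BookstoreDemo {
    // Hypothetical slow lookup standing in for the bookstore owner checking the shelves.
    static boolean hasBook(String title) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return title.equals("Distributed Systems");
    }

    /** Returns the order of events for one synchronous and one asynchronous call. */
    static List<String> run() {
        List<String> log = new CopyOnWriteArrayList<>();

        // Synchronous: the call itself returns the answer.
        boolean answer = hasBook("Distributed Systems");
        log.add("sync: call returned " + answer);

        // Asynchronous: the call returns at once, the answer comes later via a callback.
        CompletableFuture<Void> callback = CompletableFuture
                .supplyAsync(() -> hasBook("Distributed Systems"))
                .thenAccept(found -> log.add("async: callback delivered " + found));
        log.add("async: call returned without a result");

        callback.join();   // wait only so the demo can return the complete log
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In the asynchronous half the line "call returned without a result" is logged before the callback runs, which is the distinction the analogy is making.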
Analysis:
Before explaining the difference between synchronous IO and asynchronous IO, you need to define both. The definition given by Stevens (which is actually the definition of POSIX) goes like this:
A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes
An asynchronous I/O operation does not cause the requesting process to be blocked
The difference between the two is that synchronous IO blocks the process while the "IO operation" is performed. By this definition, the blocking IO, non-blocking IO and IO multiplexing described earlier are all synchronous IO.
Some people might object that non-blocking IO is not blocked. There is a subtle point here: the "IO operation" in the definition refers to the real IO operation, which is the recvfrom system call in the example. When non-blocking IO makes the recvfrom system call and the kernel data is not ready, the process is not blocked. But once the data in the kernel is ready, recvfrom copies it from the kernel to user memory, and during that copy the process is blocked.
Asynchronous IO is different. When the process initiates the IO operation, the call returns at once and the process pays no further attention until the kernel sends it a signal saying the IO is complete. Throughout this whole period the process is never blocked.
An illustrative example of the IO models
Finally, here is a not entirely rigorous example to illustrate these four IO models:
Four people are fishing.
A uses the most old-fashioned rod, so he has to keep watching it until a fish bites and then pull the rod.
B's rod has a feature that shows whether a fish has taken the bait, so B chats with the girl next to him and checks every so often whether a fish has bitten. If so, he quickly pulls the rod.
C uses a rod similar to B's, but he has a good idea: he sets up several rods at once and stands by. As soon as one of them shows that a fish has taken the bait, he pulls up that rod.
D is a rich man, so he simply hires someone to fish for him. As soon as that person catches a fish, he sends D a text message.
Select/Poll/Epoll polling mechanism
select, poll and epoll are all essentially synchronous I/O, because once a read or write event is ready, the process itself still has to perform the read or write, and that read/write step blocks.
select/poll/epoll are implementations of IO multiplexing. As mentioned above, with IO multiplexing you set the sockets to non-blocking and add them to the watch list of select, poll or epoll. How do they then detect whether data has arrived on a socket? How efficient are they? Which one should we use? Their implementations, efficiency, advantages and disadvantages are compared below:
(1) The select and poll implementations have to poll the entire fd set themselves until a device is ready, possibly alternating between sleeping and waking several times along the way. epoll also has to call epoll_wait to check the ready list repeatedly, and may likewise alternate between sleeping and waking, but when a device becomes ready it invokes a callback that puts the ready fd on the ready list and wakes the process sleeping in epoll_wait. Although both alternate between sleeping and waking, select and poll traverse the whole fd set each time they wake, while epoll only has to check whether the ready list is empty, which saves a lot of CPU time. This is the performance gain the callback mechanism brings.
(2) On every call, select and poll copy the fd set from user space to kernel space and add current to the device wait queues once, whereas epoll copies each fd only once and adds current to a wait queue only once (at the start of epoll_wait; note that this wait queue is not a device wait queue but one defined internally by epoll). This also saves considerable overhead.
Java network programming model
Five IO models for the UNIX environment are described above. Based on these five models, with the introduction of NIO and NIO2.0 (AIO) in Java, there are generally the following network programming models:
BIO
NIO
AIO
BIO
BIO is the classic network programming model and the usual way to implement a server-side program. The steps are as follows:
The main thread blocks in accept, waiting for a request
When a request arrives, a new thread is created to handle the socket and send the response to the client.
The main thread goes back to accept and waits for the next request
A major problem with this model is that as the number of client connections grows, the number of threads created by the server soars and system performance drops sharply. For this reason, implementations based on this model, such as Tomcat's BIO connector, use a thread pool to avoid creating a thread per client. In some places this approach is called pseudo-asynchronous IO (the request is thrown into a thread pool and processed asynchronously).
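A minimal sketch of the thread-pool (pseudo-asynchronous) variant of BIO might look like the echo server below. The class name BioEchoServer, the pool size of 4, the echo protocol and the roundTrip client helper are all assumptions made for illustration; this is not Tomcat's implementation. It needs Java 9 or later for readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BioEchoServer implements AutoCloseable {
    private final ServerSocket server;
    // Pool size is an arbitrary choice; it caps the number of concurrent handlers.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    BioEchoServer() throws IOException {
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(this::acceptLoop, "bio-acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();      // blocks until a client connects
                workers.submit(() -> echo(client));   // hand the socket to a pooled thread
            } catch (IOException e) {
                // accept() fails once the server socket is closed; the loop then exits.
            }
        }
    }

    private static void echo(Socket client) {
        try (Socket c = client) {
            InputStream in = c.getInputStream();
            OutputStream out = c.getOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {        // blocking read, one thread per socket
                out.write(buf, 0, n);
            }
        } catch (IOException ignored) {
            // Connection dropped; the socket is closed by try-with-resources.
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        workers.shutdownNow();
    }

    /** Small blocking client used to exercise the server. */
    static String roundTrip(int port, String msg) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.getOutputStream().write(msg.getBytes(StandardCharsets.UTF_8));
            s.shutdownOutput();                       // signal end of request
            return new String(s.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

A call such as BioEchoServer.roundTrip(server.port(), "hello") against a running instance should return the same string. Each connection still occupies one pooled thread for its whole lifetime, which is why the IO itself remains synchronous and blocking.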
NIO
JDK 1.4 introduced the NIO class library. NIO here stands for New IO, and it is built mainly around the Selector multiplexer. On mainstream operating systems such as Linux, Selector is implemented with epoll.
The NIO workflow is similar to select:
Create a ServerSocketChannel to listen for client connections, bind the listening port, and set it to non-blocking mode.
Create the Reactor thread, create a multiplexer (Selector), and start the thread.
Register the ServerSocketChannel with the Selector of the Reactor thread and listen for accept events.
In the thread's run method, the Selector polls for ready keys in an infinite loop.
When the Selector detects a new client connection, it handles the new request, completes the TCP three-way handshake, and establishes the physical connection.
Register the new client connection with the Selector and listen for read events, in order to read the network messages sent by the client.
When the data sent by the client is ready, read the client request and process it.
Compared with BIO, programming with NIO is very complicated.
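The steps above can be sketched as a single-threaded echo server. This is a simplified illustration, not a production Reactor: the class name NioEchoServer and the roundTrip helper are assumptions, writes are retried in a loop instead of being driven by OP_WRITE, and a read error on one connection ends the whole loop.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class NioEchoServer implements AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    NioEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);   // listen for accept events
        Thread reactor = new Thread(this::reactorLoop, "nio-reactor");
        reactor.setDaemon(true);
        reactor.start();
    }

    int port() {
        return server.socket().getLocalPort();
    }

    private void reactorLoop() {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            while (selector.isOpen()) {
                selector.select();                            // wait for any ready key
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        if (client.read(buf) == -1) {         // peer finished sending
                            client.close();
                            continue;
                        }
                        buf.flip();
                        while (buf.hasRemaining()) {          // simplification: spin on write
                            client.write(buf);
                        }
                    }
                }
            }
        } catch (IOException | ClosedSelectorException | CancelledKeyException e) {
            // Selector or channel was closed: the reactor thread simply ends.
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
    }

    /** Small blocking client used to exercise the server. */
    static String roundTrip(int port, String msg) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.getOutputStream().write(msg.getBytes(StandardCharsets.UTF_8));
            s.shutdownOutput();
            return new String(s.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Even this stripped-down version has to manage key sets, buffer flipping and partial IO by hand, which gives a sense of where the extra complexity comes from.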
AIO
JDK 1.7 introduced NIO 2.0, which provides asynchronous file channels and asynchronous socket channels. Underneath, it is implemented with IOCP on Windows and with epoll on Linux (LinuxAsynchronousChannelProvider.java, UnixAsynchronousServerSocketChannelImpl.java).
Create an AsynchronousServerSocketChannel and bind the listening port
Call the accept method of AsynchronousServerSocketChannel, passing in your own CompletionHandler implementation. This step and the previous one are both non-blocking.
When a connection comes in, the completed method of that CompletionHandler is called back. Inside it, call the read method of AsynchronousSocketChannel, passing in the CompletionHandler responsible for processing the data.
When the data is ready, the completed method of the data-processing CompletionHandler is triggered, and processing continues with the next step.
Write operations are similar and also take a CompletionHandler.
Its programming model is much simpler than NIO's.
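The callback chain described in these steps can be sketched as an echo server on top of AsynchronousServerSocketChannel. The class name AioEchoServer, the echo behaviour and the roundTrip helper are illustrative assumptions; error handling is reduced to closing the connection.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;

final class AioEchoServer implements AutoCloseable {
    private final AsynchronousServerSocketChannel server;
    private final int port;

    AioEchoServer() throws IOException {
        server = AsynchronousServerSocketChannel.open()
                .bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        port = ((InetSocketAddress) server.getLocalAddress()).getPort();
        acceptNext();   // returns immediately; the handler runs when a client connects
    }

    int port() {
        return port;
    }

    private void acceptNext() {
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override public void completed(AsynchronousSocketChannel client, Void att) {
                acceptNext();   // re-arm accept for the next connection
                readNext(client, ByteBuffer.allocate(1024));
            }
            @Override public void failed(Throwable exc, Void att) {
                // Reached when the server channel is closed; nothing to do in this sketch.
            }
        });
    }

    private static void readNext(AsynchronousSocketChannel client, ByteBuffer buf) {
        buf.clear();
        client.read(buf, null, new CompletionHandler<Integer, Void>() {
            @Override public void completed(Integer n, Void att) {
                if (n == -1) {          // peer finished sending
                    closeQuietly(client);
                    return;
                }
                buf.flip();
                writeAll(client, buf);  // the data is already in buf when this runs
            }
            @Override public void failed(Throwable exc, Void att) {
                closeQuietly(client);
            }
        });
    }

    private static void writeAll(AsynchronousSocketChannel client, ByteBuffer buf) {
        client.write(buf, null, new CompletionHandler<Integer, Void>() {
            @Override public void completed(Integer n, Void att) {
                if (buf.hasRemaining()) {
                    writeAll(client, buf);   // partial write: continue with the rest
                } else {
                    readNext(client, buf);   // then wait for more data
                }
            }
            @Override public void failed(Throwable exc, Void att) {
                closeQuietly(client);
            }
        });
    }

    private static void closeQuietly(AsynchronousSocketChannel client) {
        try {
            client.close();
        } catch (IOException ignored) {
            // Nothing useful to do if close itself fails.
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }

    /** Small blocking client used to exercise the server. */
    static String roundTrip(int port, String msg) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.getOutputStream().write(msg.getBytes(StandardCharsets.UTF_8));
            s.shutdownOutput();
            return new String(s.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

No application thread is dedicated to IO here; the handlers are invoked on threads owned by the channel group, which matches the m:0 ratio given for AIO in the comparison.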
Comparison
                        Synchronous blocking IO   Pseudo-asynchronous IO    NIO                           AIO
Clients : IO threads    1:1                       m:n                       m:1                           m:0
IO model                synchronous blocking IO   synchronous blocking IO   synchronous non-blocking IO   asynchronous non-blocking IO
Throughput              low                       medium                    high                          high
Programming complexity ranges from simple (synchronous blocking IO) to very complex (NIO).