2025-01-16 Update. From: SLTechnology News&Howtos (Shulou.com)
This article explains the IO multiplexing model from the bottom up, starting at the operating system level. The approach is simple and practical, so let's work through it step by step.
Preface
A common interview question is what IO model lies underneath Redis, Nginx and Netty.
Redis -> epoll
Nginx -> epoll
Netty -> epoll?
To answer it properly, we need to start at the operating system level.
BIO
When a machine boots, the first thing loaded into memory is the kernel, which manages the hardware. The kernel also creates the GDT (Global Descriptor Table) and divides memory into two regions: user space and kernel space. With protected mode enabled, kernel space is protected and user programs cannot modify it.
The CPU has its own instruction set, and its instructions are divided into privilege levels from 0 to 3. The kernel runs at level 0, while applications can only use level 3 instructions.
It follows that an application cannot access the kernel directly, which means it cannot directly access the disk, sound card, network card or other devices. Only the kernel can. So what does an application do?
An application reaches the hardware only by making system calls (syscalls) provided by the kernel. Control passes into the kernel through interrupts, which come in two kinds: soft interrupts and hard interrupts.
Soft interrupt: raised by an instruction in the running program in order to enter the kernel. This is how a system call is made.
Hard interrupt: raised by a device such as the keyboard. When a key is pressed, the hardware triggers an interrupt; the kernel looks up the interrupt number and runs the callback function registered for it.
The point of all this is to introduce one concept: IO has a cost, because every IO operation has to cross from the application into the kernel.
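import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

/**
 * Blocking (BIO) server: one thread per client connection.
 * @author: Mo Xi
 * @create: 2020-07-01-20:40
 */
public class TestSocket {
    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(8090);
        System.out.println("step1: new ServerSocket(8090)");
        while (true) {
            // Blocks until a client connects.
            Socket client = server.accept();
            System.out.println("step2: client " + client.getPort());
            new Thread(() -> {
                try {
                    InputStream in = client.getInputStream();
                    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
                    String line;
                    // readLine() blocks until a line arrives; null means the client has disconnected.
                    while ((line = reader.readLine()) != null) {
                        System.out.println(line);
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }, "T1").start();
        }
    }
}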
We use strace to trace the system calls the program makes to the kernel and write them to output files:
strace -ff -o ./ooxx java TestSocket
Running the program under this command produces the trace output.
Then we look up the process ID of the running TestSocket with the jps command:
jps
2912 Jps
2878 TestSocket
Next we enter the directory below. 2878 is the ID of the process, and this directory holds information about it.
cd /proc/2878
Inside /proc/2878, the task directory lists all the threads of the process.
There is also an fd directory, which lists the file descriptors behind the process's IO streams.
The entries 0, 1 and 2 correspond to the standard input, output and error streams, respectively. In Java streams are objects, whereas in Linux streams are files. Entries 4 and 5 below them are our sockets, corresponding to IPv4 and IPv6 respectively.
We can view the listening socket with the netstat command.
Then we use nc to connect to port 8090:
nc localhost 8090
After running it, netstat shows one additional connection.
The fd directory also contains one additional socket.
Looking at the traced system calls, we find that a connection from client port 58181 has been accepted through a system call. Earlier in the trace we can also see 5, which is the socket file descriptor from the fd directory listing above.
From this we can see that the line in our code
Socket client = server.accept();
corresponds to a system call at the OS level: the kernel's accept is invoked as well.
Several other system calls are involved in socket communication:
bind
connect
listen
select
socket
First of all, Java source is not run directly: javac compiles our .java files into bytecode, the JVM executes that bytecode, and the JVM in turn makes system calls to the OS. Whatever API we call, it must ultimately call into the kernel, which then drives the hardware.
The model above is BIO communication. Its problem is that a lot of blocking occurs, and the only way to keep the main thread from blocking is to use multiple threads. But with a large number of connections, the server has to create a thread for each one, and creating threads consumes resources: each thread has its own exclusive stack (1 MB by default), and CPU time is wasted on scheduling them.
The root cause of all these problems is that BIO blocks.
NIO
Because BIO suffers from thread blocking, NIO (non-blocking IO) was introduced: a read on a non-blocking socket returns immediately whether or not data is available, so a single thread can serve many connections. This brings us to the C10K problem (C10K = 10,000 clients). At any given moment most of the connected clients have nothing to send, and what we really want is to deal with a client only when it actually sends a message.
Instead, the server has to loop over all 10,000 clients on every pass and issue a read for each one, which is very time-consuming because most clients have sent nothing.
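The polling cost described above can be sketched with Java's non-blocking channels. This is a minimal illustration, not code from the original article: the class name `NioPollingSketch` and the methods `pollOnce` and `readsForIdleClients` are hypothetical, and the demo uses loopback connections on a free port chosen by the OS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: one thread polls every client on every pass.
public class NioPollingSketch {

    /** Accepts at most one pending client, then tries to read from every known client.
     *  Returns the number of read() calls issued in this pass. */
    static int pollOnce(ServerSocketChannel server, List<SocketChannel> clients) throws IOException {
        SocketChannel accepted = server.accept(); // non-blocking: null when nobody is waiting
        if (accepted != null) {
            accepted.configureBlocking(false);
            clients.add(accepted);
        }
        int reads = 0;
        ByteBuffer buf = ByteBuffer.allocate(1024);
        for (Iterator<SocketChannel> it = clients.iterator(); it.hasNext();) {
            SocketChannel client = it.next();
            buf.clear();
            int n = client.read(buf); // one read per client, even if it sent nothing
            reads++;
            if (n < 0) {              // the client closed the connection
                client.close();
                it.remove();
            } else if (n > 0) {
                buf.flip();
                System.out.println(StandardCharsets.UTF_8.decode(buf));
            }
        }
        return reads;
    }

    /** Connects the given number of silent clients over loopback and counts the reads of one pass. */
    static int readsForIdleClients(int idleClients) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
            server.configureBlocking(false);
            List<SocketChannel> clientSide = new ArrayList<>();
            List<SocketChannel> serverSide = new ArrayList<>();
            for (int i = 0; i < idleClients; i++) {
                clientSide.add(SocketChannel.open(server.getLocalAddress()));
            }
            // Keep polling until every pending connection has been accepted.
            for (int i = 0; i < 1000 && serverSide.size() < idleClients; i++) {
                pollOnce(server, serverSide);
            }
            int reads = pollOnce(server, serverSide);
            for (SocketChannel c : clientSide) c.close();
            for (SocketChannel c : serverSide) c.close();
            return reads;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() calls in one pass: " + readsForIdleClients(5));
    }
}
```

Every pass issues one read per connected client regardless of whether anything arrived, which is exactly the overhead that grows with the number of clients.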
Multiplexing
Instead of looping over the 10K clients ourselves, we pass the whole set of file descriptors (fds) to the kernel, and the kernel determines which clients have data ready, so the program no longer has to try them all. Select is the multiplexer here: it returns the ready states, and the program then acts on those states and performs the reads itself.
Put simply, the multiplexer tells us which channels are ready, so there is no need to poll all of them.
In this model, select hands the fds to the kernel, and the kernel does the traversal of the 10K descriptors. Compared with making 10K system calls, this single call saves a lot of time, but it still has the following problems:
A large amount of data (the full fd set) is passed on every call, which is repeated work.
The kernel still has to traverse every descriptor (complexity O(N)).
The solution is to set aside a space in the kernel and add each client's fd to it once, when the client connects, so there is no need to pass 10K fds to the kernel on every call. Combined with an event-driven model, in which the kernel records which descriptors become ready as data arrives, the program only has to collect the ready events. This is what epoll provides.
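In Java this model is exposed through `java.nio.channels.Selector`, whose default implementation on Linux is built on epoll. The following is a minimal sketch under that assumption; the class name `SelectorSketch` and the methods `processReady` and `demo` are hypothetical names chosen for illustration. Each channel is registered with the selector once, and each call returns only the channels that are ready.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;

// Hypothetical sketch of an event-driven loop on top of Selector.
public class SelectorSketch {

    /** Waits up to timeoutMillis for events and handles only the ready channels.
     *  Text read from clients is appended to 'received'. Returns the ready-key count. */
    static int processReady(Selector selector, long timeoutMillis, StringBuilder received) throws IOException {
        int ready = selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selected set is not cleared automatically
            if (key.isAcceptable()) {
                SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
                if (client != null) {
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ); // registered once, not per call
                }
            } else if (key.isReadable()) {
                SocketChannel client = (SocketChannel) key.channel();
                ByteBuffer buf = ByteBuffer.allocate(1024);
                int n = client.read(buf);
                if (n < 0) {          // the client closed the connection
                    key.cancel();
                    client.close();
                } else {
                    buf.flip();
                    received.append(StandardCharsets.UTF_8.decode(buf));
                }
            }
        }
        return ready;
    }

    /** Loopback demo: one idle client and one client that sends 'message' (ASCII assumed). */
    static String demo(String message) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            try (SocketChannel idle = SocketChannel.open(server.getLocalAddress());
                 SocketChannel talker = SocketChannel.open(server.getLocalAddress())) {
                talker.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                StringBuilder received = new StringBuilder();
                for (int i = 0; i < 20 && received.length() < message.length(); i++) {
                    processReady(selector, 500, received);
                }
                // Close the server-side channels that were registered with the selector.
                for (SelectionKey key : new ArrayList<>(selector.keys())) {
                    key.channel().close();
                }
                return received.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello"));
    }
}
```

The idle client never causes a read: only the channel that actually has data shows up in the selected keys.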
Both use epoll, but Redis polls and Nginx blocks?
Tracing the execution of nginx and redis with the strace command shows that both use epoll, but nginx blocks while redis polls (non-blocking).
This is because Redis has only one worker thread, and that thread has many things to do: accepting clients, LRU/LFU eviction, and RDB/AOF persistence (forking a child process to back up the data).
Redis also handles the C10K problem through epoll's event-driven model: the operations each client needs are placed in a single serialized queue and executed atomically, one after another. The work for each client consists of read, compute and write.
Redis 6.x adds the concept of IO threads. To keep the serialized, atomic behaviour, computation is still performed serially, but data is read concurrently by multiple threads. Why multithreaded reads? Because a read is a system call that consumes CPU time, and spreading the reads over several threads makes full use of a multicore CPU.
Nginx has only one thing to do, which is wait for clients to arrive. It has nothing else to do in the meantime, so it is set to block.
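The difference between the two waiting styles can be illustrated with Java's `Selector`. This is only an analogy, not the actual code of Redis or Nginx, and the names `WaitStyleSketch`, `waitBlocking`, `pollWithHousekeeping` and `demo` are hypothetical. `select()` with no argument blocks until something is ready, while `select(timeout)` returns after the timeout so the same thread can do other work between waits.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

// Hypothetical sketch contrasting a blocking wait with a timed, polling wait.
public class WaitStyleSketch {

    /** Blocking style: returns only when a channel is ready or wakeup() is called. */
    static int waitBlocking(Selector selector) throws IOException {
        return selector.select();
    }

    /** Polling style: wait at most timeoutMillis, then run other work, and repeat.
     *  Returns how many times the housekeeping task ran. */
    static int pollWithHousekeeping(Selector selector, long timeoutMillis, int passes,
                                    Runnable housekeeping) throws IOException {
        int runs = 0;
        for (int i = 0; i < passes; i++) {
            selector.select(timeoutMillis); // returns after the timeout even if nothing is ready
            // A real loop would handle selector.selectedKeys() here.
            housekeeping.run();             // stand-in for eviction, persistence and similar tasks
            runs++;
        }
        return runs;
    }

    /** Runs the polling loop on a selector with no registered channels. */
    static int demo(int passes) {
        try (Selector selector = Selector.open()) {
            int[] counter = {0};
            pollWithHousekeeping(selector, 10, passes, () -> counter[0]++);
            return counter[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // waitBlocking is not called here: with no channels registered it would never return.
        System.out.println("housekeeping runs: " + demo(3));
    }
}
```

A single-threaded server with background duties needs the timed form so that those duties still run when no client is active; a process whose only job is serving connections can use the blocking form.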
Zero copy
Take Kafka as an example. It has two roles: message producers and message consumers.
We can reduce kernel system calls by mapping a region of memory directly onto the file on disk (memory mapping, mmap). In the traditional approach, the application first asks the kernel, the kernel issues a read request and loads the disk file into kernel memory, and then Kafka copies the data out of the kernel into its own memory.
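A memory-mapped file can be sketched in Java with `RandomAccessFile` and `FileChannel.map`. This illustrates the general technique only and is not Kafka's own code; the class name `MmapSketch`, the method `roundTrip` and the temporary file are assumptions made for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: write to a file through a memory mapping, then read it back normally.
public class MmapSketch {

    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try {
            Path path = Files.createTempFile("mmap-sketch", ".dat");
            path.toFile().deleteOnExit();
            try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw");
                 FileChannel channel = file.getChannel()) {
                // Map the first bytes.length bytes of the file; the file grows to that size.
                MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
                mapped.put(bytes); // a plain memory write, with no write() system call per put
                mapped.force();    // ask the OS to flush the mapped pages to the file
            }
            // Read the file back through the ordinary file API to confirm the contents.
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello mmap"));
    }
}
```

Writes into the mapped buffer land in memory shared with the kernel's view of the file, which is why no separate copy into a kernel buffer is needed for each write.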
So what is zero copy?
Zero copy means that the data is not copied between kernel space and user space. The premise of zero copy is that the application does not need to process the data. In Java, a RandomAccessFile provides a FileChannel, which can work with on-heap or off-heap memory.
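In Java, the usual way to request this kind of transfer is `FileChannel.transferTo`, which lets the operating system move bytes from the file to the target channel without passing them through a user-space buffer where the platform supports it. The sketch below is a minimal illustration, not Kafka's code; `TransferToSketch`, `sendFile` and `demo` are hypothetical names, and the demo targets another file so that it stays self-contained (a server would pass a `SocketChannel` instead).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: hand a whole file to a target channel without reading it into the application.
public class TransferToSketch {

    /** Transfers the entire file to 'target' (assumed to be a blocking channel).
     *  Returns the number of bytes transferred. */
    static long sendFile(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop until done.
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    /** Writes 'text' to a temp file, transfers it to a second temp file, and reads the copy back. */
    static String demo(String text) {
        try {
            Path source = Files.createTempFile("zero-copy-src", ".dat");
            Path destination = Files.createTempFile("zero-copy-dst", ".dat");
            source.toFile().deleteOnExit();
            destination.toFile().deleteOnExit();
            Files.write(source, text.getBytes(StandardCharsets.UTF_8));
            try (FileChannel out = FileChannel.open(destination, StandardOpenOption.WRITE)) {
                sendFile(source, out);
            }
            return new String(Files.readAllBytes(destination), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("zero copy"));
    }
}
```

The application never touches the payload bytes in `sendFile`, which matches the premise above that the data needs no processing on its way through.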
That concludes this bottom-up look at the IO multiplexing model. The best way to deepen your understanding is to try the steps above in practice.