Source: Shulou (shulou.com), updated 2025-04-06
This article explains the principles and applications of the Java I/O system. The explanation is kept simple and clear so that it is easy to follow.
I. Basic concepts
Before introducing how I/O works, let's review a few basic concepts:
1. Operating system and kernel
Operating system: system software that manages computer hardware and software resources
Kernel: the core software of the operating system, responsible for managing the system's processes, memory, device drivers, files, network systems, etc., and providing applications with secure access to computer hardware.
2. Kernel space and user space
To prevent user processes from operating on the kernel directly and to keep the kernel secure, the operating system divides the memory address space into two parts: kernel space (kernel-space), which is used by kernel programs, and user space (user-space), which is used by user processes. Kernel space and user space are isolated from each other, so even if a user program crashes, the kernel is not affected.
3. Data stream
Data in a computer is transmitted as high and low voltage signals over time. These signals are continuous and have a fixed direction, much like water flowing through a pipe. This gives rise to the abstraction of a data stream (I/O stream): an ordered sequence of bytes with a start point and an end point. The purpose of the abstraction is to decouple program logic from the underlying hardware. With the data stream as an abstraction layer between programs and hardware devices, programs are written against general stream input and output interfaces rather than specific hardware features, so programs and underlying hardware can be replaced and extended independently.
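The decoupling described above can be sketched in a few lines. This is a minimal illustration, not code from the original article: the class name StreamCopyDemo and its methods are hypothetical, and in-memory streams stand in for a real device such as a file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the copy logic depends only on the stream interfaces.
final class StreamCopyDemo {

    // Copies every byte from 'in' to 'out' and returns the number of bytes copied.
    // The method knows nothing about the device behind either stream.
    static long copy(InputStream in, OutputStream out) {
        byte[] chunk = new byte[4096];
        long total = 0;
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // In-memory streams stand in for a file or a socket here.
    static String roundTrip(String text) {
        ByteArrayInputStream source =
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(source, sink);
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Because copy only sees InputStream and OutputStream, the same method would work unchanged if the source were a FileInputStream or the sink a socket's output stream.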
II. How I/O works
1. Disk I/O
A typical disk I/O read/write works as follows:
Tips: DMA (Direct Memory Access) is a mechanism that allows peripherals (hardware subsystems) to access the system's main memory directly. With DMA, data transfers between main memory and hardware devices do not need the CPU to schedule the whole transfer.
It is worth noting that:
Read and write operations are implemented through system calls.
Read and write operations pass through both the user buffer and the kernel buffer; the application process cannot manipulate the disk directly.
A read operation blocks the application process until the data has been read.
2. Network I/O
We start with the most classic model, blocking I/O:
Tips: recvfrom is the function that receives data through a socket.
It is worth noting that:
Network read and write operations pass through the user buffer and the Socket buffer.
The server thread is blocked from the moment it calls recvfrom until recvfrom returns with a datagram ready. Once recvfrom returns successfully, the thread starts processing the datagram.
III. Java I/O design
1. I/O classification
Java materializes and implements the data-stream abstraction. When working with Java data streams, the following aspects generally matter:
Direction of the stream: a stream flowing from the outside into the program is called an input stream; a stream flowing from the program to the outside is called an output stream.
Data unit of the stream: a stream whose smallest read/write unit is the byte is called a byte stream; a stream whose smallest read/write unit is the character is called a character stream.
Functional role of the stream:
A stream that reads/writes data directly from/to a specific I/O device (such as a disk or the network) or storage object (such as an in-memory array) is called a node stream.
A stream that wraps an existing stream and implements reading/writing through the wrapped stream is called a processing stream (or filter stream).
2. I/O operation interface
The java.io package contains a large number of I/O classes, which can be confusing at first. On closer inspection there is a pattern: every one of these I/O classes extends one of four basic abstract streams, and each is either a node stream or a processing stream.
(1) Four basic abstract streams
The java.io package contains all the classes needed for stream I/O. It has four basic abstract streams, two for byte streams and two for character streams:
InputStream
OutputStream
Reader
Writer
(2) Node streams
The class name of a node stream consists of the node type + the abstract stream type. Common node types are:
File: a file
Piped: a pipe for communication between threads within a process
ByteArray / CharArray: a byte array / character array
StringBuffer / String: a string buffer / string
A node stream is usually created by passing the data source to the constructor, for example:
    FileReader reader = new FileReader(new File("file.txt"));
    FileWriter writer = new FileWriter(new File("file.txt"));
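The naming rule also covers node streams that sit on in-memory objects. The sketch below is an illustration only (the class NodeStreamDemo and its method are made up for this example); it uses StringReader, a node stream over a String, and CharArrayWriter, a node stream over a character array, so it runs without touching the file system.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example class: String node type + Reader, CharArray node type + Writer.
final class NodeStreamDemo {

    // Reads characters one by one from the String-backed reader and writes
    // their upper-case form into the char-array-backed writer.
    static String upperCaseViaNodeStreams(String text) {
        try (StringReader reader = new StringReader(text);
             CharArrayWriter writer = new CharArrayWriter()) {
            int c;
            while ((c = reader.read()) != -1) {
                writer.write(Character.toUpperCase(c));
            }
            return writer.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In both cases the data source or sink is handed to (or owned by) the constructor, just as the file is handed to FileReader and FileWriter.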
(3) Processing streams
The class name of a processing stream consists of the function added to the wrapped stream + the abstract stream type. Common functions are:
Buffering: adds a buffer to the data read or written by a node stream, so that data can be read and written in batches to improve efficiency. BufferedInputStream and BufferedOutputStream are the common ones.
Conversion from byte stream to character stream: implemented by InputStreamReader and OutputStreamWriter.
Conversion between byte stream and primitive data types (such as int, long, short): implemented by DataInputStream and DataOutputStream.
Conversion between byte stream and object instances: used for object serialization, implemented by ObjectInputStream and ObjectOutputStream.
Processing streams apply the adapter/decorator pattern to convert or extend existing streams. A processing stream is usually created by passing an existing node stream or processing stream to its constructor:
    FileOutputStream fileOutputStream = new FileOutputStream("file.txt");
    // extend it with buffered writes
    BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
    // extend it with writes of primitive data types
    DataOutputStream out = new DataOutputStream(bufferedOutputStream);
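To see the same decoration chain work end to end, here is a small round-trip sketch. It is an illustration under assumptions: the class ProcessingStreamDemo is hypothetical, and a ByteArrayOutputStream replaces the FileOutputStream so the example does not write to disk.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example: Data* wraps Buffered* wraps a ByteArray node stream.
final class ProcessingStreamDemo {

    // Writes an int and a string through the decorated stream and returns the raw bytes.
    static byte[] encode(int id, String name) {
        ByteArrayOutputStream node = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(node))) {
            out.writeInt(id);    // 4 bytes
            out.writeUTF(name);  // 2-byte length prefix + encoded characters
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the outer stream flushed the buffer into the node stream.
        return node.toByteArray();
    }

    // Reads the values back in the same order they were written.
    static String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(bytes)))) {
            int id = in.readInt();
            String name = in.readUTF();
            return id + ":" + name;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each wrapper adds one function (primitive-type conversion, buffering) without the node stream having to know about it, which is the point of the decorator pattern here.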
3. Java NIO
(1) Problems with standard I/O
Java NIO (New I/O) is an I/O API, available since Java 1.4, that can replace the standard Java I/O API. Java NIO works differently from standard Java I/O in order to solve the following problems of standard I/O:
a. Multiple copies of data
With standard I/O, one complete read-and-write of data requires at least the following: reading from the underlying hardware into kernel space, reading from kernel space into user space, writing from user space into kernel space, and writing from kernel space to the underlying hardware.
In addition, when system calls are made through functions such as write and read, the start address and length of the buffer holding the data must be passed in. Because of JVM GC, objects in the heap are frequently moved, and after a move the address passed to the system function is no longer the real buffer address.
This could cause read and write errors. To avoid it, a system call made through standard I/O incurs an additional copy: the data is copied from the JVM heap into contiguous memory outside the heap (off-heap memory).
In total, the data is therefore copied 6 times, and execution efficiency is low.
b. Blocking operations
In traditional network I/O processing, threads block while waiting for a connection to be established (connect), reading network I/O data (read), sending data (send), and so on.
    // wait for a connection
    Socket socket = serverSocket.accept();
    // connection established, read the request message
    StringBuilder req = new StringBuilder();
    byte[] recvByteBuf = new byte[1024];
    int len;
    while ((len = socket.getInputStream().read(recvByteBuf)) != -1) {
        // decode only the bytes that were actually read
        req.append(new String(recvByteBuf, 0, len, StandardCharsets.UTF_8));
    }
    // write the response message
    socket.getOutputStream().write("server response msg".getBytes());
    socket.shutdownOutput();
Take the server program above as an example. After the connection is established and the server calls read to read the request message, the client data may not be ready yet (for example, the client is still writing it or it is still in transit), so the thread has to block inside read until the data is ready.
To respond to clients concurrently, the server needs a separate thread for each connection. When the number of concurrent requests is large, the memory and thread-switching overhead of maintaining all these connections becomes too high.
(2) Buffer
The three core components of Java NIO are Buffer, Channel and Selector.
Buffer provides the buffers commonly used for I/O operations. Common buffer types are ByteBuffer, CharBuffer, DoubleBuffer, FloatBuffer, IntBuffer, LongBuffer and ShortBuffer, corresponding to the primitive types byte, char, double, float, int, long and short. The description below mainly uses the most common one, ByteBuffer, as the example. Buffer supports both Java off-heap memory and on-heap memory.
Off-heap memory is the counterpart of on-heap memory: memory objects are allocated outside the JVM heap, and this memory is managed directly by the operating system (rather than the virtual machine). Compared with on-heap memory, the advantages of using off-heap memory in I/O operations are:
It does not need to be reclaimed by JVM GC threads, which reduces the resources consumed by GC threads.
When making I/O system calls, operating directly on off-heap memory saves one copy between off-heap memory and on-heap memory.
Under the hood, ByteBuffer allocates and releases off-heap memory through the malloc and free functions. The public allocateDirect method requests off-heap memory and returns a DirectByteBuffer object, which extends ByteBuffer:
    public static ByteBuffer allocateDirect(int capacity) {
        return new DirectByteBuffer(capacity);
    }
Off-heap memory is reclaimed through the Cleaner held as a member variable of DirectByteBuffer, which provides a clean method for active reclamation. Netty reclaims most of its off-heap memory by keeping a reference to the Cleaner and actively calling clean. In addition, when a DirectByteBuffer object is garbage collected, the off-heap memory associated with it is reclaimed as well.
Tips: setting the JVM parameter -XX:+DisableExplicitGC is not recommended, because some frameworks that depend on Java NIO (such as Netty) actively call System.gc() when memory is abnormally exhausted, triggering a Full GC that reclaims DirectByteBuffer objects as the last safeguard for reclaiming off-heap memory. With this parameter set, off-heap memory is not cleaned up in that situation.
In short, off-heap memory is reclaimed by the Cleaner object that is a member variable of DirectByteBuffer (the class underlying direct ByteBuffers), which calls unsafe.freeMemory(address) at the appropriate time.
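The difference between the two kinds of buffer is visible through the public ByteBuffer API. The following is a minimal sketch; the class DirectBufferDemo and its methods are hypothetical names used for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical example comparing an on-heap buffer with an off-heap (direct) buffer.
final class DirectBufferDemo {

    // hasArray() is true only when the buffer is backed by an accessible on-heap byte[].
    static String describe(ByteBuffer buffer) {
        return (buffer.isDirect() ? "direct" : "heap")
                + ", hasArray=" + buffer.hasArray()
                + ", capacity=" + buffer.capacity();
    }

    static String heap() {
        return describe(ByteBuffer.allocate(1024));        // memory inside the JVM heap
    }

    static String direct() {
        return describe(ByteBuffer.allocateDirect(1024));  // memory outside the JVM heap
    }
}
```

Calling heap() yields "heap, hasArray=true, capacity=1024" and direct() yields "direct, hasArray=false, capacity=1024": the direct buffer has no on-heap array that the GC could move.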
A Buffer can be thought of as an array of a primitive type stored at contiguous addresses. It supports read and write operations, corresponding to read mode and write mode, and tracks the current state of its data through three variables: capacity, position and limit:
capacity: the total length of the buffer array
position: the position of the next data element to be operated on
limit: the position of the first element in the buffer array that cannot be operated on; limit <= capacity
[...]
    ... 0) {
        // flip the buffer from being written to, to being read from
        buffer.flip();
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        String body = new String(bytes, StandardCharsets.UTF_8);
        System.out.println("server received: " + body);
    }
    }
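How capacity, position and limit change between write mode and read mode can be observed directly. The sketch below is illustrative only; the class BufferStateDemo and the 8-byte buffer size are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical example that records the buffer state after each operation.
final class BufferStateDemo {

    static String state(ByteBuffer b) {
        return "position=" + b.position() + " limit=" + b.limit() + " capacity=" + b.capacity();
    }

    static String[] walkThrough() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        String fresh = state(buf);                           // write mode, nothing written yet

        buf.put("abc".getBytes(StandardCharsets.US_ASCII));
        String afterPut = state(buf);                        // position advanced by 3

        buf.flip();                                          // switch to read mode
        String afterFlip = state(buf);                       // limit = old position, position = 0

        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        String afterGet = state(buf);                        // everything readable was consumed

        buf.clear();                                         // back to write mode
        String afterClear = state(buf);                      // same as a fresh buffer

        return new String[] {fresh, afterPut, afterFlip, afterGet, afterClear};
    }
}
```

The five recorded states are, in order: position=0 limit=8, position=3 limit=8, position=0 limit=3, position=3 limit=3, and position=0 limit=8, each with capacity=8. Note that clear() only resets the markers; it does not erase the bytes.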
(4) Selector
Selector is one of the core components of Java NIO. It checks whether one or more NIO Channels are in a readable or writable state, which lets a single thread manage multiple Channels, that is, multiple network connections.
At its core, Selector builds on the I/O multiplexing facility provided by the operating system: a single thread can monitor multiple connection descriptors at the same time, and as soon as a connection is ready (usually read-ready or write-ready) the program is notified so it can read or write accordingly. Different implementations exist, such as select, poll and epoll.
The basic working principle of the Java NIO Selector is as follows:
Initialize the Selector object and the server-side ServerSocketChannel object.
Register the ServerSocketChannel's socket-accept event with the Selector.
The thread blocks in selector.select(). When a client connects to the server, the thread leaves the blocked state.
Get all ready events from the selector. The first one is the socket-accept event; register the client SocketChannel's data-readable event with the Selector.
The thread blocks in selector.select() again until the client connection's data is ready and readable.
Read the client request data through a ByteBuffer, then write the response data and close the channel.
An example follows; the complete runnable code has been uploaded to GitHub (https://github.com/caison/caison-blog-demo):
    Selector selector = Selector.open();
    ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
    serverSocketChannel.bind(new InetSocketAddress(9091));
    // configure the channel to be non-blocking
    serverSocketChannel.configureBlocking(false);
    // register the server-side socket-accept event
    serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT);
    while (true) {
        // selector.select() blocks until a channel has a ready operation
        selector.select();
        // every channel associated with these SelectionKeys has a ready event
        Iterator<SelectionKey> keyIterator = selector.selectedKeys().iterator();
        while (keyIterator.hasNext()) {
            SelectionKey key = keyIterator.next();
            // server socket-accept
            if (key.isAcceptable()) {
                // get the channel of the client connection
                SocketChannel clientSocketChannel = serverSocketChannel.accept();
                // set it to non-blocking mode
                clientSocketChannel.configureBlocking(false);
                // register for the client channel's readable event and attach a newly allocated buffer to the channel
                clientSocketChannel.register(selector, SelectionKey.OP_READ, ByteBuffer.allocateDirect(1024));
            }
            // channel readable
            if (key.isReadable()) {
                SocketChannel socketChannel = (SocketChannel) key.channel();
                ByteBuffer buf = (ByteBuffer) key.attachment();
                int bytesRead;
                StringBuilder reqMsg = new StringBuilder();
                while ((bytesRead = socketChannel.read(buf)) > 0) {
                    // switch buf from write mode to read mode
                    buf.flip();
                    int bufRemain = buf.remaining();
                    byte[] bytes = new byte[bufRemain];
                    buf.get(bytes, 0, bytesRead);
                    // when a packet is larger than the ByteBuffer, sticky/split packet problems may occur here
                    reqMsg.append(new String(bytes, StandardCharsets.UTF_8));
                    buf.clear();
                }
                System.out.println("the server received the message: " + reqMsg.toString());
                if (bytesRead == -1) {
                    byte[] bytes = "[this is the message returned by the server]".getBytes(StandardCharsets.UTF_8);
                    int length;
                    for (int offset = 0; offset < bytes.length; offset += length) {
                        length = Math.min(buf.capacity(), bytes.length - offset);
                        buf.clear();
                        buf.put(bytes, offset, length);
                        buf.flip();
                        socketChannel.write(buf);
                    }
                    socketChannel.close();
                }
            }
            // The Selector does not remove SelectionKey instances from selectedKeys by itself;
            // remove them yourself once the channel has been handled. The next time the channel
            // becomes ready, the Selector puts it into selectedKeys again.
            keyIterator.remove();
        }
    }
Tips: implementing high-performance network I/O directly on the Java NIO Selector is cumbersome and not user-friendly. In practice the industry generally uses the Netty framework, which wraps and optimizes Java NIO and adds rich functionality, for a more elegant implementation.
IV. High-performance I/O optimization
The following introduces high-performance I/O optimizations with reference to popular open-source projects.
1. Zero copy
Zero copy is a technique for reducing, or avoiding entirely, unnecessary CPU copies while reading and writing data. It lowers memory bandwidth usage and improves execution efficiency. Zero copy can be implemented on several different principles; the implementations in common open-source projects are described below.
(1) Kafka zero copy
Kafka implements zero copy with the sendfile function, which was introduced in the Linux 2.1 kernel and improved in the 2.4 kernel, combined with the DMA Gather Copy provided by the hardware, to send files over a socket.
The function transfers the file with a single system call, which reduces the mode switches of the original read/write approach and also reduces the number of data copies. The basic flow of sendfile is as follows:
The user process issues the sendfile system call.
The kernel uses DMA Copy to copy the file data from disk into the kernel buffer.
The kernel copies the file description information (file descriptor, data length) from the kernel buffer into the Socket buffer.
The kernel uses the file description information in the Socket buffer and the Gather Copy function provided by the DMA hardware to copy the kernel buffer data to the network card.
The sendfile system call completes and returns to the user process.
Compared with traditional I/O, zero copy based on sendfile + DMA Gather Copy reduces the number of data copies from 4 to 2 (both of them DMA copies), the number of system calls from 2 to 1, and the number of user-process context switches from 4 to 2, which greatly improves processing efficiency.
Under the hood, Kafka relies on the transferTo method of java.nio.channels.FileChannel:
    public abstract long transferTo(long position, long count, WritableByteChannel target)
transferTo sends the file associated with the FileChannel to the specified channel. When a Consumer consumes data, the Kafka server uses FileChannel to send the message data in the file to a SocketChannel.
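The transferTo call can be tried out with a temporary file. This is a hedged sketch rather than Kafka's code: the class TransferToDemo and its methods are hypothetical, and an in-memory channel stands in for the SocketChannel, so the JDK falls back to an ordinary copy here instead of the kernel-level sendfile path. What the example does show is the calling pattern, including the loop that is needed because transferTo may transfer fewer bytes than requested.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example of sending a whole file to a channel with transferTo.
final class TransferToDemo {

    // Sends the whole file to 'target' and returns the number of bytes sent.
    static long sendFile(Path file, WritableByteChannel target) {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            while (position < size) {
                // transferTo returns how many bytes were actually transferred
                position += source.transferTo(position, size - position, target);
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes 'content' to a temp file, sends it to an in-memory channel, returns what arrived.
    static String demo(String content) {
        try {
            Path tmp = Files.createTempFile("transfer-demo", ".log");
            try {
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                try (WritableByteChannel target = Channels.newChannel(sink)) {
                    sendFile(tmp, target);
                }
                return new String(sink.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real server the target would be the SocketChannel of the consumer connection, which is what allows the operating system to avoid copying the data through user space.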
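(2) RocketMQ zero copy
RocketMQ implements zero copy with mmap + write: mmap() maps the address of a buffer in the kernel to a buffer in user space so that the data is shared, which removes the copy from the kernel buffer to the user buffer.
    tmp_buf = mmap(file, len);
    write(socket, tmp_buf, len);
The basic flow of zero copy with mmap + write is as follows:
The user process issues the mmap system call to the kernel.
The read buffer in the kernel space of the user process is mapped to the buffer in user space by memory address.
The kernel uses DMA Copy to copy the file data from disk into the kernel buffer.
The mmap system call completes and returns to the user process.
The user process issues the write system call to the kernel.
The kernel uses CPU Copy to copy the data from the kernel buffer into the Socket buffer.
The kernel uses DMA Copy to copy the data from the Socket buffer to the network card.
The write system call completes and returns to the user process.
In RocketMQ, the logic for storing and loading messages through mmap lives in org.apache.rocketmq.store.MappedFile. Internally it uses java.nio.MappedByteBuffer, obtaining the mmap buffer from the map method of FileChannel:
    // initialization
    this.fileChannel = new RandomAccessFile(this.file, "rw").getChannel();
    this.mappedByteBuffer = this.fileChannel.map(MapMode.READ_WRITE, 0, fileSize);
When querying messages in the CommitLog, the lookup is based on the offset pos into mappedByteBuffer and the data size size:
    public SelectMappedBufferResult selectMappedBuffer(int pos, int size) {
        int readPosition = getReadPosition();
        // ... various safety checks
        // return a view of mappedByteBuffer
        ByteBuffer byteBuffer = this.mappedByteBuffer.slice();
        byteBuffer.position(pos);
        ByteBuffer byteBufferNew = byteBuffer.slice();
        byteBufferNew.limit(size);
        return new SelectMappedBufferResult(this.fileFromOffset + pos, byteBufferNew, size, this);
    }
Tips: the transientStorePoolEnable mechanism. Part of the memory mapped by Java NIO mmap is not resident and can be swapped out to swap space (virtual memory). To improve message-sending performance, RocketMQ introduces a memory-locking mechanism: the CommitLog files that will be needed soon are mapped into memory and locked there, which ensures that these files always stay in memory. The parameter that controls this mechanism is transientStorePoolEnable.
As a result, MappedFile has two ways of flushing CommitLog data to disk:
transientStorePoolEnable enabled: write to the in-memory byte buffer (writeBuffer) -> commit from the in-memory byte buffer (writeBuffer) to the file channel (fileChannel) -> flush the file channel (fileChannel) to disk
transientStorePoolEnable not enabled: write to the mapped-file byte buffer (mappedByteBuffer) -> flush the mapped-file byte buffer (mappedByteBuffer) to disk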
RocketMQ implements zero copy based on mmap + write, which suits persisting and transmitting small blocks of data such as business-level messages. Kafka uses sendfile-based zero copy, which suits persisting and transmitting high-throughput, large blocks of data such as system log messages.
Tips: Kafka's index files use the mmap + write approach, while its data files are sent to the network with sendfile.
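The mmap path can be exercised with the JDK alone through FileChannel.map. The sketch below is not RocketMQ's implementation; MappedFileDemo is a hypothetical class that maps a temporary file, writes through the mapping and reads the bytes back from the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example of writing a file through a memory mapping.
final class MappedFileDemo {

    // Writes 'message' (assumed non-empty) through a mapping, then reads it back from the file.
    static String writeAndReadBack(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try {
            Path tmp = Files.createTempFile("mapped-demo", ".dat");
            tmp.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(
                    tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping READ_WRITE beyond the current size grows the file to payload.length.
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
                mapped.put(payload);  // a plain memory write, no write() system call
                mapped.force();       // ask the OS to flush the mapped region to disk
            }
            return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The write itself is a memory operation on the mapped region; force() plays the role of the explicit flush step in the second flush path listed above.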
(3) Netty zero copy
Netty has two kinds of zero copy:
The first is zero copy implemented by the operating system, built on the transferTo method of FileChannel.
The second is an optimization at the Java layer: array buffer objects (ByteBuf) are wrapped and optimized. By creating data views over ByteBuf data, ByteBuf objects can be merged and split while only one copy of the underlying data is kept, which avoids unnecessary copies.
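The idea of a data view that shares storage can be illustrated without Netty. The sketch below is only a JDK analogy using ByteBuffer.duplicate() and slice(); it does not use Netty's ByteBuf API, and the class BufferViewDemo is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only analogy for splitting a buffer into views without copying.
final class BufferViewDemo {

    // Returns a view of [offset, offset + length) that shares storage with 'source'.
    static ByteBuffer view(ByteBuffer source, int offset, int length) {
        ByteBuffer dup = source.duplicate();  // own position/limit, same underlying bytes
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice();
    }

    static String demo() {
        ByteBuffer packet = ByteBuffer.wrap("HEADbody".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer header = view(packet, 0, 4);
        ByteBuffer body = view(packet, 4, 4);

        // Writing through the view changes the original buffer: no bytes were copied.
        body.put(0, (byte) 'B');

        return (char) packet.get(4) + ":" + header.remaining() + ":" + body.remaining();
    }
}
```

demo() returns "B:4:4": the byte changed through the body view is visible in the original packet, and both views are 4 bytes long while only one copy of the data exists.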
2. Multiplexing
With Netty's encapsulation and optimization of Java NIO, the code for I/O multiplexing is much more elegant:
    // create the mainReactor
    NioEventLoopGroup bossGroup = new NioEventLoopGroup();
    // create the worker thread group
    NioEventLoopGroup workerGroup = new NioEventLoopGroup();
    final ServerBootstrap serverBootstrap = new ServerBootstrap();
    serverBootstrap
            // assemble the NioEventLoopGroups
            .group(bossGroup, workerGroup)
            // set the channel type to NIO
            .channel(NioServerSocketChannel.class)
            // set connection configuration parameters
            .option(ChannelOption.SO_BACKLOG, 1024)
            .childOption(ChannelOption.SO_KEEPALIVE, true)
            .childOption(ChannelOption.TCP_NODELAY, true)
            // configure the inbound and outbound event handlers
            .childHandler(new ChannelInitializer<NioSocketChannel>() {
                @Override
                protected void initChannel(NioSocketChannel ch) {
                    // configure the inbound and outbound event channel
                    ch.pipeline().addLast(...);
                    ch.pipeline().addLast(...);
                }
            });
    // bind the port
    int port = 8080;
    serverBootstrap.bind(port).addListener(future -> {
        if (future.isSuccess()) {
            System.out.println(new Date() + ": Port [" + port + "] bound successfully!");
        } else {
            System.err.println("Port [" + port + "] binding failed!");
        }
    });
3. Page cache (PageCache)
The page cache (PageCache) is the operating system's cache of files, used to reduce the number of disk I/O operations. Its contents are physical blocks on disk, organized in pages. The page cache lets programs read and write files sequentially at close to memory speed, mainly because the OS uses the PageCache mechanism to optimize read and write access:
Page cache read strategy: when a process issues a read operation (for example, a read() system call), the kernel first checks whether the required data is in the page cache:
If it is, the disk is not accessed and the data is read directly from the page cache.
If it is not, the kernel schedules a block I/O operation to read the data from disk, reads ahead the next few pages (at least one page, usually three), and then puts the data into the page cache.
Page cache write strategy: when a process issues a write system call to write data to a file, the data is first written to the page cache and the call returns. At this point the data has not actually been saved to the file; Linux only marks the page in the page cache as "dirty" and adds it to the dirty page list.
The flusher writeback threads then periodically write the pages in the dirty page list to disk so that the data on disk is consistent with memory, and finally clear the "dirty" mark. Dirty pages are written back to disk in the following three cases:
When free memory falls below a specific threshold
When a dirty page has stayed in memory longer than a specific threshold
When the user process calls the sync() or fsync() system call
In RocketMQ, the ConsumeQueue (the logical consumption queue) stores little data and is read sequentially. Thanks to the read-ahead of the page cache, reading ConsumeQueue files performs almost as well as reading memory, and performance is not affected even when messages accumulate. RocketMQ provides two message flush strategies:
Synchronous flush: the RocketMQ broker returns a success ACK to the Producer only after the message has actually been persisted to disk.
Asynchronous flush: takes full advantage of the operating system's PageCache. As soon as the message is written into PageCache, the success ACK is returned to the Producer. The flush to disk is performed by a background asynchronous thread, which lowers read and write latency and improves the performance and throughput of the MQ.
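The difference between the two flush strategies can be sketched with FileChannel. This is an illustration under assumptions, not RocketMQ's code: FlushDemo and its methods are hypothetical, and FileChannel.force(true) is used as the Java-level counterpart of fsync().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example contrasting "return after page cache" with "return after flush".
final class FlushDemo {

    // Appends 'data' and returns the file size afterwards.
    // With syncFlush = false the method returns once the data is in the page cache;
    // with syncFlush = true it first asks the OS to write the data to the storage device.
    static long append(Path file, byte[] data, boolean syncFlush) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            if (syncFlush) {
                channel.force(true);  // flush file content and metadata
            }
            return channel.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long demo() {
        try {
            Path tmp = Files.createTempFile("flush-demo", ".log");
            tmp.toFile().deleteOnExit();
            append(tmp, "async".getBytes(StandardCharsets.US_ASCII), false);
            return append(tmp, "sync".getBytes(StandardCharsets.US_ASCII), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both calls make the data visible to readers immediately because reads are served from the page cache; only the synchronous variant waits for the data to reach the storage device, which is where the extra latency comes from.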
Kafka also uses the page cache for high-performance reading and writing of messages; this is not covered further here.
Thank you for reading. That concludes "the principle and application of the Java I/O system". After studying this article you should have a deeper understanding of the principles and applications of the Java I/O system; how to apply them in a specific case still needs to be verified in practice.