Shulou (Shulou.com) 06/03 report, SLTechnology News&Howtos, Development. Updated 2025-04-06.
This article explains how to understand java.io and character encoding: the byte streams and character streams in java.io, why garbled text appears, and how character sets relate to character encodings.
1 java.io byte streams
[Figure: byte stream class diagram (Inputstream.png)]
LineNumberInputStream and StringBufferInputStream are deprecated; the official recommendation is to use LineNumberReader and StringReader instead.
ByteArrayInputStream and ByteArrayOutputStream are byte-array streams: they use a buffer in memory as a stream. Reading data from an in-memory buffer is faster than reading from a storage medium such as a disk.
    // Use ByteArrayOutputStream to temporarily cache data from another source
    ByteArrayOutputStream data = new ByteArrayOutputStream(1024); // initial buffer capacity: 1024 bytes
    data.write(System.in.read()); // store one byte of user input
    // Expose the cached bytes as a ByteArrayInputStream
    ByteArrayInputStream in = new ByteArrayInputStream(data.toByteArray());
FileInputStream and FileOutputStream access files: one reads a file as an InputStream and the other writes a file as an OutputStream.
ObjectInputStream and ObjectOutputStream are object streams. Their constructors take another stream, through which they read and write Java objects. They can be used for serialization, and the objects must implement the Serializable interface.
    // Write a Java object
    FileOutputStream fileOut = new FileOutputStream("example.txt");
    ObjectOutputStream out = new ObjectOutputStream(fileOut);
    Example example = new Example(); // Example must implement Serializable
    out.writeObject(example);
    out.close();
    // Read the Java object back
    FileInputStream fileIn = new FileInputStream("example.txt");
    ObjectInputStream in = new ObjectInputStream(fileIn);
    Example restored = (Example) in.readObject();
    in.close();
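The object-stream round trip can also be tried without touching the file system. The following is a minimal sketch, assuming an in-memory buffer in place of "example.txt"; the nested Example class and its fields are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationSketch {
    // Hypothetical value class; it must implement Serializable to be written.
    static final class Example implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int count;

        Example(String name, int count) {
            this.name = name;
            this.count = count;
        }
    }

    // Serializes the object into a byte array and reads it back as a new instance.
    static Example roundTrip(Example original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Example) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Example copy = roundTrip(new Example("demo", 3));
        System.out.println(copy.name + " " + copy.count);
    }
}
```

The try-with-resources blocks close each stream even when an exception is thrown, which the shorter snippet leaves to the caller.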
PipedInputStream and PipedOutputStream are piped streams, suited to transferring data between two threads: one thread writes data to the piped output stream, and the other thread reads it from the piped input stream.
    // Create the sender and receiver (threads that each own one end of the pipe)
    Sender sender = new Sender();
    Receiver receiver = new Receiver();
    // Get the piped output and input streams
    PipedOutputStream outputStream = sender.getOutputStream();
    PipedInputStream inputStream = receiver.getInputStream();
    // Connect the two ends; this step is required before any data is transferred
    outputStream.connect(inputStream);
    sender.start();   // start the sender thread
    receiver.start(); // start the receiver thread
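Sender and Receiver are not defined in the article, so here is a self-contained sketch of the same idea under different assumptions: the sender is a plain Thread with a lambda, the calling thread acts as the receiver, and the two ends are connected through the PipedInputStream constructor rather than connect(). It uses InputStream.readAllBytes(), which requires Java 9 or later.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PipeSketch {
    // Sends the message from a sender thread and receives it on the calling thread.
    static String sendThroughPipe(String message) {
        try {
            PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out); // connects both ends

            Thread sender = new Thread(() -> {
                // Closing the output end signals end-of-stream to the reader.
                try (PipedOutputStream o = out) {
                    o.write(message.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            byte[] received = in.readAllBytes(); // blocks until the sender closes its end
            in.close();
            sender.join();
            return new String(received, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendThroughPipe("hello from the sender thread"));
    }
}
```

Reading and writing on the same thread is best avoided with piped streams, because the pipe's buffer is bounded and a single thread can block itself.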
SequenceInputStream combines multiple InputStreams into a single InputStream, so an application can read several input streams one after another as if they were one.
    InputStream in1 = new FileInputStream("example1.txt");
    InputStream in2 = new FileInputStream("example2.txt");
    SequenceInputStream sequenceInputStream = new SequenceInputStream(in1, in2);
    // Read one byte; in2 is read only after in1 is exhausted
    int data = sequenceInputStream.read();
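To see the streams being consumed in order, the sketch below reads a SequenceInputStream to the end. It assumes in-memory ByteArrayInputStreams instead of the two files so that it runs anywhere; the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SequenceSketch {
    // Reads two in-memory streams back to back through one SequenceInputStream.
    static String concat(String first, String second) {
        InputStream in1 = new ByteArrayInputStream(first.getBytes(StandardCharsets.UTF_8));
        InputStream in2 = new ByteArrayInputStream(second.getBytes(StandardCharsets.UTF_8));
        try (SequenceInputStream seq = new SequenceInputStream(in1, in2)) {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            int b;
            while ((b = seq.read()) != -1) { // -1 only after both streams are exhausted
                collected.write(b);
            }
            return new String(collected.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("first part, ", "second part"));
    }
}
```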
FilterInputStream and FilterOutputStream use the decorator pattern to add functionality to a stream; the constructors of their subclasses take an InputStream or OutputStream to wrap.
    ByteArrayOutputStream out = new ByteArrayOutputStream(2014);
    // Writing: decorate an OutputStream with DataOutputStream to write primitive values
    DataOutputStream dataOut = new DataOutputStream(out);
    dataOut.writeDouble(2014);
    // Reading: decorate an InputStream with DataInputStream to read them back
    ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
    DataInputStream dataIn = new DataInputStream(in);
    double data = dataIn.readDouble();
DataInputStream and DataOutputStream (Filter stream subclasses) add to other streams the ability to read and write primitive types and strings, such as byte, int, and String.
BufferedInputStream and BufferedOutputStream (Filter stream subclasses) add buffering to other streams.
PushbackInputStream (FilterInputStream subclass) supports pushing data back: bytes that have already been read can be pushed back into the stream's buffer and read again.
PrintStream (FilterOutputStream subclass) is a printing stream. System.out is itself a PrintStream, so it offers the same methods as System.out.print.
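Pushing data back is the least obvious of these decorators, so here is a small sketch of a "peek" built on PushbackInputStream: it reads one byte, pushes it back with unread(), and then reads the whole stream again. The class name, method name, and sample text are illustrative assumptions; readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PushbackSketch {
    // Peeks at the first byte, pushes it back, then reads the whole stream.
    // Returns "<peeked>|<everything>" so both results are visible.
    static String peekThenRead(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        try (PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(bytes))) {
            int first = in.read();
            if (first != -1) {
                in.unread(first); // the byte will be returned again by the next read
            }
            String peeked = first == -1 ? "" : String.valueOf((char) first);
            String everything = new String(in.readAllBytes(), StandardCharsets.US_ASCII);
            return peeked + "|" + everything;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(peekThenRead("#comment line"));
    }
}
```

The default pushback buffer holds a single byte; a larger buffer can be requested through the two-argument constructor.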
2 java.io character streams
[Figure: character stream class diagram (21.png)]
As the byte stream and character stream diagrams show, the two families mirror each other; for example, CharArrayReader corresponds to ByteArrayInputStream.
Converting between byte streams and character streams: InputStreamReader converts an InputStream to a Reader, and OutputStreamWriter converts an OutputStream to a Writer.
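    // InputStream to Reader
    InputStream inputStream = new ByteArrayInputStream("程序".getBytes(StandardCharsets.UTF_8));
    InputStreamReader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
    // OutputStream to Writer (no charset given, so the default charset is used)
    OutputStream out = new FileOutputStream("example.txt");
    OutputStreamWriter writer = new OutputStreamWriter(out);
    // Read and write in units of characters
    char[] chars = new char[2];
    int count = reader.read(chars); // number of characters read, or -1 at end of stream
    if (count > 0) {
        writer.write(chars, 0, count);
    }
    writer.flush();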
Difference: a byte stream reads in units of bytes, while a character stream reads in units of characters. A character is made up of one or more bytes; for example, the variable-length encoding UTF-8 represents a character with 1 to 4 bytes.
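Because a Reader decodes bytes into characters and a Writer encodes characters back into bytes, chaining the two re-encodes data from one charset to another. The sketch below illustrates this with in-memory streams; the transcode helper and the UTF-8 to UTF-16BE choice are assumptions made for the example, and the string is written with Unicode escapes (U+7A0B U+5E8F) so the source file stays ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeSketch {
    // Decodes the input bytes with one charset and re-encodes them with another.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer writer = new OutputStreamWriter(sink, to)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n); // write only the characters actually read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray(); // the writer was flushed when it was closed
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u7A0B\u5E8F".getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length + " UTF-8 bytes -> " + utf16.length + " UTF-16BE bytes");
    }
}
```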
3 Garbled text and character streams
The same character has a different byte length under different encodings. For example, the UTF-8 encoding of "程" is the three bytes [-25][-88][-117], while encoding it as ISO_8859_1 yields the single byte [63], which is "?", because ISO_8859_1 cannot represent the character.
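The byte values quoted above can be reproduced with String.getBytes. This is a small sketch; the format helper is an illustrative name, and the character is written as the escape for U+7A0B so that the source file itself stays ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedBytesSketch {
    // Formats the encoded bytes of a string as [b1][b2]... using signed byte values.
    static String format(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append('[').append(b).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cheng = "\u7A0B"; // the character U+7A0B
        System.out.println(format(cheng, StandardCharsets.UTF_8));      // [-25][-88][-117]
        System.out.println(format(cheng, StandardCharsets.ISO_8859_1)); // [63], i.e. '?'
    }
}
```

The values are negative because Java's byte type is signed: -25 is the same bit pattern as 0xE7.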
Resources are usually handled as byte streams, but the same data converts to different bytes under different character encodings, which easily leads to garbled text.
There are two common garbled-text scenarios.
Scenario 1: the character encodings used to encode and decode are inconsistent. The resource is encoded in UTF-8, but the code decodes it as GBK.
Scenario 2: a byte stream is read in chunks that do not align with character boundaries. Characters are made up of bytes; the UTF-8 form of "程", for example, is three bytes. If an InputStream is read two bytes at a time and each chunk is converted to a String (assuming the default encoding is UTF-8), the result is garbled, because a chunk holds only part of a Chinese character.
    ByteArrayInputStream in = new ByteArrayInputStream("程序".getBytes(StandardCharsets.UTF_8));
    byte[] buf = new byte[2]; // read only two bytes of the stream
    in.read(buf);             // read data
    System.out.println(new String(buf, StandardCharsets.UTF_8)); // garbled
    ---result---
    (garbled output: the two bytes are only part of a three-byte character)
Scenario 1: if you know the character encoding of the resource, decode it with that same encoding.
Scenario 2: you can read all the bytes first and decode them in one pass. For large files or streams this is not realistic, which is why character streams exist.
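Scenario 1 can be sketched in a few lines. The article's example decodes UTF-8 bytes as GBK; this sketch swaps in ISO-8859-1 as the wrong charset, because it is guaranteed to be available through StandardCharsets. The mismatch works the same way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MismatchSketch {
    // Decodes the given bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u7A0B"; // the character U+7A0B
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);      // same charset: round-trips
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1); // wrong charset: garbled

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.length());         // 3, one char per byte
    }
}
```

ISO-8859-1 maps every byte to exactly one character, so the three UTF-8 bytes turn into three unrelated characters instead of one.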
A byte stream is converted to a character stream with InputStreamReader and OutputStreamWriter. Both let you specify the character encoding, after which data is processed in units of characters, which avoids garbled text.
    InputStreamReader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
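The difference between decoding raw byte chunks and reading through an InputStreamReader can be checked side by side. In this sketch (method names are illustrative), both methods work through the same UTF-8 bytes in small pieces, but only the reader keeps track of characters that span a chunk boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ChunkedDecodeSketch {
    // Decodes each fixed-size byte chunk on its own: characters split across chunks break.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Reads through an InputStreamReader: the buffer is measured in characters, not bytes.
    static String decodeWithReader(byte[] data, int bufferChars) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader =
                 new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buffer = new char[bufferChars];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u7A0B\u5E8F"; // two characters, three UTF-8 bytes each
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodePerChunk(data, 2).equals(text));   // false: garbled
        System.out.println(decodeWithReader(data, 2).equals(text)); // true
    }
}
```

When a chunk ends in the middle of a character, new String substitutes the replacement character U+FFFD for the incomplete bytes, which is what shows up as garbled output.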
4 Character sets versus character encodings
A character set is the specification, and a character encoding is a concrete implementation of it. The character set defines a unique mapping between each symbol and a numeric code value (its code point), but does not specify how that value is stored.
Unicode, ASCII, GB2312, and GBK are all character sets. ASCII, GB2312, and GBK are each both a character set and a character encoding, so take care not to confuse the two concepts. The concrete encodings of Unicode are UTF-8, UTF-16, and UTF-32.
ASCII, the earliest of these, stores each character in one byte (8 bits). Standard ASCII defines 128 characters, so only 7 of those bits are needed, which is sufficient for English. Other scripts such as Chinese and Japanese could not be mapped, so larger character sets appeared.
Unicode (the unified character set) originally used 2 bytes per character, so the whole set could hold 65,536 characters. That proved insufficient, so the code space was extended with the range U+010000~U+10FFFF; representing any code point at a fixed width now takes 4 bytes.
It is therefore wrong to say that Unicode is two bytes. UTF-8 is variable-length and uses 1 to 4 bytes per character. UTF-16 uses two bytes for characters in U+0000~U+FFFF and four bytes (a surrogate pair) for characters beyond that range. UTF-32 always uses four bytes.
Unicode code points are written as "U+" followed by a hexadecimal number. For example, the code point of "字" is U+5B57.
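The byte lengths of UTF-8 and UTF-16 can be measured directly. The sketch below encodes one code point from each range and counts the bytes; the sample code points are arbitrary picks, UTF_16BE is used so that no byte order mark is added to the count, and UTF-32 is left out because it is not among the charsets guaranteed by StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ByteLengthSketch {
    // Number of bytes needed to encode a single code point in the given charset.
    static int byteLength(int codePoint, Charset charset) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE9, 0x5B57, 0x1F600}; // one code point from each UTF-8 range
        for (int cp : samples) {
            System.out.printf("U+%04X  UTF-8: %d bytes  UTF-16: %d bytes%n",
                    cp,
                    byteLength(cp, StandardCharsets.UTF_8),
                    byteLength(cp, StandardCharsets.UTF_16BE));
        }
    }
}
```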
UTF-8 encoding and the Unicode character set

Range               Unicode (binary)                       UTF-8 (binary)                        UTF-8 bytes
U+0000~U+007F       00000000 00000000 00000000 0XXXXXXX    0XXXXXXX                              1
U+0080~U+07FF       00000000 00000000 00000YYY YYXXXXXX    110YYYYY 10XXXXXX                     2
U+0800~U+FFFF       00000000 00000000 ZZZZYYYY YYXXXXXX    1110ZZZZ 10YYYYYY 10XXXXXX            3
U+010000~U+10FFFF   00000000 000AAAZZ ZZZZYYYY YYXXXXXX    11110AAA 10ZZZZZZ 10YYYYYY 10XXXXXX   4
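The bit layout in the table translates directly into shifts and masks. The following is a hand-written sketch of a UTF-8 encoder for a single code point, intended only to illustrate the table; in real code, String.getBytes(StandardCharsets.UTF_8) does this work. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8TableSketch {
    // Encodes one Unicode code point to UTF-8 by following the four rows of the table.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not an encodable code point: " + cp);
        }
        if (cp <= 0x7F) {            // 0XXXXXXX
            return new byte[] {(byte) cp};
        }
        if (cp <= 0x7FF) {           // 110YYYYY 10XXXXXX
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp <= 0xFFFF) {          // 1110ZZZZ 10YYYYYY 10XXXXXX
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {          // 11110AAA 10ZZZZZZ 10YYYYYY 10XXXXXX
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }

    public static void main(String[] args) {
        int cp = 0x7A0B;
        byte[] manual = encode(cp);
        byte[] library = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(manual));       // [-25, -88, -117]
        System.out.println(Arrays.equals(manual, library)); // true
    }
}
```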
A program distinguishes between its internal encoding and its external encoding. Saying that Java's default encoding is UTF-8 refers to the external encoding (the default charset is platform-dependent before JDK 18 and UTF-8 from JDK 18 on). Internal encodings tend to be fixed-length, for the same reason memory is aligned: fixed-width units are easy to process. External encodings tend to be variable-length, giving common characters short codes and rare characters long codes, which saves storage space and transmission bandwidth.
In JDK 8, String stores its characters in a char[]; a char is two bytes and holds one UTF-16 code unit (the internal encoding). Commonly used Chinese characters fall within U+0000~U+FFFF, so each fits in a single char, and storing Chinese in char (UTF-16) does not produce garbled text.
From JDK 9 on, String stores its content in a byte[] instead. Characters outside U+0000~U+FFFF, such as emoji, do not fit in a single char and take two UTF-16 code units in either representation; what the byte[] adds is a choice of encoding per string. If the content consists entirely of ISO-8859-1/Latin-1 characters, the string is stored as ISO-8859-1/Latin-1 (1 byte per character); otherwise it is stored as UTF-16 (2 or 4 bytes per character).
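How many char values a character occupies can be observed through the public String API, independent of the internal storage. In this sketch, U+7A0B stands for a common Chinese character and U+1F600 for an emoji; both are arbitrary picks, and the helper names are illustrative.

```java
final class CodeUnitSketch {
    // Number of char values (UTF-16 code units) needed for one code point.
    static int charCount(int codePoint) {
        return new String(Character.toChars(codePoint)).length();
    }

    // Number of actual characters (code points) in a string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(charCount(0x7A0B));     // 1: fits in a single char
        System.out.println(charCount(0x1F600));    // 2: stored as a surrogate pair
        System.out.println(emoji.length());        // 2
        System.out.println(codePointCount(emoji)); // 1
    }
}
```

This is why String.length() can differ from the number of visible characters: it counts char values, not code points.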
    System.out.println(Charset.defaultCharset()); // print Java's default charset
    for (byte item : "程序".getBytes(StandardCharsets.UTF_16)) {
        System.out.print("[" + item + "]");
    }
    System.out.println("");
    for (byte item : "程序".getBytes(StandardCharsets.UTF_8)) {
        System.out.print("[" + item + "]");
    }
    ---result---
    UTF-8                              // Java default charset
    [-2][-1][122][11][94][-113]        // UTF_16: 6 bytes?
    [-25][-88][-117][-27][-70][-113]   // UTF_8: 6 bytes, as expected
Why does the UTF-16 encoding of "程序" output 6 bytes, two more than expected? Try encoding a single character.
    for (byte item : "程".getBytes(StandardCharsets.UTF_16)) {
        System.out.print("[" + item + "]");
    }
    ---result---
    [-2][-1][122][11]
UTF-16 emits two extra bytes, [-2][-1], which is 0xFEFF in hexadecimal. This byte order mark (BOM) identifies whether the byte order is big-endian or little-endian. Take the character "中" as an example: its Unicode code point is U+4E2D. Stored with 4E first and 2D second, it is big-endian; stored with 2D first and 4E second, it is little-endian. A leading FEFF indicates big-endian storage, and a leading FFFE indicates little-endian.
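The byte order mark can be observed by encoding the same character (U+4E2D) with Java's three UTF-16 charsets. This is a small sketch; the format helper is an illustrative name.

```java
import java.nio.charset.StandardCharsets;

final class BomSketch {
    // Formats bytes as [b1][b2]... using signed byte values.
    static String format(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append('[').append(b).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D"; // the character U+4E2D

        // UTF_16 writes a big-endian BOM (FE FF) and then big-endian data.
        System.out.println(format(zhong.getBytes(StandardCharsets.UTF_16)));   // [-2][-1][78][45]
        // The BE and LE variants fix the byte order and write no BOM.
        System.out.println(format(zhong.getBytes(StandardCharsets.UTF_16BE))); // [78][45]
        System.out.println(format(zhong.getBytes(StandardCharsets.UTF_16LE))); // [45][78]

        // When decoding, UTF_16 reads the BOM to pick the byte order.
        byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 0x2D, 0x4E};
        System.out.println(new String(littleEndianWithBom, StandardCharsets.UTF_16).equals(zhong)); // true
    }
}
```

When the byte order is agreed on in advance, UTF_16BE or UTF_16LE avoids the two extra bytes.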
Why does UTF-8 have no byte order problem? In my view, it is because UTF-8 is a variable-length encoding whose unit is a single byte: the prefix of the first byte (0, 110, 1110, or 11110) tells how many following bytes make up the character, so the bytes can only appear in that one order and there is nothing to swap.
UTF-16 could arguably have mandated big-endian as well, but that is a matter of history.
That concludes this look at java.io and character encoding. Theory is best paired with practice, so try the examples yourself.