Detailed explanation of Java byte and character conversion streams


2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains how Java's byte and character conversion streams work. The approach it introduces is simple, fast and practical, so let's walk through it.

One: Basic concepts

1. Understanding text and text files

In Java, text (char) is a 16-bit unsigned value that holds a Unicode code unit (a two-byte encoding). A file is a sequence of bytes. A text file is the result of serializing a text (char) sequence into bytes according to some encoding scheme (utf-8, utf-16be, gbk).
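To make the point about encoding schemes concrete, here is a minimal sketch that serializes the same char sequence with different charsets and compares the byte counts. The class name, the helper `encodedLength` and the sample string are assumptions made for illustration only; gbk is checked with `Charset.isSupported` because it is not among the charsets every Java runtime is required to provide.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {
    // Hypothetical helper: number of bytes the text occupies once serialized with the charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two ASCII letters plus one Chinese character (U+4E25).
        String text = "ab\u4e25";
        System.out.println("chars:    " + text.length());
        System.out.println("utf-8:    " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("utf-16be: " + encodedLength(text, StandardCharsets.UTF_16BE));
        // gbk is optional on some runtimes, so only use it when it is available.
        if (Charset.isSupported("GBK")) {
            System.out.println("gbk:      " + encodedLength(text, Charset.forName("GBK")));
        }
    }
}
```

The char count stays the same in every case; only the number of bytes written to the file depends on the encoding.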

2. Character streams (Reader, Writer): they operate on text files

Character processing: text is handled one character at a time, but underneath it is still a basic byte sequence.

3. Basic implementations of character streams

InputStreamReader parses a byte stream into a char stream, decoding according to the given encoding. OutputStreamWriter turns a char stream into a byte stream, encoding according to the given encoding.
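The two directions can be sketched in memory, without touching any files. This is an illustrative sketch only: the class name and the helpers `decode` and `encode` are hypothetical, and in-memory byte array streams stand in for file streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConversionStreamSketch {
    // byte stream -> char stream: decode the bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (InputStreamReader isr = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = isr.read()) != -1) { // one char per call, however many bytes it took
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // char stream -> byte stream: encode the chars with the given charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (OutputStreamWriter osw = new OutputStreamWriter(out, charset)) {
            osw.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer above flushed the encoded bytes into out.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical input: the utf-8 bytes of U+4E25 followed by the letter 'a'.
        byte[] utf8 = {(byte) 0xE4, (byte) 0xB8, (byte) 0xA5, 0x61};
        String text = decode(utf8, StandardCharsets.UTF_8);
        System.out.println("bytes in:  " + utf8.length);
        System.out.println("chars:     " + text.length());
        System.out.println("bytes out: " + encode(text, StandardCharsets.UTF_16BE).length);
    }
}
```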

4. Viewing a file's encoding in UE (UltraEdit)

The status bar of UltraEdit-32 shows the encoding type of the file.

5. Viewing the encoding in MyEclipse

Project -> Properties -> Resource

Two: Example

package com.imooc.io;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class IsrAndOswDemo {
    public static void main(String[] args) throws IOException {
        FileInputStream in = new FileInputStream("e:\\javaio\\test2.txt");
        // Without an explicit charset the project's default encoding is used.
        // When reading, pass the encoding the file was actually saved in.
        InputStreamReader isr = new InputStreamReader(in, "utf-8");

        FileOutputStream out = new FileOutputStream("e:\\javaio\\test1.txt");
        OutputStreamWriter osw = new OutputStreamWriter(out, "utf-8");

        /*
        int c;
        while ((c = isr.read()) != -1) {
            System.out.print((char) c);
        }
        */
        char[] buffer = new char[8 * 1024];
        int c;
        /*
         * Read in batches into the buffer character array, starting at position 0
         * and storing at most buffer.length characters. The return value is the
         * number of characters actually read.
         */
        while ((c = isr.read(buffer, 0, buffer.length)) != -1) {
            String s = new String(buffer, 0, c);
            System.out.print(s);
            osw.write(buffer, 0, c);
            osw.flush();
        }
        isr.close();
        osw.close();
    }
}
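As a possible variation on the example, the same batch copy can be written with try-with-resources, so that both streams are closed even if reading or writing fails. This is a sketch under assumptions, not the author's code: the class name, the helper `transcode` and the temporary files are hypothetical, and `java.nio.file.Files` is used to open the underlying byte streams.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TranscodeSketch {
    // Hypothetical helper: copy text from source to target, decoding with `from` and encoding with `to`.
    static void transcode(Path source, Charset from, Path target, Charset to) throws IOException {
        try (Reader in = new InputStreamReader(Files.newInputStream(source), from);
             Writer out = new OutputStreamWriter(Files.newOutputStream(target), to)) {
            char[] buffer = new char[8 * 1024];
            int n;
            // read returns the number of chars placed in the buffer, or -1 at end of stream
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                out.write(buffer, 0, n);
            }
        } // both streams are flushed and closed here
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the e:\javaio files of the example above.
        Path source = Files.createTempFile("source", ".txt");
        Path target = Files.createTempFile("target", ".txt");
        Files.write(source, "ab\u4e25".getBytes(StandardCharsets.UTF_8));
        transcode(source, StandardCharsets.UTF_8, target, StandardCharsets.UTF_16BE);
        System.out.println(Files.size(source) + " bytes -> " + Files.size(target) + " bytes");
    }
}
```

Passing a `Charset` constant instead of the string "utf-8" avoids a misspelled encoding name being discovered only at run time.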

Three: Output

China 1jd

Four: Notes

A utf-8 file created with UE has a different size from a utf-8 file created with MyEclipse. The program above was tested with the utf-8 file created by MyEclipse.

Five: What is the difference between utf-8 with BOM and utf-8 without BOM?

utf-8 with BOM has a three-byte prefix that plain utf-8 lacks: 0xEF 0xBB 0xBF. When a text or string carries this prefix, a program can automatically identify it as utf-8 and parse it accordingly. Otherwise, when the encoding is unknown, the text or string has to be checked piece by piece against the character encoding rules to work out which encoding it uses.
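The prefix check can be sketched as follows. The class name and the helpers `hasUtf8Bom` and `decodeUtf8` are hypothetical and not part of any library. The sketch skips the prefix by hand because Java's built-in utf-8 decoding leaves it in the decoded text as the character U+FEFF rather than removing it.

```java
import java.nio.charset.StandardCharsets;

class Utf8BomSketch {
    // The three prefix bytes described above.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical helper: true when the data starts with 0xEF 0xBB 0xBF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    // Hypothetical helper: decode as utf-8, skipping the prefix when it is present.
    static String decodeUtf8(byte[] data) {
        int offset = hasUtf8Bom(data) ? UTF8_BOM.length : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prefix followed by the utf-8 bytes of U+4E25.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 0xE4, (byte) 0xB8, (byte) 0xA5};
        System.out.println("has BOM:        " + hasUtf8Bom(withBom));
        System.out.println("plain decode:   " + new String(withBom, StandardCharsets.UTF_8).length() + " chars");
        System.out.println("BOM-aware:      " + decodeUtf8(withBom).length() + " chars");
    }
}
```

The three extra bytes are also one plausible reason why two utf-8 files with the same visible content can differ in size.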

Six: Description of the utf-8 encoding

https://baike.baidu.com/item/UTF-8/481798?fr=aladdin

Seven: Encoding example

Open Notepad (Notepad.exe), create a new text file whose content is the single character "严" (yan, "strict"), and save it in turn with the ANSI, Unicode, Unicode big endian and UTF-8 encodings.

Then use the hex editing function of the text editor UltraEdit to inspect how each file is encoded internally.

1) ANSI: the file consists of the two bytes "D1 CF", which is the GB2312 encoding of "严". This also implies that GB2312 is stored big-endian (high byte first).

2) Unicode: the file consists of the four bytes "FF FE 25 4E", where "FF FE" indicates little-endian storage and the actual code is 4E25.

3) Unicode big endian: the file consists of the four bytes "FE FF 4E 25", where "FE FF" indicates big-endian storage.

4) UTF-8: the file consists of the six bytes "EF BB BF E4 B8 A5". The first three bytes "EF BB BF" indicate that this is UTF-8, and the last three bytes "E4 B8 A5" are the actual encoding of "严", stored in the same order as the encoding itself.
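The byte values listed above can be reproduced from Java with a small hex dump. This is an illustrative sketch: the class name and the helper `hex` are hypothetical. The charsets used here write only the character itself, so the "FF FE", "FE FF" and "EF BB BF" prefixes that Notepad adds do not appear, and GB2312 is checked with `Charset.isSupported` because not every runtime provides it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HexDumpSketch {
    // Hypothetical helper: format bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String yan = "\u4e25"; // the character used in the Notepad experiment
        System.out.println("UTF-16LE: " + hex(yan.getBytes(StandardCharsets.UTF_16LE))); // 25 4E
        System.out.println("UTF-16BE: " + hex(yan.getBytes(StandardCharsets.UTF_16BE))); // 4E 25
        System.out.println("UTF-8:    " + hex(yan.getBytes(StandardCharsets.UTF_8)));    // E4 B8 A5
        // GB2312 is an optional charset, so only use it when the runtime has it.
        if (Charset.isSupported("GB2312")) {
            System.out.println("GB2312:   " + hex(yan.getBytes(Charset.forName("GB2312")))); // D1 CF
        }
    }
}
```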

By now you should have a deeper understanding of how Java byte and character conversion streams work. The best way to consolidate it is to try the examples yourself.
