How to fix garbled text from readLine() in Java

2025-01-27 Update From: SLTechnology News&Howtos

This article explains why readLine() can return garbled text in Java and how to fix it, working through a test case byte by byte.

I ran into this problem at work: a program under development read a line of text and got garbled characters back.

Eclipse is configured to use UTF-8 as its default encoding.

    FileInputStream f4 = new FileInputStream(new File("F:\\bb.txt"));
    BufferedReader bufferedReader2 = new BufferedReader(new InputStreamReader(f4));
    String readLine = bufferedReader2.readLine(); // garbled output

The test uses two text files, aa.txt (UTF-8 encoded) and bb.txt (GB2312 encoded). Each file contains the single character 中.

Background: a Chinese character such as 中 occupies three bytes in UTF-8 and two bytes in GB2312.
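The byte counts above can be checked directly. The following is a minimal sketch, not part of the original test: the class name ByteLengthDemo and the helper encodedLength are made up for illustration, and it assumes the JDK in use ships the GB2312 charset (standard JDK builds normally do). The character is written as the escape \u4e2d so that the encoding of the source file itself does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Hypothetical helper: number of bytes needed to encode text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the single character stored in aa.txt and bb.txt
        // Assumption: the GB2312 charset is available in this JDK.
        Charset gb2312 = Charset.forName("GB2312");

        System.out.println(encodedLength(zhong, StandardCharsets.UTF_8)); // 3
        System.out.println(encodedLength(zhong, gb2312));                 // 2
    }
}
```

UTF-8 is a variable-width encoding, so the three-byte figure holds for common Chinese characters such as this one, not for every character.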

Test code:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import org.junit.Test;

    public class EncodeTest {
        @Test
        public void test1() throws Exception {
            // 1. aa.txt (UTF-8): print the raw bytes, then decode with the default charset
            FileInputStream f1 = new FileInputStream(new File("F:\\aa.txt"));
            byte[] b1 = new byte[f1.available()];
            f1.read(b1);
            for (byte b : b1) {
                System.out.println(b);
            }
            System.out.println(new String(b1));
            System.out.println("+");

            // 2. bb.txt (GB2312): decode each byte on its own with the default charset,
            //    then print the bytes of the resulting garbled character
            FileInputStream f2 = new FileInputStream("F:\\bb.txt");
            byte[] b2 = new byte[f2.available()];
            f2.read(b2);
            for (byte b : b2) {
                System.out.println(b);
                byte[] tb = new byte[] {b};
                String lm = new String(tb);
                System.out.println("current garbled" + lm);
                byte[] lm_b = lm.getBytes();
                System.out.println("-garbled start-");
                for (byte bn : lm_b) {
                    System.out.println(bn);
                }
                System.out.println("-garbled end-");
            }
            // decoding both bytes together as GB2312 gives the correct character
            System.out.println(new String(b2, "gb2312"));
            System.out.println("+");

            // 3. bb.txt through readLine() with no charset specified
            FileInputStream f3 = new FileInputStream(new File("F:\\bb.txt"));
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(f3));
            String readLine2 = bufferedReader.readLine();
            byte[] b3 = readLine2.getBytes("UTF-8");
            for (byte b : b3) {
                System.out.println(b);
            }
            System.out.println(new String(b3));
            System.out.println("+");

            // 4. bb.txt through readLine() with the charset set to GB2312
            FileInputStream f4 = new FileInputStream("F:\\bb.txt");
            BufferedReader bufferedReader2 = new BufferedReader(new InputStreamReader(f4, "GB2312"));
            String readLine = bufferedReader2.readLine();
            byte[] b4 = readLine.getBytes("UTF-8");
            for (byte b : b4) {
                System.out.println(b);
            }
            System.out.println(new String(b4));
            System.out.println("+");
        }
    }

Printed results, with analysis:

-28 # byte 1

-72 # byte 2

-83 # byte 3

中 # decoded as UTF-8 the character is 中, not garbled

+

-42 # byte 1

# -42 decoded as UTF-8 on its own yields a garbled character; encoding that character back to UTF-8 gives the following bytes

-garbled start-

-17

-65

-67

-garbled end-

-48 # byte 2

# -48 decoded as UTF-8 on its own yields a garbled character; encoding that character back to UTF-8 gives the following bytes

-garbled start-

-17

-65

-67

-garbled end-

中 # byte 1 (-42) and byte 2 (-48) decoded together as GB2312 give the character 中

+

-17 # readLine() with no charset specified, so the default encoding (UTF-8, from Eclipse) is used (reading bb.txt)

-65

-67

-17

-65

-67

# the output is garbled

+

-28 # readLine() with the charset set to GB2312 reads the line as 中 (reading bb.txt)

-72

-83

中

+

A brief summary

new BufferedReader(new InputStreamReader(f4)) decodes bytes with the default charset, which is UTF-8 in this setup. The character in bb.txt, however, is stored as GB2312, so it occupies two bytes on disk, whereas the same character in UTF-8 occupies three. When readLine() decodes those two bytes as UTF-8, they do not form a valid sequence, so the decoder handles each byte on its own and turns each one into a garbled character. The result is two garbled characters instead of one correct character.
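Which charset a reader falls back to when none is given depends on the JVM's default, not on the file. The following is a small sketch for checking this on a given machine; the class name DefaultCharsetDemo and the helper charsetOfPlainReader are made up for illustration, and the printed names will differ between environments.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Hypothetical helper: the charset used by a reader built without an explicit charset.
    static Charset charsetOfPlainReader() throws IOException {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
            // getEncoding() may return a historical name such as "UTF8";
            // Charset.forName resolves it to the canonical charset.
            return Charset.forName(reader.getEncoding());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("default charset:   " + Charset.defaultCharset());
        System.out.println("plain reader uses: " + charsetOfPlainReader());
    }
}
```

If the two lines print the same charset and it is not the encoding of the file being read, garbled output like that shown earlier is to be expected.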

Compare the code with the printed output. Each of the two garbled characters returned by readLine() encodes to the bytes -17 -65 -67, exactly the bytes obtained earlier by decoding a single byte as UTF-8 and then encoding the result back to UTF-8. In other words, when a byte cannot be decoded as UTF-8, the decoder substitutes the Unicode replacement character U+FFFD, whose UTF-8 encoding is EF BF BD, that is, -17 -65 -67 as signed bytes. Anything that cannot be decoded therefore shows up as this one character.
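The substitution can be reproduced without any files. This is a minimal sketch using only the two byte values printed earlier for bb.txt; the class name ReplacementCharDemo and the helper roundTripThroughUtf8 are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementCharDemo {
    // Hypothetical helper: decode bytes as UTF-8 (malformed input becomes U+FFFD),
    // then encode the resulting string back to UTF-8.
    static byte[] roundTripThroughUtf8(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] gb2312Bytes = {-42, -48}; // the two bytes stored in bb.txt

        String decoded = new String(gb2312Bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\uFFFD\uFFFD")); // true

        System.out.println(Arrays.toString(roundTripThroughUtf8(gb2312Bytes)));
        // [-17, -65, -67, -17, -65, -67]
    }
}
```

Once the bytes have been replaced by U+FFFD the original data is gone from the string, so the garbled text cannot be repaired afterwards; the file has to be read again with the right charset.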

Rule of thumb: always pass the encoding of the text source explicitly, as in new BufferedReader(new InputStreamReader(f4, "encoding of the text source")). Then the text will not be garbled.
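One possible way to apply this rule is java.nio.file.Files.newBufferedReader, which takes the charset as a parameter and, unlike InputStreamReader, throws an exception on malformed input instead of silently substituting characters. The following is a sketch under assumptions: the class name ExplicitCharsetDemo, the helper firstLine, and the temporary file standing in for bb.txt are all made up for illustration, and the GB2312 charset is assumed to be available in the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetDemo {
    // Hypothetical helper: read the first line of a file using an explicit charset.
    static String firstLine(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for bb.txt, holding the same two GB2312 bytes.
        Path file = Files.createTempFile("bb", ".txt");
        Files.write(file, new byte[] {-42, -48});
        try {
            // Correct charset: the line decodes to the expected character.
            String line = firstLine(file, Charset.forName("GB2312"));
            System.out.println(line.equals("\u4e2d")); // true

            // Wrong charset: the mismatch is reported rather than hidden.
            try {
                firstLine(file, StandardCharsets.UTF_8);
            } catch (MalformedInputException e) {
                System.out.println("not valid UTF-8: " + e.getMessage());
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

Failing loudly on a wrong charset is a design choice: it makes an encoding mismatch visible at the point of reading instead of letting replacement characters spread into later processing.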

Garbled text when calling readLine()

readLine() is a convenient method, but because it belongs to a character stream it is exposed to all kinds of encoding problems. Processing data such as a text file with byte streams instead is inflexible when the data has to be handled line by line.

The following shows how to specify the encoding used by the readLine() character stream:

    // define a File object
    File someFile = new File("somefile.txt");
    // input stream
    FileInputStream fis = new FileInputStream(someFile);
    InputStreamReader isr = new InputStreamReader(fis, "UTF-8"); // read as UTF-8
    BufferedReader br = new BufferedReader(isr);
    // output stream
    FileOutputStream fos = new FileOutputStream(someFile + ".Generated file.txt");
    OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8"); // write as UTF-8
    String line;
    while ((line = br.readLine()) != null) {
        // osw.write("write something");
        osw.write(line);
        osw.write(System.lineSeparator()); // readLine() strips the line terminator
    }
    // close the IO streams
    br.close();
    osw.close();

That is the solution to garbled readLine() output in Java. If you run into a similar problem, the analysis above may help you understand it.
