2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article looks at the default character encoding used by FileReader. I hope you get something out of it. Let's work through it together.
Default encoding adopted by FileReader
A long time ago, a teaching video I watched said that the default encoding used by Java was ISO-8859-1, and I kept that in mind.
But when I recently went back over the IO streams, I was surprised to find that FileReader could read a text file with Chinese content without any character encoding being specified. ISO-8859-1 is a Western European character set, so how could it include Chinese? I searched Baidu for "ISO-8859-1 shows Chinese" and found that many people have the same doubt.
The code is as follows:
package day170903;

import java.io.*;

public class TestDecoder {
    public static void main(String[] args) {
        FileReader fr = null;
        try {
            // No character encoding is specified here.
            fr = new FileReader("G:/io/hello.txt");
            int len = 0;
            // read() returns one character at a time, or -1 at end of file.
            while ((len = fr.read()) != -1) {
                System.out.println((char) len);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (fr != null) {
                    fr.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

What is the truth of the matter?
The encoding is usually specified in the constructor, so take a look at the constructors of FileReader. Oddly, and I had not noticed this before, FileReader has no constructor that accepts a character encoding (this is true of the JDK source examined here; constructors taking a Charset were added in Java 11). It simply inherits from InputStreamReader without overriding or extending any methods. This is probably the laziest subclass in history, living entirely off its parent.
Fortunately, Java's documentation comments are well written. At the beginning of the FileReader class there is the following documentation comment (the paraphrase after each paragraph is my own rough translation):
/**
 * Convenience class for reading character files. The constructors of this
 * class assume that the default character encoding and the default byte-buffer
 * size are appropriate. To specify these values yourself, construct an
 * InputStreamReader on a FileInputStream.
 *
 * (Translation: this is a convenience class for reading character files, i.e. text files.
 * The constructors of this class assume that the default character encoding and the
 * default byte-buffer size are appropriate. To specify the character encoding and
 * the buffer size yourself, use an InputStreamReader built on a FileInputStream.)
 *
 * FileReader is meant for reading streams of characters.
 * For reading streams of raw bytes, consider using a
 * FileInputStream.
 *
 * (Translation: FileReader is designed to read character streams.
 * To read raw byte streams, consider using FileInputStream.)
 *
 * @see InputStreamReader
 * @see FileInputStream
 *
 * @author Mark Reinhold
 * @since JDK1.1
 */
So the designer has explained the reason for this design in the documentation comment. For us, though, the more important question now is what this so-called default character encoding actually is.
At this point, let's take a look at the details of the constructor in the FileReader we used.
public FileReader(String fileName) throws FileNotFoundException {
    super(new FileInputStream(fileName));
}
FileReader inherits from InputStreamReader and calls the InputStreamReader constructor that accepts a single parameter of type InputStream, shown below.
public InputStreamReader(InputStream in) {
    super(in);
    try {
        sd = StreamDecoder.forInputStreamReader(in, this, (String) null); // ## check lock object
    } catch (UnsupportedEncodingException e) {
        // The default encoding should always be available
        throw new Error(e);
    }
}
Of course, this constructor of InputStreamReader calls the following constructor of its parent class, Reader.
protected Reader(Object lock) {
    if (lock == null) {
        throw new NullPointerException();
    }
    this.lock = lock;
}
Here, it just assigns the InputStream object it receives to the member variable lock (the documentation comment on lock makes it clear that it is used for synchronization), and it has nothing to do with character encoding.
Following super(in) up to the parent class Reader turns up no trace of the default character encoding, so that path is a dead end. The next thing to look at is the code after super(in), namely the try block. Its body has only the following line.
sd = StreamDecoder.forInputStreamReader(in, this, (String) null);
If you look closely at the code of FileReader and the other IO streams, you can see that many of the read functions (read and its overloads) are carried out through this StreamDecoder, but that is a topic for later. The source code of StreamDecoder cannot be viewed directly in Eclipse; you need to find it in OpenJDK.
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/cs/StreamDecoder.java
The try block above calls the forInputStreamReader method of StreamDecoder, and the corresponding code is as follows:
public static StreamDecoder forInputStreamReader(InputStream in, Object lock, String charsetName)
        throws UnsupportedEncodingException {
    String csn = charsetName;
    if (csn == null)
        csn = Charset.defaultCharset().name();
    try {
        if (Charset.isSupported(csn))
            return new StreamDecoder(in, lock, Charset.forName(csn));
    } catch (IllegalCharsetNameException x) { }
    throw new UnsupportedEncodingException(csn);
}
In the call, the third argument is null cast to String, and that null is what ends up standing for the default character encoding we are looking for.
Since the default character encoding is all we are after, there is no need to dig into the rest of the code. The first line assigns the third parameter to csn (a local variable holding the charset name), which matters when the method is called by the InputStreamReader constructor that takes a character encoding. The constructor that does not take a character encoding passes null to StreamDecoder.forInputStreamReader, so the condition of the following if statement holds and csn is set to Charset.defaultCharset().name(), which is the default character encoding.
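The fallback just described can be imitated outside the JDK. The following is only a sketch, not the StreamDecoder source: resolveCharset and the class name are hypothetical, and the helper mirrors nothing more than the null check.

```java
import java.nio.charset.Charset;

public class CharsetFallbackSketch {
    // Hypothetical helper, not part of the JDK: mirrors the null check
    // in StreamDecoder.forInputStreamReader.
    public static Charset resolveCharset(String charsetName) {
        String csn = charsetName;
        if (csn == null) {
            // No name was given, so fall back to the JVM default.
            csn = Charset.defaultCharset().name();
        }
        return Charset.forName(csn);
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + resolveCharset("ISO-8859-1"));
        System.out.println("default:  " + resolveCharset(null));
    }
}
```

Passing null yields whatever Charset.defaultCharset() reports on the machine running it, which is why the same FileReader code can behave differently on different systems.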
The next step is to look at what the defaultCharset method of the Charset class returns, and then at what name() returns for that Charset object. It is a bit roundabout, but it is still the same search for the default character encoding.
public static Charset defaultCharset() {
    if (defaultCharset == null) {
        synchronized (Charset.class) {
            String csn = AccessController.doPrivileged(new GetPropertyAction("file.encoding"));
            Charset cs = lookup(csn);
            if (cs != null)
                defaultCharset = cs;
            else
                defaultCharset = forName("UTF-8");
        }
    }
    return defaultCharset;
}
This code is laborious to read, and tracing it leads into still more code. The end result is that the so-called default character encoding comes from the file.encoding system property, which in the JDK source examined here is the local encoding at the time the JVM starts.
To view it in Eclipse, right-click the project and select Properties. The properties window shows the default character encoding the project uses when it runs in the JVM. For those of us in China it is generally "GBK", but you can choose another one from the drop-down box as needed.
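The same value can be checked from code instead of the IDE. The sketch below prints the file.encoding property and the result of Charset.defaultCharset(); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    // Returns the name of the charset the JVM uses when none is specified.
    public static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property consulted by Charset.defaultCharset() in the JDK source quoted in this article.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("Charset.defaultCharset() = " + defaultCharsetName());
    }
}
```

On many JDK versions the value can be overridden at launch with a -Dfile.encoding=... option, though whether a particular value is honored depends on the JDK release.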
So the initial question was entirely a misunderstanding on my part: I didn't know that the default encoding here was actually GBK. To test it the other way around, first write the following French sentence to a file with OutputStreamWriter:
Est-ce possible que tu sois en train de penser à moi lorsque tu me manques?
Do you happen to be thinking about me when I'm thinking about you?
When writing, specify the character encoding as ISO-8859-1, then read it with InputStreamReader, and read without specifying the character encoding (that is, using the default character encoding). Then, if this sentence is not restored correctly, the default character encoding is not ISO-8859-1.
package day170903;

import java.io.*;

public class TestDefaultCharEncoding {
    public static void main(String[] args) {
        InputStreamReader isr = null;
        OutputStreamWriter osw = null;
        try {
            // Write with an explicit encoding of ISO-8859-1.
            osw = new OutputStreamWriter(new FileOutputStream("G:/io/ISO-8859-1.txt"), "ISO-8859-1");
            // Read without an encoding, so the default is used.
            isr = new InputStreamReader(new FileInputStream("G:/io/ISO-8859-1.txt"));
            char[] chars = "Est-ce possible que tu sois en train de penser à moi lorsque tu me manques?".toCharArray();
            osw.write(chars);
            osw.flush();
            int len = 0;
            while ((len = isr.read()) != -1) {
                System.out.print((char) len);
            }
        } catch (UnsupportedEncodingException | FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (isr != null) {
                    isr.close();
                }
                if (osw != null) {
                    osw.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
The output is as follows:
Est-ce possible que tu sois en train de penser ? moi lorsque tu me manques?
Most of the sentence is restored correctly, because most French letters are also English letters. But à, which French has and English does not, could not be recognized on reading and came out as a question mark.
If the default encoding really were ISO-8859-1, reading would have worked without a problem. It did not, which means the default encoding is not ISO-8859-1.
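The same experiment can be reproduced in memory, without files. This is a minimal sketch under a few assumptions: writeThenRead and the class name are hypothetical, the accented letter is written as the escape \u00E0 so that the source file's own encoding cannot interfere, and GBK is only tried when the runtime reports that it is supported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchedDecodeSketch {
    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when the reader's charset differs from the writer's.
    public static String writeThenRead(String text, Charset writeCs, Charset readCs) {
        byte[] bytes = text.getBytes(writeCs);
        return new String(bytes, readCs);
    }

    public static void main(String[] args) {
        String text = "penser \u00E0 moi"; // \u00E0 is the letter a with a grave accent
        // Same charset on both sides: the text comes back unchanged.
        System.out.println(writeThenRead(text, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
        // GBK is not guaranteed to be present in every Java runtime, so check first.
        if (Charset.isSupported("GBK")) {
            System.out.println(writeThenRead(text, StandardCharsets.ISO_8859_1, Charset.forName("GBK")));
        }
    }
}
```

The plain ASCII letters survive a mismatched read because these charsets encode them identically; only the byte for the accented letter is interpreted differently.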
That is basically it, but one more thing. Although it is easy to find out which encoding the JVM will use when none is specified, I still recommend specifying the character encoding whenever you use the character stream classes. Writing the encoding explicitly makes your intent clear to others and avoids encoding problems that may appear after the code is moved to a different development tool.
The encoding problem of FileReader
There is a UTF-8-encoded text file. I read it into a string with FileReader and then convert the character set: str = new String(str.getBytes(), "UTF-8"). Most of the Chinese characters display normally, but some of them end up displayed as question marks!
public static List<String> getLines(String fileName) {
    List<String> lines = new ArrayList<String>();
    try {
        // FileReader decodes the file with the default character set.
        BufferedReader br = new BufferedReader(new FileReader(fileName));
        String line = null;
        while ((line = br.readLine()) != null)
            // Problematic: tries to undo the GBK decoding and re-decode as UTF-8.
            lines.add(new String(line.getBytes("GBK"), "UTF-8"));
        br.close();
    } catch (FileNotFoundException e) {
    } catch (IOException e) {
    }
    return lines;
}
When the file is read, it is decoded with the OS default character set, that is, GBK. So I first encode with the same default character set, str.getBytes("GBK"), which should restore the byte sequence in the file, and then decode those bytes as UTF-8. The resulting string should be correct.
Why is there still some garbled code in the result?
The problem lies in how FileReader reads the file. FileReader extends InputStreamReader but does not expose the parent's constructor with a character set parameter, so it can only decode with the system default character set. Information is then lost in the UTF-8 -> GBK -> UTF-8 conversion: any bytes that do not form a valid GBK sequence are replaced during decoding and cannot be recovered, so the original characters cannot be restored.
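The loss can be demonstrated in memory. The sketch below is an illustration under stated assumptions rather than the code from the question: roundTrip and the class name are hypothetical, and US-ASCII stands in as a wrong charset that every Java runtime provides, with GBK tried only when it is supported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyRoundTripSketch {
    // Decodes file bytes with the wrong charset, then tries to undo the mistake
    // by re-encoding with that charset and decoding with the right one.
    public static String roundTrip(String original, Charset fileCs, Charset wrongCs) {
        byte[] fileBytes = original.getBytes(fileCs);
        String misread = new String(fileBytes, wrongCs); // what the reader hands back
        byte[] recovered = misread.getBytes(wrongCs);    // attempt to get the file bytes back
        return new String(recovered, fileCs);
    }

    public static void main(String[] args) {
        String original = "\u4F60\u597D\u5417"; // three Chinese characters, nine bytes in UTF-8
        // ISO-8859-1 maps every byte value to a character, so nothing is lost.
        System.out.println(original.equals(
                roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
        // US-ASCII has no characters for bytes of 0x80 and above; they are replaced on decoding.
        System.out.println(original.equals(
                roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.US_ASCII)));
        // GBK is optional in some runtimes, so check before using it.
        if (Charset.isSupported("GBK")) {
            System.out.println(original.equals(
                    roundTrip(original, StandardCharsets.UTF_8, Charset.forName("GBK"))));
        }
    }
}
```

A round trip of this kind is only safe when the intermediate charset can represent every possible byte sequence, which is why transcoding after the fact is fragile.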
With the cause clear, the fix is to use InputStreamReader instead of FileReader: InputStreamReader isr = new InputStreamReader(new FileInputStream(fileName), "UTF-8"); This way the file is decoded directly as UTF-8 while it is read, eliminating the need for transcoding.
public static List<String> getLines(String fileName) {
    List<String> lines = new ArrayList<String>();
    try {
        // Decode the file as UTF-8 directly while reading.
        BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), "UTF-8"));
        String line = null;
        while ((line = br.readLine()) != null)
            lines.add(line);
        br.close();
    } catch (FileNotFoundException e) {
    } catch (IOException e) {
    }
    return lines;
}
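As a variation on the same idea, the reader can be opened in a try-with-resources statement so that it is closed even if reading fails, and the charset can be passed as a StandardCharsets constant instead of a string. This is a sketch of one possible arrangement, not the original author's code; the class name and the temporary file in main are made up for the demonstration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadLinesSketch {
    // Reads all lines of a UTF-8 file; the reader is closed automatically.
    public static List<String> getLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file to a temporary location, then read it back.
        Path tmp = Files.createTempFile("utf8-demo", ".txt");
        Files.write(tmp, "\u4F60\u597D\nhello".getBytes(StandardCharsets.UTF_8));
        System.out.println(getLines(tmp).size() + " lines read");
        Files.delete(tmp);
    }
}
```

Unlike the version with empty catch blocks, this one lets the IOException propagate, so a missing or unreadable file is reported to the caller instead of silently producing an empty list.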