2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article introduces "Java character stream case analysis": why character streams exist, how character sets and encodings work, and how to read and write text with character streams. The editor walks through each operation with practical cases; the methods are simple, fast and practical. I hope this article helps you solve your problems.
1. The origin of character streams
Operating on Chinese text with a byte stream is inconvenient, because one Chinese character occupies several bytes. Java therefore provides character streams for this purpose.
Implementation principle: character stream = byte stream + code table (character set)
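Why does copying a text file that contains Chinese with a byte stream work without problems?
Because the copy reproduces every byte unchanged. When the copied file is opened, the editor decodes the complete multi-byte sequences, so the bytes are joined back into Chinese characters.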
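To see the difference between copying bytes and interpreting bytes one by one, here is a minimal sketch. It assumes in-memory streams (ByteArrayInputStream and ByteArrayOutputStream) as stand-ins for the source and target files; the class and method names (ByteCopyDemo, copyBytes, castEachByte) are made up for illustration. The escape \u4e2d\u6587 is "中文".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteCopyDemo {

    // Copies a stream one byte at a time, like a byte-stream file copy.
    static byte[] copyBytes(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Casts every single byte to a char: what goes wrong when a byte
    // stream is used to *interpret* (rather than copy) Chinese text.
    static String castEachByte(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] original = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8); // 6 bytes
        byte[] copy = copyBytes(new ByteArrayInputStream(original));

        // Decoding the whole copy keeps the multi-byte sequences intact.
        System.out.println(new String(copy, StandardCharsets.UTF_8));
        // Casting byte by byte yields 6 unrelated characters instead of 2.
        System.out.println(castEachByte(copy).length()); // 6
    }
}
```

The copy decodes back to the original two characters, while casting each byte to char yields six unrelated characters, which is the inconvenience that motivated character streams.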
How can a program tell that a byte belongs to a Chinese character?
When a Chinese character is stored, in both UTF-8 and GBK its first byte has the highest bit set, so it shows up as a negative number in Java's signed byte type.
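A small sketch that prints the UTF-8 bytes of 中 (U+4E2D) and checks the sign of the first byte. It sticks to UTF-8 because GBK, although shipped with common JDKs, is not one of the character sets every Java platform is required to support; the helper name firstByteNegative is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NegativeFirstByte {

    // True when the first UTF-8 byte is negative, i.e. its high bit is set.
    static boolean firstByteNegative(String s) {
        return s.getBytes(StandardCharsets.UTF_8)[0] < 0;
    }

    public static void main(String[] args) {
        byte[] zhong = "\u4e2d".getBytes(StandardCharsets.UTF_8); // U+4E2D
        System.out.println(Arrays.toString(zhong));      // [-28, -72, -83]
        System.out.println(firstByteNegative("\u4e2d")); // true
        System.out.println(firstByteNegative("a"));      // false: ASCII bytes are 0..127
    }
}
```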
2. Code tables (character sets)
A character set is the collection of all characters a system supports, including the characters of each national script, punctuation marks, graphic symbols, digits and so on.
For a computer to store and recognize the symbols of a character set accurately, it needs a character encoding; every character set has at least one encoding.
Common character sets include ASCII, the GBXXX family and Unicode.
GBK: the most commonly used Chinese code table. It extends the GB2312 standard with a double-byte encoding scheme, contains 21003 Chinese characters in total, is fully compatible with GB2312, and also supports traditional Chinese characters as well as Japanese and Korean characters.
GB18030: the latest Chinese code table, containing 70244 Chinese characters. It uses a multi-byte encoding in which each character takes 1, 2 or 4 bytes, and supports the scripts of China's ethnic minorities as well as traditional Chinese, Japanese and Korean characters.
Unicode character set:
Designed to represent any character of any language, Unicode is an industry standard, also called the universal code. It uses at most 4 bytes to represent each letter, symbol or written character. It has three encoding schemes, UTF-8, UTF-16 and UTF-32; the most commonly used is UTF-8.
UTF-8: can represent any character in the Unicode standard and is the preferred encoding for e-mail, web pages and other applications that store or transfer files. The Internet Engineering Task Force (IETF) requires all Internet protocols to support UTF-8. It uses one to four bytes to encode each character.
UTF-8 encoding rules:
The 128 US-ASCII characters need only one byte.
Characters such as accented Latin letters (for example é) need two bytes.
Most commonly used characters, including Chinese, need three bytes.
Other rarely used Unicode supplementary characters need four bytes.
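The byte counts above can be checked with a short sketch (the class and helper names are illustrative). The sample characters are assumptions chosen to hit each rule: A, é (U+00E9), 中 (U+4E2D) and the emoji U+1F600, which Java stores as a surrogate pair of two chars but UTF-8 encodes in four bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {

    // Number of bytes UTF-8 needs for the given text.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // 1: US-ASCII
        System.out.println(utf8Length("\u00e9"));       // 2: e with acute accent
        System.out.println(utf8Length("\u4e2d"));       // 3: Chinese character
        System.out.println(utf8Length("\uD83D\uDE00")); // 4: supplementary character U+1F600
    }
}
```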
Conclusion: whichever rule was used to encode the data must also be used to decode it; otherwise the text comes out garbled.
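A minimal sketch of the garbling, assuming UTF-8 for encoding and ISO-8859-1 as the mismatched decoder. It uses the String.getBytes(Charset) and String(byte[], Charset) overloads, which take a Charset object instead of a name; roundTrip is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {

    // Encodes with one character set and decodes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // "中文", 6 bytes in UTF-8

        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(same.equals(text));  // true
        System.out.println(mixed.equals(text)); // false: garbled
        System.out.println(mixed.length());     // 6: ISO-8859-1 maps every byte to its own char
    }
}
```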
3. Encoding and decoding with String
Encoding methods:
byte[] getBytes(): encodes the String into a sequence of bytes using the platform's default character set and stores the result in a new byte array
byte[] getBytes(String charsetName): encodes the String into a sequence of bytes using the named character set and stores the result in a new byte array
Decoding methods:
String(byte[] bytes): constructs a new String by decoding the given byte array with the platform's default character set
String(byte[] bytes, String charsetName): constructs a new String by decoding the given byte array with the named character set
The default encoding in IDEA is UTF-8, so there the methods without a charset argument encode and decode with UTF-8. Since JDK 18 the JVM's default charset is UTF-8 on every platform as well.
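A sketch of the four methods, using the charset-name overloads from the list; encode and decode are illustrative wrappers, and 中国 (U+4E2D U+56FD) is an arbitrary sample. The named versions declare the checked UnsupportedEncodingException, thrown when the name is unknown, and the no-argument versions depend on Charset.defaultCharset(), so their output is printed here without a fixed expectation.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

public class StringCodingDemo {

    // Encoding with a named character set: byte[] getBytes(String charsetName)
    static byte[] encode(String s, String charsetName) throws UnsupportedEncodingException {
        return s.getBytes(charsetName);
    }

    // Decoding with a named character set: String(byte[] bytes, String charsetName)
    static String decode(byte[] bytes, String charsetName) throws UnsupportedEncodingException {
        return new String(bytes, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u4e2d\u56fd";

        System.out.println(Charset.defaultCharset()); // used by getBytes() and String(byte[])
        byte[] utf8 = encode(s, "UTF-8");
        System.out.println(Arrays.toString(utf8));           // [-28, -72, -83, -27, -101, -67]
        System.out.println(decode(utf8, "UTF-8").equals(s)); // true

        // The no-argument versions use the default charset on both sides
        System.out.println(new String(s.getBytes()).equals(s));
    }
}
```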
4. Encoding and decoding with character streams
Abstract base classes of character streams:
Reader: the abstract base class of character input streams
Writer: the abstract base class of character output streams
Two classes among the character streams deal with encoding and decoding:
InputStreamReader: a bridge from byte streams to character streams. It reads bytes and decodes them into characters using a specified character set, which can be given by name or as an explicit Charset object; otherwise the platform's default character set is used.
Constructors:
InputStreamReader(InputStream in): creates an InputStreamReader that uses the default character set
InputStreamReader(InputStream in, String charsetName): creates an InputStreamReader that uses the named character set
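A minimal sketch of InputStreamReader as "byte stream + code table": a ByteArrayInputStream (an assumption, standing in for a FileInputStream) supplies six UTF-8 bytes, and the reader hands back two characters. The decode helper is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class InputStreamReaderDemo {

    // Decodes raw bytes through an InputStreamReader using the named charset.
    static String decode(byte[] data, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charsetName)) {
            int ch;
            while ((ch = reader.read()) != -1) { // one whole character per call
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = {-28, -72, -83, -26, -106, -121}; // "中文" encoded as UTF-8
        String text = decode(utf8, "UTF-8");
        System.out.println(text.length()); // 2: six bytes became two characters
    }
}
```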
OutputStreamWriter: a bridge from character streams to byte streams. It encodes the characters written to it into bytes using a specified character set, which can be given by name or as an explicit Charset object; otherwise the platform's default character set is used.
Constructors:
OutputStreamWriter(OutputStream out): creates an OutputStreamWriter that uses the default character encoding
OutputStreamWriter(OutputStream out, String charsetName): creates an OutputStreamWriter that uses the named character set
Example with the default encoding:

    import java.io.*;

    public class ConversionStreamDemo {
        public static void main(String[] args) throws IOException {
            // Write data with an OutputStreamWriter that uses the default encoding
            OutputStreamWriter opsw = new OutputStreamWriter(new FileOutputStream("E:\\abc.txt"));
            opsw.write("Hello");
            opsw.close(); // close() flushes the buffer before releasing the file

            // Read the data back with an InputStreamReader that uses the default encoding
            InputStreamReader ipsr = new InputStreamReader(new FileInputStream("E:\\abc.txt"));
            int ch;
            // Method 1: read one character at a time; read() returns -1 at end of stream
            while ((ch = ipsr.read()) != -1) {
                System.out.print((char) ch);
            }
            ipsr.close();
        }
    }

Five ways to write data with a character stream:
void write(int c): write one character
void write(char[] cbuf): write a character array
void write(char[] cbuf, int off, int len): write part of a character array
void write(String str): write a string
void write(String str, int off, int len): write part of a string
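The named-charset constructor of OutputStreamWriter can be checked in memory. In this sketch a ByteArrayOutputStream (assumed in place of a FileOutputStream) collects the encoded bytes, showing that the same two characters become 6 bytes in UTF-8 but 4 in UTF-16BE; the encode helper is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class OutputStreamWriterCharsetDemo {

    // Encodes text through an OutputStreamWriter with the named charset and
    // returns the bytes that reached the underlying byte stream.
    static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charsetName)) {
            writer.write(text);
        } // close() flushes the writer's buffer into 'bytes'
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4e2d\u6587"; // "中文"
        System.out.println(encode(text, "UTF-8").length);    // 6
        System.out.println(encode(text, "UTF-16BE").length); // 4
    }
}
```

UTF-16BE is used rather than plain UTF-16 because the latter also writes a 2-byte byte-order mark.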
Character output streams buffer the data they write. To make the buffered data appear in the file immediately, call flush() after write(); close() also flushes the buffer before closing the stream.
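The buffering can be observed directly. This sketch, with an in-memory ByteArrayOutputStream standing in for the file and a hypothetical helper sizesAroundFlush, measures how many bytes have reached the byte stream before and after flush(). The OutputStreamWriter documentation states that encoded bytes are accumulated in a buffer before being written to the underlying stream, which is why nothing arrives until the flush.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class FlushDemo {

    // Bytes in the target stream right after write() and right after flush().
    static int[] sizesAroundFlush(String text) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
            int beforeFlush = target.size(); // bytes still sit in the writer's buffer
            writer.flush();
            int afterFlush = target.size();  // flush() pushed them to the byte stream
            return new int[] {beforeFlush, afterFlush};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush("abc");
        System.out.println(sizes[0] + " -> " + sizes[1]); // 0 -> 3
    }
}
```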
The first three methods are used exactly like the corresponding byte-stream write methods; the two String overloads are specific to character streams. The demo below exercises all five.
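For a checkable version of the five write methods, this sketch writes into memory instead of E:\abc.txt and decodes the result; the class name and the sample strings are assumptions. In particular, write(String str, int off, int len) writes len characters starting at index off, so ("world", 1, 3) produces "orl".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteOverloadsDemo {

    // Calls each write overload once and returns everything that was written.
    static String writeAll() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            char[] chs = {'a', 'b', 'c'};
            w.write(97);            // write(int c): "a"
            w.write(chs);           // write(char[] cbuf): "abc"
            w.write(chs, 0, 2);     // write(char[] cbuf, off, len): "ab"
            w.write("hello");       // write(String str): "hello"
            w.write("world", 1, 3); // write(String str, off, len): "orl"
        } // close() flushes everything into 'bytes'
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAll()); // aabcabhelloorl
    }
}
```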
    import java.io.*;

    public class OutputStreamWriterDemo {
        public static void main(String[] args) throws IOException {
            // Create an OutputStreamWriter that uses the default encoding
            OutputStreamWriter opsw = new OutputStreamWriter(new FileOutputStream("E:\\abc.txt"));
            // Method 1: write one character (97 is 'a')
            opsw.write(97);
            opsw.flush(); // flush so the data shows up in the file immediately
            // Method 2: write a character array
            char[] ch = {'a', 'b', 'c'};
            opsw.write(ch);
            opsw.flush();
            // Method 3: write part of a character array (2 chars from index 0)
            opsw.write(ch, 0, 2);
            opsw.flush();
            // Method 4: write a string
            opsw.write("one, two, three");
            opsw.flush();
            // Method 5: write part of a string (2 chars from index 1)
            opsw.write("34,5", 1, 2);
            opsw.flush();
            opsw.close(); // close() flushes once more and releases the file
        }
    }

5. Two ways to read data with a character stream
int read(): read one character at a time
int read(char[] cbuf): read up to one character array of data at a time

    import java.io.*;

    public class InputStreamReadDemo {
        public static void main(String[] args) throws IOException {
            // Method 1: read one character at a time with a default-encoding InputStreamReader
            InputStreamReader ipsr = new InputStreamReader(new FileInputStream("E:\\abc.txt"));
            int ch;
            while ((ch = ipsr.read()) != -1) {
                System.out.print((char) ch);
            }
            ipsr.close();

            // Method 2: read one character array at a time; the first reader is closed, so open a new one
            ipsr = new InputStreamReader(new FileInputStream("E:\\abc.txt"));
            char[] chs = new char[1024];
            int len;
            while ((len = ipsr.read(chs)) != -1) {
                System.out.print(new String(chs, 0, len));
            }
            ipsr.close();
        }
    }
Summary: when the default encoding is used, the character input stream InputStreamReader can be replaced by its subclass FileReader, and the character output stream OutputStreamWriter by its subclass FileWriter. With the default encoding they behave the same, and they can be constructed directly from a file name.
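A sketch of the FileWriter/FileReader shortcut, reading with read(char[] cbuf). It assumes Java 11 or later, where the FileWriter(File, Charset) and FileReader(File, Charset) constructors exist; passing UTF-8 explicitly keeps the result independent of the platform default, while FileWriter(String) and FileReader(String) would use the default encoding as described in the summary. A temporary file replaces E:\abc.txt, and roundTrip is a hypothetical helper.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileReaderWriterDemo {

    // Writes text with FileWriter, then reads it back with FileReader.
    static String roundTrip(Path file, String text) throws IOException {
        try (Writer out = new FileWriter(file.toFile(), StandardCharsets.UTF_8)) {
            out.write(text);
        }
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader in = new FileReader(file.toFile(), StandardCharsets.UTF_8)) {
            int len;
            while ((len = in.read(buf)) != -1) { // up to buf.length chars per call
                sb.append(buf, 0, len);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chars", ".txt"); // stands in for E:\abc.txt
        try {
            String text = "\u4e2d\u6587 abc";
            System.out.println(roundTrip(tmp, text).equals(text)); // true
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```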
This is the end of the introduction to "Java character stream case analysis". Thank you for reading.
© 2024 shulou.com SLNews company. All rights reserved.