
Java character encoding and decoding

2025-02-27 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02

1. The development of character encodings

①, ASCII code

Computers only understand numbers, so all data stored in a computer is represented as numbers. Because English needs only a small set of characters, each character fits in one byte whose highest bit is always 0, so every character is a number between 0 and 127. For example, A is 65 and a is 97. This is the American Standard Code for Information Interchange, ASCII.

```java
import java.util.Arrays;

String str = "Aa";
byte[] strASCII = str.getBytes("ASCII");
System.out.println(Arrays.toString(strASCII)); // [65, 97]
```
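To see the "highest bit is 0" rule in the binary itself, the following minimal sketch prints each ASCII byte of "Aa" as eight bits. The class name `AsciiBits` and the helper `binary8` are hypothetical, chosen only for this illustration; using `StandardCharsets.US_ASCII` instead of the string name "ASCII" avoids the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

public class AsciiBits {
    // Returns the 8-bit binary form of a byte, padded with leading zeros
    static String binary8(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte[] bytes = "Aa".getBytes(StandardCharsets.US_ASCII);
        for (byte b : bytes) {
            // Every ASCII byte starts with a 0 bit, so its value is in 0..127
            System.out.println((char) b + " -> " + b + " -> " + binary8(b));
        }
    }
}
```

Running it should print `A -> 65 -> 01000001` and `a -> 97 -> 01100001`; the leading 0 in both is the unused highest bit.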

②, GB2312 code

As computers spread around the world, many countries and regions brought their own characters, such as Chinese characters, into computing. It soon became clear that the range of values one byte can represent (at most 256) is far too small to hold all Chinese characters, so two bytes were used to represent each Chinese character.

The rule: ASCII characters keep their original encoding and still use one byte. To tell one Chinese character apart from two ASCII characters, the highest bit of each byte of a Chinese character is set to 1 (so, read as signed Java bytes, those values are negative). This encoding is called GB2312.

123String str = new String ("Aa handsome pot"); byte [] strASCII = str.getBytes ("GB2312"); System.out.println (Arrays.toString (strASCII)); / / [65, 97,-53,-89,-71,-8]
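The high-bit rule is what lets a program split a GB2312 byte stream back into characters. Below is a minimal sketch, assuming well-formed GB2312 input (the class `Gb2312Scan` and the method `countChars` are hypothetical names): a byte whose highest bit is 0 (non-negative in Java) counts as one ASCII character, and a byte whose highest bit is 1 (negative) starts a two-byte Chinese character. Chinese text in this sketch is written as Unicode escapes (\u5E05\u9505) so it compiles regardless of the source file encoding.

```java
import java.nio.charset.Charset;

public class Gb2312Scan {
    // Counts characters using only the high-bit rule:
    // high bit 0 (byte >= 0) -> one ASCII character, one byte
    // high bit 1 (byte < 0)  -> first byte of a two-byte Chinese character
    // Assumes well-formed input; the second byte is not validated.
    static int countChars(byte[] bytes) {
        int count = 0;
        int i = 0;
        while (i < bytes.length) {
            i += (bytes[i] >= 0) ? 1 : 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        // "Aa" plus two Chinese characters (U+5E05, U+9505)
        byte[] bytes = "Aa\u5E05\u9505".getBytes(gb2312);
        System.out.println(bytes.length + " bytes, " + countChars(bytes) + " characters");
    }
}
```

For the six bytes [65, 97, -53, -89, -71, -8] it should report 4 characters. A real decoder would also check that the second byte is valid; this sketch deliberately does not.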

③, GBK

Because there are so many Chinese characters, GB2312 was extended with many more of them. The extended encoding is GBK; characters that already exist in GB2312 keep the same bytes in GBK.
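Because GBK extends GB2312 without changing existing codes, data encoded as GB2312 can be decoded as GBK. A small sketch that checks this for one example string (the class `GbkSuperset` and the helper `encodeThenDecode` are hypothetical). Note that GB2312 and GBK are not among the charsets every Java platform is required to support, although typical full JDK builds include them.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class GbkSuperset {
    // Encodes with one charset and decodes the resulting bytes with another
    static String encodeThenDecode(String text, String encodeAs, String decodeAs) {
        byte[] bytes = text.getBytes(Charset.forName(encodeAs));
        return new String(bytes, Charset.forName(decodeAs));
    }

    public static void main(String[] args) {
        String text = "Aa\u5E05\u9505"; // "Aa" plus two Chinese characters
        byte[] gb2312 = text.getBytes(Charset.forName("GB2312"));
        byte[] gbk = text.getBytes(Charset.forName("GBK"));
        // Characters that exist in GB2312 keep the same bytes in GBK
        System.out.println(Arrays.equals(gb2312, gbk));
        // So GBK can decode data that was encoded as GB2312
        System.out.println(encodeThenDecode(text, "GB2312", "GBK").equals(text));
    }
}
```

It should print true twice.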

Problem: within China this works, because every computer knows these Chinese characters. But in another country whose code table does not include Chinese characters, the text is displayed as garbled or as other characters.

Solution: to remove the conflicts between each country's localized encodings, all characters in the world were given a single, unified encoding: Unicode.

With Unicode, each character has one fixed code everywhere in the world. For example, the Chinese character 哥 is hexadecimal 54E5 (written U+54E5) everywhere.

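Originally, every Unicode character occupied two bytes (which is why a Java char is 16 bits). Unicode has since grown beyond 65,536 characters, so two bytes are no longer enough for every character; Java represents the extra characters with a pair of chars.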
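The following sketch shows these points in Java: the code point of the character U+54E5, the 16-bit size of a Java char, and a character beyond U+FFFF (U+1F600, an emoji picked only as an example) that needs two chars. `UnicodeDemo` and `label` are hypothetical names.

```java
public class UnicodeDemo {
    // Formats the first code point of a string in the usual U+XXXX notation
    static String label(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        String ge = "\u54E5"; // a Chinese character, code point U+54E5
        System.out.println(label(ge));        // U+54E5
        System.out.println(Character.SIZE);   // 16: a Java char is two bytes

        // A character outside the original 16-bit range
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2 (a surrogate pair)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 character
        System.out.println(label(emoji));                            // U+1F600
    }
}
```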

④, UTF-8

UTF-8 is a variable-length encoding of Unicode, sometimes nicknamed the "universal code", and one of the ways Unicode is implemented. Its single-byte range is identical to ASCII, so software written to handle ASCII text can keep working with few or no changes. As a result, UTF-8 has gradually become the preferred encoding for storing and transmitting text in e-mail, web pages, and other applications. The Internet Engineering Task Force (IETF) requires all Internet protocols to support UTF-8.

123String str = new String ("Aa handsome pot"); byte [] strASCII = str.getBytes ("UTF-8"); System.out.println (Arrays.toString (strASCII)); / / [65, 97,-27,-72,-123,-23,-108,-123]
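UTF-8's variable-length layout can be written out by hand. The minimal sketch below (hypothetical class `Utf8Sketch`) encodes code points up to U+FFFF into one to three bytes using the standard UTF-8 bit patterns and compares the result with the JDK's own encoder. It deliberately omits four-byte sequences and does not reject surrogate code points, so it is an illustration, not a replacement for getBytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {
    // Hand-written UTF-8 for code points up to U+FFFF (1 to 3 bytes).
    // Four-byte sequences throw; surrogates (U+D800..U+DFFF) are not rejected.
    static byte[] encode(int cp) {
        if (cp < 0x80) {                 // 0xxxxxxx: identical to ASCII
            return new byte[] {(byte) cp};
        } else if (cp < 0x800) {         // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        } else if (cp <= 0xFFFF) {       // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        throw new IllegalArgumentException("not handled in this sketch: " + cp);
    }

    public static void main(String[] args) {
        int[] codePoints = {'A', 0xE9, 0x5E05}; // A, e with acute accent, a Chinese character
        for (int cp : codePoints) {
            byte[] mine = encode(cp);
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X -> %s (matches JDK: %b)%n",
                    cp, Arrays.toString(mine), Arrays.equals(mine, jdk));
        }
    }
}
```

It should print [65] for A, [-61, -87] for é (U+00E9) and [-27, -72, -123] for U+5E05, each matching the JDK. The one-byte branch is exactly ASCII, which is the compatibility property that made UTF-8 easy to adopt.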

Storing letters and digits: 1 byte in every character set discussed here.

Storing Chinese characters: 2 bytes in the GB2312/GBK family, 3 bytes in UTF-8 (for common Chinese characters).

A single-byte character set (ASCII/ISO-8859-1) cannot store Chinese characters.
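These byte counts can be checked directly. In this sketch (the class `ByteCounts` and the helper `describe` are hypothetical), `CharsetEncoder.canEncode` is used to detect characters a charset cannot represent, because `String.getBytes` itself never fails: it silently substitutes `?` (byte 63).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteCounts {
    // Byte length of text in the named charset, or a note if it cannot be encoded
    static String describe(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        if (!cs.newEncoder().canEncode(text)) {
            return "not representable";
        }
        return text.getBytes(cs).length + " byte(s)";
    }

    public static void main(String[] args) {
        String[] names = {"US-ASCII", "ISO-8859-1", "GBK", "UTF-8"};
        for (String name : names) {
            System.out.println(name + ": letter A = " + describe("A", name)
                    + ", Chinese character U+5E05 = " + describe("\u5E05", name));
        }
        // getBytes does not throw on unrepresentable characters: it substitutes '?' (63),
        // so the Chinese character is silently lost
        byte[] latin1 = "\u5E05".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(latin1)); // [63]
    }
}
```

Expected: 1 byte for A in all four charsets; for the Chinese character, "not representable" in US-ASCII and ISO-8859-1, 2 bytes in GBK and 3 bytes in UTF-8.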

2. Encoding and decoding of characters

Computer networks transmit information as bytes. Turning a string into bytes is called encoding. When the receiving computer gets those bytes, it has to turn them back into a string that people can read; that is called decoding.

Encoding: converting strings to byte arrays

Decoding: converting byte arrays to strings
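In real programs the bytes usually pass through a stream (a socket, a file, an HTTP body) rather than a single getBytes call. The sketch below stands in for a network link with in-memory byte streams; the names `TransmitSketch`, `send` and `receive` are hypothetical. The point is that both `OutputStreamWriter` (encoding) and `InputStreamReader` (decoding) are given an explicit `Charset` instead of relying on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TransmitSketch {
    // Encoding: writes the string into a byte array, standing in for a network send
    static byte[] send(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with in-memory streams
        }
        return out.toByteArray();
    }

    // Decoding: reads the received bytes back into a string
    static String receive(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Aa\u5E05\u9505";
        byte[] wire = send(text, StandardCharsets.UTF_8);
        System.out.println(wire.length + " bytes sent");                        // 8 bytes sent
        System.out.println(receive(wire, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```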

Note: ①, the decoding character set must be the same as the encoding character set (or an extension of it, such as GBK for GB2312 data); otherwise the text will be garbled.

1234567891011121314String str = new String ("Aa handsome pot"); / / coding operation byte [] strByte = str.getBytes ("GBK"); System.out.println (Arrays.toString (strByte)) / / [65, 97,-53,-89,-71,-8] / / Decoding operation / / Note that the format of the encoded character set and the decoded character set must be the same (or its extended character set is also possible), otherwise the first one: the encoding format is GBK, the decoding format is ISO-8859-1, then the garbled code String str2 = new String (strByte, "ISO-8859-1"); System.out.println (str2) / / Aa? §? / / second: consistent encoding and decoding format String str3 = new String (strByte, "GBK"); System.out.println (str3); / / Aa handsome pot
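A mismatch like decoding GBK bytes as ISO-8859-1 goes unnoticed because `new String(bytes, charset)` never throws: it replaces invalid input with a replacement character, and for ISO-8859-1 no input is invalid at all. When you need to know whether bytes really are in the expected charset, a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws a `CharacterCodingException` instead. A minimal sketch (the class `StrictDecode` and the method `isValid` are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Returns true if the bytes decode cleanly in the given charset.
    // REPORT makes the decoder throw instead of inserting replacement characters.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] gbk = "Aa\u5E05\u9505".getBytes(Charset.forName("GBK"));
        System.out.println(isValid(gbk, Charset.forName("GBK")));      // true
        System.out.println(isValid(gbk, StandardCharsets.UTF_8));      // false: not valid UTF-8
        // ISO-8859-1 maps every byte to a character, so it never reports an error
        System.out.println(isValid(gbk, StandardCharsets.ISO_8859_1)); // true
    }
}
```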

②, sometimes the encoding and decoding character sets are the same and the text is still garbled. This happens when the data passes through a server that decodes it with a different character set during transmission (for example, a server that defaults to ISO-8859-1 because it was written with Western languages in mind). Converting the server's result directly to the character set you want then produces garbled text.

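Solution: first re-encode the server's string with the server's character set to recover the original bytes, then decode those bytes with the correct character set. This works with ISO-8859-1 because it maps each of the 256 byte values to exactly one character, so no bytes are lost along the way.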

1234567891011121314151617String str = new String ("Aa handsome pot"); / / coding operation byte [] strByte = str.getBytes ("UTF-8"); System.out.println (Arrays.toString (strByte)); / / [65, 97,-27,-72,-123,-23,-108,-123] / / transferred to ISO-8859-1 String str2 = new String (strByte, "ISO-8859-1") / / Decoding operation. If decoding is performed directly, String str3 = new String (str2.getBytes (), "UTF-8"); System.out.println (str3); / / Aa? / / for the above garbled code, we must first restore the previous encoding format of the server, and then decode it. Then there will be no garbled byte [] strByte2 = str2.getBytes ("ISO-8859-1"); String str4 = new String (strByte2, "UTF-8"); System.out.println (str4); / / Aa handsome pot
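The repair step can be wrapped in a small helper. This is a sketch under the assumption that you know which charset the server wrongly used; `RepairSketch` and `repair` are hypothetical names. It also shows why the trick depends on that charset being lossless: if the server had decoded with US-ASCII, the non-ASCII bytes would have been replaced during decoding and could not be recovered.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RepairSketch {
    // Undo a wrong decode: re-encode with the charset that was wrongly used
    // (recovering the original bytes), then decode with the correct charset.
    // Only reliable when wrongCharset is lossless for every byte, as ISO-8859-1 is.
    static String repair(String garbled, Charset wrongCharset, Charset rightCharset) {
        byte[] originalBytes = garbled.getBytes(wrongCharset);
        return new String(originalBytes, rightCharset);
    }

    public static void main(String[] args) {
        String text = "Aa\u5E05\u9505";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Simulated server that decodes the UTF-8 bytes as ISO-8859-1
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(repair(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).equals(text)); // true

        // With US-ASCII the non-ASCII bytes become replacement characters while decoding,
        // so the original bytes are gone and the repair cannot work
        String lost = new String(utf8, StandardCharsets.US_ASCII);
        System.out.println(repair(lost, StandardCharsets.US_ASCII, StandardCharsets.UTF_8).equals(text)); // false
    }
}
```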
