Conversion method of Chinese character location code, GB code (interchange code) and internal code

2025-02-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

I. Location code

To meet the needs of processing Chinese character information on computers, the GB2312 national standard (GB 2312-80) came into force in 1981. The standard selects 6763 commonly used Chinese characters (3755 first-level characters and 3008 second-level characters) and 682 non-Chinese characters, and assigns a standard code to each of them so that Chinese text can be exchanged between different computer systems.

The GB2312 character set forms a two-dimensional table with 94 rows and 94 columns. The row number is called the area code and the column number is called the bit number, each running from 1 to 94. The position of every Chinese character or symbol in the table is given by its area code followed by its bit number, and this pair is its location code.
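
As a rough illustration of the table layout described above, the following sketch maps an area code and bit number to a zero-based position in the 94 x 94 table. The function name `table_index` is hypothetical and is not part of the standard.

```python
ROWS = COLS = 94  # GB2312 table dimensions: 94 areas, 94 bit positions each


def table_index(area: int, bit: int) -> int:
    """Return the zero-based linear position of a cell in the 94 x 94 table."""
    if not (1 <= area <= ROWS and 1 <= bit <= COLS):
        raise ValueError("area code and bit number must both be in 1..94")
    return (area - 1) * COLS + (bit - 1)


print(table_index(1, 1))    # first cell of the table
print(table_index(94, 94))  # last cell of the table
```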

For convenience of processing and storage, the area code and the bit number of each character are each stored in one byte. For example, the character "learn" (学) has area code 49 and bit number 07, so its location code is 4907, which as two binary bytes is:

00110001 00000111
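
A minimal sketch of this step, assuming the location code is given as a four-digit decimal string. The helper name `location_to_bytes` is made up for illustration.

```python
def location_to_bytes(location: str) -> bytes:
    """Split a 4-digit decimal location code into one byte per half."""
    if len(location) != 4 or not location.isdigit():
        raise ValueError("location code must be four decimal digits")
    area, bit = int(location[:2]), int(location[2:])
    return bytes([area, bit])


raw = location_to_bytes("4907")
print(" ".join(f"{b:08b}" for b in raw))  # 00110001 00000111
```

Note that each half is converted on its own: 49 becomes 31H and 07 stays 07H. The four digits are never treated as a single decimal number.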

II. National standard exchange code

Location codes cannot be used directly for Chinese communication, because their values may collide with the control codes 00H~1FH (decimal 0~31) used in communication. ASCII is divided into control codes and printable character codes, and the first 32 codes are control codes such as carriage return, line feed and backspace. To avoid them, ISO 2022 stipulates that 32 (20H, binary 00100000) be added to both the area code and the bit number of each character. The code obtained this way is called the national standard exchange code, or exchange code or national standard code (GB code) for short. The national standard exchange code of "learn" (学) is therefore calculated as:

  00110001 00000111
+ 00100000 00100000
-------------------
  01010001 00100111

In hexadecimal notation, it is 5127H.
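
The addition above can be sketched as follows. The function name `to_national` is hypothetical. The final check simply confirms that every valid area code or bit number (1 to 94) lands in the range 21H to 7EH, clear of the control codes.

```python
def to_national(area: int, bit: int) -> bytes:
    """Add 20H to the area code and to the bit number."""
    return bytes([area + 0x20, bit + 0x20])


code = to_national(49, 7)
print(code.hex().upper() + "H")  # 5127H

# Values 1..94 shifted by 20H always fall inside 21H..7EH.
assert all(0x21 <= n + 0x20 <= 0x7E for n in range(1, 95))
```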

III. Internal code

Because Chinese and Western characters are usually mixed in the same text, the two bytes of a Chinese character would be confused with single-byte ASCII codes unless they are specially marked. One solution is to treat a Chinese character as two extended ASCII codes, setting the most significant bit of both bytes to 1, which is the same as adding 80H to each byte of the national standard code. This double-byte code with both high-order bits set to 1 is the internal code of a GB2312 Chinese character.

Therefore, the internal code of "learn" (学) is:

11010001 10100111

In hexadecimal notation, it is D1A7H.
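
A short sketch of this step, with a hypothetical helper `to_internal`. As a cross-check it uses Python's built-in `gb2312` codec, which stores characters in this internal-code form.

```python
def to_internal(national: bytes) -> bytes:
    """Set the most significant bit of each byte (same as adding 80H)."""
    return bytes(b | 0x80 for b in national)


internal = to_internal(bytes([0x51, 0x27]))
print(internal.hex().upper() + "H")       # D1A7H
print(internal.decode("gb2312"))          # 学
print("学".encode("gb2312") == internal)  # True
```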

Finally, note that the input code and the internal code of a Chinese character are different kinds of concept. Whichever input method is used to enter a character (Pinyin, Wubi (five-stroke) and so on), its internal code is the same.

IV. Summary

The Conversion Relationship between Location Code, National Standard Code and Internal Code

Methods:

(1) Convert the area code and the bit number of the location code to hexadecimal separately.

(2) Location code (hexadecimal) + 2020H = national standard code.

(3) National standard code + 8080H = internal code.
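
The three steps above can be sketched as one function working on 16-bit values, mirroring the +2020H and +8080H formulas. The name `convert` is an assumption for illustration, and the example reuses the location code 4907 of "learn" (学) from the earlier sections.

```python
def convert(location: str) -> tuple:
    """Return (location code in hex, national standard code, internal code)."""
    area, bit = int(location[:2]), int(location[2:])
    loc_hex = (area << 8) | bit    # step 1: each half converted separately
    national = loc_hex + 0x2020    # step 2
    internal = national + 0x8080   # step 3
    return loc_hex, national, internal


names = ("location", "national", "internal")
for name, value in zip(names, convert("4907")):
    print(f"{name}: {value:04X}H")
# location: 3107H
# national: 5127H
# internal: D1A7H
```

Adding to the whole 16-bit value is safe here because area codes and bit numbers never exceed 94, so neither byte can overflow into the other.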

For example, take the Chinese character "Da" (大, "big"), whose location code is 2083.

Example analysis:

1. The area code is 20 and the bit number is 83 (both decimal).

2. Convert each half of location code 2083 to hexadecimal separately: 20 = 14H and 83 = 53H, giving 1453H.

3. 1453H + 2020H = 3473H, so the national standard code is 3473H.

4. 3473H + 8080H = B4F3H, so the internal code is B4F3H.

5. Equivalently, in one step: 1453H + A0A0H = B4F3H.

6. In reverse: internal code B4F3H - A0A0H = 1453H, the location code in hexadecimal.

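Note that the location code 2083 is written in decimal: the area code of "Da" is 20 (14H) and its bit number is 83 (53H). Reading 20 as 20H would wrongly give area 32.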
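
The reverse direction can be sketched in the same way. This example assumes Python's built-in `gb2312` codec as the source of the internal code, and `internal_to_location` is a hypothetical helper name.

```python
def internal_to_location(ch: str) -> str:
    """Recover the decimal location code of a single GB2312 character."""
    raw = ch.encode("gb2312")  # internal code, two bytes for Chinese characters
    if len(raw) != 2:
        raise ValueError("not a double-byte GB2312 character")
    area, bit = raw[0] - 0xA0, raw[1] - 0xA0  # subtract A0H from each byte
    return f"{area:02d}{bit:02d}"


print(internal_to_location("大"))  # 2083
print(internal_to_location("学"))  # 4907
```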

© 2024 shulou.com SLNews company. All rights reserved.
