Example Analysis of charset selection for HTML Page Encoding

2026-04-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article presents an example analysis of choosing a charset for HTML page encoding. It should be a useful reference for anyone interested in the topic; the sections below walk through it step by step.

1. The importance of encoding

A wrong encoding can make a web page appear garbled when readers view it in a browser such as IE, and it can also cause DIV+CSS compatibility problems that end up needing CSS hacks.
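As a rough illustration of how a charset mismatch garbles text, the Python sketch below encodes a short string as UTF-8 and then decodes the same bytes as GB2312, which is roughly what a browser does when it assumes the wrong encoding. The sample string and variable names are assumptions made for this example.

```python
# Sketch: the same bytes read under two different charsets.
text = "编码"                      # sample text (an assumption for this demo)
raw = text.encode("utf-8")         # the bytes a UTF-8 page would contain

right = raw.decode("utf-8")                      # declared charset matches the bytes
wrong = raw.decode("gb2312", errors="replace")   # charset mismatch

print(right)   # the original text
print(wrong)   # garbled characters
```

The bytes never change; only the charset used to interpret them does, which is why a correct declaration matters.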

2. Where the encoding declaration goes

The encoding declaration is usually placed in the head of the HTML page, between the <head> and </head> tags.

3. HTML encoding declaration format

Changing the utf-8 in charset=utf-8 to another value changes the declared encoding of the web page.

Generally speaking, when we write a CSS file we should also put @charset "utf-8"; at the very top of the file to declare the encoding of the CSS file. The encoding of the HTML source and that of the CSS file should match; if they do not, the result can be garbled text, broken page layout and other compatibility problems that end up needing CSS hacks.
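A minimal Python sketch of such a consistency check follows. The helper names `html_charset` and `css_charset` are hypothetical, and the sample `<meta>` tag is a common form of the declaration assumed here, since the article does not show the tag itself. The regular expressions are deliberately simple and are not a full HTML or CSS parser.

```python
import re

def html_charset(html: str):
    """Return the charset named in a <meta> tag, lower-cased, or None."""
    m = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w-]+)', html, re.IGNORECASE)
    return m.group(1).lower() if m else None

def css_charset(css: str):
    """Return the charset named by a leading @charset rule, lower-cased, or None."""
    # re.match anchors at the start: @charset only counts at the top of the file.
    m = re.match(r'@charset\s+"([^"]+)"\s*;', css)
    return m.group(1).lower() if m else None

# Sample inputs (assumptions for this demo).
html = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
css = '@charset "utf-8";\nbody { margin: 0; }'

print(html_charset(html), css_charset(css))
print("match" if html_charset(html) == css_charset(css) else "mismatch")
```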

4. Commonly used HTML encodings

The encodings most commonly used in China are utf-8 and gb2312. These two usually cover the needs of domestic web pages. They are also the encodings commonly used by the programs and databases that process web pages and store their data.

5. UTF-8 has the following characteristics

The UCS characters U+0000 to U+007F (ASCII) are encoded as the bytes 0x00 to 0x7F (ASCII compatible). This means that files containing only 7-bit ASCII characters are identical in ASCII and in UTF-8.

All UCS characters above U+007F are encoded as a sequence of several bytes, each of which has its most significant bit set. Therefore an ASCII byte (0x00-0x7F) can never appear as part of any other character.

The first byte of a multi-byte sequence that represents a non-ASCII character is always in the range 0xC0 to 0xFD (0xC2 to 0xF4 in the current definition of UTF-8) and indicates how many bytes the character contains. All further bytes of the sequence are in the range 0x80 to 0xBF. This makes resynchronization particularly easy and makes the encoding stateless and robust against lost bytes.

All possible 2^31 UCS codes can be encoded.

In the original design, UTF-8-encoded characters can be up to 6 bytes long; the current standard (RFC 3629) restricts UTF-8 to code points up to U+10FFFF, so a character takes at most 4 bytes. 16-bit BMP characters are at most 3 bytes long.

The sorting order of big-endian UCS-4 byte strings is preserved.

Bytes 0xFE and 0xFF are never used in UTF-8 encoding.
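Several of these properties can be checked directly with Python's built-in codec, as sketched below. The helper names `utf8_len` and `classify` and the sample strings are assumptions made for illustration. Python's codec follows the current definition of UTF-8, which stops at U+10FFFF, so the longest sequence it produces is 4 bytes.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# ASCII-only text has identical bytes in ASCII and in UTF-8.
ascii_text = "charset=utf-8"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Encoded length at the upper end of each length class.
lengths = [utf8_len(chr(cp)) for cp in (0x7F, 0x7FF, 0xFFFF, 0x10FFFF)]

def classify(byte: int) -> str:
    """Classify one byte of a UTF-8 stream by its high bits."""
    if byte < 0x80:
        return "ascii"           # 0x00-0x7F: a complete ASCII character
    if byte < 0xC0:
        return "continuation"    # 0x80-0xBF: second or later byte of a sequence
    return "lead"                # 0xC0 and above: first byte of a sequence

encoded = "A中".encode("utf-8")
kinds = [classify(b) for b in encoded]

# 0xFE and 0xFF do not occur in UTF-8 output (checked on a sample range only).
sample = "".join(chr(cp) for cp in range(0x20, 0x3000)).encode("utf-8")
has_fe_ff = 0xFE in sample or 0xFF in sample

print(same_bytes, lengths, kinds, has_fe_ff)
```

Because every byte can be classified on its own, a decoder that lands in the middle of a stream only has to skip continuation bytes until it reaches the next ASCII or lead byte.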

6. GB2312 has the following characteristics

The GB2312 standard contains a total of 6763 Chinese characters, including 3755 first-level Chinese characters and 3008 second-level Chinese characters; in addition, GB2312 includes 682 full-width characters, among them Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters.

The introduction of GB2312 basically met the needs of computer processing of Chinese characters; the characters it contains cover 99.75% of usage frequency. GB2312 divides the characters it includes into "areas" (zones), each containing 94 Chinese characters or symbols. This representation is also called the location code.

Areas 01-09 contain special symbols.

Areas 16-55 contain the first-level Chinese characters, sorted by pinyin.

Areas 56-87 contain the second-level Chinese characters, sorted by radical and stroke count.

Areas 10-15 and 88-94 are unassigned.

For example, the character "啊" (a) is the first Chinese character in GB2312, and its location code is 1601.

Byte layout: programs that use GB2312 usually adopt the EUC storage method for compatibility with ASCII. Each Chinese character or symbol is represented by two bytes. The first byte is called the "high byte" and the second the "low byte". The high byte uses 0xA1-0xF7 (the area number 01-87 plus 0xA0), and the low byte uses 0xA1-0xFE (the position 01-94 plus 0xA0). For example, "啊" is stored as 0xB0A1 in most programs (compare with the location code: 0xB0 = 0xA0 + 16, 0xA1 = 0xA0 + 1).
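The conversion from a location code to its two EUC bytes can be sketched in Python as follows. The function name `quwei_to_euc` is hypothetical; the result is checked against Python's built-in gb2312 codec.

```python
def quwei_to_euc(area: int, position: int) -> bytes:
    """Convert a GB2312 location code (area 1-94, position 1-94) to its two EUC bytes."""
    if not (1 <= area <= 94 and 1 <= position <= 94):
        raise ValueError("area and position must both be in 1..94")
    # Adding 0xA0 to each half sets the high bit, keeping the bytes clear of ASCII.
    return bytes([0xA0 + area, 0xA0 + position])

code = quwei_to_euc(16, 1)        # location code 1601
print(code.hex())                 # hex form of the two bytes
print(code.decode("gb2312"))      # the character stored at that location
```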

Therefore, in GB2312 encoding the high (area) byte of a Chinese character ranges from 176 to 247 in decimal, and the low (position) byte from 161 to 254. This gives 72 × 94 = 6768 code positions, slightly more than the 6763 characters stored, because five positions (0xD7FA-0xD7FE, the last five positions of area 55) have no Chinese character assigned, so there are 6768 - 5 = 6763.
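The arithmetic can be reproduced from the byte ranges given earlier (0xB0-0xF7 for the high byte, 0xA1-0xFE for the low byte). The short Python sketch below does so; the constant names are assumptions, and the count of five empty positions is taken from the text rather than computed.

```python
HIGH_FIRST, HIGH_LAST = 0xB0, 0xF7    # high bytes used for Chinese characters (areas 16-87)
LOW_FIRST, LOW_LAST = 0xA1, 0xFE      # low bytes (positions 01-94)
UNASSIGNED = 5                        # empty positions, figure taken from the text

areas = HIGH_LAST - HIGH_FIRST + 1        # number of areas holding Chinese characters
positions = LOW_LAST - LOW_FIRST + 1      # positions per area
slots = areas * positions                 # available code positions

print(HIGH_FIRST, HIGH_LAST, LOW_FIRST, LOW_LAST)   # the ranges in decimal
print(areas, positions, slots, slots - UNASSIGNED)
```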

GB2312 is widely regarded as the encoding for simplified Chinese text.

7. Recommended charset encoding

UTF-8 is recommended. It can represent both simplified and traditional Chinese unambiguously, so the same encoding can be used locally as well as in Taiwan and other regions.
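As a small illustration of this point, the Python sketch below tries to encode one simplified and one traditional sample string with each charset. The sample strings and the helper `can_encode` are assumptions chosen for this example.

```python
simplified = "简体"     # sample simplified Chinese text (assumption)
traditional = "繁體"    # sample traditional Chinese text (assumption)

def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

for charset in ("utf-8", "gb2312"):
    print(charset, can_encode(simplified, charset), can_encode(traditional, charset))
```

UTF-8 accepts both strings, while GB2312 has no code for the traditional character "體", which is one practical reason to prefer UTF-8 for pages read in several regions.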

8. Web page compatibility problems caused by incorrect encoding

Mixing encodings garbles the page, which is itself a form of incompatibility; in addition, mixing encodings in the CSS leads to CSS hacks.

Thank you for reading this article carefully. I hope the article "Example Analysis of charset selection for HTML Page Encoding" shared by the editor has been helpful to you.
