What are the character sets of HTML

2025-03-26 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/01

This article gives a detailed introduction to the character sets used in HTML. Hopefully it answers your questions on the topic.

HTML character set

To display HTML pages correctly, browsers must know which character set to use.

The earliest character set used on the World Wide Web was ASCII. ASCII supports the digits 0-9, the uppercase and lowercase English letters, and some special characters.
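As a rough illustration of ASCII's limits, the following Python sketch encodes plain English text as ASCII and then shows that an accented character cannot be encoded. The sample strings are made up for this example.

```python
# ASCII defines only 128 characters (code points 0-127).
plain = "Hello, World 123!"
encoded = plain.encode("ascii")      # works: every character is in ASCII
print(len(encoded) == len(plain))    # each ASCII character takes one byte

accented = "caf\u00e9"               # "cafe" with an accented e (U+00E9)
try:
    accented.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # The accented letter has no ASCII code, so encoding fails.
    ascii_ok = False
print(ascii_ok)
```

This gap is what the larger character sets described in the rest of the article were designed to fill.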

Because many languages use characters that are not part of ASCII, the default character set in HTML 4 was ISO-8859-1. HTML5 documents are expected to be encoded as UTF-8.

If a page uses a character set other than the default, the character set should be specified in a <meta> tag.
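To make the declaration concrete, here is a minimal Python sketch that reads the character set a page declares in its <meta> tag. The two sample pages, the `CharsetFinder` class, and the `declared_charset` helper are hypothetical names invented for this example; the logic is a simplification and not the full detection algorithm that browsers use.

```python
from html.parser import HTMLParser

class CharsetFinder(HTMLParser):
    """Records the first charset declared in a <meta> tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("charset"):
            # HTML5 form: <meta charset="...">
            self.charset = attr_map["charset"]
        elif (attr_map.get("http-equiv") or "").lower() == "content-type":
            # HTML 4 form: content="text/html; charset=..."
            content = attr_map.get("content") or ""
            for part in content.split(";"):
                name, _, value = part.strip().partition("=")
                if name.lower() == "charset" and value:
                    self.charset = value

def declared_charset(page):
    finder = CharsetFinder()
    finder.feed(page)
    return finder.charset

# Sample pages (assumptions for illustration only).
html4_page = ('<html><head><meta http-equiv="Content-Type" '
              'content="text/html; charset=ISO-8859-2"></head></html>')
html5_page = '<html><head><meta charset="UTF-8"></head></html>'

print(declared_charset(html4_page))
print(declared_charset(html5_page))
```

The first sample uses the older HTML 4 style declaration and the second uses the shorter HTML5 form; both tell the browser which character set to use when decoding the page.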

ISO character set

The ISO character sets are standard character sets defined by the International Organization for Standardization (ISO) for different alphabets and languages.

The following lists the different character sets used around the world:

Character set | Description | Used in
ISO-8859-1 | Latin alphabet part 1 | North America, Western Europe, Latin America, the Caribbean, Canada, Africa
ISO-8859-2 | Latin alphabet part 2 | Eastern Europe
ISO-8859-3 | Latin alphabet part 3 | SE Europe, Esperanto, miscellaneous others
ISO-8859-4 | Latin alphabet part 4 | Scandinavia and the Baltics (and other regions not covered by ISO-8859-1)
ISO-8859-5 | Latin/Cyrillic part 5 | Languages that use the Cyrillic alphabet, such as Bulgarian, Belarusian, Russian, and Macedonian
ISO-8859-6 | Latin/Arabic part 6 | Languages that use the Arabic alphabet
ISO-8859-7 | Latin/Greek part 7 | Modern Greek, and mathematical symbols derived from Greek
ISO-8859-8 | Latin/Hebrew part 8 | Languages that use the Hebrew alphabet
ISO-8859-9 | Latin 5 part 9 | Turkish. Identical to ISO-8859-1 except that Turkish characters replace the Icelandic ones.
ISO-8859-10 | Latin 6 | Lappish (Sami), Nordic, and Eskimo (Inuit) languages
ISO-8859-15 | Latin 9 (aka Latin 0) | Similar to ISO-8859-1, with the euro sign and some other characters replacing less used symbols
ISO-2022-JP | Latin/Japanese part 1 | Japanese
ISO-2022-JP-2 | Latin/Japanese part 2 | Japanese
ISO-2022-KR | Latin/Korean part 1 | Korean
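The practical consequence of having many ISO-8859 parts is that the same byte means different characters depending on which part is assumed. The following Python sketch decodes a few single bytes under several of the character sets listed above; the byte values were picked for illustration.

```python
# One byte, three different characters depending on the character set.
raw = bytes([0xE6])
as_latin1 = raw.decode("iso-8859-1")     # Latin small letter ae
as_cyrillic = raw.decode("iso-8859-5")   # Cyrillic small letter tse
as_greek = raw.decode("iso-8859-7")      # Greek small letter zeta

# ISO-8859-15 replaces the generic currency sign with the euro sign.
currency = bytes([0xA4])
in_latin1 = currency.decode("iso-8859-1")    # currency sign (U+00A4)
in_latin9 = currency.decode("iso-8859-15")   # euro sign (U+20AC)

# ISO-8859-9 replaces Icelandic letters with Turkish ones.
letter = bytes([0xF0])
icelandic = letter.decode("iso-8859-1")      # eth (U+00F0)
turkish = letter.decode("iso-8859-9")        # g with breve (U+011F)

for ch in (as_latin1, as_cyrillic, as_greek,
           in_latin1, in_latin9, icelandic, turkish):
    print(f"U+{ord(ch):04X} {ch}")
```

A page decoded with the wrong ISO-8859 part therefore shows the wrong letters, which is why the character set has to be declared.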

Unicode standard

Because the character sets listed above are limited in size and are not compatible with one another in multilingual environments, the Unicode Consortium developed the Unicode standard.

The Unicode standard covers almost all of the characters, punctuation marks, and symbols in the world.

Unicode enables text data to be processed, stored, and exchanged regardless of platform, program, or language.

Unicode Consortium

The Unicode Consortium developed the Unicode standard. Its goal is to replace the existing character sets with its standard Unicode Transformation Format (UTF).

The Unicode standard has been successful, and Unicode has been implemented in XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, and WML. Unicode is also supported in many operating systems and in all modern browsers.

The Unicode Consortium works with leading standards development organizations such as ISO, W3C, and ECMA.

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 and UTF-16:

Character set | Description
UTF-8 | A character in UTF-8 can be 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard. UTF-8 is backward compatible with ASCII. UTF-8 is the preferred encoding for web pages and e-mail.
UTF-16 | The 16-bit Unicode Transformation Format is a variable-length character encoding capable of encoding the entire Unicode repertoire. UTF-16 is used mainly in operating systems and runtime environments, such as Microsoft Windows 2000/XP/2003/Vista/CE and the Java and .NET bytecode environments.
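The byte lengths described above can be checked with a short Python sketch. The four sample characters are chosen for illustration and are written as escapes so that the snippet does not depend on the encoding of the source file.

```python
# Four sample characters from increasingly high Unicode ranges.
samples = [
    "A",            # U+0041, also an ASCII character
    "\u00e9",       # U+00E9, e with acute accent
    "\u20ac",       # U+20AC, euro sign
    "\U0001F600",   # U+1F600, an emoji outside the 16-bit range
]

# UTF-8 uses between 1 and 4 bytes per character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# UTF-16 uses one 16-bit unit (2 bytes) or two units (4 bytes).
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

# Backward compatibility: ASCII text has identical bytes in UTF-8.
ascii_compatible = "Hello".encode("utf-8") == "Hello".encode("ascii")

print(utf8_lengths)      # [1, 2, 3, 4]
print(utf16_lengths)     # [2, 2, 2, 4]
print(ascii_compatible)  # True
```

The `utf-16-le` codec is used here so that no byte order mark is added to the output, which keeps the length counts per character.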

Tip: The first 256 characters of Unicode correspond to the 256 characters of ISO-8859-1.
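This correspondence can be verified with a small Python sketch, which decodes every possible byte value as ISO-8859-1 and compares the result with the byte's numeric value:

```python
# Decode all 256 byte values as ISO-8859-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

# Each byte value N maps to the Unicode code point with the same number.
matches = all(ord(ch) == i for i, ch in enumerate(decoded))
print(matches)  # True
```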

Tip: all HTML 4 processors support UTF-8, while all XHTML and XML processors support UTF-8 and UTF-16!
