URL Encoding and Character Encoding Supported in HTML5

2025-02-25 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/03

This article organizes the URL encoding and character encoding options supported in HTML5. Both are practical topics that come up regularly in everyday web development.

URL encoding

URL encoding converts characters that are unprintable, or that have a special meaning in URLs, into a representation that web browsers and servers universally understand. These characters include:

ASCII control characters: unprintable characters originally intended for output control. They occupy hexadecimal 00-1F (decimal 0-31) and 7F (decimal 127). The complete coding table is provided below.

Non-ASCII characters: characters outside the 128-character ASCII set. They make up the upper half of the ISO-Latin character set, hexadecimal 80-FF (decimal 128-255). The complete coding table is provided below.

Reserved characters: dollar sign ($), ampersand (&), plus sign (+), comma (,), slash (/), colon (:), semicolon (;), equals sign (=), question mark (?), and at sign (@). Each of these has a special meaning within a URL, so it must be encoded whenever it is meant literally. The complete coding table is provided below.

Unsafe characters: space, quotation mark ("), less-than sign (<), greater-than sign (>), pound sign (#), percent sign (%), left and right braces ({ }), vertical bar (|), backslash (\), caret (^), tilde (~), left and right square brackets ([ ]), and grave accent (`). These characters can be misinterpreted when they appear in URLs (for example, # marks a fragment and % introduces an encoded character), so they should always be encoded. The complete coding table is provided below.

The encoding notation replaces a character with three characters: a percent sign (%) followed by two hexadecimal digits giving the character's code in the ASCII (or ISO-Latin) character set.
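The notation is mechanical enough to sketch in a few lines. The following Python example is a minimal illustration, not a full URL encoder: the helper names `percent_encode_char` and `percent_decode_triplet` are hypothetical, and the code assumes the character occupies a single byte in the chosen character set (Latin-1 by default). Python writes upper-case hex digits; `%2f` and `%2F` mean the same thing.

```python
def percent_encode_char(ch: str, encoding: str = "latin-1") -> str:
    """Hypothetical helper: return the three-character %XX form of one character."""
    data = ch.encode(encoding)  # e.g. " " becomes the single byte 0x20
    if len(data) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {encoding}")
    # A percent sign followed by exactly two hexadecimal digits.
    return "%{:02X}".format(data[0])


def percent_decode_triplet(triplet: str, encoding: str = "latin-1") -> str:
    """Hypothetical helper: turn a %XX triplet back into its character."""
    if len(triplet) != 3 or triplet[0] != "%":
        raise ValueError(f"not a %XX triplet: {triplet!r}")
    return bytes([int(triplet[1:], 16)]).decode(encoding)


print(percent_encode_char(" "))       # %20
print(percent_decode_triplet("%2f"))  # /
```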

Example

One of the most common special characters is the space, which cannot be typed directly into a URL. A space is hexadecimal 20 in the character set, so %20 is used to represent a space in a request to the server.

For example, the URL http://www.example.com/new%20pricing.html retrieves a document named "new pricing.html" from www.example.com.
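In practice a library does this work. A short sketch using Python's standard `urllib.parse` module reproduces the URL from the example and decodes it again:

```python
from urllib.parse import quote, unquote, urljoin

# quote() percent-encodes unsafe characters; its default safe="/" keeps
# path separators intact.
path = quote("new pricing.html")
url = urljoin("http://www.example.com/", path)

print(url)            # http://www.example.com/new%20pricing.html
print(unquote(path))  # new pricing.html
```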

ASCII control character encoding

This includes hexadecimal character codes 00-1F (decimal 0-31) and 7F (decimal 127).

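Decimal  Hex  Character        URL encoding
0        00                    %00
1        01                    %01
2        02                    %02
3        03                    %03
4        04                    %04
5        05                    %05
6        06                    %06
7        07                    %07
8        08   backspace        %08
9        09   tab              %09
10       0a   newline          %0a
11       0b                    %0b
12       0c                    %0c
13       0d   carriage return  %0d
14       0e                    %0e
15       0f                    %0f
16       10                    %10
17       11                    %11
18       12                    %12
19       13                    %13
20       14                    %14
21       15                    %15
22       16                    %16
23       17                    %17
24       18                    %18
25       19                    %19
26       1a                    %1a
27       1b                    %1b
28       1c                    %1c
29       1d                    %1d
30       1e                    %1e
31       1f                    %1f
127      7f                    %7f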
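Because every entry follows the same rule, the table can be regenerated rather than copied. A minimal Python sketch (the function name `control_rows` is hypothetical) builds the rows and cross-checks them against `urllib.parse.quote`, which writes the hex digits in upper case:

```python
from urllib.parse import quote

# Hex 00-1F plus 7F, as listed in the table.
CONTROL_CODES = list(range(0x00, 0x20)) + [0x7F]


def control_rows():
    """Hypothetical helper: (decimal, hex, URL encoding) for each control code."""
    return [(code, f"{code:02x}", f"%{code:02x}") for code in CONTROL_CODES]


# Cross-check: quote() produces the same triplets, with upper-case hex digits.
assert all(quote(chr(code)) == url.upper() for code, _, url in control_rows())

for dec, hx, url in control_rows():
    print(f"{dec:<9}{hx:<5}{url}")
```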

Non-ASCII character encoding

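This covers the upper half of the ISO-Latin character set, hexadecimal 80-FF (decimal 128-255). Positions 80-9F are shown as Windows-1252 assigns them (ISO-8859-1 itself reserves that range for control codes), and "(unused)" marks codes Windows-1252 leaves undefined. Note that these single-byte codes apply only when a URL is interpreted in one of these character sets; current browsers percent-encode non-ASCII characters in URL paths as their UTF-8 bytes, so é becomes %C3%A9 rather than %e9.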

Decimal  Hex  Character             URL encoding
128      80   €                     %80
129      81   (unused)              %81
130      82   ‚                     %82
131      83   ƒ                     %83
132      84   „                     %84
133      85   …                     %85
134      86   †                     %86
135      87   ‡                     %87
136      88   ˆ                     %88
137      89   ‰                     %89
138      8a   Š                     %8a
139      8b   ‹                     %8b
140      8c   Œ                     %8c
141      8d   (unused)              %8d
142      8e   Ž                     %8e
143      8f   (unused)              %8f
144      90   (unused)              %90
145      91   ‘                     %91
146      92   ’                     %92
147      93   “                     %93
148      94   ”                     %94
149      95   •                     %95
150      96   –                     %96
151      97   —                     %97
152      98   ˜                     %98
153      99   ™                     %99
154      9a   š                     %9a
155      9b   ›                     %9b
156      9c   œ                     %9c
157      9d   (unused)              %9d
158      9e   ž                     %9e
159      9f   Ÿ                     %9f
160      a0   (non-breaking space)  %a0
161      a1   ¡                     %a1
162      a2   ¢                     %a2
163      a3   £                     %a3
164      a4   ¤                     %a4
165      a5   ¥                     %a5
166      a6   ¦                     %a6
167      a7   §                     %a7
168      a8   ¨                     %a8
169      a9   ©                     %a9
170      aa   ª                     %aa
171      ab   «                     %ab
172      ac   ¬                     %ac
173      ad   (soft hyphen)         %ad
174      ae   ®                     %ae
175      af   ¯                     %af
176      b0   °                     %b0
177      b1   ±                     %b1
178      b2   ²                     %b2
179      b3   ³                     %b3
180      b4   ´                     %b4
181      b5   µ                     %b5
182      b6   ¶                     %b6
183      b7   ·                     %b7
184      b8   ¸                     %b8
185      b9   ¹                     %b9
186      ba   º                     %ba
187      bb   »                     %bb
188      bc   ¼                     %bc
189      bd   ½                     %bd
190      be   ¾                     %be
191      bf   ¿                     %bf
192      c0   À                     %c0
193      c1   Á                     %c1
194      c2   Â                     %c2
195      c3   Ã                     %c3
196      c4   Ä                     %c4
197      c5   Å                     %c5
198      c6   Æ                     %c6
199      c7   Ç                     %c7
200      c8   È                     %c8
201      c9   É                     %c9
202      ca   Ê                     %ca
203      cb   Ë                     %cb
204      cc   Ì                     %cc
205      cd   Í                     %cd
206      ce   Î                     %ce
207      cf   Ï                     %cf
208      d0   Ð                     %d0
209      d1   Ñ                     %d1
210      d2   Ò                     %d2
211      d3   Ó                     %d3
212      d4   Ô                     %d4
213      d5   Õ                     %d5
214      d6   Ö                     %d6
215      d7   ×                     %d7
216      d8   Ø                     %d8
217      d9   Ù                     %d9
218      da   Ú                     %da
219      db   Û                     %db
220      dc   Ü                     %dc
221      dd   Ý                     %dd
222      de   Þ                     %de
223      df   ß                     %df
224      e0   à                     %e0
225      e1   á                     %e1
226      e2   â                     %e2
227      e3   ã                     %e3
228      e4   ä                     %e4
229      e5   å                     %e5
230      e6   æ                     %e6
231      e7   ç                     %e7
232      e8   è                     %e8
233      e9   é                     %e9
234      ea   ê                     %ea
235      eb   ë                     %eb
236      ec   ì                     %ec
237      ed   í                     %ed
238      ee   î                     %ee
239      ef   ï                     %ef
240      f0   ð                     %f0
241      f1   ñ                     %f1
242      f2   ò                     %f2
243      f3   ó                     %f3
244      f4   ô                     %f4
245      f5   õ                     %f5
246      f6   ö                     %f6
247      f7   ÷                     %f7
248      f8   ø                     %f8
249      f9   ù                     %f9
250      fa   ú                     %fa
251      fb   û                     %fb
252      fc   ü                     %fc
253      fd   ý                     %fd
254      fe   þ                     %fe
255      ff   ÿ                     %ff
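The single-byte codes in this table only hold when the URL is interpreted in ISO-Latin or Windows-1252. The sketch below uses the `encoding` parameter of Python's `urllib.parse.quote` to contrast those codes with the UTF-8 form, where one character becomes several %XX triplets:

```python
from urllib.parse import quote

# Single-byte codes, as in the table above.
latin1_e = quote("é", encoding="latin-1")    # é is byte 0xE9 in ISO-8859-1
cp1252_euro = quote("€", encoding="cp1252")  # Windows-1252 puts € at 0x80

# UTF-8 (quote's default), where each character may take several bytes.
utf8_e = quote("é")
utf8_euro = quote("€")

print(latin1_e, cp1252_euro)  # %E9 %80
print(utf8_e, utf8_euro)      # %C3%A9 %E2%82%AC
```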

Reserved character encoding

The following table is used to encode reserved characters.

Decimal  Hex  Character  URL encoding
36       24   $          %24
38       26   &          %26
43       2b   +          %2b
44       2c   ,          %2c
47       2f   /          %2f
58       3a   :          %3a
59       3b   ;          %3b
61       3d   =          %3d
63       3f   ?          %3f
64       40   @          %40
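Whether a reserved character should be encoded depends on whether it is meant as a delimiter. This Python sketch compares `urllib.parse.quote` with `safe=""` (encode every reserved character) against its default `safe="/"`, and shows `urlencode` escaping an `&` inside a query value; the sample path and query are made up for illustration:

```python
from urllib.parse import quote, urlencode

RESERVED = "$&+,/:;=?@"

# safe="" asks quote() to encode every reserved character, "/" included.
all_encoded = quote(RESERVED, safe="")

# The default safe="/" keeps slashes so the path structure survives,
# while the literal "&" in the file name is still encoded.
path = quote("docs/a&b.html")

# urlencode() escapes "&" inside a value so it cannot split the query;
# by default it also turns spaces into "+".
query = urlencode({"q": "salt & pepper"})

print(all_encoded)  # %24%26%2B%2C%2F%3A%3B%3D%3F%40
print(path)         # docs/a%26b.html
print(query)        # q=salt+%26+pepper
```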

Unsafe character encoding

The following table is used to encode unsafe characters.

Decimal  Hex  Character  URL encoding
32       20   space      %20
34       22   "          %22
60       3c   <          %3c
62       3e   >          %3e
35       23   #          %23
37       25   %          %25
123      7b   {          %7b
125      7d   }          %7d
124      7c   |          %7c
92       5c   \          %5c
94       5e   ^          %5e
126      7e   ~          %7e
91       5b   [          %5b
93       5d   ]          %5d
96       60   `          %60
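A quick check of the unsafe characters with Python's `urllib.parse.quote`. One caveat: since Python 3.7, `quote` leaves `~` unencoded because RFC 3986 later classified it as unreserved, so the sketch replaces it explicitly to match the table above:

```python
from urllib.parse import quote

# The unsafe characters from the table, in the same order.
UNSAFE = ' "<>#%{}|\\^~[]`'

encoded = quote(UNSAFE, safe="")
# Python 3.7+ treats "~" as unreserved and leaves it alone; encode it
# explicitly to follow the table above.
strict = encoded.replace("~", "%7E")

print(encoded)  # %20%22%3C%3E%23%25%7B%7D%7C%5C%5E~%5B%5D%60
print(strict)   # %20%22%3C%3E%23%25%7B%7D%7C%5C%5E%7E%5B%5D%60
```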

Character encoding

Character encoding is the method used to convert bytes into characters. To validate or display an HTML document, a program must choose a character encoding. HTML5 authors have three ways to set it:
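To see why the choice matters, here are the same bytes decoded under two encodings in Python. The sample word is arbitrary:

```python
# "café" encoded as UTF-8: the é becomes the two bytes 0xC3 0xA9.
data = b"caf\xc3\xa9"

as_utf8 = data.decode("utf-8")      # the encoding the bytes were written in
as_latin1 = data.decode("latin-1")  # a wrong guess: each byte becomes one character

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```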

HTTP Content-Type header:

If you are writing a CGI program or similar server-side code, you can declare any character encoding in the HTTP Content-Type header.

Here is a simple example:


# Perl-style CGI: the second \r\n is the blank line that ends the headers
print "Content-Type: text/html; charset=utf-8\r\n\r\n";
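The same header can be sent from any server-side framework. As a sketch that assumes Python's WSGI interface (PEP 3333) rather than a raw CGI script, the application below declares UTF-8 in its `Content-Type` header; the page content is a placeholder:

```python
def app(environ, start_response):
    """Minimal WSGI application that labels its HTML response as UTF-8."""
    body = "<!DOCTYPE html><title>Café</title><p>Grüße</p>".encode("utf-8")
    start_response("200 OK", [
        # The charset parameter tells the browser how to decode the bytes.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local testing it could be served with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`; the port number is arbitrary.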

<meta> element:

You can specify the encoding with a <meta> element carrying a charset attribute. The element must appear within the first 1024 bytes of an HTML5 document.

Here is a simplified example:

<meta charset="UTF-8">

This short syntax replaces the older <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">, although that longer form is still allowed.
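A browser has to find this declaration by scanning the start of the byte stream before it knows the encoding. The Python sketch below is a deliberately simplified stand-in for the HTML standard's prescan algorithm, not an implementation of it: the function name `sniff_meta_charset` is hypothetical, it relies on a regular expression, and it assumes the declaration sits within the first 1024 bytes. It recognizes both the short form and the older `http-equiv` form:

```python
import re

# Matches <meta charset="..."> as well as the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def sniff_meta_charset(data: bytes, limit: int = 1024):
    """Hypothetical helper: return the declared charset (lower-case) or None.

    Only the first `limit` bytes are searched, mirroring the requirement that
    the declaration appear early in the document.
    """
    match = META_CHARSET.search(data[:limit])
    return match.group(1).decode("ascii").lower() if match else None


print(sniff_meta_charset(b'<!DOCTYPE html><meta charset="UTF-8">'))  # utf-8
```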

Unicode Byte Order Mark (BOM)

A byte order mark (BOM) is the character U+FEFF placed at the beginning of a data stream. It serves as a signature that identifies the byte order and the encoding, mainly for plain-text files that carry no other encoding label.

Many Windows programs (including older versions of Windows Notepad) add the bytes 0xEF, 0xBB, 0xBF to the beginning of any document saved as UTF-8. These bytes are the UTF-8 encoding of the Unicode byte order mark, often called the UTF-8 BOM, even though UTF-8 has no byte order to indicate.
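The three bytes are simply U+FEFF run through the UTF-8 encoder. Python's standard library exposes them as `codecs.BOM_UTF8` and provides a `utf-8-sig` codec that writes and strips the mark:

```python
import codecs

# U+FEFF encoded as UTF-8 gives exactly the three bytes EF BB BF.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

with_bom = "hello".encode("utf-8-sig")  # this codec prepends the BOM

print(with_bom)                        # b'\xef\xbb\xbfhello'
print(with_bom.decode("utf-8-sig"))    # hello  (BOM removed)
print(repr(with_bom.decode("utf-8")))  # '\ufeffhello'  (plain UTF-8 keeps it)
```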

An HTML5 document may begin with a BOM character, which then acts as a signature for the encoding used.
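In the HTML standard's encoding sniffing, a BOM is checked first and takes precedence over both the Content-Type header and the `<meta>` element; only the UTF-8, UTF-16BE, and UTF-16LE marks are recognized. A minimal Python sketch of that first step follows. The names `sniff_bom` and `decode_html_bytes` are hypothetical, and the plain UTF-8 fallback stands in for the rest of the sniffing algorithm:

```python
import codecs

# BOMs recognized by HTML's encoding sniffing, checked in this order.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Hypothetical helper: return (encoding, BOM length), or (None, 0) if absent."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_html_bytes(data: bytes, fallback: str = "utf-8") -> str:
    """Hypothetical helper: a BOM wins; otherwise decode with `fallback`."""
    encoding, bom_length = sniff_bom(data)
    return data[bom_length:].decode(encoding or fallback)
```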

That covers the URL encoding and character encoding supported in HTML5. Several of these points come up regularly in day-to-day web work.


© 2024 shulou.com SLNews company. All rights reserved.
