What are the character encodings in front-end development 07/19 Update SLTechnology News&Howtos


2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02


In front-end development we deal with many kinds of encoding, mainly UTF-8 and HTML entity encoding, but the web front end involves more than these two. The choice of encoding can also cause problems, such as incompatibilities between the encodings used in different parts of a system and XSS vulnerabilities introduced by multi-byte encodings. This article therefore aims to give a more complete picture of the character encodings involved in front-end development, so that unexpected interactions between them and easily overlooked vulnerabilities can be avoided.

URL encoding

In an earlier article on URL encoding/decoding and Base64, I described the three sets of URL encoding functions and compared them with Base64 encoding. They are briefly summarized here.

The escape/unescape pair encodes a wide character as its Unicode code value in hexadecimal, so applying escape to a Chinese character gives a result of the form "%uXXXX". The encodeURI/decodeURI and encodeURIComponent/decodeURIComponent pairs handle wide characters differently: they first encode the character as UTF-8 and then write each resulting byte as "%" followed by two hexadecimal digits. For ASCII characters, the three sets also differ in which characters they leave unencoded: escape keeps letters, digits and @*_+-./; encodeURI keeps letters, digits, -_.!~*'() and the URI syntax characters ;/?:@&=+$,#; encodeURIComponent keeps only letters, digits and -_.!~*'().
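The differences are easiest to see by running the same strings through all three functions. This is a small sketch assuming a JavaScript runtime that still provides the legacy escape function (browsers and Node.js do); encodeAllThreeWays is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: run one string through the three encoding functions.
function encodeAllThreeWays(text) {
  return {
    // Legacy: "%XX" for code units below 256, "%uXXXX" for wider ones.
    escape: escape(text),
    // UTF-8 bytes as "%XX"; keeps URI syntax characters such as / ? # & =.
    encodeURI: encodeURI(text),
    // UTF-8 bytes as "%XX"; also escapes the URI syntax characters.
    encodeURIComponent: encodeURIComponent(text),
  };
}

console.log(encodeAllThreeWays("中国"));
// escape:             "%u4E2D%u56FD"
// encodeURI:          "%E4%B8%AD%E5%9B%BD"
// encodeURIComponent: "%E4%B8%AD%E5%9B%BD"

console.log(encodeAllThreeWays("a b/?#&="));
// escape:             "a%20b/%3F%23%26%3D"
// encodeURI:          "a%20b/?#&="
// encodeURIComponent: "a%20b%2F%3F%23%26%3D"
```

encodeURI suits a complete URL whose structure must survive, while encodeURIComponent suits a single parameter value that may itself contain "/", "?" or "&".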

Base64 encoding

On the front end, Base64 encoding is commonly used to embed pictures and icons. It splits the input into groups of three 8-bit bytes (24 bits) and divides each group into four 6-bit values; each 6-bit value (equivalently, a byte whose two high-order bits are zero) is mapped to one of 64 printable characters. No information is lost, so Base64 encoding is reversible. Most browsers provide window.btoa() for Base64-encoding strings, but it only accepts characters in the range U+0000 to U+00FF and throws an InvalidCharacterError for wider characters. To Base64-encode Chinese text, first convert it to UTF-8 bytes and then Base64-encode those bytes:
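As a sketch of the grouping just described, the hypothetical function base64FromBytes below Base64-encodes an array of byte values by hand using the standard alphabet from RFC 4648, so its output can be compared with btoa. It assumes btoa is available as a global (modern browsers, Node.js 16 and later), and it also shows btoa rejecting wide characters.

```javascript
// Standard Base64 alphabet (RFC 4648): index 0..63 -> printable character.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: Base64-encode an array of byte values (0..255) by hand.
function base64FromBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    // Pack up to three bytes into one 24-bit number (missing bytes count as 0).
    const group =
      (bytes[i] << 16) |
      ((hasSecond ? bytes[i + 1] : 0) << 8) |
      (hasThird ? bytes[i + 2] : 0);
    // Split the 24 bits into four 6-bit indices.
    out += BASE64_ALPHABET[(group >> 18) & 0x3f];
    out += BASE64_ALPHABET[(group >> 12) & 0x3f];
    // "=" pads the output when the last group has fewer than three bytes.
    out += hasSecond ? BASE64_ALPHABET[(group >> 6) & 0x3f] : "=";
    out += hasThird ? BASE64_ALPHABET[group & 0x3f] : "=";
  }
  return out;
}

const manBytes = Array.from("Man", (c) => c.charCodeAt(0)); // [77, 97, 110]
console.log(base64FromBytes(manBytes)); // "TWFu"
console.log(btoa("Man")); // "TWFu"

// btoa only accepts characters from U+0000 to U+00FF, so wide characters fail.
try {
  btoa("中国");
} catch (err) {
  console.log("btoa rejected the input:", err.name);
}
```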

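```
function unicodeToBase64(s) {
  // encodeURIComponent yields the UTF-8 bytes as "%XX"; unescape turns each
  // "%XX" into one character (U+0000..U+00FF), which btoa can then encode.
  return window.btoa(unescape(encodeURIComponent(s)));
}
```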

encodeURIComponent encodes a wide character as a sequence of "%XX" escapes, one per UTF-8 byte, so the result differs from the raw UTF-8 bytes only by the "%" prefix (this percent-encoding is defined by RFC 3986: each byte of a non-ASCII character is written as two hexadecimal digits preceded by "%"). unescape(encodeURIComponent(s)) therefore turns each "%XX" back into a single character whose code equals that byte, producing a string of UTF-8 bytes. You could also write your own function that produces the UTF-8 bytes according to the UTF-8 encoding rules. For example:

```
// "中国" means "China"
unescape(encodeURIComponent("中国")); // result: "\u00E4\u00B8\u00AD\u00E5\u009B\u00BD"
encodeURIComponent("中国"); // result: "%E4%B8%AD%E5%9B%BD"
console.log("\u00E4\u00B8\u00AD\u00E5\u009B\u00BD"); // prints the same six characters, one per UTF-8 byte
```
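In newer code the deprecated unescape step can be replaced with TextEncoder, which produces the UTF-8 bytes directly. This is a swapped-in technique rather than the method used above; utf8ToBase64 and base64ToUtf8 are hypothetical helper names, and the sketch assumes TextEncoder, TextDecoder, btoa and atob are available as globals.

```javascript
// Hypothetical helper: UTF-8-safe Base64 using TextEncoder instead of unescape.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes as a Uint8Array
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte
  return btoa(binary);
}

// Hypothetical helper: the inverse, Base64 -> UTF-8 bytes -> text.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes); // TextDecoder defaults to UTF-8
}

console.log(utf8ToBase64("中国")); // "5Lit5Zu9"
console.log(base64ToUtf8("5Lit5Zu9")); // "中国"
```

For "中国" this gives the same result as btoa(unescape(encodeURIComponent("中国"))), because both routes feed btoa one character per UTF-8 byte.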

A simple replacement thus converts URL encoding into UTF-8 bytes, which in turn makes it possible to Base64-encode wide characters. With this function, content in data URI form can be generated by hand: specify the MIME type and the encoding, then append the encoded text, as in the following code:

```
Abc // before encoding: test
```
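A minimal sketch of such a generator, assuming the data URI layout data:[<mediatype>][;base64],<data> from RFC 2397. toDataUri is a hypothetical helper; the MIME type is whatever the caller passes, and the charset is fixed to UTF-8 because that is what the conversion produces.

```javascript
// Hypothetical helper: build a Base64 data URI (RFC 2397 layout).
function toDataUri(text, mimeType = "text/plain") {
  // Same UTF-8 -> Base64 conversion as unicodeToBase64:
  // encodeURIComponent -> "%XX" UTF-8 bytes, unescape -> one char per byte, btoa -> Base64.
  const payload = btoa(unescape(encodeURIComponent(text)));
  return `data:${mimeType};charset=utf-8;base64,${payload}`;
}

console.log(toDataUri("中国")); // "data:text/plain;charset=utf-8;base64,5Lit5Zu9"
console.log(toDataUri("<p>中国</p>", "text/html"));
```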

Compatibility between front-end UTF-8 encoding and back-end GBK encoding

Today the front end mostly uses UTF-8, for HTML, JS and CSS alike, while for historical reasons many back ends decode with GBK or GB2312. A URL-encoded string that the front end sends as a parameter therefore cannot be decoded correctly on the server. For better compatibility, the front end can URL-encode the value twice, that is, encodeURIComponent(encodeURIComponent("中国")). After receiving the parameter, the back end first decodes it as GBK or GB2312, which yields the UTF-8 percent-encoded string, and then decodes that result as UTF-8. The trick works because ASCII characters are encoded identically in GBK, GB2312 and UTF-8: after the outer encoding pass the parameter contains only ASCII characters, so the GBK or GB2312 decoding step cannot corrupt it.
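The round trip can be simulated in JavaScript. This is a sketch under stated assumptions: doubleEncode and doubleDecode are hypothetical helpers, and since the runtime has no GBK URL decoder built in, the first decoding step uses decodeURIComponent as a stand-in. That substitution is only valid because the doubly encoded parameter contains nothing but ASCII characters.

```javascript
// Front end: encode twice. The inner pass yields UTF-8 "%XX" escapes;
// the outer pass turns each "%" into "%25", so the value sent is pure ASCII.
function doubleEncode(text) {
  return encodeURIComponent(encodeURIComponent(text));
}

// Back end (simulated, hypothetical): first decode with an ASCII-compatible
// charset (decodeURIComponent stands in for GBK/GB2312 here), then decode as UTF-8.
function doubleDecode(param) {
  const onceDecoded = decodeURIComponent(param); // stand-in for the GBK/GB2312 pass
  return decodeURIComponent(onceDecoded); // UTF-8 pass
}

const sent = doubleEncode("中国");
console.log(sent); // "%25E4%25B8%25AD%25E5%259B%25BD"
console.log(decodeURIComponent(sent)); // "%E4%B8%AD%E5%9B%BD"
console.log(doubleDecode(sent)); // "中国"
```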

HTML entity encoding and hexadecimal/decimal encoding

Entity encoding is used for characters that are reserved in HTML, such as < and >. It takes two forms: &entity_name; or &#entity_number;. Because browser support for entity names varies, encoding with entity numbers is recommended.
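Following the recommendation to prefer entity numbers, a minimal escaping function might look like the sketch below. escapeHtmlNumeric is a hypothetical name, and the set of characters it escapes (& < > " ') is an assumption that covers text content and quoted attribute values.

```javascript
// Hypothetical helper: replace HTML-reserved characters with decimal entity numbers.
function escapeHtmlNumeric(text) {
  // & < > " ' become &#38; &#60; &#62; &#34; &#39; respectively.
  return text.replace(/[&<>"']/g, (ch) => `&#${ch.charCodeAt(0)};`);
}

console.log(escapeHtmlNumeric('<a href="x">Tom & Jerry</a>'));
// "&#60;a href=&#34;x&#34;&#62;Tom &#38; Jerry&#60;/a&#62;"
```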

Hexadecimal/decimal encoding, as the name implies, writes the code value of an ASCII character in hexadecimal or decimal, as &#xH; (hexadecimal) or &#D; (decimal). For example, "a" (code value 97, hexadecimal 61) becomes &#x61; or &#97;.
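Both forms can be produced mechanically. In the sketch below, toDecimalRefs and toHexRefs are hypothetical helpers that encode every character of a string; iterating with for...of walks code points, so a character outside the Basic Multilingual Plane (such as an emoji) becomes one reference rather than two surrogate halves.

```javascript
// Hypothetical helper: every character -> decimal reference (&#D;).
function toDecimalRefs(text) {
  let out = "";
  for (const ch of text) out += `&#${ch.codePointAt(0)};`;
  return out;
}

// Hypothetical helper: every character -> hexadecimal reference (&#xH;).
function toHexRefs(text) {
  let out = "";
  for (const ch of text) out += `&#x${ch.codePointAt(0).toString(16)};`;
  return out;
}

console.log(toDecimalRefs("alert")); // "&#97;&#108;&#101;&#114;&#116;"
console.log(toHexRefs("alert")); // "&#x61;&#x6c;&#x65;&#x72;&#x74;"
```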

Entity encoding on its own needs little explanation; it gets a separate section here to highlight how these two encodings relate to each other and where they take effect in JavaScript code.

Cccc

[xss_clean] ('

'); [xss_clean] ('

') [xss_clean] ('\u003c\u0069\u006d\u0067\u0020\u0073\u0072\u0063\u003d\u0031\u0020\u006f\u006e\u0065\u0072\u0072\u006f\u0072\u0072\u0061\u006c\u0065\u0072\u0074\u0028\u0032\u0033\u0029\u003e')

Eight examples are listed in the code. The first outputs an HTML fragment in the onclick event handler; the second outputs an entity-encoded HTML fragment; the third applies hexadecimal encoding directly to the HTML fragment; the fourth applies hexadecimal encoding to the onerror event handler; the fifth outputs entity-encoded characters in a script; the sixth applies hexadecimal encoding to an event handler; the seventh applies hexadecimal encoding to every character; and the eighth outputs the Unicode escape sequence of an HTML fragment directly in a script.

Comparing the results: the first two examples pop up an alert after a click, while the third example displays the text on the page.

The fourth example pops up an alert; the fifth and seventh output their strings as soon as the page loads; and the sixth and eighth also pop up alerts, after the alert from the fourth example. The first two examples show that inline JavaScript inside an HTML tag (anywhere other than a script element) can be HTML-entity encoded and still run, because the browser decodes the entities in the attribute value before executing the code. This is a very important point, and it can be verified more clearly:
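The decoding step can be modeled in a few lines. This is a simplified sketch, not the browser's algorithm: the hypothetical decodeNumericRefs handles only numeric references (&#D; and &#xH;) and ignores named entities and the HTML specification's error-recovery rules. By contrast, the content of a script element is raw text, so the same references written inside a script element are not decoded and reach the JavaScript engine unchanged.

```javascript
// Hypothetical, simplified model of attribute-value decoding:
// only numeric character references are handled.
function decodeNumericRefs(value) {
  return value.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (_, num) =>
    String.fromCodePoint(
      num[0] === "x" || num[0] === "X"
        ? parseInt(num.slice(1), 16) // hexadecimal form &#xH;
        : parseInt(num, 10) // decimal form &#D;
    )
  );
}

// Attribute text as written in markup: onclick="&#97;&#108;&#101;&#114;&#116;(1)"
const attributeSource = "&#97;&#108;&#101;&#114;&#116;(1)";
// What the JavaScript engine receives as the handler's source code:
console.log(decodeNumericRefs(attributeSource)); // "alert(1)"
```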

