How to use Unicode to match special characters in regular expressions


2025-07-06 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

This article explains how to use Unicode to match special characters in regular expressions. The editor found it very practical and is sharing it here as a reference, so follow along and have a look.

First, a note: all the code in this article is written for ES6 and would need to be modified to run under ES5. That said, the article does not use many new ES6 features, and because V8 did not support the u modifier at the time of writing, the final implementation relies almost entirely on ES5 knowledge.

Originally I only wanted to record how to write a regular expression that matches special characters by way of Unicode. While writing, I found that V8 did not support the u modifier, so I turned to studying how to convert a string into UTF-16 escape format. Along the way I found that ES5 regular expressions cannot directly express characters whose code point is above 0xFFFF, so I also implemented the conversion for those characters.

I have long needed a practical regular expression for matching special characters. For example, given the text 'ab*cd$ Hello to me]\nseg$me*ntfault\nhello,world', the user can choose to split the string on * or $.

In JavaScript, $ and * are predefined special characters that cannot be written directly in a regular expression; they must be escaped, as in /\$/ or /\*/.

We need to build the regular expression from the user's choice, so we wrap this in a function:
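As a small illustration of the escaped forms, the sketch below splits a string on a literal $ or *. The sample text and variable names are invented for this example and are not the article's original string.

```javascript
// Sample text made up for illustration only.
const text = 'ab*cd$ef';

// The escaped forms match the literal characters.
const byDollar = text.split(/\$/);
const byStar = text.split(/\*/);

console.log(byDollar); // [ 'ab*cd', 'ef' ]
console.log(byStar);   // [ 'ab', 'cd$ef' ]
```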

function reg(input) {
  // Prefix the user's character with a backslash to escape it
  return new RegExp(`\\${input}`)
}

This looks fine at first: if every character is escaped, any special character the user enters will be matched literally. But reality is cruel. When the user enters a character such as n or t, the returned regular expression is /\n/ or /\t/, which matches a newline or a tab rather than the letter itself, contrary to the user's intent.

The usual approach is to list every special character that needs escaping and handle each one. This is tedious, and matching can fail whenever a special character is missing from the list.
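The pitfall can be reproduced in a few lines. This is a minimal sketch; the name naiveReg is introduced here only to keep it separate from the later versions of reg.

```javascript
// The naive helper described in the text: prefix the input with a backslash.
function naiveReg(input) {
  return new RegExp(`\\${input}`);
}

console.log(naiveReg('$').test('a$b'));  // true: \$ matches a literal dollar sign
console.log(naiveReg('n').test('n'));    // false: \n no longer matches the letter n
console.log(naiveReg('n').test('a\nb')); // true: \n matches a newline instead
console.log(naiveReg('t').test('a\tb')); // true: \t matches a tab
```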
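For comparison, the list-based approach might look like the sketch below. The helper name escapeRegExp and the exact set of characters are assumptions made for illustration; a character left out of the list would silently go unescaped, which is the weakness just described.

```javascript
// Hypothetical helper: put a backslash before each character in a fixed list
// of regex metacharacters. The list is an assumption, not an official one.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

console.log(escapeRegExp('$')); // \$
console.log(escapeRegExp('n')); // n (left alone, so it still matches the letter)
console.log(new RegExp(escapeRegExp('*')).test('a*b')); // true
```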

This is where Unicode makes its grand entrance. In JavaScript, a character can also be written as a Unicode escape. For example, 'a' can be written as '\u{61}' and '你' can be written as '\u{4f60}'.
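A quick check of this equivalence, assuming a runtime that accepts ES2015 code point escapes in string literals (the variable names are illustrative):

```javascript
// A Unicode escape and the literal character produce the same string.
const latinA = '\u{61}';
const chineseNi = '\u{4f60}';

console.log(latinA === 'a');     // true
console.log(chineseNi === '你'); // true
console.log('\u0061' === 'a');   // true: the four-digit ES5 form works too
```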

For an introduction to Unicode, see the detailed explanation of Unicode and JavaScript.

ES5 provides the charCodeAt() method, which returns the Unicode value of the character at the specified index. It only returns a single UTF-16 code unit, however, so it cannot return a code point above 0xFFFF. ES2015 added a new method, codePointAt(), which returns the full code point for such characters. The returned value is a decimal number, so we also need to convert it to hexadecimal with toString(16).

The encapsulated function is as follows:
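The difference between the two methods shows up with a character outside the Basic Multilingual Plane. The sketch below uses U+20BB7 as the example character; the variable names are illustrative.

```javascript
const bmp = '$';
const astral = '\u{20BB7}'; // one character stored as two UTF-16 code units

console.log(bmp.charCodeAt(0).toString(16));     // 24
console.log(bmp.codePointAt(0).toString(16));    // 24
console.log(astral.length);                      // 2
console.log(astral.charCodeAt(0).toString(16));  // d842 (high surrogate)
console.log(astral.charCodeAt(1).toString(16));  // dfb7 (low surrogate)
console.log(astral.codePointAt(0).toString(16)); // 20bb7 (full code point)
```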

function toUnicode(s) {
  // codePointAt(0) returns the full code point of the first character
  return `\\u{${s.codePointAt(0).toString(16)}}`
}

toUnicode('$') // -> '\u{24}'

Rewrite the reg function as:

function reg(input) {
  // The u flag is required for the \u{...} escape form
  return new RegExp(toUnicode(input), 'u')
}

This is how I hoped it would work, but unfortunately V8 did not support the u modifier of RegExp at the time of writing. Had V8 supported it, the article could end here. No matter: what this provides is the idea of escaping special characters through Unicode.

Although V8 did not support the u modifier, as aspiring programmers we cannot stop there. We can use other methods to keep improving this.
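In a runtime whose RegExp does accept the u flag (an assumption about the environment, not something the text guarantees), the idea can be tried as follows. The names toUnicodeEscape and regU are hypothetical and chosen only to avoid clashing with the functions in the text.

```javascript
// Build a \u{...} escape for the first code point of the input.
function toUnicodeEscape(s) {
  return `\\u{${s.codePointAt(0).toString(16)}}`;
}

// Assumes the engine supports the u flag; otherwise this throws a SyntaxError.
function regU(input) {
  return new RegExp(toUnicodeEscape(input), 'u');
}

console.log(regU('$').source);                    // \u{24}
console.log(regU('$').test('a$b'));               // true
console.log(regU('n').test('n'));                 // true: matches the letter
console.log(regU('n').test('\n'));                // false: no longer a newline
console.log(regU('\u{20BB7}').test('\u{20BB7}')); // true
```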

function toUnicode(s) {
  var a = `\\u${utf(s.charCodeAt(0).toString(16))}`
  // A character above 0xFFFF occupies two code units, so emit a second escape
  if (s.charCodeAt(1))
    a = `${a}\\u${utf(s.charCodeAt(1).toString(16))}`
  return a
}

function utf(s) {
  // Pad with leading zeros to exactly four hex digits
  return Array.from('000').concat(Array.from(s)).slice(-4).join('')
}

// var is used here instead of let so that the code can be copied
// directly into the Chrome console to see the result

// test
// toUnicode('a') -> "\u0061"
// toUnicode('𠮷') -> "\ud842\udfb7"

function reg(input) {
  return new RegExp(`${toUnicode(input)}`)
}

// test
reg('$').test('$') // --> true

Thank you for reading! This concludes this article on how to use Unicode to match special characters in regular expressions. I hope it has been of some help. If you found the article useful, feel free to share it so more people can see it.

