What is the use of regular expressions? 01/02 Update SLTechnology News&Howtos

2026-01-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article looks at what regular expressions are used for. The example is a practical one, so it is shared here as a reference. Let's take a look.

Examples

    // A regular expression that matches a URL and uses groups
    // to capture its different parts.
    var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

    var url = "http://www.ora.com:80/goodparts?q#fragment";
    var result = parse_url.exec(url);
    var names = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"];
    var i;

    // result[0] is the whole match; result[1] to result[7] are the capturing groups.
    for (i = 0; i < names.length; i += 1) {
        document.writeln(names[i] + ": " + result[i]);
    }

The output of this code is as follows:

    url: http://www.ora.com:80/goodparts?q#fragment
    scheme: http
    slash: //
    host: www.ora.com
    port: 80
    path: goodparts
    query: q
    hash: fragment
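The numbered result array can be awkward to use directly. The following is a minimal sketch of one way to wrap it, assuming a hypothetical helper named parseUrl (not part of the original example) that returns an object keyed by the same part names, or null when the text does not match:

```javascript
const parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
const names = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"];

// parseUrl is a hypothetical helper introduced for illustration only.
function parseUrl(url) {
  const result = parse_url.exec(url);
  if (result === null) {
    return null; // the text does not look like a URL to this pattern
  }
  const parts = {};
  names.forEach((name, i) => {
    parts[name] = result[i]; // optional groups that did not match are undefined
  });
  return parts;
}

const parts = parseUrl("http://www.ora.com:80/goodparts?q#fragment");
console.log(parts.host); // "www.ora.com"
console.log(parts.port); // "80"
console.log(parseUrl("://")); // null, because there is no host
```

Every captured part is a string, so a caller that needs a numeric port has to convert it.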

Analysis

Let's break down the various parts of parse_url and see how it works:

The ^ character represents the beginning of the string. It is an anchor that tells exec not to skip over a prefix that does not look like a URL, so the pattern only matches text that looks like a URL from its very first character.
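As a small sketch of what the anchor changes, the two patterns below are simplified stand-ins (only the scheme part, not parse_url itself), and the sample text is made up for illustration:

```javascript
// Simplified, hypothetical patterns: the scheme part with and without ^.
const anchored = /^([A-Za-z]+):/;
const unanchored = /([A-Za-z]+):/;

const text = "see http://www.ora.com";

// With ^, the match must start at position 0, and "see " is not a scheme.
console.log(anchored.exec(text)); // null

// Without ^, exec skips the prefix and finds a match later in the string.
const found = unanchored.exec(text);
console.log(found[1]);    // "http"
console.log(found.index); // 4
```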

(?:([A-Za-z]+):)?

This factor matches a scheme name, but only if it is followed by a : (colon). (?:...) indicates a noncapturing group. The suffix ? indicates that the group is optional: it is matched 0 or 1 times. (...) indicates a capturing group. A capturing group copies the text it matches and places it in the result array. Each capturing group is assigned a number. The first capturing group is number 1, so a copy of the text it matched appears in result[1]. [...] indicates a character class. The character class A-Za-z contains the 26 uppercase letters and the 26 lowercase letters. The hyphen indicates a range, for example from A to Z. The suffix + indicates that the character class is matched one or more times. The group is followed by the : character, which is matched literally.
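To see the difference between the noncapturing group, the capturing group, and the optional suffix, here is a short sketch that uses only this first factor (anchored with ^ so that it can be tried on its own):

```javascript
const scheme = /^(?:([A-Za-z]+):)?/;

const withScheme = scheme.exec("http://www.ora.com");
console.log(withScheme[0]);     // "http:"  (whole match, including the colon)
console.log(withScheme[1]);     // "http"   (capturing group 1 only)
console.log(withScheme.length); // 2        (the noncapturing group gets no slot)

// The ? suffix lets the group match zero times, so exec still succeeds.
const withoutScheme = scheme.exec("www.ora.com");
console.log(withoutScheme[0]);  // ""
console.log(withoutScheme[1]);  // undefined
```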

(\/{0,3})

This factor is capturing group 2. \/ indicates that a / (slash) character should be matched. It is escaped with \ (backslash) so that it is not misinterpreted as the end of the regular expression literal. The suffix {0,3} indicates that the / is matched 0 to 3 times.
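A brief sketch of the {0,3} quantifier on its own follows. The inputs are made up, and the RegExp constructor form is included only to show that the escape is needed in a literal, not in a string:

```javascript
const slashes = /^(\/{0,3})/;

console.log(slashes.exec("//www.ora.com")[1]); // "//"
console.log(slashes.exec("www.ora.com")[1]);   // ""    (zero slashes is allowed)
console.log(slashes.exec("/////x")[1]);        // "///" (at most three are taken)

// Built from a string, the slash does not end a literal, so no escape is needed.
const fromString = new RegExp("^(/{0,3})");
console.log(fromString.exec("//www.ora.com")[1]); // "//"
```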

([0-9.\-A-Za-z]+)

This factor is capturing group 3. It matches a host name, which is made up of one or more digits, letters, . (period), or - (hyphen) characters. The - is escaped as \- so that it is not confused with the hyphen that indicates a range.
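The following sketch tries the host class on a few made-up inputs, then uses two small hypothetical classes to show why the hyphen is escaped: left unescaped between . and A, it would define a range that also lets in characters such as /.

```javascript
const host = /^([0-9.\-A-Za-z]+)/;

console.log(host.exec("www.ora.com:80")[1]);       // "www.ora.com"
console.log(host.exec("my-host.example/path")[1]); // "my-host.example"
console.log(host.exec("_bad"));                    // null ("_" is not in the class)

// Hypothetical classes for comparison only.
const unescapedRange = /^[.-A]$/;  // a range from "." to "A"
const escapedHyphen = /^[.\-A]$/;  // exactly ".", "-", or "A"

console.log(unescapedRange.test("/")); // true  ("/" falls inside the range)
console.log(escapedHyphen.test("/"));  // false
console.log(escapedHyphen.test("-"));  // true
```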

(?::(\d+))?

This optional factor matches the port number, which consists of a : prefix followed by a sequence of one or more digits. \d represents a digit character. The sequence of one or more digits is captured by capturing group 4.
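A short sketch of the port factor on its own, with made-up inputs. Note that the capture is a string, so converting it with Number is a choice left to the caller, not something the pattern does:

```javascript
const port = /^(?::(\d+))?/;

const present = port.exec(":80/goodparts");
console.log(present[1]);         // "80" (a string, without the colon)
console.log(Number(present[1])); // 80

// No colon at the start: the optional group matches zero times.
const absent = port.exec("/goodparts");
console.log(absent[1]);          // undefined
```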

(?:\/([^?#]*))?

This factor is another optional group, and it matches the path. The group begins with a /. The character class [^?#] that follows begins with a ^, which means that the class contains every character except ? and #. The * indicates that the character class is matched zero or more times. The matched text is captured by capturing group 5.

Note that I am being sloppy here. The class of all characters except ? and # includes line terminators, control characters, and a large number of other characters that should not be matched here. Most of the time this will do what we want, but there is a risk that some malicious text could slip through. Sloppy regular expressions are a common source of security vulnerabilities. It is much easier to write sloppy regular expressions than rigorous ones.
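The sloppiness can be demonstrated directly. In this sketch both patterns are anchored at both ends so that they accept or reject a whole path. The stricter class is only one illustrative assumption (it additionally excludes whitespace and ASCII control characters) and is not a complete rule for valid URL paths:

```javascript
// The class from the article, applied to a whole path.
const sloppyPath = /^\/([^?#]*)$/;

// Hypothetical stricter variant: also rejects whitespace and control characters.
const strictPath = /^\/([^?#\s\u0000-\u001f\u007f]*)$/;

const suspicious = "/good\nparts"; // contains a line terminator

console.log(sloppyPath.test(suspicious));   // true  (the newline slips through)
console.log(strictPath.test(suspicious));   // false
console.log(strictPath.test("/goodparts")); // true
```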

(?:\?([^#]*))?

This factor is an optional group that begins with a ?. It contains capturing group 6, which holds zero or more characters that are not #.
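A sketch of the query factor in isolation, with made-up inputs. It also shows the difference between a query that is absent (undefined) and one that is present but empty (an empty string):

```javascript
const query = /^(?:\?([^#]*))?/;

console.log(query.exec("?q#fragment")[1]); // "q"
console.log(query.exec("?a=1&b=2")[1]);    // "a=1&b=2"

// No leading "?": the group is skipped and the capture is undefined.
console.log(query.exec("#fragment")[1]);   // undefined

// A "?" followed directly by "#": the capture is an empty string.
console.log(query.exec("?#fragment")[1]);  // ""
```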

(?:#(.*))?

This factor is a final optional group that begins with #. It contains capturing group 7. The . matches any character except a line terminator.
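The behavior of . around line terminators can be checked with a short sketch. The inputs are made up, and the last line uses the s (dotAll) flag from ES2018, which is not used in the article's pattern and is shown only for contrast:

```javascript
const hash = /^(?:#(.*))?/;

console.log(hash.exec("#fragment")[1]);           // "fragment"

// The . stops at the line terminator, so only the first line is captured.
console.log(hash.exec("#line one\nline two")[1]); // "line one"

console.log(/^.$/.test("\n"));  // false
console.log(/^.$/s.test("\n")); // true (with the dotAll flag, . matches "\n" too)
```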

The $ represents the end of the string. It ensures that there is no extra material after the end of the URL.
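To illustrate what the end anchor guards against, this sketch compares parse_url with a hypothetical copy that has the trailing $ removed. The copy exists only for comparison, and the input text is made up:

```javascript
const parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

// Hypothetical variant without the trailing $, for comparison only.
const no_end_anchor = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?/;

const input = "http://www.ora.com and some trailing words";

// With $, the extra material after the host makes the whole match fail.
console.log(parse_url.exec(input));        // null

// Without $, the pattern quietly matches a prefix and ignores the rest.
console.log(no_end_anchor.exec(input)[0]); // "http://www.ora.com"
```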

