
Example Analysis of Regular Expressions in C#

2025-04-01 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02

This article walks through examples of regular expressions in C#. The editor found them practical and shares them here for reference; hopefully you will get something out of it.

(1) The "@" symbol

Although "@" is not itself part of the C# regular expression syntax, it is often used together with regular expressions. "@" marks the string literal that follows it as a "verbatim string", in which a backslash is an ordinary character rather than the start of an escape sequence. For example, the following two statements are equivalent:

```csharp
string x = "D:\\My Huang\\My Doc";
string y = @"D:\My Huang\My Doc";
```

By contrast, the following declaration does not compile, because in an ordinary C# string "\" starts an escape sequence (for example "\n" is a newline) and "\M" is not a recognized escape:

```csharp
string x = "D:\My Huang\My Doc"; // error CS1009: Unrecognized escape sequence
```
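A quick check of the equivalence above, written as a small C# program with top-level statements; the sample pattern and strings are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// The same path as a regular literal (backslashes doubled) and as a verbatim literal.
string regular = "D:\\My Huang\\My Doc";
string verbatim = @"D:\My Huang\My Doc";
Console.WriteLine(regular == verbatim); // True

// The same applies to patterns: @"\d+" and "\\d+" are identical strings,
// so it is the regex engine, not the C# compiler, that interprets \d.
string p1 = @"\d+";
string p2 = "\\d+";
Console.WriteLine(p1 == p2); // True
Console.WriteLine(Regex.Match("abc 2048 xyz", p1).Value); // 2048

// Inside a verbatim string, a double quote is written as two double quotes.
string quoted = @"say ""hi""";
Console.WriteLine(quoted); // say "hi"
```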

(2) Basic syntax characters

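\d  matches a decimal digit (in .NET this means any Unicode decimal digit, not only 0-9, unless RegexOptions.ECMAScript is set)

\D  the complement of \d (taking the set of all characters as the universe; the same applies below), i.e. every non-digit character

\w  word characters: letters, digits and the underscore (in .NET, letters include non-Latin letters such as Chinese characters)

\W  the complement of \w

\s  white-space characters, including the space, line feed \n, carriage return \r, tab \t, vertical tab \v and form feed \f

\S  the complement of \s

.  any character except the newline character \n

[...]  matches any one of the characters listed in the brackets

[^...]  matches any one character that is not listed in the brackets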

Here are some simple examples:


```csharp
using System.Text.RegularExpressions;

string i = "\n";
string m = "3";
Regex r = new Regex(@"\D");       // same as new Regex("\\D")
// r.IsMatch(i): true
// r.IsMatch(m): false

string i2 = "%";
string m2 = "3";
Regex r2 = new Regex("[a-z0-9]"); // matches a lowercase letter or a digit
// r2.IsMatch(i2): false
// r2.IsMatch(m2): true
```
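A short sketch of the Unicode behaviour noted in the list above; the sample characters (U+0663, an Arabic-Indic digit, and U+4E2D, a Chinese character) are illustrative choices.

```csharp
using System;
using System.Text.RegularExpressions;

// In .NET, \d matches any Unicode decimal digit, e.g. ARABIC-INDIC DIGIT THREE (U+0663).
bool unicodeDigit = Regex.IsMatch("\u0663", @"^\d$");
// RegexOptions.ECMAScript restricts \d, \w and \s to their ASCII meanings.
bool asciiDigit = Regex.IsMatch("\u0663", @"^\d$", RegexOptions.ECMAScript);
// \w also covers letters outside A-Z, such as the Chinese character U+4E2D.
bool unicodeWord = Regex.IsMatch("\u4e2d", @"^\w$");
// [^...] negates a character class: here, anything except a digit 0-9.
bool negated = Regex.IsMatch("x", "^[^0-9]$");
// \s includes the ordinary space character.
bool space = Regex.IsMatch(" ", @"^\s$");

Console.WriteLine($"{unicodeDigit} {asciiDigit} {unicodeWord} {negated} {space}");
// True False True True True
```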

(3) Positioning characters (anchors)

A "positioning character" does not consume a character; it matches a position. Intuitively, you can think of it as the small gap between one character and the next.

^  the match must start at the beginning of the string (with RegexOptions.Multiline, at the beginning of any line)

$  the match must end at the end of the string or just before a final \n (with RegexOptions.Multiline, just before any \n)

\b  matches a word boundary

\B  matches any position that is not a word boundary

In addition, \A matches only at the beginning of the string, \z only at the very end of the string, and \Z at the end of the string or just before a final newline; none of these three is affected by RegexOptions.Multiline.

Here are some simple examples:


```csharp
string i = "Live for nothing,die for something";
Regex r1 = new Regex("^Live for nothing,die for something$");
// r1.IsMatch(i): true
Regex r2 = new Regex("^Live for nothing,die for some$");
// r2.IsMatch(i): false
Regex r3 = new Regex("^Live for nothing,die for some");
// r3.IsMatch(i): true
```

```csharp
// A verbatim string spanning two lines (Multiline). With Windows (CRLF) line
// endings in the source file, the line break inside it is "\r\n".
string i = @"Live for nothing,
die for something";
Regex r1 = new Regex("^Live for nothing,die for something$");
Console.WriteLine("r1 match count:" + r1.Matches(i).Count); // 0
Regex r2 = new Regex("^Live for nothing,die for something$", RegexOptions.Multiline);
Console.WriteLine("r2 match count:" + r2.Matches(i).Count); // 0
Regex r3 = new Regex("^Live for nothing,\r\ndie for something$");
Console.WriteLine("r3 match count:" + r3.Matches(i).Count); // 1
Regex r4 = new Regex("^Live for nothing,$");
Console.WriteLine("r4 match count:" + r4.Matches(i).Count); // 0
Regex r5 = new Regex("^Live for nothing,$", RegexOptions.Multiline);
Console.WriteLine("r5 match count:" + r5.Matches(i).Count); // 0: Multiline $ matches before \n, but "," is followed by \r
Regex r6 = new Regex("^Live for nothing,\r\n$");
Console.WriteLine("r6 match count:" + r6.Matches(i).Count); // 0
Regex r7 = new Regex("^Live for nothing,\r\n$", RegexOptions.Multiline);
Console.WriteLine("r7 match count:" + r7.Matches(i).Count); // 0
Regex r8 = new Regex("^Live for nothing,\r$");
Console.WriteLine("r8 match count:" + r8.Matches(i).Count); // 0
Regex r9 = new Regex("^Live for nothing,\r$", RegexOptions.Multiline);
Console.WriteLine("r9 match count:" + r9.Matches(i).Count); // 1
Regex r10 = new Regex("^die for something$");
Console.WriteLine("r10 match count:" + r10.Matches(i).Count); // 0
Regex r11 = new Regex("^die for something$", RegexOptions.Multiline);
Console.WriteLine("r11 match count:" + r11.Matches(i).Count); // 1
```

```csharp
// ...
Console.WriteLine("r3 match count:" + r3.Matches(m).Count); // 1
Regex r4 = new Regex(@"\bfor something\b");
Console.WriteLine("r4 match count:" + r4.Matches(i).Count); // 1
```

\b is usually used to constrain a match to a complete word.
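The following sketch contrasts the anchors described above on a made-up two-line string. It writes "\n" explicitly, so the results do not depend on the source file's line endings.

```csharp
using System;
using System.Text.RegularExpressions;

// Two lines, each terminated by an explicit "\n".
string text = "first line\nsecond line\n";

// With Multiline, ^ matches after every \n; \A matches only at the very start.
int caretCount = Regex.Matches(text, @"^\w+", RegexOptions.Multiline).Count; // 2
int startOnly = Regex.Matches(text, @"\A\w+", RegexOptions.Multiline).Count; // 1

// Without Multiline, $ and \Z also match just before a final \n; \z does not.
bool dollarEnd = Regex.IsMatch(text, "line$");   // True
bool bigZ = Regex.IsMatch(text, @"line\Z");      // True
bool smallZ = Regex.IsMatch(text, @"line\z");    // False

// \b keeps "cat" from matching inside "concat".
int wholeWord = Regex.Matches("cat concat cat", @"\bcat\b").Count; // 2

Console.WriteLine($"{caretCount} {startOnly} {dollarEnd} {bigZ} {smallZ} {wholeWord}");
```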

(4) Repetition characters (quantifiers)

The "repetition characters" are one of the most powerful features of C# regular expressions. Each one applies to the preceding element (a character, a character class or a group):

{n}  matches the preceding element exactly n times

{n,}  matches it n or more times

{n,m}  matches it n to m times

?  matches it 0 or 1 times

+  matches it 1 or more times

*  matches it 0 or more times

Here are some simple examples:


```csharp
string x = "1024";
string y = "+1024";
string z = "1,024";
string a = "1";
string b = "-1024";
string c = "10000";
// Optional "+", a first digit 1-9, an optional thousands comma, then exactly three digits
Regex r = new Regex(@"^\+?[1-9],?\d{3}$");
Console.WriteLine("x match count:" + r.Matches(x).Count); // 1
Console.WriteLine("y match count:" + r.Matches(y).Count); // 1
Console.WriteLine("z match count:" + r.Matches(z).Count); // 1
Console.WriteLine("a match count:" + r.Matches(a).Count); // 0
Console.WriteLine("b match count:" + r.Matches(b).Count); // 0
Console.WriteLine("c match count:" + r.Matches(c).Count); // 0
// matches integers from 1000 to 9999
```
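The quantifiers listed above can be checked one by one with anchored patterns; the sample inputs are illustrative, and the last check applies a quantifier to a group rather than to a single character.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored with ^...$ so each quantifier has to account for the whole input.
bool exactly3 = Regex.IsMatch("abc", "^[a-z]{3}$");       // {n}: True
bool atLeast2 = Regex.IsMatch("a", "^[a-z]{2,}$");        // {n,}: False, only one letter
bool twoToFour = Regex.IsMatch("abcde", "^[a-z]{2,4}$");  // {n,m}: False, five letters
bool optionalU = Regex.IsMatch("color", "^colou?r$");     // ?: True, "u" is optional
bool oneOrMore = Regex.IsMatch("", "^a+$");               // +: False, needs at least one
bool zeroOrMore = Regex.IsMatch("", "^a*$");              // *: True, zero is allowed
bool groupRepeat = Regex.IsMatch("ababab", "^(?:ab){3}$"); // quantifier on a group: True

Console.WriteLine($"{exactly3} {atLeast2} {twoToFour} {optionalU} {oneOrMore} {zeroOrMore} {groupRepeat}");
```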

(5) Alternative matching

The "(|)" construct in C# regular expressions does not seem to have a special name, so let's call it "alternative matching". A character class such as [axy] is also a kind of alternative match, but it can only match a single character, whereas "(|)" works on whole subexpressions: (ab|xy) matches either ab or xy. Note that "|" and "()" work together here: "|" has the lowest precedence of any operator, so the parentheses are what limit how far each alternative extends. Here are some simple examples:


```csharp
string x = "0";
string y = "0.23";
string z = "100";
string a = "100.01";
string b = "9.9";
string c = "99.9";
string d = "99.";
string e = "00.1";
Regex r = new Regex(@"^\+?((100(.0+)*)|([1-9]?[0-9])(\.\d+)*)$");
Console.WriteLine("x match count:" + r.Matches(x).Count); // 1
Console.WriteLine("y match count:" + r.Matches(y).Count); // 1
Console.WriteLine("z match count:" + r.Matches(z).Count); // 1
Console.WriteLine("a match count:" + r.Matches(a).Count); // 0
Console.WriteLine("b match count:" + r.Matches(b).Count); // 1
Console.WriteLine("c match count:" + r.Matches(c).Count); // 1
Console.WriteLine("d match count:" + r.Matches(d).Count); // 0
Console.WriteLine("e match count:" + r.Matches(e).Count); // 0
```

The outermost parentheses contain two parts, "(100(.0+)*)" and "([1-9]?[0-9])(\.\d+)*", which are alternatives ("OR"): the regular expression engine first tries to match 100, and if that fails, it tries the second expression, which represents a number in the range [0, 100).
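The pattern above has two loose spots: "(.0+)" uses an unescaped ".", which matches any character, and the "*" after each fractional group lets it repeat, so inputs such as "100x0" and "9.9.9" are accepted too. A stricter sketch, assuming the intent is at most one fractional part, escapes the dots and uses "?" instead of "*". The last two checks show why the grouping parentheses matter.

```csharp
using System;
using System.Text.RegularExpressions;

// Stricter variant (an assumption about the intent): one optional fractional part, literal dots.
var strict = new Regex(@"^\+?(?:100(?:\.0+)?|[1-9]?[0-9](?:\.\d+)?)$");
string[] accepted = { "0", "0.23", "100", "9.9", "99.9", "+5", "100.00" };
string[] rejected = { "100.01", "99.", "00.1", "9.9.9", "100x0", "101" };

foreach (string s in accepted) Console.WriteLine($"{s}: {strict.IsMatch(s)}"); // all True
foreach (string s in rejected) Console.WriteLine($"{s}: {strict.IsMatch(s)}"); // all False

// "|" has the lowest precedence: without a group, ^ binds only to "cat" and $ only to "dog".
bool ungrouped = Regex.IsMatch("cat food", "^cat|dog$");   // True: "^cat" alone matches
bool grouped = Regex.IsMatch("cat food", "^(?:cat|dog)$"); // False: the whole input must be one word
Console.WriteLine($"{ungrouped} {grouped}");
```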

(6) Matching special characters

Here are some simple examples:

```csharp
string x = "\\";
Regex r1 = new Regex("^\\\\$");
Console.WriteLine("r1 match count:" + r1.Matches(x).Count); // 1
Regex r2 = new Regex(@"^\\$");
Console.WriteLine("r2 match count:" + r2.Matches(x).Count); // 1
Regex r3 = new Regex("^\\$"); // the regex is ^\$, which matches a literal "$"
Console.WriteLine("r3 match count:" + r3.Matches(x).Count); // 0
// matches "\"
```

```csharp
string x = "\"";
Regex r1 = new Regex("^\"$");
Console.WriteLine("r1 match count:" + r1.Matches(x).Count); // 1
Regex r2 = new Regex(@"^""$");
Console.WriteLine("r2 match count:" + r2.Matches(x).Count); // 1
// matches a double quote
```
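When the text to match comes from user input, escaping every metacharacter by hand is error-prone; Regex.Escape does it for you. A minimal sketch with an illustrative string:

```csharp
using System;
using System.Text.RegularExpressions;

// Regex.Escape turns metacharacters into literals, so arbitrary text can be embedded safely.
string literal = "a.c";
string escaped = Regex.Escape(literal);                          // a\.c
bool dotIsWildcard = Regex.IsMatch("abc", "^" + literal + "$");  // True: "." matches 'b'
bool dotIsLiteral = Regex.IsMatch("abc", "^" + escaped + "$");   // False
bool exact = Regex.IsMatch("a.c", "^" + escaped + "$");          // True

Console.WriteLine($"{escaped} {dotIsWildcard} {dotIsLiteral} {exact}");
```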

(7) Groups and non-capturing groups

Here are some simple examples:


```csharp
string x = "Live for nothing,die for something";
string y = "Live for nothing,die for somebody";
Regex r = new Regex(@"^Live ([a-z]{3}) no([a-z]{5}),die \1 some\2$");
Console.WriteLine("x match count:" + r.Matches(x).Count); // 1
Console.WriteLine("y match count:" + r.Matches(y).Count); // 0
```

The regular expression engine remembers the text matched by each "()" as a "group", which can be referenced by index. "\1" in the expression is a backreference to the first group, i.e. the first parenthesized part of the expression; "\2" refers to the second, and so on.

```csharp
string x = "Live for nothing,die for something";
Regex r = new Regex(@"^Live for no([a-z]{5}),die for some\1$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups[1].Value); // output: thing
}
```

This reads the contents of a group. Note that it uses Groups[1], because Groups[0] is the entire match, i.e. the whole of x here.

```csharp
string x = "Live for nothing,die for something";
Regex r = new Regex(@"^Live for no(?<g1>[a-z]{5}),die for some\1$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups["g1"].Value); // output: thing
}
```

A group can also be looked up by name; the name is declared with the syntax (?<name>...).

```csharp
string x = "Live for nothing nothing";
Regex r = new Regex(@"([a-z]+) \1");
if (r.IsMatch(x))
{
    x = r.Replace(x, "$1");
    Console.WriteLine("var x:" + x); // output: Live for nothing
}
```

This removes the repeated "nothing" from the original string. Outside the expression (in the replacement string), "$1" refers to the first group. The group can also be referenced by name:

```csharp
string x = "Live for nothing nothing";
Regex r = new Regex(@"(?<g1>[a-z]+) \1");
if (r.IsMatch(x))
{
    x = r.Replace(x, "${g1}");
    Console.WriteLine("var x:" + x); // output: Live for nothing
}
```

```csharp
string x = "Live for nothing";
Regex r = new Regex(@"^Live for no(?:[a-z]{5})$");
if (r.IsMatch(x))
{
    Console.WriteLine("group1 value:" + r.Match(x).Groups[1].Value); // output: (empty)
}
```

Adding "?:" at the start of a group makes it a "non-capturing group": the engine does not save the contents of that group.
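The sketch below combines the group features described above on a made-up date string: named groups, ${name} in a replacement, the named backreference \k<name>, and the way a non-capturing group leaves the group numbering untouched.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative date pattern with named groups (?<name>...).
var date = new Regex(@"^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$");
Match m = date.Match("2024-05-17");
Console.WriteLine(m.Groups["year"].Value); // 2024
Console.WriteLine(m.Groups[0].Value);      // 2024-05-17 (the whole match)

// ${name} in a replacement string refers to a named group.
string reordered = date.Replace("2024-05-17", "${day}/${month}/${year}");
Console.WriteLine(reordered); // 17/05/2024

// \k<name> is the named form of a backreference such as \1.
bool repeated = Regex.IsMatch("hello hello", @"^(?<w>\w+) \k<w>$"); // True

// (?:...) groups without capturing, so "c" becomes group 2, not group 3.
Match abc = Regex.Match("abc", "(a)(?:b)(c)");
Console.WriteLine(abc.Groups.Count);    // 3 (group 0 plus two capturing groups)
Console.WriteLine(abc.Groups[2].Value); // c
```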

(8) Greedy and non-greedy matching

The regular expression engine is greedy: as far as the pattern allows, it matches as many characters as possible. Adding "?" after a repetition character (such as *, + or {n,m}) makes it non-greedy (lazy), so it matches as few characters as possible. Look at the following example:


```csharp
string x = "Live for nothing,die for something";
Regex r1 = new Regex(@".*thing");
if (r1.IsMatch(x))
{
    Console.WriteLine("match:" + r1.Match(x).Value); // output: Live for nothing,die for something
}
Regex r2 = new Regex(@".*?thing");
if (r2.IsMatch(x))
{
    Console.WriteLine("match:" + r2.Match(x).Value); // output: Live for nothing
}
```
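Lazy quantifiers are most often used to stop at the first closing delimiter. The sketch below uses an illustrative HTML-like string; it only demonstrates greedy versus lazy matching and is not a recommendation to parse real HTML with regular expressions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string html = "<b>one</b> and <b>two</b>";
string greedy = Regex.Match(html, "<b>.*</b>").Value; // runs to the last </b>
string lazy = Regex.Match(html, "<b>.*?</b>").Value;  // stops at the first </b>

// Collect the text inside every <b>...</b> pair.
var inner = new List<string>();
foreach (Match m in Regex.Matches(html, "<b>(.*?)</b>"))
{
    inner.Add(m.Groups[1].Value);
}

// "?" also makes {n,m} lazy: take as few repetitions as allowed.
string fewest = Regex.Match("aaaa", "a{2,3}?").Value;

Console.WriteLine(greedy);                  // <b>one</b> and <b>two</b>
Console.WriteLine(lazy);                    // <b>one</b>
Console.WriteLine(string.Join(",", inner)); // one,two
Console.WriteLine(fewest);                  // aa
```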

(9) Backtracking and non-backtracking

Use "(?>...)" to declare a non-backtracking (atomic) group. Because the regular expression engine is greedy, in some cases it has to backtrack to find a match, as in the following example:


```csharp
string x = "Live for nothing,die for something";
Regex r1 = new Regex(@".*thing,");
if (r1.IsMatch(x))
{
    Console.WriteLine("match:" + r1.Match(x).Value); // output: Live for nothing,
}
Regex r2 = new Regex(@"(?>.*)thing,");
if (r2.IsMatch(x)) // does not match
{
    Console.WriteLine("match:" + r2.Match(x).Value);
}
```

In r1, ".*" is greedy, so it first matches all the way to the end of the string. The engine then backtracks one character at a time: at the final "something" it can match "thing" but fails on ",", so it keeps backtracking until it succeeds at the "thing," after "nothing".

In r2, the whole match fails because backtracking is forbidden: "(?>.*)" keeps everything it consumed, leaving nothing for "thing," to match.
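A smaller sketch of the same effect, with illustrative inputs: an ordinary group may give characters back, an atomic group may not.

```csharp
using System;
using System.Text.RegularExpressions;

// Ordinary group: a+ first takes "aaa", then gives one 'a' back so that "ab" can match.
bool normal = Regex.IsMatch("aaab", "^(a+)ab$");
// Atomic group: a+ keeps all three 'a's and is never re-entered, so "ab" cannot match.
bool atomic = Regex.IsMatch("aaab", "^(?>a+)ab$");
// When giving characters back can never help, an atomic group lets a failing match
// fail without retrying shorter runs of digits.
bool noX = Regex.IsMatch("12345", @"^(?>\d+)x");

Console.WriteLine($"{normal} {atomic} {noX}"); // True False False
```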

(10) Lookahead and lookbehind (forward and reverse pre-search)

Lookahead syntax: a positive assertion is written "(?=...)" and a negative one "(?!...)". The assertion itself is not part of the final match. See the following example:


```csharp
string x = "1024 used 2048 free";
Regex r1 = new Regex(@"\d{4}(?= used)");
if (r1.Matches(x).Count == 1)
{
    Console.WriteLine("r1 match:" + r1.Match(x).Value); // output: 1024
}
Regex r2 = new Regex(@"\d{4}(?! used)");
if (r2.Matches(x).Count == 1)
{
    Console.WriteLine("r2 match:" + r2.Match(x).Value); // output: 2048
}
```

The positive lookahead in r1 requires the four digits to be followed by " used", while the negative lookahead in r2 requires that the four digits not be followed by " used".
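Because lookaheads are zero-width, several of them can be stacked at the same position, each checking a separate condition. The password-style rule below is a hypothetical example, not from the original article:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical rule: at least 8 characters, with at least one digit and one lowercase letter.
var rule = new Regex(@"^(?=.*\d)(?=.*[a-z]).{8,}$");
bool ok = rule.IsMatch("abc12345");      // True
bool noDigit = rule.IsMatch("abcdefgh"); // False: the \d lookahead fails
bool tooShort = rule.IsMatch("ab1");     // False: .{8,} fails

// The lookahead is zero-width, so "USD" is not part of the matched value.
string amount = Regex.Match("price: 100USD", @"\d+(?=USD)").Value;

Console.WriteLine($"{ok} {noDigit} {tooShort} {amount}"); // True False False 100
```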

Lookbehind syntax: a positive assertion is written "(?<=...)" and a negative one "(?<!...)".

A positive lookbehind can require the four digits to be preceded by "used:", while a negative lookbehind can require that they not be preceded by "used:".
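A sketch of both lookbehind forms, using a hypothetical input string. .NET also accepts variable-length lookbehind (the last pattern), which some other engines, such as Python's re module, do not.

```csharp
using System;
using System.Text.RegularExpressions;

string x = "used:1024 free:2048"; // hypothetical input

// Positive lookbehind: four digits preceded by "used:".
string afterUsed = Regex.Match(x, @"(?<=used:)\d{4}").Value;    // 1024
// Negative lookbehind: four digits NOT preceded by "used:".
string notAfterUsed = Regex.Match(x, @"(?<!used:)\d{4}").Value; // 2048
// Variable-length lookbehind: any amount of white space after "used:".
string spaced = Regex.Match("used:   512", @"(?<=used:\s*)\d+").Value; // 512

Console.WriteLine($"{afterUsed} {notAfterUsed} {spaced}");
```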

(11) Hexadecimal character escapes

\xXX  a character whose code is in the range 0 to 255, written as two hexadecimal digits; for example, a space can be written as "\x20".

\uXXXX  any character can be written as "\u" followed by the four-digit hexadecimal code of that character; for example, common Chinese characters can be matched with "[\u4e00-\u9fa5]".
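A short sketch of both escapes, with illustrative inputs. Inside a verbatim string the C# compiler leaves \u4e00 untouched, so it is the regex engine that decodes it; the test input, in contrast, is an ordinary string, so the compiler decodes its \u escapes.

```csharp
using System;
using System.Text.RegularExpressions;

// \x20 is the space character (code 0x20).
bool space = Regex.IsMatch("a b", @"^a\x20b$");

// The range is decoded by the regex engine because the pattern is a verbatim string.
var cjk = new Regex(@"^[\u4e00-\u9fa5]+$");
bool chinese = cjk.IsMatch("\u4e2d\u6587"); // two Chinese characters, written with C# escapes
bool latin = cjk.IsMatch("abc");

Console.WriteLine($"{space} {chinese} {latin}"); // True True False
```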

(12) A more complete match for [0, 100]

The following is a more comprehensive example for matching numbers in [0, 100]. Special considerations include:

* 00 is valid, and so are 00., 00.00 and 001.100

* an empty string is invalid, a lone decimal point is invalid, and any value greater than 100 is invalid

* a value may carry a suffix, for example "1.07f" marks a float (not handled here)


```csharp
// Reads lines from the console and reports whether each one is a number in [0, 100].
Regex r = new Regex(@"^\+?0*(?:100(\.0*)?|(\d{0,2}(?=\.\d)|\d{1,2}(?=($|\.$)))(\.\d*)?)$");
while (true)
{
    string x = Console.ReadLine();
    if (x == null || x == "exit") // stop on "exit" or at end of input
    {
        break;
    }
    Console.WriteLine(x + (r.IsMatch(x) ? " succeed!" : " failed!"));
}
```
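The console loop above needs manual input; a non-interactive check of the listed requirements could look like the sketch below. The pattern is written out here as an assumption that satisfies those requirements, and the test inputs are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed pattern for numbers in [0, 100], allowing leading/trailing zeros and a trailing ".".
var range = new Regex(@"^\+?0*(?:100(\.0*)?|(\d{0,2}(?=\.\d)|\d{1,2}(?=($|\.$)))(\.\d*)?)$");

string[] valid = { "00", "00.", "00.00", "001.100", "100", "99.9", "+5" };
string[] invalid = { "", ".", "101", "100.5", "1000", "-1" };

foreach (string s in valid)
{
    Console.WriteLine($"\"{s}\" -> {range.IsMatch(s)}"); // expected True
}
foreach (string s in invalid)
{
    Console.WriteLine($"\"{s}\" -> {range.IsMatch(s)}"); // expected False
}
```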

(13) Exact matching is sometimes difficult

Some requirements are hard to match exactly, such as dates, URLs and email addresses; for some of them you would even need to study the relevant specifications to write an accurate and complete expression. In such cases, settle for second best and aim for an expression that is reasonably accurate.
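One way to settle for a reasonably accurate match (a suggestion, not from the original article) is to let the regex check only the shape of the input and leave the real validation to a parser. A sketch for yyyy-MM-dd dates, using DateTime.TryParseExact with the invariant culture; the helper name IsValidDate is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Step 1: the regex only checks the overall shape (four digits, dash, two digits, dash, two digits).
var shape = new Regex(@"^\d{4}-\d{2}-\d{2}$");

// Step 2: the parser rejects impossible dates such as month 13 or Feb 29 in a non-leap year.
bool IsValidDate(string s) =>
    shape.IsMatch(s) &&
    DateTime.TryParseExact(s, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                           DateTimeStyles.None, out _);

bool leapDay = IsValidDate("2024-02-29");  // True: 2024 is a leap year
bool notLeap = IsValidDate("2023-02-29");  // False
bool badMonth = IsValidDate("2024-13-01"); // False
bool badShape = IsValidDate("2024-2-29");  // False: rejected by the regex
Console.WriteLine($"{leapDay} {notLeap} {badMonth} {badShape}");
```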

This is the end of this article on "Example Analysis of Regular Expressions in C#". I hope the content above is helpful to you. If you think the article is good, please share it so that more people can see it.


© 2024 shulou.com SLNews company. All rights reserved.
