2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces how to use metacharacters in regular expressions.
The details are as follows:
Note: in all examples, the text matched by the regular expression is enclosed between [ and ] in the source text. Some examples are implemented in Java; usage that is specific to Java's own regular expressions is explained where it comes up. All Java examples were tested under JDK 1.6.0_13.
Escape special characters
Metacharacters are characters that have a special meaning in regular expressions. Because of that special meaning, a metacharacter cannot be used to represent itself. It can be escaped by preceding it with a backslash, so that the resulting escape sequence matches the character itself rather than carrying its metacharacter meaning. For example, to match [ and ], you must escape them as \[ and \].
Escaping requires the backslash character \, which means that \ is itself a metacharacter: it must be escaped as \\ to match a literal \. This comes up, for example, when matching a Windows file path.
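The escaping rules above can be sketched in Java as follows. This is a minimal illustration, not code from the original article: the class name EscapeDemo, the helper countMatches, and the sample strings (including the file path) are made up. Note that a Java string literal needs its own layer of backslashes, so the regex \[ is written "\\[" and the regex \\ is written "\\\\".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Counts how many times the regex finds a match in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Regex \[ and \] match the literal bracket characters.
        System.out.println(countMatches("\\[", "a[0] = b[1]"));
        System.out.println(countMatches("\\]", "a[0] = b[1]"));
        // Regex \\ matches one literal backslash (hypothetical Windows path).
        System.out.println(countMatches("\\\\", "C:\\docs\\notes.txt"));
    }
}
```

Without the escape, Pattern.compile("[") throws a PatternSyntaxException, because an unescaped [ opens a character set.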
Matching white space characters
Metacharacters can be roughly divided into two types: those used to match text (such as .) and those required by the syntax of regular expressions (such as [ and ]).
When searching with regular expressions, we often need to match non-printing whitespace characters in the source text. For example, we may need to find all the tabs, or all the newline characters. Such characters are difficult to type directly into a regular expression, but they can be entered using the special metacharacters listed below:
[\b] backspace (moves back and deletes one character; the Backspace key)
\f form feed
\n newline
\r carriage return
\t tab (Tab key)
\v vertical tab
As an example, let's remove the blank lines from a file:
Text:
8 5 4 1 6 3 2 7 9
7 6 2 9 5 8 3 4 1
9 3 1 4 2 7 8 5 6
6 9 3 8 7 5 1 2 4
5 1 8 3 4 2 6 9 7
2 4 7 6 1 9 5 3 8
3 2 6 7 8 4 9 1 5
4 8 9 5 3 1 7 6 2
1 7 5 2 9 6 4 8 3
Regular expression: \r\n\r\n
Analysis: \r\n matches a carriage return + newline combination, which Windows uses as the end-of-line marker. A search using the regular expression \r\n\r\n matches two consecutive end-of-line markers, which is exactly where a blank line occurs.
Note: on Unix and Linux, a line of text ends with a newline character only. In other words, matching blank lines on Unix or Linux requires only \n\n; no \r is needed. A regular expression that works on both Windows and Unix/Linux should contain an optional \r and a required \n, that is, \r?\n\r?\n, which will be discussed in a later article.
The Java code is as follows:
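import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

// The file path is missing from the original listing, so the file is passed in as a parameter.
public static void matchBlankLine(File file) throws Exception {
    StringBuilder sb = new StringBuilder();
    BufferedReader br = new BufferedReader(new FileReader(file));
    try {
        char[] cbuf = new char[1024];
        int len;
        // Read exactly once per iteration; read() returns -1 at end of file.
        while ((len = br.read(cbuf)) > 0) {
            sb.append(cbuf, 0, len);
        }
    } finally {
        br.close();
    }
    // Java string literal for the regular expression \r\n\r\n
    String reg = "\\r\\n\\r\\n";
    System.out.println("Original content:\n" + sb.toString());
    System.out.println("After processing: --");
    // Replace two consecutive line endings with a single one.
    System.out.println(sb.toString().replaceAll(reg, "\r\n"));
}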
The running results are as follows:
Original content:
(the nine rows of the text above, including its blank lines)
After processing: --
8 5 4 1 6 3 2 7 9
7 6 2 9 5 8 3 4 1
9 3 1 4 2 7 8 5 6
6 9 3 8 7 5 1 2 4
5 1 8 3 4 2 6 9 7
2 4 7 6 1 9 5 3 8
3 2 6 7 8 4 9 1 5
4 8 9 5 3 1 7 6 2
1 7 5 2 9 6 4 8 3
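The cross-platform pattern \r?\n\r?\n mentioned in the note above can be tried out with a small sketch like the one below. The class name BlankLineDemo and the method removeBlankLines are made up for illustration, and the capture group with $1 is an addition not used in the article: it keeps whichever line ending the text already uses.

```java
class BlankLineDemo {
    // Replaces two consecutive line endings with the first one only.
    // \r? makes the carriage return optional, so the same pattern
    // handles Windows (\r\n) and Unix/Linux (\n) line endings.
    static String removeBlankLines(String text) {
        return text.replaceAll("(\\r?\\n)\\r?\\n", "$1");
    }

    public static void main(String[] args) {
        String windows = "8 5 4\r\n\r\n7 6 2\r\n";
        String unix = "8 5 4\n\n7 6 2\n";
        System.out.print(removeBlankLines(windows));
        System.out.print(removeBlankLines(unix));
    }
}
```

Like the original expression, this removes one blank line per match, so a run of several blank lines in a row is shortened rather than removed entirely.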
Matching specific character categories
Character sets (matching one of several characters) are the most common form of matching, and some frequently used character sets can be replaced with special metacharacters. These metacharacters match a whole category of characters (class metacharacters). Class metacharacters are not essential, because the same characters can always be matched by listing them one by one or by defining a character range. However, the regular expressions built with them are short and easy to read, so they are used often in practice.
1. Matching digits and non-digits
\d any digit, equivalent to [0-9] or [0123456789]
\D any non-digit, equivalent to [^0-9] or [^0123456789]
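A short Java sketch of the two classes above, using made-up names (DigitDemo, keepDigits, dropDigits) and sample strings that are not from the article:

```java
class DigitDemo {
    // Removes every non-digit (\D), leaving only the digits.
    static String keepDigits(String s) {
        return s.replaceAll("\\D", "");
    }

    // Removes every digit (\d), leaving everything else.
    static String dropDigits(String s) {
        return s.replaceAll("\\d", "");
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("Tel: 555-0142"));
        System.out.println(dropDigits("a1b22c"));
    }
}
```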
2. Matching alphanumeric and non-alphanumeric characters
Letters (A-Z, in either case), digits, and the underscore form a commonly used character set, which can be matched with the following metacharacters:
\w any letter (either case), digit, or underscore, equivalent to [a-zA-Z0-9_]
\W any character that is not a letter, digit, or underscore, equivalent to [^a-zA-Z0-9_]
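As a rough illustration of \w and \W in Java (the class WordCharDemo, its methods, and the sample inputs are assumptions made for this sketch):

```java
import java.util.regex.Pattern;

class WordCharDemo {
    // True if the whole string consists of one or more \w characters.
    static boolean isIdentifierLike(String s) {
        return Pattern.matches("\\w+", s);
    }

    // Splits the input wherever one or more \W characters occur.
    static String[] words(String s) {
        return s.split("\\W+");
    }

    public static void main(String[] args) {
        System.out.println(isIdentifierLike("snake_case_1"));
        System.out.println(isIdentifierLike("kebab-case"));
        for (String w : words("user_name42 = value-1")) {
            System.out.println(w);
        }
    }
}
```

The hyphen is not part of \w, which is why "kebab-case" fails the check and "value-1" is split into two pieces.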
3. Matching whitespace and non-whitespace characters
\s any whitespace character, equivalent to [\f\n\r\t\v]
\S any non-whitespace character, equivalent to [^\f\n\r\t\v]
Note: the backspace metacharacter [\b] is not included in \s.
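The sketch below shows \s in Java; the names WhitespaceDemo, collapse, and isWhitespace are hypothetical. One detail to be aware of is that Java's \s also matches the ordinary space character, in addition to the characters listed above.

```java
import java.util.regex.Pattern;

class WhitespaceDemo {
    // Replaces every run of whitespace with a single space.
    static String collapse(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // True if the string is exactly one whitespace character.
    static boolean isWhitespace(String s) {
        return Pattern.matches("\\s", s);
    }

    public static void main(String[] args) {
        System.out.println(collapse("a \t b\r\nc"));
        System.out.println(isWhitespace("\t"));
        // "\b" in a Java string literal is the backspace character (U+0008).
        System.out.println(isWhitespace("\b"));
    }
}
```

The last call illustrates the note above: the backspace character is not treated as whitespace by \s.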
4. Matching hexadecimal or octal values
Hexadecimal: given with the prefix \x. For example, \x0A corresponds to ASCII character 10 (newline) and is equivalent to \n.
Octal: given with the prefix \0, followed by two or three digits. For example, \011 corresponds to ASCII character 9 (tab) and is equivalent to \t.
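Both notations can be checked quickly in Java. The class name CodePointDemo and the helper matchesExactly are made up for this sketch:

```java
import java.util.regex.Pattern;

class CodePointDemo {
    // True if the regex matches the whole input string.
    static boolean matchesExactly(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Regex \x0A is hexadecimal 0A = decimal 10, the newline character.
        System.out.println(matchesExactly("\\x0A", "\n"));
        // Regex \011 is octal 11 = decimal 9, the tab character.
        System.out.println(matchesExactly("\\011", "\t"));
    }
}
```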
5. Using POSIX character classes
POSIX character classes are a shorthand supported by many regular expression implementations. Java supports them, but JavaScript does not. The POSIX character classes are as follows:
[:alnum:] any letter or digit, equivalent to [a-zA-Z0-9]
[:alpha:] any letter, equivalent to [a-zA-Z]
[:blank:] space or tab, equivalent to [ \t]
[:cntrl:] ASCII control characters (ASCII 0 to 31, plus ASCII 127)
[:digit:] any digit, equivalent to [0-9]
[:graph:] any printable character, excluding the space
[:lower:] any lowercase letter, equivalent to [a-z]
[:print:] any printable character
[:punct:] any character that belongs to neither [:alnum:] nor [:cntrl:]
[:space:] any whitespace character, including the space, equivalent to [\f\n\r\t\v ]
[:upper:] any uppercase letter, equivalent to [A-Z]
[:xdigit:] any hexadecimal digit, equivalent to [a-fA-F0-9]
POSIX character classes are written somewhat differently from the metacharacters seen so far. Let's look at an example that uses a regular expression to match colors in a web page:
Text: testing
Regular expression: #[[:xdigit:]][[:xdigit:]][[:xdigit:]]
Results: test
Note: the pattern used here begins with [[ and ends with ]], which is required when using a POSIX character class. A POSIX class must be enclosed between [: and :]; the outer [ and ] define a set, while the inner [ and ] are part of the POSIX class itself.
The POSIX character class notation in Java is different: instead of being enclosed between [: and :], a class starts with \p and is enclosed between { and }, the capitalization differs, and \p{ASCII} is added, as shown below:
\p{Alnum} alphanumeric characters: [\p{Alpha}\p{Digit}]
\p{Alpha} alphabetic characters: [\p{Lower}\p{Upper}]
\p{ASCII} all ASCII characters: [\x00-\x7F]
\p{Blank} space or tab: [ \t]
\p{Cntrl} control characters: [\x00-\x1F\x7F]
\p{Digit} decimal digits: [0-9]
\p{Graph} visible characters: [\p{Alnum}\p{Punct}]
\p{Lower} lowercase letters: [a-z]
\p{Print} printable characters: [\p{Graph}\x20]
\p{Punct} punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Space} whitespace characters: [ \t\n\x0B\f\r]
\p{Upper} uppercase letters: [A-Z]
\p{XDigit} hexadecimal digits: [0-9a-fA-F]
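The color-matching idea from the POSIX example can be sketched in Java with \p{XDigit}. This is an illustration under stated assumptions: the class HexColorDemo, the method findColors, and the sample HTML are made up, and the pattern matches six-digit colors by writing \p{XDigit} out six times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexColorDemo {
    // One hexadecimal digit, in Java's \p{...} notation.
    private static final String X = "\\p{XDigit}";

    // A '#' followed by six hexadecimal digits.
    private static final Pattern COLOR = Pattern.compile("#" + X + X + X + X + X + X);

    // Returns every six-digit hex color found in the input, in order.
    static List<String> findColors(String html) {
        List<String> colors = new ArrayList<String>();
        Matcher m = COLOR.matcher(html);
        while (m.find()) {
            colors.add(m.group());
        }
        return colors;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup.
        String html = "<body bgcolor=\"#336633\" text=\"#FFFFFF\">";
        System.out.println(findColors(html));
    }
}
```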