From: SLTechnology News&Howtos (Shulou.com), updated 2025-03-28
This article explains how to match a set of characters in a regular expression. The content is easy to understand and well organized, and I hope it helps clear up your doubts. Let's work through it together.
The details are as follows:
Note: in all examples, the text matched by the regular expression is enclosed between [ and ] in the source text. Some examples are implemented in Java; where usage specific to Java's own regular expression support comes up, it is explained at the appropriate place. All Java examples were tested under JDK 1.6.0_13.
1. Match one of multiple characters
In the earlier example of matching text files whose names start with na or sa, the regular expression used was .a.\.txt. If there is also a file named cal.txt, it will be matched too. What if you only want to match files that start with na or sa?
Since you only want to find n or s, a metacharacter that matches any character is obviously the wrong tool. In regular expressions, [ and ] define a character set: every character between these two metacharacters is a member of the set, and the set matches any one of its members.
Text:
sales.txt
na1.txt
na2.txt
sa1.txt
sanatxt.txt
cal.txt
Regular expression: [ns]a.\.txt
Results:
sales.txt
[na1.txt]
[na2.txt]
[sa1.txt]
sanatxt.txt
cal.txt
Analysis: the regular expression used here starts with [ns]. This set matches the character n or s and nothing else. [ and ] themselves do not match any character; they only define the set. Next, a matches the character a, . matches any single character, \. matches the . character itself, and txt matches the characters txt. The matching result is what we expected.
However, if there is a file named usa1.txt, it will also be matched, because its sa1.txt part fits the pattern. This is a problem of position matching, which will be discussed later.
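The first example can be reproduced in Java roughly as follows. This is a minimal sketch: the class name CharSetMatch and the helper mark are illustrative. usa1.txt is added to the file list to show the position problem. Note that the backslash in \. has to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSetMatch {
    // [ns] matches n or s, a matches a, . matches any character,
    // \. matches a literal dot, and txt matches itself.
    // The backslash is doubled because this is a Java string literal.
    private static final Pattern NA_OR_SA = Pattern.compile("[ns]a.\\.txt");

    // Returns each name with the first matched text wrapped in [ and ];
    // names without a match are returned unchanged.
    static List<String> mark(String[] names) {
        List<String> out = new ArrayList<String>();
        for (String name : names) {
            Matcher m = NA_OR_SA.matcher(name);
            if (m.find()) {
                out.add(name.substring(0, m.start()) + "[" + m.group() + "]"
                        + name.substring(m.end()));
            } else {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"sales.txt", "na1.txt", "na2.txt", "sa1.txt",
                          "sanatxt.txt", "cal.txt", "usa1.txt"};
        for (String line : mark(names)) {
            System.out.println(line);
        }
    }
}
```

Because find() searches anywhere in the string, usa1.txt is printed as u[sa1.txt]. This is the position problem mentioned above.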
2. Use character set ranges
In the example above, what if we only want to match files that start with na or sa followed by a digit? In the regular expression [ns]a.\.txt, the . matches any character, including digits. This problem can be solved with a character set:
Text:
sales.txt
na1.txt
na2.txt
sa1.txt
san.txt
sanatxt.txt
cal.txt
Regular expression: [ns]a[0123456789]\.txt
Results:
sales.txt
[na1.txt]
[na2.txt]
[sa1.txt]
san.txt
sanatxt.txt
cal.txt
Analysis: as the results show, only files that start with na or sa followed by a digit are matched. san.txt is not matched because the character set [0123456789] restricts the third character to a digit.
Some character ranges are used frequently in regular expressions, such as 0-9, A-Z and a-z. To simplify defining ranges, regular expressions provide a special metacharacter, -, for specifying a character range. For the example above, we can use the regular expression [ns]a[0-9]\.txt instead, and the result is exactly the same.
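Here is a short Java sketch that checks that the spelled-out set and the range form behave the same on the file list above. The class name RangeEquivalence and the helper sameResult are illustrative only.

```java
import java.util.regex.Pattern;

public class RangeEquivalence {
    // The digit set written out in full, and the same set written as a range with -.
    static final Pattern SPELLED = Pattern.compile("[ns]a[0123456789]\\.txt");
    static final Pattern RANGE = Pattern.compile("[ns]a[0-9]\\.txt");

    // True when both patterns agree (both match or both fail) on the given name.
    static boolean sameResult(String name) {
        return SPELLED.matcher(name).find() == RANGE.matcher(name).find();
    }

    public static void main(String[] args) {
        String[] names = {"sales.txt", "na1.txt", "na2.txt", "sa1.txt",
                          "san.txt", "sanatxt.txt", "cal.txt"};
        for (String name : names) {
            System.out.println(name + " -> " + RANGE.matcher(name).find()
                    + " (same as spelled-out set: " + sameResult(name) + ")");
        }
    }
}
```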
Character ranges are not limited to digits. The following are also valid character ranges:
[A-F]: matches all uppercase letters from A to F.
[A-Z]: matches all uppercase letters from A to Z.
[A-z]: matches all characters from the ASCII character A to the ASCII character z. This range is generally not used and is given here only as an example, because it also includes characters such as [ and ^, which lie between Z and a in the ASCII table.
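The pitfall with [A-z] is easy to see in a quick Java check. This is a minimal sketch with an illustrative class name. It compares [A-z] with the two-range set [A-Za-z] on a few probe characters.

```java
import java.util.regex.Pattern;

public class AsciiRangePitfall {
    // [A-z] spans ASCII 65 ('A') to 122 ('z'), so it also covers
    // the six characters between 'Z' (90) and 'a' (97): [ \ ] ^ _ `
    static final Pattern A_TO_LOWER_Z = Pattern.compile("[A-z]");
    // Two separate ranges cover only the letters.
    static final Pattern LETTERS = Pattern.compile("[A-Za-z]");

    public static void main(String[] args) {
        String probes = "AZaz[^_";
        for (char c : probes.toCharArray()) {
            String s = String.valueOf(c);
            System.out.println(s + "  [A-z]: " + A_TO_LOWER_Z.matcher(s).matches()
                    + "  [A-Za-z]: " + LETTERS.matcher(s).matches());
        }
    }
}
```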
The start and end of a character range can be any character in the ASCII table. In practice, however, digit and letter ranges are the most commonly used.
Note: when defining a character range, the end character must not be less than the start character; a range such as [9-0] is not allowed. - is a metacharacter only between [ and ]; anywhere outside [ and ] it is an ordinary character and matches only a literal -.
A single character set can contain several ranges. For example, [0-9a-zA-Z] matches any digit or any uppercase or lowercase letter.
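Both rules in the note can be checked in Java. In this sketch (class name and helper are illustrative), Pattern.compile rejects the reversed range [9-0] by throwing PatternSyntaxException. Outside brackets, a-z is read as the three literal characters a, - and z.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RangeRules {
    // Returns true if the regex compiles, false if Java rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("[0-9] compiles: " + compiles("[0-9]")); // true
        System.out.println("[9-0] compiles: " + compiles("[9-0]")); // false: reversed range

        // Outside [ and ], - is an ordinary character that matches only itself.
        System.out.println(Pattern.matches("a-z", "a-z")); // true
        System.out.println(Pattern.matches("a-z", "b"));   // false: not a range here
    }
}
```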
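As a small illustration of several ranges inside one set, the Java sketch below keeps only the characters that [0-9a-zA-Z] matches. The class name MultiRange and the helper keepAlnum are hypothetical, and so is the sample input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiRange {
    // One character set holding three ranges: digits, lowercase, uppercase.
    private static final Pattern ALNUM = Pattern.compile("[0-9a-zA-Z]");

    // Collects, in order, every character of the input that the set matches.
    static String keepAlnum(String input) {
        StringBuilder kept = new StringBuilder();
        Matcher m = ALNUM.matcher(input);
        while (m.find()) {
            kept.append(m.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepAlnum("na1.txt, Sa-2!")); // prints na1txtSa2
    }
}
```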
Take a look at an example of matching colors on a web page:
Text:
test
Regular expression: #[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]
Results: test
Analysis: in a web page, a color is generally written as an RGB value that begins with #, where R stands for red, G for green and B for blue. Any color can be mixed from different combinations of R, G and B. The RGB values are written in hexadecimal; for example, #000000 is black, #FFFFFF is white and #FF0000 is red. So the regular expression for matching colors in a web page starts with #, followed by six identical [0-9A-Fa-f] character sets. (This can be shortened to #[0-9A-Fa-f]{6}, which will be discussed later under repeated matching.)
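Here is a minimal Java sketch of the color search. The HTML string is a hypothetical sample, not the article's original text, and the class and method names are illustrative. It runs the six-set form and the {6} shorthand over the same input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexColorFinder {
    // # followed by six hexadecimal digits, each written as its own set.
    static final Pattern LONG_FORM = Pattern.compile(
            "#[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]");
    // The same pattern using the {6} repetition shorthand.
    static final Pattern SHORT_FORM = Pattern.compile("#[0-9A-Fa-f]{6}");

    // Returns every color code the pattern finds, in order of appearance.
    static List<String> findColors(Pattern p, String html) {
        List<String> colors = new ArrayList<String>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            colors.add(m.group());
        }
        return colors;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup.
        String html = "<body bgcolor=\"#336633\" text=\"#ffffff\">";
        System.out.println(findColors(LONG_FORM, html));  // prints [#336633, #ffffff]
        System.out.println(findColors(SHORT_FORM, html)); // prints [#336633, #ffffff]
    }
}
```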
3. Negate a character set
A character set usually lists characters of which one must match. In some cases, however, we need the opposite: we give a set of characters that we do not want, and any character other than those in the set can match.
For example, to match files whose names start with na or sa and are not followed by a digit:
Text:
sales.txt
na1.txt
na2.txt
sa1.txt
sanatxt.txt
san.txt
Regular expression: [ns]a[^0-9]\.txt
Results:
sales.txt
na1.txt
na2.txt
sa1.txt
sanatxt.txt
[san.txt]
Analysis: the pattern used in this example is exactly the opposite of the previous one. There, [0-9] matches only digits; here, [^0-9] matches any character that is not a digit.
Note: ^ means negation only inside [ and ], as the first character of the set. If it appears at the beginning of the regular expression, it is a position match instead, which will be discussed later. Also, ^ applies to all characters and ranges in the set, not just to the character or range immediately after it. For example, [^0-9a-z] matches any character that is neither a digit nor a lowercase letter.
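A Java sketch comparing the two patterns on the file list above; the class name NegatedSet is illustrative. For each name, it prints whether the digit version and the non-digit version match.

```java
import java.util.regex.Pattern;

public class NegatedSet {
    // [^0-9] matches any single character that is NOT a digit.
    static final Pattern NON_DIGIT = Pattern.compile("[ns]a[^0-9]\\.txt");
    // The earlier pattern, which requires a digit in the same position.
    static final Pattern DIGIT = Pattern.compile("[ns]a[0-9]\\.txt");

    public static void main(String[] args) {
        String[] names = {"sales.txt", "na1.txt", "na2.txt", "sa1.txt",
                          "sanatxt.txt", "san.txt"};
        for (String name : names) {
            System.out.println(name
                    + "  [0-9]: " + DIGIT.matcher(name).find()
                    + "  [^0-9]: " + NON_DIGIT.matcher(name).find());
        }
    }
}
```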
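The two meanings of ^ can be contrasted in a short Java sketch. The class name CaretMeaning and the helper matchesChar are illustrative. [^0-9a-z] rejects both digits and lowercase letters, while a leading ^ outside brackets anchors the match to the start of the input.

```java
import java.util.regex.Pattern;

public class CaretMeaning {
    // Inside [ ], a leading ^ negates the whole set: neither a digit nor a lowercase letter.
    static final Pattern NOT_DIGIT_OR_LOWER = Pattern.compile("[^0-9a-z]");
    // Outside [ ], a leading ^ is a position match (start of input), not a negation.
    static final Pattern STARTS_WITH_NA = Pattern.compile("^na");

    // True if the single-character string is matched by [^0-9a-z].
    static boolean matchesChar(String c) {
        return NOT_DIGIT_OR_LOWER.matcher(c).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesChar("7")); // false: a digit
        System.out.println(matchesChar("q")); // false: a lowercase letter
        System.out.println(matchesChar("Q")); // true: outside both ranges
        System.out.println(STARTS_WITH_NA.matcher("na1.txt").find());  // true
        System.out.println(STARTS_WITH_NA.matcher("sana.txt").find()); // false: "na" is not at the start
    }
}
```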