2025-03-30 Update From: SLTechnology News&Howtos
This article introduces repeated matching in regular expressions: how to match a character or character set one or more times, zero or more times, zero or one time, or a specific number of times, and how to prevent over-matching.
First, how many matches are there
The first thing to know is the composition of an e-mail address: a user name that begins with a letter, digit, or underscore, followed by the @ symbol, followed by the domain name, that is, username@domain. The exact rules depend on the mailbox service provider; some also allow the . character in the user name.
1. Match one or more characters
To match multiple repeats of the same character (or character set), simply add a + as a suffix to that character (or character set). + matches one or more occurrences (at least one). For example: a matches a itself, a+ matches one or more consecutive a characters, and [0-9]+ matches one or more consecutive digits.
Note: when adding a + suffix to a character set, the + must be placed outside the character set, otherwise it is not a repeat match. For example, [0-9+] means a single digit or a + sign, which is syntactically valid but not what we want.
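The difference between placing + outside and inside the character set can be checked with a short sketch using Python's re module. The sample string is made up for illustration and does not come from the article.

```python
import re

# Hypothetical sample text, chosen only to contain digits and a plus sign.
text = "tel: 555+1234"

# + outside the set: one or more consecutive digits per match.
digit_runs = re.findall(r"[0-9]+", text)

# + inside the set: exactly one character per match, a digit or a literal +.
single_chars = re.findall(r"[0-9+]", text)

print(digit_runs)    # ['555', '1234']
print(single_chars)  # ['5', '5', '5', '+', '1', '2', '3', '4']
```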
Text: Hello, mhmyqn@qq.com or mhmyqn@126.com is my email.
Regular expression: \w+@(\w+\.)+\w+
Results: Hello, [mhmyqn@qq.com] or [mhmyqn@126.com] is my email.
Analysis: \w+ matches one or more characters, while the subexpression (\w+\.)+ matches a string such as xxxx.edu. (including the trailing dot). An address never ends with a . character, so the pattern finishes with another \w+. E-mail addresses like mhmyqn@xxxx.edu.cn will also match.
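As a minimal sketch, the e-mail example can be reproduced with Python's re module. finditer is used instead of findall here because the pattern contains a group, and findall would return only the group's content rather than the whole match.

```python
import re

text = "Hello, mhmyqn@qq.com or mhmyqn@126.com is my email."

# \w+ for the user name, (\w+\.)+ for one or more dot-terminated domain
# labels, and a final \w+ so the address cannot end with a dot.
email_pattern = re.compile(r"\w+@(\w+\.)+\w+")

emails = [m.group(0) for m in email_pattern.finditer(text)]
print(emails)  # ['mhmyqn@qq.com', 'mhmyqn@126.com']

# A multi-level domain is also matched as a whole.
multi_level = email_pattern.fullmatch("mhmyqn@xxxx.edu.cn") is not None
print(multi_level)  # True
```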
2. Match zero or more characters
To match zero or more characters, use the metacharacter *, which is used in exactly the same way as +. Place it after a character or character set, and it matches that character (or character set) zero or more times in a row. For example, the regular expression ab*c matches ac, abc, abbbbbc and so on.
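A quick check of ab*c in Python might look like the following. The non-matching string abd is an extra case added here for contrast.

```python
import re

pattern = re.compile(r"ab*c")

# fullmatch requires the whole string to match the pattern.
candidates = ["ac", "abc", "abbbbbc", "abd"]
results = {s: pattern.fullmatch(s) is not None for s in candidates}

print(results)
# {'ac': True, 'abc': True, 'abbbbbc': True, 'abd': False}
```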
3. Match zero or one character
To match zero or one character, use the metacharacter ?. As mentioned in the previous article, the regular expression \r\n\r\n matches a blank line on Windows. Unix and Linux do not use \r, so with the ? metacharacter the pattern \r?\n\r?\n matches blank lines on Windows as well as on Unix and Linux. Let's look at an example that matches a URL using either the http or the https protocol:
Text: The URL is http://www.mikan.com, to connect securely use https://www.mikan.com instead.
Regular expression: https?://(\w+\.)+\w+
Results: The URL is [http://www.mikan.com], to connect securely use [https://www.mikan.com] instead.
Analysis: this pattern begins with https?, where the ? means that the preceding character (s) is optional, so it matches both http and https. The rest of the pattern is the same as in the previous example.
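Both uses of ? can be sketched in Python as follows. The strings passed to split are made-up inputs used only to show the blank-line pattern working with both line-ending conventions.

```python
import re

# Optional s: matches http:// and https:// URLs.
url_pattern = re.compile(r"https?://(\w+\.)+\w+")
text = ("The URL is http://www.mikan.com, to connect securely "
        "use https://www.mikan.com instead.")
urls = [m.group(0) for m in url_pattern.finditer(text)]
print(urls)  # ['http://www.mikan.com', 'https://www.mikan.com']

# Optional \r: one pattern for Windows (\r\n) and Unix/Linux (\n) blank lines.
blank_line = re.compile(r"\r?\n\r?\n")
windows_parts = blank_line.split("first\r\n\r\nsecond")
unix_parts = blank_line.split("first\n\nsecond")
print(windows_parts, unix_parts)  # ['first', 'second'] ['first', 'second']
```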
Second, the number of repetitions to match
The +, * and ? metacharacters solve a lot of problems, but:
1) There is no upper limit on the number of characters that + and * match. We cannot set a maximum for them.
2) +, * and ? match at least one or at least zero characters. We cannot set a different minimum for them.
3) With only * and +, we cannot require an exact number of repetitions.
Regular expressions therefore provide a syntax for setting the number of repetitions: the values are written between the { and } characters.
1. Set an exact value for the number of repeated matches
To set an exact value for the number of repeated matches, write that number between { and }. For example, {4} means that the character (or character set) in front of it must be repeated 4 times in a row in the text to count as a match; only 3 repetitions is not a match.
The example of matching page colors from the previous articles can be written with a repeat count: #[[:xdigit:]]{6} or #[0-9a-fA-F]{6}. In Java, the POSIX character class is written as #\\p{XDigit}{6}.
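A minimal Python sketch of the exact-count form follows. It uses the explicit range [0-9a-fA-F] rather than a POSIX class, and the sample string is invented for illustration.

```python
import re

# Exactly six hexadecimal digits after the # sign.
color_pattern = re.compile(r"#[0-9a-fA-F]{6}")

# Hypothetical sample: one valid color, one with a non-hex letter, one too short.
text = "color: #FF00ab; border: #12G456; short: #abc"
colors = color_pattern.findall(text)

print(colors)  # ['#FF00ab']
```

Because the pattern is not anchored, it would also match the first six hex digits of a longer run; add a boundary such as \b after {6} if that matters.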
2. Set an interval for the number of repeated matches
The {} syntax can also be used to set an interval for the number of repeated matches, that is, a minimum and a maximum. The interval must be given in the form {n,m}, where m >= n >= 0. For example, a regular expression that checks whether a date such as 2012-08-12 or 2012-8-12 is in the correct format (not whether the date is valid): \d{4}-\d{1,2}-\d{1,2}.
3. Match at least a given number of repetitions
The last use of the {} syntax is to give a minimum number of repetitions without a maximum, such as {3,} to repeat at least 3 times. Note: the comma in {3,} is required, and there must be no space after it. Otherwise, the pattern will not behave as intended.
As an example, use a regular expression to find all amounts of $100 or more:
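The date pattern can be tried out with a short Python sketch. The extra test strings beyond the two dates in the text are added here for contrast, including one that is well-formed but not a real date.

```python
import re

# Four-digit year, then month and day of one or two digits each.
date_pattern = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")

candidates = ["2012-08-12", "2012-8-12", "12-08-12", "2012-008-12", "2012-99-99"]
checks = {s: date_pattern.fullmatch(s) is not None for s in candidates}

print(checks)
# {'2012-08-12': True, '2012-8-12': True, '12-08-12': False,
#  '2012-008-12': False, '2012-99-99': True}
```

The last entry shows that only the format is checked: 2012-99-99 passes even though it is not a valid date.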
Text:
$25.36
$125.36
$205.0
$2500.44
$44.30
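Regular expression: \$\d{3,}\.\d{2}
Results:
$25.36
[$125.36]
$205.0
[$2500.44]
$44.30
Analysis: $ is itself a metacharacter, so it must be escaped as \$ to match a literal dollar sign. \d{3,} requires at least three digits before the decimal point, which excludes $25.36 and $44.30. \.\d{2} requires exactly two digits after the decimal point, so $205.0 is not matched either, even though the amount is above $100.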
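The amounts example can be verified with a small Python sketch. It assumes the pattern \$\d{3,}\.\d{2}, with the dollar sign escaped; under that pattern $205.0 is not matched, because \.\d{2} needs two digits after the decimal point.

```python
import re

text = "\n".join(["$25.36", "$125.36", "$205.0", "$2500.44", "$44.30"])

# \$      literal dollar sign
# \d{3,}  at least three digits (so 100 or more)
# \.\d{2} a dot followed by exactly two digits
amount_pattern = re.compile(r"\$\d{3,}\.\d{2}")

large_amounts = amount_pattern.findall(text)
print(large_amounts)  # ['$125.36', '$2500.44']
```

If amounts with a single decimal digit should also count, \.\d{1,2} would be one way to relax the pattern.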
+, * and ? can each be expressed as a repeat count:
+ is equivalent to {1,}
* is equivalent to {0,}
? is equivalent to {0,1}
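These equivalences can be spot-checked in Python by running each shorthand and its {} form over the same text. The sample string is made up for illustration.

```python
import re

# Hypothetical sample text exercising all three quantifiers.
sample = "aaa ac abbc http https"

# (shorthand, equivalent interval form)
pairs = [
    (r"a+", r"a{1,}"),
    (r"ab*c", r"ab{0,}c"),
    (r"https?", r"https{0,1}"),
]

same_results = [re.findall(short, sample) == re.findall(full, sample)
                for short, full in pairs]
print(same_results)  # [True, True, True]
```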
Third, prevent over-matching
? matches only zero or one character, and {n} and {n,m} also have an upper limit on the number of repetitions, but *, + and {n,} have no upper limit, which can sometimes lead to over-matching.
Let's look at an example of matching an HTML tag.
Text:
Yesterday is <B>history</B>,tomorrow is a <B>mystery</B>, but today is a <B>gift</B>.
Regular expression: <[Bb]>.*</[Bb]>
Results:
Yesterday is [<B>history</B>,tomorrow is a <B>mystery</B>, but today is a <B>gift</B>].
Analysis: <[Bb]> matches the opening <B> tag (case-insensitive), and </[Bb]> matches the closing </B> tag (case-insensitive). But the result is not as expected: the text contains three <B> tags, yet everything from the first opening tag through the last closing tag is returned as a single match.
What causes this? * and + are greedy metacharacters: they match as much as possible, extending toward the end of the text and giving back only what the rest of the pattern needs, rather than stopping at the first place where the rest of the pattern could match.
Lazy versions of these metacharacters can be used when such greed is not wanted. Lazy means matching as few characters as possible, the opposite of greedy. To make a greedy metacharacter lazy, add a ? suffix to it. The greedy metacharacters and their lazy versions are:
* *?
+ +?
{n,} {n,}?
So in the above example, the regular expression just needs to be changed to <[Bb]>.*?</[Bb]>. The results are then as follows:
<B>history</B>
<B>mystery</B>
<B>gift</B>
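The greedy and lazy behaviors can be compared side by side in Python. This sketch assumes the sample sentence wraps history, mystery and gift in <B> tags and that the pattern is <[Bb]>.*</[Bb]>; both are assumptions made for illustration.

```python
import re

# Assumed sample text: three words wrapped in <B> tags.
text = ("Yesterday is <B>history</B>,tomorrow is a <B>mystery</B>, "
        "but today is a <B>gift</B>.")

# Greedy: .* runs as far as it can, so one match spans all three tags.
greedy = re.findall(r"<[Bb]>.*</[Bb]>", text)

# Lazy: .*? stops at the first closing tag, giving one match per tag.
lazy = re.findall(r"<[Bb]>.*?</[Bb]>", text)

print(len(greedy))  # 1
print(lazy)         # ['<B>history</B>', '<B>mystery</B>', '<B>gift</B>']
```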