Shulou (Shulou.com) report, 06/02. Updated 2025-02-24. Source: SLTechnology News & Howtos, Internet Technology.
This article is an introduction to regular expressions. Many people have questions about regular expressions in their daily work, so the editor consulted a variety of materials and organized them into simple, easy-to-use methods. We hope it helps answer your questions about regular expressions. Let's get started!
1 Introduction
A regular expression is a sequence of characters that defines a search pattern. Regular expressions let you define a pattern and then perform operations on the parts of a string that fit it. A substring that matches the pattern is called a match.
Regular expressions are mainly used for:
Input validation
Find-and-replace operations
Advanced string manipulation
Searching for or renaming files
Whitelists and blacklists
Regular expressions are not well suited to:
Parsing XML or HTML
Matching dates exactly
Many engines implement regular expressions, and each has its own features. This book avoids discussing the differences between engines and covers only the features that most engines have in common.
The examples throughout the book use JavaScript, so the book may lean slightly toward JavaScript's regex engine.
2 Basics
Regular expressions are usually written in the form /pattern/flags. For brevity, the flags after the closing / are sometimes omitted. Flags are discussed in more detail in a later chapter.
Let's start with the regular expression /p/g. For now, just treat the g flag as always present.
/p/g
As we can see, /p/g matches every lowercase p character.
Note
By default, regular expressions are case-sensitive.
An occurrence of the pattern found in the input string is called a match.
/pp/g
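Outside an interactive tester, the same two patterns can be tried with JavaScript's String.prototype.match, which returns an array of all matches when the g flag is set (or null when nothing matches). This is a minimal sketch; the sample sentence is invented for illustration.

```javascript
// Sample input invented for this illustration.
const sampleText = "Apple Pie, happy puppy";

// With the g flag, match() returns every match in the string.
const singleP = sampleText.match(/p/g);
const doubleP = sampleText.match(/pp/g);

// Matching is case-sensitive, so the capital P in "Pie" is not counted.
console.log(singleP.length); // 7
console.log(doubleP);        // [ 'pp', 'pp', 'pp' ]
```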
3 Character sets
A character set matches one character from a set of characters.
/[aeiou]/g
/[aeiou]/g matches every vowel in the input string.
Here is another example:
/p[aeiou]t/g
This matches a p, followed by a vowel, followed by a t.
There is a more convenient shorthand for matching any one character from a contiguous range:
/[a-z]/g
Warning
The expression /[a-z]/g matches only one character at a time. In the example above, each character is a separate match; the expression does not match the entire string at once.
We can also combine ranges and individual characters in a single character set.
/[A-Za-z0-9_-]/g
The regular expression /[A-Za-z0-9_-]/g matches a single character that is one of the following:
A-Z
a-z
0-9
_ or -
We can also "negate" a character set:
/[^aeiou]/g
The only difference between /[aeiou]/g and /[^aeiou]/g is the ^ immediately after the opening bracket. It "negates" the set defined inside the brackets, meaning:
Match any character that is not a, e, i, o or u.
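A short JavaScript sketch of the three kinds of character set described above: a plain set, a set built from ranges, and a negated set. The input strings are made up for illustration.

```javascript
// Sample inputs invented for illustration.
const pvtWords = "pat pet pit pot put pyt";

// p, then exactly one vowel from the set, then t ("pyt" is skipped).
const pvt = pvtWords.match(/p[aeiou]t/g);

// A character set matches ONE character, so each character is a separate match.
const allowed = "R2-D2_x".match(/[A-Za-z0-9_-]/g);

// A negated set matches any single character NOT listed (the space counts too).
const nonVowels = "queue it".match(/[^aeiou]/g);

console.log(pvt);            // [ 'pat', 'pet', 'pit', 'pot', 'put' ]
console.log(allowed.length); // 7
console.log(nonVowels);      // [ 'q', ' ', 't' ]
```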
3.1 Examples
Invalid username characters
/[^a-zA-Z_0-9-]/g
Unambiguous characters (excluding look-alikes such as I, O, l, o, 0 and 1)
/[A-HJ-NP-Za-kmnp-z2-9]/g
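One way the two example sets might be used in practice, sketched in JavaScript. The helper names sanitizeUsername and isUnambiguous are hypothetical, chosen for this illustration only.

```javascript
// Hypothetical helper: remove every character not allowed in a username.
function sanitizeUsername(raw) {
  return raw.replace(/[^a-zA-Z_0-9-]/g, "");
}

// Hypothetical helper: true if the code contains only unambiguous characters.
// It asks "is there any character outside the set?" and negates the answer.
function isUnambiguous(code) {
  return !/[^A-HJ-NP-Za-kmnp-z2-9]/.test(code);
}

console.log(sanitizeUsername("john.doe@site!")); // johndoesite
console.log(isUnambiguous("X7kP"));  // true
console.log(isUnambiguous("ABC10")); // false (contains 1 and 0)
```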
4 Character escapes
Character escapes are shorthand for some common character classes.
4.1 Digit characters \d
The escape \d matches a digit character, 0-9. It is equivalent to [0-9].
/\d/g
/\d\d/g
\D is the opposite of \d and is equivalent to [^0-9].
/\D/g
4.2 Word characters \w
The escape \w matches a word character. Word characters are:
Lowercase letters a-z
Uppercase letters A-Z
Digits 0-9
Underscore _
It is equivalent to [a-zA-Z0-9_].
/\w/g
/\W/g (uppercase W: \W is the opposite of \w and matches any non-word character)
4.3 Whitespace characters \s
The escape \s matches a whitespace character. The exact set depends on the regex engine, but most include at least:
Space
Tab \t
Carriage return \r
Newline \n
Form feed \f
Some engines also include the vertical tab (\v). Unicode-aware engines usually match every character in the Unicode separator category as well. In practice, these technical details rarely matter.
/\s/g
/\S/g (uppercase S: \S is the opposite of \s and matches any non-whitespace character)
4.4 Any character .
Although it is not a typical character escape, . matches any single character except the newline \n. With the dot-all modifier, it can match newlines too.
/./g
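The escapes above can be compared on a single string. A minimal JavaScript sketch with an invented sample sentence; note that the s (dot-all) flag used on the last line needs an ES2018 or later engine.

```javascript
// Sample input invented for illustration.
const orderText = "Order #42 ships in 3 days_ok";

const digits = orderText.match(/\d/g);       // single digits
const digitPairs = orderText.match(/\d\d/g); // two digits in a row
const wordChars = orderText.match(/\w/g).join("");
const spaces = orderText.match(/\s/g);
const nonWord = orderText.match(/\W/g);      // spaces and "#"

console.log(digits);         // [ '4', '2', '3' ]
console.log(digitPairs);     // [ '42' ]
console.log(wordChars);      // Order42shipsin3days_ok
console.log(spaces.length);  // 5
console.log(nonWord.length); // 6

// . skips the newline unless the s (dot-all) flag is set.
const dotDefault = "a\nb".match(/./g);
const dotAll = "a\nb".match(/./gs);
console.log(dotDefault);     // [ 'a', 'b' ]
console.log(dotAll.length);  // 3
```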
5 Escaping
The following characters have special meanings in regular expressions:
|
{, }
(, )
[, ]
^, $
+, *, ?
\
. (a literal dot only inside a character set)
- (sometimes special inside a character set, where it denotes a range)
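To match one of these characters literally, "escape" it by putting a \ in front of it.
/\(paren\)/g (matches the literal text "(paren)")
/(paren)/g (the parentheses form a group, so only "paren" is matched)
/example\.com/g (matches only "example.com")
/example.com/g (the unescaped . matches any character, so "exampleXcom" also matches)
/A\+/g (matches the literal text "A+")
/A+/g (matches one or more A characters)
/worth \$5/g (matches the literal text "worth $5")
/worth $5/g (the $ acts as an end anchor, so this cannot match "worth $5")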
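A JavaScript sketch of the escaped and unescaped pairs above, plus a small helper for escaping text that comes from user input before building a RegExp from it. The escapeRegExp body is a widely used recipe (it appears in MDN's regular expressions guide); the function name is this sketch's choice. It does not escape -, so it is meant for patterns used outside character sets.

```javascript
// Unescaped metacharacters keep their special meaning.
const dotAny = /example.com/.test("exampleXcom");       // true
const dotLiteral = /example\.com/.test("exampleXcom");  // false
const parenLiteral = "(paren)".match(/\(paren\)/)[0];   // "(paren)"
const parenGroup = "(paren)".match(/(paren)/)[0];       // "paren"
const dollarLiteral = /worth \$5/.test("worth $5");     // true
const dollarAnchor = /worth $5/.test("worth $5");       // false: $ is an anchor

// Escape every metacharacter; "$&" in the replacement inserts the whole match.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userPattern = new RegExp(escapeRegExp("A+ (v1.0)"));
console.log(userPattern.source);                   // A\+ \(v1\.0\)
console.log(userPattern.test("Grade: A+ (v1.0)")); // true
console.log(userPattern.test("Grade: AA v100"));   // false
```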
5.1 Examples
JavaScript single-line comments
/\/\/.*/g
Text surrounded by asterisks
/\*[^\*]*\*/g
The first and last asterisks are literal, so they must be escaped as \*. The asterisk inside the character set does not need to be escaped, but I escaped it for clarity. The asterisk immediately after the character set repeats the set, which we discuss in a later chapter.
6 Groups
As the name implies, groups "combine" parts of a regular expression. Groups can be used to:
Extract a subset of a match
Repeat a group any number of times
Refer back to a previously matched substring
Improve readability
Allow complex replacements
In this chapter, we first learn how groups work; later chapters include more examples.
6.1 Capture groups
A capture group is written as (...). Here is an example:
/a(bcd)e/g
Capture groups let you extract part of a match.
/\{([^{}]*)\}/g
Using your language's regex functions, you can extract the text matched between the braces.
Capture groups can also group part of a regular expression for repetition. Repetition is covered in detail in a later chapter, but here is an example that shows why groups are useful:
/a(bcd)+e/g
At other times, groups bundle logically related parts of a regular expression to improve readability.
/(\d\d)-W(\d\d)/g
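In JavaScript, captured text is read from the result arrays of exec() or matchAll(): index 0 holds the whole match and index 1, 2, ... hold the groups. A minimal sketch with invented inputs; matchAll requires ES2020 and a regex with the g flag.

```javascript
// exec() without g returns the whole match at [0] and each capture at [1], [2], ...
const m = /a(bcd)e/.exec("xxabcdexx");
console.log(m[0], m[1]); // abcde bcd

// matchAll() yields one result per match, each with its own captures.
const template = "Hello {name}, order {id} shipped";
const keys = [...template.matchAll(/\{([^{}]*)\}/g)].map((match) => match[1]);
console.log(keys); // [ 'name', 'id' ]

// The repeated group must match "bcd" as a whole each time;
// the capture keeps only the last repetition.
const rep = /a(bcd)+e/.exec("abcdbcdbcde");
console.log(rep[0], rep[1]); // abcdbcdbcde bcd
```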
6.2 Backreferences
Backreferences let you refer to a previously captured substring.
Use \1 to refer to the first group, \2 for the second, and so on.
/([abc])×\1×\1/g
You cannot use backreferences to shorten repeated parts of a pattern. They refer to the text a group matched, not to the group's pattern.
/[abc][abc][abc]/g
/[abc]\1\1/g (this does not work: there is no group for \1 to refer to)
Here is an example of a common use case:
/\w+([,|])\w+\1\w+/g
This cannot be achieved with repeated character sets, because the second [,|] may match a different delimiter than the first:
/\w+[,|]\w+[,|]\w+/g
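The difference between a backreference and a repeated character set shows up as soon as the delimiters are mixed. A JavaScript sketch with invented test strings:

```javascript
const sameDelimiter = /\w+([,|])\w+\1\w+/; // \1 must repeat the delimiter actually matched
const anyDelimiters = /\w+[,|]\w+[,|]\w+/; // each [,|] chooses independently

const results = [
  sameDelimiter.test("a,b,c"), // true
  sameDelimiter.test("a|b|c"), // true
  sameDelimiter.test("a,b|c"), // false: mixed delimiters rejected
  anyDelimiters.test("a,b|c"), // true: mixed delimiters accepted
];
console.log(results); // [ true, true, false, true ]

// \1 repeats the captured letter, not the pattern [abc].
const repeatedLetter = /([abc])×\1×\1/;
console.log(repeatedLetter.test("a×a×a"), repeatedLetter.test("a×b×c")); // true false
```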
6.3 Non-capturing groups
Non-capturing groups are very similar to capture groups, except that they do not create a "capture". They take the form (?:...).
Non-capturing groups are usually used alongside capture groups. Suppose you are using capture groups to extract parts of a match: if you need an extra group, for example for repetition, use a non-capturing group so that it does not disturb the numbering of your captures.
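A minimal JavaScript sketch of how a non-capturing group keeps the capture numbering stable. Both patterns match the same text; only the group numbers differ.

```javascript
const capturing = /(ab)+(c)/.exec("ababc");
const nonCapturing = /(?:ab)+(c)/.exec("ababc");

// slice(1) drops the whole match and keeps only the captured groups.
console.log(capturing.slice(1));    // [ 'ab', 'c' ]  "c" is group 2
console.log(nonCapturing.slice(1)); // [ 'c' ]        "c" is group 1
```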
6.4 Examples
Query string parameters
/^\?(\w+)=(\w+)(?:&(\w+)=(\w+))*$/g
We match the first key-value pair separately so that the & delimiter can be part of the repeated group.
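One caveat when extracting values with the query string pattern: in JavaScript a repeated group keeps only its last repetition, so groups 3 and 4 hold just the final pair. The second half of this sketch swaps in a different technique (not from the text), a global pattern with matchAll, to collect every pair; for real URLs, the built-in URLSearchParams is usually the better tool. The sample query string is invented.

```javascript
// The pattern from the text, without the g flag so exec() returns captures.
const queryPattern = /^\?(\w+)=(\w+)(?:&(\w+)=(\w+))*$/;
const qm = queryPattern.exec("?page=2&sort=asc&limit=10");

// The whole string matches, but groups 3 and 4 hold only the LAST pair.
console.log(qm.slice(1)); // [ 'page', '2', 'limit', '10' ]

// Swapped-in technique: match each pair separately and collect them all.
const allPairs = Object.fromEntries(
  [..."?page=2&sort=asc&limit=10".matchAll(/[?&](\w+)=(\w+)/g)].map((m) => [m[1], m[2]])
);
console.log(allPairs); // { page: '2', sort: 'asc', limit: '10' }
```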
(Basic) HTML tags
As a rule of thumb, don't use regular expressions to parse XML or HTML. Still, here is a relevant example:
/ (.*) / gi
Names
Find: \b(\w+) (\w+)\b
Replace: $2, $1
In replacements, groups are usually referenced as $1, $2; inside the pattern itself, backreferences use \1, \2.
Before replacement
John Doe
Jane Doe
Sven Svensson
Janez Novak
Janez Kranjski
Tim Joe
After replacement
Doe, John
Doe, Jane
Svensson, Sven
Novak, Janez
Kranjski, Janez
Joe, Tim
Backreferences and plurals
Find: \bword(s?)\b
Replace: phrase$1
Before replacement
This is a paragraph with some words.
Some instances of the word "word" are in their plural form: "words".
Yet, some are in their singular form: "word".
After replacement
This is a paragraph with some phrases.
Some instances of the phrase "phrase" are in their plural form: "phrases".
Yet, some are in their singular form: "phrase".
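Both find-and-replace examples map directly onto String.prototype.replace, where $1 and $2 in the replacement string insert the text captured by groups 1 and 2. A minimal sketch using a subset of the sample text above:

```javascript
const names = "John Doe\nJane Doe\nSven Svensson";
// Swap first and last name: group 2, a comma, then group 1.
const swapped = names.replace(/\b(\w+) (\w+)\b/g, "$2, $1");
console.log(swapped);
// Doe, John
// Doe, Jane
// Svensson, Sven

const para = 'Some instances of the word "word" are in their plural form: "words".';
// (s?) captures the plural "s" (or nothing) and $1 puts it back after "phrase".
const rewritten = para.replace(/\bword(s?)\b/g, "phrase$1");
console.log(rewritten);
// Some instances of the phrase "phrase" are in their plural form: "phrases".
```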
7 Repetition
Repetition is a powerful and ubiquitous feature of regular expressions. There are several ways to express repetition.
7.1 Optional
We can use ? to make a token optional (0 or 1 times).
/a?/g
Another example:
/https?/g
We can also make capture groups and non-capturing groups optional.
/url: (www\.)?example\.com/g
7.2 Zero or more times
To match a token zero or more times, add * as a suffix.
/a*/g
This regular expression even matches an empty string.
7.3 One or more times
To match a token one or more times, add + as a suffix.
/a+/g
7.4 Exactly x times
To match a token exactly x times, add the suffix {x}. This is functionally equivalent to copying and pasting the token x times.
/a{3}/g
The following example matches a six-digit uppercase hexadecimal color code.
/#[0-9A-F]{6}/g
Here, the quantifier {6} applies to the character set [0-9A-F].
7.5 Between min and max times
To match a token between min and max times, add {min,max} after it.
/a{2,4}/g
Warning
Do not put a space after the comma in {min,max}.
7.6 At least x times
To match a token at least x times, add {x,} after it. This works like {min,max}, except there is no upper limit.
/a{2,}/g
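Each quantifier from this chapter, run against a short invented string in JavaScript. Note how a* produces empty matches, and how {2,4} leaves a trailing "a" unmatched because it is too short for another match.

```javascript
const schemes = "http://a https://b ftp://c".match(/https?/g);
const optionalWww = /url: (www\.)?example\.com/;
const emptyMatches = "b".match(/a*/g);    // empty match at index 0 and at the end
const oneOrMore = "baaac".match(/a+/g);
const exactly3 = "aaaaa".match(/a{3}/g);
const twoToFour = "aaaaa".match(/a{2,4}/g);
const atLeast2 = "aaaaa".match(/a{2,}/g);
const hexColors = "#FF8800 #ff8800".match(/#[0-9A-F]{6}/g); // uppercase only

console.log(schemes);      // [ 'http', 'https' ]
console.log(optionalWww.test("url: example.com"), optionalWww.test("url: www.example.com")); // true true
console.log(emptyMatches); // [ '', '' ]
console.log(oneOrMore);    // [ 'aaa' ]
console.log(exactly3);     // [ 'aaa' ]
console.log(twoToFour);    // [ 'aaaa' ]
console.log(atLeast2);     // [ 'aaaaa' ]
console.log(hexColors);    // [ '#FF8800' ]
```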
7.7 A note on greediness
Regular expressions are greedy by default: each repetition matches as many characters as possible.
/a+/g
/".*"/g
Adding ? after a repetition operator (?, *, + or {...}) makes it lazy.
/".*?"/g
Here, [^"] can be used instead, which is the best approach.
/"[^"]*"/g
Lazy means stop as soon as the condition is satisfied; greedy means stop only once the condition can no longer be satisfied.
- Andrew S on StackOverflow
/ / g
/ / g
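The greedy, lazy and negated-set versions of the quote pattern can be compared on one invented sentence. A JavaScript sketch:

```javascript
const quoted = 'say "hi" and "bye"';

const greedy = quoted.match(/".*"/g);    // .* runs on to the LAST quote
const lazy = quoted.match(/".*?"/g);     // .*? stops at the first closing quote
const negated = quoted.match(/"[^"]*"/g); // [^"] can never run past a quote

console.log(greedy);  // [ '"hi" and "bye"' ]
console.log(lazy);    // [ '"hi"', '"bye"' ]
console.log(negated); // [ '"hi"', '"bye"' ]
```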
7.8 Examples
Bitcoin address
/([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g
YouTube video
/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm
We can tighten this expression with anchors so that it does not match the last, invalid link; anchors are covered in a later chapter.
8 Alternation
Alternation lets you match one of several phrases. This is more powerful than a character set, which is limited to single characters.
Separate the alternatives with the pipe symbol |.
/foo|bar|baz/g
This matches one of foo, bar or baz.
If only part of the pattern should alternate, wrap that part in a group, either capturing or non-capturing.
/Try(foo|bar|baz)/g
This matches Try followed by one of foo, bar or baz.
Matching a number between 100 and 250:
/1\d\d|2[0-4]\d|250/g
Patterns like this can be generated with the Regex Numeric Range Generator tool.
8.1 Examples
Hexadecimal colors
Let's improve the earlier hexadecimal color example so that it also accepts the three-digit shorthand.
/#([0-9A-F]{6}|[0-9A-F]{3})/g
It is important that [0-9A-F]{6} comes before [0-9A-F]{3}. Otherwise, a six-digit color such as #FF8800 would match only as #FF8:
/#([0-9A-F]{3}|[0-9A-F]{6})/g
Tip
The regex engine tries alternatives from left to right and uses the first one that leads to a match.
Roman numerals
/^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/g
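A JavaScript sketch of the alternation examples. The 100 to 250 check is wrapped in ^(?:...)$ as an assumption of this sketch, so the whole string must match; without the group, ^ and $ would attach only to the first and last alternatives. The hexadecimal lines show the left-to-right rule from the tip.

```javascript
// Anchored version of the 100-250 pattern (the wrapping group is this sketch's choice).
const inRange = (s) => /^(?:1\d\d|2[0-4]\d|250)$/.test(s);
const rangeResults = ["99", "100", "199", "249", "250", "251"].map(inRange);
console.log(rangeResults); // [ false, true, true, true, true, false ]

// Alternatives are tried left to right; the first one that works wins.
const sixFirst = "#FF8800".match(/#([0-9A-F]{6}|[0-9A-F]{3})/)[0];
const threeFirst = "#FF8800".match(/#([0-9A-F]{3}|[0-9A-F]{6})/)[0];
console.log(sixFirst, threeFirst); // #FF8800 #FF8

const roman = /^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/;
console.log(roman.test("MCMXCIV"), roman.test("IIII")); // true false
```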
9 Modifiers
Modifiers (also called flags) switch a regular expression into different "modes".
The modifiers are the part that follows /pattern/.
Different engines support different modifiers. Here we discuss only the most common ones.
9.1 Global modifier (g)
So far, every example has had the global modifier set. Without it, the regular expression stops after the first match and does not look for any further matches.
/[aeiou]/g
/[aeiou]/
9.2 Case-insensitive modifier (i)
As the name implies, enabling this modifier makes matching case-insensitive.
/#[0-9A-F]{6}/i
/#[0-9A-F]{6}/
/#[0-9A-Fa-f]{6}/
9.3 Multiline modifier (m)
Limited support
In Ruby, the m modifier does something else.
The multiline modifier changes how the anchors ^ and $ behave in a string that contains newline characters. By default, /^foo$/ matches only the string "foo".
We might want it to match foo when foo sits on its own line inside a multiline string.
Take "bar\nfoo\nbaz" as an example:
bar
foo
baz
Without the m modifier, the string is treated as a single line, bar\nfoo\nbaz, and the regular expression ^foo$ matches nothing.
With the m modifier, the string is treated as three lines, and ^foo$ matches the middle line.
9.4 Dot-all modifier (s)
Limited support
JavaScript did not support this modifier before ES2018. Ruby does not support it either; there, the same behavior is enabled with m.
. usually matches any character except a newline. With the dot-all modifier, it also matches newlines.
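The four modifiers side by side in JavaScript, each paired with the same pattern without the flag. A minimal sketch with invented inputs; the s flag requires ES2018 or later.

```javascript
const word = "Education";
const allVowels = word.match(/[aeiou]/g);   // g: every match
const firstVowel = word.match(/[aeiou]/)[0]; // no g: first match only
console.log(allVowels, firstVowel); // [ 'u', 'a', 'i', 'o' ] u  (capital E is not matched)

const withI = /#[0-9A-F]{6}/i.test("#ff8800");
const withoutI = /#[0-9A-F]{6}/.test("#ff8800");
console.log(withI, withoutI); // true false

const lines = "bar\nfoo\nbaz";
const withoutM = /^foo$/.test(lines);
const withM = /^foo$/m.test(lines);
console.log(withoutM, withM); // false true

const withoutS = /a.b/.test("a\nb");
const withS = /a.b/s.test("a\nb");
console.log(withoutS, withS); // false true
```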
10 Anchors
Anchors do not match any characters themselves. Instead, they restrict where a match can occur.
You can think of anchors as "invisible characters".
10.1 Start of line ^
Putting ^ at the start of a regular expression means the rest of the pattern must match at the start of the string. You can think of it as matching an invisible character that is always at the start of the string.
/^p/g
10.2 End of line $
Putting $ at the end of a regular expression works like ^, but for the end of the string. You can think of it as matching an invisible character that is always at the end of the string.
/p$/g
The ^ and $ anchors are often used together to make sure the pattern matches the string as a whole, not just part of it.
/^p$/g
Let's revisit an example from the repetition chapter and add anchors at both ends of the pattern.
/^https?$/g
Without these two anchors, http/2 and shttp would also match.
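A JavaScript sketch of anchored versus unanchored matching. isWebScheme is a hypothetical helper name used only here; search() returns the index of the first match, which shows which p each anchor allows.

```javascript
// Hypothetical helper: the whole string must be exactly "http" or "https".
const isWebScheme = (s) => /^https?$/.test(s);
const schemeResults = ["http", "https", "http/2", "shttp"].map(isWebScheme);
console.log(schemeResults); // [ true, true, false, false ]

// Without anchors, "http" is found inside "shttp".
const unanchored = /https?/.test("shttp");
console.log(unanchored); // true

console.log("pop".search(/^p/)); // 0 (only the p at the start)
console.log("pop".search(/p$/)); // 2 (only the p at the end)
console.log(/^p$/.test("pop"), /^p$/.test("p")); // false true
```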
10.3 Word boundary \b
A word boundary is the position between a word character and a non-word character (the start and end of the string also count when they are next to a word character).
The word-boundary anchor \b matches this imaginary invisible character between a word character and a non-word character.
/\bp/g
Tip
Word characters are a-z, A-Z, 0-9 and _.
/\bp\b/g
/\bcat\b/g
There is also a non-word-boundary anchor, \B.
As the name implies, it matches every position that is not a word boundary.
/\Bp/g
/\Bp\B/g
Tip
^...$ and \b...\b are common patterns; you almost always need one of them to prevent accidental matches.
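Word boundaries in JavaScript on invented strings: \bcat\b skips "cat" inside longer words (including "cat_s", because _ is a word character), while \Bp finds only the p characters in the middle of a word.

```javascript
const catText = "cat concatenate bobcat cat_s cat.";
const wholeCats = catText.match(/\bcat\b/g);   // "cat" at the start and "cat."
const pAtStart = "apple pie".match(/\bp/g);    // the p starting "pie"
const pInside = "apple pie".match(/\Bp/g);     // both p's inside "apple"

console.log(wholeCats.length); // 2
console.log(pAtStart.length);  // 1
console.log(pInside.length);   // 2
```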
10.4 Examples
Trailing whitespace
/\s+$/gm
Markdown headings
/^##/gm
Without anchors:
/##/gm
11 Lookaround (zero-width assertions)
Lookarounds check a condition without matching (consuming) any text.
They can only look, not move.
Lookahead
Positive (?=...)
Negative (?!...)
Lookbehind
Positive (?<=...)
Negative (?<!...)
11.1 Lookahead
Positive
/_(?=[aeiou])/g
Note that the character after _ is not part of the match; the lookahead only confirms that it is there.
/(.+)_(?=[aeiou])(?=\1)/g
At the position right after _, the regex engine checks both (?=[aeiou]) and (?=\1).
/(?=.*#).*/g
Negative
/_(?![aeiou])/g
/^(?!.*#).*$/g
Without the anchors, the parts of each example that do not contain # would still be matched.
Negative lookahead is often used to avoid matching specific phrases.
/foo(?!bar)/g
/-(?:(?!-).)*/g
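A JavaScript sketch of positive and negative lookahead, plus the lookbehind syntax listed at the start of the chapter (lookbehind needs an ES2018 or later engine). Using replace() makes it visible that the looked-at character is checked but never consumed. The inputs are invented.

```javascript
const underscored = "a_e b_c";
// Only the underscore is replaced; the vowel after it is checked, not consumed.
const beforeVowel = underscored.replace(/_(?=[aeiou])/g, "-");
const notBeforeVowel = underscored.replace(/_(?![aeiou])/g, "-");
console.log(beforeVowel);    // a-e b_c
console.log(notBeforeVowel); // a_e b-c

// "foo" only when it is not followed by "bar".
const fooIndex = "foobar foobaz".search(/foo(?!bar)/);
console.log(fooIndex); // 7

// Anchored negative lookahead: the whole line must not contain "#".
const noHash = /^(?!.*#).*$/;
console.log(noHash.test("plain text"), noHash.test("has # inside")); // true false

// Positive lookbehind: digits preceded by "$", without including the "$".
const price = "cost: $42".match(/(?<=\$)\d+/)[0];
console.log(price); // 42
```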
11.2 Examples
Password validation
/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[a-zA-Z]).{8,}$/
Lookarounds can be used to check several conditions at once.
Quoted strings
/(['"])(?:(?!\1).)*\1/g
Without lookahead, the best we can do is:
/(['"])[^'"]*\1/g
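Both lookaround examples in JavaScript with invented inputs. Each lookahead in the password pattern checks one condition from the start of the string before .{8,} enforces the length. The quoted-string comparison shows why the lookahead version is better: the negated-set version cannot cope with a ' inside a "..." string.

```javascript
const strongPassword = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[a-zA-Z]).{8,}$/;
const passwordResults = ["Passw0rd", "password1", "Pass1"].map((p) => strongPassword.test(p));
console.log(passwordResults); // [ true, false, false ]

const quote = `He said "it's fine" today`;
const withLookahead = quote.match(/(['"])(?:(?!\1).)*\1/);
const withNegatedSet = quote.match(/(['"])[^'"]*\1/);
console.log(withLookahead[0]); // "it's fine"
console.log(withNegatedSet);   // null
```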
12 Advanced examples
JavaScript comments
/\/\*[\s\S]*?\*\/|\/\/.*/g
[\s\S] is a trick for matching any character, including newlines. We avoid the dot-all modifier because we need the ordinary . for single-line comments, which must stop at the end of the line.
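The comment pattern applied to a small invented JavaScript snippet. The lazy [\s\S]*? lets a block comment span lines but stop at the first */, while .* keeps a line comment on its own line. As a limitation of this sketch, the pattern would also match // inside string literals such as URLs.

```javascript
const source = "let a = 1; // one\n/* multi\nline */ let b = 2;";
const comments = source.match(/\/\*[\s\S]*?\*\/|\/\/.*/g);
console.log(comments); // [ '// one', '/* multi\nline */' ]
```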
24-hour time
/^([01]?[0-9]|2[0-3]):[0-5][0-9](:[0-5][0-9])?$/g
IP address
/\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b/g
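A JavaScript sketch exercising the 24-hour time and IPv4 patterns, with the IPv4 pattern written out in full as a regex literal. Because the IPv4 pattern uses \b rather than ^...$, it can pull addresses out of running text; the log line is invented.

```javascript
const time24 = /^([01]?[0-9]|2[0-3]):[0-5][0-9](:[0-5][0-9])?$/;
const timeResults = ["23:59", "7:05:09", "24:00", "12:60"].map((t) => time24.test(t));
console.log(timeResults); // [ true, true, false, false ]

const ipv4 =
  /\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b/g;
const logLine = "Server 10.0.0.1 and 192.168.1.254, not 999.1.1.1";
const addresses = logLine.match(ipv4);
console.log(addresses); // [ '10.0.0.1', '192.168.1.254' ]
```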
Meta tags
/ / gm
Replace:
Floating-point numbers
Optional sign
Optional integer part
Optional decimal part
Optional exponent part
/^([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?$/g
The positive lookahead (?=\.\d|\d) ensures that inputs with no digits, such as an empty string, a lone sign or a lone dot, do not match.
HSL colors
Integers from 0 to 360
/^0*(?:360|3[0-5]\d|[12]?\d?\d)$/g
Percentages
/^(?:100(?:\.0+)?|\d?\d(?:\.\d+)?)%$/g
HSL color with percentages
/^hsl\(\s*0*(?:360|3[0-5]\d|[12]?\d?\d)\s*(?:,\s*0*(?:100(?:\.0+)?|\d?\d(?:\.\d+)?)%\s*){2}\)$/gi
13 Next steps
If you want to learn more about regular expressions and how they work:
Awesome-regex
Regex tag on StackOverflow
StackOverflow RegEx FAQ
r/regex
RexEgg
Regular-Expressions.info
Regex Crossword
Regex Golf
This concludes our introduction to regular expressions. We hope it has cleared up your questions. Combining theory with practice is the best way to learn, so go and try it out! If you want to learn more, keep following the site; the editor will continue to bring you more practical articles.
© 2024 shulou.com SLNews company. All rights reserved.