
An introduction to regular expressions

2025-02-24 Update From: SLTechnology News&Howtos


This article is an introductory tutorial on regular expressions. Many people have questions about regular expressions in day-to-day work, so the editor has consulted various sources and put together a simple, practical walkthrough. We hope it helps answer those questions. Let's get started.

1 Introduction

Regular expressions let you define a pattern and use it to perform operations on a string. A substring that fits the pattern is called a match.

A regular expression is a sequence of characters that defines a search pattern.

Regular expressions are mainly used in the following scenarios:

Input validation

Find-and-replace operations

Advanced string manipulation

File search or renaming

Whitelists and blacklists

Regular expressions are not suitable for these scenarios:

XML or HTML parsing

Exact date matching

There are many engines that implement regular expressions, each with its own features. This book avoids discussing the differences between engines and, in most cases, covers only the features they have in common.

The examples throughout the book use JavaScript. As a result, the book may be slightly skewed towards JavaScript's regex engine.

2 Basics

Regular expressions are usually written as /pattern/flags. The slashes and flags are often omitted for brevity. We will discuss flags in more detail in a later chapter.

Let's start with the regular expression /p/g. For now, take the g flag for granted.

/p/g

As we can see, /p/g matches all lowercase p characters.

Note

By default, regular expressions are case sensitive.

An instance of a regular expression pattern found in the input string is called a match.

/pp/g
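A minimal JavaScript sketch of the two patterns above. The sample sentence is an assumption chosen for illustration; String.prototype.match with the g flag returns an array containing every match.

```javascript
// Hypothetical sample input, chosen only for illustration.
const sentence = "Peter Piper picked a peck of pickled peppers";

// With the g flag, match() returns every match as an array.
// Matching is case sensitive, so the capital P in "Peter" and "Piper" is skipped.
const singleP = sentence.match(/p/g);

// /pp/g needs two consecutive lowercase p characters ("peppers").
const doubleP = sentence.match(/pp/g);

console.log(singleP.length); // 7
console.log(doubleP);        // [ 'pp' ]
```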

3 Character sets

You can match one character out of a set of characters.

/[aeiou]/g

/[aeiou]/g matches all vowels in the input string.

Here is another example:

/p[aeiou]t/g

We match a p, followed by a vowel, then a t.

There is a more intuitive shortcut for matching a character in a continuous range.

/[a-z]/g

Warning

The expression /[a-z]/g matches only one character at a time. In the above example, each character is a separate match; the expression does not match the entire string.

We can also combine ranges and individual characters in regular expressions.

/[A-Za-z0-9_-]/g

Our regular expression /[A-Za-z0-9_-]/g matches a single character that must be (at least) one of the following:

A-Z

a-z

0-9

_ or -

We can also "negate" these rules:

/[^aeiou]/g

The only difference between /[aeiou]/g and /[^aeiou]/g is the ^ immediately after the opening bracket. Its purpose is to "negate" the rules defined inside the brackets. It means:

Match any character that is not a, e, i, o or u
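The character sets above can be tried out with a short JavaScript sketch. The input strings are assumptions made up for this illustration.

```javascript
// Hypothetical inputs, chosen only for illustration.
const words = "pat pet pit spot put pt";
const label = "regular-Expression_42!";

// p, then exactly one vowel, then t. "pt" has no vowel, so it is skipped.
const pVowelT = words.match(/p[aeiou]t/g);

// Each vowel is a separate one-character match; the uppercase E is not in the set.
const vowels = label.match(/[aeiou]/g);

// Negated set: any single character that is NOT a letter, digit, _ or -.
const outsideSet = label.match(/[^A-Za-z0-9_-]/g);

console.log(pVowelT);    // [ 'pat', 'pet', 'pit', 'pot', 'put' ]
console.log(vowels);     // [ 'e', 'u', 'a', 'e', 'i', 'o' ]
console.log(outsideSet); // [ '!' ]
```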

3.1 Examples

Invalid username characters

/[^a-zA-Z_0-9-]/g

Specific characters

/[A-HJ-NP-Za-kmnp-z2-9]/g

4 Character escapes

Character escapes are shorthand for some common character classes.

4.1 Digit character \d

The escape \d matches a digit character, 0-9. It is equivalent to [0-9].

/\d/g (look closely at this one)

/\d\d/g

\D is the opposite of \d, equivalent to [^0-9].

/\D/g

4.2 Word character \w

The escape \w matches a word character. Word characters include:

Lowercase letters a-z

Uppercase letters A-Z

The digits 0-9

Underscore _

It is equivalent to [a-zA-Z0-9_].

/\w/g

/\W/g

4.3 Whitespace character \s

The escape \s matches a whitespace character. The exact set of characters matched depends on the regex engine, but most include at least:

Space

Tab \t

Carriage return \r

Newline \n

Form feed \f

Others may also include the vertical tab (\v). Unicode-aware engines usually match all characters in the separator category. However, the technical details are usually not important.

/\s/g

/\S/g (uppercase S)

4.4 Any character .

Although it is not a typical character escape, . matches any single character (except the newline \n; with the dot-all modifier it can match newlines as well).

/./g
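As a rough sketch, the escapes from this chapter can be compared side by side in JavaScript. The sample string is an assumption; it contains two spaces and one tab.

```javascript
// Hypothetical sample input: two spaces, one tab, three digits.
const sample = "Room 42, floor_3\t(east)";

const digits = sample.match(/\d/g);                // every digit is its own match
const nonWord = sample.match(/\W/g);               // spaces, comma, tab, parentheses
const whitespaceCount = sample.match(/\s/g).length;

// Without the dot-all modifier, . skips the newline between a and b.
const dotMatches = "a\nb".match(/./g);

console.log(digits);          // [ '4', '2', '3' ]
console.log(nonWord.length);  // 6
console.log(whitespaceCount); // 3
console.log(dotMatches);      // [ 'a', 'b' ]
```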

5 Escaping

In regular expressions, some characters have special meanings, which we will discuss in this chapter:

|

{, }

(, )

[, ]

^, $

+, *, ?

\

. : a literal only inside a character class.

- : sometimes a special character inside a character class.

When we want to match these characters literally, we can "escape" them by putting a \ in front of them.

/\(paren\)/g

/(paren)/g

/example\.com/g

/example.com/g

/A\+/g

/A+/g

/worth \$5/g

/worth $5/g
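The difference between the escaped and unescaped patterns above can be sketched in JavaScript. The host strings are assumptions, and escapeRegExp is a hypothetical helper written for this example (it is not a built-in); it escapes the characters that are special outside a character class.

```javascript
// Hypothetical inputs, chosen only for illustration.
const hosts = ["example.com", "exampleXcom"];

// Unescaped: . matches any character, so both strings pass.
const loose = hosts.filter((h) => /example.com/.test(h));

// Escaped: \. matches only a literal dot.
const strict = hosts.filter((h) => /example\.com/.test(h));

// Hypothetical helper: prefix every special character with a backslash
// so that arbitrary text can be embedded in a pattern literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches the literal text "worth $5".
const price = new RegExp(escapeRegExp("worth $5"));

console.log(loose.length);             // 2
console.log(strict);                   // [ 'example.com' ]
console.log(price.test("worth $5"));   // true
```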

5.1 Examples

JavaScript inline comments

/\/\/.*/g

A substring surrounded by asterisks

/\*[^\*]*\*/g

The first and last asterisks are literal, so they have to be escaped as \*. The asterisk inside the character set does not need to be escaped, but I escaped it for clarity. The asterisk immediately following the character set indicates repetition of the character set, which we will discuss in later chapters.

6 Groups

As the name implies, groups are used to "group" the components of a regular expression. Groups can be used to:

Extract a subset of a match

Repeat a group any number of times

Refer to previously matched substrings

Enhance readability

Allow complex replacements

In this chapter, we first learn how groups work; later chapters contain more examples.

6.1 Capture groups

Capture groups are written as (…). Here is an illustrative example:

/a(bcd)e/g

Capture groups allow parts of a match to be extracted.

/\{([^{}]*)\}/g

Using your language's regex functions, you can extract the text matched between the parentheses.
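In JavaScript, one way to do this is String.prototype.matchAll, which yields one result per match with the captured text at index 1. The template string below is an assumption made up for this sketch.

```javascript
// Hypothetical input with two placeholders in curly braces.
const template = "Hello {name}, you have {count} new messages";

// Each matchAll() result is array-like: index 0 is the whole match
// (including the braces), index 1 is the first capture group.
const results = [...template.matchAll(/\{([^{}]*)\}/g)];

const wholeMatches = results.map((m) => m[0]);
const placeholders = results.map((m) => m[1]);

console.log(wholeMatches); // [ '{name}', '{count}' ]
console.log(placeholders); // [ 'name', 'count' ]
```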

Capture groups can also be used to group part of a regular expression for repetition. Although we cover repetition in detail in later chapters, here is an example that demonstrates the usefulness of groups.

/a(bcd)+e/g

At other times, they are used to group logically related parts of a regular expression to improve readability.

/(\d\d)-W(\d\d)/g

6.2 Backreferences

Backreferences allow you to refer to previously captured substrings.

\1 matches the text of the first group, \2 matches the text of the second group, and so on.

/([abc])×\1×\1/g

You cannot use backreferences to reduce duplication in a regular expression. They refer to what a group matched, not to its pattern.

/[abc][abc][abc]/g

/([abc])\1\1/g

The following example demonstrates a common use case:

/\w+([,|])\w+\1\w+/g

This cannot be achieved by repeating the character class.

/\w+[,|]\w+[,|]\w+/g
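A short JavaScript sketch of the difference, using made-up inputs. The g flag is left out here because RegExp.prototype.test keeps state between calls when g is set.

```javascript
// \1 must repeat the exact separator that the group captured.
const sameSeparator = /\w+([,|])\w+\1\w+/;

// Repeating the character class allows a different separator each time.
const anySeparator = /\w+[,|]\w+[,|]\w+/;

// Hypothetical inputs, chosen only for illustration.
const consistent = "a,b,c";
const mixed = "a,b|c";

console.log(sameSeparator.test(consistent)); // true
console.log(sameSeparator.test(mixed));      // false
console.log(anySeparator.test(mixed));       // true

// A backreference repeats the matched text, not the pattern.
console.log(/([abc])\1\1/.test("aaa")); // true
console.log(/([abc])\1\1/.test("abc")); // false
```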

6.3 Non-capture groups

Non-capture groups are very similar to capture groups, except that they do not create "captures". They take the form (?:…).

Non-capture groups are usually used alongside capture groups. You may be using capture groups to extract parts of a match and also need a group purely for grouping; to avoid disturbing the capture order, use a non-capture group for the latter.

6.4 Examples

Query string parameters

/^\?(\w+)=(\w+)(?:&(\w+)=(\w+))*$/g

We match the first key-value pair separately because this lets us use the & delimiter as part of the repeating group.

(Basic) HTML tags

As a rule of thumb, don't use regular expressions to match XML/HTML. However, here is a relevant example:

/<([a-z]+)>(.*)<\/\1>/gi

Names

Find: \b(\w+) (\w+)\b

Replace: $2, $1

In replacements, captures are usually referenced as $1, $2; inside the pattern itself, backreferences are written \1, \2.

Before replacement

John Doe

Jane Doe

Sven Svensson

Janez Novak

Janez Kranjski

Tim Joe

After replacement

Doe, John

Doe, Jane

Svensson, Sven

Novak, Janez

Kranjski, Janez

Joe, Tim
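The name swap above can be reproduced in JavaScript with String.prototype.replace, where $1 and $2 in the replacement string stand for the two captured words. This is a minimal sketch; the array holds three of the names from the "before" list.

```javascript
// Three of the names from the "before" list above.
const names = ["John Doe", "Sven Svensson", "Janez Novak"];

// Group 1 is the first name, group 2 the last name;
// "$2, $1" writes them back in swapped order.
const swapped = names.map((name) => name.replace(/\b(\w+) (\w+)\b/, "$2, $1"));

console.log(swapped); // [ 'Doe, John', 'Svensson, Sven', 'Novak, Janez' ]
```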

Backreferences and plurals

Find: \bword(s?)\b

Replace: phrase$1

Before replacement

This is a paragraph with some words.

Some instances of the word "word" are in their plural form: "words".

Yet, some are in their singular form: "word".

After replacement

This is a paragraph with some phrases.

Some instances of the phrase "phrase" are in their plural form: "phrases".

Yet, some are in their singular form: "phrase".

7 Repetition

Repetition is a powerful and ubiquitous feature of regular expressions. There are several ways to express repetition in a regular expression.

7.1 Optional

We can use ? to make a part optional (0 or 1 times).

/a?/g

Another example:

/https?/g

We can also make capture groups and non-capture groups optional.

/url: (www\.)?example\.com/g

7.2 Zero or more times

If we want to match a token zero or more times, we can add * as a suffix.

/a*/g

Our regular expression even matches an empty string.

7.3 One or more times

If we want to match a token one or more times, we can add + as a suffix.

/a+/g

7.4 Exactly x times

If we want to match a particular token exactly x times, we can add the {x} suffix. This is functionally equivalent to copying and pasting the token x times.

/a{3}/g

The following example matches an uppercase six-character hexadecimal color code.

/#[0-9A-F]{6}/g

Here, the quantifier {6} applies to the character set [0-9A-F].

7.5 Between min and max times

If we want to match a particular token between min and max times, we can add {min,max} after that token.

/a{2,4}/g

Warning

Do not put a space after the comma in {min,max}.

7.6 At least x times

If we want to match a particular token at least x times, we can add {x,} after the token. This is similar to {min,max}, except that there is no upper limit.

/a{2,}/g
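The quantifiers from this chapter can be compared on one input with a small JavaScript sketch. The input string is an assumption, and the \b, ^ and $ anchors (covered in a later chapter) are added here only so that each run of letters is judged as a whole.

```javascript
// Hypothetical input: runs of one to five a's separated by spaces.
const runs = "a aa aaa aaaa aaaaa";

const exactlyThree = runs.match(/\ba{3}\b/g);  // only the run of length 3
const twoToFour = runs.match(/\ba{2,4}\b/g);   // runs of length 2, 3 and 4
const atLeastTwo = runs.match(/\ba{2,}\b/g);   // every run except the single a

// ? makes the preceding s optional.
const schemes = "http https".match(/https?/g);

// {6} applies to the character set, so exactly six hex digits are required.
const hexColor = /^#[0-9A-F]{6}$/;

console.log(exactlyThree);              // [ 'aaa' ]
console.log(twoToFour);                 // [ 'aa', 'aaa', 'aaaa' ]
console.log(atLeastTwo.length);         // 4
console.log(schemes);                   // [ 'http', 'https' ]
console.log(hexColor.test("#1A2B3C"));  // true
console.log(hexColor.test("#1A2B3"));   // false
```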

7.7 A note on greediness

Regular expressions are greedy by default. In greedy mode, a quantifier matches as many characters as possible.

/a*/g

/".*"/g

Adding a ? after a repetition operator (?, *, +, ...) makes the match lazy.

/".*?"/g

Here, the same result can be achieved by using [^"] instead of the dot, which is the better approach.

/"[^"]*"/g

Lazy means stop as soon as the condition is satisfied, but greedy means stop only once the condition is no longer satisfied.

- Andrew S on StackOverflow

/<.+>/g

/<.+?>/g
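A minimal JavaScript sketch of greedy versus lazy matching on a made-up line containing two quoted strings:

```javascript
// Hypothetical input with two quoted strings.
const line = 'say "hello" and "bye"';

// Greedy: .* runs to the last quote, so both strings end up in one match.
const greedy = line.match(/".*"/g);

// Lazy: .*? stops at the first closing quote it can.
const lazy = line.match(/".*?"/g);

// Negated set: [^"] cannot cross a quote, which gives the same result here.
const negated = line.match(/"[^"]*"/g);

console.log(greedy);  // [ '"hello" and "bye"' ]
console.log(lazy);    // [ '"hello"', '"bye"' ]
console.log(negated); // [ '"hello"', '"bye"' ]
```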

7.8 Examples

Bitcoin address

/([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g (food for thought: what about {26,33}? )

YouTube video

/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm

We can use anchors, which we'll get to later, to adjust the expression so that it doesn't match the last, incorrect link.

8 Alternation

Alternation allows you to match one of several phrases. This is more powerful than a character set, which is limited to single characters.

Use the pipe symbol | to separate the phrases.

/foo|bar|baz/g

Matches one of foo, bar, and baz.

If only part of the regex needs to "alternate", wrap that part in a group; both capture and non-capture groups work.

/Try (foo|bar|baz)/g

Try followed by one of foo, bar, and baz.

Match a number between 100 and 250:

/1\d\d|2[0-4]\d|250/g

This can be generated using the Regex Numeric Range Generator tool.

Examples

Hexadecimal color

Let's improve the previous hexadecimal color example.

/#([0-9A-F]{6}|[0-9A-F]{3})/g

It is important that [0-9A-F]{6} comes before [0-9A-F]{3}. Otherwise:

/#([0-9A-F]{3}|[0-9A-F]{6})/g

Tip

The regex engine tries the alternatives from left to right.
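Why the order matters can be seen in a short JavaScript sketch. The color value is an assumption chosen for illustration.

```javascript
// Same alternatives, different order.
const sixFirst = /#([0-9A-F]{6}|[0-9A-F]{3})/g;
const threeFirst = /#([0-9A-F]{3}|[0-9A-F]{6})/g;

// Hypothetical six-digit color.
const color = "#AABBCC";

// The six-digit alternative is tried first and succeeds.
const full = color.match(sixFirst);

// The three-digit alternative succeeds first, so the engine never tries six.
const truncated = color.match(threeFirst);

console.log(full);      // [ '#AABBCC' ]
console.log(truncated); // [ '#AAB' ]
```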

Roman numerals

/^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/g

9 Modifiers

Modifiers allow us to put regular expressions into different "modes".

The modifier is the part that follows /pattern/.

Different engines support different modifiers. Here we only discuss the most common ones.

9.1 Global modifier (g)

So far, all examples have set the global modifier. If the global modifier is not enabled, the regular expression stops after the first match and does not match anything else.

/[aeiou]/g

/[aeiou]/

9.2 Case-insensitive modifier (i)

As the name implies, enabling this modifier makes the regex case-insensitive when matching.

/#[0-9A-F]{6}/i

/#[0-9A-F]{6}/

/#[0-9A-Fa-f]{6}/

9.3 Multiline modifier (m)

Limited support

In Ruby, the m modifier does something else.

The multiline modifier affects how anchors behave in a string that contains newline characters. By default, /^foo$/ only matches the string "foo".

We might want it to match foo when it sits on its own line in a multiline string.

Let's take "bar\nfoo\nbaz" as an example:

bar

foo

baz

Without the m modifier, the above string is treated as a single line bar\nfoo\nbaz, and the regular expression ^foo$ does not match anything.

With the m modifier, the above string is treated as three lines, and ^foo$ matches the middle line.
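The same example as a minimal JavaScript sketch (the last pattern, /^\w/gm, is an extra illustration that is not part of the original text):

```javascript
// The three-line string from the example above.
const text = "bar\nfoo\nbaz";

// Without m, ^ and $ only match at the start and end of the whole string.
const withoutM = /^foo$/.test(text);

// With m, ^ and $ also match at the start and end of each line.
const withM = /^foo$/m.test(text);

// Extra illustration: the first character of every line.
const lineStarts = text.match(/^\w/gm);

console.log(withoutM);   // false
console.log(withM);      // true
console.log(lineStarts); // [ 'b', 'f', 'b' ]
```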

9.4 Dot-all modifier (s)

Limited support

JavaScript does not support this modifier before ES2018. Ruby does not support this modifier either; it uses m instead.

The dot (.) usually matches any character except a newline. With the dot-all modifier, it matches newline characters as well.

10 Anchors

Anchors do not match anything themselves. Instead, they restrict where a match may occur.

You can think of anchors as "invisible characters".

10.1 Start of line ^

Put ^ at the beginning of the regex so that the rest of the regex must match from the beginning of the string. You can think of it as matching an invisible character that always sits at the beginning of the string.

/^p/g

10.2 End of line $

Put $ at the end of the regex; it works like the start-of-line anchor. You can think of it as matching an invisible character that always sits at the end of the string.

/p$/g

The ^ and $ anchors are often used together to ensure that the regex matches the string as a whole, not just part of it.

/^p$/g

Let's revisit an example from the repetition chapter and add the two anchors at either end of the regex.

/^https?$/g

Without these two anchors, http/2 and shttp would also be matched.
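A brief JavaScript sketch of the anchored and unanchored versions, using the strings mentioned above as candidates. The g flag is omitted because test() only needs a yes or no answer.

```javascript
const candidates = ["http", "https", "http/2", "shttp"];

// Unanchored: the pattern may match anywhere inside the string.
const unanchored = candidates.filter((s) => /https?/.test(s));

// Anchored: the whole string must be exactly http or https.
const anchored = candidates.filter((s) => /^https?$/.test(s));

console.log(unanchored.length); // 4
console.log(anchored);          // [ 'http', 'https' ]
```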

10.3 Word boundary \b

A word boundary is the position between a word character and a non-word character.

The word boundary anchor \b matches the imaginary invisible character that sits between a word character and a non-word character.

/\bp/g

Tip

Word characters include a-z, A-Z, 0-9, and _.

/\bp\b/g

/\bcat\b/g

There is also a non-word-boundary anchor, \B.

As the name implies, it matches every position that is not a word boundary.

/\Bp/g

/\Bp\B/g

Tip

^...$ and \b...\b are common patterns, and you almost always need one of the two to prevent accidental matches.
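A small JavaScript sketch of how \b and \B change what /cat/ finds. The sentence is an assumption made up for this illustration.

```javascript
// Hypothetical input: "cat" appears as a word and inside other words.
const sentence = "cat concat category cat.";

// No anchors: every occurrence of the letters c-a-t.
const anywhere = sentence.match(/cat/g);

// \b on both sides: only the standalone word.
const wholeWord = sentence.match(/\bcat\b/g);

// \B before: only occurrences that continue a word (the one in "concat").
const insideWord = sentence.match(/\Bcat/g);

console.log(anywhere.length);   // 4
console.log(wholeWord.length);  // 2
console.log(insideWord.length); // 1
```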

10.4 Examples

Trailing whitespace

/\s+$/gm

Markdown headings

/^## /gm

Without the anchor:

/## /gm

11 Zero-width assertions (lookaround)

Zero-width assertions can be used to verify conditions without matching any text.

You are only looking, not moving.

Lookahead

Positive (?=…)

Negative (?!…)

Lookbehind

Positive (?<=…)

Negative (?<!…)

11.1 Lookahead

Positive

/_(?=[aeiou])/g

Notice how the character that follows is not part of the match. It is only confirmed by the lookahead.

/(.+)_(?=[aeiou])(?=\1)/g

The regex engine checks both (?=[aeiou]) and (?=\1) at the position right after _.

/(?=.*#).*/g

Negative

/_(?![aeiou])/g

/^(?!.*#).*$/g

Without the anchors, the parts of each example that do not contain # would be matched.

Negative lookaheads are often used to prevent matching particular phrases.

/foo(?!bar)/g

/-(?:(?!-).)*/g
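Lookaheads can be observed in JavaScript by replacing the matched text: only the underscore itself is replaced, never the character that the lookahead inspects. The inputs below are assumptions made up for this sketch.

```javascript
// Hypothetical input: one underscore before a vowel, one before a consonant.
const ids = "a_e b_x";

// Positive lookahead: replace _ only when a vowel follows.
const beforeVowel = ids.replace(/_(?=[aeiou])/g, "-");

// Negative lookahead: replace _ only when no vowel follows.
const notBeforeVowel = ids.replace(/_(?![aeiou])/g, "-");

// The match itself is just the underscore; the vowel is not consumed.
const matchedText = ids.match(/_(?=[aeiou])/)[0];

// foo is matched unless it is immediately followed by bar.
const fooCount = "foobar foobaz foo".match(/foo(?!bar)/g).length;

console.log(beforeVowel);    // a-e b_x
console.log(notBeforeVowel); // a_e b-x
console.log(matchedText);    // _
console.log(fooCount);       // 2
```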

11.2 Examples

Password validation

/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[a-zA-Z]).{8,}$/

Zero-width assertions can be used to verify multiple conditions.

Quoted strings

/(['"])(?:(?!\1).)*\1/g

Without lookaheads, the best we can do is:

/(['"])[^'"]*\1/g
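Both examples can be exercised with a short JavaScript sketch. The passwords and the quoted sentence are assumptions invented for illustration.

```javascript
// Each lookahead checks one condition from the start of the string;
// .{8,} then enforces the minimum length.
const strongPassword = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[a-zA-Z]).{8,}$/;

console.log(strongPassword.test("Passw0rdOK")); // true
console.log(strongPassword.test("password1"));  // false (no uppercase letter)
console.log(strongPassword.test("Ab1"));        // false (too short)

// Hypothetical input: a double-quoted string containing an apostrophe.
const speech = `He said "it's fine" and 'ok'`;

// With the lookahead, only the opening quote character ends the string.
const withLookahead = speech.match(/(['"])(?:(?!\1).)*\1/g);

// Without it, neither quote character may appear inside the string.
const withoutLookahead = speech.match(/(['"])[^'"]*\1/g);

console.log(withLookahead);    // [ `"it's fine"`, "'ok'" ]
console.log(withoutLookahead); // [ "'ok'" ]
```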

12 Advanced examples

JavaScript comments

/\/\*[\s\S]*?\*\/|\/\/.*/g

[\s\S] is a trick for matching any character, including newlines. We avoid the dot-all modifier because we need the ordinary . for single-line comments.

24-hour time

/^([01]?[0-9]|2[0-3]):[0-5][0-9](:[0-5][0-9])?$/g

IP address

/\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b/g

Meta label

/ / gm

Replace:

Floating-point numbers

Optional sign

Optional integer part

Optional decimal part

Optional exponent part

/^([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?$/g

The positive lookahead (?=\.\d|\d) ensures that a lone . is not matched.
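The floating-point pattern can be spot-checked in JavaScript. This is only a sketch with a handful of made-up inputs, not an exhaustive test; the g flag is dropped because test() is called repeatedly on the same regex.

```javascript
const float = /^([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?$/;

// Hypothetical inputs that should be accepted.
const accepted = ["3.14", "-.5e10", "42", "+1.", "6E-3"].filter((s) => float.test(s));

// Hypothetical inputs that should be rejected: a lone dot, a lone sign,
// a dangling exponent, two decimal points, and the empty string.
const rejected = [".", "+", "1e", "1.2.3", ""].filter((s) => !float.test(s));

console.log(accepted.length); // 5
console.log(rejected.length); // 5
```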

HSL colors

An integer from 0 to 360

/^0*(?:360|3[0-5]\d|[12]?\d?\d)$/g

Percentage

/^(?:100(?:\.0+)?|\d?\d(?:\.\d+)?)%$/g

HSL and percentages

/^hsl\(\s*0*(?:360|3[0-5]\d|[12]?\d?\d)\s*(?:,\s*0*(?:100(?:\.0+)?|\d?\d(?:\.\d+)?)%\s*){2}\)$/gi

13 Next steps

If you want to learn more about regular expressions and how they work:

Awesome-regex

Regex tag on StackOverflow

StackOverflow RegEx FAQ

r/regex

RexEgg

Regular-Expressions.info

Regex Crossword

Regex Golf

This concludes the introduction to regular expressions. We hope it has answered your questions. Theory sticks best when combined with practice, so go and try the examples yourself.
