[Shell Foundation] 02, Regular Expressions

2025-01-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

I. Wildcard characters

Before discussing regular expressions, it helps to review wildcards, because many people confuse the two.

A wildcard is a special pattern character, mainly * and ? (along with [], {}, ^ and !), used for fuzzy matching of file names. It stands in for one or more real characters, which is especially useful when the full file name is not known and you want to match every file that fits a pattern.

* matches any number of arbitrary characters, including zero

? matches exactly one arbitrary character

[] matches any single character listed inside the brackets; it is used the same way in wildcards and regular expressions

[123] matches any one of the three characters 1, 2 and 3.

[1,2,3] matches any one of four characters: 1, 2, 3 and the comma.

^ and ! are often used inside [] to negate the set: [^A] matches any single character that is not A.
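The wildcard rules above can be tried out without touching the file system. The following is a minimal sketch using Python's standard fnmatch module as a stand-in for shell globbing; the file names are hypothetical, and the helper glob_filter is introduced only for this illustration. fnmatch documents only the [!seq] form for negation, so that form is used here rather than [^seq].

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
names = ["log1.txt", "log2.txt", "log,.txt", "logA.txt", "log12.txt", "log.txt"]


def glob_filter(pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]


star = glob_filter("log*.txt")              # * matches zero or more characters
one = glob_filter("log?.txt")               # ? matches exactly one character
digits = glob_filter("log[123].txt")        # one of the characters 1, 2, 3
with_comma = glob_filter("log[1,2,3].txt")  # 1, 2, 3 or a literal comma
not_a = glob_filter("log[!A].txt")          # any single character except A

for label, result in [("*", star), ("?", one), ("[123]", digits),
                      ("[1,2,3]", with_comma), ("[!A]", not_a)]:
    print(label, result)
```

Note that fnmatch approximates shell behavior; details such as hidden files and path separators are handled differently by a real shell.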

Note:

Wildcards (globbing) are used to match file names; a pattern can specify both individual characters and ranges.

Regular expressions are used with text search tools to match the contents of a text file, usually one line at a time.
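To illustrate the line-by-line matching described in the note, here is a small sketch of a grep-like filter written with Python's re module. The function name grep and the sample text are assumptions made for this example; real search tools offer many more options.

```python
import re

# Hypothetical file contents, used only for illustration.
text = "root:x:0:0\nbin:x:1:1\nalice:x:1000:1000\n"


def grep(pattern, content):
    """Return the lines of content that contain a match for pattern."""
    regex = re.compile(pattern)
    # The pattern is tested against each line separately, as grep does.
    return [line for line in content.splitlines() if regex.search(line)]


starts_with_b = grep(r"^b", text)      # lines beginning with "b"
ends_with_1000 = grep(r":1000$", text)  # lines ending with ":1000"

print(starts_with_b)
print(ends_with_1000)
```

Because each line is tested on its own, ^ and $ refer to the start and end of a line rather than of the whole file.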

II. Regular expressions

1. Overview

A regular expression is a text pattern made up of ordinary characters (for example, the letters a through z) and special characters (called metacharacters). A regular expression uses a single string to describe and match a whole set of strings that follow a given syntactic rule.

In 1956, the mathematician Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper entitled "Representation of Events in Nerve Nets and Finite Automata", which introduced the concept of regular expressions. Regular expressions were the notation he used to describe what he called the "algebra of regular sets", hence the term "regular expression".

Later, this work was applied to early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was the qed editor on Unix.

As they say, the rest is well-known history. Regular expressions have been an important part of text-based editors and search tools since then.

2. Regular expression classification

The most familiar regular expression notation actually comes from Perl. Perl gave rise to a prominent flavor called PCRE (Perl Compatible Regular Expressions), which is characterized by shorthand such as "\d", "\w" and "\s". Besides PCRE there are other flavors, such as the POSIX regular expressions introduced below.

The full name of POSIX is Portable Operating System Interface for uniX. It consists of a series of specifications that define the functions a UNIX operating system should support, so "POSIX regular expressions" simply means the part of the POSIX specification that covers regular expressions.

It defines two flavors: BRE (Basic Regular Expression) and ERE (Extended Regular Expression).

Regular expressions are now widely used in a great deal of software: operating systems such as *nix (Linux, Unix, etc.) and HP, development environments such as PHP, C# and Java, and many applications.

For example, Python provides the re module for Perl-style regular expression patterns.
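As a rough illustration of the difference in notation, the sketch below runs the same search twice with Python's re module: once with the Perl-style shorthand \d and once with the explicit bracket expression [0-9], which is the spelling that POSIX tools also accept. The sample string is made up for the example. Python's re does not implement POSIX named classes such as [:digit:], so those are not shown.

```python
import re

# Hypothetical sample text, used only for illustration.
sample = "order 66 shipped on 2025-01-19"

# Perl-style shorthand, as supported by Python's re module.
perl_style = re.findall(r"\d+", sample)

# The same idea written with an explicit bracket expression.
bracket_style = re.findall(r"[0-9]+", sample)

print(perl_style)
print(bracket_style)
```

For plain ASCII input like this, both patterns find the same runs of digits.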

3. Regular expression syntax

A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace a matching substring, or to extract from a string the substrings that meet a certain condition.

A regular expression is a text pattern that consists of ordinary characters (for example, the characters a through z and 0 through 9) and special characters (called metacharacters). The pattern describes one or more strings to match when searching text. The regular expression acts as a template that is matched against the string being searched.

Ordinary characters

Ordinary characters include all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks and some other symbols.

Metacharacters (special characters)

Character matching

Metacharacters are characters with special meanings, and many metacharacters require special treatment when trying to match them. To match these special characters, you must first "escape" the characters, that is, place the backslash character (\) before them. The following table lists the special characters in regular expressions:

Metacharacter    Description

$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.

( )    Mark the start and end positions of a subexpression (grouping). Subexpressions can be captured for later use. To match these characters, use \( and \).

*    Matches the preceding subexpression zero or more times. To match the * character, use \*.

+    Matches the preceding subexpression one or more times. To match the + character, use \+.

.    Matches any single character except the newline character \n. To match ., use \. .

[    Marks the beginning of a bracket expression. To match [, use \[.

?    Matches the preceding subexpression zero or one time, or indicates a non-greedy qualifier. To match the ? character, use \?.

\    Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', and '\n' matches the newline character. The sequence '\\' matches "\", while '\(' matches "(".

^    Matches the start position of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.

{    Marks the beginning of a qualifier expression. To match {, use \{.

|    Indicates a choice between two items, taken word by word rather than character by character. To match |, use \|.
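The escaping rule is easy to get wrong, so here is a brief sketch in Python's re module (which follows the Perl-style conventions listed in the table). The sample strings are invented for the example; re.escape is a standard helper that adds the backslashes for you.

```python
import re

# Hypothetical sample text, used only for illustration.
price = "Total: $3.50 (approx.)"

# Unescaped, "$" anchors at the end of the string and "." matches any
# character, so this pattern does not find the literal text "$3.50".
unescaped = re.search(r"$3.50", price)

# Escaped with backslashes, each metacharacter matches itself.
escaped = re.search(r"\$3\.50", price)

# re.escape adds the backslashes automatically for a literal string.
auto = re.search(re.escape("(approx.)"), price)

# Alternation chooses between the whole expressions on each side of "|".
alternatives = re.findall(r"C|cat", "Cat cat C")

print(unescaped, escaped.group(), auto.group(), alternatives)
```

The alternation result shows that C|cat means "C" or "cat", not "C or c, followed by at".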

[] matches any single character listed inside the brackets. For example, the answer to a single-choice question may be any one of the options A, B, C or D, which is written [ABCD] as a regular expression. For a larger range, use the "-" sign to specify the range, such as [a-z] for all lowercase letters. Always remember that inside brackets the "-" sign denotes a range and is not a character in its own right.

[^] matches any single character outside the specified range
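A short sketch of bracket expressions in Python's re module follows. The sample strings are made up; the last pattern shows one way to include a literal hyphen in a set by escaping it.

```python
import re

# One of the listed options: matches "C" but not "E".
answer = re.fullmatch(r"[ABCD]", "C")
not_answer = re.fullmatch(r"[ABCD]", "E")

# A range: every lowercase ASCII letter in the sample.
lower = re.findall(r"[a-z]", "Shell-2025")

# A negated set: every character that is not a letter or digit.
non_alnum = re.findall(r"[^a-zA-Z0-9]", "a-b_c 1")

# An escaped hyphen is a literal character rather than a range marker.
with_hyphen = re.findall(r"[a-z\-]+", "well-known Fact")

print(answer.group(), not_answer, lower, non_alnum, with_hyphen)
```

Each bracket expression consumes exactly one character; the + in the last pattern is what lets it match a run of them.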

Commonly used character sets:

[a-z] all lowercase letters; note that in file name wildcards this can match all letters, both uppercase and lowercase, whereas in regular expressions it matches only lowercase letters

[a-z], [0-9], [a-zA-Z0-9], [^a-zA-Z0-9], [a-z\-]

[:lower:], [:upper:], [:alpha:], [:digit:], [:alnum:]

[:punct:] all punctuation symbols

[:space:] all whitespace characters, including spaces, tabs, form feeds, etc.; equivalent to [ \f\n\r\t\v]; this does not include empty lines

\d: any digit

\D: any non-digit

\w: matches letters, digits and the underscore, equivalent to [[:alnum:]_]

\W: matches any character that is not a letter, digit or underscore, equivalent to [^[:alnum:]_]

\s: matches any whitespace character, including spaces, tabs, form feeds, etc.; equivalent to [ \f\n\r\t\v]; this does not include empty lines

\S: matches any non-whitespace character, equivalent to [^ \f\n\r\t\v]

The shorthands can also be used inside brackets: [\d], [\D], [\s], [\w]
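The shorthand classes can be compared side by side with a small sketch in Python's re module, which supports the Perl-style forms. The sample line is hypothetical. The POSIX named classes are not shown because Python's re does not implement them.

```python
import re

# Hypothetical log line, used only for illustration (note the tab).
line = "user_01\tlogged in at 09:30"

digits = re.findall(r"\d+", line)        # runs of digits
words = re.findall(r"\w+", line)         # runs of letters, digits, underscore
non_word = re.findall(r"\W", line)       # every single non-word character
fields = re.split(r"\s+", line)          # split on runs of whitespace
non_space = re.findall(r"\S+", line)     # runs of non-whitespace
in_brackets = re.findall(r"[\d:]+", line)  # shorthand inside a bracket set

print(digits, words, non_word, fields, non_space, in_brackets)
```

Here splitting on \s+ and collecting \S+ give the same list, since the line has no leading or trailing whitespace.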

Qualifiers

Repetition matching

Qualifiers specify how many times a given component of a regular expression must appear for the match to succeed. They are *, +, ?, {n}, {n,} and {n,m}.

The qualifiers for regular expressions are:

Character    Description

*    Matches the preceding subexpression zero or more times. For example, zo* can match "z" and "zoo". * is equivalent to {0,}.

+    Matches the preceding subexpression one or more times. For example, 'zo+' can match "zo" and "zoo", but not "z". + is equivalent to {1,}.

?    Matches the preceding subexpression zero or one time. For example, "do(es)?" can match "do" or "does". ? is equivalent to {0,1}.

{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but does match the two o's in "food".

{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but does match all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.

{n,m}    m and n are non-negative integers, where n
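The qualifier examples can be checked with a short sketch in Python's re module. The helper name full is an assumption made for this example; it simply reports whether a pattern matches an entire string. The final line uses the bounded form with illustrative values of 1 and 3.

```python
import re


def full(pattern, s):
    """Return True if pattern matches the whole string s."""
    return re.fullmatch(pattern, s) is not None


candidates = ("z", "zo", "zoo")
zo_star = [s for s in candidates if full(r"zo*", s)]   # zero or more o
zo_plus = [s for s in candidates if full(r"zo+", s)]   # one or more o
does = [s for s in ("do", "does", "doe") if full(r"do(es)?", s)]

o2_bob = re.search(r"o{2}", "Bob")                # only one o: no match
o2_food = re.search(r"o{2}", "food").group()      # exactly two o's
o2_more = re.search(r"o{2,}", "foooood").group()  # all consecutive o's
o1_3 = re.search(r"o{1,3}", "foooood").group()    # at most three o's

print(zo_star, zo_plus, does, o2_bob, o2_food, o2_more, o1_3)
```

The last two results differ because qualifiers are greedy: each takes as many repetitions as its upper bound allows.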
