How to use regular expressions in Shell

2025-07-04 Update From: SLTechnology News&Howtos shulou

Shulou (Shulou.com) 05/31 Report

This article introduces how to use regular expressions in Shell. It is fairly detailed and can serve as a reference for interested readers.

Regular expression

Regular expression: a tool for describing and finding strings that conform to certain complex rules, used when writing programs or web pages that process strings. In other words, a regular expression is code that records text rules.

Regular expressions, like wildcards, are a tool for text matching, but they can describe matching requirements more precisely. Common tools that support regular expressions include: the grep family, which matches lines of text; the sed stream editor, which changes the input stream; string-processing languages such as awk, python, perl, and Tcl; file viewers or pagers such as more, page, and less; and text editors such as ed, vi, emacs, and vim.

Some regular expression flavors (for example Perl and Python, in their extended or verbose mode) allow comments to be embedded inside a regular expression, so that a pattern can document itself. POSIX BRE and ERE, which grep uses, have no comment syntax.

\b is a special code of a regular expression (a metacharacter) that represents the beginning or end of a word, that is, a word boundary. English words are usually separated by spaces, punctuation, or line breaks, but \b does not match any of these characters. It matches only a position: one where the character before and the character after are not both \w (one is, and the other is not or does not exist). "." is another metacharacter that matches any character except a newline, and "*" specifies that the item before it can be repeated any number of times in a row, so ".*" means any number of characters other than line breaks. The "\d" metacharacter matches one digit (0, 1, 2, ...), and \d{n} means that "\d" must be repeated n times in a row. Note that \d is a Perl-style shorthand that POSIX BRE and ERE do not define; with grep, write [0-9] or [[:digit:]] instead.
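As a rough illustration of the metacharacters described above, the following sketch counts matching lines with grep. It assumes GNU grep (where \b is available as an extension), and the sample lines and variable names are hypothetical. Since grep's BRE has no \d, the digit example uses [0-9]\{3\} as the equivalent of \d{3}.

```shell
# Hypothetical sample input, one item per line.
sample='cat
concatenate
tel 555
id 12345'

# \bcat\b: "cat" as a whole word. \b is a GNU extension, not POSIX.
whole_word=$(printf '%s\n' "$sample" | grep -c '\bcat\b')

# [0-9]\{3\}: a digit repeated three times in a row (BRE spelling of \d{3}).
three_digits=$(printf '%s\n' "$sample" | grep -c '[0-9]\{3\}')

# c.*t: "c", then any run of characters, then "t".
c_to_t=$(printf '%s\n' "$sample" | grep -c 'c.*t')

echo "lines with the word cat:   $whole_word"
echo "lines with three digits:   $three_digits"
echo "lines matching c.*t:       $c_to_t"
```

Here only the line "cat" contains cat as a whole word, while "concatenate" still matches c.*t because the pattern does not care about word boundaries.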

Three kinds of grep have emerged in history, all of which can be used to match text: grep is the earliest text matching program and uses the basic regular expressions (Basic Regular Expression, BRE) supported by POSIX; egrep is extended grep and uses extended regular expressions (Extended Regular Expression, ERE); and fgrep is fast grep, which matches fixed strings rather than regular expressions. In the POSIX (Portable Operating System Interface) standard released in 1992, the three versions were merged into one. fgrep and egrep can still be used on most UNIX/Linux systems, but are marked as deprecated (not recommended).

At the most basic level, a regular expression is built from two kinds of characters: metacharacters (special characters) and ordinary characters. Ordinary characters are characters that have no special meaning, while metacharacters are given a special meaning.

The metacharacters supported by BRE and ERE in the POSIX standard partly overlap and partly differ. The GNU version of grep used by Linux is more powerful: the -G option selects BRE (the default), while the -E and -F options provide the egrep and fgrep features.
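The difference between the three modes can be seen by running similar patterns under each option. This is a minimal sketch assuming GNU grep; the input lines and variable names are made up for illustration.

```shell
# Hypothetical input: three short lines.
input='a+b
aab
a.b'

# -G (the default): BRE, where "+" is an ordinary character.
bre=$(printf '%s\n' "$input" | grep -G 'a+b')

# -E: ERE, where "+" means "one or more of the preceding item".
ere=$(printf '%s\n' "$input" | grep -E 'a+b')

# -F: fixed string, so "." is a literal period and not "any character".
fixed=$(printf '%s\n' "$input" | grep -F 'a.b')

echo "BRE   a+b -> $bre"
echo "ERE   a+b -> $ere"
echo "fixed a.b -> $fixed"
```

The same text a+b selects the literal line a+b in BRE but the line aab in ERE, which is why it matters which mode a pattern was written for.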

grep under Linux supports POSIX's special character classes in every mode except fgrep. A character class (POSIX character class) is a name enclosed in '[:' and ':]' and must be placed inside [] to become a regular expression; for example, [[:alnum:]] is equivalent to [A-Za-z0-9] (in the C locale). A collating symbol is a multi-character sequence enclosed in '[.' and '.]' that is treated as a single element; for example, [.cn.] represents the character sequence cn as one unit, in a locale that defines it. An equivalence class represents a family of characters that should be considered equivalent, enclosed in '[=' and '=]'. Regular expressions allow POSIX character classes to be mixed with other characters in a bracket expression; for example, [[:alpha:]!] matches any English letter or an exclamation point.
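A short sketch of character classes inside bracket expressions follows. It assumes grep run in the C locale (forced here with LC_ALL=C so that "letter" means A-Z and a-z); the sample lines are hypothetical.

```shell
# Hypothetical input lines.
input='hello
Hi!
12345
42
_'

# [[:alpha:]!]: any line containing a letter or an exclamation point.
letter_or_bang=$(printf '%s\n' "$input" | LC_ALL=C grep -c '[[:alpha:]!]')

# ^[[:alnum:]]*$: lines made up only of letters and digits.
alnum_only=$(printf '%s\n' "$input" | LC_ALL=C grep -c '^[[:alnum:]]*$')

echo "lines with a letter or !:  $letter_or_bang"
echo "purely alphanumeric lines: $alnum_only"
```

Note that the class name always sits inside an outer pair of brackets: [[:alnum:]] is a bracket expression containing the class, whereas [:alnum:] on its own is just a bracket expression listing the characters ":", "a", "l", "n", "u", and "m".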

In the figure above, the regular expression [[:digit:]_]+ is used. It matches one or more "digit or underscore" characters, and the -E option is needed so that grep uses ERE.
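The [[:digit:]_]+ expression can be tried out as follows. This is only a sketch: the input line is invented, and the -o option (print each match on its own line) is a GNU/BSD grep extension rather than part of POSIX.

```shell
# Hypothetical input line mixing letters, digits and underscores.
line='id_2024 rev 7 tag __1__'

# -E enables ERE so that "+" means "one or more";
# -o prints every matched substring on a separate line.
tokens=$(printf '%s\n' "$line" | LC_ALL=C grep -E -o '[[:digit:]_]+')

printf '%s\n' "$tokens"
```

Each run of digits and underscores is reported as one match, so the letters and spaces in the line act as separators.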

There are four ways to match a single character: an ordinary character, an escaped metacharacter, the dot (.) metacharacter, and a square bracket expression. Ordinary characters are the characters not listed in Table 4-1, including letters and digits, white space characters, and punctuation characters. Ordinary characters match themselves. For example, the regular expression china matches the word china but not China; to match both, use a square bracket expression. Table 4-1 lists some metacharacters, which carry special meanings. Because a metacharacter cannot represent itself, an escape character (the backslash) is placed in front of it when the literal character is needed; the backslash itself is not matched. The dot matches any single character; it is rarely used alone and is usually combined with other metacharacters to match multiple characters. A square bracket expression (bracket expression) matches any one of a set of characters: for example, [cC]hina matches only china and China, and [^abc] matches any character except a, b, and c. Inside a square bracket expression, all other metacharacters lose their special meaning, so [\.] matches either a backslash or a period, not only a period.
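The four ways of matching a single character can be compared side by side. The sketch below assumes a POSIX-style grep and uses hypothetical sample lines; each variable holds the number of matching lines.

```shell
# Hypothetical input; the fourth line contains a literal backslash.
input='china
China
a.b
a\b
axb'

# Ordinary characters match themselves (case-sensitive).
ordinary=$(printf '%s\n' "$input" | grep -c 'china')

# Bracket expression: either c or C.
bracket=$(printf '%s\n' "$input" | grep -c '[cC]hina')

# Escaped metacharacter: only a literal period between a and b.
escaped_dot=$(printf '%s\n' "$input" | grep -c 'a\.b')

# Dot metacharacter: any single character between a and b.
any_char=$(printf '%s\n' "$input" | grep -c 'a.b')

# Inside brackets the backslash is literal: backslash or period.
slash_or_dot=$(printf '%s\n' "$input" | grep -c 'a[\.]b')

echo "china: $ordinary  [cC]hina: $bracket  a\\.b: $escaped_dot"
echo "a.b: $any_char  a[\\.]b: $slash_or_dot"
```

The last two counts differ because a.b accepts axb as well, while a[\.]b accepts only the lines with a period or a backslash in the middle.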

In basic regular expressions, the easiest way to match multiple characters is to concatenate them. However, this method has many limitations, and modifier metacharacters provide more flexible matching. The asterisk (*) metacharacter matches zero or more occurrences of the single character before it. Interval expressions specify how many times a character repeats: ab\{3\}c matches a and c with b repeated exactly three times between them, ab\{3,\}c matches b repeated at least three times, and ab\{3,5\}c matches b repeated three to five times. ERE is similar to BRE when matching multiple characters but supports more operators. Interval expressions in ERE do not need escape characters; there, \{ and \} represent only the curly braces themselves. In ERE, ? matches zero or one occurrence of the preceding regular expression and + matches one or more occurrences. For example, ab?c matches only ac and abc, while ab+c matches abc, abbc, abbbc, and so on, but does not match ac.
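The repetition operators above can be checked against a small list of strings. This is a sketch with made-up input, where the lines contain 0, 1, 2, 3, 4, and 6 letters b between a and c; each variable holds a count of matching lines.

```shell
# Hypothetical input: a, then 0/1/2/3/4/6 times b, then c.
input='ac
abc
abbc
abbbc
abbbbc
abbbbbbc'

# BRE interval expressions need backslashes before the braces.
exactly_3=$(printf '%s\n' "$input" | grep -c 'ab\{3\}c')
at_least_3=$(printf '%s\n' "$input" | grep -c 'ab\{3,\}c')
from_3_to_5=$(printf '%s\n' "$input" | grep -c 'ab\{3,5\}c')

# ERE: braces are unescaped, and ? and + are available.
ere_exactly_3=$(printf '%s\n' "$input" | grep -E -c 'ab{3}c')
zero_or_one=$(printf '%s\n' "$input" | grep -E -c 'ab?c')
one_or_more=$(printf '%s\n' "$input" | grep -E -c 'ab+c')

echo "b{3}: $exactly_3  b{3,}: $at_least_3  b{3,5}: $from_3_to_5"
echo "ERE b{3}: $ere_exactly_3  b?: $zero_or_one  b+: $one_or_more"
```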

Anchor characters (^ and $) match the beginning and end of a string. If ^ and $ are used together, the regular expression between them must match the entire string or line, while ^$ matches an empty string or blank line. In BRE, anchors are metacharacters only at the beginning and end of a regular expression; anchor characters in the middle of a regular expression only represent themselves. In ERE, anchor characters are always metacharacters: an anchor inside a regular expression keeps its meaning, so the expression cannot match any string. For example, ABC^def matches the string "ABC^def" in BRE, but never matches anything in ERE.
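The anchor behavior can be sketched as follows, assuming GNU grep and an invented four-line input whose second line is blank. Because grep exits with a non-zero status when nothing matches, the ERE case appends "|| true" so the script keeps running under "set -e".

```shell
# Hypothetical input; the second line is intentionally empty.
input='abc

xabc
ABC^def'

# ^abc$: the whole line must be exactly abc.
whole_line=$(printf '%s\n' "$input" | grep -c '^abc$')

# ^$: blank lines only.
blank=$(printf '%s\n' "$input" | grep -c '^$')

# BRE: ^ in the middle of the pattern is an ordinary character.
bre_middle=$(printf '%s\n' "$input" | grep -c 'ABC^def')

# ERE: ^ is always an anchor, so this pattern can never match.
ere_middle=$(printf '%s\n' "$input" | grep -E -c 'ABC^def' || true)

echo "^abc\$: $whole_line  ^\$: $blank"
echo "BRE ABC^def: $bre_middle  ERE ABC^def: $ere_middle"
```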

Operator precedence means that when different meta characters appear at the same time, the high priority meta characters will be processed before the lower priority ones.

BRE provides a mechanism called a backreference (backward reference) to match again the text selected by an earlier part of the regular expression. '\(' and '\)' enclose the parts you want to reference later, and \1 to \9 refer to those parts in order. For example, \(ab\)\(cd\)[efg]*\1\2 matches abcdabcd, abcdeabcd, abcdfabcd, abcdgabcd, and so on, and \(go\).*\1 matches a line that contains go twice, with any characters in between.
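The two backreference patterns above can be exercised with grep in its default BRE mode. The input lines below are hypothetical and were chosen so that some lines satisfy the repeated part and some do not.

```shell
# Hypothetical input lines.
input='abcdabcd
abcdeabcd
abcdcdab
go home and go back
go home'

# \1 must repeat "ab" and \2 must repeat "cd", in that order.
repeated_pair=$(printf '%s\n' "$input" | grep -c '\(ab\)\(cd\)[efg]*\1\2')

# \1 must repeat "go" somewhere later on the same line.
go_twice=$(printf '%s\n' "$input" | grep '\(go\).*\1')

echo "lines repeating ab and cd: $repeated_pair"
echo "line with go twice: $go_twice"
```

The line abcdcdab is rejected because the backreferences demand abcd again, not merely the same letters in another order.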

Alternation is a feature unique to ERE. A square bracket expression lets you "match this character or that character", but not "this character sequence or that character sequence". Alternation separates different sequences with pipe symbols; for example, you|me matches you or me. Any number of pipe symbols can be used in one regular expression to provide multiple choices. Because the pipe has the lowest priority, each alternative extends to the next pipe symbol or to the end of the regular expression.
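The low priority of the pipe symbol is easiest to see by comparing an ungrouped alternation with a grouped one. The following sketch uses grep -E and invented input lines.

```shell
# Hypothetical input lines.
input='you
me
yome
them'

# you|me is read as "you" or "me": the pipe splits the whole pattern.
either=$(printf '%s\n' "$input" | grep -E -c 'you|me')

# With parentheses the choice is limited to one position: youe or yome.
grouped=$(printf '%s\n' "$input" | grep -E 'yo(u|m)e')

echo "lines containing you or me: $either"
echo "line matching yo(u|m)e:     $grouped"
```

The line yome is counted by the first pattern only because it happens to contain me; the line them contains neither sequence and is skipped by both.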

In BRE, some metacharacters modify how many times the preceding item repeats, but they apply only to a single character. Grouping in ERE lets a metacharacter modify a whole preceding substring, which is enclosed in '()'; for example, (go)+ matches one or more consecutive go. Grouping is also useful with alternation; for example, (Lily|Lucy) limits the alternation to Lily or Lucy.
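A small sketch of grouping with grep -E follows. The names and the surname Smith are hypothetical sample data, and the patterns are anchored with ^ and $ so that each line must match as a whole.

```shell
# Hypothetical input lines.
input='gogogo
gogo
go
goo
Lily Smith
Lucy Smith
Lisa Smith'

# (go)+ repeats the whole substring "go".
grouped=$(printf '%s\n' "$input" | grep -E -c '^(go)+$')

# go+ repeats only the single character "o".
ungrouped=$(printf '%s\n' "$input" | grep -E -c '^go+$')

# The group limits the alternation to the first name.
names=$(printf '%s\n' "$input" | grep -E -c '^(Lily|Lucy) Smith$')

echo "(go)+: $grouped  go+: $ungrouped  (Lily|Lucy) Smith: $names"
```

Without the parentheses, ^Lily|Lucy Smith$ would instead mean "a line starting with Lily, or a line ending with Lucy Smith".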

That is all for the article "How to use regular expressions in Shell". Thank you for reading, and we hope the content is helpful.
