
Regular expressions in shell programming (1) basic regular expressions

2025-07-13 Update From: SLTechnology News&Howtos


Regular expression

Having learned the basics of shell scripting, you can write shell scripts that use conditionals, loops, and other statements. Next we introduce a very important concept: regular expressions (Regular Expression, RE).

Definition of regular expression

A regular expression is often abbreviated to regex, regexp, or RE in code. It uses a single string to describe and match a set of strings that conform to certain syntactic rules. Put simply, it is a way of matching strings: with a few special symbols, you can quickly find, delete, or replace specific strings.

A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text, acting as a template that is compared against the string being searched. Ordinary characters include uppercase and lowercase letters, digits, punctuation, and other symbols. Metacharacters are characters that carry a special meaning in a regular expression; for example, a metacharacter can specify how the character before it may occur in the target text.
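The difference between ordinary characters and metacharacters can be sketched with a small experiment. The two input lines below are made up for illustration, and the variable names are hypothetical; "-F" is the standard grep option that treats the pattern as a fixed string.

```shell
# Hypothetical two-line input, created only for this illustration.
lines=$(printf '%s\n' 'abc' 'a.c')

# As a regular expression, "." is a metacharacter that matches any single
# character, so the pattern a.c matches both "abc" and "a.c".
regex_count=$(printf '%s\n' "$lines" | grep -c 'a.c')

# With -F, grep treats the pattern as a fixed string of ordinary
# characters, so only the literal text "a.c" matches.
fixed_count=$(printf '%s\n' "$lines" | grep -Fc 'a.c')

echo "regex matches: $regex_count, fixed-string matches: $fixed_count"
```

The same pattern text therefore selects different lines depending on whether "." is read as a metacharacter or as an ordinary character.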

Regular expressions are commonly used in scripts and text editors. Many text-processing tools and programming languages support them, such as Perl and the common Linux text-processing tools mentioned earlier (grep, egrep, sed, awk). Their powerful text-matching capability makes it possible to process large volumes of text quickly and efficiently.

Regular expression usage

Ordinary computer users rarely need regular expressions, so they seldom get to appreciate their power. For system administrators, however, regular expressions are an essential skill.

A running system generates a large amount of information, some of it very important and some merely informational. A system administrator who reads all of this data directly cannot quickly locate the important items, such as "user account login failure" or "service startup failure". Regular expressions make it possible to extract the "problematic" entries quickly, which makes operation and maintenance work simpler and more convenient.

Many software packages also support regular expressions, most commonly mail servers. On the Internet, spam and advertising messages often cause network congestion; if these problematic emails are rejected in advance on the server side, clients avoid a great deal of unnecessary bandwidth consumption. The commonly used mail server postfix, along with related analysis software for mail servers, supports regular expression matching: the subject and content of a message are compared against special strings, and the message is filtered out when a match is found.

Beyond mail servers, much other server software supports regular expressions. Even so, the string-matching rules still have to be written by the system administrator, which is another reason administrators must master regular expressions.

Basic regular expression

Regular expression syntax is divided into basic regular expressions and extended regular expressions, which differ in strictness and features. Basic regular expressions are the most fundamental part of the commonly used syntax. Among the common file-processing tools on Linux, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions. To master basic regular expressions, you must first understand the meaning of their metacharacters, which are introduced one by one below using the grep command.

Basic regular expression examples:

Finding a specific string is simple. For example, the following commands find the lines of test.txt that contain the string "the". The "-n" option displays line numbers, and "-i" makes the match case-insensitive. After the command runs, the matching characters are shown in red in the terminal (shown in bold throughout this chapter).

Find specific characters

[root@localhost ~]# grep -n 'the' test.txt

[root@localhost ~]# grep -in 'the' test.txt

Inverted selection, such as finding lines that do not contain "the", is done with the "-vn" options of the grep command.

[root@localhost ~]# grep -vn 'the' test.txt
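The three options can be compared side by side in a minimal sketch. The article's real test.txt is not shown, so the four-line sample file below is an assumption made only for this example, as are the variable names.

```shell
# Hypothetical sample file; the article's real test.txt is not shown.
sample=$(mktemp)
cat > "$sample" <<'EOF'
he was short and fat.
The shirt is blue.
I like the weather.
google is the best tools for search keyword.
EOF

# -n prefixes each matching line with its line number.
grep -n 'the' "$sample"
# -i also ignores case, so "The" on line 2 now matches.
grep -in 'the' "$sample"
# -v inverts the match: only lines WITHOUT "the" are printed.
grep -vn 'the' "$sample"

# -c counts matching lines instead of printing them.
with_the=$(grep -c 'the' "$sample")
with_the_nocase=$(grep -ic 'the' "$sample")
without_the=$(grep -vc 'the' "$sample")
echo "the: $with_the, the (any case): $with_the_nocase, not the: $without_the"

rm -f "$sample"
```

With this sample, the case-sensitive search skips line 2 because "The" starts with an uppercase letter, while "-i" picks it up.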

Use brackets "[]" to find a set of characters

The strings "shirt" and "short" both begin with "sh" and end with "rt", so the following command finds both at once. No matter how many characters appear inside "[]", the set matches exactly one character; that is, "[io]" matches "i" or "o".

[root@localhost ~]# grep -n 'sh[io]rt' test.txt
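The rule that a bracket set stands for exactly one character can be checked with a short sketch. The sample lines, including the deliberately misspelled "shiort", are hypothetical and exist only for this example.

```shell
# Hypothetical sample file for the bracket-set example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
he was short and fat.
He was wearing a blue polo shirt.
a sharp knife
shiort is a typo
EOF

# [io] matches ONE character that is either "i" or "o":
#   line 1 "short"  -> match
#   line 2 "shirt"  -> match
#   line 3 "sharp"  -> no match ("a" is not in the set)
#   line 4 "shiort" -> no match (two characters between "sh" and "rt")
grep -n 'sh[io]rt' "$sample"

set_matches=$(grep -c 'sh[io]rt' "$sample")
first_match=$(grep -n 'sh[io]rt' "$sample" | head -n 1)

rm -f "$sample"
```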

To find the repeated character "oo", run the following command.

[root@localhost ~]# grep -n 'oo' test.txt

To find "oo" that is not preceded by "w", use the negated set "[^]". For example, "grep -n '[^w]oo' test.txt" finds strings in test.txt in which the character before "oo" is not "w".

[root@localhost ~]# grep -n '[^w]oo' test.txt

The output of this command shows that "woood" and "wooooood" also match, even though both contain "w". The matched characters are shown in bold, and in "# woood #" the bold part is "ooo": the first "o" is the character before "oo", and since it is not "w", the rule is satisfied. "# woooooood #" matches in the same way.
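Why "woood" matches while "wood" does not can be seen by printing only the matched text. This sketch assumes a grep that supports the "-o" option (GNU and BSD grep both do, although it is not required by POSIX); the variable names are hypothetical.

```shell
# Print only the part of each input that the pattern matched.
# "|| true" keeps the script going when grep finds nothing (exit status 1).
m_woood=$(printf '%s\n' 'woood' | grep -o '[^w]oo' || true)
m_wood=$(printf '%s\n' 'wood' | grep -o '[^w]oo' || true)
m_food=$(printf '%s\n' 'food' | grep -o '[^w]oo' || true)

# "woood": the first "o" plays the role of [^w], followed by "oo" -> "ooo".
# "wood":  the only character before "oo" is "w", so nothing matches.
# "food":  "f" is not "w", so "foo" matches.
echo "woood -> '$m_woood'"
echo "wood  -> '$m_wood'"
echo "food  -> '$m_food'"
```

Note that "[^w]" must consume a real character, so an "oo" at the very beginning of a line does not match this pattern either.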

If you do not want a lowercase letter in front of "oo", use the command "grep -n '[^a-z]oo' test.txt", where "a-z" stands for the lowercase letters; uppercase letters are written "A-Z".

[root@localhost ~]# grep -n '[^a-z]oo' test.txt

[root@localhost ~]# grep -n '[^a-zA-Z]oo' test.txt

The second command filters out matches in which "oo" is preceded by any letter, a-z or A-Z.
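The two negated ranges can be compared in a small sketch. The three sample lines are hypothetical, and setting LC_ALL=C is an assumption added here so that "a-z" means exactly the ASCII lowercase letters regardless of the system locale.

```shell
# Hypothetical sample file for the negated-range example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Foo bar
foo bar
3oo units
EOF

# [^a-z]oo: "oo" preceded by anything except a lowercase letter.
# Line 1 ("F") and line 3 ("3") match; line 2 ("f") does not.
LC_ALL=C grep -n '[^a-z]oo' "$sample"

# [^a-zA-Z]oo: "oo" preceded by anything except a letter.
# Only line 3 ("3") still matches.
LC_ALL=C grep -n '[^a-zA-Z]oo' "$sample"

not_lower=$(LC_ALL=C grep -c '[^a-z]oo' "$sample")
not_letter=$(LC_ALL=C grep -c '[^a-zA-Z]oo' "$sample")

rm -f "$sample"
```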

Lines that contain digits can be found with the command "grep -n '[0-9]' test.txt".

[root@localhost ~]# grep -n '[0-9]' test.txt

Find the line-start character "^" and the line-end character "$"

Basic regular expressions include two anchor metacharacters: "^" (the beginning of a line) and "$" (the end of a line). The earlier query for "the" returned many lines containing that string; to return only lines that begin with "the", use the "^" metacharacter. Note that "^" means the beginning of a line only outside "[]"; inside "[]" it negates the set.

[root@localhost ~]# grep -n '^the' test.txt

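To query lines that end with a period ".", anchor the pattern with "$". Because "." is itself a metacharacter, escape it with "\" so that it is treated as an ordinary character.

[root@localhost ~]# grep -n '\.$' test.txt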

To query blank lines, run the command "grep -n '^$' test.txt".

[root@localhost ~]# grep -n '^$' test.txt
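The anchors can be tried together in a minimal sketch. The four-line sample file (the third line is empty) and the helper function line_numbers are hypothetical, introduced only for this example.

```shell
# Hypothetical sample file; line 3 is intentionally empty.
sample=$(mktemp)
cat > "$sample" <<'EOF'
the cat sat.
I saw the cat.

the end
EOF

# Hypothetical helper: print the numbers of the matching lines on one row.
line_numbers() {
    grep -n "$1" "$sample" | cut -d: -f1 | paste -sd ' ' -
}

starts_with_the=$(line_numbers '^the')   # "the" at the beginning of a line
ends_with_dot=$(line_numbers '\.$')      # a literal "." at the end of a line
blank_lines=$(line_numbers '^$')         # start immediately followed by end

echo "^the -> $starts_with_the"
echo "\\.\$  -> $ends_with_dot"
echo "^\$   -> $blank_lines"

rm -f "$sample"
```

Line 2 contains "the" but does not start with it, and line 4 starts with "the" but has no trailing period, so each anchor selects a different pair of lines.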

Find any single character "." and the repeating character "*"

In a regular expression, "." matches any single character, so the following command finds four-character strings that begin with "w" and end with "d".

[root@localhost ~]# grep -n 'w..d' test.txt

"*" means repeating the preceding single character zero or more times. "o*" means zero "o" characters (that is, the empty string) or one or more "o" characters. Because the empty string is allowed, the command "grep -n 'o*' test.txt" prints every line in the text. With "oo*", the first "o" must be present and the second "o" may occur zero or more times, so any text containing o, oo, ooo, oooo, and so on matches. By the same token, to query strings containing two or more "o" characters, run the command "grep -n 'ooo*' test.txt".

[root@localhost ~]# grep -n 'ooo*' test.txt

To query strings that begin with "w" and end with "d", with at least one "o" in between, run the following command.

[root@localhost ~]# grep -n 'woo*d' test.txt

To query strings that begin with "w" and end with "d", with any characters (or none at all) in between:

[root@localhost ~]# grep -n 'w.*d' test.txt
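The effect of "." and "*" is easiest to see by counting matches against a few short words. The five-line sample file and the helper function count are hypothetical, made up for this sketch.

```shell
# Hypothetical sample file: one short word per line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
wd
wod
wood
woood
w12d
EOF

# Hypothetical helper: number of lines in the sample matching a pattern.
count() {
    grep -c "$1" "$sample"
}

c_dots=$(count 'w..d')    # exactly two characters between w and d: wood, w12d
c_star=$(count 'o*')      # zero or more o: the empty match fits every line
c_one=$(count 'oo*')      # at least one o: wod, wood, woood
c_two=$(count 'ooo*')     # at least two o: wood, woood
c_wood=$(count 'woo*d')   # w, one or more o, d: wod, wood, woood
c_any=$(count 'w.*d')     # w, anything or nothing, d: all five lines

echo "w..d=$c_dots o*=$c_star oo*=$c_one ooo*=$c_two woo*d=$c_wood w.*d=$c_any"

rm -f "$sample"
```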

Query lines that contain a number of any length:

[root@localhost ~]# grep -n '[0-9][0-9]*' test.txt
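The pattern '[0-9][0-9]*' selects the same lines as '[0-9]', but each match covers a whole run of digits. A small sketch, assuming a grep that supports "-o" (GNU and BSD grep do) and using a made-up input line:

```shell
# Hypothetical input line containing a two-digit and a one-digit number.
text='room 42, floor 7'

# One digit followed by zero or more digits: each match is a whole number.
whole_numbers=$(printf '%s\n' "$text" | grep -o '[0-9][0-9]*' | paste -sd ' ' -)

# A single digit: each match is one character.
single_digits=$(printf '%s\n' "$text" | grep -o '[0-9]' | paste -sd ' ' -)

echo "[0-9][0-9]* -> $whole_numbers"
echo "[0-9]       -> $single_digits"
```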

Find a range of consecutive repetitions "{}"

The examples above use "." and "*" to match zero or more repetitions of a character. What if you want to limit the number of repetitions to a range? For example, to find three to five consecutive "o" characters, use the interval expression "{}" of basic regular expressions. In a basic regular expression an unescaped "{" or "}" is an ordinary character, so each brace must be preceded by the escape character "\" to give it its interval meaning. The "{}" expression is used as follows.

Query for two consecutive "o" characters:

[root@localhost ~]# grep -n 'o\{2\}' test.txt

Query strings that begin with "w" and end with "d", with two to five "o" characters in between:

[root@localhost ~]# grep -n 'wo\{2,5\}d' test.txt

Query strings that begin with "w" and end with "d", with two or more "o" characters in between:

[root@localhost ~]# grep -n 'wo\{2,\}d' test.txt
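The interval forms can be compared in one sketch. The sample words, the literal "o{2}" line, and the helper function count are hypothetical and exist only to make the differences visible.

```shell
# Hypothetical sample file; the last line contains literal braces.
sample=$(mktemp)
cat > "$sample" <<'EOF'
wod
wood
woood
wooooood
o{2}
EOF

# Hypothetical helper: number of lines in the sample matching a pattern.
count() {
    grep -c "$1" "$sample"
}

c_exact=$(count 'o\{2\}')       # "oo" somewhere: wood, woood, wooooood
c_range=$(count 'wo\{2,5\}d')   # 2 to 5 o between w and d: wood, woood
c_open=$(count 'wo\{2,\}d')     # 2 or more o between w and d: adds wooooood
c_literal=$(count 'o{2}')       # unescaped braces are literal in basic regex

echo "o\\{2\\}=$c_exact wo\\{2,5\\}d=$c_range wo\\{2,\\}d=$c_open o{2}=$c_literal"

rm -f "$sample"
```

The word with six "o" characters fails the 2-to-5 range because the character after the fifth "o" is another "o" rather than "d".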

Metacharacter summary

Extended regular expression

Basic regular expressions are usually sufficient, but sometimes the broader extended regular expressions are needed to simplify a command. For example, to print all lines of a file except blank lines and lines beginning with "#" (commonly used to view the effective settings in a configuration file) using basic regular expressions, run "grep -v '^$' test.txt | grep -v '^#'", which requires a pipe and two searches. With an extended regular expression this simplifies to "egrep -v '^$|^#' test.txt", where the "|" inside the single quotation marks means "or".
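A minimal sketch of the two approaches, run against a made-up configuration file. The file contents and variable names are hypothetical. The sketch uses "grep -E", which is the same as egrep; recent GNU grep releases print a warning when egrep itself is invoked.

```shell
# Hypothetical configuration file with comments and a blank line.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# listen settings

port=8080
# host settings
host=localhost
EOF

# Basic regular expressions: two searches joined by a pipe.
two_step=$(grep -v '^$' "$conf" | grep -v '^#')

# Extended regular expression: one search, with "|" meaning "or".
one_step=$(grep -Ev '^$|^#' "$conf")

echo "$one_step"

rm -f "$conf"
```

Both commands leave only the two effective settings, "port=8080" and "host=localhost".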

In addition, the grep command uses basic regular expressions by default; to use extended regular expressions, use the egrep command (equivalent to "grep -E") or awk. The awk command is explained in a later section, so here we use egrep directly. egrep is used in much the same way as grep: it searches one or more files for a pattern, and the pattern can be a single character, a string, a word, or a sentence.

Like basic regular expressions, extended regular expressions also contain multiple metacharacters. The common extended regular expression metacharacters mainly include the following:
