2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Shell programming: regular expressions and file-processing tools

Skills covered:
- Basic regular expressions
- Extended regular expressions
- Using the sed tool
- Using the awk tool

Regular expressions

Having learned the basics of Shell scripting, you can already write scripts with conditional tests, loops and other statements. Next we introduce a very important concept: regular expressions (Regular Expression, RE).

1 Overview of regular expressions

Let's take a look at the definition and purpose of regular expressions.

1. Definition of regular expressions

In code, "regular expression" is often abbreviated to regex, regexp, or RE. A regular expression uses a single string to describe and match a whole set of strings that follow certain syntactic rules. Put simply, it is a way of matching strings: with a few special symbols you can quickly find, delete, or replace a specific string.

A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text; it acts as a template that is compared against the string being searched. Ordinary characters include uppercase and lowercase letters, digits, punctuation and other symbols. Metacharacters are characters that carry a special meaning in a regular expression; they can specify how the character immediately before them is allowed to occur in the target text.

Regular expressions are commonly used in scripts and text editors. Many text processors and programming languages support them, such as Perl and the common Linux text-processing tools (grep, egrep, sed, awk). Their powerful matching makes it possible to process large volumes of text quickly and efficiently.
2. Uses of regular expressions

Ordinary computer users rarely have occasion to use regular expressions and so seldom appreciate them, but for system administrators they are an essential skill. A running system produces a large amount of information. Some of it is important and some is merely informational. An administrator who reads all of it directly cannot quickly locate the important entries, such as "user account login failure" or "service startup failure". Regular expressions make it possible to extract the "problematic" entries quickly, which makes operation and maintenance work simpler and more convenient.

Many software packages also support regular expressions, the most common being mail servers. On the Internet, spam and advertising mail often cause network congestion; if such mail is rejected in advance on the server side, clients avoid a lot of unnecessary bandwidth consumption. The widely used mail server postfix, and the analysis software that works with it, support regular-expression matching: the subject and body of each message are compared against special strings, and a message is filtered out when a match is found.

Besides mail servers, many other server programs support regular expressions. The matching rules themselves still have to be written by the system administrator, which is another reason why regular expressions are a skill an administrator must master.

3. Basic regular expressions

According to their strictness and features, regular expressions are divided into basic regular expressions and extended regular expressions. Basic regular expressions are the most fundamental part of commonly used regular-expression syntax. Among the common file-processing tools on Linux systems, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions. To master basic regular expressions you must first understand the meaning of their metacharacters, which are described one by one below using the grep command.

Examples of basic regular expressions
The following operations require a test file named test.txt to be prepared in advance, as shown below.

[root@localhost ~]# cat test.txt
he was short and fat.

Find specific characters

Finding a specific string is very simple. For example, the following command finds where the string "the" occurs in test.txt. The "-n" option displays line numbers, and the "-i" option makes the match case-insensitive. After the command is executed, the text that matches is highlighted in red (shown in bold throughout this chapter).

[root@localhost ~]# grep -n 'the' test.txt

To invert the selection, for example to find lines that do not contain "the", use the "-vn" options of grep.

[root@localhost ~]# grep -vn 'the' test.txt

Use square brackets "[]" to find a set of characters

The strings "shirt" and "short" both begin with "sh" and end with "rt", so a single command can find both. No matter how many characters appear inside "[]", the brackets stand for exactly one character; that is, "[io]" matches "i" or "o".

[root@localhost ~]# grep -n 'sh[io]rt' test.txt

To find the repeated character "oo", execute the following command.

[root@localhost ~]# grep -n 'oo' test.txt

To find "oo" that is not preceded by "w", use the negated set "[^]". For example, "grep -n '[^w]oo' test.txt" looks in test.txt for "oo" preceded by any character other than "w".

[root@localhost ~]# grep -n '[^w]oo' test.txt

The results of this command also include "woood" and "wooooood", even though both contain "w". The highlighted text explains why: in "#woood #" the highlighted part is "ooo", so the "o" in front of the final "oo" is the non-"w" character that satisfies the rule. "#woooooood #" matches for the same reason.

If you do not want a lowercase letter in front of "oo", use "grep -n '[^a-z]oo' test.txt", where "a-z" represents the lowercase letters; uppercase letters are represented by "A-Z".

[root@localhost ~]# grep -n '[^a-z]oo' test.txt

Lines containing digits can be found with "grep -n '[0-9]' test.txt".

[root@localhost ~]# grep -n '[0-9]' test.txt
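The options and bracket sets above can be tried on a small file. The following is a minimal sketch, not the article's test.txt: the sample lines, the variable names, and the use of LC_ALL=C (so that "[a-z]" behaves byte-wise) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: the sample lines below are invented for illustration.
export LC_ALL=C                      # assumption: byte-wise ranges for [a-z]
sample=$(mktemp)
cat > "$sample" <<'EOF'
The shirt is short.
the food is good
a wooden bowl
Foo bar
room 12
EOF

the_lines=$(grep -n 'the' "$sample")          # case-sensitive, with line numbers
the_nocase=$(grep -in 'the' "$sample")        # -i also accepts "The"
no_the=$(grep -vn 'the' "$sample")            # -v keeps lines WITHOUT a match
shirt_short=$(grep -n 'sh[io]rt' "$sample")   # [io] is one character: i or o
not_w_oo=$(grep -n '[^w]oo' "$sample")        # "oo" preceded by a non-w character
not_lower_oo=$(grep -n '[^a-z]oo' "$sample")  # "oo" preceded by a non-lowercase character
digits=$(grep -n '[0-9]' "$sample")           # lines containing a digit

printf '%s\n' "--- [^w]oo ---" "$not_w_oo" "--- [^a-z]oo ---" "$not_lower_oo"
rm -f "$sample"
```

In this sample, "a wooden bowl" is skipped by '[^w]oo' because its only "oo" directly follows "w", and '[^a-z]oo' keeps only "Foo bar", where "oo" follows an uppercase letter.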
Find the beginning of the line "^" and the end of the line "$"

Basic regular expressions include two positioning metacharacters: "^" (the beginning of the line) and "$" (the end of the line). In the earlier example, searching for "the" returned many lines that merely contain it. To find only the lines that begin with "the", use the "^" metacharacter.

[root@localhost ~]# grep -n '^the' test.txt

Lines that begin with a lowercase letter can be matched with "^[a-z]", lines that begin with an uppercase letter with "^[A-Z]", and lines that do not begin with a letter with "^[^a-zA-Z]".

[root@localhost ~]# grep -n '^[a-z]' test.txt
[root@localhost ~]# grep -n '^[^a-zA-Z]' test.txt

The "^" symbol means different things inside and outside "[]": inside the brackets it negates the set, and outside them it anchors the match to the beginning of the line. To find lines that end with a particular character, use the "$" anchor instead. For example, the following command finds lines that end with a period (.). Because the period is itself a metacharacter in regular expressions (discussed later), the escape character "\" is needed to turn it into an ordinary character.

[root@localhost ~]# grep -n '\.$' test.txt

To find blank lines, execute "grep -n '^$' test.txt".

[root@localhost ~]# grep -n '^$' test.txt
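As a small illustration of the two anchors, the sketch below runs the same patterns against an invented five-line file (the file contents and variable names are assumptions, and LC_ALL=C is set so the letter ranges behave byte-wise).

```shell
#!/bin/sh
# Sketch only: hypothetical sample data, third line intentionally empty.
export LC_ALL=C
sample=$(mktemp)
printf '%s\n' 'the first line.' 'The second line' '' '42 is a number.' 'ends with the' > "$sample"

starts_the=$(grep -n '^the' "$sample")              # "the" only at the start of a line
starts_lower=$(grep -n '^[a-z]' "$sample")          # first character is a lowercase letter
starts_nonletter=$(grep -n '^[^a-zA-Z]' "$sample")  # first character exists and is not a letter
ends_dot=$(grep -n '\.$' "$sample")                 # escaped period right before end of line
blank=$(grep -n '^$' "$sample")                     # nothing between start and end

printf '%s\n' "$starts_the" "$starts_nonletter" "$ends_dot" "$blank"
rm -f "$sample"
```

Note that '^[^a-zA-Z]' does not report the empty line: a bracket expression, negated or not, still has to consume one character.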
Find any character "." and the repeating character "*"

As mentioned earlier, the period (.) in a regular expression is also a metacharacter; it represents any single character. For example, execute the following command to find four-character strings that begin with "w" and end with "d".

[root@localhost ~]# grep -n 'w..d' test.txt

In the result above, the string "wood" matches the rule "w..d". To find oo, ooo, ooooo, and so on, use the asterisk (*) metacharacter. Note that "*" means zero or more repetitions of the single character before it. "o*" therefore means zero "o" characters (that is, the empty string) or one or more "o" characters. Because the empty string is allowed, "grep -n 'o*' test.txt" prints every line of the file. With "oo*", the first "o" must be present and the second may occur zero or more times, so any line containing o, oo, ooo, and so on matches. By the same token, to find strings of at least two "o" characters, execute "grep -n 'ooo*' test.txt".

[root@localhost ~]# grep -n 'ooo*' test.txt

To find strings that begin with "w", end with "d", and have at least one "o" in between, execute the following command.

[root@localhost ~]# grep -n 'woo*d' test.txt

To find strings that begin with "w" and end with "d", with any characters (or none) in between:

[root@localhost ~]# grep -n 'w.*d' test.txt

To find lines that contain a number of any length:

[root@localhost ~]# grep -n '[0-9][0-9]*' test.txt
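The difference between these patterns is easiest to see side by side. This is a sketch under stated assumptions: the six sample lines are invented, and matching_lines is a hypothetical helper that prints only the numbers of the matching lines.

```shell
#!/bin/sh
# Sketch only: hypothetical sample data and helper.
sample=$(mktemp)
printf '%s\n' 'a wood cross' 'wd' 'woood' 'wooooood' 'word 7' 'plain text' > "$sample"

# Print the numbers of the lines matching the pattern, space-separated.
matching_lines() { grep -n -e "$1" "$sample" | cut -d: -f1 | xargs; }

four_chars=$(matching_lines 'w..d')         # w, any two characters, d
zero_or_more=$(matching_lines 'o*')         # the empty string matches, so every line qualifies
one_or_more=$(matching_lines 'oo*')         # at least one o
two_or_more=$(matching_lines 'ooo*')        # at least two consecutive o
w_os_d=$(matching_lines 'woo*d')            # w, one or more o, d
w_any_d=$(matching_lines 'w.*d')            # w, anything (or nothing), d
numbers=$(matching_lines '[0-9][0-9]*')     # one or more digits

echo "w..d: $four_chars | o*: $zero_or_more | woo*d: $w_os_d | w.*d: $w_any_d"
rm -f "$sample"
```

Here "wd" is matched by 'w.*d' but not by 'woo*d', and "word 7" is matched by 'w..d' and 'w.*d' but not by 'woo*d'.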
Find a bounded number of consecutive characters "{}"

In the examples above, "." and "*" were used to allow anywhere from zero to an unlimited number of repeated characters. What if the number of repetitions must fall within a range, for example three to five consecutive "o" characters? For this, basic regular expressions provide the interval operator "{}". In a basic regular expression the braces act as an interval operator only when they are escaped, so they are written "\{" and "\}"; an unescaped "{" is treated as an ordinary character. The "{}" operator is used as follows.
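A quick way to compare the interval forms is to run them over words that differ only in the number of "o" characters. The sketch below assumes an invented sample file and a hypothetical matching_lines helper that prints the numbers of the matching lines.

```shell
#!/bin/sh
# Sketch only: hypothetical sample data and helper.
sample=$(mktemp)
printf '%s\n' 'wod' 'wood' 'woood' 'wooooood' 'foo' > "$sample"

# Print the numbers of the lines matching the pattern, space-separated.
matching_lines() { grep -n -e "$1" "$sample" | cut -d: -f1 | xargs; }

exactly_two=$(matching_lines 'o\{2\}')        # any line containing "oo"
two_to_five=$(matching_lines 'wo\{2,5\}d')    # w, two to five o, d
two_or_more=$(matching_lines 'wo\{2,\}d')     # w, at least two o, d
unescaped=$(matching_lines 'o{2}')            # unescaped braces are literal in a basic regex

echo "o\\{2\\}: $exactly_two | wo\\{2,5\\}d: $two_to_five | wo\\{2,\\}d: $two_or_more"
rm -f "$sample"
```

The word with six "o" characters is rejected by 'wo\{2,5\}d' because the "w" and the "d" must enclose the whole run, while 'o\{2\}' has no such boundary and accepts any line that contains "oo".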
(1) Find two consecutive "o" characters.
[root@localhost ~]# grep -n 'o\{2\}' test.txt
(2) Find strings that begin with "w" and end with "d", with 2 to 5 "o" characters in between.
[root@localhost ~]# grep -n 'wo\{2,5\}d' test.txt
(3) Find strings that begin with "w" and end with "d", with 2 or more "o" characters in between.
[root@localhost ~]# grep -n 'wo\{2,\}d' test.txt

Metacharacter summary

The simple examples above show that the common metacharacters of basic regular expressions are mainly the following, as shown in the table.

Metacharacter   Function
^               Matches the beginning of the input string. Inside a bracket expression, it means the listed characters are excluded.
$               Matches the end of the input string.
.               Matches any single character except "\r\n".
\               Marks the next character as a special character, a literal character, a back-reference, or an octal escape.
*               Matches the preceding subexpression zero or more times. To match a literal "*", use "\*".
[]              Character set. Matches any one of the contained characters. For example, "[abc]" matches the "a" in "plain".
[^]             Negated character set. Matches any character that is not contained.
[n1-n2]         Character range. Matches any character in the specified range.
{n}             n is a non-negative integer. Matches exactly n times.
{n,}            n is a non-negative integer. Matches at least n times.
{n,m}           Both n and m are non-negative integers, where n <= m. Matches at least n and at most m times.

... $CONFIG
sed -i -e '/^local_enable/s/NO/YES/g' -e '/^write_enable/s/NO/YES/g' $CONFIG
grep "listen" $CONFIG || sed -i '$alisten=YES' $CONFIG
# Start the vsftpd service and set it to run automatically after boot
systemctl restart vsftpd
systemctl enable vsftpd
[root@localhost ~]# chmod +x local_only_ftp.sh

awk tool

awk is a powerful editing tool on Linux/UNIX systems. It reads the input text line by line, searches it according to the specified pattern, and formats, outputs, or filters the content that qualifies. It can carry out quite complex text operations without interaction and is widely used in Shell scripts to complete all kinds of automated configuration tasks.

1. Common usage of awk

The command format of awk is as follows. The single quotation marks together with the curly braces "{}" set the processing action to apply to the data.
awk can process the target files directly, or read its editing instructions from a script with "-f".

awk option 'pattern or condition {editing instructions}' file1 file2 ...   // filter and output the file contents that meet the condition
awk -f script-file file1 file2 ...   // read the editing instructions from a script, then filter and output

As mentioned above, sed is usually used to process whole lines, whereas awk prefers to split a line into multiple "fields" before processing it. By default the field delimiter is a space or a tab. The results of awk can be printed with the print function. Within an awk command you can use the logical operators "&&" for "and", "||" for "or", and "!" for "not"; simple arithmetic is also available, with +, -, *, /, %, and ^ standing for addition, subtraction, multiplication, division, remainder, and exponentiation respectively.

/etc/passwd is a very typical formatted file on Linux systems, with its fields separated by ":". Most log files on Linux systems are formatted files as well, and extracting the relevant information from them is part of daily operation and maintenance work. To list the user name, user ID, and group ID columns of /etc/passwd, execute the following awk command.

[root@localhost ~]# awk -F ':' '{print $1,$3,$4}' /etc/passwd
root 0 0

awk reads its information from an input file or from standard input and, like sed, reads it line by line. The difference is that awk treats each line of a text file as a record and each part (column) of a line as a field of that record. To operate on the individual fields, awk borrows a notation similar to the positional variables in shell: $1, $2, $3, and so on represent the fields of the line (record) in order, and $0 represents the entire line (record). Fields are separated by a specified character; the default delimiter for awk is whitespace (a space or a tab).
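To try field splitting without touching the real /etc/passwd, the sketch below uses three invented passwd-style records; the account names, IDs, and variable names are assumptions for illustration. The BEGIN pattern used in the last command is standard awk but is not covered in the text above.

```shell
#!/bin/sh
# Sketch only: hypothetical passwd-style records, not a real system file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
EOF

# -F ':' sets the field delimiter; $1, $3, $4 are user name, user ID, group ID.
ids=$(awk -F ':' '{print $1,$3,$4}' "$sample")

# "||" combines conditions: user ID 0, or user ID of 1000 and above.
root_or_regular=$(awk -F ':' '$3 == 0 || $3 >= 1000 {print $1}' "$sample")

# "!" negates a condition: accounts whose shell is not /bin/bash.
not_bash=$(awk -F ':' '!($7 == "/bin/bash") {print $1}' "$sample")

# Arithmetic: ^ is exponentiation and % is remainder.
math=$(awk 'BEGIN {print 2^10, 7%3}')

printf '%s\n' "$ids" "$root_or_regular" "$not_bash" "$math"
rm -f "$sample"
```

The comma in "print $1,$3,$4" is what puts a space between the printed fields.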
awk lets you specify the delimiter on the command line in the form "-F delimiter". In the example above, awk therefore processes the /etc/passwd file as shown in Figure 4.1.

Figure 4.1: awk schematic

awk provides several special built-in variables (available directly), as follows:

* FS: the field delimiter for each line of text; the default is a space or a tab.
* NF: the number of fields in the line currently being processed.
* NR: the line number (ordinal) of the line currently being processed.
* $0: the entire content of the line currently being processed.
* $n: the nth field (nth column) of the line currently being processed.
* FILENAME: the name of the file being processed.
* RS: the record separator. The default is "\n", that is, each line is one record.

2. Usage examples

1) Output text by line

awk '{print}' test.txt   // output all content, equivalent to cat test.txt
awk '{print $0}' test.txt   // output all content, equivalent to cat test.txt
awk 'NR==1,NR==3{print}' test.txt   // output lines 1 to 3
awk '(NR>=1)&&(NR
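The built-in variables listed above can be observed directly on a small input. This is a minimal sketch: the four sample lines and the variable names are invented for illustration, and the BEGIN pattern used to set FS is standard awk that this section does not itself introduce.

```shell
#!/bin/sh
# Sketch only: hypothetical sample data.
sample=$(mktemp)
printf '%s\n' 'alpha beta' 'gamma' 'delta epsilon zeta' 'eta theta' > "$sample"

# A range pattern: from the record where NR==1 through the record where NR==3.
first_three=$(awk 'NR==1,NR==3 {print}' "$sample")

# NR is the current line number, NF the number of fields on that line.
counts=$(awk '{print NR ": " NF}' "$sample")

# $NF is the last field of each line.
last_fields=$(awk '{print $NF}' "$sample")

# FS can be set inside the program instead of using -F.
second=$(printf 'a:b:c\n' | awk 'BEGIN {FS=":"} {print $2}')

# FILENAME holds the name of the file being read.
name=$(awk 'NR==1 {print FILENAME}' "$sample")

printf '%s\n' "$first_three" "$counts" "$last_fields" "$second"
rm -f "$sample"
```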