Sed application finishing 07/09 Update SLTechnology News&Howtos


2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Before going through the examples, it helps to learn how sed works.

Take the command sed -f script_file input_file as an example.

Here input_file is the file to be processed, and script_file is the file containing the sed script commands.

The working principle is as follows:

A) Read the first line of the input file into the pattern space.

B) Execute the commands in the script on the line in the pattern space, in order from top to bottom.

C) When the script has finished, output the contents of the pattern space.

D) Empty the pattern space, read the next line of the input file, and repeat steps B) and C) until the whole file has been processed.

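Programmers may find it clearest to think of this cycle as a loop: for each input line, load it into the pattern space, run every script command on it in order, print the pattern space, and clear it before reading the next line.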
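The cycle above can be watched with a minimal sketch. It assumes a POSIX-compatible sed such as GNU sed; the files demo.sed and demo.txt and the function run_demo are hypothetical names created in a temporary directory. Both commands in the script run, in order, on the same pattern space, so "cat" first becomes "dog" and then "wolf", while the input file itself stays unchanged.

```shell
# Hypothetical script file and input file, created in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.sed" <<'EOF'
s/cat/dog/
s/dog/wolf/
EOF
printf 'cat\nbird\n' > "$tmpdir/demo.txt"

# For each input line: load it into the pattern space, run both commands
# top to bottom, print the result, then move on to the next line.
run_demo() {
  sed -f "$tmpdir/demo.sed" "$tmpdir/demo.txt"
}

run_demo                 # prints "wolf", then "bird"
cat "$tmpdir/demo.txt"   # the input file still holds "cat" and "bird"
```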

Pattern space

The description above uses the term pattern space, which is explained here. By default, running sed does not modify the original file: whatever the input file contains before sed runs, it still contains afterwards. Since sed does not edit the input file directly, it needs a separate buffer into which it copies the file's contents, processes them, and then writes them out. The pattern space is that buffer. sed reads one line of the input file into the pattern space at a time, applies the script's commands to it, outputs the result, and then clears the pattern space, ready to read the next line of the input file.

Multiline pattern space

As mentioned above, the pattern space holds one line of the input file at a time. Having only one line in the pattern space sometimes limits what can be done with the input; for example, it is hard to handle a phrase that starts at the end of one line and ends at the beginning of the next. The multiline pattern space solves this problem by letting the contents of the pattern space grow from one line to several. The details are explained later in this article.

Hold space

The pattern space is a buffer that holds the current input line, while the hold space is a separate buffer used to store the contents of the pattern space temporarily. Content in the pattern space can be copied to the hold space, and content in the hold space can be copied back to the pattern space. The details are explained later in this article.

Regular expression metacharacters

^  Start-of-line anchor. Example: /^my/ matches all lines that start with "my".

$  End-of-line anchor. Example: /my$/ matches all lines that end with "my".

.  Matches any single character other than a newline. Example: /m..y/ matches lines containing "m", followed by any two characters, followed by "y".

*  Matches zero or more occurrences of the preceding character. Example: /my*/ matches lines containing "m" followed by zero or more "y" characters.

[]  Matches any one character in the specified set. Example: /[Mm]y/ matches lines containing "My" or "my".

[^]  Matches any one character that is not in the specified set. Example: /[^Mm]y/ matches lines containing a "y" whose preceding character is neither "M" nor "m".

\(..\)  Saves the matched characters. Example: 1,20s/\(you\)self/\1r/ marks the enclosed pattern and saves it as tag 1, which can then be referenced as \1. Up to 9 tags can be defined, numbered from the left, so the leftmost is tag 1. In this example, lines 1 through 20 are processed, "you" is saved as tag 1, and the first "youself" on each line is replaced with "your".

&  In the replacement, stands for the matched string. Example: in s/my/**&**/, & stands for the matched text, so "my" is replaced with "**my**".

\>  End-of-word anchor (supported by GNU sed). Example: /my\>/ matches lines containing a word that ends in "my".

x\{m\}  Exactly m consecutive occurrences of x. Example: /9\{5\}/ matches lines containing five consecutive 9s.

x\{m,\}  At least m consecutive occurrences of x. Example: /9\{5,\}/ matches lines containing at least five consecutive 9s.

x\{m,n\}  At least m but no more than n consecutive occurrences of x. Example: /9\{5,7\}/ matches lines containing a run of 5 to 7 consecutive 9s (since the pattern is not anchored, a longer run also matches, because it contains such a run).
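The functions below are a small sketch that exercises several rows of the list above with POSIX basic regular expressions, assuming GNU sed; the function names are illustrative only. The last one shows the interval subtlety: without anchors, /9\{5,7\}/ also matches a run of eight 9s, and only the anchored form enforces the upper limit.

```shell
# Illustrative function names; each pipes a tiny sample through one sed command.

# \( \) saves "you" as \1, so "youself" becomes "your".
demo_backref() {
  printf 'youself\nmyself\n' | sed 's/\(you\)self/\1r/'
}
# & stands for the whole matched text.
demo_amp() {
  printf 'my book\n' | sed 's/my/**&**/'
}
# [^Mm]y: a "y" preceded by some character other than M or m.
demo_bracket() {
  printf 'My\nmy\nby\n' | sed -n '/[^Mm]y/p'
}
# \{m,\}: keep only lines containing at least five consecutive 9s.
demo_interval() {
  printf '9999\n99999\n12999999\n' | sed -n '/9\{5,\}/p'
}
# Unanchored \{m,n\} matches inside a longer run; the anchored form does not.
demo_upper_bound() {
  printf '99999999\n' | sed -n '/9\{5,7\}/p'
  printf '99999999\n' | sed -n '/^9\{5,7\}$/p'   # prints nothing
}
```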

Common commands and options

a\  Append one or more lines after the current line. With multiple lines, every line except the last must end with "\".

c\  Replace the current line with the text that follows. With multiple lines, every line except the last must end with "\".

i\  Insert text before the current line. With multiple lines, every line except the last must end with "\".

d  Delete the pattern space (the current line).

h  Copy the contents of the pattern space to the hold space, overwriting it.

H  Append the contents of the pattern space to the hold space.

g  Copy the contents of the hold space to the pattern space, overwriting the original content.

G  Append the contents of the hold space to the pattern space, after the original content.

l  List the line, showing non-printing characters.

p  Print the line.

n  Output the pattern space (unless -n is in effect), read the next input line into it, and continue with the next command instead of the first one.

q  Quit sed.

r  Read the contents of a file and output them after the current line.

!  Apply a command to all lines except the selected ones.

s  Replace one string with another.

g  (flag of s) Make the substitution global within the line.

w  Write the selected line to a file.

x  Exchange the contents of the hold space and the pattern space.

y  Replace each character with a corresponding character (y cannot be used with regular expressions).

-e  Make multiple edits, that is, apply several sed commands to each input line.

-n  Suppress the default output.

-f  Specify the file name of a sed script.
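A short sketch of several of these commands, assuming a POSIX-style sed such as GNU sed; sample and the demo_* function names are hypothetical. For a\, i\ and c\ it uses the portable form in which the text starts on the line after the backslash.

```shell
# Hypothetical helpers: a three-line sample and one function per command.
sample() { printf 'one\ntwo\nthree\n'; }

demo_append() {   # a\ : add a line after line 1
  sample | sed '1a\
after one'
}
demo_insert() {   # i\ : add a line before line 3
  sample | sed '3i\
before three'
}
demo_change() {   # c\ : replace line 2 entirely
  sample | sed '2c\
TWO'
}
demo_not() {      # ! : delete every line except line 2
  sample | sed '2!d'
}
demo_y() {        # y : map o to 0 and e to 3, character by character
  sample | sed 'y/oe/03/'
}
demo_quit() {     # q : print line 2 as usual, then stop
  sample | sed '2q'
}
```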

Application examples

Advanced applications

First, be clear about what the pattern space is: it is the buffer into which lines are read, and all of sed's processing of a line of text happens in this buffer. Keeping this in mind will help with what follows.

Normally, sed reads the line to be processed into the pattern space, the commands in the script process it one after another until the script ends, and then the line is output and the pattern space is emptied. This is repeated, reading each new line of the file, until the whole file has been processed.

However, for various reasons, for example when a command in the script should run only under certain conditions, or when the contents of the pattern space should be kept for the next round of processing, sed may need to depart from this normal flow. For such cases sed provides some advanced commands.

In general, these commands can be divided into the following three categories:

1. N, D, P: handle the multiline pattern space

2. H, h, G, g, x: move the contents of the pattern space into the hold space for later editing

3. :, b, t: implement branches and conditional structures in the script

Processing the multiline pattern space:

Because regular expressions are line-oriented, it is hard to match a phrase with tools such as grep when the phrase starts at the end of one line and continues at the beginning of the next. With sed's multiline commands N, D and P, this becomes straightforward.

The multiline Next command (N) is best understood in contrast with the next command (n). The n command outputs the contents of the pattern space (unless -n is in effect), then reads the next input line into the pattern space, replacing what was there; the script does not go back to the top but continues with the command after n. The N command instead keeps the current contents of the pattern space and appends the next input line to it, separated by a newline character "\n". After N executes, control continues with the commands that follow it.
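The contrast between n and N shows up clearly on a four-line input. This is a sketch assuming GNU sed; the function names are illustrative. With "n;d", each odd line is printed by n before the following even line replaces it and is deleted; with "$!N", pairs of lines are joined in the pattern space, so the embedded newline can then be replaced.

```shell
# Illustrative helper names.
four() { printf '1\n2\n3\n4\n'; }

# n prints the pattern space (default output is on), then replaces it with
# the next line; d deletes that next line, so only odd lines survive.
odd_lines() {
  four | sed 'n
d'
}

# N appends the next line after a newline; $! keeps N off the last line.
join_pairs() {
  four | sed '$!N
s/\n/ /'
}
```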

Note that in a multiline pattern space, the special characters "^" and "$" match the beginning and end of the whole pattern space, not the positions just after or just before an embedded newline "\n".
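A quick check of this rule, assuming GNU sed without its optional multiline regex modifier (function names are illustrative): after N the pattern space holds "alpha", a newline, and "beta", so "^" and "$" attach only to the very beginning and end of that two-line buffer.

```shell
# Illustrative helper names.

# The brackets go around the whole two-line pattern space, not each line.
wrap_pattern_space() {
  printf 'alpha\nbeta\n' | sed 'N
s/^/[/
s/$/]/'
}

# "a$" can only match the final "a" of "beta"; the "a" ending "alpha"
# sits before the embedded newline, not at the end of the pattern space.
end_anchor() {
  printf 'alpha\nbeta\n' | sed 'N
s/a$/A/g'
}
```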

Example 1:

$ cat expl.1
Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.

Now replace "Owner and Operator Guide" with "Installation Guide":

$ sed '/Operator$/{
> N
> s/Owner and Operator\nGuide /Installation Guide\
> /
> }' expl.1

Note that in the example above the two lines are joined by an embedded newline, which the regular expression matches as "\n". Also, to put a newline into the replacement text, end the line with a backslash "\" as shown above.

Look at another example:

Example 2:

$ cat expl.2
Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.

Look in the Owner and Operator Guide shipped with your system.

Two manuals are provided including the Owner and
Operator Guide and the User Guide.

The Owner and Operator Guide is shipped with your system.
$ sed 's/Owner and Operator Guide/Installation Guide/
> /Owner/{
> N
> s/ *\n/ /
> s/Owner and Operator Guide */Installation Guide\
> /
> }' expl.2

The results are as follows:

Consult Section 3.1 in the Installation Guide
for a description of the tape drives
available on your system.

Look in the Installation Guide shipped with your system.

Two manuals are provided including the Installation Guide
and the User Guide.

The Installation Guide is shipped with your system.

Making two substitutions may look redundant. In fact, if you remove the first substitution and run the script again, the output has two problems. First, the last line is not replaced (and some versions of sed do not even output it). This is because the last line matches "Owner" and runs the N command, but there is no next line at the end of the file: some versions of sed print the pattern space and exit, while others exit immediately without printing. This can be fixed with "$!N", which means that N is not applied to the last line. Second, the "Look in the ..." paragraph is split into two lines, and the blank line separating it from the next paragraph is deleted. This happens because that line matches "Owner", N appends the following blank line, and the substitution of the embedded newline then joins the two. So the two substitutions are not redundant after all.
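Here is a sketch of Example 2 with the "$!N" fix applied, wrapped in a shell function. It assumes GNU sed; the names to_installation_guide and expl2 are hypothetical, and the sample text mirrors expl.2 with blank lines between paragraphs.

```shell
# Hypothetical function names; the sed script is Example 2 with $!N.
to_installation_guide() {
  sed 's/Owner and Operator Guide/Installation Guide/
/Owner/{
$!N
s/ *\n/ /
s/Owner and Operator Guide */Installation Guide\
/
}'
}

# Sample text modeled on expl.2.
expl2() {
  printf '%s\n' \
    'Consult Section 3.1 in the Owner and Operator' \
    'Guide for a description of the tape drives' \
    'available on your system.' \
    '' \
    'Look in the Owner and Operator Guide shipped with your system.' \
    '' \
    'Two manuals are provided including the Owner and' \
    'Operator Guide and the User Guide.' \
    '' \
    'The Owner and Operator Guide is shipped with your system.'
}

expl2 | to_installation_guide
```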

Example 3:

$ cat expl.3
<para>

This is a test paragraph in Interleaf style ASCII. Another line
in a paragraph. Yet another.

<Figure Begin>

v.1111111111111111111111100000000000000000001111111111111000000
100001000100100010001000001000000000000000000000000000000000000
000000

<Figure End>

<para>

More lines of text to be found after the figure.
These lines should print.

Our sed command goes like this:

$ sed '/<para>/{
> N
> c\
> .LP
> }
> /<Figure Begin>/,/<Figure End>/{
> w fig.interleaf
> /<Figure End>/i\
> .FG\
> .FE
> d
> }
> /^$/d' expl.3

The result is:

.LP
This is a test paragraph in Interleaf style ASCII. Another line
in a paragraph. Yet another.
.FG
.FE
.LP
More lines of text to be found after the figure.
These lines should print.

The lines from <Figure Begin> through <Figure End> are written to the file "fig.interleaf". Note that the d command does not affect the text inserted by the i command, because i outputs its text immediately, before d deletes the pattern space.

The d command deletes the contents of the pattern space, reads in a new line, and runs the sed script again from the top. The D command differs: it deletes only the part of the pattern space up to and including the first embedded newline, does not read a new line, and returns to the top of the script to process what remains. (If the pattern space contains no newline, D behaves like d.)

Example 4:

$ cat expl.4
This line is followed by 1 blank line.

This line is followed by 2 blank lines.


This line is followed by 3 blank lines.



This line is followed by 4 blank lines.




This is the end.

Different delete commands get different results:

With D:

$ sed '/^$/{
> N
> /^\n$/D
> }' expl.4

With d:

$ sed '/^$/{
> N
> /^\n$/d
> }' expl.4
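To compare the two results, here is a sketch that runs both scripts on Example 4's input, assuming GNU sed (function names are illustrative). With d, a pair of blank lines is discarded and a fresh cycle starts, so even-length runs of blank lines vanish and odd-length runs keep one; with D, only the first line of the pair is dropped and the script reruns on the remainder, so every run collapses to a single blank line.

```shell
# Illustrative helper: Example 4's text with 1, 2, 3 and 4 blank lines.
sample4() {
  printf '%s\n' 'This line is followed by 1 blank line.' '' \
    'This line is followed by 2 blank lines.' '' '' \
    'This line is followed by 3 blank lines.' '' '' '' \
    'This line is followed by 4 blank lines.' '' '' '' '' \
    'This is the end.'
}

# d: delete the whole two-blank-line pattern space and start a new cycle.
squeeze_with_d() {
  sample4 | sed '/^$/{
N
/^\n$/d
}'
}

# D: delete only up to the first newline and rerun on what is left.
squeeze_with_D() {
  sample4 | sed '/^$/{
N
/^\n$/D
}'
}
```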

By default, sed outputs every line of the file, whether or not it was processed. The option -n suppresses this output, and a print command is then needed to output anything. The print command for a single-line pattern space is "p", and for a multiline pattern space it is "P": P prints the pattern space only up to the first embedded newline.

The P command usually appears after N and before D, forming an input/output loop. In this loop the pattern space holds two lines of text, while only one line is output at a time. The purpose of the loop is to output the first line in the pattern space and then return to the top of the script to process the second line. Without this loop, the whole pattern space would be output when the script finishes, which may not be what the user wants and can make the script less efficient.

Here is an example:

Example 5:

$ cat expl.5
Here are examples of the UNIX
System. Where UNIX
System appears, it should be the UNIX
Operating System.
$ sed '/UNIX$/{
> N
> /\nSystem/{
> s// Operating &/
> P
> D
> }
> }' expl.5

The result of the replacement is:

Here are examples of the UNIX Operating
System. Where UNIX Operating
System appears, it should be the UNIX
Operating System.

To see the difference between the two kinds of commands, replace "P" and "D" in the sed command with lowercase "p" and "d".
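A sketch of that comparison, assuming GNU sed (function names are illustrative). Note that s// reuses the last regular expression, \nSystem, so " Operating " is inserted just before the embedded newline, leaving a trailing space on the printed line.

```shell
# Illustrative helper: Example 5's text.
expl5() {
  printf '%s\n' 'Here are examples of the UNIX' 'System. Where UNIX' \
    'System appears, it should be the UNIX' 'Operating System.'
}

# P prints up to the newline; D drops that line and reruns the script on
# the rest, so the second line's trailing UNIX is checked too.
fix_PD() {
  expl5 | sed '/UNIX$/{
N
/\nSystem/{
s// Operating &/
P
D
}
}'
}

# With p/d the whole pattern space is printed and discarded, so the
# second line is never rechecked and its UNIX is left unchanged.
fix_pd() {
  expl5 | sed '/UNIX$/{
N
/\nSystem/{
s// Operating &/
p
d
}
}'
}
```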

The following example is quite difficult:

Example 6:

$ cat expl.6
I want to see @f1(what will happen) if we put the
font change commands @f1(on a set of lines). If I understand
things (correctly), the @f1(third) line causes problems. No?.
Is this really the case, or is it (maybe) just something else?
Let's test having two on a line @f1(here) and @f1(there) as
well as one that begins on one line and ends @f1(somewhere
on another line). What if @f1(it is here) on the line?
Another @f1(one).

What we need to do now is replace "@f1(...)" with "\fB...\fR". Here is a sed command that does this:

$ sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g
> /@f1(.*/{
> N
> s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
> P
> D
> }' expl.6

However, if you drop this input/output loop and use N alone, there is a problem:

$ sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g
> /@f1(.*/{
> N
> s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
> }' expl.6

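A script like this has a flaw: after N joins two lines, an @f1(...) that lies entirely on the second line is never converted, because the first substitution was applied before N and the second one only matches text that spans the newline.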
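The flaw can be reproduced with a sketch that runs both scripts on three lines taken from expl.6, assuming GNU sed (function names are illustrative). Only the version with the P/D loop converts "@f1(it is here)".

```shell
# Illustrative helper: three lines from expl.6.
sample6() {
  printf '%s\n' 'ends @f1(somewhere' \
    'on another line). What if @f1(it is here) on the line?' \
    'Another @f1(one).'
}

# With the P/D loop: after the two-line match is converted, P prints the
# first line and D reruns the whole script on the second one.
bold_with_loop() {
  sample6 | sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g
/@f1(.*/{
N
s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
P
D
}'
}

# With N alone: the joined pattern space is printed as is, so the
# @f1(...) lying entirely on the second line is never converted.
bold_n_only() {
  sample6 | sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g
/@f1(.*/{
N
s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
}'
}
```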

Holding lines:

The pattern space has already been explained; sed also has a buffer called the hold space. Content can be copied between the pattern space and the hold space with a set of commands:

Hold (h or H): copy or append the contents of the pattern space to the hold space.

Get (g or G): copy or append the contents of the hold space to the pattern space.

Exchange (x): swap the contents of the pattern space and the hold space.

The difference between the uppercase and lowercase commands is that the uppercase commands append the contents of the source space to the destination space, while the lowercase commands overwrite the destination space with the contents of the source space. Note that when appending, H and G first add a newline to the end of the existing content of the destination and then add the contents of the source after that newline.

The following example shows a basic use of these commands:
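Two small sketches show the appended newline at work, assuming GNU sed (function names are illustrative): reversing the order of lines with G and h, and double-spacing a file with a bare G, which appends a newline even when the hold space is empty.

```shell
# Illustrative function names.

# 1!G appends the hold space (after a newline) to every line but the first,
# h saves the growing result, and $p prints it once, at the last line.
reverse_lines() {
  sed -n '1!G
h
$p'
}

# G on an always-empty hold space still appends a newline, so every line
# is followed by an empty line.
double_space() {
  sed G
}
```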

Example 7:

$ cat expl.7
1
2
11
22
111
222

The task is to swap the first line with the second, the third with the fourth, and the fifth with the sixth. The sed command is:

$ sed '
> /1/{
> h
> d
> }
> /2/{
> G
> }' expl.7

The process goes like this: sed reads the first line into the pattern space, the h command copies it to the hold space, and the d command clears the pattern space; sed then reads the second line into the pattern space, and the G command appends the contents of the hold space to it (note that a newline is first added after the original content of the pattern space).

The final results are as follows:

2
1
22
11
222
111

When using h or H, it is common to follow it with d, so that sed does not reach the end of the script and the pattern space is not output. Note also that replacing d with n, or G with g, would not achieve the goal.
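A sketch of the last point, assuming GNU sed (function names are illustrative): the script with G swaps the pairs, while the same script with g copies the saved line back over the current one, so the second line of each pair is lost.

```shell
# Illustrative function names; the sed script is Example 7's.
swap_with_G() {
  sed '/1/{
h
d
}
/2/{
G
}'
}

# Same script with g: the pattern space is overwritten, not appended to.
swap_with_g() {
  sed '/1/{
h
d
}
/2/{
g
}'
}

printf '%s\n' 1 2 11 22 111 222 | swap_with_G   # 2 1 22 11 222 111
printf '%s\n' 1 2 11 22 111 222 | swap_with_g   # 1 11 111
```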

What is the most convenient way to convert letters between upper and lower case? Probably tr:

$ tr "[a-z]" "[A-Z]" < file

Interestingly, sed can also perform this conversion, using the y command:

$ sed '[address]y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' file
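A minimal sketch, assuming GNU sed (the function name is illustrative), of why y alone is too blunt: the address selects which lines are converted, but on a selected line every lowercase letter is uppercased, not just one word.

```shell
# Illustrative function name: uppercase every line that mentions "statement".
upcase_statement_lines() {
  sed '/statement/y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'
}

printf 'find the Match statement\nno match here\n' | upcase_statement_lines
```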

However, y converts every listed character in the whole line, so it does not help when you only want to change the case of a few characters in the line. For that, use the Hold and Get commands just introduced.

$ cat expl.8
find the Match statement
Consult the Get statement
using the Read statement to retrieve data

$ sed '/the .* statement/{
> h
> s/.*the \(.*\) statement.*/\1/
> y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
> G
> s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
> }' expl.8

The first line of the file illustrates what this command does:

(1) h copies "find the Match statement" into the hold space.

(2) The first substitution reduces the line to: Match

(3) y converts the result of (2) to uppercase: MATCH

(4) G appends the line saved in (1) from the hold space to the pattern space, which now contains:

MATCH\nfind the Match statement

(5) The final substitution rearranges the pattern space into: find the MATCH statement

The next example uses more involved regular expressions, but don't worry: take your time and every part of it can be worked out. The text in this example is mainly about editing and typesetting, which is not my specialty, so I only present the sed script and focus on its core, leaving out the details:

Example 9:

$ cat expl.9.sed
h
s/[][\\*.]/\\&/g
x
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
G
s/\n//

(1) h: copy the line into the hold space.

(2) s/[][\\*.]/\\&/g: this expression is the hardest. If "]" is the first character inside "[]", it loses its special meaning, so "[][\\*.]" is a single bracket expression matching any one of "]", "[", "\", "*" and ".". Inside the brackets, "*" and "." are taken literally. (Although they do not appear here, note that inside "[]" a "^" means "not" only in the first position, and that "$" is special only at the end of a regular expression.) In the replacement, "\\" stands for a literal backslash and "&" for the matched character, so this command puts a backslash in front of every "[", "]", "\", "*" and "." in the pattern space, turning them into "\[", "\]", "\\", "\*" and "\.".

(3) x: swap the pattern space and the hold space. The pattern space now holds the original line, and the hold space holds the copy in which each special character has been escaped.

(4) s/[\\&]/\\&/g: in the pattern space, every "\" or "&" is replaced with "\\" or "\&".

(5) s/^\.XX //: remove the leading ".XX " from the pattern space.

(6) s/$/\//: append a "/" to the end of the pattern space.

(7) x: swap the contents of the two spaces again.

(8) s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//: this is not hard; the many backslashes and slashes are just easy to confuse, so read carefully. It turns the escaped line into the first half of a substitution command, /^\.XX /s/.../.

(9) G: append the hold space to the pattern space.

(10) s/\n//: delete the embedded newline.

What is this script for? An experiment with the following line makes it clear:

.XX "asterisk (*) metacharacter"

Below is the state after each command. For steps 1 to 8, the first line is the pattern space and the second line is the hold space; for steps 9 and 10 only the pattern space is shown:

1. .XX "asterisk (*) metacharacter"
   .XX "asterisk (*) metacharacter"

2. \.XX "asterisk (\*) metacharacter"
   .XX "asterisk (*) metacharacter"

3. .XX "asterisk (*) metacharacter"
   \.XX "asterisk (\*) metacharacter"

4. .XX "asterisk (*) metacharacter"
   \.XX "asterisk (\*) metacharacter"

5. "asterisk (*) metacharacter"
   \.XX "asterisk (\*) metacharacter"

6. "asterisk (*) metacharacter"/
   \.XX "asterisk (\*) metacharacter"

7. \.XX "asterisk (\*) metacharacter"
   "asterisk (*) metacharacter"/

8. /^\.XX /s/"asterisk (\*) metacharacter"/
   "asterisk (*) metacharacter"/

9. /^\.XX /s/"asterisk (\*) metacharacter"/\n"asterisk (*) metacharacter"/

10. /^\.XX /s/"asterisk (\*) metacharacter"/"asterisk (*) metacharacter"/

As you can see, "s/[\\&]/\\&/g" has no effect in this example, but it is indispensable: in the replacement part of an s command, "\" and "&" have special meanings, so they must be escaped in advance.

Do you see it now? When you want a shell script to generate a sed script consisting mainly of substitution commands, you will find that this handling of special characters is critical.
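To see why step (4) matters, this sketch (GNU sed assumed; function and variable names are hypothetical) runs the script on an index entry containing "&", with and without that step, and then applies each generated command back to the original line. Without the escaping, "&" in the generated replacement expands to the whole match.

```shell
# Hypothetical names; the sed script is Example 9's.
make_edit_cmd() {
  sed 'h
s/[][\\*.]/\\&/g
x
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
G
s/\n//'
}

# The same script without step (4), which escapes "\" and "&".
make_edit_cmd_unescaped() {
  sed 'h
s/[][\\*.]/\\&/g
x
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
G
s/\n//'
}

line='.XX "A & B"'
good=$(printf '%s\n' "$line" | make_edit_cmd)
bad=$(printf '%s\n' "$line" | make_edit_cmd_unescaped)
printf '%s\n' "$good"                 # /^\.XX /s/"A & B"/"A \& B"/
printf '%s\n' "$line" | sed "$good"   # the line comes back unchanged
printf '%s\n' "$line" | sed "$bad"    # .XX "A "A & B" B"
```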

As the example above suggests, the hold space can even store many lines for later output. This is very useful for text with an obvious structure, such as HTML. Here is an example:

Example 10:

$ cat expl.10
My wife won't let me buy a power saw. She is afraid of an
accident if I use one.
So I rely on a hand saw for a variety of weekend projects like
building shelves.
However, if I made my living as a carpenter, I would
have to use a power
saw. The speed and efficiency provided by power tools
would be essential to being productive.

For people who create and modify text files
sed and awk are power tools for editing.
Most of the things that you can do with these programs
can be done interactively with a text editor. However
using these programs can save many hours of repetitive
work in achieving the same result.

$ sed '/^$/!{
> H
> d
> }
> /^$/{
> x
> s/^\n/<p>/
> s/$/<\/p>/
> G
> }' expl.10

Run this command and look at the result. In fact, the result itself matters less than the idea of flow control that the script embodies, which is what this example should teach. The first part of the script uses "!" to handle the lines that do not match (the non-blank lines), but because of the d command those lines never reach the end of the script, so nothing is output for them. In the second part, the script does reach the end, so the collected paragraph is output; and because x has swapped the empty line into the hold space, both spaces are effectively cleared, ready for the next paragraph.

That completes the example, but one case remains: what happens if the last line of the file is not a blank line? Clearly the last paragraph of the text is never output. How can this be handled? The simplest way is to "create" a blank line yourself. The new script looks like this:

$ sed '${
> /^$/!{
> H
> s/.*//
> }
> }
> /^$/!{
> H
> d
> }
> /^$/{
> x
> s/^\n/<p>/
> s/$/<\/p>/
> G
> }' expl.10
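The sketch below, assuming GNU sed, runs both versions on input whose last line is not blank; each paragraph is wrapped in HTML <p>...</p> tags. Function names are illustrative. The first version never outputs the final paragraph, while the fixed one does.

```shell
# Illustrative helper names; the sed scripts follow Example 10.
sample10() { printf '%s\n' 'first line' 'second line' '' 'third line'; }

# Non-blank lines are appended to the hold space with H and deleted with d;
# a blank line swaps the collected paragraph back in and wraps it.
wrap_paragraphs() {
  sed '/^$/!{
H
d
}
/^$/{
x
s/^\n/<p>/
s/$/<\/p>/
G
}'
}

# Same script plus a block for the last line: if it is not blank, it is
# appended to the hold space and the pattern space is emptied, so it is
# handled as if a blank line followed.
wrap_paragraphs_fixed() {
  sed '${
/^$/!{
H
s/.*//
}
}
/^$/!{
H
d
}
/^$/{
x
s/^\n/<p>/
s/$/<\/p>/
G
}'
}

sample10 | wrap_paragraphs         # "third line" is never printed
sample10 | wrap_paragraphs_fixed   # ends with "<p>third line</p>"
```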

Flow control commands

To give users real "freedom" when writing sed scripts, sed also lets you set labels in the script with ":" and then control the flow with the "b" and "t" commands. "b" stands for "branch" and "t" for "test": the former is the branch command and the latter the test command.

First, labels. A label marks the place where you want execution to jump to; it sits on a line of its own and starts with a colon. No spaces or tabs are allowed between the colon and the label, and any space at the end of the line is treated as part of the label.

Let's talk about the b command. Its format is as follows:

[address] b [label]

This means that if the address matches, control jumps to the label: if a label is given, execution continues at the line carrying that label; if no label is given, control jumps to the end of the script. If the address does not match, execution simply continues with the following commands.

In some cases the b command is similar to "!", but "!" only applies to the command (or {} block) immediately following it, while b gives you enough freedom to choose which commands in the script should or should not be executed. Here are several classic uses of the b command:

(1) Create a loop:

:top
command1
command2
/pattern/b top
command3

(2) Skip some commands when a condition is met:

command1
/pattern/b end
command2
:end
command3

(3) Execute only one of two groups of commands:

command1
/pattern/b dothere
command2
b
:dothere
command3
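A minimal sketch of use (1), assuming GNU sed (the function name is illustrative): a loop that joins each line ending in a backslash with the line that follows. The "$!" guard is a design choice that keeps N off the last line, so a dangling backslash at the end of the input cannot make the loop spin forever.

```shell
# Illustrative function name.
# :top marks the loop start; after each join, "b top" jumps back to test
# whether the joined line still ends in a backslash.
join_continued() {
  sed ':top
$!{
/\\$/{
N
s/\\\n//
b top
}
}'
}

printf '%s\n' 'one \' 'two \' 'three' 'four' | join_continued
```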

The format of the t command is the same as the b command:

[address] t [label]

It means that if the address matches and a substitution has been made since the last input line was read (or since the last t command was executed), sed jumps to the label given with t. The label rules are the same as for the b command above. Here is an example:

s/pattern/replacement/
t break
command
:break
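A classic use of t is a loop that repeats a substitution until it stops matching. This sketch, assuming GNU sed (the function name is illustrative), inserts thousands separators into a leading run of digits: the greedy first group leaves exactly three digits for the second, so each pass adds one comma, and t jumps back only while a substitution has succeeded.

```shell
# Illustrative function name.
add_commas() {
  sed ':again
s/^\([0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/
t again'
}

printf '1234567\n999\n1000\n' | add_commas
```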

Take the script in Example 6: on reflection it is not powerful enough. What if an @f1 structure spans three lines instead of two? That calls for the following enhanced version:

$ cat expl.6.sed
:begin
/@f1(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@f1(.*/{
N
s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
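The listing stops at the ":again" label. The sketch below assumes that it continues with the same P/D loop used in Example 6 (P and D after ":again"); that ending is an assumption, as is GNU sed, in which "[^)]" also matches an embedded newline. The function name is illustrative. With that ending, a span across three lines is converted: when the two-line substitution fails, "b begin" loops back and N reads another line.

```shell
# Illustrative function name. ASSUMPTION: the truncated script ends with
# the P/D loop from Example 6 (":again", then P, then D).
bold_multiline() {
  sed ':begin
/@f1(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@f1(.*/{
N
s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D'
}

printf '%s\n' 'a @f1(x' 'y' 'z) b' | bold_multiline
```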
