Linux text processing tools and regular expressions


2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Tools for extracting text

  View file contents: cat, less, more
  Take part of a file: head, tail
  Extract columns: cut
  Sort and count: sort, wc
  Extract by keyword: grep

File viewing commands: cat, nl, tac, rev

cat [OPTION]... [FILE]...
  -E: show the line terminator as $
  -n: number every displayed line
  -A: show all control characters
  -b: number non-blank lines only
  -s: squeeze consecutive blank lines into one

nl: display the file with line numbers
tac: concatenate and print files with the lines in reverse order
rev: print each line with its characters reversed

Viewing file contents page by page

more: view a file one page at a time
more [OPTIONS...] FILE...
  -d: show the page-turn and quit prompts

less: view a file or STDIN output one page at a time. Useful commands inside less:
  /text: search for text
  n / N: jump to the next / previous match
less is also the pager that the man command uses.

Displaying the beginning or end of a text

head [OPTION]... [FILE]...
  -c #: the first # bytes
  -n #: the first # lines
  -#: same as -n #

tail [OPTION]... [FILE]...
  -c #: the last # bytes
  -n #: the last # lines
  -#: same as -n #
  -f: follow the file descriptor and show newly appended content, commonly used to monitor logs; equivalent to --follow=descriptor
  -F: follow the file name; equivalent to --follow=name --retry

tailf: similar to tail -f, but it does not access the file while the file is not growing

Extracting text by column with cut and merging files with paste

cut [OPTION]... [FILE]...
  -d DELIMITER: the field delimiter, TAB by default
  -f FIELDS: the fields to extract
      #: field number #
      #,#[,#]: several separate fields, for example 1,3,6
      #-#: a range of consecutive fields, for example 1-6
      mixed: 1-3,7
  -c: cut by character position
  --output-delimiter=STRING: the delimiter used in the output

Display the specified columns of a file or of STDIN data:
  cut -d: -f1 /etc/passwd
  cat /etc/passwd | cut -d: -f7
  cut -c2-5 /usr/share/dict/words

paste merges the lines with the same line number from two files into one line.
paste [OPTION]... [FILE]...
  -d DELIMITER: the delimiter, TAB by default
  -s: join all lines of each file into a single line
Examples:
  paste f1 f2
  paste -s f1 f2

Tools for analyzing text

  Collect text statistics: wc
  Sort text: sort
  Compare files: diff and patch
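The extraction commands described above (head, tail, cut, paste) can be tried safely on a throwaway file. The following is only a minimal sketch: the file names sample.txt, names.txt and shells.txt and the passwd-style records are made up for illustration, and the script works in a temporary directory rather than touching /etc/passwd.

```shell
#!/bin/sh
# Sketch only: sample.txt, names.txt and shells.txt are hypothetical throwaway files.
workdir=$(mktemp -d)

# Five colon-separated records in the style of /etc/passwd (made-up data).
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/bin/zsh' \
  'carol:x:1002:1002:Carol:/home/carol:/bin/bash' > "$workdir/sample.txt"

# head -n 2 keeps the first two lines, tail -n 1 keeps the last line.
first_two=$(head -n 2 "$workdir/sample.txt")
last_one=$(tail -n 1 "$workdir/sample.txt")

# cut: field 1 alone, fields 1 and 7 together, and characters 1-3 of each line.
names=$(cut -d: -f1 "$workdir/sample.txt")
name_and_shell=$(cut -d: -f1,7 "$workdir/sample.txt")
prefixes=$(cut -c1-3 "$workdir/sample.txt")

# paste: put two column files side by side, then join one file into a single line.
cut -d: -f1 "$workdir/sample.txt" > "$workdir/names.txt"
cut -d: -f7 "$workdir/sample.txt" > "$workdir/shells.txt"
side_by_side=$(paste -d' ' "$workdir/names.txt" "$workdir/shells.txt")
one_line=$(paste -s -d, "$workdir/names.txt")

printf 'first two lines:\n%s\n' "$first_two"
printf 'last line:\n%s\n' "$last_one"
printf 'names:\n%s\n' "$names"
printf 'name and shell:\n%s\n' "$name_and_shell"
printf 'first three characters:\n%s\n' "$prefixes"
printf 'paste side by side:\n%s\n' "$side_by_side"
printf 'paste -s:\n%s\n' "$one_line"

rm -rf "$workdir"
```

Note that cut keeps the input delimiter between the selected fields, while paste -d chooses the delimiter placed between the merged columns.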

wc can count the total number of lines, words, bytes, and characters in a file.

It can count data from files or from STDIN.
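As a small sketch of the difference between counting a file and counting STDIN, the script below runs wc both ways. The file sample.txt and its contents are made up for illustration; the tr call is only there because some wc implementations pad the number with spaces.

```shell
#!/bin/sh
# Sketch only: sample.txt is a hypothetical throwaway file.
workdir=$(mktemp -d)
printf 'one two three\nfour five\n\nsix\n' > "$workdir/sample.txt"

# Given a file argument, wc prints the counts followed by the file name.
wc "$workdir/sample.txt"

# Reading STDIN (redirect or pipe), wc prints the number without a file name.
line_count=$(wc -l < "$workdir/sample.txt" | tr -d ' ')
word_count=$(wc -w < "$workdir/sample.txt" | tr -d ' ')
byte_count=$(cat "$workdir/sample.txt" | wc -c | tr -d ' ')

echo "lines=$line_count words=$word_count bytes=$byte_count"

rm -rf "$workdir"
```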

Example:
  wc story.txt
  39 237 1901 story.txt
The three numbers are the line count, the word count, and the byte count.

Common options:
  -l: count lines only
  -w: count words only
  -c: count bytes only
  -m: count characters only
  -L: show the length of the longest line in the file

Sorting text: sort

sort writes the sorted text to STDOUT and does not change the original file.
sort [options] file(s)
Common options:
  -r: sort in reverse order
  -R: sort randomly
  -n: sort by numeric value
  -f: fold case, that is, ignore upper and lower case when comparing
  -u: unique, remove duplicate lines from the output
  -t c: use c as the field delimiter
  -k #: sort on field #, with fields separated by the -t character; can be given more than once

uniq: remove duplicate lines from the input
uniq [OPTION]... [FILE]...
  -c: show how many times each line is repeated
  -d: show only the repeated lines
  -u: show only the lines that are not repeated
Note: only consecutive, identical lines count as repeated, so uniq is usually combined with sort:
  sort userlist.txt | uniq -c

Comparing files

diff compares two files and reports the differences:
  diff foo.conf foo2.conf
  5c5
  < use_widgets = no
  ---
  > use_widgets = yes
This output means that line 5 differs between the two files (c stands for change).

Copying the changes to another file: patch

The output of diff can be saved in a file called a "patch". Use the -u option to produce a "unified" diff, the format best suited to patch files.
patch copies the changes recorded in a patch file to another file (use with caution). The -b option automatically backs up the file being changed.
  diff -u foo.conf foo2.conf > foo.patch
  patch -b foo.conf foo.patch

grep: text filtering by pattern

Related commands: grep, egrep, fgrep (fgrep does not support regular expression search).
Function: grep is a text search tool. It checks the target text line by line against the pattern specified by the user and prints the matching lines.
Pattern: the filter condition, written with regular expression characters and ordinary text characters.

grep [OPTIONS] PATTERN [FILE...]
  grep root /etc/passwd
  grep "$USER" /etc/passwd
  grep '$USER' /etc/passwd
  grep `whoami` /etc/passwd
With double quotes the shell expands $USER before grep runs; with single quotes grep receives the literal text $USER.

grep command options:
  --color=auto: highlight the matched text
  -m #: stop after # matching lines
  -v: show the lines that do not match the pattern
  -i: ignore case
  -n: show the line number of each match
  -c: count the matching lines
  -o: show only the matched string
  -q: quiet mode, output nothing
  -A #: after, also show the # lines following a match
  -B #: before, also show the # lines preceding a match
  -C #: context, also show # lines before and after a match
  -e: specify a pattern; several -e options are combined with logical OR
      grep -e 'cat' -e 'dog' file
  -w: match whole words only
  -E: use ERE
  -F: same as fgrep, no regular expression support
  -f file: read the patterns from a file

Regular expressions

REGEXP (Regular Expressions) are patterns written with a class of special characters together with ordinary text characters. Some of these characters (the metacharacters) do not stand for their literal meaning but for a control or wildcard function.
Programs that support regular expressions include vim, less, grep, sed, awk, nginx, varnish, and others.
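The sort and uniq pairing and several of the grep options described above can be tried on throwaway data. This is a minimal sketch, not a reference: userlist.txt is the file name used in the text, but its contents here, like animals.txt, are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the contents of userlist.txt and animals.txt are made up.
workdir=$(mktemp -d)

printf '%s\n' alice bob alice carol bob alice > "$workdir/userlist.txt"

# uniq only collapses adjacent duplicates, so the input is sorted first.
# A second sort (-rn: numeric, reversed) ranks the names by their count.
sort "$workdir/userlist.txt" | uniq -c | sort -rn
top_user=$(sort "$workdir/userlist.txt" | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

# sort -u removes duplicates directly; wc -l counts what is left.
distinct_users=$(sort -u "$workdir/userlist.txt" | wc -l | tr -d ' ')

printf '%s\n' 'cat sat' 'concatenate' 'Dog barked' 'bird' > "$workdir/animals.txt"

# -e ... -e ... is a logical OR; -c counts matching lines; -i ignores case.
either=$(grep -c -e 'cat' -e 'dog' "$workdir/animals.txt")
either_nocase=$(grep -c -i -e 'cat' -e 'dog' "$workdir/animals.txt")

# -w matches whole words only, so "concatenate" is excluded.
whole_word=$(grep -w 'cat' "$workdir/animals.txt")

# -v inverts the match; -n prefixes the line number.
neither=$(grep -v -i -e 'cat' -e 'dog' "$workdir/animals.txt")
numbered=$(grep -n 'bird' "$workdir/animals.txt")

# -q prints nothing; only the exit status says whether a match was found.
if grep -q 'bird' "$workdir/animals.txt"; then found=yes; else found=no; fi

echo "top=$top_user distinct=$distinct_users either=$either either_nocase=$either_nocase"
echo "whole_word=$whole_word neither=$neither numbered=$numbered found=$found"

rm -rf "$workdir"
```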
Regular expressions fall into two categories:
  Basic regular expressions (BRE): grep, vim
  Extended regular expressions (ERE): grep -E, egrep, nginx

Regular expression engine: the software module that checks text against regular expression patterns. Different engines use different algorithms; one example is PCRE (Perl Compatible Regular Expressions).

Metacharacters are classified into character matching, number of matches, position anchors, and grouping. See man 7 regex.

Basic regular expression metacharacters

Character matching:
  .            any single character
  []           any single character within the specified range, for example [wang] [0-9] [a-z] [a-zA-Z]
  [^]          any single character outside the specified range
  [:alnum:]    letters and digits
  [:alpha:]    any upper or lower case letter, that is, A-Z and a-z
  [:lower:]    lower case letters
  [:upper:]    upper case letters
  [:blank:]    blank characters (space and tab)
  [:space:]    horizontal and vertical white space (a wider set than [:blank:])
  [:cntrl:]    non-printable control characters (backspace, delete, bell, ...)
  [:digit:]    decimal digits
  [:xdigit:]   hexadecimal digits
  [:graph:]    printable non-blank characters
  [:print:]    printable characters
  [:punct:]    punctuation

Number of matches: placed after a character to specify how many times that preceding character may appear.
  *            the preceding character any number of times, including 0 (greedy: matches as much as possible)
  .*           any characters of any length
  \?           the preceding character 0 or 1 times
  \+           the preceding character at least once
  \{n\}        the preceding character exactly n times
  \{m,n\}      the preceding character at least m and at most n times
  \{,n\}       the preceding character at most n times
  \{n,\}       the preceding character at least n times

Position anchors: fix the position where the match must occur.
  ^                 start-of-line anchor, placed at the far left of the pattern
  $                 end-of-line anchor, placed at the far right of the pattern
  ^PATTERN$         the pattern must match the entire line
  ^$                an empty line
  ^[[:space:]]*$    a blank line (empty or white space only)
  \< or \b          start-of-word anchor, placed on the left side of a word pattern
  \> or \b          end-of-word anchor, placed on the right side of a word pattern
  \<PATTERN\>       matches the entire word

Grouping: \(\) binds one or more characters together so that they are treated as a whole, for example \(root\)\+
The text matched by the pattern inside each pair of parentheses is recorded by the regular expression engine in internal variables named \1, \2, \3, ...
\1 stands for the text matched by the pattern between the first left parenthesis, counting from the left, and its matching right parenthesis.
Example: \(string1\(string2\)\)
  \1: string1\(string2\)
  \2: string2
Back reference: refers to the text that the pattern in an earlier group actually matched, not to the pattern itself.

Or: \|
  a\|b           a or b
  C\|cat         C or cat
  \(C\|c\)at     Cat or cat

egrep and extended regular expressions

egrep = grep -E
egrep [OPTIONS] PATTERN [FILE...]

Extended regular expression metacharacters

Character matching:
  .      any single character
  []     a character within the specified range
  [^]    a character outside the specified range
Number of matches:
  *      the preceding character any number of times
  ?      0 or 1 times
  +      1 or more times
  {m}    exactly m times
  {m,n}  at least m and at most n times
Position anchors:
  ^          start of line
  $          end of line
  \<, \b     start of word
  \>, \b     end of word
Grouping: ()
Back reference: \1, \2, ...
Or:
  a|b        a or b
  C|cat      C or cat
  (C|c)at    Cat or cat
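The anchors, repetition counts, back references, and the BRE versus ERE spelling described above can be checked with grep on a few sample lines. This is a sketch under stated assumptions: regex.txt and its contents are made up for illustration, and the word anchors \< and \> are assumed to be available as in GNU grep.

```shell
#!/bin/sh
# Sketch only: regex.txt is a hypothetical throwaway file with made-up lines.
workdir=$(mktemp -d)

printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'chroot is not a user' \
  '' \
  '   ' \
  'He loves his lover' \
  'He likes his lover' \
  'Cat and cat' > "$workdir/regex.txt"

# ^$ matches only truly empty lines; ^[[:space:]]*$ also matches white-space-only lines.
empty_lines=$(grep -c '^$' "$workdir/regex.txt")
blank_lines=$(grep -c '^[[:space:]]*$' "$workdir/regex.txt")

# Without anchors "root" also matches inside "chroot"; \<root\> requires a whole word.
any_root=$(grep -c 'root' "$workdir/regex.txt")
word_root=$(grep -c '\<root\>' "$workdir/regex.txt")

# The same repetition count written as BRE (\{2\}) and as ERE ({2}).
bre_repeat=$(grep -c 'ro\{2\}t' "$workdir/regex.txt")
ere_repeat=$(grep -E -c 'ro{2}t' "$workdir/regex.txt")

# Back reference: \1 must repeat the text the group matched, not just the pattern,
# so "loves ... lover" matches but "likes ... lover" does not.
backref=$(grep '\(l..e\).*\1' "$workdir/regex.txt")

# ERE grouping with alternation; -o prints each matched string on its own line.
cat_matches=$(grep -E -o '(C|c)at' "$workdir/regex.txt" | wc -l | tr -d ' ')

echo "empty=$empty_lines blank=$blank_lines any_root=$any_root word_root=$word_root"
echo "bre_repeat=$bre_repeat ere_repeat=$ere_repeat cat_matches=$cat_matches"
echo "backref=$backref"

rm -rf "$workdir"
```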

