2025-03-30 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains a practical application of the gawk gsub function. The method is simple, fast, and practical.
A data-cleaning task required finding duplicate rows whose fields match across two tables. The general idea is to use an EXISTS subquery, similar to:
select *
from a
where exists (select 1
from b
where a.col1 = b.col1
and a.col2 = b.col2)
But the trouble here is that there are too many columns to match:
a.INFOCODE,a.SOURCENAME,a.SOURCETYPE,a.PUBLISHTYPE,a.NOTICEDATE,a.ENDDATE,a.NOTICETITLE,a.LANGUAGE,a.IMPORTLEVEL,a.SOURCEURL,a.ATTACHTYPE,a.ATTACHNAME,a.ATTACHSIZE,a.FORM,a.ACCESSORYNUM,a.NOTICESTATE,a.PUBLISHDATE,a.FILENUMBER
Solve this problem with Linux text processing:
First, put the column list into a text file (here, a file named 1):
root@bd-dev-mingshuo-183:/tmp# more 1
a.INFOCODE,a.SOURCENAME,a.SOURCETYPE,a.PUBLISHTYPE,a.NOTICEDATE,a.ENDDATE,a.NOTICETITLE,a.LANGUAGE,a.IMPORTLEVEL,a.SOURCEURL,a.ATTACHTYPE,a.ATTACHNAME,a.ATTACHSIZE,a.FORM,a.ACCESSORYNUM,a.NOTICESTATE,a.PUBLISHDATE,a.FILENUMBER
Next, a look at the gsub function in gawk.
gsub finds every match of the regular expression and replaces it, which is equivalent to the g flag of sed's s command (s/.../.../g).
The syntax is as follows:
gsub(regular expression, substitution string, target string)
The target of the processing is the third parameter. The first parameter is the match condition, and each match is replaced with the second parameter.
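A minimal sketch of these semantics follows. The sample strings and variable names are made up for illustration, and the `AWK` lookup is an assumption added so the snippet also runs where gawk is not installed. It shows three things: gsub returns the number of substitutions it made, it modifies the target in place (the target defaults to $0 when the third argument is omitted), and when gsub is used as a bare pattern only records with at least one substitution are printed.

```shell
# Prefer gawk; fall back to any POSIX awk (gsub behaves the same in this case).
AWK=$(command -v gawk || command -v awk)

line='a.col1,a.col2,a.col3'   # hypothetical sample input

# gsub returns how many matches it replaced.
count=$(printf '%s\n' "$line" | "$AWK" '{ n = gsub(/,/, ";", $0); print n }')

# The target is modified in place; with no third argument the target is $0.
replaced=$(printf '%s\n' "$line" | "$AWK" '{ gsub(/,/, ";"); print }')

# As a bare pattern, gsub is true only when it replaced something,
# so the line without a comma is silently dropped.
filtered=$(printf 'a.col1,a.col2\nno_comma_here\n' | "$AWK" 'gsub(/,/, ";")')

echo "$count"
echo "$replaced"
echo "$filtered"
```

If lines without a match must be kept, the action form `{ gsub(...); print }` is the safer choice.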
Process one line of text as multiple lines of text:
root@bd-dev-mingshuo-183:/tmp# more 1 | gawk 'gsub(/,/, "\n", $0)'
a.INFOCODE
a.SOURCENAME
a.SOURCETYPE
a.PUBLISHTYPE
a.NOTICEDATE
a.ENDDATE
a.NOTICETITLE
a.LANGUAGE
a.IMPORTLEVEL
a.SOURCEURL
a.ATTACHTYPE
a.ATTACHNAME
a.ATTACHSIZE
a.FORM
a.ACCESSORYNUM
a.NOTICESTATE
a.PUBLISHDATE
a.FILENUMBER
Copy each column:
root@bd-dev-mingshuo-183:/tmp# more 1 | gawk 'gsub(/,/, "\n", $0)' | gawk -F'\n' '{print "on", $0, "=", $0, "and"}'
on a.INFOCODE = a.INFOCODE and
on a.SOURCENAME = a.SOURCENAME and
on a.SOURCETYPE = a.SOURCETYPE and
on a.PUBLISHTYPE = a.PUBLISHTYPE and
on a.NOTICEDATE = a.NOTICEDATE and
on a.ENDDATE = a.ENDDATE and
on a.NOTICETITLE = a.NOTICETITLE and
on a.LANGUAGE = a.LANGUAGE and
on a.IMPORTLEVEL = a.IMPORTLEVEL and
on a.SOURCEURL = a.SOURCEURL and
on a.ATTACHTYPE = a.ATTACHTYPE and
on a.ATTACHNAME = a.ATTACHNAME and
on a.ATTACHSIZE = a.ATTACHSIZE and
on a.FORM = a.FORM and
on a.ACCESSORYNUM = a.ACCESSORYNUM and
on a.NOTICESTATE = a.NOTICESTATE and
on a.PUBLISHDATE = a.PUBLISHDATE and
on a.FILENUMBER = a.FILENUMBER and
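The split step and the copy step can also be done in a single awk pass. The following is only a sketch under the assumption that the column list sits on one comma-separated line: it sets the field separator to a comma and loops over the fields instead of turning commas into newlines first. The shortened column list, the variable names, and the `AWK` fallback are illustrative, and the leading "on" is left out.

```shell
# Prefer gawk; fall back to any POSIX awk.
AWK=$(command -v gawk || command -v awk)

cols='a.INFOCODE,a.SOURCENAME,a.SOURCETYPE'   # shortened sample of the column list

# -F, splits the record on commas; each field is printed twice around "=".
pairs=$(printf '%s\n' "$cols" |
  "$AWK" -F, '{ for (i = 1; i <= NF; i++) print $i, "=", $i, "and" }')

echo "$pairs"
```

This avoids the second gawk process, at the cost of no longer demonstrating gsub itself.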
Replace the second a. with b.:
root@bd-dev-mingshuo-183:/tmp# more 1 | gawk 'gsub(/,/, "\n", $0)' | gawk -F'\n' '{print $0, "=", $0, "and"}' | sed 's/= a/= b/'
a.INFOCODE = b.INFOCODE and
a.SOURCENAME = b.SOURCENAME and
a.SOURCETYPE = b.SOURCETYPE and
a.PUBLISHTYPE = b.PUBLISHTYPE and
a.NOTICEDATE = b.NOTICEDATE and
a.ENDDATE = b.ENDDATE and
a.NOTICETITLE = b.NOTICETITLE and
a.LANGUAGE = b.LANGUAGE and
a.IMPORTLEVEL = b.IMPORTLEVEL and
a.SOURCEURL = b.SOURCEURL and
a.ATTACHTYPE = b.ATTACHTYPE and
a.ATTACHNAME = b.ATTACHNAME and
a.ATTACHSIZE = b.ATTACHSIZE and
a.FORM = b.FORM and
a.ACCESSORYNUM = b.ACCESSORYNUM and
a.NOTICESTATE = b.NOTICESTATE and
a.PUBLISHDATE = b.PUBLISHDATE and
a.FILENUMBER = b.FILENUMBER and
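The last line of this output still ends in "and", which has to be deleted before the conditions are pasted into the SQL statement. As a hedged alternative, one awk pass can produce the final conditions directly. The sketch below assumes every column is prefixed with the alias a. and uses sub on a copy of each field to swap the alias; the shortened column list, the variable names, and the `AWK` fallback are illustrative assumptions, not part of the original procedure.

```shell
# Prefer gawk; fall back to any POSIX awk.
AWK=$(command -v gawk || command -v awk)

cols='a.INFOCODE,a.SOURCENAME,a.SOURCETYPE'   # shortened sample of the column list

predicate=$(printf '%s\n' "$cols" | "$AWK" -F, '{
  for (i = 1; i <= NF; i++) {
    right = $i
    sub(/^a\./, "b.", right)   # swap the alias only at the start of the name
    # Append " and" to every condition except the last one.
    printf "%s = %s%s\n", $i, right, (i < NF ? " and" : "")
  }
}')

echo "$predicate"
```

Anchoring the pattern with ^ keeps the substitution from touching any other "a." that might appear later in a column name.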
The processing itself is relatively simple; the focus is on the use of the gsub function in gawk and on the overall approach.