2025-10-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to delete lines that duplicate a given field in a large data file under Linux.
I recently wrote a data-acquisition program that produces a file of more than 10 million rows, each made up of four fields. The requirement was to delete every row whose second field duplicates that of an earlier row. I looked for a suitable Linux tool and did not find one: as far as I could tell, stream-processing tools such as sed and gawk work on one line at a time and could not pick out rows with a repeated field. It seemed I had no choice but to write a Python program, until I remembered MySQL and took a detour through the database:
1. Import the data into a table with mysqlimport --local dbname data.txt. The table must already exist, and its name must match the file name without the extension (here, data).
2. Execute the following SQL statements (the field that must be unique is called uniqfield here).
The code is as follows:
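use dbname;
-- Number every row; MySQL requires an auto_increment column to be defined as a key.
alter table tablename add rowid int not null auto_increment unique;
-- Keep the smallest rowid for each distinct uniqfield value.
create table t select min(rowid) as rowid from tablename group by uniqfield;
-- Copy only the surviving rows into a new table.
create table t2 select tablename.* from tablename, t where tablename.rowid = t.rowid;
-- Replace the original table with the deduplicated copy.
drop table tablename;
rename table t2 to tablename;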
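The same sequence of steps can be tried on a small sample before running it against the full table. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so the syntax differs slightly (create table ... as select, alter table ... rename to), and the function name dedupe_by_field and the column names f1, f3 and f4 are hypothetical. Only tablename, rowid, t, t2 and uniqfield come from the statements above.

```python
import sqlite3


def dedupe_by_field(rows):
    """Keep the first row for each distinct uniqfield value (the second field)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical four-column table; uniqfield is the second data column.
    # The autoincrementing rowid column plays the role of the ALTER TABLE step.
    con.execute(
        "create table tablename ("
        "rowid integer primary key autoincrement, "
        "f1 text, uniqfield text, f3 text, f4 text)"
    )
    con.executemany(
        "insert into tablename (f1, uniqfield, f3, f4) values (?, ?, ?, ?)",
        rows,
    )
    # Smallest rowid for each distinct uniqfield value.
    con.execute(
        "create table t as "
        "select min(rowid) as rowid from tablename group by uniqfield"
    )
    # Copy the surviving rows, then swap the new table in for the old one.
    con.execute(
        "create table t2 as "
        "select tablename.* from tablename, t where tablename.rowid = t.rowid"
    )
    con.execute("drop table tablename")
    con.execute("alter table t2 rename to tablename")
    result = con.execute(
        "select f1, uniqfield, f3, f4 from tablename order by rowid"
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    sample = [
        ("a", "x", "1", "2"),
        ("b", "y", "3", "4"),
        ("c", "x", "5", "6"),  # repeats uniqfield "x", so it is dropped
    ]
    for row in dedupe_by_field(sample):
        print(row)
```

Because min(rowid) is taken per group, the row that survives for each uniqfield value is the one that appeared first in the input.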