
How to traverse files in a specific directory in Python to extract specified information

2025-02-23 Update From: SLTechnology News&Howtos shulou


Shulou(Shulou.com)06/01 Report--

This article explains how to traverse the files in a specific directory in Python and extract specified information from them. The approach is quite practical, so it is shared here for reference.

Requirement

Traverse the files in a directory (text / CSV files that contain URLs using the http or https protocol), extract the domain names they contain, and write the result out again.
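To make the extraction step concrete, here is a minimal sketch that works on a single string rather than on files. It assumes Python 3; the name `extract_domains`, the pattern, and the sample data are illustrative and not taken from the article. It also uses `hostname`, which drops any port number, which may or may not be what you want.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: http/https URLs, stopping at whitespace, commas and quotes.
URL_PATTERN = re.compile(r'https?://[^\s,"\']+')


def extract_domains(text):
    """Return the set of host names found in http/https URLs inside text."""
    domains = set()
    for url in URL_PATTERN.findall(text):
        try:
            host = urlsplit(url).hostname  # lower-cased host without the port
        except ValueError:
            continue  # urlsplit rejects some malformed hosts; skip them
        if host:
            domains.add(host)
    return domains


# Illustrative CSV-like sample; the same domain appears twice.
sample = (
    'id,link\n'
    '1,https://www.example.com/a?x=1\n'
    '2,http://example.org:8080/b\n'
    '3,https://www.example.com/c\n'
)
print(sorted(extract_domains(sample)))  # ['example.org', 'www.example.com']
```

Collecting into a set is what removes the repeated domain; sorting is only there to make the printed order stable.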

Code

# coding: utf-8
# author: Duckweeds7
import os
import re
from urllib.parse import urlsplit


def splitSign(str1):
    # Remove extra symbols and extract the domain name part; adjust as needed.
    str2 = str1.replace(',', '')
    # urlsplit().netloc is the Python 3 counterpart of the Python 2 helpers
    # urllib.splittype() / urllib.splithost().
    try:
        return urlsplit(str2).netloc
    except ValueError:  # raised for malformed bracketed hosts
        return ''


def text_save(filename, data):
    # filename is the path of the CSV file; data is the list of items to write.
    with open(filename, 'a') as file:  # 'a' appends, 'w' overwrites
        for item in data:
            # Remove [ and ]; these two lines are optional, depending on the data.
            s = str(item).replace('[', '').replace(']', '')
            # Remove single quotes and commas, then end each line with a newline.
            s = s.replace("'", '').replace(',', '') + '\n'
            file.write(s)
    print("Complete")


def walkFile(file):
    regex = re.compile(r'[a-zA-Z]+://[^\s]*')
    all_urls = []
    for root, dirs, files in os.walk(file):
        # root is the folder currently being visited
        # dirs lists the subdirectory names in that folder
        # files lists the file names in that folder
        for f in files:
            # files holds bare names, so join each with root to get the full path
            with open(os.path.join(root, f)) as f_obj:
                get_urls = regex.findall(f_obj.read())  # extract URLs with the regex
            # map() applies splitSign to every item in get_urls
            all_urls.extend(map(splitSign, get_urls))
    set_urls = set(all_urls)  # a set removes duplicates
    # the output file name needs to be an absolute path
    text_save('E:\\test\\test.csv', list(set_urls))


if __name__ == '__main__':
    walkFile('E:\\test')  # the folder to process
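One detail of the listing is worth noting: it appends its results to test.csv inside the same E:\test folder that it scans, so a second run would also read its own output. The following is a minimal sketch of one way around that, assuming Python 3. The names `collect_domains` and `write_domains`, the `skip` parameter, and the choice to ignore undecodable bytes are assumptions for illustration, not part of the original code. The pattern also stops at commas, so two URLs in adjacent CSV fields are treated separately.

```python
import csv
import re
from pathlib import Path
from urllib.parse import urlsplit

# Same idea as the article's pattern, but a comma also ends a URL (assumption).
URL_PATTERN = re.compile(r'[a-zA-Z]+://[^\s,]+')


def collect_domains(root, skip=None):
    """Walk root recursively and return the set of URL hosts found in its files."""
    skip = Path(skip).resolve() if skip else None
    domains = set()
    for path in Path(root).rglob('*'):
        # Ignore directories and the output file itself.
        if not path.is_file() or path.resolve() == skip:
            continue
        # Assumption: files are roughly UTF-8; undecodable bytes are dropped.
        text = path.read_text(encoding='utf-8', errors='ignore')
        for url in URL_PATTERN.findall(text):
            try:
                host = urlsplit(url).netloc  # host, including any port
            except ValueError:
                continue  # skip URLs that urlsplit rejects
            if host:
                domains.add(host)
    return domains


def write_domains(filename, domains):
    """Overwrite filename with one domain per row, sorted for stable output."""
    with open(filename, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        for domain in sorted(domains):
            writer.writerow([domain])
```

A call would look like `write_domains(out, collect_domains(folder, skip=out))`, where `out` and `folder` are your own paths. Opening the file with 'w' instead of 'a' means each run replaces the previous result rather than adding to it.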
That concludes this article on how to traverse files in a specific directory in Python to extract specified information. I hope it is of some help to you; if you found it useful, feel free to share it so that more people can see it.


© 2024 shulou.com SLNews company. All rights reserved.
