2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article walks through examples of traversing directories and files with Python, reading those files, and merging tens of millions of rows of data. The content is detailed; interested readers can use it as a reference, and I hope it is helpful to you.
1. Use Python to identify files and folders
Recursion: mainly used to traverse folders and the files inside them
Check whether each entry is a folder or a file, and what type of file it is
First, traverse the folder to see what kinds of files it contains, and print every entry in it.
    import os

    path = "./data"
    # os.listdir() returns a list of the file and folder names in the given folder
    files = os.listdir(path)
    for file in files:
        print(file)
        # os.path.isfile(path) checks whether the path is a file
        if os.path.isfile(path + "/" + file):
            print(file + " is a file")
            # os.path.splitext() splits a name into a (name, extension) tuple
            filename, extension = os.path.splitext(file)
            if extension == ".txt":
                print(filename + " is a text file")
            elif extension == ".xlsx":
                print(filename + " is an Excel file")
        # os.path.isdir(path) checks whether the path is a folder
        if os.path.isdir(path + "/" + file):
            print(file + " is a folder")
Read result:
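As a side note, the same check can be sketched with the standard library's pathlib module instead of string concatenation. This is only an illustrative alternative, not the article's method: the KINDS mapping, the classify function, and the throwaway demo folder are assumptions made up for the example.

```python
import tempfile
from pathlib import Path

# Illustrative labels only (an assumption); extend the mapping as needed.
KINDS = {".txt": "text file", ".xlsx": "Excel file", ".csv": "CSV file"}


def classify(folder):
    """Map each direct child of `folder` to a short description."""
    result = {}
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_dir():
            result[entry.name] = "folder"
        elif entry.is_file():
            # Path.suffix is the final extension, dot included (e.g. ".txt").
            result[entry.name] = KINDS.get(entry.suffix.lower(), "other file")
    return result


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("hello")
    (root / "sales.xlsx").write_bytes(b"")
    (root / "archive").mkdir()
    demo = classify(root)
    for name, label in demo.items():
        print(f"{name}: {label}")
```

To run it against the article's folder, call classify("./data") instead of using the temporary directory.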
2. Use Python to get all the files and folders and read the corresponding files
Once we can traverse folders, how do we quickly read the specific files we need and work more efficiently?
Just import the pandas package and, building on the code above, call read_excel (or read_csv / read_table, depending on the extension) on the files we need.
    import os
    import pandas as pd

    path = "./data"

    def get_all_files(path):
        print("-" * 25 + "function is called" + "-" * 25)
        # os.listdir() returns a list of the file and folder names in the given folder
        files = os.listdir(path)
        for file in files:
            # os.path.isfile(path) checks whether the path is a file
            if os.path.isfile(path + "/" + file):
                print(file + " > is a file")
                # os.path.splitext() splits a name into a (name, extension) tuple
                filename, extension = os.path.splitext(file)
                if extension == ".txt":
                    print(filename + " # is a text file #")
                    print("read the contents of the " + filename + " file.")
                    data = pd.read_table(path + "/" + file)
                    print(data)
                elif extension == ".xlsx":
                    print(filename + " # is an Excel file #")
                    print("read the contents of the " + filename + " file.")
                    data = pd.read_excel(path + "/" + file)
                    print(data)
                elif extension == ".csv":
                    print(filename + " # is a csv file #")
                    print("read the contents of the " + filename + " file.")
                    data = pd.read_csv(path + "/" + file)
                    print(data)
            if os.path.isdir(path + "/" + file):
                print(file + " is a folder")
                # Recurse into the sub-folder
                get_all_files(path + "/" + file)

    get_all_files(path)
Read successfully!
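The three branches above differ only in which pandas reader they call, so one possible variation is a lookup table from extension to reader. The sketch below is a hedged illustration rather than the article's code: READERS, read_all, and the temporary demo files are hypothetical names, and pd.read_excel additionally assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Extension -> pandas reader (hypothetical table for this sketch).
# pd.read_excel needs an Excel engine such as openpyxl to be installed.
READERS = {
    ".txt": pd.read_table,
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
}


def read_all(path):
    """Recursively read every supported file under `path`.

    Returns a dict mapping file path -> DataFrame.
    """
    frames = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            frames.update(read_all(full))  # recurse into sub-folders
        else:
            reader = READERS.get(os.path.splitext(name)[1].lower())
            if reader is not None:  # skip unsupported extensions
                frames[full] = reader(full)
    return frames


# Demo on a throwaway directory with one CSV and one tab-separated text file.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.csv"), "w") as f:
        f.write("x,y\n1,2\n")
    with open(os.path.join(tmp, "sub", "b.txt"), "w") as f:
        f.write("x\ty\n3\t4\n")
    loaded = read_all(tmp)
    shapes = {os.path.basename(p): df.shape for p, df in loaded.items()}
    print(shapes)
```

Returning the DataFrames instead of only printing them makes the result reusable, for example as input to a later merge step.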
3. Use Python to merge data
We deal with a lot of tables in our daily work. How can the tables spread across many folders be merged in one batch?
Key points:
How to use DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=None). Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pd.concat accepts the same ignore_index, verify_integrity and sort options.
other: the data to append. append is not picky: other can be a DataFrame, a dict, a Series, a list, and so on.
ignore_index: when True, the old indexes are ignored and the merged data is re-indexed in order as 0, 1, 2, 3, ...
verify_integrity: when True, an error is raised if the merged result would contain duplicate index values.
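The effect of ignore_index and verify_integrity can be seen on two tiny tables. The sketch below uses pd.concat, which takes the same two options and also works on pandas versions where DataFrame.append is no longer available; the frames a and b are made-up sample data.

```python
import pandas as pd

# Made-up sample data: both frames carry the default index labels 0 and 1.
a = pd.DataFrame({"id": [1, 2], "amount": [10, 20]})
b = pd.DataFrame({"id": [3, 4], "amount": [30, 40]})

# By default the old indexes are kept, so labels 0 and 1 appear twice.
kept = pd.concat([a, b])

# ignore_index=True discards the old labels and renumbers the rows.
renumbered = pd.concat([a, b], ignore_index=True)

# verify_integrity=True raises ValueError when index labels collide.
try:
    pd.concat([a, b], verify_integrity=True)
    collided = False
except ValueError:
    collided = True

print(list(kept.index), list(renumbered.index), collided)
```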
    import os
    import pandas as pd

    path = "./project_data"

    # Declare an empty DataFrame that will hold the final merged data
    final_data = pd.DataFrame()

    def get_all_files(path):
        global final_data
        print("-" * 20 + "function is called" + "-" * 20)
        files = os.listdir(path)
        for file in files:
            if os.path.isfile(path + "/" + file):
                print(file + " > is a file")
                filename, extension = os.path.splitext(file)
                # Determine the file type from its extension
                if extension == ".txt":
                    print(filename + " # is a text file #")
                    print("read the contents of the " + filename + " file.")
                    data = pd.read_table(path + "/" + file)
                    print(data)
                elif extension == ".xlsx":
                    print(filename + " # is an Excel file #")
                    print("read the contents of the " + filename + " file.")
                    data = pd.read_excel(path + "/" + file)
                    print(data)
                elif extension == ".csv":
                    print(filename + " is a csv file, the type to be processed this time")
                    # Get the file content
                    file_data = pd.read_csv(path + "/" + file)
                    # DataFrame.append needs pandas < 2.0; on newer versions use
                    # pd.concat([final_data, file_data], ignore_index=True)
                    final_data = final_data.append(file_data, ignore_index=True)
                    print("merge " + filename + " file data")
            # Determine whether it is a folder
            elif os.path.isdir(path + "/" + file):
                print(file + " is a folder")
                get_all_files(path + "/" + file)

    get_all_files(path)
    print("data merge complete")
Run the merge, then take a look at the merged data:
In total there are more than 10 million rows of data. Merging this many tables in Excel would take a great deal of time, and Excel would become very sluggish.
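At this scale, one common variation is to collect the per-file DataFrames in a list and combine them once at the end, because appending inside the loop copies the growing result on every file. The following is a minimal sketch under that assumption, not the article's implementation: merge_csv_files and the temporary demo folder are hypothetical, and os.walk stands in for the hand-written recursion.

```python
import os
import tempfile

import pandas as pd


def merge_csv_files(root):
    """Read every .csv under `root` and return one combined DataFrame."""
    frames = []
    # os.walk visits `root` and all of its sub-folders.
    for folder, _subfolders, names in os.walk(root):
        for name in sorted(names):
            if name.lower().endswith(".csv"):
                frames.append(pd.read_csv(os.path.join(folder, name)))
    if not frames:
        return pd.DataFrame()
    # A single concat at the end avoids re-copying the result per file.
    return pd.concat(frames, ignore_index=True)


# Demo on a throwaway directory with one CSV at the top and one in a sub-folder.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "2024"))
    pd.DataFrame({"id": [1, 2]}).to_csv(os.path.join(tmp, "a.csv"), index=False)
    pd.DataFrame({"id": [3]}).to_csv(os.path.join(tmp, "2024", "b.csv"), index=False)
    merged = merge_csv_files(tmp)
    print(len(merged), list(merged.index))
```

For the article's data, the call would be merge_csv_files("./project_data"). All files are still loaded into memory, so the machine needs enough RAM for the combined table.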
That concludes this example analysis of traversing directories and files, reading them, and merging tens of millions of rows of data with Python. I hope the content above is of some help to you. If you think the article is good, you can share it for more people to see.