2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article shows how to operate on Excel files in batches with Python. I hope you find it useful. Let's go through it together.
Introduction to the os module for batch operations
OS stands for Operating System. In Python, the os module provides functions for interacting with the operating system, that is, with the computer itself. Many of our automation tasks depend on this module.
Basic operations of the os module
Get the current working path
At the beginning of the Python basics section we covered how to install Anaconda and how to write code in Jupyter notebook. But do you know where the code you write in Jupyter notebook is stored on your computer?
Many readers probably don't. It is easy to find out by running the following code in Jupyter notebook:
import os
os.getcwd()
Run the above code to get the following result:
'C:\\Users\\zhangjunhong\\python Library\\Python report Automation'
This is the folder in which the notebook file is stored. On your own machine, the result will be the folder where your code is stored.
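As a small sketch of why the working path matters: relative file names are resolved against it. The file name report.xlsx below is a hypothetical example and not a file from this article.

```python
import os

# The current working directory is where relative paths are resolved from.
cwd = os.getcwd()

# 'report.xlsx' is a hypothetical file name used only for illustration.
relative_path = 'report.xlsx'
absolute_path = os.path.abspath(relative_path)

print(cwd)
print(absolute_path)
```

If a file cannot be found when you pass only its name, comparing the printed absolute path with the real location of the file is usually the quickest check.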
Get all the file names under a folder
We often import local files into Python for processing, and before importing we need to know each file's storage path and file name. With only one or two files we can type the path and file name by hand, but when many files have to be imported, manual input is slow, so we use code to improve efficiency.
There are four Excel files in the following folder:
We can use os.listdir(path) to get the names of everything stored under path (files and sub-folders). The code is as follows:
import os
os.listdir('D:/Data-Science/share/data/test')
Run the above code to get the following result:
['March performance-Zhang Mingming.xlsx', 'Li Dan March performance.xlsx', 'Wang Meiyue-March performance.xlsx', 'Chen Kai March performance.xlsx']
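Because os.listdir returns every entry in the folder, it can help to keep only the Excel files and to build full paths at the same time. The following is a minimal sketch under that assumption; the helper name list_excel_files and the demo file names are hypothetical.

```python
import os
import tempfile

def list_excel_files(folder):
    """Return full paths of the .xlsx files in folder, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # Skip sub-folders and the '~$' lock files Excel creates while a workbook is open.
        if not os.path.isfile(full_path) or name.startswith('~$'):
            continue
        if name.lower().endswith('.xlsx'):
            paths.append(full_path)
    return paths

# Demo in a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ['a.xlsx', 'b.XLSX', 'notes.txt', '~$a.xlsx']:
        open(os.path.join(demo_dir, name), 'w').close()
    os.mkdir(os.path.join(demo_dir, 'archive'))
    found = [os.path.basename(p) for p in list_excel_files(demo_dir)]
    print(found)
```

Using os.path.join instead of string concatenation avoids mistakes with missing or doubled path separators.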
Rename the file name
Renaming files is also a frequent requirement, and we can use os.rename('old_name', 'new_name') to do it. old_name is the old file name, and new_name is the new file name.
Let's first create a new file named test_old under the test folder, and then use the following code to rename test_old to test_new:
os.rename('D:/Data-Science/share/data/test/test_old.xlsx', 'D:/Data-Science/share/data/test/test_new.xlsx')
After running the above code, you can see that the test_old file no longer exists under the test folder; only test_new remains.
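What os.rename does when the new name already exists depends on the platform: on Windows it raises FileExistsError, while on Unix an existing file is replaced silently. A cautious wrapper might look like the sketch below; safe_rename is a hypothetical helper name, and the demo runs in a temporary folder.

```python
import os
import tempfile

def safe_rename(old_path, new_path):
    """Rename old_path to new_path, refusing to overwrite an existing file."""
    if not os.path.exists(old_path):
        raise FileNotFoundError(old_path)
    if os.path.exists(new_path):
        raise FileExistsError(new_path)
    os.rename(old_path, new_path)

with tempfile.TemporaryDirectory() as demo_dir:
    old_path = os.path.join(demo_dir, 'test_old.xlsx')
    new_path = os.path.join(demo_dir, 'test_new.xlsx')
    open(old_path, 'w').close()
    safe_rename(old_path, new_path)
    remaining = os.listdir(demo_dir)
    print(remaining)
```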
Create a folder
When we want to create a new folder under a given path, we can create it manually or use os.mkdir(path); we only need to specify the full path of the new folder.
As shown below, running the following code creates a new folder called test11 under the D:/Data-Science/share/data path:
os.mkdir('D:/Data-Science/share/data/test11')
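os.mkdir creates only the last level of the path and raises an error if the folder already exists. When intermediate folders may be missing, or the code may be run more than once, os.makedirs is a common alternative. A minimal sketch, with the folder names reports and 2024 chosen purely for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    nested = os.path.join(base_dir, 'reports', '2024', 'test11')

    # os.mkdir creates only the last level, so it fails when 'reports/2024' is missing.
    try:
        os.mkdir(nested)
        mkdir_failed = False
    except FileNotFoundError:
        mkdir_failed = True

    # os.makedirs creates the intermediate folders too; exist_ok=True makes a re-run harmless.
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)
    created = os.path.isdir(nested)

print(mkdir_failed, created)
```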
Delete a folder
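Deleting a folder is the counterpart of creating one. We can delete a folder manually, or we can use os.removedirs(path), where path is the folder to delete. Note that os.removedirs only removes empty folders, and after removing the folder it also removes any parent folders that have become empty.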
As shown below, when we run the following code, we delete the test11 folder we just created:
os.removedirs('D:/Data-Science/share/data/test11')
Delete a file
Deleting a file removes one specific file, whereas deleting a folder removes the folder itself. With os.removedirs the folder must already be empty; a folder that still contains files has to be emptied first. A file is deleted with os.remove(path), where path is the path of the file.
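The difference between these calls can be seen in a short sketch. It uses os.rmdir, which like os.removedirs refuses to remove a folder that is not empty, and shutil.rmtree from the standard library, which is not mentioned in this article and is shown here only as one way to delete a folder together with its contents. Everything happens inside a temporary folder.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    folder = os.path.join(base_dir, 'test11')
    os.mkdir(folder)
    file_path = os.path.join(folder, 'test_new.xlsx')
    open(file_path, 'w').close()

    # A folder that still contains files cannot be removed with os.rmdir.
    try:
        os.rmdir(folder)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # os.remove deletes a single file.
    os.remove(file_path)
    file_gone = not os.path.exists(file_path)

    # shutil.rmtree deletes a folder together with everything inside it.
    open(file_path, 'w').close()
    shutil.rmtree(folder)
    folder_gone = not os.path.exists(folder)

print(rmdir_failed, file_gone, folder_gone)
```

Deletions made this way do not go through the recycle bin, so it is worth double-checking the path before running them.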
As shown below, when we run the following code, we delete the test_new file in the test folder:
os.remove('D:/Data-Science/share/data/test/test_new.xlsx')
Batch operations with the os module
Batch read multiple files under one folder
Sometimes there are multiple similar files under a folder, such as performance files for different people in a department. We need to read these files in batches into Python and then process them.
As we learned earlier, a file can be read with load_workbook or read_excel; either way, you only need to specify the path of the file to read.
So how do we read files in batches? First get all the file names under the folder, then loop over them and read each file. The code is as follows:
import os
import pandas as pd

# Get all the file names under the folder
name_list = os.listdir('D:/Data-Science/share/data/test')

# Read each file in a for loop
for i in name_list:
    df = pd.read_excel(r'D:/Data-Science/share/data/test/' + i)
    print('{} read complete!'.format(i))
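In a loop like this, df is overwritten on every pass, so only the last file is still available once the loop ends. One way to keep all of them is to store each DataFrame in a dictionary keyed by file name. The sketch below assumes this approach; read_folder and its suffix and reader parameters are hypothetical names, and the demo uses small CSV files so that it runs without an Excel engine installed.

```python
import os
import tempfile

import pandas as pd

def read_folder(folder, suffix='.xlsx', reader=pd.read_excel):
    """Read every file in folder ending with suffix into a dict keyed by file name."""
    frames = {}
    for name in sorted(os.listdir(folder)):
        if name.endswith(suffix):
            frames[name] = reader(os.path.join(folder, name))
            print('{} read complete!'.format(name))
    return frames

# Demo with CSV files so the sketch runs without an Excel engine installed.
with tempfile.TemporaryDirectory() as demo_dir:
    pd.DataFrame({'name': ['Li Dan'], 'sales': [10]}).to_csv(
        os.path.join(demo_dir, 'a.csv'), index=False)
    pd.DataFrame({'name': ['Chen Kai'], 'sales': [20]}).to_csv(
        os.path.join(demo_dir, 'b.csv'), index=False)
    frames = read_folder(demo_dir, suffix='.csv', reader=pd.read_csv)
```

For Excel files the defaults apply, so a call such as read_folder(folder) would read every .xlsx file with pd.read_excel.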
If you want to process each file after reading it, put the processing code right after the read statement inside the loop. For example, to remove duplicate rows from every file that is read, the code is as follows:
import os
import pandas as pd

# Get all the file names under the folder
name_list = os.listdir('D:/Data-Science/share/data/test')

# Read each file in a for loop
for i in name_list:
    df = pd.read_excel(r'D:/Data-Science/share/data/test/' + i)
    df = df.drop_duplicates()  # remove duplicate rows
    print('{} read complete!'.format(i))
Create folders in batches
Sometimes we need to create a set of folders around a theme, such as 12 folders, one per month. We have already learned how to create a single folder; to create several in a batch, we only need to run the single-folder statement in a loop. The code is as follows:
month_num = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
for i in month_num:
    os.mkdir('D:/Data-Science/share/data/' + i)
    print('{} created!'.format(i))
After running the above code, 12 new folders are created under the file path:
Batch rename files
Sometimes we have many files on the same theme whose names are inconsistent. For example, the files below all hold employees' March performance, but the naming format varies. We want to unify them into the format name + March performance. This can be done with the file renaming we learned earlier. There we only renamed a single file, so how do we rename multiple files in a batch?
The specific implementation code is as follows:
import os

# Get all the file names under the specified folder
old_name = os.listdir('D:/Data-Science/share/data/test')
name = ["Zhang Mingming", "Li Dan", "Wang Meiyue", "Chen Kai"]

# Iterate through each name
for n in name:
    # Iterate through each old file name
    for o in old_name:
        # If the old file name contains the name, rename the file
        if n in o:
            os.rename('D:/Data-Science/share/data/test/' + o, 'D:/Data-Science/share/data/test/' + n + ' March performance.xlsx')
After running the above code, you can see that all the original files in the folder have been renamed.
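Since a batch rename is hard to undo, one cautious variant is to build the full old-to-new mapping first, inspect it, and only then apply it. The following is a sketch under that assumption; plan_renames and apply_renames are hypothetical helper names, and the demo recreates the four file names in a temporary folder.

```python
import os
import tempfile

def plan_renames(old_names, people, suffix=' March performance.xlsx'):
    """Map old file names to new ones; files not matching exactly one person are left out."""
    plan = {}
    for old in old_names:
        matches = [p for p in people if p in old]
        # Rename only when exactly one person matches, to avoid ambiguous results.
        if len(matches) == 1:
            plan[old] = matches[0] + suffix
    return plan

def apply_renames(folder, plan):
    """Carry out the planned renames inside folder, skipping names that do not change."""
    for old, new in plan.items():
        if old != new:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))

people = ['Zhang Mingming', 'Li Dan', 'Wang Meiyue', 'Chen Kai']
old_names = [
    'March performance-Zhang Mingming.xlsx',
    'Li Dan March performance.xlsx',
    'Wang Meiyue-March performance.xlsx',
    'Chen Kai March performance.xlsx',
]

with tempfile.TemporaryDirectory() as demo_dir:
    for name in old_names:
        open(os.path.join(demo_dir, name), 'w').close()
    plan = plan_renames(os.listdir(demo_dir), people)
    apply_renames(demo_dir, plan)
    renamed = sorted(os.listdir(demo_dir))
    print(renamed)
```

Printing plan before calling apply_renames gives a dry run: nothing on disk changes until the mapping looks right.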
Other batch operations
Merge multiple files in batches
The folder below contains the daily sales reports for each month from January to June. The reports all share the same structure, with only two columns: date and sales volume. We now want to merge the reports for the different months into one.
The code for merging the monthly reports into one file is as follows:
import os
import pandas as pd

# Folder that holds the monthly sales reports (folder name assumed; use your own path)
folder = 'D:/Data-Science/share/data/sale_data/'

# Get all the file names under the specified folder
name_list = os.listdir(folder)

# Create an empty DataFrame with the same structure
df_o = pd.DataFrame({'date': [], 'sales volume': []})

# Traverse and read each file
for i in name_list:
    df = pd.read_excel(folder + i)
    # Stitch vertically
    df_v = pd.concat([df_o, df])
    # Assign the stitched result back to df_o
    df_o = df_v
df_o
Run the above code and you will get the merged DataFrame df_o, which contains the rows of all the monthly reports.
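Calling pd.concat inside the loop copies all the accumulated rows on every pass. A commonly used alternative is to collect the frames in a list and concatenate once at the end. The sketch below illustrates the idea with two small made-up frames standing in for the monthly reports; the values are invented for illustration only.

```python
import pandas as pd

# Two small frames stand in for the monthly reports read from disk (made-up values).
df_jan = pd.DataFrame({'date': pd.to_datetime(['2024-01-01', '2024-01-02']),
                       'sales volume': [10, 12]})
df_feb = pd.DataFrame({'date': pd.to_datetime(['2024-02-01', '2024-02-02']),
                       'sales volume': [8, 15]})

# Collect the frames first, then concatenate once.
frames = [df_jan, df_feb]
merged = pd.concat(frames, ignore_index=True)

print(merged)
```

With ignore_index=True the merged result gets a fresh row index starting at 0; without it, the row labels of the individual files repeat.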
Split a document into multiple files according to a specified column
We have seen how to merge multiple files in a batch. There is also the reverse requirement: splitting one file into multiple files according to a specified column.
Take the same data set again. Suppose we now have a single file covering January to June which, in addition to the date and sales volume columns, has a month column. We need to split the file by the month column so that each month is stored as a separate file.
The specific implementation code is as follows:
# Generate a new month column
df_o['month'] = df_o['date'].apply(lambda x: x.month)

# Iterate through each month value
for m in df_o['month'].unique():
    # Filter out the data for this month value
    df_month = df_o[df_o['month'] == m]
    # Save the filtered data (output folder name assumed; use your own path)
    df_month.to_csv(r'D:/Data-Science/share/data/split_data/' + str(m) + '_monthly_sales_daily_after_split.csv')
By running the above code, we can see multiple split files under the target path:
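The same split can also be expressed with groupby, which hands back one sub-frame per distinct month. This is a minimal sketch on made-up sample data, writing to a temporary folder; the output file name pattern is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data with the same 'date' and 'sales volume' columns.
df_o = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-02-01']),
    'sales volume': [10, 12, 8],
})
df_o['month'] = df_o['date'].dt.month

with tempfile.TemporaryDirectory() as out_dir:
    # groupby yields one (month, sub-frame) pair per distinct month value.
    for month, df_month in df_o.groupby('month'):
        out_path = os.path.join(out_dir, '{}_month_sales.csv'.format(month))
        # index=False keeps the row labels out of the CSV file.
        df_month.to_csv(out_path, index=False)
    written = sorted(os.listdir(out_dir))
    print(written)
```

The .dt.month accessor used here is the vectorized counterpart of apply(lambda x: x.month) and requires the date column to hold datetime values.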
After reading this article, you should have a working understanding of how Python operates Excel files in batches. Thank you for reading!