In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-26 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly introduces the relevant knowledge of "how Pandas splits Excel and merges Excel in batches". The editor shows you the operation process through actual cases. The operation method is simple, fast and practical. I hope this article "how to split Excel and merge Excel in batches by Pandas" can help you solve the problem.
1. Demonstration of examples
1. Split a large Excel into multiple Excel
two。 Merge multiple small Excel into one large Excel and mark the source
Work_dir= ". / course_datas/c15_excel_split_merge" splits_dir=f "{work_dir} / splits" import osif not os.path.exists (splits_dir): os.mkdir (splits_dir) II. Read the source Excel to Pandasimport pandas as pddf_source = pd.read_excel (f "{work_dir} / crazyant_blog_articles_source.xlsx") df_source.head () id title tags
0 2585 Tensorflow how to receive variable length list feature python,tensorflow, feature engineering
1 2583 Pandas to achieve data merging concat pandas,python, data analysis
What is the purpose of the Index index of 2 2574 Pandas? Pandas,python, data analysis
3 2564 data sets commonly used in machine learning: python, machine learning
4 2561 A data scientist's training path data analysis df_source.indexRangeIndex (start=0, stop=258, step=1) df_source.shape
(258,3)
Total_row_count = df_source.shape [0] total_row_count
two hundred and fifty eight
Third, split a large Excel into multiple Excel
1. Using the df.iloc method, split a large dataframe into several small dataframe
two。 Each small Excel will be saved using dataframe.to_excel
1. Calculate the number of rows for each excel after the split
# this big excel will be split to these people user_names = ["xiao_shuai", "xiao_wang", "xiao_ming", "xiao_lei", "xiao_bo", "xiao_hong"] # the number of tasks per person split_size = total_row_count / / len (user_names) if total_row_count% len (user_names)! = 0: split_size + = 1split_size
forty-three
2. Split into multiple dataframe
Df_subs = [] for idx, user_name in enumerate (user_names): # iloc start index begin = idx*split_size # iloc end index end = begin+split_size # implementation df splits df_sub = df_ source.ilocs according to iloc [begin: end] # saves each child df in the list df_subs.append ((idx, user_name, df_sub))
3. Save each datafame into excel
For idx, user_name, df_sub in df_subs: file_name = f "{splits_dir} / crazyant_blog_articles_ {idx} _ {user_name} .xlsx" df_sub.to_excel (file_name, index=False) IV. Merge multiple small Excel into one large Excel
1. Traverse the folder to get a list of Excel files to merge
two。 Read to the dataframe separately and add a column to each df to mark the source
3. Batch merge of df using pd.concat
4. Output the merged dataframe to excel
1. Iterate through the folder to get a list of Excel names to merge
Import osexcel_names = [] for excel_name in os.listdir (splits_dir): excel_names.append (excel_name) excel_names
['crazyant_blog_articles_0_xiao_shuai.xlsx'
'crazyant_blog_articles_1_xiao_wang.xlsx'
'crazyant_blog_articles_2_xiao_ming.xlsx'
'crazyant_blog_articles_3_xiao_lei.xlsx'
'crazyant_blog_articles_4_xiao_bo.xlsx'
'crazyant_blog_articles_5_xiao_hong.xlsx']
two。 Read to dataframe respectively
Df_list = []
For excel_name in excel_names: # read each excel to df excel_path = f "{splits_dir} / {excel_name}" df_split = pd.read_excel (excel_path) # get username username = excel_name.replace ("crazyant_blog_articles_", ") .replace (" .xlsx ",") [2:] print (excel_name, username) # add one column to each df That is, user name df_split ["username"] = username df_list.append (df_split)
Crazyant_blog_articles_0_xiao_shuai.xlsx xiao_shuai
Crazyant_blog_articles_1_xiao_wang.xlsx xiao_wang
Crazyant_blog_articles_2_xiao_ming.xlsx xiao_ming
Crazyant_blog_articles_3_xiao_lei.xlsx xiao_lei
Crazyant_blog_articles_4_xiao_bo.xlsx xiao_bo
Crazyant_blog_articles_5_xiao_hong.xlsx xiao_hong
3. Merge using pd.concat
Df_merged = pd.concat (df_list) df_merged.shape
(258,4)
Df_merged.head ()
Id title tags username
0 2585 Tensorflow how to receive variable length list feature python,tensorflow, feature engineering xiao_shuai
1 2583 Pandas to achieve data merging concat pandas,python, data analysis xiao_shuai
What is the purpose of the Index index of 2 2574 Pandas? Pandas,python, data analysis xiao_shuai
3 2564 data sets commonly used in machine learning Daquan python, machine learning xiao_shuai
4 2561 A data scientist's training path data analysis xiao_shuai
Df_merged ["username"] .value_counts ()
Xiao_hong 43
Xiao_bo 43
Xiao_shuai 43
Xiao_lei 43
Xiao_wang 43
Xiao_ming 43
Name: username, dtype: int64
4. Output the merged dataframe to excel
That's all for df_merged.to_excel (f "{work_dir} / crazyant_blog_articles_merged.xlsx", index=False) about "how Pandas splits Excel and merges Excel in batches". Thank you for reading. If you want to know more about the industry, you can follow the industry information channel. The editor will update different knowledge points for you every day.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.