2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article shows how to split and merge Excel files in batches with pandas. The technique is practical for everyday data analysis, so it is shared here as a reference.
I. Preparing the data
    import os
    import pandas as pd

    work_dir = "./datas"
    splits_dir = f"{work_dir}/splits"

    # Create the output folder for the split files if it does not exist yet
    if not os.path.exists(splits_dir):
        os.mkdir(splits_dir)

    # 0. Read the source Excel file into pandas
    df_source = pd.read_excel(f"{work_dir}/1.xlsx")
    df_source.head()
    df_source.index
    df_source.shape

    total_row_count = df_source.shape[0]
    total_row_count
II. Program demonstration
1. Split a large Excel file into multiple Excel files
Use the df.iloc method to split one large DataFrame into several small DataFrames
Save each small DataFrame as its own Excel file with DataFrame.to_excel
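The two steps above can be tried on a small in-memory DataFrame before any files are involved. The following is a minimal sketch under stated assumptions: the function name `split_dataframe` and the sample data are hypothetical and do not come from the original article.

```python
import pandas as pd


def split_dataframe(df, n_parts):
    """Split df into n_parts consecutive chunks of near-equal size."""
    # Ceiling division: every chunk except possibly the last ones has chunk_size rows.
    chunk_size = -(-len(df) // n_parts)
    chunks = []
    for i in range(n_parts):
        begin = i * chunk_size
        end = begin + chunk_size
        # iloc slices by position; an end past the last row is simply truncated.
        chunks.append(df.iloc[begin:end])
    return chunks


# Hypothetical sample data: 10 rows split among 3 people.
df_demo = pd.DataFrame({"article_id": range(10)})
parts = split_dataframe(df_demo, 3)
sizes = [len(p) for p in parts]
print(sizes)
```

Because the chunk size is rounded up, the trailing chunks can be shorter than the others, or even empty when the row count is small relative to the number of parts (7 rows in 6 parts gives chunks of 2, 2, 2, 1, 0 and 0 rows).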
    # 1. Calculate the number of rows in each Excel file after the split
    # The large Excel file will be split among these people
    user_names = ["xiao_shuai", "xiao_wang", "xiao_ming", "xiao_lei", "xiao_bo", "xiao_hong"]

    # Number of rows per person, rounded up so that no rows are left over
    split_size = total_row_count // len(user_names)
    if total_row_count % len(user_names) != 0:
        split_size += 1
    split_size

    # 2. Split into multiple DataFrames
    df_subs = []
    for idx, user_name in enumerate(user_names):
        # iloc start index
        begin = idx * split_size
        # iloc end index
        end = begin + split_size
        # Split the DataFrame with iloc
        df_sub = df_source.iloc[begin:end]
        # Store each sub-DataFrame in the list
        df_subs.append((idx, user_name, df_sub))

    # 3. Save each DataFrame to Excel
    for idx, user_name, df_sub in df_subs:
        file_name = f"{splits_dir}/articles_{idx}_{user_name}.xlsx"
        df_sub.to_excel(file_name, index=False)

2. Merge multiple small Excel files into one large Excel file
Traverse the folder to get the list of Excel files to merge
Read each file into a DataFrame and add a column to each one to mark its source
Merge the DataFrames in one batch with pd.concat
Write the merged DataFrame to Excel
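The tagging and merging steps can be illustrated without any files. This is a minimal sketch, assuming two small hand-made DataFrames stand in for the ones read from disk; the sample data and the use of `ignore_index=True` are illustrative choices, not something the original article prescribes.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for the DataFrames read from each small file.
frames = {
    "xiao_shuai": pd.DataFrame({"title": ["a", "b"]}),
    "xiao_wang": pd.DataFrame({"title": ["c"]}),
}

tagged = []
for username, df_part in frames.items():
    # assign returns a copy with the extra column, leaving df_part untouched.
    tagged.append(df_part.assign(username=username))

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each part's index.
df_all = pd.concat(tagged, ignore_index=True)
print(df_all)
```

Without `ignore_index=True`, each part keeps its own row labels, so the merged index contains repeated values. That does not matter when the result is written with `index=False`, but it can be surprising when selecting rows by label.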
    # 1. Traverse the folder to get the list of Excel file names to merge
    import os
    excel_names = []
    for excel_name in os.listdir(splits_dir):
        excel_names.append(excel_name)
    excel_names

    # 2. Read each file into a DataFrame
    df_list = []
    for excel_name in excel_names:
        # Read each Excel file into a DataFrame
        excel_path = f"{splits_dir}/{excel_name}"
        df_split = pd.read_excel(excel_path)
        # Get the user name from the file name
        username = excel_name.replace("articles_", "").replace(".xlsx", "")[2:]
        print(excel_name, username)
        # Add a column to each DataFrame holding the user name
        df_split["username"] = username
        df_list.append(df_split)

    # 3. Merge with pd.concat
    df_merged = pd.concat(df_list)
    df_merged.shape
    df_merged.head()
    df_merged["username"].value_counts()

    # 4. Write the merged DataFrame to Excel
    df_merged.to_excel(f"{work_dir}/result_merged.xlsx", index=False)
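The `[2:]` slice used to recover the user name assumes the index in the file name is a single digit, which holds for the six files here but would break with ten or more. A minimal sketch of a more tolerant parser follows, assuming the file names keep the `articles_{idx}_{user_name}.xlsx` pattern produced by the split step; `username_from_filename` is a hypothetical helper, not part of the original article.

```python
import os


def username_from_filename(file_name):
    """Extract the user name from 'articles_{idx}_{user_name}.xlsx'."""
    stem, _ext = os.path.splitext(file_name)
    # Split into at most three pieces: 'articles', idx, user_name.
    # maxsplit=2 keeps underscores inside the user name intact.
    _prefix, _idx, user_name = stem.split("_", 2)
    return user_name


print(username_from_filename("articles_0_xiao_shuai.xlsx"))
print(username_from_filename("articles_12_xiao_hong.xlsx"))
```

A file name that does not follow the pattern raises a `ValueError` at the unpacking step, which makes stray files in the folder visible instead of silently producing a wrong user name.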
Thank you for reading! That concludes this article on how to split and merge Excel files in batches with pandas. I hope it is of some help; if you found it useful, feel free to share it so more people can see it.