In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
Most people do not understand the knowledge points of this article "based on Python how to achieve comparison of Exce tools", so the editor summarizes the following contents, detailed contents, clear steps, and has a certain reference value. I hope you can get something after reading this article. Let's take a look at this "how to achieve comparison of Exce tools based on Python" article.
1. Parameters.
Two sheet pages of the same excel file, one ODS (old data) and one DWH (new data)
Generate a comparison file
Design two primary keys input primary key 1 input primary key 2
(default new and old file column names are the same)
two。 Effect.
Generated file
The top 10 data with the same amount of data and inconsistent data in each field
Top 10 for different amounts of data, taking different data on both sides, excluding different data, and inconsistent data for each field.
3. Realize
Cyclic comparison combination column (primary key + contrast column)
Pandas deals with differential data, and openpyxl processes the data format of the generated sheet. (Mr. into data, and then adjust the format)
Configuration
Import pandas as pdfrom openpyxl import load_workbook# chooses the file path path=r "C:\ Users\ Desktop\ Migration_Data_Compari\ contrast file .xls" # input ("Select file path:") TargetPath=r "C:\ Users\ tubule classmate\ Desktop\ Migration_Data_Comparison_Tool\ object file\ comparison result .xlsx" DATA_ODS=pd.read_excel (r "C:\ Users\ tubule classmate\ Desktop\ Migration_Data_Comparison_Tool\ comparison file .xls" Sheet_name= "ODS") DATA_DWH=pd.read_excel (r "C:\ Users\ Xiao Guan\ Desktop\ Migration_Data_Comparison_Tool\ comparison file .xls", sheet_name= "DWH") # Select the primary key Primarykey= "employee number" # input ("Select key 1:") Primarykey# employee number
I. the amount of data
Output Table 1-amount of data
Def write_to_excel_DataVolume (Data,TargetPath): # cor_df is the dataframe writer = pd.ExcelWriter (TargetPath, engine='xlsxwriter') to be saved # here use Data.to_excel (writer,sheet_name='Sheet1', encoding='utf8', header=False, startcol=0, startrow=2) # to start the dataframe data from line 2 workbook = writer.book format1 = workbook.add_format ({# package the style first Then assign a value to 'bold': True, # bold' text_wrap': True, # whether to wrap the line 'valign':' bottom', # vertical alignment 'align':' center', # horizontal alignment 'fg_color':' # C5D9F1, # cell background color 'border': 1 # Border}) writer_sheet = writer.sheets ['Sheet1'] # set width writer_sheet.set_column ("Alav I", 16) writer_sheet.set_column (' Corel Che Magi 30) writer_sheet.merge_range (0meme 0meme 2J 'comparison result', format1) writer_sheet.merge_range (4Min 2 Jing 4 'data quantity difference', format1) writer_sheet.write (1J 0J'' Format1) writer_sheet.write (1 ODS, "DWH"] DataFrame_DataVolume.index= ["data volume"] DataFrame_DataVolume#writeFileDataVolume (DataFrame_DataVolume,TargetPath) write_to_excel_DataVolume (DataFrame_DataVolume,TargetPath) writer.save () writer.close () DataFrame_DataVolume=pd.DataFrame ([DATA_ODS.shape [0]], [DATA_DWH.shape [0]) .TDataFrame_DataVolume.columns = ["ODS", "DWH"] DataFrame_DataVolume.index= ["data volume"]
Output form 2-data volume difference contract
If DATA_ODS.shape [0] = = DATA_DWH.shape [0]: passelse: DATA_ODS_Primarykey=pd.DataFrame (DATA_ ODS [Primarykey]) DATA_DWH_Primarykey=pd.DataFrame (DATA_ DWA [Primarykey]) df_union = pd.concat ([DATA_ODS_Primarykey,DATA_DWH_Primarykey]) # implement 1 df_diff_ODS = df_union.append (DATA_ODS_Primarykey). Drop_duplicates (subset=df_union.columns.to_list () Keep=False) df_diff_DWH = df_union.append (DATA_DWH_Primarykey). Drop_duplicates (subset=df_union.columns.to_list () Keep=False) # DWH more contracts df_diff_ODS # DWH fewer contracts df_diff_DWH df_diff_DWH_Data= [] df_diff_ODS_Data= [] for i in df_diff_ODS.head (10) .values.tolist (): for n in i: df_diff_ODS_Data.append (n) for i in df_diff_DWH.head (10) .values.tolist (): df_diff_DWH_Data.append (n) while True: if len (df_diff_DWH_Data) > len (df_diff_ODS_Data): df_diff_ODS_Data.append ("-") elif len (df_diff_DWH_Data)
< len(df_diff_ODS_Data): df_diff_DWH_Data.append("-") elif len(df_diff_DWH_Data)== len(df_diff_ODS_Data): break DataFrame_DataVolume_Count_result=pd.DataFrame(df_diff_DWH_Data,df_diff_ODS_Data).reset_index() DataFrame_DataVolume_Count_result.columns=['DWH多的合同','DWH少的的合同'] DataFrame_DataVolume_Count_result=DataFrame_DataVolume_Count_result.reset_index() DataFrame_DataVolume_Count_result.columns=['序号','DWH多的合同','DWH少的的合同']DataFrame_DataVolume_Count_resultfrom openpyxl import load_workbook def write_to_excel_Count_result(Data,TargetPath): df_Old = pd.DataFrame(pd.read_excel(TargetPath)) #读取原数据文件和表 writer = pd.ExcelWriter(TargetPath,engine='openpyxl') book=load_workbook(TargetPath) writer.book = book writer.sheets = dict((ws.title, ws) for ws in book.worksheets) df_rows = df_Old.shape[0] #获取原数据的行数 Data.to_excel(writer,startrow=df_rows+1, index=False,startcol=0,header=True)#将数据写入excel中的aa表,从第一个空行开始写 writer.save()#保存write_to_excel_Count_result(DataFrame_DataVolume_Count_result,TargetPath)The above is about the content of this article on "how to compare Exce tools based on Python". I believe we all have a certain understanding. I hope the content shared by the editor will be helpful to you. If you want to know more related knowledge, please pay attention to the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.