2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Many newcomers are unsure how to merge data with pandas in Python. This article summarizes the available methods, how they differ, and how to use each one, so that you can pick the right tool for the job.
pandas has four methods for combining DataFrames: concat, append, join and merge.
The differences between concat, append, join and merge are as follows:
1. concat(): a top-level pandas function. Its axis parameter selects the row direction (adding rows, same below) or the column direction (adding columns, same below), and its join parameter selects an inner or outer join on the other axis.
2. .append(): a DataFrame method that concatenates in the row direction. It was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat takes its place.
3. .join(): a DataFrame method that concatenates in the column direction and supports four join types: left, right, inner and outer.
4. merge(): a top-level pandas function that works like a SQL join and supports all four SQL join types: left, right, inner and outer.
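As a quick orientation, the sketch below applies concat, join and merge to two tiny frames. The frames `left` and `right` and their column names are made up for illustration and do not come from the article. `.append()` is left out because it no longer exists in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical toy frames, used only for this illustration.
left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["b", "c"], "y": [3, 4]})

# concat along rows: stacks the frames; columns are the union, gaps become NaN.
rows = pd.concat([left, right], ignore_index=True)

# concat along columns: aligns on the index (0 and 1 here), not on "key".
cols = pd.concat([left, right], axis=1)

# join: column-wise and index-aligned, so "key" is moved into the index first.
joined = left.set_index("key").join(right.set_index("key"), how="inner")

# merge: SQL-style join on a named column.
merged = pd.merge(left, right, on="key", how="inner")

print(rows.shape, cols.shape, joined.shape, merged.shape)
```

Note that concat with axis=1 pairs rows by index label, while merge pairs them by the values in "key", which is why the two column-wise results differ.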
concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
       keys=None, levels=None, names=None, verify_integrity=False,
       copy=True)
"""
Description of common parameters:
axis: the axis to concatenate along. The default, 0, concatenates along the rows; 1 concatenates along the columns.
join: the default, 'outer', keeps every label on the other axis and fills missing values with NaN; 'inner' keeps only the labels shared on the other axis.
join_axes: the exact labels to use on the other axis, for cases where neither an inner nor an outer join is wanted. This parameter was removed in pandas 1.0; reindex the result instead.
ignore_index: if True, discard the existing index and renumber the result from 0.
keys: builds a multi-level index, with one key per input object as the outermost level.
"""
import pandas as pd

def df_maker(cols, idxs):
    return pd.DataFrame({c: [c + str(i) for i in idxs] for c in cols}, index=idxs)

df1 = df_maker('abc', [1, 2, 3])
df2 = df_maker('cde', [3, 4, 5])
print(df1)
print(df2)
print(pd.concat([df1, df2]))  # concatenates along axis=0 by default
print(pd.concat([df1, df2], ignore_index=True))  # reset the index (similar to pd.concat([df1, df2]).reset_index(drop=True))
print(pd.concat([df1, df2], axis=1))  # concatenate along the columns
print(pd.concat([df1, df2], axis=1, join='inner'))  # along the columns with an inner join; only index 3 is shared, so there is one row
# Before pandas 1.0 the next two calls were written with join_axes=[df1.index] and join_axes=[index].
print(pd.concat([df1, df2], axis=1).reindex(df1.index))  # keep only the index of df1

from pandas import Index
index = Index([1, 2, 4])
print(pd.concat([df1, df2], axis=1).reindex(index))  # custom index

print(pd.concat([df1, df2], axis=0, keys=["first group", "second group"]))  # build a multi-level index through keys
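The join_axes argument used in the original listing is not available in pandas 1.0 and later. The following sketch, which reuses the article's df_maker helper, shows one way to get the same result with reindex, and also how the level created by keys can be used to pull a source frame back out. The variable names `wide`, `only_df1`, `stacked` and `second` are illustrative.

```python
import pandas as pd

def df_maker(cols, idxs):
    return pd.DataFrame({c: [c + str(i) for i in idxs] for c in cols}, index=idxs)

df1 = df_maker("abc", [1, 2, 3])
df2 = df_maker("cde", [3, 4, 5])

# An outer concat along the columns keeps the union of both indexes.
wide = pd.concat([df1, df2], axis=1)

# Restrict the rows to df1's index labels, as join_axes=[df1.index] used to do.
only_df1 = wide.reindex(df1.index)

# keys= adds an outer index level, so each source frame can be selected again.
stacked = pd.concat([df1, df2], keys=["first group", "second group"])
second = stacked.loc["second group"]

print(only_df1)
print(second)
```

Selecting "second group" returns the rows that came from df2, with NaN in the columns that only df1 had.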
append(self, other, ignore_index=False, verify_integrity=False)
"""
Description of common parameters:
other: another DataFrame
ignore_index: if True, discard the existing index and renumber the result from 0
verify_integrity: check that the resulting index is unique and raise an error if any label repeats. It has no effect when ignore_index=True, because the renumbered index is always unique.
"""
import pandas as pd

# DataFrame.append was removed in pandas 2.0, so this example needs pandas < 2.0.
def df_maker(cols, idxs):
    return pd.DataFrame({c: [c + str(i) for i in idxs] for c in cols}, index=idxs)

df1 = df_maker('abc', [1, 2, 3])
df2 = df_maker('cde', [3, 4, 5])
print(df1.append(df2))  # similar to pd.concat([df1, df2])
print(df1.append(df2, ignore_index=True))  # index renumbered; similar to pd.concat([df1, df2], ignore_index=True)
# print(df1.append(df2, verify_integrity=True))  # raises an error because both frames contain index 3
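On pandas 2.0 and later, where DataFrame.append is gone, the three append calls above can be expressed with pd.concat. This is a sketch of the equivalents rather than the article's own code; the names `appended`, `renumbered` and `raised` are illustrative.

```python
import pandas as pd

def df_maker(cols, idxs):
    return pd.DataFrame({c: [c + str(i) for i in idxs] for c in cols}, index=idxs)

df1 = df_maker("abc", [1, 2, 3])
df2 = df_maker("cde", [3, 4, 5])

# Equivalent of df1.append(df2): index label 3 appears twice in the result.
appended = pd.concat([df1, df2])

# Equivalent of df1.append(df2, ignore_index=True): index renumbered from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)

# Equivalent of df1.append(df2, verify_integrity=True): the duplicate label raises.
try:
    pd.concat([df1, df2], verify_integrity=True)
    raised = False
except ValueError as exc:
    raised = True
    print("verify_integrity failed:", exc)
```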
join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
"""
Description of common parameters:
on: the name of the column in the left df to use as the key. It is matched against the index of the right df, so set_index may be needed on the right df first. If not specified, the join uses the index of both.
how: {'left', 'right', 'outer', 'inner'}. The default, 'left', keeps the index of the left df (or the column named in on); 'right' keeps the index of the right df; 'inner' is an inner join; 'outer' is a full outer join.
sort: whether to sort the result by the join key. The default is False.
lsuffix, rsuffix: suffixes added to the left and right column names when they clash, which avoids an error.
"""
import pandas as pd
import numpy as np

df3 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': np.arange(1, 5)})
df4 = pd.DataFrame({'rkey': ['foo', 'bar', 'qux', 'bar'], 'value': np.arange(3, 7)})
print(df3)
print(df4)
# print(df3.join(df4))  # raises an error: both frames have a column named 'value'
print(df3.join(df4, lsuffix='_df3', rsuffix='_df4'))  # suffixes avoid the name clash
print(df3.set_index('lkey').join(df4.set_index('rkey'), how='outer', lsuffix='_df3', rsuffix='_df4'))  # set the key on both sides with set_index
print(df3.join(df4.set_index('rkey'), on='lkey', lsuffix='_df3', rsuffix='_df4'))  # or set the key only on the right df and match it to a left column through on; the result keeps the left df's index labels
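To make the effect of how more concrete, the sketch below joins the same two frames on their keys with each of the four join types and counts the resulting rows. It also checks that leaving out the suffixes raises an error. The names `left`, `right`, `counts` and `raised` are illustrative.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({"lkey": ["foo", "bar", "baz", "foo"], "value": np.arange(1, 5)})
df4 = pd.DataFrame({"rkey": ["foo", "bar", "qux", "bar"], "value": np.arange(3, 7)})

# join aligns on the index, so the key columns are moved into the index first.
left = df3.set_index("lkey")
right = df4.set_index("rkey")

# Row count per join type; the suffixes resolve the shared 'value' column name.
counts = {
    how: len(left.join(right, how=how, lsuffix="_df3", rsuffix="_df4"))
    for how in ["left", "right", "inner", "outer"]
}
print(counts)

# Without suffixes the overlapping column name is rejected.
try:
    df3.join(df4)
    raised = False
except ValueError:
    raised = True
```

Because 'foo' repeats on the left and 'bar' repeats on the right, matching keys multiply: each left row is paired with every right row that has the same key.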
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=False,
         suffixes=('_x', '_y'), copy=True, indicator=False,
         validate=None)
"""
It can be used either as a top-level pandas function or as a DataFrame method.
Description of common parameters:
how: {'left', 'right', 'outer', 'inner'}. The default, 'inner', is like a SQL inner join; 'left' is like a SQL left join; 'right' is like a SQL right join; 'outer' is like a SQL full join.
on: the column name(s) to join on, which must exist in both tables. If None, and no other key is given, the columns whose names appear in both tables are used.
left_on: the column of the left df to join on
right_on: the column of the right df to join on
suffixes: suffixes appended to column names that appear in both the left and right df
validate: default None. Can be set to "one_to_one", "one_to_many", "many_to_one" or "many_to_many" to check that the merge keys really have a one-to-one, one-to-many, many-to-one or many-to-many relationship.
"""
"
SQL sentence review:
Inline: SELECT a.inline, b.* from table1 as an inner join table2 as b on a.ID=b.ID
Leftist League: SELECT A. alliance, b.* from table1 as a left join table2 as b on a.ID=b.ID
Right couplet: SELECT a.mom, b.* from table1 as a right join table2 as b on a.ID=b.ID
Quanlian: SELECT a.union, b.* from table1 as a full join table2 as b on a.ID=b.ID
"
import pandas as pd
import numpy as np

df3 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': np.arange(1, 5)})
df4 = pd.DataFrame({'rkey': ['foo', 'bar', 'qux', 'bar'], 'value': np.arange(3, 7)})
print(df3)
print(df4)
print(pd.merge(df3, df4))  # on is None, so the shared column name 'value' is found automatically; inner join by default
print(pd.merge(df3, df4, how='outer'))  # outer join
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey'))  # inner join by default; gives 2 'foo' rows and 2 'bar' rows
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='left'))  # join based on df3 on the left
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='right'))  # join based on df4 on the right
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='outer'))  # full join
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='inner'))  # inner join

After reading the above, have you mastered how to merge data with pandas in Python? If you want to learn more skills or explore the topic further, you are welcome to follow the industry information channel. Thank you for reading!