How to use pandas to merge data in Python

2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

Many people who are new to pandas are unsure how to merge data in Python. This article summarizes the available methods, how they differ, and how to use each one.

pandas has four ways of combining DataFrames: concat, append, join and merge.

They differ as follows:

1. pd.concat(): a top-level pandas function. Its axis argument lets it combine DataFrames along rows (adding rows; same below) or along columns (adding columns; same below), using an inner or outer join on the other axis.

2. DataFrame.append(): a DataFrame method that concatenates along rows. It was deprecated in pandas 1.4 and removed in pandas 2.0, so it is only available on older versions.

3. DataFrame.join(): a DataFrame method that combines along columns and supports four join types: left, right, inner and outer.

4. pd.merge(): a top-level pandas function that works like a SQL join and supports the four SQL join types: left, right, inner and outer.

concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
       keys=None, levels=None, names=None, verify_integrity=False,
       copy=True)

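Description of common parameters:

axis: the axis to concatenate along. The default 0 concatenates along rows; 1 concatenates along columns.

join: the default 'outer' keeps all labels on the other axis and fills missing values with NaN; 'inner' keeps only the labels that appear in every object on the other axis.

join_axes: a list of Index objects specifying exactly which labels to keep on the other axis, for cases where neither an inner nor an outer join fits. This parameter was deprecated in pandas 0.25 and removed in pandas 1.0; on current versions, call reindex on the result instead.

ignore_index: if True, discard the existing index and renumber the result from 0.

keys: builds a hierarchical index (MultiIndex), using the given keys as the outermost level.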

import pandas as pd

def df_maker(cols, idxs):
    # Build a DataFrame whose cells are "<column name><index label>", e.g. "a1"
    return pd.DataFrame({c: [c + str(i) for i in idxs] for c in cols}, index=idxs)

df1 = df_maker('abc', [1, 2, 3])
df2 = df_maker('cde', [3, 4, 5])
print(df1)
print(df2)
print(pd.concat([df1, df2]))  # concatenates along axis=0 by default
print(pd.concat([df1, df2], ignore_index=True))  # reset the index (similar to pd.concat([df1, df2]).reset_index(drop=True))
print(pd.concat([df1, df2], axis=1))  # concatenate along columns; outer join by default
print(pd.concat([df1, df2], axis=1, join='inner'))  # inner join along columns: only index 3 appears in both, so there is a single row
# Keep only the index of df1. Older pandas wrote this as join_axes=[df1.index];
# join_axes was removed in pandas 1.0, so reindex is used here instead.
print(pd.concat([df1, df2], axis=1).reindex(df1.index))

from pandas import Index
index = Index([1, 2, 4])
print(pd.concat([df1, df2], axis=1).reindex(index))  # custom index (formerly join_axes=[index])

print(pd.concat([df1, df2], axis=0, keys=["first group", "second group"]))  # build a MultiIndex through keys
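As a small follow-up to the keys example, the sketch below shows one way the outer index level created by keys might be used afterwards: selecting on it with .loc returns the rows that came from one of the source frames. The frames, the group labels and the level names passed to names are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Two small illustrative frames (hypothetical data)
first = pd.DataFrame({"a": ["a1", "a2"], "b": ["b1", "b2"]}, index=[1, 2])
second = pd.DataFrame({"a": ["a3", "a4"], "b": ["b3", "b4"]}, index=[3, 4])

# keys adds an outer index level; names labels the two resulting levels
combined = pd.concat(
    [first, second],
    keys=["first group", "second group"],
    names=["group", "row"],
)

# Selecting on the outer level returns the rows that came from `second`
second_again = combined.loc["second group"]

print(combined)
print(second_again)
```

This is handy when the origin of each row needs to stay recoverable after concatenation.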

append(self, other, ignore_index=False, verify_integrity=False)

Description of common parameters:

other: the other df

ignore_index: if True, discard the existing index and renumber the result from 0

verify_integrity: check that the resulting index is unique and raise an error if it contains duplicates. It has no effect when ignore_index is set.

import pandas as pd

def df_maker(cols, idxs):
    # Build a DataFrame whose cells are "<column name><index label>", e.g. "a1"
    return pd.DataFrame({c: [c + str(i) for i in idxs] for c in cols}, index=idxs)

df1 = df_maker('abc', [1, 2, 3])
df2 = df_maker('cde', [3, 4, 5])
# Note: DataFrame.append was removed in pandas 2.0, so these calls need pandas < 2.0.
print(df1.append(df2))  # similar to pd.concat([df1, df2])
print(df1.append(df2, ignore_index=True))  # index renumbered; similar to pd.concat([df1, df2], ignore_index=True)
# print(df1.append(df2, verify_integrity=True))  # raises an error because both frames contain index 3
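Since DataFrame.append is no longer available from pandas 2.0 onward, the three calls above can be expressed with pd.concat. The following is a minimal sketch of that mapping; the single-column frames are simplified stand-ins for df1 and df2, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Simplified stand-ins for df1 and df2 (both contain index label 3)
df1 = pd.DataFrame({"a": ["a1", "a2", "a3"]}, index=[1, 2, 3])
df2 = pd.DataFrame({"a": ["a3", "a4", "a5"]}, index=[3, 4, 5])

# Counterpart of df1.append(df2): stack rows, keep the original index labels
stacked = pd.concat([df1, df2])

# Counterpart of df1.append(df2, ignore_index=True): renumber rows from 0
renumbered = pd.concat([df1, df2], ignore_index=True)

# Counterpart of df1.append(df2, verify_integrity=True): index 3 appears in
# both frames, so pandas raises ValueError
try:
    pd.concat([df1, df2], verify_integrity=True)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(duplicate_detected)
```

When many frames have to be stacked, collecting them in a list and calling pd.concat once is generally preferable to concatenating inside a loop.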

join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Description of common parameters:

on: the column name(s) in the left df to use as the key, matched against the index of the right df (a set_index call on the right df may be needed first). If not specified, the join is done on the index.

how: one of {'left', 'right', 'outer', 'inner'}. The default 'left' follows the index of the left df (or the column given by on); 'right' follows the index of the right df; 'inner' is an inner join; 'outer' is a full outer join.

sort: whether to sort the result by the join key. Default is False.

lsuffix, rsuffix: suffixes added to overlapping column names from the left and right df, so that the name conflict does not raise an error

import pandas as pd
import numpy as np

df3 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': np.arange(1, 5)})
df4 = pd.DataFrame({'rkey': ['foo', 'bar', 'qux', 'bar'], 'value': np.arange(3, 7)})
print(df3)
print(df4)
# print(df3.join(df4))  # both frames have a column named 'value', so this raises an error
print(df3.join(df4, lsuffix='_df3', rsuffix='_df4'))  # avoid the conflict by adding suffixes
# Set the key on both sides with set_index
print(df3.set_index('lkey').join(df4.set_index('rkey'), how='outer', lsuffix='_df3', rsuffix='_df4'))
# Or set the key only on the right df and match it against a column of the left df through on;
# the result keeps the index labels of the left df
print(df3.join(df4.set_index('rkey'), on='lkey', lsuffix='_df3', rsuffix='_df4'))
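The last join call above can also be written with pd.merge, which may help when deciding between the two. The sketch below reuses df3 and df4 and compares both spellings; the variable names right, joined and merged are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({"lkey": ["foo", "bar", "baz", "foo"], "value": np.arange(1, 5)})
df4 = pd.DataFrame({"rkey": ["foo", "bar", "qux", "bar"], "value": np.arange(3, 7)})
right = df4.set_index("rkey")

# join: match the 'lkey' column of the left frame against the index of the right frame
joined = df3.join(right, on="lkey", lsuffix="_df3", rsuffix="_df4")

# merge: the same operation, with the join type and key sources spelled out
merged = pd.merge(
    df3,
    right,
    left_on="lkey",
    right_index=True,
    how="left",
    suffixes=("_df3", "_df4"),
)

print(joined)
print(merged)
print(joined.equals(merged))
```

join is the shorter form when the right-hand key is already the index; merge is more explicit and also handles column-to-column keys.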

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=False,
         suffixes=('_x', '_y'), copy=True, indicator=False,
         validate=None)

It can be used either as a top-level pandas function or as a DataFrame method.

Description of common parameters:

how: one of {'left', 'right', 'outer', 'inner'}. The default 'inner' is similar to a SQL inner join; 'left' is similar to a SQL left join; 'right' is similar to a SQL right join; 'outer' is similar to a SQL full outer join.

on: the column name(s) to join on, which must exist in both df. If None, and no other key is specified, the column names common to both df are used.

left_on: the column(s) of the left df to join on

right_on: the column(s) of the right df to join on

suffixes: suffixes appended to overlapping column names from the left and right df

validate: default None. Can be set to "one_to_one", "one_to_many", "many_to_one" or "many_to_many" to check that the merge keys really form a one-to-one, one-to-many, many-to-one or many-to-many relationship.
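To make the validate parameter (and the indicator parameter from the signature) more concrete, here is a minimal sketch. The two frames and their column names are hypothetical and chosen only so that the key 'bar' is duplicated on the right side.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical frames: 'bar' appears twice on the right, 'baz' only on the left
left = pd.DataFrame({"key": ["foo", "bar", "baz"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["foo", "bar", "bar"], "rval": [4, 5, 6]})

# indicator=True adds a _merge column with 'left_only', 'right_only' or 'both'
tagged = pd.merge(left, right, on="key", how="outer", indicator=True)
print(tagged)

# Keys are unique on the left but not on the right, so one_to_many passes
ok = pd.merge(left, right, on="key", validate="one_to_many")

# The same merge declared as one_to_one raises MergeError
try:
    pd.merge(left, right, on="key", validate="one_to_one")
    one_to_one_ok = True
except MergeError:
    one_to_one_ok = False
print(one_to_one_ok)
```

Declaring the expected relationship this way turns an accidental row explosion caused by duplicate keys into an immediate error.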

SQL statement review:

Inner join: SELECT a.*, b.* FROM table1 AS a INNER JOIN table2 AS b ON a.ID = b.ID

Left join: SELECT a.*, b.* FROM table1 AS a LEFT JOIN table2 AS b ON a.ID = b.ID

Right join: SELECT a.*, b.* FROM table1 AS a RIGHT JOIN table2 AS b ON a.ID = b.ID

Full join: SELECT a.*, b.* FROM table1 AS a FULL JOIN table2 AS b ON a.ID = b.ID
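The four statements above map onto the how argument of pd.merge roughly as sketched below. The contents of table1 and table2 and the name column are made up for illustration; only the ID key comes from the SQL.

```python
import pandas as pd

# Hypothetical tables sharing an ID key; IDs 2 and 3 appear in both
table1 = pd.DataFrame({"ID": [1, 2, 3], "name": ["a1", "a2", "a3"]})
table2 = pd.DataFrame({"ID": [2, 3, 4], "name": ["b2", "b3", "b4"]})

# SQL join type -> value of `how`
sql_to_how = {
    "INNER JOIN": "inner",
    "LEFT JOIN": "left",
    "RIGHT JOIN": "right",
    "FULL JOIN": "outer",
}

results = {}
for sql_name, how in sql_to_how.items():
    # suffixes distinguishes the two 'name' columns, like the a. and b. aliases
    results[sql_name] = pd.merge(table1, table2, on="ID", how=how, suffixes=("_a", "_b"))
    print(sql_name, sorted(results[sql_name]["ID"].tolist()))
```

One difference from the SQL: with on="ID" the result holds a single ID column, whereas SELECT a.*, b.* returns the ID column of both tables.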

import pandas as pd
import numpy as np

df3 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': np.arange(1, 5)})
df4 = pd.DataFrame({'rkey': ['foo', 'bar', 'qux', 'bar'], 'value': np.arange(3, 7)})
print(df3)
print(df4)
print(pd.merge(df3, df4))  # on is None, so the shared column name 'value' is found automatically; inner join by default
print(pd.merge(df3, df4, how='outer'))  # outer join
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey'))  # inner join by default; the result has 2 'foo' rows and 2 'bar' rows
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='left'))  # left join, based on df3
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='right'))  # right join, based on df4
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='outer'))  # full outer join
print(pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='inner'))  # inner join

After reading the above, you should have a grasp of how to use pandas to merge data in Python. Thank you for reading!
