2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces the main methods for merging data with Pandas in Python: concat(), merge(), append(), and join(). Each section below covers usage, the main parameters, and short examples.
1. concat()
concat() concatenates two or more DataFrames along rows or columns, using either an inner or an outer join on the other axis. By default it concatenates along rows (axis=0).
Usage
pd.concat(objs: Union[Iterable[FrameOrSeries], Mapping[Union[Hashable, NoneType], FrameOrSeries]], axis=0, join='outer', ignore_index: bool = False, keys=None, levels=None, names=None, verify_integrity: bool = False, sort: bool = False, copy: bool = True)
Main parameters
objs: a sequence or mapping of Series or DataFrame objects.
axis: the axis to concatenate along: 0 ('index', rows) or 1 ('columns'). Default is 0.
join: how to handle the other axis: 'inner' (intersection) or 'outer' (union). Default is 'outer'.
ignore_index: whether to discard the index values along the concatenation axis. If True, the result is labeled 0, ..., n-1.
keys: builds a hierarchical index with the given keys as the outermost level. Can be a sequence of arbitrary values, or a sequence of tuples when multiple levels are passed.
names: the names of the levels in the resulting hierarchical index.
Example
Create two DataFrames.
df1 = pd.DataFrame({'char': ['a', 'b'], 'num': [1, 2]})
df2 = pd.DataFrame({'char': ['b', 'c'], 'num': [3, 4]})
By default, concat() concatenates along rows, with an outer join on the columns.
pd.concat([df1, df2])
Discard the existing index and renumber the rows.
pd.concat([df1, df2], ignore_index=True)
Add a hierarchical index as the outermost level with the keys parameter.
pd.concat([df1, df2], keys=['D1', 'D2'])
Name the levels of the resulting index with the names parameter.
pd.concat([df1, df2], keys=['D1', 'D2'], names=['DF Name', 'Row ID'])
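Once keys and names are set, the outer index level can be used to pull rows back out by their source. The following is a minimal sketch, assuming the two example frames hold 'a', 'b' and 'b', 'c' in the char column; the variable names combined, from_d2 and row are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'char': ['a', 'b'], 'num': [1, 2]})
df2 = pd.DataFrame({'char': ['b', 'c'], 'num': [3, 4]})

# keys adds an outer index level; names labels both index levels.
combined = pd.concat([df1, df2], keys=['D1', 'D2'], names=['DF Name', 'Row ID'])

# Select every row that came from the second frame via the outer level.
from_d2 = combined.loc['D2']

# Select a single row by (outer key, original row label).
row = combined.loc[('D1', 1)]

print(list(combined.index.names))
print(from_d2)
print(row['char'], row['num'])
```

Selecting by the outer key returns the rows under their original row labels, which is handy for checking which input a row came from.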
Combine two DataFrames with partially overlapping columns and return everything. Columns outside the intersection are filled with NaN.
df3 = pd.DataFrame({'char': ['b', 'c'], 'float': [3.0, 4.0]})
pd.concat([df1, df3])
Combine two DataFrames with partially overlapping columns and return only the shared columns.
pd.concat([df1, df3], join='inner')
Set axis=1 to combine DataFrame objects horizontally, aligning rows on the index.
df4 = pd.DataFrame({'char': ['b', 'c', 'd'], 'num': [3, 4, 5]}, index=range(1, 4))
pd.concat([df1, df4], axis=1)
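To make the difference between the join and axis options concrete, the sketch below runs the three calls above and inspects the resulting columns and index. It assumes the example frames contain the values 'a', 'b', 'c', 'd' as reconstructed here; the result variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'char': ['a', 'b'], 'num': [1, 2]})
df3 = pd.DataFrame({'char': ['b', 'c'], 'float': [3.0, 4.0]})
df4 = pd.DataFrame({'char': ['b', 'c', 'd'], 'num': [3, 4, 5]}, index=range(1, 4))

# Outer join on columns: the result has the union of all column names.
outer_rows = pd.concat([df1, df3])

# Inner join on columns: only the columns shared by both frames survive.
inner_rows = pd.concat([df1, df3], join='inner')

# axis=1: frames are placed side by side and rows are aligned on the index.
side_by_side = pd.concat([df1, df4], axis=1)

print(sorted(outer_rows.columns))   # union of column names
print(list(inner_rows.columns))     # shared columns only
print(list(side_by_side.index))     # union of the two row indexes
```

With axis=1 the row labels 0 to 3 all appear in the result; positions where one frame has no row for a label are filled with NaN.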
2. merge()
merge() performs database-style inner or outer joins between two DataFrames, combining their columns. By default it performs an inner join, using the intersection of the two DataFrames' column names as the join key.
Usage
pd.merge(left, right, how: str = 'inner', on=None, left_on=None, right_on=None, left_index: bool = False, right_index: bool = False, sort: bool = False, suffixes=('_x', '_y'), copy: bool = True, indicator: bool = False, validate=None)
Parameters
left: DataFrame
right: DataFrame or named Series
how: {'left', 'right', 'outer', 'inner'}, the join type. Default is 'inner'.
on: the column name(s) to join on, which must exist in both the left and right DataFrames. If omitted, the intersection of the two DataFrames' column names is used as the join key.
left_on: the column name(s) in the left DataFrame to use as the join key. Useful when the key columns have different names on the two sides but the same meaning.
right_on: the column name(s) in the right DataFrame to use as the join key.
left_index: if True, use the row index of the left DataFrame as the join key. Default is False. (When joining on the index, join() is often more convenient.)
right_index: if True, use the row index of the right DataFrame as the join key. Default is False. (When joining on the index, join() is often more convenient.)
sort: if True, sort the join keys lexicographically in the result. Default is False, which can improve performance.
suffixes: a pair of strings appended to overlapping column names from the left and right DataFrames respectively. Default is ('_x', '_y').
copy: default is True, in which case the data is always copied. Setting it to False can improve performance.
indicator: if True, adds a _merge column to the result showing where each row came from (left_only, right_only, or both). Default is False.
validate: {"one_to_one" or "1:1", "one_to_many" or "1:m", "many_to_one" or "m:1", "many_to_many" or "m:m"}. If specified, checks that the merge is of the specified type.
Example
Create two DataFrames.
df1 = pd.DataFrame({'name': ['A1', 'B1', 'C1'], 'grade': [60, 70, 80]})
df2 = pd.DataFrame({'name': ['B1', 'C1', 'D1'], 'grade': [70, 80, 100]})
By default, merge() joins on all columns that exist in both DataFrames (here, name and grade) and keeps only the rows that match in both (inner join).
df1.merge(df2)
Set how='outer' to take the union instead.
df1.merge(df2, how='outer')
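The indicator parameter pairs naturally with an outer merge, because it records which side each row came from. This is a small sketch, assuming the example frames use the names 'A1' through 'D1'; the variable names merged and source are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['A1', 'B1', 'C1'], 'grade': [60, 70, 80]})
df2 = pd.DataFrame({'name': ['B1', 'C1', 'D1'], 'grade': [70, 80, 100]})

# indicator=True adds a _merge column: left_only, right_only, or both.
merged = df1.merge(df2, how='outer', indicator=True)

# Map each name to the side(s) it was found on.
source = dict(zip(merged['name'], merged['_merge'].astype(str)))

print(merged)
print(source)
```

Filtering on the _merge column (for example, keeping only 'left_only' rows) is a common way to find records that are missing from one of the two inputs.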
Let's create two more DataFrames.
df1 = pd.DataFrame({'name1': ['A1', 'B1', 'B1', 'C1'], 'grade': [60, 70, 80, 90]})
df2 = pd.DataFrame({'name2': ['B1', 'C1', 'D1', 'E1'], 'grade': [70, 80, 90, 100]})
Merge df1 and df2 by matching name1 against name2. The overlapping grade column gets the default suffixes _x and _y.
df1.merge(df2, left_on='name1', right_on='name2')
Merge df1 and df2 and append the specified left and right suffixes to the overlapping column.
df1.merge(df2, left_on='name1', right_on='name2', suffixes=('_1', '_2'))
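The validate parameter is not exercised by the examples above. The sketch below shows one way it might be used with the same two frames, assuming their names are 'A1' through 'E1'. Because 'B1' appears twice in name1, a one-to-one check fails while a many-to-one check passes. The variable names ok and one_to_one_error are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'name1': ['A1', 'B1', 'B1', 'C1'], 'grade': [60, 70, 80, 90]})
df2 = pd.DataFrame({'name2': ['B1', 'C1', 'D1', 'E1'], 'grade': [70, 80, 90, 100]})

# name2 is unique on the right side, so a many-to-one merge is valid.
ok = df1.merge(df2, left_on='name1', right_on='name2', validate='many_to_one')

# name1 contains 'B1' twice, so a one-to-one merge is rejected.
try:
    df1.merge(df2, left_on='name1', right_on='name2', validate='one_to_one')
    one_to_one_error = None
except pd.errors.MergeError as exc:
    one_to_one_error = exc

print(ok)
print(type(one_to_one_error).__name__)
```

Passing validate is a cheap guard against accidental row duplication when a key that was expected to be unique turns out not to be.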
3. append()
append() concatenates two or more DataFrames along rows, taking the union of their columns by default. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, use pd.concat() instead.
Usage
df1.append(other, ignore_index=False, verify_integrity=False, sort=False)
Parameters
other: the data to append. A DataFrame or Series object, or a list of these objects.
ignore_index: whether to ignore the existing index. If True, the result is labeled 0, 1, ..., n-1. Default is False.
verify_integrity: if True, raise ValueError when the result would contain duplicate index values. Default is False.
sort: whether to sort the columns if the columns of df1 and other are not aligned. Default is False.
Example
Create two DataFrames.
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('BC'))
By default, append() stacks the two DataFrames vertically, filling columns outside the intersection of df1 and df2 with NaN.
df1.append(df2)
Set ignore_index to True to reset the row index.
df1.append(df2, ignore_index=True)
4. join()
join() combines the columns of two or more DataFrames, matching rows on the index by default. A left join is the default.
Usage
df1.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Parameters
other: the data to join. A DataFrame or Series object, or a list of these objects.
on: the column(s) in df1 to match against the index of other. By default the index of df1 is used.
how: {'left', 'right', 'outer', 'inner'}, the join type. Default is 'left'.
lsuffix: an empty string by default; the suffix added to overlapping column names from df1.
rsuffix: an empty string by default; the suffix added to overlapping column names from other.
sort: if True, sort the result lexicographically by the join key. If False, the order of the join keys depends on the join type (the how parameter).
Example
Create two DataFrames.
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'], 'val': ['V0', 'V1', 'V2', 'V3', 'V4']})
df2 = pd.DataFrame({'B': ['B3', 'B4', 'B5'], 'val': ['V3', 'V4', 'V5']})
To join on the val column, set val as the index in both df1 and df2.
df1.set_index('val').join(df2.set_index('val'))
Another way to join on val is the on parameter. df1.join() always matches against the index of df2, but it can use any column of df1. So set val as the index of df2 and pass on='val' to name the key column in df1.
df1.join(df2.set_index('val'), on='val')
Join df1 and df2 with an outer join.
df1.join(df2.set_index('val'), on='val', how='outer')
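The lsuffix and rsuffix parameters matter when the frames are joined on their default row index and share a column name, as df1 and df2 do with val. The sketch below assumes the example values 'A0' to 'A4' and 'B3' to 'B5'; the suffix strings '_left' and '_right' and the result variable names are arbitrary choices for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'val': ['V0', 'V1', 'V2', 'V3', 'V4']})
df2 = pd.DataFrame({'B': ['B3', 'B4', 'B5'],
                    'val': ['V3', 'V4', 'V5']})

# Joining on the row index: both frames have a 'val' column,
# so suffixes are needed to tell the two copies apart.
by_position = df1.join(df2, lsuffix='_left', rsuffix='_right')

# Without suffixes, the overlapping column name raises ValueError.
try:
    df1.join(df2)
    overlap_error = None
except ValueError as exc:
    overlap_error = exc

print(by_position)
print(type(overlap_error).__name__)
```

Note that this joins rows by position (index 0 with index 0, and so on), not by the contents of val, which is why the earlier examples move val into the index or pass on='val'.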
That concludes this overview of the Pandas data merging methods in Python. Try the examples yourself to see how each method behaves.