In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly introduces what are the Pandas data merge functions in Python. It is very detailed and has a certain reference value. Friends who are interested must finish reading it.
1. Concat
Concat is a function dedicated to data connection merging in pandas. It is very powerful and supports vertical merging and horizontal merging. By default, vertical merging can be set through parameters.
Pd.concat (objs: 'Iterable [NDFrame] | Mapping [Hashable, NDFrame]', axis=0, join='outer', ignore_index: 'bool' = False, keys=None, levels=None, names=None, verify_integrity:' bool' = False, sort: 'bool' = False, copy:' bool' = True,)-> 'FrameOrSeriesUnion'
In the function method, the meanings of the parameters are as follows:
Objs: data for connection, which can be a list of DataFrame or Series
Axis=0: the way to connect. Default is 0, which means vertical connection. Optional 1 is horizontal connection.
Join='outer': merging method. Default is inner, which means intersection. Optional outer is union.
Ignore_index: whether to keep the original index
Keys=None: join relationship, using the passed value as the primary index
Levels=None: used to construct multi-level indexes
Names=None: the name of the index
Verify_integrity: check whether the index is duplicated. If it is True, an error will be reported if the index is duplicated.
Sort: sort columns in union and merge mode
Copy: whether to make a deep copy
Next, we will demonstrate the function of this function.
Basic connection
In [1]: import pandas as pdIn [2]: S1 = pd.Series (['asides,' b']) In [3]: S2 = pd.Series (['croup,' d']) In [4]: s1Out [4]: 0A1 bdtype: objectIn [5]: s2Out [5]: 0c1 ddtype: objectIn [6]: pd.concat ([S1, S2]) Out [6]: 0a1 b0c1 ddtype: objectIn [7]: df1 = pd.DataFrame ([['a'') 1], ['baked, 2]],...: columns= [' letter', 'number']) In [8]: df2 = pd.DataFrame ([' letter', 'number']],...: columns= [' letter', 'number']) In [9]: pd.concat ([df1]) Df2]) Out [9]: letter number0 a 11 b 20 c 31 d 4
Horizontal connection
In [10]: pd.concat ([df1, df2], axis=1) Out [10]: letter number letter number0 a 1 c 31 b 2 d 4
By default, concat is union, and if one of the two data does not correspond to a row or column, it is populated with a null value of NaN.
Merge intersection
In [11]: df3 = pd.DataFrame ([['letter', 3,' cat'], ['dumped, 4,' dog']],...: columns= ['letter',' number'] 'animal']) In [12]: df1Out [12]: letter number0 a 11 b 2In [13]: df3Out [13]: letter number animal0 c 3 cat1 d 4 dogIn [14]: pd.concat ([df1, df3], join='inner') Out [14]: letter number0 a 11 b 20 c 31 d 4
Index reset (do not retain the original index)
In [15]: pd.concat ([df1, df3], join='inner', ignore_index=True) Out [15]: letter number0 a 11 b 22 c 33 d equivalent to the above output In [16]: pd.concat ([df1, df3] Join='inner') .reset_index (drop=True) Out [16]: letter number0 a 11 b 22 c 33 d 4
Specify index
In [17]: pd.concat ([df1, df3], keys= ['df1','df3']) Out [17]: letter number animaldf1 0 a 1 NaN 1 b 2 NaNdf3 0 c 3 cat 1 d 4 dogIn [18]: pd.concat ([df1, df3], keys= [' df1','df3'], names= ['df name' 'row ID']) Out [18]: letter number animaldf name line ID df1 0 a 1 NaN 1 b 2 NaNdf3 0 c 3 cat 1 d 4 dog
Detect duplicates
If the index is duplicated, it fails the test and an error is reported.
In [19]: pd.concat ([df1, df3], verify_integrity=True) Traceback (most recent call last):... ValueError: Indexes have overlapping values: Int64Index ([0,1], dtype='int64')
Columns sorting under merge and union
In [21]: pd.concat ([df1, df3], sort=True) Out [21]: animal letter number0 NaN a 11 NaN b 20 cat c 31 dog d 4
Merger of DataFrame and Series
In [22]: pd.concat ([df1, S1]) Out [22]: letter number 00 a 1.0 NaN1 b 2.0 NaN0 NaN NaN A1 NaN NaN bIn [23]: pd.concat ([df1, S1] Axis=1) Out [23]: letter number 00a1a1b 2b# In [24]: df1.assign (new column = S1) Out [24]: letter number new column 0a1a1b 2 bIn [25]: df1 ['new column'] = s1In [26]: df1Out [26]: letter number new column 0a 1 a1 b 2 b
These are some of the functions of the concat function method. By contrast, another function, append, can also be used for data appending (vertical merging).
2. Append
Append is mainly used to append data, which is a relatively simple and direct way of data merging.
Df.append (other, ignore_index: 'bool' = False, verify_integrity:' bool' = False, sort: 'bool' = False,)->' DataFrame'
In the function method, the meanings of the parameters are as follows:
Other: data for append, which can be DataFrame or Series or a list of components
Ignore_index: whether to keep the original index
Verify_integrity: check whether the index is duplicated. If it is True, an error will be reported if the index is duplicated.
Sort: sort columns in union and merge mode
Next, we will demonstrate the function of this function.
Basic addition
In [41]: df1.append (df2) Out [41]: letter number0 a 11 b 20 c 31 d 4In [42]: df1.append ([df1,df2] Df3]) Out [42]: letter number animal0 a 1 NaN1 b 2 NaN0 a 1 NaN1 b 2 NaN0 c 3 NaN1 d 4 NaN0 c 3 cat1 d 4 dog
Columns reset (do not retain the original index)
In [43]: df1.append ([df1,df2,df3], ignore_index=True) Out [43]: letter number animal0 a 1 NaN1 b 2 NaN2 a 1 NaN3 b 2 NaN4 c 3 NaN5 d 4 NaN6 c 3 cat7 d 4 dog
Detect duplicates
If the index is duplicated, it fails the test and an error is reported.
In [44]: df1.append ([df1,df2], verify_integrity=True) Traceback (most recent call last):... ValueError: Indexes have overlapping values: Int64Index ([0,1], dtype='int64')
Index sort
In [46]: df1.append ([df1,df2,df3], sort=True) Out [46]: animal letter number0 NaN a 11 NaN b 20 NaN a 11 NaN b 20 NaN c 31 NaN d 40 cat c 31 dog d 4
Append Series
In [49]: s = pd.Series ({'letter':'s1','number':9}) In [50]: sOut [50]: letter s1number 9dtype: objectIn [51]: df1.append (s) Traceback (most recent call last):... TypeError: Can only append a Series if ignore_index=True or if the Series has a nameIn [53]: df1.append (s, ignore_index=True) Out [53]: letter number0 a 11 b 22 S19
Additional dictionary
This works better when crawling. Every time a piece of data is crawled, it is merged into DataFrame-like data and stored.
In [54]: dic = {'letter':'s1','number':9} In [55]: df1.append (dic, ignore_index=True) Out [55]: letter number0 a 11 b 22 s 93. Merge
The merge function method is similar to join in SQL and can be pd.merge or df.merge, except that the data to be merged in the latter is
Pd.merge (left: 'DataFrame | Series', right:' DataFrame | Series', how: 'str' =' inner', on: 'IndexLabel | None' = None, left_on:' IndexLabel | None' = None, right_on: 'IndexLabel | None' = None, left_index:' bool' = False, right_index: 'bool' = False, sort:' bool' = False Suffixes: 'Suffixes' = (' _ x','_ y'), copy: 'bool' = True, indicator:' bool' = False, validate: 'str | None' = None,)->' DataFrame'
In the function method, the key parameters have the following meanings:
Left: left side data for connection
Right: data on the right side of the connection
How: data connection method. Default is inner. Optional outer, left and right
On: connect key fields, both in the left and right data need to exist, otherwise use left_on and right_on
Left_on: the key field of the data on the left for the connection
Right_on: the key field of the data on the right for the connection
Left_index: True indicates that the left index is a join key field
Right_index: True indicates that the right index is a join key field
Suffixes: 'Suffixes' = (' _ x _ names,'_ y'), which can be freely specified, that is, the column names will display the suffix after merging the same column names.
Indicator: whether to show the home source of a row of data after merging
Next, we will demonstrate the function of this function.
Basic merger
In [55]: df1 = pd.DataFrame ({'key': [' foo', 'bar',' bal'],...: 'value2': [1,2,3]}) In [56]: df2 = pd.DataFrame ({' key': ['foo',' bar', 'baz'],.:' value1': [5,6] 7]}) In [57]: df1.merge (df2) Out [57]: key value2 value10 foo 1 51 bar 2 6
Other connection methods
In [58]: df1.merge (df2, how='left') Out [58]: key value2 value10 foo 1 5.01 bar 2 6.02 bal 3 NaNIn [59]: df1.merge (df2, how='right') Out [59]: key value2 value10 foo 1.051 bar 2.0 62 baz NaN 7In [60]: df1.merge (df2) How='outer') Out [60]: key value2 value10 foo 1.05.01 bar 2.06.02 bal 3.0 NaN3 baz NaN 7.0In [61]: df1.merge (df2 How='cross') Out [61]: key_x value2 key_y value10 foo 1 foo 51 foo 1 bar 62 foo 1 baz 73 bar 2 foo 54 bar 2 bar 65 bar 2 baz 76 bal 3 foo 57 bal 3 bar 68 bal 3 baz 7
Specify connection key
You can specify a single connection key or multiple connection keys
In [62]: df1 = pd.DataFrame ({'lkey1': [' foo', 'bar',' bal'],...: 'lkey2': [' await, 'baked,' c'],...: 'value2': [1,2,3]}) In [63]: df2 = pd.DataFrame ({' rkey1': ['foo',' bar') 'baz'],...:' rkey2': ['await,' baked,'c'],...: 'value2': [5,6 7]}) In [64]: df1Out [64]: lkey1 lkey2 value20 foo a 11 bar b 22 bal c 3In [65]: df2Out [65]: rkey1 rkey2 value20 foo a 51 bar b 62 baz c 7In [66]: df1.merge (df2, left_on='lkey1') Right_on='rkey1') Out [66]: lkey1 lkey2 value2_x rkey1 rkey2 value2_y0 foo a 1 foo a 51 bar b 2 bar b 6In [67]: df1.merge (df2, left_on= ['lkey1','lkey2'], right_on= [' rkey1'] 'rkey2']) Out [67]: lkey1 lkey2 value2_x rkey1 rkey2 value2_y0 foo a 1 foo a 51 bar b 2 bar b 6
Specify the index as the key
Out [68]: df1.merge (df2, left_index=True, right_index=True) Out [68]: lkey1 lkey2 value2_x rkey1 rkey2 value2_y0 foo a 1 foo a 51 bar b 2 bar b 62 bal c 3 baz c 7
Set duplicate column suffix
In [69]: df1.merge (df2, left_on='lkey1', right_on='rkey1', suffixes= ['left', 'right']) Out [69]: lkey1 lkey2 value2 left rkey1 rkey2 value2 right 0 foo a 1 foo a 51 bar b 2 bar b 6
Connection indication
Add a new column to display data sources
In [70]: df1.merge (df2, left_on='lkey1', right_on='rkey1', suffixes= ['left', 'right'], how='outer' ...: indicator=True...:) Out [70]: lkey1 lkey2 value2 left rkey1 rkey2 value2 right _ merge0 foo a 1.0 foo a 5.0 both1 bar b 2.0 bar b 6.0 both2 bal c 3.0 NaN left_only3 NaN baz c 7.0 right_only4. Join
Join is a bit like append to concat, for data consolidation.
Df.join (other: 'FrameOrSeriesUnion', on:' IndexLabel | None' = None, how: 'str' =' left', lsuffix: 'str' ='', rsuffix: 'str' ='', sort: 'bool' = False,)->' DataFrame'
In the function method, the key parameters have the following meanings:
Other: right side data for merging
On: connect key fields, both in the left and right data need to exist, otherwise use left_on and right_on
How: data connection method. Default is inner. Optional outer, left and right
Lsuffix: the suffix of the same name on the left
Rsuffix: the suffix of the same name column on the right
Next, we will demonstrate the function of this function.
In [71]: df = pd.DataFrame ({'key': [' K0','K1','K2','K3','K4','K5'],...:'A3','A1','A2','A3','A4','A5']}) In [72]: other = pd.DataFrame ({'key': [' K0','K1') In [73]: dfOut [73]: key A0 K0 A01 K1 A12 K2 A23 K3 34 K4 A45 K5 A5In [74]: otherOut [74]: key B0K0B01 K1 B12 K2 B2In [75]: df.join (other, on='key') Traceback (most recent call last):... ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
If you want to use the key keyword, you need key to be an index.
Specify key
In [76]: df.set_index ('key') .join (other.set_index (' key')) Out [76]: a Bkey K0 A0 B0K1 A1 B1K2 A2 B2K3 A3 NaNK4 A4 NaNK5 A5 NaNIn [77]: df.join (other.set_index ('key'), on='key') Out [77]: key A B0 K0 A0 B01 K1 A1 B2 K23 K3 NaN4 K4 NaN5 K5 A5 NaN
Specify duplicate column suffix
In [78]: df.join (other, lsuffix='_ left', rsuffix=' right') Out [78]: key_ left A key right B0 K0 A0 K0 B01 K1 A1 K1 B12 K2 A2 K2 B23 K3 A3 NaN NaN4 K4 NaN NaN5 K5 A5 NaN NaN
Other parameters will not be introduced, which is basically the same as merge.
5. Combine
In the process of data merging, we may need to calculate the values of the corresponding positions. Pandas provides combine and combine_first function methods for cooperative operations in this respect.
Df.combine (other: 'DataFrame', func, fill_value=None, overwrite:' bool' = True,)-> 'DataFrame'
For example, when merging data, take the smallest value of the cell.
In [79]: df1 = pd.DataFrame ({'Aids: [0,0],' bundles: [4,4]}) In [80]: df2 = pd.DataFrame ({'Aids: [1,1],' bones: [3,3]}) In [81]: df1Out [81]: a B0041 10 4In [82]: df2Out [82]: a B01 31 1 3In [83]: take_smaller = lambda S1, S2: S1 if s1.sum ()
< s2.sum() else s2In [84]: df1.combine(df2, take_smaller)Out[84]: A B0 0 31 0 3# 也可以调用numpy的函数In [85]: import numpy as npIn [86]: df1.combine(df2, np.minimum)Out[86]: A B0 0 31 0 3 fill_value填充缺失值 In [87]: df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})In [87]: df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})In [88]: df1Out[88]: A B0 0 NaN1 0 4.0In [89]: df2Out[89]: A B0 1 31 1 3In [90]: df1.combine(df2, take_smaller, fill_value=-88)Out[90]: A B0 0 -88.01 0 4.0 overwrite=False保留 In [91]: df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})In [92]: df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])In [93]: df1Out[93]: A B0 0 41 0 4In [94]: df2Out[94]: B C1 3 -102 3 1In [95]: df1.combine(df2, take_smaller)Out[95]: A B C0 NaN NaN NaN1 NaN 3.0 -10.02 NaN 3.0 1.0# 保留A列原有的值In [96]: df1.combine(df2, take_smaller, overwrite=False)Out[96]: A B C0 0.0 NaN NaN1 0.0 3.0 -10.02 NaN 3.0 1.0 另外一个combine_first df.combine_first(other: 'DataFrame') ->'DataFrame'
When the element in df is empty, it is replaced by the one in other, and the result is union merging.
In [97]: df1 = pd.DataFrame ({'Aids: [None, 0],' Barrier: [None, 4]}) In [98]: df2 = pd.DataFrame ({'Aids: [1,1],' baked: [3] 3]}) In [99]: df1Out [99]: a B0 NaN NaN1 0.0 4.0In [100]: df2Out [100]: a B0 1 31 11 3In [101]: df1.combine_first (df2) Out [101]: a B0 1.03.01 0.0 4.0In [102]: df1 = pd.DataFrame ({'Aids: [None, 0],' Bread: [4] None]}) In [103]: df2 = pd.DataFrame ({'Barrier: [3,3],' index=: [1,1]}, index= [1] In: df1Out: a B0 NaN 4.01 0.0 NaNIn: df2Out: BC1 1312 3 1In: df1.combine_first (df2) Out: a B C 0 NaN 4.0 NaN1 0.0 1.02 NaN 3.0 is all the content of the article "what are the Pandas data merging functions in Python" Thank you for reading! Hope to share the content to help you, more related knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.