How to Add, Delete, Modify, Query, Deduplicate, and Sample Data in Pandas

2025-04-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains basic operations on a pandas DataFrame: adding, deleting, modifying, and querying data, removing duplicates, and sampling. The methods introduced here are simple, fast, and practical.

Summary

pandas has several main indexers:

loc: label-based indexing, using row and column names

iloc: integer-position indexing, using absolute row and column positions starting from 0

ix: a mix of loc and iloc (deprecated in pandas 0.20 and removed in 1.0; use loc or iloc instead)

at: a fast single-value accessor, the scalar counterpart of loc

iat: a fast single-value accessor, the scalar counterpart of iloc
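The difference between the label-based and position-based indexers is easiest to see when the row labels are not the default 0, 1, 2. The following is a minimal sketch; the frame and its "x", "y", "z" labels are made up for illustration and do not come from the article.

```python
import pandas as pd

# Hypothetical frame with non-default row labels, so labels and positions differ.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": ["a", "b", "c"]},
    index=["x", "y", "z"],
)

row_by_label = df.loc["y"]        # label-based: the row labelled "y"
row_by_position = df.iloc[1]      # position-based: the second row
cell_by_label = df.at["y", "a"]   # single value, addressed by labels
cell_by_position = df.iat[1, 0]   # single value, addressed by positions

print(row_by_label.equals(row_by_position))
print(cell_by_label, cell_by_position)
```

Both row lookups return the same Series, and both cell lookups return the same scalar; at and iat only accept a single row/column pair, which is what lets them skip the overhead of loc and iloc.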

Create a test dataset:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': ["A", "B", "C"]})
print(df)
   a  b  c
0  1  a  A
1  2  b  B
2  3  c  C

Row operations

Select a row

print(df.loc[1, :])
a    2
b    b
c    B
Name: 1, dtype: object

Select multiple rows

print(df.loc[1:2, :])  # select rows 1 through 2 with a step of 1 (with loc, both ends are included)
   a  b  c
1  2  b  B
2  3  c  C

print(df.loc[::-1, :])  # select all rows with a step of -1, i.e. rows 2 down to 0 in reverse order
   a  b  c
2  3  c  C
1  2  b  B
0  1  a  A

print(df.loc[0:2:2, :])  # select rows 0 to 2 with a step of 2; equivalent to print(df.loc[0::2, :]) because there are only three rows
   a  b  c
0  1  a  A
2  3  c  C

Conditional filtering

General conditional filtering

print(df.loc[:, "a"] > 2)  # the condition is evaluated first, producing a boolean Series that is then used to filter
0    False
1    False
2     True
Name: a, dtype: bool

print(df.loc[df.loc[:, "a"] > 2, :])
   a  b  c
2  3  c  C

Conditions can also be combined with logical operators: | for or, & for and, and ~ for not. Each condition must be wrapped in its own parentheses.
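As a minimal sketch of the three operators, the snippet below applies each one to the three-row test frame used in this article. The variable names (both, either, negated) are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"], "c": ["A", "B", "C"]})

# Each condition needs its own parentheses, because & | ~ bind more
# tightly than comparison operators such as > and ==.
both = df.loc[(df["a"] > 1) & (df["b"] != "c"), :]     # and
either = df.loc[(df["a"] < 2) | (df["c"] == "C"), :]   # or
negated = df.loc[~(df["a"] > 2), :]                    # not

print(both)
print(either)
print(negated)
```

Python's own and, or, and not keywords do not work here, since they try to reduce a whole Series to a single truth value and pandas raises an error.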

The examples below also assume import numpy as np.

In: s = pd.Series(range(-3, 4))

In [132]: s[(s < -1) | (s > 0.5)]
Out:
0   -3
1   -2
4    1
5    2
6    3
dtype: int64

isin

Using isin on the values (not the index)

In [141]: s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

In [143]: s.isin([2, 4, 6])
Out[143]:
4    False
3    False
2     True
1    False
0     True
dtype: bool

In [144]: s[s.isin([2, 4, 6])]
Out[144]:
2    2
0    4
dtype: int64

Using isin on the index

In [145]: s[s.index.isin([2, 4, 6])]
Out[145]:
4    0
2    2
dtype: int64

# compare it to the following
In [146]: s[[2, 4, 6]]
Out[146]:
2    2.0
4    0.0
6    NaN
dtype: float64

The output of In [146] is from an older pandas release. Since pandas 1.0, indexing with a list that contains a missing label raises KeyError; s.reindex([2, 4, 6]) produces the result shown.

Combining isin with any() / all() to filter on several columns

In [151]: df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a', 'b', 'f', 'n'],
   .....:                    'ids2': ['a', 'n', 'c', 'n']})

In [156]: values = {'ids': ['a', 'b'], 'ids2': ['a', 'c'], 'vals': [1, 3]}

In [157]: row_mask = df.isin(values).all(axis=1)

In [158]: df[row_mask]
Out[158]:
   vals ids ids2
0     1   a    a

where()

In [1]: dates = pd.date_range('1/1/2000', periods=8)

In [2]: df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

In [3]: df
Out[3]:
                   A         B         C         D
2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
2000-01-02  1.212112 -0.173215  0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
2000-01-04  0.721555 -0.706771 -1.039575  0.271860
2000-01-05 -0.424972  0.567020  0.276232 -1.087401
2000-01-06 -0.673690  0.113648 -1.478427  0.524988
2000-01-07  0.404705  0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312  0.844885

In [162]: df.where(df < 0, -df)
Out[162]:
                   A         B         C         D
2000-01-01 -2.104139 -1.309525 -0.485855 -0.245166
2000-01-02 -0.352480 -0.390389 -1.192319 -1.655824
2000-01-03 -0.864883 -0.299674 -0.227870 -0.281059
2000-01-04 -0.846958 -1.222082 -0.600705 -1.233203
2000-01-05 -0.669692 -0.605656 -1.169184 -0.342416
2000-01-06 -0.868584 -0.948458 -2.297780 -0.684718
2000-01-07 -2.670153 -0.114722 -0.168904 -0.048048
2000-01-08 -0.801196 -1.392071 -0.048788 -0.808838

The values in Out[162] come from a different random draw than Out[3], so the magnitudes do not match; the point is that every entry is now negative.

Difference between DataFrame.where() and numpy.where()

DataFrame.where(cond, other) keeps the original values where cond is True and takes other elsewhere, so it corresponds to numpy.where(cond, df, other):

In [172]: df.where(df < 0, -df) == np.where(df < 0, df, -df)

When where() is called on a Series, it returns a Series of the same shape, with NaN where the condition is False:

In [141]: s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

In [159]: s[s > 0]
Out[159]:
3    1
2    2
1    3
0    4
dtype: int64

In: s.where(s > 0)
Out:
4    NaN
3    1.0
2    2.0
1    3.0
0    4.0
dtype: float64

Sampling

DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)

When sampling with weights, rows without an assigned weight are given a weight of 0, and if the weights do not sum to 1, each weight is divided by the sum. random_state sets the seed for the random sampling. axis=1 samples columns instead of rows.
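The following sketch illustrates the three points above: weight normalization, a fixed seed, and column sampling. The weights [2, 1, 1, 0] and the seed 0 are arbitrary values picked for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({"col1": [9, 8, 7, 6], "weight_column": [2, 1, 1, 0]})

# The weights sum to 4, so pandas rescales them to 0.5, 0.25, 0.25, 0.
# The last row has weight 0 and can never be drawn.
picked = df2.sample(n=2, weights="weight_column", random_state=0)

# Reusing the same seed reproduces the same sample.
again = df2.sample(n=2, weights="weight_column", random_state=0)

# axis=1 samples columns instead of rows.
one_column = df2.sample(n=1, axis=1, random_state=0)

print(picked)
print(one_column.columns.tolist())
```

Fixing random_state is mainly useful for tests and reproducible notebooks; leave it unset when a fresh sample is wanted on every call.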

In: df2 = pd.DataFrame({'col1': [9, 8, 7, 6], 'weight_column': [0.5, 0.4, 0.1, 0]})

In: df2.sample(n=3, weights='weight_column')
Out:
   col1  weight_column
1     8            0.4
0     9            0.5
2     7            0.1

Add a row

df.loc[3, :] = 4
     a  b  c
0  1.0  a  A
1  2.0  b  B
2  3.0  c  C
3  4.0  4  4

Insert a row

pandas has no built-in method for inserting a row at a given position in the index, so you have to build it yourself by splitting the DataFrame and concatenating the pieces around the new row.
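One way to do this by hand is to wrap the split-and-concatenate steps in a small helper. The function insert_row below is hypothetical (it is not part of pandas), and it slices by integer position with iloc, so it does not depend on what the row labels happen to be.

```python
import pandas as pd


def insert_row(df, position, values):
    """Return a new DataFrame with `values` inserted before integer position `position`."""
    new_row = pd.DataFrame([values], columns=df.columns)
    top = df.iloc[:position]       # rows before the insertion point
    bottom = df.iloc[position:]    # rows from the insertion point onward
    # reset_index renumbers the rows 0..n so the result has no duplicate labels.
    return pd.concat([top, new_row, bottom]).reset_index(drop=True)


df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"], "c": ["A", "B", "C"]})
result = insert_row(df, 1, ["-", "-", "-"])
print(result)
```

The original frame is left untouched; note that inserting strings into a numeric column, as here, turns that column into object dtype.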

line = pd.DataFrame({df.columns[0]: "-", df.columns[1]: "-", df.columns[2]: "-"}, index=[1])
df = pd.concat([df.loc[:0], line, df.loc[1:]]).reset_index(drop=True)  # df.loc[:0] cannot be written as df.loc[0], because df.loc[0] returns a Series
     a  b  c
0  1.0  a  A
1    -  -  -
2  2.0  b  B
3  3.0  c  C
4  4.0  4  4

Swap rows

df.loc[[1, 2], :] = df.loc[[2, 1], :].values
   a  b  c
0  1  a  A
1  3  c  C
2  2  b  B

Delete a row

df.drop(0, axis=0)
   a  b  c
1  2  b  B
2  3  c  C

Note

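In a DataFrame indexed by time, loc is still label-based, so rows are selected with date labels (strings that pandas can parse as dates) rather than with integer positions.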

In [39]: dfl = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'), index=pd.date_range('20130101', periods=5))

In [40]: dfl
Out[40]:
                   A         B         C         D
2013-01-01  1.075770 -0.109050  1.643563 -1.469388
2013-01-02  0.357021 -0.674600 -1.776904 -0.968914
2013-01-03 -1.294524  0.413738  0.276662 -0.472035
2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061
2013-01-05  0.895717  0.805244 -1.206412  2.565646

In [41]: dfl.loc['20130102':'20130104']
Out[41]:
                   A         B         C         D
2013-01-02  0.357021 -0.674600 -1.776904 -0.968914
2013-01-03 -1.294524  0.413738  0.276662 -0.472035
2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061

Column operations

Select a column

print(df.loc[:, "a"])
0    1
1    2
2    3
Name: a, dtype: int64

Select multiple columns

print(df.loc[:, "a":"b"])
   a  b
0  1  a
1  2  b
2  3  c

Add a column (if the column already exists, this assigns to it instead)

df.loc[:, "d"] = 4
   a  b  c  d
0  1  a  A  4
1  2  b  B  4
2  3  c  C  4

Swap the values of two columns

df.loc[:, ['b', 'a']] = df.loc[:, ['a', 'b']].values
print(df)
   a  b  c
0  a  1  A
1  b  2  B
2  c  3  C

Delete a column

1) Use del directly: del DF['column-name']

2) Use the drop method, which can be written in several ways (pass axis by keyword; recent pandas versions no longer accept it positionally):

DF = DF.drop('column_name', axis=1)

DF.drop('column_name', axis=1, inplace=True)

DF.drop(DF.columns[[0, 1]], axis=1, inplace=True)  # drop columns by position

df.drop("a", axis=1, inplace=True)
print(df)
   b  c
0  a  A
1  b  B
2  c  C
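As a complement to the in-place calls above, here is a small sketch of drop used without inplace, which returns a new frame and leaves the original alone. It relies on the columns= keyword and the errors= parameter of DataFrame.drop; the column name "missing" is a made-up example of a name that does not exist.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"], "c": ["A", "B", "C"]})

# columns="a" is equivalent to passing "a" with axis=1.
without_a = df.drop(columns="a")

# Drop by position by looking the names up in df.columns first.
without_first_two = df.drop(columns=df.columns[[0, 1]])

# errors="ignore" skips names that are not present instead of raising KeyError.
unchanged = df.drop(columns="missing", errors="ignore")

print(without_a.columns.tolist())
print(without_first_two.columns.tolist())
print(unchanged.columns.tolist())
```

Returning a new frame makes it easier to chain operations and avoids surprising other code that still holds a reference to the original df.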

There are some other features as well:

Slicing: df.loc[::, ::]

Random sampling: df.sample()

Removing duplicates: .duplicated()

Querying: .lookup() (deprecated in pandas 1.2 and removed in 2.0)
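For the duplicate-removal item in particular, a minimal sketch may help. duplicated() only flags repeated rows; the companion method drop_duplicates() is what actually removes them. The data below is invented for illustration, with rows 0 and 2 identical.

```python
import pandas as pd

# Hypothetical data with one repeated row (rows 0 and 2 are identical).
df = pd.DataFrame({"a": [1, 2, 1, 3], "b": ["x", "y", "x", "z"]})

# True for each row that repeats an earlier row.
mask = df.duplicated()

# Keep the first occurrence of every distinct row.
deduped = df.drop_duplicates()

# Compare on column "a" only, and keep the last occurrence instead.
by_column = df.drop_duplicates(subset="a", keep="last")

print(mask.tolist())
print(deduped)
print(by_column)
```

df[~df.duplicated()] gives the same rows as df.drop_duplicates(), which shows how the boolean mask ties back to the conditional filtering covered earlier.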

At this point, you should have a better understanding of how to add, delete, modify, query, deduplicate, and sample data in pandas. The best way to consolidate it is to try these operations yourself.
