What is the difference between duplicated() and drop_duplicates() in Python, and how to use them


Updated 2025-07-04. From: SLTechnology News&Howtos (shulou).

Shulou (Shulou.com), 06/01 report.

This article explains the difference between duplicated() and drop_duplicates() in pandas and how to use each of them. The explanations are detailed but easy to follow, and the examples are quick to run, so it should serve as a useful reference.

Preface

While computing the face_track_id map, I noticed the following:

Start by building some test data:

data = {'state': [1, 1, 2, 2, 1, 2, 2, 2], 'pop': ['a', …]}
frame = pd.DataFrame(data)
frame[frame.duplicated()]

The last line shows the repeated rows. A row that occurs n times in total appears n - 1 times in this result, so a row that occurs twice shows up only once.

At first I was confused: (1, b) is clearly duplicated, so why does it appear only once? The reason is that duplicated() flags only the rows that have already appeared earlier in the frame, never the first occurrence. That is what makes it look as if (1, b) were not repeated at all. The function itself is simple: it returns the redundant copies and leaves out the original.
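To see this concretely, here is a small runnable sketch. The 'pop' values in the test data are garbled in the original, so the list below is an assumption, chosen to match the text: (1, 'b') occurs twice and one state-2 row, (2, 'd'), occurs three times.

```python
import pandas as pd

# Assumed 'pop' values (the original list is garbled): (1, 'b') occurs twice,
# (2, 'c') twice, and (2, 'd') three times.
data = {'state': [1, 1, 2, 2, 1, 2, 2, 2],
        'pop': ['a', 'b', 'c', 'd', 'b', 'c', 'd', 'd']}
frame = pd.DataFrame(data)

# duplicated() returns a boolean Series: True for each row that matches an
# earlier row. The first occurrence of every row is False by default.
mask = frame.duplicated()
print(mask.tolist())
# [False, False, False, False, True, True, True, True]

# Boolean indexing keeps only the flagged rows: n - 1 copies of a row seen n times.
repeats = frame[mask]
print(repeats.index.tolist())  # [4, 5, 6, 7]
```

Here (1, 'b') occurs twice in frame but only once in repeats (row 4), and (2, 'd') occurs three times but only twice in repeats (rows 6 and 7).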

drop_duplicates() removes duplicate rows and, by default, keeps the first occurrence of each. For example, a row that appears 3 times shows up twice in duplicated(), and drop_duplicates() keeps exactly one copy of it:

frame.drop_duplicates().shape   # (4, 2)
frame.drop_duplicates()         # only completely unique rows remain
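The relationship between the two functions can be checked directly. The sketch below reuses the assumed 'pop' values (the original list is garbled) and confirms that drop_duplicates() keeps exactly the rows that duplicated() leaves unflagged. It also shows the keep parameter that decides which copy survives: 'first' is the default, and pandas also accepts 'last' and False.

```python
import pandas as pd

# Same assumed data as before (the original 'pop' values are garbled).
frame = pd.DataFrame({'state': [1, 1, 2, 2, 1, 2, 2, 2],
                      'pop': ['a', 'b', 'c', 'd', 'b', 'c', 'd', 'd']})

# Default keep='first': one copy of each distinct row survives.
unique_rows = frame.drop_duplicates()
print(unique_rows.shape)  # (4, 2)

# drop_duplicates() keeps exactly the rows that duplicated() does not flag.
assert unique_rows.equals(frame[~frame.duplicated()])

# keep='last' keeps the final occurrence of each row instead of the first.
last_rows = frame.drop_duplicates(keep='last')
print(last_rows.index.tolist())  # [0, 4, 5, 7]

# keep=False drops every row that has any duplicate at all.
never_repeated = frame.drop_duplicates(keep=False)
print(never_repeated.index.tolist())  # [0]
```

The same keep values work for duplicated(), so the two functions always agree on which rows count as repeats.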

Supplement: handling duplicate values in pandas with duplicated() and drop_duplicates()

1. Generate data with duplicate records

import numpy as np
import pandas as pd

# generate data containing duplicates
df = pd.DataFrame(np.ones([5, 2]), columns=['col1', 'col2'])
df['col3'] = ['a', …]
df['col4'] = [3, …]
df = df.reindex(columns=['col3', 'col4', 'col1', 'col2'])  # move the new columns to the front
df

2. Check for duplicate records (rows)

# flag duplicate records: True for each row that repeats an earlier row
isDuplicated = df.duplicated()
isDuplicated

3. Delete duplicate values

new_df1 = df.drop_duplicates()                  # drop rows identical to an earlier row in every column
new_df2 = df.drop_duplicates(['col3'])          # keep only the first row for each col3 value
new_df3 = df.drop_duplicates(['col4'])          # keep only the first row for each col4 value
new_df4 = df.drop_duplicates(['col3', 'col4'])  # keep only the first row for each (col3, col4) pair
new_df1
new_df2
new_df3
new_df4
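A runnable version of the steps above, for illustration only. The col3 and col4 values are hypothetical (the original lists are garbled) and were picked so that comparing col3 alone gives a different result from comparing the col3/col4 pair. Passing the column list positionally, as the code above does, fills the same subset parameter used here.

```python
import numpy as np
import pandas as pd

# Hypothetical col3/col4 values; the originals are garbled in the text above.
df = pd.DataFrame(np.ones([5, 2]), columns=['col1', 'col2'])
df['col3'] = ['a', 'b', 'a', 'b', 'c']
df['col4'] = [3, 2, 1, 2, 2]
df = df.reindex(columns=['col3', 'col4', 'col1', 'col2'])  # new columns first

# Without subset, every column is compared: only row 3 repeats row 1 exactly.
print(df.duplicated().tolist())  # [False, False, False, True, False]

# With subset, only the listed columns decide what counts as a duplicate.
new_df1 = df.drop_duplicates()                         # whole-row comparison
new_df2 = df.drop_duplicates(subset=['col3'])          # compare col3 only
new_df3 = df.drop_duplicates(subset=['col4'])          # compare col4 only
new_df4 = df.drop_duplicates(subset=['col3', 'col4'])  # compare the pair

for name, result in [('new_df1', new_df1), ('new_df2', new_df2),
                     ('new_df3', new_df3), ('new_df4', new_df4)]:
    print(name, result.index.tolist())
```

With these values new_df1 and new_df4 both keep rows [0, 1, 2, 4], while new_df2 also drops row 2 because its col3 value 'a' already appeared in row 0, and new_df3 keeps only [0, 1, 2] because col4 holds just three distinct values.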
