2025-04-11 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to use index alignment in pandas. Index alignment is a common source of confusion when working through real cases, so the sections below walk through the typical situations one at a time. I hope you read it carefully and find it useful!
1. Index objects support set operations: union, intersection, difference, and symmetric difference.
Demo1:
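import pandas as pd

college = pd.read_csv('data/college.csv')
columns = college.columns
c1 = columns[:4]
c2 = columns[2:5]
print(c1.union(c2))
# Older pandas versions also accepted c1 | c2 as shorthand for union;
# that use of | on an Index was deprecated, so union() is the reliable spelling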
Demo2:
import pandas as pd

college = pd.read_csv('data/college.csv')
columns = college.columns
c1 = columns[:4]
c2 = columns[2:5]
print("c1:", c1)
print("c2:", c2)
print(c1.symmetric_difference(c2))
# Older pandas versions also accepted c1 ^ c2 as shorthand; that use of ^
# on an Index was deprecated, so symmetric_difference() is the reliable spelling
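To see all four set operations side by side without the college.csv file, here is a minimal sketch on two small, made-up Index objects (the labels 'a' to 'e' are hypothetical stand-ins for column names):

```python
import pandas as pd

# Hypothetical column indexes, not the real college.csv columns
c1 = pd.Index(['a', 'b', 'c', 'd'])
c2 = pd.Index(['c', 'd', 'e'])

union = c1.union(c2)                 # labels in either index
common = c1.intersection(c2)         # labels in both indexes
only_c1 = c1.difference(c2)          # labels in c1 but not in c2
sym = c1.symmetric_difference(c2)    # labels in exactly one of the two

print(list(union))    # ['a', 'b', 'c', 'd', 'e']
print(list(common))   # ['c', 'd']
print(list(only_c1))  # ['a', 'b']
print(list(sym))      # ['a', 'b', 'e']
```

Each method returns a new Index and leaves c1 and c2 unchanged.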
2. Create independent data with copy()
a is b returns True when both names point to the same object. In that case, modifying one also changes the other; copy() creates a new object instead.
Demo1:
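A minimal sketch of the difference between plain assignment and copy(), using a small hypothetical DataFrame rather than one of the article's data files:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})  # hypothetical example data

alias = df          # no new object: both names refer to the same DataFrame
clone = df.copy()   # copy() is deep by default: a new object with its own data

print(alias is df)  # True
print(clone is df)  # False

alias.loc[0, 'x'] = 100   # writing through the alias changes df as well
print(df.loc[0, 'x'])     # 100
print(clone.loc[0, 'x'])  # 1, the copy is unaffected
```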
3. Unequal indexes (the difference method of Index)
Demo1:
Use difference to find which index labels (player IDs) are in baseball_14 but not in baseball_15, and which are in baseball_14 but not in baseball_16:
import pandas as pd

baseball_14 = pd.read_csv('data/baseball14.csv', index_col='playerID')
baseball_15 = pd.read_csv('data/baseball15.csv', index_col='playerID')
baseball_16 = pd.read_csv('data/baseball16.csv', index_col='playerID')
print(baseball_14.index.difference(baseball_15.index))
print(baseball_14.index.difference(baseball_16.index))
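The same idea on small, hypothetical player IDs (p1 to p5 are invented, not taken from the baseball files). The last step also shows one way to get players who appear in neither later season, by taking the union of the two later indexes first:

```python
import pandas as pd

# Hypothetical stand-ins for the baseball_14/15/16 indexes
idx_14 = pd.Index(['p1', 'p2', 'p3', 'p4'], name='playerID')
idx_15 = pd.Index(['p2', 'p3', 'p5'], name='playerID')
idx_16 = pd.Index(['p3', 'p4'], name='playerID')

not_in_15 = idx_14.difference(idx_15)   # in 2014 but not 2015
not_in_16 = idx_14.difference(idx_16)   # in 2014 but not 2016
print(list(not_in_15))  # ['p1', 'p4']
print(list(not_in_16))  # ['p1', 'p2']

# In 2014 but in neither 2015 nor 2016
gone = idx_14.difference(idx_15.union(idx_16))
print(list(gone))  # ['p1']
```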
4. Use fill_value to avoid missing values in arithmetic operations
Demo1:
import pandas as pd

baseball_14 = pd.read_csv('data/baseball14.csv', index_col='playerID')
baseball_15 = pd.read_csv('data/baseball15.csv', index_col='playerID')
# H column: hits
hits_14 = baseball_14['H']
hits_15 = baseball_15['H']
print(hits_14.head())
print(hits_15.head())
print(hits_14.head() + hits_15.head())
In the result, four records are NaN: their labels do not appear in both Series, so the addition has no matching value to add. The fill_value parameter handles this case.
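A minimal sketch of why the NaN values appear, using two small Series with hypothetical player IDs and hit counts. Arithmetic between Series aligns on index labels, not on position, and the result covers the union of both indexes:

```python
import pandas as pd

# Hypothetical hit counts keyed by player ID
hits_14 = pd.Series({'p1': 10, 'p2': 20, 'p3': 30})
hits_15 = pd.Series({'p2': 5, 'p3': 7, 'p4': 9})

total = hits_14 + hits_15   # aligned on labels p1..p4
print(total)
# p2 and p3 exist in both Series and are summed;
# p1 and p4 exist in only one Series, so they become NaN
```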
Demo2:
import pandas as pd

baseball_14 = pd.read_csv('data/baseball14.csv', index_col='playerID')
baseball_15 = pd.read_csv('data/baseball15.csv', index_col='playerID')
baseball_16 = pd.read_csv('data/baseball16.csv', index_col='playerID')
# H column: hits
hits_14 = baseball_14['H']
hits_15 = baseball_15['H']
hits_16 = baseball_16['H']
print(hits_14.head().add(hits_15.head(), fill_value=0))
* Note: if a value is missing in both Series, the result is still missing even when fill_value is used.
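A small sketch of both behaviors with hypothetical data: fill_value substitutes 0 where only one side is missing, but a label that is NaN in both Series stays NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data; p3 is NaN in both Series on purpose
hits_14 = pd.Series({'p1': 10, 'p2': 20, 'p3': np.nan})
hits_15 = pd.Series({'p2': 5, 'p3': np.nan, 'p4': 9})

total = hits_14.add(hits_15, fill_value=0)
# p1 -> 10.0 (missing from hits_15, treated as 0)
# p2 -> 25.0
# p3 -> NaN  (missing in both, fill_value does not apply)
# p4 -> 9.0  (missing from hits_14, treated as 0)
print(total)
```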
5. Append columns from a different DataFrame
Demo:
import pandas as pd

employee = pd.read_csv('data/employee.csv')
d1 = employee[['DEPARTMENT', 'BASE_SALARY']]
print("before sorting:")
print(d1.head())

# Within each department, sort BASE_SALARY in descending order
d2 = d1.sort_values(['DEPARTMENT', 'BASE_SALARY'], ascending=[True, False])
print("sorted:")
print(d2.head())

# Use drop_duplicates to keep the first (highest-paid) row of each department
d3 = d2.drop_duplicates(subset='DEPARTMENT')
print('deduplicated:')
print(d3.head())

# Use DEPARTMENT as the row index of both DataFrames
d3 = d3.set_index('DEPARTMENT')
employee = employee.set_index('DEPARTMENT')

# Add a column storing each department's maximum salary;
# rows whose department has no match get a missing value
employee['MAX_SALARY'] = d3['BASE_SALARY']
pd.options.display.max_columns = 3
print('merged:')
print(employee.head())

# Check with query: no row should have BASE_SALARY greater than MAX_SALARY,
# so the result should be empty
print('query result:')
print(employee.query('BASE_SALARY > MAX_SALARY'))
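The same steps on a tiny, hypothetical stand-in for data/employee.csv (the departments and salaries below are invented). As a cross-check it also computes the department maximum with a different technique, groupby plus transform('max'), which should agree with the aligned column:

```python
import pandas as pd

# Hypothetical stand-in for data/employee.csv
employee = pd.DataFrame({
    'DEPARTMENT': ['Fire', 'Police', 'Fire', 'Police', 'Library'],
    'BASE_SALARY': [50000, 62000, 71000, 48000, 39000],
})

# Highest salary per department, one row per department
top = (employee.sort_values(['DEPARTMENT', 'BASE_SALARY'], ascending=[True, False])
       .drop_duplicates(subset='DEPARTMENT')
       .set_index('DEPARTMENT'))

employee = employee.set_index('DEPARTMENT')
employee['MAX_SALARY'] = top['BASE_SALARY']   # aligned on DEPARTMENT labels
print(employee)

violations = employee.query('BASE_SALARY > MAX_SALARY')
print(violations.empty)  # True

# Cross-check with groupby + transform('max')
expected = employee.groupby(level='DEPARTMENT')['BASE_SALARY'].transform('max')
print((employee['MAX_SALARY'] == expected).all())  # True
```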
employee['MAX_SALARY'] = d3['BASE_SALARY']
This statement succeeds only if d3 has no duplicate index labels, which is why drop_duplicates is run first.
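A minimal sketch of what happens when the lookup Series still has duplicate labels, using hypothetical data. pandas cannot decide which of the duplicate values to align, so the assignment raises a ValueError and the column is not added (the exact error message varies between pandas versions):

```python
import pandas as pd

employee = pd.DataFrame({'BASE_SALARY': [50000, 62000]},
                        index=pd.Index(['Fire', 'Police'], name='DEPARTMENT'))

# Without drop_duplicates, 'Fire' appears twice in the lookup Series
dup = pd.Series([71000, 50000, 62000],
                index=pd.Index(['Fire', 'Fire', 'Police'], name='DEPARTMENT'))

try:
    employee['MAX_SALARY'] = dup
    failed = False
except ValueError as err:
    failed = True
    print('assignment failed:', err)

print(failed)  # True
```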