What are the efficient Pandas functions

2025-02-24 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/02

This article walks through ten efficient pandas functions: query, insert, cumsum, sample, where, isin, loc and iloc, pct_change, rank, and melt. Each section gives the signature, the main parameters, and a short example.

Before introducing these functions, the first step is to import pandas and numpy.

import numpy as np
import pandas as pd

1. Query

query is the filtering function of pandas. It evaluates a Boolean expression, written as a string, against the columns of a DataFrame and returns the rows that satisfy it.

Usage:

pandas.DataFrame.query(expr, inplace=False, **kwargs)

Parameters:

expr: the query string to evaluate

inplace=False: whether to modify the DataFrame in place or return a filtered copy

kwargs: additional keyword arguments (dict), passed on to pandas.eval

First, generate a df:

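values_1 = np.random.randint(10, size=10)
values_2 = np.random.randint(10, size=10)
years = np.arange(2010, 2020)
# group labels A, B and C; the original assignment was lost, so this one is illustrative
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']
df = pd.DataFrame({'group': groups, 'year': years, 'value_1': values_1, 'value_2': values_2})
df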

Filtering with query is straightforward. For example, select the rows where value_1 is less than value_2:

df.query('value_1 < value_2')

Select the rows where year is greater than or equal to 2016:

df.query('year >= 2016')
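
The two filters above can also be combined in one expression. The following is a minimal sketch on a small fixed frame; the frame `demo`, its values, and the variable `min_year` are assumptions made for illustration and are not part of the original article.

```python
import pandas as pd

# Small fixed frame so the result is reproducible (illustrative data).
demo = pd.DataFrame({
    "group": ["A", "A", "B", "B", "C"],
    "year": [2015, 2016, 2016, 2017, 2018],
    "value_1": [3, 8, 1, 6, 4],
    "value_2": [5, 2, 7, 9, 4],
})

min_year = 2016  # a local Python variable, referenced inside the expression with @

# Both conditions must hold; `and` is valid inside a query expression.
result = demo.query("value_1 < value_2 and year >= @min_year")
print(result)
```

Only the two group B rows satisfy both conditions. Referencing local variables with `@` avoids building the expression string by hand.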

2. Insert

insert adds a new column at a specified position in a DataFrame. Plain column assignment appends the new column at the end, whereas insert lets you set the loc parameter to place it anywhere.

Usage:

DataFrame.insert(loc, column, value, allow_duplicates=False)

Parameters:

loc: int, the position at which to insert the column; to insert as the first column, use loc=0

column: the name of the inserted column, such as column='new column'

value: the values of the new column, such as a number, array, or Series

allow_duplicates: whether to allow duplicate column names; set it to True to allow the new column name to duplicate an existing one

Continue with the previous df.

Insert a new column as the third column:

# values of the new column
new_col = np.random.randn(10)
# insert the new column at the third position; positions count from 0
df.insert(2, 'new_col', new_col)
df
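
Two details of insert are easy to miss: it modifies the frame in place and returns None, and it refuses a duplicate column name unless allow_duplicates=True. A small sketch, assuming a hypothetical frame `demo` and a hypothetical column named `flag`:

```python
import pandas as pd

demo = pd.DataFrame({"group": ["A", "B"], "year": [2010, 2011], "value_1": [4, 7]})

# insert works in place; the return value is None.
returned = demo.insert(1, "flag", [True, False])
print(list(demo.columns))  # ['group', 'flag', 'year', 'value_1']

# Reusing an existing column name raises ValueError by default.
try:
    demo.insert(0, "flag", 0)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

Because the call returns None, writing `demo = demo.insert(...)` would discard the frame.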

3. Cumsum

cumsum is the cumulative sum function of pandas; it computes the running total of a column. Usage:

DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)

Parameters:

axis: the index or the name of the axis

skipna: whether to exclude NA/null values

Take the previous df as an example: group has groups A, B, and C, and year covers multiple years. We only know value_1 and value_2 for each individual year. With cumsum we can compute cumulative values within each group, such as the total for group A up to 2014.

cumsum alone cannot distinguish the groups (A, B, C), so we combine it with groupby to accumulate the values for each group separately.

df['cumsum_2'] = df.groupby('group')['value_2'].cumsum()
df
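
To make the difference between a plain and a grouped running total concrete, here is a minimal sketch on fixed data. The frame `demo` and its values are assumptions chosen so the result can be checked by hand.

```python
import pandas as pd

demo = pd.DataFrame({
    "group": ["A", "B", "A", "B", "A"],
    "year": [2010, 2011, 2012, 2013, 2014],
    "value_2": [1, 10, 2, 20, 3],
})

# Running total over the whole column, ignoring groups.
demo["cumsum_all"] = demo["value_2"].cumsum()

# Running total restarted within each group; the result keeps the original row order.
demo["cumsum_by_group"] = demo.groupby("group")["value_2"].cumsum()

print(demo)
# cumsum_all:      1, 11, 13, 33, 36
# cumsum_by_group: 1, 10,  3, 30,  6
```

The grouped result is aligned to the original index, which is why it can be assigned straight back as a new column.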

4. Sample

sample randomly selects rows or columns from a DataFrame. Usage:

DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)

Parameters:

n: the number of rows to draw

frac: the fraction of rows to draw; for example, frac=0.8 draws 80% of them

replace: whether to sample with replacement (True) or without replacement (False)

weights: sampling weights, given as a column name or an array of probabilities

random_state: seed for the random number generator

axis: whether to sample rows or columns; axis=0 samples rows, axis=1 samples columns

For example, draw five rows at random from df:

sample1 = df.sample(n=5)
sample1

Draw 60% of the rows at random from df, setting a random seed so that the same sample is returned each time:

sample2 = df.sample(frac=0.6, random_state=2)
sample2
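
The effect of random_state and replace can be checked directly. This is a sketch on an assumed ten-row frame `demo`; the seeds are arbitrary.

```python
import pandas as pd

demo = pd.DataFrame({"year": range(2010, 2020), "value_1": range(10)})

# The same seed yields the same rows on every call.
first = demo.sample(frac=0.6, random_state=2)
second = demo.sample(frac=0.6, random_state=2)
same = first.equals(second)
print(len(first), same)  # 6 True

# With replacement, n may exceed the number of rows and rows may repeat.
boot = demo.sample(n=15, replace=True, random_state=0)
print(len(boot))  # 15
```

Without replace=True, asking for more rows than the frame holds raises an error, so sampling with replacement is the option to reach for in bootstrap-style resampling.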

5. Where

where replaces values in rows or columns based on a condition. Where the condition is met, the original value is kept; where it is not, the value is replaced. The replacement is NaN by default, or you can specify another value.

Usage:

DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None)

Parameters:

cond: a Boolean condition; where cond is True the original value is kept, otherwise it is replaced with other

other: the replacement value

inplace: if True, operate on the original data; if False, operate on a copy

axis: the axis to align along (rows or columns)

Replace the values in column value_1 of df that are not greater than 5 with 0:

df['value_1'].where(df['value_1'] > 5, 0)

where is a masking operation.

In computer science and digital logic, a "mask" is a sequence of binary digits that is combined bit by bit with a target number to select or hide specific bits.
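
A short sketch of the keep-or-replace behaviour, using an assumed Series `s`. It also shows `Series.mask`, the pandas counterpart that replaces values where the condition is True instead of where it is False.

```python
import pandas as pd

s = pd.Series([2, 5, 7, 9, 4], name="value_1")

# Keep values greater than 5; everything else (including 5 itself) becomes 0.
kept = s.where(s > 5, 0)
print(kept.tolist())  # [0, 0, 7, 9, 0]

# Without `other`, replaced positions become NaN.
default = s.where(s > 5)
print(default.isna().tolist())  # [True, True, False, False, True]

# mask is the inverse: replace where the condition is True.
masked = s.mask(s > 5, 0)
print(masked.tolist())  # [2, 5, 0, 0, 4]
```

Note that the boundary value 5 fails the condition `s > 5`, so where replaces it.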

6. Isin

isin is also a filtering method. It checks whether each value in a column is contained in a given collection of values and returns a Boolean Series with one entry per row.

Usage:

Series.isin(values) or DataFrame.isin(values)

Filter the rows of df whose year value is in [2010, 2014, 2017]:

# year holds integers, so the lookup values are integers too
years = [2010, 2014, 2017]
df[df.year.isin(years)]
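
isin compares values exactly, so the type of the lookup values matters. The sketch below uses an assumed frame `demo` with an integer year column to show a match, a type mismatch, and negation with `~`.

```python
import pandas as pd

demo = pd.DataFrame({"year": [2010, 2012, 2014, 2017], "value_1": [1, 2, 3, 4]})

# Integer lookup values match the integer column.
hits = demo[demo["year"].isin([2010, 2014, 2017])]
print(hits["year"].tolist())  # [2010, 2014, 2017]

# Strings and integers are distinct, so nothing matches here.
no_hits = demo[demo["year"].isin(["2010", "2014", "2017"])]
print(len(no_hits))  # 0

# ~ inverts the Boolean Series to select the rows that are NOT in the list.
others = demo[~demo["year"].isin([2010, 2014, 2017])]
print(others["year"].tolist())  # [2012]
```

An unexpectedly empty result from isin is often a sign of this kind of string versus number mismatch.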

7. Loc and iloc

loc and iloc are both used to select rows and columns. Their purpose is similar, but they address the data differently.

Usage:

DataFrame.loc[] or DataFrame.iloc[]

loc: select rows and columns by label (index labels and column names)

iloc: select rows and columns by integer position

Select rows 1-3 and columns 1-2 of df using iloc:

df.iloc[:3, :2]

Using loc:

df.loc[:2, ['group', 'year']]

Tip: with loc, the slice refers to index labels and includes the upper bound. With iloc, the slice refers to row positions and excludes the upper bound.

Select the rows labeled 1, 3, and 5 and the columns year and value_1:

df.loc[[1, 3, 5], ['year', 'value_1']]
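
With the default 0, 1, 2, ... index, labels and positions coincide, which hides the difference between the two accessors. The sketch below uses an assumed frame `demo` with non-default index labels to make the boundary rule visible.

```python
import pandas as pd

demo = pd.DataFrame(
    {"group": list("ABCD"), "year": [2010, 2011, 2012, 2013]},
    index=[10, 20, 30, 40],  # labels deliberately differ from positions
)

# loc slices by label and includes the upper bound (label 30).
by_label = demo.loc[10:30, ["group", "year"]]
print(by_label.index.tolist())  # [10, 20, 30]

# iloc slices by position and excludes the upper bound (position 2).
by_position = demo.iloc[0:2, 0:2]
print(by_position.index.tolist())  # [10, 20]
```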

8. Pct_change

pct_change is a statistical function that returns the percentage change between the current element and a previous element; the interval between the two elements can be adjusted.

For example, for three elements whose percentage changes are [NaN, 0.5, 1.0], the value increases by 50% from the first element to the second and by 100% from the second to the third.

Usage:

DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)

Parameters:

periods: the interval, i.e. the step size

fill_method: how to handle null values

Compute the growth rate of the value_1 column of df:

df.value_1.pct_change()
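
The [NaN, 0.5, 1.0] result quoted earlier can be reproduced with a short sketch. The input values [2, 3, 6] are an assumption; the article does not state which three elements it used, and any series with the same ratios gives the same output.

```python
import pandas as pd

s = pd.Series([2, 3, 6])  # assumed values that yield [NaN, 0.5, 1.0]

# Change relative to the previous element: (3-2)/2 = 0.5, (6-3)/3 = 1.0.
step1 = s.pct_change()
print(step1.tolist())  # [nan, 0.5, 1.0]

# periods=2 compares each element with the one two steps back: (6-2)/2 = 2.0.
step2 = s.pct_change(periods=2)
print(step2.tolist())  # [nan, nan, 2.0]
```

The first `periods` entries are always NaN because there is no earlier element to compare against.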

9. Rank

rank is a ranking function. It ranks the values of a sequence according to a rule (descending or ascending) and returns each value's rank.

For example, ranking the sequence [1, 7, 5, 3] in ascending order returns [1, 4, 3, 2], the rank of each value in the original sequence.

Usage:

rank(axis=0, method: str = 'average', numeric_only: Union[bool, NoneType] = None, na_option: str = 'keep', ascending: bool = True, pct: bool = False)

Parameters:

axis: rows or columns

method: how ties are ranked; one of {'average', 'min', 'max', 'first', 'dense'}

method='average' (the default): two equal values that occupy ranks 1 and 2 cannot be told apart, so both receive the mean rank 1.5, and the next value is ranked 3

method='max': the two tied values both rank 2, and the next value ranks 3

method='min': the two tied values both rank 1, and the next value ranks 3

method='dense': the two tied values both rank 1, and the next value ranks 2

method='first': equal values are ranked in the order they appear in the sequence

ascending: True for ascending order, False for descending order

Rank the value_1 column of df:

df['rank_1'] = df['value_1'].rank()
df
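
The five tie-breaking methods are easiest to compare side by side. This sketch ranks an assumed Series `s` that contains one tie (the value 10 appears twice) with each method.

```python
import pandas as pd

s = pd.Series([10, 20, 10, 30])  # the two 10s are tied for the lowest value

# One column of ranks per tie-breaking method.
ranks = pd.DataFrame(
    {m: s.rank(method=m) for m in ["average", "min", "max", "first", "dense"]}
)
print(ranks)
# average: 1.5, 3.0, 1.5, 4.0
# min:     1.0, 3.0, 1.0, 4.0
# max:     2.0, 3.0, 2.0, 4.0
# first:   1.0, 3.0, 2.0, 4.0
# dense:   1.0, 2.0, 1.0, 3.0
```

Only 'dense' leaves no gaps in the ranks after a tie, which is why its largest rank is 3 rather than 4.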

10. Melt

melt turns a wide table into a narrow one. It is the inverse of a pivot: it converts column names into column data (column names → column values) and rebuilds the DataFrame.

Put simply, the specified columns are unpivoted into rows and become two columns: the variable column holds the former column names and the value column holds their values; both can be renamed.

Usage:

pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)

Parameters:

frame: the DataFrame

id_vars [tuple, list, or ndarray, optional]: the columns to keep as identifier variables; these are not unpivoted

value_vars [tuple, list, or ndarray, optional]: the columns to unpivot; if not specified, all columns not set as id_vars are used

var_name [scalar]: the name to use for the variable column; if None, frame.columns.name or 'variable' is used

value_name [scalar, default 'value']: the name to use for the value column

col_level [int or string, optional]: if the columns are a MultiIndex, the level to melt

For example, here is some data showing the number of people on the move in different cities on each day:

import pandas as pd
df1 = pd.DataFrame({'city': {0: 'a', 1: 'b', 2: 'c'}, 'day1': {0: 1, 1: 3, 2: 5}, 'day2': {0: 2, 1: 4, 2: 6}})
df1

Now turn the day1 and day2 columns into a variable column and add a value column:

pd.melt(df1, id_vars=['city'])
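
The var_name and value_name parameters replace the default 'variable' and 'value' headers. The sketch below rebuilds the same city data and names the new columns; the names `day` and `people` are assumptions chosen for illustration. It then pivots the result back to confirm that melt is reversible.

```python
import pandas as pd

wide = pd.DataFrame({"city": ["a", "b", "c"], "day1": [1, 3, 5], "day2": [2, 4, 6]})

# Unpivot day1/day2 into rows, naming the two new columns explicitly.
narrow = pd.melt(wide, id_vars=["city"], var_name="day", value_name="people")
print(narrow)
# 6 rows: all day1 values (1, 3, 5) first, then all day2 values (2, 4, 6)

# pivot reverses the operation, giving one column per day again.
back = narrow.pivot(index="city", columns="day", values="people")
print(back.loc["b", "day2"])  # 4
```

Each of the 3 cities contributes one row per melted column, so the narrow table has 3 × 2 = 6 rows.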

Thank you for reading. That covers these efficient pandas functions; how best to use each of them is something to verify in your own practice.
