2025-02-24 Update From: SLTechnology News&Howtos
This article explains several efficient Pandas functions. The explanations are simple and clear, and each function comes with a short example, so follow along to learn what these functions do and how to use them.
Before introducing these functions, the first step is to import pandas and numpy.
import numpy as np
import pandas as pd
1. Query
query is pandas' filtering function: it evaluates a Boolean expression against the columns of a DataFrame, filtering rows according to conditions on those columns.
Usage:
pandas.DataFrame.query(self, expr, inplace=False, **kwargs)
Parameter function:
expr: the query string to evaluate
inplace: whether the query should modify the data in place or return a modified copy (default False)
**kwargs: dict of keyword arguments
First, generate a df:
values_1 = np.random.randint(10, size=10)
values_2 = np.random.randint(10, size=10)
years = np.arange(2010, 2020)
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']  # example grouping with groups A, B, and C
df = pd.DataFrame({'group': groups, 'year': years, 'value_1': values_1, 'value_2': values_2})
df
Filtering queries are simple to use. For example, to select the rows where value_1 is less than value_2:

df.query('value_1 < value_2')

To select the rows where year >= 2016:

df.query('year >= 2016')
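As a self-contained sketch with made-up values, the two queries above behave like this:

```python
import pandas as pd

# Illustrative data; values chosen only for this example
df = pd.DataFrame({'year': [2014, 2015, 2016, 2017],
                   'value_1': [3, 8, 2, 9],
                   'value_2': [5, 1, 7, 4]})

# The expression string can reference column names directly
lt = df.query('value_1 < value_2')   # rows where value_1 < value_2
recent = df.query('year >= 2016')    # rows from 2016 onward
```

With this data, the first query keeps rows 0 and 2 (3 < 5 and 2 < 7) and the second keeps the last two rows.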
2. Insert
insert is used to add a new column to a DataFrame at a specified position. By default new columns are appended at the end, but the loc parameter lets you place the new column anywhere.
Usage:
DataFrame.insert(loc, column, value, allow_duplicates=False)
Parameter function:
loc: int, the position at which to insert the column; to insert as the first column, use loc=0
column: the name of the inserted column, e.g. column='new column'
value: the values of the new column, e.g. a number, array, or Series
allow_duplicates: whether to allow duplicate column names; True allows the new column name to duplicate an existing one
Then use the previous df:
Insert a new column in the third column:
# values of the new column
new_col = np.random.randn(10)
# insert as the third column (positions count from 0)
df.insert(2, 'new_col', new_col)
df
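A minimal sketch, with a hypothetical two-column frame, showing how loc controls where the new column lands:

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})

new_col = np.zeros(4)
df.insert(1, 'new_col', new_col)  # loc=1: the new column becomes the second column
```

After the insert, the column order is a, new_col, b.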
3. Cumsum
cumsum is pandas' cumulative-sum function, used to compute the running total of a column. Usage:
DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)
Parameter function:
axis: the index or the name of the axis
skipna: exclude NA/null values
Take the previous df as an example: group has the values A, B, and C, and year spans several years, with value_1 and value_2 known only for each individual year. The cumsum function lets us compute cumulative values, such as the running total for group A up to 2014.
Of course, cumsum alone cannot tell the groups (A, B, C) apart, so we combine it with groupby to accumulate the values of each group separately.
df['cumsum_2'] = df[['value_2', 'group']].groupby('group').cumsum()
df
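To see why groupby matters here, compare a plain cumsum with a grouped one on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                   'value_2': [1, 2, 10, 20]})

plain = df['value_2'].cumsum()                     # runs over the whole column
grouped = df.groupby('group')['value_2'].cumsum()  # restarts for each group
```

Here plain gives [1, 3, 13, 33] (the totals leak across groups), while grouped gives [1, 3, 10, 30].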
4. Sample
Sample is used to randomly select several rows or columns from the DataFrame. Usage:
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
Parameter function:
n: the number of rows to sample
frac: the fraction of rows to sample, e.g. frac=0.8 samples 80% of the rows
replace: whether to sample with replacement; True samples with replacement, False without
weights: a column name or an array of probabilities
random_state: seed for the random number generator
axis: whether to sample rows or columns; axis=0 samples rows, axis=1 samples columns
For example, five rows are randomly selected from the df:
sample1 = df.sample(n=5)
sample1
Take 60% of the rows randomly from df, and set random number seeds to get the same sample each time:
sample2 = df.sample(frac=0.6, random_state=2)
sample2
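A quick sketch of both sampling modes; the frame and seed are arbitrary, but the same seed always reproduces the same sample:

```python
import pandas as pd

df = pd.DataFrame({'x': range(10)})

s1 = df.sample(n=5, random_state=2)       # five rows, reproducible via the seed
s2 = df.sample(frac=0.6, random_state=2)  # 60% of 10 rows -> 6 rows
s3 = df.sample(n=5, random_state=2)       # same seed -> identical to s1
```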
5. Where
where is used to replace values in rows or columns based on a condition: values that satisfy the condition are kept, and values that do not are replaced. The replacement is NaN by default, or a value you specify.
Usage:
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)
Parameter function:
cond: a Boolean condition; where cond is True the original value is kept, otherwise it is replaced with other
other: the replacement value
inplace: if True, operate on the original data; if False, operate on a copy
axis: rows or columns
Replace the values in column value_1 of df that are not greater than 5 with 0:
df['value_1'].where(df['value_1'] > 5, 0)
where is a mask operation.
In computer science and digital logic, a mask is a sequence of binary digits that is combined with a target value through bitwise operations to screen out selected bit positions.
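A minimal sketch of the masking behavior on a made-up Series:

```python
import pandas as pd

s = pd.Series([2, 7, 4, 9])

# Values satisfying the condition are kept; the rest are replaced with 0
masked = s.where(s > 5, 0)
```

With this data the result is [0, 7, 0, 9]: the 7 and 9 survive, the 2 and 4 are masked out.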
6. Isin
isin is another filtering method, used to check whether a column contains certain values; it returns a Boolean Series with one entry per row.
Usage:
Series.isin(values) or DataFrame.isin(values)
Filter the rows of df whose year column value is in [2010, 2014, 2017]:

years = [2010, 2014, 2017]
df[df.year.isin(years)]
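A self-contained sketch with illustrative years, showing the Boolean mask and the filtered result:

```python
import pandas as pd

df = pd.DataFrame({'year': [2010, 2012, 2014, 2017]})

mask = df['year'].isin([2010, 2014, 2017])  # Boolean Series, one entry per row
subset = df[mask]
```

Only the row with year 2012 is dropped, since 2012 is not in the list.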
7. Loc and iloc
loc and iloc are usually used to select rows and columns; their functions are similar, but their usage differs.
Usage:
DataFrame.loc[] or DataFrame.iloc[]
loc: select rows and columns by label (column names and index values)
iloc: select rows and columns by integer position
Select rows 1-3 and columns 1-2 of df using iloc:
df.iloc[:3, :2]
Use loc:
df.loc[:2, ['group', 'year']]
Tip: with loc, the slice refers to index labels and the upper boundary is included; with iloc, the slice refers to row positions and the upper boundary is excluded.
Select rows 1, 3, and 5, and the year and value_1 columns:
df.loc[[1, 3, 5], ['year', 'value_1']]
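The boundary difference in the tip above can be checked directly. In this made-up frame the labels happen to equal the positions, so the two slices coincide:

```python
import pandas as pd

df = pd.DataFrame({'group': list('abcd'), 'year': [2010, 2011, 2012, 2013]})

by_pos = df.iloc[:3, :2]                  # positions 0, 1, 2 (position 3 excluded)
by_label = df.loc[:2, ['group', 'year']]  # labels 0, 1, 2 (label 2 included)
```

Both slices select the same three rows even though one bound is written as 3 and the other as 2.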
8. Pct_change
Pct_change is a statistical function that represents the percentage difference between the current element and the previous element, and the interval between the two elements can be adjusted.
For example, given three elements, the percentage changes might come out as [NaN, 0.5, 1.0]: an increase of 50% from the first element to the second and of 100% from the second to the third.
Usage:
DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
Parameter function:
periods: the interval, i.e. the step size
fill_method: how to handle null values
Calculate the growth rate for the value_1 column of df:
df.value_1.pct_change()
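A small sketch with made-up numbers, including the periods parameter mentioned above:

```python
import pandas as pd

s = pd.Series([2.0, 3.0, 6.0])

growth = s.pct_change()            # (3-2)/2 = 0.5, (6-3)/3 = 1.0; first entry is NaN
growth2 = s.pct_change(periods=2)  # (6-2)/2 = 2.0, comparing each element with the one two steps back
```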
9. Rank
rank is a ranking function: it ranks the values of a sequence according to a rule (ascending or descending) and returns the rank of each value.
For example, ranking the sequence [1, 2, 7, 5] in ascending order returns [1, 2, 4, 3], the rank of each value within the original sequence.
Usage:
DataFrame.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
Parameter function:
axis: rows or columns
method: how ranks are assigned to tied values; one of {'average', 'min', 'max', 'first', 'dense'}
method='average' (default): two values tied for 1st and 2nd cannot be told apart, so each gets the average rank 1.5, and the next value is 3rd
method='max': two values tied take the higher rank, both 2nd, and the next value is 3rd
method='min': two values tied take the lower rank, both 1st, and the next value is 3rd
method='dense': like 'min', but ranks stay consecutive after ties: two values tied for 1st, and the next value is 2nd
method='first': tied values are ranked by their order of appearance in the sequence
ascending: ascending or descending order
Rank the value_1 column of df:

df['rank_1'] = df['value_1'].rank()
df
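The tie-breaking methods above can be compared on a made-up Series with one tie:

```python
import pandas as pd

s = pd.Series([7, 3, 7, 1, 9])  # two values tied at 7

avg = s.rank()                  # ties share the average of ranks 3 and 4, i.e. 3.5
mn = s.rank(method='min')       # both tied values take rank 3
dense = s.rank(method='dense')  # ranks stay consecutive, so 9 is 4th instead of 5th
```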
10. Melt
melt turns a wide table into a long one. It is the reverse of a pivot: it converts column names into column data (column names → column values) and rebuilds the DataFrame.
Put simply, the specified columns are unpivoted into rows, producing two new columns: a variable column holding the former column names and a value column holding the values.
Usage:
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Parameter function:
frame: the DataFrame to melt
id_vars [tuple, list, or ndarray, optional]: columns that are not converted, used as identifier variables
value_vars [tuple, list, or ndarray, optional]: the columns to unpivot; if not specified, all columns not listed in id_vars are used
var_name [scalar]: the name of the variable column; if None, frame.columns.name or 'variable' is used
value_name [scalar, default 'value']: the name of the value column
col_level [int or string, optional]: if the columns are a MultiIndex, melt at this level
For example, here is some data recording the daily movement of people in different cities:

import pandas as pd
df1 = pd.DataFrame({'city': {0: 'a', 1: 'b', 2: 'c'},
                    'day1': {0: 1, 1: 3, 2: 5},
                    'day2': {0: 2, 1: 4, 2: 6}})
df1
Now unpivot the day1 and day2 columns into a variable column and a value column:

pd.melt(df1, id_vars=['city'])
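Putting the example together as a runnable sketch, with the same illustrative city data:

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['a', 'b', 'c'],
                    'day1': [1, 3, 5],
                    'day2': [2, 4, 6]})

# day1/day2 column names go into 'variable'; their numbers go into 'value'
melted = pd.melt(df1, id_vars=['city'])
```

The 3x3 wide frame becomes a 6-row long frame with columns city, variable, and value.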
Thank you for reading. That concludes this overview of efficient Pandas functions; the specific usage of each one is best verified in practice.