How to use the pandas apply() function

2025-04-05 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article explains how to use the pandas apply() function. The material is simple and easy to follow.

To understand pandas functions such as apply(), it helps to know a little functional programming. Functional programming is a large topic, but for apply() you only need one idea: a function is an object, so it can be passed to another function as an argument and can also be returned from a function.
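A minimal sketch of "a function is an object" may help here. The names apply_twice and make_adder are made up for illustration and do not come from pandas or from this article.

```python
def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))


def make_adder(n):
    """Return a new function that adds n to its argument."""
    def adder(x):
        return x + n
    return adder


add_three = make_adder(3)            # a function returned from a function
result = apply_twice(add_three, 10)  # a function passed as an argument
print(result)  # 16
```

apply() relies on the first of these two ideas: you hand it a function and it decides when to call it.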

Treating functions as objects can change the style of code considerably. For example, suppose a list holds the numbers 1 to 10 and you need to find all the numbers divisible by 3. The traditional way:

    def can_divide_by_three(number):
        if number % 3 == 0:
            return True
        else:
            return False

    selected_numbers = []
    for number in range(1, 11):
        if can_divide_by_three(number):
            selected_numbers.append(number)

The loop is unavoidable here. Because can_divide_by_three() is used only once, consider replacing it with a lambda expression:

    divide_by_three = lambda x: True if x % 3 == 0 else False

    selected_numbers = []
    for number in range(1, 11):
        # the loop variable is number, so that is what must be tested and appended
        if divide_by_three(number):
            selected_numbers.append(number)

The above is the traditional way of thinking; the functional way is quite different. Can we state only the rule and leave the looping, that is, picking the numbers that satisfy the rule out of the list, to the language? Yes. When programmers care only about the rule (a condition or a function), the code becomes simpler and more readable.

The Python language provides the filter() function with the following syntax:

    filter(function, sequence)

filter() calls function(item) on each item in sequence in turn and keeps the items for which the result is true. In Python 3 it returns a lazy iterator rather than a list, so wrap it in list() when you need a list. With this function, the above code can be simplified to:

    divide_by_three = lambda x: True if x % 3 == 0 else False
    selected_numbers = list(filter(divide_by_three, range(1, 11)))
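As a quick check of the filter() call under Python 3, the sketch below materializes the result and compares it with a list comprehension. The comprehension is not part of the original article; it is shown only as a common alternative.

```python
divide_by_three = lambda x: x % 3 == 0

lazy = filter(divide_by_three, range(1, 11))  # a lazy filter object in Python 3
selected_numbers = list(lazy)                 # materialize it into a list

# An equivalent list comprehension, often preferred for readability
selected_by_comprehension = [n for n in range(1, 11) if n % 3 == 0]

print(selected_numbers)  # [3, 6, 9]
```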

Inline the lambda expression and the code shrinks to a single statement:

    selected_numbers = list(filter(lambda x: x % 3 == 0, range(1, 11)))

Series.apply()

Back to the topic: the pandas apply() function works on a Series or on a whole DataFrame and does the traversal for you. On a Series the specified function is called on each value; on a DataFrame it is called on each column or row.

For example, there is now a set of data on students' test scores:

    Name   Nationality  Score
    Zhang  Han          400
    Li     Hui          450
    Wang   Han          460

Suppose students whose nationality is not Han receive 5 bonus points on top of their test score. We want to do this calculation in pandas by adding a column to the DataFrame. If only the result mattered, numpy.where() would be simpler; the main purpose here is to demonstrate Series.apply().

    import pandas as pd

    df = pd.read_csv("studuent-score.csv")
    df['ExtraScore'] = df['Nationality'].apply(lambda x: 5 if x != 'Han' else 0)
    df['TotalScore'] = df['Score'] + df['ExtraScore']

For the Nationality column, pandas iterates over each value, calls the lambda on it, and stores the results in a new Series. In Jupyter Notebook the code above displays:

       Name   Nationality  Score  ExtraScore  TotalScore
    0  Zhang  Han          400    0           400
    1  Li     Hui          450    5           455
    2  Wang   Han          460    0           460
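For readers without the CSV file, here is a self-contained sketch of the same calculation. The inline DataFrame is an assumption standing in for the file's contents, and the numpy.where() line shows the vectorized alternative mentioned earlier.

```python
import numpy as np
import pandas as pd

# Inline copy of the sample data, standing in for the CSV file
df = pd.DataFrame({
    "Name": ["Zhang", "Li", "Wang"],
    "Nationality": ["Han", "Hui", "Han"],
    "Score": [400, 450, 460],
})

# Series.apply: the lambda is called once per value
extra_apply = df["Nationality"].apply(lambda x: 5 if x != "Han" else 0)

# numpy.where: the vectorized equivalent, no Python-level call per value
extra_where = np.where(df["Nationality"] != "Han", 5, 0)

df["ExtraScore"] = extra_apply
df["TotalScore"] = df["Score"] + df["ExtraScore"]
print(df)
```

Both approaches produce the same bonus column; apply() is the more flexible one when the rule cannot be expressed as a simple condition.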

apply() can also run Python built-in functions. For example, to get the number of characters in each Name:

    df['NameLength'] = df['Name'].apply(len)

Passing a function with parameters to apply()

According to the pandas documentation (pandas.Series.apply, pandas 1.3.1 documentation), this function accepts positional or keyword arguments, with the following syntax:

    Series.apply(func, convert_dtype=True, args=(), **kwargs)

The first parameter of func always receives the Series value, so any further parameters of func are treated as extra arguments and are supplied through args or through keyword arguments. Let's reuse the previous example. Suppose every nationality other than Han receives bonus points. We put the bonus into a parameter of the function and first define an add_extra() function:

    def add_extra(nationality, extra):
        if nationality != "Han":
            return extra
        else:
            return 0

Add a column to df:

    df['ExtraScore'] = df.Nationality.apply(add_extra, args=(5,))

Positional arguments are passed through args, which must be a tuple. The same call can also be written with a keyword argument:

    df['ExtraScore'] = df.Nationality.apply(add_extra, extra=5)

After running, the result is:

       Name   Nationality  Score  ExtraScore
    0  Zhang  Han          400    0
    1  Li     Hui          450    5
    2  Wang   Han          460    0

The same logic written as a lambda function:

    # the condition is n != 'Han' so that it matches add_extra()
    df['Extra'] = df.Nationality.apply(lambda n, extra: extra if n != 'Han' else 0, args=(5,))
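The three calling styles can be compared side by side. The sketch below assumes a small standalone Series named nationalities (a hypothetical name, not the article's df) so that it runs without the CSV file.

```python
import pandas as pd

nationalities = pd.Series(["Han", "Hui", "Han"])


def add_extra(nationality, extra):
    """Return the bonus for non-Han values, otherwise 0."""
    return extra if nationality != "Han" else 0


# Extra argument supplied positionally through args (must be a tuple)
by_args = nationalities.apply(add_extra, args=(5,))

# Extra argument supplied by keyword; it is forwarded via **kwargs
by_keyword = nationalities.apply(add_extra, extra=5)

# The same rule as a lambda, again fed through args
by_lambda = nationalities.apply(lambda n, extra: extra if n != "Han" else 0, args=(5,))

print(by_args.tolist())  # [0, 5, 0]
```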

Now for keyword arguments. Suppose different nationalities receive different bonus points; define an add_extra2() function:

    def add_extra2(nationality, **kwargs):
        return kwargs[nationality]

    df['Extra'] = df.Nationality.apply(add_extra2, Han=0, Hui=10, Tibet=5)

The running result is:

       Name   Nationality  Score  Extra
    0  Zhang  Han          400    0
    1  Li     Hui          450    10
    2  Wang   Han          460    0

Read against the apply() syntax, this is not difficult to understand: the keyword arguments given to apply() are collected into **kwargs and forwarded to add_extra2() on every call.
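One caveat with looking up kwargs[nationality] directly is that a value with no matching keyword raises KeyError. The sketch below is one possible way around that, not the article's method: add_extra_safe, its default parameter, and the sample value "Miao" are all assumptions introduced for illustration. Series.map with a dictionary is shown as an alternative.

```python
import pandas as pd

# "Miao" deliberately has no bonus entry
nationalities = pd.Series(["Han", "Hui", "Tibet", "Miao"])


def add_extra_safe(nationality, default=0, **bonus):
    """Look up the bonus by keyword, falling back to default if absent."""
    return bonus.get(nationality, default)


extra = nationalities.apply(add_extra_safe, Han=0, Hui=10, Tibet=5)

# Alternative: a dictionary lookup with Series.map; unmatched values become NaN
bonus_table = {"Han": 0, "Hui": 10, "Tibet": 5}
extra_map = nationalities.map(bonus_table).fillna(0).astype(int)

print(extra.tolist())  # [0, 10, 5, 0]
```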

DataFrame.apply()

DataFrame.apply() passes each column (or, with axis=1, each row) to the specified function as a Series. With an element-wise function such as np.square, the effect is that the function runs on every element. For example:

    import pandas as pd
    import numpy as np

    matrix = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
    df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))
    df.apply(np.square)

After applying np.square to df, every element is squared:

        x   y   z
    a   1   4   9
    b  16  25  36
    c  49  64  81

If you want apply() to act only on particular rows or columns, test the name attribute of the row or column. For example, the following squares the x column:

    df.apply(lambda x: np.square(x) if x.name == 'x' else x)

        x  y  z
    a   1  2  3
    b  16  5  6
    c  49  8  9

The following example squares the x and y columns:

    df.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)

        x   y  z
    a   1   4  3
    b  16  25  6
    c  49  64  9

The following example squares the first row (the row labeled a):

    df.apply(lambda x: np.square(x) if x.name == 'a' else x, axis=1)

The default, axis=0, applies the function to each column; axis=1 applies it to each row.
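To make the axis behaviour concrete, the sketch below uses functions that reduce a whole Series to one number, which shows that apply() hands over columns or rows rather than single elements. The variable names col_range, row_sum, and row_a_squared are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  columns=list("xyz"), index=list("abc"))

# axis=0 (default): the function receives each column as a Series
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series
row_sum = df.apply(lambda row: row.sum(), axis=1)

# Square only the row labeled "a"; other rows pass through unchanged
row_a_squared = df.apply(
    lambda row: np.square(row) if row.name == "a" else row, axis=1
)
print(row_a_squared)
```

Here col_range is indexed by column label (x, y, z) and row_sum by row label (a, b, c).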

apply() example: date subtraction

Date calculations are common, for example computing the interval between two dates. Take the following start and end dates of WBS items:

    wbs   date_from   date_to
    job1  2019-04-01  2019-05-01
    job2  2019-04-07  2019-05-17
    job3  2019-05-16  2019-05-31
    job4  2019-05-20  2019-06-11

Suppose you want the number of days between the start and end dates. A simple way is to subtract the two columns once they are of datetime type:

    import pandas as pd
    import datetime as dt

    wbs = {
        "wbs": ["job1", "job2", "job3", "job4"],
        "date_from": ["2019-04-01", "2019-04-07", "2019-05-16", "2019-05-20"],
        "date_to": ["2019-05-01", "2019-05-17", "2019-05-31", "2019-06-11"]
    }
    df = pd.DataFrame(wbs)
    df['elapsed'] = df['date_to'].apply(pd.to_datetime) - df['date_from'].apply(pd.to_datetime)

Here apply() converts the date_from and date_to columns to datetime. Printing df gives:

       wbs   date_from   date_to     elapsed
    0  job1  2019-04-01  2019-05-01  30 days
    1  job2  2019-04-07  2019-05-17  40 days
    2  job3  2019-05-16  2019-05-31  15 days
    3  job4  2019-05-20  2019-06-11  22 days

The interval has been calculated, but each value carries the unit days: subtracting two datetime columns yields the timedelta64 type. To get a plain number, use the days attribute of each timedelta.

    elapsed = df['date_to'].apply(pd.to_datetime) - df['date_from'].apply(pd.to_datetime)
    df['elapsed'] = elapsed.apply(lambda x: x.days)
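As a side note, the same numbers can be obtained without apply(), since pd.to_datetime accepts a whole Series and timedelta columns expose a .dt accessor. This is an alternative sketch, not the approach the article is demonstrating.

```python
import pandas as pd

df = pd.DataFrame({
    "wbs": ["job1", "job2", "job3", "job4"],
    "date_from": ["2019-04-01", "2019-04-07", "2019-05-16", "2019-05-20"],
    "date_to": ["2019-05-01", "2019-05-17", "2019-05-31", "2019-06-11"],
})

# pd.to_datetime converts an entire column in one call
elapsed = pd.to_datetime(df["date_to"]) - pd.to_datetime(df["date_from"])

# .dt.days extracts the whole-day count from each timedelta
df["elapsed"] = elapsed.dt.days
print(df["elapsed"].tolist())  # [30, 40, 15, 22]
```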

The same result can be obtained with DataFrame.apply(). We define a function get_interval_days() whose first parameter is a Series; with axis=1 it receives each row of the DataFrame in turn.

    import pandas as pd
    import datetime as dt

    def get_interval_days(arrLike, start, end):
        # arrLike is one row of the DataFrame; start and end are column names
        start_date = dt.datetime.strptime(arrLike[start], '%Y-%m-%d')
        end_date = dt.datetime.strptime(arrLike[end], '%Y-%m-%d')
        return (end_date - start_date).days

    wbs = {
        "wbs": ["job1", "job2", "job3", "job4"],
        "date_from": ["2019-04-01", "2019-04-07", "2019-05-16", "2019-05-20"],
        "date_to": ["2019-05-01", "2019-05-17", "2019-05-31", "2019-06-11"]
    }
    df = pd.DataFrame(wbs)
    df['elapsed'] = df.apply(get_interval_days, axis=1, args=('date_from', 'date_to'))

Thank you for reading. That is all for "how to use the pandas apply() function". The details are best verified in practice.
