2025-01-16 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 report
Today I'd like to talk about a few Pandas methods that are easy to use but not well known. I've summarized the following to help you understand them better, and I hope you get something out of this article.
People say I haven't done deep learning in a long time. I've been working on NLP for a while, thinking I'd get to try all kinds of fancy algorithms, but so far I haven't; the work feels more like data mining. Most of what I actually run into is data cleaning, and at that point tools matter: a good tool gets twice the result with half the effort. You get an idea, start huffing and puffing to reinvent the wheel, and only after writing dozens of lines do you discover there was a ready-made one-liner all along. As the saying goes, "to do a good job, one must first sharpen one's tools." Enough preamble; here are some handy methods.
1. Data filtering
First import the data. It has 4 columns (date, week, brand, and quantity) and 14 rows in total.
import pandas as pd

data = pd.read_table("test.txt")
print(data.head(2))
print(data.shape)
"
   date  week  brand  quantity
0     1     3      1        20
1     1     3      5        48
(14, 4)
"
Then we can check which brand values appear; there are five brands in total.
brand = data['brand']
print(set(brand.values.tolist()))
"
{1, 2, 3, 4, 5}
"
OK, what if I want to see the data for brand 1? You can do this:
brand_1 = data[data['brand'].isin([1])]
print(brand_1)
"
   date  week  brand  quantity
0     1     3      1        20
2     2     4      1        16
4     3     5      1      1411
9     4     6      1      1176
"
Take a look: the isin() method is used here, and it gives you all the rows for brand 1. Some people may suggest another way, groupby. groupby is the pandas method for grouped statistics; never heard of it? No problem, it's introduced below.
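As an aside, for a single value a plain comparison builds the same boolean mask as isin([1]). A minimal sketch, assuming a few rows shaped like the article's data:

```python
import pandas as pd

# Inline stand-in for a few rows of the article's test.txt data.
data = pd.DataFrame({
    'date':     [1, 1, 2, 3],
    'week':     [3, 3, 4, 5],
    'brand':    [1, 5, 1, 2],
    'quantity': [20, 48, 16, 811],
})

# data['brand'] == 1 yields the same boolean mask as isin([1]).
by_eq   = data[data['brand'] == 1]
by_isin = data[data['brand'].isin([1])]
print(by_eq.equals(by_isin))  # True
```

isin() earns its keep once you have more than one value to match, as the next example shows.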
Two more things before moving on. Can I filter on multiple brands at once? Yes:
brand_n = data[data['brand'].isin([1, 2])]
print(brand_n)
"
    date  week  brand  quantity
0      1     3      1        20
2      2     4      1        16
4      3     5      1      1411
5      3     5      2       811
9      4     6      1      1176
10     4     6      2       824
"
isin() takes a list. OK, and what if I want all the data except brand 1? That's easy in pandas too:
brand_ex_1 = data[~data['brand'].isin([1])]
print(brand_ex_1)
"
    date  week  brand  quantity
1      1     3      5        48
3      2     4      3        20
5      3     5      2       811
6      3     5      3      1005
7      3     5      4       773
8      3     5      5      1565
10     4     6      2       824
11     4     6      3       802
12     4     6      4      1057
13     4     6      5      1107
"
2. Data grouping
OK, now for groupby. groupby means "group data by x": the data is split into groups according to some key. Let's look at an example and group the data by date first.
data_grouped = data.groupby(by='date')
print("There are {} groups in total".format(data_grouped.ngroups))
# there are 4 groups
# ascending=False numbers the groups from ngroups-1 down to 0,
# which matches the output below
print(data_grouped.ngroup(ascending=False))
"
0 3
1 3
2 2
3 2
4 1
5 1
6 1
7 1
8 1
9 0
10 0
11 0
12 0
13 0
"
View the indices after grouping:
indices = data_grouped.indices
day_3 = indices[3]  # row positions belonging to date 3
print(day_3)
"
[4 5 6 7 8]
"
You can also iterate over the groups:
for i, j in data_grouped:
    print(i, j.index)
"
1 Int64Index([0, 1], dtype='int64')
2 Int64Index([2, 3], dtype='int64')
3 Int64Index([4, 5, 6, 7, 8], dtype='int64')
4 Int64Index([9, 10, 11, 12, 13], dtype='int64')
"
Each group is still a DataFrame object, so you can access its index attribute.
If you want to run statistics on the grouped data, you can do it like this:
import pandas as pd

data = pd.read_table("test.txt")
data_grouped = data.groupby(by='date')['quantity'].mean()
print(data_grouped)
This shows the average quantity for each day.
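If you want several statistics per group at once, agg() computes them in one pass. A minimal sketch on a stand-in for the first four rows of the data:

```python
import pandas as pd

# Stand-in for the first four rows of the article's test.txt data.
data = pd.DataFrame({
    'date':     [1, 1, 2, 2],
    'quantity': [20, 48, 16, 20],
})

# agg() computes several statistics per group in one pass.
stats = data.groupby('date')['quantity'].agg(['mean', 'sum', 'max'])
print(stats)
# date 1: mean 34.0, sum 68, max 48
# date 2: mean 18.0, sum 36, max 20
```

Each aggregate becomes a column of the result, indexed by the group key.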
3. The apply method
What if I want to multiply every value in the quantity column by 2? There are many ways to do it; here's how with apply.
import pandas as pd

data = pd.read_table("test.txt")

def double_df(x):
    return 2 * x

data_double = data['quantity'].apply(double_df)
print(data_double)
"
0 40
1 96
2 32
3 40
4 2822
5 1622
6 2010
7 1546
8 3130
9 2352
10 1648
11 1604
12 2114
13 2214
Name: quantity, dtype: int64
"
That takes care of the multiplication, but the output isn't quite what we want, because we'd like to keep the other columns. So what then?
import pandas as pd

data = pd.read_table("test.txt")

def double_df(x):
    return 2 * x

data_copy = data.copy()
data_copy['quantity'] = data['quantity'].apply(double_df)
print(data_copy)
"
    date  week  brand  quantity
0      1     3      1        40
1      1     3      5        96
2      2     4      1        32
3      2     4      3        40
4      3     5      1      2822
5      3     5      2      1622
6      3     5      3      2010
7      3     5      4      1546
8      3     5      5      3130
9      4     6      1      2352
10     4     6      2      1648
11     4     6      3      1604
12     4     6      4      2114
13     4     6      5      2214
"
Here you first copy data, then assign the apply result of data's quantity column to the quantity column of the copy, so no columns are lost.
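For simple arithmetic like this, a vectorized expression with assign() does the copy and the math in one step; assign() returns a new DataFrame, so the original stays untouched. A small sketch on two stand-in rows:

```python
import pandas as pd

# Two stand-in rows shaped like the article's data.
data = pd.DataFrame({
    'brand':    [1, 5],
    'quantity': [20, 48],
})

# assign() returns a new DataFrame with the column replaced,
# leaving `data` unchanged -- no explicit .copy() needed here.
data_double = data.assign(quantity=data['quantity'] * 2)
print(data_double['quantity'].tolist())  # [40, 96]
print(data['quantity'].tolist())         # [20, 48]  (unchanged)
```

Vectorized column arithmetic is also considerably faster than apply on large frames.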
Well, that's the basic use of apply. But what if we want to apply a function over two columns of data? At first I didn't know how either. The question came up one day because my data was in two columns and I wanted a statistic over both, and I eventually found the answer on Stack Overflow. Sharing it here.
def double_df(a, b):
    return "{:.3f}".format(a / b)

data_apply = data.apply(lambda row:
                        double_df(row['week'],
                                  row['brand']),
                        axis=1)
print(data_apply)
"
0 3.000
1 0.600
2 4.000
3 1.333
4 5.000
5 2.500
6 1.667
7 1.250
8 1.000
9 6.000
10 3.000
11 2.000
12 1.500
13 1.200
dtype: object
"
Or like this:
def double_df(rows):
    return "{:.3f}".format(rows['week'] / rows['brand'])

data_apply = data.apply(double_df, axis=1)
print(data_apply)
"
0 3.000
1 0.600
2 4.000
3 1.333
4 5.000
5 2.500
6 1.667
7 1.250
8 1.000
9 6.000
10 3.000
11 2.000
12 1.500
13 1.200
dtype: object
"
As before, if you want to keep all the original data intact, you'd better make a copy first, or you may run into errors. Try it if you're curious.
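Also worth noting: row-wise apply is relatively slow, and dividing two columns is vectorized out of the box. A minimal sketch, using round() in place of the string formatting above:

```python
import pandas as pd

# Stand-in for the first three rows of the article's data.
data = pd.DataFrame({
    'week':  [3, 3, 4],
    'brand': [1, 5, 1],
})

# Column arithmetic is vectorized; round(3) replaces "{:.3f}" formatting
# and keeps the result numeric instead of a string.
ratio = (data['week'] / data['brand']).round(3)
print(ratio.tolist())  # [3.0, 0.6, 4.0]
```

Keeping the result numeric (rather than formatted strings) is usually what you want for further computation.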
4. Deleting NaN and whitespace in Pandas
There are only two ways to deal with missing data: delete it, or fill it with something else. So how does Pandas delete missing values? Pandas provides the dropna method, which handles the straightforward case, but sometimes the missing value is not NaN but a space or something else. I've run into this myself: I kept getting a ValueError even though I had clearly called dropna, which meant the data still wasn't clean. Below are three methods for deleting missing values that I've collected from around the web; they can be used directly.
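For the common case where the "missing" values are empty or whitespace-only strings, a compact alternative is to convert them to real NaN with a regex replace and then let dropna do its job. A small sketch on a hypothetical column:

```python
import numpy as np
import pandas as pd

# Hypothetical column where "missing" shows up as empty or whitespace
# strings rather than real NaN.
df = pd.DataFrame({'quantity': ['20', '', '  ', '48', np.nan]})

# Turn empty/whitespace-only strings into NaN, then dropna() works as usual.
cleaned = df.replace(r'^\s*$', np.nan, regex=True).dropna()
print(cleaned['quantity'].tolist())  # ['20', '48']
```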
import numpy as np

def delete_pandas_na(in_df, columns_name, method='one'):
    if method == 'one':
        # Replace zero-length values with NaN, then keep the non-null rows.
        out_df = in_df.copy()
        out_df[columns_name] = \
            in_df[columns_name].apply(
                lambda x: np.nan if len(str(x)) < 1 else x)
        out_df_res = out_df[out_df[columns_name].notnull()]
        return out_df_res
    elif method == 'two':
        # Build a mask of null or whitespace-only values and drop those rows.
        mask = (in_df[columns_name].isnull()) | \
               (in_df[columns_name].apply(
                   lambda x: str(x).isspace()))
        out_df_res = in_df[~mask]
        return out_df_res
    else:
        # Drop NaN rows, then also drop inf/-inf before casting to float.
        in_df.dropna(inplace=True)
        indices_to_keep = ~in_df.isin([np.nan,
                                       np.inf,
                                       -np.inf]).any(axis=1)
        return in_df[indices_to_keep].astype(np.float64)

After reading the above, do you have a better understanding of which Pandas methods are easier to use? If you want to learn more, please follow the industry information channel. Thank you for your support.