2025-01-16 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 report
Today I'd like to talk about a few Pandas methods that are easy to use but not well known. I've summarized the following to help you understand them better, and I hope you get something out of this article.
People say I haven't done deep learning in a long time. I've been working on NLP for a while, thinking I'd get to try all kinds of fancy algorithms, but so far I haven't; the work feels more like data mining. Most of what I actually run into is data cleaning, and at that point tools matter: a good tool gets twice the result with half the effort. You get an idea, start huffing and puffing to reinvent the wheel, and only after writing dozens of lines do you discover there was a ready-made one-liner all along. As the saying goes, "to do a good job, one must first sharpen one's tools." Enough preamble; here are some handy methods.
1. Data filtering
First import the data. It has 4 columns (date, week, brand, and quantity) and 14 rows in total.
import pandas as pd

data = pd.read_table("test.txt")
print(data.head(2))
print(data.shape)
"
   date  week  brand  quantity
0     1     3      1        20
1     1     3      5        48
(14, 4)
"
Then we can check which brand values appear; there are five brands in total.
brand = data['brand']
print(set(brand.values.tolist()))
"
{1, 2, 3, 4, 5}
"
OK, what if I want to see the data for brand 1? You can do this:
brand_1 = data[data['brand'].isin([1])]
print(brand_1)
"
   date  week  brand  quantity
0     1     3      1        20
2     2     4      1        16
4     3     5      1      1411
9     4     6      1      1176
"
Take a look: the isin() method is used here, and it gives you all the rows for brand 1. Some people may suggest another way, groupby. groupby is the pandas method for grouped statistics; never heard of it? No problem, it's introduced below.
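As an aside, for a single value a plain comparison builds the same boolean mask as isin([1]). A minimal sketch, assuming a few rows shaped like the article's data:

```python
import pandas as pd

# Inline stand-in for a few rows of the article's test.txt data.
data = pd.DataFrame({
    'date':     [1, 1, 2, 3],
    'week':     [3, 3, 4, 5],
    'brand':    [1, 5, 1, 2],
    'quantity': [20, 48, 16, 811],
})

# data['brand'] == 1 yields the same boolean mask as isin([1]).
by_eq   = data[data['brand'] == 1]
by_isin = data[data['brand'].isin([1])]
print(by_eq.equals(by_isin))  # True
```

isin() earns its keep once you have more than one value to match, as the next example shows.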
Two more things before moving on. Can I filter on multiple brands at once? Yes:
brand_n = data[data['brand'].isin([1, 2])]
print(brand_n)
"
    date  week  brand  quantity
0      1     3      1        20
2      2     4      1        16
4      3     5      1      1411
5      3     5      2       811
9      4     6      1      1176
10     4     6      2       824
"
isin() takes a list. OK, and what if I want all the data except brand 1? That's easy in pandas too:
brand_ex_1 = data[~data['brand'].isin([1])]
print(brand_ex_1)
"
    date  week  brand  quantity
1      1     3      5        48
3      2     4      3        20
5      3     5      2       811
6      3     5      3      1005
7      3     5      4       773
8      3     5      5      1565
10     4     6      2       824
11     4     6      3       802
12     4     6      4      1057
13     4     6      5      1107
"
2. Data grouping
OK, now for groupby. groupby means "group data by x": the data is split into groups according to some key. Let's look at an example and group the data by date first.
data_grouped = data.groupby(by='date')
print("There are {} groups in total".format(data_grouped.ngroups))
# there are 4 groups
# ascending=False numbers the groups from ngroups-1 down to 0,
# which matches the output below
print(data_grouped.ngroup(ascending=False))
"
0 3
1 3
2 2
3 2
4 1
5 1
6 1
7 1
8 1
9 0
10 0
11 0
12 0
13 0
"
View the indices after grouping:
indices = data_grouped.indices
day_3 = indices[3]  # row positions belonging to date 3
print(day_3)
"
[4 5 6 7 8]
"
You can also iterate over the groups:
for i, j in data_grouped:
    print(i, j.index)
"
1 Int64Index([0, 1], dtype='int64')
2 Int64Index([2, 3], dtype='int64')
3 Int64Index([4, 5, 6, 7, 8], dtype='int64')
4 Int64Index([9, 10, 11, 12, 13], dtype='int64')
"
Each group is still a DataFrame object, so you can access its index attribute.
If you want to run statistics on the grouped data, you can do it like this:
import pandas as pd

data = pd.read_table("test.txt")
data_grouped = data.groupby(by='date')['quantity'].mean()
print(data_grouped)
This shows the average quantity for each day.
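If you want several statistics per group at once, agg() computes them in one pass. A minimal sketch on a stand-in for the first four rows of the data:

```python
import pandas as pd

# Stand-in for the first four rows of the article's test.txt data.
data = pd.DataFrame({
    'date':     [1, 1, 2, 2],
    'quantity': [20, 48, 16, 20],
})

# agg() computes several statistics per group in one pass.
stats = data.groupby('date')['quantity'].agg(['mean', 'sum', 'max'])
print(stats)
# date 1: mean 34.0, sum 68, max 48
# date 2: mean 18.0, sum 36, max 20
```

Each aggregate becomes a column of the result, indexed by the group key.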
3. The apply method
What if I want to multiply every value in the quantity column by 2? There are many ways to do it; here's how with apply.
import pandas as pd

data = pd.read_table("test.txt")

def double_df(x):
    return 2 * x

data_double = data['quantity'].apply(double_df)
print(data_double)
"
0 40
1 96
2 32
3 40
4 2822
5 1622
6 2010
7 1546
8 3130
9 2352
10 1648
11 1604
12 2114
13 2214
Name: quantity, dtype: int64
"
That takes care of the multiplication, but the output isn't quite what we want, because we'd like to keep the other columns. So what then?
import pandas as pd

data = pd.read_table("test.txt")

def double_df(x):
    return 2 * x

data_copy = data.copy()
data_copy['quantity'] = data['quantity'].apply(double_df)
print(data_copy)
"
    date  week  brand  quantity
0      1     3      1        40
1      1     3      5        96
2      2     4      1        32
3      2     4      3        40
4      3     5      1      2822
5      3     5      2      1622
6      3     5      3      2010
7      3     5      4      1546
8      3     5      5      3130
9      4     6      1      2352
10     4     6      2      1648
11     4     6      3      1604
12     4     6      4      2114
13     4     6      5      2214
"
Here you first copy data, then assign the apply result of data's quantity column to the quantity column of the copy, so no columns are lost.
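For simple arithmetic like this, a vectorized expression with assign() does the copy and the math in one step; assign() returns a new DataFrame, so the original stays untouched. A small sketch on two stand-in rows:

```python
import pandas as pd

# Two stand-in rows shaped like the article's data.
data = pd.DataFrame({
    'brand':    [1, 5],
    'quantity': [20, 48],
})

# assign() returns a new DataFrame with the column replaced,
# leaving `data` unchanged -- no explicit .copy() needed here.
data_double = data.assign(quantity=data['quantity'] * 2)
print(data_double['quantity'].tolist())  # [40, 96]
print(data['quantity'].tolist())         # [20, 48]  (unchanged)
```

Vectorized column arithmetic is also considerably faster than apply on large frames.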
Well, that's the basic use of apply. But what if we want to apply a function over two columns of data? At first I didn't know how either. The question came up one day because my data was in two columns and I wanted a statistic over both, and I eventually found the answer on Stack Overflow. Sharing it here.
def double_df(a, b):
    return "{:.3f}".format(a / b)

data_apply = data.apply(lambda row:
                        double_df(row['week'],
                                  row['brand']),
                        axis=1)
print(data_apply)
"
0 3.000
1 0.600
2 4.000
3 1.333
4 5.000
5 2.500
6 1.667
7 1.250
8 1.000
9 6.000
10 3.000
11 2.000
12 1.500
13 1.200
dtype: object
"
Or like this:
def double_df(rows):
    return "{:.3f}".format(rows['week'] / rows['brand'])

data_apply = data.apply(double_df, axis=1)
print(data_apply)
"
0 3.000
1 0.600
2 4.000
3 1.333
4 5.000
5 2.500
6 1.667
7 1.250
8 1.000
9 6.000
10 3.000
11 2.000
12 1.500
13 1.200
dtype: object
"
As before, if you want to keep all the original data intact, you'd better make a copy first, or you may run into errors. Try it if you're curious.
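Also worth noting: row-wise apply is relatively slow, and dividing two columns is vectorized out of the box. A minimal sketch, using round() in place of the string formatting above:

```python
import pandas as pd

# Stand-in for the first three rows of the article's data.
data = pd.DataFrame({
    'week':  [3, 3, 4],
    'brand': [1, 5, 1],
})

# Column arithmetic is vectorized; round(3) replaces "{:.3f}" formatting
# and keeps the result numeric instead of a string.
ratio = (data['week'] / data['brand']).round(3)
print(ratio.tolist())  # [3.0, 0.6, 4.0]
```

Keeping the result numeric (rather than formatted strings) is usually what you want for further computation.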
4. Deleting NaN and whitespace in Pandas
There are only two ways to deal with missing data: delete it, or fill it with something else. So how does Pandas delete missing values? Pandas provides the dropna method, which handles the straightforward case, but sometimes the missing value is not NaN but a space or something else. I've run into this myself: I kept getting a ValueError even though I had clearly called dropna, which meant the data still wasn't clean. Below are three methods for deleting missing values that I've collected from around the web; they can be used directly.
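For the common case where the "missing" values are empty or whitespace-only strings, a compact alternative is to convert them to real NaN with a regex replace and then let dropna do its job. A small sketch on a hypothetical column:

```python
import numpy as np
import pandas as pd

# Hypothetical column where "missing" shows up as empty or whitespace
# strings rather than real NaN.
df = pd.DataFrame({'quantity': ['20', '', '  ', '48', np.nan]})

# Turn empty/whitespace-only strings into NaN, then dropna() works as usual.
cleaned = df.replace(r'^\s*$', np.nan, regex=True).dropna()
print(cleaned['quantity'].tolist())  # ['20', '48']
```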
import numpy as np

def delete_pandas_na(in_df, columns_name, method='one'):
    if method == 'one':
        # Replace zero-length values with NaN, then keep the non-null rows.
        out_df = in_df.copy()
        out_df[columns_name] = \
            in_df[columns_name].apply(
                lambda x: np.nan if len(str(x)) < 1 else x)
        out_df_res = out_df[out_df[columns_name].notnull()]
        return out_df_res
    elif method == 'two':
        # Build a mask of null or whitespace-only values and drop those rows.
        mask = (in_df[columns_name].isnull()) | \
               (in_df[columns_name].apply(
                   lambda x: str(x).isspace()))
        out_df_res = in_df[~mask]
        return out_df_res
    else:
        # Drop NaN rows, then also drop inf/-inf before casting to float.
        in_df.dropna(inplace=True)
        indices_to_keep = ~in_df.isin([np.nan,
                                       np.inf,
                                       -np.inf]).any(axis=1)
        return in_df[indices_to_keep].astype(np.float64)

After reading the above, do you have a better understanding of which Pandas methods are easier to use? If you want to learn more, please follow the industry information channel. Thank you for your support.