Shulou (Shulou.com) 06/01 report. From: SLTechnology News&Howtos, updated 2025-01-19.
This article shows how to perform calculations, statistics and sorting on a DataFrame with Pandas. It goes through each step in detail and should serve as a useful reference.
Because a DataFrame holds data in rows and columns, calculations and statistics can be computed either across each row or down each column. For convenience, Pandas provides the common calculation and statistical methods:
Operation            Method      Operation             Method
Sum                  sum         Maximum               max
Mean                 mean        Minimum               min
Variance             var         Standard deviation    std
Median               median      Mode                  mode
Quantile             quantile

1. Operation
Continuing the earlier example, we already have the math, Chinese and English scores of N students (three in this sample). To calculate each student's total score, we can use either of the following methods.
Method 1: select the columns to be summed (leaving out the name column) and call sum on them. Method 2: select the columns one by one and add them with the + operator. Either way, the result is stored in a new sum column that holds the total of each row:

    df['sum'] = df[['chinese', 'math', 'english']].sum(1)  # method 1
    df['sum'] = df['chinese'] + df['math'] + df['english']  # method 2
    print(df)

Output:

            name  chinese  english  math  sum
    0   XiaoMing       99      100    80  279
    1      LiHua      102       79    92  273
    2  HanMeiNei      111      130   104  345
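To check that the two methods agree, here is a minimal, self-contained sketch. The sample table is rebuilt from the outputs shown in this article (the original constructs it in an earlier example, so the construction is an assumption), and the variable names `subjects`, `total_1` and `total_2` are only illustrative.

```python
import pandas as pd

# Sample score table rebuilt from the outputs in this article (assumed construction)
df = pd.DataFrame({
    "name": ["XiaoMing", "LiHua", "HanMeiNei"],
    "chinese": [99, 102, 111],
    "english": [100, 79, 130],
    "math": [80, 92, 104],
})

subjects = ["chinese", "math", "english"]

# Method 1: sum across the selected columns of each row (axis=1, also written axis="columns")
total_1 = df[subjects].sum(axis=1)

# Method 2: add the columns element-wise with the + operator
total_2 = df["chinese"] + df["math"] + df["english"]

print(total_1.equals(total_2))  # True: same values, same index, same dtype
df["sum"] = total_1
print(df)
```

Method 1 scales better when there are many columns to add, while method 2 makes it explicit which columns take part.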
We passed 1 to sum, i.e. axis=1, which sums across the columns of each row. To calculate the total of each column instead, pass 0. Because 0 is the default axis for sum, it can also be omitted:
    df[['chinese', 'math', 'english']].sum(0)

Output:

    chinese    312
    math       276
    english    309
    dtype: int64
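As a quick sketch of the default axis, assuming the same rebuilt sample table, the following shows that calling sum without arguments, or with the named form axis="index", gives the per-column totals above.

```python
import pandas as pd

# Sample score table rebuilt from the outputs in this article (assumed construction)
df = pd.DataFrame({
    "name": ["XiaoMing", "LiHua", "HanMeiNei"],
    "chinese": [99, 102, 111],
    "english": [100, 79, 130],
    "math": [80, 92, 104],
})

subjects = ["chinese", "math", "english"]

# axis=0 (the default) collapses the rows, giving one total per column
col_totals = df[subjects].sum()
col_totals_named = df[subjects].sum(axis="index")  # named spelling of axis=0

print(col_totals.equals(col_totals_named))  # True
print(col_totals)
```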
With the total scores in hand, the math or Chinese teacher will want to know the class average for their subject. This is just as quick to calculate:
    df['math'].mean()  # method 1: use the mean method provided by Pandas
    df['math'].sum() / df.shape[0]  # method 2: divide the sum by the number of students (rows)

Output: 92.0
Method 2 uses the DataFrame's shape attribute, a tuple holding the number of rows and columns: df.shape[0] is the number of rows and df.shape[1] the number of columns. Note that the column count does not include the index.
The example above only computes the average math score; interested readers can work out the English and Chinese averages themselves.
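One possible answer to that exercise, sketched with the same rebuilt sample table (an assumption), applies both methods to every subject at once and also shows what df.shape returns. The names `subject_means` and `by_hand` are illustrative.

```python
import pandas as pd

# Sample score table rebuilt from the outputs in this article (assumed construction)
df = pd.DataFrame({
    "name": ["XiaoMing", "LiHua", "HanMeiNei"],
    "chinese": [99, 102, 111],
    "english": [100, 79, 130],
    "math": [80, 92, 104],
})

# shape is an attribute, not a method: a (rows, columns) tuple that does not count the index
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 4

subjects = ["chinese", "english", "math"]

# Method 1: mean() on every subject column at once
subject_means = df[subjects].mean()

# Method 2: column totals divided by the number of students (rows)
by_hand = df[subjects].sum() / n_rows

print(subject_means)
print(by_hand)
```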
2. Statistics
Now the math teacher has a new request: the highest, lowest and median math scores for the class, among other statistics. No need to panic, Pandas handles all of them:
    df['math'].min()  # minimum of the math column
    Output: 80

    df['math'].max()  # maximum of the math column
    Output: 104

    df['math'].quantile([0.3, 0.4, 0.5])  # 30%, 40% and 50% quantiles of the math column
    Output:
    0.3    87.2
    0.4    89.6
    0.5    92.0
    Name: math, dtype: float64

    df['math'].std()  # standard deviation of the math column
    Output: 12.0

    df['math'].var()  # variance of the math column
    Output: 144.0

    df['math'].mean()  # average of the math column
    Output: 92.0

    df['math'].median()  # median of the math column
    Output: 92.0

    df['math'].mode()  # mode of the math column
    # mode returns a Series because several values can tie for most frequent;
    # here each score appears exactly once, so all three are returned
    Output:
    0     80
    1     92
    2    104
    dtype: int64
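The standard deviation of 12 and variance of 144 above are sample statistics. A small sketch, using only the three math scores from this article, reproduces them by hand and contrasts them with the population variance (divide by n), which is what NumPy's np.var computes by default. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

math_scores = pd.Series([80, 92, 104], name="math")

# Sample variance by hand: squared deviations from the mean, divided by n - 1
mean = math_scores.mean()                          # 92.0
squared_dev = ((math_scores - mean) ** 2).sum()    # 144 + 0 + 144 = 288
sample_var = squared_dev / (len(math_scores) - 1)  # 288 / 2 = 144.0

print(sample_var, math_scores.var())  # pandas var/std default to ddof=1

# Population variance divides by n instead; np.var uses ddof=0 by default
print(math_scores.var(ddof=0), np.var(math_scores.to_numpy()))  # 288 / 3 = 96.0

# Standard deviation is the square root of the variance
print(math_scores.std(), np.sqrt(sample_var))  # 12.0 12.0
```

Keep this difference in mind when comparing pandas results with NumPy results on the same data.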
We can also use DataFrame's describe method to view the basic statistics of DataFrame:
    df.describe()

Output:

              chinese     english   math         sum
    count    3.000000    3.000000    3.0    3.000000
    mean   104.000000  103.000000   92.0  299.000000
    std      6.244998   25.632011   12.0   39.949969
    min     99.000000   79.000000   80.0  273.000000
    25%    100.500000   89.500000   86.0  276.000000
    50%    102.000000  100.000000   92.0  279.000000
    75%    106.500000  115.000000   98.0  312.000000
    max    111.000000  130.000000  104.0  345.000000

3. Sort
Score tables are usually sorted by total score, from highest to lowest:
    df = df.sort_values(by='sum', ascending=False)
    print(df)

Output:

            name  chinese  english  math  sum
    2  HanMeiNei      111      130   104  345
    0   XiaoMing       99      100    80  279
    1      LiHua      102       79    92  273
Here the sort_values method sorts the DataFrame: by='sum' sorts on the sum column, and ascending selects descending (False) or ascending (True, the default) order. By default sort_values returns a new DataFrame and leaves the original untouched, which is why this example assigns the result back to df. If you would rather modify the original DataFrame than create a new object, pass inplace=True:
    df.sort_values(by='sum', ascending=False, inplace=True)
    print(df)

Output:

            name  chinese  english  math  sum
    2  HanMeiNei      111      130   104  345
    0   XiaoMing       99      100    80  279
    1      LiHua      102       79    92  273
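The difference between the two styles can be seen in a short sketch, again assuming the rebuilt sample table with its sum column. Without inplace, the original keeps its order; with inplace=True, the original is sorted and the call itself returns None.

```python
import pandas as pd

# Sample score table rebuilt from the outputs in this article (assumed construction)
df = pd.DataFrame({
    "name": ["XiaoMing", "LiHua", "HanMeiNei"],
    "chinese": [99, 102, 111],
    "english": [100, 79, 130],
    "math": [80, 92, 104],
})
df["sum"] = df[["chinese", "math", "english"]].sum(axis=1)

# Without inplace, sort_values returns a sorted copy and df keeps its original order
ranked = df.sort_values(by="sum", ascending=False)
print(df["name"].tolist())      # ['XiaoMing', 'LiHua', 'HanMeiNei']
print(ranked["name"].tolist())  # ['HanMeiNei', 'XiaoMing', 'LiHua']

# With inplace=True, df itself is sorted and the method returns None
result = df.sort_values(by="sum", ascending=False, inplace=True)
print(result)                   # None
print(df["name"].tolist())      # ['HanMeiNei', 'XiaoMing', 'LiHua']
```

This is also why the two styles should not be mixed: writing df = df.sort_values(..., inplace=True) would replace df with None.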
Observant readers may notice that sorting moves the rows but does not change their index labels: each row keeps the label it had before. Because this example uses the default integer index (0, 1, 2, ...), the sorted result looks a little odd, but the index can easily be reset:
    df = df.sort_values(by='sum', ascending=False).reset_index()
    print(df)

Output:

       index       name  chinese  english  math  sum
    0      2  HanMeiNei      111      130   104  345
    1      0   XiaoMing       99      100    80  279
    2      1      LiHua      102       79    92  273
After reset_index, the DataFrame's index is once again an increasing sequence starting from 0, and the old index is kept as a new column named index. To discard the old index instead of inserting it as a column, pass drop=True:
    df = df.sort_values(by='sum', ascending=False).reset_index(drop=True)
    print(df)

Output:

            name  chinese  english  math  sum
    0  HanMeiNei      111      130   104  345
    1   XiaoMing       99      100    80  279
    2      LiHua      102       79    92  273
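Why resetting matters becomes clearer when indexing by label. The sketch below, assuming the same rebuilt sample table, contrasts label-based loc with position-based iloc before the reset, then compares reset_index with and without drop=True. The names `ranked`, `kept` and `dropped` are illustrative.

```python
import pandas as pd

# Sample score table rebuilt from the outputs in this article (assumed construction)
df = pd.DataFrame({
    "name": ["XiaoMing", "LiHua", "HanMeiNei"],
    "chinese": [99, 102, 111],
    "english": [100, 79, 130],
    "math": [80, 92, 104],
})
df["sum"] = df[["chinese", "math", "english"]].sum(axis=1)

ranked = df.sort_values(by="sum", ascending=False)

# Before resetting, labels travel with their rows: label 0 is still XiaoMing,
# even though HanMeiNei is now in the first position
print(ranked.index.tolist())   # [2, 0, 1]
print(ranked.loc[0, "name"])   # XiaoMing
print(ranked["name"].iloc[0])  # HanMeiNei

# reset_index() keeps the old labels as a new "index" column; drop=True discards them
kept = ranked.reset_index()
dropped = ranked.reset_index(drop=True)
print(kept.columns.tolist())     # ['index', 'name', 'chinese', 'english', 'math', 'sum']
print(dropped.columns.tolist())  # ['name', 'chinese', 'english', 'math', 'sum']
print(dropped.loc[0, "name"])    # HanMeiNei
```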
To show the ranking more intuitively, we can add 1 to every index value so that the ranks start at 1:
    df.index += 1
    print(df)

Output:

            name  chinese  english  math  sum
    1  HanMeiNei      111      130   104  345
    2   XiaoMing       99      100    80  279
    3      LiHua      102       79    92  273

That covers how Pandas performs DataFrame calculations, statistics and sorting. Thank you for reading!
© 2024 shulou.com SLNews company. All rights reserved.