2025-01-22 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces the usage of data aggregation functions in R and Python. Many people run into trouble with these functions when working through real cases, so this walkthrough shows how to handle the common situations.
Both R and Python's pandas offer a rich set of data aggregation functions, so let's take stock of how they are used.
R:
transform
mutate
aggregate
group_by + summarize
ddply
Python:
groupby
pivot_table
In R, the quickest way to create new variables is transform (you can of course write a custom function instead), which supports creating several variables at once from the same data frame.
The classic iris dataset is used here to demonstrate:
library(dplyr)
iris %>% group_by(Species) %>% summarize(sums = sum(Sepal.Length))
Grouped aggregation in R runs much more efficiently when it is done with vectorized functions:
tapply(iris$Sepal.Length, iris$Species, mean)
tapply(iris$Sepal.Length, iris$Species, sum)
tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
tapply is a fast grouped aggregation function with easy-to-understand parameters: given a measure (X), a grouping field (INDEX), and an aggregation function (FUN), it performs a simple grouped aggregation.
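To make the three ingredients concrete, here is a minimal Python sketch of the same idea: values are bucketed by a parallel vector of labels, then a function is applied to each bucket. The helper name `tapply_sketch` and the sample numbers are made up for illustration; this is not how R implements tapply, only a picture of what it computes.

```python
from collections import defaultdict
from statistics import mean


def tapply_sketch(values, index, fun):
    """Group `values` by the parallel labels in `index`, then apply `fun` per group."""
    groups = defaultdict(list)
    for value, label in zip(values, index):
        groups[label].append(value)
    return {label: fun(group) for label, group in groups.items()}


# Made-up sample values, not the real iris measurements.
sepal_length = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
species = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

group_means = tapply_sketch(sepal_length, species, mean)
group_sums = tapply_sketch(sepal_length, species, sum)
print(group_means)
print(group_sums)
```

The two calls mirror the two tapply calls above: the same values and grouping labels, with only the aggregation function swapped.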
library(plyr)
ddply(iris, .(Species), summarize, means = mean(Sepal.Length))
ddply(iris, .(Species), summarize, sums = sum(Sepal.Length))
ddply(.data, .variables, .fun = NULL, ...) # generally you only need to supply the data frame, the grouping fields, and the aggregation function together with the expression for the aggregated variable. It is used in much the same way as the built-in tapply.
-
Python:
-
import pandas as pd
import numpy as np
The main data aggregation tools in Python (pandas) are the groupby method, the agg method, and pivot_table.
groupby
agg
pivot_table
iris = pd.read_csv("C:/Users/RAINDU/Desktop/iris.csv", sep=",")
iris.head()
iris.describe()
Grouped data can be aggregated quickly with the groupby method in pandas.
iris.groupby('Species')['Sepal.Length'].mean()
iris.groupby('Species')['Sepal.Length'].sum()
iris.groupby('Species')['Sepal.Length'].agg([len, 'sum', 'mean'])
# Custom-name the output columns (named aggregation, pandas 0.25+):
iris.groupby('Species')['Sepal.Length'].agg(count=len, sum='sum', mean='mean')
To compute a single aggregate, call the corresponding aggregation method directly; to compute several aggregates at once, use agg.
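The difference between the two styles is easiest to see on a tiny frame. The sketch below uses a made-up four-row sample (`iris_demo`) in place of the CSV file, and the output column names `count`, `total`, and `average` are chosen only for illustration.

```python
import pandas as pd

# Made-up sample standing in for iris.csv; column names follow the article.
iris_demo = pd.DataFrame({
    "Species": ["setosa", "setosa", "versicolor", "versicolor"],
    "Sepal.Length": [5.0, 5.5, 6.0, 6.5],
})

grouped = iris_demo.groupby("Species")["Sepal.Length"]

# One aggregate: call the method directly; the result is a Series.
means = grouped.mean()

# Several aggregates at once: agg returns a DataFrame, one column per statistic.
summary = grouped.agg(count="count", total="sum", average="mean")

print(means)
print(summary)
```

Calling the method directly keeps the result one-dimensional, while agg is the natural choice once more than one statistic per group is needed.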
pd.pivot_table(iris, index=["Species"], values=["Sepal.Length"], aggfunc=[len, 'sum', 'mean'], margins=False)
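For a single grouping field, pivot_table produces the same numbers as groupby followed by agg; only the shape of the column index differs. The sketch below checks this on a made-up sample (`iris_demo` is hypothetical, not the real dataset).

```python
import pandas as pd

# Made-up sample standing in for iris.csv; column names follow the article.
iris_demo = pd.DataFrame({
    "Species": ["setosa", "setosa", "versicolor", "versicolor"],
    "Sepal.Length": [5.0, 5.5, 6.0, 6.5],
})

# pivot_table with a list of functions yields two-level columns:
# (function name, value column).
pivot = pd.pivot_table(
    iris_demo,
    index=["Species"],
    values=["Sepal.Length"],
    aggfunc=["count", "sum", "mean"],
)

# The groupby route yields flat columns named after each function.
by_group = iris_demo.groupby("Species")["Sepal.Length"].agg(["count", "sum", "mean"])

same_means = (pivot["mean"]["Sepal.Length"] == by_group["mean"]).all()
print(pivot)
print(by_group)
print(same_means)
```

pivot_table becomes more useful than groupby when a second field is spread across the columns, which this one-dimensional example does not exercise.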
This concludes the introduction to the usage of data aggregation functions in R and Python. Thank you for reading.