2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article is a collection of Hive's statistical (aggregate) functions. For each function it gives the syntax, the return type, a short description of what is computed, and an example query where one is available.
Aggregate statistical functions
1. Count function: count
Syntax: count(*), count(expr), count(DISTINCT expr[, expr...])
Return value: bigint
Description: count(*) returns the number of rows retrieved, including rows that contain null values; count(expr) returns the number of rows for which the expression is non-null; count(DISTINCT expr[, expr...]) returns the number of distinct non-null values of the expression.
For example:
hive> select count(*) from lxw_dual;
20
hive> select count(distinct t) from lxw_dual;
10
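The difference between the three forms is easiest to see on a column that contains nulls and duplicates. The following is a minimal Python sketch of the counting rules described above, not Hive code; the list `rows` is a made-up stand-in for a column `t`, with `None` playing the role of NULL.

```python
# Hypothetical values of a column t; None stands in for SQL NULL.
rows = [3, None, 5, 3, None, 8]

# count(*): every row counts, including rows whose value is NULL.
count_star = len(rows)

# count(t): only rows where t is not NULL.
count_expr = sum(1 for v in rows if v is not None)

# count(distinct t): distinct non-NULL values only.
count_distinct = len({v for v in rows if v is not None})

print(count_star, count_expr, count_distinct)  # prints: 6 4 3
```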
2. Summation function: sum
Syntax: sum(col), sum(DISTINCT col)
Return value: double
Description: sum(col) returns the sum of col over the rows in the result set; sum(DISTINCT col) returns the sum of the distinct values of col.
For example:
hive> select sum(t) from lxw_dual;
100
hive> select sum(distinct t) from lxw_dual;
70
3. Average function: avg
Syntax: avg(col), avg(DISTINCT col)
Return value: double
Description: avg(col) returns the average of col over the rows in the result set; avg(DISTINCT col) returns the average of the distinct values of col.
For example:
hive> select avg(t) from lxw_dual;
50
hive> select avg(distinct t) from lxw_dual;
30
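The effect of DISTINCT on sum and avg can be checked by hand. Below is a small Python sketch under the assumption that nulls are skipped and duplicates are collapsed before aggregating; the data is invented for illustration and is not the content of lxw_dual.

```python
# Hypothetical values of a column t; None stands in for SQL NULL.
rows = [10, 10, 20, None, 30]

non_null = [v for v in rows if v is not None]  # NULLs do not take part
distinct = set(non_null)                       # duplicates collapsed

sum_all = sum(non_null)                        # sum(t)
sum_distinct = sum(distinct)                   # sum(distinct t)
avg_all = sum_all / len(non_null)              # avg(t)
avg_distinct = sum_distinct / len(distinct)    # avg(distinct t)

print(sum_all, sum_distinct, avg_all, avg_distinct)  # prints: 70 60 17.5 20.0
```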
4. Minimum statistical function: min
Syntax: min(col)
Return value: double
Description: returns the minimum value of col in the result set.
For example:
hive> select min(t) from lxw_dual;
20
5. Maximum statistical function: max
Syntax: max(col)
Return value: double
Description: returns the maximum value of col in the result set.
For example:
hive> select max(t) from lxw_dual;
120
6. Population variance function: var_pop
Syntax: var_pop(col)
Return value: double
Description: returns the population variance of the non-null values of col in the result set (nulls are ignored).
For example:
7. Sample variance function: var_samp
Syntax: var_samp(col)
Return value: double
Description: returns the sample variance of the non-null values of col in the result set (nulls are ignored).
For example:
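Population and sample variance differ only in the divisor: n for var_pop and n - 1 for var_samp. The Python sketch below works through both definitions on invented data, with `None` standing in for NULL; it illustrates the formulas and is not Hive's implementation.

```python
# Hypothetical column values; None stands in for SQL NULL and is ignored.
values = [2.0, 4.0, None, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

xs = [v for v in values if v is not None]
n = len(xs)
mean = sum(xs) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in xs)

var_pop = ss / n          # population variance: divide by n
var_samp = ss / (n - 1)   # sample variance: divide by n - 1

print(var_pop, var_samp)
```

On this data the mean is 5.0 and the sum of squared deviations is 32, so var_pop is 32/8 and var_samp is 32/7.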
8. Population standard deviation function: stddev_pop
Syntax: stddev_pop(col)
Return value: double
Description: returns the population standard deviation of col, which is the square root of the population variance, that is, the square root of the var_pop result.
For example:
9. Sample standard deviation function: stddev_samp
Syntax: stddev_samp(col)
Return value: double
Description: returns the sample standard deviation of col, which is the square root of the sample variance (the var_samp result).
For example:
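Both standard deviations are square roots of the corresponding variances. As a rough cross-check outside Hive, Python's standard `statistics` module offers the same two definitions (`pstdev` for the population form, `stdev` for the sample form); the data below is invented, and `None` stands in for NULL.

```python
import math
import statistics

# Hypothetical column values; None stands in for SQL NULL and is ignored.
values = [2.0, 4.0, None, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
xs = [v for v in values if v is not None]

stddev_pop = statistics.pstdev(xs)   # square root of the population variance
stddev_samp = statistics.stdev(xs)   # square root of the sample variance

# Each result matches the square root of the corresponding variance.
assert math.isclose(stddev_pop, math.sqrt(statistics.pvariance(xs)))
assert math.isclose(stddev_samp, math.sqrt(statistics.variance(xs)))

print(stddev_pop, stddev_samp)
```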
10. Percentile (median) function: percentile
Syntax: percentile(BIGINT col, p)
Return value: double
Description: returns the exact p-th percentile of col. p must be between 0 and 1 (p = 0.5 gives the median). The col field currently supports only integer types, not floating-point types.
For example:
11. Percentile (median) function, multiple percentiles: percentile
Syntax: percentile(BIGINT col, array(p1[, p2]...))
Return value: array
Description: same as the previous function, but accepts several percentiles at once. The return value is an array whose elements are the corresponding percentiles.
For example:
select percentile(score, array(0.2, 0.4)) from lxw_dual;
This returns the values at the 0.2 and 0.4 percentile positions of the score field.
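To make "percentile position" concrete, here is a Python sketch of an exact percentile over integer values. It assumes linear interpolation between the two closest ranks, at position p * (n - 1) in the sorted values; that interpolation rule is an assumption of this sketch rather than something stated above, and the `percentile` helper and `scores` data are hypothetical.

```python
import math

def percentile(values, p):
    """Exact p-th percentile (0 <= p <= 1) of the non-null values.

    Assumes linear interpolation between the two closest ranks.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    pos = p * (len(xs) - 1)          # fractional index into the sorted values
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        return float(xs[lo])
    # Weight the two neighbours by their distance from pos.
    return (hi - pos) * xs[lo] + (pos - lo) * xs[hi]

scores = [10, 20, 30, 40, 50, 60]    # hypothetical integer column
median = percentile(scores, 0.5)
several = [percentile(scores, p) for p in (0.2, 0.4)]
print(median, several)
```

Passing several values of p, as in the last call, mirrors the array form of the function: one result per requested percentile.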
12. Approximate percentile function: percentile_approx
Syntax: percentile_approx(DOUBLE col, p[, B])
Return value: double
Description: returns an approximate p-th percentile of col. p must be between 0 and 1. Unlike percentile, the col field may be a floating-point type. The parameter B controls the approximation accuracy at the cost of memory: the larger B is, the more accurate the result. The default is 10000. When the number of distinct values in col is less than B, the result is the exact percentile.
For example:
13. Approximate percentile function, multiple percentiles: percentile_approx
Syntax: percentile_approx(DOUBLE col, array(p1[, p2]...)[, B])
Return value: array
Description: same as the previous function, but accepts several percentiles at once. The return value is an array whose elements are the corresponding approximate percentiles.
For example:
14. Histogram function: histogram_numeric
Syntax: histogram_numeric(col, b)
Return value: array
Description: computes a histogram of col using b bins. Each element of the returned array has an x field (the bin center) and a y field (the bin height).
For example:
hive> select histogram_numeric(100, 5) from lxw_dual;
[{"x": 100.0, "y": 1.0}]
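The shape of this output can be reproduced with a small Python sketch. It assumes a simplified streaming histogram: every value starts as its own bin of height 1, and whenever more than b bins exist, the two bins with the closest centers are merged into one at their weighted average. This is an illustration of the (x, y) structure under that assumption, not Hive's actual implementation, and the Python function is a hypothetical stand-in that only borrows the name.

```python
def histogram_numeric(values, b):
    """Simplified streaming histogram with at most b bins.

    Returns a list of {"x": bin center, "y": bin height} dictionaries.
    """
    bins = []  # list of [center, height], kept sorted by center
    for v in values:
        if v is None:
            continue  # NULLs are skipped
        x = float(v)
        for bin_ in bins:
            if bin_[0] == x:      # value sits exactly on an existing center
                bin_[1] += 1.0
                break
        else:
            bins.append([x, 1.0])
            bins.sort(key=lambda bin_: bin_[0])
        if len(bins) > b:
            # Merge the adjacent pair of bins whose centers are closest.
            i = min(range(len(bins) - 1),
                    key=lambda k: bins[k + 1][0] - bins[k][0])
            (x1, y1), (x2, y2) = bins[i], bins[i + 1]
            bins[i:i + 2] = [[(x1 * y1 + x2 * y2) / (y1 + y2), y1 + y2]]
    return [{"x": x, "y": y} for x, y in bins]

single = histogram_numeric([100], 5)
two_bins = histogram_numeric([1, 2, 3, 10, 11, 12], 2)
print(single)
print(two_bins)
```

A single input value of 100 yields one bin centered at 100.0 with height 1.0, the same shape as the result shown above.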
This concludes the collection of Hive statistical functions. Thank you for reading.