Collection of Hive statistical functions

Updated 2025-07-15. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

This article collects Hive's statistical (aggregate) functions. For each function it gives the syntax, the return type, a short description, and, where available, a sample query.

Aggregate statistical functions

1. Count function: count

Syntax: count(*), count(expr), count(DISTINCT expr[, expr...])

Return value: bigint

Description: count(*) returns the number of rows retrieved, including rows that contain null values; count(expr) returns the number of rows for which expr is non-null; count(DISTINCT expr[, expr...]) returns the number of distinct non-null values of the specified expressions.

For example:

hive> select count(*) from lxw_dual;

20

hive> select count(distinct t) from lxw_dual;

10
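The difference between the three forms of count is easiest to see on data that contains nulls. The following is a minimal Python sketch of the semantics described above, not Hive code; the sample rows are made up for illustration and `None` stands in for SQL NULL.

```python
# Hypothetical one-column table; None plays the role of SQL NULL.
rows = [1, 2, 2, None, 3, None]

# count(*): every row counts, including rows holding NULL.
count_star = len(rows)

# count(t): only rows where t is non-null.
count_expr = sum(1 for v in rows if v is not None)

# count(DISTINCT t): distinct non-null values only.
count_distinct = len({v for v in rows if v is not None})

print(count_star, count_expr, count_distinct)  # 6 4 3
```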

2. Summation statistics function: sum

Syntax: sum(col), sum(DISTINCT col)

Return value: double

Description: sum(col) returns the sum of the col values in the result set; sum(DISTINCT col) returns the sum of the distinct col values in the result set.

For example:

hive> select sum(t) from lxw_dual;

100

hive> select sum(distinct t) from lxw_dual;

70

3. Average statistical function: avg

Syntax: avg(col), avg(DISTINCT col)

Return value: double

Description: avg(col) returns the average of the col values in the result set; avg(DISTINCT col) returns the average of the distinct col values in the result set.

For example:

hive> select avg(t) from lxw_dual;

50

hive> select avg(distinct t) from lxw_dual;

30
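To make the effect of DISTINCT on sum and avg concrete, here is a small Python sketch of the semantics, not Hive code. The input values are invented for illustration, and nulls are skipped, as SQL aggregates do.

```python
# Hypothetical column values; None plays the role of SQL NULL.
values = [10, 10, 20, None, 30]

non_null = [v for v in values if v is not None]
distinct = set(non_null)

sum_all = float(sum(non_null))                 # sum(col)          -> 70.0
sum_distinct = float(sum(distinct))            # sum(DISTINCT col) -> 60.0
avg_all = sum(non_null) / len(non_null)        # avg(col)          -> 17.5
avg_distinct = sum(distinct) / len(distinct)   # avg(DISTINCT col) -> 20.0

print(sum_all, sum_distinct, avg_all, avg_distinct)
```

The duplicate 10 contributes twice to sum(col) and avg(col) but only once to the DISTINCT variants.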

4. Minimum statistical function: min

Syntax: min(col)

Return value: double

Description: returns the minimum value of the col field in the result set.

For example:

hive> select min(t) from lxw_dual;

20

5. Maximum statistical function: max

Syntax: max(col)

Return value: double

Description: returns the maximum value of the col field in the result set.

For example:

hive> select max(t) from lxw_dual;

120

6. Population variance of non-null values: var_pop

Syntax: var_pop(col)

Return value: double

Description: returns the population variance of the non-null col values in the result set (nulls are ignored).

For example:

7. Sample variance of non-null values: var_samp

Syntax: var_samp(col)

Return value: double

Description: returns the sample variance of the non-null col values in the result set (nulls are ignored).

For example:
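The only difference between var_pop and var_samp is the divisor: the population variance divides the sum of squared deviations by n, the sample variance by n - 1. A minimal Python sketch using the standard `statistics` module illustrates this; it is not Hive code, and the input values are made up.

```python
import statistics

# Hypothetical column values; None plays the role of SQL NULL and is ignored.
values = [2.0, 4.0, None, 4.0, 6.0]
non_null = [v for v in values if v is not None]

# var_pop: sum of squared deviations divided by n.
var_pop = statistics.pvariance(non_null)

# var_samp: sum of squared deviations divided by n - 1.
var_samp = statistics.variance(non_null)

print(var_pop, var_samp)
```

For these four values the squared deviations sum to 8, so the population variance is 8 / 4 and the sample variance is 8 / 3.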

8. Population standard deviation function: stddev_pop

Syntax: stddev_pop(col)

Return value: double

Description: returns the population standard deviation of col, that is, the square root of the population variance. The result equals the square root of var_pop(col).

For example:

9. Sample standard deviation function: stddev_samp

Syntax: stddev_samp(col)

Return value: double

Description: returns the sample standard deviation of col, that is, the square root of var_samp(col).

For example:
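The relationship between the standard deviation functions and the variance functions can be checked with a short Python sketch based on the standard `statistics` module. This illustrates the definitions only, it is not Hive code, and the data is invented.

```python
import math
import statistics

# Hypothetical non-null column values.
non_null = [2.0, 4.0, 4.0, 6.0]

stddev_pop = statistics.pstdev(non_null)   # population standard deviation
stddev_samp = statistics.stdev(non_null)   # sample standard deviation

# Each standard deviation is the square root of the matching variance.
pop_matches = math.isclose(stddev_pop, math.sqrt(statistics.pvariance(non_null)))
samp_matches = math.isclose(stddev_samp, math.sqrt(statistics.variance(non_null)))

print(stddev_pop, stddev_samp, pop_matches, samp_matches)
```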

10. Median function: percentile

Syntax: percentile(BIGINT col, p)

Return value: double

Description: returns the exact p-th percentile of col; p must be between 0 and 1. The col field currently supports only integer types, not floating point types.

For example:

11. Median function: percentile

Syntax: percentile(BIGINT col, array(p1[, p2]...))

Return value: array

Description: same as above, but accepts several percentiles at once. The return type is an array in which each element is the percentile value for the corresponding p.

For example:

select percentile(score, array(0.2, 0.4)) from lxw_dual; returns the values at the 0.2 and 0.4 percentile positions of the score field.
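As a rough illustration of what an exact percentile computes, the Python sketch below sorts the non-null values and interpolates linearly between the two neighbouring ranks. The interpolation rule is an assumption (the article does not state how Hive resolves positions that fall between two rows), and the function name and sample scores are hypothetical.

```python
import math

def percentile(values, p):
    """Exact p-th percentile, assuming linear interpolation between ranks."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    data = sorted(v for v in values if v is not None)  # nulls are ignored
    if not data:
        return None
    pos = (len(data) - 1) * p           # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:                        # position falls exactly on a row
        return float(data[lo])
    # Otherwise interpolate between the two neighbouring rows.
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

scores = [10, 20, 30, 40, 50]           # hypothetical integer column
median = percentile(scores, 0.5)        # single-p form
several = [percentile(scores, p) for p in (0.2, 0.4)]  # array form
print(median, several)
```

The array form is simply the single-p computation repeated for each requested percentile.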

12. Approximate median function: percentile_approx

Syntax: percentile_approx(DOUBLE col, p[, B])

Return value: double

Description: returns an approximate p-th percentile of col; p must be between 0 and 1. Unlike percentile, the col field may be a floating point type. Parameter B controls the accuracy of the approximation at the cost of memory: the larger B is, the more accurate the result. The default is 10000. When the number of distinct values in the col field is less than B, the result is the exact percentile.

For example:

13. Approximate median function: percentile_approx

Syntax: percentile_approx(DOUBLE col, array(p1[, p2]...)[, B])

Return value: array

Description: same as above, but accepts several percentiles at once. The return type is an array in which each element is the approximate percentile value for the corresponding p.

For example:

14. Histogram: histogram_numeric

Syntax: histogram_numeric(col, b)

Return value: array

Description: computes a histogram of col using b bins. Each element of the returned array is an (x, y) pair.

For example:

hive> select histogram_numeric(100, 5) from lxw_dual;

[{"x": 100.0, "y": 1.0}]
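One way to picture the (x, y) pairs in this output is as bin centers and bin heights. The Python sketch below builds such a histogram with a simple streaming scheme: every value starts as its own bin, and whenever more than b bins exist the two closest bins are merged. This is an assumed, simplified model for illustration and not Hive's actual implementation; the function here is a hypothetical stand-in that only borrows the name.

```python
def histogram_numeric(values, b):
    """Streaming histogram with at most b bins, as a list of {'x', 'y'} dicts."""
    bins = []  # each bin is [center, height], kept sorted by center
    for v in values:
        if v is None:                    # nulls are skipped
            continue
        bins.append([float(v), 1.0])
        bins.sort(key=lambda item: item[0])
        if len(bins) > b:
            # Find the adjacent pair of bins with the smallest gap.
            i = min(range(len(bins) - 1),
                    key=lambda k: bins[k + 1][0] - bins[k][0])
            (x1, y1), (x2, y2) = bins[i], bins[i + 1]
            # Merge them: height-weighted center, summed height.
            bins[i:i + 2] = [[(x1 * y1 + x2 * y2) / (y1 + y2), y1 + y2]]
    return [{"x": x, "y": y} for x, y in bins]

print(histogram_numeric([100], 5))  # [{'x': 100.0, 'y': 1.0}]
```

With a single input value there is one bin centered on that value with height 1, which has the same shape as the result of the query above.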
