The query of the hive table of good programmer big data's learning route 07/01 Update SLTechnology News&Howtos

The query of the hive table of good programmer big data's learning route

2025-07-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/03 Report--

Good programmer big data learning route hive table query

1. join query

1. Small result sets always drive large result sets (small tables drive large tables, small tables are placed on the left table). Try not to use join, but join is difficult to avoid.

left join, left outer join, left semi join

Hive supports left join since version 0.8

Left join and left outer join have similar effects.

The on in the join of hive can only be connected with the equivalent "=", not with

< >

= 90;

5. sort

sort by: sort, partial sort, can only guarantee the result sorting of a single reducer. order by: sort, global sort. Ensure that the results of the entire job are sorted. When there is only one reducer, sort by and order by have the same effect. It is recommended to use sort by usual and: desc asc . (default ascending order)

##Query student average scores sorted in descending order

select

avg(grade) avg_score

from

student join score

on student.id=score.stu_id

group by student.id

order by avg_score desc;

Number of reducers (equal to 1 or 2):

set mapreduce.job.reduces=2

##Order by

select

avg(grade) avg_score

from

student join score

on student.id=score.stu_id

group by student.id

order by avg_score desc;

##Sort by

select

avg(grade) avg_score

from

student join score

on student.id=score.stu_id

group by student.id

sort by avg_score desc;

6.distribute by：

Controls how output from map is reduced. This procedure is used by default for converting the entire hive statement into a job. If it is not written, the hash value in the first column is used by default. When there is only one reducer, it cannot be represented. If distribute by and sort by appear together, pay attention to the order problem? distribute by in front

clusterd by: This is equivalent to distribute by and sort by(ascending order). The following field name needs to be the same clusterd by It has both distribute by and sort by (only in ascending order)

select

from

student

distribute by id

sort by id;

select

from

student

clusterd by id;

7. limit : Limit the result set.

select

id,name

from student

limit 3;

8.union all：

Merge the result sets of two or more queries together without reordering each result set. The number of fields must be consistent, and the field types must be the same as much as possible.

##Merge ID103 using union all

select

id sid,

name snames

from student

where id103;

9.union：

Merge two or more query result sets together, de-duplicate, sort the merged data

##Merge ID103 using union

select

id sid,

name sname

from student

where id103

order by sname;

10. subquery

A subquery is a SELECT statement embedded within another SQL statement. Hive has limited support for subqueries. It only allows subqueries to appear in the FROM clause of a SELECT statement. Hive supports unrelated subqueries that are queried in WHERE clauses using either IN or EXISTS syntax. Hive currently does not support related subqueries, and the execution of related subqueries depends on the data of external queries.

##Unrelated Subqueries

select

id,name

from

student

where id

(select stu_id

from

score);

##Execution of related subqueries depends on data from external queries

select sid,uname

from

(select

id sid,

name uname

from student

) s

order by sid

##Outer queries access the results of subqueries like tables, which is why subqueries must be given an alias (s). Columns in a subquery must have unique names so that outer queries can reference them.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.