1. Overview of the execution process
To view the execution plan of a Hive statement, prefix it with EXPLAIN: explain select … from t_table ... ;
The operator is the minimum execution unit of Hive, and each operator represents an HDFS operation or a MapReduce job. Hive executes its MapReduce programs through ExecMapper and ExecReducer, in either local mode or distributed mode.
Operators of Hive include, for example, TableScanOperator (scans a table), SelectOperator (column projection), FilterOperator (row filtering), JoinOperator, GroupByOperator, ReduceSinkOperator (emits map output into the shuffle), and FileSinkOperator (writes the final result).
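As a concrete illustration, here is a minimal sketch of fetching an EXPLAIN plan through Hive's JDBC driver. The HiveServer2 address (localhost:10000), the credentials, and the table name t_table are assumptions for illustration, not values from this article.

// Sketch: fetch the EXPLAIN plan of a query through Hive's JDBC driver.
// The connection URL, credentials, and table name are assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplainPlan {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // HiveServer2 JDBC driver
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // EXPLAIN returns the plan as rows of text, one line per row.
             ResultSet rs = stmt.executeQuery("EXPLAIN SELECT * FROM t_table")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // operator tree, stage by stage
            }
        }
    }
}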
Responsibilities of the Hive compiler:
Parser: converts HQL statements into an abstract syntax tree (Abstract Syntax Tree, AST)
Semantic Analyzer: converts the abstract syntax tree into query blocks
Logical Plan Generator: converts the query blocks into a logical query plan
Logical Optimizer: rewrites the logical query plan and optimizes the logical execution plan
Physical Plan Generator: converts the logical execution plan into a physical plan
Physical Optimizer: chooses the best join strategy and optimizes the physical execution plan
2. How Hive works
The general steps of the process are:
1. The user submits a query or other task to the Driver.
2. The Driver passes the task to the compiler (Compiler), which produces a plan for it.
3. Based on the task, the compiler goes to the MetaStore to fetch the required Hive metadata.
4. With the metadata, the compiler compiles the task: HiveQL is first converted into an abstract syntax tree, the tree into query blocks, and the query blocks into a logical query plan; the logical plan is then rewritten and optimized, converted into a physical plan (MapReduce), and finally the best strategy is chosen.
5. The final plan is submitted to the Driver.
6. The Driver hands the plan to the ExecutionEngine, which obtains the metadata it needs and submits the job to the JobTracker or ResourceManager for execution; the task reads files directly from HDFS and performs the corresponding operations.
7. The execution results are fetched.
8. The results are returned to the client.
3. Analysis of the specific execution of Hive statements
(1) Join (reduce-side join)
Example: SELECT pv.pageid, u.age FROM page_view pv JOIN user u ON pv.userid = u.userid
Map side: the column in the JOIN ON condition (userid) is taken as the key; the fields needed from each table (pageid from page_view, age from user), together with a tag identifying the source table, form the value. The output is then sorted by key, i.e. by the join field.
Shuffle: key/value pairs are hashed by key, and each pair is pushed to a reducer according to its hash value.
Reduce side: rows are grouped by key; within each group, rows are separated by their table tags and spliced together, as the sketch below shows.
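To make the three stages concrete, here is a minimal Hadoop MapReduce sketch of the reduce-side join above. It is not Hive's generated code: the class names and the tab-separated layouts of page_view (userid, pageid) and user (userid, age) are assumptions for illustration.

// Sketch of a reduce-side join, mimicking the shape of the job Hive
// generates for the query above. Input layouts and class names are assumed.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceSideJoin {

    // Map side: the JOIN ON column (userid) becomes the key; the value is a
    // table tag plus the column the query needs from that table.
    public static class JoinMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String file = ((FileSplit) ctx.getInputSplit()).getPath().getName();
            String[] cols = line.toString().split("\t");
            if (file.startsWith("page_view")) {            // page_view: userid \t pageid
                ctx.write(new Text(cols[0]), new Text("PV\t" + cols[1]));
            } else {                                       // user: userid \t age
                ctx.write(new Text(cols[0]), new Text("U\t" + cols[1]));
            }
        }
    }

    // Reduce side: all rows sharing a userid arrive in one group; separate
    // them by table tag, then splice (pageid, age) pairs together.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text userid, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            List<String> pageids = new ArrayList<>();
            List<String> ages = new ArrayList<>();
            for (Text v : values) {
                String[] parts = v.toString().split("\t", 2);
                if (parts[0].equals("PV")) pageids.add(parts[1]);
                else ages.add(parts[1]);
            }
            for (String pageid : pageids)                  // inner join: cross product
                for (String age : ages)
                    ctx.write(new Text(pageid), new Text(age));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "reduce-side join");
        job.setJarByClass(ReduceSideJoin.class);
        job.setMapperClass(JoinMapper.class);
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0])); // page_view files
        FileInputFormat.addInputPath(job, new Path(args[1])); // user files
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}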
(2) group by
Example: SELECT pageid, age, count(1) FROM pv_users GROUP BY pageid, age
Map side:
Key: pageid and age together form the key; a combiner runs on the map output to pre-aggregate.
Value: the constant 1
Reduce side: sums the values for each key, as sketched below.
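A minimal sketch of this map/combine/reduce shape, assuming tab-separated pv_users input (pageid, age); the class names are illustrative, not Hive's generated code.

// Sketch of the job shape for the GROUP BY above: (pageid, age) is the map
// key, the constant 1 the value, with a combiner pre-aggregating map output.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GroupByCount {

    // Map side: Key = (pageid, age), Value = 1.
    public static class GroupByMapper extends Mapper<Object, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(Object key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] cols = line.toString().split("\t"); // pageid \t age
            ctx.write(new Text(cols[0] + "\t" + cols[1]), ONE);
        }
    }

    // Used as both combiner (map side) and reducer: sums the 1s per group.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) sum += v.get();
            ctx.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "group by count");
        job.setJarByClass(GroupByCount.class);
        job.setMapperClass(GroupByMapper.class);
        job.setCombinerClass(SumReducer.class); // the combiner the text mentions
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}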
(3) distinct
Example: select distinct age from log
Map side:
Key: age
Value: null
Reduce side:
Each group needs only one output: context.write(key, null), as sketched below.
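A minimal sketch of the distinct pattern, assuming the log input holds one age per line; the class names are illustrative.

// Sketch of "select distinct age from log": the mapper emits each age as a
// key with a null value; duplicates collapse into one group in the shuffle,
// and the reducer writes each key exactly once.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DistinctAge {

    // Map side: Key = age, Value = null.
    public static class DistinctMapper extends Mapper<Object, Text, Text, NullWritable> {
        @Override
        protected void map(Object key, Text line, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text(line.toString().trim()), NullWritable.get());
        }
    }

    // Reduce side: one output per group, i.e. context.write(key, null).
    public static class DistinctReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        protected void reduce(Text age, Iterable<NullWritable> values, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(age, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "distinct age");
        job.setJarByClass(DistinctAge.class);
        job.setMapperClass(DistinctMapper.class);
        job.setReducerClass(DistinctReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}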
(4) distinct+count
Example: select count(distinct userid) from weibo_temp
Even if the number of reducers is set to 3, only one is used in the end, because count() is a global aggregation and can only run in a single reduce task.
Map side:
Key: userid
Value: null
Reduce side:
A single reduce task handles all groups: define a global variable for counting, increment it once per group, and output context.write(key, count) in cleanup(Context context), as the sketch below shows.
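A minimal sketch of this single-reducer pattern, assuming weibo_temp holds one userid per line; the class names and the label written in cleanup() are illustrative.

// Sketch of "select count(distinct userid) from weibo_temp": the job is
// forced onto a single reduce task, which counts one per group and emits
// the total in cleanup().
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CountDistinctUserid {

    // Map side: Key = userid, Value = null.
    public static class CdMapper extends Mapper<Object, Text, Text, NullWritable> {
        @Override
        protected void map(Object key, Text line, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text(line.toString().trim()), NullWritable.get());
        }
    }

    // Reduce side: each distinct userid arrives as one group, so a global
    // counter incremented once per group equals count(distinct userid).
    public static class CdReducer extends Reducer<Text, NullWritable, Text, LongWritable> {
        private long count = 0; // global counter, valid because there is one reducer

        @Override
        protected void reduce(Text userid, Iterable<NullWritable> values, Context ctx) {
            count++; // one increment per distinct key
        }

        // Emit the single total after all groups have been processed.
        @Override
        protected void cleanup(Context ctx) throws IOException, InterruptedException {
            ctx.write(new Text("count(distinct userid)"), new LongWritable(count));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "count distinct userid");
        job.setJarByClass(CountDistinctUserid.class);
        job.setMapperClass(CdMapper.class);
        job.setReducerClass(CdReducer.class);
        job.setNumReduceTasks(1); // count() is global: only one reduce task
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}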
Of course, distinct+count easily produces data skew and should be avoided as much as possible; if it cannot be avoided, use this form:
select count(1) from (select distinct userid from weibo_temp) t; (note the subquery alias t, which Hive requires). The inner distinct can run as many parallel reduce tasks, while only the cheap outer count runs on a single reducer, which relieves the excessive pressure on a single node.