Example Analysis of Hive Command Operation


This article walks through a set of common Hive command operations, with example output for each step. It is meant to be easy to understand and clear; I hope it helps resolve your doubts as you study Hive commands.

1. Prepare the text file and start Hadoop

[root@hadoop0 ~]# cat /opt/test.txt

JieJie

MengMeng

NingNing

JingJing

FengJie

[root@hadoop0 ~]# start-all.sh

Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /opt/hadoop/libexec/../logs/hadoop-root-namenode-hadoop0.out

localhost: starting datanode, logging to /opt/hadoop/libexec/../logs/hadoop-root-datanode-hadoop0.out

localhost: starting secondarynamenode, logging to /opt/hadoop/libexec/../logs/hadoop-root-secondarynamenode-hadoop0.out

starting jobtracker, logging to /opt/hadoop/libexec/../logs/hadoop-root-jobtracker-hadoop0.out

localhost: starting tasktracker, logging to /opt/hadoop/libexec/../logs/hadoop-root-tasktracker-hadoop0.out
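
After start-all.sh returns, you can confirm that all five daemons came up with jps, the standard JDK process lister (this check is not from the original session, and the process IDs below are illustrative):

[root@hadoop0 ~]# jps
2481 NameNode
2603 DataNode
2725 SecondaryNameNode
2810 JobTracker
2934 TaskTracker
3021 Jps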

2. Enter the command line

[root@hadoop0 ~]# hive

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.

Logging initialized using configuration in jar:file:/...

Hive history file=/tmp/root/hive_job_log_root_201509252001_1674268419.txt

3. Query the table created yesterday

hive> select * from stu;

OK

JieJie 26 NULL

MM 24 NULL

Time taken: 17.05 seconds
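
The stu table was created in an earlier session, so its definition does not appear in this article. Judging from the three columns in the output (a name, a number, and a trailing NULL), a hypothetical definition compatible with it might look like the sketch below; the column names and types are guesses, not taken from the original:

hive> -- hypothetical DDL, reconstructed from the output above
hive> create table stu (name STRING, age INT, score INT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';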

4. Show databases

hive> show databases;

OK

default

Time taken: 0.237 seconds

5. Create a database

hive> create database test;

OK

Time taken: 0.259 seconds

hive> show databases;

OK

default

test

Time taken: 0.119 seconds

6. Use the database

hive> use test;

OK

Time taken: 0.03 seconds
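
From here on, statements run against the test database. If you want the CLI prompt to show the current database, Hive provides a setting for that; a sketch, not part of the original session:

hive> set hive.cli.print.current.db=true;
hive (test)>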

7. Create tables with different storage formats

TEXTFILE is the default table format. The data is not compressed, which means high disk overhead and high parsing overhead. It can be combined with Gzip or Bzip2 (Hive detects the compression and decompresses automatically at query time), but compressed text files cannot be split, so Hive cannot process the data in parallel.

SEQUENCEFILE is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible. SequenceFile supports three compression options: NONE, RECORD, and BLOCK. RECORD compression yields a low compression ratio, so BLOCK compression is generally recommended.

RCFILE is a storage format that combines row and column layouts. First, it partitions the data into row groups, which guarantees that all fields of a record reside in the same block, so reading one record never requires reading multiple blocks. Second, within each block the data is stored by column, which benefits compression and fast column access.

hive> create table test1 (str STRING) STORED AS TEXTFILE;

OK

Time taken: 0.598 seconds

-- load data

hive> LOAD DATA LOCAL INPATH '/opt/test.txt' INTO TABLE test1;

Copying data from file:/opt/test.txt

Copying file: file:/opt/test.txt

Loading data to table test.test1

OK

Time taken: 1.657 seconds
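
The LOCAL keyword tells Hive to copy the file from the local filesystem, which is why the log above says "Copying data from file:/opt/test.txt". Without LOCAL, the path is interpreted as an HDFS path and the file is moved, not copied, into the table's directory. A sketch, assuming a file had already been uploaded to a hypothetical HDFS path /tmp/test.txt:

hive> -- moves the HDFS file into the table's warehouse directory
hive> LOAD DATA INPATH '/tmp/test.txt' INTO TABLE test1;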

hive> select * from test1;

OK

JieJie

MengMeng

NingNing

JingJing

FengJie

Time taken: 0.388 seconds
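
Because select * on an unpartitioned table is served by a plain file scan, no MapReduce job is launched and the result comes back quickly. To see how the loaded file is laid out on disk, you can list the table's directory from inside the CLI; the path below assumes the default warehouse location, /user/hive/warehouse:

hive> dfs -ls /user/hive/warehouse/test.db/test1/;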

hive> select count(*) from test1;

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapred.reduce.tasks=<number>

Starting Job = job_201509252000_0001, Tracking URL = http://hadoop0:50030/jobdetails.jsp?jobid=job_201509252000_0001

Kill Command = /opt/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=hadoop0:9001 -kill job_201509252000_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2015-09-25 20:10:09 Stage-1 map = 0%, reduce = 0%

2015-09-25 20:10:19,806 Stage-1 map = 100%, reduce = 0%

2015-09-25 20:10:54,223 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec

MapReduce Total cumulative CPU time: 6 seconds 950 msec

Ended Job = job_201509252000_0001

MapReduce Jobs Launched:

Job 0: Map: 1 Reduce: 1 Cumulative CPU: 6.95 sec HDFS Read: 258 HDFS Write: 2 SUCCESS

Total MapReduce CPU Time Spent: 6 seconds 950 msec

OK

5

Time taken: 77.515 seconds
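
The hints Hive printed before launching the job show how to tune reducer parallelism. As a sketch, forcing a fixed reducer count for the next query looks like this (the value 2 is illustrative only):

hive> set mapred.reduce.tasks=2;
hive> select count(*) from test1;

Resetting mapred.reduce.tasks to -1 lets Hive determine the reducer count at compile time again, as it did above.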

For comparison, omitting the STORED AS clause creates a TEXTFILE table by default, so these two statements are equivalent:

create table test1 (str STRING) STORED AS TEXTFILE;

create table test2 (str STRING);

hive> create table test3 (str STRING) STORED AS SEQUENCEFILE;

OK

Time taken: 0.112 seconds

hive> create table test4 (str STRING) STORED AS RCFILE;

OK

Time taken: 0.502 seconds

8. Import the data from the old table into the new table

hive> INSERT OVERWRITE TABLE test4 SELECT * FROM test1;

9. Set Hive parameters

hive> SET hive.exec.compress.output=true;

hive> SET io.seqfile.compression.type=BLOCK;
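
These two settings affect subsequent writes, not existing files, so to actually produce a BLOCK-compressed SequenceFile they must be set before the insert from step 8 runs against a SEQUENCEFILE table. A sketch combining steps 8 and 9 using the test3 table created earlier:

hive> SET hive.exec.compress.output=true;
hive> SET io.seqfile.compression.type=BLOCK;
hive> INSERT OVERWRITE TABLE test3 SELECT * FROM test1;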

10. Check the Hive parameters

hive> SET;
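
SET with no argument lists the properties that differ from the defaults. Passing just a property name prints its current value, and SET -v additionally includes the underlying Hadoop configuration:

hive> SET io.seqfile.compression.type;
io.seqfile.compression.type=BLOCK
hive> SET -v;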

That is the whole of this example analysis of Hive command operation. I believe you now have a certain understanding of these commands, and I hope the content shared above helps you resolve your doubts.
