Hands-on Hive on Hadoop: Working with the Sogou Log

Updated 2025-02-26. Source: SLTechnology News&Howtos, Shulou (Shulou.com), 06/03.

Hands-on: working with the Sogou log file

The data used in this section comes from Sogou Lab. The website is:

http://www.sogou.com/labs/dl/q.html

Several versions of the data set are available, so you can download the one that fits the memory of your Spark machine. So that every learner can follow along, we use the mini version here, a tar.gz file of 384K. Download address: www.sogou.com/labs/resource/q.php

A sample record from the log:

00:00:00 2982199073774412 [360 Security Guard] 8 3 download.it.com.cn/softweb/software/firewall/antivirus/20067/17938.html

Each line of the file consists of six fields separated by tabs (\t):

Access time\tUser ID\tQuery term\tRank of URL in returned results\tOrder number of user click\tURL of user click
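As a quick illustration of this layout, the following minimal Python sketch splits one record into its six fields. The names SogouRecord and parse_line are hypothetical helpers introduced only for this example, and the sketch assumes the fields really are separated by single tab characters, as described above.

```python
from typing import NamedTuple


class SogouRecord(NamedTuple):
    """One log line; field names are illustrative, not from the original."""
    access_time: str
    user_id: str
    keyword: str
    rank: int          # rank of the URL in the returned results
    click_order: int   # order number of the user's click
    url: str


def parse_line(line: str) -> SogouRecord:
    """Split a tab-separated log line into six typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    access_time, user_id, keyword, rank, click_order, url = fields
    return SogouRecord(access_time, user_id, keyword,
                       int(rank), int(click_order), url)


# The sample record shown above, written with explicit tab separators.
sample = ("00:00:00\t2982199073774412\t[360 Security Guard]\t8\t3\t"
          "download.it.com.cn/softweb/software/firewall/antivirus/20067/17938.html\n")

record = parse_line(sample)
print(record.keyword, record.rank, record.click_order)
```

The two numeric fields are converted to int here for the same reason the Hive table below declares them as int columns.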

Create the table in Hive. The column names Paiming and Dianji are pinyin for "rank" and "click":

create table sogou(sTime string,userId string,Keyword string,Paiming int,Dianji int,URL string) row format delimited
fields terminated by '\t' lines terminated by '\n' stored as textfile;

The result is as follows:

hive> create table sogou(sTime string,userId string,Keyword string,Paiming int,Dianji int,URL string) row format delimited

> fields terminated by '\t' lines terminated by '\n' stored as textfile;

OK

Time taken: 4.422 seconds

Import data:

load data local inpath '/home/dyq/Documents/SogouQ.sample' overwrite into table sogou;

The output is as follows:

hive> load data local inpath '/home/dyq/Documents/SogouQ.sample' overwrite into table sogou;

Loading data to table default.sogou

OK

Time taken: 3.567 seconds

Check the table definition:

hive> desc sogou;

OK

stime string

userid string

keyword string

paiming int

dianji int

url string

Time taken: 2.515 seconds, Fetched: 6 row(s)

Check the imported data by counting the records:

select count(*) from sogou;

As the output shows, the query is slow: Hive launches a full MapReduce job, and counting 10,000 records takes about 109 seconds.

hive> select count(*) from sogou;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = dyq_20160828114648_be63db30-cce4-47f0-aa1d-2e0f626c274c

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1472347049416_0004, Tracking URL = http://ubuntu:8088/proxy/application_1472347049416_0004/

Kill Command = /opt/hadoop-2.6.2/bin/hadoop job -kill job_1472347049416_0004

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2016-08-28 11:47:28,067 Stage-1 map = 0%, reduce = 0%

2016-08-28 11:48:09,628 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.1 sec

2016-08-28 11:48:35,383 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 10.68 sec

MapReduce Total cumulative CPU time: 10 seconds 680 msec

Ended Job = job_1472347049416_0004

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 10.68 sec HDFS Read: 906303 HDFS Write: 105 SUCCESS

Total MapReduce CPU Time Spent: 10 seconds 680 msec

OK

10000

Time taken: 109.492 seconds, Fetched: 1 row(s)
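Because the MapReduce job takes this long for a small file, a local line count can serve as a quick cross-check of the result. The sketch below is only an assumption about how one might do that outside Hive: count_records is a hypothetical helper, and the demo runs on a small temporary file rather than on the real SogouQ.sample.

```python
import os
import tempfile


def count_records(path: str) -> int:
    """Count lines in a text file, one line per record.

    The file is opened in binary mode so the count does not depend
    on the character encoding of the query terms.
    """
    with open(path, "rb") as f:
        return sum(1 for _ in f)


# Demo on a throwaway file with three made-up records.
with tempfile.NamedTemporaryFile("wb", suffix=".sample", delete=False) as tmp:
    tmp.write(b"00:00:00\t1\t[a]\t1\t1\texample.com/a\n")
    tmp.write(b"00:00:01\t2\t[b]\t2\t1\texample.com/b\n")
    tmp.write(b"00:00:02\t3\t[c]\t3\t2\texample.com/c\n")
    demo_path = tmp.name

print(count_records(demo_path))
os.remove(demo_path)
```

Pointing count_records at the file that was loaded into the table should give the same number that select count(*) reports, since every line of a text-format table becomes one row.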

Export data to a local directory:

insert overwrite local directory '/home/dyq/Documents/SogouQdept' select a.Keyword,a.URL from sogou a;

Only two fields are exported here. The output is as follows:

hive> insert overwrite local directory '/home/dyq/Documents/SogouQdept' select a.Keyword,a.URL from sogou a;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = dyq_20160828115140_2bc274b4-7845-46c2-9ac7-e45fcc5b7b23

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1472347049416_0005, Tracking URL = http://ubuntu:8088/proxy/application_1472347049416_0005/

Kill Command = /opt/hadoop-2.6.2/bin/hadoop job -kill job_1472347049416_0005

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2016-08-28 11:51:54,594 Stage-1 map = 0%, reduce = 0%

2016-08-28 11:52:06,398 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec

MapReduce Total cumulative CPU time: 3 seconds 140 msec

Ended Job = job_1472347049416_0005

Moving data to local directory /home/dyq/Documents/SogouQdept

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 3.14 sec HDFS Read: 901894 HDFS Write: 178503 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 140 msec

OK

Time taken: 28.859 seconds

The exported files can then be viewed in the SogouQdept directory.
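When no row format is given in the insert overwrite statement, Hive separates exported fields with the Ctrl-A character (\x01) and writes them to files with names such as 000000_0 inside the target directory. The following Python sketch shows one way to read such an export back into (keyword, url) pairs. The helper read_export is hypothetical, UTF-8 is an assumed encoding, and the demo uses a temporary directory in place of /home/dyq/Documents/SogouQdept.

```python
import os
import tempfile

# Default field delimiter Hive uses when no row format is specified.
HIVE_DEFAULT_DELIMITER = "\x01"


def read_export(directory: str, encoding: str = "utf-8"):
    """Read every data file in a Hive export directory into a list of tuples."""
    rows = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip hidden checksum files (e.g. .000000_0.crc) and subdirectories.
        if name.startswith(".") or not os.path.isfile(path):
            continue
        with open(path, encoding=encoding, errors="replace") as f:
            for line in f:
                rows.append(tuple(line.rstrip("\n").split(HIVE_DEFAULT_DELIMITER)))
    return rows


# Demo: simulate an export directory holding one two-field record.
with tempfile.TemporaryDirectory() as export_dir:
    with open(os.path.join(export_dir, "000000_0"), "w", encoding="utf-8") as f:
        f.write("[360 Security Guard]\x01"
                "download.it.com.cn/softweb/software/firewall/antivirus/20067/17938.html\n")
    for keyword, url in read_export(export_dir):
        print(keyword, url)
```

If the export contains query terms in another encoding, pass that encoding to read_export instead of relying on the UTF-8 default.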
