2025-02-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Hands-on practice: working with the Sogou log file
The data used in this section comes from Sogou Labs, whose website is:
http://www.sogou.com/labs/dl/q.html
Several versions of the data set are available for download; choose one that fits the memory of your Spark machine. So that every reader can follow along, this section uses the mini version, a tar.gz file of about 384 KB. Download address: www.sogou.com/labs/resource/q.php
A sample record from the log:
00:00:00 2982199073774412 [360 Security Guard] 8 3 download.it.com.cn/softweb/software/firewall/antivirus/20067/17938.html
Each line consists of six tab-separated fields:
Access time\tUser ID\tQuery term\tRank of the URL in the returned results\tSequence number of the user's click\tURL the user clicked
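To make the field layout concrete, here is a minimal Python sketch that splits one log line into the six fields described above. The class name `SogouRecord`, the field names, and the function `parse_line` are illustrative assumptions, not part of the Sogou data set or of Hive. The sample record is written with explicit tab characters, on the assumption that the original file separates its fields with tabs.

```python
from typing import NamedTuple


class SogouRecord(NamedTuple):
    """One line of the Sogou query log (field names are illustrative)."""
    access_time: str
    user_id: str
    query: str
    rank: int         # rank of the URL in the returned results
    click_order: int  # sequence number of the user's click
    url: str


def parse_line(line: str) -> SogouRecord:
    """Split one tab-separated log line into its six fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    access_time, user_id, query, rank, click_order, url = fields
    return SogouRecord(access_time, user_id, query, int(rank), int(click_order), url)


# The sample record shown above, written with explicit tab separators.
sample = (
    "00:00:00\t2982199073774412\t[360 Security Guard]\t8\t3\t"
    "download.it.com.cn/softweb/software/firewall/antivirus/20067/17938.html"
)
record = parse_line(sample)
print(record.query, record.rank, record.click_order)
```

The two numeric fields are converted to `int` here, matching the `int` columns of the Hive table created in the next step.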
Create the table in Hive:
create table sogou(sTime string,userId string,Keyword string,Paiming int,Dianji int,URL string) row format delimited
fields terminated by '\t' lines terminated by '\n' stored as textfile;
The result is as follows:
hive> create table sogou(sTime string,userId string,Keyword string,Paiming int,Dianji int,URL string) row format delimited
> fields terminated by '\t' lines terminated by '\n' stored as textfile;
OK
Time taken: 4.422 seconds
Import the data:
load data local inpath '/home/dyq/Documents/SogouQ.sample' overwrite into table sogou;
The output is:
hive> load data local inpath '/home/dyq/Documents/SogouQ.sample' overwrite into table sogou;
Loading data to table default.sogou
OK
Time taken: 3.567 seconds
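Loading a file this way only places it in the table's storage location; Hive does not check the lines against the table schema at load time, and a value that cannot be read as int in the Paiming or Dianji column typically appears as NULL in later queries. A rough pre-load check, sketched below in Python, can flag such lines in advance. The function `find_malformed` and the demo lines are hypothetical, and the integer pattern is only an approximation of what Hive accepts.

```python
import re

# Approximation of an integer literal; Hive's own parsing rules may differ.
_INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def find_malformed(lines, expected_fields=6, int_columns=(3, 4)):
    """Return (line_number, reason) pairs for lines that would not fit the table."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != expected_fields:
            problems.append((number, f"{len(fields)} fields instead of {expected_fields}"))
            continue
        for index in int_columns:
            if not _INT_PATTERN.fullmatch(fields[index]):
                problems.append((number, f"column {index} is not an integer: {fields[index]!r}"))
    return problems


# Made-up demo lines, not taken from the Sogou data.
demo_lines = [
    "00:00:00\tu1\t[q1]\t8\t3\texample.com/a\n",  # well-formed
    "00:00:01\tu2\t[q2]\tx\t1\texample.com/b\n",  # rank is not a number
    "00:00:02\tu3\t[q3]\t1\n",                    # too few fields
]
for problem in find_malformed(demo_lines):
    print(problem)
```

To check a real file, pass an open file object instead of the demo list, for example `find_malformed(open(path, encoding=...))`, with an encoding that matches the downloaded data.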
Check the table definition:
hive> desc sogou;
OK
stime string
userid string
keyword string
paiming int
dianji int
url string
Time taken: 2.515 seconds, Fetched: 6 row(s)
Verify the imported data by counting the records:
select count(*) from sogou;
As the output below shows, the query is slow and inefficient: counting 10000 records takes about 109 seconds, because Hive runs the count as a MapReduce job.
hive> select count(*) from sogou;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = dyq_20160828114648_be63db30-cce4-47f0-aa1d-2e0f626c274c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1472347049416_0004, Tracking URL = http://ubuntu:8088/proxy/application_1472347049416_0004/
Kill Command = /opt/hadoop-2.6.2/bin/hadoop job -kill job_1472347049416_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-08-28 11:47:28,067 Stage-1 map = 0%, reduce = 0%
2016-08-28 11:48:09,628 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.1 sec
2016-08-28 11:48:35,383 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 10.68 sec
MapReduce Total cumulative CPU time: 10 seconds 680 msec
Ended Job = job_1472347049416_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 10.68 sec HDFS Read: 906303 HDFS Write: 105 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 680 msec
OK
10000
Time taken: 109.492 seconds, Fetched: 1 row(s)
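The count of 10000 can be cross-checked outside Hive, since a text-format table with lines terminated by '\n' stores one row per line. The sketch below counts the lines of a file in Python. The function `count_rows` is a hypothetical helper, and the demonstration runs on a small temporary file rather than on the real SogouQ.sample.

```python
import os
import tempfile


def count_rows(path: str) -> int:
    """Count newline-delimited rows in a text file, one row per line."""
    # Bytes mode avoids any assumption about the text encoding of the log.
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


# Demonstration on a small temporary file standing in for the log sample.
with tempfile.NamedTemporaryFile("wb", suffix=".sample", delete=False) as tmp:
    tmp.write(b"00:00:00\tu1\t[q1]\t1\t1\texample.com/a\n")
    tmp.write(b"00:00:01\tu2\t[q2]\t2\t1\texample.com/b\n")
demo_path = tmp.name

demo_count = count_rows(demo_path)
os.remove(demo_path)
print(demo_count)
```

Run against the file that was loaded into the table, the result should match the number returned by the Hive query.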
Export data to a local directory:
insert overwrite local directory '/home/dyq/Documents/SogouQdept' select a.Keyword,a.URL from sogou a;
Only two fields (Keyword and URL) are exported here; the output is as follows:
hive> insert overwrite local directory '/home/dyq/Documents/SogouQdept' select a.Keyword,a.URL from sogou a;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = dyq_20160828115140_2bc274b4-7845-46c2-9ac7-e45fcc5b7b23
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1472347049416_0005, Tracking URL = http://ubuntu:8088/proxy/application_1472347049416_0005/
Kill Command = /opt/hadoop-2.6.2/bin/hadoop job -kill job_1472347049416_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-08-28 11:51:54,594 Stage-1 map = 0%, reduce = 0%
2016-08-28 11:52:06,398 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
MapReduce Total cumulative CPU time: 3 seconds 140 msec
Ended Job = job_1472347049416_0005
Moving data to local directory /home/dyq/Documents/SogouQdept
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 3.14 sec HDFS Read: 901894 HDFS Write: 178503 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 140 msec
OK
Time taken: 28.859 seconds
View the contents of the SogouQdept directory:
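When no row format is given in the export statement, Hive writes plain text files into the target directory with the fields separated by the control character Ctrl-A ('\x01'), so the columns can look run together in an ordinary text viewer. The Python sketch below reads such a directory back into (Keyword, URL) tuples. The function `read_export`, the file name `000000_0` used in the demonstration, and the default encoding are assumptions; pick the encoding that matches the downloaded data.

```python
import os
import tempfile

HIVE_DEFAULT_DELIMITER = "\x01"  # Ctrl-A, Hive's default field separator for text output


def read_export(directory, encoding="utf-8"):
    """Read every data file in a Hive export directory into a list of tuples."""
    rows = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.startswith(".") or not os.path.isfile(path):
            continue  # skip hidden files (for example checksum files) and subdirectories
        # newline="\n" makes Python split lines on '\n' only.
        with open(path, encoding=encoding, newline="\n") as handle:
            for line in handle:
                rows.append(tuple(line.rstrip("\n").split(HIVE_DEFAULT_DELIMITER)))
    return rows


# Demonstration with a temporary directory standing in for SogouQdept.
with tempfile.TemporaryDirectory() as export_dir:
    with open(os.path.join(export_dir, "000000_0"), "w", encoding="utf-8", newline="\n") as out:
        out.write("[360 Security Guard]\x01download.it.com.cn/softweb/software/firewall/antivirus/20067/17938.html\n")
    exported = read_export(export_dir)

for keyword, url in exported:
    print(keyword, url)
```

For the export above, the call would be `read_export('/home/dyq/Documents/SogouQdept', encoding=...)`.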