1. Download the software:
wget http://apache.fayea.com/pig/pig-0.15.0/pig-0.15.0.tar.gz
2. Extract the archive and install it under /usr/local:
tar -zxvf pig-0.15.0.tar.gz
mv pig-0.15.0 /usr/local/
ln -s /usr/local/pig-0.15.0 /usr/local/pig
3. Configure environment variables:
export PATH=$HOME/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/pig/bin:$PATH
export PIG_CLASSPATH=/usr/local/hadoop/etc/hadoop
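To keep these settings across logins, you can append the two export lines to the hadoop user's ~/.bash_profile (a common choice; adjust the file to your environment), reload it, and check that Pig is on the PATH, for example:
source ~/.bash_profile
pig -version
The version banner printed should match the build shown in the startup logs below (Apache Pig version 0.15.0).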
4. Enter the Grunt shell:
Start Pig in local mode: in this mode all files and execution stay on the local machine; it is generally used for testing programs.
[hadoop@host61 ~]$ pig -x local
15-10-03 01:14:09 INFO pig.ExecTypeProvider: Trying ExecType: LOCAL
15-10-03 01:14:09 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2015-10-03 01:14:09 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35
2015-10-03 01:14:09 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1443860049744.log
2015-10-03 01:14:10 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2015-10-03 01:14:12 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-10-03 01:14:12 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-03 01:14:12 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2015-10-03 01:14:13 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
grunt>
Start Pig in MapReduce mode (the normal working mode):
[hadoop@host63 ~]$ pig
15-10-03 02:11:54 INFO pig.ExecTypeProvider: Trying ExecType: LOCAL
15-10-03 02:11:54 INFO pig.ExecTypeProvider: Trying ExecType: MAPREDUCE
15-10-03 02:11:54 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2015-10-03 02:11:55 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35
2015-10-03 02:11:55 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1443863515062.log
2015-10-03 02:11:55 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2015-10-03 02:11:59 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-03 02:11:59 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-10-03 02:11:59 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://host61:9000/
2015-10-03 02:12:06 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-03 02:12:06 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: host61:9001
2015-10-03 02:12:06 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt>
5. Pig can be used in three ways (a sketch of how each is invoked follows this list):
1. Script
2. Grunt shell (interactive)
3. Embedded (calling Pig from a host program)
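A quick sketch of how each mode is invoked (myscript.pig is a hypothetical file name, not part of this article):
pig -x local myscript.pig     # script mode: submit a saved Pig Latin file, here in local mode
pig                           # grunt: the interactive shell used in the sessions above
In embedded mode a host program drives Pig instead of a person typing at the shell, for example Java code built around the org.apache.pig.PigServer class.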
6. Start Pig and try the common Grunt commands:
[hadoop@host63 ~]$ pig
15-10-03 06:01:01 INFO pig.ExecTypeProvider: Trying ExecType: LOCAL
15-10-03 06:01:01 INFO pig.ExecTypeProvider: Trying ExecType: MAPREDUCE
15-10-03 06:01:01 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2015-10-03 06:01:01 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35
2015-10-03 06:01:01 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1443877261408.log
2015-10-03 06:01:01 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2015-10-03 06:01:03 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-03 06:01:03 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-10-03 06:01:03 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://host61:9000/
2015-10-03 06:01:05 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-03 06:01:05 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: host61:9001
2015-10-03 06:01:05 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> help
Commands:
<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig
File system commands:
    fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic commands:
    describe <alias>[::<alias>] - Show the schema for the alias.
    ...
Change to the root directory:
grunt> cd /
List the contents of the current directory:
grunt> ls
hdfs://host61:9000/in
hdfs://host61:9000/out
hdfs://host61:9000/user
grunt> cd /in
grunt> ls
hdfs://host61:9000/in/jdk-8u60-linux-x64.tar.gz 181238643
hdfs://host61:9000/in/mytest1.txt 23
hdfs://host61:9000/in/mytest2.txt 24
hdfs://host61:9000/in/mytest3.txt 4
View the contents of a file:
grunt> cat mytest1.txt
This is the first file
Copy a file from HDFS to the local file system:
grunt> copyToLocal /in/mytest5.txt /home/hadoop/mytest.txt
[hadoop@host63 ~]$ ls -l mytest.txt
-rw-r--r--. 1 hadoop hadoop 102 Oct 3 06:23 mytest.txt
Run operating system commands from inside grunt by prefixing them with sh:
grunt> sh ls -l /home/hadoop/mytest.txt
-rw-r--r--. 1 hadoop hadoop 102 Oct 3 06:23 /home/hadoop/mytest.txt
7. Pig's data model:
Bag: table
Tuple: row (record)
Field: attribute (column value)
Pig does not require that different tuples in the same bag have the same number or types of fields.
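As an illustrative sketch (the values are made up for this example): a tuple is written with parentheses, a bag with braces, and each value inside a tuple is a field, so the bag below is legal even though its two tuples have different numbers of fields:
(bin,4096)                          -- a tuple with two fields
{(bin,4096),(boot,1024,backup)}     -- a bag whose tuples differ in field count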
8. Common Pig Latin statements:
LOAD: specify how to load data
FOREACH: process the data row by row
FILTER: select rows that satisfy a condition
DUMP: display the results on the screen
STORE: save the results to a file
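Put together, these statements form the usual skeleton of a Pig Latin job. A minimal sketch (the aliases and output path are illustrative) looks like the following; section 9 below walks through the same pattern interactively on real data:
records = LOAD '/tmp/mytest.txt' USING PigStorage('#') AS (filename:chararray, size:int);
big     = FILTER records BY size > 4096;
names   = FOREACH big GENERATE filename;
DUMP names;                            -- print to the screen
STORE names INTO '/tmp/mytest_names';  -- or save to a file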
9. Sample data processing:
Generate a test file:
[hadoop@host63 tmp]$ ls -l / | awk '{if (NR != 1) print $NF "#" $5}' > /tmp/mytest.txt
[hadoop@host63 tmp]$ cat /tmp/mytest.txt
bin#4096
boot#1024
dev#3680
etc#12288
home#4096
lib#4096
lib64#12288
lost+found#16384
media#4096
mnt#4096
opt#4096
proc#0
root#4096
sbin#12288
selinux#0
srv#4096
sys#0
tmp#4096
usr#4096
var#4096
Load the file:
grunt> records = LOAD '/tmp/mytest.txt' USING PigStorage('#') AS (filename:chararray, size:int);
2015-10-03 07:35:48 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-10-03 07:35:48 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-03 07:35:48 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-10-03 07:35:48 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-03 07:35:48 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-10-03 07:35:48 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
Display the data:
grunt> DUMP records;
(bin,4096)
(boot,1024)
(dev,3680)
(etc,12288)
(home,4096)
(lib,4096)
(lib64,12288)
(lost+found,16384)
(media,4096)
(mnt,4096)
(opt,4096)
(proc,0)
(root,4096)
(sbin,12288)
(selinux,0)
(srv,4096)
(sys,0)
(tmp,4096)
(usr,4096)
(var,4096)
Display the schema of records:
grunt> DESCRIBE records;
records: {filename: chararray,size: int}
Filter records:
grunt> filter_records = FILTER records BY size > 4096;
grunt> DUMP filter_records;
(etc,12288)
(lib64,12288)
(lost+found,16384)
(sbin,12288)
grunt> DESCRIBE filter_records;
filter_records: {filename: chararray,size: int}
Grouping:
grunt> group_records = GROUP records BY size;
grunt> DUMP group_records;
(0, {(sys,0), (proc,0), (selinux,0)})
(1024, {(boot,1024)})
(3680, {(dev,3680)})
(4096, {(var,4096), (usr,4096), (tmp,4096), (srv,4096), (root,4096), (opt,4096), (mnt,4096), (media,4096), (lib,4096), (home,4096), (bin,4096)})
(12288, {(etc,12288), (lib64,12288), (sbin,12288)})
(16384, {(lost+found,16384)})
grunt> DESCRIBE group_records;
group_records: {group: int,records: {(filename: chararray,size: int)}}
Flatten the groups back into rows:
grunt> format_records = FOREACH group_records GENERATE group, FLATTEN(records);
Remove duplicates:
grunt> dis_records = DISTINCT records;
Sort:
grunt> ord_records = ORDER dis_records BY size DESC;
Take the first three rows:
grunt> top_records = LIMIT ord_records 3;
Find the maximum size in each group:
grunt> max_records = FOREACH group_records GENERATE group, MAX(records.size);
grunt> DUMP max_records;
(0,0)
(1024,1024)
(3680,3680)
(4096,4096)
(12288,12288)
(16384,16384)
View the execution plan:
grunt> EXPLAIN max_records;
Save the result sets:
grunt> STORE group_records INTO '/tmp/mytest_group';
grunt> STORE filter_records INTO '/tmp/mytest_filter';
grunt> STORE max_records INTO '/tmp/mytest_max';
10. UDFs
Pig supports writing UDFs (user-defined functions) in Java, Python, and JavaScript.
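As a hedged sketch of how a UDF is wired in from Pig Latin (the jar path and class name below are hypothetical, not taken from this article), the jar is registered, an alias is defined, and the function is then called like a built-in:
REGISTER /home/hadoop/myudfs.jar;           -- hypothetical jar containing the compiled UDF
DEFINE UPPER com.example.pig.Upper();       -- hypothetical Java class extending EvalFunc
upper_names = FOREACH records GENERATE UPPER(filename);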