Installing and Using Apache Pig

2025-04-06 Update From: SLTechnology News&Howtos shulou


1. Download the software:

wget http://apache.fayea.com/pig/pig-0.15.0/pig-0.15.0.tar.gz

2. Extract the archive and link it under /usr/local:

tar -zxvf pig-0.15.0.tar.gz

mv pig-0.15.0 /usr/local/

ln -s /usr/local/pig-0.15.0 /usr/local/pig

3. Configure environment variables:

export PATH=$HOME/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/pig/bin:$PATH

export PIG_CLASSPATH=/usr/local/hadoop/etc/hadoop

4. Enter the Grunt shell:

Start Pig in local mode. In this mode all files and all execution are local, so it is generally used for testing programs:

[hadoop@host61 ~]$ pig -x local

15-10-03 01:14:09 INFO pig.ExecTypeProvider: Trying ExecType: LOCAL

15-10-03 01:14:09 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType

2015-10-03 01:14:09,756 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35

2015-10-03 01:14:09,758 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1443860049744.log

[main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found

[main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2015-10-03 01:14:12,656 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2015-10-03 01:14:12,885 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///

2015-10-03 01:14:13,573 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum

grunt>

Start Pig in MapReduce mode, the mode used for real workloads:

[hadoop@host63 ~]$ pig

15-10-03 02:11:54 INFO pig.ExecTypeProvider: Trying ExecType: LOCAL

15-10-03 02:11:54 INFO pig.ExecTypeProvider: Trying ExecType: MAPREDUCE

15-10-03 02:11:54 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType

2015-10-03 02:11:55,086 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35

2015-10-03 02:11:55,087 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1443863515062.log

2015-10-03 02:11:55,271 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found

2015-10-03 02:11:59,735 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2015-10-03 02:11:59,740 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2015-10-03 02:11:59,742 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://host61:9000/

2015-10-03 02:12:06,256 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2015-10-03 02:12:06,257 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: host61:9001

2015-10-03 02:12:06,265 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

grunt>

5. Pig can be run in the following three ways:

1. Script (a file of Pig Latin statements)

2. Grunt (the interactive shell)

3. Embedded (Pig invoked from a host program)
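
As a minimal sketch of the script mode listed above, the Python helper below only builds the command line that would run a Pig Latin script file. The script name mytest.pig and the helper names are hypothetical; the -x and -f options are the Pig launcher's execution-type and script-file options. Nothing is executed unless run_pig_script is called on a machine where pig is on the PATH.

```python
import subprocess


def build_pig_command(script_path, local=True):
    """Return the argument list for running a Pig Latin script file."""
    cmd = ["pig"]
    if local:
        # -x local selects local mode; without it Pig picks MapReduce mode
        cmd += ["-x", "local"]
    # -f names the script file to execute
    cmd += ["-f", script_path]
    return cmd


def run_pig_script(script_path, local=True):
    """Run the script and raise CalledProcessError if Pig exits non-zero."""
    subprocess.run(build_pig_command(script_path, local), check=True)


# mytest.pig is a hypothetical script name used only for illustration
command = build_pig_command("mytest.pig")
print(" ".join(command))
```

Keeping command construction separate from execution makes the helper easy to check without a Pig installation.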

6. Start Pig and use the common commands:

[hadoop@host63 ~]$ pig

15-10-03 06:01:01 INFO pig.ExecTypeProvider: Trying ExecType: LOCAL

15-10-03 06:01:01 INFO pig.ExecTypeProvider: Trying ExecType: MAPREDUCE

15-10-03 06:01:01 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType

2015-10-03 06:01:01,412 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35

2015-10-03 06:01:01,413 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1443877261408.log

2015-10-03 06:01:01,502 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found

2015-10-03 06:01:03,657 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2015-10-03 06:01:03,657 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2015-10-03 06:01:03,662 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://host61:9000/

[main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2015-10-03 06:01:05,968 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: host61:9001

[main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

grunt> help

Commands:

<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig

File system commands:

fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html

Diagnostic commands:

describe <alias>[::<alias] ...

grunt> cd /

List the current directory:

grunt> ls

hdfs://host61:9000/in

hdfs://host61:9000/out

hdfs://host61:9000/user

grunt> cd /in

grunt> ls

hdfs://host61:9000/in/jdk-8u60-linux-x64.tar.gz 181238643

hdfs://host61:9000/in/mytest1.txt 23

hdfs://host61:9000/in/mytest2.txt 24

hdfs://host61:9000/in/mytest3.txt 4

View a file's contents:

grunt> cat mytest1.txt

This is the first file

Copy a file from HDFS to the local file system:

grunt> copyToLocal /in/mytest5.txt /home/hadoop/mytest.txt

[hadoop@host63 ~]$ ls -l mytest.txt

-rw-r--r--. 1 hadoop hadoop 102 Oct 3 06:23 mytest.txt

Use sh followed by an operating-system command to run that command from inside Grunt:

grunt> sh ls -l /home/hadoop/mytest.txt

-rw-r--r--. 1 hadoop hadoop 102 Oct 3 06:23 /home/hadoop/mytest.txt

7. Pig's data model:

Bag: a table

Tuple: a row (record)

Field: an attribute

Pig does not require the tuples in one bag to have the same number of fields or the same field types.
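
The data model can be pictured with plain Python containers. This is only an analogy, not how Pig stores data: a list stands in for a bag, a Python tuple for a Pig tuple, and each position in the tuple for a field. The sample values are made up for illustration.

```python
# A bag is modelled as a list of tuples; each tuple is one record and each
# position inside a tuple is one field.
bag = [
    ("etc", 12288),         # two fields: a string and an integer
    ("proc", 0),
    ("lost+found",),        # only one field: still allowed in the same bag
    ("tmp", "4096", True),  # three fields, with different types
]

# The number of fields can differ from tuple to tuple.
field_counts = [len(t) for t in bag]
print(field_counts)
```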

8. Common Pig Latin statements:

LOAD: specify how to load the data

FOREACH: process the data row by row

FILTER: filter rows

DUMP: print the results to the screen

STORE: save the results to a file

9. Sample data processing:

Generate test files:

[hadoop@host63 tmp]$ ls -l / | awk '{if (NR != 1) print $NF "#" $5}' > /tmp/mytest.txt

[hadoop@host63 tmp]$ cat /tmp/mytest.txt

bin#4096

boot#1024

dev#3680

etc#12288

home#4096

lib#4096

lib64#12288

lost+found#16384

media#4096

mnt#4096

opt#4096

proc#0

root#4096

sbin#12288

selinux#0

srv#4096

sys#0

tmp#4096

usr#4096

var#4096

Load the file:

grunt> records = LOAD '/tmp/mytest.txt' USING PigStorage('#') AS (filename:chararray, size:int);

2015-10-03 07:35:48,479 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

[main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

[main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum

[main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2015-10-03 07:35:48,723 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum

2015-10-03 07:35:48,723 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
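
What the LOAD statement above does to each line can be sketched in Python: split on the '#' delimiter and cast the second field to an integer. The function name load_records is hypothetical, and this is an analogy rather than Pig's implementation. One known difference is that this sketch raises ValueError on a size it cannot parse, whereas Pig is understood to produce a null field in that case.

```python
def load_records(lines, delimiter="#"):
    """Parse 'name#size' lines into (filename, size) tuples."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split once so that only the first delimiter separates the fields
        filename, size_text = line.split(delimiter, 1)
        records.append((filename, int(size_text)))
    return records


# Three lines taken from the test file generated earlier
sample = ["bin#4096\n", "boot#1024\n", "lost+found#16384\n"]
records = load_records(sample)
print(records)
```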

Display the file:

grunt> DUMP records;

(bin,4096)

(boot,1024)

(dev,3680)

(etc,12288)

(home,4096)

(lib,4096)

(lib64,12288)

(lost+found,16384)

(media,4096)

(mnt,4096)

(opt,4096)

(proc,0)

(root,4096)

(sbin,12288)

(selinux,0)

(srv,4096)

(sys,0)

(tmp,4096)

(usr,4096)

(var,4096)

Display the schema of records:

grunt> DESCRIBE records;

records: {filename: chararray,size: int}

Filter records:

grunt> filter_records = FILTER records BY size > 4096;

grunt> DUMP filter_records;

(etc,12288)

(lib64,12288)

(lost+found,16384)

(sbin,12288)

grunt> DESCRIBE filter_records;

filter_records: {filename: chararray,size: int}
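
The filter keeps only the rows whose size is strictly greater than 4096, so the many 4096-byte directories are dropped. A Python analogy on a subset of the sample rows, not Pig itself:

```python
# (filename, size) tuples taken from the sample data
records = [
    ("bin", 4096),
    ("etc", 12288),
    ("lib64", 12288),
    ("lost+found", 16384),
    ("proc", 0),
    ("sbin", 12288),
]

# Keep a row only when its size field is strictly greater than 4096
filter_records = [r for r in records if r[1] > 4096]
print(filter_records)
```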

Grouping:

grunt> group_records = GROUP records BY size;

grunt> DUMP group_records;

(0,{(sys,0),(proc,0),(selinux,0)})

(1024,{(boot,1024)})

(3680,{(dev,3680)})

(4096,{(var,4096),(usr,4096),(tmp,4096),(srv,4096),(root,4096),(opt,4096),(mnt,4096),(media,4096),(lib,4096),(home,4096),(bin,4096)})

(12288,{(etc,12288),(lib64,12288),(sbin,12288)})

(16384,{(lost+found,16384)})

grunt> DESCRIBE group_records;

group_records: {group: int,records: {(filename: chararray,size: int)}}

Flatten the groups back into rows:

grunt> format_records = FOREACH group_records GENERATE group, FLATTEN(records);
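
Grouping and flattening can be sketched in Python as follows. This is an analogy under simplifying assumptions: a list stands in for each bag, the helper name group_by_size is hypothetical, and the groups are sorted by key only to make the result predictable. The order of tuples inside a Pig bag is not guaranteed, as the DUMP output above suggests.

```python
from collections import defaultdict


def group_by_size(records):
    """Collect (filename, size) tuples into one bag per distinct size."""
    groups = defaultdict(list)
    for filename, size in records:
        groups[size].append((filename, size))
    return sorted(groups.items())  # list of (group_key, bag) pairs


records = [("boot", 1024), ("etc", 12288), ("proc", 0), ("sbin", 12288), ("sys", 0)]
group_records = group_by_size(records)

# FLATTEN turns each (group, bag) pair back into one row per tuple in the bag
format_records = [
    (group, filename, size)
    for group, bag in group_records
    for filename, size in bag
]
print(group_records)
print(format_records)
```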

Remove duplicates:

grunt> dis_records = DISTINCT records;

Sort:

grunt> ord_records = ORDER dis_records BY size DESC;

Take the first three rows:

grunt> top_records = LIMIT ord_records 3;
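
The three steps above (remove duplicates, sort by size in descending order, keep the first three rows) look like this in a Python analogy. The input contains one deliberately duplicated row; how Pig orders rows with equal sizes is not specified here, so only the sizes of the top rows should be relied on.

```python
records = [
    ("etc", 12288),
    ("bin", 4096),
    ("lost+found", 16384),
    ("etc", 12288),   # duplicate row, removed by the DISTINCT step
    ("sbin", 12288),
    ("proc", 0),
]

# DISTINCT: drop repeated tuples (dict keys keep the first occurrence)
dis_records = list(dict.fromkeys(records))

# ORDER ... BY size DESC: sort on the second field, largest first
ord_records = sorted(dis_records, key=lambda r: r[1], reverse=True)

# LIMIT 3: keep the first three rows
top_records = ord_records[:3]
print(top_records)
```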

Find the maximum value:

grunt> max_records = FOREACH group_records GENERATE group, MAX(records.size);

grunt> DUMP max_records;

(0,0)

(1024,1024)

(3680,3680)

(4096,4096)

(12288,12288)

(16384,16384)
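
Because the rows were grouped by size, every tuple in a bag has the same size as its group key, which is why each maximum in the output equals the key. A Python analogy on hand-written groups (lists stand in for bags):

```python
# (group_key, bag) pairs as produced by grouping the records on size
group_records = [
    (0, [("sys", 0), ("proc", 0)]),
    (4096, [("bin", 4096), ("usr", 4096)]),
    (12288, [("etc", 12288)]),
]

# For each group, take the largest size found in its bag
max_records = [
    (group, max(size for _, size in bag))
    for group, bag in group_records
]
print(max_records)
```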

View the execution plan:

grunt> EXPLAIN max_records;

Save the results:

grunt> STORE group_records INTO '/tmp/mytest_group';

grunt> STORE filter_records INTO '/tmp/mytest_filter';

grunt> STORE max_records INTO '/tmp/mytest_max';

10. UDF

Pig supports writing user-defined functions (UDFs) in Java, Python, and JavaScript.
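
A minimal sketch of what a Python UDF might look like. The function to_kb and the schema string are hypothetical. The outputSchema decorator is expected to be supplied by Pig when it loads the file; the fallback branches are assumptions added only so that the file also runs as ordinary Python for testing.

```python
try:
    outputSchema  # expected to be predefined when Pig's Jython engine loads this file
except NameError:
    try:
        # Assumption: pig_util is importable for CPython (streaming) UDFs
        from pig_util import outputSchema
    except ImportError:
        def outputSchema(schema):
            """Stand-in decorator so this file also runs outside Pig."""
            def wrap(func):
                func.outputSchema = schema
                return func
            return wrap


@outputSchema("size_kb:double")
def to_kb(size):
    """Convert a size in bytes to kilobytes; None (a Pig null) stays None."""
    if size is None:
        return None
    return size / 1024.0


print(to_kb(12288))
```

Saved under a hypothetical name such as size_udf.py, a file like this would typically be made available with a REGISTER statement naming the file and the jython engine, and then called from a FOREACH ... GENERATE expression. Check the Pig UDF documentation for the exact form supported by your version.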
