
Pig installation instruction

2025-02-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Introduction to Pig:

Pig is an Apache Software Foundation project (it began as a sub-project of Hadoop) that provides a platform for analyzing large data sets. Its outstanding feature is that the structure of Pig programs lends itself to substantial parallelization, which is what enables Pig to handle very large data sets.

Pig features:

Pig simplifies the development of MapReduce tasks

Pig can be regarded as the client software of Hadoop, and it can connect to the Hadoop cluster for data analysis.

Pig is convenient for users who are not familiar with Java. PigLatin, a relatively simple data flow-oriented language similar to SQL, is used for data processing.

PigLatin supports sorting, filtering, summation, grouping, joins and other common operations, and it can be extended with user-defined functions. It is a lightweight scripting language for data analysis and processing.

Pig can be thought of as a translator from PigLatin to MapReduce: it compiles PigLatin scripts into MapReduce jobs.

When Pig runs in MapReduce mode, it needs access to a Hadoop cluster and an HDFS installation. Pig submits the jobs it generates to the cluster, so cluster resources are allocated and released automatically rather than managed by hand.

Pig data model: relation, bag, tuple, field (column)

Relation: a bag of tuples (an outer bag); the named relations in a script play the role that tables play in a relational database

Bag: a collection of tuples, similar to a table in a relational database

Tuple: an ordered set of fields, equivalent to a row in a relational database; unlike in a relational database, the tuples in a bag are not required to have the same structure

Field: a single piece of data, similar to a column in a relational database, except that a field can hold nested data such as a bag (a table), whereas a column in a relational database cannot contain a table
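The nesting rules can be illustrated with a small plain-Python sketch. It is only a model for intuition, assuming a bag is represented as a Python list of tuples; it is not how Pig stores data internally, and the sample values are illustrative. The second tuple has fewer fields than the first, and the last tuple carries a nested bag in its third field.

```python
# Model for intuition only: a bag is a list, a tuple is a Python tuple.
relation = [
    (7369, "SMITH", 800),
    (7499, "ALLEN"),                       # fewer fields: no fixed schema per tuple
    (10, "ACCOUNTING", [(7782, "CLARK"),
                        (7839, "KING")]),  # third field holds a nested bag
]


def field_counts(bag):
    """Return how many fields each tuple in the bag has."""
    return [len(t) for t in bag]


nested_bag = relation[2][2]

print(field_counts(relation))  # [3, 2, 3]
print(nested_bag)              # [(7782, 'CLARK'), (7839, 'KING')]
```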

Installation and configuration of Pig:

Installation: extract the installation package and add environment variables

tar -zxvf pig-0.17.0.tar.gz -C ~/app

vim ~/.bash_profile

PIG_HOME=$HOME/app/pig-0.17.0

export PIG_HOME

PATH=$PIG_HOME/bin:$PATH

export PATH

Reload the profile so the variables take effect: source ~/.bash_profile

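Pig has two execution modes. Local mode works on files in the local Linux file system.

Start command: pig -x local

Cluster (MapReduce) mode connects to HDFS and the Hadoop cluster. Point PIG_CLASSPATH at the Hadoop configuration directory (adjust the path to your Hadoop installation):

PIG_CLASSPATH=/app/hadoop-2.7.3/etc/hadoop

export PIG_CLASSPATH

Start command: pig (MapReduce mode is the default; pig -x mapreduce is equivalent)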

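Commands available in the Pig (Grunt) shell:

sh followed by a Linux command runs that command directly on the local Linux file system.

ls, cd, cat, mkdir, pwd: operate on HDFS

copyFromLocal: copy files from the Linux file system to HDFS

copyToLocal: copy files from HDFS to the Linux file system

register, define: use custom functions (UDFs) in Pig; register adds a JAR containing the functions, and define gives a function an alias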

PigLatin statements:

-> The Hadoop JobHistory Server must be running:

mr-jobhistory-daemon.sh start historyserver

Address: http://192.168.10.100:19888/jobhistory

-> Commonly used PigLatin statements

load: loads data into a bag (table)

foreach: works like a loop, iterating over every tuple in a bag

filter: equivalent to WHERE in SQL

group ... by: grouping

join: joins relations

generate: extracts columns (used together with foreach)

union: set operation (PigLatin has no intersect operator)

Output: dump prints directly to the screen

store: writes output to HDFS

For example, one row of emp.csv: 7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30

Load the employee and department data into bags (tables):

emp = load '/input/table/emp.csv' using PigStorage(',') as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:int, comm:int, deptno:int);

dept = load '/scott/dept.csv' using PigStorage(',') as (deptno:int, dname:chararray, loc:chararray);
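Roughly, `load ... using PigStorage(',') as (...)` splits each line on the delimiter and casts each field to its declared type. The Python sketch below models that step under simplifying assumptions: `EMP_SCHEMA`, `cast` and `load_line` are hypothetical helpers, `None` stands in for Pig's null, and a field that is empty, missing or fails its cast becomes `None`. It does not reproduce every PigStorage behavior.

```python
EMP_SCHEMA = [
    ("empno", int), ("ename", str), ("job", str), ("mgr", int),
    ("hiredate", str), ("sal", int), ("comm", int), ("deptno", int),
]


def cast(value, typ):
    """Convert one raw field; None stands in for Pig's null."""
    if value == "":
        return None
    try:
        return typ(value)
    except ValueError:
        return None


def load_line(line, schema, delimiter=","):
    """Split a line on the delimiter and apply the declared field types."""
    parts = line.rstrip("\n").split(delimiter)
    # Assumption: missing trailing fields are padded so they become None.
    parts += [""] * (len(schema) - len(parts))
    return {name: cast(value, typ) for (name, typ), value in zip(schema, parts)}


row = load_line("7839,KING,PRESIDENT,,1981/11/17,5000,,10", EMP_SCHEMA)
print(row["ename"], row["mgr"], row["sal"])  # KING None 5000
```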

View the table structure: describe emp;

Query employee information: employee number, name, salary

SQL statement: select empno, ename, sal from emp

PigLatin statement: emp1 = foreach emp generate empno, ename, sal;

Output to the screen: dump emp1;
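In plain-Python terms, `foreach ... generate` applies the same projection to every tuple of a bag. The sketch assumes tuples are modeled as dicts keyed by field name; `foreach_generate` is a hypothetical helper and the rows are sample data.

```python
emp = [
    {"empno": 7369, "ename": "SMITH", "sal": 800, "deptno": 20},
    {"empno": 7499, "ename": "ALLEN", "sal": 1600, "deptno": 30},
]


def foreach_generate(bag, *fields):
    """Emit one output tuple per input tuple, keeping only the named fields."""
    return [tuple(row[f] for f in fields) for row in bag]


emp1 = foreach_generate(emp, "empno", "ename", "sal")
print(emp1)  # [(7369, 'SMITH', 800), (7499, 'ALLEN', 1600)]
```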

Query employee information, sorted by monthly salary:

SQL statement: select * from emp order by sal

PigLatin statement: emp_sorted = order emp by sal;
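`order emp by sal` sorts the whole relation on `sal`, ascending by default (`order emp by sal desc` sorts descending). A minimal Python analogue, assuming dict-shaped rows with sample values:

```python
emp = [
    {"ename": "KING", "sal": 5000},
    {"ename": "SMITH", "sal": 800},
    {"ename": "ALLEN", "sal": 1600},
]

# Ascending by sal, like `order emp by sal`.
emp_sorted = sorted(emp, key=lambda row: row["sal"])
# Descending by sal, like `order emp by sal desc`.
emp_sorted_desc = sorted(emp, key=lambda row: row["sal"], reverse=True)

print([row["ename"] for row in emp_sorted])       # ['SMITH', 'ALLEN', 'KING']
print([row["ename"] for row in emp_sorted_desc])  # ['KING', 'ALLEN', 'SMITH']
```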

Grouping: find the maximum salary in each department

SQL statement: select deptno, max(sal) from emp group by deptno

PigLatin statement: this takes two steps

1. Group the employees by department

emp_group = group emp by deptno;

2. Find the maximum salary in each department

max_sal = foreach emp_group generate group, MAX(emp.sal);
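The two steps can be modeled in Python to show the shape of the intermediate data: `group emp by deptno` produces one tuple per department holding the key (named `group`) and a bag of that department's employees, and `MAX(emp.sal)` then reduces each bag. `group_by` is a hypothetical helper and the rows are sample data; the output order here simply follows first appearance, whereas Pig does not guarantee any order after grouping.

```python
from collections import defaultdict

emp = [
    {"ename": "CLARK", "sal": 2450, "deptno": 10},
    {"ename": "KING", "sal": 5000, "deptno": 10},
    {"ename": "SMITH", "sal": 800, "deptno": 20},
    {"ename": "FORD", "sal": 3000, "deptno": 20},
]


def group_by(bag, key):
    """Like `group emp by deptno`: one (group, bag) pair per distinct key."""
    groups = defaultdict(list)
    for row in bag:
        groups[row[key]].append(row)
    return list(groups.items())


emp_group = group_by(emp, "deptno")

# Like `foreach emp_group generate group, MAX(emp.sal)`.
max_sal = [(group, max(row["sal"] for row in rows)) for group, rows in emp_group]
print(max_sal)  # [(10, 5000), (20, 3000)]
```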

Query the employees of department 10:

SQL statement: select * from emp where deptno = 10

PigLatin statement: deptno_10 = filter emp by deptno == 10;
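`filter` keeps only the tuples for which the condition is true. A Python analogue, assuming dict rows with sample data (the employee `TEMP` is hypothetical); `None` plays the role of a null `deptno`, and that row is dropped because the comparison is not true. In Pig, a condition that evaluates to null likewise does not keep the tuple.

```python
emp = [
    {"ename": "CLARK", "deptno": 10},
    {"ename": "SMITH", "deptno": 20},
    {"ename": "KING", "deptno": 10},
    {"ename": "TEMP", "deptno": None},  # sample row with a null deptno
]

# Like `filter emp by deptno == 10`: keep a row only when the condition is true.
deptno_10 = [row for row in emp if row["deptno"] == 10]
print([row["ename"] for row in deptno_10])  # ['CLARK', 'KING']
```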

Multi-table query: employee name and department name

SQL statement: select e.ename, d.dname from emp e, dept d where e.deptno = d.deptno

PigLatin statement: this also takes two steps

1. Join the two tables on deptno into one relation

newtable = join dept by deptno, emp by deptno;

2. Iterate over the joined relation and extract the department name from the department table and the employee name from the employee table (after a join, fields are prefixed with their source relation, e.g. dept::dname, to keep them apart)

emp_dname = foreach newtable generate dept::dname, emp::ename;
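The effect of `join dept by deptno, emp by deptno` can be sketched in Python as an inner equi-join whose output fields are prefixed with the relation name, mirroring `dept::dname` and `emp::ename`. This sketch uses an in-memory hash join purely for illustration; it says nothing about how Pig executes a join on a cluster. `inner_join` is a hypothetical helper and the rows are sample data.

```python
dept = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
    {"deptno": 40, "dname": "OPERATIONS"},
]
emp = [
    {"ename": "CLARK", "deptno": 10},
    {"ename": "SMITH", "deptno": 20},
    {"ename": "FORD", "deptno": 20},
]


def inner_join(left_name, left, right_name, right, key):
    """Inner equi-join on `key`; output fields are prefixed `name::field`."""
    # Index the right-hand relation by join key.
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    joined = []
    for left_row in left:
        # Left rows without a match (deptno 40 here) produce no output.
        for right_row in index.get(left_row[key], []):
            row = {f"{left_name}::{k}": v for k, v in left_row.items()}
            row.update({f"{right_name}::{k}": v for k, v in right_row.items()})
            joined.append(row)
    return joined


newtable = inner_join("dept", dept, "emp", emp, "deptno")
emp_dname = [(row["dept::dname"], row["emp::ename"]) for row in newtable]
print(emp_dname)
# [('ACCOUNTING', 'CLARK'), ('RESEARCH', 'SMITH'), ('RESEARCH', 'FORD')]
```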

Set operations: in a relational database such as Oracle, every query taking part in a set operation must return the same number of columns with compatible types. Pig's union does not enforce this: the relations being merged need not share a schema.

Query the employees of departments 10 and 20:

SQL statement: select * from emp where deptno = 10

union select * from emp where deptno = 20

PigLatin statement: emp10 = filter emp by deptno == 10;

emp20 = filter emp by deptno == 20;

emp10_20 = union emp10, emp20;
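Pig's `union` simply merges the tuples of its inputs: it does not remove duplicates (it behaves like SQL `UNION ALL`; apply `distinct` afterwards if duplicates must go), does not check that the inputs share a schema, and does not preserve tuple order. A Python sketch of that behavior, with tuples as sample rows and `union` as a hypothetical helper:

```python
emp10 = [(7782, "CLARK", 10), (7839, "KING", 10)]
emp20 = [(7369, "SMITH", 20), (7902, "FORD", 20)]


def union(*bags):
    """Concatenate bags: duplicates are kept and no common schema is enforced."""
    merged = []
    for bag in bags:
        merged.extend(bag)
    return merged


emp10_20 = union(emp10, emp20)
print(len(emp10_20))                  # 4
print(len(union(emp10, emp10)))       # 4: duplicates are not removed
print(len(set(union(emp10, emp10))))  # 2: deduplicated, like `distinct`
```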

Use PigLatin to implement WordCount:

① Load the data

mydata = load '/data/data.txt' as (line:chararray);

② Split each line into words

words = foreach mydata generate flatten(TOKENIZE(line)) as word;

③ Group the words

grpd = group words by word;

④ Count the words in each group

cntd = foreach grpd generate group, COUNT(words);

⑤ Print the result

dump cntd;
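The five steps can be traced in plain Python to see what each alias holds. This is a sketch, not Pig itself: a list of strings stands in for `/data/data.txt`, and `str.split()` splits only on whitespace, whereas `TOKENIZE` also treats some punctuation (such as commas) as delimiters. `word_count` is a hypothetical helper.

```python
from collections import defaultdict


def word_count(lines):
    # (2) Split every line into words and flatten them into one bag.
    words = [word for line in lines for word in line.split()]
    # (3) Group identical words together.
    grpd = defaultdict(list)
    for word in words:
        grpd[word].append(word)
    # (4) Count the number of words in each group.
    return {group: len(bag) for group, bag in grpd.items()}


# (1) Stand-in for loading /data/data.txt: one string per line.
mydata = ["hello pig", "hello hadoop pig", "pig"]
# (5) Print the result.
print(word_count(mydata))  # {'hello': 2, 'pig': 3, 'hadoop': 1}
```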

