Introduction to Pig:
Pig is an Apache sub-project that provides a platform for analyzing large data sets. Its outstanding feature is that its structure supports a high degree of parallelism, which enables it to handle very large data sets.
Pig features:
Pig simplifies the development of MapReduce jobs.
Pig can be regarded as client software for Hadoop: it connects to a Hadoop cluster to perform data analysis.
Pig is convenient for users who are not familiar with Java: data processing is written in Pig Latin, a relatively simple, SQL-like, data-flow-oriented language.
Pig Latin supports sorting, filtering, summation, grouping, joins and other common operations, and also allows user-defined functions. It is a lightweight scripting language for data analysis and processing.
Pig can be thought of as a translator from Pig Latin to MapReduce.
When Pig runs in MapReduce mode, it accesses a Hadoop cluster and an HDFS installation; Pig then allocates and releases cluster resources automatically.
Pig data model: relation, bag, tuple, field (column)
Relation: a set of tuples with the same fields (columns) is called a relation.
Bag: similar to a table in a relational database; it contains multiple tuples.
Tuple: equivalent to a row in a relational database, except that Pig does not require every tuple to have the same structure.
Field (column): similar to a column in a relational database, except that a field can contain a nested table (bag), whereas columns in a relational database cannot be nested (see the example after this list).
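For example, a field may itself hold a nested bag; a tuple in such a relation could print as follows (an illustrative sketch, not data used below):
(7698, {(7654,MARTIN),(7844,TURNER)})
Here the first field is a plain int, while the second field is a bag of (empno, ename) tuples.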
Installation and configuration of Pig:
Installation: extract the installation package and add the environment variables
tar -zxvf pig-0.17.0.tar.gz -C ~/app
vim ~/.bash_profile
PIG_HOME=/app/pig-0.17.0
export PIG_HOME
PATH=$PIG_HOME/bin:$PATH
export PATH
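To check that the variables took effect (a quick sanity check; the version string will depend on your install):
source ~/.bash_profile
pig -version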
Pig has two running modes:
Local mode: operates on local Linux files
Start command: pig -x local
Cluster (MapReduce) mode: connects to HDFS
PIG_CLASSPATH=/app/hadoop-2.7.3/etc/hadoop
export PIG_CLASSPATH
Start command: pig
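A minimal sketch of starting each mode (script.pig stands for any Pig Latin script file; Pig can also run scripts non-interactively):
pig -x local                launches the grunt shell against local Linux files
pig                         launches the grunt shell in cluster (MapReduce) mode
pig -x local script.pig     runs a script and exits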
Pig and operating-system commands (a sample grunt session follows this list):
sh followed by a Linux command operates directly on files in Linux.
ls, cd, cat, mkdir, pwd operate on HDFS.
copyFromLocal copies a file from the Linux filesystem to HDFS.
copyToLocal copies a file from HDFS to the Linux filesystem.
register and define load Pig user-defined functions.
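A short grunt session using these commands (the /home/data path is hypothetical): sh ls lists a local Linux directory, mkdir creates an HDFS directory, copyFromLocal uploads the file, and cat prints it back from HDFS:
grunt> sh ls /home/data
grunt> mkdir /input/table
grunt> copyFromLocal /home/data/emp.csv /input/table/emp.csv
grunt> cat /input/table/emp.csv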
Pig Latin statements:
-> Using them requires Hadoop's HistoryServer to be running:
mr-jobhistory-daemon.sh start historyserver
Web address: http://192.168.10.100:19888/jobhistory
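To confirm the HistoryServer is up, jps (the JDK's Java process lister) can be used; its output should include a JobHistoryServer entry:
jps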
-> Commonly used Pig Latin statements:
load: loads data into a bag (table)
foreach: equivalent to a loop; traverses every tuple in a bag
filter: equivalent to a WHERE clause
group by: grouping
join: joins relations
generate: extracts columns
union / intersect: set operations
Output: dump prints directly to the screen
store writes output to HDFS
For example, one line of emp.csv: 7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
Load the employee data into a bag (table):
emp = load '/input/table/emp.csv' using PigStorage(',')
    as (empno:int, ename:chararray, job:chararray, mgr:int, hiredate:chararray, sal:int, comm:int, deptno:int);
dept = load '/scott/dept.csv' using PigStorage(',') as (deptno:int, dname:chararray, loc:chararray);
View table structure: describe emp
Query employee information: employee number, name, salary
SQL statement: select empno, ename, sal from emp
PL statement: emp2 = foreach emp generate empno, ename, sal; (a new alias is used so the full emp relation stays available for the later examples)
Output to screen: dump emp2;
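dump prints one parenthesized tuple per line; for the MARTIN row shown earlier, the projected output would include a line like this (illustrative):
(7654,MARTIN,1250)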
Query employee information, sorted by monthly salary:
SQL statement: select * from emp order by sal
PL statement: emp3 = order emp by sal;
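Pig also supports descending order, a minimal variant of the statement above:
emp4 = order emp by sal desc;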
Grouping: find the maximum salary of each department
SQL statement: select deptno, max(sal) from emp group by deptno
PL statement: this takes two steps
1. Group:
emp_group = group emp by deptno;
2. Compute the maximum salary in each department:
max_sal = foreach emp_group generate group, MAX(emp.sal);
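After grouping, each tuple of emp_group holds the grouping key (named group) plus a bag of all matching emp tuples; describe shows the nested schema (a sketch, inner fields abbreviated):
describe emp_group;
emp_group: {group: int, emp: {(empno: int, ename: chararray, ..., deptno: int)}}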
Query the employees of department 10:
SQL statement: select * from emp where deptno = 10
PL statement: deptno_10 = filter emp by deptno == 10;
Multi-table query: employee name and department name
SQL statement: select e.ename, d.dname from emp e, dept d where e.deptno = d.deptno
PL statement: implemented in two steps
1. Join the two tables on the common deptno column:
newtable = join dept by deptno, emp by deptno;
2. Traverse the joined table, extracting the employee name from the employee table and the department name from the department table:
table = foreach newtable generate dept::dname, emp::ename;
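The dept:: and emp:: prefixes disambiguate field names that occur in both join inputs. With the standard scott data, dumping the result would print tuples such as this one (illustrative):
dump table;
(SALES,MARTIN)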
Set operations: as in a relational database such as Oracle, every relation participating in a set operation must have the same number of columns with matching types.
Query the employees of departments 10 and 20:
SQL statement: select * from emp where deptno=10
union select * from emp where deptno=20
PL statement: emp10 = filter emp by deptno == 10;
emp20 = filter emp by deptno == 20;
emp10_20 = union emp10, emp20;
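Note that Pig's union, unlike SQL's UNION, does not remove duplicate tuples (it behaves like UNION ALL); duplicates can be removed explicitly if needed:
emp10_20_distinct = distinct emp10_20;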
Use Pig Latin to implement WordCount:
① Load the data:
mydata = load '/data/data.txt' as (line:chararray);
② Split each line into words:
words = foreach mydata generate flatten(TOKENIZE(line)) as word;
③ Group by word:
grpd = group words by word;
④ Count the words in each group:
cntd = foreach grpd generate group, COUNT(words);
⑤ Print the result:
dump cntd;
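To sort the counts and persist them to HDFS instead of printing them (a sketch; the /output/wc path is hypothetical):
cntd_sorted = order cntd by $1 desc;
store cntd_sorted into '/output/wc' using PigStorage(',');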