Preface
I recently started writing MR (MapReduce) programs. I had used Hive a little before, but had very little hands-on experience with MR, and ran into obstacles both in the principles and in practice. I have recorded that process and am sharing it here today.
Environment preparation
Set up a Linux virtual machine in VMware, and install and configure Hadoop 2.2.0 as a single-node pseudo-distributed deployment.
Install the Hadoop plug-in in Eclipse.
For MR newcomers like us, it is best to have a local Hadoop runtime environment, which has several benefits:
If we have to package every finished MR program into a JAR and run it on a remote server, each run can take a long time to wait for; and when the result does not match our expectations, we have to change the program and start all over again, which is painful.
Running the MR program on a local Hadoop takes only a few seconds, and more importantly we can prepare input files locally to test the MR logic, which is very convenient for debugging and developing the program.
Example and principle analysis
Suppose we have an input file like this:
cate-a spu-1 1
cate-a spu-1 2
cate-a spu-2 3
cate-a spu-2 4
cate-a spu-3 5
cate-a spu-3 6
cate-a spu-1 7
cate-a spu-4 8
cate-a spu-4 9
cate-a spu-1 8
...
We want to compute, for each cate and spu, the SUM of the counts, and then the TOP3 spu under each cate.
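For illustration only (an assumption, since the "..." means the real file has more lines): if the input contained exactly the ten lines shown above, the two results would be:

sum per (cate, spu):
cate-a spu-1 18
cate-a spu-2 7
cate-a spu-3 11
cate-a spu-4 17

TOP3 spu per cate:
cate-a spu-1 18
cate-a spu-4 17
cate-a spu-3 11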
The MAP/REDUCE run proceeds roughly as follows:
The input file plus the InputFormat provide the records to the MAP.
Be clear about what KEY1/VALUE1 the MAP receives and what KEY2/VALUE2 it emits.
After the MAP output, partitioning takes place, that is, deciding which reducer each KEY2/VALUE2 pair is sent to.
The partitioner is set with job.setPartitionerClass.
Within the same partition, the KEY2s are sorted by the comparator set with job.setSortComparatorClass.
If none is set, the compareTo method of the KEY is used instead.
Next, in the grouping phase, KEY3 and the VALUE iterator are constructed.
Grouping is controlled by job.setGroupingComparatorClass: keys that the comparator reports as equal end up in the same group.
The KEY3/VALUE iterator is then handed to the reduce method (see the driver sketch after this list for where these hooks are configured).
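A minimal driver sketch follows, only to show where these hooks are configured. It is not the article's original code: the class names SumDriver, SumMapper and SumReducer and the LongWritable count type are assumptions; the partitioner and grouping-comparator names match the sketches given later in this post.

// Sketch of a job driver for the first (summing) job; names are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SumDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "cate-spu-sum");
        job.setJarByClass(SumDriver.class);

        job.setMapperClass(SumMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setMapOutputKeyClass(Cate2SpuKey.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // decides which reducer each KEY2/VALUE2 pair is sent to
        job.setPartitionerClass(CatePartitioner.class);
        // orders keys inside a partition; if omitted, KEY.compareTo is used
        // job.setSortComparatorClass(...);
        // decides which keys land in the same reduce() call
        job.setGroupingComparatorClass(CateSpuGroupingComparator.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}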
Steps:
Custom KEY
The KEY must be serializable and comparable; implementing WritableComparable covers both. The method to focus on is compareTo:
@Override
public int compareTo(Cate2SpuKey that) {
    System.out.println("start sorting KEY...");
    if (cate2.equals(that.getCate2())) {
        return spu.compareTo(that.getSpu());
    }
    return cate2.compareTo(that.getCate2());
}
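For completeness, here is a minimal sketch of what the full Cate2SpuKey class might look like. Only compareTo comes from the code above; the field names, constructors and the writeUTF/readUTF serialization are assumptions.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class Cate2SpuKey implements WritableComparable<Cate2SpuKey> {
    private String cate2;
    private String spu;

    public Cate2SpuKey() {}                 // required no-argument constructor

    public Cate2SpuKey(String cate2, String spu) {
        this.cate2 = cate2;
        this.spu = spu;
    }

    public String getCate2() { return cate2; }
    public String getSpu()   { return spu; }

    @Override
    public void write(DataOutput out) throws IOException {    // serialization
        out.writeUTF(cate2);
        out.writeUTF(spu);
    }

    @Override
    public void readFields(DataInput in) throws IOException { // deserialization
        cate2 = in.readUTF();
        spu = in.readUTF();
    }

    @Override
    public int compareTo(Cate2SpuKey that) {
        System.out.println("start sorting KEY...");
        if (cate2.equals(that.getCate2())) {
            return spu.compareTo(that.getSpu());
        }
        return cate2.compareTo(that.getCate2());
    }

    @Override
    public String toString() { return cate2 + "\t" + spu; }
}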
Partitioning
Partitioning is the first comparison of the KEY: extend Partitioner and override getPartition.
Here we partition by cate.
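A minimal sketch of such a partitioner, assuming the Cate2SpuKey above and a LongWritable map-output value (both assumptions beyond what the post states):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class CatePartitioner extends Partitioner<Cate2SpuKey, LongWritable> {
    @Override
    public int getPartition(Cate2SpuKey key, LongWritable value, int numPartitions) {
        // same cate2 -> same partition; mask keeps the hash non-negative
        return (key.getCate2().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}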
Grouping
Note that the grouping class must provide a no-argument constructor and override
public int compare(WritableComparable w1, WritableComparable w2). Here, records are grouped by cate and spu.
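A hedged sketch of such a grouping comparator, assuming the Cate2SpuKey above; the class name is illustrative:

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class CateSpuGroupingComparator extends WritableComparator {
    public CateSpuGroupingComparator() {
        super(Cate2SpuKey.class, true);   // true: let the comparator create key instances
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        Cate2SpuKey k1 = (Cate2SpuKey) w1;
        Cate2SpuKey k2 = (Cate2SpuKey) w2;
        if (k1.getCate2().equals(k2.getCate2())) {
            return k1.getSpu().compareTo(k2.getSpu());
        }
        return k1.getCate2().compareTo(k2.getCate2());
    }
}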
With the above, the first job produces the SUM(counts) value for each (cate, spu) pair.
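A sketch of what the summing reducer of this first job might look like (the class name SumReducer and the LongWritable value type are assumptions, not the article's code):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Cate2SpuKey, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Cate2SpuKey key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();            // add up all counts of one (cate, spu) group
        }
        context.write(new Text(key.toString()), new LongWritable(sum));
    }
}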
The Eclipse Hadoop plug-in makes it easy to upload test files to HDFS and to browse and delete HDFS files. Even more convenient, we can run and debug MR programs like ordinary Java programs (no more packaging into a JAR), stepping through every stage of the job, which is very handy for testing the logic.
So how do we get the TOP3 spu under each cate?
We simply feed the output of the first MR job into a second one, using cate+counts as the KEY and spu as the VALUE, partitioning and grouping by cate, and sorting records with the same cate in descending order of counts (see the compareTo sketch below).
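A sketch of what the second key's compareTo might look like, assuming Cate2CountsKey has a String cate2 field and a long counts field (the field types are assumptions):

// assumed fields: String cate2; long counts;
@Override
public int compareTo(Cate2CountsKey that) {
    if (cate2.equals(that.getCate2())) {
        // same cate: larger counts first (descending), so the TOP3 arrive first in reduce
        return Long.compare(that.getCounts(), this.counts);
    }
    return cate2.compareTo(that.getCate2());
}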
Finally, we take the TOP3 in the reduce phase:
@Override
protected void reduce(Cate2CountsKey key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    System.out.println("reduce...");
    System.out.println("before VALUES iteration... key: " + key.toString());
    System.out.println("before VALUES iteration... key: " + key.getCounts());
    int top = 3;
    for (Text t : values) {
        if (top > 0) {
            System.out.println("VALUES iteration... key: " + key.toString());
            System.out.println("VALUES iteration... key: " + key.getCounts());
            context.write(new Text(key.getCate2() + "\t" + t.toString()),
                    new Text(key.getCounts() + ""));
            top--;
        }
    }
    System.out.println("reduce over...");
}
At this point, the grouping and TOP3 extraction are complete.
A question: what exactly is the KEY in the reduce phase?
In the TOP3 job above, we used cate+counts as the KEY and spu as the VALUE.
cate serves as the basis for partitioning and grouping, and records with the same cate are sorted in descending order of counts.
So what is the KEY inside the reduce method?
spu1, spu4, spu3... are in VALUES, so which KEY does this iterator correspond to?
Is it cate+42? Or something else?
And does this KEY change during the VALUES iteration?
We can look at the console output printed in Eclipse (from the println calls in the reduce method above):
From that output, the following conclusion can be drawn:
After grouping, the KEY handed to the reduce method is the first KEY of the group. During the VALUES iteration, the framework does not create a new KEY object; it resets the fields of the same object via setter reflection, so the KEY observed at each step of the iteration is the one that corresponds to the current VALUE.
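A practical consequence of this object reuse: a reducer that needs to keep a key beyond the current iteration step must copy it explicitly, for example (a sketch assuming the Cate2CountsKey class above):

// inside reduce(): snapshot the key that belongs to the current value
Cate2CountsKey snapshot =
        org.apache.hadoop.io.WritableUtils.clone(key, context.getConfiguration());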