
What's the use of hadoop datajoin?

Updated 2025-01-16. From: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31. Section: Servers.

This article explains what the Hadoop datajoin package is for. The method it introduces is simple, fast, and practical.

Reduce-side join with Hadoop datajoin

Hadoop provides a jar package called datajoin for joining two tables. The jar is located under /hadoop/contrib/datajoin; add it to your project to develop against it.

Several concepts are involved:

1. Data source: roughly equivalent to a table in a relational database. The example uses two sources in CSV format:

Customers                          Orders
1,Stephanie Leung,555-555-5555     3,A,12.95,02-Jun-2008
2,Edward Kim,123-456-7890          1,B,88.25,20-May-2008
3,Jose Madriz,281-330-8004         2,C,32.00,30-Nov-2007
4,David Stork,408-555-0000         3,D,25.02,22-Jan-2009

2. Tag: because the record type (Customers or Orders) is not stored in the record itself, tagging each record ensures that this metadata always travels with it. For this purpose, each record is tagged with the name of its data source.

3. Group key: similar to a join key in a relational database. In our example the group key is the customer ID, which is the first column of both sources (for example, 3). Because the datajoin package lets you define the group key yourself, it is more general than a join key in a relational database.
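The tag and group key concepts can be illustrated without Hadoop. The following is a minimal plain-Java sketch, assuming file names of the form Customers-20140716 and records whose first CSV column is the customer ID. The class TagAndKeySketch and its methods are hypothetical helpers for illustration and are not part of the datajoin package.

```java
/** Hypothetical helper for illustration only; not part of the datajoin package. */
final class TagAndKeySketch {

    /** Derives the tag (data source name) from a file name such as "Customers-20140716". */
    static String tagFor(String fileName) {
        return fileName.split("-")[0];
    }

    /** Uses the first CSV column (the customer ID) as the group key. */
    static String groupKeyFor(String record) {
        return record.split(",")[0];
    }

    /** Formats a record as "groupKey | tag | record" for display. */
    static String describe(String fileName, String record) {
        return groupKeyFor(record) + " | " + tagFor(fileName) + " | " + record;
    }

    public static void main(String[] args) {
        System.out.println(describe("Customers-20140716", "3,Jose Madriz,281-330-8004"));
        System.out.println(describe("Orders-20140716", "3,A,12.95,02-Jun-2008"));
    }
}
```

Both records share the group key 3 but carry different tags, which is what lets the reduce side tell a customer record from an order record.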

Use the following sample data.

Customers-20140716

1,Stephanie Leung,555-555-5555
2,Edward Kim,123-456-7890
3,Jose Madriz,281-330-8004
4,David Stork,408-555-0000

Orders-20140716

3,A,12.95,02-Jun-2008
1,B,88.25,20-May-2008
2,C,32.00,30-Nov-2007
3,D,25.02,22-Jan-2009

[Figure: flow chart of the datajoin process]

The first part: the custom data type. The data type consists of two parts, tag and data. The tag is a label that indicates which file a record comes from; data is the record itself.

The code:

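import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;

// Value type emitted by the mapper: a record plus the tag of its data source.
public class TaggedWritable extends TaggedMapOutput {
    private Text data;

    // No-argument constructor, needed so Hadoop can instantiate the class when deserializing.
    public TaggedWritable() {
        this.tag = new Text("");
        this.data = new Text("");
    }

    public TaggedWritable(Text data) {
        this.tag = new Text("");
        this.data = data;
    }

    // Fields are read in the same order in which write() emits them.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.tag.readFields(in);
        this.data.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        this.tag.write(out);
        this.data.write(out);
    }

    @Override
    public Text getData() {
        return data;
    }
}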
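The write and readFields methods above must handle the tag and the data in the same order, otherwise the fields come back swapped or corrupted. The round trip can be sketched in plain Java using only java.io. The class TaggedRecordSketch is a hypothetical stand-in that uses String fields and writeUTF instead of Hadoop's Text, so it only illustrates the idea and does not reproduce Hadoop's wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a tagged record; uses String instead of Hadoop's Text. */
final class TaggedRecordSketch {
    String tag = "";
    String data = "";

    TaggedRecordSketch() {
    }

    TaggedRecordSketch(String tag, String data) {
        this.tag = tag;
        this.data = data;
    }

    /** Writes the tag first, then the data. */
    void write(DataOutput out) throws IOException {
        out.writeUTF(tag);
        out.writeUTF(data);
    }

    /** Reads the fields back in the same order they were written. */
    void readFields(DataInput in) throws IOException {
        tag = in.readUTF();
        data = in.readUTF();
    }

    /** Serializes a record to bytes and rebuilds it from an empty instance. */
    static TaggedRecordSketch roundTrip(TaggedRecordSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            TaggedRecordSketch copy = new TaggedRecordSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The empty instance created in roundTrip plays the role of the no-argument constructor: the object is created first and then filled in by readFields.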

The second part: the map function.

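import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;

public class Mapclass extends DataJoinMapperBase {

    // Group key: the first CSV column (the customer ID).
    @Override
    protected Text generateGroupKey(TaggedMapOutput aRecord) {
        String line = aRecord.getData().toString();
        String[] tokens = line.split(",");
        String groupKey = tokens[0];
        return new Text(groupKey);
    }

    // Tag: the part of the input file path before the first "-".
    @Override
    protected Text generateInputTag(String inputFile) {
        String datasource = inputFile.split("-")[0];
        return new Text(datasource);
    }

    // Wraps each input line in the custom type and attaches the tag of its source.
    @Override
    protected TaggedWritable generateTaggedMapOutput(Object value) {
        TaggedWritable retv = new TaggedWritable(new Text(value.toString()));
        retv.setTag(this.inputTag);
        return retv;
    }
}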

Pay special attention to generateTaggedMapOutput(Object value) in the map class: its return type is the custom type defined in the first part.

The third part: the reduce function.

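import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;

public class Reduce extends DataJoinReducerBase {

    @Override
    protected TaggedMapOutput combine(Object[] tags, Object[] values) {
        // Fewer than two tags means the key appears in only one source: drop it (inner join).
        if (tags.length < 2) {
            return null;
        }
        String joinedStr = "";
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                joinedStr += ",";
            }
            TaggedWritable tw = (TaggedWritable) values[i];
            String line = tw.getData().toString();
            System.out.println("line=:" + line);
            // Split off the key column and keep only the remaining fields.
            String[] tokens = line.split(",", 2);
            joinedStr += tokens[1];
        }
        TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
        retv.setTag((Text) tags[0]);
        return retv;
    }
}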

The reduce phase emits the group key together with the combined record, so you do not need to include the key in the string you build.

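The driver class configures and submits the job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Datajoin extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = this.getConf();
        JobConf job = new JobConf(conf, Datajoin.class);
        job.setJarByClass(Datajoin.class);

        Path in = new Path("hdfs://172.16.0.87:9000/user/jeff/datajoin/");
        Path out = new Path("hdfs://172.16.0.87:9000/user/jeff/datajoin/out");
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("DataJoin");
        job.setMapperClass(Mapclass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(TaggedWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Separate the key from the joined record with a comma instead of a tab.
        job.set("mapred.textoutputformat.separator", ",");

        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new Datajoin(), args);
        System.exit(res);
    }
}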

The output after running the MapReduce job is:

1,Stephanie Leung,555-555-5555,B,88.25,20-May-2008
2,Edward Kim,123-456-7890,C,32.00,30-Nov-2007
3,Jose Madriz,281-330-8004,A,12.95,02-Jun-2008
3,Jose Madriz,281-330-8004,D,25.02,22-Jan-2009
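To see where each output line comes from, the reduce side can be imitated in plain Java. This is a simplified sketch under stated assumptions, not the datajoin implementation: it handles exactly two sources, groups records by their first column, pairs every customer record with every order record that shares the key, and applies the same string logic as the combine method in the third part. The class ReduceSideJoinSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java imitation of a two-source reduce-side join (hypothetical helper, no Hadoop). */
final class ReduceSideJoinSketch {

    /** Same string logic as the article's combine(): strip the key column, join the rest. */
    static String combine(String[] tags, String[] values) {
        if (tags.length < 2) {
            return null; // key present in only one source: an inner join drops it
        }
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                joined.append(',');
            }
            joined.append(values[i].split(",", 2)[1]);
        }
        return joined.toString();
    }

    /** Groups records by their first CSV column; TreeMap keeps the keys sorted. */
    static Map<String, List<String>> groupByKey(List<String> records) {
        Map<String, List<String>> byKey = new TreeMap<>();
        for (String record : records) {
            byKey.computeIfAbsent(record.split(",")[0], k -> new ArrayList<>()).add(record);
        }
        return byKey;
    }

    /** Pairs every customer with every order that shares its key and emits "key,combined". */
    static List<String> join(List<String> customers, List<String> orders) {
        Map<String, List<String>> customersByKey = groupByKey(customers);
        Map<String, List<String>> ordersByKey = groupByKey(orders);
        String[] tags = {"Customers", "Orders"};
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : customersByKey.entrySet()) {
            List<String> matching = ordersByKey.get(entry.getKey());
            if (matching == null) {
                continue; // no order for this customer, so no pair is formed
            }
            for (String customer : entry.getValue()) {
                for (String order : matching) {
                    output.add(entry.getKey() + "," + combine(tags, new String[] {customer, order}));
                }
            }
        }
        return output;
    }

    /** Runs the join on the article's sample data. */
    static List<String> joinSampleData() {
        List<String> customers = Arrays.asList(
                "1,Stephanie Leung,555-555-5555",
                "2,Edward Kim,123-456-7890",
                "3,Jose Madriz,281-330-8004",
                "4,David Stork,408-555-0000");
        List<String> orders = Arrays.asList(
                "3,A,12.95,02-Jun-2008",
                "1,B,88.25,20-May-2008",
                "2,C,32.00,30-Nov-2007",
                "3,D,25.02,22-Jan-2009");
        return join(customers, orders);
    }

    public static void main(String[] args) {
        joinSampleData().forEach(System.out::println);
    }
}
```

Customer 3 has two orders and therefore produces two lines, while customer 4 has no orders and produces none.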

In the reducer's combine function you can control the form of the output, for example to produce a left or a right join.
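As one possible illustration, a left outer join keeps customers that have no orders. The sketch below shows only the string logic, in plain Java, and rests on several assumptions: the left source is tagged exactly "Customers", an order record has three non-key columns, and a key that appears in only one source reaches combine with a single tag. The class LeftOuterCombineSketch is hypothetical; in a real job the same branching would go inside the combine method of the reducer.

```java
/** Hypothetical left-outer variant of the combine logic; plain Java, no Hadoop. */
final class LeftOuterCombineSketch {
    static final String LEFT_TAG = "Customers"; // assumed tag of the left source
    static final int ORDER_COLUMNS = 3;          // assumed non-key columns per order record

    static String combine(String[] tags, String[] values) {
        if (tags.length == 1) {
            if (!LEFT_TAG.equals(tags[0])) {
                return null; // an order without a customer is still dropped
            }
            // Customer without orders: keep its fields and pad the order columns with blanks.
            StringBuilder row = new StringBuilder(values[0].split(",", 2)[1]);
            for (int i = 0; i < ORDER_COLUMNS; i++) {
                row.append(',');
            }
            return row.toString();
        }
        // Key present in both sources: strip the key column and join the rest.
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                joined.append(',');
            }
            joined.append(values[i].split(",", 2)[1]);
        }
        return joined.toString();
    }
}
```

With the sample data, customer 4 would then appear in the output with empty order columns instead of being dropped. A right join would test for the other tag instead.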

At this point you should have a better understanding of what Hadoop datajoin is for. The best way to consolidate it is to try it out in practice.
