This article mainly introduces what Hadoop's map function can be used for. Many people run into this question in day-to-day work, so this article collects the relevant material into a simple, practical walkthrough. Hopefully it helps answer the question "what is the use of Hadoop's map function?" Read on to follow along.
When a large table is joined with a small table, you can use Hadoop's DistributedCache to load the small table into memory: Hadoop distributes the cached file to every node that runs a map task, so the data cleaning and the join can be done entirely in the map phase.
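The core of the pattern is just two API calls: the driver registers the small file with DistributedCache, and the mapper reads the locally cached copy back in setup(). A minimal sketch of that hand-off is shown here; the names smallTablePath and lookup are placeholders for illustration only, and the complete classes for this example appear later in the article.

// Driver side: register the small dimension file so Hadoop copies it to every map node.
DistributedCache.addCacheFile(new Path(smallTablePath).toUri(), job.getConfiguration());

// Mapper side, inside setup(): load the cached file into an in-memory lookup table.
Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
String line;
while ((line = reader.readLine()) != null) {
    String[] fields = line.split(",", -1);
    lookup.put(fields[0], fields[1]);   // e.g. id -> name
}
reader.close();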
For example, there is such a data user login information login:
1,0,20121213
2,0,20121213
3,1,20121213
4,1,20121213
1,0,20121114
The first column is the user id, the second column is the gender, and the third column is the login time.
The task is to replace the user id in the login table with the user's name, replace the gender code with its text label, and then count the number of logins for each user.
The users table is:
1, Zhang San, hubei
3, Wang Wu, tianjin
4, Zhao Liu, guangzhou
2, Li Si, beijing
The sex table is:
0, male
1, female
The join against the dimension tables is done in the map function: the output key is the name plus the gender, and the output value is a count of 1 for each login record.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Mapclass extends Mapper<LongWritable, Text, Text, Text> {

    private Map<String, String> userMap = new HashMap<String, String>();
    private Map<String, String> sexMap = new HashMap<String, String>();
    private Text oKey = new Text();
    private Text oValue = new Text();
    private String[] kv;

    @Override
    protected void setup(Context context) {
        BufferedReader in = null;
        // Load the files cached for the current job into the in-memory lookup maps
        try {
            Path[] paths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
            String uidNameAddr = null;
            String sidSex = null;
            for (Path path : paths) {
                if (path.toString().contains("users")) {
                    in = new BufferedReader(new FileReader(path.toString()));
                    while (null != (uidNameAddr = in.readLine())) {
                        userMap.put(uidNameAddr.split(",", -1)[0], uidNameAddr.split(",", -1)[1]);
                    }
                } else if (path.toString().contains("sex")) {
                    in = new BufferedReader(new FileReader(path.toString()));
                    while (null != (sidSex = in.readLine())) {
                        sexMap.put(sidSex.split(",", -1)[0], sidSex.split(",", -1)[1]);
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (in != null) {
                    in.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        kv = value.toString().split(",");
        // Map-side join: drop records whose id or gender code has no match in the dimension tables
        if (userMap.containsKey(kv[0]) && sexMap.containsKey(kv[1])) {
            oKey.set(userMap.get(kv[0]) + "," + sexMap.get(kv[1]));
            oValue.set("1");
            context.write(oKey, oValue);
        }
    }
}
The reduce function is:
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, Text, Text, Text> {

    private Text oValue = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Sum the "1" values emitted by the mapper for each (name, gender) key
        int sumCount = 0;
        for (Text val : values) {
            sumCount += Integer.parseInt(val.toString());
        }
        oValue.set(String.valueOf(sumCount));
        context.write(key, oValue);
    }
}
The main function is:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MultiTableJoin extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "MultiTableJoin");
        job.setJobName("MultiTableJoin");
        job.setJarByClass(MultiTableJoin.class);
        job.setMapperClass(Mapclass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        String[] otherArgs = new GenericOptionsParser(job.getConfiguration(), args).getRemainingArgs();

        // The first and second arguments are the paths of the dimension files to cache
        DistributedCache.addCacheFile(new Path(otherArgs[0]).toUri(), job.getConfiguration());
        DistributedCache.addCacheFile(new Path(otherArgs[1]).toUri(), job.getConfiguration());

        FileInputFormat.addInputPath(job, new Path(otherArgs[2]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[3]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] arg0) throws Exception {
        String[] args = new String[4];
        args[0] = "hdfs://172.16.0.87:9000/user/jeff/decli/sex";
        args[1] = "hdfs://172.16.0.87:9000/user/jeff/decli/users";
        args[2] = "hdfs://172.16.0.87:9000/user/jeff/decli/login";
        args[3] = "hdfs://172.16.0.87:9000/user/jeff/decli/out";
        int res = ToolRunner.run(new Configuration(), new MultiTableJoin(), args);
        System.exit(res);
    }
}
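Note that main() hard-codes the four HDFS paths and ignores anything passed on the command line, so after packaging the three classes into a jar you would launch the job directly and then read the result from the output directory. The jar name below is only an assumption for illustration:

hadoop jar multitablejoin.jar MultiTableJoin
hadoop fs -cat hdfs://172.16.0.87:9000/user/jeff/decli/out/part-r-00000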
The output of the calculation is shown below (user 1, Zhang San, appears twice in the login data, so his count is 2; the other users each logged in once):
Zhang San, male 2
Li Si, male 1
Wang Wu, female 1
Zhao Liu, female 1
That concludes this look at what Hadoop's map function can be used for. Combining the theory with the worked example above should make it easier to absorb, so go and try it yourself. If you want to keep learning related topics, more practical articles will follow.