This article mainly introduces what Hadoop's map function can be used for. Many people run into this question in day-to-day work, so this article collects the relevant material into a simple, practical walkthrough. Hopefully it helps answer the question "what is the use of Hadoop's map function?" Read on to follow along.
When a large table is joined with a small table, you can use Hadoop's DistributedCache to load the small table into memory: Hadoop distributes the cached file to every node that runs a map task, so the data cleaning and the join can be done entirely in the map phase.
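The core of the pattern is just two API calls: the driver registers the small file with DistributedCache, and the mapper reads the locally cached copy back in setup(). A minimal sketch of that hand-off is shown here; the names smallTablePath and lookup are placeholders for illustration only, and the complete classes for this example appear later in the article.

// Driver side: register the small dimension file so Hadoop copies it to every map node.
DistributedCache.addCacheFile(new Path(smallTablePath).toUri(), job.getConfiguration());

// Mapper side, inside setup(): load the cached file into an in-memory lookup table.
Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
String line;
while ((line = reader.readLine()) != null) {
    String[] fields = line.split(",", -1);
    lookup.put(fields[0], fields[1]);   // e.g. id -> name
}
reader.close();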
For example, there is such a data user login information login:
1,0,20121213
2,0,20121213
3,1,20121213
4,1,20121213
1,0,20121114
The first column is the user id, the second column is the gender, and the third column is the login time.
The task is to replace the user id in the login table with the user's name, replace the gender code with its text label, and then count the number of logins for each user.
The users table is:
1, Zhang San, hubei
3, Wang Wu, tianjin
4, Zhao Liu, guangzhou
2, Li Si, beijing
The sex table is:
0, male
1, female
The join against the dimension tables is done in the map function: the output key is the name plus the gender, and the output value is a count of 1 for each login record.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Mapclass extends Mapper<LongWritable, Text, Text, Text> {

    private Map<String, String> userMap = new HashMap<String, String>();
    private Map<String, String> sexMap = new HashMap<String, String>();
    private Text oKey = new Text();
    private Text oValue = new Text();
    private String[] kv;

    @Override
    protected void setup(Context context) {
        BufferedReader in = null;
        // Load the files cached for the current job into the in-memory lookup maps
        try {
            Path[] paths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
            String uidNameAddr = null;
            String sidSex = null;
            for (Path path : paths) {
                if (path.toString().contains("users")) {
                    in = new BufferedReader(new FileReader(path.toString()));
                    while (null != (uidNameAddr = in.readLine())) {
                        userMap.put(uidNameAddr.split(",", -1)[0], uidNameAddr.split(",", -1)[1]);
                    }
                } else if (path.toString().contains("sex")) {
                    in = new BufferedReader(new FileReader(path.toString()));
                    while (null != (sidSex = in.readLine())) {
                        sexMap.put(sidSex.split(",", -1)[0], sidSex.split(",", -1)[1]);
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (in != null) {
                    in.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        kv = value.toString().split(",");
        // Map-side join: drop records whose id or gender code has no match in the dimension tables
        if (userMap.containsKey(kv[0]) && sexMap.containsKey(kv[1])) {
            oKey.set(userMap.get(kv[0]) + "," + sexMap.get(kv[1]));
            oValue.set("1");
            context.write(oKey, oValue);
        }
    }
}
The reduce function is:
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, Text, Text, Text> {

    private Text oValue = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Sum the "1" values emitted by the mapper for each (name, gender) key
        int sumCount = 0;
        for (Text val : values) {
            sumCount += Integer.parseInt(val.toString());
        }
        oValue.set(String.valueOf(sumCount));
        context.write(key, oValue);
    }
}
The main function is:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MultiTableJoin extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "MultiTableJoin");
        job.setJobName("MultiTableJoin");
        job.setJarByClass(MultiTableJoin.class);
        job.setMapperClass(Mapclass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        String[] otherArgs = new GenericOptionsParser(job.getConfiguration(), args).getRemainingArgs();

        // The first and second arguments are the paths of the dimension files to cache
        DistributedCache.addCacheFile(new Path(otherArgs[0]).toUri(), job.getConfiguration());
        DistributedCache.addCacheFile(new Path(otherArgs[1]).toUri(), job.getConfiguration());

        FileInputFormat.addInputPath(job, new Path(otherArgs[2]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[3]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] arg0) throws Exception {
        String[] args = new String[4];
        args[0] = "hdfs://172.16.0.87:9000/user/jeff/decli/sex";
        args[1] = "hdfs://172.16.0.87:9000/user/jeff/decli/users";
        args[2] = "hdfs://172.16.0.87:9000/user/jeff/decli/login";
        args[3] = "hdfs://172.16.0.87:9000/user/jeff/decli/out";
        int res = ToolRunner.run(new Configuration(), new MultiTableJoin(), args);
        System.exit(res);
    }
}
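Note that main() hard-codes the four HDFS paths and ignores anything passed on the command line, so after packaging the three classes into a jar you would launch the job directly and then read the result from the output directory. The jar name below is only an assumption for illustration:

hadoop jar multitablejoin.jar MultiTableJoin
hadoop fs -cat hdfs://172.16.0.87:9000/user/jeff/decli/out/part-r-00000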
The output of the calculation is shown below (user 1, Zhang San, appears twice in the login data, so his count is 2; the other users each logged in once):
Zhang San, male 2
Li Si, male 1
Wang Wu, female 1
Zhao Liu, female 1
That concludes this look at what Hadoop's map function can be used for. Combining the theory with the worked example above should make it easier to absorb, so go and try it yourself. If you want to keep learning related topics, more practical articles will follow.