

Several ways for Hadoop applications to reference third-party jar (1)

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Although we can develop Hadoop applications under Eclipse, once the application references third-party jar files, getting the packaged jar to run on a Hadoop cluster becomes a problem we have to solve during development. After searching around, I have summarized several feasible approaches here.

I originally intended to cover everything in one article, but it grew too long and too detailed, so I decided to split it into two...

The first article (this one) covers importing third-party jars in Eclipse and packaging the project to run on the cluster.

The second article covers how a Hadoop application can reference third-party jars at runtime, plus my personal opinion on how to handle this in a production environment. See "Several ways for Hadoop applications to reference third-party jars (II)".

===========================================================================================

Let's first talk about how to import a third-party jar in Eclipse, since this affects how the project is packaged later. Broadly there are two methods: one is to keep the jar in a directory on the local disk and reference it from there; the other is to create a new lib directory under the Eclipse project root and reference the jar from there.

The first method of introduction

Here we test it with a simple modification of the WordCount provided by Hadoop. The code is as follows:

package com.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import com.hadoop.hdfs.OperateHDFS;

public class WordCount {

    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Does nothing useful; it exists only to test the third-party jar.
            // If the jar is not on the classpath, this line throws ClassNotFoundException.
            OperateHDFS s = new OperateHDFS();
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

I put a jar in the mylib directory of drive D. This jar was built from the class in the previous article, "Using the API provided by Hadoop to operate HDFS" (the OperateHDFS class).
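For readers not using Eclipse, the compile-and-package steps can also be sketched from the command line. This is an illustration only: the source layout (src/, mylib/OperateHDFS.jar) is a hypothetical assumption, and the commands are printed rather than executed because running them needs a JDK and a local Hadoop installation.

```shell
# Hypothetical CLI equivalent of the Eclipse export in method one.
# The commands are echoed, not executed: they require a JDK and Hadoop on the PATH.
CMD_COMPILE='javac -classpath "$(hadoop classpath):mylib/OperateHDFS.jar" -d classes src/com/hadoop/examples/WordCount.java'
CMD_PACKAGE='jar cf WordCount_dlib.jar -C classes .'
echo "$CMD_COMPILE"   # compile against the Hadoop jars plus the third-party jar
echo "$CMD_PACKAGE"   # packs only the project classes, not OperateHDFS.jar
```

Note that the `jar cf` step, like the Eclipse export, includes only the project's own classes; the third-party jar stays outside the archive.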

Right-click the project, Build Path ==> Configure Build Path ==> Add External JARs, then browse to the jar file under the directory above to import it. After this, running the WordCount class inside Eclipse works without problems.

When exporting the whole project as a jar, however, notice that Eclipse gives no option to include the externally referenced jar; only the project's own classes can be selected. Here only WordCount is selected. The export steps are as follows:

At this point, run WordCount_dlib.jar on the Hadoop cluster with the following command:

hadoop jar WordCount_dlib.jar com.hadoop.examples.WordCount input outputdlib

As expected, the program threw a ClassNotFoundException. The result is as follows:
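Since the driver uses GenericOptionsParser, the standard -libjars option is one way to supply the missing jar at run time; exporting HADOOP_CLASSPATH is the usual fix for the client side. A sketch, assuming the jar lives at /d/mylib/OperateHDFS.jar (a hypothetical path); the commands are only printed here, since executing them needs a running cluster:

```shell
# Two common fixes for the ClassNotFoundException, echoed rather than executed
# (they need a running Hadoop cluster). The path /d/mylib/OperateHDFS.jar is
# an assumption for illustration.
FIX_LIBJARS='hadoop jar WordCount_dlib.jar com.hadoop.examples.WordCount -libjars /d/mylib/OperateHDFS.jar input outputdlib'
FIX_CLASSPATH='export HADOOP_CLASSPATH=/d/mylib/OperateHDFS.jar'
echo "$FIX_LIBJARS"     # -libjars ships the jar to the tasks (parsed by GenericOptionsParser)
echo "$FIX_CLASSPATH"   # puts the jar on the client-side classpath when submitting the job
```

These runtime options are exactly the territory of part two of this series.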

The second method of introduction

Create a new lib directory under the project root (right-click the project, New Folder). I have not tested whether a location other than the root works, and the directory name must be exactly lib; with any other name it will not be loaded automatically. Put the jar above into lib, then right-click Build Path ==> Configure Build Path ==> Add JARs and pick the jar file under the project's lib directory to import it. After this, running the WordCount class inside Eclipse works without problems.

This time, when exporting the whole project, you can select the jar under lib as well as the classes to package. Here only WordCount is selected among the classes. The export steps are as follows:

Run WordCount_plib.jar on the Hadoop cluster with the following command:

hadoop jar WordCount_plib.jar com.hadoop.examples.WordCount input outputplib

This time the program runs normally; the results are shown below.

Why the difference? Because WordCount_plib.jar carries jars inside it: Hadoop automatically loads the jars found under the job jar's internal lib directory.
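A jar is just a zip archive, so the structure Hadoop looks for is easy to show. The sketch below mocks on disk what the second method puts inside WordCount_plib.jar (the file contents are empty placeholders; packing the tree with `jar cf` would need a JDK):

```shell
# Mock the internal layout of WordCount_plib.jar on disk.
# Hadoop adds everything under the job jar's internal lib/ directory
# to the task classpath, which is why this variant runs cleanly.
mkdir -p plib_layout/com/hadoop/examples plib_layout/lib
touch plib_layout/com/hadoop/examples/WordCount.class   # placeholder class file
touch plib_layout/lib/OperateHDFS.jar                   # the bundled third-party jar
find plib_layout -type f | sort
```

The first entry is the application class; the second is the third-party jar riding along inside lib/, which is the whole trick behind method two.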

If the directory is not named lib, the jar files under it can still be packed into the project jar, but the program will nevertheless throw ClassNotFoundException. I renamed lib to mylib, exported WordCount_mylib.jar, and ran the test; the results are as follows:

This is why I stressed above that the directory under the project root can only be named lib.

The four jar files involved in this article are included as attachments. The attachment contents are:

OperateHDFS.jar — the imported third-party jar
WordCount_dlib.jar — the jar built with the first method
WordCount_plib.jar — the jar built with the second method
WordCount_mylib.jar — the jar built with the second method, but with the directory named mylib instead of lib

This article is long enough already; the rest goes in the next one...

Attachment: down.51cto.com/data/2365539
