How to run MapReduce Program in IDEA


This article gives a detailed analysis and solution for the question of how to run a MapReduce program in IDEA, in the hope that it helps readers who want to solve this problem find a simpler, easier way.

1. IDEA local standalone mode runs MapReduce

1.1. Decompress Hadoop and set environment variables

Decompress Hadoop to a local directory, such as C:\Hadoop

Set the environment variables:

HADOOP_HOME: points to the directory where Hadoop was decompressed.

HADOOP_USER_NAME: the user name under which Hadoop runs (the next section does a remote submit, so it must match the user used on the HDFS cluster).

PATH: add entries pointing to HADOOP_HOME\bin and HADOOP_HOME\sbin.

Important: on Windows, winutils.exe and hadoop.dll are required to run Hadoop:

Download the files matching your Hadoop version from https://github.com/cdarlint/winutils

Copy hadoop.dll to C:\Windows\System32

Copy winutils.exe to HADOOP_HOME\bin
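
Before moving on, it can save time to confirm the setup. Here is a minimal sketch (not from the original article; the class name is made up) that prints the two variables and checks that both files are in place:

import java.nio.file.Files;
import java.nio.file.Paths;

public class HadoopSetupCheck {
    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME      = " + home);
        System.out.println("HADOOP_USER_NAME = " + System.getenv("HADOOP_USER_NAME"));
        if (home == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        // Both files must be in place before Hadoop will run on Windows
        System.out.println("winutils.exe found: " + Files.exists(Paths.get(home, "bin", "winutils.exe")));
        System.out.println("hadoop.dll found:   " + Files.exists(Paths.get("C:\\Windows\\System32\\hadoop.dll")));
    }
}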

1.2. Create a new project

The sample project is in src/hadoop

Select a build tool such as Gradle or Maven, and add the following dependencies (the version should correspond to your Hadoop version):

// https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common
compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.2.1'
// https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client
compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.2.1'
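
Note that the compile configuration shown above was removed in Gradle 7; on a current Gradle version the equivalent declaration (a sketch, same artifacts and version as above) is:

dependencies {
    // https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common
    implementation group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.2.1'
    // https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client
    implementation group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.2.1'
}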

Log output configuration: project src/main/resources/log4j.properties

log4j.appender.A1.Encoding=UTF-8
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} | %-5.5p | %-16.16t | %-32.32c{1} | %-32.32C %4L | %m%n

Create a new class org.xiao.hadoop.chapter01.WordCount:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Split the line on whitespace; see "Hadoop: The Definitive Guide" P25
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // Read the configuration files
        Configuration conf = new Configuration();
        // Set the job name
        Job job = Job.getInstance(conf, "WordCount");
        // Set the jar to run by class; TODO: verify setting the jar directly
        job.setJarByClass(WordCount.class);
        // The Mapper class
        job.setMapperClass(WordCountMapper.class);
        // The Combiner is optional; it is an optimization that pre-aggregates the
        // output of each Mapper and so reduces the data each map sends to the
        // Reducer. The final result is the same; see "Hadoop: The Definitive Guide",
        // 3rd Chinese edition, P35.
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        // Set the output key/value types. Hadoop's org.apache.hadoop.io package
        // provides a set of basic types optimized for network serialization;
        // do not use Java's built-in types directly.
        // Text is the equivalent of String
        job.setOutputKeyClass(Text.class);
        // IntWritable is the equivalent of Integer
        job.setOutputValueClass(IntWritable.class);
        // Set the input file path, either hard-coded or passed in as an argument
        FileInputFormat.addInputPath(job, new Path("input/chapter01/WordCount"));
        // Set the output file path
        FileOutputFormat.setOutputPath(job, new Path("output/chapter01/WordCount"));
        // true means print the job and task logs; exit 0 when the run ends normally
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
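
As a concrete trace of what the code does: for the input line "hello hadoop and bigdata", the map phase emits (hello, 1), (hadoop, 1), (and, 1), (bigdata, 1); the shuffle groups the values by key across all map output, and the reducer sums each group, so with the full sample input below, hello becomes (hello, [1, 1, 1, 1]) and is written out as hello 4.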

On Windows with a Gradle project, you also need to open the IDEA settings and set Gradle's "Build and run using" and "Run tests using" options to IntelliJ IDEA, otherwise the log output will be garbled.

Hadoop input file: input/chapter01/WordCount/word.txt

hello world
hello hadoop
hello bigdata
hello hadoop and bigdata

Hadoop output folder: output/chapter01/WordCount. It needs to be deleted before running the program, because Hadoop refuses to write into an existing output directory.

In WordCount.class, press Ctrl + Shift + F10 to run the program directly.

Example output in src/hadoop/output/chapter01/WordCount/part-r-00000, with no errors:

and	1
bigdata	2
hadoop	2
hello	4
world	1

2. IDEA remotely submits MapReduce

Prerequisite: Hadoop installation and configuration are complete.

Update 2020.05.07: tracing the source code shows that this approach only reads files on the cluster; the job is not actually submitted to the cluster. See 2.5, true remote submission.

2.1. On the basis of the previous section, add the following configuration:

File: resources/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>

File: resources/mapred-site.xml

<configuration>
    <property>
        <name>mapred.remote.os</name>
        <value>Linux</value>
        <description>Remote MapReduce framework's OS, can be either Linux or Windows</description>
    </property>
    <property>
        <name>mapreduce.app-submission.cross-platform</name>
        <value>true</value>
    </property>
</configuration>
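
Roughly speaking, these two properties matter because the client here runs on Windows while the cluster runs Linux: with mapreduce.app-submission.cross-platform set to true, the client generates the container launch commands and classpath in a platform-independent form instead of using Windows conventions, which the Linux NodeManagers could not execute.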

hdfs-site.xml and yarn-site.xml can be copied directly from the configuration files on the cluster.

2.2. Install the Big Data Tools plugin

Install IDEA's official Big Data Tools plugin and configure a connection to the HDFS cluster. It makes uploading, downloading and deleting files convenient.

2.3. Modify the code

The Map input file path can be an absolute path or a relative path.

// Read the configuration files; the xml files under resources are read automatically
Configuration conf = new Configuration();
// ... other code omitted ...
// Set the input file path, either hard-coded or passed in as an argument:
// new Path(args[0]) takes it from the program arguments;
// new Path("input") is equivalent to hdfs://master:9000/user/{HADOOP_USER_NAME}/input
FileInputFormat.addInputPath(job, new Path("input"));
// Set the output file path
FileOutputFormat.setOutputPath(job, new Path("output"));
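
If you would rather pass the paths in than hard-code them, here is a minimal sketch of main (assuming the same WordCount classes from section 1.2, with the input and output paths arriving as the first two program arguments, set under Run > Edit Configurations):

// Sketch: take the input and output paths from the program arguments,
// e.g. "input output" in the run configuration.
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "WordCount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(WordCountMapper.class);
    job.setCombinerClass(WordCountReducer.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. "input"
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. "output"
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}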

Upload input/chapter01/WordCount/word.txt to hdfs://master:9000/user/{HADOOP_USER_NAME}/input (HADOOP_USER_NAME is the environment variable set in section 1.1, "Decompress Hadoop and set environment variables").
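
If you prefer not to use the plugin, the upload can also be scripted. A minimal sketch using Hadoop's FileSystem API (the class name is made up; it assumes the core-site.xml above is on the classpath, so relative paths resolve against hdfs://master:9000):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadWordTxt {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml from the classpath, so fs.defaultFS is hdfs://master:9000
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // The relative destination resolves to /user/{HADOOP_USER_NAME}/input/word.txt
        fs.copyFromLocalFile(new Path("input/chapter01/WordCount/word.txt"),
                             new Path("input/word.txt"));
        fs.close();
    }
}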

2.4. Run the project

If the output folder already exists, it needs to be deleted first.

Similarly, press Ctrl + Shift + F10 to run the project; the results are stored in hdfs://master:9000/user/{HADOOP_USER_NAME}/output/part-r-00000.
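
To check the result without leaving the IDE, here is a small sketch (class name made up, same classpath assumptions as above) that streams the output file to the console:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // The relative path resolves to /user/{HADOOP_USER_NAME}/output/part-r-00000
        try (FSDataInputStream in = fs.open(new Path("output/part-r-00000"))) {
            IOUtils.copyBytes(in, System.out, 4096, false); // print the word counts
        }
        fs.close();
    }
}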

2.5. True remote submission

First, use Gradle to package the code into a jar file. Modify the file src/hadoop/build.gradle and add:

Dependencies {/ / omit dependency} / / supports Chinese encoding and annotation tasks.withType (JavaCompile) {options.encoding = "UTF-8"}

Package it into a jar with Gradle by running the jar task in the Gradle tool window; the generated jar file is placed under build/libs.

There are two ways to submit the jar to the remote cluster:

Method 1: in the file src/hadoop/src/main/java/org/xiao/hadoop/chapter01/WordCount.java, where the configuration is read:

// Read the configuration files
Configuration conf = new Configuration();
conf.set("mapreduce.job.jar", "D:/Project/BigDateNotes/src/hadoop/build/libs/hadoop-1.0.0.jar");

Method 2: add the properties to the mapred-site.xml file instead. Note that with either method, mapreduce.framework.name must be set to yarn:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.job.jar</name>
    <value>D:/Project/BigDateNotes/src/hadoop/build/libs/hadoop-1.0.0.jar</value>
</property>

Just submit and run.
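
A way to confirm the job really reached the cluster this time: it should show up in the YARN ResourceManager web UI (port 8088 by default) instead of running in-process through the LocalJobRunner.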

Summary:

winutils.exe and hadoop.dll are required to run Hadoop under Windows

It is recommended to use build tools such as Maven and Gradle to manage Hadoop dependencies

Under Windows, you need to set Gradle's "Build and run using" and "Run tests using" to IntelliJ IDEA (otherwise Chinese comments and terminal output are garbled)

The three configuration properties mapred.remote.os, mapreduce.app-submission.cross-platform and mapreduce.job.jar are required for IDEA remote submission.

This is the answer to the question of how to run a MapReduce program in IDEA. I hope the above content is of some help to you.
