Building a Hadoop 2.7.2 Local Mode Development Environment with Windows 8.1 + Eclipse


This article describes how to set up a hadoop 2.7.2 local mode development environment on Windows 8.1, in preparation for developing MapReduce programs later.

Before setting up the environment, we first choose a development tool: the familiar Eclipse (I use eclipse 4.4.2 here). Eclipse has a hadoop plugin through which we can write MapReduce programs inside Eclipse. However, the plugin may need to be recompiled whenever the hadoop or Eclipse version is upgraded, so it is important to learn how to compile the hadoop plugin for Eclipse before starting development. The plugin is compiled with the ant tool; ant itself is outside the scope of this article.

1. First, obtain the hadoop2x-eclipse-plugin source (cloned here with SourceTree).

1.1. The plugin is hosted on GitHub: https://github.com/winghc/hadoop2x-eclipse-plugin.git

1.2. Unzip the downloaded hadoop2x-eclipse-plugin-master.zip to a local disk.

1.3. Then modify the build.xml file in the F:\Hadoop\eclipsechajian\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin directory.

Since the plugin on GitHub is built against hadoop 2.6, build.xml needs the following changes for version 2.7.2:

Find the <target name="jar"> element in build.xml; it contains a series of <copy> sub-elements. Adjust the existing <copy> entries for 2.7.2, and add two new <copy> elements for the servlet-api and commons-io jars (see the sketch below).

These jars are needed when compiling the eclipse plugin for hadoop 2.7.2; ant will report an error if they are missing, so add them before running the build.
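As a reference, here is a minimal sketch of the two added <copy> elements, assuming the stock layout of the plugin's build.xml and that the jars sit under the local Hadoop install's share\hadoop\common\lib directory; adjust paths and versions to your environment:

<!-- assumed locations: these jars normally live under ${hadoop.home}/share/hadoop/common/lib -->
<copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/>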

1.4. Then find the Bundle-ClassPath attribute (in the <jar> task's manifest section of build.xml) and add the packages just copied, so that they are written into the Bundle-ClassPath of MANIFEST.MF when ant builds the plugin:

lib/servlet-api-${servlet-api.version}.jar
lib/commons-io-${commons-io.version}.jar

Also replace lib/htrace-core-${htrace.version}.jar with lib/htrace-core-${htrace.version}-incubating.jar.
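A rough sketch of what the resulting attribute looks like (abridged; the stock build.xml already lists many other lib/ entries in this value, which stay as they are):

<!-- sketch only: append the two new jars and swap the htrace entry in the existing Bundle-ClassPath value -->
<attribute name="Bundle-ClassPath"
           value="classes/,
 lib/hadoop-mapreduce-client-core-${hadoop.version}.jar,
 ...,
 lib/htrace-core-${htrace.version}-incubating.jar,
 lib/servlet-api-${servlet-api.version}.jar,
 lib/commons-io-${commons-io.version}.jar"/>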

1.5. Modify the file \hadoop2x-eclipse-plugin\src\ivy\libraries.properties, which configures the versions of the jar packages needed by the ant build as well as the hadoop version being built against. Since the downloaded plugin targets hadoop 2.6.0, change the following properties and values to match hadoop 2.7.2 and the jars in the current environment:

hadoop.version=2.7.2
apacheant.version=1.9.7
commons-collections.version=3.2.2
commons-httpclient.version=3.1
commons-logging.version=1.1.3
commons-io.version=2.4
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10

In fact, when ant builds, it picks up the jar versions from the local hadoop 2.7.2 installation (\hadoop-2.7.2\share\hadoop\common), so the version numbers only need to be changed to match the jars actually present there.

1.6. Finally, modify the file \hadoop2x-eclipse-plugin\ivy\libraries.properties. Its versions are the same as in the file above, but one more version needs to be changed:

htrace.version should be changed to 3.1.0.

Then cd to the F:\Hadoop\eclipsechajian\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin directory

Execute the following command:

ant jar -Dversion=2.7.2 -Declipse.home=D:\eclipse_hadoop -Dhadoop.home=F:\Hadoop\hadoop-2.7.2

To explain the command: -Dversion is the plugin version, -Declipse.home is the Eclipse installation directory, and -Dhadoop.home is the local hadoop-2.7.2 installation directory.

After the command completes successfully, the hadoop-eclipse-plugin-2.7.2.jar package can be found in the \hadoop2x-eclipse-plugin\build\contrib\eclipse-plugin directory. This is the compiled eclipse plugin for hadoop 2.7.2. Put it into the plugins directory of the Eclipse installation; then in Eclipse you will find a MapReduce view and can start trying to write MapReduce programs.

Download eclipse and configure JDK

Go to http://www.eclipse.org/downloads/ and download the version you need; here we download the 64-bit Windows version. Extract it directly to a directory, make a few simple settings, and choose the JDK version according to your development needs.

1.9. Set up the hadoop plugin

Select Window > Preferences from the Eclipse menu to open the settings dialog.

At this point the Eclipse development environment is complete. Next we set up the hadoop runtime environment, since the hadoop project needs to submit programs to a running hadoop environment.

2. With the Eclipse plugin compiled, you need to install Hadoop 2.7.2

Setting up a hadoop environment is normally fairly troublesome and requires a virtual machine or Cygwin. However, after checking the official documentation and some experimentation, we built local mode directly on Windows with no virtual machine or Cygwin dependency; the official site clearly states that Cygwin no longer supports hadoop 2.x.

Reference for building the Hadoop local mode runtime environment on Windows: http://wiki.apache.org/hadoop/Hadoop2OnWindows

Configure the windows environment as follows:

2.1. Java JDK: I use 1.8. Configure JAVA_HOME. A default installation goes to C:\Program Files\Java\jdk1.8.0_51; this directory contains a space, so hadoop reports an error on startup saying JAVA_HOME is incorrect. In that case, change the JAVA_HOME environment variable to C:\Progra~1\Java\jdk1.8.0_51 (Program Files can be written as Progra~1).

2.2. Hadoop environment variables: create a new HADOOP_HOME variable pointing to the hadoop extraction directory, e.g. F:\Hadoop\hadoop-2.7.2, then append %HADOOP_HOME%\bin; to the Path environment variable.

2.3. Hadoop dependency libraries: winutils. Files such as winutils.exe and hadoop.dll are required for hadoop to run on Windows. Download address: http://download.csdn.net/detail/fly_leopard/9503059

Note that files such as hadoop.dll must match your hadoop version so they do not conflict. To avoid dependency errors, also copy hadoop.dll into C:\Windows\System32 and then restart the computer.

2.4. Hadoop environment test:

Open a cmd window, switch to hadoop-2.7.2\bin, and run the hadoop version command; it should print the hadoop version information.

2.5. Basic hadoop configuration: the configuration files are located under hadoop-2.7.2\etc\hadoop. Four files need to be edited: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.

core-site.xml:

<configuration>
  <property><name>fs.default.name</name><value>hdfs://0.0.0.0:19000</value></property>
</configuration>

hdfs-site.xml:

<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>dfs.namenode.name.dir</name><value>file:/Hadoop/hadoop-2.7.2/data/dfs/namenode</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:/Hadoop/hadoop-2.7.2/data/dfs/datanode</value></property>
</configuration>

mapred-site.xml:

<configuration>
  <property><name>mapreduce.job.user.name</name><value>%USERNAME%</value></property>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>yarn.apps.stagingDir</name><value>/user/%USERNAME%/staging</value></property>
  <property><name>mapreduce.jobtracker.address</name><value>local</value></property>
</configuration>

Here %USERNAME% is the Windows user name under which you run hadoop.

yarn-site.xml:

<configuration>
  <property><name>yarn.server.resourcemanager.address</name><value>0.0.0.0:8020</value></property>
  <property><name>yarn.server.resourcemanager.application.expiry.interval</name><value>60000</value></property>
  <property><name>yarn.server.nodemanager.address</name><value>0.0.0.0:45454</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
  <property><name>yarn.server.nodemanager.remote-app-log-dir</name><value>/app-logs</value></property>
  <property><name>yarn.nodemanager.log-dirs</name><value>/dep/logs/userlogs</value></property>
  <property><name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name><value>0.0.0.0</value></property>
  <property><name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name><value>0.0.0.0</value></property>
  <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
  <property><name>yarn.log-aggregation.retain-seconds</name><value>-1</value></property>
  <property><name>yarn.application.classpath</name><value>%HADOOP_CONF_DIR%,%HADOOP_HOME%/share/hadoop/common/*,%HADOOP_HOME%/share/hadoop/common/lib/*,%HADOOP_HOME%/share/hadoop/hdfs/*,%HADOOP_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_HOME%/share/hadoop/mapreduce/*,%HADOOP_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_HOME%/share/hadoop/yarn/*,%HADOOP_HOME%/share/hadoop/yarn/lib/*</value></property>
</configuration>

Here %HADOOP_CONF_DIR% is the hadoop configuration directory (hadoop-2.7.2\etc\hadoop). The path configured in yarn.nodemanager.log-dirs is created on the same drive as your hadoop installation; for example, my hadoop is on drive F:, so this directory gets created on F:.

2.6. Format the file system:

Execute hdfs namenode -format under hadoop-2.7.2\bin

Wait for it to complete. Do not format repeatedly, as that tends to cause exceptions.

2.7. After formatting, execute start-dfs.cmd under hadoop-2.7.2\sbin to start hadoop.

Visit: http://localhost:50070

Execute start-yarn.cmd under hadoop-2.7.2\sbin to start yarn, then visit http://localhost:8088 to view resource and node management.

This means that the hadoop2.7.2 runtime environment has been built.

3. Create an MR project in Eclipse and develop against hadoop local mode using the local file system

I am developing in Eclipse against the local file system rather than HDFS, which is only really used in a fully distributed environment, so HDFS is not covered here. There is also not much to say about DFS Locations in Eclipse (it does not affect development); as I currently understand it, it is used to browse the HDFS file system on a cluster. In any case, I tried using it to connect to the hadoop (local mode) started on the local Windows 8.1 machine and never got it to work, getting the following error:

java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:635)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at org.apache.hadoop.eclipse.server.HadoopServer.getDFS(HadoopServer.java:478)
at org.apache.hadoop.eclipse.dfs.DFSPath.getDFS(DFSPath.java:146)
at org.apache.hadoop.eclipse.dfs.DFSFolder.loadDFSFolderChildren(DFSFolder.java:61)
at org.apache.hadoop.eclipse.dfs.DFSFolder$1.run(DFSFolder.java:178)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

This issue has since been resolved: it was caused by missing jar packages, and three jar packages need to be placed in the $eclipse_home\plugins\ directory.

All right, let's move on to developing hadoop programs with Eclipse.

3.1. With the environment above in place, let's look at how to develop. We use hadoop's WordCount to test it.

Create a mr project

Set the project name

Create a class

Set class properties

After creation, copy the contents of the WordCount.java file in the hadoop-2.7.2-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples directory into the file you just created.

3.2. Next, set up the configuration environment.

Create a Source Folder named resources in the project, then copy all the configuration files under F:\Hadoop\hadoop-2.7.2\etc\hadoop into that directory.

3.3. Running the WordCount program

With the above done, the configuration of the development environment is complete; now try a run to see whether it succeeds.

The key point is the input and output paths configured for WordCount: since local mode here uses the local file system rather than HDFS, these paths use file:/// instead of hdfs:// (this needs special attention).
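For example, the program arguments in the Eclipse run configuration might look like the line below (hypothetical paths used purely for illustration; substitute your own directories, and note that the output directory must not already exist):

file:///F:/Hadoop/input file:///F:/Hadoop/output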

Then click the Run button and hadoop will run.

The operation is successful when the following occurs:

16-09-15 22:18:37 INFO client.RMProxy: Connecting to ResourceManager at / 0.0.0.0:8032

16-09-15 22:18:39 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar (String).

16-09-15 22:18:39 INFO input.FileInputFormat: Total input paths to process: 2

16-09-15 22:18:40 INFO mapreduce.JobSubmitter: number of splits:2

16-09-15 22:18:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1473949101198_0001

16-09-15 22:18:41 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.

16-09-15 22:18:41 INFO impl.YarnClientImpl: Submitted application application_1473949101198_0001

16-09-15 22:18:41 INFO mapreduce.Job: The url to track the job: http://Lenovo-PC:8088/proxy/application_1473949101198_0001/

16-09-15 22:18:41 INFO mapreduce.Job: Running job: job_1473949101198_0001

16-09-15 22:18:53 INFO mapreduce.Job: Job job_1473949101198_0001 running in uber mode: false

16-09-15 22:18:53 INFO mapreduce.Job: map 0% reduce 0%

16-09-15 22:19:03 INFO mapreduce.Job: map 100% reduce 0%

16-09-15 22:19:10 INFO mapreduce.Job: map 100% reduce 100%

16-09-15 22:19:11 INFO mapreduce.Job: Job job_1473949101198_0001 completed successfully

16-09-15 22:19:12 INFO mapreduce.Job: Counters: 50

File System Counters

FILE: Number of bytes read=119

FILE: Number of bytes written=359444

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=194

HDFS: Number of bytes written=0

HDFS: Number of read operations=2

HDFS: Number of large read operations=0

HDFS: Number of write operations=0

Job Counters

Killed map tasks=1

Launched map tasks=2

Launched reduce tasks=1

Rack-local map tasks=2

Total time spent by all maps in occupied slots (ms) = 12156

Total time spent by all reduces in occupied slots (ms) = 4734

Total time spent by all map tasks (ms) = 12156

Total time spent by all reduce tasks (ms) = 4734

Total vcore-milliseconds taken by all map tasks=12156

Total vcore-milliseconds taken by all reduce tasks=4734

Total megabyte-milliseconds taken by all map tasks=12447744

Total megabyte-milliseconds taken by all reduce tasks=4847616

Map-Reduce Framework

Map input records=2

Map output records=8

Map output bytes=78

Map output materialized bytes=81

Input split bytes=194

Combine input records=8

Combine output records=6

Reduce input groups=4

Reduce shuffle bytes=81

Reduce input records=6

Reduce output records=4

Spilled Records=12

Shuffled Maps = 2

Failed Shuffles=0

Merged Map outputs=2

GC time elapsed (ms) = 187

CPU time spent (ms) = 1733

Physical memory (bytes) snapshot=630702080

Virtual memory (bytes) snapshot=834060288

Total committed heap usage (bytes) = 484966400

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=44

File Output Format Counters

Bytes Written=43

Then view the run results in the output path (the output path configured in the run):

The following problems may occur during operation:

1) Problem 1:

16-09-15 22:12:08 INFO client.RMProxy: Connecting to ResourceManager at / 0.0.0.0:8032

Exception in thread "main" java.net.ConnectException: Call From Lenovo-PC/192.168.1.105 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:144)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 27 more

The above problem occurs because the port in the project's copy of core-site.xml does not match the one in the locally installed hadoop's core-site.xml; modify them so they are consistent.
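As a quick illustration (assuming the 19000 port used in the core-site.xml shown earlier), the fs.default.name value in the project's resources copy must match the locally installed configuration exactly:

<!-- the value below assumes the 19000 port configured above; use whatever port your local core-site.xml actually specifies -->
<property><name>fs.default.name</name><value>hdfs://0.0.0.0:19000</value></property>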

2) Problem 2:

16-09-15 22:14:45 INFO client.RMProxy: Connecting to ResourceManager at / 0.0.0.0:8032

16-09-15 22:14:48 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time (s); retry policy is RetryUpToMaximumCountWithFixedSleep (maxRetries=10, sleepTime=1000 MILLISECONDS)

16-09-15 22:14:50 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time (s); retry policy is RetryUpToMaximumCountWithFixedSleep (maxRetries=10, sleepTime=1000 MILLISECONDS)

16-09-15 22:14:52 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time (s); retry policy is RetryUpToMaximumCountWithFixedSleep (maxRetries=10, sleepTime=1000 MILLISECONDS)

16-09-15 22:14:54 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time (s); retry policy is RetryUpToMaximumCountWithFixedSleep (maxRetries=10, sleepTime=1000 MILLISECONDS)

If the above appears, yarn has not been started; start yarn (run start-yarn.cmd under hadoop-2.7.2\sbin, as in step 2.7).

3) Problem 3:

16-09-15 22:16:00 INFO client.RMProxy: Connecting to ResourceManager at / 0.0.0.0:8032

16-09-15 22:16:02 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar (String).

16-09-15 22:16:02 INFO input.FileInputFormat: Total input paths to process: 2

16-09-15 22:16:03 INFO mapreduce.JobSubmitter: number of splits:2

16-09-15 22:16:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1473948945298_0001

16-09-15 22:16:04 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.

16-09-15 22:16:04 INFO impl.YarnClientImpl: Submitted application application_1473948945298_0001

16-09-15 22:16:04 INFO mapreduce.Job: The url to track the job: http://Lenovo-PC:8088/proxy/application_1473948945298_0001/

16-09-15 22:16:04 INFO mapreduce.Job: Running job: job_1473948945298_0001

16-09-15 22:16:08 INFO mapreduce.Job: Job job_1473948945298_0001 running in uber mode: false

16-09-15 22:16:08 INFO mapreduce.Job: map 0% reduce 0%

16-09-15 22:16:08 INFO mapreduce.Job: Job job_1473948945298_0001 failed with state FAILED due to: Application application_1473948945298_0001 failed 2 times due to AM Container for appattempt_1473948945298_0001_000002 exited with exitCode: -1000

For more detailed output, check application tracking page: http://Lenovo-PC:8088/cluster/app/application_1473948945298_0001 Then, click on links to logs of each attempt.

Diagnostics: Could not find any valid local directory for nmPrivate/container_1473948945298_0001_02_000001.tokens

Failing this attempt. Failing the application.

16-09-15 22:16:08 INFO mapreduce.Job: Counters: 0

If the above problem occurs, it means hadoop was not started with administrator privileges; start hadoop as administrator.
