The following describes how to build a local-mode development environment for hadoop 2.7.2 on Windows 8.1, as preparation for later mapreduce development.
Before building the development environment, we first choose the development tool, namely the familiar Eclipse (I use eclipse 4.4.2 this time). Eclipse has a plug-in for hadoop, through which mapreduce programs can be written directly in eclipse. However, this plug-in usually has to be recompiled whenever the hadoop or eclipse version changes, so learning to compile the hadoop plug-in for eclipse is an important step before development. The plug-in is compiled with the ant tool; ant itself is outside the scope of this introduction.
1. First obtain the hadoop2x-eclipse-plugin plug-in (I fetched it with sourcetree).
1.1. The plug-in is hosted on github: https://github.com/winghc/hadoop2x-eclipse-plugin.git
1.2. Decompress the downloaded plug-in hadoop2x-eclipse-plugin-master.zip onto a local disk (here F:\Hadoop\eclipsechajian).
1.3. Then modify the build.xml file in the F:\Hadoop\eclipsechajian\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin directory.
Since the version on github is compiled against hadoop 2.6, build.xml needs the following modifications for 2.7.2:
Find the jar-packaging target (<target name="jar">); under it there is a group of <copy> sub-elements that copy the dependent jar packages from the hadoop installation into the plug-in's lib directory. Adjust that block to the 2.7.2 jar names, and add two new <copy> elements for the servlet-api and commons-io jars, as sketched after the next paragraph.
The above jar packages are needed when compiling the eclipse plug-in for hadoop 2.7.2; if they are not added, the build reports an error, so we add them before running ant.
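A sketch of the two added <copy> elements (an illustration assuming the stock build.xml of hadoop2x-eclipse-plugin, whose existing <copy> tasks pull jars from ${hadoop.home}/share/hadoop/common/lib into ${build.dir}/lib; adjust the source paths if your hadoop layout differs):

    <!-- sketch: copy the two extra dependencies into the plug-in's lib directory -->
    <copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar"
          todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar"
          todir="${build.dir}/lib" verbose="true"/>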
1.4. Then find the Bundle-ClassPath attribute (the jar target writes it into the MANIFEST.MF file when ant builds the plug-in) and add the packages that were just copied:
lib/servlet-api-${servlet-api.version}.jar
lib/commons-io-${commons-io.version}.jar
and replace lib/htrace-core-${htrace.version}.jar with lib/htrace-core-${htrace.version}-incubating.jar
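A sketch of the relevant part of the jar target after the edit (a hedged illustration assuming the stock hadoop2x-eclipse-plugin build.xml; only the entries touched by this change are shown, and the many existing lib/... entries in the value are kept unchanged):

    <manifest>
      <!-- sketch: the two new entries plus the renamed htrace jar in Bundle-ClassPath -->
      <attribute name="Bundle-ClassPath"
                 value="classes/,
 lib/servlet-api-${servlet-api.version}.jar,
 lib/commons-io-${commons-io.version}.jar,
 lib/htrace-core-${htrace.version}-incubating.jar"/>
    </manifest>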
1.5. Modify the file \hadoop2x-eclipse-plugin\src\ivy\libraries.properties, which configures the versions of the jar packages needed by the ant build, as well as the hadoop version being built against. Since the downloaded plug-in is set up to compile against hadoop 2.6.0, the following properties need to be changed so that they correspond to hadoop 2.7.2 and to the jar packages of the current environment:
hadoop.version=2.7.2
apacheant.version=1.9.7
commons-collections.version=3.2.2
commons-httpclient.version=3.1
commons-logging.version=1.1.3
commons-io.version=2.4
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10
In fact, the ant build picks up the jar packages from the local hadoop 2.7.2 installation (hadoop-2.7.2\share\hadoop\common and its lib subdirectory), so the version numbers above only need to be changed to match the jars that are actually there.
1.6. Finally, modify the file \hadoop2x-eclipse-plugin\ivy\libraries.properties. The versions in this file are the same as above, but there is one more that needs to be modified: htrace.version should be changed to 3.1.0.
Then cd to the F:\Hadoop\eclipsechajian\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin directory and execute the following command:
ant jar -Dversion=2.7.2 -Declipse.home=D:\eclipse_hadoop -Dhadoop.home=F:\Hadoop\hadoop-2.7.2
To explain this command: -Dversion specifies the version of the plug-in, -Declipse.home points to the eclipse installation directory, and -Dhadoop.home points to the local hadoop-2.7.2 installation directory.
After the command completes successfully, the hadoop-eclipse-plugin-2.7.2.jar package can be found in the \hadoop2x-eclipse-plugin\build\contrib\eclipse-plugin directory. This is the compiled eclipse plug-in for hadoop 2.7.2. Copy it into the plugins directory of the eclipse installation; after restarting eclipse, a Map/Reduce view becomes available and we can start trying to write mapreduce programs.
Download eclipse and configure the JDK
Go to http://www.eclipse.org/downloads/ and download the version you need; here we download the 64-bit Windows version. Extract it directly into a directory, make the basic settings, and choose the JDK version according to your development needs.
1.9. Set up the hadoop plug-in
Select Window > Preferences from the eclipse menu to open the settings dialog, and point the Hadoop Map/Reduce entry at the local hadoop installation directory.
At this point the Eclipse development environment is complete. Next, the hadoop running environment is built; a hadoop project needs to submit its programs to a running hadoop environment.
2. After the Eclipse plug-in has been compiled, Hadoop 2.7.2 needs to be installed.
Building a hadoop environment is relatively troublesome and normally requires a virtual machine or cygwin. However, by checking the official documentation and experimenting, a local mode can be built directly on windows, with no virtual machine or cygwin dependency; the official site explicitly states that cygwin is no longer supported for hadoop 2.x.
Building the Hadoop local-mode running environment under Windows follows this reference: http://wiki.apache.org/hadoop/Hadoop2OnWindows
Configure the windows environment as follows:
2.1. Java JDK: I use 1.8. Configure JAVA_HOME. With a default installation the JDK goes into C:\Program Files\Java\jdk1.8.0_51; because of the space in this path, hadoop reports a JAVA_HOME-is-incorrect error on startup. In that case change the JAVA_HOME environment variable value to C:\Progra~1\Java\jdk1.8.0_51 (Program Files can be replaced by Progra~1).
2.2. Hadoop environment variables: create a new HADOOP_HOME variable pointing to the hadoop decompression directory, such as F:\Hadoop\hadoop-2.7.2, then append %HADOOP_HOME%\bin; to the Path environment variable.
2.3. Hadoop dependent libraries: winutils and related files. hadoop needs winutils.exe, hadoop.dll and other native files to run on windows. Download address: http://download.csdn.net/detail/fly_leopard/9503059
Note that hadoop.dll and the other files must match the hadoop version in use. To avoid dependency errors, also copy hadoop.dll into C:\Windows\System32, and then restart the computer.
2.4. Hadoop environment test:
Open a cmd window, switch to hadoop-2.7.2\bin, and execute the hadoop version command; the version information is printed.
2.5. Hadoop basic file configuration: the hadoop configuration files are located under hadoop-2.7.2\etc\hadoop:
core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/Hadoop/hadoop-2.7.2/data/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/Hadoop/hadoop-2.7.2/data/dfs/datanode</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.job.user.name</name>
    <value>%USERNAME%</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.apps.stagingDir</name>
    <value>/user/%USERNAME%/staging</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>
</configuration>
Where %USERNAME% is the windows user name under which hadoop is run.
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.server.resourcemanager.address</name>
    <value>0.0.0.0:8020</value>
  </property>
  <property>
    <name>yarn.server.resourcemanager.application.expiry.interval</name>
    <value>60000</value>
  </property>
  <property>
    <name>yarn.server.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.server.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/dep/logs/userlogs</value>
  </property>
  <property>
    <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>%HADOOP_CONF_DIR%,%HADOOP_HOME%/share/hadoop/common/*,%HADOOP_HOME%/share/hadoop/common/lib/*,%HADOOP_HOME%/share/hadoop/hdfs/*,%HADOOP_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_HOME%/share/hadoop/mapreduce/*,%HADOOP_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_HOME%/share/hadoop/yarn/*,%HADOOP_HOME%/share/hadoop/yarn/lib/*</value>
  </property>
</configuration>
Where %HADOOP_CONF_DIR% is the hadoop configuration directory (etc\hadoop under the installation path). The path in the yarn.nodemanager.log-dirs configuration item is created on the same drive as your hadoop installation; for example, my hadoop is on drive F, so that directory is created on drive F.
2.6. Format the file system:
Execute hdfs namenode -format under hadoop-2.7.2\bin.
Wait for the execution to complete. Do not repeat the format; doing it again easily causes exceptions.
2.7. After formatting, execute start-dfs.cmd under hadoop-2.7.2\sbin to start hadoop, then visit http://localhost:50070.
Execute start-yarn.cmd under hadoop-2.7.2\sbin to start yarn, then visit http://localhost:8088 to view resource and node management.
At this point the hadoop 2.7.2 running environment has been built.
3. Create an MR project in Eclipse and develop in hadoop local mode against the local file system.
I develop with Eclipse against the local file system rather than HDFS; in local mode HDFS is not used the way it is in a fully distributed environment, so it is not introduced here. There are also not many articles about the plug-in's DFS Locations view (this does not affect development); as I understand it so far, it is used to browse the HDFS file system on a cluster. In any case, I tried to use it to connect to the hadoop (local mode) started on the local windows 8.1 machine and never succeeded; it reported the following error:
java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:635)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
    at org.apache.hadoop.eclipse.server.HadoopServer.getDFS(HadoopServer.java:478)
    at org.apache.hadoop.eclipse.dfs.DFSPath.getDFS(DFSPath.java:146)
    at org.apache.hadoop.eclipse.dfs.DFSFolder.loadDFSFolderChildren(DFSFolder.java:61)
    at org.apache.hadoop.eclipse.dfs.DFSFolder$1.run(DFSFolder.java:178)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
This problem was eventually resolved: it is caused by missing jar packages in the plug-in; the three required jars need to be placed in the $eclipse_home\plugins\ directory.
All right, let's move on to developing hadoop with Eclipse.
3.1. With the above environment built, let's look at how to develop; we use hadoop's wordcount example as the test.
Create a mr project
Set the project name
Create a class
Set class properties
After the class is created, copy the contents of the WordCount.java file in the hadoop-2.7.2-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples directory into the file you just created.
3.2. Next, create the configuration environment:
Create a Source Folder named resources in the project, and copy all the configuration files under F:\Hadoop\hadoop-2.7.2\etc\hadoop into that directory.
3.3. Run the WordCount program
With the above done, the development environment configuration is complete; now check whether a run succeeds.
In the run configuration (the red circle in the screenshot marks the key point), the input and output paths of wordcount are configured. Because local mode uses the local file system instead of HDFS, these paths use file:/// rather than hdfs:// (this needs special attention), for example something like file:///F:/Hadoop/input and file:///F:/Hadoop/output.
Then click the Run button and hadoop will run.
The run is successful when output like the following appears:
16-09-15 22:18:37 INFO client.RMProxy: Connecting to ResourceManager at / 0.0.0.0:8032
16-09-15 22:18:39 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar (String).
16-09-15 22:18:39 INFO input.FileInputFormat: Total input paths to process: 2
16-09-15 22:18:40 INFO mapreduce.JobSubmitter: number of splits:2
16-09-15 22:18:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1473949101198_0001
16-09-15 22:18:41 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16-09-15 22:18:41 INFO impl.YarnClientImpl: Submitted application application_1473949101198_0001
16-09-15 22:18:41 INFO mapreduce.Job: The url to track the job: http://Lenovo-PC:8088/proxy/application_1473949101198_0001/
16-09-15 22:18:41 INFO mapreduce.Job: Running job: job_1473949101198_0001
16-09-15 22:18:53 INFO mapreduce.Job: Job job_1473949101198_0001 running in uber mode: false
16-09-15 22:18:53 INFO mapreduce.Job: map 0% reduce 0%
16-09-15 22:19:03 INFO mapreduce.Job: map 100% reduce 0%
16-09-15 22:19:10 INFO mapreduce.Job: map 100% reduce 100%
16-09-15 22:19:11 INFO mapreduce.Job: Job job_1473949101198_0001 completed successfully
16-09-15 22:19:12 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=119
FILE: Number of bytes written=359444
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=194
HDFS: Number of bytes written=0
HDFS: Number of read operations=2
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Killed map tasks=1
Launched map tasks=2
Launched reduce tasks=1
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms) = 12156
Total time spent by all reduces in occupied slots (ms) = 4734
Total time spent by all map tasks (ms) = 12156
Total time spent by all reduce tasks (ms) = 4734
Total vcore-milliseconds taken by all map tasks=12156
Total vcore-milliseconds taken by all reduce tasks=4734
Total megabyte-milliseconds taken by all map tasks=12447744
Total megabyte-milliseconds taken by all reduce tasks=4847616
Map-Reduce Framework
Map input records=2
Map output records=8
Map output bytes=78
Map output materialized bytes=81
Input split bytes=194
Combine input records=8
Combine output records=6
Reduce input groups=4
Reduce shuffle bytes=81
Reduce input records=6
Reduce output records=4
Spilled Records=12
Shuffled Maps = 2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms) = 187
CPU time spent (ms) = 1733
Physical memory (bytes) snapshot=630702080
Virtual memory (bytes) snapshot=834060288
Total committed heap usage (bytes) = 484966400
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=44
File Output Format Counters
Bytes Written=43
Then view the run results in the output path (the output path configured in the run configuration):
The following problems may occur during the run:
1) Problem 1:
16-09-15 22:12:08 INFO client.RMProxy: Connecting to ResourceManager at / 0.0.0.0:8032
Exception in thread "main" java.net.ConnectException: Call From Lenovo-PC/192.168.1.105 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:144)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 27 more
The above problem occurs because the port in the project's core-site.xml is inconsistent with the port in the core-site.xml of the locally installed hadoop; modify them so they match, as illustrated in the sketch below.
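For example, in the core-site.xml copied into the project's resources folder, the fs.default.name entry must carry the same address and port as the local etc\hadoop\core-site.xml (19000 in the configuration shown earlier; a mismatched port such as 9000 produces exactly the error above):

  <!-- must match the value in hadoop-2.7.2\etc\hadoop\core-site.xml -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>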
2) Problem 2:
16-09-15 22:14:45 INFO client.RMProxy: Connecting to ResourceManager at / 0.0.0.0:8032
16-09-15 22:14:48 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time (s); retry policy is RetryUpToMaximumCountWithFixedSleep (maxRetries=10, sleepTime=1000 MILLISECONDS)
16-09-15 22:14:50 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time (s); retry policy is RetryUpToMaximumCountWithFixedSleep (maxRetries=10, sleepTime=1000 MILLISECONDS)
16-09-15 22:14:52 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time (s); retry policy is RetryUpToMaximumCountWithFixedSleep (maxRetries=10, sleepTime=1000 MILLISECONDS)
16-09-15 22:14:54 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time (s); retry policy is RetryUpToMaximumCountWithFixedSleep (maxRetries=10, sleepTime=1000 MILLISECONDS)
If the above problem appears, it indicates that yarn has not been started; start yarn with start-yarn.cmd.
3) Problem 3:
16-09-15 22:16:00 INFO client.RMProxy: Connecting to ResourceManager at / 0.0.0.0:8032
16-09-15 22:16:02 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar (String).
16-09-15 22:16:02 INFO input.FileInputFormat: Total input paths to process: 2
16-09-15 22:16:03 INFO mapreduce.JobSubmitter: number of splits:2
16-09-15 22:16:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1473948945298_0001
16-09-15 22:16:04 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16-09-15 22:16:04 INFO impl.YarnClientImpl: Submitted application application_1473948945298_0001
16-09-15 22:16:04 INFO mapreduce.Job: The url to track the job: http://Lenovo-PC:8088/proxy/application_1473948945298_0001/
16-09-15 22:16:04 INFO mapreduce.Job: Running job: job_1473948945298_0001
16-09-15 22:16:08 INFO mapreduce.Job: Job job_1473948945298_0001 running in uber mode: false
16-09-15 22:16:08 INFO mapreduce.Job: map 0% reduce 0%
16-09-15 22:16:08 INFO mapreduce.Job: Job job_1473948945298_0001 failed with state FAILED due to: Application application_1473948945298_0001 failed 2 times due to AM Container for appattempt_1473948945298_0001_000002 exited with exitCode:-1000
For more detailed output, check application tracking page: http://Lenovo-PC:8088/cluster/app/application_1473948945298_0001 Then, click on links to logs of each attempt.
Diagnostics: Could not find any valid local directory for nmPrivate/container_1473948945298_0001_02_000001.tokens
Failing this attempt. Failing the application.
16-09-15 22:16:08 INFO mapreduce.Job: Counters: 0
If the above problem occurs, it means hadoop was not started with administrator privileges; please start hadoop as administrator.