
Deployment and testing of hadoop2.6.5+sqoop1.4.6 Environment (2)


First, let's look at the roles played by the four VMs in the cluster:

IP          hostname         role in the hadoop cluster
10.0.1.100  hadoop-test-nn   NameNode, ResourceManager
10.0.1.101  hadoop-test-snn  SecondaryNameNode
10.0.1.102  hadoop-test-dn1  DataNode, NodeManager
10.0.1.103  hadoop-test-dn2  DataNode, NodeManager
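If these hostnames are not resolvable through DNS, each node also needs matching entries in /etc/hosts (assumed to be in place already; a minimal sketch of what they would look like):

10.0.1.100 hadoop-test-nn
10.0.1.101 hadoop-test-snn
10.0.1.102 hadoop-test-dn1
10.0.1.103 hadoop-test-dn2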

1. Extract the hadoop-2.6.5.tar.gz obtained earlier into /usr/local/ and create a /usr/local/hadoop soft link:

mv hadoop-2.6.5.tar.gz /usr/local/
cd /usr/local/
tar -xvf hadoop-2.6.5.tar.gz
ln -s /usr/local/hadoop-2.6.5 /usr/local/hadoop

2. Change the owner and group of /usr/local/hadoop and /usr/local/hadoop-2.6.5 to hadoop so that the hadoop user can use them:

chown -R hadoop:hadoop /usr/local/hadoop-2.6.5
chown -R hadoop:hadoop /usr/local/hadoop

3. For ease of use, configure the HADOOP_HOME variable and extend the PATH variable by adding the following lines to /etc/profile:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
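After editing /etc/profile, reload it and make a quick sanity check that the variables took effect (hadoop version is a standard subcommand; the JDK path comes from this article's earlier setup):

source /etc/profile
echo $HADOOP_HOME
hadoop version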

4. Hadoop's configuration files are stored in the $HADOOP_HOME/etc/hadoop/ directory. The environment is built by modifying the properties in the files in this directory:

1) Modify the hadoop-env.sh script to set the JAVA_HOME variable:

# in hadoop-env.sh, comment out the original line and set JAVA_HOME explicitly
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/local/java/jdk1.7.0_45

2) Create a masters file, which specifies which hosts play the SecondaryNameNode role, and add the hostname of the SecondaryNameNode to it:

# add the following line to masters
hadoop-test-snn

3) Create a slaves file, which specifies which hosts play the DataNode role, and add the hostnames of the DataNodes to it:

# add the following lines to slaves
hadoop-test-dn1
hadoop-test-dn2

4) Modify core-site.xml to set the HDFS URL and the HDFS temporary file directory:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-test-nn:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/dfs/tmp</value>
    </property>
</configuration>

5) Modify hdfs-site.xml to configure the properties related to HDFS, the NameNode, and the DataNodes:

<configuration>
    <property>
        <name>dfs.http.address</name>
        <value>hadoop-test-nn:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-test-snn:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:50020</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50075</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

Property descriptions:

dfs.http.address: the address of the NameNode web UI; listens on port 50070 by default.

dfs.namenode.secondary.http-address: the address of the SecondaryNameNode web UI; listens on port 50090 by default.

dfs.namenode.name.dir: the local filesystem directory where the NameNode stores its metadata.

dfs.datanode.data.dir: the local filesystem directory where each DataNode stores its data blocks.

dfs.datanode.ipc.address: the DataNode's IPC listening address; the DataNode reports to the NameNode through heartbeats on this port.

dfs.datanode.http.address: the address of the DataNode web UI; listens on port 50075 by default.

dfs.replication: the number of replicas kept of each block on HDFS.
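Once the cluster is running, a quick way to confirm that a property was picked up is hdfs getconf (a standard subcommand; shown here with two of the keys configured above):

hdfs getconf -confKey dfs.replication
hdfs getconf -confKey dfs.namenode.secondary.http-address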

6) Modify mapred-site.xml so that MapReduce uses the YARN framework:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
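Note: the Hadoop 2.6.5 distribution ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template first:

cd $HADOOP_HOME/etc/hadoop
cp mapred-site.xml.template mapred-site.xml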

7) Since the YARN framework is used, its related properties must also be configured. Make the following modifications in yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-test-nn</value>
    </property>
    <property>
        <description>The address of the applications manager interface</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8040</value>
    </property>
    <property>
        <description>The address of the scheduler interface</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8025</value>
    </property>
</configuration>

Property descriptions:

yarn.resourcemanager.hostname: the hostname of the node where the ResourceManager runs.

yarn.nodemanager.aux-services: the auxiliary service run on each NodeManager. When set to mapreduce_shuffle, MapReduce programs can pass map task output to the reduce tasks.

yarn.resourcemanager.address: the port through which applications are submitted to the ResourceManager; listens on port 8032 by default (this article changes it to 8040).

yarn.resourcemanager.scheduler.address: the scheduler interface address exposed by the ResourceManager; also the address entered in the Map/Reduce Master field when configuring a MapReduce location in Eclipse. Listens on port 8030 by default.

yarn.resourcemanager.webapp.address: the address of the ResourceManager web UI; listens on port 8088 by default.

yarn.resourcemanager.resource-tracker.address: the port through which NodeManagers report node and task status to the ResourceManager so it can track them; listens on port 8031 by default (this article changes it to 8025).

There are other properties as well, such as yarn.resourcemanager.admin.address, the address used to send administrative commands, and yarn.resourcemanager.resource-tracker.client.thread-count, the number of handler threads for processing RPC requests; these can be added to the configuration file if needed.
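For example, the administrative interface could be pinned the same way as the other addresses (a sketch following the pattern above; 8033 is the usual default port for this property):

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>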

8) Copy the modified configuration files to each node:

scp core-site.xml hdfs-site.xml mapred-site.xml masters slaves yarn-site.xml hadoop-test-snn:/usr/local/hadoop/etc/hadoop/
scp core-site.xml hdfs-site.xml mapred-site.xml masters slaves yarn-site.xml hadoop-test-dn1:/usr/local/hadoop/etc/hadoop/
scp core-site.xml hdfs-site.xml mapred-site.xml masters slaves yarn-site.xml hadoop-test-dn2:/usr/local/hadoop/etc/hadoop/
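Assuming passwordless SSH between the nodes is already set up (the cluster scripts need it anyway), a quick checksum comparison can confirm the copies match:

for h in hadoop-test-snn hadoop-test-dn1 hadoop-test-dn2; do
    ssh "$h" md5sum /usr/local/hadoop/etc/hadoop/core-site.xml
done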

9) Format the NameNode. Before HDFS is used for the first time, the NameNode must be formatted. The paths specified by the properties ending in dir in hdfs-site.xml are absolute paths on the local filesystem; as long as the hadoop user has full control over their parent directory, the directories themselves are created automatically when HDFS starts.

So first create the /hadoop directory and change its owner to hadoop:

mkdir /hadoop
chown -R hadoop:hadoop /hadoop

Then format the NameNode as the hadoop user:

su - hadoop
$HADOOP_HOME/bin/hdfs namenode -format

Note: watch the log output while the command runs. If an error or exception appears, check the permissions of the directories specified above first; the problem usually lies there.
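If the format succeeded, the metadata directory should now be populated (a typical Hadoop 2.x layout; exact file names may vary):

ls /hadoop/dfs/name/current
# expect a VERSION file and an fsimage_* checkpoint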

10) Start the Hadoop cluster services. After the NameNode is formatted successfully, the scripts under $HADOOP_HOME/sbin/ can be used to start and stop services. On the NameNode node, start/stop-dfs.sh and start/stop-yarn.sh start and stop HDFS and YARN respectively, start/stop-all.sh starts and stops the services on all nodes, and hadoop-daemon.sh starts and stops a specific service on a specified node. start-all.sh is used here to start the services on all nodes:

start-all.sh

Note: during startup, the output shows which services are being started, and their logs are saved as *.out files in a specific directory. If a service fails to start, check its log to troubleshoot.
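The daemon logs land in $HADOOP_HOME/logs by default, named after the user, daemon, and host (the file name below follows that convention and this article's hostnames):

ls $HADOOP_HOME/logs/
tail -n 50 $HADOOP_HOME/logs/hadoop-hadoop-namenode-hadoop-test-nn.log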

11) Check the result. After startup completes, the running processes can be seen with the jps command; because each node runs different services, the processes differ from node to node:

10.0.1.100 NameNode:
[hadoop@hadoop-test-nn ~]$ jps
4226 NameNode
4487 ResourceManager
9796 Jps

10.0.1.101 SecondaryNameNode:
[hadoop@hadoop-test-snn ~]$ jps
4890 Jps
31518 SecondaryNameNode

10.0.1.102 DataNode:
[hadoop@hadoop-test-dn1 ~]$ jps
31421 DataNode
2888 Jps
31532 NodeManager

10.0.1.103 DataNode:
[hadoop@hadoop-test-dn2 ~]$ jps
29786 DataNode
29896 NodeManager
1164 Jps

At this point, the fully distributed Hadoop environment has been built.
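Before running jobs, it is worth confirming that both DataNodes registered with the NameNode (hdfs dfsadmin -report is a standard command; two live datanodes are expected here):

hdfs dfsadmin -report | grep -E "Live datanodes|^Name:"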

12) Run a test program.

The bundled MapReduce example program wordcount can be used to verify that the Hadoop environment works properly. It is included in the hadoop-mapreduce-examples-2.6.5.jar package in the $HADOOP_HOME/share/hadoop/mapreduce/ directory and is invoked as follows:

hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount <input path> <output path>

First, upload a file to the /test_wordcount directory of HDFS; /etc/profile is used here for testing:

# create the /test_wordcount directory on hdfs
[hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -mkdir /test_wordcount
# upload /etc/profile to the /test_wordcount directory
[hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -put /etc/profile /test_wordcount
[hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -ls /test_wordcount
Found 1 items
-rw-r--r--   2 hadoop supergroup       2064 2017-08-06 21:28 /test_wordcount/profile
# run the wordcount program
[hadoop@hadoop-test-nn mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount /test_wordcount/profile /test_wordcount_out
17/08/06 21:30:11 INFO client.RMProxy: Connecting to ResourceManager at hadoop-test-nn/10.0.1.100:8040
17/08/06 21:30:13 INFO input.FileInputFormat: Total input paths to process : 1
17/08/06 21:30:13 INFO mapreduce.JobSubmitter: number of splits:1
17/08/06 21:30:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1501950606475_0001
17/08/06 21:30:14 INFO impl.YarnClientImpl: Submitted application application_1501950606475_0001
17/08/06 21:30:14 INFO mapreduce.Job: The url to track the job: http://hadoop-test-nn:8088/proxy/application_1501950606475_0001/
17/08/06 21:30:14 INFO mapreduce.Job: Running job: job_1501950606475_0001
17/08/06 21:30:29 INFO mapreduce.Job: Job job_1501950606475_0001 running in uber mode : false
17/08/06 21:30:29 INFO mapreduce.Job:  map 0% reduce 0%
17/08/06 21:30:39 INFO mapreduce.Job:  map 100% reduce 0%
17/08/06 21:30:49 INFO mapreduce.Job:  map 100% reduce 100%
17/08/06 21:30:50 INFO mapreduce.Job: Job job_1501950606475_0001 completed successfully
17/08/06 21:30:51 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=2320
        FILE: Number of bytes written=219547
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2178
        HDFS: Number of bytes written=1671
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=7536
        Total time spent by all reduces in occupied slots (ms)=8136
        Total time spent by all map tasks (ms)=7536
        Total time spent by all reduce tasks (ms)=8136
        Total vcore-milliseconds taken by all map tasks=7536
        Total vcore-milliseconds taken by all reduce tasks=8136
        Total megabyte-milliseconds taken by all map tasks=7716864
        Total megabyte-milliseconds taken by all reduce tasks=8331264
    Map-Reduce Framework
        Map input records=84
        Map output records=268
        Map output bytes=2880
        Map output materialized bytes=2320
        Input split bytes=114
        Combine input records=268
        Combine output records=161
        Reduce input groups=161
        Reduce shuffle bytes=2320
        Reduce input records=161
        Reduce output records=161
        Spilled Records=322
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=186
        CPU time spent (ms)=1850
        Physical memory (bytes) snapshot=310579200
        Virtual memory (bytes) snapshot=1682685952
        Total committed heap usage (bytes)=164630528
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=2064
    File Output Format Counters
        Bytes Written=1671

The output log shows no errors. View the results in the /test_wordcount_out directory:

[hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -ls /test_wordcount_out
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2017-08-06 21:30 /test_wordcount_out/_SUCCESS
-rw-r--r--   2 hadoop supergroup       1671 2017-08-06 21:30 /test_wordcount_out/part-r-00000
[hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -cat /test_wordcount_out/part-r-00000
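One caveat when re-running the test: MapReduce refuses to write to an output directory that already exists, so remove it first (standard behavior, standard command):

hdfs dfs -rm -r /test_wordcount_out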
