
How to build pseudo-distributed Environment in Hadoop 2.x

2025-01-30 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report --

In this issue, the editor shows how to build a pseudo-distributed environment in Hadoop 2.x. The article is rich in content and analyzed from a professional point of view; I hope you get something out of reading it.

1. Modify hadoop-env.sh, yarn-env.sh, mapred-env.sh

Method: open these three files with notepad++ (as the beifeng user)

Add the line: export JAVA_HOME=/opt/modules/jdk1.7.0_67

2. Modify the configuration files of core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml

1) modify core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://Hadoop-senior02.beifeng.com:8020</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.5.0/data</value>
</property>

2) modify hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.http-address</name>
    <value>Hadoop-senior02.beifeng.com:50070</value>
</property>

3) modify yarn-site.xml

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Hadoop-senior02.beifeng.com</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
</property>

4) modify mapred-site.xml

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
</property>
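Each of the four files wraps its name/value pairs in a `<configuration>` root element, following the same `<property>` layout. As an illustration only (this helper is not part of Hadoop), the repetitive structure can be generated from a plain dict:

```python
def render_properties(props):
    """Render name/value pairs as Hadoop-style <property> entries."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines += ["  <property>",
                  f"    <name>{name}</name>",
                  f"    <value>{value}</value>",
                  "  </property>"]
    lines.append("</configuration>")
    return "\n".join(lines)

# example: the two mapred-site.xml properties from this article
print(render_properties({
    "mapreduce.framework.name": "yarn",
    "mapreduce.jobhistory.webapp.address": "0.0.0.0:19888",
}))
```

Whatever tool you use to edit the files, the end result must be well-formed XML, or the daemons will fail to start while parsing the configuration.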

3. Start hdfs

1) format the namenode: $ bin/hdfs namenode -format

2) start the namenode: $ sbin/hadoop-daemon.sh start namenode

3) start the datanode: $ sbin/hadoop-daemon.sh start datanode

4) hdfs monitoring web page: http://hadoop-senior02.beifeng.com:50070

4. Start yarn

1) start the resourcemanager: $ sbin/yarn-daemon.sh start resourcemanager

2) start the nodemanager: $ sbin/yarn-daemon.sh start nodemanager

3) yarn monitoring web page: http://hadoop-senior02.beifeng.com:8088

5. Test the wordcount jar package

1) Location path: /opt/modules/hadoop-2.5.0

2) Code test: $ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/sort.txt /output6/

Run the process:

16-05-08 06:39:13 INFO client.RMProxy: Connecting to ResourceManager at Hadoop-senior02.beifeng.com/192.168.241.130:8032

16-05-08 06:39:15 INFO input.FileInputFormat: Total input paths to process: 1

16-05-08 06:39:15 INFO mapreduce.JobSubmitter: number of splits:1

16-05-08 06:39:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1462660542807_0001

16-05-08 06:39:16 INFO impl.YarnClientImpl: Submitted application application_1462660542807_0001

16-05-08 06:39:16 INFO mapreduce.Job: The url to track the job: http://Hadoop-senior02.beifeng.com:8088/proxy/application_1462660542807_0001/

16-05-08 06:39:16 INFO mapreduce.Job: Running job: job_1462660542807_0001

16-05-08 06:39:36 INFO mapreduce.Job: Job job_1462660542807_0001 running in uber mode: false

16-05-08 06:39:36 INFO mapreduce.Job: map 0% reduce 0%

16-05-08 06:39:48 INFO mapreduce.Job: map 100% reduce 0%

16-05-08 06:40:04 INFO mapreduce.Job: map 100% reduce 100%

16-05-08 06:40:04 INFO mapreduce.Job: Job job_1462660542807_0001 completed successfully

16-05-08 06:40:04 INFO mapreduce.Job: Counters: 49

3) View the result: $ bin/hdfs dfs -text /output6/par*

Running result:

hadoop 2

jps 1

mapreduce 2

yarn 1

6. MapReduce History Server

1) start: sbin/mr-jobhistory-daemon.sh start historyserver

2) web ui interface: http://hadoop-senior02.beifeng.com:19888

7. Functions of HDFS, YARN, and MapReduce

1) HDFS: a distributed, highly fault-tolerant file system, suitable for deployment on cheap machines.

HDFS has a master-slave architecture with a namenode and datanodes: the namenode manages the namespace, and the datanodes provide the storage, holding data as blocks of 128 MB each (the Hadoop 2.x default).
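The block size determines how a file is split across datanodes: a file occupies ceil(size / 128 MB) blocks, with only the last block partially filled. A quick sketch of the arithmetic (the file sizes here are made-up examples):

```python
import math

BLOCK_MB = 128  # HDFS default block size in Hadoop 2.x

def num_blocks(file_mb):
    """Number of HDFS blocks needed for a file of the given size in MB."""
    return max(1, math.ceil(file_mb / BLOCK_MB))

print(num_blocks(300))  # 3 blocks: 128 + 128 + 44 MB
print(num_blocks(100))  # smaller than one block -> still 1 block
```

Note that a block smaller than 128 MB only occupies its actual size on disk; the block size is an upper bound per block, not a fixed allocation.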

2) YARN: a general resource management system that provides unified resource management and scheduling for upper-layer applications.

YARN is divided into a resourcemanager and nodemanagers: the resourcemanager is responsible for resource scheduling and allocation, while each nodemanager handles data processing and the resources on its own node.

3) MapReduce: a computing model divided into a Map (mapping) phase and a Reduce (reduction) phase.

Map processes each row of data and emits it as key-value pairs, which are sent to Reduce; Reduce then aggregates and counts the data transmitted by Map.
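The map/reduce flow described above can be sketched in plain Python (a toy illustration of the model, not the Hadoop API; the input lines are made up to mirror the wordcount test from section 5):

```python
from collections import defaultdict

def map_phase(line):
    # each line of input becomes (word, 1) key-value pairs
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # group by key and sum the values per word
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

lines = ["hadoop yarn", "hadoop mapreduce", "mapreduce"]
pairs = [kv for line in lines for kv in map_phase(line)]
print(reduce_phase(pairs))  # {'hadoop': 2, 'yarn': 1, 'mapreduce': 2}
```

In real Hadoop, the framework additionally shuffles and sorts the pairs between the two phases so that all values for one key reach the same reducer.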

That is how to build a pseudo-distributed environment in Hadoop 2.x. If you have similar doubts, refer to the above analysis; to learn more, you are welcome to follow the industry information channel.

