Hadoop 3.0.0 installation configuration


Environment description

As required, deploy a basic three-node hadoop-3.0.0 cluster. The operating system is CentOS 7 x64.

Three virtual machines are created on OpenStack and the deployment proceeds on them.

IP address hostname

10.10.204.31 master

10.10.204.32 node1

10.10.204.33 node2

Functional node planning

master             node1            node2

NameNode

DataNode           DataNode         DataNode

HQuorumPeer        NodeManager      NodeManager

ResourceManager                     SecondaryNameNode

HMaster

Perform the following initialization steps on all three nodes

1. Update the system environment

yum clean all && yum makecache fast && yum update -y && yum install -y wget vim net-tools git ftp zip unzip

2. Modify the hostname according to the plan (run the matching command on each node)

hostnamectl set-hostname master

hostnamectl set-hostname node1

hostnamectl set-hostname node2
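An optional check, not part of the original steps, to confirm the new hostname took effect on each node:

hostnamectl status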

3. Add hostname resolution entries to /etc/hosts

vim /etc/hosts

10.10.204.31 master

10.10.204.32 node1

10.10.204.33 node2

4. Use ping to verify that the three hostnames resolve correctly from each host.

ping master

ping node1

ping node2

5. Download and install the JDK environment

# hadoop 3.0 requires JDK 8

cd /opt/

# normally you must log in to the Oracle website, register an account and accept the license agreement before downloading; the wget command below passes the license-acceptance cookie so the archive can be fetched directly from the link

wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "https://download.oracle.com/otn-pub/java/jdk/8u202-b08/1961070e4c9b4e26a04e7f5a083f551e/jdk-8u202-linux-x64.tar.gz"

# create the JDK and hadoop installation paths

mkdir /opt/modules

cp /opt/jdk-8u202-linux-x64.tar.gz /opt/modules

cd /opt/modules

tar zxvf jdk-8u202-linux-x64.tar.gz

# configure environment variables

export JAVA_HOME="/opt/modules/jdk1.8.0_202"

export PATH=$JAVA_HOME/bin:$PATH

source /etc/profile

# permanent configuration method

vim /etc/bashrc

# add lines

export JAVA_HOME="/opt/modules/jdk1.8.0_202"

export PATH=$JAVA_HOME/bin:$PATH
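An optional check that the JDK is being picked up from the new PATH (the exact version string may differ):

java -version

echo $JAVA_HOME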

6. Download and unpack the hadoop-3.0.0 installation package

cd /opt/

wget http://archive.apache.org/dist/hadoop/core/hadoop-3.0.0/hadoop-3.0.0.tar.gz

cp /opt/hadoop-3.0.0.tar.gz /opt/modules/

cd /opt/modules

tar zxvf hadoop-3.0.0.tar.gz
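As an optional sanity check that the archive unpacked correctly, the bundled hadoop binary can report its version:

/opt/modules/hadoop-3.0.0/bin/hadoop version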

7. Turn off SELinux and the firewalld firewall

systemctl disable firewalld

vim /etc/sysconfig/selinux

SELINUX=disabled
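Both changes above only take effect at the next boot. If they are needed before the reboot in step 8, the running system can also be adjusted right away (an optional extra; setenforce 0 switches SELinux to permissive mode):

systemctl stop firewalld

setenforce 0

getenforce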

8. Restart the server

reboot

Master node operation

Description:

This is a test environment, so the root account is used to install and run hadoop on all nodes.

1. Set up passwordless ssh login

cd

ssh-keygen

# press Enter three times to accept the defaults

# copy the public key to master/node1/node2

ssh-copy-id master

ssh-copy-id node1

ssh-copy-id node2

2. Verify that passwordless login works

ssh master

ssh node1

ssh node2
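Each ssh above opens an interactive session that has to be exited manually; a non-interactive variant of the same check (a small convenience, not in the original) is:

for h in master node1 node2; do ssh $h hostname; done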

3. Modify the hadoop configuration files

The following configuration files need to be modified:

hadoop-env.sh

yarn-env.sh

core-site.xml

hdfs-site.xml

mapred-site.xml

yarn-site.xml

workers

cd /opt/modules/hadoop-3.0.0/etc/hadoop

vim hadoop-env.sh

export JAVA_HOME=/opt/modules/jdk1.8.0_202

vim yarn-env.sh

export JAVA_HOME=/opt/modules/jdk1.8.0_202

Configuration file reference:

https://blog.csdn.net/m290345792/article/details/79141336

vim core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>

# io.file.buffer.size: read/write buffer size used in SequenceFiles
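hadoop.tmp.dir points at /data/tmp; the format step later will normally create what it needs there, but it does not hurt to create the directory on all three nodes up front (a step the original does not show):

mkdir -p /data/tmp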

vim hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node2:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- number of block replicas; the default is 3, and it should not exceed the number of DataNodes -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp</value>
  </property>
</configuration>

# namenode configuration

# dfs.namenode.name.dir: path on the local filesystem where the NameNode persists the namespace and transaction logs; if this is a comma-separated list of directories, the name table is replicated in all of them for redundancy

# dfs.hosts / dfs.hosts.exclude: lists of permitted/excluded DataNodes; use these files if necessary to control which DataNodes may join the cluster

# dfs.blocksize: HDFS block size, 128MB by default, for large file systems

# dfs.namenode.handler.count: number of NameNode server threads handling RPCs from a large number of DataNodes

# datanode configuration

# dfs.datanode.data.dir: comma-separated list of paths on the DataNode's local filesystem where blocks are stored; if this is a comma-separated list of directories, data is stored in all of them, typically on different devices

vim mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /opt/modules/hadoop-3.0.0/etc/hadoop,
      /opt/modules/hadoop-3.0.0/share/hadoop/common/*,
      /opt/modules/hadoop-3.0.0/share/hadoop/common/lib/*,
      /opt/modules/hadoop-3.0.0/share/hadoop/hdfs/*,
      /opt/modules/hadoop-3.0.0/share/hadoop/hdfs/lib/*,
      /opt/modules/hadoop-3.0.0/share/hadoop/mapreduce/*,
      /opt/modules/hadoop-3.0.0/share/hadoop/mapreduce/lib/*,
      /opt/modules/hadoop-3.0.0/share/hadoop/yarn/*,
      /opt/modules/hadoop-3.0.0/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
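If the classpath entries ever drift out of sync with the installation, a common alternative (not what this article uses) is to set mapreduce.application.classpath to the output of the hadoop classpath command:

/opt/modules/hadoop-3.0.0/bin/hadoop classpath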

vim yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8040</value>
  </property>
</configuration>

# resourcemanager and nodemanager configuration

# yarn.acl.enable: enable ACLs; the default is false

# yarn.admin.acl: sets the cluster administrators; the ACL is of the form comma-separated-users space comma-separated-groups; the default of * means anyone, while a value of just a space means no one

# yarn.log-aggregation-enable: enable or disable log aggregation

# resourcemanager configuration

# yarn.resourcemanager.address value: ResourceManager host:port for client job submission; if host:port is set it overrides the hostname set in yarn.resourcemanager.hostname

# yarn.resourcemanager.scheduler.address value: ResourceManager host:port for ApplicationMasters to obtain resources from the Scheduler; if host:port is set it overrides the hostname set in yarn.resourcemanager.hostname

# yarn.resourcemanager.resource-tracker.address value: ResourceManager host:port for NodeManagers; if host:port is set it overrides the hostname set in yarn.resourcemanager.hostname

# yarn.resourcemanager.admin.address value: ResourceManager host:port for administrative commands; if host:port is set it overrides the hostname set in yarn.resourcemanager.hostname

# yarn.resourcemanager.webapp.address value: ResourceManager web-ui host:port; if host:port is set it overrides the hostname set in yarn.resourcemanager.hostname

# yarn.resourcemanager.hostname value: ResourceManager host; can be set once in place of all the individual yarn.resourcemanager*address settings, in which case the components use their default ports

# yarn.resourcemanager.scheduler.class value: ResourceManager scheduler class; Capacity scheduling (recommended), Fair scheduling (also recommended) or Fifo scheduling; use a fully qualified class name such as org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler

# yarn.scheduler.minimum-allocation-mb value: minimum memory allocated to each container request at the ResourceManager

# yarn.scheduler.maximum-allocation-mb value: maximum memory allocated to each container request at the ResourceManager

# yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path value: lists of permitted/excluded NodeManagers; use these files if necessary to control the list of allowed NodeManagers

vim workers

master

node1

node2

4. Modify the startup files

# because the test environment starts the hadoop services as the root account, the user variables below must be added to the start/stop scripts

cd /opt/modules/hadoop-3.0.0/sbin

vim start-dfs.sh

# add lines

HDFS_DATANODE_USER=root

HDFS_DATANODE_SECURE_USER=root

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root

HDFS_ZKFC_USER=root

HDFS_JOURNALNODE_USER=root

vim stop-dfs.sh

# add lines

HDFS_DATANODE_USER=root

HDFS_DATANODE_SECURE_USER=root

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root

HDFS_ZKFC_USER=root

HDFS_JOURNALNODE_USER=root

vim start-yarn.sh

# add lines

YARN_RESOURCEMANAGER_USER=root

HADOOP_SECURE_DN_USER=yarn

YARN_NODEMANAGER_USER=root

vim stop-yarn.sh

# add lines

YARN_RESOURCEMANAGER_USER=root

HADOOP_SECURE_DN_USER=yarn

YARN_NODEMANAGER_USER=root
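An optional check that the user variables made it into all four scripts (run from the sbin directory):

grep -E '^(HDFS|YARN|HADOOP)_' start-dfs.sh stop-dfs.sh start-yarn.sh stop-yarn.sh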

5. Push the hadoop configuration files to the other nodes

cd /opt/modules/hadoop-3.0.0/etc/hadoop

scp ./* root@node1:/opt/modules/hadoop-3.0.0/etc/hadoop/

scp ./* root@node2:/opt/modules/hadoop-3.0.0/etc/hadoop/
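To spot-check that the push landed, one option is to compare a checksum of a key file across the nodes (assumes md5sum is available, as it normally is on CentOS 7):

md5sum core-site.xml

ssh node1 md5sum /opt/modules/hadoop-3.0.0/etc/hadoop/core-site.xml

ssh node2 md5sum /opt/modules/hadoop-3.0.0/etc/hadoop/core-site.xml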

6. Format hdfs

# the configuration files set the hdfs storage path to /data/tmp/

/opt/modules/hadoop-3.0.0/bin/hdfs namenode -format

7. Start the hadoop service

# on all three nodes

cd /opt/modules/zookeeper-3.4.13

./bin/zkServer.sh start

cd /opt/modules/kafka_2.12-2.1.1

./bin/kafka-server-start.sh ./config/server.properties &

/opt/modules/hadoop-3.0.0/bin/hdfs journalnode &

# master node

/opt/modules/hadoop-3.0.0/bin/hdfs namenode -format

/opt/modules/hadoop-3.0.0/bin/hdfs zkfc -formatZK

/opt/modules/hadoop-3.0.0/bin/hdfs namenode &

# node1

/opt/modules/hadoop-3.0.0/bin/hdfs namenode -bootstrapStandby

/opt/modules/hadoop-3.0.0/bin/hdfs namenode &

/opt/modules/hadoop-3.0.0/bin/yarn resourcemanager &

/opt/modules/hadoop-3.0.0/bin/yarn nodemanager &

# node2

/opt/modules/hadoop-3.0.0/bin/hdfs namenode -bootstrapStandby

/opt/modules/hadoop-3.0.0/bin/hdfs namenode &

/opt/modules/hadoop-3.0.0/bin/yarn resourcemanager &

/opt/modules/hadoop-3.0.0/bin/yarn nodemanager &

# on all three nodes

/opt/modules/hadoop-3.0.0/bin/hdfs zkfc &

# master node

cd /opt/modules/hadoop-3.0.0/

./sbin/start-all.sh

cd /opt/modules/hadoop-3.0.0/hbase-2.0.4

./bin/start-hbase.sh

8. Check that the hadoop services have started normally on each node

jps
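Rather than logging in to each node, the same check can be run from master in one pass (a small convenience; the daemons listed should match the node planning above):

for h in master node1 node2; do echo "== $h =="; ssh $h /opt/modules/jdk1.8.0_202/bin/jps; done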

9. Run the test

cd /opt/modules/hadoop-3.0.0

# create a test path on hdfs

./bin/hdfs dfs -mkdir /testdir1

# create a test file

cd /opt

touch wc.input

vim wc.input

hadoop mapreduce hive

hbase spark storm

sqoop hadoop hive

spark hadoop

# upload wc.input to HDFS (back in the hadoop directory)

cd /opt/modules/hadoop-3.0.0

./bin/hdfs dfs -put /opt/wc.input /testdir1/wc.input

# run the mapreduce wordcount demo that ships with hadoop

./bin/yarn jar /opt/modules/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar wordcount /testdir1/wc.input /output

# view the output files

./bin/hdfs dfs -ls /output
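To see the actual word counts rather than just the file listing (the reducer output is normally a part-r-00000 file; the wildcard avoids depending on the exact name):

./bin/hdfs dfs -cat /output/*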

10. Status screenshot

Screenshots taken after all services have started normally: zookeeper + kafka + namenode + journalnode + hbase.
