
Three installation modes of Hadoop

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article mainly explains the three installation modes of Hadoop. The method introduced here is simple, fast and practical, so friends who are interested may wish to have a look as the editor takes you through the "three installation modes of Hadoop".

There are three Hadoop installation modes: stand-alone mode, pseudo-distributed mode, and fully distributed mode.

One. Stand-alone mode (standalone)

Stand-alone mode is Hadoop's default mode. When the Hadoop source package is extracted for the first time, Hadoop knows nothing about the hardware environment, so it conservatively chooses the minimal configuration: in this default mode all three XML configuration files are empty. When the configuration files are empty, Hadoop runs entirely locally. Because it does not need to interact with other nodes, stand-alone mode does not use HDFS and does not load any Hadoop daemons. This mode is mainly used to develop and debug the application logic of MapReduce programs.
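
As a quick check of stand-alone mode, the example job shipped with the release can be run directly against the local file system. A minimal sketch, assuming the hadoop-0.20.2 release directory used below and its bundled examples jar (names differ for other releases):

cd hadoop-0.20.2
mkdir input
cp conf/*.xml input
# run the bundled grep example entirely locally; no daemons are started
bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'
cat output/*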

Two. Pseudo-distributed mode installation

tar xzvf hadoop-0.20.2.tar.gz

Hadoop configuration files:

conf/hadoop-env.sh: configure JAVA_HOME

core-site.xml: configure the HDFS NameNode name and address

hdfs-site.xml: configure the HDFS storage directory and replication count

mapred-site.xml: configure the MapReduce jobtracker address
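
For reference, a minimal pseudo-distributed configuration along these lines might look as follows. This is only a sketch based on the stock Hadoop 0.20.x quickstart; the localhost addresses and ports are assumptions, and the paths are relative to the extracted hadoop-0.20.2 directory:

# conf/core-site.xml: point the default file system at a local NameNode
cat > conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# conf/hdfs-site.xml: a single node can hold only one replica
cat > conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# conf/mapred-site.xml: local JobTracker
cat > conf/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF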

Configure ssh: generate a key so that ssh can connect without a password.

(RSA is an asymmetric algorithm based on integer factorization: what the public key encrypts the private key can decrypt, and what the private key encrypts the public key can decrypt.)

cd /root

ssh-keygen -t rsa

cd .ssh

Copy the public key into place with cp id_rsa.pub authorized_keys; after that you can connect without a password.

Start Hadoop: bin/start-all.sh

Stop Hadoop: bin/stop-all.sh
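
To confirm that the pseudo-distributed daemons actually came up after bin/start-all.sh, jps is handy. The listing below is only a sketch of what a typical Hadoop 0.20.x pseudo-distributed deployment shows; PIDs and ordering will differ:

jps
# expected output, roughly:
# 12345 NameNode
# 12401 DataNode
# 12477 SecondaryNameNode
# 12540 JobTracker
# 12603 TaskTracker
# 12666 Jps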

Three. Fully distributed mode

1. Configure the /etc/hosts file to resolve host names to IP addresses, or use a DNS service for host name resolution

2. Create the user that hadoop runs as: useradd grid, then passwd grid

3. Configure passwordless ssh: on each node, log in as grid, enter the home directory and run ssh-keygen -t rsa to produce a public key; collect the public keys of all nodes into one file, then copy that file to each node as .ssh/authorized_keys, so that every node can connect to every other node without a password (see the sketch after this list)

4. Download and extract the hadoop installation package

5. Configure the namenode: modify the site files

6. Configure hadoop-env.sh

7. Configure masters and slaves files

8. Copy hadoop to each node

9. Format namenode

10. Start hadoop

11. Use jps to verify whether each background process starts successfully
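
A sketch of step 3 for a small cluster. The hostnames node1/node2/node3 and the grid user are placeholders for this example, and it assumes password logins are still allowed while the keys are being distributed:

# on every node, as user grid: generate a key pair
ssh-keygen -t rsa

# on node1: collect every node's public key into a single file
cat ~/.ssh/id_rsa.pub > all_keys
ssh grid@node2 'cat ~/.ssh/id_rsa.pub' >> all_keys
ssh grid@node3 'cat ~/.ssh/id_rsa.pub' >> all_keys

# push the combined file to every node as ~/.ssh/authorized_keys
for host in node1 node2 node3; do
    scp all_keys grid@$host:~/.ssh/authorized_keys
    ssh grid@$host 'chmod 600 ~/.ssh/authorized_keys'
done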

vim /etc/hosts
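
Using the master and slave addresses that appear later in this article, /etc/hosts on every node might contain entries like the following (a sketch; the host names must match your own naming):

192.168.175.11  Master.hadoop
192.168.175.12  Slave1.hadoop
192.168.175.13  Slave2.hadoop
192.168.175.14  Slave3.hadoop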

vim /etc/sysconfig/network

vim /etc/sysconfig/network-scripts/ifcfg-eth0

service network restart

service NetworkManager stop

chkconfig NetworkManager off

service iptables stop

chkconfig iptables off

service ip6tables stop

chkconfig ip6tables off

vim /etc/sysconfig/selinux (enforcing --> disabled)

setenforce 0

getenforce

useradd hadoop

passwd hadoop

Generate the key pair in SecureCRT and copy the public key to /home/hadoop/.ssh:

chmod 700 .ssh

ssh-keygen -i -f PubKey_Master_Hadoop.pub >> authorized_keys

chmod 600 authorized_keys

vim /etc/ssh/sshd_config as follows:

RSAAuthentication yes  # enable RSA authentication

PubkeyAuthentication yes  # enable public/private key pair authentication

AuthorizedKeysFile .ssh/authorized_keys  # public key file path (same as the file generated above)

PasswordAuthentication no  # disable password login (optional; once key authentication works, passwords are generally not needed)

service sshd restart

Test the connection from SecureCRT using the PublicKey (PubKey_Master_Hadoop.pub).

Master uses the ssh public key to connect to Slave:

Mount the CD and create a yum source:

vim /etc/fstab

vim /etc/yum.repos.d/rhel-source.repo

yum -y install ssh

yum -y install rsync

Master host:

mkdir .ssh

ssh-keygen -t rsa -P ''

cat id_rsa.pub >> authorized_keys

chmod 700 .ssh

chmod 600 authorized_keys

ssh localhost

scp id_rsa.pub hadoop@192.168.175.12:~/.ssh

ssh 192.168.175.12

Slave host:

mkdir .ssh

cat id_rsa.pub >> authorized_keys

chmod 700 .ssh

chmod 600 authorized_keys

vim /etc/ssh/sshd_config as above

service sshd restart

Install Java: copy the package to /usr/java

chmod +x jdk-6u37-linux-x64.bin

vim /etc/profile, add JAVA_HOME

source /etc/profile
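
The lines added to /etc/profile would look roughly like the following (a sketch; the JDK path matches the jdk-6u37 package used here):

export JAVA_HOME=/usr/java/jdk1.6.0_37
export PATH=$PATH:$JAVA_HOME/bin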

==========

Install hadoop-1.2.0: copy it to /usr/hadoop-1.2.0 and change the owner and group to hadoop.

vim /etc/profile:

export JAVA_HOME=/usr/java/jdk1.6.0_37

export HADOOP_HOME=/usr/hadoop-1.2.0

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile

Configure hadoop-env.sh (directory: /usr/hadoop/conf):

# export HADOOP_ROOT_LOGGER=DEBUG,console

export JAVA_HOME=/usr/java/jdk1.6.0_37

export HADOOP_HOME_WARN_SUPPRESS=1

Configure the core-site.xml file:

hadoop.tmp.dir = /usr/hadoop-1.2.0/tmp (a base for other temporary directories)

fs.default.name = hdfs://192.168.175.11:9000
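
In the actual core-site.xml these properties sit inside the standard <configuration>/<property> structure. Below is a sketch of the full file (written from the Hadoop install's conf directory); the remaining site files in this article follow the same pattern:

cat > conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop-1.2.0/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.175.11:9000</value>
  </property>
</configuration>
EOF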

Configure the hdfs-site.xml file:

dfs.replication = 1

dfs.datanode.max.xcievers = 4096 (the maximum number of files handled at the same time; HBase concurrency is relatively high, so use at least 4096)

dfs.support.append = true (if not set, HBase may lose data when using HDFS storage)

Configure the mapred-site.xml file:

mapred.job.tracker = http://192.168.175.11:9001

Configure the masters file:

Master.hadoop or 192.168.175.11

Configure the slaves file:

Slave1.hadoop or 192.168.175.12

Slave2.hadoop or 192.168.175.13

Slave3.hadoop or 192.168.175.14

==========

Install hadoop-2.0.5: copy the package to /usr

tar -zxvf hadoop-2.0.5-alpha.tar.gz

mv hadoop-2.0.5-alpha /usr/hadoop

chown -R hadoop:hadoop hadoop

vim /etc/profile:

# set hadoop path

export HADOOP_HOME=/usr/hadoop

export PATH=$PATH:$HADOOP_HOME:$HADOOP_HOME/bin

source /etc/profile

Configure hadoop-env.sh (directory: /usr/hadoop/etc/hadoop)

Add at the end: export JAVA_HOME=/usr/java/jdk1.6.0_37

Configure yarn-env.sh and .bash_profile:

export HADOOP_PREFIX=/usr/hadoop

export PATH=$PATH:$HADOOP_PREFIX/bin

export PATH=$PATH:$HADOOP_PREFIX/sbin

export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}

export HADOOP_COMMON_HOME=${HADOOP_PREFIX}

export HADOOP_HDFS_HOME=${HADOOP_PREFIX}

export YARN_HOME=${HADOOP_PREFIX}

export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop

export YARN_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop

Configure the core-site.xml file:

hadoop.tmp.dir = /usr/hadoop/tmp (note: create the tmp folder under /usr/hadoop first; a base for other temporary directories)

fs.default.name = hdfs://192.168.175.11:9000

Configure the hdfs-site.xml file: this modifies Hadoop's HDFS configuration; the default replication factor is 3.

dfs.replication = 1 (note: replication is the number of data copies; the default is 3, which reports errors when there are fewer than 3 slave nodes)

dfs.namenode.name.dir = file:/home/hadoop/dfs/name (final: true)

dfs.datanode.data.dir = file:/home/hadoop/dfs/data (final: true)

Configure the mapred-site.xml file: this modifies Hadoop's MapReduce configuration and sets the address and port of the JobTracker.

mapred.job.tracker = hdfs://192.168.175.11:9001

mapreduce.framework.name = yarn

mapred.system.dir = file:/home/hadoop/mapred/system (final: true)

mapred.local.dir = file:/home/hadoop/mapred/local (final: true)


Configure yarn-site.xml:

yarn.resourcemanager.address = 192.168.175.11:8080

yarn.resourcemanager.scheduler.address = 192.168.175.11:8081

yarn.resourcemanager.resource-tracker.address = 192.168.175.11:8082

yarn.nodemanager.aux-services = mapreduce.shuffle

yarn.nodemanager.aux-services.mapreduce.shuffle.class = org.apache.hadoop.mapred.ShuffleHandler

Configure the masters file:

Master.hadoop or 192.168.175.11

Configure the slaves file:

Slave1.hadoop or 192.168.175.12

Slave2.hadoop or 192.168.175.13

Slave3.hadoop or 192.168.175.14

mkdir -p /usr/hadoop/tmp

mkdir -p /home/hadoop/dfs/data

mkdir -p /home/hadoop/dfs/name

mkdir -p /home/hadoop/mapred/system

mkdir -p /home/hadoop/mapred/local

Format the HDFS file system (as the hadoop user; this only needs to be done once): hadoop namenode -format

Start the HDFS daemons:

# hadoop-daemon.sh start namenode

# hadoop-daemon.sh start datanode

Or start them both at once: start-dfs.sh

Start the YARN daemons:

# yarn-daemon.sh start resourcemanager

# yarn-daemon.sh start nodemanager

Or start at the same time: start-yarn.sh

Check whether the daemons have started

# jps
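
As a rough guide only (the exact process set depends on the Hadoop version and on which node jps is run), a healthy master in this layout shows the HDFS and YARN master daemons, while each slave shows the worker daemons:

# on the master (example output; PIDs will differ)
# 4368 NameNode
# 4556 SecondaryNameNode
# 4715 ResourceManager
# 5021 Jps

# on a slave
# 3187 DataNode
# 3302 NodeManager
# 3410 Jps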

At this point, I believe you have a deeper understanding of the "three installation modes of Hadoop". You might as well try it out in practice; for more related content, follow us and keep learning!
