
The process of building a fully distributed Hadoop 2.7.3 + Spark 2.1.0 cluster


1. Select three servers (64-bit CentOS)

114.55.246.88 Master node

114.55.246.77 Slave node

114.55.246.93 Slave node

If you perform the later operations as an ordinary user, you must also know the root password, because some steps have to be executed as root. This is not an issue if you operate as root throughout.

I operate as the root user.

2. Modify the hosts file

Modify the hosts file on all three servers.

vi /etc/hosts

Append the following to the end of the original file:

114.55.246.88 Master
114.55.246.77 Slave1
114.55.246.93 Slave2

Save the file when the modification is complete. The new hostname mappings take effect immediately; no restart is required.
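As a quick sanity check (not strictly required), the hostnames should now resolve from each of the three machines:

ping -c 1 Master

ping -c 1 Slave1

ping -c 1 Slave2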

3. Passwordless SSH configuration

3.1 Install and start the SSH service

We need two packages: openssh and rsync.

You can check whether they are already installed with the following commands:

rpm -qa | grep openssh

rpm -qa | grep rsync

If openssh or rsync is missing, install them with the following commands:

yum install openssh-server (install the SSH service)

yum install rsync (rsync is a remote data synchronization tool that can quickly synchronize files between multiple hosts over a LAN/WAN)

service sshd restart (restart the service)

3.2 Configure Master to log in to all Slave nodes without a password

First configure the Master node. The following operations are performed on the Master node.

1) Generate a key pair on the Master node by executing the following command:

ssh-keygen -t rsa -P ''

The generated key pair, id_rsa and id_rsa.pub, is stored in the "/root/.ssh" directory by default.

2) Then, on the Master node, append id_rsa.pub to the authorized keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
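sshd is strict about ownership and permissions on these files; if passwordless login fails later, it is worth making sure the directory and file are not group- or world-writable:

chmod 700 /root/.ssh

chmod 600 /root/.ssh/authorized_keys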

3) In the SSH configuration file "/etc/ssh/sshd_config", uncomment the following lines:

RSAAuthentication yes # enable RSA authentication

PubkeyAuthentication yes # enable public/private key authentication

AuthorizedKeysFile .ssh/authorized_keys # public key file path (the file generated above)

4) Restart the SSH service so the configuration takes effect:

service sshd restart

5) Verify that passwordless login works:

ssh localhost

6) Next, copy the public key to all Slave machines with the following commands:

scp /root/.ssh/id_rsa.pub root@Slave1:/root/

scp /root/.ssh/id_rsa.pub root@Slave2:/root/

Then configure the Slave nodes. The following operations are performed on the Slave1 node.

1) Create a ".ssh" folder under "/root/" (skip this if it already exists):

mkdir /root/.ssh

2) Append Master's public key to Slave1's authorization file "authorized_keys":

cat /root/id_rsa.pub >> /root/.ssh/authorized_keys

3) Modify "/etc/ssh/sshd_config"; refer to steps 3 and 4 of the Master configuration above.

4) From Master, verify that you can SSH into Slave1 without a password:

ssh 114.55.246.77

5) Delete the "id_rsa.pub" file from the "/root/" directory:

rm -r /root/id_rsa.pub

Repeat the above five steps to configure the Slave2 server in the same way.
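As an aside, on systems that ship ssh-copy-id, the copy-and-append steps above can be collapsed into a single command per Slave; it appends the key to the remote authorized_keys and creates .ssh with the right permissions:

ssh-copy-id root@Slave1

ssh-copy-id root@Slave2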

3.3 Configure all Slave nodes to log in to Master without a password

The following is the configuration operation on the Slave1 node.

1) Generate Slave1's own key pair and append its public key to its own "authorized_keys" file by executing the following commands:

ssh-keygen -t rsa -P ''

cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

2) Copy the Slave1 node's public key "id_rsa.pub" to the "/root/" directory of the Master node:

scp /root/.ssh/id_rsa.pub root@Master:/root/

The following is the configuration operation on the Master node.

1) Append Slave1's public key to Master's authorization file "authorized_keys":

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

2) Delete the "id_rsa.pub" file copied from Slave1:

rm -r /root/id_rsa.pub

Once configuration is complete, test passwordless login from Slave1 to Master:

ssh 114.55.246.88

Follow the steps above to establish a password-less login between Slave2 and Master. In this way, Master can log in to each Slave without password authentication, and each Slave can log in to Master without password authentication.
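A quick way to verify the whole login matrix is a one-line loop (hostnames assume the /etc/hosts entries above). On Master:

for host in Master Slave1 Slave2; do ssh $host hostname; done

And on each Slave:

ssh Master hostname

Each command should print the remote hostname without prompting for a password.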

4. Install the base environment (Java and Scala)

4.1 Java 1.8 environment setup

1) Download jdk-8u121-linux-x64.tar.gz and decompress it:

tar -zxvf jdk-8u121-linux-x64.tar.gz

2) Add the Java environment variables to /etc/profile:

export JAVA_HOME=/usr/local/jdk1.8.0_121
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/rt.jar
export JAVA_HOME PATH CLASSPATH

3) Refresh the configuration after saving:

source /etc/profile
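If the variables took effect, the shell should now pick up the new JDK; a quick check:

java -version

This should report version 1.8.0_121.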

4.2 Scala 2.11.8 environment setup

1) Download the Scala installation package scala-2.11.8.rpm and install it:

rpm -ivh scala-2.11.8.rpm

2) Add the Scala environment variables to /etc/profile:

export SCALA_HOME=/usr/share/scala
export PATH=$SCALA_HOME/bin:$PATH

3) Refresh the configuration after saving:

source /etc/profile
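Similarly, a quick check that Scala is on the PATH:

scala -version

This should report Scala 2.11.8.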

5. Hadoop 2.7.3 fully distributed setup

The following operations are performed on the Master node:

1) Download the binary package hadoop-2.7.3.tar.gz.

2) Decompress it and move it to the appropriate directory. I am used to putting software in the /opt directory. The commands are as follows:

tar -zxvf hadoop-2.7.3.tar.gz

mv hadoop-2.7.3 /opt

3) Modify the corresponding configuration files.

Modify /etc/profile by adding the following:

export HADOOP_HOME=/opt/hadoop-2.7.3/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

After the modification is complete, execute:

source /etc/profile
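At this point the hadoop command should be on the PATH; a quick check:

hadoop version

This should print Hadoop 2.7.3 along with the build information.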

Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh, setting JAVA_HOME as follows:

export JAVA_HOME=/usr/local/jdk1.8.0_121

Modify $HADOOP_HOME/etc/hadoop/slaves, delete the original localhost, and change it to the following:

Slave1

Slave2

Modify $HADOOP_HOME/etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.7.3/tmp</value>
    </property>
</configuration>

Modify $HADOOP_HOME/etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop-2.7.3/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop-2.7.3/hdfs/data</value>
    </property>
</configuration>
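The directories referenced above can be created up front on each node. Hadoop will also create them when the namenode is formatted and the datanodes first start, but creating them explicitly avoids permission surprises:

mkdir -p /opt/hadoop-2.7.3/hdfs/name /opt/hadoop-2.7.3/hdfs/data /opt/hadoop-2.7.3/tmp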

Copy the template to generate mapred-site.xml. The command is as follows:

cp mapred-site.xml.template mapred-site.xml

Modify $HADOOP_HOME/etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
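Note that start-all.sh does not launch the JobHistory server that the two jobhistory addresses above refer to; if you want it running, it can be started separately (path per the /opt layout used here):

/opt/hadoop-2.7.3/sbin/mr-jobhistory-daemon.sh start historyserver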

Modify $HADOOP_HOME/etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
</configuration>

4) Copy the hadoop folder from the Master node to Slave1 and Slave2:

scp -r /opt/hadoop-2.7.3 root@Slave1:/opt

scp -r /opt/hadoop-2.7.3 root@Slave2:/opt

5) modify / etc/profile on Slave1 and Slave2 respectively, and the process is the same as Master.

6) Start the cluster from the Master node. Format the namenode before the first start:

hadoop namenode -format

Start:

/opt/hadoop-2.7.3/sbin/start-all.sh

At this point, the fully distributed environment of hadoop has been built.

7) check whether the cluster starts successfully:

jps

Master shows:

SecondaryNameNode

ResourceManager

NameNode

Slave shows:

NodeManager

DataNode
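Beyond jps, the NameNode web UI (http://Master:50070) and the ResourceManager web UI (http://Master:8088) should both list the two slaves as live nodes. For a fuller smoke test, you can run one of the bundled MapReduce examples; the jar path below assumes the /opt layout used in this guide:

hadoop jar /opt/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10

If the job completes and prints an estimate of pi, HDFS and YARN are both functioning.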

6. Spark 2.1.0 fully distributed setup

The following operations are performed on the Master node.

1) Download the binary package spark-2.1.0-bin-hadoop2.7.tgz.

2) Extract it and move it to the appropriate directory with the following commands:

tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz

mv spark-2.1.0-bin-hadoop2.7 /opt

3) Modify the corresponding configuration files.

Modify /etc/profile by adding the following:

export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/
export PATH=$PATH:$SPARK_HOME/bin

Copy spark-env.sh.template to spark-env.sh:

cp spark-env.sh.template spark-env.sh

Modify $SPARK_HOME/conf/spark-env.sh to add the following:

export JAVA_HOME=/usr/local/jdk1.8.0_121
export SCALA_HOME=/usr/share/scala
export HADOOP_HOME=/opt/hadoop-2.7.3
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
export SPARK_MASTER_IP=114.55.246.88
export SPARK_MASTER_HOST=114.55.246.88
export SPARK_LOCAL_IP=114.55.246.88
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export SPARK_DIST_CLASSPATH=$(/opt/hadoop-2.7.3/bin/hadoop classpath)

Copy slaves.template to slaves:

cp slaves.template slaves

Modify $SPARK_HOME/conf/slaves to add the following:

Master

Slave1

Slave2

4) Copy the configured spark directory to the Slave1 and Slave2 nodes:

scp -r /opt/spark-2.1.0-bin-hadoop2.7 root@Slave1:/opt

scp -r /opt/spark-2.1.0-bin-hadoop2.7 root@Slave2:/opt

5) Modify the Slave1 and Slave2 configurations.

Modify /etc/profile on Slave1 and Slave2 to add the Spark configuration; the process is the same as on Master.

Modify $SPARK_HOME/conf/spark-env.sh on Slave1 and Slave2, changing export SPARK_LOCAL_IP=114.55.246.88 to the IP of the corresponding node.

6) Start the cluster on the Master node:

/opt/spark-2.1.0-bin-hadoop2.7/sbin/start-all.sh
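Note that Spark's start-all.sh has the same name as Hadoop's, which is why the full path is given here. Once started, the Spark master web UI should be reachable at http://Master:8080 (the standalone master's default web port) and should list the three workers configured in the slaves file.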

7) check whether the cluster starts successfully:

jps

In addition to the Hadoop processes, Master now shows:

Master

and each Slave now shows:

Worker
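As a final smoke test, you can submit the bundled SparkPi example to the standalone master; the master URL uses the standalone default port 7077, and the examples jar name is assumed from the standard 2.1.0 distribution layout:

/opt/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --master spark://Master:7077 --class org.apache.spark.examples.SparkPi /opt/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 10

A line such as "Pi is roughly 3.14..." in the output indicates that the cluster is accepting and completing jobs.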
