Introduction to spark and Construction of Cluster Environment

Software environment: VMware Workstation 11.0

Linux: CentOS 6.7

hadoop-2.7.3

jdk-1.7.0_67

spark-2.1.0-bin-hadoop2.7

The installation of the virtual machines and the JDK will not be repeated here.

We go straight to the installation of Hadoop and Spark.

One. Download the Hadoop package. Download address: http://hadoop.apache.org/

1. Download it and unzip it to the specified directory.

tar -zxvf hadoop-2.7.3.tar.gz -C /usr/hadoop
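Note that tar -C only extracts into a directory that already exists; a minimal sketch of the preparation, assuming the archive sits in the current directory:

# Create the target directory first, then extract into it.
# (Assumes hadoop-2.7.3.tar.gz was downloaded to the current directory.)
mkdir -p /usr/hadoop
tar -zxvf hadoop-2.7.3.tar.gz -C /usr/hadoop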

2. After decompressing, enter the directory: cd /usr/hadoop/hadoop-2.7.3

3. Modify the configuration files. Enter the configuration directory, cd /usr/hadoop/hadoop-2.7.3/etc/hadoop, and first copy the template files to their working names:

cp hadoop-env.sh.template hadoop-env.sh

cp hdfs-site.xml.template hdfs-site.xml

cp core-site.xml.template core-site.xml

cp mapred-env.sh.template mapred-env.sh

cp mapred-site.xml.template mapred-site.xml

cp slaves.template slaves

cp yarn-env.sh.template yarn-env.sh

cp yarn-site.xml.template yarn-site.xml

Note: in general, it is best to make a backup copy before modifying a system configuration file, as done above.
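For example, a minimal backup sketch run from the configuration directory (the list of file names is taken from the copies above and can be adjusted):

# Keep a timestamped copy of each configuration file before editing it,
# so a broken change can be rolled back.
for f in hadoop-env.sh hdfs-site.xml core-site.xml mapred-site.xml yarn-site.xml slaves; do
    cp "$f" "$f.bak.$(date +%Y%m%d)"
done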

1 > Contents of the hadoop-env.sh configuration file

# The java implementation to use.
export JAVA_HOME=/opt/modules/jdk1.7.0_67/

This is the only line you need to change: point JAVA_HOME at the JDK installed on your own machine. The rest of the file can stay as it is.

2 > hdfs-site.xml

<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.block.size</name><value>134217728</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/home/hadoopdata/dfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/home/hadoopdata/dfs/data</value></property>
  <property><name>fs.checkpoint.dir</name><value>/home/hadoopdata/checkpoint/dfs/cname</value></property>
  <property><name>fs.checkpoint.edits.dir</name><value>/home/hadoopdata/checkpoint/dfs/cname</value></property>
  <property><name>dfs.http.address</name><value>master:50070</value></property>
  <property><name>dfs.secondary.http.address</name><value>slave1:50090</value></property>
  <property><name>dfs.webhdfs.enabled</name><value>true</value></property>
  <property><name>dfs.permissions</name><value>true</value></property>
</configuration>
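The local directories named in dfs.namenode.name.dir, dfs.datanode.data.dir and the checkpoint settings are not created anywhere else in this walkthrough; a minimal sketch, assuming you create them by hand on the nodes that need them:

# Create the HDFS metadata, data and checkpoint directories referenced above.
mkdir -p /home/hadoopdata/dfs/name
mkdir -p /home/hadoopdata/dfs/data
mkdir -p /home/hadoopdata/checkpoint/dfs/cname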

3 > core-site.xml

<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://master:8020</value></property>
  <property><name>io.file.buffer.size</name><value>4096</value></property>
  <property><name>hadoop.tmp.dir</name><value>/opt/modules/hadoop-2.7.3/data/tmp</value></property>
</configuration>
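Once the Hadoop binaries are on the PATH (see the environment variable step further down), hdfs getconf can be used to confirm which values the installation actually resolves; a small sketch:

# Print the resolved values of two core-site.xml keys.
hdfs getconf -confKey fs.defaultFS        # expected: hdfs://master:8020
hdfs getconf -confKey hadoop.tmp.dir      # expected: /opt/modules/hadoop-2.7.3/data/tmp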

4 > mapred-env.sh

export JAVA_HOME=/usr/local/java/jdk1.7.0_67/

Change this to the JDK path on your own machine.

5 > mapred-site.xml

<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value><final>true</final></property>
  <property><name>mapreduce.jobhistory.address</name><value>master:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>master:19888</value></property>
  <property><name>mapreduce.job.ubertask.enable</name><value>true</value></property>
  <property><name>mapred.job.tracker</name><value>master:9001</value></property>
</configuration>
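One point worth knowing about the jobhistory settings above: the history server is not launched by start-all.sh and has to be started on master by hand. A sketch, assuming HADOOP_HOME is set as in the environment variable step below:

# Start the MapReduce JobHistory server on master (web UI on master:19888).
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver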

6 > slaves: lists the nodes on which the worker daemons (DataNode/NodeManager) run

master

slave1

slave2
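The slaves file is just a plain list of host names, one per line; a small sketch of writing it, assuming the three names resolve on every node (for example through /etc/hosts):

# Hypothetical /etc/hosts entries (IP addresses are placeholders):
#   192.168.1.10  master
#   192.168.1.11  slave1
#   192.168.1.12  slave2
printf 'master\nslave1\nslave2\n' > slaves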

7 > yarn-env.sh

The line JAVA=$JAVA_HOME/bin/java picks up JAVA_HOME, so make sure it points to your JDK installation.

8 > yarn-site.xml

<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>slave2</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.resourcemanager.address</name><value>slave2:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>slave2:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>slave2:8031</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>slave2:8033</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>slave2:8088</value></property>
  <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
  <property><name>yarn.nodemanager.remote-app-log-dir</name><value>/opt/modules/hadoop-2.7.3/tmp/logs</value></property>
</configuration>
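After the cluster has been started (section Three below), two quick checks that the YARN settings took effect; a sketch, run from any node with the Hadoop binaries on its PATH:

# The ResourceManager web UI configured above should answer on slave2:8088.
curl -s -o /dev/null -w '%{http_code}\n' http://slave2:8088/cluster   # normally prints 200
# List the NodeManagers that have registered with the ResourceManager.
yarn node -list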

Note: after modifying the configuration, distribute the Hadoop directory to the other nodes, slave1 and slave2:

scp -r /usr/hadoop/hadoop-2.7.3 root@slave1:/usr/hadoop

scp -r /usr/hadoop/hadoop-2.7.3 root@slave2:/usr/hadoop
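The same distribution written as a loop, which scales better if more slaves are added later:

# Copy the configured Hadoop directory to every slave node.
for host in slave1 slave2; do
    scp -r /usr/hadoop/hadoop-2.7.3 root@${host}:/usr/hadoop
done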

Then modify the environment variable file. You can edit either the current user's file, ~/.bashrc, or the global file /etc/profile:

export JAVA_HOME=/usr/local/jdk1.7.0_67

export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Save with :wq; the environment variable file can then be sent to the other nodes as well.
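A small sketch of applying and checking the variables, assuming /etc/profile was the file edited:

# Reload the profile in the current shell, then confirm the tools are found.
source /etc/profile
hadoop version    # should report Hadoop 2.7.3
java -version     # should report 1.7.0_67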

Two. Format the namenode

Run hadoop namenode -format. A long stream of output follows, and at the end it reports that the namenode has been formatted successfully.
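For reference, the non-deprecated form of the same command in Hadoop 2.x; run it on master only, and only once, because re-formatting wipes the existing HDFS metadata:

# Equivalent to "hadoop namenode -format", which 2.x marks as deprecated.
hdfs namenode -format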

Three. Start the cluster.

You can start everything at once or start the daemons individually.

There is an sbin directory under the Hadoop installation path where the startup scripts are kept.

Start everything: start-all.sh

Start a daemon individually: hadoop-daemon.sh start namenode (or datanode, and so on).

Use jps to view the running processes after startup.
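A sketch for checking all three nodes in one go, assuming passwordless ssh as root and jps available in the remote shells. Which daemon appears where is derived from the configuration above (NameNode on master, SecondaryNameNode on slave1, ResourceManager on slave2, DataNode and NodeManager on every host listed in slaves), not from captured output:

# Run jps on every node and print the results.
for host in master slave1 slave2; do
    echo "== ${host} =="
    ssh root@${host} jps
done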
