
Set up Docker and build a Spark cluster in Docker containers

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

I tested and built this setup myself; as a beginner I took many detours along the way. If any steps are wrong or could be done more concisely, please point them out.

Environment: a virtual machine on Windows 10 running CentOS 7 with a Linux desktop. (I built this once before; even though both the network and the ports could be reached with telnet, the service addresses inside the Docker containers could not be accessed. This time I use the virtual machine's desktop browser so that the host machine is not cut off from the services.) Note that CentOS 7 commands differ from CentOS 6, and CentOS 7 does not ship the iptables command by default; install it yourself if you want to use it.

The virtual machine I built has IP 192.168.20.129.

Spark master node IP: 172.17.0.2, Docker container name cloud1

Spark worker node IP: 172.17.0.3, Docker container name cloud2

Spark worker node IP: 172.17.0.4, Docker container name cloud3

Steps to install Docker:

1. With root permissions, check the current kernel version with the command: uname -r. Docker requires the CentOS kernel version to be higher than 3.10.

2. Log in to CentOS as root and make sure the yum packages are updated to the latest.

Command: yum update

3. If the old version is installed, uninstall the old docker first

Command: yum remove docker docker-common docker-selinux docker-engine

4. Install the required software package (or dependent package)

Command: yum install -y yum-utils device-mapper-persistent-data lvm2

5. Set up the yum source

(Aliyun) Command: yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

(Official) Command: yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

6. View all docker versions in the repository and select

Command: yum list docker-ce --showduplicates | sort -r

7. Install docker

Command: yum install docker-ce # only the stable repository is enabled by default in the repo file, so this installs the latest stable version, here 17.12.0

To install a specific version, name it explicitly. For example: sudo yum install docker-ce-17.12.0.ce

8. Start Docker and enable it at boot

Command: systemctl start docker

systemctl enable docker

9. Check whether the installation is successful

Command: docker version

Set up a Spark cluster in the Docker containers. The versions installed here are Hadoop 2.7, JDK 1.8, Spark 2.4, Scala 2.12.8, and ZooKeeper 3.4.12.

Steps for docker container to build spark cluster:

1. First pull an Ubuntu image into Docker

Command: docker pull ubuntu

If the download is too slow on Linux, you can configure an accelerated mirror: register an account at https://www.daocloud.io, find your own copy of the acceleration command on the acceleration page, and run it in Linux. For more information, see https://blog.51cto.com/14159501/2338377.

2. After the Ubuntu image has been downloaded locally, you can view it.

Command: docker images

Create a java directory under /usr/local in CentOS 7 (command: mkdir java). It holds the installation packages to be installed, such as the JDK, Spark, and Scala; drag them into this directory over SSH. Once the packages are in the directory, copy them into the Docker container. After installing CentOS 7, SSH is enabled by default; use the command ps -ef | grep ssh to check whether SSH is running. (Note: SSH must also be installed and set to start automatically inside the Docker container, otherwise there will be problems later when starting the Hadoop nodes; this is covered below when building the container.)

3. Run the image

Command: docker run --name cloud1 -h cloud1 --add-host cloud1:172.17.0.2 -it ubuntu

This runs the image, names the container cloud1, and maps the hostname cloud1 to the IP address 172.17.0.2.

The command docker network inspect bridge shows the name and IP address of each started container.

Check a container's IP address with the command: docker inspect <container name> | grep IPAddress
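As a hedged sketch, the grep above can also be replaced by docker inspect's own Go-template formatting; the helper below only prints the command for a given container name (cloud1 is the example from the text), so it can be inspected before running.

```shell
# Print the docker inspect command for a container's IP address.
# -f uses Go template formatting instead of grepping the JSON output.
container_ip_cmd() {
  printf "docker inspect -f '{{.NetworkSettings.IPAddress}}' %s\n" "$1"
}

container_ip_cmd cloud1
```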

4. Configure SSH in the container

View SSH status: service ssh status

If not, install:

apt-get update # update package lists

apt install net-tools # if the ifconfig command is missing

apt install iputils-ping # if the ping command is missing

apt-get install vim # install the vim command

apt-get install ssh # install SSH

After installation, add /usr/sbin/sshd to ~/.bashrc (command: vim ~/.bashrc) so sshd starts with each shell.

The default SSH configuration does not allow root to log in; change PermitRootLogin no to PermitRootLogin yes in /etc/ssh/sshd_config.

Generate the access key:

cd ~ # switch to the root home directory

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

cd .ssh # enter the .ssh directory

cat id_rsa.pub >> authorized_keys

service ssh start # start ssh

ssh localhost date # verify that SSH can be used

ssh root@cloud1 # test whether the connection succeeds

Check whether SSH is installed:

which ssh

which sshd

Check whether the SSH service is running:

ps aux | grep ssh
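A minimal sketch of the key-generation steps above, run against a throwaway directory instead of the real ~/.ssh so it can be tried safely. It assumes the OpenSSH client (ssh-keygen) is installed.

```shell
set -e
KEYDIR=$(mktemp -d)
# Generate an RSA key pair with an empty passphrase, as in the text
ssh-keygen -t rsa -P '' -f "$KEYDIR/id_rsa" -q
# Authorize the public key for passwordless login
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
ls "$KEYDIR"
```

In the real setup the same authorized_keys must end up on every node (cloud1 through cloud3), which is why the containers are later cloned from one committed image.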

5. Create a java directory under /usr/local in the container to hold the tool packages to be installed.

On the Linux host (not in the container), use the command:

docker cp /usr/local/java/ <container ID>:/usr/local/

This copies all the installation tools under java into the container (at the same path). An alternative is to map the host's /usr/local/java directory into the container when starting the image. Command: docker run -v /usr/local/java/:/usr/local -it ubuntu, or: docker run -i -t -v /usr/local/java:/usr/local/java <image ID> /bin/bash

6. Install the JDK, ZooKeeper, Scala, Hadoop, and Spark

Change directory to /usr/local/java, where all the installation packages are located.

6.1: Install the JDK:

Command: tar -xzvf jdk.xx.xx.tar.gz # unpack the package

Grant permissions: chmod 777 <unpacked jdk directory>

Delete the archive: rm -rf jdk.xx.xx.tar.gz

vim ~/.bashrc # add the JDK parameters

export JAVA_HOME=/usr/local/java/jdk1.8.0_191

export PATH=$PATH:$JAVA_HOME/bin

Save and exit, then run source ~/.bashrc to make the changed file take effect

Check whether the installation succeeded: java -version
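The same ~/.bashrc pattern recurs for Scala, ZooKeeper, Hadoop, and Spark below. A sketch of it, exercised against a temporary file so the real ~/.bashrc is untouched (the jdk1.8.0_191 path is the one assumed in the text):

```shell
set -e
RC=$(mktemp)
# Append the two export lines, exactly as added to ~/.bashrc above
cat >> "$RC" <<'EOF'
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export PATH=$PATH:$JAVA_HOME/bin
EOF
# Source the file (the equivalent of `source ~/.bashrc`) and confirm
# the variables expanded as expected
. "$RC"
echo "$JAVA_HOME"
```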

6.2: Install Scala

Unpack: tar -zxvf scala-2.12.8.tgz

Grant permissions: chmod 777 scala-2.12.8

Delete the archive: rm -rf scala-2.12.8.tgz

vim ~/.bashrc # add the Scala parameters

export SCALA_HOME=/usr/local/java/scala-2.12.8

export PATH=$PATH:$SCALA_HOME/bin

Save and exit, then run source ~/.bashrc

Check whether the installation succeeded: scala -version

6.3: Install ZooKeeper

Unpack: tar -zxvf zookeeper-3.4.12.tar.gz

Grant permissions: chmod 777 zookeeper-3.4.12

Delete the archive: rm -rf zookeeper-3.4.12.tar.gz

vim ~/.bashrc # add the ZooKeeper parameters

export ZOOKEEPER_HOME=/usr/local/java/zookeeper-3.4.12

export PATH=$PATH:$ZOOKEEPER_HOME/bin

Save and exit, then run source ~/.bashrc

Generate the ZooKeeper configuration file

Under the unpacked zookeeper-3.4.12/conf directory, run: cp /usr/local/java/zookeeper-3.4.12/conf/zoo_sample.cfg /usr/local/java/zookeeper-3.4.12/conf/zoo.cfg

Modify the zoo.cfg file:

# change the data storage directory to:

dataDir=/root/zookeeper/tmp (first create the zookeeper and tmp directories under /root; commands: mkdir ~/zookeeper; mkdir ~/zookeeper/tmp)

# add the ZK server configuration at the end:

server.1=cloud1:2888:3888

server.2=cloud2:2888:3888

server.3=cloud3:2888:3888

Create a myid file under /root/zookeeper/tmp; command: touch ~/zookeeper/tmp/myid

Then run: echo 1 > ~/zookeeper/tmp/myid

Opening the file with vim myid shows that 1 has been written to myid.
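A sketch of the zoo.cfg additions and myid file above, written into a temporary directory so it can be tried without a real ZooKeeper install:

```shell
set -e
ZKTMP=$(mktemp -d)
# dataDir plus the three quorum entries, exactly as described in the text
cat > "$ZKTMP/zoo.cfg" <<'EOF'
dataDir=/root/zookeeper/tmp
server.1=cloud1:2888:3888
server.2=cloud2:2888:3888
server.3=cloud3:2888:3888
EOF
# Each node writes its own id: 1 on cloud1, 2 on cloud2, 3 on cloud3
echo 1 > "$ZKTMP/myid"
cat "$ZKTMP/myid"
```

The myid value must match the server.N number for that host, which is why cloud2 and cloud3 get theirs rewritten after cloning, below.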

6.4: Install Hadoop

Command: tar -zxvf hadoop-2.7.7.tar.gz

vim ~/.bashrc # add the Hadoop parameters (command: vi ~/.bashrc)

export HADOOP_HOME=/usr/local/java/hadoop-2.7.7 (fill in according to the actual directory)

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

6.4.1: Modify the Hadoop startup configuration file (/usr/local/java/hadoop-2.7.7/etc/hadoop/hadoop-env.sh):

# modify JAVA_HOME

export JAVA_HOME=/usr/local/java/jdk1.8.0_191

6.4.2: Configure the core configuration file (/usr/local/java/hadoop-2.7.7/etc/hadoop/core-site.xml):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>cloud1:2181,cloud2:2181,cloud3:2181</value>
  </property>
</configuration>

6.4.3: Modify the HDFS configuration file (/usr/local/java/hadoop-2.7.7/etc/hadoop/hdfs-site.xml):

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>cloud1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>cloud1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>cloud2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>cloud2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://cloud1:8485;cloud2:8485;cloud3:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/root/hadoop/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
  </property>
</configuration>

6.4.4: Modify the Yarn configuration file (/usr/local/java/hadoop-2.7.7/etc/hadoop/yarn-site.xml):

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>cloud1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

6.4.5: Modify the mapred-site.xml file (/usr/local/java/hadoop-2.7.7/etc/hadoop/mapred-site.xml):

mv mapred-site.xml.template mapred-site.xml

vim mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

6.4.6: Modify the file that lists the DataNode and NodeManager hosts (/usr/local/java/hadoop-2.7.7/etc/hadoop/slaves):

cloud1
cloud2
cloud3

6.5: Install Spark

Command: tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz

Add the Spark parameters to ~/.bashrc (command: vi ~/.bashrc):

export SPARK_HOME=/usr/local/java/spark-2.4.0-bin-hadoop2.7

export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Create the Spark startup configuration file:

cp /usr/local/java/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh.template /usr/local/java/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh

Modify the contents of the spark-env.sh file:

export SPARK_MASTER_IP=cloud1
export SPARK_WORKER_MEMORY=128m
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export SCALA_HOME=/usr/local/java/scala-2.12.8
export SPARK_HOME=/usr/local/java/spark-2.4.0-bin-hadoop2.7
export HADOOP_CONF_DIR=/usr/local/java/hadoop-2.7.7/etc/hadoop
export SPARK_LIBRARY_PATH=$SPARK_HOME/lib
export SCALA_LIBRARY_PATH=$SPARK_LIBRARY_PATH
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077

Modify the file that lists the Worker hosts (/usr/local/java/spark-2.4.0-bin-hadoop2.7/conf/slaves):

cloud1
cloud2
cloud3

Cluster deployment:

# commit the cloud1 container; the command prints the id of the new image
docker commit cloud1

# tag the new image (here tagged cloud1, which is the image name used below)
docker tag <new image id> cloud1

Commit this container into a new image, then use that image to run two more containers, cloud2 and cloud3.

# -h specifies the hostname after the container starts

docker run --name cloud2 -h cloud2 --add-host cloud2:172.17.0.3 --add-host cloud3:172.17.0.4 --add-host cloud1:172.17.0.2 -it cloud1

docker run --name cloud3 -h cloud3 --add-host cloud3:172.17.0.4 --add-host cloud1:172.17.0.2 --add-host cloud2:172.17.0.3 -it cloud1
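Since both worker commands differ only in the container name, a small helper can print them so the long --add-host list is written once. This is a sketch; the --add-host order differs from the lines above but is not significant, and the host names and IPs are the ones from the text.

```shell
# Print the docker run command for a worker container named $1
make_worker_cmd() {
  printf 'docker run --name %s -h %s --add-host cloud1:172.17.0.2 --add-host cloud2:172.17.0.3 --add-host cloud3:172.17.0.4 -it cloud1\n' "$1" "$1"
}

make_worker_cmd cloud2
make_worker_cmd cloud3
```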

# manually modify the myid on cloud2 and cloud3

On cloud2: echo 2 > ~/zookeeper/tmp/myid # opening myid now shows a 2

On cloud3: echo 3 > ~/zookeeper/tmp/myid # opening myid now shows a 3

# start the zookeeper cluster (start zk on cloud1, cloud2, and cloud3 separately)

~/zookeeper/bin/zkServer.sh start

# use status to check whether it started (the status only shows once cloud1 through cloud3 are all started)

~/zookeeper/bin/zkServer.sh status

# start the journalnodes (start them all from cloud1; note: the script for this is hadoop-daemons.sh, the plural form with the s)

# run the jps command to verify that a JournalNode process appears on cloud1, cloud2, and cloud3

~/hadoop/sbin/hadoop-daemon.sh start journalnode # starts it on the current node only

~/hadoop/sbin/hadoop-daemons.sh start journalnode # starts it on all nodes

# format HDFS (in the bin directory); execute on cloud1:

~/hadoop/bin/hdfs namenode -format

# format ZK (execute on cloud1, in the bin directory)

~/hadoop/bin/hdfs zkfc -formatZK

# start HDFS (execute on cloud1)

~/hadoop/sbin/start-dfs.sh

# execute start-yarn.sh on cloud1

~/hadoop/sbin/start-yarn.sh

# start the spark cluster

~/spark/sbin/start-all.sh

~/spark/sbin/start-master.sh # start the master node

~/spark/sbin/start-slaves.sh # start all worker nodes; the version without the s starts only the local worker

After startup, visit in a browser: Spark => cloud1:8080, Yarn => cloud1:8088, HDFS => cloud1:50070
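The startup sequence above matters (ZooKeeper before the journalnodes, formatting before starting HDFS). A sketch that captures the order as a numbered list so it can be checked at a glance; the script names are the ones used in the text:

```shell
# Ordered startup steps for the cluster, one token per step
steps="zkServer.sh-start hadoop-daemons.sh-start-journalnode hdfs-namenode--format hdfs-zkfc--formatZK start-dfs.sh start-yarn.sh start-all.sh"
i=1
for s in $steps; do
  echo "$i: $s"
  i=$((i+1))
done
```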

Note: the three containers cloud1, cloud2, and cloud3 must all have SSH enabled, since it is used for communication between the nodes.

Make sure each container's /etc/hosts contains the IPs and names of all three nodes, and the virtual machine's firewall must be turned off.
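A sketch of the three /etc/hosts entries just described, generated into a temporary file rather than the real /etc/hosts so it is safe to try:

```shell
set -e
HOSTS=$(mktemp)
# The three node entries from the text: master cloud1 plus two workers
cat > "$HOSTS" <<'EOF'
172.17.0.2 cloud1
172.17.0.3 cloud2
172.17.0.4 cloud3
EOF
cat "$HOSTS"
```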

Problems encountered during the build, and their solutions:

Install the curl command: sudo apt-get install curl

If "Temporary failure resolving 'archive.ubuntu.com'" appears while installing curl, add nameserver 202.96.134.133 and nameserver 8.8.8.8 (each on its own line) to the /etc/resolv.conf file.

If you want to reach the services from the virtual machine's own IP, you have to map the ports out of the Docker container:

Add port mapping (source: https://blog.csdn.net/hp_satan/article/details/77531794)

a. Get the container ip

docker inspect $container_name | grep IPAddress

b. Add a forwarding rule

iptables -t nat -A DOCKER -p tcp --dport $host_port -j DNAT --to-destination $docker_ip:$docker_port


Delete a port mapping rule:

a. Get the rule number

iptables -t nat -nL --line-number

b. Delete the rule by number

iptables -t nat -D DOCKER $num

[root@localhost]# iptables -t nat -A DOCKER -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:8080

[root@localhost]# iptables -t nat -A DOCKER -p tcp --dport 50070 -j DNAT --to-destination 172.17.0.2:50070
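The two example rules above follow one template, so a small helper can build the rule string from the host port, container IP, and container port; printing the rule first also makes it easy to review before applying it with root privileges. A sketch:

```shell
# Build the DNAT rule for mapping $1 on the host to $2:$3 in the container
make_dnat_rule() {
  printf 'iptables -t nat -A DOCKER -p tcp --dport %s -j DNAT --to-destination %s:%s\n' \
    "$1" "$2" "$3"
}

make_dnat_rule 8080 172.17.0.2 8080
make_dnat_rule 50070 172.17.0.2 50070
```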

Linux commands for viewing processes:

1. The ps command is used to view the currently running processes; grep searches.

For example: ps -ef | grep java

This shows information for every process whose CMD is java.

2. ps aux | grep java

aux displays all process statuses.

The kill command terminates a process.

For example: kill -9 [PID]

-9 forces the process to stop immediately.

Usually you use ps to find the process PID and then use kill to terminate it.
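A sketch of that find-then-kill pattern, using a throwaway background sleep as the target process instead of a real service:

```shell
set -e
# Start a dummy long-running process to act as the target
sleep 300 &
target=$!
# In practice the PID would come from `ps -ef | grep <name>`;
# here the shell already knows it
kill -9 "$target"
# Reap the killed process so its PID is fully released
wait "$target" 2>/dev/null || true
echo "killed $target"
```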

The commands to turn off the firewall in CentOS 7:

systemctl stop firewalld.service # stop the firewall

systemctl disable firewalld.service # prevent the firewall from starting at boot

firewall-cmd --state # view the firewall status (shows not running when off, running when on)

Docker network problems:

docker network ls # shows the networks docker connects containers to

docker network inspect bridge # the container list here is empty at this point

docker run --name cloud1 -h cloud1 --add-host cloud1:172.17.0.2 -it ubuntu # after this runs, the container list shows the cloud1 name

systemctl restart docker

docker inspect container_name | grep IPAddress # check the container's ip address

The container from Docker's Ubuntu image has neither the ifconfig command nor the ping command.

To resolve:

apt-get update

apt install net-tools # ifconfig

apt install iputils-ping # ping

apt-get install vim # vim

Copy from the host to the container: docker cp host_path containerID:container_path

Copy from the container to the host: docker cp containerID:container_path host_path

Start a container

Start a container and run bash (interactive mode):

$ docker run -i -t <image_name> /bin/bash

Start a container running in the background (the more common way):

$ docker run -d -it image_name

PS: the image_name here includes the tag, e.g. hello.demo.kdemo:v1.0

docker start <container ID or name> # this start is not interactive

docker run -d -p 80:12345 weba:v0.1 # start in the background and map port 80 on the host to port 12345 in the image

docker run -d -p 80:12345 --name web weba:v0.1 # the same, but names the container web instead of an auto-generated name

docker attach <container name or ID> # generally not used in production; attaching to some web services hangs unresponsive, presumably because the process blocks on listening

docker exec -it <container ID> /bin/bash # if /bin/bash is not found, use: docker exec -it <container ID> sh

Attach to a running container:

docker attach <container ID>

Go inside a running container and run bash (better than attach):

docker exec -t -i <container ID> /bin/bash
