
How to quickly build a Spark cluster with Docker


This article introduces how to build a Spark cluster quickly with Docker, working through a real example step by step. I hope you read it carefully and get something out of it!

Preface

Spark is a distributed computing framework developed at Berkeley. Compared with Hadoop, Spark can cache intermediate results in memory, which improves efficiency in computing scenarios that require iteration. Let's walk through how to build a Spark cluster quickly using Docker.

Intended audience

Developers who are using Spark

Developers who are learning Docker or Spark

Preparatory work

Install Docker

(Optional) Download Java and a Spark build bundled with Hadoop in advance; see the commands after this list
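If you choose to pre-download, the Spark tarball used throughout this article is available from the Apache archive. The JDK rpm requires accepting Oracle's license, so its URL varies; treat the following as a sketch:

# Spark 2.1.0 pre-built against Hadoop 2.7 (the version used in the Dockerfiles below)
wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz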

Spark cluster

Spark runtime architecture diagram

As shown above, a Spark cluster consists of the following two parts:

Cluster manager (Mesos, YARN, or standalone mode)

Worker nodes (workers)
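In standalone mode, both roles are started by Spark's own scripts; this is roughly what the containers below will automate (a sketch, assuming the /spark install path used later in this article):

# On the master: start the standalone cluster manager (RPC port 7077, web UI 8080)
/spark/sbin/start-master.sh

# On each worker: register with the master
/spark/sbin/start-slave.sh spark://<master-host>:7077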

How to dockerize it (this example uses standalone mode)

1. Split the Spark cluster into images

Base (base image)

Master (master node image)

Worker (worker node image)

2. Write the base Dockerfile

Note: CentOS is chosen as the base image because it makes switching versions easy. Java and Spark are downloaded ahead of time to make debugging faster; you can serve the installation files from a local static file server, which Node.js's http-server sets up in seconds.

The command is as follows

npm install http-server -g
http-server -p 54321 ~/Downloads
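Before baking the server's address into a Dockerfile, you can sanity-check that a file is actually being served (192.168.199.102 is the LAN address used in this article; substitute your own):

curl -I http://192.168.199.102:54321/jdk-8u11-linux-x64.rpm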

Now let's write the Dockerfile itself.

FROM centos:7
MAINTAINER RavenZZ

# Install system tools
RUN yum update -y
RUN yum upgrade -y
RUN yum install -y byobu curl htop man unzip nano wget
RUN yum clean all

# Install Java
ENV JDK_VERSION 8u11
ENV JDK_BUILD_VERSION b12
# If your network is fast, you can download directly from the origin server:
# RUN curl -LO "http://download.oracle.com/otn-pub/java/jdk/$JDK_VERSION-$JDK_BUILD_VERSION/jdk-$JDK_VERSION-linux-x64.rpm" -H 'Cookie: oraclelicense=accept-securebackup-cookie' && rpm -i jdk-$JDK_VERSION-linux-x64.rpm; rm -f jdk-$JDK_VERSION-linux-x64.rpm;
RUN curl -LO "http://192.168.199.102:54321/jdk-8u11-linux-x64.rpm" && rpm -i jdk-$JDK_VERSION-linux-x64.rpm; rm -f jdk-$JDK_VERSION-linux-x64.rpm;
ENV JAVA_HOME /usr/java/default

# Install Spark
WORKDIR /spark
RUN \
  curl -LO 'http://192.168.199.102:54321/spark-2.1.0-bin-hadoop2.7.tgz' && \
  tar zxf spark-2.1.0-bin-hadoop2.7.tgz
RUN rm -rf spark-2.1.0-bin-hadoop2.7.tgz
RUN mv spark-2.1.0-bin-hadoop2.7/* ./

# curl was only needed for the downloads above, so clean it up last
RUN yum remove -y curl; yum clean all

ENV SPARK_HOME /spark
ENV PATH /spark/bin:$PATH
ENV PATH /spark/sbin:$PATH
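Build and tag the base image. The tag ravenzz/spark-hadoop matters because the master and worker Dockerfiles below inherit FROM it; the ./base directory name is an assumption about how you lay out the three Dockerfiles:

# Build the shared base image that master and worker both extend
docker build -t ravenzz/spark-hadoop ./base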

3. Write the master Dockerfile

FROM ravenzz/spark-hadoop
MAINTAINER RavenZZ

COPY master.sh /

ENV SPARK_MASTER_PORT 7077
ENV SPARK_MASTER_WEBUI_PORT 8080
ENV SPARK_MASTER_LOG /spark/logs

EXPOSE 8080 7077 6066

CMD ["/bin/bash", "/master.sh"]
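The master.sh script itself is not shown in the original article. A minimal sketch that matches the environment variables above would run the standalone master in the foreground so the container stays alive:

#!/bin/bash
# Minimal master.sh sketch (assumed; not part of the original article).
# Runs the standalone master in the foreground, logging to SPARK_MASTER_LOG.
export SPARK_MASTER_HOST=$(hostname)
mkdir -p $SPARK_MASTER_LOG
/spark/bin/spark-class org.apache.spark.deploy.master.Master \
  --host $SPARK_MASTER_HOST \
  --port $SPARK_MASTER_PORT \
  --webui-port $SPARK_MASTER_WEBUI_PORT \
  >> $SPARK_MASTER_LOG/spark-master.out 2>&1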

4. Write the worker Dockerfile

FROM ravenzz/spark-hadoop
MAINTAINER RavenZZ

COPY worker.sh /

ENV SPARK_WORKER_WEBUI_PORT 8081
ENV SPARK_WORKER_LOG /spark/logs
ENV SPARK_MASTER "spark://spark-master:32769"

EXPOSE 8081

CMD ["/bin/bash", "/worker.sh"]
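Likewise, worker.sh is not shown; a minimal sketch would start a worker pointed at $SPARK_MASTER. Note that the default above is spark://spark-master:32769, but docker-compose below overrides it to spark://spark-master:7077.

#!/bin/bash
# Minimal worker.sh sketch (assumed; not part of the original article).
# Registers with the master named in SPARK_MASTER and stays in the foreground.
mkdir -p $SPARK_WORKER_LOG
/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
  --webui-port $SPARK_WORKER_WEBUI_PORT \
  $SPARK_MASTER \
  >> $SPARK_WORKER_LOG/spark-worker.out 2>&1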

5. Write docker-compose.yml

version: '3'
services:
  spark-master:
    build:
      context: ./master
      dockerfile: Dockerfile
    ports:
      - "50001:6066"
      - "50002:7077"  # SPARK_MASTER_PORT
      - "50003:8080"  # SPARK_MASTER_WEBUI_PORT
    expose:
      - 7077
  spark-worker1:
    build:
      context: ./worker
      dockerfile: Dockerfile
    ports:
      - "50004:8081"
    links:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
  spark-worker2:
    build:
      context: ./worker
      dockerfile: Dockerfile
    ports:
      - "50005:8081"
    links:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077

6. Test the cluster

docker-compose up

Access http://localhost:50003/ (the master web UI port mapped above); the result is shown in the figure.
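To confirm the cluster accepts work and not just connections, you can submit the SparkPi example bundled with the distribution against the master (a sketch; the container name filter depends on how compose names your project, so check docker ps first):

# Submit the bundled SparkPi example to the standalone master
docker exec -it $(docker ps -qf "name=spark-master") \
  /spark/bin/spark-submit \
  --master spark://spark-master:7077 \
  --class org.apache.spark.examples.SparkPi \
  /spark/examples/jars/spark-examples_2.11-2.1.0.jar 100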

That wraps up "how to quickly build a Spark cluster with Docker". Thank you for reading!
