
How to deploy Spark Standalone Cluster with Rainbond

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

In this issue, the editor walks you through deploying a Spark Standalone cluster on Rainbond. The article is rich in content and analyzes the deployment from a practical point of view; we hope you get something out of it.

Standalone is the master-slave cluster deployment mode that Spark provides out of the box. This article describes a conventional one-master, multi-worker deployment, in which the master service relies on Rainbond's platform monitoring for availability, with support for rescheduling and restart, while the worker service can be scaled to multiple nodes as needed.

The screenshot of the deployment effect is as follows:

Rainbond deployment effect diagram

Spark master UI diagram

Deployment steps

Before you start, you need to install the Rainbond platform; refer to the Rainbond installation and deployment documentation. This article assumes you have mastered the basic operations of Rainbond, so if you are new to the platform, please read the Rainbond Quick Start Guide first.

Deploy master services for a single instance

Deploy spark-master and use Rainbond to create components based on Docker image:

bde2020/spark-master:3.0.1-hadoop3.2

After the image detection succeeds, open the advanced settings and make three special configurations.

Add environment variables to the environment variable module

SPARK_DAEMON_JAVA_OPTS=-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/data

This sets spark-master to "Recovery with Local File System" mode, so that the master can recover its state from persisted files after a restart, maintaining the availability of the master service.

In the storage settings, add shared storage mounted at /data to persist the master's data, so that it can recover after a reboot.

In port management, open the external service on port 8080; after the component starts successfully, you can access the master UI.

In the deployment properties, select "stateful single instance" as the component type.

Once deployed as a stateful component, it obtains a stable internal access domain name for the worker components to connect to, and stateful service control ensures that the master node never runs more than one instance at a time.

After the settings are completed, select confirm creation to start the master service.

Once the component is running successfully, click "Access" to open the master UI. As shown in the figure above, the UI displays the address as spark://gr7b570e-0:7077, but workers must connect through the component's stable service name, i.e. spark://gr7b570e:7077. Copy and record this address.

Note that the actual address depends on your UI; the above is just an example.
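To make the substitution explicit, here is a hypothetical helper (the function name and hostname pattern are assumptions based on the example above) that strips the instance ordinal shown in the UI to obtain the stable service address:

```python
import re

def service_master_url(ui_url: str) -> str:
    """Convert the address shown in the Spark master UI (which includes an
    instance ordinal such as "-0") into the stable service address that
    workers should use. The hostname pattern is an assumption based on the
    example in this article."""
    return re.sub(r"^(spark://[A-Za-z0-9]+)-\d+(:\d+)$", r"\1\2", ui_url)
```

For example, service_master_url("spark://gr7b570e-0:7077") returns "spark://gr7b570e:7077"; an address without an ordinal is returned unchanged.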

Deploy worker instances with multiple instances

Deploy spark-worker by creating a component from a Docker run command, which allows you to set the necessary properties directly:

docker run -it -e SPARK_MASTER=spark://gr7b570e:7077 -e SPARK_WORKER_MEMORY=1g bde2020/spark-worker:3.0.1-hadoop3.2

Two environment variables are specified in the above creation method.

SPARK_MASTER specifies the address of the master, which is obtained by the component created in the previous step.

SPARK_WORKER_MEMORY sets the amount of memory available to a single worker instance and should match the memory allocated to each instance. For example, if you assign 1 GB to each instance, set SPARK_WORKER_MEMORY=1g. If you do not set this variable, the service automatically reads the operating system's memory size; since we deploy in containers, the value read would be the total memory of the host, which is far larger than the memory actually allocated to the worker instance.
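As a rough sketch of the sizing logic described above, this hypothetical helper (the function and the 25% overhead reserve are assumptions for illustration, not Spark defaults) derives a SPARK_WORKER_MEMORY value from a container's memory limit:

```python
def suggest_worker_memory(container_limit_bytes: int,
                          reserve_fraction: float = 0.25) -> str:
    """Suggest a SPARK_WORKER_MEMORY value from a container's memory limit,
    reserving a fraction for JVM and OS overhead. The 25% reserve is an
    assumption for illustration, not a Spark default."""
    usable_mib = int(container_limit_bytes * (1 - reserve_fraction)) // (1024 * 1024)
    return f"{usable_mib}m"
```

For a 1 GiB container limit this yields "768m", which keeps the worker from claiming the host's full memory.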

Also enter the advanced settings and set the component deployment mode to stateful multi-instance.

Confirm the creation of the component. After launching successfully, you can set the number of running instances of worker in the component's scaling page.

At this point, our Spark cluster has been deployed.

Spark data reading

The principle of processing data close to where it is stored is gradually being abandoned.

In the past, we preferred to deploy data-processing services (Hadoop, YARN, etc.) as close to the data as possible, mainly because Hadoop's computing model is IO-heavy: if data and computation are separated, the cost of network IO rises and much more network bandwidth is required.

But Spark's mechanism is different: its computing model caches data in memory as much as possible, which means the resources Spark consumes are mainly memory and CPU, and the machines that store the data may not have enough of either. Separating data from computation is therefore often the better choice.

More choices after the separation of data and calculation

Separating data and computation means that compute services are deployed independently and storage services provide data to them over the network. Going over the network opens up a variety of protocols to choose from: besides traditional HDFS, object storage is now common, such as the many S3-compatible services, as well as distributed file systems, which can be chosen according to the data type and actual needs. The compute service (spark worker) can then flexibly allocate resources in the distributed cluster according to the needs of each task.
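As an illustration of this mode, a job could be submitted to the standalone master while reading from S3-compatible object storage. This is only a sketch: the endpoint, credentials, bucket and job script below are placeholders, not values from this article.

```shell
# Sketch only: endpoint, credentials, bucket and job script are placeholders.
spark-submit \
  --master spark://gr7b570e:7077 \
  --packages org.apache.hadoop:hadoop-aws:3.2.0 \
  --conf spark.hadoop.fs.s3a.endpoint=http://minio.example:9000 \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.access.key=ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=SECRET_KEY \
  my_job.py s3a://my-bucket/input/
```

This is a configuration sketch that requires a live cluster; adjust the s3a settings to match your object-storage service.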

High availability of the master node (master/standby switching)

Spark can provide master/standby switching of the master service based on ZooKeeper, and the configuration is relatively simple.
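As a sketch, enabling ZooKeeper-based recovery only means changing the recovery-mode environment variable set earlier; the ZooKeeper addresses below are placeholders:

```shell
# Placeholders: replace zk1/zk2/zk3 with your actual ZooKeeper ensemble.
SPARK_DAEMON_JAVA_OPTS=-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 -Dspark.deploy.zookeeper.dir=/spark
```

In this mode, multiple master instances can run at once; ZooKeeper elects a leader among them, and workers reconnect to the new leader on failover.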

Rainbond is a cloud-native application management platform that implements microservice architecture without code changes and manages Kubernetes without requiring container expertise, helping enterprises move applications to the cloud and continuously delivering any enterprise application to Kubernetes clusters, hybrid clouds, multi-clouds and other infrastructure. It is also the supporting platform of the Rainstore cloud-native application store.

This is how to deploy a Spark Standalone cluster with Rainbond. If you happen to have similar questions, the analysis above may help you understand. If you want to learn more, you are welcome to follow the Internet Technology channel.
