Today I will talk with you about how to set up and configure Standalone mode for the big data Spark runtime environment, which many people may not understand well. To help you understand it better, the editor has summarized the following content for you; I hope you can get something out of this article.
Big data Spark runtime environment: Standalone mode and related configuration
Standalone mode
Here we look at the cluster mode that runs using only Spark's own nodes, which is what we call standalone deployment (Standalone) mode. Spark's Standalone mode follows the classic master-slave architecture.
Cluster planning:
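The original planning table is not reproduced here. Based on the process listing shown in the "start the cluster" step below, the layout is roughly as follows (a sketch reconstructed from this article, not an authoritative plan):
hadoop102: Master, Worker
hadoop103: Worker
hadoop104: Worker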
1 unzip the file
Upload the spark-3.0.0-bin-hadoop3.2.tgz file to Linux and extract it to the specified location:
tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module
cd /opt/module
mv spark-3.0.0-bin-hadoop3.2 spark-standalone
2 modify the configuration files
1) enter the conf directory under the extracted path and rename the slaves.template file to slaves
mv slaves.template slaves
2) modify the slaves file and add the worker nodes
hadoop102
hadoop103
hadoop104
3) rename the spark-env.sh.template file to spark-env.sh
mv spark-env.sh.template spark-env.sh
4) modify the spark-env.sh file to add the JAVA_HOME environment variable and the master node settings for the cluster
export JAVA_HOME=/opt/module/jdk1.8.0_212
SPARK_MASTER_HOST=hadoop102
SPARK_MASTER_PORT=7077
Note: port 7077 plays the same role here as port 8020 does for Hadoop 3.x internal communication. Confirm the port against your own virtual machine's configuration.
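If you are unsure which port your Hadoop cluster uses, one way to check (a minimal sketch, assuming HADOOP_HOME points at your Hadoop installation) is to look at fs.defaultFS in core-site.xml:
grep -A 1 "fs.defaultFS" $HADOOP_HOME/etc/hadoop/core-site.xml    # the port in the hdfs:// URL is the one to match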
5) distribute the spark-standalone directory
xsync spark-standalone
3 start the cluster
1) execute the startup script command:
sbin/start-all.sh
2) view the running processes on the three servers
================ hadoop102 ================
3330 Jps
3238 Worker
3163 Master
================ hadoop103 ================
2966 Jps
2908 Worker
================ hadoop104 ================
2978 Worker
3036 Jps
3) view the Master resource-monitoring Web UI: http://hadoop102:8080
4 submit an application
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop102:7077 \
./examples/jars/spark-examples_2.12-3.0.0.jar \
10
--class specifies the main class of the program to execute
--master spark://hadoop102:7077 specifies standalone deployment mode, connecting to the Spark cluster
spark-examples_2.12-3.0.0.jar is the jar package that contains the class to run
The number 10 is the program's input argument, used here to set the number of tasks for the application
When a task is executed, multiple Java processes are generated
When running a task, all of the cores of the cluster's server nodes are used by default, and each node uses 1024M of memory.
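If you do not want to use the full default allocation, spark-submit can cap the resources explicitly. A minimal sketch (the values below are illustrative and are not part of the original steps):
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop102:7077 \
--total-executor-cores 2 \
--executor-memory 1G \
./examples/jars/spark-examples_2.12-3.0.0.jar \
10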
5 configure History Service
Since the monitoring page at hadoop102:4040 can no longer show the runs of historical tasks once spark-shell is stopped, a history server is configured during development to record task runs.
1) rename the spark-defaults.conf.template file to spark-defaults.conf
mv spark-defaults.conf.template spark-defaults.conf
2) modify the spark-defaults.conf file and configure the log storage path
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop102:8020/directory
Note: the Hadoop cluster must be started, and the /directory path on HDFS must exist in advance.
sbin/start-dfs.sh
hadoop fs -mkdir /directory
3) modify the spark-env.sh file and add the log configuration
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.fs.logDirectory=hdfs://hadoop102:8020/directory -Dspark.history.retainedApplications=30"
Note: write this as a single line, with the options separated by spaces.
Parameter 1: the port number for Web UI access is 18080.
Parameter 2: specifies the storage path for the history server's logs.
Parameter 3: specifies the number of Application history records to keep; when this value is exceeded, the oldest application information is deleted. This is the number of applications kept in memory, not the number displayed on the page.
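As an alternative sketch (not part of the original steps), the same three settings can also be written as properties in conf/spark-defaults.conf instead of SPARK_HISTORY_OPTS:
spark.history.ui.port 18080
spark.history.fs.logDirectory hdfs://hadoop102:8020/directory
spark.history.retainedApplications 30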
4) distribute configuration files
xsync conf
5) restart the cluster and history services
sbin/start-all.sh
sbin/start-history-server.sh
6) execute the task again
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop102:7077 \
./examples/jars/spark-examples_2.12-3.0.0.jar \
10
7) View History Service: http://hadoop102:18080
6 configure High availability (HA)
High availability is needed because the current cluster has only one Master node, which creates a single point of failure. To solve this, multiple Master nodes are configured in the cluster: once the active Master fails, a standby Master takes over and provides service, so that jobs can continue to run. High availability here is generally coordinated by Zookeeper.
Cluster planning:
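The original planning table is not reproduced here. Based on the Zookeeper URL and the standby Master started in the steps below, the layout is roughly as follows (a sketch reconstructed from this article):
hadoop102: Master, Worker, Zookeeper
hadoop103: Master (standby), Worker, Zookeeper
hadoop104: Worker, Zookeeper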
1) stop the cluster
sbin/stop-all.sh
2) start Zookeeper
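How Zookeeper is started depends on your own installation. A minimal sketch, assuming Zookeeper is installed under /opt/module/zookeeper on hadoop102, hadoop103 and hadoop104 (the path is an assumption, not from this tutorial):
/opt/module/zookeeper/bin/zkServer.sh start    # run this on each of the three nodes
/opt/module/zookeeper/bin/zkServer.sh status   # one node should report leader, the others follower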
3) modify the spark-env.sh file to add the following configuration
Comment out the following:
#SPARK_MASTER_HOST=hadoop102
#SPARK_MASTER_PORT=7077
Add the following (the Master monitoring page uses port 8080 by default, which conflicts with Zookeeper, so change it to 8989; you may also customize it, but remember to use that port when accessing the UI monitoring page):
SPARK_MASTER_WEBUI_PORT=8989
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop102,hadoop103,hadoop104 -Dspark.deploy.zookeeper.dir=/spark"
Note: write the export line as a single line, with the options separated by spaces.
4) distribute configuration files
xsync conf/
5) start the cluster
sbin/start-all.sh
6) start a separate Master node on hadoop103; the Master on the hadoop103 node will be in standby state
[bigdata@hadoop103 spark-standalone]$ sbin/start-master.sh
7) submit the application to the highly available cluster
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop102:7077,hadoop103:7077 \
./examples/jars/spark-examples_2.12-3.0.0.jar \
10
8) stop the Master resource monitoring process of hadoop102
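One way to do this (a sketch, assuming you run it from the spark-standalone directory on hadoop102) is the bundled stop script; alternatively, find the Master pid with jps and kill it:
sbin/stop-master.sh
# or: jps, then kill -9 <Master pid>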
9) check the Master resource-monitoring Web UI of hadoop103; after a short time, the Master status of the hadoop103 node is promoted to active.
After reading the above, do you have a better understanding of how to set up and configure Standalone mode for the big data Spark runtime environment? If you want to learn more, please follow the industry information channel. Thank you for your support.