2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
This article shows how to analyze the Spark cluster architecture and its task execution process. The editor finds it practical and shares it here; I hope you take something away from reading it.
Spark cluster components
Spark follows a typical Master/Slave architecture. A cluster consists of four main components:

Driver: the driver in the Spark framework; it runs the main() function of the user's Application. Analogous to MapReduce's MRAppMaster.

Master: the master node; it controls the entire cluster and monitors the Workers. Analogous to the global ResourceManager in YARN mode.

Worker: a slave node; it manages its compute node and starts Executors. Analogous to the NodeManager in YARN.

Executor: a task executor, a process running on a worker node. Analogous to MapTask and ReduceTask in MapReduce.
Basic execution process of Spark
Take the StandAlone operation mode as an example:
1. The client starts the application and the Driver, and submits the job to the Master to apply for resources.
2. The Master allocates resources on the Workers and tells each Worker to start an Executor.
3. The Worker starts the Executor: the Worker creates an ExecutorRunner thread, which starts the ExecutorBackend process; the Executor communicates with the Driver (task distribution, monitoring, etc.).
4. After ExecutorBackend starts, it registers with the Driver's SchedulerBackend, and SchedulerBackend submits tasks to the Executor to run.
5. The job ends after all Stages have completed.
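The five steps above can be sketched as a toy message flow in plain Python. All class and method names here are illustrative stand-ins, not Spark's actual internal API (real Spark components talk over RPC endpoints, not direct method calls):

```python
# Toy simulation of the standalone submit flow described above.
# Class names mirror the roles in the text; everything else is illustrative.

class Executor:
    def __init__(self, worker_id):
        self.worker_id = worker_id

    def run(self, task):
        return f"task-{task} done on {self.worker_id}"

class Worker:
    def __init__(self, worker_id, cores):
        self.worker_id = worker_id
        self.cores = cores

    def launch_executor(self):
        # Step 3: the Worker starts an Executor process
        return Executor(self.worker_id)

class Master:
    def __init__(self, workers):
        self.workers = workers

    def allocate(self):
        # Step 2: allocate resources and tell each Worker to start an Executor
        return [w.launch_executor() for w in self.workers]

class Driver:
    def submit(self, master, tasks):
        # Step 1: the client/Driver applies to the Master for resources
        executors = master.allocate()
        # Step 4: executors are registered; tasks are handed out to them
        results = [executors[i % len(executors)].run(t)
                   for i, t in enumerate(tasks)]
        # Step 5: the job ends once every task has completed
        return results

master = Master([Worker("worker-1", 4), Worker("worker-2", 4)])
print(Driver().submit(master, tasks=[0, 1, 2]))
# tasks alternate across the two executors
```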
The author emphasizes:
Operations performed on the Driver side
SparkContext builds the DAG
DAGScheduler splits the job into Stages and generates a TaskSet for the partitions that need to be processed
TaskScheduler dispatches the Tasks
SchedulerBackend submits the Tasks to the Executors to run
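The key idea behind the DAGScheduler step is that Stage boundaries fall at shuffle (wide) dependencies. A minimal sketch of that splitting rule in plain Python, assuming a simplified flat lineage of `(operation, is_shuffle)` pairs rather than Spark's real RDD dependency graph:

```python
# Toy version of DAGScheduler's stage splitting: a shuffle dependency
# ends the current stage, and downstream operations begin a new one.

def split_into_stages(lineage):
    """lineage: list of (op_name, is_shuffle) pairs, in execution order."""
    stages, current = [], []
    for op, is_shuffle in lineage:
        current.append(op)
        if is_shuffle:
            # A wide (shuffle) dependency marks a stage boundary
            stages.append(current)
            current = []
    if current:
        stages.append(current)
    return stages

# e.g. textFile -> map -> reduceByKey (shuffle) -> map -> collect
lineage = [("textFile", False), ("map", False),
           ("reduceByKey", True), ("map", False), ("collect", False)]
print(split_into_stages(lineage))
# the shuffle in reduceByKey splits the job into two stages
```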
General rules of resource division
Get the resources on all Workers
Sort the Workers by resource size
Take resources in the sorted order
Allocate in round-robin (polling) fashion
Give priority to the Workers with more resources
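These rules can be sketched as a small allocation function: sort workers by free resources, then poll them round-robin until the request is satisfied. This mimics the spirit of the standalone Master's spread-out allocation; the function name and structure are illustrative, not Spark's actual code:

```python
# Sketch of "sort by resources, then round-robin" core allocation.

def allocate_cores(workers, cores_needed):
    """workers: dict of worker_id -> free cores.
    Returns a dict of cores assigned per worker."""
    # Workers with more free resources come first
    order = sorted(workers, key=lambda w: workers[w], reverse=True)
    assigned = {w: 0 for w in order}
    remaining = cores_needed
    while remaining > 0:
        progress = False
        for w in order:  # poll the workers in sorted order
            if remaining > 0 and assigned[w] < workers[w]:
                assigned[w] += 1
                remaining -= 1
                progress = True
        if not progress:  # the cluster cannot satisfy the request
            break
    return assigned

print(allocate_cores({"w1": 4, "w2": 2, "w3": 3}, cores_needed=6))
# the 6 cores end up spread evenly: 2 from each worker
```

Spreading executors across workers this way tends to improve data locality for jobs whose input is distributed across the cluster, which is why it is the default behavior rather than filling one worker at a time.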
The Spark TaskScheduler implementation differs by run mode. In YARN mode, for example, yarn-cluster mode uses YarnClusterScheduler, while yarn-client mode uses YarnClientClusterScheduler.
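That mode-to-scheduler mapping can be written down as a simple lookup. The two YARN class names are the ones given above; the lookup function and the standalone default shown here are illustrative, not Spark's actual factory code:

```python
# Which TaskScheduler class handles the job depends on the deploy mode.

SCHEDULERS = {
    "yarn-cluster": "YarnClusterScheduler",
    "yarn-client": "YarnClientClusterScheduler",
}

def scheduler_for(mode):
    # Fall back to the generic implementation for other modes (assumption)
    return SCHEDULERS.get(mode, "TaskSchedulerImpl")

print(scheduler_for("yarn-cluster"))  # YarnClusterScheduler
```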
The above is how to analyze the Spark cluster and its task execution process. Some of these points may come up in daily work; I hope you learned something from this article. For more details, please follow the industry information channel.