
What are the characteristics and advantages of the Spark operating architecture

Updated 2025-04-01. From: SLTechnology News&Howtos (shulou)


Shulou(Shulou.com)06/02 Report--

What are the characteristics and advantages of the Spark operating architecture? Many newcomers find this question confusing, so this article summarizes the key points: how a Spark application runs, what distinguishes the architecture, and what advantages follow from it.

As a distributed computing framework, Spark is similar to MapReduce (MR) in the Hadoop ecosystem. Its computing model closely resembles that of MR: both follow a divide-and-conquer approach, but Spark generally uses resources much more efficiently than MR. The following summarizes common big data interview questions on the Spark running architecture, covering the basic run process, the architectural features, and the advantages of Spark.

1. Basic process of running a Spark application:

(1) Build the running environment of the Spark Application (start the SparkContext). The SparkContext registers with the resource manager (which can be Standalone, Mesos, or YARN) and requests resources for running Executors.

(2) The resource manager allocates Executor resources and starts the Executors. Each Executor reports its running status to the resource manager along with its heartbeat.

(3) The SparkContext builds a DAG, the DAG is split into Stages, and the TaskSet of each Stage is sent to the Task Scheduler. Executors request Tasks from the SparkContext; the Task Scheduler dispatches Tasks to the Executors to run, while the SparkContext ships the application code to the Executors.

(4) Tasks run on the Executors, and all resources are released once they have finished.
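Step (3) can be illustrated with a small toy model. The sketch below is plain Python, not Spark code: the `Op` class, the `wide` flag, and both helper functions are hypothetical names introduced for illustration. It assumes a purely linear lineage and simplifies stage splitting to "every shuffle (wide) operation starts a new stage", then builds one task per partition for each stage, roughly the way a TaskSet is handed to the Task Scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    """One step in a linear lineage (hypothetical model, not a Spark class)."""
    name: str
    wide: bool  # True if the step needs a shuffle, i.e. marks a stage boundary


def split_into_stages(lineage: List[Op]) -> List[List[str]]:
    """Cut the lineage into stages: each wide operation opens a new stage."""
    stages: List[List[str]] = [[]]
    for op in lineage:
        if op.wide and stages[-1]:
            stages.append([])
        stages[-1].append(op.name)
    return stages


def make_taskset(stage_id: int, num_partitions: int) -> List[str]:
    """Build one task per partition for a stage."""
    return [f"stage{stage_id}-task{p}" for p in range(num_partitions)]


# A word-count style lineage; only reduceByKey requires a shuffle.
lineage = [
    Op("textFile", False),
    Op("flatMap", False),
    Op("map", False),
    Op("reduceByKey", True),
    Op("mapValues", False),
]

stages = split_into_stages(lineage)
tasksets = [make_taskset(i, 2) for i in range(len(stages))]

for stage, taskset in zip(stages, tasksets):
    print(stage, "->", taskset)
```

With this lineage the model yields two stages, split at `reduceByKey`. A real scheduler works on a general DAG with branching dependencies, so this only conveys the idea of cutting at shuffle boundaries.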

2. Features of the Spark running architecture:

(1) Each Application gets its own executor processes, which stay alive for the lifetime of the Application and run tasks in multiple threads.

(2) Spark is agnostic to the resource manager: all it needs is to acquire executor processes that can keep communicating with each other.

(3) The Client that creates the SparkContext should be close to the Worker nodes (the nodes running Executors), preferably in the same rack, because the SparkContext and the Executors exchange a lot of information while a Spark program runs. To run against a remote cluster, it is better to submit work over RPC to a SparkContext running near the cluster than to run the SparkContext far away from the Workers.

(4) Task scheduling is optimized through data locality and speculative execution.
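The data locality idea in feature (4) can be sketched as follows. This is a plain-Python toy model, not Spark's scheduler: `assign_tasks`, the host names, and the task names are all hypothetical. It assumes one executor per host and simply prefers an executor on a host that already holds the task's data, falling back to the least-loaded executor otherwise. The labels `NODE_LOCAL` and `ANY` are borrowed from Spark's locality level names.

```python
from typing import Dict, List, Tuple


def assign_tasks(
    preferred_hosts: Dict[str, List[str]],
    executor_hosts: List[str],
) -> Dict[str, Tuple[str, str]]:
    """Map each task to (host, locality level), preferring hosts that hold its data."""
    load = {host: 0 for host in executor_hosts}  # tasks assigned so far per host
    plan: Dict[str, Tuple[str, str]] = {}
    for task, hosts in preferred_hosts.items():
        # Hosts that both hold the data and run an executor.
        local = [h for h in hosts if h in load]
        if local:
            host = min(local, key=lambda h: load[h])
            level = "NODE_LOCAL"
        else:
            # No executor near the data: run anywhere, data moves over the network.
            host = min(executor_hosts, key=lambda h: load[h])
            level = "ANY"
        load[host] += 1
        plan[task] = (host, level)
    return plan


# Hypothetical placement: task2's data lives on a host without an executor.
preferred = {
    "task0": ["node1", "node2"],
    "task1": ["node1"],
    "task2": ["node9"],
}
plan = assign_tasks(preferred, ["node1", "node2", "node3"])

for task, (host, level) in plan.items():
    print(task, host, level)
```

The sketch leaves out waiting for a local slot to free up and speculative execution (re-launching slow tasks on another executor). In Spark these behaviours are governed by settings such as `spark.locality.wait` and `spark.speculation`.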

3. Advantages of Spark:

(1) High computational efficiency.

Resources are reused and resource scheduling is coarse-grained: executors are acquired once and kept for the whole Application rather than requested per task.

(2) Ease of use.

Applications can be written in multiple languages, and more than 80 high-level operators are provided.

(3) Strong generality.

The components of the Spark ecosystem are all built on top of Spark Core.

(4) Strong adaptability.

Spark can read from hundreds of data sources and can run on a variety of resource scheduling frameworks.
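To see why resource reuse under coarse-grained scheduling matters for advantage (1), a back-of-the-envelope model helps. The function and all numbers below are hypothetical and chosen only for illustration: they assume a single task slot, a fixed startup cost for acquiring an execution process, and a fixed run time per task.

```python
def total_time(num_tasks: int, task_seconds: float,
               startup_seconds: float, reuse_executor: bool) -> float:
    """Serial time for num_tasks tasks on one slot.

    reuse_executor=True:  startup cost is paid once (coarse-grained).
    reuse_executor=False: startup cost is paid for every task (fine-grained).
    """
    startups = 1 if reuse_executor else num_tasks
    return startups * startup_seconds + num_tasks * task_seconds


# Hypothetical workload: 100 short tasks of 0.5 s each, 2 s startup cost.
coarse = total_time(100, 0.5, 2.0, reuse_executor=True)
fine = total_time(100, 0.5, 2.0, reuse_executor=False)

print(f"coarse-grained: {coarse} s, fine-grained: {fine} s")
```

Under these assumed numbers the startup cost dominates when it is paid per task, which is the intuition behind keeping executors alive for the whole Application. The trade-off is that an idle executor still holds its resources.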

That covers the basic run process, the architectural features, and the advantages of the Spark operating architecture.
