
Spark Terms Explained, and How They Relate

2025-03-31 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

How should the core Spark terms and their relationships be understood? This article explains each term in detail, in the hope of helping readers who have this question find a simple, practical way to understand them.

Terms

Driver

At the physical level, the driver is the program that launches and submits the Spark application. At the logical level, it is responsible for scheduling the application's execution, including requesting resources from the master and splitting the work into tasks. At the code level, it corresponds to the SparkContext.

Worker

A worker is a physical node on which executors can run.

Executor

An executor is the process that executes Spark tasks; on the Java side, it is a process with its own JVM. A single worker node can run multiple executors, as long as it has sufficient resources.

Job

A job corresponds to an action. RDD operations fall into two types: transformations and actions. When an action is invoked, Spark runs all the RDD operations accumulated since the previous action up to the current one as a single job.
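The split between lazy transformations and eager actions can be illustrated with a toy sketch in plain Python. `FakeRDD` is invented for illustration only; it is not the Spark API, but it mimics how transformations are merely recorded until an action forces evaluation:

```python
# Conceptual sketch (plain Python, NOT the Spark API): transformations are
# lazy and only recorded; an action triggers evaluation of the whole chain.

class FakeRDD:
    """A toy stand-in for an RDD: stores a pending chain of transformations."""

    def __init__(self, data, ops=None):
        self._data = list(data)
        self._ops = ops or []          # recorded, not yet executed

    def map(self, f):                  # transformation: lazy, returns a new "RDD"
        return FakeRDD(self._data, self._ops + [("map", f)])

    def filter(self, pred):            # transformation: lazy
        return FakeRDD(self._data, self._ops + [("filter", pred)])

    def collect(self):                 # action: runs the whole recorded chain
        out = self._data
        for kind, f in self._ops:
            if kind == "map":
                out = [f(x) for x in out]
            else:
                out = [x for x in out if f(x)]
        return out


rdd = FakeRDD(range(6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(len(rdd._ops))   # two transformations are pending, nothing computed yet
print(rdd.collect())   # the action executes the chain: [0, 4, 16]
```

In real Spark the same shape holds: `map` and `filter` build up a lineage, and only an action such as `collect` or `count` submits a job.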

Stage

A stage boundary corresponds to a shuffle. During execution, RDD dependencies are divided into wide dependencies (shuffle dependencies) and narrow dependencies, as shown in the figure below. A wide dependency is one that requires a shuffle.

To spell out what a narrow dependency is: each partition of the parent RDD feeds into only one partition of the child RDD. In the standard phrasing, each partition of the parent RDD is used by at most one partition of the child RDD.
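Under that definition, narrowness can be checked mechanically from the parent-partition-to-child-partition mapping, and stages can be counted by cutting at each wide dependency. A plain-Python sketch (the helper names and data shapes are invented for illustration):

```python
# Sketch: classify a dependency from a {parent partition -> child partitions}
# map, and count stages as (number of wide dependencies) + 1.

def is_narrow(dep):
    """Narrow: every parent partition is used by at most one child partition."""
    return all(len(children) <= 1 for children in dep.values())

def count_stages(deps):
    """One stage to start, plus one more after every wide (shuffle) dependency."""
    return 1 + sum(1 for dep in deps if not is_narrow(dep))

# map-like op: partition i feeds only partition i (narrow);
# groupByKey-like op: partitions fan out to several children (wide, shuffle).
map_dep     = {0: {0}, 1: {1}, 2: {2}}
shuffle_dep = {0: {0, 1}, 1: {0, 1}, 2: {1}}

print(is_narrow(map_dep))                              # True
print(is_narrow(shuffle_dep))                          # False
print(count_stages([map_dep, shuffle_dep, map_dep]))   # 2
```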

Task

A task is the smallest execution unit in Spark. Roughly speaking, the operation applied to one partition is one task. The concept of a partition deserves a brief explanation here.
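As a toy illustration (plain Python, with invented helper names, not the Spark scheduler): if one task is "run one operation over one partition", then the number of tasks in a stage equals the number of partitions:

```python
# Sketch: one task = one operation applied to one partition.

def split_into_partitions(data, n):
    """Round-robin the records into n partitions."""
    parts = [[] for _ in range(n)]
    for i, x in enumerate(data):
        parts[i % n].append(x)
    return parts

def run_stage(partitions, op):
    """Launch one task per partition; each task touches only its own partition."""
    return [[op(x) for x in part] for part in partitions]

parts = split_into_partitions(range(10), 4)
results = run_stage(parts, lambda x: x * 10)
print(len(parts))    # 4 partitions -> this stage runs 4 tasks
print(results[0])    # [0, 40, 80]
```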

Spark's default number of partitions is 2, and the minimum is also 2. The partition count can be changed in several ways, at roughly three stages:

1. At startup, the default parallelism is initialized through spark.default.parallelism.

2. When an RDD is created, the number of partitions can be set via a parameter.

3. During RDD operations, a child RDD inherits its parent's partition count by default; the final count is affected differently by shuffle and non-shuffle operations, so it varies with the operation.
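The inheritance rule in stage 3 can be sketched in plain Python (the op descriptions and function name are invented for illustration): a non-shuffle operation keeps the parent RDD's partition count, while a shuffle operation may set a new one.

```python
# Sketch: how the partition count propagates through a chain of RDD operations.
# Non-shuffle ops inherit the parent's count; a shuffle op may set its own.

def propagate_partitions(initial, ops):
    n = initial
    for op in ops:
        if op.get("shuffle") and op.get("num_partitions") is not None:
            n = op["num_partitions"]   # e.g. a shuffle given an explicit count
        # otherwise: inherit the parent RDD's partition count unchanged
    return n

ops = [
    {"name": "map", "shuffle": False},                              # inherits
    {"name": "filter", "shuffle": False},                           # inherits
    {"name": "reduceByKey", "shuffle": True, "num_partitions": 8},  # resets to 8
    {"name": "map", "shuffle": False},                              # inherits 8
]
print(propagate_partitions(2, ops))   # 8
```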

How the terms relate

Physical relationship

The Spark runtime architecture diagram from the official website:

Logical relationship

The figure below summarizes the logical relationships; if anything is wrong, please let me know.

This concludes the explanation of Spark terms and their relationships. I hope the content above has been of some help. If you still have questions, you can follow the Internet Technology channel for more related knowledge.
