How Spark Transforms RDDs Through Operators

2025-04-07 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report --

This article explains how Spark transforms RDDs through operators. It is detailed and should serve as a useful reference; readers who are interested are encouraged to read it to the end.

Operator workflow

The figure below depicts how operators transform RDDs while a Spark program runs. An operator is a function defined on an RDD that transforms and manipulates the data it contains.

1. Input: when a Spark program runs, data enters Spark from an external data space (for example, distributed storage read with textFile from HDFS, or a Scala collection passed to the parallelize method). The data enters Spark's runtime data space, is converted into data blocks within Spark, and is managed by the BlockManager.

2. Running: once the input data has formed an RDD, transformation operators such as filter can be applied to transform it into a new RDD. An Action operator then triggers Spark to submit a job. If the data needs to be reused, it can be cached in memory with the cache operator.

3. Output: when the program finishes, data is output from Spark's runtime space and stored in distributed storage (for example, saveAsTextFile writes to HDFS) or returned as Scala data or collections (collect returns a Scala collection; count returns a Scala Int).

Operator classification

Operators can be roughly divided into three categories:

1. Value-type Transformation operators: these transformations do not trigger job submission, and the data items they process are value-type data.

2. Key-Value Transformation operators: these transformations also do not trigger job submission, and the data items they process are Key-Value pairs.

3. Action operators: these trigger the SparkContext to submit a job.

That covers how Spark transforms RDDs through operators. Thank you for reading, and we hope the content is helpful; for more related knowledge, follow the industry information channel.
