
The Soul of Spark: RDD and DataSet

2025-04-01 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

Spark is built on the RDD abstraction: data from different sources and with different processing needs is converted into RDDs, and a series of operators (transformations, followed by an action) is then applied to those RDDs to produce the results.
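To make the "convert to RDD, then apply operators" flow concrete, here is a minimal single-process sketch. `ToyRDD` is a hypothetical class, not Spark's API. It splits a list into partitions, records `map` and `filter` lazily, and only computes when an action such as `collect` or `reduce` runs. In real Spark the analogous chain would be `sc.parallelize(...).map(...).filter(...).collect()`, executed across a cluster.

```python
import functools
from typing import Any, Callable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD: it stores *how* to compute each partition."""

    def __init__(self, num_partitions: int, compute: Callable[[int], Iterator[Any]]):
        self.num_partitions = num_partitions
        self._compute = compute  # partition index -> iterator over that partition's records

    @staticmethod
    def parallelize(data: List[Any], num_partitions: int) -> "ToyRDD":
        # Split the input list into contiguous slices, one per partition.
        n = len(data)
        bounds = [(i * n // num_partitions, (i + 1) * n // num_partitions)
                  for i in range(num_partitions)]
        return ToyRDD(num_partitions, lambda i: iter(data[bounds[i][0]:bounds[i][1]]))

    # Transformations: build a new ToyRDD; nothing is computed yet.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(self.num_partitions, lambda i: (f(x) for x in self._compute(i)))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self.num_partitions, lambda i: (x for x in self._compute(i) if pred(x)))

    # Actions: run the compute functions, partition by partition.
    def collect(self) -> List[Any]:
        return [x for i in range(self.num_partitions) for x in self._compute(i)]

    def reduce(self, f: Callable[[Any, Any], Any]) -> Any:
        # Reduce inside each partition first, then combine the partial results.
        partials = []
        for i in range(self.num_partitions):
            part = list(self._compute(i))
            if part:
                partials.append(functools.reduce(f, part))
        if not partials:
            raise ValueError("reduce() of an empty ToyRDD")
        return functools.reduce(f, partials)


seen: List[int] = []


def square(x: int) -> int:
    seen.append(x)  # records when the function actually runs
    return x * x


evens_squared = (ToyRDD.parallelize(list(range(1, 11)), 3)
                 .map(square)
                 .filter(lambda x: x % 2 == 0))
assert seen == []                                # transformations alone ran nothing
print(evens_squared.collect())                   # [4, 16, 36, 64, 100]
print(evens_squared.reduce(lambda a, b: a + b))  # 220
```

Reducing within each partition before combining mirrors why distributed reductions need an associative function: the partial results from different partitions may be merged in any grouping.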

An RDD is a fault-tolerant, parallel data structure: it can persist data in memory or on disk, lets the user control how the data is partitioned, and provides a rich API for manipulating the data.
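The fault tolerance comes from lineage: a derived RDD remembers its parent and the function that produced it, so a lost in-memory partition can be recomputed rather than restored from a replica. The sketch below illustrates only that idea, assuming a hypothetical `LineageDataset` class with an always-on per-partition cache; in Spark itself, caching is opt-in via `persist()`/`cache()`, and recovery is handled by the scheduler.

```python
from typing import Any, Callable, Dict, List, Optional


class LineageDataset:
    """Hypothetical toy: a derived dataset keeps its lineage (parent + fn) and a cache."""

    def __init__(self, partitions: Optional[List[List[Any]]] = None,
                 parent: Optional["LineageDataset"] = None,
                 fn: Optional[Callable[[Any], Any]] = None):
        self._source = partitions            # only set for the base (input) dataset
        self.parent = parent                 # lineage: where this dataset came from
        self.fn = fn                         # lineage: how each record was derived
        self.cache: Dict[int, List[Any]] = {}  # partition index -> materialized records
        self.computed_log: List[int] = []    # which partitions were (re)computed, in order

    def num_partitions(self) -> int:
        if self._source is not None:
            return len(self._source)
        return self.parent.num_partitions()

    def map(self, fn: Callable[[Any], Any]) -> "LineageDataset":
        return LineageDataset(parent=self, fn=fn)

    def partition(self, i: int) -> List[Any]:
        if self._source is not None:
            return list(self._source[i])
        if i not in self.cache:
            # Never computed, or lost: rebuild it from the parent using the lineage.
            self.computed_log.append(i)
            self.cache[i] = [self.fn(x) for x in self.parent.partition(i)]
        return self.cache[i]

    def collect(self) -> List[Any]:
        return [x for i in range(self.num_partitions()) for x in self.partition(i)]

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that drops one cached partition.
        self.cache.pop(i, None)


base = LineageDataset(partitions=[[1, 2], [3, 4], [5, 6]])
doubled = base.map(lambda x: x * 2)
print(doubled.collect())     # [2, 4, 6, 8, 10, 12]
doubled.lose_partition(1)
print(doubled.collect())     # [2, 4, 6, 8, 10, 12]
print(doubled.computed_log)  # [0, 1, 2, 1]: only partition 1 was rebuilt
```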

1: Definition of RDD and Analysis of Its Five Characteristics

An RDD is an abstraction of distributed memory and a highly restricted shared-memory model: it is a read-only, partitioned collection of records that can be computed in parallel across the nodes of a cluster, and it is an abstraction built around working sets. An RDD is characterized by five properties:

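(1) A list of partitions.

(2) A function for computing each partition.

(3) A list of dependencies on other RDDs.

(4) Optionally, a partitioner for key-value RDDs (for example, one that hash-partitions the data by key).

(5) Optionally, a list of preferred locations on which to compute each partition (for example, the hosts holding the blocks of an HDFS file).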
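The five properties can be mapped onto a small class hierarchy. This is a plain-Python sketch, not Spark's implementation: the class names, the host names `node-a`/`node-b`, and the in-memory "shuffle" are hypothetical. In Spark's Scala `RDD` class the corresponding members are roughly `getPartitions`, `compute`, `getDependencies`, `partitioner`, and `getPreferredLocations`.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Iterator, List, Optional, Sequence, Tuple


class HashPartitioner:
    """Assigns a key to partition hash(key) mod num_partitions."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions

    def get_partition(self, key: Any) -> int:
        return hash(key) % self.num_partitions  # Python's % never returns a negative here


class SketchRDD(ABC):
    def __init__(self, dependencies: Sequence["SketchRDD"] = (),
                 partitioner: Optional[HashPartitioner] = None):
        self.dependencies = list(dependencies)  # (3) parent RDDs
        self.partitioner = partitioner          # (4) only set for key-value RDDs

    @abstractmethod
    def partitions(self) -> List[int]:          # (1) list of partitions (indices here)
        ...

    @abstractmethod
    def compute(self, i: int) -> Iterator[Any]:  # (2) how to compute partition i
        ...

    def preferred_locations(self, i: int) -> List[str]:  # (5) default: no preference
        return []

    def collect(self) -> List[Any]:
        return [x for i in self.partitions() for x in self.compute(i)]


class SourceRDD(SketchRDD):
    """Input already split into blocks, each stored on some (hypothetical) hosts."""

    def __init__(self, blocks: List[List[Any]], hosts: List[List[str]]):
        super().__init__()
        self.blocks, self.hosts = blocks, hosts

    def partitions(self) -> List[int]:
        return list(range(len(self.blocks)))

    def compute(self, i: int) -> Iterator[Any]:
        return iter(self.blocks[i])

    def preferred_locations(self, i: int) -> List[str]:
        return self.hosts[i]


class MappedRDD(SketchRDD):
    """Narrow dependency: partition i reads only parent partition i."""

    def __init__(self, parent: SketchRDD, f: Callable[[Any], Any]):
        super().__init__(dependencies=[parent])
        self.parent, self.f = parent, f

    def partitions(self) -> List[int]:
        return self.parent.partitions()

    def compute(self, i: int) -> Iterator[Any]:
        return (self.f(x) for x in self.parent.compute(i))

    def preferred_locations(self, i: int) -> List[str]:
        return self.parent.preferred_locations(i)  # run where the input block lives


class ShuffledRDD(SketchRDD):
    """Wide dependency: each output partition reads from every parent partition.
    (A real shuffle writes map outputs to disk; this sketch just rescans the parent.)"""

    def __init__(self, parent: SketchRDD, partitioner: HashPartitioner):
        super().__init__(dependencies=[parent], partitioner=partitioner)
        self.parent = parent

    def partitions(self) -> List[int]:
        return list(range(self.partitioner.num_partitions))

    def compute(self, i: int) -> Iterator[Tuple[Any, Any]]:
        for j in self.parent.partitions():
            for key, value in self.parent.compute(j):
                if self.partitioner.get_partition(key) == i:
                    yield key, value


source = SourceRDD(blocks=[[1, 2, 3], [4, 5, 6]], hosts=[["node-a"], ["node-b"]])
pairs = MappedRDD(source, lambda x: (x % 3, x))  # key-value records, no partitioner
by_key = ShuffledRDD(pairs, HashPartitioner(3))  # hash-partitioned by key
print(pairs.preferred_locations(1))  # ['node-b']
print(list(by_key.compute(0)))       # [(0, 3), (0, 6)]
```

Integer keys are used on purpose: Python randomizes string hashes per process, so a string-keyed example would not land in the same partitions from run to run.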

2: Definition of DataSet and Analysis of Its Internal Mechanism
