Analyzing the position of RDD in Spark

2025-04-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

1. The core concept of Spark is the RDD (resilient distributed dataset): a read-only, partitioned, distributed dataset. All or part of the dataset can be cached in memory and reused across multiple computations.

2. Abstractly, an RDD is a collection of data elements. It is split into multiple partitions, which are distributed across different Worker nodes in the cluster so that the data in the RDD can be processed in parallel. (This is the "distributed dataset" part of the name.)

3. An RDD is usually created from files on Hadoop, that is, HDFS files or Hive tables; it can also be created from a local collection in the application.

4. Traditional MapReduce offers automatic fault tolerance, load balancing, and scalability, but its biggest disadvantage is its acyclic data flow model, which forces a large amount of disk I/O in iterative computation. RDD is the abstraction that addresses this shortcoming. The most important feature of RDD is that it provides fault tolerance and recovers automatically from node failures: if an RDD partition on a node is lost because the node fails, the RDD automatically recomputes that partition from its source data. All of this is transparent to the user. This is the lineage feature of RDD.

5. RDD data is kept in memory by default, but when memory is insufficient Spark can write RDD data to disk, depending on the storage level (for example, MEMORY_AND_DISK). (This is the "resilient", or elastic, part of the name.)
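The partitioning in item 2 and the lineage-based recovery in item 4 can be sketched in plain Python. This is a toy, single-process model and not the Spark API: the class `ToyRDD` and its methods `from_list`, `partition`, and `lose_partition` are hypothetical names invented for illustration, and threads stand in for Worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """A toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, num_partitions, compute):
        self.num_partitions = num_partitions
        self._compute = compute   # lineage: how to rebuild partition i
        self._cache = {}          # partition index -> materialized list

    @classmethod
    def from_list(cls, data, num_partitions):
        data = list(data)
        # Partition i holds every num_partitions-th element, starting at i.
        return cls(num_partitions, lambda i: data[i::num_partitions])

    def map(self, fn):
        # Read-only: map never mutates this dataset. It returns a new
        # ToyRDD whose lineage points back to this one.
        parent = self
        return ToyRDD(
            self.num_partitions,
            lambda i: [fn(x) for x in parent.partition(i)],
        )

    def partition(self, i):
        # Rebuild the partition from lineage if no in-memory copy exists.
        if i not in self._cache:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i):
        # Simulate a node failure that drops the in-memory copy.
        self._cache.pop(i, None)

    def collect(self):
        # Process partitions in parallel, then concatenate in partition order.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(self.partition, range(self.num_partitions)))
        return [x for part in parts for x in part]


squares = ToyRDD.from_list(range(10), num_partitions=3).map(lambda x: x * x)
before = squares.collect()
squares.lose_partition(1)   # the "node" holding partition 1 fails
after = squares.collect()   # partition 1 is recomputed from its lineage
```

In this sketch the caller never handles the failure: `collect` returns the same result before and after the loss, which is the sense in which recovery is transparent to the user.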

The position and function of RDD in Spark

1) Why does Spark exist? Traditional parallel computing models cannot handle iterative computing and interactive computing efficiently. Spark's mission is to solve these two problems, and that is the value of, and the reason for, its existence.

2) How does Spark solve iterative computation? The main idea is the RDD, which keeps computed data in distributed memory. Iterative computing usually runs repeated passes over the same dataset, so keeping that data in memory greatly reduces the I/O cost of each pass. This is also the core of Spark: in-memory computing.

3) How does Spark implement interactive computing? Spark is implemented in Scala, so the two integrate tightly. Spark can use the Scala interpreter directly, which lets Scala code manipulate distributed datasets as easily as local collection objects.

4) What is the relationship between Spark and RDD? RDD can be understood as a fault-tolerant, memory-based abstraction for cluster computing, and Spark is an implementation of that abstraction.
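The effect described in point 2 can be illustrated with a small plain-Python sketch. It is not Spark code: `CountingSource` and `iterate_mean` are hypothetical names, and the `reads` counter merely stands in for one full pass of disk I/O. The iteration is a simple gradient descent toward the mean of the data, chosen only because it needs the whole dataset on every step.

```python
class CountingSource:
    """Hypothetical data source that counts how often it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def read(self):
        self.reads += 1   # stands in for one full pass of disk I/O
        return list(self._values)


def iterate_mean(get_data, steps, lr=0.5):
    """Gradient descent toward the mean; one pass over the data per step."""
    estimate = 0.0
    for _ in range(steps):
        data = get_data()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= lr * gradient
    return estimate


# Without caching: every iteration goes back to the source.
uncached = CountingSource([1.0, 2.0, 3.0, 4.0])
result_uncached = iterate_mean(uncached.read, steps=20)

# With caching: load once, then iterate over the in-memory copy.
cached_source = CountingSource([1.0, 2.0, 3.0, 4.0])
in_memory = cached_source.read()
result_cached = iterate_mean(lambda: in_memory, steps=20)
```

Both runs produce the same estimate, but the uncached run reads the source once per iteration while the cached run reads it once in total. That difference is the saving the in-memory approach targets.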

Conclusion

Thank you for reading. If anything here is lacking, criticism and corrections are welcome.

Finally, I hope every big data programmer who has hit a bottleneck manages to break through it, and I wish you all the best in your future work and interviews.
