This article explains in detail how to understand closures in Spark in big data development. The content is of high quality, so the editor shares it with you for reference. I hope you will have a good understanding of the relevant knowledge after reading this article.
1. Understanding closures from Scala
A closure is a function whose return value depends on one or more variables declared outside the function. Put simply, a closure can be thought of as a function that can access variables defined outside its own body.
Take the anonymous function below:
val multiplier = (i: Int) => i * 10
There is a variable i in the body of the function, and it is used as a parameter of the function. Now consider another piece of code:
val multiplier = (i: Int) => i * factor
There are two variables in multiplier: i and factor. i is a formal parameter of the function, so each time multiplier is called, i is bound to a new value. factor, however, is not a formal parameter but a free variable. Consider the following code:
var factor = 3
val multiplier = (i: Int) => i * factor
Here we introduce a free variable, factor, which is defined outside the function.
The function value multiplier defined this way is a "closure", because it refers to a variable defined outside the function. Defining the function captures the free variable, closing over it to form a "closed" function.
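As a small sketch of what "capturing the free variable" means: because factor is a var, the closure captures the variable itself rather than a snapshot of its value, so later updates to factor are visible through multiplier:
var factor = 3
val multiplier = (i: Int) => i * factor
println(multiplier(2))   // 6
factor = 5
println(multiplier(2))   // 10: the closure sees the updated free variable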
A complete example:
object Test {
  def main(args: Array[String]): Unit = {
    println("multiplier(1) value = " + multiplier(1))
    println("multiplier(2) value = " + multiplier(2))
  }
  var factor = 3
  val multiplier = (i: Int) => i * factor
}
With factor set to 3, this prints multiplier(1) value = 3 and multiplier(2) value = 6.
2. Closures in Spark
Let's take a look at the following code:
val data = Array(1, 2, 3, 4, 5)
var counter = 0
var rdd = sc.parallelize(data)
// What happens if you do this?
rdd.foreach(x => counter += x)
println("Counter value: " + counter)
First of all, it is certain that the output above is 0. Spark breaks the processing of RDD operations into tasks, each of which is executed by an Executor. Before execution, Spark computes the task's closure. The closure is the set of variables and methods that must be visible to the Executor in order to perform its computation on the RDD (in this case, foreach()). The closure is serialized and sent to each Executor, so each Executor only receives a copy of counter, and the counter printed on the Driver is still the original, unchanged value. If you want to update a global value, use an accumulator; in Spark Streaming, use updateStateByKey to update shared state.
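As a minimal sketch of the accumulator approach (assuming a live SparkContext named sc and the longAccumulator API available since Spark 2.0), the counter example above could be rewritten like this:
val data = Array(1, 2, 3, 4, 5)
val rdd = sc.parallelize(data)
val counter = sc.longAccumulator("counter")   // accumulator registered on the Driver
rdd.foreach(x => counter.add(x))              // each task adds to its own copy; Spark merges them back
println("Counter value: " + counter.value)    // prints 15 on the Driver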
In addition, closures in Spark serve other purposes:
1. They strip out useless global variables that the Driver would otherwise send to Executors, so that only the variable information actually used is copied to each Executor.
2. They ensure that the data sent to Executors is serializable.
For example, when using a Dataset, the case class must be defined at the class level, not inside a method; even though defining it in a method raises no syntax error, it will not work properly. Likewise, if you use json4s for serialization, the implicit val formats = DefaultFormats is best placed at the class level; otherwise you have to deal with serializing the formats separately, even if you never use it for anything else.
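For illustration, here is a minimal layout sketch that follows this advice (the User case class and the JsonExample object are hypothetical names; it assumes json4s's Serialization.write from the jackson module):
import org.json4s.DefaultFormats
import org.json4s.jackson.Serialization

case class User(name: String, age: Int)   // case class defined at the top level, not inside a method

object JsonExample {
  implicit val formats: DefaultFormats.type = DefaultFormats   // formats placed at the class/object level

  def toJson(u: User): String = Serialization.write(u)

  def main(args: Array[String]): Unit = {
    println(toJson(User("ann", 30)))   // {"name":"ann","age":30}
  }
}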
This is the end of the explanation of how to understand closures in Spark in big data development. I hope the above content is helpful to you and helps you learn more. If you think the article is good, you can share it for more people to see.