2025-02-24 Update From: SLTechnology News&Howtos
Today I will talk about how to implement PageRank in spark-shell. Many readers may not be familiar with it, so this article walks through the code and the intermediate results step by step; I hope you get something out of it.
Talk is cheap, show code
Scala code implementation:

val links = sc.parallelize(Array(
  ('a', Array('d')),
  ('b', Array('a')),
  ('c', Array('a', 'b')),
  ('d', Array('a', 'c'))
))
// Set the initial rank value of each page to 1.0
var ranks = links.mapValues(_ => 1.0)
// Iterate (the walkthrough below runs 23 iterations)
for (i <- 1 to 23) {
  val joinRdd = links.join(ranks)
  val contribsRdd = joinRdd.flatMap {
    case (srcURL, (links, rank)) => links.map(destURL => (destURL, rank / links.size))
  }
  // Simplified rank formula; update ranks
  ranks = contribsRdd.reduceByKey(_ + _).mapValues(0.15 + _ * 0.85)
}
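Before running this in spark-shell, the same pipeline can be checked by hand with plain Scala collections. This is a minimal sketch under the assumption of the same four-page link graph; it needs no Spark, and imitates join, flatMap, reduceByKey, and mapValues with map, flatMap, and groupBy:

```scala
// Plain-Scala sketch of one PageRank iteration (no Spark), using the same
// four-page link graph as the spark-shell example above.
val links = Map(
  'a' -> Array('d'),
  'b' -> Array('a'),
  'c' -> Array('a', 'b'),
  'd' -> Array('a', 'c')
)
// Initial rank of every page is 1.0
val ranks0 = links.map { case (page, _) => page -> 1.0 }

// join: pair each page's outlinks with its current rank
val joined = links.map { case (page, outs) => page -> (outs, ranks0(page)) }

// flatMap: each page contributes rank / outDegree to every page it links to
val contribs = joined.toSeq.flatMap { case (_, (outs, rank)) =>
  outs.map(dest => dest -> rank / outs.size)
}

// reduceByKey + mapValues: sum contributions, apply the simplified formula
val ranks1 = contribs
  .groupBy(_._1)
  .map { case (page, cs) => page -> (0.15 + cs.map(_._2).sum * 0.85) }
```

After one iteration this gives a ≈ 1.85, b = c = 0.575, and d = 1.0, matching the first-iteration results traced below.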
Code analysis:
The first iteration, transform operation rdd.join(other), joins the links and ranks RDDs:
val joinRdd = links.join(ranks)
The results are as follows:
res1: Array[(Char, (Array[Char], Double))] = Array((d,(Array(a, c),1.0)), (b,(Array(a),1.0)), (a,(Array(d),1.0)), (c,(Array(a, b),1.0)))
The first iteration, transform operation: rdd.flatMap (func)
val contribsRdd = joinRdd.flatMap {
  // Note: `links` here is the value bound by the pattern match, of type
  // Array[Char], not the earlier ParallelCollectionRDD of the same name.
  case (srcURL, (links, rank)) => links.map(destURL => (destURL, rank / links.size))
}
The results are as follows:
res2: Array[(Char, Double)] = Array((a,0.5), (c,0.5), (a,1.0), (d,1.0), (a,0.5), (b,0.5))
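The pattern match in this step is easy to misread: inside the case clause, `links` is rebound to the current page's Array[Char] of outlinks, shadowing the outer RDD of the same name. A plain-Scala sketch (no Spark; the joined data is written out as a literal sequence) shows the same flatMap:

```scala
// The joined (page, (outlinks, rank)) data from the previous step,
// as a plain Scala collection rather than an RDD.
val joinRdd = Seq(
  ('d', (Array('a', 'c'), 1.0)),
  ('b', (Array('a'), 1.0)),
  ('a', (Array('d'), 1.0)),
  ('c', (Array('a', 'b'), 1.0))
)

// In the pattern below, `links` binds to the page's Array[Char] of outlinks,
// shadowing any outer `links`; rank / links.size is the per-link contribution.
val contribsRdd = joinRdd.flatMap { case (srcURL, (links, rank)) =>
  links.map(destURL => (destURL, rank / links.size))
}
```

Each page splits its rank evenly across its outlinks, so page a (linked to by b, c, and d) collects 1.0 + 0.5 + 0.5 = 2.0 in total.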
The first iteration, transform operation: rdd.reduceByKey (func) & rdd.mapValues (func)
// Simplified rank calculation formula; update ranks
ranks = contribsRdd.reduceByKey(_ + _).mapValues(0.15 + _ * 0.85)
The results are as follows:
res3: Array[(Char, Double)] = Array((d,1.0), (b,0.575), (a,1.84999), (c,0.575))
The 1.84999 comes from Double floating-point rounding; the rank value of page a should be 1.85.
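This rounding is ordinary Double behavior, not a Spark issue: the contributions to page a sum to exactly 2.0, but 0.15 and 0.85 have no exact binary representation, so the result of the formula lands just below 1.85. A two-line sketch of the same arithmetic:

```scala
// 0.15 and 0.85 are not exactly representable as binary doubles, so the
// computed rank of page a is slightly below 1.85 and prints as 1.84999...
val rankA = 0.15 + 2.0 * 0.85

// Comparing with a tolerance, rather than ==, is the usual workaround.
val closeEnough = math.abs(rankA - 1.85) < 1e-9
```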
The iteration results:
In the first iteration, the results of ranks are as follows:
res1: Array[(Char, Double)] = Array((d,1.0), (b,0.575), (a,1.84999), (c,0.575))
In the second iteration, the results of ranks are as follows:
res2: Array[(Char, Double)] = Array((d,1.72249), (b,0.394375), (a,1.308124), (c,0.575))
At the third iteration, the results of ranks are as follows:
res3: Array[(Char, Double)] = Array((d,1.26190), (b,0.39437), (a,1.46165), (c,0.88206))
...
At the 21st iteration, the results of ranks are as follows:
res21: Array[(Char, Double)] = Array((d,1.37039), (b,0.46126), (a,1.43586), (c,0.73247))
At the 22nd iteration, the results of ranks are as follows:
res22: Array[(Char, Double)] = Array((d,1.37048), (b,0.46130), (a,1.43579), (c,0.73241))
At the 23rd iteration, the results of ranks are as follows:
res23: Array[(Char, Double)] = Array((d,1.37042), (b,0.46127), (a,1.43583), (c,0.73245))
From the iteration results above, the rank values (kept to 4 decimal places) have stabilized by the 22nd iteration.
The rank values at this point can be taken as the final result:
a: 1.4358  b: 0.4613  c: 0.7325  d: 1.3704
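As a cross-check, the whole 23-iteration loop can be replayed with plain Scala collections (a no-Spark sketch under the same simplified formula; summation order may differ from Spark's, so the last digits can vary slightly, but the 4-decimal values above are reproduced):

```scala
// Replay 23 iterations of the simplified PageRank loop without Spark.
val links = Map(
  'a' -> Array('d'),
  'b' -> Array('a'),
  'c' -> Array('a', 'b'),
  'd' -> Array('a', 'c')
)
var ranks = links.map { case (p, _) => p -> 1.0 } // initial rank 1.0

for (_ <- 1 to 23) {
  // Each page contributes rank / outDegree to every page it links to.
  val contribs = links.toSeq.flatMap { case (src, outs) =>
    outs.map(dest => dest -> ranks(src) / outs.size)
  }
  // Sum contributions per page and apply the simplified rank formula.
  ranks = contribs
    .groupBy(_._1)
    .map { case (p, cs) => p -> (0.15 + cs.map(_._2).sum * 0.85) }
}
```

After 23 iterations, ranks is approximately Map(a -> 1.4358, b -> 0.4613, c -> 0.7325, d -> 1.3704), agreeing with the spark-shell run.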
After reading the above, do you have a better understanding of how spark-shell implements PageRank? Thank you for your support.