This article explains how to use PageRank with Spark. The material is simple and easy to follow; read on to learn how the algorithm works and how to speed it up with RDD partitioning.
PageRank is an iterative algorithm that performs many joins, so it is a good demonstration of RDD partitioning. The algorithm maintains two datasets:

(pageID, linkList): the list of neighbor pages of each page.
(pageID, rank): the current rank value of each page.

The PageRank computation proceeds roughly as follows:

1. Initialize each page's rank to 1.0.
2. On each iteration, have page p send a contribution of rank(p) / numNeighbors(p) to each of its neighbors (the pages it links to directly).
3. Set each page's rank to 0.15 + 0.85 * contributionsReceived.

Steps 2 and 3 repeat; over these iterations the algorithm gradually converges toward the actual PageRank value of each page. In practice about ten iterations are used.

package com.sowhat.spark
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

/**
  * links = (pageID, linkList)
  * ranks = (pageID, rank)
  */
object MyPageRank {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("pagerank")
    // Create the SparkContext, the entry point for submitting a Spark app
    val sc = new SparkContext(conf)

    // Hash-partition the large, static links dataset once up front and cache it,
    // since the join in every iteration reuses it
    val links: RDD[(String, Seq[String])] =
      sc.objectFile[(String, Seq[String])]("filepwd")
        .partitionBy(new HashPartitioner(100))
        .persist()

    // mapValues (rather than map) preserves the partitioner of links
    var ranks: RDD[(String, Double)] = links.mapValues(x => 1.0)

    for (i <- 1 to 10) {
      // Each page sends a contribution of rank(p) / numNeighbors(p) to each neighbor
      val contributions: RDD[(String, Double)] = links.join(ranks).flatMap {
        case (pageID, (links, rank)) =>
          links.map(dest => (dest, rank / links.size))
      }
      // New rank = 0.15 + 0.85 * sum of received contributions
      ranks = contributions.reduceByKey(_ + _).mapValues(v => 0.15 + 0.85 * v)
    }

    ranks.saveAsTextFile("ranks")
  }
}
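The input path "filepwd" above is a placeholder for a pre-built object file. As a minimal local smoke test (a sketch; the four-page graph below is made up for illustration), the links RDD can instead be built in memory with sc.parallelize, leaving the rest of the program unchanged:

// Hypothetical test input: a tiny four-page link graph built in memory,
// standing in for sc.objectFile[(String, Seq[String])]("filepwd")
val links: RDD[(String, Seq[String])] = sc.parallelize(Seq(
  ("A", Seq("B", "C")),
  ("B", Seq("A")),
  ("C", Seq("A", "B")),
  ("D", Seq("C"))
)).partitionBy(new HashPartitioner(4)).persist()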
The algorithm starts by initializing every element of the ranks RDD to 1.0, then repeatedly updates the rank values on each iteration. The main optimizations are as follows.
- The links RDD is joined with ranks on every iteration, so hash-partitioning the large links dataset up front with partitionBy saves a great deal of network communication overhead.
- For the same reason, persist keeps links in memory so that every iteration can reuse it instead of recomputing it.
- When ranks is first created, mapValues is used instead of map() to preserve the partitioning of the parent RDD links, which makes the first join operation cheap.
- In the loop body, mapValues follows reduceByKey: the output of reduceByKey is already hash-partitioned, so the join in the next iteration is more efficient.
Recommendation: to get the most out of partition-related optimizations, use mapValues or flatMapValues whenever you do not need to change an element's key.
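To see why this matters, here is a minimal sketch (the variable names are illustrative): mapValues preserves the parent's partitioner, while an equivalent map() discards it, because Spark must assume map() may have changed the keys. This can be checked through the partitioner field:

// Hypothetical demo of partitioner preservation (names are illustrative)
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
  .partitionBy(new HashPartitioner(4))

val kept    = pairs.mapValues(_ + 1)                   // keys unchanged, partitioner kept
val dropped = pairs.map { case (k, v) => (k, v + 1) }  // Spark assumes keys may have changed

println(kept.partitioner)    // Some(HashPartitioner)
println(dropped.partitioner) // None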
Thank you for reading. That covers "how to use PageRank"; after studying this article you should have a deeper understanding of the technique, though the specifics still need to be verified in practice.