Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Using golang basic data structure and algorithm Web Page ranking / PageRank to realize Random Walk

2025-01-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

This article introduces the knowledge of "using golang basic data structure and algorithm web page ranking / PageRank to achieve random walk". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Page ranking (PageRank/ Page ranking), random walk page ranking (PageRank, also known as Page ranking) is an algorithm that sorts search results when searching web pages. Web page ranking is an algorithm that uses the link structure between web pages to calculate the value of web pages. In page ranking, the more pages you link to, the more important it is. Suppose the weight of a page that is not linked to a page is 1. The weight of a page with links to a page is the sum of the weights linked to the page. If a page is linked to more than one page, all the pages in the link will share its weight equally. In a page ranking, the more pages you link to, the higher the value of the links sent out by that page. You can use the Random Walk Model (random walk model) to solve the problem of web page interlinking. The operation of a user browsing a web page can be defined as follows: the probability of a user jumping to a web page linked to the current web page with equal probability is 1-α, and the probability of remote jumping to a web page in another web page with equal probability is α. Simulate the process of users randomly visiting the page, each page visited, its weight increased by 1, until the total number of visits reached N times, the weight value of each page represents the "probability of browsing this page at some point". It can be used directly as the weight of the page. From [Japan] Ishida Baohui; Miyazaki Shoichi

Implement PageRank algorithm based on random walk model and verify its effectiveness and stability (page weight remains stable after several simulations)

Design

IPage: Web model interface

IPageRanking: Web page ranking algorithm interface

TPage: the implementation of Web Page Model

TRandomWalkPageRanking: PageRank algorithm based on random walk model to implement IPageRanking interface

Unit testing

Page_rank_test.go to verify the effectiveness and stability of the web page ranking algorithm

First of all, its effectiveness is verified by simple case.

Then randomly generate a large number of random interlinked web pages to verify the stability of web page weights after multiple rounds of random walks.

Package othersimport ("fmt" pr "learning/gooop/others/page_rank"math/rand"sort"testing"time") func Test_PageRank (t * testing.T) {fnAssertTrue: = func (b bool) Msg string) {if! b {t.Fatal (msg)}} t.Log ("testing simple case") p11: = pr.NewPage ("p11") p12: = pr.NewPage ("p12") p13: = pr.NewPage ("p13") p21: = pr.NewPage (" P21 ") p22: = pr.NewPage (" p22 ") p31: = pr.NewPage (" p31 ") p32: = pr.NewPage (" p32 ") p11.AddLink (p21) p11.AddLink (p22) p12.AddLink (p21) p12.AddLink (p22) p13.AddLink (p21) p13.AddLink (p22) p21.AddLink (p31) P22.AddLink (p31) p31.AddLink (p32) p32.AddLink (p31) samples: = [] pr.IPage {p11 P12 pr.RandomWalkPageRanking.RankAll p13, p21, p22, p31, p32,} pr.RandomWalkPageRanking.RankAll (samples, 1000) sort.Sort (sort.Reverse (pr.IPageSlice (samples) for _, p: = range samples {t.Log (p)} fnAssertTrue (samples [0] .ID () = "p31", "expecting top.1 = p31") fnAssertTrue (samples [1] .ID () = "p32") "expecting top.2 = p32") fnAssertTrue (samples [2] .ID () = = "p21" | | samples [2] .ID () = = "p22", "expecting top.3 in (p21 focus p22)") fnAssertTrue (samples [3] .ID () = = "p21" | | samples [3] .ID () = "p22", "expecting top.4 in (p21)" P22) ") / / generate 1000 random pages iPageCount: = 1000 pages: = make ([] pr.IPage, iPageCount) for iMagnee _: = range pages {pages [I] = pr.NewPage (fmt.Sprintf (" pd.com ", I))} r: = rand.New (rand.NewSource (time.Now (). UnixNano () for I P: = range pages {/ / add random page links for JJJ iPageLinks: = 0, r.Intn (10) J

< iPageLinks; { n := r.Intn(iPageCount) if n != i { j++ p.AddLink(pages[n]) } } } // rank pages and get top 100 mapTop100 := make(map[string]bool, 0) fnTestRanking := func(rounds int) { t.Logf("testing page rank with %v rounds", rounds) bFirstRound := len(mapTop100) == 0 // page ranking pr.RandomWalkPageRanking.RankAll(pages, rounds) // sort pages by ranking sort.Sort(sort.Reverse(pr.IPageSlice(pages))) hits := 0 for i,p := range pages { if i < 10 { t.Log(p) } if i < 100 { if bFirstRound { mapTop100[p.ID()] = true } else if _,ok := mapTop100[p.ID()];ok { hits++ } } else { break } } if !bFirstRound { t.Logf("hit rate: %s%v", "%", hits) } } // test stability under different rounds fnTestRanking(3000) fnTestRanking(10000) fnTestRanking(30000) fnTestRanking(90000)}测试输出 测试表明, 当随机游走的总次数 >

= number of pages * when the average number of links sent out per page, the ranking is relatively stable

$go test-v page_rank_test.go = = RUN Test_PageRank page_rank_test.go:19: testing simple case page_rank_test.go:47: P (p31, 0.4206, [p32]) page_rank_test.go:47: P (p32, 0.3673, [p31]) page_rank_test.go:47: P (p21, 0.0639, [p31]) page_rank_test.go:47: P (p22, 0.0604) [p31]) page_rank_test.go:47: P (p11, 0.0300, [p21 p22]) page_rank_test.go:47: P (p12, 0.0291, [p21 p22]) page_rank_test.go:47: P (p13, 0.0287, [p21 p22]) page_rank_test.go:77: testing page rank with 3000 rounds page_rank_test.go:89: P (p604.com, 0.0039) []) page_rank_test.go:89: P (p807.com, 0.0035, [p709.com p328.com p303.com p110.com p858.com p394.com p231.com p731.com p83.com]) page_rank_test.go:89: P (p72.com, 0.0034, [p249.com p347.com p604.com p533.com p416.com p958.com p966.com p385.com]) page_rank_test.go:89: P (p712.com, 0.0033 [p485.com p451.com p236.com p141.com p168.com p693.com]) page_rank_test.go:89: P (p452.com, 0.0032, [p588.com p344.com p618.com p258.com p394.com p285.com p780.com p606.com p89.com]) page_rank_test.go:89: P (p709.com, 0.0031 [p666.com p554.com p103.com p202.com p230.com]) page_rank_test.go:89: P (p975.com, 0.0029, []) page_rank_test.go:89: P (p840.com, 0.0029, [p974.com p698.com p49.com p851.com p73.com]) page_rank_test.go:89: P (p867.com, 0.0028) [p588.com p196.com p931.com p381.com p621.com p848.com]) page_rank_test.go:89: P (p633.com, 0.0028, [p840.com]) page_rank_test.go:77: testing page rank with 10000 rounds page_rank_test.go:89: P (p604.com, 0.0039, []) page_rank_test.go:89: P (p807.com, 0.0034) [p709.com p328.com p303.com p110.com p858.com p394.com p231.com p731.com p83.com]) page_rank_test.go:89: P (p72.com, 0.0034, [p249.com p347.com p604.com p533.com p416.com p958.com p966.com p385.com]) page_rank_test.go:89: P (p452.com, 0.0033 [p588.com p344.com p618.com p258.com p394.com p285.com p780.com p606.com p89.com]) page_rank_test.go:89: P (p712.com, 0.0033, [p485.com p451.com p236.com p141.com p168.com p693.com]) page_rank_test.go:89: P (p709.com, 0.0031 [p666.com p554.com p103.com p202.com p230.com]) page_rank_test.go:89: P (p975.com, 0.0029, []) page_rank_test.go:89: P (p840.com, 0.0029, [p974.com p698.com p49.com p851.com p73.com]) page_rank_test.go:89: P (p612.com, 0.0028) [p116.com p562.com p179.com p37.com p761.com]) page_rank_test.go:89: P (p319.com, 0.0028, [p726.com p63.com p558.com p301.com p185.com p944.com]) page_rank_test.go:104: hit rate:% 98 page_rank_test.go:77: testing page rank with 30000 rounds page_rank_test.go:89: P (p604.com, 0.0039 []) page_rank_test.go:89: P (p807.com, 0.0034, [p709.com p328.com p303.com p110.com p858.com p394.com p231.com p731.com p83.com]) page_rank_test.go:89: P (p72.com, 0.0034, [p249.com p347.com p604.com p533.com p416.com p958.com p966.com p385.com]) page_rank_test.go:89: P (p452.com, 0.0033 [p588.com p344.com p618.com p258.com p394.com p285.com p780.com p606.com p89.com]) page_rank_test.go:89: P (p712.com, 0.0032, [p485.com p451.com p236.com p141.com p168.com p693.com]) page_rank_test.go:89: P (p709.com, 0.0031 [p666.com p554.com p103.com p202.com p230.com]) page_rank_test.go:89: P (p975.com, 0.0029, []) page_rank_test.go:89: P (p840.com, 0.0029, [p974.com p698.com p49.com p851.com p73.com]) page_rank_test.go:89: P (p319.com, 0.0028) [p726.com p63.com p558.com p301.com p185.com p944.com]) page_rank_test.go:89: P (p612.com, 0.0028, [p116.com p562.com p179.com p37.com p761.com]) page_rank_test.go:104: hit rate:% 98 page_rank_test.go:77: testing page rank with 90000 rounds page_rank_test.go:89: P (p604.com, 0.0039 []) page_rank_test.go:89: P (p807.com, 0.0034, [p709.com p328.com p303.com p110.com p858.com p394.com p231.com p731.com p83.com]) page_rank_test.go:89: P (p72.com, 0.0034, [p249.com p347.com p604.com p533.com p416.com p958.com p966.com p385.com]) page_rank_test.go:89: P (p452.com, 0.0033 [p588.com p344.com p618.com p258.com p394.com p285.com p780.com p606.com p89.com]) page_rank_test.go:89: P (p712.com, 0.0032, [p485.com p451.com p236.com p141.com p168.com p693.com]) page_rank_test.go:89: P (p709.com, 0.0031 [p666.com p554.com p103.com p202.com p230.com]) page_rank_test.go:89: P (p975.com, 0.0029, []) page_rank_test.go:89: P (p840.com, 0.0029, [p974.com p698.com p49.com p851.com p73.com]) page_rank_test.go:89: P (p612.com, 0.0028) [p116.com p562.com p179.com p37.com p761.com]) page_rank_test.go:89: P (p319.com, 0.0028, [p726.com p63.com p558.com p301.com p185.com p944.com]) page_rank_test.go:104: hit rate:% 98Murray-PASS: Test_PageRank (13.93s) PASSok command-line-arguments 13.936sIPage.go

Web page model interface

Package page_rankimport "fmt" type IPage interface {fmt.Stringer ID () string GetWeight () float64 SetWeight (float64) GetLinks () [] IPage AddLink (IPage)} type IPageSlice [] IPagefunc (me IPageSlice) Len () int {return len (me)} func (me IPageSlice) Less (iMagazine int) bool {return me.GetWeight () < me [j] .GetWeight ()} func (me IPageSlice) Swap (I J int) {me [I], me [j] = me [j], me [I]} IPageRanking.go

Web page ranking algorithm interface

Package page_ranktype IPageRanking interface {RankAll (pages [] IPage, rounds int)} tPage.go

Implementation of Web Page Model

Package page_rankimport ("fmt"strings") type tPage struct {id string weight float64 links [] IPage} func NewPage (id string) IPage {return & tPage {id: id, weight: 0, links: [] IPage {} } func (me * tPage) ID () string {return me.id} func (me * tPage) GetWeight () float64 {return me.weight} func (me * tPage) SetWeight (w float64) {me.weight = w} func (me * tPage) GetLinks () [] IPage {return me.links} func (me * tPage) AddLink (p IPage) {me.links = append (me.links) P)} func (me * tPage) String () string {linkStrings: = make ([] string, len (me.links)) for iMagic p: = range me.links {linkStrings [I] = p.ID ()} return fmt.Sprintf ("p (% v,% 8.4f, [% v])", me.id, me.weight, strings.Join (linkStrings, ")} tRandomWalkPageRanking.go

PageRank algorithm based on random walk model to realize IPageRanking interface

Package page_rankimport ("math/rand"time") type tRandomWalkPageRanking struct {} var gPossiblityToLinkedPage = 85func newRandomWalkPageRanking () IPageRanking {return & tRandomWalkPageRanking {} func (me * tRandomWalkPageRanking) RankAll (pages [] IPage, rounds int) {iPageCount: = len (pages) if iPageCount 0 {/ / goto linked page current = me.VisitLinkedPage (current) R)} else {/ / goto unlinked page current = me.VisitUnlinkedPage (current, pages, r)} fVisitCount: = float64 (iVisitCount) for _, p: = range pages {p.SetWeight (p.GetWeight () / fVisitCount)}} func (me * tRandomWalkPageRanking) VisitLinkedPage (current IPage R * rand.Rand) IPage {links: = current.GetLinks () next: = links [r.Intn (len (links))] return next} func (me * tRandomWalkPageRanking) VisitUnlinkedPage (current IPage, pages [] IPage, r * rand.Rand) IPage {mapLinks: = make (map [string] bool, 0) mapLinks [current.ID ()] = true for _ P: = range current.GetLinks () {mapLinks [p.ID ()] = true} n: = len (pages) for {next: = pages [r.Intn (n)] if _, ok: = mapLinks [next.ID ()] ! ok {return next}} var RandomWalkPageRanking = newRandomWalkPageRanking () "use golang basic data structure and algorithm web page ranking / PageRank to achieve random walk" content here, thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report