2025-02-21 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Spark graph processing: GraphX learning notes
I. What is GraphX?
GraphX uses Spark as its parallel processing framework to implement graph algorithms that can be executed in parallel.
Whether an algorithm can be parallelized has nothing to do with Spark itself; it has to be proved mathematically.
For an algorithm that has been proven parallelizable, Spark is a good choice for the implementation, because GraphX supports Pregel's graph computing model.
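As a rough illustration of the Pregel model mentioned above, the sketch below computes single-source shortest paths with the pregel operator. The small weighted graph, the source vertex 1, the application name and the local[*] master are all assumptions made for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Assumption: a local master and an arbitrary application name
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-notes").setMaster("local[*]"))

// Hypothetical weighted graph; the edge attribute is the edge length
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0), Edge(3L, 4L, 1.0)))
val graph = Graph.fromEdges(edges, defaultValue = 0.0)

val sourceId: VertexId = 1L
// Every vertex starts at infinity except the source
val initialGraph = graph.mapVertices((id, _) =>
  if (id == sourceId) 0.0 else Double.PositiveInfinity)

val sssp = initialGraph.pregel(Double.PositiveInfinity)(
  // Vertex program: keep the shorter of the known and the received distance
  (id, dist, newDist) => math.min(dist, newDist),
  // Send message: offer the destination a shorter path through this edge
  triplet => {
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
    } else {
      Iterator.empty
    }
  },
  // Merge message: of several offers, keep the shortest
  (a, b) => math.min(a, b))

val distances = sssp.vertices.collect().toMap
distances.toSeq.sortBy(_._1).foreach { case (id, d) => println(s"$id -> $d") }
```

Each superstep only touches vertices that received a message, which is what lets Spark run the vertex programs of different partitions in parallel.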
II. What components and basic frameworks does GraphX include?
1. Member variables
The important member variables in Graph are
vertices
edges
triplets
Triplets are introduced mainly because of the Pregel computing model: each triplet records an edge together with its two endpoint vertices. The specific code is not listed here.
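To make the three views concrete, here is a minimal sketch that prints the vertices, edges and triplets of a tiny graph. The names, the "follows" label, the application name and the local[*] master are hypothetical values chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Assumption: a local master and an arbitrary application name
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-notes").setMaster("local[*]"))

// Hypothetical data: vertex attribute is a name, edge attribute is a label
val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
val graph = Graph(vertices, edges)

// vertices: pairs of (id, attribute)
graph.vertices.collect().foreach { case (id, name) => println(s"vertex $id: $name") }

// edges: source id, destination id and edge attribute only
graph.edges.collect().foreach(e => println(s"edge ${e.srcId} -> ${e.dstId}: ${e.attr}"))

// triplets: each edge joined with the attributes of both endpoints
val facts = graph.triplets
  .map(t => s"${t.srcAttr} ${t.attr} ${t.dstAttr}")
  .collect()
facts.foreach(println)
```

The triplet view is the only one of the three that exposes srcAttr and dstAttr, which is why a Pregel-style send function works on triplets rather than on plain edges.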
2. Member functions
Functions are divided into several categories:
Operations on all vertices or edges that do not change the graph structure itself, such as mapEdges and mapVertices
Subgraph extraction with subgraph, similar to filter in collection operations
Graph partitioning, that is, the partition operation (partitionBy), which is very important for Spark computing. It is precisely because the graph is split into different partitions that parallel processing is possible. Different PartitionStrategy implementations have different benefits. The easiest one to think of is to use a hash to divide the whole graph into multiple regions.
Outer join on vertices with outerJoinVertices
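The four categories above can be sketched on one small graph. The data, the ages RDD and the choice of PartitionStrategy.RandomVertexCut (a hash-based strategy) with 4 partitions are assumptions made for this example, as are the application name and the local[*] master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Assumption: a local master and an arbitrary application name
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-notes").setMaster("local[*]"))

// Hypothetical graph: vertex attribute is a name, edge attribute is a weight
val graph = Graph(
  sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol"))),
  sc.parallelize(Seq(Edge(1L, 2L, 3), Edge(2L, 3L, 7), Edge(3L, 1L, 1))))

// 1. Attribute maps: the structure is unchanged, only attributes are rewritten
val upper = graph.mapVertices((id, name) => name.toUpperCase)
val doubled = graph.mapEdges(e => e.attr * 2)

// 2. Subgraph: keep only the edges whose weight is at least 3
val heavy = graph.subgraph(epred = t => t.attr >= 3)

// 3. Partitioning: RandomVertexCut hashes the (source, destination) id pair
val partitioned = graph.partitionBy(PartitionStrategy.RandomVertexCut, 4)

// 4. Outer join: vertices without a match receive None
val ages = sc.parallelize(Seq((1L, 30), (2L, 25)))
val withAge = graph.outerJoinVertices(ages) {
  (id, name, ageOpt) => (name, ageOpt.getOrElse(-1))
}

withAge.vertices.collect().sortBy(_._1).foreach(println)
```

Every call returns a new graph; the original graph value is never modified.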
III. Graph operations: GraphOps
Common graph algorithms are abstracted into the class GraphOps, and an implicit conversion defined in Graph converts a Graph to GraphOps. GraphOps provides the following 12 operators:
collectNeighborIds
collectNeighbors
collectEdges
joinVertices
filter
pickRandomVertex
pregel
pageRank
staticPageRank
connectedComponents
triangleCount
stronglyConnectedComponents
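A short sketch of the implicit conversion and of three of the operators listed above follows. The graph (one triangle plus one separate edge) is hypothetical, and so are the application name and the local[*] master. The edges are written with source id smaller than destination id and the graph is repartitioned before triangleCount, because older Spark releases required both; newer releases may not.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeDirection, Graph, GraphOps, PartitionStrategy}

// Assumption: a local master and an arbitrary application name
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-notes").setMaster("local[*]"))

// Hypothetical graph: a triangle (1, 2, 3) and a separate edge (4, 5)
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(1L, 3L, 1), Edge(4L, 5L, 1)))
val graph = Graph.fromEdges(edges, defaultValue = 0)

// The implicit conversion lets a Graph be used where a GraphOps is expected
val ops: GraphOps[Int, Int] = graph
println(s"vertices = ${ops.numVertices}, edges = ${ops.numEdges}")

// Neighbor ids of every vertex, ignoring edge direction
val neighbors = graph.collectNeighborIds(EdgeDirection.Either).collect().toMap

// Each vertex is labelled with the smallest vertex id in its component
val components = graph.connectedComponents().vertices.collect().toMap

// Number of triangles passing through each vertex
val triangles = graph
  .partitionBy(PartitionStrategy.RandomVertexCut)
  .triangleCount()
  .vertices.collect().toMap

println(components)
println(triangles)
```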
RDD
RDDs are the core of Spark, so which new RDDs does GraphX introduce? There are two:
VertexRDD
EdgeRDD
Of the two, VertexRDD is the more important. It offers many operations, mainly concerned with merging attributes on vertices. Merging naturally brings in relational algebra and set theory, so VertexRDD has many operations with SQL-like names, such as
leftJoin
innerJoin
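The difference between the two joins can be sketched as follows. The names and scores are hypothetical, and the application name and local[*] master are assumptions for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.VertexRDD

// Assumption: a local master and an arbitrary application name
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-notes").setMaster("local[*]"))

// Hypothetical vertex attributes and a partial set of scores
val names = VertexRDD(sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol"))))
val scores = sc.parallelize(Seq((1L, 0.9), (3L, 0.4)))

// leftJoin keeps every vertex; a missing score arrives as None
val left = names.leftJoin(scores) {
  (id, name, scoreOpt) => (name, scoreOpt.getOrElse(0.0))
}

// innerJoin keeps only the vertices present on both sides
val inner = names.innerJoin(scores) {
  (id, name, score) => (name, score)
}

left.collect().sortBy(_._1).foreach(println)
inner.collect().sortBy(_._1).foreach(println)
```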
IV. GraphX scenario analysis
1. Storage and loading of graphs
In mathematical computation a graph is represented as a matrix, as in linear algebra. So how should it be stored?
Any data structures course covers many ways of doing this, so they are not repeated here.
However, in a big data environment, what if the graph is so large that the vertex and edge data cannot fit in a single file? Use HDFS.
When loading, what if a single machine does not have enough memory? Use lazy loading: only when the data is actually needed is it distributed, in a cascading manner, to different machines.
Generally speaking, all vertex-related content is saved in one file, vertexFile, and all edge-related information in another file, edgeFile.
When a specific graph is generated, the edges express how the vertices are associated, which also gives the structure of the graph.
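The vertexFile and edgeFile layout described above might be loaded as in the sketch below. The comma-separated line formats, the temporary directory and the sample rows are assumptions made for illustration; on a cluster the same calls would be given hdfs:// paths instead.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: a local master and an arbitrary application name
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-notes").setMaster("local[*]"))

// Hypothetical files: "id,name" per vertex and "src,dst,label" per edge
val dir = Files.createTempDirectory("graphx-notes")
val vertexFile = dir.resolve("vertexFile.txt")
val edgeFile = dir.resolve("edgeFile.txt")
Files.write(vertexFile, "1,alice\n2,bob\n3,carol\n".getBytes(StandardCharsets.UTF_8))
Files.write(edgeFile, "1,2,follows\n2,3,follows\n".getBytes(StandardCharsets.UTF_8))

// textFile is lazy: nothing is read until an action needs the data
val vertices: RDD[(VertexId, String)] =
  sc.textFile(vertexFile.toUri.toString).map { line =>
    val Array(id, name) = line.split(",")
    (id.toLong, name)
  }
val edges: RDD[Edge[String]] =
  sc.textFile(edgeFile.toUri.toString).map { line =>
    val Array(src, dst, label) = line.split(",")
    Edge(src.toLong, dst.toLong, label)
  }

// The edges alone determine the structure; the vertex file adds attributes
val graph = Graph(vertices, edges)
println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")
```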
The following example from the official Spark documentation constructs a Graph from two Arrays.
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
// Assumes an existing SparkContext named sc, as in spark-shell
// Create an RDD for the vertices
val users: RDD[(VertexId, (String, String))] =
  sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
                       (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
// Create an RDD for edges
val relationships: RDD[Edge[String]] =
  sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
                       Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
// Define a default user in case there are relationships with a missing user
val defaultUser = ("John Doe", "Missing")
// Build the initial Graph
val graph = Graph(users, relationships, defaultUser)
2. GraphLoader
GraphLoader is used in GraphX specifically for loading and generating graphs; its most important function is edgeListFile.
// Partition by vertex cut: the edges are split into 4 partitions
// (minEdgePartitions is the parameter name in early Spark releases; later releases appear to call it numEdgePartitions)
val graph = GraphLoader.edgeListFile(sc, "hdfs://192.168.0.10:9000/input/graph/web-Google.txt", minEdgePartitions = 4)
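For readers without an HDFS cluster at hand, the sketch below runs edgeListFile on a small temporary file. The file content is hypothetical; it follows the edge-list format that edgeListFile reads, with one "sourceId destinationId" pair per line and lines starting with # treated as comments. The partition count is passed positionally so that the example does not depend on the parameter name.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

// Assumption: a local master and an arbitrary application name
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-notes").setMaster("local[*]"))

// Hypothetical edge list: a comment line followed by three directed edges
val path = Files.createTempFile("edges", ".txt")
Files.write(path, "# src dst\n1 2\n2 3\n3 1\n".getBytes(StandardCharsets.UTF_8))

// Arguments: context, path, canonicalOrientation, number of edge partitions
val graph = GraphLoader.edgeListFile(sc, path.toUri.toString, false, 4)

// Vertex and edge attributes are both set to 1 by the loader
graph.edges.collect().foreach(e => println(s"${e.srcId} -> ${e.dstId} (${e.attr})"))
println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")
```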
V. Examples of GraphX applications
One line of code:
val rank = graph.pageRank(0.001).vertices // pageRank needs a convergence tolerance; 0.001 is an assumed example value
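To see what that one line returns, here is a minimal sketch on a four-vertex graph. The edges, the tolerance 0.0001 and the 10 iterations for staticPageRank are hypothetical values, as are the application name and the local[*] master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Assumption: a local master and an arbitrary application name
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-notes").setMaster("local[*]"))

// Hypothetical link graph: 1 and 2 point to 3, and 3 points to 4
val edges = sc.parallelize(Seq(Edge(1L, 3L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1)))
val graph = Graph.fromEdges(edges, defaultValue = 1)

// Dynamic PageRank: iterate until ranks change by less than the tolerance
val ranks = graph.pageRank(0.0001).vertices

// Static PageRank: run a fixed number of iterations instead
val fixedRanks = graph.staticPageRank(10).vertices

// Vertex with the highest rank
val top = ranks.top(1)(Ordering.by[(VertexId, Double), Double](_._2)).head._1

ranks.collect().sortBy(_._1).foreach { case (id, r) => println(s"$id: $r") }
println(s"top vertex = $top")
```

Vertex 4 collects all of the rank flowing out of vertex 3, which in turn collects from vertices 1 and 2, so it ends up on top.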
Implementation with RDDs:
Complete code:
import org.apache.spark.SparkContext
import org.apache.spark.graphx.GraphLoader

// Connect to the Spark cluster
val sc = new SparkContext("spark://master.amplab.org", "research")

// Load my user data and parse into tuples of user id and attribute list
val users = (sc.textFile("graphx/data/users.txt")
  .map(line => line.split(","))
  .map(parts => (parts.head.toLong, parts.tail)))

// Parse the edge data which is already in userId -> userId format
val followerGraph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")

// Attach the user attributes
val graph = followerGraph.outerJoinVertices(users) {
  case (uid, deg, Some(attrList)) => attrList
  // Some users may not have attributes so we set them as empty
  case (uid, deg, None) => Array.empty[String]
}

// Restrict the graph to users with usernames and names
val subgraph = graph.subgraph(vpred = (vid, attr) => attr.size == 2)

// Compute the PageRank (the tolerance 0.001 is an assumed value)
val pagerankGraph = subgraph.pageRank(0.001)

// Get the attributes of the top pagerank users
val userInfoWithPageRank = subgraph.outerJoinVertices(pagerankGraph.vertices) {
  case (uid, attrList, Some(pr)) => (pr, attrList.toList)
  case (uid, attrList, None) => (0.0, attrList.toList)
}

println(userInfoWithPageRank.vertices.top(5)(Ordering.by(_._2._1)).mkString("\n"))