Shulou (Shulou.com) 06/01 report. Updated 2025-01-15. Source: SLTechnology News&Howtos.
This article walks through an example analysis of directed acyclic graph (DAG) detection, motivated by how Spark represents RDD dependencies.
01
-
Background on Spark
Apache Spark is a fast, general-purpose computing engine designed for large-scale data processing. Like Hadoop, it is an open-source cluster computing environment, and it keeps the strengths of Hadoop MapReduce. Unlike MapReduce, however, Spark can keep a job's intermediate output in memory instead of writing it to HDFS and reading it back, which makes it better suited to iterative algorithms such as those used in data mining and machine learning.
RDD, short for Resilient Distributed Datasets, is a fault-tolerant, parallel data structure that lets users explicitly persist data to disk or memory and control how it is partitioned. The RDD is the core abstraction of Spark: each RDD is a read-only, partitioned dataset. An RDD can have many partitions, and each partition holds many records.
Dependencies between RDDs are expressed as a directed acyclic graph (DAG). Let's look at the basic theory and algorithms behind directed acyclic graphs.
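To make "dependencies as a DAG" concrete, here is a toy model in plain Python, not Spark's own API or scheduler. It assumes a hypothetical pipeline of four RDDs (the names users, orders, joined and report are made up for illustration), records each RDD's parents, and uses a depth-first post-order walk to list the RDDs so that every parent comes before anything derived from it.

```python
# Toy model of RDD lineage as a DAG. The RDD names are hypothetical;
# this is not how Spark stores lineage internally.
lineage = {
    "users": [],                   # e.g. loaded from one input
    "orders": [],                  # e.g. loaded from another input
    "joined": ["users", "orders"], # derived from two parents
    "report": ["joined"],          # derived from the joined data
}

def compute_order(lineage, target):
    """List the RDDs needed for `target`, each parent before its children.

    Assumes the lineage is acyclic (no cycle check is performed here).
    """
    order, seen = [], set()

    def visit(rdd):
        if rdd in seen:
            return
        seen.add(rdd)
        for parent in lineage[rdd]:
            visit(parent)          # parents must be available first
        order.append(rdd)          # post-order: added after all its parents

    visit(target)
    return order

print(compute_order(lineage, "report"))  # ['users', 'orders', 'joined', 'report']
```

Because RDDs are read-only and each new RDD is derived from existing ones, this kind of parent-before-child ordering always exists, which is exactly what acyclicity guarantees.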
02
-
Directed acyclic graph (DAG)
In graph theory, a graph whose edges have no direction is called an undirected graph; if every edge has a direction, it is called a directed graph. If, in a directed graph, no vertex can return to itself by following a sequence of edges, the graph has no cycle and is called a directed acyclic graph (DAG). In the following figure, 4 -> 6 -> 1 -> 2 is a path, and so is 4 -> 6 -> 5. No vertex in the graph can return to itself by following edges, so the graph is a DAG.
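Since the figure is not reproduced here, the sketch below stores only the edges that the two example paths name (4 -> 6, 6 -> 1, 1 -> 2, 6 -> 5) as a Python adjacency list; the real figure may contain more edges. The hypothetical helper is_path checks that each consecutive pair of vertices is a directed edge.

```python
# Adjacency list built only from the edges named in the two example paths;
# the original figure may contain additional edges.
graph = {
    4: [6],
    6: [1, 5],
    1: [2],
    2: [],
    5: [],
}

def is_path(graph, vertices):
    """True if every consecutive pair (u, v) in `vertices` is an edge u -> v."""
    return all(v in graph.get(u, []) for u, v in zip(vertices, vertices[1:]))

print(is_path(graph, [4, 6, 1, 2]))  # True
print(is_path(graph, [4, 6, 5]))     # True
print(is_path(graph, [2, 1]))        # False: edges only go one way
```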
In-degree
In-degree is an important concept in graph algorithms. The in-degree of a vertex in a directed graph is the number of edges that end at that vertex, that is, the number of its incoming edges. In the figure above, vertex 4 has an in-degree of 0.
Out-degree
Correspondingly, the out-degree of a vertex is the number of edges that leave it. In the figure above, vertex 3 has an out-degree of 2.
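Both degrees fall out of one pass over an adjacency list. The sketch below reuses only the edges named in the path examples (so vertex 3, whose edges the text does not list, is absent); the function name degrees is illustrative.

```python
def degrees(graph):
    """Return (in_degree, out_degree) dicts for an adjacency-list digraph."""
    out_degree = {u: len(vs) for u, vs in graph.items()}  # edges leaving u
    in_degree = {u: 0 for u in graph}
    for vs in graph.values():
        for v in vs:
            in_degree[v] = in_degree.get(v, 0) + 1        # one edge ends at v
    return in_degree, out_degree

# Edges taken from the path examples only (4 -> 6 -> 1 -> 2 and 6 -> 5).
graph = {4: [6], 6: [1, 5], 1: [2], 2: [], 5: []}
in_deg, out_deg = degrees(graph)
print(in_deg[4])   # 0, matching the text
print(out_deg[6])  # 2
```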
03
-
Another example of a DAG application
DAGs also appear in task scheduling problems, where tasks depend on one another and some tasks can only start after certain others have finished. A school curriculum is a familiar example: a course may have a prerequisite, and students can take it only after completing that prerequisite. If we treat each course as a node and draw an edge from each course to every course that depends on it, we might get a graph like this:
The Algorithms course points to Theoretical CS, which means Algorithms must be completed first. Artificial Intelligence depends on Theoretical CS, Machine learning depends on Artificial Intelligence, and Neural Networks depends on Machine learning; the chain Theoretical CS -> Artificial Intelligence -> Machine learning -> Neural Networks is called a path.
You can also see that Introduction to CS has an in-degree of 0 in the figure above. Vertices with in-degree 0 are important when traversing a directed graph, as we will see below.
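One standard way to put in-degree-0 vertices to work is Kahn's algorithm (a technique named here for illustration, not one the article itself presents): start from every course with no prerequisites, and each time a course is "taken", remove its outgoing edges and enqueue any course whose in-degree drops to 0. The graph below uses the edges from the text plus one hypothetical edge, Introduction to CS -> Algorithms, since the figure is not available.

```python
from collections import deque

# Edge u -> v means "u is a prerequisite of v".
courses = {
    "Introduction to CS": ["Algorithms"],   # hypothetical edge (figure not shown)
    "Algorithms": ["Theoretical CS"],
    "Theoretical CS": ["Artificial Intelligence"],
    "Artificial Intelligence": ["Machine learning"],
    "Machine learning": ["Neural Networks"],
    "Neural Networks": [],
}

def kahn_order(graph):
    """Topological order via Kahn's algorithm, or None if the graph has a cycle.

    Assumes every vertex appears as a key in `graph`.
    """
    in_degree = {u: 0 for u in graph}
    for vs in graph.values():
        for v in vs:
            in_degree[v] += 1
    queue = deque(u for u, d in in_degree.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            in_degree[v] -= 1        # one prerequisite of v is now satisfied
            if in_degree[v] == 0:
                queue.append(v)
    # Vertices that never reach in-degree 0 lie on (or behind) a cycle.
    return order if len(order) == len(graph) else None

print(kahn_order(courses))
```

A side effect of this design is that it doubles as a cycle check: if some vertices never reach in-degree 0, no valid study order exists.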
04
-
What if the graph above contained a cycle?
Suppose Machine learning pointed to Theoretical CS. That would mean students must take Machine learning before Theoretical CS, which contradicts the existing path in which Artificial Intelligence depends on Theoretical CS and Machine learning depends on Artificial Intelligence. It also makes no sense on its own terms: Theoretical CS is a foundational course, so how could anyone finish Machine learning before taking it? A graph with such a cycle is therefore invalid, and a valid prerequisite graph must be a directed acyclic graph.
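The contradiction above can be checked mechanically: adding an edge u -> v to a DAG closes a cycle exactly when v can already reach u. The sketch below assumes the prerequisite edges named in the text and uses hypothetical helper names (reaches, edge_creates_cycle) with a simple iterative depth-first search.

```python
def reaches(graph, start, goal):
    """Iterative DFS: True if `goal` is reachable from `start`."""
    stack, seen = [start], set()
    while stack:
        u = stack.pop()
        if u == goal:
            return True
        if u in seen:
            continue
        seen.add(u)
        stack.extend(graph.get(u, []))
    return False

def edge_creates_cycle(graph, u, v):
    """Adding u -> v closes a cycle exactly when v can already reach u."""
    return reaches(graph, v, u)

# Prerequisite edges taken from the text (u -> v: take u before v).
prereqs = {
    "Algorithms": ["Theoretical CS"],
    "Theoretical CS": ["Artificial Intelligence"],
    "Artificial Intelligence": ["Machine learning"],
    "Machine learning": ["Neural Networks"],
}

print(edge_creates_cycle(prereqs, "Machine learning", "Theoretical CS"))  # True
print(edge_creates_cycle(prereqs, "Algorithms", "Neural Networks"))       # False
```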
05
-
How do we detect whether a directed graph has a cycle?
So how can we check whether a directed graph is a DAG?
Start by comparing it with cycle detection in an undirected graph. To find a cycle in an undirected graph, we traverse it depth-first or breadth-first and mark each visited vertex; reaching an already visited vertex again (other than the one we just came from) means there is a cycle. Can marking visited vertices alone also detect cycles in a directed graph?
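For reference, here is a minimal sketch of the undirected case, assuming a simple graph (no parallel edges) stored as a symmetric adjacency list. The one subtlety is ignoring the edge back to the vertex we just came from, since in an undirected graph every edge can be walked both ways.

```python
def has_cycle_undirected(graph):
    """DFS with visited marks for a simple undirected graph."""
    visited = set()

    def dfs(u, parent):
        visited.add(u)
        for v in graph[u]:
            if v == parent:
                continue            # the edge we arrived on, not a cycle
            if v in visited:
                return True         # reached a visited vertex by another route
            if dfs(v, u):
                return True
        return False

    return any(u not in visited and dfs(u, None) for u in graph)

triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
line = {1: [2], 2: [1, 3], 3: [2]}
print(has_cycle_undirected(triangle))  # True
print(has_cycle_undirected(line))      # False
```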
Consider the following figure. A depth-first traversal starting at node 1 first visits nodes 2 and 6 and marks them. It then explores node 1's other branch, visiting 3, 4, 5 and then 6 in turn. Because 6 has already been marked, the marking-only approach would report a cycle, yet the graph has none. Marking visited vertices alone is therefore not enough to detect cycles in a directed graph.
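The false positive is easy to reproduce. The edges below are inferred from the walkthrough (1 -> 2 -> 6 and 1 -> 3 -> 4 -> 5 -> 6), with 2 listed before 3 so the traversal order matches the text; has_cycle_visited_only is a deliberately flawed, hypothetical check kept only to show the failure.

```python
# Edges inferred from the walkthrough; 2 is explored before 3.
graph = {1: [2, 3], 2: [6], 3: [4], 4: [5], 5: [6], 6: []}

def has_cycle_visited_only(graph):
    """Flawed check: treats ANY already-visited vertex as a cycle."""
    visited = set()

    def dfs(u):
        visited.add(u)
        for v in graph[u]:
            if v in visited:
                return True   # wrong for digraphs: 6 was reached earlier via 2
            if dfs(v):
                return True
        return False

    return any(u not in visited and dfs(u) for u in graph)

print(has_cycle_visited_only(graph))  # True, although the graph is acyclic
```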
It seems an extra condition is needed. Which one? If we add an array onStack that records whether each node is currently on the recursion stack, the false positive disappears. Nodes 2 and 6 are marked and, once their recursion returns, popped off the stack, so the traversal is back at 1. When it then explores 1's other branch (3 -> 4 -> 5 -> 6), node 6 is already marked as visited but is no longer on onStack, so reaching it does not indicate a cycle.
Therefore, cycle detection in a directed graph relies on two conditions together:
Marking visited vertices
Tracking whether each vertex is on the current recursion stack (onStack)
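The two conditions translate directly into a recursive depth-first search. In this sketch, assuming an adjacency-list dict in which every vertex is a key, the set on_stack plays the role of the onStack array: a vertex joins it on entry and leaves it when its recursion returns, so only an edge back to a vertex still on the current path counts as a cycle.

```python
def has_cycle_directed(graph):
    """DFS cycle detection for a digraph using visited + on_stack."""
    visited, on_stack = set(), set()

    def dfs(u):
        visited.add(u)
        on_stack.add(u)              # u is on the current recursion path
        for v in graph[u]:
            if v in on_stack:
                return True          # edge back to an ancestor: a cycle
            if v not in visited and dfs(v):
                return True
        on_stack.discard(u)          # leaving u: pop it off the path
        return False

    return any(u not in visited and dfs(u) for u in graph)

# Same acyclic graph as the walkthrough: 1 -> 2 -> 6 and 1 -> 3 -> 4 -> 5 -> 6.
graph = {1: [2, 3], 2: [6], 3: [4], 4: [5], 5: [6], 6: []}
print(has_cycle_directed(graph))  # False
```

Visited vertices that are off the stack are simply skipped, so each vertex and edge is processed once.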
Now add nodes 7 and 8 to the graph above, as shown in the following figure. When the depth-first search reaches node 4, it continues to child node 5, and one of node 5's edges leads through 7 and 8 back to 4. Node 4 is still on onStack at that moment, so the search has found a cycle: 4 -> 5 -> 7 -> 8 -> 4.
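Keeping the recursion path as an ordered list, rather than only a set, lets the search report the cycle itself. The sketch below assumes the extended graph is the earlier one plus 5 -> 7, 7 -> 8 and 8 -> 4, with 5's edge to 6 listed before its edge to 7 (an assumption about edge order); find_cycle is a hypothetical helper name.

```python
def find_cycle(graph):
    """Return one directed cycle as a vertex list (first == last), or None."""
    visited, on_stack, path = set(), set(), []   # path = onStack in order

    def dfs(u):
        visited.add(u)
        on_stack.add(u)
        path.append(u)
        for v in graph[u]:
            if v in on_stack:
                # v is on the current path: path[v..u] plus edge u -> v is a cycle
                return path[path.index(v):] + [v]
            if v not in visited:
                cycle = dfs(v)
                if cycle:
                    return cycle
        on_stack.discard(u)
        path.pop()
        return None

    for u in graph:
        if u not in visited:
            cycle = dfs(u)
            if cycle:
                return cycle
    return None

# Earlier graph plus the new nodes 7 and 8: 5 -> 7 -> 8 -> 4.
graph = {1: [2, 3], 2: [6], 3: [4], 4: [5], 5: [6, 7], 6: [], 7: [8], 8: [4]}
print(find_cycle(graph))  # [4, 5, 7, 8, 4]
```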
This concludes the example analysis of directed acyclic graph detection in Spark.
© 2024 shulou.com SLNews company. All rights reserved.