2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article looks at what hinders the scaling of graph databases. Many people run into these difficulties in real-world deployments, so the sections below walk through how to deal with them.
What is "scalability of graph databases"?
"Scaling" does not just mean putting more data on one machine or spreading it across several. For large or growing datasets, good query performance is essential as well.
So the real question is whether a graph database can still perform acceptably when the dataset grows on a single machine, or outgrows it. To see why this is the central question, here is a quick review of graph database basics.
In simple terms, graph databases store schema-free objects (vertices or nodes) with arbitrary data (attributes), together with the relations between those objects (edges). Edges usually have a direction, pointing from one object to another. Vertices and edges together form the graph dataset.
Discrete mathematics defines a graph as a set of vertices and edges; computer science defines it as an abstract data type capable of representing connections or relationships. It differs from tabular data structures in relational database systems, which have limited ability to express data relationships.
As mentioned above, a graph consists of nodes (also called vertices, V) connected by relationships (edges, E).
A vertex can have any number of edges, and vertices form paths of arbitrary depth (path length).
A graph can also model financial transactions across banks, as shown in the following figure. In this example, bank accounts are nodes, and bank transactions and other relationships are edges.
Storing account and transaction information this way lets you traverse the graph to an unknown or varying depth. Writing and running such queries in a relational database is often a complex task. (With a multi-model database you can also model the relationship between a bank and its branches.)
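To make the variable-depth traversal concrete, here is a minimal Python sketch. The account names, amounts, and helper functions are illustrative assumptions, not taken from any particular database; the point is only that the traversal depth is a parameter rather than something fixed in the query's structure.

```python
from collections import deque

# Hypothetical toy data: accounts are nodes, transactions are directed edges
# given as (sender, receiver, amount).
transactions = [
    ("acct_A", "acct_B", 500),
    ("acct_B", "acct_C", 450),
    ("acct_C", "acct_D", 400),
    ("acct_A", "acct_E", 20),
]


def build_adjacency(edges):
    """Map each sending account to the list of accounts it sent money to."""
    adjacency = {}
    for source, target, _amount in edges:
        adjacency.setdefault(source, []).append(target)
    return adjacency


def reachable_within(adjacency, start, max_depth):
    """Breadth-first traversal; returns {account: depth} for up to max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, []):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


adjacency = build_adjacency(transactions)
two_hops = reachable_within(adjacency, "acct_A", 2)
print(two_hops)
```

In a relational schema the same question typically needs one self-join per hop or a recursive query, which is why the depth being unknown in advance makes it awkward.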
Graph databases provide algorithms for querying the stored data and analyzing the relationships in it. These include traversals, pattern matching, shortest path, and distributed graph processing such as community detection, connected components, or centrality. Most of these algorithms have one thing in common, and it is also the root of the supernode and network hop problems: they traverse from one node to another along edges.
With that quick review done, on to the challenges.
"Celebrity effect"
As mentioned above, a vertex or node can have any number of edges. A supernode is a node in a graph dataset with an unusually large number of incoming or outgoing edges, and the classic example is an influencer. Sir Patrick Stewart's Twitter account alone has more than 3.4 million followers.
If you model accounts and tweets as a graph and a traversal passes through Patrick Stewart's account, the algorithm has to look at all 3.4 million edges of the Stewart account. This lengthens query execution time and may even exceed acceptable limits. Similar problems exist in fraud detection (accounts with a very large number of transactions), network management (large IP hubs), and so on.
Supernodes are an inherent problem with graphs and all graph databases, and there are two ways to minimize the impact of supernodes.
Method 1: Split the supernode
More precisely, you can duplicate the node "Patrick Stewart" and split its edges by some attribute, such as the followers' country or another meaningful grouping. This minimizes the performance impact of traversing the supernode, provided the queries can make use of those groupings.
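A minimal sketch of the splitting idea in Python, assuming a hypothetical edge list of (follower, followed account, follower country) and a made-up naming scheme for the per-country copies of the node:

```python
from collections import defaultdict

# Hypothetical follower edges: (follower, followed_account, follower_country).
follows = [
    ("u1", "patrick_stewart", "UK"),
    ("u2", "patrick_stewart", "US"),
    ("u3", "patrick_stewart", "US"),
    ("u4", "patrick_stewart", "DE"),
]


def split_supernode(edges, supernode):
    """Redirect each edge of `supernode` to a per-country copy of that node."""
    partitions = defaultdict(list)
    for follower, followed, country in edges:
        if followed == supernode:
            # The "name#country" key is an assumed convention for the copies.
            partitions[f"{supernode}#{country}"].append(follower)
    return dict(partitions)


partitions = split_supernode(follows, "patrick_stewart")
us_followers = partitions["patrick_stewart#US"]
print(us_followers)
```

A query that only cares about followers in one country now touches just that copy's edges. A query that needs all followers still has to visit every copy, so the split helps only when the grouping matches the query pattern.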
Method 2: Vertex-centric index
A vertex-centric index stores information about the edges together with information about the node. Still using Patrick Stewart's Twitter account as an example, you could index on values such as the date and time a follower started following, the follower's country or region, or the follower's own follower count. All of these add selectivity, which makes the index more effective.
The query engine can use the index to reduce the number of linear lookups a traversal needs. The same applies to fraud detection: there the financial transactions are the edges, and attributes such as transaction date or transaction amount provide the selectivity.
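The following Python sketch illustrates the principle only. It is not how any particular database implements vertex-centric indexes; the class and method names are hypothetical. Each vertex keeps its edges sorted by one attribute, so a range condition becomes two binary searches instead of a scan over every edge.

```python
import bisect


class VertexCentricIndex:
    """Per-vertex edge lists kept sorted by one edge attribute (here an ISO date)."""

    def __init__(self):
        self._keys = {}       # vertex -> sorted list of attribute values
        self._neighbors = {}  # vertex -> neighbors, in the same order as the keys

    def add_edge(self, vertex, attribute, neighbor):
        keys = self._keys.setdefault(vertex, [])
        neighbors = self._neighbors.setdefault(vertex, [])
        position = bisect.bisect_right(keys, attribute)
        keys.insert(position, attribute)
        neighbors.insert(position, neighbor)

    def neighbors_in_range(self, vertex, low, high):
        """Return neighbors whose edge attribute lies in [low, high]."""
        keys = self._keys.get(vertex, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return self._neighbors.get(vertex, [])[start:end]


index = VertexCentricIndex()
index.add_edge("patrick_stewart", "2019-03-01", "u1")
index.add_edge("patrick_stewart", "2021-07-15", "u2")
index.add_edge("patrick_stewart", "2020-01-10", "u3")
index.add_edge("patrick_stewart", "2021-12-31", "u4")

followers_2021 = index.neighbors_in_range("patrick_stewart", "2021-01-01", "2021-12-31")
print(followers_2021)
```

With millions of edges on one vertex, the two binary searches cost on the order of log n comparisons, and only the matching slice is read.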
In some cases, neither of these methods works; performance degrades somewhat when traversing supernodes. In most cases, there are ways to optimize performance, but there is another problem that most graph databases haven't solved yet.
The network hop problem
If you need to traverse a highly connected dataset and all the data the query needs is held in main memory on a single machine, a single main-memory lookup takes about 100 ns.
Now suppose the dataset has outgrown a single instance, or the operator wants the higher availability and processing power of a cluster. For a graph, sharding means tearing apart connections that used to be local, because the data needed for a traversal may now reside on different machines. This introduces network latency into the query. The network may not be the developer's problem, but query performance is.
Even with a modern Gbit network and servers in the same rack, a network lookup costs about 5000 times as much as an in-memory lookup. Adding some load on the network that connects the cluster servers can make the results unpredictable.
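A quick back-of-the-envelope calculation shows how fast this adds up. The two constants come from the figures above; the traversal sizes (1,000 lookups, 10 of them remote) are assumptions chosen purely for illustration.

```python
MEMORY_LOOKUP_NS = 100      # from the text: one main-memory lookup
NETWORK_COST_FACTOR = 5000  # from the text: network lookup vs. in-memory lookup
NETWORK_LOOKUP_NS = MEMORY_LOOKUP_NS * NETWORK_COST_FACTOR  # 500,000 ns = 0.5 ms


def traversal_latency_ns(total_lookups, network_hops):
    """Estimated latency when `network_hops` of the lookups cross the network."""
    local_lookups = total_lookups - network_hops
    return local_lookups * MEMORY_LOOKUP_NS + network_hops * NETWORK_LOOKUP_NS


all_local = traversal_latency_ns(total_lookups=1000, network_hops=0)
ten_hops = traversal_latency_ns(total_lookups=1000, network_hops=10)

print(f"all local:       {all_local / 1e6:.3f} ms")
print(f"10 network hops: {ten_hops / 1e6:.3f} ms")
```

Under these assumptions, making just 1% of the lookups remote raises the estimate from 0.1 ms to roughly 5.1 ms, about 51 times slower, and this ignores any queuing on a loaded network.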
In this case, a traversal might start on DB server 1 and reach a node whose edges point to vertices stored on DB server 2, which forces a hop across the network. In more realistic cases, a single traversal query involves several such hops.
Fraud detection, IT network management, and even modern enterprise identity and access management scenarios may require distributing the graph data while still answering queries with sub-second performance. The large number of network hops generated during query execution can defeat that goal, and scaling then comes at a high cost.
Smarter solutions
In most cases, if you have some knowledge of the data, you can shard the graph more intelligently (by customer ID, region, and so on). In other cases, distributed graph analytics can generate this domain knowledge, for example by running community detection algorithms (such as those in ArangoDB's Pregel suite).
Fraud detection, for example, requires analyzing financial transactions to identify fraud patterns. In the past, fraudsters have used banks in certain countries or regions to launder money. We can use this domain knowledge as the sharding key for the graph dataset: store all financial transactions performed in that region on DB server 1 and distribute the remaining transactions across the other servers.
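The effect of choosing a domain-aware sharding key can be illustrated with a small Python sketch. The account IDs, regions, transactions, and the two placement functions are all hypothetical; the sketch simply counts how many edges cross a server boundary under each placement.

```python
# Hypothetical accounts with a region attribute, and transactions as (from, to).
account_region = {1: "north", 2: "north", 3: "north",
                  4: "south", 5: "south", 6: "south"}
transactions = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (2, 5)]

NUM_SERVERS = 2


def shard_by_id(account):
    """Naive placement: spread accounts by ID, ignoring how they are connected."""
    return account % NUM_SERVERS


def shard_by_region(account):
    """Domain-aware placement: accounts from the same region share a server."""
    regions = sorted(set(account_region.values()))
    return regions.index(account_region[account]) % NUM_SERVERS


def count_cross_server_edges(edges, shard_of):
    """Each edge whose endpoints live on different servers costs a network hop."""
    return sum(1 for source, target in edges if shard_of(source) != shard_of(target))


hops_naive = count_cross_server_edges(transactions, shard_by_id)
hops_by_region = count_cross_server_edges(transactions, shard_by_region)
print(hops_naive, hops_by_region)
```

In this toy dataset most transactions stay within a region, so sharding by region leaves a single cross-server edge, while the ID-based placement cuts five of the six.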
Now, with ArangoDB's SmartGraph feature, queries for money-laundering patterns and other graph queries can be executed locally, avoiding or at least significantly reducing the network hops generated during a query. How does this work?
The query engine in ArangoDB knows where the data needed for a traversal is stored and sends the request to the query engine of each database server involved, which then processes it locally. The partial results from the database servers are then merged on the coordinator and sent to the client. For structured graphs, Disjoint SmartGraphs can also be used to optimize queries.
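The scatter-and-merge flow described above can be sketched in a few lines of Python. This is a conceptual illustration only, not ArangoDB's implementation or API: the server names, data layout, and function names are assumptions, and the sketch assumes each start node's component lives entirely on one server (the disjoint case).

```python
# Hypothetical shards: each database server holds a self-contained part of the graph.
servers = {
    "db1": {"a1": ["a2"], "a2": ["a3"]},
    "db2": {"b1": ["b2"]},
}


def local_traverse(adjacency, start):
    """Depth-first traversal that only touches data on one server."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency.get(node, []))
    return seen


def coordinator_query(shards, start_nodes):
    """Send each start node to the server that owns it, then merge partial results."""
    merged = set()
    for adjacency in shards.values():
        for start in start_nodes:
            if start in adjacency:
                merged |= local_traverse(adjacency, start)
    return sorted(merged)


result = coordinator_query(servers, ["a1", "b1"])
print(result)
```

Each traversal runs entirely against one server's data, so the only network traffic is the request going out and the partial result coming back.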
There is a growing demand for solutions to data scaling problems, and graph technology is increasingly important for answering such complex questions.
The author can say with certainty that graph databases can scale vertically, and that horizontal scaling can also be achieved with ArangoDB. Of course, in some extremely unusual cases, neither vertex-centric indexes nor SmartGraphs will help.