2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how a graph database works. The explanation is short and practical, so interested readers may want to follow along.
What is a graph database?
First, before we delve into what a graph database is, let's define the term. A graph database (GDB) is a kind of NoSQL ("Not Only SQL") data store designed to store and retrieve data in a graph structure.
The storage mechanism varies from database to database. Some GDBs use more traditional database structures, such as tables, with a graph API layer on top. Others are "native" GDBs, which maintain the graph structure of the data throughout the database, from storage and management to querying. Many currently available graph databases do this by treating the relationships between entities as "first-class citizens".
Different types of graph databases
Broadly speaking, there are two types of GDB: Resource Description Framework (RDF) databases, also called triple stores or semantic graph databases, and property graph databases.
An RDF GDB uses the concept of a triple, a statement made up of three elements: subject, predicate, and object.
The subject is a resource, or node, in the graph; the object is another node or a literal value; and the predicate represents the relationship between the subject and the object. Nodes and relationships have no internal structure, and everything is identified by a unique identifier in the form of a URI.
The motivation behind this structure is to exchange and publish data.
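To make the subject-predicate-object idea concrete, here is a minimal sketch of a triple store in Python. It is an illustration only, not how any particular RDF database is implemented; the `example.org` URIs and the `match` helper are assumptions made up for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of a node
    predicate: str  # URI naming the relationship
    obj: str        # URI of another node, or a literal value


# Hypothetical namespace, used only for illustration.
EX = "http://example.org/"

triples = [
    Triple(EX + "TomHanks", EX + "name", "Tom Hanks"),
    Triple(EX + "CastAway", EX + "title", "Cast Away"),
    Triple(EX + "TomHanks", EX + "actedIn", EX + "CastAway"),
]


def match(store, s=None, p=None, o=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


acted_in = match(triples, s=EX + "TomHanks", p=EX + "actedIn")
```

Note that the name and the title are themselves separate triples: a node carries no internal structure, so every fact about it is one more statement.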
The property graph GDB focuses on storing data close to the logical model. That model, in turn, is shaped by the questions being asked of the data, and it aims to make storage and querying as efficient as possible.
Unlike RDF-based graphs, nodes and relationships have internal structure, which provides a rich data representation along with related metadata.
Anatomy of a property graph database
For the rest of this article, we will focus on native property graph databases, especially Neo4j. Let's examine the main components.
The main components of a property graph database are as follows:
Nodes: also known as vertices in graph theory; the main data elements of a graph
Relationships: also known as edges in graph theory; a link between two nodes. A relationship has a direction and a type. A node without relationships is allowed, but a relationship without two nodes is not.
(Figure: nodes and relationships)
Labels: define a category of node; a node can have more than one
Properties: enrich a node or relationship; they cannot be null
(Figure: labels, types, and properties)
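The components listed above can be sketched as a small in-memory model. This is a simplified illustration in Python, not Neo4j's internal representation; the class names, the validation rules, and the "Actor" label are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


def _check_properties(properties):
    # Properties cannot be null: reject None values.
    if any(value is None for value in properties.values()):
        raise ValueError("property values cannot be null")


@dataclass
class Node:
    labels: set                                      # a node can carry several labels
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        _check_properties(self.properties)


@dataclass
class Relationship:
    type: str                                        # exactly one type
    start: Node                                      # direction runs start -> end
    end: Node
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # A relationship without two nodes is not allowed.
        if self.start is None or self.end is None:
            raise ValueError("a relationship needs a start node and an end node")
        _check_properties(self.properties)


tom = Node({"Person", "Actor"}, {"name": "Tom Hanks"})   # "Actor" is a hypothetical extra label
cast_away = Node({"Movie"}, {"title": "Cast Away"})
acted_in = Relationship("ACTED_IN", tom, cast_away, {"roles": ["Chuck Noland"]})
```

A node with no relationships, such as a freshly created `Node({"Person"})`, is perfectly valid in this model, while constructing a `Relationship` with a missing end raises an error.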
Graph databases and relational databases: a review
Many developers are familiar with traditional relational databases, where the data is stored in tables in a well-defined schema.
Each row in a table is a discrete data entity. One element of the row is typically used to define its uniqueness: the primary key. It could be a generated unique ID, or something like a person's ID number.
We then reduce data duplication through a process called normalization. In normalization, we move repeated data (such as someone's address) to another table and refer to it. As a result, the row that represents the entity holds a reference to the row that represents that person's address.
For example, if someone changes their address, you don't want multiple copies of that address scattered everywhere, forcing you to remember every place it is stored. Normalization ensures that you have one version of the data, so you can update it in one place.
Then, when we query, we need to reconstruct the normalized data. We do this with a JOIN operation.
In the row for our main entity, such as a person, we have a primary key that identifies the entity. We also have a foreign key, which points to a row in the address table. We connect the two tables through the primary key and the foreign key and use them to find the address in the address table. This is called a JOIN, and JOINs are computed at query time, that is, at read time.
When we perform a JOIN in a relational database, it is a set comparison operation in which we look for where two sets of data overlap (in this case, the person table and the address table). At a high level, this is how traditional relational databases work.
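The normalization and JOIN steps described above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table layout, column names, and sample rows are assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized layout: the address lives in its own table.
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL
    );
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,                       -- primary key
        name       TEXT NOT NULL,
        address_id INTEGER REFERENCES address(address_id)     -- foreign key
    );
    INSERT INTO address VALUES (1, '1 Example Street');
    INSERT INTO person  VALUES (1, 'Alice', 1), (2, 'Bob', 1);
""")

# Because the address is stored once, a move is a single-row update.
conn.execute("UPDATE address SET street = '2 Sample Road' WHERE address_id = 1")

# The JOIN reconstructs the person/address pairing at query (read) time.
rows = conn.execute("""
    SELECT person.name, address.street
    FROM person
    INNER JOIN address ON person.address_id = address.address_id
    ORDER BY person.name
""").fetchall()
```

Both people see the new street after one update, which is the benefit of normalization; the price is that the pairing has to be recomputed by the JOIN on every read.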
How a native graph database works: connections and index-free adjacency
Let's take a quick look at the native graph database and how it works.
We said that a discrete entity in a relational database is a row in a table. In a native graph database, the equivalent of that row is a node. It is still a discrete entity, so we still have this element of normalization.
A node is an entity. If we have person nodes, we have one node per person, with something that makes it unique, such as a social security number.
The key difference, however, is that when we connect this person node to another discrete entity, such as an address, we create a physical connection (also known as a relationship) between the two nodes.
One node holds a pointer to the outgoing end of the relationship, and the other node holds a pointer to its incoming end.
So, in effect, we are collecting a set of pointers, which are the physical manifestation of the connection between the two entities. That's the biggest difference.
In a relational database, you use JOINs to reconstruct the data when reading, which means that at query time the database has to work out how things map together.
In a graph database, since we already know that the two elements are connected, we do not need to look up the mapping at query time. All we do is follow the stored relationships to other nodes.
This is what we call index-free adjacency. Compared with other database systems, index-free adjacency is the key to understanding the performance characteristics of native graph databases.
Index-free adjacency means that during a local graph traversal, in which we follow the pointers (relationships) that connect the nodes of the graph, the performance of the operation does not depend on the overall size of the graph. It depends on the number of relationships connected to the nodes you are traversing.
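The pointer-following idea can be sketched in a few lines of Python. This is a toy model, not Neo4j's storage format: the `Node`, `Relationship`, and `follow` names are assumptions for illustration. The point is that the work done by `follow` is bounded by the relationships attached to the starting node, however large the rest of the graph grows.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.outgoing = []  # direct references to relationships that start here
        self.incoming = []  # direct references to relationships that end here


class Relationship:
    def __init__(self, rel_type, start, end):
        self.type = rel_type
        self.start = start
        self.end = end
        # Creating the relationship wires up both ends as stored references.
        start.outgoing.append(self)
        end.incoming.append(self)


def follow(node, rel_type):
    """Return (neighbouring nodes, number of relationships inspected)."""
    found, inspected = [], 0
    for rel in node.outgoing:   # only this node's own relationships are touched
        inspected += 1
        if rel.type == rel_type:
            found.append(rel.end)
    return found, inspected


person = Node("person")
address = Node("address")
Relationship("LIVES_AT", person, address)

# Grow the rest of the graph; none of this affects the traversal from `person`.
others = [Node(f"other-{i}") for i in range(10_000)]
for a, b in zip(others, others[1:]):
    Relationship("FOLLOWS", a, b)

found, inspected = follow(person, "LIVES_AT")
```

Here `follow` inspects a single relationship even though the graph holds thousands of others, because no global lookup structure is consulted.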
When we say that a JOIN is a set operation (an intersection), we mean that a relational database uses an index to find where the two sets overlap. This means that JOIN operations start to slow down as the tables get larger.
In big O notation, an index lookup grows logarithmically, roughly O(log n), and the cost grows exponentially with the number of JOINs in the query.
Traversing relationships in a graph, on the other hand, grows linearly with the number of relationships on the nodes we actually traverse, rather than with the overall size of the graph.
This is the fundamental query-time optimization that index-free adjacency gives a graph database. From a performance perspective, it is the most important thing to keep in mind about native graph databases.
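As a rough illustration of the contrast, the sketch below looks up a person's addresses in two ways: through a sorted join table searched with binary search (standing in for an index), and through references stored directly on the node. The data, the function names, and the use of a sorted list as an "index" are all simplifying assumptions; real database engines are far more sophisticated.

```python
from bisect import bisect_left

N = 100_000

# Relational side: a join table of (person_id, address_id) rows kept sorted
# by person_id, which plays the role of an index on that column.
join_rows = sorted((person_id, person_id * 10) for person_id in range(N))


def addresses_via_index(person_id):
    # Binary search needs roughly log2(len(join_rows)) comparisons,
    # so the lookup cost grows as the table grows.
    i = bisect_left(join_rows, (person_id,))
    result = []
    while i < len(join_rows) and join_rows[i][0] == person_id:
        result.append(join_rows[i][1])
        i += 1
    return result


# Graph side: each node already holds direct references to its neighbours.
nodes = [{"id": person_id, "addresses": [person_id * 10]} for person_id in range(N)]


def addresses_via_pointers(node):
    # Cost depends only on how many relationships this node has.
    return list(node["addresses"])


start = nodes[42]  # the traversal starts from a node we already hold
via_index = addresses_via_index(42)
via_pointers = addresses_via_pointers(start)
```

Both approaches return the same answer; the difference is that the indexed lookup has to search a structure whose size tracks the whole table, while the pointer lookup only touches the starting node.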
A brief introduction to the movie graph
We have talked about the theoretical differences between graph and relational databases. Now let's look at some side-by-side comparisons.
The movie graph is a data set of actors, directors, producers, writers, reviewers, and movies, together with information about how they are connected.
The movie data set includes:
133 person nodes / entities
38 movie nodes / entities
253 relationships / connections between the above entities, describing connections such as:
A person who directed a movie
A person who acted in a movie, and the roles they played
A person who wrote a movie
A person who produced a movie
A person who reviewed a movie, with the rating and summary they gave
A person who follows another person
Although it is a relatively small data set, it is enough to demonstrate the power of graphs.
Comparing the data models
First, let's take a look at the data models for the two kinds of database. As with all data models, their shape ultimately depends on the type of questions you ask. So let's assume that we want to ask the following types of questions:
What movies has a person acted in?
What movies has a person been involved with?
Who are all the co-actors that a person has ever worked with?
Based on these, here are the relevant potential data models:
(Figure: entity-relationship data model of the movie graph)
(Figure: property graph data model of the movie graph)
You'll notice something right away: the IDs are gone! Once we know that connections exist, we connect the data together directly, so we no longer need IDs, or the mapping tables, to tell us how different rows of data join together.
Comparing queries
Now let's compare some queries. Taking some of the initial queries from the :play movies example, let's look at Cypher queries side by side with the equivalent SQL.
What is Cypher, I hear you ask? Cypher is a graph query language used to query the Neo4j graph database. There is also an open version, openCypher, which many other vendors use.
As we go through the queries, it should become clearer how a graph database, together with a query language designed for exploring relationships, really starts to pay off. Let's start by finding Tom Hanks.
How to find Tom Hanks
Cypher:
MATCH (p:Person {name: "Tom Hanks"}) RETURN p
SQL:
SELECT * FROM person WHERE person.name = 'Tom Hanks'
How to find the movies Tom Hanks is involved in
Cypher:
MATCH (:Person {name: "Tom Hanks"})-->(m:Movie) RETURN m.title
SQL:
SELECT movie.title FROM movie
INNER JOIN person_movie ON movie.movie_id = person_movie.movie_id
INNER JOIN person ON person_movie.person_id = person.person_id
WHERE person.name = 'Tom Hanks'
How to find the movies directed by Tom Hanks
Cypher:
MATCH (:Person {name: "Tom Hanks"})-[:DIRECTED]->(m:Movie) RETURN m.title
SQL:
SELECT movie.title FROM movie
INNER JOIN person_movie ON movie.movie_id = person_movie.movie_id
INNER JOIN person ON person_movie.person_id = person.person_id
INNER JOIN involvement ON person_movie.involve_id = involvement.involve_id
WHERE person.name = 'Tom Hanks' AND involvement.title = 'Director'
How to find Tom Hanks' co-actors
Cypher:
MATCH (:Person {name: "Tom Hanks"})-->(:Movie) ...
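To see the SQL side of the comparison run end to end, here is a small sketch using Python's built-in `sqlite3` module. The table and column names follow the SQL queries above, but the exact schema, the "Actor" involvement title, and the handful of sample rows are assumptions for illustration; this is not the full movie data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person      (person_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE movie       (movie_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE involvement (involve_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Mapping table: who was involved in which movie, and in what capacity.
    CREATE TABLE person_movie (
        person_id  INTEGER REFERENCES person(person_id),
        movie_id   INTEGER REFERENCES movie(movie_id),
        involve_id INTEGER REFERENCES involvement(involve_id)
    );

    -- A few sample rows only.
    INSERT INTO person      VALUES (1, 'Tom Hanks'), (2, 'Helen Hunt');
    INSERT INTO movie       VALUES (1, 'Cast Away'), (2, 'That Thing You Do');
    INSERT INTO involvement VALUES (1, 'Actor'), (2, 'Director');
    INSERT INTO person_movie VALUES (1, 1, 1), (2, 1, 1), (1, 2, 2);
""")

# Movies Tom Hanks is involved in, in any capacity.
involved = [row[0] for row in conn.execute("""
    SELECT movie.title FROM movie
    INNER JOIN person_movie ON movie.movie_id = person_movie.movie_id
    INNER JOIN person ON person_movie.person_id = person.person_id
    WHERE person.name = 'Tom Hanks'
    ORDER BY movie.title
""")]

# Movies Tom Hanks directed: one more JOIN to filter on the involvement type.
directed = [row[0] for row in conn.execute("""
    SELECT movie.title FROM movie
    INNER JOIN person_movie ON movie.movie_id = person_movie.movie_id
    INNER JOIN person ON person_movie.person_id = person.person_id
    INNER JOIN involvement ON person_movie.involve_id = involvement.involve_id
    WHERE person.name = 'Tom Hanks' AND involvement.title = 'Director'
""")]
```

Each extra condition on the relationship costs another JOIN in SQL, whereas in the Cypher versions it is expressed as a relationship type inside the pattern.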