Source: Shulou (Shulou.com), SLTechnology News&Howtos. Report dated 06/01; updated 2025-04-01.
This article discusses the advantages of applying the graph database Neo4j in a GIS (geographic information system).
1. Overview
1.1. Brief introduction to graph databases
A graph database is a relatively new type of NoSQL database based on graph theory: both its data storage structure and its query model derive from graphs. The basic elements of a graph in graph theory are vertices and edges, which correspond to nodes and relationships in a graph database.
In a graph database, data items and the relationships between them form a graph structure of nodes and relationships, and all database capabilities are implemented on top of this structure: create, read, update, and delete (CRUD) operations on graph objects, as well as transaction processing and high availability.
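To make the node/relationship model concrete, here is a toy in-memory property graph in Python that supports the CRUD operations described above. It is an illustrative sketch only: the class and method names are hypothetical, and this is neither Neo4j's API nor its storage format. The coordinates are made-up sample values.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Node:
    id: int
    labels: set
    props: dict


@dataclass
class Relationship:
    id: int
    type: str
    start: int  # id of the start node
    end: int    # id of the end node
    props: dict


class PropertyGraph:
    """Toy in-memory property graph with CRUD on nodes and relationships (hypothetical API)."""

    def __init__(self):
        self._ids = count(1)  # shared id sequence for nodes and relationships
        self.nodes = {}
        self.rels = {}

    def create_node(self, *labels, **props):
        node = Node(next(self._ids), set(labels), dict(props))
        self.nodes[node.id] = node
        return node.id

    def create_rel(self, start, rel_type, end, **props):
        rel = Relationship(next(self._ids), rel_type, start, end, dict(props))
        self.rels[rel.id] = rel
        return rel.id

    def read_node(self, node_id):
        return self.nodes.get(node_id)

    def update_node(self, node_id, **props):
        self.nodes[node_id].props.update(props)

    def delete_node(self, node_id):
        # Mirrors the idea of Cypher's DETACH DELETE: drop the node and every relationship touching it.
        self.rels = {rid: r for rid, r in self.rels.items()
                     if node_id not in (r.start, r.end)}
        del self.nodes[node_id]


g = PropertyGraph()
a = g.create_node("Intersection", lat=30.59, lon=114.30)  # hypothetical coordinates
b = g.create_node("Intersection", lat=30.60, lon=114.31)
g.create_rel(a, "ROAD", b, length_m=1200)
g.update_node(b, name="hypothetical junction")
g.delete_node(a)  # also removes the ROAD relationship
```

Relationships are first-class records with their own type and properties, which is what distinguishes a property graph from a table joined through foreign keys.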
1.2. Application cases
From a systems-science perspective, the world consists of systems, and each system consists of its components and the connections between them. Systems and their relationships can therefore be mapped directly onto the nodes and relationships of mathematical graph theory, which lets us model the world intuitively. Graph database technology is built on graph theory and can be seen as a basic, universal "language" for describing the richly varied world. A model described in this "language" has high fidelity to the original system and matches how people usually think about it, which makes it intuitive, natural, direct, and efficient. No intermediate transformation is needed; such transformations often complicate the problem or lose valuable information. Because graph databases can directly describe many kinds of complex real-world systems, they are widely applicable and have high practical value.
In fact, Neo4j has won a large number of customers, and both its customer base and its applications are still growing. These customers include internationally renowned companies and institutions such as Cisco, Hewlett-Packard, Walmart, LinkedIn, Adidas, and the Financial Times. Neo4j's customers are concentrated mainly in social networking, human resources and recruitment, finance, insurance, retail, advertising, e-commerce, logistics, transportation, IT, telecommunications, manufacturing, printing, cultural media, and healthcare. Before adopting a graph database, many of these customers repeatedly ran into shortcomings in their existing products that a new product had to solve, commonly called pain points. The customers and their main pain points are listed below:
No. Enterprise: pain point or challenge
1. MigRaven: Authorization and access control.
2. Adidas: The data needed to provide a personalized experience was scattered across isolated information silos.
3. BILLES: Growing the online customer base meant handling a large number of small print orders, and a series of acquisitions had left a patchwork IT system.
4. Cerved: Improve computational efficiency and quickly identify the people who directly or indirectly control a company, with access to top-tier big-data network analysis technology.
5. Die Bayerische: Outdated management systems and inconsistent data formats; the goal was a standardized data framework.
6. ICIJ: Help reporters crack the complex Swiss Leaks data to produce better investigative journalism.
7. IRCC: Relational databases did not provide enough flexibility for multiple virtual functions.
8. LinkedIn China: Launch a social networking platform as quickly as possible while leaving room for growth in key users and features.
9. Musimap: Map every music title, each with 55 weighted descriptive criteria, to enable in-depth processing and real-time recommendations.
10. Qualia: The original product was optimized only for tracking user behavior on a single device.
11. Schleich GmbH: Greater scalability and flexibility were needed in the product data network.
12. TRANSPARENCY-ONE: Manage and search a large amount of data without performance problems.
13. Wanderu: Help consumers find and book intercity buses and trains when traveling in the United States.
14. WineDataSystem: No reference resources were available; a large volume of information had to be handled, along with requirements for ease of access and user flexibility.
15. Wobi: Quickly analyze large volumes of complete customer information.
16. eBay: Support large-scale, complex routing queries with fast and consistent performance.
17. Global500 Logistics: Geolocation routing information is generated continuously, and the business depends on this location data and its complex relationships, which poses serious challenges for traditional relational databases.
18. Glowbl: Bring many possible designs and networks together, display all contacts as a graph, and manage those contacts and their interactions in real time.
19. InfoJobs: Build a new portal that simulates the potential career paths of job seekers.
20. Megree: Provide an overall view of relationships and the strength of those links.
21. Pitney Bowes: Gain 360-degree customer insight and a competitive advantage by building next-generation tools.
22. Walmart: Provide customers with the best online shopping experience.
23. Telenor: Behind an online self-service management portal, determine which agreements each user is responsible for managing within the customer's organizational structure.
2. Applications and advantages of graph databases
2.1. Advantages of graph databases
Neo4j was designed from the start to describe the relationships between entities well. In real life, every entity is intricately connected to the entities around it, and these relationships carry a great deal of latent information. A traditional relational database, however, focuses on describing an entity's internal attributes, and relationships between entities are expressed mainly through foreign keys. Querying an entity's relationships therefore requires join operations; deep relationship queries in particular require many joins, and joins are usually very time-consuming. As relationship data in the real world grows rapidly, relational databases struggle to bear the computational cost of the many table operations that deep relationship queries over massive data require. Neo4j emerged in this context.
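To illustrate why deep relationship queries grow costly in a relational schema, the sketch below uses Python's built-in `sqlite3` module with a hypothetical `person`/`knows` schema in which relationships are stored as foreign-key rows. Each additional hop adds another self-join on the relationship table; the schema and names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE knows (src INTEGER REFERENCES person(id),
                    dst INTEGER REFERENCES person(id));
""")
people = ["Ann", "Bob", "Cai", "Dan", "Eve"]
conn.executemany("INSERT INTO person VALUES (?, ?)", list(enumerate(people, 1)))
# A chain of relationships: Ann -> Bob -> Cai -> Dan -> Eve
conn.executemany("INSERT INTO knows VALUES (?, ?)", [(1, 2), (2, 3), (3, 4), (4, 5)])


def k_hop_sql(k):
    """Build SQL for people exactly k hops from person 1: one extra self-join per hop."""
    joins = " ".join(
        f"JOIN knows k{i} ON k{i}.src = k{i - 1}.dst" for i in range(2, k + 1)
    )
    return (f"SELECT p.name FROM knows k1 {joins} "
            f"JOIN person p ON p.id = k{k}.dst WHERE k1.src = 1")


for depth in (1, 2, 3):
    sql = k_hop_sql(depth)
    # depth, number of JOINs in the query, names found at that depth
    print(depth, sql.count("JOIN"), [row[0] for row in conn.execute(sql)])
```

The number of joins grows linearly with the traversal depth, and each join is resolved through index lookups or scans over the whole relationship table.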
2.1.1. Index-free adjacency
An important feature of Neo4j that keeps relationship queries fast is index-free adjacency: each node in the database maintains direct references to its neighboring nodes. Each node therefore acts as a micro-index of its neighbors, which is far cheaper than consulting a global index. As a result, query time does not depend on the total size of the graph, only on the number of nearby nodes the query actually visits. A relational database, by contrast, uses global indexes to link records, and these indexes add an intermediate layer to every traversal step, which makes the computation very expensive. Index-free adjacency gives graph databases fast and efficient graph traversal. The following figure shows the difference between a relational database and Neo4j when finding relationships:
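As a minimal illustration of index-free adjacency, the Python sketch below gives each node direct references to its neighbors and counts how many edges a traversal touches. The `Node` class and `k_hop` function are hypothetical; this models the concept, not Neo4j's on-disk store format.

```python
class Node:
    """A node holding direct references to its neighbours (index-free adjacency)."""

    __slots__ = ("name", "out")

    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbouring Node objects

    def connect(self, other):
        self.out.append(other)


def k_hop(start, k):
    """Names of nodes at shortest hop distance k from start, plus the number of edges touched."""
    frontier, visited, touched = {start}, {start}, 0
    for _ in range(k):
        nxt = set()
        for node in frontier:
            for nb in node.out:  # no global index lookup: just follow the reference
                touched += 1
                if nb not in visited:
                    visited.add(nb)
                    nxt.add(nb)
        frontier = nxt
    return sorted(n.name for n in frontier), touched


a, b, c = Node("a"), Node("b"), Node("c")
a.connect(b)
b.connect(c)

# Ten thousand unrelated nodes elsewhere in the graph do not change the traversal cost.
others = [Node(f"x{i}") for i in range(10_000)]
for u, v in zip(others, others[1:]):
    u.connect(v)

print(k_hop(a, 2))  # (['c'], 2)
```

The `touched` counter stays at 2 no matter how large `others` grows, which is the property the paragraph above describes.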
2.2. Risk control
1. Detecting order-brushing rings formed by two or more people
2. Monitoring the entire life cycle of a ride-hailing order
At present, the data for a single order spans multiple tables. As the business grows and order volume increases, the database has to perform more and more join operations:
In the data decision-making system, the longest SQL query runs to 400 lines and joins seven tables. Neo4j's Cypher query language can greatly reduce this query complexity.
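For the order-brushing scenario listed first above, a ring of colluding people shows up as a directed cycle in the graph of who placed or paid orders for whom. The sketch below finds every group of two or more accounts that share a cycle using Tarjan's strongly connected components algorithm on an in-memory edge list. The edge semantics and account names are hypothetical; in Neo4j the equivalent would typically be a Cypher path pattern that returns to its start node, or an SCC routine from a graph-algorithm library.

```python
from collections import defaultdict


def find_rings(edges):
    """Return sorted groups of >= 2 accounts lying on a common directed cycle (Tarjan's SCC)."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
        graph.setdefault(dst, [])  # make sure every account appears as a key
    index, low, on_stack, stack, rings = {}, {}, set(), [], []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a strongly connected component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            if len(component) >= 2:
                rings.append(sorted(component))

    for v in list(graph):
        if v not in index:
            strongconnect(v)
    return sorted(rings)


# Hypothetical edges: A -> B means account A placed or paid an order served by account B.
orders = [("P1", "D1"), ("D1", "P2"), ("P2", "P1"),  # three-account loop
          ("P3", "D2"), ("D2", "P3"),                # two-account loop
          ("P4", "D3")]                              # ordinary trip, no loop
print(find_rings(orders))  # [['D1', 'P1', 'P2'], ['D2', 'P3']]
```

The recursive version is fine for small subgraphs; for very deep graphs an iterative variant avoids Python's recursion limit.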
2.3. Geographic information system
1. Building the road network
You can import the latitude and longitude of road points and the paths between them into Neo4j, then query the shortest navigation path with its built-in shortest-path function. The same road network can also be used to correct current trajectory data.
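Cypher's built-in `shortestPath()` minimizes the number of relationships (hops), not road length; length-weighted routing in Neo4j is usually done with a Dijkstra procedure from APOC or the Graph Data Science library. The plain-Python sketch below shows what such a length-weighted search computes, using haversine distances between hypothetical (lat, lon) points. `snap()` is a deliberately crude, hypothetical stand-in for trajectory correction; real map matching also uses road geometry, heading, and timing.

```python
import heapq
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def dijkstra(coords, roads, source, target):
    """Shortest path by road length; roads are undirected (u, v) node pairs."""
    adj = {n: [] for n in coords}
    for u, v in roads:
        w = haversine_m(coords[u], coords[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


def snap(point, coords):
    """Snap a raw GPS fix to the nearest road-network node (a crude correction)."""
    return min(coords, key=lambda n: haversine_m(point, coords[n]))


# Hypothetical intersections: a straight road A-B-C and a detour through D.
coords = {"A": (30.50, 114.30), "B": (30.50, 114.31),
          "C": (30.50, 114.32), "D": (30.53, 114.31)}
roads = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
path, metres = dijkstra(coords, roads, "A", "C")
print(path, round(metres))
print(snap((30.501, 114.309), coords))  # B
```

Weighting edges by length (or travel time) rather than hop count is what makes the result a navigation route rather than merely the path with the fewest road segments.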
2. Neo4j Spatial library
Neo4j Spatial is a library that enables Neo4j to perform full spatial operations. It supports importing ESRI Shapefiles and OSM data, supports most geometry types such as points, lines, and polygons, and can perform topological operations on spatio-temporal data such as containment, overlap, and intersection. In addition to its own R-tree index over spatial structures, Neo4j Spatial can flexibly support other indexes: any data that can be mapped to geometries can be processed by Neo4j Spatial. These characteristics make Neo4j efficient and widely applicable for analyzing and processing spatio-temporal data.
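Spatial queries backed by an R-tree generally follow a two-step filter-and-refine pattern: bounding boxes prune the candidates cheaply, then an exact geometric test runs on the survivors. The plain-Python sketch below illustrates that pattern for a containment query with a ray-casting point-in-polygon test. It does not use the Neo4j Spatial API; the function names and sample geometry are hypothetical, with points given as (x, y), e.g. (lon, lat).

```python
def bbox(polygon):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a polygon."""
    xs, ys = zip(*polygon)
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(pt, polygon):
    """Ray casting: count how many polygon edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains(polygon, points):
    """Filter by bounding box first (what an R-tree prunes on), then run the exact test."""
    min_x, min_y, max_x, max_y = bbox(polygon)
    candidates = [p for p in points
                  if min_x <= p[0] <= max_x and min_y <= p[1] <= max_y]
    return [p for p in candidates if point_in_polygon(p, polygon)]


triangle = [(0, 0), (4, 0), (0, 4)]
print(contains(triangle, [(1, 1), (3, 3), (5, 5)]))  # [(1, 1)]
```

Here (5, 5) is rejected by the bounding box alone, while (3, 3) passes the box test and is only rejected by the exact geometry check.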
3. Configuration requirements
3.1. Data size
The data directory for the Wuhan road network is as follows:
Screenshot of total file size
After cleaning and consolidation, the data was re-imported into the graph database; the total node count is shown below:
Current number of database nodes:
In total, Wuhan has about 2 million latitude/longitude points and about 4 million relationships between them; the saved data files take about 4.5 GB of disk space. Attributes such as the distance and angle between points will be added later, bringing the Wuhan road network to about 10 GB of disk in total.
Based on city size, the cities comparable to Wuhan (first-tier and new first-tier cities) are:
Beijing, Shanghai, Guangzhou, Shenzhen, Chengdu, Hangzhou, Wuhan, Chongqing, Nanjing, Tianjin, Suzhou, Xi'an, Changsha, Shenyang, Qingdao, Zhengzhou, Dalian, Dongguan, and Ningbo. At roughly 10 GB each, these 19 cities are conservatively estimated to need 200 GB of disk space. Road network data for second-tier cities is conservatively estimated at 100 GB, and intercity data at about 100 GB (estimated from the proportion of POIs). The total is about 400 GB, so we requested a 500 GB disk.
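The disk estimate can be checked with a few lines of arithmetic. All figures below come from the text above; only the sums are added, and rounding 190 GB up to 200 GB mirrors the text's conservative estimate.

```python
# Figures taken from the text above; only the arithmetic is added here.
WUHAN_GB = 10  # Wuhan road network, including planned distance/angle attributes
TIER1_CITIES = [
    "Beijing", "Shanghai", "Guangzhou", "Shenzhen", "Chengdu", "Hangzhou",
    "Wuhan", "Chongqing", "Nanjing", "Tianjin", "Suzhou", "Xi'an", "Changsha",
    "Shenyang", "Qingdao", "Zhengzhou", "Dalian", "Dongguan", "Ningbo",
]

tier1_gb = len(TIER1_CITIES) * WUHAN_GB  # 19 * 10 = 190 GB, rounded up to 200 GB
estimate_gb = {
    "first-tier and new first-tier cities": 200,
    "second-tier cities": 100,
    "intercity data (scaled by POI share)": 100,
}
total_gb = sum(estimate_gb.values())
requested_gb = 500
print(tier1_gb, total_gb, requested_gb - total_gb)  # 190 400 100
```

The 500 GB request therefore leaves about 100 GB (25% of the estimate) as headroom.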
3.2. CPU and memory requirements
We used py2neo to access the graph database and ran 20 threads that continuously computed shortest paths. The machine configuration and resource usage are shown below:
Program screenshot
Screenshot of machine configuration
CPU status screenshot
Screenshot of memory status
The test machine had an 8-core CPU and 8 GB of memory. CPU and memory were almost fully used, and throughput was about 20 queries per second. A plan of one million orders per day averages 1,000,000 / 86,400 ≈ 11.57 queries per second, and peak load is on the order of 10 queries per second. To leave spare capacity for the operating system, we requested 16 GB of memory.
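A load test like the one described above can be sketched with a thread pool that calls a query function in a loop and reports calls per second, alongside the required average rate. This is a hypothetical harness, not the author's test program: `fake_shortest_path_query` only sleeps and stands in for a real query sent through a py2neo connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ORDERS_PER_DAY = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MEASURED_QPS = 20  # throughput reported in the test above

required_qps = ORDERS_PER_DAY / SECONDS_PER_DAY  # average load, about 11.57 per second


def measure_qps(query, threads=20, duration_s=2.0):
    """Call `query` in a loop from `threads` workers for `duration_s` seconds; return calls/s."""
    deadline = time.monotonic() + duration_s

    def worker():
        calls = 0
        while time.monotonic() < deadline:
            query()
            calls += 1
        return calls

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker) for _ in range(threads)]
        total_calls = sum(f.result() for f in futures)
    return total_calls / (time.monotonic() - start)


def fake_shortest_path_query():
    """Hypothetical stand-in for a real shortest-path query, e.g. via py2neo's Graph.run()."""
    time.sleep(0.01)


if __name__ == "__main__":
    print(f"required average: {required_qps:.2f} q/s")  # required average: 11.57 q/s
    print(f"headroom: {MEASURED_QPS / required_qps:.2f}x")  # headroom: 1.73x
    print(f"simulated: {measure_qps(fake_shortest_path_query, duration_s=1.0):.0f} q/s")
```

With the reported 20 queries per second, the 8-core test machine has roughly 1.7 times the average required throughput, which is why peak load rather than the daily average drives the sizing decision.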
4. Scaling plan
4.1. Read replicas
The main job of read replicas is to scale out graph workloads (such as Cypher queries and procedure calls). A read replica is somewhat like a cache of the data that the core servers safeguard, but it is not a simple key-value cache: it is a fully functional Neo4j database that can run arbitrary read-only graph queries and procedures.
Read replicas replicate data asynchronously from the core servers by shipping transaction logs. A read replica periodically polls a core server (usually at intervals in the millisecond range) for any new transactions committed since its last poll, and the core server then ships those transactions to the replica. Many read replicas can be fed from a relatively small number of core servers, which spreads a large graph-query workload across machines.
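The pull-based replication described above can be modeled in a few lines: the core server keeps an append-only transaction log, and each replica remembers the last transaction it applied and pulls only newer ones. This is a toy model under stated assumptions; the class names and the (key, value) transaction format are hypothetical, and Neo4j's actual catch-up protocol, log format, and configuration differ.

```python
from dataclasses import dataclass, field


@dataclass
class CoreServer:
    """Append-only transaction log; log[i] holds transaction id i + 1 (hypothetical model)."""
    log: list = field(default_factory=list)

    def commit(self, tx):
        self.log.append(tx)
        return len(self.log)  # id of the committed transaction

    def transactions_after(self, tx_id):
        return self.log[tx_id:]


@dataclass
class ReadReplica:
    core: CoreServer
    last_applied: int = 0
    state: dict = field(default_factory=dict)

    def poll(self):
        """Pull every transaction committed since the last poll and apply it in order."""
        new = self.core.transactions_after(self.last_applied)
        for key, value in new:
            self.state[key] = value
        self.last_applied += len(new)
        return len(new)


core = CoreServer()
replicas = [ReadReplica(core) for _ in range(3)]
core.commit(("road:1", {"from": "A", "to": "B"}))
core.commit(("road:2", {"from": "B", "to": "C"}))
stale = dict(replicas[0].state)  # {}: replicas lag until they poll (asynchronous)
pulled = [r.poll() for r in replicas]  # [2, 2, 2]
core.commit(("road:3", {"from": "C", "to": "D"}))
caught_up = replicas[0].poll()  # 1: only the transaction committed since the last poll
```

Because replicas only read the log, adding more of them costs the core servers little extra work, which is what lets a few core servers feed many replicas.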
4.2. Sharding by city
Our queries are highly independent: for example, a ride-hailing route in Wuhan can be computed entirely from Wuhan's data. So when load increases later, data for different cities can be deployed on different machines, with the data needed for intercity carpooling deployed separately. Query load can also be spread by partitioning data by latitude and longitude ranges.
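The routing rule implied above (same-city trips go to that city's database, everything else to a separate intercity database) can be sketched as a bounding-box lookup. The shard names and extents below are rough, approximate values chosen for illustration only, not authoritative city boundaries, and the routing function is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon, lat):
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


# Hypothetical, approximate extents for illustration only.
SHARDS = [
    Shard("wuhan", 113.7, 29.9, 115.1, 31.4),
    Shard("changsha", 111.9, 27.8, 114.3, 28.7),
]
INTERCITY = "intercity"


def route(origin, destination):
    """Pick the database for a trip from (lon, lat) origin and destination."""
    for shard in SHARDS:
        if shard.contains(*origin) and shard.contains(*destination):
            return shard.name
    return INTERCITY  # endpoints in different (or unknown) cities


print(route((114.30, 30.59), (114.40, 30.50)))  # wuhan
print(route((114.30, 30.59), (112.94, 28.23)))  # intercity
```

Real deployments would use proper administrative boundaries rather than rectangles, but the same lookup decides which machine answers each query.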
© 2024 shulou.com SLNews company. All rights reserved.