Advantages of Neo4j application in GIS system

2025-04-24 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article discusses the advantages of applying the graph database Neo4j in GIS systems. Without further preamble, let's get straight to the topic.

1. Overview

1.1. Brief introduction to graph databases

A graph database is a type of NoSQL database based on graph theory. Both its data storage structure and its query method are based on graph theory. The basic elements of a graph in graph theory are nodes and edges, which correspond to nodes and relationships in a graph database.

In a graph database, data items and the connections between them form a graph structure of nodes and relationships, and all database capabilities are implemented on this structure: create, read, update, and delete (CRUD) operations on graph data objects, as well as transaction handling and high availability.
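The node/relationship model and its CRUD operations can be sketched with a toy in-memory structure. This is only an illustration of the data model, not how Neo4j stores data; the class names (`TinyGraph`, `Node`, `Relationship`) and the sample values are made up for this example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    id: int
    labels: set
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    id: int
    type: str
    start: int  # id of the start node
    end: int    # id of the end node
    props: dict = field(default_factory=dict)


class TinyGraph:
    """Toy in-memory property graph; illustrative only, not Neo4j's storage."""

    def __init__(self):
        self.nodes = {}
        self.rels = {}
        self._ids = count(1)

    def create_node(self, *labels, **props):
        node = Node(next(self._ids), set(labels), dict(props))
        self.nodes[node.id] = node
        return node

    def create_rel(self, start, rel_type, end, **props):
        rel = Relationship(next(self._ids), rel_type, start.id, end.id, dict(props))
        self.rels[rel.id] = rel
        return rel

    def neighbors(self, node, rel_type=None):
        """Read: nodes reachable over one outgoing relationship."""
        return [self.nodes[r.end] for r in self.rels.values()
                if r.start == node.id and (rel_type is None or r.type == rel_type)]

    def update_node(self, node, **props):
        node.props.update(props)

    def delete_node(self, node):
        """Delete a node together with every relationship touching it."""
        self.rels = {i: r for i, r in self.rels.items()
                     if node.id not in (r.start, r.end)}
        del self.nodes[node.id]


# Made-up sample data.
g = TinyGraph()
wuhan = g.create_node("City", name="Wuhan")
road = g.create_node("Road", name="Example Road")
g.create_rel(road, "LOCATED_IN", wuhan)
g.update_node(road, lanes=6)
```

Deleting a node also removes its relationships here, so the graph never holds a relationship with a missing endpoint.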

1.2. Application case

From the perspective of systems science, the world is made up of various systems, and each system is made up of its components and the connections between them. At this level, a system and its relations can be mapped directly onto the nodes and relationships of mathematical graph theory, so graph theory can be used to model the world intuitively. Because graph database technology is based on graph theory, it can be seen as a basic and universal "language" for expressing the world. A model described in this "language" has high fidelity to the original system: it matches how people usually think about the system, and it is intuitive, natural, direct and efficient. No intermediate transformation is needed, a step that often complicates the problem or loses valuable information. Because graph database technology can directly describe a wide variety of complex real-world systems, it is widely applicable and has high application value.

Neo4j has in fact won a large number of customers, and the number of customers and applications is still growing. These customers include internationally renowned companies and institutions such as Cisco, Hewlett-Packard, Walmart, LinkedIn, Adidas, and the Financial Times. Neo4j customers are currently concentrated in social networking, human resources and recruitment, finance, insurance, retail, advertising, e-commerce, logistics, transportation, IT, telecommunications, manufacturing, printing, cultural media, and medical care. Before adopting a graph database, many of these customers repeatedly complained about shortcomings in their existing products that a new product had to solve, commonly known as pain points. The customers and their main pain points are listed in the following table:

| No. | Enterprise | Pain point or challenge |
|-----|------------|-------------------------|
| 1 | MigRaven | Authorization and access control |
| 2 | Adidas | The data needed to provide a personalized experience is scattered across isolated information silos |
| 3 | BILLES | Increase online customers: must be able to handle a large number of small print orders; many acquisitions have left the IT system a patchwork |
| 4 | Cerved | Improve computational efficiency and quickly identify the people who directly or indirectly control a company; gain access to leading technology for big-data network analysis |
| 5 | Die Bayerische | Outdated management systems and differing data formats; need to create a standardized data framework |
| 6 | ICIJ | Help reporters make sense of the complex Swiss Leaks data to produce better investigative news |
| 7 | IRCC | Relational databases do not provide enough flexibility for multiple virtual functions |
| 8 | LinkedIn China | Launch the social networking platform as quickly as possible while leaving room for significant user and feature growth |
| 9 | Musimap | Map all music titles, each with 55 weighted description criteria, to allow in-depth processing and real-time recommendations |
| 10 | Qualia | The original product is only optimized to track user behavior on one device |
| 11 | Schleich GmbH | Greater scalability and flexibility are needed in the product data network |
| 12 | Transparency-One | Manage and search a large amount of data without performance problems |
| 13 | Wanderu | Help consumers find and book inter-city buses and trains when traveling in the United States |
| 14 | Wine Data System | No available reference resources; a large amount of information; questions of access convenience and user flexibility |
| 15 | Wobi | Quickly analyze a large amount of complete customer information |
| 16 | eBay | Support large-scale complex routing queries with fast and consistent performance |
| 17 | Global 500 Logistics | Geolocation routing information is generated constantly, and the business depends on this location information and its complex relationships, which seriously challenges traditional relational databases |
| 18 | Glowbl | Bring many possible networks together, show all contacts in the form of a graph, and manage these contacts and their interactions in real time |
| 19 | InfoJobs | Set up a new portal to simulate the potential career paths of job seekers |
| 20 | Megree | Provide an overall view of relationships and the strength of these links |
| 21 | Pitney Bowes | Gain 360-degree customer insight and competitive advantage by building next-generation tools |
| 22 | Walmart | Provide customers with the best online shopping experience |
| 23 | Telenor | Manage customers' organizational structures and agreements behind the online self-service management portal |

2. Applications and advantages of graph databases

2.1. Advantages of graph databases

Neo4j was originally designed to better describe the relationships between entities. In real life, every entity is closely connected to the entities around it, and these relationships contain a great deal of potential information. Traditional relational databases, however, focus on describing the internal attributes of entities, and relationships between entities are mainly implemented with foreign keys. Querying the relationships of an entity therefore requires join operations; deep relationship queries in particular require many joins, and joins are usually very time-consuming. As the amount of relationship data in the real world grows rapidly, relational databases increasingly struggle with the computational complexity of the many table operations that deep relationship queries over massive data require. Neo4j emerged under these circumstances.

2.1.1. Index-free adjacency

An important feature of Neo4j that keeps relationship queries fast is index-free adjacency: each node in the database maintains direct references to its neighboring nodes. Each node therefore acts as a micro-index of its neighbors, which is much cheaper than using a global index. As a result, query time does not depend on the overall size of the graph and is proportional only to the number of nodes actually traversed. In a relational database, by contrast, global indexes are used to connect rows, and each index lookup adds an intermediate step to every traversal, which makes deep traversals computationally expensive. Index-free adjacency gives the graph database fast and efficient graph traversal. The following figure shows the difference between a relational database and Neo4j in finding relationships:
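As a rough illustration of the idea (not a reproduction of the figure, and not Neo4j's internals), the sketch below answers the same two-hop question in two ways: by following direct neighbor references, and by going through a lookup structure built over a flat edge table. The names and sample data are made up.

```python
from collections import defaultdict


class Node:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbouring Node objects


def friends_of_friends_adjacent(start):
    """Two-hop traversal by following references; touches only nearby nodes."""
    return {fof.name for friend in start.out for fof in friend.out} - {start.name}


def friends_of_friends_indexed(start_name, edge_rows):
    """Two-hop traversal over a flat edge table, join-style.

    For simplicity the index is rebuilt on every call by scanning all rows.
    A real relational database keeps its indexes persistent, but each hop
    still goes through an index lookup instead of a direct reference.
    """
    index = defaultdict(list)
    for src, dst in edge_rows:  # scans the whole table
        index[src].append(dst)
    return {fof for friend in index[start_name] for fof in index[friend]} - {start_name}


# Made-up sample data.
edges = [("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("eve", "ann")]
nodes = {name: Node(name) for pair in edges for name in pair}
for src, dst in edges:
    nodes[src].out.append(nodes[dst])

result_adjacent = friends_of_friends_adjacent(nodes["ann"])
result_indexed = friends_of_friends_indexed("ann", edges)
```

Both functions return the same set; the difference is that the first only ever touches `ann`, her neighbors, and their neighbors, regardless of how many other nodes exist.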

2.2. Risk control

1. Detecting order-brushing loops formed by a group of two or more people

2. Monitor the whole life cycle of a taxi-hailing order

At present, the data of a single order is spread across multiple tables. As the business develops and the volume of order data grows, the database faces more and more join operations:

In the data decision-making system, the longest SQL statement reaches 400 lines and joins seven tables. Using Neo4j's Cypher query language can greatly reduce query complexity.
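To make the first risk-control item more concrete, a brushing loop can be viewed as a directed cycle in a graph of accounts. The sketch below is a plain depth-first search in Python rather than a Cypher query, and the edge meaning (one account placed an order served by another) and the account names are assumptions made for illustration.

```python
def find_loop(edges):
    """Return one directed cycle as a list of nodes, or None if there is none."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {n: WHITE for n in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge: nxt is on the current path
                return path[path.index(nxt):]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        colour[node] = BLACK
        path.pop()
        return None

    for start in graph:
        if colour[start] == WHITE:
            loop = visit(start)
            if loop:
                return loop
    return None


# Hypothetical data: (x, y) means account x placed an order served by account y.
orders = [
    ("acct_a", "acct_b"),
    ("acct_b", "acct_c"),
    ("acct_c", "acct_a"),
    ("acct_d", "acct_b"),
]
suspicious = find_loop(orders)
```

The recursion is fine for small examples; a production check over millions of orders would need an iterative traversal or a database-side query.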

2.3. Geographic information system

1. Construction of road network system

The latitude and longitude data of points and road segments can be imported into Neo4j, and the shortest navigation path can then be queried with its built-in shortest path function. Existing trajectory data can also be corrected against this road network.
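The kind of computation involved can be sketched outside the database. The example below is a standalone Dijkstra search over latitude/longitude points weighted by great-circle distance; it is not Neo4j's built-in function, and the point names and coordinates are made up.

```python
import heapq
import math


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def shortest_path(points, segments, source, target):
    """Dijkstra over an undirected road graph weighted by segment length."""
    graph = {p: [] for p in points}
    for u, v in segments:
        d = haversine_m(points[u], points[v])
        graph[u].append((v, d))
        graph[v].append((u, d))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Made-up intersections as (lat, lon) and the road segments joining them.
points = {"A": (30.50, 114.30), "B": (30.50, 114.31),
          "C": (30.50, 114.32), "D": (30.52, 114.31)}
segments = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
path, length_m = shortest_path(points, segments, "A", "C")
```

Storing the segment length as a relationship property at import time avoids recomputing distances on every query.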

2. Neo4j Spatial library

Neo4j Spatial is a library that enables Neo4j to perform full spatial operations. It supports importing ESRI Shapefile files and OSM data, supports most geometry types such as points, lines, and polygons, and can perform topological operations on spatio-temporal data, such as containment, covering, and intersection. In addition to its own R-Tree index based on spatial structure, Neo4j Spatial can flexibly support other indexes: as long as data can be mapped to geometric shapes, Neo4j Spatial can process it. These characteristics make Neo4j efficient and widely applicable in the analysis and processing of spatio-temporal data.
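To show what a topological operation such as containment involves, here is a small standalone sketch: a bounding-box check as a cheap pre-filter (the role a spatial index plays), followed by an exact ray-casting test. It does not use the Neo4j Spatial API, and the function names and the L-shaped polygon are invented for the example.

```python
def bbox(polygon):
    """Axis-aligned bounding box of a polygon given as a list of (x, y) vertices."""
    xs, ys = zip(*polygon)
    return min(xs), min(ys), max(xs), max(ys)


def bbox_contains(box, point):
    min_x, min_y, max_x, max_y = box
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def polygon_contains(polygon, point):
    """Ray casting: count the edges crossed by a ray going right from the point."""
    if not bbox_contains(bbox(polygon), point):
        return False  # cheap rejection before the exact test
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # edge spans the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# A made-up L-shaped area; (3, 3) lies in the notch, inside the bounding box only.
area = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
in_arm = polygon_contains(area, (1, 3))
in_notch = polygon_contains(area, (3, 3))
outside_box = polygon_contains(area, (5, 1))
```

Points exactly on an edge are not handled specially in this sketch.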

3. Configuration requirements

3.1. Data size

The directory of the raw data (Wuhan road network) is as follows:

[Screenshot: total file size]

After cleaning and aggregation, the data is re-imported into the graph database, which shows the total number of nodes:

[Screenshot: current number of database nodes]

In total, the Wuhan data contains about 2 million latitude/longitude points and about 4 million relationships between points. All the saved data files take up about 4.5 GB of disk space. Once attribute data such as the distance and angle between points is added later, the Wuhan road network data takes up about 10 GB of disk in total.

By city size, the cities comparable to Wuhan (first-tier and new first-tier cities) are:

Beijing, Shanghai, Guangzhou, Shenzhen, Chengdu, Hangzhou, Wuhan, Chongqing, Nanjing, Tianjin, Suzhou, Xi'an, Changsha, Shenyang, Qingdao, Zhengzhou, Dalian, Dongguan, Ningbo. Their road network data is conservatively estimated to occupy 200 GB of disk space. The road network data of second-tier cities is conservatively estimated at 100 GB. Inter-city data is about 100 GB (estimated from the proportion of POIs), so a 500 GB disk is requested for the machine.
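The sizing arithmetic can be written out as follows. The link between the 200 GB figure and "19 cities at about 10 GB each" is an inference from the numbers in the text, not something the text states.

```python
# Figures quoted in the text, in GB.
wuhan_with_attributes_gb = 10   # Wuhan road network including distance/angle attributes
first_tier_cities = 19          # cities listed in the text, including Wuhan
second_tier_total_gb = 100      # conservative estimate from the text
inter_city_gb = 100             # estimated from the POI proportion

# Assumption: each listed city is roughly the size of Wuhan.
first_tier_total_gb = first_tier_cities * wuhan_with_attributes_gb  # 190
first_tier_budget_gb = 200      # figure used in the text

required_gb = first_tier_budget_gb + second_tier_total_gb + inter_city_gb
requested_gb = 500
headroom_gb = requested_gb - required_gb

print(f"estimated: {required_gb} GB, requested: {requested_gb} GB, "
      f"headroom: {headroom_gb} GB")
```

The estimate comes to 400 GB, so a 500 GB disk leaves about 100 GB for logs, indexes, and growth.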

3.2. CPU, memory requirements

The graph database was accessed through py2neo, with 20 threads simulated to continuously compute shortest paths. The program, machine configuration, and resource usage are as follows:

[Screenshots: program; machine configuration; CPU status; memory status]

The test machine has an 8-core processor and 8 GB of memory. CPU and memory are almost fully used at an access rate of about 20 queries per second. A plan of one million orders per day averages 11.57 orders per second (1,000,000 / 86,400), so the load at peak times is on the order of 10 per second. To leave some headroom for the operating system, 16 GB of memory is requested.
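The rate calculation and the shape of such a load test can be sketched with the standard library alone. The query function below is a placeholder that sleeps instead of calling the database, so the measured number says nothing about Neo4j; `fake_shortest_path` and `measure_throughput` are hypothetical names, and in a real test the placeholder would be replaced by the actual py2neo call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ORDERS_PER_DAY = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60

average_rate = ORDERS_PER_DAY / SECONDS_PER_DAY  # about 11.57 orders per second


def fake_shortest_path(_request_id):
    """Placeholder for the real graph query; sleeps instead of hitting a database."""
    time.sleep(0.01)
    return True


def measure_throughput(query, workers=20, total_requests=200):
    """Run total_requests calls on a thread pool; return (completed, calls per second)."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        completed = sum(1 for ok in pool.map(query, range(total_requests)) if ok)
    elapsed = time.perf_counter() - started
    return completed, completed / elapsed


completed, per_second = measure_throughput(fake_shortest_path)
print(f"average demand: {average_rate:.2f}/s, measured with stub: {per_second:.0f}/s")
```

Comparing the measured rate with `average_rate` shows how much margin a given machine has over the planned load.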

4. Scaling scheme

4.1. Read-only replicas

The primary responsibility of a read-only replica is to scale out graph workloads (such as Cypher queries and procedure execution). Read-only replicas act like caches for the data that the core servers safeguard, but they are not simple key-value caches. A read-only replica is in fact a fully functional Neo4j database that can run arbitrary (read-only) graph queries and procedures.

Read-only replicas replicate data asynchronously from the core servers through the transaction log. A read-only replica polls a core server periodically (usually at intervals of milliseconds) for any new transactions processed since the last poll, and the core server then sends these new transactions to the replica. A large number of read-only replicas can replicate data from a relatively small number of core servers, so that a large graph query workload can be spread across them.
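The polling scheme described above can be modeled in a few lines. This is a conceptual toy, not Neo4j's replication protocol: the classes, the key/value payloads, and the 1-based transaction ids are all assumptions made for the sketch.

```python
class CoreServer:
    """Holds the authoritative, ordered transaction log."""

    def __init__(self):
        self.log = []  # list of (tx_id, payload); tx ids start at 1

    def commit(self, payload):
        tx_id = len(self.log) + 1
        self.log.append((tx_id, payload))
        return tx_id

    def transactions_since(self, last_tx_id):
        """Everything committed after last_tx_id (ids are contiguous)."""
        return self.log[last_tx_id:]


class ReadReplica:
    """Applies transactions pulled from a core server; serves reads only."""

    def __init__(self, core):
        self.core = core
        self.last_tx_id = 0
        self.state = {}

    def poll(self):
        """One polling round: fetch and apply anything newer than last_tx_id."""
        new = self.core.transactions_since(self.last_tx_id)
        for tx_id, (key, value) in new:
            self.state[key] = value
            self.last_tx_id = tx_id
        return len(new)

    def read(self, key):
        return self.state.get(key)


core = CoreServer()
replica = ReadReplica(core)
core.commit(("road:1", "open"))
core.commit(("road:2", "closed"))

stale = replica.read("road:1")   # replication is asynchronous: not visible yet
applied = replica.poll()         # catch up from the transaction log
fresh = replica.read("road:1")
```

Because replication is asynchronous, a read that arrives between a commit and the next poll sees the older state, which is why the polling interval matters.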

4.2. Sharding by city

Our queries are highly independent of one another: a ride-hailing route in Wuhan, for example, can be computed entirely within the Wuhan data. When the load increases later, the data of different cities can therefore be deployed on different machines, and the data needed for inter-city carpooling can be deployed separately. Query load can also be spread by partitioning data into different longitude and latitude ranges.
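A routing layer for this kind of partitioning might look like the sketch below. The shard names, connection addresses, and bounding boxes are rough, made-up values; a real deployment would use proper city boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str
    uri: str        # hypothetical connection address
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def covers(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Bounding boxes and addresses are made-up values for illustration.
SHARDS = [
    Shard("wuhan", "bolt://graph-wuhan:7687", 29.9, 31.4, 113.7, 115.1),
    Shard("beijing", "bolt://graph-beijing:7687", 39.4, 41.1, 115.4, 117.5),
]
INTER_CITY = Shard("inter-city", "bolt://graph-intercity:7687", -90, 90, -180, 180)


def route(origin, destination):
    """Pick the shard for a trip given (lat, lon) endpoints."""
    for shard in SHARDS:
        if shard.covers(*origin) and shard.covers(*destination):
            return shard
    return INTER_CITY  # endpoints in different cities, or outside every box


local_trip = route((30.59, 114.30), (30.50, 114.40))
long_trip = route((30.59, 114.30), (39.90, 116.40))
```

A trip is sent to a city shard only when both endpoints fall inside that city's range; everything else goes to the separately deployed inter-city data.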
