

What are the principles of distributed databases and the PostgreSQL distributed architecture?


Shulou (Shulou.com) 05/31 report --

What are the principles of distributed databases, and how is a PostgreSQL distributed architecture built? Many newcomers are unsure where to start with these questions, so this article summarizes the underlying ideas and the available solutions. I hope it helps you work through the topic.

I. What is a distributed database

Principles of Distributed Database Systems (Third Edition) gives this description: "We define a distributed database as a collection of multiple, logically interrelated databases distributed over a computer network." A distributed database management system (distributed DBMS) is the software that manages a distributed database and makes the distribution transparent to users. The term distributed database system (DDBS) is sometimes used to refer to both the distributed database and the distributed DBMS.

In that definition, "logically interrelated and distributed over the network" is the essence. Physically, such a group of logically related databases can sit on one or on many physical nodes, though multi-node deployment is the main case. On the one hand this is tied to the improved price-performance ratio of x86 servers; on the other hand, the growth of the Internet has created demand for high concurrency and massive data processing that a single physical server node can no longer satisfy.

Distribution is not confined to databases; it also appears in distributed storage, distributed middleware, and distributed networking. The ultimate goal is to serve changing business requirements better. In a philosophical sense, it is an advance in productivity.

II. The theoretical basis of distributed databases

1. CAP theory

First, the technology of distributed databases builds on, and inherits, the basic characteristics of single-node relational databases, chiefly the ACID properties of transactions, disaster recovery based on transaction logs, and high availability through data redundancy.

Second, distributed database design must follow the CAP theorem: a distributed system cannot simultaneously satisfy all three of consistency (Consistency), availability (Availability), and partition tolerance (Partition tolerance), and partition tolerance cannot be given up, so architects usually make a tradeoff between availability and consistency. The tradeoff is not a total abandonment of one property, but a sacrifice made according to the business situation, often described by the Internet term "degradation".

Regarding the CAP theorem, I consulted the relevant literature. It originates from the formal proof of Brewer's conjecture published by Seth Gilbert and Nancy Lynch of MIT in 2002.

The three features of CAP are described as follows:

Consistency: every node in the distributed cluster returns the same, most recently updated data. Consistency means that every client sees the same view of the data. There are many consistency models; consistency in CAP refers to linearizability (or sequential consistency), which is a form of strong consistency.

Availability: each non-failed node returns responses to all read and write requests within a reasonable amount of time. To be available, each node on both sides of the network partition must be able to respond within a reasonable amount of time.

Partition tolerance: the system continues to operate despite network partitions. Network partitions are a fact of life; a partition-tolerant system can recover properly once the partition is repaired.

The key point of the original paper is that the CAP theorem cannot be reduced to a simple "pick two of the three".

In a distributed database management system, partition tolerance is mandatory: network partitions and dropped messages are facts of life and must be handled appropriately. The system designer must therefore trade off between consistency and availability. Simply put, a network partition forces the designer to choose between perfect consistency and perfect availability. In a given situation, a good distributed system gives the best answer based on how strongly the business requires consistency versus availability; usually the consistency requirement is the higher and the more challenging one.

2. BASE theory

BASE theory developed out of the tradeoffs of the CAP theorem. BASE is an abbreviation of Basically Available, Soft state, and Eventually consistent. Its core idea is that even when strong consistency cannot be achieved, each application can, according to its own business characteristics, adopt a suitable approach to bring the system to eventual consistency.

BA: Basically Available — when the distributed system fails, it is allowed to lose part of its availability, i.e., the availability of core functions is preserved.

S: Soft state — the system is allowed to pass through intermediate states that do not affect its overall availability.

E: Eventually consistent — all copies of the data in the system reach a consistent state after a certain period of time.

BASE theory is essentially an extension of CAP theory and a supplement to AP scheme in CAP.

Here is an additional explanation of what strong consistency is:

Strict consistency (strong consistency), also known as atomic consistency or linearizability, must meet two requirements:

1. Any read returns the most recently written value of the data item.

2. All processes in the system observe operations in the same order, consistent with a global clock.

For relational databases, this means that updated data must be visible to subsequent accesses, which is strong consistency. In short, all nodes hold the same data at any moment.

The final consistency of BASE theory belongs to weak consistency.

Next, we introduce another important concept in distributed databases: the distributed transaction. Having browsed several articles on distributed transactions, I found that the descriptions differ but the meaning is roughly the same. First, a distributed transaction is still a transaction and must satisfy the ACID properties. The key point is that the data touched by a business request is scattered across multiple nodes on the network, so a distributed database system must, while guaranteeing data consistency, distribute the transaction and coordinate the nodes to complete the request.

Whether the nodes can cooperate to complete a transaction normally and smoothly is the crux: it directly determines the consistency of the accessed data and how promptly requests are answered. A sound and effective consistency algorithm is therefore required.

3. Consistency algorithm

At present, the main consistency algorithms include 2PC, 3PC, Paxos, and Raft.

2PC: Two-Phase Commit is a protocol for ensuring data consistency in distributed systems; most relational databases use two-phase commit to complete distributed transaction processing (a minimal PostgreSQL sketch follows the list below).

It mainly includes the following two stages:

Phase 1: submit the transaction request (voting phase) — the coordinator asks every participant to prepare and vote.

Phase 2: execute the transaction commit (execution phase) — the coordinator tells every participant to commit, or to roll back if any vote was negative.

Advantages: simple in principle and easy to implement.

Disadvantages: synchronous blocking, a single point of failure at the coordinator, possible data inconsistency, and an overly conservative protocol.
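As a concrete, minimal sketch: PostgreSQL itself exposes the participant side of two-phase commit through prepared transactions (the server must be configured with max_prepared_transactions > 0, and the table name here is hypothetical). An external coordinator would drive each participant node roughly like this:

-- Phase 1 (voting): do the work, then PREPARE instead of COMMIT;
-- the node persists the transaction state and in effect votes "yes".
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'txn_42';

-- Phase 2 (execution): once every participant has prepared successfully,
-- the coordinator tells each of them to finish.
COMMIT PREPARED 'txn_42';
-- If any participant failed to prepare, the coordinator would instead issue:
-- ROLLBACK PREPARED 'txn_42';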

3PC: Three-Phase Commit consists of three phases: CanCommit, PreCommit, and DoCommit.

Three-phase commit exists to avoid the inconsistency that two-phase commit can produce when a participant crashes while the coordinator is notifying all participants to commit the transaction.

Three-phase commit adds a PreCommit phase on top of two-phase commit. After all participants receive the PreCommit message, they do not act immediately; they wait either for the final commit message or, after a certain period of time, complete the operation on their own.

Advantages: it reduces the range over which participants block, and agreement can still be reached after a single point of failure.

Disadvantages: with the PreCommit phase introduced, if a network partition occurs at that stage and the coordinator cannot communicate with the participants normally, the participants will still commit the transaction after the timeout, resulting in data inconsistency.

The 2PC/3PC protocols guarantee the atomicity of operations that span multiple data fragments.

These fragments may reside on different servers; 2PC/3PC ensures that the operations on all of these servers either all succeed or all fail.

Paxos, Raft, and ZAB are used to ensure data consistency among the multiple replicas of a single data fragment. The three algorithms are summarized below.

The Paxos algorithm mainly solves the single-point problem of a data shard; its purpose is to make the nodes of the whole cluster agree on a change to a certain value. Paxos (strong consistency) is a majority algorithm: any node may propose modifying a value, and the proposal passes only if more than half of the cluster's nodes agree, which is why Paxos clusters are normally deployed with an odd number of nodes.

The Raft algorithm is a simplified version of Paxos. Raft is divided into three sub-problems: Leader Election, Log Replication, and Safety. Raft defines three roles: Leader, Follower, and Candidate. Initially every node is a Follower; when a Follower stops hearing from the Leader, it can become a Candidate and start a vote to elect a new Leader.

There are two basic processes:

① Leader election: each Candidate proposes an election after a randomized timeout, and the node that wins the majority of votes in the latest round becomes the Leader.

② Log synchronization: the Leader determines the most up-to-date log record (the log records the events that occur in the system) and forces all Followers to catch up to it.

The Raft consistency algorithm simplifies the management of log replicas by electing a leader; for example, log entries are only allowed to flow from the leader to the followers. ZAB is essentially the same as Raft.

III. Overview of PostgreSQL distributed architecture

[Figure: PostgreSQL development timeline and branches]

1. Kernel-based distributed solution: Postgres-XL

(1) What is Postgres-XL

Postgres-XL is open-source PG cluster software. XL stands for eXtensible Lattice, i.e., an extensible PG "grid"; it is referred to below as PGXL.

Officially, it is suitable not only for OLTP applications under heavy write pressure but also for big-data applications dominated by reads. Its predecessor is Postgres-XC (PGXC for short), which adds clustering on top of PG and is mainly suited to OLTP workloads. PGXL is an upgraded product based on PGXC that adds features aimed at OLAP applications, such as Massively Parallel Processing (MPP).

In general, the PGXL code base contains the PG code, so installing a PG cluster with PGXL does not require a separate PG installation. One consequence is that you cannot freely choose an arbitrary PG version; fortunately, PGXL follows PG fairly promptly, and the latest release, Postgres-XL 10R1, is based on PG 10.

(2) Technical architecture

[Figure: Postgres-XL architecture]

As the figure shows, multiple Coordinator and Datanode nodes can be configured and placed on different hosts. Only the Coordinator nodes serve applications directly; the Coordinator spreads data storage across multiple Datanodes.

The main components of Postgres-XC are gtm (Global Transaction Manager), gtm_standby, gtm_proxy, Coordinator and Datanode.

The global transaction manager (GTM) is the core component of Postgres-XC; it handles global transaction control and tuple visibility control. GTM is the module that allocates GXIDs and manages PGXC's MVCC, and there can be only one master GTM in a cluster. gtm_standby is the standby for the GTM.

Main functions:

Generates globally unique transaction IDs (GXIDs)

Tracks the status of global transactions

Manages global information such as sequences

gtm_proxy exists to reduce the load on the GTM; it groups the requests submitted by the Coordinator nodes. There can be multiple gtm_proxy instances.

The coordinator node (Coordinator) is the interface between the data nodes (Datanodes) and applications. It receives user requests, generates and executes distributed queries, and sends SQL statements to the appropriate data nodes.

Coordinator nodes do not physically store table data. Table data is stored on the data nodes, distributed either as shards or as replicas. When an application issues SQL, it first reaches a Coordinator node, which dispatches the SQL to the data nodes and then gathers and merges their results. The whole process is governed by GXIDs and global snapshots.

The data node (Datanode) physically stores table data; a table can be stored in either distributed or replicated form. A data node stores only its local data.

Data distribution falls into two categories:

Replicated table: the table is fully copied onto multiple nodes.

Distributed table: rows are spread across nodes, by either Hash or Round robin placement.

Note: Round robin placement is the simplest way to divide the data: each tuple is placed on the next node in turn, cycling through the nodes.
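As a rough illustration of how these distribution choices are expressed in Postgres-XL/XC DDL (the table and column names here are made up for the example), the strategy is chosen with the DISTRIBUTE BY clause:

-- Replicated table: every datanode keeps a full copy (suits small dimension tables).
CREATE TABLE regions (
    region_id   int PRIMARY KEY,
    region_name text
) DISTRIBUTE BY REPLICATION;

-- Distributed table, hash placement: rows are spread across datanodes by a hash of customer_id.
CREATE TABLE orders (
    order_id    bigint,
    customer_id int,
    amount      numeric
) DISTRIBUTE BY HASH (customer_id);

-- Distributed table, round-robin placement: each new row goes to the next datanode in turn.
CREATE TABLE event_log (
    event_time timestamptz,
    payload    text
) DISTRIBUTE BY ROUNDROBIN;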

2. Extension-based distributed solution: Citus

(1) What is Citus

Citus is an open-source distributed database based on PostgreSQL. It automatically inherits PostgreSQL's powerful SQL support and application ecosystem (not only client protocol compatibility, but full compatibility with server-side extensions and management tools). Citus is not a fork of PostgreSQL; it adopts a shared-nothing architecture in which nodes share no data, and a database cluster is formed from a coordinator node and Worker nodes. It focuses on being a high-performance HTAP distributed database.

Compared with stand-alone PostgreSQL, Citus can use more CPU cores and more memory and can store more data. By adding nodes to the cluster, you can easily scale out the database.

The biggest difference between Citus and other PostgreSQL-based distributed solutions, such as Greenplum and PostgreSQL-XL, is that Citus is a PostgreSQL extension rather than a separate code branch. Citus can therefore follow the evolution of PostgreSQL versions at little cost and with greater speed, while maximizing the stability and compatibility of the database.

Citus supports the features of new PostgreSQL versions and maintains compatibility with existing tools. It uses sharding and replication to scale PostgreSQL out across multiple machines, and its query engine parallelizes SQL across those servers to achieve real-time (sub-second) responses on large datasets.

Citus is currently available in the following editions:

Citus Community Edition

Citus Business Edition

Cloud [AWS,citus cloud]

(2) Technical architecture

A Citus cluster consists of one central coordinator node (CN) and several worker nodes (Worker).

The CN stores only metadata about data distribution; the actual table data is split into M shards and scattered across N Workers. Such a table is called a sharded table, and each shard of a sharded table can have multiple copies for high availability and load balancing.
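As a rough sketch of how this looks in practice (the table is hypothetical), the number of shards and the number of copies per shard are controlled by Citus settings that take effect when the table is distributed:

-- M: how many shards a new distributed table is split into.
SET citus.shard_count = 32;
-- Copies kept per shard; values above 1 use Citus' statement-based shard replication.
SET citus.shard_replication_factor = 2;

-- Hypothetical table, sharded by customer_id across the Worker nodes.
CREATE TABLE orders (
    order_id     bigint,
    customer_id  int,
    country_code char(2),
    amount       numeric
);
SELECT create_distributed_table('orders', 'customer_id');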

[Figure: Citus architecture]

The official Citus documentation recommends using PostgreSQL's native streaming replication for HA; HA based on multiple shard copies may only be suitable for append-only shards.

The application sends queries to the coordinator node, which processes them and forwards them to the Worker nodes. Depending on whether the required data sits on a single node or on multiple nodes, each query is either routed to a single Worker or executed in parallel across Workers. Citus MX mode allows direct access to the Worker nodes for faster reads and writes.
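Under the assumptions of the sketch above (an orders table distributed by customer_id), the difference between the two execution paths looks roughly like this:

-- Router query: the filter pins it to a single shard, so the coordinator
-- sends it to exactly one Worker.
SELECT count(*) FROM orders WHERE customer_id = 42;

-- Multi-shard (parallel) query: no filter on the distribution column,
-- so it fans out to all Workers and the coordinator merges the partial results.
SELECT country_code, sum(amount) FROM orders GROUP BY country_code;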

[Figure: Citus architecture (2)]

Citus has three types of tables:

Sharded table (most commonly used)

Reference table

Local table

Sharded tables mainly solve the problem of horizontally scaling large tables. For dimension tables that are not very large but frequently need to join with sharded tables, a special sharding strategy can be used: the table is divided into a single shard, and one copy of that shard is deployed on every Worker. Such tables are called "reference tables".

Besides sharded tables and reference tables, there are native PostgreSQL tables that have not been sharded, called "local tables". Local tables suit special scenarios such as highly concurrent queries on small tables.
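Continuing the hypothetical orders example, a reference table is created with create_reference_table(), while a local table is simply an ordinary table on the coordinator that is never distributed:

-- Reference table: a single shard fully copied to every Worker,
-- so joins against sharded tables stay local to each Worker.
CREATE TABLE countries (country_code char(2) PRIMARY KEY, country_name text);
SELECT create_reference_table('countries');

-- Join between the sharded orders table and the reference table,
-- executed on the Workers without moving data between nodes.
SELECT c.country_name, sum(o.amount)
FROM orders o JOIN countries c ON c.country_code = o.country_code
GROUP BY c.country_name;

-- Local table: a plain PostgreSQL table that lives only on the coordinator.
CREATE TABLE app_settings (key text PRIMARY KEY, value text);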

Client applications interact only with the CN node when accessing data. After receiving a SQL request, the CN generates a distributed execution plan, sends each subtask to the corresponding Worker nodes, collects the Workers' results, processes them, and returns the final result to the client.

After reading the above, have you grasped the principles of distributed databases and the PostgreSQL distributed architecture? If you want to learn more skills or find out more, you are welcome to follow the industry information channel. Thank you for reading!
