2025-02-24 Update From: SLTechnology News&Howtos > Database
Shulou (Shulou.com) 06/01 Report
In DB-Engines' 2017 annual ranking, PostgreSQL was named DBMS of the Year, having gained more popularity over the year than any of the other 341 monitored database management systems. Its overall ranking also surpassed MongoDB, placing it fourth in popularity.
PostgreSQL has long been a dark horse in the database field, living quietly in the shadow of MySQL. This year, with the release of version 10.0 and its introduction of declarative partitioning, improved query parallelism, logical replication, and quorum-commit synchronous replication, PostgreSQL's influence is growing.
Today we are honored to invite Mr. Zhu Xianwen, a PostgreSQL expert, to share with us the core technology, current state of development, and future direction of PostgreSQL.
Meet the Future
An Exclusive Interview on PostgreSQL, a King of Kings on the Database Stage
I am Zhu Xianwen, general manager and founder of Chengdu Wenwu Information Technology Co., Ltd. I have been in the IT industry for nearly 20 years and am mainly familiar with the underlying technologies of IT infrastructure, such as databases, storage, and clustering. Before founding the company, I worked for Oracle, Veritas, and IBM, where my R&D experience centered on Oracle RAC, storage, and clusters, all low-level technologies.
We are an entrepreneurial team of fewer than 20 people at this stage. We focus on PostgreSQL business solutions, products, and technical services, covering areas such as clustering, disaster recovery, backup, and consulting.
We have also developed a high-performance private cloud system for databases that runs PostgreSQL and Oracle efficiently and reliably.
Q2: As a senior expert in the PostgreSQL field, could you briefly introduce the development of PostgreSQL technology?

I really dare not call myself an expert; I am just familiar with PostgreSQL.
PostgreSQL is a very advanced open-source database with many advanced features and rich enterprise functionality, and it has many successful cases in finance, banking, telecommunications, manufacturing, and other industries.
The History of PostgreSQL
PostgreSQL's predecessor was a joint research project between the US Department of Defense and UC Berkeley called Ingres, which originated in 1973. That project was terminated in 1985; its successor was open-sourced under the name Postgres and later renamed Postgres95. In 1996, because full SQL92 support had been added, the name was changed to PostgreSQL to emphasize its SQL support, and it has kept that name ever since. So far it has had more than 32 years of continuously active development; counting the Ingres period, the project is actually close to 45 years of continuous development.
The development of PostgreSQL has gone through several important versions
Since version 8.0, many enterprise features have been added step by step, including logging, table partitioning, physical synchronous and asynchronous replication, logical replication, online hot backup, and parallel queries.
At present, the latest version is 10.1, which improves table partitioning and hash indexes.
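As a sketch of the declarative partitioning added in version 10 (the table and column names here are illustrative, not from the interview), a range-partitioned table can now be declared natively, without the triggers and rules the old inheritance approach required:

```sql
-- parent table declares the partitioning method and key
CREATE TABLE measurements (
    city_id  int  NOT NULL,
    logdate  date NOT NULL,
    peaktemp int
) PARTITION BY RANGE (logdate);

-- child partitions each cover a range of the key; rows are routed automatically
CREATE TABLE measurements_2017 PARTITION OF measurements
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
CREATE TABLE measurements_2018 PARTITION OF measurements
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
```

An INSERT into `measurements` lands in the matching partition, and queries that filter on `logdate` can skip irrelevant partitions entirely.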
Characteristics of PostgreSQL
PostgreSQL is highly cross-platform, supporting almost all operating systems and CPU hardware platforms, such as AIX, HP-UX, Linux, BSD, and Windows.
PostgreSQL's development is community-driven, and its many advanced features come mainly from user feedback and requirements. Community members come from commercial companies, universities, and research institutions around the world; the development and release process is very rigorous, and the code quality is very high. Many Chinese companies develop their own commercial products based on PostgreSQL.
It also has other notable features, such as very rich data types, rich development interfaces and programming-language support, rich index types, and many enterprise-grade advanced features, all of which can meet the requirements of most enterprise applications.
The Growth of PostgreSQL
PostgreSQL is supported the same way commercial databases are: since version 6.3, the community has supported every release for five years, a tradition that has continued for 20 years, since 1998.
From the perspective of domestic use, PostgreSQL's influence is growing stronger, and more and more professional users run PostgreSQL in their business systems, including Ping An of China, China Mobile, China Unicom, and Internet companies such as Qunar, Tencent, and Alibaba.
Its ecosystem and support are also maturing: professional companies, including Huawei, Tencent, Alibaba, and our own Chengdu Wenwu Information, now provide commercial support and services and develop their own high-performance databases based on it.
Q3: What new features and technical points in the PostgreSQL 10 release do you care about most? What do you think is the most important change?
PostgreSQL 10 has many new features; only some are listed below. For details, see the official wiki: https://wiki.postgresql.org/wiki/New_in_postgres_10.
Big data processing: native partitioning, parallel execution, FDW push-down, faster queries
Replication and horizontal scaling: logical replication, quorum commit for synchronous replication (Raft-like), temporary replication slots, connection-layer failover and routing, enhanced physical replication
System administration: pg_receivewal supports compression, pg_stat_activity shows background processes, etc.
SQL features: identity columns; crash-safe, replicable hash indexes; transition tables for triggers
XML and JSON: XMLTable, full-text search support for JSON and JSONB
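A few of the SQL features above can be sketched together. The table, column, and function names below are hypothetical, and log_changes() stands in for any statement-level trigger function the user would write:

```sql
-- identity columns (SQL-standard replacement for serial)
CREATE TABLE accounts (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    account_no text NOT NULL
);

-- hash indexes are now WAL-logged, so they are crash safe and replicable
CREATE INDEX ON accounts USING hash (account_no);

-- transition tables: a statement-level trigger can see all changed rows at once
CREATE TRIGGER audit_upd
    AFTER UPDATE ON accounts
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE PROCEDURE log_changes();
```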
Q4: What are the best application scenarios for PostgreSQL? What are some successful cases and practices? What is the current market demand?

Personally, I think PostgreSQL is best suited for enterprise OLTP applications that demand very high performance, reliability, and business continuity, as well as small-scale OLAP applications, such as OLAP systems with less than 50 TB of data.
There are already many reference cases in China. Ping An Group has more than 1,500 instance deployments; Leyou, a maternal and baby supplies chain, supports its nationwide business on a core PostgreSQL online database of nearly 10 TB; and companies such as Qunar and Baidu Maps also have large PostgreSQL deployments that support their business systems efficiently and reliably. There are also traditional-industry users such as Zhejiang Mobile, Hubei Mobile, and China Unicom.
As far as I know, market demand for PostgreSQL has been growing rapidly, mainly in two areas:
On the one hand, for newly built OLTP systems that require high reliability and business continuity, more and more users regard PostgreSQL as the preferred database.
On the other hand, many small OLAP business systems with less than 50 TB of data now give priority to PostgreSQL as the analysis engine.
This demand has been particularly evident in the last year or two.
Q5: Could you briefly introduce the high-availability architectures of PostgreSQL in Internet scenarios?
The first is basically the same as the high-availability architecture of other databases: shared storage. The database lives on shared storage; normally the primary host connects to the storage and starts the database to provide service, and when the host fails, the standby machine takes over the storage and starts the database to continue serving. The advantage of this architecture is that the data is protected by dedicated storage, so there is no need to worry about losing it. The failover time is decided by the cluster management software; generally the switchover can be completed without trouble.
The second is a high-availability architecture based on streaming replication, which has gone through several stages of development.
(1) The first stage was based on shipping PostgreSQL WAL log files, which is rarely used now. The general principle: the cluster has a primary and a standby; whenever a WAL segment is archived on the primary, the file is also copied to the standby. The standby stays in recovery mode, receiving the WAL files copied from the primary and replaying them immediately. When the primary goes down, the standby switches mode, finishes recovery, and serves as the new primary.
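A minimal sketch of this log-shipping setup, with hypothetical archive paths (on PostgreSQL versions of this era the standby settings live in recovery.conf):

```
# primary postgresql.conf: archive each completed WAL segment
# to a location the standby can read
archive_mode    = on
archive_command = 'cp %p /archive/%f'

# standby recovery.conf: stay in recovery and replay archived segments
standby_mode    = 'on'
restore_command = 'cp /archive/%f %p'
```

Removing recovery.conf (or promoting the standby) ends recovery and opens the database for read-write service.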
(2) The second stage is physical streaming replication: while the primary operates normally, every committed transaction is not only written to the local WAL file but also sent to the standby over the network. Depending on when the transfer is confirmed, replication can be asynchronous or synchronous. The choice affects SLA metrics such as RTO and RPO, and it also affects system performance.
(3) The current stage offers rich control over physical streaming replication. Earlier replication modes gave few ways to control synchronous replication; now we can control not only how many synchronous standbys the cluster has, but also how commit confirmation works: how many synchronous standbys must confirm a commit and in what way, with fine-grained options such as FIRST n and ANY n. At the same time, the performance trade-off can be tuned at a fine granularity, for example requiring the standby to confirm that it has received the data into its buffers, written it to WAL, or replayed it.
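These fine-grained controls map to two configuration parameters; the standby names below (s1, s2, s3) are hypothetical:

```
# quorum commit: a commit is confirmed once any 2 of the 3 named
# standbys have acknowledged it (FIRST 2 (...) would instead pick
# the two highest-priority standbys)
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

# how far the standby must get before acknowledging:
#   on           = WAL flushed to disk on the standby
#   remote_write = WAL written but not yet flushed
#   remote_apply = WAL replayed, so the data is visible to queries
synchronous_commit = on
```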
The third is logical replication. Logical replication has many advantages: it works across platforms and operating systems; you can replicate only selected tables rather than the whole cluster, for example to synchronize data into an OLAP analysis system; and it can be used to upgrade business systems without downtime.
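A minimal logical-replication sketch in PostgreSQL 10 syntax, with hypothetical table, host, and user names (the publisher must also run with wal_level = logical):

```sql
-- on the publisher: replicate only the orders table
CREATE PUBLICATION analytics_pub FOR TABLE orders;

-- on the subscriber: connect back to the publisher and start applying changes
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=primary.example dbname=shop user=repl'
    PUBLICATION analytics_pub;
```

Because changes are applied as logical row operations rather than physical WAL blocks, the subscriber can run a different major version, which is what makes near-zero-downtime upgrades possible.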
One more point: PostgreSQL has a rich set of high-availability solutions, including open-source ones such as pgpool and commercial ones such as our company's ECOX system. For a rigorous production system, it is best for customers to purchase professional services when designing and selecting a high-availability solution. We are a capable domestic service team and can provide complete solutions and related technologies.
Q6: Could you introduce the more mature and popular storage engines in PostgreSQL and their usage scenarios?
Unlike MySQL, PostgreSQL does not have multiple storage engines. It has a single storage engine that takes transaction processing very seriously; it is mainly used in high-performance OLTP business scenarios and can also serve small OLAP analytical workloads.
Q7: What is the biggest challenge PostgreSQL faces on the road to automated operation and maintenance, and how can it be overcome?
Unlike the Oracle databases we commonly use, a PostgreSQL database, if its parameters are set properly and the application is well designed, does not need much maintenance.
With PostgreSQL, we instead need to focus on the reliability of the storage subsystem, backups, and so on.
The storage subsystem needs careful design, because it affects not only system performance but also the reliability of the data itself. For rigorous commercial applications, we recommend prioritizing a reliable storage system and file system. As a professional vendor with rich implementation experience, we recommend that users choose ZFS first, especially native ZFS. We have a complete solution in this area.
Q8: What are the advantages of PostgreSQL over other open-source databases?
Compared with other databases, PostgreSQL's advantages are obvious. For example:
There is a large pool of potential developers and operators and a complete ecosystem. Because the Oracle ecosystem is very mature, the learning curve for people with Oracle skills moving to PostgreSQL is very gentle and the cost very low; in my own experience, it can be done in two weeks.
There are a large number of key business application cases and well-known customers in banking, telecommunications, insurance, government and other industries.
Rich development interfaces and programming-language support, rich data types, and support for both relational and non-relational data, with very good GIS support and very good JSON, JSONB, and XMLTable support.
Very rich FDW (foreign data wrapper) extensions that can connect to almost all external data sources and databases.
Very advanced enterprise features, such as replication, partitioning, online hot backup, very rich indexes, functions, etc.
Excellent cross-platform support, covering almost all hardware platforms and operating systems, from mainframes to embedded systems.
High-quality code, elegant design, and a very long, continuously active development history.
Each release will receive product support for a period of 5 years.
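Several of the strengths listed above, such as the JSONB data type and rich index types, can be seen in a short sketch (table and column names are hypothetical):

```sql
-- semi-structured data stored alongside relational columns
CREATE TABLE events (
    id   bigserial PRIMARY KEY,
    body jsonb NOT NULL
);

-- a GIN index accelerates containment queries on the jsonb column
CREATE INDEX events_body_gin ON events USING gin (body);

-- this query can use the index via the @> containment operator
SELECT id FROM events WHERE body @> '{"type": "login"}';
```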
Of course, there are areas that need to be improved, such as:
Publicity is lacking: many users still do not know what kind of database PostgreSQL is. This leads to mistakes in technology selection, which in turn leads to high development and maintenance costs for the application systems built on that choice. Training and publicity for PostgreSQL should therefore be strengthened.
There are relatively few domestic service providers working on PostgreSQL, and even fewer high-quality professional ones.
Technically, block-level incremental backup and recovery are not yet supported (this feature is already on the roadmap and will be available soon).
Q9: Could you talk about your understanding of and views on the OceanBase database?
OceanBase is a very distinctive database with a brand-new design and strong high performance and high reliability. In 2017 we saw it demonstrate the power to handle more than 260,000 transactions per second.
The master-slave design of OceanBase
In a traditional master-slave architecture such as (Active) Data Guard, the master database provides full read-write service, the slave provides read-only service, and data is kept synchronized from master to slave through streaming replication.
OceanBase also has the concepts of master and slave, and replication also flows from master to slave. Unlike traditional databases, however, master and slave are defined per partition of a partitioned table. Each table has multiple partitions; every node can hold all or some of the partitions; and each partition has multiple replicas distributed across other nodes in the cluster. A replica can be regarded as a slave: it only receives logs from the master and replays them into memory. The partition that accepts reads and writes is the master. A master can have several slaves to keep multiple copies of the data, and log transmission from master to slaves goes through the Paxos protocol, ensuring the data is correctly transferred to other nodes.
To the outside world, the whole cluster looks as if every node is read-write and full-function, which is a clear advantage over traditional databases: with more active nodes, load balancing is easier, and a high-performance, reliable system can be built from low-cost hardware.
Looking carefully inside the cluster, we can see many master-slave replication groups formed by the partitions of many tables and their replicas. All log replication is based on the Paxos protocol, which guarantees no risk of data loss when any node fails (provided, of course, that no more than half of the nodes fail).
Another interesting design in OceanBase is its analogue of the traditional database checkpoint. Based on load and SLA requirements, a traditional database usually checkpoints every few minutes to half an hour to guarantee on-disk consistency. OceanBase, by contrast, stretches the cycle of its checkpoint-like operation very long, for example merging data once a day. This is good for the lifetime of SSDs, since SSDs erase and write in large blocks, and it avoids write amplification inside the SSD as much as possible. The premise of this design is that the server has a very large memory configuration, such as 256 GB or even 1 TB; machines today commonly have large memory, so it is easy to build clusters with plenty of it. The database's data buffer can then be made large enough that nearly all operations happen in memory, making it almost an in-memory database, far faster than disk I/O. Through these designs, the tension between full distribution, high reliability, high-performance concurrency, and MVCC is reasonably resolved.
The design of OceanBase is very smart, and its appearance gave me a refreshing feeling. Whether in technical innovation, architectural innovation, or pedigree, it deserves great praise. Speaking of innovation, our own cloud system is also designed for high-performance database workloads and contains many refreshing technical innovations; I hope more people will try it.
Of course, a new product needs time to improve and mature. OceanBase still has things to perfect, such as compatibility with the widely used Oracle, cross-database transactions, and successful cases in key industries. Let's give it more time and patience. (My understanding of OceanBase is limited, and I may have left many technical features unmentioned; please forgive me.)
Q10: With the arrival of the big data era in recent years, NoSQL databases show more and more advantages in handling massive data. How do you view the future of databases, and in what direction will they develop?
In terms of the data itself, more than 95% of the data produced in the real world is relational, and only a small amount of data is non-relational.
So-called NoSQL is a technical approach Google put forward years ago to handle big data, the main idea being Map/Reduce. Anyone who has studied databases should know that this idea actually resembles techniques commonly used on mainframes in the 1960s for processing large volumes of computation.
Google eventually launched its own Spanner database, and the implication is clear: Google itself has moved away from NoSQL and back to the traditional SQL line, so the future will move in the direction of SQL.
Q11: How will PostgreSQL evolve in the future, and how will it handle real-time processing of massive data?
PostgreSQL will continue to actively develop high-quality software and deliver technical features that meet market needs; the domestic market will also mature, and users will accept and widely use PostgreSQL and benefit from it.
To handle massive data, we can choose high-performance hardware and MPP architectures; in the future there will be in-memory MPP, and even databases that use GPUs to accelerate computation. Ultimately, though, we must look at users' own needs and business characteristics, and design and implement accordingly to meet those needs.