

PostgreSQL database architecture


A PostgreSQL database system is composed of a connection management system (the system controller), a compilation and execution system, a storage management system, a transaction system, and the system tables.

①: Connection management system: receives external operation requests to the system, preprocesses and dispatches them, and performs the system's logical control.

②: Compilation and execution system: made up of the query compiler and the query executor; it analyzes, processes, and transforms operation requests on the database and ultimately carries out the operations on data in the physical storage media.

③: Storage management system: made up of the index manager, the memory manager, and the external storage manager; it stores and manages the physical data and supports the compilation and execution system.

④: Transaction system: made up of the transaction manager, the log manager, concurrency control, and the lock manager. The log manager and transaction manager provide transactional consistency for the processing of operation requests; the lock manager and concurrency control provide consistency for concurrent data access.

⑤: System tables: the meta-information management center of a PostgreSQL database, holding both database object information and database management and control information. The system tables manage the metadata and tie the modules of PostgreSQL together organically into an efficient data management system.

1. System tables:

The data dictionary is the core of the management and control information in a relational database system. In PostgreSQL, the system tables play the role of the data dictionary.

The system tables are where PostgreSQL stores its structural metadata; they appear as ordinary tables or views holding system information. Users can in principle drop and rebuild these tables, add columns, and insert or update values, but modifying the system tables by hand leads to inconsistent system information and, in turn, chaotic system control. Under normal circumstances the system tables should never be modified manually; they are maintained automatically by the system-table operations associated with SQL commands.

Each PostgreSQL database has its own set of system tables, most of which are copied from the template database when the database is created, so the data in those system tables pertains to the owning database. Only a few system tables are shared by all databases (pg_database, for example), and the data in those tables concerns every database.

Because the system tables hold all of the database's metadata, they are accessed very frequently while the system runs. To improve performance, a shared system-table cache is kept in memory, using hash functions and hash tables to speed up queries.

Main system table functions:

①: pg_namespace:

pg_namespace stores namespaces. A namespace is the structure underlying SQL92 schemas: each namespace has its own independent collection of relations, types, and so on, so names in different namespaces do not conflict. PostgreSQL's namespace hierarchy is database, schema, table, attribute.

②: pg_tablespace:

pg_tablespace stores tablespace information; placing tables in different tablespaces helps control the disk file layout. pg_tablespace has only one copy in the entire database cluster: all databases in a cluster share one pg_tablespace table, rather than each database having its own.

③: pg_database:

pg_database stores information about the databases in the current cluster, and it too is a system table shared across the whole cluster. Each tuple in the table represents one database in the cluster; every database is assigned an OID as its unique identifier, stored in a hidden attribute of the corresponding tuple.

④: pg_class:

pg_class stores information about tables and table-like database objects, including indexes, sequences, views, composite types, TOAST tables, and so on. Each object is represented by a tuple in pg_class and is assigned an OID as its unique identifier, stored as a hidden attribute of the tuple.

⑤: pg_type:

pg_type stores data type information. Base types and enumerated types are created with CREATE TYPE, domain types with CREATE DOMAIN, and composite types are created automatically when a table is created.

⑥: pg_attribute:

pg_attribute stores the attribute (column) information of tables, with one tuple for each attribute of every table in the database.

⑦: pg_index:

pg_index stores specific information about indexes.
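Since these catalogs are ordinary tables, they can be joined like any others. A minimal sketch that lists the tables in the public schema together with their OIDs (the schema name here is just an example):

SELECT c.oid, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relkind = 'r';

The relkind = 'r' condition restricts the result to ordinary tables, excluding the indexes, sequences, and views that pg_class also records.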

The system views are as follows:

System views provide convenient ways to query the system tables and to inspect the internal state of the database. The most common ones are:

pg_cursors: currently open cursors

pg_group: groups of database users

pg_indexes: indexes

pg_locks: locks currently held

pg_prepared_statements: prepared statements
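These views are queried with ordinary SELECT statements; for example, to see which locks are currently held and by which processes (a sketch; the exact column set varies slightly between versions):

SELECT locktype, relation::regclass, mode, pid, granted
FROM pg_locks;

The relation::regclass cast turns the locked relation's OID into its readable name.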

2. Data cluster

After installing PostgreSQL you must first run the initdb program to initialize the data storage area on disk, that is, the database cluster. The collection of user databases and system databases managed by one PostgreSQL instance is called the database cluster. In PostgreSQL's implementation, a database is a collection of files on disk; these files have specific names and storage locations, and some of them are interrelated. By default, all of PostgreSQL's data is read from and stored in its data directory, which is usually referenced through the environment variable PGDATA.

The cluster's configuration files and subdirectories are also kept under PGDATA. The directories and files and their purposes are as follows:

PG_VERSION: a file containing the major version number of PostgreSQL.

base directory: contains one subdirectory per database, named after the database's OID; the directory named 1 corresponds to the template database template1.

global directory: contains the tables shared by the whole cluster, such as pg_database.

pg_clog directory: contains transaction commit status data.

pg_multixact directory: contains multitransaction status data (used by shared row locks).

pg_stat_tmp directory: contains temporary files needed by the statistics subsystem.

pg_subtrans directory: contains subtransaction status data.

pg_tblspc directory: contains symbolic links pointing to tablespaces.

pg_twophase directory: contains state files for prepared transactions.

pg_xlog directory: contains WAL (write-ahead log) files.

postmaster.opts file: records the command-line parameters used when the server was last started.

postmaster.pid file: a lock file recording the process ID of the current postmaster daemon and the shared memory segment ID; it is deleted when the server shuts down.

postgresql.conf file: the main configuration file, holding all user-settable parameters other than host-based access control and username mapping.

pg_hba.conf file: the host-based access control file, holding the settings for client authentication methods.

pg_ident.conf file: the username mapping file, defining the correspondence between operating system user names and PostgreSQL user names; it is referenced by pg_hba.conf.
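The mapping from a table to its file under the data directory can be inspected from SQL. A sketch, assuming a table named t1 exists:

SELECT pg_relation_filepath('t1');

The result is a path of the form base/<database OID>/<relfilenode>, relative to the data directory, matching the base directory layout described above.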

Introduction to the main parameters of initdb:

-A method: specifies the default user authentication method for local connections.

-D datadir: the path of the data directory, which must be an empty directory readable and writable by the current user; it can also be supplied through the environment variable PGDATA.

-E encoding: specifies the default database encoding.

-U name: specifies the name of the database superuser.

-W: prompts for the superuser's password.

-d: runs in debug mode, printing a large amount of debugging information.

-L: specifies the location of the input files (such as postgres.bki).
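Putting these together, a minimal initialization might look like the following (the /data/pgdata path is just an example; it must be an empty directory writable by the current user):

initdb -D /data/pgdata -E UTF8 -U postgres -W

The -W switch makes initdb prompt interactively for the superuser password before the cluster files are written.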

3. System databases:

After initialization completes, three system databases exist by default: template1, template0, and postgres. template0 and postgres are both copied from template1 during initialization.

The template1 and template0 databases are used for creating databases: in PostgreSQL, a new database is always created by copying a template database. In the database-creation command, the -T option (or the TEMPLATE clause) specifies which database to use as the template for the new database.

postgres provides a database that the default user can connect to, much like a user's home directory on a Linux system.

Note:

These system databases can be dropped, but before dropping either of the two template databases you must change the datistemplate attribute of its tuple in pg_database to false; otherwise PostgreSQL refuses with a message that a template database cannot be dropped.
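A sketch of both operations (mydb is a hypothetical database name; dropping template0 is shown only to illustrate the datistemplate mechanics):

CREATE DATABASE mydb TEMPLATE template0;

UPDATE pg_database SET datistemplate = false WHERE datname = 'template0';
DROP DATABASE template0;

As the section on system tables warned, updating pg_database by hand is exactly the kind of manual catalog change that should only be done with full awareness of the consequences.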

4. PostgreSQL process structure:

PostgreSQL uses a dedicated-server process architecture whose two main processes are the daemon postmaster and the service process postgres. In essence, both postmaster and postgres are processes produced by loading the postgres program, just running in different branches. The postmaster daemon is responsible for starting and shutting down the whole system; it listens for and accepts client connection requests and assigns each connection a postgres service process. The postgres service process receives and executes the commands sent by the client: it invokes the main functional modules (compiler, optimizer, executor, and so on) on top of the underlying modules (storage, transaction management, indexing, and so on), carries out the client's database operations, and returns the results.

4.1. The daemon postmaster

The postmaster is a master control process running on the server. It is responsible for starting and shutting down the whole system and for recovery when a service process fails. It manages the database files, listens for and accepts connection requests from clients, and forks a postgres service process for each client connection to execute commands against the database on the client's behalf. The postmaster also manages the auxiliary processes involved in running the database. It can be started with the postmaster, postgres, or pg_ctl commands.
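For example, with pg_ctl (the /data/pgdata path is the hypothetical data directory from the initdb example above):

pg_ctl -D /data/pgdata -l logfile start
pg_ctl -D /data/pgdata stop -m fast

The -l option redirects server output to a log file, and -m fast requests a fast shutdown that rolls back active transactions and disconnects clients.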

The postmaster acts like a dispatch center for client requests. When a client program wants to operate on the database, it first sends a startup message to the postmaster. The postmaster authenticates the client using the information in the startup message and, if authentication succeeds, creates a new postgres service process for it. The postmaster then hands the client interaction over to that postgres process, which performs the database operations the client requires.

The postmaster is also responsible for system-wide operations such as interrupt handling; it does not perform these itself, but assigns a child process to handle them at the appropriate time, and it restarts the system when the database crashes. At startup, the postmaster establishes shared memory and a semaphore pool, and it communicates with its child processes through shared memory and signals. This multi-process design gives the whole system good stability: even if a backend process crashes, the other processes keep working, and the postmaster only needs to reset shared memory to recover from a single backend's crash.

4.1.1. Worker process startup:

In PostgreSQL, the postmaster daemon is responsible for starting and shutting down the entire system. In a database management system instance there are, besides the postmaster daemon and the postgres service processes, other background processes dedicated to specific tasks. When one of these worker processes runs into trouble, the postmaster respawns it. During postmaster startup, the syslogger logging process is started first, and initialization of the pgstat process and the autovacuum process is completed. The status of the worker processes is checked in the postmaster's listening loop, where these auxiliary processes are created or re-created as needed.

①: syslogger worker process

The postmaster process calls the SysLogger_Start function to start the syslogger child process. Syslogger was added in version 8.0: it collects the stderr output of the postmaster, all backend processes, and the other child processes through a pipe and writes it to log files. The log file size and lifetime are set in the postgresql.conf configuration file; when the current log file reaches those limits it is closed and a new log file is created.

②: worker process initialization

After the syslogger worker process has started, the postmaster begins initializing the pgstat process and the autovacuum process and allocates the resources they need.

Initialization of the pgstat process mainly consists of creating and testing the UDP port used to send and receive statistics messages. The UDP port is created the same way as the TCP port described earlier, except that the socket type is SOCK_DGRAM.

4.2. Auxiliary processes:

Each PostgreSQL auxiliary process performs a specific function in support of the operation and management of the system. The postmaster keeps a global variable holding the process ID of each worker process: SysLoggerPID, BgWriterPID, WalWriterPID, AutoVacPID, PgArchPID, and PgStatPID. A value of 0 means the corresponding process has not been started.
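On a running server these processes are visible from the shell, for example:

ps -ef | grep postgres

Each auxiliary process sets a descriptive process title, so the output typically shows children labelled along the lines of "logger process", "writer process", "wal writer process", "autovacuum launcher process", and "stats collector process" alongside the postmaster (the exact titles vary by version).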

4.2.1. Syslogger (system log) process:

Log information is an effective means for database administrators to learn the running state of the database system. The log file size can be set among syslogger's configuration options: when a log file reaches the specified size, syslogger closes it and starts a new one. The parameters governing logging can be configured in the postgresql.conf file as follows:

log_destination: configures the log output target. Different values are available on different platforms; the default under Linux is stderr.

logging_collector: whether to enable the log collector. Set to on it starts the logging facility; otherwise the system does not launch the syslogger auxiliary process.

log_directory: configures the folder that logs are written to.

log_filename: configures the naming rule for log file names.

log_rotation_size: configures the log file size at which the current log file is closed and a new file is created as the current log file.

(postgresql.conf provides further logging parameters, which can be set as needed.)
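A sketch of a typical postgresql.conf logging setup (the directory and rotation values are just examples):

logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_size = 10MB

The %Y, %m, %d, %H, %M, %S escapes are strftime-style patterns expanded when each new log file is created.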

4.2.2. Background writer process bgwriter

The BgWriter process writes dirty pages in shared memory out to disk. It serves two purposes. First, it periodically flushes dirty data from the memory buffers to disk, reducing blocking during queries. Second, when PostgreSQL performs a periodic checkpoint it must write all dirty pages to disk; having BgWriter write some of them out in advance reduces the I/O the checkpoint (CheckPoint, a database recovery technique) must perform, so the system's I/O load stays steadier. BgWriter was added in PostgreSQL 8.0, and its mechanism is controlled by the parameters beginning with "bgwriter_" in the postgresql.conf file:

bgwriter_delay:

The interval between two consecutive rounds of flushing by the background writer process. The default value is 200, in milliseconds.

bgwriter_lru_maxpages:

The maximum number of buffers the background writer writes in one round. The default value is 100, in buffers. If the amount of dirty data is below this value, the writes are performed entirely by the background writer; anything beyond it falls to the server processes themselves. Setting the value to 0 disables background writing altogether, leaving it entirely to the server processes. (Checkpoint writes are not included here.)

bgwriter_lru_multiplier:

This parameter governs how many blocks are written to disk per round: the number of dirty buffers written is based on the number of buffers the server processes recently needed, multiplied by this factor, and capped by bgwriter_lru_maxpages. If it is set too small, the dirty data needing to be written exceeds what is written per round, and the remaining disk writes fall to the server processes, degrading performance. A large value writes out more dirty data than is currently needed, which makes later buffer allocations convenient but may waste I/O. The default value for this parameter is 2.0.

The maximum data volume bgwriter can write per second is therefore:

1000 / bgwriter_delay * bgwriter_lru_maxpages * 8KB = maximum data volume

With the defaults (200 ms and 100 buffers of 8 KB each), that is 5 * 100 * 8KB = 4000KB per second.

bgwriter_flush_after:

Once BgWriter has written this much data, the operating system is forced to flush it to disk; the default is 512KB.
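A sketch of the corresponding postgresql.conf section with its default values:

bgwriter_delay = 200ms            # interval between bgwriter rounds
bgwriter_lru_maxpages = 100       # max buffers written per round; 0 disables
bgwriter_lru_multiplier = 2.0     # multiple of recent buffer demand to write
bgwriter_flush_after = 512kB      # force OS flush after this much is written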

4.2.3. PgArch (archiving) process

PgArch is similar to the ARCH archiving process of an Oracle database, except that ARCH archives redo logs while PgArch archives WAL logs. Going one step deeper: WAL segments are recycled, that is, old WAL files are overwritten by newly generated logs, and the PgArch process exists to back up each WAL segment before it is overwritten. The purpose of archiving is that a full backup plus the archived logs generated after it can bring the database back to any point in time. The PITR (Point-In-Time Recovery) technique PostgreSQL has provided since version 8.x relies on archived logs.

The PgArch process is configured with the following parameters in the postgresql.conf file:

# - Archiving -
#archive_mode = off        # enables archiving; off, on, or always
                           # (change requires restart)
#archive_command = ''      # command to use to archive a logfile segment
                           # placeholders: %p = path of file to archive
                           #               %f = file name only
                           # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0       # force a logfile segment switch after this
                           # number of seconds; 0 disables

archive_mode: whether to perform archiving. The choices are off, on, and always; the default is off.

archive_command: the command the administrator sets for archiving WAL segments. Within the command, the predefined placeholder "%p" stands for the full-path name of the WAL file to archive and "%f" for the file name without path (paths are interpreted relative to the current working directory). The command given by archive_command is invoked for each WAL segment file to be archived. When the command returns 0, PostgreSQL assumes the file was archived successfully and may then delete or recycle the WAL segment; on a non-zero return, PostgreSQL assumes the file has not been archived and retries periodically until it succeeds.

archive_timeout: the archiving period. The current WAL segment is forcibly switched when the time set by this parameter is exceeded. The default value is 0 (the feature is disabled).
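Once archiving is enabled, its progress can be checked from SQL; the pg_stat_archiver view (available since 9.4) reports counts of archived and failed segments:

SELECT archived_count, last_archived_wal, failed_count
FROM pg_stat_archiver;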

4.2.4. PgStat (statistics collection) process:

The PgStat process is the statistics collector of the PostgreSQL database. It gathers statistics during database operation, such as the number of insertions, updates, and deletions on a table, the number of data blocks read, index changes, and so on. The main purpose of collecting statistics is to let the optimizer judge correctly and choose the best execution plan. The PgStat-related parameters in the postgresql.conf file are as follows:

#------------------------------------------------------------------------------
# RUNTIME STATISTICS
#------------------------------------------------------------------------------
# - Query/Index Statistics Collector -
#track_activities = on
#track_counts = on
#track_io_timing = off
#track_functions = none              # none, pl, all
#track_activity_query_size = 1024    # (change requires restart)
#stats_temp_directory = 'pg_stat_tmp'

track_activities: whether to collect information about the command currently being executed in each session. The information is visible only to superusers and the session owner; the default value is on.

track_counts: whether to collect statistics on database activity. Because the AutoVacuum process needs these statistics when choosing which databases and tables to clean, the default value is on.

track_io_timing: whether to time data block I/O calls. The default is off, because when enabled the database queries the system clock repeatedly, which adds considerable overhead. Only a superuser can change it.

track_functions: whether to count function calls and their execution time.

track_activity_query_size: the number of bytes reserved per active session for tracking the currently executing command. The default value is 1024; it can only be set at server start.

stats_temp_directory: the temporary storage path for statistics. The path can be relative or absolute, and the parameter defaults to pg_stat_tmp. Pointing it at faster storage (a memory-based filesystem, for example) can reduce the database's physical I/O and improve performance. This parameter can only be changed in the postgresql.conf file or on the server command line.
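The collected statistics are exposed through views; for example, per-table activity counters can be read like this:

SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables;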

4.2.5. AutoVacuum (automatic cleanup) process

In a PostgreSQL database, after an UPDATE or DELETE operation the database does not immediately remove the old version of the row; it merely marks it as deleted. This is because PostgreSQL uses a multi-version mechanism: if an older row version is still visible to another open transaction, it must be kept for the time being. Once the transactions concerned have committed, the old row versions have no further value, and the database needs to clean up this garbage to make room. That cleanup is performed by the AutoVacuum process. The AutoVacuum-related parameters in the postgresql.conf file are:

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------
#autovacuum = on                          # Enable autovacuum subprocess? 'on'
                                          # requires track_counts to also be on.
#log_autovacuum_min_duration = -1         # -1 disables, 0 logs all actions and
                                          # their durations, > 0 logs only
                                          # actions running at least this number
                                          # of milliseconds.
#autovacuum_max_workers = 3               # max number of autovacuum subprocesses
                                          # (change requires restart)
#autovacuum_naptime = 1min                # time between autovacuum runs
#autovacuum_vacuum_threshold = 50         # min number of row updates before vacuum
#autovacuum_analyze_threshold = 50        # min number of row updates before analyze
#autovacuum_vacuum_scale_factor = 0.2     # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1    # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000    # maximum XID age before forced vacuum
                                          # (change requires restart)
#autovacuum_multixact_freeze_max_age = 400000000  # maximum multixact age
                                          # before forced vacuum
                                          # (change requires restart)
#autovacuum_vacuum_cost_delay = 20ms      # default vacuum cost delay for
                                          # autovacuum, in milliseconds;
                                          # -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1        # default vacuum cost limit for
                                          # autovacuum, -1 means use
                                          # vacuum_cost_limit

autovacuum: whether the system's automatic cleanup facility is started. The default is on.

log_autovacuum_min_duration: records autovacuum activity in the log. When an autovacuum run takes longer than this parameter's setting, the autovacuum information is logged. The default is -1, meaning nothing is recorded.

autovacuum_max_workers: sets the maximum number of automatic cleanup worker processes.

autovacuum_naptime: sets the interval between two automatic cleanup runs.

autovacuum_vacuum_threshold and autovacuum_analyze_threshold: vacuum and analyze are executed when the number of tuples updated on a table exceeds these respective thresholds.

autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor: set the scale factors applied to the table size.

autovacuum_freeze_max_age: sets the XID age limit beyond which the database is forcibly cleaned.

autovacuum_vacuum_cost_delay: when an autovacuum process is about to run, the cost of the vacuum is evaluated; if it exceeds the autovacuum_vacuum_cost_limit setting, the process is delayed by autovacuum_vacuum_cost_delay. A value of -1 means the vacuum_cost_delay value is used; the default is 20 ms.

autovacuum_vacuum_cost_limit: the cost threshold for the autovacuum process. The default is -1, meaning the vacuum_cost_limit value is used; if the cost accumulated during an autovacuum run exceeds autovacuum_vacuum_cost_limit, the autovacuum process sleeps.
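The thresholds above can also be overridden per table through storage parameters; a sketch for a hypothetical, heavily updated table named hot_table:

ALTER TABLE hot_table SET (autovacuum_vacuum_scale_factor = 0.05,
                           autovacuum_vacuum_threshold = 200);

This makes autovacuum visit hot_table once roughly 5% of it (plus 200 rows) has been modified, rather than the global 20%.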

4.2.6. WalWriter (write-ahead log writer) process

The central idea of the write-ahead log, WAL (Write Ahead Log, also known as XLOG), is that changes to data files may take place only after those changes have been recorded in the log: write the log first, then the data. This mechanism avoids frequent writes of data pages to disk and reduces disk I/O, and the WAL can be used to recover the database after a crash and restart. The WalWriter-related parameters in the postgresql.conf file are as follows:

#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
# - Settings -
#wal_level = minimal          # minimal, replica, or logical
                              # (change requires restart)
#fsync = on                   # flush data to disk for crash safety
                              # (turning this off can cause
                              # unrecoverable data corruption)
#synchronous_commit = on      # synchronization level;
                              # off, local, remote_write, remote_apply, or on
#wal_sync_method = fsync      # the default is the first option
                              # supported by the operating system:
                              #   open_datasync
                              #   fdatasync (default on Linux)
                              #   fsync
                              #   fsync_writethrough
                              #   open_sync
#full_page_writes = on        # recover from partial page writes
#wal_compression = off        # enable compression of full-page writes
#wal_log_hints = off          # also do full page writes of non-critical updates
                              # (change requires restart)
#wal_buffers = -1             # min 32kB, -1 sets based on shared_buffers
                              # (change requires restart)
#wal_writer_delay = 200ms     # 1-10000 milliseconds
#wal_writer_flush_after = 1MB # measured in pages, 0 disables
#commit_delay = 0             # range 0-100000, in microseconds
#commit_siblings = 5          # range 1-1000

wal_level: controls the WAL level, determining how much information is written to the WAL. The default is minimal, which writes only the information needed to recover from a crash or an immediate shutdown. replica adds the WAL archiving information and includes what read-only standby servers need (new in 9.6, merging the earlier archive and hot_standby levels). logical is used mainly for logical decoding scenarios.

fsync: directly controls whether the log is written to disk first. The default value is on (write first), meaning the system must wait for the WAL write to complete when updated data is written to disk. It can be set to off, meaning updated data is written to disk without waiting for the WAL write to complete, at the risk of unrecoverable corruption.

synchronous_commit: whether to wait for the WAL to complete before returning status information to the user's transaction. The default is on, meaning the transaction status is returned only after the WAL has completed; off gives faster feedback of the transaction status.

wal_sync_method: how WAL is written to disk. The default value is fsync; the available values are open_datasync, fdatasync, fsync_writethrough, fsync, and open_sync. open_datasync and open_sync open the WAL file with the O_DSYNC or O_SYNC flag; fdatasync and fsync call the fdatasync or fsync function on each commit. Both functions write the operating system's disk cache back to disk, but the former writes only the data portion of the file while the latter also updates the file's metadata. fsync_writethrough additionally ensures on each commit that the operating system's disk cache is consistent with memory.

full_page_writes: whether entire pages are written to the WAL.

wal_buffers: the memory used to store WAL data. The system default is 64 KB (a setting of -1 sizes it automatically from shared_buffers). This parameter interacts with wal_writer_delay and commit_delay.

wal_writer_delay: the write interval of the WalWriter process, 200 ms by default. Too long an interval may leave the WAL buffers short of memory; too short an interval causes continuous WAL writes and increases the disk burden.

commit_delay: the time committed data may sit in the WAL buffer. The default is 0 microseconds, meaning no delay; with a non-zero value, a transaction does not write the WAL immediately after COMMIT but remains in the WAL buffer, waiting for the WalWriter process's periodic write to disk.

commit_siblings: when a transaction issues a commit request, if the number of transactions currently executing in the database exceeds the commit_siblings value, the transaction waits for a period of time (the commit_delay value) before flushing; otherwise the transaction writes to the WAL directly. The system default is 5, and this parameter also determines whether commit_delay takes effect.

wal_writer_flush_after: when this much dirty WAL data accumulates, it is flushed out to disk.
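The current WAL write position can be checked from SQL; in the 9.x releases this article describes (note the pg_xlog directory naming), the function is:

SELECT pg_current_xlog_location();

In PostgreSQL 10 and later the directory was renamed pg_wal and the function pg_current_wal_lsn().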

4.2.7. CheckPoint (checkpoint) process

Checkpoints are points in the transaction sequence set by the system; they guarantee that the log information written before the checkpoint has been flushed to disk. The related parameters in the postgresql.conf file are:

# - Checkpoints -
#checkpoint_timeout = 5min              # range 30s-1d
#max_wal_size = 1GB
#min_wal_size = 80MB
#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0-1.0
#checkpoint_flush_after = 256kB         # measured in pages, 0 disables
#checkpoint_warning = 30s               # 0 disables
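Besides the periodic checkpoints driven by checkpoint_timeout and WAL volume, a checkpoint can be forced manually (superuser only):

CHECKPOINT;

Forcing a checkpoint flushes all dirty pages to disk immediately, which can be useful before a planned shutdown or backup but causes an I/O spike if overused.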
