PostgreSQL logical backup with pg_dump: usage and principle analysis

2025-01-16 Update From: SLTechnology News&Howtos



I. Principle analysis

1. Call getopt_long to parse the command-line arguments and save them into the static DumpOptions dopt structure.
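As a rough illustration (not pg_dump's actual code), step 1 can be sketched like this; the option set and struct fields here are simplified stand-ins for the real DumpOptions:

```c
/* Minimal sketch of step 1: parsing options with getopt_long into a
 * static options struct, analogous to pg_dump's static DumpOptions dopt.
 * The struct and option set are simplified, not pg_dump's real ones. */
#include <getopt.h>
#include <stddef.h>

typedef struct
{
    int         schemaOnly;     /* -s / --schema-only */
    int         dataOnly;       /* -a / --data-only */
    const char *format;         /* -F / --format */
} DumpOptions;

static DumpOptions dopt;

int
parse_args(int argc, char **argv)
{
    static const struct option long_options[] = {
        {"schema-only", no_argument, NULL, 's'},
        {"data-only", no_argument, NULL, 'a'},
        {"format", required_argument, NULL, 'F'},
        {NULL, 0, NULL, 0}
    };
    int         c;

    optind = 1;                 /* reset scanner so the function is reusable */
    while ((c = getopt_long(argc, argv, "saF:", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case 's':
                dopt.schemaOnly = 1;
                break;
            case 'a':
                dopt.dataOnly = 1;
                break;
            case 'F':
                dopt.format = optarg;
                break;
            default:
                return -1;      /* unrecognized option */
        }
    }
    return 0;
}
```

The real parser handles dozens of options the same way, then hands dopt to the rest of the program.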

2. Determine whether the options are compatible; exit if they are not:

Options -s/--schema-only and -a/--data-only cannot be used together.
Options -c/--clean and -a/--data-only cannot be used together.
Options --inserts/--column-inserts and -o/--oids cannot be used together.
Option --if-exists requires option -c/--clean.
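A minimal sketch of these compatibility checks (simplified flags rather than the real DumpOptions fields); pg_dump performs exactly this kind of validation before doing any work:

```c
/* Sketch of step 2: returns NULL when the option combination is valid,
 * otherwise a message describing the conflict.  Toy flags, not the real
 * DumpOptions fields. */
#include <stddef.h>

const char *
check_option_compat(int schema_only, int data_only, int clean,
                    int column_inserts, int oids, int if_exists)
{
    if (schema_only && data_only)
        return "options -s/--schema-only and -a/--data-only cannot be used together";
    if (clean && data_only)
        return "options -c/--clean and -a/--data-only cannot be used together";
    if (column_inserts && oids)
        return "options --inserts/--column-inserts and -o/--oids cannot be used together";
    if (if_exists && !clean)
        return "option --if-exists requires option -c/--clean";
    return NULL;                /* compatible */
}
```

The real code prints the message with exit_horribly and terminates instead of returning it.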

3. Call CreateArchive to open the output file; the output stream is fout. This function uses four source files to encapsulate the four different dump file formats. Adding a new source file adds a new export file type; each one is independent and easy to maintain.

CreateArchive -> _allocAH:

switch (AH->format)
{
    case archCustom:
        InitArchiveFmt_Custom(AH);
        break;
    case archNull:
        InitArchiveFmt_Null(AH);
        break;
    case archDirectory:
        InitArchiveFmt_Directory(AH);
        break;
    case archTar:
        InitArchiveFmt_Tar(AH);
        break;
    default:
        exit_horribly(modulename, "unrecognized file format \"%d\"\n", fmt);
}

4. fout is an important global variable.

5. Call ConnectDatabase to connect to the database.

6. Call setup_connection to execute some SQL statements on the connection:

SELECT pg_catalog.set_config('search_path', '', false);
SET client_encoding TO '%s';    -- pg_dump -E specifies the encoding
SET ROLE %s;
SET DATESTYLE = ISO;
SET INTERVALSTYLE = POSTGRES;
SET extra_float_digits TO 3;
SET synchronize_seqscans TO off;
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET row_security = off;
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;

7. To remain compatible with older servers, determine the values of some variables according to the server version number.

8. Call tblinfo = getSchemaData(fout, &numTables); to decide which database objects to export. This function in turn calls the functions below, which are worth attention: to store the metadata of each object, they malloc space that is not freed until the pg_dump process exits.

extinfo = getExtensions(fout, &numExtensions);
extinfoindex = buildIndexArray(extinfo, numExtensions, sizeof(ExtensionInfo));
getExtensionMembership(fout, extinfo, numExtensions);
nspinfo = getNamespaces(fout, &numNamespaces);
nspinfoindex = buildIndexArray(nspinfo, numNamespaces, sizeof(NamespaceInfo));
tblinfo = getTables(fout, &numTables);
tblinfoindex = buildIndexArray(tblinfo, numTables, sizeof(TableInfo));
getOwnedSeqs(fout, tblinfo, numTables);
funinfo = getFuncs(fout, &numFuncs);
funinfoindex = buildIndexArray(funinfo, numFuncs, sizeof(FuncInfo));
typinfo = getTypes(fout, &numTypes);
typinfoindex = buildIndexArray(typinfo, numTypes, sizeof(TypeInfo));
getProcLangs(fout, &numProcLangs);
getAggregates(fout, &numAggregates);
oprinfo = getOperators(fout, &numOperators);
oprinfoindex = buildIndexArray(oprinfo, numOperators, sizeof(OprInfo));
getAccessMethods(fout, &numAccessMethods);
getOpclasses(fout, &numOpclasses);
getOpfamilies(fout, &numOpfamilies);
getTSParsers(fout, &numTSParsers);
getTSTemplates(fout, &numTSTemplates);
getTSDictionaries(fout, &numTSDicts);
getTSConfigurations(fout, &numTSConfigs);
getForeignDataWrappers(fout, &numForeignDataWrappers);
getForeignServers(fout, &numForeignServers);
getDefaultACLs(fout, &numDefaultACLs);
collinfo = getCollations(fout, &numCollations);
collinfoindex = buildIndexArray(collinfo, numCollations, sizeof(CollInfo));
getConversions(fout, &numConversions);
getCasts(fout, &numCasts);
getTransforms(fout, &numTransforms);
inhinfo = getInherits(fout, &numInherits);
getEventTriggers(fout, &numEventTriggers);
processExtensionTables(fout, extinfo, numExtensions);
flagInhTables(tblinfo, numTables, inhinfo, numInherits);
getTableAttrs(fout, tblinfo, numTables);
flagInhAttrs(fout->dopt, tblinfo, numTables);
getIndexes(fout, tblinfo, numTables);
getExtendedStatistics(fout);
getConstraints(fout, tblinfo, numTables);
getTriggers(fout, tblinfo, numTables);
getRules(fout, &numRules);
getPolicies(fout, tblinfo, numTables);
getPublications(fout);
getPublicationTables(fout, tblinfo, numTables);
getSubscriptions(fout);

Each getXXXs function performs the same process; taking getTables as an example:

1) Query the system catalogs according to the server version number and read out the metadata of the objects.
2) malloc memory and store the query results in the object's data structure, TableInfo.
3) For each piece of metadata, call selectDumpableTable to mark the tables to be exported: if -t specifies tables to export, traverse the list, find the corresponding tables, and mark them DUMP_COMPONENT_ALL; if -T specifies tables to exclude, mark them tbinfo->dobj.dump = DUMP_COMPONENT_NONE.
4) dumpIdMap[dobj->dumpId] = dobj; stores the metadata of the exported tables in the dumpIdMap array.
5) Execute LOCK TABLE %s IN ACCESS SHARE MODE on each exported table.
6) After all the metadata is saved, execute SET statement_timeout = 0 so that statements do not time out and can run indefinitely.
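The buildIndexArray pattern above can be sketched as follows; the types are toy stand-ins, but the idea is the same: a malloc'd pointer index sorted by OID, searched with binary search, and freed only when the process exits:

```c
/* Sketch of the buildIndexArray/find-by-OID pattern used for the metadata
 * arrays returned by the getXXXs functions.  Toy types, not the real
 * pg_dump structures. */
#include <stdlib.h>

typedef unsigned int Oid;

typedef struct
{
    Oid         oid;
    const char *name;
} DumpableObject;

static int
oid_cmp(const void *a, const void *b)
{
    const DumpableObject *oa = *(DumpableObject *const *) a;
    const DumpableObject *ob = *(DumpableObject *const *) b;

    return (oa->oid > ob->oid) - (oa->oid < ob->oid);
}

/* Build a malloc'd index of pointers sorted by OID. */
DumpableObject **
buildIndexArray(DumpableObject *objs, int numObjs)
{
    DumpableObject **index = malloc(numObjs * sizeof(DumpableObject *));
    int         i;

    for (i = 0; i < numObjs; i++)
        index[i] = &objs[i];
    qsort(index, numObjs, sizeof(DumpableObject *), oid_cmp);
    return index;
}

/* O(log n) lookup by OID via binary search over the index. */
DumpableObject *
findObjectByOid(Oid oid, DumpableObject **index, int numObjs)
{
    int         lo = 0,
                hi = numObjs - 1;

    while (lo <= hi)
    {
        int         mid = (lo + hi) / 2;

        if (index[mid]->oid == oid)
            return index[mid];
        if (index[mid]->oid < oid)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return NULL;
}
```

This is why each object type gets both an info array and an index array: lookups while resolving dependencies stay logarithmic.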

9. Call the getTableData function to get the data corresponding to the tables. It does not actually fetch the table data; rather, it creates an "export object" for each table's data, and the data will later be exported according to that object. Although placing these export objects on the AH->toc linked list first means the actual data export does not consume much memory, the metadata itself is not freed until the process exits, so when there are many tables the memory footprint is still considerable.

This function calls makeTableDataInfo:

1) Views, foreign tables, partitioned parent tables (their data comes from the child partitions), and unlogged tables (when dumping from a standby) do not need their data exported.

2) Tables excluded from the dump are also skipped here.

3) malloc a TableDataInfo to save the table information:

typedef struct _tableDataInfo
{
    DumpableObject dobj;
    TableInfo  *tdtable;        /* link to table to dump */
    bool        oids;           /* include OIDs in data? */
    char       *filtercond;     /* WHERE condition to limit rows dumped */
} TableDataInfo;

4) Fill in tdinfo->dobj.catId.tableoid, tdinfo->dobj.catId.oid, tdinfo->dobj.name, and tdinfo->dobj.namespace, and save dobj into the dumpIdMap array.

10. If large objects need to be exported, call getBlobs. As above, they are saved to the array; no data is really exported yet.

11. Call getDependencies to reorganize the dependencies between the objects.

12. getDumpableObjects collects the dumpable objects from the dumpIdMap array.

13. sortDumpableObjectsByTypeName, sortDataAndIndexObjectsBySize (for a parallel dump, table data and index objects are sorted by table size), and sortDumpableObjects rearrange all the objects: the export priority of the different object types is determined by the dbObjectTypePriority array; objects of the same type are sorted by name.

static const int dbObjectTypePriority[] =
{
    1,   /* DO_NAMESPACE */
    4,   /* DO_EXTENSION */
    5,   /* DO_TYPE */
    5,   /* DO_SHELL_TYPE */
    6,   /* DO_FUNC */
    7,   /* DO_AGG */
    8,   /* DO_OPERATOR */
    8,   /* DO_ACCESS_METHOD */
    9,   /* DO_OPCLASS */
    9,   /* DO_OPFAMILY */
    3,   /* DO_COLLATION */
    11,  /* DO_CONVERSION */
    18,  /* DO_TABLE */
    20,  /* DO_ATTRDEF */
    28,  /* DO_INDEX */
    29,  /* DO_STATSEXT */
    30,  /* DO_RULE */
    31,  /* DO_TRIGGER */
    27,  /* DO_CONSTRAINT */
    32,  /* DO_FK_CONSTRAINT */
    2,   /* DO_PROCLANG */
    10,  /* DO_CAST */
    23,  /* DO_TABLE_DATA */
    24,  /* DO_SEQUENCE_SET */
    19,  /* DO_DUMMY_TYPE */
    12,  /* DO_TSPARSER */
    14,  /* DO_TSDICT */
    13,  /* DO_TSTEMPLATE */
    15,  /* DO_TSCONFIG */
    16,  /* DO_FDW */
    17,  /* DO_FOREIGN_SERVER */
    32,  /* DO_DEFAULT_ACL */
    3,   /* DO_TRANSFORM */
    21,  /* DO_BLOB */
    25,  /* DO_BLOB_DATA */
    22,  /* DO_PRE_DATA_BOUNDARY */
    26,  /* DO_POST_DATA_BOUNDARY */
    33,  /* DO_EVENT_TRIGGER */
    38,  /* DO_REFRESH_MATVIEW */
    34,  /* DO_POLICY */
    35,  /* DO_PUBLICATION */
    36,  /* DO_PUBLICATION_REL */
    37   /* DO_SUBSCRIPTION */
};
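The ordering rule the array drives can be sketched like this (shortened priority table and toy types, not the real ones): compare the per-type priority first, and fall back to the name within the same type:

```c
/* Sketch of the type-then-name ordering used by sortDumpableObjectsByTypeName.
 * Toy structures and a three-entry stand-in for dbObjectTypePriority[]. */
#include <stdlib.h>
#include <string.h>

typedef enum { DO_NAMESPACE, DO_TABLE, DO_INDEX } DumpableObjectType;

/* shortened stand-in for dbObjectTypePriority[] */
static const int objTypePriority[] = {1, 18, 28};

typedef struct
{
    DumpableObjectType objType;
    const char *name;
} DumpableObject;

static int
DOTypeNameCompare(const void *p1, const void *p2)
{
    const DumpableObject *obj1 = *(DumpableObject *const *) p1;
    const DumpableObject *obj2 = *(DumpableObject *const *) p2;
    int         cmpval = objTypePriority[obj1->objType] -
                         objTypePriority[obj2->objType];

    if (cmpval != 0)
        return cmpval;          /* different types: priority decides */
    return strcmp(obj1->name, obj2->name); /* same type: sort by name */
}

void
sortDumpableObjectsByTypeName(DumpableObject **objs, int numObjs)
{
    qsort(objs, numObjs, sizeof(DumpableObject *), DOTypeNameCompare);
}
```

After this pass, a dependency-aware topological sort still adjusts the order, but the priority table fixes the broad layout: namespaces before tables, tables before their data, data before indexes.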

14. dumpEncoding, dumpStdStrings, and dumpSearchPath export the encoding information, using doubly linked TOCEntry nodes to save the exported objects. For example:

newToc->defn: "SET client_encoding = 'UTF8';\n"
              "SET standard_conforming_strings = 'on';\n"
              "SELECT pg_catalog.set_config('search_path', '', false);\n"

15. dumpDatabase exports the information of the target database this connection points to, also as a newToc; newToc->defn: CREATE DATABASE yzs WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'zh_CN.UTF-8' LC_CTYPE = 'zh_CN.UTF-8'

16. Traverse all the objects; for each one call dumpDumpableObject, which dispatches to functions such as dumpNamespace, dumpExtension, etc., and inserts the entries into the circular linked list.
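The circular doubly linked list holding the TOC entries can be sketched like this (toy structures; the real TocEntry carries many more fields, such as the dump ID, dependencies, and the data-dumper callback):

```c
/* Sketch of the TOC entry list: ArchiveHandle keeps a circular doubly
 * linked list of TocEntry nodes headed by AH->toc, and each dump step
 * appends a new entry at the tail. */
#include <stdlib.h>

typedef struct _tocEntry
{
    struct _tocEntry *prev;
    struct _tocEntry *next;
    const char *defn;           /* e.g. the SQL that recreates the object */
} TocEntry;

/* Make a self-linked head node, as the archive setup does for AH->toc. */
TocEntry *
init_toc(void)
{
    TocEntry   *head = calloc(1, sizeof(TocEntry));

    head->prev = head->next = head;
    return head;
}

/* Append before the head, i.e. at the tail of the circular list. */
TocEntry *
toc_append(TocEntry *head, const char *defn)
{
    TocEntry   *te = calloc(1, sizeof(TocEntry));

    te->defn = defn;
    te->prev = head->prev;
    te->next = head;
    head->prev->next = te;
    head->prev = te;
    return te;
}
```

Because the list is circular, a writer can walk from head->next back around to head to emit every entry in order.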

for (i = 0; i < numObjs; i++)
    dumpDumpableObject(fout, dobjs[i]);

-------------------- None of the exports above actually writes out data --------------------

17. Traverse the linked list and mark which TOC entries need to be exported: ProcessArchiveRestoreOptions.

18. If the export format is plain, call RestoreArchive; what is written to the file is then SQL statements rather than an unrecognizable binary file.

19. Close the handle and release resources with CloseArchive, which calls the _CloseArchive of the corresponding file type through a function pointer.

II. Handler functions for the different formats

-F, --format=c|d|t|p   output file format (custom, directory, tar, plain text (default))

Currently, pg_dump supports four export formats:

custom (pg_backup_custom.c): exports a file in binary format, consisting of a file header and a file body. The body is a linked list saving every backup object; each dumpable object is represented by a uniform structure. Compression is supported.

plain (pg_backup_null.c): writes the SQL script to standard output; this is the default.

file (pg_backup_file.c): exports one main file plus several auxiliary files. The main file is similar to the custom format; the auxiliary files are data files, one per table in the backup. It must be used together with -f.

tar (pg_backup_tar.c): basically similar to the "file" format, except that all the resulting files are finally archived into one tar file, whose maximum size is 8 GB (limited by the tar file format).

PostgreSQL implements these four export formats through function pointers. pg_backup_archiver.h contains a large number of function pointer typedefs such as:

typedef void (*ClosePtrType) (ArchiveHandle *AH);
typedef void (*ReopenPtrType) (ArchiveHandle *AH);
typedef void (*ArchiveEntryPtrType) (ArchiveHandle *AH, TocEntry *te);

These function pointers are initialized in the following files:

pg_backup_custom.c -> InitArchiveFmt_Custom(ArchiveHandle *AH)
pg_backup_null.c -> InitArchiveFmt_Null(ArchiveHandle *AH)
pg_backup_file.c -> InitArchiveFmt_Directory(ArchiveHandle *AH)
pg_backup_tar.c -> InitArchiveFmt_Tar(ArchiveHandle *AH)

The data structure ArchiveHandle uses a large number of function pointers, so that when the Archive structure for a given export format is initialized, its handlers can be assigned to different processing functions. In this way, pg_dump.c can invoke the right handlers purely according to the file format the user specified. See step 3 of Part I.
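A minimal sketch of this dispatch mechanism (a toy handle with a single pointer; the real ArchiveHandle has dozens, and the return values here are invented for illustration):

```c
/* Sketch of format dispatch via function pointers: each InitArchiveFmt_XXX
 * fills in the handle's pointers, and callers only ever invoke
 * AH->ClosePtr, regardless of the format chosen. */
#include <stddef.h>

typedef struct _archiveHandle ArchiveHandle;
typedef const char *(*ClosePtrType) (ArchiveHandle *AH);

struct _archiveHandle
{
    int         format;         /* archCustom, archTar, ... */
    ClosePtrType ClosePtr;
};

static const char *
_CloseArchive_Custom(ArchiveHandle *AH)
{
    (void) AH;
    return "custom close";
}

static const char *
_CloseArchive_Tar(ArchiveHandle *AH)
{
    (void) AH;
    return "tar close";
}

void
InitArchiveFmt_Custom(ArchiveHandle *AH)
{
    AH->ClosePtr = _CloseArchive_Custom;
}

void
InitArchiveFmt_Tar(ArchiveHandle *AH)
{
    AH->ClosePtr = _CloseArchive_Tar;
}

/* Format-independent caller, as CloseArchive is in the archiver layer. */
const char *
CloseArchive(ArchiveHandle *AH)
{
    return AH->ClosePtr(AH);
}
```

Adding a new format means adding one file that provides its own Init function and handlers; nothing in the format-independent code changes.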

Generally speaking, what pg_dump exports can be divided into the definitions of database objects and the data. When a definition is exported, the corresponding metadata is read by querying the system catalogs, and all the information about the object, including the OIDs of the objects it depends on, is placed on a linked list. The concrete data, that is, the data of each table, is also abstracted as a database object and saved on the same linked list.

By adjusting the export order, the object definitions are exported first and the data afterwards: guided by the information in the corresponding data-object node on the list, pg_dump executes the corresponding SQL statement, reads the data from the table, and writes it out. Therefore only the object definitions live on the in-memory linked list; the data is written as it is read and can be streamed out.

III. Usage

1) Export in directory format; this must be used together with -f. toc.dat saves the information about all exportable objects (table definitions, etc.); the other files are data files, named after the OID of their table; test is the output directory.

[postgres@localhost ~]$ pg_dump --format=d yzs -f test
[postgres@localhost ~]$ cd test
[postgres@localhost test]$ ll
total 8
-rw-r--r--. 1 postgres postgres   31 Mar 23 06:07 3010.dat.gz
-rw-r--r--. 1 postgres postgres 2124 Mar 23 06:07 toc.dat

2) Export SQL statements to test.sql

[postgres@localhost ~]$ pg_dump --format=p yzs -f test.sql

3) Export in custom (binary) format

[postgres@localhost ~]$ pg_dump --format=c -f test yzs

4) Export in tar format. It differs from the d format in that there is an additional restore.sql (a plain-format file) and all the files are packed into a single file.

[postgres@localhost ~]$ pg_dump --format=t -f test yzs
[postgres@localhost ~]$ tar -xvf test
toc.dat
3010.dat
restore.sql

5) Export only the database structure (if no database is specified, the default is postgres)

pg_dump -s yzs -f 1.sql

6) Also export DROP DATABASE and CREATE DATABASE statements. Note that if some user is connected to the database during the import, the DROP statement fails to execute.

pg_dump -s yzs -C -c -f 1.txt

7) -t exports only the specified tables; here only tables and similar objects whose names begin with temp are exported.

pg_dump -t 'temp*' -f 1.txt yzs

8) -n exports only the specified schema; multiple -n options can be given, and -N specifies schemas that are not exported.
