In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
In the PostgreSQL HA environment based on streaming replication, such as network access / hardware failure, the Standby node is upgraded to Master node, but the database of the old Master node is not damaged. After troubleshooting, does the old Master node not need to be rebuilt into a Standby node by backup? The answer is yes, PG provides pg_rewind as a tool implementation.
Principle
In a PostgreSQL HA environment, after a Standby node is upgraded to a Master node, the timeline will be switched to a new timeline, such as from 1 to 2. While the timeline of the old Master node is still the original timeline, for example, it is still 1, then using the pg_rewind tool, how can the old Master node read the relevant data from the new Master node and become the new Standby node?
To put it simply, there are the following steps:
1. Determine the Checkpoint location where the new Master and old Master data are consistent. In this location, the new Master data is exactly the same as the old Master data. This can be obtained by reading the old and new Master node timeline history files, which are located in the $PGDATA/pg_wal/ directory and are named XX.history
two。 The old Master node reads the local log file WAL Record according to the Checkpoint obtained in the previous step, obtains the Block that changes after this Checkpoint, and stores information such as Block number in the way of linked list.
3. Copy the corresponding Block from the new Master node according to the Block information obtained in step 2, and replace the corresponding Block of the old Master node
4. Copy all other files except data files on the new Master node, including configuration files, etc. (if you copy data files, it is not different from backup)
5. The old Master starts the database and applies the WAL Record from Checkpoint.
Example
Considering the HA environment of two nodes, the old Master node IP is 192.168.26.25 the standby node (new Master) IP is 192.168.26.26, the simulation main library is down, the standby database is upgraded to the main database, and then the old Master node is switched to the Standby node through pg_rewind.
Before handover, 26.25 nodes related information
Testdb=# select * from pg_stat_replication -[RECORD 1]-+-- pid | 1537usesysid | 90113usename | replicatorapplication_name | standby_26client_addr | 192.168.26.26client_hostname | client_port | 53164backend_start | 2019-03-27 15:19:09.254987+08backend_xmin | 654state | streamingsent_lsn | | 0/6B000060write_lsn | 0/6B000060flush_lsn | 0/6B000060replay_lsn | 0/6B000060write_lag | 00:00:00.156457flush_lag | 00:00:00.158792replay_lag | 00:00:00.158815sync_priority | 0sync_state | asynctestdb=# testdb=# select pg_is_in_recovery () | -[RECORD 1]-+-- pg_is_in_recovery | f
Insert test data
Testdb=# create table t_oldmaster (id int); CREATE TABLEtestdb=# insert into t_oldmaster select generate_series (1pm 1000000); INSERT 0 1000000testdb=#
26.26Node information before switching, data has been replicated, role is Standby
Testdb=# select count (*) from tactile mastery; count-1000000 (1 row) testdb=# select pg_is_in_recovery (); pg_is_in_recovery-t (1 row)
Simulated Master node downtime
[xdb@localhost testdb] $ps-ef | grep postgresxdb 1353 1 0 14:46 pts/1 00:00:00 / appdb/xdb/pg11.2/bin/postgresxdb 1354 1353 0 14:46? 00:00:00 postgres: logger xdb 1356 1353 0 14:46? 00:00:00 postgres: checkpointer xdb 1357 1353 0 14:46? 00:00:00 postgres: background writer xdb 1358 1353 0 14:46? 00:00:00 postgres: walwriter xdb 1359 1353 0 14:46? 00:00:00 postgres: autovacuum launcher xdb 1360 1353 0 14:46? 00:00:00 postgres: archiver last was 00000010000000000000006Dxdb 1361 1353 0 14:46? 00:00:00 postgres: stats collector xdb 1362 1353 0 14:46? 00:00:00 postgres: logical replication launcher xdb 1418 1353 0 15:11? 00:00:02 postgres: xdb testdb [local] idlexdb 1537 1353 15:19? 00:00:00 postgres: walsender replicator 192.168.26.26 (53164) streaming 0/6ED4FDA8xdb 1555 1317 0 15:25 pts/1 00:00:00 grep-color=auto postgres [xdb@localhost testdb] $kill-9 1353 [xdb@localhost testdb] $ps-ef | grep postgresxdb 1557 1317 0 15:25 pts/ 00:00:00 grep-- color=auto postgres [xdb@localhost testdb] $
Standby node performs handover
Edit the recovery.conf file to add trigger_file and recovery_target_timeline
[xdb@localhost testdb] $cat recovery.conf standby_mode = 'on'primary_conninfo =' host=192.168.26.25 port=5432 user=replicator password=replicator application_name=standby_26'restore_command ='cp / data/archivelog/%f% p'archive_cleanup_command = 'pg_archivecleanup / data/archivelog% r'trigger_file =' / tmp/trigger_file'recovery_target_timeline = 'latest'# generate trigger file [xdb@localhost testdb] $touch / tmp/trigger_file
Restart the Standby node
[xdb@localhost testdb] $pg_ctl restartpg_ctl: PID file "/ data/pgsql/testdb/postmaster.pid" does not existIs server running?trying to start server anywaywaiting for server to start....2019-03-27 15V 33data/pgsql/testdb/postmaster.pid 28.218 CST [1901] LOG: listening on IPv4 address "0.0.0.0", port 54322019-03-27 15JZ 3319 CST [1901] LOG: listening on IPv6 address ":" Port 54322019-03-27 15 tmp/.s.PGSQL.5432 33 CST 28.223 CST [1901] LOG: listening on Unix socket "/ tmp/.s.PGSQL.5432" 2019-03-27 15 33 Switzerland 28.253 CST [1901] LOG: redirecting log output to logging collector process2019-03-27 15 15 Swiss 33 CST [1901] HINT: Future log output will appear in directory "pg_log".. Doneserver started [xdb@localhost testdb] $
Log output
2019-03-27 15 entering standby mode 33 entering standby mode 28.259 CST,1903,5c9b2748.76f,2,2019-03-27 15:33:28 CST,0,LOG,00000, "entering standby mode", "2019-03-27 15 15 entering standby mode 33 Swiss 28.264 CST,1903,5c9b2748.76f,3,2019-03-27 15:33:28 CST,1/0,0,LOG,00000," redo starts at 0/6B0164C0 " "2019-03-27 15 consistent recovery state reached at 0/6ED4FDA8 33 CST,1903,5c9b2748.76f,4,2019 29.416 CST,1903,5c9b2748.76f,4,2019-03-27 15:33:28 CST,1/0,0,LOG,00000," consistent recovery state reached at 0/6ED4FDA8 "," 2019-03-27 15 15 consistent recovery state reached at 0/6ED4FDA8 33 Swiss 29.416 CST,1903,5c9b2748.76f,5,2019-03-27 15:33:28 CST,1/0,0,LOG,00000, "invalid record length at 0/6ED4FDA8: wanted 24, got 0" , "2019-03-27 15 trigger file found 33 CST,1903,5c9b2748.76f,6,2019 29.416 CST,1903,5c9b2748.76f,7,2019-03-27 15:33:28 CST,1/0,0,LOG,00000," trigger file found: / tmp/trigger_file "," 2019-03-27 15 15 Swiss 33 Swiss 29.416 CST,1903,5c9b2748.76f,7,2019-03-27 15:33:28 CST,1/0,0,LOG,00000 "redo done at 0/6ED4FD70", "2019-03-27 15 CST,1/0,0,LOG,00000 3329.416 CST,1903,5c9b2748.76f,8,2019-03-27 15:33:28 CST,1/0,0,LOG,00000," last completed transaction was at log time 2019-03-27 15 last completed transaction was at log time 21 last completed transaction was at log time 57.519456 CST,1901,5c9b2748.76d,2 "," 2019-03-27 15 Swiss 3329.416 2019-03-27 15:33:28 CST,0,LOG,00000, "database system is ready to accept read only connections", "2019-03-27 15 CST,1/0,0,LOG,00000 33 CST,1903,5c9b2748.76f,9,2019-03-27 15:33:28 CST,1/0,0,LOG,00000," selected new timeline ID: 17 ","
The Standby node has been upgraded to a new Master node
Rename the recovery.conf file to recovery.done
/ tmp/trigger_file file deleted
Pg_is_in_recovery returns false
[xdb@localhost testdb] $cat recovery.done standby_mode = 'on'primary_conninfo =' host=192.168.26.25 port=5432 user=replicator password=replicator application_name=standby_26'restore_command ='cp / data/archivelog/%f% p'archive_cleanup_command = 'pg_archivecleanup / data/archivelog% r'trigger_file =' / tmp/trigger_file'recovery_target_timeline = 'latest' [root@localhost testdb] # ls-l / tmp/trigger_filels: cannot access / tmp/trigger_file: No Such file or directory [xdb@localhost testdb] $psql-d testdbpsql Type "help" for help.testdb=# select pg_is_in_recovery () Pg_is_in_recovery-f (1 row)
The timeline history file is generated in the pg_wal directory.
[xdb@localhost testdb] $ls. / pg_wal/00000010000000000000006E.partial 000000110000000000006F 00000011000000000000007200000010.history 00001100000000000070 00000011.history00000011000000000000006E 0000001100000000000071 archive_ status [XDB @ localhost testdb] $cat. / pg_wal/00000011.history 70 / 27000000 no recovery target specified8 0/2A0012E8 no recovery target specified12 0U32000000 no recovery target specified14 0/39A63BD0 no recovery target specified15 0/40A63B08 no recovery target specified16 0/6ED4FDA8 no recovery target specified [xdb@localhost testdb] $ Cat. / pg_wal/00000010.history 70 / 27000000 no recovery target specified8 0/2A0012E8 no recovery target specified12 0/32000000 no recovery target specified14 0/39A63BD0 no recovery target specified15 0/40A63B08 no recovery target specified
Generate test data
Testdb=# create table t_new_master (id int); CREATE TABLEtestdb=# insert into t_new_master select generate_series (1pm 1000000); INSERT 0 1000000testdb=#
Old Master nodes are downgraded to Standby nodes
Restart the old master node and insert test data to simulate data changes after the timeline bifurcates
[xdb@localhost testdb] $pg_ctl startpg_ctl: another server might be running Trying to start server anywaywaiting for server to start....2019-03-27 15 listening on IPv4 address 39 LOG [1561] LOG: listening on IPv4 address "0.0.0.0", port 54322019-03-27 15 LOG 39 LOG [1561] LOG: listening on IPv6 address ":" Port 54322019-03-27 15 listening on Unix socket 39 listening on Unix socket [1561] LOG: listening on Unix socket "/ tmp/.s.PGSQL.5432" 2019-03-27 15 15 pg_log CST [1561] LOG: redirecting log output to logging collector process2019-03-27 15 15 HINT: Future log output will appear in directory "pg_log".. Doneserver started [xdb@localhost testdb] $psql-d testdbpsql (11.2) Type "help" for help.testdb=# testdb=# create table t_fork (id int); CREATE TABLEtestdb=# insert into t_fork select generate_series (1meme 100000); INSERT 0 100000testdb=# testdb=#\ Q [xdb@localhost testdb] $
Close the old main library and execute pg_rewind
[xdb@localhost testdb] $pg_ctl stopwaiting for server to shut down.... Doneserver stopped [xdb@localhost testdb] $pg_rewind-target-pgdata=$PGDATA-- source-server= "host=192.168.26.26 port=5432 dbname=testdb"-- progress-- debug-nconnected to serverfetched file "global/pg_control", length 8192target server needs to use either data checksums or "wal_log_hints = on" Failure, exiting
Restart the database by setting the wal_log_hints parameter on in the master-slave setting postgres.conf
[xdb@localhost testdb] $pg_ctl startwaiting for server to start....2019-03-27 15 listening on IPv4 address 54 listening on IPv4 address 56.038 CST [1603] LOG: listening on IPv4 address "0.0.0.0", port 54322019-03-27 15V 54 Swiss 56.038 CST [1603] LOG: listening on IPv6 address ":" Port 54322019-03-27 15 listening on Unix socket 54 listening on Unix socket 56.042 CST [1603] LOG: listening on Unix socket "/ tmp/.s.PGSQL.5432" 2019-03-27 15 15 listening on Unix socket 54 listening on Unix socket 56.079 CST [1603] LOG: redirecting log output to logging collector process2019-03-27 15 15 tmp/.s.PGSQL.5432 54 listening on Unix socket 56.079 CST [1603] HINT: Future log output will appear in directory "pg_log". Doneserver started [xdb@localhost testdb] $pg_ctl stopwaiting for server to shut down.... Doneserver stopped
Execute pg_rewind again
[xdb@localhost testdb] $pg_rewind-target-pgdata=$PGDATA-source-server= "host=192.168.26.26 port=5432 dbname=testdb"-progress-debug-nconnected to serverfetched file "global/pg_control", length 8192fetched file "pg_wal/00000011.history" Length 261Source timeline history:Target timeline history:7: 0No such file or directorycould not find previous WAL record at 0/6ED4FDA8Failure 0-0No such file or directorycould not find previous WAL record at 0/6ED4FDA8Failure 270000008: 0No such file or directorycould not find previous WAL record at 0/6ED4FDA8Failure 27000000-0/2A0012E812: 0/2A0012E8-0Unique 3200000014: 0 No such file or directorycould not find previous WAL record at 0/6ED4FDA8Failure 32000000-0/39A63BD015: 0/39A63BD0-0/40A63B0816: 0/40A63B08-0/0servers diverged at WAL location 0/6ED4FDA8 on timeline 16could not open file "/ data/pgsql/testdb/pg_wal/00000010000000000000006E": No such file or directorycould not find previous WAL record at 0/6ED4FDA8Failure, exiting [xdb@localhost testdb] $
Copy archived wal files for native and new Master nodes
[xdb@localhost testdb] $scp-P 22 root@192.168.26.26:/data/archivelog/* $PGDATA/pg_wal/The authenticity of host '192.168.26.26 (192.168.26.26)' can't be established.ECDSA key fingerprint is SHA256:wT+B26a2wJvmsOz8QH2UK6QUqodwBYvDu/WMrmqZ5fU.ECDSA key fingerprint is MD5:79:63:4d:55:83:d9:d6:78:21:d1:b6:a1:e8:74:0f:b0.Are you sure you want to continue connecting (yes/no)? YesWarning: Permanently added '192.168.26.26' (ECDSA) to the list of known hosts.root@192.168.26.26's password: 00000010000000000000006E.partial 100% 16MB 44.1MB/s 00:00 000000110000000000006E 100% 16MB 71.5MB/s 00:00 000000110000000000006F 100% 16MB 47.0MB/s 00:00 000000110000000000070100% 16MB 26.7MB/s 00:00 00001100000000000071 100% 16MB 26.7MB/s 00:00 00000011000000000007100% 16MB 31.6MB/s 00:00 0000001100000000000073 100% 16MB 37.1MB/s 00:00 00000011.history 100% 261 323.1KB/s 00:00 [xdb@localhost testdb] $ls. / pg_wal/00000010000000000000006A.00000028.backup 0000001000000000000074 00000010.history 00000011000000000000007300000010000000000000006E.partial 00000000000000007500000011000000000000006E 00000011.history000000100000000000000070 00000000000000760011000000000000006F archive_status000000100000000000000071 0000001000000000000077 00001100000000070 RECOVERYHISTORY000000100000000000000072 00000010000000000078 00001100000000000710000000000073 00000010000000000079 00001100000000000000072 [xdb@localhost testdb] $cp / data/archivelog/20190327/00000010000000000000006*. / pg_wal
Re-execute pg_rewind
[xdb@localhost testdb] $pg_rewind-- target-pgdata=$PGDATA-- source-server= "host=192.168.26.26 port=5432 dbname=testdb"-- progress-nconnected to serverservers diverged at WAL location 0/6ED4FDA8 on timeline 16rewinding from last common checkpoint at 0/6B0164F8 on timeline 16reading source file listreading target file listreading WAL in targetneed to copy 296 MB (total source directory size is 437 MB) 303659 kB copiedcreating backup label and updating control filesyncing target data directoryDone!
Modify the configuration file recovery.conf
[xdb@localhost testdb] $scp-P 22 root@192.168.26.26:$PGDATA/recovery.done $PGDATAroot@192.168.26.26's password: recovery.done 100% 317 312.7KB/s 00:00 [xdb@localhost testdb] $mv recovery.done recovery.conf [xdb@localhost testdb] $vim recovery. Conf [xdb@localhost testdb] $cat recovery.conf standby_mode = 'on'primary_conninfo =' host=192.168.26.26 port=5432 user=replicator password=replicator application_name=standby_25'restore_command ='cp / data/archivelog/%f% p'archive_cleanup_command = 'pg_archivecleanup / data/archivelog% r' [xdb@localhost testdb] $
Restart the original main library
New Master node
There is an error in the log, and there is a problem with the construction.
2019-03-27 16 idle 2219.800 CST, "replicator", "2360," 192.168.26.25 "32974", 5c9b32bb.938 CST,4/0,0,ERROR,XX000 1, "idle", 2019-03-27 16:22:19 CST,4/0,0,ERROR,XX000, "requested starting point 0 at 0/6ED4FDA8 71000000 on timeline 16 is not in this server's history", "This server's history forked from timeline 16 at 0/6ED4FDA8.", "standby_25" 2019-03-27 16 idle 2224 CST, "replicator" ", 2361," 192.168.26.25 on timeline 32976 ", 5c9b32c0.939 CST,4/0,0,ERROR,XX000 1," idle ", 2019-03-27 16:22:24 CST,4/0,0,ERROR,XX000," requested starting point 0 at 0/6ED4FDA8 71000000 on timeline 16 is not in this server's history "," This server's history forked from timeline 16 at 0/6ED4FDA8. "," standby_25 "
As a case of failure, for reference.
references
Magical pg_rewind
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.