Handling a GoldenGate Failure

2025-07-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article walks through how a GoldenGate (gg) failure on one node of a RAC was diagnosed and handled, in the hope that the record is useful to anyone facing a similar problem.

Problem description:

Our production gg went live last Wednesday night, the evening of April 19, and was configured on node 3 of the RAC. When node 3 was restarted, the server came back up with only part of its original 32 GB of memory recognized; through negligence, I did not notice this at the time. After it had run for a few days, I found that once or twice a day the concurrency on node 3 became very high and the operating system load average soared from single digits to 50 or 60, leaving the database temporarily hung. At exactly those moments, the gg extract process was at the top of top, and the operating system had only a few tens of KB of free memory left. At first I suspected the NFS mounts, but they turned out to be fine. In the end, we decided to fix the memory problem on node 3 urgently. The details are as follows:
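Since the root cause was memory that was not fully recognized after a reboot, a post-reboot check against the expected size might have caught it early. The following is only a minimal sketch, assuming a Linux host that exposes the standard /proc/meminfo format; the function names and the 10% tolerance are illustrative choices, not part of the original procedure.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_shortfall_gib(meminfo_text, expected_gib, tolerance=0.10):
    """Return how many GiB are missing relative to expected_gib.

    MemTotal is always somewhat below the installed size because the kernel
    reserves memory, so a shortfall is only reported when the total falls
    below expected_gib * (1 - tolerance). Otherwise 0.0 is returned.
    """
    total_kb = parse_meminfo(meminfo_text)["MemTotal"]
    total_gib = total_kb / (1024 * 1024)
    if total_gib < expected_gib * (1 - tolerance):
        return expected_gib - total_gib
    return 0.0
```

On the node itself, the text would come from reading /proc/meminfo, and the expected size (32 for node 3 in this case) would be passed as expected_gib.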

After work ended at 6:00 p.m., nothing was done between 6:00 and 9:00, because the website and boss are relatively busy during that period. At 9:00, I notified the relevant OPS personnel to stop all Tomcat instances on node 3. I then stopped gg, unmounted the NFS shares, shut down all database processes on node 3, and finally shut down the server. The gg operations were as follows:

GGSCI (rac3) 21 > stop extksr1

GGSCI (rac3) 21 > stop dpksr1

GGSCI (rac3) 21 > stop mgr

During the process of stopping, the information in errlog is as follows:

2012-04-26 20:57:39 INFO OGG-00497 Oracle GoldenGate Capture for Oracle, extksr1.prm: Writing DDL operation to extract trail file.

2012-04-26 21:01:36 INFO OGG-00987 Oracle GoldenGate Command Interpreter for Oracle: GGSCI command (oracle): stop extksr1.

2012-04-26 21:01:38 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, extksr1.prm: Command received from GGSCI: STOP.

2012-04-26 21:01:39 INFO OGG-00991 Oracle GoldenGate Capture for Oracle, extksr1.prm: EXTRACT EXTKSR1 stopped normally.

2012-04-26 21:01:41 INFO OGG-00987 Oracle GoldenGate Command Interpreter for Oracle: GGSCI command (oracle): stop dpksr1.

2012-04-26 21:01:43 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpksr1.prm: Command received from GGSCI: STOP.

2012-04-26 21:01:43 INFO OGG-00991 Oracle GoldenGate Capture for Oracle, dpksr1.prm: EXTRACT DPKSR1 stopped normally.

2012-04-26 21:01:47 INFO OGG-00987 Oracle GoldenGate Command Interpreter for Oracle: GGSCI command (oracle): stop mgr.

2012-04-26 21:01:49 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GGSCI on host 10.1.8.49 (STOP).

2012-04-26 21:01:49 WARNING OGG-00938 Oracle GoldenGate Manager for Oracle, mgr.prm: Manager is stopping at user request.
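Before shutting the node down, it helps to confirm from the log that every group really stopped cleanly. The sketch below assumes the line layout seen in the excerpt above (timestamp, severity, OGG code, message); the regular expression and helper names are illustrative and have not been checked against other GoldenGate versions.

```python
import re
from typing import NamedTuple

# Layout assumed from the excerpt: "<date> <time> <SEVERITY> <OGG-nnnnn> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>INFO|WARNING|ERROR)\s+"
    r"(?P<code>OGG-\d{5})\s+"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    severity: str
    code: str
    message: str


def parse_line(line):
    """Return a LogEntry for a matching line, or None for anything else."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    return LogEntry(m["ts"], m["severity"], m["code"], m["message"])


def stopped_groups(lines):
    """List the groups that logged OGG-00991 ("stopped normally"), in order."""
    groups = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry.code == "OGG-00991":
            m = re.search(r"EXTRACT (\S+) stopped normally", entry.message)
            if m:
                groups.append(m.group(1))
    return groups
```

Fed the excerpt above, stopped_groups would report the two extract groups, which can then be compared with the list of groups that were expected to stop.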

After all the related processes were stopped, I stopped nfs and used umount to unmount node 1 and the shared storage. The commands are simple, so I will skip them. One point worth mentioning: unmounting the shared storage may fail because the resource is busy; adding the -l parameter to umount gets around this. Meanwhile, after the gg processes on the source side have stopped, the gg process on the target side still shows a RUNNING status, but its errlog reports that the extract has stopped:

2012-04-26 20:54:38 INFO OGG-00484 Oracle GoldenGate Delivery for Oracle, repksr1.prm: Executing DDL operation.

2012-04-26 20:54:38 INFO OGG-00483 Oracle GoldenGate Delivery for Oracle, repksr1.prm: DDL operation successful.

2012-04-26 20:54:38 INFO OGG-01408 Oracle GoldenGate Delivery for Oracle, repksr1.prm: Restoring current schema for DDL operation to [OGG].

2012-04-26 20:58:41 INFO OGG-01735 Oracle GoldenGate Collector: Synchronizing /home/oracle/ggs/trails/t1000239 to disk.

2012-04-26 20:58:41 INFO OGG-01670 Oracle GoldenGate Collector: Closing /home/oracle/ggs/trails/t1000239.

2012-04-26 20:58:41 INFO OGG-01675 Oracle GoldenGate Collector: Terminating because extract is stopped.
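The busy-resource workaround mentioned earlier for unmounting the shared storage can be sketched as a normal umount with a fallback to umount -l. This is an illustration only: the helper name, the injectable run parameter, and the check for the word "busy" in the error output are assumptions, and the author's actual commands were not shown.

```python
import subprocess


def unmount(mountpoint, run=subprocess.run):
    """Unmount mountpoint, falling back to a lazy unmount if it is busy.

    Returns "clean" if the plain umount worked and "lazy" if `umount -l`
    was needed. Raises RuntimeError for any other failure.
    """
    first = run(["umount", mountpoint], capture_output=True, text=True)
    if first.returncode == 0:
        return "clean"
    # Heuristic: umount reports a busy target in its error message.
    if "busy" not in first.stderr.lower():
        raise RuntimeError("umount failed: " + first.stderr.strip())
    second = run(["umount", "-l", mountpoint], capture_output=True, text=True)
    if second.returncode != 0:
        raise RuntimeError("umount -l failed: " + second.stderr.strip())
    return "lazy"
```

A lazy unmount detaches the filesystem immediately and cleans up remaining references once it is no longer busy, so it suits a node that is about to be shut down anyway, as in this case. The run parameter exists only so the logic can be exercised without touching real mounts.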

After the above steps were completed, I stopped the database-related processes and services on node 3 (details skipped), shut down the server, and notified the colleagues waiting in the machine room, who then started working on the memory problem. About 30 minutes later the memory problem was resolved, and once the server was back up I began the follow-up work:

The first step was to start the portmap and nfs services on node 3 (details skipped).

After that, I mounted nodes 1 and 2 and the shared storage. Starting the mgr process then reported an error, as follows:

2012-04-26 21:50:18 ERROR OGG-01117 Oracle GoldenGate Command Interpreter for Oracle: Received signal: Program interrupt (2).

2012-04-26 21:50:18 ERROR OGG-01668 Oracle GoldenGate Command Interpreter for Oracle: PROCESS ABENDING.

2012-04-26 21:51:43 INFO OGG-00987 Oracle GoldenGate Command Interpreter for Oracle: GGSCI command (oracle): start mgr.

2012-04-26 21:52:13 ERROR OGG-01454 Oracle GoldenGate Manager for Oracle, mgr.prm: Unable to lock file "/share_disk/ggs/dirpcs/MGR.pcm" (error 37, No locks available).

2012-04-26 21:52:13 ERROR OGG-01668 Oracle GoldenGate Manager for Oracle, mgr.prm: PROCESS ABENDING.
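Error 37 in the OGG-01454 line corresponds to ENOLCK ("No locks available") on Linux. As a rough pre-check before starting mgr, one could probe whether a lock can be taken on a scratch file on the shared mount. The sketch below uses Python's fcntl.lockf for this; whether GoldenGate uses the same locking call internally is an assumption, and the helper name is illustrative.

```python
import errno
import fcntl
import os


def can_lock(path):
    """Try a non-blocking exclusive lock on path; return (ok, reason).

    The file is created if it does not exist, so point this at a scratch
    file on the shared mount rather than at a real GoldenGate file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno == errno.ENOLCK:
                # Same condition the manager reported as "No locks available".
                return False, "no locks available (ENOLCK)"
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False, "locked by another process"
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True, "lock acquired and released"
    finally:
        os.close(fd)
```

If the probe returns the ENOLCK reason on an NFS mount, the lock service on the client is a likely suspect and worth checking before retrying the manager start.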

The OGG-01454 error above (shown in red in the original) roughly means that the mgr process cannot acquire the locks it needs on the shared storage, which directly causes subsequent operations to fail. The solution is very simple: start the nfslock service on node 3, and then start the mgr process. After mgr started, I found that the extract process had abended, with the following error about the extract in the errlog:

2012-04-26 21:54:34 INFO OGG-01026 Oracle GoldenGate Capture for Oracle, dpksr1.prm: Rolling over remote file /home/oracle/ggs/trails/t1000240.

2012-04-26 21:54:34 INFO OGG-01053 Oracle GoldenGate Capture for Oracle, dpksr1.prm: Recovery completed for target file /home/oracle/ggs/trails/t1000240, at RBA 1022.

2012-04-26 21:54:34 INFO OGG-01057 Oracle GoldenGate Capture for Oracle, dpksr1.prm: Recovery completed for all targets.

2012-04-26 21:54:35 ERROR OGG-00446 Oracle GoldenGate Capture for Oracle, extksr1.prm: Could not find archived log for sequence 16857 thread 3 under alternative destinations. SQL. Last alternative log tried /arch/rac3/3_16857_744833311.dbf, error retrieving redo file name for sequence 16857, archived = 1, use_alternate = 0Not able to establish initial position for sequence 16857, rba 1529360.

2012-04-26 21:54:35 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, extksr1.prm: PROCESS ABENDING.

The reason for this is simple: when node 3 was shut down, its VIP drifted to other nodes, so the archived logs belonging to node 3 ended up on other nodes. When gg tried to extract from node 3's archived logs, it could not find the required archived log in the expected directory, so the extract abended. Once the cause is clear, the fix is simple: copy node 3's archived logs over from the other nodes, then start the extract process, and everything is OK:

2012-04-26 21:57:22 INFO OGG-00993 Oracle GoldenGate Capture for Oracle, extksr1.prm: EXTRACT EXTKSR1 started.

2012-04-26 21:57:22 INFO OGG-01055 Oracle GoldenGate Capture for Oracle, extksr1.prm: Recovery initialization completed for target file /share_disk/ggs/trails/s1000239, at RBA 24518902.

2012-04-26 21:57:22 INFO OGG-01478 Oracle GoldenGate Capture for Oracle, extksr1.prm: Output file /share_disk/ggs/trails/s1 is using format RELEASE 10.4/11.1.

2012-04-26 21:57:23 INFO OGG-01517 Oracle GoldenGate Capture for Oracle, extksr1.prm: Position of first record processed for Thread 1, Sequence 29645, RBA 18568720, SCN 18.122009990, Apr 26, 2012 9:01:24 PM.

2012-04-26 21:57:23 INFO OGG-01517 Oracle GoldenGate Capture for Oracle, extksr1.prm: Position of first record processed for Thread 2, Sequence 28161, RBA 12794496, SCN 18.122010368, Apr 26, 2012 9:01:32 PM.

2012-04-26 21:57:24 INFO OGG-01026 Oracle GoldenGate Capture for Oracle, extksr1.prm: Rolling over remote file /share_disk/ggs/trails/s1000239.

2012-04-26 21:57:24 INFO OGG-01053 Oracle GoldenGate Capture for Oracle, extksr1.prm: Recovery completed for target file /share_disk/ggs/trails/s1000240, at RBA 1019.

2012-04-26 21:57:24 INFO OGG-01057 Oracle GoldenGate Capture for Oracle, extksr1.prm: Recovery completed for all targets.
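The OGG-00446 error seen before the archived logs were copied names the sequence, the thread, and the last path tried, which is everything needed to locate the missing file. The sketch below pulls those fields out of the message so the file can be checked for and copied before restarting the extract. The regular expression is derived only from the single message in this article, so treat the pattern and the helper names as assumptions.

```python
import os
import re

# Pattern assumed from the one OGG-00446 message shown in this article.
MISSING_RE = re.compile(
    r"Could not find archived log for sequence (?P<seq>\d+) thread (?P<thread>\d+)"
    r".*?Last alternative log tried (?P<path>[^,]+),"
)


def missing_archive(message):
    """Extract sequence, thread and path from an OGG-00446 message, or None."""
    m = MISSING_RE.search(message)
    if not m:
        return None
    return {
        "sequence": int(m["seq"]),
        "thread": int(m["thread"]),
        "path": m["path"].strip(),
    }


def archive_present(info, exists=os.path.isfile):
    """Report whether the archived log named in info is now in place."""
    return exists(info["path"])
```

After copying node 3's archived logs from the other nodes, archive_present can confirm that the file the extract asked for exists at the expected path before the extract is started again.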

gg source database:

GGSCI (rac3) 20 > info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     DPKSR1      00:00:00      00:00:00
EXTRACT     RUNNING     EXTKSR1     00:00:00      00:00:04

gg target database:

GGSCI (rptdb) 7 > info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     REPKSR1     00:00:00      00:00:00
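For the follow-up monitoring, the info all output shown above can be checked mechanically. The sketch below assumes the whitespace-separated layout shown in this article (program, status, optional group, then lag columns); the function names are illustrative, and how the output text is captured from GGSCI is left open.

```python
PROGRAMS = ("MANAGER", "EXTRACT", "REPLICAT")


def parse_info_all(output):
    """Turn `info all` text into a list of dicts, one per process row."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if not parts or parts[0] not in PROGRAMS:
            continue  # skip prompts, the header row and blank lines
        rows.append({
            "program": parts[0],
            "status": parts[1] if len(parts) > 1 else "",
            "group": parts[2] if len(parts) > 2 else None,
        })
    return rows


def not_running(rows):
    """Return the rows whose status is anything other than RUNNING."""
    return [row for row in rows if row["status"] != "RUNNING"]
```

An empty result from not_running means every listed process is up; any ABENDED or STOPPED row shows up with its group name, which is the cue to look at the errlog.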

Finally, after a period of observation, nothing was wrong with either the main site or gg. The whole process took about an hour, and we continued to observe and monitor over the following week.

Recorded here for reference ~~

