2025-03-29 Update From: SLTechnology News & Howtos > Database
Shulou (Shulou.com) 06/01 Report --
Note: before adding logical log files, be sure to read the command below carefully; do not run it out of habit!
onparams -a -d llogdbs -s 500000 -i
Original: http://www.cnblogs.com/haoxiaobo/archive/2013/01/10/2854973.html
The background: an Informix 11.5 database, with dual-machine HDR hot standby.
Something happened over the past two days: a large transaction exhausted the logical logs, the database entered a state called long-transaction blocking (Blocked:LONGTX), and it stopped serving. This article analyzes the principle and the way out.
1 The principle of long-transaction blocking
When a transaction starts, the database records its begin point in the current logical log file. While the transaction runs, that record and all subsequent logical logs are in an uncommitted state and must be retained; only after the transaction commits or rolls back can those logical logs be marked as used and recycled.
If a transaction performs many operations, it spans multiple logical log files. Once a transaction has used more than a certain number of log files, it is judged to be a "long transaction". Because rolling back also consumes logical logs, when the database sees that a long transaction has reached the rollback high-water mark, continuing to execute it might leave too few logs to guarantee the rollback; the database therefore interrupts the transaction and rolls it back immediately.
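In Informix this threshold is controlled by the ONCONFIG parameters LTXHWM (long-transaction high-water mark) and LTXEHWM (exclusive high-water mark). A minimal shell sketch of the decision, with illustrative numbers only (the percentages below are commonly cited defaults, not values taken from this system):

```shell
# Illustrative sketch only: Informix forces a rollback when one transaction
# spans LTXHWM percent of the logical logs, and gives that rollback exclusive
# use of the logs at LTXEHWM percent. All numbers below are made up.
total_logs=100   # logical log files configured
ltxhwm=50        # long-transaction high-water mark (%)
ltxehwm=60       # exclusive-access high-water mark (%)
used_by_tx=55    # log files spanned by one open transaction

pct=$(( used_by_tx * 100 / total_logs ))
if [ "$pct" -ge "$ltxhwm" ]; then
    echo "long transaction at ${pct}%: forcing rollback"
fi
if [ "$pct" -ge "$ltxehwm" ]; then
    echo "rollback now has exclusive use of the logical logs"
fi
```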
However, the rollback itself also needs logical logs. If the logs run short during the rollback, that is, the rollback uses up the remaining log files before it completes, the database enters "long-transaction blocking". This typically happens when one long transaction is rolling back while another transaction quickly consumes the remaining logs.
Note that "running out of logical logs" here is not a matter of whether ontape -c has backed them up. The transaction's begin point sits in the Nth logical log file, and execution has since wrapped around to file N-1 (Informix logical logs are circular, so reaching N-1 means the current position has caught up with the begin point): every logical log file now holds uncommitted records. In that situation, even log files already backed up by ontape -c cannot be reused, because the transaction has neither rolled back nor committed.
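The circular reuse just described can be sketched in a few lines of shell: after the highest-numbered file the server wraps back to file 1, which is why the current log position can "chase" a transaction's begin point. The slot count below is illustrative:

```shell
# The logical logs form a ring: after the highest-numbered file the server
# wraps back to file 1. Slot counts here are illustrative.
total=100
current=99
for i in 1 2 3; do
    current=$(( current % total + 1 ))
    echo "next logical log slot: $current"
done
```

Starting from slot 99, the sequence printed is 100, 1, 2: the ring wraps.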
Please refer to http://www.ibm.com/developerworks/cn/data/library/techarticles/dm-1001haodh/index.html
2 Symptoms and inspection
At this point, checking the database status gives something like this:
infodb% onstat -
IBM Informix Dynamic Server Version 11.50.FC6 -- On-Line (Prim LONGTX) -- Up 35 days 16:41:40 -- 3920896 Kbytes
Blocked:LONGTX
You can run onstat -x to examine the transactions:
infodb% onstat -x
IBM Informix Dynamic Server Version 9.40.FC7 -- On-Line (LONGTX) -- Up 35 days 16:41:56 -- 3920896 Kbytes
Blocked:LONGTX

Transactions
address    flags   userthread  locks  beginlg  curlog  logposit  isol    retrys  coord
1c8b2b298  A----   1c8aea078   0      -        -       0x0       COMMIT  0
1c8b2b508  A----   1c8ae9850   0      -        -       0x0       COMMIT  0
1cd4d7918  A-B---  1d44fdcb0   2      119408   119507  0x39722c  DIRTY   0
1cd4d8068  A----   1cd576e38   1      -        -       0x0       COMMIT  0
1cd4d82d8  A----   1cd577660   1      -        -       0x0       DIRTY   0
Note the transaction whose flags are "A-B---": the B flag indicates it has begun and is still open. Look at its beginlg, the logical log file number at which it began: 119408, while the current log has reached 119507. The difference plus 1 is 100, which is exactly the number of logical logs configured on this Informix system (yours may differ). In other words, this transaction has tied up every logical log file.
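As a sanity check, the span can be recomputed from the beginlg and curlog columns quoted above (total_logs is this system's configured log count):

```shell
# Values copied from the onstat -x row above; total_logs is this system's
# configured number of logical logs.
beginlg=119408   # log file in which the transaction began
curlog=119507    # log file currently being written
total_logs=100

spanned=$(( curlog - beginlg + 1 ))
echo "log files spanned by the transaction: $spanned"
if [ "$spanned" -ge "$total_logs" ]; then
    echo "every logical log file is pinned by this open transaction"
fi
```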
If you run onstat -l to check logical log file usage, you will see something like this:
address    number  flags    uniqid   begin     size   used   %used
2a273c368  27      U-B----  119506   7:250053  12500  12500  100.00
2a273c3d0  28      U---C--  119507   7:262553  12500   2313   18.50
2a273c438  29      U-B---L  119408   7:275053  12500  12500  100.00
2a273c4a0  30      U-B----  119409   7:287553  12500  12500  100.00
2a273c508  31      U-B----  119410   7:300053  12500  12500  100.00
If you look at the server's message log (the text file), you will see something like this:
17:59:34  Aborting Long Transaction: tx: 0x1cd4d7918 username: informix uid: 300
17:59:35  Long Transaction 0x1cd4d7918 Aborted. Rollback Duration: 0 Seconds
... (a long stretch of checkpoint and "logical log used up / backed up" messages) ...
18:03:22  ALERT: The oldest logical log (119408) contains records from an open transaction (0x1cd4d7918).
          Logical logging will remain blocked until a log file is added. Add the log file with the
          onparams -a command, using the -i (insert) option, as in:
              onparams -a -d <dbspace> -s <size> -i
          Then complete the transaction as soon as possible.
This means: the oldest logical log file contains an open transaction, and logical logging will stay blocked until a new logical log file is added. To add one, use onparams -a with the -i (insert) option, as in the command below, and then finish the transaction as soon as possible.
onparams -a -d <dbspace> -s <size> -i
This command means: add a logical log file in dbspace <dbspace>, of size <size> KB, inserted right after the current log.
3 Handling it the normal way
Use onstat -d to check whether any of your dbspaces with 4 KB pages has free space (on this system logical logs can only go into 4 KB-page chunks); for example, mine:
onstat -d

Dbspaces
address    number  flags    fchunk  nchunks  pgsize  flags  owner     name
2a0e75028  1       0x40001  1       1        4096    N B    informix  rootdbs
2a273fdc0  2       0x42001  2       1        8192    N TB   informix  tempdbs01
2a2740028  3       0x42001  3       1        8192    N TB   informix  tempdbs02
2a27401c0  4       0x42001  4       1        8192    N TB   informix  tempdbs03
2a2740358  5       0x42001  5       1        8192    N TB   informix  tempdbs04
2a27404f0  6       0x40001  6       1        4096    N B    informix  plogdbs
2a2740688  7       0x40001  7       2        4096    N B    informix  llogdbs
2a2740820  9       0x40001  19      70       8192    N B    informix  datadbs
 8 active, 2047 maximum

Chunks
address    chunk/dbs  offset  size     free     bpages  flags  pathname
2a0e751c0   1   1     0       500000   479443           PO-B-  /informix.links/bej/rootchk
2a27409b8   2   2     0       512000   511947           PO-B-  /informix.links/bej/tempchk01
2a2740ba8   3   3     0       512000   511947           PO-B-  /informix.links/bej/tempchk02
2a2740d98   4   4     0       512000   511947           PO-B-  /informix.links/bej/tempchk03
2a274b028   5   5     0       512000   511947           PO-B-  /informix.links/bej/tempchk04
2a274b218   6   6     0       512000   266947           PO-B-  /informix.links/bej/plogchk
2a274b408   7   7     0       2048000  172947           PO-B-  /informix.links/bej/llogchk1
2a274b5f8   8   7     0       2048000  2047997          PO-B-  /informix.links/bej/llogchk2
2a274b7e8   9   9     0       1024000  1023997          PO-B-  /informix.links/bej/indxchk01
2a274b9d8  10   9     0       1024000  1023997          PO-B-  /informix.links/bej/indxchk02
Note that in the first table, the dbspaces with a 4096-byte page size include rootdbs and llogdbs; llogdbs is a dbspace I set aside specifically for logical logs at installation time. In the second list, llogdbs has two chunks: the first, llogchk1, has 172947 pages free, and the second, llogchk2, is completely unused, a full 8 GB.
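To see where the "8 GB" comes from, convert the free counts from the onstat -d output above: chunk sizes are reported in pages, and llogdbs uses a 4 KB page size. A quick shell check:

```shell
# onstat -d reports chunk size/free in pages; llogdbs uses a 4 KB page size.
page_kb=4
llogchk1_free=172947    # free pages in llogchk1
llogchk2_free=2047997   # free pages in llogchk2

echo "llogchk1 free: $(( llogchk1_free * page_kb / 1024 )) MB"
echo "llogchk2 free: $(( llogchk2_free * page_kb / 1024 )) MB"
```

llogchk2 comes out at roughly 7999 MB, which is the "full 8 GB" mentioned above.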
If you have not reserved space like this in advance, you will have to find free space in another 4 KB dbspace such as rootdbs, and failing that, add a new chunk. That is beyond the scope of this article.
Now add logical log files to the database, as the message log suggested, with the following command:
onparams -a -d llogdbs -s 500000 -i
The size unit here is KB, so 500000 is roughly 500 MB. My original logical logs were 50 MB each, but this time, to be sure there was enough room to roll the transaction back, I created a 500 MB log file in one go.
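A quick check of the unit arithmetic (strictly, 500000 KB is about 488 MiB; the text rounds it to 500 MB):

```shell
# onparams -s takes KB, so 500000 KB is a little under 500 MB.
size_kb=500000
echo "$(( size_kb / 1024 )) MB"
```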
If all goes well, the database gets a new logical log and continues the transaction rollback; if that is still not enough, add more. But it is best to add enough in one go.
After a while the rollback succeeds, the long-transaction blocking state is lifted, execution falls back past the original begin point, and all the logical logs return to a recyclable state. Congratulations!
The logical log just added is very large. If you do not want to keep it, you can delete it with the following commands:
onmode -l                    # switch to the next logical log file
onmode -l                    # jump once more; a little safer
onmode -c                    # force a checkpoint
onparams -d -l <lognum> -y   # drop logical log file number <lognum>
You can find the logical log file number with onstat -l: locate the one you just added; column 2 (number) is its number.
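If the onstat -l output is long, a small awk filter can pick the new log out. This is only a sketch: the file name and sample rows below are invented, but the column layout follows onstat -l (number in column 2, flags in column 3), and on this system a newly added log carries an "A" flag until it is archived:

```shell
# Hypothetical helper: the file name and sample rows are invented; the column
# layout follows onstat -l (number in column 2, flags in column 3). A newly
# added log is marked with an "A" flag.
cat > /tmp/onstat_l_sample.txt <<'EOF'
2a273c368 27 U-B---- 119506 7:250053 12500 12500 100.00
2a273c3d0 28 U---C-L 119507 7:262553 12500 2313 18.50
2a273c4a0 32 A------ 0 7:312553 125000 0 0.00
EOF
# Print the number of every log whose flags contain "A" (newly added):
awk '$3 ~ /A/ { print $2 }' /tmp/onstat_l_sample.txt
```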
And that is the end of it. It is recommended to take a level-0 backup of the database immediately.
4 Bad luck: falling back to the standby
Sometimes, though, your luck is worse; hopefully you still have a standby machine. What actually happened yesterday was that the author was not so lucky: right after the operation of adding the logical log file, the database went down immediately and could not be brought back up.
19:34:32  Assert Failed: Unexpected virtual processor termination, pid = 213056, exit = 0x90009
19:34:32  IBM Informix Dynamic Server Version 11.50.FC6
19:34:32  Who: Session(5450877, life2@WIN-3XZYO8F2ZGA.lifebj.int, 3124, 7000002b0b548d0)
          Thread(5691992, sqlexec, 7000002a1335bb8, 1)
          File: mt.c Line: 14124
19:34:32  stack trace for pid 164620 written to /informix.dump/af.de4055c8
19:34:32  See Also: /informix.dump/af.de4055c8
19:34:36  mt.c Line 14124, thread 5691992, proc id 164620, Unexpected virtual processor termination, pid = 213056, exit = 0x90009.
19:34:38  The Master Daemon Died
19:34:38  PANIC: Attempting to bring system down
19:34:38  semctl: errno = 22
19:34:38  semctl: errno = 22

-- above is the error reported after adding the logical log file; below is the restart log --

19:58:59  Log file 1 added to DBspace 7.
19:58:59  Logical Log 59579 - Backup Completed
19:58:59  Assert Failed: Dynamic Server must abort
19:58:59  IBM Informix Dynamic Server Version 11.50.FC6
19:58:59  Who: Session(23, informix@bejlif, 0, 2a133e5b8)
          Thread(60, fast_rec, 2a1308878, 5)
          File: rslog.c Line: 3629
19:58:59  Results: Dynamic Server must abort
19:58:59  Action: Reinitialize shared memory
19:58:59  stack trace for pid 176584 written to /informix.dump/af.4245b83
19:58:59  See Also: /informix.dump/af.4245b83
19:59:02  rslog.c, line 3629, thread 60, proc id 176584, Dynamic Server must abort.
19:59:03  Fatal error in ADM VP at mt.c:13851
19:59:03  Unexpected virtual processor termination, pid = 176584, exit = 0x100
19:59:03  PANIC: Attempting to bring system down
You can see that adding the log file did not succeed; after the reboot, the database tried to redo the operation left incomplete by the crash, and failed again with an error. I do not know why; my guess is an Informix bug of some kind. At that point the primary was basically done for; there was nothing more I could do with it.
But problems that technology cannot solve can be solved by operations. Our database is an HDR primary/secondary pair. I checked onstat -x on the standby: the long transaction was stuck there as well, but the new log file had not been synchronized over. So I had to switch the standby from read-only secondary to standard (stand-alone) mode.
It is strongly recommended that before doing this you back up whatever you can from the standby; a read-only secondary can still be queried. The moment it enters standard mode, the unfinished long transaction resumes, long-transaction blocking returns, and you can no longer connect to the database. For a hapless admin, this is your last lifeline.
onmode -d standard
The standby immediately entered long-transaction blocking mode. That is expected: the standby must also finish rolling the transaction back.
Then perform the same operation of adding a logical log file:
onparams -a -d llogdbs -s 500000 -i
This time it worked! The standby quickly rolled the transaction back and returned to the On-Line state. Then continue with the steps described earlier:
onmode -l                    # switch to the next logical log file
onmode -l                    # jump once more; a little safer
onmode -c                    # force a checkpoint
onparams -d -l <lognum> -y   # drop logical log file number <lognum>
Check the data on the standby and confirm it is current; if the business data runs right up to the moment of blocking, you have lost nothing.
The next thing to do is take a level-0 backup from the standby to tape:
ontape -s -L 0
Mount the tape on the primary and perform the restore:
ontape -r
When it finishes, run onmode -m to bring the server into on-line mode, and check the data.
Once the primary is restored, put the same tape back on the standby and perform a physical restore:
ontape -p
When that completes, re-establish HDR between the primary and the standby:
On the primary:
onmode -d primary <secondary server name>
On the standby:
onmode -d secondary <primary server name>
I will not go into the details of the backup, restore, and HDR rebuilding above; please follow your own manuals.
5 The worst outcome
What if the whole process above fails and you have no standby either? Then hopefully you have your regular backup tapes; if so, restore from them following your own manual, and find some way to deal with the data updated since the last backup.
No backup tapes either? Then what exactly are you paid for?
That said, I have heard that IBM has ways to delete or repair the damaged logical logs and get the machine running again, since the data itself is still in the database.
But IBM's Informix support is very expensive, and... support for this product has been discontinued.