What are database failures and their recovery strategies? 07/19 Update SLTechnology News&Howtos

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains the types of database failure and the recovery strategy for each type. I hope the detailed introduction below is useful to you.

During database operation, various failures may occur. They fall into three categories: transaction failures, system failures, and media failures. Each type of failure calls for a different recovery strategy.

1. Transaction failure and recovery:

A transaction failure is caused by the unexpected, abnormal termination of a transaction's program before it completes.

Causes of abnormal termination include invalid input data, arithmetic overflow, storage protection violations, and deadlocks among concurrent transactions.

When a transaction failure occurs, the interrupted transaction may already have modified the database. To eliminate its effects, the system uses the information recorded in the log file to forcibly roll back (ROLLBACK) the transaction, restoring the database to the state it was in before the transaction made any changes.

To do this, the system searches the log file for the changes made by the incomplete transaction and undoes each of them.

This type of recovery is called transaction undo (UNDO). It proceeds as follows.

(1) Scan the log file backward (from the end toward the beginning) to find the update operations of the transaction.

(2) Perform the inverse of each update operation of the transaction, that is, write the old value recorded in the log back to the database: delete a record the transaction inserted, re-insert a record it deleted, and for a modification replace the new value with the old value. Continue scanning backward, processing each of the transaction's updates in the same way, until the transaction's start record is reached. At that point, recovery from the transaction failure is complete.
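
The backward undo scan described above can be sketched in Python. This is a minimal toy model, not how any particular DBMS implements it: it assumes the database is a plain `dict` and that each log entry is a hypothetical `LogRecord` carrying the transaction ID, the operation kind, the key, and the before and after images.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LogRecord:
    # Hypothetical log record layout, for illustration only.
    txid: str
    kind: str                  # "BEGIN", "INSERT", "DELETE", "UPDATE" or "COMMIT"
    key: Optional[str] = None
    old: Any = None            # before image (value prior to the change)
    new: Any = None            # after image (value after the change)


def undo_transaction(db: dict, log: list, txid: str) -> None:
    """Undo one transaction by scanning the log backward and applying the
    inverse of each of its updates until its BEGIN record is reached."""
    for rec in reversed(log):
        if rec.txid != txid:
            continue                  # ignore records of other transactions
        if rec.kind == "BEGIN":
            return                    # start mark reached: undo is complete
        if rec.kind == "INSERT":
            db.pop(rec.key, None)     # delete the inserted record
        elif rec.kind in ("DELETE", "UPDATE"):
            db[rec.key] = rec.old     # re-insert deleted record / restore old value


# Usage: T1 changed a from 1 to 10, deleted b (was 2), inserted c = 3, then failed.
db = {"a": 10, "c": 3}
log = [
    LogRecord("T1", "BEGIN"),
    LogRecord("T1", "UPDATE", "a", old=1, new=10),
    LogRecord("T1", "DELETE", "b", old=2),
    LogRecord("T1", "INSERT", "c", new=3),
]
undo_transaction(db, log, "T1")
print(db)  # {'a': 1, 'b': 2}
```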

Thus, a transaction is both a unit of work and a unit of recovery. The shorter a transaction, the easier it is to undo. A long-running application should therefore be split into several transactions, each ended with an explicit COMMIT statement.
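
As a rough illustration of splitting work into short transactions, the sketch below uses Python's standard `sqlite3` module. The table `items` and the helper `load_in_batches` are hypothetical. Each batch is committed explicitly, so a failure inside one batch rolls back only that batch and leaves earlier batches intact.

```python
import sqlite3


def load_in_batches(conn: sqlite3.Connection, rows: list, batch_size: int) -> int:
    """Insert rows as a series of short transactions; return rows committed.

    Each batch is its own transaction: it ends with an explicit COMMIT, or is
    rolled back if an insert in it fails, after which loading stops.
    """
    committed = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            # With default settings, sqlite3 implicitly opens a transaction before the INSERT.
            conn.executemany("INSERT INTO items(id, name) VALUES (?, ?)", batch)
            conn.commit()            # explicit COMMIT ends this short transaction
            committed += len(batch)
        except sqlite3.IntegrityError:
            conn.rollback()          # undo only the failed batch
            break
    return committed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, name TEXT)")
rows = [(1, "a"), (2, "b"), (3, "c"), (3, "dup"), (5, "e")]
print(load_in_batches(conn, rows, batch_size=2))                 # 2
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 2
```

The second batch fails on the duplicate key, so its first row `(3, "c")` is rolled back as well, while the committed first batch survives.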

2. System failure and recovery:

A system failure is an event that stops the system while it is running, terminating all running transactions abnormally and requiring the system to be restarted. System failures may be caused by hardware errors (such as a CPU failure), operating system errors, DBMS code errors, a sudden power failure, and so on.

When this happens, everything in the in-memory database buffer is lost. The database on external storage is not destroyed, but its contents can no longer be trusted. A system failure can affect the database in two ways.

In the first case, updates made by some unfinished transactions have already been written to the database. After the system restarts, all unfinished transactions must therefore be forcibly undone (UNDO) to remove their modifications. In the log file, such an unfinished transaction has a BEGIN TRANSACTION record but no COMMIT record.

In the second case, updates made by some committed transactions were still in the buffer and had not yet been written to the physical database on disk. This also leaves the database inconsistent, so the results of these committed transactions must be written to the database again. This type of recovery is called transaction redo (REDO). In the log file, such a committed transaction has both a BEGIN TRANSACTION record and a COMMIT record.

Recovery from a system failure therefore requires both undoing all unfinished transactions and redoing all committed transactions, so that the database is truly restored to a consistent state. The procedure is as follows.

(1) Scan the log file forward (from beginning to end) to find the transactions that had not committed before the failure, and record their transaction identifiers in the undo queue. At the same time, find the transactions that had already committed, and record their transaction identifiers in the redo queue.

(2) Undo the transactions in the undo queue, using the same method described above for transaction failures.

(3) Redo the transactions in the redo queue. To redo them, scan the log file forward and re-execute each operation recorded in the log for these transactions, which restores the database to its most recent consistent state.
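
The three-step procedure can be sketched with the same kind of toy model: a `dict` standing in for the on-disk database and a hypothetical `LogRecord` type holding before and after images. The function name `recover_after_crash` is made up for illustration and does not correspond to any real DBMS API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LogRecord:
    # Hypothetical log record layout, for illustration only.
    txid: str
    kind: str                  # "BEGIN", "INSERT", "DELETE", "UPDATE" or "COMMIT"
    key: Optional[str] = None
    old: Any = None            # before image
    new: Any = None            # after image


def recover_after_crash(db: dict, log: list) -> tuple:
    # (1) Forward scan: build the undo queue and the redo queue.
    begun, committed = [], set()
    for rec in log:
        if rec.kind == "BEGIN":
            begun.append(rec.txid)
        elif rec.kind == "COMMIT":
            committed.add(rec.txid)
    undo_queue = [t for t in begun if t not in committed]
    redo_queue = [t for t in begun if t in committed]

    # (2) Undo: scan backward, restoring before images of unfinished transactions.
    for rec in reversed(log):
        if rec.txid in undo_queue:
            if rec.kind == "INSERT":
                db.pop(rec.key, None)
            elif rec.kind in ("DELETE", "UPDATE"):
                db[rec.key] = rec.old

    # (3) Redo: scan forward, re-applying after images of committed transactions.
    for rec in log:
        if rec.txid in redo_queue:
            if rec.kind in ("INSERT", "UPDATE"):
                db[rec.key] = rec.new
            elif rec.kind == "DELETE":
                db.pop(rec.key, None)
    return undo_queue, redo_queue


# On disk after the crash: T2's uncommitted write to y reached disk,
# but T1's committed write to x was still in the lost buffer.
db = {"x": 0, "y": 7}
log = [
    LogRecord("T1", "BEGIN"),
    LogRecord("T1", "UPDATE", "x", old=0, new=5),
    LogRecord("T2", "BEGIN"),
    LogRecord("T2", "UPDATE", "y", old=0, new=7),
    LogRecord("T1", "COMMIT"),
]
print(recover_after_crash(db, log))  # (['T2'], ['T1'])
print(db)                            # {'x': 5, 'y': 0}
```

Because the undo pass writes before images and the redo pass writes after images, running either pass a second time gives the same result, so in this model recovery can safely be repeated if the system fails again during restart.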

Without further information, the system cannot tell after a failure which unfinished transactions have already updated the database and which committed transactions' results have not yet been written to it. After restarting, it must therefore undo all unfinished transactions and redo all committed transactions.

However, many transactions finished, whether normally or abnormally, well before the failure, and their effects may already be reflected in the physical database, so it is wasteful to undo or redo all of them.

Checkpoints are usually used to determine which transactions actually need recovery. At regular intervals, for example every 5 minutes, the system takes a checkpoint by doing the following:

(a) Write whatever remains in the log buffer to the log file.

(b) Write a "checkpoint record" to the log file.

(c) Write the contents of the database buffer to the database, that is, write the updated data to the physical database.

(d) Write the address of the checkpoint record in the log file to the "restart file."

Each checkpoint record contains a list of all transactions active at the time of the checkpoint, together with the address of each transaction's most recent log record.
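
The four checkpoint steps can be modeled as follows. The `ToyDBMS` class, its field names, and the use of list indexes as log addresses are all assumptions made for this sketch; real systems typically use byte offsets or log sequence numbers and flush buffers page by page.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDBMS:
    # Hypothetical in-memory model of the structures a checkpoint touches.
    log_buffer: list = field(default_factory=list)    # log records not yet on disk
    log_file: list = field(default_factory=list)      # log records on stable storage
    db_buffer: dict = field(default_factory=dict)     # updated values not yet on disk
    disk_db: dict = field(default_factory=dict)       # the physical database
    active: dict = field(default_factory=dict)        # txid -> address of its latest log record
    restart_file: dict = field(default_factory=dict)

    def take_checkpoint(self) -> int:
        # (a) Write what remains in the log buffer to the log file.
        self.log_file.extend(self.log_buffer)
        self.log_buffer.clear()
        # (b) Write a checkpoint record listing the active transactions and
        #     the address of each one's most recent log record.
        ckpt_addr = len(self.log_file)
        self.log_file.append(("CHECKPOINT", dict(self.active)))
        # (c) Write the database buffer contents to the physical database.
        self.disk_db.update(self.db_buffer)
        self.db_buffer.clear()
        # (d) Store the checkpoint record's address in the restart file.
        self.restart_file["last_checkpoint"] = ckpt_addr
        return ckpt_addr


# T1 is still running; its update to x sits only in the buffers.
dbms = ToyDBMS(
    log_buffer=[("T1", "BEGIN"), ("T1", "UPDATE", "x", 0, 5)],
    db_buffer={"x": 5},
    disk_db={"x": 0},
    active={"T1": 1},
)
print(dbms.take_checkpoint())  # 2
print(dbms.log_file[2])        # ('CHECKPOINT', {'T1': 1})
print(dbms.disk_db)            # {'x': 5}
```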

When the system restarts, the recovery manager first obtains the address of the checkpoint record from the "restart file" and reads that checkpoint record from the log file. It then scans the log forward from the checkpoint to determine which transactions must be undone (restored to their initial state) and which must be redone. Transactions that committed before the checkpoint need neither, because their updates were already written to the physical database when the checkpoint was taken. Using checkpoint information in this way allows recovery to be completed quickly, efficiently, and correctly.
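
A sketch of how the restart procedure might use the checkpoint to build its undo and redo lists, again as a toy model with a hypothetical tuple-based log: entries are `(txid, kind, ...)` tuples or a `("CHECKPOINT", {txid: address})` record, and the restart file is a `dict`. Transactions that committed before the checkpoint are never examined.

```python
def classify_from_checkpoint(log: list, restart_file: dict) -> tuple:
    """Return (undo_list, redo_list) using the last checkpoint (toy model)."""
    start = restart_file["last_checkpoint"]   # address read from the restart file
    marker, active = log[start]
    if marker != "CHECKPOINT":
        raise ValueError("restart file does not point at a checkpoint record")
    undo_list = list(active)     # transactions active at checkpoint time
    redo_list = []
    for rec in log[start + 1:]:  # scan forward from the checkpoint
        txid, kind = rec[0], rec[1]
        if kind == "BEGIN":
            undo_list.append(txid)       # started after the checkpoint
        elif kind == "COMMIT" and txid in undo_list:
            undo_list.remove(txid)       # committed: move from undo to redo
            redo_list.append(txid)
    return undo_list, redo_list


log = [
    ("T0", "BEGIN"), ("T0", "UPDATE", "a", 0, 1), ("T0", "COMMIT"),
    ("T1", "BEGIN"), ("T1", "UPDATE", "b", 0, 2),
    ("CHECKPOINT", {"T1": 4}),
    ("T2", "BEGIN"), ("T1", "COMMIT"),
    ("T3", "BEGIN"), ("T3", "UPDATE", "c", 0, 3),
]
print(classify_from_checkpoint(log, {"last_checkpoint": 5}))  # (['T2', 'T3'], ['T1'])
```

This sketch only classifies transactions; the actual undo and redo would then proceed as in the system-failure procedure. For a transaction that was active at the checkpoint (T1 here), the log address stored in the checkpoint record shows where its earlier records lie, since some of them precede the checkpoint.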

3. Media failure and recovery:

A media failure is the loss of some or all of the data on external storage because the secondary storage medium is damaged while the system is running.

Media failures occur less often than transaction failures or system failures, but they are the most severe and destructive type: both the physical data and the log files on disk may be corrupted. Recovery requires reloading a backup copy of the database made before the media failure and then using the log files to redo all transactions that ran after that copy was made.

The specific method is as follows.

(1) Load the most recent copy (dump) of the database to restore it to the consistent state it was in at the time of that dump.

(2) Load a copy of the log file and redo the completed transactions based on its contents. First scan the log file to find the transactions that had committed by the time of the failure, and add them to the redo queue. Then scan the log file forward and redo each transaction in the redo queue by re-executing its logged operations, that is, writing the "updated value" in each log record to the database.

This restores the database to a consistent state at some point before the failure.
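
The two media-recovery steps can be sketched as below. This toy model assumes the backup is a `dict` snapshot taken while no transactions were running, and that `log_copy` holds hypothetical `(txid, kind, key, old, new)` records written after that dump; the function name is made up for illustration.

```python
import copy


def recover_from_media_failure(backup: dict, log_copy: list) -> dict:
    """Rebuild the database from a dump plus a log copy (toy model).

    Update entries are (txid, kind, key, old, new) tuples; BEGIN/COMMIT
    entries are just (txid, kind). The log copy is assumed to start at the dump.
    """
    # (1) Load the most recent dump (work on a copy so the backup stays intact).
    db = copy.deepcopy(backup)
    # (2a) First scan: transactions committed by the time of the failure.
    redo_queue = {rec[0] for rec in log_copy if rec[1] == "COMMIT"}
    # (2b) Forward scan: write each redo transaction's updated value.
    for rec in log_copy:
        txid, kind = rec[0], rec[1]
        if txid not in redo_queue:
            continue
        if kind in ("INSERT", "UPDATE"):
            db[rec[2]] = rec[4]          # write the "updated value" (after image)
        elif kind == "DELETE":
            db.pop(rec[2], None)
    return db


backup = {"x": 1}
log_copy = [
    ("T1", "BEGIN"), ("T1", "UPDATE", "x", 1, 2), ("T1", "INSERT", "y", None, 9),
    ("T1", "COMMIT"),
    ("T2", "BEGIN"), ("T2", "UPDATE", "x", 2, 99),   # never committed
]
print(recover_from_media_failure(backup, log_copy))  # {'x': 2, 'y': 9}
```

Transactions without a COMMIT record (T2 in the usage) are simply not redone, so the rebuilt database reflects only committed work.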
