Analysis of Fault Recovery in Database Transactions

Introduction

Ensuring data consistency is one of the most basic functions of a database. So how does a database keep its data consistent when the machine goes down or some other unexpected failure occurs? It relies mainly on two log files, the undo log and the redo log. This article focuses on the undo log: how the undo log is used to preserve database consistency.

Overview of the Database Architecture

To explain how the database implements consistency, we first need an overview of its structure. Since the architecture itself is not the focus of this article, only a rough outline is given here; if you want to learn more about database architecture, you can refer to other articles.

Simplified, the database is mainly divided into the following parts (a minimal sketch of how they fit together follows the list):

§ query processor: responsible for parsing the query SQL, choosing an execution plan, and so on.

§ transaction manager: the transaction is the smallest unit of database work; the transaction manager mainly assigns transaction IDs and the like.

§ log manager: writes the log records (the undo log and redo log).

§ recovery manager: restores consistency from the logs after a failure.

§ buffer manager: as we all know, writes in a database are performed in the buffer first and only later flushed to the hard disk.

§ hard disk: both the database's data and its log files are ultimately written to the hard disk for persistent storage.
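
As a rough orientation, here is a minimal Python sketch of how these parts fit together; every name below is an illustrative assumption, not the API of any real engine.

```python
# Illustrative component layout (not any real database's API).

class QueryProcessor: pass      # parses SQL, chooses an execution plan
class TransactionManager: pass  # assigns transaction ids
class LogManager: pass          # appends undo/redo records, flushes them to disk
class RecoveryManager: pass     # scans the logs after a crash
class BufferManager: pass       # in-memory pages; dirty pages are flushed to disk

class Database:
    def __init__(self):
        self.query_processor = QueryProcessor()
        self.transaction_mgr = TransactionManager()
        self.log_mgr = LogManager()
        self.recovery_mgr = RecoveryManager()
        self.buffer_mgr = BufferManager()
```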

The sections below describe how the database performs disaster recovery with the components above. This article covers the undo log; the next covers the redo log.

A Brief Introduction to the Undo Log

The undo log, as its name implies, records the information needed to undo operations. As the architecture above shows, writes are processed mainly in memory, and anything held only in memory can be lost when the machine goes down. So how does the database use the undo log to ensure consistency despite this?

To describe this problem, we first need to define a few operations. Suppose we want to read a value X from the database, change it to Y, and write it back. The database may go through the following steps. First it checks whether the value is in the buffer; if so, it returns the data directly. We call this process Read(X). If the value is not in the buffer, it must be read from the hard disk into the buffer before being returned to the user; we call that disk-to-buffer step Input(X). In other words, on a buffer miss the database goes through Read(X), then Input(X), then Read(X) again. Modification works the same way: the database changes the contents of the buffer, an operation we call Write(Y), and there is a separate step that flushes the buffer to disk, which we call Output(Y). With these definitions in place, we can analyze, process by process, how the database keeps its data consistent if it fails in the middle of any of them.
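
A minimal Python sketch of these four primitives; the dict-backed "disk" and "buffer" are illustrative assumptions, not how a real engine stores pages.

```python
disk = {"X": 1}   # persistent storage
buffer = {}       # in-memory cache

def input_(key):
    """Input(X): copy a value from disk into the buffer."""
    buffer[key] = disk[key]

def read(key):
    """Read(X): serve from the buffer, loading from disk on a miss."""
    if key not in buffer:
        input_(key)
    return buffer[key]

def write(key, value):
    """Write(Y): modify the buffered copy only; disk is untouched."""
    buffer[key] = value

def output(key):
    """Output(Y): flush the buffered value back to disk."""
    disk[key] = buffer[key]

x = read("X")   # Read(X) -> Input(X) -> Read(X) on a cold buffer
write("X", 2)   # change X to Y in memory only
output("X")     # persist it
```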

Before going further, a brief word about the format of the undo log. An undo record has the form (T, A, X): T is a transaction ID, A is a column of some row, and X is the original value. In other words, the record says that within transaction T, the original value of A was X. Note that the undo log records only the original value; it does not care what you changed the value to, only what it used to be, because it will only ever be used to undo work. Besides these records, the undo log also contains Start records, which mark the opening of a transaction, and records such as Commit and End that mark the completion of a transaction. For now this abstraction is all we need.
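
In the same toy style, the undo-log records can be modeled as tuples; the layout below is my assumption for illustration, not a real engine's on-disk format.

```python
undolog = []  # in-memory log; flushing it to disk is a separate step

def log_start(t):
    undolog.append(("START", t))         # transaction t has begun

def log_update(t, a, x):
    undolog.append(("UPDATE", t, a, x))  # in transaction t, A's OLD value was x

def log_commit(t):
    undolog.append(("COMMIT", t))        # transaction t has committed

def log_end(t):
    undolog.append(("END", t))           # transaction t is fully finished
```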

For example, to carry out the task above, reading a value and then modifying it (assuming it is not yet in the buffer), the database goes through the steps in the following table:

No. | Operation     | Undolog
1   | -             | Start T
2   | Read(X)       | -
3   | Input(X)      | -
4   | Read(X)       | -
5   | Write(Y)      | -
6   | -             | (T, A, X)
7   | Flush undolog | -
8   | Output(Y)     | -
9   | ...           | ...
10  | Commit        | -
11  | -             | End T
12  | Flush undolog | -

Look at the table above; I will explain it below. First, be clear that both the data in the database and the log are operated on in memory first and flushed to the hard disk afterwards; there is no doubt about this.

The first four steps should be easy to understand. First a Start flag is recorded in the undo log; steps 2, 3 and 4 read the value from the database; step 5 writes to memory, changing the value of X to Y; and in step 6 the undo log records that the original value of A in transaction T was X. So what about step 7? Should the undo log be flushed first, or only after Output?

Suppose the log flush were performed after Output. If the database goes down just after Output, the result is obvious: the undo record never reached the log file (it was never flushed to disk), so the change cannot be undone, and the data is left inconsistent. Flushing the undo log after Output is therefore not acceptable.

Now consider why the order above works. Suppose a crash occurs between steps 6 and 7, i.e. before the undo log is flushed: consistency is unaffected, because the data was never written to the hard disk either. Suppose instead the crash occurs between steps 7 and 8: the data has not reached the hard disk, but the log has already been flushed, so recovery will replay the flushed undo records. The system cannot know whether the logged change ever reached disk, but replaying anyway does not hurt final consistency: it merely writes the original value again, turning X into X. The undo log is idempotent; applying it several times gives the same result as applying it once. So the order above is sound.
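
Continuing the toy model above, here is a sketch of a write path that respects this ordering (log first, data second); flush_undolog is an illustrative stand-in for forcing the log to disk.

```python
log_disk = []  # persistent copy of the undo log

def flush_undolog():
    """Force all buffered log records to disk before any data they protect."""
    log_disk[:] = list(undolog)

def update(t, key, new_value):
    old = read(key)           # steps 2-4: bring X into the buffer
    write(key, new_value)     # step 5: change it in memory only
    log_update(t, key, old)   # step 6: record the OLD value in the undo log
    flush_undolog()           # step 7: the log reaches disk first...
    output(key)               # step 8: ...and only then the data page
```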

Data Recovery Through the Undo Log

With the undo log in place, let's see how the database uses it to recover data. This is where the recovery manager from the architecture above does its work. The recovery manager scans the undo log looking for each Start without a matching End. As the ordering above shows, the End record is flushed to the log only after the transaction has committed, so whenever an End record is present the transaction finished and its data consistency is assured. For each Start that has no End, the recovery manager works from that Start and restores the original values recorded in the undo log. Two observations about this model. First, while the recovery manager is undoing, no other writes can be allowed; writes must be stalled for the duration. Second, as described, the recovery manager has to scan the undo log from the very beginning, which is actually unnecessary: with a checkpoint (a marker meaning everything before it is already consistent), the recovery manager only needs to find the last checkpoint and work from there.
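
A sketch of that basic recovery pass over the toy log (assumptions as before): collect the transactions with a Start but no End, then walk the log backwards restoring their old values.

```python
def recover(undolog, disk):
    """Undo every transaction that has a START record but no END record."""
    started = {r[1] for r in undolog if r[0] == "START"}
    ended = {r[1] for r in undolog if r[0] == "END"}
    incomplete = started - ended
    # Walk backwards so later writes are undone before earlier ones.
    for r in reversed(undolog):
        if r[0] == "UPDATE" and r[1] in incomplete:
            _, t, a, old = r
            disk[a] = old  # idempotent: restoring X to X is harmless
```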

Let's discuss these two points below.

In the model described above, the recovery manager must scan the entire undo log to recover. That is clearly unnecessary: a crash can only affect the last few transactions, while earlier transactions have already committed or rolled back and will never abort again. So how do we know which transactions are affected? If we knew, we could scan just a small tail of the log instead of the whole thing. This is exactly why the database introduces the concept of a checkpoint.

Checkpoint

A checkpoint written into the undo log indicates that every transaction before it has completed its commit or rollback; in other words, transactions before the checkpoint can no longer cause consistency problems. So how is the checkpoint implemented? The mechanism is simple: write one into the undo log periodically. Of course, it cannot be written casually; before writing it, the system must verify that the earlier transactions have all finished.

This raises a problem. The database runs continuously, so transactions are constantly starting: there may be n transactions already active, and new ones may begin while the checkpoint is being written. If the checkpoint had to wait for a moment when no new transaction is starting and all previous ones have committed, it would essentially never get written. So, in this scheme, the system stops accepting new transactions while the checkpoint is written, waits for the started transactions to commit, writes the checkpoint, and then resumes accepting new transactions. Something like this: suppose two transactions T1 and T2 are active; the undo log reads:

Undolog

Start T1

Start T2

When the checkpoint cycle comes around, the checkpoint cannot be written until T1 and T2 have fully committed; only then is the chkpoint record written. If a transaction T3 wants to open during this window, it cannot: the system is stalled. Once the checkpoint is written, T3 is opened, and the log looks like:

Undolog

Start T1

Start T2

End T1

End T2

Chkpoint

Start T3

At this point, if the system dies, the recovery manager scans from the tail of the undo log toward the front; once it reaches the checkpoint it stops scanning further back, because all earlier transactions committed and pose no consistency problem. Recovery only needs to work from the checkpoint onward.
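
As a sketch, a static checkpointer might look like this (illustrative only; a real engine's synchronization is far finer-grained than one flag and one condition variable):

```python
import threading

class StaticCheckpointer:
    """Illustrative static checkpoint: block new transactions, drain active ones."""
    def __init__(self, log):
        self.log = log
        self.active = set()
        self.accepting = True          # False while a checkpoint is being written
        self.idle = threading.Condition()

    def begin(self, t):
        with self.idle:
            while not self.accepting:  # system is stalled during checkpointing
                self.idle.wait()
            self.active.add(t)
            self.log.append(("START", t))

    def end(self, t):
        with self.idle:
            self.active.discard(t)
            self.log.append(("END", t))
            self.idle.notify_all()

    def checkpoint(self):
        with self.idle:
            self.accepting = False     # stop accepting new transactions
            while self.active:         # wait for T1, T2, ... to finish
                self.idle.wait()
            self.log.append(("CHKPOINT",))
            self.accepting = True      # resume normal service
            self.idle.notify_all()
```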

This is good, and it spares us scanning the undo log from the beginning, but the drawback is also obvious: while the checkpoint is being written, the system is stalled and all writes are paused. Is there a better way to write a checkpoint without pausing the system? Yes, of course: the non-static checkpoint discussed below.

Non-static checkpoint

A non-static checkpoint is defined relative to the static checkpoint described above, which is static because the system cannot accept writes while the checkpoint is being written. Non-static checkpoints were introduced to solve exactly this problem.

The strategy of the non-static checkpoint is to record the currently active transactions at the moment the checkpoint is written. For example, if T1 and T2 are both active, a start checkpoint(T1, T2) record is written to the undo log, and the system as a whole keeps writing normally; other transactions can still open after this record is written. When T1 and T2 have both completed, an end chkpoint record is written. For example:

Undolog

Start T1

Start T2

Start checkpoint(T1, T2)

Start T3

End T1

End T2

End chkpoint

Start checkpoint(T3)

End T3

End chkpoint

The log above shows that after T1 and T2 start, start checkpoint(T1, T2) is written to the undo log while the system continues to accept new transactions; that is why transaction T3 can start afterwards.
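
A sketch of this scheme (again with purely illustrative names): begin() is never blocked; the checkpoint snapshots the active set into the start-checkpoint record and completes once that snapshot drains.

```python
class NonStaticCheckpointer:
    """Illustrative non-static checkpoint: never blocks new transactions."""
    def __init__(self, log):
        self.log = log
        self.active = set()
        self.pending = None            # transactions the open checkpoint waits on

    def begin(self, t):                # new transactions can start at any time
        self.active.add(t)
        self.log.append(("START", t))

    def end(self, t):
        self.active.discard(t)
        self.log.append(("END", t))
        if self.pending is not None:
            self.pending.discard(t)
            if not self.pending:       # everyone in the snapshot has finished
                self.log.append(("END_CHKPOINT",))
                self.pending = None

    def checkpoint(self):
        if self.pending is None:
            self.pending = set(self.active)   # snapshot the active set
            self.log.append(("START_CHKPOINT", tuple(self.pending)))
            if not self.pending:       # trivially complete if nothing was active
                self.log.append(("END_CHKPOINT",))
                self.pending = None
```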

This mechanism avoids shutting down service while the checkpoint is written, but now a new question arises: writing checkpoints this way is all very well, but how does the recovery manager recover data from such an undo log? With a static checkpoint, once the checkpoint is found there is no need to look further back; now it is different, because even after finding an end chkpoint there may still be unfinished transactions to account for. How does recovery proceed in this case?

In this case, after the database goes down, the recovery manager still scans the undo log from the tail toward the front. If it encounters an end chkpoint first, that does not mean every transaction before the checkpoint committed; what it does guarantee is that every uncommitted transaction started after the preceding start checkpoint. So the scan continues back to that start checkpoint, say start checkpoint(T1, T2). Because the end chkpoint was found, T1 and T2 are known to be consistent; what needs to be undone are the transactions other than T1 and T2 that appear between start checkpoint(T1, T2) and the end chkpoint (or after it) and have no End record. Those are the transactions recovery replays.

There is another situation: the recovery manager, while scanning, encounters a start checkpoint record first, say start checkpoint(T1, T2). We then know that T1 and T2 may be unfinished. Next we check whether an End record for each of them appears after the start checkpoint: if it does, that transaction completed; if not, it did not, and we must scan further back past the checkpoint to find that transaction's Start and undo it from there. To spell this out, let's use a variant of the previous example.

For example, after the database goes down, we start scanning the undo log and obtain the following fragment:

Undolog

Start T1

Start T2

Start checkpoint(T1, T2)

Start T3

End T1

The recovery manager takes this fragment and scans it, encountering start checkpoint(T1, T2) before meeting any end chkpoint, which tells us that T1 and T2 may be unfinished. Before that it also saw Start T3, with no End T3 and no checkpoint covering T3, so T3 is certainly unfinished and must be undone. Why do we only say that T1 and T2 may be unfinished? Encountering start checkpoint(T1, T2) without an end chkpoint does not mean both are unfinished: one may already have committed, and the end chkpoint is missing only because the other has not. So we look at the records after the start checkpoint and find End T1, which shows that T1 completed. All that remains is to find where T2 began: scan up past start checkpoint(T1, T2) to Start T2, and undo T2 from there. In this log, then, T2 and T3 need to be undone. (Note: saying "undo T3, then undo T2" does not describe the actual order; in practice the recovery manager first works out the set of transactions to undo and then processes them together.)
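
Putting the whole rule together, here is a sketch of recovery over a non-static-checkpoint undo log, using the same illustrative record layout as the sketches above:

```python
def recover_nonstatic(undolog, disk):
    """Illustrative recovery for undo logging with non-static checkpoints."""
    # 1. Scan backwards to decide how far back the scan must go.
    stop = 0
    seen_end_ckpt = False
    for i in range(len(undolog) - 1, -1, -1):
        kind = undolog[i][0]
        if kind == "END_CHKPOINT":
            seen_end_ckpt = True
        elif kind == "START_CHKPOINT":
            if seen_end_ckpt:
                stop = i  # a completed checkpoint: older records are safe
            # else: snapshot transactions may be unfinished, so keep stop = 0
            # (a real engine would scan back just to their START records)
            break
    tail = undolog[stop:]
    # 2. Unfinished = started (or named in the snapshot) but never ended.
    started = {r[1] for r in tail if r[0] == "START"}
    for r in tail:
        if r[0] == "START_CHKPOINT":
            started |= set(r[1])
    ended = {r[1] for r in tail if r[0] == "END"}
    incomplete = started - ended
    # 3. Restore the old values of incomplete transactions, newest first.
    for r in reversed(undolog):
        if r[0] == "UPDATE" and r[1] in incomplete:
            _, t, a, old = r
            disk[a] = old
```

On the fragment above, this yields incomplete = {T2, T3}, matching the analysis: T1 has an End record, so only T2 and T3 are undone.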
