In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
No one wants to see the failure of the online production system.
But when there is a failure, which has a business impact and involves accountability afterwards,
It is easy to pass the buck and talk to each other between operation and maintenance and development and operation and maintenance teams.
After all, no one wants to carry the fault responsibility.
What is the most effective way to solve this kind of problem?
The answer is root cause analysis!
Usually after root cause analysis, it is easy to reproduce and simulate the fault, and everything becomes easier.
Xiao y would like to share with you a case today:
A batch program has been running for several years, and everything is fine.
In the case of no power outage, suddenly one day can no longer run away! The phenomenon is that there are bad blocks in the database.
At this time, the operation and maintenance team with some "experience" may unconsciously bemoan and want to take the initiative to carry this pot.
Take it easy! Problem analysis, do not encounter problems to dabble in, or do not worry to say, there is no way to go down
Little y takes you today, let's go on a trip of bad root cause analysis.
At the same time, at the end of the article, you will give a specific hint of the potential risk of causing a bad block. after reading this article, you will find that this hidden danger may be latent for a long time and break out sometime. I hope you can understand and prevent it early and avoid stepping on the pit.
There is a problem with the Oracle database, or can't find the cause?
You might as well try Zhongyi Technology, and we will try our best to find the root cause of the failure and performance problems for you.
Here comes the problem!
Date:
ORA-08102: index keyword not found, object number 64560, file 4, block 192411 (2)
That is, the index is inconsistent with the table data.
two
What are the common reasons for ORA-8102?
Inconsistencies between table and index data caused by the loss of write to the @ IO
Inconsistencies between tables and indexes caused by @ ORACLE BUG
Can index rebuild be deleted? Brainstorming for the first time
I believe that the solution of most experts:
All are based on the data in the table, delete the index and then create it!
But it would be irresponsible to delete and rebuild the index until the cause of the problem is found
It may not be able to solve the problem fundamentally, and it is likely to recur many times in the future.
For example, the re-indexing is based on the table data, but since the data of the table and the index are inconsistent, why not correct the data of the table based on the index data? For example, when "the write IO of the table is lost, but the write IO of the index is successful", the index should not be rebuilt based on the table! Therefore, while re-indexing may solve the problem, it can lead to loss and corruption of business data.
Therefore, we can not blindly follow the index to deal with the problem, which may lead to data loss and destruction! This system stores financial data, and I need to be more careful. I will not make any changes or adjustments until I determine the cause of the problem.
So is there really a bad block? We need in-depth analysis to understand the truth behind the matter!
Thinking time-- how to continue the Analysis of Bad Root causes
How to analyze the cause of bad block?
At this point, readers might as well think about it.
If it were you, how would you find out the cause of the bad block?
Don't worry, think for three or five more minutes, the cause of the problem is behind, when to turn down, it's up to you to decide.
.
.
.
.
.
.
.
Root cause analysis process
First get the SQL that reported the error.
Looking at the trace of ORA-8102 that appears in the alert log and searching for current sql, you can find that the ORA-8102 error occurred when executing the following SQL statement
As you can see, the function implemented by this SQL is simple:
Update the qualified BANKID field in the TAB_XXX table from the old value to the new value
Get the definition of the index
Retrieve the dba_objects according to the object number in the error message, and you can see that the index in question is INDEX_TAB_XXX_FUNC. Further get the definition of the index, as shown below:
You can see:
This is a special index, which is a function index created by user-defined functions, which is a little different from the function indexes we usually use, such as trim, to_number and other built-in functions.
Get the definition of a custom function
As you can see here, the user-defined function is used, and the custom function code is as follows:
This function implements:
Get the organization number of the first-level row in the TAB_ANOTHER_XXX table by passing in an organization number.
The truth is gradually emerging.
At this point, little y went through his brain once and basically knew the truth of the problem. Readers might as well think, what is the truth of the problem?
Don't worry, think for three or five more minutes, the truth of the problem is behind, when to turn down, it is up to you to decide.
……
……
……
……
……
……
The truth comes out-- a successful reversal
At this point, Xiao y finally figured out that this is not a bad block, not a BUG of the database, but a BUG of the application!
When the application is about to perform the following SQL
It is important to note that:
Error SQL update table is the TAB_XXX table, and the custom function in the functional index is to take the value of the TAB_ANOTHER_XXX table, these are two different tables, need to pay attention!
In addition, the function specifies the deterministic attribute, that is, for a fixed bankid, the return value of the function is required to be fixed and unique from the business.
If the business level changes, and his top changes for the same bankid, it will inevitably lead to inconsistent table and index data!
Specifically, the process of the problem is as follows:
Because the bankid field is to be updated, and there is an index on this field, in addition to updating the bankid field in the data BLOCK, ORACLE needs to update the key value in the function index because the bankid field is updated and there is a user-defined function index on the bankid field.
To do this, oracle first needs to find the position of bankid=:b1 in the functional index, that is, to find keyvalue= FUNC_GET_TOP (: B1), and the row in the block in which the rowid= in the index entry is modified.
If the result of running the FUNC_GET_TOP (: b1) function at this time is not consistent with the result calculated after running FUNC_GET_TOP (: b1) when creating the index, then you will certainly not be able to find one that satisfies the index data. That is, the function value in the function index is uncertain, so it is obvious that ORACLE cannot find "keyvalue= FUNC_GET_TOP (: b1) & the row in the rowid= modified data block in the index entry". Then the ORA-08102 will be reported: no index keyword found.
Here, the FUNC_GET_TOP function fetches the value of the top _ bank_id field based on bankidtop in another TAB_ANOTHER_ XXX table. If the top_bank_ id value of bankid=:b1 changes, then the result of FUNC_GET_TOP (: B1) changes, so for an incoming bankid, the result calculated twice is uncertain, that is, the index key value is not fixed. The deterministic attribute for the custom function is not satisfied in the function index (the output of the function is fixed and unique for each input).
This is the essential cause of this malfunction.
Why didn't you show up before?
A subordinate municipal bank originally belonged to a provincial bank (first-level bank), but because of its large volume of business, it later became a first-level bank, which means that the first-level bank of a network organization has changed from a provincial bank to a municipal bank. This was adjusted at the beginning of the year. It means that the table TAB_ANOTHER_XXX has changed, but the TAB_XXX table has not changed.
On the failure day (half a year later), the subordinate lines of the municipal bank carried out the organization withdrawal and merger for the first time, that is, the bank_id was modified, and the TAB_XXX table began to change, so it was necessary to maintain the index synchronously, that is, to modify the corresponding key value in the functional index, which triggered this problem. Bankid's top_bank_id was adjusted at the beginning of the year, but there was no need to maintain the index at that time, so adjusting the organization at the beginning of the year did not cause problems. It was more than half a year before the problem appeared.
Problem simulation reproduction
Principle explanation:
When the id=1 of the tb1 table is updated to id=12, the update on the data block has been completed (the execution plan is a full table scan, and there is no index on the id field), and the functional index fc1 (id) on the id field needs to be updated.
To update the functional index on the id field, you need to first execute fc1 (1) as the key value to look up, the result is obviously 111( new top_id), at this time the index obviously can not find "key value = 111, rowid points to an entry of database slot", so report ORA-8102, no index keyword found. It appears that the data is inconsistent with the index, but it is actually caused by the fact that there is no guarantee that the custom function index returns a fixed value.
Problem solving
To sum up, because we understand the whole process of the problem, it is safe to rebuild the index! Will not lead to data loss and destruction!
After feedback with the application team and the development project team on the nature of the problem, the application team organizers tested in the test environment, temporarily modified the top organization numbers of some organizations, raised the batch again, and the problem was reproduced and reported an error ORA-8102.
Since there will be no more changes in the rows at the update level in the short term, the temporary solution is to rebuild the index.
If there is a similar need to modify a first-level row later, rebuild the index again from the process. Technically, remove the functional index and use a subquery instead.
Finally, the application team submitted the changes to the re-indexing, the batch was completed successfully, the problem was solved successfully, and the root cause was found.
Risk hint
Risk Tips:
When using a custom function index, you need to ensure that, for a certain input value of the custom function, Ideci1, its output value, Odeci1, must be operationally unique. Otherwise, it will lead to the inconsistency between the index and the data and the error of ORA-8102. This kind of problem is hidden and may lurk for a period of time. The problem occurs when the output value of a certain input value of a custom function is changed from Olympus 1 to Olympian 2.
Therefore, we suggest that:
1) troubleshoot the use of user-defined function indexes, and work with the application team to determine whether the custom function can return a fixed and unique value for an incoming value.
2) transfer the risk of custom function index to the development center, popularize it, and put an end to this kind of problem in the development stage.
How to check if there is a similar problem in your system?
one
First check the functional indexes of all applications
The output is shown below. The DATA_DEFAULT field is the definition of the function index. After excluding trim/to_number/to_char and other custom functions, the function index is customized.
two
Then use the following command to get the function definition
When executing, the function name and user in the above command need to be replaced with the function and user in the above query analysis result.
three
Finally, sort out the contents of the custom function.
Check with the application team to see if the function ensures that the return value of the function is fixed and unique for each input. If it cannot be confirmed, rectification needs to be carried out to avoid failures caused by subsequent business changes and affect business continuity.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.