ASM translation series 11: advanced knowledge Offline or drop? 02/12 Update SLTechnology News&Howtos


2026-02-12 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

Original author: Bane Radulovic

Translator: Zhuang Peipei

Reviewer: Wei Xinghua, Walk Technology

Jointly produced by the DBGeeK community

Offline or drop?

When an ASM disk becomes unavailable, does ASM drop it from the disk group? That depends on the ASM version and on the redundancy level of the disk group. An external redundancy disk group is simply dismounted, so the real question concerns normal and high redundancy disk groups. In ASM 10g, the disk is dropped immediately. Starting with 11gR1, an unavailable disk is first taken offline and the disk repair timer starts. If the timer reaches the value of the disk group attribute DISK_REPAIR_TIME, the disk is dropped from its disk group. If the disk becomes available again before the timer expires, it can be brought back online and is not dropped. But how does ASM find out that the disk is available again, and what mechanism brings it back online?
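The version and redundancy rules above can be summarized as a small decision function. This is an illustrative sketch only: the `Redundancy` enum and `on_disk_unavailable` function are hypothetical names, and ASM versions are modeled as `(major, release)` tuples such as `(10, 2)` or `(11, 1)`.

```python
from enum import Enum


class Redundancy(Enum):
    EXTERNAL = "external"
    NORMAL = "normal"
    HIGH = "high"


def on_disk_unavailable(version, redundancy):
    """Return what ASM does first when a disk in the group becomes unavailable.

    Hypothetical helper; `version` is a (major, release) tuple, e.g. (11, 2).
    """
    if redundancy is Redundancy.EXTERNAL:
        # No ASM mirror copies exist, so the whole disk group is dismounted.
        return "dismount disk group"
    if version < (11, 1):
        # ASM 10g: the disk is dropped at once, which triggers a rebalance.
        return "drop disk"
    # 11gR1 and later: the disk goes offline and the repair timer starts.
    return "offline disk, start DISK_REPAIR_TIME timer"
```

For example, `on_disk_unavailable((10, 2), Redundancy.NORMAL)` returns `"drop disk"`, while the same failure on `(11, 2)` takes the disk offline instead.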

Unavailable

A disk is considered unavailable when ASM or an ASM client cannot read from it or write to it. The database is the typical ASM client, but ASM clients are not limited to databases. A disk can become unavailable for many reasons: a damaged SCSI cable to a local disk, a failed SAN switch or storage network, a failed NFS server, a site failure in a dual-site active-active configuration, a failed disk, and so on. In every case, ASM or the ASM client reports an I/O error, and ASM handles it accordingly.

Drop

In ASM 10g, ASM immediately drops an unavailable disk. This triggers a rebalance that restores data redundancy; once the rebalance completes, redundancy is restored and the disk is removed from the disk group. After the problem that made the disk unavailable has been fixed, the disk can be added back to the disk group with the ALTER DISKGROUP command, for example:

alter diskgroup DATA add disk 'ORCL:DISK077';

This triggers another rebalance, and once it completes the disk is again a member of the disk group.

But what happens if several disks fail at the same time, or a disk fails during a rebalance? That depends on several factors: the redundancy of the disk group, whether the disks belong to the same failgroup or to different failgroups, and whether the failed disks are partners. In a normal redundancy disk group, ASM can tolerate the failure of one or more disks, or even all disks, in a single failgroup. If disks in different failgroups become unavailable, ASM tolerates the failures only if none of those disks are partners. "Tolerate" here means that the disk group stays mounted and ASM clients are unaffected. In a high redundancy disk group, ASM can tolerate the failure of one or more disks, or even all disks, in up to two failgroups. If disks in more than two failgroups become unavailable, the partner rule applies again. Basically, ASM can tolerate any number of unavailable disks as long as there is no partner relationship between them.
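The tolerance rules above can be expressed as a simplified model. This is a sketch under stated assumptions, not ASM's actual algorithm: `group_stays_mounted`, the disk and failgroup names, and the partner pairs are hypothetical inputs (in a real system ASM chooses partnerships internally). For the "partner rule" it assumes that, with normal redundancy, two failed partners may hold both copies of an extent, and with high redundancy, a failed disk with two failed partners may hold all three copies.

```python
def group_stays_mounted(failed, failgroup_of, partners, redundancy):
    """Simplified model of the failure-tolerance rules (hypothetical helper).

    failed:       names of the unavailable disks
    failgroup_of: dict mapping disk name to failgroup name
    partners:     set of frozenset pairs, e.g. {frozenset({"D1", "D3"})}
    redundancy:   "normal" or "high"
    """
    failed = set(failed)
    failed_failgroups = {failgroup_of[d] for d in failed}

    # Normal: any failures confined to 1 failgroup are fine; high: up to 2.
    max_failgroups = {"normal": 1, "high": 2}[redundancy]
    if len(failed_failgroups) <= max_failgroups:
        return True

    def failed_partners(disk):
        # Failed disks that are partners of `disk` (assumed partner list).
        return {p for p in failed if frozenset((disk, p)) in partners}

    if redundancy == "normal":
        # Two mirror copies: no failed disk may have a failed partner.
        return all(not failed_partners(d) for d in failed)
    # Three copies: no failed disk may have two or more failed partners.
    return all(len(failed_partners(d)) < 2 for d in failed)
```

For a normal redundancy group with failgroups FG1 = {D1, D2} and FG2 = {D3, D4} and partner pairs {D1, D3} and {D2, D4}, losing D1 and D4 is tolerated, while losing the partners D1 and D3 is not.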

Offline

When a disk is dropped, the entire disk group has to be rebalanced, which takes a long time. During that time other disks may also fail, which greatly increases the risk of data loss. To address this, ASM 11gR1 introduced the fast mirror resync feature. Instead of dropping an unavailable disk immediately, ASM takes it offline. This gives the ASM administrator a chance to learn about the failure and fix it before the disk repair timer reaches its threshold. The default threshold is 3.6 hours, and it can be changed with the ALTER DISKGROUP command. For example, to set it to 12 hours:

alter diskgroup DATA set attribute 'DISK_REPAIR_TIME' = '12h';

While the disk is offline, ASM keeps track of the changes that should have been written to it. If the disk becomes available and is brought back online before the timer reaches the threshold, ASM applies only those changes to the disk. That is the purpose of fast mirror resync. If the problem that took the disk offline cannot be fixed, the disk is dropped from the disk group once the timer reaches the threshold.
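The offline, resync and drop life cycle can be modeled as a toy state machine. It is a sketch under stated assumptions, not ASM's implementation: `parse_repair_time` and `OfflineDisk` are hypothetical, the timer is advanced manually with `tick`, and tracked changes are represented as a set of extent numbers. The parser accepts only the hour (`h`) and minute (`m`) suffixes, as in `'3.6h'` or `'270m'`.

```python
import re


def parse_repair_time(value):
    """Convert a DISK_REPAIR_TIME-style value such as '3.6h' or '270m' to hours."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([hm])\s*", value.lower())
    if not m:
        raise ValueError(f"unrecognised repair time: {value!r}")
    amount, unit = float(m.group(1)), m.group(2)
    return amount if unit == "h" else amount / 60


class OfflineDisk:
    """Toy model of an offline disk under fast mirror resync (hypothetical)."""

    def __init__(self, name, repair_time="3.6h"):
        self.name = name
        self.limit_hours = parse_repair_time(repair_time)
        self.elapsed_hours = 0.0
        self.stale_extents = set()  # extents changed while the disk was offline
        self.state = "OFFLINE"

    def record_write(self, extent):
        # Writes go to the surviving copies; the model just remembers the extent.
        if self.state == "OFFLINE":
            self.stale_extents.add(extent)

    def tick(self, hours):
        # Advance the repair timer; at the threshold the disk is dropped.
        if self.state != "OFFLINE":
            return
        self.elapsed_hours += hours
        if self.elapsed_hours >= self.limit_hours:
            self.state = "DROPPED"  # a full rebalance would follow

    def bring_online(self):
        """Return the extents to resync; only the tracked ones need copying."""
        if self.state != "OFFLINE":
            raise RuntimeError(f"{self.name} is {self.state}, cannot bring it online")
        self.state = "ONLINE"
        to_copy, self.stale_extents = self.stale_extents, set()
        return to_copy
```

The point the model illustrates: bringing the disk back online costs a copy of only the extents changed while it was offline, whereas a drop leads to a rebalance of the whole disk group.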

Online

When the system administrator or ASM administrator has fixed the problem that made the disk unavailable (for example, by replacing a faulty cable), what needs to be done to bring the disk back online? Can it happen automatically? Again, it depends. On Exadata and Oracle Database Appliance, the disk is brought online automatically. Elsewhere, the ASM administrator has to bring the disk online with the ALTER DISKGROUP command, either for a single disk (ONLINE DISK takes the ASM disk name, not the disk path) or for all offline disks in the disk group:

alter diskgroup DATA online disk DISK077;
alter diskgroup DATA online all;
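The platform-dependent step above can be sketched as a helper that returns the statement an administrator would run, or nothing where the article says onlining is automatic. The function `online_statement` and the platform strings are hypothetical; the helper only builds text and does not connect to ASM.

```python
from typing import Optional

# Platforms on which, per the article, repaired disks come back online automatically.
AUTO_ONLINE_PLATFORMS = {"Exadata", "Oracle Database Appliance"}


def online_statement(platform: str, diskgroup: str,
                     disk: Optional[str] = None) -> Optional[str]:
    """Return the ALTER DISKGROUP ... ONLINE statement to run, or None if automatic.

    `disk` is the ASM disk name (not its path); omit it to online all disks.
    """
    if platform in AUTO_ONLINE_PLATFORMS:
        return None
    target = f"DISK {disk}" if disk else "ALL"
    return f"ALTER DISKGROUP {diskgroup} ONLINE {target};"
```

For example, `online_statement("Linux", "DATA", "DISK077")` yields `ALTER DISKGROUP DATA ONLINE DISK DISK077;`, and omitting the disk yields `ALTER DISKGROUP DATA ONLINE ALL;`.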

Conclusion

Knowing what can happen in different failure scenarios is valuable: what the current ASM version can and cannot do, and what level of protection the redundancy of your disk groups provides.

About the translator: Zhuang Peipei is a pre-sales database engineer at Walk Technology, mainly responsible for database platform architecture design and product verification testing.

