This article introduces Oracle's high-availability cluster solutions. It first surveys the three main approaches (RAC, Data Guard, and MAA), then looks at RAC in depth: its architecture, background processes, shared storage options, and how a RAC database differs from a single-instance database.
Three high-availability cluster solutions for Oracle
1 RAC (Real Application Clusters)
Multiple Oracle servers form a shared cache, and these servers also share network-attached storage. Such a system tolerates the failure of one or several nodes. However, the nodes need high-speed network interconnection, which in practice means everything sits in one machine room or data center; if something goes wrong with that room, for example its network, the whole cluster is affected. RAC alone therefore cannot meet the needs of an ordinary Internet company's critical business, which requires multiple machine rooms so that the failure of any single room can be tolerated.
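As a quick check, whether a given database is actually running as a RAC cluster can be confirmed from any instance. A minimal sketch using the standard cluster_database initialization parameter (parameter and view names as in Oracle 9i/10g):

    -- Returns TRUE on a RAC instance, FALSE on a single-instance database.
    SELECT name, value
      FROM v$parameter
     WHERE name = 'cluster_database';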
2 Data Guard (primarily for redundancy)
Data Guard suits multi-site deployments: one machine room hosts the production database while another hosts a standby database. Standby databases come in two kinds, physical and logical. A physical standby is used mainly for failover after a production outage, while a logical standby can additionally offload read traffic from the production database during normal operation.
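To illustrate how the two roles are usually verified, here is a minimal sketch against the standard v$ views (column names as in Oracle 10g; run the first query on either site, the second on a physical standby):

    -- Role and protection mode of this site.
    SELECT database_role, protection_mode, protection_level
      FROM v$database;

    -- On a physical standby: is managed recovery applying redo?
    SELECT process, status, thread#, sequence#
      FROM v$managed_standby
     WHERE process IN ('MRP0', 'RFS');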
3 MAA
MAA (Maximum Availability Architecture) is not an independent third option but a combination of the first two, providing the highest availability: a RAC cluster is deployed in each machine room, and the machine rooms are kept synchronized with Data Guard.
Overview of RAC
Shared-storage file systems (such as NFS) and cluster file systems (such as OCFS2) are typically used with storage area networks: all nodes access the storage directly through the shared file system, so the failure of one node does not affect the other nodes' access to the file system. In general, shared-disk file systems are what highly available clusters use.
The core of Oracle RAC is its shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files, and parameter files. The data disks must be globally available so that every node can open the database. Each instance has its own redo logs, but the other nodes must be able to access them in order to recover that node in the event of a system failure.
Oracle RAC runs on top of a cluster and provides the highest levels of availability, scalability, and low-cost computing for Oracle databases: if one node in the cluster fails, Oracle continues to run on the remaining nodes. Oracle's key innovation here is a technology called Cache Fusion, which lets the nodes in the cluster synchronize their memory caches efficiently over the high-speed cluster interconnect, minimizing disk I/O. Its most important advantage is that it gives all nodes in the cluster shared access to all data on disk; the data never has to be partitioned between nodes. Oracle is the only vendor offering an open-system database with this capability; other database software that claims to run on clusters requires the database data to be partitioned, which is impractical. The enterprise grid is the data center of the future, built from large configurations of standardized commodity components: processors, networks, and storage. Oracle RAC's Cache Fusion technology delivers the highest levels of availability and scalability, and Oracle Database 10g and Oracle RAC 10g significantly reduce operating costs while adding flexibility, giving the system greater adaptability, foresight, and agility. Dynamically provisioning nodes, storage, CPUs, and memory can meet the required service levels while continuously cutting costs through higher utilization.
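The effect of Cache Fusion can be observed in the instance statistics. A minimal sketch, assuming the 10g-style "gc" statistic names in v$sysstat:

    -- Blocks served across the interconnect versus blocks read from disk.
    -- High 'gc ... received' counts relative to 'physical reads' show the
    -- cluster satisfying block requests from remote caches, not disk.
    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('gc cr blocks received',
                    'gc current blocks received',
                    'physical reads');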
RAC integrated clusterware management
Oracle RAC 10g provides a fully integrated clusterware management solution on every platform on which Oracle Database 10g runs. This clusterware functionality includes cluster connectivity, messaging and locking services, cluster control and recovery, and a workload management framework (discussed below). The integrated clusterware management of Oracle RAC 10g has the following advantages:
(1) Low cost. Oracle provides this feature free of charge.
(2) Single-vendor support. Finger-pointing between vendors is eliminated.
(3) Easier installation, configuration, and ongoing maintenance. Oracle RAC 10g clusterware is installed, configured, and maintained with the standard Oracle database management tools; no additional integration steps are required.
(4) Consistent quality across all platforms. Oracle tests new software releases more rigorously than third-party products are tested.
(5) Consistent functionality across all platforms. For example, some third-party clusterware products limit the number of nodes a cluster can support; with Oracle RAC 10g, every platform supports up to 64 nodes. Users also get a consistent experience on all platforms, effectively addressing high-availability challenges including server node failures, interconnect failures, and I/O fencing.
(6) Support for advanced features. This includes integrated monitoring and notification capabilities that enable fast, coordinated recovery of the database and application tiers in the event of a failure.
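The interconnect that this integrated clusterware registers for each instance can be inspected from SQL. A minimal sketch using gv$cluster_interconnects (available in Oracle 10g):

    -- Which private network each instance uses for cluster traffic.
    SELECT inst_id, name, ip_address, is_public, source
      FROM gv$cluster_interconnects;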
The architecture of RAC
RAC is the cluster solution for Oracle databases, with the ability to coordinate the operation of two or more database nodes. Its structure is as follows.
The Cluster Manager integrates the other modules in the cluster system and provides communication between cluster nodes over the high-speed interconnect. The nodes are linked by a heartbeat network; the heartbeat messages determine the cluster's logical node membership and node updates, as well as each node's running state at any point in time, ensuring that the cluster operates normally. The communication layer manages communication between the nodes: it configures and interconnects the node information in the cluster and, using the information generated by the cluster manager's heartbeat mechanism, transmits messages and guarantees that they arrive correctly. There are also cluster monitoring processes that constantly verify the health of different parts of the system; for example, heartbeat monitoring continually verifies that the heartbeat mechanism itself is working.

In an application environment, all servers use and manage the same database, spreading the workload across the servers. The hardware requires at least two servers plus a shared storage device. Two kinds of software are required: the cluster software and the RAC component of the Oracle database. In addition, the operating system on all servers must be of the same type. Depending on the load balancer's configuration policy, when a client sends a request to a service's listener, the server may hand the request to its local RAC component or route it to the RAC component of another server. After processing the request, RAC accesses the shared storage device through the cluster software. Logically, each node participating in the cluster runs a separate instance that accesses the same database, and the instances talk to one another through the cluster software's communication layer (Communication Layer). To reduce I/O consumption there is a global cache service, so each database instance keeps a copy of the same database cache. The features of RAC are as follows:
Each node's instance has its own SGA
Each node's instance has its own background processes
Each node's instance has its own redo logs
Each node's instance has its own undo tablespace
All nodes share the same datafiles and controlfiles
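These per-instance structures are visible in the data dictionary. A minimal sketch against the standard v$/gv$ views:

    -- One redo thread per instance, each with its own log groups.
    SELECT thread#, instance, status, groups
      FROM v$thread;

    -- Each instance points at its own undo tablespace.
    SELECT inst_id, value AS undo_tablespace
      FROM gv$parameter
     WHERE name = 'undo_tablespace';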
The structure and mechanism of RAC
Before Oracle 9i, RAC was called OPS (Oracle Parallel Server). A major difference between RAC and OPS is that RAC uses Cache Fusion: a data block held by one node can be updated by another node before it is ever written to disk, and only the final version is written back. In OPS, a data block requested by another node had to be written to disk first before the requesting node could read it. With Cache Fusion, the data buffers of the RAC nodes exchange blocks over a high-speed, low-latency private network. A typical Oracle RAC cluster serving clients contains the following parts:
Cluster nodes (Cluster Node): 2 to N nodes or hosts running Oracle Database Server.
Private network (Network Interconnect): a high-speed private interconnect between the RAC nodes that handles communication and Cache Fusion traffic.
Shared storage (Shared Storage): RAC requires shared storage devices so that all nodes can access the data files.
External service network (Production Network): the network through which RAC serves clients and applications.
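The cluster-node picture above maps directly onto gv$instance. A minimal sketch listing every instance and the host it runs on:

    -- One row per running instance; PARALLEL = 'YES' indicates RAC mode.
    SELECT inst_id, instance_name, host_name, status, parallel
      FROM gv$instance;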
RAC background processes
Oracle RAC has its own unique background processes, which play no role in a single-instance configuration. The main background processes that RAC runs, and their functions, are described below.
(1) LMS (Global Cache Service process) manages access to data blocks across the cluster and transmits block images between the Buffer Caches of different instances. It copies a block directly from the cache of the instance holding it and sends the copy to the requesting instance, ensuring that a data block has only one current image across the Buffer Caches of all instances. LMS coordinates block access by passing messages between instances: when an instance requests a block, its LMD process issues a request for the block resource, which is directed to the LMD process of the instance mastering the block and to the LMD process of the instance currently using it. The LMS process of the instance that owns the resource then builds a consistent-read image of the block and ships it to the Buffer Cache of the requesting instance. LMS also guarantees that only one instance at a time may update a block, and it maintains the block's mirror records (including the status flag of the updated block). RAC provides up to 10 LMS processes (numbered 0 through 9), and their number grows as the volume of data passed between nodes increases.

(2) LMON (Lock Monitor process) is the global enqueue service monitor. The LMON processes of the instances communicate regularly to check the health of every node in the cluster. When a node fails, LMON takes charge of cluster reconfiguration, GRD recovery, and related operations. The service it provides is called Cluster Group Services (CGS).
LMON relies mainly on two heartbeat mechanisms to perform its health checks:
(1) Network heartbeat between nodes (Network Heartbeat): nodes send ping packets to one another at regular intervals to probe each other's state; if a response arrives within the specified time, the peer is considered healthy.
(2) Disk heartbeat through the control file (Controlfile Heartbeat): the CKPT process of each node updates one block of the control file every 3 seconds. This block is called the Checkpoint Progress Record, and because the control file is shared, the instances can check whether the others are updating it on time.
(3) LMD (Global Enqueue Service Daemon, the lock management daemon) is the background process that manages the resource requests controlling access to blocks and global enqueues. Within each instance, LMD handles incoming remote resource requests, that is, lock requests from the other instances in the cluster. It is also responsible for deadlock detection and for monitoring conversion timeouts.
(4) LCK (the lock process) manages non-Cache-Fusion resources, i.e. local resource requests. It handles resource requests and cross-instance call operations for shared resources, and during recovery it builds a list of invalid lock elements and validates them. Because LMS carries out most of the lock management, a single LCK process per instance is sufficient.
(5) DIAG (the diagnosability daemon) captures information about process failures in the RAC environment and writes trace information out for failure analysis; what DIAG produces is very useful when working with Oracle Support to find the cause of a failure. Each instance needs only one DIAG process.
(6) GSD (the global service daemon) interacts with the RAC management tools dbca, srvctl, and OEM to carry out administrative tasks such as instance startup and shutdown. For these tools to work, gsd must first be started on every node; one GSD process can serve multiple RAC databases. The gsd executable resides in a node's $ORACLE_HOME/bin directory, and its log file is $ORACLE_HOME/srvm/log/gsdaemon.log.

The GCS and GES services maintain the status of each datafile and cached block through the Global Resource Directory (GRD). When one instance accesses data and caches it, the other instances in the cluster obtain a corresponding block image, so when they access the same data they read the cache in the SGA directly instead of going to disk. The GRD lives in the memory structures of every active instance, which is why the SGA of a RAC environment is larger than that of a single-instance database system. The other processes and memory structures differ little from a single-instance database.
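Which of these background processes are actually running in an instance can be checked in v$bgprocess. A minimal sketch (a non-zero PADDR marks a started process):

    -- RAC-related background processes currently started in this instance.
    SELECT name, description
      FROM v$bgprocess
     WHERE paddr <> HEXTORAW('00')
       AND (name LIKE 'LM%' OR name LIKE 'LCK%' OR name = 'DIAG');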
RAC shared storage
RAC requires shared storage that is independent of any instance; the ocr and votedisk mentioned above, as well as the datafiles, live on this shared storage. The available storage methods include OCFS, OCFS2, RAW, NFS, and ASM. OCFS (Oracle Cluster File System) and OCFS2 are simply file systems which, like NFS, provide a shared-storage file system for a cluster environment.

RAW devices are another storage method, supported by RAC in releases prior to Oracle 11g. Before Oracle 9i, OPS/RAC could only use this approach: map the shared storage to RAW devices, then place the data Oracle needs on those raw devices. But compared with a file system, RAW is not intuitive and not easy to manage, and the number of RAW devices is limited, so it clearly needed a replacement. That is where a file system like OCFS comes in; it is just a cluster file system implemented by Oracle itself, and file systems from other vendors can also serve as storage choices.

ASM is only a database storage solution, not a cluster solution, so it should be distinguished from RAW and OCFS/OCFS2 even though they appear at the same level: RAW and OCFS/OCFS2 can serve both as database storage and as storage for Clusterware (the storage CRS needs), whereas ASM stores only the database; strictly speaking, ASM is just a node application (nodeapps) within RAC. ASM cannot hold the ocr and votedisk that a clusterware installation requires, because ASM itself needs an instance while CRS sits entirely outside that architecture. That is why, even when an ASM solution is adopted, OCFS/OCFS2 or RAW must still be added for the clusterware files. A comparison of the RAC shared storage methods follows:
Cluster file system: OCFS/OCFS2 on Windows and Linux, or GPFS on AIX. The advantage is easy, intuitive management; the disadvantage is that it is based on file-system management software and goes through the OS cache layer, so it is not well suited to production use. It can hold both CRS clusterware files and database files.
RAW device mode: using a hardware-supported shared storage system, data is stored directly on RAW devices. Both clusterware files and database files are supported.
Network File System (NFS): shared storage implemented over NFS, but an Oracle-certified NFS is required. It can hold CRS clusterware files and database files.
ASM: combines the high performance of RAW with the manageability of a cluster file system. It was introduced in Oracle 10g, but ASM itself must be supported by an Oracle instance, so ASM can hold only database files, not CRS files.
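When ASM is in use, the disk groups behind the database can be listed from the instance. A minimal sketch against v$asm_diskgroup (Oracle 10g and later):

    -- State, redundancy type (EXTERNAL/NORMAL/HIGH), and free space of
    -- every ASM disk group visible to this instance.
    SELECT name, state, type, total_mb, free_mb
      FROM v$asm_diskgroup;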
The difference between a RAC database and a single-instance database
For all instances in RAC to access the database, all datafiles, control files, PFILE/SPFILE, and redo log files must be stored on shared disks (raw devices or a cluster file system) that all nodes can access simultaneously. Structurally, a RAC database differs from a single-instance one in that each additional instance is configured with its own redo thread; for example, a cluster of two instances requires at least four redo log groups, two per instance. Each instance also needs its own UNDO tablespace.
1. Redo and undo. Each instance uses its own redo and undo segments when it modifies the database and locks the modified data itself, so the operations of different instances stay relatively independent and data inconsistency is avoided. Redo logs and archive logs need special consideration in this setup when backing up or restoring, which is discussed later.
2. Memory and processes. The instance on each node has its own memory structures and process structures, and the structures on each node are essentially the same. Through Cache Fusion, RAC synchronizes the cached information in the SGAs across nodes to improve access speed and ensure consistency.
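To make the redo-thread and undo requirements concrete, here is a minimal sketch of the DDL for bringing a second instance into an existing database; the +DATA disk group, file names, and sizes are hypothetical:

    -- Hypothetical paths/sizes: give instance 2 its own redo thread
    -- (two groups) and its own undo tablespace.
    ALTER DATABASE ADD LOGFILE THREAD 2
      GROUP 3 ('+DATA/orcl/redo03.log') SIZE 50M,
      GROUP 4 ('+DATA/orcl/redo04.log') SIZE 50M;

    ALTER DATABASE ENABLE PUBLIC THREAD 2;

    CREATE UNDO TABLESPACE undotbs2
      DATAFILE '+DATA/orcl/undotbs02.dbf' SIZE 500M AUTOEXTEND ON;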
This concludes the look at Oracle's high-availability cluster solutions. Theory works best when paired with practice, so go and try it out!