Introduction to Oracle Cluster Health Monitor (CHM) 02/09 Update SLTechnology News&Howtos


2026-02-09 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

Overview

Cluster Health Monitor (CHM) is an Oracle tool that automatically collects operating system resource usage (CPU, memory, swap, processes, I/O, network, and so on). CHM collects this data once per second.

This data is very useful for diagnosing node reboots, hangs, instance evictions, and performance problems. Users can also use CHM to spot problems such as high system load or abnormal memory usage early, before they turn into more serious problems.

CHM is installed automatically with the following software:

Oracle Grid Infrastructure 11.2.0.2 and later for Linux (excluding Linux Itanium) and Solaris (SPARC 64 and x86-64)

Oracle Grid Infrastructure 11.2.0.3 and later for AIX and Windows (excluding Windows Itanium)
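If you need to apply this rule across an inventory of hosts, the list above maps naturally onto a per-platform minimum release. The following Python sketch encodes only the platforms and versions listed above; the platform labels and function names are hypothetical, and a False result for an unlisted platform only means "not covered by this list".

```python
# Minimum Grid Infrastructure release that installs CHM automatically,
# taken from the list above. The platform keys are hypothetical labels.
CHM_MIN_RELEASE = {
    "linux": (11, 2, 0, 2),            # excluding Linux Itanium
    "solaris-sparc64": (11, 2, 0, 2),
    "solaris-x86-64": (11, 2, 0, 2),
    "aix": (11, 2, 0, 3),
    "windows": (11, 2, 0, 3),          # excluding Windows Itanium
}
EXCLUDED = {"linux-itanium", "windows-itanium"}


def parse_release(release: str) -> tuple[int, ...]:
    """'11.2.0.3' -> (11, 2, 0, 3); tuples compare element by element."""
    return tuple(int(part) for part in release.split("."))


def chm_auto_installed(platform: str, release: str) -> bool:
    """True if the list above says CHM ships with this platform/release.

    Platforms missing from the list return False, meaning "not covered
    by this list", not necessarily "unsupported".
    """
    if platform in EXCLUDED or platform not in CHM_MIN_RELEASE:
        return False
    return parse_release(release) >= CHM_MIN_RELEASE[platform]


print(chm_auto_installed("linux", "11.2.0.2"))
print(chm_auto_installed("aix", "11.2.0.2"))
```

Comparing integer tuples rather than raw strings avoids lexical surprises such as "11.2.0.10" sorting before "11.2.0.2".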

To check the status of the CHM resource (ora.crf) in the cluster, use either of the following commands (the second queries ora.crf directly):

$ crsctl stat res -t -init

[root@testrac2 bin]# ./crsctl stat res ora.crf -init

NAME=ora.crf

TYPE=ora.crf.type

TARGET=ONLINE

STATE=ONLINE on testrac2
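Because the resource attributes are printed as KEY=VALUE lines, checking ora.crf from a monitoring script is mostly a parsing problem. A minimal Python sketch, assuming you have already captured the command output as text (it does not call crsctl itself); the function names are hypothetical:

```python
def parse_crsctl_resource(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (as printed by crsctl stat res <name> -init)."""
    attrs: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not KEY=VALUE
        key, _, value = line.partition("=")
        attrs[key.strip()] = value.strip()
    return attrs


def is_chm_online(attrs: dict[str, str]) -> bool:
    """Treat ora.crf as healthy when TARGET and STATE both report ONLINE."""
    return (
        attrs.get("NAME") == "ora.crf"
        and attrs.get("TARGET") == "ONLINE"
        and attrs.get("STATE", "").startswith("ONLINE")  # e.g. "ONLINE on testrac2"
    )


sample = """NAME=ora.crf
TYPE=ora.crf.type
TARGET=ONLINE
STATE=ONLINE on testrac2
"""
attrs = parse_crsctl_resource(sample)
print(attrs["STATE"])
print(is_chm_online(attrs))
```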

CHM consists of two main services:

1) System Monitor Service (osysmond): this service runs on every node. osysmond sends each node's resource usage to the Cluster Logger Service, which receives the data from all nodes and saves it in the CHM repository.

$ ps -ef | grep osysmond

root 7984 1 0 Jun05 ? 01:16:14 /u01/app/11.2.0/grid/bin/osysmond.bin

2) Cluster Logger Service (ologgerd): a cluster has one master ologgerd and one standby ologgerd, running on different nodes. If ologgerd cannot run on the master node because of a problem, it is started on the standby node.

Master node:

$ ps -ef | grep ologgerd

root 8257 1 0 Jun05 ? 00:38:26 /u01/app/11.2.0/grid/bin/ologgerd -M -d /u01/app/11.2.0/grid/crf/db/rac2

Standby node:

$ ps -ef | grep ologgerd

root 8353 1 0 Jun05 ? 00:18:47 /u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d /u01/app/11.2.0/grid/crf/db/rac1
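The ps -ef checks for both services can be scripted. The Python sketch below parses ps -ef lines and guesses each ologgerd's role from its arguments. The role rule (-M on the master; -m <master host> -r on the standby) is inferred only from the sample output above, not from documented flag semantics, and all function names are hypothetical.

```python
import os
import shlex
from typing import NamedTuple, Optional


class PsEntry(NamedTuple):
    uid: str
    pid: int
    ppid: int
    cmd: str


def parse_ps_ef_line(line: str) -> PsEntry:
    """Split a ps -ef line into UID PID PPID C STIME TTY TIME CMD.

    Assumes STIME is one token (e.g. 'Jun05'); CMD keeps its spaces.
    """
    uid, pid, ppid, _c, _stime, _tty, _time, cmd = line.split(None, 7)
    return PsEntry(uid, int(pid), int(ppid), cmd)


def find_by_executable(lines: list[str], exe: str) -> list[PsEntry]:
    """Keep entries whose executable basename is exactly `exe`.

    Matching the basename instead of a substring drops the 'grep ...'
    line that `ps -ef | grep <name>` prints about itself.
    """
    entries = (parse_ps_ef_line(ln) for ln in lines if ln.strip())
    return [e for e in entries if os.path.basename(e.cmd.split()[0]) == exe]


def ologgerd_role(cmd: str) -> dict[str, Optional[str]]:
    """Guess master/standby from ologgerd's arguments (heuristic, see above)."""
    args = shlex.split(cmd)
    info: dict[str, Optional[str]] = {"role": "unknown", "master_host": None, "repo_path": None}
    for i, arg in enumerate(args):
        nxt = args[i + 1] if i + 1 < len(args) else None
        if arg == "-M":
            info["role"] = "master"
        elif arg == "-r":
            info["role"] = "standby"
        elif arg == "-m":
            info["master_host"] = nxt   # the standby names the master host
        elif arg == "-d":
            info["repo_path"] = nxt     # repository directory
    return info


rac1_ps = [
    "root 8353 1 0 Jun05 ? 00:18:47 /u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d /u01/app/11.2.0/grid/crf/db/rac1",
    "grid 9999 9998 0 08:00 pts/0 00:00:00 grep ologgerd",  # hypothetical grep line
]
for entry in find_by_executable(rac1_ps, "ologgerd"):
    print(entry.pid, ologgerd_role(entry.cmd))
```

The same find_by_executable call with "osysmond.bin" covers the osysmond check.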

CHM Repository: stores the collected data. By default it is located under the Grid Infrastructure home and requires 1 GB of disk space; each node consumes about 0.5 GB of space per day. You can use OCLUMON to change the repository path and the amount of space it may use (data can be retained for at most 3 days).
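Putting these figures together: a two-node cluster writes roughly 2 × 0.5 GB = 1 GB per day, which is the default repository size. A minimal Python sketch of that arithmetic, assuming the approximate 0.5 GB per node per day rate and the 3-day ceiling quoted above (both are rules of thumb, not values read from a live cluster; the function name is hypothetical):

```python
def estimate_repository_gb(nodes: int, days: float,
                           gb_per_node_per_day: float = 0.5,
                           max_days: float = 3.0) -> float:
    """Rough CHM repository size from the rules of thumb quoted above."""
    if nodes < 1 or days <= 0:
        raise ValueError("nodes must be >= 1 and days must be > 0")
    effective_days = min(days, max_days)  # data is kept for at most 3 days
    return nodes * gb_per_node_per_day * effective_days


print(estimate_repository_gb(2, 1))  # two nodes, one day
print(estimate_repository_gb(4, 7))  # four nodes, capped at three days
```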

View current settings

Use the following commands to view the current repository settings:

$ oclumon manage -get reppath

CHM Repository Path = /u01/app/11.2.0/grid/crf/db/rac2

Done

$ oclumon manage -get repsize

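CHM Repository Size = 68082

To dump the collected data, use oclumon dumpnodeview. The first command below writes the last 12 hours of data for node1, node2, and node3 to a file; the second writes the last 15 minutes for all nodes:

$ oclumon dumpnodeview -n node1 node2 node3 -last "12:00:00" > /tmp/chm1.txt

$ oclumon dumpnodeview -allnodes -last "00:15:00" > /tmp/chm1.txt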
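If you collect dumps from a script, building the argument list explicitly keeps the -last value intact without relying on shell quoting. The Python sketch below uses only the flags shown in the commands above (-n, -allnodes, -last); the HH:MM:SS check is an assumption based on the two example values, and the function name is hypothetical.

```python
import re
import shlex
from typing import Optional, Sequence

# Assumed -last format, based on the two examples above ("12:00:00", "00:15:00")
_DURATION = re.compile(r"\d{2}:[0-5]\d:[0-5]\d")


def build_dumpnodeview_cmd(last: str, nodes: Optional[Sequence[str]] = None) -> list[str]:
    """Return an argv list for oclumon dumpnodeview (built, not executed).

    nodes=None (or empty) selects -allnodes; otherwise the names follow -n.
    """
    if not _DURATION.fullmatch(last):
        raise ValueError(f"expected HH:MM:SS, got {last!r}")
    argv = ["oclumon", "dumpnodeview"]
    if nodes:
        argv += ["-n", *nodes]
    else:
        argv.append("-allnodes")
    argv += ["-last", last]
    return argv


print(shlex.join(build_dumpnodeview_cmd("12:00:00", ["node1", "node2", "node3"])))
print(shlex.join(build_dumpnodeview_cmd("00:15:00")))
```

On a cluster node you might then pass the list to subprocess.run(cmd, stdout=f, check=True) with f opened on /tmp/chm1.txt; that step needs a live Grid Infrastructure installation and is not shown here.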

Here is an excerpt from /tmp/chm1.txt:

Node: rac1 Clock: '06-15-12 07.40.01' SerialNo:168880

SYSTEM:

#cpus: 1 cpu: 17.96 cpuq: 5 physmemfree: 32240 physmemtotal: 2065856 mcache: 1064024 swapfree: 3988376 swaptotal: 4192956 ior: 57 iow: 59 ios: 10 swpin: 0 swpout: 0 pgin: 57 pgout: 59 netr: 65.767 netw: 34.871 procs: 183 rtprocs: 10 #fds: 4902 #sysfdlimit: 6815744 #disks: 4 #nics: 3 nicErrors: 0

TOP CONSUMERS:

topcpu: 'mrtg (32385) 64.70' topprivmem: 'ologgerd (8353) 84068' topshm: 'oracle (8760) 329452' topfd: 'ohasd.bin (6627) 720' topthread: 'crsd.bin (8235) 44'

PROCESSES:

name: 'mrtg' pid: 32385 #procfdlimit: 65536 cpuusage: 64.70 privmem: 1160 shm: 1584 #fd: 5 #threads: 1 priority: 20 nice: 0

name: 'oracle' pid: 32381 #procfdlimit: 65536 cpuusage: 0.29 privmem: 1456 shm: 12444 #fd: 32 #threads: 1 priority: 15 nice: 0

...

name: 'oracle' pid: 8756 #procfdlimit: 65536 cpuusage: 0.0 privmem: 2892 shm: 24356 #fd: 47 #threads: 1 priority: 16 nice: 0

Node: rac2 Clock: '06-15-12 07.40.02' SerialNo:168878

SYSTEM:

#cpus: 1 cpu: 40.72 cpuq: 8 physmemfree: 34072 physmemtotal: 2065856 mcache: 1005636 swapfree: 3991808 swaptotal: 4192956 ior: 54 iow: 104 ios: 11 swpin: 0 swpout: 0 pgin: 54 pgout: 104 netr: 77.817 netw: 33.008 procs: 178 rtprocs: 10 #fds: 4948 #sysfdlimit: 6815744 #disks: 4 #nics: 4 nicErrors: 0

TOP CONSUMERS:

topcpu: 'orarootagent.bi (8490) 1.59' topprivmem: 'ologgerd (8257) 83108' topshm: 'oracle (8873) 324868' topfd: 'ohasd.bin (6744) 720' topthread: 'crsd.bin (8362) 47'

PROCESSES:

name: 'oracle' pid: 9040 #procfdlimit: 65536 cpuusage: 0.19 privmem: 6040 shm: 121712 #fd: 33 #threads: 1 priority: 16 nice: 0

...
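When many samples are dumped, reading them by eye gets tedious. The following Python sketch splits a dump into per-node records and extracts the SYSTEM metrics and TOP CONSUMERS entries. It assumes the line layout of the excerpt above (a "Node:" header, a SYSTEM line starting with "#cpus:", a line starting with "topcpu:"); the sample string is abridged from that excerpt, the function name is hypothetical, and the free-memory figure is a ratio so it does not depend on the unit of physmemfree.

```python
import re

# Node header, e.g. "Node: rac1 Clock: '06-15-12 07.40.01' SerialNo:168880"
_NODE = re.compile(r"Node:\s*(\S+)\s+Clock:\s*'([^']*)'\s*SerialNo:\s*(\d+)")
# Numeric "key: value" pairs on the SYSTEM line; some keys start with '#'
_PAIR = re.compile(r"(#?[A-Za-z]+):\s*(-?\d+(?:\.\d+)?)")
# TOP CONSUMERS entries of the form  metric: 'process (pid) value'
_TOP = re.compile(r"(\w+):\s*'([^'(]+?)\s*\((\d+)\)\s*([\d.]+)'")


def parse_dump(text: str) -> list[dict]:
    """Group dumpnodeview text into one record per node header.

    Only the SYSTEM metrics line and the TOP CONSUMERS line are
    interpreted; every other line is ignored in this sketch.
    """
    records: list[dict] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = _NODE.match(line)
        if header:
            records.append({"node": header.group(1), "clock": header.group(2),
                            "serial": int(header.group(3)), "system": {}, "top": {}})
        elif records and line.startswith("#cpus:"):
            records[-1]["system"] = {k: float(v) for k, v in _PAIR.findall(line)}
        elif records and line.startswith("topcpu:"):
            records[-1]["top"] = {metric: (proc, int(pid), float(val))
                                  for metric, proc, pid, val in _TOP.findall(line)}
    return records


sample = """\
Node: rac1 Clock: '06-15-12 07.40.01' SerialNo:168880
SYSTEM:
#cpus: 1 cpu: 17.96 cpuq: 5 physmemfree: 32240 physmemtotal: 2065856
TOP CONSUMERS:
topcpu: 'mrtg (32385) 64.70' topprivmem: 'ologgerd (8353) 84068'
Node: rac2 Clock: '06-15-12 07.40.02' SerialNo:168878
SYSTEM:
#cpus: 1 cpu: 40.72 cpuq: 8 physmemfree: 34072 physmemtotal: 2065856
TOP CONSUMERS:
topcpu: 'orarootagent.bi (8490) 1.59' topprivmem: 'ologgerd (8257) 83108'
"""

records = parse_dump(sample)
for rec in records:
    s = rec["system"]
    mem_free_pct = 100 * s["physmemfree"] / s["physmemtotal"]  # unit-free ratio
    proc, pid, value = rec["top"]["topcpu"]
    print(f"{rec['node']} {rec['clock']}: cpu={s['cpu']} memfree={mem_free_pct:.2f}% "
          f"topcpu={proc}({pid}) {value}")

busiest = max(records, key=lambda r: r["system"]["cpu"])
print("highest cpu:", busiest["node"])
```

In this abridged sample rac2 shows the higher cpu value (40.72 versus 17.96), and topprivmem on each node points at its ologgerd process (8353 on rac1, 8257 on rac2), the same PIDs that appear in the ps output earlier.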

For more details on CHM, refer to the official Oracle documentation:

http://docs.oracle.com/cd/E11882_01/rac.112/e16794/troubleshoot.htm#CWADD92242

Oracle Clusterware Administration and Deployment Guide

11g Release 2 (11.2)

Part Number E16794-17

Or the My Oracle Support document:

Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1)
