
How to understand the lmd process of oracle rac


This article explains how to understand the lmd process of Oracle RAC. The conclusions are summarized first, followed by the tests that support them; hopefully this helps resolve common doubts about what lmd does and what happens when it fails.

Conclusion

1. The test environment is Oracle 10.2.0.1 RAC.

2. If a core RAC background process is terminated abnormally, the instance restarts, and a SYSTEMSTATE DUMP file is generated before the instance is shut down (a sketch of taking such a dump by hand follows this list).

3. The lmon process monitors the lmd process; that is, if the lmd process dies, it is brought back under lmon's supervision.

4. The LMD process provides the Global Enqueue Service (GES); put bluntly, it manages resource requests across the RAC instances. This shows how important LMD is: if LMD fails, DML operations against the database will HANG, which in turn delays IPC communication between the RAC nodes.

5. An IPC communication delay produces a corresponding LMD TRACE FILE.
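
The dump mentioned in point 2 is written automatically by the background processes when the instance is torn down, but a comparable dump can also be requested manually with oradebug. A minimal sketch, assuming a SYSDBA session on one node (level 266 is a commonly used systemstate level, not something taken from the test below):

SQL> connect / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 266
SQL> oradebug tracefile_name

The last command prints the path of the trace file the dump was written to.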

Test

-- lmd meaning

The lmd process is the process responsible for the Global Enqueue Service (GES).

On each RAC instance it services resource (enqueue) requests coming from the remote RAC nodes. It is also a daemon process, meaning it is watched by a monitoring process and restarted by that process if it disappears.

It follows that if the lmd process is interrupted abnormally, the RAC instance is forcibly shut down, and a systemstate dump is generated before the instance terminates so the failure can be analyzed.
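
Before breaking anything, GES activity can be watched from the standard RAC fixed views; the following is just a sketch of the kind of query one might run (the views exist in 10g, but which statistics are interesting is a matter of judgment):

SQL> select inst_id, name, value from gv$ges_statistics where value > 0 order by inst_id, value desc;
SQL> select inst_id, count(*) from gv$ges_enqueue group by inst_id;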

[oracle@jingfa1 ~]$ ps -ef | grep lmd
oracle    4774     1  0 Nov09 ?        00:00:31 asm_lmd0_+ASM1
oracle   11220     1  0 02:13 ?        00:00:15 ora_lmd0_jingfa1
oracle   30706 30376  0 05:19 pts/3    00:00:00 grep lmd

[oracle@jingfa1]$ kill -9 11220

Tue Nov 10 05:20:03 2015

Errors in file /u01/app/oracle/admin/jingfa/bdump/jingfa1_pmon_11212.trc:

ORA-00482: LMD* process terminated with error

Tue Nov 10 05:20:03 2015

PMON: terminating instance due to error 482

Tue Nov 10 05:20:03 2015

Errors in file /u01/app/oracle/admin/jingfa/bdump/jingfa1_lms0_11222.trc:

ORA-00482: LMD* process terminated with error

Tue Nov 10 05:20:03 2015

System state dump is made for local instance

System State dumped to trace file /u01/app/oracle/admin/jingfa/bdump/jingfa1_diag_11214.trc

Tue Nov 10 05:20:03 2015

Trace dumping is performing id=[cdmp_20151110052003]

Tue Nov 10 05:20:08 2015

Instance terminated by PMON, pid = 11212

-- immediately afterwards, the instance restarts automatically

Tue Nov 10 05:21:05 2015

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

It can be seen that the lmd process will restart automatically.

[oracle@jingfa1 ~]$ ps -ef | grep lmd
oracle    3474 30376  0 05:23 pts/3    00:00:00 grep lmd
oracle    4774     1  0 Nov09 ?        00:00:31 asm_lmd0_+ASM1
oracle   32703     1  0 05:21 ?        00:00:00 ora_lmd0_jingfa1

It was said above that the health of the lmd process is the responsibility of a monitoring process; according to the official documentation, that process is lmon. The LMON process handles cross-instance (global) enqueue and resource management for each RAC instance, and also performs recovery of global enqueue locks.
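
A convenient way to check how Oracle itself describes these background processes is v$bgprocess; a small sketch (the LIKE filter is simply a way of picking out the lock-manager processes):

SQL> select name, description from v$bgprocess where name like 'LM%';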

[oracle@jingfa1 bdump]$ ps -ef | grep lmon
oracle    4772     1  0 Nov09 ?        00:00:29 asm_lmon_+ASM1
oracle   19857 30376  0 05:34 pts/3    00:00:00 grep lmon
oracle   32701     1  0 05:21 ?        00:00:02 ora_lmon_jingfa1

[oracle@jingfa1 bdump]$ kill -9 32701

It can be seen that if LMON is interrupted abnormally, the LMD process under its supervision is also forced down.

[oracle@jingfa1 bdump]$ ps -ef | grep lmd
oracle    4774     1  0 Nov09 ?        00:00:32 asm_lmd0_+ASM1
oracle   21171 30376  0 05:34 pts/3    00:00:00 grep lmd

It can be seen that whenever the lmon process is interrupted abnormally, the database instance will be forced to restart.

Tue Nov 10 05:34:18 2015

Errors in file /u01/app/oracle/admin/jingfa/bdump/jingfa1_pmon_32695.trc:

ORA-00481: LMON process terminated with error

Tue Nov 10 05:34:18 2015

PMON: terminating instance due to error 481

Tue Nov 10 05:34:18 2015

System state dump is made for local instance

System State dumped to trace file /u01/app/oracle/admin/jingfa/bdump/jingfa1_diag_32697.trc

Tue Nov 10 05:34:18 2015

Trace dumping is performing id=[cdmp_20151110053418]

Tue Nov 10 05:34:23 2015

Instance terminated by PMON, pid = 32695

Tue Nov 10 05:35:19 2015

Starting ORACLE instance (normal)

It can be seen that lmon and lmd will restart automatically.

[oracle@jingfa1 bdump]$ ps -ef | grep lmon
oracle    4772     1  0 Nov09 ?        00:00:30 asm_lmon_+ASM1
oracle   21820     1  0 05:35 ?        00:00:01 ora_lmon_jingfa1
oracle   27926 30376  0 05:39 pts/3    00:00:00 grep lmon

[oracle@jingfa1 bdump]$ ps -ef | grep lmd
oracle    4774     1  0 Nov09 ?        00:00:33 asm_lmd0_+ASM1
oracle   21822     1  0 05:35 ?        00:00:00 ora_lmd0_jingfa1
oracle   28028 30376  0 05:39 pts/3    00:00:00 grep lmd

By extension, there must be some mechanism, presumably at the operating system level, that ensures the lmon and lmd processes are restarted after an abnormal interruption. What is this mechanism?

After examining the processes at the operating system level, mainly the scripts under /etc/init.d, it turns out that lmon and lmd belong to the Oracle instance itself rather than to the clusterware, and no init script controls them directly.
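
A rough way to repeat that check is sketched below; the grep targets are assumptions about what is worth searching for, not commands recorded in the original test:

[oracle@jingfa1 ~]$ ls /etc/init.d/ | grep -i crs
[oracle@jingfa1 ~]$ grep -il "lmon\|lmd0" /etc/init.d/* 2>/dev/null

On a typical 10g CRS install the clusterware init scripts (init.crs, init.cssd and so on) show up in the first listing, while nothing under /etc/init.d refers to lmon or lmd, which is consistent with these processes being owned by the instance rather than by an OS-level respawn mechanism.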

Let's look at it another way. What are the parameters related to the lmd process and what do they mean?

NAME_1               VALUE_1   DESC1
-------------------- --------- -----------------------------------------
_lm_lmd_waittime     8         default waittime for lmd in centiseconds
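
The query behind that output is not shown in the original. Hidden parameters such as _lm_lmd_waittime are normally read, as SYSDBA, from the x$ksppi / x$ksppcv fixed tables; a minimal sketch with aliases matching the headings above:

SQL> select i.ksppinm name_1, v.ksppstvl value_1, i.ksppdesc desc1
       from x$ksppi i, x$ksppcv v
      where i.indx = v.indx and i.ksppinm like '%lm_lmd%';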

-- node1

SQL> select addr,program,username,pid,spid from v$process where username='oracle' and spid=21822;

ADDR             PROGRAM                        USERNAME       PID SPID
---------------- ------------------------------ ---------- ------- ------------
0000000083A585C8 oracle@jingfa1 (LMD0)          oracle           6 21822

-- node2

SQL> select addr,program,username,pid,spid from v$process where username='oracle' and spid=668;

ADDR             PROGRAM                        USERNAME       PID SPID
---------------- ------------------------------ ---------- ------- ------------
0000000083A585C8 oracle@jingfa2 (LMD0)          oracle           6 668

-- node2

SQL> conn tbs_zxy/system

Connected.
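
The original never shows how t_lock was created or populated; the statements below are a guessed minimal setup (a single numeric column a holding one row with a = 1, matching the reconstructed updates that follow):

SQL> create table t_lock (a number);
SQL> insert into t_lock values (1);
SQL> commit;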

SQL> update t_lock set a=11 where a=1;

1 row updated.

-- node1

SQL> update t_lock set a=1111 where a=1;

-- the update hangs

It can be seen that the parameter above is not directly related to lock detection; nevertheless, lmd is clearly involved in global locks.
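
While the second update is hanging, the cross-instance blocking can usually be seen from gv$lock; a sketch of the sort of query that would show blocker and waiter (the TX filter and the BLOCK/REQUEST predicate are the conventional way of spotting row-lock contention):

SQL> select inst_id, sid, type, id1, id2, lmode, request, block
       from gv$lock
      where type = 'TX' and (block > 0 or request > 0);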

To push further: what happens if oradebug is used to suspend lmd and a global lock conflict is then created?

-- node1

-- pause lmd

SQL> oradebug setospid 21822

Oracle pid: 6, Unix process pid: 21822, image: oracle@jingfa1 (LMD0)

SQL> oradebug suspend

Statement processed.

Tue Nov 10 06:03:44 2015

Unix process pid: 21822, image: oracle@jingfa1 (LMD0) flash frozen

-- node2

-- pause lmd

SQL> oradebug setospid 668

Oracle pid: 6, Unix process pid: 668, image: oracle@jingfa2 (LMD0)

SQL> oradebug suspend

Statement processed.

Tue Nov 10 06:06:08 2015

Unix process pid: 668, image: oracle@jingfa2 (LMD0) flash frozen

-- node2

SQL> update t_lock set a=11 where a=1;

1 row updated.

-- node1

SQL> update t_lock set a=1111 where a=1;

-- the update hangs

Now observe the alert logs of node 1 and node 2.

-- node2

Tue Nov 10 06:09:42 2015

IPC Send timeout detected. Sender: ospid 682  -- the sender is the SMON process

Receiver: inst 1 binc 432326879 ospid 21822  -- the receiver is the LMD process of node 1

Tue Nov 10 06:09:45 2015

IPC Send timeout to 0.0 inc 20 for msg type 12 from opid 12  -- as above, the sender (opid 12) is the SMON process

Tue Nov 10 06:09:45 2015

Communications reconfiguration: instance_number 1

Tue Nov 10 06:09:45 2015

IPC Send timeout detected. Sender: ospid 696  -- the sender is the MMON process

Receiver: inst 1 binc 432326879 ospid 21822  -- the receiver is the LMD process of node 1

Tue Nov 10 06:09:48 2015

IPC Send timeout to 0.0 inc 20 for msg type 12 from opid 15  -- as above, the sender (opid 15) is the MMON process

-- node1

Tue Nov 10 06:09:23 2015

IPC Send timeout detected. Receiver ospid 21822  -- the receiver is the LMD process

Tue Nov 10 06:09:23 2015

Errors in file /u01/app/oracle/admin/jingfa/bdump/jingfa1_lmd0_21822.trc:  -- an LMD trace file is generated

IPC Send timeout detected. Receiver ospid 21822  -- same as above

Tue Nov 10 06:09:27 2015

Errors in file /u01/app/oracle/admin/jingfa/bdump/jingfa1_lmd0_21822.trc:

It can be seen from the above that lmd is indeed involved in global lock acquisition, and that a failed (or frozen) LMD process leads to communication problems between the two RAC nodes.

[oracle@jingfa2 bdump]$ ps -ef | grep 682
oracle     682     1  0 02:14 ?        00:00:01 ora_smon_jingfa2
oracle    7157 13004  0 06:15 pts/1    00:00:00 grep 682

SQL> select spid,pid,program from v$process where spid=696;

SPID                PID PROGRAM
------------ ---------- ------------------------------
696                  15 oracle@jingfa2 (MMON)
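
The original does not show the clean-up, but after an oradebug suspend test like this the frozen LMD processes would presumably have to be resumed (or the instances bounced) before the cluster behaves normally again. A minimal sketch on node 1, reusing the ospid from above, with the same done on node 2 for ospid 668:

SQL> oradebug setospid 21822
SQL> oradebug resume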

At this point, the study of how to understand the lmd process of Oracle RAC is complete. Hopefully the combination of theory and the hands-on tests above helps resolve any remaining doubts; try it for yourself in a test environment.
