The process and solution of the endless cycle of user-mode process in linux system 07/12 Update SLTechnology News&Howtos


2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how an infinite loop in a user-mode process on a Linux system shows up, how it was analyzed, and how it can be resolved.

1. Symptoms

The business process (a multithreaded user-mode program) hangs, the operating system responds slowly, and the system log shows nothing unusual. Judging from the kernel-mode stacks of the process, all threads appear to be stuck in the following kernel stack:

[root@vmc116 ~]# cat /proc/27007/task/11825/stack

[] retint_careful+0x14/0x32

[] 0xffffffffffffffff
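Checking one thread at a time is slow. Instead, the innermost kernel frame of every thread can be tallied in one pass over /proc/<pid>/task/*/stack. The Python sketch below makes a few assumptions:

- the line format is the one shown above;
- the script has root privileges, because the stack file is usually readable only by root;
- `parse_frame` and `top_frames` are hypothetical helper names, not existing tools.

```python
import glob
import re
from collections import Counter

# One frame per line, e.g. "[<address>] retint_careful+0x14/0x32".
# The address between "[<" and ">]" may be blank or masked, so it is optional.
FRAME_RE = re.compile(
    r"^\[<?([0-9a-fA-Fx]*)>?\]\s+(\S+?)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)


def parse_frame(line):
    """Return (symbol, offset, size) for one stack line, or None.

    Lines without a "symbol+offset/size" part (such as the trailing
    "0xffffffffffffffff" terminator) yield None.
    """
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    _addr, symbol, offset, size = m.groups()
    return symbol, int(offset, 16), int(size, 16)


def top_frames(pid):
    """Count the innermost kernel symbol of every thread of pid."""
    counts = Counter()
    for path in glob.glob(f"/proc/{pid}/task/*/stack"):
        try:
            with open(path) as f:
                lines = f.read().splitlines()
        except OSError:
            continue  # thread exited meanwhile, or no permission
        for line in lines:
            frame = parse_frame(line)
            if frame is not None:
                counts[frame[0]] += 1
                break  # only the first (innermost) frame matters here
    return counts
```

If all threads are stuck in the same place, `top_frames(27007)` would be expected to report `retint_careful` for nearly every thread of the process above.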

2. Problem analysis

1) Kernel stack analysis

Judging from the kernel stacks, all threads are blocked at retint_careful, which is part of the interrupt-return path. The relevant assembly code in entry_64.S is as follows:

ret_from_intr:
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    decl PER_CPU_VAR(irq_count)

    /* Restore saved previous stack */
    popq %rsi
    CFI_DEF_CFA rsi,SS+8-RBP    /* reg/off reset after def_cfa_expr */
    leaq ARGOFFSET-RBP(%rsi), %rsp
    CFI_DEF_CFA_REGISTER rsp
    CFI_ADJUST_CFA_OFFSET RBP-ARGOFFSET
    ...
retint_careful:
    CFI_RESTORE_STATE
    bt    $TIF_NEED_RESCHED,%edx
    jnc   retint_signal
    TRACE_IRQS_ON
    ENABLE_INTERRUPTS(CLBR_NONE)
    pushq_cfi %rdi
    SCHEDULE_USER
    popq_cfi %rdi
    GET_THREAD_INFO(%rcx)
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    jmp retint_check

This is the path a user-mode process takes when it returns from an interrupt that arrived while it was running in user mode. Disassembling the code around retint_careful+0x14/0x32 confirms that the blocking point is

SCHEDULE_USER

SCHEDULE_USER expands to a call to schedule() (or to schedule_user() when CONFIG_CONTEXT_TRACKING is enabled). In other words, on the interrupt-return path the thread finds that it needs to be rescheduled (TIF_NEED_RESCHED is set), so it is switched out at this point.
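The control flow of retint_careful can be mimicked with a toy model. The Python sketch below is not kernel code. The flag values, `deliver_signal` and `return_from_interrupt` are hypothetical stand-ins:

- `NEED_RESCHED` stands in for TIF_NEED_RESCHED;
- `deliver_signal` stands in for the retint_signal path;
- `return_from_interrupt` stands in for the retint_check/retint_careful loop.

It illustrates why a thread that is interrupted while other runnable threads wait for a CPU keeps being switched out at this spot. Each time the thread is picked again, the flags are re-checked. If a reschedule was requested in the meantime, it goes round the loop again.

```python
NEED_RESCHED = 1 << 0  # stands in for TIF_NEED_RESCHED (toy value, not the kernel's bit)
SIGPENDING = 1 << 1    # stands in for the signal / notify-resume work flags
WORK_MASK = NEED_RESCHED | SIGPENDING  # the only flags this toy models


def deliver_signal(flags):
    """Toy retint_signal path: handle the pending signal and clear its flag."""
    return flags & ~SIGPENDING


def return_from_interrupt(flags, schedule, handle_signal=deliver_signal):
    """List the actions taken before an interrupted thread resumes in user mode.

    schedule(flags) models SCHEDULE_USER: it returns the work flags as they
    look when the thread is picked to run again.
    """
    flags &= WORK_MASK
    actions = []
    while flags:                          # retint_check: any work pending?
        if flags & NEED_RESCHED:          # bt $TIF_NEED_RESCHED,%edx
            actions.append("schedule")    # SCHEDULE_USER: the thread leaves the CPU
            # Once it runs again it jumps back to retint_check, so a reschedule
            # requested in the meantime sends it round the loop once more.
            flags = schedule(flags & ~NEED_RESCHED) & WORK_MASK
        else:                             # jnc retint_signal
            actions.append("signal")
            flags = handle_signal(flags) & WORK_MASK
    actions.append("return to user")      # iretq back to the user-mode code
    return actions
```

In this model, a heavily loaded CPU corresponds to a `schedule` callback that keeps returning `NEED_RESCHED`. The thread then collects several "schedule" steps before it finally returns to user mode. This is consistent with a runnable but currently waiting thread being observed at retint_careful.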

This raises a question: why does the stack not show a frame for schedule() itself?

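Because the call is made directly from assembly code, no conventional stack frame or context-saving sequence surrounds it. In addition, on x86 the kernel skips scheduler functions (those marked __sched) when it collects the stack trace of a task that is not currently running, as /proc/PID/stack does. As a result, schedule() and __schedule() are filtered out, and the innermost entry shown is the return address in the caller, retint_careful+0x14.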

2) State information analysis

The top output shows that the relevant threads remain in the R (running) state the whole time. The CPUs are almost completely saturated, and most of the CPU time is spent in user mode:

[root@vmc116 ~]# top
top - 09:42:23 up 16 days,  2:21, 23 users,  load average: 84.08, 84.30, 83.62
Tasks: 1037 total,  85 running, 952 sleeping,   0 stopped,   0 zombie
Cpu(s): 97.6%us,  2.2%sy,  0.2%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32878852k total, 32315464k used,   563388k free,   374152k buffers
Swap: 35110904k total,    38644k used, 35072260k free, 28852536k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
27074 root      20   0 5316m 163m  14m R 10.2  0.5 321:06.17 z_itask_templat
27084 root      20   0 5316m 163m  14m R 10.2  0.5 296:23.37 z_itask_templat
27085 root      20   0 5316m 163m  14m R 10.2  0.5 337:57.26 z_itask_templat
27095 root      20   0 5316m 163m  14m R 10.2  0.5 327:31.93 z_itask_templat
27102 root      20   0 5316m 163m  14m R 10.2  0.5 306:49.44 z_itask_templat
27113 root      20   0 5316m 163m  14m R 10.2  0.5 310:47.41 z_itask_templat
25730 root      20   0 5316m 163m  14m R 10.2  0.5 283:03.37 z_itask_templat
30069 root      20   0 5316m 163m  14m R 10.2  0.5 283:49.67 z_itask_templat
13938 root      20   0 5316m 163m  14m R 10.2  0.5 261:24.46 z_itask_templat
16326 root      20   0 5316m 163m  14m R 10.2  0.5 150:24.53 z_itask_templat
 6795 root      20   0 5316m 163m  14m R 10.2  0.5 100:26.77 z_itask_templat
27063 root      20   0 5316m 163m  14m R  9.9  0.5 337:18.77 z_itask_templat
27065 root      20   0 5316m 163m  14m R  9.9  0.5 314:24.17 z_itask_templat
27068 root      20   0 5316m 163m  14m R  9.9  0.5 336:32.78 z_itask_templat
27069 root      20   0 5316m 163m  14m R  9.9  0.5 338:55.08 z_itask_templat
27072 root      20   0 5316m 163m  14m R  9.9  0.5 306:46.08 z_itask_templat
27075 root      20   0 5316m 163m  14m R  9.9  0.5 316:49.51 z_itask_templat
...
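The per-thread state and the user/kernel CPU split that top reports can also be read directly from /proc/<pid>/task/<tid>/stat. In that file, field 3 is the state, field 14 is utime and field 15 is stime, as documented in proc(5). The Python sketch below assumes a Linux /proc filesystem. `parse_task_stat` and `thread_cpu_summary` are hypothetical helper names, not part of any tool used in this article.

```python
import glob


def parse_task_stat(text):
    """Return (state, utime, stime) from the contents of a /proc/.../stat file.

    Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    or ')', so the remaining fields are taken after the *last* ')'.
    utime and stime are in clock ticks (see sysconf(_SC_CLK_TCK)).
    """
    fields = text[text.rindex(")") + 1:].split()
    # fields[0] is field 3 of proc(5), so field n lives at fields[n - 3].
    state = fields[0]            # field 3: R, S, D, ...
    utime = int(fields[14 - 3])  # field 14: time spent in user mode
    stime = int(fields[15 - 3])  # field 15: time spent in kernel mode
    return state, utime, stime


def thread_cpu_summary(pid):
    """Map every thread id of pid to (state, utime, stime)."""
    summary = {}
    for path in glob.glob(f"/proc/{pid}/task/*/stat"):
        tid = int(path.split("/")[-2])
        try:
            with open(path) as f:
                summary[tid] = parse_task_stat(f.read())
        except (OSError, ValueError):
            continue  # thread exited between glob() and open(), or odd content
    return summary
```

Sampling `thread_cpu_summary()` twice, a few seconds apart, and comparing the growth of utime against stime for each thread would show whether the time goes to user mode, as the 97.6%us figure suggests.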

3) Process scheduling information

The scheduling information of the relevant threads is as follows: