How to Understand the Watchdog in the Linux Kernel

2025-01-18 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

In this issue, the editor explains how to understand the watchdogs in the Linux kernel. The article analyzes the topic from a professional point of view, and I hope you get something useful out of it.

There are three watchdogs in the Linux kernel, and each of them must be regularly "fed":

1. /dev/watchdog

2. The softlockup detection mechanism

3. The hardlockup detection mechanism

First, let's look at 1. /dev/watchdog. How is this watchdog fed? The Linux kernel includes sample code for it:

samples/watchdog/watchdog-simple.c:

// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    int fd = open("/dev/watchdog", O_WRONLY);
    int ret = 0;
    if (fd == -1) {
        perror("watchdog");
        exit(EXIT_FAILURE);
    }
    while (1) {
        /* Feed the dog: writing data to the device counts as a keepalive. */
        ret = write(fd, "\0", 1);
        if (ret != 1) {
            ret = -1;
            break;
        }
        sleep(10);
    }
    close(fd);
    return ret;
}

In this example, a single byte ('\0') is written to /dev/watchdog every 10 seconds; this is the dog-feeding process. At first glance the example may not seem very useful, but in real projects it is extremely valuable. For example:
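Because the kernel sample loops forever and hard-codes the device path, its feeding step is hard to reuse or test. The following is a minimal sketch, not kernel code, that factors the same step into a hypothetical helper, feed_watchdog(), taking the device path, the number of pings and the interval in seconds as parameters. It writes the same single '\0' byte per ping. Opening a real /dev/watchdog starts the countdown, so it is safest to try the helper against an ordinary temporary file first.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations such as mkstemp() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path`, write one '\0' byte `pings` times,
 * sleeping `interval_s` seconds between writes, then close the file.
 * Returns 0 on success, -1 on any error.
 * WARNING: pointing this at a real /dev/watchdog arms the watchdog.
 */
int feed_watchdog(const char *path, int pings, unsigned int interval_s)
{
    assert(path != NULL && pings >= 0);

    int fd = open(path, O_WRONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    int ret = 0;
    for (int i = 0; i < pings; i++) {
        /* Same keepalive as the kernel sample: a one-byte write. */
        if (write(fd, "\0", 1) != 1) {
            perror("write");
            ret = -1;
            break;
        }
        /* Sleep between pings, but not after the last one. */
        if (interval_s > 0 && i + 1 < pings)
            sleep(interval_s);
    }

    close(fd);
    return ret;
}
```

For example, feed_watchdog(path, 3, 0) against a temporary file created with mkstemp() writes three bytes and returns 0, while a path that cannot be opened makes it return -1.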

Suppose the central bank of some country runs a database program on a Linux server with 4 TB of memory and 320 CPU cores, and the database stores the bank account information of all of the country's citizens. If an I/O read/write error occurs while the database is running, or a bug causes the program to hang, citizens can no longer deposit or transfer money, and the whole national economy could be paralyzed in an instant.

Does Linux have a mechanism to deal with this? This is exactly where /dev/watchdog comes in.

All you need to do is add code similar to the sample above to the database program so that it feeds the dog every 10 seconds.

If the database program hangs, it can no longer feed the dog. Once the timeout expires (for example, after the default 60 seconds), the watchdog fires and, by default, immediately restarts the server.
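One subtlety when embedding the feeder in a real program: if the feeding loop runs in its own thread, it can keep feeding even while the database's worker threads are stuck, and the watchdog would never fire. A common remedy is to feed only when the application has recently made progress. Below is a minimal sketch of that idea; struct app_health, report_progress() and should_feed() are hypothetical names, and the stall limit is an assumption to be tuned per application.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical health record shared between the worker and the feeder. */
struct app_health {
    time_t last_progress; /* time of the last completed unit of work */
};

/* Called by the worker loop after each successfully completed unit of work. */
void report_progress(struct app_health *h, time_t now)
{
    h->last_progress = now;
}

/*
 * Called by the feeder before each ping (e.g. every 10 s): feed only if
 * the worker made progress within the last `max_stall` seconds. If the
 * worker hangs, this returns false, feeding stops, and the watchdog
 * timeout is allowed to expire.
 */
bool should_feed(const struct app_health *h, time_t now, time_t max_stall)
{
    assert(max_stall > 0);
    return now - h->last_progress <= max_stall;
}
```

The feeder thread would call should_feed() before each write(fd, "\0", 1); once the worker stops calling report_progress(), feeding stops and the server restart described above follows.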

Restarting the server reloads the database program. Alternatively, while the server is restarting it loses contact with the rest of the server cluster, which triggers the cluster's split-brain detection and moves the database program to another node in the cluster. Either way, the losses are greatly reduced. That is why /dev/watchdog is so useful.

Let's take a look at how it works:

# ps -ef | grep watchdog
root       104      2  0  2020 ?        00:00:00 [watchdogd]
# ls -l /dev/watchdog*
crw------- 1 root root  10, 130 Dec 30 20:04 /dev/watchdog
crw------- 1 root root 247,   0 Dec 30 20:04 /dev/watchdog0

As shown, the system has a kernel thread, watchdogd, and two character device files: /dev/watchdog and /dev/watchdog0.

watchdogd is a real-time kernel thread that carries out the actual feeding. /dev/watchdog is the generic interface file the kernel exposes to user space for enabling the watchdog, feeding it, querying its status, and so on. /dev/watchdog0 is a specific watchdog implementation: it can be backed by a physical hardware device, or by the softdog kernel module, which emulates the hardware in software (load it with: modprobe softdog).
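Besides plain writes, the interface file accepts the standard Linux watchdog ioctls declared in <linux/watchdog.h> for querying status and feeding. The sketch below uses WDIOC_GETSUPPORT, WDIOC_GETTIMEOUT and WDIOC_KEEPALIVE plus the "magic close" convention of writing 'V' before closing; the helper names open_watchdog(), query_watchdog(), ping_watchdog() and close_watchdog() are hypothetical. Whether a particular driver honors each feature is an assumption to check: magic close, for instance, only disarms the watchdog if the driver advertises WDIOF_MAGICCLOSE and nowayout is not in effect.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Opening the real device arms the watchdog: the countdown starts now. */
int open_watchdog(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY);
}

/*
 * Read the driver identity and current timeout (seconds).
 * Returns 0 on success, -1 if `fd` is not a watchdog or lacks the ioctls.
 */
int query_watchdog(int fd, char *identity, size_t idlen, int *timeout_s)
{
    assert(identity != NULL && idlen > 0 && timeout_s != NULL);

    struct watchdog_info info;
    memset(&info, 0, sizeof info);
    if (ioctl(fd, WDIOC_GETSUPPORT, &info) == -1)
        return -1;
    /* identity is a fixed 32-byte field that may lack a terminating NUL. */
    snprintf(identity, idlen, "%.*s", (int)sizeof info.identity,
             (const char *)info.identity);

    if (ioctl(fd, WDIOC_GETTIMEOUT, timeout_s) == -1)
        return -1;
    return 0;
}

/* Explicit keepalive via ioctl, an alternative to writing a byte. */
int ping_watchdog(int fd)
{
    int dummy = 0;
    return ioctl(fd, WDIOC_KEEPALIVE, &dummy) == -1 ? -1 : 0;
}

/* Magic close: write 'V' right before close() so a supporting driver stops. */
int close_watchdog(int fd)
{
    ssize_t n = write(fd, "V", 1);
    int rc = close(fd);
    return (n == 1 && rc == 0) ? 0 : -1;
}
```

On a real system this would run as root against /dev/watchdog: open it, query it once, ping periodically, and call close_watchdog() only on a clean shutdown. On a non-watchdog file such as /dev/null the ioctls fail, so the helpers can be exercised without arming anything.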

Let's look at how the softdog kernel module (drivers/watchdog/softdog.c) emulates the hardware to implement this function:

static int __init softdog_init(void)
{
    ...
    hrtimer_init(&softdog_ticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    softdog_ticktock.function = softdog_fire;
    ...
}

static enum hrtimer_restart softdog_fire(struct hrtimer *timer)
{
    ...
    emergency_restart();
    ...
}

static int softdog_ping(struct watchdog_device *w)
{
    ...
    hrtimer_start(&softdog_ticktock, ktime_set(w->timeout, 0) /* 60s */,
                  HRTIMER_MODE_REL);
    ...
}

As the code shows, once the watchdog is enabled (by opening /dev/watchdog), a system restart is triggered after 60 seconds by default. Each time the dog is fed during the countdown (softdog_ping), the timer is restarted and the countdown goes back to a full 60 seconds. So as long as the dog keeps being fed, emergency_restart() is never executed and the system does not restart.
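The countdown logic in the excerpt can be modeled in a few lines of user-space C. This is a simplified simulation, not the softdog source: time is passed in as simulated seconds, and struct soft_watchdog, sw_open(), sw_ping() and sw_fired() are hypothetical names. Each ping moves the deadline to one full timeout after the current time, mirroring the relative (HRTIMER_MODE_REL) re-arm in softdog_ping().

```c
#include <assert.h>
#include <stdbool.h>

/* Model of softdog's timer state (simulated seconds, monotonic clock). */
struct soft_watchdog {
    long timeout;  /* seconds; 60 by default in softdog */
    long deadline; /* absolute time at which the timer would fire */
};

/* Opening the device starts the countdown. */
void sw_open(struct soft_watchdog *w, long now, long timeout)
{
    assert(timeout > 0);
    w->timeout = timeout;
    w->deadline = now + timeout;
}

/* Feeding the dog pushes the deadline back to a full timeout from now. */
void sw_ping(struct soft_watchdog *w, long now)
{
    w->deadline = now + w->timeout;
}

/* True once the timer has expired, i.e. when softdog_fire() would run. */
bool sw_fired(const struct soft_watchdog *w, long now)
{
    return now >= w->deadline;
}
```

For example, with a 60 s timeout opened at 0 s and pings at 10, 20 and 30 s, the deadline ends up at 90 s: sw_fired() is false at 89 s and true at 90 s, one full timeout after the last ping.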

Next, let's look at 2. the softlockup detection mechanism and 3. the hardlockup detection mechanism.

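In the softlockup detection mechanism, the dog is fed as follows: an hrtimer on each CPU periodically wakes up that CPU's migration/N kernel thread (N is the CPU number), and each time migration/N runs, it resets a per-CPU timestamp. If the timestamp has not been reset for longer than the softlockup threshold, a soft lockup is reported for that CPU.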
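The reasoning behind this design is that the hrtimer runs in interrupt context, so it keeps firing even when a CPU is stuck looping in kernel code without scheduling; the migration/N thread, however, can no longer run, so its timestamp goes stale. The sketch below models only that comparison. It is a simplification, not kernel code: the names, the fixed CPU count and the use of plain seconds are assumptions, and the 20 s threshold reflects the kernel default of twice kernel.watchdog_thresh (10 s).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for the model */

struct softlockup_model {
    long touch_ts[NR_CPUS_MODEL]; /* per-CPU timestamp reset by migration/N */
    long thresh;                  /* seconds; kernel default is 20 */
};

/* Runs in the migration/N thread: this is "feeding the dog" for one CPU. */
void sl_touch(struct softlockup_model *m, int cpu, long now)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    m->touch_ts[cpu] = now;
}

/*
 * Runs in the per-CPU hrtimer callback, which still fires when the CPU is
 * stuck: report a soft lockup if the timestamp is older than the threshold.
 */
bool sl_is_softlockup(const struct softlockup_model *m, int cpu, long now)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    return now - m->touch_ts[cpu] > m->thresh;
}
```

A CPU last touched at 0 s is flagged at 21 s but not at 20 s, because the check is "older than" the threshold rather than "at least".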

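In the hardlockup detection mechanism, the dog is fed by the hrtimer itself: each time it runs, it increments a per-CPU counter. A periodic check, typically driven by a non-maskable interrupt (NMI), reports a hard lockup if the counter has stopped increasing.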

For the detailed implementation and application scenarios of the softlockup and hardlockup detection mechanisms, I recently released a video, "Linux Common Lock and lockup checking Mechanism". It covers the implementation principles (at the Linux kernel code level), verification of those principles (using ftrace), sample code, and hands-on simulation experiments, so that you can fully understand softlockup and hardlockup.


© 2024 shulou.com SLNews company. All rights reserved.
