Example Analysis: Automatically "Capturing" Kernel Crash Logs with Linux pstore

Updated: 2025-02-24. From: SLTechnology News&Howtos

This article looks at how Linux pstore automatically "captures" kernel crash logs, working through an example. The topic is unfamiliar to many people, so the main points are summarized below.

Brief introduction

pstore (yes, it is a file system) is short for Persistent Storage. It was designed by Tony Luck and merged into the Linux mainline in 2010. Its original purpose was to dump the kernel log (log_buf) automatically when the kernel hits a Panic/Oops, and to present the dumped logs to user space as files, so that the crash can be analyzed after the post-Panic reboot.

This is very useful for low-probability problems where the failure scene cannot be caught live. It matters even more now that connected smart devices are everywhere: a remote device can capture the crash log and send it to a server over the network, maintainers can locate and fix the problem from the collected logs, and the device then receives the fix through an OTA upgrade.

From what I could find online, there were quite a few similar implementations before the pstore file system.

apanic

Android's earliest scheme for recording panic information. It appeared in Android's Linux 2.6 kernels, was never submitted to the community, and was later abandoned. I could not find a stated reason online; my guess is that it only works with MTD NAND, whereas Android devices now mostly use eMMC. apanic is presumably short for Android Panic, and it dumps logs to MTD NAND when the kernel crashes.

ramoops

This refers to the original ramoops implementation; the current code has been merged into pstore and lives on as the pstore/ram back end. ramoops dumps logs to RAM that stays powered across a reboot. The one requirement on that RAM is that its contents survive a restart.

crashlog

A kernel patch provided by OpenWrt that was not submitted to the kernel community. It is also RAM-based and can only dump Panic/Oops logs.

mtdoops

A feature of the MTD subsystem that is very similar to pstore. It only supports dumping Panic/Oops logs and cannot present them as files; users have to parse the whole MTD partition themselves. (Because the functionality is so similar, I implemented mtdpstore as a replacement for mtdoops.)

kdump

If pstore is a lightweight solution for dumping kernel crash logs, kdump is a heavyweight analysis tool. When a crash happens, kdump boots a capture kernel, and that kernel collects the contents of memory into a dump core file. After the restart, the captured information is saved in a specific file. netdump and diskdump are similar. kdump is very powerful and suits resource-rich machines such as servers, but it is very unfriendly to embedded devices.

After many iterations, pstore supports not only Panic/Oops logs (the dmesg front end) but also the pmsg, console and ftrace front ends. Besides the pstore/ram back end there is now the pstore/blk back end that I designed, so logs can be dumped not only to RAM but also to block devices and MTD devices.

A pstore front end determines which kind of log is dumped; a pstore back end determines which kind of device it is dumped to.

The following front ends are currently supported:

dmesg: the kernel log in log_buf, dumped on Panic/Oops.

pmsg: an interface that lets user space store logs. Android uses it to store system logs.

console: the console (terminal) log.

ftrace: function trace information.

The following back ends are supported:

pstore/ram: persistent RAM, that is, memory whose contents survive a restart.

pstore/blk: (v5.8 onwards) any writable block device, such as disks, USB drives, eMMC, NFTL NAND, etc.

MTD device: (v5.8 onwards) MTD devices, such as MTD NAND. (MTD support depends on the pstore/blk back end, so strictly speaking it is not a stand-alone back end.)

How to use it

Putting an elephant in a refrigerator takes three steps: open the door, put the elephant in, close the door. Using pstore takes three steps as well:

Enable pstore

Mount the pstore file system

Read the dumped log files

For detailed instructions, see the documentation in the kernel source tree. This article only covers the basics.

Documentation/admin-guide/ramoops.rst

Documentation/admin-guide/pstore-blk.rst

Enable

Select the pstore options in menuconfig:

    $ make menuconfig
    |-> File systems
      |-> Miscellaneous filesystems
        |-> Persistent store support
          |-> Log kernel console messages        # console frontend
          |-> Log user space messages            # pmsg frontend
          |-> Persistent function tracer         # ftrace frontend
          |-> Log panic/oops to a RAM buffer     # pstore/ram backend
          |-> Log panic/oops to a block device   # pstore/blk backend

Choose one of the two back ends above, and pick front ends according to your needs. The dmesg front end is enabled by default and has no option of its own. To use pstore on MTD devices, you also need to select the mtdpstore module:

    $ make menuconfig
    |-> Device Drivers
      |-> Memory Technology Device (MTD) support
        |-> Log panic/oops to an MTD buffer based on pstore
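To confirm that a built kernel really carries these options, the resulting .config can be inspected. The sketch below assumes the usual Kconfig symbol names behind the menu entries above (CONFIG_PSTORE, CONFIG_PSTORE_CONSOLE, CONFIG_PSTORE_PMSG, CONFIG_PSTORE_FTRACE, CONFIG_PSTORE_RAM, CONFIG_PSTORE_BLK, CONFIG_MTD_PSTORE); the function name pstore_config_report is made up for illustration.

```shell
#!/bin/sh
# Report which pstore-related options are set in a kernel config file.
# Usage: pstore_config_report /path/to/.config
# Prints one line per symbol in the form SYMBOL=y, SYMBOL=m or SYMBOL=n.
# pstore_config_report is a hypothetical helper name, not a kernel tool.
pstore_config_report() {
    config_file=$1
    for sym in CONFIG_PSTORE CONFIG_PSTORE_CONSOLE CONFIG_PSTORE_PMSG \
               CONFIG_PSTORE_FTRACE CONFIG_PSTORE_RAM CONFIG_PSTORE_BLK \
               CONFIG_MTD_PSTORE; do
        # An enabled symbol appears as SYMBOL=y or SYMBOL=m. Anything else
        # (missing, or "# SYMBOL is not set") is reported as n.
        value=$(sed -n "s/^${sym}=\([ym]\)\$/\1/p" "$config_file" | head -n 1)
        echo "${sym}=${value:-n}"
    done
}
```

Running `pstore_config_report .config` in the kernel build directory lists the state of each option, which makes it easy to spot a back end that was left out.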

Is that enough to start using it? I would love to say "yes", but reality is less kind. Even if every front end keeps its default configuration, pstore/ram at least needs to know which memory range it may use, and pstore/blk at least needs to know which block device to use.

pstore/ram can be configured in three ways: module parameters (cmdline), device tree, and platform data. Judging from the code, the priority is: module parameters > platform data > device tree.

pstore/blk can be configured through Kconfig or through module parameters (cmdline); module parameters take priority over Kconfig.

I have little hands-on experience with pstore/ram, so I will go straight to pstore/blk. Newcomers can ignore the long list of attribute options (the defaults are fine) and simply tell the pstore/blk back end which block device to use.

Configure in Kconfig:

    $ make menuconfig
    |-> File systems
      |-> Miscellaneous filesystems
        |-> Persistent store support
          |-> Log panic/oops to a block device   # pstore/blk backend
            |-> () block device identifier       # which block device is used?

If you use the cmdline, you can write:

    pstore_blk.blkdev=XXXX

Or load it as a module:

    $ sudo insmod pstore_blk.ko blkdev=XXX

The block device can be a whole disk such as sda or a single partition such as mmcblk0p4. Seven forms are supported, but two are commonly used:

A device path under /dev/: for example, the second partition of a USB drive is /dev/sdb2

A major:minor device number: for example, the sixth partition of the mmc device is 179:6

In practice it looks like this:

    $ sudo insmod pstore_blk.ko blkdev=/dev/sdb2

Or

    $ cat /proc/cmdline
    ... pstore_blk.blkdev=179:6 ...

For an MTD device, you can specify the MTD partition name or number directly, for example:

    pstore_blk.blkdev=pstore    # assumes that there is an MTD partition named pstore
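When debugging a configuration it helps to see what the kernel command line actually asks for. The following is a minimal sketch, not part of pstore: both helper names are hypothetical, and the classifier only distinguishes the two common forms described above (a /dev/ path and a major:minor number), treating everything else, such as an MTD partition name, as "other".

```shell
#!/bin/sh
# Print the value of the first "pstore_blk.blkdev=" word found in a kernel
# command line string. Prints nothing and returns 1 if it is absent.
pstore_blkdev_from_cmdline() {
    for word in $1; do    # intentional word splitting on whitespace
        case $word in
            pstore_blk.blkdev=*)
                echo "${word#pstore_blk.blkdev=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Classify a blkdev value as "path" (/dev/...), "devnum" (major:minor)
# or "other" (for example an MTD partition name).
blkdev_kind() {
    case $1 in
        /dev/?*)       echo "path" ;;
        ""|*[!0-9:]*)  echo "other" ;;   # empty, or contains a non-digit
        *:*:*)         echo "other" ;;   # more than one colon
        [0-9]*:*[0-9]) echo "devnum" ;;  # digits, one colon, digits
        *)             echo "other" ;;
    esac
}
```

On a running system, `blkdev_kind "$(pstore_blkdev_from_cmdline "$(cat /proc/cmdline)")"` shows which form, if any, was passed at boot.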

For newcomers, this much configuration is enough. My earlier test setup is on my GitHub (see reference link [2]). To learn what each configuration item does, read the kernel documentation (ramoops.rst or pstore-blk.rst), or press h in menuconfig to display the help text of the item.

Mounting

Once pstore is enabled and configured correctly, the boot log should contain lines like these:

    pstore_zone: registered pstore_blk as backend for kmsg (Oops,panic_write)
    pstore: Registered pstore_blk as persistent store backend

This means that pstore found the device and registered it successfully. Next, pstore has to be told to read the data back from the device, which is done by mounting the file system. A typical mount command looks like this:

    mount -t pstore pstore /sys/fs/pstore

After mounting, mount shows an entry like this:

    # mount
    ...
    pstore on /sys/fs/pstore type pstore (rw,relatime)
    ...
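Both checks can be scripted, for example in an init script. This is a sketch under the assumption that the registration message has the form "pstore: Registered NAME as persistent store backend" and that mounts are listed in /proc/mounts with the file system type in the third field; the two function names are hypothetical.

```shell
#!/bin/sh
# Read a kernel log on stdin and print the name of the registered pstore
# back end, taken from the last "pstore: Registered ..." line.
pstore_backend_from_log() {
    sed -n 's/.*pstore: Registered \([^ ]*\) as persistent store backend.*/\1/p' \
        | tail -n 1
}

# Succeed if a pstore file system appears in the given mounts file
# (default: /proc/mounts). Field 3 of each line is the file system type.
pstore_is_mounted() {
    awk '$3 == "pstore" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}
```

A boot script could then run `dmesg | pstore_backend_from_log` to see which back end registered, and `pstore_is_mounted || mount -t pstore pstore /sys/fs/pstore` to mount only when needed.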

If a crash log has ever been recorded, the mount point should contain a file like this:

    # ll /sys/fs/pstore
    ...
    -r--r--r--    1 root     root         15521 Jan  1 00:06 dmesg-pstore_blk-0
    ...

To verify the setup, we can trigger a kernel crash on purpose:

    # echo c > /proc/sysrq-trigger

I verified it on USB drives, SD cards, mmc and nand. Maintainer Kees Cook provided another verification method that uses a loop device to simulate a block device with a file. This method does not work for dumping Panic logs; it can only be used for Oops or the other front ends:

    # insmod pstore.ko compress=off
    # insmod pstore_zone.ko
    # truncate pstore-blk.raw --size 100M
    # losetup -f --show pstore-blk.raw
    /dev/loop0
    # insmod pstore_blk.ko blkdev=/dev/loop0 kmsg_size=16 console_size=64 best_effort=on

Read

After mounting, the dumped log files are visible at the mount point. Since they are files, the usual file operations such as reading and deleting work on them.

    root@TinaLinux:/sys/fs/pstore# head -n 10 dmesg-pstore_blk-1
    Oops: Total 2 times
    Oops#1 Part1
    [2.743794] Bluetooth: RFCOMM socket layer initialized
    [2.743813] Bluetooth: RFCOMM ver 1.11
    [2.743822] 8021q: 802.1Q VLAN Support v1.8
    [2.751766] reg-virt-consumer reg-virt-consumer.1: Failed to obtain supply 'drivevbus': -517
    [2.752330] reg-virt-consumer reg-virt-consumer.1: Failed to obtain supply 'drivevbus': -517
    [2.752742] ubi0: attaching mtd4
    [2.890302] random: crng init done
    [2.965927] ubi0: scanning is finished
    root@TinaLinux:/sys/fs/pstore# ll
    drwxr-x---    2 root     root             0 Jan  1 00:11 .
    drwxr-xr-x    5 root     root             0 Jan  1 00:11 ..
    -r--r--r--    1 root     root         15521 Jan  1 00:06 dmesg-pstore_blk-0
    -r--r--r--    1 root     root         15128 Jan  1 00:11 dmesg-pstore_blk-1
    root@TinaLinux:/sys/fs/pstore# rm dmesg-pstore_blk-1
    root@TinaLinux:/sys/fs/pstore# ll
    drwxr-x---    2 root     root             0 Jan  1 00:13 .
    drwxr-xr-x    5 root     root             0 Jan  1 00:11 ..
    -r--r--r--    1 root     root         15521 Jan  1 00:06 dmesg-pstore_blk-0
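Because the records are ordinary files, collecting them after a reboot is a matter of copying and deleting. The following is a minimal sketch of such a collection step, for instance before uploading the logs to a server. The function name pstore_collect and the default archive directory /var/log/pstore are assumptions made for illustration.

```shell
#!/bin/sh
# Copy every regular file from a pstore mount point into an archive
# directory, then delete the original record.
# Usage: pstore_collect [mount_point] [archive_dir]
# The default archive directory is a hypothetical choice.
pstore_collect() {
    src=${1:-/sys/fs/pstore}
    dst=${2:-/var/log/pstore}
    stamp=$(date +%Y%m%d-%H%M%S)    # prefix keeps archives from different boots apart
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue     # also skips the literal glob when the directory is empty
        name=$(basename "$f")
        # Remove the record only after the copy has succeeded.
        cp "$f" "$dst/$stamp-$name" && rm -f "$f"
    done
}
```

Called without arguments early at boot, after pstore is mounted, this leaves the mount point empty and the copies under the archive directory, ready to be sent off the device.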

For the Panic/Oops logs of the dmesg front end, pstore automatically prepends two lines of statistics. For example:

    Oops: Total 2 times
    Oops#1 Part1

The first line is the cumulative count: the trigger was an Oops, and this is the second Oops since the system first booted after installation. The second line is the count within the most recent run: this is the log of the first Oops during that run.
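These two lines are easy to extract when sorting collected logs. The sketch below assumes exactly the two-line layout shown in the example ("Oops: Total 2 times" followed by "Oops#1 Part1"); the function name pstore_dmesg_header is hypothetical.

```shell
#!/bin/sh
# Read a dmesg record on stdin and print "REASON TOTAL INDEX PART",
# taken from the two statistics lines at the top of the record.
# Prints nothing if the first line does not look like "REASON: Total N times".
pstore_dmesg_header() {
    awk '
        NR == 1 && $2 == "Total" {
            reason = $1
            sub(/:$/, "", reason)      # "Oops:" becomes "Oops"
            total = $3
        }
        NR == 2 {
            if (reason != "") {
                split($1, a, "#")      # "Oops#1" gives a[2] = "1"
                part = $2
                sub(/^Part/, "", part) # "Part1" becomes "1"
                print reason, total, a[2], part
            }
            exit
        }
    '
}
```

For the record shown above, `pstore_dmesg_header < dmesg-pstore_blk-1` prints `Oops 2 1 1`.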

Each file name consists of the front end, the back end and an id, joined by hyphens. For example, dmesg-pstore_blk-1 denotes the dmesg front end, the pstore_blk back end, and the log in zone 1 of the dmesg front end.
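A script that sorts collected logs by front end can split the name on its hyphens. This is a small sketch with a hypothetical function name. It takes the text before the first hyphen as the front end and the text after the last hyphen as the id, so that a back end name containing a hyphen or an underscore stays intact.

```shell
#!/bin/sh
# Split a pstore file name into "FRONTEND BACKEND ID".
# Front end: text before the first "-". Id: text after the last "-".
# Back end: whatever lies in between (it may itself contain "-" or "_").
pstore_name_parts() {
    name=$1
    frontend=${name%%-*}    # strip the longest suffix starting at a "-"
    id=${name##*-}          # strip the longest prefix ending at a "-"
    rest=${name#*-}         # drop "FRONTEND-"
    backend=${rest%-*}      # drop "-ID"
    echo "$frontend $backend $id"
}
```

For example, `pstore_name_parts dmesg-pstore_blk-1` prints `dmesg pstore_blk 1`.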

Besides the dmesg front end, files from the other front ends are named like this:

    # ll
    -r--r--r--    1 root     root           ... Jan 15 11:53 console-pstore-blk-0
    -r--r--r--    1 root     root          3666 Jan 15 11:53 dmesg-pstore-blk-0
    -r--r--r--    1 root     root         65524 Jan 15 11:53 ftrace-pstore-blk-0
    -r--r--r--    1 root     root             9 Jan 15 11:53 pmsg-pstore-blk-0

In addition, the timestamp of each file is the time at which the crash was triggered. In the examples above the timestamps are implausible because the system clock had not been synchronized.

Looking ahead

As mentioned earlier, pstore can play an important role as Internet of Things devices such as smart speakers and robot vacuum cleaners become more widespread.

Full feature support

So far, the community code does not support every pstore front end on either block devices or MTD devices.

    Device         dmesg (Oops)   dmesg (Panic)   pmsg   console   ftrace
    Block device   Y              N               Y      Y         Y
    MTD device     Y              Y               N      N         N
    RAM device     Y              Y               Y      Y         Y

For a block device to record Panic logs, its driver has to provide an interface for writing to the device during a Panic. I have implemented such an interface in Allwinner's mmc and nand drivers, but for various reasons it is not suitable for submission to the community. Adapting the community's block drivers depends on the efforts of more developers.

MTD devices have long defined a panic_write() interface, so they can dump Panic logs. The other front ends are not supported because of the erase-before-write nature of the medium. Front ends whose writes cannot be page-aligned, such as pmsg, console and ftrace, need more adaptation work.

Migrate pstore/ram

The current directory structure of pstore is as follows:

    $ tree fs/pstore
    fs/pstore/
    ├── blk.c         # pstore/blk backend implementation
    ├── ftrace.c      # ftrace front-end implementation
    ├── inode.c       # pstore file system registration and operations
    ├── internal.h
    ├── Kconfig
    ├── Makefile
    ├── platform.c    # pstore front-end core
    ├── pmsg.c        # pmsg front-end implementation
    ├── ram.c         # pstore/ram backend implementation
    ├── ram_core.c    # pstore/ram backend implementation
    └── zone.c        # pstore/zone: allocation and management of storage space

Before my patches, logs could only be dumped to RAM, so reading the code shows that ram.c and ram_core.c implement two things:

Allocation and management of DRAM space

Read and write operations on DRAM

The blk.c I implemented adds dumping to block devices. I later found that pstore/ram and pstore/blk allocate and manage storage space in very similar ways, so I extracted that part into pstore/zone. The intended layering is therefore that both back ends sit on top of pstore/zone.

Integrating pstore/ram into pstore/zone has been agreed with the maintainer, but it needs more developers to work on compatibility, for example ECC support.

