This article walks through how a real Linux kernel bug was diagnosed and fixed. Many people run into this kind of dilemma in day-to-day work, so follow along and see how the situation was handled; hopefully you will come away with something useful.
The bug: someone upgraded the kernel to v5.10-rcX and, after running dd followed by sync, saw the processes hang:
Some guys internally upgraded to v5.10-rcX and start to see a hang after dd + sync for a large file:
- mount /dev/sda1 (ext4 filesystem) to directory /mnt;
- run "dd if=/dev/zero of=test1 bs=1M count=2000" on directory /mnt;
- run "sync"
He was advised to dump the stacks of the hung processes, and got the following:
[  367.912761] INFO: task jbd2/sdb1-8:3602 blocked for more than 120 seconds.
[  367.919618]       Not tainted 5.10.0-rc1-109488-g32ded76956b6 #948
[  367.925776] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[  367.933579] task:jbd2/sdb1-8     state:D stack:    0 pid: 3602 ppid:     2 flags:0x00000028
[  367.941901] Call trace:
[  367.944351]  __switch_to+0xb8/0x168
[  367.947840]  __schedule+0x30c/0x670
[  367.951326]  schedule+0x70/0x108
[  367.954550]  io_schedule+0x1c/0xe8
[  367.957948]  bit_wait_io+0x18/0x68
[  367.961346]  __wait_on_bit+0x78/0xf0
[  367.964919]  out_of_line_wait_on_bit+0x8c/0xb0
[  367.969356]  __wait_on_buffer+0x30/0x40
[  367.973188]  jbd2_journal_commit_transaction+0x1370/0x1958
[  367.978661]  kjournald2+0xcc/0x260
[  367.982061]  kthread+0x150/0x158
[  367.985288]  ret_from_fork+0x10/0x34
[  367.988860] INFO: task sync:3823 blocked for more than 120 seconds.
[  367.995102]       Not tainted 5.10.0-rc1-109488-g32ded76956b6 #948
[  368.001265] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[  368.009067] task:sync            state:D stack:    0 pid: 3823 ppid:  3450 flags:0x00000009
[  368.017397] Call trace:
[  368.019841]  __switch_to+0xb8/0x168
[  368.023320]  __schedule+0x30c/0x670
[  368.026804]  schedule+0x70/0x108
[  368.030025]  jbd2_log_wait_commit+0xbc/0x158
[  368.034290]  ext4_sync_fs+0x188/0x1c8
[  368.037947]  sync_fs_one_sb+0x30/0x40
[  368.041606]  iterate_supers+0x9c/0x138
[  368.045350]  ksys_sync+0x64/0xc0
[  368.048569]  __arm64_sys_sync+0x10/0x20
[  368.052398]  el0_svc_common.constprop.3+0x68/0x170
[  368.057177]  do_el0_svc+0x24/0x90
[  368.060482]  el0_sync_handler+0x118/0x168
[  368.064478]  el0_sync+0x158/0x180
He also reported that, on CPU 100 (which is bound to hardware queue 8), the dispatched and completed counters did not match, yet there were no requests in flight. The problem had been tormenting him for three days and three nights, hair falling all over the keyboard.
estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu100/dispatched
3 0
estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu100/completed
2 0
On cpu100, it seems completed is less than the number dispatched. And checking the inflight count (cat /sys/devices/pci0000:74/0000:74:02.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda/sda1/inflight), the number of inflight is 0.
Then Ming Lei stepped in and advised collecting the output of every file under the /sys/kernel/debug/block/sda and /sys/block/sda/device directories:
Hello chenxiang,

Can you collect the debugfs log via the following commands after the io hang is triggered?

1) debugfs log:
(cd /sys/kernel/debug/block/sda && find . -type f -exec grep -aH . {} \;)

2) scsi sysfs info:
(cd /sys/block/sda/device && find . -type f -exec grep -aH . {} \;)

Suppose the disk is /dev/sda.
Chenxiang reported back:
~$ cd /sys/kernel/debug/block/sdb && find . -type f -exec grep -aH . {} \;
...
./hctx9/tags:cleared=3891
./hctx9/tags:bits_per_word=64
./hctx9/tags:map_nr=63
./hctx9/tags:alloc_hint={3264, 3265, 0, 3731, 2462, 842, 0, 0, 1278, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2424, 0, 0, 0, 346, 3, 3191, 235, 0, 0, 0, 0, 0, 0, 0, 0 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 88, 0, 0, 285, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1165, 538, 0, 372, 277, 3476, 0, 0, 0, 111, 0, 2081 0112,0,0,0,0,904,0,0,0,0,113,0,0,0,0,0,0,0,321,0}
./hctx9/tags:wake_batch=8
./hctx9/tags:wake_index=7
./hctx9/tags:ws={
./hctx9/tags:	{.wait_cnt=8, .wait=inactive},
./hctx9/tags:	{.wait_cnt=8, .wait=inactive},
./hctx9/tags:	{.wait_cnt=8, .wait=inactive},
./hctx9/tags:	{.wait_cnt=8, .wait=inactive},
./hctx9/tags:	{.wait_cnt=8, .wait=inactive},
./hctx9/tags:	{.wait_cnt=8, .wait=inactive},
./hctx9/tags:	{.wait_cnt=8, .wait=inactive},
./hctx9/tags:	{.wait_cnt=8, .wait=inactive},
./hctx9/tags:}
./hctx9/tags:round_robin=1
./hctx9/tags:min_shallow_depth=4294967295
./hctx9/ctx_map:00000000: 00
...
Ming Lei took one look at the debugfs output, had a sip of Dahongpao, and went straight to suggesting a patch to test:
Please try the following patch:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 60c7a7d74852..03c6d0620bfd 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1703,8 +1703,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 		break;
 	case BLK_STS_RESOURCE:
 	case BLK_STS_ZONE_RESOURCE:
-		if (atomic_read(&sdev->device_busy) ||
-		    scsi_device_blocked(sdev))
+		if (scsi_device_blocked(sdev))
 			ret = BLK_STS_DEV_RESOURCE;
 		break;
 	default:
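The thread never spells out why dropping the device_busy check helps, so here is a likely reading of the fix, offered as my own interpretation rather than anything stated above. BLK_STS_DEV_RESOURCE tells blk-mq not to re-run the hardware queue itself, because the driver guarantees it will be re-run later once the device resource frees up; plain BLK_STS_RESOURCE leaves the re-run to blk-mq, which retries after a short delay. The old code returned DEV_RESOURCE whenever sdev->device_busy was non-zero, counting on the completion of those in-flight commands to restart the queue. If they all complete in the window before the rejected request becomes visible to the restart logic, nothing ever re-runs the queue and the request is stranded, which is exactly the kind of hang shown in the jbd2/sync traces earlier. The small user-space C program below is a toy model of that window (the struct, helpers, flags and sequencing are my own illustration, not kernel code), contrasting the old and the patched check:

/* race_sketch.c: a toy, user-space model of the window described above.
 * Everything here is illustrative and assumed for the sketch; it is not
 * taken from the kernel sources. */
#include <stdbool.h>
#include <stdio.h>

enum status { STS_RESOURCE, STS_DEV_RESOURCE };

struct sim {
	int  device_busy;      /* commands currently owned by the LLD            */
	bool on_dispatch_list; /* has blk-mq re-queued the rejected request yet? */
	bool rerun_scheduled;  /* will the hardware queue ever be run again?     */
};

/* Completion of an in-flight command: it can only restart work that is
 * already visible to it, i.e. a request sitting on the dispatch list. */
static void complete_inflight(struct sim *s)
{
	s->device_busy--;
	if (s->on_dispatch_list)
		s->rerun_scheduled = true;
}

/* blk-mq's reaction to the ->queue_rq() status: it re-queues the request,
 * and for plain RESOURCE it also arranges to re-run the queue itself,
 * whereas for DEV_RESOURCE it trusts the driver's promise. */
static void blk_mq_requeue(struct sim *s, enum status sts)
{
	s->on_dispatch_list = true;
	if (sts == STS_RESOURCE)
		s->rerun_scheduled = true;
}

static void run(const char *name, bool old_check)
{
	struct sim s = { .device_busy = 1 };   /* one other command in flight          */
	bool device_blocked = false;           /* stands in for scsi_device_blocked()  */
	enum status sts;

	/* The LLD has just rejected our request.  The status is chosen while
	 * the other command is still in flight ... */
	sts = ((old_check && s.device_busy) || device_blocked)
		? STS_DEV_RESOURCE : STS_RESOURCE;
	/* ... but that command completes before the rejected request is
	 * re-queued, so its completion restarts nothing. */
	complete_inflight(&s);
	blk_mq_requeue(&s, sts);

	printf("%s: %s\n", name,
	       s.rerun_scheduled ? "queue gets re-run, request makes progress"
				 : "no re-run ever scheduled -> hang");
}

int main(void)
{
	run("old check (device_busy || blocked)", true);
	run("patched check (blocked only)", false);
	return 0;
}

Compile it with cc and run it: the old check ends in "no re-run ever scheduled", the patched check in "queue gets re-run". The real kernel relies on blk-mq's restart machinery rather than these two booleans, so treat this strictly as a sketch of the timing problem, not as a description of the actual code paths.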
Chenxiang tested the patch and reported back that the problem was solved, his heart filled with the pure joy of a Linux practitioner, his eyes full of reverence, and he was quick to vouch for it:
I have tested the patch 100+ times on two environments (on which the issue occurred frequently before), and the issue doesn't occur.

That is how this Linux kernel bug was tracked down and solved. Thank you for reading.