Kernel development can be very time consuming, not only for the time required to compile the kernel itself (especially with a beefy .config), but also the tasks of deploying, testing and debugging represent a large portion of the work.
In this article we will explore a slightly different approach that allows to speed up kernel development, testing and debugging using QEMU/KVM.
Some times we can speed up this process modularizing the portion of code that we need to analize. However, this approach is not always doable (we may not know yet the specific portion of code to debug or simply it's just not possible to modularize it - i.e., core kernel components).
Moreover, working with a kernel module can potentially crash or seriously compromise the system, causing loss of work and extra work to reboot and restore the previous session.
Using a virtualized environment is definitely better from this point of view: the testing environment can be easily re-deployed, restarted and easily resumed in case of failures. Moreover, the development environment is never compromised.
The first tool that we are going to use is virtme (https://github.com/amluto/virtme).
This tool is a wrapper on top of QEMU/KVM that allows to quickly spin up an instance on a kernel build directory, creating a live sandbox of a just compiled kernel that shares the $HOME directory (in read-only) with the host.
This gives the big advantage of being able to easily deploy additional files into the testing instance, simply by copying them into a folder inside your $HOME.
We are going to create a helper script on top of virtme to quickly start a kernel to test:
righiandr@xps-13:~$ cat bin/kernel-test
#!/bin/bash
virtme-run --kdir . $* -a "nokaslr" --qemu-opts -m 1024 -smp 4 -s -qmp tcp:localhost:4444,server,nowait
righiandr@xps-13:~$
Assuming we have just compiled the kernel in $HOME/linux, we can quickly start it using:
righiandr@xps-13:~/linux$ kernel-test
[ 0.000000] Linux version 5.3.0-rc3+ (righiandr@xps-13) (gcc version 9.1.0 (Ubuntu 9.1.0-9ubuntu2)) #43 SMP Sun Aug 11 08:46:51 CEST 2019
[ 0.000000] Command line: earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps "virtme_stty_con=rows 28 cols 104 iutf8" TERM=xterm-256color rootfstype=9p rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro nokaslr init=/bin/sh -- -c "mount -t tmpfs run /run;mkdir -p /run/virtme/guesttools;/bin/mount -n -t 9p -o ro,version=9p2000.L,trans=virtio,access=any virtme.guesttools /run/virtme/guesttools;exec /run/virtme/guesttools/virtme-init"
...
virtme-init: console is ttyS0
root@(none):/#
Now we can start running our tests using the new kernel ("CTRL+a x" to close the session).
We can also assign a new block device to virtual instance, example (assign a 1GB raw disk):
righiandr@xps-13:~/linux$ dd if=/dev/zero of=/tmp/disk.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.414057 s, 2.6 GB/s
righiandr@xps-13:~/linux$ kernel-test --disk "disk1=/tmp/disk.img"
[ 0.000000] Linux version 5.3.0-rc3+ (righiandr@xps-13) (gcc version 9.1.0 (Ubuntu 9.1.0-9ubuntu2)) #43 SMP Sun Aug 11 08:46:51 CEST 2019
...
virtme-init: console is ttyS0
root@(none):/# fdisk -l /dev/sda
Disk /dev/sda: 1 GiB, 1073741824 bytes, 2097152 sectors
Disk model: disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@(none):/#
Having a separate disk dedicated to the instance is really useful to test I/O / filesystem features.
A downside of using "virtme --kdir" is that we're going to miss the kernel
module directory in the usual location /lib/modules/uname -r
. For example, if
we try to load xfs (compiled as an external module), we get the following
error:
root@(none):/# modprobe xfs
modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/5.3.0-rc3+/modules.dep.bin'
modprobe: FATAL: Module xfs not found in directory /lib/modules/5.3.0-rc3+
To resolve this limitation we can follow the following simple steps:
Install the kernel modules in a temporary directory in $HOME (for example /home/righiandr/tmp/kmod):
righiandr@xps-13:~/linux$ make modules_install INSTALL_MOD_PATH=~/tmp/kmod
Start the instance:
righiandr@xps-13:~/linux$ kernel-test --disk "disk1=/tmp/disk.img"
...
virtme-init: console is ttyS0
root@(none):/#
Mount (bind) the temporary kernel modules directory to the standard module path (run this inside the instance):
root@(none):/# mount --bind /home/righiandr/tmp/kmod/lib/modules /lib/modules
At this point the kernel is able to load external modules as usual, example:
root@(none):/# mount --bind /home/righiandr/tmp/kmod/lib/modules/ /lib/modules
root@(none):/# modprobe xfs
[ 17.257358] SGI XFS with security attributes, no debug enabled
root@(none):/#
Now let's see how we can track down a soft lockup bug (explicitly added) in xfs. Let's assume we have explicitly introduced a soft lockup bug in xfs and we have an easy way to reproduce the bug:
root@(none):/# mkfs.xfs /dev/sda
meta-data=/dev/sda isize=512 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
root@(none):/# mount /dev/sda /mnt
[ 293.523348] XFS (sda): Mounting V5 Filesystem
[ 293.529194] XFS (sda): Ending clean mount
root@(none):/# df /mnt/
[ 324.109386] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [df:328]
[ 324.110158] Modules linked in: xfs
[ 324.110522] irq event stamp: 64386
[ 324.110896] hardirqs last enabled at (64385): [<ffffffff810038fa>] trace_hardirqs_on_thunk+0x1a/0x20
[ 324.111843] hardirqs last disabled at (64386): [<ffffffff8100391a>] trace_hardirqs_off_thunk+0x1a/0x20
[ 324.112756] softirqs last enabled at (64384): [<ffffffff81e00338>] __do_softirq+0x338/0x435
[ 324.113592] softirqs last disabled at (64377): [<ffffffff810a03ae>] irq_exit+0xbe/0xd0
[ 324.114377] CPU: 0 PID: 328 Comm: df Not tainted 5.3.0-rc3+ #43
[ 324.114940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 324.115820] RIP: 0010:xfs_fs_statfs+0x87/0x90 [xfs]
[ 324.116290] Code: 38 e8 7d b5 52 e1 48 8d bb d0 01 00 00 e8 71 b5 52 e1 48 8d bb 38 02 00 00 e8 65 b5 52 e1 48 8d bb 20 01 00 00 e8 49 fa af e1 <eb> fe 0f 1f 80 00 00 00 00 0f 1f 44 00 00 85 f6 75 03 31 c0 c3 55
[ 324.118115] RSP: 0018:ffffc90000217df0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
[ 324.118829] RAX: 0000000000000000 RBX: ffff88803ba4e000 RCX: 0000000000000000
[ 324.119506] RDX: ffff88803dc176e0 RSI: 8888888888888889 RDI: 0000000000000246
[ 324.120214] RBP: ffffc90000217df8 R08: 0000000000000000 R09: 0000000000000001
[ 324.120943] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88803b3ec640
[ 324.121673] R13: ffffc90000217e90 R14: 0000000000000002 R15: 0000000000000000
[ 324.122409] FS: 00007fd7c6ccf580(0000) GS:ffff88803dc00000(0000) knlGS:0000000000000000
[ 324.123239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 324.123832] CR2: 00007fd7c6bc20d0 CR3: 000000003a91a002 CR4: 0000000000360ef0
[ 324.124567] Call Trace:
[ 324.124841] statfs_by_dentry+0x73/0xa0
[ 324.125245] vfs_statfs+0x1b/0xc0
[ 324.125592] user_statfs+0x5b/0xb0
[ 324.125956] __do_sys_statfs+0x28/0x60
[ 324.126357] __x64_sys_statfs+0x16/0x20
[ 324.126766] do_syscall_64+0x65/0x1d0
[ 324.127153] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 324.127679] RIP: 0033:0x7fd7c6bee1fb
[ 324.128053] Code: c3 66 0f 1f 44 00 00 48 8b 05 91 8c 0d 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 65 8c 0d 00 f7 d8 64 89 01 48
[ 324.129961] RSP: 002b:00007ffe140f5ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
[ 324.130739] RAX: ffffffffffffffda RBX: 00007ffe140f7f7f RCX: 00007fd7c6bee1fb
[ 324.131473] RDX: 00000000ffffffff RSI: 00007ffe140f5ef0 RDI: 00007ffe140f7f7f
[ 324.132206] RBP: 00007ffe140f5ef0 R08: 00007ffe140f6013 R09: 0000000000000032
[ 324.132940] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe140f5f90
[ 324.133672] R13: 0000000000000000 R14: 000056187bdb37c0 R15: 000056187bdb3780
At this point the system is unresponsive due to the soft lockup, but we got a kernel oops at least that already tells a lot about the bug. However we are unable to run any other command inside the instance.
In cases like this it can be pretty useful to generate a memory dump of the instance and use a separate tool (i.e., crash) to analize the state of the system after the crash.
In our environment we can use the QEMU Machine Protocol (QMP) to generate a memory dump. If we pay attention to the options used in kernel-test to spin up the QEMU/KVM instance (using virtme-run) we can see that we appended the following: "-qmp tcp:localhost:4444,server,nowait". With these options QEMU/KVM starts a QMP server listening on port 4444 that can accept QMP commands (see also https://wiki.qemu.org/Documentation/QMP).
To easily generate a memory dump of our test instance we are going to create an additional helper script called kernel-dump:
righiandr@xps-13:~$ cat bin/kernel-dump
#!/usr/bin/expect -f
if { $argc < 1 } {
send_user "usage: kernel-dump vmcore.img\n"
exit
}
set out_file [lindex $argv 0]
spawn telnet localhost 4444
send "{ \"execute\": \"qmp_capabilities\" }\r"
expect "{\"return\": {}}"
send "{\"execute\":\"dump-guest-memory\",\"arguments\":{\"paging\":false,\"protocol\":\"file:$out_file\"}}\r"
expect "{\"return\": {}}"
Now we can simply use this helper script to generate the memory dump in /tmp/vmcore.img:
righiandr@xps-13:~/linux$ kernel-dump /tmp/vmcore.img
spawn telnet localhost 4444
{ "execute": "qmp_capabilities" }
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 4}, "package": "Debian 1:4.0+dfsg-0ubuntu5"}, "capabilities": ["oob"]}}
{"return": {}}
{"execute":"dump-guest-memory","arguments":{"paging":false,"protocol":"file:/tmp/vmcore.img"}}
{"timestamp": {"seconds": 1565520296, "microseconds": 278432}, "event": "STOP"}
{"timestamp": {"seconds": 1565520296, "microseconds": 963140}, "event": "DUMP_COMPLETED", "data": {"result": {"total": 1074003968, "status": "completed", "completed": 1074003968}}}
{"timestamp": {"seconds": 1565520296, "microseconds": 963240}, "event": "RESUME"}
{"return": {}}
At this point we have a regular memory dump (similar to "virsh dump --memory-only" if you are more familiar with libvirt instances) and we can use "crash" to analize it:
righiandr@xps-13:~/linux$ crash /tmp/vmcore.img vmlinux
...
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: vmlinux
DUMPFILE: /tmp/vmcore.img
CPUS: 4
DATE: Sun Aug 11 12:44:55 2019
UPTIME: 00:05:10
LOAD AVERAGE: 1.55, 0.41, 0.14
TASKS: 86
NODENAME: (none)
RELEASE: 5.3.0-rc3+
VERSION: #43 SMP Sun Aug 11 08:46:51 CEST 2019
MACHINE: x86_64 (1992 Mhz)
MEMORY: 1 GB
PANIC: ""
PID: 0
COMMAND: "swapper/0"
TASK: ffffffff8261b800 (1 of 4) [THREAD_INFO: ffffffff8261b800]
CPU: 0
STATE: TASK_RUNNING
WARNING: panic task not found
crash>
Now we are able to dig into the system even after the crash, for example we can still look at the kernel log (running dmesg), list the processes (ps) and even get a backtrace of a target PID:
crash> dmesg | tail -36
[ 324.109386] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [df:328]
[ 324.110158] Modules linked in: xfs
[ 324.110522] irq event stamp: 64386
[ 324.110896] hardirqs last enabled at (64385): [<ffffffff810038fa>] trace_hardirqs_on_thunk+0x1a/0x20
[ 324.111843] hardirqs last disabled at (64386): [<ffffffff8100391a>] trace_hardirqs_off_thunk+0x1a/0x20
[ 324.112756] softirqs last enabled at (64384): [<ffffffff81e00338>] __do_softirq+0x338/0x435
[ 324.113592] softirqs last disabled at (64377): [<ffffffff810a03ae>] irq_exit+0xbe/0xd0
[ 324.114377] CPU: 0 PID: 328 Comm: df Not tainted 5.3.0-rc3+ #43
[ 324.114940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 324.115820] RIP: 0010:xfs_fs_statfs+0x87/0x90 [xfs]
[ 324.116290] Code: 38 e8 7d b5 52 e1 48 8d bb d0 01 00 00 e8 71 b5 52 e1 48 8d bb 38 02 00 00 e8 65 b5 52 e1 48 8d bb 20 01 00 00 e8 49 fa af e1 <eb> fe 0f 1f 80 00 00 00 00 0f 1f 44 00 00 85 f6 75 03 31 c0 c3 55
[ 324.118115] RSP: 0018:ffffc90000217df0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
[ 324.118829] RAX: 0000000000000000 RBX: ffff88803ba4e000 RCX: 0000000000000000
[ 324.119506] RDX: ffff88803dc176e0 RSI: 8888888888888889 RDI: 0000000000000246
[ 324.120214] RBP: ffffc90000217df8 R08: 0000000000000000 R09: 0000000000000001
[ 324.120943] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88803b3ec640
[ 324.121673] R13: ffffc90000217e90 R14: 0000000000000002 R15: 0000000000000000
[ 324.122409] FS: 00007fd7c6ccf580(0000) GS:ffff88803dc00000(0000) knlGS:0000000000000000
[ 324.123239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 324.123832] CR2: 00007fd7c6bc20d0 CR3: 000000003a91a002 CR4: 0000000000360ef0
[ 324.124567] Call Trace:
[ 324.124841] statfs_by_dentry+0x73/0xa0
[ 324.125245] vfs_statfs+0x1b/0xc0
[ 324.125592] user_statfs+0x5b/0xb0
[ 324.125956] __do_sys_statfs+0x28/0x60
[ 324.126357] __x64_sys_statfs+0x16/0x20
[ 324.126766] do_syscall_64+0x65/0x1d0
[ 324.127153] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 324.127679] RIP: 0033:0x7fd7c6bee1fb
[ 324.128053] Code: c3 66 0f 1f 44 00 00 48 8b 05 91 8c 0d 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 65 8c 0d 00 f7 d8 64 89 01 48
[ 324.129961] RSP: 002b:00007ffe140f5ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
[ 324.130739] RAX: ffffffffffffffda RBX: 00007ffe140f7f7f RCX: 00007fd7c6bee1fb
[ 324.131473] RDX: 00000000ffffffff RSI: 00007ffe140f5ef0 RDI: 00007ffe140f7f7f
[ 324.132206] RBP: 00007ffe140f5ef0 R08: 00007ffe140f6013 R09: 0000000000000032
[ 324.132940] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe140f5f90
[ 324.133672] R13: 0000000000000000 R14: 000056187bdb37c0 R15: 000056187bdb3780
crash>
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff8261b800 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffff88803da48000 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffff88803da4b2c0 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffff88803da532c0 RU 0.0 0 0 [swapper/3]
1 0 0 ffff88803d9c8000 IN 0.2 4084 2028 virtme-init
2 0 3 ffff88803d9cb2c0 IN 0.0 0 0 [kthreadd]
3 2 0 ffff88803d9eb2c0 ID 0.0 0 0 [rcu_gp]
4 2 0 ffff88803d9e8000 ID 0.0 0 0 [rcu_par_gp]
5 2 0 ffff88803d9f0000 ID 0.0 0 0 [kworker/0:0]
6 2 0 ffff88803d9f32c0 ID 0.0 0 0 [kworker/0:0H]
8 2 0 ffff88803d9fb2c0 ID 0.0 0 0 [mm_percpu_wq]
9 2 0 ffff88803da00000 IN 0.0 0 0 [ksoftirqd/0]
10 2 2 ffff88803da032c0 ID 0.0 0 0 [rcu_sched]
11 2 0 ffff88803da08000 RU 0.0 0 0 [migration/0]
12 2 0 ffff88803da0b2c0 RU 0.0 0 0 [kworker/0:1]
13 2 0 ffff88803da50000 IN 0.0 0 0 [cpuhp/0]
14 2 1 ffff88803da632c0 IN 0.0 0 0 [cpuhp/1]
15 2 1 ffff88803da60000 IN 0.0 0 0 [migration/1]
16 2 1 ffff88803da68000 IN 0.0 0 0 [ksoftirqd/1]
18 2 1 ffff88803da732c0 ID 0.0 0 0 [kworker/1:0H]
19 2 2 ffff88803da70000 IN 0.0 0 0 [cpuhp/2]
20 2 2 ffff88803da80000 IN 0.0 0 0 [migration/2]
21 2 2 ffff88803da832c0 IN 0.0 0 0 [ksoftirqd/2]
22 2 2 ffff88803dad32c0 ID 0.0 0 0 [kworker/2:0]
23 2 2 ffff88803dad0000 ID 0.0 0 0 [kworker/2:0H]
24 2 3 ffff88803dad8000 IN 0.0 0 0 [cpuhp/3]
25 2 3 ffff88803dadb2c0 IN 0.0 0 0 [migration/3]
26 2 3 ffff88803dae32c0 IN 0.0 0 0 [ksoftirqd/3]
27 2 3 ffff88803dae0000 ID 0.0 0 0 [kworker/3:0]
28 2 3 ffff88803daf0000 ID 0.0 0 0 [kworker/3:0H]
29 2 1 ffff88803daf32c0 IN 0.0 0 0 [kdevtmpfs]
30 2 1 ffff88803dbab2c0 ID 0.0 0 0 [netns]
31 2 2 ffff88803dba8000 IN 0.0 0 0 [kauditd]
32 2 2 ffff88803d0732c0 IN 0.0 0 0 [khungtaskd]
33 2 2 ffff88803d070000 IN 0.0 0 0 [oom_reaper]
34 2 2 ffff88803d078000 ID 0.0 0 0 [writeback]
35 2 3 ffff88803d07b2c0 IN 0.0 0 0 [kcompactd0]
36 2 3 ffff88803d0c32c0 IN 0.0 0 0 [ksmd]
37 2 3 ffff88803d0c0000 IN 0.0 0 0 [khugepaged]
39 2 2 ffff88803d0cb2c0 ID 0.0 0 0 [kworker/u8:1]
43 2 3 ffff88803d138000 ID 0.0 0 0 [kworker/3:1]
48 2 1 ffff88803d150000 ID 0.0 0 0 [kworker/1:1]
131 2 1 ffff88803d370000 ID 0.0 0 0 [kintegrityd]
132 2 0 ffff88803d3932c0 ID 0.0 0 0 [kblockd]
133 2 2 ffff88803d390000 ID 0.0 0 0 [blkcg_punt_bio]
134 2 3 ffff88803d0c8000 ID 0.0 0 0 [tpm_dev_wq]
135 2 1 ffff88803d158000 ID 0.0 0 0 [ata_sff]
136 2 0 ffff88803d15b2c0 ID 0.0 0 0 [md]
137 2 2 ffff88803d128000 ID 0.0 0 0 [edac-poller]
138 2 3 ffff88803d12b2c0 ID 0.0 0 0 [devfreq_wq]
139 2 0 ffff88803d18b2c0 IN 0.0 0 0 [watchdogd]
140 2 2 ffff88803d188000 ID 0.0 0 0 [kworker/2:1]
143 2 1 ffff88803d17b2c0 IN 0.0 0 0 [kswapd0]
144 2 0 ffff88803d168000 ID 0.0 0 0 [kworker/u9:0]
145 2 1 ffff88803d16b2c0 IN 0.0 0 0 [ecryptfs-kthrea]
148 2 1 ffff88803d13b2c0 ID 0.0 0 0 [kthrotld]
149 2 2 ffff88803bcab2c0 ID 0.0 0 0 [acpi_thermal_pm]
150 2 3 ffff88803bca8000 IN 0.0 0 0 [scsi_eh_0]
151 2 2 ffff88803bf88000 ID 0.0 0 0 [scsi_tmf_0]
152 2 1 ffff88803bf8b2c0 ID 0.0 0 0 [nvme-wq]
153 2 3 ffff88803b8a0000 ID 0.0 0 0 [nvme-reset-wq]
154 2 3 ffff88803b8a32c0 ID 0.0 0 0 [nvme-delete-wq]
155 2 3 ffff88803b920000 IN 0.0 0 0 [scsi_eh_1]
156 2 3 ffff88803b9232c0 ID 0.0 0 0 [scsi_tmf_1]
157 2 3 ffff88803b9432c0 IN 0.0 0 0 [scsi_eh_2]
158 2 1 ffff88803b940000 ID 0.0 0 0 [scsi_tmf_2]
159 2 2 ffff88803b9932c0 ID 0.0 0 0 [kworker/u8:2]
161 2 0 ffff88803ba38000 ID 0.0 0 0 [ipv6_addrconf]
173 2 2 ffff88803ba532c0 ID 0.0 0 0 [kstrp]
192 2 2 ffff88803bb9b2c0 ID 0.0 0 0 [charger_manager]
193 2 2 ffff88803bb98000 ID 0.0 0 0 [kworker/2:1H]
213 1 1 ffff88803bb90000 IN 0.3 18312 2804 systemd-udevd
296 1 3 ffff88803bb4b2c0 IN 0.2 4216 2496 bash
299 2 0 ffff88803bb932c0 ID 0.0 0 0 [kworker/0:1H]
302 2 2 ffff88803b708000 ID 0.0 0 0 [xfsalloc]
303 2 0 ffff88803b70b2c0 ID 0.0 0 0 [xfs_mru_cache]
308 2 1 ffff88803bb732c0 ID 0.0 0 0 [kworker/1:2]
312 2 1 ffff88803d3332c0 ID 0.0 0 0 [kworker/1:1H]
321 2 1 ffff88803bb70000 ID 0.0 0 0 [xfs-buf/sda]
322 2 3 ffff88803d3732c0 ID 0.0 0 0 [xfs-conv/sda]
323 2 2 ffff88803d1532c0 ID 0.0 0 0 [xfs-cil/sda]
324 2 0 ffff88803b9db2c0 ID 0.0 0 0 [xfs-reclaim/sda]
325 2 3 ffff88803b9d8000 ID 0.0 0 0 [xfs-eofblocks/s]
326 2 0 ffff88803b9d0000 ID 0.0 0 0 [xfs-log/sda]
327 2 1 ffff88803b9d32c0 IN 0.0 0 0 [xfsaild/sda]
> 328 296 0 ffff88803d330000 RU 0.1 2556 1076 df
crash> bt 328
PID: 328 TASK: ffff88803d330000 CPU: 0 COMMAND: "df"
[exception RIP: xfs_fs_statfs+135]
RIP: ffffffffa008fd17 RSP: ffffc90000217df0 RFLAGS: 00000282
RAX: 0000000000000000 RBX: ffff88803ba4e000 RCX: 0000000000000000
RDX: ffff88803dc176e0 RSI: 8888888888888889 RDI: 0000000000000246
RBP: ffffc90000217df8 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88803b3ec640
R13: ffffc90000217e90 R14: 0000000000000002 R15: 0000000000000000
CS: 0010 SS: 0018
#0 [ffffc90000217e00] statfs_by_dentry at ffffffff81352353
#1 [ffffc90000217e20] vfs_statfs at ffffffff8135265b
#2 [ffffc90000217e40] user_statfs at ffffffff8135275b
#3 [ffffc90000217e88] __do_sys_statfs at ffffffff813527d8
#4 [ffffc90000217f20] __x64_sys_statfs at ffffffff81352826
#5 [ffffc90000217f30] do_syscall_64 at ffffffff81004745
#6 [ffffc90000217f50] entry_SYSCALL_64_after_hwframe at ffffffff81c00091
RIP: 00007fd7c6bee1fb RSP: 00007ffe140f5ee8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007ffe140f7f7f RCX: 00007fd7c6bee1fb
RDX: 00000000ffffffff RSI: 00007ffe140f5ef0 RDI: 00007ffe140f7f7f
RBP: 00007ffe140f5ef0 R8: 00007ffe140f6013 R9: 0000000000000032
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe140f5f90
R13: 0000000000000000 R14: 000056187bdb37c0 R15: 000056187bdb3780
ORIG_RAX: 0000000000000089 CS: 0033 SS: 002b
crash>
Looking at all these details we can see that the lockup happened at xfs_fs_statfs+135. We can use gdb inside crash to list the source code where the lockup happened, unfortunately in our case it seems that the symbol cannot be found:
crash> gdb list *(xfs_fs_statfs+135)
No symbol "xfs_fs_statfs" in current context.
gdb: gdb request failed: list *(xfs_fs_statfs+135)
crash>
This is because xfs_fs_statfs is defined in an external module and crash doesn't know where to find it. So we can help it to find the missing symbols using the command "mod":
crash> mod
MODULE NAME SIZE OBJECT FILE
ffffffffa00f5340 xfs 1327104 (not loaded) [CONFIG_KALLSYMS]
crash> mod -s xfs /home/righiandr/tmp/kmod/lib/modules/5.3.0-rc3+/kernel/fs/xfs/xfs.ko
MODULE NAME SIZE OBJECT FILE
ffffffffa00f5340 xfs 1327104 /home/righiandr/tmp/kmod/lib/modules/5.3.0-rc3+/kernel/fs/xfs/xfs.ko
crash>
Now we can use "gdb list" to look at the exact line of code that caused the lockup:
crash> gdb set listsize 3
crash> gdb list *(xfs_fs_statfs+135)
0xffffffffa008fd17 is in xfs_fs_statfs (fs/xfs/xfs_super.c:1102).
1101 spin_lock(&mp->m_sb_lock);
1102 while (1);
1103 statp->f_bsize = sbp->sb_blocksize;
crash>
There it is! In this case it's pretty obvious to notice the endless while(1) loop after a spin_lock(), some times it can be way harder to spot a lockup, but the virtualized environment in combination with crash can really help a lot to speed up the tests and, often, it allows get additional information that would have been impossible to see in a typical development/testing environment.
In this article we have seen a possible workflow to test and debug the kernel in a virtualized environment. This makes kernel development safer, faster and easier, compared to the usual workflow of developing and testing a kernel on the same development system.
This is all possible simply using QEMU/KVM, virtme and crash with a couple of small helper scripts.
I hope anyone searching for kernel development tips find this useful. Let me know your kernel development tips and tricks in the comments below!