Kdump on CentOS 6 | linuxsysconfig

kdump is part of the kexec-tools package, which provides the kexec binary. kexec uses the kernel's kexec feature to boot a new kernel directly from the running one, either on a normal reboot or after a panic. With kdump, kexec and the kernel debug information, you have a much better chance of finding out why the kernel failed. When a kernel panic occurs, kexec boots a second (capture) kernel that was preloaded into reserved memory; it collects the crash data and saves it to a dump file (vmcore) that you can analyse afterwards.

This guide shows you how to configure kdump for CentOS 6, but it should also apply to Red Hat Enterprise Linux and Fedora.

The CentOS installation used here is a VirtualBox 4.2.8 guest running the latest kernel available at the time of writing.

First, some information about the machine:

cat /etc/redhat-release
CentOS release 6.3 (Final)

uname -r
2.6.32-279.22.1.el6.x86_64

rpm -qa | grep `uname -r`
kernel-2.6.32-279.22.1.el6.x86_64
kernel-headers-2.6.32-279.22.1.el6.x86_64
kernel-devel-2.6.32-279.22.1.el6.x86_64
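Since kdump relies on the kernel's kexec feature, it is also worth confirming that the running kernel was built with the options it needs. The following is a minimal sketch, assuming the kernel config lives at /boot/config-<version> as it does on CentOS 6; kernel_has_kdump_support is a hypothetical helper name, not part of any package.

```shell
#!/bin/bash
# Hypothetical helper: check that a kernel config enables what kdump relies on.
#   CONFIG_KEXEC      - the kexec system call used to load the capture kernel
#   CONFIG_CRASH_DUMP - lets a kernel run as a crash (capture) kernel
#   CONFIG_PROC_VMCORE - exposes the old kernel's memory as /proc/vmcore
# kernel_has_kdump_support [CONFIG_FILE]: defaults to the running kernel's config.
kernel_has_kdump_support() {
    local config=${1:-/boot/config-$(uname -r)} opt
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE; do
        # Each option must be built in (=y); "is not set" or a missing line fails.
        if ! grep -qx "${opt}=y" "$config"; then
            echo "missing: ${opt}=y in ${config}" >&2
            return 1
        fi
    done
}
```

For example, kernel_has_kdump_support && echo ok checks the running kernel; pass a path to check another kernel's config file.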

Install the required packages

yum --enablerepo=debug install kexec-tools crash kernel-debug kernel-debuginfo-`uname -r`

This installs all the required packages and their dependencies. Make sure you use `uname -r` or $(uname -r) when installing the debuginfo RPMs; otherwise yum could install the newest packages in the debug repository rather than the ones matching your kernel version. Also note that kernel-debuginfo is large (1.5-1.7GB installed), so check your free disk space before installing.
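Since kernel-debuginfo takes 1.5-1.7GB once installed, a small pre-flight check can refuse to start the install when space is short. This is a sketch under two assumptions: the debuginfo files land on the filesystem holding /usr (they go to /usr/lib/debug), and 2048 MiB is a comfortable margin. has_free_space and debuginfo_pkg are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers for the yum install above (not part of yum or kexec-tools).

# has_free_space DIR MIN_MB: succeed if the filesystem holding DIR has at least MIN_MB MiB available.
has_free_space() {
    local dir=$1 min_mb=$2 avail
    # -P gives one POSIX-format line per filesystem, -m reports 1 MiB blocks;
    # field 4 of the data line (line 2) is the available space.
    avail=$(df -Pm "$dir" | awk 'NR == 2 {print $4}')
    [ "${avail:-0}" -ge "$min_mb" ]
}

# debuginfo_pkg: name the debuginfo package for the running kernel, so yum
# does not pick the newest version in the debug repository instead.
debuginfo_pkg() {
    echo "kernel-debuginfo-$(uname -r)"
}
```

Usage: has_free_space /usr 2048 && yum --enablerepo=debug install kexec-tools crash kernel-debug "$(debuginfo_pkg)"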

Modify grub

To enable kdump, a kernel argument must be added to /etc/grub.conf. It is called crashkernel and can be set either to auto or to a fixed size, e.g. 128M, 256M or 512M; the value is the amount of memory reserved for the capture kernel. I chose 128M for my testing and added it to the kernel line of both boot entries:

title CentOS (2.6.32-279.22.1.el6.x86_64.debug)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-279.22.1.el6.x86_64.debug ro root=/dev/mapper/vg_centos6-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_centos6/lv_swap rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_centos6/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM crashkernel=128M
        initrd /initramfs-2.6.32-279.22.1.el6.x86_64.debug.img
title CentOS (2.6.32-279.22.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-279.22.1.el6.x86_64 ro root=/dev/mapper/vg_centos6-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_centos6/lv_swap rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_centos6/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM crashkernel=128M
        initrd /initramfs-2.6.32-279.22.1.el6.x86_64.img
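To double-check that the argument made it onto every kernel line (or, after the reboot, onto the running kernel's command line), a short helper can pull out the crashkernel values. This is a sketch; crashkernel_values is a hypothetical name, and it simply greps whichever file you give it, defaulting to /proc/cmdline.

```shell
#!/bin/bash
# Hypothetical helper: print every crashkernel= value found in FILE, one per line.
# FILE defaults to /proc/cmdline (what the running kernel actually received);
# /etc/grub.conf works too and yields one value per kernel line that has it.
crashkernel_values() {
    local file=${1:-/proc/cmdline}
    # -o prints only the matching part; the value ends at the next whitespace.
    # cut -f2- keeps everything after the first '=' in case the value contains more.
    grep -o 'crashkernel=[^[:space:]]*' "$file" | cut -d= -f2-
}
```

With the configuration above, crashkernel_values /etc/grub.conf should print 128M once per boot entry; after the reboot, crashkernel_values with no argument shows the value the running kernel was booted with.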

Enable kdump

chkconfig kdump on
service kdump start
No kdump initial ramdisk found.                            [WARNING]
Rebuilding /boot/initrd-2.6.32-279.22.1.el6.x86_64kdump.img
Starting kdump:                                            [  OK  ]

A reboot is now required so that the kernel boots with the new crashkernel argument.

shutdown -r now

Confirm kdump is active

service kdump status
Kdump is operational

cat /sys/kernel/kexec_crash_loaded
1

cat /proc/iomem | grep Crash
03000000-12ffffff : Crash kernel
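The two file-based checks above can be wrapped into small functions, which is handy when verifying several machines. A minimal sketch, assuming it runs as root on the kdump host; crash_kernel_loaded and crash_reserved_mib are hypothetical helper names, and both accept an alternate file path so they can be tried against saved copies of the /sys and /proc output.

```shell
#!/bin/bash
# Hypothetical helpers mirroring the manual checks above; run as root.

# crash_kernel_loaded [FLAG_FILE]: succeed if FLAG_FILE
# (default /sys/kernel/kexec_crash_loaded) reads 1, i.e. a capture kernel is loaded.
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

# crash_reserved_mib [IOMEM_FILE]: print the size in MiB of the "Crash kernel"
# range listed in IOMEM_FILE (default /proc/iomem); fail if there is none.
crash_reserved_mib() {
    local iomem=${1:-/proc/iomem} range start end
    # Lines look like "  03000000-12ffffff : Crash kernel"; keep the first field.
    range=$(awk '/Crash kernel/ {print $1; exit}' "$iomem")
    [ -n "$range" ] || return 1
    start=$((16#${range%-*}))     # hex start address
    end=$((16#${range#*-}))       # hex end address (inclusive)
    echo $(( (end - start + 1) / 1024 / 1024 ))
}
```

For example: crash_kernel_loaded && echo "capture kernel loaded, $(crash_reserved_mib) MiB reserved". The reported size can then be compared with the crashkernel= value on the kernel line.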

Test kdump, i.e. trigger a kernel crash

### Clearly you shouldn’t do this on a production machine! ###

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
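Because the commands above take the machine down immediately, I find it safer to wrap them in a guard that refuses to fire unless a capture kernel is actually loaded and the caller has explicitly confirmed. This is a sketch; trigger_test_crash and the CONFIRM_CRASH variable are hypothetical names chosen for this example.

```shell
#!/bin/bash
# Hypothetical guard around the SysRq crash trigger. Never run on production machines.
# trigger_test_crash [FLAG_FILE]: crash the kernel only if
#   1) FLAG_FILE (default /sys/kernel/kexec_crash_loaded) reads 1, and
#   2) the caller set CONFIRM_CRASH=yes in the environment.
trigger_test_crash() {
    local flag=${1:-/sys/kernel/kexec_crash_loaded}
    if [ "$(cat "$flag" 2>/dev/null)" != "1" ]; then
        echo "No capture kernel loaded; a crash now would not produce a vmcore." >&2
        return 1
    fi
    if [ "${CONFIRM_CRASH:-}" != "yes" ]; then
        echo "Refusing to crash: set CONFIRM_CRASH=yes to proceed." >&2
        return 2
    fi
    sync                                  # flush dirty buffers to disk before the panic
    echo 1 > /proc/sys/kernel/sysrq       # enable all SysRq functions
    echo c > /proc/sysrq-trigger          # trigger the crash
}
```

After sourcing the function as root, CONFIRM_CRASH=yes trigger_test_crash performs the same two writes as above, but only when kdump is armed.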

The kernel panic should happen instantly. In theory the capture kernel, preloaded by kexec, then boots and saves the crash data, after which the machine reboots into the default kernel. In practice this doesn't always work. You may need to tweak the configuration files (/etc/kdump.conf and /etc/sysconfig/kdump) or try different crashkernel values in grub.

The capture kernel can also have trouble with some existing kernel modules (e.g. megaraid). In that case you might need to add them explicitly to the extra_modules line in /etc/kdump.conf, or keep them out of the kdump initrd with the mkdumprd utility (and its omit-raid-modules option).
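Adding a module to extra_modules by hand is easy, but when doing it on several machines a small idempotent helper avoids duplicate entries. This is a sketch under assumptions: add_extra_module is a hypothetical name, it only touches an uncommented extra_modules line, it relies on GNU sed (as shipped with CentOS), and megaraid_sas below is just an example module name. After editing the file, restarting the kdump service should rebuild the kdump initrd.

```shell
#!/bin/bash
# add_extra_module CONF MODULE: make sure MODULE is listed on an active
# (uncommented) extra_modules line in CONF, e.g. /etc/kdump.conf.
# Hypothetical helper; relies on GNU sed for "sed -i" and the "0,/re/" address.
add_extra_module() {
    local conf=$1 mod=$2
    # Already listed on an active extra_modules line: nothing to do.
    if grep -Eq "^extra_modules([[:space:]].*)?[[:space:]]${mod}([[:space:]]|\$)" "$conf"; then
        return 0
    fi
    if grep -q '^extra_modules[[:space:]]' "$conf"; then
        # Append the module to the first active extra_modules line only.
        sed -i "0,/^extra_modules[[:space:]]/s/^\(extra_modules[[:space:]].*\)\$/\1 ${mod}/" "$conf"
    else
        # No active line yet: add one at the end of the file.
        echo "extra_modules ${mod}" >> "$conf"
    fi
}
```

Usage: add_extra_module /etc/kdump.conf megaraid_sas && service kdump restart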

Analysing the crash dump

By default the dump (vmcore) is saved in a time-stamped directory under /var/crash. You can investigate what happened with the crash utility, pointing it at the vmlinux with debug symbols that kernel-debuginfo installs under /usr/lib/debug/lib/modules/<kernel version>/. Most of the data is pretty cryptic, but the built-in commands give you at least some idea of what went wrong.

crash /usr/lib/debug/lib/modules/2.6.32-279.22.1.el6.x86_64/vmlinux /var/crash/127.0.0.1-2013-03-03-20\:14\:21/vmcore

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.22.1.el6.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013-03-03-20:14:21/vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Sun Mar  3 20:13:14 2013
      UPTIME: 00:00:56
LOAD AVERAGE: 0.08, 0.03, 0.01
       TASKS: 188
    NODENAME: centos6.3
     RELEASE: 2.6.32-279.22.1.el6.x86_64
     VERSION: #1 SMP Wed Feb 6 03:10:46 UTC 2013
     MACHINE: x86_64  (2467 Mhz)
      MEMORY: 4 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 8473
     COMMAND: "bash"
        TASK: ffff88011b550040  [THREAD_INFO: ffff880119322000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

In my case the cause was easy to spot, as the log command in crash showed the SysRq-triggered crash:

SysRq : Trigger a crash

The bt (backtrace) command pointed to the same thing:

KERNEL-MODE EXCEPTION FRAME AT: ffff8801193238d8
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff81321d66  RSP: ffff880119323e18  RFLAGS: 00010096
    RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000002388
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
    RBP: ffff880119323e18   R8: 0000000000000000   R9: ffffffff8163ac60
    R10: 0000000000000001  R11: 0000000000000000  R12: 0000000000000000
    R13: ffffffff81afb7a0  R14: 0000000000000286  R15: 0000000000000004
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

The crash utility has many other commands; type help at the crash prompt to get the full list.
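For a quick first look without an interactive session, the log and bt commands used above can be replayed from a command file. This is a sketch, assuming your crash version supports the -s (silent start-up) and -i (read commands from a file) options described in crash(8) (check crash -h on your system), and assuming the vmcore came from the kernel that is running now; latest_vmcore and run_crash_batch are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers for a non-interactive first look at a vmcore.

# latest_vmcore [CRASH_DIR]: print the most recently modified vmcore found in
# the per-crash subdirectories of CRASH_DIR (default /var/crash); empty if none.
latest_vmcore() {
    local dir=${1:-/var/crash}
    # ls -t sorts newest first; the error for an unmatched glob is discarded.
    ls -t "$dir"/*/vmcore 2>/dev/null | head -n 1
}

# run_crash_batch VMCORE: run log, bt and quit through crash(8).
# Assumes the vmcore was produced by the currently running kernel, so that
# uname -r points at the matching vmlinux from kernel-debuginfo.
run_crash_batch() {
    local vmcore=$1 cmds
    cmds=$(mktemp)
    printf 'log\nbt\nquit\n' > "$cmds"
    crash -s -i "$cmds" "/usr/lib/debug/lib/modules/$(uname -r)/vmlinux" "$vmcore"
    rm -f "$cmds"
}
```

Usage, as root: run_crash_batch "$(latest_vmcore)" | less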

See also some screenshots taken while the crash kernel was booting after the panic.
