Published December 25, 2024

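The problem of apparent hangs on SUSE Linux systems needs to be looked at from several angles:

I. If the system still responds on the network (for example, it answers ping), the kernel is still running, and the console still works when you switch virtual terminals with Ctrl+Alt+F1 through F6, the following method can be used to collect information:

On SUSE Linux, we use the magic SysRq keys at the moment of the fault to capture the system's call traces (memory state, thread stacks, and so on), which gives a clear picture of the system state at that time. By configuring a serial console and using the magic keys on the local keyboard, the system state can be dumped to the console, and this output contains the information needed for analysis.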

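The server must be configured before a hang occurs. The steps are as follows:

1. When the following prompt appears, press the setup key shown in it to enter BIOS setup.

“Press to view diagnostic messages

Press to enter SETUP, Network Boot”

Under the Server Management menu, select Console Redirection (the serial port used for console redirection) and set it to enable.

Record the default serial port parameters, such as the baud rate, so that you can connect when the machine hangs.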

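2. Enable SysRq on the machines that run production services:

echo 1 > /proc/sys/kernel/sysrq

This method does not require a reboot. It sets the content of the sysrq file in that directory to 1; when the file contains 1, SysRq is enabled.

Next, modify the /etc/ file so that SysRq is enabled automatically after every boot. Add the following line to the /etc/ file:

=1

Then run:

sysctl -p

so that the setting takes effect immediately.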
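To confirm that the setting took effect, the control file can be read back. The following is a minimal sketch, not part of the original procedure: the function name `sysrq_status` is hypothetical, and the optional file argument exists only so the function can be tried against a test file. It assumes that 0 means disabled and 1 means fully enabled; newer kernels may also accept other numeric values as a bitmask of allowed functions, which is reported here as "partial".

```shell
# Report the magic SysRq state based on the value stored in the control file.
# Argument 1 (optional): path to read instead of /proc/sys/kernel/sysrq.
sysrq_status() {
    file="${1:-/proc/sys/kernel/sysrq}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    value=$(cat "$file")
    case "$value" in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;  # empty or non-numeric content
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "partial" ;;                      # assumed: bitmask on newer kernels
    esac
}
```

Running `sysrq_status` with no argument on the server should print `enabled` after step 2 has been applied.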

3. When service on a machine is interrupted, first try to log in over the network. If that works, run the following commands:

echo t > /proc/sysrq-trigger

echo p > /proc/sysrq-trigger

echo m > /proc/sysrq-trigger

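If you cannot log in over the network, try logging in locally through the serial port and run the same three commands on the console.

If local login also fails, press the following key combinations one after another on the keyboard of the hung server:

Alt + SysRq + "t"

Alt + SysRq + "p"

Alt + SysRq + "m"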
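When a login is still possible, the three trigger commands can be wrapped in a small helper so that none is forgotten under pressure. This is only a sketch: the function name `send_sysrq_keys` and the log file location in the usage comment are hypothetical, and the optional argument exists so the function can be exercised against an ordinary file instead of the real trigger.

```shell
# Send the SysRq keys t, p and m to the trigger file, one after another.
# Argument 1 (optional): path to write to instead of /proc/sysrq-trigger.
# Writing to the real trigger requires root.
send_sysrq_keys() {
    trigger="${1:-/proc/sysrq-trigger}"
    for key in t p m; do
        echo "$key" > "$trigger" || return 1
        echo "sent $key"
    done
}

# Usage on the affected host, as root (the log path is an assumption):
#   send_sysrq_keys && dmesg > /tmp/sysrq-output.log
```

The kernel writes the resulting traces to its log buffer, so saving the `dmesg` output right afterwards keeps a copy in case the machine has to be rebooted.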

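II. If the system is truly dead, that is, a kernel panic or kernel Oops has occurred, we need to deploy the LKCD tool to take a crash dump and analyze the resulting dump file to find the cause of the crash. LKCD is configured as follows: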

1. Enable core dumps

1) Edit /etc/profile and comment out the line ulimit -Sc 0, so that it reads:

#ulimit -Sc 0

2) Edit /etc/security/ and add two lines like:

* soft core unlimited

* hard core unlimited
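Whether the profile edit was actually made can be checked with a single grep. The sketch below uses a hypothetical function name, `has_core_limit_zero`, and takes the file path as an optional argument so it can be tried on a copy of the file. It only looks for an uncommented `ulimit -Sc 0` line and does not inspect the limits file.

```shell
# Succeed (exit status 0) if the file still contains an active,
# uncommented "ulimit -Sc 0" line; fail otherwise.
# Argument 1 (optional): file to check instead of /etc/profile.
has_core_limit_zero() {
    grep -Eq '^[[:space:]]*ulimit[[:space:]]+-Sc[[:space:]]+0' "${1:-/etc/profile}"
}

# Usage:
#   has_core_limit_zero && echo "core dumps are still limited to 0 in /etc/profile"
```

After logging in again, `ulimit -c` shows the soft core file size limit in effect for the current shell.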

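2. Configure LKCD

1) Edit /etc/sysconfig/dump

Modify and activate the following options:

DUMP_ACTIVE='1'

DUMPDEV='/dev/cciss/c0d0p2'

DUMPDIR='/var/log/dump' (the directory where dump files are stored; the default is usually fine)

DUMP_LEVEL='4' (the level of the generated dump file; the default is 8 on SLES8 and 2 on SLES9)

DUMP_FLAGS='0x80000000'

TARGET_HOST='' (enter the host's IP address between the quotes)

2) Run the following commands to apply the configuration:

#lkcd config

#lkcd_config -q (prints the configuration just set by lkcd config)

#insserv /etc/init.d/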
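Before rebooting, it can help to print the values that the dump configuration file will actually yield. The following sketch assumes the file consists of plain shell variable assignments, as the options above suggest; the function name `show_dump_config` is hypothetical. The file is sourced in a subshell so that the variables do not leak into the calling shell.

```shell
# Print the main LKCD settings from a sysconfig-style file.
# Argument 1 (optional): file to read instead of /etc/sysconfig/dump.
show_dump_config() {
    conf="${1:-/etc/sysconfig/dump}"
    (
        # Source the file in a subshell; it is assumed to hold VAR='value' lines.
        . "$conf" &&
        printf 'active=%s dev=%s dir=%s level=%s\n' \
            "$DUMP_ACTIVE" "$DUMPDEV" "$DUMPDIR" "$DUMP_LEVEL"
    )
}
```

With the values listed above, the function prints `active=1 dev=/dev/cciss/c0d0p2 dir=/var/log/dump level=4`.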

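3. Set LKCD to start automatically at boot:

Run YaST.

Select System > Runlevel > Expert Mode and change the entry so that it starts at boot in both runlevel 3 and runlevel 5. Save and exit, then reboot the server and check whether LKCD has started.

To check whether LKCD has started, press the following key combinations one after another on the server's keyboard:

Alt + SysRq + "t"

Alt + SysRq + "p"

Alt + SysRq + "m"

If a dialog pops up, LKCD has started.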

The following is a reference article from the official website:

Configuring a Remote Serial Console for SLES

This document (3456486) is provided subject to the disclaimer at the end of this document.

Environment

Novell SUSE Linux Enterprise Server 10

Novell SUSE Linux Enterprise Server 9

Novell SUSE Linux Enterprise Server 8

Novell SUSE Linux Enterprise Desktop 10

Novell Linux Desktop

Novell SUSE Linux Openexchange Server 4.1

Novell SUSE Linux Standard Server 8

Serial Console

Remote Management

Situation

Purpose

Configure access to a system using a serial connection, e.g. in order to manually trigger a kernel

crash dump.

Resolution

Assumptions

Another Linux system is to be configured to act as the serial console for a server, rather than, say, a

data terminal or a Microsoft Windows system.

On both systems, the null modem cable is attached to the first serial port ("COM1" in

DOS-terminology).

The server is booted using GRUB.

The connection will use a baud rate of 115200, 8 data bits, no parity and 1 stop bit ("115200 8N1").

Configuration Steps

Connect a null modem cable between the system that will act as the console and the server. Refer to

the Wikipedia article Null modem for details, including pin mapping.

If the server's BIOS supports serial console, configure the BIOS for it. The details of this procedure

are dependent on the BIOS vendor - refer to vendor documentation.

Configure GRUB on the server to use the first serial port. In the file /boot/grub/, comment

out the color and gfxmenu lines and add the following lines:

serial --unit=0 --speed=115200

terminal --timeout=15 serial console

(Place these lines above the boot title entries.)

Configure the kernel (and hypervisor) on the server to use the serial port. This configuration differs

between Xen setups and non-Xen setups.

Non-Xen setup

In the file /boot/grub/, add the following options to the kernel command line:

console=tty0 console=ttyS0,115200

Kernel messages will be written to both tty0 and ttyS0, but OS messages will only be written to

ttyS0. OS messages go to the last console defined on the boot options line.

A sample /boot/grub/ file illustrating these changes:

#color white/blue black/light-gray

default 0

timeout 8

#gfxmenu (hd0,1)/boot/message

serial --unit=0 --speed=115200

terminal --timeout=15 serial console

title Linux ! SERIAL CONSOLE !

kernel (hd0,1)/boot/vmlinuz root=/dev/sda3 selinux=0 splash=0 resume=/dev/sda1 showopts

elevator=cfq vga=791 console=tty0 console=ttyS0,115200

initrd (hd0,1)/boot/initrd
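After rebooting with the modified boot entry, the console options the kernel actually received can be read from /proc/cmdline. The helper below is a sketch with a hypothetical name, `list_consoles`; it takes the command line as a string so that it can also be tried on the sample kernel line above.

```shell
# Print every console= value found in a kernel command line, in order.
# Argument 1: the command line as a single string.
list_consoles() {
    # $1 is left unquoted on purpose so the shell splits it into words.
    for arg in $1; do
        case "$arg" in
            console=*) echo "${arg#console=}" ;;
        esac
    done
}

# Usage on the running server:
#   list_consoles "$(cat /proc/cmdline)"
```

For the sample entry above, the function prints `tty0` and then `ttyS0,115200`; the last one listed is the console that receives OS messages.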

Xen setup

When Xen virtualization is used, both the Xen hypervisor and the Dom0 kernel need to be

instructed to use the serial connection:

Add console=vga,com1 com1=115200 to the parameters for the hypervisor.

Add console=tty0 console=xvc0,115200 to the parameters for the Dom0 kernel.

A sample /boot/grub/ file illustrating these changes:

#color white/blue black/light-gray

default 0

timeout 8

#gfxmenu (hd1,0)/boot/message

serial --unit=0 --speed=115200

terminal --timeout=15 serial console

title Linux - Xen ! SERIAL CONSOLE !

kernel (hd0,1)/boot/ console=vga,com1 com1=115200

module (hd0,1)/boot/vmlinuz root=/dev/sda3 console=tty0 console=xvc0,115200

module (hd0,1)/boot/initrd

Configure the server to allow logins over the serial connection. In the file /etc/inittab, add the

following line.

S0:12345:respawn:/sbin/agetty -L 115200 console vt102

To allow single-user mode to work using the serial connection, additionally change the line

~~:S:respawn:/sbin/sulogin

in /etc/inittab to

~~:S:respawn:/sbin/sulogin /dev/console

NOTE: Single-user mode will only work on the serial console with this option. You will need to

change it back, to run on the local console.

Configure the serial port on the server as a secure port, so a login as root is possible on it without

the need to log in as a regular user first.

Add lines

console

ttyS0

xvc0

to the file /etc/securetty
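A quick way to see which of the three entries still need to be added is to test for each one as a whole line. This is a sketch only; the function name `missing_securetty` is hypothetical, and the optional argument allows it to be run against a copy of the file.

```shell
# Print each of console, ttyS0 and xvc0 that is not yet listed
# as a complete line in the securetty file.
# Argument 1 (optional): file to check instead of /etc/securetty.
missing_securetty() {
    file="${1:-/etc/securetty}"
    for tty in console ttyS0 xvc0; do
        grep -qx "$tty" "$file" || echo "$tty"
    done
}
```

An empty result means all three entries are already present.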

Ensure the package screen is installed on the server; this will be used later on to send control

sequences to it.

Triggering kernel crash dumps using the serial console

The serial console connection can be used to perform "magic SysRq" control of the server,

including triggering a kernel crash dump. This is particularly useful when analysing system hangs

where "magic SysRq" via the system's keyboard is not working.

Configure the server for kernel crash dump capture. Refer to the appropriate TID for details:

TID 3374462, Configure kernel core dump capture, documents the Kdump method for SLE 10.

TID 3044267, Configure lkcd to capture a kernel core dump, documents the lkcdutils method,

primarily used with SLES9 and related products.

Use a serial program like minicom on the serial console system to connect to the server over the

serial port.

Log in as root on the serial console system and run

screen -S console /dev/ttyS0 115200

This sets up a screen session connected to the first serial port. To use this session, do the following:

Log in as root to the serial console system from any machine on the network.

Run the following command:

screen -x -r console

On a reboot of the SUSE host, GRUB will prompt "Press any key to continue." If a key is pressed,

then the GRUB menu will be displayed on the device used. If no key is pressed, the GRUB menu

will be displayed on the serial console screen as defined by the terminal option in

the /boot/grub/ file.

The screen command allows for multiple users to attach and control the screen simultaneously. This

allows for multiple people to participate in the troubleshooting process if necessary.

To trigger a kernel crash dump:

Non-Xen setup

Send a break to the serial port and then the magic sysrq key. For example: Ctrl-A, Ctrl-B, d. Refer

to the screen man page for more commands.

Xen setup

With the Xen hypervisor, the magic sysrq key is Ctrl-O; send Ctrl-O, d to trigger a crash dump.

