Linux core files


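A service process on one of our servers crashed, and the developers reported that no core file had been generated. Let's work out what kept the core file from being written.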

What is a core file

core - core dump file
The default action of certain signals is to cause a process to terminate and produce a core dump file, 
a disk file containing an image of the process's memory at the time of termination.  
This image can be used in a debugger (e.g., gdb(1)) to inspect the state of the program 
at the time that it terminated.  
A list of the signals which cause a process to dump core can be found in signal(7).

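In short: when a program crashes, it normally writes a core file to a designated directory (by default, its current working directory). A core file is an image of the process's memory at the time of the crash, plus some process state useful for debugging, and its main use is post-mortem debugging.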

Configuring the system to generate core files

Setting the core file size limit with ulimit

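First run `ulimit -a` to check whether core files are enabled, and look at the `core file size` line (the first line of the output here). A value of 0 means core file generation is disabled. To enable it, set the size limit to suit your needs. To leave the core file size unrestricted, run: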
ulimit -c unlimited
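
As a small sketch of the check described above (assuming a POSIX-style shell such as bash or dash, whose `ulimit` builtin accepts `-S`, `-H` and `-c`), the following prints both the soft and the hard limit and then raises the soft limit to the hard limit, which an unprivileged process is allowed to do. The variable names are illustrative only.

```shell
# Read the soft (currently enforced) and hard (ceiling) core size limits.
soft_before=$(ulimit -S -c)
hard=$(ulimit -H -c)
echo "core size limit: soft=$soft_before hard=$hard"

# Raise the soft limit as far as the hard limit allows. This affects only the
# current shell and the processes it starts afterwards.
ulimit -S -c "$hard"
echo "soft limit now: $(ulimit -S -c)"
```

A limit changed this way is not applied to processes that are already running, so the service has to be restarted from a shell or start script in which the limit has been raised.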

Why the system did not generate a core file

If ulimit has been set to allow core files and still no core is produced, what could the cause be?

`man core` lists the possible causes:

There are various circumstances in which a core dump file is not produced:

*  The process does not have permission to write the core file.  
(By default the core file is called core, and is created in the current working directory.  
See below for details on naming.)  
Writing the core file will fail if the directory in which it is to be created is nonwritable, 
or if a file with the same name exists and is not writable or is not a regular file 
(e.g., it is a directory or a symbolic link).

*  A (writable, regular) file with the same name as would be used for the core dump already exists, 
but there is more than one hard link to that file.

*  The file system where the core dump file would be created is full; or has run out of inodes; 
or is mounted read-only; or the user has reached their quota for the file system.

*  The directory in which the core dump file is to be created does not exist.

*  The RLIMIT_CORE (core file size) or RLIMIT_FSIZE (file size) resource limits for the process 
are set to zero; see getrlimit(2) and the documentation of the shell's ulimit command 
(limit  in csh(1)).

*  The binary being executed by the process does not have read permission enabled.

*  The process is executing a set-user-ID (set-group-ID) program that is owned by a user (group) 
other than the real user (group) ID of the process.  (However, see the description of the prctl(2)
PR_SET_DUMPABLE operation, and the description of the /proc/sys/fs/suid_dumpable file in proc(5).)

*  (Since Linux 3.7) The kernel was configured without the CONFIG_COREDUMP option.

In addition, a core dump may exclude part of the address space of the process 
if the madvise(2) MADV_DONTDUMP flag was employed.
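
The file-related conditions in the list above can be checked by hand. The following is only a sketch under stated assumptions: `check_core_dir` is a hypothetical helper (not a standard tool), it assumes the default core file name `core`, and it relies on GNU `stat`. Because `[ -w ... ]` tests access for the calling user, it should be run as the user the service runs as.

```shell
# check_core_dir is a hypothetical helper, not a standard tool. It inspects a
# directory for the file-related obstacles listed in core(5), assuming the
# default core file name "core".
check_core_dir() {
    dir=$1
    problems=0
    if [ ! -d "$dir" ]; then
        echo "FAIL: $dir does not exist or is not a directory"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "FAIL: $dir is not writable by $(id -un)"
        problems=1
    fi
    core="$dir/core"
    if [ -e "$core" ] || [ -L "$core" ]; then
        if [ -L "$core" ] || [ ! -f "$core" ]; then
            echo "FAIL: $core exists but is not a regular file"
            problems=1
        elif [ ! -w "$core" ]; then
            echo "FAIL: $core exists but is not writable"
            problems=1
        elif [ "$(stat -c %h "$core")" -gt 1 ]; then
            echo "FAIL: $core has more than one hard link"
            problems=1
        fi
    fi
    # Free space on the file system holding the directory; a full or
    # read-only file system also prevents the dump.
    df -P "$dir" | tail -n 1
    [ "$problems" -eq 0 ] && echo "OK: no file-related obstacle found in $dir"
    return "$problems"
}
```

For example, `check_core_dir /home/xxx/global/app` would inspect the working directory used in the case study further down.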



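1. The directory that will hold the core dump must exist, and the process must have write permission on it.
That directory is the process's current working directory, which is usually the directory from which the command that started the process was issued.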

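2. If the program calls seteuid()/setegid() to change the process's effective user or group, the system by default does not generate a core dump for it.
Many server programs call seteuid(). MySQL is one example: whichever user runs mysqld_safe to start MySQL,
mysqld always runs with the mysql user as its effective user.
If you started a program as user A but ps shows it running as user B, the process has called seteuid().
To let such processes produce core dumps, change the contents of /proc/sys/fs/suid_dumpable to 1 (the default is normally 0).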

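3. Set a core file size limit that is large enough.
The core file written when a program crashes is roughly as large as the memory the program occupies at that moment, which can be far more than it uses in normal operation.
So however little memory the program normally uses, setting the limit to unlimited is the safest way to make sure a core file is produced.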
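
The three points above can be read straight from /proc for a running process. This is a minimal sketch for Linux only; `check_core_prereqs` is a hypothetical helper name, and inspecting another user's process generally requires root.

```shell
# check_core_prereqs is a hypothetical helper. Given a PID, it prints the
# three items discussed above, read from /proc (Linux only).
check_core_prereqs() {
    pid=$1
    # 1. Current working directory, where the core file is written by default.
    echo "cwd: $(readlink "/proc/$pid/cwd" 2>/dev/null || echo unreadable)"
    # 2. System-wide policy for processes that changed their user or group ID.
    echo "suid_dumpable: $(cat /proc/sys/fs/suid_dumpable 2>/dev/null || echo unreadable)"
    # 3. Core size limit actually in force for that process (soft, hard, unit).
    grep 'Max core file size' "/proc/$pid/limits" 2>/dev/null \
        || echo "Max core file size: unreadable"
}

# Example: inspect the current shell itself.
check_core_prereqs "$$"
```

Reading `/proc/<pid>/limits` is useful because it shows the limit the service actually inherited, which may differ from what `ulimit -c` reports in your own login shell.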


# Case study


# 1. The service binary has the SUID bit set
[root@xxx_game ~]# ll /home/xxx/global/app/globalserver
-rws--s--x 1 root root 737162 Mar 11 04:07 /home/xxx/global/app/globalserver

# 2. An ordinary user starts the service process
[root@xxx_game ~]# grep su /home/xxx/global/ 
su - xxx -c "cd /home/xxx/global/app;./globalserver -d"

[root@xxx_game ~]# ps aux | grep globalserver | grep -v grep
root     28448  0.2  1.2 270452 208996 ?       Sl   11:16   0:55 ./globalserver -d

# 3. suid_dumpable is 0, so the SUID program cannot produce a core file
[root@xxx_game ~]# cat /proc/sys/fs/suid_dumpable 
0


The fix for the situation above is to set `/proc/sys/fs/suid_dumpable` to 1.
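
A minimal sketch of that change follows. `enable_suid_dump` is a hypothetical helper; its optional argument exists only so the function can be tried on a scratch file, as the example does. Called without an argument it targets the real kernel setting, which requires root.

```shell
# enable_suid_dump is a hypothetical helper. With no argument it writes to the
# real kernel setting (root required); the optional path argument is there
# only so the function can be tried safely on a scratch file.
enable_suid_dump() {
    target=${1:-/proc/sys/fs/suid_dumpable}
    echo "before: $(cat "$target")"
    echo 1 > "$target" || return 1
    echo "after: $(cat "$target")"
}

# Dry run against a scratch file standing in for the /proc entry.
scratch=$(mktemp)
echo 0 > "$scratch"
enable_suid_dump "$scratch"
rm -f "$scratch"
```

A value written to /proc does not survive a reboot; adding `fs.suid_dumpable = 1` to /etc/sysctl.conf is the usual way to keep it. proc(5) describes mode 1 as a debugging setting, since core dumps of privileged processes can expose sensitive memory, so it is probably worth reverting to 0 once the crash has been analysed.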

