2019独角兽企业重金招聘Python工程师标准>>>

最近几天线上jboss服务器经常莫名地突然停止运行,导致半夜都被报警短信吵醒,元旦几天也基本就在收报警,然后重启系统。查看jboss控制台错误日志,发现只有下面一行:

/opt/.../jboss/bin/run.sh: line 181: 26430 段错误                  "$JAVA" $JAVA_OPTS -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" -classpath "$JBOSS_CLASSPATH" org.jboss.Main "$@"

一开始对上面的错误日志也没引起重视,一直在找JVM crash的日志hs_err_<pid>,看了很多网上的帖子,包括JVM官网的介绍,为什么没有生成hs_err_pid日志,可能的原因就是没权限或者空间不够,查看了磁盘空间是足够的,于是想到可能没有权限(其实默认目录就是JVM的工作目录,应该是有权限的,但当时唯一能怀疑的就是这个),所以就想到JVM参数(-XX:ErrorFile=./hs_err_<pid>.log)可以配置这个日志文件的保存目录,但这个参数是JDK1.6才引入的,而我们系统杯催的还停留在JDK1.5上,没办法想到另外一个参数(-XX:OnError="<cmd args>;<cmd args>"),里面配置的命令是kill -3 <pid>,这样JVM在crash的时候可以将线程堆栈dump出来,想法是好的,可这个命令一直没配上去,只能作罢。

拿不到hs_err_<pid>日志,唯一的线索就是上面仅有的一行错误日志,于是google“段错误”“segmentation fault”,发现可以通过core dump日志进行分析,解决思路也就自然而然有了。

1.core dump介绍

程序异常退出(crash)时会自动生成一个core文件,包含了程序运行时的内存,寄存器状态,堆栈指针,内存管理等信息,也就是把程序当时工作的状态存储成一个文件。不仅仅是在出错的时候会生成core dump文件,在系统卡住或者cpu使用率很高的时候也可以手动触发生成core dump文件(当然这种情况也可以直接通过jmap和jstack dump出内存和线程堆栈,但JDK1.6以前,这2个命令经常执行不成功),这样就可以很好的分析了。

2.生成core dump文件

core dump文件生成开关其实是通过对生成的文件大小进行控制达到的,默认大小是0,也就是说默认是不生成core dump文件的,可以通过命令ulimit -c进行查看。将此参数修改成unlimited就可以生成core dump文件了,但值得注意的一点是,每个应用进程都会读取自己的一套系统参数,可以查看进程对应的内存文件/proc/<pid>/limits中的信息来判断修改后的参数值是否对此应用进程生效了,limits文件中的信息如下:

Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        unlimited           unlimited            bytes    
Max resident set          unlimited            unlimited            bytes     
Max processes             20480                20480                processes 
Max open files            204800               204800               files     
Max locked memory         32768                32768                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       1024                 1024                 signals   
Max msgqueue size         819200               819200               bytes

上述信息中的对应的参数值为unlimited,代表此应用进程的core dump开关是开着的。ulimit -c unlimited这个命令可以在很多地方修改,比如用root在命令行执行,在/etc/profile中增加(/etc/profile文件中已经有defaultulimit -S -c 0 > /dev/null 2>&1,可以直接修改0为unlimited),在用户的.bash_profile中增加..这些方式执行后的影响范围还是挺复杂的,到现在也没弄清楚。当时通过root用户命令行修改后,在shell里查询ulimit -c是生效了,以为应用程序也生效了,觉得不用应用都不需要重启就生效了挺好,等到下一次crash的时候发现还是没有生成core dump文件,于是到文件/proc/<pid>/limits中检查此参数值才发现应用程序所读取的值还是0。后来采用一种比较简单直接的方式,用root用户修改/etc/profile文件中的ulimit -c的值,然后重新登录,重启应用,再检查/proc/<pid>/limits中的值,确保参数已经生效,后来应用再crash时core dump文件也如期而至,心里暗爽,终于可以开始分析堆栈了。

core dump文件生成的默认路径在用户的工作目录(启动脚本目录),可以通过设置参数/proc/sys/kernel/core_pattern的值改变core dump文件的最终生成路径及文件名格式。core dump文件成功后,在jboss控制台错误日志中的日志也多了core dumped关键字,如下:

/opt/.../jboss/bin/run.sh: line 181:  9176 段错误                  (core dumped) "$JAVA" $JAVA_OPTS -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" -classpath "$JBOSS_CLASSPATH" org.jboss.Main "$@"

3.core dump分析

有了core dump文件,接下来要做的就是通过命令去解析此文件,定位具体问题了,主要有以下三个命令:

(1)先执行gdb $JAVA_HOME$/bin/java core-java-16427-1325846515,再执行bt,输出结果如下:

Loaded symbols for /opt/.../oracle/lib/libnnz10.so
Reading symbols from /opt/.../install/oracle/10.2.0.3/lib/libociei.so...(no debugging symbols found)...done.
Loaded symbols for /opt/.../oracle/lib/libociei.so
Reading symbols from /usr/lib/gconv/GB18030.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/gconv/GB18030.so
#0  0x4662051e in ttcrd2r ()
   from /opt/.../oracle/lib/libclntsh.so.10.1
(gdb) bt
#0  0x4662051e in ttcrd2r () from /opt/.../oracle/lib/libclntsh.so.10.1
#1  0x46617212 in ttcrbur () from /opt/.../oracle/lib/libclntsh.so.10.1
#2  0x466176e6 in ttcbur () from /opt/.../oracle/lib/libclntsh.so.10.1
#3  0x465ba8c7 in ttcdrv () from /opt/.../oracle/lib/libclntsh.so.10.1
#4  0x464a1bc8 in nioqwa () from /opt/.../oracle/lib/libclntsh.so.10.1
#5  0x4630d618 in upirtrc () from /opt/.../oracle/lib/libclntsh.so.10.1
#6  0x462825f6 in kpurcsc () from /opt/.../oracle/lib/libclntsh.so.10.1
#7  0x46236a0d in kpuexecv8 () from /opt/.../oracle/lib/libclntsh.so.10.1
#8  0x46238ec4 in kpuexec () from /opt/.../oracle/lib/libclntsh.so.10.1
#9  0x463121b6 in OCIStmtExecute () from /opt/.../oracle/lib/libclntsh.so.10.1
#10 0x46e725aa in Java_oracle_jdbc_driver_T2CConnection_t2cClearAllApplicationContext ()
   from /opt/.../install/oracle/10.2.0.3/lib/libocijdbc10.so
#11 0x46e7580f in Java_oracle_jdbc_driver_T2CStatement_t2cDefineExecuteFetch ()
   from /opt/.../install/oracle/10.2.0.3/lib/libocijdbc10.so
#12 0xb125a8b8 in ?? ()
#13 0x47e01bc8 in ?? ()
#14 0x47d46bb0 in ?? ()
#15 0x47d46c40 in ?? ()
#16 0x4babf774 in ?? ()
#17 0x00000000 in ?? ()

通过这个debug命令可以查看到应用crash时正在做什么,这里是在获取jdbc连接,但这个并不代表问题就出在这里,有可能这只是个结果,起因还需要进一步排查。

(2)光有上述信息还不足以定位出问题在哪,可以通过命令jstack $JAVA_HOME$/bin/java core-java-16427-1325846515获取应用crash时的线程堆栈,输出结果如下:

Thread 11586: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.Object.wait() @bci=2, line=474 (Compiled frame)
 - edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingDeque.takeFirst() @bci=20, line=389 (Compiled frame)
 - edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingDeque.take() @bci=1, line=577 (Compiled frame)
 - edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=18, line=674 (Compiled frame)
Error occurred during stack walking:

Thread 10924: (state = IN_NATIVE)
 - oracle.jdbc.driver.T2CStatement.t2cDefineExecuteFetch(oracle.jdbc.driver.OracleStatement, long, int, int, int, int, boolean, boolean, byte[], int, byte, int, int, short[], int, byte[], char[], int, int, short[], byte[], int, int, boolean, boolean, oracle.jdbc.driver.Accessor[], byte[][][], long[], byte[], int, char[], int, short[], int) @bci=0 (Interpreted frame)
 - oracle.jdbc.driver.T2CPreparedStatement.doDefineExecuteFetch() @bci=190, line=878 (Interpreted frame)
 - oracle.jdbc.driver.T2CPreparedStatement.executeForRows(boolean) @bci=34, line=760 (Interpreted frame)
 - oracle.jdbc.driver.OracleStatement.executeMaybeDescribe() @bci=154, line=1062 (Interpreted frame)
 - oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout() @bci=102, line=1126 (Interpreted frame)
 - oracle.jdbc.driver.OraclePreparedStatement.executeInternal() @bci=94, line=3339 (Interpreted frame)
 - oracle.jdbc.driver.OraclePreparedStatement.execute() @bci=17, line=3445 (Interpreted frame)
 - org.jboss.resource.adapter.jdbc.CachedPreparedStatement.execute() @bci=4, line=216 (Interpreted frame)
 - org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.execute() @bci=12, line=209 (Compiled frame)
 - com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(com.ibatis.sqlmap.engine.scope.RequestScope, java.sql.Connection, java.lang.String, java.lang.Object[], int, int, com.ibatis.sqlmap.engine.mapping.statement.RowHandlerCallback) @bci=135, line=180 (Compiled frame)
 - com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.sqlExecuteQuery(com.ibatis.sqlmap.engine.scope.RequestScope, java.sql.Connection, java.lang.String, java.lang.Object[], int, int, com.ibatis.sqlmap.engine.mapping.statement.RowHandlerCallback) @bci=15, line=205 (Interpreted frame)
 - com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithCallback(com.ibatis.sqlmap.engine.scope.RequestScope, java.sql.Connection, java.lang.Object, java.lang.Object, com.ibatis.sqlmap.client.event.RowHandler, int, int) @bci=169, line=173 (Compiled frame)
 - com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryForObject(com.ibatis.sqlmap.engine.scope.RequestScope, com.ibatis.sqlmap.engine.transaction.Transaction, java.lang.Object, java.lang.Object) @bci=28, line=104 (Compiled frame)
 - com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(com.ibatis.sqlmap.engine.scope.SessionScope, java.lang.String, java.lang.Object, java.lang.Object) @bci=58, line=561 (Compiled frame)
 - com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(com.ibatis.sqlmap.engine.scope.SessionScope, java.lang.String, java.lang.Object) @bci=5, line=536 (Interpreted frame)
 - com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryForObject(java.lang.String, java.lang.Object) @bci=10, line=93 (Interpreted frame)
 - org.springframework.orm.ibatis.SqlMapClientTemplate$1.doInSqlMapClient(com.ibatis.sqlmap.client.SqlMapExecutor) @bci=9, line=273 (Interpreted frame)
 - org.springframework.orm.ibatis.SqlMapClientTemplate.execute(org.springframework.orm.ibatis.SqlMapClientCallback) @bci=242, line=209 (Compiled frame)
 - org.springframework.orm.ibatis.SqlMapClientTemplate.queryForObject(java.lang.String, java.lang.Object) @bci=11, line=271 (Compiled frame)
......
 - org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=219, line=809 (Interpreted frame)
 - org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=571 (Compiled frame)
 - org.springframework.web.servlet.FrameworkServlet.doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=511 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=139, line=717 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=810 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=173 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.invoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=396, line=672 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.processRequest(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=64, line=463 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.doForward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=469, line=398 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.forward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=55, line=301 (Interpreted frame)

- org.springframework.web.servlet.view.InternalResourceView.renderMergedOutputModel(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=244, line=240 (Interpreted frame)
 - org.springframework.web.servlet.view.AbstractView.render(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=157, line=252 (Interpreted frame)
 - org.springframework.web.servlet.DispatcherServlet.render(org.springframework.web.servlet.ModelAndView, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=225, line=1173 (Compiled frame)
 - org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=219, line=809 (Interpreted frame)
 - org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=571 (Compiled frame)
 - org.springframework.web.servlet.FrameworkServlet.doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=511 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=139, line=717 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=810 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=173 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.invoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=396, line=672 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.processRequest(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=64, line=463 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.doForward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=469, line=398 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.forward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=55, line=301 (Interpreted frame)

- org.springframework.web.servlet.view.InternalResourceView.renderMergedOutputModel(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=244, line=240 (Interpreted frame)
 - org.springframework.web.servlet.view.AbstractView.render(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=157, line=252 (Interpreted frame)
 - org.springframework.web.servlet.DispatcherServlet.render(org.springframework.web.servlet.ModelAndView, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=225, line=1173 (Compiled frame)
 - org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=219, line=809 (Interpreted frame)
 - org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=571 (Compiled frame)
 - org.springframework.web.servlet.FrameworkServlet.doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=511 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=139, line=717 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=810 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=173 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.invoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=396, line=672 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.processRequest(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=64, line=463 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.doForward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=469, line=398 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.forward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=55, line=301 (Interpreted frame)

- org.springframework.web.servlet.view.InternalResourceView.renderMergedOutputModel(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=244, line=240 (Interpreted frame)
 - org.springframework.web.servlet.view.AbstractView.render(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=157, line=252 (Interpreted frame)
 - org.springframework.web.servlet.DispatcherServlet.render(org.springframework.web.servlet.ModelAndView, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=225, line=1173 (Compiled frame)
 - org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=219, line=809 (Interpreted frame)
 - org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=571 (Compiled frame)
 - org.springframework.web.servlet.FrameworkServlet.doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=511 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=139, line=717 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=810 (Compiled frame)
 - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=173 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.invoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=396, line=672 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.processRequest(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=64, line=463 (Interpreted frame)
 - org.apache.catalina.core.ApplicationDispatcher.doForward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=469, line=398 (Interpreted frame)
  - org.apache.catalina.core.ApplicationDispatcher.forward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=55, line=301 (Interpreted frame)

......

通过这个线程堆栈可以看出问题根本原因是一直在forward,代码出现死循环,总的forward的次数有100多次,直到线程堆栈溢出。这也能说明前面通过gdb命令查看到的获取jdbc连接仅仅是最近的一次循环正在执行的操作而已,不是真正的起因。这个线程的死循环forward同样影响到了其他的线程,导致出现Error occurred during stack walking错误,线程一直处于BLOCKED状态,总之线程堆栈对分析问题原因是非常有帮助的。

(3).可以通过命令jmap $JAVA_HOME$/bin/java core-java-16427-1325846515查看到应用的内存使用,结果如下:

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 1342177280 (1280.0MB)
   NewSize          = 134217728 (128.0MB)
   MaxNewSize       = 134217728 (128.0MB)
   OldSize          = 402653184 (384.0MB)
   NewRatio         = 15
   SurvivorRatio    = 1024
   PermSize         = 100663296 (96.0MB)
   MaxPermSize      = 100663296 (96.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 134152192 (127.9375MB)
   used     = 22525464 (21.481956481933594MB)
   free     = 111626728 (106.4555435180664MB)
   16.790977220856742% used
Eden Space:
   capacity = 134086656 (127.875MB)
   used     = 22525464 (21.481956481933594MB)
   free     = 111561192 (106.3930435180664MB)
   16.799183954591275% used
From Space:
   capacity = 65536 (0.0625MB)
   used     = 0 (0.0MB)
   free     = 65536 (0.0625MB)
   0.0% used
To Space:
   capacity = 65536 (0.0625MB)
   used     = 0 (0.0MB)
   free     = 65536 (0.0625MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 1207959552 (1152.0MB)
   used     = 764300224 (728.8934936523438MB)
   free     = 443659328 (423.10650634765625MB)

Exception in thread "main" java.lang.OutOfMemoryError: requested 4096 bytes for jbyte in /BUILD_AREA/jdk1.5.0_08/hotspot/src/share/vm/prims/jni.cpp. Out of swap space?

      最后一行出现了OOM,这是因为dump出来的文件太大,32位机器支持的最大内存也就2G,所以在解析时出现OOM问题,可以放到64位服务器上进行解析。

4.常见的core dump原因

造成coredump的原因很多,常见的有内存访问越界,非法指针,堆栈溢出等等,也可以通过发送操作系统信号中断应用,比如kill -SIGSEGV <pid>,这是一个无效存储信号。如果想手动生成core dump文件,可以通过gdb -p <pid>生成。

其实通过上一篇所讲的core dump方法可以比较方便找到jvm crash问题所在,但这种方法适合crash比较频繁,容易获取到core dump文件,如果没有这个文件,那怎么可以查到问题呢。现在回过头想想通过跟踪crash时的apache日志,其实也能找到造成crash的请求,下面是某次crash时的日志:

[07/Jan/2012:07:52:40 +0800] "GET /a.xhtml HTTP/1.1" 200 2521 "-" "Mozilla/5.0 "  "a=-; b=-; c=-" -
[07/Jan/2012:07:52:47 +0800] "POST /a.xhtml HTTP/1.1" 503 323 "-" "Opera/9.80 "  "a=-; b=-; c=-" -
[07/Jan/2012:07:52:47 +0800] "POST /b.xhtml HTTP/1.0" 503 323 "-" "-" -  "a=-; b=-; c=-" -
[07/Jan/2012:07:52:51 +0800] "GET /a.xhtml HTTP/1.1" 503 323 "-" "Mozilla/5.0 "  "a=-; b=-; c=-" -
[07/Jan/2012:07:52:52 +0800] "GET /b.xhtml HTTP/1.1" 503 323 "-" "Mozilla/5.0 "  "a=-; b=-; c=-" -
[07/Jan/2012:07:52:48 +0800] "GET /c.do HTTP/1.1" 503 323 "-" "Mozilla/5.0 "  "a=-; b=-; c=-" -
[07/Jan/2012:07:52:52 +0800] "GET /c.do HTTP/1.1" 503 323 "-" "NokiaE63/UCWEB8.1.0.104/28/800"  "a=-; b=-; c=-" -
[07/Jan/2012:07:52:42 +0800] "POST /a.xhtml HTTP/1.1" 503 323 "-" "MAUI_WAP_Browser"  "a=-; b=-; c=-" -
[07/Jan/2012:07:50:26 +0800] "POST /e.xhtml HTTP/1.1" 503 323 "-" "MAUI_WAP_Browser"  "a=-; b=-; c=-" -
[07/Jan/2012:07:52:42 +0800] "GET /c.do HTTP/1.1" 503 323 "-" "MQQBrowser/2.9/Mozilla/5.0 "  "a=-; b=-; c=-" -
[07/Jan/2012:07:52:40 +0800] "GET /c.do HTTP/1.1" 503 323 "-" "MQQBrowser/2.0 "  "a=-; b=-; c=-" -

这里apache通过接收请求转发给jboss处理,从上面的日志可以看出07:50:26的这个请求比较特殊,没按时间先后排列,属于“插队”的请求。apache日志中打印出的时间是接收到请求时的时间,但日志是等到请求处理完后才打印,那么这个请求从7:50:26秒进来,处理了2分多钟,我们应该首先怀疑它;另外一个值得注意的是请求的状态,除了第一个请求是200状态外,其他都是503错误,这说明“插队”的请求不是瞬间就让应用crash的,上面列出的第一个请求7:52:40进来的,还是可以正常执行完返回,同时这个“插队”的请求在即将导致应用crash时,也会影响其他请求的处理,反映到apache日志上就是503状态(处于第一个请求和“插队”的请求之间的请求)。所以在apache日志中找出应用crash时刻的所有请求,然后在其中查找那个“插队”比较明显的请求(apache日志总的来说是按时间先后顺序排列的,也会有秒级别的差异,但像出现分钟级别的顺序错乱就需要引起重视),同时结合业务日志进一步确定代码的执行顺序,找出问题。

总之,碰到这种应用意外终止的情况,需要根据日志(内存日志、apache日志、应用日志)找到蛛丝马迹方能抓到幕后黑手。这貌似一句废话.

/var/log/syslog.log

转载于:https://my.oschina.net/CasparLi/blog/336317

如何抓获JVM crash的幕后黑手相关推荐

  1. JVM crash at ForUtil.readBlock

    今天同学让帮忙看下JVM错误日志,才发现已经开始接触java3个月,还没看到相关错误日志.平时看的都只是程序运行时写入的日志,关于JVM的错误日志还真没看过.网上收集资料,整理如下. 一.日志文件: ...

  2. 假笨说-协助美团kafka团队定位到的一个JVM Crash问题

    概述 有挺长一段时间没写技术文章了,正好这两天美团kafka团队有位小伙伴加了我微信,然后咨询了一个JVM crash的问题,大家对crash的问题都比较无奈,因为没有现场,信息量不多,碰到这类问题我 ...

  3. jvm crash分析工具

    为什么80%的码农都做不了架构师?>>>    介绍一款非常好用的jvm crash分析工具,当jvm挂掉时,会产生hs_err_pid.log.里面记录了jvm当时的运行状态以及错 ...

  4. 解决JDBC-ODBC驱动桥导致JVM crash.

    最近帮一个朋友做了一个网站,开始因考虑到访问量不算太高,采用了ACCESS作为数据库,采用了JDBC-ODBC驱动桥接方式.写到配置文件中加载到内存,供系统运行时使用. <dbconfig> ...

  5. jvm crash分析

    问题描述:线上进程异常退出,查看服务器端日志,有jvm crash文件生成 # # A fatal error has been detected by the Java Runtime Enviro ...

  6. jvm crash 的原因以及解决办法

    当JVM crash时,不要慌,首先启动crash的服务,然后再分析原因.那么,应该从哪里入手呢?下面我根据已有的知识,发表下自己的浅见,希望能给和我遇到一样问题的小伙伴一些思路. 谁crash了我们 ...

  7. JVM 系列一:Native memory allocation 导致JVM Crash

    2019独角兽企业重金招聘Python工程师标准>>> JVM Crash抛出如下信息: # # There is insufficient memory for the Java ...

  8. JVM crash 异常分析

    背景 在升级服务jar包的时候,由于采取的是热升级,即在服务运行的时候,替换掉服务的jar包,导致启动的时候JVM崩溃了,产生hs_err_pidxxx.log这样的log文件 错误日志文件 # # ...

  9. java hs_err,分析JVM crash时的hs_err_pidxxx.log文件

    Oracle官网: # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at ...

最新文章

  1. 文件解析库doctotext源码分析
  2. Bag of Tricks for Image Classification
  3. 超级猩猩:网红健身房的故事很好,但别为他人做了嫁衣
  4. PP视频怎么设置退出程序时清空本地播放记录
  5. RocketMQ使用mmap - TODO
  6. LeetCode:Add Binary
  7. linux机器crt连接不上,SecureCRT连不上Linux主机了,求破
  8. 使用FlyMcu.exe中出现的问题
  9. 彩色图像灰度化 (RGB ⇒ Gray )(RGB ⇒ YUV)(Verilog)
  10. 干细胞体外培养最新技术进展(2021年9月)
  11. 华为mate40营销之我见
  12. 使用Stratasys创建3D打印医学模型的工作流程
  13. 服务器打不开微信怎么办,手机wifi只能用微信,打不开网页怎么处理?
  14. 敲黑板,定积分也有换元和分部积分法!
  15. 商会管理系统_沈阳写字间丨商会总部大厦
  16. Java 的历史和版本
  17. JavaScript使用事件onclick导致css样式失效问题
  18. spring 整合c3p0连接池
  19. HTML标签检查工具
  20. [实操]ITunes打开时报错.“如果您在 iTunes 中看到 -45054 或 -42110 错误“

热门文章

  1. Centos查看用户登陆记录
  2. 特殊表情存数据库处理
  3. h5能调取摄像头吗_高质感的国产中型车,实力能比肩本田雅阁吗?带你看红旗H5...
  4. oracle连接数增加无法释放,Oracle连接数过多释放机制
  5. 关于计算机软件技术基础知识,计算机软件技术基础知识要点.doc
  6. phpdesigner8 php7.0,大家千万别用PHPDesigner8 的项目替换,多说是泪,改整个站点中!
  7. C++ 使用模板Template
  8. 第一部分 Calendar介绍
  9. 新基建系列之:中国城市人工智能发展指数报告
  10. linux挂载磁盘出问题,centos7 - 出现挂载磁盘,挂载出现问题