[Case Study] Analysis Notes: A Hung Oracle CSSD Process Causes a RAC Node Reboot

Date: 2016-11-04 19:20   Source: Oracle Research Center (Oracle研究中心)   Author: HTZ

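Oracle Research Center case analysis: an operations DBA reported that an Oracle RAC database node had rebooted unexpectedly. Analysis of the logs on both RAC nodes showed that the reboot was caused by a hung CSSD process.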

The test below simulates a hung OCSSD process on one host and shows how the hang leads to a host reboot.

1. Environment

[root@cisser2 ~]# crsctl query crs activeversion

CRS active version on the cluster is [10.2.0.5.0]

[root@cisser2 ~]# lsb_release -a

LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch

Distributor ID: RedHatEnterpriseServer

Description: Red Hat Enterprise Linux Server release 5.11 (Tikanga)

Release: 5.11

Codename: Tikanga

2. Manually suspending the OCSSD process

[root@cisser1 ~]# ps -ef|grep d.bin

oracle 4666 4665 0 11:23 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/evmd.bin

root 4746 4063 0 11:23 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/crsd.bin reboot

root 5258 4874 0 11:23 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/oprocd.bin run -t 1000 -m 500 -f

oracle 5348 4903 0 11:23 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/ocssd.bin

root 14621 13718 0 11:44 pts/1 00:00:00 grep d.bin

[root@cisser1 ~]# kill -19 5348
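
On Linux x86_64, signal 19 is SIGSTOP, so `kill -19` suspends ocssd.bin without terminating it; the process stays in the process table but stops responding. The sketch below is only an illustration of that behavior on a harmless child process, not part of the original test. `suspend_and_check` is a hypothetical helper name, and the child is a throwaway Python sleeper rather than any Oracle process.

```python
import os
import signal
import subprocess
import sys


def suspend_and_check(pid: int) -> bool:
    """Send SIGSTOP to a child process and report whether it is now stopped."""
    os.kill(pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid return when the child stops, not only when it exits.
    _, status = os.waitpid(pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


# Throwaway child that just sleeps; it stands in for the suspended daemon.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
try:
    stopped = suspend_and_check(child.pid)
    print("child stopped:", stopped)
finally:
    child.kill()  # SIGKILL is still delivered to a stopped process
    child.wait()
```

A stopped process can be resumed with SIGCONT (`kill -18` on the same platform), although in this test the node reboots before that becomes relevant.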

About 30 seconds later, log entries like the following begin to appear.

3. Host messages log

Mar 29 11:45:23 cisser1 logger: Oracle clsomon failed with fatal status 13.

The status reported here is 13. Different status codes point to different failure causes; the common codes are listed below.

/* 10-39 are reserved for various kinds of steady state errors

* i.e. anything that comes after the group registration.

*/

clssomonretMEM = 11, /* memory allocation failure */

clssomonretCSS = 12, /* misc error in CSS layer */

clssomonretFATAL = 13, /* failure in CSS layer that should cause a reboot*/

clssomonretOCR = 14, /* misc error in OCR layer */

clssomonretOSD = 15, /* error in OSD layer used by generic code*/

/* 40-69 are reserved for various kinds of initialization errors

* i.e. anything that comes before the group registration.

*/

clssomonretCRSHOME = 40, /* CRS home is unavailable. */

clssomonretHOSTNAME = 42, /* unable to fetch hostname */

clssomonretSTDERR = 43, /* failure redirecting stderr */

clssomonretSTDOUT = 44, /* failure redirecting stdout */

clssomonretCHDIR = 45, /* failure redirecting corefile */

clssomonretARGS = 50, /* error processing arguments */

clssomonretCSSINIT = 51, /* failure initializing CSS-objects/APIs */

clssomonretOCRINIT = 52, /* failure initializing OCR-objects/APIs */

clssomonretOSDINIT = 53, /* error in OSD layer used by generic code*/

clssomonretMEMINIT = 54, /* unable to allocate memory during init */

clssomonretREINIT = 55, /* exceeded the CSS context reinit limit */

clssomonretINUSE = 56, /* duplicate oclsomon found */
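
When scanning /var/log/messages for this kind of failure, it can help to translate the numeric status automatically. The following is a minimal sketch, assuming the message format shown in section 3; the table is copied from the listing above, and `explain_status` is a hypothetical helper name, not an Oracle utility.

```python
import re
from typing import Optional

# Codes, names, and comments copied from the listing above.
CLSSOMON_STATUS = {
    11: ("clssomonretMEM", "memory allocation failure"),
    12: ("clssomonretCSS", "misc error in CSS layer"),
    13: ("clssomonretFATAL", "failure in CSS layer that should cause a reboot"),
    14: ("clssomonretOCR", "misc error in OCR layer"),
    15: ("clssomonretOSD", "error in OSD layer used by generic code"),
    40: ("clssomonretCRSHOME", "CRS home is unavailable"),
    42: ("clssomonretHOSTNAME", "unable to fetch hostname"),
    43: ("clssomonretSTDERR", "failure redirecting stderr"),
    44: ("clssomonretSTDOUT", "failure redirecting stdout"),
    45: ("clssomonretCHDIR", "failure redirecting corefile"),
    50: ("clssomonretARGS", "error processing arguments"),
    51: ("clssomonretCSSINIT", "failure initializing CSS-objects/APIs"),
    52: ("clssomonretOCRINIT", "failure initializing OCR-objects/APIs"),
    53: ("clssomonretOSDINIT", "error in OSD layer used by generic code"),
    54: ("clssomonretMEMINIT", "unable to allocate memory during init"),
    55: ("clssomonretREINIT", "exceeded the CSS context reinit limit"),
    56: ("clssomonretINUSE", "duplicate oclsomon found"),
}

STATUS_PATTERN = re.compile(r"clsomon failed with fatal status (\d+)")


def explain_status(syslog_line: str) -> Optional[str]:
    """Return a readable description of the status in a syslog line, or None."""
    match = STATUS_PATTERN.search(syslog_line)
    if match is None:
        return None
    code = int(match.group(1))
    name, meaning = CLSSOMON_STATUS.get(code, ("unknown", "not in the listing"))
    return f"status {code}: {name} ({meaning})"


line = "Mar 29 11:45:23 cisser1 logger: Oracle clsomon failed with fatal status 13."
print(explain_status(line))
```

For the line from this test, the lookup resolves status 13 to clssomonretFATAL, the code that is expected to cause a reboot.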

4. ocssd and oclsomon logs

Because the ocssd process has been suspended, it writes nothing to its own log.

The oclsomon log contains the following:

[root@cisser1 ~]# tail -f /oracle/app/oracle/product/10.2.0/crs_1/log/cisser1/cssd/oclsomon/oclsomon.log

2015-03-29 11:23:05.781: clsc_connect: (0x8467d70) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_cisser1_))

2015-03-29 11:23:08.435: clsc_connect: (0x8466450) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_cisser1_))

2015-03-29 11:23:15.684 clssomon: end of cssinit, status 0

2015-03-29 11:23:15.685 Reconfig event. (1/1/1)

2015-03-29 11:23:16.186 Reconfig event. (2/2/1)

2015-03-29 11:45:23.250 clssomon: Timeout waiting for CSS response.

5. Logs on node 2

5.1 ocssd log

[ CSSD]2015-03-29 11:44:55.852 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 50% heartbeat fatal, eviction in 29.810 seconds seedhbimpd 0

[ CSSD]2015-03-29 11:44:55.852 [633092416] >TRACE: clssnmPollingThread: node cisser1 (1) is impending reconfig, flag 1039, misstime 30190

[ CSSD]2015-03-29 11:44:55.852 [633092416] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[ CSSD]2015-03-29 11:44:56.854 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 50% heartbeat fatal, eviction in 28.810 seconds seedhbimpd 1

[ CSSD]2015-03-29 11:44:58.709 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes

[ CSSD]2015-03-29 11:44:58.709 [643582272] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes

[ CSSD]2015-03-29 11:45:02.716 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes

[ CSSD]2015-03-29 11:45:02.716 [643582272] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes

[ CSSD]2015-03-29 11:45:07.725 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes

[ CSSD]2015-03-29 11:45:07.725 [643582272] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes

[ CSSD]2015-03-29 11:45:10.855 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 75% heartbeat fatal, eviction in 14.810 seconds seedhbimpd 1

[ CSSD]2015-03-29 11:45:11.857 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 75% heartbeat fatal, eviction in 13.810 seconds seedhbimpd 1

[ CSSD]2015-03-29 11:45:12.733 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes

[ CSSD]2015-03-29 11:45:12.733 [643582272] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes

[ CSSD]2015-03-29 11:45:16.738 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes

[ CSSD]2015-03-29 11:45:16.738 [643582272] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes

[ CSSD]2015-03-29 11:45:19.858 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 5.810 seconds seedhbimpd 1

[ CSSD]2015-03-29 11:45:20.744 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes

[ CSSD]2015-03-29 11:45:20.744 [643582272] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes

[ CSSD]2015-03-29 11:45:20.859 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 4.810 seconds seedhbimpd 1

[ CSSD]2015-03-29 11:45:21.860 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 3.810 seconds seedhbimpd 1

[ CSSD]2015-03-29 11:45:22.673 [579823936] >TRACE: clssgmAllocateRPCIndex: allocated rpc 326 (0x2b7b1f5a1310)

[ CSSD]2015-03-29 11:45:22.673 [579823936] >TRACE: clssgmRPC: rpc 0x2b7b1f5a1310 (RPC#326) tag(146002a) sent to node 1

[ CSSD]2015-03-29 11:45:22.861 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 2.810 seconds seedhbimpd 1

[ CSSD]2015-03-29 11:45:23.863 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 1.800 seconds seedhbimpd 1

[ CSSD]2015-03-29 11:45:24.855 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 0.810 seconds seedhbimpd 1

[ CSSD]2015-03-29 11:45:25.667 [633092416] >TRACE: clssnmPollingThread: Eviction started for node cisser1 (1), flags 0x040f, state 3, wt4c 0 seedhbimpd 1
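
The countdown in the clssnmPollingThread warnings can be cross-checked with simple arithmetic. The sketch below is an illustration rather than part of the original analysis. `heartbeat_window` is a hypothetical helper name, and the code assumes that `misstime` is reported in milliseconds, which is consistent with the 50% warning. Adding the missed time to the remaining countdown gives the implied heartbeat timeout; subtracting the missed time from the log timestamp approximates when node 2 last heard from node 1.

```python
import re
from datetime import datetime, timedelta

# The first WARNING and the TRACE line logged at the same instant on node 2.
WARNING_LINE = (
    "[ CSSD]2015-03-29 11:44:55.852 [633092416] >WARNING: clssnmPollingThread: "
    "node cisser1 (1) at 50% heartbeat fatal, eviction in 29.810 seconds seedhbimpd 0"
)
TRACE_LINE = (
    "[ CSSD]2015-03-29 11:44:55.852 [633092416] >TRACE: clssnmPollingThread: "
    "node cisser1 (1) is impending reconfig, flag 1039, misstime 30190"
)

TIMESTAMP = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"


def heartbeat_window(warning: str, trace: str):
    """Return (implied timeout in ms, last heartbeat time, predicted eviction time)."""
    found = re.search(TIMESTAMP + r".*eviction in (\d+\.\d+) seconds", warning)
    logged_at = datetime.strptime(found.group(1), "%Y-%m-%d %H:%M:%S.%f")
    # Work in whole milliseconds to avoid floating-point drift.
    remaining_ms = round(float(found.group(2)) * 1000)
    # Assumption: misstime is expressed in milliseconds.
    missed_ms = int(re.search(r"misstime (\d+)", trace).group(1))
    timeout_ms = missed_ms + remaining_ms
    last_heartbeat = logged_at - timedelta(milliseconds=missed_ms)
    predicted_eviction = logged_at + timedelta(milliseconds=remaining_ms)
    return timeout_ms, last_heartbeat, predicted_eviction


timeout_ms, last_heartbeat, predicted_eviction = heartbeat_window(WARNING_LINE, TRACE_LINE)
print("implied heartbeat timeout (ms):", timeout_ms)
print("last heartbeat seen at about:", last_heartbeat.time())
print("predicted eviction time:", predicted_eviction.time())
```

With these two lines the implied timeout is 60.000 seconds, the last heartbeat falls at about 11:44:25.662 (shortly after the ps output taken at 11:44), and the predicted eviction time of 11:45:25.662 is within a few milliseconds of the logged 11:45:25.667. The source does not state the configured misscount value, so the 60 seconds is only what the log implies.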

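Node 2 did not start evicting node 1 until 11:45:25, but node 1 had already begun rebooting at 11:45:23. The reboot was therefore triggered by the oclsomon process on node 1, not by a node eviction.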
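
The ordering of the two events can be checked directly from the timestamps. The following is a minimal sketch, assuming the clocks on both nodes are synchronized (the source does not say whether they are); `log_time` and `lead_ms` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Last line of oclsomon.log on node 1 and the eviction line from ocssd.log on node 2.
OCLSOMON_LINE = "2015-03-29 11:45:23.250 clssomon: Timeout waiting for CSS response."
EVICTION_LINE = (
    "[ CSSD]2015-03-29 11:45:25.667 [633092416] >TRACE: clssnmPollingThread: "
    "Eviction started for node cisser1 (1), flags 0x040f, state 3, wt4c 0 seedhbimpd 1"
)

TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")


def log_time(line: str) -> datetime:
    """Extract the first 'YYYY-MM-DD HH:MM:SS.mmm' timestamp from a log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError("no timestamp found in line")
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S.%f")


def lead_ms(first: str, second: str) -> int:
    """Milliseconds by which the event in `first` precedes the event in `second`."""
    return (log_time(second) - log_time(first)) // timedelta(milliseconds=1)


gap = lead_ms(OCLSOMON_LINE, EVICTION_LINE)
print(f"oclsomon timed out {gap} ms before node 2 started the eviction")
```

A positive result means the local oclsomon timeout came first, which is what supports attributing the reboot to oclsomon rather than to the eviction.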

Permalink: http://www.htz.pw/2015/03/31/%e6%a8%a1%e6%8b%9fcssd%e8%bf%9b%e7%a8%8bhang%e5%af%bc%e8%87%b4rac%e8%8a%82%e7%82%b9%e9%87%8d%e5%90%af.html | 认真就输
