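Oracle Cluster Management: a case of abnormal crsd startup, with the crsd resource OFFLINE
1 Environment
Database version: 11.2.0.4, RAC environment.
Operating system: CentOS 7.
2 Symptoms
Today one node of the database cluster was rebooted. After the reboot, only three agents (oraagent, orarootagent and cssdagent) were running on that node: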
[grid@rac02 admin]$ ps -ef|grep agent
patrol 10723 10552 0 09:56 ? 00:00:00 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "env GNOME_SHELL_SESSION_MODE=classic gnome-session --session gnome-classic"
grid 16067 1 0 10:02 ? 00:00:00 /u01/11.2.0/bin/oraagent.bin
root 16099 1 0 10:02 ? 00:00:02 /u01/11.2.0/bin/orarootagent.bin
root 16145 1 0 10:02 ? 00:00:00 /u01/11.2.0/bin/cssdagent
grid 19423 16434 0 10:07 pts/2 00:00:00 grep --color=auto agent
The resource status was as follows:
[grid@rac02 admin]$ crsctl status res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE rac02 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE rac02
ora.crf
1 ONLINE ONLINE rac02
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE rac02
ora.cssdmonitor
1 ONLINE ONLINE rac02
ora.ctssd
1 ONLINE ONLINE rac02 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE rac02
ora.gipcd
1 ONLINE ONLINE rac02
ora.gpnpd
1 ONLINE ONLINE rac02
ora.mdnsd
1 ONLINE ONLINE rac02
[grid@rac02 admin]$
The crsd resource was OFFLINE even though its target was ONLINE.
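On a node with many init resources, the mismatch between TARGET and STATE is easy to miss by eye. The following is a minimal sketch, not part of the original investigation, that filters the `crsctl status res -t -init` output for resources whose TARGET is ONLINE but whose STATE is OFFLINE. The function name `find_failed_init_resources` is hypothetical, and the sketch assumes the tabular layout shown above (a resource name line followed by an instance line).

```shell
# Print init resources whose TARGET is ONLINE but whose STATE is OFFLINE.
# Reads the output of `crsctl status res -t -init` on stdin.
# Assumes the layout shown above: a line with the resource name (ora.xxx),
# followed by an instance line "<n> TARGET STATE [SERVER] [DETAILS]".
find_failed_init_resources() {
  awk '
    NF == 1 && $1 ~ /^ora\./ { name = $1; next }   # remember the resource name
    $2 == "ONLINE" && $3 == "OFFLINE" { print name } # wanted up, but is down
  '
}
```

It could be used as `crsctl status res -t -init | find_failed_init_resources`. Resources such as ora.diskmon, whose TARGET is also OFFLINE, are deliberately not reported.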
3 Log analysis
The alert log showed the following:
[crsd(19237)]CRS-0813:Cluster Ready Service aborted due to failure to initialize the network layer with error [clsclisten failed with ret 3
(File: caa_Socket.cpp, line: 525
]. Details at (:CRSD00133:) in /u01/11.2.0/log/rac02/crsd/crsd.log.
2021-03-10 10:07:17.115:
[ohasd(15918)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac02'.
2021-03-10 10:07:17.116:
[ohasd(15918)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
crsd.log showed the following:
[ OCRMAS][1132443392]th_master: Received group public data event. Incarnation [1]
2021-03-10 10:07:16.527: [ OCRMAS][1132443392]th_master:1': Recvd pubdata event from node [2]
2021-03-10 10:07:16.527: [ OCRMAS][1132443392]th_master:2': Recvd pubdata event for self. Do nothing.
2021-03-10 10:07:16.533: [ CRSMAIN][1468389184] Running path init...
2021-03-10 10:07:16.539: [ CLSE][1468389184]clse_get_auth_loc: Returning default authloc: /u01/11.2.0/auth/crs/rac02
2021-03-10 10:07:16.539: [ CRSMAIN][1468389184] Using Authorizer location: /u01/11.2.0/auth/crs/rac02
2021-03-10 10:07:16.539: [ CRSMAIN][1468389184] Initialing cluclu context...
2021-03-10 10:07:16.551: [ CLSCLU][1468389184]clsclu_init: rc 0
2021-03-10 10:07:16.551: [ CRSMAIN][1468389184] Getting CR Root...
2021-03-10 10:07:16.555: [ CRSMAIN][1468389184] Initializing RTI
2021-03-10 10:07:16.555: [ CRSMAIN][1468389184] Initializing staging area
2021-03-10 10:07:16.571: [ CLSE][1468389184]clse_get_auth_loc: Returning default authloc: /u01/11.2.0/auth/crs/rac02
2021-03-10 10:07:16.571: [ default][1468389184] AuthLoc /u01/11.2.0/auth/crs/rac02
2021-03-10 10:07:16.571: [ default][1468389184] PE active version: 11.2.0.4.0
2021-03-10 10:07:16.571: [ default][1468389184] PE Engine: NEW
2021-03-10 10:07:16.571: [ default][1468389184] Using OCR batch ops : ENABLED
2021-03-10 10:07:16.571: [ CRSMAIN][1468389184] Creating RTI lock info...
2021-03-10 10:07:16.571: [ CRSMAIN][1468389184] Initializing EVMMgr
2021-03-10 10:07:16.576: [ CRSMAIN][1468389184] Getting local nodename...
[ CLWAL][1468389184]clsw_Initialize: OLR initlevel [70000]
2021-03-10 10:07:16.617: [ OCRSRV][1126139648]th_upgrade: Starting upgrade calculation
2021-03-10 10:07:16.630: [ OCRSRV][1126139648]th_upgrade:10.1 AV [186647552]. State [11]. Already upgraded.Updated global data to the crs version group. Return [0]
2021-03-10 10:07:16.835: [ COMMCRS][1096722176]clsclisten: Error listening on: (ADDRESS=(PROTOCOL=tcp)(HOST=10.2.0.76)(PORT=0))
2021-03-10 10:07:16.835: [ COMMCRS][1096722176]clsclisten: op 65 failed, NSerr (12560, 0), transport: (584, 0, 0)
2021-03-10 10:07:16.836: [ CRSD][1468389184] Created alert : (:CRSD00133:) : Unable to get E2E port, error: IOException : clsclisten failed with ret 3
(File: caa_Socket.cpp, line: 525
2021-03-10 10:07:16.836: [ CRSD][1468389184][PANIC] CRSD exiting: Unable to get E2E port after 2nd attempt
2021-03-10 10:07:16.836: [ CRSD][1468389184] Done.
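The key line in this excerpt is the `clsclisten: Error listening on` entry, which names the address crsd tried to bind. As a rough sketch (the function name `failed_listen_hosts` is hypothetical), the host part of such entries can be pulled out of crsd.log so it can be compared with the addresses actually configured on the node:

```shell
# Print the distinct HOST values from "clsclisten: Error listening on" entries.
# Reads crsd.log content on stdin.
failed_listen_hosts() {
  grep 'clsclisten: Error listening on' |
    sed -n 's/.*(HOST=\([^)]*\)).*/\1/p' |   # keep only the text inside (HOST=...)
    sort -u
}
```

For example: `failed_listen_hosts < /u01/11.2.0/log/rac02/crsd/crsd.log`. In this case the result is 10.2.0.76, the private interconnect address, so the next step is to check whether that address is present on an interface.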
The network interface information was as follows:
ens36: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.2.151.86 netmask 255.255.255.224 broadcast 10.228.151.95
inet6 fe80::250:56ff:fe8d:5908 prefixlen 64 scopeid 0x20<link>
ether 00:50:56:8d:59:08 txqueuelen 1000 (Ethernet)
RX packets 7289 bytes 646307 (631.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10140 bytes 6909723 (6.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens37: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.2.0.76 netmask 255.255.255.0 broadcast 10.2.0.255
inet6 fe80::250:56ff:fe8d:13fa prefixlen 64 scopeid 0x20<link>
ether 00:50:56:8d:13:fa txqueuelen 1000 (Ethernet)
RX packets 271 bytes 35397 (34.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 183 bytes 29338 (28.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens37:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 169.254.220.193 netmask 255.255.0.0 broadcast 169.254.255.255
ether 00:50:56:8d:13:fa txqueuelen 1000 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 4524 bytes 7037484 (6.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4524 bytes 7037484 (6.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:8d:96:71 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0-nic: flags=4098<BROADCAST,MULTICAST> mtu 1500
ether 52:54:00:8d:96:71 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[grid@rac02 rac02]$
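HAIP had started normally: the link-local address 169.254.220.193 is present on ens37:1, and the private address 10.2.0.76 named in the crsd.log error is configured on ens37.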
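The HAIP check above amounts to looking for a 169.254.x.x address in the interface listing. A small sketch of that check follows; the function name `haip_addresses` is hypothetical, and it assumes the CentOS 7 `ifconfig` format shown above, where the address follows the word `inet` as a separate field.

```shell
# Print every link-local (169.254.x.x) IPv4 address found in ifconfig output.
# Reads `ifconfig` output on stdin; assumes the "inet <addr> netmask ..." format.
haip_addresses() {
  awk '$1 == "inet" && $2 ~ /^169\.254\./ { print $2 }'
}
```

Running `ifconfig | haip_addresses` on this node would be expected to print the address seen on ens37:1. An empty result would suggest that HAIP has not come up.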
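4 Resolution
It later turned out that the cause was a misconfigured sqlnet.ora file under GRID_HOME, which also prevented the SCAN listener and the regular listener from starting. The file was removed: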
[grid@rac02 admin]$ rm sqlnet.ora
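Deleting the file outright loses the faulty configuration, which makes later root-cause analysis harder. As a more cautious alternative (an assumption of this write-up, not what was done in the original case), the file can be renamed aside with a timestamp. The function name `backup_aside` is hypothetical.

```shell
# Rename a file to <name>.bak.<timestamp> instead of deleting it,
# and print the new name. Returns non-zero if the file does not exist.
backup_aside() {
  file=$1
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  backup="$file.bak.$(date +%Y%m%d%H%M%S)"
  mv -- "$file" "$backup" && echo "$backup"
}
```

For example, `backup_aside sqlnet.ora` run from the same admin directory would have the same effect on crsd as the `rm` above, while keeping the old contents available for inspection.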
Start the resource:
[grid@rac02 admin]$ crsctl start resource "ora.crsd" -init
CRS-2672: Attempting to start 'ora.crsd' on 'rac02'
CRS-2676: Start of 'ora.crsd' on 'rac02' succeeded
[grid@rac02 admin]$ ps -ef|grep tns
root 19 2 0 09:55 ? 00:00:00 [netns]
grid 21423 20603 0 10:12 pts/2 00:00:00 grep --color=auto tns
[grid@rac02 admin]$ ps -ef|grep tns
root 19 2 0 09:55 ? 00:00:00 [netns]
grid 21493 20603 0 10:12 pts/2 00:00:00 grep --color=auto tns
[grid@rac02 admin]$ ps -ef|grep tns
root 19 2 0 09:55 ? 00:00:00 [netns]
grid 21506 20603 0 10:12 pts/2 00:00:00 grep --color=auto tns
[grid@rac02 admin]$ ps -ef|grep tns
root 19 2 0 09:55 ? 00:00:00 [netns]
grid 21513 1 2 10:12 ? 00:00:00 /u01/11.2.0/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 21525 1 0 10:12 ? 00:00:00 /u01/11.2.0/bin/tnslsnr LISTENER -inherit
grid 21546 20603 0 10:12 pts/2 00:00:00 grep --color=auto tns
[grid@rac02 admin]$
The resources started normally: after a few checks, both LISTENER_SCAN1 and LISTENER were running.
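The repeated `ps -ef|grep tns` calls above also match the `[netns]` kernel thread and the grep process itself. A slightly more precise sketch, with the hypothetical function name `running_listeners`, prints only the names of running `tnslsnr` processes from `ps -ef` output:

```shell
# Print the listener name of each running tnslsnr process.
# Reads `ps -ef` output on stdin; the name is the argument after the binary path.
running_listeners() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i ~ /\/tnslsnr$/) print $(i + 1)   # field after .../bin/tnslsnr
  }'
}
```

Used as `ps -ef | running_listeners`, it prints nothing until the listeners are up, so it can be rerun (or wrapped in a loop) while waiting for crsd to start them.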