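Project scenario:

An Oracle RAC 11.2.0.4 database deployed on cloud resources; both nodes restart at irregular intervals.

Problem description:

On-site staff reported that both database nodes kept restarting. The CRS logs showed no major errors, but the ASM alert log contained the following: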

Fri Sep 09 10:32:50 2022
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
Fri Sep 09 10:33:13 2022
NOTE: client exited [2319]
Fri Sep 09 10:33:13 2022
NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB was killed.
Fri Sep 09 10:33:13 2022
PMON (ospid: 2262): terminating the instance due to error 481
Fri Sep 09 10:33:14 2022
ORA-1092 : opitsk aborting process
Fri Sep 09 10:33:14 2022
License high water mark = 19
Instance terminated by PMON, pid = 2262
USER (ospid: 8682): terminating the instance
Instance terminated by USER, pid = 8682
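
To see how often these warnings occur and which disk groups they hit, the alert log can be scanned with a short script. The following is a minimal Python sketch, not an Oracle tool: the function names `find_pst_delays` and `count_by_group` are made up for illustration, and the regular expression assumes the warning keeps exactly the wording shown in the excerpt above.

```python
import re
from collections import Counter

# Matches the ASM alert-log warning quoted above (wording assumed unchanged).
PST_WARNING = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)"
)


def find_pst_delays(lines):
    """Return a (seconds, disk, group) tuple for every PST write-delay warning."""
    hits = []
    for line in lines:
        match = PST_WARNING.search(line)
        if match:
            hits.append(tuple(int(value) for value in match.groups()))
    return hits


def count_by_group(hits):
    """Count how many warnings were logged per disk group number."""
    return Counter(group for _, _, group in hits)


# Sample input copied from the alert-log excerpt above.
sample = [
    "Fri Sep 09 10:32:50 2022",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.",
    "NOTE: client exited [2319]",
]

if __name__ == "__main__":
    delays = find_pst_delays(sample)
    for group, count in sorted(count_by_group(delays).items()):
        print(f"group {group}: {count} PST write-delay warnings")
```

In practice the `sample` list would be replaced by the lines of the real ASM alert log, opened with `open(path)`; the path depends on the installation and is not assumed here.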

Root cause analysis:

Oracle's official support documentation describes this issue:
ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1)
Generally this kind messages comes in ASM alertlog file on below situations,
Delayed ASM PST heart beats on ASM disks in normal or high redundancy diskgroup,causes the affected disks to go offline.By default, it is 15 seconds.
Diskgroup will get dismounted if ASM cannot issue the PST heart beat to majority of the PST copies in a diskgroup with respect to redundancy.
i.e. Normal redundancy diskgroup will get dismounted if it failed to update two of the copies.
By the way the heart beat delays are sort of ignored for external redundancy diskgroup.
ASM instance stop issuing more PST heart beat until it succeeds PST revalidation,but the heart beat delays do not dismount external redundancy diskgroup directly.
The ASM disk could go into unresponsiveness, normally in the following scenarios:

Some of the paths of the physical paths of the multipath device are offline or lost

During path ‘failover’ in a multipath set up

Server load, or any sort of storage/multipath/OS maintenance
The Doc ID 10109915.8 briefs about Bug 10109915(this fix introduce this underscore parameter). And the issue is with no OS/Storage tunable timeout mechanism in a case of a Hung NFS Server/Filer. And then _asm_hbeatiowait helps in setting the time out.

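The description above can be summarized as follows:

The ASM instance periodically checks the disks of every disk group (the PST heartbeat) to confirm that they are still responding;
This check takes disks offline only in normal and high redundancy disk groups; for external redundancy the heartbeat delay is largely ignored and does not dismount the disk group directly;
The default timeout is 15 seconds: a disk that has not answered the ASM instance within 15 seconds is taken offline, and the disk group is dismounted once the heartbeat can no longer reach a majority of the PST copies;
This deployment uses a shared cloud disk, and only a single one, so an I/O stall on that disk takes down the ASM disk group and the database with it.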
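
The majority rule quoted from the Oracle note can be illustrated with a small Python sketch. It is only a model of the rule as worded above, not ASM's actual logic: the function name `stays_mounted` is hypothetical, and the copy count of 3 for a normal redundancy disk group is an assumption chosen to match the note's statement that losing two copies causes a dismount. External redundancy is deliberately left out, since the note says heartbeat delays do not dismount such disk groups directly.

```python
def stays_mounted(pst_copies, failed_copies):
    """Model of the quoted rule: the disk group stays mounted only while
    the PST heartbeat still reaches a strict majority of the PST copies."""
    if not 0 <= failed_copies <= pst_copies:
        raise ValueError("failed_copies must be between 0 and pst_copies")
    reachable = pst_copies - failed_copies
    return reachable > pst_copies // 2


# Assumption for illustration: a normal redundancy disk group with 3 PST copies.
NORMAL_REDUNDANCY_COPIES = 3

if __name__ == "__main__":
    for failed in range(NORMAL_REDUNDANCY_COPIES + 1):
        state = "mounted" if stays_mounted(NORMAL_REDUNDANCY_COPIES, failed) else "dismounted"
        print(f"{failed} of {NORMAL_REDUNDANCY_COPIES} PST copies unreachable: {state}")
```

Under this model a single slow disk is tolerated, while two unreachable copies out of three are not, which matches the note's wording for normal redundancy.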

Solution:

Following Oracle's recommendation, raise _asm_hbeatiowait to 120 seconds.

-- Check the current _asm_hbeatiowait value (connected to the ASM instance)
SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hbeat%' order by ksppinm;
hidden parameter          value
_asm_hbeatiowait          15
_asm_hbeatwaitquantum     2

-- Set _asm_hbeatiowait to 120 seconds
SQL> alter system set "_asm_hbeatiowait"=120 scope=spfile;

-- Restart CRS and the database so that the new value takes effect

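After the change, the system was monitored and no further errors appeared.

Recommendation:
Installing Oracle RAC in a virtualized environment is not recommended.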

