又有一段时间没写有关oracle的文章了,恐怕这也是不能成为高手的原因之一,偶有典型性拖延症和懒癌。
今天要讲的一个栗子又一个现场case,中午吃饭的时候看到来自同事的未接,处于职业敏感感觉是数据中心出问题了,拨过去第一通电话告诉我存储坏掉一块盘,然后NBU有个备份作业没完成,感觉问题不大通知厂商维修便可以了。刚吃完饭,第二通电话进来了,这次告知我又一个双活控制器存储报警,同时说数据库RAC集群看不到状态了。心中暗暗祈祷不要被近期“比特币勒索”事件波及,上周二刚刚通知各位厂家技术人员排查过,可别害我啊,这玩意儿我真修复不了。
原本打算中午休息,忍了赶快处理吧,下午还有3节c#的实验课呢,周末不太平啊。赶到现场,如我猜想的一样,同事赶着中午在巡检,询问最近的巡检情况,同事告知我周六没有做巡检,周一到周五都做了,没发现问题。打开终端连到db server上,果然是有问题:

[12:59:29][root: ~]#ckrac   自己写个一个小脚本
[12:59:31]CRS-4535: Cannot communicate with Cluster Ready Services
[12:59:31]CRS-4000: Command Status failed, or completed with errors.

首先第一印象判断:集群垮了
马上检查数据库

su - oracle
sqlplus / as sysdba
select * from v$instance;

还好,实例还活着,数据算是基本上不会丢了。
尝试启动集群

13:07:48][root: /u01/app/11.2.0/grid/bin]#./crsctl start cluster
[13:07:48]CRS-2672: Attempting to start 'ora.crsd' on 'rac01'
[13:07:49]CRS-2676: Start of 'ora.crsd' on 'rac01' succeeded

接下就是检查日志了,先检查节点实例的日志,没看出问题

[13:13:33][root: /u01/app/11.2.0/grid/bin]# cd /u01/app/oracle/diag/rdbms/gbf1/GBF11/trace
[13:13:52][root: /u01/app/oracle/diag/rdbms/gbf1/GBF11/trace]#tail -500  alert_GBF11.log |more

有点懵了,确定是集群的问题,这会是什么问题呢?
查看集群日志:没做功课和笔记下场,只有find一下了

[13:19:33][root: ~]#find /u01 -name 'crsd.log'
[13:19:45]/u01/app/11.2.0/grid/log/rac01/crsd/crsd.log[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: COMMCRS  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: COMMNS  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: CSSCLNT  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCLIB  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCXBAD  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCLXPT  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCUNDE  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPC  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCGEN  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCTRAC  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCWAIT  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCXCPT  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCOSD  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCBASE  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCCLSA  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCCLSC  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCEXMP  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCGMOD  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCHEAD  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCMUX  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCNET  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCNULL  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCPKT  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCSMEM  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHAUP  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHALO  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHTHR  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHGEN  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHLCK  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHDEM  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHWRK  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSMAIN  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: clsdmt  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: clsdms  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSUI  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSCOMM  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSRTI  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSPLACE  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSAPP  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSRES  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSTIMER  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSEVT  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSD  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CLUCLS  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CLSVER  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CLSFRAME  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSPE  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSSE  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSRPT  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSOCR  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: UiServer  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: AGFW  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: SuiteTes  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSSHARE  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSSEC  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSCCL  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSCEVT  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: AGENT  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRAPI  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRCLI  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRSRV  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRMAS  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRMSG  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRCAC  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRRAW  1
[13:20:42]2016-11-20 13:23:50.305: [    CRSD][3134007024] Logging level for Module: OCRUTL  1
[13:20:42]2016-11-20 13:23:50.305: [    CRSD][3134007024] Logging level for Module: OCROSD  1
[13:20:42]2016-11-20 13:23:50.305: [    CRSD][3134007024] Logging level for Module: OCRASM  1
[13:20:42]2016-11-20 13:23:50.305: [ CRSMAIN][3134007024] Checking the OCR device
[13:20:42]2016-11-20 13:23:50.305: [ CRSMAIN][3134007024] Sync-up with OCR
[13:20:42]2016-11-20 13:23:50.305: [ CRSMAIN][3134007024] Connecting to the CSS Daemon
[13:20:42]2016-11-20 13:23:50.305: [ CRSMAIN][3134007024] Getting local node number
[13:20:42]2016-11-20 13:23:50.306: [ CRSMAIN][3134007024] Initializing OCR
[13:20:42]2016-11-20 13:23:50.307: [ CRSMAIN][3127560512] Policy Engine is not initialized yet!
[13:20:42][   CLWAL][3134007024]clsw_Initialize: OLR initlevel [70000]
[13:20:42]2016-11-20 13:23:50.622: [  OCRASM]
****[3134007024]proprasmo: Error in open/create file in dg [OCRVDISK]****
[13:20:42][  OCRASM][3134007024]SLOS : SLOS: cat=8, opn=kgfoOpen01, dep=15056, loc=kgfokge
[13:20:42]
[13:20:42]2016-11-20 13:23:50.623: [  OCRASM][3134007024]ASM Error Stack :
[13:20:42]2016-11-20 13:23:50.667: [  OCRASM][3134007024]proprasmo: kgfoCheckMount returned [6]
[13:20:42]2016-11-20 13:23:50.667: [  OCRASM][3134007024]proprasmo: The ASM disk group OCRVDISK is not found or not mounted
[13:20:42]2016-11-20 13:23:50.667: [  OCRRAW][3134007024]proprioo: Failed to open [+OCRVDISK]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
[13:20:42]2016-11-20 13:23:50.667: [  OCRRAW][3134007024]proprioo: No OCR/OLR devices are usable
[13:20:42]2016-11-20 13:23:50.667: [  OCRASM][3134007024]proprasmcl: asmhandle is NULL
[13:20:42]2016-11-20 13:23:50.668: [    GIPC][3134007024] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5343]
[13:20:42]2016-11-20 13:23:50.668: [ default][3134007024]clsvactversion:4: Retrieving Active Version from local storage.
[13:20:42]2016-11-20 13:23:50.670: [ CSSCLNT][3134007024]clssgsgrppubdata: group (ocr_rac-cluster01) not found
[13:20:42]
[13:20:42]2016-11-20 13:23:50.670: [  OCRRAW][3134007024]proprio_repairconf: Failed to retrieve the group public data. CSS ret code [20]
[13:20:42]2016-11-20 13:23:50.672: [  OCRRAW][3134007024]proprioo: Failed to auto repair the OCR configuration.
[13:20:42]2016-11-20 13:23:50.672: [  OCRRAW][3134007024]proprinit: Could not open raw device
[13:20:42]2016-11-20 13:23:50.672: [  OCRASM][3134007024]proprasmcl: asmhandle is NULL
[13:20:42]2016-11-20 13:23:50.675: [  OCRAPI][3134007024]a_init:16!: Backend init unsuccessful : [26]
[13:20:42]2016-11-20 13:23:50.675: [  CRSOCR][3134007024] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage
[13:20:42]
[13:20:42]2016-11-20 13:23:50.675: [    CRSD][3134007024] Created alert : (:CRSD00111:) :  Could not init OCR, error: PROC-26: Error while accessing the physical storage
[13:20:42]
[13:20:42]2016-11-20 13:23:50.675: [    CRSD][3134007024][PANIC] CRSD exiting: Could not init OCR, code: 26
[13:20:42]2016-11-20 13:23:50.675: [    CRSD][3134007024] Done.

可以看出[+OCRVDISK]. 这个盘有问题

su - grid
sqlplus / as sysasm
SQL> select GROUP_NUMBER,NAME,TYPE,ALLOCATION_UNIT_SIZE,STATE from v$asm_diskgroup;GROUP_NUMBER NAME  TYPE         ALLOCATION_UNIT_SIZE STATETYPE         ALLOCATION_UNIT_SIZE STATE
------------ ------------------------------------------------------------------------ -------------------- ----------------------1 BAK01  EXTERN                    4194304 MOUNTED2 DATA01 EXTERN                    4194304 MOUNTED3 GBF1_ARC EXTERN                    4194304 MOUNTED4 GBF2_ARC EXTERN                    4194304 MOUNTED0 OCRVDISK                               0 DISMOUNTED6 YWDB_ARC EXTERN                    4194304 MOUNTED
6 rows selected.
alter diskgroup OCRVDISK mount;

上面的日志进一步印证该问题,这个盘绝对有问题!
查看ASM日志

cd /u01/app/grid/diag/asm/+asm/+ASM1/trace
#tail -100 alert_+ASM1.log
[13:38:07]Thu Nov 17 19:13:56 2016
[13:38:07]WARNING: dirty detached from domain 5
[13:38:07]NOTE: cache dismounted group 5/0xAC0A8092 (OCRVDISK)
[13:38:07]SQL> alter diskgroup OCRVDISK dismount force /* ASM SERVER:2886369426 */   <-- 看这里
[13:38:07]Thu Nov 17 19:13:56 2016
[13:38:07]NOTE: cache deleting context for group OCRVDISK 5/0xac0a8092
[13:38:07]GMON dismounting group 5 at 21 for pid 28, osid 20749
[13:38:07]NOTE: Disk OCRVDISK_0000 in mode 0x7f marked for de-assignment
[13:38:07]NOTE: Disk OCRVDISK_0001 in mode 0x7f marked for de-assignment
[13:38:07]NOTE: Disk OCRVDISK_0002 in mode 0x7f marked for de-assignment
[13:38:07]SUCCESS: diskgroup OCRVDISK was dismounted
[13:38:07]SUCCESS: alter diskgroup OCRVDISK dismount force /* ASM SERVER:2886369426 */
[13:38:07]SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCRVDISK
[13:38:07]Thu Nov 17 19:13:56 2016
[13:38:07]NOTE: diskgroup resource ora.OCRVDISK.dg is offline
[13:38:07]ASM Health Checker found 1 new failures
[13:38:07]Thu Nov 17 19:13:56 2016
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Thu Nov 17 20:57:21 2016
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]WARNING: requested mirror side 1 of virtual extent 6 logical extent 0 offset 462848 is not allocated; I/O request failed
[13:38:07]WARNING: requested mirror side 2 of virtual extent 6 logical extent 1 offset 462848 is not allocated; I/O request failed
[13:38:07]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
<-- 通过此文件进一步追踪
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]ORA-15078: ASM diskgroup was forcibly dismounted
[13:38:07]Thu Nov 17 20:57:21 2016
[13:38:07]SQL> alter diskgroup OCRVDISK check /* proxy */
[13:38:07]ORA-15032: not all alterations performed
[13:38:07]ORA-15001: diskgroup "OCRVDISK" does not exist or is not mounted
[13:38:07]ERROR: alter diskgroup OCRVDISK check /* proxy */
[13:38:07]Thu Nov 17 20:57:37 2016
[13:38:07]NOTE: client exited [27134]
[13:38:07]Thu Nov 17 20:57:38 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1688] opening OCR file
[13:38:07]Thu Nov 17 20:57:40 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1707] opening OCR file
[13:38:07]Thu Nov 17 20:57:42 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1726] opening OCR file
[13:38:07]Thu Nov 17 20:57:44 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1749] opening OCR file
[13:38:07]Thu Nov 17 20:57:46 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1772] opening OCR file
[13:38:07]Thu Nov 17 20:57:48 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1791] opening OCR file
[13:38:07]Thu Nov 17 20:57:50 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1814] opening OCR file
[13:38:07]Thu Nov 17 20:57:52 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1833] opening OCR file
[13:38:07]Thu Nov 17 20:57:54 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1852] opening OCR file
[13:38:07]Thu Nov 17 20:57:57 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 1872] opening OCR file
[13:38:07]Sun Nov 20 13:23:30 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28546] opening OCR file
[13:38:07]Sun Nov 20 13:23:32 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28566] opening OCR file
[13:38:07]Sun Nov 20 13:23:34 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28585] opening OCR file
[13:38:07]Sun Nov 20 13:23:36 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28607] opening OCR file
[13:38:07]Sun Nov 20 13:23:38 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28627] opening OCR file
[13:38:07]Sun Nov 20 13:23:40 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28651] opening OCR file
[13:38:07]Sun Nov 20 13:23:42 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28678] opening OCR file
[13:38:07]Sun Nov 20 13:23:44 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28700] opening OCR file
[13:38:07]Sun Nov 20 13:23:46 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28723] opening OCR file
[13:38:07]Sun Nov 20 13:23:48 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28742] opening OCR file
[13:38:07]Sun Nov 20 13:23:50 2016
[13:38:07]NOTE: [crsd.bin@rac01 (TNS V1-V3) 28761] opening OCR file
[13:38:07]Sun Nov 20 13:25:14 2016
[13:38:07]NOTE: [ocrcheck.bin@rac01 (TNS V1-V3) 28960] opening OCR file
[13:38:07]Sun Nov 20 13:38:53 2016
[13:38:07]NOTE: No asm libraries found in the system
[13:38:07]MEMORY_TARGET defaulting to 1128267776.
[13:38:07]* instance_number obtained from CSS = 1, checking for the existence of node 0...
[13:38:07]* node 0 does not exist. instance_number = 1
[13:38:07]Starting ORACLE instance (normal)

进一步追踪ASM1_ora_27147.trc文件

#tail -500  +ASM1_ora_27147.trc
[13:39:54]Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc
[13:39:54]Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
[13:39:54]With the Real Application Clusters and Automatic Storage Management options
[13:39:54]ORACLE_HOME = /u01/app/11.2.0/grid
[13:39:54]System name:    Linux
[13:39:54]Node name:      rac01
[13:39:54]Release:        2.6.39-300.26.1.el5uek
[13:39:54]Version:        #1 SMP Thu Jan 3 18:31:38 PST 2013
[13:39:54]Machine:        x86_64
[13:39:54]VM name:        VMWare Version: 6
[13:39:54]Instance name: +ASM1
[13:39:54]Redo thread mounted by this instance: 0 <none>
[13:39:54]Oracle process number: 24
[13:39:54]Unix process pid: 27147, image: oracle@rac01 (TNS V1-V3)
[13:39:54]
[13:39:54]
[13:39:54]*** 2016-11-17 19:13:56.353
[13:39:54]*** SESSION ID:(3.3) 2016-11-17 19:13:56.353
[13:39:54]*** CLIENT ID:() 2016-11-17 19:13:56.353
[13:39:54]*** SERVICE NAME:() 2016-11-17 19:13:56.353
[13:39:54]*** MODULE NAME:(crsd.bin@rac01 (TNS V1-V3)) 2016-11-17 19:13:56.353
[13:39:54]*** ACTION NAME:() 2016-11-17 19:13:56.353
[13:39:54]
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]
[13:39:54]*** 2016-11-17 19:14:18.676
[13:39:54]Received ORADEBUG command (#1) 'CLEANUP_KFK_FD' from process 'Unix process pid: 20811, image: <none>'
[13:39:54]
[13:39:54]*** 2016-11-17 19:14:18.697
[13:39:54]Finished processing ORADEBUG command (#1) 'CLEANUP_KFK_FD'
[13:39:54]
[13:39:54]*** 2016-11-17 20:57:21.852
[13:39:54]WARNING:failed xlate 1
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]ksfdrfms:Mirror Read file=+OCRVDISK.255.4294967295 fob=0x9d804848 bufp=0x7f0543d2fa00 blkno=1649 nbytes=4096
[13:39:54]WARNING:failed xlate 1
[13:39:54]WARNING: requested mirror side 1 of virtual extent 6 logical extent 0 offset 462848 is not allocated; I/O request failed
[13:39:54]ksfdrfms:Read failed from mirror side=1 logical extent number=0 dskno=65535
[13:39:54]WARNING:failed xlate 1
[13:39:54]WARNING: requested mirror side 2 of virtual extent 6 logical extent 1 offset 462848 is not allocated; I/O request failed
[13:39:54]ksfdrfms:Read failed from mirror side=2 logical extent number=1 dskno=65535
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted
[13:39:54]ORA-15078: ASM diskgroup was forcibly dismounted

追踪#tail -500 +ASM1_gmon_27098.trc文件

#tail -500  +ASM1_gmon_27098.trc
[13:43:37]Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gmon_27098.trc
[13:43:37]Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
[13:43:37]With the Real Application Clusters and Automatic Storage Management options
[13:43:37]ORACLE_HOME = /u01/app/11.2.0/grid
[13:43:37]System name:    Linux
[13:43:37]Node name:      rac01
[13:43:37]Release:        2.6.39-300.26.1.el5uek
[13:43:37]Version:        #1 SMP Thu Jan 3 18:31:38 PST 2013
[13:43:37]Machine:        x86_64
[13:43:37]VM name:        VMWare Version: 6
[13:43:37]Instance name: +ASM1
[13:43:37]Redo thread mounted by this instance: 0 <none>
[13:43:37]Oracle process number: 19
[13:43:37]Unix process pid: 27098, image: oracle@rac01 (GMON)
[13:43:37]
[13:43:37]
[13:43:37]*** 2016-11-17 19:13:53.292
[13:43:37]*** SESSION ID:(396.1) 2016-11-17 19:13:53.292
[13:43:37]*** CLIENT ID:() 2016-11-17 19:13:53.292
[13:43:37]*** SERVICE NAME:() 2016-11-17 19:13:53.292
[13:43:37]*** MODULE NAME:() 2016-11-17 19:13:53.292
[13:43:37]*** ACTION NAME:() 2016-11-17 19:13:53.292
[13:43:37]
[13:43:37]
[13:43:37]*** TRACE FILE RECREATED AFTER BEING REMOVED ***
[13:43:37]
[13:43:37]WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
[13:43:37]WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
[13:43:37]NOTE: Set to be offline flag for disk OCRVDISK_0000 only locally: flag 0x3211
[13:43:37]NOTE: Set to be offline flag for disk OCRVDISK_0001 only locally: flag 0x3211
[13:43:37]----- Abridged Call Stack Trace -----
[13:43:37]
[13:43:37]*** 2016-11-17 19:13:55.556
[13:43:37]ksedsts()+465<-kfdpGc_doTobeoflnAsync()+37<-kfdpGc_checkTobeofln()+623<-kfdpGc_timeout()+17<-kfdp_timeoutBg()+1657<-ksbcti()+513<-ksbabs()+1735<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+244
[13:43:37]
[13:43:37]----- End of Abridged Call Stack Trace -----
[13:43:37]GMON checking disk modes for group 5 at 19 for pid 27, osid 20745
[13:43:37]  dsk = 0/0xe94a7097, mask = 0x7e, op = clear
[13:43:37]  dsk = 1/0xe94a7098, mask = 0x7e, op = clear
[13:43:37]POST (justCheck) res = 0
[13:43:37]=============== PST ====================
[13:43:37]grpNum:    5
[13:43:37]state:     2
[13:43:37]callCnt:   19
[13:43:37](lockvalue) valid=0 ver=0.0 ndisks=0 flags=0x2 from inst=0 (I am 1) last=0
[13:43:37]--------------- HDR --------------------
[13:43:37]next:    2
[13:43:37]last:    2
[13:43:37]pst count:       3
[13:43:37]pst locations:   0  1  2
[13:43:37]incarn:          1
[13:43:37]dta size:        0
[13:43:37]version:         1
[13:43:37]ASM version:     186646528 = 11.2.0.0.0
[13:43:37]contenttype:     1
[13:43:37]partnering pattern:      [ ]
[13:43:37]--------------- LOC MAP ----------------
[13:43:37]--------------- DTA --------------------
[13:43:37]--------------- HBEAT ------------------
[13:43:37]kfdpHbeat_dump: state=1, inst=1, ts=33041971.903244800,
[13:43:37]        rnd=287294089.3067682192.4031620725.409975780.
[13:43:37]kfk io-queue:    0x7f0a5cd7f950
[13:43:37]kfdpHbeatCB_dump: at 0x7f0a5cd7f940 with ts=11/17/2016 19:13:28 iop=0x7f0a5cd7f950, grp=5, disk=0/3913969815, isWrite=1 Hbeat #100 state=2 iostate=4
[13:43:37]kfdpHbeatCB_dump: at 0x7f0a5cd7f778 with ts=11/17/2016 19:13:28 iop=0x7f0a5cd7f788, grp=5, disk=1/3913969816, isWrite=1 Hbeat #100 state=2 iostate=4
[13:43:37]kfdpHbeatCB_dump: at 0x7f0a5cd7f5b0 with ts=11/17/2016 19:13:31 iop=0x7f0a5cd7f5c0, grp=5, disk=2/3913969817, isWrite=1 Hbeat #100 state=2 iostate=4
[13:43:37]GMON updating disk modes for group 5 at 20 for pid 27, osid 20745
[13:43:37]  dsk = 0/0xe94a7097, mask = 0x6a, op = clear
[13:43:37]  dsk = 1/0xe94a7098, mask = 0x6a, op = clear
[13:43:37]POST res = 0
[13:43:37]=============== PST ====================
[13:43:37]grpNum:    5
[13:43:37]state:     2
[13:43:37]callCnt:   20
[13:43:37](lockvalue) valid=0 ver=0.0 ndisks=0 flags=0x2 from inst=0 (I am 1) last=0
[13:43:37]--------------- HDR --------------------
[13:43:37]next:    2
[13:43:37]last:    2
[13:43:37]pst count:       3
[13:43:37]pst locations:   0  1  2
[13:43:37]incarn:          1
[13:43:37]dta size:        0
[13:43:37]version:         1
[13:43:37]ASM version:     186646528 = 11.2.0.0.0
[13:43:37]contenttype:     1
[13:43:37]partnering pattern:      [ ]
[13:43:37]--------------- LOC MAP ----------------
[13:43:37]--------------- DTA --------------------
[13:43:37]--------------- HBEAT ------------------
[13:43:37]kfdpHbeat_dump: state=1, inst=1, ts=33041971.903244800,
[13:43:37]        rnd=287294089.3067682192.4031620725.409975780.
[13:43:37]kfk io-queue:    0x7f0a5cd7f950
[13:43:37]kfdpHbeatCB_dump: at 0x7f0a5cd7f940 with ts=11/17/2016 19:13:28 iop=0x7f0a5cd7f950, grp=5, disk=0/3913969815, isWrite=1 Hbeat #100 state=2 iostate=4
[13:43:37]kfdpHbeatCB_dump: at 0x7f0a5cd7f778 with ts=11/17/2016 19:13:28 iop=0x7f0a5cd7f788, grp=5, disk=1/3913969816, isWrite=1 Hbeat #100 state=2 iostate=4
[13:43:37]kfdpHbeatCB_dump: at 0x7f0a5cd7f5b0 with ts=11/17/2016 19:13:31 iop=0x7f0a5cd7f5c0, grp=5, disk=2/3913969817, isWrite=1 Hbeat #100 state=2 iostate=4
[13:43:37]NOTE: kfdp_updateInt: forceDismount grp 5
[13:43:37]NOTE: GMON: failed to update modes: triggering force dismount of group 5
[13:43:37]GMON dismounting group 5 at 21 for pid 28, osid 20749
[13:43:37]NOTE: kfdp_doDismount: dismount grp 5
[13:43:37]
[13:43:37]*** 2016-11-17 19:13:58.596
[13:43:37]NOTE: kfdpUtil_freeSlMsg: ksvtst for grp: 5 KSV status 0x10

这是oracle11g 一个bug,以下是恢复操作:
step1.mount 问题ASM盘

su - grid
sqlplus / as sysasm
[13:51:35]SQL> alter diskgroup OCRVDISK mount;
[13:51:42]
[13:51:42]Diskgroup altered.

step2.恢复服务

cd /u01/app/11.2.0/grid/bin
#./crsctl start crs
[13:54:28]CRS-4640: Oracle High Availability Services is already active
[13:54:28]CRS-4000: Command Start failed, or completed with errors.

使用该方法拉起集群

./crsctl start res ora.crsd -init
[13:54:37]CRS-2672: Attempting to start 'ora.crsd' on 'rac01'
[13:54:38]CRS-2676: Start of 'ora.crsd' on 'rac01' succeeded

检查

[13:54:46][root: /u01/app/11.2.0/grid/bin]#./crsctl check crs
[13:54:46]CRS-4638: Oracle High Availability Services is online
[13:54:46]CRS-4535: Cannot communicate with Cluster Ready Services
[13:54:46]CRS-4529: Cluster Synchronization Services is online
[13:54:46]CRS-4533: Event Manager is online
[13:54:57][root: /u01/app/11.2.0/grid/bin]#./crsctl check cluster -all
[13:55:01]**************************************************************
[13:55:01]rac01:
[13:55:01]CRS-4535: Cannot communicate with Cluster Ready Services
[13:55:01]CRS-4529: Cluster Synchronization Services is online
[13:55:01]CRS-4533: Event Manager is online
[13:55:01]**************************************************************
[13:55:01]rac02:
[13:55:01]CRS-4535: Cannot communicate with Cluster Ready Services
[13:55:01]CRS-4529: Cluster Synchronization Services is online
[13:55:01]CRS-4533: Event Manager is online
[13:55:01]**************************************************************
[13:55:06][root: /u01/app/11.2.0/grid/bin]#./crs_stat -t
[13:55:06]Name           Type           Target    State     Host
[13:55:06]------------------------------------------------------------
[13:55:06]ora.BAK01.dg   ora....up.type ONLINE    ONLINE    rac01
[13:55:06]ora.DATA01.dg  ora....up.type ONLINE    ONLINE    rac01
[13:55:06]ora...._ARC.dg ora....up.type ONLINE    ONLINE    rac01
[13:55:06]ora...._ARC.dg ora....up.type ONLINE    ONLINE    rac01
[13:55:06]ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac01
[13:55:06]ora....N1.lsnr ora....er.type ONLINE    OFFLINE
[13:55:06]ora....DISK.dg ora....up.type ONLINE    ONLINE    rac01
[13:55:06]ora...._ARC.dg ora....up.type ONLINE    ONLINE    rac01
[13:55:06]ora.asm        ora.asm.type   ONLINE    ONLINE    rac01
[13:55:06]ora.cvu        ora.cvu.type   ONLINE    OFFLINE
[13:55:06]ora.gbf1.db    ora....se.type ONLINE    ONLINE    rac01
[13:55:06]ora.gbf2.db    ora....se.type ONLINE    ONLINE    rac01
[13:55:06]ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
[13:55:06]ora....network ora....rk.type ONLINE    ONLINE    rac01
[13:55:06]ora.oc4j       ora.oc4j.type  ONLINE    OFFLINE
[13:55:06]ora.ons        ora.ons.type   ONLINE    ONLINE    rac01
[13:55:06]ora....SM1.asm application    ONLINE    ONLINE    rac01
[13:55:06]ora....01.lsnr application    ONLINE    ONLINE    rac01
[13:55:06]ora.rac01.gsd  application    OFFLINE   OFFLINE
[13:55:06]ora.rac01.ons  application    ONLINE    ONLINE    rac01
[13:55:06]ora.rac01.vip  ora....t1.type ONLINE    ONLINE    rac01
[13:55:06]ora.rac02.vip  ora....t1.type ONLINE    OFFLINE
[13:55:06]ora....ry.acfs ora....fs.type ONLINE    ONLINE    rac01
[13:55:06]ora.scan1.vip  ora....ip.type ONLINE    OFFLINE
[13:55:06]ora.ywdb.db    ora....se.type ONLINE    ONLINE    rac01
[13:55:09][root: /u01/app/11.2.0/grid/bin]#ckrac

oracle 11g RAC ASM磁盘被强制下线抢修一例相关推荐

  1. oracle asm spfile丢失,Oracle 11g RAC ASM磁盘全部丢失后的恢复

    Oracle 11g RAC ASM磁盘全部丢失后的恢复,Oracle 11.2.0.3 RAC ON Oracle Linux 6 x86_64,只有一个ASM外部冗余磁盘组mdash;m 一.环境 ...

  2. ORACLE 11G RAC ASM磁盘全部丢失后的恢复

    故障描述 (1)存储故障导致ASM磁盘丢失. (2)CRS因为OCR和VOTEDISK的丢失,除了OHAS还联机外,CLUSTERWARE服务都已经停止 操作步骤 一.恢复OCR和VOTEDISK ( ...

  3. oracle 磁盘组掉了,Oracle 11g RAC ASM磁盘全部丢失后的恢复(一)

    一.环境描述 (1) 11.2.0.3 RAC ON Oracle Linux 6 x86_64,只有一个ASM外部冗余磁盘组--DATA: (2)OCR,VOTEDISK,DATAFILE,CONT ...

  4. oracle 11g rac asm磁盘组增加硬盘

    要增加磁盘的磁盘组为:DATA 要增加的磁盘为: /dev/sde1 在第一个节点上: [root@rac1 ~]# fdisk /dev/sde Device contains neither a ...

  5. Oracle 11g RAC ASM 错误之(1)

    Oracle 11g RAC ASM 错误之(1) 系统环境: 操作系统:RedHat EL5.5 集群软件:  GI (11.2.0.1) 数据库软件:Oracle 11g(11.2.0.1) 故障 ...

  6. oracle 11g crs stat,Oracle 11g RAC CRS磁盘丢失后恢复

    一.概述 为了方便相关问题测试,我在本机搭建了一套RAC环境,但昨天打开后却发现RAC无法启动了,不错,就当一次实战演练了. 测试环境:RedHat6.3_x64+ Oracle11gr2 RAC 二 ...

  7. oracle crs 4535,Oracle 11g RAC CRS磁盘丢失后恢复

    一.概述 为了方便相关问题测试,我在本机搭建了一套RAC环境,但昨天打开后却发现RAC无法启动了,不错,就当一次实战演练了. 一.概述 为了方便相关问题测试,我在本机搭建了一套RAC环境,但昨天打开后 ...

  8. 【RAC】oracle 11g RAC两个机器重启后节点二能找到asm磁盘,节点一没有找到

         oracle 11g RAC两个机器重启后节点二能找到asm磁盘,节点一没有找到 [root@rac1 ~]# oracleasm scandisks Reloading disk part ...

  9. Oracle 11g rac 生产环境asm磁盘迁移

    Oracle 11g rac 生产环境asm磁盘迁移 一.配置存储 二.配置udev(两个节点都需要操作) 三.创建新的ocr磁盘组orc 四.备份原ocr和votedisk 五.迁移OCR磁盘组 六 ...

最新文章

  1. 基于光线追踪的Mental Ray鱼眼镜头的编程实现 作者:华文广
  2. java自动化初始变量_Java自动化测试-01.环境准备(JDK/环境变量/Intellij IDEA安装)
  3. 贯穿你一生的14条社会生计
  4. 阶段1 语言基础+高级_1-3-Java语言高级_02-继承与多态_第1节 继承_6_继承中方法的覆盖重写_概念...
  5. C++ class类 实现搜索二叉树(BST)
  6. MongoDB投影字段
  7. 特征筛选:WOE值与IV值
  8. 三极管原理及特性分析
  9. html中怎么写小箭头,纯CSS代码实现各种小箭头
  10. Django读取图片
  11. C++之字符串大小写转换
  12. WIN10极限清理 C盘空间
  13. PI数据库开发-java(读写pi中的时序数据和关系数据)
  14. 二本考生考研复试331分逆袭390分上岸,被传教务处长之子?校方回应!
  15. 神经网络和人工智能原理,人工神经网络基本原理
  16. 霍常亮app淘宝客开发视频教程第5节
  17. VBA过程调用和参数传递
  18. 【连载】【STM32神舟III号实验例程】流水灯实验(实验一)
  19. 澳洲语言成绩等级c,澳大利亚对A-Level成绩要求分为几个等级
  20. 特征点检测之SURF

热门文章

  1. Android移动应用开发之使用异步调用进度条及实现幻灯片切换效果
  2. 2020年7月-8月CISP及CISP-PTE线下考试时间
  3. 麦块一直在安装java_麦块我的世界盒子
  4. 菏泽中考报名不报计算机,2019菏泽中考报名志愿填报时间方法
  5. 2021-06-03Leetcode 438.找到字符串中所有字母异位词
  6. java webapprootkey_web.xml中的webAppRootKey
  7. Layui弹出层关闭
  8. Error: 17172, Severity: 16 SNIInitialize() failed with error 0x2.
  9. EN 14041弹性织物层压板和模块化多层地板—CE认证
  10. mysql慢查询打印_mysql慢查询