11Grac+ASM+linux2.6.18 processes (100) exceeded

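Symptom: in a two-node 11g RAC environment, while a software tool was performing replication, rac1 went down outright, and the ASM instance on rac2 restarted once and then recovered. rac1, however, did not come back up.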

Because rac1 stayed down and could not be logged in to, the only option was to log in to rac2 and check the logs there.

Alert log on rac2:

Stopping background process CJQ0
Tue Jun 26 14:38:18 2012
NOTE: ASMB terminating
Errors in file /oracle/app/db/diag/rdbms/xxxx/xxxx2/trace/xxxx2_asmb_12325.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 103 Serial number: 5
Errors in file /oracle/app/db/diag/rdbms/xxxx/xxxx2/trace/xxxx2_asmb_12325.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 103 Serial number: 5
ASMB (ospid: 12325): terminating the instance due to error 15064
Termination issued to instance processes. Waiting for the processes to exit
Tue Jun 26 14:38:30 2012
Instance termination failed to kill one or more processes
Instance terminated by ASMB, pid = 12325
Tue Jun 26 14:58:40 2012
Adjusting the default value of parameter parallel_max_servers
from 2560 to 1485 due to the value of parameter processes (1500)
Starting ORACLE instance (normal)
Tue Jun 26 14:59:04 2012
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'ib0:1' configured from GPnP for use as a private interconnect.
[name='ib0:1', type=1, ip=169.254.41.10, mac=80-00-00-48-fe-80, net=169.254.0.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'ib1:1' configured from GPnP for use as a private interconnect.
[name='ib1:1', type=1, ip=169.254.67.102, mac=80-00-00-49-fe-80, net=169.254.64.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'ib2:1' configured from GPnP for use as a private interconnect.
[name='ib2:1', type=1, ip=169.254.163.124, mac=80-00-00-48-fe-80, net=169.254.128.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'ib3:1' configured from GPnP for use as a private interconnect.
[name='ib3:1', type=1, ip=169.254.232.204, mac=80-00-00-49-fe-80, net=169.254.192.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Public Interface 'bond0' configured from GPnP for use as a public interface.
[name='bond0', type=1, ip=10.240.52.148, mac=00-16-35-02-7f-02, net=10.240.52.128/25, mask=255.255.255.128, use=public/1]
Public Interface 'bond0:1' configured from GPnP for use as a public interface.
[name='bond0:1', type=1, ip=10.240.52.151, mac=00-16-35-02-7f-02, net=10.240.52.128/25, mask=255.255.255.128, use=public/1]
Public Interface 'bond0:2' configured from GPnP for use as a public interface.
[name='bond0:2', type=1, ip=10.240.52.149, mac=00-16-35-02-7f-02, net=10.240.52.128/25, mask=255.255.255.128, use=public/1]
Public Interface 'bond0:3' configured from GPnP for use as a public interface.
[name='bond0:3', type=1, ip=10.240.52.150, mac=00-16-35-02-7f-02, net=10.240.52.128/25, mask=255.255.255.128, use=public/1]
Picked latch-free SCN scheme 3
Tue Jun 26 15:00:22 2012
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining

The errors start at 14:38.

The trace file referenced by the alert log:

Trace file /oracle/app/db/diag/rdbms/xxxx/xxxx2/trace/xxxx2_asmb_12325.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/db/11gr2
System name: Linux
Node name: xxxx
Release: 2.6.18-274.el5
Version: #1 SMP Fri Jul 8 17:36:59 EDT 2011
Machine: x86_64
Instance name: xxxx2
Redo thread mounted by this instance: 0 <none>
Oracle process number: 33
Unix process pid: 12325, image: oracle@xxxx (ASMB)
*** 2012-06-25 14:07:00.911
*** SESSION ID:(1189.1) 2012-06-25 14:07:00.911
*** CLIENT ID:() 2012-06-25 14:07:00.911
*** SERVICE NAME:() 2012-06-25 14:07:00.911
*** MODULE NAME:() 2012-06-25 14:07:00.911
*** ACTION NAME:() 2012-06-25 14:07:00.911
NOTE: initiating MARK startup
*** 2012-06-26 14:38:18.936
NOTE: ASMB terminating
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 103 Serial number: 5
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 103 Serial number: 5
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1325<-ksbrdp()+3344<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
*** 2012-06-26 14:38:19.509
ASMB (ospid: 12325): terminating the instance due to error 15064
*** 2012-06-26 14:38:30.826
Instance termination failed to kill one or more processes
ksuitm_check: OS PID=13918 is still alive
ksuitm_check: OS PID=13914 is still alive
ksuitm_check: OS PID=13910 is still alive
ksuitm_check: OS PID=13905 is still alive
ksuitm_check: OS PID=12309 is still alive
ksuitm_check: OS PID=12305 is still alive
ksuitm_check: OS PID=12301 is still alive
ksuitm_check: OS PID=12297 is still alive
ksuitm_check: OS PID=12293 is still alive
ksuitm_check: OS PID=12289 is still alive
ksuitm_check: OS PID=12285 is still alive
ksuitm_check: OS PID=12281 is still alive
ksuitm_check: OS PID=12277 is still alive
ksuitm_check: OS PID=12273 is still alive
ksuitm_check: OS PID=12229 is still alive

ocssd.log

2012-06-26 14:38:14.007: [ CSSD][1077279040]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 196740 msecs
2012-06-26 14:38:16.543: [ CSSD][1115167040]clssnmHandleMeltdownStatus: node bjyq-hist-par-db01, number 1, has experienced a failure in thread number 9 and is shutting down
2012-06-26 14:38:16.984: [ CSSD][1101257024](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 200720 ms for voting file /dev/mapper/crsdisk001)
2012-06-26 14:38:16.984: [ CSSD][1101257024]clssnmvDiskAvailabilityChange: voting file /dev/mapper/crsdisk001 now offline
2012-06-26 14:38:16.984: [ CSSD][1101257024](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2012-06-26 14:38:16.984: [ CSSD][1101257024]###################################
2012-06-26 14:38:16.984: [ CSSD][1101257024]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2012-06-26 14:38:16.984: [ CSSD][1101257024]###################################
2012-06-26 14:38:16.984: [ CSSD][1101257024](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2012-06-26 14:38:16.984: [ SKGFD][1107282240]Lib :UFS:: closing handle 0x2aaaac04fa00 for disk :/dev/mapper/crsdisk001:
2012-06-26 14:38:16.984: [ CSSD][1101257024]
----- Call Stack Trace -----
2012-06-26 14:38:16.984: [ CSSD][1101257024]calling call entry argument values in hex
2012-06-26 14:38:16.984: [ CSSD][1101257024]location type point (? means dubious value)
2012-06-26 14:38:16.984: [ CSSD][1101257024]-------------------- -------- -------------------- ----------------------------
2012-06-26 14:38:17.012: [ CSSD][1101257024]clssscExit()+726 call kgdsdst() 000000000 ? 000000000 ?
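The key line in this excerpt is the voting file stall: no I/O completed for 200720 ms, after which CSSD found 0 of 1 voting disks available and aborted. Below is a minimal Python sketch for pulling such stalls out of an ocssd.log. The function name and the 200000 ms threshold are my own assumptions: the threshold reflects what I believe is the default CSS disktimeout of 200 seconds, and the actual value on a cluster should be confirmed with `crsctl get css disktimeout`.

```python
import re

# Matches CSSD lines such as:
#   ...clssnmvDiskCheck: No I/O completions for 200720 ms for voting file /dev/mapper/crsdisk001)
NO_IO = re.compile(r"No I/O completions for (\d+) ms for voting file (\S+?)\)?$")

# Assumption: default CSS disktimeout of 200 s; verify on the real cluster.
ASSUMED_DISKTIMEOUT_MS = 200_000


def voting_disk_stalls(lines):
    """Return (voting_file, stalled_ms) for every 'No I/O completions' line."""
    stalls = []
    for line in lines:
        match = NO_IO.search(line.strip())
        if match:
            stalls.append((match.group(2), int(match.group(1))))
    return stalls


sample = [
    "2012-06-26 14:38:16.984: [ CSSD][1101257024](:CSSNM00058:)clssnmvDiskCheck: "
    "No I/O completions for 200720 ms for voting file /dev/mapper/crsdisk001)",
    "2012-06-26 14:38:16.984: [ CSSD][1101257024](:CSSNM00018:)clssnmvDiskCheck: "
    "Aborting, 0 of 1 configured voting disks available, need 1",
]

stalls = voting_disk_stalls(sample)
over_limit = [(disk, ms) for disk, ms in stalls if ms >= ASSUMED_DISKTIMEOUT_MS]
for disk, ms in over_limit:
    print(f"{disk}: no I/O for {ms} ms (assumed limit {ASSUMED_DISKTIMEOUT_MS} ms)")
```

In practice the `sample` list would be replaced by the lines of the real ocssd.log. With only one voting file configured, as here, a single stalled disk is enough to take CSSD down.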

The ASM alert log then contains the following passage:

SUCCESS: diskgroup ARCHDG was mounted
GMON querying group 2 at 17 for pid 18, osid 25807
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup CRS was mounted
GMON querying group 3 at 18 for pid 18, osid 25807
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup DATADG was mounted
GMON querying group 4 at 19 for pid 18, osid 25807
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 4
SUCCESS: diskgroup IDXDG was mounted
GMON querying group 5 at 20 for pid 18, osid 25807
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 5
SUCCESS: diskgroup SYSDG was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:0:2} */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent *//* {0:0:2} */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent *//* {0:0:2} */
Tue Jun 26 14:58:00 2012

This shows that the CRS disk is managed inside ASM.

By this time node 1 had come back up, so I checked the logs on rac1:

Tue Jun 26 14:38:18 2012
NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB was killed.
Tue Jun 26 14:38:18 2012
NOTE: client exited [18463]
NOTE: force a map free for map id 2
Tue Jun 26 14:38:20 2012
PMON (ospid: 18351): terminating the instance due to error 481
Tue Jun 26 14:38:20 2012
ORA-1092 : opitsk aborting process
Tue Jun 26 14:38:20 2012
License high water mark = 75
Termination issued to instance processes. Waiting for the processes to exit
Tue Jun 26 14:38:30 2012
Instance termination failed to kill one or more processes
Instance terminated by PMON, pid = 18351
Tue Jun 26 14:38:30 2012
USER (ospid: 26836): terminating the instance
Termination issued to instance processes. Waiting for the processes to exit

Alert log of ASM1:

NOTE: ASMB process exiting, either shutdown is in progress
NOTE: or foreground connected to ASMB was killed.
Tue Jun 26 14:38:18 2012
NOTE: client exited [18463]
NOTE: force a map free for map id 2
Tue Jun 26 14:38:20 2012
PMON (ospid: 18351): terminating the instance due to error 481
Tue Jun 26 14:38:20 2012
ORA-1092 : opitsk aborting process
Tue Jun 26 14:38:20 2012
License high water mark = 75
Termination issued to instance processes. Waiting for the processes to exit
Tue Jun 26 14:38:30 2012

Before this point, I had noticed the following errors:

ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes (100) exceeded

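Thinking it over afterwards: the CRS disk lives in ASM, and the application opened so many connections that the processes limit (100) was exceeded. Even basic CRS communication could not start a single process, which is what caused the services to fail over. Two things remain unclear to me, though. First, why did node 1 go down? Second, why did the ASM instance restart itself during the switchover?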
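To check how often the limit was hit, and which limit was in force, the alert log can be scanned for ORA-00020. This is only a rough sketch: the function name is hypothetical and the sample lines are the two errors quoted above.

```python
import re

# ORA-00020 carries the configured limit in parentheses.
ORA_00020 = re.compile(r"ORA-00020: maximum number of processes \((\d+)\) exceeded")


def process_limit_hits(lines):
    """Return the processes limit reported by each ORA-00020 line found."""
    limits = []
    for line in lines:
        match = ORA_00020.search(line)
        if match:
            limits.append(int(match.group(1)))
    return limits


sample = [
    "ORA-15055: unable to connect to ASM instance",
    "ORA-00020: maximum number of processes (100) exceeded",
]

hits = process_limit_hits(sample)
print(f"ORA-00020 seen {len(hits)} time(s), limits reported: {sorted(set(hits))}")
```

Since the ORA-00020 here comes together with ORA-15055, the limit of 100 presumably belongs to the ASM instance rather than the database instance, whose processes parameter is 1500 according to the rac2 alert log. `show parameter processes` on the ASM instance would confirm this.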

Reposted from: https://blog.51cto.com/omygreen/910542
