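Environment: Oracle 11.2.0.1 RAC on AIX 6.1, hosting two databases

1 Problem description

At around 15:00 on November 29, 2010, the p570a host stopped accepting telnet logins and applications could not open new connections, which seriously affected the business. We arrived at the customer site at 16:00 to carry out emergency handling.

The emergency handling and the problem analysis for this database failure are summarized below:

2 Emergency handling

After logging in to p570a through the HMC console, every command failed with an out-of-memory error, as shown below: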

root@p570a:/> errpt|more

ksh: 0403-031 The fork function failed. There is not enough memory available.

ksh: 0403-031 The fork function failed. There is not enough memory available.

root@p570a:/> ps -ef | grep LOCAL=NO|wc -l

ksh: 0403-031 The fork function failed. There is not enough memory available.

root@p570a:/> ls

ksh: 0403-031 The fork function failed. There is not enough memory available.

With the customer's consent, we rebooted the p570a host from the HMC console.

3 Problem analysis

3.1 Operating system errpt

p570a@root#errpt|more

IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION

A6DF45AA  1129164210 I O RMCdaemon     The daemon is started.

EC0BCCD4  1129164110 T H ent1          ETHERNET DOWN

67145A39  1129163910 U S SYSDUMP       SYSTEM DUMP

F48137AC  1129163810 U O minidump      COMPRESSED MINIMAL DUMP

1104AA28  1129163810 T S SYSPROC       SYSTEM RESET INTERRUPT RECEIVED

9DBCFDEE  1129164110 T O errdemon      ERROR LOGGING TURNED ON

B6267342  1126235510 P H hdisk3        DISK OPERATION ERROR

B6267342  1125235510 P H hdisk3        DISK OPERATION ERROR

C5C09FFA  1125062110 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1125051010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

C5C09FFA  1124144010 P S SYSVMM        SOFTWARE PROGRAM ABNORMALLY TERMINATED

p570a@root#errpt -aj C5C09FFA |more

---------------------------------------------------------------------------

LABEL:         PGSP_KILL

IDENTIFIER:    C5C09FFA

Date/Time:      Thu Nov 25 06:21:13 BEIST 2010

Sequence Number: 99122

Machine Id:     00C6E9C54C00

Node Id:        p570a

Class:          S

Type:           PERM

WPAR:           Global

Resource Name:  SYSVMM

Description

SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes

SYSTEM RUNNING OUT OF PAGING SPACE

Failure Causes

INSUFFICIENT PAGING SPACE DEFINED FOR THE SYSTEM

PROGRAM USING EXCESSIVE AMOUNT OF PAGING SPACE

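From November 24 onward the system was already reporting that paging space had run out (PGSP_KILL), which means physical memory had been exhausted well before the outage.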
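
The day-by-day pattern in the errpt summary can be tallied with a short script. The following is a minimal Python sketch and was not part of the original investigation. It assumes the `errpt` summary has been saved as text and that the TIMESTAMP column uses the MMDDhhmmYY layout; the function name `count_by_day` and the sample rows are illustrative.

```python
from collections import Counter
from datetime import datetime

def count_by_day(errpt_text, identifier):
    """Count errpt summary rows for one identifier, grouped by calendar day."""
    counts = Counter()
    for line in errpt_text.splitlines():
        fields = line.split()
        # Summary rows: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION...
        if len(fields) < 6 or fields[0] != identifier:
            continue
        # TIMESTAMP is assumed to be MMDDhhmmYY, e.g. 1125062110.
        stamp = datetime.strptime(fields[1], "%m%d%H%M%y")
        counts[stamp.date().isoformat()] += 1
    return dict(counts)

# A few rows copied from the summary above (illustrative subset).
sample = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
B6267342   1125235510 P H hdisk3         DISK OPERATION ERROR
C5C09FFA   1125062110 P S SYSVMM         SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA   1125051010 P S SYSVMM         SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA   1124144010 P S SYSVMM         SOFTWARE PROGRAM ABNORMALLY TERMINATED
"""

print(count_by_day(sample, "C5C09FFA"))
```

Run against the full summary, a tally like this makes it easy to see on which day the PGSP_KILL events started and how densely they cluster.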

3.2 Database alert log

Starting November 24, alert_gzjb1.log contains a large number of errors like the following:

Wed Nov 24 22:36:15 2010

ORA-27302: failure occurred at: skgpspawn3

ORA-27301: OS failure message: Not enough space

ORA-27300: OS system dependent operation:fork failed with status: 12

Errors in file /oracle/app/oracle/diag/rdbms/gdjb/gdjb1/trace/gdjb1_psp0_352314.trc:

Process startup failed, error stack:

Thu Nov 25 02:56:24 2010

Process q000 died, see its trace file

Thu Nov 25 02:56:13 2010

ORA-27302: failure occurred at: skgpspawn3

ORA-27301: OS failure message: Not enough space

ORA-27300: OS system dependent operation:fork failed with status: 12

Errors in file /oracle/app/oracle/diag/rdbms/gdjb/gdjb1/trace/gdjb1_psp0_352314.trc:

Process startup failed, error stack:

Instance terminated by USER, pid = 144242

USER (ospid: 144242): terminating the instance due to error 443

Process LMHB died, see its trace file

ORA-27302: failure occurred at: skgpspawn3

ORA-27301: OS failure message: Not enough space

ORA-27300: OS system dependent operation:fork failed with status: 12

Errors in file /oracle/app/oracle/diag/rdbms/gdjb/gdjb1/trace/gdjb1_ora_144242.trc:

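The database on node p570a went down because physical memory and paging space were both exhausted, so memory requests (such as those needed to fork a new process) could no longer be satisfied.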
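
The status value in ORA-27300 is the operating system errno, and errno 12 is ENOMEM on AIX, which matches the "Not enough space" text in ORA-27301. A minimal Python sketch for extracting these failures from an alert log follows. The function name `parse_ora_27300` and the sample text are illustrative, and the regular expression assumes the message format seen in the excerpt above.

```python
import errno
import re

# Matches lines such as:
# ORA-27300: OS system dependent operation:fork failed with status: 12
ORA_27300 = re.compile(
    r"ORA-27300: OS system dependent operation:(\w+) failed with status: (\d+)"
)

def parse_ora_27300(alert_text):
    """Return (operation, status, errno_name) for each ORA-27300 line."""
    results = []
    for match in ORA_27300.finditer(alert_text):
        operation = match.group(1)
        status = int(match.group(2))
        # errno.errorcode reflects the machine running this script, which
        # may not be the database host; 12 is ENOMEM on both AIX and Linux.
        name = errno.errorcode.get(status, "UNKNOWN")
        results.append((operation, status, name))
    return results

sample = """\
Wed Nov 24 22:36:15 2010
ORA-27302: failure occurred at: skgpspawn3
ORA-27301: OS failure message: Not enough space
ORA-27300: OS system dependent operation:fork failed with status: 12
"""

print(parse_ora_27300(sample))
```

Counting the returned tuples per day gives a quick measure of how often the instance failed to spawn processes before it finally terminated.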

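3.3 Listener log

TNS-12500: TNS:listener failed to start a dedicated server process

TNS-12540: TNS:internal limit restriction exceeded

TNS-12560: TNS:protocol adapter error

TNS-00510: Internal limit restriction exceeded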

IBM/AIX RISC System/6000 Error: 12: Not enough space

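The listener log likewise shows that incoming connection requests could not be served.

3.4 Physical memory and memory parameters

Physical memory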

p570a

AIX

System Model: IBM,9117-MMA

Machine Serial Number: 066E9C5

Processor Type: PowerPC_POWER6

Processor Implementation Mode: POWER 6

Processor Version: PV_6_Compat

Number Of Processors: 8

Processor Clock Speed: 3504 MHz

CPU Type: 64-bit

Kernel Type: 64-bit

LPAR Info: 1 06-6E9C5

Memory Size: 15232 MB

Good Memory Size: 15232 MB

Platform Firmware level: EM350_038

Firmware Version: IBM,EM350_038

Console Login: enable

Auto Restart: true

Full Core: false

Total physical memory is therefore about 15G.

Database A

SQL> show sga

Total System Global Area 2137886720 bytes

Fixed Size                 2208496 bytes

Variable Size           1207962896 bytes

Database Buffers         922746880 bytes

Redo Buffers               4968448 bytes

SQL> show parameter sga

NAME                                TYPE       VALUE

------------------------------------ ----------- ------------------------------

lock_sga                            boolean    FALSE

pre_page_sga                        boolean    FALSE

sga_max_size                        big integer 2G

sga_target                          big integer 2G

SQL> show parameter pga

NAME                                TYPE       VALUE

------------------------------------ ----------- ------------------------------

pga_aggregate_target                big integer 1G

SQL> show parameter instance_name

NAME                                TYPE       VALUE

------------------------------------ ----------- ------------------------------

instance_name                       string     gd1

Database A is therefore configured to use about 3G of memory (2G SGA plus a 1G PGA target).

Database B

SQL> show sga

Total System Global Area 8551575552 bytes

Fixed Size                 2223904 bytes

Variable Size           1778385120 bytes

Database Buffers        6761218048 bytes

Redo Buffers               9748480 bytes

SQL> show parameter sga

NAME                                TYPE       VALUE

------------------------------------ ----------- ------------------------------

lock_sga                            boolean    FALSE

pre_page_sga                        boolean    FALSE

sga_max_size                        big integer 8G

sga_target                          big integer 8G

SQL> show parameter instance_name

NAME                                TYPE       VALUE

------------------------------------ ----------- ------------------------------

instance_name                       string     gd2

SQL> show parameter pga

NAME                                TYPE       VALUE

------------------------------------ ----------- ------------------------------

pga_aggregate_target                big integer 2G

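Database B is therefore configured to use about 10G of memory (8G SGA plus a 2G PGA target), a large share of the host's total memory.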
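
As a cross-check on the two `show sga` listings, the component sizes can be summed and converted to GiB. This is a minimal Python sketch that uses only the byte counts printed above; the helper name `sga_total_bytes` and the dictionary layout are illustrative.

```python
GIB = 1024 ** 3

# Byte counts copied from the "show sga" output of each database.
sga_components = {
    "A": {"fixed": 2208496, "variable": 1207962896,
          "buffers": 922746880, "redo": 4968448},
    "B": {"fixed": 2223904, "variable": 1778385120,
          "buffers": 6761218048, "redo": 9748480},
}

def sga_total_bytes(components):
    """Sum the SGA components reported by show sga."""
    return sum(components.values())

for name, parts in sga_components.items():
    total = sga_total_bytes(parts)
    print(f"database {name}: {total} bytes = {total / GIB:.2f} GiB")
```

The sums agree with the "Total System Global Area" lines, so the 2G and 8G settings are what the instances actually allocate for the SGA.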

4 Summary and recommendations

4.1 Root cause analysis

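Total physical memory is 15G, of which 13G is allocated to the two databases, leaving only about 2G for the operating system. As the number of application connections grows, or connections are not released, physical memory and paging space are easily exhausted, which crashes the database and hangs the host.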
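
The budget behind this conclusion can be written out explicitly. This is a minimal Python sketch using the figures reported in section 3.4; the function name `remaining_for_os_mb` and the dictionary layout are illustrative.

```python
def remaining_for_os_mb(physical_mb, databases):
    """Physical memory left after each database's SGA and PGA target (MB)."""
    configured = sum(
        db["sga_mb"] + db["pga_target_mb"] for db in databases.values()
    )
    return physical_mb - configured

physical_mb = 15232  # "Memory Size" reported by the host
databases = {
    "A": {"sga_mb": 2 * 1024, "pga_target_mb": 1 * 1024},
    "B": {"sga_mb": 8 * 1024, "pga_target_mb": 2 * 1024},
}

left = remaining_for_os_mb(physical_mb, databases)
print(f"{left} MB left for the operating system and all other processes")
```

Because `pga_aggregate_target` is a target rather than a hard limit, and every dedicated server process also needs memory of its own, the real remainder under load can be smaller than this figure.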

4.2 Actions taken and recommendations

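1) The Oracle memory parameters of the gzcdc database were set too high. After discussion with the application vendor and the customer, the gzcdc SGA was reduced to 5G and the PGA target to 1G, leaving about 6G for the operating system (15G minus 3G for database A and 6G for gzcdc).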
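
The effect of the change can be checked with simple arithmetic. This minimal Python sketch assumes that gzcdc is the instance listed as database B above and that database A keeps its 2G SGA and 1G PGA target. The text states neither explicitly, so treat the result as an estimate under those assumptions.

```python
def configured_mb(sga_gb, pga_target_gb):
    """Memory configured for one database (SGA + PGA target), in MB."""
    return (sga_gb + pga_target_gb) * 1024

PHYSICAL_MB = 15232             # "Memory Size" reported by the host
DB_A_MB = configured_mb(2, 1)   # assumed unchanged

# Assumption: gzcdc is database B (8G SGA + 2G PGA before the change).
before = PHYSICAL_MB - DB_A_MB - configured_mb(8, 2)
after = PHYSICAL_MB - DB_A_MB - configured_mb(5, 1)

print(f"before: {before} MB, after: {after} MB, freed: {after - before} MB")
```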
