Database version:
Oracle RAC 12.1.0.2
Database platform:
ODA (Oracle Database Appliance)

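The customer reported that the business system could not connect to the database. I logged in to check the database status and found that every PDB was in MOUNTED state (only PDB$SEED was READ ONLY).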

[oracle@node0 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 7 01:51:18 2021

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 ALTH                           MOUNTED
         4 YDHEL                          MOUNTED
         5 MAALA                          MOUNTED
         6 TTECL                          MOUNTED
         7 DARE                           MOUNTED
         8 DOC                            MOUNTED
         9 BAAE                           MOUNTED
        10 EIEQ                           MOUNTED
        11 LILD                           MOUNTED
        12 NS                             MOUNTED

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
        13 IAEG                           MOUNTED
        14 OGL                            MOUNTED
        15 EGEQB                          MOUNTED

First, open the PDBs so the business can run again.


SQL> alter pluggable database all open;

Pluggable database altered.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 ALTH                           READ WRITE
         4 YDHEL                          READ WRITE
         5 MAALA                          READ WRITE
         6 TTECL                          READ WRITE
         7 DARE                           READ WRITE
         8 DOC                            READ WRITE
         9 BAAE                           READ WRITE
        10 EIEQ                           READ WRITE
        11 LILD                           READ WRITE
        12 NS                             READ WRITE

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
        13 IAEG                           READ WRITE
        14 OGL                            READ WRITE
        15 EGEQB                          READ WRITE

SQL> exit

Once the PDBs were open, the business ran normally.
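
The PDBs were left in MOUNTED state because the instances had restarted on their own. As a hedged follow-up that was not part of the original troubleshooting, the sketch below checks the open mode on both RAC instances and then saves the open state so that the PDBs reopen after a future instance restart. `SAVE STATE` is available from 12.1.0.2 onward. Whether saved state is appropriate here is an assumption: on systems where Clusterware-managed services already open the PDBs, it may be unnecessary.

```sql
-- Run as SYSDBA in CDB$ROOT.

-- Check the open mode of every PDB on every RAC instance.
SELECT inst_id, con_id, name, open_mode
FROM   gv$pdbs
ORDER  BY inst_id, con_id;

-- With the PDBs open READ WRITE, record that state so they are
-- reopened automatically at the next instance startup.
ALTER PLUGGABLE DATABASE ALL SAVE STATE INSTANCES = ALL;

-- Confirm what was recorded.
SELECT con_name, instance_name, state
FROM   dba_pdb_saved_states
ORDER  BY con_name, instance_name;
```

Only the current open mode of each PDB on each instance is saved, so the statement should be run after the PDBs are confirmed open on every instance where they are needed.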

Checking the alert logs:
Node 1:

Mon Jun 07 01:22:14 2021
LMD0 (ospid: 10613) has not called a wait for sub 0 secs.
LMD1 (ospid: 10617) has not called a wait for sub 0 secs.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmhb_10637.trc  (incident=688375) (PDBNAME=CDB$ROOT):
ORA-29770: global enqueue process LMD0 (OSID 10613) is hung for more than 70 seconds
Incident details in: /u01/app/oracle/diag/rdbms/cdb/cdb1/incident/incdir_688375/cdb1_lmhb_10637_i688375.trc
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmhb_10637.trc  (incident=688376) (PDBNAME=CDB$ROOT):
ORA-29770: global enqueue process LMD1 (OSID 10617) is hung for more than 70 seconds
Incident details in: /u01/app/oracle/diag/rdbms/cdb/cdb1/incident/incdir_688376/cdb1_lmhb_10637_i688376.trc
LOCK_DBGRP: GCR_SYSTEST debug event locked group GR+DB_CDB by memno 0
ERROR: Some process(s) is not making progress.
LMHB (ospid: 10637) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process(s) is not making progress.
LMHB (ospid: 10637): terminating the instance due to error 29770
Mon Jun 07 01:22:24 2021
System state dump requested by (instance=1, osid=10637 (LMHB)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_diag_10575_20210607012224.trc
Mon Jun 07 01:22:26 2021
License high water mark = 591
Mon Jun 07 01:22:29 2021
Instance terminated by LMHB, pid = 10637
Mon Jun 07 01:22:29 2021
USER (ospid: 21601): terminating the instance
Mon Jun 07 01:22:29 2021
Instance terminated by USER, pid = 21601
Mon Jun 07 01:22:32 2021
Starting ORACLE instance (normal) (OS id: 21660)
Mon Jun 07 01:22:32 2021
CLI notifier numLatches:37 maxDescs:3986
Mon Jun 07 01:22:32 2021
**********************************************************************
Mon Jun 07 01:22:32 2021
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
Mon Jun 07 01:22:32 2021
 Per process system memlock (soft) limit = UNLIMITED
Mon Jun 07 01:22:32 2021
 Expected per process system memlock (soft) limit to lock
 SHARED GLOBAL AREA (SGA) into memory: 128G
Mon Jun 07 01:22:32 2021
 Available system pagesizes:
  4K, 2048K
Mon Jun 07 01:22:32 2021
 Supported system pagesize(s):
Mon Jun 07 01:22:32 2021
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
Mon Jun 07 01:22:32 2021
     2048K            66823           65538           65538        NONE
Mon Jun 07 01:22:32 2021
 Reason for not supporting certain system pagesizes:
Mon Jun 07 01:22:32 2021
  4K - Large pagesizes only
Mon Jun 07 01:22:32 2021
**********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 24

Node 2:

Mon Jun 07 01:25:05 2021
Set master node info
Mon Jun 07 01:26:05 2021
Auto-tuning: Shutting down background process GTXb
Mon Jun 07 01:27:53 2021
IPC Send timeout detected. Sender: ospid 62152 [oracle@node1 (PING)]
Receiver: inst 1 binc 912546309 ospid 10589
Mon Jun 07 01:29:25 2021
LMD0 (ospid: 62168) received an instance eviction notification from instance 1 [2]
Mon Jun 07 01:29:26 2021
Received an instance abort message from instance 1
Mon Jun 07 01:29:26 2021
Received an instance abort message from instance 1
Mon Jun 07 01:29:26 2021
Please check instance 1 alert and LMON trace files for detail.
Mon Jun 07 01:29:26 2021
Please check instance 1 alert and LMON trace files for detail.
Mon Jun 07 01:29:26 2021
LMS0 (ospid: 62192): terminating the instance due to error 481
Mon Jun 07 01:29:26 2021
System state dump requested by (instance=2, osid=62192 (LMS0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_diag_62123_20210607012926.trc
Mon Jun 07 01:29:26 2021
ORA-1092 : opitsk aborting process
Mon Jun 07 01:29:27 2021
License high water mark = 1251
Mon Jun 07 01:29:31 2021
Instance terminated by LMS0, pid = 62192
Mon Jun 07 01:29:31 2021
USER (ospid: 85262): terminating the instance
Mon Jun 07 01:29:31 2021
Instance terminated by USER, pid = 85262
Mon Jun 07 01:29:33 2021
Starting ORACLE instance (normal) (OS id: 85397)
Mon Jun 07 01:29:33 2021
CLI notifier numLatches:37 maxDescs:3986
Mon Jun 07 01:29:33 2021
**********************************************************************
Mon Jun 07 01:29:33 2021
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
Mon Jun 07 01:29:33 2021
 Per process system memlock (soft) limit = UNLIMITED
Mon Jun 07 01:29:33 2021
 Expected per process system memlock (soft) limit to lock
 SHARED GLOBAL AREA (SGA) into memory: 128G
Mon Jun 07 01:29:33 2021
 Available system pagesizes:
  4K, 2048K
Mon Jun 07 01:29:33 2021
 Supported system pagesize(s):
Mon Jun 07 01:29:33 2021
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
Mon Jun 07 01:29:33 2021
     2048K            67200           65538           65538        NONE
Mon Jun 07 01:29:33 2021
 Reason for not supporting certain system pagesizes:
Mon Jun 07 01:29:33 2021
  4K - Large pagesizes only
Mon Jun 07 01:29:33 2021
**********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 24

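The alert logs show the following sequence. On node 1, ORA-29770 was raised at 01:22:14 because the LMD processes (LMD0 and LMD1) had stopped responding. LMHB terminated the instance at 01:22:29, and the instance then restarted automatically. On node 2, after an instance eviction notification from instance 1, LMS0 terminated the instance with error 481 at 01:29:31, and that instance also restarted automatically.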

Mon Jun 07 01:18:14 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14795.trc  (incident=691727) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1069","kglseshtTable")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmd0_10613.trc  (incident=707486) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory ("shared pool","unknown object","sga heap(5,0)","ges big msg pool")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jun 07 01:18:16 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14832.trc  (incident=691688) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^320","kglseshtTable")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jun 07 01:18:16 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14837.trc  (incident=691735) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1234","kglseshtTable")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jun 07 01:18:18 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14912.trc  (incident=691871) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^159","kglseshtTable")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmd0_10613.trc  (incident=707487) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory ("shared pool","unknown object","sga heap(5,0)","ges big msg pool")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
DDE: Problem Key 'ORA 4031' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Mon Jun 07 01:18:33 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_m000_15479.trc:
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2147","kglseshtTable")

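Going further back in the node 1 alert log (the excerpt above), a large number of ORA-04031 errors appeared before the instance went down.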
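
To quantify how the ORA-04031 errors built up, one option is to query the alert log through the ADR view `V$DIAG_ALERT_EXT` rather than reading the text file. The following is a minimal sketch, not something taken from the original investigation. The time window is an assumption chosen to bracket the incident. Because the log itself reports that problem key 'ORA 4031' was flood controlled, the counts are a lower bound.

```sql
-- Count ORA-04031 alert-log entries per minute around the incident.
-- The time window is an assumption and should be adjusted as needed.
SELECT TRUNC(CAST(originating_timestamp AS DATE), 'MI') AS log_minute,
       COUNT(*)                                         AS ora_04031_entries
FROM   v$diag_alert_ext
WHERE  message_text LIKE '%ORA-04031%'
AND    originating_timestamp >= TIMESTAMP '2021-06-07 01:00:00'
AND    originating_timestamp <  TIMESTAMP '2021-06-07 01:30:00'
GROUP  BY TRUNC(CAST(originating_timestamp AS DATE), 'MI')
ORDER  BY log_minute;
```

This view can be slow on large alert logs, so keeping the time predicate narrow helps.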

AWR report analysis:

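I pulled the AWR report covering the half hour before the crash. It shows that, within the shared pool, ges enqueues and ges resource dynamic had grown to 18 GB and 16 GB respectively.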
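
The same two shared pool components can also be checked directly with SQL, which is useful for monitoring whether they start growing again after the restart. The queries below are a sketch: the first reads the current values from `GV$SGASTAT`, and the second reads the AWR history (which, like the AWR report itself, requires the Diagnostics Pack license). The component names are the ones quoted from the AWR report above.

```sql
-- Current size of the two GES allocations in the shared pool, per instance.
SELECT inst_id,
       name,
       ROUND(SUM(bytes) / 1024 / 1024 / 1024, 2) AS size_gb
FROM   gv$sgastat
WHERE  pool = 'shared pool'
AND    name IN ('ges enqueues', 'ges resource dynamic')
GROUP  BY inst_id, name
ORDER  BY inst_id, name;

-- Growth over time from the AWR snapshots.
SELECT s.end_interval_time,
       h.instance_number,
       h.name,
       ROUND(SUM(h.bytes) / 1024 / 1024 / 1024, 2) AS size_gb
FROM   dba_hist_sgastat  h
JOIN   dba_hist_snapshot s
       ON  s.snap_id         = h.snap_id
       AND s.dbid            = h.dbid
       AND s.instance_number = h.instance_number
WHERE  h.pool = 'shared pool'
AND    h.name IN ('ges enqueues', 'ges resource dynamic')
GROUP  BY s.end_interval_time, h.instance_number, h.name
ORDER  BY s.end_interval_time, h.instance_number, h.name;
```

A steady climb across snapshots, rather than a plateau, would be consistent with the behavior described in the MOS note.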

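References:
The symptoms match the MOS note "ORA-04031 Errors Occurring with High "ges resource dynamic" & "ges enqueues" Memory Usage In The Shared Pool" (Doc ID 2063751.1). According to the note, this is an Oracle bug that affects Oracle RAC 12.1.0.1 through 12.1.0.2 and is fixed in Oracle RAC 12.2. This database runs Oracle RAC 12.1.0.2, so it falls within the affected range.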

Resolution:
The fix is to apply a patch: 21373473. The customer is on 12.1.0.2 and needs patch 21260431.

The temporary workaround is to set: _GES_DIRECT_FREE_RES_TYPE="CTARAHDXBB"
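
As a sketch of how the workaround might be applied: hidden (underscore) parameters must be enclosed in double quotes in `ALTER SYSTEM`. The choice of `SCOPE = SPFILE` with `SID = '*'` is an assumption made here so that the setting applies to both RAC instances at their next restart. The MOS note or Oracle Support should be consulted for the exact recommended command, since hidden parameters should only be changed under Support's guidance.

```sql
-- Assumption: write the setting to the spfile for all instances;
-- it takes effect at the next instance restart.
ALTER SYSTEM SET "_ges_direct_free_res_type" = 'CTARAHDXBB'
  SCOPE = SPFILE SID = '*';

-- Confirm that the value is recorded in the spfile.
SELECT sid, name, value
FROM   v$spparameter
WHERE  name = '_ges_direct_free_res_type';
```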
