LGWR waits for event ‘DLM cross inst call completion’ 故障排除
点击上方"蓝字"
关注我们,享更多干货!
客户一套Oracle 19c Dataguard的数据库环境,Standby端总是会间隔性出现较大GAP,同时DB alert log日志出现LGWR (ospid: 105521) waits for event ‘DLM cross inst call completion’ for N secs. 的现象,Standby端并未对外提供查询,同时也禁用了多实例日志应用,操作系统资源空闲,LMS进程个数正常,如果关闭其它节点只留apply log并不存在该问题。DLM是Distributed Lock Manager属于RAC架构中核心机制,实现多节点资源共享调度,通过interconnect Network传递请求。下面简单记录一下这个案例:
db alert log
PR00 (PID:109603): Media Recovery Log +ARCH/anbob1/ARCHIVELOG/2021_07_12/thread_3_seq_13586.1479.1077669291
2021-07-12T20:25:29.643687+08:00
PR00 (PID:109603): Media Recovery Log +ARCH/anbob1/ARCHIVELOG/2021_07_12/thread_2_seq_14361.1072.1077669019
2021-07-12T20:29:38.183656+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 1 secs.
2021-07-12T20:29:48.137737+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 2 secs.
2021-07-12T20:31:21.952345+08:00rfs (PID:113884): Selected LNO:26 for T-2.S-14456 dbid 3902007743 branch 1037635587
2021-07-12T20:31:21.987333+08:00rfs (PID:114704): Error ORA-235 occurred during an un-locked control filerfs (PID:114704): transaction. This error can be ignored. The controlrfs (PID:114704): file transaction will be retried.
2021-07-12T20:31:43.532600+08:00
ARC2 (PID:106404): Archived Log entry 9591 added for T-2.S-14455 ID 0xe894b1bf LAD:1
2021-07-12T20:31:47.151671+08:00rfs (PID:113882): Selected LNO:31 for T-3.S-13731 dbid 3902007743 branch 1037635587
2021-07-12T20:31:49.116049+08:00rfs (PID:113880): Selected LNO:22 for T-1.S-13006 dbid 3902007743 branch 1037635587
2021-07-12T20:31:53.393547+08:00
ARC3 (PID:106408): Archived Log entry 9592 added for T-1.S-13005 ID 0xe894b1bf LAD:1
2021-07-12T20:32:02.346585+08:00
ARC2 (PID:106404): Archived Log entry 9593 added for T-3.S-13730 ID 0xe894b1bf LAD:1
2021-07-12T20:33:13.805344+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 0 secs.
2021-07-12T20:33:13.805470+08:00
LGWR (ospid: 105521) is hung in an acceptable location (inwait 0x1.ffff).
2021-07-12T20:33:21.196764+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 2 secs.
2021-07-12T20:33:31.310737+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 0 secs.
2021-07-12T20:33:41.223781+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 1 secs.
2021-07-12T20:33:51.205776+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 2 secs.
2021-07-12T20:34:01.307770+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 0 secs.
2021-07-12T20:34:25.440231+08:00
PR00 (PID:109603): Media Recovery Log +ARCH/anbob1/ARCHIVELOG/2021_07_12/thread_2_seq_14362.1867.1077670807
2021-07-12T20:34:44.864009+08:00
PR00 (PID:109603): Media Recovery Log +ARCH/anbob1/ARCHIVELOG/2021_07_12/thread_3_seq_13587.691.1077670845
2021-07-12T20:34:45.204773+08:00
PR00 (PID:109603): Media Recovery Log +ARCH/anbob1/ARCHIVELOG/2021_07_12/thread_1_seq_12934.1156.1077670917
2021-07-12T20:36:09.378685+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 2 secs.
2021-07-12T20:36:19.341635+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 0 secs.
2021-07-12T20:36:28.416573+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 0 secs.
2021-07-12T20:36:38.375742+08:00
LGWR (ospid: 105521) waits for event 'DLM cross inst call completion' for 1 secs.
LGWR trace
*** 2021-07-12T20:33:43.793041+08:00 ((4))
Received ORADEBUG command (#235) 'dump KSTDUMPCURPROC 1' from process '105470'
-------------------------------------------------------------------------------
Trace Bucket Dump Begin: default bucket for process 47 (osid: 105521, LGWR)
CDB_NAME(CON_ID):CON_UID:TIME(*=approx):SEQ:COMPONENT:FILE@LINE:FUNCTION:SECT/DUMP:SID:SERIAL#: [EVENT#:PID] DATA
-------------------------------------------------------------------------------
IRMSDB(4):3247498417:2021-07-12 20:33:42.784 :KJCI:kjci.c@1957:kjci_complete():4466:40278: freeing request 0x20fd651e8 (inst|inc|reqid)=(1|88|823031) with opcode=146 and completion status [DONE]
IRMSDB(4):3247498417:2021-07-12 20:33:42.784 :KJCI:kjci.c@1089:kjci_initreq():4466:40278: request 0x20fd651e8 (inst|inc|reqid)=(1|88|823032) with group (type|id)=(1|1), opcode=146, flags=0x0, msglen=56, where=[kqlmClusterMessage] to target instances=
IRMSDB(4):3247498417:2021-07-12 20:33:42.784 :KJCI:kjci.c@1091:kjci_initreq():4466:40278: 1 2
IRMSDB(4):3247498417:2021-07-12 20:33:42.784 :KJCI:kjci.c@1618:kjci_processcrq():4466:40278: processing reply 0x2cff2d4e8 for request 0x20fd651e8 (inst|inc|reqid)=(1|88|823032) with opcode=146 from callee (inst|pid|psn)=(1|36|1)
IRMSDB(4):3247498417:2021-07-12 20:33:42.784 :KJCI:kjci.c@1618:kjci_processcrq():4466:40278: processing reply 0x2cff2d718 for request 0x20fd651e8 (inst|inc|reqid)=(1|88|823032) with opcode=146 from callee (inst|pid|psn)=(2|36|1)
IRMSDB(4):3247498417:2021-07-12 20:33:42.784 :KJCI:kjci.c@1957:kjci_complete():4466:40278: freeing request 0x20fd651e8 (inst|inc|reqid)=(1|88|823032) with opcode=146 and completion status [DONE]
IRMSDB(4):3247498417:2021-07-12 20:33:42.785 :KJCI:kjci.c@1089:kjci_initreq():4466:40278: request 0x20fd651e8 (inst|inc|reqid)=(1|88|823033) with group (type|id)=(1|1), opcode=146, flags=0x0, msglen=56, where=[kqlmClusterMessage] to target instances=
IRMSDB(4):3247498417:2021-07-12 20:33:42.785 :KJCI:kjci.c@1091:kjci_initreq():4466:40278: 1 2
IRMSDB(4):3247498417:2021-07-12 20:33:42.785 :KJCI:kjci.c@1618:kjci_processcrq():4466:40278: processing reply 0x2cff2d4e8 for request 0x20fd651e8 (inst|inc|reqid)=(1|88|823033) with opcode=146 from callee (inst|pid|psn)=(1|36|1)
IRMSDB(4):3247498417:2021-07-12 20:33:42.785 :KJCI:kjci.c@1618:kjci_processcrq():4466:40278: processing reply 0x2cff2d718 for request 0x20fd651e8 (inst|inc|reqid)=(1|88|823033) with opcode=146 from callee (inst|pid|psn)=(2|36|1)
IRMSDB(4):3247498417:2021-07-12 20:33:42.785 :KJCI:kjci.c@1957:kjci_complete():4466:40278: freeing request 0x20fd651e8 (inst|inc|reqid)=(1|88|823033) with opcode=146 and completion status [DONE]
IRMSDB(4):3247498417:2021-07-12 20:33:42.785 :KJCI:kjci.c@1089:kjci_initreq():4466:40278: request 0x20fd651e8 (inst|inc|reqid)=(1|88|823034) with group (type|id)=(1|1), opcode=146, flags=0x0, msglen=56, where=[kqlmClusterMessage] to target instances=
IRMSDB(4):3247498417:2021-07-12 20:33:42.785 :KJCI:kjci.c@1091:kjci_initreq():4466:40278: 1 2
**KJCJ ** ==> (kjci)_processcrq – kernel lock management communication cross instance call
跨节点的通信,MOS中不存在已知BUG。先分析网络问题,也可以从进程blocker做SSD或查看hangmgr Trace。Oracle 19c CRS中AHF框架自带了OSW。
OSW netstat data
zzz ***Tue Jul 13 00:59:51 CST 2021
...
#kernel
IpInReceives 1456201695 0.0
IpInHdrErrors 0 0.0
IpInAddrErrors 0 0.0
IpForwDatagrams 0 0.0
IpInUnknownProtos 0 0.0
IpInDiscards 0 0.0
IpInDelivers 1085210966 0.0
IpOutRequests 1007206469 0.0
IpOutDiscards 5280 0.0
IpOutNoRoutes 8 0.0
IpReasmTimeout 6333500 0.0
IpReasmReqds 408470736 0.0
IpReasmOKs 37504539 0.0
IpReasmFails 8651478 0.0
IpFragOKs 29029579 0.0
Note:
当前存在较高的IP重组失败包,这是一个累计值,下面可以查看日常变化。
查看日常IP重组失败情况
awk '/zzz/{d=$3"/"$4" "$5}/IpReasmFails/{curr=$2;diff=curr-prev;if(diff>5)print d,diff,prev,curr;prev=curr}' *.dat
Jul/13 00:00:16 8620039 8620039
Jul/13 00:00:46 185 8620039 8620224
Jul/13 00:01:16 242 8620224 8620466
Jul/13 00:01:46 324 8620466 8620790
Jul/13 00:02:16 279 8620790 8621069
Jul/13 00:02:46 325 8621069 8621394
Jul/13 00:03:16 325 8621394 8621719
Jul/13 00:03:46 247 8621719 8621966
Jul/13 00:04:16 246 8621966 8622212
Jul/13 00:04:46 210 8622212 8622422
Jul/13 00:05:16 327 8622422 8622749
Jul/13 00:05:46 247 8622749 8622996
Jul/13 00:06:16 238 8622996 8623234
Jul/13 00:06:46 219 8623234 8623453
Jul/13 00:07:16 262 8623453 8623715
Jul/13 00:07:46 254 8623715 8623969
Jul/13 00:08:16 179 8623969 8624148
Jul/13 00:08:46 294 8624148 8624442
Note:
可见平时也存在较高的IP重组失败,下面尝试使用ping验证网络
使用ping验证
— on node1
ping -s 4000 {node2-privateIP}
Note:
这里忘记保留历史输出,发现有12% package loss,说明当前和心跳网络并不健康,不过使用的是两个网卡做的BOND,当前是Active-Backup主备模式,可以尝试切换另一个网卡。
网卡切换
cat /proc/net/bonding/bond0
Note:
检查到当前主卡是ens9f0,切换到备卡ens9f1
ifenslave -c bond0 ens9f1
做了主备网卡切换后:ping正常、IP重组失败消失、DLM cross inst call completion未在出现、DG同步正常、问题得到解决。
墨天轮原文链接:https://www.modb.pro/db/84815(复制链接至浏览器或点击文末阅读原文查看)
关于作者
张维照,云和恩墨技术总监,Oracle ACE-A。2006年起从事数据库管理工作,2009年起从事ORACLE DBA维护工作,十余年来专注于Database 技术和架构的研究,热衷于oracle数据库故障诊断、性能优化、内部原理、新特性的学习与分享,在BLOG分享大量的学习和案例经验。从事过多套TB级省级工商、医疗、交通、人社、政府、电信运营商等行业数据库项目从业经验。
END
推荐阅读:267页!2020年度数据库技术年刊
推荐下载:2020数据技术嘉年华PPT下载
2020数据技术嘉年华近50个PPT下载、视频回放已上传墨天轮平台,可在“数据和云”公众号回复关键词“2020DTC”获得!
你知道吗?我们的视频号里已经发布了很多精彩的内容,快去看看吧!↓↓↓
点击下图查看更多 ↓
云和恩墨大讲堂 | 一个分享交流的地方
长按,识别二维码,加入万人交流社群
请备注:云和恩墨大讲堂
点个“在看”
你的喜欢会被看到❤
LGWR waits for event ‘DLM cross inst call completion’ 故障排除相关推荐
- 饥荒联机版Mod开发——常用inst方法(八)
饥荒联机版Mod开发--常用inst方法(八) 前言 生成实体 监听/推送事件 使用标签 增删组件 网络组件 定时/阶段任务 回调函数 父子实体 平台 删除 位置,角度 显示/隐藏 判断 大脑 状态图 ...
- 19、修改和删除事件(ALTER/DROP EVENT)
介绍 MySQL 删除和修改事件的方法. 修改事件 在 MySQL 中,事件创建之后,可以使用 ALTER EVENT 语句修改其定义和相关属性. 修改事件的语法格式如下: ALTER EVENT e ...
- mysql 手动执行event_MYSQL 定时自动执行EVENT
MySQL从5.1开始支持EVENT功能,类似Oracle和MSSQL的定时任务job功能.有了这个功能之后我们就可以让MySQL自动的执行存储过程来实现数据汇总等功能了,不用像以前哪样手动操作完成了 ...
- What is event bubbling and capturing?
What is event bubbling and capturing? 答案1 Event bubbling and capturing are two ways of event propaga ...
- 奥的斯服务器显示dlm,奥的斯ACD系列电梯DLM故障的分析与处理
DLM故障的定义 ACD2/3/4电梯GECB板OTIS servicetool M-1-1-1-1提示DLM:Door Lock Monitor门锁监控, 电梯处于IDL状态下,没有开门指令时,轿门 ...
- MySQL CREATE EVENT创建任务计划定时执行任务
自 MySQL5.1.6起,增加了一个非常有特色的功能–事件调度器(Event Scheduler),可以用做定时执行某些特定任务(例如:删除记录.对数据进行汇总等等),来取代原先只能由操作系统的计划 ...
- PBlaze6 6530系列SSD的“黑匣子”——Persistent Event Log介绍
在最新发布的PBlaze6 6530系列企业级PCIe 4.0 NVMe SSD中,我们加入了Persistent Event Log(简称"PEL")持久化事件日志功能.它是NV ...
- oracle gc buffur,【性能調優】Oracle AWR報告指標全解析
1-1 內存參數大小 Cache Sizes BeginEnd ~~~~~~~~~~~ ---------- ----- ...
- Oracle AWR报告指标全解析-11011552
1-5 Top 5 Timed Events Waits : 该等待事件发生的次数, 对于DB CPU此项不可用 Times : 该等待事件消耗的总计时间,单位为秒, 对于DB CPU 而言是前台进程 ...
最新文章
- 系统集成项目管理工程师软考辅导——3年真题透解与全真模拟
- mysql中鼠标光标消失了_为什么我这里没有显示鼠标的悬停可改变页面颜色,以为什么我加载了mysql的jar文件还是不能显示报表的内容呢?...
- Apache POI和EasyExcel 第六集:Apache POI的Excel读取单元格中的计算公式
- 防盗链技术底层实现原理分析
- Linux编译C没有文件名,crt1.o linux x64上没有这样的文件c编译错误
- 【Python】Could not find a version that satisfies the requirement cv2 (from versions: ) 的解决方案
- Unity搭建简单的图片服务器
- c#连接sqlserver
- Atitit 微信小程序的部署流程文档 目录 1.1. 设置https 参照 Atitit tomcat linux 常用命令	1 1.2. 增加证书 腾讯云和阿里云都可申请免费证书,但要一天
- VS2005 执行控制台程序的时候,窗口一闪就没了的问题
- ODAC的tnsnames.ora文件
- IR2104全桥驱动的自举问题
- 极客空间-MySQL实战45天-第三天
- matlab数组输出字符,Matlab字符型数组的输入输出
- 软件实施人员具备的技能和素养
- 根据身份证号(18/15)计算年龄、出生日期、性别
- GraphicsMagick实现云服务商基础图片处理
- 美国芯片陆续转向,或许为当初的做法后悔不迭,芯片补贴也难改局面
- java中父类创建子类的语法_Java 语言中,所创建的子类都应有一个父类。( )_学小易找答案...
- wordpress文章页面添加版权信息的四种方法