oracle latch chain, ORACLE latch AND mutex: an in-depth explanation

This post was last edited by sunyunyi on 2018-3-13 10:06

About the author:

----------------------------------------------------

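@ Sun Xianpeng, Oracle technical expert at 海天起点, with ten years of industry experience

@ Well versed in Oracle internals; specializes in tuning and in solving hard-to-diagnose problems

@ Committed to helping customers solve production problems and improve production efficiency.

@ Hobbies: calligraphy, the I Ching (Zhouyi), traditional Chinese medicine. WeChat: sunyunyi_sun

@ The I Ching says: study the principles until they become second nature, so that they can be put to use!

@ Tuning is like harmonizing yin and yang; how hard that is!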

-----------------------------------------------------

ORACLE latch AND mutex: an in-depth explanation

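Purpose of latches:

Serialize access to data structures in the SGA

Protect the allocation of shared memory

Latch properties:

A low-level lock

Differences between enqueues and latches:

                     enqueue       latch
Access modes         many modes    almost always exclusive
Request ordering     FIFO          no fixed order
Atomic               NO            YES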

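Example: to update a row, the session first takes the cache buffers chains (CBC) latch in the buffer cache and walks the corresponding hash chain, examining the buffer headers on it. [It skips buffers whose address does not match, skips CR copies, and waits on buffers that are being read; if every matching buffer is a CR copy, it copies one and makes the copy the current buffer. If a current buffer exists, it checks lock compatibility: if compatible, it gets the buffer lock; if not, it waits for the buffer lock, which shows up as the buffer busy waits event. Note that the buffer lock does not use the enqueue mechanism; it is controlled by internal code.] The session then pins the buffer and releases the CBC latch, locks the row in the block through a TX enqueue, updates the data, and releases the pin. From then on, the row lock is held for as long as the transaction lasts.

As this example shows, a latch protects a memory structure during a brief, fast access, after which the session moves on to a higher-level LOCK that is held for much longer.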

A latch is implemented as a 32-bit word at a memory address: 0 means the latch is not held, and a nonzero value means it is held.
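The "0 means free, nonzero means held" rule can be illustrated with a minimal Python sketch. This is not Oracle code: the class name `LatchWord` and its methods are hypothetical, and a `threading.Lock` stands in for the atomic test-and-set instruction that a real latch implementation would rely on.

```python
import threading


class LatchWord:
    """Toy model of a latch word: 0 = free, nonzero = held (hypothetical names)."""

    def __init__(self):
        self._word = 0
        # Stand-in for the hardware atomic instruction (assumption of this sketch).
        self._atomic = threading.Lock()

    def try_get(self, holder_id):
        """Single test-and-set attempt; returns True if the latch was acquired."""
        if holder_id == 0:
            raise ValueError("holder_id must be nonzero, since 0 means 'free'")
        with self._atomic:
            if self._word == 0:
                self._word = holder_id
                return True
            return False

    def release(self, holder_id):
        """Reset the word to 0; only the current holder may do this."""
        with self._atomic:
            if self._word != holder_id:
                raise RuntimeError("latch is not held by this holder")
            self._word = 0


latch = LatchWord()
print(latch.try_get(7))   # first request finds the word at 0 and takes it
print(latch.try_get(9))   # second request sees a nonzero word and fails
```

A failed `try_get` is where a real process would either give up (no-wait mode) or start spinning (wait mode).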

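Latch types:

Exclusive latches and shared latches; shared latches are rarely used.

Properties:

A latch is defined as either a parent latch or a solitary latch.

parent latch: allocated at compile time and has a set of child latches. Child latches inherit the properties of the parent latch and are allocated automatically when the instance starts.

solitary latch: has no child latches, for example shared pool and redo allocation.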

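Several flags can be set when a latch is created:

PR2: creates a parent latch of which two child latches may be requested in willing-to-wait mode at the same time; the KSL layer enforces the request order to prevent deadlock.

PAR: creates a parent latch of which any two child latches may be requested in any order, but only one of them may be requested in willing-to-wait mode; the other must then be requested in no-wait mode.

long: a latch with relatively long wait times that uses the post/wait mechanism; a waiter goes to sleep and gives up the CPU.

SHL: a shared latch that several processes can access concurrently in read-only mode.

For example:

PAR: cache buffers lru chain, cache buffers chains -- this is why the latch section of an AWR report counts NoWait Requests

PR2: library cache -- mutexes are used from 10.2 onward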

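Latch level:

Most latches have a level from 0 to 8. To prevent deadlock, latches must be acquired in a strict order: once a process holds a latch, it may not request another latch at the same or a lower level, unless that request is made in no-wait mode.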
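The level rule can be written down as a small check. The following Python sketch only restates the rule from the paragraph above; the function name `may_request` and its parameters are hypothetical and do not correspond to anything in the KSL layer.

```python
def may_request(held_levels, requested_level, willing_to_wait=True):
    """Return True if a request respects the latch level ordering rule.

    held_levels     -- levels of the latches the process already holds
    requested_level -- level of the latch being requested
    willing_to_wait -- False models a no-wait request, which is exempt
    """
    if not willing_to_wait:
        # A no-wait request cannot block, so it cannot take part in a deadlock.
        return True
    # A willing-to-wait request must be strictly above every level already held.
    return all(requested_level > level for level in held_levels)


print(may_request([1, 3], 5))                          # higher level: allowed
print(may_request([1, 3], 3))                          # same level: refused
print(may_request([1, 3], 2, willing_to_wait=False))   # no-wait: allowed
```

Because every process climbs the levels in the same direction when it is prepared to wait, two processes can never each wait for a latch the other one holds.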

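Latch descriptions: the X$kslld table shows the description of each latch. The kslld structure holds the descriptive information for each latch and is stored in the PGA; a process reads this structure to obtain a latch's description.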

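no-wait mode:

There are two cases:

1: As with redo copy, many latches of the same type are available. If one is unavailable, the process requests the next one, and it stops to wait only when all of them are unavailable.

2: The latch is subject to the strict level rules. If the requested latch is free, the process takes it; otherwise, if the level of the current request is too high for the process to wait, it abandons the current attempt and queues again from the start.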

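wait mode:

The process requests the latch and takes it if it is free. Otherwise it spins, burning CPU without giving it up, because releasing the CPU and getting it back would require a context switch and another trip through the run queue, which costs more. After spinning it tries again, and spins again if it still fails. If the latch is still unavailable after 2000 such iterations, the process goes to sleep, and sleeping does release the CPU. The first sleep lasts 0.01 s and the next 0.02 s (in centiseconds: 1, 2, 2, 4, 4, ...), up to a ceiling of 200 centiseconds. If the process is holding another latch at the same time, the maximum sleep is 4 centiseconds.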
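The sleep schedule is easier to see as a list of numbers. The Python sketch below generates it from the figures given above (1, 2, 2, 4, 4, ... centiseconds, capped at 200, or at 4 when another latch is held). How the sequence continues past "4, 4" is an assumption of this sketch (each doubled value is used twice), and the function name `latch_sleep_schedule` is hypothetical.

```python
SPIN_COUNT = 2000  # spins before the first sleep, as stated in the text


def latch_sleep_schedule(n_sleeps, holding_other_latch=False):
    """Return the first n_sleeps sleep durations in centiseconds."""
    cap = 4 if holding_other_latch else 200
    value = 1
    schedule = [min(value, cap)]           # the first sleep: 1 centisecond
    while len(schedule) < n_sleeps:
        value *= 2                         # exponential backoff
        for _ in range(2):                 # assumed: each doubled value is used twice
            if len(schedule) < n_sleeps:
                schedule.append(min(value, cap))
    return schedule[:n_sleeps]


print(latch_sleep_schedule(5))                             # [1, 2, 2, 4, 4]
print(latch_sleep_schedule(8, holding_other_latch=True))   # [1, 2, 2, 4, 4, 4, 4, 4]
```

The low cap for a process that already holds a latch keeps it from sleeping for long while it is itself blocking others.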

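latch release:

This applies only to LONG latches. A wait list stores the sessions that are waiting; when the current session releases the latch, the session at the head of the list is posted and gets the latch.

latch cleanup:

If a process still cannot get the latch after four sleeps, then on the fifth sleep it asks PMON to check whether the holder is still alive. If the holder is dead, PMON cleans up the latch. If the holder is alive, there is latch contention (the holder information may have changed during the sleep), or it points to an operating-system scheduling problem in which the process cannot get CPU time.

Today latches are used mainly in the buffer cache, in redo, and in shared pool memory allocation.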

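MUTEX:

Starting with 10.2, mutexes are used mainly in the library cache, so we first need a clear picture of the library cache structure.

As in the buffer cache, each cursor is hashed into a bucket. Before 10.2 the buckets were protected by latches, and the handles hanging off a bucket were likewise protected by the library cache latch. One child latch protected many buckets, as described in the latch overview above, and this caused contention on the latch itself, for example when many concurrent sessions accessed the same child latch. Mutexes are different: each object has its own mutex. Every bucket is protected by its own mutex and every handle is protected by its own mutex, which greatly reduces contention. A mutex also has a shorter code path than a latch, is faster to operate, and needs no deadlock detection. Mutexes replaced only the latches; the library cache lock and pin on a handle still exist.

The following SQL lists the contents of each library cache bucket: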

set lines 1200 pages 1200

col kglnaobj for a70

col kglnaown for a15

col KGLNAHSV for a30

prompt kglhdbid is the bucket id

prompt kglnahsh is the hash value [= P1 of mutex waits; = HASH_VALUE in V$SQL]

prompt kglhdnsp is the namespace

prompt kglnahsv is the full hash value

select kglhdbid,kglnahsh,kglnaown,kglnaobj,kglhdnsp,kglnahsv from x$kglob;

V$SGAINFO:

Shared Pool Size 704643072 Yes --700M

select max(kglhdbid) from x$kglob

131071 -- the highest bucket id, which gives the total number of hash_buckets allocated

select max(links) from (

select kglhdbid,count(kglnahsh) links from x$kglob group by kglhdbid order by count(kglnahsh));

MAX(LINKS)

----------

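36 -- the fullest hash bucket holds 36 handles; most buckets hold just 1.

A note here: since 9i, Oracle allocates at least 509 buckets by default. When the average number of handles per bucket reaches 2, the number of buckets is doubled and the handles are redistributed across the buckets. This also shows that the buckets themselves do not really cause contention in the library cache, because most buckets hold only one handle. Most contention therefore comes from concurrent operations on the same handle, that is, on a shared CURSOR.

Now let's see how a SQL statement maps to a bucket: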

SQL> select kglhdbid,kglnahsh,kglnaown,kglnaobj,kglhdnsp,kglnahsv from x$kglob where kglnaobj='select count(*) from dba_objects';

KGLHDBID KGLNAHSH KGLNAOWN KGLNAOBJ KGLHDNSP KGLNAHSV

---------- ---------- --------------- ---------------------------------------------------------------------- ---------- ------------------------------

48235 2935929963 select count(*) from dba_objects 0 46d40caf7797eb73f25653bdaefebc6b

SQL> select ADDRESS,hash_value from v$sql where sql_text ='select count(*) from dba_objects';

ADDRESS HASH_VALUE

---------------- ----------

000000007A9459F8 2935929963 = KGLNAHSH in x$kglob
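The numbers in the two query results above are consistent with a simple relationship, which the following Python sketch checks: the hash value is the last 8 hex digits of the full hash, and the bucket id is the hash value modulo the number of buckets. The bucket count of 131072 assumes bucket ids start at 0 (the highest id seen was 131071). This relationship is inferred here from the figures in this post; it is a worked check, not a statement of Oracle's internal algorithm.

```python
FULL_HASH = "46d40caf7797eb73f25653bdaefebc6b"  # KGLNAHSV from the x$kglob output
N_BUCKETS = 131071 + 1                          # max(kglhdbid) + 1, assuming ids start at 0

# The last 8 hex digits (32 bits) of the full hash give KGLNAHSH / V$SQL.HASH_VALUE.
hash_value = int(FULL_HASH[-8:], 16)

# With a power-of-two bucket count, the modulo keeps the low 17 bits of the hash.
bucket_id = hash_value % N_BUCKETS

print(hash_value)  # 2935929963, matches KGLNAHSH and HASH_VALUE
print(bucket_id)   # 48235, matches KGLHDBID
```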

library cache trace information:

ALTER SESSION SET EVENTS 'immediate trace name library_cache level 4';

select value from v$diag_info where name = 'Default Trace File'

trace:

/#=48235

Bucket: #=48235 Mutex=0x813f58d8(425201762304, 12, 0, 6)

LibraryHandle: Address=0x7a9459f8 Hash=aefebc6b LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD

ObjectName: Name=select count(*) from dba_objects

FullHashValue=46d40caf7797eb73f25653bdaefebc6b Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=1 ContainerUid=1 Identifier=2935929963 OwnerIdn=0

Statistics: InvalidationCount=0 ExecutionCount=2 LoadCount=2 ActiveLocks=0 TotalLockCount=2 TotalPinCount=1

Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=1 HandleInUse=1 HandleReferenceCount=0

Concurrency: DependencyMutex=0x7a945aa8(0, 1, 0, 0) Mutex=0x7a945b48(99, 41, 0, 6)

Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000]

WaitersLists:

Lock=0x7a945a88[0x7a945a88,0x7a945a88]

Pin=0x7a945a68[0x7a945a68,0x7a945a68]

LoadLock=0x7a945ae0[0x7a945ae0,0x7a945ae0]

Timestamp: Current=03-09-2018 07:04:18

HandleReference: Address=0x7a945be0 Handle=(nil) Flags=[00]

LibraryObject: Address=0x693a91e0 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]

ChildTable: size='16'

Child: id='0' Table=0x693aa060 Reference=0x693a9b28 Handle=0x69b487b0

NamespaceDump:

Parent Cursor: sql_id=g4pkmrqrgxg3b parent=0x693a92b0 maxchild=1 plk=n ppn=n prsfcnt=0 obscnt=0

Mutex-related wait events:

cursor: mutex S

A session waits on this event when it is requesting a mutex in shared mode, when

another session is currently holding this mutex in exclusive mode on the same cursor

object.

Parameter Description

P1 Hash value of cursor

P2 Mutex value (top 2 bytes contain SID holding mutex in exclusive

mode, and bottom two bytes usually hold the value 0)

P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps

cursor: mutex X

The session requests the mutex for a cursor object in exclusive mode, and it must wait

because the resource is busy. The mutex is busy because either the mutex is being held

in exclusive mode by another session or the mutex is being held shared by one or more

sessions. The existing mutex holder(s) must release the mutex before the mutex can be

granted exclusively.

Parameter Description

P1 Hash value of cursor

P2 Mutex value (top 2 bytes contain SID holding mutex in exclusive

mode, and bottom two bytes usually hold the value 0)

P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps

cursor: pin S

A session waits on this event when it wants to update a shared mutex pin and another

session is currently in the process of updating a shared mutex pin for the same cursor

object. This wait event should rarely be seen because a shared mutex pin update is

very fast.

Wait Time: Microseconds

Parameter Description

P1 Hash value of cursor

P2 Mutex value (top 2 bytes contain SID holding mutex in exclusive

mode, and bottom two bytes usually hold the value 0)

P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps

cursor: pin S wait on X

A session waits for this event when it is requesting a shared mutex pin and another

session is holding an exclusive mutex pin on the same cursor object.

Wait Time: Microseconds

Parameter Description

P1 Hash value of cursor

P2 Mutex value (top 2 bytes contain SID holding mutex in exclusive

mode, and bottom two bytes usually hold the value 0)

P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps

cursor: pin X

A session waits on this event when it is requesting an exclusive mutex pin for a cursor

object and it must wait because the resource is busy. The mutex pin for a cursor object

can be busy either because a session is already holding it exclusive, or there are one or

more sessions which are holding shared mutex pin(s). The exclusive waiter must wait

until all holders of the pin for that cursor object have released it, before it can be

granted.

Wait Time: Microseconds

Parameter Description

P1 Hash value of cursor

P2 Mutex value (top 2 bytes contain SID holding mutex in exclusive

mode, and bottom two bytes usually hold the value 0)

P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps
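The P2 layout described in the parameter tables above can be decoded with a shift and a mask. The Python sketch below is a hedged illustration: the function name `decode_mutex_value` is hypothetical, and the `half_bits` parameter is an assumption added because the width of each half may differ by platform. As a side observation, the bucket mutex value 425201762304 printed in the library cache trace earlier in this post equals 99 * 2^32, so it would decode to 99 under a four-byte split; whether that trace field should be read this way is an assumption.

```python
def decode_mutex_value(value, half_bits=16):
    """Split a mutex value into (holder SID, low half).

    half_bits=16 follows the 'top 2 bytes / bottom two bytes' wording of the
    parameter tables; other widths are an assumption of this sketch.
    """
    mask = (1 << half_bits) - 1
    holder_sid = (value >> half_bits) & mask
    low_half = value & mask
    return holder_sid, low_half


# Two-byte halves, as in the parameter description: SID 99 in the top half.
print(decode_mutex_value(99 << 16))                     # (99, 0)

# The value from the trace, read with four-byte halves (assumption).
print(decode_mutex_value(425201762304, half_bits=32))   # (99, 0)
```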

Of these, cursor: pin S is the hardest to understand:

A session waits on this event when it wants to update a shared mutex pin and another

session is currently in the process of updating a shared mutex pin for the same cursor

object. This wait event should rarely be seen because a shared mutex pin update is

very fast.

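One process is updating a shared mutex pin while another process is already updating a shared mutex pin on the same cursor. Why does an update need an S-mode request at all?

And what kind of operation is updating a shared mutex pin? Here I quote the summary by Liu Xiangbing (刘向兵), which gets it exactly right: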

************************************************************************************************

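The mutex data structure stores a Holder id, a Ref Count, and other mutex-related statistics. The Holder id is the session id (v$session.sid) of the session holding the mutex. Note in particular that the Ref Count is the number of processes concurrently referencing the mutex in S mode.

When a mutex is held in X mode, the Holder id is the session id of the holder and the Ref Count is 0.

Each shared (S mode) holder only increments the Ref Count on the mutex, so a large number of sessions can reference one mutex in S mode concurrently.

Note, however, that updates to the ref count are serialized, so that no update is lost and the ref count on the mutex stays correct.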

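Now let's walk through a shared mutex pin during cursor execution in detail:

A process requests a mutex in SHRD mode and tries to modify the mutex's Holder ID temporarily.

If the mutex is not being updated by anyone else, the session sets the Holder id to its own sid, then increments the ref count, and then clears the Holder id on the mutex. In short, the Holder id is what actually provides the concurrency control. If the Holder id is set, the mutex is either held in EXCL mode or another process is in the middle of requesting it in S mode (for example, updating the Ref Count). The Holder id is set temporarily during a Ref Count update precisely to keep other processes from updating the mutex concurrently. These examples show that a mutex can serve both as a latch for concurrency control and as a pin.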

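If the Holder id is already set, the requesting process may enter a wait event. For example, if the current holder is updating the mutex in X mode, the requester waits on "cursor: pin S wait on X". If instead the holder does not "really hold" the mutex and is only updating its Ref Count, the second process waits on 'cursor: pin S'. The Ref Count update itself is very fast; it is a lightweight operation. While the first process is updating the mutex, subsequent requesters spin in a loop up to 255 times waiting for it to finish. Once the mutex no longer has a Holder id (for example, the first process has finished updating the Ref Count), the requester sets the Holder ID to its own SID, updates the Ref Count, and clears the Holder id. If the mutex is still not released after 255 spins, the process enters a wait and no longer runs on the CPU.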

**************************************************************************************************
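The protocol in the quoted summary can be sketched as a small single-threaded state machine. This Python sketch only models the states described in the quote (Holder id, Ref Count, and which wait event a blocked requester would report); the class `MutexSketch` and its method names are hypothetical, and real mutexes use atomic instructions and spinning rather than the explicit two-step calls shown here.

```python
class MutexSketch:
    """Toy model of the Holder id / Ref Count protocol (hypothetical names)."""

    def __init__(self):
        self.holder_id = None    # SID currently set as holder, or None
        self.ref_count = 0       # number of shared (S mode) references
        self.exclusive = False   # True while the holder really holds X mode

    def begin_shared_get(self, sid):
        """Step 1 of a shared get: claim the Holder id.

        Returns None on success, or the wait event a blocked requester would see.
        """
        if self.holder_id is not None:
            if self.exclusive:
                return "cursor: pin S wait on X"
            return "cursor: pin S"   # someone else is mid-update of the Ref Count
        self.holder_id = sid
        return None

    def finish_shared_get(self, sid):
        """Steps 2 and 3: increment the Ref Count, then clear the Holder id."""
        if self.holder_id != sid:
            raise RuntimeError("this session did not claim the Holder id")
        self.ref_count += 1
        self.holder_id = None

    def get_exclusive(self, sid):
        """An X mode get needs no holder and no shared references."""
        if self.holder_id is not None or self.ref_count > 0:
            return "cursor: pin X"
        self.holder_id = sid
        self.exclusive = True
        return None


m = MutexSketch()
print(m.begin_shared_get(10))   # None: session 10 claimed the Holder id
print(m.begin_shared_get(11))   # 'cursor: pin S': session 10 is mid-update
m.finish_shared_get(10)
print(m.ref_count)              # 1
```

The window between `begin_shared_get` and `finish_shared_get` is the brief moment in which a second session can collide with the first, which is why cursor: pin S only appears under very high execution rates.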

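Here I run an experiment to provoke the cursor: pin S event:

Open 8 sessions and run the following in all of them at the same time:

SQL>
declare
  stra varchar2(200);  -- text of the statement to execute repeatedly
begin
  stra := 'select * from dba_objects where rownum=1';
  -- execute the same statement in a tight loop so every session hits one cursor
  for i in 1..100000000 loop
    execute immediate stra;
  end loop;
end;
/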

USERNAME SID SQL_ID SQL_TEXT EVENT BLOCKER wait(s) SEQ# MODULE

------------ ------ --------------- ---------------------------- ------------------- -------- ------- ------- ----------

SYS 63 fqnbs9qz85bhw select * from dba_objects wh cursor: mutex X _ 0 21049 sqlplus@se

81 fqnbs9qz85bhw select * from dba_objects wh cursor: mutex X _ 0 43091 sqlplus@se

82 fqnbs9qz85bhw select * from dba_objects wh cursor: mutex X _ 0 40827 sqlplus@se

94 fqnbs9qz85bhw select * from dba_objects wh cursor: mutex X _ 0 24743 sqlplus@se

95 fqnbs9qz85bhw select * from dba_objects wh cursor: mutex X _ 0 44537 sqlplus@se

99 fqnbs9qz85bhw select * from dba_objects wh cursor: mutex X _ 0 41516 sqlplus@se

101 fqnbs9qz85bhw select * from dba_objects wh cursor: mutex X _ 0 33414 sqlplus@se

CPU usage is very high!!

set lines 1200 pages 1200

col event for a50

select * from (

select nvl(event,'on cpu') event,

count(*),

(ratio_to_report(count(*)) over() )*100 pct

from v$active_session_history

group by nvl(event,'on cpu')

order by count(*) desc )

where rownum<11;

EVENT COUNT(*) PCT

-------------------------------------------------- ---------- ----------

on cpu 6740 82.4666585

cursor: mutex X 1196 14.6335495

cursor: mutex S 115 1.40707207

cursor: pin S 85 1.04000979

oracle thread bootstrap 18 .220237367

os thread creation 10 .122354093

db file sequential read 3 .036706228

ADR block file read 3 .036706228

Sync ASM rebalance 1 .012235409

log file sync 1 .012235409

10 rows selected.

select * from (

select p1,

count(*),

(ratio_to_report(count(*)) over() )*100 pct

from v$active_session_history

where event = 'cursor: mutex X'

group by p1

order by count(*) desc )

where rownum<11;

P1 COUNT(*) PCT

---------- ---------- ----------

0 1179 100

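Why cursor: mutex X? That should not be possible! A session should not be requesting an X mutex on the parent cursor here, and what on earth is P1=0? A search on MOS shows:

p1=0 is a bug, Doc ID 16175381.8. Mutexes have quite a few bugs!

Perhaps it is because the dba_objects view sits on top of too many underlying tables, so let's build a simple table instead: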

create table tb_mux (id number);

insert into tb_mux values (1);

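SQL>
declare
  stra varchar2(200);  -- text of the statement to execute repeatedly
begin
  stra := 'select * from tb_mux';
  -- same tight loop as before, now against the single-row test table
  for i in 1..100000000 loop
    execute immediate stra;
  end loop;
end;
/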

USERNAME SID SQL_ID SQL_TEXT EVENT BLOCKER wait(s) SEQ# MODULE

------------ ------ --------------- ---------------------------- ------------------- -------- ------- ------- ----------

SYS 99 2runuy5h23sab declare stra cursor: pin S _ 0 30418 sqlplus@se

41 2runuy5h23sab declare stra cursor: pin S _ 0 20124 sqlplus@se

82 7tvqaaccd7bt3 select * from tb_mux cursor: pin S _ 0 29569 sqlplus@se

81 2runuy5h23sab declare stra cursor: pin S _ 0 21985 sqlplus@se

95 2runuy5h23sab declare stra cursor: pin S _ 0 23641 sqlplus@se

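This time we see a lot of cursor: pin S waits. The event appears when the same SQL is executed at a very high rate and is very rare in practice; it shows up here because I have 8 sessions running the loop at the same time.

cursor: pin S wait on X is the wait event we run into most often. See my earlier articles for details on it; I will not run an experiment for it here.

Severe mutex waits are mostly caused by excessive hard parsing, by bugs, or by a hot handle.

For excessive hard parsing, look at the version count statistics, use bind variables, and in extreme cases set cursor_sharing=force.

For a bug, apply the FIX.

A hot handle is more complicated. First you have to locate the hot handle, then use hotcopy (Doc ID 2194448.1); in general, avoid using hotcopy. It works by adding pid information to the SQL text, so that the handles of the same SQL executed by different processes are placed in different buckets. Experiments show that hotcopy only relieves the problem and does not solve it at the root. Still, the idea is worth borrowing! It has bugs of its own, too!

That more or less covers latches and mutexes! The material is abstract and takes some thinking to absorb!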

2017-12-21

Sun Xianpeng
