转帖,同时画蛇添足做了一些其他笔记

转:彻底搞清楚library cache lock的成因和解决方法

问题描述:
接到应用人员的报告,说是在任何对表CSNOZ629926699966的操作都会hang,包括desc CSNOZ629926699966,例如:

ora9i@cs_dc02:/ora9i > sqlplus pubuser/pubuser

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:11:06 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production

SQL> conn pubuser/pubuser
Connected.
SQL> desc CSNOZ629926699966

。。。

这个进程 hang 了

。。。

询问了一下之前有无特别的操作,业务人员说很久以前执行了脚本,但是该教本运行很久都没有结果,然后他就退出了会话,再之后,就出现了上面的情况。脚本内容如下:$ cat CSNOZ629926699966.sh
#!/bin/sh
sqlplus pubuser/pubuser@csmisc << EOF #use your username/password

create table CSNOZ629926699966 as select * from CSNOZ62992266cs
where mid not in ( select mid from pubuser.SUBSCRIPTION_BAK_200412@newdb where servid='020999011964' and status in ('A','B','S'));

exit;

================================

--------- 跟执行的脚本内从没太大关系,最根本的是跟强行退出会话有关,导致客户端虽然停止,数据库系统进程仍然存在,不能正常终止,似乎也不能在进行

==================================

解决过程:ora9i@cs_dc02:/ora9i > sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:19:13 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production

SQL> select * from v$lock where block=1;

no rows selected

SQL> /

no rows selected

SQL> /

no rows selected

SQL>
我们看到目前没有锁的信息

SQL> select xidusn, object_id, session_id, locked_mode from v$locked_object;

。。。

XIDUSN OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
14 18 37 3

。。。

SQL> /

。。。

XIDUSN OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
14 18 37 3

。。。

SQL> /

。。。

XIDUSN OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
14 18 37 3

。。。

SQL>
查找 v$locked_object,我们发现了一个可疑的会话,SID 37:

SQL> select object_name,owner,object_type from dba_objects where object_id=18;

。。。 。。。

OBJECT_NAME OWNER OBJECT_TYPE
------------------------------ ------------------------------ ------------------
OBJ$ SYS TABLE

。。。 。。。

SQL>

奇怪怎么一直有这个锁??
初步猜测是由于SID为37的会话执行了上面的DDL语句,并在语句未完成前异常退出,
造成了所有访问那个(DDL语句中涉及到的)对象的进程都hang了。

接下来我们看看等待事件:
SQL> select event,sid,p1,p2,p3 from v$session_wait where event not like 'SQL*%' and event not like 'rdbms%';

EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer 300 0 1
ges remote message 32 0 4
gcs remote message 64 0 5
gcs remote message 64 0 7
smon timer 300 0 19
library cache lock 1.3835E+19 1.3835E+19 30wakeup time manager 0 0 22

7 rows selected.

SQL> /

EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer 300 0 1
ges remote message 32 0 4
gcs remote message 64 0 5
gcs remote message 64 0 7
smon timer 300 0 19
library cache lock 1.3835E+19 1.3835E+19 30wakeup time manager 0 0 22

7 rows selected.

SQL> /

EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer 300 0 1
ges remote message 32 0 4
gcs remote message 64 0 5
gcs remote message 64 0 7
smon timer 300 0 19
library cache lock 1.3835E+19 1.3835E+19 30wakeup time manager 0 0 22

7 rows selected.

SQL> /

EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer 300 0 1
ges remote message 32 0 4
gcs remote message 64 0 5
gcs remote message 64 0 7
smon timer 300 0 19
library cache lock 1.3835E+19 1.3835E+19 30wakeup time manager 0 0 22

7 rows selected.

SQL>

我们注意到下面的事件:
EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
。。。

library cache lock 1.3835E+19 1.3835E+19 30

。。。

P1 是句柄地址(handle address),也就是'library cache lock'发生的地址。
P2 是一个状态对象,在这里,它表示在对象上加载的锁的地址(lock address)。
P1 和 P2都是科学计数发表示的10进制数。

这些信息再次证实了上面的猜测,SID 37阻塞了SID 30。

找出这两个可疑进程的sid和serial,然后对他们设置10046事件:SQL> select sid,serial# from v$session where sid in (30,37);

SID SERIAL#
---------- ----------
30 24167
37 2707

SQL> exec dbms_system.set_ev(30,24167,10046,12,'');

PL/SQL procedure successfully completed.

SQL> exec dbms_system.set_ev(37,2707,10046,12,'');

PL/SQL procedure successfully completed.

=================================

使用DBMS_SYSTEM.SET_EV的方法
a. 过和定义如下
DBMS_SYSTEM.SET_EV(
SI Binary_integer,
SE Binary_integer,
EV Binary_integer,
LE Binary_integer,
NM Binary_integer);

SI: 即v$session中的sid
SE:即v$session中的serial#
EV:要设置的事件
LE:要设置事件的级别
NM:名称

===============================================

彻底搞清楚library cache lock的成因和解决方法 收藏
 
问题描述:
接到应用人员的报告,说是在任何对表CSNOZ629926699966的操作都会hang,包括desc CSNOZ629926699966,例如:

ora9i@cs_dc02:/ora9i > sqlplus pubuser/pubuser

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:11:06 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production

SQL> conn pubuser/pubuser
Connected.
SQL> desc CSNOZ629926699966

。。。

这个进程 hang 了

。。。

询问了一下之前有无特别的操作,业务人员说很久以前执行了脚本,但是该教本运行很久都没有结果,然后他就退出了会话,再之后,就出现了上面的情况。脚本内容如下:$ cat CSNOZ629926699966.sh
#!/bin/sh
sqlplus pubuser/pubuser@csmisc << EOF #use your username/password

create table CSNOZ629926699966 as select * from CSNOZ62992266cs
where mid not in ( select mid from pubuser.SUBSCRIPTION_BAK_200412@newdb where servid='020999011964' and status in ('A','B','S'));

exit;
$
$
$
$

解决过程:ora9i@cs_dc02:/ora9i > sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:19:13 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production

SQL> select * from v$lock where block=1;

no rows selected

SQL> /

no rows selected

SQL> /

no rows selected

SQL>
我们看到目前没有锁的信息

SQL> select xidusn, object_id, session_id, locked_mode from v$locked_object;

。。。

XIDUSN OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
14 18 37 3

。。。

SQL> /

。。。

XIDUSN OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
14 18 37 3

。。。

SQL> /

。。。

XIDUSN OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
14 18 37 3

。。。

SQL>
查找 v$locked_object,我们发现了一个可疑的会话,SID 37:

SQL> select object_name,owner,object_type from dba_objects where object_id=18;

。。。 。。。

OBJECT_NAME OWNER OBJECT_TYPE
------------------------------ ------------------------------ ------------------
OBJ$ SYS TABLE

。。。 。。。

SQL>

奇怪怎么一直有这个锁??
初步猜测是由于SID为37的会话执行了上面的DDL语句,并在语句未完成前异常退出,
造成了所有访问那个(DDL语句中涉及到的)对象的进程都hang了。

接下来我们看看等待事件:
SQL> select event,sid,p1,p2,p3 from v$session_wait where event not like 'SQL*%' and event not like 'rdbms%';

EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer 300 0 1
ges remote message 32 0 4
gcs remote message 64 0 5
gcs remote message 64 0 7
smon timer 300 0 19
library cache lock 1.3835E+19 1.3835E+19 30wakeup time manager 0 0 22

7 rows selected.

SQL> /

EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer 300 0 1
ges remote message 32 0 4
gcs remote message 64 0 5
gcs remote message 64 0 7
smon timer 300 0 19
library cache lock 1.3835E+19 1.3835E+19 30wakeup time manager 0 0 22

7 rows selected.

SQL> /

EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer 300 0 1
ges remote message 32 0 4
gcs remote message 64 0 5
gcs remote message 64 0 7
smon timer 300 0 19
library cache lock 1.3835E+19 1.3835E+19 30wakeup time manager 0 0 22

7 rows selected.

SQL> /

EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer 300 0 1
ges remote message 32 0 4
gcs remote message 64 0 5
gcs remote message 64 0 7
smon timer 300 0 19
library cache lock 1.3835E+19 1.3835E+19 30wakeup time manager 0 0 22

7 rows selected.

SQL>

我们注意到下面的事件:
EVENT P1 P2 SID
---------------------------------------------------------------- ---------- ---------- ----------
。。。

library cache lock 1.3835E+19 1.3835E+19 30

。。。

P1 是句柄地址(handle address),也就是'library cache lock'发生的地址。
P2 是一个状态对象,在这里,它表示在对象上加载的锁的地址(lock address)。
P1 和 P2都是科学计数发表示的10进制数。

这些信息再次证实了上面的猜测,SID 37阻塞了SID 30。

找出这两个可疑进程的sid和serial,然后对他们设置10046事件:SQL> select sid,serial# from v$session where sid in (30,37);

SID SERIAL#
---------- ----------
30 24167
37 2707

SQL> exec dbms_system.set_ev(30,24167,10046,12,'');

PL/SQL procedure successfully completed.

SQL> exec dbms_system.set_ev(37,2707,10046,12,'');

PL/SQL procedure successfully completed.

SQL>

跟踪期间咱们再次测试一下,看看有没有其他线索。

新开一个进程,找出其sid, serial和spid等信息:ora9i@cs_dc01:/ora9i > sqlplus pubuser/pubuser

SQL> select distinct sid from v$mystat;

SID
----------
33

SQL> select sid,serial# from v$session where sid=33;

SID SERIAL#
---------- ----------
33 6639

SQL> SELECT SPID,PID FROM V$PROCESS WHERE ADDR=(SELECT PADDR FROM V$SESSION WHERE SID=37);

SPID PID
------------ ----------
20552 26

SQL> SELECT SPID,PID FROM V$PROCESS WHERE ADDR=(SELECT PADDR FROM V$SESSION WHERE SID=30);

SPID PID
------------ ----------
22580 28

SQL> show parameter dump

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
background_core_dump string partial
background_dump_dest string /ora9i/app/oracle/admin/csmisc
/bdump
core_dump_dest string /ora9i/app/oracle/admin/csmisc
/cdump
max_dump_file_size string UNLIMITED
shadow_core_dump string partial
user_dump_dest string /ora9i/app/oracle/admin/csmisc
/udump
SQL>

然后,再尝试对 CSNOZ629926699966 表进行操作
SQL> desc CSNOZ629926699966

。。。

还是hang住了。

于是中断这个操作(CTRL + C):

SQL> desc CSNOZ629926699966
ERROR:
ORA-01013: user requested cancel of current operation

SQL> select tname from tab where tname='CSNOZ629926699966';

no rows selected

SQL> 查看PUBUSER用户下的这个表,居然不存在!!

进一步证实了前面的猜测,也就是说会话37阻塞了其他所有操作表CSNOZ629926699966的会话,造成了进程的hang,当然,包括上面的SID 30和SID 33的DDL语句

现在,我们结束10046的事件跟踪:SQL> exec dbms_system.set_ev(30,24167,0,0,'');

PL/SQL procedure successfully completed.

SQL> exec dbms_system.set_ev(37,2707,0,0,'');

PL/SQL procedure successfully completed.

SQL>

根据上面记录的信息,我们知道这两个会话产生的跟踪信息分别为:
SID为30的会话,产生的跟踪文件为:/ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_22580.trc
SID为37的会话,产生的跟踪文件为:/ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_20552.trc

看看trace文件:
ora9i@cs_dc02:/ora9i > cd /ora9i/app/oracle/admin/csmisc/udump
ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > ll -tlc
total 4432
-rw-r----- 1 ora9i dba 332995 Jan 10 12:00 csmisc2_ora_22580.trc
-rw-r----- 1 ora9i dba 3168 Jan 10 11:59 csmisc2_ora_20552.trc-rw-r----- 1 ora9i dba 407133 Jan 7 15:10 csmisc2_ora_2708.trc
-rw-r----- 1 ora9i dba 640 Jan 7 14:48 csmisc2_ora_835.trc
-rw-r----- 1 ora9i dba 1590 Dec 30 22:50 csmisc2_ora_16244.trc
-rw-r----- 1 ora9i dba 1308403 Dec 30 22:44 csmisc2_ora_16033.trc
-rw-r----- 1 ora9i dba 616 Dec 28 14:16 csmisc2_ora_2176.trc
-rw-r----- 1 ora9i dba 644 Dec 8 18:22 csmisc2_ora_21083.trc
ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > mailx -s "csmisc2_ora_22580.trc" zhangdp@aspire-tech.com < csmisc2_ora_22580.trc
ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > mailx -s "csmisc2_ora_20552.trc" zhangdp@aspire-tech.com < csmisc2_ora_20552.trc
ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > exit

SQL>

我们看到SID为30的会话,产生的跟踪文件(csmisc2_ora_22580.trc)为的主要内容是:/ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_22580.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /ora9i/app/oracle/product/920
System name: HP-UX
Node name: cs_dc02
Release: B.11.11
Version: U
Machine: 9000/800
Instance name: csmisc2Redo thread mounted by this instance: 2
Oracle process number: 28Unix process pid: 22580, image: oracle@cs_dc02 (TNS V1-V3)

*** 2005-01-10 11:31:49.416
*** SESSION ID:(30.24167) 2005-01-10 11:31:49.354
WAIT #0: nam='library cache lock' ela= 507258 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 505686 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507678 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507595 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507880 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507317 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507703 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507683 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 508265 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507100 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507684 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 505889 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507731 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507650 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507604 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507698 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301

。。。 。。。

我们看到SID 30的跟踪文件中的等待事件就是在V$SESSION_WAIT中看到的'library cache lock' .

再看看SID为37的会话,产生的跟踪文件(csmisc2_ora_20552.trc)为的主要内容是:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /ora9i/app/oracle/product/920
System name: HP-UX
Node name: cs_dc02Release: B.11.11
Version: U
Machine: 9000/800
Instance name: csmisc2Redo thread mounted by this instance: 2
Oracle process number: 26
Unix process pid: 20552, image: oracle@cs_dc02 (TNS V1-V3)

*** 2005-01-10 11:33:22.702
*** SESSION ID:(37.2707) 2005-01-10 11:33:22.690
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:35:07.452
WAIT #1: nam='SQL*Net message from dblink' ela= 102293555 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 3 p1=675562835 p2=1 p3=0*** 2005-01-10 11:36:55.980
WAIT #1: nam='SQL*Net message from dblink' ela= 105969709 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:39:05.416
WAIT #1: nam='SQL*Net message from dblink' ela= 126390826 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:41:12.878
WAIT #1: nam='SQL*Net message from dblink' ela= 124461520 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:43:01.285
WAIT #1: nam='SQL*Net message from dblink' ela= 105859385 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:44:48.200
WAIT #1: nam='SQL*Net message from dblink' ela= 104397696 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0

。。。 。。。

现在我们来dump 系统状态(systemstate),看看更详细的信息。

首先简单的介绍一下 event systemstate。很多人把 systemstate 事件理解为dump发生的那一刻的系统内所有进程的信息,这是个错误的概念,事实上,
转储 system state 产生的跟踪文件是从dump那一刻开始到dump任务完成之间一段事件内的系统内所有进程的信息。

dump systemstate产生的跟踪文件包含了系统中所有进程的进程状态等信息。每个进程对应跟踪文件中的一段内容,反映该进程的状态信息,包括进程信息,会话信息,enqueues信息(主要是lock的信息),缓冲区的信息和该进程在SGA区中持有的(held)对象的状态等信息。

那么通常在什么情况下使用systemstate比较合适呢? Oracle推荐的使用systemstate事件的几种情况是:

数据库 hang 住了
数据库很慢
进程正在hang
数据库出现某些错误
资源争用
dump systemstate的语法为: ALTER SESSION SET EVENTS 'immediate trace name systemstate level 10';

也可以使用ORADEBUG实现这个功能
ORADEBUG DUMP SYSTEMSTATE level 10

如果希望在数据库发生某种错误时除非systemstate事件,可以在参数文件(spfile或者pfile)中设置event参数,
例如,当系统发生死锁(出现ORA-00060错误)时dump systemstate:
event = "60 trace name systemstate level 10"

言归正传,我们dump系统状态:SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 8';

Session altered.

SQL> host
ora9i@cs_dc02:/ora9i >cd /ora9i/app/oracle/admin/csmisc/udump
ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > ll -ctl
-rw-r----- 1 ora9i dba 1070863 Jan 10 13:02 csmisc2_ora_22580.trc
-rw-r----- 1 ora9i dba 1345368 Jan 10 13:01 csmisc2_ora_22568.trc-rwxrwxrwx 1 ora9i dba 44114 Jan 10 12:52 ass1015.awk
-rw-r----- 1 ora9i dba 407133 Jan 7 15:10 csmisc2_ora_2708.trc
-rw-r----- 1 ora9i dba 640 Jan 7 14:48 csmisc2_ora_835.trc
-rw-r----- 1 ora9i dba 1590 Dec 30 22:50 csmisc2_ora_16244.trc
-rw-r----- 1 ora9i dba 1308403 Dec 30 22:44 csmisc2_ora_16033.trc
-rw-r----- 1 ora9i dba 616 Dec 28 14:16 csmisc2_ora_2176.trc
-rw-r----- 1 ora9i dba 644 Dec 8 18:22 csmisc2_ora_21083.trc
ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump >
ora9i@cs_dc02:/ora9i/app/oracle/admin/csmisc/udump > mailx -s "22568" zhangdp@aspire-tech.com < csmisc2_ora_22568.trc

这个跟踪文件很大(因为它包含了所有进程的信息),那么我们从哪里开始看起呢?

首先,通过在跟踪文件中查找字符串"waiting for 'library cache lock'",我们找到了被阻塞进程的信息:

PROCESS 28: ----------------被阻塞的Oracle进程,这里PROCESS 28对应了V$PROCESS中的PID的值,
也就是说我们可以根据这一信息在V$PROCESS和V$SESSION找到被阻塞的会话的信息 ----------------------------------------
SO: c000000109c83bf0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=28, calls cur/top: c00000010b277890/c00000010b277890, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 17 24 6
last post received-location: ksusig
last process to post me: c000000109c840f8 25 0
last post sent: 0 0 15
last post sent-location: ksasnd
last process posted by me: c000000109c7ff90 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: c000000109eefda0
O/S info: user: ora9i, term: pts/th, ospid: 22580 ----------------该进程的操作系统进程号,对应于V$PROCESS中的SPID
OSD pid info: Unix process pid: 22580, image: oracle@cs_dc02 (TNS V1-V3)
----------------------------------------
SO: c000000109f02c68, type: 4, owner: c000000109c83bf0, flag: INIT/-/-/0x00
(session) trans: 0000000000000000, creator: c000000109c83bf0, flag: (100041) USR/- BSY/-/-/-/-/-
DID: 0002-001C-00000192, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: c00000011f8ea068, psql: c00000011f8ea068, user: 50/PUBUSER
O/S info: user: ora9i, term: , ospid: 22536, machine: cs_dc02
program: sqlplus@cs_dc02 (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024 waiting for 'library cache lock' blocking sess=0x0 seq=18589 wait_time=0
handle address=c000000122e2a6d8, lock address=c00000011a449e20, 100*mode+namespace=515

。。。 。。。

SO: c00000010b277890, type: 3, owner: c000000109c83bf0, flag: INIT/-/-/0x00
(call) sess: cur c000000109f02c68, rec 0, usr c000000109f02c68; depth: 0
----------------------------------------
SO: c00000011a449e20, type: 51, owner: c00000010b277890, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S
call pin=0000000000000000 session pin=0000000000000000
htl=c00000011a449e90[c00000011a4bc350,c00000011a4bc350] htb=c00000011a4bc350
user=c000000109f02c68 session=c000000109f02c68 count=0 flags=[00] savepoint=463
the rest of the object was already dumped

。。。 。。。

请注意下面的信息: waiting for 'library cache lock' blocking sess=0x0 seq=18589 wait_time=0
handle address=c000000122e2a6d8, lock address=c00000011a449e20, 100*mode+namespace=515

这段信息告诉我们ORACLE PID为 28的进程(PROCESS 28),正在等待'library cache lock' ,通过‘handle address=c000000122e2a6d8’我们可以找到阻塞它的会话的ORACLE PID信息。

还要注意这段信息: LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S
call pin=0000000000000000 session pin=0000000000000000
htl=c00000011a449e90[c00000011a4bc350,c00000011a4bc350] htb=c00000011a4bc350
user=c000000109f02c68 session=c000000109f02c68 count=0 flags=[00] savepoint=463

这里就是阻塞PROCESS 28进程的会话的信息。

简单的记住这个依据的要点是:

waiting session的'handle address'的值对应于blocking session的'handle'的值。

回过头来,看看这个值,它应于上面我们在V$SESSION_WAIT中看到的P1和P2的值:
SQL> select to_number('C000000122E2A6D8','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') from dual;

TO_NUMBER('C000000122E2A6D8','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
----------------------------------------------------------------
1.3835E+19

SQL>

问题的成因已经基本上明确了,这里推荐两种解决问题的方法:
方法1,根据 c000000122e2a6d8 地址,我们可以得到当前在library cache中相应的锁信息: SQL> l
1 select INST_ID,USER_NAME,KGLNAOBJ,KGLLKSNM,KGLLKUSE,KGLLKSES,KGLLKMOD,KGLLKREQ,KGLLKPNS,KGLLKHDL
2* from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ
SQL> /

INST_ID USER_NAME KGLNAOBJ KGLLKSNM KGLLKUSE KGLLKSES KGLLKMOD KGLLKREQ KGLLKPNS KGLLKHDL
---------- ------------- ---------------------- ---------- ---------------- ---------------- ---------- ---------- ---------------- ----------------
2 PUBUSER CSNOZ629926699966 30 C000000109F02C68 C000000109F02C68 0 2 00 C000000122E2A6D8
2 PUBUSER CSNOZ629926699966 37 C000000108C99E28 C000000108C99E28 3 0 00 C000000122E2A6D8

SQL>

按照Oracle推荐的做法,我们现在应该使用'alter system kill session'命令kill掉SID 37,结果得到了ORA-00031错误:
SQL> alter system kill session '37,2707';

alter system kill session '37,2707'
*
ERROR at line 1:
ORA-00031: session marked for kill

SQL>

检查SID 37的状态:
SQL> set linesize 150
SQL> col program for a50
SQL> select sid,serial#,status,username,program from v$session where sid=37;

SID SERIAL# STATUS USERNAME PROGRAM
---------- ---------- -------- ------------------------------ --------------------------------------------------
37 2707 KILLED PUBUSER sqlplus@cs_dc02 (TNS V1-V3)

SQL>
再次证实了我们最初的想法—— 有人在执行了某个需要运行很久的DDL(多数是语句效率低,当然不排除遭遇bug的可能),
然后没等语句结束就异常退出了会话。

这个例子中我们在上面的跟踪文件已经找到了该会话对应的操作系统进程(SPID),如果在其他情况下,我们如何找到这种状态为'KILLED'
的操作系统进程号(SPID)呢?
下面给出了一个方法,可以借鉴:
SQL> l
1 SELECT s.username,s.status,
2 x.ADDR,x.KSLLAPSC,x.KSLLAPSN,x.KSLLASPO,x.KSLLID1R,x.KSLLRTYP,
3 decode(bitand (x.ksuprflg,2),0,null,1)
4 FROM x$ksupr x,v$session s
5 WHERE s.paddr(+)=x.addr
6 and bitand(ksspaflg,1)!=0
7* and s.sid=37
SQL> /

USERNAME STATUS ADDR KSLLAPSC KSLLAPSN KSLLASPO KSLLID1R KS D
------------------------------ -------- ---------------- ---------- ---------- ------------ ---------- -- -
PUBUSER KILLED C000000109C831E0 41 15 16243 17

SQL>

x$ksupr.ADDR列的值对应了V$PROCESS 中的ADDR的值,知道了这个SPID的地址,找到这个操作系统进程(SPID)就简单了,例如:
SQL> select spid,pid from v$process where addr='C000000109C831E0';

SPID PID
------------ ----------
20552 26

SQL>

现在,我们只需要在操作系统上 kill 操作系统进程20552就可以了:
ora9i@cs_dc02:/ora9i > ps -ef | grep 20552
ora9i 20552 1 0 Jan 8 ? 0:01 oraclecsmisc2 (LOCAL=NO)
ora9i 14742 14740 0 17:19:02 pts/ti 0:00 grep 20552
ora9i@cs_dc02:/ora9i > kill -9 20552
ora9i@cs_dc02:/ora9i > ps -ef | grep 20552
ora9i 14966 14964 0 17:40:01 pts/ti 0:00 grep 20552
ora9i@cs_dc02:/ora9i >

再来检查一下SID 37的信息,我们看到这个会话是真的被kill掉了,
ora9i@cs_dc02:/ora9i > exit

SQL> select sid,serial#,status,username,program from v$session where sid=37;

no rows selected

SQL> l
1 SELECT s.username,s.status,
2 x.ADDR,x.KSLLAPSC,x.KSLLAPSN,x.KSLLASPO,x.KSLLID1R,x.KSLLRTYP,
3 decode(bitand (x.ksuprflg,2),0,null,1)
4 FROM x$ksupr x,v$session s
5 WHERE s.paddr(+)=x.addr
6 and bitand(ksspaflg,1)!=0
7* and s.sid=37
SQL> /

no rows selected

SQL>

回到刚才hang住的会话,它已经恢复了正常操作,
并且我们已经得到了'ORA-04043: object CSNOZ629926699966 does not exist'这个正常的信息:
SQL> desc CSNOZ629926699966

ERROR:
ORA-04043: object CSNOZ629926699966 does not exist

SQL>

在开一个会话,测试一把:
ora9i@cs_dc02:/ora9i > sqlplus pubuser/pubuser

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 17:42:16 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production

SQL> set timing on
SQL> desc CSNOZ629926699966
ERROR:
ORA-04043: object CSNOZ629926699966 does not exist

SQL>
当发出命令'desc CSNOZ629926699966'的时候,我们看到系统立刻返回了ORA-04043: object CSNOZ629926699966 does not exist'信息,问题就此解决了。

这里,简单的介绍一下X$KGLLK,这个基表保存了库缓存中对象的锁的信息,它对于解决这类问题特别有用,其名称的含义如下:
[K]ernel Layer
[G]eneric Layer
[L]ibrary Cache Manager ( defined and mapped from kqlf )
Object Locks
X$KGLLK - Object [L]oc[K]s

KGLNAOBJ 列包含了在librarky cache中的对象上执行命令的语句的前80个字符(其实从这里我们也可以大大缩小范围了)
X$KGLLK.KGLLKUSE 和 x$kgllk.KGLLKSES 对应于跟踪文件中的owner的值
X$KGLLK.KGLLKADR
X$KGLLK.KGLLKHDL 对应于跟踪文件中的handle的值(handle=C000000122E2A6D8),也就是'library cache lock'的地址
X$KGLLK.KGLLKPNS 对应于跟踪文件中的session pin的值
X$KGLLK.KGLLKSPN对应于跟踪文件中的savepoint的值

我们再来看一下更全面的信息:
SQL> set linesize 2000
SQL> select * from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ
2 /

ADDR INDX INST_ID KGLLKADR KGLLKUSE KGLLKSES KGLLKSNM KGLLKHDL KGLLKPNC KGLLKPNS KGLLKCNT KGLLKMOD KGLLKREQ KGLLKFLG KGLLKSPN KGLLKHTB KGLNAHSH KGLHDPAR KGLHDNSP USER_NAME KGLNAOBJ
---------------- ---------- ---------- ---------------- ---------------- ---------------- ---------- ---------------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------------- ---------- ---------------- ---------- ------------------------------ ------------------------------------------------------------
800003FB0007E4D0 33 2 C00000011A449E20 C000000109F02C68 C000000109F02C68 30 C000000122E2A6D8 00 00 0 0 2 0 463 C00000011A4BC350 3990848181 C000000122E2A6D8 1 PUBUSER CSNOZ629926699966
800003FB0007E5B0 34 2 C00000011A44A150 C000000108C99E28 C000000108C99E28 37 C000000122E2A6D8 00 00 1 3 0 0 179 C00000011A4BB328 3990848181 C000000122E2A6D8 1 PUBUSER CSNOZ629926699966

SQL> set linesize 100
SQL> l
1* select * from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ
SQL> /

ADDR INDX INST_ID KGLLKADR KGLLKUSE KGLLKSES KGLLKSNM
---------------- ---------- ---------- ---------------- ---------------- ---------------- ----------
KGLLKHDL KGLLKPNC KGLLKPNS KGLLKCNT KGLLKMOD KGLLKREQ KGLLKFLG
---------------- ---------------- ---------------- ---------- ---------- ---------- ----------
KGLLKSPN KGLLKHTB KGLNAHSH KGLHDPAR KGLHDNSP USER_NAME
---------- ---------------- ---------- ---------------- ---------- ------------------------------
KGLNAOBJ
------------------------------------------------------------
800003FB0007E4D0 33 2 C00000011A449E20 C000000109F02C68 C000000109F02C68 30
C000000122E2A6D8 00 00 0 0 2 0
463 C00000011A4BC350 3990848181 C000000122E2A6D8 1 PUBUSER
CSNOZ629926699966

800003FB0007E5B0 34 2 C00000011A44A150 C000000108C99E28 C000000108C99E28 37
C000000122E2A6D8 00 00 1 3 0 0
179 C00000011A4BB328 3990848181 C000000122E2A6D8 1 PUBUSER
CSNOZ629926699966

SQL>

上一个例子中我们主要借助于X$KGLLK基表和event systemstate解决问题,那么如果你不了解X$KGLLK基表,或者忘记了如何使用它,那也不要紧张,这里再介绍一种常规的方法。

从system state 的转储信息中,我们已经注意到PROCESS 28当前正在等待'library cache lock'。
'handle address'表示的就是正持有 PROCESS 28 进程所等待的library cache中的地址。

现在我们继续在跟踪文件中查找包含 'handle=c000000122e2a6d8' 字符串的ORACLE PROCESS,也就是查找blocking session的信息,发现信息如下:

PROCESS 26: ----------------阻塞其他会话的Oracle进程,这里PROCESS 26对应了V$PROCESS中的PID的值 ----------------------------------------
SO: c000000109c831e0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=26, calls cur/top: c00000010b2774d0/c00000010b2774d0, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 17 24 6
last post received-location: ksusig
last process to post me: c000000109c840f8 25 0
last post sent: 751404 0 15
last post sent-location: ksasnd
last process posted by me: c000000109c836e8 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: c000000109eefda0
O/S info: user: ora9i, term: UNKNOWN, ospid: 20552 OSD pid info: Unix process pid: 20552, image: oracle@cs_dc02 (TNS V1-V3)
----------------------------------------
SO: c0000001180b9510, type: 8, owner: c000000109c831e0, flag: INIT/-/-/0x00
(FOB) flags=2 fib ptr=162e1b48 incno=0 pending i/o cnt=0
----------------------------------------
SO: c0000001180b9458, type: 8, owner: c000000109c831e0, flag: INIT/-/-/0x00
(FOB) flags=2 fib ptr=162deb18 incno=0 pending i/o cnt=0
----------------------------------------
SO: c0000001180b8230, type: 8, owner: c000000109c831e0, flag: INIT/-/-/0x00
(FOB) flags=2 fib ptr=162de848 incno=0 pending i/o cnt=0
----------------------------------------
SO: c0000001180b7b00, type: 8, owner: c000000109c831e0, flag: INIT/-/-/0x00
(FOB) flags=2 fib ptr=162de578 incno=0 pending i/o cnt=0
----------------------------------------
SO: c000000108c99e28, type: 4, owner: c000000109c831e0, flag: INIT/-/-/0x00
c000000108c99e28 对应的就是V$SESSION 中的SADDR的值,通过这个信息就可以找到blocking session的SID等信息
(session) trans: c0000001169403c0, creator: c000000109c831e0, flag: (100041) USR/- BSY/-/-/-/-/-
DID: 0002-001A-0000007D, short-term DID: 0000-0000-00000000
txn branch: c00000011b825e18
oct: 0, prv: 0, sql: 800003fb0005f7b0, psql: c00000011fbe3f98, user: 50/PUBUSER
O/S info: user: report16, term: , ospid: 20550, machine: cs_dc02 program: sqlplus@cs_dc02 (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
waiting for 'SQL*Net message from dblink' blocking sess=0x0 seq=3319 wait_time=0
driver id=28444553, #bytes=1, =0 -------------------
这里,

#bytes 表示个server process通过database link发送给另一个server process的字节数(bytes)driver id 是一个10进制数,我们需要把它转化为16进制数,然后就会发现它对应于我们通过event 10046中的相应的信息:*** 2005-01-10 11:44:48.200
WAIT #1: nam='SQL*Net message from dblink' ela= 104397696 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0

SQL> select to_char(675562835,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') from dual;

TO_CHAR(675562835,'XXXXXXXXXXXXXX
---------------------------------
28444553

SQL>

temporary object counter: 0
----------------------------------------
SO: c00000011a4496b0, type: 51, owner: c000000108c99e28, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=c00000011a4496b0 handle=c00000012029f968 mode=N
call pin=0000000000000000 session pin=c00000011a44ad70
htl=c00000011a449720[c00000011a4baa78,c00000011a4baa78] htb=c00000011a4baa78
user=c000000108c99e28 session=c000000108c99e28 count=1 flags=[00] savepoint=173
LIBRARY OBJECT HANDLE: handle=c00000012029f968
namespace=CRSR flags=RON/KGHP/PN0/[10010000]
kkkk-dddd-llll=0000-0041-0041 lock=N pin=0 latch#=3
lwt=c00000012029f998[c00000012029f998,c00000012029f998] ltm=c00000012029f9a8[c00000012029f9a8,c00000012029f9a8]
pwt=c00000012029f9c8[c00000012029f9c8,c00000012029f9c8] ptm=c00000012029fa58[c00000012029fa58,c00000012029fa58]
ref=c00000012029f978[c0000001202a0068, c0000001202a0068] lnd=c00000012029fa70[c00000012029fa70,c00000012029fa70]
LIBRARY OBJECT: bject=c00000012029f5c8
type=CRSR flags=EXS[0001] pflags= [00] status=VALD load=0
DEPENDENCIES: count=1 size=16
AUTHORIZATIONS: count=1 size=16 minimum entrysize=16
ACCESSES: count=1 size=16
DATA BLOCKS:
data# heap pointer status pins change
----- -------- -------- ------ ---- ------
0 c00000012029f8a8 c00000012029f288 I/P/A 0 NONE
6 c00000012029f6e8 c00000012029e7c8 I/-/A 0 NONE
----------------------------------------

。。。 。。。

----------------------------------------
SO: c00000011a44a150, type: 51, owner: c0000001169403c0, flag: INIT/-/-/0x00
// X$KGLLK.KGLLKADR 对应于SO(SO: c00000011a44a150 //
// X$KGLLK.KGLLKUSE 和 x$kgllk.KGLLKSES 对应于owner的值(owner: c0000001169403c0)
LIBRARY OBJECT LOCK: lock=c00000011a44a150 handle=c000000122e2a6d8 mode=X
// X$KGLLK.KGLLKADR 对应于SO 和 lock的值(SO: c00000011a44a150,lock=c00000011a44a150) //
// X$KGLLK.KGLLKHDL 对应于handle的值(handle=c000000122e2a6d8)
call pin=0000000000000000 session pin=0000000000000000
// X$KGLLK.KGLLKPNS 对应于session pin的值(session pin=0000000000000000) //
htl=c00000011a44a1c0[c00000011a4bb328,c00000011a4bb328] htb=c00000011a4bb328
user=c000000108c99e28 session=c000000108c99e28 count=1 flags=[00] savepoint=179
user和session的值分别对应着x$kgllk.KGLLKUSE 和 x$kgllk.KGLLKSES,也对应于V$SESSION中阻塞其他会话的SADDR // X$KGLLK.KGLLKSPN对应于savepoint的值(savepoint=179) //
LIBRARY OBJECT HANDLE: handle=c000000122e2a6d8
name=PUBUSER.CSNOZ629926699966
hash=eddf82b5 timestamp=01-08-2005 13:00:18 previous=NULL
namespace=TABL/PRCD/TYPE flags=KGHP/TIM/PTM/SML/[02000000]
kkkk-dddd-llll=0000-0709-0001 lock=X pin=X latch#=3
lwt=c000000122e2a708[c00000011a449e40,c00000011a449e40] ltm=c000000122e2a718[c000000122e2a718,c000000122e2a718]
pwt=c000000122e2a738[c000000122e2a738,c000000122e2a738] ptm=c000000122e2a7c8[c000000122e2a7c8,c000000122e2a7c8]
ref=c000000122e2a6e8[c000000122e2a6e8, c000000122e2a6e8] lnd=c000000122e2a7e0[c000000122e2a7e0,c000000122e2a7e0]
LOCK INSTANCE LOCK: id=LBcafc8485d0949f81
PIN INSTANCE LOCK: id=NBcafc8485d0949f81 mode=X release=F flags=[00]
LIBRARY OBJECT: bject=c000000122e12f70
type=TABL flags=EXS/LOC/CRT[0015] pflags= [00] status=VALD load=0
DATA BLOCKS:
data# heap pointer status pins change
----- -------- -------- ------ ---- ------
0 c000000122e2a618 c000000122e13118 I/P/A 0 INSERT
3 c000000122e13178 0 -/P/- 1 NONE
8 c000000122e12c30 c000000122febdb8 I/P/A 1 UPDATE
9 c000000122e13090 0 -/P/- 1 NONE
10 c000000122e12ce0 c000000122acbc70 I/P/A 1 UPDATE
----------------------------------------

。。。 。。。

根据上述两个ORACLE进程号(ORACLE PID),我们可以找到他们的会话信息和操作系统进程信息
SQL> select spid,pid,addr from v$process where pid in (26,28);

SPID PID ADDR
------------ ---------- ----------------
20552 26 C000000109C831E0 ----------- 阻塞其他会话的Oracle进程
22580 28 C000000109C83BF0 ----------- 被阻塞的Oracle进程

SQL>

我们来进一步证实一下上述信息:

SQL>col username for a20
SQL> col osuser for a20
SQL> col machine for a20
SQL> l
1 select sid,serial#,username,osuser,machine,to_char(logon_time,'yyyy/mm/dd hh24:mi:ss') LogonTime
2* from v$session where paddr in ( select addr from v$process where spid ='&spid')
SQL> /
Enter value for spid: 20552 ----------- 阻塞其他会话的Oracle进程

old 2: from v$session where paddr in ( select addr from v$process where spid ='&spid')
new 2: from v$session where paddr in ( select addr from v$process where spid ='20552')

SID SERIAL# USERNAME OSUSER MACHINE LOGONTIME
---------- ---------- -------------------- -------------------- -------------------- -------------------
37 2707 PUBUSER report16 cs_dc02 2005/01/08 13:00:17

SQL> /
Enter value for spid: 22580 ----------- 被阻塞的Oracle进程

old 2: from v$session where paddr in ( select addr from v$process where spid ='&spid')
new 2: from v$session where paddr in ( select addr from v$process where spid ='22580')

SID SERIAL# USERNAME OSUSER MACHINE LOGONTIME
---------- ---------- -------------------- -------------------- -------------------- -------------------
30 24167 PUBUSER ora9i cs_dc02 2005/01/10 10:20:31

SQL> select sid,saddr,paddr,username,status,OSUSER from v$session where sid in (37,30);

SID SADDR PADDR USERNAME STATUS OSUSER
---------- ---------------- ---------------- -------------------- -------- --------------------
30 C000000109F02C68 C000000109C83BF0 PUBUSER ACTIVE ora9i
37 C000000108C99E28 C000000109C831E0 PUBUSER ACTIVE report16

SQL>
现在,问题已经水落石出了,解决方法和方法1中的一样(在操作系统中直接kill掉相应的操作系统进程)。

当然,处于研究的目的,我们可以进一步了解一下上述两个会话(SID 30 和 SID 37)所有已经持有锁的相关信息:
SQL> set linesize 150
SQL> set pages 10000
SQL> select * from v$lock where sid in (37,30);

ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
C0000001169403C0 C000000116940538 37 TX 917507 26579 6 0 180478 2
C00000011676DAE0 C00000011676DB08 37 TM 18 0 3 0 180478 2
C00000010B30C4E8 C00000010B30C508 37 XR 4 0 2 0 180369 2
C00000010B30C460 C00000010B30C480 37 DX 21 0 1 0 68 0

SQL>
不难看出,会话37阻塞了其他会话

现在,我们再进一步看看会话37当前在哪些对象上加了锁:
SQL> select object_name,object_id from dba_objects where object_id in ('917507','18','4','21') order by object_id;

OBJECT_NAME OBJECT_ID
------------------------------ ----------
TAB$ 4
OBJ$ 18
COL$ 21

SQL> /

OBJECT_NAME OBJECT_ID
------------------------------ ----------
TAB$ 4
OBJ$ 18
COL$ 21

SQL>

接下来,再着重看看SID 为37的会话在library cache中请求和持有对象锁的详细信息:
SQL> col KGLNAOBJ for a30
SQL> col USER_NAME for a10
SQL> l
1 select INST_ID,USER_NAME,KGLNAOBJ,KGLLKSNM,KGLLKUSE,KGLLKSES,KGLLKMOD,KGLLKREQ
2* from x$kgllk where KGLLKSNM = 37
SQL> /

INST_ID USER_NAME KGLNAOBJ KGLLKSNM KGLLKUSE KGLLKSES KGLLKMOD KGLLKREQ
---------- ---------- ------------------------------ ---------- ---------------- ---------------- ---------- ----------
2 PUBUSER DBMS_OUTPUT 37 C000000108C99E28 C000000108C99E28 1 0
2 PUBUSER DBMS_OUTPUT 37 C000000108C99E28 C000000108C99E28 1 0
2 PUBUSER DBMS_STANDARD 37 C000000108C99E28 C000000108C99E28 1 0
2 PUBUSER PUBUSER 37 C000000108C99E28 C000000108C99E28 1 0
2 PUBUSER SELECT MINOR_VERSION FROM SY 37 C000000108C99E28 C000000108C99E28 1 0
S.CDC_SYSTEM$

2 PUBUSER SELECT MINOR_VERSION FROM SY 37 C000000108C99E28 C000000108C99E28 1 0
S.CDC_SYSTEM$

2 PUBUSER DBMS_CDC_PUBLISH 37 C000000108C99E28 C000000108C99E28 1 0
2 PUBUSER DBMS_CDC_PUBLISH 37 C000000108C99E28 C000000108C99E28 1 0
2 PUBUSER CSNOZ629926699966 37 C000000108C99E28 C000000108C99E28 3 0
2 PUBUSER DBMS_APPLICATION_INFO 37 C000000108C99E28 C000000108C99E28 1 0
2 PUBUSER DBMS_APPLICATION_INFO 37 C000000108C99E28 C000000108C99E28 1 0
2 PUBUSER DATABASE 37 C000000108C99E28 C000000108C99E28 1 0

12 rows selected.

SQL>

再看看SID为30的会话在library cache中请求和持有对象锁的详细信息:
SQL> select INST_ID,USER_NAME,KGLNAOBJ,KGLLKSNM,KGLLKUSE,KGLLKSES,KGLLKMOD,KGLLKREQ
2 from x$kgllk where KGLLKSNM = 30
3 /

INST_ID USER_NAME KGLNAOBJ KGLLKSNM KGLLKUSE KGLLKSES KGLLKMOD KGLLKREQ
---------- ---------- ------------------------------ ---------- ---------------- ---------------- ---------- ----------
2 PUBUSER PUBUSER 30 C000000109F02C68 C000000109F02C68 1 0
2 PUBUSER CSNOZ629926699966 30 C000000109F02C68 C000000109F02C68 0 2
2 PUBUSER DBMS_APPLICATION_INFO 30 C000000109F02C68 C000000109F02C68 1 0
2 PUBUSER DBMS_APPLICATION_INFO 30 C000000109F02C68 C000000109F02C68 1 0
2 PUBUSER DATABASE 30 C000000109F02C68 C000000109F02C68 1 0

SQL>
KGLNAOBJ 列包含了在librarky cache中的对象上执行命令的语句的前80个字符,其实从这里我们也可以大大缩小范围了
KGLLKSES 对应于V$SESSION 中的 SADDR列的值
KGLLKSNM 对应于V$SESSION 中的SID(Session ID)
KGLLKHDL 的值与方法1中跟踪文件中的'handle address'的值对应
KGLLKPNS 的值对应于方法1中跟踪文件中的'Ssession pin'的值

SQL> set linesize 2000
SQL> l
1 select INST_ID,USER_NAME,KGLNAOBJ,KGLLKSNM,KGLLKUSE,KGLLKSES,KGLLKMOD,KGLLKREQ,KGLLKPNS,KGLLKHDL
2* from x$kgllk where KGLLKSNM in (30,37) order by KGLLKSNM,KGLNAOBJ
SQL> /

INST_ID USER_NAME KGLNAOBJ KGLLKSNM KGLLKUSE KGLLKSES KGLLKMOD KGLLKREQ KGLLKPNS KGLLKHDL
---------- ------------------------------ ------------------------------------------------------------ ---------- ---------------- ---------------- ---------- ---------- ---------------- ----------------
2 PUBUSER DATABASE 30 C000000109F02C68 C000000109F02C68 1 0 00 C000000119F8EC58
2 PUBUSER DBMS_APPLICATION_INFO 30 C000000109F02C68 C000000109F02C68 1 0 00 C00000011CCDDA48
2 PUBUSER DBMS_APPLICATION_INFO 30 C000000109F02C68 C000000109F02C68 1 0 00 C00000011CCD81B8
2 PUBUSER PUBUSER 30 C000000109F02C68 C000000109F02C68 1 0 00 C00000011CBFDAA8
2 PUBUSER CSNOZ629926699966 30 C000000109F02C68 C000000109F02C68 0 2 00 C000000122E2A6D8
2 PUBUSER DATABASE 37 C000000108C99E28 C000000108C99E28 1 0 00 C000000119F8EC58
2 PUBUSER DBMS_APPLICATION_INFO 37 C000000108C99E28 C000000108C99E28 1 0 00 C00000011CCDDA48
2 PUBUSER DBMS_APPLICATION_INFO 37 C000000108C99E28 C000000108C99E28 1 0 00 C00000011CCD81B8
2 PUBUSER DBMS_CDC_PUBLISH 37 C000000108C99E28 C000000108C99E28 1 0 00 C00000011FEA4918
2 PUBUSER DBMS_CDC_PUBLISH 37 C000000108C99E28 C000000108C99E28 1 0 00 C0000001202A4988
2 PUBUSER DBMS_OUTPUT 37 C000000108C99E28 C000000108C99E28 1 0 00 C00000011CCB48B0
2 PUBUSER DBMS_OUTPUT 37 C000000108C99E28 C000000108C99E28 1 0 00 C00000011FFF5098
2 PUBUSER DBMS_STANDARD 37 C000000108C99E28 C000000108C99E28 1 0 00 C00000011CCF0ED8
2 PUBUSER PUBUSER 37 C000000108C99E28 C000000108C99E28 1 0 00 C00000011CBFDAA8
2 PUBUSER SELECT MINOR_VERSION FROM SYS.CDC_SYSTEM$ 37 C000000108C99E28 C000000108C99E28 1 0 C00000011A44AD70 C00000012029F968
2 PUBUSER SELECT MINOR_VERSION FROM SYS.CDC_SYSTEM$ 37 C000000108C99E28 C000000108C99E28 1 0 00 C0000001202A0228
2 PUBUSER CSNOZ629926699966 37 C000000108C99E28 C000000108C99E28 3 0 00 C000000122E2A6D8

17 rows selected.

SQL>

前面我们已经知道碰到'library cache lock'导致进程hang时,如何解决。

那么如何避免这类问题的发生呢?这篇文章主要介绍了library cache lock的成因以及如何避免。

当我们在执行一些DML,DDL,甚至desc tablename等等操作的时候,会话就hang住了,
还有一种情况,当我们使用create or replace procedure/function等语句修改Procedure和Function的时候,会话也会hang住,这是为什么呢?

当出现上述情况的时候,我们可以通过v$session_wait查询等待事件,当然,这种情况下,该session的等待事件一定是'Library cache lock'。

让我们来解释一下,

第一种情况,当会话1(session 1)在对一个表执行DML 或者 DDL,与此同时还有另一个会话,我们姑且称之为会话2(session 2),这个会话2也在对这个表执行DDL(如ALTER TABLE),当会话2的完成需要很长时间时(依操作的具体的数据量而定),会话1就会hang住,这时,你查询会话1的等待事件就是'Library cache lock'。

第二种情况,当会话1(session 1)在修改一个package,与此同时还有另一个会话,我们姑且称之为会话2(session 2),这个会话2正在执行会话1所修改的package中的Procedure或者Function,会话1就会hang住,这时,你查询会话1的等待事件就是'Library cache lock'。

因此,在对Package/Procedure/Function/View进行编译和分析的时候,我们必须确定此时没有人正在编译和分析相同的对象,即确保没有人也在此时改变这些需要重定义(drop和recreate)的对象的定义。

SQL> create or replace procedure who_is_using wrapped
2 0
3 abcd
4 abcd
5 abcd
6 abcd
7 abcd
8 abcd
9 abcd
10 abcd
11 abcd
12 abcd
13 abcd
14 abcd
15 abcd
16 abcd
17 abcd
18 abcd
19 7
20 200f000
21 1
22 4
23 0
24 1e
25 c WHO_IS_USING:
26 8 OBJ_NAME:
27 8 VARCHAR2:
28 b DBMS_OUTPUT:
29 6 ENABLE:
30 7 1000000:
31 1 I:
32 1 B:
33 8 USERNAME:
34 3 SID:
35 3 SYS:
36 7 X$KGLPN:
37 1 A:
38 9 V$SESSION:
39 7 X$KGLOB:
40 1 C:
41 8 KGLPNUSE:
42 1 =:
43 5 SADDR:
44 5 UPPER:
45 8 KGLNAOBJ:
46 4 LIKE:
47 8 KGLPNHDL:
48 8 KGLHDADR:
49 4 LOOP:
50 8 PUT_LINE:
51 1 (:
52 2 ||:
53 7 TO_CHAR:
54 4 ) - :
55 0
56
57 0
58 0
59 74
60 2
61 0 1d 9a 8f a0 b0 3d b4
62 55 6a :2 a0 6b 51 a5 57 91
63 :2 a0 6b :2 a0 6b ac :2 a0 6b a0
64 b9 :2 a0 b9 :2 a0 6b a0 b9 b2
65 ee :2 a0 6b a0 7e a0 6b b4
66 2e :3 a0 6b a5 b 7e :2 a0 a5
67 b b4 2e a 10 :2 a0 6b a0
68 7e a0 6b b4 2e a 10 ac
69 d0 e5 e9 37 :3 a0 6b 6e 7e
70 :3 a0 6b a5 b b4 2e 7e 6e
71 b4 2e 7e :2 a0 6b b4 2e a5
72 57 b7 a0 47 b7 a4 b1 11
73 68 4f 17 b5
74 74
75 2
76 0 3 4 19 15 14 20 13
77 25 29 2d 31 35 38 3b 3c
78 41 45 49 4d 50 54 58 5b
79 5c 60 64 67 11 6b 6f 73
80 75 79 7d 80 84 86 87 8e
81 92 96 99 9d a0 a4 a7 a8
82 ad b1 b5 b9 bc bd bf c2
83 c6 ca cb cd ce 1 d3 d8
84 dc e0 e3 e7 ea ee f1 f2
85 1 f7 fc fd 101 106 10b 10d
86 111 115 119 11c 120 123 127 12b
87 12f 132 133 135 136 13b 13e 142
88 143 148 14b 14f 153 156 157 15c
89 15d 162 164 168 16f 171 175 177
90 182 186 188 18f
91 74
92 2
93 0 1 b 18 21 :2 18 17 :2 1
94 2 :2 e 15 :2 2 6 1c :2 1e 27
95 :2 29 1c 11 15 11 1d 11 1f
96 29 1f 2b 2f 2b 37 2b c
97 11 12 :2 14 1f 1d :2 21 :2 1d 12
98 18 :2 1a :3 12 2a 30 :2 2a :5 12 :2 14
99 1f 1d :2 21 :2 1d :2 12 :4 c 2 2b
100 3 :2 f 18 1b 1d 25 :2 27 :2 1d
101 :2 18 2b 2d :2 18 33 35 :2 37 :2 18
102 :2 3 2b 6 2 :8 1
103 74
104 2
105 0 :9 1 :6 3 :8 4 :f 5 :9 6 :d 7 :2 6
106 :9 8 :2 6 5 :4 4 8 :19 9 8 a
107 4 :2 2 :6 1
108 191
109 2
110 :4 0 1 :a 0 70 1 1a 1b 5
111 :2 3 :3 0 2 :6 0 5 4 :3 0 7
112 :2 0 70 2 8 :2 0 4 :3 0 5
113 :3 0 a b 0 6 :2 0 7 c
114 e :2 0 6c 7 :3 0 8 :3 0 9
115 :3 0 11 12 0 8 :3 0 a :3 0
116 14 15 0 9 b :3 0 c :3 0
117 18 19 0 d :3 0 e :3 0 8
118 :3 0 1d 1e b :3 0 f :3 0 20
119 21 0 10 :3 0 22 23 c 25
120 49 0 4a :3 0 d :3 0 11 :3 0
121 27 28 0 8 :3 0 12 :2 0 13
122 :3 0 2a 2c 0 12 2b 2e :3 0
123 14 :3 0 10 :3 0 15 :3 0 31 32
124 0 10 30 34 16 :2 0 14 :3 0
125 2 :3 0 15 37 39 17 36 3b
126 :3 0 2f 3d 3c :2 0 d :3 0 17
127 :3 0 3f 40 0 10 :3 0 12 :2 0
128 18 :3 0 42 44 0 1c 43 46
129 :3 0 3e 48 47 :3 0 2 17 26
130 0 4b :5 0 4c :2 0 4e 10 4d
131 19 :3 0 4 :3 0 1a :3 0 50 51
132 0 1b :3 0 1c :2 0 1d :3 0 7
133 :3 0 a :3 0 56 57 0 1a 55
134 59 1f 54 5b :3 0 1c :2 0 1e
135 :3 0 22 5d 5f :3 0 1c :2 0 7
136 :3 0 9 :3 0 62 63 0 25 61
137 65 :3 0 28 52 67 :2 0 69 2a
138 6b 19 :3 0 4e 69 :4 0 6c 2c
139 6f :3 0 6f 0 6f 6e 6c 6d
140 :6 0 70 0 2 8 6f 72 :2 0
141 1 70 73 :6 0
142 2f
143 2
144 :3 0 1 3 1 6 1 d 2
145 13 16 3 1c 1f 24 1 33
146 2 29 2d 1 38 2 35 3a
147 1 58 2 41 45 2 53 5a
148 2 5c 5e 2 60 64 1 66
149 1 68 2 f 6b
150 1
151 4
152 0
153 72
154 0
155 1
156 14
157 2
158 3
159 0 1 0 0 0 0 0 0
160 0 0 0 0 0 0 0 0
161 0 0 0 0
162 2 0 1
163 3 1 0
164 10 2 0
165 0
166 /

Procedure created.

Elapsed: 00:00:00.07
SQL>

本文来自CSDN博客,转载请标明出处:http://blog.csdn.net/wzy0623/archive/2008/05/21/2465429.aspx

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/21634752/viewspace-696704/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/21634752/viewspace-696704/

关于library cache lock相关推荐

  1. oracle library cache lock,【案例】Oracle等待事件library cache lock产生原因和解决办法...

    [案例]Oracle等待事件library cache lock产生原因和解决办法 时间:2016-12-07 18:56   来源:Oracle研究中心   作者:网络   点击: 次 天萃荷净 O ...

  2. oracle library cache lock,【DB】彻底搞清楚library cache lock的成因和解决方法(一)

    问题描述: 接到应用人员的报告,说是在任何对表CSNOZ629926699966的操作都会hang,包括desc CSNOZ629926699966,例如: > sqlplus SQL*Plus ...

  3. oracle package lock,Oracle 11g下重现library cache lock等待事件

    从下面的例子中可以看到,在生产数据库中对象的重新编译会导致library cache lock,所以应该尽量避免在业务高峰期编译对象.如果是package或过程中存在复杂的依赖关系就极易导致libra ...

  4. 记一次library cache lock/library cache pin导致的函数编译hang住分析及处理过程

    墨墨导读:业务在进行alter function my_function_name compile时,有两个函数编译无法通过,现象就是会hang住,这里分享处理的整个过程. 一.前言 业务在进行alt ...

  5. 五月数据库技术通讯丨Oracle 12c因新特性引发异常Library Cache Lock等待

    每月关注:35页数据库技术干货,汇总一个月数据库行业热点事件.新的产品特性,包括重要数据库产品发布.警报.更新.新版本.补丁等. 亲爱的读者朋友: 为了及时共享行业案例,通知共性问题,达成共享和提前预 ...

  6. 经验:Library Cache Lock之异常分析-云和恩墨技术通讯精选

    亲爱的读者朋友: 为了及时共享行业案例,通知共性问题,达成共享和提前预防,我们整理和编辑了<云和恩墨技术通讯>,通过对过去一段时间的知识回顾,故障归纳,以期提供有价值的信息供大家参考.同时 ...

  7. 实战课堂:数据库高Library Cache Lock导致Hang的故障分析

    编辑手记:在现实的生产环境中,DBA可能遭遇到各种各样的异常,或简单.或复杂,但是无一不考验DBA的经验和能力,在『实战课堂』栏目中,我们将整理和分享来自云和恩墨一线的各种案例,以其帮助走在DBA道路 ...

  8. 并行insert出现library cache lock与cursor: pin S wait on X等待问题记录

    一. 故障现象与紧急处理 开发反馈凌晨5点左右应用出现大量报错 ORA-04021: timeout occurred while waiting to lock object,并且集中出现在inse ...

  9. 密码错误频繁登录引发的“library cache lock”或“row cache lock”等待

    密码错误频繁登录引发的"library cache lock"或"row cache lock"等待 对于正常的系统,由于密码的更改,可能存在某些被遗漏的客户端 ...

  10. (转载)library cache lock和library cache pin到底是什么

    (http://www.dbsnake.net/library-cache-lock-and-pin.html) Posted: December 16, 2011 | Author: Cui Hua ...

最新文章

  1. Java虚拟机详解01----初识JVM
  2. SVN - 简单使用手册
  3. Django框架实现支付宝第三方支付
  4. 共享内存:mmap函数实现
  5. java随机生成n位数字_java随机生成n个不相同的整数
  6. sgu 101 domino
  7. Linux下查看网卡信息及确定网卡位置以及更改网卡名称
  8. 一文教你如何摆脱应用商店下载微软应用
  9. 微信小程序:页面传递中文出现乱码
  10. C语言程序的特点及设计思想
  11. 项目实战之信用卡数字识别
  12. 洛谷P4196 半平面交
  13. pjsip学习 ------ 三
  14. 走向Web渗透工程师
  15. java 生成正弦波声音_Java错误生成声音正弦波
  16. 波澜不惊 波澜不惊 波澜不惊
  17. AtCoder Beginner Contest 254 A~E 题解
  18. HCIA脱产班 学习笔记2
  19. TMMI_测试过程改进框架_优化级别8
  20. python 物理公式计算_计算重力/跳跃

热门文章

  1. 华为云平台 IP SAN 存储配置
  2. 允许用户使用 MAK 密钥激活 Office 2010 批量许可版
  3. 戴尔服务器连接显示器无信号 键盘灯不亮,键盘灯不亮按键没反应怎么回事及解决办法...
  4. [音乐欣赏]夜空中最亮的星,张杰,苏阳,新闻晚高峰结束曲
  5. win10语言栏恢复原位
  6. 20220925 appreciate和grateful的区别
  7. 改变linux 背景修改命令,Linux命令行下更改桌面背景(GNOME环境)
  8. 计算24点游戏C语言课设
  9. win10网页找不到服务器dns,win10系统浏览网页提示“找不到服务器或dns错误”的解决方法...
  10. 计算机管理-磁盘管理中进行扩展卷操作,管理磁盘diskpart命令