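Editor's introduction: A 19c CDB database went down while an HBA card was being replaced on its storage. This article walks through the full recovery from a failure caused by BEGIN BACKUP.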

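In the middle of the night a customer reported that a 19c CDB database had gone down during an HBA card replacement on the storage and would not start. OPEN reported that media recovery was required. Here is an excerpt from the alert log.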

2020-07-28T23:40:53.328908+08:00
Errors in file /u02/app/oracle/diag/rdbms/racdb3/racdb32/trace/racdb32_ora_306493.trc:
ORA-10873: file 1 needs to be either taken out of backup mode or media recovered
ORA-01110: data file 1: '+DATA/RACDB3/DATAFILE/system.278.1037610503'
2020-07-28T23:40:53.387627+08:00
Errors in file /u02/app/oracle/diag/rdbms/racdb3/racdb32/trace/racdb32_ora_306493.trc:
ORA-10873: file 1 needs to be either taken out of backup mode or media recovered
ORA-01110: data file 1: '+DATA/RACDB3/DATAFILE/system.278.1037610503'
ORA-10873 signalled during: ALTER DATABASE OPEN /* db agent *//* {0:17:3557} */...
2020-07-28T23:40:55.357177+08:00
License high water mark = 2
2020-07-28T23:40:55.357492+08:00
USER(prelim) (ospid: 310054): terminating the instance
2020-07-28T23:40:56.369307+08:00
Instance terminated by USER(prelim), pid = 310054
ORA-10873: file 1 needs to be either taken out of backup mode or media recovered

ORA-10873 is raised here because the database or a tablespace was left in BEGIN BACKUP mode. The correct fix is simply to end the backup.

alter database end backup;
alter tablespace [tablespace_name] end backup;
alter database open;
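
As a rough illustration of the fix above, the following Python sketch turns rows copied from v$backup into per-file END BACKUP statements. The function name `end_backup_statements` and the sample rows are hypothetical. The per-file form `alter database datafile N end backup` is the one that appears later in this article; when the database is mounted normally, the single `alter database end backup` shown above covers every file at once.

```python
def end_backup_statements(rows):
    """Return one END BACKUP statement per datafile still in backup mode.

    rows: iterable of (file_number, status) pairs as displayed by v$backup.
    """
    return [
        f"alter database datafile {file_no} end backup;"
        for file_no, status in rows
        if status.strip().upper() == "ACTIVE"
    ]


if __name__ == "__main__":
    # Hypothetical sample: only file 8 is still in backup mode.
    sample = [(7, "NOT ACTIVE"), (8, "ACTIVE"), (9, "NOT ACTIVE")]
    for stmt in end_backup_statements(sample):
        print(stmt)
```

The generated statements are only text; they still have to be reviewed and run in SQL*Plus by hand.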

[oracle@test ~]$ oerr ora 10873
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_us.utf8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
10873, 00000, "file %s needs to be either taken out of backup mode or media recovered"
// *Cause: An attempt was made to open a database after an instance failure or
// SHUTDOWN ABORT interrupted an online backup.
// *Action: If the indicated file is not a restored backup, then issue the
// ALTER DATABASE END BACKUP command and open the database. If the
// file is a restored online backup, then apply media recovery to
// it and open the database.

At that point RECOVER DATABASE reported that it could not find the archived logs (it needed the archived logs from June 18).

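Because storage work had been going on, I wrongly assumed the problem had some other cause and paid no attention to this error. Querying v$datafile and v$datafile_header showed that the checkpoints dated from June 18, more than a month earlier.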

--- Current date: 2020-07-28T23:40:56
SQL> col name for a10
SQL> select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status from  v$containers a,v$datafile b where a.con_id=b.con_id order by checkpoint_change#;

              CON_ID NAME                      FILE#               RFILE#   CHECKPOINT_CHANGE# CHECKPOINT_TIME     STATUS
-------------------- ---------- -------------------- -------------------- -------------------- ------------------- -------
                   2 PDB$SEED                      6                    4              2336937 2020-04-13 09:26:28 ONLINE
                   2 PDB$SEED                      5                    1              2336937 2020-04-13 09:26:28 SYSTEM
                   2 PDB$SEED                      8                    9              2336937 2020-04-13 09:26:28 ONLINE
                   1 CDB$ROOT                    178                  178        8041849750453 2020-06-18 19:28:19 ONLINE
                   1 CDB$ROOT                      7                    7        8041849750453 2020-06-18 19:28:19 ONLINE
                   1 CDB$ROOT                      9                    9        8041849750453 2020-06-18 19:28:19 ONLINE
                   1 CDB$ROOT                      4                    4        8041849750453 2020-06-18 19:28:19 ONLINE
                   1 CDB$ROOT                      3                    3        8041849750453 2020-06-18 19:28:19 ONLINE
                   1 CDB$ROOT                      1                    1        8041849750453 2020-06-18 19:28:19 SYSTEM
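
The stale checkpoints in the listing above are easy to overlook by eye. As a minimal sketch, assuming the rows have been copied out of the query result, the Python helper below flags files whose checkpoint is older than a chosen threshold. The function name `stale_checkpoints`, the one-day default, and the idea of skipping con_id 2 (PDB$SEED, which is read-only, so an old checkpoint there is expected) are my own illustrative choices.

```python
from datetime import datetime, timedelta


def stale_checkpoints(rows, now, max_age=timedelta(days=1), skip_con_ids=(2,)):
    """Return file numbers whose checkpoint time is older than max_age.

    rows: (con_id, file_number, checkpoint_time) tuples, with the time
          formatted as 'YYYY-MM-DD HH:MM:SS'.
    skip_con_ids: containers to ignore, by default PDB$SEED (con_id 2).
    """
    stale = []
    for con_id, file_no, ckpt in rows:
        if con_id in skip_con_ids:
            continue
        age = now - datetime.strptime(ckpt, "%Y-%m-%d %H:%M:%S")
        if age > max_age:
            stale.append(file_no)
    return stale


if __name__ == "__main__":
    rows = [
        (2, 6, "2020-04-13 09:26:28"),
        (1, 178, "2020-06-18 19:28:19"),
        (1, 1, "2020-06-18 19:28:19"),
    ]
    print(stale_checkpoints(rows, datetime(2020, 7, 28, 23, 40, 56)))
```

A checkpoint that is weeks old on a file of an open read-write container is a hint to look at v$backup before trying anything more drastic.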

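I then checked the logs and asked the customer, and confirmed that nobody had run a restore. I misjudged the situation as some kind of anomaly and could not find a fix. Since a recent full backup existed, I restored CDB$ROOT, and that made things worse: for the restored datafiles, RECOVER still wanted to start from the June 18 archived logs, and the file header checkpoints were still June 18. I then looked at the backup metadata, and the file checkpoints recorded there were also June 18. I had never seen this before and assumed it was an Oracle bug.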

RMAN> list backup of datafile 1;

List of Backup Sets
===================

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ -------------------
2508    Incr 0  23.19G     SBT_TAPE    00:04:26     2020-07-26 08:32:28
        BP Key: 2508   Status: AVAILABLE  Compressed: NO  Tag: HOT_DB_BK_LEVEL0
        Handle: bk_3490_1_1046766482   Media: @aaaa6
  List of Datafiles in backup set 2508
  File LV Type Ckp SCN       Ckp Time            Abs Fuz SCN   Sparse Name
  ---- -- ---- ------------- ------------------- ------------- ------ ----
  1    0  Incr 8041849750453 2020-06-18 19:28:19 8050242429418 NO     +DATA/RACDB3/DATAFILE/system.278.1037610503

RMAN>
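
The telling detail in this listing is the gap between the backup's Completion Time and the datafile's Ckp Time. As a small sketch (the helper name `checkpoint_lag` is hypothetical, and the two timestamps are the ones from the listing above), the gap can be computed like this:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def checkpoint_lag(completion_time, ckp_time):
    """Return how far the datafile checkpoint lags behind backup completion."""
    return datetime.strptime(completion_time, FMT) - datetime.strptime(ckp_time, FMT)


if __name__ == "__main__":
    lag = checkpoint_lag("2020-07-26 08:32:28", "2020-06-18 19:28:19")
    print(lag.days)  # prints 37
```

A level 0 backup completed on July 26 whose datafile checkpoint is 37 days older is what one would expect if the file header checkpoint had stopped advancing, which is the behaviour of a file in backup mode described in this article.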

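There was one big mistake here: because the database was large, I did not back up the current environment before the restore. Remember that every risky change needs a backup first, so that you can always roll back!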

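After consulting experts within the company, we determined that some tablespaces had been put into BEGIN BACKUP mode. After BEGIN BACKUP, the checkpoint in the file header is no longer updated. However, as far as we could find out, nobody had manually issued BEGIN BACKUP, and the alert log contained no record of a BEGIN BACKUP operation either, so this needs further analysis.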
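
To make the frozen checkpoint concrete, here is a small Python toy model. It is only a sketch under my own simplifying assumptions: the class `DatafileHeader` and its methods are hypothetical, the SCN values are made up, and a real Oracle file header carries far more state than this. The two fields stand for what this article calls kcvfhckp and kcvfhbcp in the bbed step.

```python
class DatafileHeader:
    """Toy model of the two header checkpoints described in this article."""

    def __init__(self, scn):
        self.ckp = scn         # recovery start point (kcvfhckp in the article)
        self.backup_ckp = 0    # checkpoint kept while in backup mode (kcvfhbcp)
        self.in_backup = False

    def begin_backup(self):
        # Assumption: the backup checkpoint starts from the current checkpoint.
        self.in_backup = True
        self.backup_ckp = self.ckp

    def checkpoint(self, scn):
        # While in backup mode only the backup checkpoint advances.
        if self.in_backup:
            self.backup_ckp = scn
        else:
            self.ckp = scn

    def end_backup(self):
        # END BACKUP brings the frozen checkpoint up to date.
        self.ckp = self.backup_ckp
        self.in_backup = False


if __name__ == "__main__":
    header = DatafileHeader(100)
    header.begin_backup()
    header.checkpoint(200)
    print(header.ckp, header.backup_ckp)  # prints 100 200
    header.end_backup()
    print(header.ckp)  # prints 200
```

In this model, a file that stays in backup mode for a month keeps asking recovery to start from the month-old checkpoint, which matches the request for the June 18 archived logs seen in this incident.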

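By this point, because of the earlier restore of CDB$ROOT, the control file and the CDB$ROOT datafiles had been restored from backup, so END BACKUP was no longer possible. The archived logs from a month earlier had already been purged, so applying archived logs starting from June 18 was not possible either. The database could no longer be recovered by normal means.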

v$backup confirmed that the files were in BEGIN BACKUP (hot backup) mode.

SQL> select * from v$backup;

               FILE# STATUS                          CHANGE# TIME                              CON_ID
-------------------- ------------------ -------------------- ------------------- --------------------
                   3 NOT ACTIVE                8041849750453 2020-06-18 19:28:19                    1
                   4 NOT ACTIVE                8041849750453 2020-06-18 19:28:19                    1
                   5 NOT ACTIVE                            0                                        2
                   6 NOT ACTIVE                            0                                        2
                   7 ACTIVE                    8041849750453 2020-06-18 19:28:19                    1
                   8 NOT ACTIVE                            0                                        2
                   9 NOT ACTIVE                8041849750453 2020-06-18 19:28:19                    1
                  15 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  16 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  17 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  18 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  19 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  20 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  21 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  22 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  23 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  24 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  25 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  26 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  27 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  28 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4
                  29 ACTIVE                    8041849750453 2020-06-18 19:28:19                    4

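On a non-CDB you could rebuild the control file and then use the method below. On a CDB, however, these commands have to be run from inside the PDB, and after the control file is rebuilt the PDBs carry Oracle-internal names and cannot be switched into, so the method below does not work here.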
alter database datafile 1 offline;
alter database datafile 1 end backup;
alter database datafile 1 online;

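The final solution was to use bbed to modify the checkpoint information in the file headers, then apply the archived logs from the last few days up to the latest state and open with RESETLOGS. The database was recovered with zero data loss.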

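Because there were so many files, copying them all in full to the local file system was impractical. Instead I used an undocumented internal ASM package to read only the datafile header blocks from ASM to local disk, modified them with bbed, and then copied them back.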

--copy datafile head from asm
Set pagesize 300
Set linesize 300
set numw 20
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
col name for a100
select  '@tofs '||b.name||' '||regexp_replace(b.name,'^.*DATAFILE/','/u01/work/') from  v$datafile b where  b.checkpoint_change#<8050778122014 and con_id!=2 order by checkpoint_change#;

@tofs +DATA/RACDB/DATAFILE/system.278.1037610503 /u01/work/system.278.1037610503
@tofs +DATA/RACDB/DATAFILE/sysaux.261.1037610537 /u01/work/sysaux.261.1037610537
@tofs +DATA/RACDB/DATAFILE/undotbs1.288.1037610553 /u01/work/undotbs1.288.1037610553
@tofs +DATA/RACDB/DATAFILE/users.282.1037610553 /u01/work/users.282.1037610553
@tofs +DATA/RACDB/DATAFILE/undotbs2.263.1037611185 /u01/work/undotbs2.263.1037611185
@tofs +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/system.302.1038386279 /u01/work/system.302.1038386279
@tofs +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/sysaux.292.1038386277 /u01/work/sysaux.292.1038386277
@tofs +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/undotbs1.299.1038386279
......
@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/sysaux.272.1038492071 /u01/work/sysaux.272.1038492071
@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/undotbs1.281.1038492071 /u01/work/undotbs1.281.1038492071
@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/undo_2.283.1038492071 /u01/work/undo_2.283.1038492071
@tofs +DATA/RACDB/A3EFF303A38BE5E8E053C756D80A801E/DATAFILE/users.279.1038492071 /u01/work/users.279.1038492071
@tofs +DATA/RACDB/DATAFILE/test11.425.1041690739 /u01/work/test11.425.1041690739

-- generate the bbed listfile
Set pagesize 300
Set linesize 300
set numw 20
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
col name for a100
select  file#||' '||regexp_replace(b.name,'^.*DATAFILE/','/u01/work/') from  v$datafile b where  b.checkpoint_change#<8050778122014 and con_id!=2 order by checkpoint_change#;

BBED> info
 File#  Name                                                        Size(blks)
 -----  ----                                                        ----------
     1  /u01/work/system.278.1037610503                                      0
     3  /u01/work/sysaux.261.1037610537                                      0
     4  /u01/work/undotbs1.288.1037610553                                    0
     7  /u01/work/users.282.1037610553                                       0
     9  /u01/work/undotbs2.263.1037611185                                    0
    15  /u01/work/system.302.1038386279                                      0
    16  /u01/work/sysaux.292.1038386277                                      0
    17  /u01/work/undotbs1.299.1038386279                                    0
    18  /u01/work/undotbs1.303.1038386279                                    0
    19  /u01/work/undotbs2.294.1038386277                                    0
    20  /u01/work/undotbs2.301.1038386279                                    0
......
   178  /u01/work/test11.425.1041690739                                      0
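
Both generator queries above rely on the same regexp_replace call to map an ASM file name to a local path under /u01/work/. The Python sketch below mirrors that mapping so it can be checked offline. The function names are hypothetical, and `@tofs` is the author's own script whose contents are not shown in this article, so the sketch only reproduces the text of the generated lines.

```python
import re


def local_header_path(asm_name, work_dir="/u01/work/"):
    """Mirror regexp_replace(name, '^.*DATAFILE/', work_dir) from the queries."""
    return re.sub(r"^.*DATAFILE/", work_dir, asm_name)


def tofs_line(asm_name):
    """Build one '@tofs <asm file> <local file>' line."""
    return f"@tofs {asm_name} {local_header_path(asm_name)}"


def listfile_line(file_no, asm_name):
    """Build one '<file#> <local file>' line for the bbed listfile."""
    return f"{file_no} {local_header_path(asm_name)}"


if __name__ == "__main__":
    name = "+DATA/RACDB/DATAFILE/system.278.1037610503"
    print(tofs_line(name))
    print(listfile_line(1, name))
```

Because the pattern is greedy, names that sit under a PDB GUID directory are also reduced to the bare file name, which is why all local copies end up side by side in one directory.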
--- bbed: restore the checkpoint from kcvfhbcp
assign file 1 block 1 kcvfhckp = file 1 block 1 kcvfhbcp
assign file 3 block 1 kcvfhckp = file 3 block 1 kcvfhbcp
assign file 4 block 1 kcvfhckp = file 4 block 1 kcvfhbcp
assign file 7 block 1 kcvfhckp = file 7 block 1 kcvfhbcp
assign file 9 block 1 kcvfhckp = file 9 block 1 kcvfhbcp
assign file 15 block 1 kcvfhckp = file 15 block 1 kcvfhbcp
assign file 16 block 1 kcvfhckp = file 16 block 1 kcvfhbcp
assign file 17 block 1 kcvfhckp = file 17 block 1 kcvfhbcp
assign file 18 block 1 kcvfhckp = file 18 block 1 kcvfhbcp
assign file 19 block 1 kcvfhckp = file 19 block 1 kcvfhbcp
......
assign file 119 block 1 kcvfhckp = file 119 block 1 kcvfhbcp
assign file 178 block 1 kcvfhckp = file 178 block 1 kcvfhbcp
assign file 181 block 1 kcvfhckp = file 181 block 1 kcvfhbcp

sum apply file 1 block 1
sum apply file 3 block 1
sum apply file 4 block 1
sum apply file 7 block 1
sum apply file 9 block 1
sum apply file 15 block 1
sum apply file 16 block 1
sum apply file 17 block 1
sum apply file 18 block 1
sum apply file 19 block 1
......
sum apply file 118 block 1
sum apply file 119 block 1
sum apply file 178 block 1
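sum apply file 181 block 1

kcvfhckp  the checkpoint, i.e. the recovery start point; it is no longer updated after begin backup
kcvfhbcp  the checkpoint after begin backup (after begin backup, checkpoint updates are written here; end backup uses this checkpoint to update kcvfhckp)

---copy to asm
Set pagesize 300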
Set linesize 300
set numw 20
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
col name for a100
select  '@toasm '||regexp_replace(b.name,'^.*DATAFILE/','/u01/work/')||' '||b.name from  v$datafile b where  b.checkpoint_change#<8050778122014 and con_id!=2 order by checkpoint_change#;

@toasm /u01/work/system.278.1037610503 +DATA/RACDB/DATAFILE/system.278.1037610503
@toasm /u01/work/sysaux.261.1037610537 +DATA/RACDB/DATAFILE/sysaux.261.1037610537
@toasm /u01/work/undotbs1.288.1037610553 +DATA/RACDB/DATAFILE/undotbs1.288.1037610553
@toasm /u01/work/users.282.1037610553 +DATA/RACDB/DATAFILE/users.282.1037610553
@toasm /u01/work/undotbs2.263.1037611185 +DATA/RACDB/DATAFILE/undotbs2.263.1037611185
@toasm /u01/work/system.302.1038386279 +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/system.302.1038386279
@toasm /u01/work/sysaux.292.1038386277 +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/sysaux.292.1038386277
@toasm /u01/work/undotbs1.299.1038386279 +DATA/RACDB/A3D75790AD24522EE053C756D80A788E/DATAFILE/undotbs1.299.1038386279
.......
@toasm /u01/work/test11.425.1041690739 +DATA/RACDB/DATAFILE/test11.425.1041690739

Query v$datafile_header to confirm that checkpoint_change# has been updated.
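
The bbed step in the procedure above repeats the same two commands for every affected file, which is error-prone to type by hand. As a minimal sketch, the Python helper below generates that command text from a list of file numbers. The function name `bbed_commands` is hypothetical, and the command syntax is copied from the commands shown in this article rather than from any bbed reference.

```python
def bbed_commands(file_numbers):
    """Build the assign and sum apply commands for block 1 of each file."""
    assigns = [
        f"assign file {n} block 1 kcvfhckp = file {n} block 1 kcvfhbcp"
        for n in file_numbers
    ]
    sums = [f"sum apply file {n} block 1" for n in file_numbers]
    return assigns + sums


if __name__ == "__main__":
    # File numbers taken from the start of the bbed listfile above.
    for line in bbed_commands([1, 3, 4, 7, 9]):
        print(line)
```

All assign commands are emitted first and all sum apply commands second, matching the order used in this article.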

recover database
run {
allocate channel ch00 device type sbt;
allocate channel ch01 device type sbt;
allocate channel ch02 device type sbt;
allocate channel ch03 device type sbt;
SEND 'NB_ORA_SERV=bak-svr,NB_ORA_CLIENT=racdb6';
recover database ;
release channel ch00;
release channel ch01;
release channel ch02;
release channel ch03;
}

Catalog the archived logs from the last two days and continue recover database until the latest log has been applied.

catalog archivelog '+ARCH/RACDB/archivelog/2020_07_28/thread_2_seq_4935.819.1046934651';
catalog archivelog '+ARCH/RACDB/archivelog/2020_07_28/thread_2_seq_4936.524.1046935119';
…
catalog archivelog '+ARCH/RACDB/archivelog/2020_07_28/thread_2_seq_4942.554.1046945869';
catalog archivelog '+ARCH/RACDB/archivelog/2020_07_28/thread_2_seq_4943.750.1046946151';

Finally:
recover database using backup controlfile;
cancel;
alter database open resetlogs;

At this point the database started successfully.

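The test below reproduces the problem and shows the correct way to handle it. In the 19c incident, however, nobody had manually issued BEGIN BACKUP, so what actually triggered it still needs to be investigated.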

SQL> alter tablespace ts2 begin backup;

Tablespace altered.

SQL> select * from v$backup;

     FILE# STATUS                CHANGE# TIME
---------- ------------------ ---------- -----------------
         1 NOT ACTIVE                  0
         2 NOT ACTIVE                  0
         3 NOT ACTIVE                  0
         4 NOT ACTIVE                  0
         5 NOT ACTIVE                  0
         6 NOT ACTIVE                  0
         7 NOT ACTIVE                  0
         8 ACTIVE                2237473 20200730 17:30:44
         9 NOT ACTIVE                  0

9 rows selected.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> shutdown abort;
ORACLE instance shut down.
SQL>
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area  626327552 bytes
Fixed Size                  2255832 bytes
Variable Size             419431464 bytes
Database Buffers          197132288 bytes
Redo Buffers                7507968 bytes
Database mounted.
ORA-10873: file 8 needs to be either taken out of backup mode or media recovered
ORA-01110: data file 8: '/oracle/app/oracle/oradata/TESTA/datafile/o1_mf_ts2_hcw05wo4_.dbf'

SQL> alter tablespace ts2 end backup;

Tablespace altered.

SQL> alter database open;

Database altered.

Original article on 墨天轮: https://www.modb.pro/db/28494
