During a routine system check, I found that the daily backup had failed.

The error message was:

RMAN> backup incremental level 0 database;

Starting backup at 10-MAR-08
using target database controlfile instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: sid=120 devtype=SBT_TAPE
channel ORA_SBT_TAPE_1: VERITAS NetBackup for Oracle - Release 5.0GA (2003103006)
channel ORA_SBT_TAPE_1: starting incremental level 0 datafile backupset
channel ORA_SBT_TAPE_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/dev/vx/rdsk/maindbdg/lv_main00
input datafile fno=00008 name=/opt/oracle/oradata/oradata/bjdb01/users01.dbf
input datafile fno=00039 name=/opt/oracle/oradata/oradata/bjdb01/xdb02.dbf
input datafile fno=00009 name=/opt/oracle/oradata/oradata/bjdb01/xdb01.dbf
input datafile fno=00003 name=/opt/oracle/oradata/oradata/bjdb01/cwmlite01.dbf
input datafile fno=00004 name=/opt/oracle/oradata/oradata/bjdb01/drsys01.dbf
input datafile fno=00006 name=/opt/oracle/oradata/oradata/bjdb01/odm01.dbf
input datafile fno=00007 name=/opt/oracle/oradata/oradata/bjdb01/tools01.dbf
channel ORA_SBT_TAPE_1: starting piece 1 at 10-MAR-08
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_SBT_TAPE_1 channel at 03/10/2008 11:31:12
ORA-19506: failed to create sequential file, name="tpjatl1b_1_1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
VxBSACreateObject: Failed with error:
Server Status: unable to allocate new media for backup, storage unit has none available

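Judging from this error, the failure looks like it was caused by running out of space. A subsequent backup, however, failed with a different error: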

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch00 channel at 03/10/2008 05:14:15
ORA-19502: write error on file "bk_26552_1_648968690", blockno 664577 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
Server Status: Communication with the server has not been iniatated or the server status has not been retrieved from the server.

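Judging from this error, the problem is not just a matter of space.

In the jnbSA graphical interface, many administration options responded very slowly when clicked and essentially never returned a result. I therefore switched to bpadm on the command line and found the following under the problems section of its reports (the `I/O错误` text in the log lines is the localized system message for "I/O error"):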
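
To compare the two failures side by side, it helps to reduce each RMAN error stack to its error codes plus the media manager's "Server Status" text. The following is a minimal sketch, not part of the original diagnosis; the function name `summarize_stack` is hypothetical, and the sample text is copied from the second failure above.

```python
import re

# Sample stack copied from the second failure shown above.
STACK = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch00 channel at 03/10/2008 05:14:15
ORA-19502: write error on file "bk_26552_1_648968690", blockno 664577 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
Server Status: Communication with the server has not been iniatated or the server status has not been retrieved from the server.
"""

CODE_RE = re.compile(r"^((?:RMAN|ORA)-\d{5}):", re.MULTILINE)
STATUS_RE = re.compile(r"^Server Status:\s*(.+)$", re.MULTILINE)

# The banner lines carry no diagnostic information, so they are skipped.
BANNER_CODES = {"RMAN-00571", "RMAN-00569"}


def summarize_stack(text):
    """Return (error codes in order of appearance, media manager status or None)."""
    codes = [c for c in CODE_RE.findall(text) if c not in BANNER_CODES]
    match = STATUS_RE.search(text)
    status = match.group(1).strip() if match else None
    return codes, status


if __name__ == "__main__":
    codes, status = summarize_stack(STACK)
    print(codes)
    print(status)
```

Running the same function over the first stack would show ORA-19506 with a VxBSACreateObject failure instead of ORA-19502 with a VxBSASendData failure, which makes the difference between "could not start a piece" and "failed while writing" easy to see.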

03/11/2008 01:45:04 backupcenter240 bpexpdate Could not build host list: client hostname could not be found
03/11/2008 02:13:34 backupcenter240 bjdb01 cannot write image to media id 000013, drive index 0, I/O错误
03/11/2008 02:13:48 backupcenter240 bjdb01 backup by oracle on client bjdb01 using policy oracle: media write error
03/11/2008 02:14:04 backupcenter240 bjdb01 backup of client bjdb01 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 02:22:58 backupcenter240 bjdb01 cannot write image to media id 000013, drive index 0, I/O错误
03/11/2008 02:23:12 backupcenter240 bjdb01 backup by oracle on client bjdb01 using policy oracle: media write error
03/11/2008 02:23:19 backupcenter240 bjdb01 suspending further backup attempts for client bjdb01, policy oracle, schedule Cumulative-Inc because it has exceeded the configured number of tries
03/11/2008 02:23:19 backupcenter240 bjdb01 backup of client bjdb01 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 02:23:20 backupcenter240 - scheduler exiting - the backup failed to back up the requested files (6)
03/11/2008 09:32:42 backupcenter240 data03 cannot write image to media id 000016, drive index 0, I/O错误
03/11/2008 09:32:53 backupcenter240 data03 DOWN'ing drive index 0, it has had at least 3 errors in last 12 hour(s)
03/11/2008 09:32:55 backupcenter240 data03 backup by oracle on client data03 using policy bjdb03-ora: media write error
03/11/2008 09:33:02 backupcenter240 data03 backup of client data03 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 10:48:34 backupcenter240 data03 media manager terminated during mount of media id 000016, possible media mount timeout
03/11/2008 10:48:36 backupcenter240 data03 media manager terminated by parent process
03/11/2008 10:48:37 backupcenter240 data03 backup by oracle on client data03 using policy bjdb03-ora: the backup failed to back up the requested files
03/11/2008 10:48:38 backupcenter240 data03 suspending further backup attempts for client data03, policy bjdb03-ora, schedule diff because it has exceeded the configured number of tries
03/11/2008 10:48:38 backupcenter240 data03 backup of client data03 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 13:55:03 backupcenter240 bpexpdate Could not build host list: client hostname could not be found

Digging further into the detailed logs revealed a large number of errors:

03/11/2008 18:23:59 backupcenter240 - cleaning job DB
03/11/2008 18:23:59 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:23:59 backupcenter240 - no drives up on storage unit <backupcenter240-hcart-robot-tld-0>
03/11/2008 18:24:00 bjdb01 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:00 backupcenter240 - no drives up on storage unit <bjdb01-hcart-robot-tld-0>
03/11/2008 18:24:31 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:31 backupcenter240 - no drives up on storage unit <unit_99>
03/11/2008 18:24:32 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:32 backupcenter240 - no drives up on storage unit <unit_data>
03/11/2008 18:24:32 backupcenter240 data03 skipping backup of client data03, policy bjdb03-ora, schedule diff because it has exceeded the configured number of tries

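Judging from these messages, the robotic arm appears to be at fault. If that is the case, it also explains why the two backup errors differ. When a tape filled up, the robot tried to swap in a new tape and failed; the backup running at that moment could no longer write, and it reported what looked like a lack of space (no new media could be allocated). Later backups had no tape available to write to because of the robot fault, so they reported that communication with the NetBackup server had not been initiated.

Next I checked the media report, whose summary showed: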
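
One way to test this reasoning against the problem report is to count write errors per media ID and per drive index. If the errors span several tapes but only one drive, the drive or robot is a more likely suspect than any single tape. The sketch below is illustrative only; `summarize_report` is a hypothetical helper, and the regular expressions assume the exact message wording seen in the report lines above.

```python
import re
from collections import Counter

# A few lines copied from the problem report above.
REPORT = """\
03/11/2008 02:13:34 backupcenter240 bjdb01 cannot write image to media id 000013, drive index 0, I/O错误
03/11/2008 02:22:58 backupcenter240 bjdb01 cannot write image to media id 000013, drive index 0, I/O错误
03/11/2008 09:32:42 backupcenter240 data03 cannot write image to media id 000016, drive index 0, I/O错误
03/11/2008 09:32:53 backupcenter240 data03 DOWN'ing drive index 0, it has had at least 3 errors in last 12 hour(s)
"""

WRITE_RE = re.compile(r"cannot write image to media id (\w+), drive index (\d+)")
DOWN_RE = re.compile(r"DOWN'ing drive index (\d+)")


def summarize_report(text):
    """Count write errors per media id and per drive, and list downed drives."""
    by_media = Counter()
    by_drive = Counter()
    downed = []
    for line in text.splitlines():
        write = WRITE_RE.search(line)
        if write:
            by_media[write.group(1)] += 1
            by_drive[int(write.group(2))] += 1
            continue
        down = DOWN_RE.search(line)
        if down:
            downed.append(int(down.group(1)))
    return by_media, by_drive, downed


if __name__ == "__main__":
    by_media, by_drive, downed = summarize_report(REPORT)
    print("write errors per media id:", dict(by_media))
    print("write errors per drive index:", dict(by_drive))
    print("drives marked DOWN:", downed)
```

For the sample lines, the errors fall on two different media IDs but on a single drive index, which fits a hardware-side fault better than a bad tape.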

Number of ACTIVE media that, as of now:
There are no ACTIVE media present in the media database

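This further confirmed the earlier judgment: the robot fault kept usable tapes from being loaded into the drive, so the system had no usable media.

Checking the robot and drive status with tpconfig: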

Index  DriveName        DrivePath      Type   Shared  Status
*****  *********        **********     ****   ******  ******
0      IBMULTRIUM-TD10  /dev/rmt/1cbn  hcart  Yes     DOWN
       TLD(0) Definition DRIVE=1

Currently defined robotics are:
TLD(0) robotic path = /dev/sg/c2t4l1,
volume database host = backupcenter240

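The status is DOWN (strictly speaking, this is the status of drive index 0 under robot TLD(0)), so the problem seems to be basically identified.

I then used robtest to check the robot: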
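
For repeated checks, the drive table can be scanned for anything that is not UP. This is a rough sketch under stated assumptions: `down_drives` is a hypothetical helper, it expects the six-column layout shown above with no spaces inside drive names, and the real tpconfig output may differ between NetBackup versions.

```python
# Drive table copied from the tpconfig output above.
TPCONFIG = """\
Index DriveName DrivePath Type Shared Status
***** ********* ********** **** ****** ******
0 IBMULTRIUM-TD10 /dev/rmt/1cbn hcart Yes DOWN
TLD(0) Definition DRIVE=1
"""


def down_drives(text):
    """Return (index, name, path, status) for every drive row whose status is not UP."""
    drives = []
    for line in text.splitlines():
        fields = line.split()
        # A drive row has exactly six columns and starts with a numeric index.
        if len(fields) == 6 and fields[0].isdigit():
            index, name, path, _dtype, _shared, status = fields
            if status.upper() != "UP":
                drives.append((int(index), name, path, status))
    return drives


if __name__ == "__main__":
    for index, name, path, status in down_drives(TPCONFIG):
        print(f"drive {index} ({name}, {path}) is {status}")
```

Because the rows are split on whitespace, the function works whether the columns are aligned or collapsed to single spaces.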

bash-2.03# robtest
Configured robots with local control supporting test utilities:
TLD(0) robotic path = /dev/sg/c2t4l1

Robot Selection
---------------
1) TLD 0
2) none/quit
Enter choice: 1

Robot selected: TLD(0) robotic path = /dev/sg/c2t4l1

Invoking robotic test utility:
/usr/openv/volmgr/bin/tldtest -r /dev/sg/c2t4l1 -d1 /dev/rmt/1cbn

Opening /dev/sg/c2t4l1
MODE_SENSE complete
Enter tld commands (? returns help information)
?

To exit the utility, type q or Q.

init - Initialize element status
initrange <d#|s#|p#|t> [#]- Init element status range
allow - Allow media removal
prevent - Prevent media removal
extend - Extend media access port
retract - Retract media access port
mode - Mode sense
m <from> <to> - Move medium
pos <to> - Position to drive or slot
s [d|p|t|s [n]] [raw] - Read element status
inquiry - Display vendor and product ID
rezero - Rezero unit
inport - Ready inport (media access port)
debug - Toggle debug mode for this utility
test_ready - Send a TEST UNIT READY to the device

<from> <to> specifies drive (d#), slot (s#), media access port (p#),
or transport (t#)
<d#|s#|p#|t#> is drive #, slot #, media access port #, or transport #
[#] is number of elements for d, s, p, or t
NOTE - drive # is 1 - Number of drives
slot # is 1 - Number of slots
media access port # is 1 - Number of media access port elements
transport # is 1 - Number of transports
<type> = (d)rive, (s)lot, media access (p)ort, or (t)ransport

unload <drive> - Issue SCSI unload
<drive> = d1 or 1, d2 or 2, d3 or 3 ... d648 or 648

inquiry
Inquiry_data: STK L40 0213
test_ready
Unit is ready
q

Robot Selection
---------------
1) TLD 0
2) none/quit
Enter choice:

After I issued the test_ready command and waited a while, the status had returned to normal:

Index  DriveName        DrivePath      Type   Shared  Status
*****  *********        **********     ****   ******  ******
0      IBMULTRIUM-TD10  /dev/rmt/1cbn  hcart  Yes     UP
       TLD(0) Definition DRIVE=1

Currently defined robotics are:
TLD(0) robotic path = /dev/sg/c2t4l1,
volume database host = backupcenter240

Next, I tried a backup:

$ rman target /

Recovery Manager: Release 9.2.0.4.0 - 64bit Production

Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.

connected to target database: BJDB01 (DBID=3255963758)

RMAN> backup current controlfile;

Starting backup at 11-MAR-08
using target database controlfile instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: sid=19 devtype=SBT_TAPE
channel ORA_SBT_TAPE_1: VERITAS NetBackup for Oracle - Release 5.0GA (2003103006)
channel ORA_SBT_TAPE_1: starting full datafile backupset
channel ORA_SBT_TAPE_1: specifying datafile(s) in backupset
including current controlfile in backupset
channel ORA_SBT_TAPE_1: starting piece 1 at 11-MAR-08
channel ORA_SBT_TAPE_1: finished piece 1 at 11-MAR-08
piece handle=ttjb17ur_1_1 comment=API Version 2.0,MMS Version 5.0.0.0
channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:04:56
Finished backup at 11-MAR-08

Starting Control File Autobackup at 11-MAR-08
piece handle=c-3255963758-20080311-00 comment=API Version 2.0,MMS Version 5.0.0.0
Finished Control File Autobackup at 11-MAR-08

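The backup attempt finally succeeded.

Unfortunately, although small files seemed to back up without trouble, the same error as before still appeared as soon as the backup was fairly large: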

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch00 channel at 03/10/2008 05:14:15
ORA-19502: write error on file "bk_26552_1_648968690", blockno 664577 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
Server Status: Communication with the server has not been iniatated or the server status has not been retrieved from the server.

The backend log also showed a large number of I/O errors:

03/12/2008 09:42:51 backupcenter240 bjdb01 cannot write image to media id 000016, drive index 0, I/O错误
03/12/2008 09:42:51 backupcenter240 bjdb01 FREEZING media id 000016, it has had at least 3 errors in the last 12 hour(s)
03/12/2008 09:43:08 backupcenter240 bjdb01 CLIENT bjdb01 POLICY oracle SCHED Default-Application-Backup EXIT STATUS 84 (media write error)
03/12/2008 09:43:08 backupcenter240 bjdb01 backup by oracle on client bjdb01: media write error
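
The log excerpts in this post use two phrasings for a job's exit status: "exited with status 6 (...)" and "EXIT STATUS 84 (...)". A small sketch for pulling both forms out of a log follows; `exit_statuses` is a hypothetical helper, and the pattern only covers the two phrasings that appear in these excerpts.

```python
import re

# Two lines copied from the log excerpts in this post.
LOG = """\
03/11/2008 02:14:04 backupcenter240 bjdb01 backup of client bjdb01 exited with status 6 (the backup failed to back up the requested files)
03/12/2008 09:43:08 backupcenter240 bjdb01 CLIENT bjdb01 POLICY oracle SCHED Default-Application-Backup EXIT STATUS 84 (media write error)
"""

STATUS_RE = re.compile(r"(?:exited with status|EXIT STATUS) (\d+) \(([^)]*)\)")


def exit_statuses(text):
    """Return a list of (status code, description) pairs found in the log text."""
    return [(int(code), desc) for code, desc in STATUS_RE.findall(text)]


if __name__ == "__main__":
    for code, desc in exit_statuses(LOG):
        print(code, desc)
```

Here the earlier jobs ended with the generic status 6, while the later one ended with status 84, whose description names a media write error directly.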

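So this was evidently more than a software problem. The vendor eventually confirmed that the tape library's read/write head was faulty, and replacing the part resolved the issue.

Reposted from: https://www.cnblogs.com/myitworld/archive/2008/04/22/2214883.html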
