For some reason, I’ve been working on lots of Exadata V2 systems in the past few months.  One issue that keeps coming up for these clients is failure of the battery used by the RAID controller.  These batteries were originally expected to last 2 years.  Unfortunately, a defect causes them to reach end of life after approximately 18 months.  The local Sun reps should have access to a schedule that says when the “regular maintenance” should occur.  For one client, the problem wasn’t caught until the batteries had run down completely and the disks were in WriteThrough mode.  You can see this by running MegaCli64.  Here is the output that shows the WriteBack/WriteThrough status for 2 different compute nodes (V2 is first, X2-2 is second):

[enkdb01:root] /root
> dmidecode -s system-product-name
SUN FIRE X4170 SERVER
[enkdb01:root] /root
> /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy   : Disabled
[root@enkdb03 ~]# dmidecode -s system-product-name
SUN FIRE X4170 M2 SERVER
[root@enkdb03 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy   : Disabled

If you have a V2 and you haven’t replaced the batteries yet, it’s worth running these commands to see what state your RAID controllers are in.  To find out what this means for you, read on after the break.
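If you would rather not eyeball the output, here is a minimal sketch of a check that reads the `-LDInfo` output on stdin and reports the current policy. The function name `check_cache_policy`, the messages, and the exit codes are my own choices for illustration, not part of MegaCli64.

```shell
#!/bin/bash
# Sketch: classify the *current* cache policy from `MegaCli64 -LDInfo -LALL -aALL` output.
# check_cache_policy is a hypothetical helper name; it reads the MegaCli64 output on stdin.
check_cache_policy() {
    local current
    # Keep only the "Current Cache Policy" lines; the default policy reads WriteBack either way.
    current=$(grep "Current Cache Policy" || true)
    case "$current" in
        "")
            echo "UNKNOWN: no Current Cache Policy lines found"
            return 2 ;;
        *WriteThrough*)
            echo "WARNING: at least one logical drive is in WriteThrough mode"
            return 1 ;;
        *)
            echo "OK: all logical drives are in WriteBack mode" ;;
    esac
}
```

It would be used as `/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | check_cache_policy`, and the non-zero exit code makes it easy to hook into whatever monitoring you already have.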

As you can see, on the V2 (X4170), the current cache policy is set to WriteThrough mode, while the X2 (X4170 M2) is still running in WriteBack mode.  What’s the difference?  According to the interwebs (http://goo.gl/6LyVK):

RAID Features
Write Through Cache

With Write Through Cache the data is written to both the cache and drive once the data is retrieved. As the data is written to both places, should the information be required it can be retrieved from the cache for faster access. The downside of this method is that the time to carry out a Write operation is greater the time to do a Write to a non cache device. The total Write time is the time to write to the cache plus the time to Write the disk.

Write Back Cache

With Write Back Cache the write operation does not suffer from the Write time delay. The block of data is initially written to the cache, only when the cache is full or required is the data written to the disk.

The limitation of this method is that the storage device for a period of time does not contain the new or updated block of data. If the data in the cache is lost due to power failure the data cannot be recovered. When using Write Back Cache a battery backup module would prevent data loss in a RAID power failure. 

We obviously have a problem on the V2 system, since it’s running in WriteThrough mode.  Why is this?  The first thing to check is the status of the battery.  From the Exadata setup/configuration best practices note (#1274318.1), we see the command that can be run to check the battery condition.  From the note:

Proactive battery replacement should be performed within 60 days for any batteries that meet the following criteria:

1) “Full Charge Capacity” less than or equal to 800 mAh and “Max Error” less than 10%.

Immediately replace any batteries that meet the following criteria:

1) “Max Error” is 10% or greater (battery deemed unreliable regardless of “Full Charge Capacity” reading)

2) “Full Charge Capacity” less than 674 mAh regardless of “Max Error” reading
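To make the thresholds concrete, here is a small sketch that turns the two readings into a verdict. It reflects my reading of the note, where the listed conditions are the ones that trigger replacement; the function name `classify_bbu` and the verdict strings are made up for illustration.

```shell
#!/bin/bash
# Sketch: apply the note's thresholds to a battery's readings.
# Usage: classify_bbu FULL_CHARGE_CAPACITY_MAH MAX_ERROR_PERCENT
# classify_bbu and its verdict strings are hypothetical, not an Oracle or LSI tool.
classify_bbu() {
    local capacity=$1 max_error=$2
    if [ "$max_error" -ge 10 ] || [ "$capacity" -lt 674 ]; then
        # Unreliable reading, or capacity below the immediate-replacement level.
        echo "REPLACE_NOW"
    elif [ "$capacity" -le 800 ]; then
        # Still usable, but inside the proactive-replacement window.
        echo "REPLACE_WITHIN_60_DAYS"
    else
        echo "OK"
    fi
}
```

For example, `classify_bbu 750 3` prints `REPLACE_WITHIN_60_DAYS`, while any battery with a “Max Error” of 10 or more prints `REPLACE_NOW` no matter what capacity it reports.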

Let’s run the command on our X2 and see how it looks:

[root@enkdb03 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep "Full Charge" -A5 | sort | grep Full -A1
Full Charge Capacity: 1358 mAh
Max Error: 0 %

That looks great, just what we should expect. The “Full Charge Capacity” is well over the 800 mAh threshold. Now, let’s look at the system that’s in WriteThrough mode:

[enkdb01:root] /root
> /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep "Full Charge" -A5 | sort | grep Full -A1
Full Charge Capacity: 562 mAh
Max Error: 2 %

That’s not good.  Full charge capacity is below the dreaded “immediate replacement” level, so we need to get the batteries replaced.  The process is very straightforward: the node (cell or compute) is powered off and opened up, the old battery is unplugged from the LSI disk controller, and the new battery is connected.  Repeat for each of your servers.  No outage is required, since the batteries can be replaced in a rolling fashion, one node at a time.  After a few hours, the new battery is charged and the disks return to WriteBack mode.  One caveat is that the LSI firmware image included in Exadata storage server versions prior to 11.2.2.1.1 does not recognize the new batteries.  Most V2 Exadata systems shipped with version 11.2.1.3.1 or older.  This means that if you’re running a V2 and haven’t patched yet, it’s time to start looking at getting the system up to date.  If you stay on an older version and choose to replace the batteries, you will most likely see no benefit.  All the more reason to keep your Exadata up to date.
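If you want to script the version check before scheduling the replacement, a rough sketch follows. It compares dotted version strings field by field with awk. The helper names `version_lt` and `needs_patch_for_new_bbu` are hypothetical, and I’m assuming you pass in the image version yourself (for example, as reported by `imageinfo` on the node).

```shell
#!/bin/bash
# Sketch: decide whether an image version predates 11.2.2.1.1.
# version_lt A B succeeds (exit 0) only when dotted version A is strictly older than B.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            # Missing trailing fields count as 0, so 11.2 equals 11.2.0.
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1   # versions are equal, so A is not older
    }'
}

# needs_patch_for_new_bbu IMAGE_VERSION (hypothetical helper)
needs_patch_for_new_bbu() {
    local minimum="11.2.2.1.1"
    if version_lt "$1" "$minimum"; then
        echo "PATCH_FIRST"
    else
        echo "VERSION_OK"
    fi
}
```

With the version most V2 systems shipped with, `needs_patch_for_new_bbu 11.2.1.3.1` prints `PATCH_FIRST`.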

One thing that I was curious about was how the battery degradation was missed.  Exadata systems run a periodic check of the battery that forces it down to no charge and then allows it to charge back up.  If you own an Exadata system, you have most likely seen the quarterly (formerly monthly) alerts that all drives are in WriteThrough mode.  After a few cycles, it’s easy to become desensitized to these messages and just delete them.  It’s imperative that you also receive a message that the disks have returned to WriteBack mode.  If you don’t receive this, then your batteries may need to be replaced.  Two messages should be received for each storage server: one that the disks are in WriteThrough mode, and one that they’ve returned to WriteBack mode.  The messages should look like this:


Unfortunately, it’s not quite good enough to wait for the all-clear messages to come flooding into your mailbox.  The compute nodes do not send these messages, which means that you could be in WriteThrough mode without being notified.  While it’s not as critical as it is on the storage servers, running in WriteThrough mode will cause some performance degradation for operations against the local disks (trace files, logs, local batch jobs, etc.).  To resolve this, I suggest running Exachk (MOS note #1070954.1) at least once a quarter.  It will help diagnose anything that may have gone sideways in your Exadata environment.
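Between Exachk runs, a scheduled check across the compute nodes can fill the notification gap. The sketch below assumes the cache policy lines have been collected with each line prefixed by `hostname: ` (the format dcli produces) and prints the hosts currently in WriteThrough mode. The function name `writethrough_hosts` is made up, and the `dbs_group` file in the usage note is assumed to list your compute nodes.

```shell
#!/bin/bash
# Sketch: list hosts whose current cache policy is WriteThrough.
# Expects lines on stdin such as:
#   enkdb01: Current Cache Policy: WriteThrough, ReadAheadNone, Direct, ...
# writethrough_hosts is a hypothetical helper name.
writethrough_hosts() {
    # Field 1 is the hostname prefix; sort -u collapses hosts with several logical drives.
    awk -F': ' '/Current Cache Policy/ && /WriteThrough/ { print $1 }' | sort -u
}
```

One way to feed it, under those assumptions, is `dcli -g dbs_group -l root '/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | grep "Current Cache Policy"' | writethrough_hosts`. Any hostname it prints is a node worth a closer look at the battery.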

Happy replacing!


Reposted from: https://www.cnblogs.com/ericli/articles/4257177.html
