How to Locate a Disk's Physical Slot and Its Device Name

From Lin.Wang

Section One : Introduction

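storcli is the successor to megacli. On Dell servers the equivalent tool is perccli, and its usage is identical.

smartctl reads the SMART information reported by a disk's controller chip.

lsscsi lists the system's SCSI devices; its data comes from the kernel's SCSI information (/proc/scsi/scsi and related sources). It is not covered in detail in this document.

All of these are common tools for inspecting disks, and they help when troubleshooting disk state and RAID card problems.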

Section Two : Install package

Install storcli or perccli, then symlink the binary into /usr/bin/ so that the command is available on the PATH:

ln -s /opt/MegaRAID/storcli/storcli64 /usr/bin/

ln -s /opt/MegaRAID/perccli/perccli64 /usr/bin/
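Once the symlinks are in place, a script can check which of the two tools is actually reachable before running anything else. The following is a minimal Python sketch, assuming the binaries are named perccli64 and storcli64 as in this document; find_raid_cli is a hypothetical helper name, not part of either tool.

```python
import shutil


def find_raid_cli(candidates=("perccli64", "storcli64")):
    """Return the full path of the first RAID CLI found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # None when the command is not on PATH
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_raid_cli() or "neither perccli64 nor storcli64 is on PATH")
```

Checking perccli64 first is only a choice for Dell servers; swap the order of candidates on other hardware.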

Section Three : Step

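To go from a system device name such as /dev/sdf to the physical slot of the disk, proceed as follows.

perccli64 /c0/eall/sall show lists the physical drives on the controller:

[Figure: output of perccli64 /c0/eall/sall show; the full listing appears in Section Four]

The output shows four JBOD disks. In our experience, JBOD disks are usually assigned device names before RAID virtual drives, so here the JBOD disks occupy /dev/sda through /dev/sdd and the RAID virtual drives start at /dev/sde.

DG stands for drive group, and the DG number reflects the order in which the groups were created when the RAID was configured. In the output, 32:4 and 32:5 belong to the same drive group.

perccli64 /c0/vall show lists how DG and VD correspond:

[Figure: output of perccli64 /c0/vall show; the full listing appears in Section Four]

The DG/VD column ties each RAID drive group to a virtual drive, and the VD order matches the order of the devices in the operating system. If the server has only RAID virtual drives, VD 0 is usually /dev/sda, VD 1 is /dev/sdb, and so on. If the server also has JBOD disks, the RAID virtual drives are ordered after them. In this example VD 0 = /dev/sde, so /dev/sdf is VD 1, which corresponds to DG 1.

Back in the /c0/eall/sall output, the drive with DG 1 has DID 6. DID is the device ID, which is used again later with smartctl. Its slot number (Slt) is also 6, so the disk sits in the seventh bay of the server (slots are numbered from 0). This locates the physical slot of /dev/sdf.

Conversely, starting from a disk whose fault LED is lit on the server, the same steps in reverse give the corresponding system device name.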
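The index arithmetic behind this mapping can be sketched in Python. This is only an illustration of the rule of thumb described here (JBOD disks are named first, then the virtual drives in VD order); the function names are hypothetical, and only single-letter names sda to sdz are handled.

```python
import string


def device_index(dev):
    """Return the 0-based position of a /dev/sdX name (sda -> 0, sdf -> 5)."""
    name = dev.rsplit("/", 1)[-1]
    letter = name[2:]
    if not name.startswith("sd") or len(letter) != 1 or letter not in string.ascii_lowercase:
        raise ValueError("only single-letter names sda..sdz are handled")
    return ord(letter) - ord("a")


def device_to_vd(dev, jbod_count):
    """Map a device name to its VD number, assuming JBOD disks are named first."""
    vd = device_index(dev) - jbod_count
    if vd < 0:
        raise ValueError(f"{dev} is one of the {jbod_count} JBOD disks, not a VD")
    return vd


def vd_to_device(vd, jbod_count):
    """Inverse mapping: VD number -> expected device name (up to sdz)."""
    return "/dev/sd" + string.ascii_lowercase[vd + jbod_count]


if __name__ == "__main__":
    print(device_to_vd("/dev/sdf", jbod_count=4))
    print(vd_to_device(0, jbod_count=4))
```

With the four JBOD disks of this example, /dev/sdf maps to VD 1 and VD 0 maps back to /dev/sde. The result is a starting point to confirm with the locate LED, not a guarantee.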

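Note:

If the server has no JBOD disks and everything is RAID, the /c0/vall output alone is enough to establish the mapping.

In practice you can also run perccli64 /c0/e32/s6 start locate and perccli64 /c0/e32/s6 stop locate to turn the drive's locate LED on and off and confirm that you have found the right disk.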

Section Four : storcli/perccli Usage

Viewing controller information

perccli64 show ctrlcount shows how many controllers (RAID cards) are present.

perccli64 show displays the RAID card information.

[root@node-15 ~]# perccli64 show

Status Code = 0

Status = Success

Description = None

Number of Controllers = 1

Host Name = node-15.domain.tld

Operating System = Linux3.10.0-327.20.1.es2.el7.x86_64

System Overview :

===============

------------------------------------------------------------------------

Ctl Model Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS EHS ASOs Hlth

------------------------------------------------------------------------

0 PERCH730Mini 8 16 11 0 11 0 Opt On 3 N 0 Opt

------------------------------------------------------------------------

Ctl=Controller Index|DGs=Drive groups|VDs=Virtual drives|Fld=Failed

PDs=Physical drives|DNOpt=DG NotOptimal|VNOpt=VD NotOptimal|Opt=Optimal

Msng=Missing|Dgd=Degraded|NdAtn=Need Attention|Unkwn=Unknown

sPR=Scheduled Patrol Read|DS=DimmerSwitch|EHS=Emergency Hot Spare

Y=Yes|N=No|ASOs=Advanced Software Options|BBU=Battery backup unit

Hlth=Health|Safe=Safe-mode boot

There is only one RAID card: controller 0, addressed as /c0.

storcli64 /c0 show

[root@node-15 ~]# perccli64 /c0 show

Generating detailed summary of the adapter, it may take a while to complete.

Controller = 0

Status = Success

Description = None

Product Name = PERC H730 Mini

Serial Number = 663021Z

SAS Address = 51866da066153000

PCI Address = 00:03:00:00

System Time = 01/10/2019 20:48:38

Mfg. Date = 06/17/16

Controller Time = 01/10/2019 12:44:21

FW Package Build = 25.4.0.0017

BIOS Version = 6.29.00.0_4.16.07.00_0x06120100

FW Version = 4.260.00-6259

Driver Name = megaraid_sas

Driver Version = 06.807.10.00-rh1

Current Personality = RAID-Mode

Vendor Id = 0x1000

Device Id = 0x5D

SubVendor Id = 0x1028

SubDevice Id = 0x1F49

Host Interface = PCI-E

Device Interface = SAS-12G

Bus Number = 3

Device Number = 0

Function Number = 0

Drive Groups = 11

TOPOLOGY :

========

---------------------------------------------------------------------------

DG Arr Row EID:Slot DID Type State BT Size PDC PI SED DS3 FSpace TR

---------------------------------------------------------------------------

0 - - - - RAID1 Optl N 931.0 GB dflt N N dflt N N

0 0 - - - RAID1 Optl N 931.0 GB dflt N N dflt N N

0 0 0 32:4 4 DRIVE Onln N 931.0 GB dflt N N dflt - N

0 0 1 32:5 5 DRIVE Onln N 931.0 GB dflt N N dflt - N

1 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

1 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

1 0 0 32:6 6 DRIVE Onln N 931.0 GB dflt N N dflt - N

2 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

2 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

2 0 0 32:7 7 DRIVE Onln N 931.0 GB dflt N N dflt - N

3 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

3 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

3 0 0 32:8 8 DRIVE Onln N 931.0 GB dflt N N dflt - N

4 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

4 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

4 0 0 32:9 9 DRIVE Onln N 931.0 GB dflt N N dflt - N

5 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

5 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

5 0 0 32:10 10 DRIVE Onln N 931.0 GB dflt N N dflt - N

6 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

6 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

6 0 0 32:11 11 DRIVE Onln N 931.0 GB dflt N N dflt - N

7 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

7 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

7 0 0 32:12 12 DRIVE Onln N 931.0 GB dflt N N dflt - N

8 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

8 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

8 0 0 32:13 13 DRIVE Onln N 931.0 GB dflt N N dflt - N

9 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

9 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

9 0 0 32:14 14 DRIVE Onln N 931.0 GB dflt N N dflt - N

10 - - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

10 0 - - - RAID0 Optl N 931.0 GB dflt N N dflt N N

10 0 0 32:15 15 DRIVE Onln N 931.0 GB dflt N N dflt - N

---------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID

DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Dgrd=Degraded

Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active

PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign

DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present

TR=Transport Ready

Virtual Drives = 11

VD LIST :

=======

-------------------------------------------------------------

DG/VD TYPE State Access Consist Cache Cac sCC Size Name

-------------------------------------------------------------

0/0 RAID1 Optl RW Yes RWBD - OFF 931.0 GB

1/1 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

2/2 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

3/3 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

4/4 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

5/5 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

6/6 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

7/7 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

8/8 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

9/9 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

10/10 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded

Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|

Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|

FWB=Force WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled

Check Consistency

Physical Drives = 16

PD LIST :

=======

----------------------------------------------------------------------------

EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp

----------------------------------------------------------------------------

32:0 0 JBOD - 185.75 GB SATA SSD N N 512B INTEL SSDSC2BX200G4R U

32:1 1 JBOD - 185.75 GB SATA SSD N N 512B INTEL SSDSC2BX200G4R U

32:2 2 JBOD - 185.75 GB SATA SSD N N 512B INTEL SSDSC2BX200G4R U

32:3 3 JBOD - 185.75 GB SATA SSD N N 512B INTEL SSDSC2BX200G4R U

32:4 4 Onln 0 931.0 GB SATA HDD N N 512B ST91000640NS U

32:5 5 Onln 0 931.0 GB SATA HDD N N 512B ST91000640NS U

32:6 6 Onln 1 931.0 GB SATA HDD N N 512B ST91000640NS U

32:7 7 Onln 2 931.0 GB SATA HDD N N 512B ST91000640NS U

32:8 8 Onln 3 931.0 GB SATA HDD N N 512B ST91000640NS U

32:9 9 Onln 4 931.0 GB SATA HDD N N 512B ST91000640NS U

32:10 10 Onln 5 931.0 GB SATA HDD N N 512B ST91000640NS U

32:11 11 Onln 6 931.0 GB SATA HDD N N 512B ST91000640NS U

32:12 12 Onln 7 931.0 GB SATA HDD N N 512B ST91000640NS U

32:13 13 Onln 8 931.0 GB SATA HDD N N 512B ST91000640NS U

32:14 14 Onln 9 931.0 GB SATA HDD N N 512B ST91000640NS U

32:15 15 Onln 10 931.0 GB SATA HDD N N 512B ST91000640NS U

----------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup

DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare

UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface

Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info

SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign

UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded

CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded

BBU_Info :

========

----------------------------------------------

Model State RetentionTime Temp Mode MfgDate

----------------------------------------------

BBU Optimal 0 hour(s) 38C - 0/00/00

----------------------------------------------

View each disk's Device ID, Slot No. and DriveGroup

[root@node-15 ~]# perccli64 /c0/eall/sall show

Controller = 0

Status = Success

Description = Show Drive Information Succeeded.

Drive Information :

=================

----------------------------------------------------------------------------

EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp

----------------------------------------------------------------------------

32:0 0 JBOD - 185.75 GB SATA SSD N N 512B INTEL SSDSC2BX200G4R U

32:1 1 JBOD - 185.75 GB SATA SSD N N 512B INTEL SSDSC2BX200G4R U

32:2 2 JBOD - 185.75 GB SATA SSD N N 512B INTEL SSDSC2BX200G4R U

32:3 3 JBOD - 185.75 GB SATA SSD N N 512B INTEL SSDSC2BX200G4R U

32:4 4 Onln 0 931.0 GB SATA HDD N N 512B ST91000640NS U

32:5 5 Onln 0 931.0 GB SATA HDD N N 512B ST91000640NS U

32:6 6 Onln 1 931.0 GB SATA HDD N N 512B ST91000640NS U

32:7 7 Onln 2 931.0 GB SATA HDD N N 512B ST91000640NS U

32:8 8 Onln 3 931.0 GB SATA HDD N N 512B ST91000640NS U

32:9 9 Onln 4 931.0 GB SATA HDD N N 512B ST91000640NS U

32:10 10 Onln 5 931.0 GB SATA HDD N N 512B ST91000640NS U

32:11 11 Onln 6 931.0 GB SATA HDD N N 512B ST91000640NS U

32:12 12 Onln 7 931.0 GB SATA HDD N N 512B ST91000640NS U

32:13 13 Onln 8 931.0 GB SATA HDD N N 512B ST91000640NS U

32:14 14 Onln 9 931.0 GB SATA HDD N N 512B ST91000640NS U

32:15 15 Onln 10 931.0 GB SATA HDD N N 512B ST91000640NS U

----------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup

DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare

UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface

Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info

SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign

UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded

CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded
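Looking up the slot and DID for a given DG in this table can be scripted. Below is a minimal Python sketch, assuming every data row follows the column layout shown above (EID:Slt, DID, State, DG, a two-token Size, Intf, Med, SED, PI, SeSz, Model, Sp) and that only the Model column may contain spaces. parse_pd_row and drives_in_dg are hypothetical helper names.

```python
def parse_pd_row(line):
    """Parse one data row of the 'EID:Slt DID State DG Size ...' table."""
    tok = line.split()
    eid, slot = tok[0].split(":")
    return {
        "eid": int(eid),
        "slot": int(slot),
        "did": int(tok[1]),
        "state": tok[2],
        # JBOD and unconfigured drives show '-' in the DG column
        "dg": None if tok[3] == "-" else int(tok[3]),
        "size": f"{tok[4]} {tok[5]}",
        # everything between SeSz and the final Sp column is the model
        "model": " ".join(tok[11:-1]),
    }


def drives_in_dg(rows, dg):
    """Return the parsed rows that belong to drive group `dg`."""
    parsed = [parse_pd_row(r) for r in rows]
    return [p for p in parsed if p["dg"] == dg]


if __name__ == "__main__":
    sample = [
        "32:3 3 JBOD - 185.75 GB SATA SSD N N 512B INTEL SSDSC2BX200G4R U",
        "32:6 6 Onln 1 931.0 GB SATA HDD N N 512B ST91000640NS U",
    ]
    for drive in drives_in_dg(sample, 1):
        print(drive["eid"], drive["slot"], drive["did"])
```

Feeding it only the data rows (lines that start with an EID:Slt pair) keeps the sketch short; header, separator and legend lines would need to be filtered out first.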

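Note:

In our experience, under the default udev rules on CentOS, JBOD disks are named before RAID virtual drives (if the configuration is changed online, the JBOD disks move to the front after a reboot). lsscsi shows that, on the same RAID controller, JBOD disks have a lower channel value than RAID virtual drives. In the output below, the second number of the first field is 0 for the JBOD disk and 2 for the RAID virtual drives.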

[root@SZVPN-2 udev]# lsscsi

[0:0:24:0] disk IBM-ESXS MBF2600RC SB2C /dev/sda

[0:2:0:0] disk IBM ServeRAID M5110e 3.19 /dev/sdb

[0:2:1:0] disk IBM ServeRAID M5110e 3.19 /dev/sdc

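In addition, the scsi_level that udev reports for a JBOD device is higher than that of a RAID virtual drive:

udevadm info -a -p /sys/class/block/sdx | grep scsi_level

In my tests the scsi_level was 7 for JBOD and 6 for RAID.

The relevant udev rule file is /lib/udev/rules.d/60-persistent-storage.rules, which matches on scsi_level:

ATTRS{scsi_level}=="[6-9]*"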
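The channel comparison from the lsscsi output can be sketched in Python. The classification below only encodes what was observed on this hardware (channel 0 for JBOD, channel 2 for RAID virtual drives); treat those two values as assumptions that may differ on other controllers. The function names are hypothetical.

```python
import re

# [host:channel:target:lun]  type  vendor/model/rev ...  /dev/node
LSSCSI_RE = re.compile(r"^\[(\d+):(\d+):(\d+):(\d+)\]\s+(\S+)\s+.*?(/dev/\S+)\s*$")


def parse_lsscsi_line(line):
    """Return the H:C:T:L tuple, device type and device node, or None."""
    m = LSSCSI_RE.match(line.strip())
    if not m:
        return None
    hctl = tuple(int(g) for g in m.group(1, 2, 3, 4))
    return {"hctl": hctl, "type": m.group(5), "device": m.group(6)}


def kind_by_channel(entry, jbod_channel=0, raid_channel=2):
    """Classify by channel number; the defaults are this document's observation."""
    channel = entry["hctl"][1]
    if channel == jbod_channel:
        return "jbod"
    if channel == raid_channel:
        return "raid"
    return "unknown"


if __name__ == "__main__":
    for text in (
        "[0:0:24:0] disk IBM-ESXS MBF2600RC SB2C /dev/sda",
        "[0:2:0:0] disk IBM ServeRAID M5110e 3.19 /dev/sdb",
    ):
        entry = parse_lsscsi_line(text)
        print(entry["device"], kind_by_channel(entry))
```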

View the information for a specific disk

[root@node-15 ~]# perccli64 /c0/e32/s6 show all

Controller = 0

Status = Success

Description = Show Drive Information Succeeded.

Drive /c0/e32/s6 :

================

-------------------------------------------------------------------

EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp

-------------------------------------------------------------------

32:6 6 Onln 1 931.0 GB SATA HDD N N 512B ST91000640NS U

-------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup

DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare

UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface

Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info

SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign

UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded

CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded

Drive /c0/e32/s6 - Detailed Information :

=======================================

Drive /c0/e32/s6 State :

======================

Shield Counter = 0

Media Error Count = 46431 *** clearly a problem: 46431 media errors have occurred ***

Other Error Count = 0

Drive Temperature = 31C (87.80 F)

Predictive Failure Count = 126 *** 126 predictive failures ***

S.M.A.R.T alert flagged by drive = Yes

Drive /c0/e32/s6 Device attributes :

==================================

SN = 9XGA228L

Manufacturer Id = ATA

Model Number = ST91000640NS

NAND Vendor = NA

WWN = 5000c500918f2f8a

Firmware Revision = AA63

Raw size = 931.512 GB [0x74706db0 Sectors]

Coerced size = 931.0 GB [0x74600000 Sectors]

Non Coerced size = 931.012 GB [0x74606db0 Sectors]

Device Speed = 6.0Gb/s

Link Speed = 12.0Gb/s

NCQ setting = N/A

Write Cache = Enabled

Logical Sector Size = 512B

Physical Sector Size = 512B

Connector Name = 00

Drive /c0/e32/s6 Policies/Settings :

==================================

Drive position = DriveGroup:1, Span:0, Row:0

Enclosure position = 0

Connected Port Number = 0(path0)

Sequence Number = 2

Commissioned Spare = No

Emergency Spare = No

Last Predictive Failure Event Sequence Number = 95183 *** sequence number of the last predictive failure event ***

Successful diagnostics completion on = N/A

SED Capable = No

SED Enabled = No

Secured = No

Cryptographic Erase Capable = No

Locked = No

Needs EKM Attention = No

PI Eligible = No

Certified = Yes

Wide Port Capable = No

Port Information :

================

-----------------------------------------

Port Status Linkspeed SAS address

-----------------------------------------

0 Active 12.0Gb/s 0x500056b33fefe586

-----------------------------------------

Inquiry Data =

5a 0c ff 3f 37 c8 10 00 00 00 00 00 3f 00 00 00

00 00 00 00 20 20 20 20 20 20 20 20 20 20 20 20

58 39 41 47 32 32 4c 38 00 00 00 00 04 00 20 20

20 20 41 41 33 36 54 53 31 39 30 30 36 30 30 34

53 4e 20 20 20 20 20 20 20 20 20 20 20 20 20 20

20 20 20 20 20 20 20 20 20 20 20 20 20 20 10 80

00 40 00 2f 00 40 00 02 00 02 07 00 ff 3f 10 00

3f 00 10 fc fb 00 10 00 ff ff ff 0f 00 00 07 00

Note:

The detailed information for this single drive shows media errors, which indicates that the disk is faulty.
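The same check can be done by a script that reads the "Key = Value" lines of the show all output. This is a minimal Python sketch under the assumption that the counters appear under the exact key names seen in the output above; the choice of which keys to flag is illustrative, and the helper names are hypothetical.

```python
import re


def parse_key_values(text):
    """Collect 'Key = Value' lines from the 'show all' output into a dict."""
    info = {}
    for line in text.splitlines():
        if " = " in line:
            key, value = line.split(" = ", 1)
            info[key.strip()] = value.strip()
    return info


def leading_int(value):
    """Return the integer at the start of a value, ignoring trailing notes."""
    m = re.match(r"\d+", value)
    return int(m.group()) if m else None


def drive_warnings(info):
    """List the error counters that are non-zero, plus the SMART alert flag."""
    warnings = []
    for key in ("Media Error Count", "Other Error Count", "Predictive Failure Count"):
        count = leading_int(info.get(key, ""))
        if count:
            warnings.append(f"{key} = {count}")
    if info.get("S.M.A.R.T alert flagged by drive") == "Yes":
        warnings.append("S.M.A.R.T alert flagged by drive")
    return warnings


if __name__ == "__main__":
    sample = "Media Error Count = 46431\nOther Error Count = 0\n"
    print(drive_warnings(parse_key_values(sample)))
```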

View the mapping between virtual drives and system devices

[root@node-15 ~]# perccli64 /c0/vall show

Controller = 0

Status = Success

Description = None

Virtual Drives :

==============

-------------------------------------------------------------

DG/VD TYPE State Access Consist Cache Cac sCC Size Name

-------------------------------------------------------------

0/0 RAID1 Optl RW Yes RWBD - OFF 931.0 GB

1/1 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

2/2 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

3/3 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

4/4 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

5/5 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

6/6 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

7/7 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

8/8 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

9/9 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

10/10 RAID0 Optl RW Yes RWBD - OFF 931.0 GB

-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded

Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|

Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|

FWB=Force WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled

Check Consistency

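Note:

VD: generally taken as the order of the device in the system. If there are only RAID virtual drives, VD 0 is /dev/sda, VD 1 is /dev/sdb, and so on. If there are JBOD disks, they are named first; for example, if the JBOD disks run up to /dev/sdc, VD 0 is /dev/sdd, and so on.

DG: the order in which the drive groups were configured on the RAID card.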
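To look up the DG for a given VD without reading the table by eye, the DG/VD column can be parsed. A minimal Python sketch, assuming data rows start with a DG/VD pair followed by a RAID level, as in the /c0/vall output above; parse_vd_rows is a hypothetical name.

```python
import re

# A data row looks like: '1/1 RAID0 Optl RW Yes RWBD - OFF 931.0 GB'
VD_ROW_RE = re.compile(r"^\s*(\d+)/(\d+)\s+RAID\d+\s")


def parse_vd_rows(text):
    """Return a {vd_number: dg_number} dict built from the DG/VD column."""
    mapping = {}
    for line in text.splitlines():
        m = VD_ROW_RE.match(line)
        if m:
            dg, vd = int(m.group(1)), int(m.group(2))
            mapping[vd] = dg
    return mapping


if __name__ == "__main__":
    sample = (
        "DG/VD TYPE State Access Consist Cache Cac sCC Size Name\n"
        "0/0 RAID1 Optl RW Yes RWBD - OFF 931.0 GB\n"
        "1/1 RAID0 Optl RW Yes RWBD - OFF 931.0 GB\n"
    )
    print(parse_vd_rows(sample))
```

In this example every VD has the same number as its DG, but the two can differ when a drive group carries more than one virtual drive, which is why the lookup goes through the table rather than assuming they are equal.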

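Commands for collecting RAID card logs

storcli64 /c0 show time  shows the RAID controller's time

storcli64 /c0 show alilog logfile=node-x.alilog  collects the alilog, which includes all of the logs

storcli64 /c0 show all logfile=node-x.all.log  saves the RAID card information

storcli64 /c0 show badblocks  shows bad block information for the disks

perccli64 /c0 show events filter=fatal  shows events at the fatal level; this captures every critical event and reveals disk or RAID card failures

perccli64 /c0 show cc  shows the consistency check. Arrays at RAID 1 and above keep data on several disks and need consistency checks, whereas a single-disk RAID 0 probably does not. Whether the check affects performance has not been determined.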
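When logs have to be gathered from several nodes, the commands listed here can be generated per node. The sketch below is a minimal Python example that only builds the argument lists; the node name is used for the logfile names in the same pattern as node-x above, and log_collection_commands is a hypothetical helper.

```python
def log_collection_commands(cli="storcli64", controller=0, node="node-x"):
    """Build the argv lists for the log collection commands of one controller."""
    ctl = f"/c{controller}"
    return [
        [cli, ctl, "show", "time"],
        [cli, ctl, "show", "alilog", f"logfile={node}.alilog"],
        [cli, ctl, "show", "all", f"logfile={node}.all.log"],
        [cli, ctl, "show", "badblocks"],
        [cli, ctl, "show", "events", "filter=fatal"],
        [cli, ctl, "show", "cc"],
    ]


if __name__ == "__main__":
    for argv in log_collection_commands(cli="perccli64", node="node-15"):
        print(" ".join(argv))
```

Each list can be handed to subprocess.run as is; printing them first makes it easy to review what would be executed.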

Section Five : Smartctl Get Error info of Disks

Common Commands Usage Description

--scan  Scan for devices

--scan-open  Scan for devices and try to open each device

-x, --xall  Show all information for device

-a, --all  Show all SMART information for device

-i, --info  Show identity information for device

-d TYPE, --device=TYPE  Specify device type to one of: ata, scsi, nvme[,NSID], sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbprolific, usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, aacraid,H,L,ID, cciss,N, auto, test

-s VALUE, --smart=VALUE  Enable/disable SMART on device (on/off)

-o VALUE, --offlineauto=VALUE  (ATA) Enable/disable automatic offline testing on device (on/off)

-S VALUE, --saveauto=VALUE  (ATA) Enable/disable Attribute autosave on device (on/off)

-H, --health  Show device SMART health status

-c, --capabilities  (ATA, NVMe) Show device SMART capabilities

-A, --attributes  Show device SMART vendor-specific Attributes and values

-l TYPE, --log=TYPE  Show device log. TYPE: error, selftest, selective, directory[,g|s], xerror[,N][,error], xselftest[,N][,selftest], background, sasphy[,reset], sataphy[,reset], scttemp[sts,hist], scttempint,N[,p], scterc[,N,M], devstat[,N], ssd, gplog,N[,RANGE], smartlog,N[,RANGE], nvmelog,N,SIZE

-t TEST, --test=TEST  Run test. TEST: offline, short, long, conveyance, force, vendor,N, select,M-N, pending,N, afterselect,[on|off]

-X, --abort  Abort any non-captive test on device

Get info for /dev/sdf

List all devices

[root@node-15 ~]# smartctl --scan

/dev/sda -d scsi # /dev/sda, SCSI device

/dev/sdb -d scsi # /dev/sdb, SCSI device

/dev/sdc -d scsi # /dev/sdc, SCSI device

/dev/sdd -d scsi # /dev/sdd, SCSI device

/dev/sde -d scsi # /dev/sde, SCSI device

/dev/sdf -d scsi # /dev/sdf, SCSI device

/dev/sdg -d scsi # /dev/sdg, SCSI device

/dev/sdh -d scsi # /dev/sdh, SCSI device

/dev/sdi -d scsi # /dev/sdi, SCSI device

/dev/sdj -d scsi # /dev/sdj, SCSI device

/dev/sdk -d scsi # /dev/sdk, SCSI device

/dev/sdl -d scsi # /dev/sdl, SCSI device

/dev/sdm -d scsi # /dev/sdm, SCSI device

/dev/sdn -d scsi # /dev/sdn, SCSI device

/dev/sdo -d scsi # /dev/sdo, SCSI device

/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device

/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device

/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device

/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device

/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device

/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device

/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device

/dev/bus/0 -d megaraid,7 # /dev/bus/0 [megaraid_disk_07], SCSI device

/dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device

/dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device

/dev/bus/0 -d megaraid,10 # /dev/bus/0 [megaraid_disk_10], SCSI device

/dev/bus/0 -d megaraid,11 # /dev/bus/0 [megaraid_disk_11], SCSI device

/dev/bus/0 -d megaraid,12 # /dev/bus/0 [megaraid_disk_12], SCSI device

/dev/bus/0 -d megaraid,13 # /dev/bus/0 [megaraid_disk_13], SCSI device

/dev/bus/0 -d megaraid,14 # /dev/bus/0 [megaraid_disk_14], SCSI device

/dev/bus/0 -d megaraid,15 # /dev/bus/0 [megaraid_disk_15], SCSI device

Note:

The earlier sections showed that /dev/sdf has DID (device ID) 6 in perccli, which corresponds to /dev/bus/0 -d megaraid,6 in the scan output.
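The step from a DID to a smartctl invocation can be sketched in Python. This minimal example extracts the megaraid IDs from the --scan output and builds the argument list for one of them; the function names are hypothetical, and the default device /dev/bus/0 is taken from the scan output above.

```python
import re


def megaraid_ids(scan_output):
    """Return the N values of every '-d megaraid,N' entry in the scan output."""
    return [int(n) for n in re.findall(r"-d megaraid,(\d+)", scan_output)]


def smartctl_argv(did, option="-a", device="/dev/bus/0"):
    """Build the smartctl argument list for the disk with the given DID."""
    return ["smartctl", option, "-d", f"megaraid,{did}", device]


if __name__ == "__main__":
    scan = (
        "/dev/sdf -d scsi # /dev/sdf, SCSI device\n"
        "/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device\n"
    )
    for did in megaraid_ids(scan):
        print(" ".join(smartctl_argv(did, "-i", "/dev/sdf")))
```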

View the disk's identity information

[root@node-15 ~]# smartctl -i -d megaraid,6 /dev/sdf

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)

Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family: Seagate Constellation.2 (SATA)

Device Model: ST91000640NS

Serial Number: 9XGA228L

LU WWN Device Id: 5 000c50 0918f2f8a

Add. Product Id: DELL(tm)

Firmware Version: AA63

User Capacity: 1,000,204,886,016 bytes [1.00 TB]

Sector Size: 512 bytes logical/physical

Rotation Rate: 7200 rpm

Form Factor: 2.5 inches

Device is: In smartctl database [for details use: -P show]

ATA Version is: ATA8-ACS T13/1699-D revision 4

SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is: Fri Jan 11 11:28:46 2019 CST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

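View the disk's SMART attributes

This output is generally used to check the indicators of the disk's overall health.

The fields in the output below have the following meanings:

ID: the attribute ID, a number between 1 and 255, usually written in decimal or hexadecimal.

ATTRIBUTE_NAME: the attribute name defined by the disk manufacturer.

FLAG: the attribute handling flags (can be ignored).

VALUE: one of the most important columns. It is the normalized value of the attribute, between 1 and 253, where 253 is the best case and 1 the worst. Depending on the attribute and the manufacturer, the initial VALUE may be set to 100 or 200.

WORST: the lowest VALUE ever recorded.

THRESH: the failure threshold. If WORST is less than or equal to THRESH, the attribute is reported as failed.

TYPE: the attribute type (Pre-fail or Old_age). A Pre-fail attribute is a critical attribute that takes part in the disk's overall SMART health assessment (PASSED/FAILED); if any Pre-fail attribute fails, the disk should be considered about to fail. An Old_age attribute is non-critical (normal wear, for example); its failure does not by itself mean the disk will fail.

UPDATED: how the attribute is updated. Always means it is updated both during normal operation and during offline testing; Offline means it is updated only while offline tests are running.

WHEN_FAILED: FAILING_NOW if VALUE is less than or equal to THRESH; In_the_past if WORST is less than or equal to THRESH; otherwise "-". With FAILING_NOW, back up important files as soon as possible, especially when the attribute is of type Pre-fail. In_the_past means the attribute failed at some point but is fine at the time of the check. "-" means the attribute has never failed.

RAW_VALUE: the manufacturer-defined raw value, from which the normalized VALUE is derived.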
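The WHEN_FAILED rule is a pair of comparisons against THRESH, which a few lines of Python can make explicit. This is a sketch of the rule as described in the field list, not code taken from smartmontools; when_failed is a hypothetical name.

```python
def when_failed(value, worst, thresh):
    """Reproduce the WHEN_FAILED column from VALUE, WORST and THRESH."""
    if value <= thresh:
        return "FAILING_NOW"   # the current normalized value is at or below the threshold
    if worst <= thresh:
        return "In_the_past"   # it dropped to the threshold earlier but has recovered
    return "-"                 # never reached the threshold


if __name__ == "__main__":
    # Raw_Read_Error_Rate of the disk examined in this document
    print(when_failed(value=81, worst=38, thresh=44))
```

For the Raw_Read_Error_Rate row of this disk (VALUE 081, WORST 038, THRESH 044) the function returns In_the_past, matching the smartctl output.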

[root@node-15 ~]# smartctl -A -d megaraid,6 /dev/sdf

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)

Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x010f 081 038 044 Pre-fail Always In_the_past 151546765

3 Spin_Up_Time 0x0103 094 094 000 Pre-fail Always - 0

4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 21

5 Reallocated_Sector_Ct 0x0133 100 100 036 Pre-fail Always - 0

7 Seek_Error_Rate 0x000f 085 060 030 Pre-fail Always - 338813105

9 Power_On_Hours 0x0032 079 079 000 Old_age Always - 18784

10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0

12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 21

184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0

187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 1710

188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0

189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0

190 Airflow_Temperature_Cel 0x0022 069 053 045 Old_age Always - 31 (Min/Max 24/40)

191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0

192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19

193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 852

194 Temperature_Celsius 0x0022 031 047 000 Old_age Always - 31 (0 14 0 0 0)

195 Hardware_ECC_Recovered 0x001a 117 099 000 Old_age Always - 151546765

197 Current_Pending_Sector 0x0012 084 084 000 Old_age Always - 688

198 Offline_Uncorrectable 0x0010 084 084 000 Old_age Offline - 688

199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0

240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 8093 (164 214 0)

241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 1870535293

242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 1530387871
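A table like this is easier to monitor once it is parsed. The following minimal Python sketch splits each attribute row into its ten columns and flags rows worth a closer look. The WATCH set is an illustrative assumption about which raw counters matter, not an official list, and the function names are hypothetical.

```python
COLUMNS = ("id", "name", "flag", "value", "worst", "thresh",
           "type", "updated", "when_failed", "raw_value")

# Attributes whose non-zero raw value is treated as a warning (an assumption).
WATCH = {"Reallocated_Sector_Ct", "Reported_Uncorrect",
         "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_attribute_row(line):
    """Parse one 'smartctl -A' attribute row, or return None for other lines."""
    parts = line.split(None, 9)  # RAW_VALUE may contain spaces, so keep it whole
    if len(parts) != 10 or not parts[0].isdigit():
        return None
    row = dict(zip(COLUMNS, parts))
    for key in ("id", "value", "worst", "thresh"):
        row[key] = int(row[key])
    row["raw_value"] = row["raw_value"].strip()
    return row


def flagged(rows):
    """Return names of attributes that have failed or have a watched raw count."""
    names = []
    for row in rows:
        first = row["raw_value"].split()[0]
        raw = int(first) if first.isdigit() else 0
        if row["when_failed"] != "-" or (row["name"] in WATCH and raw > 0):
            names.append(row["name"])
    return names


if __name__ == "__main__":
    line = "197 Current_Pending_Sector 0x0012 084 084 000 Old_age Always - 688"
    print(flagged([parse_attribute_row(line)]))
```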

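Check the disk's health status

Note:

In the result below the overall assessment is PASSED, meaning the disk is still usable, but smartctl also lists one marginal attribute whose WORST value has fallen below its threshold.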

[root@node-15 ~]# smartctl -H -d megaraid,6 /dev/sdf

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)

Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===

SMART Status not supported: ATA return descriptor not supported by controller firmware

SMART overall-health self-assessment test result: PASSED

Warning: This result is based on an Attribute check.

Please note the following marginal Attributes:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x010f 081 038 044 Pre-fail Always In_the_past 151546765

View the disk's error log

[root@node-15 ~]# smartctl -l error -d megaraid,6 /dev/sdf

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)

Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===

SMART Error Log Version: 1

ATA Error Count: 46431 (device log contains only the most recent five errors)

CR = Command Register [HEX]

FR = Features Register [HEX]

SC = Sector Count Register [HEX]

SN = Sector Number Register [HEX]

CL = Cylinder Low Register [HEX]

CH = Cylinder High Register [HEX]

DH = Device/Head Register [HEX]

DC = Device Command Register [HEX]

ER = Error register [HEX]

ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 46431 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

42 00 00 ff ff ff 4f 00 46d+15:15:32.968 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:29.901 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:26.825 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:23.965 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT

Error 46430 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

42 00 00 ff ff ff 4f 00 46d+15:15:29.901 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:26.825 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:23.965 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:18.093 READ VERIFY SECTOR(S) EXT

Error 46429 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

42 00 00 ff ff ff 4f 00 46d+15:15:26.825 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:23.965 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:18.093 READ VERIFY SECTOR(S) EXT

b0 da 00 00 4f c2 00 00 46d+15:15:17.838 SMART RETURN STATUS

Error 46428 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

42 00 00 ff ff ff 4f 00 46d+15:15:23.965 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:18.093 READ VERIFY SECTOR(S) EXT

b0 da 00 00 4f c2 00 00 46d+15:15:17.838 SMART RETURN STATUS

2f 00 01 e0 00 00 40 00 46d+15:15:17.703 READ LOG EXT

Error 46427 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

42 00 00 ff ff ff 4f 00 46d+15:15:20.905 READ VERIFY SECTOR(S) EXT

42 00 00 ff ff ff 4f 00 46d+15:15:18.093 READ VERIFY SECTOR(S) EXT

b0 da 00 00 4f c2 00 00 46d+15:15:17.838 SMART RETURN STATUS

2f 00 01 e0 00 00 40 00 46d+15:15:17.703 READ LOG EXT

42 00 00 ff ff ff 4f 00 46d+15:15:15.276 READ VERIFY SECTOR(S) EXT
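The timestamps in this log can be checked with a little arithmetic. The Python sketch below parses the Powered_Up_Time format described in the log header and reproduces two numbers from the output: the 49.710-day wrap, which is what a 32-bit millisecond counter gives (stated here as an assumption about the cause), and the conversion of 18640 power-on hours into days and hours. parse_powered_up_time is a hypothetical helper.

```python
import re
from datetime import timedelta

# Assumption: the wrap comes from a 32-bit counter of milliseconds.
WRAP = timedelta(milliseconds=2 ** 32)


def parse_powered_up_time(text):
    """Parse 'DDd+hh:mm:SS.sss' into a timedelta; the day prefix is optional here."""
    m = re.fullmatch(r"(?:(\d+)d\+)?(\d+):(\d+):(\d+)\.(\d{3})", text)
    if not m:
        raise ValueError(f"unexpected Powered_Up_Time format: {text!r}")
    days, hours, minutes, seconds, millis = (int(g or 0) for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)


if __name__ == "__main__":
    print(parse_powered_up_time("46d+15:15:32.968"))
    print(round(WRAP.total_seconds() / 86400, 3), "days until the counter wraps")
    print(divmod(18640, 24), "days and hours for 18640 power-on hours")
```

Because the counter wraps, two errors with similar Powered_Up_Time values are only known to be close together when they also share the same power-on lifetime in hours, as the five errors above do.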

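Additional notes

If SMART is not enabled on a disk, it can be turned on with smartctl -s on device.

If smartctl -i returns little information even though SMART support is reported as available and enabled, a self-test (-t short or -t long) may be needed before the data can be read. These tests do not destroy data on the disk, but offline tests are generally not suitable for storage servers.

When collecting data, the -x or -a option gives more complete disk information.

smartmontools can also run as a service, configured in /etc/smartmontools/smartd.conf. I have not looked into this yet and will update this document when I have.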
