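Using and Analyzing smartctl, the Linux Disk Checking Tool
1 Purpose
In today's big-data environments, disk performance and stability are critical to the business. On Linux, smartctl is one of the most commonly used disk checking tools.
This article examines smartctl on Linux. It explains how to use the tool and takes a closer look at SMART (Self-Monitoring, Analysis and Reporting Technology).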
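2 Terms, Definitions and Abbreviations
2.1 Terms and Definitions
The terms and definitions used in this article are listed in Table 2.1.
Table 2.1
Term/Definition    Meaning
SMART              Self-Monitoring, Analysis and Reporting Technology
2.2 Abbreviations
The abbreviations used in this article are listed in Table 2.2.
Table 2.2
Abbreviation    Full form                                             Meaning
SMART           Self-Monitoring, Analysis and Reporting Technology    Technology by which a disk monitors, analyzes and reports its own condition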
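3 smartctl
smartctl is a command-line tool shipped in the smartmontools package (this article uses the smartmontools-5.38-2.el5 RPM). It carries out SMART tasks: printing the SMART self-test and error logs, enabling or disabling automatic SMART testing, and starting disk self-tests.
Syntax:
smartctl [options] device
device:
"/dev/hd[a-t]"  IDE/ATA disks
"/dev/sd[a-z]"  SCSI disks. Note that SATA disks are accessed through the libata library, so the option "-d ata" must be added for them.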
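The syntax above can be sketched as a small helper that assembles the argument list for one smartctl call. This is only an illustration: the function name smartctl_command and its parameters are hypothetical, and the helper builds the command without running it.

```python
def smartctl_command(device, options, device_type=None):
    """Build the argument list for one smartctl invocation.

    Hypothetical helper: it only assembles `smartctl [options] device`
    and does not execute anything.
    """
    cmd = ["smartctl"]
    if device_type is not None:
        # -d TYPE tells smartctl the device type instead of letting it guess.
        cmd += ["-d", device_type]
    cmd += list(options)
    cmd.append(device)
    return cmd


# A SATA disk reached through libata, as described above.
print(smartctl_command("/dev/sda", ["-i"], device_type="ata"))
```

The resulting list can be passed to subprocess.run on a machine where smartctl is installed and the caller has sufficient privileges.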
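3.1 [options]
The options are grouped by type.
3.1.1 Information display options:
-h  Show help.
-V  Show version information.
-i  Print basic information (model number, serial number, firmware version, ...).
-a  Print all SMART information for the disk.
3.1.2 Run-time behavior options:
-q TYPE  Set the quiet mode of the output.
TYPE can be one of 3 values:
  errorsonly  Print only errors.
  silent  Print nothing.
  noserial  Do not print the serial number.
-d TYPE  Specify the device type. If it is omitted, smartctl guesses the type from the device name.
-T TYPE  Set how tolerant smartctl is of errors, that is, whether it keeps running after one occurs.
TYPE can be one of 4 values:
  conservative  Exit on any error.
  normal  Exit if a mandatory SMART command fails.
  permissive  Ignore one failure of a mandatory SMART command.
  verypermissive  Ignore all failures of mandatory SMART commands.
-b TYPE  Set what smartctl does when a checksum error occurs.
TYPE can be one of 3 values:
  warn  Print a warning and continue.
  exit  Exit smartctl.
  ignore  Continue without printing a warning.
-r TYPE  Intended for smartmontools developers.
-n POWERMODE  Set whether smartctl still checks a disk that is in a power-saving mode. By default the power mode is ignored and the disk is checked.
POWERMODE can be one of 4 values:
  never  Always check.
  sleep  Check unless the disk is in sleep mode.
  standby  Check unless the disk is in sleep or standby mode.
  idle  Check unless the disk is in sleep, standby or idle mode.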
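3.1.3 SMART feature switch options:
-s on/off  Enable or disable SMART on the disk.
-o on/off  Enable or disable SMART automatic offline testing, which scans the disk for defects every 4 hours.
-S on/off  Enable or disable autosave of the vendor-specific attributes.
3.1.4 SMART read and display options:
-H  Report the health of the disk. A failing report means the disk has already failed or is predicted to fail within 24 hours.
-c  Show the generic SMART capabilities the disk supports and their current state.
-A  Show the vendor-specific SMART attributes the disk supports. The attributes are numbered from 1 to 253 and have specific names.
-l TYPE  Select which log to display.
TYPE can be one of 4 values:
  error  Show only the error log.
  selftest  Show only the self-test log.
  selective  Show only the selective self-test log.
  directory  Show only the Log Directory.
-v N,OPTION  Use a vendor-specific presentation when displaying vendor-specific SMART attribute N.
-F TYPE  Set how smartctl behaves when it meets known but still unresolved hardware or firmware bugs.
-P TYPE  Set whether smartctl applies the presets stored in its drive database to the disk.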
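3.1.5 SMART offline test and self-test options:
-t TEST  Start a test immediately. Can be combined with -C.
TEST can be one of:
  offline  Offline test. Can be used on a disk with mounted file systems.
  short  Short self-test. Can be used on a disk with mounted file systems.
  long  Long self-test. Can be used on a disk with mounted file systems.
  conveyance  [ATA only] Conveyance self-test. Can be used on a disk with mounted file systems.
  select,N-M
  select,N+SIZE  [ATA only] Selective self-test that covers only part of the disk's LBAs. N is the first LBA, M is the last LBA, and SIZE is the size of the LBA range to test.
-C  Run the test in captive mode.
Notes: (1) -C must be used together with -t, but it has no effect with -t offline.
(2) -C keeps the disk very busy, so it is best used on a disk with no mounted file systems.
-X  Abort a test running in non-captive mode.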
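The two span forms accepted by -t select can be compared with a short sketch. It assumes that select,N+SIZE covers SIZE consecutive LBAs starting at N, so that it is the same span as select,N-(N+SIZE-1); the helper name selective_span is hypothetical.

```python
def selective_span(spec):
    """Return (first_lba, last_lba) for a span written as 'N-M' or 'N+SIZE'.

    Assumption: 'N+SIZE' means SIZE consecutive LBAs starting at N.
    """
    if "+" in spec:
        start, size = (int(part) for part in spec.split("+", 1))
        if size < 1:
            raise ValueError("SIZE must be at least 1")
        return start, start + size - 1
    start, end = (int(part) for part in spec.split("-", 1))
    if end < start:
        raise ValueError("M must not be smaller than N")
    return start, end


print(selective_span("10-20"))   # span given by first and last LBA
print(selective_span("100+50"))  # span given by first LBA and size
```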
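3.2 Common examples
3.2.1 Check the current overall health status
Check the current overall health status of /dev/sda. PASSED means the disk is healthy; any other result means the disk has already failed or is expected to fail soon.
smartctl -H /dev/sda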
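A script can pick the verdict out of the -H output. The sketch below parses a sample line in the form smartctl prints for ATA disks; the function name overall_health is hypothetical, and SCSI disks word their health line differently, so the pattern would need adjusting for them.

```python
import re


def overall_health(output):
    """Return the verdict from `smartctl -H` output, or None if absent."""
    match = re.search(r"test result:\s*(\S+)", output)
    return match.group(1) if match else None


# Sample line in the ATA output format; not captured from a real disk.
sample = "SMART overall-health self-assessment test result: PASSED\n"
print(overall_health(sample) == "PASSED")
```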
3.2.2 Show all information
Print all SMART information for /dev/sda.
smartctl -a /dev/sda
This is equivalent to running, in order:
smartctl -i /dev/sda
smartctl -c /dev/sda
smartctl -A /dev/sda
smartctl -l error /dev/sda
smartctl -l selftest /dev/sda
smartctl -l selective /dev/sda
3.2.3 Enable or disable SMART
Enable or disable SMART on /dev/sda.
smartctl -s on /dev/sda
smartctl -s off /dev/sda
To see whether SMART is currently enabled, use the -i option.
smartctl -i /dev/sda
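The -i output reports the SMART state on lines that begin with "SMART support is:". A minimal sketch for reading that state follows; the function name smart_enabled is hypothetical and the sample text only imitates the relevant lines of the output.

```python
def smart_enabled(info_output):
    """Return True/False for the SMART state in `smartctl -i` output.

    Returns None when no Enabled/Disabled line is present.
    """
    for line in info_output.splitlines():
        if line.startswith("SMART support is:"):
            state = line.split(":", 1)[1].strip()
            # The "Available ..." line states capability, not the switch.
            if state in ("Enabled", "Disabled"):
                return state == "Enabled"
    return None


sample_info = (
    "SMART support is: Available - device has SMART capability.\n"
    "SMART support is: Enabled\n"
)
print(smart_enabled(sample_info))
```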
3.2.4 Offline test
Run an offline test on /dev/sda. Its result is mainly used to update the SMART attribute values.
smartctl -t offline /dev/sda
3.2.5 Short test
Run a short self-test on /dev/sda.
smartctl -t short /dev/sda
3.2.5.1 Watching test progress
The -c option shows how far a running test has progressed:
# smartctl -c /dev/sda
…
Self-test execution status:      ( 242) Self-test routine in progress...
                                        20% of test remaining.
…
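The progress can also be read by a script. The sketch below extracts the percentage from the text and, as a cross-check, decodes the number in parentheses under the assumption that it is the ATA self-test execution status byte, whose high four bits hold the state (15 meaning in progress) and whose low four bits hold the remaining work in tenths. The function names are hypothetical.

```python
import re


def percent_remaining(capabilities_output):
    """Return the 'NN% of test remaining' figure from `smartctl -c` output."""
    match = re.search(r"(\d+)%\s+of\s+test\s+remaining", capabilities_output)
    return int(match.group(1)) if match else None


def decode_status_byte(status):
    """Split the status byte into (state, percent remaining).

    Assumption: high nibble = state code, low nibble = tenths remaining.
    """
    return status >> 4, (status & 0x0F) * 10


sample_status = (
    "Self-test execution status:      ( 242) Self-test routine in progress...\n"
    "                                        20% of test remaining.\n"
)
print(percent_remaining(sample_status))
print(decode_status_byte(242))
```

For the output shown above, both routes give 20% remaining: 242 is 0xF2, so the low nibble is 2.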
3.2.5.2 Viewing test results
The -l selftest option shows the recorded test results for /dev/sda:
In entry "# 1", "Completed without error" means the test finished and found no errors.
In entry "# 2", "Aborted by host" means the test was stopped by the user with 90% of it still to run.
# smartctl -l selftest /dev/sda
...
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      9535         -
# 2  Extended offline    Aborted by host               90%      9534         -
...
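The self-test log can be turned into records for further checks. This is a sketch that assumes the column layout shown above, with the description and the status separated by at least two spaces; the names parse_selftest_log and SAMPLE_LOG are hypothetical.

```python
import re

# One log entry: "# N  description  status  NN%  hours  first-error-LBA".
ENTRY = re.compile(
    r"#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(output):
    """Return one dict per entry of `smartctl -l selftest` output."""
    records = []
    for line in output.splitlines():
        match = ENTRY.match(line)
        if match:
            num, desc, status, remaining, hours, lba = match.groups()
            records.append({
                "num": int(num),
                "description": desc,
                "status": status,
                "remaining": int(remaining),
                "lifetime_hours": int(hours),
                "lba_of_first_error": lba,
            })
    return records


SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      9535         -
# 2  Extended offline    Aborted by host               90%      9534         -
"""

for record in parse_selftest_log(SAMPLE_LOG):
    print(record["num"], record["status"], record["remaining"])
```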
3.2.6 View SMART attribute values
The -A option shows the SMART attribute values of /dev/sda.
smartctl -A /dev/sda
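The attribute table printed by -A can be parsed column by column. The sketch below assumes the usual ten-column ATA layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the sample rows are illustrative values, not output from a real disk, and the function names are hypothetical.

```python
def parse_attributes(output):
    """Return one dict per attribute row of `smartctl -A` output."""
    attrs = []
    for line in output.splitlines():
        # Split into at most 10 fields so RAW_VALUE keeps any inner spaces.
        fields = line.split(None, 9)
        if len(fields) != 10 or not fields[0].isdigit():
            continue  # header or unrelated line
        if not all(field.isdigit() for field in fields[3:6]):
            continue  # VALUE/WORST/THRESH not numeric on this row
        attrs.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "when_failed": fields[8],
            "raw": fields[9].strip(),
        })
    return attrs


def at_or_below_threshold(attrs):
    """Names of attributes whose normalized value is <= the threshold."""
    return [a["name"] for a in attrs if a["value"] <= a["thresh"]]


SAMPLE_ATTRS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9535
194 Temperature_Celsius     0x0022   036   052   000    Old_age   Always       -       36
"""

attrs = parse_attributes(SAMPLE_ATTRS)
print([a["name"] for a in attrs])
print(at_or_below_threshold(attrs))
```

Comparing VALUE with THRESH is used here rather than reading RAW_VALUE, because the raw values are vendor specific, as the attribute descriptions in section 3.4 point out.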
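3.4 SMART attributes
smartctl -A /dev/sda lists many SMART attributes of the disk, from which its health can be judged.
The list below explains what each attribute means. Each entry gives the following fields on successive lines:
ID
Hex
Attribute name
Description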
01
0x01
Read Error Rate
(Vendor specific raw value.) Stores data related to the
rate of hardware read errors that occurred when reading data from a disk
surface. The raw value has different structure for different vendors and is
often not meaningful as a decimal number.
02
0x02
Throughput Performance
Overall (general) throughput performance of a hard disk
drive. If the value of this attribute is decreasing there is a high
probability that there is a problem with the disk.
03
0x03
Spin-Up Time
Average time of spindle spin up (from zero RPM to fully
operational [millisecs]).
04
0x04
Start/Stop Count
A tally of spindle start/stop cycles. The spindle turns
on, and hence the count is increased, both when the hard disk is turned on
after having before been turned entirely off (disconnected from power source)
and when the hard disk returns from having previously been put to sleep mode.
05
0x05
Reallocated Sectors Count
Count of reallocated sectors. When the hard drive finds a
read/write/verification error, it marks that sector as
"reallocated" and transfers data to a special reserved area (spare
area). This process is also known as remapping, and reallocated sectors are
called "remaps". The raw value normally represents a count of the
bad sectors that have been found and remapped. Thus, the higher the attribute
value, the more sectors the drive has had to reallocate. This allows a drive
with bad sectors to continue operation; however, a drive which has had any
reallocations at all is significantly more likely to fail in the near future.
The drive head is forced to seek to the reserved area whenever a remap is
accessed. A workaround which will preserve drive speed at the expense of
capacity is to create a partition over the region which contains remaps and
instruct the operating system to not use that partition.
06
0x06
Read Channel Margin
Margin of a channel while reading data. The function of
this attribute is not specified.
07
0x07
Seek Error Rate
(Vendor specific raw value.) Rate of seek errors of the
magnetic heads. If there is a partial failure in the mechanical positioning
system, then seek errors will arise. Such a failure may be due to numerous
factors, such as damage to a servo, or thermal widening of the hard disk. The
raw value has different structure for different vendors and is often not
meaningful as a decimal number.
08
0x08
Seek Time Performance
Average performance of seek operations of the magnetic
heads. If this attribute is decreasing, it is a sign of problems in the
mechanical subsystem.
09
0x09
Power-On Hours
Count of hours in power-on state. The raw value of this
attribute shows total count of hours (or minutes, or seconds, depending on
manufacturer) in power-on state.
10
0x0A
Spin Retry Count
Count of retry of spin start attempts. This attribute
stores a total count of the spin start attempts to reach the fully
operational speed (under the condition that the first attempt was
unsuccessful). An increase of this attribute value is a sign of problems in
the hard disk mechanical subsystem.
11
0x0B
Recalibration Retries or Calibration Retry Count
This attribute indicates the count that recalibration was
requested (under the condition that the first attempt was unsuccessful). An
increase of this attribute value is a sign of problems in the hard disk
mechanical subsystem.
12
0x0C
Power Cycle Count
This attribute indicates the count of full hard disk power
on/off cycles.
13
0x0D
Soft Read Error Rate
Uncorrected read errors reported to the operating system.
180
0xB4
Unused Reserved Block Count Total
"Pre-Fail" Attribute used at least in HP
devices.
183
0xB7
SATA Downshift Error Count
Western Digital and Samsung attribute.
184
0xB8
End-to-End error / IOEDC
This attribute is a part of Hewlett-Packard's
SMART IV technology, as well as part of other vendors' IO Error Detection and
Correction schemas, and it contains a count of parity errors which occur in
the data path to the media via the drive's cache RAM.
185
0xB9
Head Stability
Western Digital attribute.
186
0xBA
Induced Op-Vibration Detection
Western Digital attribute.
187
0xBB
Reported Uncorrectable Errors
The count of errors that could not be recovered using
hardware ECC.
188
0xBC
Command Timeout
The count of aborted operations due to HDD timeout.
Normally this attribute value should be equal to zero and if the value is far
above zero, then most likely there will be some serious problems with power
supply or an oxidized data cable.
189
0xBD
High Fly Writes
HDD producers implement a Fly Height Monitor that attempts to provide additional
protections for write operations by detecting when a recording head is flying
outside its normal operating range. If an unsafe fly height condition is
encountered, the write process is stopped, and the information is rewritten
or reallocated to a safe region of the hard drive. This attribute indicates
the count of these errors detected over the lifetime of the drive.
This feature is implemented in most modern Seagate drives and some of Western
Digital's drives, beginning with the WD Enterprise WDE18300 and WDE9180 Ultra2
SCSI hard drives, and will be included on all future WD Enterprise products.
190
0xBE
Airflow Temperature (WDC) resp. Airflow Temperature Celsius (HP)
Airflow temperature on Western Digital HDs (Same as temp.
[C2], but current value is 50 less for some models. Marked as obsolete.)
191
0xBF
G-sense Error Rate
The count of errors resulting from externally-induced
shock & vibration.
192
0xC0
Power-off Retract Count or Emergency Retract Cycle Count (Fujitsu)
Count of times the heads are loaded off the media. Heads
can be unloaded without actually powering off.
193
0xC1
Load Cycle Count or Load/Unload Cycle Count (Fujitsu)
Count of load/unload cycles into head landing zone position.
The typical lifetime rating for laptop (2.5-in) hard
drives is 300,000 to 600,000 load cycles. Some
laptop drives are programmed to unload the heads whenever there has not been
any activity for about five seconds. Many Linux installations write to the
file system a few times a minute in the background. As a result, there may be 100 or
more load cycles per hour, and the load cycle rating may be exceeded in less
than a year.
194
0xC2
Temperature resp. Temperature Celsius
Current internal temperature.
195
0xC3
Hardware ECC Recovered
(Vendor specific raw value.) The raw value has different
structure for different vendors and is often not meaningful as a decimal
number.
196
0xC4
Reallocation Event Count
Count of remap operations. The raw value of this attribute
shows the total count of attempts to transfer data from reallocated sectors
to a spare area. Both successful & unsuccessful attempts are counted.
197
0xC5
Current Pending Sector Count
Count of "unstable" sectors (waiting to be
remapped, because of read errors). If an unstable sector is subsequently read
successfully, this value is decreased and the sector is not remapped. Read
errors on a sector will not remap the sector (since it might be readable
later); instead, the drive firmware remembers that the sector needs to be
remapped, and remaps it the next time it's written.
198
0xC6
Uncorrectable Sector Count or Offline Uncorrectable or Off-Line Scan Uncorrectable Sector Count
The total count of uncorrectable errors when
reading/writing a sector. A rise in the value of this attribute indicates
defects of the disk surface and/or problems in the mechanical subsystem.
199
0xC7
UltraDMA CRC Error Count
The count of errors in data transfer via the interface
cable as determined by ICRC (Interface Cyclic Redundancy Check).
200
0xC8
Multi-Zone Error Rate
The count of errors found when writing a sector. The
higher the value, the worse the disk's mechanical condition is.
200
0xC8
Write Error Rate (Fujitsu)
The total count of errors when writing a sector.
201
0xC9
Soft Read Error Rate or TA Counter Detected
Count of off-track errors.
202
0xCA
Data Address Mark errors or TA Counter Increased
Count of Data Address Mark errors (or vendor-specific).
203
0xCB
Run Out Cancel
Count of ECC errors
204
0xCC
Soft ECC Correction
Count of errors corrected by software ECC
205
0xCD
Thermal Asperity Rate (TAR)
Count of errors due to high temperature.
206
0xCE
Flying Height
Height of heads above the disk surface. A flying height
that's too low increases the chances of a head crash while a flying height
that's too high increases the chances of a read/write error.
207
0xCF
Spin High Current
Amount of surge current used to spin up the drive.
208
0xD0
Spin Buzz
Count of buzz routines needed to spin up the drive due to
insufficient power.
209
0xD1
Offline Seek Performance
Drive’s seek performance during its internal tests.
210
0xD2
Unknown
(found in a Maxtor 6B200M0 200GB and Maxtor 2R015H1 15GB
disks)
211
0xD3
Vibration During Write
Vibration During Write
212
0xD4
Shock During Write
Shock During Write
220
0xDC
Disk Shift
Distance the disk has shifted relative to the spindle
(usually due to shock or temperature). Unit of measure is unknown.
222
0xDE
Loaded Hours
Time spent operating under data load (movement of magnetic
head armature)
223
0xDF
Load/Unload Retry Count
Count of times head changes position.
224
0xE0
Load Friction
Resistance caused by friction in mechanical parts while
operating.
225
0xE1
Load/Unload Cycle Count
Total count of load cycles
226
0xE2
Load 'In'-time
Total time of loading on the magnetic heads actuator (time
not spent in parking area).
227
0xE3
Torque Amplification Count
Count of attempts to compensate for platter speed
variations
228
0xE4
Power-Off Retract Cycle
The count of times the magnetic armature was retracted
automatically as a result of cutting power.
230
0xE6
GMR Head Amplitude
Amplitude of "thrashing" (distance of repetitive
forward/reverse head motion)
231
0xE7
Temperature
Drive Temperature
232
0xE8
Endurance Remaining
Number of physical erase cycles completed on the drive as
a percentage of the maximum physical erase cycles the drive is designed to
endure
232
0xE8
Available Reserved Space
Intel SSD reports the number of available reserved space
as a percentage of reserved space in a brand new SSD.
233
0xE9
Power-On Hours
Number of hours elapsed in the power-on state.
233
0xE9
Media Wearout Indicator
Intel SSD reports a normalized value of 100 (when the SSD
is new) and declines to a minimum value of 1. It decreases while the NAND
erase cycles increase from 0 to the maximum-rated cycles.
240
0xF0
Head Flying Hours
Time while head is positioning
240
0xF0
Transfer Error Rate (Fujitsu)
Count of times the link is reset during a data transfer.
241
0xF1
Total LBAs Written
Total count of LBAs written
242
0xF2
Total LBAs Read
Total count of LBAs read. Some S.M.A.R.T. utilities will report a negative
number for the raw value since in reality it has 48 bits rather than 32.
250
0xFA
Read Error Retry Rate
Count of errors while reading from a disk
254
0xFE
Free Fall Protection
Count of "Free Fall Events" detected
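The load cycle figures quoted for attribute 193 can be checked with a little arithmetic. The sketch below assumes the disk runs around the clock at a constant 100 load cycles per hour; the function name days_to_reach_rating is hypothetical.

```python
def days_to_reach_rating(rated_cycles, cycles_per_hour):
    """Days of continuous operation until the load cycle rating is reached."""
    return rated_cycles / (cycles_per_hour * 24)


# Ratings of 300,000 and 600,000 cycles at 100 load cycles per hour.
print(days_to_reach_rating(300_000, 100))  # 125.0
print(days_to_reach_rating(600_000, 100))  # 250.0
```

Under these assumptions both ends of the rating range are reached in well under 365 days, which matches the "less than a year" remark in the description.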
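3.5 SMART self-test
smartctl -t offline/short/long starts a self-test of the chosen type on the disk.
offline:
This is the default self-test.
short:
The short self-test aims to determine quickly whether the disk is faulty.
It consists of many items, all defined by the disk vendor, such as the following:
a) Electrical tests, which check the disk's internal circuitry. The details are specified by the disk vendor, for example:
A) Buffer (cache) test.
B) Read/write circuitry test.
C) Read/write head test.
b) Seek/servo tests, which check the disk's ability to find and servo on data tracks.
c) Read/verify tests, which check the disk's ability to read part or all of the disk.
long:
Also called the extended self-test. It covers the same kinds of items as short but takes much longer.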