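I recently bought a LITE-ON (建兴) ZETA 256G SSD. On CentOS 7.2, while measuring IOPS with pg_test_fsync, the fsync test tool that ships with PostgreSQL, I/O suddenly hung.
dmesg reported a pile of timeouts like this: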

         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  895.604149] ata1.00: status: { DRDY }
[  895.606940] ata1.00: failed command: WRITE FPDMA QUEUED
[  895.609389] ata1.00: cmd 61/08:e0:38:bd:06/00:00:00:00:00/40 tag 28 ncq 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  895.614144] ata1.00: status: { DRDY }
[  895.616516] ata1.00: failed command: WRITE FPDMA QUEUED
[  895.618665] ata1.00: cmd 61/10:e8:00:90:06/02:00:00:00:00/40 tag 29 ncq 270336 out
         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  895.622940] ata1.00: status: { DRDY }
[  895.625089] ata1.00: failed command: WRITE FPDMA QUEUED
[  895.627236] ata1.00: cmd 61/00:f0:00:8c:06/04:00:00:00:00/40 tag 30 ncq 524288 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  895.631176] ata1.00: status: { DRDY }
[  895.633133] ata1: hard resetting link
[  895.937682] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  895.940816] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[  895.940830] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[  895.941234] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[  895.941243] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[  895.941314] ata1.00: configured for UDMA/133
[  895.941356] ata1.00: device reported invalid CHS sector 0
[  895.941362] ata1.00: device reported invalid CHS sector 0
[  895.941366] ata1.00: device reported invalid CHS sector 0
[  895.941369] ata1.00: device reported invalid CHS sector 0
[  895.941374] ata1.00: device reported invalid CHS sector 0
[  895.941377] ata1.00: device reported invalid CHS sector 0
[  895.941381] ata1.00: device reported invalid CHS sector 0
[  895.941384] ata1.00: device reported invalid CHS sector 0
[  895.941388] ata1.00: device reported invalid CHS sector 0
[  895.941392] ata1.00: device reported invalid CHS sector 0
[  895.941395] ata1.00: device reported invalid CHS sector 0
[  895.941399] ata1.00: device reported invalid CHS sector 0
[  895.941403] ata1.00: device reported invalid CHS sector 0
[  895.941408] ata1.00: device reported invalid CHS sector 0
[  895.941434] ata1: EH complete
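
To see quickly which ATA device is producing these failures, the log can be summarized per device. The following is only a sketch: the function name count_failed_ncq_commands is made up for this post, and it assumes the kernel messages have the same shape as the excerpt above.

```shell
#!/bin/sh
# Sketch: count "failed command: READ/WRITE FPDMA QUEUED" lines per ATA device.
# Reads kernel log text on stdin and prints "<device> <count>" per device.
# The function name is hypothetical, not part of any existing tool.
count_failed_ncq_commands() {
    grep -E 'ata[0-9]+\.[0-9]+: failed command: (READ|WRITE) FPDMA QUEUED' |
        sed -E 's/.*(ata[0-9]+\.[0-9]+): failed command.*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Typical use would be: dmesg | count_failed_ncq_commands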

The symptoms match reports found online; many SSDs seem to have this problem.
https://bugzilla.kernel.org/show_bug.cgi?id=15573
https://communities.intel.com/thread/77801?start=0&tstart=0
http://www.cnblogs.com/welhzh/p/4469206.html
http://patchwork.ozlabs.org/patch/49365/
The advice there is to disable NCQ.
What is NCQ?
http://baike.baidu.com/view/17501.htm
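NCQ (Native Command Queuing) is a technology designed to improve hard disk performance and stability under increasing load. When applications send multiple commands to the disk, an NCQ-capable disk can optimize the order in which they are completed, reducing mechanical load and so improving performance. In other words, the disk reorders the commands in its internal queue to work around the performance limits imposed by its mechanical parts.
That reordering to reduce mechanical movement appears to matter little for an SSD, so on an SSD that misbehaves with NCQ it can be turned off.
I checked the boot messages; NCQ shows up in the controller flags: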

# dmesg | grep ncq
[    4.157792] ahci 0000:00:1f.2: flags: 64bit ncq sntf ilck pm led clo pio slum part ems apst

Solution:
Disable NCQ by adding libata.force=noncq to the kernel boot parameters.

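[root@digoal ahci]# vi /etc/default/grub
GRUB_CMDLINE_LINUX="rhgb quiet libata.force=noncq"

Regenerate the GRUB configuration so the change takes effect (on a BIOS-booted CentOS 7 system: grub2-mkconfig -o /boot/grub2/grub.cfg), then reboot.
Alternatively, edit /boot/grub2/grub.cfg directly and append libata.force=noncq after rhgb quiet on the kernel line. A hand edit there is overwritten the next time grub.cfg is regenerated.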
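
After the reboot it is worth confirming that the parameter actually reached the kernel. A minimal sketch follows, assuming the option is spelled out in full as noncq (the kernel also accepts unambiguous abbreviations, which this check ignores). The function name cmdline_has_noncq is invented for this post.

```shell
#!/bin/sh
# Sketch: succeed if a kernel command line carries libata.force with noncq,
# either globally ("libata.force=noncq") or with an ID ("libata.force=1.00:noncq").
# $1 is the command line text; it defaults to the running kernel's /proc/cmdline.
# The function name is hypothetical.
cmdline_has_noncq() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        case "$word" in
            libata.force=*)
                # Wrap the value in commas so every entry is delimited the same way.
                case ",${word#libata.force=}," in
                    *,noncq,*|*:noncq,*) return 0 ;;
                esac ;;
        esac
    done
    return 1
}
```

Usage: cmdline_has_noncq && echo "noncq is on the kernel command line"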
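(What if the machine has both mechanical disks and SSDs?)
(The mechanical disks benefit from NCQ, while the SSD has to run without it.)
(Handling this by disk model means patching libata, which keeps a per-model quirk list. Without patching, it can be handled per port or per disk: libata.force accepts an ID prefix, for example libata.force=1.00:noncq to affect only ata1.00, as the kernel documentation quoted further down explains.)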

Different disks can also be given different queue_depth values; setting queue_depth to 1 is equivalent to disabling NCQ.

To disable NCQ this way at every boot, put the following in /etc/conf.d/local.start (that path is Gentoo's; on CentOS 7, /etc/rc.d/rc.local plays the same role):
echo 1 > /sys/block/sdX/device/queue_depth 
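
For a machine with mixed disks, the per-disk setting can be scripted instead of hard-coding sdX. This is a sketch under assumptions: it treats every sd* disk whose queue/rotational reads 0 as an SSD, and the function name and its optional sysfs-root argument (there only so it can be tried against a fake directory tree) are invented for this post.

```shell
#!/bin/sh
# Sketch: set queue_depth=1 on every non-rotational sd* disk.
# $1 is an optional sysfs root (default /sys); the argument and the
# function name are hypothetical conveniences, not an existing tool.
limit_ssd_queue_depth() {
    root="${1:-/sys}"
    for dev in "$root"/block/sd*; do
        # Skip anything that lacks the expected sysfs attributes.
        [ -r "$dev/queue/rotational" ] || continue
        [ -w "$dev/device/queue_depth" ] || continue
        if [ "$(cat "$dev/queue/rotational")" = "0" ]; then
            echo 1 > "$dev/device/queue_depth"
            echo "${dev##*/}: queue_depth=1"
        fi
    done
}
```

Called as root with no argument from the boot script, it would leave rotational disks at their default queue depth and only throttle the SSDs.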

Now an explanation of libata.force=noncq.
Start with the module information for libata:

[root@digoal ~]# modinfo libata
filename:       /lib/modules/3.10.0-327.el7.x86_64/kernel/drivers/ata/libata.ko
version:        3.00
license:        GPL
description:    Library module for ATA devices
author:         Jeff Garzik
rhelversion:    7.2
srcversion:     042B7B276FD3988FFBEFB88
depends:
intree:         Y
vermagic:       3.10.0-327.el7.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        79:AD:88:6A:11:3C:A0:22:35:26:33:6C:0F:82:5B:8A:94:29:6A:B3
sig_hashalgo:   sha256
parm:           acpi_gtf_filter:filter mask for ACPI _GTF commands, set to filter out (0x1=set xfermode, 0x2=lock/freeze lock, 0x4=DIPM, 0x8=FPDMA non-zero offset, 0x10=FPDMA DMA Setup FIS auto-activate) (int)
parm:           force:Force ATA configurations including cable type, link speed and transfer mode (see Documentation/kernel-parameters.txt for details) (string)
parm:           atapi_enabled:Enable discovery of ATAPI devices (0=off, 1=on [default]) (int)
parm:           atapi_dmadir:Enable ATAPI DMADIR bridge support (0=off [default], 1=on) (int)
parm:           atapi_passthru16:Enable ATA_16 passthru for ATAPI devices (0=off, 1=on [default]) (int)
parm:           fua:FUA support (0=off [default], 1=on) (int)
parm:           ignore_hpa:Ignore HPA limit (0=keep BIOS limits, 1=ignore limits, using full disk) (int)
parm:           dma:DMA enable/disable (0x1==ATA, 0x2==ATAPI, 0x4==CF) (int)
parm:           ata_probe_timeout:Set ATA probing timeout (seconds) (int)
parm:           noacpi:Disable the use of ACPI in probe/suspend/resume (0=off [default], 1=on) (int)
parm:           allow_tpm:Permit the use of TPM commands (0=off [default], 1=on) (int)
parm:           atapi_an:Enable ATAPI AN media presence notification (0=0ff [default], 1=on) (int)

There is a force parameter, and its description points to the kernel documentation for details.

[root@digoal ~]# less /usr/share/doc/kernel-doc-3.10.0/Documentation/kernel-parameters.txt

That is where the corresponding explanation is:

        libata.force=   [LIBATA] Force configurations.  The format is comma
                        separated list of "[ID:]VAL" where ID is
                        PORT[.DEVICE].  PORT and DEVICE are decimal numbers
                        matching port, link or device.  Basically, it matches
                        the ATA ID string printed on console by libata.  If
                        the whole ID part is omitted, the last PORT and DEVICE
                        values are used.  If ID hasn't been specified yet, the
                        configuration applies to all ports, links and devices.

                        If only DEVICE is omitted, the parameter applies to
                        the port and all links and devices behind it.  DEVICE
                        number of 0 either selects the first device or the
                        first fan-out link behind PMP device.  It does not
                        select the host link.  DEVICE number of 15 selects the
                        host link and device attached to it.

                        The VAL specifies the configuration to force.  As long
                        as there's no ambiguity shortcut notation is allowed.
                        For example, both 1.5 and 1.5G would work for 1.5Gbps.
                        The following configurations can be forced.

                        * Cable type: 40c, 80c, short40c, unk, ign or sata.
                          Any ID with matching PORT is used.

                        * SATA link speed limit: 1.5Gbps or 3.0Gbps.

                        * Transfer mode: pio[0-7], mwdma[0-4] and udma[0-7].
                          udma[/][16,25,33,44,66,100,133] notation is also
                          allowed.

                        * [no]ncq: Turn on or off NCQ.  # the part relevant to this article

                        * nohrst, nosrst, norst: suppress hard, soft
                          and both resets.

                        * rstonce: only attempt one reset during
                          hot-unplug link recovery

                        * dump_id: dump IDENTIFY data.

                        * atapi_dmadir: Enable ATAPI DMADIR bridge support

                        * disable: Disable this device.

                        If there are multiple matching configurations changing
                        the same attribute, the last one is used.
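
The "[ID:]VAL" rule is easier to follow with a worked example. The sketch below is my own illustration, not kernel code: it splits a libata.force value on commas and prints which ID each VAL would apply to, carrying the last ID forward and using the word "all" (a label chosen here) when no ID has been given yet. The function name is hypothetical, and it does not validate values or expand shortcut notation.

```shell
#!/bin/sh
# Sketch: show how a libata.force value is read as a list of "[ID:]VAL" entries.
# Prints one "<ID> <VAL>" line per entry; "all" is this script's own label
# for "no ID specified yet". Illustration only, not the kernel's parser.
parse_libata_force() {
    last_id="all"
    old_ifs=$IFS
    IFS=,
    set -f                      # no filename expansion while splitting
    for item in $1; do
        case "$item" in
            *:*) last_id=${item%%:*}; val=${item#*:} ;;   # explicit ID given
            *)   val=$item ;;                             # reuse the last ID
        esac
        printf '%s %s\n' "$last_id" "$val"
    done
    set +f
    IFS=$old_ifs
}
```

For example, parse_libata_force "noncq,1.00:noncq,2:1.5Gbps,udma4" prints four lines: "all noncq", "1.00 noncq", "2 1.5Gbps" and "2 udma4", the last one inheriting port 2 from the entry before it.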

The module parameters can also be inspected here:

[root@digoal ~]# cd /sys/module/libata/parameters/
[root@digoal parameters]# ll
total 0
-rw-r--r-- 1 root root 4096 Dec 20 21:17 acpi_gtf_filter
-r--r--r-- 1 root root 4096 Dec 20 21:17 allow_tpm
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_an
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_dmadir
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_enabled
-r--r--r-- 1 root root 4096 Dec 20 21:17 atapi_passthru16
-r--r--r-- 1 root root 4096 Dec 20 21:17 ata_probe_timeout
-r--r--r-- 1 root root 4096 Dec 20 21:17 dma
-r--r--r-- 1 root root 4096 Dec 20 21:17 fua
-rw-r--r-- 1 root root 4096 Dec 20 21:17 ignore_hpa
-r--r--r-- 1 root root 4096 Dec 20 21:17 noacpi
