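(2) Searching a prebuilt HMM database with protein (or nucleotide) sequences

This is a commonly used approach to functional annotation.

Building the HMM database. Starting from a multiple sequence alignment file, the same command described above builds the database. Ready-made HMMs can also be downloaded from sites such as Pfam and SMART. For example, if I have a batch of protein sequences and want to annotate them against Pfam to see which domains they contain, I can download the following file from Pfam: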

ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam31.0/Pfam-A.hmm.gz

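Search the HMM database with hmmscan. After decompressing the downloaded file, index it once with hmmpress (hmmpress Pfam-A.hmm), which hmmscan requires, and then run: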

hmmscan -E 0.00001 --domE 0.00001 --cpu 2  --noali  --acc --notextw --domtblout  pfam.tab Pfam-A.hmm test.pep.fa
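To make each option in the command above explicit, here is a minimal Python sketch that assembles the same argument list. The helper name build_hmmscan_cmd and its parameters are made up for illustration; the flags and file names are the ones from the command above.

```python
import shlex


def build_hmmscan_cmd(hmm_db, query_fasta, domtbl_out, evalue=1e-5, cpus=2):
    """Assemble the hmmscan argument list (hypothetical helper, not part of HMMER)."""
    return [
        "hmmscan",
        "-E", str(evalue),          # report sequences with E-value <= threshold
        "--domE", str(evalue),      # report domains with E-value <= threshold
        "--cpu", str(cpus),         # number of parallel worker threads
        "--noali",                  # leave alignments out of the main output
        "--acc",                    # prefer accessions over names in the main output
        "--notextw",                # do not limit the width of text output lines
        "--domtblout", domtbl_out,  # per-domain tabular output file
        hmm_db,                     # HMM database, already indexed with hmmpress
        query_fasta,                # query protein sequences
    ]


cmd = build_hmmscan_cmd("Pfam-A.hmm", "test.pep.fa", "pfam.tab")
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would launch the search, provided hmmscan is installed and on the PATH.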

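III. Output results

Two formats are mainly covered here:

--domtblout

--tblout

The output falls into two categories: one is reported per sequence (full sequence) and the other per domain (mainly because a single sequence can contain several domains). The columns involved in these two formats are explained below, quoting the original English descriptions, which are probably clearer. Note that the numbers are only labels for this list, not column positions in either file; the strand column, for instance, appears only in nucleotide search output.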

(1) target name: The name of the target sequence or profile.

(2) accession: The accession of the target sequence or profile, or ’-’ if none.

(3) query name: The name of the query sequence or profile.

(4) accession: The accession of the query sequence or profile, or ’-’ if none.

(5) hmmfrom: The position in the hmm at which the hit starts.

(6) hmm to: The position in the hmm at which the hit ends.

(7) alifrom: The position in the target sequence at which the hit starts.

(8) ali to: The position in the target sequence at which the hit ends.

(9) envfrom: The position in the target sequence at which the surrounding envelope starts. This is the start position of the domain.

(10) env to: The position in the target sequence at which the surrounding envelope ends. This is the end position of the domain.

(11) sq len: The length of the target sequence.

(12) strand: The strand on which the hit was found (“-” when alifrom > ali to).

(13) E-value: The expectation value (statistical significance) of the target, as above.

(14) score (full sequence): The score (in bits) for this hit. It includes the biased-composition correction.

(15) Bias (full sequence): The biased-composition correction, as above.

(16) description of target: The remainder of the line is the target’s description line, as free text.

(17) c-Evalue: The “conditional E-value”, a permissive measure of how reliable this particular domain may be. The conditional E-value is calculated on a smaller search space than the independent E-value. The conditional E-value uses the number of targets that pass the reporting thresholds. The null hypothesis test posed by the conditional E-value is as follows. Suppose that we believe that there is already sufficient evidence (from other domains) to identify the set of reported sequences as homologs of our query; now, how many additional domains would we expect to find with at least this particular domain’s bit score, if the rest of those reported sequences were random nonhomologous sequence (i.e. outside the other domain(s) that were sufficient to identify them as homologs in the first place)?

(18) i-Evalue: The “independent E-value”, the E-value that the sequence/profile comparison would have received if this were the only domain envelope found in it, excluding any others. This is a stringent measure of how reliable this particular domain may be. The independent E-value uses the total number of targets in the target database.
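As an illustration of how these columns are read in practice, the sketch below parses one row of an hmmscan --domtblout file. It assumes the standard layout of 22 whitespace-separated columns followed by a free-text description; the DomainHit type, the parse_domtblout function and the sample row are all made up for this example and are not real Pfam output.

```python
from typing import Iterable, Iterator, NamedTuple


class DomainHit(NamedTuple):
    target: str       # profile name
    target_acc: str   # profile accession
    query: str        # query sequence name
    c_evalue: float   # conditional E-value of this domain
    i_evalue: float   # independent E-value of this domain
    score: float      # bit score of this domain
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    description: str


def parse_domtblout(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Yield one DomainHit per data row, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split into at most 23 fields so the description keeps its spaces.
        f = line.split(None, 22)
        yield DomainHit(
            target=f[0], target_acc=f[1], query=f[3],
            c_evalue=float(f[11]), i_evalue=float(f[12]), score=float(f[13]),
            ali_from=int(f[17]), ali_to=int(f[18]),
            env_from=int(f[19]), env_to=int(f[20]),
            description=f[22] if len(f) > 22 else "",
        )


# Synthetic row for illustration only.
SAMPLE = (
    "# target name accession tlen query name ...\n"
    "Example_dom PF99999.1 250 seq1 - 400 1.2e-30 105.3 0.1 1 1 "
    "3.4e-34 1.5e-30 104.9 0.1 2 249 35 290 30 295 0.95 Synthetic example domain\n"
)

hits = list(parse_domtblout(SAMPLE.splitlines()))
confident = [h for h in hits if h.i_evalue <= 1e-5]
for h in confident:
    print(h.query, h.target_acc, h.env_from, h.env_to, h.i_evalue)
```

Filtering on the independent E-value rather than the conditional one is the stricter of the two choices described above; the 1e-5 cutoff simply mirrors the threshold used in the hmmscan command.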

Note: the meaning of “envelope” in the output may be unfamiliar, but it is in fact where the domain lies on our sequence, so do not mistake the alignment (ali) coordinates for it. Envelopes touch on some details of the algorithm, so I have pasted passages from the original documentation below; readers are welcome to restate them in their own words.

Definition of an envelope: The envelope defines a subsequence for which there is substantial probability mass supporting a homologous domain, whether or not a single discrete alignment can be identified. The envelope may extend beyond the endpoints of the MEA (maximum expected accuracy) alignment, and in fact often does, for weakly scoring domains.

Identifying envelopes: Now, within each region, we will attempt to identify envelopes. An envelope is a subsequence of the target sequence that appears to contain alignment probability mass for a likely domain (one local alignment to the profile). When the region contains approximately one expected domain, envelope identification is already done: the region’s start and end points are converted directly to the envelope coordinates of a putative domain.
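The difference between envelope and alignment coordinates can be sketched in a few lines of Python. The function name, the toy sequence and the coordinates are invented for illustration; the only assumption carried over from HMMER is that positions are 1-based and inclusive.

```python
def extract_region(sequence: str, start: int, end: int) -> str:
    """Return the subsequence for 1-based, inclusive coordinates."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    # Convert to Python's 0-based, end-exclusive slicing.
    return sequence[start - 1:end]


seq = "MKTAYIAKQRQISFVKSHFSRQ"                # toy sequence, not a real protein
domain_by_env = extract_region(seq, 3, 12)    # envelope: the domain boundaries
domain_by_ali = extract_region(seq, 5, 10)    # alignment: may be shorter
print(domain_by_env, domain_by_ali)
```

In this toy case the aligned stretch sits inside the envelope, so cutting the domain out by the ali coordinates would drop residues at both ends.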

There are a few cases where the region appears to contain more than one expected domain – where more than one domain is closely spaced on the target sequence and/or the domain scores are weak and the probability masses are ill-resolved from each other. These “multidomain regions”, when they occur, are passed off to an even more ad hoc resolution algorithm called stochastic traceback clustering. In stochastic traceback clustering, we sample many alignments from the posterior alignment ensemble, cluster those alignments according to their overlap in start/end coordinates, and pick clusters that sum up to sufficiently high probability. Consensus start and end points are chosen for each cluster of sampled alignments. These start/end points define envelopes. These envelopes identified by stochastic traceback clustering are not guaranteed to be nonoverlapping. It’s possible that there are alternative “solutions” for parsing the sequence into domains, when the correct parsing is ambiguous. HMMER will report all high-likelihood solutions, not just a single nonoverlapping parse.

It’s also possible (though rare) for stochastic clustering to identify no envelopes in the region. In a tabular output (--tblout) file, the number of regions that had to be subjected to stochastic traceback clustering is given in the column labeled clu. This ought to be a small number (often it’s zero). The number of envelopes identified by stochastic traceback clustering that overlap with other envelopes is in the column labeled ov. If this number is non-zero, you need to be careful when you interpret the details of alignments in the output, because HMMER is going to be showing overlapping alternative solutions. The total number of domain envelopes identified (either by the simple method or by stochastic traceback clustering) is in the column labeled env. It ought to be almost the same as the expectation and the number of regions.
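The check on the clu, ov and env columns described above can be sketched as follows. The code assumes the standard --tblout layout of 18 whitespace-separated columns followed by a description, with exp, reg, clu, ov and env as the 11th to 15th columns; the function names and the sample row are invented for illustration.

```python
def domain_number_estimates(line: str) -> dict:
    """Pull the domain-number estimation columns out of one --tblout row."""
    f = line.split(None, 18)  # keep the free-text description in one piece
    return {
        "target": f[0],
        "query": f[2],
        "exp": float(f[10]),  # expected number of domains
        "reg": int(f[11]),    # number of discrete regions
        "clu": int(f[12]),    # regions sent to stochastic traceback clustering
        "ov": int(f[13]),     # envelopes that overlap other envelopes
        "env": int(f[14]),    # total envelopes identified
    }


def needs_care(est: dict) -> bool:
    """True when overlapping alternative envelopes were reported."""
    return est["ov"] > 0


# Synthetic row for illustration only.
ROW = ("Example_dom PF99999.1 seq1 - 1.2e-30 105.3 0.1 "
       "1.5e-30 104.9 0.1 2.1 2 1 1 3 3 2 2 Synthetic example domain")

est = domain_number_estimates(ROW)
if needs_care(est):
    print(f"{est['query']} vs {est['target']}: {est['ov']} overlapping envelope(s)")
```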
