Using pfamscan: basic tools, HMMER usage

(2) Searching a pre-built HMM database with protein (or nucleotide) sequences

This is the usual approach for functional annotation.

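Build the HMM database. Starting from a multiple sequence alignment file, the same commands as above will build it. You can also download ready-made HMMs from sites such as Pfam and SMART. For example, if I have a batch of protein sequences and want a Pfam annotation to see which domains they contain, I can download this file from Pfam: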

ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam31.0/Pfam-A.hmm.gz

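Decompress it (gunzip Pfam-A.hmm.gz) and index it with hmmpress (hmmpress Pfam-A.hmm), which hmmscan requires before it can search a database. Then search the HMM database with hmmscan: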

hmmscan -E 0.00001 --domE 0.00001 --cpu 2 --noali --acc --notextw --domtblout pfam.tab Pfam-A.hmm test.pep.fa
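The press-then-scan steps can be scripted. Below is a minimal Python sketch, assuming HMMER 3 is installed with hmmpress and hmmscan on PATH; the helper names is_pressed, hmmscan_command and run_hmmscan are hypothetical, and the flags simply mirror the command above. It relies on hmmpress writing four index files (.h3m, .h3i, .h3f, .h3p) next to the database.

```python
import subprocess
from pathlib import Path
from typing import List

# Index files that `hmmpress <db>` writes next to the HMM database.
PRESSED_SUFFIXES = (".h3m", ".h3i", ".h3f", ".h3p")


def is_pressed(hmm_db: str) -> bool:
    """True if all four hmmpress index files exist for hmm_db."""
    return all(Path(hmm_db + suffix).exists() for suffix in PRESSED_SUFFIXES)


def hmmscan_command(hmm_db: str, fasta: str, domtbl: str,
                    evalue: str = "0.00001", cpu: int = 2) -> List[str]:
    """Argument list for the hmmscan call shown above (hypothetical helper)."""
    return [
        "hmmscan",
        "-E", evalue,           # reporting threshold, per sequence
        "--domE", evalue,       # reporting threshold, per domain
        "--cpu", str(cpu),
        "--noali", "--acc", "--notextw",
        "--domtblout", domtbl,  # per-domain table is written here
        hmm_db, fasta,
    ]


def run_hmmscan(hmm_db: str, fasta: str, domtbl: str) -> None:
    """Press the database if needed, then scan (HMMER must be on PATH)."""
    if not is_pressed(hmm_db):
        # -f overwrites any partial index left over from an earlier run.
        subprocess.run(["hmmpress", "-f", hmm_db], check=True)
    # Discard hmmscan's human-readable stdout report; keep only the table.
    subprocess.run(hmmscan_command(hmm_db, fasta, domtbl),
                   check=True, stdout=subprocess.DEVNULL)
```

For example, run_hmmscan("Pfam-A.hmm", "test.pep.fa", "pfam.tab") would press Pfam-A.hmm on the first call only and then write pfam.tab. Building the argument list as a Python list avoids shell quoting problems with unusual file names.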

3. Output description

Two formats are covered here:

--domtblout: one line per domain

--tblout: one line per query/target hit (full-sequence results)

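The results fall into two groups: per-sequence statistics (full sequence) and per-domain statistics (needed because one sequence can contain several domains). The fields used in these two formats are explained below, quoting the original English descriptions, which may be clearer than a translation. The numbers in parentheses only index the descriptions and are not column positions in either file; the sq len and strand fields, for example, come from the nucleotide (nhmmer) tabular format rather than from hmmscan's protein tables.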

(1) target name: The name of the target sequence or profile.

(2) accession: The accession of the target sequence or profile, or ’-’ if none.

(3) query name: The name of the query sequence or profile.

(4) accession: The accession of the query sequence or profile, or ’-’ if none.

(5) hmmfrom: The position in the hmm at which the hit starts.

(6) hmm to: The position in the hmm at which the hit ends.

(7) alifrom: The position in the target sequence at which the hit starts.

(8) ali to: The position in the target sequence at which the hit ends.

(9) envfrom: The position in the target sequence at which the surrounding envelope starts. This is the start of the domain.

(10) env to: The position in the target sequence at which the surrounding envelope ends. This is the end of the domain.

(11) sq len: The length of the target sequence.

(12) strand: The strand on which the hit was found (“-” when alifrom > ali to).

(13) E-value: The expectation value (statistical significance) of the target.

(14) score (full sequence): The score (in bits) for this hit. It includes the biased-composition correction.

(15) Bias (full sequence): The biased-composition correction, as above

(16) description of target: The remainder of the line is the target’s description line, as free text.

(17) c-Evalue: The “conditional E-value”, a permissive measure of how reliable this particular domain may be. The conditional E-value is calculated on a smaller search space than the independent E-value. The conditional E-value uses the number of targets that pass the reporting thresholds. The null hypothesis test posed by the conditional E-value is as follows. Suppose that we believe that there is already sufficient evidence (from other domains) to identify the set of reported sequences as homologs of our query; now, how many additional domains would we expect to find with at least this particular domain’s bit score, if the rest of those reported sequences were random nonhomologous sequence (i.e. outside the other domain(s) that were sufficient to identify them as homologs in the first place)?

(18) i-Evalue: The “independent E-value”, the E-value that the sequence/profile comparison would have received if this were the only domain envelope found in it, excluding any others. This is a stringent measure of how reliable this particular domain may be. The independent E-value uses the total number of targets in the target database.
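To read the per-domain table programmatically, the fields above have to be mapped to actual column positions. A minimal Python sketch follows, assuming the HMMER 3 --domtblout layout: 22 whitespace-separated fields (target name, accession, tlen, query name, accession, qlen, full-sequence E-value/score/bias, #, of, c-Evalue, i-Evalue, domain score/bias, hmm from/to, ali from/to, env from/to, acc) followed by the free-text target description. DomainHit and parse_domtblout are hypothetical names; the optional filter uses i-Evalue because it is the stricter of the two domain E-values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class DomainHit:
    target: str        # col 1: target name (the Pfam family when using hmmscan)
    target_acc: str    # col 2: target accession ('-' if none)
    tlen: int          # col 3: length of the target profile
    query: str         # col 4: query name (your protein)
    qlen: int          # col 6: query length
    seq_evalue: float  # col 7: full-sequence E-value
    c_evalue: float    # col 12: conditional E-value of this domain
    i_evalue: float    # col 13: independent E-value of this domain
    dom_score: float   # col 14: bit score of this domain
    hmm_from: int      # col 16: alignment start on the profile
    hmm_to: int        # col 17: alignment end on the profile
    ali_from: int      # col 18: alignment start on the query
    ali_to: int        # col 19: alignment end on the query
    env_from: int      # col 20: envelope start on the query (domain start)
    env_to: int        # col 21: envelope end on the query (domain end)
    description: str   # col 23: free-text description of the target


def parse_domtblout(lines: Iterable[str],
                    max_i_evalue: Optional[float] = None) -> List[DomainHit]:
    """Parse --domtblout lines; optionally keep domains with i-Evalue <= max_i_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' header/footer comments
        # 22 fixed fields, then the description (which may contain spaces).
        f = line.split(maxsplit=22)
        hit = DomainHit(
            target=f[0], target_acc=f[1], tlen=int(f[2]),
            query=f[3], qlen=int(f[5]),
            seq_evalue=float(f[6]),
            c_evalue=float(f[11]), i_evalue=float(f[12]),
            dom_score=float(f[13]),
            hmm_from=int(f[15]), hmm_to=int(f[16]),
            ali_from=int(f[17]), ali_to=int(f[18]),
            env_from=int(f[19]), env_to=int(f[20]),
            description=f[22] if len(f) > 22 else "",
        )
        if max_i_evalue is None or hit.i_evalue <= max_i_evalue:
            hits.append(hit)
    return hits
```

With the pfam.tab produced by the command above, something like `parse_domtblout(open("pfam.tab"), max_i_evalue=1e-5)` would return one DomainHit per reported domain, whose env_from/env_to give the domain position on each protein.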

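Note: you may be unfamiliar with what the envelope in the output means, but it is exactly where the domain lies on your sequence, so don't mistake the alignment (ali) coordinates for the domain position. Envelope identification involves some algorithmic details, so I have pasted the original text below; feel free to restate it in plain words in your own understanding.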

Envelope definition: The envelope defines a subsequence for which there is substantial probability mass supporting a homologous domain, whether or not a single discrete alignment can be identified. The envelope may extend beyond the endpoints of the MEA (maximum expected accuracy) alignment, and in fact often does, for weakly scoring domains.
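Because the envelope (env from/env to), not the alignment (ali from/ali to), marks where the domain lies, extracting a domain's residues means slicing by envelope coordinates. A minimal Python sketch, assuming HMMER's 1-based, inclusive coordinates; domain_subsequence and envelope_overhang are hypothetical helpers, not part of HMMER.

```python
from typing import Tuple


def domain_subsequence(seq: str, start: int, end: int) -> str:
    """Cut a region out of seq using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"coordinates {start}-{end} outside 1..{len(seq)}")
    # Python slices are 0-based and end-exclusive, so only the start shifts.
    return seq[start - 1:end]


def envelope_overhang(ali_from: int, ali_to: int,
                      env_from: int, env_to: int) -> Tuple[int, int]:
    """Residues by which the envelope extends past the alignment on each side."""
    return (ali_from - env_from, env_to - ali_to)
```

For instance, domain_subsequence("MKTAYIAKQR", 2, 8) returns "KTAYIAK". Comparing the envelope slice with the alignment slice, or checking envelope_overhang, shows how much a weakly scoring domain's envelope extends beyond its MEA alignment.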

Envelope identification: Now, within each region, we will attempt to identify envelopes. An envelope is a subsequence of the target sequence that appears to contain alignment probability mass for a likely domain (one local alignment to the profile). When the region contains ≈1 expected domain, envelope identification is already done: the region’s start and end points are converted directly to the envelope coordinates of a putative domain.

There are a few cases where the region appears to contain more than one expected domain – where more than one domain is closely spaced on the target sequence and/or the domain scores are weak and the probability masses are ill-resolved from each other. These “multidomain regions”, when they occur, are passed off to an even more ad hoc resolution algorithm called stochastic traceback clustering. In stochastic traceback clustering, we sample many alignments from the posterior alignment ensemble, cluster those alignments according to their overlap in start/end coordinates, and pick clusters that sum up to sufficiently high probability. Consensus start and end points are chosen for each cluster of sampled alignments. These start/end points define envelopes. These envelopes identified by stochastic traceback clustering are not guaranteed to be nonoverlapping. It’s possible that there are alternative “solutions” for parsing the sequence into domains, when the correct parsing is ambiguous. HMMER will report all high-likelihood solutions, not just a single nonoverlapping parse.
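The two cases described above (a region with about one expected domain versus a multidomain region) can be illustrated with a deliberately simplified toy in Python. This is not HMMER's implementation: each sample is assumed to be one sampled domain's (start, end) rather than a full traceback, the share of samples in a cluster stands in for its probability mass, and the thresholds min_overlap, min_mass and tol, like all function names here, are made-up assumptions.

```python
from statistics import median_low
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def overlap_fraction(a: Interval, b: Interval) -> float:
    """Overlap of two intervals as a fraction of the shorter one."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    return max(0, inter) / shorter


def cluster_tracebacks(samples: List[Interval], min_overlap: float = 0.8,
                       min_mass: float = 0.25) -> List[Interval]:
    """Toy clustering of sampled domain coordinates into envelopes."""
    clusters: List[List[Interval]] = []
    for s in samples:
        # Join the first cluster whose founding member overlaps enough.
        for c in clusters:
            if overlap_fraction(s, c[0]) >= min_overlap:
                c.append(s)
                break
        else:
            clusters.append([s])
    envelopes = []
    for c in clusters:
        # Keep clusters holding a large enough share of the samples.
        if len(c) / len(samples) >= min_mass:
            # Consensus start/end: the (low) median of the members.
            envelopes.append((median_low([x[0] for x in c]),
                              median_low([x[1] for x in c])))
    # Nothing here forces the envelopes apart, so they may overlap.
    return sorted(envelopes)


def envelopes_for_region(region: Interval, expected_domains: float,
                         samples: List[Interval],
                         tol: float = 0.5) -> List[Interval]:
    """About one expected domain: the region is the envelope; else cluster."""
    if abs(expected_domains - 1.0) < tol:
        return [region]
    return cluster_tracebacks(samples)
```

With samples clustered around (10, 50) and (100, 140) plus a single stray (300, 310), a region expected to hold about two domains yields the two consensus envelopes while the rarely sampled stray is dropped; an empty sample list yields no envelopes, mirroring the rare case mentioned below.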

It’s also possible (though rare) for stochastic clustering to identify no envelopes in the region. In a tabular output (--tblout) file, the number of regions that had to be subjected to stochastic traceback clustering is given in the column labeled clu. This ought to be a small number (often it’s zero). The number of envelopes identified by stochastic traceback clustering that overlap with other envelopes is in the column labeled ov. If this number is non-zero, you need to be careful when you interpret the details of alignments in the output, because HMMER is going to be showing overlapping alternative solutions. The total number of domain envelopes identified (either by the simple method or by stochastic traceback clustering) is in the column labeled env. It ought to be almost the same as the expectation and the number of regions.
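These clu, ov and env checks can be applied to a whole --tblout file. A minimal Python sketch, assuming the HMMER 3 --tblout layout of 18 whitespace-separated fields (target name, accession, query name, accession, full-sequence E-value/score/bias, best-domain E-value/score/bias, exp, reg, clu, ov, env, dom, rep, inc) followed by the target description. parse_tblout and parse_warnings are hypothetical names, and flagging env != reg is an illustrative heuristic, not a HMMER rule.

```python
from typing import Dict, Iterable, List, Union

Row = Dict[str, Union[str, int, float]]


def parse_tblout(lines: Iterable[str]) -> List[Row]:
    """Parse --tblout lines into dicts holding the domain-parse columns."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        # 18 fixed fields, then the free-text target description.
        f = line.split(maxsplit=18)
        rows.append({
            "target": f[0], "query": f[2],
            "evalue": float(f[4]),  # full-sequence E-value
            "exp": float(f[10]),    # expected number of domains
            "reg": int(f[11]),      # regions defined
            "clu": int(f[12]),      # regions sent to stochastic clustering
            "ov": int(f[13]),       # envelopes overlapping other envelopes
            "env": int(f[14]),      # total envelopes identified
            "dom": int(f[15]),      # domains defined
        })
    return rows


def parse_warnings(row: Row) -> List[str]:
    """Reasons to double-check a row's domain parse (heuristic)."""
    reasons = []
    if row["ov"] > 0:
        reasons.append("overlapping envelopes")
    if row["clu"] > 0:
        reasons.append("stochastic clustering used")
    if row["env"] != row["reg"]:
        reasons.append("env differs from reg")
    return reasons
```

Rows with a non-empty warning list are the ones where, as the text says, the reported alignments may be overlapping alternative solutions and deserve a closer look.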
