VariantAnnotator

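Overview

Purpose: Annotate variant calls with context information
Category: Variant manipulation tools
Summary: Annotates variants based on their context (as opposed to functional annotation). Many annotation modules are available (see the "Annotation modules" section).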

Input files

A VCF file to annotate and, optionally, a BAM file

Output files

The annotated VCF file

Usage example

Add per-sample depth and dbSNP ID information to the output of HaplotypeCaller or UnifiedGenotyper.

java -jar GenomeAnalysisTK.jar \
    -R reference.fasta \
    -T VariantAnnotator \
    -I input.bam \
    -V input.vcf \
    -o output.vcf \
    -A Coverage \
    --dbsnp dbsnp.vcf

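Parameters:

-R/--reference_sequence: reference genome
-T/--analysis_type: the tool to run
-I/--input_file: BAM file corresponding to the VCF
-o: output file
-V/--variant: input VCF file
-A/--annotation: which annotations to add
--dbsnp: database of known SNPs used for annotation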
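The -A option can be given more than once to request several annotations in a single run. The sketch below only assembles and prints such a command rather than running it. The annotations chosen here (Coverage, QualByDepth, FisherStrand) are an illustrative selection, not a recommendation, and the file names are the placeholders from the example above.

```shell
#!/bin/sh
# Illustrative selection of annotation modules (an assumption, not a recommendation).
annotations="Coverage QualByDepth FisherStrand"

# Turn each module name into its own -A flag.
a_flags=""
for name in $annotations; do
    a_flags="$a_flags -A $name"
done

# Assemble the full command; file names are the placeholders used in the example above.
cmd="java -jar GenomeAnalysisTK.jar -R reference.fasta -T VariantAnnotator -I input.bam -V input.vcf -o output.vcf$a_flags --dbsnp dbsnp.vcf"

# Print the command for review instead of running it.
echo "$cmd"
```

Printing the command first makes it easy to check the flag list before committing to a long annotation run.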

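HaplotypeCaller and MuTect2 also accept the -A option, and some annotation modules can only be computed by HaplotypeCaller and MuTect2, for example StrandAlleleCountsBySample.
The values that -A accepts are listed below: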

Standard annotations in the list below are marked with a '*'.

Available annotations for the VCF INFO field:
    AS_BaseQualityRankSumTest
    AS_FisherStrand
    AS_InbreedingCoeff
    AS_InsertSizeRankSum
    AS_MQMateRankSumTest
    AS_MappingQualityRankSumTest
    AS_QualByDepth
    AS_RMSMappingQuality
    AS_ReadPosRankSumTest
    AS_StrandOddsRatio
    AlleleBalance
    BaseCounts
    *BaseQualityRankSumTest
    *ChromosomeCounts
    ClippingRankSumTest
    ClusteredReadPosition
    *Coverage
    *ExcessHet
    *FisherStrand
    FractionInformativeReads
    GCContent
    GenotypeSummaries
    *HaplotypeScore
    HardyWeinberg
    HomopolymerRun
    *InbreedingCoeff
    LikelihoodRankSumTest
    LowMQ
    MVLikelihoodRatio
    *MappingQualityRankSumTest
    MappingQualityZero
    NBaseCount
    PossibleDeNovo
    *QualByDepth
    *RMSMappingQuality
    *ReadPosRankSumTest
    SampleList
    SnpEff
    SpanningDeletions
    *StrandOddsRatio
    TandemRepeatAnnotator
    TransmissionDisequilibriumTest
    VariantType

Available annotations for the VCF FORMAT field:
    AlleleBalanceBySample
    AlleleCountBySample
    BaseCountsBySample
    BaseQualitySumPerAlleleBySample
    *DepthPerAlleleBySample
    DepthPerSampleHC
    MappingQualityZeroBySample
    OxoGReadCounts
    StrandAlleleCountsBySample
    StrandBiasBySample

Available classes/groups of annotations:
    AS_RMSAnnotation
    AS_RankSumTest
    AS_StandardAnnotation
    AS_StrandBiasTest
    ActiveRegionBasedAnnotation
    BetaTestingAnnotation
    ExperimentalAnnotation
    RMSAnnotation
    RankSumTest
    ReducibleAnnotation
    RodRequiringAnnotation
    StandardAnnotation
    StandardHCAnnotation
    StandardSomaticAnnotation
    StandardUGAnnotation
    StrandBiasTest
    WorkInProgressAnnotation
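INFO-field annotations are written as key=value pairs in the eighth column of each VCF record, and dbSNP IDs go into the ID (third) column. As a rough way to check that an annotation run added what was expected, the sketch below pulls the site-level DP value and the ID out of a small made-up VCF. The records and the temporary file are invented purely for illustration; in practice the annotated output file would be read instead.

```shell
#!/bin/sh
# Build a tiny, made-up VCF so the example is self-contained.
demo_vcf=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\trs123\tA\tG\t50\tPASS\tDP=35;MQ=60\n'
    printf 'chr1\t200\t.\tC\tT\t40\tPASS\tMQ=58\n'
} > "$demo_vcf"

# For every record, print CHROM, POS, ID and the INFO DP value ("NA" if DP is absent).
summary=$(awk -F'\t' 'BEGIN { OFS = "\t" }
!/^#/ {
    dp = "NA"
    n = split($8, fields, ";")
    for (i = 1; i <= n; i++) {
        if (fields[i] ~ /^DP=/) dp = substr(fields[i], 4)
    }
    print $1, $2, $3, dp
}' "$demo_vcf")

echo "$summary"
rm -f "$demo_vcf"
```

A record that shows "." in the ID column or "NA" for DP was not given that annotation at that site.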

Annotation modules

These are the annotation modules listed in the official documentation:

Name Summary
AS_BaseQualityRankSumTest Allele-specific Rank Sum Test of REF versus ALT base quality scores
AS_FisherStrand Allele-specific strand bias estimated using Fisher's Exact Test
AS_InbreedingCoeff Allele-specific likelihood-based test for the inbreeding among samples
AS_InsertSizeRankSum Allele-specific Rank Sum Test for insert sizes of REF versus ALT reads
AS_MQMateRankSumTest Allele-specific Rank Sum Test for mate's mapping qualities of REF versus ALT reads
AS_MappingQualityRankSumTest Allele-specific Rank Sum Test for mapping qualities of REF versus ALT reads
AS_QualByDepth Allele-specific call confidence normalized by depth of sample reads supporting the allele
AS_RMSMappingQuality Allele-specific Root Mean Square of the mapping quality of reads across all samples.
AS_ReadPosRankSumTest Allele-specific Rank Sum Test for relative positioning of REF versus ALT allele within reads
AS_StrandOddsRatio Allele-specific strand bias estimated by the Symmetric Odds Ratio test
AlleleBalance Allele balance across all samples
AlleleBalanceBySample Allele balance per sample
AlleleCountBySample Allele count and frequency expectation per sample
BaseCounts Count of A, C, G, T bases across all samples
BaseCountsBySample Count of A, C, G, T bases for each sample
BaseQualityRankSumTest Rank Sum Test of REF versus ALT base quality scores
BaseQualitySumPerAlleleBySample Sum of evidence in reads supporting each allele for each sample
ChromosomeCounts Counts and frequency of alleles in called genotypes
ClippingRankSumTest Rank Sum Test for hard-clipped bases on REF versus ALT reads
ClusteredReadPosition Detect clustering of variants near the ends of reads
Coverage Total depth of coverage per sample and over all samples.
DepthPerAlleleBySample Depth of coverage of each allele per sample
DepthPerSampleHC Depth of informative coverage for each sample.
ExcessHet Phred-scaled p-value for exact test of excess heterozygosity
FisherStrand Strand bias estimated using Fisher's Exact Test
FractionInformativeReads The fraction of reads deemed informative over the entire cohort
GCContent GC content of the reference around the given site
GenotypeSummaries Summarize genotype statistics from all samples at the site level
HaplotypeScore Consistency of the site with strictly two segregating haplotypes
HardyWeinberg Hardy-Weinberg test for transmission disequilibrium
HomopolymerRun Largest contiguous homopolymer run of the variant allele
InbreedingCoeff Likelihood-based test for the inbreeding among samples
LikelihoodRankSumTest Rank Sum Test of per-read likelihoods of REF versus ALT reads
LowMQ Proportion of low quality reads
MVLikelihoodRatio Likelihood of being a Mendelian Violation
MappingQualityRankSumTest Rank Sum Test for mapping qualities of REF versus ALT reads
MappingQualityZero Count of all reads with MAPQ = 0 across all samples
MappingQualityZeroBySample Count of reads with mapping quality zero for each sample
NBaseCount Percentage of N bases
OxoGReadCounts Count of read pairs in the F1R2 and F2R1 configurations supporting the reference and alternate alleles
PossibleDeNovo Existence of a de novo mutation in at least one of the given families
QualByDepth Variant call confidence normalized by depth of sample reads supporting a variant
RMSMappingQuality Root Mean Square of the mapping quality of reads across all samples.
ReadPosRankSumTest Rank Sum Test for relative positioning of REF versus ALT alleles within reads
SampleList List samples that are non-reference at a given site
SnpEff Top effect from SnpEff functional predictions
SpanningDeletions Fraction of reads containing spanning deletions
StrandAlleleCountsBySample Number of forward and reverse reads that support each allele
StrandBiasBySample Number of forward and reverse reads that support REF and ALT alleles
StrandOddsRatio Strand bias estimated by the Symmetric Odds Ratio test
TandemRepeatAnnotator Tandem repeat unit composition and counts per allele
TransmissionDisequilibriumTest Wittkowski transmission disequilibrium test
VariantType General category of variant
