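GATK: VariantAnnotator
VariantAnnotator
Brief description
Purpose: annotate variant calls using contextual information
Category: variant manipulation tools
Overview: annotates variant sites based on their context (as opposed to functional annotation). Many annotation modules are currently available (see the "Annotation modules" section).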
Input files
The VCF file to annotate and an optional BAM file
Output file
The annotated VCF file
Usage example
Add per-sample depth and dbSNP ID information to the results of HaplotypeCaller or UnifiedGenotyper.
java -jar GenomeAnalysisTK.jar \
    -R reference.fasta \
    -T VariantAnnotator \
    -I input.bam \
    -V input.vcf \
    -o output.vcf \
    -A Coverage \
    --dbsnp dbsnp.vcf
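Parameters:
-R/--reference_sequence: reference genome
-T/--analysis_type: the tool to run
-I/--input_file: BAM file corresponding to the VCF
-o: output file
-V/--variant: input VCF file
-A/--annotation: which annotations to add
--dbsnp: database of known SNPs used for annotation
Note: HaplotypeCaller and MuTect2 also accept the -A option, and some annotation modules can only be computed by HaplotypeCaller and MuTect2, for example StrandAlleleCountsBySample.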
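Since -A selects which annotations are added, several annotations can be requested by repeating the flag. The sketch below only prints the resulting command (a dry run) and does not invoke GATK. `build_annotator_cmd` is a hypothetical helper written for this illustration, and the file names are the placeholders from the usage example.

```shell
#!/bin/sh
# Dry-run sketch: print a VariantAnnotator command with one -A flag per annotation.
# build_annotator_cmd is a hypothetical helper, not part of GATK.
build_annotator_cmd() {
    ref=$1; bam=$2; vcf=$3; out=$4
    shift 4
    cmd="java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R $ref -I $bam -V $vcf -o $out"
    # Every remaining argument is treated as an annotation name.
    for ann in "$@"; do
        cmd="$cmd -A $ann"
    done
    printf '%s\n' "$cmd"
}

build_annotator_cmd reference.fasta input.bam input.vcf output.vcf Coverage FisherStrand
```

Printing the command first makes it easy to check the flags before running it on real data.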
The values accepted by -A are listed below:
Standard annotations in the list below are marked with a '*'.

Available annotations for the VCF INFO field:
    AS_BaseQualityRankSumTest
    AS_FisherStrand
    AS_InbreedingCoeff
    AS_InsertSizeRankSum
    AS_MQMateRankSumTest
    AS_MappingQualityRankSumTest
    AS_QualByDepth
    AS_RMSMappingQuality
    AS_ReadPosRankSumTest
    AS_StrandOddsRatio
    AlleleBalance
    BaseCounts
    *BaseQualityRankSumTest
    *ChromosomeCounts
    ClippingRankSumTest
    ClusteredReadPosition
    *Coverage
    *ExcessHet
    *FisherStrand
    FractionInformativeReads
    GCContent
    GenotypeSummaries
    *HaplotypeScore
    HardyWeinberg
    HomopolymerRun
    *InbreedingCoeff
    LikelihoodRankSumTest
    LowMQ
    MVLikelihoodRatio
    *MappingQualityRankSumTest
    MappingQualityZero
    NBaseCount
    PossibleDeNovo
    *QualByDepth
    *RMSMappingQuality
    *ReadPosRankSumTest
    SampleList
    SnpEff
    SpanningDeletions
    *StrandOddsRatio
    TandemRepeatAnnotator
    TransmissionDisequilibriumTest
    VariantType

Available annotations for the VCF FORMAT field:
    AlleleBalanceBySample
    AlleleCountBySample
    BaseCountsBySample
    BaseQualitySumPerAlleleBySample
    *DepthPerAlleleBySample
    DepthPerSampleHC
    MappingQualityZeroBySample
    OxoGReadCounts
    StrandAlleleCountsBySample
    StrandBiasBySample

Available classes/groups of annotations:
    AS_RMSAnnotation
    AS_RankSumTest
    AS_StandardAnnotation
    AS_StrandBiasTest
    ActiveRegionBasedAnnotation
    BetaTestingAnnotation
    ExperimentalAnnotation
    RMSAnnotation
    RankSumTest
    ReducibleAnnotation
    RodRequiringAnnotation
    StandardAnnotation
    StandardHCAnnotation
    StandardSomaticAnnotation
    StandardUGAnnotation
    StrandBiasTest
    WorkInProgressAnnotation
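Because the standard annotations are the ones marked with '*', they can be pulled out of the listing with a small filter. This is a minimal sketch that assumes the listing has been saved as text with one annotation name per line and the '*' placed directly before the name. `standard_annotations` and the shortened `sample_list` are illustrative names and data, not part of GATK.

```shell
#!/bin/sh
# Shortened sample in the assumed layout: one name per line, '*' marks a standard annotation.
sample_list='Available annotations for the VCF INFO field:
    AlleleBalance
    BaseCounts
    *BaseQualityRankSumTest
    *ChromosomeCounts
    ClippingRankSumTest'

# Print only the names marked with '*', dropping the marker and surrounding whitespace.
standard_annotations() {
    sed -n 's/^[[:space:]]*\*\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*$/\1/p'
}

printf '%s\n' "$sample_list" | standard_annotations
```

Section headings and unmarked names do not match the pattern, so only the starred entries are printed.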
Annotation modules
These are the annotation modules listed in the official documentation:
Name | Summary |
---|---|
AS_BaseQualityRankSumTest | Allele-specific Rank Sum Test of REF versus ALT base quality scores |
AS_FisherStrand | Allele-specific strand bias estimated using Fisher's Exact Test |
AS_InbreedingCoeff | Allele-specific likelihood-based test for the inbreeding among samples |
AS_InsertSizeRankSum | Allele specific Rank Sum Test for insert sizes of REF versus ALT reads |
AS_MQMateRankSumTest | Allele specific Rank Sum Test for mate's mapping qualities of REF versus ALT reads |
AS_MappingQualityRankSumTest | Allele specific Rank Sum Test for mapping qualities of REF versus ALT reads |
AS_QualByDepth | Allele-specific call confidence normalized by depth of sample reads supporting the allele |
AS_RMSMappingQuality | Allele-specific Root Mean Square of the mapping quality of reads across all samples. |
AS_ReadPosRankSumTest | Allele-specific Rank Sum Test for relative positioning of REF versus ALT allele within reads |
AS_StrandOddsRatio | Allele-specific strand bias estimated by the Symmetric Odds Ratio test |
AlleleBalance | Allele balance across all samples |
AlleleBalanceBySample | Allele balance per sample |
AlleleCountBySample | Allele count and frequency expectation per sample |
BaseCounts | Count of A, C, G, T bases across all samples |
BaseCountsBySample | Count of A, C, G, T bases for each sample |
BaseQualityRankSumTest | Rank Sum Test of REF versus ALT base quality scores |
BaseQualitySumPerAlleleBySample | Sum of evidence in reads supporting each allele for each sample |
ChromosomeCounts | Counts and frequency of alleles in called genotypes |
ClippingRankSumTest | Rank Sum Test for hard-clipped bases on REF versus ALT reads |
ClusteredReadPosition | Detect clustering of variants near the ends of reads |
Coverage | Total depth of coverage per sample and over all samples. |
DepthPerAlleleBySample | Depth of coverage of each allele per sample |
DepthPerSampleHC | Depth of informative coverage for each sample. |
ExcessHet | Phred-scaled p-value for exact test of excess heterozygosity |
FisherStrand | Strand bias estimated using Fisher's Exact Test |
FractionInformativeReads | The fraction of reads deemed informative over the entire cohort |
GCContent | GC content of the reference around the given site |
GenotypeSummaries | Summarize genotype statistics from all samples at the site level |
HaplotypeScore | Consistency of the site with strictly two segregating haplotypes |
HardyWeinberg | Hardy-Weinberg test for transmission disequilibrium |
HomopolymerRun | Largest contiguous homopolymer run of the variant allele |
InbreedingCoeff | Likelihood-based test for the inbreeding among samples |
LikelihoodRankSumTest | Rank Sum Test of per-read likelihoods of REF versus ALT reads |
LowMQ | Proportion of low quality reads |
MVLikelihoodRatio | Likelihood of being a Mendelian Violation |
MappingQualityRankSumTest | Rank Sum Test for mapping qualities of REF versus ALT reads |
MappingQualityZero | Count of all reads with MAPQ = 0 across all samples |
MappingQualityZeroBySample | Count of reads with mapping quality zero for each sample |
NBaseCount | Percentage of N bases |
OxoGReadCounts | Count of read pairs in the F1R2 and F2R1 configurations supporting the reference and alternate alleles |
PossibleDeNovo | Existence of a de novo mutation in at least one of the given families |
QualByDepth | Variant call confidence normalized by depth of sample reads supporting a variant |
RMSMappingQuality | Root Mean Square of the mapping quality of reads across all samples. |
ReadPosRankSumTest | Rank Sum Test for relative positioning of REF versus ALT alleles within reads |
SampleList | List samples that are non-reference at a given site |
SnpEff | Top effect from SnpEff functional predictions |
SpanningDeletions | Fraction of reads containing spanning deletions |
StrandAlleleCountsBySample | Number of forward and reverse reads that support each allele |
StrandBiasBySample | Number of forward and reverse reads that support REF and ALT alleles |
StrandOddsRatio | Strand bias estimated by the Symmetric Odds Ratio test |
TandemRepeatAnnotator | Tandem repeat unit composition and counts per allele |
TransmissionDisequilibriumTest | Wittkowski transmission disequilibrium test |
VariantType | General category of variant |
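One way to confirm that a requested annotation ended up in the annotated VCF is to look for its declaration in the file header. The sketch below assumes the annotation appears as an `##INFO=<ID=...>` header line (for the Coverage annotation the key is DP). `has_info_field` is a hypothetical helper, and the temporary file is a stand-in header so the example runs without GATK.

```shell
#!/bin/sh
# has_info_field FILE KEY: succeed if the VCF header declares an INFO field named KEY.
# Hypothetical helper for illustration; it only inspects header lines.
has_info_field() {
    grep -q "^##INFO=<ID=$2[,>]" "$1"
}

# Tiny stand-in VCF header so the sketch runs without real data.
tmp_vcf=$(mktemp)
cat > "$tmp_vcf" <<'EOF'
##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
EOF

if has_info_field "$tmp_vcf" DP; then
    echo "DP declared"
else
    echo "DP missing"
fi
```

On real output, the same function could be pointed at output.vcf from the usage example.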