Haplotype-aware genotyping from noisy long reads



Motivation: Current genotyping approaches for single nucleotide variations (SNVs) rely on short, relatively accurate reads from second-generation sequencing devices. Third-generation sequencing platforms, which generate much longer reads, are becoming more widespread. These platforms come with the significant drawback of higher sequencing error rates, which makes them ill-suited to current genotyping algorithms. However, the longer reads make more of the genome unambiguously mappable and typically provide linkage information between neighboring variants.

Results: In this paper we introduce a novel approach for haplotype-aware genotyping from noisy long reads. The approach considers bipartitions of the sequencing reads, with the two parts corresponding to the two haplotypes. We formalize the computational problem in terms of a Hidden Markov Model and compute posterior genotype probabilities using the forward-backward algorithm. Genotype predictions can then be made by picking the most likely genotype at each site. Our experiments indicate that longer reads make it possible to accurately genotype significantly more of the genome. Further, we are able to use both Oxford Nanopore and Pacific Biosciences sequencing data to independently validate millions of variants previously identified by short-read technologies in the reference NA12878 sample, including hundreds of thousands of variants that were not previously included in the high-confidence reference set.
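
To make the forward-backward step concrete, the following is a minimal sketch of posterior decoding on a generic discrete HMM, followed by picking the most likely label at each site. It is not the model from the paper: the hidden states here are simply three genotype labels, whereas the approach above works with bipartitions of reads. The function names (`forward_backward`, `call_genotypes`) and all probabilities are hypothetical values chosen for illustration.

```python
def forward_backward(init, trans, emis):
    """Return per-site posterior state probabilities for a discrete HMM.

    init[s]     : prior probability of state s at the first site
    trans[r][s] : probability of moving from state r to state s
    emis[t][s]  : likelihood of the data observed at site t given state s
    """
    n_sites, n_states = len(emis), len(init)

    # Forward pass; each column is normalised to avoid numerical underflow.
    fwd = []
    for t in range(n_sites):
        if t == 0:
            col = [init[s] * emis[0][s] for s in range(n_states)]
        else:
            col = [
                emis[t][s]
                * sum(fwd[t - 1][r] * trans[r][s] for r in range(n_states))
                for s in range(n_states)
            ]
        total = sum(col)
        fwd.append([x / total for x in col])

    # Backward pass, normalised the same way.
    bwd = [[1.0] * n_states for _ in range(n_sites)]
    for t in range(n_sites - 2, -1, -1):
        col = [
            sum(trans[s][r] * emis[t + 1][r] * bwd[t + 1][r]
                for r in range(n_states))
            for s in range(n_states)
        ]
        total = sum(col)
        bwd[t] = [x / total for x in col]

    # The posterior at each site is proportional to forward * backward.
    posteriors = []
    for t in range(n_sites):
        col = [fwd[t][s] * bwd[t][s] for s in range(n_states)]
        total = sum(col)
        posteriors.append([x / total for x in col])
    return posteriors


def call_genotypes(posteriors, labels):
    """Pick the label with the highest posterior probability at each site."""
    return [labels[max(range(len(p)), key=p.__getitem__)] for p in posteriors]


# Toy input (hypothetical numbers): three sites, three genotype labels.
GENOTYPES = ["0/0", "0/1", "1/1"]
init = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
emis = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.1, 0.2, 0.7]]

posteriors = forward_backward(init, trans, emis)
calls = call_genotypes(posteriors, GENOTYPES)
```

Because the posteriors are kept rather than only the final calls, a caller could also report a confidence for each site or decline to call sites where no genotype is clearly favored. The sketch assumes every site has non-zero total likelihood; a production implementation would need to handle that case.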








