Comprehensive evaluation of error correction methods for high-throughput sequencing data

4.2.1 Illumina Tools
The input read sets were corrected using 17 error correction tools that had either shown good
accuracy in previous evaluations or been newly published when the evaluations were run. Fourteen of
these are standalone error correction tools: BFC [5], BLESS [6], Blue [7], Coral [8], ECHO [9],
HiTEC [11], Fiona, Lighter [12], Musket [13], Quake [14], QuorUM [15], RACER [16], Reptile [17], and
Trowel [18]. The remaining three are parts of DNA assemblers: ALLPATHS-LG [21], SGA [22], and
SOAPdenovo [23].
For each error correction method, a series of successive values was applied to the tool's key
parameters, and a corrected output read set was generated for each parameter value. The output read
sets were assessed using SPECTACLE, and the one with the highest gain for substitutions, insertions,
and deletions was chosen. The maximum k-mer length for Quake was limited to 18, beyond which the
memory capacity of our server was exhausted.
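The selection step described above can be sketched as follows. This is a minimal illustration, not the evaluation pipeline itself. It assumes the commonly used definition of gain, (TP - FP) / (TP + FN), where TP counts errors that were fixed, FP counts correct bases that were wrongly altered, and FN counts errors left uncorrected. The text does not state how the three error types are combined, so the sketch simply pools their counts. The `Outcome` class, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Pooled correction counts for one parameter value (hypothetical data)."""
    param: int  # e.g. the k-mer length used for this run
    tp: int     # errors correctly fixed
    fp: int     # correct bases wrongly changed
    fn: int     # errors left uncorrected


def gain(o: Outcome) -> float:
    """Assumed gain definition: (TP - FP) / (TP + FN)."""
    total_errors = o.tp + o.fn
    if total_errors == 0:
        return 0.0  # no errors in the input, so nothing to gain
    return (o.tp - o.fp) / total_errors


def best_by_gain(outcomes: list[Outcome]) -> Outcome:
    """Return the run whose corrected read set has the highest gain."""
    return max(outcomes, key=gain)


# Made-up counts for three successive k-mer lengths.
runs = [
    Outcome(param=15, tp=80, fp=10, fn=20),
    Outcome(param=17, tp=90, fp=5, fn=10),
    Outcome(param=18, tp=85, fp=30, fn=15),
]
best = best_by_gain(runs)
print(best.param, round(gain(best), 2))
```

Note that gain can be negative when a tool introduces more errors than it removes, which is why it is a stricter criterion than sensitivity alone.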
ALLPATHS-LG, BFC, BLESS, Blue, Musket, Quake, QuorUM, RACER, Reptile, SGA, and SOAPec (the error
correction module of SOAPdenovo) generated outputs for all the input read sets. Coral, HiTEC, Fiona,
and Trowel failed to correct errors in the large genomes because of insufficient memory. ECHO had not
finished after 70 hours for the I4 and I5 read sets. Lighter finished correcting all the read sets,
but it made no corrections for the read sets with 10× coverage.

4.2.2 TGS (PacBio and ONT) Tools
The widely used PacBio read error correction tools LoRDEC [29], LSC [30], PBcR [31], and Proovread [32] were evaluated using P1 and P2. No parameter tuning was needed for LSC, PBcR, or Proovread.
For LoRDEC, multiple output sets were generated by applying successive values for the k-mer length and the solid k-mer occurrence threshold, and the result that gave the highest percentage similarity was chosen.
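
The two-parameter sweep described for LoRDEC amounts to a small grid search. The sketch below shows only the selection logic. The `score` callable is a hypothetical stand-in for running the corrector with a given k-mer length and solid k-mer threshold and then measuring the percentage similarity of the corrected reads. The parameter ranges and the toy scoring function are assumptions, not values from the evaluation.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def grid_search(
    k_values: Iterable[int],
    thresholds: Iterable[int],
    score: Callable[[int, int], float],
) -> Tuple[int, int, float]:
    """Try every (k, threshold) pair and keep the highest-scoring one."""
    best = None
    for k, s in product(k_values, thresholds):
        value = score(k, s)  # e.g. percentage similarity for this run
        if best is None or value > best[2]:
            best = (k, s, value)
    if best is None:
        raise ValueError("no parameter combinations were supplied")
    return best


def toy_similarity(k: int, s: int) -> float:
    """Made-up scoring surface that peaks at k=19, s=3 (illustration only)."""
    return 100.0 - abs(k - 19) - abs(s - 3)


best_k, best_s, best_score = grid_search(range(17, 24, 2), range(2, 5), toy_similarity)
print(best_k, best_s, best_score)
```

In practice each call to `score` would be a full correction run, so the grid is kept small and every run's output is stored rather than recomputed.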

LSC could not be assessed using P2 because it had not finished after 70 hours.
Since ONT is a relatively new technology, error correction methods for ONT reads are only beginning to be explored and studied in detail.

Two of the most recent ONT read error correction tools, NaS [33] and NanoCorr [34], were evaluated using O1 and O2.
