Link: Canu FAQ

Q:
What resources does Canu require for a bacterial genome assembly? A mammalian assembly?
A:

Canu is designed to scale its resource usage to the system it runs on, detecting the available hardware automatically. It will report if a system does not meet the minimum requirements for a given genome size.

Typically, a bacterial genome can be assembled in 1-10 CPU hours, depending on coverage (~20 min on 16 cores), with 4GB of RAM (8GB is recommended). A mammalian genome (such as human) can be assembled in 10-25K CPU hours, depending on coverage (a grid environment is recommended), and needs at least one machine with 64GB of RAM (128GB is recommended).
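As a rough check on the figures above, CPU hours can be converted to wall-clock time. This is a minimal sketch that assumes ideal linear scaling across cores, which real runs will not fully achieve; the function name is made up for illustration.

```python
def wall_clock_minutes(cpu_hours: float, cores: int) -> float:
    """Estimate wall-clock minutes, assuming ideal (linear) scaling across cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cpu_hours * 60.0 / cores


if __name__ == "__main__":
    # The 1-10 CPU hour range quoted for a bacterial genome, on 16 cores.
    for cpu_hours in (1, 5, 10):
        minutes = wall_clock_minutes(cpu_hours, 16)
        print(f"{cpu_hours} CPU hours on 16 cores: about {minutes:.1f} min")
```

Under this assumption, the quoted ~20 min on 16 cores corresponds to a little over 5 CPU hours, inside the stated 1-10 CPU hour range.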

Q:
What parameters should I use for my genome? Sequencing type?
A:

By default, Canu is designed to be universal across a large range of PacBio (C2-P6-C4) and Oxford Nanopore (R6-R9) data. You can adjust parameters to increase efficiency for your data type. For example, for higher coverage PacBio datasets, especially from inbred samples, you can decrease the error rate (errorRate=0.013). (Note: if coverage is sufficient, lowering the error rate to 1.3% gives a more accurate result.) For recent Nanopore R9 2D data, you can also decrease the default error rate (errorRate=0.013).

With R7 1D sequencing data, multiple rounds of error correction are helpful. This should not be necessary for sequences over 85% identity. You can run just the correction from Canu with the options

-correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high

for 5-10 rounds, supplying the asm.correctedReads.fasta.gz output from round i-1 to round i. Assemble with

-nanopore-corrected <your data> errorRate=0.1 utgGraphDeviation=50
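The round-by-round chaining described above can be sketched as a small script that only builds the command lines, without running them. The directory names (round1, round2, ..., assembly), the input file name, and the genome size are hypothetical. The sketch assumes the standard canu options -p, -d, genomeSize= and -nanopore-raw, and that each round writes <prefix>.correctedReads.fasta.gz into its -d directory.

```python
import shlex

# Correction options quoted in the FAQ text.
CORRECTION_OPTIONS = ["corOutCoverage=500", "corMinCoverage=0", "corMhapSensitivity=high"]


def correction_commands(raw_reads, genome_size, rounds=5, prefix="asm"):
    """Build one `canu -correct` command per round; round i reads the output of round i-1."""
    commands = []
    reads = raw_reads
    for i in range(1, rounds + 1):
        out_dir = f"round{i}"  # hypothetical directory naming
        commands.append(
            ["canu", "-correct", "-p", prefix, "-d", out_dir,
             f"genomeSize={genome_size}", *CORRECTION_OPTIONS,
             "-nanopore-raw", reads]
        )
        # Assumed output location of this round, used as input to the next one.
        reads = f"{out_dir}/{prefix}.correctedReads.fasta.gz"
    return commands


def assembly_command(corrected_reads, genome_size, prefix="asm"):
    """Build the final assembly command with the options quoted in the FAQ."""
    return ["canu", "-p", prefix, "-d", "assembly", f"genomeSize={genome_size}",
            "errorRate=0.1", "utgGraphDeviation=50",
            "-nanopore-corrected", corrected_reads]


if __name__ == "__main__":
    cmds = correction_commands("reads.fastq", "4.8m", rounds=5)
    last_output = "round5/asm.correctedReads.fasta.gz"
    for cmd in cmds + [assembly_command(last_output, "4.8m")]:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the commands instead of executing them makes it easy to review the chain before submitting a long run.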

Q:
How do I run Canu on my SLURM/SGE/PBS/LSF/Torque system?
A:
Canu will auto-detect and configure itself to submit on most grids. If your grid requires special options (such as a partition on SLURM or an account code on SGE), specify them with gridOptions="<your options list>", which will be passed to the scheduler by Canu. If you have a grid system but prefer to run locally, specify useGrid=false. (Note: in practice, this is usually set to false.)

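A minimal sketch of how these two settings might be added to a command line. The partition name "long", the file names, and the genome size are hypothetical; only gridOptions= and useGrid=false come from the text above.

```python
import shlex


def grid_arguments(use_grid=True, grid_options=None):
    """Return the extra Canu arguments for grid or local execution."""
    if not use_grid:
        return ["useGrid=false"]
    if grid_options:
        # The whole option list is passed to the scheduler as one value.
        return [f"gridOptions={grid_options}"]
    return []  # rely on Canu's auto-detection


def canu_command(extra_args):
    """Assemble an illustrative command; prefix, directory and input are placeholders."""
    return ["canu", "-p", "asm", "-d", "asm-out", "genomeSize=4.8m",
            *extra_args, "-pacbio-raw", "reads.fastq"]


if __name__ == "__main__":
    for extra in (grid_arguments(grid_options="--partition=long"),
                  grid_arguments(use_grid=False)):
        print(" ".join(shlex.quote(arg) for arg in canu_command(extra)))
```
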
Q:
My asm.contigs.fasta is empty, why?
A:

By default, canu will split the final output into three files:

asm.contigs.fasta
Everything which could be assembled and is part of the primary assembly, including both unique and repetitive elements. Each contig has several flags included on the FASTA def line.
asm.bubbles.fasta
Alternate paths in the graph which could not be merged into the primary assembly.
asm.unassembled.fasta
Reads/tigs which could not be incorporated into the primary or bubble assemblies.

It is possible for tigs composed of multiple reads to end up in asm.unassembled.fasta. The default filtering eliminates anything with fewer than 2 reads, shorter than 1000bp, or composed mostly of a single sequence (>75%). The filtering is controlled by the contigFilter parameter, which takes 5 values.

contigFilter="minReads minLength singleReadSpan lowCovSpan lowCovDepth"

The default filtering is 2 1000 0.75 0.75 2. If you are assembling amplified data or viral data, it is possible your assembly will be flagged as unassembled. In those cases, you can turn off the filtering with the parameters

contigFilter="2 1000 1.0 1.0 2"
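To see why setting the two span values to 1.0 relaxes the filter, here is a simplified sketch of the decision, not Canu's actual implementation. The first three checks follow the description above. The low-coverage check (more than lowCovSpan of the tig below lowCovDepth) is an assumption about what the last two values mean, and all class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContigFilter:
    min_reads: int = 2
    min_length: int = 1000
    single_read_span: float = 0.75
    low_cov_span: float = 0.75
    low_cov_depth: int = 2

    @classmethod
    def parse(cls, value):
        """Parse a contigFilter string such as "2 1000 0.75 0.75 2"."""
        reads, length, single, span, depth = value.split()
        return cls(int(reads), int(length), float(single), float(span), int(depth))


def is_unassembled(filt, num_reads, length, single_read_fraction, depths):
    """Return True if a tig would be flagged as unassembled under this sketch.

    single_read_fraction: fraction of the tig covered by its single longest read.
    depths: per-base read depth along the tig.
    """
    if num_reads < filt.min_reads or length < filt.min_length:
        return True
    if single_read_fraction > filt.single_read_span:
        return True
    # Assumed meaning of lowCovSpan/lowCovDepth: fraction of bases below the depth.
    low = sum(1 for d in depths if d < filt.low_cov_depth)
    low_fraction = low / len(depths) if depths else 0.0
    return low_fraction > filt.low_cov_span


if __name__ == "__main__":
    # An amplicon-like tig: many reads, but one read spans 95% of it.
    tig = dict(num_reads=50, length=3000, single_read_fraction=0.95, depths=[50] * 3000)
    print(is_unassembled(ContigFilter(), **tig))
    print(is_unassembled(ContigFilter.parse("2 1000 1.0 1.0 2"), **tig))
```

Because a fraction can never exceed 1.0, the two span checks can no longer trigger with the relaxed values, while the minimum read count and length checks stay in place.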

Q:
Why is my assembly missing my favorite short plasmid X?
A:

The first step in Canu is to find high-error overlaps and generate corrected sequences for subsequent assembly. This is currently the fastest step in Canu. By default, only the longest 40X of data (based on the specified genome size) is used for correction. If you have a dataset with uneven coverage or small plasmids, correcting the longest 40X may not give you sufficient coverage of your genome/plasmid. In these cases, you can set

corOutCoverage=1000

Any value larger than your total input coverage works: it will correct and assemble all input data, at the expense of runtime. This option is also recommended for metagenomic datasets where all data is useful for assembly.
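The effect on short plasmids can be illustrated with a simplified model of the "longest 40X" selection. Canu's real read selection is more involved; this sketch only sorts reads by length and keeps them until the coverage target is reached. The read lengths and genome size are made-up toy numbers.

```python
def select_longest(read_lengths, genome_size, cor_out_coverage=40):
    """Keep the longest reads until their total length reaches coverage * genome size."""
    target = cor_out_coverage * genome_size
    kept, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target:
            break
        kept.append(length)
        total += length
    return kept


if __name__ == "__main__":
    genome_size = 1000
    # Toy data: 50 long chromosomal reads plus 20 short reads from a small plasmid.
    reads = [1000] * 50 + [200] * 20
    default = select_longest(reads, genome_size)          # longest 40X only
    everything = select_longest(reads, genome_size, 1000)  # target above total coverage
    print(len(default), 200 in default)
    print(len(everything), 200 in everything)
```

In this toy case the default target is filled entirely by the long reads, so no plasmid read is corrected. Raising the target above the total input coverage keeps every read.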

Q:
Why do I get only 30X of corrected data?
A:

By default, only the longest 40X of data (based on the specified genome size) is used for correction. Typically, some reads are trimmed during correction due to being chimeric or having erroneous sequence, resulting in a loss of 20-25% (30X output). You can force correction to be non-lossy by setting

corMinCoverage=0

In this case the corrected reads output will be the same length as the input data, keeping any high-error unsupported bases. Canu will trim these in downstream steps before assembly.

Q:
What is the minimum coverage required to run Canu?
A:

We have found that on eukaryotic genomes, >=20X coverage typically begins to outperform current hybrid methods. For low coverage datasets (<=30X) we recommend the following parameters

corMinCoverage=0 errorRate=0.035

For high-coverage datasets (typically >=60X) you can decrease the error rate since the higher number of reads should allow sufficient assembly from only the best subset

errorRate=0.013

However, the above is mainly an optimization for speed and will not affect your assembly continuity.
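The coverage thresholds above can be turned into a small helper. This is only a sketch of the rule of thumb in this answer: the function names are hypothetical, and coverage is estimated simply as total bases divided by genome size.

```python
def estimate_coverage(total_bases, genome_size):
    """Approximate coverage as total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def suggest_parameters(coverage):
    """Map coverage to the parameter suggestions given in the FAQ text."""
    if coverage <= 30:
        return ["corMinCoverage=0", "errorRate=0.035"]
    if coverage >= 60:
        return ["errorRate=0.013"]
    return []  # between 30X and 60X: keep Canu's defaults


if __name__ == "__main__":
    # Example: 120 Mbp of reads for a 4.8 Mbp genome is 25X coverage.
    coverage = estimate_coverage(120_000_000, 4_800_000)
    print(coverage, suggest_parameters(coverage))
```
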

Q:
My genome is AT/GC rich, do I need to adjust parameters?
A:

On bacterial genomes, typically no. On repetitive genomes with AT (or GC) content <=25% or >=75%, the sequence composition biases the Jaccard estimate used by MHAP. In those cases setting

corMaxEvidenceErate=0.15

has been sufficient to correct for the bias in our testing. In general, with high-coverage repetitive genomes (such as plants) it can be beneficial to set the above parameter, as it will eliminate repetitive matches, speed up the assembly, and sometimes improve unitigs.
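A quick way to check whether a genome falls into the biased range is to measure its GC fraction from a reference or a sample of reads. The sketch below applies the 25%/75% thresholds from this answer; the function names are hypothetical, and whether a genome counts as repetitive is left to the caller.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def suggest_bias_parameter(gc, repetitive=True):
    """Suggest corMaxEvidenceErate=0.15 for repetitive genomes with extreme composition."""
    # AT <= 25% is the same as GC >= 75%, so one measure covers both cases.
    if repetitive and (gc <= 0.25 or gc >= 0.75):
        return ["corMaxEvidenceErate=0.15"]
    return []


if __name__ == "__main__":
    gc = gc_fraction("ATATATATTTAAGCATNN")
    print(round(gc, 3), suggest_bias_parameter(gc))
```
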

Reposted from: https://www.cnblogs.com/leezx/p/5713157.html
