What is quality control?

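Quality control is the process of removing identifiable errors from the data to improve its quality. It is the first step of work after receiving the data.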
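Identifiable errors are usually judged from the per-base quality scores stored in FASTQ files. Below is a minimal Python sketch of how a quality string maps to Phred scores Q and error probabilities P = 10^(-Q/10). It assumes the Phred+33 encoding used by current Illumina FASTQ files, and the function names are illustrative rather than taken from any particular tool.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed by default)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


scores = phred_scores("II5+#")
print(scores)  # [40, 40, 20, 10, 2]
print([round(error_probability(q), 4) for q in scores])
```

A Q20 base therefore has a 1% chance of being wrong, and a Q30 base a 0.1% chance.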

How critical is quality control?

The less that is known about the genome under study, the more important it is to correct errors. When aligning against well-studied and well-understood genomes, we can recognize and identify errors from their alignments. When assembling a genome de novo, errors can derail the process, so stricter filtering is warranted.
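One simple way to express higher stringency is a stricter minimum mean quality per read. The sketch below assumes Phred+33 qualities. Its thresholds are hypothetical (Q20 for alignment to a well-known reference, Q30 for de novo assembly) and are meant as illustrations, not recommended values. Mean quality is only one of many possible filter criteria.

```python
def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a read (Phred+33 assumed); 0.0 for an empty read."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(reads, min_mean_q):
    """Keep (name, sequence, quality) tuples whose mean Phred score is >= min_mean_q."""
    return [r for r in reads if mean_quality(r[2]) >= min_mean_q]


reads = [
    ("r1", "ACGTACGT", "IIIIIIII"),  # mean Q40
    ("r2", "ACGTACGT", "IIII++++"),  # mean Q25
    ("r3", "ACGTACGT", "++++++++"),  # mean Q10
]
# Hypothetical cutoffs: looser for a well-known reference, stricter for de novo assembly.
print([r[0] for r in filter_reads(reads, min_mean_q=20)])  # ['r1', 'r2']
print([r[0] for r in filter_reads(reads, min_mean_q=30)])  # ['r1']
```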

When do we perform quality control?

Quality control is performed at different stages:
Pre-alignment (“raw data”): the protocols are the same regardless of which analysis follows.
Post-alignment (“data filtering”): the protocols are specific to the analysis being performed.
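Post-alignment filters depend on the analysis. One illustrative example is dropping unmapped records and records with low mapping quality from SAM text lines. The sketch below follows the SAM format's conventions: FLAG is in column 2, MAPQ is in column 5, and FLAG bit 0x4 means the read is unmapped. The MAPQ cutoff of 20 is a hypothetical choice.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the segment is unmapped


def keep_alignment(sam_line: str, min_mapq: int = 20) -> bool:
    """Keep header lines and mapped records with MAPQ >= min_mapq (hypothetical cutoff)."""
    if sam_line.startswith("@"):
        return True  # header lines pass through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])  # FLAG is column 2, MAPQ is column 5
    return not (flag & UNMAPPED) and mapq >= min_mapq


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",  # mapped, MAPQ 60
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",         # unmapped
    "r3\t16\tchr1\t200\t3\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",  # mapped, MAPQ 3
]
print([line.split("\t")[0] for line in sam if keep_alignment(line)])  # ['@HD', 'r1']
```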

How reliable are QC tools?

Does quality control introduce errors?

How does read quality trimming work?

Originally, the reliability of sequencing decreased along the read. A common correction is to work backwards from the end of each read and remove low-quality measurements from it. This is called trimming.
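Here is a minimal sketch of this idea in its simplest form: walk backwards from the 3’ end and drop bases until one reaches a quality threshold. The sketch assumes Phred+33 encoding and a threshold of 20. Real trimmers such as Trimmomatic (for example its TRAILING and SLIDINGWINDOW steps) apply their own, more elaborate rules.

```python
def trim_3prime(seq: str, qual: str, threshold: int = 20, offset: int = 33):
    """Remove trailing bases whose Phred score is below threshold (Phred+33 assumed)."""
    end = len(seq)
    # Walk backwards from the 3' end while the base quality is too low.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


seq, qual = trim_3prime("ACGTACGTAC", "IIIIII5+##")
print(seq, qual)  # ACGTACG IIIIII5
```

The trailing bases with Q2, Q2 and Q10 are removed, and trimming stops at the first base with Q ≥ 20.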

Why do we need to trim adapters?

How do we trim adapters?

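Trim adapters with Trimmomatic in single-end (SE) mode. The ILLUMINACLIP step takes the adapter FASTA file followed by three numbers: the maximum number of seed mismatches (2), the palindrome clip threshold (30) and the simple clip threshold (5):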

trimmomatic SE SRR519926_1.fastq output.fq ILLUMINACLIP:adapter.fa:2:30:5
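Trimmomatic's ILLUMINACLIP uses its own seed-based and palindrome strategies. The sketch below is a deliberately simplified stand-in, not that algorithm. It clips a read at the first position where the rest of the read matches the start of the adapter with at most `max_mismatches` mismatches. A minimum overlap is required so that a few chance matches at the very end are not clipped. The adapter string (the commonly used Illumina TruSeq adapter prefix) and the `max_mismatches` and `min_overlap` values are assumptions.

```python
ADAPTER = "AGATCGGAAGAGC"  # Illumina TruSeq adapter prefix (assumed here)


def clip_adapter(read: str, adapter: str, max_mismatches: int = 1, min_overlap: int = 5) -> str:
    """Remove a (possibly partial) 3' adapter; a simplified illustration, not Trimmomatic's algorithm."""
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:]
        overlap = min(len(tail), len(adapter))  # compare only where both sequences exist
        mismatches = sum(1 for a, b in zip(tail[:overlap], adapter[:overlap]) if a != b)
        if mismatches <= max_mismatches:
            return read[:start]  # everything from here on is treated as adapter
    return read  # no adapter found


print(clip_adapter("TTGCCAGTCAAGATCGGAAG", ADAPTER))  # TTGCCAGTCA
```

Allowing a mismatch over very short overlaps raises the chance of false clips. This is one reason real tools score matches rather than simply counting mismatches.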

Trimming adapter sequences - is it necessary?

Removal of adapter sequences, a process called read trimming or clipping, is one of the first steps in analyzing NGS data. With more than 30 published adapter-trimming tools, there is no shortage of choice. Yet there is a debate over whether this step is really as important as the number of tools suggests, or whether this time-consuming step can be skipped for many NGS applications.

Why do adapters contaminate my sequences?
Adapters have to be ligated to every single DNA molecule during library preparation. For Illumina short-read sequencing, the corresponding protocols (in most cases) involve a DNA fragmentation step followed by the ligation of specific oligonucleotides to the 5’ and 3’ ends. These 5’ and 3’ adapter sequences have important functions in Illumina sequencing. They carry barcode sequences, forward/reverse primers (for paired-end sequencing) and the binding sequences that immobilize the fragments on the flow cell and allow bridge amplification.

When are adapter sequences observed in the reads?
In standard short-read sequencing, the DNA insert (the original molecule to be sequenced) lies downstream of the read primer, so the 5’ adapter does not appear in the sequenced read. However, if the fragment is shorter than the number of bases sequenced, the sequencer continues into the 3’ adapter. In short: in Illumina sequencing, adapter sequences occur only at the 3’ end of the read, and only if the DNA insert is shorter than the number of sequencing cycles (see picture below)!
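The rule above can be written down directly. The sketch below assumes an idealized read with no sequencing errors: the sequencer reads the insert from its 5’ end and, once the insert is exhausted, continues into the 3’ adapter. The example sequences and lengths are hypothetical: a 24-nt insert read with 30 cycles, and a 400-nt insert read with 300 cycles. The adapter string is the commonly used Illumina TruSeq adapter prefix.

```python
ADAPTER = "AGATCGGAAGAGC"  # Illumina TruSeq adapter prefix (assumed here)


def simulate_read(insert: str, adapter: str, read_length: int) -> str:
    """Idealized read: the insert from its 5' end, then 3' adapter bases if the insert is too short."""
    return (insert + adapter)[:read_length]


def adapter_bases_in_read(insert_length: int, read_length: int) -> int:
    """Number of 3' adapter bases in a read; zero unless the insert is shorter than the read."""
    return max(0, read_length - insert_length)


insert = "ACGT" * 6  # hypothetical 24-nt insert, e.g. a small RNA
print(simulate_read(insert, ADAPTER, read_length=30))  # ACGTACGTACGTACGTACGTACGTAGATCG
print(adapter_bases_in_read(24, 30))    # 6
print(adapter_bases_in_read(400, 300))  # 0
```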

How often this happens depends largely on the NGS protocol used. Think about it: how often will you sequence into the 3’ adapter when performing common RNA-Seq? After mRNA enrichment, cDNA synthesis (using a reverse transcriptase) and DNA fragmentation, the protocols typically include a size-selection step. When using a MiSeq in 2x300 paired-end mode, one will aim to select molecules longer than the sequenced length, in our example more than 600 nucleotides. However, it is technically impossible to obtain one exact fragment size; instead, one gets a distribution of fragment lengths (see picture). As a result, the fragments at the short end of that distribution still produce a certain fraction of adapter contamination. For RNA-Seq you will observe that only about 0.2 to 2% of reads contain adapter sequences.
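To see why even careful size selection leaves some contamination, one can ask what fraction of fragments in a length distribution falls below the read length. The sketch below assumes, hypothetically, normally distributed insert lengths with a mean of 700 nt and a standard deviation of 150 nt. Real fragment-length distributions are usually not exactly normal, so this only illustrates the reasoning.

```python
from statistics import NormalDist


def adapter_fraction(mean_insert: float, sd_insert: float, read_length: int) -> float:
    """Expected fraction of fragments shorter than the read, i.e. P(insert < read_length),
    assuming (hypothetically) normally distributed insert lengths."""
    return NormalDist(mean_insert, sd_insert).cdf(read_length)


# Hypothetical library: mean insert 700 nt, SD 150 nt, 2x300 reads.
print(f"{adapter_fraction(700, 150, 300):.2%}")
```

With these made-up parameters, roughly 0.4% of fragments are shorter than 300 nt. The result is sensitive to the assumed spread, which helps explain why measured contamination differs between libraries.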

Summary
Adapter contamination can lead to NGS alignment errors and an increased number of unaligned reads, because adapter sequences are synthetic and do not occur in the genomic sequence. In some applications, such as small RNA sequencing, adapter trimming is essential: with fragments of around 24 nucleotides, one will definitely sequence into the 3’ adapter. In other applications (transcriptome sequencing, whole-genome sequencing, etc.), appropriate size selection keeps adapter contamination so low that one could consider skipping adapter removal and thereby save time and effort.
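Whether trimming can be skipped can be judged from the data itself. The sketch below counts the fraction of reads that contain an exact copy of an adapter prefix. The 13-nt prefix `AGATCGGAAGAGC` (the start of the commonly used Illumina TruSeq adapter) and the toy reads are assumptions. An exact-match count misses adapter fragments that are shorter than the prefix or that contain sequencing errors, so it underestimates contamination.

```python
def adapter_rate(reads, adapter_prefix="AGATCGGAAGAGC"):
    """Fraction of reads containing an exact copy of adapter_prefix (TruSeq prefix assumed)."""
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if adapter_prefix in r)
    return hits / len(reads)


reads = [
    "TTGCCAGTCAAGATCGGAAGAGCACAC",  # short insert: runs into the adapter
    "TTGCCAGTCATTGCCAGTCATTGCCAG",
    "GGCATTACGGCATTACGGCATTACGGC",
    "ACGTACGTACGTACGTACGTACGTACG",
]
print(adapter_rate(reads))  # 0.25
```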
