Paper title: Integrated omics: tools, advances and future approaches

Google Scholar citations: 12

Pages: 25

Published: 2018.07

Journal: Journal of Molecular Endocrinology

Authors: Biswapriya B Misra, Carl Langefeld, Michael Olivier and Laura A Cox

Abstract: Key Words: integrated, omics, genomics, transcriptomics, proteomics, metabolomics, network, statistics, Bayesian, machine learning, principal component analysis, correlation, clustering

With the rapid adoption of high-throughput omic approaches to analyze biological samples such as genomics, transcriptomics, proteomics and metabolomics, each analysis can generate tera- to peta-byte sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make the integration of these multi-dimensional omics data into biologically meaningful context challenging. Variously named as integrated omics, multi-omics, poly-omics, trans-omics, pan-omics or shortened to just ‘omics’, the challenges include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing and data archiving. The ultimate goal is toward the holistic realization of a ‘systems biology’ understanding of the biological question. Commonly used approaches are currently limited by the 3 i’s – integration, interpretation and insights. Post integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and increasing types of omics datasets generated such as glycomics, lipidomics, microbiomics and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools and potential caveats in the integration of omics datasets for development of standardized analytical pipelines that could be adopted by the global omics research community.

Conclusions:

  • No single approach exists for processing, analyzing and interpreting all data from different -omes.
  • The community needs to embrace the challenges posed by these complex datasets to standardize sample quality, sample analysis pipelines, data analysis pipelines and data formats for public data availability.
  • Integrated omics is not just a collage of tools, but a cohesive paradigm for insightful biological interpretation of multi-omics datasets that will potentially reveal novel insights into basic biology, as well as health and disease.

Introduction:

  • Access to large-scale omics datasets has revolutionized biology and led to the emergence of systems approaches to advance our understanding of biological processes.
  • These multilayered, multifactorial approaches are computationally challenging and difficult to display and comprehend visually.
  • Broad experimental challenges in these integrated omics approaches include, but are not limited to:
  1. understanding the statistical behavior of readouts from each omics regime independently
  2. recognizing non-obvious relationships that exist between omics regimes within their original biological context
  3. capitalizing on time resolution in omics data

Outline of the paper:

1. Introduction

2. Strengths and challenges of individual omics

2.1 Genomics and transcriptomics

2.2 Proteomics

2.3 Metabolomics

2.4 Unique challenges to specific omics platforms

2.4.1 Linking genotype to phenotype

2.4.2 Quantification of the proteome

2.4.3 Quantification of the metabolome

2.5 Issues shared among the omics platforms

2.5.1 Data handling

2.5.2 Annotation

2.5.3 Study design and analytic assumptions

2.5.4 Statistical power

2.5.5 Data archiving and sharing

3. Tools available for integration of multi-omics data

4. Recent examples of integration in real world datasets

4.1 Complex diseases

4.2 Immunity and infection

4.3 Cancer

4.4 Host microbiome interactions

5. Statistical approaches for current challenges

5.1 Number of samples vs number of molecules

5.2 Dimension reduction

5.3 Data integration

6. Current challenges and looking to future

6.1 Experimental challenges

6.1.1 Challenges in sample preparation

6.1.2 Optimizing, documenting and sharing workflows

6.1.3 Data processing

6.1.4 Time course studies

6.2 Individual omics datasets – normalization, transformation of different omics data types

6.3 Integration issues – data scaling, false positives and unknowns

6.4 Data issues – data archiving and sharing

6.5 Hurdles in implementing multi-omics approaches in the clinic for diagnostic/prognostic purposes

6.6 Biological knowledge – data interpretation

7. Conclusions

Excerpts from the main text:

2. Strengths and challenges of individual omics

2.1 Genomics and transcriptomics

  • Genomics and transcriptomics have been applied to various aspects of research and clinical applications.

2.2 Proteomics

  • Proteomics is used to quantify proteins in multiple sample types using both shotgun and targeted approaches.
  • Proteomics is advancing our understanding in biomedical research, including diagnosis, protein-based biomarker development and therapeutics.

2.3 Metabolomics

  • Metabolites are often end products of complex biochemical cascades that can link the genome, transcriptome and proteome to phenotype, providing a key tool for discovery of the genetic basis of metabolic variation.

2.4 Unique challenges to specific omics platforms

2.4.1 Linking genotype to phenotype

  • Combining data from proteomics and metabolomics with genomics and transcriptomics helps to overcome the difficulty of linking genotype to phenotype by providing molecular information that links genetic and epigenetic variation with phenotypic variation.

2.4.2 Quantification of the proteome

  • Advances in instrument sensitivity and the development of effective isotopic labeling tools for tissue samples have significantly improved the accuracy and reproducibility of peptide and protein quantification using MS.

2.4.3 Quantification of the metabolome

  • Ways to address the related challenges: standardization, annotation of metabolites, interoperability of protocols and methods, and statistical considerations.

2.5 Issues shared among the omics platforms

2.5.1 Data handling

  • It is essential that every analysis pipeline is well documented, including the versions of software used for each step in the pipeline and the rationale for the parameters implemented.
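
As a minimal sketch of what such documentation could look like in practice, the snippet below records the software versions, parameters and rationale for one pipeline step as JSON. The step name, parameter and rationale text are hypothetical and are not taken from the paper; only the idea of recording versions and rationale comes from the excerpt above.

```python
import json
import platform
from importlib import metadata


def describe_step(name, packages, params, rationale):
    """Return a JSON-serializable provenance record for one pipeline step."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            # Record the absence instead of failing, so the log is still written.
            versions[pkg] = "not installed"
    return {
        "step": name,
        "python": platform.python_version(),
        "package_versions": versions,
        "parameters": params,
        "rationale": rationale,
    }


record = describe_step(
    name="normalize_counts",              # hypothetical step name
    packages=["numpy"],                   # packages this step relies on
    params={"method": "median_ratio"},    # hypothetical parameter
    rationale="Chosen to limit the influence of a few highly abundant features.",
)
print(json.dumps(record, indent=2))
```

Writing one such record per step next to the output files keeps the version and parameter history attached to the data it produced.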

2.5.2 Annotation

  • Annotation data are more problematic mainly for non-standard organisms.

2.5.3 Study design and analytic assumptions

  • Too often the large number of variables is viewed as making assumption validation impossible or not worth the investment.
  • An important step in a proper analysis is to clearly understand from the experimental question whether the omics variable is a predictor or an outcome.
  • Aligning the analytic approach to match the outcome is important so that the proper variance is estimated for the test and interval estimates.

2.5.4 Statistical power

  • Major statistical challenges: the number of samples in a study versus the number of molecules quantified in each sample; analysis of time series data and treatment of data for targeted and untargeted (unbiased) approaches; and the large variation in the number of observations per sample.
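
One practical consequence of quantifying far more molecules than samples is that many hypothesis tests are run at once. The sketch below shows the Benjamini-Hochberg false discovery rate adjustment as one widely used response. It is an illustration chosen for this note, not a method prescribed by the paper, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, one per quantified molecule.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adj = benjamini_hochberg(pvals)
significant = [p for p, q in zip(pvals, adj) if q <= 0.05]
print(significant)
```

In this toy input, five raw p-values fall below 0.05 but only two survive the adjustment, which is the kind of shrinkage to expect when thousands of molecules are tested on a small number of samples.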

2.5.5 Data archiving and sharing

  • Key problems are the lack of a standardized nomenclature, data formatting and eventual public access to datasets.

3. Tools available for integration of multi-omics data

4. Recent examples of integration in real world datasets

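  • This section presents results that integrated omics has achieved in recent years in the following four areas, which reflect the importance of integrated omics.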

4.1 Complex diseases

4.2 Immunity and infection

4.3 Cancer

4.4 Host microbiome interactions

5. Statistical approaches for current challenges

5.1 Number of samples vs number of molecules

  • Sampling directly impacts the appropriate statistical tools employed and must be defined prior to sampling for a given study.

5.2 Dimension reduction

  • Dimension reduction is one strategy to reduce the computational burden while also addressing multiple testing concerns.
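
To make the idea concrete, here is a minimal sketch of principal component analysis, one common dimension reduction technique, reduced to its first component and computed by power iteration. It uses only the standard library so that it runs anywhere; real analyses would normally rely on a numerical library. The data are made up, and the function names are illustrative.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_principal_component(rows, iterations=200):
    """Estimate the leading loading vector by power iteration on X^T X.

    Assumes the data are not constant and that the starting vector is not
    orthogonal to the leading component.
    """
    X = center_columns(rows)
    p = len(X[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in X]            # X v
        w = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(x * w for x, w in zip(row, v)) for row in X]
    return v, scores


# Four samples (rows) by three features (columns); feature 2 is twice feature 1.
data = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 0.4],
    [3.0, 6.0, 0.6],
    [4.0, 8.0, 0.5],
]
loadings, scores = first_principal_component(data)
print(loadings)
print(scores)
```

The three original features collapse into one score per sample, so downstream tests run on one variable instead of three.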

5.3 Data integration

  • Most methods implemented for data integration have relied on PCA, correlation, or Bayesian or non-Bayesian network-based methods.
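
The correlation-based route can be sketched as follows: compute the Pearson correlation between every feature in one omics layer and every feature in another across matched samples, and keep the strongly correlated pairs as candidate cross-layer edges. The feature names, values and the 0.8 threshold are all made up for illustration, and the sketch ignores multiple testing.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cross_layer_edges(layer_a, layer_b, threshold=0.8):
    """List feature pairs across two layers whose |r| meets the threshold."""
    edges = []
    for name_a, values_a in layer_a.items():
        for name_b, values_b in layer_b.items():
            r = pearson(values_a, values_b)
            if abs(r) >= threshold:
                edges.append((name_a, name_b, round(r, 3)))
    return edges


# Made-up values for five matched samples.
transcripts = {
    "gene_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_B": [2.0, 1.0, 1.5, 1.0, 2.0],
}
metabolites = {
    "met_X": [2.1, 3.9, 6.2, 8.1, 9.8],
    "met_Y": [1.0, 1.0, 1.3, 1.0, 1.0],
}
edges = cross_layer_edges(transcripts, metabolites)
print(edges)
```

Only the gene_A and met_X pair passes the threshold in this toy input; the resulting edge list is the raw material for the network-based methods mentioned above.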

6. Current challenges and looking to future

6.1 Experimental challenges

6.1.1 Challenges in sample preparation

  • Unified sample preparation workflows are in their infancy, with current methods typically providing unequal sample quality; significant work is required to achieve universal applications for diverse biological matrices.

6.1.2 Optimizing, documenting and sharing workflows

  • It is essential to define and document all steps in the data handling workflow, including generation of individual omics datasets and integration of omics datasets.

6.1.3 Data processing

  • Analysis tools chosen for integrative efforts also have a significant impact on outcomes.

6.1.4 Time course studies

  • Sampling time courses are important for understanding integrated network dynamics.

6.2 Individual omics datasets – normalization, transformation of different omics data types

  • Datasets differ in, for example, what a value of 0 means.
  • Imputation of missing values must be addressed differently for the different types of datasets.
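
A small sketch of how type-specific handling might be organized is shown below. It assumes, purely for illustration, that a 0 or a missing entry in a mass spectrometry intensity table means "below detection" and is filled with half of the smallest observed value (one common heuristic), while a 0 in a sequencing count table is a real observation and is left alone. Neither rule comes from the paper.

```python
def impute_intensities(values):
    """Fill None and 0 with half of the smallest observed positive value."""
    observed = [v for v in values if v is not None and v > 0]
    if not observed:
        return list(values)  # nothing to base an estimate on
    fill = min(observed) / 2
    return [fill if (v is None or v == 0) else v for v in values]


def keep_counts(values):
    """Sequencing counts: a zero is a measured value, so nothing is changed."""
    return list(values)


# Hypothetical mapping from omics type to imputation rule.
IMPUTERS = {
    "metabolomics": impute_intensities,
    "transcriptomics": keep_counts,
}


def impute(omics_type, values):
    """Dispatch to the rule registered for this omics type."""
    return IMPUTERS[omics_type](values)


print(impute("metabolomics", [4.0, 0.0, None, 10.0]))
print(impute("transcriptomics", [3, 0, 7]))
```

Keeping the rules in one explicit table makes the choice for each data type visible and easy to document.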

6.3 Integration issues – data scaling, false positives and unknowns

  • Tools for scaling datasets and addressing false positives from three or more independent platforms for integration and subsequent analysis have not yet been developed.
  • A key strength of unbiased omics approaches is the ability to identify novel molecules that impact biological function.
  • A major limitation of omics analyses is the ability to annotate unknowns.
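
To illustrate what scaling across platforms involves, the sketch below autoscales each feature and then down-weights each platform's block by the square root of its feature count, so that a platform with many features does not dominate a platform with few. This is a generic block-scaling idea used here as an assumption; it does not address false positives and is not a tool described in the paper. All values are made up.

```python
import math
import statistics


def autoscale(column):
    """Center a feature to mean 0 and scale it to unit sample standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)  # assumes the feature is not constant
    return [(v - mean) / sd for v in column]


def block_scale(block):
    """Autoscale every feature, then weight the block by 1/sqrt(number of features)."""
    weight = 1.0 / math.sqrt(len(block))
    return {name: [weight * z for z in autoscale(col)] for name, col in block.items()}


def total_variance(block):
    """Sum of the sample variances of all features in a block."""
    return sum(statistics.variance(col) for col in block.values())


# Made-up measurements for four matched samples on two platforms.
proteomics = {"prot_1": [10.0, 12.0, 15.0, 11.0]}
metabolomics = {
    "met_1": [0.1, 0.4, 0.2, 0.3],
    "met_2": [100.0, 250.0, 180.0, 90.0],
    "met_3": [5.0, 5.5, 4.0, 6.0],
}
scaled_prot = block_scale(proteomics)
scaled_met = block_scale(metabolomics)
print(total_variance(scaled_prot), total_variance(scaled_met))
```

After scaling, both blocks carry a total variance of about 1 even though one has three times as many features as the other.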

6.4 Data issues – data archiving and sharing

  • There is a growing urgency for reproducible research using integrated omics, similar to all disciplines in science.
  • Although public databases for archiving individual omics datasets exist, no such archive exists for integrated omics datasets.

6.5 Hurdles in implementing multi-omics approaches in the clinic for diagnostic/prognostic purposes

  • There is a need not only for quantification of different types of biological variants, but also for integration of these data in ways that inform our understanding of health and disease and that will translate to clinical practice.

6.6 Biological knowledge – data interpretation

  • The largest hurdle for any omics dataset remains ‘making sense of the data’.
