Paper reading (65): Kernel-penalized regression for analysis of microbiome data

Paper title: Kernel-penalized regression for analysis of microbiome data

Scholar citations: 15

Pages: 29

Published: 2018.03

Published by: Institute of Mathematical Statistics

Authors: Timothy W. Randolph, Sen Zhao, ..., and Ali Shojaie

Abstract:

The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors comprised of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and a percent fat.

Paper outline:

1. Introduction

2. Kernel Penalized Regression for Microbiome Data

2.1 Background for PCoA and principal component regression

2.2 Penalized regression and DPCoA

2.3 Kernel-based regression with two kernels

2.4 Regression with compositional data

3. Numerical Experiments

3.1 Regression and DPCoA

3.2 Regression and PCoA with respect to a UniFrac kernel

3.3 Regression and PCoA using an edge-matrix kernel

4. Application to an observational study

5. Discussion

Excerpts from the paper:

1. Biological Problem: What biological problems have been solved in this paper?

The analysis of human microbiome data

2. Main discoveries: What are the main discoveries in this paper?

The authors use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome.
They also show how this regression framework can be used to address the compositional nature of multivariate predictors comprised of relative abundances; that is, vectors whose entries sum to a constant.
An interesting feature of the proposed kernel-penalized regression framework is its ability to sidestep some of the problems inherent in compositional data analysis.
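To make "compositional" concrete: a common device in compositional data analysis is the centered log-ratio (CLR) transform, which maps a vector of relative abundances to log-ratios that sum to zero and do not depend on the overall scale. The sketch below is only an illustration of that general idea. These notes do not say how the paper handles compositions in detail, and the function name `clr`, the `pseudocount` parameter, and the toy abundances are all assumptions made for the example.

```python
import math

def clr(composition, pseudocount=0.0):
    """Centered log-ratio transform of one composition (hypothetical helper).

    Each part is replaced by its log minus the mean of the logs, so the
    result sums to zero and is unchanged if the input is rescaled.
    """
    parts = [x + pseudocount for x in composition]
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be positive; use a pseudocount for zeros")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Toy relative abundances for three taxa (made-up numbers that sum to 1).
relative_abundances = [0.5, 0.3, 0.2]
transformed = clr(relative_abundances)
print(transformed)
```

Because the transform ignores the total, `clr([1, 2, 3])` and `clr([10, 20, 30])` give the same result, which is why the sum-to-constant constraint stops mattering after the transform.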

3. ML(Machine Learning) Methods: What are the ML methods applied in this paper?

The paper describes a framework of high-dimensional regression models that extends distance-based ordination methods such as PCoA.
A primary motivation for PCoA graphical displays is the ability to incorporate biologically-inclined measures of (dis)similarity.
Proposed method: kernel-penalized regression (KPR)
We show how phylogenetic and other structure can be incorporated via kernel penalized regression in either the primal (p-dimensional) feature space or the dual (n-dimensional) samples space.
Previous methods: PCoA (?); standard (Euclidean-based) statistical models
Dataset: We apply our kernel-penalized regression framework to data from 16S rRNA gene collected in a study of premenopausal women (Hullar et al., 2015). This study investigated aspects of gut microbial communities in stool samples from premenopausal women using 454 pyrosequencing of the 16S rRNA gene. The abundances of 127 species were zero for more than 90% of the subjects and were removed from our analysis. The data set we consider consists of p = 128 species sampled from n = 102 women.
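The primal/dual remark above can be illustrated with a generalized ridge regression. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the penalty is the quadratic form of the inverse of a feature-side kernel `Q` (a hypothetical stand-in for, say, a phylogeny-derived similarity between taxa), it leaves out any sample-side kernel, and it uses synthetic data. Under those assumptions the p-dimensional and n-dimensional solves return the same coefficient vector.

```python
import numpy as np

def kpr_primal(X, y, Q, lam):
    """Solve the p x p system (X'X + lam * Q^{-1}) beta = X'y."""
    Q_inv = np.linalg.inv(Q)
    return np.linalg.solve(X.T @ X + lam * Q_inv, X.T @ y)

def kpr_dual(X, y, Q, lam):
    """Solve the n x n system (X Q X' + lam * I) alpha = y, then beta = Q X' alpha."""
    n = X.shape[0]
    K = X @ Q @ X.T  # n x n kernel between samples induced by Q
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    return Q @ X.T @ alpha

# Synthetic data only: n samples, p features with p > n.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Hypothetical symmetric positive-definite feature kernel.
A = rng.normal(size=(p, p))
Q = A @ A.T / p + np.eye(p)

beta_primal = kpr_primal(X, y, Q, lam=1.0)
beta_dual = kpr_dual(X, y, Q, lam=1.0)
print(np.allclose(beta_primal, beta_dual))
```

When p is much larger than n, the dual form is the cheaper of the two because it only solves an n x n system, while still returning one coefficient per feature.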

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

Traditional methods: dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample, in particular principal coordinate analysis (PCoA).
None of these analyses proceeds to estimate the individual associations.
In contrast, we focus on estimating the coefficient vector, which is a key aspect of any approach used to draw scientific conclusions based on the association of microbial communities with an outcome or phenotype.
Our approach, which differs somewhat from that of Li (2015), may also be viewed as a penalized version of the low-dimensional linear model for compositions by Tolosana-Delgado and van den Boogaart (2011), who use the isometric log-ratio (ILR) coordinates.
The framework is also useful for addressing well-known problems that arise from applying standard (Euclidean-based) statistical models to compositional data.
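For reference, the PCoA baseline mentioned above can be sketched as classical multidimensional scaling: double-center the squared distance matrix and keep the leading eigenvectors. This is a generic textbook version, not code from the paper. The function name `pcoa` and the toy points are assumptions, and a plain Euclidean distance is used here in place of an ecological distance such as UniFrac.

```python
import numpy as np

def pcoa(D, n_components=2):
    """Principal coordinates from an n x n distance matrix D (hypothetical helper)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy example: six random points in the plane and their Euclidean distances.
rng = np.random.default_rng(1)
points = rng.normal(size=(6, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = pcoa(D, n_components=2)

# Distances between the recovered coordinates, for comparison with D.
D_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(D_hat, D))
```

The output is a set of sample coordinates for plotting or clustering. There are no per-taxon coefficients in it, which is the gap the regression framework is meant to fill.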

5. Biological Significance: What is the biological significance of these ML methods’ results?

In this analysis, we obtain estimates of associations between microbial species and percent fat measured in premenopausal women, and also provide inference for these estimates by applying a recent significance test in our kernel-penalized regression (KPR) framework.

6. Prospect: What are the potential applications of these machine learning methods in biological science?

The proposed framework also allows us to use existing inference frameworks for high-dimensional regression, and in particular the Grace test (Zhao and Shojaie, 2016), to assess the significance of estimated regression coefficients.
