R语言并行计算RCbray-curtis距离

  群落构建分析是微生物生态学分析的重要组成部分,成为目前文章发表的热点技术。之前我们介绍了计算beta-NTI(beta nearest taxon index)来进行群落构建分析。|beta-NTI| >2说明决定性过程主导,其中beta-NTI >2说明OTU的遗传距离发散,为生物交互作用主导,beta-NTI < -2则说明OUT的遗传距离收敛为环境选择主导。|beta-NTI| <2则说明随机过程主导,但随机过程包含遗传漂变、扩散限制和均质扩散,那么如何对随机过程进一步划分呢?

1. 如何理解RCbray-curtis距离?

  根据OUT表获得每个物种的占居表(例如,物种1只在C群落中出现过则为1,物种2在A和B群落中出现过则为2,以此类推)和丰度表(获取物种在各个样点的丰度总和),然后根据物种占居表和丰度表进行模拟。在保证样点丰富度不变,根据占居表的概率选择出现的物种,根据丰度表的概率选择所出现物种的丰度,从而获得模拟条件下的OTU表,并计算样品间的Bray-Curtis距离。重复以上过程1000次,获得1000个模拟条件下样品间的Bray-Curtis距离。将模拟得到的Bray-Curtis距离与实际观察到的Bray-Curtis距离进行比较,记录小于实际观测值的比例,即为RCbray-curtis距离。当|beta-NTI| < 2时,可根据RCbray-curtis对随机过程进一步划分。 RCbray-curtis > 0.95 意味着扩散限制,RCbray-curtis < 0.05意味着均值扩散,0.05 < RCbray-curtis < 0.95 则意味着漂变。

2. 如何计算RCbray-curtis距离

#calculate RC-bray value if site which beta-NTI in (-2,2)
raup_crick_abundance = function(spXsite,plot_names_in_col1 = TRUE,classic_metric = FALSE,split_ties = TRUE,reps = 999,set_all_species_equal = FALSE,as.distance.matrix = TRUE,report_similarity = FALSE,threads = 1) {##expects a species by site matrix for spXsite, with row names for plots, or optionally plots named in column 1.  By default calculates a modification of the Raup-Crick metric (standardizing the metric to range from -1 to 1 instead of 0 to 1). Specifying classic_metric=TRUE instead calculates the original Raup-Crick metric that ranges from 0 to 1. The option split_ties (defaults to TRUE) adds half of the number of null observations that are equal to the observed number of shared species to the calculation- this is highly recommended.  The argument report_similarity defaults to FALSE so the function reports a dissimilarity (which is appropriate as a measure of beta diversity).  Setting report_similarity=TRUE returns a measure of similarity, as Raup and Crick originally specified.  If ties are split (as we recommend) the dissimilarity (default) and similarity (set report_similarity=TRUE) calculations can be flipped by multiplying by -1 (for our modification, which ranges from -1 to 1) or by subtracting the metric from 1 (for the classic metric which ranges from 0 to 1). If ties are not split (and there are ties between the observed and expected shared number of species) this conversion will not work. The argument reps specifies the number of randomizations (a minimum of 999 is recommended- default is 9999).  set_all_species_equal weights all species equally in the null model instead of weighting species by frequency of occupancy.##Note that the choice of how many plots (rows) to include has a real impact on the metric, as species and their occurrence frequencies across the set of plots is used to determine gamma and the frequency with which each species is drawn from the null model##this section moves plot names in column 1 (if specified as being present) into the row names of the matrix and drops the column of namesif (plot_names_in_col1) {row.names(spXsite) <- spXsite[, 1]spXsite <- spXsite[, -1]}## count number of sites and total species richness across all plots (gamma)library(doParallel)library(foreach)library(vegan)n_sites <- nrow(spXsite)gamma <- ncol(spXsite)##build a site by site matrix for the results, with the names of the sites in the row and col names:results <-matrix(data = NA,nrow = n_sites,ncol = n_sites,dimnames = list(row.names(spXsite), row.names(spXsite)))##make the spXsite matrix into a new, pres/abs. matrix:ceiling(spXsite / max(spXsite)) -> spXsite.inc##create an occurrence vector- used to give more weight to widely distributed species in the null model:occur <- apply(spXsite.inc, MARGIN = 2, FUN = sum)##create an abundance vector- used to give more weight to abundant species in the second step of the null model:abundance <- apply(spXsite, MARGIN = 2, FUN = sum)##make_null:##looping over each pairwise community combination:for (null.one in 1:(nrow(spXsite) - 1)) {for (null.two in (null.one + 1):nrow(spXsite)) {null_bray_curtis <- NULLcl <- makeCluster(threads)registerDoParallel(cl)null_bray_curtis <- foreach (i = 1:reps,.combine = "c") %dopar%{library(vegan)##two empty null communities of size gamma:com1 <- rep(0, gamma)com2 <- rep(0, gamma)##add observed number of species to com1, weighting by species occurrence frequencies:com1[sample(1:gamma,sum(spXsite.inc[null.one, ]),replace = FALSE,prob = occur)] <- 1com1.samp.sp = sample(which(com1 > 0),(sum(spXsite[null.one, ]) - sum(com1)),replace = TRUE,prob = abundance[which(com1 > 0)])com1.samp.sp = cbind(com1.samp.sp, 1)# head(com1.samp.sp);com1.sp.counts = as.data.frame(tapply(com1.samp.sp[, 2], com1.samp.sp[, 1], FUN = sum))colnames(com1.sp.counts) = 'counts'# head(com1.sp.counts);com1.sp.counts$sp = as.numeric(rownames(com1.sp.counts))# head(com1.sp.counts);com1[com1.sp.counts$sp] = com1[com1.sp.counts$sp] + com1.sp.counts$counts# com1;#sum(com1) - sum(spXsite[null.one,]); ## this should be zero if everything work properlyrm('com1.samp.sp', 'com1.sp.counts')##same for com2:com2[sample(1:gamma,sum(spXsite.inc[null.two, ]),replace = FALSE,prob = occur)] <- 1com2.samp.sp = sample(which(com2 > 0),(sum(spXsite[null.two, ]) - sum(com2)),replace = TRUE,prob = abundance[which(com2 > 0)])com2.samp.sp = cbind(com2.samp.sp, 1)# head(com2.samp.sp);com2.sp.counts = as.data.frame(tapply(com2.samp.sp[, 2], com2.samp.sp[, 1], FUN =sum))colnames(com2.sp.counts) = 'counts'# head(com2.sp.counts);com2.sp.counts$sp = as.numeric(rownames(com2.sp.counts))# head(com2.sp.counts);com2[com2.sp.counts$sp] = com2[com2.sp.counts$sp] + com2.sp.counts$counts# com2;# sum(com2) - sum(spXsite[null.two,]); ## this should be zero if everything work properlyrm('com2.samp.sp', 'com2.sp.counts')null.spXsite = rbind(com1, com2)# null.spXsite;##calculate null bray curtisnull_bray_curtis = vegdist(null.spXsite, method = 'bray')#print(c(i,date()))detach(package:vegan)}stopCluster(cl)# end reps loop## empirically observed bray curtisobs.bray = vegdist(spXsite[c(null.one, null.two), ], method = 'bray')##how many null observations is the observed value tied with?num_exact_matching_in_null = sum(null_bray_curtis == obs.bray)##how many null values are smaller than the observed *dissimilarity*?num_less_than_in_null = sum(null_bray_curtis < obs.bray)rc = (num_less_than_in_null) / reps# rc;if (split_ties) {rc = ((num_less_than_in_null + (num_exact_matching_in_null) / 2) / reps)}if (!classic_metric) {##our modification of raup crick standardizes the metric to range from -1 to 1 instead of 0 to 1rc = (rc - .5) * 2}results[null.two, null.one] = round(rc, digits = 2)##store the metric in the results matrixprint(c(null.one, null.two, date()))}## end null.two loop}## end null.one loopif (as.distance.matrix) {## return as distance matrix if so desiredresults <- as.dist(results)}return(results)
}
## end function
library(vegan)
#读取OTU表
otu_biom2 <- read.delim("./otu table.txt", row.names=1)
#查看样品序列条数
summary(colSums(otu_biom2))
otu_biom2  = otu_biom2[,colSums(otu_biom2) > 8500]
#将OTU表转置为行名为样品名称,列名为OTU名称的表
otu<-t(otu_biom2)
head(rowSums(otu))
#对OTU表进行抽平
otu = as.data.frame(rrarefy(otu, 8500))
head(rowSums(otu))
#去除序列条数较少的OTU
otu_niche<-otu[,log(colSums(otu)/sum(otu),10)>-4.5]
#以前500个OTU为例,度量十核计算所需的时间
system.time(raup_crick_abundance(otu_niche[,1:200],plot_names_in_col1 = FALSE,threads=1))
#以前500个OTU为例,度量单核计算所需的时间
system.time(raup_crick_abundance(otu_niche[,1:200],plot_names_in_col1 = FALSE,threads=5))

3. 多核运行可以节省多少时间?
根据下图我们可以看到,以前200个OTUs为例,十核计算一个样本对所需的时间是1秒,总共花费194秒;而单核计算一个样本对花费的时间是7-8秒,总共花费1193秒。当样本量较大时,多核计算可以节省更多时间。

测试数据和代码可以通过如下链接进行下载,两者内容相同:
面包多:https://mianbaoduo.com/o/bread/YpmTlZhu
CSDN: https://download.csdn.net/download/weixin_43367441/79594637
更多R语言分析微生物生态学的资源可参考如下链接:
https://mbd.pub/o/bread/mbd-YpmTlZpr

注意事项:

Raup-Crick index for absence-presence data. This index (Raup & Crick1979) uses a randomization (Monte Carlo) procedure, comparing the observed number of species ocurring in both associations with the distribution of co-occurrences from 1000 random replicates from the pool of samples (not from the pair of samples, as incorrectly assumed by some authors).

参考文献:
[1] Quantifying Community Assembly Processes and Identifying Features that Impose Them (https://doi.org/10.1038/ismej.2013.93)
[2] https://www.nhm.uio.no/english/research/infrastructure/past/help/similarity.html
[3] Using null models to disentangle variation in community dissimilarity from variation in α-diversity

R语言并行计算RC~bray-curtis~距离相关推荐

  1. R语言distVincentyEllipsoid函数计算大圆距离实战(Great Circle Distance)

    R语言distVincentyEllipsoid函数计算大圆距离实战(Great Circle Distance) 目录 R语言distVincentyEllipsoid函数计算大圆距离实战(Grea ...

  2. R语言distVincentySphere函数计算大圆距离实战(Great Circle Distance)

    R语言distVincentySphere函数计算大圆距离实战(Great Circle Distance) 目录 R语言distVincentySphere函数计算大圆距离实战(Great Circ ...

  3. R语言distMeeus函数计算大圆距离实战(Great Circle Distance)

    R语言distMeeus函数计算大圆距离实战(Great Circle Distance) 目录 R语言distMeeus函数计算大圆距离实战(Great Circle Distance) #导入ge ...

  4. R语言distHaversine函数计算大圆距离实战

    R语言distHaversine函数计算大圆距离实战 目录 R语言distHaversine函数计算大圆距离实战 #导入geosphere包 # 仿真数据

  5. R语言distCosine函数计算大圆距离实战(Law of Cosines Great Circle Distance)

    R语言distCosine函数计算大圆距离实战(Law of Cosines Great Circle Distance) 目录 R语言distCosine函数计算大圆距离实战(Law of Cosi ...

  6. R语言并行计算spearman相关系数

    R语言并行计算spearman相关系数,加快共现网络(co-occurrence network)构建速度    利用spearman相关性分析是构建共现网络的重要方法,但由于OTU table往往有 ...

  7. R语言并行计算beta-NTI值

    目录 一.beta-NTI(nearest taxon index)简介 二.系统发育信号(phylogenetic signal)检测 三.R代码并行实现beta-NTI计算 参考文献 一.beta ...

  8. R语言并行计算 deviation of null beta diversity(beta多样性零偏差)

      群落构建分析是微生物生态学分析的重要组成部分,成为目前文章发表的热点技术.之前我们介绍了计算beta-NTI(beta nearest taxon index)来进行群落构建分析(https:// ...

  9. R语言并行计算实战教程

    foreach包增强了R循环遍历功能,并且提供了并行执行能力.本文通过示例带你轻松掌握这个高级技能. foreach语法介绍 %do% 和 %dopar% 是对遍历对象执行一段业务功能代码的操作. # ...

最新文章

  1. JS数组去重方法记录
  2. Ubuntu下tftp服务器的搭建
  3. js将中文转换成编码 java解析_JS实现的汉字与Unicode码相互转化功能分析
  4. 使用CLion在Gtkmm中加载glade文件时的相对路径问题
  5. SK海力士CEO前往日本 解决关键半导体原材料供应问题
  6. mysqlbinlog unknown variable:default-character-set=gbk
  7. 测量两台机器的的网络延迟和时间差
  8. 如何编写兼容各主流邮箱的HTML邮件
  9. 慕课python第六周测验答案_大学慕课2020Python编程基础章节测验答案
  10. ArcPad 8 简介
  11. kali linux工具词典,Kali字典神器—Crunch
  12. 电力载波通信模块JST-HPLC-S-C在物联网通信领域的应用
  13. 旺旺怎么去服务器接收文件夹,xp系统下找到阿里旺旺安装路径文件夹的方法
  14. 从Palm到Pocket PC(转)
  15. 猪呀,羊呀,送到哪里去?
  16. 算法入门章——引出贯穿《算法导论》全书的算法分析和设计框架
  17. 论文研读笔记(三)——基于障碍函数的移动机器人编队控制安全强化学习
  18. iOS调用QQ客户端,发起临时会话
  19. 【敏捷开发每日一贴】DoD“完成”的定义
  20. python菜鸟教程学习:数据结构

热门文章

  1. Leetcode 048旋转图像(将矩阵逆时针旋转90度)(先对角线翻转,再左右翻转)
  2. 如何获取dll或exe的模块名
  3. 一个在线免费短网址生成API
  4. AtCoder Beginner Contest 171 C.One Quadrillion and One Dalmatians
  5. 5.Abp vNext 地磅无人值守 微信小程序
  6. 使用EasyPoi导出Word文件,使用@Excel注释导出实体对象图片的解决方案
  7. Pandas写入Excel文件如何避免覆盖已有Sheet
  8. Windows安全日志
  9. java金额三位一撇方法_Java数据格式化问题
  10. c语言霍夫变换圆检测,Hough Transform(霍夫变换)检测Circle(圆)的几种方法