手把手10分文章WGCNA复现:小胶质细胞亚群在脑发育时髓鞘形成的作用

Original 生信技能树 生信技能树 2020-01-09 14:2

Hi大家好,我是老米!生信入门一个月,感谢生信技能树平台。这是我的第3篇Markdown。不知道多少人还记得我的前两个投稿:

  • 原来一个星期真的可以零基础入门TCGA数据挖掘,甚至markdown写作公众号投稿

  • 一个基因引发的血案

现在继续完成生信技能树的作业题:重复一篇WGCNA分析的文章。WGCNA学习教程:一文学会WGCNA分析。

复现的文献:A novel microglial subset plays a key role in myelinogenesis in developing brain。EMBO J,好杂志。

该文章对五个处理组,共17个老鼠:

  • orange represents neonatal CD11c+ microglia (n = 4),

  • green neonatal CD11c microglia (n = 4),

  • blue EAE CD11c+ microglia (n = 3),

  • purple EAE CD11c microglia (n = 3),

  • black adult microglia (n = 3).

其实就是 neonatal(新生的) 和 EAE的Microglia,还有CD11C阳性和阴性,然后和成年小鼠的Microglia进行比较。这些样本进行了RNASeq测序,数据在GEO可供下载:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE78809。当然我偷了个懒,该文章Supplemental material提供了整理之后的csv矩阵,大概1万3千个基因。

作者进行了WGCNA分析,以此证明小胶质细胞的作用及寻找关键信号通路及基因(嗯,估计这个实验室后续还有大文章出来)。

组学及生信分析的篇幅占了整个文章的一半以上。在此,仅重复文章的Figure3。

其实我跌跌撞撞学了WGCNA半个月,大概了解了这个原理,现在还一知半解,对很多细节摸索了很久,头很大。下面进入正题,请大家一起跟着我头大。

1.数据处理

rm(list = ls())
dev.off()
library(WGCNA)# 偷懒,读取作者给出的附件的表达矩阵
norm.counts <- read.csv("Dataset_EV2.csv",header = T,sep = ",") ##original counts
rownames(norm.counts)<-norm.counts[,1]
norm.counts <- norm.counts[,2:18]
save(norm.counts,file = "norm.counts.Rdata")## 处理表型信息,就是老鼠的分组信息
a <- colnames(norm.counts)
library(stringr)
str_tmp=as.data.frame(str_split(a,"[_]",simplify = T))
rownames(str_tmp) <- colnames(norm.counts)
colnames(str_tmp) <- c("group","sampleRep")
datTraits <- str_tmp
datTraits$groupNo <- c(rep(1,3),rep(2,4),rep(3,4),rep(4,3),rep(5,3))
###数据初步处理完成## 筛选中位绝对偏差前75%的基因,取MAD大于0.3
## 筛选后会降低运算量,也会失去部分信息
## 也可不做筛选,使MAD大于0即可
norm.counts <- log2(norm.counts)  ##注意我在这里进行了log2,因为我发现如果不log的话,出现了个很有意思的现象,请您往下看
m.mad <- apply(norm.counts,1,mad)
dataExprMad <- norm.counts[which(m.mad > max(quantile(m.mad, probs=seq(0, 1, 0.25))[2],0.3)),] #25%  = 0.22
datExpr0 <- as.data.frame(t(dataExprMad))
##最终取到8223个基因。###########start handling the data
##看看有没有为空的值,需要剔除。这里下载的都是作者处理好的,基本没问题。
gsg <- goodSamplesGenes(datExpr0,verbose = 3)
gsg$allOK
if (!gsg$allOK){# Optionally, print the gene and sample names that were removed:if (sum(!gsg$goodGenes)>0)printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes],collapse = ", ")));if (sum(!gsg$goodSamples)>0)printFlush(paste("Removing samples:",paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));# Remove the offending genes and samples from the data:datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
gsg <- goodSamplesGenes(datExpr0,verbose = 3)
gsg$allOK###画个树看看样本有没有离群值!!!
if(T){# Plot a line to show the cutsampleTree <- hclust(dist(datExpr0),method = "average")par(cex=.6)par(mar=c(0,4,2,0))plot(sampleTree, main = "Sample clustering to detect outliers", sub = "",xlab = "",cex.lab = 1.5, cex.axis =1.5, cex.main=2)
}##然后保存数据
dim(datExpr0) ##17,8222
datExpr <- datExpr0
nGenes <- ncol(datExpr)
nSamples <- nrow(datExpr)
save(nGenes,nSamples,datExpr,datTraits,file="input.Rdata")

结果如图:

这结果挺好的,符合实验设计,都聚类好了。但是!但是如果不对原始数值进行log2处理,会出现一个outlier老鼠,如图:

看到没,那个新生的NEO1号鼠有点调皮!他超脱于所有老鼠之上,统领其他所有老鼠!这要么是问题少年,要么是智商超群,比如科大少年班的那种!而我大概深究了一下,控制这个离群值的大概是500个基因,哈哈哈。。。

2. MDS分析

专业名词叫多维标度法(Multidimensional Scaling)MDS,是一种维数缩减方法,把高维的数据点映射到一个低维的流形上;同时也是一种可视化方法,实践中通常利用2D或3D的MDS 结果观察(投影后)点的分布和聚集来研究数据的性质。实际上你也可以把它看做是PCA分析,反正都是看样本间的相关性计算距离远近。

说人话叫做:通过基因表达log2组间差异,看看这17个老鼠是否能够分到一个类别,比如成年鼠归到一类,EAE归到一类,新生鼠归到一类。和上面的有点类似,又不完全一样。

参考学习了stat的视频,也参考了他提供的代码。如下:

rm(list = ls())
load(file = 'input.Rdata')
log2.data.matrix <- as.data.frame(t(datExpr))
## now create an empty distance matrix
log2.distance.matrix <- matrix(0,nrow=ncol(log2.data.matrix),ncol=ncol(log2.data.matrix),dimnames=list(colnames(log2.data.matrix),colnames(log2.data.matrix)))
## now compute the distance matrix using avg(absolute value(log2(FC)))
for(i in 1:ncol(log2.distance.matrix)) {for(j in 1:i) {log2.distance.matrix[i, j] <-mean(abs(log2.data.matrix[,i] - log2.data.matrix[,j]))}
}
## do the MDS math (this is basically eigen value decomposition)
## cmdscale() is the function for "Classical Multi-Dimensional Scalign"
mds.stuff <- cmdscale(as.dist(log2.distance.matrix),eig=TRUE,x.ret=TRUE)
save(mds.stuff,file = "step7_MDS.Rdata")
## calculate the percentage of variation that each MDS axis accounts for...
mds.var.per <- round(mds.stuff$eig/sum(mds.stuff$eig)*100, 1)
## now make a fancy looking plot that shows the MDS axes and the variation:
mds.values <- mds.stuff$points
mds.data <- data.frame(Sample=rownames(mds.values),X=mds.values[,1],Y=mds.values[,2])
## 然后画图
library(ggpubr)
library(ggplot2)
groups <- datTraits$group
png("figures/step7_MDS_logfc.png",width = 800,height=600)
ggplot(mds.data, aes(x=X, y=Y, label=Sample,col=groups)) + geom_point(size = 10, alpha = 0.8) +theme(panel.grid.minor = element_blank()) +coord_fixed() + theme_bw()+ggtitle("MDS plot using avg(logFC) as the distance")+xlab(paste("Leading logFC dim1 ", mds.var.per[1], "%", sep="")) +ylab(paste("Leading logFC dim2 ", mds.var.per[2], "%", sep="")) +ggtitle("MDS plot using avg(logFC) as the distance")
dev.off()

结果如下:

嗯,,,结果表示完美!别问我为什么要先做这个图,因为它好看,我忍不住!

点评:老米在这里开始秀技术了,实际上大家随便跑一下PCA分析即可,暂时无需深入写这么长代码!

       下面正式进入WGCNA分析。

3. 计算beta值

这个没啥好讲的,代码如下:

rm(list = ls())
library(WGCNA)
load(file = "input.Rdata")
dim(datExpr)
#################################################
if(T){powers = c(c(1:10), seq(from = 12, to=30, by=2))# Call the network topology analysis functionsft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)png("figures/step2-beta-value.png",width = 800,height = 600)# Plot the results:##sizeGrWindow(9, 5)par(mfrow = c(1,2));cex1 = 0.9;# Scale-free topology fit index as a function of the soft-thresholding powerplot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",main = paste("Scale independence"));text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],labels=powers,cex=cex1,col="red");# this line corresponds to using an R^2 cut-off of habline(h=0.9,col="red")# Mean connectivity as a function of the soft-thresholding powerplot(sft$fitIndices[,1], sft$fitIndices[,5],xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",main = paste("Mean connectivity"))text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")dev.off()
}
sft$powerEstimate  ## beta=22 SCI文章里面用了20
save(sft,file = "step2_beta_value.Rdata")

4. 采用一步法构建加权共表达网络

(weight co-expression network)

rm(list = ls())
library(WGCNA)
load(file = "input.Rdata")
load(file = "step2_beta_value.Rdata")
enableWGCNAThreads()
if(T){net = blockwiseModules(datExpr,power = sft$powerEstimate,maxBlockSize = nGenes,TOMType = "unsigned", minModuleSize = 30,reassignThreshold = 0, mergeCutHeight = 0.28, ## 注意这个0.28numericLabels = TRUE, pamRespectsDendro = FALSE,saveTOMs = F,verbose = 3)table(net$colors)
}##模块可视化,画那个传说中的树
if(T){# Convert labels to colors for plottingmoduleColors=labels2colors(net$colors)table(moduleColors)# Plot the dendrogram and the module colors underneathpng("figures/step3-genes-modules.png",width = 800,height = 600)plotDendroAndColors(net$dendrograms[[1]], moduleColors[net$blockGenes[[1]]],"Module colors",dendroLabels = FALSE, hang = 0.03,addGuide = TRUE, guideHang = 0.05)dev.off()
}
save(net,moduleColors,file = "step3_genes_modules.Rdata")

结果如下:

这里需要停顿一下:SCI文章只取到8个模块,图注说用了12691个基因(被我发现全部加起来也才9999,没错!9999个基因)。事实上,能够分这么少模块,我不知道作者参数细节。有两个参数特别影响模块大小:

  • 一个是minModuleSize,表示每个模块里面最少放多少个基因,这很好理解,设定越大,模块越少;

  • 另一个是mergeCutHeight,这个参数表示你在哪里砍树。值设定越小,树枝越多,通常是0.25。

google一下,发现这么一个说法:

I am not aware of a principle from which

这里,我的mergeCutHeight = 0.28,最终取到13个模块!

5. 模块与基因与表型

5.1 模块和老鼠表型对应

这是关键,其实就是开始解读这些关联的基因群对老鼠表型的影响,比如敲除CDC11之后的老鼠,哪些信号通路激活或抑制,EAE的老鼠哪些基因激活?等等等。

rm(list = ls())
library(WGCNA)
load(file = "input.Rdata")
load(file = "step2_beta_value.Rdata")
load(file = "step3_genes_modules.Rdata")if(T){nGenes = ncol(datExpr)nSamples = nrow(datExpr)design <- model.matrix(~0+datTraits$group)colnames(design)= levels(datTraits$group) ## get the groupMES0 <- moduleEigengenes(datExpr,moduleColors)$eigengenesMEs = orderMEs(MES0)moduleTraitCor <- cor(MEs,design,use = "p")moduleTraitPvalue <- corPvalueStudent(moduleTraitCor,nSamples)textMatrix = paste(signif(moduleTraitCor,2),"\n(",signif(moduleTraitPvalue,1),")",sep = "")dim(textMatrix)=dim(moduleTraitCor)png("figures/step4-Module-trait-relationship.png",width = 800,height = 1200,res = 120)par(mar=c(6, 8.5, 3, 3))labeledHeatmap(Matrix = moduleTraitCor,xLabels = colnames(design),yLabels = names(MEs),ySymbols = names(MEs),colorLabels = FALSE,colors = blueWhiteRed(50),textMatrix = textMatrix,setStdMargins = FALSE,cex.text = 0.5,zlim = c(-1,1),main = paste("Module-trait relationships"))dev.off()save(design,file = "step4_design.Rdata")
}

通过模块与各种表型的相关系数,可发现自己感兴趣的模块。如图:

可以看到有些模块与成年鼠关联,有些与EAE相关性很高。和原SCI的图有点像。

5.2 各模块与表型相关性bar图

就是上面的图转化成bar图。

if(T){mes_group <- merge(MEs,datTraits,by="row.names") # 小技巧,可以通过rowname进行merge##写了个画图的functionlibrary(gplots)library(ggplot2)library(ggpubr)library(grid)library(gridExtra) draw_ggboxplot <- function(data,gene="P53",group="group"){#print(gene)ggboxplot(data,x=group, y=gene,ylab = sprintf("Expression of %s",gene),xlab = group,fill = group,palette = "jco",#add="jitter",legend = "") +stat_compare_means()}###开始批量画boxplotcolorNames = names(MEs)png("figures/step4-expression-group.png",width = 800,height=2000,res = 120)#par(mfrow=c(ceiling(length(colorNames)/2),2))p <- lapply(colorNames,function(x) {draw_ggboxplot(mes_group,gene= x,group="group")})do.call(grid.arrange,c(p,ncol=2))dev.off()
}

mes_group <- merge(MEs,datTraits,by="row.names") # 小技巧,可以通过rowname进行merge 

部分结果如下:

5.3 模块与基因的相关性

if(T){modNames = substring(names(MEs), 3)geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"))## 算出每个模块跟基因的皮尔森相关系数矩阵## MEs是每个模块在每个样本里面的值## datExpr是每个基因在每个样本的表达量MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples))names(geneModuleMembership) = paste("MM", modNames, sep="")names(MMPvalue) = paste("p.MM", modNames, sep="")geneTraitSignificance <- as.data.frame(cor(datExpr,datTraits$groupNo,use = "p"))GSPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance),nSamples))names(geneTraitSignificance)<- paste("GS.",names(datTraits$group),sep = "")names(GSPvalue)<-paste("GS.",names(datTraits$group),sep = "")#selectModule<-c("blue","green","purple","grey")  ##可以选择自己喜欢的模块selectModule <- modNames  ## 也可以批量作图png("figures/step4-Module-trait-significance.png",width = 800,height=2000,res = 120)par(mfrow=c(ceiling(length(selectModule)/2),2)) #批量作图开始for(module in selectModule){column <- match(module,selectModule)print(module)moduleGenes <- moduleColors==moduleverboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),abs(geneTraitSignificance[moduleGenes, 1]),xlab = paste("Module Membership in", module, "module"),ylab = paste("Gene significance for", module, "module"),main = paste("Module membership vs. gene significance\n"),cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)}dev.off()
}

截取部分如下:

5.4 关注感兴趣的模块,导出基因进行GO分析

rm(list = ls())
library(WGCNA)
load(file = 'input.Rdata')
load(file = "step2_beta_value.Rdata")
load(file = "step3_genes_modules.Rdata")
load(file = "step4_design.Rdata")
load(file="step4_datTraits.Rdata")table(moduleColors)
group_g <- data.frame(gene=colnames(datExpr),group=moduleColors)
write.csv(group_g,file = "figures/group_g.csv") ## 导出了对应模块所有基因# 选择mouse的基因组进行注释及ID转化啥的,如果是人的,另有R包
if(!require("clusterProfiler")) BiocManager::install("clusterProfiler",ask = F,update = F)
library(clusterProfiler)
if(!require("org.Mm.eg.db")) BiocManager::install("org.Mm.eg.db",ask = F,update = F)
library(org.Mm.eg.db)
tmp <- bitr(group_g$gene,fromType = "ENSEMBL",toType = "ENTREZID",OrgDb = "org.Mm.eg.db")
de_gene_cluster <- merge(tmp,group_g,by.x="ENSEMBL",by.y="gene")
table(de_gene_cluster$group)###run go analysis
formula_res <- compareCluster(ENTREZID~group,data = de_gene_cluster,fun = "enrichGO",OrgDb = "org.Mm.eg.db",style="margin: 0px; padding: 0px; font-size: inherit; line-height: inherit; color: rgb(80, 161, 79); overflow-wrap: inherit !important; word-break: inherit !important;">"BP",pAdjustMethod = "BH",pvalueCutoff = 0.01,qvalueCutoff = 0.05
)lineage1_ego <-simplify(formula_res,cutoff=0.5,by="p.adjust",select_fun=min
)
save(group_g,formula_res,lineage1_ego,file="step5_GOananlysis.Rdata")
#出图
pdf("figures/step5_Microglia_GO_term_DE.pdf",width = 15,height = 15)
dotplot(lineage1_ego,showCategory=10)
dev.off()
write.csv(lineage1_ego@compareClusterResult,file="figures/Microglia_GO_term_DE.csv")

GO结果分析如下:

通过与上面的几个图表配合,发现EAE鼠与炎症通路/免疫相关,而新生鼠的胶质细胞则是cell cycle/胶质细胞分化/神经细胞发育/轴树突形成通路相关。与原SCI分析相符,与预期相符。

5.5 画WGCNA的标配热图

rm(list = ls())
library(WGCNA)
load(file = 'input.Rdata')
load(file = "step2_beta_value.Rdata")
load(file = "step3_genes_modules.Rdata")
load(file = "step4_design.Rdata")
load(file="step4_datTraits.Rdata")
load(file="step5_GOananlysis.Rdata")
# 主要是可视化 TOM矩阵,WGCNA的标准配图
# 然后可视化不同 模块 的相关性 热图
# 不同模块的层次聚类图
# 还有模块诊断,主要是 intramodular connectivity
if(T){#geneTree = net$dendrograms[[1]]TOM=TOMsimilarityFromExpr(datExpr,power=20)dissTOM=1-TOM#plotTOM = dissTOM^7#diag(plotTOM)=NA#TOMplot(plotTOM,geneTree,moduleColors,main="Network heapmap plot of all genes")### 我这里只取了1000个基因哈,我试了一下全部基因,结果跑了半个小时没跑完,被我强行退出!nSelect =1000set.seed(20)select=sample(nGenes,size = nSelect)selectTOM = dissTOM[select,select]selectTree = hclust(as.dist(selectTOM),method = "average")selectColors = moduleColors[select]plotDiss=selectTOM^7diag(plotDiss)=NApng("figures/step6_select_Network-heatmap.png",width = 800,height=600)TOMplot(plotDiss,selectTree,selectColors,main="Network heapmap of selected gene")dev.off()
}

如下:

5.6 提取指定模块的基因并做热图

if(T){module="turquoise"which.module=moduledat=datExpr[,moduleColors==which.module]library(pheatmap)pheatmap(dat,show_colnames = F,show_rownames = F)n=scale(t(dat+1)) # 'scale'可以对log-ratio数值进行归一化n[n>2]=2 n[n< -2]= -2n[1:4,1:4]pheatmap(n,show_colnames =F,show_rownames = F)group_list=datTraits$groupac=data.frame(g=group_list)rownames(ac)=colnames(n)png("figures/step6-moduleGene-heatmap.png",width = 800,height = 600)pheatmap(n,show_colnames =F,show_rownames = F,annotation_col =ac )dev.off()}

都挺好的。

点评:这个热图有问题,具体代码是  n=scale(t(dat+1)) 里面少了一个转置!

5.7 性状与模块的关系

if(T){## 连续性的变量,如体重等才好和模块进行比较。MEs=moduleEigengenes(datExpr,moduleColors)$eigengenesMET = orderMEs(cbind(MEs,datTraits$groupNo))par(cex = 0.9)png("figures/step6-Eigengene-dendrogram.png",width = 800,height = 600)plotEigengeneNetworks(MET, "", marDendro = c(0,4,1,2), marHeatmap = c(3,4,1,2), cex.lab = 0.8, xLabelsAngle= 90)dev.off()
}

如图:

6. 模块导出

感兴趣的模块导出到cytoscape,VisANT等软件进一步进行可视化分析。

#######gene export for VisANT or cytoscape
if(T){module="turquoise"probes = colnames(datExpr)inModule = (moduleColors==module)modProbes=probes[inModule]head(modProbes)modTOM = TOM[inModule,inModule]dimnames(modTOM)=list(modProbes,modProbes)### 这里只选了top100的基因nTop=100IMConn = softConnectivity(datExpr[,modProbes])top=(rank(-IMConn)<=nTop)filterTOM=modTOM[top,top]# for visANTvis = exportNetworkToVisANT(filterTOM,file = paste("figures/visANTinput-",module,".txt",sep = ""),weighted = T,threshold = 0)# for cytoscapecyt = exportNetworkToCytoscape(filterTOM,edgeFile = paste("figures/CytoscapeInput-edges-", paste(module, collapse="-"), ".txt", sep=""),nodeFile = paste("figures/CytoscapeInput-nodes-", paste(module, collapse="-"), ".txt", sep=""),weighted = TRUE,threshold = 0.02,nodeNames = modProbes[top], nodeAttr = moduleColors[inModule][top])
}

总结

至此,复现结束。如果你仔细的看到了这里,说明你是想学WGCNA的了。回到课题本身,是否对小胶质细胞的认识更近了一步?提个小小的科学问题:在PD中,是否有一个胶质细胞亚群可对MPP+诱导的黑质神经细胞死亡有抵抗和保护作用?哈哈,别说我是搞肿瘤的了。。。

这些内容,换个思路应该够一个硕士毕业了。。。

rm(list = ls())
dev.off()
library(WGCNA)
# 偷懒,读取作者给出的附件的表达矩阵
getwd()
file="G:/papers_for_ARDS/r_scripts_for_ARDS_From_zhongda/lncRNA_mRNA_cicleRNA/microglial cells/EMBJ-36-3292-s002/embj201696056-sup-0003-DatasetEV1"
dir.create(file)
setwd(file)
getwd()
??read.csv
norm.counts <- openxlsx::read.xlsx("Dataset_EV1.xlsx",sheet = 2) ##original counts
head(norm.counts)
rownames(norm.counts)<-norm.counts[,1]
norm.counts <- norm.counts[,2:18]
head(norm.counts)
save(norm.counts,file = "norm.counts.Rdata")
load("norm.counts.Rdata")## 处理表型信息,就是老鼠的分组信息
a <- colnames(norm.counts)
library(stringr)
str_split(a,"[,]",simplify = T)
strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)
str_split(a, ",(?=[^,]+$)",simplify = T)#以最后一个逗号为分割符号 切割字符串
str_tmp=as.data.frame(str_split(a, ",(?=[^,]+$)",simplify = T))
rownames(str_tmp) <- colnames(norm.counts)
colnames(str_tmp) <- c("group","sampleRep")
str_tmp
datTraits <- str_tmp
datTraits$groupNo <- c(rep(1,3),rep(2,4),rep(3,4),rep(4,3),rep(5,3))
datTraits
###数据初步处理完成## 筛选中位绝对偏差前75%的基因,取MAD大于0.3
## 筛选后会降低运算量,也会失去部分信息
## 也可不做筛选,使MAD大于0即可
head(norm.counts)
norm.counts <- log2(norm.counts)  ##注意我在这里进行了log2,因为我发现如果不log的话,出现了个很有意思的现象,请您往下看
m.mad <- apply(norm.counts,1,mad)
dataExprMad <- norm.counts[which(m.mad > max(quantile(m.mad, probs=seq(0, 1, 0.25))[2],0.3)),] #25%  = 0.22
datExpr0 <- as.data.frame(t(dataExprMad))
dim(datExpr0)
head(datExpr0)[1:10,1:10]
##最终取到8222个基因。###########start handling the data
##看看有没有为空的值,需要剔除。这里下载的都是作者处理好的,基本没问题。
gsg <- goodSamplesGenes(datExpr0,verbose = 3)
gsg$allOK
if (!gsg$allOK){# Optionally, print the gene and sample names that were removed:if (sum(!gsg$goodGenes)>0)printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes],collapse = ", ")));if (sum(!gsg$goodSamples)>0)printFlush(paste("Removing samples:",paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));# Remove the offending genes and samples from the data:datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
gsg <- goodSamplesGenes(datExpr0,verbose = 3)
gsg$allOK###画个树看看样本有没有离群值!!!
if(T){# Plot a line to show the cutsampleTree <- hclust(dist(datExpr0),method = "average")par(cex=.6)par(mar=c(0,4,2,0))plot(sampleTree, main = "Sample clustering to detect outliers", sub = "",xlab = "",cex.lab = 1.5, cex.axis =1.5, cex.main=2)
}##然后保存数据
dim(datExpr0) ##17,8222
datExpr <- datExpr0
nGenes <- ncol(datExpr)
nSamples <- nrow(datExpr)
save(nGenes,nSamples,datExpr,datTraits,file="input.Rdata")rm(list = ls())
library(WGCNA)
load(file = "input.Rdata")
dim(datExpr)
head(datTraits)
#################################################
if(T){powers = c(c(1:10), seq(from = 12, to=30, by=2))# Call the network topology analysis functionsft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)dev.off()getwd()dir.create("./figures")png("figures/step2-beta-value.png",width = 800,height = 600)# Plot the results:##sizeGrWindow(9, 5)par(mfrow = c(1,2));cex1 = 0.9;# Scale-free topology fit index as a function of the soft-thresholding powerplot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",main = paste("Scale independence"));text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],labels=powers,cex=cex1,col="red");# this line corresponds to using an R^2 cut-off of habline(h=0.9,col="red")# Mean connectivity as a function of the soft-thresholding powerplot(sft$fitIndices[,1], sft$fitIndices[,5],xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",main = paste("Mean connectivity"))text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")dev.off()
}
sft$powerEstimate  ## beta=22 SCI文章里面用了20
save(sft,file = "step2_beta_value.Rdata")rm(list = ls())
library(WGCNA)
load(file = "input.Rdata")
load(file = "step2_beta_value.Rdata")
enableWGCNAThreads()
if(T){net = blockwiseModules(datExpr,power = sft$powerEstimate,maxBlockSize = nGenes,TOMType = "unsigned", minModuleSize = 30,reassignThreshold = 0, mergeCutHeight = 0.28, ## 注意这个0.28numericLabels = TRUE, pamRespectsDendro = FALSE,saveTOMs = F,verbose = 3)table(net$colors)
}##模块可视化,画那个传说中的树
if(T){# Convert labels to colors for plottingmoduleColors=labels2colors(net$colors)table(moduleColors)# Plot the dendrogram and the module colors underneathpng("figures/step3-genes-modules.png",width = 800,height = 600)plotDendroAndColors(net$dendrograms[[1]], moduleColors[net$blockGenes[[1]]],"Module colors",dendroLabels = FALSE, hang = 0.03,addGuide = TRUE, guideHang = 0.05)dev.off()
}
save(net,moduleColors,file = "step3_genes_modules.Rdata")rm(list = ls())
library(WGCNA)
load(file = "input.Rdata")
load(file = "step2_beta_value.Rdata")
load(file = "step3_genes_modules.Rdata")if(T){nGenes = ncol(datExpr)nSamples = nrow(datExpr)datTraitsdesign <- model.matrix(~0+datTraits$group)colnames(design)= levels(as.factor(datTraits$group)) ## get the groupdesignMES0 <- moduleEigengenes(datExpr,moduleColors)$eigengeneshead(MES0)MEs = orderMEs(MES0)head(MEs)head(design)moduleTraitCor <- cor(MEs,design,use = "p")head(moduleTraitCor)moduleTraitPvalue <- corPvalueStudent(moduleTraitCor,nSamples)textMatrix = paste(signif(moduleTraitCor,2),"\n(",signif(moduleTraitPvalue,1),")",sep = "")dim(textMatrix)=dim(moduleTraitCor)textMatrixpng("figures/step4-Module-trait-relationship.png",width = 800,height = 1200,res = 120)par(mar=c(6, 8.5, 3, 3))labeledHeatmap(Matrix = moduleTraitCor,xLabels = colnames(design),yLabels = names(MEs),ySymbols = names(MEs),colorLabels = FALSE,colors = blueWhiteRed(50),textMatrix = textMatrix,setStdMargins = FALSE,cex.text = 0.5,zlim = c(-1,1),main = paste("Module-trait relationships"))dev.off()save(design,file = "step4_design.Rdata")
}if(T){mes_group <- merge(MEs,datTraits,by="row.names") # 小技巧,可以通过rowname进行merge##写了个画图的functionlibrary(gplots)library(ggplot2)library(ggpubr)library(grid)library(gridExtra) draw_ggboxplot <- function(data,gene="P53",group="group"){#print(gene)ggboxplot(data,x=group, y=gene,ylab = sprintf("Expression of %s",gene),xlab = group,fill = group,palette = "jco",#add="jitter",legend = "") +stat_compare_means()}###开始批量画boxplotcolorNames = names(MEs)png("figures/step4-expression-group.png",width = 800,height=2000,res = 120)#par(mfrow=c(ceiling(length(colorNames)/2),2))p <- lapply(colorNames,function(x) {draw_ggboxplot(mes_group,gene= x,group="group")})do.call(grid.arrange,c(p,ncol=2))dev.off()
}if(T){modNames = substring(names(MEs), 3)modNameshead(MEs)head(datExpr)[1:6:1:9]geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"))## 算出每个模块跟基因的皮尔森相关系数矩阵## MEs是每个模块在每个样本里面的值## datExpr是每个基因在每个样本的表达量MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples))names(geneModuleMembership) = paste("MM", modNames, sep="")names(MMPvalue) = paste("p.MM", modNames, sep="")head(MMPvalue)geneTraitSignificance <- as.data.frame(cor(datExpr,datTraits$groupNo,use = "p"))head(geneTraitSignificance)GSPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance),nSamples))head(GSPvalue)names(geneTraitSignificance)<- paste("GS.",names(datTraits$group),sep = "")names(GSPvalue)<-paste("GS.",names(datTraits$group),sep = "")head(GSPvalue)#selectModule<-c("blue","green","purple","grey")  ##可以选择自己喜欢的模块selectModule <- modNames  ## 也可以批量作图png("figures/step4-Module-trait-significance.png",width = 800,height=2000,res = 120)par(mfrow=c(ceiling(length(selectModule)/2),2)) #批量作图开始for(module in selectModule){column <- match(module,selectModule)print(module)moduleGenes <- moduleColors==moduleverboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),abs(geneTraitSignificance[moduleGenes, 1]),xlab = paste("Module Membership in", module, "module"),ylab = paste("Gene significance for", module, "module"),main = paste("Module membership vs. gene significance\n"),cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)}dev.off()
}

wgcna 原文复现 小胶质细胞亚群在脑发育时髓鞘形成的作用 microglial相关推荐

  1. WGCNA复现:小胶质细胞亚群在脑发育时髓鞘形成的作用

    Hi大家好,我是老米!生信入门一个月,感谢生信技能树平台. Homework Response to: 重复一篇WGCNA分析的文章.WGCNA学习教程:一文学会WGCNA分析. 复现的文献:A no ...

  2. 随着上网次数越来越多,IE地址栏中留下大量的历史网址,感觉很不爽,于是决心写一个清除IE地址栏的应用程序,随说有“上网助手”,但它要在能上网时才起作用,我想在不能上网时来对系统进行清理,于是写了一个叫

    随着上网次数越来越多,IE地址栏中留下大量的历史网址,感觉很不爽,于是决心写一个清除IE地址栏的应用程序,随说有"上网助手",但它要在能上网时才起作用,我想在不能上网时来对系统进行 ...

  3. bootstrap中col-xs-*在屏幕缩小时没起作用

    问题描述: 在bootstrap中的col-xs-*在屏幕缩小时没起作用. eg. <div class="container-fluid"><div class ...

  4. 分享数控车削时夹具的作用

    分享数控车削时夹具的作用 一.数控车床夹具的定义和分类 在数控车床上用于装夹工件的装置称为车床夹具.车床夹具可分为通用夹具和专用夹具两大类.通用夹具是指能够装夹两种或两种以上工件的夹具,例如车床上的三 ...

  5. 护壁桩嵌入深度_泥浆护壁成孔灌注桩施工时泥浆的作用有哪些?

    混凝土灌注桩是直接在施工现场的桩位上成孔,然后在孔内安装钢筋骨架,浇筑混凝土成桩. 灌注桩按成孔方法可分为钻孔灌注桩.套管成孔灌注桩.挖孔灌注桩和爆扩成孔灌注桩等.灌注桩与预制桩相比,具有节约钢筋.节 ...

  6. angularjs中使用swiper时不起作用,最后出现空白位

    controller.js中定义swipers指令: var moduleCtrl = angular.module('newscontroller',['infinite-scroll','ngTo ...

  7. java a[i].setx(-1);_java – setX和setY在尝试定位图像时不起作用

    我在使用setX上的setX和setY为我的JavaFX程序中定位图像时遇到问题.我不确定是什么问题?感谢任何给予的帮助! 这是我的代码: Image rocket2 = new Image(&quo ...

  8. c语言中调试时go的作用,C语言调用GO

    C语言调用GO 最近工作中遇到需要在c语言里面调用go语言的需求,总结了一下,下面代码里面的每一个注释都很有用,闲话不多说,直接上代码~ 示例 GO代码: package main // 这个文件一定 ...

  9. MVC里面写html获取不到input,asp.net-mvc – ASP.Net [HiddenInput]数据属性在Razor中用Html.EditorForModel渲染时不起作用?...

    我有以下型号: public class Product { [HiddenInput(DisplayValue = false)] public int ProductID { get; set; ...

最新文章

  1. 类 求数组最大最小平均
  2. 【Groovy】Groovy 运算符重载 ( 运算符重载 | 运算符重载对应方法 )
  3. 【Netty】NIO 缓冲区 ( Buffer ) 分散 Scattering 与 聚合 Gathering 操作
  4. 求助:如何在Vista系统环境下增加系统盘C盘的容量?
  5. 计算机组成原理认识fpga,计算机组成原理课程设计-基于EDA和FPGA技术的8位模型计算机的设计与实现_精品.doc...
  6. Java中的策略设计模式-示例教程
  7. 学以致用十四-----打造一个简单的vim IDE
  8. std在汇编语言是什么指令_汇编语言程序指令整理
  9. kubernetes 客户端client-go 使用及常用api
  10. webuploader
  11. 第17章 高级数据表示 17.7 二叉搜索树(第一部分ADT 和 接口)
  12. 浅谈Java程序员的黄金五年,如何实现快速进阶
  13. 目前常用计算机配置,电脑常见主要配置、参数
  14. [已解决]The server cannot or will not process the request due to something that is perceived to be
  15. 【火炉炼AI】机器学习008-简单线性分类器解决二分类问题
  16. 微信小程序原生的下拉框组件
  17. 如何修改mind map pro 的快捷键 how to edit shortcut of mind map pro
  18. 店铺logo设计免费在线生成
  19. 神经网络与深度学习-chapter2 反向传播算法
  20. Linux之PyTorch安装

热门文章

  1. 申请美国大学计算机专业,低GPA如何申请美国大学计算机专业
  2. 值得收藏|基于全球切片解析标准TMS的瓦片规则
  3. 职称计算机考试演示,2015职称计算机考试模拟题:演示文稿的放映、打包和打印...
  4. 排队论,对策论,层次分析法
  5. 促进企业流量高质量转化,华为云CDN加速方案值得选择
  6. 高级密码学复习2-HUST版
  7. 服务器稳定度cpu温度,现在这天气我的CPU温度稳定在60度...打游戏70度,会不会烧?...
  8. VLAN基础VLAN间路由联动OSPF实验
  9. 三菱plc恒压供水程序+威纶触摸屏程序 以控制水泵一用一备、一拖二、一拖三、一拖四、一拖四带小泵恒压功能
  10. 游戏APP推荐,快来开启游戏时间