WGCNA 分析流程 2008 年发表在 BMC 之后的影响力还是很高的,先后在各大期刊都能看到,但是就其分析的过程来看,还是需要有一定 R 语言的基础才能完整的复现出来文章中的结果,这期就搞出来供大家参考,因为我自己尝试了一下结直肠癌的数据集,但是有些地方不太理想,故还是有软件包自带的数据来运行,终结起来就是输入两个文件,其他就是根据自己的数据集进行过滤,筛选,最终确定一个满意的基因集。

前言

相关网络在生物信息学中的应用日益广泛。例如,加权基因共表达网络分析是一种描述微阵列样本间基因相关性模式的系统生物学方法。加权相关网络分析(Weighted correlation network analysis, WGCNA)可用于寻找高度相关基因的聚类(模块),利用模块特征基因或模块内的中心基因来总结这些聚类,利用特征基因网络方法(eigengene network methodology,WGCNA)将模块之间和模块与外部样本性状进行关联,以及用于计算模块成员度量。相关网络促进了基于网络的基因筛选方法,可用于识别候选生物标记物或治疗靶点。这些方法已成功应用于各种生物学背景,如癌症、小鼠遗传学、酵母遗传学和脑成像数据分析。虽然相关网络方法的部分内容已在单独的出版物中进行了描述,但仍需要提供一种用户友好的、全面的、一致的软件实现和辅助工具。

WGCNA软件包是一个R函数的综合集合,用于进行加权相关网络分析的各个方面。该软件包包括网络构建、模块检测、基因选择、拓扑性质计算、数据仿真、可视化以及与外部软件的接口等功能。除了R包,我们还提供了R软件教程。虽然这些方法的开发是由基因表达数据驱动的,但底层的数据挖掘方法可以应用于各种不同的设置。

WGCNA软件包为加权相关网络分析提供了R函数,如基因表达数据的共表达网络分析。计算方法较为复杂,这里略去,我们根据文献中给出来的WGCNA方法概述一步一步进行实际操作。构建基因共表达网络:使用加权的表达相关性。识别基因集:基于加权相关性,进行层级聚类分析,并根据设定标准切分聚类结果,获得不同的基因模块,用聚类树的分枝和不同颜色表示。如果有表型信息,计算基因模块与表型的相关性,鉴定性状相关的模块。研究模型之间的关系,从系统层面查看不同模型的互作网络。从关键模型中选择感兴趣的驱动基因,或根据模型中已知基因的功能推测未知基因的功能。导出TOM矩阵,绘制相关性图。

WGCNA 分析流程如下:

1.构建基因共表达网络:使用加权的表达相关性;

2.识别基因集:基于加权相关性,进行层级聚类分析,并根据设定标准切分聚类结果,获得不同的基因模块,用聚类树的分枝和不同颜色表示;

3.如果有表型信息,计算基因模块与表型的相关性,鉴定性状相关的模块;

4.研究模型之间的关系,从系统层面查看不同模型的互作网络;

5.从关键模型中选择感兴趣的驱动基因,或根据模型中已知基因的功能推测未知基因的功能;

6.导出TOM矩阵,绘制相关性图。


实例解析

WGCNA 整个流程的输入文件为 RNA-seq 测序结果计算获得的 Count 矩阵,通常我们需要转置输入数据,这样 rows = treatments and columns = gene probes,WGCNA 输出的是聚类基因列表和加权基因相关网络文件。WGCNA 数据过滤分析流程如下:

1. Data input and cleaning

软件包安装及加载,如下:

if(!require(WGCNA)){BiocManager::install("WGCNA")
}
library(WGCNA)

第一步,读取表达数据,并进行预处理成适合于网络分析的格式,并通过删除明显的离群样本、基因和缺失条目过多的样本来过滤数据。表达式数据包含在R包中,文件名为 LiverFemale3600.csv。

# The following setting is important, do not omit.
options(stringsAsFactors = FALSE)
# Read in the female liver data set
femData = read.csv("./FemaleLiver-Data/LiverFemale3600.csv")
# Take a quick look at what is in the data set:
dim(femData)
## [1] 3600  143
head(names(femData))
## [1] "substanceBXH"   "gene_symbol"    "LocusLinkID"    "ProteomeID"
## [5] "cytogeneticLoc" "CHROMOSOME"
datExpr0 = as.data.frame(t(femData[, -c(1:8)]))
names(datExpr0) = femData$substanceBXH
rownames(datExpr0) = names(femData)[-c(1:8)]
gsg = goodSamplesGenes(datExpr0, verbose = 3)
##  Flagging genes and samples with too many missing values...
##   ..step 1
gsg$allOK
## [1] TRUE# If the last statement returns TRUE, all genes have passed the cuts. If not,
# we #remove the offending genes and samples from the data:
if (!gsg$allOK) {# Optionally, print the gene and sample names that were removed:if (sum(!gsg$goodGenes) > 0)printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes],collapse = ", ")))if (sum(!gsg$goodSamples) > 0)printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples],collapse = ", ")))# Remove the offending genes and samples from the data:datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
# Next we cluster the samples (in contrast to clustering genes that will come
# later) to see if there are any obvious outliers.
sampleTree = hclust(dist(datExpr0), method = "average")
# Plot the sample tree: Open a graphic output window of size 12 by 9 inches The
# user should change the dimensions if the window is too large or too small.
# sizeGrWindow(12,9) pdf(file = 'Plots/sampleClustering.pdf', width = 12,
# height = 9);
par(mar = c(0, 4, 2, 0), cex = 0.6)
plot(sampleTree, main = "Sample clustering to detect outliers", sub = "", xlab = "",cex.lab = 1.5, cex.axis = 1.5, cex.main = 2)
# Plot a line to show the cut
abline(h = 15, col = "red")

Figure 1: Clustering dendrogram of samples based on their Euclidean distance.

添加表型信息,如下:

# Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 15, minSize = 10)
table(clust)
## clust
##   0   1
##   1 134
# clust 1 contains the samples we want to keep.
keepSamples = (clust == 1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
traitData = read.csv("./FemaleLiver-Data/ClinicalTraits.csv")
dim(traitData)
## [1] 361  38
names(traitData)
##  [1] "X"                  "Mice"               "Number"
##  [4] "Mouse_ID"           "Strain"             "sex"
##  [7] "DOB"                "parents"            "Western_Diet"
## [10] "Sac_Date"           "weight_g"           "length_cm"
## [13] "ab_fat"             "other_fat"          "total_fat"
## [16] "comments"           "X100xfat_weight"    "Trigly"
## [19] "Total_Chol"         "HDL_Chol"           "UC"
## [22] "FFA"                "Glucose"            "LDL_plus_VLDL"
## [25] "MCP_1_phys"         "Insulin_ug_l"       "Glucose_Insulin"
## [28] "Leptin_pg_ml"       "Adiponectin"        "Aortic.lesions"
## [31] "Note"               "Aneurysm"           "Aortic_cal_M"
## [34] "Aortic_cal_L"       "CoronaryArtery_Cal" "Myocardial_cal"
## [37] "BMD_all_limbs"      "BMD_femurs_only"
# remove columns that hold information we do not need.
allTraits = traitData[, -c(31, 16)]
allTraits = allTraits[, c(2, 11:36)]
dim(allTraits)
## [1] 361  27
names(allTraits)
##  [1] "Mice"               "weight_g"           "length_cm"
##  [4] "ab_fat"             "other_fat"          "total_fat"
##  [7] "X100xfat_weight"    "Trigly"             "Total_Chol"
## [10] "HDL_Chol"           "UC"                 "FFA"
## [13] "Glucose"            "LDL_plus_VLDL"      "MCP_1_phys"
## [16] "Insulin_ug_l"       "Glucose_Insulin"    "Leptin_pg_ml"
## [19] "Adiponectin"        "Aortic.lesions"     "Aneurysm"
## [22] "Aortic_cal_M"       "Aortic_cal_L"       "CoronaryArtery_Cal"
## [25] "Myocardial_cal"     "BMD_all_limbs"      "BMD_femurs_only"
# Form a data frame analogous to expression data that will hold the clinical
# traits.
femaleSamples = rownames(datExpr)
traitRows = match(femaleSamples, allTraits$Mice)
datTraits = allTraits[traitRows, -1]
rownames(datTraits) = allTraits[traitRows, 1]
collectGarbage()
# Re-cluster samples
sampleTree2 = hclust(dist(datExpr), method = "average")
# Convert traits to a color representation: white means low, red means high,
# grey means missing entry
traitColors = numbers2colors(datTraits, signed = FALSE)
# Plot the sample dendrogram and the colors underneath.
plotDendroAndColors(sampleTree2, traitColors, groupLabels = names(datTraits), main = "Sample dendrogram and trait heatmap")

Figure 2: Clustering dendrogram of samples based on their Euclidean distance.

save(datExpr, datTraits, file = "FemaleLiver-01-dataInput.RData")

2. Network construction and module detection

这一步是使用WGCNA方法进行的所有网络分析的基石。有三种不同的方式构建网络并识别模块,包括:a.采用第一步网络建设和模块检测功能,适合希望到达的用户以最小的努力获得结果; b.为想要尝试定制/替代方法的用户提供分步网络建设和模块检测; c.为希望分析数据的用户提供一种全数据块网络构建和模块检测自动检测方法集太大,无法集中分析。

首先选择软阈值能力,网络拓扑分析,如下:

# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to = 20, by = 2))
# Call the network topology analysis function
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
## pickSoftThreshold: will use block size 3600.
##  pickSoftThreshold: calculating connectivity for given powers...
##    ..working on genes 1 through 3600 of 3600
## Warning: executing %dopar% sequentially: no parallel backend registered
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
## 1      1   0.0278  0.345          0.456  747.00  762.0000 1210.0
## 2      2   0.1260 -0.597          0.843  254.00  251.0000  574.0
## 3      3   0.3400 -1.030          0.972  111.00  102.0000  324.0
## 4      4   0.5060 -1.420          0.973   56.50   47.2000  202.0
## 5      5   0.6810 -1.720          0.940   32.20   25.1000  134.0
## 6      6   0.9020 -1.500          0.962   19.90   14.5000   94.8
## 7      7   0.9210 -1.670          0.917   13.20    8.6800   84.1
## 8      8   0.9040 -1.720          0.876    9.25    5.3900   76.3
## 9      9   0.8590 -1.700          0.836    6.80    3.5600   70.5
## 10    10   0.8330 -1.660          0.831    5.19    2.3800   65.8
## 11    12   0.8530 -1.480          0.911    3.33    1.1500   58.1
## 12    14   0.8760 -1.380          0.949    2.35    0.5740   51.9
## 13    16   0.9070 -1.300          0.970    1.77    0.3090   46.8
## 14    18   0.9120 -1.240          0.973    1.39    0.1670   42.5
## 15    20   0.9310 -1.210          0.977    1.14    0.0951   38.7
# Plot the results: sizeGrWindow(9, 5)
par(mfrow = c(1, 2))
cex1 = 0.9
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[, 1], -sign(sft$fitIndices[, 3]) * sft$fitIndices[, 2], xlab = "Soft Threshold (power)",ylab = "Scale Free Topology Model Fit,signed R^2", type = "n", main = paste("Scale independence"))
text(sft$fitIndices[, 1], -sign(sft$fitIndices[, 3]) * sft$fitIndices[, 2], labels = powers,cex = cex1, col = "red")
# this line corresponds to using an R^2 cut-off of h
abline(h = 0.9, col = "red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[, 1], sft$fitIndices[, 5], xlab = "Soft Threshold (power)", ylab = "Mean Connectivity",type = "n", main = paste("Mean connectivity"))
text(sft$fitIndices[, 1], sft$fitIndices[, 5], labels = powers, cex = cex1, col = "red")

Figure 3: Analysis of network topology for various soft-thresholding powers.

softPower = 6
adjacency = adjacency(datExpr, power = softPower)
# Turn adjacency into topological overlap
TOM = TOMsimilarity(adjacency)
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
dissTOM = 1 - TOM
# Call the hierarchical clustering function
geneTree = hclust(as.dist(dissTOM), method = "average")
# Plot the resulting clustering tree (dendrogram) sizeGrWindow(12,9)
plot(geneTree, xlab = "", sub = "", main = "Gene clustering on TOM-based dissimilarity",labels = FALSE, hang = 0.04)

Figure 4: Clustering dendrogram of genes, with dissimilarity based on topological overlap.

# We like large modules, so we set the minimum module size relatively high:
minModuleSize = 30
# Module identification using dynamic tree cut:
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM, deepSplit = 2, pamRespectsDendro = FALSE,minClusterSize = minModuleSize)
##  ..cutHeight not given, setting it to 0.996  ===>  99% of the (truncated) height range in dendro.
##  ..done.
table(dynamicMods)
## dynamicMods
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
##  88 614 316 311 257 235 225 212 158 153 121 106 102 100  94  91  78  76  65  58
##  20  21  22
##  58  48  34
# Convert numeric lables into colors
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
## dynamicColors
##        black         blue        brown         cyan    darkgreen      darkred
##          212          316          311           94           34           48
##        green  greenyellow         grey       grey60    lightcyan   lightgreen
##          235          106           88           76           78           65
##  lightyellow      magenta midnightblue         pink       purple          red
##           58          153           91          158          121          225
##    royalblue       salmon          tan    turquoise       yellow
##           58          100          102          614          257
# Plot the dendrogram and colors underneath sizeGrWindow(8,6)
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut", dendroLabels = FALSE,hang = 0.03, addGuide = TRUE, guideHang = 0.05, main = "Gene dendrogram and module colors")

Figure 5: Clustering dendrogram of genes, with dissimilarity based on topological overlap, together with assigned module colors.

# Calculate eigengenes
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
# Calculate dissimilarity of module eigengenes
MEDiss = 1 - cor(MEs)
# Cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average")
# Plot the result sizeGrWindow(7, 6)
plot(METree, main = "Clustering of module eigengenes", xlab = "", sub = "")
MEDissThres = 0.25
# Plot the cut line into the dendrogram
abline(h = MEDissThres, col = "red")

Figure 6: Clustering dendrogram of genes, with dissimilarity based on topological overlap, together with assigned merged module colors and the original module colors.

# Call an automatic merging function
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)##  mergeCloseModules: Merging modules whose distance is less than 0.25
##    multiSetMEs: Calculating module MEs.
##      Working on set 1 ...
##      moduleEigengenes: Calculating 23 module eigengenes in given set.
##    multiSetMEs: Calculating module MEs.
##      Working on set 1 ...
##      moduleEigengenes: Calculating 19 module eigengenes in given set.
##    Calculating new MEs...
##    multiSetMEs: Calculating module MEs.
##      Working on set 1 ...
##      moduleEigengenes: Calculating 19 module eigengenes in given set.# The merged module colors
mergedColors = merge$colors
# Eigengenes of the new merged modules:
mergedMEs = merge$newMEs
# sizeGrWindow(12, 9) pdf(file = 'Plots/geneDendro-3.pdf', wi = 9, he = 6)
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors), c("Dynamic Tree Cut","Merged dynamic"), dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)

Figure 7: Clustering dendrogram of genes, with dissimilarity based on topological overlap

3. Relating modules to external clinical traits and identifying important genes

将模块与外部临床特征相关联,如下:

# Rename to moduleColors
moduleColors = mergedColors
# Construct numerical labels corresponding to the colors
colorOrder = c("grey", standardColors(50))
moduleLabels = match(moduleColors, colorOrder) - 1
MEs = mergedMEs
# Save module colors and labels for use in subsequent parts
save(MEs, moduleLabels, moduleColors, geneTree, file = "FemaleLiver-02-networkConstruction-stepByStep.RData")
# Define numbers of genes and samples
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
# Recalculate MEs with color labels
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p")
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)
# sizeGrWindow(10,6) Will display correlations and their p-values
textMatrix = paste(signif(moduleTraitCor, 2), "\n(", signif(moduleTraitPvalue, 1),")", sep = "")
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3))
# Display the correlation values within a heatmap plot
labeledHeatmap(Matrix = moduleTraitCor, xLabels = names(datTraits), yLabels = names(MEs),ySymbols = names(MEs), colorLabels = FALSE, colors = greenWhiteRed(50), textMatrix = textMatrix,setStdMargins = FALSE, cex.text = 0.3, zlim = c(-1, 1), main = paste("Module-trait relationships"))
## Warning in greenWhiteRed(50): WGCNA::greenWhiteRed: this palette is not suitable for people
## with green-red color blindness (the most common kind of color blindness).
## Consider using the function blueWhiteRed instead.

Figure 8: Module-trait associations.

模内分析:鉴定高 GS 和 MM 基因

# Define variable weight containing the weight column of datTrait
weight = as.data.frame(datTraits$weight_g)
names(weight) = "weight"
# names (colors) of the modules
modNames = substring(names(MEs), 3)
geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"))
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples))
names(geneModuleMembership) = paste("MM", modNames, sep = "")
names(MMPvalue) = paste("p.MM", modNames, sep = "")
geneTraitSignificance = as.data.frame(cor(datExpr, weight, use = "p"))
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples))
names(geneTraitSignificance) = paste("GS.", names(weight), sep = "")
names(GSPvalue) = paste("p.GS.", names(weight), sep = "")
module = "brown"
column = match(module, modNames)
moduleGenes = moduleColors == module
# sizeGrWindow(7, 7);
par(mfrow = c(1, 1))
verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]), abs(geneTraitSignificance[moduleGenes,1]), xlab = paste("Module Membership in", module, "module"), ylab = "Gene significance for body weight",main = paste("Module membership vs. gene significance\n"), cex.main = 1.2, cex.lab = 1.2,cex.axis = 1.2, col = module)

Figure 9: A scatterplot of Gene Significance (GS) for weight vs. Module Membership (MM) in the brown module

head(names(datExpr)[moduleColors == "brown"])
## [1] "MMT00000149" "MMT00000283" "MMT00000550" "MMT00000701" "MMT00001190"
## [6] "MMT00001555"
annot = read.csv(file = "./FemaleLiver-Data/GeneAnnotation.csv")
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
# The following is the number or probes without annotation:
sum(is.na(probes2annot))
## [1] 0
# Should return 0.  Create the starting data frame
geneInfo0 = data.frame(substanceBXH = probes, geneSymbol = annot$gene_symbol[probes2annot],LocusLinkID = annot$LocusLinkID[probes2annot], moduleColor = moduleColors, geneTraitSignificance,GSPvalue)
# Order modules by their significance for weight
modOrder = order(-abs(cor(MEs, weight, use = "p")))
# Add module membership information in the chosen order
for (mod in1:ncol(geneModuleMembership)) {oldNames = names(geneInfo0)geneInfo0 = data.frame(geneInfo0, geneModuleMembership[, modOrder[mod]], MMPvalue[,modOrder[mod]])names(geneInfo0) = c(oldNames, paste("MM.", modNames[modOrder[mod]], sep = ""),paste("p.MM.", modNames[modOrder[mod]], sep = ""))
}
# Order the genes in the geneInfo variable first by module color, then by
# geneTraitSignificance
geneOrder = order(geneInfo0$moduleColor, -abs(geneInfo0$GS.weight))
geneInfo = geneInfo0[geneOrder, ]
write.csv(geneInfo, file = "geneInfo.csv")

4. Interfacing network analysis with other data such as functional annotation and gene ontology

我们之前的分析已经确定了几个高度相关的模块(标记为 brown, red, and salmon)与重量。为了便于生物学解释,我们想知道基因的基因本体模块,它们是否在某些功能类别中显着丰富等。

# Read in the probe annotation annot = read.csv(file =
# './FemaleLiver-Data/GeneAnnotation.csv'); Match probes in the data set to the
# probe IDs in the annotation file
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
# Get the corresponding Locuis Link IDs
allLLIDs = annot$LocusLinkID[probes2annot]
# $ Choose interesting modules
intModules = c("brown", "red", "salmon")
for (module in intModules) {# Select module probesmodGenes = (moduleColors == module)# Get their entrez ID codesmodLLIDs = allLLIDs[modGenes]# Write them into a filefileName = paste("LocusLinkIDs-", module, ".txt", sep = "")write.table(as.data.frame(modLLIDs), file = fileName, row.names = FALSE, col.names = FALSE)
}
# As background in the enrichment analysis, we will use all probes in the
# analysis.
fileName = paste("LocusLinkIDs-all.txt", sep = "")
write.table(as.data.frame(allLLIDs), file = fileName, row.names = FALSE, col.names = FALSE)

GO 富集分析,如下:

GOenr = GOenrichmentAnalysis(moduleColors, allLLIDs, organism = "mouse", nBestP = 10)
## Warning in GOenrichmentAnalysis(moduleColors, allLLIDs, organism = "mouse", : This function is deprecated and will be removed in the near future.
## We suggest using the replacement function enrichmentAnalysis
## in R package anRichment, available from the following URL:
## https://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/GeneAnnotation/
##  GOenrichmentAnalysis: loading annotation data...
##   ..of the 3038  Entrez identifiers submitted, 2843 are mapped in current GO categories.
##   ..will use 2843 background genes for enrichment calculations.
##   ..preparing term lists (this may take a while)..
##   ..working on label set 1 ..
##     ..calculating enrichments (this may also take a while)..
##     ..putting together terms with highest enrichment significance..
tab = GOenr$bestPTerms[[4]]$enrichment
write.table(tab, file = "GOEnrichmentTable.csv", sep = ",", quote = TRUE, row.names = FALSE)
keepCols = c(1, 2, 5, 6, 7, 12, 13)
screenTab = tab[, keepCols]
# Round the numeric columns to 2 decimal places:
numCols = c(3, 4)
screenTab[, numCols] = signif(apply(screenTab[, numCols], 2, as.numeric), 2)
# Truncate the the term name to at most 40 characters
screenTab[, 7] = substring(screenTab[, 7], 1, 40)
# Shorten the column names:
colnames(screenTab) = c("module", "size", "p-val", "Bonf", "nInTerm", "ont", "term name")
rownames(screenTab) = NULL
# Set the width of R’s output. The reader should play with this number to
# obtain satisfactory output.
options(width = 95)
# Finally, display the enrichment table:
head(screenTab)
##   module size   p-val Bonf nInTerm ont                             term name
## 1  black  167 5.9e-05    1      15  MF signaling receptor activator activity
## 2  black  167 5.9e-05    1      15  MF              receptor ligand activity
## 3  black  167 1.2e-04    1      15  MF signaling receptor regulator activity
## 4  black  167 6.8e-04    1       5  BP                        mRNA transport
## 5  black  167 7.1e-04    1       4  BP                    dopamine transport
## 6  black  167 1.7e-03    1       6  BP                       amine transport

5. Network visualization using WGCNA functions

可视化基因网络,对所有基因做图,如下:

nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
# Calculate topological overlap anew: this could be done more efficiently by saving the TOM
# calculated during module detection, but let us do it again here.
dissTOM = 1 - TOMsimilarityFromExpr(datExpr, power = 6)
## TOM calculation: adjacency..
## ..will not use multithreading.
##  Fraction of slow calculations: 0.396405
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
# Transform dissTOM with a power to make moderately strong connections more visible in the
# heatmap
plotTOM = dissTOM^7
# Set diagonal to NA for a nicer plot
diag(plotTOM) = NA
# Call the plot function sizeGrWindow(9,9)
TOMplot(plotTOM, geneTree, moduleColors, main = "Network heatmap plot, all genes")

Figure 10: Visualizing the gene network using a heatmap plot.

对所有基因进行筛选,在绘制热图,如下:

nSelect = 400
# For reproducibility, we set the random seed
set.seed(10)
select = sample(nGenes, size = nSelect)
selectTOM = dissTOM[select, select]
# There’s no simple way of restricting a clustering tree to a subset of genes, so we must
# re-cluster.
selectTree = hclust(as.dist(selectTOM), method = "average")
selectColors = moduleColors[select]
# Open a graphical window sizeGrWindow(9,9) Taking the dissimilarity to a power, say 10, makes
# the plot more informative by effectively changing the color palette; setting the diagonal to
# NA also improves the clarity of the plot
plotDiss = selectTOM^7
diag(plotDiss) = NA
TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes")

Figure 11: Visualizing the gene network using a heatmap plot

可视化特征基因网络,如下:

# Recalculate module eigengenes
MEs = moduleEigengenes(datExpr, moduleColors)$eigengenes
# Isolate weight from the clinical traits
weight = as.data.frame(datTraits$weight_g)
names(weight) = "weight"
# Add the weight to existing module eigengenes
MET = orderMEs(cbind(MEs, weight))
# Plot the relationships among the eigengenes and the trait sizeGrWindow(5,7.5);
par(cex = 1)
plotEigengeneNetworks(MET, "Eigengene dendrogram", marDendro = c(0, 4, 2, 0), plotHeatmaps = FALSE)

Figure 12: Visualization of the eigengene network representing the relationships among the modules and the clinical trait weight.

# Plot the heatmap matrix (note: this plot will overwrite the dendrogram plot)
par(cex = 1)
plotEigengeneNetworks(MET, "Eigengene adjacency heatmap", marHeatmap = c(3, 4, 2, 2), plotDendrograms = FALSE,xLabelsAngle = 90)

Figure 13: Visualization of the eigengene network representing the relationships among the modules and the clinical trait weight.

6. Export of networks to external software

将网络数据导出到网络可视化软件中,有两种方式: 1.VisANT
2.Cytoscape
但是每种软件有自己的输入文件要求,根据代码生成就可以了,如下: 1. 输入到 VisANT 软件进行基因网络的互作绘图。

# Recalculate topological overlap
TOM = TOMsimilarityFromExpr(datExpr, power = 6)
## TOM calculation: adjacency..
## ..will not use multithreading.
##  Fraction of slow calculations: 0.396405
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
# Read in the annotation file annot = read.csv(file =
# './FemaleLiver-Data/GeneAnnotation.csv'); Select module
module = "brown"
# Select module probes
probes = names(datExpr)
inModule = (moduleColors == module)
modProbes = probes[inModule]
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule]
dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into an edge list file VisANT can read
vis = exportNetworkToVisANT(modTOM, file = paste("VisANTInput-", module, ".txt", sep = ""), weighted = TRUE,threshold = 0, probeToGene = data.frame(annot$substanceBXH, annot$gene_symbol))
nTop = 30
IMConn = softConnectivity(datExpr[, modProbes])
##  softConnectivity: FYI: connecitivty of genes with less than 45 valid samples will be returned as NA.
##  ..calculating connectivities..
top = (rank(-IMConn) <= nTop)
vis = exportNetworkToVisANT(modTOM[top, top], file = paste("VisANTInput-", module, "-top30.txt",sep = ""), weighted = TRUE, threshold = 0, probeToGene = data.frame(annot$substanceBXH, annot$gene_symbol))

2. 输入到 Cytoscape 软件进行基因网络的互作绘图。

# Recalculate topological overlap if needed
TOM = TOMsimilarityFromExpr(datExpr, power = 6)
## TOM calculation: adjacency..
## ..will not use multithreading.
##  Fraction of slow calculations: 0.396405
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
# Read in the annotation file annot = read.csv(file =
# './FemaleLiver-Data/GeneAnnotation.csv'); Select modules
modules = c("brown", "red")
# Select module probes
probes = names(datExpr)
inModule = is.finite(match(moduleColors, modules))
modProbes = probes[inModule]
modGenes = annot$gene_symbol[match(modProbes, annot$substanceBXH)]
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule]
dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into edge and node list files Cytoscape can read
cyt = exportNetworkToCytoscape(modTOM, edgeFile = paste("CytoscapeInput-edges-", paste(modules,collapse = "-"), ".txt", sep = ""), nodeFile = paste("CytoscapeInput-nodes-", paste(modules,collapse = "-"), ".txt", sep = ""), weighted = TRUE, threshold = 0.02, nodeNames = modProbes,altNodeNames = modGenes, nodeAttr = moduleColors[inModule])

Figure 14: Visualization of the network connections among the most connected genes in the brown module, generated by the VisANT software.

这个软件包个人感觉还需要一定改进,后期修改一下便于代码运用不太好的小伙伴们,实现一键生成图片,这样可以更好的方便大家使用!

关注公众号,每日更新,扫码进群交流不停歇,马上就出视频版,关注我,您最佳的选择!

桓峰基因

生物信息分析,SCI文章撰写及生物信息基础知识学习:R语言学习,perl基础编程,linux系统命令,Python遇见更好的你

50篇原创内容

公众号

References:

  1. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. Published 2008 Dec 29. doi:10.1186/1471-2105-9-559

  2. Zhang, B. and Horvath, S., 2005. A general framework for weighted gene co-expression network analysis. Statistical applications in genetics and molecular biology, 4(1).

  3. Zhenjun Hu, Evan S. Snitkin, and Charles DeLisi. VisANT: an integrative framework for networks in systems biology. Brief Bioinform, 9(4):317–325, 2008.

  4. Paul Shannon, Andrew Markiel, Owen Ozier, Nitin S. Baliga, Jonathan T. Wang, Daniel Ramage, Nada Amin, Benno Schwikowski, and Trey Ideker. Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research, 13(11):2498–2504, 2003.

RNA 13. SCI 文章中加权基因共表达网络分析之 WGCNA相关推荐

  1. WGCNA(加权基因共表达网络分析)

    WGCNA(加权基因共表达网络分析) 序章 这个工具现在很火,高分文章用到很多. 加权基因共表达网络分析(WGCNA,Weighted gene co-expression network analy ...

  2. WGCNA加权基因共表达网络分析(1)简介、原理

    WGCNA简介 WGCNA(Weighted Gene Co-Expression Network Analysis, 加权基因共表达网络分析),鉴定表达模式相似的基因集合(module),解析基因集 ...

  3. 基因共表达网络分析java,WGCNA:加权基因共表达网络分析

    加权基因表达网络分析(Weighted gene co-expression network analysis, WGCNA),又叫权重基因共表达网络分析,其根本思想是根据基因表达模式的不同,挖掘出相 ...

  4. 基因共表达网络分析java,WGCNA 加权基因共表达网络分析教程(1)

    学是很多做科研的同学都想学的,包括我在内,现阶段正在学习这个,深夜整理材料不易,请多多关照支持! 先安装Rstudio以及以及加载WGCNA数据包 Rstudio软件下载地址: WGCNA加载按照网站 ...

  5. RNA 27 SCI文章中转录因子结合motif富集到调控网络 (RcisTarget)

    点击关注,桓峰基因 桓峰基因公众号推出转录组分析和临床预测模型教程,有需要生信的老师可以联系我们!首选看下转录分析教程整理如下: RNA 1. 基因表达那些事–基于 GEO RNA 2. SCI文章中 ...

  6. RNA 25. SCI文章中只有生信没有实验该怎么办?

    今天来介绍一个使用非常方便的在线免疫组化分析工具--PHA (The Human atlas),免疫组化时单基因干湿结合生信中最常见的补充实验的方法之一,性价比较高.但是如果没有条件进行自己样本的免疫 ...

  7. RNA 24. SCI文章中基于TCGA的免疫浸润细胞分析的在线小工具——TIMER

    点击关注,桓峰基因 桓峰基因 生物信息分析,SCI文章撰写及生物信息基础知识学习:R语言学习,perl基础编程,linux系统命令,Python遇见更好的你 135篇原创内容 公众号 今天来介绍一个使 ...

  8. RNA 25. SCI文章中估计组织浸润免疫细胞和基质细胞群的群体丰度(MCP-counter)

    点击关注,桓峰基因 今天来介绍一个利用基因表达估计组织浸润免疫细胞和基质细胞群的群体丰度的软件包--MCP-counter,亲试,非常好用. 桓峰基因的教程不但教您怎么使用,还会定期分析一些相关的文章 ...

  9. RNA 30. SCI文章中基于TCGA和GTEx数据挖掘神器(GEPIA2)

    这期介绍一个基于TCGA和GTEx数据挖掘神器(GEPIA2),个人觉得如果没有编程基础的可以直接利用这个在线小工具分析自己的研究的单个基因或者多个基因,效果还是蛮好的! 桓峰基因公众号推出转录组分析 ...

最新文章

  1. 如何做中文文本的情感分析?
  2. Python学习心得(二) 字典Dictionary
  3. 安徽理工大学计算机技术研究生,安徽理工大学(专业学位)计算机技术研究生辅导...
  4. eclipse启动出错的解决方案:org.osgi.framework.BundleExcep...
  5. 这些 IDEA 的优化设置赶紧安排起来,效率提升不是一点点!
  6. FreeRTOS的信号量和互斥量
  7. php mysql log文件,mysql log文件【读书笔记1】_MySQL
  8. JavaScript getFullYear() 方法
  9. python中goto的用法_python3里用goto
  10. asterisk1.8 Makefile分析 (1)
  11. 二次开发_企业ERP系统二次开发问题的探讨分析
  12. maven创建父项目和子项目
  13. UBUNTU18.04系统安装打印机
  14. 不能创建对象qmdispatch_运行时错误 429,ACTIVEX部件不能创建对象的解决方法小结...
  15. IOS - 苹果微信不打开收不到新消息提醒怎么办?
  16. opencv_python Stitcher拼接图像实例(SIFT/SURF检测特征点,BF/FLANN匹配特征点)
  17. js获取唯一设备码_前端JS获取设备IP地址和唯一标识
  18. 重新学javaweb---cookiesession
  19. java migration_JetPack知识点实战系列九:Room数据库Migration
  20. 测试人员必备:常用自动化测试工具

热门文章

  1. android裁剪圆形几种方式.
  2. android-基础知识解析
  3. ipmitool 常用命令
  4. 央视在世界杯高清直播中占了C位,它是怎么做到的?
  5. C/C++ 中‘0’ “0” ‘\0’ 0的辨析
  6. TCP协议ESTABLISHED状态处理
  7. mmap内存 android,Android中mmap原理及应用简析
  8. PyTorch 深度学习实战 | 基于 ResNet 的花卉图片分类
  9. linux epel,Linux增加epel源的方法
  10. 单片机怎么入门,发展前景如何?