Preface

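In the previous modules we saw that the breast cancer subtypes identified from multi-omics data differ markedly at the molecular level. To dig into the mechanisms behind these differences, we need to find out which genes are differentially expressed and what functions those genes have. The third and final module of MOVICS, the RUN Module, helps us do exactly that.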

This module is easier to follow with some background in biology; earlier bioinformatics posts are a good place to pick it up.


Main functions

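The functions in this module mainly perform differential expression analysis and enrichment analysis. MOVICS ships with built-in functions for a number of commonly used analyses, so you can pick whichever ones you need.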

runDEA(): run differential expression analysis with three popular methods for choosing, including edgeR, DESeq2, and limma

runMarker(): run biomarker identification to determine uniquely and significantly differential expressed genes for each subtype

runGSEA(): run gene set enrichment analysis (GSEA), calculate activity of functional pathways and generate a pathway-specific heatmap

runGSVA(): run gene set variation analysis to calculate enrichment score of each sample based on given gene set list of interest

runNTP(): run nearest template prediction based on identified biomarkers to evaluate subtypes in external cohorts

runPAM(): run partition around medoids classifier based on discovery cohort to predict subtypes in external cohorts

runKappa(): run consistency evaluation using Kappa statistics between two appraisements that identify or predict current subtypes


Code demonstration

Immugent will now walk through a hands-on demonstration:

# run DEA with edgeR
runDEA(dea.method = "edger",
       expr       = count, # raw count data
       moic.res   = cmoic.brca,
       prefix     = "TCGA-BRCA") # prefix of figure name
#> --all samples matched.
#> --you choose edger and please make sure an RNA-Seq count data was provided.
#> edger of CS1_vs_Others done...
#> edger of CS2_vs_Others done...
#> edger of CS3_vs_Others done...
#> edger of CS4_vs_Others done...
#> edger of CS5_vs_Others done...

# run DEA with DESeq2
runDEA(dea.method = "deseq2",
       expr       = count,
       moic.res   = cmoic.brca,
       prefix     = "TCGA-BRCA")
#> --all samples matched.
#> --you choose deseq2 and please make sure an RNA-Seq count data was provided.
#> deseq2 of CS1_vs_Others done...
#> deseq2 of CS2_vs_Others done...
#> deseq2 of CS3_vs_Others done...
#> deseq2 of CS4_vs_Others done...
#> deseq2 of CS5_vs_Others done...

# run DEA with limma
runDEA(dea.method = "limma",
       expr       = fpkm, # normalized expression data
       moic.res   = cmoic.brca,
       prefix     = "TCGA-BRCA")
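The log lines above name one comparison per subtype (CS1_vs_Others, CS2_vs_Others, and so on). As a rough illustration of that one-versus-rest grouping, here is a minimal base R sketch. The subtype labels are made up and the helper `make_contrasts()` is hypothetical; this is not how MOVICS builds its design internally, only a way to picture what each comparison contrasts.

```r
# Hypothetical subtype labels for eight samples (NOT the TCGA-BRCA data)
clust <- c(s1 = 1, s2 = 2, s3 = 1, s4 = 3, s5 = 2, s6 = 3, s7 = 1, s8 = 2)

# Build one two-level grouping factor per subtype: "CSk" versus "Others"
# (make_contrasts is an illustrative helper, not a MOVICS function)
make_contrasts <- function(clust) {
  subtypes <- sort(unique(clust))
  contrasts <- lapply(subtypes, function(k) {
    label <- paste0("CS", k)
    factor(ifelse(clust == k, label, "Others"), levels = c("Others", label))
  })
  names(contrasts) <- paste0("CS", subtypes, "_vs_Others")
  contrasts
}

contrasts <- make_contrasts(clust)
names(contrasts)                     # one comparison per subtype
table(contrasts[["CS1_vs_Others"]])  # sample counts in each group
```

Each factor could then serve as the grouping variable of a two-group test in whichever DEA method is chosen.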

Once the differentially expressed genes have been computed, draw a heatmap to take a look at them.

# choose edgeR result to identify subtype-specific up-regulated biomarkers
marker.up <- runMarker(moic.res      = cmoic.brca,
                       dea.method    = "edger", # name of DEA method
                       prefix        = "TCGA-BRCA", # MUST be the same of argument in runDEA()
                       dat.path      = getwd(), # path of DEA files
                       res.path      = getwd(), # path to save marker files
                       p.cutoff      = 0.05, # p cutoff to identify significant DEGs
                       p.adj.cutoff  = 0.05, # padj cutoff to identify significant DEGs
                       dirct         = "up", # direction of dysregulation in expression
                       n.marker      = 100, # number of biomarkers for each subtype
                       doplot        = TRUE, # generate diagonal heatmap
                       norm.expr     = fpkm, # use normalized expression as heatmap input
                       annCol        = annCol, # sample annotation in heatmap
                       annColors     = annColors, # colors for sample annotation
                       show_rownames = FALSE, # show no rownames (biomarker name)
                       fig.name      = "UPREGULATED BIOMARKER HEATMAP")
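To make the cutoff arguments above more concrete, the sketch below filters a toy DEA table by p value, adjusted p value and direction, then keeps the top genes by fold change. Everything here is an assumption made for illustration: the table values are invented, the column names `log2fc`, `pvalue` and `padj` are hypothetical, and `select_markers()` is not a MOVICS function. It also skips the step that keeps markers unique to a single subtype, so treat it as a simplified picture rather than what runMarker() actually does.

```r
# Toy DEA table for one subtype-vs-others comparison (values are made up)
dea <- data.frame(
  gene   = c("G1", "G2", "G3", "G4", "G5", "G6"),
  log2fc = c(2.5, 1.2, -3.0, 0.8, 3.1, 1.9),
  pvalue = c(0.001, 0.020, 0.0005, 0.200, 0.0001, 0.030),
  padj   = c(0.005, 0.040, 0.002, 0.400, 0.001, 0.080),
  stringsAsFactors = FALSE
)

# Illustrative helper (hypothetical): keep significant genes in one
# direction and return the n.marker genes with the largest fold change
select_markers <- function(dea, p.cutoff = 0.05, p.adj.cutoff = 0.05,
                           dirct = c("up", "down"), n.marker = 100) {
  dirct <- match.arg(dirct)
  sig <- dea[dea$pvalue < p.cutoff & dea$padj < p.adj.cutoff, ]
  if (dirct == "up") {
    sig <- sig[sig$log2fc > 0, ]
    sig <- sig[order(sig$log2fc, decreasing = TRUE), ]
  } else {
    sig <- sig[sig$log2fc < 0, ]
    sig <- sig[order(sig$log2fc), ]
  }
  head(sig$gene, n.marker)
}

up.markers   <- select_markers(dea, dirct = "up", n.marker = 2)
down.markers <- select_markers(dea, dirct = "down")
```

In this toy table G4 and G6 fail the cutoffs, so only G1, G2, G3 and G5 are candidates before the direction filter is applied.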

# MUST locate ABSOLUTE path of msigdb file
MSIGDB.FILE <- system.file("extdata", "c5.bp.v7.1.symbols.xls", package = "MOVICS", mustWork = TRUE)

# run GSEA to identify up-regulated GO pathways using results from edgeR
gsea.up <- runGSEA(moic.res     = cmoic.brca,
                   dea.method   = "edger", # name of DEA method
                   prefix       = "TCGA-BRCA", # MUST be the same of argument in runDEA()
                   dat.path     = getwd(), # path of DEA files
                   res.path     = getwd(), # path to save GSEA files
                   msigdb.path  = MSIGDB.FILE, # MUST be the ABSOLUTE path of msigdb file
                   norm.expr    = fpkm, # use normalized expression to calculate enrichment score
                   dirct        = "up", # direction of dysregulation in pathway
                   p.cutoff     = 0.05, # p cutoff to identify significant pathways
                   p.adj.cutoff = 0.25, # padj cutoff to identify significant pathways
                   gsva.method  = "gsva", # method to calculate single sample enrichment score
                   norm.method  = "mean", # normalization method to calculate subtype-specific enrichment score
                   fig.name     = "UPREGULATED PATHWAY HEATMAP")

Besides GSEA, MOVICS can also run GSVA, which works from a file of gene sets of interest.

# MUST locate ABSOLUTE path of gene set file
GSET.FILE <- system.file("extdata", "gene sets of interest.gmt", package = "MOVICS", mustWork = TRUE)

# run NTP in Yau cohort by using up-regulated biomarkers
yau.ntp.pred <- runNTP(expr       = brca.yau$mRNA.expr,
                       templates  = marker.up$templates, # the template has been already prepared in runMarker()
                       scaleFlag  = TRUE, # scale input data (by default)
                       centerFlag = TRUE, # center input data (by default)
                       doPlot     = TRUE, # to generate heatmap
                       fig.name   = "NTP HEATMAP FOR YAU")
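For readers unfamiliar with nearest template prediction, here is a minimal base R sketch of the core idea: each sample is compared with a binary template per subtype and receives the label of the most similar template under cosine similarity. The expression values, the `probe`/`class` column names and the helper `ntp_sketch()` are all assumptions made for illustration. The sketch also leaves out the permutation test that runNTP() uses to attach a significance level to each call, so it is not a substitute for the real function.

```r
# Toy expression matrix: 6 marker genes x 4 samples (made-up values)
expr <- matrix(
  c(5, 1, 1, 4,
    6, 2, 1, 5,
    1, 7, 2, 1,
    2, 6, 1, 2,
    1, 1, 8, 1,
    2, 2, 7, 2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("g", 1:6), paste0("s", 1:4))
)

# Hypothetical template table: two marker genes per subtype
templates <- data.frame(probe = paste0("g", 1:6),
                        class = rep(c("CS1", "CS2", "CS3"), each = 2),
                        stringsAsFactors = FALSE)

# Cosine similarity between two numeric vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Illustrative helper (not a MOVICS function): label each sample with
# the subtype whose 0/1 template is closest in cosine similarity
ntp_sketch <- function(expr, templates) {
  # center and scale each gene across samples
  z <- t(scale(t(expr[templates$probe, , drop = FALSE])))
  classes <- unique(templates$class)
  sims <- sapply(classes, function(k) {
    tmpl <- as.numeric(templates$class == k)  # 1 for this subtype's markers
    apply(z, 2, cosine, b = tmpl)
  })
  setNames(classes[max.col(sims, ties.method = "first")], colnames(expr))
}

pred <- ntp_sketch(expr, templates)
```

Samples s1 and s4 express the CS1 markers most strongly, s2 the CS2 markers and s3 the CS3 markers, and the predicted labels follow that pattern.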

# compare survival outcome in Yau cohort
surv.yau <- compSurv(moic.res         = yau.ntp.pred,
                     surv.info        = brca.yau$clin.info,
                     convt.time       = "m", # switch to month
                     surv.median.line = "hv", # switch to both
                     fig.name         = "KAPLAN-MEIER CURVE OF NTP FOR YAU")

# compare agreement in Yau cohort
agree.yau <- compAgree(moic.res  = yau.ntp.pred,
                       subt2comp = brca.yau$clin.info[, "PAM50", drop = FALSE],
                       doPlot    = TRUE,
                       fig.name  = "YAU PREDICTEDMOIC WITH PAM50")

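Besides NTP, MOVICS provides another model-free approach for predicting subtypes. Specifically, runPAM() first trains a partition around medoids (PAM) classifier in the discovery (training) cohort, i.e. TCGA-BRCA, and uses it to predict the subtypes of patients in the external validation (test) cohort, i.e. BRCA-Yau. Each sample in the validation cohort is assigned the label of the subtype whose centroid has the highest Pearson correlation with that sample. Finally, the in-group proportion (IGP) statistic is computed to assess the similarity and reproducibility of the subtypes obtained in the discovery and validation cohorts.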
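The assignment rule described above (pick the subtype whose centroid correlates best with the sample) can be pictured with a few lines of base R. This is only a sketch under stated assumptions: the matrices are toy data, `centroid_predict()` is a hypothetical helper, and it covers neither the log2 transformation that runPAM() reports nor the IGP statistic.

```r
# Toy training cohort: 4 genes x 4 samples with known subtypes
train.expr <- matrix(
  c(9, 8, 1, 2,
    8, 9, 2, 1,
    1, 2, 9, 8,
    2, 1, 8, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), paste0("t", 1:4))
)
clust <- c(t1 = 1, t2 = 1, t3 = 2, t4 = 2)

# Toy validation cohort: same genes, 3 new samples
test.expr <- matrix(
  c(7, 2, 6,
    9, 1, 7,
    2, 7, 3,
    3, 9, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), paste0("v", 1:3))
)

# Illustrative helper (not a MOVICS function): average the training
# samples of each subtype into a centroid, then give every test sample
# the label of the centroid with the highest Pearson correlation
centroid_predict <- function(train.expr, clust, test.expr) {
  genes <- intersect(rownames(train.expr), rownames(test.expr))
  subtypes <- sort(unique(clust))
  centroids <- sapply(subtypes, function(k) {
    rowMeans(train.expr[genes, clust == k, drop = FALSE])
  })
  colnames(centroids) <- paste0("CS", subtypes)
  r <- cor(test.expr[genes, , drop = FALSE], centroids, method = "pearson")
  setNames(colnames(r)[max.col(r, ties.method = "first")], rownames(r))
}

pam.pred <- centroid_predict(train.expr, clust, test.expr)
```

Restricting both cohorts to their shared genes mirrors the "genes shared and used" message that runPAM() prints, although the real function may handle this differently.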

yau.pam.pred <- runPAM(train.expr  = fpkm,
                       moic.res    = cmoic.brca,
                       test.expr   = brca.yau$mRNA.expr)

# predict subtype in discovery cohort using NTP
tcga.ntp.pred <- runNTP(expr      = fpkm,
                        templates = marker.up$templates,
                        doPlot    = FALSE)
#> --original template has 500 biomarkers and 500 are matched in external expression profile.
#> cosine correlation distance
#> 643 samples; 5 classes; 100-100 features/class
#> serial processing; 1000 permutation(s)...
#> predicted samples/class (FDR<0.05)
#>
#>  CS1  CS2  CS3  CS4  CS5 <NA>
#>   99  105  138  155  107   39

# predict subtype in discovery cohort using PAM
tcga.pam.pred <- runPAM(train.expr  = fpkm,
                        moic.res    = cmoic.brca,
                        test.expr   = fpkm)
#> --all samples matched.
#> --a total of 13771 genes shared and used.
#> --log2 transformation done for training expression data.
#> --log2 transformation done for testing expression data.

# check consistency between current and NTP-predicted subtype in discovery TCGA-BRCA
runKappa(subt1     = cmoic.brca$clust.res$clust,
         subt2     = tcga.ntp.pred$clust.res$clust,
         subt1.lab = "CMOIC",
         subt2.lab = "NTP",
         fig.name  = "CONSISTENCY HEATMAP FOR TCGA between CMOIC and NTP")

# check consistency between current and PAM-predicted subtype in discovery TCGA-BRCA
runKappa(subt1     = cmoic.brca$clust.res$clust,
         subt2     = tcga.pam.pred$clust.res$clust,
         subt1.lab = "CMOIC",
         subt2.lab = "PAM",
         fig.name  = "CONSISTENCY HEATMAP FOR TCGA between CMOIC and PAM")

# check consistency between NTP and PAM-predicted subtype in validation Yau-BRCA
runKappa(subt1     = yau.ntp.pred$clust.res$clust,
         subt2     = yau.pam.pred$clust.res$clust,
         subt1.lab = "NTP",
         subt2.lab = "PAM",
         fig.name  = "CONSISTENCY HEATMAP FOR YAU")
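The Kappa statistic behind these consistency checks measures how much two sets of labels agree beyond what chance alone would give. A minimal base R sketch of unweighted Cohen's Kappa follows. The label vectors are invented and `cohen_kappa()` is a hypothetical helper; runKappa() may compute or report the statistic differently, so this is only meant to show where the number comes from.

```r
# Illustrative helper (not a MOVICS function): unweighted Cohen's Kappa
# for two label vectors of equal length
cohen_kappa <- function(subt1, subt2) {
  stopifnot(length(subt1) == length(subt2))
  lev <- sort(union(unique(subt1), unique(subt2)))
  tab <- table(factor(subt1, levels = lev), factor(subt2, levels = lev))
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                      # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Made-up subtype calls from two methods on the same eight samples
a <- c(1, 1, 2, 2, 3, 3, 1, 2)
b <- c(1, 1, 2, 3, 3, 3, 1, 2)

kappa.ab <- cohen_kappa(a, b)  # 7 of 8 calls agree
kappa.aa <- cohen_kappa(a, a)  # identical labels
```

A value close to 1 indicates near-perfect agreement between the two label sets, while a value near 0 means the agreement is no better than chance.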


Summary

With this post, the hands-on part of the MOVICS package is finished, and the introduction to this R package would normally end here.

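However, out of a strong sense of responsibility to its readers, Immugent will follow up with a dedicated walkthrough of an SCI paper whose analysis relies entirely on this R package, and will keep going until everyone can use MOVICS on their own data. Stay tuned!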

