A Brief Tutorial of ArchR • ArchR

This is the short version.

1.5 Getting Set Up | ArchR: Robust and scaleable analysis of single-cell chromatin accessibility data.

A classic and a must-read; I have already read it twice.

Chapter 4 Dimensionality Reduction with ArchR

In ArchR, two-dimensional visualization methods such as UMAP and t-SNE are referred to as embeddings.

One of the key inputs to LSI dimensionality reduction is the starting matrix. Thus far, the two main strategies in scATAC-seq have been to (1) use peak regions or (2) genome-wide tiles.
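
To make the role of the starting matrix concrete, here is a minimal base-R sketch of LSI as it is generally described: a TF-IDF transform followed by SVD. The toy matrix, the choice of 5 dimensions, and all variable names are assumptions for illustration; this is not ArchR's own iterative implementation, only the basic idea applied to a features-by-cells matrix (tiles or peaks).

```r
# Toy binary accessibility matrix: rows = features (tiles or peaks), columns = cells.
set.seed(1)
mat <- matrix(rbinom(200 * 30, size = 1, prob = 0.1), nrow = 200, ncol = 30)
mat <- mat[rowSums(mat) > 0, , drop = FALSE]   # drop features never observed
mat <- mat[, colSums(mat) > 0, drop = FALSE]   # drop cells with no signal

# Term frequency: normalise each cell (column) by its total signal.
tf <- sweep(mat, 2, colSums(mat), "/")
# Inverse document frequency: down-weight features that are open in many cells.
idf <- log(1 + ncol(mat) / rowSums(mat))
tfidf <- tf * idf   # idf is recycled down the rows, so row i is scaled by idf[i]

# SVD; keep the first few components as the reduced dimensions (assumed: 5).
n_dims <- 5
sv <- svd(tfidf, nu = n_dims, nv = n_dims)
lsi <- sv$v %*% diag(sv$d[seq_len(n_dims)])   # cells x n_dims
dim(lsi)
```

Swapping peaks for tiles only changes what the rows of `mat` represent; the transform itself stays the same.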

Chapter 7 Gene Scores and Marker Genes with ArchR

Gene Scores and Marker Genes with ArchR: this relies on prior knowledge. How can marker genes be identified when ATAC-seq does not measure gene expression at all?

7.1 Calculating Gene Scores in ArchR

What are gene scores, and how are they calculated? They can be understood as a per-gene activity score: ATAC-seq measures accessibility, but by weighting the accessibility signal according to its distance from the TSS, a gene score can be computed.
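
The distance weighting can be sketched in a few lines of base R. The weight function below follows what I understand to be ArchR's default gene model, exp(-abs(x)/5000) + exp(-1); the helper name, the toy distances, and the toy counts are made up for illustration, and the full model has further adjustments that are left out here.

```r
# Hypothetical helper: weight of a tile given its distance (bp) to the gene.
gene_weight <- function(distance_bp, decay_bp = 5000) {
  exp(-abs(distance_bp) / decay_bp) + exp(-1)
}

# Toy data: four tiles at increasing distance, with their accessibility counts.
tile_dist   <- c(0, 2500, 10000, 50000)
tile_counts <- c(4, 3, 2, 1)

w <- gene_weight(tile_dist)
gene_score <- sum(w * tile_counts)   # distance-weighted sum of accessibility
round(w, 3)
gene_score
```

Tiles close to the gene dominate the score, while distal tiles still contribute a small constant weight.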

7.2 Identification of Marker Features

How are marker genes identified? What is the underlying principle?

markersGS <- getMarkerFeatures(
    ArchRProj = projHeme2,
    useMatrix = "GeneScoreMatrix",
    groupBy = "Clusters",
    bias = c("TSSEnrichment", "log10(nFrags)"),
    testMethod = "wilcoxon"
)

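This parameter sets which matrix, and therefore which criterion, is used to find marker genes:

 useMatrix = "GeneScoreMatrix"

This parameter sets what the background is matched on; background cells are chosen to resemble each group in these per-cell variables:

 bias = c("TSSEnrichment", "log10(nFrags)"),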
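
As a rough illustration of the principle (not ArchR's actual code), the base-R sketch below picks, for every cell in a cluster, the most similar background cell in terms of the bias variables and then runs a Wilcoxon test on a toy gene score. All data are simulated and the nearest-neighbour matching is a simplifying assumption.

```r
set.seed(42)
n <- 200
cells <- data.frame(
  cluster       = rep(c("C1", "C2"), each = n / 2),
  TSSEnrichment = rnorm(n, mean = 10, sd = 2),
  log10nFrags   = rnorm(n, mean = 4, sd = 0.3)
)
# Toy gene score that is higher in cluster C1.
cells$score <- rnorm(n, mean = ifelse(cells$cluster == "C1", 2, 0))

in_fg   <- cells$cluster == "C1"
fg      <- cells[in_fg, ]
bg_pool <- cells[!in_fg, ]

# Scale the bias variables so both contribute equally to the distance.
scaled <- scale(cells[, c("TSSEnrichment", "log10nFrags")])
fg_b <- scaled[in_fg, , drop = FALSE]
bg_b <- scaled[!in_fg, , drop = FALSE]

# For each foreground cell, index of the closest background cell in bias space.
nearest <- apply(fg_b, 1, function(x) which.min(colSums((t(bg_b) - x)^2)))
bg <- bg_pool[nearest, ]

# Compare the cluster against its bias-matched background.
res <- wilcox.test(fg$score, bg$score, exact = FALSE)
res$p.value
```

Matching on the bias variables means a difference in score is less likely to be explained by data quality (TSS enrichment) or depth (fragment count) alone.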

7.5 Marker Genes Imputation with MAGIC

MAGIC imputes the gene scores by applying weights across cells, which makes the plots sharper and easier to read.

7.6 Track Plotting with ArchRBrowser

Shows where the marker genes sit on the chromosome.

markerGenes <- c(
    "CD34", # Early Progenitor
    "GATA1", # Erythroid
    "PAX5", "MS4A1", # B-Cell Trajectory
    "CD14", # Monocytes
    "CD3D", "CD8A", "TBX21", "IL7R" # T Cells
)

p <- plotBrowserTrack(
    ArchRProj = projHeme2,
    groupBy = "Clusters",
    geneSymbol = markerGenes,
    upstream = 50000,
    downstream = 50000
)
grid::grid.newpage()
grid::grid.draw(p$CD14)

7.7 Launching the ArchRBrowser

Runs locally as an embedded Shiny app for interactive browsing.

Chapter 8 Defining Cluster Identity with scRNA-seq

Cross-platform data integration: bringing in scRNA-seq.

The way this integration works is by directly aligning cells from scATAC-seq with cells from scRNA-seq by comparing the scATAC-seq gene score matrix with the scRNA-seq gene expression matrix.

How is the cell type of each cluster defined? By integrating the gene scores with the scRNA-seq gene expression.

8.1 Cross-platform linkage of scATAC-seq cells with scRNA-seq cells

There are two integration approaches.

Unconstrained integration is a completely agnostic approach that would take all of the cells in your scATAC-seq experiment and attempt to align them to any of the cells in the scRNA-seq experiment. While this is a feasible preliminary solution, we can improve the quality of our cross-platform alignment by constraining the integration process. To perform a constrained integration we use prior knowledge of the cell types to limit the search space of the alignment.

8.2 Adding Pseudo-scRNA-seq profiles for each scATAC-seq cell

Plots of the inferred gene expression look much like the plots drawn from gene scores.

8.3 Labeling scATAC-seq clusters with scRNA-seq information

Determine the cell type of each cluster.

Chapter 9 Pseudo-bulk Replicates in ArchR

Somewhat like imputation: cells of the same type are combined into one pseudo-bulk sample.

The term pseudo-bulk refers to a grouping of single cells where the data from each single cell is combined into a single pseudo-sample that resembles a bulk ATAC-seq experiment.

9.1 How Does ArchR Make Pseudo-bulk Replicates?

You define the sampling ratio for each cell group and how many replicates each group can produce.
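
The idea of a pseudo-bulk replicate can be shown with a toy count matrix in base R. This is only a sketch of the aggregation step: the fixed assignment of cells to two replicates per cluster is an assumption for illustration, whereas ArchR samples cells according to user-set parameters.

```r
set.seed(7)
# Toy peak-by-cell count matrix: 6 peaks, 12 cells.
counts <- matrix(
  rpois(6 * 12, lambda = 1), nrow = 6,
  dimnames = list(paste0("peak", 1:6), paste0("cell", 1:12))
)
cluster      <- rep(c("C1", "C2", "C3"), each = 4)
replicate_id <- rep(c("Rep1", "Rep2"), times = 6)   # assumed fixed split
group        <- paste(cluster, replicate_id, sep = "_")

# Sum the cells of each cluster/replicate group into one pseudo-sample.
# rowsum() groups rows, so transpose to put cells on the rows first.
pseudo_bulk <- t(rowsum(t(counts), group = group))
pseudo_bulk
```

Each column of `pseudo_bulk` now behaves like a small bulk ATAC-seq sample, which is what peak callers expect.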

9.2 Making Pseudo-bulk Replicates

With these pseudo-bulk replicates generated, we can now call peaks in our data.

As mentioned previously, we do not want to call peaks on the merged set of all single cells so having these more granular cell groups defined, either through clustering or otherwise, provides the ideal starting point for peak calling.

Chapter 10 Calling Peaks with ArchR

Peaks cannot be called on single cells; pseudo-bulk data are required for peak calling.

ArchR has several peak-calling methods built in.

10.2 Calling Peaks w/ Macs2

10.3 Calling Peaks w/ TileMatrix

10.4 Add Peak Matrix

Chapter 11 Identifying Marker Peaks with ArchR

11.1 Identifying Marker Peaks with ArchR

Identify marker peaks.

markersPeaks <- getMarkerFeatures(
    ArchRProj = projHeme5,
    useMatrix = "PeakMatrix",
    groupBy = "Clusters2",
    bias = c("TSSEnrichment", "log10(nFrags)"),
    testMethod = "wilcoxon"
)

11.2 Plotting Marker Peaks in ArchR

Plot the marker peaks as a heatmap (pheatmap).

heatmapPeaks <- markerHeatmap(
    seMarker = markersPeaks,
    cutOff = "FDR <= 0.1 & Log2FC >= 0.5",
    transpose = TRUE
)

Marker Peak MA and Volcano Plots

Marker Peaks in Browser Tracks

p <- plotBrowserTrack(
    ArchRProj = projHeme5,
    groupBy = "Clusters2",
    geneSymbol = c("GATA1"),
    features = getMarkers(markersPeaks, cutOff = "FDR <= 0.1 & Log2FC >= 1", returnGR = TRUE)["Erythroid"],
    upstream = 50000,
    downstream = 50000
)

11.3 Pairwise Testing Between Groups

Differential peak analysis, shown as a volcano plot.

Chapter 12 Motif and Feature Enrichment with ArchR

 For example, we often find enrichment of key lineage-defining TFs in cell type-specific accessible chromatin regions. In a similar fashion, we might want to test various groups of peaks for enrichment of other known features. 

12.1 Motif Enrichment in Differential Peaks

Motif enrichment analysis.

projHeme5 <- addMotifAnnotations(ArchRProj = projHeme5, motifSet = "cisbp", name = "Motif")

motifsUp <- peakAnnoEnrichment(
    seMarker = markerTest,
    ArchRProj = projHeme5,
    peakAnnotation = "Motif",
    cutOff = "FDR <= 0.1 & Log2FC >= 0.5"
)

12.3 ArchR Enrichment

Motif enrichment in marker peaks.

Transcription factor binding site analysis.

12.4 Custom Enrichment

Custom enrichment analysis.

Chapter 13 ChromVAR Deviations Enrichment with ArchR

Enrichment analysis at the single-cell level.

chromVAR is designed for predicting enrichment of TF activity on a per-cell basis from sparse chromatin accessibility data. The two primary outputs of chromVAR are:

  1. “deviations” - A deviation is a bias-corrected measurement of how far the per-cell accessibility of a given feature (i.e. motif) deviates from the expected accessibility based on the average of all cells or samples.
  2. “z-score” - The z-score, also known as a “deviation score”, is the z-score for each bias-corrected deviation across all cells. The absolute value of the deviation score is correlated with the per-cell read depth. This is because, with more reads, you have higher confidence that the difference in per-cell accessibility of the given feature (i.e. motif) from the expectation is greater than would occur by chance.
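
The two outputs can be illustrated with a simplified base-R sketch on simulated counts. This is not chromVAR's implementation: in particular, the background peak sets here are drawn at random, whereas chromVAR selects background peaks matched on properties such as GC content and accessibility. The motif peak set and all names are hypothetical.

```r
set.seed(3)
n_peaks <- 300
n_cells <- 20
counts <- matrix(rpois(n_peaks * n_cells, lambda = 2), nrow = n_peaks)
motif_peaks <- 1:30   # hypothetical: peaks that contain the motif

# Raw deviation of every cell for a given set of peaks.
raw_deviation <- function(peak_idx) {
  observed <- colSums(counts[peak_idx, , drop = FALSE])
  # Expected: the peak set's share of all reads, scaled by each cell's depth.
  expected <- sum(counts[peak_idx, ]) / sum(counts) * colSums(counts)
  (observed - expected) / expected
}

dev_motif <- raw_deviation(motif_peaks)

# Background deviations from 50 random peak sets of the same size
# (simplification; see the note above). Result: cells x 50 matrix.
bg <- replicate(50, raw_deviation(sample(n_peaks, length(motif_peaks))))

deviation <- dev_motif - rowMeans(bg)       # bias-corrected deviation
z_score   <- deviation / apply(bg, 1, sd)   # deviation z-score per cell
head(round(z_score, 2))
```

Because the toy counts carry no real motif signal, the z-scores here simply scatter around zero.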

13.1 Motif Deviations

Chapter 14 Footprinting with ArchR

Footprinting analysis: finding the specific binding sites.

Transcription factor (TF) footprinting allows for the prediction of the precise binding location of a TF at a particular locus. This is because the DNA bases that are directly bound by the TF are actually protected from transposition while the DNA bases immediately adjacent to TF binding are accessible.

Ideally, TF footprinting is performed at a single site to determine the precise binding location of the TF. However, in practice, this requires very high sequencing depth, often much higher depth than what most users would obtain from either bulk or single-cell ATAC-seq. To get around this problem, we can combine Tn5 insertion locations across many instances of predicted TF binding. For example, we can take all peaks that harbor a CTCF motif and make an aggregate TF footprint for CTCF across the whole genome.
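
Aggregating insertions across motif instances can be sketched in base R as follows. The genome, the motif positions, and the protected window of 5 bp either side are all simulated assumptions; the point is only to show how per-site signal is stacked by position relative to the motif centre.

```r
set.seed(11)
genome_len <- 10000
insertions <- rpois(genome_len, lambda = 0.2)     # Tn5 insertions per base
motif_centers <- c(500, 2300, 4100, 7600, 9000)   # hypothetical motif sites
flank <- 50

# Mimic TF occupancy: no insertions within 5 bp of each motif centre.
for (m in motif_centers) insertions[(m - 5):(m + 5)] <- 0L

offsets <- -flank:flank
# Rows = motif sites, columns = position relative to the motif centre.
per_site <- t(sapply(motif_centers, function(m) insertions[m + offsets]))

# Aggregate footprint: sum over all motif sites at each relative position.
footprint <- colSums(per_site)
footprint[abs(offsets) <= 8]
```

A single site has too few insertions to show a pattern, but the summed profile makes the protected centre visible.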

14.1 Motif Footprinting

Motif footprinting analysis.

14.2 Normalization of Footprints for Tn5 Bias

To calculate the insertion bias for a pseudo-bulk footprint, ArchR creates a k-mer frequency matrix that is represented as all possible k-mers across a window +/- N bp (user-defined, default 250 bp) from the motif center. Then, iterating over each motif site, ArchR fills in the positioned k-mers into the k-mer frequency matrix. This is then calculated for each motif position genome-wide. Using the sample’s k-mer frequency table, ArchR can then compute the expected Tn5 insertions by multiplying the k-mer position frequency table by the observed/expected Tn5 k-mer frequency.

plotFootprints(
    seFoot = seFoot,
    ArchRProj = projHeme5,
    normMethod = "Subtract",
    plotName = "Footprints-Subtract-Bias",
    addDOC = FALSE,
    smoothWindow = 5
)

14.3 Feature Footprinting

My take: a footprint can be understood as a characteristic signature. Just as every person's toes are unique, footprints left on a beach are highly distinctive, hence the name footprinting.

In essence, any kind of meta information can be used as the feature.

plotFootprints(
    seFoot = seTSS,
    ArchRProj = projHeme5,
    normMethod = "None",
    plotName = "TSS-No-Normalization",
    addDOC = FALSE,
    flank = 2000,
    flankNorm = 100
)

15.1 Creating Low-Overlapping Aggregates of Cells

15.2 Co-accessibility with ArchR

Co-accessibility is a correlation in accessibility between two peaks across many single cells. Said another way, when Peak A is accessible in a single cell, Peak B is often also accessible. We illustrate this concept visually below, showing that Enhancer E3 is often co-accessible with Promoter P.

One thing to note about co-accessibility analysis is that it often identifies cell type-specific peaks as being co-accessible. This is because these peaks are often all accessible together within a single cell type and often all not accessible in all other cell types. This drives a strong correlation but does not necessarily mean that there is a regulatory relationship between these peaks.
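
At its core, co-accessibility is a correlation between peaks across cells or cell aggregates. The base-R sketch below simulates a promoter P, an enhancer E3 that co-varies with it, and an independent enhancer E1; the data and names are invented, and this is a conceptual illustration rather than what addCoAccessibility() does internally.

```r
set.seed(5)
n_agg <- 100   # number of cell aggregates (simulated)

promoter    <- rpois(n_agg, lambda = 5)
enhancer_e3 <- promoter + rpois(n_agg, lambda = 1)   # co-varies with P
enhancer_e1 <- rpois(n_agg, lambda = 5)              # independent of P

# Peaks on rows, aggregates on columns.
acc <- rbind(P = promoter, E1 = enhancer_e1, E3 = enhancer_e3)

# Pearson correlation between every pair of peaks across aggregates.
co_acc <- cor(t(acc))
round(co_acc, 2)
```

A high correlation between P and E3 only says that they open and close together; as the caveat above notes, it does not by itself establish regulation.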

projHeme5 <- addCoAccessibility(
    ArchRProj = projHeme5,
    reducedDims = "IterativeLSI"
)

15.3 Peak2GeneLinkage with ArchR

Peak-to-gene links compared with co-accessibility:

Similar to co-accessibility, ArchR can also identify so-called “peak-to-gene links”. The primary difference between peak-to-gene links and co-accessibility is that co-accessibility is an ATAC-seq-only analysis that looks for correlations in accessibility between two peaks, while peak-to-gene linkage leverages integrated scRNA-seq data to look for correlations between peak accessibility and gene expression. These represent orthogonal approaches to a similar problem. However, because peak-to-gene linkage correlates scATAC-seq and scRNA-seq data, we often think of these links as more relevant to gene regulatory interactions.

projHeme5 <- addPeak2GeneLinks(
    ArchRProj = projHeme5,
    reducedDims = "IterativeLSI"
)

p <- plotBrowserTrack(
    ArchRProj = projHeme5,
    groupBy = "Clusters2",
    geneSymbol = markerGenes,
    upstream = 50000,
    downstream = 50000,
    loops = getPeak2GeneLinks(projHeme5)
)

15.4 Identification of Positive TF-Regulators

Transcription factors in the same family (for example, GATA factors) often share similar binding motifs. This motif similarity makes it challenging to identify the specific TFs that might be driving observed changes in chromatin accessibility at their predicted binding sites. To circumvent this challenge, we have previously used ATAC-seq and RNA-seq to identify TFs whose gene expression is positively correlated to changes in the accessibility of their corresponding motif. We term these TFs “positive regulators”. However, this analysis relies on matched gene expression data which may not be readily available in all experiments. To overcome this dependency, ArchR can identify TFs whose inferred gene scores are correlated to their chromVAR TF deviation z-scores. To achieve this, ArchR correlates chromVAR deviation z-scores of TF motifs with gene activity scores of TF genes from the low-overlapping cell aggregates. When using scRNA-seq integration with ArchR, gene expression of the TF can be used instead of inferred gene activity score.

To identify TFs whose motif accessibility is correlated with their own gene activity (either by gene score or gene expression), we use the correlateMatrices() function and provide the two matrices that we are interested in, in this case the GeneScoreMatrix and the MotifMatrix.
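
The logic behind this correlation can be sketched in base R on simulated aggregates. The TF names, the data, and the cut-offs (correlation above 0.5, adjusted p-value below 0.01) are assumptions for illustration; this mimics the idea of correlating a TF's gene score with its motif deviation z-score, not the output format of correlateMatrices().

```r
set.seed(9)
n_agg <- 80
tfs <- c("TF_A", "TF_B", "TF_C")   # hypothetical TF names

# Simulated per-aggregate gene scores and motif deviation z-scores.
gene_score <- matrix(rnorm(3 * n_agg), nrow = 3, dimnames = list(tfs, NULL))
motif_z    <- matrix(rnorm(3 * n_agg), nrow = 3, dimnames = list(tfs, NULL))
# TF_A: motif accessibility follows its own gene score (positive regulator).
motif_z["TF_A", ] <- gene_score["TF_A", ] + rnorm(n_agg, sd = 0.3)
# TF_B: anti-correlated, so it should not be called a positive regulator.
motif_z["TF_B", ] <- -gene_score["TF_B", ] + rnorm(n_agg, sd = 0.3)

# Correlate each TF's gene score with its own motif z-score.
res <- do.call(rbind, lapply(tfs, function(tf) {
  ct <- cor.test(gene_score[tf, ], motif_z[tf, ])
  data.frame(TF = tf, cor = unname(ct$estimate), p = ct$p.value)
}))
res$padj <- p.adjust(res$p, method = "fdr")

# Assumed thresholds for calling a positive regulator.
res$positive_regulator <- res$cor > 0.5 & res$padj < 0.01
res
```

Requiring a positive correlation is what separates a candidate driver from family members that merely share the same motif.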

Chapter 16 Trajectory Analysis with ArchR

16.1 Myeloid Trajectory - Monocyte Differentiation
