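Single-cell bioinformatics analysis tutorials

The 桓峰基因 public account has released a series of single-cell bioinformatics tutorials with accompanying online video lessons. The tutorials compiled so far are listed below:

Topic 6. Clonal evolution: Canopy

Topic 7. Clonal evolution: Cardelino

Topic 8. Clonal evolution: RobustClone

SCS【1】Starting the single-cell journey: the past and present of single-cell sequencing

SCS【2】Single-cell transcriptomics: cellranger

SCS【3】Downloading and reading single-cell transcriptome data from GEO

SCS【4】Visual analysis of single-cell transcriptome data (Seurat 4.0)

SCS【5】Visual analysis of single-cell transcriptome data (scater)

SCS【6】Automatic cell type annotation for single-cell transcriptomes (SingleR)

SCS【7】Single-cell trajectory analysis (Monocle 3): clustering, classifying, and counting cells

SCS【8】Finding marker genes in single-cell transcriptomes (Monocle 3)

SCS【9】Constructing cell trajectories from single-cell transcriptomes (Monocle 3)

SCS【10】Differential expression analysis of single-cell transcriptomes (Monocle 3)

SCS【11】Visual analysis of single-cell ATAC-seq (Cicero)

SCS【12】Assessing the differentiation potential of single-cell subpopulations (Cytotrace)

SCS【13】Identifying cells that respond to a "gene set" (AUCell)

SCS【14】Single-cell regulatory network inference and clustering (SCENIC)

SCS【15】Cell-cell interaction: a cell communication database of receptors, ligands, and their interactions (CellPhoneDB)

SCS【16】Inferring copy number variation from tumor single-cell RNA-Seq data (inferCNV)

SCS【17】Inferring tumor CNV and subclones from single-cell transcriptomes (copyKAT)

SCS【18】Cell-cell interaction: a cell communication database of receptors, ligands, and their interactions (iTALK)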


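Introduction

Recent advances in single-cell technologies and integration algorithms make it possible to construct comprehensive reference atlases that span many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. The paper presents Symphony (https://github.com/immunogenomics/symphony), an algorithm for building large-scale, integrated reference atlases in a convenient, portable format that enables efficient query mapping within seconds. Symphony localizes query cells within a stable low-dimensional reference embedding, which makes it easy to reproducibly transfer reference-defined annotations to the query. The authors demonstrate Symphony on multiple real-world datasets, including:

(1) mapping a multi-donor, multi-species query to predict pancreatic cell types;

(2) localizing query cells along a developmental trajectory of fetal liver hematopoiesis;

(3) inferring surface protein expression with a multimodal CITE-seq atlas of memory T cells.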

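Symphony was first published in Nature Communications in 2021, in an article titled "Efficient and precise single-cell reference atlas mapping with Symphony". The Raychaudhuri Lab, which published the article, had previously published the single-cell data integration algorithm Harmony in Nature Methods in 2019.

Symphony first builds a Harmony-corrected low-dimensional embedding of the Reference dataset, then maps the Query dataset (the dataset to be annotated) into that same embedding; both can also be projected into one shared UMAP space for visualization. A KNN classifier then finds, for each Query cell, its K nearest Reference cells and assigns the most likely cell type from their labels.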
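
The nearest-neighbor voting step can be sketched in a few lines of base R. This is a toy illustration only, not Symphony's own implementation: the function `knn_vote` and the objects `ref_emb`, `ref_labels`, and `query_emb` are hypothetical names, and the data are simulated.

```r
# Toy illustration of k-nearest-neighbor label transfer (base R only).
# ref_emb / query_emb are made-up stand-ins for reference and query embeddings
# (cells x dimensions); ref_labels holds the reference cell types.
knn_vote <- function(ref_emb, ref_labels, query_emb, k = 5) {
  pred <- character(nrow(query_emb))
  prob <- numeric(nrow(query_emb))
  for (i in seq_len(nrow(query_emb))) {
    # Euclidean distance from query cell i to every reference cell
    d <- sqrt(rowSums(sweep(ref_emb, 2, query_emb[i, ])^2))
    nn <- order(d)[seq_len(k)]        # indices of the k closest reference cells
    votes <- table(ref_labels[nn])    # count labels among those neighbors
    pred[i] <- names(votes)[which.max(votes)]
    prob[i] <- max(votes) / k         # share of neighbors with the winning label
  }
  data.frame(pred = pred, prob = prob, stringsAsFactors = FALSE)
}

set.seed(1)
ref_emb <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
                 matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
ref_labels <- rep(c("tcells", "bcells"), each = 20)
query_emb <- rbind(c(0.1, -0.1), c(4.9, 5.2))
res <- knn_vote(ref_emb, ref_labels, query_emb, k = 5)
print(res)
```

The `prob` column plays the same role as the vote share reported alongside the predicted label: values below 1 indicate query cells whose neighbors disagree.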

Using the Symphony package

1. Annotate cells using an existing reference dataset

2. Annotate cells using a custom reference dataset

https://zenodo.org/record/5090425#.ZC03MJFBw5s

Package installation

if (!require(harmony)) install.packages("harmony")
if (!require(symphony)) install.packages("symphony")
if (!require(symphony)) devtools::install_github("immunogenomics/symphony")

suppressPackageStartupMessages({
    library(symphony)
    library(tidyverse)
    library(data.table)
    library(matrixStats)
    library(Matrix)
    library(plyr)
    library(dplyr)
    ## plotting packages
    library(ggplot2)
    library(ggthemes)
    library(ggrastr)
    library(RColorBrewer)
    library(patchwork)
    library(ggrepel)
})
source("F:/demo script/单细胞系列/symphony/symphony-main/vignettes/utils.R")  # color definitions and plotting functions
source("F:/demo script/单细胞系列/symphony/symphony-main/vignettes/plotBasic.R")
source("F:/demo script/单细胞系列/symphony/symphony-main/vignettes/colors.R")

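Data preparation

Two files need to be prepared in advance:

1. The normalized expression matrix

2. The metadata

The files used in the example below can be downloaded from GitHub:

https://github.com/immunogenomics/symphony

In this example, we build a reference from two PBMC datasets generated with two technologies (10x 3'v1 and 3'v2), then use Symphony to map a third dataset from a new technology (10x 5').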

exprs_norm = readRDS("F:/demo script/单细胞系列/symphony/symphony-main/vignettes/data/exprs_norm_all.rds")
metadata = read.table("F:/demo script/单细胞系列/symphony/symphony-main/vignettes/data/meta_data_subtypes.txt", row.names = 1, header = TRUE, check.names = FALSE, sep = "\t")
dim(exprs_norm)
## [1] 33694 20886
dim(metadata)
## [1] 20886     8

Subset the data into reference and query

idx_query = which(metadata$donor == "5")  # use 5' dataset as the query
ref_exp_full = exprs_norm[, -idx_query]
ref_metadata = metadata[-idx_query, ]
query_exp = exprs_norm[, idx_query]
query_metadata = metadata[idx_query, ]
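
Before splitting, it can help to confirm that the expression columns and metadata rows refer to the same cells in the same order, and that the query index is not empty. The sketch below uses made-up toy objects (`exprs`, `meta`, `idx_q`, and so on are hypothetical names, not part of the tutorial data):

```r
# Toy data: 3 genes x 4 cells, with metadata rows named by cell ID.
exprs <- matrix(1:12, nrow = 3,
                dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
meta <- data.frame(donor = c("3V1", "3V2", "5", "5"),
                   row.names = paste0("cell", 1:4))

# Columns of the expression matrix must line up with the metadata rows.
stopifnot(identical(colnames(exprs), rownames(meta)))

idx_q <- which(meta$donor == "5")
# Negative indexing with an empty index would select nothing at all,
# so check that at least one query cell was found.
stopifnot(length(idx_q) > 0)

ref_x   <- exprs[, -idx_q, drop = FALSE]
query_x <- exprs[, idx_q, drop = FALSE]
ref_m   <- meta[-idx_q, , drop = FALSE]
query_m <- meta[idx_q, , drop = FALSE]
```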

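Building a Symphony Reference

There are two options for building a Symphony Reference.

  1. (buildReferenceFromHarmonyObj) is the more modular option, meaning the user has more control over the preprocessing steps before the Reference is compressed.

  2. (buildReference) builds the Reference starting from expression, which makes the process more automated but less flexible.

Both options are demonstrated below.

Option 1: Build from a Harmony object (preferred method)

This option involves more steps than Option 2, but it makes your code more modular and flexible if you want to run your own preprocessing before the Harmony integration step. We recommend it for most users. It is important to generate vargenes_means_sds (the variable gene means and standard deviations used to scale the genes) and to save the loadings from the PCA step.

Select variable genes and subset the reference expression matrix to them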

# Sparse matrix with the normalized genes x cells matrix
ref_exp_full[1:5, 1:2]
## 5 x 2 sparse Matrix of class "dgCMatrix"
##              threepfresh_AAACCTGAGCATCATC threepfresh_AAACCTGAGCTAACTC
## RP11-34P13.3                            .                            .
## FAM138A                                 .                            .
## OR4F5                                   .                            .
## RP11-34P13.7                            .                            .
## RP11-34P13.8                            .                            .
var_genes = vargenes_vst(ref_exp_full, groups = as.character(ref_metadata[["donor"]]),topn = 2000)
ref_exp = ref_exp_full[var_genes, ]
dim(ref_exp)
## [1]  3451 13189

Calculate and save the mean and standard deviation of each gene

vargenes_means_sds = tibble(symbol = var_genes, mean = Matrix::rowMeans(ref_exp))
vargenes_means_sds$stddev = symphony::rowSDs(ref_exp, vargenes_means_sds$mean)
head(vargenes_means_sds)
## # A tibble: 6 × 3
##   symbol   mean stddev
##   <chr>   <dbl>  <dbl>
## 1 LYZ      1.80   1.91
## 2 HLA-DRA  1.90   1.71
## 3 CD74     2.56   1.54
## 4 S100A9   1.61   1.85
## 5 S100A4   2.50   1.46
## 6 FTL      3.58   1.18
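
The reason for storing these statistics is that the query must later be scaled with the reference means and standard deviations, not its own. A minimal base-R sketch of that idea follows; the helper `scale_with_stats` and the toy matrices are hypothetical, and `sd()` here is the sample standard deviation, which may differ from what the package computes internally.

```r
# Toy reference matrix, genes x cells (made-up values).
x_ref <- matrix(c(0, 2, 4,
                  1, 1, 4,
                  1, 2, 3), nrow = 3, byrow = TRUE,
                dimnames = list(c("g1", "g2", "g3"), NULL))

gene_mean <- rowMeans(x_ref)
gene_sd   <- apply(x_ref, 1, sd)

# Scale each gene (row) with stored statistics; a length-nrow vector is
# recycled down the columns, so row i is shifted by m[i] and divided by s[i].
scale_with_stats <- function(x, m, s) (x - m) / s

ref_scaled <- scale_with_stats(x_ref, gene_mean, gene_sd)

# A query cell is scaled with the REFERENCE statistics.
x_query <- matrix(c(2, 1, 3), nrow = 3,
                  dimnames = list(c("g1", "g2", "g3"), NULL))
query_scaled <- scale_with_stats(x_query, gene_mean, gene_sd)
```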

Scale the data using the calculated gene means and standard deviations:

ref_exp_scaled = symphony::scaleDataWithStats(ref_exp, vargenes_means_sds$mean, vargenes_means_sds$stddev, 1)

# Run SVD, save gene loadings (s$u)
# install.packages("irlba")
library(irlba)
set.seed(0)
s = irlba(ref_exp_scaled, nv = 20)
Z_pca_ref = diag(s$d) %*% t(s$v)  # principal components of each cell (PCs x cells)
loadings = s$u

set.seed(0)
ref_harmObj = harmony::HarmonyMatrix(
    data_mat = t(Z_pca_ref),  ## PCA embedding matrix of cells
    meta_data = ref_metadata, ## dataframe with cell labels
    theta = c(2),             ## cluster diversity enforcement
    vars_use = c('donor'),    ## variable to integrate out
    nclust = 100,             ## number of clusters in Harmony model
    max.iter.harmony = 20,
    return_object = TRUE,     ## return the full Harmony model object
    do_pca = FALSE            ## don't recompute PCs
)

# Compress a Harmony object into a Symphony reference
reference = symphony::buildReferenceFromHarmonyObj(
    ref_harmObj,            # output object from HarmonyMatrix()
    ref_metadata,           # reference cell metadata
    vargenes_means_sds,     # gene names, means, and std devs for scaling
    loadings,               # genes x PCs matrix
    verbose = TRUE,         # verbose output
    do_umap = TRUE,         # Set to TRUE only when UMAP model was saved for reference
    save_uwot_path = './testing_uwot_model_1')
# Optionally, you can specify which normalization method was
# used to build the reference as a custom slot inside the Symphony object to
# help record this information for future query users
reference$normalization_method = 'log(CP10k+1)'
saveRDS(reference, './testing_reference1.rds')
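
Why the gene loadings are worth saving can be seen with a small base-R check: multiplying a scaled matrix by the transposed loadings reproduces the PC embedding, and the same multiplication places new cells in that PC space. This is a sketch on simulated data using `svd()` instead of `irlba()`; all object names are hypothetical.

```r
set.seed(0)
# Toy scaled genes x cells matrix (rows centered to mean 0).
x <- matrix(rnorm(50 * 30), nrow = 50)
x <- x - rowMeans(x)

s_toy <- svd(x, nu = 5, nv = 5)
loadings_toy <- s_toy$u                         # genes x PCs
z_ref <- diag(s_toy$d[1:5]) %*% t(s_toy$v)      # PCs x cells, as in the tutorial

# Projecting the same matrix through the loadings reproduces the embedding.
z_proj <- t(loadings_toy) %*% x
stopifnot(isTRUE(all.equal(z_ref, z_proj)))

# A new, already scaled cell is placed in the same PC space the same way.
new_cell <- matrix(rnorm(50), ncol = 1)
z_new <- t(loadings_toy) %*% new_cell           # 5 x 1 vector of PC coordinates
```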

The reference object contains the following:

meta_data: metadata
vargenes: variable genes, means, and standard deviations used for scaling
loadings: gene loadings for projection into pre-Harmony PC space
R: Soft cluster assignments
Z_orig: Pre-Harmony PC embedding
Z_corr: Harmonized PC embedding
centroids: locations of final Harmony soft cluster centroids
cache: pre-calculated reference-dependent portions of the mixture model
umap: UMAP coordinates
save_uwot_path: path to saved uwot model (for query UMAP projection into reference UMAP coordinates)
normalization_method: type of normalization used
str(reference)
## List of 13
##  $ meta_data           :'data.frame':    13189 obs. of  8 variables:
##   ..$ cell_id     : chr [1:13189] "threepfresh_AAACCTGAGCATCATC" "threepfresh_AAACCTGAGCTAACTC" "threepfresh_AAACCTGAGCTAGTGG" "threepfresh_AAACCTGCACATTAGC" ...
##   ..$ donor       : chr [1:13189] "3V2" "3V2" "3V2" "3V2" ...
##   ..$ nUMI        : int [1:13189] 2394 1694 4520 2788 4667 4440 3224 5205 5493 4419 ...
##   ..$ nGene       : int [1:13189] 871 806 1316 898 1526 1495 1253 1433 1632 1134 ...
##   ..$ percent_mito: num [1:13189] 0.0384 0.0573 0.0195 0.014 0.0362 ...
##   ..$ cell_type   : chr [1:13189] "bcells" "mono" "tcells" "tcells" ...
##   ..$ res_0.80    : int [1:13189] 3 5 2 1 0 0 6 4 0 4 ...
##   ..$ cell_subtype: chr [1:13189] "bnaive" "mono14" "cd4mem" "cd4naive" ...
##  $ vargenes            : tibble [3,451 × 3] (S3: tbl_df/tbl/data.frame)
##   ..$ symbol: chr [1:3451] "LYZ" "HLA-DRA" "CD74" "S100A9" ...
##   ..$ mean  : Named num [1:3451] 1.8 1.9 2.56 1.61 2.5 ...
##   .. ..- attr(*, "names")= chr [1:3451] "LYZ" "HLA-DRA" "CD74" "S100A9" ...
##   ..$ stddev: Named num [1:3451] 1.91 1.71 1.54 1.85 1.46 ...
##   .. ..- attr(*, "names")= chr [1:3451] "LYZ" "HLA-DRA" "CD74" "S100A9" ...
##  $ loadings            : num [1:3451, 1:20] -0.1071 -0.0704 -0.052 -0.1016 -0.0764 ...
##  $ R                   : num [1:100, 1:13189] 2.51e-10 6.46e-10 4.55e-10 1.04e-01 1.49e-02 ...
##  $ Z_orig              : num [1:20, 1:13189] 2.566 -13.635 -4.449 -0.973 -1.077 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:20] "PC_1" "PC_2" "PC_3" "PC_4" ...
##   .. ..$ : chr [1:13189] "threepfresh_AAACCTGAGCATCATC" "threepfresh_AAACCTGAGCTAACTC" "threepfresh_AAACCTGAGCTAGTGG" "threepfresh_AAACCTGCACATTAGC" ...
##  $ Z_corr              : num [1:20, 1:13189] 2.67 -12.27 -4.08 1.98 -1.21 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:20] "harmony_1" "harmony_2" "harmony_3" "harmony_4" ...
##   .. ..$ : chr [1:13189] "threepfresh_AAACCTGAGCATCATC" "threepfresh_AAACCTGAGCTAACTC" "threepfresh_AAACCTGAGCTAGTGG" "threepfresh_AAACCTGCACATTAGC" ...
##  $ betas               : num [1:3, 1:20, 1:100] -12.412 0.582 -0.582 1.061 0.694 ...
##  $ centroids           : num [1:20, 1:100] -0.7896 0.0674 -0.0882 0.2572 0.0249 ...
##  $ cache               :List of 2
##   ..$ : num [1:100, 1] 110 155 127 147 124 ...
##   ..$ : num [1:100, 1:20] -1362 953 693 603 405 ...
##  $ centroids_pc        :'data.frame':    100 obs. of  20 variables:
##   ..$ harmony_1 : num [1:100] -12.41 6.15 5.47 4.11 3.26 ...
##   ..$ harmony_2 : num [1:100] 1.06 0.98 4.16 -12.82 -10.71 ...
##   ..$ harmony_3 : num [1:100] -1.39 4.21 -2.19 -3.99 -3.42 ...
##   ..$ harmony_4 : num [1:100] 4.043 0.896 0.468 1.328 1.497 ...
##   ..$ harmony_5 : num [1:100] 0.391 -0.159 -0.866 -1.199 -0.176 ...
##   ..$ harmony_6 : num [1:100] 3.671 -0.22 -1.706 -1.413 0.633 ...
##   ..$ harmony_7 : num [1:100] -5.28 0.576 0.817 -0.316 0.112 ...
##   ..$ harmony_8 : num [1:100] -3.436 -1.304 2.898 0.256 1.211 ...
##   ..$ harmony_9 : num [1:100] -3.8916 0.5006 -3.4239 0.0352 0.8073 ...
##   ..$ harmony_10: num [1:100] 0.949 0.419 -2.019 0.409 -0.79 ...
##   ..$ harmony_11: num [1:100] -1.3928 0.0688 0.1749 0.2219 -1.5822 ...
##   ..$ harmony_12: num [1:100] -0.417 0.171 1.098 1.234 -4.802 ...
##   ..$ harmony_13: num [1:100] 0.825 0.353 1.206 0.164 1.535 ...
##   ..$ harmony_14: num [1:100] -0.5103 0.0407 0.6792 -0.9357 3.3156 ...
##   ..$ harmony_15: num [1:100] -0.5641 -0.0995 0.8319 -0.1903 0.1557 ...
##   ..$ harmony_16: num [1:100] -0.136 0.27 0.495 0.331 0.514 ...
##   ..$ harmony_17: num [1:100] -0.0144 0.2031 0.0241 -0.3746 0.118 ...
##   ..$ harmony_18: num [1:100] -0.705 -0.202 0.028 0.155 -0.282 ...
##   ..$ harmony_19: num [1:100] 0.1726 -0.1749 0.4197 -0.4117 0.0885 ...
##   ..$ harmony_20: num [1:100] 0.6368 0.0867 0.5938 0.0761 -0.5495 ...
##  $ umap                :List of 1
##   ..$ embedding: num [1:13189, 1:2] 3.59 7.48 -2.24 -4.31 6.68 ...
##   .. ..- attr(*, "scaled:center")= num [1:2] 0.248 0.203
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:13189] "threepfresh_AAACCTGAGCATCATC" "threepfresh_AAACCTGAGCTAACTC" "threepfresh_AAACCTGAGCTAGTGG" "threepfresh_AAACCTGCACATTAGC" ...
##   .. .. ..$ : chr [1:2] "UMAP1" "UMAP2"
##  $ save_uwot_path      : chr "./testing_uwot_model_1"
##  $ normalization_method: chr "log(CP10k+1)"

Visualize the reference UMAP

reference = readRDS('./testing_reference1.rds')
umap_labels = cbind(ref_metadata, reference$umap$embedding)
fig.size(3, 5)
plotBasic(umap_labels, title = 'Reference', color.by = 'cell_type')
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.

Option 2: Build from scratch (starting with expression)

This option computes the reference object from expression in a single unified pipeline, automating the preprocessing steps.

set.seed(0)
reference = symphony::buildReference(
    ref_exp_full,
    ref_metadata,
    vars = c('donor'),         # variables to integrate over
    K = 100,                   # number of Harmony clusters
    verbose = TRUE,            # verbose output
    do_umap = TRUE,            # can set to FALSE if want to run umap separately later
    do_normalize = FALSE,      # set to TRUE if input counts are not normalized yet
    vargenes_method = 'vst',   # method for variable gene selection ('vst' or 'mvp')
    vargenes_groups = 'donor', # metadata column specifying groups for variable gene selection
    topn = 2000,               # number of variable genes to choose per group
    d = 20,                    # number of PCs
    save_uwot_path = './testing_uwot_model_2'
)
reference$normalization_method = 'log(CP10k+1)' # optionally save normalization method in custom slot

# Save reference (modify with your desired output path)
saveRDS(reference, './testing_reference2.rds')
reference = readRDS('./testing_reference2.rds')

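Visualize the reference UMAP

# Rebuild the labels from the current reference so the plot shows this UMAP
umap_labels = cbind(ref_metadata, reference$umap$embedding)
fig.size(3, 5)
plotBasic(umap_labels, title = "Reference", color.by = "cell_type")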

Map query

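To map a new query dataset onto the reference, you need a reference object saved from the steps above, plus the query cell expression matrix and metadata. The query dataset is assumed to have been normalized in the same way as the reference cells (here, log(CP10k+1) normalization by default).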
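
For reference, log(CP10k+1) normalization of raw counts can be sketched in base R as follows. The function `log_cp10k` and the toy count matrix are hypothetical; when starting from raw counts, setting do_normalize = TRUE in mapQuery is the route the comments in the code below point to.

```r
# Toy raw count matrix, genes x cells (made-up values; each column sums to 10).
counts <- matrix(c(1, 0, 9,
                   4, 4, 2), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("cellA", "cellB")))

# Divide each cell (column) by its total counts, rescale to 10,000, then log(1 + x).
log_cp10k <- function(counts) {
  cp10k <- sweep(counts, 2, colSums(counts), "/") * 1e4
  log1p(cp10k)
}

norm <- log_cp10k(counts)
print(norm)
```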

reference = readRDS('./testing_reference1.rds')

# Map query
query = mapQuery(query_exp,             # query gene expression (genes x cells)
                 query_metadata,        # query metadata (cells x attributes)
                 reference,             # Symphony reference object
                 do_normalize = FALSE,  # perform log(CP10k+1) normalization on query
                 do_umap = TRUE)        # project query cells into reference UMAP

query = knnPredict(query, reference, reference$meta_data$cell_type, k = 5)
str(query)
## List of 7
##  $ exp      :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. ..@ i       : int [1:13735662] 45 59 72 110 139 153 160 197 211 242 ...
##   .. ..@ p       : int [1:7698] 0 1318 3482 5594 7120 8452 10450 11318 12889 14154 ...
##   .. ..@ Dim     : int [1:2] 33694 7697
##   .. ..@ Dimnames:List of 2
##   .. .. ..$ : chr [1:33694] "RP11-34P13.3" "FAM138A" "OR4F5" "RP11-34P13.7" ...
##   .. .. ..$ : chr [1:7697] "fivePrime_AAACCTGAGCGATAGC" "fivePrime_AAACCTGAGCTAAACA" "fivePrime_AAACCTGAGGGAGTAA" "fivePrime_AAACCTGAGTCTTGCA" ...
##   .. ..@ x       : num [1:13735662] 1.54 1.54 2.13 1.54 1.54 ...
##   .. ..@ factors : list()
##  $ meta_data:'data.frame':   7697 obs. of  10 variables:
##   ..$ cell_id                : chr [1:7697] "fivePrime_AAACCTGAGCGATAGC" "fivePrime_AAACCTGAGCTAAACA" "fivePrime_AAACCTGAGGGAGTAA" "fivePrime_AAACCTGAGTCTTGCA" ...
##   ..$ donor                  : chr [1:7697] "5" "5" "5" "5" ...
##   ..$ nUMI                   : int [1:7697] 2712 6561 6322 4528 3426 6199 2378 4934 3654 12842 ...
##   ..$ nGene                  : int [1:7697] 1318 2164 2112 1526 1332 1998 868 1571 1265 3159 ...
##   ..$ percent_mito           : num [1:7697] 0.0664 0.0565 0.0562 0.072 0.0683 ...
##   ..$ cell_type              : chr [1:7697] "nk" "mono" "mono" "tcells" ...
##   ..$ res_0.80               : int [1:7697] 9 0 0 11 0 8 1 1 3 0 ...
##   ..$ cell_subtype           : chr [1:7697] "nk" "mono14" "mono14" "cd8eff" ...
##   ..$ cell_type_pred_knn     : Factor w/ 7 levels "","bcells","dc",..: 6 5 5 7 5 2 7 7 2 5 ...
##   .. ..- attr(*, "prob")= num [1:7697] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ cell_type_pred_knn_prob: num [1:7697] 1 1 1 1 1 1 1 1 1 1 ...
##  $ Z        : num [1:20, 1:7697] 3.575 8.814 -11.757 -0.905 -2.626 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:20] "harmony_1" "harmony_2" "harmony_3" "harmony_4" ...
##   .. ..$ : chr [1:7697] "fivePrime_AAACCTGAGCGATAGC" "fivePrime_AAACCTGAGCTAAACA" "fivePrime_AAACCTGAGGGAGTAA" "fivePrime_AAACCTGAGTCTTGCA" ...
##  $ Zq_pca   : num [1:20, 1:7697] 3.58 10.64 -13.73 -4.33 -2.56 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:20] "PC_1" "PC_2" "PC_3" "PC_4" ...
##   .. ..$ : chr [1:7697] "fivePrime_AAACCTGAGCGATAGC" "fivePrime_AAACCTGAGCTAAACA" "fivePrime_AAACCTGAGGGAGTAA" "fivePrime_AAACCTGAGTCTTGCA" ...
##  $ R        : num [1:100, 1:7697] 4.11e-11 6.23e-11 9.70e-07 8.12e-12 6.80e-12 ...
##  $ Xq       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. ..@ i       : int [1:15394] 0 1 0 1 0 1 0 1 0 1 ...
##   .. ..@ p       : int [1:7698] 0 2 4 6 8 10 12 14 16 18 ...
##   .. ..@ Dim     : int [1:2] 2 7697
##   .. ..@ Dimnames:List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : NULL
##   .. ..@ x       : num [1:15394] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..@ factors : list()
##  $ umap     : num [1:7697, 1:2] -4.29 5.95 6.67 -5.21 7.44 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:7697] "fivePrime_AAACCTGAGCGATAGC" "fivePrime_AAACCTGAGCTAAACA" "fivePrime_AAACCTGAGGGAGTAA" "fivePrime_AAACCTGAGTCTTGCA" ...
##   .. ..$ : chr [1:2] "UMAP1" "UMAP2"

The query object contains the following:

Z: query cells in reference Harmonized embedding
Zq_pca: query cells in pre-Harmony reference PC embedding (prior to correction)
R: query cell soft cluster assignments
Xq: query cell design matrix for correction step
umap: query cells projected into reference UMAP coordinates (using uwot)
meta_data: metadata

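The predicted query cell types are now in the cell_type_pred_knn column. The cell_type_pred_knn_prob column reports the proportion of nearest neighbors that cast the winning vote, which can help identify query cells that fall on the "border" between two reference cell types.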

head(query$meta_data)
##                                               cell_id donor nUMI nGene
## fivePrime_AAACCTGAGCGATAGC fivePrime_AAACCTGAGCGATAGC     5 2712  1318
## fivePrime_AAACCTGAGCTAAACA fivePrime_AAACCTGAGCTAAACA     5 6561  2164
## fivePrime_AAACCTGAGGGAGTAA fivePrime_AAACCTGAGGGAGTAA     5 6322  2112
## fivePrime_AAACCTGAGTCTTGCA fivePrime_AAACCTGAGTCTTGCA     5 4528  1526
## fivePrime_AAACCTGAGTTCGATC fivePrime_AAACCTGAGTTCGATC     5 3426  1332
## fivePrime_AAACCTGCACACTGCG fivePrime_AAACCTGCACACTGCG     5 6199  1998
##                            percent_mito cell_type res_0.80 cell_subtype
## fivePrime_AAACCTGAGCGATAGC   0.06637168        nk        9           nk
## fivePrime_AAACCTGAGCTAAACA   0.05654626      mono        0       mono14
## fivePrime_AAACCTGAGGGAGTAA   0.05615312      mono        0       mono14
## fivePrime_AAACCTGAGTCTTGCA   0.07199647    tcells       11       cd8eff
## fivePrime_AAACCTGAGTTCGATC   0.06830123      mono        0       mono14
## fivePrime_AAACCTGCACACTGCG   0.05097596    bcells        8         bmem
##                            cell_type_pred_knn cell_type_pred_knn_prob
## fivePrime_AAACCTGAGCGATAGC                 nk                       1
## fivePrime_AAACCTGAGCTAAACA               mono                       1
## fivePrime_AAACCTGAGGGAGTAA               mono                       1
## fivePrime_AAACCTGAGTCTTGCA             tcells                       1
## fivePrime_AAACCTGAGTTCGATC               mono                       1
## fivePrime_AAACCTGCACACTGCG             bcells                       1
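
Because this query already carries known labels in cell_type, the predictions can be checked against them. The sketch below works on a small made-up data frame `md` that mimics the relevant columns of query$meta_data; the 0.7 cutoff is an arbitrary value chosen for illustration.

```r
# Toy stand-in for query$meta_data (made-up values).
md <- data.frame(
  cell_type               = c("nk", "mono", "mono", "tcells", "bcells", "tcells"),
  cell_type_pred_knn      = c("nk", "mono", "mono", "tcells", "bcells", "nk"),
  cell_type_pred_knn_prob = c(1, 1, 0.8, 1, 1, 0.6),
  stringsAsFactors = FALSE
)

# Confusion table of known vs. predicted labels.
conf_mat <- table(truth = md$cell_type, predicted = md$cell_type_pred_knn)
print(conf_mat)

# Overall agreement between known and predicted labels.
accuracy <- mean(md$cell_type == as.character(md$cell_type_pred_knn))

# Flag cells whose winning vote share falls below the chosen cutoff.
cutoff <- 0.7
low_conf <- which(md$cell_type_pred_knn_prob < cutoff)
```

On the real object the prediction column is a factor, which is why the comparison wraps it in as.character().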

Visualize the mapping results: plot a UMAP of all cells

# Sync the column names for both data frames
reference$meta_data$cell_type_pred_knn = NA
reference$meta_data$cell_type_pred_knn_prob = NA
reference$meta_data$ref_query = "reference"
query$meta_data$ref_query = "query"

# Add the UMAP coordinates to the metadata
meta_data_combined = rbind(query$meta_data, reference$meta_data)
umap_combined = rbind(query$umap, reference$umap$embedding)
umap_combined_labels = cbind(meta_data_combined, umap_combined)

# Plot UMAP visualization of all cells
fig.size(3, 5)
plotBasic(umap_combined_labels, title = "Reference and query cells", color.by = "ref_query")

Plot all cells colored by cell type, with query and reference in separate panels

fig.size(3, 7)
plotBasic(umap_combined_labels, title = "Reference and query cells", color.by = "cell_type",facet.by = "ref_query")

References:

  1. Kang, J.B., Nathan, A., Weinand, K. et al. Efficient and precise single-cell reference atlas mapping with Symphony. Nat Commun 12, 5890 (2021). https://doi.org/10.1038/s41467-021-25957-x

