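Reading single-cell data (2): Read10X fails with a dgCMatrix error
We have run into 10x data that would not load before. If you hit the error below, you can fix it by modifying the original 10x files and reading them again; for details see https://blog.csdn.net/weixin_43949246/article/details/121225791
This time I want to talk about a different kind of 10x data. As before, let's start with the error: what should we do when a dgTMatrix error shows up?
Here is one way to fix it. First define a function that changes how the files are read: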
library(Matrix)  # provides readMM()

read10x_ymc <- function(
  data.dir = NULL,
  gene.column = 2,
  unique.features = TRUE,
  strip.suffix = FALSE,
  prefix = '',
  suffix = "tsv"
) {
  full.data <- list()
  for (i in seq_along(along.with = data.dir)) {
    run <- data.dir[i]
    if (!dir.exists(paths = run)) {
      stop("Directory provided does not exist")
    }
    barcode.loc <- file.path(run, paste(prefix, 'barcodes.', suffix, sep = ''))
    gene.loc <- file.path(run, paste(prefix, 'genes.', suffix, sep = ''))
    features.loc <- file.path(run, paste(prefix, 'features.', suffix, '.gz', sep = ''))
    matrix.loc <- file.path(run, paste(prefix, 'matrix.mtx', sep = ''))
    # Flag to indicate if this data is from CellRanger >= 3.0
    pre_ver_3 <- file.exists(gene.loc)
    if (!pre_ver_3) {
      addgz <- function(s) {
        return(paste0(s, ".gz"))
      }
      barcode.loc <- addgz(s = barcode.loc)
      matrix.loc <- addgz(s = matrix.loc)
    }
    if (!file.exists(barcode.loc)) {
      stop("Barcode file missing. Expecting ", basename(path = barcode.loc))
    }
    if (!pre_ver_3 && !file.exists(features.loc)) {
      stop("Gene name or features file missing. Expecting ", basename(path = features.loc))
    }
    if (!file.exists(matrix.loc)) {
      print(matrix.loc)
      stop("Expression matrix file missing. Expecting ", basename(path = matrix.loc))
    }
    data <- readMM(file = matrix.loc)
    cell.names <- readLines(barcode.loc)
    # This is the biggest difference from Cell Ranger output: if the number of
    # columns does not match the number of barcodes, transpose the matrix
    if (dim(data)[2] != length(cell.names)) {
      data <- t(data)
    }
    if (all(grepl(pattern = "\\-1$", x = cell.names)) & strip.suffix) {
      # Keep the part before the first "-"; base R is used here so the function
      # does not rely on Seurat's internal ExtractField helper
      cell.names <- vapply(
        X = strsplit(x = cell.names, split = "-", fixed = TRUE),
        FUN = `[`,
        FUN.VALUE = character(1),
        1
      )
    }
    if (is.null(x = names(x = data.dir))) {
      if (i < 2) {
        colnames(x = data) <- cell.names
      } else {
        colnames(x = data) <- paste0(i, "_", cell.names)
      }
    } else {
      colnames(x = data) <- paste0(names(x = data.dir)[i], "_", cell.names)
    }
    feature.names <- read.delim(
      file = ifelse(test = pre_ver_3, yes = gene.loc, no = features.loc),
      header = FALSE,
      stringsAsFactors = FALSE
    )
    if (any(is.na(x = feature.names[, gene.column]))) {
      warning(
        'Some features names are NA. Replacing NA names with ID from the opposite column requested',
        call. = FALSE,
        immediate. = TRUE
      )
      na.features <- which(x = is.na(x = feature.names[, gene.column]))
      replacement.column <- ifelse(test = gene.column == 2, yes = 1, no = 2)
      feature.names[na.features, gene.column] <- feature.names[na.features, replacement.column]
    }
    if (unique.features) {
      fcols <- ncol(x = feature.names)
      if (fcols < gene.column) {
        stop(paste0(
          "gene.column was set to ", gene.column,
          " but feature.tsv.gz (or genes.tsv) only has ", fcols, " columns.",
          " Try setting the gene.column argument to a value <= to ", fcols, "."
        ))
      }
      rownames(x = data) <- make.unique(names = feature.names[, gene.column])
    }
    # In cell ranger 3.0, a third column specifying the type of data was added
    # and we will return each type of data as a separate matrix
    if (ncol(x = feature.names) > 2) {
      data_types <- factor(x = feature.names$V3)
      lvls <- levels(x = data_types)
      if (length(x = lvls) > 1 && length(x = full.data) == 0) {
        message("10X data contains more than one type and is being returned as a list containing matrices of each type.")
      }
      expr_name <- "Gene Expression"
      if (expr_name %in% lvls) {
        # Return Gene Expression first
        lvls <- c(expr_name, lvls[-which(x = lvls == expr_name)])
      }
      data <- lapply(
        X = lvls,
        FUN = function(l) {
          return(data[data_types == l, , drop = FALSE])
        }
      )
      names(x = data) <- lvls
    } else {
      data <- list(data)
    }
    full.data[[length(x = full.data) + 1]] <- data
  }
  list_of_data <- list()
  for (j in 1:length(x = full.data[[1]])) {
    list_of_data[[j]] <- do.call(cbind, lapply(X = full.data, FUN = `[[`, j))
    # Fix for Issue #913
    list_of_data[[j]] <- as(object = list_of_data[[j]], Class = "dgCMatrix")
  }
  names(x = list_of_data) <- names(x = full.data[[1]])
  # If multiple features, will return a list, otherwise a matrix.
  if (length(x = list_of_data) == 1) {
    return(list_of_data[[1]])
  } else {
    return(list_of_data)
  }
}
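Now read the data again with this function, and the problem should be solved:
library(Seurat)

my.data <- read10x_ymc(data.dir = 'L1/', gene.column = 2, prefix = '', suffix = "tsv")
# Keep genes expressed in at least 3 cells and cells with at least 250 expressed genes
sce <- CreateSeuratObject(counts = my.data, project = 'L1', min.cells = 3, min.features = 250)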
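The step in read10x_ymc that differs most from the stock reader is the orientation check: when the number of columns does not match the number of barcodes, the matrix is transposed. Here is a minimal sketch of that idea on a toy matrix. The helper name orient_counts and the toy barcodes are made up for illustration and are not part of Seurat or of the function above.

```r
library(Matrix)

# Hypothetical helper (illustration only): make sure cells end up in columns
orient_counts <- function(mat, barcodes) {
  if (ncol(mat) != length(barcodes)) {
    if (nrow(mat) != length(barcodes)) {
      stop("Neither dimension matches the number of barcodes")
    }
    mat <- t(mat)  # stored as cells x genes, so flip to genes x cells
  }
  colnames(mat) <- barcodes
  mat
}

# Toy matrix stored the "wrong" way round: 2 cells (rows) x 3 genes (columns)
cells_by_genes <- sparseMatrix(
  i = c(1, 2, 2), j = c(1, 2, 3), x = c(4, 1, 7), dims = c(2, 3)
)
barcodes <- c("AAAC-1", "AAAG-1")  # made-up barcodes

fixed <- orient_counts(cells_by_genes, barcodes)
dim(fixed)  # genes x cells after the check
```

Note that a dimension check like this cannot detect a wrong orientation when the number of genes happens to equal the number of cells, so it is worth looking at the row and column names after loading.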
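The problem here really comes down to dgCMatrix. dgCMatrix is not a package of its own: it is the class in R's Matrix package for large sparse matrices stored in compressed column format. You can try reinstalling the Matrix package, but that will not necessarily solve the problem; the approach above is the more reliable fix. If you have questions, feel free to send me a private message!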
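To see how the dgTMatrix and dgCMatrix classes relate, here is a small sketch that assumes only the Matrix package. It writes a toy matrix to a temporary Matrix Market file, reads it back with readMM(), and converts the result to column-compressed form. The toy values and the temporary file are made up for illustration.

```r
library(Matrix)

# Build a tiny 3 x 2 sparse count matrix and write it in Matrix Market format
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 1, 2), dims = c(3, 2))
mtx_file <- tempfile(fileext = ".mtx")
writeMM(m, file = mtx_file)

# readMM() returns a triplet-format matrix (dgTMatrix), not a dgCMatrix
trip <- readMM(mtx_file)
class(trip)

# Convert to the column-compressed representation
csc <- as(trip, "CsparseMatrix")
class(csc)
```

The values are unchanged by the conversion; only the storage format differs. read10x_ymc performs the same kind of conversion in its last step with as(..., Class = "dgCMatrix").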