1、什么是bed格式

1、文本文件

2、表明基因组的一段区域

3、标准的bed文件最少三列，最多十二列

eg：1、chrom 孔 2、start 开始 3、end 结束 4、name 名称 5、score 存一个数 6、 strand + or -

2、bed格式的使用

1、储存基因区

2、储存基因组的某些位点信息

3、储存CHIP-seq、ATAC-seq等的富集的peak信息

3、bedtools 是一种常用的bed操作工具，可以实现非常多的常用功能

① 两个Bed文件的交集（bedtools intersect）

②bed文件按照基因组坐标排序（bedtools sort）

③对bed文件进行扩大、平移（bedtools shift）

④对bed文件进行随机提取（bedtools random）

⑤根据提供的bed文件在基因组进行随机抽取（bedtools shuffle）

4、尝试用python写bedtools里的功能

交集（bedtools intersect）

一对一

读取文件

# step load reference orderref_order_dict = {}ref_fai = open("./xxxx.fa.fai","rb")index = 0for line in ref_fai:line_list = line.strip().split("\t")ref_order_dict[line_list[0]] = indexindex += 1ref_fai.close()

#step2 test interect#2.1 open file
from cv2 import splitbed_a = open("./CTCF_rep1.sort.bed","r")
bed_b = open("./CTCF_rep2.sort.bed","r")#initline_a = bed_a.readline()
line_b = bed_b.readline()overlap_count = 0while bed_a and bed_b:line_list_a = line_a.strip().split("\t")line_list_b = line_b.strip().split("\t")#same chromosome 相同染色体 if line_list_a[0] == line_list_b[0]:start_a = int(line_list_a[1])end_a = int(line_list_a[2])start_b = int(line_list_b[1])end_b = int(line_list_b[2])#no-overlap a upstream  上游没有重叠if end_a < start_b:line_a = bed_a.readline()#no-overlap a downstream  下游没有重叠if start_a > end_b:line_b = bed_b.readline()#overlap 重叠else:overlap_count += 1temp_line_a = line_atemp_line_b = line_bline_a = bed_a.readline()line_b = bed_b.readline()#differ chomosome #不同的染色体else:order_a = ref_order_dict.get(line_list_a[0])order_b = ref_order_dict.get(line_list_b[0])if order_a < order_b:line_a = bed_a.readline()else:line_b = bed_b.readline()print(overlap_count)

多对多

def cmp_region(region_a,region_b,ref_order_dict):"""INPUT:<region_a> <region_b>str, line from BED fileOUTPUT:-1 region_a at upstream of region_b0  region_a overlaps with region_b1  region_a at downstream of region_b"""if region_a is None:return 1if region_b is None:return -1region_list_a = region_a.strip().split("\t")region_list_b = region_b.strip().split("\t")order_a = ref_order_dict.get(region_list_a[0])order_b = ref_order_dict.get(region_list_b[0])if order_a < order_b:return -1if order_a > order_b:return 1#get region instart_a = int(region_list_a[1])end_a = int(region_list_a[2])start_b = int(region_list_b[1])end_b = int(region_list_b[2])#no-overlap a upstreamif end_a < start_b:return -1if start_a > end_b:return 1return 0

def update_temp_info(temp_merge_line,temp_chr_name,temp_chr_start,temp_chr_end,new_line):""""""new_line_list = new_line.strip().split("\t")if temp_merge_line is None:temp_merge_line = new_linetemp_chr_name = new_line_list[0]temp_chr_start = int(new_line_list[1])temp_chr_end = int(new_line_list[2])else:temp_chr_name = new_line_list[0]temp_chr_start = min(temp_chr_start, int(new_line_list[1]))temp_chr_end = max(temp_chr_end, int(new_line_list[2]))temp_merge_line = "%s\t%s\t%s" % (temp_chr_name, temp_chr_start, temp_chr_end)return temp_merge_line, temp_chr_name, temp_chr_start, temp_chr_end

# find all a-b intersect# 2.1 open filebed_a = open("./test_multi_region_a.sort.bed","r")
bed_b = open("./test_multi_region_b.sort.bed","r")line_a = bed_a.readline()
line_b = bed_b.readline()temp_list = []
temp_chr_name = None
temp_chr_start = None
temp_chr_end = None
temp_merge_line = None
new_line_state = Nonewhile line_a and line_a:#check temp listif len(temp_list) > 0 and new_line_state:temp_cmp_res = cmp_region(line_a,temp_merge_line, ref_order_dict)if temp_cmp_res == 1:temp_list = []temp_chr_name = Nonetemp_chr_start = Nonetemp_chr_end = Nonetemp_merge_line = Noneelif temp_cmp_res == 0:for temp_line_b in temp_list:run_cmp_res = cmp_region(line_a, temp_line_b, ref_order_dict)if run_cmp_res == 0:print(line_a.strip(), temp_line_b.strip())print("-" * 80)#check currentcmp_res = cmp_region(line_a, line_b, ref_order_dict)if cmp_res == -1:line_a = bed_a.readline()new_line_state = Trueelif cmp_res == 1:new_line_state = Falseline_b_list = line_b.strip().split("\t")if line_a.split("\t")[0] == line_b_list[0]:temp_list.append(line_b)temp_merge_line, temp_chr_name, temp_chr_start, temp_chr_end = update_temp_info(temp_merge_line, temp_chr_name, temp_chr_start, temp_chr_end,line_b)else:temp_list = []temp_chr_name = Nonetemp_chr_start = Nonetemp_chr_end = Nonetemp_merge_line = Noneline_b = bed_b.readline()else:new_line_state = Falseprint(line_a.strip(), line_b.strip())print("-" * 80)temp_list.append(line_b)temp_merge_line, temp_chr_name, temp_chr_start, temp_chr_end = update_temp_info(temp_merge_line, temp_chr_name, temp_chr_start, temp_chr_end,line_b)  line_b = bed_b.readline()# check file end
file_b_end_state = False
file_a_end_state = Falseif line_b == "":file_b_end_state = Trueif line_a == "":file_a_end_state = Truewhile not file_a_end_state:# read new lineline_a = bed_a.readline()new_line_state = Trueif line_a == "":break# check temp listif len(temp_list) > 0 and new_line_state: temp_cmp_res = cmp_region(line_a, temp_merge_line, ref_order_dict)if temp_cmp_res == 1:breakelif temp_cmp_res == 0:for temp_line_b in temp_list:run_cmp_res = cmp_region(line_a, temp_line_b, ref_order_dict)if run_cmp_res == 0:print(line_a.strip(), temp_line_b.strip())print("-" * 80)

写程序的本质还是数学吧慢慢学习

BED文件与bedtools简介相关推荐

Laravel 文件夹结构简介
表 1.1:Laravel 文件夹结构简介文件夹名称简介 app 应用程序的业务逻辑代码存放文件夹 app/Console 存放自定义 Artisian 命令文件 app/Http/Control ...
FreeRTOS 文件夹内容简介
FreeRTOS 文件夹内容简介 1.Source 文件夹底下单位portable文件夹内的RVDS 文件夹,下面包含了各种处理器相关的文件夹. 接口文件都是跟硬件密切相关的,不同的硬件接口文件是不 ...
linux文件的操作原理简介以及实现linux cp命令的代码
1.文件操作原理简介他可以这样写代码因为咱们只读了五个字节所以多的读不出来简单的说open 静态文件后产生动态文件 2.实现linux cp命令的代码原理用代码实现出来 ...
yaml文件 *.yml 写法简介
YAML文件简介我们可能在spring配置文件里见到过.yml格式的东东,配置文件不都是.propertie或者.xml文件吗?.yml是什么鬼,今天我带你们来一探究竟. YAML(Yet Anot ...
android bp文件_Android.bp 简介
Android.bp 简介大部分内容来自官方文档,本文目的为用于备份查询. Android 编译系统从 Android 7.0 开始,Ninja 成为默认的编译框架.Ninja 是一个致力于速度的 ...
linux shell 文件比较 diff 简介
diff 可以用来比较文件和文件夹是否相同比较文件 diff file1 file2 >/dev/null 比较文件夹 diff -rNaq dir1 dir2 >/dev/null - ...
linux文件显示程序,Linux下文件显示命令简介
文件操作,但是目录操作我们也是一样的.因为在Linux中,一切皆文件,目录也是文件.只不过目录文件是的文件内容是里面的文件名列表. 下面这些内容主要针对文件的文件内容操作.对于目录文件的内容操作有专门 ...
文件上传简介1---上传到指定的目录
preparation 本节摘要:本节主要介绍上传文件到指定目录. 引入: 文件上传是开发中常用的功能,本节主要介绍用commons-fileupload-1.1.jar包实现基本的文件上传功能,即上 ...
用户态与内核态文件流与文件描述符简介【转】
转自:https://www.cnblogs.com/Jimmy1988/p/7479856.html 用户态和内核态程序代码的依赖和调用关系如下图所示: Lib:标准ASCI C函数,几乎所有的平 ...

BED文件与bedtools简介

1、什么是bed格式

2、bed格式的使用

3、bedtools 是一种常用的bed操作工具，可以实现非常多的常用功能

4、尝试用python写bedtools里的功能

交集（bedtools intersect）

一对一

BED文件与bedtools简介相关推荐

最新文章

热门文章

BED文件与bedtools简介

1、什么是bed格式

2、bed格式的使用

3、bedtools 是一种常用的bed操作工具，可以实现非常多的常用功能

4、尝试用python写bedtools里的功能

交集 （bedtools intersect）

一对一

BED文件与bedtools简介相关推荐

最新文章

热门文章

交集（bedtools intersect）