Genome Assembly as Shortest Superstring

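import re  # regular expressions, used to pull the reads out of the file


def gen_cf(m, n):
    """Return the number of bases by which the tail of m overlaps the head of n."""
    # Try the longest candidate first and stop at the first match, so the
    # largest overlap is returned. Overlaps shorter than 4 bases are ignored.
    for j in range(min(len(m), len(n)), 3, -1):
        if m[-j:] == n[:j]:
            return j
    return 0


def order(m, n):
    """Call gen_cf to decide whether the tail of m joins the head of n, or the
    tail of n joins the head of m. Return a list [direction, overlap], where
    direction 1 means m comes first and direction 2 means n comes first."""
    order_1 = gen_cf(m, n)
    order_2 = gen_cf(n, m)
    if order_1 > order_2:
        return [1, order_1]
    else:
        return [2, order_2]


def add(m, n):
    """Call order to join m and n, keeping the overlapping bases only once."""
    j = order(m, n)
    if j[0] == 1:
        return m + n[j[1]:]
    else:
        return n + m[j[1]:]


# Open the TXT file, read it, and store the sequences in the list substring_list.
with open('G:/PycharmProjects/pythonProject1/hi/information/files/rosalind_long.txt') as ds:
    s = ''
    for i in ds:
        s += i.strip()

# The header lines act as separators between reads. This assumes the headers
# contain no uppercase A, C, G or T (true for IDs of the form >Rosalind_1234).
substring_list = re.findall(r'[TACG]+', s)
gen_begin = substring_list.pop(0)

# The idea from here on: keep joining gen_begin with one of the other sequences
# (and deleting that sequence), in a while loop, until substring_list is empty.
while substring_list:
    overlaps = []
    for i in substring_list:
        overlaps.append(order(gen_begin, i)[1])
    print(overlaps)  # largest overlap of gen_begin with each remaining read (either orientation)
    max_index = overlaps.index(max(overlaps))
    n = substring_list.pop(max_index)  # remove the read that overlaps gen_begin the most
    gen_begin = add(gen_begin, n)      # the merged result becomes the new gen_begin

print(gen_begin)       # the assembled superstring
print(substring_list)  # empty once every read has been merged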
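
The greedy idea in the code above (repeatedly merging the current sequence with the read that overlaps it most) can be tried without any input file. The following is a minimal sketch, not the reference solution: the names `parse_fasta`, `overlap_len`, `merge_pair`, `greedy_assemble` and the `min_overlap` parameter are hypothetical, and the four short reads are toy data. It also assumes, as the problem does, that the reads overlap generously enough for a greedy choice to be safe; in general, greedy merging is not guaranteed to find the shortest superstring.

```python
def parse_fasta(text):
    """Return the sequences in FASTA-formatted text, one string per record."""
    records, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):      # a header line starts a new record
            if current:
                records.append("".join(current))
            current = []
        elif line:                    # sequence lines may be wrapped
            current.append(line)
    if current:
        records.append("".join(current))
    return records


def overlap_len(left, right, min_overlap=4):
    """Length of the longest suffix of left equal to a prefix of right."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k                  # longest match is found first
    return 0


def merge_pair(a, b, min_overlap=4):
    """Join a and b in the orientation with the larger overlap.

    Returns (merged_string, overlap_length).
    """
    ab = overlap_len(a, b, min_overlap)
    ba = overlap_len(b, a, min_overlap)
    if ab > ba:
        return a + b[ab:], ab
    return b + a[ba:], ba


def greedy_assemble(reads, min_overlap=4):
    """Grow one contig by always merging the read that overlaps it most."""
    reads = list(reads)               # work on a copy
    contig = reads.pop(0)
    while reads:
        best = max(
            range(len(reads)),
            key=lambda i: merge_pair(contig, reads[i], min_overlap)[1],
        )
        contig, _ = merge_pair(contig, reads.pop(best), min_overlap)
    return contig


# Toy input (assumed data, small enough to check by hand).
sample = """>read_1
ATTAGACCTG
>read_2
CCTGCCGGAA
>read_3
AGACCTGCCG
>read_4
GCCGGAATAC
"""

print(greedy_assemble(parse_fasta(sample)))  # ATTAGACCTGCCGGAATAC
```

Parsing on the `>` header lines instead of matching `[TACG]+` avoids depending on what characters the headers happen to contain.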

Problem source: the bioinformatics practice site Rosalind, http://rosalind.info/problems/long/

Some of my ideas while working on this problem were wrong, so I studied the answer at https://www.bilibili.com/read/cv4332120 and then solved it again.
