Contents

Counting DNA Nucleotides (counting A, T, C and G)
- Problem
- Sample Dataset
- Sample Output
Transcribing DNA into RNA
- Problem
- Sample Dataset
- Sample Output
Complementing a Strand of DNA (reverse complement)
- Problem
- Sample Dataset
- Sample Output
Rabbits and Recurrence Relations (Fibonacci sequence)
- Problem
- Sample Dataset
- Sample Output
Computing GC Content (GC content of a sequence)
- Problem
- Sample Dataset
- Sample Output

Counting DNA Nucleotides (counting A, T, C and G)

Problem

A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.

An example of a length 21 DNA string (whose alphabet contains the symbols ‘A’, ‘C’, ‘G’, and ‘T’) is “ATGCTTCAGAAAGGTCTTACG.”

Given: A DNA string $s$ of length at most 1000 nt.

Return: Four integers (separated by spaces) counting the respective number of times that the symbols ‘A’, ‘C’, ‘G’, and ‘T’ occur in $s$.

Sample Dataset

AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC

Sample Output

20 12 17 21

# Method 1: count each base and print the counts on one line
symbols = 'ACGT'
sequence = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
for i in symbols:
    print(sequence.count(i), end=" ")

# Method 2: collect the counts in a list, then join them with spaces
symbols = 'ACGT'
sequence = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
counts = [sequence.count(i) for i in symbols]
print(' '.join(map(str, counts)))

20 12 17 21
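A third variant, sketched under the assumption of Python 3.9+ (for the `tuple[...]` hint): `collections.Counter` tallies every symbol in a single pass instead of calling `str.count` once per base, and indexing a `Counter` with an absent key returns 0, so missing bases are still reported. The name `count_nucleotides` is illustrative and not from the original solutions.

```python
from collections import Counter


def count_nucleotides(seq: str) -> tuple[int, int, int, int]:
    """Return the counts of A, C, G and T (in that order) in seq."""
    counts = Counter(seq.strip().upper())  # one pass over the whole string
    # Counter yields 0 for symbols that never occur, so missing bases print as 0
    return tuple(counts[base] for base in "ACGT")


sequence = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
print(*count_nucleotides(sequence))  # prints: 20 12 17 21
```

For a 1000 nt string either approach is instant; the single pass mainly matters for much longer inputs.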

Transcribing DNA into RNA

Problem

An RNA string is a string formed from the alphabet containing ‘A’, ‘C’, ‘G’, and ‘U’.

Given a DNA string $t$ corresponding to a coding strand, its transcribed RNA string $u$ is formed by replacing all occurrences of ‘T’ in $t$ with ‘U’ in $u$.

Given: A DNA string $t$ having length at most 1000 nt.

Return: The transcribed RNA string of $t$.

Sample Dataset

GATGGAACTTGACTACGTAAATT

Sample Output

GAUGGAACUUGACUACGUAAAUU

sequence = 'GATGGAACTTGACTACGTAAATT'
print(sequence.replace('T', 'U'))

GAUGGAACUUGACUACGUAAAUU
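`str.replace` is enough for this problem. As a sketch of a slightly more general approach, a translation table built with `str.maketrans` maps several characters in one call, here covering lowercase `t` as well; the same mechanism is handy for complementing DNA. The names `DNA_TO_RNA` and `transcribe` are illustrative assumptions.

```python
# Translation table: T -> U and t -> u; every other character is left unchanged
DNA_TO_RNA = str.maketrans("Tt", "Uu")


def transcribe(dna: str) -> str:
    """Transcribe a coding-strand DNA string into RNA by replacing T with U."""
    return dna.strip().translate(DNA_TO_RNA)


print(transcribe("GATGGAACTTGACTACGTAAATT"))  # prints: GAUGGAACUUGACUACGUAAAUU
```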

Complementing a Strand of DNA (reverse complement)

Problem

In DNA strings, symbols ‘A’ and ‘T’ are complements of each other, as are ‘C’ and ‘G’.

The reverse complement of a DNA string $s$ is the string $s^{\mathrm{c}}$ formed by reversing the symbols of $s$, then taking the complement of each symbol (e.g., the reverse complement of “GTCA” is “TGAC”).

Given: A DNA string $s$ of length at most 1000 bp.

Return: The reverse complement $s^{\mathrm{c}}$ of $s$.

Sample Dataset

AAAACCCGGT

Sample Output

ACCGGGTTTT

def reverse_complement(seq):
    """Generate the reverse complement of a sequence.

    :param seq: DNA sequence (A/C/G/T in upper- or lowercase, or N)
    :return: revComSeq, the reverse complement
    """
    ATCG_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C',
                 'a': 't', 't': 'a', 'c': 'g', 'g': 'c', 'N': 'N'}
    revSeqList = list(reversed(seq))                    # reverse the sequence
    revComSeqList = [ATCG_dict[k] for k in revSeqList]  # complement each base
    revComSeq = ''.join(revComSeqList)
    return revComSeq


seq = 'AAAACCCGGT'
output = reverse_complement(seq)
print(output)

ACCGGGTTTT

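def reverseComplement(seq):
    """Generate the reverse complement of a sequence.

    :param seq: DNA sequence (A/C/G/T in upper- or lowercase, or N)
    :return: revComSeq, the reverse complement
    """
    ATCG_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C',
                 'a': 't', 't': 'a', 'c': 'g', 'g': 'c', 'N': 'N'}
    revComSeq = ''
    for i in seq:
        # prepending each complemented base reverses the order as we go
        revComSeq = ATCG_dict[i] + revComSeq
    return revComSeq


seq = 'AAAACCCGGT'
output = reverseComplement(seq)
print(output)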

ACCGGGTTTT
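A more compact sketch of the same operation: complement every base with one `str.translate` call, then reverse with the slice `[::-1]`. One behavioural difference from the dictionary versions above: characters missing from the table pass through unchanged instead of raising `KeyError`, which may or may not suit your input. `COMPLEMENT` is an illustrative name.

```python
# Complement table covering upper- and lowercase bases plus N
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base with one translate() call, then reverse by slicing."""
    return seq.strip().translate(COMPLEMENT)[::-1]


print(reverse_complement("AAAACCCGGT"))  # prints: ACCGGGTTTT
```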

Rabbits and Recurrence Relations (Fibonacci sequence)

Problem

A sequence is an ordered collection of objects (usually numbers), which are allowed to repeat. Sequences can be finite or infinite. Two examples are the finite sequence $(\pi, \sqrt{2}, 0, \pi)$ and the infinite sequence of odd numbers $(1, 3, 5, 7, \ldots)$. We use the notation $a_n$ to represent the $n$-th term of a sequence.

A recurrence relation is a way of defining the terms of a sequence with respect to the values of previous terms. In the case of Fibonacci’s rabbits from the introduction, any given month will contain the rabbits that were alive the previous month, plus any new offspring. A key observation is that the number of offspring in any month is equal to the number of rabbits that were alive two months prior. As a result, if $F_n$ represents the number of rabbit pairs alive after the $n$-th month, then we obtain the Fibonacci sequence having terms $F_n$ that are defined by the recurrence relation $F_n = F_{n-1} + F_{n-2}$ (with $F_1 = F_2 = 1$ to initiate the sequence). Although the sequence bears Fibonacci’s name, it was known to Indian mathematicians over two millennia ago.

When finding the $n$-th term of a sequence defined by a recurrence relation, we can simply use the recurrence relation to generate terms for progressively larger values of $n$. This problem introduces us to the computational technique of dynamic programming, which successively builds up solutions by using the answers to smaller cases.

Given: Positive integers $n \le 40$ and $k \le 5$.

Return: The total number of rabbit pairs that will be present after $n$ months, if we begin with 1 pair and in each generation, every pair of reproduction-age rabbits produces a litter of $k$ rabbit pairs (instead of only 1 pair).

When each adult pair produces 1 pair of young rabbits per month ($k = 1$):

$F_1 = 1, F_2 = 1, F_3 = F_2 + F_1$

Month	1	2	3	4	5	6
Rabbit pairs	1	1	2	3	5	8

When each adult pair produces 3 pairs of young rabbits per month ($k = 3$):

$F_1 = 1, F_2 = 1, F_3 = F_2 + F_1 \times 3$

Month	1	2	3	4	5	6
Rabbit pairs	1	1	4	7	19	40
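Both tables follow one general recurrence: the newborn pairs in month $n$ are $k$ times the pairs that were alive two months earlier. Written out for the sample value $k = 3$, it reproduces the second table term by term:

```latex
F_1 = F_2 = 1, \qquad F_n = F_{n-1} + k\,F_{n-2} \quad (n \ge 3)

\begin{aligned}
F_3 &= F_2 + 3F_1 = 1 + 3 = 4\\
F_4 &= F_3 + 3F_2 = 4 + 3 = 7\\
F_5 &= F_4 + 3F_3 = 7 + 12 = 19\\
F_6 &= F_5 + 3F_4 = 19 + 21 = 40
\end{aligned}
```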

Sample Dataset

5 3

Sample Output

19

def fibonacciRabbits(n, k):
    # Build the list bottom-up: F[i] holds the rabbit pairs alive in month i + 1
    F = [1, 1]
    generation = 2
    while generation < n:
        F.append(F[generation - 1] + F[generation - 2] * k)
        generation += 1
    return F[n - 1]


print(fibonacciRabbits(5, 3))

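def fibonacciRabbits(n, k):
    # Direct recursion: simple, but it recomputes the same subproblems
    # (exponential time), so it can take a long time for n close to 40
    if n <= 2:
        return 1
    else:
        return fibonacciRabbits(n - 1, k) + fibonacciRabbits(n - 2, k) * k


print(fibonacciRabbits(5, 3))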

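def fibonacciRabbits(n, k):
    # Generator yielding F_1, F_2, ..., F_n; a holds F_(i-1), b holds F_i.
    # Starting from F_0 = 0, F_1 = 1 makes n = 1 work as well.
    generation, a, b = 1, 0, 1
    while generation <= n:
        yield b
        a, b = b, a * k + b
        generation += 1


fib = fibonacciRabbits(5, 3)
print(list(fib)[-1])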
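The plain recursive version can be turned into top-down dynamic programming by caching results. A minimal sketch, assuming the standard-library `functools.lru_cache` decorator (the name `rabbit_pairs` is illustrative): each `(n, k)` pair is computed once, so the cost drops from exponential to linear in `n`, which is comfortably fast for $n \le 40$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def rabbit_pairs(n: int, k: int) -> int:
    """Rabbit pairs alive after n months if every adult pair produces k pairs per month."""
    if n <= 2:
        return 1
    # Same recurrence as the plain recursive version, but each (n, k) result
    # is cached, so every subproblem is computed only once
    return rabbit_pairs(n - 1, k) + k * rabbit_pairs(n - 2, k)


n, k = map(int, "5 3".split())  # the sample dataset line
print(rabbit_pairs(n, k))  # prints: 19
```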

Computing GC Content (GC content of a sequence)

Problem

The GC-content of a DNA string is given by the percentage of symbols in the string that are ‘C’ or ‘G’. For example, the GC-content of “AGCTATAG” is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
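Both claims in the paragraph above can be checked directly. A small sketch (function names are illustrative; returning 0.0 for an empty string is an assumption made here to avoid division by zero): "AGCTATAG" has 3 G/C symbols out of 8, and because complementing swaps G with C and A with T while reversing only reorders symbols, the reverse complement has the same G/C count.

```python
def gc_content(seq: str) -> float:
    """Percentage of symbols in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0  # assumption: treat an empty string as 0% GC
    return (seq.count("G") + seq.count("C")) / len(seq) * 100


def reverse_complement(seq: str) -> str:
    """Complement A<->T and C<->G, then reverse."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


example = "AGCTATAG"
print(gc_content(example))                      # prints: 37.5 (3 of 8 symbols)
# Complementing maps G/C onto C/G and reversing only reorders symbols,
# so the GC-content of the reverse complement is unchanged
print(gc_content(reverse_complement(example)))  # prints: 37.5
```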

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with ‘>’, followed by some labeling information. Subsequent lines contain the string itself; the first line to begin with ‘>’ indicates the label of the next string.
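The format described above can be parsed with a single loop. This is a sketch, not the author's reader: it accepts any iterable of lines (an open file or `text.splitlines()`), takes the first word after `>` as the label, concatenates the following sequence lines, and skips blank lines. Sequence lines that appear before any header are ignored, which is an assumption about how malformed input should be handled. `parse_fasta` is an illustrative name.

```python
def parse_fasta(lines):
    """Parse FASTA lines into a dict {label: sequence}.

    The label is the first word after '>'; all following non-blank lines up to
    the next '>' are concatenated into that record's sequence.
    """
    records = {}
    label = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            label = line[1:].split()[0]
            records[label] = []
        elif label is not None:  # sequence lines before any header are ignored
            records[label].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


text = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
"""
print(parse_fasta(text.splitlines()))
```

Because file objects iterate line by line, the same function works as `with open(path) as fh: records = parse_fasta(fh)`.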

In Rosalind’s implementation, a string in FASTA format will be labeled by the ID “Rosalind_xxxx”, where “xxxx” denotes a four-digit code between 0000 and 9999.

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.

Sample Dataset

>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

Sample Output

Rosalind_0808
60.919540


def gc_content(seq):
    """Return the GC content of seq as a percentage.

    :param seq: sequence
    :return: GC content (%)
    """
    GC = (seq.upper().count('G') + seq.upper().count('C')) / len(seq) * 100
    return GC


def read_fasta(fastaFile):
    """Read a FASTA file into a dict mapping each ID to its list of sequence lines."""
    aDict = {}
    with open(fastaFile, 'r', encoding='utf-8') as fh:
        for line in fh:
            if line.startswith('>'):
                key = line.split()[0][1:]   # ID = first word after '>'
                aDict[key] = []
            elif line.strip():              # skip blank lines
                aDict[key].append(line.strip())
    return aDict


if __name__ == '__main__':
    seqName = 'Rosalind_0808'
    fastaFile = "./data/test3.fa"
    fastaSeq = read_fasta(fastaFile)[seqName]
    GC = gc_content(''.join(fastaSeq))
    print(seqName)
    print('%.6f' % GC)

Rosalind_0808
60.919540

Finding the DNA string with the highest GC content

from operator import itemgetter


def gc_content(seq):
    """Return the GC content of seq as a percentage.

    :param seq: sequence
    :return: GC content (%)
    """
    GC = (seq.upper().count('G') + seq.upper().count('C')) / len(seq) * 100
    return GC


def read_fasta(fastaFile):
    """Read a FASTA file into a dict mapping each ID to its list of sequence lines."""
    seqDict = {}
    with open(fastaFile, 'r', encoding='utf-8') as fh:
        for line in fh:
            if line.startswith('>'):
                key = line.split()[0][1:]   # ID = first word after '>'
                seqDict[key] = []
            elif line.strip():              # skip blank lines
                seqDict[key].append(line.strip())
    return seqDict


if __name__ == '__main__':
    gcDict = {}
    fastaFile = "./data/test3.fa"
    seqDict = read_fasta(fastaFile)
    for key, val in seqDict.items():
        gcDict[key] = gc_content(''.join(val))
    # sort (ID, GC) pairs by GC content, highest first
    gcDictSort = sorted(gcDict.items(), key=itemgetter(1), reverse=True)
    name = gcDictSort[0][0]
    largeGC = gcDictSort[0][1]
    print(gcDictSort)
    print('%s : %.6f' % (name, largeGC))

[('Rosalind_0808', 60.91954022988506), ('Rosalind_6404', 53.75), ('Rosalind_5959', 53.57142857142857)]
Rosalind_0808 : 60.919540
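Only the top entry is needed, so sorting the whole dictionary is more work than necessary. A sketch using the built-in `max()` with a `key` function, which makes one pass over the records; the sample sequences are hard-coded here (with each record's lines already joined) so the example runs without `./data/test3.fa`. `highest_gc` and `records` are illustrative names, and Python 3.9+ is assumed for the type hints.

```python
def gc_content(seq: str) -> float:
    """GC content of seq as a percentage."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) * 100


def highest_gc(records: dict[str, str]) -> tuple[str, float]:
    """Return (label, GC%) for the record with the highest GC content."""
    # max() makes a single O(n) pass instead of sorting every entry
    best = max(records, key=lambda label: gc_content(records[label]))
    return best, gc_content(records[best])


# Sample dataset, with each record's sequence lines already joined
records = {
    "Rosalind_6404": "CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC"
                     "TCCCACTAATAATTCTGAGG",
    "Rosalind_5959": "CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT"
                     "ATATCCATTTGTCAGCAGACACGC",
    "Rosalind_0808": "CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC"
                     "TGGGAACCTGCGGGCAGTAGGTGGAAT",
}
name, gc = highest_gc(records)
print(name)         # prints: Rosalind_0808
print(f"{gc:.6f}")  # prints: 60.919540
```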

Working through ROSALIND problems to sharpen programming skills.
