Rosalind programming problem: finding open reading frames.

Open Reading Frames

Problem
Either strand of a DNA double helix can serve as the coding strand for RNA transcription. Hence, a given DNA string implies six total reading frames, or ways in which the same region of DNA can be translated into amino acids: three reading frames result from reading the string itself, whereas three more result from reading its reverse complement.

An open reading frame (ORF) is one which starts with the start codon and ends with a stop codon, without any other stop codons in between. Thus, a candidate protein string is derived by translating an open reading frame into amino acids until a stop codon is reached.

Given: A DNA string s of length at most 1 kbp in FASTA format.
Sample input:

>Rosalind_99
AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG

Return: Every distinct candidate protein string that can be translated from ORFs of s. Strings can be returned in any order.
Sample output:

MLLGSFRLIPKETLIQVAGSSPCNLS
M
MGMTPRLGLESLLE
MTPRLGLESLLE


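When working with biological sequence data, we often need to extract the open reading frames (ORFs) of a nucleotide sequence in order to predict gene structure and function. An ORF begins with the start codon ATG and ends at the first in-frame stop codon (TGA, TAA or TAG), that is, a stop codon lying a whole number of codons downstream of the ATG. The stretch from the start codon through that stop codon is the ORF we want to extract.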
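
To make the definition concrete, here is a minimal sketch of a hypothetical helper, `OrfCheck.isOrf` (not part of the original code; it assumes Java 9+ for `Set.of` and an uppercase A/C/G/T string). It tests whether a DNA string, read from its first base, is exactly one ORF. The string must start with ATG, have a length divisible by 3, end with a stop codon, and contain no earlier in-frame stop codon.

```java
import java.util.Set;

// Hypothetical helper illustrating the ORF definition; not part of the original program.
class OrfCheck {
    private static final Set<String> STOPS = Set.of("TAA", "TAG", "TGA");

    // True if dna is ATG ... stop, read in frame, with no earlier in-frame stop codon
    static boolean isOrf(String dna) {
        if (dna.length() < 6 || dna.length() % 3 != 0 || !dna.startsWith("ATG")) {
            return false;
        }
        int last = dna.length() - 3;
        if (!STOPS.contains(dna.substring(last))) {
            return false;
        }
        // Every codon before the final one must not be a stop codon
        for (int i = 0; i < last; i += 3) {
            if (STOPS.contains(dna.substring(i, i + 3))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isOrf("ATGTAG"));    // true: encodes "M"
        System.out.println(isOrf("ATGTAGTAA")); // false: stop codon TAG occurs before the end
        System.out.println(isOrf("ATGCTAG"));   // false: length is not a multiple of 3
    }
}
```

An internal ATG is allowed. For example, ATGGGGATGACCTAA is a single ORF that contains a second, nested ORF starting at its third codon.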

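Note in particular that DNA is double-stranded. Either strand (the sense strand or the antisense strand) can serve as the coding strand, so ORFs must be searched for on both strands separately.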
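
The six reading frames mentioned in the problem statement can be listed directly. They are offsets 0, 1 and 2 on the string itself, plus the same three offsets on its reverse complement. The sketch below assumes an uppercase A/C/G/T string; `SixFrames` and its methods are illustrative names, not taken from the original post.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: list the six reading frames of a DNA string as codon lists.
class SixFrames {
    // Reverse complement of an uppercase A/C/G/T string
    static String revComp(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default: throw new IllegalArgumentException("not a DNA base: " + s.charAt(i));
            }
        }
        return sb.toString();
    }

    // Complete codons of one frame starting at offset 0, 1 or 2 (a trailing partial codon is dropped)
    static List<String> frame(String s, int offset) {
        List<String> codons = new ArrayList<>();
        for (int i = offset; i + 3 <= s.length(); i += 3) {
            codons.add(s.substring(i, i + 3));
        }
        return codons;
    }

    // Frames 0-2 come from the string itself, frames 3-5 from its reverse complement
    static List<List<String>> sixFrames(String s) {
        String rc = revComp(s);
        List<List<String>> frames = new ArrayList<>();
        for (int offset = 0; offset < 3; offset++) frames.add(frame(s, offset));
        for (int offset = 0; offset < 3; offset++) frames.add(frame(rc, offset));
        return frames;
    }

    public static void main(String[] args) {
        List<List<String>> frames = sixFrames("ATGCCCTAG");
        for (int f = 0; f < frames.size(); f++) {
            System.out.println("frame " + f + ": " + frames.get(f));
        }
    }
}
```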

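The approach:
1. Read the nucleotide sequence and compute its reverse complement.
For reverse complements, see the earlier post: Rosalind Java| Complementing a Strand of DNA.
2. Scan the sequence for start and stop codons, and take the sequence between them as a candidate ORF.
3. Translate each candidate ORF into a protein.
For translation, see the earlier post: Rosalind Java| Translating RNA into Protein.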
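
Rosalind supplies the input in FASTA format, and the sequence may be wrapped over several lines. Here is a minimal sketch of step 1 under that assumption. The hypothetical helper `FastaInput.readSequence` skips header lines that start with `>` and joins the remaining lines in upper case. It assumes a single record; if there were several records, it would simply concatenate them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper for step 1: read one FASTA record and return its sequence.
class FastaInput {
    // Skips '>' header lines and blank lines, joins the (possibly wrapped) sequence lines
    static String readSequence(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith(">")) {
                continue;
            }
            seq.append(line.toUpperCase());
        }
        return seq.toString();
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">Rosalind_99\nAGCCATGTAG\nCTAACTCAGG\n";
        System.out.println(readSequence(new StringReader(fasta))); // AGCCATGTAGCTAACTCAGG
    }
}
```

Reading from standard input works the same way, with new InputStreamReader(System.in) in place of the StringReader.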

The code follows.


Helper method 1: reverse complement

See the earlier post for details; the code is given directly here.

    // Reverse complement, part 1: complement each base
    public static String CompleDNA(String s) {
        // A StringBuilder collects the complemented bases
        StringBuilder arr = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            // s.charAt(i) is a char, so the case labels must be char literals in single quotes
            switch (s.charAt(i)) {
                case 'A': case 'a': arr.append('T'); break;
                case 'C': case 'c': arr.append('G'); break;
                case 'G': case 'g': arr.append('C'); break;
                case 'T': case 't': arr.append('A'); break;
                default:
                    System.out.println("Position " + (i + 1) + " is not A, C, G or T");
                    break;
            }
        }
        return arr.toString();
    }

    // Reverse complement, part 2: reverse the string
    // (StringBuilder.reverse avoids the quadratic cost of repeated String concatenation)
    public static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
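
As an alternative to calling CompleDNA(reverse(s)), the reverse complement can be built in a single pass by walking the string from its last base. This is a sketch with a hypothetical name, `RevComp.revComp`. Unlike CompleDNA, it throws on a character other than A/C/G/T (either case) instead of printing a message and skipping it. A skipped base would otherwise shift every later position and silently change the reading frames.

```java
// Hypothetical one-pass reverse complement; rejects anything other than A/C/G/T (any case).
class RevComp {
    static String revComp(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // Read from the last base so the complement comes out already reversed
        for (int i = s.length() - 1; i >= 0; i--) {
            char c = Character.toUpperCase(s.charAt(i));
            switch (c) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:
                    throw new IllegalArgumentException("position " + (i + 1) + " is not A, C, G or T");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(revComp("AGCCATGTAG")); // CTACATGGCT
    }
}
```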

Helper method 2: translating RNA into protein

Again, the code is given directly.

    // Translate RNA to protein
    public static String RNA2protein(String line) {
        StringBuilder arr = new StringBuilder();
        // Walk the sequence one codon (three bases) at a time;
        // "i + 3 <= length" guards against a trailing partial codon
        for (int i = 0; i + 3 <= line.length(); i += 3) {
            // substring(begin, end) takes the half-open range [begin, end)
            String pro = line.substring(i, i + 3);
            switch (pro) {
                case "AUA": case "AUC": case "AUU": arr.append('I'); break;
                case "AUG": arr.append('M'); break;
                case "ACA": case "ACC": case "ACG": case "ACU": arr.append('T'); break;
                case "AAC": case "AAU": arr.append('N'); break;
                case "AAA": case "AAG": arr.append('K'); break;
                case "AGC": case "AGU": arr.append('S'); break;
                case "AGA": case "AGG": arr.append('R'); break;
                case "CUA": case "CUC": case "CUG": case "CUU": arr.append('L'); break;
                case "CCA": case "CCC": case "CCG": case "CCU": arr.append('P'); break;
                case "CAC": case "CAU": arr.append('H'); break;
                case "CAA": case "CAG": arr.append('Q'); break;
                case "CGA": case "CGC": case "CGG": case "CGU": arr.append('R'); break;
                case "GUA": case "GUC": case "GUG": case "GUU": arr.append('V'); break;
                case "GCA": case "GCC": case "GCG": case "GCU": arr.append('A'); break;
                case "GAC": case "GAU": arr.append('D'); break;
                case "GAA": case "GAG": arr.append('E'); break;
                case "GGA": case "GGC": case "GGG": case "GGU": arr.append('G'); break;
                case "UCA": case "UCC": case "UCG": case "UCU": arr.append('S'); break;
                case "UUC": case "UUU": arr.append('F'); break;
                case "UUA": case "UUG": arr.append('L'); break;
                case "UAC": case "UAU": arr.append('Y'); break;
                case "UGC": case "UGU": arr.append('C'); break;
                case "UGG": arr.append('W'); break;
                // A stop codon ends translation
                case "UAA": case "UAG": case "UGA": return arr.toString();
                default: break;
            }
        }
        return arr.toString();
    }
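
The 64-case switch can also be replaced by a lookup string. The sketch below numbers the bases T/U=0, C=1, A=2, G=3. Codon xyz then sits at index 16x + 4y + z of a 64-character string holding the standard genetic code, with * marking the stop codons. `CodonTable.translate` is a hypothetical name. It accepts DNA or RNA, stops at the first stop codon, and ignores a trailing partial codon.

```java
// Hypothetical lookup-string version of the standard genetic code (not the author's code).
class CodonTable {
    // Bases ordered T/U, C, A, G; '*' marks the stop codons TAA, TAG and TGA
    private static final String AMINO =
        "FFLLSSSSYY**CC*W" + // TTx, TCx, TAx, TGx
        "LLLLPPPPHHQQRRRR" + // CTx, CCx, CAx, CGx
        "IIIMTTTTNNKKSSRR" + // ATx, ACx, AAx, AGx
        "VVVVAAAADDEEGGGG";  // GTx, GCx, GAx, GGx

    private static int baseIndex(char c) {
        switch (c) {
            case 'T': case 'U': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default: throw new IllegalArgumentException("unexpected base: " + c);
        }
    }

    // Translates codon by codon until a stop codon or the end of the sequence
    static String translate(String seq) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int idx = 16 * baseIndex(seq.charAt(i))
                    + 4 * baseIndex(seq.charAt(i + 1))
                    + baseIndex(seq.charAt(i + 2));
            char aa = AMINO.charAt(idx);
            if (aa == '*') {
                break;
            }
            protein.append(aa);
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        // Sample from the Rosalind "Translating RNA into Protein" problem
        System.out.println(translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA")); // MAMAPRTEINSTRING
    }
}
```

Because T and U map to the same index, this version could skip the transDNA.replace('T', 'U') step entirely.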

main method: scanning for and extracting ORFs

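A quick outline of how the ORFs are found:
1. Let i walk through the sequence one base at a time, reading a 3-base window at each position and checking whether it is "ATG". Stepping by one base covers all three reading frames of the strand.

2. When a start codon is found, move forward from it one codon (3 bases) at a time, checking each codon for a stop codon (TAA, TAG or TGA), so that the search stays in the same reading frame.

3. Take the sequence from the start codon through the first in-frame stop codon and translate it into protein. If no in-frame stop codon follows, that ATG does not start an ORF and nothing is recorded.

4. Repeat the above for both the sense strand and the antisense strand (the reverse complement), store the proteins in a set, and finally print the set.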
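
Putting the four steps together, here is a compact, self-contained sketch. It assumes an uppercase A/C/G/T string with the FASTA header already removed. It uses a lookup-string codon table instead of the switch, and a LinkedHashSet so that the output order is reproducible. `OrfFinder` and its methods are hypothetical names, not the author's code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, self-contained sketch of the four ORF-search steps; not the original program.
class OrfFinder {
    // Standard genetic code indexed by 16*b1 + 4*b2 + b3 with T=0, C=1, A=2, G=3; '*' = stop
    private static final String AMINO =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    private static int baseIndex(char c) {
        switch (c) {
            case 'T': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default: throw new IllegalArgumentException("unexpected base: " + c);
        }
    }

    // Amino acid (or '*') encoded by the codon that starts at position i
    private static char aminoAt(String s, int i) {
        return AMINO.charAt(16 * baseIndex(s.charAt(i))
                + 4 * baseIndex(s.charAt(i + 1))
                + baseIndex(s.charAt(i + 2)));
    }

    private static String revComp(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            // "AGTC" holds the complements of T, C, A, G in baseIndex order
            sb.append("AGTC".charAt(baseIndex(s.charAt(i))));
        }
        return sb.toString();
    }

    // Steps 1-3 on one strand: every ATG (at any offset) starts a candidate, read codon by
    // codon until the first in-frame stop; an ATG without a downstream stop yields nothing
    private static void scanStrand(String s, Set<String> out) {
        for (int start = 0; start + 3 <= s.length(); start++) {
            if (!s.startsWith("ATG", start)) {
                continue;
            }
            StringBuilder protein = new StringBuilder();
            for (int i = start; i + 3 <= s.length(); i += 3) {
                char aa = aminoAt(s, i);
                if (aa == '*') {
                    out.add(protein.toString());
                    break;
                }
                protein.append(aa);
            }
        }
    }

    // Step 4: both strands feed a single set, so a protein found twice is kept once
    static Set<String> candidateProteins(String dna) {
        Set<String> out = new LinkedHashSet<>();
        scanStrand(dna, out);
        scanStrand(revComp(dna), out);
        return out;
    }

    public static void main(String[] args) {
        String s = "AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG";
        for (String p : candidateProteins(s)) {
            System.out.println(p);
        }
    }
}
```

For the sample dataset, this prints M, MGMTPRLGLESLLE, MTPRLGLESLLE and MLLGSFRLIPKETLIQVAGSSPCNLS. The first three come from the forward strand and the last from the reverse complement. The reverse complement also yields a second "M", which the set absorbs.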

Below, the helper methods are joined with the main method into a complete program (all helper methods included).

import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class Open_Reading_Frames {
    public static void main(String[] args) {
        // 1. Read the input. Rosalind gives FASTA, so header lines starting with '>' are
        //    skipped and wrapped sequence lines are joined until end of input (EOF).
        Scanner sc = new Scanner(System.in);
        // The prompt goes to stderr so that stdout contains only the answer
        System.err.println("Enter the sequence in FASTA format, then end the input (EOF):");
        StringBuilder input = new StringBuilder();
        while (sc.hasNextLine()) {
            String l = sc.nextLine().trim();
            if (!l.isEmpty() && !l.startsWith(">")) {
                input.append(l.toUpperCase());
            }
        }
        String line = input.toString();
        String revline = CompleDNA(reverse(line));
        // A HashSet keeps each distinct protein only once
        Set<String> subPro = new HashSet<>();
        // 2. Search the sense strand for ORFs
        findORFs(line, subPro);
        // 3. Search the antisense strand (reverse complement) for ORFs
        findORFs(revline, subPro);
        for (String s : subPro) {
            System.out.println(s);
        }
    }

    // Find every ORF on one strand and add its translation to subPro
    public static void findORFs(String line, Set<String> subPro) {
        // Step of 1 base when looking for ATG, so all three frames of this strand are covered
        for (int i = 0; i + 3 <= line.length(); i++) {
            String forwindex = line.substring(i, i + 3);
            if (forwindex.equals("ATG")) {
                // Step of 3 bases (one codon) to stay in frame; "k + 3 <= length" also
                // checks a stop codon sitting at the very end of the sequence
                for (int k = i; k + 3 <= line.length(); k += 3) {
                    String revindex = line.substring(k, k + 3);
                    if (revindex.equals("TGA") || revindex.equals("TAA") || revindex.equals("TAG")) {
                        String transDNA = line.substring(i, k + 3);
                        // Transcribe (T -> U), then translate
                        String RNA = transDNA.replace('T', 'U');
                        subPro.add(RNA2protein(RNA));
                        // Only the first in-frame stop codon ends the ORF
                        break;
                    }
                }
            }
        }
    }

    // Translate RNA to protein
    public static String RNA2protein(String line) {
        StringBuilder arr = new StringBuilder();
        // Walk the sequence one codon (three bases) at a time
        for (int i = 0; i + 3 <= line.length(); i += 3) {
            // substring(begin, end) takes the half-open range [begin, end)
            String pro = line.substring(i, i + 3);
            switch (pro) {
                case "AUA": case "AUC": case "AUU": arr.append('I'); break;
                case "AUG": arr.append('M'); break;
                case "ACA": case "ACC": case "ACG": case "ACU": arr.append('T'); break;
                case "AAC": case "AAU": arr.append('N'); break;
                case "AAA": case "AAG": arr.append('K'); break;
                case "AGC": case "AGU": arr.append('S'); break;
                case "AGA": case "AGG": arr.append('R'); break;
                case "CUA": case "CUC": case "CUG": case "CUU": arr.append('L'); break;
                case "CCA": case "CCC": case "CCG": case "CCU": arr.append('P'); break;
                case "CAC": case "CAU": arr.append('H'); break;
                case "CAA": case "CAG": arr.append('Q'); break;
                case "CGA": case "CGC": case "CGG": case "CGU": arr.append('R'); break;
                case "GUA": case "GUC": case "GUG": case "GUU": arr.append('V'); break;
                case "GCA": case "GCC": case "GCG": case "GCU": arr.append('A'); break;
                case "GAC": case "GAU": arr.append('D'); break;
                case "GAA": case "GAG": arr.append('E'); break;
                case "GGA": case "GGC": case "GGG": case "GGU": arr.append('G'); break;
                case "UCA": case "UCC": case "UCG": case "UCU": arr.append('S'); break;
                case "UUC": case "UUU": arr.append('F'); break;
                case "UUA": case "UUG": arr.append('L'); break;
                case "UAC": case "UAU": arr.append('Y'); break;
                case "UGC": case "UGU": arr.append('C'); break;
                case "UGG": arr.append('W'); break;
                // A stop codon ends translation
                case "UAA": case "UAG": case "UGA": return arr.toString();
                default: break;
            }
        }
        return arr.toString();
    }

    // Reverse complement, part 1: complement each base
    public static String CompleDNA(String s) {
        StringBuilder arr = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            // s.charAt(i) is a char, so the case labels must be char literals in single quotes
            switch (s.charAt(i)) {
                case 'A': case 'a': arr.append('T'); break;
                case 'C': case 'c': arr.append('G'); break;
                case 'G': case 'g': arr.append('C'); break;
                case 'T': case 't': arr.append('A'); break;
                default:
                    System.out.println("Position " + (i + 1) + " is not A, C, G or T");
                    break;
            }
        }
        return arr.toString();
    }

    // Reverse complement, part 2: reverse the string
    public static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}

Addendum: using HashSet to avoid duplicate output from the two strands

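Java's collection classes provide containers whose size can grow or shrink as needed. Within the Collection hierarchy, a List may contain duplicate elements, whereas a Set stores only distinct elements. HashSet is one such Set, so the proteins from the sense and antisense strands can all be added to the same HashSet. A translation that has already been stored is simply not added again, whether it came from the other strand or from another position on the same strand.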
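
Here is a small illustration of the difference, assuming Java 9+ for List.of. The protein strings are copied from the sample output. "M" is listed twice because the sample input contains an ATG immediately followed by a stop codon on each strand. `DedupDemo.dedup` is a hypothetical helper.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo: a List keeps duplicates, a HashSet drops them.
class DedupDemo {
    // Copies the proteins into a HashSet; Set.add ignores (returns false for) existing elements
    static Set<String> dedup(List<String> proteins) {
        Set<String> set = new HashSet<>();
        for (String p : proteins) {
            set.add(p);
        }
        return set;
    }

    public static void main(String[] args) {
        List<String> found = List.of("M", "MGMTPRLGLESLLE", "MTPRLGLESLLE",
                "MLLGSFRLIPKETLIQVAGSSPCNLS", "M");
        Set<String> unique = dedup(found);
        System.out.println(found.size() + " found, " + unique.size() + " distinct"); // 5 found, 4 distinct
    }
}
```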

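There are, however, a few things to keep in mind when using HashSet:
HashSet does not keep its elements in any particular order. This makes it awkward for problems whose output must follow a specific order or report index positions. If you need output tied to specific indices, consider a List instead.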
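
You may need a defined order while still removing duplicates. Besides the List approach mentioned above, two other standard Set implementations can help. LinkedHashSet keeps elements in the order they were first inserted, and TreeSet keeps them sorted. A minimal sketch follows; the class and method names are illustrative, and List.of assumes Java 9+.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch: deduplicate while controlling the output order.
class OrderedSets {
    // Duplicates removed, first-occurrence order kept
    static List<String> inInsertionOrder(List<String> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    // Duplicates removed, natural (lexicographic) String order
    static List<String> sorted(List<String> items) {
        return new ArrayList<>(new TreeSet<>(items));
    }

    public static void main(String[] args) {
        List<String> found = List.of("MTPRLGLESLLE", "M", "MGMTPRLGLESLLE", "M");
        System.out.println(inInsertionOrder(found)); // [MTPRLGLESLLE, M, MGMTPRLGLESLLE]
        System.out.println(sorted(found));           // [M, MGMTPRLGLESLLE, MTPRLGLESLLE]
    }
}
```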
