Rosalind programming problem: finding a motif shared by two sequences, where the motif is interrupted by introns.

Finding a Spliced Motif

Problem:
A subsequence of a string is a collection of symbols contained in order (though not necessarily contiguously) in the string (e.g., ACG is a subsequence of TATGCTAAGATC). The indices of a subsequence are the positions in the string at which the symbols of the subsequence appear; thus, the indices of ACG in TATGCTAAGATC can be represented by (2, 5, 9).

As a substring can have multiple locations, a subsequence can have multiple collections of indices, and the same index can be reused in more than one appearance of the subsequence; for example, ACG is a subsequence of AACCGGTT in 8 different ways.
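To make the "8 different ways" claim concrete, here is a minimal dynamic-programming sketch that counts how many index collections spell t as a subsequence of s. The class and method names (SubsequenceCount, countWays) are hypothetical and not part of the original solution; it assumes inputs short enough that the count fits in a long.

```java
// Hypothetical helper (not from the original post): counts the index collections
// at which t appears as a subsequence of s.
public class SubsequenceCount {
    static long countWays(String s, String t) {
        // ways[k] = number of ways to spell the first k symbols of t
        // using only the part of s scanned so far
        long[] ways = new long[t.length() + 1];
        ways[0] = 1; // the empty prefix can be spelled in exactly one way
        for (int i = 0; i < s.length(); i++) {
            // iterate k downward so that position i is used at most once per collection
            for (int k = t.length(); k >= 1; k--) {
                if (s.charAt(i) == t.charAt(k - 1)) {
                    ways[k] += ways[k - 1];
                }
            }
        }
        return ways[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(countWays("AACCGGTT", "ACG"));     // 8
        System.out.println(countWays("TATGCTAAGATC", "ACG")); // 1, namely (2, 5, 9)
    }
}
```

The 8 comes from 2 choices of A × 2 choices of C × 2 choices of G in AACCGGTT, since every A precedes every C and every C precedes every G.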

Given: Two DNA strings s and t (each of length at most 1 kbp) in FASTA format.
Sample input

>Rosalind_14
ACGTACGTGACG
>Rosalind_18
GTA

Return: One collection of indices of s in which the symbols of t appear as a subsequence of s. If multiple solutions exist, you may return any one.
Sample output

3 8 10
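Because any valid collection is accepted, it helps to be able to check an answer directly. The sketch below (hypothetical names SplicedMotifCheck and isValid) accepts a 1-based index list if it is strictly increasing, stays inside s, and spells out t. It shows that both the sample answer and the answer produced by a left-to-right scan are valid.

```java
// Hypothetical checker (not from the original post): is idx a valid answer for (s, t)?
public class SplicedMotifCheck {
    static boolean isValid(String s, String t, int[] idx) {
        if (idx.length != t.length()) return false; // exactly one index per symbol of t
        int prev = 0;                               // indices must be strictly increasing
        for (int k = 0; k < idx.length; k++) {
            int p = idx[k];
            if (p <= prev || p > s.length()) return false;
            if (s.charAt(p - 1) != t.charAt(k)) return false; // 1-based -> 0-based
            prev = p;
        }
        return true;
    }

    public static void main(String[] args) {
        String s = "ACGTACGTGACG", t = "GTA";
        System.out.println(isValid(s, t, new int[]{3, 8, 10})); // true (sample output)
        System.out.println(isValid(s, t, new int[]{3, 4, 5}));  // true (leftmost matches)
        System.out.println(isValid(s, t, new int[]{3, 8, 1}));  // false (not increasing)
    }
}
```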


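The problem gives two sequences, and we need to find, in the longer one, the positions of all the bases of the shorter one. You can also think of the short sequence as the CDS of the long one: the long sequence still contains introns, and we need to locate the indices of the CDS bases within it. (The answer is not unique.)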

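The approach is as follows:
1. Read the two sequences.
2. Traverse the long and the short sequence with two pointers, one per sequence; the pointer in the long sequence advances at every step.
3. When the two bases are equal, output the (1-based) index of that base in the long sequence and advance the pointer in the short sequence.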
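The three steps above can be sketched without any file handling as a single function. This is a minimal sketch, not the post's program: the names SplicedMotifGreedy and splicedMotif are hypothetical, and it returns null when t is not a subsequence of s instead of reading past the end of s.

```java
import java.util.Arrays;

// Hypothetical standalone version of the two-pointer scan.
public class SplicedMotifGreedy {
    // Returns the 1-based indices of t's symbols in s, or null if t is not a subsequence of s.
    static int[] splicedMotif(String s, String t) {
        int[] idx = new int[t.length()];
        int j = 0; // next symbol of t to match
        for (int i = 0; i < s.length() && j < t.length(); i++) { // i always advances
            if (s.charAt(i) == t.charAt(j)) {
                idx[j] = i + 1; // record the 1-based position
                j++;            // j advances only on a match
            }
        }
        return j == t.length() ? idx : null;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splicedMotif("ACGTACGTGACG", "GTA"))); // [3, 4, 5]
        System.out.println(Arrays.toString(splicedMotif("ACGT", "TA")));          // null
    }
}
```

On the sample it prints 3 4 5 rather than 3 8 10; both are accepted. Taking the leftmost match each time is safe because it leaves the longest possible remainder of s for the rest of t.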

Here is the implementation:

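import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

public class Finding_a_Spliced_Motif {
    public static void main(String[] args) {
        ArrayList<String> fasta = BufferedReader2("C:/Users/Administrator/Desktop/rosalind_sseq.txt", "fasta");
        String s = fasta.get(0); // first sequence: the main sequence
        String t = fasta.get(1); // second sequence: the subsequence
        ArrayList<Integer> index = new ArrayList<>();
        // Two-pointer method
        int i = 0; // pointer into the main sequence
        int j = 0; // pointer into the subsequence
        // Stop when every base of t is matched or s is exhausted (never reads past the end of s)
        while (i < s.length() && j < t.length()) {
            if (t.charAt(j) == s.charAt(i)) {
                index.add(i + 1); // Rosalind uses 1-based indices
                j++;              // the subsequence pointer advances only on a match
            }
            i++;                  // the main-sequence pointer always advances
        }
        if (j < t.length()) {
            System.out.println("t is not a subsequence of s");
            return;
        }
        for (int k = 0; k < index.size(); k++) {
            System.out.print(index.get(k) + " ");
        }
    }

    // Reads a FASTA file. Returns the header lines when choose is "tag",
    // otherwise the sequences with line breaks removed.
    public static ArrayList<String> BufferedReader2(String path, String choose) {
        ArrayList<String> tag = new ArrayList<>();
        ArrayList<String> fasta = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            StringBuilder sb = new StringBuilder();
            String line = reader.readLine();
            while (line != null) {
                // Header lines start with ">"
                if (line.startsWith(">")) {
                    tag.add(line);
                    // Save the previous record's sequence, with line breaks removed
                    if (sb.length() != 0) {
                        fasta.add(sb.toString());
                        sb.setLength(0); // clear the StringBuilder
                    }
                } else {
                    sb.append(line.trim()); // append this line of sequence
                }
                line = reader.readLine(); // read the next line
            }
            fasta.add(sb.toString()); // sequence of the last record
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (choose.equals("tag")) {
            return tag;
        }
        return fasta;
    }
}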
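BufferedReader2 only reads from a file path, which makes it awkward to try on the sample input. One option, sketched here under the assumption that only the sequences are needed, is to parse from any java.io.Reader so a StringReader can stand in for the file. FastaFromReader and readSequences are hypothetical names, not part of the original program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of BufferedReader2 that accepts any Reader.
public class FastaFromReader {
    static List<String> readSequences(Reader in) throws IOException {
        List<String> seqs = new ArrayList<>();
        StringBuilder sb = null; // stays null until the first header line
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.startsWith(">")) {
                    if (sb != null) seqs.add(sb.toString()); // close the previous record
                    sb = new StringBuilder();
                } else if (sb != null) {
                    sb.append(line); // join wrapped sequence lines
                }
            }
        }
        if (sb != null) seqs.add(sb.toString()); // close the last record
        return seqs;
    }

    public static void main(String[] args) throws IOException {
        String sample = ">Rosalind_14\nACGTACG\nTGACG\n>Rosalind_18\nGTA\n";
        System.out.println(readSequences(new StringReader(sample))); // [ACGTACGTGACG, GTA]
    }
}
```

With a file, the same method would be called as readSequences(new FileReader(path)).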

Two-pointer method

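The core idea of the two-pointer method is that, instead of walking through an array or collection with a single pointer, you scan it with two pointers that move in the same direction or in opposite directions. Many two-pointer techniques exploit some ordering in the data (a sorted array is the classic case) so that each pointer only ever moves one way and the whole scan is linear. Here the ordering that matters is that a subsequence keeps its symbols in left-to-right order, so both pointers move only forward. The key to implementing the method is getting the advance rule and the termination condition right. In this problem, the pointer i into the main sequence advances at every step; the pointer j into the subsequence advances only when the two bases are equal, fasta.get(1).charAt(j) == fasta.get(0).charAt(i); and the loop terminates when j reaches the end of the subsequence (every base matched) or i reaches the end of the main sequence.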
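This problem only needs pointers moving in the same direction. For contrast, here is a hedged sketch of the opposite-direction variant on DNA, which is not part of the original solution: checking whether a string is a reverse palindrome (equal to its reverse complement). The names ReversePalindrome, complement and isReversePalindrome are hypothetical, and the input is assumed to contain only A, C, G and T.

```java
// Hypothetical example (not from the original post): opposite-direction two pointers.
public class ReversePalindrome {
    // Complement of a single DNA base.
    static char complement(char c) {
        switch (c) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("not a DNA base: " + c);
        }
    }

    static boolean isReversePalindrome(String dna) {
        // The middle base of an odd-length string would have to be its own complement, which never happens
        if (dna.length() % 2 != 0) return false;
        int lo = 0, hi = dna.length() - 1; // lo moves right, hi moves left
        while (lo < hi) {                  // terminate when the pointers meet
            if (dna.charAt(lo) != complement(dna.charAt(hi))) return false;
            lo++;
            hi--;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isReversePalindrome("GCATGC")); // true
        System.out.println(isReversePalindrome("GTA"));    // false
    }
}
```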
