1. Move the index from cloud/adam to xubo/ref:

hadoop@Master:~/cloud/adam/xubo/data/test20160310$ mkdir -p ~/xubo/ref/GRCH38Index/
hadoop@Master:~/cloud/adam/xubo/data/test20160310$ mv GCA_000001405.15_GRCh38/* ~/xubo/ref/GRCH38Index/
hadoop@Master:~/cloud/adam/xubo/data/test20160310$ cd ~/xubo/ref/GRCH38Index/
hadoop@Master:~/xubo/ref/GRCH38Index$ ls
createFastqBywgsim.sh   GCA_000001405.15_GRCh38_full_analysis_set.fna      GCA_000001405.15_GRCh38_full_analysis_set.fna.ann  GCA_000001405.15_GRCh38_full_analysis_set.fna.pac
createFastqBywgsim.txt  GCA_000001405.15_GRCh38_full_analysis_set.fna.alt  GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt  GCA_000001405.15_GRCh38_full_analysis_set.fna.sa
fastq                   GCA_000001405.15_GRCh38_full_analysis_set.fna.amb  GCA_000001405.15_GRCh38_full_analysis_set.fna.fai

2. Create the directory on every node:

mkdir -p ~/xubo/ref/
ssh Mcnode2
mkdir -p ~/xubo/ref/
ssh Mcnode3
mkdir -p ~/xubo/ref/
ssh Mcnode4
mkdir -p ~/xubo/ref/
ssh Mcnode5
mkdir -p ~/xubo/ref/
ssh Mcnode6
mkdir -p ~/xubo/ref/
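The commands above log in to each node in turn and repeat the same mkdir. A minimal sketch of the same step as a loop follows. It assumes passwordless ssh from Master to the nodes named above; the NODES variable, the make_ref_dirs function, and the DRY_RUN switch are hypothetical names introduced for illustration. By default it only prints what it would run.

```shell
# Hypothetical node list, taken from the hostnames used in the step above.
NODES="Mcnode2 Mcnode3 Mcnode4 Mcnode5 Mcnode6"

# Print (DRY_RUN=1, the default here) or execute the mkdir on every node.
make_ref_dirs() {
    for node in $NODES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # The tilde is kept literal so it expands on the remote side.
            echo "ssh $node mkdir -p ~/xubo/ref/"
        else
            ssh "$node" 'mkdir -p ~/xubo/ref/'
        fi
    done
}

make_ref_dirs
```

Setting DRY_RUN=0 before calling make_ref_dirs runs the commands for real. Running each ssh as a one-shot command also avoids nesting one login session inside another.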

3. Distribute the index to every node:

hadoop@Master:~/xubo/ref$ dispatch.sh GRCH38Index/

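This is fairly time-consuming: in the run below, each copy of the index (about 8,460 MB) takes roughly 13 minutes at around 10 MB/s.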
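The contents of dispatch.sh are not shown in this post. As a rough sketch of what such a script might do, the function below loops over a list of nodes and copies a path to the same working directory on each one with scp -r. The name dispatch_sketch, its arguments, and the DRY_RUN switch are assumptions for illustration, not the author's script, and it assumes the remote nodes use the same directory layout as Master.

```shell
# Hypothetical stand-in for dispatch.sh; the real script is not shown in the post.
# Usage: dispatch_sketch <path> <node>...
# With DRY_RUN=1 (the default here) it only prints the scp commands.
dispatch_sketch() {
    src=$1
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp -r $src $node:$PWD/"
        else
            # Copy into the same working directory on the remote node.
            scp -r "$src" "$node:$PWD/"
        fi
    done
}

dispatch_sketch GRCH38Index/ Mcnode2 Mcnode3
```

Because the loop copies to one node at a time, total time grows linearly with the number of nodes.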

hadoop@Master:~/xubo/ref$ dispatch.sh GRCH38Index/
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann                                                                                                            100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                                                                                                                                        100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac                                                                                                            100%  765MB  10.8MB/s   01:11
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa                                                                                                             100% 1530MB  10.5MB/s   02:26
GCA_000001405.15_GRCh38_full_analysis_set.fna                                                                                                                100% 3105MB  10.7MB/s   04:50
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb                                                                                                            100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt                                                                                                            100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai                                                                                                            100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt                                                                                                            100% 3061MB  10.6MB/s   04:49
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann                                                                                                            100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                                                                                                                                        100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac                                                                                                            100%  765MB  10.5MB/s   01:13
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa                                                                                                             100% 1530MB  10.7MB/s   02:23
GCA_000001405.15_GRCh38_full_analysis_set.fna                                                                                                                100% 3105MB  10.7MB/s   04:50
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb                                                                                                            100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt                                                                                                            100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai                                                                                                            100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt                                                                                                            100% 3061MB  10.3MB/s   04:57
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann                                                                                                            100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                                                                                                                                        100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac                                                                                                            100%  765MB  10.9MB/s   01:10
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa                                                                                                             100% 1530MB   8.3MB/s   03:04
GCA_000001405.15_GRCh38_full_analysis_set.fna                                                                                                                100% 3105MB   9.9MB/s   05:13
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb                                                                                                            100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt                                                                                                            100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai                                                                                                            100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt                                                                                                            100% 3061MB  10.3MB/s   04:58
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann                                                                                                            100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                                                                                                                                        100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac                                                                                                            100%  765MB  10.9MB/s   01:10
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa                                                                                                             100% 1530MB  10.1MB/s   02:32
GCA_000001405.15_GRCh38_full_analysis_set.fna                                                                                                                100% 3105MB   9.7MB/s   05:20
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb                                                                                                            100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt                                                                                                            100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai                                                                                                            100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt                                                                                                            100% 3061MB  10.4MB/s   04:54
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann                                                                                                            100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                                                                                                                                        100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac                                                                                                            100%  765MB  10.8MB/s   01:11
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa                                                                                                             100% 1530MB  10.8MB/s   02:22
GCA_000001405.15_GRCh38_full_analysis_set.fna                                                                                                                100% 3105MB  10.0MB/s   05:11
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb                                                                                                            100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt                                                                                                            100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai                                                                                                            100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt                                                                                                            100% 3061MB  10.9MB/s   04:42
GCA_000001405.15_GRCh38_full_analysis_set.fna.ann                                                                                                            100%   72KB  71.7KB/s   00:00
createFastqBywgsim.sh                                                                                                                                        100%  541     0.5KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.pac                                                                                                            100%  765MB  10.6MB/s   01:12
GCA_000001405.15_GRCh38_full_analysis_set.fna.sa                                                                                                             100% 1530MB  10.4MB/s   02:27
GCA_000001405.15_GRCh38_full_analysis_set.fna                                                                                                                100% 3105MB   9.8MB/s   05:17
GCA_000001405.15_GRCh38_full_analysis_set.fna.amb                                                                                                            100%   20KB  19.7KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.alt                                                                                                            100%  214KB 214.2KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.fai                                                                                                            100%   19KB  19.0KB/s   00:00
GCA_000001405.15_GRCh38_full_analysis_set.fna.bwt                                                                                                            100% 3061MB   9.7MB/s   05:15
hadoop@Master:~/xubo/ref$ mv GCA_000001405.15_GRCh38/* ~/xubo/ref/GRCH38Index/
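The transfer log above is enough for a back-of-the-envelope estimate of the copy time. The sketch below sums the sizes of the four large files as reported in the log and divides by an assumed average rate of 10.5 MB/s. The rate is an approximation read off the log, and estimate_seconds is a hypothetical helper.

```shell
# Usage: estimate_seconds <total_mb> <rate_mb_per_s>
# Prints the transfer time in whole seconds.
estimate_seconds() {
    awk -v mb="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", mb / rate }'
}

# Sizes in MB from the log: .pac + .sa + .fna + .bwt (the small files are ignored).
total_mb=$((765 + 1530 + 3105 + 3061))
echo "total: ${total_mb} MB"
echo "per node: $(estimate_seconds "$total_mb" 10.5) s"
```

This gives about 806 s per pass, close to the 796 s that the four large transfers add up to in the first pass of the log. The log shows six passes, so if they ran one after another the whole distribution would take on the order of 80 minutes.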


Research output:

【1】 [BIBM] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Chao Wang, and Xuehai Zhou, "Distributed Gene Clinical Decision Support System Based on Cloud Computing", in IEEE International Conference on Bioinformatics and Biomedicine. (BIBM 2017, CCF B)
【2】 [IEEE CLOUD] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Xuehai Zhou. Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark (CLOUD 2017, CCF-C).
【3】 [CCGrid] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Jinhong Zhou, Xuehai Zhou. DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions. (CCGrid 2017, CCF-C).
【4】more: https://github.com/xubo245/Publications

Help

If you have any questions or suggestions, please write it in the issue of this project or send an e-mail to me: xubo245@mail.ustc.edu.cn
Wechat: xu601450868
QQ: 601450868
