MCScanX collinearity analysis: installation, build-error fixes, and basic usage

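1. Introduction
(1) Overview:
MCScanX uses an improved version of the MCScan algorithm to detect collinear blocks within or between genomes. It takes the blastp alignments of the proteins of two species (nucleotide alignments can also be used) together with the positions of the corresponding genes in the genome (a simplified gff file) and reports the collinear blocks shared by the two genomes. To find collinear blocks within a single genome, align that species' proteins against themselves.
Manual: http://chibba.pgml.uga.edu/mcscan2/documentation/manual.pdf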

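The package has two parts: (1) the MCScan algorithm and (2) downstream visualization and analysis. It currently runs on Mac OS (Xcode must be installed first) and Linux (the Java SE Development Kit and libpng are required). Three main programs sit in the top-level folder: MCScanX, MCScanX_h and duplicate_gene_classifier. Another 12 downstream analysis programs sit in the downstream_analyses folder. Note: among the later variants, MCScanX-transposed (released in 2013) detects transposed gene duplications within or between genomes.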

(2) Publication: Wang Y, Tang H, DeBarry J D, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity[J]. Nucleic Acids Research, 2012, 40(7): e49.

2. Installation
pengzw@super-server:~$ wget http://chibba.pgml.uga.edu/mcscan2/MCScanX.zip
pengzw@super-server:~$ unzip MCScanX.zip -d ~/biosoft/ # install under ~/biosoft/
pengzw@super-server:~$ cd ~/biosoft/MCScanX
pengzw@super-server:~/biosoft/MCScanX$ make # may fail with the errors listed further down; patch the files, then run make again

pengzw@super-server:~/biosoft/MCScanX$ make ## output like the following means the build succeeded
g++ struct.cc mcscan.cc read_data.cc out_utils.cc dagchainer.cc msa.cc permutation.cc -o MCScanX
g++ struct.cc mcscan_h.cc read_homology.cc out_homology.cc dagchainer.cc msa.cc permutation.cc -o MCScanX_h
g++ struct.cc dup_classifier.cc read_data.cc out_utils.cc dagchainer.cc cls.cc permutation.cc -o duplicate_gene_classifier
g++ dissect_multiple_alignment.cc -o downstream_analyses/dissect_multiple_alignment
g++ detect_collinear_tandem_arrays.cc -o downstream_analyses/detect_collinear_tandem_arrays
cd downstream_analyses/ && make
make[1]: Entering directory ‘/home/pengzw/biosoft/MCScanX/downstream_analyses’
javac -g family_circle_plotter.java
javac -g dot_plotter.java
javac -g family_tree_plotter.java
javac -g family_tree_plotter_show_length.java
javac -g bar_plotter.java
javac -g dual_synteny_plotter.java
javac -g circle_plotter.java
javac -g family_tree_plotter_chr.java
make[1]: Leaving directory ‘/home/pengzw/biosoft/MCScanX/downstream_analyses’

pengzw@super-server:~/biosoft/MCScanX$ echo 'PATH=$PATH:~/biosoft/MCScanX/' >> ~/.bashrc
pengzw@super-server:~/biosoft/MCScanX$ source ~/.bashrc
pengzw@super-server:~/biosoft/MCScanX$ MCScanX
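If make fails, it is reportedly because MCScanX does not support 64-bit systems out of the box. Building on a 64-bit system requires adding the missing header includes, as in the fixes below.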

Error 1: "msa.cc:289:9: error: 'chdir' was not declared in this scope"
Fix: open msa.cc and add #include <unistd.h> at the top.

Error 2: "dissect_multiple_alignment.cc:252:44: error: 'getopt' was not declared in this scope"
Fix: open "dissect_multiple_alignment.cc" and add #include <getopt.h> at the top.

Error 3: "detect_collinear_tandem_arrays.cc:286:17: error: 'getopt' was not declared in this scope"
Fix: open "detect_collinear_tandem_arrays.cc" and add #include <getopt.h> at the top.
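
If you would rather not edit the three files by hand, the same fixes can be scripted. The following is only a sketch in Python: the function names ensure_include and patch_tree are made up for this example, and it assumes the source files are plain text readable with the default encoding.

```python
from pathlib import Path


def ensure_include(source: str, header: str) -> str:
    """Return source with '#include <header>' prepended unless it is already there."""
    directive = f"#include <{header}>"
    if any(line.strip() == directive for line in source.splitlines()):
        return source
    return directive + "\n" + source


# Header required by each file, taken from errors 1 to 3 above.
PATCHES = {
    "msa.cc": "unistd.h",
    "dissect_multiple_alignment.cc": "getopt.h",
    "detect_collinear_tandem_arrays.cc": "getopt.h",
}


def patch_tree(src_dir: str) -> None:
    """Patch the files in src_dir in place (hypothetical helper, not part of MCScanX)."""
    for name, header in PATCHES.items():
        path = Path(src_dir) / name
        if path.exists():
            path.write_text(ensure_include(path.read_text(), header))
```

Calling patch_tree on the unpacked MCScanX folder and then running make again should have the same effect as the manual edits. Running it twice is harmless, because an include that is already present is not added again.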

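Error 4: "make[1]: javac: Command not found"
Fix: download the JDK from https://www.oracle.com/technetwork/java/javase/downloads/index.html and set up the Java environment.
If you have sudo rights, just use the package manager instead (I really am that lazy):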

pengzw@super-server:~$ sudo apt install openjdk-8-jdk
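3. Usage
(1) Prepare the file xyz.gff
MCScanX needs a gff file that differs from the standard GFF format: it has only four columns. In "sp#", sp is a two-letter species code (longer codes do not seem to affect the result) and # is the chromosome number. "gene" must be the gene name used for your protein sequences.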

sp# gene starting_position ending_position
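
As an illustration of the four-column layout, here is a small Python sketch that builds such rows from GFF3 "gene" features. It rests on several assumptions that are not from MCScanX itself: the gene name is taken from the Name attribute (falling back to ID), a leading "chr" in the sequence name is stripped before the species code is prepended, and the function names are invented for this example.

```python
import re


def chrom_label(seqid: str, species_prefix: str) -> str:
    """Turn e.g. 'Chr1' into 'at1' (assumed naming scheme, adjust to your data)."""
    return species_prefix + re.sub(r"^chr", "", seqid, flags=re.IGNORECASE)


def gff3_to_mcscanx(lines, species_prefix, attr="Name"):
    """Yield 'sp#<TAB>gene<TAB>start<TAB>end' rows for GFF3 'gene' features."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip GFF3 directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        # Column 9 holds key=value pairs separated by ';'
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        gene = attrs.get(attr) or attrs.get("ID")
        if gene is None:
            continue
        yield "\t".join([chrom_label(cols[0], species_prefix), gene, cols[3], cols[4]])
```

For a gene line such as "Chr1 ... gene 3631 5899 ... ID=AT1G01010;Name=AT1G01010" and the prefix "at", this yields the row at1, AT1G01010, 3631, 5899. Whatever naming you choose, the gene column must match the IDs in the blast file.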
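Column 9 of a GFF3 file uses "=" to join keys and values, so the four columns can be extracted with awk by specifying several field separators: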

pengzw@super-server:~/reference/At$ awk -F "[= \t]" '$3 == "gene" {print $1"\t"$11"\t"$4"\t"$5}' Athaliana_167…

(2) Prepare the file xyz.blast
Install BLAST+ from the package manager:

pengzw@super-server:~$ sudo apt install ncbi-blast+

Or install it locally:

pengzw@super-server:~$ mkdir biosoft
pengzw@super-server:~$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.8.1+-x64-linux.tar.gz
pengzw@super-server:~$ tar zxvf ncbi-blast-2.8.1+-x64-linux.tar.gz -C ~/biosoft/
pengzw@super-server:~$ cd ~/biosoft/ncbi-blast-2.8.1+/bin
pengzw@super-server:~/biosoft/ncbi-blast-2.8.1+/bin$ ls # executables are shown in green
blastdb_aliastool blast_formatter blastx dustmasker makeblastdb psiblast segmasker update_blastdb.pl
blastdbcheck blastn convert2blastmask get_species_taxids.sh makembindex rpsblast tblastn windowmasker
blastdbcmd blastp deltablast legacy_blast.pl makeprofiledb rpstblastn tblastx
pengzw@super-server:~/biosoft/ncbi-blast-2.8.1+/bin$ echo 'PATH=$PATH:~/biosoft/ncbi-blast-2.8.1+/bin/' >> ~/.bashrc
pengzw@super-server:~/biosoft/ncbi-blast-2.8.1+/bin$ source ~/.bashrc
pengzw@super-server:~/biosoft/ncbi-blast-2.8.1+/bin$ makeblastdb
Build the BLAST database (index):
makeblastdb -in refpep.fa -dbtype prot -parse_seqids -out refpep.db
options:
-in: sequence file to be formatted; must be FASTA
-dbtype: database type, prot or nucl
-out: name of the database index
…

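Run the BLAST alignment:
blastp -query yourpep.fa -db refpep.db -out xyz.blast -evalue 1e-10 -num_threads 24 -outfmt 6 -num_alignments 5
options:
-parse_seqids: parse the sequence identifiers; this flag belongs to the makeblastdb step and is usually worth adding there
-evalue: E-value cutoff, set to the officially recommended 1e-10
-num_threads 24: use 24 threads
-num_alignments 5: keep the 5 best hits per query
-outfmt 6: output format; 6 is the tab-delimited table (one of 12 formats in older BLAST+ releases)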

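Notes:
1. Preprocess the sequences so that only one transcript is kept per gene.
2. Make sure the IDs in the gff and blast files are identical; otherwise MCScanX finds nothing (0 matches).
3. blastp format 6 has 12 columns:

1        2     3          4       5         6    7          8          9       10      11       12
queryID  dbID  identity%  length  mismatch  gap  querypos1  querypos2  dbpos1  dbpos2  e-value  bit-score
4. To analyze collinearity both within and between two genomes, merge the two genomes first: cat 1genome.fa 2genome.fa > all.fa, then build the index from all.fa and align all.fa against it. The gff files must be merged as well: cat 1genome.gff 2genome.gff > all.gff.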
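
Note 2 is the one that bites most often, so it is worth checking before running MCScanX. The Python sketch below counts how many blast hits have both gene IDs present in the gff file. It is only a rough pre-check: the function names are invented here, and MCScanX applies its own filtering on import, so the numbers will not necessarily equal what the program reports.

```python
def read_gff_ids(gff_lines):
    """Collect gene names (column 2) from the four-column MCScanX gff."""
    ids = set()
    for line in gff_lines:
        cols = line.split()
        if len(cols) >= 4:
            ids.add(cols[1])
    return ids


def count_usable_hits(blast_lines, gff_ids):
    """Return (kept, discarded) for 12-column tabular blast lines.

    A hit is 'kept' when both the query ID (column 1) and the
    database ID (column 2) occur in the gff file.
    """
    kept = discarded = 0
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a format 6 line
        if cols[0] in gff_ids and cols[1] in gff_ids:
            kept += 1
        else:
            discarded += 1
    return kept, discarded
```

If kept comes out as 0, look at the first few IDs of both files side by side. A typical cause is a transcript suffix such as ".1" in the protein FASTA headers that is absent from the gene names in the gff.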

(3) Detect collinear blocks with MCScanX:
./MCScanX dir/xyz # xyz.blast and xyz.gff must sit in the same directory
options:
[Usage] ./bin/mcscan2 prefix_fn [options]
-k MATCH_SCORE, final score=MATCH_SCORE+NUM_GAPS*GAP_PENALTY
(default: 50)
-g GAP_PENALTY, gap penalty (default: -1)
-s MATCH_SIZE, number of genes required to call a collinear block
(default: 5)
-e E_VALUE, alignment significance (default: 1e-05)
-m MAX_GAPS, maximum gaps allowed (default: 25)
-a only builds the pairwise blocks (.aligns file)
-b patterns of collinear blocks. 0:intra- and inter-species (default); 1:intra-species; 2:inter-species
-h print this help page
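
To get a feel for how -k, -g and -s interact, here is a small worked example in Python. It reads the help text literally: each anchor gene pair contributes MATCH_SCORE + NUM_GAPS*GAP_PENALTY, where NUM_GAPS is the number of genes skipped before that anchor. Summing these contributions over a chain is my assumption for illustration, not a description of MCScanX internals, and the function names are made up.

```python
def chain_score(gap_counts, match_score=50, gap_penalty=-1):
    """Sum MATCH_SCORE + NUM_GAPS*GAP_PENALTY over the anchors of a chain.

    gap_counts[i] is the number of genes skipped before anchor i.
    Defaults mirror the -k and -g defaults shown in the help text.
    """
    return sum(match_score + gaps * gap_penalty for gaps in gap_counts)


def long_enough(gap_counts, match_size=5):
    """True when the chain has at least MATCH_SIZE anchor pairs (-s default: 5)."""
    return len(gap_counts) >= match_size


example = [0, 2, 0, 5, 1]              # five anchors, eight skipped genes in total
example_score = chain_score(example)   # 5*50 + 8*(-1) = 242
```

Under this reading, a more negative -g makes gappy chains score lower, while a larger -s demands more anchor pairs before a block is reported.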


4. Results
Note: if MCScanX reports "0 matches imported (xxxxx discarded)", the gene names in your GFF file almost certainly do not match the gene names in the blast results.

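The program writes 3 output files:
Filename          Description
xyz.collinearity  Collinear block data. The blocks can lie within one species or between species.
xyz.html          For viewing in a web browser; shows the collinearity status along each chromosome. Gray marks the chromosome sequence; red marks tandem genes on the chromosome; yellow marks collinear genes.
xyz.tandem        Tandem gene data: two or more homologous genes located right next to each other in the genome.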
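
For further processing, the collinearity file is easy to read into a script. The Python sketch below assumes the layout I have seen in typical output: parameter and statistics lines begin with "#", each block opens with a line beginning "## Alignment N:", and each gene pair is a tab-separated row with the two gene names in the second and third fields. Please check these assumptions against your own file; the function name is invented for this example.

```python
def parse_collinearity(lines):
    """Group gene pairs by block ID from an MCScanX-style collinearity file."""
    blocks = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("## Alignment"):
            # "## Alignment 0: score=..." -> block ID "0"
            current = line[len("## Alignment"):].split(":", 1)[0].strip()
            blocks[current] = []
        elif line.startswith("#") or not line.strip():
            continue  # parameter, statistics or blank line
        elif current is not None:
            fields = line.split("\t")
            if len(fields) >= 3:
                blocks[current].append((fields[1].strip(), fields[2].strip()))
    return blocks
```

The result maps each block ID to its list of gene pairs, which is a convenient starting point for counting blocks, filtering by size, or exporting links for a plotting tool.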
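Core programs:
MCScanX                    Detects collinear blocks and aligns them to reference chromosomes.
MCScanX_h                  Like MCScanX, but the input is a tab-delimited file of homologous gene pairs.
duplicate_gene_classifier  Classifies genes by duplication type.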
5. Downstream analysis and visualization
The 12 downstream analysis programs:
1.detect_collinear_tandem_arrays
2.dissect_multiple_alignment
3.add_ka_and_ks_to_collinearity.pl
4.group_collinear_genes.pl
5.detect_collinearity_within_gene_families.pl
6.origin_enrichment_analysis.pl
7.dot_plotter.java
8.dual_synteny_plotter.java
9.circle_plotter.java
10.bar_plotter.java
11.family_circle_plotter.java
12.family_tree_plotter.java

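(1) duplicate_gene_classifier (gene classification):
The codes 0, 1, 2, 3 and 4 stand for five categories:
0: singleton (not duplicated)
1: dispersed (duplicates that are not 2, 3 or 4)
2: proximal (duplicates close together on a chromosome but not adjacent)
3: tandem (tandem duplicates)
4: WGD/segmental (collinear genes in collinear blocks)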
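
To summarize the classification, the codes can be tallied with a few lines of Python. This sketch assumes the classifier output is a two-column text file with the gene name first and the numeric code second; that layout, and the function name, are assumptions for illustration. The code-to-name mapping is the one listed above.

```python
from collections import Counter

# Mapping taken from the five categories listed above.
DUPLICATE_TYPES = {
    0: "singleton",
    1: "dispersed",
    2: "proximal",
    3: "tandem",
    4: "WGD/segmental",
}


def tally_gene_types(lines):
    """Count genes per duplication category from 'gene<TAB>code' lines."""
    counts = Counter()
    for line in lines:
        cols = line.split()
        if len(cols) < 2 or not cols[1].isdigit():
            continue  # header or malformed line
        code = int(cols[1])
        counts[DUPLICATE_TYPES.get(code, f"unknown({code})")] += 1
    return counts
```

The returned Counter can be printed directly or passed to a plotting library to compare the share of each duplication mode.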

(2) detect_collinearity_within_gene_families.pl: extract the results for a gene family
1) Prepare gene_family_file as a tab-delimited txt file.

2) Run detect_collinearity_within_gene_families.pl to get the duplicated gene pairs:

perl detect_collinearity_within_gene_families.pl -i gene_family_file.txt -d xyz.collinearity -o output_file
3) Classify the duplicated gene pairs of the gene family:
Install:

wget http://chibba.pgml.uga.edu/mcscan2/transposed/MCScanX-transposed.zip
unzip MCScanX-transposed.zip -d ~/biosoft/
cd ~/biosoft/MCScanX-transposed
make
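Extract the results:

perl MCScanX-transposed.pl -i data -t at -c al,br,cp,pt,vv -o result/at_result -x 3
Plotting:
Follow the manual to draw the figures.
If the result is unsatisfactory, download the analysis output and plot it with circos.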
