[Repost] BLAST+ blastn parameters explained
2012-05-22 13:25
Reposted from lidaof
Last edited by lidaof

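Compared with legacy BLAST, BLAST+ splits blastn, blastx and the other programs out of the single blastall command into separate executables, which makes it easier to customize the options for each one.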

While using blastn I have collected the options I find most useful. They are summarized below:

blastn -db database_name -query input_file -out output_file -evalue evalue -max_target_seqs num_sequences -num_threads int_value -outfmt format

blastn -db database_name -query input_file -out output_file -evalue evalue -max_target_seqs num_sequences -num_threads int_value -outfmt "7 qacc sacc evalue length pident"

For example:

blastn -db plant_rna -query test.fa -out test.out -evalue 0.00001 -max_target_seqs 5 -num_threads 4 -outfmt "7 qacc sacc evalue length pident"
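When blastn is driven from a script, the main pitfall is quoting: the whole format string after -outfmt must reach blastn as one argument. Below is a minimal Python sketch of that. It assumes blastn is on the PATH and reuses plant_rna, test.fa and test.out from the example above. The helper name build_blastn_cmd is hypothetical and is not part of BLAST+.

```python
import os
import shutil
import subprocess


def build_blastn_cmd(db, query, out, evalue=1e-5, max_target_seqs=5,
                     num_threads=4, fields="qacc sacc evalue length pident"):
    """Build a blastn argument list (hypothetical helper, for illustration).

    "7 <fields>" is kept as ONE list element, so it reaches blastn as a
    single -outfmt argument without any shell quoting.
    """
    return [
        "blastn",
        "-db", db,
        "-query", query,
        "-out", out,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
        "-num_threads", str(num_threads),
        "-outfmt", f"7 {fields}",
    ]


if __name__ == "__main__":
    cmd = build_blastn_cmd("plant_rna", "test.fa", "test.out")
    print(cmd)
    # Run only when blastn is installed and the query file exists.
    if shutil.which("blastn") and os.path.exists("test.fa"):
        subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run without shell=True avoids shell quoting entirely. On an interactive command line, the double quotes around the format string do the same job.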

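blastn: needs no introduction; nucleotide-vs-nucleotide alignment

-db: the BLAST database to search; see the previous post for details

-query: the input query sequences, in FASTA format

-out: the output file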

-evalue: the E-value cutoff; only hits with an E-value at or below this threshold are saved (the default is 10)
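The E-value depends on the size of the search as well as on the alignment. For a hit with bit score S', the standard approximation is E ≈ m · n · 2^(-S'). Here m and n are the effective query and database lengths, and m · n is the search space that -dbsize and -searchsp let you override. The Python sketch below simply evaluates that textbook formula. It ignores BLAST's edge-effect length corrections, so it will not reproduce blastn's reported values exactly. The function names and the 500 nt / 10^9 nt sizes are made up for illustration.

```python
import math


def approx_evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """Textbook approximation E = m * n * 2**(-S').

    query_len (m) and db_len (n) stand in for the effective lengths;
    BLAST's own edge-effect corrections are NOT modeled.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def min_bits_for_evalue(evalue: float, query_len: int, db_len: int) -> float:
    """Bit score needed for E <= evalue under the same approximation."""
    return math.log2(query_len * db_len / evalue)


if __name__ == "__main__":
    # Hypothetical sizes: a 500 nt query against a 10**9 nt database,
    # with the 1e-5 cutoff from the example command.
    m, n, cutoff = 500, 10**9, 1e-5
    for bits in (30, 50, 70):
        e = approx_evalue(bits, m, n)
        print(f"{bits} bits -> E ~ {e:.2e}, passes: {e <= cutoff}")
    print(f"bits needed: {min_bits_for_evalue(cutoff, m, n):.1f}")
```

With these made-up sizes, a hit needs roughly 55.5 bits to pass a 1e-5 cutoff. The same alignment can therefore pass against a small database and fail against a larger one.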

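-max_target_seqs: the maximum number of aligned target sequences to keep (with legacy BLAST I used -b 5 -v 5 for this; please correct me if I have misunderstood). Note that legacy -v and -b set how many one-line descriptions and alignments are shown. Their BLAST+ counterparts are -num_descriptions and -num_alignments, not -max_target_seqs.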
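If what you need is strictly "the N best hits per query, ranked by bit score" in the final table, you can also enforce that on tabular output after the search. Below is a minimal Python sketch. It assumes the rows have already been parsed into (qacc, sacc, evalue, bitscore) values. The Hit type and top_hits_per_query function are hypothetical names, not part of BLAST+.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    qacc: str
    sacc: str
    evalue: float
    bitscore: float


def top_hits_per_query(hits: Iterable[Hit], n: int) -> dict[str, list[Hit]]:
    """Keep the n best hits for each query: highest bit score first,
    ties broken by lower E-value."""
    by_query: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        by_query[hit.qacc].append(hit)
    return {
        q: sorted(hs, key=lambda h: (-h.bitscore, h.evalue))[:n]
        for q, hs in by_query.items()
    }


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        Hit("q1", "s1", 1e-30, 120.0),
        Hit("q1", "s2", 1e-50, 200.0),
        Hit("q1", "s3", 1e-10, 60.0),
        Hit("q2", "s9", 2e-8, 45.0),
    ]
    for query, kept in top_hits_per_query(rows, 2).items():
        print(query, [h.sacc for h in kept])
```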

-num_threads: the number of CPU threads to use (what is available depends on your system; equivalent to the old -a option)

-outfmt "7 qacc sacc evalue length pident": this is the slickest new feature in BLAST+. It controls the output format directly, so you no longer need a separate parser. 7 means tab-separated output with comment lines. You choose the columns by listing format specifiers after the 7, separated by spaces, and wrap the whole value (the 7 plus the specifiers) in double quotes. Here qacc is the query accession, sacc the subject (target) accession, evalue the E-value, length the alignment length, and pident the percentage of identical matches. The full list of supported specifiers is in the Formatting options section of the blastn -help output below. That section also gives the default 'std' column set used when no specifiers are given.
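Because format 7 is plain tab-separated text, reading it back takes only a few lines. Below is a minimal Python sketch. It assumes the specifier list from the example command and skips every line starting with '#' (the comment lines that format 7 adds). The names parse_outfmt7, FIELDS and CASTS are hypothetical, and the sample lines are invented, not real blastn output.

```python
from typing import Iterable, Iterator

# Same specifiers as in the example command; adjust if you change -outfmt.
FIELDS = ["qacc", "sacc", "evalue", "length", "pident"]
# Columns converted from text; everything else stays a string.
CASTS = {"evalue": float, "length": int, "pident": float}


def parse_outfmt7(lines: Iterable[str], fields=FIELDS) -> Iterator[dict]:
    """Yield one dict per hit from `-outfmt "7 <fields>"` output.

    Blank lines and comment lines (starting with '#') are skipped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
        record = dict(zip(fields, values))
        for name, cast in CASTS.items():
            if name in record:
                record[name] = cast(record[name])
        yield record


if __name__ == "__main__":
    # Invented lines in the shape of format 7 output, not real results.
    sample = [
        "# Query: q1\n",
        "q1\tsubjA\t3e-20\t100\t98.00\n",
    ]
    for rec in parse_outfmt7(sample):
        print(rec)
```

For a real run, pass an open file handle instead, e.g. `with open("test.out") as fh: hits = list(parse_outfmt7(fh))`.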

Running blastn with the -help option prints the detailed help below:

blastn -help

blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-negative_gilist filename]
    [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-perc_identity float_value] [-xdrop_ungap float_value]
    [-xdrop_gap float_value] [-xdrop_gap_final float_value]
    [-searchsp int_value] [-penalty penalty] [-reward reward] [-no_greedy]
    [-min_raw_gapped_score int_value] [-template_type type]
    [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-remote] [-version]

DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.2.23+

OPTIONAL ARGUMENTS
-h
   Print USAGE and DESCRIPTION; ignore other arguments
-help
   Print USAGE, DESCRIPTION and ARGUMENTS description; ignore other arguments
-version
   Print version number; ignore other arguments

*** Input query options
-query <File_In>
   Input file name
   Default = `-'
-query_loc <String>
   Location on the query sequence (Format: start-stop)
-strand <String, `both', `minus', `plus'>
   Query strand(s) to search against database/subject
   Default = `both'

*** General search options
-task <String, Permissible values: 'blastn' 'blastn-short' 'dc-megablast'
                'megablast' 'vecscreen' >
   Task to execute
   Default = `megablast'
-db <String>
   BLAST database name
    * Incompatible with: subject, subject_loc
-out <File_Out>
   Output file name
   Default = `-'
-evalue <Real>
   Expectation value (E) threshold for saving hits
   Default = `10'
-word_size <Integer, >=4>
   Word size for wordfinder algorithm (length of best perfect match)
-gapopen <Integer>
   Cost to open a gap
-gapextend <Integer>
   Cost to extend a gap
-penalty <Integer, <=0>
   Penalty for a nucleotide mismatch
-reward <Integer, >=0>
   Reward for a nucleotide match
-use_index <Boolean>
   Use MegaBLAST database index
-index_name <String>
   MegaBLAST database index name

*** BLAST-2-Sequences options
-subject <File_In>
   Subject sequence(s) to search
    * Incompatible with: db, gilist, negative_gilist, db_soft_mask
-subject_loc <String>
   Location on the subject sequence (Format: start-stop)
    * Incompatible with: db, gilist, negative_gilist, db_soft_mask, remote

*** Formatting options
-outfmt <String>
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1
    10 = Comma-separated values

Options 6, 7, and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers are:
            qseqid means Query Seq-id
               qgi means Query GI
              qacc means Query accession
            sseqid means Subject Seq-id
         sallseqid means All subject Seq-id(s), separated by a ';'
               sgi means Subject GI
            sallgi means All subject GIs
              sacc means Subject accession
           sallacc means All subject accessions
            qstart means Start of alignment in query
              qend means End of alignment in query
            sstart means Start of alignment in subject
              send means End of alignment in subject
              qseq means Aligned part of query sequence
              sseq means Aligned part of subject sequence
            evalue means Expect value
          bitscore means Bit score
             score means Raw score
            length means Alignment length
            pident means Percentage of identical matches
            nident means Number of identical matches
          mismatch means Number of mismatches
          positive means Number of positive-scoring matches
           gapopen means Number of gap openings
              gaps means Total number of gaps
              ppos means Percentage of positive-scoring matches
            frames means Query and subject frames separated by a '/'
            qframe means Query frame
            sframe means Subject frame
   When not provided, the default value is:
   'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   Default = `0'
-show_gis
   Show NCBI GIs in deflines?
-num_descriptions <Integer, >=0>
   Number of database sequences to show one-line descriptions for
   Default = `500'
-num_alignments <Integer, >=0>
   Number of database sequences to show alignments for
   Default = `250'
-html
   Produce HTML output?

*** Query filtering options
-dust <String>
   Filter query sequence with DUST (Format: 'yes', 'level window linker', or
   'no' to disable)
   Default = `20 64 1'
-filtering_db <String>
   BLAST database containing filtering elements (i.e.: repeats)
-window_masker_taxid <Integer>
   Enable WindowMasker filtering using a Taxonomic ID
-window_masker_db <String>
   Enable WindowMasker filtering using this repeats database.
-soft_masking <Boolean>
   Apply filtering locations as soft masks
   Default = `true'
-lcase_masking
   Use lower case filtering in query and subject sequence(s)?

*** Restrict search or results
-gilist <String>
   Restrict search of database to list of GI's
    * Incompatible with: negative_gilist, remote, subject, subject_loc
-negative_gilist <String>
   Restrict search of database to everything except the listed GIs
    * Incompatible with: gilist, remote, subject, subject_loc
-entrez_query <String>
   Restrict search with the given Entrez query
    * Requires: remote
-db_soft_mask <Integer>
   Filtering algorithm ID to apply to the BLAST database as soft masking
    * Incompatible with: subject, subject_loc
-perc_identity <Real, 0..100>
   Percent identity
-culling_limit <Integer, >=0>
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with: best_hit_overhang, best_hit_score_edge
-best_hit_overhang <Real, (>=0 and =<0.5)>
   Best Hit algorithm overhang value (recommended value: 0.1)
    * Incompatible with: culling_limit
-best_hit_score_edge <Real, (>=0 and =<0.5)>
   Best Hit algorithm score edge value (recommended value: 0.1)
    * Incompatible with: culling_limit
-max_target_seqs <Integer, >=1>
   Maximum number of aligned sequences to keep

*** Discontiguous MegaBLAST options
-template_type <String, `coding', `coding_and_optimal', `optimal'>
   Discontiguous MegaBLAST template type
    * Requires: template_length
-template_length <Integer, Permissible values: '16' '18' '21' >
   Discontiguous MegaBLAST template length
    * Requires: template_type

*** Statistical options
-dbsize <Int8>
   Effective length of the database
-searchsp <Int8, >=0>
   Effective length of the search space

*** Search strategy options
-import_search_strategy <File_In>
   Search strategy to use
    * Incompatible with: export_search_strategy
-export_search_strategy <File_Out>
   File name to record the search strategy used
    * Incompatible with: import_search_strategy

*** Extension options
-xdrop_ungap <Real>
   X-dropoff value (in bits) for ungapped extensions
-xdrop_gap <Real>
   X-dropoff value (in bits) for preliminary gapped extensions
-xdrop_gap_final <Real>
   X-dropoff value (in bits) for final gapped alignment
-no_greedy
   Use non-greedy dynamic programming extension
-min_raw_gapped_score <Integer>
   Minimum raw gapped score to keep an alignment in the preliminary gapped and
   traceback stages
-ungapped
   Perform ungapped alignment only?
-window_size <Integer, >=0>
   Multiple hits window size, use 0 to specify 1-hit algorithm
-off_diagonal_range <Integer, >=0>
   Number of off-diagonals to search for the 2nd hit, use 0 to turn off
   Default = `0'

*** Miscellaneous options
-parse_deflines
   Should the query and subject defline(s) be parsed?
-num_threads <Integer, >=1>
   Number of threads to use in the BLAST search
   Default = `1'
    * Incompatible with: remote
-remote
   Execute search remotely?
    * Incompatible with: gilist, negative_gilist, subject_loc, num_threads
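The -culling_limit rule in the help above ("if the query range of a hit is enveloped by that of at least this many higher-scoring hits, delete the hit") is easiest to see on a toy example. The Python sketch below is a literal, O(n²) reading of that sentence for hits from a single query. It counts all higher-scoring hits, including ones that are themselves culled. It is NOT NCBI's implementation, which may differ in such details. QueryHit, envelops and cull are hypothetical names.

```python
from typing import NamedTuple


class QueryHit(NamedTuple):
    name: str
    qstart: int
    qend: int
    score: float


def envelops(outer: QueryHit, inner: QueryHit) -> bool:
    """True if outer's query range fully contains inner's query range."""
    lo_o, hi_o = sorted((outer.qstart, outer.qend))
    lo_i, hi_i = sorted((inner.qstart, inner.qend))
    return lo_o <= lo_i and hi_i <= hi_o


def cull(hits: list[QueryHit], culling_limit: int) -> list[QueryHit]:
    """Drop a hit if >= culling_limit higher-scoring hits envelop its
    query range (literal reading of the help text, one query only)."""
    kept = []
    for hit in hits:
        covering = sum(
            1 for other in hits
            if other.score > hit.score and envelops(other, hit)
        )
        if covering < culling_limit:
            kept.append(hit)
    return kept


if __name__ == "__main__":
    # Made-up hits on one query.
    hits = [
        QueryHit("A", 1, 500, 900.0),
        QueryHit("B", 1, 400, 700.0),
        QueryHit("C", 100, 300, 300.0),
        QueryHit("D", 450, 600, 200.0),
    ]
    for limit in (1, 2):
        print(limit, [h.name for h in cull(hits, limit)])
```

D survives at both limits because no higher-scoring hit covers its whole query range. A larger limit tolerates more redundancy: with a limit of 2, B is kept (only A envelops it), while C is still dropped (A and B both envelop it).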
