Search with database masking enabled

Created: June 23, 2008; Updated: January 7, 2021.
Database masking has two modes. The first is known as “soft-masking”, and BLAST uses the database mask only during the (initial) word-finding phase of BLAST. The second is known as “hard-masking”, and BLAST uses the database mask during all phases of the search. Here, we look at both types of masking.
To enable database masking during a BLAST search, we first use the -info parameter of blastdbcmd to discover the masking algorithm ID. For the database generated in the previous cookbook entry, we can use the following command line to activate the windowmasker soft masking:

blastn -query HTT_gene -task megablast -db hs_chr -db_soft_mask 30 \
-outfmt 7 -out HTT_megablast_softmask.out -num_threads 4

Here, we search a nucleotide query, HTT_gene* (-query HTT_gene), with the megablast algorithm (-task megablast) against the database hs_chr (-db hs_chr). We use soft masking with the masking algorithm ID 30 (-db_soft_mask 30), set the result format to tabular output with comment lines (-outfmt 7), and save the result to a file named HTT_megablast_softmask.out (-out HTT_megablast_softmask.out). We also activate the multi-threaded feature of blastn to speed up the search by using 4 CPUs$ (-num_threads 4).

For the database generated in the previous cookbook entry, we can use the following command line to activate the windowmasker hard masking:

blastn -query HTT_gene -task megablast -db hs_chr -db_hard_mask 30 \
-outfmt 7 -out HTT_megablast_hardmask.out -num_threads 4

The options are similar to the ones for soft masking, except that we use -db_hard_mask rather than -db_soft_mask. Additionally, we changed the name of the output file.
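
Since the two searches differ only in the masking flag and the output file name, both command lines can be generated from one helper. The following is a minimal Python sketch, not part of BLAST+: the function name masked_blastn_command and its parameters are made up for illustration, and only the command-line flags are taken from the text above. It builds the argument list without running anything.

```python
import shlex


def masked_blastn_command(query, db, algo_id, mode, out, threads=4):
    """Build the blastn argument list for a database-masked megablast search.

    mode is "soft" (-db_soft_mask) or "hard" (-db_hard_mask); algo_id is the
    masking algorithm ID reported by `blastdbcmd -info` for the database.
    """
    if mode not in ("soft", "hard"):
        raise ValueError("mode must be 'soft' or 'hard'")
    return [
        "blastn", "-query", query, "-task", "megablast", "-db", db,
        f"-db_{mode}_mask", str(algo_id),
        "-outfmt", "7", "-out", out, "-num_threads", str(threads),
    ]


soft = masked_blastn_command("HTT_gene", "hs_chr", 30, "soft",
                             "HTT_megablast_softmask.out")
hard = masked_blastn_command("HTT_gene", "hs_chr", 30, "hard",
                             "HTT_megablast_hardmask.out")

# Print shell-ready command lines; nothing is executed here.
print(shlex.join(soft))
print(shlex.join(hard))
```

To actually run a search, the list can be passed to subprocess.run(soft, check=True) on a machine where BLAST+ and the hs_chr database are installed.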

Hard masking is much more aggressive than soft masking. In interspersed or simple repeats, soft masking normally provides the best results. Hard masking may be warranted to remove vector or other contamination from the BLAST results.

*This is a genomic fragment containing the HTT gene from human, including 5 kb up- and down-stream of the transcribed region. It is represented by NG_009378.

$ The number to use in your run will depend on the number of CPUs your system has.

In a test run on a 64-bit Linux machine, the search with soft masking took about 1.5 seconds real time, and the search with hard masking took about 2.5 seconds real time. The search without database masking took about 31 minutes.

Display BLAST search results with custom output format

Created: June 23, 2008; Updated: January 7, 2021.
The -outfmt option permits formatting arbitrary fields from the BLAST tabular and comma-separated-value (CSV) formats. Use the -help option on the command-line application (e.g., blastn) to see the supported fields. The -max_target_seqs option should be used with any tabular output to control the number of matches reported.

Example of custom output format: field selection

The following example shows how to display the results of a BLAST search using a custom output format. The tabular output format with comments is used, but only the query accession, subject accession, evalue, query start, query stop, subject start, and subject stop are requested. For brevity, only the first 10 lines of output are shown:
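
As a rough illustration of working with such output, the Python sketch below builds the -outfmt value for the seven fields named above (qacc, sacc, evalue, qstart, qend, sstart, send) and parses the resulting rows. The function name parse_custom_tabular is hypothetical, and the sample rows are invented placeholders, not real BLAST results.

```python
# Field names from the paragraph above, written as BLAST+ format specifiers.
FIELDS = ["qacc", "sacc", "evalue", "qstart", "qend", "sstart", "send"]
OUTFMT = "7 " + " ".join(FIELDS)  # value to pass to -outfmt


def parse_custom_tabular(text, fields=FIELDS):
    """Parse tab-separated rows, skipping the '#' comment lines of format 7."""
    int_fields = {"qstart", "qend", "sstart", "send"}
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
        row = dict(zip(fields, values))
        for name in row:
            if name in int_fields:
                row[name] = int(row[name])
        if "evalue" in row:
            row["evalue"] = float(row["evalue"])
        rows.append(row)
    return rows


# Hypothetical output, invented for illustration only.
SAMPLE = (
    "# Fields: query acc., subject acc., evalue, q. start, q. end, s. start, s. end\n"
    "QUERY1\tSUBJ1\t0.0\t1\t100\t501\t600\n"
    "QUERY1\tSUBJ2\t2e-30\t10\t90\t20\t100\n"
)
rows = parse_custom_tabular(SAMPLE)
print(OUTFMT)
print(rows[1]["sacc"], rows[1]["evalue"])
```

Keeping the field list in one place means the same list drives both the -outfmt string and the column names used when reading the results back.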

Example of custom output format: output separator

One can also customize the output separator in the tabular and comma-separated-value output formats using the delim token immediately after the numeric output format selector. In the example below ‘@’ is used as a custom output separator:
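
A small Python sketch of the idea follows. The helper names outfmt_spec and split_row are hypothetical, and the sample row is invented. The sketch only assembles the -outfmt string with the delim token placed right after the numeric format selector, then splits a row on the same separator.

```python
def outfmt_spec(format_id, fields, delim=None):
    """Assemble an -outfmt value, e.g. '7 delim=@ qacc sacc evalue'."""
    parts = [str(format_id)]
    if delim is not None:
        # The delim token goes immediately after the numeric selector.
        parts.append(f"delim={delim}")
    parts.extend(fields)
    return " ".join(parts)


def split_row(line, fields, delim):
    """Split one result row on the custom separator and label the columns."""
    values = line.split(delim)
    if len(values) != len(fields):
        raise ValueError("column count does not match the field list")
    return dict(zip(fields, values))


fields = ["qacc", "sacc", "evalue"]
spec = outfmt_spec(7, fields, delim="@")
row = split_row("QUERY1@SUBJ1@2e-30", fields, "@")  # hypothetical row

print(spec)
print(row)
```

A separator that cannot occur in accessions or numeric fields keeps the split unambiguous.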

Trace-back operations (BTOP)

The “Blast trace-back operations” (BTOP) string describes the alignment produced by BLAST. This string is similar to the CIGAR string produced in SAM format, but there are important differences. BTOP is a more flexible format that lists not only the aligned region but also matches and mismatches. BTOP operations consist of 1.) a number with a count of matching letters, 2.) two letters showing a mismatch (e.g., “AG” means A was replaced by G), or 3.) a dash (“-”) and a letter showing a gap. The box below shows a blastn run first with BTOP output and then the same run with the BLAST report showing the alignments.
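
The three kinds of operations described above can be decoded with a short tokenizer. This is a minimal Python sketch, assuming a BTOP string is a plain concatenation of match counts and two-character pairs; the function names and the example strings are illustrative, not taken from a real search.

```python
import re

# A token is either a run of digits (match count) or a two-character pair.
_BTOP_TOKEN = re.compile(r"(\d+)|(\D)(\D)")


def parse_btop(btop):
    """Split a BTOP string into ('match', n), ('mismatch', a, b), ('gap', a, b)."""
    ops = []
    pos = 0
    for m in _BTOP_TOKEN.finditer(btop):
        if m.start() != pos:
            raise ValueError(f"unexpected text at offset {pos}")
        pos = m.end()
        if m.group(1) is not None:
            ops.append(("match", int(m.group(1))))
        elif "-" in (m.group(2), m.group(3)):
            ops.append(("gap", m.group(2), m.group(3)))
        else:
            ops.append(("mismatch", m.group(2), m.group(3)))
    if pos != len(btop):
        raise ValueError(f"unexpected text at offset {pos}")
    return ops


def alignment_length(ops):
    """Count alignment columns: each match counts n, every pair counts 1."""
    return sum(op[1] if op[0] == "match" else 1 for op in ops)


ops = parse_btop("7AG39")      # 7 matches, A replaced by G, 39 matches
gapped = parse_btop("6-G-A41") # 6 matches, two gap columns, 41 matches
print(ops, alignment_length(ops))
print(gapped, alignment_length(gapped))
```

Unlike a CIGAR string, the decoded operations keep the actual letters at each mismatch or gap, which is what the parser returns in the second and third tuple elements.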

Use blastdb_aliastool to manage the BLAST databases

Created: June 23, 2008; Updated: January 7, 2021.
Often, one needs to search multiple databases together or wishes to search a specific subset of sequences within an existing database. A convenient way to conduct these types of searches is to create a virtual BLAST database. The blastdb_aliastool can perform three types of tasks to assist in that process. First, it can build an alias file to transparently combine searches of different databases. Second, it can build an alias file that limits a search based on a list of GIs (numerical IDs) or accessions. Finally, it can convert a list of GIs or accessions to a more efficient binary format.

Note: When combining BLAST databases, all the databases must be of the same molecule type. The following examples assume that the two databases as well as the GI file are in the current working directory. The binary format for accessions is only supported in the newer version 5 of the BLAST databases (BLAST+ 2.10.0 or newer suggested). Version 5 of the BLAST databases supports limiting a search natively by taxonomy, and only the relevant TAXIDs are needed.

Aggregate existing BLAST databases

To combine the two nematode nucleotide databases, named “nematode_mrna” and “nematode_genomic”, we use the following command line:

blastdb_aliastool -dblist "nematode_mrna nematode_genomic" -dbtype nucl \
-out nematode_all -title "Nematode RefSeq mRNA + Genomic"

Create a subset of a BLAST database

The nematode_mrna database contains RefSeq mRNAs for several species of roundworms. The best subset is from C. elegans. In most cases, we want to search this subset instead of the complete collection. Since the database entries are from NCBI nucleotide databases and the database is formatted with “-parse_seqids”, we can use the “-gilist c_elegans_mrna.gi” parameter/value pair to limit the search to the subset of interest. Alternatively, we can create a subset of the nematode_mrna database as follows:

blastdb_aliastool -db nematode_mrna -gilist c_elegans_mrna.gi -dbtype nucl \
-out c_elegans_mrna -title "C. elegans refseq mRNA entries"
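
The command above expects the GI file to exist already. The Python sketch below shows one way to prepare it, assuming the list is a plain-text file with one numeric GI per line (an assumption worth checking against your BLAST+ version). The helper names are hypothetical and the GIs are placeholders, not real C. elegans identifiers.

```python
import tempfile
from pathlib import Path


def write_gi_list(gis, path):
    """Write de-duplicated, sorted GIs to a text file, one per line (assumed format)."""
    unique = sorted({int(gi) for gi in gis})
    if unique and unique[0] <= 0:
        raise ValueError("GIs must be positive integers")
    Path(path).write_text("".join(f"{gi}\n" for gi in unique))
    return unique


def alias_subset_command(db, gilist, out, title, dbtype="nucl"):
    """Build the blastdb_aliastool argument list for a GI-limited alias."""
    return ["blastdb_aliastool", "-db", db, "-gilist", str(gilist),
            "-dbtype", dbtype, "-out", out, "-title", title]


with tempfile.TemporaryDirectory() as tmp:
    gi_file = Path(tmp) / "c_elegans_mrna.gi"
    # Placeholder GIs for illustration only.
    written = write_gi_list(["300", 100, 200, "100"], gi_file)
    file_text = gi_file.read_text()

command = alias_subset_command("nematode_mrna", "c_elegans_mrna.gi",
                               "c_elegans_mrna", "C. elegans refseq mRNA entries")
print(file_text, end="")
print(command)
```

The argument list can be passed to subprocess.run(command, check=True) from the directory that holds the database and the GI file.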

Note: one can also specify multiple databases using the -db parameter of blastdb_aliastool.

Convert a GI or accession list to binary format

The blastdb_aliastool can convert a GI or accession list to a binary format that is more efficient during the BLAST search. The example below converts a list of accessions to the binary format. The last two options shown (-seqid_db and -seqid_dbtype) are optional and limit the contents of the resulting accession list to accessions in the specified database, in this case swissprot. This may result in a much smaller file and shorter run times, but BLAST will exit with an error if the specified database is not used. As mentioned earlier, binary accession lists are only supported with version 5 BLAST databases.

blastdb_aliastool -seqid_file_in myacc.acc -seqid_file_out myacc.bin.acc \
-seqid_db swissprot -seqid_dbtype prot

Reformat BLAST reports with blast_formatter

Created: June 23, 2008; Updated: January 7, 2021.
It may be helpful to view the same BLAST results in different formats. A user may first parse the tabular format looking for matches meeting certain criteria, then go back and examine the relevant alignments in the full BLAST report. The user may also first look at pair-wise alignments, then decide to use a query-anchored view. Viewing a BLAST report in different formats has been possible on the NCBI BLAST web site since 2000, but has not been possible with stand-alone BLAST runs. The blast_formatter allows this, if the original search produced blast archive format using the -outfmt 11 switch. The query sequence, the BLAST options, the masking information, the name of the database, and the alignment are written out as ASN.1 (a structured format similar to XML). The -max_target_seqs option should be used to control the number of matches recorded in the alignment. The blast_formatter reads this information and formats a report. The BLAST database used for the original search must be available, or the sequences need to be fetched from the NCBI, assuming the database contains sequences in the public dataset. The box below illustrates the procedure. A blastn run first produces the BLAST archive format, and the blast_formatter then reads the file and produces tabular output.
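
The two-step procedure can be sketched in Python as a pair of argument lists. The helper name archive_then_format, the file name results.asn, and the default values are hypothetical. The -archive option of blast_formatter is not mentioned in the text above and is used here on the understanding that it names the archive file to read.

```python
import shlex


def archive_then_format(query, db, archive="results.asn", outfmt="7",
                        max_target_seqs=10):
    """Return (search, reformat) argument lists for the archive workflow."""
    # Step 1: run the search once and save it in BLAST archive format (11).
    search = ["blastn", "-query", query, "-db", db,
              "-outfmt", "11", "-out", archive,
              "-max_target_seqs", str(max_target_seqs)]
    # Step 2: re-render the saved archive in another format, without re-searching.
    reformat = ["blast_formatter", "-archive", archive, "-outfmt", outfmt]
    return search, reformat


search, reformat = archive_then_format("HTT_gene", "hs_chr")
print(shlex.join(search))
print(shlex.join(reformat))
```

Step 2 can be repeated with different outfmt values against the same archive, as long as the database used for the original search is still available.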

The blast_formatter will format stand-alone searches performed with an earlier version of a database if both the search and formatting databases are prepared so that fetching by sequence ID is possible. To enable fetching by sequence ID, use the -parse_seqids flag when running makeblastdb, or (if available) download preformatted BLAST databases from ftp://ftp.ncbi.nlm.nih.gov/blast/db/ using update_blastdb.pl (provided as part of the BLAST+ package). Currently the blast archive format and blast_formatter do not work with database-free searches (i.e., -subject rather than -db was used for the original search).
