NCBI Datasets:

Datasets - NCBI: https://www.ncbi.nlm.nih.gov/datasets/

Installation

Windows download link:

https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/win64/datasets.exe

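Add the directory containing datasets.exe to the PATH environment variable, then type datasets in cmd. If the usage help is printed, the installation succeeded.

Installing with conda: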

conda create -n ncbi_datasets
conda activate ncbi_datasets
conda install -c conda-forge ncbi-datasets-cli
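To confirm that the install worked on Linux or macOS, the same "type datasets and see whether it responds" check can be scripted. This is a minimal sketch; the helper name have_cli is hypothetical and only tests whether a program is on PATH.

```shell
#!/bin/sh
# Return success if the named program can be found on PATH.
have_cli() {
    command -v "$1" >/dev/null 2>&1
}

if have_cli datasets; then
    echo "datasets found at: $(command -v datasets)"
else
    echo "datasets not found; is the ncbi_datasets environment active?"
fi
```

If the second message appears, running conda activate ncbi_datasets again is the first thing to try.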

Usage

Examples
  datasets download genome accession GCF_000001405.39 --chromosomes X,Y --exclude-gff3 --exclude-rna
  datasets download genome taxon "bos taurus"
  datasets download gene gene-id 672
  datasets download gene symbol brca1 --taxon mouse
  datasets download gene accession NP_000483.3
  datasets download virus genome taxon sars-cov-2 --host dog
  datasets download virus protein S --host dog --filename SARS2-spike-dog.zip
  datasets download --input-json request_file.json --filename output.zip

Options for genome downloads:

Pick the flags that select the data you need.

Flags
  -a, --annotated                only include genomes with annotation
      --assembly-level string    restrict assemblies to a comma-separated list of one or more of: chromosome, complete_genome, contig, scaffold
      --assembly-source string   restrict assemblies to refseq or genbank only
      --chromosomes strings      limit to a specified, comma-delimited list of chromosomes (default [all])
      --dehydrated               download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
      --exclude-genomic-cds      exclude cds_from_genomic.fna (genomic cds file)
      --exclude-gff3             exclude genomic.gff (gff3 annotation file)
      --exclude-protein          exclude protein.faa (protein sequence file)
      --exclude-rna              exclude rna.fna (transcript sequence file)
      --exclude-seq              exclude genomic.fna (genomic sequence file)
  -h, --help                     help for genome
      --include-gbff             include genomic.gbff (GenBank flat file sequence and annotation), if available
      --include-gtf              include genomic.gtf (gtf annotation file), if available
      --reference                limit to reference and representative (GCF_ and GCA_) assemblies
      --released-before string   only include genomes that have been released before a specified date (MM/DD/YYYY)
      --released-since string    only include genomes that have been released after a specified date (MM/DD/YYYY)
      --search strings           only include genomes that have the specified text in the
                                 searchable fields: species and infraspecies, assembly name and submitter
                                 To provide multiple strings '--search' can be included multiple times

For example, to download the genome data for fungi (taxid: 4751):

(The taxid can be found by searching NCBI. The other download options are listed by running datasets download.)

datasets download genome taxon "4751" --dehydrated --filename fungi_genome_dataset.zip --api-key 123456789abcdefghijk

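Because the dataset is large, first download it as a dehydrated zip archive, which holds only the JSON metadata and the locations of the data files. The trailing --api-key keeps the server from blocking your IP for sending too many requests in a short time; an API key can be obtained by registering an NCBI account.

Once fungi_genome_dataset.zip has been downloaded, unzip it into the current directory. The file structure looks like this: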
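To keep the API key out of scripts and shell history, the download command can be assembled from an environment variable. The sketch below only prints the command instead of running it; the function name build_download_cmd and the variable name NCBI_API_KEY are choices made for this illustration, and only flags shown in this post are used.

```shell
#!/bin/sh
# Print (do not run) a dehydrated genome download command for one taxon.
# Assumption: the API key, if any, is stored in a variable named NCBI_API_KEY.
build_download_cmd() {
    taxid="$1"
    outfile="$2"
    set -- datasets download genome taxon "$taxid" --dehydrated --filename "$outfile"
    # Append --api-key only when a key is actually set.
    if [ -n "${NCBI_API_KEY:-}" ]; then
        set -- "$@" --api-key "$NCBI_API_KEY"
    fi
    printf '%s\n' "$*"
}

build_download_cmd 4751 fungi_genome_dataset.zip
```

Review the printed line, then paste it into the terminal to start the download.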

Archive:  fungi_genome_dataset.zip
  inflating: fungi_genome_dataset/README.md
  inflating: fungi_genome_dataset/ncbi_dataset/data/*/assembly_data_report.jsonl
  inflating: fungi_genome_dataset/ncbi_dataset/data/dataset_catalog.json
  inflating: fungi_genome_dataset/ncbi_dataset/fetch.txt
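Before fetching the actual sequence files, it can help to see how much data fetch.txt points at. The following is a rough sketch that assumes each line of fetch.txt has the form "URL size-in-bytes relative-path"; check a few lines of your own file first. The function name summarize_fetch is hypothetical.

```shell
#!/bin/sh
# Summarize a fetch.txt manifest: number of listed files and total size.
# Assumption: each line is "<URL> <size in bytes> <relative path>".
summarize_fetch() {
    awk 'NF >= 3 { n++; if ($2 ~ /^[0-9]+$/) bytes += $2 }
         END { printf "%d files, %.1f MB\n", n, bytes / 1000000 }' "$1"
}

manifest=fungi_genome_dataset/ncbi_dataset/fetch.txt
if [ -f "$manifest" ]; then
    summarize_fetch "$manifest"
fi
```

Lines whose size field is not a plain number are still counted as files but add nothing to the total.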

Downloading the data files

## If the command reports "not find", check the path format carefully
datasets rehydrate --directory fungi_genome_dataset/
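After rehydration finishes, a quick way to spot interrupted downloads is to compare fetch.txt with what is on disk. This sketch assumes the third column of fetch.txt is a path relative to the directory that contains fetch.txt (here fungi_genome_dataset/ncbi_dataset); the function name list_missing is hypothetical.

```shell
#!/bin/sh
# Print every path listed in fetch.txt that does not exist on disk yet.
# Assumption: column 3 of fetch.txt is relative to the directory holding fetch.txt.
list_missing() {
    bag_dir="$1"
    while read -r url size path; do
        [ -n "$path" ] || continue
        [ -e "$bag_dir/$path" ] || printf '%s\n' "$path"
    done < "$bag_dir/fetch.txt"
}

bag=fungi_genome_dataset/ncbi_dataset
if [ -f "$bag/fetch.txt" ]; then
    list_missing "$bag"
fi
```

An empty result means every listed file is present. If paths are printed, rerunning datasets rehydrate on the same directory is the obvious next step.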

datasets download genome:


Download a genome dataset including genome, transcript and protein sequence, annotation and a detailed data report.
Genome datasets can be specified by NCBI Assembly or BioProject accession or taxon. Datasets are downloaded as a zip file.

The default genome dataset includes the following files (if available):
* genomic.fna (genomic sequences)
* rna.fna (transcript sequences)
* protein.faa (protein sequences)
* genomic.gff (genome annotation in gff3 format)
* data_report.jsonl (data report with genome assembly and annotation metadata)
* dataset_catalog.json (a list of files and file types included in the dataset)

Refer to NCBI's [command line quickstart](https://www.ncbi.nlm.nih.gov/datasets/docs/quickstarts/command-line-tools/) documentation for information about getting started with the command-line tools.

Usage
  datasets download genome [command]

Examples
  datasets download genome accession GCF_000001405.39 --chromosomes X,Y --exclude-gff3 --exclude-rna
  datasets download genome taxon "bos taurus" --dehydrated
  datasets download genome taxon human --assembly-level chromosome,complete_genome --dehydrated
  datasets download genome taxon mouse --search C57BL/6J --search "Broad Institute" --dehydrated

Available Commands
  accession   download a genome dataset by NCBI Assembly or BioProject accession
  taxon       download a genome dataset by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)

Flags
  -a, --annotated                only include genomes with annotation
      --assembly-level string    restrict assemblies to a comma-separated list of one or more of: chromosome, complete_genome, contig, scaffold
      --assembly-source string   restrict assemblies to refseq or genbank only
      --chromosomes strings      limit to a specified, comma-delimited list of chromosomes (default [all])
      --dehydrated               download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
      --exclude-genomic-cds      exclude cds_from_genomic.fna (genomic cds file)
      --exclude-gff3             exclude genomic.gff (gff3 annotation file)
      --exclude-protein          exclude protein.faa (protein sequence file)
      --exclude-rna              exclude rna.fna (transcript sequence file)
      --exclude-seq              exclude genomic.fna (genomic sequence file)
  -h, --help                     help for genome
      --include-gbff             include genomic.gbff (GenBank flat file sequence and annotation), if available
      --include-gtf              include genomic.gtf (gtf annotation file), if available
      --reference                limit to reference and representative (GCF_ and GCA_) assemblies
      --released-before string   only include genomes that have been released before a specified date (MM/DD/YYYY)
      --released-since string    only include genomes that have been released after a specified date (MM/DD/YYYY)
      --search strings           only include genomes that have the specified text in the
                                 searchable fields: species and infraspecies, assembly name and submitter
                                 To provide multiple strings '--search' can be included multiple times

Global Flags
      --api-key string    NCBI Datasets API Key
      --filename string   specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
      --no-progressbar    hide progress bar

Use datasets download genome help <command> for detailed help about a command.

datasets download gene:

Usage
  datasets download gene [flags]
  datasets download gene [command]

Examples
  datasets download gene gene-id 672
  datasets download gene symbol brca1 --taxon mouse
  datasets download gene accession NP_000483.3
  datasets download gene gene-id 2778 --fasta-filter NC_000020.11,NM_001077490.3,NP_001070958.1

Available Commands
  gene-id     download a gene dataset by NCBI Gene ID
  symbol      download a gene dataset by gene symbol
  accession   download a gene dataset by RefSeq nucleotide or protein accession
  taxon       download a gene dataset by taxon

Flags
      --exclude-gene               exclude gene.fna (gene sequence file)
      --exclude-protein            exclude protein.faa (protein sequence file)
      --exclude-rna                exclude rna.fna (transcript sequence file)
      --fasta-filter strings       limit gene fasta download to a specific list of accessions
      --fasta-filter-file string   file of accessions to limit gene fasta download
  -h, --help                       help for gene
      --include-3p-utr             include 3p_utr.fna (3'-UTR sequence file)
      --include-5p-utr             include 5p_utr.fna (5'-UTR sequence file)
      --include-cds                include cds.fna (CDS sequence file)

Global Flags
      --api-key string    NCBI Datasets API Key
      --filename string   specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
      --no-progressbar    hide progress bar

Use datasets download gene help <command> for detailed help about a command.
