1.Entrez Guidelines

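For any series of more than 100 requests, run them on weekends or outside USA peak hours. It is up to you to obey this.
Biopython uses the https://eutils.ncbi.nlm.nih.gov address, not the standard NCBI web address.
With an API key you can make at most 10 queries per second; without one, at most 3 queries per second.
Set the optional email parameter so that NCBI knows who you are and can contact you if there is a problem.
If you use Biopython within a larger software package, set the tool parameter to identify it explicitly. For example: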
```
from Bio import Entrez
Entrez.tool = "MyLocalScript"
```
For large queries, Biopython also recommends using NCBI's session history feature (the WebEnv session cookie string), which is introduced in a later section.
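
The per-second limits above translate into a minimum gap between consecutive requests. Bio.Entrez applies a delay between its own requests, so the following is only a sketch of the arithmetic, for code that talks to E-utilities without Biopython. `min_interval` and `Throttle` are hypothetical names, not part of any library.

```python
import time


def min_interval(has_api_key: bool) -> float:
    """Smallest allowed gap, in seconds, between two requests."""
    requests_per_second = 10 if has_api_key else 3
    return 1.0 / requests_per_second


class Throttle:
    """Block in wait() until the minimum interval has passed since the last call."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval(has_api_key=False))
for _ in range(3):
    throttle.wait()  # call this right before each request
```

The clock and sleep functions are injectable so the timing logic can be checked without actually waiting.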

2.EInfo:Obtaining information about the Entrez databases

You can use EInfo to get the list of database names, the date of the last update, and the available links for each NCBI database.

from Bio import Entrez
Entrez.email = "xxx@example.com"  ##always tell NCBI who you are
Info = Entrez.einfo()
result = Info.read() ##The variable result now contains the list of databases in XML format.
print(result)

'''
This result is XML; we could extract the information by string searching.
Instead, using Bio.Entrez's parser, we can parse the XML directly into a Python object.
'''
Info = Entrez.einfo()
record = Entrez.read(Info)
#Now record is a dictionary with exactly one key, 'DbList':
record['DbList']  #a list of the NCBI database names

#For each of these databases, we can use EInfo again to obtain more information.
from Bio import Entrez
Entrez.email = 'xxx@example.com'
Info = Entrez.einfo(db = 'pubmed')
record = Entrez.read(Info)
#print(record)  #uncomment to print the whole record dictionary
print(record['DbInfo'].keys())
print(record['DbInfo']['Description'])
print(record['DbInfo']['Count'])
print(record['DbInfo']['LastUpdate'])
#for field in record["DbInfo"]["FieldList"]:
#    print(field.keys())
#for field in record["DbInfo"]["FieldList"]:
#    print("%(Name)s, %(FullName)s, %(Description)s" % field)
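
The FieldList entries printed by the commented-out loops above can also be turned into a lookup table from field name to description, which is handy when writing ESearch terms. A minimal sketch, assuming each entry behaves like a dict with the Name, FullName and Description keys used above. The two sample entries are abbreviated stand-ins for what EInfo returns for PubMed, and `index_fields` is a hypothetical helper.

```python
def index_fields(field_list):
    """Map each field's short Name to a (FullName, Description) tuple."""
    return {f["Name"]: (f["FullName"], f["Description"]) for f in field_list}


# Illustrative stand-in for record["DbInfo"]["FieldList"];
# the real list has many more entries.
sample_field_list = [
    {"Name": "TITL", "FullName": "Title",
     "Description": "Words in title of publication"},
    {"Name": "AUTH", "FullName": "Author",
     "Description": "Author(s) of publication"},
]

fields = index_fields(sample_field_list)
for name, (full_name, description) in sorted(fields.items()):
    print(f"{name}: {full_name} - {description}")
```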

3.ESearch:Searching the Entrez databases

We can use Bio.Entrez.esearch() to search any of these databases.

#Search PubMed for publications related to Biopython
from Bio import Entrez
Entrez.email = 'xxx@example.com'
handle = Entrez.esearch(db = 'pubmed',term='biopython')
record = Entrez.read(handle)
"19304878" in record['IdList'] #True if that PMID is among the returned IDs (only the first 20 are returned by default)
print(len(record["IdList"]))

#you can also use ESearch to search GenBank.
from Bio import Entrez
Entrez.email = 'xxx@example.com'
handle = Entrez.esearch(db="nucleotide", term="Cypripedioideae[Orgn] AND matK[Gene]", idtype="acc")
record = Entrez.read(handle)
print(record["Count"])
print(record["IdList"]) #with idtype="acc" each ID is an accession.version identifier; downloading is covered in the EFetch section
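
The search term above combines two field-tagged clauses with AND, and ESearch returns only the first 20 IDs unless retmax is raised, so Count can be larger than len(IdList). A small sketch of both ideas follows. `build_term` and `page_offsets` are hypothetical helpers and make no network requests.

```python
def build_term(clauses):
    """Join (value, field) pairs into an Entrez query like 'x[Orgn] AND y[Gene]'."""
    return " AND ".join(f"{value}[{field}]" for value, field in clauses)


def page_offsets(count, retmax):
    """Return the retstart values needed to page through `count` results."""
    return list(range(0, count, retmax))


term = build_term([("Cypripedioideae", "Orgn"), ("matK", "Gene")])
print(term)

# Each offset would be passed as retstart=..., together with retmax=...,
# in a separate esearch call.
print(page_offsets(count=45, retmax=20))
```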

4.EPost:Uploading a list of identifiers

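When you have a list of database IDs that you want to download with EFetch, a long list produces a long URL, and long URLs can break (or not be handled well).
Instead, you can split the work into two steps. First, upload the ID list with EPost (internally this uses an HTTP POST rather than an HTTP GET, which gets around the long-URL problem). Then you can download the associated data with EFetch.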

from Bio import Entrez
Entrez.email = "xxx@example.com"
id_list = ["19304878", "18606172", "16403221", "16377612", "14871861", "14630660"]
#print(Entrez.epost("pubmed",id = ",".join(id_list)).read())
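#The returned XML includes two important strings, QueryKey and WebEnv, which together define your history session.
#Other Entrez functions, such as EFetch, can then use this history session.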
search_result = Entrez.read(Entrez.epost("pubmed",id = ",".join(id_list)))
QueryKey = search_result["QueryKey"]
WebEnv = search_result["WebEnv"]
#print(QueryKey)
#print(WebEnv)
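
Once QueryKey and WebEnv are available, EFetch can be asked for the uploaded records in batches instead of by ID. A sketch under stated assumptions: `fetch_history_batches` is a hypothetical helper, its `efetch` argument is expected to be Bio.Entrez.efetch (or anything with the same keyword interface), and `rettype="medline"` for PubMed is chosen only for illustration. The stand-in `fake_efetch` exists so the block runs without contacting NCBI.

```python
import io


def fetch_history_batches(efetch, db, webenv, query_key, total,
                          batch_size=3, **fmt):
    """Yield the text of each batch of records stored in a history session."""
    for start in range(0, total, batch_size):
        handle = efetch(db=db, webenv=webenv, query_key=query_key,
                        retstart=start, retmax=batch_size, **fmt)
        try:
            yield handle.read()
        finally:
            handle.close()


calls = []


def fake_efetch(**kwargs):
    """Stand-in for Entrez.efetch: records its arguments, returns an empty handle."""
    calls.append(kwargs)
    return io.StringIO("")


batches = list(fetch_history_batches(fake_efetch, "pubmed", "WEBENV", "1",
                                     total=6, rettype="medline",
                                     retmode="text"))
print([c["retstart"] for c in calls])
```

With Biopython installed, passing Entrez.efetch together with the WebEnv and QueryKey values from the EPost result, and total=len(id_list), would request the six uploaded records in two batches.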

5.ESummary:Retrieving summaries from primary IDs

We can use ESummary to retrieve summaries from a list of primary IDs.

from Bio import Entrez
Entrez.email = "xxx@example.com"
handle = Entrez.esummary(db = 'nlmcatalog',id = '101660833')
record = Entrez.read(handle)
info = record[0]["TitleMainList"][0]
print(info)
print("-------------------------------------------")
for key in record[0]:print(key,":",record[0][key])
print("-------------------------------------------")
print("Journal info\nid : {}\nTitle : {}".format(record[0]["Id"],info["Title"]))

6.EFetch:Downloading full records from Entrez

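When you want to retrieve a full record from Entrez, you can use EFetch, which covers several of the databases.
For these databases, NCBI supports a variety of file formats; when using EFetch, specify the format with the optional rettype and/or retmode arguments.
One common use of EFetch is downloading sequences in FASTA or GenBank/GenPept format.

Use Bio.Entrez.efetch to download GenBank record EU490707: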
from Bio import Entrez
Entrez.email = "xxx@example.com"
handle = Entrez.efetch(db = 'nucleotide',id = "eu490707",rettype = 'gbwithparts',retmode = 'text')
print(handle.read())

#If you fetch the record in a format that Bio.SeqIO supports, you can parse it directly into a SeqRecord.
from Bio import SeqIO
from Bio import Entrez
Entrez.email = "xxx@example.com" #always tell NCBI who you are
handle = Entrez.efetch(db = 'nucleotide' , id = "EU490707",rettype = 'gb',retmode = 'text')
record = SeqIO.read(handle,'genbank')
handle.close()
print(record.id)
print(record.name)
print(record.description)
#print(record.features)
print(repr(record.seq))
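
The rettype passed to EFetch and the format name passed to Bio.SeqIO have to describe the same file format, and the two do not always use the same name ('gb' versus 'genbank' above). A small lookup sketch for the return types mentioned in this section follows. `SEQIO_FORMAT` and `seqio_format_for` are hypothetical names, and the table covers only these three cases.

```python
# EFetch rettype (with retmode="text") -> Bio.SeqIO format name
SEQIO_FORMAT = {
    "fasta": "fasta",
    "gb": "genbank",
    "gbwithparts": "genbank",
}


def seqio_format_for(rettype):
    """Return the Bio.SeqIO format name matching an EFetch rettype."""
    try:
        return SEQIO_FORMAT[rettype.lower()]
    except KeyError:
        raise ValueError(f"no SeqIO format known for rettype {rettype!r}") from None


print(seqio_format_for("gb"))
```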

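#A more common approach is to save the sequence to a local file and then parse it with Bio.SeqIO.
#This saves the time of downloading the same file repeatedly and reduces the load on NCBI's servers.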
import os
from Bio import SeqIO
from Bio import Entrez
Entrez.email = 'xxx@example.com'
filename = 'EU490707.gbk'
if not os.path.isfile(filename):
    print('Download...')
    net_handle = Entrez.efetch(db = 'nucleotide' , id = 'EU490707' , rettype = 'gb' , retmode = 'text')
    outfile = open(filename,'w')
    outfile.write(net_handle.read())
    outfile.close()
    net_handle.close()
    print('Save...')
print('Parsing...')
record = SeqIO.read(filename,"genbank")
print(record)
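
The download-once pattern above can be wrapped in a function so that any fetch reuses the local copy. A minimal sketch, assuming the caller supplies a zero-argument `download` function that returns the record text (for example a wrapper around Entrez.efetch). `cached_text` is a hypothetical helper, and the demo uses a dummy downloader and a temporary directory so that it makes no network requests.

```python
import os
import tempfile


def cached_text(filename, download):
    """Return the contents of filename, calling download() only if it is missing."""
    if not os.path.isfile(filename):
        text = download()
        with open(filename, "w") as outfile:
            outfile.write(text)
    with open(filename) as infile:
        return infile.read()


downloads = []  # one entry per call to the dummy downloader


def dummy_download():
    """Stand-in for a real fetch; records that it was called."""
    downloads.append(1)
    return "LOCUS placeholder\n"


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.gbk")
    first = cached_text(path, dummy_download)
    second = cached_text(path, dummy_download)  # served from disk

print(len(downloads))
```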

#To get XML output that Bio.Entrez can parse, set retmode = 'xml'
from Bio import Entrez
Entrez.email = 'xxx@example.com'
handle = Entrez.efetch(db = 'nucleotide',id = 'EU490707',retmode = 'xml')
result = Entrez.read(handle)
handle.close()
for key in result[0]:print(key)    #print the keys
result[0]['GBSeq_definition']
result[0]['GBSeq_source']

