Table of Contents

EInfo: Obtaining information about the Entrez databases
- For each of these databases, we can use EInfo again to obtain more information
ESearch: Searching the Entrez databases
EPost: Uploading a list of identifiers
EFetch: Downloading full records from Entrez
ELink: Searching for related items in NCBI Entrez
Parsing Medline records
Parsing GEO records
Parsing UniGene records
ESpell: Obtaining spelling suggestions
ESummary: Retrieving summaries from primary IDs
example 1
example 2
example 3
- Searching, downloading, and parsing Entrez Nucleotide records
Searching, downloading, and parsing GenBank records
Using the history and WebEnv -- most important
Searching for and downloading abstracts using the history

Entrez (https://www.ncbi.nlm.nih.gov/Web/Search/entrezfs.html) is a data retrieval system that gives users access to NCBI's databases such as PubMed, GenBank, GEO, and many others. You can access Entrez from a web browser to enter queries manually, or you can use Biopython's Bio.Entrez module for programmatic access. The latter lets you, for example, search PubMed or download GenBank records from within a Python script.

The Bio.Entrez module makes use of the Entrez Programming Utilities (also known as EUtils), consisting of eight tools that are described in detail on NCBI’s page at https://www.ncbi.nlm.nih.gov/books/NBK25501/. Each of these tools corresponds to one Python function in the Bio.Entrez module, as described in the sections below.

Before using Biopython to access the NCBI’s online resources (via Bio.Entrez or some of the other modules), please read the NCBI’s Entrez User Requirements. If the NCBI finds you are abusing their systems, they can and will ban your access!

• For any series of more than 100 requests, run them at weekends or outside USA peak times. It is up to you to obey this.
• Use the https://eutils.ncbi.nlm.nih.gov address, not the standard NCBI Web address. Biopython uses this web address.
• If you are using an API key, you can make at most 10 queries per second; otherwise, at most 3 queries per second. Bio.Entrez enforces this limit automatically.
• Use the optional email parameter so the NCBI can contact you if there is a problem. You can either explicitly set this as a parameter with each call to Entrez (e.g. include email="A.N.Other@example.com" in the argument list), or you can set a global email address:

>>> from Bio import Entrez
>>> Entrez.email = "A.N.Other@example.com"

• For large queries, the NCBI also recommends using their session history feature.
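A minimal sketch of the module-level settings that cover these rules. `Entrez.email`, `Entrez.tool` and `Entrez.api_key` are Bio.Entrez attributes; the key string is a placeholder, and `min_interval` is a hypothetical helper that only illustrates the 3 vs 10 requests-per-second rule (it is not Biopython's own throttling code, which already spaces out requests for you).

```python
from Bio import Entrez

# Identify yourself once; later Bio.Entrez calls send this address to NCBI.
Entrez.email = "A.N.Other@example.com"
# Optional: the name of your script, sent to NCBI as the "tool" parameter.
Entrez.tool = "MyLocalScript"
# Optional: a personal NCBI API key raises the limit from 3 to 10 requests per second.
# "MyAPIkey" is a placeholder, so it stays commented out here.
# Entrez.api_key = "MyAPIkey"


def min_interval(api_key):
    """Hypothetical helper: smallest gap in seconds between requests under the NCBI rule."""
    return 1 / 10 if api_key else 1 / 3


print(min_interval(Entrez.api_key))
```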

EInfo: Obtaining information about the Entrez databases

# EInfo: Obtaining information about the Entrez databases
# EInfo provides field index term counts, last update, and available links for each of NCBI’s databases
# In addition, you can use EInfo to obtain a list of all database names accessible through the Entrez utilities
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.einfo()
result = handle.read()
handle.close()
print(result)

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD einfo 20190110//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20190110/einfo.dtd">
<eInfoResult>
<DbList><DbName>pubmed</DbName><DbName>protein</DbName><DbName>nuccore</DbName><DbName>ipg</DbName><DbName>nucleotide</DbName><DbName>structure</DbName><DbName>genome</DbName><DbName>annotinfo</DbName><DbName>assembly</DbName><DbName>bioproject</DbName><DbName>biosample</DbName><DbName>blastdbinfo</DbName><DbName>books</DbName><DbName>cdd</DbName><DbName>clinvar</DbName><DbName>gap</DbName><DbName>gapplus</DbName><DbName>grasp</DbName><DbName>dbvar</DbName><DbName>gene</DbName><DbName>gds</DbName><DbName>geoprofiles</DbName><DbName>homologene</DbName><DbName>medgen</DbName><DbName>mesh</DbName><DbName>ncbisearch</DbName><DbName>nlmcatalog</DbName><DbName>omim</DbName><DbName>orgtrack</DbName><DbName>pmc</DbName><DbName>popset</DbName><DbName>proteinclusters</DbName><DbName>pcassay</DbName><DbName>protfam</DbName><DbName>biosystems</DbName><DbName>pccompound</DbName><DbName>pcsubstance</DbName><DbName>seqannot</DbName><DbName>snp</DbName><DbName>sra</DbName><DbName>taxonomy</DbName><DbName>biocollections</DbName><DbName>gtr</DbName>
</DbList></eInfoResult>

# Parse the XML into Python objects with Bio.Entrez's parser
handle = Entrez.einfo()
record = Entrez.read(handle)
record.keys()

dict_keys(['DbList'])

record["DbList"]

['pubmed', 'protein', 'nuccore', 'ipg', 'nucleotide', 'structure', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'gap', 'gapplus', 'grasp', 'dbvar', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'proteinclusters', 'pcassay', 'protfam', 'biosystems', 'pccompound', 'pcsubstance', 'seqannot', 'snp', 'sra', 'taxonomy', 'biocollections', 'gtr']

For each of these databases, we can use EInfo again to obtain more information

# For each of these databases, we can use EInfo again to obtain more information:
handle = Entrez.einfo(db="pubmed")
record = Entrez.read(handle)
record["DbInfo"]  # a dictionary; look at its keys, then read the entries you need
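The parsed `DbInfo` dictionary can be explored like any other dictionary. The sketch below pulls out a few entries; the key names (`Description`, `Count`, `LastUpdate`, and `FieldList` with `Name`/`FullName` per search field) follow the EInfo output for PubMed, the helper names are hypothetical, and the sample dictionary in the test is made up rather than real NCBI output.

```python
from Bio import Entrez


def summarize_dbinfo(dbinfo):
    """Return a short text summary of a parsed EInfo "DbInfo" dictionary."""
    lines = [
        "Description: %s" % dbinfo["Description"],
        "Records: %s (last update %s)" % (dbinfo["Count"], dbinfo["LastUpdate"]),
    ]
    # Each FieldList entry describes one search field, e.g. Name "TITL" with FullName "Title".
    for field in dbinfo["FieldList"]:
        lines.append("%s: %s" % (field["Name"], field["FullName"]))
    return "\n".join(lines)


def describe_database(db):
    """Query EInfo for one database (needs network access) and summarize the result."""
    handle = Entrez.einfo(db=db)
    record = Entrez.read(handle)
    handle.close()
    return summarize_dbinfo(record["DbInfo"])
```

With network access, `print(describe_database("pubmed"))` lists the PubMed search fields; their short names (such as `TITL`) are the tags you can put in square brackets in an ESearch term.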

ESearch: Searching the Entrez databases

# ESearch: Searching the Entrez databases
# To search any of these databases, we use Bio.Entrez.esearch(). For example, let's search in PubMed for
# publications that include Biopython in their title
handle = Entrez.esearch(db="pubmed", term="biopython[title]", retmax="40")
record = Entrez.read(handle)
# "19304878" in record["IdList"]
print(record["IdList"])

['34434786', '22909249', '19304878']

# You can also use ESearch to search GenBank. Here we’ll do a quick search for the matK gene in Cypripedioideae orchids
handle = Entrez.esearch(db="nucleotide", term="Cypripedioideae[Orgn] AND matK[Gene]", idtype="acc")
record = Entrez.read(handle)
print(record["Count"])
print(record["IdList"])  # with idtype="acc" each ID is an accession.version; only the first 20 hits (default retmax) are listed

585
['NC_058834.1', 'NC_058833.1', 'NC_058832.1', 'OK120861.1', 'OK120860.1', 'NC_058212.1', 'NC_058211.1', 'NC_058210.1', 'NC_058209.1', 'NC_058208.1', 'NC_058207.1', 'NC_058206.1', 'MN315109.1', 'MN315108.1', 'MN315107.1', 'MN315106.1', 'MN315105.1', 'NC_056912.1', 'MZ150831.1', 'MZ150830.1']
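ESearch terms use Entrez query syntax: a word can be restricted to a field with a `[tag]` suffix, and clauses are combined with uppercase Boolean operators such as `AND`. The helper below is a hypothetical convenience, not part of Bio.Entrez, that assembles such a term from a dictionary; it reproduces the string used in the orchid search above.

```python
def build_term(fields, operator="AND"):
    """Hypothetical helper: join {tag: value} pairs into an Entrez query string."""
    clauses = ["%s[%s]" % (value, tag) for tag, value in fields.items()]
    return (" %s " % operator).join(clauses)


term = build_term({"Orgn": "Cypripedioideae", "Gene": "matK"})
print(term)  # Cypripedioideae[Orgn] AND matK[Gene]
```

The result can be passed on unchanged, e.g. `Entrez.esearch(db="nucleotide", term=term, idtype="acc")`.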

EPost: Uploading a list of identifiers

# EPost: Uploading a list of identifiers
# EPost uploads a list of UIs for use in subsequent search strategies
# To give an example of when this is useful, suppose you have a long list of IDs you want to download
# using EFetch (maybe sequences, maybe citations, anything). When you make a request with EFetch, your
# list of IDs, the database, etc. are all turned into a long URL sent to the server. If your list of IDs is long,
# this URL gets long, and long URLs can break (e.g. some proxies don't cope well).
id_list = ["19304878", "18606172", "16403221", "16377612", "14871861", "14630660"]
print(Entrez.epost("pubmed", id=",".join(id_list)).read())

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD epost 20090526//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20090526/epost.dtd"><ePostResult><QueryKey>1</QueryKey><WebEnv>MCID_61a9ccaf13217214767bf8b1</WebEnv>
</ePostResult>

# The returned XML includes two important strings, QueryKey and WebEnv which together define your history
# session. You would extract these values for use with another Entrez call such as EFetch
id_list = ["19304878", "18606172", "16403221", "16377612", "14871861", "14630660"]
search_results = Entrez.read(Entrez.epost("pubmed", id=",".join(id_list)))
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]
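Once you have `WebEnv` and `QueryKey`, EFetch can refer to the uploaded list instead of repeating the IDs. A minimal sketch, assuming MEDLINE text output is wanted; `history_params` and `fetch_posted` are hypothetical helper names, while `webenv` and `query_key` are the EFetch arguments that the history section of this article also uses.

```python
from Bio import Entrez


def history_params(result):
    """Pick the two history-session values out of a parsed EPost/ESearch result."""
    return {"webenv": result["WebEnv"], "query_key": result["QueryKey"]}


def fetch_posted(id_list, db="pubmed"):
    """Upload IDs with EPost, then download them with EFetch via the history (needs network)."""
    post_result = Entrez.read(Entrez.epost(db, id=",".join(id_list)))
    handle = Entrez.efetch(db=db, rettype="medline", retmode="text", **history_params(post_result))
    data = handle.read()
    handle.close()
    return data
```

With network access, `fetch_posted(id_list)` returns the MEDLINE text for the six PubMed IDs posted above.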

EFetch: Downloading full records from Entrez

# EFetch: Downloading full records from Entrez
# EFetch is what you use when you want to retrieve a full record from Entrez
# For most of their databases, the NCBI support several different file formats. Requesting a specific file
# format from Entrez using Bio.Entrez.efetch() requires specifying the rettype and/or retmode optional arguments.
handle = Entrez.efetch(db="nucleotide", id="EU490707", rettype="gb", retmode="text")
print(handle.read())

LOCUS       EU490707                1302 bp    DNA     linear   PLN 26-JUL-2016
DEFINITION  Selenipedium aequinoctiale maturase K (matK) gene, partial cds;
            chloroplast.
ACCESSION   EU490707
VERSION     EU490707.1
KEYWORDS    .
SOURCE      chloroplast Selenipedium aequinoctiale
  ORGANISM  Selenipedium aequinoctiale
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliopsida; Liliopsida; Asparagales; Orchidaceae;
            Cypripedioideae; Selenipedium.
REFERENCE   1  (bases 1 to 1302)
  AUTHORS   Neubig,K.M., Whitten,W.M., Carlsward,B.S., Blanco,M.A., Endara,L.,
            Williams,N.H. and Moore,M.
  TITLE     Phylogenetic utility of ycf1 in orchids: a plastid gene more
            variable than matK
  JOURNAL   Plant Syst. Evol. 277 (1-2), 75-84 (2009)
REFERENCE   2  (bases 1 to 1302)
  AUTHORS   Neubig,K.M., Whitten,W.M., Carlsward,B.S., Blanco,M.A.,
            Endara,C.L., Williams,N.H. and Moore,M.J.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-FEB-2008) Department of Botany, University of
            Florida, 220 Bartram Hall, Gainesville, FL 32611-8526, USA
FEATURES             Location/Qualifiers
     source          1..1302
                     /organism="Selenipedium aequinoctiale"
                     /organelle="plastid:chloroplast"
                     /mol_type="genomic DNA"
                     /specimen_voucher="FLAS:Blanco 2475"
                     /db_xref="taxon:256374"
     gene            <1..>1302
                     /gene="matK"
     CDS             <1..>1302
                     /gene="matK"
                     /codon_start=1
                     /transl_table=11
                     /product="maturase K"
                     /protein_id="ACC99456.1"
                     /translation="IFYEPVEIFGYDNKSSLVLVKRLITRMYQQNFLISSVNDSNQKGFWGHKHFFSSHFSSQMVSEGFGVILEIPFSSQLVSSLEEKKIPKYQNLRSIHSIFPFLEDKFLHLNYVSDLLIPHPIHLEILVQILQCRIKDVPSLHLLRLLFHEYHNLNSLITSKKFIYAFSKRKKRFLWLLYNSYVYECEYLFQFLRKQSSYLRSTSSGVFLERTHLYVKIEHLLVVCCNSFQRILCFLKDPFMHYVRYQGKAILASKGTLILMKKWKFHLVNFWQSYFHFWSQPYRIHIKQLSNYSFSFLGYFSSVLENHLVVRNQMLENSFIINLLTKKFDTIAPVISLIGSLSKAQFCTVLGHPISKPIWTDFSDSDILDRFCRICRNLCRYHSGSSKKQVLYRIKYILRLSCARTLARKHKSTVRTFMRRLGSGLLEEFFMEEE"
ORIGIN
        1 attttttacg aacctgtgga aatttttggt tatgacaata aatctagttt agtacttgtg
       61 aaacgtttaa ttactcgaat gtatcaacag aattttttga tttcttcggt taatgattct
      121 aaccaaaaag gattttgggg gcacaagcat tttttttctt ctcatttttc ttctcaaatg
      181 gtatcagaag gttttggagt cattctggaa attccattct cgtcgcaatt agtatcttct
      241 cttgaagaaa aaaaaatacc aaaatatcag aatttacgat ctattcattc aatatttccc
      301 tttttagaag acaaattttt acatttgaat tatgtgtcag atctactaat accccatccc
      361 atccatctgg aaatcttggt tcaaatcctt caatgccgga tcaaggatgt tccttctttg
      421 catttattgc gattgctttt ccacgaatat cataatttga atagtctcat tacttcaaag
      481 aaattcattt acgccttttc aaaaagaaag aaaagattcc tttggttact atataattct
      541 tatgtatatg aatgcgaata tctattccag tttcttcgta aacagtcttc ttatttacga
      601 tcaacatctt ctggagtctt tcttgagcga acacatttat atgtaaaaat agaacatctt
      661 ctagtagtgt gttgtaattc ttttcagagg atcctatgct ttctcaagga tcctttcatg
      721 cattatgttc gatatcaagg aaaagcaatt ctggcttcaa agggaactct tattctgatg
      781 aagaaatgga aatttcatct tgtgaatttt tggcaatctt attttcactt ttggtctcaa
      841 ccgtatagga ttcatataaa gcaattatcc aactattcct tctcttttct ggggtatttt
      901 tcaagtgtac tagaaaatca tttggtagta agaaatcaaa tgctagagaa ttcatttata
      961 ataaatcttc tgactaagaa attcgatacc atagccccag ttatttctct tattggatca
     1021 ttgtcgaaag ctcaattttg tactgtattg ggtcatccta ttagtaaacc gatctggacc
     1081 gatttctcgg attctgatat tcttgatcga ttttgccgga tatgtagaaa tctttgtcgt
     1141 tatcacagcg gatcctcaaa aaaacaggtt ttgtatcgta taaaatatat acttcgactt
     1201 tcgtgtgcta gaactttggc acggaaacat aaaagtacag tacgcacttt tatgcgaaga
     1261 ttaggttcgg gattattaga agaattcttt atggaagaag aa
//
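If you don't need a local copy, the text that EFetch returns can be handed straight to Bio.SeqIO by wrapping it in `io.StringIO`. This sketch uses FASTA (`rettype="fasta"`) because it is short; the record is made up, and the commented EFetch call only shows where real data would come from.

```python
from io import StringIO

from Bio import SeqIO

# With network access the text would come from, e.g.:
#   text = Entrez.efetch(db="nucleotide", id="EU490707", rettype="fasta", retmode="text").read()
# A made-up FASTA record stands in for that download here.
text = ">EXAMPLE1.1 made-up sequence\nATGCGTACGTTAG\n"

record = SeqIO.read(StringIO(text), "fasta")
print(record.id, len(record))  # EXAMPLE1.1 13
```

The same pattern works for GenBank text: pass `"genbank"` as the format name instead of `"fasta"`.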

# save the sequence data to a local file, and then parse it with Bio.SeqIO
import os
from Bio import SeqIO
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
filename = "EU490707.gbk"
if not os.path.isfile(filename):
    # Downloading...
    net_handle = Entrez.efetch(db="nucleotide", id="EU490707", rettype="gb", retmode="text")
    with open(filename, "w") as out_handle:
        out_handle.write(net_handle.read())
    net_handle.close()
    print("Saved")
print("Parsing...")
record = SeqIO.read(filename, "genbank")
print(record)

ELink: Searching for related items in NCBI Entrez

# ELink: Searching for related items in NCBI Entrez
# ELink can be used to find related items in the NCBI Entrez databases.
# Let's use ELink to find articles related to the Biopython application note published in Bioinformatics in 2009.
# The PubMed ID of this article is 19304878.
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
pmid = "19304878"
record = Entrez.read(Entrez.elink(dbfrom="pubmed", id=pmid))  # record is a Python list
# Since we specified only one PubMed ID to search for, record contains only one item. This item is a dictionary
# containing information about our search term, as well as all the related items that were found
# record[0]  # the single element of the list
len(record[0]["LinkSetDb"])  # "LinkSetDb" holds the search results: one entry per target database and link type
for linksetdb in record[0]["LinkSetDb"]:
    print(linksetdb["DbTo"], linksetdb["LinkName"], len(linksetdb["Link"]))
record[0]["LinkSetDb"][0]["Link"][0]  # the actual search results are stored under the "Link" key
for link in record[0]["LinkSetDb"][0]["Link"]:
    print(link["Id"])

pubmed pubmed_pubmed 168
pubmed pubmed_pubmed_alsoviewed 22
pubmed pubmed_pubmed_citedin 1332
pubmed pubmed_pubmed_combined 6
pubmed pubmed_pubmed_five 6
pubmed pubmed_pubmed_refs 17
pubmed pubmed_pubmed_reviews 15
pubmed pubmed_pubmed_reviews_five 8
{'Id': '19304878'}
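Because the parsed ELink result is made of nested lists and dictionaries, it is easy to regroup. A sketch of a hypothetical helper that maps each `LinkName` to its list of IDs; the small input imitates the structure shown above, but its IDs are made up.

```python
def links_by_name(elink_record):
    """Map each LinkName (e.g. "pubmed_pubmed") to the list of linked IDs."""
    result = {}
    for linkset in elink_record:
        for linksetdb in linkset.get("LinkSetDb", []):
            ids = [link["Id"] for link in linksetdb["Link"]]
            result.setdefault(linksetdb["LinkName"], []).extend(ids)
    return result


# Made-up data with the same shape as Entrez.read(Entrez.elink(...)).
sample = [{"LinkSetDb": [
    {"DbTo": "pubmed", "LinkName": "pubmed_pubmed", "Link": [{"Id": "111"}, {"Id": "222"}]},
    {"DbTo": "pubmed", "LinkName": "pubmed_pubmed_refs", "Link": [{"Id": "333"}]},
]}]
print(links_by_name(sample))
```

Applied to the real `record` above, `links_by_name(record)["pubmed_pubmed_citedin"]` would give the IDs of the citing articles.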

Parsing Medline records

# Parsing Medline records
# The Medline parser is in Bio.Medline. MEDLINE is the plain-text record format used by PubMed.
from Bio import Medline
with open("pubmed_result1.txt") as handle:
    record = Medline.read(handle)  # the record now contains the Medline record as a Python dictionary
record["PMID"]
record["AB"]
# To parse a file containing multiple Medline records, use the parse function instead
with open("pubmed_result2.txt") as handle:
    for record in Medline.parse(handle):
        print(record["TI"])
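To try the parser without a file on disk, you can feed it MEDLINE-formatted text through `io.StringIO`. The two records below are made up; they show that single-value fields such as `PMID` and `TI` come back as strings, while repeatable fields such as `AU` come back as lists.

```python
from io import StringIO

from Bio import Medline

# Two made-up MEDLINE records separated by a blank line.
text = """PMID- 11111111
TI  - A made-up title about orchids.
AU  - Doe J
AU  - Roe R

PMID- 22222222
TI  - A second made-up title.
AU  - Poe E
"""

records = list(Medline.parse(StringIO(text)))
for record in records:
    # .get() avoids a KeyError for records that lack a field.
    print(record["PMID"], record["TI"], record.get("AU", []))
```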

Parsing GEO records

# Parsing GEO records
# The Bio.Geo module can be used to parse GEO-formatted data
# The following code fragment shows how to parse the example GEO file GSE16.txt into a record and print the record:
from Bio import Geo
with open("GSE16.txt") as handle:
    records = Geo.parse(handle)
    for record in records:
        print(record)
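Bio.Geo can also read GEO SOFT text from memory. In this sketch the GEO fragment is made up, and it relies on the `entity_type`, `entity_id` and `entity_attributes` attributes of the parsed record, which hold the `^` line and the `!` key/value lines respectively.

```python
from io import StringIO

from Bio import Geo

# A made-up, minimal GEO SOFT fragment: one "^" entity line and two "!" attribute lines.
text = """^SERIES = GSE00000
!Series_title = A made-up series
!Series_type = Expression profiling by array
"""

records = list(Geo.parse(StringIO(text)))
for record in records:
    print(record.entity_type, record.entity_id)
    print(record.entity_attributes["Series_title"])
```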

Parsing UniGene records

# Parsing UniGene records
# UniGene is an NCBI database of the transcriptome, with each UniGene record showing the set of transcripts
# that are associated with a particular gene in a specific organism
# This particular record shows the set of transcripts (shown in the SEQUENCE lines)
# The PROTSIM lines show proteins with significant similarity to the target gene, and
# the STS lines show the corresponding sequence-tagged sites in the genome.
# Note: NCBI has retired the UniGene database, and Bio.UniGene has been removed from recent
# Biopython releases, so this code needs an older Biopython version.
from Bio import UniGene
with open("myunigenefile.data") as handle:
    record = UniGene.read(handle)
record.ID
record.title
record.sts[0].acc
record.sts[0].unists
with open("unigenerecords.data") as handle:
    records = UniGene.parse(handle)  # an iterator
    for record in records:
        print(record.ID)

ESpell: Obtaining spelling suggestions

# ESpell: Obtaining spelling suggestions
# ESpell retrieves spelling suggestions. In this example, we use Bio.Entrez.espell() to obtain the correct spelling of Biopython
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.espell(term="biopythooon")
record = Entrez.read(handle)
record["Query"]
'biopythooon'
record["CorrectedQuery"]

'biopython'
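A natural follow-up is to search with the corrected spelling. A sketch under the assumption that an empty `CorrectedQuery` means NCBI had nothing to suggest; `pick_term` and `search_with_spelling` are hypothetical helpers, and the sample dictionaries mimic the parsed ESpell result shown above.

```python
from Bio import Entrez


def pick_term(espell_record):
    """Prefer NCBI's suggestion; fall back to the original query if there is none."""
    return espell_record["CorrectedQuery"] or espell_record["Query"]


def search_with_spelling(term, db="pubmed"):
    """Spell-check a term with ESpell, then run ESearch with the result (needs network)."""
    suggestion = Entrez.read(Entrez.espell(db=db, term=term))
    return Entrez.read(Entrez.esearch(db=db, term=pick_term(suggestion)))


print(pick_term({"Query": "biopythooon", "CorrectedQuery": "biopython"}))  # biopython
```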

ESummary: Retrieving summaries from primary IDs

# ESummary: Retrieving summaries from primary IDs
# ESummary retrieves document summaries from a list of primary IDs
handle = Entrez.esummary(db="nlmcatalog", id="101660833")
record = Entrez.read(handle)
info = record[0]["TitleMainList"][0]
print("Journal info\nid: {}\nTitle: {}".format(record[0]["Id"], info["Title"]))

Journal info
id: 101660833
Title: IEEE transactions on computational imaging.

example 1

# example 1
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.esearch(db="pubmed", term="biopython")
record = Entrez.read(handle)
record["IdList"]
# ['19304878', '18606172', '16403221', '16377612', '14871861', '14630660', '12230038']
# We now use Bio.Entrez.efetch to download these Medline records:
idlist = record["IdList"]
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
# Here, we specify rettype="medline", retmode="text" to obtain the Medline records in plain-text Medline
# format. Now we use Bio.Medline to parse these records
from Bio import Medline
records = Medline.parse(handle)
for record in records:
    print(record.get("AU", "?"))  # .get() avoids a KeyError for records without authors

example 2

# example 2
# In this example, we will query PubMed for all articles having to do with orchids
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.egquery(term="orchid")  # check how many such articles there are
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"] == "pubmed":
        print(row["Count"])
# Now we use the Bio.Entrez.esearch function to download the PubMed IDs of these 463 articles
# (463 was the count at the time of writing; use the number printed above)
handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463)
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]  # a Python list containing all of the PubMed IDs of articles related to orchids
# Get the corresponding Medline records and extract the information from them
from Bio import Medline
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
records = Medline.parse(handle)
records = list(records)
for record in records:
    print("title:", record.get("TI", "?"))
    print("authors:", record.get("AU", "?"))
    print("source:", record.get("SO", "?"))
    print("")

example 3

Searching, downloading, and parsing Entrez Nucleotide records

# example 3
# Searching, downloading, and parsing Entrez Nucleotide records
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.egquery(term="Cypripedioideae")  # EGQuery tells us how many search results were found in each database
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"] == "nuccore":  # we are only interested in nucleotides
        print(row["Count"])
handle = Entrez.esearch(db="nucleotide", term="Cypripedioideae", retmax=814, idtype="acc")
# use the retmax argument to restrict the maximum number of records retrieved
record = Entrez.read(handle)
handle.close()
# We can download these records using efetch
idlist = ",".join(record["IdList"][:5])
print(idlist)
# KX265015.1,KX265014.1,KX265013.1,KX265012.1,KX265011.1
handle = Entrez.efetch(db="nucleotide", id=idlist, retmode="xml")
records = Entrez.read(handle)
len(records)

Searching, downloading, and parsing GenBank records

# Searching, downloading, and parsing GenBank records
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.egquery(term="Opuntia AND rpl16")
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"] == "nuccore":
        print(row["Count"])
# Download the list of GenBank identifiers
handle = Entrez.esearch(db="nuccore", term="Opuntia AND rpl16")
record = Entrez.read(handle)
gi_list = record["IdList"]
gi_list

37
['1972904692', '1972904685', '1972904678', '1972904671', '1972904664', '1972904657', '1972904650', '1972904643', '1972904636', '1972904629', '1972904622', '1841709044', '377581039', '330887241', '330887240', '330887239', '330887238', '330887237', '330887236', '330887235']

# Now we use these GIs to download the GenBank records
from Bio import SeqIO
gi_str = ",".join(gi_list)
handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="gb", retmode="text")
text = handle.read()  # either read the raw GenBank text (e.g. to save it to a file) ...
handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="gb", retmode="text")
records = SeqIO.parse(handle, "gb")  # ... or parse it directly; SeqIO.parse returns an iterator

Using the history and WebEnv -- most important

# Using the history and WebEnv -- most important
# NCBI prefers that you take advantage of their history support, for example by combining ESearch and EFetch.
# Another typical use of the history support would be to combine EPost and EFetch.
# Searching for and downloading sequences using the history:
# To do this, call Bio.Entrez.esearch() as normal, but with the additional argument usehistory="y"
from Bio import Entrez
Entrez.email = "history.user@example.com"  # Always tell NCBI who you are
search_handle = Entrez.esearch(db="nucleotide", term="Opuntia[orgn] AND rpl16", usehistory="y", idtype="acc")
search_results = Entrez.read(search_handle)
search_handle.close()
acc_list = search_results["IdList"]
len(acc_list)  # the XML output includes only the first retmax search results, with retmax defaulting to 20
count = int(search_results["Count"])
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]
# Use the retstart and retmax parameters to specify which range of search results you want
# returned (starting entry using zero-based counting, and maximum number of results to return)
batch_size = 3
with open("orchid_rpl16.fasta", "w") as out_handle:
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        print("Going to download record %i to %i" % (start + 1, end))
        fetch_handle = Entrez.efetch(
            db="nucleotide",
            rettype="fasta",
            retmode="text",
            retstart=start,
            retmax=batch_size,
            webenv=webenv,
            query_key=query_key,
            idtype="acc",
        )
        data = fetch_handle.read()
        fetch_handle.close()
        out_handle.write(data)
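The `retstart`/`retmax` loop above can be checked offline by isolating its arithmetic. A sketch with a hypothetical `batch_ranges` helper that yields the zero-based `retstart` value and the end position of each batch, so you can see which records each EFetch call will cover before spending any requests.

```python
def batch_ranges(count, batch_size):
    """Yield (retstart, end) pairs that cover `count` records in zero-based batches."""
    for start in range(0, count, batch_size):
        yield start, min(count, start + batch_size)


# With 7 hits and batch_size=3, EFetch would be called with retstart 0, 3 and 6.
for start, end in batch_ranges(7, 3):
    print("Going to download record %i to %i" % (start + 1, end))
```

This prints "record 1 to 3", "record 4 to 6" and "record 7 to 7"; the last batch is shorter because `min()` clips it to `count`.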

Searching for and downloading abstracts using the history

# Searching for and downloading abstracts using the history
from Bio import Entrez
Entrez.email = "history.user@example.com"  # Always tell NCBI who you are
search_results = Entrez.read(
    Entrez.esearch(db="pubmed", term="Opuntia[ORGN]", reldate=365, datetype="pdat", usehistory="y")
)
count = int(search_results["Count"])
print("Found %i results" % count)
batch_size = 10
with open("recent_orchid_papers.txt", "w") as out_handle:
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        print("Going to download record %i to %i" % (start + 1, end))
        fetch_handle = Entrez.efetch(
            db="pubmed",
            rettype="medline",
            retmode="text",
            retstart=start,
            retmax=batch_size,
            webenv=search_results["WebEnv"],
            query_key=search_results["QueryKey"],
        )
        data = fetch_handle.read()
        fetch_handle.close()
        out_handle.write(data)

Biopython -- Bio.Entrez module