Table of contents

  • EInfo: Obtaining information about the Entrez databases
    • For each of these databases, we can use EInfo again to obtain more information
  • ESearch: Searching the Entrez databases
  • EPost: Uploading a list of identifiers
  • EFetch: Downloading full records from Entrez
  • ELink: Searching for related items in NCBI Entrez
  • Parsing Medline records
  • Parsing GEO records
  • Parsing UniGene records
  • ESpell: Obtaining spelling suggestions
  • ESummary: Retrieving summaries from primary IDs
  • example 1
  • example 2
  • example 3
    • Searching, downloading, and parsing Entrez Nucleotide records
  • Searching, downloading, and parsing GenBank records
  • Using the history and WebEnv -- most important
  • Searching for and downloading abstracts using the history

Entrez (https://www.ncbi.nlm.nih.gov/Web/Search/entrezfs.html) is a data retrieval system that gives users access to NCBI’s databases such as PubMed, GenBank, GEO, and many others. You can access Entrez from a web browser to enter queries manually, or you can use Biopython’s Bio.Entrez module for programmatic access. The latter lets you, for example, search PubMed or download GenBank records from within a Python script.

The Bio.Entrez module makes use of the Entrez Programming Utilities (also known as EUtils), consisting of eight tools that are described in detail on NCBI’s page at https://www.ncbi.nlm.nih.gov/books/NBK25501/. Each of these tools corresponds to one Python function in the Bio.Entrez module, as described in the sections below.

Before using Biopython to access the NCBI’s online resources (via Bio.Entrez or some of the other modules), please read the NCBI’s Entrez User Requirements. If the NCBI finds you are abusing their systems, they can and will ban your access!

• For any series of more than 100 requests, do this at weekends or outside USA peak times. This is up to you to obey.
• Use the https://eutils.ncbi.nlm.nih.gov address, not the standard NCBI Web address. Biopython uses this web address.
• If you are using an API key, you can make at most 10 queries per second; otherwise, at most 3 queries per second.
• Use the optional email parameter so the NCBI can contact you if there is a problem. You can either explicitly set this as a parameter with each call to Entrez (e.g. include email="A.N.Other@example.com" in the argument list), or you can set a global email address:

>>> from Bio import Entrez
>>> Entrez.email = "A.N.Other@example.com"

• For large queries, the NCBI also recommend using their session history feature (the WebEnv and QueryKey values used in the history sections further down).
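The rate limits above translate into a minimum delay between requests. The following is a minimal, stdlib-only sketch of that arithmetic; `min_interval` and `Throttle` are hypothetical helper names, not part of Bio.Entrez. Biopython's documentation states that the library paces requests on its own, so this is meant to illustrate the rule rather than to replace anything.

```python
import time


def min_interval(has_api_key):
    """Smallest delay in seconds between requests under the limits quoted above."""
    max_per_second = 10 if has_api_key else 3
    return 1.0 / max_per_second


class Throttle:
    """Sleep just long enough that successive calls are `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval(has_api_key=False))
for _ in range(3):
    throttle.wait()  # a real script would issue one Entrez request after each wait
```

The clock and sleep functions are injectable only so the helper can be exercised without actually waiting.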

EInfo: Obtaining information about the Entrez databases

# EInfo provides field index term counts, last update, and available links for each of NCBI's databases.
# In addition, you can use EInfo to obtain a list of all database names accessible through the Entrez utilities.
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.einfo()
result = handle.read()
handle.close()
print(result)
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD einfo 20190110//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20190110/einfo.dtd">
<eInfoResult>
<DbList><DbName>pubmed</DbName><DbName>protein</DbName><DbName>nuccore</DbName><DbName>ipg</DbName><DbName>nucleotide</DbName><DbName>structure</DbName><DbName>genome</DbName><DbName>annotinfo</DbName><DbName>assembly</DbName><DbName>bioproject</DbName><DbName>biosample</DbName><DbName>blastdbinfo</DbName><DbName>books</DbName><DbName>cdd</DbName><DbName>clinvar</DbName><DbName>gap</DbName><DbName>gapplus</DbName><DbName>grasp</DbName><DbName>dbvar</DbName><DbName>gene</DbName><DbName>gds</DbName><DbName>geoprofiles</DbName><DbName>homologene</DbName><DbName>medgen</DbName><DbName>mesh</DbName><DbName>ncbisearch</DbName><DbName>nlmcatalog</DbName><DbName>omim</DbName><DbName>orgtrack</DbName><DbName>pmc</DbName><DbName>popset</DbName><DbName>proteinclusters</DbName><DbName>pcassay</DbName><DbName>protfam</DbName><DbName>biosystems</DbName><DbName>pccompound</DbName><DbName>pcsubstance</DbName><DbName>seqannot</DbName><DbName>snp</DbName><DbName>sra</DbName><DbName>taxonomy</DbName><DbName>biocollections</DbName><DbName>gtr</DbName>
</DbList></eInfoResult>
# Bio.Entrez's parser can turn this XML into Python objects
handle = Entrez.einfo()
record = Entrez.read(handle)
record.keys()
dict_keys(['DbList'])
record["DbList"]
['pubmed', 'protein', 'nuccore', 'ipg', 'nucleotide', 'structure', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'gap', 'gapplus', 'grasp', 'dbvar', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'proteinclusters', 'pcassay', 'protfam', 'biosystems', 'pccompound', 'pcsubstance', 'seqannot', 'snp', 'sra', 'taxonomy', 'biocollections', 'gtr']
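To see how the raw XML relates to the parsed `record["DbList"]`, here is a small sketch that pulls the database names out with the standard library instead of Bio.Entrez. The sample reply is a shortened copy of the output above (three databases only), and `db_names` is a hypothetical helper; Bio.Entrez's own parser is DTD-driven and works differently.

```python
import xml.etree.ElementTree as ET

# Shortened copy of an EInfo reply: only three of the databases are kept.
SAMPLE_EINFO_XML = b"""<?xml version="1.0" encoding="UTF-8" ?>
<eInfoResult>
<DbList><DbName>pubmed</DbName><DbName>protein</DbName><DbName>nuccore</DbName>
</DbList></eInfoResult>"""


def db_names(xml_bytes):
    """Return the text of every <DbName> element inside <DbList>."""
    root = ET.fromstring(xml_bytes)
    return [node.text for node in root.findall("./DbList/DbName")]


names = db_names(SAMPLE_EINFO_XML)
print(names)
```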

For each of these databases, we can use EInfo again to obtain more information

handle = Entrez.einfo(db="pubmed")
record = Entrez.read(handle)
record["DbInfo"]  # Returns a dictionary; call .keys() on it and read the corresponding entries

ESearch: Searching the Entrez databases

# To search any of these databases, we use Bio.Entrez.esearch(). For example, let's search in PubMed for
# publications that include Biopython in their title:
handle = Entrez.esearch(db="pubmed", term="biopython[title]", retmax="40")
record = Entrez.read(handle)
# "19304878" in record["IdList"]
print(record["IdList"])
['34434786', '22909249', '19304878']
# You can also use ESearch to search GenBank. Here we'll do a quick search for the matK gene in Cypripedioideae orchids
handle = Entrez.esearch(db="nucleotide", term="Cypripedioideae[Orgn] AND matK[Gene]", idtype="acc")
record = Entrez.read(handle)
print(record["Count"])
print(record["IdList"])  # With idtype="acc", each ID is a GenBank accession.version identifier
585
['NC_058834.1', 'NC_058833.1', 'NC_058832.1', 'OK120861.1', 'OK120860.1', 'NC_058212.1', 'NC_058211.1', 'NC_058210.1', 'NC_058209.1', 'NC_058208.1', 'NC_058207.1', 'NC_058206.1', 'MN315109.1', 'MN315108.1', 'MN315107.1', 'MN315106.1', 'MN315105.1', 'NC_056912.1', 'MZ150831.1', 'MZ150830.1']
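The search terms above combine a value with a field tag in square brackets, joined by a Boolean operator. The sketch below builds such a term and shows roughly what the resulting ESearch request URL looks like. `build_term` and `esearch_url` are hypothetical helpers for illustration; Bio.Entrez constructs its own URL and adds further parameters (such as the email address), so the exact string it sends will differ.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for ESearch.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(pairs, operator="AND"):
    """Join (value, field) pairs into a query such as 'x[Orgn] AND y[Gene]'."""
    return f" {operator} ".join(f"{value}[{field}]" for value, field in pairs)


def esearch_url(db, term, **extra):
    """Return an ESearch URL with the parameters percent-encoded."""
    params = {"db": db, "term": term, **extra}
    return ESEARCH_URL + "?" + urlencode(params)


term = build_term([("Cypripedioideae", "Orgn"), ("matK", "Gene")])
url = esearch_url("nucleotide", term, idtype="acc")
print(term)
print(url)
```

Seeing the encoded URL also makes it clear why very long ID lists or queries are better handled through EPost or the history feature.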

EPost: Uploading a list of identifiers

# EPost uploads a list of UIs (unique identifiers) for use in subsequent search strategies.
# To give an example of when this is useful, suppose you have a long list of IDs you want to download
# using EFetch (maybe sequences, maybe citations, anything). When you make a request with EFetch, your
# list of IDs, the database, etc. are all turned into a long URL sent to the server. If your list of IDs is long,
# this URL gets long, and long URLs can break (e.g. some proxies don't cope well).
id_list = ["19304878", "18606172", "16403221", "16377612", "14871861", "14630660"]
print(Entrez.epost("pubmed", id=",".join(id_list)).read())
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD epost 20090526//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20090526/epost.dtd"><ePostResult><QueryKey>1</QueryKey><WebEnv>MCID_61a9ccaf13217214767bf8b1</WebEnv>
</ePostResult>
# The returned XML includes two important strings, QueryKey and WebEnv which together define your history
# session. You would extract these values for use with another Entrez call such as EFetch
id_list = ["19304878", "18606172", "16403221", "16377612", "14871861", "14630660"]
search_results = Entrez.read(Entrez.epost("pubmed", id=",".join(id_list)))
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]
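The two steps described above (post the IDs, then fetch through the returned history session) can be wrapped in one function. This is a sketch under assumptions: `fetch_via_epost` is a hypothetical helper, and it takes the Entrez module as a parameter so that the flow can be tried offline with the made-up `FakeEntrez` stand-in shown here.

```python
def fetch_via_epost(entrez, db, id_list, **fetch_kwargs):
    """Upload IDs with EPost, then fetch them through the history session.

    `entrez` is expected to behave like the Bio.Entrez module, i.e. to
    provide epost(), read() and efetch().
    """
    posted = entrez.read(entrez.epost(db, id=",".join(id_list)))
    return entrez.efetch(
        db=db,
        webenv=posted["WebEnv"],
        query_key=posted["QueryKey"],
        **fetch_kwargs,
    )


class FakeEntrez:
    """Offline stand-in that only records calls; invented for this sketch."""

    def __init__(self):
        self.calls = []

    def epost(self, db, **kwargs):
        self.calls.append(("epost", db, kwargs))
        return {"WebEnv": "FAKE_WEBENV", "QueryKey": "1"}

    def read(self, handle):
        return handle  # the fake "handle" is already a dictionary

    def efetch(self, **kwargs):
        self.calls.append(("efetch", kwargs))
        return "fake handle"


fake = FakeEntrez()
result = fetch_via_epost(
    fake, "pubmed", ["19304878", "18606172"], rettype="medline", retmode="text"
)
print(fake.calls)
```

With Biopython installed and network access, passing the real module (`from Bio import Entrez`) in place of `fake` should return a handle to the fetched records.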

EFetch: Downloading full records from Entrez

# EFetch is what you use when you want to retrieve a full record from Entrez.
# For most of their databases, the NCBI support several different file formats. Requesting a specific file
# format from Entrez using Bio.Entrez.efetch() requires specifying the rettype and/or retmode optional arguments.
handle = Entrez.efetch(db="nucleotide", id="EU490707", rettype="gb", retmode="text")
print(handle.read())
LOCUS       EU490707                1302 bp    DNA     linear   PLN 26-JUL-2016
DEFINITION  Selenipedium aequinoctiale maturase K (matK) gene, partial cds;
            chloroplast.
ACCESSION   EU490707
VERSION     EU490707.1
KEYWORDS    .
SOURCE      chloroplast Selenipedium aequinoctiale
  ORGANISM  Selenipedium aequinoctiale
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliopsida; Liliopsida; Asparagales; Orchidaceae;
            Cypripedioideae; Selenipedium.
REFERENCE   1  (bases 1 to 1302)
  AUTHORS   Neubig,K.M., Whitten,W.M., Carlsward,B.S., Blanco,M.A., Endara,L.,
            Williams,N.H. and Moore,M.
  TITLE     Phylogenetic utility of ycf1 in orchids: a plastid gene more
            variable than matK
  JOURNAL   Plant Syst. Evol. 277 (1-2), 75-84 (2009)
REFERENCE   2  (bases 1 to 1302)
  AUTHORS   Neubig,K.M., Whitten,W.M., Carlsward,B.S., Blanco,M.A.,
            Endara,C.L., Williams,N.H. and Moore,M.J.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-FEB-2008) Department of Botany, University of
            Florida, 220 Bartram Hall, Gainesville, FL 32611-8526, USA
FEATURES             Location/Qualifiers
     source          1..1302
                     /organism="Selenipedium aequinoctiale"
                     /organelle="plastid:chloroplast"
                     /mol_type="genomic DNA"
                     /specimen_voucher="FLAS:Blanco 2475"
                     /db_xref="taxon:256374"
     gene            <1..>1302
                     /gene="matK"
     CDS             <1..>1302
                     /gene="matK"
                     /codon_start=1
                     /transl_table=11
                     /product="maturase K"
                     /protein_id="ACC99456.1"
                     /translation="IFYEPVEIFGYDNKSSLVLVKRLITRMYQQNFLISSVNDSNQKGFWGHKHFFSSHFSSQMVSEGFGVILEIPFSSQLVSSLEEKKIPKYQNLRSIHSIFPFLEDKFLHLNYVSDLLIPHPIHLEILVQILQCRIKDVPSLHLLRLLFHEYHNLNSLITSKKFIYAFSKRKKRFLWLLYNSYVYECEYLFQFLRKQSSYLRSTSSGVFLERTHLYVKIEHLLVVCCNSFQRILCFLKDPFMHYVRYQGKAILASKGTLILMKKWKFHLVNFWQSYFHFWSQPYRIHIKQLSNYSFSFLGYFSSVLENHLVVRNQMLENSFIINLLTKKFDTIAPVISLIGSLSKAQFCTVLGHPISKPIWTDFSDSDILDRFCRICRNLCRYHSGSSKKQVLYRIKYILRLSCARTLARKHKSTVRTFMRRLGSGLLEEFFMEEE"
ORIGIN
        1 attttttacg aacctgtgga aatttttggt tatgacaata aatctagttt agtacttgtg
       61 aaacgtttaa ttactcgaat gtatcaacag aattttttga tttcttcggt taatgattct
      121 aaccaaaaag gattttgggg gcacaagcat tttttttctt ctcatttttc ttctcaaatg
      181 gtatcagaag gttttggagt cattctggaa attccattct cgtcgcaatt agtatcttct
      241 cttgaagaaa aaaaaatacc aaaatatcag aatttacgat ctattcattc aatatttccc
      301 tttttagaag acaaattttt acatttgaat tatgtgtcag atctactaat accccatccc
      361 atccatctgg aaatcttggt tcaaatcctt caatgccgga tcaaggatgt tccttctttg
      421 catttattgc gattgctttt ccacgaatat cataatttga atagtctcat tacttcaaag
      481 aaattcattt acgccttttc aaaaagaaag aaaagattcc tttggttact atataattct
      541 tatgtatatg aatgcgaata tctattccag tttcttcgta aacagtcttc ttatttacga
      601 tcaacatctt ctggagtctt tcttgagcga acacatttat atgtaaaaat agaacatctt
      661 ctagtagtgt gttgtaattc ttttcagagg atcctatgct ttctcaagga tcctttcatg
      721 cattatgttc gatatcaagg aaaagcaatt ctggcttcaa agggaactct tattctgatg
      781 aagaaatgga aatttcatct tgtgaatttt tggcaatctt attttcactt ttggtctcaa
      841 ccgtatagga ttcatataaa gcaattatcc aactattcct tctcttttct ggggtatttt
      901 tcaagtgtac tagaaaatca tttggtagta agaaatcaaa tgctagagaa ttcatttata
      961 ataaatcttc tgactaagaa attcgatacc atagccccag ttatttctct tattggatca
     1021 ttgtcgaaag ctcaattttg tactgtattg ggtcatccta ttagtaaacc gatctggacc
     1081 gatttctcgg attctgatat tcttgatcga ttttgccgga tatgtagaaa tctttgtcgt
     1141 tatcacagcg gatcctcaaa aaaacaggtt ttgtatcgta taaaatatat acttcgactt
     1201 tcgtgtgcta gaactttggc acggaaacat aaaagtacag tacgcacttt tatgcgaaga
     1261 ttaggttcgg gattattaga agaattcttt atggaagaag aa
//
# Save the sequence data to a local file, and then parse it with Bio.SeqIO
import os
from Bio import SeqIO
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
filename = "EU490707.gbk"
if not os.path.isfile(filename):
    # Downloading...
    net_handle = Entrez.efetch(db="nucleotide", id="EU490707", rettype="gb", retmode="text")
    out_handle = open(filename, "w")
    out_handle.write(net_handle.read())
    out_handle.close()
    net_handle.close()
    print("Saved")
print("Parsing...")
record = SeqIO.read(filename, "genbank")
print(record)
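The download-once pattern above can be factored into a small function that closes its handles even if writing fails. This is a sketch, not part of Biopython: `download_once` is a hypothetical helper, and the demonstration uses an in-memory handle with a two-line placeholder record instead of a real `Entrez.efetch` call.

```python
import io
import os
import tempfile


def download_once(filename, open_remote):
    """Write the remote text to `filename` unless the file already exists.

    `open_remote` is any zero-argument callable returning a readable,
    closable text handle, for example a lambda wrapping Entrez.efetch(...).
    Returns True if a download happened, False if the cached file was reused.
    """
    if os.path.isfile(filename):
        return False
    remote = open_remote()
    try:
        with open(filename, "w") as out_handle:
            out_handle.write(remote.read())
    finally:
        remote.close()
    return True


# Offline demonstration with a placeholder record body.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "EU490707.gbk")
    first = download_once(path, lambda: io.StringIO("LOCUS       EU490707\n//\n"))
    second = download_once(path, lambda: io.StringIO("never read"))
    with open(path) as handle:
        cached_text = handle.read()
print(first, second)
```

Passing a callable rather than an open handle means no request is made at all when the cached file is already present.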

ELink: Searching for related items in NCBI Entrez

# ELink can be used to find related items in the NCBI Entrez databases.
# Let's use ELink to find articles related to the Biopython application note published in Bioinformatics in 2009. The PubMed ID of this article is 19304878.
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
pmid = "19304878"
record = Entrez.read(Entrez.elink(dbfrom="pubmed", id=pmid))  # The record variable consists of a Python list
# Since we specified only one PubMed ID to search for, record contains only one item. This item is a dictionary
# containing information about our search term, as well as all the related items that were found.
# record[0]  # list of one element
len(record[0]["LinkSetDb"])  # The "LinkSetDb" key contains the search results, a list consisting of one item for each target database
for linksetdb in record[0]["LinkSetDb"]:
    print(linksetdb["DbTo"], linksetdb["LinkName"], len(linksetdb["Link"]))
pubmed pubmed_pubmed 168
pubmed pubmed_pubmed_alsoviewed 22
pubmed pubmed_pubmed_citedin 1332
pubmed pubmed_pubmed_combined 6
pubmed pubmed_pubmed_five 6
pubmed pubmed_pubmed_refs 17
pubmed pubmed_pubmed_reviews 15
pubmed pubmed_pubmed_reviews_five 8
record[0]["LinkSetDb"][0]["Link"][0]  # The actual search results are stored under the "Link" key
{'Id': '19304878'}
for link in record[0]["LinkSetDb"][0]["Link"]:
    print(link["Id"])
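Because the ELink result is a nested list of dictionaries, it can help to flatten one item into a mapping from link name to linked IDs. The sketch below does this on a tiny hand-made stand-in for `record[0]`; `links_by_name` is a hypothetical helper, and every ID other than 19304878 is a placeholder, not a real search result.

```python
def links_by_name(linkset):
    """Map each LinkName to the list of linked IDs for one ELink result item."""
    return {
        linksetdb["LinkName"]: [link["Id"] for link in linksetdb["Link"]]
        for linksetdb in linkset["LinkSetDb"]
    }


# Hand-made stand-in for record[0]; the structure mirrors the keys used above.
sample_linkset = {
    "DbFrom": "pubmed",
    "IdList": ["19304878"],
    "LinkSetDb": [
        {
            "DbTo": "pubmed",
            "LinkName": "pubmed_pubmed",
            "Link": [{"Id": "19304878"}, {"Id": "00000001"}],
        },
        {
            "DbTo": "pubmed",
            "LinkName": "pubmed_pubmed_refs",
            "Link": [{"Id": "00000002"}],
        },
    ],
}

grouped = links_by_name(sample_linkset)
for name, ids in grouped.items():
    print(name, len(ids))
```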

Parsing Medline records

# You can find the Medline parser in Bio.Medline. MEDLINE is the plain-text record format used by PubMed.
from Bio import Medline

with open("pubmed_result1.txt") as handle:
    record = Medline.read(handle)  # The record now contains the Medline record as a Python dictionary
record["PMID"]
record["AB"]
# To parse a file containing multiple Medline records, you can use the parse function instead
with open("pubmed_result2.txt") as handle:
    for record in Medline.parse(handle):
        print(record["TI"])
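To make the dictionary keys such as "PMID", "TI" and "AU" less mysterious, here is a deliberately simplified parser for MEDLINE-style text. It is a sketch only and not how Bio.Medline is implemented: `parse_medline_text` and `LIST_KEYS` are hypothetical names, only a few keys are treated as lists, and the second sample record is a placeholder.

```python
LIST_KEYS = {"AU", "MH", "PT"}  # keys collected as lists in this sketch (a small subset)

SAMPLE = """PMID- 19304878
TI  - Biopython: freely available Python tools for computational molecular biology
      and bioinformatics.
AU  - Cock PJ
AU  - Antao T

PMID- 00000000
TI  - Placeholder second record.
"""


def parse_medline_text(text):
    """Parse MEDLINE-formatted text into a list of dictionaries (simplified)."""
    records, current, key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, key = {}, None
        elif line.startswith("      ") and key is not None:
            # Continuation line: append to the most recent field.
            extra = line.strip()
            if key in LIST_KEYS:
                current[key][-1] += " " + extra
            else:
                current[key] += " " + extra
        else:
            # "TAG - value": a four-character padded tag, then "- ", then the value.
            key, value = line[:4].strip(), line[6:]
            if key in LIST_KEYS:
                current.setdefault(key, []).append(value)
            else:
                current[key] = value
    if current:
        records.append(current)
    return records


records = parse_medline_text(SAMPLE)
for record in records:
    print(record["PMID"], record.get("AU", []))
```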

Parsing GEO records

# The Bio.Geo module can be used to parse GEO-formatted data.
# The following code fragment shows how to parse the example GEO file GSE16.txt into a record and print the record:
from Bio import Geo

handle = open("GSE16.txt")
records = Geo.parse(handle)
for record in records:
    print(record)
handle.close()

Parsing UniGene records

# UniGene is an NCBI database of the transcriptome, with each UniGene record showing the set of transcripts
# that are associated with a particular gene in a specific organism.
# A record lists the set of transcripts (the SEQUENCE lines),
# the PROTSIM lines show proteins with significant similarity to the target,
# and the STS lines show the corresponding sequence-tagged sites in the genome.
# Note: NCBI has since retired UniGene, and Bio.UniGene may not be available in recent Biopython releases,
# so this fragment applies to older installations.
from Bio import UniGene

handle = open("myunigenefile.data")
record = UniGene.read(handle)
record.ID
record.title
record.sts[0].acc
record.sts[0].unists
handle.close()

handle = open("unigenerecords.data")
records = UniGene.parse(handle)  # an iterator over the records
for record in records:
    print(record.ID)
handle.close()

ESpell: Obtaining spelling suggestions

# ESpell retrieves spelling suggestions. In this example, we use Bio.Entrez.espell() to obtain the correct spelling of Biopython
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.espell(term="biopythooon")
record = Entrez.read(handle)
record["Query"]
'biopythooon'
record["CorrectedQuery"]
'biopython'

ESummary: Retrieving summaries from primary IDs

# ESummary retrieves document summaries from a list of primary IDs.
handle = Entrez.esummary(db="nlmcatalog", id="101660833")
record = Entrez.read(handle)
info = record[0]["TitleMainList"][0]
print("Journal info\nid: {}\nTitle: {}".format(record[0]["Id"], info["Title"]))
Journal info
id: 101660833
Title: IEEE transactions on computational imaging.

example 1

from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.esearch(db="pubmed", term="biopython")
record = Entrez.read(handle)
record["IdList"]
['19304878', '18606172', '16403221', '16377612', '14871861', '14630660', '12230038']
# We now use Bio.Entrez.efetch to download these Medline records:
idlist = record["IdList"]
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
# Here, we specify rettype="medline", retmode="text" to obtain the Medline records in plain-text Medline
# format. Now we use Bio.Medline to parse these records:
from Bio import Medline

records = Medline.parse(handle)
for record in records:
    print(record["AU"])
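Once the Medline records are dictionaries, filtering them is ordinary Python. The sketch below selects records by author name. `records_with_author` is a hypothetical helper, and the sample records are hand-written stand-ins whose second entry is a placeholder. It uses `record.get("AU", [])` because a record without an author field has no "AU" key, in which case `record["AU"]` would raise KeyError.

```python
def records_with_author(records, name):
    """Yield the records whose author list (the "AU" key) mentions `name`."""
    for record in records:
        if name in " ".join(record.get("AU", [])):
            yield record


# Hand-written stand-ins for parsed Medline records.
sample_records = [
    {"PMID": "19304878", "AU": ["Cock PJ", "Antao T"]},
    {"PMID": "00000000", "AU": ["Placeholder A"]},
    {"PMID": "00000001"},  # no author field at all
]

matches = list(records_with_author(sample_records, "Cock"))
print([record["PMID"] for record in matches])
```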

example 2

# In this example, we will query PubMed for all articles having to do with orchids
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.egquery(term="orchid")  # check how many such articles there are
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"] == "pubmed":
        print(row["Count"])
# Now we use the Bio.Entrez.esearch function to download the PubMed IDs of these 463 articles
handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463)
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]  # This returns a Python list containing all of the PubMed IDs of articles related to orchids
# Get the corresponding Medline records and extract the information from them
from Bio import Medline

handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")
records = Medline.parse(handle)
records = list(records)
for record in records:
    print("title:", record.get("TI", "?"))
    print("authors:", record.get("AU", "?"))
    print("source:", record.get("SO", "?"))
    print("")

example 3

Searching, downloading, and parsing Entrez Nucleotide records

from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.egquery(term="Cypripedioideae")  # EGQuery will tell us how many search results were found in each of the databases
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"] == "nuccore":  # we are only interested in nucleotides
        print(row["Count"])
# Use the retmax argument to restrict the maximum number of records retrieved
handle = Entrez.esearch(db="nucleotide", term="Cypripedioideae", retmax=814, idtype="acc")
record = Entrez.read(handle)
handle.close()
# We can download these records using efetch
idlist = ",".join(record["IdList"][:5])
print(idlist)
# e.g. KX265015.1,KX265014.1,KX265013.1,KX265012.1,KX265011.1
handle = Entrez.efetch(db="nucleotide", id=idlist, retmode="xml")
records = Entrez.read(handle)
len(records)

Searching, downloading, and parsing GenBank records

from Bio import Entrez
from Bio import SeqIO

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.egquery(term="Opuntia AND rpl16")
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"] == "nuccore":
        print(row["Count"])
# Download the list of GenBank identifiers
handle = Entrez.esearch(db="nuccore", term="Opuntia AND rpl16")
record = Entrez.read(handle)
gi_list = record["IdList"]
gi_list
37
['1972904692', '1972904685', '1972904678', '1972904671', '1972904664', '1972904657', '1972904650', '1972904643', '1972904636', '1972904629', '1972904622', '1841709044', '377581039', '330887241', '330887240', '330887239', '330887238', '330887237', '330887236', '330887235']
# Now we use these GIs to download the GenBank records
gi_str = ",".join(gi_list)
handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="gb", retmode="text")
text = handle.read()  # the raw GenBank text as one string
# Or, in a more Python-friendly form, parse the data into SeqRecord objects:
handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="gb", retmode="text")
records = SeqIO.parse(handle, "gb")  # an iterator over the records

Using the history and WebEnv – most important

# The NCBI prefer you to take advantage of their history support, for example by combining ESearch and EFetch.
# Another typical use of the history support would be to combine EPost and EFetch.
# Searching for and downloading sequences using the history:
# to do this, call Bio.Entrez.esearch() as normal, but with the additional argument usehistory="y"
from Bio import Entrez

Entrez.email = "history.user@example.com"  # Always tell NCBI who you are
search_handle = Entrez.esearch(db="nucleotide", term="Opuntia[orgn] and rpl16", usehistory="y", idtype="acc")
search_results = Entrez.read(search_handle)
search_handle.close()
acc_list = search_results["IdList"]
len(acc_list)  # the XML output includes the first retmax search results, with retmax defaulting to 20
count = int(search_results["Count"])
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]
# You use the retstart and retmax parameters to specify which range of search results you want
# returned (starting entry using zero-based counting, and maximum number of results to return)
batch_size = 3
out_handle = open("orchid_rpl16.fasta", "w")
for start in range(0, count, batch_size):
    end = min(count, start + batch_size)
    print("Going to download record %i to %i" % (start + 1, end))
    fetch_handle = Entrez.efetch(
        db="nucleotide",
        rettype="fasta",
        retmode="text",
        retstart=start,
        retmax=batch_size,
        webenv=webenv,
        query_key=query_key,
        idtype="acc",
    )
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()
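The batching arithmetic in the loop above (zero-based `retstart`, one-based record numbers in the progress message) is easy to get wrong by one, so here it is in isolation. `batch_windows` is a hypothetical helper, and the count of 7 is an arbitrary example value rather than a real search result.

```python
def batch_windows(count, batch_size):
    """Yield (retstart, first_record, last_record) for each batch.

    retstart is zero-based, as EFetch expects; first_record and last_record
    are the one-based positions used in the progress message.
    """
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        yield start, start + 1, end


windows = list(batch_windows(7, 3))
for retstart, first, last in windows:
    print("retstart=%i covers record %i to %i" % (retstart, first, last))
```

The last batch is simply shorter. Passing `retmax=batch_size` for it, as the loop above does, is fine because `retmax` is only an upper bound on the number of results returned.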

Searching for and downloading abstracts using the history

from Bio import Entrez

Entrez.email = "history.user@example.com"
search_results = Entrez.read(
    Entrez.esearch(db="pubmed", term="Opuntia[ORGN]", reldate=365, datetype="pdat", usehistory="y")
)
count = int(search_results["Count"])
print("Found %i results" % count)
batch_size = 10
out_handle = open("recent_orchid_papers.txt", "w")
for start in range(0, count, batch_size):
    end = min(count, start + batch_size)
    print("Going to download record %i to %i" % (start + 1, end))
    fetch_handle = Entrez.efetch(
        db="pubmed",
        rettype="medline",
        retmode="text",
        retstart=start,
        retmax=batch_size,
        webenv=search_results["WebEnv"],
        query_key=search_results["QueryKey"],
    )
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()
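Long batch downloads occasionally hit transient server errors, so it can be worth retrying a failed fetch a few times before giving up. The following is a sketch under assumptions: `call_with_retries` is a hypothetical helper, the attempt count and delay are arbitrary defaults, and the demonstration uses a made-up `flaky_fetch` function instead of a real `Entrez.efetch` call. It retries only on HTTP 5xx responses, which are raised as `urllib.error.HTTPError`.

```python
import time
from urllib.error import HTTPError


def call_with_retries(func, attempts=3, delay=15.0, sleep=time.sleep):
    """Call func(); retry on HTTP 5xx errors, up to `attempts` calls in total."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except HTTPError as err:
            # Re-raise client errors immediately, and server errors once
            # the last allowed attempt has failed.
            if not (500 <= err.code <= 599) or attempt == attempts:
                raise
            print("Received error from server: %s" % err)
            print("Attempt %i of %i failed, retrying" % (attempt, attempts))
            sleep(delay)


# Offline demonstration: fail twice with a 503, then succeed.
state = {"calls": 0}


def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise HTTPError("https://example.invalid", 503, "Service Unavailable", None, None)
    return "data"


result = call_with_retries(flaky_fetch, sleep=lambda seconds: None)
print(result, state["calls"])
```

In the download loops above, the callable would be a lambda wrapping the `Entrez.efetch(...)` call for the current batch.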
