Biopython Guide

When you hear the word Biopython, what is the first thing that comes to mind? A Python library to handle biological data? You are correct! Biopython provides a set of tools for bioinformatics computations on biological data such as DNA and protein sequences. I have been using Biopython ever since I started studying bioinformatics, and it has never let me down. It is an amazing library that covers a wide range of tasks, from reading large files of biological data to aligning sequences. In this article, I will introduce some basic Biopython functions that reduce common tasks to a single call.

Getting started

The latest version available when I’m writing this article is biopython-1.77 released in May 2020.

You can install Biopython using pip

pip install biopython

or using conda.

conda install -c conda-forge biopython

You can test whether Biopython is properly installed by executing the following line in the python interpreter.

import Bio

If you get an error such as ImportError: No module named Bio then you haven’t installed Biopython properly in your working environment. If no error messages appear, we are good to go.
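If you also want to know which version is installed, the check can be extended slightly. This is a minimal sketch that relies on the standard library's importlib and on Biopython's Bio.__version__ attribute; the helper name biopython_version is made up for this example.

```python
import importlib.util


def biopython_version():
    """Return the installed Biopython version string, or None if it is missing."""
    # find_spec returns None when the top-level package cannot be imported.
    if importlib.util.find_spec("Bio") is None:
        return None
    import Bio
    return Bio.__version__


version = biopython_version()
print("Biopython version:", version if version else "not installed")
```

A result of "not installed" means the same thing as the ImportError described above: the package is missing from the environment the interpreter is running in.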

In this article, I will be walking you through some examples where Seq, SeqRecord and SeqIO come in handy. We will go through the functions that perform the following tasks.

Creating a sequence
Get the reverse complement of a sequence
Count the number of occurrences of a nucleotide
Find the starting index of a subsequence
Reading a sequence file
Writing sequences to a file
Convert a FASTQ file to FASTA file
Separate sequences by ids from a list of ids

1. Creating a sequence

To create your own sequence, you can use the Biopython Seq object. Here is an example.

>>> from Bio.Seq import Seq
>>> my_sequence = Seq("ATGACGTTGCATG")
>>> print("The sequence is", my_sequence)
The sequence is ATGACGTTGCATG
>>> print("The length of the sequence is", len(my_sequence))
The length of the sequence is 13

2. Get the reverse complement of a sequence

You can easily get the reverse complement of a sequence using a single function call reverse_complement().

>>> print("The reverse complement of the sequence is", my_sequence.reverse_complement())
The reverse complement of the sequence is CATGCAACGTCAT
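To see what reverse_complement() computes, here is a plain-Python sketch of the same operation on an ordinary string. It is only an illustration, not Biopython's implementation: it assumes unambiguous DNA (A, C, G, T) and ignores RNA and ambiguity codes, which Biopython handles.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the result."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGACGTTGCATG"))  # CATGCAACGTCAT
```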

3. Count the number of occurrences of a nucleotide

You can get the number of occurrences of a particular nucleotide using the count() function.

>>> print("The number of As in the sequence", my_sequence.count("A"))
The number of As in the sequence 3
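One detail matters as soon as you count motifs longer than one base: like Python's str.count(), Seq.count() counts non-overlapping matches. The sketch below uses plain strings to show the difference; count_overlapping is a hypothetical helper written for this example (as far as I know, recent Biopython versions provide Seq.count_overlap() for the same purpose).

```python
def count_overlapping(sequence, sub):
    """Count every position at which sub starts, allowing overlaps."""
    return sum(1 for i in range(len(sequence)) if sequence.startswith(sub, i))


sequence = "AAAA"
print(sequence.count("AA"))               # 2, matches may not overlap
print(count_overlapping(sequence, "AA"))  # 3, matches at index 0, 1 and 2
```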

4. Find the starting index of a subsequence

You can find the starting index of a subsequence using the find() function.

>>> print("Found TTG in the sequence at index", my_sequence.find("TTG"))
Found TTG in the sequence at index 6
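Note that find() reports only the first match and returns -1 when the subsequence is absent, mirroring Python's str.find(). If you need every position, a small loop does the job. The sketch below works on a plain string; find_all is a hypothetical helper name, not a Biopython function.

```python
def find_all(sequence, sub):
    """Return the start index of every (possibly overlapping) match of sub."""
    positions = []
    start = sequence.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search one position after the previous hit.
        start = sequence.find(sub, start + 1)
    return positions


sequence = "ATGACGTTGCATG"
print(sequence.find("TTG"))       # 6
print(sequence.find("CCC"))       # -1, not found
print(find_all(sequence, "ATG"))  # [0, 10]
```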

5. Reading a sequence file

Biopython’s SeqIO (Sequence Input/Output) interface can be used to read sequence files. The parse() function takes a file handle (or a filename) and a format name, and returns an iterator of SeqRecord objects. The following example reads a FASTA file.

from Bio import SeqIO

for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)

record.id will return the identifier of the sequence. record.seq will return the sequence itself. record.description will return the sequence description.
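To make these three attributes concrete, here is a plain-Python sketch of how a FASTA header maps onto them. It is not Biopython's parser, just an illustration that assumes well-formed input: the id is the first word after ">", and, as in Biopython's FASTA parser, the description is the whole header line. The names parse_fasta and split_header and the sample records are invented for this example.

```python
import io


def split_header(header):
    """Return (id, description) for a FASTA header without the leading '>'."""
    words = header.split(None, 1)
    seq_id = words[0] if words else ""
    return seq_id, header


def parse_fasta(handle):
    """Yield (id, description, sequence) tuples from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield (*split_header(header), "".join(chunks))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield (*split_header(header), "".join(chunks))


example = io.StringIO(">seq1 first example\nATGACG\nTTGCATG\n>seq2\nGGTGCA\n")
for seq_id, description, sequence in parse_fasta(example):
    print(seq_id, "|", description, "|", sequence)
```

Sequence lines that are wrapped across several lines in the file are joined back into one string, which is also what you get from record.seq.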

6. Writing sequences to a file

Biopython’s SeqIO (Sequence Input/Output) interface can be used to write sequences to files. The following example writes a list of sequences to a FASTA file.

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Seq takes no alphabet argument here: Bio.Alphabet was removed in Biopython 1.78.
sequences = ["AAACGTGG", "TGAACCG", "GGTGCA", "CCAATGCG"]
records = (SeqRecord(Seq(seq), id=str(index)) for index, seq in enumerate(sequences))

with open("example.fasta", "w") as output_handle:
    SeqIO.write(records, output_handle, "fasta")

This code will result in a FASTA file with sequence ids starting from 0. If you want to give a custom id and a description you can create the records as follows.

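sequences = ["AAACGTGG", "TGAACCG", "GGTGCA", "CCAATGCG"]
new_sequences = []

i = 1
for sequence in sequences:
    # The id and description values are illustrative placeholders; use your own.
    record = SeqRecord(Seq(sequence), id="Seq_" + str(i), description="example sequence " + str(i))
    new_sequences.append(record)
    i += 1

with open("example.fasta", "w") as output_handle:
    SeqIO.write(new_sequences, output_handle, "fasta")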

The SeqIO.write() function will return the number of sequences written.
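The shape of that call (take an iterable of records and a handle, write them, return the count) can be sketched in plain Python. This is an illustration of the FASTA layout only, not Biopython's writer; format_fasta and write_fasta are hypothetical names, and the 60-character line width is chosen to match what I understand to be Biopython's default wrapping.

```python
import io


def format_fasta(seq_id, sequence, description="", width=60):
    """Return one FASTA record: a '>' header line plus wrapped sequence lines."""
    header = (">" + seq_id + " " + description).rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


def write_fasta(records, handle):
    """Write (id, sequence) pairs to handle and return how many were written."""
    count = 0
    for seq_id, sequence in records:
        handle.write(format_fasta(seq_id, sequence))
        count += 1
    return count


sequences = ["AAACGTGG", "TGAACCG", "GGTGCA", "CCAATGCG"]
output = io.StringIO()
written = write_fasta(((str(i), s) for i, s in enumerate(sequences)), output)
print("Wrote %i records" % written)
print(output.getvalue())
```

Returning the count makes it easy to confirm that a generator of records was not accidentally empty.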

7. Convert a FASTQ file to FASTA file

We need to convert DNA data file formats in certain applications. For example, we can do file format conversions from FASTQ to FASTA as follows.

from Bio import SeqIO

with open("path/to/fastq/file.fastq", "r") as input_handle, open("path/to/fasta/file.fasta", "w") as output_handle:
    sequences = SeqIO.parse(input_handle, "fastq")
    count = SeqIO.write(sequences, output_handle, "fasta")

print("Converted %i records" % count)
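It helps to know what this conversion throws away: FASTQ stores a quality string for every read, and FASTA has no place for it. The plain-Python sketch below shows the idea under a simplifying assumption, namely that every record occupies exactly four lines (header, sequence, "+" separator, qualities). It is not how Biopython parses FASTQ, and fastq_to_fasta is a hypothetical name.

```python
import io


def fastq_to_fasta(input_handle, output_handle):
    """Copy the header and sequence of each 4-line FASTQ record; drop the qualities."""
    count = 0
    while True:
        header = input_handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = input_handle.readline().rstrip("\n")
        input_handle.readline()  # '+' separator line, ignored
        input_handle.readline()  # quality string, has no FASTA equivalent
        if not header.startswith("@"):
            raise ValueError("expected a FASTQ header starting with '@'")
        output_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        count += 1
    return count


fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2 extra\nGGCC\n+\n!!!!\n")
fasta = io.StringIO()
print("Converted %i records" % fastq_to_fasta(fastq, fasta))
print(fasta.getvalue())
```

Because the qualities are discarded, the conversion cannot be reversed; keep the original FASTQ file if you may need the scores later.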

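To convert a GenBank file to FASTA format, parse the input as "genbank" instead:

from Bio import SeqIO

# Placeholder paths; substitute your own GenBank input and FASTA output.
with open("path/to/genbank/file.gb", "r") as input_handle, open("path/to/fasta/file.fasta", "w") as output_handle:
    sequences = SeqIO.parse(input_handle, "genbank")
    count = SeqIO.write(sequences, output_handle, "fasta")

print("Converted %i records" % count)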

8. Separate sequences by ids from a list of ids

Assume that you have a list of sequence identifiers in a file named list.lst and want to extract the corresponding sequences from a FASTQ file. The following code reads the identifiers, scans the FASTQ file, and writes the matching records to a new file.

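from Bio import SeqIO

path = ""  # placeholder: directory prefix for the files below, e.g. "data/"

# Read the wanted identifiers, one per line, ignoring blank lines.
with open(path + "list.lst") as id_handle:
    ids = set(line.strip() for line in id_handle if line.strip())

# mode="a" appends, so rerunning adds to list.fq rather than overwriting it.
with open(path + "list.fq", mode="a") as my_output:
    for seq in SeqIO.parse(path + "list_sequences.fq", "fastq"):
        if seq.id in ids:
            my_output.write(seq.format("fastq"))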
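When reading identifiers from a file, stripping whitespace is more robust than slicing off the last character of each line, because slicing corrupts the final id whenever the file does not end with a newline. A minimal sketch of the difference, using in-memory text in place of list.lst; load_ids is a hypothetical helper name and the read ids are invented.

```python
import io


def load_ids(handle):
    """Return the set of non-empty, whitespace-stripped lines in handle."""
    return {line.strip() for line in handle if line.strip()}


# The last line has no trailing newline, as often happens with hand-edited files.
listing = "read1\nread7\n\nread9"

print(sorted(x[:-1] for x in io.StringIO(listing)))  # ['', 'read', 'read1', 'read7']
print(sorted(load_ids(io.StringIO(listing))))        # ['read1', 'read7', 'read9']
```

Storing the ids in a set keeps each membership test fast, which matters when the sequence file contains millions of reads.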

Final Thoughts

I hope this gave you an idea of how to use Biopython's Seq, SeqRecord and SeqIO, and that they will be useful in your research work.

Thank you for reading. I would love to hear your thoughts. Stay tuned for the next part of this article with more usages and Biopython functions.

Cheers, and stay safe!

Translated from: https://medium.com/computational-biology/newbies-guide-to-biopython-part-1-9ec82c3dfe8f
