Plotting multiple sequence alignments with the R package ggmsa_2020-05-31
## 1. Set the working directory
setwd("./ggmsa")
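#@ setwd() stops with an error if the target directory does not exist yet. A minimal sketch of one way to guard against that, using base R only; enter_dir() is a hypothetical helper name, not part of ggmsa:

```r
# Hypothetical helper: create the directory when it is missing, then switch to it.
enter_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  # setwd() returns the previous working directory; hand it back invisibly
  # so the caller can restore it later.
  invisible(setwd(path))
}
```

#@ Calling old_wd <- enter_dir("./ggmsa") would then stand in for the plain setwd() call, and setwd(old_wd) switches back afterwards.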
## 2. Install and load the R packages
# install.packages("ggmsa")
library(ggmsa)
library(ggplot2)
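#@ The install line is commented out, so the script assumes ggmsa is already installed. A small sketch that installs a package only when it is missing; install_if_missing() is a hypothetical helper, and it assumes the package can be found in the configured repositories (this post used the CRAN release 0.0.4). Biostrings, listed under Imports in the package description, is distributed through Bioconductor, so that repository may also need to be configured.

```r
# Hypothetical helper: install a package only when it cannot be loaded.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is usable after the attempt.
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

#@ For example, install_if_missing("ggmsa") could be run once before library(ggmsa).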
## 3. Package overview
help(package = "ggmsa")
# Package: ggmsa
# Title: Plot Multiple Sequence Alignment using 'ggplot2'
# Version: 0.0.4
# Authors@R: c( person("Guangchuang", "Yu", email = "guangchuangyu@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-6485-8781")),
# person("Lang", "Zhou", email = "nyzhoulang@gmail.com", role = "aut"),
# person("Huina", "Huang", email = "1185796994@qq.com", role = "ctb"))
# Description: Supports visualizing multiple sequence alignment of DNA and protein sequences using 'ggplot2'. It supports a number of colour schemes, including Chemistry, Clustal, Shapely, Taylor and Zappo. Multiple sequence alignment can easily be combined with other 'ggplot2' plots, such as aligning a phylogenetic tree produced by 'ggtree' with multiple sequence alignment.
# Depends: R (>= 3.5.0)
# Imports: Biostrings, ggplot2, magrittr, tidyr, utils, stats, stringr
# Suggests: ape, cowplot, ggtree, knitr, methods, seqmagick
# License: Artistic-2.0
# Encoding: UTF-8
# LazyData: true
# RoxygenNote: 7.1.0
# VignetteBuilder: knitr
# NeedsCompilation: no
# Packaged: 2020-05-28 08:15:32 UTC; ygc
# Author: Guangchuang Yu [aut, cre] (<https://orcid.org/0000-0002-6485-8781>),
# Lang Zhou [aut],
# Huina Huang [ctb]
# Maintainer: Guangchuang Yu <guangchuangyu@gmail.com>
# Repository: CRAN
# Date/Publication: 2020-05-28 10:50:10 UTC
# Built: R 3.6.3; ; 2020-05-29 14:03:22 UTC; windows
ls(package:ggmsa)
# [1] "available_colors" "available_fonts" "available_msa"
# [4] "facet_msa" "geom_asterisk" "geom_GC"
# [7] "geom_msa" "geom_seed" "geom_seqlogo"
# [10] "ggmotif" "ggmsa" "tidy_msa"
## 4. Testing
# Plot multiple sequence alignment using ggplot2 with multiple color schemes supported.
# Supports visualizing multiple sequence alignment of DNA and protein sequences using ggplot2. It supports a number of colour schemes, including Chemistry, Clustal, Shapely, Taylor and Zappo. Multiple sequence alignment can easily be combined with other ‘ggplot2’ plots, such as aligning a phylogenetic tree produced by ‘ggtree’ with multiple sequence alignment.
### 4.1 Load sample data
# Three sample datasets are shipped with the ggmsa package. Note that ggmsa supports not only fasta files but other objects as well. available_msa() can be used to list the MSA object types currently supported.
available_msa()
# files currently available:
# .fasta
# XStringSet objects from 'Biostrings' package:
# DNAStringSet RNAStringSet AAStringSet BStringSet DNAMultipleAlignment RNAMultipleAlignment AAMultipleAlignment
# bin objects from 'seqmagick' package:
# DNAbin AAbin
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
miRNA_sequences <- system.file("extdata", "seedSample.fa", package = "ggmsa")
nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa")
path.package("ggmsa")
# [1] "C:/Users/lenovo/Documents/R/win-library/3.6/ggmsa"
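#@ system.file() returns an empty string rather than an error when a file cannot be found, so a wrong file name only surfaces later inside ggmsa(). A sketch of checking the three sample paths up front; the names sample_files, sample_paths and not_found are introduced here for illustration only:

```r
# File names shipped under extdata, as used in this post.
sample_files <- c(protein = "sample.fasta",
                  miRNA   = "seedSample.fa",
                  nt      = "LeaderRepeat_All.fa")

# system.file() returns "" when the file (or the package) cannot be found.
sample_paths <- vapply(sample_files,
                       function(f) system.file("extdata", f, package = "ggmsa"),
                       character(1))

# Report any sample file that could not be located.
not_found <- names(sample_paths)[sample_paths == ""]
if (length(not_found) > 0) {
  message("Sample files not found: ", paste(not_found, collapse = ", "))
}
```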
# Visualizing Multiple Sequence Alignments #
### 4.2 The simplest way to use ggmsa
?ggmsa
#@ Basic plot
ggmsa(protein_sequences, start = 265, end = 300)
#@ Adjust the parameters to customize the multiple sequence alignment plot
ggmsa(protein_sequences, start = 265, end = 300, font = "TimesNewRoman", color = "Clustal", char_width = 0.8, none_bg = TRUE, seq_name = TRUE)
ggmsa(protein_sequences, start = 265, end = 300, font = "TimesNewRoman", color = "Chemistry_AA", char_width = 0.8, none_bg = FALSE)
# Colour Schemes #
available_colors()
# color schemes for nucleotide sequences currently available:
# Chemistry_NT Shapely_NT Taylor_NT Zappo_NT
# color schemes for AA sequences currently available:
# Clustal Chemistry_AA Shapely_AA Zappo_AA Taylor_AA
### 4.3 Clustal X Colour Scheme (Default)
#@ This is an emulation of the default colour scheme used for alignments in Clustal X, a graphical interface for the ClustalW multiple sequence alignment program. Each residue in the alignment is assigned a colour if the amino acid profile of the alignment at that position meets some minimum criteria specific for the residue type.
ggmsa(protein_sequences, start = 320, end = 360, color = "Clustal")
### 4.4 Color by Chemistry
#@ Amino acids are colored according to their side chain chemistry:
ggmsa(protein_sequences, start = 320, end = 360, color = "Chemistry_AA")
### 4.5 Color by Shapely
#@ This color scheme matches the RasMol amino acid and RasMol nucleotide color schemes, which are, in turn, based on Robert Fletterick’s “Shapely models”.
ggmsa(protein_sequences, start = 320, end = 360, color = "Shapely_AA")
### 4.6 Color by Taylor
#@ This color scheme is taken from Taylor (Taylor 1997) and is also used in Jalview (Waterhouse et al. 2009).
ggmsa(protein_sequences, start = 320, end = 360, color = "Taylor_AA")
### 4.7 Color by Zappo
#@ This scheme colors residues according to their physico-chemical properties, and is also used in Jalview (Waterhouse et al. 2009).
ggmsa(protein_sequences, start = 320, end = 360, color = "Zappo_AA")
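#@ Sections 4.3 to 4.7 repeat the same call with a different color value. A sketch of how the comparison could be looped; plot_all_schemes() is a hypothetical helper, the scheme names are those printed by available_colors(), and it assumes ggmsa() returns a ggplot object (as the + theme_void() call in 4.10 suggests):

```r
# Amino acid colour schemes as listed by available_colors() in this post.
aa_schemes <- c("Clustal", "Chemistry_AA", "Shapely_AA", "Zappo_AA", "Taylor_AA")

# Hypothetical helper: draw the same alignment window once per colour scheme
# and return the plots as a named list.
plot_all_schemes <- function(msa, start, end, schemes = aa_schemes) {
  plots <- lapply(schemes, function(s) {
    ggmsa::ggmsa(msa, start = start, end = end, color = s) +
      ggplot2::ggtitle(s)
  })
  names(plots) <- schemes
  plots
}
```

#@ For example, plots <- plot_all_schemes(protein_sequences, 320, 360) followed by plots[["Taylor_AA"]] would draw the Taylor version.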
### 4.8 Font
#@ Several classic fonts for MSA are shipped in the package. In the same way, you can use available_fonts() to list the fonts currently available.
available_fonts()
# font families currently available:
# helvetical mono TimesNewRoman DroidSansMono
# helvetical
ggmsa(protein_sequences, start = 320, end = 360, font = "helvetical", color = "Chemistry_AA")
# TimesNewRoman
ggmsa(protein_sequences, start = 320, end = 360, font = "TimesNewRoman", color = "Chemistry_AA")
# DroidSansMono
ggmsa(protein_sequences, start = 320, end = 360, font = "DroidSansMono", color = "Chemistry_AA")
#@ If you specify font = NULL, only the tiles will be plotted.
ggmsa(protein_sequences, start = 320, end = 360, font = NULL, color = "Chemistry_AA", seq_name = FALSE)
ggmsa(protein_sequences, start = 320, end = 360, font = NULL, color = "Chemistry_AA", seq_name = TRUE)
### 4.9 Character width
#@ Character width can be specified with char_width. The default is 0.9.
ggmsa(protein_sequences, start = 320, end = 360, char_width = 0.5, color = "Chemistry_AA")
### 4.10 Background
#@ The background can be controlled with none_bg. If none_bg = TRUE, only the characters will be plotted.
ggmsa(protein_sequences, start = 320, end = 360, none_bg = TRUE) + theme_void()
### 4.11 Highlighted positions
#@ Highlighted positions can be specified with posHighligthed (the argument name is spelled this way in the package). Note that none_bg = FALSE applies when positions are highlighted with posHighligthed.
# Highlight non-contiguous positions
ggmsa(protein_sequences, 164, 213, color = "Chemistry_AA",
posHighligthed = c(185, 190))
ggmsa(protein_sequences, 164, 213, color = "Chemistry_AA", posHighligthed = c(180, 190, 200))
# Highlight a contiguous range
ggmsa(protein_sequences, 164, 213, color = "Chemistry_AA",
posHighligthed = c(180:200))
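#@ The calls above pass posHighligthed a plain numeric vector, so several ranges can be combined before the call. A base-R sketch; highlight_positions() is a hypothetical helper, not a ggmsa function:

```r
# Hypothetical helper: merge any number of positions or ranges into one
# sorted vector without duplicates.
highlight_positions <- function(...) {
  sort(unique(unlist(list(...))))
}

# Two ranges plus one single position.
pos <- highlight_positions(180:185, 195:200, 190)
```

#@ The result could then be passed as posHighligthed = pos; the positions are assumed to lie between the start and end values of the plotted window (164 and 213 here).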
### 4.12 Sequence names
#@ seq_name defaults to NULL, in which case sequence names are displayed when font = NULL but not when a font is specified. Set seq_name = TRUE to display the sequence names whenever you need them.
ggmsa(protein_sequences, 164, 213, color = "Chemistry_AA", seq_name = TRUE)
#@ If seq_name = FALSE, the sequence names will not be displayed in any case.
ggmsa(protein_sequences, 164, 213, font = NULL, color = "Chemistry_AA", seq_name = FALSE)
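#@ None of the calls above write the figure to disk. A hedged sketch of saving one with ggplot2::ggsave(), assuming ggmsa() returns a ggplot object; save_msa_plot() and its default size are illustrative choices, not values from the original post:

```r
# Hypothetical helper: write an MSA plot to a file. Alignments are usually
# much wider than tall, hence the wide default size (in inches).
save_msa_plot <- function(plot, file, width = 10, height = 3) {
  ggplot2::ggsave(filename = file, plot = plot,
                  width = width, height = height, units = "in")
  invisible(file)
}
```

#@ For example: p <- ggmsa(protein_sequences, 164, 213, color = "Chemistry_AA"), then save_msa_plot(p, "msa_164_213.png"). The file name is an arbitrary example.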
## 5. Wrap-up
# RUNRPTEST("./ggmsa", rpackage = "ggmsa",install_method = "website", rpackage_repository = "cran")
sessionInfo()
# R version 3.6.3 (2020-02-29)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 18363)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=Chinese (Simplified)_China.936
# [2] LC_CTYPE=Chinese (Simplified)_China.936
# [3] LC_MONETARY=Chinese (Simplified)_China.936
# [4] LC_NUMERIC=C
# [5] LC_TIME=Chinese (Simplified)_China.936
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods
# [7] base
#
# other attached packages:
# [1] ggplot2_3.3.0 ggmsa_0.0.4
#
# loaded via a namespace (and not attached):
# [1] Rcpp_1.0.4.6 pillar_1.4.4 compiler_3.6.3
# [4] XVector_0.26.0 tools_3.6.3 zlibbioc_1.32.0
# [7] digest_0.6.25 packrat_0.5.0 lifecycle_0.2.0
# [10] tibble_3.0.1 gtable_0.3.0 pkgconfig_2.0.3
# [13] rlang_0.4.6 rstudioapi_0.11 seqmagick_0.1.3
# [16] parallel_3.6.3 withr_2.2.0 dplyr_0.8.5
# [19] stringr_1.4.0 Biostrings_2.54.0 S4Vectors_0.24.3
# [22] vctrs_0.3.0 IRanges_2.20.2 stats4_3.6.3
# [25] grid_3.6.3 tidyselect_1.1.0 glue_1.4.1
# [28] R6_2.4.1 purrr_0.3.4 tidyr_1.1.0
# [31] farver_2.0.3 magrittr_1.5 scales_1.1.1
# [34] ellipsis_0.3.1 BiocGenerics_0.32.0 assertthat_0.2.1
# [37] colorspace_1.4-1 labeling_0.3 stringi_1.4.6
# [40] munsell_0.5.0 crayon_1.3.4
#@ Two references, for readers who are interested
# Taylor, W R. 1997. “Residual Colours: A Proposal for Aminochromography.” Protein Eng 10 (7): 743–46.
# Waterhouse, A. M., J. B. Procter, D. M. Martin, M Clamp, and G. J. Barton. 2009. “Jalview Version 2–a Multiple Sequence Alignment Editor and Analysis Workbench.” Bioinformatics 25 (9): 1189.