A Comprehensive List of Next-Generation Sequencing Data Analysis Software Packages

Integrated solutions

*CLCbio Genomics Workbench-

De novo and

reference assembly of Sanger, Roche FLX, Illumina, Helicos, and

SOLiD data. Commercial next-gen-seq software that extends the

CLCbio Main Workbench software. Includes SNP detection, ChIP-seq,

browser and other features. Commercial. Windows, Mac OS X and

Linux.

*Galaxy-

Galaxy = interactive and reproducible genomics. A job

web portal.

*Genomatix-

Integrated Solutions for Next Generation Sequencing data

analysis.

*JMP Genomics-

Next-gen visualization and statistics tool from SAS. They

are working with NCGR to

refine this tool and produce others.

*NextGENe-

De novo and

reference assembly of Illumina, SOLiD and Roche FLX data. Uses a

novel Condensation Assembly Tool approach where reads are joined

via "anchors" into mini-contigs before assembly. Includes SNP

detection, ChIP-seq, browser and other features. Commercial. Win or

MacOS.

*SeqMan Genome Analyser-

Software for Next Generation sequence assembly of Illumina, Roche

FLX and Sanger data integrating with Lasergene Sequence Analysis

software for additional analysis and visualization capabilities.

Can use a hybrid templated/de novo approach. Commercial. Win or Mac

OS X.

*SHORE-

SHORE, for Short Read, is a mapping and analysis pipeline for short

DNA sequences produced on an Illumina Genome Analyzer. A suite

created by the 1001 Genomes project. Source for

POSIX.

*SlimSearch-

Fledgling commercial product.

Align/Assemble to a reference

*BFAST-

Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley

F. Nelson and Barry Merriman at UCLA.

*Bowtie-

Ultrafast, memory-efficient short read aligner. It aligns short DNA

sequences (reads) to the human genome at a rate of 25 million reads

per hour on a typical workstation with 2 gigabytes of memory. Uses

a Burrows-Wheeler-Transformed (BWT) index.

Written by Ben Langmead and Cole Trapnell. Linux, Windows, and Mac

OS X.
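
To make the BWT index idea concrete, here is a minimal sketch of the Burrows-Wheeler transform itself, built naively from sorted rotations. It is an illustration only, not Bowtie's implementation: real aligners build the index with far more memory-efficient methods, and the function names and the `$` sentinel are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sort every rotation of the text; the transform is the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The row that ends with the sentinel is the original text.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    print(bwt("GATTACA"))
    print(inverse_bwt(bwt("GATTACA")))
```

The transform is reversible, which is what lets an index built on it stand in for the reference sequence during search.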

*BWA-

Heng Li's BWT Alignment program - a progression from Maq. BWA is a

fast, lightweight tool that aligns short sequences to a sequence

database, such as the human reference genome. By default, BWA finds

an alignment within edit distance 2 to the query sequence. C++

source.
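
The "edit distance 2" default can be illustrated with a small dynamic-programming sketch. This is not how BWA searches (it works on the BWT index); the code below only shows what it means for a read and a reference window to lie within a given edit distance. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def within_edit_distance(read: str, window: str, max_dist: int = 2) -> bool:
    """True if the read matches the reference window with at most max_dist edits."""
    return edit_distance(read, window) <= max_dist


if __name__ == "__main__":
    print(within_edit_distance("ACGTACGT", "ACGAACG"))  # one substitution + one deletion
```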

*ELAND-

Efficient Large-Scale Alignment of Nucleotide Databases. Whole

genome alignments to a reference genome. Written by Illumina author

Anthony J. Cox for the Solexa 1G machine.

*Exonerate-

Various forms of pairwise alignment (including

Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors

are Guy St C Slater and Ewan Birney from EMBL. C for

POSIX.

*GenomeMapper-

GenomeMapper is a short read mapping tool designed for accurate

read alignments. It quickly aligns millions of reads either with

ungapped or gapped alignments. A tool created by the 1001 Genomes

project. Source for POSIX.

*GMAP-

GMAP (Genomic Mapping and Alignment Program) for mRNA and EST

Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech.

C/Perl for Unix.

*gnumap-

The Genomic Next-generation Universal MAPper (gnumap) is a program

designed to accurately map sequence data obtained from

next-generation sequencing machines (specifically that of

Solexa/Illumina) back to a genome of any size. It seeks to align

reads from nonunique repeats using statistics. From authors at

Brigham Young University. C source/Unix.

*MAQ-

Mapping and Assembly with Qualities (renamed from MAPASS2).

Particularly designed for Illumina with preliminary functions to

handle ABI SOLiD data. Written by Heng Li from the Sanger Centre.

Features extensive supporting tools for DIP/SNP detection, etc. C++

source

*MOSAIK-

MOSAIK produces gapped alignments using the Smith-Waterman

algorithm. Features a number of support tools. Support for Roche

FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at

Boston College. Win/Linux/MacOSX
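
As a rough illustration of gapped local alignment, the following sketch computes a Smith-Waterman score with a linear gap penalty. The scoring values are arbitrary assumptions for the example and are not MOSAIK's defaults; only the best score is returned, without traceback.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```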

*MrFAST and MrsFAST-

mrFAST & mrsFAST are designed to map short reads generated with

the Illumina platform to reference genome assemblies in a fast and

memory-efficient manner. Robust to INDELs and MrsFAST has a

bisulphite mode. Authors are from the University of Washington. C

as source.

*MUMmer-

MUMmer is a modular system for the rapid whole genome alignment of

finished or draft sequence. Released as a package providing an

efficient suffix tree library, seed-and-extend alignment, SNP

detection, repeat detection, and visualization tools. Version 3.0

was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher,

Michael Smoot, Martin Shumway, Corina Antonescu and Steven L

Salzberg - most of whom are at The Institute for Genomic Research

in Maryland, USA. POSIX OS required.

*Novocraft-

Tools for reference alignment of paired-end and single-end Illumina

reads. Uses a Needleman-Wunsch algorithm. Can support Bis-Seq.

Commercial. Available free for evaluation, educational use and for

use on open not-for-profit projects. Requires Linux or Mac OS

X.

*PASS-

It supports Illumina, SOLiD and Roche-FLX data formats and allows

the user to modulate very finely the sensitivity of the alignments.

Spaced seed initial filter, then NW dynamic algorithm to a SW(like)

local alignment. Authors are from CRIBI in Italy.

Win/Linux.

*RMAP-

Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By

Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC

Bioinformatics). POSIX OS required.

*SeqMap-

Supports up to 5 or more bp mismatches/INDELs. Highly tunable.

Written by Hui Jiang from the Wong lab at Stanford. Builds

available for most OS's.

*SHRiMP-

Assembles to a reference sequence. Developed with Applied

Biosystems' colourspace genomic representation in mind. Authors are

Michael Brudno and Stephen Rumble at the University of Toronto.

POSIX.

*Slider-

An application for the Illumina Sequence Analyzer output that uses

the probability files instead of the sequence files as an input for

alignment to a reference sequence or a set of reference sequences.

Authors are from BCGSC. A paper is available.

*SOAP-

SOAP (Short Oligonucleotide Alignment Program). A program for

efficient gapped and ungapped alignment of short oligonucleotides

onto reference sequences. The updated version uses a BWT. Can call

SNPs and INDELs. Author is Ruiqiang Li at the Beijing Genomics

Institute. C++, POSIX.

*SSAHA-

SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a

tool for rapidly finding near exact matches in DNA or protein

databases using a hash table. Developed at the Sanger Centre by

Zemin Ning, Anthony Cox and James Mullikin. C++ for

Linux/Alpha.
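
The hash-table lookup idea can be sketched as follows. This is a simplified illustration rather than SSAHA's actual scheme: every overlapping k-mer of the reference is indexed here, and candidate placements are ranked by how many query k-mers agree on the same offset. The function names are made up for the example.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map each k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_offsets(query: str, index: dict, k: int) -> dict:
    """Count, per reference offset, how many query k-mers support that placement."""
    votes = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for rpos in index.get(query[qpos:qpos + k], ()):
            votes[rpos - qpos] += 1
    return dict(votes)


if __name__ == "__main__":
    idx = build_kmer_index("ACGTACGTTAGC", 4)
    print(candidate_offsets("CGTTAG", idx, 4))
```

Offsets with many votes are the places worth checking with a full alignment.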

*SOCS-

Aligns SOLiD data. SOCS is built on an iterative variation of the

Rabin-Karp string search algorithm, which uses hashing to reduce

the set of possible matches, drastically increasing search speed.

Authors are Ondov B, Varadarajan A, Passalacqua KD and Bergman

NH.
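
For readers unfamiliar with Rabin-Karp, here is a sketch of the classic, non-iterative algorithm on plain text. SOCS uses its own iterative variation on colour-space data, which is not reproduced here; the base and modulus are arbitrary choices for the example.

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_000_007) -> list:
    """Return all start positions of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # The hash only filters candidates; confirm with a direct comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits


if __name__ == "__main__":
    print(rabin_karp("ACGTACGTAC", "ACGT"))
```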

*SWIFT-

The SWIFT suite is a software collection for fast index-based

sequence comparison. It contains: SWIFT — fast local alignment

search, guaranteeing to find epsilon-matches between two sequences.

SWIFT BALSAM — a very fast program to find semiglobal non-gapped

alignments based on k-mer seeds. Authors are Kim Rasmussen (SWIFT)

and Wolfgang Gerlach (SWIFT BALSAM)

*SXOligoSearch-

SXOligoSearch is a commercial platform offered by the Malaysian-based

Synamatix.

Will align Illumina reads against a range of Refseq RNA or NCBI

genome builds for a number of organisms. Web Portal. OS

independent.

*Vmatch-

A versatile software tool for efficiently solving large scale

sequence matching tasks. Vmatch subsumes the software tool REPuter,

but is much more general, with a very flexible user interface, and

improved space and time requirements. Essentially a large string

matching toolbox. POSIX.

*Zoom-

ZOOM (Zillions Of Oligos Mapped) is designed to map millions of

short reads, produced by next-generation sequencing technology, back

to the reference genomes, and carry out post-analysis. ZOOM is

developed to be highly accurate, flexible, and user-friendly with

speed being a critical priority. Commercial. Supports Illumina and

SOLiD data.

De novo Align/Assemble

*ABySS-

Assembly By Short Sequences. ABySS is a de novo sequence assembler

that is designed for very short reads. The single-processor version

is useful for assembling genomes up to 40-50 Mbases in size. The

parallel version is implemented using MPI and is capable of

assembling larger genomes. By Simpson JT and others at Canada's

Michael Smith Genome Sciences Centre. C++ as source.

*ALLPATHS-

ALLPATHS: De novo assembly of whole-genome shotgun microreads.

ALLPATHS is a whole genome shotgun assembler that can generate high

quality assemblies from short reads. Assemblies are presented in a

graph form that retains ambiguities, such as those arising from

polymorphism, thereby providing information that has been absent

from previous genome assemblies. Broad

Institute.

*Edena-

Edena (Exact DE Novo Assembler) is an assembler dedicated to

process the millions of very short reads produced by the Illumina

Genome Analyzer. Edena is based on the traditional overlap layout

paradigm. By D. Hernandez, P. François, L. Farinelli, M. Osteras,

and J. Schrenzel. Linux/Win.

*EULER-SR-

Short read de novo assembly.

By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in

Genome Research). Uses a de Bruijn graph

approach.
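
A de Bruijn graph can be sketched in a few lines. This is only the graph-construction step, under the assumption of error-free reads; EULER-SR's error correction and graph simplification are not shown, and the function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list, k: int) -> dict:
    """Build a de Bruijn graph: nodes are (k-1)-mers, one edge per k-mer occurrence."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn_graph(["ACGT", "CGTA"], 3).items():
        print(node, "->", successors)
```

Assembly then amounts to finding paths through this graph; repeated edges reflect k-mers seen in more than one read.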

*MIRA2-

MIRA (Mimicking Intelligent Read Assembly) is able to perform true

hybrid de-novo assemblies using reads gathered through 454

sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa

and Sanger data. Linux OS required.

*SEQAN-

A Consistency-based Consensus Algorithm for De Novo and

Reference-guided Sequence Assembly of Short Reads. By Tobias Rausch

and others. C++, Linux/Win.

*SHARCGS-

De novo assembly of short reads. Authors are Dohm JC, Lottaz C,

Borodina T and Himmelbauer H. from the Max-Planck-Institute for

Molecular Genetics.

*SSAKE-

The Short Sequence Assembly by K-mer search and 3' read Extension

(SSAKE) is a genomics application for aggressively assembling

millions of short nucleotide sequences by progressively searching

for perfect 3'-most k-mers using a DNA prefix tree. Authors are

René Warren, Granger Sutton, Steven Jones and Robert Holt from

Canada's Michael Smith Genome Sciences Centre.

Perl/Linux.
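
The greedy 3' extension idea can be sketched as below. This is a simplified illustration, not SSAKE's code: it scans the read list linearly instead of using a prefix tree, ignores reverse complements and read multiplicity, and the function name and `min_overlap` parameter are assumptions for the example.

```python
def extend_3prime(seed: str, reads: list, min_overlap: int) -> str:
    """Greedily extend a seed at its 3' end using reads whose prefix overlaps it."""
    contig = seed
    unused = list(reads)
    while True:
        best = None  # (overlap length, read)
        for read in unused:
            # The read must add at least one new base, so overlap < len(read).
            max_len = min(len(read) - 1, len(contig))
            for olap in range(max_len, min_overlap - 1, -1):
                if contig.endswith(read[:olap]):
                    if best is None or olap > best[0]:
                        best = (olap, read)
                    break
        if best is None:
            return contig
        olap, read = best
        contig += read[olap:]   # append the non-overlapping 3' part
        unused.remove(read)     # each read is used at most once


if __name__ == "__main__":
    print(extend_3prime("ACGTAC", ["GTACGG", "ACGGTT", "TTTTTT"], 3))
```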

*SOAPdenovo-

Part of the SOAP suite. See above.

*VCAKE-

De novo assembly of short reads with robust error correction. An

improvement on early versions of SSAKE.

*Velvet-

Velvet is a de novo genomic assembler specially designed for short

read sequencing technologies, such as Solexa or 454. Needs about

20-25X coverage and paired reads. Developed by Daniel Zerbino and

Ewan Birney at the European Bioinformatics Institute

(EMBL-EBI).
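
To relate the 20-25X figure to an experiment, mean coverage is commonly estimated as number of reads times read length divided by genome size. The sketch below applies that arithmetic; the function names and the example numbers are illustrative assumptions, not values from Velvet's documentation.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, genome_size: int, read_length: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical example: a 5 Mb genome sequenced with 50 bp reads.
    print(reads_needed(25, 5_000_000, 50))
```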

SNP/Indel Discovery

*ssahaSNP-

ssahaSNP is a polymorphism detection tool. It detects homozygous

SNPs and indels by aligning shotgun reads to the finished genome

sequence. Highly repetitive elements are filtered out by ignoring

those kmer words with high occurrence numbers. More tuned for ABI

Sanger reads. Developers are Adam Spargo and Zemin Ning from the

Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and

Mac.

*PolyBayesShort-

A re-incarnation of the PolyBayes SNP discovery tool developed by

Gabor Marth at Washington University. This version is specifically

optimized for the analysis of large numbers (millions) of

high-throughput next-generation sequencer reads, aligned to whole

chromosomes of model organism or mammalian genomes. Developers at

Boston College. Linux-64 and Linux-32.

*PyroBayes-

PyroBayes is a novel base caller for pyrosequences from the 454

Life Sciences sequencing machines. It was designed to assign more

accurate base quality estimates to the 454 pyrosequences.

Developers at Boston College.

Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database

*EagleView-

An information-rich genome assembly viewer. EagleView can display

a dozen different types of information including base quality and

flowgram signal. Developers at Boston

College.

*LookSeq-

LookSeq is a web-based application for alignment visualization,

browsing and analysis of genome sequence data. LookSeq supports

multiple sequencing technologies, alignment sources, and viewing

modes; low or high-depth read pileups; and easy visualization of

putative single nucleotide and structural variation. From the

Sanger Centre.

*MapView-

MapView: visualization of short read alignments on a desktop

computer. From the Evolutionary Genomics Lab at Sun Yat-sen

University, China. Linux.

*SAM-

Sequence Assembly Manager. Whole Genome Assembly (WGA) Management

and Visualization Tool. It provides a generic platform for

manipulating, analyzing and viewing WGA data, regardless of input

type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui

and Steven Jones at Canada's Michael Smith Genome Sciences Centre.

MySQL backend and Perl-CGI web-based frontend/Linux.

*STADEN-

Includes GAP4. GAP5 once completed will handle next-gen sequencing

data. A partially implemented test version is available.

*XMatchView-

A visual tool for analyzing cross_match alignments. Developed by

Rene Warren and Steven Jones at Canada's Michael Smith Genome

Sciences Centre. Python/Win or Linux.

Counting e.g. ChIP-Seq, Bis-Seq, CNV-Seq

*BS-Seq-

The source code and data for the "Shotgun Bisulphite Sequencing of

the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature

paper by Cokus et al. (Steve

Jacobsen's lab at UCLA). POSIX.

*CHiPSeq-

Program used by Johnson et al. (2007) in their Science

publication

*CNV-Seq-

CNV-seq, a new method to detect copy number variation using

high-throughput sequencing. Chao Xie and Martti T Tammi at the

National University of Singapore. Perl/R.

*FindPeaks-

Performs analysis of ChIP-Seq experiments. It uses a naive algorithm

for identifying regions of high coverage, which represent Chromatin

Immunoprecipitation enrichment of sequence fragments, indicating

the location of a bound protein of interest. Original algorithm by

Matthew Bainbridge, in collaboration with Gordon Robertson. Current

code and implementation by Anthony Fejes. Authors are from

Canada's Michael Smith Genome Sciences Centre. JAVA/OS independent.

Latest versions available as part of the Vancouver Short Read Analysis

Package.
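
A naive coverage-based peak caller of the kind described above might look like the following sketch. It is not FindPeaks code: fragments are assumed to be half-open `(start, end)` intervals that fit inside the region, and the fixed `min_height` threshold is a hypothetical parameter.

```python
def coverage(fragments: list, length: int) -> list:
    """Per-base depth over a region of the given length."""
    diff = [0] * (length + 1)
    for start, end in fragments:   # half-open intervals [start, end)
        diff[start] += 1
        diff[end] -= 1
    depth, cov = 0, []
    for delta in diff[:length]:
        depth += delta
        cov.append(depth)
    return cov


def call_peaks(cov: list, min_height: int) -> list:
    """Return half-open regions where depth stays at or above min_height."""
    peaks, start = [], None
    for pos, depth in enumerate(cov):
        if depth >= min_height and start is None:
            start = pos
        elif depth < min_height and start is not None:
            peaks.append((start, pos))
            start = None
    if start is not None:
        peaks.append((start, len(cov)))
    return peaks


if __name__ == "__main__":
    cov = coverage([(0, 5), (2, 7), (3, 8), (12, 15)], 16)
    print(call_peaks(cov, 2))
```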

*MACS-

Model-based Analysis for ChIP-Seq. MACS empirically models the

length of the sequenced ChIP fragments, which tends to be shorter

than sonication or library construction size estimates, and uses it

to improve the spatial resolution of predicted binding sites. MACS

also uses a dynamic Poisson distribution to effectively capture

local biases in the genome sequence, allowing for more sensitive

and robust prediction. Written by Yong Zhang and Tao Liu from

Xiaole Shirley Liu's Lab.
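
The effect of a locally adjusted Poisson rate can be illustrated with a short sketch. It assumes, for illustration only, that the local rate is taken as the largest of a genome-wide background rate and rates estimated from windows around the candidate site; the function names and numbers are hypothetical and this is not MACS code.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), by direct summation (fine for small k)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1 - cdf)


def local_lambda(genome_rate: float, *window_rates: float) -> float:
    """Assumed rule: use the most conservative (largest) expected tag count."""
    return max(genome_rate, *window_rates)


if __name__ == "__main__":
    # Hypothetical site: 10 tags observed; background 1.0, local windows 0.5 and 3.0.
    lam = local_lambda(1.0, 0.5, 3.0)
    print(lam, poisson_sf(10, lam))
```

With a higher local rate the same tag count yields a larger p-value, so regions with locally elevated background are less likely to be called as peaks.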

*PeakSeq-

PeakSeq: Systematic Scoring of ChIP-Seq Experiments Relative to

Controls. A two-pass approach for scoring ChIP-Seq data relative to

controls. The first pass identifies putative binding sites and

compensates for variation in the mappability of sequences across

the genome. The second pass filters out sites that are not

significantly enriched compared to the normalized input DNA and

computes a precise enrichment and significance. By Rozowsky J et

al. C/Perl.

*QuEST-

Quantitative Enrichment of Sequence Tags. Sidow and Myers Labs at

Stanford. From the 2008 publication "Genome-wide analysis of transcription factor binding

sites based on ChIP-Seq data".

(C++)

*SISSRs-

Site Identification from Short Sequence Reads. BED file input. Raja

Jothi @ NIH. Perl.

**See also this thread for ChIP-Seq, until I get time to update this list.

Alternate Base Calling

*Rolexa-

R-based framework for base calling of Solexa data.

Project and publication are available.

*Alta-cyclic-

"a novel Illumina Genome-Analyzer (Solexa) base

caller"

Transcriptomics

*ERANGE-

Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq.

Supports Bowtie, BLAT and ELAND. From the Wold

lab.

*G-Mo.R-Se-

G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build

de novo gene models. First, candidate exons are built directly from

the positions of the reads mapped on the genome (without any ab

initio assembly of the reads), and all the possible splice

junctions between those exons are tested against unmapped reads.

From CNS in France.

*MapNext-

MapNext: A software tool for spliced and unspliced alignments and

SNP detection of short sequence reads. From the Evolutionary

Genomics Lab at Sun Yat-sen University,

China.

*QPalma-

Optimal Spliced Alignments of Short Sequence Reads. Authors are

Fabio De Bona, Stephan Ossowski, Korbinian Schneeberger, and Gunnar

Rätsch. A paper is available.

*RSAT-

RSAT: RNA-Seq Analysis Tools. RNASAT is developed and maintained by

Hui Jiang at Stanford University.

*TopHat-

TopHat is a fast splice junction mapper for RNA-Seq reads. It

aligns RNA-Seq reads to mammalian-sized genomes using the ultra

high-throughput short read aligner Bowtie, and then analyzes the

mapping results to identify splice junctions between exons. TopHat

is a collaborative effort between the University of Maryland and

the University of California, Berkeley.

Reposted from: http://blog.163.com/luyiming_1986@126/blog/static/151141532201122494757719/

Next-generation sequencing data preprocessing and analysis

Commonly used tools

Quality control: FastQC, Fastx-toolkit

Aligner: BWA, Bowtie, TopHat, SOAP2

Mapper: TopHat, Cufflinks

Gene quantification: Cufflinks, Avadis NGS

Quality improvement: Genome Analysis Toolkit (GATK)

SNP: Unified Genotyper, Glfmultiple, SAMtools, Avadis NGS

CNV: CNVnator

Indel: Pindel, Dindel, Unified Genotyper, Avadis NGS

Mapping to a gene: Cufflinks, Rsamtools, GenomicFeatures

Related data formats

FASTQ: a text format storing reads together with per-base quality scores

SAM: a generic nucleotide alignment format

BAM: the binary version of SAM

VCF: Variant Call Format
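
As a small illustration of the FASTQ format, the sketch below reads four-line records and decodes quality strings. It assumes the Phred+33 quality encoding and unwrapped records (exactly four lines each); some older Illumina files use a different offset, so the `offset` parameter is provided. The function name is hypothetical.

```python
import io


def read_fastq(handle, offset: int = 33):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Each quality character encodes a Phred score as ASCII code minus offset.
        yield header[1:], seq, [ord(c) - offset for c in qual]


if __name__ == "__main__":
    example = io.StringIO("@r1\nACGT\n+\nII5!\n")
    for name, seq, quals in read_fastq(example):
        print(name, seq, quals)
```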

Data processing workflow

Reposted from: http://www.dxy.cn/bbs/thread/23163706#23163706

http://boyun.sh.cn/bio/?p=1862
