Volume 12 Supplement 11

Highlights from the Seventh International Society for Computational Biology (ISCB) Student Council Symposium 2011

Open Access

Oqtans: a Galaxy-integrated workflow for quantitative transcriptome analysis from NGS Data

  • Sebastian J Schultheiss (1, corresponding author),
  • Géraldine Jean (1),
  • Jonas Behr (1),
  • Regina Bohnert (1),
  • Philipp Drewe (1),
  • Nico Görnitz (1, 2),
  • André Kahles (1),
  • Pramod Mudrakarta (1),
  • Vipin T Sreedharan (1),
  • Georg Zeller (1, 3) and
  • Gunnar Rätsch (1)
BMC Bioinformatics 2011, 12(Suppl 11):A7

DOI: 10.1186/1471-2105-12-S11-A7

Published: 21 November 2011

Background

The current revolution in sequencing technologies allows us to obtain a much more detailed picture of transcriptomes via RNA sequencing (RNA-Seq). We have developed the first integrative online platform, oqtans, for quantitatively analyzing RNA-Seq experiments. We provide oqtans as a self-contained machine image built on the accessible, transparent Galaxy framework [1], which minimizes the risks of relying on a third-party web service for data analysis. Such services often disappear a few years after publication, rendering results irreproducible [2]. Oqtans makes analyses reproducible by providing building blocks for a customized workflow of read mapping, transcript reconstruction and quantitation, and differential expression analysis.

Method

Oqtans includes a comprehensive, machine-learning-powered tool suite developed by the authors for NGS data analysis. PALMapper is a short-read mapper that efficiently computes both unspliced and spliced alignments at high accuracy by taking advantage of base quality information and computational splice site predictions [3]. mTIM is a transcript reconstruction method that exploits features derived from RNA-Seq read alignments and from computational splice site predictions to infer the exon-intron structure of the corresponding transcripts. rQuant is based on quadratic programming: it simultaneously estimates the biases inherent in library preparation, sequencing, and read mapping, and accurately determines the abundances of given transcripts [4]. rDiff is a set of statistical tests that detect significant differences between two RNA-Seq experiments in order to find differentially expressed regions, with or without knowledge of the transcripts.
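To make the idea of quantitation by quadratic programming concrete, here is a minimal sketch. It is not rQuant's actual model, which also estimates biases; it only assumes a toy setting in which the observed coverage of each exon is the sum of the abundances of the transcripts containing it. The resulting non-negative least-squares problem is a small quadratic program, solved here by projected gradient descent in place of a dedicated QP solver. The function name, the data, and the step settings are all hypothetical.

```python
from typing import List


def estimate_abundances(
    coverage: List[float],
    membership: List[List[int]],
    steps: int = 5000,
    learning_rate: float = 0.01,
) -> List[float]:
    """Toy transcript quantitation (illustrative only, not rQuant).

    Minimizes sum_p (sum_t w[t] * membership[t][p] - coverage[p])^2
    subject to w[t] >= 0, using projected gradient descent.
    membership[t][p] is 1 if transcript t contains exon p, else 0.
    """
    n_transcripts = len(membership)
    n_exons = len(coverage)
    weights = [0.0] * n_transcripts
    for _ in range(steps):
        # Residual between expected and observed coverage for each exon.
        residual = [
            sum(weights[t] * membership[t][p] for t in range(n_transcripts))
            - coverage[p]
            for p in range(n_exons)
        ]
        for t in range(n_transcripts):
            gradient = 2.0 * sum(
                residual[p] * membership[t][p] for p in range(n_exons)
            )
            # Gradient step followed by projection onto w[t] >= 0.
            weights[t] = max(0.0, weights[t] - learning_rate * gradient)
    return weights


# Hypothetical gene with three exons and two transcripts:
# transcript A contains exons 1, 2 and 3; transcript B skips exon 2.
membership = [[1, 1, 1], [1, 0, 1]]
# Coverage generated from abundances A = 4 and B = 6.
coverage = [10.0, 4.0, 10.0]
abundances = estimate_abundances(coverage, membership)
print([round(w, 3) for w in abundances])
```

In this toy example the exon that only transcript A contains pins down A's abundance, and the shared exons then determine B's. Real data would additionally need the bias terms described above.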

Results

We compare predictions to the published annotation at the intron and transcript levels. Fig. 1A shows the performance of the read aligners on D. melanogaster data, and Fig. 1B shows the performance of the aligners and the transcript reconstruction tools on C. elegans data. Our tools, PALMapper and mTIM, outperform TopHat [5] and Cufflinks [6], respectively. Oqtans is free and open source; it is available from http://oqtans.org as a virtual machine for cloud computing environments and is ready to use on our public compute cluster at http://bioweb.me/mlb-galaxy.
Figure 1

A) Accuracy (F-score) of intron predictions in 3-day-old adults of D. melanogaster with aligners PALMapper (green) and TopHat (blue). B) Accuracy of intron predictions with the same aligners and transcript predictions with mTIM (green) and Cufflinks (blue) on C. elegans RNA-seq transcriptome data.
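
The F-score reported in Figure 1 can be illustrated with a short sketch. The abstract does not spell out the exact evaluation protocol, so this assumes the common definition: the harmonic mean of precision and recall, where a predicted intron counts as correct only if its chromosome, strand and both boundaries exactly match an annotated intron. The function name and the example coordinates are hypothetical.

```python
from typing import Iterable, Set, Tuple

# An intron is identified by (chromosome, strand, start, end). The coordinate
# convention is an assumption; it only has to be the same in both sets.
Intron = Tuple[str, str, int, int]


def intron_f_score(predicted: Iterable[Intron], annotated: Iterable[Intron]) -> float:
    """Harmonic mean of precision and recall over exactly matching introns."""
    predicted_set: Set[Intron] = set(predicted)
    annotated_set: Set[Intron] = set(annotated)
    true_positives = len(predicted_set & annotated_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(annotated_set)
    return 2.0 * precision * recall / (precision + recall)


# Made-up example: the second predicted intron misses its acceptor site by 10 nt.
annotated = [
    ("chr2L", "+", 100, 200),
    ("chr2L", "+", 300, 400),
    ("chr2L", "-", 500, 650),
]
predicted = [
    ("chr2L", "+", 100, 200),
    ("chr2L", "+", 300, 410),
    ("chr2L", "-", 500, 650),
]
score = intron_f_score(predicted, annotated)
print(f"intron F-score: {score:.3f}")
```

Here two of the three predicted introns match the annotation, so precision and recall are both 2/3 and the F-score is 2/3 as well.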

Authors’ Affiliations

(1)
Machine Learning in Biology Group, Friedrich Miescher Laboratory of the Max Planck Society
(2)
Department of Software Engineering and Theoretical Computer Science, Technical University Berlin
(3)
Structural and Computational Biology Unit, European Molecular Biology Laboratory

References

  1. Goecks J, Nekrutenko A, Taylor J: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology 2010, 11(8):R86. 10.1186/gb-2010-11-8-r86
  2. Schultheiss SJ, Münch MC, Andreeva GD, Rätsch G: Persistence and Availability of Web Services in Computational Biology. PLoS ONE 2011, 6(9):e24914.
  3. Jean G, Kahles A, Sreedharan VT, De Bona F, Rätsch G: RNA-Seq read alignments with PALMapper. In Current Protocols in Bioinformatics. Edited by: Andreas D Baxevanis [et al]. 2010. Chapter 11:Unit 11 16
  4. Bohnert R, Rätsch G: rQuant.web: a tool for RNA-Seq-based transcript quantitation. Nucleic Acids Research 2010, 38(Web Server):W348–351. 10.1093/nar/gkq448
  5. Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009, 25(9):1105–1111. 10.1093/bioinformatics/btp120
  6. Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L: Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology 2011, 12(3):R22. 10.1186/gb-2011-12-3-r22

Copyright

© Schultheiss et al; licensee BioMed Central Ltd. 2011

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
