Transcript quantification with RNA-Seq data

Bohnert, Regina; Behr, Jonas; Rätsch, Gunnar

doi:10.1186/1471-2105-10-S13-P5

Volume 10 Supplement 13

Highlights from the Fifth International Society for Computational Biology (ISCB) Student Council Symposium

Poster presentation
Open access
Published: 19 October 2009

BMC Bioinformatics volume 10, Article number: P5 (2009)

Motivation

Novel high-throughput sequencing technologies open exciting new approaches to transcriptome profiling. Sequencing transcript populations of interest, e.g. from different tissues or under varying stress conditions, with RNA sequencing (RNA-Seq) [1] generates millions of short reads. Once accurately aligned to a reference genome, these reads provide digital counts and thus facilitate transcript quantification. However, the observed read counts only reflect the sum over all transcripts expressed at a locus, so inferring the underlying abundances of the individual transcripts is crucial for further quantitative analyses.

Methods

To approach this problem, we have developed a new technique, called rQuant, based on quadratic programming. Given a gene annotation and the position-wise exon/intron read coverage derived from read alignments, we determine the abundance of each annotated transcript by minimising a suitable loss function, which penalises the deviation of the observed read coverage from the coverage expected given the transcript weights. The observed read coverage is typically distributed non-uniformly over the transcript, owing to several biases in the generation of the sequencing libraries and in the sequencing itself. If not corrected properly, this non-uniformity distorts the estimated transcript abundances. We therefore extended our approach to jointly optimise transcript profiles, which model the coverage deviations as a function of the position in the transcript. Our method can be applied without knowledge of the underlying transcript abundances and benefits equally from loci with and without alternative transcripts.
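The core idea of fitting non-negative transcript weights to position-wise coverage can be illustrated with a minimal sketch. This is not the rQuant implementation: the function names `expected_coverage` and `quantify`, the squared-error loss, the toy locus, and the use of projected gradient descent in place of a quadratic programming solver are all assumptions made for illustration, and transcript profiles are not optimised here.

```python
# Illustrative sketch only (hypothetical names, not the rQuant code).
# Minimises sum_p (observed[p] - sum_t w[t] * profiles[t][p])**2 subject to w >= 0.


def expected_coverage(weights, profiles):
    """Coverage predicted at each position: sum over transcripts of weight * profile."""
    n_pos = len(profiles[0])
    return [sum(w * prof[p] for w, prof in zip(weights, profiles))
            for p in range(n_pos)]


def quantify(observed, profiles, steps=2000):
    """Estimate non-negative transcript weights by projected gradient descent.

    A dedicated QP solver would normally be used; gradient descent with a
    projection onto w >= 0 is swapped in here to keep the example dependency-free.
    """
    # Safe step size: the gradient's Lipschitz constant is at most
    # 2 * (sum of squared profile entries).
    frob_sq = sum(v * v for prof in profiles for v in prof)
    lr = 1.0 / (2.0 * frob_sq)
    weights = [0.0] * len(profiles)
    for _ in range(steps):
        residual = [e - o for e, o in
                    zip(expected_coverage(weights, profiles), observed)]
        grad = [2.0 * sum(r * v for r, v in zip(residual, prof))
                for prof in profiles]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - lr * g) for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Toy locus with three exonic segments (assumed, not from the paper):
    # transcript A covers all three, transcript B skips the middle one.
    profiles = [[1, 1, 1],
                [1, 0, 1]]
    # Coverage constructed as 10 * A + 5 * B, so the loss is zero at (10, 5).
    observed = [15.0, 10.0, 15.0]
    print(quantify(observed, profiles))
```

In this sketch the 0/1 entries simply mark which positions each transcript covers. One conceivable way to represent position-dependent coverage deviations would be to replace them with non-binary factors, but how rQuant parameterises and jointly optimises its transcript profiles is not specified here.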

Results

To quantitatively evaluate the quality of our abundance predictions, we used a set of simulated reads from transcripts with known expression as a benchmark set. It was generated using the Flux Simulator [2], which models biases in RNA-Seq as well as in preparation experiments. Table 1 shows preliminary results with segment- and position-based loss as well as with and without the transcript profiles. Our results indicate that the position-based modeling together with transcript profiles allows us to accurately infer the underlying expression of single transcripts as well as of multiple isoforms of one gene locus.
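An evaluation of this kind compares known expression levels with inferred abundances through a correlation coefficient. The sketch below assumes Pearson correlation, which the text does not specify, and uses made-up numbers that are not results from this study; the function name `pearson` is likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Illustrative values only: simulated "true" expression vs. inferred abundance.
    known = [10.0, 5.0, 0.0, 20.0]
    inferred = [9.0, 6.0, 1.0, 19.0]
    print(pearson(known, inferred))
```

A rank-based measure such as Spearman correlation would be an equally plausible choice when expression levels span several orders of magnitude.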

Table 1 Correlation of underlying expression level and inferred abundances for different approaches

Conclusion

Our preliminary results show that modeling the transcript profiles can significantly improve the accuracy of transcript abundance estimates from RNA-Seq data. However, the described approach and other recent approaches [3, 4] for transcript quantification with RNA-Seq rely on annotated gene structures. As most genome annotations are incomplete, these approaches cannot reveal or quantify novel transcripts, including novel alternative transcripts. Nevertheless, rQuant can be extended to quantify de novo transcripts by combining it with a gene finding system such as mGene [5].

Revealing and quantifying novel alternative transcripts with the powerful tool of RNA-Seq will be a fundamental step towards a deeper understanding of RNA transcript regulation.

References

1. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 2009, 10: 57–63. 10.1038/nrg2484
2. Sammeth M: Flux Simulator. 2009. [http://flux.sammeth.net/simulator.html]
3. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5: 621–628. 10.1038/nmeth.1226
4. Jiang H, Wong WH: Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 2009, 25(8): 1026–1032. 10.1093/bioinformatics/btp113
5. Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, Philips P, De Bona F, Hartmann L, Bohlen A, Krüger N, Sonnenburg S, Rätsch G: mGene: Accurate SVM-based gene finding with an application to nematode genomes. Genome Research 2009. 10.1101/gr.090597.108

Author information

Authors and Affiliations

Friedrich Miescher Laboratory of the Max Planck Society, 72076, Tübingen, Germany
Regina Bohnert, Jonas Behr & Gunnar Rätsch

Corresponding author

Correspondence to Regina Bohnert.

Rights and permissions

Open Access. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Bohnert, R., Behr, J. & Rätsch, G. Transcript quantification with RNA-Seq data. BMC Bioinformatics 10 (Suppl 13), P5 (2009). https://doi.org/10.1186/1471-2105-10-S13-P5
