Tetranucleotide usage in mycobacteriophage genomes: alignment-free methods to cluster phage and infer evolutionary relationships

Siranosian, Benjamin; Herold, Emma; Williams, Edward; Ye, Chen; de Graffenried, Christopher

doi:10.1186/1471-2105-16-S2-A7

Volume 16 Supplement 2

Highlights from the Tenth International Society for Computational Biology (ISCB) Student Council Symposium 2014

Meeting abstract
Open access
Published: 28 January 2015

Benjamin Siranosian¹,², Emma Herold², Edward Williams², Chen Ye² & Christopher de Graffenried³

BMC Bioinformatics volume 16, Article number: A7 (2015)


Background

The genomic sequences of phages isolated on mycobacterial hosts are diverse and mosaic, and they often share little nucleotide similarity. However, about 30 distinct types have been identified, which allows most phages to be grouped into clusters and further into subclusters [1]. Many tools for analyzing mycobacteriophage genomes depend on sequence alignment or on knowledge of gene content. These methods can be computationally expensive, may require substantial manual input (for example, gene annotation), and can be ineffective for highly diverged sequences [2]. We evaluated tetranucleotide usage in mycobacteriophages as an alternative to alignment-based methods for genome analysis.

Description

We computed tetranucleotide usage deviation: for each 4-mer, the ratio of its observed count in a genome to its expected count under a null model [3]. Tetranucleotide usage deviation profiles are similar among members of the same phage subcluster and distinct between subclusters. Neighbor-joining phylogenetic trees were constructed from pairwise Euclidean distances between the profiles of all genomes in the mycobacteriophage database. In almost every case, each phage was placed in a monophyletic clade with the other members of its subcluster. With few exceptions, trees computed from tetranucleotide usage deviation accurately reconstruct trees based on gene content for a subset of the mycobacteriophage population (Figure 1).

We also evaluated whether clusters can be assigned to unclassified phages based on tetranucleotide usage deviation. With a simple nearest-neighbor classifier, the correct cluster was recovered for more than 98% of phages.

In addition, we looked for evidence of horizontal gene transfer using the tetranucleotide difference index, a measure of how far tetranucleotide usage in a sliding window departs from the genomic mean [3]. Tetranucleotide difference index plots showed a strong spike at the end of the genomes of cluster L mycobacteriophages, which could indicate horizontal gene transfer in that region.
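
The abstract does not spell out the computations, so the following Python sketch illustrates the workflow under stated assumptions; it is not the authors' code (that is available at https://github.com/bsiranosian/tango_final). It assumes a zero-order null model (expected count = number of 4-mers times the product of the four base frequencies), counts 4-mers on the given strand only, evaluates the nearest-neighbor classifier by leave-one-out, and replaces the published tetranucleotide difference index with a plain Euclidean distance between each window's profile and the whole-genome profile. All function names and the window/step defaults are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"
# All 256 tetranucleotides in a fixed order (AAAA, AAAC, ..., TTTT).
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]


def usage_deviation(seq):
    """Observed/expected ratio for each 4-mer, in TETRAMERS order.

    ASSUMPTION: expected counts come from a zero-order null model
    (product of base frequencies times the number of 4-mers counted).
    The abstract does not state which null model was used.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip 4-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    n_bases = sum(seq.count(b) for b in BASES)
    if total == 0 or n_bases == 0:
        return [0.0] * len(TETRAMERS)
    freq = {b: seq.count(b) / n_bases for b in BASES}
    profile = []
    for kmer in TETRAMERS:
        expected = total * math.prod(freq[b] for b in kmer)
        profile.append(counts[kmer] / expected if expected > 0 else 0.0)
    return profile


def nearest_cluster(query, reference):
    """Cluster label of the reference profile closest to `query`.

    `reference` maps a phage name to a (cluster_label, profile) pair;
    closeness is Euclidean distance, as in the abstract.
    """
    label, _ = min(reference.values(), key=lambda item: math.dist(query, item[1]))
    return label


def leave_one_out_accuracy(genomes):
    """Fraction of genomes whose cluster is recovered by nearest neighbor.

    `genomes` maps a phage name to a (cluster_label, sequence) pair.
    HYPOTHETICAL evaluation scheme: each genome is held out in turn.
    """
    profiles = {name: (label, usage_deviation(seq))
                for name, (label, seq) in genomes.items()}
    correct = 0
    for name, (label, profile) in profiles.items():
        others = {k: v for k, v in profiles.items() if k != name}
        if nearest_cluster(profile, others) == label:
            correct += 1
    return correct / len(profiles)


def window_difference(seq, window=5000, step=1000):
    """(start, distance) of each window's profile from the genome profile.

    HYPOTHETICAL stand-in for the tetranucleotide difference index [3]:
    a plain Euclidean distance, not the published statistic.
    """
    genome_profile = usage_deviation(seq)
    return [(start, math.dist(usage_deviation(seq[start:start + window]), genome_profile))
            for start in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    # Toy genomes: two "clusters" built from different repeated motifs.
    toy = {
        "x1": ("X", "ACGT" * 50),
        "x2": ("X", "ACGT" * 60),
        "y1": ("Y", "AATTG" * 40),
        "y2": ("Y", "AATTG" * 45),
    }
    print(leave_one_out_accuracy(toy))  # prints 1.0
    for start, score in window_difference("ACGT" * 100 + "AATTG" * 100, window=200, step=100):
        print(start, round(score, 2))
```

Tree building is left out of the sketch; the pairwise `math.dist` values could be passed to an existing neighbor-joining implementation, such as the one in Biopython's `Bio.Phylo.TreeConstruction` module. Swapping in a higher-order Markov null model would only change how `expected` is computed.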

Conclusions

Genome analysis based on tetranucleotide usage shows promise for evaluating host-parasite coevolution and gene exchange within the mycobacteriophage population. These methods are computationally inexpensive and independent of gene annotation, making them strong candidates for further research on clustering phages and inferring evolutionary relationships. The code and data used in this project are freely available at https://github.com/bsiranosian/tango_final.

References

[1] Hatfull GF: Mycobacteriophages: Windows into Tuberculosis. PLoS Pathog. 2014, 10: e1003953. 10.1371/journal.ppat.1003953.
[2] Vinga S, Almeida J: Alignment-free sequence comparison-a review. Bioinformatics. 2003, 19: 513-523. 10.1093/bioinformatics/btg005.
[3] Pride DT, Wassenaar TM, Ghose C, Blaser MJ: Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses. BMC Genomics. 2006, 7: 8. 10.1186/1471-2164-7-8.
[4] Hatfull GF, Jacobs-Sera D, Lawrence JG, Pope WH, Russell DA, Ko C-C, Weber RJ, Patel MC, Germane KL, Edgar RH, Hoyte NN, Bowman CA, Tantoco AT, Paladin EC, Myers MS, Smith AL, Grace MS, Pham TT, O'Brien MB, Vogelsberger AM, Hryckowian AJ, Wynalek JL, Donis-Keller H, Bogel MW, Peebles CL, Cresawn SG, Hendrix RW: Comparative Genomic Analysis of 60 Mycobacteriophage Genomes: Genome Clustering, Gene Acquisition, and Gene Size. Journal of Molecular Biology. 2010, 397: 119-143. 10.1016/j.jmb.2010.01.011.

Author information

Authors and Affiliations

Center for Computational Molecular Biology, Brown University, Providence, RI, USA
Benjamin Siranosian
Division of Biology and Medicine, Brown University, Providence, RI, USA
Benjamin Siranosian, Emma Herold, Edward Williams & Chen Ye
Department of Molecular Microbiology and Immunology, Brown University, Providence, RI, USA
Christopher de Graffenried

Corresponding author

Correspondence to Benjamin Siranosian.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Siranosian, B., Herold, E., Williams, E. et al. Tetranucleotide usage in mycobacteriophage genomes: alignment-free methods to cluster phage and infer evolutionary relationships. BMC Bioinformatics 16 (Suppl 2), A7 (2015). https://doi.org/10.1186/1471-2105-16-S2-A7

Published: 28 January 2015
DOI: https://doi.org/10.1186/1471-2105-16-S2-A7
