PhyTB: Phylogenetic tree visualisation and sample positioning for M. tuberculosis
© Benavente et al.; licensee BioMed Central. 2015
Received: 14 January 2015
Accepted: 29 April 2015
Published: 13 May 2015
Phylogenetic-based classification of M. tuberculosis and other bacterial genomes is a core analysis for studying evolutionary hypotheses, disease outbreaks and transmission events. Whole genome sequencing is providing new insights into the genomic variation underlying intra- and inter-strain diversity, thereby assisting with the classification and molecular barcoding of the bacteria. One roadblock to strain investigation is the lack of user-interactive solutions to interrogate and visualise variation within a phylogenetic tree setting.
We have developed a web-based tool called PhyTB (http://pathogenseq.lshtm.ac.uk/phytblive/index.php) to assist phylogenetic tree visualisation and identification of M. tuberculosis clade-informative polymorphisms. Variant Call Format files can be uploaded to determine a sample's position within the tree. A map view summarises the geographical distribution of alleles and strain-types. The utility of PhyTB is demonstrated on sequence data from 1,601 M. tuberculosis isolates.
PhyTB contextualises M. tuberculosis genomic variation within epidemiological, geographical and phylogenetic settings. Further tool utility is possible by incorporating large variants and phenotypic data (e.g. drug-resistance profiles), and an assessment of genotype-phenotype associations. Source code is available to develop similar websites for other organisms (http://sourceforge.net/projects/phylotrack).
Strain-specific genomic diversity in the Mycobacterium tuberculosis complex (MTBC) is an important factor in tuberculosis pathogenesis that may affect virulence, transmissibility, host response and emergence of drug resistance [1,2]. Some modern strains (e.g. Beijing, Euro-American, Haarlem) are believed to exhibit more virulent phenotypes than ancient ones (e.g. East African, Indian, M. africanum). M. tuberculosis is relatively clonal, with little recombination and a low mutation rate. As in other bacterial genomic settings, the construction of phylogenetic trees from sequence data facilitates taxonomic localisation and evolutionary analysis. The growing availability of M. tuberculosis whole genome sequences is leading to the full characterisation of single nucleotide polymorphisms (SNPs) and other nucleotide variation, such as insertions and deletions (indels). A SNP-based barcode has been developed to discriminate strain-types. Trees constructed using genome-wide variation have greater discriminatory power than traditional genotyping approaches such as MIRU-VNTR and spoligotyping. Clades reflecting strain-type variation may be used to investigate disease outbreaks or transmission events, where samples are identified through apparently identical genomic signatures [5,6]. The tree also provides a structure for identifying variants that can be used to investigate clinically important traits such as drug resistance. The primary mechanism for acquiring resistance is the accumulation of point mutations in genes coding for drug targets or drug-converting enzymes (e.g. katG, inhA, rpoB, pncA, embB, rrs, gyrA, gyrB genes), and these mutations may exist in multiple lineages in the tree, reflecting homoplasy events. Some mutations thought to be related to drug resistance are in fact not resistance-related, but are instead strain-informative.
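The contrast between strain-informative variants and variants that recur in several lineages can be illustrated with a small sketch. This is not PhyTB's implementation: the function name `classify_variant`, the labels it returns and the toy sample ids are hypothetical, and the sketch assumes that the clades are non-overlapping and taken at a single level of the tree. A variant found in more than one clade is only a candidate for homoplasy, since it could also be an older variant shared by related clades.

```python
from typing import Dict, Set


def classify_variant(carriers: Set[str], clades: Dict[str, Set[str]]) -> str:
    """Label a variant by how its carriers are spread over clades.

    carriers: ids of samples carrying the alternative allele.
    clades:   clade name -> ids of samples in that clade (assumed disjoint).
    """
    # Clades that contain at least one carrier of the variant.
    hit = [name for name, members in clades.items() if carriers & members]
    if not hit:
        return "unassigned"      # no carrier belongs to a known clade
    if len(hit) > 1:
        return "multi-clade"     # candidate homoplasy (or an older shared variant)
    members = clades[hit[0]]
    # Carried by every member of exactly one clade and by no other sample.
    return "clade-informative" if carriers == members else "within-clade"


# Toy data: five hypothetical samples in two hypothetical clades.
clades = {"cladeA": {"s1", "s2"}, "cladeB": {"s3", "s4", "s5"}}
print(classify_variant({"s1", "s2"}, clades))
print(classify_variant({"s2", "s4"}, clades))
print(classify_variant({"s3"}, clades))
```

In this toy example the first variant is carried by all of `cladeA` and nothing else, the second appears in both clades, and the third is private to one sample within `cladeB`.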
With the increased application of sequencing technologies within clinical and microbiological research settings, it is important that informatic tools are available to identify informative strain-type and drug resistance related variants. Web-browsers for the visualisation of M. tuberculosis genomic variation exist [8-10], but there is limited connectivity with phylogenetic trees and downstream analysis, especially involving strain-types and drug resistance. In addition, there is little provision for uploading new data, such as standard variant call files (VCFs) (www.htslib.org). Here we present the PhyTB tool, which facilitates the phylogenetic exploration of M. tuberculosis isolates, including the display of clade-specific informative and drug resistance markers and their genomic annotation. Using the browser, it is possible to upload multiple standard genomic variant call files (VCF format) to identify the closest relative within the M. tuberculosis complex global phylogeny, thereby potentially assisting their interpretation in a clinical or epidemiological context. Source code is available to facilitate the development of sites for other organisms with genomes that can be represented in a phylogeny.
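The idea of placing an uploaded sample next to its closest relative can be sketched as follows. This is an illustration under stated assumptions, not the method used by PhyTB: it reads only simple biallelic SNPs from the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT), treats every listed record as a called variant in a haploid sample, and picks the reference isolate with the fewest differing SNPs. The names `read_snps` and `closest_relative`, the isolate labels and all positions are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[int, str]  # (genome position, alternative allele)
BASES = {"A", "C", "G", "T"}


def read_snps(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect simple SNPs from VCF text lines, skipping headers and indels."""
    snps: Set[Variant] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank, meta-information or column header line
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        # Keep single-base REF/ALT only; indels and multi-allelic sites are ignored.
        if ref in BASES and alt in BASES:
            snps.add((pos, alt))
    return snps


def closest_relative(query: Set[Variant],
                     panel: Dict[str, Set[Variant]]) -> Tuple[str, int]:
    """Return the panel isolate with the fewest SNP differences from the query."""
    # Distance = number of SNPs present in one sample but not the other.
    # Iterating over sorted names makes ties resolve alphabetically.
    best = min(sorted(panel), key=lambda name: len(query ^ panel[name]))
    return best, len(query ^ panel[best])


# Toy upload: two SNPs and one insertion (the insertion is skipped).
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr\t100\t.\tA\tG\t50\tPASS\t.",
    "chr\t250\t.\tC\tT\t50\tPASS\t.",
    "chr\t300\t.\tG\tGA\t50\tPASS\t.",
]
panel = {
    "iso_A": {(100, "G"), (250, "T"), (400, "C")},
    "iso_B": {(500, "A")},
}
print(closest_relative(read_snps(vcf), panel))
```

A count of differing SNPs is the simplest possible distance; a tree-aware placement would instead compare the sample against the variants that define each branch.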
Results and discussion
The PhyTB web-browser attempts to contextualise TB genomic variation within epidemiological, geographical and phylogenetic settings. To assist with integrating such data for other organisms, we provide the source code, which has been packaged in the PhyloTrack library. In pathogenic bacteria like M. tuberculosis, data integration is crucial to distinguish drug-resistance mutations from phylogenetic markers, to study the transmission of outbreak strains, to detect the source of an infection, to inform patient management and to design appropriate infection control measures (e.g. rapid tests). Further tool utility is possible by extending it to incorporate large variants and phenotypic data (e.g. drug-resistance profiles).
Availability and requirements
This work has been supported by Bloomsbury Research Fund, Medical Research Council UK and Wellcome Trust.
- Reiling N, Homolka S, Walter K, Brandenburg J, Niwinski L, Ernst M, et al. Clade-specific virulence patterns of Mycobacterium tuberculosis complex strains in human primary macrophages and aerogenically infected mice. mBio. 2013; 4(4):00250–13.
- Coll F, McNerney R, Guerra-Assuncao JA, Glynn JR, Perdigao J. A robust SNP barcode for typing Mycobacterium tuberculosis complex strains. Nat Commun. 2014; 5:4812.
- Ford CB, Shah RR, Maeda MK, Gagneux S, Murray MB, Cohen T, et al. Mycobacterium tuberculosis mutation rate estimates from different lineages predict substantial differences in the emergence of drug-resistant tuberculosis. Nat Genet. 2013; 45:784–90.
- Coll F, Mallard K, Preston M, Bentley S, Parkhill J. SpolPred: Rapid and accurate ascertainment of Mycobacterium tuberculosis strain types from short genomic sequences. Bioinformatics. 2012; 28:2991–3.
- Clark TG, Mallard K, Coll F, Preston M. Transmission of multidrug-resistant tuberculosis in treatment experienced patients. PLoS One. 2013; 8(12):83012.
- Guerra-Assunção JA, Houben RM, Crampin AC, Mzembe T, Mallard K, Coll F, Khan P, Banda L, Chiwaya A, Pereira RP, McNerney R, Harris D, Parkhill J, Clark TG, Glynn JR. Recurrence due to relapse or reinfection with Mycobacterium tuberculosis: a whole-genome sequencing approach in a large, population-based cohort with a high HIV infection prevalence and active follow-up. J Infect Dis. 2015; 211(7):1154–63.
- Sandgren A, Strong M, Muthukrishnan P, Weiner BK, Church GM. Tuberculosis drug resistance mutation database. PLoS Med. 2009; 6(2):2.
- Chernyaeva EN, Shulgina MV, Rotkevich MS, Dobrynin PV. Genome-wide Mycobacterium tuberculosis variation (GMTV) database: A new tool for integrating sequence variations and epidemiology. BMC Genomics. 2014; 15:308.
- Coll F, Preston MD, Guerra-Assuncao JA, Glynn JR, Perdigao J. PolyTB: A genomic variation map for Mycobacterium tuberculosis. Tuberculosis. 2014; 94(3):346–54.
- Wattam AR, Abraham D, Dalay O, Disz TL. PATRIC, the bacterial bioinformatics database and analysis resource. Nucl Acids Res. 2014; 42(D1):581–91.
- Bostock M. D3.js - data driven documents. http://d3js.org/ (last modified June 21, 2014).
- Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH. JBrowse: a next-generation genome browser. Genome Res. 2009; 19(9):1630–8.
- Li H. Tabix: fast retrieval of sequence features from generic tab-delimited files. Bioinformatics (Oxford, England). 2011; 27(5):718–9.
- Coll F, McNerney R, Preston MD, Guerra-Assuncao JA, Warry A, Hill-Cawthorne G, et al. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences. Genome Med. 2015.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.