Taxonomic colouring of phylogenetic trees of protein sequences
© Palidwor et al; licensee BioMed Central Ltd. 2006
Received: 10 November 2005
Accepted: 17 February 2006
Published: 17 February 2006
Phylogenetic analyses of protein families are used to define the evolutionary relationships between homologous proteins. The interpretation of protein-sequence phylogenetic trees requires the examination of the taxonomic properties of the species associated to those sequences. However, there is no online tool to facilitate this interpretation, for example, by automatically attaching taxonomic information to the nodes of a tree, or by interactively colouring the branches of a tree according to any combination of taxonomic divisions. This is especially problematic if the tree contains on the order of hundreds of sequences, which, given the accelerated increase in the size of the protein sequence databases, is a situation that is becoming common.
We have developed PhyloView, a web based tool for colouring phylogenetic trees upon arbitrary taxonomic properties of the species represented in a protein sequence phylogenetic tree. Provided that the tree contains SwissProt, SpTrembl, or GenBank protein identifiers, the tool retrieves the taxonomic information from the corresponding database. A colour picker displays a summary of the findings and allows the user to associate colours to the leaves of the tree according to any number of taxonomic partitions. Then, the colours are propagated to the branches of the tree.
PhyloView can be used at http://www.ogic.ca/projects/phyloview/. A tutorial, the software with documentation, and GPL licensed source code, can be accessed at the same web address.
Phylogenetic trees based upon multiple sequence alignments of proteins from many species are commonly used to determine the evolutionary relationships between homologous sequences, which can give insights into the evolution of a protein family and the functional specificity of the members of the family .
It is expected that the phylogenetic tree reflects the events of duplication and speciation of proteins. Through speciation, related proteins in different organisms are generated that reflect the taxonomic relations of those organisms. However, cases of phylogenetic associations being at odds with known taxonomy can be interesting anomalies worthy of investigation, perhaps indicating problems in the generation of the multiple sequence alignment or events of lateral gene transfer . Also, events of gene duplication (that may define subfamilies with subtle variations of the common functional theme of the family) can be observed by the repetition of a taxonomic structure at multiple places in the tree.
Unfortunately, available software for rendering phylogenetic trees does not provide a simple means of automatically retrieving taxonomic information for the sequences represented in the tree, or of graphically representing arbitrary taxonomic properties of trees, thus allowing the study of the relation of a phylogenetic tree with the taxonomic relations between the species represented in the tree. PhyloView was developed in an attempt to address this limitation and provide a simple and generic means of doing this.
Results and discussion
Upon initial loading the script provides a form where the user may upload a phylogenetic tree in Newick (New Hampshire) format, used by most phylogenetic packages (e.g., ). The tree should contain SwissProt, SpTrembl , or GenBank GI protein identifiers  in the leaf node names. The associated records are then dynamically retrieved from the appropriate database online over the internet and the associated taxonomic information is extracted. Processing times for the initial upload of a tree vary with the number of sequences in the tree, the load on the server and the response speed of the public databases queried. For example, a tree with 600 sequences (which in our experience is pretty large for a phylogenetic tree) takes about one minute to load. Subsequent processing times for the same tree tend to be much shorter as the taxonomic information associated with the protein sequence identifiers is cached and no further queries of public databases are required.
To allow users to show their own identifiers in the tree, we use the following internal format for names: "DBID*YOURID" where DBID is the database identifier used by PhyloView to extract the taxonomic information, YOURID is the identifier to be displayed in the tree, and * is a user defined separator (the default symbol is "/"). Optionally, the DBID can be removed from the rendered tree leaving the user identifiers.
Once the colours have been chosen, resubmitting the form will render a new tree where the various nodes and branches are coloured based upon the above choice. The taxonomic colouring algorithm is such that every branch of the tree receives the colour assigned to the taxonomic group with most members under that branch. In case of a tie between assignments, the more specific one is given precedence (for example, Viridiplantae over Eukarya). Colouring of a given branch only happens if more than 50% of the sequences under that branch belong to a single taxonomic group with an assigned colour.
Mouse-over of the phylogenetic tree leaf nodes in SVG mode creates floating tool-tip type output with full taxonomic information for the sequence. The preferred form of output for the tree is an SVG image. SVG is an XML based standard for vector graphics. Though not natively supported by most browsers, a number of plug-ins is freely available, for example the Adobe SVG viewer .
We plan to extend PhyloView as a visualization framework for enhancing sequence phylogenetic tree images with associated data. We welcome feedback and proposals for additional features from users.
PhyloView is the first web server dedicated to colouring according to taxonomy of phylogenetic trees. There is other software that may be used to attain similar results but with considerably more effort. For example, Mesquite  (an open source modular software system for evolutionary analysis written in Java) and MacClade  (a commercial computer program for phylogenetic analysis that runs only on MacOS), allow the manual colouring of the branches of a phylogenetic tree, but these are complicated general purpose programs and achieving this is a laborious and complicated process. PhyloView is intended to streamline and simplify this, allowing the user to rapidly explore different combinations of colours and taxonomic partitions for the best visual result.
Availability and requirements
PhyloView requires Internet Explorer with the Adobe SVG viewer plug-in and can be used at . Source code is available from that location as well.
This work was supported by projects funded by the Canadian Foundation for Innovation, and the Ontario Research and Development Challenge Funds. MA is recipient of a Canada Research Chair of Bioinformatics. We thank the members of the Bioinformatics group of the Ontario Genomics Innovation Centre for helpful discussions.
- Zuckerkandl E, Pauling L: Molecules as documents of evolutionary history. J Theor Biol 1965, 8: 357–366.View ArticlePubMedGoogle Scholar
- Bapteste E, Susko E, Leigh J, Macleod D, Charlebois RL, Doolittle WF: Do orthologous gene phylogenies really support tree-thinking? BMC Evol Biol 2005, 5: 33. 10.1186/1471-2148-5-33PubMed CentralView ArticlePubMedGoogle Scholar
- Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JGR, Korf I, Lapp H L, ehvaslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka ED, Wilkinson M, E. B: The Bioperl Toolkit: Perl modules for the life sciences. Genome Research 2002, 12: 1611–1618. 10.1101/gr.361602PubMed CentralView ArticlePubMedGoogle Scholar
- GNU General Public License[http://www.gnu.org/copyleft/gpl.html]
- Choi JH, Jung HY, Kim HS, Cho HG: PhyloDraw: a phylogenetic tree drawing system. Bioinformatics 2000, 16: 1056–1058. 10.1093/bioinformatics/16.11.1056View ArticlePubMedGoogle Scholar
- Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: The Universal Protein Resource (UniProt). Nucleic Acids Res 2005, 33: D154–9. 10.1093/nar/gki070PubMed CentralView ArticlePubMedGoogle Scholar
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W, Kenton DL, Khovayko O, Lipman DJ, Madden TL, Maglott DR, Ostell J, Pontius JU, Pruitt KD, Schuler GD, Schriml LM, Sequeira E, Sherry ST, Sirotkin K, Starchenko G, Suzek TO, Tatusov R, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2005, 33: D39–45. 10.1093/nar/gki062PubMed CentralView ArticlePubMedGoogle Scholar
- Adobe SVG Viewer[http://www.adobe.com/svg/viewer/install/main.html]
- Maddison WP, Maddison DR: Interactive analysis of phylogeny and character evolution using the computer program MacClade. Folia Primatol (Basel) 1989, 53: 190–202.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.