Phylogenomic inference of functional divergence
© Williams et al; licensee BioMed Central Ltd. 2009
Published: 19 October 2009
The divergence of protein function following gene duplication – or the colonization of new ecological niches – is of central importance in the evolution of novelty. Changes in protein structure and function are reflected at the level of amino acid sequence. This principle suggests that lineage-specific functional divergence in proteins can be identified by the analysis of primary sequence data. However, many amino acid substitutions have a negligible effect on protein function. This means that a simple comparison of the sequence differences between two clusters of proteins will not reveal the subset of changes responsible for functional divergence. While several methods to identify these biologically important substitutions exist, they are not optimized for analyses of large numbers of protein sequences. Here, we present a fast new method for identifying these substitutions across a large phylogenetic tree.
Materials and methods
Our method requires a bifurcating phylogenetic tree and a protein sequence alignment. Each node on the tree is defined by two downstream clades and one or more outgroup sequences. Using BLOSUM scores to quantify how radical or conservative substitutions in each clade are relative to the outgroup, we assign a score to each column of the alignment at each tree node, which is then tested for significance. Here, we apply our method to a tree of the GroEL genes from 622 bacterial genomes.
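The per-column scoring idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact scoring formula or the significance test, so the averaging over clade-outgroup residue pairs, the gap handling, and the function names `clade_vs_outgroup` and `score_column` are all assumptions made for illustration. Only a small excerpt of the BLOSUM62 matrix is hard-coded to keep the example self-contained.

```python
from itertools import product

# Excerpt of BLOSUM62 (Henikoff & Henikoff 1992), restricted to four residues
# so the sketch stays self-contained. A real analysis would use the full matrix.
_BLOSUM62_EXCERPT = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6, ("E", "E"): 5,
    ("L", "I"): 2, ("D", "E"): 2,
    ("L", "D"): -4, ("L", "E"): -3, ("I", "D"): -3, ("I", "E"): -3,
}


def blosum(a, b):
    """Look up a symmetric substitution score for residues a and b."""
    key = (a, b) if (a, b) in _BLOSUM62_EXCERPT else (b, a)
    return _BLOSUM62_EXCERPT[key]


def clade_vs_outgroup(clade_column, outgroup_column):
    """Mean BLOSUM score over all clade/outgroup residue pairs in one column.

    Assumption: a plain average over pairs, with gap characters ('-') skipped.
    High values mean the clade is conservative relative to the outgroup;
    low values mean its substitutions are radical.
    """
    pairs = [
        (c, o)
        for c, o in product(clade_column, outgroup_column)
        if c != "-" and o != "-"
    ]
    if not pairs:
        return None  # column is uninformative at this node
    return sum(blosum(c, o) for c, o in pairs) / len(pairs)


def score_column(clade_a, clade_b, outgroup):
    """Score one alignment column at one tree node (hypothetical layout)."""
    return {
        "clade_a": clade_vs_outgroup(clade_a, outgroup),
        "clade_b": clade_vs_outgroup(clade_b, outgroup),
    }


# One column at one node: clade A keeps hydrophobic residues like the
# outgroup, while clade B has switched to aspartate.
scores = score_column(clade_a="LI", clade_b="DD", outgroup="LL")
print(scores)
```

In this toy column, clade A scores positively against the outgroup and clade B scores negatively, which is the kind of contrast that would mark the column as a candidate for divergence in clade B. Whether such a contrast is meaningful would still have to be established by a significance test, which this sketch does not attempt.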
- Gu X, Velden K: DIVERGE: Phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 2002, 18: 500–501. doi:10.1093/bioinformatics/18.3.500
- Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 1992, 89: 10915–10919. doi:10.1073/pnas.89.22.10915
- Toft C, Williams TA, Fares MA: Genome-wide functional divergence after the symbiosis of Proteobacteria with Insects unraveled through a novel computational approach. PLoS Comput Biol 2009, 5: e1000344. doi:10.1371/journal.pcbi.1000344
- Kerner MJ, Naylor DJ, Ishihama Y, Maier T, Chang H-C, Stines AP, Georgopoulos C, Frishman D, Hayer-Hartl M, Mann M, Hartl FU: Proteome-wide analysis of chaperonin-dependent protein folding in Escherichia coli. Cell 2005, 122: 209–220. doi:10.1016/j.cell.2005.05.028
- Fares MA, Moya A, Barrio E: GroEL and the maintenance of bacterial endosymbiosis. Trends Genet 2004, 20: 413–416. doi:10.1016/j.tig.2004.07.001
- Stamatakis A: RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 2006, 22: 2688–2690. doi:10.1093/bioinformatics/btl446
This article is published under license to BioMed Central Ltd.