

  • Poster presentation
  • Open Access

Phylogenomic inference of functional divergence

BMC Bioinformatics 2009, 10(Suppl 13):P4



  • Functional Divergence
  • Molecular Chaperone
  • Bacterial Genome
  • Tree Node
  • Intracellular Pathogen


The divergence of protein function following gene duplication – or the colonization of new ecological niches – is of central importance in the evolution of novelty. Changes in protein structure and function are reflected at the level of amino acid sequence. This principle suggests that lineage-specific functional divergence in proteins can be identified by the analysis of primary sequence data. However, many amino acid substitutions have a negligible effect on protein function. This means that a simple comparison of the sequence differences between two clusters of proteins will not reveal the subset of changes responsible for functional divergence. While several methods to identify these biologically important substitutions exist [1], they are not optimized for analyses of large numbers of protein sequences. Here, we present a fast new method for identifying these substitutions across a large phylogenetic tree.

Materials and methods

Our method requires a bifurcating phylogenetic tree and a protein sequence alignment. Each internal node of the tree is defined by its two descendant clades and one or more outgroup sequences. Using BLOSUM [2] scores to quantify how radical or conservative the substitutions in each clade are relative to the outgroup, we assign a score to each alignment column at each node and then test that score for significance [3]. Here, we apply our method to a tree of GroEL genes from 622 bacterial genomes.
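The per-column scoring and significance test described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The score used here is the difference between the two clades' mean BLOSUM62 similarity to the outgroup, and significance is assessed with a permutation test that shuffles residues between the two clades. Both are stand-ins chosen for illustration, because the exact statistic and test of [3] are not given in this abstract. The matrix is a six-residue excerpt of BLOSUM62, and all function names are hypothetical.

```python
import random
from itertools import product

# Excerpt of BLOSUM62 restricted to six residues (I, L, V, D, E, K) to keep the
# sketch short; a real analysis would load the full 20x20 matrix, e.g. with
# Biopython's Bio.Align.substitution_matrices.load("BLOSUM62").
_UPPER = {
    "I": {"I": 4, "L": 2, "V": 3, "D": -3, "E": -3, "K": -3},
    "L": {"L": 4, "V": 1, "D": -4, "E": -3, "K": -2},
    "V": {"V": 4, "D": -3, "E": -2, "K": -2},
    "D": {"D": 6, "E": 2, "K": -1},
    "E": {"E": 5, "K": 1},
    "K": {"K": 5},
}
BLOSUM62 = {}
for a, row in _UPPER.items():
    for b, s in row.items():
        BLOSUM62[(a, b)] = s
        BLOSUM62[(b, a)] = s  # the matrix is symmetric


def mean_similarity(clade, outgroup):
    """Mean BLOSUM62 score over all clade-vs-outgroup residue pairs in one column."""
    pairs = list(product(clade, outgroup))
    return sum(BLOSUM62[p] for p in pairs) / len(pairs)


def column_score(clade_a, clade_b, outgroup):
    """Hypothetical divergence score: positive when clade A's residues are more
    radical (less BLOSUM-similar to the outgroup) than clade B's."""
    return mean_similarity(clade_b, outgroup) - mean_similarity(clade_a, outgroup)


def permutation_p_value(clade_a, clade_b, outgroup, n_perm=1000, seed=0):
    """One-sided p-value: how often randomly reassigning the ingroup residues to
    two clades of the original sizes scores at least as high as observed."""
    rng = random.Random(seed)
    observed = column_score(clade_a, clade_b, outgroup)
    pooled = list(clade_a) + list(clade_b)
    k = len(clade_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if column_score(pooled[:k], pooled[k:], outgroup) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction so p is never 0


if __name__ == "__main__":
    # Toy column: clade A carries charged residues where clade B and the
    # outgroup keep hydrophobic ones.
    print(column_score("DEDK", "ILVI", "IIL"))  # 6.0
    print(permutation_p_value("DEDK", "ILVI", "IIL"))
```

In the toy column, clade A has replaced hydrophobic residues with charged ones while clade B has conserved them. The score is therefore strongly positive and the permutation p-value is small. Gaps, ambiguous residues and sequence weighting are ignored in this sketch.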

Results

GroEL is an important molecular chaperone that helps at least 250 client proteins fold in Escherichia coli [4]. Interestingly, we found that four of the five bacterial lineages most enriched for functional divergence are intracellular pathogens (see Figure 1). Radical change in GroEL has previously been implicated in the adaptation of endosymbiotic bacteria to intracellular life [5]; these results suggest that this may be a more general response to the population-genetic conditions of an intracellular lifestyle.
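One way to turn per-column results into a ranking of lineages by enrichment for functional divergence is sketched below. This is an assumption for illustration, not the procedure used in this study. Each node's column p-values are first corrected with the Benjamini–Hochberg false discovery rate procedure (a swapped-in choice), and nodes are then ranked by the fraction of columns that remain significant. The node names and p-values are toy placeholders, not results, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True if rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank r whose p-value is <= r/m * alpha,
    # then reject every test ranked at or below r.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


def rank_nodes(node_p_values, alpha=0.05, top=5):
    """Rank tree nodes (hypothetical labels) by the fraction of alignment columns
    significant after FDR correction; return the `top` (node, fraction) pairs."""
    enrichment = []
    for node, p_values in node_p_values.items():
        n_sig = sum(benjamini_hochberg(p_values, alpha))
        enrichment.append((node, n_sig / len(p_values)))
    enrichment.sort(key=lambda pair: pair[1], reverse=True)
    return enrichment[:top]


if __name__ == "__main__":
    toy = {
        "node_x": [0.001, 0.002, 0.5, 0.9],
        "node_y": [0.001, 0.6, 0.7, 0.8],
        "node_z": [0.3, 0.4, 0.5, 0.6],
    }
    print(rank_nodes(toy))  # [('node_x', 0.5), ('node_y', 0.25), ('node_z', 0.0)]
```

Ranking by fraction rather than by raw count keeps nodes comparable when different numbers of columns are tested at each node. The choice of correction and of enrichment measure can change which lineages rank highest.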
Figure 1

Bacterial lineages enriched for functional divergence in GroEL. The thermosome-related sequences are found in certain extremophilic bacteria, perhaps as a result of horizontal gene transfer from archaea. The other highlighted lineages are intracellular pathogens, with the exception of Chloroflexi. The tree was produced by RAxML [6].

Authors’ Affiliations

Smurfit Institute of Genetics, Trinity College, Dublin 2, Ireland
Department of Molecular Evolution, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden


  1. Gu X, Velden K: DIVERGE: Phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 2002, 18: 500–501. doi:10.1093/bioinformatics/18.3.500
  2. Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 1992, 89: 10915–10919. doi:10.1073/pnas.89.22.10915
  3. Toft C, Williams TA, Fares MA: Genome-wide functional divergence after the symbiosis of Proteobacteria with Insects unraveled through a novel computational approach. PLoS Comput Biol 2009, 5: e1000344. doi:10.1371/journal.pcbi.1000344
  4. Kerner MJ, Naylor DJ, Ishihama Y, Maier T, Chang H-C, Stines AP, Georgopoulos C, Frishman D, Hayer-Hartl M, Mann M, Hartl FU: Proteome-wide analysis of chaperonin-dependent protein folding in Escherichia coli. Cell 2005, 122: 209–220. doi:10.1016/j.cell.2005.05.028
  5. Fares MA, Moya A, Barrio E: GroEL and the maintenance of bacterial endosymbiosis. Trends Genet 2004, 20: 413–416. doi:10.1016/j.tig.2004.07.001
  6. Stamatakis A: RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 2006, 22: 2688–2690. doi:10.1093/bioinformatics/btl446


© Williams et al; licensee BioMed Central Ltd. 2009

This article is published under license to BioMed Central Ltd.