Local comparison of protein structures highlights cases of convergent evolution in analogous functional sites
© Ausiello et al; licensee BioMed Central Ltd. 2007
Published: 8 March 2007
We performed an exhaustive search for local structural similarities in an ensemble of non-redundant protein functional sites. With the purpose of finding new examples of convergent evolution, we selected only those matching sites composed of structural regions whose residue order is inverted in the relative protein sequences.
A novel case of local analogy was detected between members of the ABC transporter and of the HprK/P families in their ATP binding site. This case cannot be derived by events of circular permutation since the residues of one of the region pairs are located in reverse order in the sequence of the two protein families. One of the analogous binding sites, the one identified in HprK/P, is known to also bind pyrophosphate, which is used as preferred energy source in its kinase and phosphorylase activity.
The discovery of this striking molecular similarity, which is also associated with a functional similarity, may help in suggesting new experiments aimed at a deeper understanding of members of the ABC transporter family known to be involved in many serious human diseases.
The global comparison of protein sequences or structures is one of the most widely used computational tools in the analysis of newly discovered proteins. These methods can highlight evolutionary relationships. However, because the functional sites may have undergone divergent evolution, they do not always allow the "guilt by association" inference of a protein function.
The vast majority of known functions (enzymatic activities, binding sites etc.) are encoded by a relatively small set of residues located in a conserved geometry both in the protein sequence and in the protein structure. 3D motifs [2, 3] can thus be used, in different forms, for analyzing and inferring molecular functions when the global similarity is not conserved (for a review see ).
Local similarities in the context of a global non-similarity are generated by phenomena of divergent or convergent evolution. The latter are uncommon and few are described in the literature; well-known examples are the Ser-His-Asp (SHD) catalytic triad in serine proteases [3, 6] and the region surrounding the P-loop in many nucleotide-binding proteins [7, 8].
Methods for the identification of local structural similarities alone are not sufficient to spot cases of convergent evolution. Here we applied a new approach that consists of an exhaustive search for local structural similarities between known protein structures, followed by a selection of structural similarities coming from different regions and located in a different order in the sequence of the protein families sharing the site.
We analyzed the results of an all-versus-all local comparison in an ensemble of protein functional sites, searching for 3D matches characterized by sequence inversion events. Non-collinear matches which also have a strong statistical significance were manually analyzed and a few cases of convergently evolved sites were identified and are discussed below.
Structural comparison experiment
Starting from a non-redundant structural dataset of about 2000 protein chains, we identified 10175 surface cavities. About 2500 of those cavities were defined as functional since they contain a substantial fraction of residues associated with a PROSITE pattern or a known ligand binding site (see Methods). We performed an all-versus-all structural comparison of the functional clefts against the whole dataset of about 10,000 clefts in search of significant structural similarities.
Table 1. Number of structural matches found between functionally annotated cavities and the whole ensemble of surface cavities.
Selection of non collinear and significant matches
We selected only those matches whose matching residues are non-collinear in the corresponding protein sequences. To do so, the list of matches was searched for non-collinearity between the paired residues (see Methods). Table 1 reports the number of collinear and non-collinear matches for each match length.
We calculated a significance value for each one of these matches in the form of a Z-score. Only matches longer than 7 residues and with a Z-score higher than 10 were analyzed. This threshold is highly stringent, as can be deduced from tests performed in other large-scale structural comparison experiments. This stringency in statistical significance strongly supports the hypothesis that only real structural similarities have been considered.
Analysis of non collinear matches
A total of 32 non-collinear structural matches were selected within these stringent thresholds and manually analyzed (see Table 1). Of these, 28 matches were identified as common cases of non-collinearity deriving from circular permutations of protein sequences.
A new case of sequence inversion
The fourth case (Z-score 12.14), see Figure 1d, involves one member of the ABC transporter family and a bacterial Hpr kinase/phosphorylase protein (PDB codes: 1b0u and 1kkl, respectively). This 3D similarity belongs to the category of inverted structural matches (see Methods) and cannot have arisen from simple events of sequence rearrangement; it therefore identifies a true and almost unique case of convergent evolution.
The ATP binding subunit of a bacterial histidine permease belongs to the ABC transporter family, whose members are widespread across different phyla, from bacteria to humans. Some of the ABC transporters are known to be involved in several human disorders, such as cystic fibrosis, muscular dystrophy, adrenoleukodystrophy, Stargardt disease and others. HprK/P, on the other hand, is a bacterial sensor enzyme that plays a major role in the regulation of carbon metabolism and sugar transport, controlling the expression of numerous catabolic genes; it catalyzes the ATP- as well as the pyrophosphate-dependent phosphorylation of Ser-46 in HPr, a phosphocarrier protein of the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS).
Table 2. Local multiple alignment of the two families.
Residues in inverted regions
Residues in P-loop
ATP-Binding Subunit Of The Histidine Permease
Hypothetical ABC Transporter ATP-Binding Protein Mj0796
Escherichia coli K12
ATP-Bound E. coli MalK, Maltose/Maltodextrin Transport
Maltose Transport Protein MalK
DNA Mismatch Repair Protein MutS
DNA Mismatch Repair Protein MutS
Cdc6p, Cell Division Control Protein 6
ABC Type 2
Structural Maintenance Of Chromosome 1, Head Domain Residues 1–214, 1024–1225
Peptide Transporter Tap1, C-Terminal ABC ATPase Domain
Cystic Fibrosis Transmembrane Conductance Regulator, Nucleotide Binding Domain One
Cystic Fibrosis Transmembrane Conductance Regulator, NDB1 Domain (Residues 389–673)
HprK/P Bound To Phosphate, HprK Protein
HprK/P In Complex With B. subtilis HPr, Phosphocarrier Protein HPr
HprK/P In Complex With B. subtilis P-Ser-HPr
In both proteins, the residues identified belong to a well-studied functional site, described in detail below. The co-crystallized molecules (an ATP and a pyrophosphate) superpose neatly at the corresponding phosphate atoms, which supports a functional meaning of the 3D similarity. The rarity of such an inverted arrangement may explain why this striking similarity has not been highlighted so far.
A multiple alignment of members of the two protein families derived from the corresponding superposition of the functional sites is shown in Table 2.
An exhaustive search was performed for significant 3D similarities between protein functional sites containing residues that are non-collinear in the respective protein sequences. From a non-redundant set of protein structures, an ensemble of about 10,000 cavities was defined, one fourth of which could be associated with a known molecular function using PROSITE patterns and/or bound ligands.
The local comparison produced more than 60,000 3D matches. In this list, a relatively low number of matches appeared to involve non-collinear residues. All matches were evaluated for their statistical significance using the Z-score. Cases with Z-score > 10 were carefully analyzed and manually inspected with molecular graphics.
Four interesting cases were identified, three of which were already known in the literature as cases involving a permutation in one of the protein families comprised in the structural match.
The fourth case involved a member of the ABC transporter family and a bacterial HprK/P (Figures 2 and 3). Three regions in the two protein sequences are involved in the structural match: their location in the respective sequences is 1-2-3 versus 3-1-2. This is compatible with a circular permutation in one of the protein families. But in one of the regions (the one identified by #3 in the preceding sentence) the residues are located in inverse order in the two sequences, therefore suggesting that a case of convergent evolution has occurred.
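The distinction drawn above, between a region order that is a circular permutation (1-2-3 versus 3-1-2) and a region whose residues run in the opposite direction, can be illustrated with a minimal Python sketch. The function names and the example positions are hypothetical and are not taken from the original analysis.

```python
def is_circular_permutation(order_a, order_b):
    """True if order_b is a rotation of order_a (e.g. 1-2-3 versus 3-1-2)."""
    order_a, order_b = list(order_a), list(order_b)
    if len(order_a) != len(order_b):
        return False
    if not order_a:
        return True
    doubled = order_a + order_a
    n = len(order_a)
    return any(doubled[i:i + n] == order_b for i in range(n))


def region_direction(mapped_positions):
    """Direction of one region in the second sequence.

    mapped_positions lists the sequence positions in the second protein of
    the residues of a region, taken in the order in which their partners
    occur in the first protein.
    Returns +1 (same direction), -1 (inverse order) or 0 (mixed or too short).
    """
    steps = list(zip(mapped_positions, mapped_positions[1:]))
    if not steps:
        return 0
    if all(a < b for a, b in steps):
        return 1
    if all(a > b for a, b in steps):
        return -1
    return 0


# Region order 1-2-3 versus 3-1-2 is compatible with a circular permutation.
print(is_circular_permutation([1, 2, 3], [3, 1, 2]))
# A region whose residues map to decreasing positions is in inverse order.
print(region_direction([42, 41, 40]))
```

A rotation of the region order alone can be explained by a circular permutation; a region with direction -1 cannot, which is the property used in the text to argue for convergent evolution.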
Interestingly, a structural core composed of four beta strands and one helix is conserved in the two structures (Figure 4), but two of the beta strands are oriented in opposite directions. This finding can be relevant both for a better understanding of structure-function relationships and for its medical significance, since members of the ABC transporter family are involved in several human diseases, such as cystic fibrosis, muscular dystrophy, adrenoleukodystrophy, Stargardt disease and others. This analogy might help in suggesting experimental strategies to devise new classes of inhibitors, peptides or compounds.
We used the NCBI non-redundant PDB set, composed of 1924 chains obtained using only X-ray solved structures and a sequence-similarity cut-off corresponding to a minimum BLAST p-value of 10e-7.
Using the SURFNET algorithm we identified a dataset of 10175 surface clefts on these chains with a cavity volume greater than 200 Å³.
We defined each cleft as the set of residues identified by the algorithm that surround the cavity pocket.
Functionally important residues were identified in the set of defined cavities by searching for PROSITE patterns and ligand binding sites.
PROSITE motifs were searched in the sequences of our protein dataset with the ScanProsite algorithm. All PROSITE regular expressions were used excluding those marked as "unspecific".
Ligand binding residues were identified with a distance criterion. All residues within 3.5 Å distance from any HETEROATOM found in the selected co-ordinate set were selected and assigned to the category.
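The distance criterion can be sketched in a few lines of Python. This is only an illustration: the input format (tuples of residue identifier and coordinates) and the function name are assumptions, and the original pipeline may have parsed and filtered the coordinate files differently.

```python
import math


def ligand_binding_residues(protein_atoms, hetero_atoms, cutoff=3.5):
    """Return sorted ids of residues having an atom within `cutoff` Å of a heteroatom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples (assumed format)
    hetero_atoms:  iterable of (x, y, z) tuples (assumed format)
    """
    hetero_atoms = list(hetero_atoms)
    selected = set()
    for residue_id, coord in protein_atoms:
        if residue_id in selected:
            continue  # residue already assigned, no need to test more atoms
        if any(math.dist(coord, h) <= cutoff for h in hetero_atoms):
            selected.add(residue_id)
    return sorted(selected)


# Toy coordinates, invented for illustration only.
atoms = [(10, (0.0, 0.0, 0.0)), (11, (5.0, 0.0, 0.0)), (12, (10.0, 0.0, 0.0))]
ligand = [(3.0, 0.0, 0.0)]
print(ligand_binding_residues(atoms, ligand))
```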
All clefts in which less than 75% of the residues were identified as involved in one of the defined functions were discarded, with the purpose of considering only almost complete functional sites.
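As a minimal sketch of this filter, assuming each cleft is available as a collection of residue identifiers and that the functionally important residues have already been collected into a set (both representations are assumptions made for illustration):

```python
def functional_fraction(cleft_residues, functional_residues):
    """Fraction of the cleft residues that are annotated as functional."""
    cleft = set(cleft_residues)
    if not cleft:
        return 0.0
    return len(cleft & set(functional_residues)) / len(cleft)


def is_functional_cleft(cleft_residues, functional_residues, min_fraction=0.75):
    """Keep a cleft only if at least `min_fraction` of its residues are functional."""
    return functional_fraction(cleft_residues, functional_residues) >= min_fraction


# Hypothetical residue numbers: 3 of 4 cleft residues are annotated.
print(is_functional_cleft([1, 2, 3, 4], {1, 2, 3}))
```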
All vs. all structural comparison
The structural comparison was performed using the sequence-independent local comparison algorithm Query3D. Matching criteria of Query3D are both geometrical (r.m.s.d. between paired residues) and biochemical (scores from a substitution matrix). The algorithm uses a two-point representation of each residue, the C-alpha and a side-chain representative point. An exhaustive exploration guarantees finding the two largest sets of matching amino acids in a pair of protein structures. For this experiment, high stringency parameters were set (r.m.s.d. < 0.7 Å and residue similarity > 1.2 according to the Dayhoff substitution matrix) in order to obtain only matches with a high similarity. Whenever a match involves more than 10 amino acids, only the first ten are considered.
In the all vs. all comparison experiment, all surface clefts containing at least 75% of functionally important residues were compared to the whole set of clefts. Moreover, the algorithm was forced to consider only the structural similarities in which at least 50% of the amino acids are annotated as functionally important. This requirement helps in selecting only matches in protein regions characterized by an easily deducible function.
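The geometrical criterion can be illustrated with a short Python sketch that computes the r.m.s.d. over a two-point residue representation. It is not the Query3D implementation: it assumes the two residue sets are already paired and superposed, it leaves out the substitution-matrix criterion, and the function names and coordinates are hypothetical.

```python
import math


def two_point_rmsd(residues_a, residues_b):
    """R.m.s.d. over paired residues, each given as (c_alpha, side_chain_point).

    Both lists are assumed to be in the same paired order and already
    superposed in a common coordinate frame.
    """
    if len(residues_a) != len(residues_b) or not residues_a:
        raise ValueError("need two non-empty residue lists of equal length")
    squared_sum = 0.0
    n_points = 0
    for (ca_a, sc_a), (ca_b, sc_b) in zip(residues_a, residues_b):
        squared_sum += math.dist(ca_a, ca_b) ** 2 + math.dist(sc_a, sc_b) ** 2
        n_points += 2
    return math.sqrt(squared_sum / n_points)


def within_geometric_threshold(residues_a, residues_b, max_rmsd=0.7):
    """Apply the r.m.s.d. < 0.7 Å stringency quoted in the text."""
    return two_point_rmsd(residues_a, residues_b) < max_rmsd


# One residue pair, displaced by 0.5 Å along z (invented coordinates).
site_a = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))]
site_b = [((0.0, 0.0, 0.5), (1.0, 0.0, 0.5))]
print(two_point_rmsd(site_a, site_b))
```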
Each match is scored by its match length, i.e. the number of residues that can be superposed within the defined similarity thresholds. The significance of each match is evaluated by calculating a Z-score over the distribution of values obtained by comparing the query cleft with the whole dataset. For each match, the Z-score is computed as the difference between the value of the match and the average value of all the matches for the query cleft, divided by the standard deviation of those values.
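The Z-score described above can be written as a small Python sketch. The text does not say whether the population or the sample standard deviation was used, so the population form is an assumption here, as are the function names and the toy match lengths; the thresholds in `passes_thresholds` are those quoted in the Results.

```python
from statistics import mean, pstdev


def match_z_scores(match_lengths):
    """Z-score of every match length obtained for one query cleft.

    Assumes the population standard deviation; returns zeros when all
    lengths are identical (the Z-score is undefined in that case).
    """
    mu = mean(match_lengths)
    sigma = pstdev(match_lengths)
    if sigma == 0:
        return [0.0 for _ in match_lengths]
    return [(length - mu) / sigma for length in match_lengths]


def passes_thresholds(match_length, z_score, min_length=7, min_z=10.0):
    """Matches longer than 7 residues with a Z-score higher than 10 are kept."""
    return match_length > min_length and z_score > min_z


# Toy distribution of match lengths for one query cleft (invented values).
lengths = [2, 4, 4, 4, 5, 5, 7, 9]
print(match_z_scores(lengths))
```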
Definition of collinear and inverted structural matches
A structural match can be described as a set of pairs of residues that can be superposed in 3D. Each residue pair is identified by an uppercase letter (e.g. A) and its two residues by the same letter in lowercase followed by one or two primes, depending on whether the residue belongs to the first or the second structure (e.g. a', a").
Given two residues a' ∈ A and b' ∈ B, we write a' < b' if a' precedes b' in the primary sequence. Two pairs A and B are non-collinear if a' < b' while b" < a", or if b' < a' while a" < b".
A structural match is non-collinear if it contains at least 2 pairs that are non-collinear with each other, and is inverted if it contains at least 3 pairs that are all mutually non-collinear.
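These definitions translate directly into a short Python sketch. Each residue pair is represented here as a tuple (position in the first sequence, position in the second sequence); this representation, the function names and the example positions are assumptions made for illustration.

```python
from itertools import combinations


def non_collinear(pair_a, pair_b):
    """True if the two residue pairs occur in opposite order in the two sequences."""
    (a1, a2), (b1, b2) = pair_a, pair_b
    return (a1 < b1 and b2 < a2) or (b1 < a1 and a2 < b2)


def is_non_collinear_match(pairs):
    """A match is non-collinear if at least two of its pairs are non-collinear."""
    return any(non_collinear(p, q) for p, q in combinations(pairs, 2))


def is_inverted_match(pairs):
    """A match is inverted if it has three mutually non-collinear pairs."""
    return any(
        non_collinear(p, q) and non_collinear(p, r) and non_collinear(q, r)
        for p, q, r in combinations(pairs, 3)
    )


# Hypothetical matches: (position in sequence 1, position in sequence 2).
permuted = [(1, 20), (2, 21), (30, 5)]   # order changes, direction preserved
inverted = [(1, 30), (2, 29), (3, 28)]   # same residues run in reverse order
print(is_non_collinear_match(permuted), is_inverted_match(permuted))
print(is_non_collinear_match(inverted), is_inverted_match(inverted))
```

In this sketch a circularly permuted match is non-collinear but not inverted, whereas a stretch of residues running in reverse order yields three mutually non-collinear pairs and is flagged as inverted.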
We gratefully acknowledge the support of Telethon GGP04273, AIRC and a PNR 2003–2007 (FIRB art.8).
This article has been published as part of BMC Bioinformatics Volume 8, Supplement 1, 2007: Italian Society of Bioinformatics (BITS): Annual Meeting 2006. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/8?issue=S1.
- Novotny M, Madsen D, Kleywegt GJ: Evaluation of protein fold comparison servers. Proteins 2004, 54: 260–270. 10.1002/prot.10553
- Ausiello G, Zanzoni A, Peluso D, Via A, Helmer-Citterich M: Pdbfun: mass selection and fast comparison of annotated pdb residues. Nucleic Acids Res 2005, 33: W133–7. 10.1093/nar/gki499
- Wallace AC, Laskowski RA, Thornton JM: Derivation of 3d coordinate templates for searching structural databases: application to ser-his-asp catalytic triads in the serine proteinases and lipases. Protein Sci 1996, 5: 1001–1013.
- Jones S, Thornton JM: Searching for functional sites in protein structures. Curr Opin Chem Biol 2004, 8: 3–7. 10.1016/j.cbpa.2003.11.001
- Lesk AM, Fordham WD: Conservation and variability in the structures of serine proteinases of the chymotrypsin family. J Mol Biol 1996, 258: 501–537. 10.1006/jmbi.1996.0264
- Brady L, Brzozowski AM, Derewenda ZS, Dodson E, Dodson G, Tolley S, Turkenburg JP, Christiansen L, Huge-Jensen B, Norskov L: A serine protease triad forms the catalytic centre of a triacylglycerol lipase. Nature 1990, 343: 767–770. 10.1038/343767a0
- Via A, Ferre F, Brannetti B, Helmer-Citterich M: Protein surface similarities: a survey of methods to describe and compare protein surfaces. Cell Mol Life Sci 2000, 57: 1970–1977. 10.1007/PL00000677
- Via A, Ferre F, Brannetti B, Valencia A, Helmer-Citterich M: Three-dimensional view of the surface motif associated with the p-loop structure: cis and trans cases of convergent evolution. J Mol Biol 2000, 303: 455–465. 10.1006/jmbi.2000.4151
- Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJA: The prosite database. Nucleic Acids Res 2006, 34: D227–30. 10.1093/nar/gkj063
- Ausiello G, Via A, Helmer-Citterich M: Query3d: a new method for high-throughput analysis of functional residues in protein structures. BMC Bioinformatics 2005, 6(Suppl 4):S5. 10.1186/1471-2105-6-S4-S5
- Ferre F, Ausiello G, Zanzoni A, Helmer-Citterich M: Functional annotation by identification of local surface similarities: a novel tool for structural genomics. BMC Bioinformatics 2005, 6: 194. 10.1186/1471-2105-6-194
- Jung J, Lee B: Circularly permuted proteins in the protein structure database. Protein Sci 2001, 10: 1881–1886.
- Gong W, O'Gara M, Blumenthal RM, Cheng X: Structure of pvu ii dna-(cytosine n4) methyltransferase, an example of domain permutation and protein fold assignment. Nucleic Acids Res 1997, 25: 2702–2715. 10.1093/nar/25.14.2702
- Polekhina G, Board PG, Gali RR, Rossjohn J, Parker MW: Molecular basis of glutathione synthetase deficiency and a rare gene permutation event. EMBO J 1999, 18: 3204–3213. 10.1093/emboj/18.12.3204
- Kim CA, Gingery M, Pilpa RM, Bowie JU: The sam domain of polyhomeotic forms a helical polymer. Nat Struct Biol 2002, 9: 453–457.
- Pedersen PL: Transport atpases: structure, motors, mechanism and medicine: a brief overview. J Bioenerg Biomembr 2005, 37: 349–357. 10.1007/s10863-005-9470-3
- Nessler S: The bacterial hpr kinase/phosphorylase: a new type of ser/thr kinase as antimicrobial target. Biochim Biophys Acta 2005, 1754: 126–131.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The protein data bank. Nucleic Acids Res 2000, 28: 235–242. 10.1093/nar/28.1.235
- Laskowski RA: Surfnet: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph 1995, 13: 323–30. 10.1016/0263-7855(95)00073-9
- Ferre F, Ausiello G, Zanzoni A, Helmer-Citterich M: Surface: a database of protein surface regions for functional annotation. Nucleic Acids Res 2004, 32: D240–4. 10.1093/nar/gkh054
- Gattiker A, Gasteiger E, Bairoch A: Scanprosite: a reference implementation of a prosite scanning tool. Appl Bioinformatics 2002, 1: 107–108.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.