Exploiting protein flexibility to predict the location of allosteric sites
© Panjkovich and Daura; licensee BioMed Central Ltd. 2012
Received: 26 April 2012
Accepted: 17 October 2012
Published: 25 October 2012
Skip to main content
© Panjkovich and Daura; licensee BioMed Central Ltd. 2012
Received: 26 April 2012
Accepted: 17 October 2012
Published: 25 October 2012
Allostery is one of the most powerful and common ways of regulation of protein activity. However, for most allosteric proteins identified to date the mechanistic details of allosteric modulation are not yet well understood. Uncovering common mechanistic patterns underlying allostery would allow not only a better academic understanding of the phenomena, but it would also streamline the design of novel therapeutic solutions. This relatively unexplored therapeutic potential and the putative advantages of allosteric drugs over classical active-site inhibitors fuel the attention allosteric-drug research is receiving at present. A first step to harness the regulatory potential and versatility of allosteric sites, in the context of drug-discovery and design, would be to detect or predict their presence and location. In this article, we describe a simple computational approach, based on the effect allosteric ligands exert on protein flexibility upon binding, to predict the existence and position of allosteric sites on a given protein structure.
By querying the literature and a recently available database of allosteric sites, we gathered 213 allosteric proteins with structural information that we further filtered into a non-redundant set of 91 proteins. We performed normal-mode analysis and observed significant changes in protein flexibility upon allosteric-ligand binding in 70% of the cases. These results agree with the current view that allosteric mechanisms are in many cases governed by changes in protein dynamics caused by ligand binding. Furthermore, we implemented an approach that achieves 65% positive predictive value in identifying allosteric sites within the set of predicted cavities of a protein (stricter parameters set, 0.22 sensitivity), by combining the current analysis on dynamics with previous results on structural conservation of allosteric sites. We also analyzed four biological examples in detail, revealing that this simple coarse-grained methodology is able to capture the effects triggered by allosteric ligands already described in the literature.
We introduce a simple computational approach to predict the presence and position of allosteric sites in a protein based on the analysis of changes in protein normal modes upon the binding of a coarse-grained ligand at predicted cavities. Its performance has been demonstrated using a newly curated non-redundant set of 91 proteins with reported allosteric properties. The software developed in this work is available upon request from the authors.
Proteins can be regarded as the functional building blocks of life, carrying out and coordinating almost all biological processes. Tight regulation of these processes is fundamental in all kingdoms of life and allostery represents one of the most commmon and powerful means of modulating protein activity . Allostery can be defined as the regulation of a protein’s function by binding of an effector molecule at a site which is not the active site. Its relevance was emphasized decades ago by Jacques Monod, when he referred to allosteric regulation as the ‘second secret of life’, second only to the genetic code . Even though allostery and its often intrincate nature have captured the interest of researchers since the initial discoveries more than half a century ago (for a review see ), most allosteric mechanisms are still not completely understood . At present, allosteric phenomena are being intensively studied for their potential as target mechanisms for the development of new classes of therapeutics .
Expanding drug-design through allostery opens up an unexplored territory of novel potential therapeutic solutions, beyond what has been already covered by the classic, active-site oriented drug-development approach. An important factor fueling interest in allosteric drugs consists in their characteristic advantages compared to traditional active-site inhibitors. For example, allosteric sites tend to be under lower sequence-conservation pressure than active sites, facilitating the design of highly specific drugs and reducing the risks of toxicity or side-effects [5–7]. To explain this briefly, if the pathogen’s active site is very well conserved in nature it may share important structural features with the human homologue, which could be then bound and inhibited as well by the antimicrobial drug causing toxic side-effects on the patient. Thus, lower levels of evolutionary conservation at ligand-binding sites may allow for more selective drugs. Furthermore, allosteric drugs may not only inhibit but also increase target-protein activity, enabling novel therapeutic possibilites as seen for example in the activation of glucokinase by allosteric drugs, a potential treatment for type 2 diabetes mellitus [8, 9]. On the same line, traditional drugs may be complemented by allosteric effectors, as observed in the case of aminoglycoside phosphotransferase where a previously unknown binding site could be exploited to allosterically counteract antibiotic resistance .
However, the field of allosteric-drug design is rather young and the amount of allosteric drugs known today is still marginal . For example, at the time of this writing a query in DrugBank  for the term ‘allosteric’ returns 7 results, while ‘inhibition’ returns 483 entries. This may be in part due to the intrinsic difficulties in understanding allosteric mechanisms and to the lack of systematic studies on the topic . Only recently the first initiative to store and organize information on allosteric cases has surfaced in the form of the AlloSteric Database (ASD) . By browsing ASD it becomes apparent that part of the difficulty in studying allosteric systems lies in the large degree of variety found among them, as there are many ways in which protein activity can be affected allosterically [12, 14]. A textbook example is the one provided by glucose-induced glucokinase, in which the ligand triggers a conformational change allowing the active site to become functional . In other cases the presence of the allosteric ligand triggers the formation of the biologically active protein complex (e.g. GTP cyclohydrolase stimulatory complex ). A protein illustrating the variety and complexity allosteric mechanisms may reach is ribonucleotide reductase. This protein presents two different allosteric sites: one affects the enzyme catalytic rate and the other alters its specificity allowing the enzyme to switch substrates . Furthermore, allosteric signals may also propagate solely by altering protein dynamics, without a detectable conformational change [18, 19].
In the context of such diversity, unveiling common patterns beneath allosteric phenomena could increase their potential for therapeutic exploitation, stimulating the design of allosteric drugs. We postulate that the first step in such a procedure would be to computationally detect or predict the presence and location of protein allosteric sites, to allow further focusing of drug-screening processes on selected protein targets down the pipeline. The algorithm should be able to pinpoint which proteins are sensible to allosteric regulation. However, if as already suggested any dynamic protein has the potential to be regulated allosterically , then the method should indicate the location of putative allosteric sites on the protein. Based solely on sequence, it would be very hard to predict the location of allosteric sites as it has been done by homology on active sites [21, 22], because the evolutionary pressure for sequence conservation on allosteric sites is generally much lower and harder to detect, if at all present [3, 23].
Until now, much of the research in the field has been focusing on the conformational changes induced by allosteric signals. The group of Jeffrey Gray studied conformational changes upon allosteric activation  and expanded this research by analysing the networks of quaternary and tertiary motions on which allosteric communication relies . Following a similar line, a very interesting and thorough study was published where different parameters were interrogated in terms of their potential to indicate which protein residues are involved in transmitting the allosteric signal, on the basis of experimental mutation data . The results from these analyses aim at defining the particular pathway of residues that mediate the allosteric communication. However, other authors have argued that this may not be the case in vivo, where multiple effector sites may be present on the protein acting through multiple signaling pathways . In general, recent studies aggree in the idea that allostery is mainly a thermodynamic process and among the different protein properties that are involved in allosteric phenomena, flexibility (i.e. protein dynamics) stands out as the most significant one [3, 28–31].
Following this line of thought, Ming and Wall developed a theoretical framework to study allosteric effects by comparing the dynamics of bound and unbound protein-ligand pairs . They further refined their methodology and tested its ability to predict functional ligand-binding sites (not necessarily linked to allosterism) on a set of 305 protein-ligand complexes of known structure . Very recently, two other approaches partially aiming at predicting allosteric sites have been published by Mitternacht and coworkers. In a first article they describe a geometric measure that helps at locating biologically functional ligand-binding sites, while a second one describes a more elaborate measure called ‘binding leverage’, related to protein dynamics, which appears useful at locating biologically relevant binding sites including allosteric sites. They tested this last feature on 15 allosteric proteins [34, 35], observing different results for specific proteins and concluding that regulatory sites may be identified without previous experimental knowledge on conformational changes. However, these studies were not completely focused on allosteric sites and did not benefit from the larger data set now available at ASD .
Even though the previously cited articles represent an important step forward in the understanding of allostery, we consider that further research is needed if allosteric sites are to be predicted with the same coverage and precision as active sites [21, 22]. The first thing we did in this direction was to integrate more than one hundred allosteric entries available at ASD. Among the multiple allosteric mechanisms known and the different effectors (other proteins, small-molecules, phosphorylation, etc), we chose to focus on small-molecule ligands, as these are the best candidates to be mimicked by therapeutic drugs [4, 14]. Moreover, the approach presented here is based on the idea that changes in protein flexibility upon ligand binding can be related to allosteric and regulatory effects [1, 24, 36–38]. A simple computational way to estimate protein structural flexibility is the use of Normal Mode Analysis (NMA) [32, 35, 38, 39]. In this case, however, we were not interested in measuring absolute flexibility values but the change in flexibility that occurs when a ligand binds to the protein structure in a particular location, in a similar fashion to the approach developed by Ming and Wall . Once we had gathered and filtered allosteric proteins of known structure, we tested if changes in flexibility could be linked to the presence of the allosteric ligand. Experiments were performed using different molecular representations of the small-molecule ligands, and across different ranges of normal modes. Moreover, as a control we simulated the presence of ligands in alternative binding sites. This helped in parameterizing the methodology and made it applicable to cases where there is no a priori knowledge on the allostery the protein may present. Besides evaluating the overall results on a set of allosteric proteins, we took a closer look into particularly interesting cases.
To study allosteric sites from a structural perspective we first gathered the available data. We started by integrating the 146 allosteric site entries that were, at the time of this writing, annotated in the AlloSteric Database (ASD)  with another 72 allostery examples we had previously found in the literature. We proceeded to filter and cluster the data set as described in the Methods section to avoid overrepresentation  and low quality structures, turning the inital 213 cases into a total of 91 representative proteins where both the structure and location of the allosteric ligand are known.
Our first experiment aimed at quantifying the number of proteins in our data set that undergo a significant change in flexibility when the allosteric ligand is bound. However, known allostery cases show large diversity in their mechanisms  and we did not expect a positive result on the complete data set, since in many cases the allosteric effect may not be primarily driven by changes in local or overall flexibility but specific conformational changes, oligomerization or other mechanisms may be more relevant .
As explained in the Background section, we have chosen to estimate flexibility using Normal Mode Analysis (NMA). When applying NMA, calculated low-frequency modes reflect large collective oscillations of the protein structure and high-frequency modes reflect small local fluctuations . Even though for most cases it has been shown that low-frequency normal modes are better descriptors of allosteric effects , we made no a priori assumption on the set of normal modes that would be more appropriate to detect an allosteric effect upon ligand binding for the ample protein set studied here. Thus, we decided to explore this parameter by using different ranges of normal modes, as described in the Methods section.
We used the calculated normal modes to predict B-factors , as this is a standard quantity for the estimation of protein flexibility . Briefly, NMA calculations were performed for proteins in our data set both in the presence and absence of the allosteric ligand. For each protein, Cα B-factors derived from both conditions were compared and considered to be significantly different if the Wilcoxon-Mann-Whitney test returned a p-value < 0.05.
We performed a second experiment to measure how different the results from this approach would be if we used a simplified molecular representation instead of the full-atom ligand, given the fact that knowledge on the ligand structure may not always be available. Moreover, a predictive approach that does not require information on the ligand molecule has a much larger field of application (e.g. structural genomics) paving the way for the discovery of novel and pharmacologically interesting allosteric sites. Another interesting possibility that would open up is the detection of serendipitous allosteric sites, which despite having no natural ligand effectively become an allosteric site given the presence of an appropriate ‘opportunistic’ ligand .
We tested two representations of ligands: a single dummy atom located at the ligand geometric center and a set of 6 dummy atoms located at the vertices of an octahedron around the geometric center, as explained in the Methods section.
Figure 1 shows that for most cases the single dummy atom at the ligand geometric center is not able to trigger a significant change in flexibility during the simulations, while the octahedron exerts an effect much closer to that of the full-atom ligand. From a methodological point of view, simulating ligands in a simplified form allowed us to perform control experiments which are described below.
To further develop a predictive approach, we used the LIGSITE cs program  to predict the putative ligand-binding sites on the protein structure. Different programs are available for this task with very good performance in general as shown in recent reviews [44–46]. We chose LIGSITE cs because pockets are predicted based only on the shape of the protein surface; programs incorporating more parameters (e.g. evolutionary conservation, druggability) could have improved but also biased our results.
We predicted the location of up to 8 ligand-binding pockets per protein and performed NMA to check if any of the predicted pockets had a significant effect upon protein flexibility when occupied by a small-molecule ligand, as described in the Methods section.
At the time of this writing, no large-scale study attempting the prediction of the allosteric-site location in known allosteric proteins has been published. Recent work by Demerdash and coworkers aimed at predicting the residues involved in the propagation of allosteric signals within a protein structure for a set of 16 different proteins . Quite distinctly, our method follows a drug-discovery oriented approach where the intention is to pinpoint specific protein pockets that present a high potential for affecting biological function. From that perspective our method is comparable to the one developed by Ming and Wall for finding functional sites, as it exploits NMA to assess the differences in flexibility between ligand-bound and unbound states of a protein [33, 47]. However, their approach differs from ours in multiple major points, including the sampling of protein sites, the parametrization of probes and of their interaction with the protein and the approach by which the perturbation of protein dynamics is assessed. The method described in this paper is also similar to the approach recently published by Mitternacht and coworkers, where they measured the ‘binding leverage’, or ability of a binding site to couple to the intrinsic motions of a protein, on a set of 15 allosteric proteins .
In this context, any pocket with a biological regulatory role would be suited for the analysis, but we chose to focus on allosteric sites since these are possibly the most interesting, albeit complex, regulatory sites to approach. Starting from a data set containing 91 proteins, we measured the rate of success of our approach to identify allosteric sites as follows. First, we discarded a total of 33 proteins for which no single LIGSITE cs predicted pocket matched the allosteric site, leaving a total of 58 cases to work with (63,7%). The rational for discarding these proteins is that the present analysis is not concerned with the ability of a specific program to detect a cavity but with the ability of our approach to identify the cavity, among those detected, that corresponds to the allosteric site. Indeed, it has been previously observed that not all allosteric sites are predicted to be potential ligand-binding cavities by common algorithms . There can be different reasons for this, for example the allosteric site may be deeply buried in the protein, may display a planar shape or be located at the interface of subunits, making it difficult for the pocket-prediction algorithm to detect its presence.
Prediction results on protein allosteric sites
However, it will not be common to select up to eight pockets per protein as potential allosteric sites. A researcher working on a particular protein without a priori knowledge on its regulatory mechanism will probably keep the first three largest pockets predicted by default  or, as Thornton and coworkers explain for the case of active sites , the largest pocket will usually be the best bet. In those two scenarios, the tendency shown for the complete set of predicted pockets is conserved (Table 1). When keeping the first 3 pockets (set c123), the chance to match an allosteric site (positive predictive value) goes from 25% to 39% when using the flexibility criterion and up to 47% when incorporating structural conservation as well. Out of the 58 allosteric sites, however, 14 are not found within the c123 set. Likewise, when selecting only the first and largest pocket, the inital success rate goes from 45% to 52% when considering the effect on flexibility upon binding (set c1F) and to 65% when structural conservation is also required. Note that between sets FS and c1FS the number of false positives decreases by three-fold, while only two additional false negatives are added.
We considered only allosteric sites as desirable matches. However, other pockets with biological functions were matched by our criteria, as described further below on a few particular examples. The performance of this approach might be improvable using other pocket prediction programs or a combination of them. However, performance of pocket prediction methods does not vary largely, as shown by a recent large-scale comparison . Our study represents the largest test to date (58 non-redundant proteins in complex with their corresponding small-molecule allosteric ligands) proving the concept that changes in overall flexibility upon ligand binding are relevant identifiers for some allosteric sites, and these effects can be captured in many cases with the simple approach described here. In addition, we further show (see also ) that evaluation of the structural conservation of the candidate pockets may contribute as much to the identification of the allosteric site.
As mentioned in the Background section, allostery can work through many different mechanisms. Thus, we consider it important, besides the overall results presented above, to explain the results for a few proteins in more detail. The following section should help to better illustrate the relevance of incorporating a flexibility measure when studying allosteric systems and predicting the location of allosteric sites.
Aldehyde dehydrogenases (ALDH) are found across all kingdoms of life. They play a vital role in multiple cellular processes, including glycolysis, detoxification and embryogenic development. A distinct family within the ALDH superfamily consists of the non-phosphorylating glyceraldehyde-3-phosphate dehydrogenases (GAPN), which catalyze the phosphate-independent irreversible oxidation of glyceraldehyde 3-phosphate (GAP) to 3-phosphoglycerate using NAD(P) as a cosubstrate. Unlike other proteins in the GAPN family, the enzyme of the hyperthermophilic Archaeum Thermoproteus tenax (Tt-GAPN) is regulated by a set of inhibitors (NADH, NADP(H) and ATP) and activators (AMP, ADP, glucose 1-phosphate and fructose 6-phosphate (F6P)) which decrease or increase, respectively, the affinity for NAD. This suggests that Tt-GAPN plays a crucial role in regulating the carbohydrate catabolism in T. tenax.
All different activators bind to the same allosteric site, which is located more than 20 Å away from both the active site and the cosubstrate-binding site of any monomer of the tetramer . The activator binding site is located at the interface between the tetramerization domain and the cosubstrate binding/catalytic domains. It is also observed that the allosteric ligands are in direct contact with 3 or even four monomers in the protein complex, indicating a role in the stabilization of the complex. This role probably combines with the detected effect on flexibility to influence enzyme kinetics, as no large conformational change is observed when comparing the ligand-bound and ligand-free structures besides a rearrangement of the tetramerization domain with respect to the cosubstrate binding/catalytic domain .
Another predicted pocket (PKT7), appears to significantly affect protein flexibility on most normal-mode ranges when occupied during the NMA. This pocket is not occupied by any ligand in the original structure ([PDB:3HRF]), but it does match the position of a phosphoserine (SEP) in the activation loop of PDK1, as shown in Figure 4. Also in this pocket, residue THR226 is considered a crucial element of the allosteric mechanism of this protein, given that mutation of this residue inhibits activation without inhibiting binding . These results indicate that stabilization of this protein region would have an effect on the overall flexibility of the protein, linking it to a regulatory function which correlates with what has been observed previously based on deuterium exchange and other experimental procedures . The other 5 pockets predicted on this structure were not found to significantly affect protein flexibility.
Non-nucleoside reverse transcriptase inhibitors (NNRTIs) are key elements of the so-called HAART (Highly Active Antiretroviral Therapy) multi-drug treatments against HIV-1 infection. However, rapid mutation of HIV-1 compromises the efficacy and durability of HAART. This high mutation rate fuels the need to discover novel agents with better activity profiles against HIV-1 reverse transcriptase (RT) and its most common mutants. In this context, Anthony and coworkers have developed substituted tetrahydroquinolines which are potent allosteric inhibitors of HIV-1 RT and some of its key mutants .
When glycolysis takes place under anaerobic conditions, pyruvate is reduced to L-lactate, a reaction that is catalyzed by L-lactate dehydrogenase (LDH). In contrast to their mammalian counterparts, some bacterial LDHs display allosteric regulation by fructose 1,6-bisphosphate (FBP) . Iwata and co-workers solved the structure of Bifidobacterium longum LDH in both active (R) and inactive (T) states, co-crystallized with the allosteric activator . A significant difference can be observed between the B-factors of both structures, suggesting an overall change in flexibility being part of the allosteric mechanism.
No other pocket on this protein displayed an effect related to flexibility according to our calculations when considering the normal mode range 6-20, not even those pockets matching the location of the active site or other ligands.
Given that animal LDHs are not regulated allosterically, this protein/pocket could be an excellent target for antimicrobial compounds. To further explore this idea, we analyzed the human LDH homolog ([PDB:1I10]) as well, which shows a sequence identity of 37.7% and a local RMSD of 1.04 Å according to the SUPERPOSE web server  when compared to Bifidobacterium longum LDH. On the human protein, which is not regulated allosterically, the pocket equivalent to the allosteric site in the bacterial homolog did not produce a significant effect on flexibility according to our calculations. It is remarkable that a coarse approximation such as this (based on Cαs and NMA) is able to distinguish that the presence of the allosteric ligand has a significant effect on the bacterial protein flexibility but not on its human homolog.
In this article we have proposed a very simple approach exploiting changes in protein flexibility upon ligand binding to predict the presence and location of allosteric sites. We tested the methodology on a non-redundant set of 58 proteins achieving in the best case a success rate (positive predictive value) of 65%, with a sensitivity of 0.22. Furthermore, we analyzed four cases in more detail, revealing how the coarse-grained approach described here is able to capture the effect triggered by the allosteric ligand, matching the current literature. The structural analysis proposed here could help medicinal chemists and other researchers on their way through the promising field of allosteric-drug design.
To ensure that the quality and nature of the selected structural data was appropriate to our study, we discarded structures with a resolution lower than 3 Å or with a G-Factor lower than -1, as calculated by PROCHECK . We conservatively defined a non-redundant data set to avoid possible bias in the results that may arise from overrepresentation of any protein family . Clustering was performed with the BLASTCLUST program  using a threshold of 30% sequence identity, which grouped the 213 initial entries into 91 groups. We then selected the highest resolution structure of each group as its representative and defined a non-redundant data set which contained a total of 91 distinct allosteric proteins, for which the structure and location of both the allosteric site and ligand were known.
where is the Euclidean distance between atoms i and j in the crystallographic structure and R c and k were given in this study the values 10 Å and 1 Kcal mol−1Å−2, respectively. Note that this energy function was designed in such a way that it does not require energy minimization of the X-ray structure prior to the normal-mode calculation since the X-ray structure is the minimum of the function. Although this method uses very gross approximations (reduction to Cα atoms, extremely simple energy function, no solvent), it has proven to perform surprisingly well in front of both more complex approximations and experimental data (B-factors) [39, 60, 62].
Low-frequency modes reflect large collective or delocalized motions in the protein structure, while high-frequency modes reflect small vibrations in localized regions. We estimated B-factors for the ligand-bound and unbound protein structures using different ranges of normal modes to explore this variable. Ranges were named using two numbers X-Y: Starting from the low frequency modes, X is the number of modes that are skipped and Y is the number of normal modes that are taken into account. The first six normal modes, with zero frequency, are of no interest as they represent rigid-body translation and rotation. The ranges tested were: 6-5, 6-10, 6-20, 6-50, 6-94, 11-10, 16-10, 26-10, 56-10, 90-10, 16-84, 26-74, 56-44 and 96-4.
We prepared protein coordinate files for NMA as follows: (1) Protein chains in direct contact with the allosteric ligand (i.e. multiple residues within 3.0 Å) were selected and atoms belonging to other chains or molecules in the structure were removed. (2) The LIGSITE cs program  was used to predict up to 8 pockets per structure. (3) After pocket prediction, protein structures were parsed to keep Cαatoms only.
We took the first 100 normal modes for each protein and ligand representation: apo, the Cα only ‘apo’ protein crystallographic structure (allosteric ligand is not present); ligand, same protein structure as in ‘apo’ but including the allosteric ligand (or a simplified molecular representation) in the allosteric site; PKTX, same protein structure as in ‘apo’ plus a simplified molecular representation of a ligand occupying the predicted pocket number X (1 to 8).
In the last case, the molecular representation of the ligand was located at the pocket geometric center, as predicted by LIGSITE cs .
The ligand molecule during NMA was simulated in different ways: full atom, all atoms in the ligand molecule are included in the calculation; geometric center, a single dummy atom located at the ligand-pocket geometric center is considered; octahedron, the ligand’s presence is simulated by a dummy atom positioned at the geometric center and six extra dummy atoms located at 4 Å distance from the center on both sides of each axis (i.e. forming the vertices of a regular octahedron).
For each protein-ligand pair, calculated B-factors for the Cα protein atoms of the apo structure were compared to those obtained for the same atoms in the configurations including real or simulated ligands to test for significant changes in flexibility using the Wilcoxon-Mann-Whitney test. Differences with a p-value<0.05 were considered significant.
The authors dedicate this paper to Professor Wilfred F. van Gunsteren on the occasion of his 65th birthday. This project is supported by funding under the Seventh Research Framework Programme of the European Union (ref. HEALTH-F3-2009-223101). AP acknowledges the FPU Scholarship from MICINN, Spanish Gov.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.