PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality
© Dehouck et al; licensee BioMed Central Ltd. 2011
Received: 27 December 2010
Accepted: 13 May 2011
Published: 13 May 2011
The rational design of modified proteins with controlled stability is of extreme importance in a whole range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also be expected from an improved understanding of the effect of naturally occurring disease-causing mutations on the molecular level.
PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC presents good prediction performances (correlation coefficient of 0.8 between predicted and measured stability changes, in cross validation, after exclusion of 10% outliers). It is moreover very fast, allowing the prediction of the stability changes resulting from all possible mutations in a medium size protein in less than a minute. This unique functionality is user-friendly implemented in PoPMuSiC and is particularly easy to exploit. Another new functionality of our server concerns the estimation of the optimality of each amino acid in the sequence, with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. clusters of non-optimal residues, which represent particularly interesting sites for introducing targeted mutations. This sequence optimality data is also expected to have significant implications in the prediction and the analysis of particular structural or functional protein regions. To illustrate the interest of this new functionality, we apply it to a dataset of known catalytic sites, and show that a much larger than average concentration of structural weaknesses is detected, quantifying how these sites have been optimized for function rather than stability.
The freely available PoPMuSiC-2.1 web server is highly useful for identifying very rapidly a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.
The availability of computational tools yielding reasonably accurate estimations of the impact of amino acid substitutions on the stability of proteins is of crucial importance in a wide range of applications. In particular, such tools have the potential to stimulate and support protein engineering and design projects dedicated to the creation of modified proteins that remain active in non-physiological conditions, or that present enhanced functional properties [1, 2]. On the other hand, advances in the ability to predict and rationalize the functional effect of naturally occurring amino acid variants and their relationship to disease will have tremendous implications in medicine. Indeed, they can be expected to lead to significant improvements in the understanding of the mechanisms of various diseases, to the development of enhanced diagnostics, new therapeutic approaches, and more personalized treatment options [3, 4]. Although approaches based on multiple sequence alignments remain predominant in this context, predictions of stability changes upon mutation have been recognized as a relevant input in the identification of deleterious and disease-causing mutations [4–7]. On a more fundamental level, the analysis of the predicted distributions of stabilizing or destabilizing mutations in sets of natural or engineered proteins may be extremely valuable to refine our understanding of the relationships between protein sequence, structure, and function [8–11], or to probe the evolutionary dynamics of protein sequences [11–14].
Over the last decade, several methods have been developed to predict the effects of mutations on the stability of proteins. Many of these methods rely primarily on an energy function describing the interactions between residues, within a simplified structural representation. We may distinguish the approaches based on statistical potentials extracted from datasets of protein structures [15–19], from those based on empirical potentials built from optimised combinations of various physical energy terms [20–23]. Several predictors were set up with the help of machine learning technologies, through the establishment of an empirical relationship between the stability change upon mutation and a large number of sequence and/or structural features of the mutated and mutant amino acids [24–27]. More recently, approaches combining the advantages of statistical energy functions and machine learning tools have also been described [28, 29].
Three recent studies independently assessed and compared the performances of several of those predictors, using datasets of experimentally characterized mutants that had not been used to train any of the predictive models [22, 29, 30]. Overall, the conclusions were mixed: all methods show a correct trend in the predictions, but the accuracies often remain moderate. PoPMuSiC was however shown to be a standout and to perform quite well in comparison with several other methods . It should be considered that most of these methods are extremely fast with respect to more detailed approaches, such as free energy perturbation  or thermodynamic integration . In particular, PoPMuSiC allows the estimation of the stability changes resulting from all possible point mutations in an average-sized protein in a matter of seconds.
This advantage was exploited to predict the stability changes induced by all possible point mutations in several globular proteins , using the FoldX algorithm . The results were in good agreement with previous experimental studies, in that a large majority of mutations appear to have a destabilizing effect on protein structures. Moreover, the overall distributions of predicted stability changes were shown to be very similar in different globular proteins. However, it has to be stressed that although this trend holds true - on average - for whole proteins, some local regions may present a different behavior. For example, residues belonging to the active site of a protein have been selected during evolution so as to ensure proper functioning, and are thus generally less optimal with respect to stability [33, 34]. Specific protein regions associated with peculiar patterns of stability changes upon mutation corresponding to structural weaknesses were also suggested, using the PoPMuSiC algorithm, to be involved in the occurrence of conformational changes, such as 3D domain swapping  or amyloid fibril formation ; the latter structural weaknesses were supported by experimental analyses .
We present here the PoPMuSiC-2.1 web server, which allows fast and accurate predictions of the stability changes resulting from point mutations in globular proteins. Besides its top-level performances, our server also distinguishes itself from other available tools by an important advantage in terms of computational speed, and by the ability to perform a systematic scan of all possible mutations in a protein. A new functionality of the current PoPMuSiC web server is that it gives the opportunity to obtain easily an estimation of the optimality of each residue in a protein's sequence, with respect to the stability of its structure. To illustrate the general interest of this unique feature, we performed a large-scale investigation of the optimality of residues involved in catalytic sites, and discuss the possibility of using such data to improve methods aiming at predicting functional sites in proteins.
Prediction of protein stability changes upon mutations
where v, w, x are any of the descriptors s, t, a and d, k is the Boltzmann constant and T the absolute temperature. The terms ΔWasdas and ΔWstdst are defined in a similar way . Higher order coupling terms are not taken into account, since they were shown to yield no improvement in the prediction of stability changes upon mutation . In addition to the statistical potentials, two terms in eq.(1), i.e. ΔV±, are related to the volume difference between the mutant and wild-type amino acid: ΔV = Vm-Vw. They are defined as ΔV± = ΔV H(± ΔV), where the H is the Heaviside function. They provide a coarse description of the impact of creating of a cavity (if ΔV<0) or accommodating a larger side-chain within the protein structure (if ΔV>0). Statistical potentials cannot be expected to describe correctly such effects, since they are derived from a dataset of native structures of wild-type proteins, with very few packing defects.
where ci is the inflection point of sigmoid i, ri its slope, fi its scaling factor and bi its vertical shift. The reason for this choice is that it enables the description of a smooth transition between two different environments: the protein core and the protein surface. Indeed, it was shown previously that the relative weights of the different types of interactions vary according to whether they concern residues at the surface or in the core .
where ΔΔGM, m is the experimentally measured folding free energy change of mutant m and ΔΔGP, m its predicted value, obtained with eqs. (1)-(3). An iterative parameter reduction procedure was devised to eliminate the parameters that present a large uncertainty, which reduced their number from 64 to 52 .
Estimation of protein sequence optimality
PoPMuSiC is fast enough to estimate within seconds the stability changes resulting from all possible mutations in an average-sized protein. It is therefore possible to estimate how robust the structure of a given protein is against mutations in its sequence. It is also possible to identify positions that are particularly poorly optimized with respect to protein stability, i.e. positions for which the predictions suggest that several possible mutations would improve stability. The ability to identify such positions in a protein sequence may be of substantial interest. Indeed, they obviously constitute attractive targets for protein engineering applications. They may also be involved in the mechanisms of protein function , the occurrence of structural switches or the development of conformational diseases [15, 35].
where H is the Heaviside function, m is one of the 19 possibilities of mutation of the amino acid w in position i, and ΔΔGP,w i→mis the corresponding predicted stability change. The score Γ is thus the sum of the predicted stability changes of all stabilizing mutations at a given position in the sequence. Since the large majority of mutations have a destabilizing effect on the protein, Γ can be expected to be close to zero for many positions in the sequence. In contrast, very negative values of Γ point out particularly interesting positions, where some mutations are strongly stabilizing and/or many mutations mildly stabilizing.
PoPMuSiC is called by a user-friendly PHP/MySQL web interface. Since the predictions of PoPMuSiC are based on the structure of the target protein, all queries require a structure file to be specified. The user may either provide the 4-letter code of the Protein DataBank (PDB) structure, which will then be automatically retrieved from the PDB server , upload his own structure file, or select a previously uploaded file. The user may choose to provide a structure file generated by a modeling approach, as long as it complies with the PDB format. Note, however, that the performances of PoPMuSiC were evaluated on the basis of experimentally resolved protein structures and are likely to be lower for modeled structures. Obviously, the accuracy of the predictions will depend on the quality of the model.
Three types of queries may be performed:
The "Single" query allows the prediction of the stability change resulting from one given mutation, specified by the user, in the protein of interest.
The "File" query allows the stability change prediction of a list of single-site mutations in a protein of interest. A (plain text) file containing the list of mutations must be uploaded. The server will output a (plain text) file containing the predicted stability change resulting from each mutation.
The "Systematic" query allows the prediction of the stability changes resulting from all possible single-site mutations in the protein of interest. The server will output a (plain text) file containing the predicted stability change resulting from each mutation. The user may choose how the results will be ordered: either sequentially or on the basis of the value of the predicted ΔΔG P s.
Results and Discussion
Comparison of predicted and measured stability changes
Performances in 5-fold validation
Exclusion of 10% outliers
R d / R c a
σ d /σ c (kcal/mol) b
R d / R c a
σ d /σ c (kcal/mol) b
ΔΔGP = < ΔΔGM >
Average 1 d
Average 2 d, e
As expected, the performances are slightly better in direct validation (on the training set) than in cross validation (on the validation set), but the differences are quite small, indicating the absence of overfitting. On average, the correlation coefficient Rc between predicted and measured ΔΔG values is 0.62, and the root mean square error σc is 1.16 kcal/mol. These measures of performance indicate a strong improvement over the random predictor, which uses randomly shuffled ΔΔGM values as predicted ΔΔGP values, and yields on average a root mean square error of 2.08 kcal/mol (Table 1). Setting all ΔΔGP values equal to the average of the ΔΔGM values generates a root mean square error of 1.47 kcal/mol (Table 1), and is thus more efficient than the random predictor, but still far from reaching the performances of PoPMuSiC. The predictive power of PoPMuSiC was also shown to surpass that of five previously published prediction tools, on an independent dataset of 350 mutations. Indeed, these five methods yielded a value of Rc comprised between 0.29 and 0.48, as compared to 0.67 for PoPMuSiC, and a root mean square error σc comprised between 1.43 and 4.12 kcal/mol, as compared to 1.16 kcal/mol for PoPMuSiC . A recently published prediction method, PEAT-SA, was also benchmarked using the same dataset of 350 mutations: a Rc value of 0.5 and a root mean square error of 1.92 kcal/mol were reported .
The values of R and σ after removal of the 10% most badly predicted mutations are also reported in Table 1. These values provide relevant complementary information to the performance indicators computed on the whole dataset, since a number of poorly predicted mutations may be related to experimental measurements made in specific, non-physiological, conditions or affected by a significant error, to a poorly resolved structure, to mistakes in the database indexing of the measured ΔΔG value, or to structural modifications that are not taken into account by PoPMuSiC.
The last row of Table 1 corresponds to a second round of parameter identification, performed after removal from the training sets of all mutants for which |ΔΔGP−ΔΔGM| is larger than 1.5 kcal/mol in each of the five initial runs. The validation sets are left unchanged, in order to obtain comparable results. This induces a slight improvement of the performances in cross-validation, indicating that the presence of outliers in the training set had a negative impact on the identification of the model.
A frequent objective of protein engineering studies is the increase of the structural stability of a protein, via the introduction of mutations in its sequence. For such applications, PoPMuSiC can be used to identify a small subset of mutations that are likely to present the desired properties, and can be tested experimentally. An important indicator of the performances would then be the proportion of mutations in this subset that actually stabilize the structure, which is related to the specificity of the predictions with respect to stabilizing mutations. In this regard, PoPMuSiC appears as a very reliable prediction tool. Indeed, in cross-validation, 70% of the mutations predicted as mildly stabilizing (ΔΔGP < -0.5 kcal/mol), and 86% of the mutations predicted as strongly stabilizing (ΔΔGP < -1.0 kcal/mol), are actually stabilizing (ΔΔGM < 0.0 kcal/mol).
The good performances of PoPMuSiC were made possible by using a combination of statistical potentials, whose relative weights were optimized via a simple neural network. The total number of adjusted parameters is 52, which remains very reasonable with respect to the size of the training/validation dataset (2648 non-redundant entries), and with respect to other prediction methods based on machine learning techniques [24–28]. Our model also clearly distinguishes itself from a pure black box, as the overall shapes of the optimized weighting functions were shown to exhibit little dependence on the training set, and possess biophysical significance .
Speed of the computations
Comparison of the practical features of different web servers
Maximal number of mutations per querya
Number of mutations processed per minuted
( N = 103)
( N = 370)
( N = 689)
19 × N b
19 × 5
Case study: sequence optimality scores in catalytic sites
To illustrate the relevance and usefulness of the sequence optimality scores Γ computed by PoPMuSiC-2.1, we investigated the relationship between the involvement of residues in protein function and their nonoptimality with respect to protein stability. More precisely, we computed the Γ-score for each residue in a dataset of proteins whose catalytic sites have been experimentally identified and reported in the literature. Our analysis is based on version 2.2.10 of the Catalytic Site Atlas . We selected only the "original" entries, i.e. proteins for which evidence of the location of the catalytic site comes directly from literature references, and excluded the "homologous" entries found by sequence alignment to one of the original entries. The resulting dataset contains 964 proteins, with 3227 catalytic residues and approximately 3.7 105 other residues.
Γ optimality score for catalytic versus noncatalytic residues
(N = 3227)
(N = 1784)
(N = 1443)
(N = 3.7 10 5 )
(N = 1.6 10 5 )
(N = 2.1 10 5 )
Γ < 0.0
Γ < -0.25
Γ < -0.5
Γ < -1.0
Γ < -2.0
Γ < -3.0
Γ < -5.0
Γ < -7.5
Γ < -10.0
We also investigated the relationship between solvent accessibility and sequence optimality. For that purpose, the residues were distributed in two classes (Core and Surface) according to whether their relative solvent accessibility, computed with NACCESS , is smaller or larger than 10%. Table 3 indicates that the overall proportion of nonoptimal catalytic residues is larger in the core than on the surface of proteins (53% versus 48%, for Γ < 0 kcal/mol), and that this difference gets more striking when lower threshold values are considered (e.g. 14% versus 3%, for Γ < -5 kcal/mol). These results denote a stronger trade-off between stability and function in the core of proteins: selecting residue types at specific positions along the sequence to ensure proper functioning is on average more detrimental to protein stability when these residues have a low solvent accessibility. This can be related to the fact that many catalytic residues are charged and/or polar, and thus more likely to have a destabilizing impact when buried in the protein core.
In contrast, in the case of residues that are not involved in catalytic sites, the overall proportion of nonoptimal residues is quite larger on the surface than in the core (46% versus 27% for Γ<0 kcal/mol). It is however interesting to notice that this trend is inverted when threshold values of Γ lower than -2 kcal/mol are considered. These results are in good agreement with previously published studies, which reported that mutations of core residues are more likely to be detrimental to protein stability, while the distribution of stability changes induced by mutations on the surface is quite narrow, with very few highly (de)stabilizing effects . It may also be related to the fact that surface residues have more often nonoptimal conformations because of crystal constraints or interaction with ligands.
Figure 2A is an example of the results obtained with PoPMuSiC on the bifunctional enzyme phosphoribuloseanthranilate isomerase:indoleglycerolphosphate synthase (PRAI:IGPS) from E. Coli (PDB code 1PII) . According to the Catalytic Site Atlas, the residues of the N-terminal domain (residues 1-255) involved in the IGPS catalytic activity are Glu53, Lys55, Lys114, Glu163, Asn184, and Ser215. With a threshold value of -5 kcal/mol on the Γ-score, PoPMuSiC identifies only nine non-optimal residues in this domain, including four of the six catalytic residues. Interestingly, among the five other residues that present a Γ-value lower than -5 kcal/mol, four are located in the same cavity, in close contact with the catalytic residues (Figure 2B). They hold thus probably some importance with respect to the affinity or the specificity of the interaction with the ligand. In the C-terminal domain (residues 256-452) that catalyses the PRAI reaction, PoPMuSiC identifies 11 residues with a Γ-value lower than -5 kcal/mol. The catalytic residues of this domain are not recorded in the Catalytic Site Atlas. However, as can be seen on Figure 2B, many of the residues pointed out by PoPMuSiC are clustered together, in a region previously identified as the PRAI active site . Overall, it appears thus that a large majority of the 20 residues identified as non-optimal by PoPMuSiC in this protein are involved in its enzymatic activity, even though only four of them were recorded in the Catalytic Site Atlas .
We present a web server for the prediction of protein stability changes upon mutations and for the estimation of the optimality of each amino acid in a protein's sequence with respect to the stability of its structure. The prediction performances were evaluated by a 5-fold cross validation procedure, and turned out to be quite impressive for a coarse-grained and very fast prediction method: the correlation coefficient R between predicted and measured ΔΔG s is 0.63 and the root mean square error σ = 1.15 kcal/mol. The performances increase up to R = 0.79 and σ = 0.86 kcal/mol after removal of 10% outliers. PoPMuSiC was also shown to outperform several other prediction tools, on an independent dataset of 350 mutations that were not included in the training sets of the compared methods [23, 29].
A significant advantage of PoPMuSiC-2.1 is that it allows the rapid computation of the stability changes resulting from all single-site mutations in a protein. It can thus be used in the context of rational protein design, to help identify, among the multitude of possibilities, a small number of mutations that are likely to present the desired stability properties. For example, the different versions of PoPMuSiC [15, 29] have been successfully applied by several groups (including us) to predict mutations in the prion protein that stabilize the soluble form and occur in a region that has since then been shown to be determinant for the aggregation tendencies , to modulate the polymerization propensity of α1-antitrypsin , to increase the solubility of a TEV protease by stabilizing the folded state relatively to the aggregated form , to stabilize the folded dark state of a photocontrolled DNA-binding protein in view of modulating the degree of photo-switching , or to identify mutations that stabilize various enzymes, such as pyruvate formate-lyase  or feruloyl esterase . PoPMuSiC has also been used to characterize in silico the effect on stability of specific mutations, in view of rationalizing their impact on a protein of therapeutic interest. The considered mutations were for example naturally occurring variants responsible for the development of hereditary diseases [49–52], mutations related to the acquisition of drug resistance in bacteria , or spanning the natural genetic heterogeneity of a viral protein .
Another consequence of the speed of the predictions is that PoPMuSiC-2.1 can be used to evaluate the optimality of the sequence of a protein with respect to the stability of its structure. This optimality, which is the result of evolution, is shown to be intimately related to the mechanisms of protein function. We indeed applied our prediction method to a large number of enzymes whose catalytic sites have been previously identified and recorded in the Catalytic Site Atlas . Our results indicate that catalytic residues are on average significantly less optimal than other residues, with respect to protein stability. Although the same general trend is observed both on the surface and in the core of proteins, it is much stronger in the core, which is in agreement with previous studies of protein stability.
Obviously, the distinction between catalytic and noncatalytic residues is not perfectly clearcut. According to our predictions, approximately half of the catalytic residues are nonoptimal with respect to protein stability, which means that the other half are totally optimal and thus that all possible mutations of these residues are predicted as destabilizing. This observation indicates that many residues playing an essential role in protein function are not detrimental to stability, which somewhat relativizes the well-known trade-off between stability and function [33, 34, 55]. On the other hand, a number of noncatalytic residues were also identified as not optimal with respect to stability. It is very likely that many of these residues are somehow involved in protein function without being actually part of the catalytic site. They may for instance be close to this site and important to ensure a proper binding affinity or specificity with a ligand, or to generate a sufficient level of structural flexibility [56, 57]. The example of the PRAI:IGPS protein nicely illustrates this point, given that most of the nonoptimal residues highlighted by PoPMuSiC were clustered in the two active sites of this enzyme (Figure 2). Other nonoptimal residues may be present as a result of a compromise between protein stability and other constraints such as the kinetics of the folding or binding process, the prevention of misfolding , or the adjustment of the resistance to proteolysis . Following previously published arguments, we may also note that evolution is a dynamic process, during which protein stability is kept in a near-optimal state by mutations that slightly diminish stability without causing any deleterious effect . Finally, the predictions are of course not perfectly accurate, which may lead to incorrectly label some residues as being not optimal, especially in regions where the structure is poorly defined or subject to crystal constraints.
The inference of protein function is currently often performed through the analysis of sequence conservation data derived from multiple sequence alignments, despite the limitations inherent to this approach . A number of other computational methods have also been developed to identify unknown functional sites in proteins on the basis of structural features [60–62]. These include geometry-based methods such as the detection of pockets or cavities that could accommodate a ligand, energy-based methods such as the identification of sites that interact favorably with various types of probes, and knowledge-based methods that typically involve structural comparisons with datasets or atlases of known functional sites. Some studies were also conducted to investigate the contribution of functional residues to the overall stability of the protein, sometimes with a predictive purpose [9, 63–66]. Recent developments tend to be more focused on the design of prediction schemes integrating various types of information, such as structural attributes and evolutionary sequence conservation, in order to benefit from their complementary [67–69]. To our knowledge, the optimality of the amino acids with respect to protein stability is not explicitly taken into account by any of these integrated methods.
Although further studies are necessary to clarify the relationship between sequence optimality and protein function, our results strongly support the idea that the inclusion of sequence optimality data is likely to improve the performances of methods that aim at identifying unknown catalytic sites or other function-related residues. In addition, such sequence optimality data may be of interest in various types of other applications, such as the assessment of the quality of model protein structures, or the investigation of the evolutionary dynamics of proteins. This data may also provide complementary information to that derived from other prediction tools in view, for instance, of identifying hot spots for molecular recognition [70, 71] or protein aggregation .
Availability and Requirements
Project name: PoPMuSiC
Project home page: http://babylone.ulb.ac.be/popmusic
Operating system: Platform independent (web server)
Programming language: C, PHP/HTML
Any restrictions to use by non-academics: none
This work was supported by the Belgian State Science Policy Office through an Interuniversity Attraction Poles Program (DYSCO), the Belgian Fund for Scientific Research (F.R.S.-FNRS) through an FRFC project, and the Brussels Region (TheraVIP project). YD and MR are Postdoctoral Researcher and Research Director, respectively, at the F.R.S.-FNRS.
- Lippow SM, Tidor B: Progress in computational protein design. Curr Opin Biotechnol 2007, 18: 305–311. 10.1016/j.copbio.2007.04.009PubMed CentralView ArticlePubMedGoogle Scholar
- Damborsky J, Brezovsky J: Computational tools for designing and engineering biocatalysts. Curr Opin Chem Biol 2009, 13: 26–34. 10.1016/j.cbpa.2009.02.021View ArticlePubMedGoogle Scholar
- Ng PC, Henikoff S: Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet 2006, 7: 61–80. 10.1146/annurev.genom.7.080505.115630View ArticlePubMedGoogle Scholar
- Jordan DM, Ramensky VE, Sunyaev SR: Human allelic variation: perspective from protein function, structure, and evolution. Curr Opin Struct Biol 2010, 20: 342–350. 10.1016/j.sbi.2010.03.006PubMed CentralView ArticlePubMedGoogle Scholar
- Yue P, Li Z, Moult J : Loss of protein structure stability as a major causative factor in monogenic disease. J Mol Biol 2005, 353: 459–473. 10.1016/j.jmb.2005.08.020View ArticlePubMedGoogle Scholar
- Bromberg Y, Rost B : Correlating protein function and stability through the analysis of single amino acid substitutions. BMC Bioinformatics 2009, 10: S8.PubMed CentralView ArticlePubMedGoogle Scholar
- Thusberg J, Vihinen M: Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods. Hum Mutat 2009, 30: 703–714. 10.1002/humu.20938View ArticlePubMedGoogle Scholar
- Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C: Thermodynamic prediction of protein neutrality. Proc Natl Acad Sci (USA) 2005, 102: 606–611. 10.1073/pnas.0406744102View ArticleGoogle Scholar
- Sanchez IE, Tejero J, Gomez-Moreno C, Medina M, Serrano L : Point mutations in protein globular domains: contributions from function, stability and misfolding. J Mol Biol 2006, 363: 422–432. 10.1016/j.jmb.2006.08.020View ArticlePubMedGoogle Scholar
- Tokuriki N, Stricher F, Schymkowitz J, Serrano L, Tawfik DS : The stability effects of protein mutations appear to be universally distributed. J Mol Biol 2007, 369: 1318–1332. 10.1016/j.jmb.2007.03.069View ArticlePubMedGoogle Scholar
- Tokuriki N, Tawfik DS : Stability effects of mutations and protein evolvability. Curr Opin Struct Biol 2009, 19: 596–604. 10.1016/j.sbi.2009.08.003View ArticlePubMedGoogle Scholar
- DePristo M, Weinreich DM, Hartl DL : Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet 2005, 6: 678–687. 10.1038/nrg1672View ArticlePubMedGoogle Scholar
- Camps M, Herman A, Loh E, Loab LA : Genetic constraints on protein evolution. Crit Rev Biochem Mol Biol 2007, 42: 313–326. 10.1080/10409230701597642View ArticlePubMedGoogle Scholar
- Bloom JD, Raval A, Wilke CO: Thermodynamics of neutral protein evolution. Genetics 2007, 175: 255–266.PubMed CentralView ArticlePubMedGoogle Scholar
- Gilis D, Rooman M: PoPMuSiC, an algorithm for Predicting Protein Mutant Stability Changes. Application to prion proteins. Protein Eng 2000, 13: 849–856. 10.1093/protein/13.12.849View ArticlePubMedGoogle Scholar
- Kwasigroch JM, Gilis D, Dehouck Y, Rooman M: PoPMuSiC, rationally designing point mutations in protein structures. Bioinformatics 2002, 18: 1701–1702. 10.1093/bioinformatics/18.12.1701View ArticlePubMedGoogle Scholar
- Zhou H, Zhou Y: Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci 2002, 11: 2714–2726.PubMed CentralView ArticlePubMedGoogle Scholar
- Parthiban V, Gromiha MM, Schomburg D: CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Res 2006, 34: W239-W242. 10.1093/nar/gkl190PubMed CentralView ArticlePubMedGoogle Scholar
- Deutsch C, Krishnamoorthy B: Four-body scoring function for mutagenesis. Bioinformatics 2007, 23: 3009–3015. 10.1093/bioinformatics/btm481View ArticlePubMedGoogle Scholar
- Guerois R, Nielsen JE, Serrano L: Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 2002, 320: 369–387. 10.1016/S0022-2836(02)00442-4View ArticlePubMedGoogle Scholar
- Yin S, Ding F, Dokholyan NV: Modeling backbone flexibility improves protein stability estimation. Structure 2007, 15: 1567–1576. 10.1016/j.str.2007.09.024View ArticlePubMedGoogle Scholar
- Potapov V, Cohen M, Schreiber G: Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel 2009, 22: 553–556. 10.1093/protein/gzp030View ArticlePubMedGoogle Scholar
- Johnston MA, Sondergaard C, Nielsen JE: Integrated prediction of the effect of mutations on multiple protein characteristics. Proteins 2011, 79: 165–178. 10.1002/prot.22870View ArticlePubMedGoogle Scholar
- Capriotti E, Fariselli P, Casadio R: I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 2005, 33: W306-W310. 10.1093/nar/gki375PubMed CentralView ArticlePubMedGoogle Scholar
- Cheng J, Randall A, Baldi P: Prediction of protein stability changes for single-site mutations using support vector machines. Proteins 2006, 62: 1125–1132.View ArticlePubMedGoogle Scholar
- Huang LT, Gromiha MM, Ho SY: Sequence analysis and rule development of predicting protein stability change upon mutation using decision tree model. J Mol Model 2007, 13: 879–890. 10.1007/s00894-007-0197-4View ArticlePubMedGoogle Scholar
- Shen B, Bai J, Vihinen M: Physicochemical feature-based classification of amino acid mutations. Protein Eng Des Sel 2008, 21: 37–44.View ArticlePubMedGoogle Scholar
- Masso M, Vaisman II: Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis. Bioinformatics 2008, 24: 2002–2009. 10.1093/bioinformatics/btn353View ArticlePubMedGoogle Scholar
- Dehouck Y, Grosfils A, Folch B, Gilis D, Bogaerts P, Rooman M : Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Bioinformatics 2009, 25: 2537–2543. 10.1093/bioinformatics/btp445View ArticlePubMedGoogle Scholar
- Khan S, Vihinen M: Performance of protein stability predictors. Hum Mutat 2010, 31: 675–684. 10.1002/humu.21242View ArticlePubMedGoogle Scholar
- Zwanzig RW: High-temperature equation of state by a perturbation method. 1. Nonpolar gases. J Chem Phys 1954, 22: 1420–1426.View ArticleGoogle Scholar
- Straatsma TP, Berendsen HJC: Free energy of ionic hydration: Analysis of a thermodynamic integration technique to evaluate free energy differences by molecular dynamics simulations. J Chem Phys 1988, 89: 5876–5886. 10.1063/1.455539View ArticleGoogle Scholar
- Shoichet BK, Baase WA, Kuroki R, Matthews BW: A relationship between protein stability and protein function. Proc Natl Acad Sci (USA) 1995, 92: 452–456. 10.1073/pnas.92.2.452View ArticleGoogle Scholar
- Beadle BM, Shoichet BK : Structural bases of stability-function tradeoffs in enzymes. J Mol Biol 2002, 321: 285–296. 10.1016/S0022-2836(02)00599-5View ArticlePubMedGoogle Scholar
- Dehouck Y, Biot C, Gilis D, Kwasigroch JM, Rooman M: Sequence-structure signals of 3D domain swapping in proteins. J Mol Biol 2003, 330: 1215–1225. 10.1016/S0022-2836(03)00614-4View ArticlePubMedGoogle Scholar
- Ronga L, Palladino P, Ragone R, Benedetti E, Rossi F : A thermodynamic approach to the conformational preferences of the 180–195 segment derived from the human prion protein alpha2-helix. J Pept Sci 2009, 15: 30–35. 10.1002/psc.1086View ArticlePubMedGoogle Scholar
- Dehouck Y, Gilis D, Rooman M : A new generation of statistical potentials for proteins. Biophys J 2006, 90: 4010–4017. 10.1529/biophysj.105.079434PubMed CentralView ArticlePubMedGoogle Scholar
- Gilis D, Rooman M: Predicting protein stability changes upon mutation using database-derived potentials: solvent accessibility determines the importance of local versus non-local interactions along the sequence. J Mol Biol 1997, 272: 276–290. 10.1006/jmbi.1997.1237View ArticlePubMedGoogle Scholar
- Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A: ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucleic Acids Res 2004, 32: D120-D121. 10.1093/nar/gkh082PubMed CentralView ArticlePubMedGoogle Scholar
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucl Acids Res 2000, 28: 235–242. 10.1093/nar/28.1.235PubMed CentralView ArticlePubMedGoogle Scholar
- Porter CT, Bartlett GJ, Thornton JM: The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res 2004, 32: D129-D133. 10.1093/nar/gkh028PubMed CentralView ArticlePubMedGoogle Scholar
- Hubbard SJ, Thornton JM : 'NACCESS'. Computer Program, Department of Biochemistry and Molecular Biology, University College London; 1993.Google Scholar
- Wilmanns M, Priestle JP, Niermann T, Jansonius JN: Three-dimensional structure of the bifunctional enzyme phosphoribosylanthranilate isomerase: indoleglycerolphosphate synthase from Escherichia coli refined at 2.0 A resolution. J Mol Biol 1992, 223: 477–507. 10.1016/0022-2836(92)90665-7View ArticlePubMedGoogle Scholar
- Gilis D, McLennan HR, Dehouck Y, Cabrita LD, Rooman M, Bottomley SP: In vitro and in silico design of alpha1-antitrypsin mutants with different conformational stabilities. J Mol Biol 2003, 325: 581–589. 10.1016/S0022-2836(02)01221-4View ArticlePubMedGoogle Scholar
- Cabrita LD, Gilis D, Robertson AL, Dehouck Y, Rooman M, Bottomley SP: Enhancing the stability and solubility of TEV protease using in silico design. Protein Sci 2007, 16: 2360–2367. 10.1110/ps.072822507PubMed CentralView ArticlePubMedGoogle Scholar
- Fan HY, Morgan SA, Brechun KE, Chen YY, Jaikaran ASI, Woolley GA: Improving a designed photocontrolled DNA-binding protein. Biochemistry 2011, 50: 1226–1237. 10.1021/bi101432pPubMed CentralView ArticlePubMedGoogle Scholar
- Yang DF, Wei YT, Huang RB : Computer-aided design of the stability of pyruvate formate-lyase from Escherichia coli by site-directed mutagenesis. Biosci Biotechnol Biochem 2007, 71: 746–753. 10.1271/bbb.60576View ArticlePubMedGoogle Scholar
- Zhang SB, Wu ZL : Identification of amino acid residues responsible for increased thermostability of feruloyl esterase A from Aspergillus niger using the PoPMuSiC algorithm. Bioresource Technol 2011, 102: 2093–2096. 10.1016/j.biortech.2010.08.019View ArticleGoogle Scholar
- Thusberg J, Vihinen M : Bioinformatic analysis of protein structure-function relationships: case study of leukocyte elastase (ELA2) missense mutations. Hum Mutat 2006, 27: 1230–1243. 10.1002/humu.20407View ArticlePubMedGoogle Scholar
- Thusberg J, Vihinen M : The structural basis of hyper IgM deficiency - CD40L mutations. Protein Eng Des Sel 2007, 20: 133–141. 10.1093/protein/gzm004View ArticlePubMedGoogle Scholar
- Facchiano A, Marabotti A : Analysis of galactosemia-linked mutations of GALT enzyme using a computational biology approach. Protein Eng Des Sel 2010, 23: 103–113. 10.1093/protein/gzp076View ArticlePubMedGoogle Scholar
- Tam LCS, Kiang A-S, Campbell M, Keaney J, Farrar GJ, Humphries MM, Kenna PF, Humphries P: Prevention of autosomal dominant retinitis pigmentosa by systemic drug therapy targeting heat shock protein 90 (Hsp90). Hum Mol Genet 2010, 19: 4421–4436. 10.1093/hmg/ddq369PubMed CentralView ArticlePubMedGoogle Scholar
- Mathys V, Wintjens R, Lefevre P, Bertout J, Singhal A, Kiass M, Kurepina N, Wang XM, Mathema B, Baulard A, Kreiswirth BN, Bifani P : Molecular genetics of para-aminosalicylic acid resistance in clinical isolates and spontaneous mutants of Mycobacterium tuberculosis. Antimicrob Agents Chemother 2009, 53: 2100–2109. 10.1128/AAC.01197-08PubMed CentralView ArticlePubMedGoogle Scholar
- Vallet S, Gouriou S, Nousbaum JB, Legrand-Quillien MC, Goudeau A, Picard B : Genetic heterogeneity of the NS3 protease gene in hepatitis C virus genotype 1 from untreated infected patients. J Med Virol 2005, 75: 528–537. 10.1002/jmv.20302View ArticlePubMedGoogle Scholar
- Tokuriki N, Stricher F, Serrano L, Tawfik DS : How protein stability and new functions trade off. PloS Comput Biol 2008, 4: e10000002.View ArticleGoogle Scholar
- Somero GN: Proteins and temperature. Annu Rev Physiol 1995, 57: 43–68. 10.1146/annurev.ph.57.030195.000355View ArticlePubMedGoogle Scholar
- Wolf-Watz M, Thai V, Henzler-Wildman K, Hadjipavlou G, Eisenmesser EZ, Kern D: Linkage between dynamics and catalysis in a thermophilic-mesophilic enzyme pair. Nat Struct Mol Biol 2004, 11: 945–949. 10.1038/nsmb821View ArticlePubMedGoogle Scholar
- Markert Y, Koditz J, Mansfeld J, Arnold U, Ulbrich-Hofmann R: Increased proteolytic resistance of ribonuclease A by protein engineering. Protein Eng 2001, 14: 791–796. 10.1093/protein/14.10.791View ArticlePubMedGoogle Scholar
- Lichtarge O, Sowa ME: Evolutionary predictions of binding surfaces and interactions. Curr Opin Struct Biol 2002, 12: 21–27. 10.1016/S0959-440X(02)00284-1View ArticlePubMedGoogle Scholar
- Jones S, Thornton JM: Searching for functional sites in protein structures. Curr Opin Chem Biol 2004, 8: 3–7. 10.1016/j.cbpa.2003.11.001View ArticlePubMedGoogle Scholar
- Szilágyi A, Grimm V, Arakaki AK, Skolnick J: Prediction of physical protein-protein interactions. Phys Biol 2005, 2: S1-S16. 10.1088/1478-3975/2/2/S01View ArticlePubMedGoogle Scholar
- Laurie ATR, Jackson RM: Methods for the prediction of protein-ligand binding sites for structure-based drug design and virtual ligand screening. Curr Prot Pept Sci 2006, 7: 395–406. 10.2174/138920306778559386View ArticleGoogle Scholar
- Luque I, Freire E: Structural stability of binding sites: consequences for binding affinity and allosteric effects. Proteins 2000, S4: 63–71.View ArticleGoogle Scholar
- Elcock AH : Prediction of functionally important residues based solely on the computed energetics of protein structure. J Mol Biol 2001, 312: 885–896. 10.1006/jmbi.2001.5009View ArticlePubMedGoogle Scholar
- Cheng G, Qian B, Samudrala R, Baker D: Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design. Nucleic Acids Res 2005, 33: 5861–5867. 10.1093/nar/gki894PubMed CentralView ArticlePubMedGoogle Scholar
- Dessailly BH, Lensink MF, Wodak SJ : Relating destabilizing regions to known functional sites in proteins. BMC Bioinformatics 2007, 8: 141. 10.1186/1471-2105-8-141PubMed CentralView ArticlePubMedGoogle Scholar
- Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA : Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PloS Comput Biol 2009, 5: e10000585.View ArticleGoogle Scholar
- Skolnick J, Brylinski M: FINDSITE: a combined evolution/structure-based approach to protein function prediction. Brief Bioinform 2009, 10: 378–391. 10.1093/bib/bbp017PubMed CentralView ArticlePubMedGoogle Scholar
- Sankararaman S, Kirsch JF, Jordan MI, Sjölander K : Active site prediction using evolutionary and structural information. Bioinformatics 2010, 26: 617–624. 10.1093/bioinformatics/btq008PubMed CentralView ArticlePubMedGoogle Scholar
- DeLano WL: Unraveling hot spots in binding interfaces: progress and challenges. Curr Opin Struct Biol 2002, 12: 14–20. 10.1016/S0959-440X(02)00283-XView ArticlePubMedGoogle Scholar
- Ozbabacan SE, Gursoy A, Keskin O, Nussinov R : Conformational ensembles, signal transduction and residue hot spots: application to drug discovery. Curr Opin Drug Discov Devel 2010, 13: 527–537.Google Scholar
- Conchillo-Solé O, de Groot NS, Avilés FX, Vendrell J, Daura X, Ventura S : AGGRESCAN: a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides. BMC Bioinformatics 2007, 8: 65. 10.1186/1471-2105-8-65PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.