- Research article
- Open Access
Testing the Coulomb/Accessible Surface Area solvent model for protein stability, ligand binding, and protein design
© am Busch et al; licensee BioMed Central Ltd. 2008
Received: 09 October 2007
Accepted: 13 March 2008
Published: 13 March 2008
Protein structure prediction and computational protein design require efficient yet sufficiently accurate descriptions of aqueous solvent. We continue to evaluate the performance of the Coulomb/Accessible Surface Area (CASA) implicit solvent model, in combination with the Charmm19 molecular mechanics force field. We test a set of model parameters optimized earlier, and we also carry out a new optimization in this work, using as a target a set of experimental stability changes for single point mutations of various proteins and peptides. The optimization procedure is general, and could be used with other force fields. The computation of stability changes requires a model for the unfolded state of the protein. In our approach, this state is represented by tripeptide structures of the sequence Ala-X-Ala for each amino acid type X. We followed an iterative optimization scheme which, at each cycle, optimizes the solvation parameters and a set of tripeptide structures for the unfolded state. This protocol uses a set of 140 experimental stability mutations and a large set of tripeptide conformations to find the best tripeptide structures and solvation parameters.
Using the optimized parameters, we obtain a mean unsigned error of 2.28 kcal/mol for the stability mutations. The performance of the CASA model is assessed by two further applications: (i) calculation of protein-ligand binding affinities and (ii) computational protein design. For these two applications, the previous parameters and the ones optimized here give similar performance. For ligand binding, we obtain reasonable agreement with a set of 55 experimental mutation data, with a mean unsigned error of 1.76 kcal/mol with the new parameters and 1.47 kcal/mol with the earlier ones. We show that the optimized CASA model is not inferior to the Generalized Born/Surface Area (GB/SA) model for the prediction of these binding affinities. Likewise, the new parameters perform well for the design of 8 SH3 domain proteins, where an average of 32.8% sequence identity relative to the native sequences was achieved. Further, the computed sequences have the character of naturally occurring homologues of the native sequences.
Overall, the two CASA variants explored here perform very well for a wide variety of applications. Both variants provide an efficient solvent treatment for the computational engineering of ligands and proteins.
Solvation effects play an important role in protein folding and stability. Likewise, the processes of protein:protein and protein:ligand binding are accompanied by effects such as desolvation and rearrangement of solvent molecules. These solvation effects can be calculated by explicit solvent models, such as molecular dynamics simulations using a sphere of water molecules . For certain large-scale applications, however, this explicit solvent treatment is too time-consuming. In protein design, an enormous number of amino acid sequences need to be considered . Likewise, the screening of a large library of ligand molecules as a function of their binding affinity to a protein is a costly procedure and requires efficient methods. To avoid these problems, implicit solvent models are often used, which yield significant computational efficiency . They do not consider the solvent degrees of freedom explicitly, but treat the solvent as a continuous medium having the average properties of the real solvent. Empirical methods, such as the solvent accessible surface area (ASA) model , often provide simple and quick ways of evaluating the solvation energy with an accuracy comparable to theoretical models. ASA models have become widely accepted within available implicit solvent treatments and have been used successfully in many applications, such as protein molecular dynamics [5–7], structure prediction  and protein:ligand binding [9, 10]. In the ASA approach, the solvation free energy of a solute is expressed as a sum of atomic contributions, weighted by their solvent-exposed area. The contribution of each atom is quantified by a surface coefficient, which reflects the hydrophobicity or hydrophilicity of the particular atom type.
Apart from non-polar contributions to the solvation free energy, such as the entropy cost for cavity formation and van der Waals interactions, electrostatic contributions play an important role. Due to the fitting of the ASA model to experimental data, the electrostatic contribution is partly incorporated into the parameters. However, especially when using a small number of atom types, it is necessary to additionally calculate a screening energy, which accounts for the shielding of protein-protein electrostatic interactions by the high dielectric solvent. A simple approach is to add a term that reduces the electrostatic interactions between protein atoms by a constant factor, ε, which plays the role of a dielectric constant. To balance the two components of the solvent model, an overall weight α is applied to the ASA term. This combined model is known as the Coulomb/Accessible Surface Area (CASA) model . More accurate approaches for screening energy calculations are the Generalized Born (GB) model [11, 12] and the Poisson-Boltzmann (PB) model [13–15]. A disadvantage of these methods, however, is that they are not easily pairwise decomposable : the energy is not expressed as a sum over atom pairs. Further, they are relatively time-consuming compared with surface area based models.
The first set of atomic solvation parameters distinguishing between different atom types was developed by Eisenberg and McLachlan in 1986. They used octanol to water transfer energies for 20 amino acids to derive solvation coefficients for 5 atom types. Subsequently, a number of studies have been devoted to the parameterization of the atomic surface coefficients. They differ in the assignment of atoms to characteristic groups and in the experimental data that were used to fit the coefficients. Ooi et al. used 7 different atomic coefficients, fitted to experimental free energies of solvation of small organic molecules. Fraternali and van Gunsteren restricted the atom types to only two: one for carbon, representing the hydrophobic effect, and one for both nitrogen and oxygen, representing the hydrophilic effect. These two parameters were optimized such that the hydrophobic and hydrophilic solvent-accessible surface areas in proteins obtained from MD simulations matched those measured from the corresponding X-ray structures. Later, several more highly parameterized ASA models [10, 18, 19] were developed, which use up to 100 different atom types and large training sets of experimental solvation free energies for diverse organic molecules.
In previous work , we optimized the CASA model for side chain placement and mutagenesis. We modified the atomic solvation parameters of Fraternali  by including an additional surface coefficient for atoms in charged groups, and by optimizing the dielectric constant ε and the weight α of the surface term. The model was optimized and tested using sidechain reconstruction calculations, protein solvation energies, and stability changes for point mutations involving the insertion or removal of a charged sidechain. In this paper, we continue to explore the performance of the CASA model, pursuing two directions. First, we consider whether a specific treatment of aromatic groups can lead to an improved parameterization. Second, we consider a broader set of test calculations than before. We again consider stability changes, but we include a wider variety of mutations, most of which do not affect charged groups. We consider the calculation of protein:ligand binding free energy changes due to point mutations, a very important application. So far, most ligand binding studies using ASA models used a large set of atomic surface coefficients. Pei et al , for example, used 100 atom types and reproduced the binding free energies of a test set of 50 protein:ligand complexes with a standard error of 2.0 kcal/mol. Our own model is much less heavily parameterized, but yields a comparable accuracy.
Finally, we perform automated protein design for eight small proteins of the SH3 family. Computational protein design is another area that requires good implicit solvent models [2, 21, 22]. This approach can help engineer new proteins as well as predict protein structure. It considers a given backbone structure of a protein and predicts the amino acid sequences that fold into it [23–30]. This information can be used either to identify mutations that stabilize a given structure or to assign the given 3D structure to new sequences with yet undetermined protein structures. The protein design procedure is applied here to a small test set of eight SH3 proteins and the performance of the optimized solvation parameters is compared to that of the earlier parameter set.
The newer CASA parameterization derived here treats aromatic groups as a specific group. All the model parameters are then reoptimized from scratch. Most studies so far have used experimental transfer free energies from octanol or cyclohexane to water for small model molecules to derive solvation parameters. However, organic solvents are only a crude approximation of the protein interior . A more recent study by Zhou et al.  used a database of protein mutation experiments to develop atomic solvation parameters. With these parameters, the binding free energies of 21 protein-protein complexes were predicted with an rms deviation of 2.3 kcal/mol. Likewise, Lomize et al.  achieved very good agreement with experimental data using atomic solvation parameters based on protein stabilities. Here, we use a similar approach and derive our newer solvation parameters from experimental protein and peptide stability changes. We employ a procedure that attempts to match the computed stability changes to the experimental values, using a set of 140 mutations. Simultaneously, the model for the unfolded reference state of a protein is optimized: for each amino acid type, a preferred unfolded conformation is chosen from a large library of tripeptide fragments obtained from various proteins.
The final parameter set yields improved performance for protein stabilities, as expected. The newer and the earlier parameters yield comparable, good performance for ligand binding and protein design. Given the importance of implicit solvent models in the fields of structure-based drug design, prediction of protein structure and protein design, both of our optimized CASA models should be valuable tools for a wide range of applications.
CASA parameter optimization and stability calculations
Atomic solvation parameters (kcal/mol/Å2) for different atom types
To compute the stability of a protein, it is necessary to construct a model for the unfolded state. Here, we assume that sidechains do not interact with each other in the unfolded state, and we describe the environment of each sidechain with a simple tripeptide model. Each sidechain thus interacts only with the local tripeptide backbone and the solvent. For each amino acid type, one preferred structure was chosen from a large set of tripeptide conformations obtained from various proteins.
Rms and mean unsigned error (kcal/mol) for stability mutations
In addition to the criterion of minimal deviation from the experimental data, care was taken to select a parameter set that makes sense physically. The ordering of the coefficients should follow the expected preference of solvent exposure of each atom group: unpolar < aromatic < polar < ionized. Further, the coefficients of different groups should be sufficiently separated from each other. The surface coefficient of the unpolar atom group was allowed to be slightly negative, as this led to better agreement with the experimental data. To ensure that this does not lead to unphysical behaviour, the surface coefficient was tested on a propane dimer system. Studies on methane and neopropane association in water report an association free energy of -1.0 and -2.7 kcal/mol, respectively [35, 36]. Using values between 0.0119 and -0.01 kcal/mol/Å2 for the nonpolar surface coefficient, the computed association free energy varies from -2.75 to +0.09 kcal/mol. The nonpolar coefficient of -0.005 kcal/mol/Å2 used here gives an acceptable association free energy of -0.56 kcal/mol.
To verify that the parameters were not overly biased by the optimization procedure, we performed cross-validation tests (see Methods). Part of the data (30 mutations) were omitted from the optimization process, then used to test the error level. This led to very similar parameters and errors compared to the original parameter optimization. Specifically, one cross-validation run led to the following optimized parameters (in kcal/mol/Å2; the PHIA values obtained above are given in parentheses): aromatic: -0.08 (-0.04); ionic: -0.10 (-0.10); polar: -0.09 (-0.08); nonpolar: -0.005 (-0.005). The optimal dielectric was 24, as before. The mean and rms errors for the omitted data were 2.22 and 3.13 kcal/mol, respectively, compared to 2.11 and 2.70 for the optimization data. The mean and rms errors with the PHIA parameters for the omitted data were similar: 2.32 and 2.92 kcal/mol. The other cross-validation run led to the exact same atomic coefficients and dielectric constant, and to mean and rms errors for the omitted data of 2.08 and 2.68 kcal/mol (compared to 2.16 and 2.71 kcal/mol with the PHIA set). Thus the optimized parameters and the error levels are similar with and without cross-validation, showing that the resulting parameters are fairly robust.
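The error measures and the hold-out split used in such a cross-validation test can be sketched as follows. This is a minimal illustration, not the protocol of the Methods: the function names, the random split and the seed are assumptions introduced here, and no data from the study are reproduced.

```python
import math
import random


def mean_unsigned_error(computed, experimental):
    """Average of |computed - experimental| over all mutations."""
    return sum(abs(c - e) for c, e in zip(computed, experimental)) / len(computed)


def rms_error(computed, experimental):
    """Root-mean-square deviation between computed and experimental values."""
    squared = sum((c - e) ** 2 for c, e in zip(computed, experimental))
    return math.sqrt(squared / len(computed))


def holdout_split(n_items, n_holdout, seed=0):
    """Return (training indices, omitted indices) for one cross-validation run.

    The random choice of omitted mutations is an assumption of this sketch.
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    return sorted(indices[n_holdout:]), sorted(indices[:n_holdout])


# 140 stability mutations, 30 of which are left out of the optimization.
train_idx, test_idx = holdout_split(140, 30)
```

The parameters would then be optimized on `train_idx` only, and both error measures evaluated separately on `train_idx` and `test_idx`.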
The model performance shows a moderate dependence on the system considered. The peptide models taken alone lead to a slightly better agreement with the experimental data (Table 2), which might be due to their simple helical structure compared with a large protein system (see also Figure 2). The performance of the uncharged mutations is better than that of the charged ones (Table 2). This occurs partly because the proportion of charged mutations is higher for the proteins (mostly charged mutations).
Compared with the results of Lomize et al. , which gave an rms deviation from experimental stability changes of only 0.41 kcal/mol, the performance here is less good. In their study, however, the solvation parameters were fitted to a more restricted set of experimental data. Only buried, uncharged residues in α-helices and β-sheets of proteins were used. In contrast, in our study, both charged and uncharged, solvent-exposed and buried, and protein and peptide mutations were included. Therefore, it can be expected that our more varied set of mutations results in a higher deviation from experiment.
The performance of both the earlier (MF) and the new (PHIA) solvation parameters was assessed on a large set of experimental binding affinities and resistance mutations. These data include 80 mutations in six different ligand-protein systems: (i) tyrosyl-tRNA synthetase in complex with tyrosine (TyrRS); (ii) aspartyl-tRNA synthetase in complex with aspartate (AspRS); (iii) lysozyme in complex with the antibody HyHel-10 (Lyso); (iv) the complex of the transmembrane glycoprotein CD4 with the gp120 component of HIV (CD4); (v) the BPTI:trypsin complex; and (vi) the BPTI:chymotrypsin complex. A seventh system was also studied: the tyrosine kinase Abl in complex with the drug imatinib. For this system, information is only available on mutations leading to resistance to imatinib, while for the other six systems, more precise values of the binding affinities are available. The energy function was slightly different from the one used for the stability mutations above. Here, the dielectric constant ε (Eq. 2) was set to 16 and the weight α of the surface area term was set to 0.5 (instead of ε = 24, α = 1, above). These values gave improved performance for the binding affinities.
Mean error (kcal/mol) for the binding free energies with CASA and GB/SA
Results were also computed for two other systems, the BPTI:trypsin  and BPTI:chymotrypsin  complexes. 13 mutations at position 15 in BPTI were studied in each case. The native residue is a lysine. One mutation was excluded (K15W in BPTI:trypsin) due to a large van der Waals contribution to the affinity change (see Methods), leaving 25 mutations. With the simple protocol used above, the agreement with experiment was poorer, with a mean error of 3.4 kcal/mol for the 25 mutations. A slightly different protocol was then tried. The entire BPTI was minimized, instead of just the mutated position (see Methods). This led to improved agreement, with mean errors of 2.68 and 2.81 kcal/mol for BPTI:trypsin and BPTI:chymotrypsin, respectively: about the same level as for the lysozyme:antibody complex. This illustrates the need for more extensive structural relaxation with some mutations. More generally, these two examples show that, not surprisingly, with the simple energy functions and conformational exploration used here, a certain amount of system-specific parameter fitting and adjustment can be necessary.
Binding free energy differences for the ABL:imatinib complex with CASA and GB/SA
Several mutations are badly predicted, including R59A in the CD4:gp120 system and D101G and K96M in the lysozyme:antibody system (Figure 3). The first two cases involve a large-to-small mutation, and all three involve removal of a net charge; these processes might require a more extensive rearrangement of the surrounding residues, as in the BPTI complexes (where the native residue was a lysine in all 25 mutations). In the simple protocol employed here, the rotamers of the side chains in the vicinity of the mutation are not reoptimized, and the slight minimization carried out here might not be sufficient to model a realistic mutated protein conformation. For the K96M case, we tried the same protocol as for the BPTI cases, applying 50 steps of Powell minimization to the entire lysozyme protein, instead of just the mutated sidechain. This led to a computed binding free energy change of +4.5 kcal/mol, much closer to the experimental value of 7.9 kcal/mol. For one other case, the D78A mutation in TyrRS, we employed a much more extensive conformational search: all the rotamers close to the mutation site were explored using a stochastic search strategy; the error in the binding free energy decreased from 1.9 kcal/mol to 0.8 kcal/mol (not shown). These examples suggest that when the sidechain charge and/or volume changes substantially, a more sophisticated conformational sampling is required. More work in this direction is underway.
On the whole, however, we obtain fair agreement with experimental data using a very simple method. The overall mean unsigned error for the 80 mutations is 1.96 kcal/mol; the rms error is 2.73 kcal/mol. The correlation between the computed and measured data is 73%. The rank ordering of the data is characterized by a Spearman rank correlation of 70% ; the probability of obtaining this value by chance is less than 0.001 (according to Student's test with a t-value of 8.59 and 80 degrees of freedom ). Our errors are only slightly higher than those reported by Pei et al., who obtained an rms error of 2.0 kcal/mol for the binding affinities of 50 protein-ligand complexes. Likewise, the accuracy obtained here appears comparable to that achieved by Guérois et al.  for their charged mutations, although this subset was not reported separately. Recently, Handel et al.  predicted the stabilities of more than 1500 mutants to within 1 kcal/mol. Partly, this improvement might be due to their more sophisticated, all-atom, force field, which increases accuracy but makes their calculations almost an order of magnitude more expensive, compared with the CHARMM19/CASA level (the increased cost being due partly to the explicit treatment of all hydrogens and partly to the need for a more detailed rotamer library). Further, our data set contains a higher percentage of charged mutations, which makes predictions more demanding.
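The significance test quoted for the rank correlation can be illustrated with the usual large-sample approximation, in which a correlation coefficient r obtained from n data points is converted to a t statistic, t = r·sqrt((n − 2)/(1 − r²)). The sketch below assumes this textbook formula with n − 2 degrees of freedom; the exact procedure used for the value reported above may differ in detail, and the function name is introduced here for illustration only.

```python
import math


def rank_correlation_t(r, n):
    """t statistic for a (rank) correlation coefficient r from n data points.

    Uses t = r * sqrt((n - 2) / (1 - r**2)), the standard approximation
    for testing whether the correlation differs from zero.
    """
    if n < 3:
        raise ValueError("at least three data points are needed")
    if abs(r) >= 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Rounded values from the text: Spearman coefficient 0.70 for 80 mutations.
t_value = rank_correlation_t(0.70, 80)
print(f"t = {t_value:.2f}")
```

With the rounded inputs 0.70 and 80, the statistic comes out near 8.7, of the same size as the t-value of 8.59 quoted above; a small difference of this kind would be consistent with rounding of the coefficient.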
As an additional reference, the performance of the optimized CASA models was compared to that of a GB/SA model, which should provide a more accurate treatment of the electrostatic contribution to the solvation free energy. Only the 55 mutations for systems (i)-(iv) were studied. Two variants of the GB model were tested: (i) GB-HCT in combination with the Amber all-atom force field and (ii) GB-ACE [45, 46] with the CHARMM19 force field. For the GB-HCT variant, the overall quality of the results is close to the CASA level, with a slightly higher mean error (1.96 kcal/mol, Table 3). The GB-ACE variant gives notably higher mean errors for all protein-ligand systems. This might be due to the GB-ACE parameterization, which is not optimized for this specific application. The GB-HCT parameters were optimized previously for computational sidechain placement and protein mutagenesis. Some differences between GB-HCT and CASA might be due to the different force field treatments, as Amber uses explicit hydrogens on all atoms, while CHARMM19 uses implicit hydrogens for unpolar atoms. The qualitative agreement with the experimental resistance mutations for Abl:imatinib for the two GB variants is comparable to that obtained using the optimized CASA models (Table 4).
In protein design, the amino acid side chains are mutated and sequences are selected to optimize the folding free energy, using a heuristic search algorithm. In our previous protein design study, we obtained good results for 16 different globular proteins using the MF solvation parameters. Here, we consider a subset of those proteins, consisting of 8 SH3 domains. These proteins are used to assess the performance of the new, PHIA solvation parameters for protein design. The protein design calculations were carried out using our Proteins@Home distributed computing platform, with the help of volunteers in several countries. Proteins@Home is described in more detail elsewhere.
Mean identities (%) for the computed sequences, obtained with the MF (ε = 10) and PHIA (ε = 14) parameters
Blosum scores for computed and natural sequences
With both the PHIA and the MF parameters, the identity scores obtained for the 8 proteins lie within the range of published average identity scores for redesigned proteins [27, 28, 41, 50, 51]. In a protein design study by Jaramillo et al.  sequence optimizations for 11 SH3 domains were performed and resulted in an average sequence identity of 23.9%. Our energy-ranked PHIA sequences lie well above this score, with a sequence identity of 32.8% averaged over all 8 proteins. Recently, Saunders et al.  used a refined protein design method and reported sequence identities as high as 37% for 42 globular proteins. Considering the full sequence identities of our best-scoring sequences, both our parameter sets give results that lie close to this value. Pokala and Handel  used an all-atom force field and a GB/SA solvent model to redesign 8 proteins and achieved somewhat higher sequence identities, between 33.5 and 46.7%. This approach, however, includes a negative design criterion, which constrains the surface amino acid composition of the proteins to be native-like. It also leads to an increase in computational effort of about two orders of magnitude, compared with the CASA solvent model and a united-atom force field such as Charmm19.
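The identity scores compared above can be computed position by position when the designed and native sequences share the same backbone and therefore the same length. The following sketch assumes this fixed-backbone situation, so that no sequence alignment is needed; the function name and the example sequences are hypothetical and are not taken from the SH3 test set.

```python
def percent_identity(designed, native):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(designed, native) if a == b)
    return 100.0 * matches / len(native)


# Hypothetical 8-residue example: 6 of 8 positions are identical.
score = percent_identity("ALYDYEAR", "ALYDFEAK")
print(f"{score:.1f}% identity")
```

Averaging this score over the computed sequences of each protein, and then over the proteins, gives mean identities of the kind quoted above.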
Simple, efficient, solvent models are of great importance in protein modelling and structural bioinformatics. Here, we have continued to explore the performance of the CASA solvent model, in two directions. First, we considered a variant of increased complexity, where aromatic atoms are treated as a separate group. Our previous approach  did not distinguish between unpolar and aromatic atoms. Indeed, the solvation properties of aromatic groups are rather different from other nonpolar groups found in proteins. For this variant, we reparameterized the model completely, leading to the PHIA parameter set. Second, we applied the CASA model to a wider set of applications than previously. We considered protein stability changes associated with point mutations, including all amino acid types (in contrast to our previous work ). We also considered protein:ligand binding, an especially important application. Finally, we performed complete protein redesign with the new parameters.
For the new, PHIA parameterization, four different atom types were considered for the atomic surface coefficients: unpolar, aromatic, polar and ionized atoms. The solvation parameters were fitted to a set of 140 experimental stability changes for protein and peptide mutations. Protein stabilities were calculated using an unfolded reference state modelled by a collection of tripeptide structures. These reference structures were taken from a large library of structural fragments from six different proteins. Starting from the earlier, MF solvation parameters [6, 20], an iterative procedure was employed, which optimizes, at each cycle, both the solvation parameters and the model for the unfolded reference state of a protein. Atomic parameters were chosen that gave a minimal deviation from the experimental data and also represent the expected relative hydrophobicities of the atom groups. The selected parameter set gives a mean unsigned error of 2.28 kcal/mol for the 140 stability mutations, compared to 3.56 kcal/mol with the MF parameters. Cross-validation tests gave similar parameter values and similar error levels.
For the binding calculations, over 50 experimental mutations in 5 different protein-ligand systems were used, including both small molecule ligands and protein-protein complexes. The calculated differences in binding free energy are in reasonable agreement with the experimental data, with both CASA variants. The mean unsigned error is 1.76 kcal/mol with the PHIA parameters and 1.47 kcal/mol with the MF parameters. It was also shown that the optimized CASA models are not inferior to methods such as GB-ACE/SA or GB-HCT/SA, which treat the electrostatic contribution to solvation more accurately. Two additional protein-protein complexes (BPTI:trypsin, BPTI:chymotrypsin) required a slightly modified protocol to give comparable error levels.
Protein design was carried out for eight SH3 domain proteins and the performance of the PHIA parameters was compared with that of the MF parameters. On average, slightly lower sequence identities with the native sequence were achieved using the new, PHIA parameters. Differences, however, are small, and the performance depends on the particular protein. On the whole, both parameter sets give sequences of comparable quality, which are competitive with recently published results for designed proteins. Further, the computed sequences were found to have the character of naturally occurring, distant homologues of the native sequences.
Overall, the CASA model performs well for a wide variety of applications. The PHIA parameters, specifically optimized here for protein stability, give a distinct improvement over the earlier, MF parameters for this application. For ligand binding and protein design, the specific treatment of aromatic groups in the PHIA parameterization did not lead to an improvement in performance. Rather, the two parameter sets perform well for both applications, with the exact relative performance depending on the particular system. Both variants provide an efficient tool for the computational engineering of ligands and proteins.
Effective energy function
The effective free energy function we used in our calculations takes the following form:
$$E = E_{\mathrm{bonds}} + E_{\mathrm{angl}} + E_{\mathrm{dihe}} + E_{\mathrm{impr}} + E_{\mathrm{vdW}} + E_{\mathrm{Coul}} + E_{\mathrm{solv}} \qquad (1)$$
The first six terms represent the protein internal energy and are taken from the CHARMM19 empirical energy function: a covalent bond energy term, a bond angle energy term, a torsion energy term, an improper dihedral energy term which maintains the chirality or planarity of certain atom centres, a van der Waals energy term and a Coulomb electrostatic energy term. The last term, $E_{\mathrm{solv}}$, models the effect of the solvent, and represents either a CASA term or a GB term in this study. When using the GB variant HCT for the solvent term, force field parameters for the energy function were taken from Amber (see below).
Coulomb/Accessible Surface Area (CASA) model
where $A_i$ is the exposed solvent accessible surface area of atom $i$, and the summation is over all atoms in the solute; $\sigma_i$ (measured in kcal/mol/Å2) is a parameter that depends on the nature of atom $i$ and reflects each atom's preference to be exposed or hidden from solvent; $\alpha$ is an overall weight applied to the surface energy term. Surface areas were computed by the Lee and Richards algorithm, implemented in the XPLOR program, using a 1.5 Å probe radius. The solute atoms were divided into 4 groups with characteristic surface coefficients $\sigma_i$: unpolar, aromatic, polar and ionic. Hydrogen atoms were assigned a surface coefficient of 0. The weight $\alpha$ was not optimized during the parameter scans (fixed to 1) but was adjusted in subsequent applications. For the protein design calculations (see below), a value of 1 was used. For the ligand-binding calculations (below), a value of 0.5 worked best. The dielectric constant was optimized in the parameter scans, with a value of 24 working well for the stability mutations. Values of 16 and 14 were used, respectively, for the ligand-binding and protein design calculations.
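The two components of the CASA term described above can be sketched in a few lines. The surface part, α Σ σ_i A_i, follows directly from the definitions given; the screening part is written here as (1/ε − 1) E_Coul, which is an assumption of this sketch: it is the form that reduces the Coulomb term of equation (1) by the constant factor ε when the two are added. The coefficients are the PHIA values quoted in the cross-validation paragraph, while the function names and the example atoms are hypothetical. Surface areas are taken as given inputs rather than computed.

```python
# PHIA surface coefficients in kcal/mol/A^2, as quoted in the text.
SIGMA_PHIA = {
    "nonpolar": -0.005,
    "aromatic": -0.04,
    "polar": -0.08,
    "ionic": -0.10,
    "hydrogen": 0.0,  # hydrogens carry no surface contribution
}


def surface_energy(atoms, sigma, alpha=1.0):
    """alpha * sum_i sigma_i * A_i for a list of (atom_type, area) pairs."""
    return alpha * sum(sigma[atom_type] * area for atom_type, area in atoms)


def screening_energy(e_coul, epsilon):
    """Assumed screening correction: (1/epsilon - 1) * E_Coul."""
    return (1.0 / epsilon - 1.0) * e_coul


def casa_solvation_energy(atoms, e_coul, sigma=SIGMA_PHIA, alpha=1.0, epsilon=24.0):
    """Solvent term as screening correction plus weighted surface term."""
    return screening_energy(e_coul, epsilon) + surface_energy(atoms, sigma, alpha)


# Hypothetical solute: two atoms with given exposed areas (A^2) and a
# vacuum Coulomb energy of -24 kcal/mol.
example_atoms = [("polar", 10.0), ("nonpolar", 20.0)]
e_solv = casa_solvation_energy(example_atoms, e_coul=-24.0)
```

The defaults ε = 24 and α = 1 correspond to the stability calculations; passing `epsilon=16.0, alpha=0.5` would reproduce the settings reported for ligand binding.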
Generalized Born/Surface Area (GB/SA) model
where $E_{\mathrm{GB}}$ is the GB term, consisting of a self-energy term and an interaction term as described elsewhere [14, 20]; $A_i$ is the exposed solvent accessible surface area of atom $i$, and the summation is over all atoms in the solute. In effect, a single surface coefficient $\sigma$ is used for all atom types.
Calculation of stability changes
where $G_{\mathrm{mut}}$ and $G_{\mathrm{nat}}$ are the free energies of the mutant and native protein, respectively, and the two remaining terms are the free energies of the unfolded reference state for the mutant and native protein, respectively. The free energy of each state is evaluated using the effective energy function given in equation (1), with the solvent contribution being represented by a CASA term. The nonbonded interactions were cut off at a distance of 10 Å between atoms, using a shifting and a switching function for electrostatic and van der Waals interactions, respectively.
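The four free energies defined above combine into a stability change through the usual thermodynamic cycle. The sketch below assumes the common sign convention in which the stability change is the folded-state difference minus the unfolded-state difference, so that a positive value means the mutant is destabilized under this convention; the function name, argument names and numerical values are illustrative assumptions.

```python
def stability_change(g_mut_folded, g_nat_folded, g_mut_unfolded, g_nat_unfolded):
    """Stability change of a point mutation from four state free energies.

    Assumed convention:
        ddG = (G_mut - G_nat)_folded - (G_mut - G_nat)_unfolded
    All values are in kcal/mol.
    """
    folded_difference = g_mut_folded - g_nat_folded
    unfolded_difference = g_mut_unfolded - g_nat_unfolded
    return folded_difference - unfolded_difference


# Hypothetical energies: the mutation costs 5 kcal/mol in the folded state
# but only 2 kcal/mol in the unfolded reference state.
ddg = stability_change(-100.0, -105.0, -20.0, -22.0)
print(f"ddG = {ddg:+.1f} kcal/mol")
```

Because only differences enter, contributions that are identical in the mutant and native states cancel and need not be evaluated.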
The native structures for the folded state were taken from the Protein Data Bank (PDB)  with the structure codes 2LZM, 2RN2 and 1STN, respectively. The side chains were slightly minimized prior to any energy evaluation. For the peptide mutations, experimental structures are not available, and models were built using the SwissPDB viewer , which constructs an ideal α helix from a given sequence. After side chain minimization of the helix models, the structures were treated identically to the protein structures.
The corresponding mutant protein and peptide structures were created by replacing the side chain at the mutated position with the mutant side chain while maintaining all other atom coordinates. The coordinates for the mutant side chain were taken from the Tuffery rotamer library. For each rotamer, the side chain was minimized (30 steps of Powell minimization), with the backbone fixed, and the energy of the mutant protein was evaluated. The mutant side chain rotamer giving the lowest energy for the mutant protein was retained. For the native structures, the side chain conformation at the position to be mutated was kept as in the crystal structure and only subjected to a short minimization (30 steps). In some cases, unfavourable van der Waals contacts in the proteins occurred upon mutation. These data were considered as outliers and were not used for parameter adjustment (see below).
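The rotamer selection step described above amounts to a simple loop: relax each library rotamer, evaluate the energy, and keep the best one. The sketch below shows only this control flow. The `minimize` and `energy` callables stand in for the 30-step Powell minimization and the effective energy function, which are not implemented here, and all names as well as the toy one-dimensional energy are hypothetical.

```python
def best_rotamer(rotamers, minimize, energy):
    """Return (relaxed_rotamer, energy) for the lowest-energy rotamer.

    rotamers : iterable of candidate side chain conformations
    minimize : callable that relaxes one conformation (backbone fixed)
    energy   : callable that returns the energy of a relaxed conformation
    """
    best = None
    for rotamer in rotamers:
        relaxed = minimize(rotamer)
        value = energy(relaxed)
        if best is None or value < best[1]:
            best = (relaxed, value)
    if best is None:
        raise ValueError("the rotamer list is empty")
    return best


# Toy example: rotamers are single chi angles (degrees), no relaxation is
# applied, and the energy is a harmonic well centred at 170 degrees.
chosen, chosen_energy = best_rotamer(
    [60.0, 180.0, -60.0],
    minimize=lambda chi: chi,
    energy=lambda chi: (chi - 170.0) ** 2,
)
```

Keeping the energy evaluation behind a callable makes it straightforward to flag rotamers with unfavourable van der Waals contacts before the selection, if such a filter is wanted.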
The experimental stability changes for the protein mutations were taken from the ProTherm database . For the peptide stability changes, experimental helix propensity scales were used. The sequences of the various peptide systems are given below, with the mutated position denoted by X:
pepT1: SSDVSTAQXAAYKLHED,
KEAKE: YEAAAKEAXAKEAAAKA,
K2AE2: YSEEEEKAKKAXAEEAEKKKK,
VAR: KETAAAKFERQHMDS,
PAD: YKAAAAKAAXAKAAAAK,
KAL: YSEEEEKKKKXEEEEKKKK,
SH1: AETAAAKFERQHM,
SH2: KETAAAKFERAHA
In the unfolded state of a protein, it is assumed that amino acid side chains do not interact with each other, but only with nearby backbone groups and with solvent. This situation can be modelled by a collection of n tripeptide structures with the sequence Ala-X-Ala, where n is the number of amino acids in the protein. For each amino acid type X, a number of possible structures with different backbone and side chain conformations were considered. These structures were extracted from various positions in the X-ray structures of 6 different proteins taken from the PDB: lysozyme (2LZM), bovine pancreatic trypsin inhibitor (4PTI), staphylococcal nuclease (1STN), α-toxin (1PTX), ribonuclease A (2RN2) and cyclophilin (2CPL). In each tripeptide structure, the side chain X was slightly minimized with respect to itself and the backbone of the whole tripeptide. To choose the optimal tripeptide structure for each amino acid type, the interaction between the respective side chain and the tripeptide backbone served as the criterion. Thus, for each amino acid type X, the tripeptide structure giving the lowest interaction energy was taken to represent the preferred structure for X in the unfolded state. The total free energy of the unfolded state is obtained by summing the contributions, E_X, of the n individual amino acids of the protein. When comparing the folding free energies of two sequences, only sidechain-sidechain and sidechain-backbone interactions are taken into account. Interactions between different portions of the backbone cancel, both in the folded and the unfolded state, so that no important interactions are missed through the tripeptide unfolded model.
Since the choice of the reference structure for each amino acid depends on the set of parameters used in the CASA model, the solvent parameters and tripeptide structures had to be optimized iteratively (Figure 1). As a starting point for optimization, we took the surface parameters developed by Fraternali and van Gunsteren, supplemented by an additional surface coefficient for ionic atoms. In this initial parameter set, aromatic atoms were assigned the same surface coefficient as unpolar atoms.
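The tripeptide reference model above can be illustrated with a minimal Python sketch. It assumes the usual sign convention, stability change = (folded mutant minus folded native) minus (unfolded mutant minus unfolded native), and it treats each candidate tripeptide as a single energy value. The function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def pick_reference_energies(candidates: Dict[str, List[float]]) -> Dict[str, float]:
    """For each amino acid type X, keep the lowest energy among its
    candidate Ala-X-Ala tripeptide structures (hypothetical input format)."""
    return {aa: min(energies) for aa, energies in candidates.items()}


def unfolded_energy(sequence: str, e_ref: Dict[str, float]) -> float:
    """Unfolded-state energy as a sum of per-residue contributions E_X."""
    return sum(e_ref[aa] for aa in sequence)


def stability_change(g_fold_mut: float, g_fold_nat: float,
                     seq_mut: str, seq_nat: str,
                     e_ref: Dict[str, float]) -> float:
    """Assumed convention: (folded mutant - folded native)
    minus (unfolded mutant - unfolded native)."""
    folded_diff = g_fold_mut - g_fold_nat
    unfolded_diff = unfolded_energy(seq_mut, e_ref) - unfolded_energy(seq_nat, e_ref)
    return folded_diff - unfolded_diff
```

For a single point mutation, the unfolded-state difference reduces to E_X of the mutant residue minus E_X of the native residue, since all other positions contribute identically to both sums. With made-up candidate energies such as `{"A": [-1.0, -2.0], "V": [-3.5, -3.0], "L": [-4.0]}`, the references chosen are -2.0, -3.5 and -4.0 for A, V and L.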
The atom groups were assigned as follows: (i) unpolar: all alkane carbons, the carbonyl carbons of the protein backbone, and S; (ii) aromatic: Trp, Phe and Tyr aromatic ring carbons and nitrogens; (iii) polar: N/O atoms not belonging to ionized groups, N-C-N group in the His ring; (iv) ionic: guanidinium group of Arg, carboxyl group of Asp/Glu, N-C-N group in the ring for protonated His.
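As an illustration of how the four atom groups enter a surface-area term, the following Python sketch sums a group-specific coefficient times the exposed area of each atom. It covers only the surface contribution, not the full energy function of equation (1). The name `surface_energy`, the input format, and the coefficient values are placeholders chosen for the example, not the optimized parameters.

```python
from typing import Dict, Iterable, Tuple

# Placeholder surface coefficients in kcal/mol/Å^2 (hypothetical values).
EXAMPLE_SIGMA: Dict[str, float] = {
    "unpolar": 0.01,
    "aromatic": -0.02,
    "polar": -0.10,
    "ionic": -0.20,
}


def surface_energy(atoms: Iterable[Tuple[str, float]],
                   sigma: Dict[str, float]) -> float:
    """Sum sigma[group] * exposed_area over atoms.

    Each atom is given as (group, exposed_area_in_A2), where group is one
    of 'unpolar', 'aromatic', 'polar', 'ionic'.
    """
    return sum(sigma[group] * area for group, area in atoms)
```

Calling `surface_energy([("unpolar", 10.0), ("polar", 5.0), ("ionic", 2.0)], EXAMPLE_SIGMA)` adds 0.1, -0.5 and -0.4 kcal/mol for the three atoms.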
Range of solvation parameters (kcal/mol/Å²) and dielectric values ε scanned during the iterative optimization
unpolar: -0.005 to 0.01
aromatic: -0.08, -0.06 to 0.01
polar: -0.12, -0.10 to -0.04
ionic: -0.20, -0.18 to -0.10
ε: 16 to 32
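A parameter scan over ranges like those listed above can be sketched as an exhaustive grid search that keeps the parameter set with the lowest mean unsigned error. This is a generic illustration: `scan_parameters`, the `predict` callback and the grid values are assumptions for the example, and the step sizes actually used in the scan are not stated here.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple


def mean_unsigned_error(predicted: Sequence[float],
                        experimental: Sequence[float]) -> float:
    """Average absolute deviation between predicted and experimental values."""
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(experimental)


def scan_parameters(grid: Dict[str, Sequence[float]],
                    predict: Callable[[Dict[str, float]], Sequence[float]],
                    experimental: Sequence[float]) -> Tuple[Dict[str, float], float]:
    """Try every combination of grid values and return the best one.

    `predict` is a user-supplied (hypothetical) function mapping a
    parameter set to predicted stability changes.
    """
    names = list(grid)
    best_params, best_err = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = mean_unsigned_error(predict(params), experimental)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err
```

In the iterative scheme, a scan of this kind would alternate with re-selection of the tripeptide reference structures until both stop changing.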
The optimization procedure was tested for bias and overfitting by the following cross-validation procedure. Thirty of the 140 mutants were chosen at random and left out of the optimization. The reference structures were taken from the iterative optimization described above and kept fixed. Parameter scanning was then performed, leading to several good quality parameter sets. The mean errors were then computed for the omitted, or "test", data. This procedure was done twice, with two distinct sets of mutations omitted from the optimization. The parameter sets and error levels from these two runs were similar to each other and to the iterative optimization described above, showing that the optimization is not subject to excessive overfitting or bias.
Binding affinities were calculated for 5 different ligand-protein systems taken from the PDB and a number of their mutants: (i) tyrosine kinase Abl in complex with imatinib (1OPJ), (ii) Tyrosyl-tRNA synthetase in complex with tyrosine (4TS1), (iii) Aspartyl-tRNA synthetase in complex with aspartate (1IL2), (iv) Lysozyme in complex with the antibody HyHel-10 (3HFM) and (v) the complex of the glycoprotein CD4 with the gp120 component of the HIV virus (1G9M). As starting structures, the ligand-bound X-ray structures of these 5 proteins were used. System (iii) was truncated to a 30 Å sphere around the ligand; systems (ii), (iv) and (v) were truncated to 20 Å spheres around the ligand, and for system (i) the untruncated structure of chain B was used. The mutant structures were created by replacing the side chain at the relevant position with a rotamer of the mutant side chain from the Tuffery library.
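The hold-out step of this cross-validation can be sketched in a few lines of Python. The function name `split_holdout` and the use of a seeded random generator are assumptions made so that the example is reproducible; the actual random selection used in the work is not specified.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_holdout(items: Sequence[T], n_test: int, seed: int) -> Tuple[List[T], List[T]]:
    """Randomly set aside n_test items as test data; the rest is used for
    parameter optimization. Returns (training_items, test_items)."""
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(len(items)), n_test))
    train = [x for i, x in enumerate(items) if i not in test_idx]
    test = [x for i, x in enumerate(items) if i in test_idx]
    return train, test
```

Running `split_holdout(mutants, 30, seed)` with two different seeds on a list of 140 mutants gives two runs of 110 training and 30 test mutations each, mirroring the two repetitions described above.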
A simple protocol was adopted for the energy evaluation: The side chain at the mutated position was subjected to 50 steps of minimization with respect to itself and to all other side chains with the backbone kept fixed. During this minimization, all sidechains were allowed to adjust to the introduced mutation but otherwise inter-sidechain interactions were excluded. The energy of this slightly adjusted protein conformation was taken as the ligand-bound energy. For the ligand-free state, the ligand was removed, the side chains were again minimized and the energy of this conformation was taken. For the mutated sidechain, all rotamers from the library were considered, and the lowest energy for the ligand-bound and ligand-free states, respectively, was retained. In the native structure, the rotamer at the position to be mutated was not varied, as it is assumed that the X-ray structure already represents a low energy conformation.
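The rotamer handling in this protocol can be summarized in a short Python sketch. It assumes that the energy of the isolated ligand is the same for native and mutant and therefore cancels in the difference; the function name and the energies in the usage note are hypothetical.

```python
from typing import Sequence


def binding_energy_change(bound_mut: Sequence[float],
                          free_mut: Sequence[float],
                          bound_nat: float,
                          free_nat: float) -> float:
    """Change in binding energy upon mutation (mutant minus native).

    bound_mut / free_mut: energies of the ligand-bound and ligand-free
    mutant, one value per rotamer of the mutated side chain. The lowest
    energy is retained separately for each state.
    bound_nat / free_nat: energies of the native protein, whose rotamer
    at the mutated position is not varied.
    """
    binding_mut = min(bound_mut) - min(free_mut)
    binding_nat = bound_nat - free_nat
    return binding_mut - binding_nat
```

For example, `binding_energy_change([-100.0, -104.0, -98.0], [-90.0, -95.0], -110.0, -98.0)` compares a mutant binding energy of -9.0 with a native value of -12.0, giving a change of 3.0 kcal/mol (less favourable binding for the mutant).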
The CASA model as described above, using the parameters optimized earlier, with a weight factor α of 0.5 and a dielectric constant ε of 16. We obtained good results for sidechain placement and stability changes in our previous work using these values.
The CASA model with the parameters optimized here, with the same weight factor α of 0.5 and dielectric constant ε of 16. These values of α and ε were chosen because they gave the best agreement with the experimental binding affinities.
A Generalized Born/Surface Area (GB/SA) model with a weight factor σ of -0.05 kcal/mol/Å² for the surface term and a dielectric constant ε of 8.0.
Mutations for which the van der Waals energy contributed more than 10 kcal/mol to the difference in binding energy were considered outliers and were not included in the results. These contributions are probably due to unfavourable contacts that are not resolved by the simple minimization protocol used here.
For two additional systems, a slightly different protocol gave distinctly better results. These were the BPTI:trypsin and BPTI:chymotrypsin complexes [38, 39]. Several BPTI mutations at position 15 (within the interface) were considered. Instead of minimizing just the mutated sidechain, as above, we minimized the entire BPTI protein for each choice of rotamer (for 50 steps, as above).
The energy function for protein design corresponds to Equation (1), with the solvation contribution described by a CASA term. The interaction energies between each possible combination of sidechain pairs, and between each sidechain and the backbone, are precomputed and stored in an energy matrix. For a given sidechain pair, this calculation includes all possible combinations of both amino acid types and rotamer values. Once the energy matrix is computed, the amino acid sequence is optimized in a second stage, through cycles of random mutations and steepest-descent minimization. This heuristic procedure was developed and validated by Wernisch et al. A "heuristic cycle" proceeds as follows. An initial amino acid sequence and set of sidechain rotamers are chosen randomly. These are improved in a stepwise way. At a given amino acid position i, the best amino acid type and rotamer are selected, with the rest of the sequence and structure held fixed. The same is done for the following position i + 1, and so on, performing multiple passes over the amino acid sequence until the energy no longer improves (or a given, large number of passes is reached). The final sequence, rotamer set, and energy are output, ending the cycle. For the design calculations below, we performed 450,000 heuristic cycles for each protein. Disulfide-bonded cysteines, glycines and prolines are expected to have a special effect on the protein's folded and unfolded state structures, which may not be accurately captured by our method. Therefore, if these amino acids were present in the native sequence, they were held fixed; all other amino acids were allowed to mutate freely. The calculations were done using our Proteins@Home distributed computing platform. This allows us to use the computers of several thousand volunteers in over 70 countries. Proteins@Home is based on the Berkeley Open Infrastructure for Network Computing, BOINC.
The Proteins@Home platform and project will be described in detail elsewhere.
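The heuristic cycle described above can be sketched in Python as follows. This is a simplified illustration, not the production implementation: each position has a list of "states" (one per amino acid type and rotamer combination), `self_e[i][s]` stands for the sidechain-backbone energy of state s at position i, and `pair_e[(i, j)][s][t]` for the precomputed sidechain-sidechain energy with i < j. These data structures and all function names are assumptions, and the total energy is recomputed from scratch at each trial for clarity rather than speed.

```python
import random
from typing import Dict, List, Tuple

PairMatrix = Dict[Tuple[int, int], List[List[float]]]


def total_energy(state: List[int], self_e: List[List[float]],
                 pair_e: PairMatrix) -> float:
    """Sum of single-position and pairwise energies from the energy matrix."""
    n = len(state)
    energy = sum(self_e[i][state[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][state[i]][state[j]]
    return energy


def heuristic_cycle(self_e: List[List[float]], pair_e: PairMatrix,
                    rng: random.Random, max_passes: int = 100) -> Tuple[List[int], float]:
    """One cycle: random start, then position-by-position improvement."""
    n = len(self_e)
    state = [rng.randrange(len(self_e[i])) for i in range(n)]
    energy = total_energy(state, self_e, pair_e)
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            best_s, best_e = state[i], energy
            # Try every type/rotamer state at position i, rest held fixed.
            for s in range(len(self_e[i])):
                trial = state[:i] + [s] + state[i + 1:]
                e = total_energy(trial, self_e, pair_e)
                if e < best_e:
                    best_s, best_e = s, e
            if best_s != state[i]:
                state[i], energy, improved = best_s, best_e, True
        if not improved:  # a full pass brought no improvement
            break
    return state, energy


def design(self_e: List[List[float]], pair_e: PairMatrix,
           n_cycles: int, seed: int = 0) -> Tuple[List[int], float]:
    """Run several independent cycles and keep the lowest-energy result."""
    rng = random.Random(seed)
    results = (heuristic_cycle(self_e, pair_e, rng) for _ in range(n_cycles))
    return min(results, key=lambda r: r[1])
```

Because each cycle only reaches a local minimum, many independent cycles from random starting points are needed; the cycles are independent of one another, which is what makes the procedure suitable for distributed computing.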
Reference energies (kcal/mol) characterizing the unfolded state
We thank the many volunteers who have participated in the Proteins@Home project and contributed computer cycles used in this work. See biology.polytechnique.fr/proteinsathome for a complete list of participants. We thank the BOINC development community for testing the alpha version of Proteins@Home. We thank Alexey Aleksandrov and David Mignon for discussions and the ANR High Performance Computing program for support.
- Becker O, Mackerell A Jr, Roux B, Watanabe M, Eds: Computational Biochemistry & Biophysics. Marcel Dekker, New York; 2001.
- Guérois R, Lopez de la Paz M, Eds: Protein Design: Methods And Applications. Humana Press; 2007.
- Roux B, Simonson T: Implicit solvent models. Biophys Chem 1999, 78: 1–20. doi:10.1016/S0301-4622(98)00226-9
- Eisenberg D, McLachlan A: Solvation energy in protein folding and binding. Nature 1986, 319: 199–203. doi:10.1038/319199a0
- Wesson L, Eisenberg D: Atomic solvation parameters applied to molecular dynamics of proteins in solution. Prot Sci 1992, 1(2): 227–235.
- Fraternali F, van Gunsteren W: An efficient mean solvation force model for use in molecular dynamics simulations of proteins in aqueous solution. J Mol Biol 1996, 256: 939–948. doi:10.1006/jmbi.1996.0139
- Ferrara P, Apostolakis J, Caflisch A: Evaluation of a fast implicit solvent model for molecular dynamics simulations. Proteins 2002, 46: 24–33. doi:10.1002/prot.10001
- Koehl P, Delarue M: Polar and nonpolar atomic environments in the protein core. Implications for folding and binding. Proteins 1994, 20: 264–278. doi:10.1002/prot.340200307
- Juffer AH, Eisenhaber F, Hubbard SJ, Walther D: Comparison of atomic solvation parametric sets: Application and limitations in protein folding and binding. Prot Sci 1995, 4: 2499–2509.
- Pei J, Wang Q, Zhou J, Lai L: Estimating protein-ligand binding free energy: Atomic solvation parameters for partition coefficient and solvation free energy calculation. Proteins 2004, 57(4): 661–664. doi:10.1002/prot.20198
- Feig M, Brooks CL III: Recent Advances in the Development and Application of Implicit Solvent Models in Biomolecule Simulations. Curr Opin Struct Biol 2004, 14: 217–224. doi:10.1016/j.sbi.2004.03.009
- Simonson T: Macromolecular electrostatics: continuum models and their growing pains. Curr Opin Struct Biol 2001, 11: 243–252. doi:10.1016/S0959-440X(00)00197-4
- Honig B, Nicholls A: Classical electrostatics in biology and chemistry. Science 1995, 268: 1144–1149. doi:10.1126/science.7761829
- Schaefer M, Sommer M, Karplus M: pH-dependence of protein stability: absolute electrostatic free energy differences between conformations. J Phys Chem B 1998, 101: 1663–1683. doi:10.1021/jp962972s
- Simonson T: Electrostatics and dynamics of proteins. Rep Prog Phys 2003, 66: 737–787. doi:10.1088/0034-4885/66/5/202
- Archontis G, Simonson T: A residue-pairwise Generalized Born scheme suitable for protein design calculations. J Phys Chem B 2005, 109: 22667–22673. doi:10.1021/jp055282+
- Ooi T, Oobatake M, Nemethy G, Scheraga H: Accessible surface areas as a measure of the thermodynamic hydration parameters of peptides. Proc Natl Acad Sci USA 1987, 84: 3086–3090. doi:10.1073/pnas.84.10.3086
- Wang W, Lim W, Jakalian A, Wang J, Luo R, Bayly C, Kollman P: An analysis of the interactions between the Sem-5 SH3 domain and its ligands using molecular dynamics, free energy calculations, and sequence analysis. J Am Chem Soc 2001, 123: 3986–3994. doi:10.1021/ja003164o
- Hou T, Qiao X, Zhang W, Xu X: Empirical aqueous solvation models based on accessible surface areas with implicit electrostatics. J Phys Chem B 2002, 106: 11295–11304. doi:10.1021/jp025595u
- Lopes A, Aleksandrov A, Bathelt C, Archontis G, Simonson T: Computational sidechain placement and protein mutagenesis with implicit solvent models. Proteins 2007, 67: 853–867. doi:10.1002/prot.21379
- Bolon D, Mayo S: Enzyme-like proteins by computational design. Proc Natl Acad Sci USA 2001, 98: 14274–14279. doi:10.1073/pnas.251555398
- Liang S, Grishin N: Effective scoring function for protein sequence design. Proteins 2004, 54: 271–281. doi:10.1002/prot.10560
- Hellinga H, Richards F: Optimal sequence selection in proteins of known structure by simulated evolution. Proc Natl Acad Sci USA 1994, 91: 5803–5807. doi:10.1073/pnas.91.13.5803
- Wernisch L, Héry S, Wodak S: Automatic protein design with all atom force fields by exact and heuristic optimization. J Mol Biol 2000, 301: 713–736. doi:10.1006/jmbi.2000.3984
- Kuhlman B, Baker D: Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci USA 2000, 97: 10383–10388. doi:10.1073/pnas.97.19.10383
- Koehl P, Levitt M: Protein topology and stability define the space of allowed sequences. Proc Natl Acad Sci USA 2002, 99: 1280–1285. doi:10.1073/pnas.032405199
- Dantas G, Kuhlman B, Callender D, Wong M, Baker D: A Large Test of Computational Protein Design: Folding and Stability of Nine Completely Redesigned Globular Proteins. J Mol Biol 2003, 332: 449–460. doi:10.1016/S0022-2836(03)00888-X
- Saunders C, Baker D: Recapitulation of protein family divergence using flexible backbone protein design. J Mol Biol 2005, 346: 631–644. doi:10.1016/j.jmb.2004.11.062
- Madaoui H, Becker E, Guérois R: Sequence search methods and scoring functions for the design of protein structures. Methods Mol Biol 2006, 340: 183–206.
- Kang SG, Saven JG: Computational protein design: structure, function and combinatorial diversity. Curr Opin Chem Biol 2007, 11: 329–334. doi:10.1016/j.cbpa.2007.05.006
- Zhou H, Zhou Y: Stability scale and atomic solvation parameters extracted from 1023 mutation experiments. Proteins 2002, 49: 483–492. doi:10.1002/prot.10241
- Lomize AL, Reibarkh MY, Pogozheva ID: Interatomic potentials and solvation parameters from protein engineering data for buried residues. Prot Sci 2002, 11(8): 1984–2000. doi:10.1110/ps.0307002
- Makhatadze GI, Privalov PL: Energetics of interactions of aromatic hydrocarbons with water. Biophys Chem 1994, 50: 285–291. doi:10.1016/0301-4622(93)E0096-N
- Press W, Flannery B, Teukolsky S, Vetterling W: Numerical Recipes. Cambridge University Press, Cambridge; 1986.
- Rick SW, Berne BJ: Free energy of the hydrophobic interaction from molecular dynamics simulations: The effects of solute and solvent polarizability. J Phys Chem B 1997, 101: 10488–10493. doi:10.1021/jp971579z
- Huang X, Margulis CJ, Berne BJ: Do molecules as small as neopentane induce a hydrophobic response similar to that of large hydrophobic surfaces? J Phys Chem B 2003, 107: 11742–11748. doi:10.1021/jp030652k
- Gambacorti-Passerini CB, Gunby RH, Piazza R, Galietta A, Rostagno R, Scapozza L: Molecular mechanisms of resistance to imatinib in Philadelphia-chromosome-positive leukaemias. Lancet Oncol 2003, 4: 75–85. doi:10.1016/S1470-2045(03)00979-3
- Almlöf M, Aqvist J, Smalas AO, Brandsdal BO: Probing the effect of point mutations at protein-protein interfaces with free energy calculations. Biophys J 2006, 90: 433–442. doi:10.1529/biophysj.105.073239
- Krowarsch D, Dadlez M, Buczek O, Krokoszynska I, Smalas AO, Otlewski J: Probing the effect of point mutations at protein-protein interfaces with free energy calculations. J Mol Biol 1999, 289: 175–186. doi:10.1006/jmbi.1999.2757
- Guérois R, Nielsen J, Serrano L: Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 2002, 320: 369–387. doi:10.1016/S0022-2836(02)00442-4
- Pokala N, Handel T: Energy functions for protein design: Adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility and specificity. J Mol Biol 2005, 347: 203–227. doi:10.1016/j.jmb.2004.12.019
- Bashford D, Case D: Generalized Born models of macromolecular solvation effects. Ann Rev Phys Chem 2000, 51: 129–152. doi:10.1146/annurev.physchem.51.1.129
- Hawkins G, Cramer C, Truhlar D: Pairwise descreening of solute charges from a dielectric medium. Chem Phys Lett 1995, 246: 122–129. doi:10.1016/0009-2614(95)01082-K
- Cornell W, Cieplak P, Bayly C, Gould I, Merz K, Ferguson D, Spellmeyer D, Fox T, Caldwell J, Kollman P: A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J Am Chem Soc 1995, 117: 5179–5197. doi:10.1021/ja00124a002
- Schaefer M, Karplus M: A comprehensive analytical treatment of continuum electrostatics. J Phys Chem 1996, 100: 1578–1599. doi:10.1021/jp9521621
- Calimet N, Schaefer M, Simonson T: Protein molecular dynamics with the Generalized Born/ACE solvent model. Proteins 2001, 45: 144–158. doi:10.1002/prot.1134
- Brooks B, Bruccoleri R, Olafson B, States D, Swaminathan S, Karplus M: CHARMM: a program for macromolecular energy, minimization, and molecular dynamics calculations. J Comp Chem 1983, 4: 187–217. doi:10.1002/jcc.540040211
- Schmidt am Busch M, Lopes A, Mignon D, Simonson T: Computational protein design: software implementation, parameter optimization, and performance of a simple model. J Comp Chem 2007, in press.
- Simonson T, Mignon D, Schmidt am Busch M, Lopes A, Bathelt C: The inverse protein folding problem: structure prediction in the genomic era. In Distributed & Grid Computing – Science Made Transparent for Everyone. Principles, Applications and Supporting Communities. Tektum Publishers, Berlin; 2007.
- Jaramillo A, Wernisch L, Héry S, Wodak S: Folding free energy function selects native-like protein sequences in the core but not on the surface. Proc Natl Acad Sci USA 2002, 99: 13554–13559. doi:10.1073/pnas.212068599
- Larson S, Garg A, Desjarlais J, Pande V: Increased detection of structural templates using alignments of designed sequences. Proteins 2003, 51: 390–396. doi:10.1002/prot.10346
- Lee B, Richards F: The interpretation of protein structures: estimation of static accessibility. J Mol Biol 1971, 55: 379–400. doi:10.1016/0022-2836(71)90324-X
- Brünger AT: X-PLOR version 3.1, A System for X-ray crystallography and NMR. Yale University Press, New Haven; 1992.
- Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov I, Bourne P: The Protein Data Bank. Nucl Acids Res 2000, 28: 235–242. doi:10.1093/nar/28.1.235
- Guex N, Peitsch MC: SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling. Electrophoresis 1997, 18: 2714–2723. doi:10.1002/elps.1150181505
- Tuffery P, Etchebest C, Hazout S, Lavery R: A New Approach to the Rapid Determination of Protein Side Chain Conformations. J Biomol Struct Dyn 1991, 8: 1267.
- Kumar M, Bava K, Gromiha M, Prabakaran P, Kitajima K, Uedaira H, Sarai A: ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucl Acids Res 2006, 34: D204–206. doi:10.1093/nar/gkj103
- Myers JK, Pace CN, Scholtz JM: Helix propensities are identical in proteins and peptides. Biochemistry 1997, 36: 10923–10929. doi:10.1021/bi9707180
- Park SH, Shalongo W, Stellwagen E: Residue helix parameters obtained from dichroic analysis of peptides of defined sequence. Biochemistry 1993, 32: 7048–7053. doi:10.1021/bi00078a033
- Yang J, Spek EJ, Gong Y, Zhou H, Kallenbach NR: The role of context on alpha-helix stabilization: host-guest analysis in a mixed background peptide model. Prot Sci 1997, 6(6): 1–9.
- Varadarajan R, Connelly PR, Sturtevant JM, Richards FM: Heat capacity changes for protein-peptide interactions in the ribonuclease S system. Biochemistry 1992, 31: 1421–1426. doi:10.1021/bi00120a019
- Padmanabhan S, Marqusee S, Ridgeway T, Laue TM, Baldwin RL: Relative helix-forming tendencies of nonpolar amino acids. Nature 1990, 344: 268–270. doi:10.1038/344268a0
- Lyu CP, Liff MI, Marky LA, Kallenbach NR: Side chain contributions to the stability of alpha-helical structure in peptides. Science 1990, 250: 669–673. doi:10.1126/science.2237416
- Shoemaker KR, Kim PS, Brems DN, Marqusee S, York EJ, Chaiken IM, Stewart JM, Baldwin RL: Nature of the charged-group effect on the stability of the C-peptide helix. Proc Natl Acad Sci USA 1985, 82: 2349–2353. doi:10.1073/pnas.82.8.2349
- Anderson DP: BOINC: A System for Public-Resource Computing and Storage. In 5th IEEE/ACM International Workshop on Grid Computing. IEEE Computer Society Press, USA; 2004.
- Ho CK, Fersht AR: Internal thermodynamics of position 51 mutants and natural variants of tyrosyl-tRNA synthetase. Biochemistry 1986, 25: 1891–1897. doi:10.1021/bi00356a009
- Wells TN, Fersht AR: Use of binding energy in catalysis analyzed by mutagenesis of the tyrosyl-tRNA synthetase. Biochemistry 1986, 25: 1881–1886. doi:10.1021/bi00356a007
- First EA, Fersht AR: Mutational and kinetic analysis of a mobile loop in tyrosyl-tRNA synthetase. Biochemistry 1993, 32: 13658–13663. doi:10.1021/bi00212a034
- De Prat Gay G, Duckworth HW, Fersht AR: Modification of the amino acid specificity of tyrosyl-tRNA synthetase by protein engineering. FEBS Letters 1993, 318: 167–171. doi:10.1016/0014-5793(93)80014-L
- Fersht AR, Leatherbarrow RJ, Wells TN: Structure-reactivity relationships in engineered proteins: analysis of use of binding energy by linear free energy relationships. Biochemistry 1987, 26: 6030–6038. doi:10.1021/bi00393a013
- Sharp KA: Calculation of HyHel10-lysozyme binding free energy changes: Effect of ten point mutations. Proteins 1998, 33: 39–48. doi:10.1002/(SICI)1097-0134(19981001)33:1<39::AID-PROT4>3.0.CO;2-G
- Moebius U, Clayton LK, Abraham S, Harrison SC, Reinherz EL: The human immunodeficiency virus gp120 binding site on CD4: Delineation by quantitative equilibrium and kinetic binding studies of mutants in conjunction with a high-resolution CD4 atomic structure. J Exp Med 1992, 176: 507–517. doi:10.1084/jem.176.2.507
- Cavarelli J, Eriani G, Rees B, Ruff M, Boeglin M, Mitschler A, Martin F, Gangloff J, Thierry J, Moras D: The active site of yeast aspartyl-tRNA synthetase: structural and functional aspects of the aminoacylation reaction. EMBO J 1994, 13: 327–337.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.