A decoy set for the thermostable subdomain from chicken villin headpiece, comparison of different free energy estimators

Background Estimators of free energies are routinely used to judge the quality of protein structural models. As these estimators still present inaccuracies, they are frequently evaluated by discriminating native or native-like conformations from large ensembles of so-called decoy structures. Results A decoy set is obtained from snapshots taken from 5 long (100 ns) molecular dynamics (MD) simulations of the thermostable subdomain from chicken villin headpiece. An evaluation of the energy of the decoys is given using: i) a residue based contact potential supplemented by a term for the quality of dihedral angles; ii) a recently introduced combination of four statistical scoring functions for model quality estimation (FRST); iii) molecular mechanics with solvation energy estimated either according to the generalized Born surface area (GBSA) or iv) the Poisson-Boltzmann surface area (PBSA) method. Conclusion The decoy set presented here has the following features which make it attractive for testing energy scoring functions: 1) it covers a broad range of RMSD values (from less than 2.0 Å to more than 12 Å); 2) it has been obtained from molecular dynamics trajectories, starting from different non-native-like conformations which have diverse behaviour, with secondary structure elements correctly or incorrectly formed, and in one case folding to a native-like structure. This allows not only for scoring of static structures, but also for studying, using free energy estimators, the kinetics of folding; 3) all structures have been obtained from accurate MD simulations in explicit solvent and after molecular mechanics (MM) energy minimization using an implicit solvent method. The quality of the covalent structure therefore does not suffer from steric or covalent problems. The statistical and physical effective energy functions tested on the set behave differently when native simulation snapshots are included or not in the set and when averaging over the trajectory is performed.


Background
The correct estimation of the free energy of protein conformations has been the focus of intense research for over 20 years. Despite recent improvements, this goal remains elusive. One important consequence of this is the limited ability to judge the quality of protein models built by homology or "ab initio". In particular, the estimators need to be accurate enough to help in the selection of the most native-like model from an ensemble of closely related alternative conformations. This has been recently identified as one of the bottlenecks limiting the quality and usefulness of protein structure prediction [1].
Free energy estimators can be broadly divided in two different classes: physical effective energy functions based on physical potentials and database derived statistical effective energy functions. Physical effective energy functions (PEEFs) for discrimination of native structures [2][3][4][5] generally yield results comparable to statistical effective energy functions (SEEFs) based on the Boltzmann hypothesis [6]. SEEFs are commonly formalized as sets of pairwise potentials of mean force [7][8][9][10][11][12].
An intuitive test for a free energy estimator is the recognition of the native structure among a large number of wellconstructed decoy structures. A number of standard decoy sets, consisting of the native structure plus a large ensemble of more or less incorrect models resembling real protein structures, have been established for benchmarking purposes [13]. The method used for constructing the decoys, which are generally minimized using a simple energy function, introduces some systematic bias. It is therefore important to benchmark a scoring function on different decoy sets in order to assess its generality. As the results for any decoy set are known beforehand, this test does not prevent the scoring function from learning some hidden characteristics shared by all decoy structures, thus limiting its usefulness for practical purposes.
The presently available decoy sets are mainly, if not exclusively, built from "ab initio" models and contain structures which do not cover the full range of root mean square deviation (RMSD) values, with respect to native structure. Most structures, in particular for the older decoy sets, have high RMSD values and are obviously misfolded, limiting the usefulness of the decoy sets. Moreover the simple energy functions used during decoy construction could also account for some systematic bias. For this reason, we have decided to derive a new pool of decoy structures from the snapshots of different trajectories from molecular dynamics simulations of villin headpiece thermostable domain. This new decoy set is evaluated using a battery of PEEFs and SEEFs to give a comparison of their relative performance.
Villin headpiece thermostable domain is a small helical domain, folding on a microsecond time scale. For its limited size, its stability and for fast kinetics of folding it has served as a model for many theoretical studies. All-atom molecular dynamics simulations of the villin headpiece thermostable domain have been performed by several groups using explicit solvent [14][15][16] and implicit solvent [17][18][19][20]. Although simulations were able to produce native-like conformations, the ability of free energy estimators to recognize native structures among all conformations seems rather limited. In a pioneering work Kollman and coworkers were able to identify structures close to native based on MM/PBSA energy [21]. Although molecular mechanics/implicit solvent methods are able to consistently assign low energy to native-like conformations [19,20], their performance depends much on the force field used, the protocol applied and the solvation model adopted. In this context, the well-constructed native-and non-native-like decoys described in this paper will provide a further benchmark for refinement of free energy estimators.
Since this work was started X-ray structures for mutants of the villin headpiece thermostable domain have been deposited in the Protein Data Bank with pdb id.: 1yrf, 1yri, 1wy3, 1wy4 [22]. The RMSD of the trace of these newly deposited structures with respect to the average NMR structure used in the present work (pdb id. 1vii) is 1.92 to 2.09 Å. In general, the results discussed hereafter are essentially the same when any of the other structures deposited in the PDB is used as a reference. Although there are differences among the NMR and X-ray structures [22] all conclusions based on ranking of models accord-Histogram of RMSD from native structure of the decoy set Figure 1 Histogram of RMSD from native structure of the decoy set.
ing to C α trace RMSD from native are not significantly affected.

Results and discussion
Description of the decoy set Snapshots from the five long time molecular dynamics trajectories described by De Mori et al. [15,16], extended to 100 ns here, provide a set of conformations which span a large range of RMSDs from native structures. The histogram of counts for each interval of 0.5 Å is reported in Figure 1.
Although the distribution is not uniform, samples are present for native-and non-native-like conformations. Moreover, in one of the simulations (F1) the peptide assumes a native-like conformation starting from a rather different conformation (see Figure 2). Snapshots taken at 20 ns intervals from NATIVE and F1 simulations have been superimposed on native structure in Figure 3. Superposition of NATIVE snapshots on the starting native structure sets the overall limits for definining native-like conformations (see later). It is evident from superposition of F1 snapshots on native structure that this set qualifies for being defined native-like.
The average radius of gyration of models in each F1, F3, F4 and F7 sets ranges from 9.58 Å (F4) to 10.43 Å (F3). These values are comparable with the average radius of gyration of models in the NATIVE set (9.55 Å). All sets display, after initial collapse at the beginning of the simulation, rather compact structures, as witnessed also by the average solvent accessible surface areas (SASA) ranging from 3013 Å 2 (NATIVE) to 3216 Å 2 (F3).
The breakdown of SASA in hydrophobic and polar contributions displays a more diverse behavior. NATIVE set has the lowest hydrophobic SASA and the lowest ratio of hydrophobic to polar SASA. Among other sets F1 has the lowest hydrophobic SASA, while sets F4 and F7 show the highest ratio of hydrophobic to polar SASA.
The average number of residues involved in helical conformations has been also monitored for all sets. For NATIVE simulation the average number of residues in αor 3 10 -helices is dropping between 400 ns and 700 ns to reach an average value of ca. 14. This number is comparable to the average number of helical residues in F1 and F3, 13.7 and 13.0 respectively, but definitely higher than the value of 5.4 and 0.3 for F4 and F7, respectively. The average number of hydrogen bonds, however, is not very different among the sets, ranging from 18.5 for F7 to 23.8 for NATIVE.
The quality of structures can be assessed by subjecting the minimized structures to the program Procheck [23] which summarizes the analysis using so-called G-Factors for the quality of dihedral angles, covalent geometry and overall quality of the structure.  It should be noted that conformations from NATIVE simulation and F1 simulation, folding to a native-like conformation, have higher (i.e. better) overall and dihedral G-Factors than (non-native) conformations from simulations F3, F4, F7. This confirms that statistics of model dihedral angles and covalent structure may be an important value for quality model assessment as recently demonstrated [24][25][26].
The data discussed in this section are summarized in Table  1.
In the following we examine the behaviour of SEEFs and PEEFs on pooled sets of conformations. We consider separately the set containing NATIVE snapshots and the set not containing NATIVE snapshots. The latter set does not contain any information about the native structure. It is homogeneous, in that all snapshots are obtained using the same procedure and it is representative of a realistic prediction scenario. On the other hand, including snapshots from NATIVE simulation allows one to access a lower range of RMSDs from native, which is not available with snapshots from the other simulations.
The decoys arranged in five sets corresponding to the different molecular dynamics trajectories are available in the Internet (see Materials and Methods section).

Statistical effective energy functions
Residue-residue mean contact energy The contact energy computed according to the methodology of Berrera et al. [10] has been supplemented with a scoring function for the overall quality of the structure (i.e. the G-Factors given by Procheck). The resulting potential allows to separate NATIVE and native-like F1 snapshots from non-native snapshots for most of the conformations.
G-Factors (or any equivalent scoring of the quality of the covalent structure of the protein) are necessary in order to take into account other factors which may raise the energy of a molecule. A few of these, like steric clashes and bad covalent geometry should be removed by molecular dynamics simulation and energy minimization, as performed here. On the contrary, unusual torsion angles are not necessarily removed both for the limited simulation time and for possible inaccuracies in available force fields.
The running average of the contact energy for simulations NATIVE and F1 is mostly separated from that of the other simulations, although the structure is rather flexible and fluctuations in the structure imply fluctuations in energy and in RMSD. Based on this potential (without averaging) a structure at RMSD of 4.0 Å from native could be picked up as the lowest energy structure out of the four simulations F1, F3, F4 and F7.
Another way of looking at these results is to plot the energy versus the RMSD from native. This is done in Figure 4 where the different simulations are roughly separated. The correlation between contact energy (supplemented by the overall Procheck G-Factor) and RMSD is apparent, although it is hidden by fluctuations. The appearance of the plot greatly improves when the running average of the contact energy is plotted against the RMSD of the structure at the center of the averaging window ( Figure 4). This shows that even for such coarse grained potential averaging of fluctuations greatly improves the performance of the scoring function.
The contact energy is strongly correlated with the hydrophobic SASA. The correlation coefficient is 0.67 much larger than the same coefficient for the polar SASA which is -0.40. A similar positive correlation coefficient is found also within each different set of models, showing that this correlation is not artifactual ( Table 2). Although this was somehow expected based on the table of pairwise residueresidue contact energies favoring contacts between hydrophobic resiudes, this correlation shows that a simple criterion like burial of hydrophobic SASA may be an effective feature in model quality assessment. The correlation with helical content is less pronounced and largely artifactual because sets NATIVE, F1 and F3, which have the lowest contact energy are also similar in helical content and quite different from sets F4 and F7. When single sets are considered the correlation vanishes.

FRST
The most striking feature of the FRST energy appears to be the high energy fluctuation between closely related structures. This behaviour, which is more similar to PEEFs, is induced by the torsion angle term of the FRST function. In order to pass from one local minimum to another, it appears necessary to overcome an energy barrier formed by distorsions in local geometry. This effect accounts for the better discrimination of the most native-like conformations in Figure 5. On the other hand it creates some noise in the energy evaluations for the simulations F1, F3 and F4 in particular where FRST has a slight preference for some higher RMSD structures. Overall this SEEF is reliable at discriminating native-like conformations when at least some are present, but it is not as effective at choosing the less erroneous between those with high RMSD.
FRST energy is not correlated with the sets features selected in Table 2, due to the above mentioned fluctuations in the energy. However, it is worth noting that the RAPDF component of FRST energy correlates well with the more coarse grained residue-residue contact energy (correlation coefficient 0.8).

Molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA) energy and molecular mechanics/Generalized Born surface area (MM/GBSA) energy
The performance of the scoring function drastically improves when solvation effects are added to the MM energy function either through the Poisson-Boltzmann model or through the Generalized Born solvent accessibility model. It is apparent that a procedure for smoothing fluctuations should be employed for better reliability. We have previously used the colony energy approach proposed by Honig and co-workers [27] coupled with MM/ PBSA energy estimation [5] and we showed that it could greatly improve the energy vs. RMSD curve for a small number of decoy sets. The peculiar nature of the structural models described here makes a simple running average procedure effective.
The MM/GBSA energy for NATIVE and folding simulation F1 is lower than for non-native simulations in a consistent portion of time. The plots of MM/PBSA and MM/GBSA energy versus RMSD from native structure display a very good correlation. This feature is better displayed in the Residue-residue contact energy for simulations NATIVE (black), F1 (red), F3 (green), F4 (blue), F7 (yellow) versus RMSD from native structure Figure 4 Residue-residue contact energy for simulations NATIVE (black), F1 (red), F3 (green), F4 (blue), F7 (yellow) versus RMSD from native structure. In the upper plot no averaging is applied, in the lower plot a moving average is applied over a window of 50 molecular dynamics snapshots.
MM/GBSA energy versus RMSD plot (see Figure 6) rather than in the MM/PBSA plot (Figure 7). This is most probably due to the fact that the: i) structures are optimized using MM/GBSA energy and changing the solvation model for energy evaluation may introduce noise; ii) the MM/GBSA method does not suffer from all the numerical approximations implied by the numerical finite difference solution of the Poisson-Boltzmann equation. In this case the best model that could be picked up, based on the energy running average, from simulations starting from non native conformations has a RMSD from native structure of 4.8 Å and 4.4 Å for MM/PBSA and MM/GBSA energy model, respectively. It is worth noting that nonnative models in the set F3 have even lower MM/PBSA energy than native-like models of set F1. The running average favors native-like models by few tenths of kcal/ mol, largely within the numerical inaccuracies of the methodology. Both MM/GBSA and MM/PBSA energies anticorrelate with the burial of hydrophobic SASA. The correlation with the number of helical residues and number of hydrogen bonds is artifactual, as discussed above, and disappears when single sets are considered.

Comparison of different effective energy functions
For the present decoy set native-like conformations may be conveniently defined in terms of the RMSD from native structure. The long time simulation of the native structure has an average RMSD compared to the deposited structure of 3.4 Å ± 0.7 Å with a maximum RMSD of 4.8 Å. The latter could be arbitrarily defined as the RMSD for considering a structure native-like.
Tables 3 and 4 report the performance of the SEEFs and PEEFs tested, evaluated with different criteria and according to the RMSD from native of the lowest energy model. Two sets, alternatively including or not snapshots from native simulation, are considered and the energy or scoring function values are considered as they are or averaged over 50 continuous snapshots.
For the set with native simulation snapshots included, averaging improves significantly the performance of the FRST scoring function and MM energy function. For the set not including native simulation snapshots, averaging improves significantly the performance of MM/GBSA and MM/PBSA energy functions which are known to be affected by large fluctuations (see the previous sections).
For the set including native simulation snapshots the FRST scoring function, MM/GBSA and MM/PBSA energy functions show the best performance and they all recognize native-like models with RMSD from native less than 3.0 Å.
The poor performance of MM energy points out the need to consider solvation effects. For the set not including native simulation snapshots SEEFs appear superior to PEEFs. Averaging makes MM/GBSA and MM/PBSA more effective while it worsens the performance of atom-based FRST scoring function. The sensitivity of FRST to molecular details is greater than that of the residue-based contact energy [10] for which, indeed, averaging has an overall beneficial effect. Since FRST is trained on native structures it is likely that while assigning low scores to native features, it assigns exceedingly bad scores to non-native-like features. This would explain the effect of averaging the score over closely related structures. Overall from the set not including native simulation snapshots it is seen that most methods are able to detect models with reasonably low RMSD from native (4.0 Å to 5.3 Å).

Conclusion
The decoy set presented here displays features which make it attractive for testing energy scoring functions: 1) it covers a broad range of RMSD values (from less than 2.0 Å to more than 12.0 Å); 2) it has been obtained from molecular dynamics trajectories, starting from native and different non-native conformations which have diverse behaviour, with secondary structure elements correctly or incorrectly formed, and in one case folding to a native-like structure. This allows not only for scoring of static structures, but also for studying, using free energy estimators, the kinetics of folding; 3) All structures have been obtained from accurate MD simulations in explicit solvent and after molecular mechanics (MM) energy minimization using an implicit Molecular mechanics/generalized Born surface area energy for simulations NATIVE (black), F1 (red), F3 (green), F4 (blue), F7 (yellow) versus RMSD from native structure Figure 6 Molecular mechanics/generalized Born surface area energy for simulations NATIVE (black), F1 (red), F3 (green), F4 (blue), F7 (yellow) versus RMSD from native structure. In the upper plot no averaging is applied, in the lower plot a moving average is applied over a window of 50 molecular dynamics snapshots.
FRST energy for simulations NATIVE (black), F1 (red), F3 (green), F4 (blue), F7 (yellow) versus RMSD from native structure Figure 5 FRST energy for simulations NATIVE (black), F1 (red), F3 (green), F4 (blue), F7 (yellow) versus RMSD from native structure. In the upper plot no averaging is applied, in the lower plot a moving average is applied over a window of 50 molecular dynamics snapshots.
solvent method. The quality of the covalent structure therefore does not suffer from major steric or covalent problems.
Different free energy or scoring functions have been tested on the sets. The performance of SEEFs and PEEFs depends strongly on the decoy set. Here we have a decoy set with native, native-like and non-native structures. The set not including decoys from the native trajectory is representative of a realistic prediction scenario. Moreover a simple running averaging procedure can be employed in order to smooth free energy fluctuations, because decoys are obtained by sequential sampling of molecular dynamics trajectory. These features are not common for decoy sets.
The performance of free energy estimators may be assessed therefore in different contexts: 1) when snapshots from native simulation are included in the set; 2) when only snapshots from simulations starting from nonnative conformations are included in the set.
Moreover the effect of averaging may be easily assessed.
When native structures are included in the set, i.e., when the set includes models very close to native structure, atom based energy functions perform very well. The atombased FRST SEEF compares well with atom-based MM/ PBSA and MM/GBSA PEEFs.
When snapshots from native simulation are not included in the set, the performance of residue based SEEF less sensitive to molecular details improves over atom-based SEEF and PEEFs. For MM/GBSA and MM/PBSA averaging has a beneficial effect being able to smooth large fluctuations expected in the methodology. The effect of averaging on FRST SEEF is dominated by relatively poor discriminative power on non-native free energy.
Overall these results suggest that different free energy estimators, possibly in a hierarchical scheme, should be used depending on the expected quality of predictive models. Application of some form of averaging in order to smooth fluctuations due to molecular details, should be considered carefully depending on the energy function employed.

Molecular Dynamics simulations and analysis
The structures and trajectories used in this work were selected as described in a previous paper [15,16], which we briefly outline here. Seven uncorrelated conformations were chosen from a coarse-grained Monte Carlo (MC) trajectory thermalised at the transition temperature T c . The MC evolution occurs in a simplified free-energy landscape in which an off-lattice representation of the protein with every residue represented by a bead is used to allow an efficient selection of marginally-compact structures, which are subsequently taken as viable initial conformations for the subsequent all-atom MD. The coarse-grained MC structural representation is connected to the one with atomic resolution through a "fine-graining" reconstruc- Molecular mechanics/Poisson Boltzmann surface area energy for simulations NATIVE (black), F1 (red), F3 (green), F4 (blue), F7 (yellow) versus RMSD from native structure. In the upper plot no averaging is applied, in the lower plot a moving average is applied over a window of 50 molecular dynamics snapshots.
NATIVE throughout the text. Every all-atom model obtained was subjected to Molecular Dynamics simulations (simulations NATIVE, F1, F3, F4, F7), using the same simulation protocol and conditions. All basic NH 2 groups were considered protonated while all the acidic COOgroups were considered deprotonated. These conditions resulted in a total charge of +2 on the protein. The protein in each simulation was solvated with explicit water molecules in a periodic octahedral box large enough to contain the protein and 1.0 nm of solvent on all sides. The simple point charge (SPC) [28] water model was used to simulate the solvent. Electroneutrality of the system was ensured by the addition of two negatively charged chloride ions.
The same equilibration protocol was used in all simulation:

1.
Each system was initially energy minimized with a steepest descent method for 1000 steps. The calculation of electrostatic forces utilized the PME implementation of the Ewald summation method [29]. The GROMOS96 force field [30,31] was used.

2.
To release excess strain in simulations F1, F3, F4, F7 due to possible artifacts in the reconstruction procedure, three sequential MD runs of 50 ps each with position restraints on the protein, backbone and sidechains, respectively, were performed. In the first, the protein atoms were kept fixed with the solvent molecules free to move. In the subsequent run, the backbone of the protein was kept fixed with both the sidechains and solvent free to move in MD.
In the third and final simulation, the sidechains were kept fixed, while the backbone and solvent atoms were allowed to move.

3.
After these first relaxation steps, each system was heated to 300 K by 200 ps of MD simulation under NPT conditions, with no restraints, by weak coupling to a bath of constant pressure (P 0 = 1 bar, coupling time τ P = 0.5 ps) [32].

4.
Each of the systems was then equilibrated by 50 ps of unrestrained MD with coupling to an external temperature bath [32] with coupling constant of 0.1 ps.

5.
The production runs at 300 K, using NPT conditions were all 100 ns long in the case of the simulation starting from the NMR determined structure 1VII.pdb (NATIVE) and simulations F1, F3, F4, F7.
The number of explicit water molecules and the structural features of the simulations are reported in Table 5. In all production runs, the temperature was maintained close to the intended values, by weak coupling to an external temperature bath [32] with a coupling constant of 0.1 ps. The protein and the rest of the system were coupled separately to the temperature bath. The LINCS algorithm [33] was used to constrain all bond lengths. For the water molecules the SETTLE algorithm [34] was used. A dielectric permittivity, = 1, and a time step of 2 fs were used. All atoms were given an initial velocity obtained from a Maxwellian distribution at the desired initial temperature of 300 K.
All the minimization, MD runs were performed using the GROMACS software package. [35].

Description of the decoy set
All snapshots from the five trajectories described above have been energy minimized in a two-step fashion using the program CHARMM and the CHARMM forcefield (version 27b2) [36,37]. First all C α coordinates have been fixed and the energy minimized with 20 steepest descent and 30 conjugate gradients steps with 10 conjugate gradient cycles, in order to remove high energy spots, like van der Waals clashes. During this minimization a distance dependent dielectric constant has been employed (ε = 1r) with a cutoff on non-bond interactions of 12 Å. In the second step the structure is totally unrestrained and a more accurate solvent model (the Generalized Born model) has been employed using the parameters suggested by Qiu et al. [38], in order to reduce artifacts ensuing from missing solvent. First 50 steepest descent minimization steps have been performed followed by 200 conjugate gradients minimization steps with 10 conjugate gradient cycles. The energy minimized structures have been deposited in the Decoys'R'us database with the name vhp_mcmd (villin headpiece -Monte Carlo molecular dynamics) (http:// dd.stanford.edu/, [13]).

Amino acid empirical contact energy
Residue-residue contact energies have been computed according to the methodology described by Berrera et al. [10]. A contact between two residues exists when these are not adjacent and when a heavy atom of the sidechain of one of the residues is closer than a threshold distance to a heavy atom of the sidechain of the other residue. The threshold was chosen, according to Berrera et al., 2003 [10] as the sum of the van der Waals radii of the two atoms plus 1.0 Å. The table of contact energies of Berrera et al. [10] has been used and the total energy has been divided by the number of aminoacids. In order to take into account the overall quality of each model the overall Gfactor (with changed sign) from Procheck has been added to the contact energy.

FRST
The FRST energy function is a potential covering the main features of native protein structures. It has been fully presented elsewhere [26]. Briefly put, the energy function uses a linear combination of four different statistical potentials to calculate a pseudo-energy which can be used to estimate the quality of a protein structure. The most important feature is the quality of the local geometry of the protein backbone, expressed by a residue-specific torsion angle potential. The (φ,ψ) angles of each residue are used to construct a 10 × 10 degree propensity map for each of the twenty amino acids. The RAPDF potential [7] is a statistical pairwise potential used as the second most important feature in the total energy. The remaining two energy terms are a statistical solvation term constructed in analogy to ref. [39] and a simple count of intra-chain backbone hydrogen bonds. The weighting factors were derived from optimizing the results for a training set of 3,757 models submitted during CASP-4, covering the full range from easy comparative modelling all the way to difficult novel folds targets.

MM energy, MM/GBSA and MM/PBSA free energy
The MM/PBSA methodology, as applied here, has been described at length and its performance has been tested [5,40]. In this methodology the free energy of a system is written as the sum of an intrasolute term, computed according to one of the available molecular mechanics forcefield, and a solvation term which is further split in an electrostatic term and a hydrophobic term. The latter term is taken as proportional to the solvent accessible surface area. The electrostatic solvation term is computed according to the Poisson-Botzmann theoretical framework [41].
In summary, the free energy of the macroscopic state C, unless a constant term and entropic contributions, is: where A is defined as the area of the solvent accessible surface. There is no agreement in the literature on the proportionality coefficient (i.e. the surface tension coefficient). A value of 50 cal mol -1 Å -2 has been proposed by Nicholls et al. [42] for proteins. This value has been significantly decreased by us for use in MM/PBSA framework because van der Waals intramolecular interactions will account for a large part of this surface tension. Based on simple considerations we have proposed a value of 20 cal mol -1 Å -2 [43].
If we assume that entropic contributions are the same for all sampled conformations, we can use ∆G c for estimating the stability of each conformation. We have used both the MM energy computed using CHARMM [36,37], the MM/ GBSA energy computed using CHARMM and MSMS [44] for solvent accessible surface area and MM/PBSA computed using CHARMM and the program UHBD [45,46]. The linear version of the PB equation has been used as justified by the low charge density of the protein [47]. A procedure using focusing makes the computation of PB energies rather efficient [43,48].

Averaging over molecular dynamics simulations
The values of MM, MM/GBSA and MM/PBSA energy during a simulation are affected by very large fluctuations, typically in the range of 100 kcal/mol. In order to damp fluctuations, a running average within a window of 50 snapshots contiguous in time was performed and the average energy or score was assigned to the model at the center of the moving window.

Performance evaluation
The performance of the scoring functions used has been evaluated using the parameters recently proposed and employed by Samudrala and coworkers [11]. We summarize here the definitions and features of the scoring functions employed: 1) the logP(B 1 ) is defined as: where r 1 is the rank in RMSD from native for the best scoring model, normalized by the number of decoys in the set.  where is the rank in RMSD from native of the structure with lowest RMSD from native among the ten best scoring models, normalized by the number of decoys in the set.
The values of logP(B 1 ) and logP(B 10 ) depend on the fraction of models with RMSD from native similar to the best model. These values are the logarithm of the probability of scoring the best model as the lowest or among the ten lowest energy models [11].
3) the fraction enrichment parameter defined as the fraction of the 10% lowest RMSD models which is found among the 10% best scoring models. This parameter provides evidence for the tendency of the scoring system to assign low RMSD models best scores. 4) the correlation coefficient between score and RMSD from native. This parameter may be ineffective if the scoring function is sensitive to minor model details and if it is not accurate in all the range of RMSDs. 5) the RMSD from native of the best score model. This parameter is directly related to the performance of the scoring function in recognizing native-like models.
These five parameters have been considered for the pool of NATIVE, F1, F3, F4 and F7 minimized snapshots and for the same pool not including snapshots from NATIVE simulation. The decoys from the latter ensemble do not include any information on native structure. The same analysis has been repeated taking for each model the average of the energy or scoring function in a window of 50 snapshots.

Authors' contributions
FF, SCET and GC conceived the project. GC performed the molecular dynamics simulations, their analysis, and provided snapshots, FF developed the MM/GBSA minimization protocol and MM, MM/GBSA, MM/PBSA and contact energy evaluation, SCET developed the FRST evaluation protocol and analysis, FF and SCET designed analysis methods. FF performed the performance and correlation analysis. All authors edited the manuscript.