Multibody potentials accounting for cooperative effects of molecular interactions have shown better accuracy than typical pairwise potentials. The main challenge in the development of such potentials is to find relevant structural features that characterize the tightly folded proteins. Also, the side-chains of residues adopt several specific, staggered conformations, known as rotamers within protein structures. Different molecular conformations result in different dipole moments and induce charge reorientations. However, until now modeling of the rotameric state of residues had not been incorporated into the development of multibody potentials for modeling non-bonded interactions in protein structures.

Results

In this study, we develop a new multibody statistical potential which can account for the influence of rotameric states on the specificity of atomic interactions. In this potential, named “rotamer-dependent atomic statistical potential” (ROTAS), the interaction between two atoms is specified by not only the distance and relative orientation but also by two state parameters concerning the rotameric state of the residues to which the interacting atoms belong. It was clearly found that the rotameric state is correlated to the specificity of atomic interactions. Such rotamer-dependencies are not limited to specific type or certain range of interactions. The performance of ROTAS was tested using 13 sets of decoys and was compared to those of existing atomic-level statistical potentials which incorporate orientation-dependent energy terms. The results show that ROTAS performs better than other competing potentials not only in native structure recognition, but also in best model selection and correlation coefficients between energy and model quality.

Conclusions

A new multibody statistical potential, ROTAS accounting for the influence of rotameric states on the specificity of atomic interactions was developed and tested on decoy sets. The results show that ROTAS has improved ability to recognize native structure from decoy models compared to other potentials. The effectiveness of ROTAS may provide insightful information for the development of many applications which require accurate side-chain modeling such as protein design, mutation analysis, and docking simulation.

Background

Understanding the structure and function of proteins requires an accurate potential energy function to quantify interactions between residues or atoms. One approach for the design and construction of potential energy functions is to make use of the information embedded in the known protein structures [1–6]. Such energy functions, called statistical potentials or knowledge-based potentials are derived by converting the observed frequencies of residue or atomic interactions in a database of protein structures into the free energies of corresponding interactions. Any aspect of structural features which characterize important interactions in the folded structures can be incorporated into the derivation of statistical potentials. Although their physical interpretations are still debated [7–9], due to their accuracy and computational efficiency, statistical potentials have been used with considerable success in many applications such as fold recognition and threading [10, 11], protein structure prediction [12], protein design [13], binding [14, 15] and aggregation [16].

The key idea in the development of statistical potentials is how to decompose the 3-D network of interactions in protein structures. Typical pairwise potentials cannot accurately describe non-bonded interactions in protein structures. As the folded protein structures are tightly packed and surrounded by solvent molecules, the surrounding circumstances of interacting atoms are inhomogeneous and anisotropic. Also, due to the bond connectivity, there are always correlated interactions from nearby bonded atoms. Thus, more detailed and complex structural features involving multibody effects have been incorporated into the formulation of statistical potentials. For example, sequential segments of various lengths have proved useful for prediction of secondary structure [17–20]. Four body potentials were used to improve cooperativity of main-chain hydrogen-bonds [21, 22]. A variety of structural motifs (i.e., residue clusters) has been identified to better characterize tightly packed protein structures [23–28]. Delaunay tessellation technique also has been employed as a means of defining multibody interactions [29, 30]. Local environment templates which could account for maximum 17 residues have been introduced to more accurately capture cooperative effects in protein structures [27]. A secondary structure specific implementation of pairwise potentials has demonstrated its superiority to typical residue pairwise potentials [31, 32]. The introduction of orientation dependencies of interactions into typical distance-dependent pairwise potentials has achieved substantial improvements in both residue-level [33–36] and atomic-level potentials [37–40]. These multibody potentials are not only able to describe the 3-D interactions more completely but also able to account for cooperative effects of molecular interactions more accurately than typical pairwise potentials.

On the other hand, protein residues have great flexibility because their single covalent bonds allow rotation of the atoms they join. It is well known that residues prefer to adopt only a limited number of staggered conformations, known as rotamers due to local steric interactions (e.g. overlapped electron orbitals) [41–44]. Since the electron density distribution around each nucleus can vary depending on the molecular conformation [45–47], different rotamers may result in different dipole moments and induce charge reorientations, which are reflected in dispersion forces and electrostatic forces. In addition to the polarization effect, solvation effect may be another source of multibody effects related to the rotameric state. For example, compact rotameric states would prefer to be buried within protein structures, while extended rotameric states would prefer to be exposed to solvent with high conformational entropy. Thus, non-bonded interactions between residue atoms may be influenced by the rotameric state of the residues to which the interacting atoms belong.

Existing potentials had not modeled the flexibility of residues explicitly. For example, residue-level potentials which have only one interaction site per residue simply ignore the flexibility of residue conformation. In case of atomic-level potentials, although the orientation dependence of atomic interactions may be able to account for the anisotropic environment around each atom, it is also based on rigid blocks [37] or rigid atom fragments (i.e. three atoms that are consecutively bonded) [38–40]. Thus they cannot reflect the influence of rotameric states on the specificity of atomic interactions no matter how complete a description of the relative orientation and distance between interacting atoms may be used.

Here we studied the energy dependence of residue flexibility and developed a new multibody potential, named “rotamer-dependent atomic statistical potential” (ROTAS). The interaction between two atoms is specified by not only the distance and relative orientation but also by two state parameters which concern the rotameric state of the residues to which the interacting atoms belong. It was clearly found that the rotameric state of residues is correlated to the specificity of interactions within protein structures. Furthermore, such rotamer-dependencies are not limited to specific type or certain range of interactions. We tested ROTAS on various sets of decoys and compared its performance to those of several existing atomic potentials. The results show that ROTAS led to an improvement not only in the native structure recognition, but also in the best model selection and the correlation coefficients between energy and model quality. The ROTAS potential is freely available in https://sites.google.com/a/umich.edu/rotas/.

Methods

Derivation of ROTAS

In the ROTAS potential, the interaction between two atoms is described by the spatial distance, relative orientation and rotameric states as illustrated in Figure 1. Basically, it extends the description of inter-atomic interaction in GOAP [40] by including the rotameric states of residues. The detailed description for how the rotameric state is defined is explained in the next section. Here we focus on the formulation of the ROTAS potential.

In this study, we only consider the interaction between heavy atoms and distinguish 167 residue-specific heavy atom types. First, local coordinate frames are attached to all heavy atoms as described in the GOAP potential. The interaction between atom i and j is then specified by eight parameters: d_{
ij
}, θ_{
i
}, ϕ_{
i
}, θ_{
j
}, ϕ_{
j
}, ω, R_{
i
} and R_{
j
} (see Figure 1). Here, d_{
ij
}, θ_{
i
}, ϕ_{
i
} are the spherical coordinates of atom j with respect to the local frame of atom i, and ω is a torsional angle around d_{
ij
}, and R_{
i
} and R_{
j
} represent the rotameric state of residues. The equation of the ROTAS potential can be obtained using the inverse Boltzmann law:

where k_{
B
} is the Boltzmann constant and T is the absolute temperature. P^{obs} is the probability of a particular state (d_{
ij
}, θ_{
i
}, φ_{
i
}, θ_{
j
}, φ_{
j
}, ω, R_{
i
}, R_{
j
}) observed in a sample of known protein structures and P^{exp} is the expected probability of the same state in a reference state where the interaction is zero. Considering that there are a finite number of known protein structures, we assume conditional dependencies of parameters as shown in Figure 2 to obtain sufficient statistics. Namely, the angular parameters are assumed as independent of each other at the given distance and rotameric states like other studies [38, 40, 48]. Thus the joint probability can be written as

Here, E(R_{
i
}) and E(R_{
j
}) can be seen as rotamer intrinsic energy. Assuming that the stability of overall folded structure is mainly determined by non-bonded interactions, we ignore these terms in this study.

Defining the rotameric state

A rotameric state is a combination of side-chain dihedral angles that describes the residue conformation, assuming the bond lengths and angles are fixed (see Figure 3). The observed side-chain dihedral angles cluster around ideal values, such as +60°, −60°, and 180° dihedral angles expected between two sp3 hybridized atoms (see Figure 3B). Since long residues such as Met, Lys or Arg have too many rotameric states to obtain sufficient statistics for each rotamer, we associate up to two side-chain dihedral angles whose rotating bonds are within 3 bond lengths from the considered atom to its rotameric state. For example, the local structural environment of C^{β}, C^{γ} and C^{δ} atoms in Lys is defined by a combination of {X_{1}, X_{2}}, {X_{2}, X_{3}} and {X_{3}, X_{4}} dihedral angles, respectively. One exception is the backbone oxygen atom, which is related to {X_{1}, X_{2}} angles because it frequently interacts with side-chain atoms depending on backbone ψ angle. Also, every atom in Pro is associated to only X_{1} angle because X_{2} angle is strongly correlated with X_{1} angle.

For each side-chain dihedral angle, we divide the dihedral angle space into three or two regions. The dihedral angle between two sp3 hybridized atoms is classified into three distinct rotameric states: 0° ~ 120° (g+), −120° ~ 0° (g-), and 120° ~ 240° (t). Last dihedral angles of Asn, Asp, Gln, Glu, His, Trp, Phe and Tyr are non-rotameric [49]. For those non-rotameric dihedral angles, we divide the dihedral angle space into two regions, {(0 ~ π), (−π ~ 0)}. X_{1} dihedral angle of Pro is also divided into two regions, positive or negative. All 167 heavy atom types and their associated dihedral angles for defining the local structural environments are listed in Table 1.

Construction of distance-dependent potential

In ROTAS, the distance-dependent pairwise energy term does not involve the rotamer-dependence. While the observed distance-dependent pairwise probability P^{obs}(d_{
ij
}) can be calculated straightforwardly, a reference state needs to be defined to compute the expected probability P^{exp}(d_{
ij
}). Because the focus of this work is the effect of rotamer-dependence on the performance of potential energy function, we simply employed the DFIRE [50] reference state. The DFIRE reference state is an ideal gas system in which atoms are uniformly distributed, and has been successfully applied in other studies [38, 40]. The DFIRE-based distance-dependent potential energy can be calculated by

where N^{obs}(d_{
ij
}) is the number of observed atom pair i and j at distance d, and α is a scaling factor such that N^{exp}(d) increases in d^{α}. Beyond a distance cutoff {d}_{\mathit{ij}}^{\mathit{cut}}, it is assumed that both observed and expected pairwise distributions are equal. Here we set {d}_{\mathit{ij}}^{\mathit{cut}} = 15 Å and α = 1.61 as suggested by the original work [50]. To obtain the distribution, the bin width is set to 0.5 Å from 0 to 15 Å. When estimating the observed probability and evaluating the distance-dependent pairwise potential, atom pairs that are in the same residue are excluded.

In addition to DFIRE, we constructed other widely used distance-dependent potentials such as RAPDF [51], KBP [52], DOPE [53] and RW [39] and tested each of them in ROTAS in order to examine the influence of different reference states on the performance of ROTAS. The same structural database, distance cutoff and bin width were applied.

Construction of orientation-dependent potential

In order to obtain smooth and continuous estimates of the observed probability distribution of angular parameters {θ_{
i
}, φ_{
i
}, ω} for a particular distance and rotameric state (d_{
ij
}, R_{
i
}) from a finite sample data, we employed kernel density estimation. Suppose that {θ_{
s
}}_{s = 1 … N} is a set of angles θi collected at a given distance d_{
ij
} and rotameric state R_{
i
}. Then the probability density p(θ_{
i
}|d_{
ij
}, R_{
i
}) can be calculated using von Mises distribution as the kernel:

where K_{
VM
} denotes the von Mises kernel function, κ is the kernel bandwidth controlling the smoothness of the kernel and I_{
0
} is the Bessel function of the first kind of order 0. Here, we set κ = 8.21 which is equivalent to σ = π/9 in the normal distribution. The distances d_{
ij
} were discretized into 0.5 Å bins which span from 2 to 15 Å. The kernel density estimator is computed at π/9 grid points that are ranged from -π to π (in case of ϕ, from -π/2 to π/2).

The relative orientation between atoms is significantly affected by chain connectivity constrains when the atoms are positioned in residues that are close in the sequence. In order to reduce the chain (or bond) connectivity effect on the estimates of orientation-dependent probability, we applied a sequence separation as done in other studies [33, 40]. In this study, only atom pairs that are separated by at least 6 residues along the protein chain are considered.

Despite the use of kernel density estimation, in the case of rarely observed rotameric states in protein structures, there is still a problem of insufficient sample data. For example, the number of Ile rotamers in (+60°, +60°) dihedral pair is less than 1,000 in our database. In such case, rather than using poorly estimated probability density p^{obs}(θ_{
i
}|d_{
ij
}, R_{
i
}), we calculated the corrected probability density {p}_{\mathit{corr}}^{\mathit{obs}}\left({\theta}_{i}|{d}_{\mathit{ij}},\phantom{\rule{0.25em}{0ex}}{R}_{i}\right) as a linear combination of p^{obs}(θ_{
i
}|d_{
ij
}, R_{
i
}) and p^{obs}(θ_{
i
}|d_{
ij
}):

where N(d_{
ij
}, R_{
i
}) is the number of observations used to estimate P^{obs}(θ_{
i
}|d_{
ij
}, R_{
i
}) and σ is a parameter that controls how many observations must be sampled such that both P^{obs}(θ_{
i
}|d_{
ij
}, R_{
i
}) and P^{obs}(θ_{
i
}|d_{
ij
}) would have equal weights. Here we set σ = 1/100.

The expected probability distribution of angles can be calculated from a reference state in which the relative orientation of atom pair is determined randomly. Thus the expected probability is calculated by:

where M is a normalization factor such that the integration of P^{exp}(ϕ) from -π/2 to π/2 becomes one. P^{exp}(ϕ) is calculated numerically because there is no analytical way for integrating above equation.

Interaction cutoff for ROTAS

Although the distance bin between 14.5 and 15 Å was used as the cutoff in the construction of distance-dependent pairwise potential, we calculate the energy score within 10 Å and ignore the long-range tail of potentials beyond 10 Å. In fact, most physical interactions between atoms rapidly converge to zero beyond 8 ~ 10 Å. However, statistically derived potentials are likely to have fluctuations in the long-range, which inherently resulted from the statistical uncertainties. For example, Figures 4 reveals that the deviations of the observed probability from the expected probability for angular parameters do not consistently decrease as the atom-pair distance increases. It is noted that the root mean square of (P^{obs}(ϕ|d) − P^{exp}(ϕ|d)) increase after 12 Å. In addition, it was reported that distance-dependent pairwise potentials between hydrophobic atom pairs have either repulsive or attractive tail in the long range, even if no electrostatic interaction exists [7]. Thus it’s not always beneficial to include the long-range interactions in statistical potentials. We set the interaction cutoff to 10 Å without fine-tuning against a specific training dataset.

Preparation of protein structures

We obtained a set of protein X-ray structures with a maximum R-factor of 0.25 and a resolution better than 2 Å from the protein sequence culling server, PISCES [54]. Also, protein chains were filtered out with a 40% sequence identity cutoff in order to have a set of non-homologous protein structures. A total 9321 protein structures were selected and downloaded from the Protein Data Bank (PDB) [55]. We did not attempt to exclude the homologous proteins to the test decoy sets from the 9321 proteins used for constructing the potential. It was reported that the exclusion has very little effect on the performance of statistical potentials [50]. The program REDUCE [56] was used to optimize the flip states of Asn, Gln, and His in all protein structures. Residues with multiple side-chain conformations were modified such that only the side-chain conformations with atoms having the highest occupancy and/or lowest temperature factors were used.

Performance evaluation using decoy sets

We tested the ROTAS potential on various sets of decoys generated by different methods. A total of 13 decoy sets, including 4 state_reduced [57], fisa [58], fisa_casp3 [58], lmds [59], hg_structal, ig_structal, ig_structal_hires, lattice_ssfit [60], moulder [61], Rosetta [62], I-TASSER [39], AMBER99 [63] and CASP5-8 [64], were used The first 8 decoy sets were downloaded from the Decoys ‘R’ Us database [65] (http://dd.compbio.washington.edu/). The moulder decoy set produced by iterative target-template alignment and comparative-modeling methods was download from the Sali lab (http://salilab.org/decoys/). Three ab-initio simulation based decoy sets, Rosetta, I-TASSWER, Amber99 were obtained from http://zhanglab.ccmb.med.umich.edu/decoys/, and http://cssb.biology.gatech.edu/amberff99/, respectively. The CASP5-8 decoy set collected from the CASP5-CASP8 experiments was downloaded from http://zhanglab.ccmb.med.umich.edu/RW/ (cleaned version). The decoy models in this set were generated by a large variety of groups and methods participated in the CASP experiments.

The performance of ROTAS potential was compared to those of four other existing atomic potentials which take into account the orientation-dependencies on the interactions between atoms, blocks or side-chains: dDFIRE [38], OPUS_PSP [37], RWplus [39], and GOAP [40]. Furthermore, we compared ROTAS to evolutionary pairwise distance-dependent potential, EPAD [66] and attempted to combine both potentials to maximize the performance. The binary programs for these potentials were downloaded from the corresponding authors’ websites. Because ROTAS can be seen as an extended version of GOAP, we constructed our own GOAP potential energy function using the same structure database and techniques that were used for the construction of ROTAS. In this manner we reduced the possibility that estimation of probability distribution, specific computational implementation, or other technical aspects could affect the results, so that the improvements of ROTAS compared to GOAP can be fairly demonstrated.

The performance of statistical potentials is evaluated by four aspects: (1) the recognition of native structure from decoys, (2) the selection of the best (most native-like) decoy model, (3) the correlation between the energy score and model quality, and (4) the classification of near-native and non-native model. The quality of decoy models was assessed by TM-score which measures the similarity between two protein structures by a score between (0, 1] [67].

Results and discussion

The influence of rotameric states on atomic interactions

We constructed both ROTAS and GOAP potentials using the same structure database and techniques as described above. Figure 5 shows the energy profiles of ROTAS and GOAP for four different atom pairs. First of all, all examples clearly show that the energy profiles of ROTAS significantly vary depending on the rotameric state. While GOAP only reflects in some average sense the preferred orientation between interacting atoms, ROTAS adjusts the preferred orientation accurately depending on the rotameric state. The first example shows the disulfide interaction between Cys S^{γ} atoms (see Figure 5A). The torsional angular term E\left(\omega |{d}_{\mathit{ij}},{R}_{i}\right) has two distinct favored positions regardless of the rotameric state. However, E(θ_{
i
}|d_{
ij
}, R_{
i
}) shows slightly different curves. The most favored positions for θ_{
i
} are 90°, −72° and 72° for three rotameric states of Cys, g+, g-, and t, respectively. This might be due to close steric interactions between the backbone atoms and Cys S^{γ}. The second example is a typical hydrogen bond interaction between Ser O and Gly N at a distance of 3 Å (see Figure 5B). It is observed that different relative position of Ser O^{γ} atom significantly affects on the hydrogen bond interaction between backbone atoms. Figure 5C shows an example of a non-polar interaction between Ile C^{γ2} and Val C^{γ1} at a distance of 5 Å. In this example, the GOAP potential shows very similar energy profiles with a particular rotameric state, (X1 = g- and X2 = t), which is the most populated rotamer for Ile (59% of Ile residues observed in this rotamer). The last example shows a polar interaction between Lys N^{ξ} and Asp O^{δ2} at a distance of 7 Å. It is noted that, although the pair distance is relatively longer rather than previous examples, the energy profiles of different rotameric states significantly differ. This suggests that the rotamer-dependency is not limited to short range interactions resulting from strong steric effects.

Native structure recognition

We assessed the performance of ROTAS in terms of its ability for recognizing the native structures from decoy models and compared it with those of four other atomic statistical potentials. In this test, the performance was assessed by two measures: the number of targets having the native structure ranked as the lowest energy score and Z-score of the native structure. The Z-score represents the energy gap between the energy of native structure (E_{
native
}) and the averaged energy of all decoys (〈E_{
decoy
}〉) in units of the energy standard deviation of all decoys (σ_{
decoy
}), which is defined as:

The lower the Z-score, the better the potential is for recognizing the native structures.

The results of the native structure recognition are summarized in Table 2. ROTAS could recognize total 409 native structures correctly out of 469 targets, which is the best success rate (87.2%) in the comparison. Although RWplus and GOAP record the highest success rate on I-TASSER and Amber 99, respectively, for the remaining 11 decoy sets, ROTAS recognized native structures more or equal than other potentials. GOAP recognized 399 native structures (85.1% success rate) with the average Z-score of −3.35. These results are consistent with those in the GOAP article which reported that the success rate and the average Z-score of GOAP are 81.3% (226 out of 278) and −3.57, respectively.

The relative improvement of ROTAS over GOAP can be clearly seen in the average Z-scores. While GOAP correctly recognized the native structures comparable to ROTAS, it is noticed that ROTAS shows consistently improved Z-scores over all decoy sets tested here. Figure 6 shows the relationship between the energy scores of ROTAS and GOAP for all native (red) and decoy (gray) structures used in the test. It can be easily confirmed that ROTAS scores native structures with lower energies and decoy models with higher energies, compared to GOAP. We further investigated how this relationship can be affected by the interaction cutoff or the database used for deriving statistical potentials (see Additional file 1). We found that, as the interaction cutoff increases (e.g. > 10 Å), the correlation between ROTAS and GOAP scores decreases. However, the tendency that ROTAS gives lower scores to native structures than GOAP could be observed over different cutoffs. The use of different databases did not make a significant change in the relationship.

We found that the performance of ROTAS in native structure recognition is largely affected by experimental methods used to determine the native structures. The success rate of ROTAS is 89% for targets whose native structures were determined by X-ray crystallography, whereas the success rate significantly decreases to 60% when the native structures were determined by NMR spectroscopy (see Table 3). Furthermore, both the average success rate and Z-score decrease for low-resolution native structures. This might be because the ROTAS potential was constructed based on high-resolution X-ray structures. The large margin of error in the location of atoms in low-resolution structures (e.g., > 2.2 Å) would decrease the confidence of computed energy score. This trend is also observed for other potential energy functions except RWplus which performs very well on NMR native structures. In fact, the RWplus potential can correctly recognize all 18 native NMR structures in the I-TASSER decoy set with low Z-scores.

Best model selection

We also assessed the ability of ROTAS in selecting the best models without native structures. This is more difficult and realistic task than the native structure recognition because, in practice, potential energy functions are used to find more and more native-like conformations in an iterative way when the native structure is not known. Thus, good potential energy function should be able to score the most native-like decoy model in the lowest energy. In this study, we use TM-score [67] to assess the quality of decoy models quantitatively. The TM-score measures the similarity between two protein structures by a score between (0, 1]. It is reported that TM-score is more accurate than other measures such as RMSD or GDT_TS because TM-score is sensitive to overall topology rather than local substructures [68].

Table 4 summarizes the result of the best model selection by dDFIRE, OPUS_PSP, RWplus, GOAP and ROTAS for 13 decoy sets. Measures log P_{B1} and log P_{B10} are the log probability of selecting the best (highest TM-score) model as the lowest energy model or among the top 10 lowest energy models, respectively. Suppose the top i^{th} scoring conformation x_{
i
} has the TM-score rank of R_{
i
} in n decoy models, then the log probability can be calculated as

In both measures, GOAP and ROTAS shows better performance than other three potentials, dDFIRE, OPUS_PSP and RWplus. The average log P_{B1} by GOAP is slightly better than that by ROTAS, whereas the average log P_{B10} by ROTAS is better than that by GOAP. This indicates that the lowest energy model by GOAP is likely to be better in TM-score than that by ROTAS. However, when we consider the top 10 lowest energy models, ROTAS tend to include better TM-score decoy models in the top 10 than GOAP.

Correlation between the energy score and decoy model quality

Next, we examined the correlation of the energy score and the quality of decoy models in order to assess the ability of ROTAS in guiding conformation sampling to near-native states. In an energy landscape perspective, a good potential energy function should not only be able to make a deep energy minimum with steep wall at the native state but also be able to form a middle-range funnel biased toward the native state. In Table 5, we compare the performance of potentials as assessed by both their Pearson correlation coefficient (r) and the Kendall’s rank correlation coefficient (τ) between the energy score and TM-score. Overall, the performance of potentials does not show significant difference depending on the correlation measures. We find that ROTAS shows the best performance in both measures. GOAP yields the second best performance in the average correlation coefficients. dDFIRE and RWplus have comparable performance although the average correlation coefficients of RWplus is slightly better than those of dDFIRE. OPUS_PSP performs significantly worse than the other potentials tested although its performance comes in third in the native structure recognition. Figure 7 shows some examples of the correlation between ROTAS energy and TM-score from different decoy sets.

Classification of near-native and non-native model

In order to compare the performance of ROTAS and other potentials in a more robust way, we evaluated the performance of statistical potentials using receiver operating characteristic (ROC) technique [69]. That is, the energy score was used to rank the decoy models for each target, and then thresholds were applied to classify a group of near-native models among a pool of putative models. The near-native (positive) were defined as those with TM-score larger than 0.5 with respect to the native structure, and non-native (negative) models otherwise. In fact, it is reported that protein structures having a TM-score > 0.5 are mostly in the same fold [68]. ROC curves were obtained by plotting the true positive ratio against the corresponding false positive ratio for all thresholds on the energy score.

We computed the area under the ROC curve (AUC) which provides a robust measure of accuracy over the whole range of thresholds. In the context of this test, the AUC represents the probability that a potential energy function scores a randomly chosen near-native (positive) model lower than a randomly chosen non-native (negative) model. Table 6 presents the results of the classification test. The average AUC for each decoy set is shown. We performed this classification test only on targets having a sufficient number of near-native models (>10). The four decoy sets including hg_structal, ig_structal, ig_structal_hires, and lattice_ssfit were excluded, accordingly. Although RWplus and dDFIRE give the best average AUCs for one or two decoy sets, ROTAS provides the best classification performance against all other decoy sets. Thus, the highest average AUC for all targets is obtained by ROTAS.

To quantify the statistical significance of the difference between ROTAS and other potentials, P values of the paired t-test of the differences between ROTAS and other potentials for the AUCs were also calculated. Cleary, ROTAS gives statistically significant (P value < 0.01) better results than all other potentials.

So far, the results showed that ROTAS performs better than other competing potentials not only in native structure recognition, but also in best model selection and correlation coefficients between energy and model quality. The following sections discuss factors affecting on the performance of ROTAS as well as a possible way to improve the performance by combining other statistical potentials.

Interaction cutoff effect on the performance

The interaction cutoff effect on the performance of ROTAS and GOAP was examined. The performances of ROTAS and GOAP are significantly affected by the interaction cutoff (see Figure 8). Interaction cutoffs between 7 and 10 Å maximize the number of correctly recognized native structures and minimize the average Z-score for both potentials. Increasing or decreasing the cutoff outside of this range makes the performance for native structure recognition worse dramatically. The performance of ROTAS and GOAP for recognizing the best models is maximized around 11 ~ 13 Å. On the other hand, as the interaction cutoff increases, the average correlation coefficient decreases. But the slopes around 13 ~ 15 Å are almost zero. Although the optimal interaction cutoff varies depending on the evaluation criteria, we confirm that the long-range interactions in statistical potentials could reduce the performance of potentials and an interaction cutoff of 10 Å for ROTAS gives a moderate performance on various evaluation criteria. It should be noticed that even though optimal interaction cutoffs are applied to individual potentials, ROTAS performs better than GOAP.

It is noticed that the highest average correlation coefficient is obtained when we consider all the long-range interactions available in the potentials. However, in this case, the native structures are poorly recognized. A similar observation that a scoring function producing a good linear correlation is normally less capable of recognizing the native state has been reported in a previous study [70]. A theoretical study argue that the potential energy of near-native conformations might not be linearly related to their distances from the native state [71]. Also, since a shorter interaction cutoff would increase ruggedness of the energy landscape [72], the energy score of decoy models might be affected by small structural differences sensitively.

Different reference states

We applied five widely-used reference states including DFIRE, DOPE, RW, RAPDF and KBP for the distance-dependent pairwise potential in ROTAS and compared their performances. To rigorously compare the influence of the reference state on the performance, we constructed all five distance-dependent pairwise potentials using the same structure database, the same cutoff distance, and the same bin width. Table 7 summarizes the performance results on the 13 decoy sets. It is not clear to find the best reference state outperforming other reference states. In terms of Rank1, there is little difference on the performance. Each reference state shows strength on difference evaluation criteria as incorporated into ROTAS. The RAPDF reference state gives the best average Z-score whereas the DFIRE reference state shows the best average log P_{B10}. The RW reference state shows the best performance on log P_{B1} and both correlation measures. Overall, the DFIRE and RW reference states are found to show better performance than other three reference states in ROTAS.

Possible improvement by incorporating evolutionary information

Beyond structural features embedded in known protein structures, evolutionary information also can be utilized in protein structure prediction [73]. Evolutionary pairwise distance-dependent potential (EPAD) [66] is a successful example of statistical potentials utilizing evolutionary information in a large amount of sequence data. In fact, EPAD has different energy profile between two atoms depending on the protein under consideration and the sequence profile context of the atoms (i.e. evolutionary information). As a possible way to improve ROTAS, we built a composite energy function by replacing the distance-dependent pairwise energy term in ROTAS with EPAD.

Table 8 compares the performance of EPAD, ROTAS and the composite energy function, EPAD + ROTAS. It was confirmed that ROTAS could improve the performance in native structure recognition when incorporating EPAD. It correctly recognized 417 native structures, 7 more than ROTAS alone. The average Z-score was also improved. However, in correlation coefficients, EPAD shows the best performance, which indicates that EPAD would be good for ab initio folding. It should be noted that, in EPAD + ROTAS, we did not fine-tune weights for energy terms (i.e. equal weight). In fact, EPAD ignores side-chain atoms in energy calculation (i.e. backbone-based potential), while ROTAS takes all atoms into account. Thus, it would be desirable to adjust weights for ROTAS and EPAD to maximize the performance when building a composite energy function.

Conclusions

In this study, we hypothesized that the rotameric state of residues critically affects on the specificity of non-bonded interactions within protein structures. This idea was applied to develop a new multibody statistical potential (ROTAS) for protein structure prediction. The interaction between two atoms is specified by not only the distance and relative orientation but also by two state parameters concerning the rotameric state of the residues to which the interacting atoms belong. It was clearly found that the rotameric state is correlated to the specificity of atomic interactions. Furthermore, such rotamer-dependencies are not limited to specific type or certain range of interactions.

The incorporation of accurate modeling of residue flexibility has been shown to be a possible means of improving the specificity of potential energy functions. We tested ROTAS using various decoy sets and compared its performance to those of several existing atomic statistical potentials which incorporate orientation-dependent energy terms. For a fair comparison, we implemented our own GOAP potential using the same structure database and techniques used for the construction of ROTAS. The results showed that ROTAS performs better than other competing potentials not only in native structure recognition, but also in best model selection and correlation coefficients between energy and model quality. In particular, the relative improvement of ROTAS over GOAP indicates that the rotameric state of residues can be incorporated for a fine-tuning of atomic-level statistical potentials. The effectiveness of ROTAS may provide insightful information for the development of many applications which require accurate side-chain modeling such as homology modeling, protein design, mutation analysis, protein-protein docking and flexible ligand docking.

References

Tanaka S, Scheraga HA: Medium- and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins. Macromolecules. 1976, 9: 945-950. 10.1021/ma60054a013.

Sippl MJ: Calculation of conformational ensembles from potentials of mean force. an approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol. 1990, 213: 859-883. 10.1016/S0022-2836(05)80269-4.

Thomas PD, Dill KA: Statistical potentials extracted from protein structures: how accurate are they?. J Mol Biol. 1996, 257: 457-469. 10.1006/jmbi.1996.0175.

Hamelryck T, Borg M, Paluszewski M, Paulsen J, Frellsen J, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J: Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS One. 2010, 5: e13714-10.1371/journal.pone.0013714.

Miyazawa S, Jernigan RL: An empirical energy potential with a reference state for protein fold and sequence recognition. Proteins. 1999, 36: 357-369. 10.1002/(SICI)1097-0134(19990815)36:3<357::AID-PROT10>3.0.CO;2-U.

Su Y, Zhou A, Xia X, Li W, Sun Z: Quantitative prediction of protein-protein binding affinity with a potential of mean force considering volume correction. Protein Sci. 2009, 18: 2550-2558. 10.1002/pro.257.

Deane CM, Blundell TL: A novel exhaustive search algorithm for predicting the conformation of polypeptide segments in proteins. Proteins Struct Funct Genet. 2000, 40: 135-144. 10.1002/(SICI)1097-0134(20000701)40:1<135::AID-PROT150>3.0.CO;2-1.

De Brevern AG, Valadié H, Hazout S, Etchebest C: Extension of a local backbone description using a structural alphabet: a new approach to the sequence-structure relationship. Protein Sci. 2002, 11: 2871-2886.

Figureau A, Soto MA, Tohá J: A pentapeptide-based method for protein secondary structure prediction. Protein Eng. 2003, 16: 103-107. 10.1093/proeng/gzg019.

Fernández A, Sosnick TR, Colubri A: Dynamics of hydrogen bond desolvation in protein folding. J Mol Biol. 2002, 321: 659-675. 10.1016/S0022-2836(02)00679-4.

Kolinski A, Skolnick J: Discretized model of proteins. I. Monte Carlo study of cooperativity in homopolypeptides. J Chem Phys. 1992, 97: 9412-9426. 10.1063/1.463317.

Jonassen I, Eidhammer I, Conklin D, Taylor WR: Structure motif discovery and mining the PDB. Bioinformatics. 2002, 18: 362-367. 10.1093/bioinformatics/18.2.362.

Karlin S, Zhu Z-Y: Characterizations of diverse residue clusters in protein three-dimensional structures. Proc Natl Acad Sci U S A. 1996, 93: 8344-8349. 10.1073/pnas.93.16.8344.

Zhu Z-Y, Karlin S: Clusters of charged residues in protein three-dimensional structures. Proc Natl Acad Sci U S A. 1996, 93: 8350-8355. 10.1073/pnas.93.16.8350.

Jonassen I, Eidhammer I, Taylor WR: Discovery of local packing motifs in protein structures. Proteins Struct Funct Genet. 1999, 34: 206-219. 10.1002/(SICI)1097-0134(19990201)34:2<206::AID-PROT6>3.0.CO;2-N.

Mayewski S: A multibody, whole-residue potential for protein structures, with testing by Monte Carlo simulated annealing. Proteins. 2005, 59: 152-169. 10.1002/prot.20397.

Munson PJ, Singh RK: Statistical significance of hierarchical multi-body potentials based on Delaunay tessellation and their application in sequence-structure alignment. Protein Sci. 1997, 6: 1467-1481. 10.1002/pro.5560060711.

Buchete N-V, Straub JE, Thirumalai D: Development of novel statistical potentials for protein fold recognition. Curr Opin Struct Biol. 2004, 14: 225-232. 10.1016/j.sbi.2004.03.002.

Miyazawa S, Jernigan RL: How effective for fold recognition is a potential of mean force that includes relative orientations between contacting residues in proteins?. J Chem Phys. 2005, 122: 024901-10.1063/1.1824012.

Wu Y, Lu M, Chen M, Li J, Ma J: OPUS-Ca: a knowledge-based potential function requiring only Calpha positions. Protein Sci. 2007, 16: 1449-1463. 10.1110/ps.072796107.

Lu M, Dousis AD, Ma J: OPUS-PSP: an orientation-dependent statistical all-atom potential derived from side-chain packing. J Mol Biol. 2008, 376: 288-301. 10.1016/j.jmb.2007.11.033.

Yang Y, Zhou Y: Specific interactions for ab initio folding of protein terminal regions with secondary structures. Proteins. 2008, 72: 793-803. 10.1002/prot.21968.

Zhang J, Zhang Y: A novel side-chain orientation dependent potential derived from random-walk reference state for protein fold selection and structure prediction. PLoS One. 2010, 5: e15386-10.1371/journal.pone.0015386.

Ponder JW, Richards FM: Tertiary templates for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol. 1987, 193: 775-791. 10.1016/0022-2836(87)90358-5.

Schrauber H, Eisenhaber F, Argos P: Rotamers: to be or not to be? An analysis of amino acid side-chain conformations in globular proteins. J Mol Biol. 1993, 230: 592-612. 10.1006/jmbi.1993.1172.

Dunbrack RL, Karplus M: Conformational-analysis of the backbone-dependent rotamer preferences of protein side-chains. Nat Struct Biol. 1994, 1: 334-340. 10.1038/nsb0594-334.

Shapovalov MV, Dunbrack RL: A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure. 2011, 19: 844-858. 10.1016/j.str.2011.03.019.

Zhou H, Zhou Y: Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci. 2002, 11: 2714-2726.

Samudrala R, Moult J: An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. J Mol Biol. 1998, 275: 895-916. 10.1006/jmbi.1997.1479.

Lu H, Skolnick J: A distance-dependent atomic knowledge-based potential for improved protein structure selection. Proteins. 2001, 44: 223-232. 10.1002/prot.1087.

Word JM, Lovell SC, Richardson JS, Richardson DC: Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J Mol Biol. 1999, 285: 1735-1747. 10.1006/jmbi.1998.2401.

Park B, Levitt M: Energy functions that discriminate X-ray and near-native folds from well-constructed decoys. J Mol Biol. 1996, 258: 367-392. 10.1006/jmbi.1996.0256.

Simons KT, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol. 1997, 268: 209-225. 10.1006/jmbi.1997.0959.

Keasar C, Levitt M: A novel approach to decoy set generation: designing a physical energy function having local minima with native structure characteristics. J Mol Biol. 2003, 329: 159-174. 10.1016/S0022-2836(03)00323-1.

Xia Y, Huang ES, Levitt M, Samudrala R: Ab initio construction of protein tertiary structures using a hierarchical approach. J Mol Biol. 2000, 300: 171-185. 10.1006/jmbi.2000.3835.

John B, Sali A: Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Res. 2003, 31: 3982-3992. 10.1093/nar/gkg460.

Wroblewska L, Skolnick J: Can a physics-based, all-atom potential find a protein’s native structure among misfolded structures? I. Large scale AMBER benchmarking. J Comput Chem. 2007, 28: 2059-2066. 10.1002/jcc.20720.

Rykunov D, Fiser A: New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics. 2010, 11: 128-10.1186/1471-2105-11-128.

Zhao F, Xu J: A position-specific distance-dependent statistical potential for protein structure and functional study. Structure. 2012, 20: 1118-1126. 10.1016/j.str.2012.04.003.

Xu J, Zhang Y: How significant is a protein structure similarity with TM-score = 0.5?. Bioinformatics. 2010, 26: 889-895. 10.1093/bioinformatics/btq066.

Cossio P, Granata D, Laio A, Seno F, Trovato A: A simple and efficient statistical potential for scoring ensembles of protein structures. Sci Rep. 2012, 2: 351-doi:10.1038/srep00351

Panjkovich A, Melo F, Marti-Renom MA: Evolutionary potentials: structure specific knowledge-based potentials exploiting the evolutionary record of sequence homologs. Genome Biol. 2008, 9: R68-10.1186/gb-2008-9-4-r68.

The authors declare that they have no competing interests.

Authors’ contributions

JP developed the ROTAS potential energy function, conducted comparison tests, and drafted the manuscript. KS conceived of the study, and participated in its design and coordination and revised the draft manuscript. Both authors read and approved the final manuscript.

Additional file 1: The effects of sample database and interaction cutoff on the relationship between ROTAS and GOAP scores. Two different databases, each of which includes 6,000 protein structures randomly selected from our database, are used to derive GOAP and ROTAS. (DOCX 107 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Park, J., Saitou, K. ROTAS: a rotamer-dependent, atomic statistical potential for assessment and prediction of protein structures.
BMC Bioinformatics15, 307 (2014). https://doi.org/10.1186/1471-2105-15-307

## Comments

View archived comments (1)