Validation of protein models by a neural network approach
© Mereghetti et al; licensee BioMed Central Ltd. 2008
Received: 24 April 2007
Accepted: 29 January 2008
Published: 29 January 2008
The development and improvement of reliable computational methods designed to evaluate the quality of protein models is relevant in the context of protein structure refinement, which has been recently identified as one of the bottlenecks limiting the quality and usefulness of protein structure prediction.
In this contribution, we present a computational method (Artificial Intelligence Decoys Evaluator: AIDE) which is able to consistently discriminate between correct and incorrect protein models. In particular, the method is based on neural networks that use as input 15 structural parameters, which include energy, solvent accessible surface, hydrophobic contacts and secondary structure content. The results obtained with AIDE on a set of decoy structures were evaluated using statistical indicators such as Pearson correlation coefficients, Z_nat, fraction enrichment, as well as ROC plots. The performance of AIDE turned out to be comparable, and often complementary, to that of available state-of-the-art learning-based methods.
In light of the results obtained with AIDE, as well as its comparison with available learning-based methods, it can be concluded that AIDE can be successfully used to evaluate the quality of protein structures. The use of AIDE in combination with other evaluation tools is expected to further enhance protein refinement efforts.
The very large and continuously increasing amount of data obtained by genome sequencing makes the development of reliable computational methods capable of inferring protein structures from sequences a crucial step for the functional annotation of proteins. In fact, functional annotation is often strictly dependent on the availability of structural data, which in turn are still difficult to obtain experimentally. As a consequence, efforts and progress in high-throughput X-ray and NMR methods need to be accompanied by computational techniques suitable for three-dimensional structure prediction, such as homology modeling, fold recognition or ab initio methods [1–7], which are intrinsically characterized by different levels of accuracy.
In parallel to the development and improvement of prediction methods, reliable and accurate evaluation tools are necessary to check the quality of computational protein models [8, 9]. Moreover, in the context of protein structure refinement, which has been recently identified as one of the bottlenecks limiting the quality and usefulness of protein structure prediction, it has been noted that improvements in the selection of the most native-like model from an ensemble of closely related alternative conformations can be crucial. The increasing importance of the field of quality assessment methods is demonstrated by the introduction of a dedicated section in the latest CASP edition (CASP7).
To evaluate protein structures, several different scoring functions have been developed, which can be classified into different categories depending on the principles and on the structural features considered in the evaluation. Physical scoring (energy) functions aim to describe the physics of the interaction between atoms in a protein and are generally parameterized on molecular systems smaller than proteins. Knowledge-based scoring functions are designed by evaluating the differences between some selected features of a random protein model and the characteristics of a real protein structure [12–16].
Learning-based functions can be developed by training algorithms to discriminate between correct and incorrect models. Independently of the category, scoring functions are generally tested by examining their capability to detect the native structure among a set of decoys, which can be generated in several different ways [19–21].
It is important to note that the performance of learning-based functions is generally strongly dependent on the specific aim for which they were developed, and consequently on the training set used. As an example, ProQ, a neural network based method developed to predict the quality of protein models, was specifically designed to discriminate between correct and wrong models, i.e. to recognize folds that are not compatible with a protein sequence. In fact, ProQ was recently combined successfully with the Pcons fold recognition predictor and ranked as one of the best methods in a recent survey of quality assessment methods. Other reliable and extensively used computational methods for validating the quality of protein structures are PROSA, ERRAT, Verify 3D [25, 26], PROCHECK, what-if, PROVE and victor/FRST.
In the present contribution, we present a computational method (Artificial Intelligence Decoys Evaluator: AIDE) that is able to reliably and consistently discriminate between correct and incorrect protein models. In particular, the quality of the protein structure is evaluated with neural networks using as input 15 structural parameters, which include solvent accessible surface, hydrophobic contacts and secondary structure content. In the first section of the paper, the neural network structure and the training procedure are presented and discussed. In the second section, the performance of the neural network is evaluated, compared to available methods, and critically discussed.
Results and Discussion
The evaluation of the quality of protein structures is generally carried out by calculating a score which is a function of a set of parameter values computed for the protein model under study. In our computational procedure, the relation between the parameter space and the scoring values is described using neural networks, because of their ability to capture complex non-linear relationships among data.
Selection of protein parameters related to structure quality
Among the possible parameters that can be computed for a protein structure, we have selected some properties that are expected to be related to structure quality: solvent accessible surface of hydrophobic and hydrophilic residues, secondary structure content, the fraction of secondary structure content of the model fitting with that predicted by PSIPRED, number of hydrophobic contacts, and selected PROCHECK parameters (see Methods for details). It should be noted that other possibly relevant parameters, such as the number of hydrogen bonds, have not been used due to intrinsic difficulties in the normalisation of their values.
Selection of the parameters used to evaluate structure similarity
A key issue for evaluating the quality of a predicted protein structure is the measure of its "distance" relative to the "real" structure, experimentally obtained by X-ray diffraction or NMR. Since AIDE has been developed to evaluate protein models that are often characterized by the correct fold but may differ in local details, the backbone root mean square deviation (RMSD) of the protein model relative to the X-ray structure can be considered a suitable measure of structure similarity. However, it is well known that the proper evaluation of the quality of protein structures can be a non-trivial task, often depending on the methods used to generate protein models. Therefore, several other measures of protein structure similarity have been formulated, the most commonly used being GDT-TS, LG-score, TM-score and MaxSub, which have also been adopted in the present work.
Selection and optimization of the neural networks
A preliminary evaluation of the relative importance of each parameter in the description of structure quality was obtained using linear models built with the M5-prime attribute selection algorithm, as implemented in Weka 3.4.2. A different linear model was computed for each accuracy measure. Analysis of the linear models revealed that the secondary structure content and the solvent accessible surface have the highest importance in all models. Moreover, the results show that no parameter can be excluded, since non-negligible weights are associated with all selected parameters when the chosen accuracy measures are considered as a whole (Additional file 1).
The neural networks forming the core of AIDE are four-layer feed-forward neural networks with fifteen neurons (corresponding to the selected parameters) in the input layer, two hidden layers formed by two neurons each, and one neuron in the output layer. A linear activation function was chosen for all neurons. This architecture was selected after testing different combinations of hidden layers (one or two) and different numbers of hidden neurons per layer (from two to ten nodes per layer). In addition, we also tested different activation functions of the neurons (sigmoid, log-sigmoid and linear functions). It turned out that, among the different combinations, the neural network featuring two hidden layers formed by two neurons each gave the best results. In fact, an increase in the number of neurons led to poorer performance, probably due to the increased difficulty of the optimization procedure arising from the greater network complexity. To carry out the optimization of the neural networks, we have implemented the attractive-repulsive particle swarm optimization algorithm (AR-PSO), as explained in Methods. Training of the neural networks using more conventional approaches (gradient descent, Levenberg-Marquardt) led to slightly lower performance (Additional file 2). This may be due to the greater exploration ability that characterizes PSO methods.
AIDE was trained and tested on datasets of all-atom protein decoys for which the three-dimensional structures are available. Since it is known that methods used for building decoys may introduce some systematic bias, it is important to benchmark a scoring function on different decoy sets in order to assess its generality. The overall dataset used in the present study is composed of an ensemble of widely used all-atom datasets containing models of different proteins (4state-reduced, fisa, fisa-casp3, rosetta all-atoms, CASP5, CASP7, Livebench2, lmds, and hg-structal [19–21, 38–40]), plus a molecular dynamics set that was generated in our laboratory from X-ray structures (see Methods).
After computation of the structural parameters to be fed to the neural networks, the overall dataset was subdivided into a training set and a test set, which were composed of 13693 and 49126 structures, respectively. The training-set includes only the proteins belonging to the LiveBench2 and CASP7 decoy sets (13693 model structures built on 96 different proteins). The test-set includes the lmds, CASP5, hg_structal, MD, Rosetta and 4state-reduced subsets (49126 models built on 97 proteins). The LiveBench2 and CASP7 decoy sets were chosen as training sets because they contain models built with different methods and of different protein size, ranging from 20 to 500 residues. No protein contained in the training set is also present in the test set.
Then, a population of 50 neural networks was trained starting from different initializations of the structural parameters. The network featuring the best performance (the highest correlation coefficient on the training set) was selected as the working network in AIDE.
A different neural network was trained for each measure of structure similarity chosen to evaluate protein quality (RMSD, TM-score, GDT-TS, LG-score and MaxSub). Therefore, five different versions of AIDE were obtained from the training procedure, referred to in the following as AIDE RMSD, AIDE TM-score, AIDE GDT-TS, AIDE LG-score and AIDE MaxSub.
Assessment of AIDE performance
The performance of the different versions of AIDE has been compared to results obtained from widely used methods developed to evaluate the quality of protein models.
The performances of the different methods were evaluated using a test-set which includes lmds, CASP5, hg_structal, MD, Rosetta and 4state-reduced subsets. The LiveBench2 and CASP7 sets were already used for training AIDE and therefore were not used in the comparative evaluation.
Table 1. Pearson correlation coefficients. For each dataset belonging to the test-set, the Pearson correlation coefficient between the predicted and the computed values is reported. The performance of AIDE is compared to that of the ProQ and Victor/FRST validation software.
Table 2. 10%-fraction enrichment. The 10%-fraction enrichment is shown for each dataset belonging to the test-set. The performance of AIDE is compared to that of the ProQ and Victor/FRST validation software.
Table 3. Z_nat. Comparison of Z_nat values obtained using AIDE and other protein structure validation software. ProQ values have been obtained from Ref. 17.
Analysis of Pearson correlation coefficients (Table 1) shows that, according to this statistical indicator, the different AIDE versions behave quite similarly. Most importantly, the average AIDE performance is similar to or slightly better than that obtained by two state-of-the-art methods, ProQ and Victor. It is also noteworthy that the performance of AIDE changes significantly across the different subsets forming the test-set. In particular, very high correlation coefficients are obtained with the MD and hg_structal datasets (correlation coefficients in the range 0.61–0.89 and 0.48–0.73, respectively), whereas low values of Pearson coefficients are associated with the CASP5 dataset (0.15–0.38). Pearson correlation coefficients that vary across subsets are also obtained with ProQ and Victor. In particular, and differently from AIDE, low correlation coefficients are obtained by ProQ for the Rosetta subset, and by Victor for the fisa subset (Table 1). The factors responsible for such non-homogeneous performance of the methods, when applied to different datasets, could not be unraveled and might require further dissection of the test-set. In light of these results and observations it can be concluded that, even if the overall performances of AIDE, ProQ and Victor are similar, these methods can behave very differently on protein models obtained using different approaches, suggesting that the combined use of AIDE, ProQ and Victor could be useful to properly evaluate the quality of a protein structure.
Analysis of F.E. values (Table 2) again shows quite similar overall performances of AIDE, ProQ and Victor. However, the average F.E. values obtained using ProQ are consistently higher (by 5–10%) than the corresponding values obtained with Victor and AIDE. A more detailed analysis of the F.E. values obtained from the different subsets composing the test set highlights some interesting trends. F.E. values obtained from the lmds and fisa subsets are consistently lower than the average. Moreover, AIDE and ProQ versions trained using different parameters to evaluate structure similarity can give quite different results. The latter observation is particularly evident for the lmds subset. It is also interesting to note that the best performances on the different subsets forming the test set are often obtained by different methods. As an example, the best F.E. values for the fisa subset are obtained using AIDE, whereas the best values for the hg_structal subset are obtained with ProQ, further suggesting that the combined use of the different methods can be a good strategy to obtain a more confident evaluation of the quality of a protein structure.
Z_nat makes it possible to evaluate how (and if) the different methods distinguish the native (X-ray) structure from the ensemble of its models (Table 3). In this case it was possible to extend the comparison to other methods widely used to evaluate the quality of protein structures (Errat, Prosa II and Verify 3D). Only the lmds and 4state_reduced subsets have been used in this comparison because these are the only datasets in common among all the compared methods for which data are available. Analysis of Z_nat values reveals that ProQ and Victor perform better in this statistical test, whereas AIDE results are generally comparable to those obtained with Errat, Prosa II and Verify 3D. Notably, very low Z_nat scores are obtained using AIDE RMSD and AIDE LG-score on the 4state_reduced subset.
Considering the different AIDE versions, a clear distinction can be observed when comparing the overall accuracy of AIDE RMSD and AIDE MaxSub relative to AIDE LG-score, AIDE GDT-TS and AIDE TM-score (Figure 1). Notably, a similar difference was not evident when considering the correlation coefficients or the fraction enrichment test. It is also important to note that AIDE LG-score behaves very similarly to ProQ LGscore up to about 60% sensitivity, whereas at higher sensitivity levels AIDE outperforms ProQ LGscore. These observations further corroborate the hypothesis that the combined use of ProQ and AIDE should give improved results in the evaluation of the quality of three-dimensional protein models.
The web interface of AIDE
The availability of five different AIDE versions gives a comprehensive picture of the overall performance of the method. However, the overload of output information can become a drawback for the user interested only in the most relevant results. In fact, the analysis of AIDE performance has shown that the five different versions of AIDE are generally characterised by similar behaviour (see Tables 1, 2 and 3). To better evaluate the degree of correlation among the different AIDE versions, we have carried out a principal component analysis on the Pearson correlation matrix of the descriptors chosen to evaluate model quality. This analysis reveals a strong correlation between TM-score, GDT-TS and MaxSub. The different clustering of TM-score, GDT-TS and MaxSub relative to RMSD and LG-score is mainly due to the inverse relationship between the two families (Additional files 3 and 4). Therefore, two of these highly correlated parameters (GDT-TS and MaxSub) have been excluded from the output of the AIDE program available on the Internet. Moreover, to help the user in the evaluation of AIDE results, we have defined a threshold for each predicted parameter, in order to discriminate between incorrect and correct models. In particular, correct models should have TM-score ≥ 0.31, RMSD ≤ 4.96 Å and LG-score ≤ 0.35. These thresholds were chosen using a dataset of manually assessed models composed of some CASP5 targets belonging to the new fold and fold recognition categories. According to the visual evaluation of Aloy and coworkers, the models were divided into three classes: class 2 ("excellent") when the overall fold is correct, class 1 ("good") when the model is considered partway to the correct fold, and class 0 for all the other models. For each model, the TM-score, LG-score and RMSD were computed (Additional files 5, 6, 7) and the average value for the models belonging to the "excellent" class was used as threshold.
To further evaluate the classification ability using the chosen thresholds, the sensitivity and the specificity based on the ROC plots were also computed (Additional file 8).
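The thresholds quoted above can be applied mechanically to a set of predicted values. The following is a minimal sketch and not the code of the AIDE web server: the function name `classify_model` and the choice to report each criterion separately are assumptions made for illustration, and only the three cutoff values come from the text.

```python
# Cutoffs quoted in the text; everything else is illustrative.
TM_SCORE_MIN = 0.31   # correct models: TM-score >= 0.31
RMSD_MAX = 4.96       # correct models: RMSD <= 4.96 Angstrom
LG_SCORE_MAX = 0.35   # correct models: LG-score <= 0.35


def classify_model(tm_score, rmsd, lg_score):
    """Report, for each predicted value, whether it passes its threshold.

    Hypothetical helper: the three criteria are reported separately
    because the text does not say how they should be combined.
    """
    return {
        "TM-score": tm_score >= TM_SCORE_MIN,
        "RMSD": rmsd <= RMSD_MAX,
        "LG-score": lg_score <= LG_SCORE_MAX,
    }


verdict = classify_model(tm_score=0.45, rmsd=3.2, lg_score=0.50)
print(verdict)  # {'TM-score': True, 'RMSD': True, 'LG-score': False}
```

Keeping the three verdicts separate leaves the final decision to the user when the predicted measures disagree.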
In this paper we have presented AIDE, a neural network system which is able to evaluate the quality of protein structures obtained by prediction methods.
AIDE differs from other evaluation methods mainly in: i) a different choice of the parameters used to describe the protein structure, ii) a different choice of the parameters related to structure quality, iii) a novel strategy used to optimize the neural networks. The overall performance of AIDE is comparable to that of recently published state-of-the-art methods, such as ProQ and Victor. However, detailed comparative analysis of results obtained using AIDE, ProQ and Victor reveals that the three methods have different and often complementary abilities to properly assess the quality of protein structures. This observation suggests that the combined use of AIDE, ProQ and Victor could increase the reliability of the evaluation of protein structure quality. AIDE is presently available on the Internet.
The 4state-reduced set is an all-atom version of the models generated by Park & Levitt using a four-state off-lattice model.
The fisa and fisa-casp3 sets contain decoys of four small alpha-helix proteins. In these sets main chains were generated using a procedure of fragment insertion based on simulated annealing: native-like structures were assembled from a combination of fragments of known unrelated protein structures characterized by similar local sequences, using Bayesian scoring functions. The side chains of fisa and fisa-casp3 were modeled with the software package SCWRL.
The hg-structal set contains hemoglobin models generated by homology modelling.
The lmds subset [19, 44] was produced by Keasar and Levitt by geometry optimizations carried out using a complex potential that contains a pairwise component, as well as cooperative hydrogen bond terms. The Rosetta all-atom decoys were generated with the ROSETTA method developed by David Baker. The molecular dynamics set of decoys was generated by molecular dynamics (MD) simulations carried out in vacuum with the software GROMACS 3.2 [45, 46]. Each protein structure was subjected to 100 ps of simulation using the OPLS force field. MD simulations were performed in the NVT ensemble at 600 K, using an external bath with a coupling constant of 0.1 ps. The LINCS algorithm was adopted to constrain bond lengths of heavy atoms, allowing us to use a 2 fs time step. Van der Waals and Coulomb interactions were truncated at 8 Å, while long-range electrostatic interactions were evaluated using the particle mesh Ewald summation scheme. The van der Waals radii were increased to 4 Å for all atoms, in order to speed up the unfolding process.
Snapshots from the trajectory have been extracted every 0.4 ps, collecting 250 misfolded structures for each protein, with a backbone RMSD (root mean square deviation between the initial structure and each snapshot) ranging from 0 to about 10 Å.
The complete dataset contains 62819 protein models built on 193 proteins.
Training-set and test-set
The dataset was split into two disjoint sets: a training-set and a test-set. The training-set includes only proteins belonging to the LiveBench2 and CASP7 decoy sets (13693 model structures built on 96 different proteins). The test-set includes the lmds, CASP5, hg_structal, MD, Rosetta and 4state-reduced datasets (49126 models built on 97 proteins).
Parameters-Descriptors used in the neural network
where the residues A, L, V, I, P, F, M, W were considered as hydrophobic and SAS_total is the total solvent accessible surface computed using NACCESS.
The secondary structure was evaluated with the DSSP program, in which the typical 8-state DSSP definition was simplified according to the following rules: H and G to helix, E and B to strand and all other states considered as coil, in agreement with the PSIPRED definition.
where n_ss is the number of residues located in well-defined secondary structure elements, and N is the number of protein residues.
where n_c is the number of residues located in corresponding secondary structure elements according to the DSSP definition and the PSIPRED secondary structure prediction.
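The reduction rule and the two secondary structure descriptors can be sketched as follows. This is an illustration under stated assumptions: the two descriptors are taken to be the ratios n_ss / N and n_c / N suggested by the definitions above, coil residues are counted as matches when DSSP and PSIPRED agree, and all function names are hypothetical.

```python
# Reduction of the 8-state DSSP alphabet to three states, as stated in the text:
# H and G -> helix (H), E and B -> strand (E), every other state -> coil (C).
DSSP_TO_3STATE = {"H": "H", "G": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp_string):
    """Map an 8-state DSSP string to a 3-state string over H, E and C."""
    return "".join(DSSP_TO_3STATE.get(state, "C") for state in dssp_string)


def secondary_structure_fraction(three_state):
    """Assumed form n_ss / N: residues in helix or strand over all residues."""
    n_ss = sum(1 for state in three_state if state in "HE")
    return n_ss / len(three_state)


def psipred_agreement(three_state, psipred):
    """Assumed form n_c / N: residues whose reduced DSSP state equals the
    PSIPRED state (coil matches are counted too, which is an assumption)."""
    if len(three_state) != len(psipred):
        raise ValueError("both strings must cover the same residues")
    n_c = sum(1 for a, b in zip(three_state, psipred) if a == b)
    return n_c / len(three_state)


reduced = reduce_dssp("HHGG-TEEBS")
print(reduced)                                   # HHHHCCEEEC
print(secondary_structure_fraction(reduced))     # 0.7
print(psipred_agreement(reduced, "HHHHCCEEEE"))  # 0.9
```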
PROCHECK parameters. PROCHECK parameters used in AIDE. The G-factor, which is a log-odds score based on the observed distributions of stereochemical parameters, provides a measure of how "normal", or alternatively how "unusual", a given stereochemical property is.
Percentage of residues in Ramachandran plot core regions
Percentage of residues in Ramachandran plot allowed regions
Percentage of residues in Ramachandran plot generously allowed regions
Percentage of residues in Ramachandran plot disallowed regions
Number of bad contacts
G-factor for dihedral angles
G-factor for covalent bonds
Model accuracy measures
Quality of protein models was evaluated by means of five different descriptors, using the crystal structure as reference: RMSD on the backbone atoms, TM-score, GDT-TS, LG-score and MaxSub. RMSD was computed on the backbone atoms after superposing the model structure on the crystal structure, using the program CE.
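As an illustration of the RMSD descriptor, the sketch below computes the root mean square deviation between two sets of backbone coordinates that have already been superposed. The superposition step itself (performed with CE in this work) is not reproduced, and the function name `backbone_rmsd` and the coordinate layout (one (x, y, z) tuple per atom, in matching order) are assumptions.

```python
import math


def backbone_rmsd(model_coords, reference_coords):
    """RMSD between two equally long lists of (x, y, z) tuples.

    The coordinates are assumed to be already superposed and listed
    in the same atom order.
    """
    if not model_coords or len(model_coords) != len(reference_coords):
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = sum(
        (a - b) ** 2
        for atom_m, atom_r in zip(model_coords, reference_coords)
        for a, b in zip(atom_m, atom_r)
    )
    return math.sqrt(squared_sum / len(model_coords))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(backbone_rmsd(model, reference))  # 1.0
```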
TM-score was developed to evaluate the topological similarity of two protein structures. TM-score values fall into the interval [0, 1]. Scores equal to or below 0.17 indicate that the prediction is no more reliable than a random selection from the PDB library.
GDT-TS gives an estimate of the largest set of residues for which all distances between the model and the reference structure are shorter than the cutoff D. The number of residues is measured as a percentage of the length of the target structure. The values of GDT-TS fall into the interval [0, 1], with a GDT-TS of 1 corresponding to perfect superposition.
The LG-score represents the significance (P-value) of a score (S_str) associated with the best subpart of a structural alignment between the model and the correct structure. The value is measured by using a structural P-value ranging from 0 to 1, with a value of 0 corresponding to optimal superposition.
MaxSub is calculated from the largest number of residues that superimpose well over the reference structure, and produces a normalized score that ranges between 0 and 1. A MaxSub value of 1 is associated to perfect superposition.
Four-layer feed-forward neural networks were used, with fifteen neurons in the input layer, two neurons in each of the two hidden layers and one neuron in the output layer. A linear activation function was chosen for all neurons.
For each accuracy measure chosen to evaluate protein quality (RMSD, TM-score, GDT-TS, LG-score and MaxSub) a different neural network was trained.
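The forward pass of such a 15-2-2-1 network can be sketched in a few lines. This is an illustration rather than the AIDE implementation (which is written in Fortran and Perl): the presence of one bias term per neuron, the flat layout of the weight vector and the function names are assumptions.

```python
LAYER_SIZES = [15, 2, 2, 1]  # input, two hidden layers, output (from the text)


def n_weights(layer_sizes=LAYER_SIZES):
    """Number of free parameters, assuming one bias per non-input neuron."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def forward(weights, inputs, layer_sizes=LAYER_SIZES):
    """Propagate the 15 structural parameters through the network.

    weights is a flat list; each neuron owns n_in weights followed by its
    bias. The activation is linear (identity) for every neuron.
    """
    activations = list(inputs)
    pos = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        next_activations = []
        for _ in range(n_out):
            neuron = weights[pos:pos + n_in + 1]
            pos += n_in + 1
            weighted_sum = sum(w * a for w, a in zip(neuron, activations))
            next_activations.append(weighted_sum + neuron[-1])  # add the bias
        activations = next_activations
    return activations[0]


print(n_weights())                       # 41
print(forward([1.0] * 41, [1.0] * 15))   # 67.0
```

The flat weight vector is convenient because each particle of a swarm-based optimizer can then be represented directly as one such vector.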
where t is the vector of predicted values for each decoy, y is the vector of true values, μ_t, σ_t, μ_y, σ_y are the averages and the standard deviations of predicted and true values, respectively, and M is the number of decoys.
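With the symbols just defined, the Pearson correlation coefficient can be computed as in the sketch below. The normalisation by M (population standard deviations) is an assumption; the coefficient is unchanged if M - 1 is used consistently for both the covariance and the standard deviations.

```python
import math


def pearson(t, y):
    """Pearson correlation between predicted values t and true values y."""
    m = len(t)
    if m != len(y) or m < 2:
        raise ValueError("need two equally long vectors with at least 2 values")
    mu_t = sum(t) / m
    mu_y = sum(y) / m
    sigma_t = math.sqrt(sum((v - mu_t) ** 2 for v in t) / m)
    sigma_y = math.sqrt(sum((v - mu_y) ** 2 for v in y) / m)
    covariance = sum((a - mu_t) * (b - mu_y) for a, b in zip(t, y)) / m
    return covariance / (sigma_t * sigma_y)


predicted = [1.0, 2.0, 3.0, 4.0]
true_values = [2.1, 3.9, 6.2, 7.8]
print(round(pearson(predicted, true_values), 3))
```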
Optimization of neural networks was carried out using the attractive-repulsive particle swarm optimization algorithm (AR-PSO), which is a modification of the original PSO method [57, 58]. PSO is a stochastic population-based optimization approach which explores the hyper-dimensional parameter space with a population of candidate solutions named particles. Particles fly over the solution space looking for the global optimum. Each particle retains an individual memory of the best position visited and a global memory of the best position visited by all the particles.
A particle calculates its next position combining information from its last movement, the individual memory, the global memory and a random component.
in which w_i,t represents the position vector of particle i at time t (i.e. the neural network weights), the individual best is the best position identified by particle i so far (i.e. the neural network weights associated with the best performance value) and the global best is the best position identified among all the particles. The vector v represents the particle velocity, which is computed as the difference between two positions, assuming unitary time.
The difference between the individual best and w_i,t represents the individual memory component, and the difference between the global best and w_i,t represents the global one. These two terms are rescaled by the random coefficients c1 and c2, respectively. The μ coefficient is used to rescale the velocity.
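The update of a single particle can be sketched as follows. The exact form of the update rule is an assumption: here the whole velocity, including both memory terms, is multiplied by μ, and c1 and c2 are drawn independently and uniformly from [0, 2] at every step. The value μ = 0.7298 is the one used in this work; the function name `pso_step` is hypothetical.

```python
import random

MU = 0.7298  # velocity rescaling coefficient used in this work


def pso_step(w, v, best_own, best_global, rng=random):
    """One attractive PSO update for a single particle.

    w, v, best_own and best_global are equally long lists of floats:
    current position (network weights), current velocity, best position
    found by this particle and best position found by the whole swarm.
    """
    c1 = rng.uniform(0.0, 2.0)  # random weight of the individual memory
    c2 = rng.uniform(0.0, 2.0)  # random weight of the global memory
    new_v = [
        MU * (vj + c1 * (pj - wj) + c2 * (gj - wj))
        for wj, vj, pj, gj in zip(w, v, best_own, best_global)
    ]
    # Unitary time step: the new position is the old one plus the velocity.
    new_w = [wj + vj for wj, vj in zip(w, new_v)]
    return new_w, new_v


position, velocity = pso_step([0.0, 0.0], [0.1, -0.1], [1.0, 1.0], [2.0, 0.5])
print(position, velocity)
```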
where S is the number of particles in the swarm, N is the dimension of the space (the number of network weights) and the mean term is the average of parameter j among the particles.
causing the particles to spread in the phase space. If D reaches a maximal threshold (t_max) the update rule is restored as in the standard PSO method. We chose t_min = 0.1 and t_max = 5.0.
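The switching between the attractive and the repulsive phase can be sketched as follows. The diversity measure used here (mean Euclidean distance of the particles from the swarm centroid) is an assumption consistent with the symbols S, N and the per-parameter average defined above; the function names are hypothetical. The returned flag is meant to multiply the two memory terms of the update rule, so that -1 pushes particles away from the best positions.

```python
import math

T_MIN = 0.1  # diversity below this value triggers the repulsive phase
T_MAX = 5.0  # diversity at or above this value restores the standard update


def diversity(swarm):
    """Mean Euclidean distance of the particles from the swarm centroid.

    swarm is a list of S particles, each a list of N weights.
    """
    s = len(swarm)
    n = len(swarm[0])
    centroid = [sum(p[j] for p in swarm) / s for j in range(n)]
    return sum(
        math.sqrt(sum((p[j] - centroid[j]) ** 2 for j in range(n)))
        for p in swarm
    ) / s


def next_direction(current_direction, d):
    """Return +1 (attraction) or -1 (repulsion) given the diversity d."""
    if d < T_MIN:
        return -1
    if d >= T_MAX:
        return 1
    return current_direction  # between the thresholds the phase is kept


swarm = [[0.0, 0.0], [2.0, 0.0]]
print(diversity(swarm))                     # 1.0
print(next_direction(1, diversity(swarm)))  # 1
```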
The parameters c1, c2 and μ were set as in the original PSO method as c1 = c2 ∈ [0.0, 2.0] and μ = 0.7298. The maximum number of iterations was set to 10000. A population size of 5 particles was chosen. It should be noted that standard training algorithms such as gradient descent back-propagation, Levenberg-Marquardt and BFGS led to poorer results when compared to the particle swarm optimization (data not shown).
The following statistical parameters were used: Pearson correlation coefficient, already described in the neural network section, fraction enrichment (F.E.) and Z_nat.
Fraction enrichment (F.E.) is defined as the fraction of the 10% of conformations with the best structural resemblance to the native structure that is also found among the 10% best-scoring conformations.
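A minimal sketch of the 10%-fraction enrichment is given below. It assumes that both the true measure and the predicted score are of the "lower is better" kind (as for RMSD); for measures such as TM-score the values would have to be negated first. The function name and the rounding of the 10% cutoff to an integer number of decoys are assumptions.

```python
def fraction_enrichment(true_values, predicted_values, fraction=0.10):
    """Overlap between the top-fraction decoys by true and by predicted value.

    Both lists are indexed by decoy and lower values are taken to be better.
    """
    n = len(true_values)
    k = max(1, int(n * fraction))  # size of the top set (assumed rounding)
    closest = set(sorted(range(n), key=lambda i: true_values[i])[:k])
    best_scored = set(sorted(range(n), key=lambda i: predicted_values[i])[:k])
    return len(closest & best_scored) / k


true_rmsd = [float(i) for i in range(20)]  # decoy 0 is the most native-like
predicted = list(true_rmsd)
predicted[1] = 100.0                       # decoy 1 is badly mis-scored
predicted[5] = 0.5                         # decoy 5 is ranked second instead
print(fraction_enrichment(true_rmsd, predicted))  # 0.5
```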
Higher Z_nat values correspond to higher capacity to discriminate between the native structure and the corresponding decoys.
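A Z-score of this kind can be sketched as follows. The formula is an assumption, namely the distance of the native score from the mean decoy score in units of the decoy standard deviation, with the sign chosen so that a better-than-average native structure gives a positive value; the function name and the `lower_is_better` switch are hypothetical.

```python
import math


def z_nat(native_score, decoy_scores, lower_is_better=True):
    """Assumed Z-score of the native structure relative to its decoys."""
    m = len(decoy_scores)
    mean = sum(decoy_scores) / m
    std = math.sqrt(sum((s - mean) ** 2 for s in decoy_scores) / m)
    z = (mean - native_score) / std
    return z if lower_is_better else -z


decoys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5.0, std 2.0
print(z_nat(1.0, decoys))                          # 2.0
print(z_nat(9.0, decoys, lower_is_better=False))   # 2.0
```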
The Receiver Operating Characteristic (ROC) graph is a plot of all sensitivity/specificity pairs resulting from continuously varying the decision threshold over the range of results observed. The sensitivity, or true positive fraction, is reported on the y-axis, while the x-axis represents 1-specificity, or the false positive fraction. A test with perfect discrimination (no overlap between the two distributions of results) has a plot curve that passes through the upper left corner, where both specificity and sensitivity are 1.00. The hypothetical plot of a test with no discrimination between the two groups is a 45° line going from the lower left to the upper right corner.
Qualitatively, the closer the plot is to the upper left corner, the higher the overall accuracy of the test.
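The construction of the ROC curve can be sketched as follows. The sketch assumes that higher scores indicate models more likely to be correct and that both classes are represented; the function names are hypothetical, and the area under the curve is added only as one common way to summarise how close the plot is to the upper left corner.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs for all thresholds.

    scores: higher means more likely correct; labels: True for correct models.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last decoy sharing this score (ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve (1.0 means perfect ranking)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


curve = roc_points([0.9, 0.8, 0.3, 0.1], [True, True, False, False])
print(curve)
print(area_under_curve(curve))  # 1.0
```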
Availability and requirements
Project name: AIDE
Project home page: http://linux.btbs.unimib.it/cgi-bin/aide.cgi
Operating system(s): Linux, Unix
Programming language: Fortran77/90, Perl
Licence: GNU GPLv3
Any restrictions to use by non-academics: No restrictions
The authors are grateful to CINECA (Project 562)-2007 for the use of computational facilities.
- Tress M, Ezkurdia I, Grana O, Lopez G, A V: Assessment of predictions submitted for the CASP6 comparative modeling category. Proteins: Structure, Function, and Bioinformatics 2005, 61(Suppl 7):27–45.View ArticleGoogle Scholar
- Bradley P, Malmstrom L, Qian B, Schonbrun J, Chivian D, Kim D, Meiler J, Misura K, D B: Free modeling with Rosetta in CASP6. Proteins: Structure, Function, and Bioinformatics 2005, 61(Suppl 7):128–134.View ArticleGoogle Scholar
- Soonming J, Eunae K, Seokmin S, P Y: Ab inition folding of helix bundle proteins using molecular dynamics simulations. JACS 2003, 125: 14841–14846.View ArticleGoogle Scholar
- Andrzej Kolinacuteski JMB: Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models. Proteins: Structure, Function, and Bioinformatics 2005, 61(Suppl 7):84–90.Google Scholar
- Moult J, Fidelis K, Rost B, Hubbard T, Tramontano A: Critical assessment of methods of protein structure prediction (CASP)-round 6. Proteins: Structure, Function, and Bioinformatics 2005, 61(Suppl 7):3–7.
- Xu J, Yu L, Li M: Consensus fold recognition by predicted model quality. APBC 2005, 73–83.
- Xu J: Fold Recognition by Predicted Alignment Accuracy. IEEE/ACM Trans Comput Biology Bioinform 2005, 2(2):157–165.
- Moult J: A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 2005, 15: 285–289.
- Kryshtafovych A, Venclovas C, Fidelis K, Moult J: Progress Over the First Decade of CASP Experiments. Proteins: Structure, Function, and Bioinformatics 2005, 61(Suppl 7):225–267.
- Tramontano A: An account of the Seventh Meeting of the Worldwide Critical Assessment of Techniques for Protein Structure Prediction. FEBS Journal 2007, 274(7):1651–1654.
- Lazaridis T, Karplus M: Effective energy functions for protein structure prediction. Curr Opin Struct Biol 2000, 10: 139–145.
- Sippl M: Recognition of errors in three-dimensional structures of proteins. Proteins 1993, 17: 355–362.
- Sippl M: Knowledge based potential for proteins. Curr Opin Struct Biol 1995, 5: 229–235.
- Melo F, Feytmans E: Novel knowledge-based mean force potential at atomic level. J Mol Biol 1997, 267: 207–222.
- Tosatto S: The Victor/FRST Function for Model Quality Estimation. Journal of Computational Biology 2005, 12: 1316–1327.
- Melo F, Sanchez R, Sali A: Statistical potentials for fold assessment. Protein Science 2002, 11: 430–448.
- Wallner B, Elofsson A: Can correct protein models be identified? Protein Science 2003, 12: 1073–1086.
- Samudrala R, Levitt M: Decoys R Us: A database of incorrect conformations to improve protein structure prediction. Protein Science 2000, 9: 1399–1401.
- Park B, Levitt M: Energy functions that discriminate X-ray and near native folds from well-constructed decoys. J Mol Biol 1996, 258: 367–392.
- Simons K, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 1997, 268: 209–225.
- Simons K, Bonneau R, Ruczinski I, Baker D: Ab initio protein structure prediction of CASP III targets using ROSETTA. Proteins 1999, (Suppl 3):171–176.
- Lundstrom J, Rychlewski L, Bujnicki J, Elofsson A: Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci 2001, 10: 2354–2362.
- Sippl MJ: Recognition of errors in three-dimensional structures of proteins. Proteins 1993, 17: 355–362.
- Colovos C, Yeates TO: Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci 1993, 2: 1511–1519.
- Bowie JU, Luthy R, Eisenberg D: A method to identify protein sequences that fold into a known three-dimensional structure. Science 1991, 253: 164–170.
- Luthy R, Bowie JU, Eisenberg D: Assessment of protein models with three-dimensional profiles. Nature 1992, 356: 83–85.
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 1993, 26: 283–291.
- Vriend G: WHAT IF: a molecular modeling and drug design program. J Mol Graph 1990, 8: 52–56.
- Pontius J, Richelle J, Wodak SJ: Deviations from standard atomic volumes as a quality measure for protein crystal structures. J Mol Biol 1996, 264: 121–136.
- Jones DT: Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999, 292: 195–202.
- Eramian D, Shen MY, Devos D, Melo F, Sali A, Marti-Renom MA: A composite score for predicting errors in protein structure models. Protein Sci 2006, 15(7):1653–1666.
- Cristobal S, Zemla A, Fischer D, Rychlewski L, Elofsson A: A study of quality measures for protein threading models. BMC Bioinformatics 2001, 2: 5.
- Zhang Y, Skolnick J: Scoring function for automated assessment of protein structure template quality. Proteins 2004, 57(4):702–710.
- Siew N, Elofsson A, Rychlewski L, Fischer D: MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 2000, 16(9):776–785.
- Induction of model trees for predicting continuous classes 1997.
- S G: WEKA: The Waikato Environment for Knowledge Analysis. University of Waikato, Hamilton, New Zealand: University of Waikato; 1995.
- Riget J, Vesterstrom S: A Diversity-Guided Particle Swarm Optimizer – the ARPSO. 2002.
- Bujnicki JM, Elofsson A, Fischer D, Rychlewski L: LiveBench-2: large-scale automated evaluation of protein structure prediction servers. Proteins 2001, (Suppl 5):184–191.
- AIDE: Artificial Intelligence Decoys Evaluator [http://linux.btbs.unimib.it/cgi-bin/aide.cgi]
- Aloy P, Stark A, Hadley C, Russell R: Prediction without templates: new fold, secondary structure, and contacts in CASP5. Proteins 2003, 53(Suppl 6):436–456.
- Bower M, Cohen F, Dunbrack R: Prediction of protein side-chain rotamer from a backbone dependent rotamer library: a new homology modelling tool. J Mol Biol 1997, 267: 1268–1282.
- Fain B, Xia Y, Levitt M: Design of an optimal Chebyshev-expanded discrimination function for globular proteins. Protein Sci 2002, 11: 2010–2021.
- Lindahl E, Hess B, van der Spoel D: GROMACS 3.0: A package for molecular simulation and trajectory analysis. J Mol Model 2001, 7: 306–317.
- Berendsen H, van der Spoel D, van Drunen R: GROMACS: A message passing parallel molecular dynamics implementation. Comp Phys Comm 1995, 91: 43–56.
- Jorgensen W, Tirado-Rives J: The OPLS potential functions for proteins. Energy minimizations for crystals of cyclic peptides and crambin. J Am Chem Soc 1988, 110: 1657–1666.
- Berendsen H, Postma J, Dinola A, Haak JR: MD with coupling to an external bath. J Chem Phys 1984, 81: 3684–3690.
- Hess B, Bekker H, Berendsen H, Fraaije JGEM: LINCS: A linear constraint solver for molecular simulations. J Comp Chem 1997, 18: 1463–1472.
- Essmann U, Perera L, Berkowitz M, Darden T, Lee H, Pedersen L: A smooth particle mesh Ewald method. J Chem Phys 1995, 103: 8577–8592.
- Lazaridis T, Karplus M: Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J Mol Biol 1999, 288: 477–487.
- Hubbard SJ, Thornton JM: NACCESS Computer Program. Department of Biochemistry and Molecular Biology, University College London; 1993.
- Kabsch W, Sander C: Dictionary of Protein Secondary-Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 1983, 22: 2577–2637.
- Salerno WJ, Seaver SM, Armstrong BR, Radhakrishnan I: MONSTER: inferring non-covalent interactions in macromolecular structures from atomic coordinate data. Nucleic Acids Res 2004, 32: 566–568.
- Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng 1998, 11: 739–747.
- Levitt M, Gerstein M: A unified statistical framework for sequence comparison and structure comparison. Proc Natl Acad Sci USA 1998, 95: 5913–5920.
- Kennedy J, Eberhart RC: Particle swarm optimization. In Proc IEEE Int'l Conf on Neural Networks, IV. Piscataway, NJ; 1995:1942–1948.
- Eberhart RC, Kennedy J: A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micromachine and Human Science. Nagoya, Japan; 1995:39–43.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.