Volume 15 Supplement 16
Bhageerath-H: A homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins
© Jayaram et al.; licensee BioMed Central Ltd. 2014
Published: 8 December 2014
The advent of human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. Success of structure based drug discovery severely hinges on the availability of structures. Despite significant progresses in the area of experimental protein structure determination, the sequence-structure gap is continually widening. Data driven homology based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarities. With dwindling similarities of query sequences, advanced homology/ ab initio hybrid approaches are being explored to solve structure prediction problem. Here we describe Bhageerath-H, a homology/ ab initio hybrid software/server for predicting protein tertiary structures with advancing drug design attempts as one of the goals.
Bhageerath-H web-server was validated on 75 CASP10 targets which showed TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5Å from the native in 58% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution.
Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the on-going CASP11 experiment.
KeywordsProtein tertiary structure prediction ab initio homology RMSD protein folding
"The native conformation of a protein is determined by the totality of interatomic interactions and hence, by the amino acid sequence, in a given environment" (Nobel Lecture, Christian B. Anfinsen, December 11, 1972). According to Anfinsen's protein folding hypothesis, a protein's native structure is determined by its amino acid sequence which drives protein into its minimum Gibbs energy state . This hypothesis evolved as a basic tenet for protein structure prediction algorithms (PSPAs). However limited understanding of net balance of forces involved in protein folding creates deficiencies in various proposed PSPAs. One of the early efforts in solving protein folding problem was driven by thermodynamic calculations, which incorporate searching algorithms to investigate a conformation that corresponds to minimum free energy . Here the large number of degrees of freedom of a protein gives rise to innumerable conformations, an enumeration of which is practically impossible. This despite, proteins fold rapidly into their native structure in milliseconds to seconds time scales implying that a brute force enumeration of all possible conformations may not be required as implicit in Levinthal's Paradox . The fact that sequence introduces local structural bias, narrows down the accessible conformational space and introduces local as well as long range interactions, suggesting a halfway solution to the paradox [4–7]. As a result, PSPAs need two key components: (a) a rapid computational algorithm for protein conformational search and (b) an accurate scoring function to capture the best available conformation. The first component involves use of different physics based as well as knowledge based approaches for extensive sampling of the vast conformational space [8, 9]. Physics based sampling methods include use of Monte Carlo (MC) methods [10–15], Genetic algorithms , molecular dynamics simulations (MD) [17, 18], simulated annealing [19, 20], replica-exchange MC or MD and local enhanced sampling [21–23]. Knowledge based methods use information from the solved protein structures and knowledge based potentials for sampling protein conformational space .
Homology modeling [25–30] and fold recognition/ threading methods [31–35] are knowledge based approaches, which are routinely used to generate reliable models for proteins with overall fold topology similar to an available template in the protein databases. Query protein with no sequence and structural similarity are modeled from scratch using physics based/ ab initio approaches. The success of ab initio or physics-based sampling methods is limited by lack of accurate energy functions [36, 37], heavy computational requirements, force field errors [38–40] and protein size, while knowledge based approaches are limited by sequence similarity and evolutionary relationships [41–43]. A popular trend in protein conformational sampling is the fragment assembly method, which uses parts of known protein or protein fragments to generate a structure of the target. After conformational sampling, the next immediate concern is to capture the best available structure by means of a scoring function [44–56]. These functions combine chemical, physical, geometrical and energetic constraints to capture native or near native models [57, 58].
A thorough literature survey reveals that the available protein structure prediction algorithms are based on methods such as (a) homology modeling, (b) fold recognition, (c) ab initio and (d) hybrid [59, 60]. Different software/tools are available in the public domain based on these computational approaches and are evaluated every two years during the Critical Assessment of techniques for protein structure prediction (CASP experiments) . Recent CASP experiments have shown significant progress by hybrid approaches, which combine homology, ab initio along with atomic level model refinements for protein structure prediction . This article describes Bhageerath-H, a homology/ ab initio hybrid software for predicting tertiary structure of monomeric proteins. Bhageerath-H makes use of Bhageerath-H Strgen algorithm  for extensive sampling of the protein fold space and generates a large basket of decoys containing near-native protein conformations, which are further supplemented by a chemical logic based alignment scheme and then clustered to eliminate non-unique redundant structures. These are then screened by a physico-chemical scoring metric (pcSM) and assessed for their quality. The selected models are refined via a unique and effective quantum mechanics based loop bond angle optimization method, which drives the selected models further close to the native topology. Bhageerath-H automated pipeline is freely available to the scientific community across the world via http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp.
(A) Bhageerath-H Strgen for candidate structures
The first step in the pipeline involves generation of a large pool of full length decoys. In the proposed protein structure prediction pipeline, Bhageerath-H Strgen algorithm for protein conformational sampling  is the first module. The module takes as input protein amino acid sequence and provides as output a large pool of decoys. A revised and improved version of structure generation algorithm is incorporated in the Bhageerath-H software suite. Bhageerath-H Strgen makes use of the current sequence and structural database knowledge along with Bhageerath ab initio folding [64, 65] in order to effectively search the fold space for an input protein sequence. It starts with amino acid sequence, followed by secondary structure prediction and BLAST  search for sequence based homologs. In addition, it also searches for distant analogs and structural homologs using tools such as pGenthreader [67, 68], ffas [69, 70], spark-x  and HHSearch . A new addition to this methodology is a chemical logic based  procedure for template selection followed by alignment generation. It utilizes amino acid chemical properties such as hydrogen bond donor, conformational flexibility, shape and size of side chains for generating an amino acid substitution scoring matrix. This scoring matrix is used for template selection as well as template-target alignment generation. The matrix helps in selecting distant homologs, which are generally missed during a normal database search. The templates and template-target alignments are used for modeling fragments of varying length via Modeller [74, 75]. Modeled fragments are then screened for missing links with no available templates. These missing stretches are generated using Bhageerath ab initio modeling method [65, 76, 77]. All the incomplete protein fragments are patched in order to generate full-length models, which are energy scored and top 5 lowest energy decoys are sent for Bhageerath abintio loop sampling. The newly sampled structures are added to the growing pool of full length protein decoys. The output of the first step is a large pool of protein decoys. The average size of the decoy pool is on the order of 104-105 structures.
Bhageerath-H Strgen module includes locally installed copies of Psipred, BLAST, PFAM , SCOP [79, 80], nr , pdb database http://www.pdb.org/pdb/home/home.do, HHSearch, Spark-X, pGenthreader, ffas and modeller. The scalable Bhageerath-H Strgen algorithm is currently configured to utilize 64 processors of Linux Cluster. Programs are written in C++, MPI language and involve use of linux shell scripting. Average time taken for Bhageerath-H Strgen run is 1-2hrs. This first module of the Bhageerath-H pipeline generates a large pool of decoys which needs to be further filtered, processed and refined. We would like to note that Bhageerath-H software is not just limited to Bhageerath-H Strgen an already published algorithm. Bhageerath-H Strgen is a protein decoy generation program which is the first module here. After protein decoy generation, protein decoy selection and refinement are the other two very important steps in protein structure prediction pipeline. In Bhageerath-H software modules 2-5 are dedicated for decoy clustering, selection and refinement, which are not included in Bhageerath-H Strgen. Output from this module is submitted for clustering in the next step.
This command gives as an output a cluster file, which contains the centroid in the pdb format along with the members of each centroid and the root mean square deviation (rmsd) distance of each member from the centroid. The centroids themselves are mathematical constructs and convey no information, but utilizing rmsd information one lowest rmsd member from each cluster is picked . To overcome the time limitation, clustering is performed in a parallel mode. The output of K-mean clustering is a set of decoys, which are unique, non-recurring and contain near-native structural models. This set of decoys containing near-native models is submitted for physico-chemical scoring in the third step.
(C) Scoring based on a physico-chemical metric
where A1 is the fractional area of exposed non-polar residues, A2 is the fractional area of exposed non polar part of residues, A3 is the weighted exposed area, A4 is the total surface area, PH and Ps are secondary structure penalties for helix and sheet respectively, M1 is Euclidean distance. The prefix "c" for each parameter in the above equation refers to its optimized coefficient. cA1 = 10, cA2 = 0.1, cA3 = 0.00001, cA4 = 0.001, cM1 = 0.001, cp = 0.15(PH) and 0.21(PS).
In order to get the top 10 structures, each of the seven parameters are evaluated for all the clustered decoys and a short energy minimization is performed to remove steric clashes. For the given input decoy pool, pcSM gives as an output top 10 ranked native-like candidates structures. pcSM algorithm runs in parallel mode and utilizes 64 processors. On an average, time taken for scoring varies from 2 to 3 hours. The top 10 pcSM ranked models are submitted for protein structure analysis and validation in the next step.
(D) Protein Structure Analysis and Validation (PROTSAV) based ranking
PROTSAV is a protein structure quality assessment meta-server (manuscript under preparation). Currently, it comprises six tools namely Procheck , Verify-3D , ERRAT , Naccess , PROSA  and dDFIRE , for quality assessment of protein structures. PROTSAV generates an overall protein quality score, which is a summation of scores predicted by individual modules. High PROTSAV values reflect poor structure quality of query protein and low values close to zero represent good quality of query protein structure. Run time for this module is 40-45 seconds. In this step, pcSM selected top 10 protein models are analysed and ranked. The top ranked model is submitted for QM based loop bond angle refinement in the next step.
(E) Quantum mechanics (PM6) based loop bond angle optimization
Quantum mechanics (PM6) based loop bond angle optimization (manuscript in preparation) takes topmost PROTSAV selected model as an input, optimizes loop bond angles and performs ab initio loop sampling . The small pool of decoys generated in the process is side chain optimized using Scwrl4. Scwrl4 is a program for prediction of protein side chain conformation . Scwrl4 uses latest backbone-dependent library to provide rotamer frequency, dihedral angles and variances. The side chain optimized decoys are further energy minimized (SD = 500, CG = 500) using sander module of AMBER10 software .
These optimized and energy minimized refinement generated decoys are scored using pcSM and the top 10 ranked QM refined models are passed to next step.
(F) Final ranking
Input to this step is top 10 pcSM ranked QM refined models from step (E) and top 5 PROTSAV ranked models from the step (D). PROTSAV ranked models are side chain optimized and energy minimized before final ranking. The selected 15 models are re-ranked using pcSM and the top 5 are given to the user as an output.
Results and Discussion
Validation of Bhageerath-H software suite
Bhageerath-H automated pipeline was thoroughly tested and validated on the benchmark CASP10 dataset. Each CASP experiment reveals the state of the art in the field of protein structure prediction. About75 CASP10 targets of varying size and complexity were considered here for the analysis. To begin with the assessment, CASP-like conditions were mimicked, which means the native and near-native homologs were excluded during structure prediction. Any template released later than the first CASP10 server target i.e. fifth May of 2012 was not considered. For structure assessment an automated pipeline was developed. For each CASP10 target, sequence was extracted from the native structure. Then predicted structure sequence and the native sequence were aligned using ClustalW . Residues with missing coordinates were removed from the predictions in order to make the sequence of the two structures match exactly. The native and the Bhageerath-H generated final five models were compared based on the widely used criteria of Cα root mean square deviation (Cα RMSD) and Template modeling score (TM-score). Cα RMSD is a global indicator of structural identity, while TM-score identifies local substructures and evaluates local identity. TM-score refers to template modeling score. TM-score is considered as a quantitative measure for classification of protein topology. A TM-score > 0.5 signifies that protein pairs share same fold whereas a TM-score < 0.5 are mostly not of the same fold and a TM-score of 0.17 indicates random prediction [93, 94].
(A) Bhageerath-H performance on 75 CASP10 targets
Comparison of Bhageerath-H performance with BAKER-ROSETTA, Quark and MULTICOM-CLUSTER
CASP organizers assign a unique target id to each protein fielded in the CASP experiment. While validating and comparing performance of Bhageerath-H software on 75 CASP10 targets, we have closely analyzed some of the CASP10 target proteins in which Bhageerath-H outperformed other three servers under consideration. A brief description of the biological role of the targets T0655, T0672, T0675, T0700, T0716, T0736, T0747, T0755, T0669, T0713, T0686, T0724 is given in Additional File 2.
A close inspection of the reason for better performance of Bhageerath-H revealed that for targets such as T0675, T0672, T0669, T0716, T0736, T0700 it was Bhageerath-H Strgen patching module as well as ab initio loop sampling which generated a low RMSD near-native structure. In systems T0655, T0747, the low RMSD sampled structure is due to the amino acid chemical logic based scoring matrix. The amino acid substitution scoring matrix is a new addition to Bhageerath-H Strgen methodology and performs a very thorough search of the database for homologs based on amino acids chemical properties. This matrix helped in template search and alignment generation especially in targets T0655 and T0747, where most other servers failed to predict a low RMSD structure. It identified correct templates and generated better target-template alignments, which resulted in high quality near-native structural models for proteins with low sequence similarity. In cases where a full length template is unavailable, the matrix helped in generating high quality alignments for short sequence fragments. Other than amino acid chemical logic based scoring matrix the major contributor for better performance of Bhageerath-H software is abinitio loop sampling. Loops are the most flexible parts of a protein structure involved in molecular recognition. Correct modeling of loops has always been a challenge. Ab initio loop sampling module helped in systematic and thorough sampling of the loop conformation space and generated low RMSD models. CASP 10 targets where Bhageerath-H outperformed other participating servers were mainly modeled through chemical logic and ab-initio loop sampling.
Other than above specified targets, Bhageerath-H's performance is noteworthy for targets T0713, T0686 and T0724 when compared to the other three servers under consideration. Though high quality Bhageerath-H models were not predicted, these targets need special attention and discussion. These three targets are described below as case studies for illustration of Bhageerath-H performance.
(i) Target T0713: This target is a hypothetical protein from Eubacterium ventriosum having PDB: 4H09 and 739 amino acid residues. It has four leucine rich repeats domains which take solenoid shape in protein structure. These domains help protein to interact with its complementary protein partner. Bhageerath-H sampled a lowest RMSD structure of 8.91Å in pool of trial structures. After clustering and pcSM decoy selection the lowest RMSD model in top 10 was 9.80Å. The topmost PROTSAV selected model was given to QM based structure refinement. QM refined the input model and generated a decoy in the small pool having 6.61Å Cα RMSD from the native. It is due to the bond angle optimization which assisted in a better conformational sampling and a lower RMSD decoy, which was picked by pcSM during final five ranking. Bhageerath-H successfully modeled and picked a structure in the top five having leucine repeat domain similar to the native structure. The domain form horseshoe shape reflects its biological activity.
(ii) Target T0686: This target is a sporozite surface protein of plasmodium vivax, one of the causative agents for malarial disease. It is also called TRAP (thrombospondin repeat anonymous protein) which mediates the invasion of mosquitoes and vertebrates host cells in malaria. TRAP protein has two functional domains (i) TSP (thrombospondin type I) and (ii) VWA (von willebrand factor type A) that are responsible for cell adhesion. Bhageerath-H Strgen generated a 7.41Å RMSD structure which was retained post clustering. pcSM and PROTSAV picked an 8.13Å structure which was submitted for QM based refinement. The final lowest RMSD model in top 5 is 7.75Å, which is a much better prediction in comparison to other server predictions. Model structure closely superimposes with VWA domain of native crystal structure (PDB: 4QHO) protein while there are a few anomalies in TSP domain. VWA domain is mainly responsible for protein's biological activity and covers a stretch of ~180 amino acids. TSP is a shorter domain (∼40 amino acids). The final ranked Bhageerath-H modelled structure missed an extended β-sheet, which resulted in a high RMSD of the prediction from the native.
(iii) Target T0724: This target is a hypothetical uncharacterized protein from bacteroides vulgates having PDB: 4FMR. It has only one characterized functional domain i.e DNA binding. QM based structure refinement assisted in better conformational sampling and in generating a near-native decoy. A brief biological description of the studied targets is given in the Additional File 2.
In a nut shell, major reasons behind the ability of Bhageerath-H to predict lower RMSD near-native models are firstly exhaustive sampling technique. Bhageerath-H Strgen and the newly developed amino acid chemical logic based scoring matrix help in a thorough search of template and protein conformational space, ensuring generation of near-native models in maximum instances. Secondly, it is the pcSM scoring function which cherry picks these native-like candidates with 93% accuracy. Apart from these two major modules, it is the PROTSAV structure analysis which ranks models accordingly and submits for QM refinement. Finally, QM based refinement protocol facilitates in going one step ahead and improves prediction accuracy.
(B) Assessment of individual modules of Bhageerath-H pipeline
Assessment of individual modules of Bhageerath-H pipeline for 7 CASP10 targets.
Lowest Cα RMSD structure in Bhageerath-H Strgen sampled decoy pool
Number of decoys generated
Lowest Cα RMSD structure post K-mean clustering
Number of filtered decoy
Lowest Cα RMSD structure among top10 pcSM ranked decoys
Cα RMSD of the topmost SAVPRO ranked model
Lowest Cα RMSD among final five Bhageerath-H predictions
(D) Quality assessment of Bhageerath-H predictions
Finally, the quality of Bhageerath-H predictions was assessed based on Molprobity score . Molprobity score evaluates the stereochemistry of input structure. Online Molprobity server http://molprobity.biochem.duke.edu was used for score calculation. Additional File 3 shows the Molprobity score of the best Bhageerath-H predictions. Best refers to the lowest Cα RMSD in the final five Bhageerath-H predictions. The average Molprobity score is 1.94 for 75 predictions.
Bhageerath-H web server
We have developed Bhageerath-H, an automated pipeline for protein tertiary structure prediction and made it into a freely accessible web server http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp. The pipeline comprise six different modules which are Bhageerath-H Strgen for decoy generation, K-mean clustering, pcSM for decoy selection, PROTSAV for structure validation, QM (PM6) based loop refinement and final ranking. Together each module assists in pushing the prediction accuracy to higher limits. Bhageerath-H server was validated on 75 CASP10 targets and results show that the methodology is effective in predicting good structures for proteins with varying sequence and structural similarities. Comparison with some of the existing softwares demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution. A critical analysis of the targets where Bhageerath-H was unsuccessful in predicting low RMSD structures highlights the areas of improvement. These include better secondary structure prediction, better alignment strategies, improvement in ab initio modeling for sampling new folds and refinement strategies. We are currently working on these areas especially for targets with very low sequence similarity. The current version of Bhageerath-H has already taken the structure prediction field beyond CASP10. This improved methodology is fielded in the ongoing CASP11 experiment.
Several proteins exhibit partial or complete instability in their structures. These proteins are classified as intrinsically disordered proteins (IDPs). Bhageerath-H is a homology and abinito hybrid method for modeling structures of monomeric proteins. The current web-enabled version of the protocol is not specifically programmed to model structures of IDPs. Rather, the ab initio loop modeling section of the first module as well as QM(PM6) method for loop bond angle refinement attempt to sample conformation space of long loop stretches/disordered regions.
Thus to summarize, in the recent years, data driven homology based computational methods have proved successful in predicting tertiary structures for sequences with high sequence similarity. With the dwindling similarities of query sequences, advanced homology/ ab initio hybrid approaches are being explored to solve structure prediction problem. Overcoming these limitations while pushing the frontiers of protein structure prediction, we have proposed Bhageerath-H algorithm. The proposed algorithm finds applications in the field of protein structure/function prediction, active-site directed drug design, in studying protein-protein interactions, and in protein design and engineering. In the absence of experimental protein structure, the availability of computational protein tertiary structural models helps to probe biological functions of proteins.
List of abbreviations
Critical Assessment of Protein Tertiary Structure Prediction.
Root mean square deviation.
- Cα RMSD:
C-alpha root mean square deviation
Template modeling score.
Protein structure prediction algorithms.
Programme support to the Supercomputing Facility for Bioinformatics & Computational Biology (SCFBio), IIT Delhi from the Department of Biotechnology Govt. of India and Indian Council of Medical Research is gratefully acknowledged. RK is a recipient of Senior Research Fellowship from Council of Scientific and Industrial Research (CSIR), India
The publication charges of this article were funded by Professional Development Fund, IIT Delhi, India.
This article has been published as part of BMC Bioinformatics Volume 15 Supplement 16, 2014: Thirteenth International Conference on Bioinformatics (InCoB2014): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S16.
- Anfinsen CB: Principles that govern the folding of protein chains. Science. 1973, 181: 223-230. 10.1126/science.181.4096.223.View ArticlePubMedGoogle Scholar
- Guo JT, Ellrott K, Xu Y: A historical perspective of template-based protein structure prediction. Methods Mol Biol. 2008, 413: 3-42.PubMedGoogle Scholar
- Levinthal C: Are there pathways for protein folding?. Journal de Chimie Physique et de Physico-Chimie Biologique. 1968, 65: 44-45.Google Scholar
- Srinivasan R, Rose GD: A physical basis for protein secondary structure. Proc Natl Acad Sci USA. 1999, 96: 14258-14263. 10.1073/pnas.96.25.14258.PubMed CentralView ArticlePubMedGoogle Scholar
- Street AG, Mayo SL: Intrinsic β-sheet propensities result from van der Waals interactions between side chains and the local backbone. Proc Natl Acad Sci USA. 1999, 96: 9074-9076. 10.1073/pnas.96.16.9074.PubMed CentralView ArticlePubMedGoogle Scholar
- Honig B: Protein folding: From the levinthal paradox to structure prediction. J Mol Biol. 1999, 293: 283-293. 10.1006/jmbi.1999.3006.View ArticlePubMedGoogle Scholar
- Chikenji G, Fujitsuka Y, Takada S: Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study. Proc Natl Acad Sci USA. 2006, 103: 3141-3146. 10.1073/pnas.0508195103.PubMed CentralView ArticlePubMedGoogle Scholar
- Liwo A, Czaplewski C, Ołdziej S, Scheraga HA: Computational techniques for efficient conformational sampling of proteins. Curr Opin Struct Biol. 2008, 18: 134-139. 10.1016/j.sbi.2007.12.001.PubMed CentralView ArticlePubMedGoogle Scholar
- Baldwin RL, Rose GD: Is protein folding hierarchic? I. Local structure and peptide folding. Trends Biochem Sci. 1999, 24: 26-33. 10.1016/S0968-0004(98)01346-2.View ArticlePubMedGoogle Scholar
- Scheraga HA, Lee J, Pillardy J, Ye YJ, Liwo A, Ripoll D: Surmounting the Multiple-Minima Problem in Protein Folding. Journal of Global Optimization. 1999, 15: 235-260. 10.1023/A:1008328218931.View ArticleGoogle Scholar
- Wales DJ, Scheraga HA: Global Optimization of Clusters, Crystals, and Biomolecules. Science. 1999, 285: 1368-1372. 10.1126/science.285.5432.1368.View ArticlePubMedGoogle Scholar
- Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E: Equations of State Calculations by Fast Computing Machines. Journal of Chemical Physics. 1953, 21: 1087-1092. 10.1063/1.1699114.View ArticleGoogle Scholar
- Da Silva RA, Degreve L, Caliri A: LMProt: An Efficient Algorithm for Monte Carlo Sampling of Protein Conformational Space. Biophys J. 2004, 87: 1567-1577. 10.1529/biophysj.104.041541.PubMed CentralView ArticlePubMedGoogle Scholar
- Tang K, Zhang J, Liang J: Fast Protein Loop Sampling and Structure Prediction Using Distance-Guided Sequential Chain-Growth Monte Carlo Method. PLoS Comput Biol. 2014, 10: e1003539-10.1371/journal.pcbi.1003539.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang J, Lin M, Chen R, Liang J, Liu JS: Monte Carlo sampling of near-native structures of proteins with applications. Proteins. 2007, 66: 61-68.View ArticlePubMedGoogle Scholar
- Lee J, Scheraga HA, Rackovsky S: New optimization method for conformational energy calculations on polypeptides: Conformational space annealing. J Comput Chem. 1997, 18: 1222-1232. 10.1002/(SICI)1096-987X(19970715)18:9<1222::AID-JCC10>3.0.CO;2-7.View ArticleGoogle Scholar
- Caves LS, Evanseck JD, Karplus M: Locally accessible conformations of proteins: multiple molecular dynamics simulations of crambin. Protein Sci. 1998, 7: 649-666. 10.1002/pro.5560070314.PubMed CentralView ArticlePubMedGoogle Scholar
- Abrams CF, Vanden-Eijnden E: Large-scale conformational sampling of proteins using temperature-accelerated molecular dynamics. Proc Natl Acad Sci USA. 2010, 107: 4961-4966. 10.1073/pnas.0914540107.PubMed CentralView ArticlePubMedGoogle Scholar
- Chou KC, Carlacci L: Simulated annealing approach to the study of protein structures. Protein Eng. 1991, 4: 661-667. 10.1093/protein/4.6.661.View ArticlePubMedGoogle Scholar
- Kannan S, Zacharias M: Simulated annealing coupled replica exchange molecular dynamics--an efficient conformational sampling method. J Struct Biol. 2009, 166: 288-294. 10.1016/j.jsb.2009.02.015.View ArticlePubMedGoogle Scholar
- Hansmann UHE: Parallel Tempering Algorithm for Conformational Studies of Biological Molecules. Chem Phys Lett. 1997, 281: 140-10.1016/S0009-2614(97)01198-6.View ArticleGoogle Scholar
- Zhou R: Methods Replica exchange molecular dynamics method for protein folding simulation. Mol Bio. 2007, 350: 205-223.Google Scholar
- Zhang W, Chen J: Efficiency of adaptive temperature-based replica exchange for sampling large-scale protein conformational transitions. J Chem Theory Comp. 2013, 9: 2849-2856. 10.1021/ct400191b.View ArticleGoogle Scholar
- Zhou H, Skolnick : Ab initio protein structure prediction using chunk-TASSER. J Biophys J. 2007, 93: 1510-1518. 10.1529/biophysj.107.109959.View ArticlePubMedGoogle Scholar
- Arnold K, Bordoli L, Kopp J, Schwede T: The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics. 2006, 22: 195-201. 10.1093/bioinformatics/bti770.View ArticlePubMedGoogle Scholar
- Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N: ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 2010, 38: W529-W533. 10.1093/nar/gkq399.PubMed CentralView ArticlePubMedGoogle Scholar
- Goujon M, McWilliam H, Li W, Valentin F, Squizzato S, Paern J, Lopez R: A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 2010, 38 (suppl 2): W695-9704.PubMed CentralView ArticlePubMedGoogle Scholar
- Jayaram B, Dhingra Priyanka: Towards creating complete proteomic structural databases of whole organisms. Current Bioinformatics. 2012, 7: 424-435. 10.2174/157489312803900992.View ArticleGoogle Scholar
- Kopp J, Schwede T: Automated protein structure homology modeling: a progress report. Pharmacogenomics. 2004, 5: 405-416. 10.1517/14622418.104.22.1685.View ArticlePubMedGoogle Scholar
- Bordoli L, Kiefer F, Arnold K, Benkert P, Battey J, Schwede T: Protein structure homology modelling using SWISS-MODEL Workspace. Nature Protocols. 2009, 4: 1-13.View ArticlePubMedGoogle Scholar
- Norel P, Petrey D, Honig B: PUDGE: a flexible, interactive server for protein structure prediction. Nucleic Acid Research. 2010, 38: W550-554. 10.1093/nar/gkq475.View ArticleGoogle Scholar
- Rost B, Schneider R, Sander C: Protein Fold Recognition by Prediction-based threading. J Mol Biol. 1997, 270: 471-80. 10.1006/jmbi.1997.1101.View ArticlePubMedGoogle Scholar
- Godzik A: Fold recognition methods. Methods Biochem Anal. 2003, 44: 525-46.PubMedGoogle Scholar
- Taylor William, Jonassen Inge: A structural pattern-based method for protein fold recognition. Proteins Structure Function and Bioinformatics. 2004, 56: 222-34. 10.1002/prot.20073.View ArticleGoogle Scholar
- Jinbo X, Feng J, Libo Y: Protein structure prediction using threading. In Protein Structure prediction Methods in Molecular biology. 2008, 413: 91-121.Google Scholar
- Richard B, David B: Ab initio protein STRUCTURE PREDICTION: Progress and Prospects. Annual Review of Biophysics and Biomolecular Structure. 2001, 30: 173-189. 10.1146/annurev.biophys.30.1.173.View ArticleGoogle Scholar
- Themis L, Martin K: Effective energy functions for protein structure prediction. Current Opinion in Structural Biology. 2000, 10: 139-145. 10.1016/S0959-440X(00)00063-4.View ArticleGoogle Scholar
- Fan H, Periole X, Mark AE: Mimicking the action of folding chaperones by Hamiltonian replica-exchange molecular dynamics simulations: Application in the refinement of de novo models. Proteins. 2012, 80: 1744-1754.PubMedGoogle Scholar
- Lin MS, Gordon TH: Reliable protein structure refinement using a physical function. J Comp Chem. 2011, 32: 709-717. 10.1002/jcc.21664.View ArticleGoogle Scholar
- Zhu J, Fan H, Periole X, Honig B, Mark AE: Refining homology models by combining replica-exchange molecular dynamics and statistical potentials. Proteins. 2008, 72: 1171-1188. 10.1002/prot.22005.PubMed CentralView ArticlePubMedGoogle Scholar
- Margelevicius M, Venclovas C: Re-searcher: a system for recurrent detection of homologous protein sequences. BMC Bioinformatics. 2010, 11: 89-102. 10.1186/1471-2105-11-89.PubMed CentralView ArticlePubMedGoogle Scholar
- Wernisch L, Hunting M, Wodak SJ: Identification of Structural Domains in Proteins by a Graph Heuristic. Proteins Struct Funct Genet. 1999, 35: 338-352. 10.1002/(SICI)1097-0134(19990515)35:3<338::AID-PROT8>3.0.CO;2-I.View ArticlePubMedGoogle Scholar
- Liu T, Guerquin M, Samudrala R: Improving the accuracy of template-based predictions by mixing and matching between initial models. BMC Structural Biology. 2008, 8: 24-10.1186/1472-6807-8-24.PubMed CentralView ArticlePubMedGoogle Scholar
- Rykunov D, Fiser A: New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics. 2010, 11: 28-10.1186/1471-2105-11-28.View ArticleGoogle Scholar
- Fang Q, Shortle D: Protein refolding in silico with atom-based statistical potentials and conformational search using a simple genetic algorithm. J Mol Biol. 2006, 359: 1456-10.1016/j.jmb.2006.04.033.View ArticlePubMedGoogle Scholar
- McConkey BJ, Sobolev V, Edelman M: Discrimination of native protein structures using atom-atom contact scoring. Proc Natl Acad Sci USA. 2003, 100: 3215-3220. 10.1073/pnas.0535768100.PubMed CentralView ArticlePubMedGoogle Scholar
- Benkert P, Kunzli M, Schwede T: QMEAN server for protein model quality estimation. Nucleic Acids Research. 2009, 37: W510-W514. 10.1093/nar/gkp322.PubMed CentralView ArticlePubMedGoogle Scholar
- Benkert P, Tosatto SC, Schomburg D: QMEAN: A comprehensive scoring function for model quality assessment. Proteins. 2008, 71: 261-277. 10.1002/prot.21715.View ArticlePubMedGoogle Scholar
- McConkey BJ, Sobolev V, Edelman M: Discrimination of native protein structures using atom-atom contact scoring. Proc Natl Acad Sci USA. 2003, 100: 3215-10.1073/pnas.0535768100.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang J, Chen R, Liang J: Empirical potential function for simplified protein models: Combining contact and local sequence-structure descriptors. Proteins: Structure Function and Bioinformatics. 2006, 63: 949-960. 10.1002/prot.20809.View ArticleGoogle Scholar
- Lu M, Dousis AD, Ma J: OPUS-PSP: An Orientation-dependent Statistical All-atom Potential Derived from Side-chain Packing. Journal of Molecular Biology. 2008, 376: 288-301. 10.1016/j.jmb.2007.11.033.PubMed CentralView ArticlePubMedGoogle Scholar
- Rykunov D, Fiser A: Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials. Proteins Structure, Function and Bioinformatics. 2007, 67: 559-568. 10.1002/prot.21279.View ArticleGoogle Scholar
- Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kale L, Schulten K: Scalable molecular dynamics with NAMD. Journal of Computational Chemistry. 2005, 26: 1781-1802. 10.1002/jcc.20289.PubMed CentralView ArticlePubMedGoogle Scholar
- Fang Q, Shortle D: A consistent set of statistical potentials for quantifying local side-chain and backbone interactions. Proteins. 2005, 60: 90-10.1002/prot.20482.View ArticlePubMedGoogle Scholar
- Shen MY, Sali A: Statistical potential for assessment and prediction of protein structures. Protein Sci. 2006, 15: 2507-2524. 10.1110/ps.062416606.PubMed CentralView ArticlePubMedGoogle Scholar
- Sippl MJ: Recognition of errors in three-dimensional structures of proteins. Proteins. 1993, 17: 355-10.1002/prot.340170404.View ArticlePubMedGoogle Scholar
- Mishra A, Rao S, Mittal A, Jayaram B: Capturing Native/Native like Structures with a Physico-Chemical Metric (pcSM) in Protein Folding. BBA - Proteins and Proteomics. 2013, 1834: 1520-31. 10.1016/j.bbapap.2013.04.023.View ArticlePubMedGoogle Scholar
- Mishra A, Rana PS, Mittal A, Jayaram B: D2N: Distance to native. BBA - Proteins and Proteomics. 2014, 1844: 1798-1807. 10.1016/j.bbapap.2014.07.010.View ArticlePubMedGoogle Scholar
- Morea V, Tramontano A: Assessment of homology-based predictions in CASP5. Proteins Structure Function and Bioinformatics. 2003, 53: 352-368. 10.1002/prot.10543.View ArticleGoogle Scholar
- Tress M, Tai CH, Wang G, Ezkurdia I, López G, Valencia A, Lee B, Dunbrack RL: Domain defi nition and target classifi cation for CASP6. Proteins Structure Function and Bioinformatics. 2005, 61: 8-18. 10.1002/prot.20717.View ArticleGoogle Scholar
- Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A: Critical assessment of methods of protein structure prediction (CASP) -- round x. Proteins Structure Function and Bioinformatics. 2013, 82: 1-6.View ArticleGoogle Scholar
- Zhang Y: Progress and challenges in protein structure prediction. Curr Opin Struct Biol. 2008, 18: 342-348. 10.1016/j.sbi.2008.02.004.PubMed CentralView ArticlePubMedGoogle Scholar
- Dhingra P, Jayaram B: A homology/ab initio hybrid algorithm for sampling near-native protein conformations. J ComputChem. 2013, 34: 1925-1936.Google Scholar
- Jayaram B, Bhushan K, Shenoy RS, Narang P, Bose S, Agarwal P, Sahu D, Pandey V: Bhageerath: An Energy Based Web Enabled Computer Software Suite for Limiting the Search Space of Tertiary Structures of Small Globular Proteins. Nucl Acids Res. 2006, 34: 6195-6204. 10.1093/nar/gkl789.PubMed CentralView ArticlePubMedGoogle Scholar
- Jayaram B, Dhingra P, Lakhani B, Shekhar S: Bhageerath - Targeting the Near Impossible: Pushing the Frontiers of Atomic Models for Protein Tertiary Structure Prediction. Journal of Chemical Sciences. 2012, 124: 83-91. 10.1007/s12039-011-0189-x.View ArticleGoogle Scholar
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-402. 10.1093/nar/25.17.3389.PubMed CentralView ArticlePubMedGoogle Scholar
- Lobley A, Sadowski MI, Jones DT: pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination. Bioinformatics. 2009, 25: 1761-1767. 10.1093/bioinformatics/btp302.View ArticlePubMedGoogle Scholar
- McGuffin LJ, Jones DT: Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics. 2003, 19: 874-881. 10.1093/bioinformatics/btg097.View ArticlePubMedGoogle Scholar
- Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A: FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Res. 2005, 33: W284-288. 10.1093/nar/gki418.PubMed CentralView ArticlePubMedGoogle Scholar
- Jaroszewski L, Li Z, Cai XH, Weber C, Godzik A: FFAS server: novel features and applications. Nucleic Acids Res. 2011, 39: W38-W44. 10.1093/nar/gkr441.PubMed CentralView ArticlePubMedGoogle Scholar
- Yang Y, Faraggi E, Zhao H, Zhou Y: Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics. 2011, 27: 2076-2082. 10.1093/bioinformatics/btr350.PubMed CentralView ArticlePubMedGoogle Scholar
- Söding J: Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005, 21: 951-960. 10.1093/bioinformatics/bti125.View ArticlePubMedGoogle Scholar
- Jayaram B: Decoding the design principles of amino acids and the chemical logic of protein sequences. Nature Precedings. 2008, [http://precedings.nature.com/documents/2135/version/1]Google Scholar
- Sali A, Blundell TL: Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993, 234: 779-815. 10.1006/jmbi.1993.1626.View ArticlePubMedGoogle Scholar
- Sali A, Potterton L, Yuan F, Vlijmen HV, Karplus M: Evaluation of comparative protein modeling by MODELLER. Proteins. 1995, 23: 318-326. 10.1002/prot.340230306.View ArticlePubMedGoogle Scholar
- Narang P, Bhushan K, Bose S, Jayaram B: A computational pathway for bracketing native-like structures for small alpha helical globular proteins. Phys Chem Chem Phys. 2005, 7: 2364-2375. 10.1039/b502226f.View ArticlePubMedGoogle Scholar
- Narang P, Bhushan K, Bose S, Jayaram B: Protein structure evaluation using an all-atom energy based empirical scoring function. J Biomol Str Dyn. 2006, 23: 385-406. 10.1080/07391102.2006.10531234.View ArticleGoogle Scholar
- Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A: The Pfam protein families database. Nucleic Acid Res. 2010, 38: D211-222. 10.1093/nar/gkp985.PubMed CentralView ArticlePubMedGoogle Scholar
- Murzin A, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database. Journal of Molecular Biology. 1995, 247: 536-540.PubMedGoogle Scholar
- Holm L, Sander C: Protein folds and families: sequence and structure alignments. Nucleic Acids Res. 1999, 27: 244-247. 10.1093/nar/27.1.244.PubMed CentralView ArticlePubMedGoogle Scholar
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007, 35: D61-65. 10.1093/nar/gkl842.PubMed CentralView ArticlePubMedGoogle Scholar
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Research. 2000, 28: 235-242. 10.1093/nar/28.1.235.PubMed CentralView ArticlePubMedGoogle Scholar
- Feig M, Karanicolas J, Brooks CL: MMTSB Tool Set: enhanced sampling and multiscale modeling methods for applications in structural biology. J Mol Graph Model. 2004, 22 (5): 377-395. 10.1016/j.jmgm.2003.12.005.View ArticlePubMedGoogle Scholar
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK - a program to check the stereo chemical quality of protein structures. J App Cryst. 1993, 26: 283-291. 10.1107/S0021889892009944.View ArticleGoogle Scholar
- Luthy R, Bowie JU, Eisenberg D: Assessment of protein models with three-dimensional profiles. Nature. 1992, 356: 83-85. 10.1038/356083a0.View ArticlePubMedGoogle Scholar
- Colovos C, Yeates TO: Verification of protein structures: Patterns of nonbonded atomic interactions. Protein Science Cambridge University Press. 1993, 2: 1511-1519.View ArticleGoogle Scholar
- Lee B, Richards FM: The interpretation of protein structures: estimation of static accessibility. J Mol Biol. 1971, 55: 379-400. 10.1016/0022-2836(71)90324-X.View ArticlePubMedGoogle Scholar
- Wiederstein M, Sippl MJ: ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Research. 2007, 35: W407-W410. 10.1093/nar/gkm290.PubMed CentralView ArticlePubMedGoogle Scholar
- Yang Y, Zhou Y: Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all-atom statistical energy functions. Protein Science. 2008, 17: 1212-1219. 10.1110/ps.033480.107.PubMed CentralView ArticlePubMedGoogle Scholar
- Krivov G, Shapovalov MV, Dunbrack RL: Improved prediction of protein side-chain conformations with SCWRL4. Proteins. 2009, 77: 778-795. 10.1002/prot.22488.PubMed CentralView ArticlePubMedGoogle Scholar
- Case DA, Darden TA, Simmerling CL, Wang J, Duke RE, Luo R, Crowley M, Walker RC, Zhang W, Merz KM, Wang B, Hayik S, Roitberg A, Seabra G, Kolossváry I, Wong KF, Paesani F, Vanicek J, Wu X, Brozell SR, Steinbrecher T, Gohlke H, Yang L, Tan C, Mongan J, Hornak V, Cui G, Mathews DH, Seetin MG, Sagui C, Babin V, Kollman PA: Amber 10. 2008, University of California, San FranciscoGoogle Scholar
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal × version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.View ArticlePubMedGoogle Scholar
- Xu J, Zhang Y: How significant is a protein structure similarity with TM-score = 0.5?. Bioinformatics. 2010, 26: 889-895. 10.1093/bioinformatics/btq066.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang Y, Skolnick J: TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005, 33: 2303-2309.Google Scholar
- Rohl CA, Strauss CE, Misura KM, Baker D: Protein structure prediction using Rosetta. Methods Enzymol. 2004, 383: 66-93.View ArticlePubMedGoogle Scholar
- Xu D, Zhang Y: Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins. 2012, 80: 1715-1735.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang Z, Eickholt J, Cheng J: MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8. Bioinformatics. 2010, 26: 882-888. 10.1093/bioinformatics/btq058.PubMed CentralView ArticlePubMedGoogle Scholar
- Chen VB, Arendall WB, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC: MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr. 2010, 66: 12-21.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.