The MULTICOM toolbox for protein structure prediction
© Cheng et al.; licensee BioMed Central Ltd. 2012
Received: 20 January 2012
Accepted: 30 April 2012
Published: 30 April 2012
As genome sequencing is becoming routine in biomedical research, the total number of protein sequences is increasing exponentially, recently reaching over 108 million. However, only a tiny portion of these proteins (i.e. ~75,000 or < 0.07%) have solved tertiary structures determined by experimental techniques. The gap between protein sequence and structure continues to enlarge rapidly as the throughput of genome sequencing techniques is much higher than that of protein structure determination techniques. Computational software tools for predicting protein structure and structural features from protein sequences are crucial to make use of this vast repository of protein resources.
To meet the need, we have developed a comprehensive MULTICOM toolbox consisting of a set of protein structure and structural feature prediction tools. These tools include secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, fold recognition, multiple template combination and alignment, template-based tertiary structure modeling, protein model quality assessment, and mutation stability prediction.
These tools have been rigorously tested by many users over the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9) from 2006 to 2010, achieving state-of-the-art or near state-of-the-art performance. To facilitate bioinformatics research and technological development in the field, we have made the MULTICOM toolbox freely available as web services and/or software packages for academic use and scientific research. It is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.
Keywords: Protein structure prediction; Bioinformatics tool; Secondary structure; Solvent accessibility; Domain; Contact map; Tertiary structure; Protein model quality assessment; Fold recognition; Protein disorder
The central dogma of protein science is that protein sequence specifies protein structure, and protein structure determines protein function. Therefore, understanding protein structure is crucial for elucidating protein function and has fundamental significance in biomedical sciences including protein function analysis, protein design, protein engineering, genome annotation, and drug design. Since the experimental determination of the first two protein structures - myoglobin and haemoglobin - using X-ray crystallography [1, 2], the structures of more and more proteins have been solved by either X-ray crystallography or Nuclear Magnetic Resonance (NMR) techniques. Currently, there are about 75,000 protein sequences with determined structures deposited in the Protein Data Bank (PDB), which account for about 0.07% of the total known protein sequences (i.e. > 108 million). With the exponential growth of protein sequences with unsolved structures produced by various high-throughput, next generation sequencing techniques, predicting protein structure from sequence, which is critical for filling the sequence-structure gap, has become one of the most fundamental problems in structural bioinformatics and genomics. Accurate high-throughput protein structure prediction tools are urgently needed for both scientific research and the biotech industry. These tools will also fulfill an important goal of the structural genomics project, namely to provide a sufficiently complete set of experimentally determined structures for predicting the structure of about 99.9% of proteins with unsolved structures.
The protein structure prediction problem is usually decomposed and attacked at three different dimensional levels: 1D structure prediction, 2D structure prediction, and 3D structure prediction. One-dimensional (1D) structure prediction is the prediction of protein structural features such as secondary structures, solvent accessibilities, disordered residues or domain boundaries along one-dimensional sequences. Since 1D prediction is usually the first step to obtain protein structure, the largest number of methods and tools have been developed for it, such as Porter, SAM, SSpro [7, 8], PSIPRED, SABLE [10–13], YASSPP, Jpred, PREDATOR [16–18], and GOR for secondary structure prediction; NetSurfP, ACCpro [7, 21] and Real-SPINE for solvent accessibility prediction; PONDR [23, 24], MFDp, DISOPRED, SPINE-D, PrDOS, Spritz, POODLE [29–31], IUPred [32, 33], DISOclust, and IntFOLD-DR for disorder prediction; DomPred, DomSVR, PPRODO, CHOPnet, DoBo and SSEP-Domain for domain boundary prediction; and PredictProtein, Distill, and SCRATCH for all four kinds of 1D predictions.
Two-dimensional (2D) structure prediction is the prediction of spatial relationships (e.g., residue-residue contacts, disulfide bonds, or beta-residue pairings) between pairs of residues. 2D prediction is a challenging and increasingly important problem. Some methods and tools for 2D prediction are PROFcon, Distill, TMHcon, DiANNA, GDAP, CYSPRED, BETAWRAP, SVM-BetaPred, BETTY, ProC_S3, FragHMMent, SVMSEQ, and SAM.
Three-dimensional (3D) structure prediction is the prediction of the 3D coordinates of each residue [56–61], which is the ultimate goal of structure prediction. Some popular tools are I-TASSER [62–64], MODELLER [65, 66], HHpred, QUARK, chunk-TASSER, Rosetta, Pcons-net, SAM, Raptor-X, SparksX, and MULTICOM. 1D, 2D, and 3D protein structure prediction methods are routinely evaluated in the Critical Assessment of Techniques for Protein Structure Prediction (CASP), a community-wide experiment for blind protein structure prediction that has been held every two years since 1994. CASP experiments have driven the development of protein structure prediction methods by objectively assessing the state of the art of the most active and imperative protein structure prediction problems. The last two CASPs (CASP8, 2008 and CASP9, 2010) focused on trying to solve the most pressing structure prediction problems: disorder region prediction (1D), residue-residue contact prediction (2D), protein tertiary structure prediction (3D) [78–80], evaluation of 3D models [81–87], and protein model refinement [74, 88, 89].
During the last several years, we have developed a series of tools for predicting protein structure and structural features at the 1D, 2D, and 3D levels, including secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, protein fold recognition, multiple template combination and alignment, protein tertiary structure modeling, protein model quality assessment, and mutation stability prediction. Most of these tools have been rigorously tested by many users over the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9), achieving state-of-the-art or near state-of-the-art performance. To facilitate bioinformatics research and technological development in the field, we have incorporated the updates and improvements accumulated over the years into these tools and packaged them into a single comprehensive MULTICOM toolbox equipped with tutorials, documentation, software executables, some source code, web services, and an online mailing list for technical support.
Methods and benchmarks
1D structure prediction tools
PSpro2.0 for secondary structure and relative solvent accessibility prediction
PSpro2.0 is an improved and combined version of the popular tools SSpro/ACCpro 4 [7, 8, 21] for the prediction of protein secondary structure and relative solvent accessibility. It integrates both homology-based and ab initio methods to make predictions. The ab initio approach uses 1D recursive neural networks (1D-RNN) [7, 90] and takes the profile of a query protein sequence as input to predict its secondary structure (i.e. helix, strand, and loop) or relative solvent accessibility (i.e. exposed or buried) at 20 different exposure thresholds (i.e. 0%, 5%, 10%, …, 95%). The sequence profile is generated by using PSI-BLAST to search the query sequence against a Non-Redundant (NR) protein sequence database, which has been updated to the most recent version. PSpro2.0 allows users to plug in any version of the NR database of their choice.
The homology-based method in PSpro2.0 is called to make predictions if a significant homologous template protein can be found for a query protein in the Protein Data Bank (PDB). The homology-based method uses BLAST to search the query sequence against a locally compiled version of the PDB database to identify homologous hits. Information regarding the alignment between the query and the most significant hit, including the alignment e-value, the number of amino acids aligned, the number of gaps, and the sequence identity, is gathered and used by a linear regression function to predict the accuracy of transferring the secondary structure and solvent accessibility of the hit to the query protein. The linear regression function was trained on a set of query-template alignments with known alignment information and transfer accuracy. If the predicted transfer accuracy is ≥ 0.82 for secondary structure (resp. ≥ 0.80 for relative solvent accessibility), the secondary structure (resp. relative solvent accessibility) is transferred from the hit to the query as the prediction. Otherwise, ab initio predictions are used. Combining the ab initio and homology-based methods allows PSpro2.0 to automatically apply the most appropriate method to each query protein, depending on whether it has significant homology with a known protein structure, in order to improve prediction performance. In order to take advantage of the abundant new protein structures in the PDB, PSpro2.0 uses an updated local version of the PDB database comprising 62,607 proteins. The new local PDB database is nearly three times the size of the old one used with SSpro/ACCpro 4, which had 22,064 proteins.
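The decision rule described above can be illustrated with a minimal sketch. Only the two thresholds (0.82 and 0.80) and the list of alignment features come from the text; the feature scaling, the coefficient values, and the names `Alignment`, `predicted_transfer_accuracy`, and `choose_method` are hypothetical and chosen purely for illustration, since the trained regression function is not given here.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Thresholds stated in the text.
SS_THRESHOLD = 0.82  # secondary structure
SA_THRESHOLD = 0.80  # relative solvent accessibility

# Hypothetical coefficients, for illustration only (not the trained values).
INTERCEPT = 0.30
WEIGHTS = {"log_e": 0.10, "coverage": 0.20, "gap_fraction": -0.30, "identity": 0.40}


@dataclass
class Alignment:
    """Summary of the alignment between the query and its best PDB hit."""
    e_value: float
    aligned_length: int
    gaps: int
    identity: float  # fraction in [0, 1]


def predicted_transfer_accuracy(aln: Alignment, query_length: int) -> float:
    """Linear regression estimate of how accurate a label transfer would be."""
    # Assumed scaling: -log10(e-value), clamped to [0, 180] and normalized to [0, 1].
    log_e = max(0.0, min(-math.log10(max(aln.e_value, 1e-180)), 180.0)) / 180.0
    features = {
        "log_e": log_e,
        "coverage": aln.aligned_length / query_length,
        "gap_fraction": aln.gaps / aln.aligned_length,
        "identity": aln.identity,
    }
    return INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


def choose_method(aln: Optional[Alignment], query_length: int, threshold: float) -> str:
    """Use homology transfer only when the predicted accuracy reaches the threshold."""
    if aln is None:  # no significant hit was found
        return "ab initio"
    accuracy = predicted_transfer_accuracy(aln, query_length)
    return "homology" if accuracy >= threshold else "ab initio"
```

With these illustrative coefficients, a long, near-identical hit is routed to the homology-based method, while a short, low-identity hit falls back to the ab initio predictor.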
We benchmarked PSpro2.0 on the protein targets of the last two Critical Assessments of Techniques for Protein Structure Prediction (CASP8 in 2008 and CASP9 in 2010). The CASP datasets were chosen because of their wide adoption in the field, their balance of easy (homology-based) and hard (ab initio or weak homology) targets, and their relatively large size. When the homology-based method was tested, the target proteins in the CASP8 and CASP9 data sets were removed from the local PDB database so that no target could be predicted from its own structure. 100 CASP9 targets and 119 CASP8 targets that were not present in the local PDB database were used in this test.
[Table: The accuracy of the prediction of secondary structure (SS) and relative solvent accessibility (SA) on 100 CASP9 targets and 119 CASP8 targets, respectively. Rows: both ab initio and homology; ab initio alone.]
PreDisorder1.1 for protein disorder prediction
DoBo for protein domain boundary prediction
Protein domain boundary prediction is often used as a means to decompose the modeling of a large, multi-domain protein into smaller, more manageable pieces. In order for such a technique to be applicable to hard, free modeling targets, it should not rely extensively on templates or known structures to delineate protein domain boundaries. DoBo is the sequence-based protein domain boundary predictor that we have developed and included in the MULTICOM toolbox. It leverages evolutionary information contained in multiple sequence alignments to identify potential domain boundary sites. These candidate sites are then classified using a support vector machine. Finally, predicted domain boundary sites are scored and a confidence value is provided.
We recently evaluated DoBo on 14 continuous, multi-domain CASP9 targets. DoBo is able to recall 70% of the domain boundaries that occur at least 40 residues from the N- or C-terminal end of the sequence. The precision of the domain boundary prediction is 49%. Here, a domain boundary prediction is considered correct if it occurs within 20 residues of a true domain boundary. Furthermore, on a large benchmark dataset using a 10-fold cross-validation procedure, DoBo achieves a break-even point of 60% (i.e., precision equals recall) for domain boundary predictions.
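The evaluation criteria above (a 20-residue tolerance, and true boundaries at least 40 residues from either terminus) can be sketched as follows. This is one plausible reading of the scoring rule and not the evaluation script used for the benchmark; the function names and the example positions are hypothetical.

```python
from typing import List, Tuple


def eligible_boundaries(true_boundaries: List[int], seq_length: int,
                        min_terminal_distance: int = 40) -> List[int]:
    """Keep true boundaries at least `min_terminal_distance` residues from both termini."""
    return [b for b in true_boundaries
            if b >= min_terminal_distance and seq_length - b >= min_terminal_distance]


def precision_recall(predicted: List[int], true: List[int],
                     tolerance: int = 20) -> Tuple[float, float]:
    """Score predicted boundary positions against true ones.

    A prediction counts as correct if it lies within `tolerance` residues of
    some true boundary; a true boundary counts as recalled if some prediction
    lies within `tolerance` residues of it.
    """
    correct = sum(1 for p in predicted if any(abs(p - t) <= tolerance for t in true))
    recalled = sum(1 for t in true if any(abs(p - t) <= tolerance for p in predicted))
    precision = correct / len(predicted) if predicted else 0.0
    recall = recalled / len(true) if true else 0.0
    return precision, recall
```

For example, with true boundaries at residues 110 and 300 and predictions at 100, 250, and 400, only the prediction at 100 is within tolerance, so one of three predictions is correct and one of two true boundaries is recalled.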
2D structure prediction tools
NNcon and SVMcon for general residue-residue contact prediction
Residue-residue contact prediction continues to be an area of active research and has become increasingly important in the latest rounds of CASP. Of particular importance to tertiary structure prediction are sequence-based (i.e. ab initio) contact prediction methods, and recent work by Wu et al. has shown that predicted contact information can be used to significantly improve predictions for free modeling targets. The MULTICOM toolbox contains two general residue-residue contact predictors: NNcon and SVMcon. NNcon is a sequence-based, ab initio method to predict intra-chain protein residue-residue contacts. NNcon uses a set of two-dimensional (2D) recursive neural network ensembles which predict the probability that the distance between any two residues is below a threshold (i.e. that the residues are in contact). Features used for each residue include a sequence profile, secondary structure and solvent accessibility.
SVMcon is an ab initio method based on a support vector machine (SVM). For each residue pair, a set of features including secondary structure, solvent accessibility and a sequence profile is encoded for a 9-residue window centered on each residue. This feature vector is fed into an SVM trained on a large dataset, which classifies the residue pair as being in contact or not.
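To make the notion of a contact and of range-specific accuracy concrete, the sketch below derives true contacts from coordinates and scores a ranked list of predicted pairs. The 8 Å distance cutoff is an assumption (the text only says "a threshold"), the medium-range (12 ≤ separation < 24) and long-range (separation ≥ 24) definitions follow the ranges used in this evaluation, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Prediction = Tuple[int, int, float]  # (residue i, residue j, predicted probability)


def true_contacts(coords: Sequence[Coord], threshold: float = 8.0) -> Set[Tuple[int, int]]:
    """Return all pairs (i, j), i < j, whose distance is below `threshold` (assumed 8 Å)."""
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(coords[i], coords[j]) < threshold}


def in_range(i: int, j: int, range_name: str) -> bool:
    """Classify a residue pair by sequence separation."""
    separation = abs(i - j)
    if range_name == "medium":
        return 12 <= separation < 24
    if range_name == "long":
        return separation >= 24
    raise ValueError(f"unknown range: {range_name}")


def accuracy(predictions: List[Prediction], contacts: Set[Tuple[int, int]],
             range_name: str, top_n: int) -> float:
    """Fraction of the top_n most confident predictions in a range that are true contacts."""
    in_class = [p for p in predictions if in_range(p[0], p[1], range_name)]
    ranked = sorted(in_class, key=lambda p: p[2], reverse=True)[:top_n]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j, _ in ranked if (min(i, j), max(i, j)) in contacts)
    return hits / len(ranked)
```

Restricting the ranking to one separation class before taking the top predictions keeps medium-range and long-range accuracies independent of each other.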
[Table: Accuracy for NNcon and SVMcon contact predictions on all CASP9 targets, for medium-range contacts (12 ≤ sequence separation < 24) and long-range contacts (sequence separation ≥ 24).]
DIpro2.0 for protein disulfide bond prediction
DIpro2.0 is a tool that uses kernel methods, two-dimensional recursive neural networks, and weighted graph matching for large-scale protein disulfide bridge prediction [99, 100]. Given a protein sequence, it can predict whether a cysteine in the protein participates in a disulfide bond and how bonding cysteines are connected. The method can handle proteins with an arbitrary number of disulfide bonds. On a large disulfide bond benchmark data set, the specificity and sensitivity of classifying individual residues as bonded or non-bonded are 87% and 89%, respectively, and the accuracy of overall disulfide connectivity pattern prediction is 51%. Some other disulfide bond prediction tools are DiANNA, GDAP, and CYSPRED.
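The role of weighted graph matching in connectivity prediction can be illustrated with a small sketch: given a score for every pair of bonded cysteines, choose the pairing that maximizes the total score. The exhaustive search below is a stand-in that is only practical for a handful of cysteines and is not the matching algorithm used by DIpro2.0; the function name and the pair scores are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_connectivity(cysteines: List[int],
                      pair_score: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return the maximum-weight perfect matching over bonded cysteine positions.

    `cysteines` must be sorted and of even length; `pair_score[(a, b)]` with
    a < b is the predicted bonding score for that pair.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("an even number of bonded cysteines is required")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_total, best_pairs = float("-inf"), []
    # Pair the first cysteine with each possible partner and recurse on the rest.
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        sub_total, sub_pairs = best_connectivity(remaining, pair_score)
        total = pair_score[(first, partner)] + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs
```

For four bonded cysteines there are only three possible connectivity patterns, but the number grows quickly with the number of bonds, which is why a polynomial-time matching algorithm is preferable at scale.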
BETApro1.0 for protein beta-sheet structure prediction
BETApro1.0 integrates two-dimensional recursive neural networks and graph algorithms with protein sequence profiles and predicted structural features (e.g. secondary structure and relative solvent accessibility) to predict specific beta residue pairs, beta strand pairs, strand alignments, strand pairing direction, and beta-sheet topology for beta sheets in a protein. BETApro1.0 was evaluated on a large dataset using different standard measures. At the break-even point, the specificity and sensitivity of beta-residue pairing predictions are both 41%. At 59% specificity, the sensitivity of beta strand pairing predictions is 54%. Some other beta-sheet prediction tools are BETAWRAP, SVM-BetaPred, and BETTY.
3D structure prediction and evaluation tools
MULTICOM for tertiary structure prediction
MULTICOM, an automated multi-level combination method, combines complementary and alternative templates, alignments, and models to predict protein tertiary structures. Several implementations of this approach with minor differences were tested in the last two Critical Assessment of Techniques for Protein Structure Prediction experiments (CASP8 and CASP9) in 2008 and 2010, respectively. One significant improvement in multi-template combination benchmarked in CASP9 is a check of the structural consistency between multiple template candidates. This procedure avoids potential atom clashes caused by conflicting structural conformations from inconsistent templates. The structural similarity of a pair of query-template alignments is checked by using TM-Align to compare the structures of the two templates over the regions where they align to the same part of the query. Only structurally similar query-template alignments are combined. Both the MULTICOM-server and MULTICOM-human predictors were ranked among the best in CASP8 and CASP9.
[Table: The average GDT-TS and TM scores of top-one and best-of-five models of MULTICOM predictors on 107 CASP9 targets.]
APOLLO for protein model quality assessment
APOLLO is a software package that can predict global and residue-specific qualities of individual or multiple protein models without knowing native structures. For an individual model, APOLLO uses a machine learning method (support vector machine) to predict its absolute global and residue-specific qualities. The absolute global quality of a model is the overall structural similarity between the model and its native structure in terms of GDT-TS score, whereas the absolute residue-specific qualities are the structural deviations at each residue position in Angstroms (Å). The features used in the machine learning algorithm include the amino acid sequence and the differences between predicted (from the amino acid sequence) and parsed (from the protein model) secondary structures, solvent accessibilities, and residue-residue contact probabilities. For multiple models, APOLLO uses a pair-wise comparison method to predict their relative global qualities. This algorithm performs a full pair-wise comparison of each model against all the others using the structural alignment program TM-Score, and the average structural similarity scores are used as the predicted global qualities. APOLLO also employs a hybrid approach to refine absolute quality scores. It selects the top five models ranked by initial quality scores as reference models and then superimposes every model onto each of the reference models with TM-Score. The average GDT-TS score resulting from the superimpositions is used as the predicted global quality.
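The pair-wise and hybrid scoring schemes described above can be sketched in a few lines. The sketch assumes that a symmetric matrix of structural similarity scores between models has already been computed (for example by running a structural comparison program on every pair); the matrix, the function names, and the choice to exclude a reference model's comparison with itself are assumptions made for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def pairwise_quality(similarity: Matrix) -> List[float]:
    """Predicted quality of each model: mean similarity to all other models."""
    n = len(similarity)
    if n < 2:
        raise ValueError("pair-wise comparison needs at least two models")
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def hybrid_quality(initial_scores: Sequence[float], similarity: Matrix,
                   n_refs: int = 5) -> List[float]:
    """Re-score every model by its mean similarity to the top-ranked reference models.

    References are the `n_refs` models with the highest initial scores; a
    reference is not compared with itself (an assumption of this sketch).
    """
    order = sorted(range(len(initial_scores)),
                   key=lambda i: initial_scores[i], reverse=True)
    refs = order[:n_refs]
    scores = []
    for i in range(len(initial_scores)):
        others = [r for r in refs if r != i]
        scores.append(sum(similarity[i][r] for r in others) / len(others)
                      if others else 0.0)
    return scores
```

The pair-wise score rewards models that resemble the consensus of the pool, whereas the hybrid score measures agreement only with the models that an initial single-model predictor already considers best.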
We evaluated the APOLLO software package on the models of 107 valid CASP9 targets whose experimental structures were available in the Protein Data Bank. For global quality prediction, the average Pearson’s correlations between predicted and real quality scores of the pair-wise, hybrid, and machine learning methods are 0.917, 0.870, and 0.671, respectively. For residue-specific quality prediction, APOLLO has average deviations of 2.60 and 3.18 Å on the residues whose actual distances to the native structure are ≤ 10 and ≤ 20 Å, respectively.
Other protein bioinformatics tools
MUpro1.0 for protein mutation stability prediction
MUpro1.0 is a tool using support vector machines to predict protein stability changes for single amino acid mutations. It can predict the amount of the energy change caused by an amino acid mutation from a protein sequence, a protein structure, or both. MUpro1.0 was evaluated on a large dataset of single amino acid mutations. It predicted the direction (positive versus negative) of the mutation-induced energy changes with 84% accuracy. The method can also reliably predict the absolute value of an energy change. Some other mutation stability prediction tools are PoPMuSiC, SDM, I-Mutant2.0, and CUPSAT.
SeqRate for protein folding rate prediction
SeqRate is a sequence-based tool for large-scale protein folding rate prediction. It uses a Support Vector Machine regression method with a set of features derived from protein sequences alone to make predictions. The tool can predict both folding kinetic types and real-value folding rates. The folding kinetic type prediction accuracy of SeqRate on a standard benchmark is 80%.
MSACompro1.2.0 for protein multiple sequence alignment with predicted structural features
MSACompro1.2.0 is a new tool that integrates predicted secondary structure, solvent accessibility, and contact map information with protein sequences to improve protein multiple sequence alignment. MSACompro1.2.0 was evaluated on the BAliBASE 3.0 datasets, yielding an average alignment Sum of Pair score (SP score) of 88.85 and an average alignment True Column score (TC score) of 61.31. The results showed that incorporating protein structural features into multiple sequence alignment improves alignment accuracy over existing tools that do not use structural features.
HMMEditor for visualization of hidden Markov models of protein sequence family
HMMEditor is a visual, interactive editor for visualizing and manipulating profile Hidden Markov Models of a protein family. It provides a series of functions to visualize the profile HMM architecture, transition probabilities, and emission probabilities. It also allows users to align a sequence against the profile HMM and visualize the corresponding Viterbi path.
Software packages, web services, documentation, and user support
[Table: The availability and running environment of the MULTICOM tools. Platforms listed: Linux, Browser, Unix, Windows.]
The MULTICOM toolbox has been implemented in different programming languages including C++, Java, and Perl. The tools have been extensively tested on the Linux platform. We expect to gradually release some standalone tools for other popular platforms such as Windows and Mac. Most of the tools in the toolbox are available as online web services, which makes it easy for users to make predictions on a small scale without a need to install the software. The web interface is generally simple and intuitive and requires a minimum amount of information from the user. The results may be sent to users by email or be presented in the browser. Most tools are also available as software packages that can be downloaded by users for large-scale prediction or other purposes. In general, installing these tools is straightforward and often only requires unzipping the software, setting a few paths in a configuration file, and running a configuration script. The package of each tool includes a readme file that contains both installation instructions and a quick guide on using the tool. One or more test examples with expected results are often provided with the package for users to test an installation.
To facilitate the use of the tools, user manuals have been developed in PDF and HTML format and are available at the MULTICOM web site. The user manuals usually include step-by-step installation instructions, application examples, references to more technical documents, and frequently asked questions (FAQ) with solutions. To better serve users and gather community feedback to improve the toolbox, a mailing list has been created. After subscribing to the MULTICOM mailing list (email@example.com), a user can post a message to the mailing list and view the collection of all prior postings. The MULTICOM technical support team regularly reads the postings and answers questions. Collected improvements will be released in future versions of the toolbox.
We developed a comprehensive MULTICOM toolbox consisting of a number of protein structure and structural feature prediction tools. These tools have been extensively tested and used internally and externally during the last several years yielding good performance. All the tools are freely available as software packages and/or online web services for academic use and scientific research at the MULTICOM web site. This makes them useful for large-scale annotation of structure and function of vast protein sequence resources generated in the genomic era. In the future, we will continue to improve the performance, usability, and documentation of these tools, make them available to more platforms (e.g. Windows and Mac), and add new protein structure and function prediction tools into the toolbox. Improvements and new developments will be released on the MULTICOM toolbox web site.
The work is partially supported by an NIH grant (5R01GM093123) to JC, an NLM fellowship to JE, and a Shumaker fellowship to ZW.
- Kendrew J, Dickerson R, Strandberg B, Hart R, Davies D, Phillips D, Shore V: Structure of myoglobin: a three-dimensional Fourier synthesis at 2 Å resolution. Nature 1960, 185(4711):422–427. 10.1038/185422a0
- Perutz M, Rossmann M, Cullis A, Muirhead H, Will G, North A: Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5 Å resolution, obtained by X-ray analysis. Nature 1960, 185(4711):416–422. 10.1038/185416a0
- Fox BG, Goulding C, Malkowski MG, Stewart L, Deacon A: Structural genomics: from genes to structures with valuable materials and many questions in between. Nat Methods 2008, 5(2):129–132. 10.1038/nmeth0208-129
- Rost B, Liu J, Przybylski D, Nair R, Wrzeszczynski KO, Bigelow H, Ofran Y: Prediction of protein structure through evolution. Handbook of Chemoinformatics 2003, 1789–1811.
- Pollastri G, McLysaght A: Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics 2005, 21(8):1719–1720. 10.1093/bioinformatics/bti203
- Karplus K, Karchin R, Draper J, Casper J, Mandel-Gutfreund Y, Diekhans M, Hughey R: Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins: Structure, Function, and Bioinformatics 2003, 53(S6):491–496. 10.1002/prot.10540
- Cheng J, Randall A, Sweredoski M, Baldi P: SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res 2005, 33(Web Server Issue):W72-W76.
- Vullo A, Bortolami O, Pollastri G, Tosatto SCE: Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines. Nucleic Acids Res 2006, 34: W164-W168. 10.1093/nar/gkl166
- McGuffin L, Bryson K, Jones D: The PSIPRED protein structure prediction server. Bioinformatics 2000, 16(4):404. 10.1093/bioinformatics/16.4.404
- Adamczak R, Porollo A, Meller J: Accurate prediction of solvent accessibility using neural networks–based regression. Proteins: Structure, Function, and Bioinformatics 2004, 56(4):753–767. 10.1002/prot.20176
- Adamczak R, Porollo A, Meller J: Combining prediction of secondary structure and solvent accessibility in proteins. Proteins: Structure, Function, and Bioinformatics 2005, 59(3):467–475. 10.1002/prot.20441
- Wagner M, Adamczak R, Porollo A, Meller J: Linear regression models for solvent accessibility prediction in proteins. J Comput Biol 2005, 12(3):355–369. 10.1089/cmb.2005.12.355
- Porollo A, Adamczak R, Wagner M, Meller J: Maximum feasibility approach for consensus classifiers: Applications to protein structure prediction. 2003, 2003: 75–76.
- Karypis G: YASSPP: better kernels and coding schemes lead to improvements in protein secondary structure prediction. Proteins: Structure, Function, and Bioinformatics 2006, 64(3):575–586. 10.1002/prot.21036
- Cole C, Barber JD, Barton GJ: The Jpred 3 secondary structure prediction server. Nucleic Acids Res 2008, 36(suppl 2):W197-W201.
- Frishman D, Argos P: Incorporation of long-distance interactions into a secondary structure prediction algorithm. Protein Eng 1996, 9(2):133–142. 10.1093/protein/9.2.133
- Frishman D, Argos P: Knowledge-based protein secondary structure assignment. Proteins: Structure, Function, and Bioinformatics 1995, 23(4):566–579. 10.1002/prot.340230412
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577–2637. 10.1002/bip.360221211
- Sen TZ, Jernigan RL, Garnier J, Kloczkowski A: GOR V server for protein secondary structure prediction. Bioinformatics 2005, 21(11):2787–2788. 10.1093/bioinformatics/bti408
- Petersen B, Petersen TN, Andersen P, Nielsen M, Lundegaard C: A generic method for assignment of reliability scores applied to solvent accessibility predictions. BMC Struct Biol 2009, 9(1):51. 10.1186/1472-6807-9-51
- Pollastri G, Baldi P, Fariselli P, Casadio R: Prediction of coordination number and relative solvent accessibility in proteins. Proteins: Structure, Function, and Bioinformatics 2002, 47(2):142–153. 10.1002/prot.10069
- Faraggi E, Xue B, Zhou Y: Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network. Proteins: Structure, Function, and Bioinformatics 2009, 74(4):847–856. 10.1002/prot.22193
- Iakoucheva LM, Kimzey AL, Masselon CD, Bruce JE, Garner EC, Brown CJ, Dunker AK, Smith RD, Ackerman EJ: Identification of intrinsic order and disorder in the DNA repair protein XPA. Protein Sci 2001, 10(3):560–571. 10.1110/ps.29401
- Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN: Flexible nets. FEBS J 2005, 272(20):5129–5148. 10.1111/j.1742-4658.2005.04948.x
- Mizianty MJ, Stach W, Chen K, Kedarisetti KD, Disfani FM, Kurgan L: Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources. Bioinformatics 2010, 26(18):i489-i496. 10.1093/bioinformatics/btq373
- Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT: The DISOPRED server for the prediction of protein disorder. Bioinformatics 2004, 20(13):2138–2139. 10.1093/bioinformatics/bth195
- Zhang T, Faraggi E, Xue B, Dunker A, Uversky VN, Zhou Y: SPINE-D: Accurate Prediction of Short and Long Disordered Regions by a Single Neural-Network Based Method. J Biomol Struct Dyn 2012, 29(4):799–813. 10.1080/073911012010525022
- Ishida T, Kinoshita K: PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res 2007, 35(suppl 2):W460-W464.
- Shimizu K, Hirose S, Noguchi T: POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix. Bioinformatics 2007, 23(17):2337–2338. 10.1093/bioinformatics/btm330
- Hirose S, Shimizu K, Kanai S, Kuroda Y, Noguchi T: POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics 2007, 23(16):2046–2053. 10.1093/bioinformatics/btm302
- Shimizu K, Muraoka Y, Hirose S, Tomii K, Noguchi T: Predicting mostly disordered proteins by using structure-unknown protein data. BMC Bioinforma 2007, 8(1):78. 10.1186/1471-2105-8-78
- Dosztányi Z, Csizmok V, Tompa P, Simon I: The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol 2005, 347(4):827–839. 10.1016/j.jmb.2005.01.071
- Dosztányi Z, Csizmok V, Tompa P, Simon I: IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 2005, 21(16):3433–3434. 10.1093/bioinformatics/bti541
- McGuffin L: The ModFOLD server for the quality assessment of protein structural models. Bioinformatics 2008, 24(4):586. 10.1093/bioinformatics/btn014
- Roche DB, Buenavista MT, Tetchner SJ, McGuffin LJ: The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction. Nucleic Acids Res 2011, 39(suppl 2):W171-W176.
- Marsden RL, McGuffin LJ, Jones DT: Rapid protein domain assignment from amino acid sequence using predicted secondary structure. Protein Sci 2002, 11(12):2814–2824.
- Chen P, Liu C, Burge L, Li J, Mohammad M, Southerland W, Gloster C, Wang B: DomSVR: domain boundary prediction with support vector regression from sequence information alone. Amino Acids 2010, 39(3):713–726. 10.1007/s00726-010-0506-6
- Sim J, Kim SY, Lee J: PPRODO: prediction of protein domain boundaries using neural networks. Proteins: Structure, Function, and Bioinformatics 2005, 59(3):627–632. 10.1002/prot.20442
- Liu J, Rost B: Sequence-based prediction of protein domains. Nucleic Acids Res 2004, 32(12):3522–3530. 10.1093/nar/gkh684
- Eickholt J, Deng X, Cheng J: DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning. BMC Bioinforma 2011, 12: 43. 10.1186/1471-2105-12-43
- Gewehr JE, Zimmer R: SSEP-Domain: protein domain prediction by alignment of secondary structure elements and profiles. Bioinformatics 2006, 22(2):181–187. 10.1093/bioinformatics/bti751
- Rost B, Yachdav G, Liu J: The predictprotein server. Nucleic Acids Res 2004, 32(suppl 2):W321-W326.
- Baú D, Martin A, Mooney C, Vullo A, Walsh I, Pollastri G: Distill: a suite of web servers for the prediction of one-, two-, and three-dimensional structural features of proteins. BMC Bioinforma 2006, 7(1):402. 10.1186/1471-2105-7-402
- Singh S, Hajela K, Ramani A: SVM-BetaPred: prediction of right-handed β-helix fold from protein sequence using SVM. Pattern Recognition in Bioinformatics 2007, 108–119.
- Punta M, Rost B: PROFcon: novel prediction of long-range contacts. Bioinformatics 2005, 21(13):2960–2968. 10.1093/bioinformatics/bti454
- Fuchs A, Kirschner A, Frishman D: Prediction of helix–helix contacts and interacting helices in polytopic membrane proteins using neural networks. Proteins: Structure, Function, and Bioinformatics 2009, 74(4):857–871. 10.1002/prot.22194
- Ferre F, Clote P: DiANNA: a web server for disulfide connectivity prediction. Nucleic Acids Res 2005, 33(suppl 2):W230-W232.
- O’Connor BD, Yeates TO: GDAP: a web tool for genome-wide protein disulfide bond prediction. Nucleic Acids Res 2004, 32(suppl 2):W360-W364.
- Fariselli P, Riccobelli P, Casadio R: Role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins. Proteins: Structure, Function, and Bioinformatics 1999, 36(3):340–346. 10.1002/(SICI)1097-0134(19990815)36:3<340::AID-PROT8>3.0.CO;2-D
- Bradley P, Cowen L, Menke M, King J, Berger B: Betawrap: Successful prediction of parallel β-helices from primary sequence reveals an association with many microbial pathogens. Proc Natl Acad Sci 2001, 98(26):14819–14824. 10.1073/pnas.251267298
- Zimmermann O, Wang L, Hansmann UHE: BETTY: Prediction of β-Strand Type from Sequence. In Silico Biol 2007, 7(4):535–542.
- Li Y, Fang Y, Fang J: Predicting residue–residue contacts using random forest models. Bioinformatics 2011, 27(24):3379–3384. 10.1093/bioinformatics/btr579
- Björkholm P, Daniluk P, Kryshtafovych A, Fidelis K, Andersson R, Hvidsten TR: Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue–residue contacts. Bioinformatics 2009, 25(10):1264–1270. 10.1093/bioinformatics/btp149
- Wu S, Zhang Y: A comprehensive assessment of sequence-based and template-based methods for protein contact prediction. Bioinformatics 2008, 24(7):924–931. 10.1093/bioinformatics/btn069
- Shackelford G, Karplus K: Contact prediction using mutual information and neural nets. Proteins: Structure, Function, and Bioinformatics 2007, 69(S8):159–164. 10.1002/prot.21791
- Zhang Y, Skolnick J: The protein structure prediction problem could be solved using the current PDB library. Proc Natl Acad Sci 2005, 102(4):1029–1034. 10.1073/pnas.0407152101
- Baker D, Sali A: Protein structure prediction and structural genomics. Science 2001, 294(5540):93–96. 10.1126/science.1065659
- Zhang Y: Progress and challenges in protein structure prediction. Curr Opin Struct Biol 2008, 18(3):342–348. 10.1016/j.sbi.2008.02.004
- Zhou H, Zhou Y: SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics 2005, 21(18):3615–3621. 10.1093/bioinformatics/bti582
- Xu J, Li M, Kim D, Xu Y: RAPTOR: optimal protein threading by linear programming. J Bioinforma Comput Biol 2003, 1(1):95–117. 10.1142/S0219720003000186
- Simons K, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 1997, 268(1):209–225. 10.1006/jmbi.1997.0959
- Roy A, Kucukural A, Zhang Y: I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc 2010, 5(4):725–738. 10.1038/nprot.2010.5
- Zhang Y: I-TASSER: Fully automated protein structure prediction in CASP8. Proteins: Structure, Function, and Bioinformatics 2009, 77(S9):100–113. 10.1002/prot.22588
- Zhang Y: I-TASSER server for protein 3D structure prediction. BMC Bioinforma 2008, 9(1):40. 10.1186/1471-2105-9-40
- Šali A, Potterton L, Yuan F, van Vlijmen H, Karplus M: Evaluation of comparative protein modeling by MODELLER. Proteins: Structure, Function, and Bioinformatics 1995, 23(3):318–326. 10.1002/prot.340230306
- Fiser A, Sali A: Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol 2003, 374: 461–491.
- Soding J, Biegert A, Lupas A: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 2005, 33(Web Server Issue):W244-W248.
- Xu D, Zhang Y: Ab initio protein structure assembly using continuous structure fragments and optimized knowledge‐based force field. Proteins: Structure, Function, and Bioinformatics 2012.
- Zhou H, Skolnick J: Ab initio protein structure prediction using chunk-TASSER. Biophys J 2007, 93(5):1510–1518. 10.1529/biophysj.107.109959
- Wallner B, Larsson P, Elofsson A: Pcons.net: protein structure prediction meta server. Nucleic Acids Res 2007, 35(suppl 2):W369-W374.
- Karplus K, Barrett C, Hughey R: Hidden Markov models for detecting remote protein homologies. Bioinformatics 1998, 14(10):846–856. 10.1093/bioinformatics/14.10.846
- Peng J, Xu J: Low-homology protein threading. Bioinformatics 2010, 26(12):i294-i300. 10.1093/bioinformatics/btq192
- Yang Y, Faraggi E, Zhao H, Zhou Y: Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011, 27(15):2076–2082. 10.1093/bioinformatics/btr350
- Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A: Critical assessment of methods of protein structure prediction-round VII. Proteins: Structure, Function, and Bioinformatics 2007, 69(Suppl 8):3–9.
- Moult J, Fidelis K, Kryshtafovych A, Tramontano A: Critical assessment of methods of protein structure prediction – round IX. Proteins 2011, 79(S10):1–5. 10.1002/prot.23200
- Monastyrskyy B, Fidelis K, Moult J, Tramontano A, Kryshtafovych A: Evaluation of disorder predictions in CASP9. Proteins 2011, 79(S10):107–118. 10.1002/prot.23161
- Monastyrskyy B, Fidelis K, Tramontano A, Kryshtafovych A: Evaluation of residue-residue contact prediction in CASP9. Proteins 2011, 79(S10):119–125. 10.1002/prot.23160
- Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A: Evaluation of template-based models in CASP8 with standard measures. Proteins: Structure, Function, and Bioinformatics 2009, 77(Suppl 9):000–000.
- Mariani V, Kiefer F, Schmidt T, Haas J, Schwede T: Assessment of template based protein structure predictions in CASP9. Proteins 2011, 79(S10):37–58. 10.1002/prot.23177
- Kinch L, Shi SY, Cong Q, Cheng H, Liao Y, Grishin NV: CASP9 assessment of free modeling target predictions. Proteins 2011, 79(S10):59–73. 10.1002/prot.23181
- Benkert P, Tosatto S, Schomburg D: QMEAN: a comprehensive scoring function for model quality assessment. Proteins 2008, 71(1).
- Cozzetto D, Kryshtafovych A, Tramontano A: Evaluation of CASP8 model quality predictions. Proteins: Structure, Function, and Bioinformatics 2009, 77(S9):157–166. 10.1002/prot.22534
- Eisenberg D, Luthy R, Bowie J: VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 1997, 277: 396–404.
- Larsson P, Skwark M, Wallner B, Elofsson A: Assessment of global and local model quality in CASP8 using Pcons and ProQ. Proteins 2009, 77(S9):167–172. 10.1002/prot.22476
- McGuffin L, Roche D: Rapid model quality assessment for protein structure predictions using the comparison of multiple models without structural alignments. Bioinformatics 2010, 26(2):182–188. 10.1093/bioinformatics/btp629
- Paluszewski M, Karplus K: Model Quality Assessment using Distance Constraints from Alignments. Proteins 2008, 75: 540–549.
- Kryshtafovych A, Fidelis K, Tramontano A: Evaluation of model quality predictions in CASP9. Proteins 2011, 79(S10):91–109. 10.1002/prot.23180
- Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A: Critical assessment of methods of protein structure prediction (CASP)-round VIII. 2009. (Accepted)
- MacCallum JL, Perez A, Schnieders MJ, Hua L, Jacobson MP, Dill KA: Assessment of protein structure refinement in CASP9. Proteins 2011, 79(S10):74–90. 10.1002/prot.23131
- Baldi P, Pollastri G: The principled design of large-scale recursive neural network architectures–DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 2003, 4: 575–602.
- Bernstein FC, Koetzle TF, Williams GJB, Meyer EF: The protein data bank: A computer-based archival file for macromolecular structures. J Mol Biol 1977, 112(3):535–542. 10.1016/S0022-2836(77)80200-3
- Deng X, Eickholt J, Cheng J: PreDisorder: ab initio sequence-based prediction of protein disordered regions. BMC Bioinforma 2009, 10(1):436. 10.1186/1471-2105-10-436
- Deng X, Eickholt J, Cheng J: A comprehensive overview of computational protein disorder prediction methods. Mol BioSyst 2011, 8.
- Wu S, Szilagyi A, Zhang Y: Improving protein structure prediction using multiple sequence-based contact predictions. Structure 2011, 19(8):1182–1191. 10.1016/j.str.2011.05.004
- Tegge AN, Wang Z, Eickholt J, Cheng J: NNcon: improved protein contact map prediction using 2D-recursive neural networks. Nucleic Acids Res 2009, 37(suppl 2):W515-W518.
- Cheng J, Baldi P: Improved residue contact prediction using support vector machines and a large feature set. BMC Bioinforma 2007, 8(1):113. 10.1186/1471-2105-8-113
- Ezkurdia I, Graña O, Izarzugaza JMG, Tress ML: Assessment of domain boundary predictions and the prediction of intramolecular contacts in CASP8. Proteins: Structure, Function, and Bioinformatics 2009, 77(S9):196–209. 10.1002/prot.22554
- Izarzugaza JMG, Graña O, Tress ML, Valencia A, Clarke ND: Assessment of intramolecular contact predictions for CASP7. Proteins: Structure, Function, and Bioinformatics 2007, 69(S8):152–158. 10.1002/prot.21637
- Cheng J, Saigo H, Baldi P: Large scale prediction of disulphide bridges using kernel methods, two dimensional recursive neural networks, and weighted graph matching. Proteins: Structure, Function, and Bioinformatics 2006, 62(3):617–629.
- Baldi P, Cheng J, Vullo A: Large-scale prediction of disulphide bond connectivity. In Advances in Neural Information Processing Systems 17. The MIT Press, Cambridge, MA; 2004:97–104.
- Cheng J, Baldi P: Three-stage prediction of protein β-sheets by neural networks, alignments and graph algorithms. Bioinformatics 2005, 21(suppl 1):i75-i84. 10.1093/bioinformatics/bti1004
- Wang Z, Eickholt J, Cheng J: MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8. Bioinformatics 2010, 26(7):882–888. 10.1093/bioinformatics/btq058
- Zhang Y, Skolnick J: Scoring function for automated assessment of protein structure template quality. Proteins: Structure, Function, and Bioinformatics 2004, 57(4):702–710. 10.1002/prot.20264
- Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov I, Bourne P: The protein data bank. Nucleic Acids Res 2000, 28(1):235–242. 10.1093/nar/28.1.235
- Zemla A: LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 2003, 31(13):3370–3374. 10.1093/nar/gkg571
- Wang Z, Eickholt J, Cheng J: APOLLO: a quality assessment service for single and multiple protein models. Bioinformatics 2011, 27(12):1715–1716. 10.1093/bioinformatics/btr268
- Wang Z, Tegge AN, Cheng J: Evaluating the absolute quality of a single protein model using structural features and support vector machines. Proteins: Structure, Function, and Bioinformatics 2009, 75(3):638–647. 10.1002/prot.22275
- Cheng J, Wang Z, Tegge A, Eickholt J: Prediction of global and local quality of CASP8 models by MULTICOM series. Proteins 2009, 77(S9):181–184. 10.1002/prot.22487
- Wang Z, Cheng J: An iterative self-refining and self-evaluating approach for protein model quality estimation. Protein Sci 2012, 21(1):142–151. 10.1002/pro.764
- Cheng J, Randall A, Baldi P: Prediction of protein stability changes for single site mutations using support vector machines. Proteins: Structure, Function, and Bioinformatics 2006, 62(4):1125–1132.
- Gilis D, Rooman M: PoPMuSiC, an algorithm for predicting protein mutant stability changes. Application to prion proteins. Protein Engineering 2000, 13(12):849–856. 10.1093/protein/13.12.849
- Worth CL, Preissner R, Blundell TL: SDM—a server for predicting effects of mutations on protein stability and malfunction. Nucleic Acids Res 2011, 39(suppl 2):W215-W222.
- Capriotti E, Fariselli P, Casadio R: I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 2005, 33(suppl 2):W306-W310.
- Parthiban V, Gromiha MM, Schomburg D: CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Res 2006, 34(suppl 2):W239-W242.
- Lin G, Wang Z, Xu D, Cheng J: SeqRate: sequence-based protein folding type classification and rates prediction. BMC Bioinforma 2010, 11(Suppl 3):S1. 10.1186/1471-2105-11-S3-S1
- Deng X, Cheng J: MSACompro: Protein Multiple Sequence Alignment Using Predicted Secondary Structure, Solvent Accessibility, and Residue-Residue Contacts. BMC Bioinforma 2011, 12: 472. 10.1186/1471-2105-12-472
- Thompson JD, Koehl P, Ripp R, Poch O: BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins: Structure, Function, and Bioinformatics 2005, 1: 127–136.
- Dai J, Cheng J: HMMEditor: a visual editing tool for profile hidden Markov model. BMC Genomics 2008, 9(Suppl 1):S8. 10.1186/1471-2164-9-S1-S8
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.