Predicting β-turns and their types using predicted backbone dihedral angles and secondary structures
© Kountouris and Hirst; licensee BioMed Central Ltd. 2010
Received: 12 May 2010
Accepted: 31 July 2010
Published: 31 July 2010
β-turns are secondary structure elements usually classified as coil. Their prediction is important, because of their role in protein folding and their frequent occurrence in protein chains.
We have developed a novel method that predicts β-turns and their types using information from multiple sequence alignments, predicted secondary structures and, for the first time, predicted dihedral angles. Our method uses support vector machines, a supervised classification technique, and is trained and tested on three established datasets of 426, 547 and 823 protein chains. We achieve a Matthews correlation coefficient of up to 0.49 when predicting the location of β-turns, the highest reported value to date. Moreover, the additional dihedral information improves the prediction of β-turn types I, II, IV, VIII and "non-specific", achieving correlation coefficients up to 0.39, 0.33, 0.27, 0.14 and 0.38, respectively. Our predictions are more accurate than those of other methods.
We have created an accurate predictor of β-turns and their types. Our method, called DEBT, is available online at http://comp.chem.nottingham.ac.uk/debt/.
Secondary structure can provide important information about three-dimensional protein structure. Therefore, its prediction has been an area of intense research over the past three decades. To predict secondary structure, many methods have been implemented, including different machine learning techniques, such as artificial neural networks (ANNs) [1, 2] and support vector machines (SVMs) [3–5], and different input schemes, such as position specific scoring matrices (PSSMs) and hidden Markov models. Notably, the predictive accuracy has reached 80% for three-state prediction, where residues are divided into helix, strand and coil. Helices and strands are repetitive, regular structures, while the remaining residues, which can be tight turns, loops, bulges or random coil, are all classified as coil; they are non-repetitive, irregular secondary structures. Although the helix and strand classes are structurally well-defined, the third class, coil, does not provide any detailed structural information. Hence, further analysis of the local structure is necessary, such as prediction of backbone dihedral angles [5, 8] and prediction of tight turns.
The dihedral angles of β-turn types (table columns: dihedral angles ϕ(i+1), ψ(i+1), ϕ(i+2) and ψ(i+2), in degrees)
Prediction of β-turns has attracted interest in the past. The approaches can be divided into statistical methods and machine learning techniques. The former include early methods which used amino acid propensities [23–27] as well as more recent methods, like COUDES, which used probabilities with multiple sequence alignments. Over the past few years, machine learning techniques have been applied successfully to predict β-turns. Since their first use, ANNs have been frequently used for β-turn prediction [30–32]. Over the past decade, several studies used SVMs to predict β-turns [33–37] and other techniques, such as nearest neighbour, have been applied recently. Through the use of evolutionary information and more sophisticated machine learning techniques, the correlation coefficient for turn/non-turn prediction is now as high as 0.47. Other methods predict the type of β-turn, rather than the location of the turn in the chain, with significant success, even though this problem is challenging, due to the lack of examples for many β-turn types. BTPRED, BetaTurns, MOLEBRNN and the method of Asgary and colleagues are ANN-based, whereas COUDES uses amino acid propensities with multiple sequence alignments. In spite of its successful use for the prediction of β-turn location [34, 37], the SVM method has not been employed widely for β-turn type prediction.
Despite the success so far, there is a need for more accurate predictions of both β-turn location and β-turn type, which could be realised through the use of additional information. Evolutionary information from multiple alignments as well as predicted secondary structures can improve β-turn predictions dramatically. In this work, we show that the backbone dihedral angles can provide crucial information for turn/non-turn prediction and can also noticeably improve the prediction of β-turn types, since the types are defined by the dihedral angles of the central residues. Predicted dihedral angles have been used successfully for secondary structure prediction [5, 41]. The method presented here, called DEBT (Dihedrally Enhanced Beta Turn prediction), uses predicted secondary structures and predicted dihedral angles from DISSPred and achieves the highest correlation coefficient reported to date for turn/non-turn prediction, while the prediction of β-turn types is, in most cases, more accurate than other contemporary methods. The method predicts β-turn types I, II, IV and VIII as defined by Hutchinson and Thornton, while all remaining types are classified as NS (non-specific). Moreover, we show that using a small local window of predicted secondary structures and dihedral angles, rather than using the predictions of one individual residue, is beneficial.
Distribution of residues in β-turns and their types in different datasets (table column: β-turns (%))
The DEBT method utilises PSSMs, constructed by the PSI-BLAST algorithm, to predict β-turns and their types. PSSMs have N × 20 elements, where the N rows correspond to the residues of the amino acid sequence and the columns correspond to the 20 standard amino acids. PSSMs represent the log-likelihood of a particular residue substitution, usually based on a weighted average of BLOSUM62. We generated the PSSMs using the BLOSUM62 substitution matrix with an E-value of 0.001 and three iterations against a non-redundant (nr) database, which was downloaded in February 2009. The data were filtered by pfilt to remove low complexity regions, transmembrane spans and coiled coil regions. The PSSM values were linearly scaled by dividing them by ten. Typically, PSSM values are in the range [-7, 7], but some values outside this range may appear. Linear scaling maintains the same distribution in the input data and helps avoid numerical difficulties during training.
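The linear scaling step can be illustrated with a minimal sketch. The function name `scale_pssm` and the toy matrix are assumptions made for illustration; only the N × 20 shape and the division by ten come from the text above.

```python
def scale_pssm(pssm, divisor=10.0):
    """Linearly scale every PSSM score by a constant divisor.

    pssm is a list of N rows, one per residue, each holding 20 scores
    (one per standard amino acid). Dividing by a constant keeps the
    shape of the score distribution unchanged.
    """
    for row in pssm:
        if len(row) != 20:
            raise ValueError("each PSSM row must have 20 columns")
    return [[score / divisor for score in row] for row in pssm]


# Toy two-residue PSSM (hypothetical values, not taken from PSI-BLAST output).
toy_pssm = [
    [-7] * 10 + [7] * 10,  # scores at the edges of the typical range
    [0] * 19 + [12],       # one score outside the typical range
]
scaled = scale_pssm(toy_pssm)
```

Scores in the typical range [-7, 7] end up in [-0.7, 0.7], and occasional outliers are scaled by the same factor rather than clipped.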
Support Vector Machines
DEBT employs the SVM, a state-of-the-art supervised learning technique. The SVM method has become an area of intense research, because it performs well with real-world problems, it is simple to understand and implement and, most importantly, it finds the global solution, while other methods, like ANNs, have several local solutions. The SVM can find non-linear boundaries between two classes by using a kernel function, which maps the data from the input space into a richer feature space, where linear boundaries can be implemented. Furthermore, the SVM effectively handles large feature spaces, since it does not suffer from the "curse of dimensionality", and, therefore, avoids overfitting, a common drawback of supervised learning techniques.
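With a radial basis function (RBF) kernel of the form $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$, $x_i$ and $x_j$ are the input vectors for instances i and j, respectively, and γ is a parameter that controls the width of the kernel.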
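The description of γ as a width parameter is consistent with the Gaussian radial basis function (RBF) kernel. The sketch below assumes that form; the function name `rbf_kernel` and the example vectors are illustrative and are not taken from the DEBT implementation.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance).

    Assumption: this is the standard RBF form, in which a larger gamma
    gives a narrower kernel, so similarity decays faster with distance.
    """
    if len(x_i) != len(x_j):
        raise ValueError("input vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


# Identical vectors always have kernel value 1.
same = rbf_kernel([0.1, -0.3], [0.1, -0.3], gamma=0.5)

# For a fixed pair of distinct vectors, increasing gamma lowers the value.
wide = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=0.1)
narrow = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=10.0)
```

In practice γ is tuned together with the SVM regularisation parameter, typically by a grid search with cross-validation.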
Optimised parameters for each SVM classifier used in DEBT.
Because the prediction is based on individual residues, the SVM outputs include some β-turns that are shorter than four residues, which is unrealistic. Turn predictions longer than four adjacent residues are acceptable, since many β-turns in the dataset overlap. In fact, about 58% are multiple turns. To ensure that the predictions are at least four residues long, we applied filtering rules similar to the "state-flipping" rule described by Shepherd and colleagues. The rules are applied in the following order: (1) flip isolated non-turn predictions to turn (tnt → ttt), (2) flip isolated turn predictions to non-turn (ntn → nnn), (3) flip isolated pairs of turn predictions to non-turn (nttn → nnnn) and (4) for isolated runs of three consecutive turn predictions, flip the adjacent non-turn predictions to turn (ntttn → ttttt).
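The four rules above can be sketched as ordered string rewrites over a per-residue prediction string of `t` (turn) and `n` (non-turn). This is an illustrative sketch, not the authors' code: the function name, the repeated application of each rule until it no longer matches, and the decision to leave runs at the chain termini untouched are all assumptions.

```python
# Rules in the order given in the text: (pattern, replacement).
RULES = [
    ("tnt", "ttt"),      # (1) isolated non-turn becomes turn
    ("ntn", "nnn"),      # (2) isolated turn becomes non-turn
    ("nttn", "nnnn"),    # (3) isolated pair of turns becomes non-turn
    ("ntttn", "ttttt"),  # (4) isolated triple is extended on both sides
]


def apply_state_flipping(prediction):
    """Apply the filtering rules, in order, to a string of 't'/'n' states.

    Each rule is repeated until it no longer matches, so overlapping
    occurrences (for example 'tntnt') are handled as well. Every rule
    strictly increases the count of one state, so each loop terminates.
    Patterns need a flanking residue on both sides, so short runs at the
    very start or end of the chain are left unchanged (an assumption).
    """
    for pattern, replacement in RULES:
        while pattern in prediction:
            prediction = prediction.replace(pattern, replacement)
    return prediction


filtered = apply_state_flipping("nntnnttnntttnn")
```

For instance, an isolated triple such as `nntttnn` becomes `ntttttn`, while a run that is already four residues long, such as `nttttn`, is unchanged.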
Prediction accuracy assessment
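The scalar measures reported in the results tables (Q total, Q pred, Q obs and the MCC) are conventionally computed from the counts of true positives, true negatives, false positives and false negatives. The sketch below assumes those conventional definitions; the function name `turn_metrics` and the example counts are illustrative only.

```python
import math


def turn_metrics(tp, tn, fp, fn):
    """Compute Q total, Q pred, Q obs (as percentages) and the MCC.

    Assumed conventional definitions:
      Q total = (TP + TN) / all residues   (overall accuracy)
      Q pred  = TP / (TP + FP)             (fraction of predicted turns that are correct)
      Q obs   = TP / (TP + FN)             (fraction of observed turns that are predicted)
      MCC     = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one residue is required")
    q_total = 100.0 * (tp + tn) / total
    q_pred = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    q_obs = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"q_total": q_total, "q_pred": q_pred, "q_obs": q_obs, "mcc": mcc}


# Hypothetical counts, chosen only to exercise the formulas.
example = turn_metrics(tp=50, tn=30, fp=10, fn=10)
```

Because turn residues are a minority class, Q total alone can look high for a predictor that rarely predicts turns, which is why the MCC is the more informative single number.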
Apart from the scalar measures described above, we report the receiver operating characteristic (ROC) curves, which plot the sensitivity (or true positive rate, TP rate) against the false positive rate (1 - specificity). ROC curves have been widely used in bioinformatics for visualisation and assessment of machine learning classifiers. Moreover, the area under the ROC curve (AUC) is calculated to provide a scalar measure of the ROC analysis and to compare different methods. The trapezium rule is used to calculate the AUC, as described by Fawcett.
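A minimal sketch of the AUC calculation with the trapezium rule follows. It builds ROC points by sweeping a threshold over the classifier scores and emits a point only when the score changes, so tied scores are handled together. The function names and example scores are assumptions for illustration, not the code used for DEBT.

```python
def roc_points(scores, labels):
    """Return ROC points (FP rate, TP rate) for binary labels (1 = turn).

    Instances are visited in order of decreasing score. A point is
    emitted whenever the score changes, so instances with equal scores
    contribute to a single step.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    previous_score = None
    for score, label in ranked:
        if previous_score is not None and score != previous_score:
            points.append((fp / negatives, tp / positives))
        if label:
            tp += 1
        else:
            fp += 1
        previous_score = score
    points.append((fp / negatives, tp / positives))  # always ends at (1, 1)
    return points


def auc_trapezium(points):
    """Integrate the ROC curve with the trapezium rule."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area


# Hypothetical scores: one turn residue is ranked below one non-turn residue.
auc = auc_trapezium(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect separation of turn and non-turn residues.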
Results and Discussion
The effect of the input scheme
Experiments on the GR426 dataset with different input schemes (table rows: PSSM + SS; PSSM + Dih; PSSM + SS + Dih)
Performance of DEBT for the prediction of β-turn location on three datasets (table columns include Q total (%), Q pred (%) and Q obs (%))
Comparison of DEBT with other turn/non-turn prediction methods on three different datasets (table columns include Q total (%), Q pred (%) and Q obs (%); methods compared include Zheng and Kurgan, Hu and Li, and Zhang et al.)
Prediction of β-turn types
DEBT's prediction of β-turn types on three different datasets (table columns include Q total (%))
Performance of DEBT and other β-turn type prediction methods based on the achieved MCC value.
In this article, we presented a method that predicts the location of β-turns and their types in a protein chain. Our method uses predicted dihedral angles from DISSPred to enhance the predictions. Moreover, we improved the predictive performance by using a local window of predicted secondary structures and dihedral angles, rather than the predictions for one individual residue. The MCC of 0.48, achieved for turn/non-turn prediction on a set of 426 non-redundant proteins, shows that DEBT is more accurate than other β-turn prediction methods. Moreover, we report the highest MCCs of 0.49 and 0.48 on two larger datasets of 547 and 823 non-redundant protein chains. Additionally, the dihedrally enhanced prediction of β-turn types is more accurate than that of other methods. We report DEBT's prediction on three datasets with achieved MCCs up to 0.39, 0.33, 0.27, 0.14 and 0.38 for β-turn types I, II, IV, VIII and NS, respectively. The prediction of β-turn types has limitations derived from the observation that identical tetrapeptides may form different β-turn types. In fact, around 15% of all tetrapeptides that form β-turns in datasets GR426 and FA547 appear in multiple β-turn types. This number is close to 18% in the FA823 dataset. A detailed analysis of the fundamental limitations of β-turn prediction is a challenging focus for future work. In spite of the limitations, the performance might be improved further by applying techniques introduced by other studies, such as feature selection techniques, or by using predicted secondary structures and dihedral angles from multiple predictors. Predicted β-turns can be used to improve secondary structure prediction and we are currently exploring this.
We thank the HPC facility at the University of Nottingham and the University of Nottingham for a PhD studentship.
- Rost B, Sander C: Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 1993, 232(2):584–599. doi:10.1006/jmbi.1993.1413
- Jones DT: Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999, 292(2):195–202. doi:10.1006/jmbi.1999.3091
- Hua S, Sun Z: A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. J Mol Biol 2001, 308(2):397–407. doi:10.1006/jmbi.2001.4580
- Karypis G: YASSPP: Better kernels and coding schemes lead to improvements in protein secondary structure prediction. Proteins 2006, 64(3):575–586. doi:10.1002/prot.21036
- Kountouris P, Hirst JD: Prediction of backbone dihedral angles and protein secondary structure using support vector machines. BMC Bioinformatics 2009, 10: 437. doi:10.1186/1471-2105-10-437
- Karplus K, Barrett C, Cline M, Diekhans M, Grate L, Hughey R: Predicting protein structure using only sequence information. Proteins 1999, (Suppl 3):121–125. doi:10.1002/(SICI)1097-0134(1999)37:3+<121::AID-PROT16>3.0.CO;2-Q
- Richardson JS: The anatomy and taxonomy of protein structure. Adv Protein Chem 1981, 34: 167–339.
- Dor O, Zhou Y: Real-SPINE: an integrated system of neural networks for real-value prediction of protein structural properties. Proteins 2007, 68: 76–81. doi:10.1002/prot.21408
- Chou KC: Prediction of tight turns and their types in proteins. Anal Biochem 2000, 286: 1–16. doi:10.1006/abio.2000.4757
- Marcelino AMC, Gierasch LM: Roles of beta-turns in protein folding: from peptide models to protein engineering. Biopolymers 2008, 89(5):380–391. doi:10.1002/bip.20960
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577–2637. doi:10.1002/bip.360221211
- de la Cruz X, Hutchinson EG, Shepherd A, Thornton JM: Toward predicting protein topology: an approach to identifying beta hairpins. Proc Natl Acad Sci USA 2002, 99(17):11157–11162. doi:10.1073/pnas.162376199
- Kuhn M, Meiler J, Baker D: Strand-loop-strand motifs: prediction of hairpins and diverging turns in proteins. Proteins 2004, 54(2):282–288. doi:10.1002/prot.10589
- Kumar M, Bhasin M, Natt NK, Raghava GPS: BhairPred: prediction of beta-hairpins in a protein from multiple alignment information using ANN and SVM techniques. Nucleic Acids Res 2005, (33 Web Server):W154-W159. doi:10.1093/nar/gki588
- Takano K, Yamagata Y, Yutani K: Role of amino acid residues at turns in the conformational stability and folding of human lysozyme. Biochemistry 2000, 39(29):8655–8665. doi:10.1021/bi9928694
- Trevino SR, Schaefer S, Scholtz JM, Pace CN: Increasing protein conformational stability by optimizing beta-turn sequence. J Mol Biol 2007, 373: 211–218. doi:10.1016/j.jmb.2007.07.061
- Fu H, Grimsley GR, Razvi A, Scholtz JM, Pace CN: Increasing protein stability by improving beta-turns. Proteins 2009, 77(3):491–498. doi:10.1002/prot.22509
- Rose GD, Gierasch LM, Smith JA: Turns in peptides and proteins. Adv Protein Chem 1985, 37: 1–109.
- Müller G, Hessler G, Decornez HY: Are β-turn mimetics mimics of β-turns? Angew Chem Int Ed Engl 2000, 39(5):894–896. doi:10.1002/(SICI)1521-3773(20000303)39:5<894::AID-ANIE894>3.0.CO;2-2
- Kee KS, Jois SDS: Design of β-turn based therapeutic agents. Curr Pharm Des 2003, 9(15):1209–1224. doi:10.2174/1381612033454900
- Fuller AA, Du D, Liu F, Davoren JE, Bhabha G, Kroon G, Case DA, Dyson HJ, Powers ET, Wipf P, Gruebele M, Kelly JW: Evaluating beta-turn mimics as beta-sheet folding nucleators. Proc Natl Acad Sci USA 2009, 106(27):11067–11072. doi:10.1073/pnas.0813012106
- Hutchinson EG, Thornton JM: A revised set of potentials for β-turn formation in proteins. Protein Sci 1994, 3(12):2207–2216. doi:10.1002/pro.5560031206
- Chou PY, Fasman GD: Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 1974, 13(2):211–222. doi:10.1021/bi00699a001
- Wilmot CM, Thornton JM: Analysis and prediction of the different types of β-turn in proteins. J Mol Biol 1988, 203: 221–232. doi:10.1016/0022-2836(88)90103-9
- Wilmot CM, Thornton JM: β-turns and their distortions: a proposed new nomenclature. Protein Eng 1990, 3(6):479–493. doi:10.1093/protein/3.6.479
- Chou KC, Blinn JR: Classification and prediction of β-turn types. J Protein Chem 1997, 16(6):575–595. doi:10.1023/A:1026366706677
- Zhang C, Chou K: Prediction of β-turns in proteins by 1–4 and 2–3 correlation model. Biopolymers 1997, 41(6):673–702. doi:10.1002/(SICI)1097-0282(199705)41:6<673::AID-BIP7>3.0.CO;2-N
- Fuchs PFJ, Alix AJP: High accuracy prediction of β-turns and their types using propensities and multiple alignments. Proteins 2005, 59(4):828–839. doi:10.1002/prot.20461
- McGregor MJ, Flores TP, Sternberg MJE: Prediction of β-turns in proteins using neural networks. Protein Eng 1989, 2(7):521–526. doi:10.1093/protein/2.7.521
- Shepherd AJ, Gorse D, Thornton JM: Prediction of the location and type of β-turns in proteins using neural networks. Protein Sci 1999, 8(5):1045–1055. doi:10.1110/ps.8.5.1045
- Kaur H, Raghava GPS: Prediction of β-turns in proteins from multiple alignment using neural network. Protein Sci 2003, 12(3):627–634. doi:10.1110/ps.0228903
- Kirschner A, Frishman D: Prediction of β-turns and β-turn types by a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN). Gene 2008, 422(1–2):22–29. doi:10.1016/j.gene.2008.06.008
- Cai YD, Liu XJ, Li YX, Xu XB, Chou KC: Prediction of β-turns with learning machines. Peptides 2003, 24(5):665–669. doi:10.1016/S0196-9781(03)00133-5
- Zheng C, Kurgan L: Prediction of β-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments. BMC Bioinformatics 2008, 9: 430. doi:10.1186/1471-2105-9-430
- Zhang Q, Yoon S, Welsh WJ: Improved method for predicting β-turn using support vector machine. Bioinformatics 2005, 21(10):2370–2374. doi:10.1093/bioinformatics/bti358
- Pham TH, Satou K, Ho TB: Prediction and analysis of β-turns in proteins by support vector machine. Genome Inform 2003, 14: 196–205.
- Hu X, Li Q: Using support vector machine to predict β- and γ-turns in proteins. J Comput Chem 2008, 29(12):1867–1875. doi:10.1002/jcc.20929
- Kim S: Protein beta-turn prediction using nearest-neighbor method. Bioinformatics 2004, 20: 40–44. doi:10.1093/bioinformatics/btg368
- Kaur H, Raghava GPS: A neural network method for prediction of β-turn types in proteins using evolutionary information. Bioinformatics 2004, 20(16):2751–2758. doi:10.1093/bioinformatics/bth322
- Asgary MP, Jahandideh S, Abdolmaleki P, Kazemnejad A: Analysis and identification of β-turn types using multinomial logistic regression and artificial neural network. Bioinformatics 2007, 23(23):3125–3130. doi:10.1093/bioinformatics/btm324
- Wood MJ, Hirst JD: Protein secondary structure prediction with dihedral angles. Proteins 2005, 59(3):476–481. doi:10.1002/prot.20435
- Guruprasad K, Rajkumar S: β- and γ-turns in proteins revisited: a new set of amino acid turn-type dependent positional preferences and potentials. J Biosci 2000, 25(2):143–156.
- Kaur H, Raghava GPS: An evaluation of β-turn prediction methods. Bioinformatics 2002, 18(11):1508–1514. doi:10.1093/bioinformatics/18.11.1508
- Hobohm U, Scharf M, Schneider R, Sander C: Selection of representative protein data sets. Protein Sci 1992, 1(3):409–417. doi:10.1002/pro.5560010313
- Hutchinson EG, Thornton JM: PROMOTIF-a program to identify and analyze structural motifs in proteins. Protein Sci 1996, 5(2):212–220. doi:10.1002/pro.5560050204
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. doi:10.1093/nar/25.17.3389
- Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 1992, 89(22):10915–10919. doi:10.1073/pnas.89.22.10915
- Jones DT, Swindells MB: Getting the most from PSI-BLAST. Trends Biochem Sci 2002, 27(3):161–164. doi:10.1016/S0968-0004(01)02039-4
- Vapnik V: The Nature of Statistical Learning Theory. N.Y.: Springer; 1995.
- Cristianini N, Shawe-Taylor J: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press; 2000.
- Burges CJ: A Tutorial on Support Vector Machines for Pattern Recognition. Data Min and Knowl Disc 1998, 2(2):121–167. doi:10.1023/A:1009715923555
- Scholkopf B, Smola AJ: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA, USA: MIT Press; 2001.
- Chang CC, Lin CJ: LIBSVM: a library for support vector machines. 2001. [http://www.csie.ntu.edu.tw/~cjlin/libsvm]
- Osuna E, Freund R, Girosi F: Support Vector Machines: Training and Applications. Tech. rep., Cambridge, MA, USA; 1997.
- Kaur H, Raghava GPS: BetaTPred: prediction of β-turns in a protein using statistical algorithms. Bioinformatics 2002, 18(3):498–499. doi:10.1093/bioinformatics/18.3.498
- Matthews BW: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 1975, 405(2):442–451.
- Sonego P, Kocsor A, Pongor S: ROC analysis: applications to the classification of biological sequences and 3D structures. Brief Bioinform 2008, 9(3):198–209. doi:10.1093/bib/bbm064
- Fawcett T: An introduction to ROC analysis. Pattern Recogn Lett 2006, 27(8):861–874. doi:10.1016/j.patrec.2005.10.010
- Frishman D, Argos P: Seventy-five percent accuracy in protein secondary structure prediction. Proteins 1997, 27(3):329–335. doi:10.1002/(SICI)1097-0134(199703)27:3<329::AID-PROT1>3.0.CO;2-8
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.