Improving protein fold recognition using the amalgamation of evolutionary-based and structural based information
© Paliwal et al.; licensee BioMed Central Ltd. 2014
Published: 8 December 2014
Deciphering the three-dimensional structure of a protein from its sequence is a challenging task in biological science. Protein fold recognition and protein secondary structure prediction are intermediate steps in identifying the three-dimensional structure of a protein. For protein fold recognition, evolutionary-based information of amino acid sequences from the position specific scoring matrix (PSSM) has recently been applied with improved results. Separately, the SPINE-X predictor has been developed and applied for protein secondary structure prediction. Several reported methods for protein fold recognition have only limited accuracy. In this paper, we develop a strategy that combines evolutionary-based information (from PSSM) with secondary structure predicted by SPINE-X to improve protein fold recognition. The strategy is based on finding the probabilities of amino acid pairs (AAP). The proposed method has been tested on several protein benchmark datasets, and an improvement of up to 8.9% in recognition accuracy has been achieved. We have achieved, for the first time, prediction accuracies of over 90% and 75% for sequence similarity values below 40% and 25%, respectively. Specifically, we obtain 90.6% and 77.0% prediction accuracies for the Extended Ding and Dubchak and the Taguchi and Gromiha benchmark protein fold recognition datasets, respectively, both of which are widely used in the literature.
Recognition of protein folds is an essential step in identifying the tertiary structure of proteins. The identification of protein tertiary structures helps in analysing and understanding function, heterogeneity, and protein-protein and protein-peptide interactions. Protein fold recognition comprises two major steps: extracting useful and informative features from protein sequences, and then identifying the fold of a novel protein sequence using an appropriate classifier. A range of techniques has been developed for both the feature extraction and the classification step.
For feature extraction, several techniques based on structural, physicochemical and evolutionary information are available. Dubchak et al. have shown the importance of syntactical and physicochemical features in protein fold recognition using amino acid composition (AAC) in conjunction with five physicochemical attributes of amino acids: hydrophobicity (H), polarity (P), van der Waals volume (V), predicted secondary structure based on normalized frequency of α-helix (X) and polarizability (Z). Their 125-dimensional feature set is composed of 20 AAC features together with 105 physicochemical features. These features have been extensively used in protein fold recognition [2–13]. Other attributes have subsequently been used to extract features, including the size of the amino acid side chain, solvent accessibility, flexibility, bulkiness, and first and second order entropy. As the selection of these attributes was done arbitrarily, a more systematic approach to attribute selection has been proposed [18, 19]. Further, a profile-profile alignment method was proposed by Ohlson et al. to improve protein fold recognition. Syntactical-based features have been proposed using amino acid occurrence, and using amino acid residues along with residue pairs. Pairwise frequencies have been proposed in two forms: PF1 for amino acids separated by one residue and PF2 for adjacent amino acid residues, where PF1 and PF2 are 400-dimensional each. These features have further been concatenated, resulting in 800 features. In some cases the dimensionality of the features can be large, which increases the computational complexity of the classifier used. In such cases, feature selection methods can be used as a preprocessing step to reduce the number of features [25–27]. To represent protein sequences in an effective manner, pseudo-amino acid composition (PseAAC) features have been proposed.
The autocross-covariance (ACC) transformation has also been proposed, and the work in [30–32] has used protein sequence autocorrelation. Additional features have been derived from physicochemical properties. Bi-gram features using evolutionary-based information (PSSM) have also shown effective recognition results. For more feature extraction or selection methods, please see [33–40].
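To make the pairwise frequency features mentioned above concrete, the following is a minimal sketch of how 400-dimensional PF1 and PF2 vectors could be computed from a raw sequence. The function name `pairwise_frequencies`, the normalization by the number of counted pairs, and the toy sequence are assumptions made for illustration; they are not taken from the cited work.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Map every ordered residue pair (e.g. "AC") to one of 400 positions.
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def pairwise_frequencies(sequence, gap):
    """Return a 400-dimensional vector of residue-pair frequencies.

    gap=0 counts adjacent residues (PF2 in the text);
    gap=1 counts residues separated by one residue (PF1 in the text).
    Normalizing by the number of counted pairs is an assumption.
    """
    counts = [0.0] * len(PAIR_INDEX)
    step = gap + 1
    total = 0
    for i in range(len(sequence) - step):
        pair = sequence[i] + sequence[i + step]
        if pair in PAIR_INDEX:  # skip non-standard residues such as X
            counts[PAIR_INDEX[pair]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


seq = "MKVLAAGIVGLK"  # hypothetical toy sequence
pf1 = pairwise_frequencies(seq, gap=1)
pf2 = pairwise_frequencies(seq, gap=0)
combined = pf1 + pf2  # concatenation gives 800 features
print(len(pf1), len(pf2), len(combined))
```

Concatenating the two vectors doubles the dimensionality, which is the kind of growth that motivates the feature selection step discussed in the text.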
For the classification step, a variety of algorithms have been adopted, such as linear discriminant analysis, Bayesian classifiers, the Bayesian decision rule, k-nearest neighbor, hidden Markov models [43, 44], artificial neural networks [45, 46], support vector machines (SVM) [5, 22, 23] and ensemble classifiers [6, 24, 47, 48]. Among the various protein fold recognition classifiers reported in the literature, SVM (or SVM-based) classifiers demonstrate excellent performance [23, 31, 32].
Since feature extraction is crucial in protein structure recognition, our approach focuses on developing an appropriate feature extraction method. Four distinct types of features can be extracted from a protein sequence: sequence-based, physicochemical-based, structural-based and evolutionary-based features. In this work, we investigate how evolutionary-based and structural-based features perform, as done by other authors [24, 29].
The evolutionary information is extracted from PSSM matrices (a publicly available tool for retrieving the PSSM matrix is PSI-BLAST). The PSSM matrix estimates the relative probability of amino acid substitution. If a protein sequence is of length L, then its PSSM matrix has L rows and 20 columns (since there are at most 20 distinct amino acids in a protein sequence). The structural information is extracted from the predicted secondary structure of the proteins using predictors such as SPINE-X and PSIPRED [50, 51]. Protein secondary structures are classified into three states, namely alpha-helix, beta-strand and coil. Since SPINE-X outperformed PSIPRED for protein secondary structure prediction, we use SPINE-X in this study. For a protein sequence of length L, SPINE-X provides a matrix of probabilities of size L × 3 (where 3 refers to the number of secondary structure states). This matrix contains useful information for secondary structure class prediction.
In this paper, we combine the information from the PSSM matrix and the secondary structure prediction matrix (SSPM) from SPINE-X to extract relevant and useful knowledge for protein fold recognition. The motivation for combining these two categories is that they produce high performance in fold recognition and secondary structure prediction, respectively. Each therefore captures information relevant to its own task, and if the two are used together the performance of fold recognition can be improved. Considering this, we developed the k-amino acid pair (k-AAP) feature extraction method based on PSSM and SSPM, and show its usefulness on several protein benchmark datasets. Compared to the best results reported in the literature, we have enhanced the recognition accuracy by 8.9% and 4.7% for sequence similarity values of less than 25% and 40%, respectively. The next section covers materials and methods.
Materials and Methods
The k-AAP feature extraction method
In this work, we used k = 1, 2, 3 and 4. We observed that using higher values of k does not improve the performance further. This is because, as k increases, the correlation between the two amino acids decreases, so the pair no longer provides relevant information for fold recognition. By using the representation of the above feature vector for all values of k, we can denote the feature vector F as $F = [F_1^T, F_2^T, F_3^T, F_4^T]^T$, where superscript T is the transpose of the vector. The dimensionality of this feature vector is 2116. From the feature vector computation, we note that all PSSM and SSPM probability information has been utilized. From a biological perspective, proteins with the same fold also share similar general secondary structure information. In other words, proteins with the same fold often have highly conserved amino acid sub-sequences, which can translate to a specific secondary structure. In these conserved regions, k-AAP probability values effectively characterize the amino acid sub-sequences. For each sub-sequence conserved in a fold and/or related to a particular residue, all proteins with that fold will contain amino acid pairs characterizing that conserved region and/or set of residues. This information can therefore filter out folds that do not share the same amino acid sub-sequences. Intuitively, therefore, F contains more useful information for fold recognition. This has been demonstrated in the experimentation part of the paper.
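The dimensionality of 2116 is consistent with 4 values of k times a 23 × 23 pair matrix, where 23 is the 20 PSSM columns plus the 3 SSPM columns. The sketch below illustrates one way such a vector could be assembled. The pair formula used here (a sum over positions i of the product of the probabilities at positions i and i + k) is an assumption in the style of bi-gram PSSM features, not a reproduction of the paper's equations, and the function name `k_aap_features` and the uniform toy inputs are hypothetical.

```python
def k_aap_features(pssm, sspm, ks=(1, 2, 3, 4)):
    """Concatenate one pair-probability matrix per separation k.

    pssm: L rows of 20 probabilities; sspm: L rows of 3 probabilities.
    Assumed form: pair[m][n] = sum_i rows[i][m] * rows[i + k][n].
    """
    if len(pssm) != len(sspm):
        raise ValueError("PSSM and SSPM must describe the same sequence")
    # Join the two matrices column-wise into an L x 23 matrix.
    rows = [list(p) + list(s) for p, s in zip(pssm, sspm)]
    width = len(rows[0]) if rows else 0
    features = []
    for k in ks:
        pair = [[0.0] * width for _ in range(width)]
        # zip(rows, rows[k:]) pairs position i with position i + k.
        for first, second in zip(rows, rows[k:]):
            for m, a in enumerate(first):
                for n, b in enumerate(second):
                    pair[m][n] += a * b
        for row in pair:
            features.extend(row)  # flatten the 23 x 23 block
    return features


# Hypothetical toy input: a length-30 sequence with uniform probabilities.
length = 30
toy_pssm = [[1.0 / 20] * 20 for _ in range(length)]
toy_sspm = [[1.0 / 3] * 3 for _ in range(length)]
features = k_aap_features(toy_pssm, toy_sspm)
print(len(features))
```

Because the vector length depends only on the number of columns and the number of k values, proteins of different lengths map to feature vectors of the same size, which is what a fixed-input classifier such as an SVM needs.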
For classification of the feature vectors, we used the support vector machine (SVM) classifier, as it has shown promising results in protein fold recognition. We employed the SVM implementation from libsvm with the RBF kernel. The parameters of the SVM are optimized using grid search.
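As an illustration of the grid search step, the sketch below scans an exponentially spaced grid of C and gamma values and keeps the best-scoring pair. The exponent ranges follow the grid commonly suggested in the LIBSVM practical guide; the paper does not state which grid was used, so they are an assumption here. The `score` callback is hypothetical and stands in for a cross-validated accuracy computed with a real SVM.

```python
import math


def grid_search(score, c_exponents=range(-5, 16, 2), g_exponents=range(-15, 4, 2)):
    """Return (best_score, C, gamma) over a grid of powers of two.

    score(C, gamma) is a caller-supplied function, for example the
    cross-validated accuracy of an RBF-kernel SVM (assumed, not shown).
    """
    best = None
    for c_exp in c_exponents:
        for g_exp in g_exponents:
            c, g = 2.0 ** c_exp, 2.0 ** g_exp
            value = score(c, g)
            if best is None or value > best[0]:
                best = (value, c, g)
    return best


def toy_score(c, g):
    # Hypothetical smooth surface peaking at C = 2^3 and gamma = 2^-5.
    return -((math.log2(c) - 3) ** 2 + (math.log2(g) + 5) ** 2)


best_value, best_c, best_g = grid_search(toy_score)
print(best_c, best_g)
```

In practice the callback would train and evaluate the classifier for each pair, so the cost of the search grows with the product of the two grid sizes.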
Support Vector Machine as a classifier
SVM is used as the classifier in this experiment. It is one of the leading classification techniques and has also been applied to regression problems. The goal of SVM is to find the maximum margin hyperplane (MMH) in order to reduce the misclassification error. Data in SVM are transformed through a kernel function K (e.g. linear or RBF) [54, 55].
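The decision function takes the form

$$\hat{y} = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b\right),$$

where $\hat{y}$ denotes the predicted class label of $\mathbf{x}$; $K(\cdot,\cdot)$ is the kernel function; the number of support vectors $\mathbf{x}_i$ is defined by n; the bias is defined by b; and the adjustable weights are defined by $\alpha_i$. In this work, LibSVM has been used to conduct training and testing. The kernel function utilized is the radial basis function (RBF), which is defined by $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-g\|\mathbf{x}_i - \mathbf{x}_j\|^2)$, where g is the gamma parameter. The gamma parameter and the complexity parameter (C) are optimized using LibSVM. The data are not normalized before being passed to the SVM classifier.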
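The following is a minimal sketch of a kernel-based decision rule of the kind described in this section: a weighted sum of RBF kernel values over the support vectors, plus a bias, followed by a sign. It covers only the two-class case, and the support vectors, weights, bias and gamma below are made-up toy values rather than a trained model; a real experiment would obtain them from LibSVM.

```python
import math


def rbf_kernel(x, z, g):
    """RBF kernel exp(-g * ||x - z||^2), with g the gamma parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * squared_distance)


def svm_predict(x, support_vectors, weights, b, g):
    """Two-class decision: sign of the kernel-weighted sum plus bias b."""
    score = b + sum(
        w * rbf_kernel(sv, x, g) for sv, w in zip(support_vectors, weights)
    )
    return 1 if score >= 0 else -1


# Hypothetical toy model: one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
weights = [1.0, -1.0]  # signed weights, positive for class +1
bias = 0.0
gamma = 0.5

print(svm_predict([0.1, 0.1], support_vectors, weights, bias, gamma))
print(svm_predict([1.9, 1.9], support_vectors, weights, bias, gamma))
```

Larger values of gamma make each support vector's influence more local, which is why gamma and C are usually tuned together.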
We have used three protein sequence datasets in this study: 1) Ding and Dubchak (DD), 2) Taguchi and Gromiha (TG) and 3) Extended DD (EDD). The DD-dataset uses protein sequences from 27 Structural Classification of Proteins (SCOP) folds, comprehensively covering the α, β, α/β and α + β structural classes. The training set contains 311 protein sequences, with no two proteins having more than 35% sequence identity for alignments longer than 80 residues. The test set comprises 383 protein sequences with less than 40% sequence identity. The training and test sets were merged for analysis.
The TG-dataset has 30 folds of globular proteins from SCOP. It has a total of 1612 protein sequences with sequence similarity of no more than 25%. The dataset has been described in detail by Taguchi and Gromiha.
The EDD-dataset comprises the 27 folds that are also present in the DD-dataset. This dataset has 3418 proteins with sequence similarity of less than 40%. In this study, we have used the approach described by Dong et al. to extract the EDD-dataset from SCOP.
We perform n-fold cross-validation with n = 5, 6, 7, 8, 9 and 10 for analysis and observation. The next section describes the experimental part of the work.
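For readers unfamiliar with the procedure, a minimal sketch of n-fold cross-validation index splitting is given below. The shuffling seed, the round-robin fold assignment and the absence of stratification by fold class are assumptions of this sketch; the paper does not specify how the folds were formed. The sample count 694 corresponds to the merged DD training and test sets (311 + 383).

```python
import random


def n_fold_splits(num_samples, n, seed=0):
    """Yield (train_indices, test_indices) for each of the n folds."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one of each other.
    folds = [indices[i::n] for i in range(n)]
    for held_out in range(n):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


for n in (5, 6, 7, 8, 9, 10):
    sizes = [len(test) for _, test in n_fold_splits(694, n)]
    print(n, sizes)
```

Each sample is held out exactly once per value of n, so the reported accuracy for a given n can be computed over all samples.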
Results and Discussion
Table 1. Recognition accuracy by n-fold cross validation procedure for different feature extraction techniques for SVM classification for the DD-dataset.
k-AAP (this paper)
Table 2. Recognition accuracy by n-fold cross validation procedure for different feature extraction techniques for SVM classification for the TG-dataset.
k-AAP (this paper)
Table 3. Recognition accuracy by n-fold cross validation procedure for different feature extraction techniques for SVM classification for the EDD-dataset.
k-AAP (this paper)
Table 1 shows that the highest accuracy obtained by k-AAP is 76.1% on the DD-dataset, which is about 1.9% higher than the best of the other techniques. On the TG-dataset (Table 2), k-AAP achieved 77.0% accuracy, which is around 10.6% better than the results of Dong et al. and 8.9% better than the best results achieved for this benchmark. It is important to highlight that this enhancement is achieved using 2116 (4 × 23 × 23) features, compared to the 4000 features used in the study of Dong et al. For the EDD-dataset (Table 3), k-AAP achieved 90.6% accuracy, which is around 4.7% higher than the other techniques. This enhancement in prediction accuracy is obtained at low sequence similarity (less than 25% for the TG-dataset and less than 40% for the EDD-dataset). This shows that the extracted features retain their discriminatory information when the sequence similarity is reduced. It can therefore be concluded that k-AAP performs well in recognizing protein folds.
To analyse the statistical significance of the prediction accuracy obtained for protein fold recognition, we carried out a paired t-test on the results obtained from our experiments and the highest accuracies reported in the literature. The associated probability value from the paired t-test confirms that the improvement reported in this work, compared to the results found in the literature, is significant.
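A paired t-test compares two sets of matched measurements through their per-pair differences. The sketch below computes the t statistic and degrees of freedom using only the standard library. The input values are made-up numbers for illustration and are not results from this paper; the function name `paired_t_statistic` is likewise hypothetical. Converting the statistic to a probability value requires the t distribution, for example via a statistics table or `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("differences are constant; t is undefined")
    t = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up paired values, purely illustrative.
method_a = [2.0, 4.0, 6.0]
method_b = [1.0, 2.0, 3.0]
t_value, dof = paired_t_statistic(method_a, method_b)
print(round(t_value, 4), dof)
```

A large positive t with few degrees of freedom can still correspond to a modest probability value, so the number of paired observations matters as much as the size of the differences.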
Recognition accuracy (in percentage) for 10-fold cross validation procedure for PSSM and SSPM using SVM classifier on the DD, TG and EDD datasets.
Using PSSM only
Using SSPM only
Using PSSM+SSPM (i.e., k-AAP)
Recognition accuracy (in percentage) for 10-fold cross validation procedure using different classifiers on k-AAP.
SVM (SMO with linear polynomial of degree P = 1)
SVM (SMO with P = 3)
Random Forest (10 base learners)
Adaboost.M1 (10 base learners)
kNN (for k = 1)
Furthermore, sensitivity and specificity values for all the features used here have been computed for the three datasets. Figure 2 depicts this analysis on the DD-dataset, Figure 3 on the TG-dataset and Figure 4 on the EDD-dataset. It can be observed from Figures 2, 3 and 4 that although specificity values are high for all the feature sets, sensitivity values are variable. This indicates that the number of false positives is very small in comparison with the number of true negatives, so true negatives dominate the results. This usually happens for difficult problems. It can be seen from the results that incorporating evolutionary-based features increases the sensitivity. This highlights the impact of evolutionary-based features in improving protein fold recognition accuracy. For all the datasets, sensitivity is highest for the k-AAP method.
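The observation that true negatives dominate can be seen from how sensitivity and specificity are computed for one fold against all others. The sketch below is a minimal one-vs-rest computation; the labels are made-up toy data, and treating each fold in turn as the positive class is an assumption about how the per-dataset values were obtained.

```python
def sensitivity_specificity(true_labels, predicted_labels, fold):
    """One-vs-rest sensitivity and specificity for a single fold."""
    tp = fn = fp = tn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == fold and guess == fold:
            tp += 1
        elif truth == fold:
            fn += 1
        elif guess == fold:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Made-up toy labels for three folds a, b and c.
truth = ["a", "a", "b", "b", "c", "c", "c", "c"]
guess = ["a", "b", "b", "b", "c", "c", "a", "c"]
print(sensitivity_specificity(truth, guess, "a"))
```

With many folds, most samples belong to some other fold, so the true-negative count is large and specificity stays high even when sensitivity for a fold is poor.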
In this paper, we have proposed the k-amino acid pair feature extraction method. This method utilizes PSSM linear probabilities and SSPM probabilities. The accuracy of fold recognition of the proposed method was consistently better than that obtained from other similar methods.
To the best of our knowledge, this is the first time that prediction accuracies of over 90% and 75% have been achieved for sequence similarity rates of less than 40% and 25%, respectively. For the EDD and TG benchmark datasets, we attained 90.6% and 77.0% prediction accuracies, which are 4.7% and 8.9% better, respectively, than the best results reported in the literature. We also obtained 76.1% for the DD benchmark, which is 1.9% better than other methods.
The publication costs for this article were funded by Griffith University, Australia.
This article has been published as part of BMC Bioinformatics Volume 15 Supplement 16, 2014: Thirteenth International Conference on Bioinformatics (InCoB2014): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S16.
1. Dubchak I, Muchnik I, Kim SK: Protein folding class predictor for SCOP: approach based on global descriptors. Proceedings, 5th International Conference on Intelligent Systems for Molecular Biology. 1997, 104-107.
2. Chinnasamy A, Sung WK, Mittal A: Protein structure and fold prediction using tree-augmented naive Bayesian classifier. J Bioinf CompBio. 2005, 3 (4): 803-819. 10.1142/S0219720005001302.
3. Krishnaraj Y, Reddy CK: Boosting methods for protein fold recognition: an empirical comparison. IEEE Int Conf on Bioinfor and Biomed. 2008, 393-396.
4. Valavanis IK, Spyrou GM, Nikita KS: A comparative study of multi-classification methods for protein fold recognition. Int J Comput Intelligence in Bioinformatics and Systems Biology. 2010, 1 (3): 332-346. 10.1504/IJCIBSB.2010.031394.
5. Ding C, Dubchak I: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics. 2001, 17 (4): 349-358. 10.1093/bioinformatics/17.4.349.
6. Dehzangi A, Amnuaisuk SP, Ng KH, Mohandesi E: Protein fold prediction problem using ensemble of classifiers. Proceedings of the 16th International Conference on Neural Information Processing. 2009, 503-511.
7. Kecman V, Yang T: Protein fold recognition with adaptive local hyper plane Algorithm. Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '09 IEEE Symposium. 2009, 75-78.
8. Kavousi K, Moshiri B, Sadeghi M, Araabi BN, Moosavi-Movahedi AA: A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM. Computational Biology and Chemistry. 2011, 35 (1): 1-9. 10.1016/j.compbiolchem.2010.12.001.
9. Dehzangi A, Amnuaisuk SP: Fold prediction problem: the application of new physical and physicochemical-based features. Protein and Peptide Letters. 2011, 18: 174-185. 10.2174/092986611794475101.
10. Chmielnicki W, Stapor K: A hybrid discriminative-generative approach to protein fold recognition. Neurocomputing. 2012, 75: 194-198. 10.1016/j.neucom.2011.04.033.
11. Dehzangi A, Paliwal KK, Sharma A, Dehzangi O, Sattar A: A Combination of Feature Extraction Methods with an Ensemble of Different Classifiers for Protein Structural Class Prediction Problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2013a, 10 (3): 564-575.
12. Dehzangi A, Paliwal KK, Lyons J, Sharma A, Sattar A: Exploring potential discriminatory information embedded in pssm to enhance protein structural class prediction accuracy. Proceeding of the Pattern Recognition in Bioinformatics PRIB. 2013b, 7986: 208-219. 10.1007/978-3-642-39159-0_19.
13. Dehzangi A, Paliwal KK, Lyons J, Sharma A, Sattar A: Enhancing protein fold prediction accuracy using evolutionary and structural features. Proceeding of the Pattern Recognition in Bioinformatics. 2013c, 7986: 196-207. 10.1007/978-3-642-39159-0_18.
14. Zhang H, Zhang T, Gao J, Ruan J, Shen S, Kurgan LA: Determination of protein folding kinetic types using sequence and predicted secondary structure and solvent accessibility. Amino Acids. 2010, 1: 1-13.
15. Najmanovich R, Kuttner J, Sobolev V, Edelman M: Side-chain flexibility in proteins upon ligand binding. Proteins: Structure, Function, and Bioinformatics. 2000, 39 (3): 261-268. 10.1002/(SICI)1097-0134(20000515)39:3<261::AID-PROT90>3.0.CO;2-4.
16. Huang JT, Tian J: Amino acid sequence predicts folding rate for middle-size two-state proteins. Proteins: Structure, Function, and Bioinformatics. 2006, 63 (3): 551-554. 10.1002/prot.20911.
17. Zhang TL, Ding YS, Chou KC: Prediction protein structural classes with pseudo amino acid composition: approximate entropy and hydrophobicity pattern. Journal of Theoretical Biology. 2008, 250: 186-193. 10.1016/j.jtbi.2007.09.014.
18. Sharma A, Paliwal KK, Dehzangi A, Lyons J, Imoto S, Miyano S: A Strategy to Select Suitable Physicochemical Attributes of Amino Acids for Protein Fold Recognition. BMC Bioinformatics. 2013a, 14: 233. 10.1186/1471-2105-14-233.
19. Sharma A, Lyons J, Dehzangi A, Paliwal KK: A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition. Journal of Theoretical Biology. 2013b, 320 (7): 41-46.
20. Ohlson T, Wallner B, Elofsson A: Profile-profile methods provide improved fold-recognition: a study of different profile-profile alignment methods. Proteins: Structure, Function, and Bioinformatics. 2004, 57: 188-197. 10.1002/prot.20184.
21. Taguchi Yh, Gromiha MM: Application of amino acid occurrence for discriminating different folding types of globular proteins. BMC Bioinformatics. 2007, 8: 404. 10.1186/1471-2105-8-404.
22. Shamim MTA, Anwaruddin M, Nagarajaram HA: Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinformatics. 2007, 23 (24): 3320-3327. 10.1093/bioinformatics/btm527.
23. Ghanty P, Pal NR: Prediction of protein folds: extraction of new features, dimensionality reduction, and fusion of heterogeneous classifiers. IEEE Trans On Nano Bioscience. 2009, 8: 100-110.
24. Yang T, Kecman V, Cao L, Zhang C, Huang JZ: Margin-based ensemble classifier for protein fold recognition. Expert Systems with Applications. 2011, 38: 12348-12355. 10.1016/j.eswa.2011.04.014.
25. Sharma A, Paliwal KK: A gradient linear discriminant analysis for small sample sized problem. Neural Processing Letters. 2008, 27 (1): 17-24. 10.1007/s11063-007-9056-7.
26. Sharma A, Koh CH, Imoto S, Miyano S: Strategy of finding optimal number of features on gene expression data. Electronics Letters. 2011, 47 (8): 480-482. 10.1049/el.2011.0526.
27. Sharma A, Imoto S, Miyano S, Sharma V: Null space based feature selection method for gene expression data. International Journal of Machine Learning and Cybernetics. 2012a, 3 (4): 269-276. 10.1007/s13042-011-0061-9.
28. Chou KC: Prediction of protein cellular attributes using pseudo amino acid composition. Proteins. 2001, 43: 246-255. 10.1002/prot.1035.
29. Dong Q, Zhou S, Guan J: A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation. Bioinformatics. 2009, 25 (20): 2655-2662. 10.1093/bioinformatics/btp500.
30. Shen HB, Chou KC: Ensemble classifier for protein fold pattern recognition. Bioinformatics. 2006, 22: 1717-1722. 10.1093/bioinformatics/btl170.
31. Kurgan LA, Zhang T, Zhang H, Shen S, Ruan J: Secondary structure-based assignment of the protein structural classes. Amino Acids. 2008, 35: 551-564. 10.1007/s00726-008-0080-3.
32. Liu T, Geng X, Zheng X, Li R, Wang J: Accurate Prediction of Protein Structural Class Using AutoCovariance Transformation of PSI-BLAST Profiles. Amino Acids. 2012, 42: 2243-2249. 10.1007/s00726-011-0964-5.
33. Paliwal KK, Sharma A, Lyons J, Dehzangi A: A tri-gram based feature extraction technique using linear probabilities of position specific scoring matrix for protein fold recognition. IEEE Transactions on Nanobioscience. 2014, 13 (1): 44-50.
34. Sharma A, Paliwal KK: Fast Principal Component Analysis using Fixed-Point Algorithm. Pattern Recognition Letters. 2007, 28 (10): 1151-1155. 10.1016/j.patrec.2007.01.012.
35. Sharma A, Paliwal KK: Cancer Classification by Gradient LDA Technique Using Microarray Gene Expression Data. Data & Knowledge Engineering. 2008b, 66 (2): 338-347. 10.1016/j.datak.2008.04.004.
36. Sharma A, Imoto S, Miyano S: A between-class overlapping filter-based method for transcriptome data analysis. Journal of Bioinformatics and Computational Biology. 2012c, 10 (5): 1250010-1-1250010-20.
37. Sharma A, Paliwal KK, Imoto S, Miyano S: Principal component analysis using QR decomposition. International Journal of Machine Learning and Cybernetics. 2013c, 4 (6): 679-683. 10.1007/s13042-012-0131-7.
38. Sharma A, Paliwal KK, Imoto S, Miyano S: A feature selection method using improved regularized linear discriminant analysis. Machine Vision and Applications. 2014, 25 (3): 775-786. 10.1007/s00138-013-0577-y.
39. Sharma A, Dehzangi A, Lyons J, Imoto S, Miyano S, Nakai K, Patil A: Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function. PLOS One. 2014, 9 (2): e89890. 10.1371/journal.pone.0089890.
40. Sharma A, Imoto S, Miyano S: A top-r feature selection algorithm for microarray gene expression data. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012b, 9 (3): 754-764.
41. Klein P: Prediction of protein structural class by discriminant analysis. Biochim Biophys Acta. 1986, 874: 205-215.
42. Wang ZZ, Yuan Z: How good is prediction of protein-structural class by the component-coupled method?. Proteins: Structure, Function, and Bioinformatics. 2000, 38: 165-175. 10.1002/(SICI)1097-0134(20000201)38:2<165::AID-PROT5>3.0.CO;2-V.
43. Bouchaffra D, Tan J: Protein fold recognition using a structural Hidden Markov Model. Proceedings of the 18th International Conference on Pattern Recognition. 2006, 186-189.
44. Deschavanne P, Tuffery P: Enhanced protein fold recognition using a structural alphabet. Proteins: Structure, Function, and Bioinformatics. 2009, 76: 129-137. 10.1002/prot.22324.
45. Chen K, Zhang X, Yang MQ, Yang JY: Ensemble of probabilistic neural networks for protein fold recognition. Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE). 2007, 66-70.
46. Ying Y, Huang K, Campbell C: Enhanced protein fold recognition through a novel data integration approach. BMC Bioinformatics. 2009, 10 (1): 267. 10.1186/1471-2105-10-267.
47. Dehzangi A, Amnuaisuk SP, Dehzangi O: Enhancing protein fold prediction accuracy by using ensemble of different classifiers. Australian Journal of Intelligent Information Processing Systems. 2010, 26 (4): 32-40.
48. Dehzangi A, Karamizadeh S: Solving protein fold prediction problem using fusion of heterogeneous classifiers. Information an International Interdisciplinary Journal. 2011, 14 (11): 3611-3622.
49. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Research. 1997, 17: 3389-3402.
50. Faraggi E, Zhang T, Yang Y, Kurgan L, Zhou Y: SPINE X: improving protein secondary structure prediction by multi-step learning coupled with prediction of solvent accessible surface area and backbone torsion angles. Journal of Computational Chemistry. 2012, 30 (3): 259-267.
51. McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics. 2000, 16 (4): 404-5. 10.1093/bioinformatics/16.4.404.
52. Chang CC, Lin CJ: LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011, 2 (3): 1-27.
53. Vapnik VN: The nature of statistical learning theory. 1995, New York: Springer-Verlag, 314.
54. Bishop CM: Pattern recognition and machine learning. 2006, New York: Springer Science, 738.
55. Lyons J, Biswas N, Sharma A, Dehzangi A, Paliwal KK: Protein fold recognition by alignment of amino acid residues using kernelized dynamic time warping. Journal of Theoretical Biology. 2014, 354: 137-145.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.