Gram-positive and gram-negative subcellular localization using rotation forest and physicochemical-based features
- Abdollah Dehzangi†1, 2,
- Sohrab Sohrabi3,
- Rhys Heffernan3,
- Alok Sharma1, 4,
- James Lyons3,
- Kuldip Paliwal3 and
- Abdul Sattar1, 2
© Dehzangi et al.; licensee BioMed Central Ltd. 2015
Published: 23 February 2015
The functioning of a protein relies on its location in the cell. Therefore, predicting protein subcellular localization is an important step towards protein function prediction. Recent studies have shown that relying on Gene Ontology (GO) for feature extraction can improve the prediction performance. However, for newly sequenced proteins, GO information is not available. Therefore, for these cases, the prediction performance of GO-based methods degrades significantly.
In this study, we develop a method to effectively employ physicochemical and evolutionary-based information in the protein sequence. To do this, we propose a segmentation-based feature extraction method to explore potential discriminatory information based on physicochemical properties of the amino acids to tackle Gram-positive and Gram-negative subcellular localization. We explore our proposed feature extraction techniques using 10 attributes that have been experimentally selected among a wide range of physicochemical attributes. Finally, by applying the Rotation Forest classification technique to our extracted features, we improve Gram-positive and Gram-negative subcellular localization accuracies by up to 3.4% over previous studies that used GO for feature extraction.
By proposing a segmentation-based feature extraction method to explore potential discriminatory information based on physicochemical properties of the amino acids, and by using the Rotation Forest classification technique, we are able to significantly enhance the Gram-positive and Gram-negative subcellular localization prediction accuracies.
Bacterial proteins are considered to be among the most important proteins and play a wide range of both useful and harmful roles. Bacteria are prokaryotic microorganisms and can generally be divided into two groups, namely Gram-positive and Gram-negative. The main difference between these two groups is that Gram-positive bacteria have a thicker cell wall containing many layers (consisting of peptidoglycan and teichoic acids), while Gram-negative bacteria have a thinner cell wall containing a few layers (consisting only of peptidoglycan). This causes Gram-positive and Gram-negative bacteria to react differently to antibiotics. In fact, despite their thinner cell wall, Gram-negative bacteria are more resistant to antibiotics than Gram-positive bacteria because of the impenetrable lipid layer in their outer membrane. Bacteria, whether Gram-positive or Gram-negative, are important because they are the active elements in many useful biological interactions and, at the same time, the source of many diseases, which makes it crucially important to determine the functions of their proteins, especially for drug and vaccine design.
To function properly, a protein (including Gram-positive and Gram-negative bacterial proteins) needs to be in its appropriate subcellular location. Given a protein, determining where in the cell it functions is called protein subcellular localization, which is a difficult problem in computational biology and bioinformatics, especially because some proteins can function in more than one subcellular location, which turns it into a multi-label problem. This problem can be defined as a multi-class classification task in pattern recognition, where performance relies on the discriminatory information embedded in the extracted features as well as on the performance of the classification technique being used.
Since the introduction of the protein subcellular localization problem, a wide range of classification techniques have been used to tackle it [2, 5, 6]. Among the employed classifiers, the best results have been achieved using Support Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbor (KNN). However, recent studies have shifted their focus to enhancing protein subcellular localization through better feature extraction techniques rather than exploring different classification techniques.
The early studies on protein subcellular localization focused on sequence-based features. Later on, a wider range of features was extracted to tackle this problem, such as physicochemical-based, evolutionary-based, and structural-based features. However, the most significant enhancement for this task has been achieved by using Gene Ontology (GO) information for feature extraction. The term GO was coined to describe the properties of genes in organisms, and its database was established to represent the molecular function, biological process, and cellular components of proteins. Despite its importance, GO has two main drawbacks. First, extracting GO for proteins produces a large number of features (over 18000), which requires further feature selection and filtering to extract adequate features. Second, GO information for new proteins is unavailable, and many studies use homology-based approaches to extract GO for these proteins. Hence, GO needs further investigation before it can be used as a reliable source for feature extraction.
In this study, we propose two overlapped segmentation-based feature extraction techniques to explore the discriminatory information of physicochemical attributes of the amino acids. We investigate 117 different physicochemical attributes and select the 10 best attributes for this task. We apply our techniques to protein sequences transformed using the evolutionary information embedded in the Position Specific Scoring Matrix (PSSM) to provide a mixture of physicochemical-based and evolutionary-based information. Finally, by applying the Rotation Forest classifier, which to the best of our knowledge has not been explored previously for this task, we enhance the Gram-positive and Gram-negative subcellular localization prediction accuracies by up to 3.4% compared to previous studies that used GO for feature extraction. In this manner, we propose a new reliable method that explores the potential prediction ability of novel classification techniques as well as the discriminatory information embedded in physicochemical and evolutionary-based features for protein subcellular localization.
In this study, we use two data sets that have been widely used in the literature for Gram-positive and Gram-negative subcellular localization. For Gram-positive subcellular localization, we use the data set that was proposed in [11, 14, 15]. This data set consists of 519 different proteins belonging to 4 Gram-positive subcellular locations. Among these 519 proteins, 515 belong to one location while 4 belong to two locations. Hence, there are 523 (515 + 4 × 2) samples in this data set, which are divided into four locations as follows: Cell membrane (174), Cell wall (18), Cytoplasm (208), and Extracellular (123). This data set is publicly available at: http://www.csbio.sjtu.edu.cn/bioinf/Gpos-multi.
For Gram-negative subcellular localization, we use the data set that was introduced in [11, 14, 16]. This data set consists of 1392 different proteins belonging to 8 Gram-negative subcellular locations. Among these proteins, 1328 belong to one location and 64 to two locations. Therefore, there are 1456 (1328 + 64 × 2) total samples in this data set, which are divided into 8 locations as follows: Cell inner membrane (557), Cell outer membrane (124), Cytoplasm (410), Extracellular (133), Fimbrium (32), Flagellum (12), Nucleoid (8), and Periplasm (180). This data set is publicly available at: http://www.csbio.sjtu.edu.cn/bioinf/Gneg-multi/.
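The sample counts quoted for the two benchmarks can be cross-checked with a few lines of Python. This is only a consistency check on the numbers stated in the text; the dictionary and function names are illustrative and not part of the original study.

```python
# Per-location sample counts exactly as listed in the text.
GRAM_POSITIVE = {
    "Cell membrane": 174,
    "Cell wall": 18,
    "Cytoplasm": 208,
    "Extracellular": 123,
}
GRAM_NEGATIVE = {
    "Cell inner membrane": 557,
    "Cell outer membrane": 124,
    "Cytoplasm": 410,
    "Extracellular": 133,
    "Fimbrium": 32,
    "Flagellum": 12,
    "Nucleoid": 8,
    "Periplasm": 180,
}


def total_samples(single_location: int, two_locations: int) -> int:
    """A protein that belongs to two locations is counted once per location."""
    return single_location + 2 * two_locations


gpos_total = total_samples(515, 4)
gneg_total = total_samples(1328, 64)

# Both ways of counting should agree for each data set.
print(gpos_total, sum(GRAM_POSITIVE.values()))
print(gneg_total, sum(GRAM_NEGATIVE.values()))
```

The totals derived from the protein counts match the sums of the per-location counts for both data sets.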
Four feature groups are extracted in this study: two are physicochemical-based (overlapped segmented density and overlapped segmented autocorrelation) and two are evolutionary-based (semi-composition and auto-covariance). To extract the physicochemical-based feature groups, we first transform the protein sequence using evolutionary information and then extract physicochemical-based features from the transformed sequence. We study 10 physicochemical attributes for feature extraction. These 10 attributes are selected from a wide range of physicochemical attributes in the following manner. First, we extract 117 physicochemical attributes from [18, 19]. We then extract six feature groups based on each attribute using overlapped segmentation-based feature extraction techniques that have been explained in detail in previous work. In the next step, we apply six different classifiers to each feature group, namely Naive Bayes, KNN, SVM, Multi-Layer Perceptron (MLP), Multi-Class Adaptive Boosting (AdaBoost.M1), and Random Forest. Hence, we have 36 results (6 × 6) for a given attribute and 4212 results (36 × 117) for the whole set of attributes for each data set (Gram-positive and Gram-negative). We then select the 10 physicochemical attributes that individually attain the best results compared to the other attributes (comparing the maximum, minimum, and average of all 36 results [20, 21]). The experimental results for this step for both of our data sets are available upon request.
In this study, we aim to propose novel feature extraction techniques that explore the potential discriminatory information of an individual physicochemical attribute of the amino acids. We have previously investigated these techniques for protein fold and structural class prediction problems, and here we investigate how well they generalize in capturing local discriminatory information based on an individual physicochemical attribute of the amino acids [17, 20, 22]. We have also investigated combinations of features extracted from a wider range of physicochemical attributes of the amino acids for protein fold and structural class prediction problems by using a simplified segmentation-based feature extraction technique, and we will investigate these techniques for protein subcellular localization in our future work [23, 24].
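The selection step above can be sketched as follows. The text only says that the maximum, minimum, and average of the 36 results are compared, so the ranking rule used here (average first, then maximum, then minimum) is an assumption, and the function names are hypothetical.

```python
from statistics import mean

N_FEATURE_GROUPS = 6
N_CLASSIFIERS = 6
N_ATTRIBUTES = 117

RESULTS_PER_ATTRIBUTE = N_FEATURE_GROUPS * N_CLASSIFIERS  # 6 x 6
TOTAL_RESULTS = RESULTS_PER_ATTRIBUTE * N_ATTRIBUTES      # 36 x 117


def summarize(accuracies):
    """Return (average, maximum, minimum) of one attribute's accuracies."""
    return (mean(accuracies), max(accuracies), min(accuracies))


def select_top_attributes(results_by_attribute, k=10):
    """Rank attributes by their summary tuple (assumed rule) and keep the best k.

    results_by_attribute maps an attribute name to the list of accuracies
    obtained for it over all feature-group / classifier combinations.
    """
    ranked = sorted(
        results_by_attribute,
        key=lambda name: summarize(results_by_attribute[name]),
        reverse=True,
    )
    return ranked[:k]
```

For instance, calling `select_top_attributes` on a dictionary with 117 entries of 36 accuracies each would return the names of the 10 attributes to keep under this assumed ranking.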
The list of the physicochemical attributes and the number assigned to them.
Average number of surrounding residues
Retardation Factor (RF) chromatographic index
Mean Root Mean Square (RMS) fluctuational displacement
Solvent accessible reduction ratio
Average surrounding hydrophobicity
Hydrophobicity scale (contact energy derived from 3D data)
Hydrophilicity scale derived from (HPLC) peptide retention data
where P_ij is the substitution probability of the amino acid at location i with the j-th amino acid in the PSSM. We then replace the amino acid at the i-th location of the original protein sequence by the j-th amino acid to form the consensus sequence. We replace the original sequence with the consensus sequence and extract physicochemical-based features from this sequence. In this manner, we can benefit from evolutionary and physicochemical-based information simultaneously. In the following subsections, we first explain our proposed method to extract physicochemical-based features and then the methods employed to extract evolutionary-based features.
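A minimal sketch of the consensus-sequence step is given below. It assumes that the j-th amino acid chosen for position i is the one with the highest substitution probability in row i of the PSSM, and that the 20 columns follow the usual PSI-BLAST column order; both points are assumptions, and the function name is hypothetical.

```python
# Assumed column order of the PSSM (the order used in PSI-BLAST ASCII output).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def consensus_sequence(pssm):
    """Build a consensus sequence from an L x 20 PSSM.

    pssm is a list of rows, one per sequence position, each holding 20
    substitution probabilities (or scores). Position i is replaced by the
    amino acid whose column has the largest value in row i; ties resolve
    to the first such column.
    """
    residues = []
    for row in pssm:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each PSSM row must have 20 entries")
        best_column = max(range(len(row)), key=row.__getitem__)
        residues.append(AMINO_ACIDS[best_column])
    return "".join(residues)
```

The physicochemical features would then be computed on the returned string instead of the original sequence.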
To explore potential discriminatory information embedded in physicochemical properties of the amino acids, we extract overlapped segmented density and overlapped segmented autocorrelation feature groups.
Overlapped segmented density (OSD)
where R_i is the normalized attribute value of the i-th amino acid. However, a density computed over the whole sequence fails to provide adequate local information for a given attribute. In this study, we calculate the local density of the amino acids using a segmentation-based technique.
In this study, a distribution factor of 5% and an overlapping factor of 75% are selected based on the average length of the proteins in the explored benchmarks and on experiments conducted by the authors. The overlapping approach is proposed to provide more information about the distribution of the amino acids in the middle of a protein as seen from each side. Considering the small number of features it adds (only 10 overlapping features), this approach provides important overlapping information to tackle this problem. It also enables us to explore the impact of each attribute more comprehensively than previously explored methods [17, 27].
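The sketch below illustrates one way a segmented density of this kind could be computed. The exact formulas are not reproduced in this text, so the details are assumptions: attribute values are taken to be non-negative (normalized), segment boundaries are taken to be the points where the cumulative attribute sum reaches each 5% step of the total up to 75%, and the same scan is repeated from the right side. All function names are hypothetical.

```python
def global_density(values):
    """Mean attribute value over the whole sequence."""
    return sum(values) / len(values)


def cumulative_fractions(values, step_pct=5, limit_pct=75):
    """Fraction of residues consumed when the running attribute sum first
    reaches each step_pct% threshold of the total, up to limit_pct%."""
    total = sum(values)
    n = len(values)
    fractions = []
    running = 0.0
    consumed = 0
    for pct in range(step_pct, limit_pct + 1, step_pct):
        threshold = total * pct / 100
        while consumed < n and running < threshold:
            running += values[consumed]
            consumed += 1
        fractions.append(consumed / n)
    return fractions


def osd_features(values, step_pct=5, limit_pct=75):
    """Global density plus segmented distributions from both ends."""
    if not values:
        raise ValueError("values must not be empty")
    left = cumulative_fractions(values, step_pct, limit_pct)
    right = cumulative_fractions(values[::-1], step_pct, limit_pct)
    return [global_density(values)] + left + right
```

With the default 5% and 75% settings, this sketch produces 1 + 15 + 15 values per attribute, and the two scans overlap in the middle of the sequence.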
Overlapped segmented autocorrelation (OSA)
Here, each segment is defined by the number of amino acids whose physicochemical-based values sum to the corresponding segment threshold, and DF is the distance factor parameter, which is set to 10 as the most effective value for this parameter. Note that 70 (7 × DF) autocorrelation coefficients are computed in this manner by analyzing the protein sequence from the left side. This process is repeated to obtain another 70 (7 × DF) autocorrelation coefficients by analyzing the protein sequence from the right side. We also compute the global autocorrelation coefficients of the whole protein sequence (using DF = 10). Thus, we extract a total of 150 (7 × DF + 7 × DF + DF = 15 × DF) autocorrelation features in this manner. These two physicochemical-based feature groups are extracted to provide local and global discriminatory information based on density, distribution, and autocorrelation properties simultaneously [17, 25].
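The feature count described above (7 × DF from each side plus DF global coefficients) can be reproduced with the following sketch. The autocorrelation formula is not preserved in this text, so the code assumes the common form, the average of R_i × R_(i+d) over all valid positions for each lag d. It also assumes that the seven segments are prefixes whose cumulative attribute sum reaches 10%, 20%, ..., 70% of the total. The step size and all function names are assumptions.

```python
def prefix_lengths(values, step_pct=10, n_segments=7):
    """Number of residues needed for the running attribute sum to reach
    each successive step_pct% threshold of the total (assumed rule)."""
    total = sum(values)
    lengths = []
    running = 0.0
    consumed = 0
    for k in range(1, n_segments + 1):
        threshold = total * k * step_pct / 100
        while consumed < len(values) and running < threshold:
            running += values[consumed]
            consumed += 1
        lengths.append(consumed)
    return lengths


def autocorrelation(values, max_lag=10):
    """One coefficient per lag 1..max_lag; 0.0 when a segment is too short."""
    n = len(values)
    coefficients = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag
        if pairs <= 0:
            coefficients.append(0.0)
        else:
            total = sum(values[i] * values[i + lag] for i in range(pairs))
            coefficients.append(total / pairs)
    return coefficients


def osa_features(values, max_lag=10, step_pct=10, n_segments=7):
    """Segment-wise autocorrelation from both ends plus the global one."""
    features = []
    for direction in (values, values[::-1]):
        for length in prefix_lengths(direction, step_pct, n_segments):
            features.extend(autocorrelation(direction[:length], max_lag))
    features.extend(autocorrelation(values, max_lag))
    return features
```

With `max_lag=10` and seven segments per side, the returned vector has 70 + 70 + 10 = 150 entries, matching the count given in the text.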
We also extract two evolutionary-based feature groups, namely Semi-composition and Auto-covariance. These feature groups provide important evolutionary information extracted from PSSM to tackle protein subcellular localization .
Evolutionary-based auto covariance (PSSM-AC)
where P_ave,j is the average substitution score of the amino acid j in the PSSM. A distance factor (DF) of 10 is used as the most effective value for this parameter. Hence, there are 200 features (20 × DF) calculated for this feature group.
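As an illustration of how 20 × DF auto-covariance features arise, the sketch below assumes the commonly used form of PSSM auto-covariance: for each column j and each lag k, the average over positions of (P_i,j − P_ave,j)(P_(i+k),j − P_ave,j). The formula itself is not preserved in this text, so this form and the function name are assumptions.

```python
def pssm_auto_covariance(pssm, max_lag=10):
    """Auto-covariance features from an L x 20 PSSM.

    Returns one value per (column, lag) pair, ordered column by column,
    so a 20-column PSSM with max_lag=10 yields 200 features.
    """
    n_rows = len(pssm)
    n_cols = len(pssm[0])
    # Average substitution score of each amino acid column.
    col_means = [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]
    features = []
    for j in range(n_cols):
        for lag in range(1, max_lag + 1):
            pairs = n_rows - lag
            if pairs <= 0:
                # Sequence shorter than the lag: no pairs to average.
                features.append(0.0)
                continue
            total = sum(
                (pssm[i][j] - col_means[j]) * (pssm[i + lag][j] - col_means[j])
                for i in range(pairs)
            )
            features.append(total / pairs)
    return features
```

A column that alternates in sign from position to position gives a negative coefficient at lag 1 and a positive one at lag 2, which is the kind of positional dependency this feature group is meant to capture.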
Classification technique (Rotation Forest)
Rotation Forest is generally categorized as a meta classifier and is based on the Random Forest classifier, Bagging, and Principal Component Analysis (PCA). It was introduced to enhance the performance of the Random Forest classifier by increasing both the diversity and the individual prediction accuracy of its base learners (also called weak learners). The Rotation Forest works in the following manner. It builds independently trained decision trees to construct an ensemble of classifiers in a parallel scheme and then combines their predictions using majority voting. The Rotation Forest uses a rotated feature space, rather than random subsets of features (as used in the Random Forest classifier), to train each base learner. To do this, the feature set of size N is split randomly into K subsets (where K is the number of base learners in this classifier) and then PCA is applied separately to each subset to linearly transform the feature vector. Then, by combining all K transformed feature subsets, a new set of M features is built to train each base learner. Note that M is equal to N when none of the eigenvalues are zero and M < N when some of the eigenvalues are equal to zero [30, 32].
As mentioned earlier, the aim of the Rotation Forest classifier is to increase diversity within the ensemble beyond that of the Random Forest classifier by using the principal components. This is better than the Bagging and Random Forest classifiers, which use bootstrap sampling and random selection to encourage diversity [31, 32, 34]. The individual accuracy of the base learner is also considered in the Rotation Forest classifier. Unlike the Random Forest classifier, the Rotation Forest can be used with a wide range of classifiers as its base learner. Hence, it is easier to build different ensemble classifiers using the Rotation Forest classifier than using the Random Forest classifier. For this classifier, the individual accuracy is enhanced by using a more accurate base learner than the Random Forest, which uses a naive decision tree as its base learner. In this study, the C4.5 decision tree is chosen because of its sensitivity to the rotation of the features, as shown in the literature. In this experiment, the data mining toolkit WEKA is used for classification, K is set to 100 (as the most effective value for this parameter [31, 32]), and J48 (WEKA's own version of the C4.5 decision tree algorithm) is used as the base classifier.
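Two of the steps described for the Rotation Forest, the random split of the feature set into subsets and the majority vote over base-learner predictions, can be sketched in plain Python as shown below. This is not a Rotation Forest implementation: the PCA rotation of each subset and the training of the decision trees are deliberately left out, and the function names are hypothetical. The experiments in this study use WEKA rather than code of this kind.

```python
import random
from collections import Counter


def split_features(n_features, n_subsets, seed=0):
    """Randomly partition feature indices 0..n_features-1 into n_subsets
    disjoint groups of near-equal size. PCA would then be applied to each
    group separately (not shown here)."""
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_subsets] for i in range(n_subsets)]


def majority_vote(predictions):
    """Combine the class labels predicted by the base learners for one
    sample; ties go to the label that was seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Example: 12 features split into 4 groups, and a vote over five trees.
groups = split_features(12, 4)
label = majority_vote(["Cytoplasm", "Periplasm", "Cytoplasm", "Cytoplasm", "Periplasm"])
print(groups, label)
```

Because the partition is redrawn with a different seed for each base learner, each tree sees a differently rotated feature space, which is the source of diversity the section describes.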
Results and discussion
Results achieved for the feature vectors extracted for all 10 physicochemical-based attributes for Gram-positive data set (in percentage %) for all 4 subcellular locations ((1) Cell membrane, (2) Cell wall, (3) Cytoplasm, (4) Extracellular, which are numbered from one to four respectively).
Results achieved for the feature vectors extracted for all 10 physicochemical-based attributes for Gram-negative data set (in percentage %) for all 8 subcellular locations ((1) Cell inner membrane, (2) Cell outer membrane, (3) Cytoplasm, (4) Extracellular, (5) Fimbrium, (6) Flagellum, (7) Nucleoid, (8) Periplasm which are numbered from one to eight respectively).
As shown in Table 2, we achieve over 81.0% prediction accuracy for all the feature vectors extracted from the physicochemical attributes explored in this study. These results are better than the 80.3% prediction accuracy reported in the literature for this task. Achieving high results for all the physicochemical attributes emphasizes the effectiveness of our proposed feature extraction techniques for this task. In addition, we reach 83.6% prediction accuracy for attribute number 9 (Hydrophobicity scale (contact energy derived from 3D data)), which is better than all the other physicochemical attributes explored in this study and emphasizes the effectiveness of this attribute for protein subcellular localization. This improves Gram-positive subcellular localization by 3.3% over previously reported results found in the literature.
For the Gram-negative data set, we achieve over 75.0% prediction accuracy for all the physicochemical attributes explored in this study. These results are better than the best result previously reported for this data set (73.2%), which emphasizes the effectiveness of our proposed feature extraction methods. Similarly, among the explored physicochemical attributes, attribute number 9 (Hydrophobicity scale (contact energy derived from 3D data)) achieves the best result. We report 76.6% prediction accuracy for Gram-negative subcellular localization, which is 3.4% better than previously reported results for this data set. Note that the better prediction performance for Gram-positive is due to its simplicity compared to Gram-negative subcellular localization. For Gram-positive subcellular localization, there are just four locations and the distribution of samples across locations is more balanced. In contrast, there are eight subcellular locations for Gram-negative bacterial proteins and the number of samples per location is highly imbalanced (there are 557 and 410 samples in the Cell inner membrane and Cytoplasm locations, while there are only 12 and 8 samples in the Flagellum and Nucleoid locations). Therefore, the prediction performance for Gram-positive is better than that for Gram-negative, which is consistent with previously reported results for these two tasks. Achieving high results for both the Gram-negative and Gram-positive data sets shows the generality of our proposed methods and also a preference for the Hydrophobicity scale (contact energy derived from 3D data) attribute for Gram-positive and Gram-negative protein subcellular localization. Note that for the rest of our experiments we will use this attribute (Comb_9).
To investigate the statistical significance of the improvement achieved for the Gram-negative and Gram-positive subcellular localization prediction problems, we use a paired t-test. The probability value calculated for the paired t-test (p = 0.0047) supports the statistical significance of our reported results and the enhancement achieved in this study.
Impact of using Rotation Forest
To investigate the effectiveness of the Rotation Forest for protein subcellular localization, we apply the neural network used in a previous study (a back propagation ANN with a Radial Basis Function (RBF) activation function) to our extracted features and compare the results with those achieved here. We achieve 77.4% and 71.1% prediction accuracies, which are 6.2% and 5.5% lower than those obtained using the Rotation Forest for the Gram-positive and Gram-negative protein subcellular localization data sets, respectively. This shows the effectiveness of the Rotation Forest, which has not previously been explored for this task.
Investigating the importance of the explored feature groups in this study
The overall prediction accuracy achieved by applying the Rotation Forest to each feature group investigated in this study (in percentage).
PSSM_AAC + PSSM_AC
PSSM_AAC + PSSM_AC + OSD
PSSM_AAC + PSSM_AC + OSD + OSA
In this study, we have proposed a pattern recognition-based approach to Gram-positive and Gram-negative protein subcellular localization, consisting of the following steps. First, we investigated a wide range of physicochemical attributes using several classifiers and feature extraction techniques and selected the 10 attributes that attained the best results for protein subcellular localization. Second, using the evolutionary information embedded in the PSSM, we transformed the protein sequence and also extracted semi-composition and auto-covariance feature groups directly from the PSSM. Third, we extracted physicochemical-based feature groups by proposing overlapped segmented density and overlapped segmented autocorrelation feature groups, computed from the transformed protein sequence for all 10 physicochemical attributes mentioned earlier. Fourth, all four feature groups extracted here were combined into a feature vector that contains both evolutionary and physicochemical discriminatory information. Finally, by applying the Rotation Forest classifier to our extracted feature groups, we achieved 83.6% and 76.6% prediction accuracies for Gram-positive and Gram-negative subcellular localization, which are 3.3% and 3.4% better than previously reported results for these two tasks, respectively [8, 35].
These enhancements emphasize the effectiveness of our proposed feature extraction techniques, the discriminatory information embedded in physicochemical-based features, and the Rotation Forest classifier, which has not previously been explored for this task. In our future work, we aim to explore a wider range of feature extraction techniques to reduce the number of features as well as to enhance protein subcellular localization.
NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Center of Excellence Program.
Publication of this article was funded by Griffith University and National ICT Australia (NICTA).
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 4, 2015: Selected articles from the 9th IAPR conference on Pattern Recognition in Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S4.
1. Xiao X, Wu ZC, Chou KC: A multi-label classifier for predicting the subcellular localization of gram-negative bacterial proteins with both single and multiple sites. PLoS One. 2011, 6 (6): e20592. 10.1371/journal.pone.0020592.
2. Chou KC, Elrod DW: Protein subcellular location prediction. Protein Engineering. 1999, 12 (2): 107-118. 10.1093/protein/12.2.107.
3. Gardy JL, Brinkman FSL: Methods for predicting bacterial protein subcellular localization. Nature Reviews Microbiology. 2006, 4 (1): 741-751.
4. Nakai K, Kanehisa M: Expert system for predicting protein localization sites in gram-negative bacteria. Proteins: Structure, Function, and Bioinformatics. 1991, 11 (2): 95-110. 10.1002/prot.340110203.
5. Shen HB, Chou KC: Virus-ploc: A fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. Biopolymers. 2007, 85 (3): 233-240. 10.1002/bip.20640.
6. Mohabatkar H, Beigi MM, Esmaeili A: Prediction of gaba a receptor proteins using the concept of chou's pseudo-amino acid composition and support vector machine. Journal of Theoretical Biology. 2011, 281 (1): 18-23. 10.1016/j.jtbi.2011.04.017.
7. Mohabatkar H, Beigi MM, Abdolahi K, Mohsenzadeh S: Prediction of allergenic proteins by means of the concept of chou's pseudo amino acid composition and a machine learning approach. Medicinal Chemistry. 2013, 9 (1): 133-137. 10.2174/157340613804488341.
8. Huang C, Yuan J: Using radial basis function on the general form of chou's pseudo amino acid composition and pssm to predict subcellular locations of proteins with both single and multiple sites. Biosystems. 2013, 113 (1): 50-57. 10.1016/j.biosystems.2013.04.005.
9. Shen HB, Chou KC: Gneg-mploc: a top-down strategy to enhance the quality of predicting subcellular localization of gram-negative bacterial proteins. Journal of Theoretical Biology. 2010, 264 (2): 326-333. 10.1016/j.jtbi.2010.01.018.
10. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. Nature Genetics. 2000, 25 (1): 25-29. 10.1038/75556.
11. Chou KC, Shen HB: Cell-ploc 2.0: An improved package of web-servers for predicting subcellular localization of proteins in various organisms. Engineering. 2010, 2 (10).
12. Hu Y, Li T, Sun J, Tang S, Xiong W, Li D, Chen G, Cong P: Predicting gram-positive bacterial protein subcellular localization based on localization motifs. Journal of Theoretical Biology. 2012, 308: 135-140.
13. Mei S: Predicting plant protein subcellular multi-localization by chou's pseaac formulation based multi-label homolog knowledge transfer learning. Journal of Theoretical Biology. 2012, 310: 80-87.
14. Chou KC, Shen HB: Plant-mploc: a top-down strategy to augment the power for predicting plant protein subcellular localization. PLoS One. 2010, 5 (6): e11335. 10.1371/journal.pone.0011335.
15. Shen HB, Chou KC: Gpos-ploc: an ensemble classifier for predicting subcellular localization of gram-positive bacterial proteins. Protein Engineering Design and Selection. 2007, 20 (1): 39-46. 10.1093/protein/gzl053.
16. Chou KC, Shen HB: Large-scale predictions of gram-negative bacterial protein subcellular locations. Journal of Proteome Research. 2006, 5 (12): 3420-3428. 10.1021/pr060404b.
17. Dehzangi A, Paliwal KK, Sharma A, Dehzangi O, Sattar A: A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem. IEEE Transaction on Computational Biology and Bioinformatics (TCBB). 2013, 10 (3): 564-575.
18. Mathura VS, Kolippakkam D: Apdbase: Amino acid physico-chemical properties database. Bioinformation. 2005, 12 (1): 2-4.
19. Gromiha MM: A statistical model for predicting protein folding rates from amino acid sequence with structural class information. Journal of Chemical Information and Modeling. 2005, 45 (2): 494-501. 10.1021/ci049757q.
20. Dehzangi A, Sattar A: Protein fold recognition using segmentation-based feature extraction model. Proceedings of the 5th Asian Conference on Intelligent Information and Database Systems. ACIIDS05 Springer. 2013, 345-354.
21. Sharma A, Paliwal KK, Dehzangi A, Lyons J, Imoto S, Miyano S: A strategy to select suitable physicochemical attributes of amino acids for protein fold recognition. BMC Bioinformatics. 2013, 14 (233): 11-
22. Dehzangi A, Paliwal KK, Sharma A, Lyons J, Sattar A: Protein fold recognition using an overlapping segmentation approach and a mixture of feature extraction models. AI 2013: Advances in Artificial Intelligence, Springer. 2013, 32-43.
23. Dehzangi A, Sharma A, Lyons J, Paliwal KK, Sattar A: A mixture of physicochemical and evolutionary-based feature extraction approaches for protein fold recognition. International Journal of Data Mining and Bioinformatics. 2015.
24. Dehzangi A, Sattar A: Ensemble of diversely trained support vector machines for protein fold recognition. Proceedings of the 5th Asian Conference on Intelligent Information and Database Systems. ACIIDS05, Springer. 2013, 335-344.
25. Dehzangi A, Paliwal KK, Lyons J, Sharma A, Sattar A: Enhancing protein fold prediction accuracy using evolutionary and structural features. Proceeding of the Eighth IAPR International Conference on Pattern Recognition in Bioinformatics. PRIB. 2013, 196-207.
26. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Research. 1997, 17: 3389-3402.
27. Dehzangi A, Phon-Amnuaisuk S: Fold prediction problem: The application of new physical and physicochemical-based features. Protein and Peptide Letters. 2011, 18 (2): 174-185. 10.2174/092986611794475101.
28. Chou KC: Some remarks on protein attribute prediction and pseudo amino acid composition. Journal of Theoretical Biology. 2011, 273 (1): 236-247. 10.1016/j.jtbi.2010.12.024.
29. Breiman L: Bagging predictors. Machine Learning. 1996, 24: 123-140.
30. Rodriguez JJ, Kuncheva LI, Alonso CJ: Rotation forest: A new classifier ensemble method. Pattern Analysis and Machine Intelligence, IEEE Transactions. 2006, 28 (10): 1619-1630.
31. Dehzangi A, Karamizadeh S: Solving protein fold prediction problem using fusion of heterogeneous classifiers. INFORMATION, An International Interdisciplinary Journal. 2011, 14 (11): 3611-3622.
32. Dehzangi A, Phon-Amnuaisuk S, Manafi M, Safa S: Using rotation forest for protein fold prediction problem: An empirical study. Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. 2010, 217-227.
33. Witten I, Frank E: Data Mining: Practical Machine Learning Tools and Techniques. 2005, Morgan Kaufmann, San Francisco, 2nd edition.
34. Dehzangi A, Phon-Amnuaisuk S, Ng KH, Mohandesi E: Protein fold prediction problem using ensemble of classifiers. Proceedings of the 16th International Conference on Neural Information Processing: Part II. ICONIP '09. 2009, 503-511.
35. Pacharawongsakda E, Theeramunkong T: Predict subcellular locations of singleplex and multiplex proteins by semi-supervised learning and dimension-reducing general mode of chou's pseaac. IEEE Transactions on NanoBioscience. 2013, 12 (4): 311-320.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.