- Research article
- Open Access
Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs
© Rashid et al; licensee BioMed Central Ltd. 2007
- Received: 15 May 2007
- Accepted: 13 September 2007
- Published: 13 September 2007
In the past, a number of methods have been developed for predicting the subcellular location of eukaryotic, prokaryotic (Gram-negative and Gram-positive bacteria) and human proteins, but no method has been developed for mycobacterial proteins, which may represent a repertoire of potent immunogens of this dreaded pathogen. In this study, an attempt has been made to develop a method for predicting the subcellular location of mycobacterial proteins.
The models were trained and tested on 852 mycobacterial proteins and evaluated using a five-fold cross-validation technique. First, an SVM (Support Vector Machine) model was developed using amino acid composition; it achieved an overall accuracy of 82.51% with an average accuracy (mean of class-wise accuracy) of 68.47%. In order to utilize evolutionary information, an SVM model was developed using PSSM (Position-Specific Scoring Matrix) profiles obtained from PSI-BLAST (Position-Specific Iterated BLAST); it achieved an overall accuracy of 86.62% with an average accuracy of 73.71%. In addition, HMM (Hidden Markov Model) and MEME/MAST (Multiple EM for Motif Elicitation/Motif Alignment and Search Tool) models, and hybrid models that combined two or more models, were also developed. We achieved a maximum overall accuracy of 86.8% with an average accuracy of 89.00% using a combination of the PSSM-based SVM model and MEME/MAST. The performance of our method was compared with that of existing methods developed for predicting subcellular locations of Gram-positive bacterial proteins.
A highly accurate method has been developed for predicting the subcellular location of mycobacterial proteins. The method also predicts a very important class of proteins, namely membrane-attached proteins. It will be useful in annotating newly sequenced or hypothetical mycobacterial proteins. Based on the above study, a freely accessible web server, TBpred (http://www.imtech.res.in/raghava/tbpred/), has been developed.
- Support Vector Machine
- Subcellular Location
- Hidden Markov Model
- Amino Acid Composition
- Support Vector Machine Model
According to the GOLD (Genomes OnLine Database) database, as of 12 December 2006 the genomes of nine mycobacterial species had been sequenced and published, creating a pool of about 45055 kb of genomic data. The coming years will see much more, as genome-sequencing projects have about 19 mycobacterial species in the pipeline. Moreover, functions of 48% of the predicted 3995 proteins of Mycobacterium tuberculosis H37Rv are yet to be assigned. Therefore, a robust and reliable computer algorithm for functional annotation of mycobacterial proteins is the need of the hour. This group of organisms is well known for its pathogenicity. Since Bacille Calmette-Guérin (BCG), developed in 1921, no promising new vaccine against tuberculosis has become available. Furthermore, new pharmaceutical targets have yet to be identified to combat the multi-drug resistant strains of mycobacteria. One of the key features of Gene Ontology (GO) is cellular localization, which gives important information about a protein [3, 4]. It is therefore important to develop a method for predicting the subcellular localization of proteins of a pathogenic organism like mycobacterium.
In the last few years, several subcellular localization prediction systems have been developed using various features of a protein, such as amino acid composition, pseudo amino acid composition, dipeptide composition and physico-chemical properties [5–9]. Recently, a web server, 'PseAA', has been developed for computing pseudo amino acid composition, an important descriptor for protein sequences. Multiple alignments in the form of PSSM profiles have also been used to extract compositional information for developing subcellular localization methods [11, 12]. In these methods, a protein sequence is first represented by a fixed-length pattern, and models are then developed using machine learning techniques such as Support Vector Machine (SVM), Artificial Neural Network (ANN) and K-nearest neighbor (KNN) [13–15]. Broadly, the existing methods of subcellular localization have been developed for i) eukaryotic proteins, including TSSub, LOCSVMPSI, ESLpred, Euk-PLoc and BaCelLo [11, 12, 15–17], and ii) prokaryotic proteins, mainly bacterial proteins, including PSORTb, PSLpred, CELLO, LOCtree, P-classifier, Gpos-PLoc and GNBSL [18–26]. Recently, it has been observed that an organism-specific method performs better than general methods for that organism [13, 27–29]. Accordingly, methods have been developed for predicting the subcellular location of human proteins [13, 27, 29]. One of the challenges in subcellular localization is to predict the location of proteins that have multiple locations [29, 30]. Other subcellular location predictors have been developed very recently for a wide variety of organism types, such as plants, bacteria and viruses [31–33]. In addition, attempts have been made to annotate the Mycobacterium tuberculosis genome using experimental and predicted information [34, 35].
To the best of the authors' knowledge, no method has been developed for predicting the subcellular localization of proteins of mycobacteria, whose cell wall composition differs from that of Gram-negative and Gram-positive bacteria. In this study, we describe models developed for predicting four subcellular locations of mycobacterial proteins, namely cytoplasmic, integral membrane, secretory and membrane-attached proteins [36, 37]. A systematic attempt has been made to develop highly accurate SVM-based models using various features of proteins, such as amino acid, dipeptide and PSSM composition [38, 39]. In addition, models have been developed using Hidden Markov Model (HMM) and MEME/MAST for predicting the subcellular location of mycobacterial proteins [40–43]. We also compared the performance of our method with that of other existing methods on the dataset used in the current study.
Performance of BLAST
Prediction of subcellular localization of proteins using BLAST
The performance of various SVM models
Feature                   Cytoplasmic     Integral membrane   Secretory       Membrane-attached
                          ACC ± sd        ACC ± sd            ACC ± sd        ACC ± sd
Amino Acid Composition    88.82 ± 5.4     86.07 ± 7.5         44.00 ± 42.2    55.00 ± 19.4
Dipeptide Composition     89.41 ± 7.8     81.09 ± 7.5         50.00 ± 36.8    50.00 ± 17.4
PSSM Composition          94.71 ± 4.8     87.81 ± 6.1         44.00 ± 42.2    68.33 ± 28
The performance of HMM based model
Sensitivity (percent of correct hits)
The comparison of performance of hybrid model and MEME/MAST model
As shown in Table 2 and Table 4 (values in parentheses), the SVM models performed well on cytoplasmic and integral membrane proteins, whereas the MEME/MAST motif models performed well on secretory and membrane-attached proteins. There was thus a need to combine these models in order to develop a highly accurate approach. We therefore developed a hybrid model in which a protein is predicted using both SVM and MEME/MAST motifs, with preference given to the MEME/MAST motifs. In the hybrid model, a protein sequence is first searched against all the motifs; if any motif has an E-value lower than the cut-off value, the location of that motif is assigned as the location of the protein. If more than one motif is found in the protein, the location of the motif with the minimum E-value is assigned. If the protein does not have any motif, the PSSM-based SVM models are used to predict its subcellular location. For the detailed scheme, see Table S7 in Additional File 1. As shown in Table 4, we achieved the best performance at an E-value cut-off of 10, with an overall accuracy of 86.8%. Although the overall accuracy was not much higher than that of the PSSM-based SVM model, the average accuracy increased by about 15 percentage points (from 73.71% to 89.00%). This means that performance was high for all classes, rather than only for cytoplasmic and integral membrane proteins.
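The decision rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the TBpred implementation: the function name `hybrid_predict`, the `(location, e_value)` hit format and the tie behaviour (the first hit wins) are assumptions made for the example, and parsing of real MAST output is left out.

```python
from typing import List, Tuple

# Hypothetical representation of a motif hit: (location of the motif's class, E-value).
MotifHit = Tuple[str, float]


def hybrid_predict(motif_hits: List[MotifHit], svm_location: str,
                   e_value_cutoff: float = 10.0) -> str:
    """Prefer the best motif hit below the cut-off; otherwise fall back to the SVM call."""
    # Keep only motif hits whose E-value is lower than the cut-off.
    passing = [hit for hit in motif_hits if hit[1] < e_value_cutoff]
    if passing:
        # Several motifs found: take the location of the hit with the minimum E-value.
        best_location, _ = min(passing, key=lambda hit: hit[1])
        return best_location
    # No usable motif: use the PSSM-based SVM prediction.
    return svm_location


if __name__ == "__main__":
    hits = [("secretory", 1e-5), ("cytoplasmic", 0.5)]
    print(hybrid_predict(hits, svm_location="integral membrane"))
    print(hybrid_predict([], svm_location="cytoplasmic"))
```

The default cut-off of 10 mirrors the best-performing E-value reported in the text; any other threshold can be passed explicitly.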
Comparison with existing methods
The performances of existing methods on dataset used in this study
Integral Membrane
Membrane-attached
Web server description
The various SVM modules developed in the present study were implemented in a web server, TBpred, for predicting the subcellular localization of mycobacterial proteins. Users can select amino acid composition, dipeptide composition or PSSM-based SVM models, or a hybrid model, for prediction. The common gateway interface (CGI) script for TBpred was written in PERL 5.03. The server is installed on a Sun Server (420E) under a UNIX (Solaris 7) environment. TBpred is freely available at http://www.imtech.res.in/raghava/tbpred/.
Several methods have been developed for predicting the subcellular location of eukaryotic, prokaryotic (Gram-negative bacteria) and human proteins, but no method is available for mycobacterial proteins. There was thus a need to develop a dedicated method for predicting the subcellular localization of mycobacterial proteins. There are two reasons for developing a subcellular localization method specifically for mycobacterial proteins: i) organism-specific subcellular localization methods perform better than generalized methods [13, 27–29]; ii) Mycobacterium sp. is different from other organisms (it has a complex cell wall and its virulence factors are distinct from those of other pathogens). We therefore made a systematic attempt to develop a method for predicting the subcellular localization of mycobacterial proteins using state-of-the-art techniques. First, standard SVM models were developed using amino acid and dipeptide composition. The performance of these standard models was excellent for cytoplasmic and integral membrane proteins, but they failed to predict secretory and membrane-attached proteins (Table 2). The average accuracy improved from 68.47% to 73.71% when PSSM composition was used instead of amino acid composition. Despite this overall improvement, the accuracy of prediction remained low for secretory proteins, although accuracy increased for membrane-attached proteins. The failure of these models for secretory and membrane-attached proteins may be due to two reasons: (1) the small number of proteins in these locations used for training the model; (2) their amino acid composition is significantly different.
In order to overcome these limitations, we developed HMM-based models for predicting subcellular location. The performance of the HMM-based model was reasonable for secretory and membrane-attached proteins, but poor for the other two classes (Table 3). It seems that secretory and membrane-attached proteins have signals. We also combined the HMM model with the PSSM-based SVM model, but performance did not improve (data not shown). We also developed a motif-based method using MEME/MAST, where MEME is used to discover motifs and MAST is used to search for these motifs in a protein database. As shown in Table 4 (values in parentheses), the motif-based model successfully predicted secretory proteins, which means that secretory proteins have signals that are detected by MEME/MAST. The motif-based method also predicted membrane-attached proteins with reasonable accuracy, but it failed to predict the other two classes, particularly cytoplasmic proteins. This is because cytoplasmic proteins are very diverse and so do not have any specific motifs. Membrane proteins maintain certain types of secondary structure, so there may be a few motifs in these proteins. We therefore conclude that a single approach is not sufficient for subcellular localization prediction. Most of the pre-existing methods were based either on composition or on signals/motifs, so their performance was not high for all locations. It is important to combine the two approaches in order to predict all subcellular locations with high accuracy. The question arose of how to combine the two approaches so as to use the strengths of both. In motif-based approaches, the probability of a correct prediction depends on the E-value. Thus, we first searched for motifs in a protein using MAST; if the protein had a motif, we assigned the motif's location as the protein's location. If the protein had no motif, we predicted its location using the PSSM-based SVM model. The average accuracy increased by about 15 percentage points over the PSSM-based SVM model (from 73.71% to 89.00%), with a minimum accuracy of 85.3% for any single location.
We also compared our method with existing methods, although a one-to-one comparison was not possible because the locations were not the same. The performance of our method was better than that of existing methods on our dataset. Our method also predicts a very important class of proteins called membrane-attached proteins.
A new subcellular class of mycobacterial proteins, named "membrane-attached by lipid anchor", has been introduced for the first time. This class of proteins may play a role in enhancing the immune response of the host by acting as surface antigens. Thus, the search by experimental researchers for potential vaccine or drug targets for this immensely important bacterial pathogen will be greatly aided by the prediction algorithm developed in this study. Moreover, the comparison of the prediction efficiency of TBpred with that of existing methods developed for Gram-positive bacteria supported our earlier assumption that an organism-specific classifier performs better than a generalised one.
The Data Set
Statistics of distributions of proteins among different subcellular locations
1. Probably external side of the cell wall
2. Integral membrane protein
5. Membrane associated
6. Soluble or peripheral membrane protein
7. Attached to the membrane by a lipid anchor
8. Probable peripheral membrane protein
9. Type-I membrane protein
10. Surface associated
11. Membrane bound
12. Membrane protein
13. Partially secreted
Number of proteins remaining in various locations, after removing redundant proteins, at cut-off 40%, 60% and 90% using program CD-HIT
Sequences remaining after removal of similar sequences
CD-HIT cut-off (% identity)
Five-fold cross validation
Ideally, a newly developed method should be evaluated using the jack-knife method (leave-one-out cross-validation) [44, 45]. In a jack-knife test, each protein in turn is used for testing while the remaining proteins are used for training, so the process is repeated N times for N proteins. In practice, however, a limited cross-validation technique (such as five-fold or seven-fold) is commonly used instead of jack-knife [46–48]. In this study, we evaluated all models using five-fold cross-validation, in which the dataset is randomly divided into five sets, each containing an approximately equal number of proteins. Four sets are used for training and the remaining set for testing; this process is repeated five times so that each set is used once for testing. Finally, the performance is averaged over the five test sets.
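The splitting procedure can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the script used in the study: the function name `five_fold_splits`, the fixed random seed and the placeholder protein identifiers are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def five_fold_splits(items: Sequence[str], n_folds: int = 5,
                     seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return (train, test) pairs so that every item is used for testing exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # random division of the dataset
    # Deal the shuffled items into n_folds sets of near-equal size.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    splits = []
    for test_index in range(n_folds):
        test = folds[test_index]
        train = [item for i, fold in enumerate(folds) if i != test_index for item in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    proteins = [f"P{i}" for i in range(852)]  # placeholder IDs for the 852 proteins
    for train, test in five_fold_splits(proteins):
        print(len(train), len(test))
```

Because 852 is not divisible by five, the sets differ in size by at most one protein (171 or 170 per test set).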
Support Vector Machine Models
In this study, the Support Vector Machine was implemented using SVMlight, which is widely used for developing methods in the field of bioinformatics [38, 45–51]. We used the SVMlight binary classifier with a 1-vs-r (one-versus-rest) approach to develop models for predicting multiple locations. In this approach, an SVM model was built for each class by treating proteins of that class as positive and proteins of all the other classes as negative.
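The one-versus-rest labelling can be sketched as follows. The snippet does not call SVMlight; it only shows how the binary training labels might be built for each class and how four binary scores could be turned into a single prediction. Choosing the class with the highest score is an assumption here, and the function names are hypothetical.

```python
from typing import Dict, List

LOCATIONS = ["cytoplasmic", "integral membrane", "secretory", "membrane-attached"]


def one_vs_rest_labels(true_locations: List[str], positive_class: str) -> List[int]:
    """Label proteins of the chosen class +1 and proteins of every other class -1."""
    return [1 if location == positive_class else -1 for location in true_locations]


def predict_location(scores: Dict[str, float]) -> str:
    """Pick the class whose binary SVM returned the highest score (assumed rule)."""
    return max(scores, key=lambda location: scores[location])


if __name__ == "__main__":
    dataset = ["cytoplasmic", "secretory", "cytoplasmic", "membrane-attached"]
    for location in LOCATIONS:
        print(location, one_vs_rest_labels(dataset, location))
    print(predict_location({"cytoplasmic": 0.3, "integral membrane": -1.2,
                            "secretory": 1.1, "membrane-attached": -0.4}))
```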
Amino Acid and Dipeptide Composition
The percent composition of each amino acid was calculated using the standard formula described in the past. These compositions are represented by a vector of dimension 20. Similarly, the dipeptide composition of a protein was calculated and represented by a vector of dimension 400 [49, 50].
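Both feature vectors can be computed with a few lines of Python. This is a sketch under assumptions: the residue ordering, the function names and the decision to ignore non-standard residue symbols (such as X) are choices made for the example and are not stated in the text.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in an arbitrary fixed order


def amino_acid_composition(sequence: str) -> List[float]:
    """Percent composition of each of the 20 amino acids (vector of dimension 20)."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored (assumption)
    return [100.0 * c / total if total else 0.0 for c in counts]


def dipeptide_composition(sequence: str) -> List[float]:
    """Percent composition of each of the 400 possible dipeptides (vector of dimension 400)."""
    index = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}
    counts = [0] * len(index)
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in index:
            counts[index[pair]] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    seq = "MKKLLAAC"  # toy sequence, not a real mycobacterial protein
    print(len(amino_acid_composition(seq)), len(dipeptide_composition(seq)))
```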
Composition of Position-Specific Scoring Matrix
The PSSM profile for each protein was generated using PSI-BLAST by searching the protein against the NR database obtained from NCBI. PSI-BLAST was run with a cut-off value of 0.001 and three iterations. The PSSM scores were normalized to values between 0 and 1, and the position-specific composition of each amino acid was then calculated. In this way, we obtained the composition of amino acids with evolutionary information in the form of 400 values.
An HMM turns a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences. HMM analysis complements standard pairwise comparison methods for large-scale sequence analysis. HMM profiles were generated using the HMMER software, version 2.3.2, developed by Sean Eddy at Washington University.
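One way to obtain 400 values from a PSSM is sketched below. The text does not state the normalization function or the exact summation, so two assumptions are made and should be read as such: scores are scaled with the logistic function 1/(1 + e^(-x)), and for each of the 20 residue types the scaled rows at positions occupied by that residue are summed and divided by the sequence length. The function name and the toy matrix are hypothetical.

```python
import math
from typing import List

# Column order of the 20 amino acids in PSI-BLAST ASCII PSSM output.
PSSM_ORDER = "ARNDCQEGHILKMFPSTWYV"


def pssm_composition(sequence: str, pssm: List[List[float]]) -> List[float]:
    """Collapse an L x 20 PSSM into a 20 x 20 = 400-value composition vector."""
    totals = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        if residue not in PSSM_ORDER:
            continue  # skip non-standard residues (assumption)
        r = PSSM_ORDER.index(residue)
        for c, score in enumerate(row):
            # Assumed normalization: logistic scaling of the raw score into (0, 1).
            totals[r][c] += 1.0 / (1.0 + math.exp(-score))
    length = len(sequence)
    return [value / length if length else 0.0 for row in totals for value in row]


if __name__ == "__main__":
    toy_sequence = "AR"
    toy_pssm = [[0.0] * 20, [1.0] * 20]  # made-up scores, two positions
    print(len(pssm_composition(toy_sequence, toy_pssm)))
```

The result has a fixed length of 400 regardless of protein length, which is what makes it usable as SVM input.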
Multiple Em for Motif Elicitation/Motif alignment and Search Tool (MEME/MAST)
A motif is a pattern of nucleotides or amino acids that appears in a DNA or protein family. MEME/MAST consists of two programs: one allows discovery of motifs shared by closely related sequences (MEME), and the other facilitates database searches for sequences containing these motifs (MAST). Motifs in related protein sequences occur not merely by chance but because the proteins share some biological functions. These motifs might be the active sites of related enzymes. In the present study, MEME version 3.0.14 was used. We conducted our study for each subcellular localization class independently, keeping in mind that the proteins belonging to a subcellular localization class might share some subsequences and thus some biological functions. The motifs discovered by MEME in a subset of samples were searched by MAST within the sequences of another subset of the same class (considered the positive database) and also within the samples of the rest of the classes (considered the negative database). Hits from samples within the class and outside the class were used to evaluate the efficacy of the MEME/MAST classification system. The expectation value (E-value) cut-off was also taken into account during MAST analysis. If the hits for a protein sample came from both within the class and outside the class, the hit with the lower E-value was preferred.
We combined the output of MEME/MAST with the output of the SVM module. First, a comprehensive list was generated that contained the hits, along with their corresponding E-values, from all MEME/MAST models (cytoplasmic, integral membrane, secreted and membrane-attached) and the SVM prediction for each protein in the dataset. The MEME/MAST decision (if any) is given priority over the SVM prediction in the final assignment of a class to a particular protein sample. If more than one MEME/MAST model generated hits for a sample, the sample was assigned to the class of the model generating the hit with the lowest E-value. If the lowest E-value was shared by more than one model (a rare occurrence), the final decision was taken by consensus among the MEME/MAST and SVM models. If MAST produced no hit at the given E-value, the SVM model was used to predict the subcellular location of the protein.
Here x can be any subcellular location (cytoplasmic, integral membrane, secretory or membrane-attached), exp(x) is the number of sequences observed in location x, p(x) is the number of correctly predicted sequences of location x, n(x) is the number of correctly predicted sequences not of location x, u(x) is the number of under-predicted sequences and o(x) is the number of over-predicted sequences.
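The quantities defined above are the ones conventionally used to compute class-wise accuracy and the Matthews correlation coefficient (MCC). The Python sketch below assumes those standard definitions, ACC(x) = p(x)/exp(x) × 100 and MCC(x) = [p(x)n(x) − u(x)o(x)] / sqrt([p(x)+u(x)][p(x)+o(x)][n(x)+u(x)][n(x)+o(x)]); the function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def class_accuracy(p: int, exp: int) -> float:
    """Percent of the sequences observed in location x that were predicted correctly."""
    return 100.0 * p / exp


def class_mcc(p: int, n: int, u: int, o: int) -> float:
    """Matthews correlation coefficient for location x (standard definition, assumed)."""
    denominator = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    return (p * n - u * o) / denominator if denominator else 0.0


if __name__ == "__main__":
    # Made-up counts: 50 proteins in location x, 40 found, 10 missed, none over-predicted.
    print(class_accuracy(p=40, exp=50))
    print(round(class_mcc(p=40, n=50, u=10, o=0), 4))
```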
Overall and average accuracy
In this study, we computed both overall and average accuracy. The overall accuracy is the percentage of correctly predicted proteins irrespective of class. The average accuracy is the mean of the accuracies of the four classes. Both types of accuracy have their advantages and disadvantages.
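The difference between the two measures is easiest to see on an imbalanced example. The sketch below uses made-up counts for two classes (they are not the class sizes of this study), and the function name is hypothetical.

```python
from typing import Dict, Tuple


def overall_and_average_accuracy(counts: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """counts maps each class to (correctly predicted, total observed)."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    overall = 100.0 * correct / total  # every protein weighs the same
    average = sum(100.0 * c / t for c, t in counts.values()) / len(counts)  # every class weighs the same
    return overall, average


if __name__ == "__main__":
    # Hypothetical data: a large, well-predicted class and a small, poorly predicted one.
    overall, average = overall_and_average_accuracy({"large class": (90, 100),
                                                     "small class": (5, 10)})
    print(round(overall, 2), round(average, 2))
```

Overall accuracy is dominated by the largest classes, while average accuracy exposes weak performance on small classes. As a consistency check on the reported figures, the mean of the four class-wise accuracies listed for amino acid composition (88.82, 86.07, 44.00 and 55.00) is 68.47, which matches the average accuracy reported for that model.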
Reliability Index (RI)
The RI assigned to each sequence is based on the difference between the highest and the second-highest scores of the various 1-vs-r SVMs in the multi-class classification. RI is defined as:
The authors are thankful to the Council of Scientific and Industrial Research (CSIR) and the Department of Biotechnology, Government of India, for financial assistance. This report has IMTECH communication number 02/2007.
- Genomes OnLine Database [http://www.genomesonline.org/]
- Camus JC, Pryor MJ, Medigue C, Cole ST: Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology 2002, 148: 2967–2973.
- Alberts B, Bray D, Lewis J, Raff M, Roberts K, Watson JD: Molecular Biology of the Cell. 3rd edition. Garland Publishing, New York; 1994:1255–1272.
- Lodish H, Baltimore D, Berk A, Zipursky SL, Matsudaira P, Darnell J: Molecular Cell Biology. 3rd edition. Scientific American Books, New York; 1995:739–777.
- Chou KC: Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins 2001, 43: 246–255. 10.1002/prot.1035
- Chou KC: Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 2005, 21: 10–19. 10.1093/bioinformatics/bth466
- Wang M, Yang J, Liu GP, Xu ZJ, Chou KC: Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition. Protein Eng Des Sel 2004, 17: 509–516. 10.1093/protein/gzh061
- Hua S, Sun Z: Support Vector Machine approach for protein subcellular localization prediction. Bioinformatics 2001, 17: 721–728. 10.1093/bioinformatics/17.8.721
- Reinhardt A, Hubbard T: Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Research 1998, 26: 2230–2236. 10.1093/nar/26.9.2230
- PseAA: Pseudo Amino Acid Composition Computation. [http://chou.med.harvard.edu/bioinf/PseAA/]
- Guo J, Lin Y: TSSub: eukaryotic protein subcellular localization by extracting features from profiles. Bioinformatics 2006, 22: 1784–5. 10.1093/bioinformatics/btl180
- Xie D, Li A, Wang M, Fan Z, Feng H: LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST. Nucleic Acids Research 2005, 33: W105-W110. 10.1093/nar/gki359
- Chou KC, Shen HB: Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization. Biochem Biophys Res Commun 2006, 347: 150–157. 10.1016/j.bbrc.2006.06.059
- Cai CZ, Han LY, Ji ZL, Chen X, Chen YZ: SVM-Prot: Web-Based Support Vector Machine Software for Functional Classification of a Protein from Its Primary Sequence. Nucleic Acids Research 2003, 31: 3692–3697. 10.1093/nar/gkg600
- Bhasin M, Raghava GP: ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Research 2004, 32: W414-W419. 10.1093/nar/gkh350
- Shen HB, Yang J, Chou KC: Euk-PLoc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction. Amino Acids 2007, 33: 57–67. 10.1007/s00726-006-0478-8
- Pierleoni A, Martelli PL, Fariselli P, Casadio R: BaCelLo: a balanced subcellular localization predictor. Bioinformatics 2006, 22: 408–16. 10.1093/bioinformatics/btl222
- Gardy JL, Spencer C, Wang K, Ester M, Tusnady GE, Simon I, Hua S, deFays K, Lambert C, Nakai K, Brinkman FS: PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Research 2003, 31: 3613–3617. 10.1093/nar/gkg602
- Gardy JL, Laird MR, Chen F, Rey S, Walsh CJ, Ester M, Brinkman FS: PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 2005, 21: 617–623. 10.1093/bioinformatics/bti057
- Bhasin M, Garg A, Raghava GP: PSLpred: prediction of subcellular localization of bacterial proteins. Bioinformatics 2005, 21: 2522–2524. 10.1093/bioinformatics/bti309
- Yu CS, Lin CJ, Hwang JK: Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci 2004, 13: 1402–1406. 10.1110/ps.03479604
- Yu CS, Chen YC, Lu CH, Hwang JK: Prediction of protein subcellular localization. Proteins 2006, 64: 643–651. 10.1002/prot.21018
- Nair R, Rost B: Mimicking cellular sorting improves prediction of subcellular localization. J Mol Biol 2005, 348: 85–100. 10.1016/j.jmb.2005.02.025
- Wang J, Sung WK, Krishnan A, Li KB: Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines. BMC Bioinformatics 2005, 6: 174. 10.1186/1471-2105-6-174
- Shen HB, Chou KC: Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins. Protein Eng Des Sel 2007, 20: 39–46. 10.1093/protein/gzl053
- Guo J, Lin Y, Liu X: GNBSL: a new integrative system to predict the subcellular location for Gram-negative bacteria proteins. Proteomics 2006, 6: 5099–5105. 10.1002/pmic.200600064
- Garg A, Bhasin M, Raghava GPS: Support Vector Machine-based Method for Subcellular Localization of Human Proteins Using Amino Acid Composition, Their Order, and Similarity Search. J Biol Chem 2005, 280: 14427–14432. 10.1074/jbc.M411789200
- Nielsen H, Brunak S, Von Heijne G: Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Engineering 1999, 12: 3–9. 10.1093/protein/12.1.3
- Shen HB, Chou KC: Hum-mPLoc: An ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem Biophys Res Commun 2007, 355: 1006–1011. 10.1016/j.bbrc.2007.02.071
- Chou KC, Shen HB: Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. Journal of Proteome Research 2007, 6: 1728–1734.
- Chou KC, Shen HB: Large-scale plant protein subcellular location prediction. Journal of Cellular Biochemistry 2007, 100: 665–678. 10.1002/jcb.21096
- Chou KC, Shen HB: Large-scale predictions of Gram-negative bacterial protein subcellular locations. Journal of Proteome Research 2006, 5: 3420–3428. 10.1021/pr060404b
- Shen HB, Chou KC: Virus-PLoc: A fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. Biopolymers 2007, 85: 233–240. 10.1002/bip.20640
- Gomez M, Johnson S, Gennaro ML: Identification of Secreted Proteins of Mycobacterium tuberculosis by a Bioinformatic Approach. Infection and Immunity 2000, 68: 2323–2327. 10.1128/IAI.68.4.2323-2327.2000
- Mawuenyega KG, Forst CV, Dobos KM, Belisle JT, Chen J, Bradbury EM, Bradbury AR, Chen X: Mycobacterium tuberculosis functional network analysis by global subcellular protein profiling. Mol Biol Cell 2005, 16: 396–404. 10.1091/mbc.E04-04-0329
- Chou KC, Shen HB: MemType-2L: A Web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. Biochem Biophys Res Commun 2007, 360: 339–345. 10.1016/j.bbrc.2007.06.027
- Bendtsen JD, Jensen LJ, Bloom N, Von Heijne G, Brunak S: Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel 2004, 17: 349–356. 10.1093/protein/gzh037
- Joachims T: Learning to classify Text Using Support Vector Machines, Dissertation, Kluwer. 2002.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
- Krogh A, Brown M, Mian IS, Sjölander K, Haussler D: Hidden Markov models in computational biology: Applications to protein modeling. J Mol Biol 1994, 235: 1501–1531. 10.1006/jmbi.1994.1104
- Eddy SR: Profile hidden Markov models. Bioinformatics 1998, 14: 755–763. 10.1093/bioinformatics/14.9.755
- Bailey TL, Elkan C: Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer. In Proceeding of second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, California; 1994:28–36.
- Bailey TL, Gribskov M: Combining evidence using P-values: application to sequence homology searches. Bioinformatics 1998, 14: 48–54. 10.1093/bioinformatics/14.1.48
- Chou KC, Zhang CT: Review: Prediction of protein structural classes. Critical Reviews in Biochemistry and Molecular Biology 1995, 30: 275–349. 10.3109/10409239509083488
- Bhasin M, Raghava GPS: A hybrid approach for predicting promiscuous MHC class I restricted T cell epitopes. J Biosci 2007, 32: 31–42. 10.1007/s12038-007-0004-5
- Saha S, Raghava GPS: Prediction of bacterial proteins. In Silico Biology 2007, 7: 0028.
- Saha S, Raghava GPS: Prediction of neurotoxins based on their function and source. In Silico Biology 2007, 7: 0025.
- Kumar M, Verma R, Raghava GPS: Prediction of mitochondrial proteins using support vector machine and hidden markov model. J Biol Chem 2006, 281: 5357–5363. 10.1074/jbc.M511061200
- Bhasin M, Raghava GPS: Classification of nuclear receptors based on amino acid composition and dipeptide composition. J Biol Chem 2004, 279: 23262–6. 10.1074/jbc.M401932200
- Bhasin M, Raghava GPS: GPCRpred: An SVM Based Method for Prediction of families and subfamilies of G-protein coupled receptors. Nucleic Acids Research 2004, 32: W383–9. 10.1093/nar/gkh416
- Lata S, Sharma BK, Raghava GPS: Analysis and prediction of antibacterial peptides. BMC Bioinformatics 2007, 8: 263. 10.1186/1471-2105-8-263
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.