Karchin R, Karplus K, Haussler D: Classifying G-protein coupled receptors with support vector machines. Bioinformatics 2002, 18: 147–159. 10.1093/bioinformatics/18.1.147
Cai CZ, Han LY, Ji ZL, Chen X, Chen YZ: SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res 2003, 31: 3692–3697. 10.1093/nar/gkg600
Cai CZ, Han LY, Ji ZL, Chen YZ: Enzyme family classification by support vector machines. Proteins 2004, 55: 66–76. 10.1002/prot.20045
Han LY, Cai CZ, Lo SL, Chung MC, Chen YZ: Prediction of RNA-binding proteins from primary sequence by a support vector machine approach. RNA 2004, 10: 355–368. 10.1261/rna.5890304
Dubchak I, Muchnik I, Mayor C, Dralyuk I, Kim SH: Recognition of a protein fold in the context of the Structural Classification of Proteins (SCOP) classification. Proteins 1999, 35: 401–407. 10.1002/(SICI)1097-0134(19990601)35:4<401::AID-PROT3>3.0.CO;2-K
Bock JR, Gough DA: Predicting protein-protein interactions from primary structure. Bioinformatics 2001, 17: 455–460. 10.1093/bioinformatics/17.5.455
Bock JR, Gough DA: Whole-proteome interaction mining. Bioinformatics 2003, 19: 125–134. 10.1093/bioinformatics/19.1.125
Lo SL, Cai CZ, Chen YZ, Chung MC: Effect of training datasets on support vector machine prediction of protein-protein interactions. Proteomics 2005, 5: 876–884. 10.1002/pmic.200401118
Chou KC, Cai YD: Predicting protein-protein interactions from sequences in a hybridization space. J Proteome Res 2006, 5: 316–322. 10.1021/pr050331g
Chou KC: Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem Biophys Res Commun 2000, 278: 477–483. 10.1006/bbrc.2000.3815
Chou KC, Cai YD: Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochem Biophys Res Commun 2004, 320: 1236–1239. 10.1016/j.bbrc.2004.06.073
Chou KC, Shen HB: Hum-PLoc: A novel ensemble classifier for predicting human protein subcellular localization. Biochem Biophys Res Commun 2006, 347: 150–157. 10.1016/j.bbrc.2006.06.059
Chou KC, Shen HB: Large-scale plant protein subcellular location prediction. J Cell Biochem 2006, 100(3):665–678. 10.1002/jcb.21096
Bhasin M, Garg A, Raghava GP: PSLpred: prediction of subcellular localization of bacterial proteins. Bioinformatics 2005, 21(10):2522–2524. 10.1093/bioinformatics/bti309
Guo J, Lin Y, Liu XJ: GNBSL: a new integrative system to predict the subcellular location for Gram-negative bacteria proteins. Proteomics 2006, 6(19):5099–5105. 10.1002/pmic.200600064
Guo J, Lin Y: TSSub: eukaryotic protein subcellular localization by extracting features from profiles. Bioinformatics 2006, 22(14):1784–1785. 10.1093/bioinformatics/btl180
Cui J, Han LY, Lin HH, Zhang HL, Tang ZQ, Zheng CJ, Cao ZW, Chen YZ: Prediction of MHC-binding peptides of flexible lengths from sequence-derived structural and physicochemical properties. Mol Immunol 2007, 44: 866–877. 10.1016/j.molimm.2006.04.001
Schneider G, Wrede P: The rational design of amino acid sequences by artificial neural networks and simulated molecular evolution: de novo design of an idealized leader peptidase cleavage site. Biophys J 1994, 66: 335–344.
Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares MJ Jr, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 2000, 97(1):262–267. 10.1073/pnas.97.1.262
Ward JJ, McGuffin LJ, Buxton BF, Jones DT: Secondary structure prediction with support vector machines. Bioinformatics 2003, 19(13):1650–1655. 10.1093/bioinformatics/btg223
Han LY, Cai CZ, Ji ZL, Cao ZW, Cui J, Chen YZ: Predicting functional family of novel enzymes irrespective of sequence similarity: a statistical learning approach. Nucleic Acids Res 2004, 32: 6437–6444. 10.1093/nar/gkh984
Li ZR, Lin HH, Han LY, Jiang L, Chen X, Chen YZ: PROFEAT: A web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res 2006, 34(Web Server issue):W32–37. 10.1093/nar/gkl305
Chou KC, Cai YD: Prediction of membrane protein types by incorporating amphipathic effects. J Chem Inf Model 2005, 45(2):407–413. 10.1021/ci049686v
Gao QB, Wang ZZ, Yan C, Du YH: Prediction of protein subcellular location using a combined feature of sequence. FEBS Lett 2005, 579(16):3444–3448. 10.1016/j.febslet.2005.05.021
Feng ZP, Zhang CT: Prediction of membrane protein types based on the hydrophobic index of amino acids. J Protein Chem 2000, 19: 262–275. 10.1023/A:1007091128394
Lin Z, Pan XM: Accurate prediction of protein secondary structural content. J Protein Chem 2001, 20: 217–220. 10.1023/A:1010967008838
Horne DS: Prediction of protein helix content from an autocorrelation analysis of sequence hydrophobicities. Biopolymers 1988, 27: 451–477. 10.1002/bip.360270308
Sokal RR, Thomson BA: Population structure inferred by local spatial autocorrelation: an example from an Amerindian tribal population. Am J Phys Anthropol 2006, 129: 121–131. 10.1002/ajpa.20250
Dubchak I, Muchnik I, Holbrook SR, Kim SH: Prediction of protein folding class using global description of amino acid sequence. Proc Natl Acad Sci USA 1995, 92: 8700–8704. 10.1073/pnas.92.19.8700
Lin HH, Han LY, Cai CZ, Ji ZL, Chen YZ: Prediction of transporter family from protein sequence by support vector machine approach. Proteins 2006, 62(1):218–231. 10.1002/prot.20605
Grantham R: Amino acid difference formula to help explain protein evolution. Science 1974, 185: 862–864. 10.1126/science.185.4154.862
Chou KC: Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Structure Function and Genetics 2001, 43: 246–255. 10.1002/prot.1035
Bhasin M, Raghava GP: Classification of nuclear receptors based on amino acid composition and dipeptide composition. J Biol Chem 2004, 279: 23262–23266. 10.1074/jbc.M401932200
NC-IUBMB: Enzyme Nomenclature. San Diego, California, Academic Press; 1992.
Chou KC: Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 2005, 21: 10–19. 10.1093/bioinformatics/bth466
Chou KC, Cai YD: Predicting enzyme family class in a hybridization space. Protein Sci 2004, 13: 2857–2863. 10.1110/ps.04981104
Chou KC, Elrod DW: Prediction of enzyme family classes. J Proteome Res 2003, 2: 183–190. 10.1021/pr0255710
Chou KC: Prediction of G-protein-coupled receptor classes. J Proteome Res 2005, 4: 1413–1418. 10.1021/pr050087t
Chou KC, Elrod DW: Bioinformatical analysis of G-protein-coupled receptors. J Proteome Res 2002, 1: 429–433. 10.1021/pr025527k
Bhasin M, Raghava GP: GPCRpred: an SVM-based method for prediction of families and subfamilies of G-protein coupled receptors. Nucleic Acids Res 2004, 32(Web Server issue):W383–389. 10.1093/nar/gkh416
Saier MHJ, Tran CV, Barabote RD: TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res 2006, 34(Database issue):D181–D186. 10.1093/nar/gkj001
Suzuki JY, Bollivar DW, Bauer CE: Genetic analysis of chlorophyll biosynthesis. Annu Rev Genet 1997, 31: 61–89. 10.1146/annurev.genet.31.1.61
Lin HH, Han LY, Zhang HL, Zheng CJ, Xie B, Chen YZ: Prediction of the functional class of lipid binding proteins from sequence-derived properties irrespective of sequence similarity. J Lipid Res 2006, 47: 824–831. 10.1194/jlr.M500530-JLR200
Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares MJ, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 2000, 97(1):262–267. 10.1073/pnas.97.1.262
Burbidge R, Trotter M, Buxton B, Holden S: Drug design by machine learning: support vector machines for pharmaceutical data analysis. Comput Chem 2001, 26(1):5–14. 10.1016/S0097-8485(01)00094-8
Baenziger JU: Protein-specific glycosyltransferase: how and why they do it! FASEB J 1994, 8(13):1019–1025.
Kapitonov D, Yu RK: Conserved domains of glycosyltransferase. Glycobiology 1999, 9: 961–978. 10.1093/glycob/9.10.961
Busch W, Saier MHJ: The Transporter Classification (TC) system. Crit Rev Biochem Mol Biol 2002, 37(5):287–337. 10.1080/10409230290771528
Drews J: Genomic sciences and the medicine of tomorrow. Nat Biotechnol 1996, 14(11):1516–1518. 10.1038/nbt1196-1516
Gudermann TB, Nurnberg B, Schultz G: Receptors and G proteins as primary components of transmembrane signal transduction. Part 1. G-protein-coupled receptors: structure and function. J Mol Med 1995, 73(2):51–63. 10.1007/BF00270578
Muller G: Towards 3D structures of G protein-coupled receptors: a multidisciplinary approach. Curr Med Chem 2000, 7(9):861–888.
Paulson JC, Colley KJ: Glycosyltransferase. J Biol Chem 1989, 264(30):17615–17618.
Beale SI, Weinstein JD: Biochemistry and regulation of photosynthetic pigment formation in plants and algae. In Biosynthesis of Tetrapyrroles. Edited by: Jordan PM. Amsterdam, Elsevier; 1991:155–235.
Glatz JF, Luiken JJ, van Bilsen M, van der Vusse GJ: Cellular lipid binding proteins as facilitators and regulators of lipid metabolism. Mol Cell Biochem 2002, 239: 3–7. 10.1023/A:1020529918782
Burd CG, Dreyfuss G: Conserved structures and diversity of functions of RNA-binding proteins. Science 1994, 265: 615–621. 10.1126/science.8036511
Kiledjian M, Burd CG, Portman DS, Gorlach M, Dreyfuss G: Structure and function of hnRNP proteins. In RNA-Protein Interactions: Frontiers in Molecular Biology. Edited by: Nagai K, Mattaj IW. Oxford, IRL Press; 1994:127–149.
Draper DE: Themes in RNA-protein recognition. J Mol Biol 1999, 293: 255–270. 10.1006/jmbi.1999.2991
Fierro-Monti I, Mathews MB: Proteins binding to duplexed RNA: one motif, multiple functions. Trends Biochem Sci 2000, 25: 241–246. 10.1016/S0968-0004(00)01580-2
Peculis BA: RNA-binding proteins: If it looks like a sn(o)RNA. Curr Biol 2000, 10: R916–R918. 10.1016/S0960-9822(00)00851-4
Perez-Canadillas JM, Varani G: Recent advances in RNA-protein recognition. Curr Opin Struct Biol 2001, 11: 53–58. 10.1016/S0959-440X(00)00164-0
Chou KC, Zhang CT: Prediction of protein structural classes. Crit Rev Biochem Mol Biol 1995, 30(4):275–349.
Li WZ, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of proteins or nucleotide sequences. Bioinformatics 2006, 22: 1658–1659. 10.1093/bioinformatics/btl158
Li WZ, Jaroszewski L, Godzik A: Clustering of highly homologous sequences to reduce the size of large protein database. Bioinformatics 2001, 17: 282–283. 10.1093/bioinformatics/17.3.282
Li WZ, Jaroszewski L, Godzik A: Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 2002, 18: 77–82. 10.1093/bioinformatics/18.1.77
Garg A, Bhasin M, Raghava GP: Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search. J Biol Chem 2005, 280(15):14427–14432. 10.1074/jbc.M411789200
Bhasin M, Raghava GP: ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Res 2004, 32(Web Server issue):W414–419. 10.1093/nar/gkh350
Xue L, Bajorath J: Molecular descriptors in chemoinformatics, computational combinatorial chemistry, and virtual screening. Comb Chem High Throughput Screen 2000, 3(5):363–372.
Xue L, Godden JW, Bajorath J: Identification of a preferred set of descriptors for compound classification based on principal component analysis. J Chem Inf Comput Sci 1999, 39: 669–704.
Xue Y, Li ZR, Yan CW, Sun LZ, Chen X, Chen YZ: Effect of molecular descriptor feature selection in support vector machine classification of pharmacokinetic and toxicological properties of chemical agents. J Chem Inf Comput Sci 2004, 44(5):1630–1638. 10.1021/ci049869h
Brown RD, Martin YC: Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection. J Chem Inf Comput Sci 1996, 36(3):572–584. 10.1021/ci9501047
Cramer RD, Patterson DE, Bunce JD: Comparative molecular field analysis (CoMFA): effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 1988, 110: 5959–5967. 10.1021/ja00226a005
Glen WG, Dunn WJ, Scott RD: Principal components analysis and partial least squares regression. Tetrahedron Comput Methodol 1989, 2: 349–376. 10.1016/0898-5529(89)90004-3
Matter H: Selecting optimally diverse compounds from structure databases: a validation study of two-dimensional and three-dimensional molecular descriptors. J Med Chem 1997, 40(8):1219–1229. 10.1021/jm960352+
Matter H, Pötter T: Comparing 3D pharmacophore triplets and 2D fingerprints for selecting diverse compound subsets. J Chem Inf Comput Sci 1999, 39: 1211–1225. 10.1021/ci980185h
Patterson DEP, Cramer RD, Ferguson AM, Clark RD, Weinberger LE: Neighborhood behavior: a useful concept for validation of "molecular diversity" descriptors. J Med Chem 1996, 39(16):3049–3059. 10.1021/jm960290n
Xue L, Godden JW, Bajorath J: Evaluation of descriptors and mini-fingerprints for the identification of molecules with similar activity. J Chem Inf Comput Sci 2000, 40(5):1227–1234. 10.1021/ci000327j
Lin HH, Han LY, Zhang HL, Zheng CJ, Xie B, Chen YZ: Prediction of the functional class of DNA-binding proteins from sequence derived structural and physicochemical properties. 2006.
Chen C, Zhou X, Tian Y, Zhou X, Cai P: Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network. Anal Biochem 2006, 357: 116–121. 10.1016/j.ab.2006.07.022
Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D: Support vector machines classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 2000, 16: 906–914. 10.1093/bioinformatics/16.10.906
Yu H, Yang J, Wang W, Han J: Discovering compact and highly discriminative features or feature combinations of drug activities using support vector machines. Proc IEEE Comput Soc Bioinform Conf 2003, (2):220–228.
Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31(1):365–370. 10.1093/nar/gkg095
Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res 2002, 30(1):276–280. 10.1093/nar/30.1.276
Heyer LJ, Kruglyak S, Yooseph S: Exploring expression data: Identification and analysis of coexpressed genes. Genome Res 1999, 9(11):1106–1115. 10.1101/gr.9.11.1106
Broto P, Moreau G, Vandicke C: Molecular structures: perception, autocorrelation descriptor and SAR studies. Eur J Med Chem 1984, 19: 71–78.
Kawashima S, Kanehisa M: AAindex: amino acid index database. Nucleic Acids Res 2000, 28: 374. 10.1093/nar/28.1.374
Cid H, Bunster M, Canales M, Gazitua F: Hydrophobicity and structural classes in proteins. Protein Eng 1992, 5: 373–375. 10.1093/protein/5.5.373
Bhaskaran R, Ponnuswamy PK: Positional flexibilities of amino acid residues in globular proteins. Int J Pept Protein Res 1988, 32: 242–255.
Charton M, Charton BI: The structural dependence of amino acid hydrophobicity parameters. J Theor Biol 1982, 99: 629–644. 10.1016/0022-5193(82)90191-6
Chothia C: The nature of the accessible and buried surfaces in proteins. J Mol Biol 1976, 105: 1–12. 10.1016/0022-2836(76)90191-1
Bigelow CC: On the average hydrophobicity of proteins and the relation between it and protein structure. J Theor Biol 1967, 16: 187–211. 10.1016/0022-5193(67)90004-5
Charton M: Protein folding and the genetic code: an alternative quantitative model. J Theor Biol 1981, 91: 115–123. 10.1016/0022-5193(81)90377-5
Dayhoff H, Calderone H: Composition of proteins. Atlas of Protein Sequence and Structure 1978, 5: 363–373.
Moreau G, Broto P: Autocorrelation of molecular structures, application to SAR studies. Nouv J Chim 1980, 4: 757–767.
Moran PAP: Notes on continuous stochastic phenomena. Biometrika 1950, 37: 17–23.
Geary RC: The contiguity ratio and statistical mapping. Incorp Statist 1954, 5: 115–145. 10.2307/2986645
Cai YD, Liu XJ, Xu X, Chou KC: Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect. J Cell Biochem 2002, 84(2):343–348. 10.1002/jcb.10030
Chou KC, Cai YD: Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem 2002, 277: 45765–45769. 10.1074/jbc.M204161200
Jones DD: Amino acid properties and side-chain orientation in proteins: a cross correlation approach. J Theor Biol 1975, 50: 167–183. 10.1016/0022-5193(75)90031-4
Hopp TP, Woods KR: Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA 1981, 78: 3824–3828. 10.1073/pnas.78.6.3824
Feng ZP: An overview on predicting the subcellular location of a protein. In Silico Biol 2002, 2: 291–303.
Burges CJC: A tutorial on support vector machines for pattern recognition. Data Min Knowl Dis 1998, 2(2):121–167. 10.1023/A:1009715923555
Baldi P, Brunak S, Chauvin Y, Andersen CA, Nielsen H: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000, 16(5):412–424. 10.1093/bioinformatics/16.5.412
Roulston JE: Screening with tumor markers: critical issues. Mol Biotechnol 2002, 20(2):153–162. 10.1385/MB:20:2:153
Provost F, Fawcett T, Kohavi R: The case against accuracy estimation for comparing induction algorithms. In Proc 15th International Conf on Machine Learning. San Francisco, California, Morgan Kaufmann; 1998:445–453.