Chan WC, Liang PH, Shih YP, Yang UC, Lin WC, Hsu CN: Learning to predict expression efficacy of vectors in recombinant protein production. BMC Bioinform. 2010, 11 (Suppl 1): S21-10.1186/1471-2105-11-S1-S21.
van den Berg BA, Reinders MJ, Hulsman M, Wu L, Pel HJ, Roubos JA, de Ridder D: Exploring sequence characteristics related to high-level production of secreted proteins in Aspergillus niger. PLoS One. 2012, 7 (10): e45869-10.1371/journal.pone.0045869.
Hirose S, Kawamura Y, Yokota K, Kuroita T, Natsume T, Komiya K, Tsutsumi T, Suwa Y, Isogai T, Goshima N, Noguchi T: Statistical analysis of features associated with protein expression/solubility in an in vivo Escherichia coli expression system and a wheat germ cell-free expression system. J Biochem. 2011, 150 (1): 73-81. 10.1093/jb/mvr042.
Samak T, Gunter D, Wan Z: Prediction of Protein Solubility in E. coli. 2012 IEEE 8th International Conference on E-Science (e-Science), 8-12 Oct. 2012, Chicago, IL. 2012, 1-8.
Fang Y, Fang J: Discrimination of soluble and aggregation-prone proteins based on sequence information. Mol BioSyst. 2013, 9 (4): 806-811. 10.1039/c3mb70033j.
Smialowski P, Doose G, Torkler P, Kaufmann S, Frishman D: PROSO II-a new method for protein solubility prediction. FEBS J. 2012, 279 (12): 2192-2200. 10.1111/j.1742-4658.2012.08603.x.
Xiaohui N, Feng S, Xuehai H, Jingbo X, Nana L: Predicting the protein solubility by integrating chaos games representation and entropy in information theory. Expert Syst Appl. 2014, 41 (4): 1672-1679. 10.1016/j.eswa.2013.08.064.
Huang H, Charoenkwan P, Kao T, Lee H, Chang F, Huang W, Ho S, Shu L, Chen W, Ho S: Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition. BMC Bioinformatics. 2012, 13 (17): S3-
Wilkinson DL, Harrison RG: Predicting the solubility of recombinant proteins in Escherichia coli. Nat Biotechnol. 1991, 9 (5): 443-448. 10.1038/nbt0591-443.
Hirose S, Noguchi T: ESPRESSO: a system for estimating protein expression and solubility in protein expression systems. Proteomics. 2013, 13 (9): 1444-1456. 10.1002/pmic.201200175.
Quinlan JR: C4.5: Programs for Machine Learning. Vol: 1. 1993, USA: Morgan Kaufmann
Cover T, Hart P: Nearest neighbor pattern classification. IEEE Trans Inf Theory. 1967, 13 (1): 21-27.
Rosenblatt F: Principles of Neurodynamics. 1962, New York: Spartan
Rumelhart DE, Hinton GE, Williams RJ: Learning Internal Representations by Error Propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. 1985, La Jolla, CA: Institute for Cognitive Science, University of California, San Diego, Technical Report ICS-8506.
Cortes C, Vapnik V: Support-vector networks. Mach Learn. 1995, 20 (3): 273-297.
Bertone P, Kluger Y, Lan N, Zheng D, Christendat D, Yee A, Edwards AM, Arrowsmith CH, Montelione GT, Gerstein M: SPINE: An integrated tracking database and data mining approach for identifying feasible targets in high throughput structural proteomics. Nucleic Acids Res. 2001, 29 (13): 2884-2898. 10.1093/nar/29.13.2884.
Magnan CN, Randall A, Baldi P: SOLpro: accurate sequence-based prediction of protein solubility. Bioinformatics. 2009, 25 (17): 2200-2207. 10.1093/bioinformatics/btp386.
Davis GD, Elisee C, Newham DM, Harrison RG: New fusion protein systems designed to give soluble expression in Escherichia coli. Biotechnol Bioeng. 1999, 65 (4): 382-388. 10.1002/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I.
Smialowski P, Martin-Galiano AJ, Mikolajka A, Girschick T, Holak TA, Frishman D: Protein solubility: sequence based prediction and experimental verification. Bioinformatics. 2007, 23 (19): 2536-2542. 10.1093/bioinformatics/btl623.
Diaz AA, Tomba E, Lennarson R, Richard R, Bagajewicz MJ, Harrison RG: Prediction of protein solubility in Escherichia coli using logistic regression. Biotechnol Bioeng. 2010, 105 (2): 374-383. 10.1002/bit.22537.
Chang CCH, Song J, Tey BT, Ramanan RN: Bioinformatics approaches for improved recombinant protein production in Escherichia coli: protein solubility prediction. Brief Bioinform. 2013, bbt057, first published online August 7, 2013. 10.1093/bib/bbt057.
Stiglic G, Kocbek S, Pernek I, Kokol P: Comprehensive decision tree models in bioinformatics. PLoS One. 2012, 7 (3): e33812-10.1371/journal.pone.0033812.
Agostini F, Vendruscolo M, Tartaglia GG: Sequence-based prediction of protein solubility. J Mol Biol. 2012, 421 (2): 237-241.
Kocbek S, Stiglic G, Pernek I, Kokol P: Stability of different feature selection methods for selecting protein sequence descriptors in protein solubility classification problem. Transition. 2010, 7 (21): 50-55.
Niwa T, Ying BW, Saito K, Jin W, Takada S, Ueda T, Taguchi H: Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins. Proc Natl Acad Sci. 2009, 106 (11): 4201-4206. 10.1073/pnas.0811922106.
Kumar P, Jayaraman VK, Kulkarni BD: Granular Support Vector Machine Based Method for Prediction of Solubility of Proteins on Overexpression in Escherichia coli. Pattern Recognition and Machine Intelligence: Second International Conference, PReMI 2007, Kolkata, India, Proceedings. 2007, Berlin Heidelberg: Springer, 406-415.
Idicula-Thomas S, Kulkarni AJ, Kulkarni BD, Jayaraman VK, Balaji PV: A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in Escherichia coli. Bioinformatics. 2006, 22 (3): 278-284. 10.1093/bioinformatics/bti810.
Idicula-Thomas S, Balaji PV: Understanding the relationship between the primary structure of proteins and its propensity to be soluble on overexpression in Escherichia coli. Protein Sci. 2005, 14 (3): 582-592. 10.1110/ps.041009005.
Luan C, Qiu S, Finley JB, Carson M, Gray RJ, Huang W, Johnson D, Tsao J, Reboul J, Vaglio P, Hill DE, Vidal M, DeLucas LJ, Luo M: High-throughput expression of C. elegans proteins. Genome Res. 2004, 14 (10b): 2102-2110. 10.1101/gr.2520504.
Goh C, Lan N, Douglas SM, Wu B, Echols N, Smith A, Milburn D, Montelione GT, Zhao H, Gerstein M: Mining the structural genomics pipeline: identification of protein properties that affect high throughput experimental analysis. J Mol Biol. 2004, 336 (1): 115-130. 10.1016/j.jmb.2003.11.053.
Christendat D, Yee A, Dharamsi A, Kluger Y, Savchenko A, Cort JR, Booth V, Mackereth CD, Saridakis V, Ekiel I, Kozlov G, Maxwell KL, Wu N, McIntosh LP, Gehring K, Kennedy MA, Davidson AR, Pai EF, Gerstein M, Edwards AM, Arrowsmith CH: Structural proteomics of an archaeon. Nat Struct Mol Biol. 2000, 7 (10): 903-909. 10.1038/82823.
Li ZR, Lin HH, Han LY, Jiang L, Chen X, Chen YZ: PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res. 2006, 34 (2): W32-W37.
Maruyama Y, Wakamatsu A, Kawamura Y, Kimura K, Yamamoto J, Nishikawa T, Kisu Y, Sugano S, Goshima N, Isogai T, Nomura N: Human Gene and Protein Database (HGPD): a novel database presenting a large quantity of experiment-based results in human proteomics. Nucleic Acids Res. 2009, 37 (1): D762-D766.
Kouranov A, Xie L, de la Cruz J, Chen L, Westbrook J, Bourne PE, Berman HM: The RCSB PDB information portal for structural genomics. Nucleic Acids Res. 2006, 34 (1): D302-D305.
Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, Fagan P, Marvin J, Padilla D, Ravichandran V, Schneider B, Thanki N, Weissig H, Westbrook JD, Zardecki C: The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 2002, 58 (6): 899-907. 10.1107/S0907444902003451.
Chen L, Oughtred R, Berman HM, Westbrook J: TargetDB: a target registration database for structural genomics projects. Bioinformatics. 2004, 20 (16): 2860-2862. 10.1093/bioinformatics/bth300.
Saeys Y, Inza I, Larrañaga P: A review of feature selection techniques in bioinformatics. Bioinformatics. 2007, 23 (19): 2507-2517. 10.1093/bioinformatics/btm344.
Ben-Bassat M: Pattern Recognition and Reduction of Dimensionality. Handbook of Statistics. Vol: 2. Edited by: Krishnaiah P, Kanal L. 1982, Amsterdam: North-Holland Publishing Co, 773-910.
Witten IH, Frank E: Data Mining: Practical Machine Learning Tools and Techniques. 2005, USA: Morgan Kaufmann, 2nd edition.
Weston J, Pérez-Cruz F, Bousquet O, Chapelle O, Elisseeff A, Schölkopf B: Feature selection and transduction for prediction of molecular bioactivity for drug design. Bioinformatics. 2003, 19: 764-771. 10.1093/bioinformatics/btg054.
Mann HB, Whitney DR: On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat. 1947, 18 (1): 50-60. 10.1214/aoms/1177730491.
Kittler J: Feature Set Search Algorithms. Pattern Recognition and Signal Processing. Edited by: Chen C. 1978
Siedlecki W, Sklansky J: On automatic feature selection. Int J Pattern Recognit Artif Intell. 1988, 2 (02): 197-220.
Kononenko I, Šimec E, Robnik-Šikonja M: Overcoming the myopia of inductive learning algorithms with RELIEFF. Appl Intell. 1997, 7 (1): 39-55. 10.1023/A:1008280620621.
Breiman L: Random forests. Mach Learn. 2001, 45 (1): 5-32.
Guyon I, Weston J, Barnhill S, Vapnik V: Gene selection for cancer classification using support vector machines. Mach Learn. 2002, 46 (1-3): 389-422.
Piatetsky-Shapiro G: Discovery, analysis and presentation of strong rules. Knowledge Discovery in Databases. Edited by: Piatetsky-Shapiro G, Frawley WJ. 1991, Cambridge, MA
de Ridder D, de Ridder J, Reinders MJ: Pattern recognition in bioinformatics. Brief Bioinform. 2013, 14 (5): 633-647. 10.1093/bib/bbt020.