| # | Paper | Dataset(s) | Feature selection method(s) | Modeling technique(s) | Web server |
|---|---|---|---|---|---|
| 1 | [7] | Bacterial protein sequences annotated as ‘soluble’ or ‘insoluble’ in NCBI, selected randomly; size: 5692; soluble: 2448; insoluble: 3244 | Wrapper: SVM | Support vector machine | - |
| 2 | [10] | HGPD; E. coli: size 5100 (soluble: 1774; insoluble: 3326); wheat germ: size 2939 (soluble: 1941; insoluble: 998) | Filter: Student’s t-test | Two techniques: 1. Support vector machine; 2. Sequence pattern-based method | ESPRESSO: |
| 3 | [5] | eSol; size: 1918; soluble: 886; insoluble: 1032 | Two methods: 1. Filter: Student’s t-test; 2. Wrapper: Random forest | Random forest | ProS: |
| 4 | [8] | Four datasets: Sd957; Solpro; PROSO II | - | Two methods: 1. Support vector machine; 2. Scoring card method (SCM) | SCM: |
| 5 | [4] | eSol; size: 1600 | - | Four techniques: 1. Support vector machine; 2. Random forest; 3. Conditional inference trees; 4. Rule ensemble | - |
| 6 | [6] | PROSO II | Wrapper | A two-layer model: 1. Layer 1: Parzen window + logistic regression; 2. Layer 2: Logistic regression | PROSO II: |
| 7 | [22] | eSol; size: 1625; soluble: 843; insoluble: 782 | - | Decision tree | - |
| 8 | [23] | eSol; size: 2159; soluble: 1081; insoluble: 1078 | Wrapper: SVM | Support vector machine | - |
| 9 | [3] | HGPD; E. coli: size 7823 (soluble: 2796; insoluble: 5027); wheat germ: size 3955 (soluble: 2739; insoluble: 1216) | Filter: Student’s t-test | Random forest | - |
| 10 | [24] | SOLP | Seven methods: 1. Filter: Information gain; 2. Filter: Gain ratio; 3. Filter: Chi-squared; 4. Filter: Symmetrical uncertainty; 5. Wrapper: ReliefF; 6. Wrapper: SVM recursive feature elimination (SvmRfe); 7. Embedded: One attribute rule | Support vector machine | - |
| 11 | [16] | 121 genes from different species, each expressed in 6 different vectors; size: 726; soluble: 231; insoluble: 236; non-expressed: 259 | Feature selection package in LIBSVM: Filter (F-score) + Wrapper (SVM) | Support vector machine | - |
| 12 | [20] | A database compiled through a literature search; size: 212; soluble: 52; insoluble: 160 | N/A | Logistic regression | |
| 13 | [17] | Solpro | Wrapper | A two-layer model: 1. Layer 1: 20 support vector machines; 2. Layer 2: One support vector machine | SOLpro: |
| 14 | [25] | eSol | Histogram-based | Support vector machine | - |
| 15 | [19] | PROSO | Two methods: 1. Wrapper; 2. Filter: Symmetrical uncertainty | A two-layer model: 1. Layer 1: Support vector machine; 2. Layer 2: Naive Bayes | PROSO: |
| 16 | [26] | Idicula-Thomas 2006 | N/A | Support vector machine | - |
| 17 | [27] | Idicula-Thomas 2006 | Filter: Unbalanced correlation score | Support vector machine | - |
| 18 | [28] | Idicula-Thomas 2005 | Filter: Mann–Whitney test | Discriminant analysis (a heuristic approach that computes a solubility index, SI) | - |
| 19 | [29] | Genes of C. elegans expressed with one expression vector in one Escherichia coli strain; size: 4854; soluble: 1536; insoluble: 3318 | Filter: Linear correlation coefficient (LCC) | - | - |
| 20 | [30] | TargetDB; size: 27,000 | Wrapper: Random forest | Decision tree | - |
| 21 | [14] | SPINE; size: 562 | Wrapper | Decision tree | - |
| 22 | [31] | SPINE; size: 356; soluble: 213; insoluble: 143 | Embedded: Decision tree | Decision tree | - |
| 23 | [18] | A set of expressed E. coli genes; size: 100 | N/A | Regression | - |
| 24 | [9] | A set of expressed E. coli genes; size: 81 | N/A | Regression | - |
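
Several rows above rely on a univariate filter that scores each feature independently before modeling: Student's t-test (rows 2, 3 and 9) and the F-score of the LIBSVM feature selection package (row 11). The sketch below illustrates that filter idea only; it is not the code of any cited study. It assumes binary labels (1 = soluble, 0 = insoluble), at least two samples per class, no feature that is constant in both classes, and the usual F-score definition (between-class spread of the class means divided by the sum of the within-class sample variances). The feature names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance (n - 1 denominator)


def student_t(pos, neg):
    """Two-sample Student's t statistic with pooled variance (equal-variance assumption)."""
    n1, n2 = len(pos), len(neg)
    pooled = ((n1 - 1) * variance(pos) + (n2 - 1) * variance(neg)) / (n1 + n2 - 2)
    return (mean(pos) - mean(neg)) / sqrt(pooled * (1 / n1 + 1 / n2))


def f_score(pos, neg):
    """F-score: squared distances of the class means from the overall mean,
    divided by the sum of the two within-class sample variances."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    return between / within


def rank_features(X, y, score):
    """Rank the feature columns of X by |score(soluble values, insoluble values)|, best first.

    X is a list of rows (one per protein); y holds labels with 1 = soluble, 0 = insoluble.
    Returns a list of (column index, absolute score) pairs.
    """
    ranked = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        ranked.append((j, abs(score(pos, neg))))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical features per protein: fraction of charged residues,
# sequence length, fraction of hydrophobic residues.
X = [
    [0.30, 120, 0.50],
    [0.28, 300, 0.52],
    [0.32, 200, 0.49],
    [0.10, 250, 0.51],
    [0.12, 130, 0.50],
    [0.08, 310, 0.48],
]
y = [1, 1, 1, 0, 0, 0]

for name, score in [("t-test", student_t), ("F-score", f_score)]:
    print(name, [j for j, _ in rank_features(X, y, score)])
# prints:
# t-test [0, 2, 1]
# F-score [0, 2, 1]
```

Because every feature here is scored on the same two groups, all t statistics share the same degrees of freedom, so ranking by |t| gives the same order as ranking by p-value. A wrapper method, by contrast, would evaluate feature subsets by retraining the classifier (for example an SVM or random forest) on each candidate subset.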