# | Paper | Features |
---|---|---|
1 | [7] | 1. 2-level triangle CGR |
2. Entropy of “2-level triangle CGR” | ||
3. Dipeptide composition based on a different mode of pseudo amino acid composition (PseAAC) | ||
4. Entropy of “dipeptide composition” | ||
2 | [10] | Same as row 9 (Reference [3]) |
3 | [5] | 1. Counts of aromatic amino acids |
2. Counts of buried amino acids | ||
3. Counts of hydrogen bonds | ||
4. Counts of leucine amino acid | ||
5. Counts of arginine amino acid | ||
6. Negative charge | ||
7. Surface composition of amino acids in intracellular proteins of Mesophiles (percent) | ||
8. Beta-strand indices for beta-proteins | ||
9. Flexibility parameter for two rigid neighbours | ||
10. Net charge | ||
11. Counts of nitrogen atoms | ||
12. Long range non-bonded energy per atom | ||
13. Isoelectric point (pI) | ||
14. Free energies of transfer of AcWl-X-LL peptides from bilayer interface to water | ||
15. Ratio of negative charge amino acids | ||
16. Ratio of net charge of protein | ||
17. Dependence of partition coefficient on ionic strength | ||
4 | [8] | Dipeptide composition (400 features) |
5 | [4] | 1. Reduced features (39 features produced by pepstats): |
a. Molecular weight, number of residues, average residue weight, charge and isoelectric point | ||
b. For each type of amino acid: number, molar percent and DayhoffStat | ||
c. For each physicochemical class of amino acid: number, molar percent, molar extinction coefficient (A280) and extinction coefficient at 1 mg/ml (A280) | ||
2. Dimers (2400 features): | ||
a. Dimer amino acid frequencies, computed with gaps of 1–5 amino acids | ||
3. Complete set | ||
a. Reduced features + Dimers | ||
6 | [6] | 1. Amino acid frequencies (18 features): R, N, D, C, Q, E, G, H, I, K, M, F, P, S, T, W, Y, V |
2. Dipeptide frequencies (13 features): AK, CV, EG, GN, GH, HE, IH, IW, MR, MQ, PR, TS, WD | ||
7 | [22] | 1. Monomers, dimers and trimers using 7 different alphabets (18 features) |
2. Sequence-computed features: | ||
a. Molecular weight | ||
b. Sequence length | ||
c. Isoelectric point | ||
d. GRAVY index | ||
3. Features used in Niwa et al. work [25] | ||
4. Combination of all the above features 1–3. | ||
8 | [23] | 1. Coil |
2. Disorder | ||
3. Hydrophobicity | ||
4. Hydrophilicity | ||
5. β-turn | ||
6. α-helix | ||
9 | [3] | 1. Nucleotide sequence information: |
a. 1-mer | ||
b. Frequencies of 64 codons (3-mer) | ||
c. GC content | ||
2. Amino acid sequence information: | ||
a. Polypeptide length | ||
b. Frequencies of 20 single amino acids (1-mer) | ||
c. Frequencies of 8 chemical property groups | ||
d. Frequencies of 5 physical property groups | ||
e. Repeat of amino acids | ||
f. Repeat of 8 chemical property groups | ||
g. Repeat of 5 physical property groups | ||
3. Amino acid structural information: | ||
a. Frequencies of single amino acids in surface area | ||
b. Frequencies of 8 chemical property groups in surface area | ||
c. Frequencies of 5 physical property groups in surface area | ||
d. Number of transmembrane regions | ||
e. Disordered regions: | ||
i. Number of occurrences | ||
ii. Length | ||
iii. Proportion | ||
f. Secondary structures: | ||
i. Alpha-helix | ||
ii. Beta-sheet | ||
iii. Others | ||
10 | [24] | 1497 features computed by Protein Feature Server (PROFEAT) [32]: |
1. Group 1: | ||
a. Amino acid composition | ||
b. Dipeptide composition | ||
2. Group 2: Autocorrelation 1 | ||
a. Normalized Moreau-Broto autocorrelation | ||
3. Group 3: Autocorrelation 2 | ||
a. Moran autocorrelation | ||
4. Group 4: Autocorrelation 3 | ||
a. Geary autocorrelation | ||
5. Group 5: | ||
a. Composition | ||
b. Transition | ||
c. Distribution | ||
6. Group 6: Sequence order 1 | ||
a. Sequence-order-coupling number | ||
b. Quasi-sequence-order descriptors | ||
7. Group 7: Sequence order 2 | ||
a. Pseudo amino acid descriptors | ||
11 | [1] | 1. Nucleotide information: |
a. 1-mer | ||
b. 2-mer | ||
c. 3-mer | ||
d. Sequence length | ||
e. GC content | ||
2. Amino Acid information: | ||
a. Features of Wilkinson and Harrison [9] | ||
b. Features of Idicula-Thomas et al. [27] | ||
c. Isoelectric point | ||
d. Peptide statistics | ||
3. Codon Adaptation Index | ||
4. PTMs | ||
12 | [20] | 1. Molecular weight |
2. Cysteine fraction | ||
3. Hydrophobicity-related parameters: | ||
a. Fraction of total number of hydrophobic amino acids | ||
b. Fraction of largest number of contiguous hydrophobic/hydrophilic amino acids | ||
4. Aliphatic index | ||
5. Secondary structure-related properties: | ||
a. Proline fraction | ||
b. Alpha-helix propensity | ||
c. Beta-sheet propensity | ||
d. Turn-forming residue fraction | ||
e. Alpha-helix propensity/beta-sheet propensity | ||
6. Protein–solvent interaction related parameters: | ||
a. Hydrophilicity index | ||
b. pI | ||
c. Approximate charge average | ||
7. Fractions of: Alanine, Arginine, Asparagine, Aspartate, Glutamate, Glutamine, Glycine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Serine, Threonine, Tyrosine, Tryptophan and Valine | ||
13 | [17] | 1. Frequencies of amino acid monomers, dimers and trimers using 7 different alphabets: |
a. Monomer frequencies | ||
i. [Natural-20:M] | ||
ii. [ClustEM-17:M] | ||
iii. [ClustEM-14:M] | ||
iv. [PhysChem-7:M] | ||
v. [BlosumSM-8:M] | ||
vi. [ConfSimi-7:M] | ||
vii. [Hydropho-5:M] | ||
b. Dimer frequencies | ||
i. [PhysChem-7:D] | ||
ii. [ClustEM-14:D] | ||
iii. [ClustEM-17:D] | ||
iv. [BlosumSM-8:D] | ||
v. [Natural-20:D] | ||
vi. [ConfSimi-7:D] | ||
c. Trimer frequencies | ||
i. [ClustEM-17:T] | ||
ii. [Hydropho-5:T] | ||
iii. [ConfSimi-7:T] | ||
iv. [ClustEM-14:T] | ||
v. [Natural-20:T] | ||
2. Features computed directly: | ||
a. Sequence length | ||
b. Turn-forming residues fraction | ||
c. Absolute charge per residue | ||
d. Molecular weight | ||
e. GRAVY index | ||
f. Aliphatic index | ||
3. Predicted features using the SCRATCH suite of predictors: | ||
a. Beta residues fraction (Predicted by SSpro) | ||
b. Alpha residues fraction (Predicted by SSpro) | ||
c. Number of domains (Predicted by DOMpro) | ||
d. Exposed residues fraction (Predicted by ACCpro, using a 25% relative solvent accessibility cut-off) | ||
14 | [25] | 1. Molecular weight |
2. Isoelectric point (pI) | ||
3. Ratios of each amino acid content | ||
15 | [19] | 1. For mono-domain proteins: |
a. Word size 1: | ||
S, IL, M, F, DE, A, C, G, R | ||
b. Word size 2: | ||
R + R, R + C, R + E, R + T, N + Q, N + H, N + L, C + S, Q + A, Q + G, Q + I, E + A, E + G, E + K, E + P, E + V, G + P, H + M, L + Y, K + G, K + K, M + G, S + S, T + I, Y + C, Y + I | ||
c. Word size 3: | ||
ST + ST + ST, ST + ST + N, ST + DQE + AH, ST + C + ST, G + M + R, G + K + G, G + P + G, | ||
G + P + N, M + AH + AH, M + C + Y, DQE + G + R, DQE + R + DQE, DQE + M + ST, | ||
DQE + Y + N, DQE + AH + IV, K + R + IV, K + K + ST, P + DQE + DQE, P + DQE + C, | ||
IV + G + IV, L + IV + DQE, N + FW + DQE, N + C + P, AH + ST + ST, AH + K + L, C + FW + Y, C + K + C | ||
2. For multi-domain proteins: | ||
a. Word size 1: | ||
R, D, C, E, G, L, K, M, S, W | ||
b. Word size 2: | ||
A + Y, A + V, R + N, R + E, R + S, R + Y, N + A, D + M, C + T, Q + A, Q + E, E + D, E + G, E + T, G + I, | ||
G + F, G + S, H + C, H + M, H + P, L + G, L + S, K + D, K + G, K + L, K + F, P + L, T + L, T + Y, V + R | ||
c. Word size 3: | ||
ST + ST + ST, ST + P + DQE, ST + IV + K, R + DQE + FW, R + DQE + IV, R + IV + FW, | ||
FW + DQE + FW, M + ST + DQE, M + G + AH, M + FW + DQE, DQE + ST + ST, | ||
DQE + ST + G, DQE + G + K, DQE + IV + R, DQE + IV + L, P + G + ST, IV + ST + P, | ||
L + K + FW, AH + ST + IV, AH + G + IV, AH + AH + M | ||
16 | [26] | 1. Aliphatic index |
2. Frequency of occurrence of residues Cysteine (Cys), Glutamic acid (Glu), Asparagine (Asn) and Tyrosine (Tyr) | ||
3. Reduced class of conformational similarity [CMQLEKRA] | ||
4. Reduced classes of hydrophobicity [CFILMVW] and [NQSTY] | ||
5. Reduced classes of BLOSUM50 substitution matrix [CILMV] | ||
6. The 18 dipeptide composition: [VC], [AE], [VE], [WF], [YF], [AG], [FG], [WG], [HH], [MI], [HK], [KN], [KP], [ER], [YS], [RV], [KY], [TY] | ||
17 | [27] | 1. Physicochemical properties (6 features): |
a. Length of protein | ||
b. Hydropathy index (GRAVY) | ||
c. Aliphatic index | ||
d. Instability index | ||
e. Instability index of N-terminus | ||
f. Net charge | ||
2. Mono-peptide frequencies (20 features) | ||
3. Dipeptide frequencies (400 features) | ||
4. Reduced alphabet set (20 features) | ||
18 | [28] | 1. Aliphatic index (AI) |
2. Instability index of the N terminus | ||
3. Frequency of occurrence of Asn, Thr, and Tyr | ||
4. Tri-peptide score | ||
19 | [29] | 1. Signal peptide |
2. GRAVY | ||
3. Transmembrane helices | ||
4. Number of Cysteines | ||
5. Anchor peptide | ||
6. Prokaryotic membrane lipoprotein lipid attachment site | ||
7. PDB identity | ||
20 | [30] | 1. General sequence composition |
2. Clusters of orthologous groups (COG) assignment | ||
3. Length of hydrophobic stretches | ||
4. Number of low-complexity regions | ||
5. Number of interaction partners | ||
21 | [16] | 1. Single residue composition: I, T, Y |
2. Combined amino acid compositions: KR, DE, DENQ | ||
3. Predicted secondary structure composition: α and coil | ||
4. Presence of signal sequence | ||
5. Amino acid sequence length | ||
6. Number of amino acids in both short and long low complexity regions (over sequence length) | ||
7. Normalized low complexity value for both short and long regions (over sequence length) | ||
8. Minimum GES hydrophobicity score calculated over all amino acids in a 20 residue sequence window | ||
22 | [31] | 1. Hydrophobe |
2. Cplx: a measure of a short complexity region based on the SEG program. | ||
3. Gln composition | ||
4. Asp + Glu composition | ||
5. Ile composition | ||
6. Phe + Tyr + Trp composition | ||
7. Gly + Ala + Val + Leu + Ile composition | ||
8. His + Lys + Arg composition | ||
9. Trp composition | ||
10. Alpha-helical secondary structure composition | ||
23 | [18] | Same as row 24 (Reference [9]) |
24 | [9] | 1. Charge average approximation (Asp, Glu, Lys and Arg) |
2. Turn-forming residue fraction (Asn, Gly, Pro and Ser) | ||
3. Cysteine fractions | ||
4. Proline fractions | ||
5. Hydrophilicity | ||
6. Molecular weight (Total number of residues) |
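
Several of the sequence-derived features listed in the table are simple frequency counts. As a minimal sketch, not the implementation used by any of the cited papers, the Python below computes three of them: amino acid composition (20 features), dipeptide composition (400 features, as in row 4), and a residue-group fraction such as the turn-forming residue fraction over Asn, Gly, Pro and Ser (row 24). The function names, the handling of non-standard residues (skipped) and the normalization by the number of valid dipeptides are assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 features)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the frequency of each of the 400 ordered dipeptides.

    Assumption: pairs containing a non-standard residue are skipped, and
    frequencies are normalized by the number of valid pairs.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {pair: (n / total if total else 0.0) for pair, n in counts.items()}


def residue_fraction(seq, residues):
    """Return the fraction of positions in seq occupied by any of residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(seq.count(r) for r in residues) / len(seq)


if __name__ == "__main__":
    example = "MKGSPNGA"  # hypothetical toy sequence
    print(len(dipeptide_composition(example)))
    print(residue_fraction(example, "NGPS"))  # turn-forming residue fraction
    print(residue_fraction(example, "C"))     # cysteine fraction
```

Returning dictionaries keyed by residue or dipeptide keeps the feature order explicit, which helps when feature vectors from different sequences are later stacked into one matrix.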