| Feature | RUS + DT-Spark Weka: Entropy | RUS + RF-Spark/Gini Weka: Avg. Impurity Decrease | RUS + RF-Spark/Gini Weka: Number of Nodes | RF MLlib 2.0-Spark/Gini (Avg. Impurity Decrease): Normal | RF MLlib 2.0-Spark/Gini (Avg. Impurity Decrease): ROS-100 | RF MLlib 2.0-Spark/Gini (Avg. Impurity Decrease): ROS-130 | RF MLlib 2.0-Spark/Gini (Avg. Impurity Decrease): RUS |
|---|---|---|---|---|---|---|---|
| Alignment-based Features/Algorithm | | | | | | | |
| nw | 0.789 | 0.520 | 42 | 0.809 | 0.180 | 0.175 | 0.171 |
| sw | 0.982 | 0.360 | 802 | 0.035 | 0.642 | 0.647 | 0.647 |
| profile3 | 0.783 | 0.360 | 417 | 0.043 | 0.167 | 0.167 | 0.167 |
| profile5 | 0.732 | 0.290 | 235 | 0.033 | 0.004 | 0.001 | 0.007 |
| profile7 | 0.712 | 0.240 | 330 | 0.080 | 0.008 | 0.010 | 0.008 |
| Alignment-free Features | | | | | | | |
| aac | 0.624 | 0.400 | 1891 | 0.033 | 0.173 | 0.171 | 0.169 |
| Auto_Geary | 0.000 | 0.310 | 64 | 0.000 | 0.000 | 0.000 | 0.000 |
| Auto_Moran | 0.000 | 0.320 | 75 | 0.000 | 0.000 | 0.000 | 0.000 |
| Auto_Total | 0.000 | 0.370 | 1124 | 0.000 | 0.000 | 0.000 | 0.001 |
| CTD | 0.408 | 0.310 | 1012 | 0.070 | 0.134 | 0.133 | 0.137 |
| CTD_C | 0.566 | 0.300 | 1482 | 0.071 | 0.060 | 0.062 | 0.066 |
| CTD_D | 0.407 | 0.320 | 1239 | 0.074 | 0.030 | 0.029 | 0.033 |
| CTD_T | 0.529 | 0.290 | 1385 | 0.076 | 0.028 | 0.035 | 0.036 |
| fcm | 0.265 | 0.310 | 1010 | 0.012 | 0.004 | 0.021 | 0.021 |
| 2-mers | 0.158 | 0.390 | 954 | 0.022 | 0.003 | 0.003 | 0.002 |
| 2-mers_don’t care ps-1 | 0.000 | 0.320 | 847 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2-mers_don’t care ps-2 | 0.000 | 0.310 | 768 | 0.001 | 0.000 | 0.000 | 0.000 |
| 2-mers_don’t care ps-3 | 0.000 | 0.260 | 772 | 0.000 | 0.000 | 0.000 | 0.001 |
| 3-mers | 0.078 | 0.370 | 1523 | 0.064 | 0.006 | 0.005 | 0.006 |
| 3-mers_don’t care ps-1 | 0.000 | 0.290 | 600 | 0.001 | 0.000 | 0.000 | 0.001 |
| 3-mers_don’t care ps-2 | 0.000 | 0.270 | 653 | 0.001 | 0.000 | 0.000 | 0.001 |
| 3-mers_don’t care ps-3 | 0.000 | 0.270 | 602 | 0.002 | 0.000 | 0.000 | 0.001 |
| length | 0.507 | 0.400 | 2890 | 0.353 | 0.166 | 0.165 | 0.154 |
| nandy | 0.109 | 0.260 | 902 | 0.009 | 0.000 | 0.000 | 0.001 |
| pseaa10 | 0.000 | 0.240 | 825 | 0.000 | 0.000 | 0.000 | 0.001 |
| pseaa3 | 0.611 | 0.380 | 1397 | 0.022 | 0.205 | 0.202 | 0.166 |
| pseaa4 | 0.609 | 0.380 | 1652 | 0.112 | 0.155 | 0.156 | 0.184 |
| QSO_maxlag_30_weight_01 | 0.280 | 0.240 | 1054 | 0.075 | 0.035 | 0.018 | 0.020 |
| QSOCN_maxlag_30 | 0 | 0.250 | 513 | 0.001 | 0.000 | 0.000 | 0.001 |
| Alignment-based + Alignment-free Features/Algorithm | | | | | | | |
| nw | 0.789 | 0.280 | 131 | 0.786 | 0.382 | 0.373 | 0.374 |
| sw | 0.987 | 0.470 | 646 | 0.005 | 0.135 | 0.139 | 0.126 |
| profile3 | 0.769 | 0.280 | 271 | 0.005 | 0.098 | 0.101 | 0.097 |
| profile5 | 0.727 | 0.290 | 230 | 0.016 | 0.168 | 0.168 | 0.137 |
| profile7 | 0.710 | 0.260 | 229 | 0.004 | 0.083 | 0.084 | 0.126 |
| aac | 0.623 | 0.190 | 230 | 0.015 | 0.073 | 0.071 | 0.072 |
| Auto_Geary | 0.000 | 0.300 | 11 | 0.000 | 0.000 | 0.000 | 0.000 |
| Auto_Moran | 0.000 | 0.270 | 11 | 0.000 | 0.000 | 0.000 | 0.000 |
| Auto_Total | 0.000 | 0.510 | 147 | 0.001 | 0.000 | 0.000 | 0.000 |
| CTD | 0.411 | 0.360 | 109 | 0.005 | 0.000 | 0.000 | 0.000 |
| CTD_C | 0.570 | 0.340 | 204 | 0.039 | 0.032 | 0.032 | 0.032 |
| CTD_D | 0.411 | 0.390 | 151 | 0.009 | 0.002 | 0.001 | 0.001 |
| CTD_T | 0.531 | 0.320 | 164 | 0.001 | 0.002 | 0.003 | 0.004 |
| fcm | 0.260 | 0.300 | 154 | 0.005 | 0.000 | 0.000 | 0.001 |
| 2-mers | 0.155 | 0.200 | 81 | 0.003 | 0.000 | 0.000 | 0.000 |
| 2-mers_don’t care ps-1 | 0.000 | 0.410 | 104 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2-mers_don’t care ps-2 | 0.000 | 0.410 | 98 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2-mers_don’t care ps-3 | 0.000 | 0.400 | 82 | 0.001 | 0.000 | 0.000 | 0.000 |
| 3-mers | 0.074 | 0.230 | 97 | 0.010 | 0.000 | 0.000 | 0.000 |
| 3-mers_don’t care ps-1 | 0.000 | 0.390 | 69 | 0.000 | 0.000 | 0.000 | 0.000 |
| 3-mers_don’t care ps-2 | 0.000 | 0.340 | 49 | 0.001 | 0.000 | 0.000 | 0.000 |
| 3-mers_don’t care ps-3 | 0.000 | 0.390 | 59 | 0.001 | 0.000 | 0.000 | 0.000 |
| length | 0.504 | 0.230 | 231 | 0.059 | 0.012 | 0.014 | 0.014 |
| nandy | 0.113 | 0.320 | 101 | 0.001 | 0.000 | 0.000 | 0.001 |
| pseaa10 | 0.000 | 0.310 | 97 | 0.001 | 0.000 | 0.000 | 0.000 |
| pseaa3 | 0.613 | 0.190 | 142 | 0.009 | 0.006 | 0.007 | 0.004 |
| pseaa4 | 0.610 | 0.210 | 147 | 0.001 | 0.005 | 0.005 | 0.009 |
| QSO_maxlag_30_weight = 0.1 | 0.286 | 0.270 | 108 | 0.020 | 0.001 | 0.001 | 0.000 |
| QSO_maxlag_30 | 0.000 | 0.340 | 47 | 0.000 | 0.000 | 0.000 | 0.000 |

- nw: global alignment; sw: local alignment; profile: physicochemical profile from matching regions of aligned sequences at window sizes 3, 5 and 7; aac: amino acid composition; pseaa: pseudo-amino acid composition at λ = 3, 4 and 10; Auto_Geary: Geary’s autocorrelation; Auto_Moran: Moran’s autocorrelation; Auto_Total: total autocorrelation; fcm: four-color maps; nandy: Nandy’s descriptors; CTD: Composition, Distribution and Transition (Total); CTD_C: Composition, Distribution and Transition (Composition); CTD_D: Composition, Distribution and Transition (Distribution); CTD_T: Composition, Distribution and Transition (Transition); k-mers: 2-mers and 3-mers; spaced words: 2-mers with “don’t care positions” (ps) = 1, 2 and 3, and 3-mers with “don’t care positions” = 1, 2 and 3; QSO: Quasi-Sequence-Order, with w = weight factor and maximum lag = 30.
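
As a rough illustration of three of the alignment-free descriptors named in the legend (aac, k-mers and spaced words), the following Python sketch computes them for a single protein sequence. The function names, the 20-letter alphabet constant, and in particular the spaced-word pattern (two match positions separated by the given number of don't-care positions) are assumptions made for this example; they are not taken from the source and may differ from how the features in the table were actually generated.

```python
from collections import Counter

# Standard 20 amino acid one-letter codes (assumed alphabet for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: relative frequency of each residue type."""
    if not seq:
        return {a: 0.0 for a in AMINO_ACIDS}
    counts = Counter(seq)
    return {a: counts[a] / len(seq) for a in AMINO_ACIDS}


def kmer_counts(seq, k):
    """Counts of all overlapping contiguous k-mers in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spaced_2mer_counts(seq, gap):
    """Counts of 2-mers whose two match positions are separated by
    `gap` don't-care positions (hypothetical pattern for illustration)."""
    return Counter(seq[i] + seq[i + gap + 1] for i in range(len(seq) - gap - 1))


if __name__ == "__main__":
    example = "ACAC"
    print(aac(example)["A"])               # fraction of residues that are A
    print(kmer_counts(example, 2))         # contiguous 2-mers
    print(spaced_2mer_counts(example, 1))  # 2-mers with one don't-care position
```

Each function returns a dictionary-like mapping that can be flattened into a fixed-length feature vector by iterating over a fixed key order, which is the shape a tree-based classifier would expect.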