
Table 1 Dynamic-KNN coverage in the partial model with respect to different distance thresholds and voting weight schemes in the cross-validation data set. # of seqs: total number of proteins in the set. Distance: distance threshold used in Dynamic-KNN. # of preds: number of predicted proteins, with the corresponding proportion of the set given in %

From: GODoc: high-throughput protein function prediction using novel k-nearest-neighbor and voting algorithms

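| Type | Dataset | # of seqs | Distance | Inverse: # of preds | Inverse: % | FunOverlap: # of preds | FunOverlap: % |
|---|---|---|---|---|---|---|---|
| BPO | CAFA2-Swiss | 8146 | Q1 | 2046 | 25.12 | 1882 | 23.10 |
| BPO | CAFA2-Swiss | 8146 | Q2 | 4095 | 50.27 | 3654 | 44.86 |
| BPO | CAFA2-Swiss | 8146 | Q3 | 6112 | 75.03 | 5167 | 64.43 |
| BPO | CAFA3-Swiss | 10,163 | Q1 | 2562 | 25.21 | 2309 | 22.72 |
| BPO | CAFA3-Swiss | 10,163 | Q2 | 5095 | 50.13 | 4470 | 43.98 |
| BPO | CAFA3-Swiss | 10,163 | Q3 | 7601 | 74.79 | 6333 | 62.32 |
| CCO | CAFA2-Swiss | 8114 | Q1 | 2039 | 25.13 | 1855 | 22.86 |
| CCO | CAFA2-Swiss | 8114 | Q2 | 4034 | 49.72 | 3540 | 43.63 |
| CCO | CAFA2-Swiss | 8114 | Q3 | 6042 | 74.46 | 4898 | 60.36 |
| CCO | CAFA3-Swiss | 9866 | Q1 | 2548 | 24.91 | 2204 | 22.34 |
| CCO | CAFA3-Swiss | 9866 | Q2 | 4922 | 49.89 | 4261 | 43.19 |
| CCO | CAFA3-Swiss | 9866 | Q3 | 7357 | 74.57 | 5912 | 59.92 |
| MFO | CAFA2-Swiss | 5211 | Q1 | 1291 | 24.77 | 1204 | 23.11 |
| MFO | CAFA2-Swiss | 5211 | Q2 | 2593 | 49.76 | 2366 | 45.40 |
| MFO | CAFA2-Swiss | 5211 | Q3 | 3902 | 74.88 | 3405 | 65.35 |
| MFO | CAFA3-Swiss | 7017 | Q1 | 1756 | 25.02 | 1630 | 23.23 |
| MFO | CAFA3-Swiss | 7017 | Q2 | 3518 | 50.14 | 3185 | 45.39 |
| MFO | CAFA3-Swiss | 7017 | Q3 | 5278 | 75.22 | 4573 | 65.17 |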
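
According to the caption, the % columns give the number of predicted proteins as a proportion of the total number of proteins in the set. A minimal sketch of that arithmetic follows. The function name `coverage_percent` and the rounding to two decimals are assumptions made for illustration and are not taken from GODoc itself.

```python
def coverage_percent(n_predicted: int, n_total: int) -> float:
    """Return the share of proteins that received a prediction, in percent.

    Hypothetical helper: it mirrors the caption's definition of the % column
    (# of preds divided by # of seqs), rounded to two decimals.
    """
    if n_total <= 0:
        raise ValueError("n_total must be a positive number of sequences")
    if not 0 <= n_predicted <= n_total:
        raise ValueError("n_predicted must lie between 0 and n_total")
    return round(100.0 * n_predicted / n_total, 2)


# BPO, CAFA2-Swiss, Q1, Inverse weighting: 2046 predictions out of 8146 proteins.
bpo_q1_inverse = coverage_percent(2046, 8146)
print(bpo_q1_inverse)  # 25.12
```

The same call can be applied to any row of the table to recompute its percentage from the two counts.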