
Table 1 Comparison of TMpro NN: applying active vs. passive learning algorithms for updating training set from benchmark analysis.

From: Active machine learning for transmembrane helix prediction

 

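|   | Methods | # of Proteins in Training-Set | Qok | Qhtm Fscore | Qhtm %obs | Qhtm %prd | Q2 |
|---|---------|-------------------------------|-----|-------------|-----------|-----------|----|
| 1 | Random | 1 | 14 | 27 | 29 | 25 | 55 |
|   |  | 2 | 36 | 63 | 67 | 60 | 65 |
|   |  | 5 | 51 | 82 | 84 | 80 | 70 |
|   |  | 10 | 54 | 91 | 95 | 88 | 73 |
| 2 | Node-Coverage | 1 | 61 | 94 | 97 | 92 | 75 |
|   |  | 2 | 61 | 94 | 97 | 91 | 75 |
|   |  | 5 | 63 | 94 | 97 | 92 | 75 |
|   |  | 10 | 61 | 94 | 97 | 92 | 75 |
| 3 | Confusion-Rated | 1 | 14 | 27 | 29 | 25 | 55 |
|   |  | 2 | 52 | 91 | 95 | 87 | 73 |
|   |  | 5 | 55 | 91 | 95 | 88 | 73 |
|   |  | 10 | 59 | 93 | 96 | 89 | 74 |
| 4 | Node-Coverage & Confusion-Rated | 1 | 61 | 94 | 97 | 92 | 75 |
|   |  | 2 | 59 | 92 | 96 | 88 | 73 |
|   |  | 5 | 58 | 92 | 96 | 89 | 73 |
|   |  | 10 | 61 | 94 | 96 | 91 | 74 |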

TMpro achieves high segment accuracy (F-score) even when the classifier is trained with just one protein selected by an active learning algorithm. The columns, from left to right, show: the method being evaluated; the number of proteins in the training set; the protein-level accuracy Qok, which is the percentage of proteins in which all experimentally determined segments are predicted correctly and no extra segments are predicted (that is, there is a one-to-one match between predicted and experimentally determined segments); the segment F-score, which is the geometric mean of recall and precision; recall (Qhtm %obs, the percentage of experimentally determined segments that are predicted correctly); precision (Qhtm %prd, the percentage of predicted segments that are correct); and Q2, the residue-level accuracy. Q2 is computed over all residues of a protein taken together, and the Q2 value for the entire set of proteins is the average of the values for the individual proteins. See [30] for further details on these metrics.
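The metric definitions in the note above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the per-protein count tuples, and the `H`/`-` label strings are hypothetical, and the sketch assumes that a one-to-one segment matching has already been done, so a single count of correct segments serves both recall and precision. The criterion that decides whether a predicted segment matches an observed one is not restated here; see [30].

```python
from math import sqrt


def recall_precision(n_observed, n_predicted, n_correct):
    """Return (recall, precision) in percent, i.e. Qhtm %obs and Qhtm %prd.

    Assumption: segments are matched one-to-one, so the same count of
    correct segments is used for both ratios.
    """
    recall = 100.0 * n_correct / n_observed if n_observed else 0.0
    precision = 100.0 * n_correct / n_predicted if n_predicted else 0.0
    return recall, precision


def f_score(recall, precision):
    """Segment F-score as the geometric mean of recall and precision."""
    return sqrt(recall * precision)


def q_ok(per_protein_counts):
    """Percentage of proteins with every observed segment predicted
    correctly and no extra predicted segments.

    per_protein_counts: list of (n_observed, n_predicted, n_correct).
    """
    perfect = sum(1 for obs, prd, ok in per_protein_counts if obs == prd == ok)
    return 100.0 * perfect / len(per_protein_counts)


def q2(observed, predicted):
    """Residue-level accuracy (percent) for one protein.

    observed, predicted: equal-length label strings, e.g. 'H' for a
    helix residue and '-' otherwise (hypothetical encoding).
    """
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def mean_q2(pairs):
    """Q2 for a protein set: the average of the per-protein Q2 values."""
    return sum(q2(o, p) for o, p in pairs) / len(pairs)
```

As a consistency check against the table, `f_score(29, 25)` is about 26.9, which rounds to the F-score of 27 reported for the Random method trained on one protein.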