
Table 2 Performance comparison of different methods on twilight-zone sequences, i.e. sequences with less than \(40\%\) identity

From: EnsembleFam: towards more accurate protein family prediction in the twilight zone

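| Dataset | Method | predCount = 1 | predCount = 2 | predCount = 3 | predCount = 4 | predCount = 5 | predCount \(> 5\) |
|---|---|---|---|---|---|---|---|
| **Identity: \(0 < x \le 30\)** | | | | | | | |
| COG-500-1074 | EnsembleFam | **72.07** | **81.00** | **82.82** | **84.96** | **85.33** | **85.27** |
| | pHMM | 69.54 | 73.75 | 55.51 | 70.62 | 70.85 | 73.55 |
| | DeepFam | 57.14 | 54.52 | 49.90 | 46.92 | 43.64 | 35.94 |
| COG-250-1796 | EnsembleFam | 72.84 | **77.07** | **81.02** | **82.14** | **84.66** | **86.45** |
| | pHMM | **75.39** | 73.82 | 73.84 | 71.02 | 67.44 | 72.43 |
| | DeepFam | 32.44 | 32.54 | 30.24 | 29.53 | 30.02 | 28.68 |
| COG-100-2892 | EnsembleFam | **75.24** | **79.55** | **81.21** | **80.63** | **82.05** | **88.95** |
| | pHMM | 63.44 | 59.69 | 53.45 | 48.16 | 47.42 | 57.57 |
| | DeepFam | 27.30 | 26.13 | 25.54 | 27.62 | 24.83 | 25.36 |
| **Identity: \(30 < x \le 40\)** | | | | | | | |
| COG-500-1074 | EnsembleFam | **90.96** | **94.51** | **95.88** | **96.16** | **97.08** | **97.84** |
| | pHMM | 62.22 | 61.20 | 88.95 | 87.38 | 85.19 | 85.85 |
| | DeepFam | 58.45 | 58.32 | 59.39 | 58.41 | 58.37 | 54.81 |
| COG-250-1796 | EnsembleFam | **91.54** | **95.19** | **95.52** | **95.95** | **96.62** | **97.73** |
| | pHMM | 63.05 | 89.41 | 89.05 | 87.74 | 84.82 | 83.69 |
| | DeepFam | 47.09 | 48.38 | 50.12 | 51.09 | 50.73 | 48.78 |
| COG-100-2892 | EnsembleFam | **92.92** | **95.23** | **96.04** | **96.35** | **96.81** | **97.99** |
| | pHMM | 87.07 | 87.78 | 86.08 | 84.04 | 80.16 | 81.69 |
| | DeepFam | 38.73 | 42.62 | 46.07 | 48.33 | 49.30 | 45.32 |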

  1. The best results are highlighted in bold font. The dataset is divided into six subgroups based on the number of predictions made by EnsembleFam. Taking the column “predCount = 5” as an example, accuracy is computed as follows. If EnsembleFam makes 5 function predictions for a protein and one of them is correct, the protein is counted as correct in the column “predCount = 5”; if all 5 predictions are incorrect, it is counted as incorrect. For the same protein, pHMM is credited with a correct prediction in that column if any of its predictions is correct, regardless of how many predictions it makes; otherwise the protein is counted as incorrect for pHMM. DeepFam makes exactly one prediction per protein, so the same protein is counted as correct for DeepFam in the column “predCount = 5” if and only if that sole prediction is correct. All accuracy values (%) in the table are averages over 3-fold cross-validation.
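The counting rule described in the footnote can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the record type `ProteinResult`, its fields, and the helpers `column_label`, `grouped_accuracy`, and `mean_over_folds` are hypothetical names. The sketch assumes each protein has exactly one true family and at least one EnsembleFam prediction, and it assumes the 3-fold figure is the plain mean of per-fold accuracies rather than a pooled count.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProteinResult:
    # Hypothetical per-protein record; field names are illustrative only.
    true_family: str
    ensemblefam_preds: list[str]  # all families predicted by EnsembleFam
    phmm_preds: list[str]         # all families predicted by pHMM (any number)
    deepfam_pred: str             # DeepFam's single prediction


def column_label(pred_count: int) -> str:
    """Map EnsembleFam's prediction count to a table column (assumes >= 1)."""
    if pred_count < 1:
        raise ValueError("assumed: EnsembleFam makes at least one prediction")
    return f"predCount = {pred_count}" if pred_count <= 5 else "predCount > 5"


def grouped_accuracy(results: list[ProteinResult]) -> dict[str, dict[str, float]]:
    """Accuracy (%) per column and method for one fold.

    Proteins are assigned to a column by EnsembleFam's prediction count only;
    every method is then scored on the same proteins in that column.
    """
    correct: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    totals: dict[str, int] = defaultdict(int)
    for r in results:
        col = column_label(len(r.ensemblefam_preds))
        totals[col] += 1
        # EnsembleFam and pHMM: correct if any prediction matches the true family.
        correct[col]["EnsembleFam"] += int(r.true_family in r.ensemblefam_preds)
        correct[col]["pHMM"] += int(r.true_family in r.phmm_preds)
        # DeepFam: correct iff its sole prediction matches.
        correct[col]["DeepFam"] += int(r.deepfam_pred == r.true_family)
    return {
        col: {m: 100.0 * n / totals[col] for m, n in correct[col].items()}
        for col in totals
    }


def mean_over_folds(
    fold_tables: list[dict[str, dict[str, float]]],
) -> dict[str, dict[str, float]]:
    """Average per-fold accuracies (assumes every fold has the same columns)."""
    first = fold_tables[0]
    return {
        col: {
            m: sum(t[col][m] for t in fold_tables) / len(fold_tables)
            for m in first[col]
        }
        for col in first
    }


if __name__ == "__main__":
    # Toy fold with made-up family labels "A".."D".
    fold = [
        ProteinResult("A", ["A", "B"], ["C"], "A"),
        ProteinResult("B", ["A", "C"], ["B", "D"], "A"),
        ProteinResult("C", ["C"], [], "B"),
    ]
    print(grouped_accuracy(fold)["predCount = 2"])
    # prints {'EnsembleFam': 50.0, 'pHMM': 50.0, 'DeepFam': 50.0}
```

Because the column is fixed by EnsembleFam's prediction count alone, all three methods are scored on the same set of proteins within each column, even when pHMM returns a different number of candidates.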