
Table 3 Sensitivity and specificity performance measures of binary classification for individual input variables taken from datasets

From: A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms

   

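Datasets (comprising evidence profiles): T. gondii, Plasmodium, C. elegans, Benchmark

| Input variable (a) | Type (b) | Data (c) | T. gondii SN | T. gondii SP | Plasmodium SN | Plasmodium SP | C. elegans SN | C. elegans SP | Benchmark SN | Benchmark SP |
|---|---|---|---|---|---|---|---|---|---|---|
| Phobius_TM | TM | D | 0.57 | 0.89 | 0.85 | 0.90 | 0.91 | 0.97 | 0.74 | 0.93 |
| Phobius_SP | SP | T | 0.52 | 0.89 | 0.39 | 1.00 | 0.25 | 0.99 | 0.49 | 0.96 |
| SignalP | SP | C | 0.52 | 1.00 | 0.39 | 1.00 | 0.25 | 1.00 | 0.39 | 1.00 |
| TargetP_SP | SP | C | 0.67 | 1.00 | 0.77 | 1.00 | 0.34 | 1.00 | 0.56 | 1.00 |
| TargetP_loc | SP | T | 0.67 | 0.94 | 0.76 | 1.00 | 0.27 | 1.00 | 0.56 | 1.00 |
| TMHMM_AA | TM | C | 0.62 | 0.89 | 0.66 | 0.98 | 0.91 | 1.00 | 0.80 | 1.00 |
| TMHMM_First60 | SP | C | 0.43 | 0.93 | 0.26 | 1.00 | 0.37 | 1.00 | 0.49 | 0.97 |
| TMHMM_TM | TM | D | 0.57 | 0.89 | 0.65 | 1.00 | 0.90 | 1.00 | 0.77 | 1.00 |
| WoLF_PSORT | Sub | C | 0.76 | 0.94 | 0.42 | 1.00 | 0.77 | 0.98 | 0.60 | 0.97 |
| WoLF_PSORT_annotation | Sub | T | 1.00 | 0.56 | 0.92 | 0.74 | 1.00 | 0.72 | 0.96 | 0.73 |
| MHCI | B | C | 0.76 | 0.56 | 0.78 | 0.84 | 0.77 | 0.69 | 0.74 | 0.84 |
| MHCII | B | C | 0.86 | 0.39 | 0.80 | 0.74 | 0.90 | 0.52 | 0.54 | 0.84 |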

  1. Abbreviations: SN = sensitivity; SP = specificity; T. gondii = Toxoplasma gondii; Plasmodium = species in the genus Plasmodium including falciparum, yoelii yoelii, and berghei; C. elegans = Caenorhabditis elegans; Benchmark = dataset comprising evidence for T. gondii and Neospora caninum proteins from published studies.
  2. (a) Input variable = predicted protein characteristic (i.e. a column from an evidence profile).
  3. (b) Type = prediction type: transmembrane domains (TM), secretory signal peptide (SP), sub-cellular location (Sub), peptide-MHC binding (B).
  4. (c) Data = data type: discrete (D), continuous (C), text (T).
  5. The values underlined denote the best performing input variable for classifying the published proteins.
  6. Test criteria on input variable for binary classification:
  7. Phobius_TM: YES if number of transmembrane domains > 0 else NO.
  8. Phobius_SP: YES if = ‘Y’ else NO.
  9. SignalP: YES if > 0.5 else NO.
  10. TargetP_SP: YES if > 0.5 else NO.
  11. TargetP_loc: YES if = ‘S’ else NO.
  12. TMHMM_AA: YES if > 18$$ else NO.
  13. TMHMM_First60: YES if > 10$$ else NO.
  14. TMHMM_TM: YES if number of transmembrane domains > 0 else NO.
  15. WoLF_PSORT: YES if > 16$$ else NO.
  16. WoLF_PSORT_annotation: YES if = ‘membrane’ or ‘secreted’ else NO.
  17. MHCI: YES if > 0.5 else NO.
  18. MHCII: YES if > 0.5 else NO.
  19. $$ A value recommended by the creator of the program.
  20. \(\text{Specificity} = \dfrac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}}\)
  21. \(\text{Sensitivity} = \dfrac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}\)
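
As an illustration of how the test criteria and the sensitivity and specificity definitions in the footnotes fit together, the following is a minimal Python sketch, not the authors' implementation. The names `CRITERIA` and `sensitivity_specificity` are hypothetical, the toy scores and labels are invented for the example, and the TMHMM_AA threshold is read here as 18. A label of `True` is assumed to mean the protein's expected class is YES.

```python
from math import nan
from typing import Callable, Dict, Sequence, Tuple

# Binary test criteria from the table footnotes: raw value -> YES (True) / NO (False).
# Dictionary name and layout are illustrative assumptions.
CRITERIA: Dict[str, Callable[[object], bool]] = {
    "Phobius_TM": lambda v: int(v) > 0,
    "Phobius_SP": lambda v: v == "Y",
    "SignalP": lambda v: float(v) > 0.5,
    "TargetP_SP": lambda v: float(v) > 0.5,
    "TargetP_loc": lambda v: v == "S",
    "TMHMM_AA": lambda v: float(v) > 18,  # threshold read here as 18 (assumption)
    "TMHMM_First60": lambda v: float(v) > 10,
    "TMHMM_TM": lambda v: int(v) > 0,
    "WoLF_PSORT": lambda v: float(v) > 16,
    "WoLF_PSORT_annotation": lambda v: v in ("membrane", "secreted"),
    "MHCI": lambda v: float(v) > 0.5,
    "MHCII": lambda v: float(v) > 0.5,
}


def sensitivity_specificity(
    variable: str, values: Sequence[object], labels: Sequence[bool]
) -> Tuple[float, float]:
    """Apply one variable's criterion and return (SN, SP) against the labels."""
    test = CRITERIA[variable]
    tp = fn = tn = fp = 0
    for value, is_positive in zip(values, labels):
        predicted_yes = test(value)
        if is_positive and predicted_yes:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_yes:
            fp += 1
        else:
            tn += 1
    # SN = TP / (TP + FN); SP = TN / (TN + FP); undefined if a class is empty.
    sn = tp / (tp + fn) if (tp + fn) else nan
    sp = tn / (tn + fp) if (tn + fp) else nan
    return sn, sp


# Toy data (invented for illustration only).
scores = [0.9, 0.7, 0.2, 0.4, 0.6, 0.1]
labels = [True, True, True, False, False, False]
sn, sp = sensitivity_specificity("SignalP", scores, labels)
print(f"SignalP: SN={sn:.2f} SP={sp:.2f}")
```

Keeping each criterion as a small function in one dictionary lets the same counting loop serve discrete, continuous, and text variables alike.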