Table 3 Benchmarking FOSTA against the PIRSF dataset

From: Automatically extracting functionally equivalent proteins from SwissProt

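| Set | Families | Pairings | TP | FP | TN | FN | PPV (%) | MCC |
|-----|----------|----------|----|----|----|----|---------|-----|
| A | 122 | 2127 | 1744 | 2 | 3717 | 383 | 99.89 | 0.86 |
| B | 1095 | 18865 | 12967 | 23 | 34656 | 5898 | 99.82 | 0.77 |
| C | 474 | 11221 | 9146 | 62 | 11819 | 2075 | 99.33 | 0.83 |
| D | 339 | 5287 | 3674 | 16 | 4938 | 1613 | 99.57 | 0.72 |
| N | 1691 | 32213 | 23857 | 87 | 50192 | 8356 | 99.64 | 0.79 |
| * | 2020 | 37500 | 27531 | 103 | 55130 | 9969 | 99.63 | 0.79 |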
Set ID: the identifier for each curation set [A = 'Full/Desc.', B = 'Full', C = 'Preliminary', D = 'None', N = aNnotated (A+B+C), * = All (N+D)]; Curation string: the string that defines the curation set; Families: the number of discrete protein families in the curation set; Pairings: the number of discrete pairings across all families to be tested in FOSTA; Basic statistics: the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN); Evaluation statistics: the PPV (positive predictive value, TP/(TP + FP), expressed as a percentage) and the MCC (Matthews Correlation Coefficient), both rounded to two decimal places.
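The evaluation statistics can be recomputed from the basic counts. The sketch below is illustrative only and is not taken from the FOSTA code: the function names `ppv` and `mcc` are hypothetical, the MCC is assumed to follow the standard definition (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), and returning 0.0 for a zero denominator is a common convention rather than something the table specifies.

```python
import math


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value as a percentage: 100 * TP / (TP + FP)."""
    return 100.0 * tp / (tp + fp)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient (standard definition, assumed here)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention (an assumption): MCC is taken as 0 when undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Counts for set A, copied from the table above.
tp, fp, tn, fn = 1744, 2, 3717, 383

print(round(ppv(tp, fp), 2))          # 99.89
print(round(mcc(tp, fp, tn, fn), 2))  # 0.86
```

For set A this reproduces the tabulated PPV and MCC. In every row, TP + FN also equals the Pairings column (for set A, 1744 + 383 = 2127).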