Table 3 Benchmarking FOSTA against the PIRSF dataset

From: Automatically extracting functionally equivalent proteins from SwissProt

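| Set | Families | Pairings | Basic statistics | | | | Evaluation statistics | |
|---|---|---|---|---|---|---|---|---|
| | | | TP | FP | TN | FN | PPV (%) | MCC |
| A | 122 | 2127 | 1744 | 2 | 3717 | 383 | 99.89 | 0.86 |
| B | 1095 | 18865 | 12967 | 23 | 34656 | 5898 | 99.82 | 0.77 |
| C | 474 | 11221 | 9146 | 62 | 11819 | 2075 | 99.33 | 0.83 |
| D | 339 | 5287 | 3674 | 16 | 4938 | 1613 | 99.57 | 0.72 |
| N | 1691 | 32213 | 23857 | 87 | 50192 | 8356 | 99.64 | 0.79 |
| * | 2020 | 37500 | 27531 | 103 | 55130 | 9969 | 99.63 | 0.79 |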

  1. Set: the identifier for each curation set [A = 'Full/Desc.', B = 'Full', C = 'Preliminary', D = 'None', N = aNnotated (A+B+C), * = All (N+D)], where the quoted curation string is the string that defines the curation set. Families: the number of discrete protein families in the curation set. Pairings: the number of discrete pairings across all families to be tested in FOSTA. Basic statistics: the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). Evaluation statistics: the PPV (positive predictive value, TP/(TP + FP), expressed as a percentage) and the MCC (Matthews Correlation Coefficient), both rounded to 2 decimal places.
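
As a quick check on how the evaluation statistics follow from the basic counts, the sketch below recomputes PPV and MCC for each row. It is not code from the paper: the function names `ppv` and `mcc` and the `ROWS` dictionary are illustrative. The table does not spell out the MCC formula, so the standard definition, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), is assumed here.

```python
import math

# Basic counts copied from the table above, as (TP, FP, TN, FN) per set.
ROWS = {
    "A": (1744, 2, 3717, 383),
    "B": (12967, 23, 34656, 5898),
    "C": (9146, 62, 11819, 2075),
    "D": (3674, 16, 4938, 1613),
    "N": (23857, 87, 50192, 8356),
    "*": (27531, 103, 55130, 9969),
}


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value as a percentage: 100 * TP / (TP + FP)."""
    return 100.0 * tp / (tp + fp)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient (standard definition, assumed here)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    for set_id, (tp, fp, tn, fn) in ROWS.items():
        print(f"{set_id}: PPV = {ppv(tp, fp):.2f}%, MCC = {mcc(tp, fp, tn, fn):.2f}")
```

For set A, for example, this reproduces the tabulated PPV of 99.89 and MCC of 0.86. In every row TP + FN equals the Pairings column.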