Table 1 General properties of the investigated benchmark datasets

From: Representativeness of variation benchmark datasets

| dataset | collection | subset of VariBench dataset | original filename | no. of variants | no. of variants mapped to PDB | % mapped to PDB | no. of variants in ExAC | % in ExAC |
|---|---|---|---|---|---|---|---|---|
| DS1 | VariSNP | | Neutral single nucleotide variants | 446,013 | 39,081 | 8.76 | | |
| DS2 | VariBench Dataset 1 | | Neutral_dbSNP_build_131_mapped | 23,671 | 2358 | 9.96 | | |
| DS3 | VariBench Dataset 1 | | Pathogenic_SNP_mapped | 19,335 | 10,242 | 52.97 | 263 | 1.36 |
| DS4 | VariBench Dataset 2 | 1 | Neutral_dataset_Olatubosun_et_al_with_mapping_annotated | 19,459 | 2245 | 11.54 | | |
| DS5 | VariBench Dataset 2 | 1 | Pathogenic_training_dataset_from_PONP | 14,610 | 7261 | 49.7 | 221 | 1.51 |
| DS6 | VariBench Dataset 4 | 1 | Neutral_dataset_from_Thusberg_et_al_clustered_with_mapping | 17,623 | 1743 | 9.89 | | |
| DS7 | VariBench Dataset 4 | 1 | Pathogenic_dataset_Thusberg_et_al_clustered_with_mapping | 17,525 | 9519 | 54.32 | 227 | 1.30 |
| DS8 | VariBench Dataset 5 | 2 | Neutral_dataset_Olatubosun_et_al_clustered_with_mapping | 14,647 | 1706 | 11.65 | | |
| DS9 | VariBench Dataset 5 | 2 | Pathogenic_dataset_Olatubosun_et_al_clustered_with_mapping | 13,096 | 6652 | 50.79 | 195 | 1.49 |
| DS10 | VariBench Dataset 7 | 2 | Neutral_PON-P2_training_data | 13,063 | 1731 | 13.25 | | |
| DS11 | VariBench Dataset 7 | 2 | Pathogenic_PON-P2_training_data | 12,584 | 6420 | 51.02 | 173 | 1.37 |
| DS12 | VariBench Dataset 7 | 2 | Neutral_PON-P2_test_data | 1605 | 150 | 9.35 | | |
| DS13 | VariBench Dataset 7 | 2 | Pathogenic_PON-P2_test_data.csv | 1301 | 481 | 36.97 | 23 | 1.77 |
| DS14 | VariBench Dataset 7 | 2 | Neutral_PON-P2_c95_training | 8664 | 953 | 11 | | |
| DS15 | VariBench Dataset 7 | 2 | Pathogenic_PON-P2_c95_training | 7151 | 3728 | 52.13 | 81 | 1.13 |
| DS16 | VariBench Dataset 7 | 2 | Neutral_PON-P2_c95_test | 1053 | 82 | 7.79 | | |
| DS17 | VariBench Dataset 7 | 2 | Pathogenic_PON-P2_c95_test | 751 | 272 | 36.22 | 12 | 1.60 |
| DS18 | VariBench Dataset 9 | | predictSNP_selected_tool_scores | 16,098 | 4494 | 27.92 | | |
| DS19 | VariBench Dataset 9 | | varibench_selected_tool_scores | 10,266 | 3418 | 33.29 | | |
| DS20 | VariBench Dataset 9 | | exovar_filtered_tool_scores | 8850 | 2985 | 33.73 | | |
| DS21 | VariBench Dataset 9 | | humvar_filtered_tool_scores | 40,389 | 10,990 | 27.21 | | |
| DS22 | PolyPhen-2 | | humvar-2011_12.neutral.humvar.output | 21,151 | 2169 | 10.25 | | |
| DS23 | PolyPhen-2 | | humvar-2011_12.deleterious.humvar.output | 22,196 | 10,290 | 46.36 | 342 | 1.54 |
| DS24 | SwissVar | | SwissVar_latest | 75,042 | 12,749 | 16.99 | 16,049 | 21.39 |
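
The percentage columns in Table 1 appear to be the corresponding count divided by the total number of variants in the dataset, rounded to two decimals. The short Python sketch below recomputes them for three rows as a consistency check. The helper name `percent_of` is hypothetical, and treating blank ExAC cells as "not reported" (`None`) is an assumption rather than something the table states.

```python
def percent_of(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 2)


# Counts copied from Table 1: (dataset, variants, mapped to PDB, in ExAC).
# None marks a blank ExAC cell (assumed to mean "not reported").
rows = [
    ("DS1", 446_013, 39_081, None),
    ("DS3", 19_335, 10_242, 263),
    ("DS24", 75_042, 12_749, 16_049),
]

for name, total, in_pdb, in_exac in rows:
    pdb_pct = percent_of(in_pdb, total)
    exac_pct = None if in_exac is None else percent_of(in_exac, total)
    print(name, pdb_pct, exac_pct)
```

For these three rows the recomputed values agree with the percentages listed in the table.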