
Table 3 Global statistics comparison between TBGA, BioRel [24], and DTI [10] datasets

From: TBGA: a large-scale Gene-Disease Association dataset for Biomedical Relation Extraction

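| Dataset | Split      | Instances | Bags    | Instances/bag | Relations |
|---------|------------|----------:|--------:|--------------:|----------:|
| BioRel  | Train      | 534,277   | 39,969  | 13.37         | 125       |
|         | Validation | 114,506   | 20,675  | 5.54          |           |
|         | Test       | 114,565   | 20,756  | 5.52          |           |
| DTI     | Train      | 604,303   | 472,033 | 1.28          | 6         |
|         | Validation | 6,133     | 4,769   | 1.29          |           |
|         | Test       | 6,312     | 4,817   | 1.31          |           |
| TBGA    | Train      | 178,264   | 85,047  | 2.10          | 4         |
|         | Validation | 20,193    | 10,491  | 1.92          |           |
|         | Test       | 20,516    | 10,494  | 1.96          |           |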

Note: Statistics are reported separately for each data split. Columns represent, from left to right, the dataset, the data split, the total number of instances and bags, the average number of instances per bag, and the total number of relations in the dataset.
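As a quick consistency check, the average number of instances per bag is the number of instances divided by the number of bags for each split, rounded to two decimals. The Python sketch below recomputes it from the counts reported in the table. The dictionary layout and the helper name `instances_per_bag` are illustrative choices for this sketch and are not part of the TBGA, BioRel, or DTI releases.

```python
# Per-split (instances, bags) counts copied from Table 3.
SPLITS = {
    ("BioRel", "Train"): (534_277, 39_969),
    ("BioRel", "Validation"): (114_506, 20_675),
    ("BioRel", "Test"): (114_565, 20_756),
    ("DTI", "Train"): (604_303, 472_033),
    ("DTI", "Validation"): (6_133, 4_769),
    ("DTI", "Test"): (6_312, 4_817),
    ("TBGA", "Train"): (178_264, 85_047),
    ("TBGA", "Validation"): (20_193, 10_491),
    ("TBGA", "Test"): (20_516, 10_494),
}


def instances_per_bag(instances: int, bags: int) -> float:
    """Average bag size (hypothetical helper), rounded to two decimals as in the table."""
    return round(instances / bags, 2)


if __name__ == "__main__":
    for (dataset, split), (inst, bags) in SPLITS.items():
        # Print one row per split: dataset, split, recomputed average.
        print(f"{dataset:<7} {split:<11} {instances_per_bag(inst, bags):.2f}")
```

The recomputed averages match the reported values for all nine splits. They also make the contrast between datasets concrete: BioRel training bags hold about 13 sentences each, whereas DTI and TBGA bags hold roughly one to two.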