Table 2 Dataset dimensionalities

From: Semantically linking molecular entities in literature through entity relationships

| Relation type | Train instances | Test instances |
|---|---|---|
| Protein-Component (ST) | 1689 | 334 |
| Subunit-Complex (ST) | 751 | 163 |
| Equivalence (GENIA - E) | 720 | 129 |
| Functional (GENIA - E) | 110 | 17 |
| Locus (GENIA - E) | 11 | 5 |
| Member-Collection (GENIA - E) | 5 | 0 |
| Misc (GENIA - E) | 53 | 11 |
| Object-Variant (GENIA - E) | 14 | 5 |
| Out-of (GENIA - E) | 40 | 7 |
| Protein-Component (GENIA - E) | 222 | 51 |
| Subunit-Complex (GENIA - E) | 108 | 22 |
| Member-Collection (GENIA - NE) | 760 | 181 |
| Protein-Component (GENIA - NE) | 593 | 174 |
| Subunit-Complex (GENIA - NE) | 275 | 82 |

  1. Number of positive instances of the various types in the entity relation corpora. ST refers to the BioNLP'11 Shared Task data, while GENIA refers to the GENIA relation corpus. The latter corpus is further divided into embedded (E) and non-embedded (NE) cases. Datasets sufficiently large for classification analysis are in bold.
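
The counts in Table 2 can be aggregated per corpus subset with a few lines of Python. This is only an illustrative sketch: the row values are copied from the table, while the names `ROWS` and `totals_by_subset` are hypothetical and not part of the original work.

```python
# Rows copied from Table 2: (relation type, corpus subset, train, test).
# ST = BioNLP'11 Shared Task data; GENIA - E / NE = embedded / non-embedded.
ROWS = [
    ("Protein-Component", "ST", 1689, 334),
    ("Subunit-Complex", "ST", 751, 163),
    ("Equivalence", "GENIA - E", 720, 129),
    ("Functional", "GENIA - E", 110, 17),
    ("Locus", "GENIA - E", 11, 5),
    ("Member-Collection", "GENIA - E", 5, 0),
    ("Misc", "GENIA - E", 53, 11),
    ("Object-Variant", "GENIA - E", 14, 5),
    ("Out-of", "GENIA - E", 40, 7),
    ("Protein-Component", "GENIA - E", 222, 51),
    ("Subunit-Complex", "GENIA - E", 108, 22),
    ("Member-Collection", "GENIA - NE", 760, 181),
    ("Protein-Component", "GENIA - NE", 593, 174),
    ("Subunit-Complex", "GENIA - NE", 275, 82),
]


def totals_by_subset(rows):
    """Sum train and test positive instances for each corpus subset."""
    totals = {}
    for _relation, subset, train, test in rows:
        prev_train, prev_test = totals.get(subset, (0, 0))
        totals[subset] = (prev_train + train, prev_test + test)
    return totals


if __name__ == "__main__":
    for subset, (train, test) in totals_by_subset(ROWS).items():
        print(f"{subset}: {train} train, {test} test")
```

Keeping the corpus subset as its own field, rather than as part of the relation label, makes it straightforward to compare the same relation type (for example Protein-Component) across ST, GENIA - E, and GENIA - NE.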