BMC Bioinformatics

Table 2 Dataset dimensionalities

From: Semantically linking molecular entities in literature through entity relationships

Relation type	Train instances	Test instances
Protein-Component (ST)	1689	334
Subunit-Complex (ST)	751	163
Equivalence (GENIA - E)	720	129
Functional (GENIA - E)	110	17
Locus (GENIA - E)	11	5
Member-Collection (GENIA - E)	5	0
Misc (GENIA - E)	53	11
Object-Variant (GENIA - E)	14	5
Out-of (GENIA - E)	40	7
Protein-Component (GENIA - E)	222	51
Subunit-Complex (GENIA - E)	108	22
Member-Collection (GENIA - NE)	760	181
Protein-Component (GENIA - NE)	593	174
Subunit-Complex (GENIA - NE)	275	82

Number of positive instances of the various types in the entity relation corpora. ST refers to the BioNLP'11 Shared Task data, while GENIA refers to the GENIA relation corpus. The latter corpus is further divided into embedded (E) and non-embedded (NE) cases. Datasets sufficiently large for classification analysis are in bold.

ISSN: 1471-2105
