Table 2 Analysis of false negative sentences

From: Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation

Reason(s) for search failure                          Percentage of total sentences (n = 78)
-----------------------------------------------------------------------------------------------
Non-standard protein nomenclature (NSN)               39.7% (n = 31)
Missing category term(s) (MT)                          3.8% (n = 3)
Information spread over multiple sentences (IMS)      11.5% (n = 9)
Information expressed with <3 categories (IFC)         6.4% (n = 5)
NSN + MT                                               7.7% (n = 6)
NSN + IMS                                              3.8% (n = 3)
NSN + IFC                                              3.8% (n = 3)
MT + IMS                                              10.3% (n = 8)
MT + IFC                                               1.3% (n = 1)
NSN + MT + IMS                                         6.4% (n = 5)
Technical issues                                       5.1% (n = 4)
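The counts in the table can be re-tabulated as a quick consistency check. The Python sketch below does two things:

* It recomputes each row's percentage from its count.
* It tallies how often each failure reason occurs, whether alone or in combination with other reasons.

The per-reason totals are derived here and are not reported in the table. The row codes (NSN, MT, IMS, IFC) come from the table. The `TECH` key and the function names are hypothetical labels chosen for this sketch.

```python
from collections import Counter

# Row counts copied from Table 2; each key is a tuple of failure-reason codes.
# "TECH" is a hypothetical shorthand for the "Technical issues" row.
COUNTS = {
    ("NSN",): 31,
    ("MT",): 3,
    ("IMS",): 9,
    ("IFC",): 5,
    ("NSN", "MT"): 6,
    ("NSN", "IMS"): 3,
    ("NSN", "IFC"): 3,
    ("MT", "IMS"): 8,
    ("MT", "IFC"): 1,
    ("NSN", "MT", "IMS"): 5,
    ("TECH",): 4,
}

TOTAL = sum(COUNTS.values())  # total false-negative sentences (78)


def row_percentages(counts, total):
    """Percentage of all sentences for each table row, rounded to one decimal."""
    return {reasons: round(100 * n / total, 1) for reasons, n in counts.items()}


def reason_totals(counts):
    """Number of sentences in which each reason occurs, alone or combined."""
    totals = Counter()
    for reasons, n in counts.items():
        for reason in reasons:
            totals[reason] += n
    return totals


if __name__ == "__main__":
    for reasons, pct in row_percentages(COUNTS, TOTAL).items():
        print(" + ".join(reasons), f"{pct}%")
    for reason, n in reason_totals(COUNTS).most_common():
        print(reason, n, f"{100 * n / TOTAL:.1f}%")
```

A sentence with combined reasons counts toward every reason it involves. As a result, the per-reason totals (NSN 48, MT 23, IMS 25, IFC 9, TECH 4) sum to more than 78. For example, non-standard nomenclature plays a part in 48 of the 78 false negatives, about 61.5%.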