Entities, relations, events: representing biomolecular semantics
© Pyysalo; licensee BioMed Central Ltd. 2010
Published: 20 September 2010
Biomedical information extraction efforts have until recently primarily focused on the detection of mentions of named entities (NEs) (e.g. genes and proteins) and the recognition of simple associations of these entities, predominantly modeled as pairwise relations. While applicable to many key tasks such as the recognition of protein-protein interactions, the limitations of the relation representation are becoming increasingly apparent in the pursuit of advanced extraction and text mining targets such as Gene Ontology annotations and metabolic and signaling pathways.
A number of recent studies have proposed more expressive alternatives to the relation representation, along with annotated resources such as the BioInfer (http://www.it.utu.fi/BioInfer) and GENIA Event (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/) corpora. A major step toward practical systems capable of extracting such representations was taken in the BioNLP 2009 Shared Task on Event Extraction. Providing annotation for gene/protein NEs as a starting point, the task centered on the extraction of an event representation that can capture the associations of arbitrary numbers of participants in specified roles (e.g. Theme and Cause). The representation further connects events to specific statements in text and treats them as primary objects of annotation, allowing events to act as participants in other events and to be specified as being negated or stated speculatively.
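The event representation described above can be illustrated with a small data structure. The following Python code is a minimal sketch, not the shared task's actual file format or API: the class names, field names, identifiers (T1, E1, ...) and the helper entities_in are assumptions made for illustration, and the example event is invented rather than taken from any corpus.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A gene/protein named-entity mention, assumed given as input."""
    id: str
    text: str


@dataclass
class Event:
    """An event tied to a specific trigger expression in text."""
    id: str
    type: str
    trigger: str
    # (role, participant) pairs; any number is allowed, and a
    # participant may itself be another Event.
    args: List[Tuple[str, Union[Entity, "Event"]]] = field(default_factory=list)
    negated: bool = False
    speculated: bool = False


def entities_in(event: Event) -> List[Entity]:
    """Collect the entities reachable through nested participants."""
    found: List[Entity] = []
    for _role, participant in event.args:
        if isinstance(participant, Event):
            found.extend(entities_in(participant))
        else:
            found.append(participant)
    return found


# Invented illustration: a speculated regulation event whose Theme is
# another event and whose Cause is an entity.
p50 = Entity("T1", "p50")
c_rel = Entity("T2", "c-Rel")
phos = Event("E1", "Phosphorylation", "phosphorylation", [("Theme", p50)])
reg = Event("E2", "Regulation", "regulate",
            [("Theme", phos), ("Cause", c_rel)], speculated=True)
```

A pairwise relation could record only that the two proteins are associated. In this sketch the nested structure additionally keeps the roles, the intermediate event and the speculation flag, while entities_in(reg) still recovers the two underlying NEs.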
Mentions of entity names (e.g. p53) serve as the basis for event extraction, as they provide a connection to specific real-world entities. However, this choice implies some approximations in representation: statements involving, for example, the phrase "complex of c-Rel and p50" are modeled as events with the NEs (c-Rel and p50) as participants. Marking either a non-specific term such as "complex" or the entire phrase as a participant can capture more context, but also opens a new question for automatic processing: what do events involving such entities imply for the NEs that connect the representation to reality?
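The three choices of participant can be compared in terms of character spans. The sketch below is a hypothetical illustration only: the helper names find_span and contained, and the use of character offsets, are assumptions rather than part of any annotation scheme discussed here.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def find_span(text: str, mention: str) -> Span:
    """Return the span of the first occurrence of mention in text."""
    start = text.index(mention)
    return (start, start + len(mention))


def contained(outer: Span, inner: Span) -> bool:
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


text = "the complex of c-Rel and p50"
ne_spans: Dict[str, Span] = {
    name: find_span(text, name) for name in ("c-Rel", "p50")
}

# Two alternative participant choices besides the NEs themselves.
phrase = find_span(text, "complex of c-Rel and p50")
head = find_span(text, "complex")

nes_in_phrase: List[str] = [n for n, s in ne_spans.items() if contained(phrase, s)]
nes_in_head: List[str] = [n for n, s in ne_spans.items() if contained(head, s)]
```

Here nes_in_phrase holds both NEs while nes_in_head is empty: marking the whole phrase keeps the NEs recoverable by span containment, whereas marking only "complex" loses the direct link. In neither case does containment say what an event involving the phrase implies for c-Rel or p50 individually, which is the open question raised above.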
Whether the detail afforded by such a model is of sufficient practical value to outweigh the challenges in its automatic extraction remains an interesting question for future study.
- Kim JD, Ohta T, Pyysalo S, Kano Y, Tsujii J: Overview of BioNLP'09 Shared Task on Event Extraction. Proceedings of the BioNLP 2009 Shared Task, Boulder, Colorado 2009, 1–9. [http://www.aclweb.org/anthology/W09-1401]
- Pyysalo S, Ohta T, Kim JD, Tsujii J: Static Relations: a Piece in the Biomedical Information Extraction Puzzle. In Proceedings of Natural Language Processing in Biomedicine (BioNLP) NAACL 2009 Workshop. Boulder, Colorado: Association for Computational Linguistics; 2009:1–9.
This article is published under license to BioMed Central Ltd.