- Oral presentation
- Open Access
Entities, relations, events: representing biomolecular semantics
- Sampo Pyysalo
https://doi.org/10.1186/1471-2105-11-S5-O6
© Pyysalo; licensee BioMed Central Ltd. 2010
- Published: 20 September 2010
Keywords
- Gene Ontology
- Relation Representation
- Text Mining
- Shared Task
- Event Extraction
Biomedical information extraction efforts have until recently primarily focused on the detection of mentions of named entities (NEs) (e.g. genes and proteins) and the recognition of simple associations of these entities, predominantly modeled as pairwise relations. While applicable to many key tasks such as the recognition of protein-protein interactions, the limitations of the relation representation are becoming increasingly apparent in the pursuit of advanced extraction and text mining targets such as Gene Ontology annotations and metabolic and signaling pathways.
A number of recent studies have proposed more expressive alternatives to the relation representation, along with annotated resources such as the BioInfer (http://www.it.utu.fi/BioInfer) and GENIA Event (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/) corpora. A major step toward practical systems capable of extracting such representations was taken in the BioNLP 2009 Shared Task on Event Extraction [1]. Providing annotation for gene/protein NEs as a starting point, the task centered on the extraction of an event representation that can capture the associations of arbitrary numbers of participants in specified roles (e.g. Theme and Cause). The representation further connects events to specific statements in text and treats them as primary objects of annotation, allowing events to act as participants in other events and to be specified as being negated or stated speculatively.
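The event representation described above can be pictured as a small data structure. The following is a minimal sketch, not the shared task's actual file format: the class names (`Span`, `Entity`, `Event`), the helper functions, and the example sentence are all hypothetical and chosen only for illustration, while the role names Theme and Cause follow the text. It shows an event anchored to a specific stretch of text, taking any number of participants in named roles, acting as a participant in another event, and carrying negation and speculation flags.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Span:
    """A stretch of text identified by character offsets (hypothetical type)."""
    start: int
    end: int
    text: str


@dataclass(frozen=True)
class Entity:
    """A named-entity mention such as a gene/protein name."""
    span: Span


@dataclass
class Event:
    """An event tied to the text that expresses it (its trigger)."""
    event_type: str
    trigger: Span
    # (role, participant) pairs; a participant is an entity or another event.
    arguments: List[Tuple[str, Union[Entity, "Event"]]] = field(default_factory=list)
    negated: bool = False
    speculated: bool = False


def find_span(sentence: str, text: str) -> Span:
    """Locate the first occurrence of `text` in `sentence` as a Span."""
    start = sentence.index(text)
    return Span(start, start + len(text), text)


def entities_of(event: Event) -> List[str]:
    """Collect entity names reachable from an event, following nested events."""
    names: List[str] = []
    for _role, participant in event.arguments:
        if isinstance(participant, Event):
            names.extend(entities_of(participant))
        else:
            names.append(participant.span.text)
    return names


# Invented example sentence, used only to exercise the structure.
sentence = "IL-4 may induce phosphorylation of STAT6."
il4 = Entity(find_span(sentence, "IL-4"))
stat6 = Entity(find_span(sentence, "STAT6"))

# Inner event: a single Theme participant.
phosphorylation = Event(
    "Phosphorylation",
    find_span(sentence, "phosphorylation"),
    [("Theme", stat6)],
)

# Outer event: its Theme is itself an event, and "may" marks it as speculative.
regulation = Event(
    "Positive_regulation",
    find_span(sentence, "induce"),
    [("Theme", phosphorylation), ("Cause", il4)],
    speculated=True,
)

print(entities_of(regulation))
```

A pairwise relation could link IL-4 and STAT6 directly, but it would have no place to record that the association is a regulation of a phosphorylation event, or that the statement is speculative. The nested structure keeps both.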
Mentions of entity names (e.g. p53) serve as the basis for event extraction, as they provide a connection to specific real-world entities. However, this choice implies some approximations in representation: statements involving, for example, "complex of c-Rel and p50" are modeled as events with the NEs (c-Rel and p50) as participants. Marking either a non-specific term such as "complex" or the entire phrase as a participant can capture more context, but it also opens a new question for automatic processing: what do events involving such entities imply for the NEs that connect the representation to reality?
Entities, relations and events. Entities are shown on a light blue background, with gene/protein names underlined; relations are shown as labeled arcs below the text (asymmetric relations with arrows); and the event is shown above the text, with labeled arcs marking participants and their roles and a light green background for the text expressing the event. (Example modified from the BioInfer corpus.)
Whether the detail afforded by such a model is of sufficient practical value to outweigh the challenges in its automatic extraction remains an interesting question for future study.
References
- Kim JD, Ohta T, Pyysalo S, Kano Y, Tsujii J: Overview of BioNLP'09 Shared Task on Event Extraction. Proceedings of the BioNLP 2009 Shared Task, Boulder, Colorado 2009, 1–9. [http://www.aclweb.org/anthology/W09-1401]
- Pyysalo S, Ohta T, Kim JD, Tsujii J: Static Relations: a Piece in the Biomedical Information Extraction Puzzle. In Proceedings of Natural Language Processing in Biomedicine (BioNLP) NAACL 2009 Workshop. Boulder, Colorado: Association for Computational Linguistics; 2009:1–9.
Copyright
This article is published under license to BioMed Central Ltd.