BioLMiner and the BioCreative II.5 challenge

Chen, Yifei; Liu, Feng; Manderick, Bernard

doi:10.1186/1471-2105-11-S5-P6

Volume 11 Supplement 5

Workshop on Advances in Bio Text Mining

Poster presentation
Open access
Published: 06 October 2010


BMC Bioinformatics volume 11, Article number: P6 (2010)

This paper proposes a prototype text mining system, BioLMiner (Biological Literature Miner). BioLMiner automatically extracts useful information from biological literature, such as gene mentions, normalized gene mentions, interaction articles and protein-protein interaction (PPI) pairs. Figure 1 shows the overall system architecture of BioLMiner. In the future, we will automate all communication between the subsystems, and we plan to make BioLMiner available as open source software.

The input data are original articles from biological literature databases such as MEDLINE [http://medline.cos.com/] or from journals such as FEBS Letters [http://www.elsevier.com/locate/febslet/]. The output data are the annotated articles together with the extracted information. Existing gene and protein databases and other biological resources serve as external background knowledge, including Entrez Gene [http://jura.wi.mit.edu/entrez_gene/], UniProt [http://www.uniprot.org], MINT [http://mint.bio.uniroma2.it], IntAct [http://www.ebi.ac.uk/intact] and BioThesaurus [http://pir.georgetown.edu/iprolink/biothesaurus].

The core components of BioLMiner are:

the Gene Mention Recognizer (GMRer)
the Gene Normalizer (GNer)
the Interaction Article Classifier (IACer)
the Protein-Protein Interaction Pair Extractor (PPIEor)

Two machine learning techniques are used to develop the four components: Support Vector Machines (SVMs) [1] for classification problems and Conditional Random Fields (CRFs) [2] for sequence labeling problems. GMRer is a hybrid recognizer based on one sequence labeling model using CRFs and two classification models using SVMs. GNer, IACer and PPIEor are each built as a binary classifier using SVMs. To achieve good performance, our main effort goes into designing methods that extract rich and informative features and combine them effectively. These features fuse information from the context in the article, domain-specific knowledge, and the analysis produced by natural language processing (NLP) tools, including tools specific to the biological domain (Bio-NLP). A full description of BioLMiner can be found in [3, 4].
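The two problem types named above can be illustrated with a minimal sketch. It is not BioLMiner's implementation: the B/I/O tag scheme, the function names and the placeholder protein names are all assumptions made for illustration. The first function shows how gene mentions are commonly encoded as per-token labels for a sequence labeling model such as a CRF. The second shows how candidate pairs can be enumerated so that a binary classifier such as an SVM can accept or reject each one.

```python
from itertools import combinations


def bio_tags(tokens, mention_spans):
    """Encode mentions as per-token labels (assumed B/I/O scheme).

    mention_spans holds (start, end) token indices, with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end in mention_spans:
        tags[start] = "B"              # first token of a mention
        for i in range(start + 1, end):
            tags[i] = "I"              # remaining tokens of the mention
    return tags


def candidate_pairs(protein_ids):
    """List every unordered pair of distinct proteins.

    A binary classifier would then label each pair as interacting or not.
    Self-interactions are deliberately left out of this simple sketch.
    """
    return list(combinations(sorted(set(protein_ids)), 2))


# Hypothetical sentence with placeholder protein names.
tokens = ["ProtA", "binds", "ProtB", "kinase", "in", "vivo"]
tags = bio_tags(tokens, [(0, 1), (2, 4)])
pairs = candidate_pairs(["ProtB", "ProtA", "ProtB"])

print(tags)
print(pairs)
```

In a real system each token and each candidate pair would be turned into a feature vector before being passed to the model; the feature design is where the text above says most of the effort goes.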

BioLMiner participated in two tasks of the BioCreative II.5 challenge [5]: the interaction normalization task (INT), using GNer, and the interaction pair task (IPT), using PPIEor. For the INT, the F-measure (β = 1) was 0.289, which ranked second among the 10 participating teams. For the IPT, the F-measure (β = 1) was 0.252, which ranked first among the 9 participating teams.
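For readers unfamiliar with the metric, the sketch below computes the standard F-measure from precision and recall; with β = 1 it reduces to their harmonic mean. The function name and the input values are illustrative assumptions, not the challenge's evaluation script or BioLMiner's actual precision and recall, which are not given here.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-measure: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        return 0.0  # convention used here when the score is undefined
    return (1.0 + b2) * precision * recall / denominator


# Illustrative values only: precision 0.5 and recall 0.25.
example_score = f_beta(0.5, 0.25)
print(round(example_score, 3))  # prints 0.333
```

Because the harmonic mean is pulled toward the smaller of the two values, a system needs both reasonable precision and reasonable recall to score well.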

The current state-of-the-art performance is far from satisfactory, especially for the IPT. PPI pairs that appear in figures or tables, that span different sentences, or in which a protein interacts with itself cannot yet be handled well. More advanced techniques need to be explored in the future, such as anaphora resolution for semantic analysis to detect inter-sentence PPI pairs.

References

1. Vapnik V: The nature of statistical learning theory. New York: Springer; 1995.
2. Lafferty J, McCallum A, Pereira F: Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001) 2001, 282–289.
3. Chen Y: Biological Literature Miner: Gene Mention Recognition and Protein-Protein Interaction Pair Extraction. PhD thesis. Vrije Universiteit Brussel; 2010.
4. Liu F: Biological Literature Miner: Gene Normalization and Interaction Article Classification. PhD thesis. Vrije Universiteit Brussel; 2010.
5. Krallinger M, Leitner F, Valencia A: The BioCreative II.5 challenge overview. Proceedings of BioCreative II.5 Workshop 2009, 19.

Author information

Authors and Affiliations

Computational Modeling Lab, Department of Informatics, Vrije Universiteit Brussel, Brussels, B-1050, Belgium
Yifei Chen, Feng Liu & Bernard Manderick


Corresponding author

Correspondence to Bernard Manderick.

Rights and permissions

Open Access: This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Chen, Y., Liu, F. & Manderick, B. BioLMiner and the BioCreative II.5 challenge. BMC Bioinformatics 11 (Suppl 5), P6 (2010). https://doi.org/10.1186/1471-2105-11-S5-P6
