Medie and Info-pubmed: 2010 update

Ohta, Tomoko; Matsuzaki, Takuya; Okazaki, Naoaki; Miwa, Makoto; Sætre, Rune; Pyysalo, Sampo; Tsujii, Jun’ichi

doi:10.1186/1471-2105-11-S5-P7

Volume 11 Supplement 5

Workshop on Advances in Bio Text Mining

Poster presentation
Open access
Published: 06 October 2010


BMC Bioinformatics volume 11, Article number: P7 (2010)


Introduction

In recent decades, high-throughput screening methods have become established, bringing major breakthroughs in molecular biology and biomedicine. Because researchers in these fields must interpret an enormous quantity of data while the publication rate of scientific articles is rising sharply, demand for text mining technology grows every year.

Medie (http://www-tsujii.is.s.u-tokyo.ac.jp/medie/) and Info-pubmed (http://www-tsujii.is.s.u-tokyo.ac.jp/info-pubmed/) were developed in response to these information needs. Medie is a general-purpose integrated PubMed search engine, and Info-pubmed is a targeted system for finding information about the interactions of key biomedical entities.

In this work, the first update of these systems since their introduction, we present multiple extensions of the systems based on recent advances in biomedical text mining.

Extensions of Medie and Info-pubmed

Medie and Info-pubmed are based on deep syntactic analysis of sentence structure. To allow users to take advantage of the latest parsing technology, the current release integrates an improved parser [1].

To extend its semantic search capabilities, the updated Medie system incorporates ontology-based search that allows the query verb to be replaced by any GENIA event ontology (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/) term. Such searches are expanded to the set of verbs annotated as expressing the given event in the GENIA corpus [2]: for example, a search for Positive regulation will now match activate, induce, etc.
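The expansion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup table `EVENT_VERBS`, the `Query` shape, and the function `expand_query_verb` are hypothetical names, not Medie's actual interface, and the table holds only the two verbs named in the text. The object "p53" is an arbitrary example argument.

```python
from typing import Dict, List, NamedTuple, Optional

# Hypothetical lookup table: event-ontology term -> verbs annotated as
# expressing that event. Only the verbs named in the text are listed here;
# a real system would derive the full table from the annotated corpus.
EVENT_VERBS: Dict[str, List[str]] = {
    "Positive regulation": ["activate", "induce"],
}


class Query(NamedTuple):
    """A subject-verb-object query (illustrative shape, an assumption)."""
    subject: Optional[str]
    verb: str
    obj: Optional[str]


def expand_query_verb(query: Query) -> List[Query]:
    """Return one concrete query per verb when the verb slot holds an
    ontology term; otherwise return the query unchanged."""
    verbs = EVENT_VERBS.get(query.verb)
    if verbs is None:
        return [query]
    return [query._replace(verb=v) for v in verbs]


expanded = expand_query_verb(Query(subject=None, verb="Positive regulation", obj="p53"))
print([q.verb for q in expanded])  # ['activate', 'induce']
```

A query whose verb is not an ontology term passes through untouched, so ordinary verb searches keep working alongside the expanded ones.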

To allow more focused searches, we incorporated the section labeling method of Hirohata et al. [3], creating search options that limit queries to specific types of sentences such as methods, results and conclusions. The indexing system and search options were further augmented with PubMed annotation metadata, allowing searches to be limited by MeSH terms, author, or journal.
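The search restrictions described above amount to filtering indexed sentences on a section label and on metadata fields. The sketch below shows one way this could look; the record layout `IndexedSentence`, the function `search`, and all field names are assumptions made for illustration, and the records are placeholders invented for the demo rather than real PubMed data.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class IndexedSentence:
    """One indexed sentence; the field names are hypothetical."""
    text: str
    section: str                       # e.g. "methods", "results", "conclusions"
    journal: str
    authors: FrozenSet[str] = frozenset()
    mesh_terms: FrozenSet[str] = frozenset()


def search(
    sentences: Iterable[IndexedSentence],
    section: Optional[str] = None,
    mesh_term: Optional[str] = None,
    author: Optional[str] = None,
    journal: Optional[str] = None,
) -> List[IndexedSentence]:
    """Keep sentences matching every restriction that is not None."""
    return [
        s for s in sentences
        if (section is None or s.section == section)
        and (mesh_term is None or mesh_term in s.mesh_terms)
        and (author is None or author in s.authors)
        and (journal is None or s.journal == journal)
    ]


# Placeholder records invented purely for the demo.
corpus = [
    IndexedSentence("placeholder sentence A", "methods", "Journal X",
                    frozenset({"Author A"}), frozenset({"Term 1"})),
    IndexedSentence("placeholder sentence B", "results", "Journal X",
                    frozenset({"Author B"}), frozenset({"Term 1", "Term 2"})),
    IndexedSentence("placeholder sentence C", "results", "Journal Y",
                    frozenset({"Author A"}), frozenset({"Term 2"})),
]

hits = search(corpus, section="results", mesh_term="Term 1")
print([s.text for s in hits])  # ['placeholder sentence B']
```

Treating each restriction as optional means the same function covers an unrestricted search and any combination of section, MeSH term, author, and journal limits.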

The initial release of Info-pubmed implemented search for automatically detected protein-protein interactions. We have extended this search capability to include gene-disease associations [4], so that the system can also be used to study the epidemiological connections of biomolecules.

Finally, we have extended the coverage of both systems to all of PubMed and added scheduled update modules that update the system database daily, fully automating data access, analysis and indexing.

Figure 1 shows an example search result on Medie illustrating a number of the newly introduced functions.

Conclusions

We have introduced extended and updated functionality for Medie and Info-pubmed, search systems that integrate state-of-the-art text mining technology. The updates allow advanced semantic searches of the latest published information in all of PubMed.

References

1. Ninomiya T, Matsuzaki T, Miyao Y, Tsujii J: A log-linear model with an n-gram reference distribution for accurate HPSG parsing. Proceedings of IWPT 2007, Prague, Czech Republic 2007.
2. Kim JD, Ohta T, Tsujii J: Corpus annotation for mining biomedical events from literature. BMC Bioinformatics 2008, 9: 10. [ISSN 1471-2105] 10.1186/1471-2105-9-10
3. Hirohata K, Okazaki N, Ananiadou S, Ishizuka M: Identifying Sections in Scientific Abstracts using Conditional Random Fields. Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP 2008), Hyderabad, India 2008, 381–388.
4. Chun HW, Tsuruoka Y, Kim JD, Shiba R, Nagata N, Hishiki T, Tsujii J: Extraction of Gene-Disease Relations from MedLine using Domain Dictionaries and Machine Learning. Proceedings of the Pacific Symposium on Biocomputing (PSB), Maui, Hawaii, USA 2006, 4–15.


Author information

Authors and Affiliations

Department of Computer Science, University of Tokyo, Tokyo, Japan
Tomoko Ohta, Takuya Matsuzaki, Naoaki Okazaki, Makoto Miwa, Rune Sætre, Sampo Pyysalo & Jun’ichi Tsujii
School of Computer Science, University of Manchester, Manchester, UK
Jun’ichi Tsujii
National Centre for Text Mining, University of Manchester, Manchester, UK
Jun’ichi Tsujii

Corresponding author

Correspondence to Tomoko Ohta.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Ohta, T., Matsuzaki, T., Okazaki, N. et al. Medie and Info-pubmed: 2010 update. BMC Bioinformatics 11 (Suppl 5), P7 (2010). https://doi.org/10.1186/1471-2105-11-S5-P7

