Medie and Info-pubmed: 2010 update
© Ohta et al; licensee BioMed Central Ltd. 2010
Published: 06 October 2010
In recent decades, high-throughput screening methods have become established, bringing major breakthroughs in molecular biology and biomedicine. Because researchers in these fields must interpret an enormous quantity of data while the publication rate of scientific articles grows rapidly, demand for text mining technology increases every year.
Medie (http://www-tsujii.is.s.u-tokyo.ac.jp/medie/) and Info-pubmed (http://www-tsujii.is.s.u-tokyo.ac.jp/info-pubmed/) were developed in response to these information needs. Medie is a general-purpose integrated PubMed search engine, and Info-pubmed is a targeted system for finding information about the interactions of key biomedical entities.
In this work, the first update of these systems since their introduction, we present multiple extensions of the systems based on recent advances in biomedical text mining.
Extensions of Medie and Info-pubmed
Medie and Info-pubmed are based on deep syntactic analysis of sentence structure. To allow users to take advantage of the latest parsing technology, the current release integrates an improved parser.
To extend its semantic search capabilities, the updated Medie system incorporates ontology-based search that allows the query verb to be replaced by any GENIA event ontology (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/) term. Such searches are expanded to the set of verbs annotated as expressing the given event in the GENIA corpus: for example, a search for Positive regulation will now match activate, induce, etc.
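The expansion step described above can be illustrated with a minimal sketch. It is not the Medie implementation: the table `EVENT_VERBS` and the functions `expand_verb` and `matches` are hypothetical names, and only the verbs activate and induce for Positive regulation are taken from the text. All other entries and the entity names are placeholders.

```python
# Hypothetical mapping from GENIA event ontology terms to trigger verbs.
# Only "activate" and "induce" for "Positive regulation" come from the text;
# the other entry is an assumed placeholder for illustration.
EVENT_VERBS = {
    "Positive regulation": {"activate", "induce"},
    "Negative regulation": {"inhibit", "suppress"},  # assumed example
}


def expand_verb(query_verb):
    """Return the set of verbs that the query's verb slot should match.

    An ontology term expands to every verb listed for that event;
    an ordinary verb matches only itself.
    """
    return set(EVENT_VERBS.get(query_verb, {query_verb}))


def matches(query, triple):
    """Check a (subject, verb, object) query against an indexed triple.

    A None subject or object in the query acts as a wildcard.
    """
    q_subj, q_verb, q_obj = query
    t_subj, t_verb, t_obj = triple
    return (
        q_subj in (None, t_subj)
        and t_verb in expand_verb(q_verb)
        and q_obj in (None, t_obj)
    )


if __name__ == "__main__":
    # Placeholder entity names, not real extracted facts.
    indexed = [
        ("proteinA", "activate", "proteinB"),
        ("proteinC", "bind", "proteinB"),
    ]
    query = (None, "Positive regulation", "proteinB")
    print([t for t in indexed if matches(query, t)])
```

Expanding at query time, as here, keeps the index unchanged when the verb lists are revised. Whether Medie expands at query time or at indexing time is not stated in the text.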
To allow more focused searches, we incorporated the section labeling method of Hirohata et al., creating search options limiting queries to specific types of sentences such as methods, results and conclusions. The indexing system and search options were further augmented with PubMed annotation metadata, allowing searches to be limited by MeSH terms, author, or journal.
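As a rough illustration of how such restrictions combine, the sketch below filters indexed sentences by section label and by metadata. The record type `IndexedSentence`, its fields, and the function `filter_sentences` are assumptions made for this example and do not reflect the actual Medie index.

```python
from dataclasses import dataclass


@dataclass
class IndexedSentence:
    """Hypothetical index record: one sentence plus its article metadata."""
    text: str
    section: str                       # e.g. "methods", "results", "conclusions"
    mesh_terms: frozenset = frozenset()
    authors: tuple = ()
    journal: str = ""


def filter_sentences(sentences, section=None, mesh=None, author=None, journal=None):
    """Keep sentences that satisfy every restriction that is not None."""
    hits = []
    for s in sentences:
        if section is not None and s.section != section:
            continue
        if mesh is not None and mesh not in s.mesh_terms:
            continue
        if author is not None and author not in s.authors:
            continue
        if journal is not None and s.journal != journal:
            continue
        hits.append(s)
    return hits


if __name__ == "__main__":
    # Placeholder records, not real PubMed data.
    index = [
        IndexedSentence("Sentence one.", "results",
                        frozenset({"Neoplasms"}), ("Author A",), "Journal X"),
        IndexedSentence("Sentence two.", "methods",
                        frozenset({"Neoplasms"}), ("Author B",), "Journal Y"),
    ]
    for s in filter_sentences(index, section="results", mesh="Neoplasms"):
        print(s.text)
```

Each restriction is optional, so a query that sets none of them returns the full result set.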
The initial release of Info-pubmed implemented search for automatically detected protein-protein interactions. We have extended this search capability to include gene-disease associations, allowing the system to also be used to study the epidemiological connections of biomolecules.
Finally, we have extended the coverage of both systems to all of PubMed and added scheduled update modules that perform daily updates of the system database, fully automating data access, analysis and indexing.
We have introduced extended and updated functionality for Medie and Info-pubmed, search systems integrating state-of-the-art text mining technology. The updates allow advanced semantic searches of the latest published information in all of PubMed.
- Ninomiya T, Matsuzaki T, Miyao Y, Tsujii J: A log-linear model with an n-gram reference distribution for accurate HPSG parsing. Proceedings of IWPT 2007, Prague, Czech Republic 2007.
- Kim JD, Ohta T, Tsujii J: Corpus annotation for mining biomedical events from literature. BMC Bioinformatics 2008, 9: 10. [ISSN 1471-2105] 10.1186/1471-2105-9-10
- Hirohata K, Okazaki N, Ananiadou S, Ishizuka M: Identifying Sections in Scientific Abstracts using Conditional Random Fields. Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP 2008), Hyderabad, India 2008, 381–388.
- Chun HW, Tsuruoka Y, Kim JD, Shiba R, Nagata N, Hishiki T, Tsujii J: Extraction of Gene-Disease Relations from MedLine using Domain Dictionaries and Machine Learning. Proceedings of the Pacific Symposium on Biocomputing (PSB), Maui, Hawaii, USA 2006, 4–15.
This article is published under license to BioMed Central Ltd.