Integrating text mining into high-throughput assay analysis

Text mining tools in the genomics subfield of biomedical natural language processing (BioNLP) serve two basic use cases. In the more common one, the tool is designed to produce output that will be viewed by an individual researcher, typically a database curator or a bench scientist. This use case has been heavily studied, thanks in part to shared tasks like BioCreative and TREC Genomics. The other use case is the integration of text mining into high-throughput assay analysis, which includes tasks ranging from sequence alignment to the evaluation of gene expression array data. In the past, text mining has been applied to these areas either as a post-processing step or as an integrated part of the analysis algorithm. More recently, our lab has developed Hanalyzer, a 3R tool: a tool for the knowledge-based analysis of high-throughput assays based on Reading, Reasoning, and Reporting about experimental results in the context of pre-existing knowledge, including knowledge from text mining. The tool is designed to facilitate exploration of experimental data, explanation of observed patterns in light of what is already known about the entities involved, and generation of novel hypotheses. Results from a study of craniofacial development demonstrate that the system can be used to explain patterns in gene expression and to generate a set of hypotheses about the roles of four genes not previously known to be involved in tongue development. These hypotheses were experimentally validated by in situ hybridization and may have clinical consequences related to cleft lip and palate.

Author information

Corresponding author

Correspondence to K Bretonnel Cohen.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Cohen, K.B. Integrating text mining into high-throughput assay analysis. BMC Bioinformatics 11, O3 (2010). https://doi.org/10.1186/1471-2105-11-S5-O3

Keywords

  • Natural Language Processing
  • Text Mining
  • Mining Tool
  • Database Curator
  • Shared Task