Identification of concepts bridging diverse biomedical domains
© Juršič et al; licensee BioMed Central Ltd. 2010
Published: 06 October 2010
In biology and medicine, experts face the daily challenge of linking information from various highly specialized subfields. The individual subfields can be regarded as distinct domains, since experts usually master only one of them. However, many novel discoveries arise from new insights and knowledge gained by combining two or more diverse fields. In this work we propose a method that reveals the key concepts that are most informative and most promising to pursue when bridging diverse domains. We evaluate the results against the manually selected bridging concepts studied in [1] and [2].
Materials and methods
This work focuses on identifying bridging concepts (bridging terms, or b-terms) in two datasets, each consisting of a pair of domains. The training dataset consists of titles of articles about migraine (first domain) and magnesium (second domain), with the b-terms identified in [1]. The testing dataset consists of abstracts about autism and calcineurin, with the b-terms presented in [2]. For both domain pairs (retrieved from PubMed), the b-terms are known and have been verified by a domain expert as pointing to potential new discoveries in the field.
Our methodology for b-term detection is as follows:
1. Use text mining to pre-process the texts and encode them in the bag-of-words representation.
2. Compute heuristic measures that favour b-terms over other terms.
3. Sort the terms by the best heuristic measure and present the top-ranked terms (ideally b-terms) to the expert during interactive exploration of the two domains.
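The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular-expression tokenizer, the tiny stop-word list, the toy documents and the example heuristic (the smaller of a term's document frequencies in the two domains) are all hypothetical. A real pipeline would typically add stemming, a full stop-word list and possibly n-grams.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "and", "in", "a", "an", "to", "is", "with", "for", "on", "by"}

def tokenize(text):
    """Step 1a (assumed preprocessing): lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def bag_of_words(docs):
    """Step 1b: encode each document as a term -> count mapping."""
    return [Counter(tokenize(d)) for d in docs]

def doc_freq(bows):
    """Number of documents in which each term occurs."""
    df = Counter()
    for bow in bows:
        df.update(bow.keys())
    return df

def rank_terms(vocabulary, score, top_k=5):
    """Step 3: sort terms by a heuristic score, highest first.
    Ties are broken alphabetically so the output is deterministic."""
    return sorted(vocabulary, key=lambda t: (-score(t), t))[:top_k]

# Toy documents invented for illustration (not data from this work).
domain_a = ["alpha beta shared", "alpha shared gamma"]
domain_b = ["delta shared beta", "delta epsilon"]
df_a = doc_freq(bag_of_words(domain_a))
df_b = doc_freq(bag_of_words(domain_b))
vocab = set(df_a) | set(df_b)

# Step 2 (hypothetical heuristic): a term scores high only if it occurs in
# BOTH domains; Counter returns 0 for terms missing from one domain.
top = rank_terms(vocab, lambda t: min(df_a[t], df_b[t]), top_k=3)
print(top)  # → ['beta', 'shared', 'alpha']
```

Passing the heuristic as a function keeps step 3 independent of step 2, so any of the candidate measures can be plugged into the same ranking routine.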
The search for the most promising heuristic consists of two phases:
1. Training: we propose over 40 heuristics, ranging from very simple term-frequency statistics to elaborate combined measures, evaluate their quality on the first dataset, and select the best one. This is the so-called b-potential measure, calculated as the product of the term's tf-idf weights in the centroids of the two domains.
2. Testing: we evaluate the b-potential measure on the second dataset to confirm its domain independence and the quality of its b-term identification.
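The b-potential measure can be sketched as follows, assuming each domain is given as a list of plain-text documents. For a term t with weight c_A(t) in the centroid of domain A and c_B(t) in the centroid of domain B, the score is c_A(t) · c_B(t), so only terms weighted in both domains score above zero. Several details are assumptions not fixed by the text: whitespace tokenization, the tf × ln(N/df) weighting with document frequencies pooled over both domains, unit-length normalization of document vectors, and the centroid taken as their mean. The helper hits_at_k is likewise a hypothetical evaluation measure (known b-terms among the top k), not necessarily the one used in this work.

```python
import math
from collections import Counter, defaultdict

def bags_of_words(docs):
    """Minimal stand-in for preprocessing: lower-cased whitespace tokens."""
    return [Counter(doc.lower().split()) for doc in docs]

def tfidf(bows, df, n_docs):
    """tf-idf vectors (tf * ln(N / df)), each normalised to unit length."""
    vectors = []
    for bow in bows:
        vec = {t: tf * math.log(n_docs / df[t]) for t, tf in bow.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors

def centroid(vectors):
    """Component-wise mean of sparse document vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def b_potential_ranking(docs_a, docs_b):
    """Rank all terms by c_A[t] * c_B[t], highest first (ties alphabetical)."""
    bows_a, bows_b = bags_of_words(docs_a), bags_of_words(docs_b)
    # Assumption: document frequencies are pooled over both domains.
    df = Counter()
    for bow in bows_a + bows_b:
        df.update(bow.keys())
    n_docs = len(bows_a) + len(bows_b)
    c_a = centroid(tfidf(bows_a, df, n_docs))
    c_b = centroid(tfidf(bows_b, df, n_docs))
    score = {t: c_a.get(t, 0.0) * c_b.get(t, 0.0) for t in df}
    return sorted(df, key=lambda t: (-score[t], t))

def hits_at_k(ranking, known_b_terms, k):
    """Hypothetical evaluation: number of known b-terms among the top k terms."""
    return len(set(ranking[:k]) & set(known_b_terms))

# Toy documents invented for illustration (not data from this work).
ranking = b_potential_ranking(["alpha beta shared", "alpha shared gamma"],
                              ["delta shared beta", "delta epsilon"])
print(ranking)  # → ['beta', 'shared', 'alpha', 'delta', 'epsilon', 'gamma']
print(hits_at_k(ranking, {"shared"}, k=2))  # → 1
```

Because the product is zero whenever a term is absent from either domain, terms confined to one domain (here alpha, gamma, delta and epsilon) all fall to the bottom of the ranking and are ordered only by the alphabetical tie-break.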
Results and conclusion
This work was partially supported by the Slovenian national project Knowledge Technologies and by the EU project FP7-211898 BISON.
- Swanson DR: Migraine and magnesium: eleven neglected connections. Perspectives in Biology and Medicine 1988, 31(4):526–557.
- Petrič I, Urbančič T, Cestnik B, Macedoni-Lukšič M: Literature mining method RaJoLink for uncovering relations between biomedical concepts. J. Biomed. Inform. 2009, 42(2):219–227. 10.1016/j.jbi.2008.08.004