Identification of concepts bridging diverse biomedical domains

Juršič, Matjaž; Mozetič, Igor; Grčar, Miha; Cestnik, Bojan; Lavrač, Nada

doi:10.1186/1471-2105-11-S5-P4

Volume 11 Supplement 5

Workshop on Advances in Bio Text Mining

Poster presentation
Open access
Published: 06 October 2010

BMC Bioinformatics volume 11, Article number: P4 (2010)

Background

In biology and medicine, experts are challenged daily to link information from highly specialized subfields. These subfields can be regarded as separate domains, since an expert usually masters only one of them. However, many novel discoveries arise from new insights gained by combining two or more diverse fields. In this work we propose a method that reveals the key concepts that are most informative and promising to pursue when bridging diverse domains. We evaluate the results against manually selected bridging concepts studied in papers [1] and [2].

Materials and methods

This work focuses on identifying bridging concepts (bridging terms, or b-terms) in two datasets, each consisting of a pair of domains. The training dataset consists of titles of articles about migraine (first domain) and magnesium (second domain), with b-terms identified in [1]. The testing dataset consists of abstracts about autism and calcineurin, with b-terms presented in [2]. For both pairs of domains (retrieved from PubMed), the b-terms are known and have been verified by an expert as pointing to potential new discoveries in the field.

Our methodology for b-term detection is as follows: 1. Employ text mining to pre-process the texts and encode them in the bag-of-words representation; 2. Calculate heuristics that favour b-terms over other terms; 3. Sort the terms by the best heuristic measure and present the top terms (which ideally are b-terms) to the expert during interactive exploration of the two domains.
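The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tokenizer, the function names, and the `min_document_frequency` heuristic are hypothetical placeholders rather than the pre-processing or any of the heuristics actually used in this work, and the four toy texts are invented for the example, not taken from the datasets.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Step 1: lowercase the text, split on non-letters, count the tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def min_document_frequency(term, bags_a, bags_b):
    """Step 2 (placeholder heuristic, an assumption of this sketch):
    number of documents containing the term in the weaker of the two domains."""
    df_a = sum(1 for bag in bags_a if term in bag)
    df_b = sum(1 for bag in bags_b if term in bag)
    return min(df_a, df_b)


def rank_terms(domain_a, domain_b, heuristic):
    """Step 3: score every term seen in either domain, sort best first."""
    bags_a = [bag_of_words(text) for text in domain_a]
    bags_b = [bag_of_words(text) for text in domain_b]
    vocabulary = set().union(*bags_a, *bags_b)
    scores = {term: heuristic(term, bags_a, bags_b) for term in vocabulary}
    # Ties are broken alphabetically so that the ordering is deterministic.
    return sorted(vocabulary, key=lambda term: (-scores[term], term))


# Invented toy texts standing in for the two domains.
migraine = ["stress triggers migraine attacks", "migraine with vascular stress"]
magnesium = ["magnesium deficiency under stress", "magnesium reduces vascular tone"]

ranking = rank_terms(migraine, magnesium, min_document_frequency)
print(ranking[:2])
```

Passing the heuristic as a function keeps the ranking step independent of the scoring step, so that many candidate heuristics can be compared on the same pre-processed texts.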

The search for the most promising heuristic proceeds in two phases: 1. Training: we propose over 40 heuristics, ranging from very simple term-frequency statistics to very elaborate combined measures. We evaluate their quality on the first dataset and select the best one, the so-called b-potential measure, calculated as the product of the term’s tf-idf weights in the centroids of the two domains. 2. Testing: we evaluate the b-potential measure on the second dataset to confirm its domain independence and the quality of its b-term identification.
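The b-potential measure described above might be computed roughly as follows. The text only states that the measure is the product of a term's tf-idf weights in the two domain centroids; everything else in this sketch is an assumption: idf is taken as log(N/df) over the union of both domains, document vectors are L2-normalised, a centroid is the mean of its domain's document vectors, tokenization is a plain whitespace split, and all function names are hypothetical. The toy texts are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(bags, idf):
    """Weight term counts by idf and L2-normalise each document vector."""
    vectors = []
    for bag in bags:
        vec = {term: tf * idf[term] for term, tf in bag.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def centroid(vectors):
    """Mean of the document vectors of one domain."""
    if not vectors:
        return {}
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def b_potential(docs_a, docs_b):
    """Product of each term's tf-idf weight in the two domain centroids."""
    bags_a = [Counter(doc.lower().split()) for doc in docs_a]
    bags_b = [Counter(doc.lower().split()) for doc in docs_b]
    all_bags = bags_a + bags_b
    # Document frequency and idf over both domains together (an assumption).
    df = Counter(term for bag in all_bags for term in bag)
    idf = {term: math.log(len(all_bags) / n) for term, n in df.items()}
    cen_a = centroid(tfidf_vectors(bags_a, idf))
    cen_b = centroid(tfidf_vectors(bags_b, idf))
    return {term: cen_a.get(term, 0.0) * cen_b.get(term, 0.0) for term in df}


# Invented toy texts standing in for the two domains.
docs_a = ["stress triggers migraine attacks", "migraine with vascular stress"]
docs_b = ["magnesium deficiency under stress", "magnesium reduces vascular tone"]

scores = b_potential(docs_a, docs_b)
for term in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(term, round(scores[term], 4))
```

Because the two centroid weights are multiplied, a term that is missing from either domain scores zero, so only terms shared by both domains can rank highly.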

Results and conclusion

We experimentally confirmed that the proposed b-potential measure is the best of the evaluated heuristics for b-term detection and that it retrieves b-terms approximately 7 times faster than a random approach (see Figure 1). Consequently, identifying the b-terms from papers [1] and [2] would be considerably simplified by presenting the experts with a list of terms sorted by b-potential for manual selection, since the top of such a sorted list is about 7 times more likely to contain a b-term than a random list.
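One plausible way to quantify such a comparison with a random ordering is the ratio between the number of known b-terms found among the top k ranked terms and the number expected in k randomly chosen terms. The text does not define the exact evaluation procedure, so the function below, its name, and its parameters are assumptions of this sketch, and the example input is invented.

```python
def lift_over_random(ranked_terms, known_b_terms, k):
    """Ratio of b-terms found in the top k to the number expected at random."""
    if not 0 < k <= len(ranked_terms):
        raise ValueError("k must be between 1 and the number of ranked terms")
    found = sum(1 for term in ranked_terms[:k] if term in known_b_terms)
    total = sum(1 for term in ranked_terms if term in known_b_terms)
    # A random ordering is expected to place k * total / n b-terms in the top k.
    expected = k * total / len(ranked_terms)
    return found / expected if expected else float("nan")


# Invented example: 8 ranked terms, 2 of which are known b-terms.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h"]
known = {"a", "c"}
lift = lift_over_random(ranked, known, k=2)
print(lift)
```

A value above 1 means that the top of the ranked list is richer in b-terms than a random list of the same length.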

References

1. Swanson DR: Migraine and magnesium: eleven neglected connections. Perspectives in Biology and Medicine 1988, 31(4):526–557.
2. Petrič I, Urbančič T, Cestnik B, Macedoni-Lukšič M: Literature mining method RaJoLink for uncovering relations between biomedical concepts. J. Biomed. Inform. 2009, 42(2):219–227. 10.1016/j.jbi.2008.08.004

Acknowledgement

This work was partially supported by the Slovenian national project Knowledge Technologies and by the EU project FP7-211898 BISON.

Author information

Authors and Affiliations

Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Matjaž Juršič, Igor Mozetič, Miha Grčar, Bojan Cestnik & Nada Lavrač
Temida, d.o.o., Dunajska 51, 1000, Ljubljana, Slovenia
Bojan Cestnik
University of Nova Gorica, Vipavska 13, 5000, Nova Gorica, Slovenia
Nada Lavrač

Corresponding author

Correspondence to Matjaž Juršič.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Juršič, M., Mozetič, I., Grčar, M. et al. Identification of concepts bridging diverse biomedical domains. BMC Bioinformatics 11 (Suppl 5), P4 (2010). https://doi.org/10.1186/1471-2105-11-S5-P4
