A fast and effective dependency graph kernel for PPI relation extraction

Tikk, Domonkos; Palaga, Peter; Leser, Ulf

doi:10.1186/1471-2105-11-S5-P8

Volume 11 Supplement 5

Workshop on Advances in Bio Text Mining

Poster presentation
Open access
Published: 06 October 2010

Domonkos Tikk¹,², Peter Palaga¹ & Ulf Leser¹

BMC Bioinformatics volume 11, Article number: P8 (2010)

Background

Extraction of protein-protein interactions (PPIs) reported in scientific publications is a core topic of biomedical text mining. The ultimate goal is to devise a PPI extraction method that performs well on large amounts of unseen text, independently of the training corpus. One popular machine-learning approach to PPI extraction builds on convolution kernels, i.e., similarity functions defined on parse-based representations of sentences and interactions. Kernel functions differ in (1) the underlying sentence representation (bag-of-words, syntax tree parse, dependency graph), (2) the substructures retrieved from the sentence representation to define interactions, and (3) the calculation of the similarity function.

Method

We present a novel kernel method called the k-band shortest path spectrum kernel (kBSPS), an extension of the spectrum tree kernel (SpT) [1]. It combines three ideas. First, interactions are represented as vertex-walks, as in SpT, but adapted to dependency graphs. The kBSPS kernel also includes edge labels in the vertex-walks, thus exploiting the dependency type of each relationship. Second, it uses a novel similarity function on vertex-walks that permits certain mismatches, thus allowing for linguistic variation. The tolerant matching distinguishes three types of nodes: dependency types (D), candidate entities (E), and other surface tokens (L). Matches and mismatches are scored differently depending on the node type. Third, in addition to the shortest path between the proteins of the candidate interaction, kBSPS adds all nodes within distance k of this path to the vertex-walk representation.
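
The three ideas can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy sentence, the function names, the position-wise comparison of equal-length walks, and in particular the `MATCH` and `MISMATCH` weights are assumptions made for illustration, since the abstract does not give the actual scoring scheme or say how the k-band nodes enter the walks.

```python
from collections import deque

def build_graph(edges):
    """edges: (head, dependent, dep_type) triples -> undirected labelled adjacency."""
    adj = {}
    for head, dependent, dep_type in edges:
        adj.setdefault(head, {})[dependent] = dep_type
        adj.setdefault(dependent, {})[head] = dep_type
    return adj

def shortest_path(adj, source, target):
    """Breadth-first search; returns the node list from source to target, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, {}):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def k_band(adj, path, k):
    """All nodes within graph distance k of the path (multi-source BFS)."""
    dist = {node: 0 for node in path}
    queue = deque(path)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the band
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)

def vertex_walk(adj, path):
    """Interleave tokens with edge labels: [token, dep_type, token, ...]."""
    walk = [path[0]]
    for a, b in zip(path, path[1:]):
        walk += [adj[a][b], b]
    return walk

# Hypothetical weights per node type; the abstract does not state the real ones.
MATCH = {"D": 1.0, "E": 1.0, "L": 1.0}
MISMATCH = {"D": -1.0, "E": -2.0, "L": -0.5}

def node_type(position, symbol, entities):
    """Odd positions of a walk hold dependency types (D); even ones hold tokens."""
    if position % 2 == 1:
        return "D"
    return "E" if symbol in entities else "L"

def tolerant_score(walk_a, walk_b, entities):
    """Position-wise comparison that tolerates mismatches (assumed scheme)."""
    if len(walk_a) != len(walk_b):
        return 0.0
    score = 0.0
    for pos, (a, b) in enumerate(zip(walk_a, walk_b)):
        kind = node_type(pos, a, entities)
        score += MATCH[kind] if a == b else MISMATCH[kind]
    return max(score, 0.0)

# Toy sentence: "ENTITY1 very strongly activates ENTITY2"
edges = [
    ("activates", "ENTITY1", "nsubj"),
    ("activates", "ENTITY2", "dobj"),
    ("activates", "strongly", "advmod"),
    ("strongly", "very", "advmod"),
]
adj = build_graph(edges)
path = shortest_path(adj, "ENTITY1", "ENTITY2")
walk_a = vertex_walk(adj, path)
band = k_band(adj, path, k=1)
walk_b = ["ENTITY1", "nsubj", "binds", "dobj", "ENTITY2"]
score = tolerant_score(walk_a, walk_b, {"ENTITY1", "ENTITY2"})
```

Here `walk_a` and `walk_b` differ only in the verb, a surface token (L), so under these assumed weights the pair still receives a positive score, whereas a mismatch on an entity would be penalised more heavily. With `k=1` the band adds the modifier `strongly` to the nodes on the shortest path but leaves out `very`, which lies two edges away.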

Results

We evaluated the kBSPS kernel on the five standard PPI benchmark corpora (AIMed, BioInfer, HPRD50, IEPA, LLL) using document-level 10-fold cross-validation (CV) and cross-learning (CL; 4-vs-1) evaluation. CV evaluation is somewhat biased because the training and test data share very similar corpus characteristics, which machine learners tend to pick up. CL evaluation, where the training and test sets are drawn from different distributions, therefore provides a less biased picture. We compare our results with three state-of-the-art kernel approaches to PPI extraction (see Table 1).
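
The two evaluation scenarios can be sketched as follows. This is an illustrative reading rather than the authors' evaluation code: the helper names are hypothetical, the round-robin assignment of documents to folds is an assumption, and "4-vs-1" is taken to mean training on four corpora and testing on the remaining one.

```python
def document_level_folds(doc_ids, n_folds=10):
    """Yield (train, test) index lists; all instances of a document share a fold."""
    docs = sorted(set(doc_ids))
    fold_of = {doc: i % n_folds for i, doc in enumerate(docs)}  # round-robin (assumed)
    for fold in range(n_folds):
        test = [i for i, doc in enumerate(doc_ids) if fold_of[doc] == fold]
        train = [i for i, doc in enumerate(doc_ids) if fold_of[doc] != fold]
        yield train, test

def cross_learning_splits(corpora):
    """Yield (training corpora, held-out corpus): train on all but one, test on the rest."""
    for held_out in corpora:
        yield [c for c in corpora if c != held_out], held_out

# One entry per candidate pair, labelled with the document it comes from (toy data).
doc_ids = ["d1", "d1", "d2", "d3", "d3", "d3"]
cv_folds = list(document_level_folds(doc_ids, n_folds=3))

corpora = ["AIMed", "BioInfer", "HPRD50", "IEPA", "LLL"]
cl_splits = list(cross_learning_splits(corpora))
```

Splitting at the document level keeps candidate pairs from the same document out of both sides of a fold, which is the usual motivation for document-level CV; the CL splits go one step further by holding out an entire corpus.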

Table 1 Comparison of kBSPS in terms of AUC, F₁-measure and classification time with other state-of-the-art kernels using the CV and CL evaluation scenarios

Conclusion

We have shown that the kBSPS kernel is on par with state-of-the-art kernels in the more general CL evaluation. Furthermore, its performance is more stable than that of the other methods: it drops the least when moving from CV to CL. Notably, kBSPS is also much faster than the other kernels, making it applicable to very large corpora.

References

1. Kuboyama T, Hirata K, Kashima H, Aoki-Kinoshita KF, Yasuda H: A spectrum tree kernel. Information and Media Technologies 2007, 2: 292–299.
2. Giuliano C, Lavelli A, Romano L: Exploiting shallow linguistic information for relation extraction from biomedical literature. Proc. of the 11th Conf. of the European Chapter of the Association for Computational Linguistics (EACL'06) Trento, Italy: The Association for Computational Linguistics; 2006, 401–408. [http://acl.ldc.upenn.edu/E/E06/E06–1051.pdf]
3. Airola A, Pyysalo S, Björne J, Pahikkala T, Ginter F, et al.: All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics 2008, 9(Suppl 11):S2. 10.1186/1471-2105-9-S11-S2

Acknowledgements

Domonkos Tikk was supported by the Alexander-von-Humboldt Foundation.

Author information

Authors and Affiliations

Knowledge Management in Bioinformatics, Institute for Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
Domonkos Tikk, Peter Palaga & Ulf Leser
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, H-1117, Budapest, Magyar Tudósok krt 2, Hungary
Domonkos Tikk

Corresponding author

Correspondence to Domonkos Tikk.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Tikk, D., Palaga, P. & Leser, U. A fast and effective dependency graph kernel for PPI relation extraction. BMC Bioinformatics 11 (Suppl 5), P8 (2010). https://doi.org/10.1186/1471-2105-11-S5-P8
