Volume 10 Supplement 1
Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)
Extract interaction detection methods from the biological literature
- Hongning Wang^{1},
- Minlie Huang^{1} and
- Xiaoyan Zhu^{1}
DOI: 10.1186/1471-2105-10-S1-S55
© Wang et al; licensee BioMed Central Ltd. 2009
Published: 30 January 2009
Abstract
Background
Considerable efforts have been made to extract protein-protein interactions from the biological literature, but little work has been done on the extraction of interaction detection methods. It is crucial to annotate the detection methods in the literature, since different detection methods lend different degrees of reliability to the reported interactions. However, the diversity of method mentions in the literature makes automatic extraction quite challenging.
Results
In this article, we develop a generative topic model, the Correlated Method-Word model (CMW model), to extract the detection methods from the literature. In the CMW model, we formulate the correlation between the different methods and related words in a probabilistic framework in order to infer the potential methods from the given document. By applying the model to a corpus of 5319 full-text documents annotated by the MINT and IntAct databases, we observe promising results, which outperform the best result reported in the BioCreative II challenge evaluation.
Conclusion
The promising experimental results show that the CMW model overcomes the issues caused by the diversity of method mentions and properly captures the in-depth correlations between the detection methods and related words. That it outperforms the baseline methods confirms that the dependence assumptions of the model are reasonable and that the model is suitable for practical processing.
Background
Interaction detection method extraction
The study of protein interactions is one of the most pressing biological problems. In the literature mining community, considerable efforts have been made to automatically extract the protein-protein interactions (PPI) from the literature [1–3] and some practical systems have been put into use [4, 5].
Nevertheless, little work has been done to automatically extract the interaction detection methods from the literature. The detection methods available to identify protein interactions vary in their level of resolution and degree of reliability. Therefore, it is important to identify such detection methods in order to validate the reported interactions. Some interaction databases, such as MINT [6] and IntAct [7], require the interaction entries to be experimentally confirmed. However, manually annotating the detection methods in the literature is time-consuming: on average, the curation of a manuscript takes an expert curator 2–3 hours [8]. Therefore, there is great practical demand for automatically extracting the detection methods from the literature.
The first critical assessment of detection method extraction was carried out in the BioCreative II challenge evaluation [9], but only two groups (out of sixteen) submitted results.
The diversity of method mentions in the literature is the major obstacle to automatic extraction. In practice, different authors prefer different words and phrases to describe the same methods. For example, the detection method "two hybrid" (MI:0018) has 7 related synonyms, namely "2-hybrid", "2 H", "2 h", "classical two hybrid", "Gal4 transcription regeneration", "two-hybrid" and "yeast two hybrid", and one exact synonym, "2 hybrid", in the MI ontology [10] definition (which includes the terms describing the interaction detection methods). Although the ontology already includes so many different descriptions, biologists may simply write "yeast 2-h", which is not included in the ontology, in their manuscripts.
Table 1. String matching performance.
| | Precision | Recall | F-Score |
|---|---|---|---|
| 740 Full Texts | 0.090 | 0.107 | 0.098 |
As Table 1 illustrates, the poor recall confirms how diverse the method mentions are, and the low precision stems from the simple matching algorithm, which does not take context into consideration: most of the matched names are not the methods actually applied in the document but background knowledge. In this sense, the rigid dictionary-based matching strategy fails to address the practical problem.
Another straightforward solution is to treat extraction as a classification problem: for each detection method in the ontology definition, a binary classifier is built to make a yes/no decision [11, 12]. But traditional discriminative classifiers make little attempt to uncover the probabilistic structure and the correlation within both the input and output spaces. In the biological domain, ignoring the correlation among the methods and among the words hinders performance, since these relations are intrinsic to the domain.
From the perspective of involving domain experts, some approaches have achieved acceptable results on small data sets. Rinaldi et al. [13] invited biologists to summarize keywords and patterns for the extraction task and manually refined the patterns according to the performance. Such an approach is not suitable for large-scale data processing, and it lacks flexibility.
Generative topic model
In the machine learning community, generative topic models are receiving more and more attention. Latent Dirichlet Allocation (LDA) [14] is one of the most typical models. LDA reduces the complex process of producing a document into a small number of simple probabilistic steps and thus specifies a probability distribution over all possible documents. Using standard statistical techniques, one can invert the process and infer the set of latent topics responsible for generating a given set of documents [15].
LDA-like topic models have rapidly spread to quite different domains. Wei and Croft [16] introduced the LDA model into information retrieval systems and improved the retrieval performance; Mimno and McCallum [17] proposed the Author-Persona-Topic model to formulate the expertise of authors based on their publications; Li and Perona [18] advanced a hierarchical generative model to classify natural scenes in an unsupervised manner.
The advantages of generative topic models are: 1) it is easy to postulate complex latent structures responsible for a set of observations; 2) the correlation between different factors can be easily exploited by introducing latent topic variables.
In this article, in order to extract the detection methods from the biological literature, we propose to formulate the correlation between the detection methods and related word occurrences in a probabilistic framework. In particular, we assume the applied methods are governed by a set of latent topics and the corresponding word descriptions are also influenced by the same topic factors, which characterize the correlation between the methods and related words. Under this setting, we use a generative topic model to capture such latent correlations and infer the potential methods from the observed words by statistical inference.
The intuition behind the proposed model is that different documents share informative commonalities in their descriptions of the same methods; we therefore propose to discover the common usage patterns for the desired methods from the latent correlations between the methods and related words. This is somewhat analogous to extracting templates from the overlap of different method descriptions. But the diversity of method mentions leaves traditional template generation algorithms with low support and low confidence. Furthermore, when there are multiple methods in one document, the traditional approach fails to identify the latent correlations. In contrast, the generative model deals naturally with missing data and provides a more feasible and principled framework.
The paper is organized as follows: in the Methods section, we describe the proposed model in detail and discuss its inference and parameter estimation procedures; in the Results section, we perform extensive experiments to validate the proposed model; and in the Conclusion section, we conclude the work and summarize our contributions.
Methods
Correlated Method-Word model
The model can be viewed as a generative process: the author first selects a set of topics for the manuscript (e.g. physical protein-protein interactions); under different kinds of topics, there are different choices of detection methods to confirm the findings (e.g. pull down to confirm protein interactions); the selected methods are represented by particular word occurrences (e.g. descriptions of the experimental conditions, properties and materials), which are also governed by the selected topics. Therefore, the correlations between the detection methods and related words are characterized by the latent topic factors, and from the observed words we are able to infer the potential methods in the given document according to such correlations.
Formally, we define a corpus consisting of D documents, E methods and V words, and a given document consisting of N methods and M words. To simplify the model, we assume the number of topics k is known and fixed over the whole corpus. In the given document d, we denote θ as the document-specific topic distribution; z = {z_{1}, z_{2}, z_{3},..., z_{N}} as the discrete topic assignments for each method; y = {y_{1}, y_{2}, y_{3},..., y_{M}} as the indexing variables that indicate which topic factor generates the corresponding word; and ϵ as the method distribution under the topics. These are the latent variables. e = {e_{1}, e_{2}, e_{3},..., e_{N}} and w = {w_{1}, w_{2}, w_{3},..., w_{M}} are the observed methods and words in document d. Besides, α and η are the parameters of the k-dimensional and E-dimensional Dirichlet distributions that define the topic and method prior distributions on the corpus, and β is a k × V matrix that represents the word distribution under the topics. These are the model parameters.
- 1. Sample the topic proportion θ from the Dirichlet distribution: θ ~ Dir(α)
- 2. For each method e_{n}, n ∈ {1, 2, 3,..., N}:
  - a. Sample the topic factor z_{n} from the multinomial distribution: z_{n} ~ Mul(θ)
  - b. Sample the method e_{n} from the multinomial distribution conditioned on z_{n}: e_{n} ~ p(e_{n}|ϵ, z_{n})
- 3. For each related word w_{m}, m ∈ {1, 2, 3,..., M}:
  - a. Sample the indexing variable y_{m} from the Uniform distribution conditioned on N: y_{m} ~ Unif(1, 2, 3,..., N)
  - b. Sample the word w_{m} from the multinomial distribution conditioned on ${z}_{{y}_{m}}$: w_{m} ~ p(w_{m}|β, ${z}_{{y}_{m}}$)
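The generative steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy values of α, ϵ and β, and the 0-based indexing are ours, and ϵ is passed in as a fixed matrix rather than drawn from Dir(η).

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def sample_discrete(probs, rng):
    """Draw one index from a discrete (single-trial multinomial) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(alpha, epsilon, beta, n_methods, n_words, rng):
    """Follow steps 1-3 of the generative process for one document.

    alpha:   length-k Dirichlet parameter
    epsilon: k x E matrix, row i = method distribution under topic i
    beta:    k x V matrix, row i = word distribution under topic i
    """
    theta = sample_dirichlet(alpha, rng)                          # step 1
    z = [sample_discrete(theta, rng) for _ in range(n_methods)]   # step 2a
    e = [sample_discrete(epsilon[z_n], rng) for z_n in z]         # step 2b
    y = [rng.randrange(n_methods) for _ in range(n_words)]        # step 3a (0-based)
    w = [sample_discrete(beta[z[y_m]], rng) for y_m in y]         # step 3b
    return {"theta": theta, "z": z, "e": e, "y": y, "w": w}

# Toy configuration (hypothetical values): k = 2 topics, E = 3 methods, V = 4 words.
rng = random.Random(0)
alpha = [0.5, 0.5]
epsilon = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
beta = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
doc = generate_document(alpha, epsilon, beta, n_methods=2, n_words=6, rng=rng)
```

Each word is tied to a method through y and inherits that method's topic, which is how the words and methods of a document become correlated.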
The basic idea behind each component of this model is that the discrete occurrences of detection methods and related words in a given document are governed by topic-specific distributions (the matrices ϵ and β, respectively). We use these conditional distributions to bridge the methods and word occurrences: under different topics, there are different choices of detection methods and of the corresponding word descriptions. To formulate this notion in a probabilistic framework, we follow the general setting of the LDA model and assume that the document-specific topic proportion θ is drawn from the k-dimensional Dirichlet distribution Dir(α), which determines the topic mixture proportion. In particular, we treat the parameter of the method multinomial distribution, ϵ, as a k × E matrix (one row for each mixture component), and, to avoid over-fitting caused by the unbalanced and sparse method occurrences, we assume that each row of ϵ is independently drawn from the E-dimensional Dirichlet distribution: ϵ_{i} ~ Dir(η), which smooths the method distribution under each topic. Each row of the matrix β represents the word distribution under one topic. Besides, since the correspondence between the methods and word occurrences is not observed (a document is usually associated with multiple detection methods), we use the indexing variable y to represent this latent structure.
The traditional approach (the left panel of Figure 2) simply assumes that the relation between the detection methods and related words is determined by a direct mapping. In contrast, the CMW model (the right panel of Figure 2) formulates the relationship more thoroughly: via the latent topic factors, word occurrences are formulated as a finite mixture under particular methods, so that they are not restricted to any single method and multiple words can contribute to the same method. This framework is more suitable and robust for dealing with the diversity of method descriptions. Furthermore, discriminative classification algorithms assume that the methods are a priori independent and that the words are also independent given the methods. Thus they neglect the latent patterns within both methods and words. In the CMW model, by contrast, different topics govern different method and word occurrences, embedding the correlation not only between different methods but also within the related words (see the Correlation between methods and words section and the Methods correlation analysis section for the detailed experimental results).
An efficient dimensionality reduction is implemented explicitly: the V-dimensional word space and the E-dimensional method space are mapped into the k-dimensional topic space, in which it is easier to reveal the latent correlations between the detection methods and the varied word occurrences.
Inference and parameter estimation
Variational inference
Unfortunately, this posterior distribution is intractable: the coupling between the continuous variable θ and the discrete variables β, ϵ induces a combinatorial number of terms, making exact inference computationally infeasible.
Although exact inference is intractable, a wide variety of approximate inference algorithms can serve the purpose, including expectation propagation [20], variational inference [21] and Markov chain Monte Carlo (MCMC) [22]. For computational efficiency, we develop a variational inference procedure, which approximates the desired posterior distribution of methods in a given document by maximizing a lower bound on the likelihood.
where the Dirichlet parameters γ, σ and the Multinomial parameters ϕ, λ are free variational parameters.
The variational distribution above discards the dependence among the latent variables by assuming they are drawn independently from their respective distributions. The aim of variational inference is then to find the variational parameters that minimize the Kullback-Leibler (KL) divergence between the variational distribution and the true posterior distribution.
- 1. Dirichlet parameter γ: $\gamma_{i}=\alpha_{i}+\sum_{n=1}^{N}\varphi_{ni}$ (3)
- 2. Multinomial parameter ϕ: $\log\varphi_{ni}\propto \sum_{m=1}^{M}\lambda_{mn}w_{m}^{s}\beta_{is}+\left[\psi(\gamma_{i})-\psi\Big(\sum_{t=1}^{k}\gamma_{t}\Big)\right]+e_{n}^{j}\left[\psi(\sigma_{ij})-\psi\Big(\sum_{t=1}^{E}\sigma_{it}\Big)\right]$ (4)
- 3. Multinomial parameter λ: $\log\lambda_{mn}\propto \sum_{i=1}^{k}\varphi_{ni}w_{m}^{s}\beta_{is}$ (5)
- 4. Dirichlet parameter σ: $\sigma_{ij}=\eta_{j}+\sum_{d=1}^{D}\sum_{n=1}^{N_{d}}\varphi_{dni}e_{dn}^{j}$ (6)
These update equations are invoked repeatedly until the relative change in KL is small (< 0.0001%).
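The two Dirichlet updates, equations (3) and (6), are plain sums of responsibilities and can be sketched directly. The sketch below is an illustration, not the authors' implementation: the function names and toy numbers are ours, methods are represented by integer indices instead of indicator vectors, and the prior term of equation (6) is read as the Dirichlet parameter η.

```python
def update_gamma(alpha, phi):
    """Equation (3): gamma_i = alpha_i + sum_n phi[n][i] for one document."""
    k = len(alpha)
    return [alpha[i] + sum(row[i] for row in phi) for i in range(k)]

def update_sigma(eta, phis, methods):
    """Equation (6): sigma_ij = eta_j + sum_d sum_n phi[d][n][i] * [e_dn == j].

    phis:    per-document responsibility matrices (N_d x k)
    methods: per-document lists of observed method indices e_dn
    """
    k = len(phis[0][0])
    sigma = [list(eta) for _ in range(k)]          # start every row from the prior
    for phi, e in zip(phis, methods):
        for phi_n, e_n in zip(phi, e):
            for i in range(k):
                sigma[i][e_n] += phi_n[i]          # expected count of method e_n under topic i
    return sigma

# Toy example (hypothetical numbers): k = 2 topics, E = 3 methods, one document.
phi = [[0.9, 0.1], [0.2, 0.8]]
gamma = update_gamma([1.0, 1.0], phi)
sigma = update_sigma([0.1, 0.1, 0.1], [phi], [[0, 2]])
```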
Parameter estimation
- 1. Update the Dirichlet parameter α by the Newton-Raphson algorithm:
  $\frac{\partial L(\alpha)}{\partial \alpha_{i}}=\sum_{d=1}^{D}\left\{\psi\Big(\sum_{t=1}^{k}\alpha_{t}\Big)-\psi(\alpha_{i})+\psi(\gamma_{di})-\psi\Big(\sum_{t=1}^{k}\gamma_{dt}\Big)\right\}$ (8)
  $\frac{\partial^{2}L(\alpha)}{\partial \alpha_{i}\partial \alpha_{j}}=D\left\{\psi^{\prime}\Big(\sum_{t=1}^{k}\alpha_{t}\Big)-\delta(i,j)\psi^{\prime}(\alpha_{i})\right\}$ (9)
- 2. Update the Dirichlet parameter η by the Newton-Raphson algorithm:
  $\frac{\partial L(\eta)}{\partial \eta_{j}}=\sum_{i=1}^{k}\left\{\psi\Big(\sum_{t=1}^{E}\eta_{t}\Big)-\psi(\eta_{j})+\psi(\sigma_{ij})-\psi\Big(\sum_{t=1}^{E}\sigma_{it}\Big)\right\}$ (10)
  $\frac{\partial^{2}L(\eta)}{\partial \eta_{i}\partial \eta_{j}}=k\left\{\psi^{\prime}\Big(\sum_{t=1}^{E}\eta_{t}\Big)-\delta(i,j)\psi^{\prime}(\eta_{i})\right\}$ (11)
- 3. Update the Multinomial parameter β:
  $\beta_{js}\propto \sum_{d=1}^{D}\sum_{n=1}^{N_{d}}\sum_{m=1}^{M_{d}}\lambda_{dmn}w_{dm}^{s}\varphi_{dnj}$ (12)
These update equations correspond to finding the maximum likelihood estimates with the expected sufficient statistics for each document taken under the variational posterior.
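As an illustration of the expected-sufficient-statistics view, the β update of equation (12) can be sketched as an accumulation over documents followed by a row normalization. The document layout (a dict with keys "w", "phi" and "lambda") and the function name are hypothetical choices made for this sketch.

```python
def update_beta(docs, k, vocab_size):
    """Equation (12): beta_js is proportional to
    sum_d sum_n sum_m lambda_dmn * [w_dm == s] * phi_dnj.

    Each doc is a dict with
      "w":      list of M_d word indices,
      "phi":    N_d x k responsibilities of topics for methods,
      "lambda": M_d x N_d responsibilities of methods for words.
    """
    beta = [[0.0] * vocab_size for _ in range(k)]
    for doc in docs:
        for m, s in enumerate(doc["w"]):
            for n, phi_n in enumerate(doc["phi"]):
                weight = doc["lambda"][m][n]
                for j in range(k):
                    beta[j][s] += weight * phi_n[j]
    # Normalize each topic's row so that it is a distribution over words.
    for row in beta:
        total = sum(row)
        if total > 0:
            for s in range(vocab_size):
                row[s] /= total
    return beta

# Toy example: one document, one method fully assigned to topic 0, two words.
toy_docs = [{"w": [0, 1], "phi": [[1.0, 0.0]], "lambda": [[1.0], [1.0]]}]
toy_beta = update_beta(toy_docs, k=2, vocab_size=3)
```

A topic that receives no mass is left as an all-zero row here; a practical implementation would probably add smoothing, which this sketch omits.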
- 1. (E-Step) For each document in the training corpus, optimize the variational parameters (γ, ϕ, λ, σ) according to equations (3)–(6);
- 2. (M-Step) Maximize the resulting lower bound on the likelihood over the whole corpus with respect to the model parameters (α, β, η) according to equations (8)–(12).
The E-Step and M-Step are repeated until the bound on the likelihood converges (relative change in likelihood of less than 0.001%). The convergence rate depends on the number of parameters in the model (e.g. the number of words, methods and topics). In our experiments (3000 words, 115 methods and up to 500 topics), the algorithm terminated in fewer than 30 iterations in all cases.
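The alternation and its stopping rule can be sketched as a generic loop. The callables `e_step` and `m_step` and the `max_iter` cap are hypothetical placeholders, not part of the original description; the tolerance 1e-5 corresponds to the 0.001% relative change quoted above.

```python
def run_variational_em(e_step, m_step, params, tol=1e-5, max_iter=100):
    """Alternate E- and M-steps until the bound's relative change drops below tol.

    e_step(params) -> (stats, bound)   expected statistics and the lower bound
    m_step(stats)  -> new params
    Returns the final params, the last bound, and the number of iterations run.
    """
    previous = None
    bound = None
    iteration = 0
    for iteration in range(1, max_iter + 1):
        stats, bound = e_step(params)
        params = m_step(stats)
        if previous is not None and previous != 0:
            if abs((bound - previous) / previous) < tol:
                break
        previous = bound
    return params, bound, iteration
```

Any pair of functions with these signatures can be plugged in, which keeps the convergence logic separate from the model-specific updates.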
Results and discussion
We collected 5319 full-text documents from PubMed [23] with method annotations from two public curated interaction databases: MINT and IntAct. We perform the following preprocessing steps on the data set: 1) parsing the HTML files; 2) converting the words to lower case; 3) removing a standard list of 400 stop words, punctuation, and terms that occur fewer than 50 times; 4) stemming the words to their roots with the Porter stemmer [24]. We use macro-precision, macro-recall and macro-F-score [25] to evaluate the average performance.
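To make the evaluation measures concrete, here is a minimal sketch of macro-averaged precision, recall and F-score. It assumes one common reading of "macro", in which the scores are computed per method and then averaged with equal weight; averaging per document is the other usual reading, and the text above does not say which one is used. The function name and the toy annotations are ours.

```python
def macro_scores(gold, predicted, labels):
    """Per-label precision, recall and F1, averaged with equal weight per label.

    gold, predicted: lists of sets of method identifiers, one set per document.
    """
    precisions, recalls, f_scores = [], [], []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
        precisions.append(precision)
        recalls.append(recall)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_scores) / n

# Toy example with two documents and two methods.
gold = [{"MI:0018"}, {"MI:0096", "MI:0018"}]
pred = [{"MI:0018"}, {"MI:0096"}]
macro_p, macro_r, macro_f = macro_scores(gold, pred, ["MI:0018", "MI:0096"])
```

Under this definition the macro-F-score is an average of per-label F-scores, so it need not equal the harmonic mean of macro-precision and macro-recall.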
Test corpora
Figure 3 shows that: 1) the 5 dominant detection methods, i.e. pull down (MI:0096), 2 hybrid (MI:0018), coip (MI:0019), anti tag coip (MI:0007) and anti bait coip (MI:0006), account for nearly 59.3% of the occurrences in the whole corpus; 2) 86.1% (99 out of 115) of the methods occur in fewer than 10% of the documents. In this case, smoothing the estimated parameters is essential to achieve better performance.
Feature selection
The CMW model is proposed to capture the correlation between methods and the "related" words. However, the curators do not explicitly annotate which words or sentences are related to the curated methods. So we employ the χ^{2} statistic [26] to select the most relevant feature words from the whole text.
where A is the number of times t co-occurs with e, B is the number of times t occurs without e, C is the number of times e occurs without t, D is the number of times neither e nor t occurs, and N is the total number of documents.
where p(e_{ i }) is the prior probability of method e_{ i }.
In the following experiments, we select the top 3000 terms to build up the feature set according to Eq(13).
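The selection step can be sketched as follows, assuming the standard χ^{2} form from the feature selection literature cited as [26], computed from the counts A, B, C and D defined above with N = A + B + C + D, and a prior-weighted average over methods. The function names and toy counts are ours, and the exact formulas used in the experiments may differ.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of term t and method e from a 2x2 contingency table.

    a: t with e, b: t without e, c: e without t, d: neither.
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator

def average_chi_square(tables, priors):
    """Prior-weighted average of chi-square over methods: sum_i p(e_i) * chi2(t, e_i)."""
    return sum(p * chi_square(*table) for table, p in zip(tables, priors))

def select_top_terms(term_scores, n_terms):
    """Keep the n_terms terms with the highest scores."""
    ranked = sorted(term_scores.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:n_terms]]

# Toy counts: a term perfectly associated with a method, and an independent one.
perfect = chi_square(10, 0, 0, 10)
independent = chi_square(5, 5, 5, 5)
averaged = average_chi_square([(10, 0, 0, 10), (5, 5, 5, 5)], [0.5, 0.5])
top = select_top_terms({"gst": 12.0, "the": 0.1, "bait": 7.5}, 2)
```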
Effect of topic factors
where D is the set of testing documents and N_{ d }is the number of methods in the document d.
Better generalization capability is indicated by a lower perplexity over the held-out testing samples. We held out 20% of collection for the testing purpose and used the remaining 80% to train the model, in accordance with 5-fold cross-validation.
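As a rough illustration of this evaluation, the sketch below computes perplexity under the usual definition, the exponential of the negative held-out log-likelihood normalized by the total number of methods, and builds a simple 5-fold partition. The per-document log-likelihoods are assumed to come from the trained model; the function names and the interleaved fold assignment are our own choices.

```python
import math

def perplexity(log_likelihoods, method_counts):
    """exp(-sum_d log p(e_d | w_d) / sum_d N_d) over held-out documents."""
    return math.exp(-sum(log_likelihoods) / sum(method_counts))

def k_fold_splits(n_docs, n_folds=5):
    """Yield (train, test) index lists; each fold holds out about 1/n_folds of the data."""
    indices = list(range(n_docs))
    for fold in range(n_folds):
        test = indices[fold::n_folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

# Toy example: two held-out documents containing 1 and 2 methods.
toy_perplexity = perplexity([math.log(0.5), math.log(0.25)], [1, 2])
splits = list(k_fold_splits(10))
```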
Besides understanding the impact of the number of topic factors on the generalization capability, we are more interested in their direct effect on the extraction performance. Here, we evaluate the precision and recall of the model under different numbers of topic factors. We use the same data set partition as in Figure 4.
Extraction performance
Since there is little prior work to compare with, we employ the well-studied Naïve Bayes, KNN and SVM models as baselines to evaluate the capability of the proposed CMW model. We choose Naïve Bayes because it is the simplest generative model with complete independence assumptions, and the KNN model because it can exploit the heterogeneity among similar documents. These are the two basic notions in the CMW model. Besides, the SVM model is the most powerful discriminative model for the classification task with decent performance [11]. All the baseline models operate on the same feature set as the CMW model.
In the KNN model, we make the prediction by ranking the candidate methods in the union of the unlabeled sample's k-nearest labeled neighbors, and weight the candidate methods by the similarity between the desired unlabeled sample and its neighbors.
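The KNN baseline described above can be sketched as similarity-weighted voting. The similarity measure is not specified in the text, so cosine similarity over feature vectors is an assumption here, as are the function names and the toy data.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_rank_methods(query, train_vectors, train_methods, k):
    """Rank the methods of the k nearest labeled documents by summed similarity."""
    nearest = sorted(
        ((cosine(query, vec), idx) for idx, vec in enumerate(train_vectors)),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, idx in nearest:
        for method in train_methods[idx]:
            scores[method] += similarity      # weight each vote by neighbor similarity
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy example: three labeled documents, two nearest neighbors.
ranked = knn_rank_methods(
    [1.0, 0.0],
    [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]],
    [{"MI:0018"}, {"MI:0018", "MI:0096"}, {"MI:0096"}],
    k=2,
)
```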
In the SVM model, we follow Boutell's strategy [12] to train a binary classifier for each method and predict the unknown methods from the classifiers' outputs. We use the SVM^{light} [27] toolkit to implement a linear kernel SVM model with the default parameters.
We perform comparisons using different proportions of the data for training. In this comparison, we set the number of topics in the CMW model to 250 and k in the KNN model to 37.
Note that, since the data set is unbalanced, we should also attend to the retrieval performance on the minor methods. In the method-level evaluation, the baseline models retrieve most of the major methods (e.g. the top 5 methods) but ignore the minor ones, while the CMW model exhibits superior retrieval power. We report the coverage of each model on the testing set to compare their retrieval capability.
Comparison with BioCreative II best result.
| | Precision | Recall | F-Score |
|---|---|---|---|
| BioCreative II Best Run | 0.506 | 0.522 | 0.483 |
| CMW model | 0.654 | 0.545 | 0.543 |
| Improvement | +29.2% | +4.4% | +12.4% |
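The improvement row is consistent with a relative change over the BioCreative II best run, which the short check below reproduces from the table values (the function name is ours).

```python
def relative_improvement(baseline, new):
    """Relative change of `new` over `baseline`, in percent, rounded to one decimal."""
    return round(100.0 * (new - baseline) / baseline, 1)

precision_gain = relative_improvement(0.506, 0.654)
recall_gain = relative_improvement(0.522, 0.545)
f_score_gain = relative_improvement(0.483, 0.543)
```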
Here, we briefly summarize the performance of the CMW model. That it outperforms the discriminative baseline methods confirms that the dependence assumptions in the proposed CMW model are reasonable. Besides, the traditional discriminative classifiers fail to model the correlation within either the methods or the related words, while in the biological domain such correlations convey important domain-dependent information. In this sense, the major advantage of the CMW model is that it properly exploits such informative correlations to reinforce the extraction performance. The improvement over the manually revised template approach shows that the CMW model does exploit more precise and general patterns for the desired methods from large-scale statistics, confirming from another perspective that the underlying semantic structure is reasonable.
Correlation between methods and words
where D is the set of documents associating with the desired method e and M_{ d }is the number of words in the document d.
Top 20 relevant terms for methods.
Method | Terms |
---|---|
x-ray (MI:0114) | structure, crystal, residue, molecule, model, site, form, interface, chain, contact, bond, hydrogen, helix, pp, record, helical, window, surface, linker, segment |
two hybrid (MI:0018) | yeast, two-hybrid, interact, assay, fusion, system, plasmid, clone, cdna, screen, bait, sequence, acid, amino, encode, site, pp, record, domain, plant |
pull down (MI:0096) | gst, fusion, glutathione, pull-down, assay, interact, bead, buffer, wash, yeast, scopus, min, incubate, two-hybrid, antibody, pp, record, system, plasmid, sequence |
anti tag coip (MI:0007) | record, pp, cite, yeast, antibody, strain, panel, anti-flag, saccharomyces, flag, cerevisia, growth, blot, western, flag-tagg, gene, grow, medline, ha, anti-ha |
anti bait coip (MI:0006) | control, buffer, pp, record, isi, bait, cancer, antibody, extract, c-terminus, bead, sirna, tumor, stain, gene, yeast, sds, luciferase, embo, cdna |
coip (MI:0019) | antibody, pp, record, extract, yeast, domain, sequence, expression, blot, cdna, clone, activity, luciferase, growth, transfect, acid, fusion, sirna, mmedta, link |
Methods correlation analysis
With the CMW model, we map different methods into the latent topic space, where we are able to analyze the relationships between them. Meanwhile, there are intrinsic inheritance relationships between the methods, defined in the MI ontology and organized as a concept hierarchy.
where ϵ_{i} is the i-th column of the ϵ matrix.
Recall that each row of the multinomial parameter ϵ is the method distribution under a particular topic, so each column of ϵ represents a method in the topic space. By normalizing ϵ by column, we can represent the different methods over the latent topic factors.
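The column normalization can be sketched as follows. The function name and the toy matrix are hypothetical, and the sketch returns one k-dimensional vector per method for convenience; it does not implement the subsequent clustering.

```python
def normalize_columns(epsilon):
    """Turn each column of the k x E matrix into a distribution over topics.

    Returns a list of E vectors of length k, one per method.
    """
    k, n_methods = len(epsilon), len(epsilon[0])
    totals = [sum(epsilon[i][j] for i in range(k)) for j in range(n_methods)]
    return [
        [epsilon[i][j] / totals[j] if totals[j] else 0.0 for i in range(k)]
        for j in range(n_methods)
    ]

# Toy matrix: k = 2 topics (rows), E = 2 methods (columns).
method_vectors = normalize_columns([[0.6, 0.1], [0.2, 0.3]])
```

Methods whose vectors put their mass on the same topics end up close to each other in this space, which is what a clustering over these vectors would pick up.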
Based on this representation, we employ an agglomerative clustering algorithm to perform the hierarchical clustering and use the visualization tool gCluto [28] to display the captured "pedigree" tree. (We only illustrate part of the clustering result because of the page limit.)
Classify irrelevant documents
Although the CMW model is proposed to address the extraction problem in documents with at least one detection method, in most situations the curators do not know beforehand whether a document is PPI related or experimentally confirmed. So it is necessary to evaluate the model's capability to classify irrelevant documents.
This measurement indicates the maximum probability of a document containing at least one interaction detection method.
Conclusion
In this paper, we propose a generative probabilistic model, the Correlated Method-Word model, to automatically extract the interaction detection methods from the biological literature. This problem has not been well studied in previous research. By introducing the latent topic factors, the proposed model formulates the correlation between the detection methods and related words in a probabilistic framework in order to infer the potential methods from the observed words.
In our experiments, the proposed CMW model achieved competitive performance against other well-studied discriminative classifiers on a corpus of 5319 full-text documents, and it outperforms the best result reported in the BioCreative II challenge evaluation (F-Score improved by 12.4%). These promising results show that the proposed CMW model overcomes the diversity of method descriptions and appropriately solves the detection method extraction problem. Furthermore, the model captures the in-depth relationship not only between the methods and related words (see the Correlation between methods and words section), but also among the different methods (see the Methods correlation analysis section). Most discriminative classifiers fail to exploit such relations. The competitive performance confirms that the dependence assumptions in the model are reasonable and that it is necessary to model the correlation between the different methods and words in detection method extraction.
Our contributions in this paper are: 1) we propose a generative probabilistic model with proper underlying semantics for detection method extraction, and the model achieves promising performance; 2) we properly model the correlation between the detection methods and related words in the biological literature, capturing the in-depth relationship not only between the methods and related words but also among the different methods.
The CMW model is now being integrated into our ONBIRES system [5] to provide an on-line service. In future work, we plan to associate the extracted methods with the annotated interaction pairs and retrieve the evidence sentences in the documents, which would provide a more thorough annotation of the protein interactions in the biological literature.
Declarations
Acknowledgements
This work was supported by the Chinese Natural Science Foundation under grant No. 60572084, National High Technology Research and Development Program of China (863 Program) under No. 2006AA02Z321, as well as Tsinghua Basic Research Foundation under grant No. 052220205 and No. 053220002.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 1, 2009: Proceedings of The Seventh Asia Pacific Bioinformatics Conference (APBC) 2009. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S1
Authors’ Affiliations
References
- Daraselia N, Yuryev A, Egorov S, Novichkova S, Nikitin A, Mazo I: Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics 2004, 20(5):604–611.
- Ono T, Hishigaki H, Tanigami A, Takagi T: Automated extraction of information on protein protein interactions from the biological literature. Bioinformatics 2001, 17(2):155–161.
- Huang M, Zhu X, Hao Y, Payan DG, Qu K, Li M: Discovering patterns to extract protein protein interactions from full texts. Bioinformatics 2004, 20(18):3604–3612.
- Blaschke C, Valencia A: The Frame-Based Module of the SUISEKI Information Extraction System. IEEE Intelligent Systems 2002, 17(2):14–20.
- Huang M, Zhu X, Ding S, Yu H, Li M: ONBIRES: ONtology-based BIological Relation Extraction System. Proceedings of the Fourth Asia Pacific Bioinformatics Conference 2006, 327–336.
- Molecular INTeraction database Home[http://mint.bio.uniroma2.it/mint/Welcome.do]
- IntAct[http://www.ebi.ac.uk/intact/site/index.jsf]
- Chatr-aryamontri A, Ceol A, Licata L, Cesareni G: Annotating molecular interactions in the MINT database. Proceedings of the second biocreative challenge evaluation workshop 2007, 55–59.
- Krallinger M, Leitner F, Valencia A: Assessment of the Second BioCreative PPI task: Automatic Extraction of Protein-Protein Interactions. Proceedings of the second biocreative challenge evaluation workshop 2007, 41–54.
- Orchard S, Montecchi-Palazzi L, Hermjakob H, Apweiler R: The use of common ontologies and controlled vocabularies to enable data exchange and deposition for complex proteomic experiments. Pacific Symposium on Biocomputing 2005, 186–196.
- Joachims T: Text categorization with support vector machines: learning with many relevant features. European Conference on Machine Learning 1998, 137–142.
- Boutell MR, Luo J, Shen X, Brown CM: Learning multi-label scene classification. Pattern Recognition 2004, 37(9):1757–1771.
- Rinaldi F, Kappeler T, Kaijurand K: OntoGene In Biocreative II. Proceedings of the second biocreative challenge evaluation workshop 2007, 193–198.
- Blei DM, Ng AY, Jordan MI: Latent Dirichlet Allocation. The Journal of Machine Learning Research 2003, 3(2–3):993–1022.
- Steyvers M, Griffiths T: Probabilistic Topic Models. In Handbook of Latent Semantic Analysis Edited by: Landauer TK, McNamara DS, Dennis S, Kintsch W, Routledge. 2007, 424–440.
- Wei X, Croft W: LDA-based document models for ad-hoc retrieval. Proceedings of the 29th annual international ACM SIGIR 2006, 178–185.
- Mimno D, McCallum A: Expertise modeling for matching papers with reviewers. Proceedings of the 13th ACM SIGKDD 2007, 500–509.
- Li FF, Perona P: A Bayesian Hierarchical Model for Learning Natural Scene Categories. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2005, 524–531.
- Buntine WL: Operations for Learning with Graphical Models. Journal of Artificial Intelligence Research 1994, 2: 159–225.
- Minka TP: Expectation Propagation for approximate Bayesian inference. Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence 2001, 362–369.
- Attias H: A variational Bayesian framework for graphical models. Advances in Neural Information Processing Systems 2000, 209–215.
- Andrieu C, de Freitas N, Doucet A, Jordan MI: An Introduction to MCMC for Machine Learning. Machine Learning 2003, 50(1–2):5–43.
- PubMed[http://www.ncbi.nlm.nih.gov/pubmed/]
- Porter M: An algorithm for suffix stripping. Program 1980, 14(3):130–137.
- Chai KMA, Chieu HL, Ng HT: Bayesian online classifiers for text classification and filtering. SIGIR '02: Proceedings of the 26th annual international ACM SIGIR 2002, 97–104.
- Yang Y, Pedersen JO: A Comparative Study on Feature Selection in Text Categorization. Proceedings of the Fourteenth International Conference on Machine Learning 1997, 412–420.
- Joachims T: Learning to Classify Text Using Support Vector Machines. Heidelberg, Germany: Springer; 2002.
- Matt Rasmussen, gCluto Home[http://www-users.cs.umn.edu/mrasmus/gcluto/index.shtml]
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.