Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions
© Agarwal et al; licensee BioMed Central Ltd. 2011
- Published: 3 October 2011
Protein-protein interaction (PPI) is an important biomedical phenomenon. Automatically detecting PPI-relevant articles and identifying methods that are used to study PPI are important text mining tasks. In this study, we have explored domain independent features to develop two open source machine learning frameworks. One performs binary classification to determine whether the given article is PPI relevant or not, named “Simple Classifier”, and the other one maps the PPI relevant articles with corresponding interaction method nodes in a standardized PSI-MI (Proteomics Standards Initiative-Molecular Interactions) ontology, named “OntoNorm”.
We evaluated our systems in the context of the BioCreative III challenge using its standardized data set. Our systems are amongst the top systems reported by the organizers, attaining 60.8% F1-score for identifying relevant documents, and 52.3% F1-score for mapping articles to the interaction method ontology.
Our results show that domain-independent machine learning frameworks can perform competitively at the tasks of detecting PPI relevant articles and identifying the methods that were used to study the interaction in such articles.
- Development Data
- Negative Instance
- Machine Learning Classifier
- Method Node
- Text Mining Application
Protein-protein interactions (PPI) are responsible for many biological phenomena. Understanding these interactions can greatly benefit biological research; for example, it can help us understand the causes of certain diseases, which can in turn lead to the development of therapeutic interventions. The BRCA1 and BARD1 proteins illustrate the significance of such interactions: they have been reported to interact with each other, and a mutation in BRCA1 can disrupt this interaction, which can lead to breast cancer.
The importance of PPIs has led to the development of several curated databases including IntAct, BioGRID and MINT. These databases are generally curated manually and store information including the proteins that interact with each other, the articles in which these interactions were detected and the methods that were used to discover these interactions. However, manually curating articles for PPIs is a time-consuming process, and due to the fast rate of research and the rapid increase in the amount of published literature, the effort required to maintain such databases has increased significantly. This has spurred the development of text-mining approaches to automate the identification of such interactions and help the manual curation process.
One important task is to identify the methods used to study PPIs, known as the interaction method task (IMT). IMT helps database curators determine the validity of the reported interactions, because certain methods give better evidence of an interaction than others [5, 6]. The methods sub-ontology in the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) ontology is a controlled vocabulary to which interaction methods can be mapped. Annotating methods with PSI-MI’s methods sub-ontology will help database curation efforts.
To efficiently identify PPI interaction methods, another important task is to first determine if the given article contains protein-protein interaction or not, known as article classification task (ACT). ACT is indispensable for other PPI related text mining applications, such as interaction event detection.
Different approaches have been developed for the ACT task. A simple approach is to use n-gram features to train supervised machine learning algorithms, which have been deployed in many similar tasks [8–19]. Normalization and feature selection may be conducted before training the classifiers. Domain-specific adaptations of this approach have been used for this task as well. One modification made use of contextual bag-of-words features. The context information included the number of protein names that appear in the abstract of the article to be classified, on the assumption that more protein names in the abstract indicate a greater likelihood that the article contains protein-protein interaction data. Support vector machine (SVM) classifiers were trained on these contextual bag-of-words features. Other extensions added MeSH terms as features along with selected n-gram features [21–24]. Grover et al. used a “bag-of-nlp” approach where the output of a natural language processing pipeline was augmented with word features to classify articles. Dogan et al. identified the 10 nearest neighbors of the test article in the training data and used the gold standard annotations of these neighbors as features along with the n-gram features.
Approaches that explore features beyond words for classification training have also been proposed. A semi-supervised approach has been suggested in which dependency tree based patterns are automatically learned from the training data, starting from a set of eight manually seeded patterns. Another approach made use of information retrieval techniques to identify protein-protein interaction relevant documents, using a set of well-known protein interaction related keywords as queries. An approach by Kolchinsky et al. made use of features from the citation network of relevant literature to classify articles. Kim and Wilbur automatically extracted grammatical patterns from the training corpus and used these patterns for ACT. They found that this approach performs better than machine learning approaches based on a bag-of-words representation.
Although a lot of research has been done on ACT, research on identifying interaction methods is limited. Similar to our goal, most studies in this area attempt to associate method nodes in the PSI-MI ontology with articles. The OntoGene system developed by Rinaldi et al. [23, 30] makes use of pattern matching techniques to identify interaction methods, with handcrafted patterns to improve performance. A pattern matching approach has been employed by Lourenco et al. as well. Dogan et al. combined pattern matching and k-nearest neighbors’ annotations for this task. They also mapped the article’s MeSH terms to PSI-MI nodes to identify relevant method nodes. Machine learning-based approaches that view IMT as a document-level classification problem have also been reported [24, 32]. These approaches expanded the synonyms for PSI-MI nodes by adding synonyms from the UMLS Metathesaurus. Matos et al. approached IMT as an information retrieval problem: the documents were indexed using Lucene and retrieved using method names.
In this study, we report on the development of machine learning frameworks to identify articles that contain protein-protein interaction data and then process these articles to identify the methods that were used to discover protein-protein interactions. In contrast to previous approaches, many of which rely on human-curated data or domain-specific features, our goal is to develop an adaptable framework by exploring domain independent features, which can be generalized to other text mining applications with no or minimal adaptation. For example, our ACT framework can be applied to train and classify any type of text document, regardless of its domain. Similarly, our IMT framework can be used to map terms from any ontology to any text. Accordingly, we explored machine learning-based approaches using features that are domain independent.
The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge is a community effort to promote the development of biomedical text mining applications. To date, four BioCreative challenges have been organized. The interaction method task featured in two of these challenges, while the article classification task featured in the last three [33–37]. The latest BioCreative challenge, BioCreative III, includes both ACT and IMT. We used the data and evaluation provided through BioCreative III to develop and evaluate our machine learning frameworks.
We explored supervised machine learning approaches for both ACT and IMT. The data we used for training is described below, followed by the ACT classification and IMT classification tasks.
Training, development and test data
Table of data set sizes (row labels only; the values are missing from this text)
- ACT data (article abstracts): training data total, positive and negative; development data total, positive and negative; test data total
- IMT data (article full-texts): training number of articles, number of annotations and annotations per article; development number of articles, number of annotations and annotations per article; test number of articles
For IMT, the task was to identify interaction methods at a document level and not at interaction or mention level. The methods sub-ontology of the PSI-MI ontology was used to obtain the collection of possible methods. From this sub-ontology, 115 nodes were allowed for IMT. Four nodes from the 115 allowed nodes accounted for roughly half of all annotations; these were (in order of highest to lowest frequency): “anti-bait coimmunoprecipitation”, “anti-tag coimmunoprecipitation”, “pull down” and “two hybrid”. “Anti-bait coimmunoprecipitation” and “anti-tag coimmunoprecipitation” accounted for one third of all annotations. Within the test data, 222 articles out of the 305 articles were annotation-relevant; hence the remaining 83 articles had no annotations assigned to them. The full-text of the article was used for training and testing. Although the full-text articles were originally in PDF format, the organizers of BioCreative III also provided the corresponding files in text format, which were used for training and testing in our experiments.
As noted earlier, the distribution of data in the development data is similar to the distribution of test data. Hence, for tuning, we trained models on the development data and tested them on the training data. We trained two different classifier models: Support Vector Machines (SVM) with polynomial kernel and Naïve Bayes Multinomial (NBM). We used the implementation provided in the Weka data mining library (downloaded from: http://www.cs.waikato.ac.nz/ml/weka/).
We normalized all text by lowercasing all characters, removing punctuation, stemming all words (using the Porter stemming algorithm [41, 42]) and removing numbers. We then extracted unigrams (individual words) and bigrams (two consecutive words) as features. As this led to a large number of features, we conducted feature selection with two feature scoring algorithms: mutual information and chi-square score. All features were scored with these algorithms and we used the top 20, 50, 100, 400 and 1000 features to train the classifier.
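The normalization, n-gram extraction and chi-square scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SimpleClassifier code: the function names are hypothetical, stemming is left as a pluggable `stem` callable (identity by default, where the study used the Porter stemmer), features are treated as present or absent per document, and only the chi-square score is shown.

```python
import re
from collections import Counter


def normalize(text, stem=lambda w: w):
    """Lowercase, drop punctuation and digits, then apply a stemmer to each token.

    `stem` is a placeholder (identity by default); the study used Porter stemming.
    Note that replacing digits with spaces turns a token such as "p53" into "p".
    """
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [stem(tok) for tok in text.split()]


def ngram_features(tokens):
    """Return the set of unigrams and bigrams that occur in a token list."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens) | set(bigrams)


def chi_square_scores(docs, labels):
    """Score every feature with the chi-square statistic of its 2x2
    (feature present/absent) x (positive/negative document) table."""
    n = len(docs)
    n_pos = sum(labels)
    feats = [ngram_features(normalize(d)) for d in docs]
    df = Counter(f for fs in feats for f in fs)
    df_pos = Counter(f for fs, y in zip(feats, labels) if y for f in fs)
    scores = {}
    for f, total in df.items():
        a = df_pos[f]          # positive documents containing f
        b = total - a          # negative documents containing f
        c = n_pos - a          # positive documents without f
        d = n - n_pos - b      # negative documents without f
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_k(scores, k):
    """Return the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


docs = [
    "Protein A binds protein B.",
    "We observed binding of 2 proteins!",
    "Weather was sunny.",
    "The sunny weather continued.",
]
labels = [1, 1, 0, 0]
scores = chi_square_scores(docs, labels)
selected = top_k(scores, 2)
```

The selected features would then be used as the input attributes of the classifier; a mutual information score could be plugged in by replacing `chi_square_scores` with another scoring function over the same counts.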
Runs submitted for ACT (table; cell values other than the feature type are missing from this text)
Recovered columns: type of features used; number of features. All five runs used unigrams and bigrams.
For ACT, we developed a framework that can apply the feature selection methods described above with different classifier algorithms. The framework is called SimpleClassifier and is available online at http://sourceforge.net/p/simpleclassify/home/. It can be used to train classifiers for any text collection.
Top 10 unigrams and bigrams for IMT, ranked by mutual information score (table; entries missing from this text)
Features used for IMT
- Perfect match (2 features): for each node, checks if (1) the concept name or (2) any synonym name appears in the article
- Term match (4 features): for each node, checks if any unigram/bigram in the node’s (1, 2) concept name or (3, 4) synonyms appears in the article
- Term match ratio (4 features): for each node, the ratio of unigrams/bigrams in the node’s (1, 2) concept name or (3, 4) synonyms that appear in the article
- Matched terms mutual information sum (4 features): sum of the mutual information score of each matching unigram/bigram in the node’s (1, 2) concept name or (3, 4) any synonym
- Matched terms chi-squared sum (4 features): sum of the chi-squared value of each matching unigram/bigram in the node’s (1, 2) concept name or (3, 4) any synonym
- The number of times this node is annotated in the training data
- Checks if the regular expression-based annotator that was provided by the organizers of BioCreative III annotates the current article-ontology node pair
- Checks if the keyword for the ontology node appears in the article
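As an illustration of the first three feature groups listed above (perfect match, term match and term match ratio), the sketch below computes them for one article-node pair. It is a simplified reading of the feature descriptions, not the OntoNorm implementation: the function names and feature keys are hypothetical, tokens are split on whitespace only, and the perfect match is a plain substring test.

```python
def ngrams(text, n):
    """Return the set of word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def match_features(article, concept_name, synonyms):
    """Compute 10 features for one (article, ontology node) pair."""
    article_lc = " ".join(article.lower().split())
    feats = {
        # Perfect match: the full name (or a full synonym) occurs in the article.
        "perfect_name": concept_name.lower() in article_lc,
        "perfect_synonym": any(s.lower() in article_lc for s in synonyms),
    }
    for n, label in ((1, "unigram"), (2, "bigram")):
        article_terms = ngrams(article, n)
        for source, texts in (("name", [concept_name]), ("synonym", synonyms)):
            terms = set().union(*(ngrams(t, n) for t in texts))
            hits = terms & article_terms
            # Term match: at least one n-gram of the name/synonyms is in the article.
            feats[f"term_match_{label}_{source}"] = bool(hits)
            # Term match ratio: fraction of the n-grams found in the article.
            feats[f"term_ratio_{label}_{source}"] = len(hits) / len(terms) if terms else 0.0
    return feats


article = "Lysates were analyzed by pull down assays and two hybrid screens"
feats = match_features(article, "pull down", ["pulldown assays"])
```

The mutual information and chi-squared sum features would follow the same pattern, summing a per-term score over `hits` instead of counting them.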
From the recall and precision, the F1-Score was calculated by taking their harmonic mean. The F1-Score obtained was used as a measure of performance during the parameter tuning process, by which we obtained the best number of features for each classifier.
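For reference, the harmonic mean mentioned above can be written out as follows. The symbols TP, FP and FN (true positives, false positives and false negatives) are the conventional notation and are assumed here rather than taken from the text.

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,P\,R}{P + R}
\]
```

As a purely illustrative calculation (these values are not results from this study), a precision of 0.5 and a recall of 0.75 give an F1-Score of 2(0.5)(0.75)/(0.5 + 0.75) = 0.6.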
For each article, we identified the evidence sentence from which each interaction method was identified. For this, we calculated a score for each sentence, and the sentence with the highest score was considered to be associated with the interaction method. To calculate the score, the unigrams in the interaction method’s name were looked for in each sentence. If a unigram was present in the sentence, then the unigram’s chi-square value was added to the sentence’s score. If no unigrams were present in the sentence, then a score of 0 was assigned to the sentence. If multiple sentences had the same score, the longest sentence was associated with the interaction method.
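The sentence scoring procedure described above can be sketched as follows. This is a minimal illustration, assuming a simple alphanumeric tokenizer, sentence length measured in characters, and a hypothetical `chi_square` dictionary that maps unigrams to their chi-square values; each matching unigram is counted once per sentence.

```python
import re


def best_evidence_sentence(sentences, method_name, chi_square):
    """Return the sentence with the highest summed chi-square value of the
    method-name unigrams it contains; ties go to the longest sentence."""
    name_unigrams = set(re.findall(r"[a-z0-9]+", method_name.lower()))

    def score(sentence):
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        # Sentences with no matching unigram get a score of 0.
        return sum(chi_square.get(u, 0.0) for u in name_unigrams & words)

    # The (score, length) key makes max() prefer the longest sentence on ties.
    return max(sentences, key=lambda s: (score(s), len(s)))


sentences = [
    "Cells were lysed.",
    "A pull down assay confirmed binding.",
    "The pull down assay with GST beads confirmed the binding.",
]
chi_square = {"pull": 3.0, "down": 1.5}  # illustrative values only
evidence = best_evidence_sentence(sentences, "pull down", chi_square)
```

Note that when no sentence contains any unigram of the method name, this sketch falls back to the longest sentence; the text above does not say how that case was handled.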
As a result, we developed a framework for IMT that can make use of the features mentioned above and conduct feature selection. The framework is called OntoNorm and is available online at http://sourceforge.net/p/ontonorm/home/. It can be used to train models with any ontology and text collection.
Runs submitted for IMT (table; most cell values are missing from this text)
Recovered entries: a “Number of features” column with “All (21 features)” in three rows, and the entry “Naïve Bayes Tree”.
where n is the total number of observations. The area under the curve (AUC iP/R) was measured by drawing the precision/recall curve and interpolating the curve. The area under this curve is the AUC iP/R.
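One common way to compute an interpolated precision/recall area from a ranked result list is sketched below. It is an assumption-laden illustration rather than the official BioCreative evaluation script: the interpolated precision at a recall level is taken as the highest precision observed at that recall or any higher recall, and the area is accumulated as a step function over the recall levels reached at each relevant hit. The function name `auc_ipr` is hypothetical.

```python
def auc_ipr(ranked_labels):
    """Area under the interpolated precision/recall curve.

    ranked_labels: booleans (True = relevant), sorted by decreasing confidence.
    """
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        return 0.0

    # Precision and recall at the rank of every relevant item.
    points = []
    tp = 0
    for rank, is_pos in enumerate(ranked_labels, start=1):
        if is_pos:
            tp += 1
            points.append((tp / total_pos, tp / rank))

    # Interpolate: precision at recall r is the maximum precision at recall >= r.
    best = 0.0
    interpolated = []
    for recall, precision in reversed(points):
        best = max(best, precision)
        interpolated.append((recall, best))
    interpolated.reverse()

    # Sum the rectangles between consecutive recall levels.
    area, prev_recall = 0.0, 0.0
    for recall, precision in interpolated:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area
```

Because the measure depends on the order of the results, a classifier that assigns the same confidence to most instances gives the ranking little to work with, whatever its accuracy.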
For ACT, the accuracy, sensitivity and specificity of the system were also measured. The accuracy is the ratio of correctly classified instances and all instances, sensitivity is the ratio of true positive instances and all positive instances and specificity is the ratio of true negative instances and all negative instances.
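These ratios can be computed directly from the four confusion-matrix counts, as in the sketch below. The function name is hypothetical, and the Matthews correlation coefficient is included using its standard confusion-matrix definition, which is an assumption about the formula applied in the evaluation.

```python
import math


def act_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positives / all positives
        "specificity": ratio(tn, tn + fp),   # true negatives / all negatives
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


example = act_metrics(tp=40, fp=10, tn=45, fn=5)  # illustrative counts only
```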
Results in relation to other systems
We compared the performance of our system with other teams that participated in the ACT and IMT tasks of the BioCreative III challenge. Ten teams participated in ACT and eight in IMT. For ACT, compared to other participants, our system ranked 2nd when measured by F1-Score and Matthews Correlation Coefficient value, and 5th when measured by AUC iP/R and accuracy. The best performing systems from any team for these measures attained 89.15% accuracy, 61.42% F1-Score, 0.553 Matthews Correlation Coefficient value and 67.98% AUC iP/R. Our best performance on these measures was: 87.73% accuracy, 60.80% F1-Score, 0.533 Matthews Correlation Coefficient value and 62.13% AUC iP/R. These results indicate that, on accuracy, F1-Score and Matthews Correlation Coefficient, the performance of our systems was very close to that of the best performing system.
Similarly for IMT, our system ranked 3rd when measured by F1-Score, Matthews Correlation Coefficient value and AUC iP/R. The best performing systems from any team for these measures attained 55.12% F1-Score, 0.542 Matthews Correlation Coefficient value and 35.42% AUC iP/R. Our best performance on these measures was: 52.38% F1-Score, 0.514 Matthews Correlation Coefficient value and 30.05% AUC iP/R.
We have developed supervised machine learning frameworks to identify articles that contain protein-protein interaction data and to map ontology nodes to the text of an article. Our goal was to develop these approaches independent of domain knowledge and manual intervention, such that they can be viewed as frameworks that can be applied to other article classification and ontology mapping tasks. For ACT, our system, Simple Classifier, meets these goals. For IMT, we did modify the ontology by manually adding synonyms and keywords, because of which we cannot claim that OntoNorm meets our goal of being free from manual intervention; however, given an ontology with a comprehensive list of synonyms, this manual intervention would be unnecessary. In this sense, OntoNorm can be used to map terms from any given ontology to any text article.
For ACT, our approach was simpler than the approach used by many other teams at the BioCreative challenge. Despite the simplicity, our system ranked 2nd amongst 10 teams, and the difference between the performance of the team that ranked 1st and our system was marginal, suggesting that the frameworks we employed in this study are very efficient, competitive and robust. Our SVM-based runs obtained poor AUC iP/R, despite obtaining good accuracy and F1-score. This was because for most instances, the annotation confidence assigned by the classifier was 100%, which prevented the results from being ranked meaningfully. Except for AUC iP/R, SVM-based models performed better than NBM-based models on the test data, although NBM-based models performed better during tuning. This may be because the NBM-based models overfit the training data.
On analyzing incorrectly classified ACT cases, we observed that false positives occurred when an article contained terms that usually indicate protein-protein interaction, but were not used in that context; for example, the article with PMID:19694809 uses the keyword ‘interaction’, but does not indicate protein-protein interaction in this context. At the same time, false negatives occurred when such terms were missing, although the article contained protein-protein interaction data; for example, the article with PMID:19724778. The error analysis uncovers one disadvantage of our machine-learning framework: it is based only on lexical features, which may not contain sufficient information and can cause ambiguities in some cases. It also suggests that deep linguistic analysis (e.g. syntactic and semantic analysis) might be needed to further enhance the system’s performance.
For IMT, we identified several domain independent features to classify article-node pairs. We believe that the approach works well, as our system was placed 3rd amongst 8 teams at BioCreative III. We found that tree-based classifier algorithms such as Random Forest and J48 performed better at this task. Most of our errors occurred when annotating the nodes “anti-tag coimmunoprecipitation” and “anti-bait coimmunoprecipitation”, because “coimmunoprecipitation” was usually mentioned in relevant articles, but whether it was anti-tag or anti-bait coimmunoprecipitation was not explicitly stated. For example, one article was falsely annotated with anti-tag coimmunoprecipitation.
We found that unigram related features ranked higher than bigram related features in the IMT task, as 4 out of the 5 top features are from unigrams. We speculate that this is because of the high variance in how different interaction methods are discussed in articles, such that unigram features become more reliable than bigrams.
We have developed machine learning frameworks that make use of domain independent features to classify text (Simple Classifier) and to map nodes in an ontology to text (OntoNorm). These frameworks obtain competitive performance compared with other participant teams when applied on tasks to identify articles that contain protein-protein interaction data and to identify methods from an ontology that were used to study these interactions.
In the future, we may apply our frameworks to other text mining applications. In addition, our current approach for OntoNorm does not make use of the hierarchy of the ontology; this will be investigated and evaluated in the future as well.
We acknowledge the support from the National Library of Medicine, grant number 5R01LM009836 to Hong Yu.
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 8, 2011: The Third BioCreative – Critical Assessment of Information Extraction in Biology Challenge. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S8.
- Hashizume R, Fukuda M, Maeda I, Nishikawa H, Oyake D, Yabuki Y, Ogata H, Ohta T: The RING Heterodimer BRCA1-BARD1 Is a Ubiquitin Ligase Inactivated by a Breast Cancer-derived Mutation. Journal of Biological Chemistry 2001, 276(18):14537–14540. 10.1074/jbc.C000881200
- Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R: IntAct: an open source molecular interaction database. Nucleic Acids Research 2004, 32(Database issue):D452-D455. [http://dx.doi.org/10.1093/nar/gkh052]
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Research 2006, 34(suppl 1):D535-D539.
- Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G: MINT: the Molecular INTeraction database. 2007. [http://nar.oxfordjournals.org/cgi/content/short/35/suppl_1/D572]
- Krallinger M: Importance of negations and experimental qualifiers in biomedical literature. Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, NeSp-NLP '10 Morristown, NJ, USA: Association for Computational Linguistics; 2010, 46–49. [http://portal.acm.org/citation.cfm?id=1858959.1858967]
- Orchard S, Montecchi-Palazzi L, Hermjakob H, Apweiler R: The use of common ontologies and controlled vocabularies to enable data exchange and deposition for complex proteomic experiments. 2005. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=?doi=10.1.1.111.1176]
- Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C, Roechert B, Poux S, Jung E, Mersch H, Kersey P, Lappe M, Li Y, Zeng R, Rana D, Nikolski M, Husi H, Brun C, Shanker K, Grant SG, Sander C, Bork P, Zhu W, Pandey A, Brazma A, Jacq B, Vidal M, Sherman D, Legrain P, Cesareni G, Xenarios I, Eisenberg D, Steipe B, Hogue C, Apweiler R: The HUPO PSI’s molecular interaction format-a community standard for the representation of protein interaction data. Nature Biotechnology 2004, 22(2):177–183. 10.1038/nbt926
- Garcia FC, Puertas E, Hidalgo JMG, Mana M, Mata J: Attribute analysis in biomedical text classification. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:113–118.
- Cohen AM: Automatically Expanded Dictionaries with Exclusion Rules and Support Vector Machine Text Classifiers: Approaches to the BioCreAtIve 2 GN and PPI-IAS Tasks. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:169–174.
- Lan M, Tan CL, Su J: A term investigation and majority voting for protein interaction article sub-task 1 (IAS). In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:183–185.
- Shin SY, Kim S, Eom JH, Zhang BT, Sriram R: Identifying Protein-Protein Interaction Sentences Using Boosting and Kernel Methods. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:187–192.
- Figueroa A, Neumann G: Identifying Protein-Protein interactions in Biomedical publications. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:217–225.
- Huang M, Ding S, Wang H, Zhu X: Mining Physical Protein-Protein Interactions by Exploiting Abundant Features. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:237–245.
- Abi-Haider A, Kaur J, Maguitman A, Radivojac P, Retchsteiner A, Verspoor K, Wang Z, Rocha LM: Uncovering Protein-Protein Interactions in the Bibliome. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:247–255.
- Kolchinsky A, Abi-Haidar A, Kaur J, Hamed AA, Rocha LM: Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010, 7: 400–411.
- Lan M, Su J: Empirical Investigations into Full-Text Protein Interaction Article Categorization Task (ACT) in the BioCreative II.5 Challenge. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010, 7: 421–427.
- Cao Y, Li Z, Liu F, Agarwal S, Zhang Q, Yu H: An IR-Aided Machine Learning Framework for the BioCreative II.5 Challenge. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010, 7: 454–461.
- Fontaine JF, Andrade-Navarro MA: Fast classification of scientific abstracts related to protein-protein interaction using a naïve Bayesian linear classifier. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:67–72.
- Matos S, Campos D, Oliveira JL: Vector-space models and terminologies in gene normalization and document classification. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:119–124.
- Dai HJ, Hung HC, Tsai RTH, Hsu WL: IASL Systems in the Gene Mention Tagging Task and Protein Interaction Article Sub-task. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:69–76.
- Ehrler F, Gobeill J, Tbahriti I, Ruch P: GeneTeam Site Report for BioCreative II: Customizing a Simple Toolkit for Text Mining in Molecular Biology. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:199–207.
- Nakov P, Divoli A: BioText Report for the Second BioCreAtIvE Challenge. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:297–306.
- Rinaldi F, Schneider G, Clematide S, Jegen S, Parisot P, Romacker M, Vachon T: OntoGene (Team 65): preliminary analysis of participation in BioCreative III. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:131–136.
- Wang X, Rak R, Restificar A, Nobata C, Rupp C, Batista-Navarro RTB, Nawaz R, Ananiadou S: NaCTeM Systems for BioCreative III PPI Tasks. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:151–156.
- Grover C, Haddow B, Klein E, Matthews M, Neilsen LA, Tobin R, Wang X: Adapting a Relative Extraction Pipeline for the BioCreAtIvE II Tasks. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:273–286.
- Dogan RI, Yang Y, Neveol A, Huang M, Lu Z: Identifying protein-protein interactions in biomedical text articles. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:61–66.
- Greenwood MA, Stevenson M: A Semi-Supervised Approach To Learning Relevant Protein-Protein Interaction Articles. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:175–177.
- Chen YH, Ramampiaro H, Laegreid A, Saetre R: ProtIR prototype: abstract relevance for Protein-Protein Interaction in BioCreAtIvE2 Challenge, PPI-IAS subtask. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:179–181.
- Kim S, Wilbur WJ: Improving Protein-Protein Interaction Article Classification Performance by Utilizing Grammatical Relations. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:83–88.
- Rinaldi F, Kappeler T, Kaljurand K, Schneider G, Klenner M, Hess M, von Allmen JM, Romacker M, Vachon T: OntoGene in Biocreative II. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:193–198.
- Lourenco A, Conover M, Wong A, Pan F, Abi-Haider A, Nematzadeh A, Shatkay H, Rocha LM: Testing Extensive Use of NER tools in Article Classification and a Statistical Approach for Method Interaction Extraction in the Protein-Protein Interaction Literature. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:113–117.
- Leaman R, Sullivan R, Gonzalez G: A top-down approach for finding interaction detection methods. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:99–103.
- Krallinger M, Valencia A: Evaluating the Detection and Ranking of Protein Interaction relevant Articles: the BioCreative Challenge Interaction Article Sub-task (IAS). In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:29–39.
- Krallinger M, Leitner F, Valencia A: Assessment of the Second BioCreative PPI task: Automatically Extraction of Protein-Protein Interactions. In Proceedings of the BioCreative II workshop; Madrid, Spain. CNIO; 2007:41–54.
- Leitner F, Mardis SA, Krallinger M, Cesareni G, Hirschman LA, Valencia A: An Overview of BioCreative II.5. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010, 7: 385–399.
- Krallinger M, Vazquez M, Leitner F, Valencia A: Results of the BioCreative III Interaction Method Task. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:9–16.
- Krallinger M, Vazquez M, Leitner F, Valencia A: Results of the BioCreative III Interaction Method Task. In Proceedings of the BioCreative III workshop; Bethesda MA, USA. CNIO; 2010:17–23.
- Krallinger M, Vazquez M, Leitner F, Salgado D, Chatraryamontri A, Winter A, Perfetto L, Briganti L, Licata L, Iannuccelli M, Castagnoli L, Cesareni G, Tyers M, Schneider G, Rinaldi F, Leaman R, Gonzalez G, Matos S, Kim S, Wilbur WJ, Rocha L, Tendulkar AV, Rangrej A, Raut V, Agarwal S, Liu F, Wang X, Rak R, Noto K, Elkan C, Lu Z, Dogan RI, Fontaine JF, Andrade-Navarro MA, Valencia A: The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. Manuscript in review.
- Platt JC: Fast training of support vector machines using sequential minimal optimization. 1999, 185–208. [http://portal.acm.org/citation.cfm?id=299105]
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I: The WEKA data mining software: an update. Special Interest Group on Knowledge Discovery and Data Mining Explorer Newsletter 2009, 11: 10–18. [http://dx.doi.org/10.1145/1656274.1656278]
- van Rijsbergen CJ, Robertson SE, Porter MF: New models in probabilistic information retrieval. In British Library Research and Development Report, no. 5587. London: British Library; 1980.
- Porter MF: An algorithm for suffix stripping. Program 1980, 14(3):130–137. [http://portal.acm.org/citation.cfm?id=275705]
- Breiman L: Random Forests. Machine Learning 2001, 45: 5–32. [http://dx.doi.org/10.1023/A:1010933404324]
- Kohavi R: Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining 1996, 202–207. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.4952]
- Quinlan JR: C4.5: Programs for Machine Learning (Morgan Kaufmann Series in Machine Learning). 1st edition. Morgan Kaufmann; 1993. [http://www.amazon.com/exec/obidos/redirect?tag=citeulike07–20\&path=ASIN/1558602380]
- Vitali R, Cesi V, Tanno B, Ferrari-Amorotti G, Dominici C, Calabretta B, Raschella G: Activation of p53-dependent responses in tumor cells treated with a PARC-interacting peptide. Biochemical and Biophysical Research Communications 2008, 368(2):350–356. 10.1016/j.bbrc.2008.01.093
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.