- Research article
- Open Access
Large-scale extraction of accurate drug-disease treatment pairs from biomedical literature for drug repurposing
© Xu and Wang; licensee BioMed Central Ltd. 2013
- Received: 18 September 2012
- Accepted: 30 May 2013
- Published: 6 June 2013
A large-scale, highly accurate, machine-understandable drug-disease treatment relationship knowledge base is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indication as well as drug-repurposing knowledge that is crucial for applying FDA-approved drugs for new diseases. However, much of this information is buried in free text and not captured in any existing databases. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from published literature.
In this study, we developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we have shown that the extracted pairs strongly correlate with both drug target genes and therapeutic classes, and therefore may have high potential in drug discovery.
We demonstrated that our simple pattern-learning relationship extraction algorithm is able to accurately extract many drug-disease pairs from the free text of biomedical literature that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base resulting from our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks.
- Biomedical Literature
- Unified Medical Language System
- Anatomical Therapeutic Chemical Code
- Recurrent Colorectal Cancer
Computational drug repurposing approaches
Drug repurposing, the use of known drugs to treat new diseases, has been growing in importance in the last few years [1, 2] because of the prohibitively high cost of drug development and its increasing failure rate. Many computational strategies for drug repurposing have been published. These approaches include repositioning based on chemical similarity [4, 5], molecular activity similarity [6, 7], molecular docking, gene expression similarity [9, 10], and drug side effect similarity. Recently, Chiang et al. proposed a data-driven approach that uses FDA-approved drug-disease treatment associations for drug repurposing. Even though Chiang’s study used only FDA-approved drug-disease pairs, the researchers were able to infer novel drug uses based on shared treatment profiles using a network-based, guilt-by-association method.
A vast amount of drug-disease treatment information exists in the large corpus of published biomedical literature, especially in published clinical trial studies and case reports. Currently, there are 591,623 clinical trial reports and 1,554,544 clinical case reports available on MEDLINE. The drug-disease relationships in biomedical literature include FDA-approved, experimental, and unsuccessful or failed associations. In the USA and many other countries, off-label use of prescribed drugs is common, and many of these off-label uses have been published in clinical case reports. Consider the following sentence from a clinical case report: “Imatinib in the treatment of follicular dendritic sarcoma: a case report and review of literature.” (PMID 17596748). This sentence contains drug repurposing information about using imatinib to treat follicular dendritic sarcoma, for which surgery and radiotherapy are considered the mainstay treatment options. Another clinical case study example is the repurposing of gabapentin, an FDA-approved drug for controlling seizures in patients with epilepsy, to treat patients with tinnitus, as shown in the sentence: “Gabapentin for the treatment of tinnitus: a case report” (PMID 11233342). In this study, we develop a large-scale, pattern-based relationship extraction algorithm to extract drug-disease treatment pairs from published biomedical literature. These pairs include FDA-approved, experimental, and even failed drug-disease associations (the reasons behind failed drug indications are important for drug repurposing). Currently, there exists no knowledge base for failed drug-disease associations.
A large-scale and accurate list of drug-disease treatment pairs derived from published biomedical literature can be used for drug repurposing in two ways: first, the extracted pairs themselves contain many interesting drug-disease repurposing pairs with evidence from case studies or small-scale clinical studies (as shown above). Second, these pairs can be used in network-based systems approaches for drug repurposing. For example, if drug 1 is similar to drug 2 (similarity can be measured based on shared genes, pathways, gene expression profiles, chemical structures or phenotypes), and disease 1 can be treated by drug 1 (based on drug-disease relationship), then we can hypothesize that disease 1 can also be treated by drug 2. This is a very simple example and we can add more constraints to the repurposing algorithms, but drug-disease relationships will be important to connect drugs to diseases.
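The inference rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: drug similarity is measured here as Jaccard overlap of target-gene sets, the 0.5 threshold is arbitrary, and the drug, gene and disease names are placeholders rather than data from this study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_repurposing(drug_features, treats, threshold=0.5):
    """Return (candidate_drug, disease, source_drug, similarity) hypotheses.

    drug_features: drug -> set of features (here, hypothetical target genes)
    treats:        drug -> set of diseases with a known treatment relationship
    """
    hypotheses = []
    for d1, d2 in combinations(sorted(drug_features), 2):
        sim = jaccard(drug_features[d1], drug_features[d2])
        if sim < threshold:
            continue
        # Transfer indications in both directions between the similar drugs.
        for src, dst in ((d1, d2), (d2, d1)):
            new_diseases = treats.get(src, set()) - treats.get(dst, set())
            for disease in sorted(new_diseases):
                hypotheses.append((dst, disease, src, sim))
    return hypotheses


# Placeholder data (assumption, for illustration only).
drug_features = {
    "drug_1": {"GENE_A", "GENE_B", "GENE_C"},
    "drug_2": {"GENE_A", "GENE_B"},
    "drug_3": {"GENE_X"},
}
treats = {"drug_1": {"disease_1"}, "drug_2": set(), "drug_3": {"disease_9"}}

hypotheses = propose_repurposing(drug_features, treats)
for candidate, disease, source, sim in hypotheses:
    print(f"{candidate} may treat {disease} (similar to {source}, sim={sim:.2f})")
```

In practice the similarity function and threshold would be replaced by whichever measure (pathways, expression profiles, chemical structure, phenotypes) suits the repurposing task, and further constraints could be added before a hypothesis is reported.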
Drug-disease relationship extraction from biomedical literature
Currently, more than 20 million biomedical abstracts are available on MEDLINE, making it a rich source of biomedical information, including drug-disease treatment associations. However, despite the sheer volume of published articles, most of the available knowledge is buried in free text with limited machine understandability. Common approaches for relation extraction use rule-based, statistical, machine learning or natural language processing (NLP) techniques [14-18]. Automatically extracting drug-disease treatment relationships from free text is an active research area. Cimino et al. used MeSH descriptors and co-occurrence statistics to generate semantic relation extraction rules in order to detect relations in MEDLINE article titles. Lee et al. and Abacha et al. applied manually built patterns to identify treatment-specific relations between drugs and diseases [20, 21]. Rosario et al. classified seven relation types, including the drug-disease treatment type, using generative and neural network models. Chen et al. used co-occurrence statistics to rank the associations between eight diseases and relevant drugs. Rindflesch et al. developed the SemRep system to identify semantic relations in the biomedical literature based on linguistic analysis of text and domain knowledge. Recently, Neveol et al. automatically extracted and integrated drug indication information from multiple resources. To extract drug-disease relationships from biomedical text, the researchers used MeSH terms to retrieve related articles, from which drug-disease treatment pairs were then extracted. Many of the above studies leveraged MeSH terms in order to extract treatment-specific drug-disease pairs. However, not all drug-disease treatment pairs are captured by MeSH terms. For the two drug repurposing case studies mentioned previously, “imatinib-follicular dendritic sarcoma” and “gabapentin-tinnitus”, neither pair is specified in MeSH headings.
Machine-learning approaches have been applied to extract drug-disease treatment pairs from free text. Bundschus et al. developed a conditional random fields method to identify the semantic relations between diseases and treatments. The researchers trained and tested the model on a manually annotated text corpus consisting of 3570 sentences generated from MEDLINE 2001 abstracts and reported a 79.5% accuracy in identifying the treatment semantic relationship. Similarly, Islamaj Dogan et al. developed a context-blocks model for identifying clinical relationships, including the treatment semantic relationship, in patient records. The model was trained and tested on a set of 826 patient records and achieved an F-score of 0.704 in identifying the drug-disease treatment relationship. Even though both studies reported high performance in identifying the treatment semantic relationship on manually annotated test datasets, it is still unknown whether these models are generalizable and whether they can achieve the same high performance when tested on all MEDLINE abstracts using all known drug-disease treatment pairs (e.g., pairs extracted from FDA drug labels or pairs from ClinicalTrials.gov) as test data.
In this study, we develop a large-scale pattern-based approach to extracting drug-disease treatment associations from 20 million MEDLINE articles. Unlike previous studies, our study does not rely on MeSH terms or manually annotated training datasets to classify extracted drug-disease pairs and requires minimal human effort. While most relationship extraction methods put equal emphasis on precision and recall, our study focuses on building a large-scale and accurate drug-disease treatment relationship knowledge base for the purpose of ‘in silico’ drug target discovery and drug repurposing; therefore high precision, large scale (not necessarily high recall) and unbiasedness are important. The assumption underlying our pattern-learning approach is that even though a treatment-specific semantic relationship between a drug and a disease can be expressed in many different ways due to the flexibility and expressive nature of human natural language, these patterns are not randomly distributed. There exist predominant patterns that people commonly use to describe treatment-specific drug-disease associations, such as “DRUG in the treatment of DISEASE” and “DRUG for the treatment of DISEASE.” In fact, searching MEDLINE for the phrase “in the treatment of,” we retrieved more than 250,000 sentences. Searching for the more specific phrase “in the treatment of breast cancer,” we retrieved more than 1500 sentences. The drugs used to treat breast cancer include tamoxifen, dibromodulcitol, trastuzumab, lapatinib, vindesine and letrozole, among many others. Of these drugs, only a few are FDA-approved. In this study, we first automatically learn treatment-specific textual patterns using known drug-disease pairs. We then extract additional drug-disease pairs from published biomedical literature using these learned patterns.
Obtain MEDLINE data
We used 20 million MEDLINE abstracts (roughly 100 million sentences) published from 1965 to 2010 as the text corpus for our task of treatment-specific drug-disease relationship extraction. The 2010 MEDLINE/PubMed baseline XML files were downloaded from NLM’s anonymous FTP server at ftp://ftp.nlm.nih.gov/nlmdata/.medleasebaseline/. The MEDLINE XML files were then parsed. Abstracts and titles were extracted and split into sentences.
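The parsing step can be approximated with Python's standard library. The sketch below is not the pipeline used in this study: the regular-expression sentence splitter is a deliberately naive assumption (the text does not say which splitter was used), and the sample record is a hand-written stand-in whose abstract text is a placeholder.

```python
import re
import xml.etree.ElementTree as ET

# Naive boundary: sentence-final punctuation, whitespace, then a capital letter.
# This is an assumption; it will mis-split abbreviations such as "e.g. Drug".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split a block of text into sentences with the naive rule above."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


def iter_sentences(xml_text):
    """Yield (pmid, sentence) for every title and abstract sentence."""
    root = ET.fromstring(xml_text)
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")
        article = citation.find("Article")
        if article is None:
            continue
        chunks = []
        title = article.find("ArticleTitle")
        if title is not None:
            chunks.append("".join(title.itertext()))
        for abstract_text in article.iter("AbstractText"):
            chunks.append("".join(abstract_text.itertext()))
        for chunk in chunks:
            for sentence in split_sentences(chunk):
                yield pmid, sentence


# Hand-written stand-in record; the abstract text is a placeholder.
SAMPLE = """<MedlineCitationSet>
  <MedlineCitation>
    <PMID>11233342</PMID>
    <Article>
      <ArticleTitle>Gabapentin for the treatment of tinnitus: a case report.</ArticleTitle>
      <Abstract>
        <AbstractText>Placeholder sentence one. Placeholder sentence two.</AbstractText>
      </Abstract>
    </Article>
  </MedlineCitation>
</MedlineCitationSet>"""

sentences = list(iter_sentences(SAMPLE))
for pmid, sentence in sentences:
    print(pmid, sentence)
```

For the full baseline, each file would be streamed (for example with `ET.iterparse`) rather than loaded as one string, since the files are large.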
Create drug and disease lexicons
Drug lexicon: The drug lexicon was downloaded from DrugBank (http://www.drugbank.ca/) and consisted of 6,516 drugs, including both FDA-approved drugs and experimental drugs. We used drug names from DrugBank rather than RxNorm or other sources because DrugBank contains both experimental and FDA-approved clinical drugs.
Extract known drug-disease pairs from Clinicaltrials.gov
ClinicalTrials.gov is a registry of federally and privately supported clinical trials conducted in the United States and around the world. For each of the trials listed at ClinicalTrials.gov, there is associated medical condition and drug treatment information. We downloaded a total of 115,026 clinical trial XML files from ClinicalTrials.gov (data accessed in 04/2011). A total of 196,002 drug-disease pairs were extracted from the downloaded XML files. Many of the disease and drug names in the drug-disease pairs were in free text form. In addition, drug names are often mixtures of generic names and trade names. We performed named entity recognition for both drug and disease terms. We then mapped drug trade names to their generic names. Drug generic names as well as trade names were downloaded from DrugBank. After these steps, a total of 52,000 drug-disease pairs were obtained. These pairs were subsequently used as input (or seeds) to learn treatment-specific patterns, which were then used to extract additional drug-disease pairs from MEDLINE.
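A simplified view of the normalization step is shown below. It is a sketch under assumptions: the named entity recognition described above is replaced by exact lookup of lower-cased names, the function name `normalize_pairs` is hypothetical, and the input pairs are illustrative rather than actual ClinicalTrials.gov records.

```python
def normalize_pairs(raw_pairs, trade_to_generic, drug_lexicon, disease_lexicon):
    """Map trade names to generic names and keep pairs found in both lexicons.

    All names are compared in lower case. Pairs whose drug or disease is not
    in the corresponding lexicon are dropped, and duplicates collapse
    because the result is a set.
    """
    normalized = set()
    for drug, disease in raw_pairs:
        drug_key = drug.strip().lower()
        drug_key = trade_to_generic.get(drug_key, drug_key)
        disease_key = disease.strip().lower()
        if drug_key in drug_lexicon and disease_key in disease_lexicon:
            normalized.add((drug_key, disease_key))
    return normalized


# Illustrative inputs (assumption), not records from the registry.
trade_to_generic = {"gleevec": "imatinib", "neurontin": "gabapentin"}
drug_lexicon = {"imatinib", "gabapentin"}
disease_lexicon = {"follicular dendritic sarcoma", "tinnitus"}
raw_pairs = [
    ("Gleevec", "Follicular Dendritic Sarcoma"),
    ("imatinib", "follicular dendritic sarcoma"),  # duplicate after mapping
    ("Neurontin", "Tinnitus"),
    ("unlisted compound", "tinnitus"),             # dropped: not in drug lexicon
]

seed_pairs = normalize_pairs(raw_pairs, trade_to_generic, drug_lexicon, disease_lexicon)
print(sorted(seed_pairs))
```

Collapsing duplicates and dropping unrecognized names is one plausible reason the number of pairs shrinks after normalization, although the text does not break down the reduction.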
Tag MEDLINE sentences and extract patterns
We tagged MEDLINE sentences with disease entities from the clean disease lexicon and drug entities from the drug list we extracted from DrugBank. The tagging was based on case-insensitive exact string matching for high precision and efficiency. For each sentence tagged with both drug and disease entities, we extracted the textual patterns between each pair. The pattern could be “DRUG pattern DISEASE” if the drug entity precedes the disease entity, or “DISEASE pattern DRUG” if the disease entity precedes the drug entity. For example, from the phrase “Role of irinotecan in the treatment of small cell carcinoma” (PMID: 11995707), we extracted the pattern “DRUG in the treatment of DISEASE.” From the sentence “Seventeen women with breast cancer were treated with tamoxifen (20 mg, twice a day)” (PMID 06798066), the pattern “DISEASE were treated with DRUG” was extracted.
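The tagging and pattern extraction just described can be illustrated as follows. This is a minimal sketch, not the study's implementation: the helper names are hypothetical, the lexicons contain only the terms needed for the two example sentences, and word-boundary regular expressions stand in for whatever exact-matching machinery was actually used.

```python
import re


def compile_lexicon(terms):
    """Build one case-insensitive regex that prefers longer terms."""
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def extract_patterns(sentence, drug_re, disease_re):
    """Return (pattern, drug, disease) for each drug-disease mention pair."""
    mentions = [(m.start(), m.end(), "DRUG", m.group().lower())
                for m in drug_re.finditer(sentence)]
    mentions += [(m.start(), m.end(), "DISEASE", m.group().lower())
                 for m in disease_re.finditer(sentence)]
    results = []
    for _, end1, type1, name1 in mentions:
        for start2, _, type2, name2 in mentions:
            # Keep ordered pairs of different types: first mention ends
            # before the second one starts.
            if type1 == type2 or end1 > start2:
                continue
            between = " ".join(sentence[end1:start2].split())
            pattern = " ".join(p for p in (type1, between, type2) if p)
            drug, disease = (name1, name2) if type1 == "DRUG" else (name2, name1)
            results.append((pattern, drug, disease))
    return results


# Tiny lexicons (assumption): only the terms used in the examples above.
drug_re = compile_lexicon({"irinotecan", "tamoxifen"})
disease_re = compile_lexicon({"small cell carcinoma", "breast cancer"})

first = extract_patterns(
    "Role of irinotecan in the treatment of small cell carcinoma",
    drug_re, disease_re)
second = extract_patterns(
    "Seventeen women with breast cancer were treated with tamoxifen "
    "(20 mg, twice a day)",
    drug_re, disease_re)
print(first)
print(second)
```

A single alternation regex is convenient for a handful of terms; for a lexicon of thousands of entries applied to roughly 100 million sentences, a trie or Aho-Corasick matcher would likely be a better fit.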
Find treatment-specific patterns
Drug-disease pairs from ClinicalTrials.gov were first used as input to learn drug-disease treatment-specific patterns. Then the learned patterns were used to extract additional pairs from MEDLINE. For example, using the pairs from ClinicalTrials.gov, we learned a treatment-specific pattern “DRUG in the treatment of DISEASE”. We then used this learned pattern to extract additional drug-disease pairs from MEDLINE, which were not included in ClinicalTrials.gov. If the pattern “DRUG in the treatment of DISEASE” is associated with 1,000 pairs from ClinicalTrials.gov and 10,000 pairs in MEDLINE, then we will extract an additional 9,000 pairs from MEDLINE using this pattern.
The patterns between drug entities and disease entities are often highly complicated. The patterns can be very general, such as “DRUG and DISEASE”, or very specific, such as “DRUG in combination with 5-FU/leucovorin (LV) was subsequently evaluated as first-line therapy for DISEASE”, as shown in the sentence “Irinotecan in combination with 5-FU/leucovorin (LV) was subsequently evaluated as first-line therapy for metastatic colorectal cancer in two randomized, phase III studies” (PMID 11585970). In addition, the patterns between a drug entity and a disease entity are often unrelated to drug treatment. For instance, the pattern “DRUG-induced DISEASE” in the sentence “Tamoxifen-induced endometrial cancer” (PMID 12701962) is related to a drug side effect. In order to find drug treatment-specific patterns, we extracted the textual patterns between known drug-disease pairs from ClinicalTrials.gov. We then ranked the patterns by the number of associated known drug-disease pairs. Finally, we manually examined the top patterns and selected the drug treatment-specific ones. After the ranking step, the time required to examine the top-ranked patterns was minimal (less than 10 minutes).
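The ranking step amounts to counting, for each pattern, how many distinct known pairs it connects. A minimal sketch follows; the function name and the toy inputs are assumptions, and the manual selection of treatment-specific patterns from the top of the ranking is not modeled.

```python
from collections import defaultdict


def rank_patterns(tagged, known_pairs):
    """Rank patterns by the number of distinct known pairs they connect.

    tagged:      iterable of (pattern, drug, disease) from tagged sentences
    known_pairs: set of (drug, disease) seed pairs
    Returns a list of (pattern, support) sorted by descending support.
    """
    support = defaultdict(set)
    for pattern, drug, disease in tagged:
        if (drug, disease) in known_pairs:
            support[pattern].add((drug, disease))
    return sorted(
        ((pattern, len(pairs)) for pattern, pairs in support.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy inputs (assumption) standing in for tagged sentences and seed pairs.
tagged = [
    ("DRUG in the treatment of DISEASE", "irinotecan", "small cell carcinoma"),
    ("DRUG in the treatment of DISEASE", "tamoxifen", "breast cancer"),
    ("DRUG in the treatment of DISEASE", "tamoxifen", "breast cancer"),
    ("DRUG-induced DISEASE", "tamoxifen", "endometrial cancer"),
    ("DRUG and DISEASE", "tamoxifen", "breast cancer"),
]
known_pairs = {
    ("irinotecan", "small cell carcinoma"),
    ("tamoxifen", "breast cancer"),
}

ranking = rank_patterns(tagged, known_pairs)
for pattern, count in ranking:
    print(count, pattern)
```

Counting distinct pairs rather than sentence occurrences keeps a single heavily repeated pair from inflating a pattern's rank. General patterns such as “DRUG and DISEASE” can still score well, which is why the manual review of the top patterns remains necessary.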
Extract additional pairs from MEDLINE with selected patterns
For each of the manually selected treatment-specific patterns, we extracted its associated drug-disease pairs from tagged MEDLINE sentences. These patterns were learned using known drug-disease pairs. Here, we used them to extract additional drug-disease pairs from MEDLINE.
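Applying the selected patterns is then a filter followed by a set difference against the seed pairs, which mirrors the 10,000 minus 1,000 arithmetic given earlier. The sketch below uses hypothetical names and toy inputs.

```python
def extract_with_patterns(tagged, selected_patterns, known_pairs):
    """Return (all pairs matched by selected patterns, pairs not already known)."""
    extracted = {
        (drug, disease)
        for pattern, drug, disease in tagged
        if pattern in selected_patterns
    }
    return extracted, extracted - known_pairs


# Toy inputs (assumption) built from example sentences quoted in the text.
selected_patterns = {
    "DRUG in the treatment of DISEASE",
    "DRUG for the treatment of DISEASE",
}
tagged = [
    ("DRUG for the treatment of DISEASE", "gabapentin", "tinnitus"),
    ("DRUG in the treatment of DISEASE", "imatinib", "follicular dendritic sarcoma"),
    ("DRUG in the treatment of DISEASE", "tamoxifen", "breast cancer"),
    ("DRUG-induced DISEASE", "tamoxifen", "endometrial cancer"),
]
known_pairs = {("tamoxifen", "breast cancer")}

extracted, additional = extract_with_patterns(tagged, selected_patterns, known_pairs)
print(len(extracted), "pairs matched;", len(additional), "not in the seed set")
```

The side-effect pattern “DRUG-induced DISEASE” is excluded simply because it is not among the selected patterns.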
Evaluate extracted drug-disease pairs
In order to evaluate drug-disease pairs extracted from MEDLINE, which include FDA-approved as well as experimental drug-disease pairs, we manually created two MEDLINE-specific datasets to evaluate the precision and recall of the extraction algorithm. The first evaluation set consisted of drug-disease treatment pairs for the drug “irinotecan”. The second set consisted of drug-disease pairs for the disease “thrombocytopenia”. To create the “Irinotecan-Disease” evaluation set, we first retrieved all MEDLINE sentences (not just sentences containing the patterns) tagged with the term “irinotecan” and at least one disease term. We then manually extracted 360 treatment-specific pairs from these sentences. For creating the evaluation set “Drug-Thrombocytopenia”, we retrieved all MEDLINE sentences tagged with thrombocytopenia and at least one drug term. We manually extracted 43 treatment specific pairs from those sentences. The annotation task was performed by three curators. Each curator independently annotated tagged sentences and created two evaluation sets. Only the pairs agreed upon by all three curators were used in the final evaluation. The two sets were created independent of the methods we used (evaluators did not know the patterns we used). In this way, the final performance captured the effect of both the learned patterns and the quality of the drug and disease lexicons. Standard precision, recall, and F1 measures were used to evaluate extracted drug-disease pairs. One of the limitations is that these two manually created evaluation datasets (one drug and one disease only) may not be representative for other diseases and drugs. However, due to the intensive manual curation, we did not create evaluation datasets for multiple drugs and multiple diseases. 
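The evaluation logic (keep only pairs all three curators agree on, then score the extracted pairs against that gold set) can be sketched as below. The disease names are placeholders and the function names are hypothetical; only the definitions of precision, recall and F1 are standard.

```python
def consensus(*annotations):
    """Pairs annotated by every curator (set intersection)."""
    return set.intersection(*(set(a) for a in annotations))


def precision_recall_f1(extracted, gold):
    """Standard precision, recall and F1 of extracted pairs against gold."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Placeholder annotations (assumption): three curators, one drug.
curator_a = {("irinotecan", "d1"), ("irinotecan", "d2"),
             ("irinotecan", "d3"), ("irinotecan", "d4")}
curator_b = {("irinotecan", "d1"), ("irinotecan", "d2"), ("irinotecan", "d3")}
curator_c = {("irinotecan", "d1"), ("irinotecan", "d2"),
             ("irinotecan", "d3"), ("irinotecan", "d5")}

gold = consensus(curator_a, curator_b, curator_c)
system_output = {("irinotecan", "d1"), ("irinotecan", "d6")}
precision, recall, f1 = precision_recall_f1(system_output, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the gold set is built from all tagged sentences rather than only pattern-matching ones, recall computed this way penalizes both missing patterns and lexicon gaps, as noted above.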
Since the aim of this paper is to extract many additional pairs (pairs that are not included in ClinicalTrials.gov) from MEDLINE, we could not use pairs from ClinicalTrials.gov to evaluate these additional pairs extracted from MEDLINE. However, we did use pairs from ClinicalTrials.gov as prior knowledge (or seeds) to learn treatment-specific patterns.
Semantic analysis of extracted drug-disease pairs
To demonstrate the potential of the drug-disease pairs that we extracted from MEDLINE using the selected patterns in drug repurposing, we studied the correlations of our extracted drug-disease pairs with drug target genes as well as drug therapeutic classes. We extracted 10,478 drug-target gene pairs from DrugBank (accessed in 01/2012) and extracted 5,544 drug-ATC associations from the World Health Organization Anatomical Therapeutic Chemical (ATC) Classification System (http://www.whocc.no/atc). Examples of these associations include tamoxifen-anti-estrogens and trometamol-hemofiltrates. For all drug-drug pairs that shared disease indications, we calculated the average shared target genes as well as shared ATC codes, then compared them to those of all drug-drug pairs.
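The comparison described above can be sketched as follows: collect drug-drug pairs that share at least one disease, average the number of shared annotations (target genes or ATC codes) over those pairs, and compare with the average over all drug-drug pairs. All names and data here are placeholders, and the sketch assumes each drug-drug pair is counted once.

```python
from collections import defaultdict
from itertools import combinations


def drug_pairs_sharing_disease(treatment_pairs):
    """Drug-drug pairs (sorted tuples) with at least one disease in common."""
    by_disease = defaultdict(set)
    for drug, disease in treatment_pairs:
        by_disease[disease].add(drug)
    shared = set()
    for drugs in by_disease.values():
        shared.update(combinations(sorted(drugs), 2))
    return shared


def average_shared(drug_pairs, annotations):
    """Mean number of annotations (e.g. target genes) shared per drug pair."""
    pairs = list(drug_pairs)
    if not pairs:
        return 0.0
    total = sum(
        len(annotations.get(a, set()) & annotations.get(b, set()))
        for a, b in pairs
    )
    return total / len(pairs)


# Placeholder data (assumption), not values from DrugBank or the ATC system.
targets = {
    "drug_a": {"T1", "T2"},
    "drug_b": {"T1", "T2"},
    "drug_c": {"T3"},
    "drug_d": {"T1"},
}
treatment_pairs = [
    ("drug_a", "disease_x"), ("drug_b", "disease_x"),
    ("drug_c", "disease_y"), ("drug_d", "disease_z"),
]

sharing = drug_pairs_sharing_disease(treatment_pairs)
all_pairs = list(combinations(sorted(targets), 2))
avg_sharing = average_shared(sharing, targets)
avg_all = average_shared(all_pairs, targets)
print(f"shared-indication pairs: {avg_sharing:.2f}; all pairs: {avg_all:.2f}")
```

The same two functions apply unchanged to ATC codes by passing a drug-to-ATC-code mapping in place of `targets`.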
Analyze patterns associated with known drug-disease pairs
Extract additional pairs from MEDLINE using selected patterns
Precision and recall evaluation of the extracted drug-disease pairs
Precision, recall and F1 values at different frequency cutoffs
Semantic analysis of extracted drug-disease pairs
In this study, we developed a pattern-based relationship extraction method to mine drug-disease treatment associations from 20 million published MEDLINE abstracts. We extracted a total of 34,305 unique drug-disease pairs, the majority of which are not captured in any existing structured databases. The precision and recall are 0.904 and 0.131 respectively for all pairs, and 0.904 and 0.842 respectively for frequent pairs.
Even though our algorithm achieved high precision and extracted a large number of additional drug-disease treatment pairs from MEDLINE abstracts, there are several limitations to our study: (1) We only used the simple pattern “DRUG pattern DISEASE”. The recall of such a pattern critically depends on the coverage of the underlying lexicons. In our future studies, we will experiment with two additional patterns: (a) “NP1 pattern NP2”, where NP1 and NP2 are noun phrases; and (b) “NP1 pattern NP2”, where NP1 and NP2 are noun phrases, NP1 contains a drug term, and NP2 contains a disease term. Our current approach does not use syntactic information, and its precision and recall depend on the underlying lexicons. Both patterns (a) and (b) rely on parser information to reduce the number of patterns extracted and to increase recall by extracting pairs whose substrings are contained in the input lexicons. For example, in the sentence “The effect of irinotecan in the treatment of metastatic and recurrent colorectal cancer,” the disease lexicon includes the term “colorectal cancer” but not “metastatic and recurrent colorectal cancer”. Using the pattern “in the treatment of”, both pattern (a) and pattern (b) will extract the correct drug-disease pair “irinotecan-metastatic and recurrent colorectal cancer”, but our current method will not. (2) This pattern-based method is limited to extracting pairs whose drug and disease terms occur in the same sentence. Though important pairs often appear within a single sentence, some drug-disease pairs may appear only across sentences of an abstract. In order to extract such pairs, other relationship extraction methods will be necessary. However, as the size of the text corpus increases, the likelihood that a drug-disease pair will appear together in a single sentence will increase.
(3) Even though we extracted 34,305 unique drug-disease pairs using only 17 selected top patterns, the top patterns may only capture common drug-disease pairs. If a drug-disease pair appears in MEDLINE only once, the likelihood of it being associated with one of the selected top patterns is small. In order to increase the recall, we can increase the number of selected patterns, develop other algorithms to complement the pattern-based approach, or increase the size of the text corpus to include full-text articles, web pages or electronic patient medical records. (4) Highly accurate and comprehensive lexicons are prerequisites for many biomedical relationship extraction tasks, including our task of extracting drug-disease pairs from MEDLINE. For drug-disease treatment relationship extraction from MEDLINE, we can obtain a list of accurate FDA-approved drugs with reasonable coverage from DrugBank or PharmGKB. However, obtaining a disease list with both good accuracy and coverage for this specific task is more challenging. The precision and recall of UMLS-based lexicons in extracting diseases from biomedical text vary [28, 29]. In this study, we created a clean disease lexicon by combining an automatic approach with manual curation. However, there is a need to increase the coverage of the underlying disease lexicon. (5) Not all sentences in a document are equally informative. Sentence type is important for assessing the strength of extracted drug-disease associations. For example, a drug-disease treatment association is strong if it appears in background section sentences or in conclusion sentences. On the other hand, drug-disease associations in objective sections are weaker. We previously developed an algorithm that combines text classification and hidden Markov modeling techniques to automatically structure MEDLINE abstracts. In the future, we plan to assign a confidence score to each extracted association by taking sentence type into account.
(6) Negation detection, or sentiment classification of drug-disease treatment relationships into subtypes, is important. Some of the possible subtypes of drug-disease treatment relationships include “effective and safe,” “effective, not safe,” “safe, not effective,” and “not effective.” Examples include “Metronidazole proved to be effective and safe in the treatment of perioral dermatitis in children.” (PMID 09407169) (“effective and safe”); “Anthracyclines are effective in the treatment of leukemia, but their use is limited because of cardiotoxicity” (PMID 17043024) (“effective, not safe”); “Etanercept, at the dosage used, was well tolerated but not effective in the treatment of PSC.” (PMID 14992426) (“safe, not effective”); “Azithromycin was not as effective for the treatment of rosacea.” (PMID 15370397) (“not effective”). In addition, for repositioning strategies based on drug-disease treatment similarity, it is necessary to further differentiate palliative treatments from primary treatments. (7) Patient population characteristics (e.g. age, sex) are important for better understanding drug-disease treatment relationships. Consider the following sentences: “Forlax is safe and effective in the treatment of constipation in children over 8 years old” (PMID 17937851) and “Lubiprostone (Amitiza), appears to be effective for the treatment of chronic constipation for elderly patients” (PMID 18053448).
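For orientation, the F1 values implied by these precision and recall figures can be worked out from the standard definition. The numbers below are derived here from the rounded values quoted above, so they are approximate and are not figures taken from the study's result tables.

```latex
F_1 = \frac{2PR}{P + R}
\qquad
F_1^{\text{all}} = \frac{2(0.904)(0.131)}{0.904 + 0.131} \approx 0.229
\qquad
F_1^{\text{frequent}} = \frac{2(0.904)(0.842)}{0.904 + 0.842} \approx 0.872
```

The gap between the two values comes entirely from recall, since precision is the same in both settings.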
We developed a pattern-based biomedical relationship extraction method and extracted 34,305 unique drug-disease pairs from 20 million MEDLINE abstracts. Our algorithm achieved a precision of 0.904 and a recall of 0.131 for all pairs, and a precision of 0.904 and a recall of 0.842 for frequent pairs. We have shown that the extracted drug-disease pairs positively correlate with drug targets as well as therapeutic classes. We demonstrate that the published articles available on MEDLINE are a valuable source of drug-disease treatment information. The pattern-based relationship extraction algorithm is able to accurately extract many additional pairs from MEDLINE. These accurate and machine-understandable drug-disease pairs have high potential in computational drug repurposing tasks.
RX is funded by Case Western Reserve University/Cleveland Clinic CTSA Grant (UL1 RR024989). QW is funded by ThinTek LLC. ThinTek curators have created the clean lexicons and two evaluation data sets.
- Ashburn TT, Thor KB: Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov. 2004, 3: 673-83. 10.1038/nrd1468.
- DiMasi J: Success rates for new drugs entering clinical testing in the United States. Clin Pharmacol Ther. 1995, 58: 1-14. 10.1016/0009-9236(95)90066-7.
- Dudley J, Deshpande T, Butte AJ: Exploiting drug-disease relationships for computational drug repositioning. Brief Bioinform. 2011, 12: 303-311. 10.1093/bib/bbr013.
- Keiser MJ, Setola V, Irwin JJ: Predicting new molecular targets for known drugs. Nature. 2009, 462: 175-81. 10.1038/nature08506.
- Noeske T, Sasse BC, Stark H: Predicting compound selectivity by self-organizing maps: cross-activities of metabotropic glutamate receptor antagonists. Chem Med Chem. 2006, 1: 1066-8.
- Lamb J, Crawford ED, Peck D: The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006, 313: 1929-35. 10.1126/science.1132939.
- Chen B, Wild D, Guha R: PubChem as a source of polypharmacology. J Chem Inf Model. 2009, 49: 2044-55. 10.1021/ci9001876.
- Kinnings SL, Liu N, Buchmeier N, Tonge PJ, Xie L, Bourne PE: Drug discovery using chemical systems biology: repositioning the safe medicine Comtan to treat multi-drug and extensively drug resistant tuberculosis. PLoS computational biology. 2009, 5 (7): e1000423-10.1371/journal.pcbi.1000423.
- Dudley JT, Sirota JDM, Shenoy M, Pai RK, Roedder S, Chiang AP, Morgan AA, Sarwal MM, Pasricha PJ, Butte AJ: Computational Repositioning of the Anticonvulsant Topiramate for Inflammatory Bowel Disease. Sci Transl Med. 2011, 3 (96): 96ra76-10.1126/scitranslmed.3002648.
- Agarwal P, Hu: Human disease-drug network based on genomic expression profiles. PLoS One. 2009, 4: e6536-10.1371/journal.pone.0006536.
- Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P: Drug target identification using side-effect similarity. Science. 2008, 321: 263-266. 10.1126/science.1158140.
- Chiang AP, Butte AJ: Systematic evaluation of drug-disease relationships to identify leads for novel drug uses. Clin Pharmacol Ther. 2009, 86: 507-10. 10.1038/clpt.2009.103.
- DeMonaco HJ, Ali A, von Hippel E: The major role of clinicians in the discovery of off-label drug therapies. Pharmacotherapy. 2006, 26: 323-332. 10.1592/phco.26.3.323.
- Zweigenbaum P, Demner-Fushman D, Cohen K, Yu H: Frontiers of biomedical text mining: current progress. Brief Bioinform. 2007, 8 (5): 358-375. 10.1093/bib/bbm045.
- Blaschke C, Andrade MA, Ouzounis C, Valencia A: Automatic extraction of biological information from scientific text: protein-protein interactions. Proc Int Conf Intell Syst Mol Biol. 1999, 7: 60-67.
- Friedman C, Kra P, Yu H, Krauthammer M, Rzhetzky A: Genies: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics. 2001, 17 (suppl 1): S74-S82. 10.1093/bioinformatics/17.suppl_1.S74.
- Rindflesch TC, Tanabe L, Weinstein JN, Hunter L: EDGAR: Extraction of Drugs, Genes And Relations from the Biomedical Literature. Pacific Symposium on Biocomputing. NIH Public Access. 2000, 517-528.
- Xu R, Wang Q: A knowledge-driven conditional approach to extract pharmacogenomics specific drug-gene relationships from free text. J of Biomed Inform. 2012, 45 (5): 827-834. 10.1016/j.jbi.2012.04.011.
- Cimino J, Barnett G: Automatic knowledge acquisition from MEDLINE. Methods Inf Med. 1993, 32: 120-130.
- Lee C, Khoo C, Na J: Automatic identification of treatment relations for medical ontology learning: An exploratory study. Advances in Knowledge Organization. 2004, 9: 245-250.
- Abacha B, Zweigenbaum P: Automatic extraction of semantic relations between medical entities: Application to the treatment relation. Proceedings of the Fourth International Symposium on Semantic Mining in Biomedicine (SMBM). 2010.
- Rosario B, Hearst MA: Classifying semantic relations in bioscience texts. Proceedings of the 42nd Annual Meeting on Association For Computational Linguistics. Association for Computational Linguistics. 2004, 430-430.
- Chen ES, Hripcsak G, Xu H, Markatou M, Friedman C: Automated acquisition of disease drug knowledge from biomedical and clinical documents: an initial study. J Am Med Inform Assoc. 2008, 15: 87-98.
- Rindflesch TC, Fiszman M: The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text. J Biomed Inform. 2003, 36: 462-477. 10.1016/j.jbi.2003.11.003.
- Neveol A, Lu Z: Automatic integration of drug indications from multiple health resources. Proceedings of the 1st ACM International Health Informatics Symposium. ACM. 2010, 666-673.
- Bundschus M, Kriegel H, Dejori M: Extraction of semantic biomedical relations from text using conditional random fields. BMC Bioinformatics. 2008, 9: 207-10.1186/1471-2105-9-207.
- Xu R, Musen A, Shah N: A comprehensive analysis of five million UMLS metathesaurus terms using eighteen million MEDLINE citations. AMIA Annu Symp Proc. American Medical Informatics Association. 2010, 907-911.
- Pratt W, Yetisgen-Yildiz M: A Study of Biomedical Concept Identification: MetaMap vs. People. AMIA Annu Symp Proc. American Medical Informatics Association. 2003, 529-533.
- Shah NH, Bhatia N, Jonquet C, Rubin D, Chiang AP, Musen AM: Comparison of concept recognizers for building the open biomedical annotator. BMC Bioinformatics. 2009, 10 (Suppl 9): S14-10.1186/1471-2105-10-S9-S14.
- Xu R, Supekar K, Morgan A, Das A, Garber AM: Unsupervised Method for Automatic Construction of a Disease Dictionary from a Large Free Text Collection. AMIA Annu Symp Proc. American Medical Informatics Association. 2008, 820-824.
- Xu R, Supekar K, Huang Y, Das A, Garber AM: Combining text classification and hidden markov modeling techniques for structuring randomized clinical trial abstracts. AMIA Annu Symp Proc. American Medical Informatics Association. 2006, 824-828.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.