- Open Access
MeInfoText: associated gene methylation and cancer information from text mining
© Fang et al; licensee BioMed Central Ltd. 2008
- Received: 22 May 2007
- Accepted: 14 January 2008
- Published: 14 January 2008
DNA methylation is an important epigenetic modification of the genome. Abnormal DNA methylation may result in silencing of tumor suppressor genes and is common in a variety of human cancer cells. As more epigenetics research is published electronically, it is desirable to extract relevant information from biological literature. To facilitate epigenetics research, we have developed a database called MeInfoText to provide gene methylation information from text mining.
MeInfoText presents comprehensive association information about gene methylation and cancer, the profile of gene methylation among human cancer types and the gene methylation profile of a specific cancer type, based on association mining from large amounts of literature. In addition, MeInfoText offers integrated protein-protein interaction and biological pathway information collected from the Internet. MeInfoText also provides pathway cluster information regarding to a set of genes which may contribute the development of cancer due to aberrant methylation. The extracted evidence with highlighted keywords and the gene names identified from each methylation-related abstract is also retrieved. The database is now available at http://mit.lifescience.ntu.edu.tw/.
MeInfoText is a unique database that provides comprehensive gene methylation and cancer association information. It will complement existing DNA methylation information and will be useful in epigenetics research and the prevention of cancer.
- Association Rule
- Aberrant Methylation
- Gene Methylation
- Association Information
- Abnormal Methylation
DNA methylation, occurring predominantly in CpG dinucleotides, is an important epigenetic modification of the genome that is involved in mediating various cellular processes . DNA methylation has a wide range of biological functions, including an essential developmental role in the reprogramming of germ cells and early embryos, the genomic imprinting, the X chromosome inactivation, the repression of endogenous retrotransposons and the generalized role in gene expression . Abnormal methylation of DNA may result in increased transcription of oncogenes or silencing of tumor suppressor genes and is common in a variety of human cancer cells . Although the ramifications of global hypomethylation for tumor development are less well understood, it might contribute to chromosomal instability and then increases in gene expression [4, 5]. The hypermethylation of CpG islands in gene promoter regions is associated with aberrant silencing of transcription and has been regarded as a common mechanism for inactivation of tumor suppressor genes in human cancer [3, 6]. As compared with normal cells, the malignant cells show major disruptions in their DNA methylation patterns  and some genes seem to be aberrantly methylated in a tumor-specific manner . Currently, many studies have corroborated the DNA methylation profile of a cell type which may serve as a biomarker with a diagnostic and prognostic value . In addition, the initiation of the process of abnormal promoter methylation may associate with chromatin-remodelling complexes [5, 9]. Therefore, if the contribution of each candidate gene to tumorigenesis can be proved, the exact methylation profiles of tumors are available and the molecular events that initiate and maintain epigenetic gene silencing are understood clearly, then the prevention and treatment of cancer could have come more focused and rational [5, 10].
Text mining in biology is to automatically extract specific information about genes, proteins and their functional associations from text documents . As more biological literature is published electronically, it is desirable to develop methods for automatic extraction of relevant information from any source of biology data, especially from sources such as literature written in human language (also known as natural language) [11, 12]. And the development of text-mining applications specific for biology is the only way to cope with the increasing amount of free textual data produced in this field . Over the past few years, a considerable number of studies have been made on the mining and extraction of information from biomedical literature, such as disease candidate genes [13, 14], protein-protein interactions [15, 16], protein functions  and modifications . Many different approaches such as named-entity recognition (NER) based on a dictionary, template or ontology; statistics of word co-occurrence and natural language processing (NLP) have been adopted or invented by various researchers to achieve the goal. However, so far no attempt has been made to analyze the available DNA methylation information from a vast amount of literature.
In this paper, we present a biological database providing gene, methylation and cancer association information mined from the text and integrated protein-protein interaction and biological pathway information. Since MethDB  is the only public database that was developed to store information containing the origin of the investigated sample, including experimental procedure and DNA methylation data, it could be anticipated that our database is able to act complementarily to the existing databases, fill a gap in the already available DNA methylation resources and facilitate the research on epigenetics.
Data sources and contents
Gene synonym dictionary
We constructed a human gene synonym dictionary containing official gene symbols and aliases to annotate gene names in the literature. To make sure that most gene information stored in our dictionary is validated experimentally, we first collected all human protein entries from Swiss-Prot, a curated protein sequence database, and retrieved corresponding gene information, including official gene symbol, aliases, full name and summary from NCBI Entrez Gene. Information regarding to human miRNA genes was directly obtained from NCBI Entrez Gene. The annotation process was based on pattern matching between the dictionary entries and words in abstracts. The match was case-insensitive and only whole words were matched. After the complete of initial identification, we manually examined most recent 100 gene-annotated documents to reduce false named entity recognitions and enhance dictionary coverage. If unexpected words were frequently matched in the documents, these ambiguous gene synonyms would be regarded as stop words and removed from the dictionary. On the other hand, if no gene synonym was able to be found in the document by the dictionary, the document would be checked manually and then the discovered gene synonyms, if any, would be added to our dictionary. We then annotated gene names in the literature again with the improved dictionary.
Cancer type vocabulary
Cancer types were obtained from the heading fields in the Neoplasms by Site (C04.588) section of the MeSH. If a cancer type is composed of tumor site and Neoplasms i.e. Breast Neoplasms, in order to increase vocabulary coverage only the site name would be added to our cancer name set for matching abstracts and sentences. Other cancer types whose suffixes are -oma such as retinoblastoma were directly added to the set. Neoplasm-related keywords including cancer, tumor, tumour, neoplasm and -oma were first used to check if abstracts or sentences might describe cancers. If any, the set of cancer names would be used to discover what kinds of cancers occur in abstracts or sentences. We then mined associations between gene methylation and cancer based on term co-occurrences. For example, "BRCA1 methylation contributes to a subset of sporadic cancers of the breast." could be first identified by neoplasm-related keyword, cancer, and then by the site name, breast. Furthermore, BRCA1, methylation and breast cancer could be found they occur together in a sentence.
The gene-annotated documents were indexed with Plucene module, a perl search engine toolkit based on the Lucene API , to allow fast access to words stored inside the text. The associations between genes, methylation and cancers were mined according to their co-occurrences in the full abstracts and sentences. The keywords for methylation include 'methyl-', 'hypermethyl-', 'hypomethyl-', 'histone' and the methods of detecting DNA methylation, such as 'MSP' and 'COBRA'.
We used confidence and support to measure our association rule interestingness . Let I be a set of items. Let D be a set of database transactions where each transaction T is a set of items such that T ⊆ I. An association rule is an implication of the form A → B, where A ⊂ I, B ⊂ I, and A ⋂ B = Φ. The rule A → B holds in the transaction set D with support s, where s is the probability that item A and B occur together, P(A ⋃ B). The rule A → B has confidence c in the transaction set D if c is the conditional probability of B given A, P(B|A). In other words, confidence is the frequency of entries containing item A and B within all entries containing item A and support is the frequency of entries containing item A and B within all entries.
Here, we use BRCA1 gene as an example. Currently, of all the abstracts mentioning BRCA1 gene in our database, there are a total of 1095 sentences. Among them, 167 sentences contain BRCA1, methylation and cancer; 251 sentences contain BRCA1 and methylation. Thus, support of the association rule (BRCA1 gene methylation → cancer) is (167/1095) * 100% = 15.3%; confidence is (167/251)*100% = 66.5%.
In order to retrieve gene methylation and cancer information regarding the human BRCA1 (breast cancer 1, early onset) gene, for example, users can specify 'Gene Symbol' search type, input BRCA1 as search term and then press the MIT search button. If the search is matched, the general gene information, including Gene ID, Swiss Prot accession number, Gene Name, Gene Aliases and Description, is shown. Users can then follow the NCBI Entrez Gene ID link for statistics and association information. The returned web page contains information about gene function, cross-references, association between BRCA1 gene methylation and cancer in sentence level, protein-protein interactions, biological pathways and statistics, including the numbers of papers and sentences containing gene methylation, hypermethylation, hypomethylation, histone and stem cell. Users can follow the link of "Does the gene encode a methyltransferase?" to tell if a gene encodes a methyltransferase by support evidences, if any, automatically extracted from Entrez Gene summary, GeneRIF, Gene Ontology  and Swiss-Prot . For example, the parts of extracted evidences about HRMT1L2 (HMT1 hnRNP methyltransferase-like 2) are "protein methyltransferase activity [Gene Ontology]" and "Methylates SUPT5H. [Swiss-Prot] ". In the final part of the returned page, the information about co-occurrences of BRCA1 and different cancer types in the abstracts can be retrieved. Furthermore, one can see which other genes are also involved in a specific cancer type. Minimum number of papers of these genes is able to be specified. Default value is two. Users can then examine the pathway clusters and interactions of these genes. In the page of "Interaction Information about Gene Methylation and Cancer", the interaction network could be retrieved by following the link of Graph. Each node represents a gene, the red one represents the query gene such as BRCA1 and the blue ones represent other genes related to methylation. Each edge means an interaction between two genes. Specific association information and evidence extracted from the literature can be accessed by following the link of the number of papers containing gene, methylation and cancer. Information extracted from the literature is presented with highlighted keywords, identified genes, journal names and publication years.
To find gene methylation profile of human cancer, users may select a particular cancer type such as Breast cancer to find a set of genes undergoing abnormal methylation shared by this cancer. In addition, users can enter multiple official gene symbols separated by space to access gene methylation associations. For example, users are able to input 'BRCA1 APC CDKN2A GSTP1' to simultaneously retrieve association information including the number of related papers, confidences and supports. Users also can input multiple official gene symbols separated by space and select multiple cancer types to examine the profile of gene methylation across human cancer types. For instance, if a set of genes, including APC (adenomatous polyposis coli), BRCA1, CDH1 (cadherin 1), GSTP1 (glutathione S-transferase pi), MGMT (O6-methylguanine-DNA methyltransferase) and TIMP3 (TIMP metallopeptidase inhibitor 3), are inputted and different cancer types, including colon, lung, breast, gastric, liver, esophageal, bladder, leukemia, kidney, ovarian, head and neck, pancreas and lymphoma are selected, the profile of methylation for each gene among different cancers could be shown. Users may find the profile is cancer-specific or gene-specific.
Inference of the pathways responsible for the development of tumors in a specific tissue
In addition to allow users to understand quickly the cancers in which a specific gene may play a role, MeInfoText could infer the pathways responsible for the development of cancers. If a set of genes could be associated to a given disease, they may tend to be connected at the translational levels. In other words, their gene products may interact with each other and involve in a biological pathway. For example, MeInfoText allows the user to query for APC gene and shows the cancers in which the gene may be methylated. Also, it shows the other genes that may be methylated for each of those cancers. In the 70 papers that mention APC gene methylation and colorectal cancer, there are 10 papers mentioning MLH1 (mutL homolog 1), 7 papers mentioning TP53 (tumor protein p53) and 3 papers mentioning MYC (v-myc myelocytomatosis viral oncogene homolog), all of which could be clustered into colorectal cancer pathway. Furthermore, TP53, CTNNB1 (catenin, beta 1), MYC and SFRP1 (secreted frizzled-related protein 1) are able to be clustered into Wnt signaling pathway and the interaction data indicates CTNNB1 has physical and direct interaction with APC. Therefore, users may infer abnormal methylation of these genes involve colorectal cancer-related mechanism and Wnt signaling pathway is responsible the development of tumors in colon tissue .
Our gene synonym dictionary, cancer names and mined associations were evaluated respectively. We used precision and recall measurements to estimate the performances Precision and recall are defined as follows: ; where TP, FP and NP are the number of true positives, false positives and negative positives. The data used for evaluating the gene synonym dictionary can be accessed at . Partially identified gene symbols were considered false positives (i.e. "SKR-1 vs. SKR"). The precision and recall of our dictionary to recognize gene symbols in the test data are 96% and 74%. The data used for the cancer name evaluation can be retrieved at [30, 31]. Non-specific terms such as cancer and tumor were excluded. There are about 382 text localizations tagged by <DIS></DIS> or <DISONLY></DISONLY> and mentioning cancer names. Partially identified cancer names were considered false positives (i.e. "B cell lymphoma vs. lymphoma" or "squamous cell lung cancer vs. lung cancer"). The precision and recall of cancer name recognitions are 79% and 67%.
We tested our association rules using human gene methylation and cancer data published by Das and Singal, 2004  (reporting 13 genes commonly methylated in cancer, called dataset D) and Esteller, 2005  (reporting 36 genes hypermethylated in cancer, called dataset E). In addition, human genes having methylation information and available in MethDB were also tested (dataset M, 18 genes). First, the ratios of relationships between gene methylation and cancer that could be discovered by MeInfoText are 13/13 (100%), 34/36 (94%) and 18/18 (100%) for datasets D, E, and M, respectively. Second, the average confidences and supports for the three test sets are (D: 61.3%, 17.1%) and (E: 60.6%, 17.5%) and (M: 54.4%, 16.2%). We observed that over 85% and 90% of the tested genes have confidences and supports greater than 40% and 7%, respectively. Therefore, we believed rules that satisfy a minimum confidence threshold, 40%, and a minimum support threshold, 7%, in this study are significant.
We evaluated the 75 associations in relation to the 34 unique genes and various specific cancers from Das and Singal, 2004 and Esteller, 2005. The data used for the evaluation are available at . Mined associations with no explicit evidences that can support it were regarded as false positives. The precision and recall of association mining are 99% and 93%, respectively. Furthermore, we randomly selected 20 genes, PYCARD, CDH13, COX2, DAPK1, ESR1, GATA4, SYK, MLH1, TP73, PRDM2, PGR, SFRP1, SOCS1, SOCS3, STK11, TMEFF2, THBS1, RASSF5, PRKCDBP and RARB, from the 34 genes and manually evaluated the mined associations with at least 2, 3 and 5 papers, respectively. The number of the associations is 362, 222 and 103 and precisions are 78%, 85% and 91%, respectively.
MeInfoText might provide methylation markers for the detection of human cancer. From the MeInfoText search, we can find gene methylation profile of almost every human cancer type. The most relevant methylation-associated silencing of genes for each cancer could be combined into a set of potential markers which may reach high cancer detection information. For example, there are 342 genes whose abnormal methylation might relate to lung cancer and the top 11 genes having at least 20 papers include CDKN2A (cyclin-dependent kinase inhibitor 2A), RASSF1 (Ras association domain family 1), TP53, MGMT, DAPK1 (death-associated protein kinase 1), RARB (retinoic acid receptor, beta), HRAS (v-Ha-ras Harvey rat sarcoma viral oncogene homolog), RB1 (retinoblastoma 1), APC, FHIT (fragile histidine triad gene) and GSTP1. The different compositions of 3 or 4 of these potential markers may reach different cancer detection information and further investigation is required to support the usefulness.
Epigenetic inactivation may affect many existing cellular pathways . For instance, through the MeInfoText search, we can find that the abnormal silencing of GSTP1 gene is strongly related to prostate cancer that may be also associated with aberrant methylation of other 20 genes with at least 3 papers. The pathway cluster information indicates both of GSTP1 and PTGS2 (prostaglandin-endoperoxide synthase 2) involve the TNF-alpha pathway, GSTP1, PTGS2 and TIMP3 participate in the IL-1 pathway and both of GSTP1 and CD44 (CD44 antigen) are components of the B cell receptor pathway. In addition, GSTP1 also involve glutathione metabolism and metabolism of xenobiotics by cytochrome p450. It is likely the methylation-associated silencing of multiple genes in different cellular pathways integrated by GSTP1 has a role in the progression of prostate tumors. In addition, the analysis of aberrant methylation of genes and inactivated pathways may help to understand the tumor behavior . Although much more investigation is required to support the network of multiple genes and putative biomarkers for tumor progression, it would assist the understanding of mechanistic factors involving in such a process.
Epigenetics has been regarded as an important field to contribute cancer research and it is known that DNA methylation is not an isolated event but may be regulated by many complex epigenetic mechanisms such as histone modifications . It has been suggested that loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer . In addition, some studies suggest that DNA methylation may be related to the interactions between DNA methyltransferases, methyl-CpG binding proteins, histone deacetylases and histone methyltransferases and epigenetic information embodied in residue methylation states would flow from histone to DNA and back . Therefore, we included "histone" as one of the keywords for methylation in our association mining.
Comparison between MeInfoText and MethDB
The MeInfoText differs from the MethDB in many ways. Firstly, we have mined associations among genes, methylation and cancers from a large amount of biomedical literature. The computationally organized information would contribute epigenetics research. Secondly, MeInfoText provides the information about the profile of gene methylation among human cancer types and gene methylation profile of a particular cancer type. It would be useful to discover a set of potential markers for the detection of human cancer. Thirdly, MeInfoText provides integrated information about protein-protein interaction and biological pathway. Users can quickly overview which genes with aberrant methylation may contribute a cancer through various signaling pathways.
Future research might focus on database content and dictionary coverage increases. In addition, we would like to apply machine learning or other NLP techniques to do DNA methylation information extraction and compare the results with the study.
MeInfoText is a novel database providing gene methylation and cancer association information from literature mining and integrated protein-protein interaction and pathway information. It facilitates researchers to comprehensively understand the relationships between multiple gene methylation and various cancers, the profile of gene methylation across human cancer types and gene methylation profile of a specific cancer, and to infer putative signaling pathways involving the development of tumors. It will complement existing DNA methylation information and be valuable to epigenetics research and the prevention of cancer.
Project name: MeInfoText
Project home page: http://mit.lifescience.ntu.edu.tw/
Operating system(s): platform independent
Other requirements: Firefox is recommended for the website access
License: the database website is freely accessible
This project is supported by grants from the National Science Council, Taiwan (NSC 95-2221-E-002-183 and NSC 95-3114-P-002-005-Y) and NTU Frontier and Innovative Research Projects. We would like to thank Dr. Hung-Cheng Lai and Dr. Jorng-Tzong Horng for fruitful discussion, and Rui-Lan Huang for testing the database.
- Robertson KD: DNA methylation and human disease. Nat Rev Genet 2005, 6(8):597–610. 10.1038/nrg1655View ArticlePubMedGoogle Scholar
- Scarano MI, Strazzullo M, Matarazzo MR, D'Esposito M: DNA methylation 40 years later: Its role in human health and disease. J Cell Physiol 2005, 204(1):21–35. 10.1002/jcp.20280View ArticlePubMedGoogle Scholar
- Esteller M: Aberrant DNA methylation as a cancer-inducing mechanism. Annu Rev Pharmacol Toxicol 2005, 45: 629–656. 10.1146/annurev.pharmtox.45.120403.095832View ArticlePubMedGoogle Scholar
- Feinberg AP, Tycko B: The history of cancer epigenetics. Nat Rev Cancer 2004, 4(2):143–153. 10.1038/nrc1279View ArticlePubMedGoogle Scholar
- Baylin SB, Ohm JE: Epigenetic gene silencing in cancer - a mechanism for early oncogenic pathway addiction? Nat Rev Cancer 2006, 6(2):107–116. 10.1038/nrc1799View ArticlePubMedGoogle Scholar
- Herman JG, Baylin SB: Gene silencing in cancer in association with promoter hypermethylation. N Engl J Med 2003, 349(21):2042–2054. 10.1056/NEJMra023075View ArticlePubMedGoogle Scholar
- Baylin SB, Herman JG: DNA hypermethylation in tumorigenesis: epigenetics joins genetics. Trends Genet 2000, 16(4):168–174. 10.1016/S0168-9525(99)01971-XView ArticlePubMedGoogle Scholar
- Laird PW: The power and the promise of DNA methylation markers. Nat Rev Cancer 2003, 3(4):253–266. 10.1038/nrc1045View ArticlePubMedGoogle Scholar
- Vire E, Brenner C, Deplus R, Blanchon L, Fraga M, Didelot C, Morey L, Van Eynde A, Bernard D, Vanderwinden JM, Bollen M, Esteller M, Di Croce L, de Launoit Y, Fuks F: The Polycomb group protein EZH2 directly controls DNA methylation. Nature 2006, 439(7078):871–874. 10.1038/nature04431View ArticlePubMedGoogle Scholar
- Das PM, Singal R: DNA methylation and cancer. J Clin Oncol 2004, 22(22):4632–4642. 10.1200/JCO.2004.07.151View ArticlePubMedGoogle Scholar
- Krallinger M, Valencia A: Text-mining and information-retrieval services for molecular biology. Genome Biol 2005, 6(7):224. 10.1186/gb-2005-6-7-224PubMed CentralView ArticlePubMedGoogle Scholar
- Andrade MA, Bork P: Automated extraction of information in molecular biology. FEBS Lett 2000, 476(1–2):12–17. 10.1016/S0014-5793(00)01661-6View ArticlePubMedGoogle Scholar
- Hristovski D, Peterlin B, Mitchell JA, Humphrey SM: Using literature-based discovery to identify disease candidate genes. Int J Med Inform 2005, 74(2–4):289–298. 10.1016/j.ijmedinf.2004.04.024View ArticlePubMedGoogle Scholar
- Tiffin N, Kelso JF, Powell AR, Pan H, Bajic VB, Hide WA: Integration of text- and data-mining using ontologies successfully selects disease gene candidates. Nucleic Acids Res 2005, 33(5):1544–1552. 10.1093/nar/gki296PubMed CentralView ArticlePubMedGoogle Scholar
- Ono T, Hishigaki H, Tanigami A, Takagi T: Automated extraction of information on protein-protein interactions from the biological literature. Bioinformatics 2001, 17(2):155–161. 10.1093/bioinformatics/17.2.155View ArticlePubMedGoogle Scholar
- Hoffmann R, Valencia A: Implementing the iHOP concept for navigation of biomedical literature. Bioinformatics 2005, 21 Suppl 2: ii252-ii258. 10.1093/bioinformatics/bti1142PubMedGoogle Scholar
- Rice SB, Nenadic G, Stapley BJ: Mining protein function from text using term-based support vector machines. BMC Bioinformatics 2005, 6 Suppl 1: S22. 10.1186/1471-2105-6-S1-S22View ArticlePubMedGoogle Scholar
- Narayanaswamy M, Ravikumar KE, Vijay-Shanker K: Beyond the clause: extraction of phosphorylation information from medline abstracts. Bioinformatics 2005, 21 Suppl 1: i319-i327. 10.1093/bioinformatics/bti1011View ArticlePubMedGoogle Scholar
- Grunau C, Renault E, Rosenthal A, Roizes G: MethDB--a public database for DNA methylation data. Nucleic Acids Res 2001, 29(1):270–274. 10.1093/nar/29.1.270PubMed CentralView ArticlePubMedGoogle Scholar
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2005, 33(Database issue):D54–8. 10.1093/nar/gki031PubMed CentralView ArticlePubMedGoogle Scholar
- Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, Ibarrola N, Deshpande N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika KN, Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury DR, Suresh S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, Anand SK, Madavan V, Joseph A, Wong GW, Schiemann WP, Constantinescu SN, Huang L, Khosravi-Far R, Steen H, Tewari M, Ghaffari S, Blobe GC, Dang CV, Garcia JG, Pevsner J, Jensen ON, Roepstorff P, Deshpande KS, Chinnaiyan AM, Hamosh A, Chakravarti A, Pandey A: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 2003, 13(10):2363–2371. 10.1101/gr.1680803PubMed CentralView ArticlePubMedGoogle Scholar
- Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R: IntAct: an open source molecular interaction database. Nucleic Acids Res 2004, 32(Database issue):D452–5. 10.1093/nar/gkh052PubMed CentralView ArticlePubMedGoogle Scholar
- Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 1999, 27(1):29–34. 10.1093/nar/27.1.29PubMed CentralView ArticlePubMedGoogle Scholar
- Apache Lucene[http://jakarta.apache.org/lucene]
- Han J, Kamber M: Data Mining: Concepts and Techniques. Morgan Kaufmann; 2006.Google Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25(1):25–29. 10.1038/75556PubMed CentralView ArticlePubMedGoogle Scholar
- Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31(1):365–370. 10.1093/nar/gkg095PubMed CentralView ArticlePubMedGoogle Scholar
- Segditsas S, Tomlinson I: Colorectal cancer and genetic alterations in the Wnt pathway. Oncogene 2006, 25(57):7531–7537. 10.1038/sj.onc.1210059View ArticlePubMedGoogle Scholar
- iHOP data used for manual evaluation[http://www.pdg.cnb.uam.es/UniPub/iHOP/info/gene_index/manual/index.html]
- BioText disease and treatment data[http://biotext.berkeley.edu/data/dis_treat_data/sentences_with_roles_and_relations]
- Rosario B, Hearst M: Classifying semantic relations in bioscience texts. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, ACL Barcelona, Spain 2004.Google Scholar
- Data for evaluating associations between gene methylation and cancers[http://mit.lifescience.ntu.edu.tw/mit_test_data.html]
- Esteller M, Corn PG, Baylin SB, Herman JG: A gene hypermethylation profile of human cancer. Cancer Res 2001, 61(8):3225–3229.PubMedGoogle Scholar
- Fraga MF, Ballestar E, Villar-Garea A, Boix-Chornet M, Espada J, Schotta G, Bonaldi T, Haydon C, Ropero S, Petrie K, Iyer NG, Perez-Rosado A, Calvo E, Lopez JA, Cano A, Calasanz MJ, Colomer D, Piris MA, Ahn N, Imhof A, Caldas C, Jenuwein T, Esteller M: Loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer. Nat Genet 2005, 37(4):391–400. 10.1038/ng1531View ArticlePubMedGoogle Scholar
- Fuks F: DNA methylation and histone modifications: teaming up to silence genes. Curr Opin Genet Dev 2005, 15(5):490–495. 10.1016/j.gde.2005.08.002View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.